
Chapter 3: Principles of Scalable Performance

 Performance measures
 Speedup laws
 Scalability principles
 Scaling up vs. scaling down
Performance metrics and measures
2

 Parallelism profiles
 Asymptotic speedup factor
 System efficiency, utilization and quality
 Standard performance measures
Degree of parallelism
3

 Reflects the matching of software and hardware parallelism
 Discrete time function – measures, for each time period, the # of processors used
 Parallelism profile is a plot of the DOP as a function
of time
 Ideally have unlimited resources
Factors affecting parallelism profiles
4

 Algorithm structure
 Program optimization
 Resource utilization
 Run-time conditions
 Realistically limited by # of available processors,
memory, and other nonprocessor resources
Average parallelism variables
5

 n – homogeneous processors
 m – maximum parallelism in a profile
 Δ – computing capacity of a single processor (execution rate only, no overhead)
 DOP=i – # processors busy during an observation
period
Average parallelism
6

 Total amount of work performed is proportional to the area under the profile curve

W = \Delta \int_{t_1}^{t_2} DOP(t)\,dt

W = \Delta \sum_{i=1}^{m} i \cdot t_i    (t_i = total time during which DOP = i)
Average parallelism
7

A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} DOP(t)\,dt

A = \left( \sum_{i=1}^{m} i \cdot t_i \right) \Big/ \left( \sum_{i=1}^{m} t_i \right)
Example: parallelism profile and average
parallelism
8
Asymptotic speedup
9

T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta}

T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta}    (response time)

S_\infty = \frac{T(1)}{T(\infty)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} W_i / i} = A in the ideal case
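A minimal Python sketch (an illustration, not part of the slides; the profile and the capacity Δ are hypothetical) computing the total work W, the average parallelism A, and the asymptotic speedup from a discrete parallelism profile {t_i}; in this ideal, overhead-free case S_inf equals A:

DELTA = 1.0                                   # computing capacity of one processor (assumed)
profile = {1: 4.0, 2: 3.0, 4: 2.0, 8: 1.0}    # hypothetical t_i: time spent at DOP = i

W = DELTA * sum(i * t for i, t in profile.items())                    # total work
A = sum(i * t for i, t in profile.items()) / sum(profile.values())    # average parallelism

T1    = W / DELTA              # T(1) = sum W_i / Delta
T_inf = sum(profile.values())  # T(inf) = sum W_i/(i*Delta) = sum t_i
S_inf = T1 / T_inf

print(W, A, S_inf)             # W = 26.0; A == S_inf == 2.6 for this profile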
Performance measures
10

 Consider n processors executing m programs in various modes
 Want to define the mean performance of these multimode computers:
 Arithmetic mean performance
 Geometric mean performance
 Harmonic mean performance
Arithmetic mean performance
11

R_a = \sum_{i=1}^{m} R_i / m    Arithmetic mean execution rate (assumes equal weighting)

R_a^* = \sum_{i=1}^{m} (f_i R_i)    Weighted arithmetic mean execution rate

 Proportional to the sum of the inverses of execution times
Geometric mean performance
12

R_g = \prod_{i=1}^{m} R_i^{1/m}    Geometric mean execution rate

R_g^* = \prod_{i=1}^{m} R_i^{f_i}    Weighted geometric mean execution rate

 Does not summarize the real performance since it does not have the inverse relation with the total time
Harmonic mean performance
13

T_i = 1 / R_i    Mean execution time per instruction for program i

T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i}    Arithmetic mean execution time per instruction
Harmonic mean performance
14

R_h = 1 / T_a = \frac{m}{\sum_{i=1}^{m} (1 / R_i)}    Harmonic mean execution rate

R_h^* = \frac{1}{\sum_{i=1}^{m} (f_i / R_i)}    Weighted harmonic mean execution rate

 Corresponds to the total # of operations divided by the total time (closest to the real performance)
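A minimal Python sketch (an illustration, not from the slides; the rates are hypothetical) comparing the arithmetic, geometric, and harmonic mean execution rates for a set of per-program rates R_i and optional weights f_i:

import math

def arithmetic_mean(rates):
    # R_a = sum(R_i) / m : overweights the fastest programs
    return sum(rates) / len(rates)

def geometric_mean(rates):
    # R_g = (prod R_i)^(1/m) : no inverse relation with total time
    return math.prod(rates) ** (1.0 / len(rates))

def harmonic_mean(rates, weights=None):
    # R_h* = 1 / sum(f_i / R_i) : total operations / total time
    m = len(rates)
    weights = weights or [1.0 / m] * m
    return 1.0 / sum(f / r for f, r in zip(weights, rates))

rates = [10.0, 100.0, 1000.0]    # hypothetical MIPS ratings of 3 programs
print(arithmetic_mean(rates))    # 370.0   (optimistic)
print(geometric_mean(rates))     # ~100.0
print(harmonic_mean(rates))      # ~27.0   (closest to real performance)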
Harmonic Mean Speedup
15

 Ties the various modes of a program to the number of processors used
 Program is in mode i if i processors are used
 Sequential execution time T_1 = 1/R_1 = 1

S = T_1 / T^* = \frac{1}{\sum_{i=1}^{n} (f_i / R_i)}
Harmonic Mean Speedup Performance
16
Amdahl’s Law
17

 Assume R_i = i, w = (α, 0, 0, …, 1−α)
 System is either sequential, with probability α, or fully parallel with probability 1−α

S_n = \frac{n}{1 + (n-1)\alpha}

 Implies S → 1/α as n → ∞
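A small Python sketch (illustrative, not from the slides; n and α are arbitrary) that evaluates the general harmonic-mean speedup for the mode distribution w = (α, 0, …, 0, 1−α) and checks it against Amdahl's closed form n/(1 + (n−1)α):

def harmonic_mean_speedup(f, R):
    # S = 1 / sum(f_i / R_i), with T_1 = 1/R_1 = 1
    return 1.0 / sum(fi / ri for fi, ri in zip(f, R))

def amdahl_speedup(n, alpha):
    # Closed form for w = (alpha, 0, ..., 0, 1 - alpha) and R_i = i
    return n / (1.0 + (n - 1) * alpha)

n, alpha = 1024, 0.05
f = [alpha] + [0.0] * (n - 2) + [1.0 - alpha]   # mode distribution
R = list(range(1, n + 1))                       # R_i = i
print(harmonic_mean_speedup(f, R))   # ~19.6
print(amdahl_speedup(n, alpha))      # same value; bounded by 1/alpha = 20 as n grows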
Speedup Performance
18
System Efficiency
19

 O(n) is the total # of unit operations performed by an n-processor system
 T(n) is execution time in unit time steps
 T(n) < O(n) and T(1) = O(1)

S(n) = T(1) / T(n)

E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)}
Redundancy and Utilization
20

 Redundancy signifies the extent of matching between software and hardware parallelism

R(n) = O(n) / O(1)

 Utilization indicates the percentage of resources kept busy during execution

U(n) = R(n)\,E(n) = \frac{O(n)}{n\,T(n)}
Quality of Parallelism
21

 Directly proportional to the speedup and efficiency and inversely related to the redundancy
 Upper-bounded by the speedup S(n)

Q(n) = \frac{S(n)\,E(n)}{R(n)} = \frac{T^3(1)}{n\,T^2(n)\,O(n)}
Example of Performance
22

 Given O(1) = T(1) = n^3, O(n) = n^3 + n^2 log n, and T(n) = 4n^3 / (n+3)
 S(n) = (n+3)/4
 E(n) = (n+3)/(4n)
 R(n) = (n + log n)/n
 U(n) = (n+3)(n + log n)/(4n^2)
 Q(n) = (n+3)^2 / (16(n + log n))
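A short Python sketch (illustration only; the slides leave the log base unspecified, base 2 is assumed here) that plugs this example workload into the definitions above and prints S, E, R, U, and Q for a given machine size n:

import math

def metrics(n):
    # Example from the slides: O(1) = T(1) = n^3, O(n) = n^3 + n^2*log(n), T(n) = 4n^3/(n+3)
    T1 = n ** 3
    O1 = n ** 3
    On = n ** 3 + n ** 2 * math.log2(n)
    Tn = 4 * n ** 3 / (n + 3)

    S = T1 / Tn     # speedup,     (n+3)/4
    E = S / n       # efficiency,  (n+3)/(4n)
    R = On / O1     # redundancy,  (n + log n)/n
    U = R * E       # utilization, (n+3)(n + log n)/(4n^2)
    Q = S * E / R   # quality,     (n+3)^2 / (16(n + log n))
    return S, E, R, U, Q

print(metrics(64))  # S = 16.75, E ~ 0.26, R ~ 1.09, U ~ 0.29, Q ~ 4.0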
Standard Performance Measures
23

 MIPS and Mflops


 Depends on instruction set and program used

 Dhrystone results
 Measure of integer performance

 Whetstone results
 Measure of floating-point performance

 TPS and KLIPS ratings


 Transaction performance and reasoning power
Parallel Processing Applications
24

 Drug design
 High-speed civil transport
 Ocean modeling
 Ozone depletion research
 Air pollution
 Digital anatomy
Application Models for Parallel Computers
25

 Fixed-load model
 Constant workload

 Fixed-time model
 Demands constant program execution time

 Fixed-memory model
 Limited by the memory bound
Algorithm Characteristics
27

 Deterministic vs. nondeterministic


 Computational granularity
 Parallelism profile
 Communication patterns and synchronization
requirements
 Uniformity of operations
 Memory requirement and data structures
Isoefficiency Concept
28

 Relates workload to machine size n needed to maintain a fixed efficiency

E = \frac{w(s)}{w(s) + h(s, n)}    (w(s): workload, h(s,n): overhead)

 The smaller the power of n, the more scalable the system
Isoefficiency Function
29

 To maintain a constant E, w(s) should grow in proportion to h(s,n)

w(s) = \frac{E}{1 - E}\,h(s, n)

 C = E/(1−E) is constant for a fixed E, giving the isoefficiency function

f_E(n) = C \cdot h(s, n)
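A brief Python sketch (my own illustration; the overhead function is hypothetical) of the isoefficiency idea: given an assumed overhead h(s,n) = n log n, the workload that keeps efficiency at E must grow like C · n log n:

import math

def required_workload(n, E, overhead):
    # w(s) = E/(1-E) * h(s, n) keeps efficiency fixed at E
    C = E / (1.0 - E)
    return C * overhead(n)

overhead = lambda n: n * math.log2(n)   # hypothetical overhead h(s, n) = n log n

for n in (8, 64, 512):
    print(n, required_workload(n, E=0.8, overhead=overhead))
# workload must grow ~ 4 * n log n to hold E = 0.8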
Speedup Performance Laws
30

 Amdahl’s law
 for fixed workload or fixed problem size

 Gustafson’s law
 for scaled problems (problem size increases with increased
machine size)
 Speedup model
 for scaled problems bounded by memory capacity
Amdahl’s Law
31

 As the number of processors increases, the fixed load is distributed to more processors
 Minimal turnaround time is primary goal
 Speedup factor is upper-bounded by a sequential
bottleneck
 Two cases:
 DOP < n
 DOP ≥ n
Fixed Load Speedup Factor
32
 Case 1: DOP ≥ n        Case 2: DOP < n

t_i(n) = \frac{W_i}{i\,\Delta} \left\lceil \frac{i}{n} \right\rceil        t_i(n) = t_i(\infty) = \frac{W_i}{i\,\Delta}

T(n) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta} \left\lceil \frac{i}{n} \right\rceil

S_n = \frac{T(1)}{T(n)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} \frac{W_i}{i} \left\lceil \frac{i}{n} \right\rceil}
Amdahl’s Law
33
Gustafson’s Law
34

 With Amdahl's Law, the workload cannot scale to match the available computing power as n increases
 Gustafson’s Law fixes the time, allowing the problem
size to increase with higher n
 Not saving time, but increasing accuracy
Fixed-time Speedup
35

 As the machine size increases, the workload increases and a new parallelism profile results
 In general, W_i' > W_i for 2 ≤ i ≤ m' and W_1' = W_1
 Assume T(1) = T'(n)
Gustafson’s Scaled Speedup
36

Fixed-time constraint:  \sum_{i=1}^{m} W_i = \sum_{i=1}^{m'} \frac{W_i'}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)

S_n' = \frac{\sum_{i=1}^{m'} W_i'}{\sum_{i=1}^{m} W_i} = \frac{W_1 + nW_n}{W_1 + W_n}

(the last equality is for the special case of a sequential part W_1 plus a perfectly parallel part W_n, with Q(n) = 0)
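A minimal Python sketch (illustration, not from the slides; the sequential fraction is hypothetical) of Gustafson's fixed-time scaled speedup for the two-part workload W_1 (sequential) and W_n (parallel):

def gustafson_speedup(n, alpha):
    # alpha = W_1 / (W_1 + W_n) is the sequential fraction of the original workload;
    # scaled speedup S'_n = (W_1 + n*W_n) / (W_1 + W_n) = alpha + n*(1 - alpha)
    return alpha + n * (1.0 - alpha)

for n in (16, 256, 1024):
    print(n, gustafson_speedup(n, alpha=0.05))
# grows almost linearly in n, unlike the Amdahl bound of 1/alpha = 20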
Gustafson’s Scaled Speedup
37
Memory Bounded Speedup Model
38

 Idea is to solve the largest problem that fits in the available memory space
 Results in a scaled workload and higher accuracy
 Each node can handle only a small subproblem for
distributed memory
 Using a large # of nodes collectively increases the
memory capacity proportionally
Fixed-Memory Speedup
39

 Let M be the memory requirement and W the computational workload: W = g(M)
 Scaling the memory to nM scales the workload: g^*(nM) = G(n)\,g(M) = G(n)\,W_n

S_n^* = \frac{\sum_{i=1}^{m^*} W_i^*}{\sum_{i=1}^{m^*} \frac{W_i^*}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)} = \frac{W_1 + G(n)\,W_n}{W_1 + G(n)\,W_n / n}
Fixed-Memory Speedup
40

Scaled speedup model using fixed memory


Relating Speedup Models
41

 G(n) reflects the increase in workload as memory increases n times
 G(n) = 1 : fixed problem size (Amdahl's law)
 G(n) = n : workload increases n times when memory is increased n times (Gustafson's law)
 G(n) > n : workload increases faster than the memory requirement
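A short Python sketch (my own illustration; the workload numbers are hypothetical) relating the three models through G(n) for a workload split into a sequential part W_1 and a parallel part W_n:

def scaled_speedup(n, W1, Wn, G):
    # S*_n = (W1 + G(n)*Wn) / (W1 + G(n)*Wn / n)
    # G(n) = 1 -> Amdahl (fixed load); G(n) = n -> Gustafson (fixed time)
    g = G(n)
    return (W1 + g * Wn) / (W1 + g * Wn / n)

n, W1, Wn = 256, 5.0, 95.0
print(scaled_speedup(n, W1, Wn, G=lambda n: 1))         # Amdahl:    ~18.6
print(scaled_speedup(n, W1, Wn, G=lambda n: n))         # Gustafson: ~243.3
print(scaled_speedup(n, W1, Wn, G=lambda n: n ** 1.5))  # G(n) > n, memory-bound: larger still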
Scalability Metrics
42

 Machine size (n) : # of processors


 Clock rate (f) : determines basic m/c cycle
 Problem size (s) : amount of computational
workload. Directly proportional to T(s,1).
 CPU time (T(s,n)) : actual CPU time for execution
 I/O demand (d) : demand in moving the program,
data, and results for a given run
Scalability Metrics
43

 Memory capacity (m) : max # of memory words


demanded
 Communication overhead (h(s,n)) : amount of time
for interprocessor communication,
synchronization, etc.
 Computer cost (c) : total cost of h/w and s/w
resources required
 Programming overhead (p) : development
overhead associated with an application program
Speedup and Efficiency
44

 The problem size s is the independent parameter

S(s, n) = \frac{T(s, 1)}{T(s, n) + h(s, n)}

E(s, n) = \frac{S(s, n)}{n}
Scalable Systems
45

 Ideally, if E(s,n) = 1 for all algorithms and any s and n, the system is scalable
 Practically, consider the scalability of a machine relative to an ideal machine:

\Phi(s, n) = \frac{S(s, n)}{S_I(s, n)} = \frac{T_I(s, n)}{T(s, n)}
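A closing Python sketch (illustrative; all timing numbers are hypothetical) computing S(s,n), E(s,n), and the scalability Φ(s,n) of a real machine against an ideal, overhead-free machine:

def speedup(T1, Tn, h):
    # S(s,n) = T(s,1) / (T(s,n) + h(s,n))
    return T1 / (Tn + h)

def efficiency(S, n):
    return S / n

def scalability(T_ideal, T_real_total):
    # Phi(s,n) = S(s,n) / S_I(s,n) = T_I(s,n) / T(s,n), with T the total real time
    return T_ideal / T_real_total

T1, n = 1000.0, 32
T_real, h = 40.0, 10.0   # measured parallel time and communication overhead (hypothetical)
T_ideal = 35.0           # time on an ideal, overhead-free machine (hypothetical)

S = speedup(T1, T_real, h)
print(S, efficiency(S, n), scalability(T_ideal, T_real + h))
# S = 20.0, E = 0.625, Phi = 0.7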