
Chapter 3: Principles of Scalable Performance

 Performance measures
 Speedup laws
 Scalability principles
 Scaling up vs. scaling down
Performance metrics and measures
2

 Parallelism profiles
 Asymptotic speedup factor
 System efficiency, utilization and quality
 Standard performance measures
Degree of parallelism
3

 Reflects the matching of software and hardware parallelism
 Discrete time function – measures, for each time period, the # of processors used
 Parallelism profile is a plot of the DOP as a function
of time
 Ideally have unlimited resources
Factors affecting parallelism profiles
4

 Algorithm structure
 Program optimization
 Resource utilization
 Run-time conditions
 Realistically limited by # of available processors,
memory, and other nonprocessor resources
Average parallelism variables
5

 n – homogeneous processors
 m – maximum parallelism in a profile
 Δ – computing capacity of a single processor (execution rate only, no overhead)
 DOP=i – # processors busy during an observation
period
Average parallelism
6

 Total amount of work performed is proportional to the area under the profile curve

W = \Delta \int_{t_1}^{t_2} DOP(t)\,dt

W = \Delta \sum_{i=1}^{m} i \cdot t_i    (t_i = total time during which DOP = i)
Average parallelism
7

A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} DOP(t)\,dt

A = \left( \sum_{i=1}^{m} i \cdot t_i \right) \Big/ \left( \sum_{i=1}^{m} t_i \right)
Example: parallelism profile and average
parallelism
8
Asymptotic speedup
9

T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta}

T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta}    (response time)

S_\infty = \frac{T(1)}{T(\infty)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} W_i / i} = A in the ideal case
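A minimal Python sketch (an illustration, not part of the slides; the profile and the capacity Δ are hypothetical) computing the total work W, the average parallelism A, and the asymptotic speedup from a discrete parallelism profile {t_i}; in this ideal, overhead-free case S_inf equals A:

DELTA = 1.0                                   # computing capacity of one processor (assumed)
profile = {1: 4.0, 2: 3.0, 4: 2.0, 8: 1.0}    # hypothetical t_i: time spent at DOP = i

W = DELTA * sum(i * t for i, t in profile.items())                    # total work
A = sum(i * t for i, t in profile.items()) / sum(profile.values())    # average parallelism

T1    = W / DELTA              # T(1) = sum W_i / Delta
T_inf = sum(profile.values())  # T(inf) = sum W_i/(i*Delta) = sum t_i
S_inf = T1 / T_inf

print(W, A, S_inf)             # W = 26.0; A == S_inf == 2.6 for this profile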
Performance measures
10

 Consider n processors executing m programs in various modes
 Want to define the mean performance of these multimode computers:
 Arithmetic mean performance
 Geometric mean performance
 Harmonic mean performance
Arithmetic mean performance
11

R_a = \sum_{i=1}^{m} R_i / m    Arithmetic mean execution rate (assumes equal weighting)

R_a^* = \sum_{i=1}^{m} (f_i R_i)    Weighted arithmetic mean execution rate

 Proportional to the sum of the inverses of execution times
Geometric mean performance
12

R_g = \prod_{i=1}^{m} R_i^{1/m}    Geometric mean execution rate

R_g^* = \prod_{i=1}^{m} R_i^{f_i}    Weighted geometric mean execution rate

 Does not summarize the real performance since it does not have the inverse relation with the total time
Harmonic mean performance
13

T_i = 1 / R_i    Mean execution time per instruction for program i

T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i}    Arithmetic mean execution time per instruction
Harmonic mean performance
14

R_h = 1 / T_a = \frac{m}{\sum_{i=1}^{m} (1 / R_i)}    Harmonic mean execution rate

R_h^* = \frac{1}{\sum_{i=1}^{m} (f_i / R_i)}    Weighted harmonic mean execution rate

 Corresponds to the total # of operations divided by the total time (closest to the real performance)
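A minimal Python sketch (an illustration, not from the slides; the rates are hypothetical) comparing the arithmetic, geometric, and harmonic mean execution rates for a set of per-program rates R_i and optional weights f_i:

import math

def arithmetic_mean(rates):
    # R_a = sum(R_i) / m : overweights the fastest programs
    return sum(rates) / len(rates)

def geometric_mean(rates):
    # R_g = (prod R_i)^(1/m) : no inverse relation with total time
    return math.prod(rates) ** (1.0 / len(rates))

def harmonic_mean(rates, weights=None):
    # R_h* = 1 / sum(f_i / R_i) : total operations / total time
    m = len(rates)
    weights = weights or [1.0 / m] * m
    return 1.0 / sum(f / r for f, r in zip(weights, rates))

rates = [10.0, 100.0, 1000.0]    # hypothetical MIPS ratings of 3 programs
print(arithmetic_mean(rates))    # 370.0   (optimistic)
print(geometric_mean(rates))     # ~100.0
print(harmonic_mean(rates))      # ~27.0   (closest to real performance)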
Harmonic Mean Speedup
15

 Ties the various modes of a program to the number of processors used
 Program is in mode i if i processors are used
 Sequential execution time T_1 = 1/R_1 = 1

S = T_1 / T^* = \frac{1}{\sum_{i=1}^{n} (f_i / R_i)}
Harmonic Mean Speedup Performance
16
Amdahl’s Law
17

 Assume R_i = i, w = (α, 0, 0, …, 1−α)
 System is either sequential, with probability α, or fully parallel with probability 1−α

S_n = \frac{n}{1 + (n-1)\alpha}

 Implies S → 1/α as n → ∞
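A small Python sketch (illustrative, not from the slides; n and α are arbitrary) that evaluates the general harmonic-mean speedup for the mode distribution w = (α, 0, …, 0, 1−α) and checks it against Amdahl's closed form n/(1 + (n−1)α):

def harmonic_mean_speedup(f, R):
    # S = 1 / sum(f_i / R_i), with T_1 = 1/R_1 = 1
    return 1.0 / sum(fi / ri for fi, ri in zip(f, R))

def amdahl_speedup(n, alpha):
    # Closed form for w = (alpha, 0, ..., 0, 1 - alpha) and R_i = i
    return n / (1.0 + (n - 1) * alpha)

n, alpha = 1024, 0.05
f = [alpha] + [0.0] * (n - 2) + [1.0 - alpha]   # mode distribution
R = list(range(1, n + 1))                       # R_i = i
print(harmonic_mean_speedup(f, R))   # ~19.6
print(amdahl_speedup(n, alpha))      # same value; bounded by 1/alpha = 20 as n grows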
Speedup Performance
18
System Efficiency
19

 O(n) is the total # of unit operations performed by an n-processor system
 T(n) is execution time in unit time steps
 T(n) < O(n) and T(1) = O(1)

S(n) = T(1) / T(n)

E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)}
Redundancy and Utilization
20

 Redundancy signifies the extent of matching between software and hardware parallelism

R(n) = O(n) / O(1)

 Utilization indicates the percentage of resources kept busy during execution

U(n) = R(n)\,E(n) = \frac{O(n)}{n\,T(n)}
Quality of Parallelism
21

 Directly proportional to the speedup and efficiency and inversely related to the redundancy
 Upper-bounded by the speedup S(n)

Q(n) = \frac{S(n)\,E(n)}{R(n)} = \frac{T^3(1)}{n\,T^2(n)\,O(n)}
Example of Performance
22

 Given O(1) = T(1) = n^3, O(n) = n^3 + n^2 log n, and T(n) = 4n^3 / (n+3)
 S(n) = (n+3)/4
 E(n) = (n+3)/(4n)
 R(n) = (n + log n)/n
 U(n) = (n+3)(n + log n)/(4n^2)
 Q(n) = (n+3)^2 / (16(n + log n))
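A short Python sketch (illustration only; the slides leave the log base unspecified, base 2 is assumed here) that plugs this example workload into the definitions above and prints S, E, R, U, and Q for a given machine size n:

import math

def metrics(n):
    # Example from the slides: O(1) = T(1) = n^3, O(n) = n^3 + n^2*log(n), T(n) = 4n^3/(n+3)
    T1 = n ** 3
    O1 = n ** 3
    On = n ** 3 + n ** 2 * math.log2(n)
    Tn = 4 * n ** 3 / (n + 3)

    S = T1 / Tn     # speedup,     (n+3)/4
    E = S / n       # efficiency,  (n+3)/(4n)
    R = On / O1     # redundancy,  (n + log n)/n
    U = R * E       # utilization, (n+3)(n + log n)/(4n^2)
    Q = S * E / R   # quality,     (n+3)^2 / (16(n + log n))
    return S, E, R, U, Q

print(metrics(64))  # S = 16.75, E ~ 0.26, R ~ 1.09, U ~ 0.29, Q ~ 4.0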
Standard Performance Measures
23

 MIPS and Mflops


 Depends on instruction set and program used

 Dhrystone results
 Measure of integer performance

 Whetstone results
 Measure of floating-point performance

 TPS and KLIPS ratings


 Transaction performance and reasoning power
Parallel Processing Applications
24

 Drug design
 High-speed civil transport
 Ocean modeling
 Ozone depletion research
 Air pollution
 Digital anatomy
Application Models for Parallel Computers
25

 Fixed-load model
 Constant workload

 Fixed-time model
 Demands constant program execution time

 Fixed-memory model
 Limited by the memory bound
Algorithm Characteristics
27

 Deterministic vs. nondeterministic


 Computational granularity
 Parallelism profile
 Communication patterns and synchronization
requirements
 Uniformity of operations
 Memory requirement and data structures
Isoefficiency Concept
28

 Relates workload to machine size n needed to maintain a fixed efficiency

E = \frac{w(s)}{w(s) + h(s, n)}    (w(s): workload, h(s,n): overhead)

 The smaller the power of n, the more scalable the system
Isoefficiency Function
29

 To maintain a constant E, w(s) should grow in proportion to h(s,n)

w(s) = \frac{E}{1 - E}\,h(s, n)

 C = E/(1−E) is constant for a fixed E, giving the isoefficiency function

f_E(n) = C \cdot h(s, n)
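A brief Python sketch (my own illustration; the overhead function is hypothetical) of the isoefficiency idea: given an assumed overhead h(s,n) = n log n, the workload that keeps efficiency at E must grow like C · n log n:

import math

def required_workload(n, E, overhead):
    # w(s) = E/(1-E) * h(s, n) keeps efficiency fixed at E
    C = E / (1.0 - E)
    return C * overhead(n)

overhead = lambda n: n * math.log2(n)   # hypothetical overhead h(s, n) = n log n

for n in (8, 64, 512):
    print(n, required_workload(n, E=0.8, overhead=overhead))
# workload must grow ~ 4 * n log n to hold E = 0.8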
Speedup Performance Laws
30

 Amdahl’s law
 for fixed workload or fixed problem size

 Gustafson’s law
 for scaled problems (problem size increases with increased
machine size)
 Speedup model
 for scaled problems bounded by memory capacity
Amdahl’s Law
31

 As the number of processors increases, the fixed load is distributed to more processors
 Minimal turnaround time is primary goal
 Speedup factor is upper-bounded by a sequential
bottleneck
 Two cases:
 DOP < n
 DOP ≥ n
Fixed Load Speedup Factor
32
 Case 1: DOP ≥ n        Case 2: DOP < n

t_i(n) = \frac{W_i}{i\,\Delta} \left\lceil \frac{i}{n} \right\rceil        t_i(n) = t_i(\infty) = \frac{W_i}{i\,\Delta}

T(n) = \sum_{i=1}^{m} \frac{W_i}{i\,\Delta} \left\lceil \frac{i}{n} \right\rceil

S_n = \frac{T(1)}{T(n)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} \frac{W_i}{i} \left\lceil \frac{i}{n} \right\rceil}
Amdahl’s Law
33
Gustafson’s Law
34

 With Amdahl's Law, the workload cannot scale to match the available computing power as n increases
 Gustafson’s Law fixes the time, allowing the problem
size to increase with higher n
 Not saving time, but increasing accuracy
Fixed-time Speedup
35

 As the machine size increases, the workload increases and a new parallelism profile results
 In general, W_i' > W_i for 2 ≤ i ≤ m' and W_1' = W_1
 Assume T(1) = T'(n)
Gustafson’s Scaled Speedup
36

Fixed-time constraint:  \sum_{i=1}^{m} W_i = \sum_{i=1}^{m'} \frac{W_i'}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)

S_n' = \frac{\sum_{i=1}^{m'} W_i'}{\sum_{i=1}^{m} W_i} = \frac{W_1 + nW_n}{W_1 + W_n}

(the last equality is for the special case of a sequential part W_1 plus a perfectly parallel part W_n, with Q(n) = 0)
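A minimal Python sketch (illustration, not from the slides; the sequential fraction is hypothetical) of Gustafson's fixed-time scaled speedup for the two-part workload W_1 (sequential) and W_n (parallel):

def gustafson_speedup(n, alpha):
    # alpha = W_1 / (W_1 + W_n) is the sequential fraction of the original workload;
    # scaled speedup S'_n = (W_1 + n*W_n) / (W_1 + W_n) = alpha + n*(1 - alpha)
    return alpha + n * (1.0 - alpha)

for n in (16, 256, 1024):
    print(n, gustafson_speedup(n, alpha=0.05))
# grows almost linearly in n, unlike the Amdahl bound of 1/alpha = 20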
Gustafson’s Scaled Speedup
37
Memory Bounded Speedup Model
38

 Idea is to solve the largest problem that fits in the available memory space
 Results in a scaled workload and higher accuracy
 Each node can handle only a small subproblem for
distributed memory
 Using a large # of nodes collectively increases the
memory capacity proportionally
Fixed-Memory Speedup
39

 Let M be the memory requirement and W the computational workload: W = g(M)
 Scaling the memory to nM scales the workload: g^*(nM) = G(n)\,g(M) = G(n)\,W_n

S_n^* = \frac{\sum_{i=1}^{m^*} W_i^*}{\sum_{i=1}^{m^*} \frac{W_i^*}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)} = \frac{W_1 + G(n)\,W_n}{W_1 + G(n)\,W_n / n}
Fixed-Memory Speedup
40

Scaled speedup model using fixed memory


Relating Speedup Models
41

 G(n) reflects the increase in workload as memory increases n times
 G(n) = 1 : fixed problem size (Amdahl's law)
 G(n) = n : workload increases n times when memory is increased n times (Gustafson's law)
 G(n) > n : workload increases faster than the memory requirement
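A short Python sketch (my own illustration; the workload numbers are hypothetical) relating the three models through G(n) for a workload split into a sequential part W_1 and a parallel part W_n:

def scaled_speedup(n, W1, Wn, G):
    # S*_n = (W1 + G(n)*Wn) / (W1 + G(n)*Wn / n)
    # G(n) = 1 -> Amdahl (fixed load); G(n) = n -> Gustafson (fixed time)
    g = G(n)
    return (W1 + g * Wn) / (W1 + g * Wn / n)

n, W1, Wn = 256, 5.0, 95.0
print(scaled_speedup(n, W1, Wn, G=lambda n: 1))         # Amdahl:    ~18.6
print(scaled_speedup(n, W1, Wn, G=lambda n: n))         # Gustafson: ~243.3
print(scaled_speedup(n, W1, Wn, G=lambda n: n ** 1.5))  # G(n) > n, memory-bound: larger still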
Scalability Metrics
42

 Machine size (n) : # of processors


 Clock rate (f) : determines basic m/c cycle
 Problem size (s) : amount of computational
workload. Directly proportional to T(s,1).
 CPU time (T(s,n)) : actual CPU time for execution
 I/O demand (d) : demand in moving the program,
data, and results for a given run
Scalability Metrics
43

 Memory capacity (m) : max # of memory words


demanded
 Communication overhead (h(s,n)) : amount of time
for interprocessor communication,
synchronization, etc.
 Computer cost (c) : total cost of h/w and s/w
resources required
 Programming overhead (p) : development
overhead associated with an application program
Speedup and Efficiency
44

 The problem size s is the independent parameter

S(s, n) = \frac{T(s, 1)}{T(s, n) + h(s, n)}

E(s, n) = \frac{S(s, n)}{n}
Scalable Systems
45

 Ideally, if E(s,n) = 1 for all algorithms and any s and n, the system is scalable
 Practically, consider the scalability of a machine relative to an ideal machine:

\Phi(s, n) = \frac{S(s, n)}{S_I(s, n)} = \frac{T_I(s, n)}{T(s, n)}
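A closing Python sketch (illustrative; all timing numbers are hypothetical) computing S(s,n), E(s,n), and the scalability Φ(s,n) of a real machine against an ideal, overhead-free machine:

def speedup(T1, Tn, h):
    # S(s,n) = T(s,1) / (T(s,n) + h(s,n))
    return T1 / (Tn + h)

def efficiency(S, n):
    return S / n

def scalability(T_ideal, T_real_total):
    # Phi(s,n) = S(s,n) / S_I(s,n) = T_I(s,n) / T(s,n), with T the total real time
    return T_ideal / T_real_total

T1, n = 1000.0, 32
T_real, h = 40.0, 10.0   # measured parallel time and communication overhead (hypothetical)
T_ideal = 35.0           # time on an ideal, overhead-free machine (hypothetical)

S = speedup(T1, T_real, h)
print(S, efficiency(S, n), scalability(T_ideal, T_real + h))
# S = 20.0, E = 0.625, Phi = 0.7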