
Principles of Scalable Performance

• Performance measures
• Speedup laws
• Scalability principles
• Scaling up vs. scaling down

Performance metrics and measures

• Parallelism profiles

• Asymptotic speedup factor

• System efficiency, utilization and quality

• Standard performance measures

Parallelism profile in Programs

• The degree of parallelism reflects the extent to which software parallelism matches hardware parallelism

Degree of parallelism

• Execution of a program on a parallel computer may use different numbers of processors at different time periods during the execution cycle

• The number of processors used to execute a program during each period is its degree of parallelism (DOP)

• The DOP is a discrete time function, taking only non-negative integer values
Degree of parallelism

• The parallelism profile is a plot of the DOP as a function of time

• Ideally, it assumes unlimited available resources

• Software tools are available to trace the parallelism profile

Factors affecting parallelism profiles

• Algorithm structure

• Program optimization

• Resource utilization

• Run-time conditions
Degree of parallelism

• The DOP assumes an unbounded number of available processors and other necessary resources

• The maximum DOP may therefore not be achievable on a real computer with limited resources

• When the DOP exceeds the maximum number of available processors, the parallel branches are executed sequentially in chunks

• Parallelism still exists within each chunk, limited by the machine size

• The DOP is also limited by memory and other non-processor resources
Average parallelism variables

• n – number of homogeneous processors

• m – maximum parallelism in a profile

• Δ – computing capacity of a single processor (execution rate only, no overhead)

• DOP = i – i processors are busy during an observation period
Average parallelism

• The total amount of work performed is proportional to the area under the profile curve:

  W = Δ · ∫[t1, t2] DOP(t) dt = Δ · Σ_{i=1..m} i · t_i

• t_i – total amount of time that DOP = i

• t2 − t1 – total elapsed time
Average parallelism

  A = (1 / (t2 − t1)) · ∫[t1, t2] DOP(t) dt

  A = (Σ_{i=1..m} i · t_i) / (Σ_{i=1..m} t_i)
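The discrete form of the average-parallelism formula can be checked with a short sketch. The function name and the (DOP, duration) profile representation below are illustrative, not from the slides:

```python
def average_parallelism(profile):
    """Average parallelism A = (sum of i*t_i) / (sum of t_i).

    `profile` is a list of (dop, duration) pairs: the DOP held and for
    how long it was held, covering the whole interval [t1, t2].
    """
    total_work = sum(dop * t for dop, t in profile)  # area under the DOP curve
    total_time = sum(t for _, t in profile)          # elapsed time t2 - t1
    return total_work / total_time

# DOP = 1 for 2 time units, then DOP = 4 for 3 time units:
# A = (1*2 + 4*3) / (2 + 3) = 14/5 = 2.8
```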
Example: parallelism profile and average
parallelism

Available Parallelism

• The potential parallelism in application programs

• Engineering and scientific codes exhibit a high DOP due to data parallelism

• Ordinary computations offer little parallelism when exploitation is confined within basic block boundaries

• A basic block is a block of instructions with a single entry point and a single exit point

• Compiler optimization and algorithm redesign can increase the available parallelism
Asymptotic speedup

  T(1) = Σ_{i=1..m} t_i(1) = Σ_{i=1..m} W_i / Δ

  T(∞) = Σ_{i=1..m} t_i(∞) = Σ_{i=1..m} W_i / (i · Δ)

  S_∞ = T(1) / T(∞) = (Σ_{i=1..m} W_i) / (Σ_{i=1..m} W_i / i) = A in the ideal case

(T(1) and T(∞) are response times)
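Under the same assumptions (Δ = 1, and W_i is the amount of work executed with DOP = i), the asymptotic speedup can be sketched as follows; the function name and the work vector are illustrative:

```python
def asymptotic_speedup(work):
    """S_inf = T(1) / T(inf), where work[i-1] = W_i (work done at DOP i).

    T(1):   a single processor executes all the work serially.
    T(inf): the work W_i runs on i processors, taking W_i / i time.
    Assumes unit computing capacity (delta = 1).
    """
    t_1 = sum(work)
    t_inf = sum(w_i / i for i, w_i in enumerate(work, start=1))
    return t_1 / t_inf

# W_1 = 4, W_2 = 0, W_3 = 6: T(1) = 10, T(inf) = 4 + 2 = 6, S = 10/6
```

Note that this value also equals the average parallelism A of the profile, as the slide states for the ideal case.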
Performance measures

• Consider n processors executing m programs in various modes with different performance levels

• We want to define the mean performance of these multimode computers:

• Arithmetic mean performance

• Geometric mean performance

• Harmonic mean performance
Arithmetic mean performance

  Ra = (Σ_{i=1..m} R_i) / m     – arithmetic mean execution rate (assumes equal weighting)

  Ra* = Σ_{i=1..m} f_i · R_i    – weighted arithmetic mean execution rate

• Proportional to the sum of the inverses of the execution times
Geometric mean performance

  Rg = Π_{i=1..m} R_i^(1/m)     – geometric mean execution rate

  Rg* = Π_{i=1..m} R_i^(f_i)    – weighted geometric mean execution rate

• Does not summarize the real performance, since it has no inverse relation with the total execution time
Harmonic mean performance

• Mean execution time per instruction for program i:  T_i = 1 / R_i

  Ta = (1/m) · Σ_{i=1..m} T_i = (1/m) · Σ_{i=1..m} 1 / R_i   – arithmetic mean execution time per instruction
Harmonic mean performance

  Rh = 1 / Ta = m / (Σ_{i=1..m} 1 / R_i)    – harmonic mean execution rate

  Rh* = 1 / (Σ_{i=1..m} f_i / R_i)          – weighted harmonic mean execution rate

• Corresponds to the total number of operations divided by the total time (closest to the real performance)
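The three mean rates can be compared directly with Python's standard `statistics` module; the example rates below are illustrative:

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Execution rates (e.g. MIPS) of the same machine on two programs:
rates = [10.0, 40.0]

print(fmean(rates))           # arithmetic mean: 25.0
print(geometric_mean(rates))  # geometric mean:  ~20.0
print(harmonic_mean(rates))   # harmonic mean:   ~16.0

# Only the harmonic mean matches total work / total time: two programs of
# N instructions each take N/10 + N/40 = N/8 time units in total, so the
# overall rate is 2N / (N/8) = 16.
```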
Harmonic Mean Speedup

• Ties the various execution modes of a program to the number of processors used

• The program is in execution mode i if i processors are used

  S = T_1 / T* = 1 / (Σ_{i=1..n} f_i / R_i)

• Sequential execution time T_1 = 1/R_1 = 1
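A minimal sketch of the harmonic mean speedup formula; the function name and the mode distribution are illustrative, and taking R_i = i assumes the ideal case with no overhead:

```python
def harmonic_mean_speedup(f, r):
    """S = 1 / sum(f_i / R_i).

    f[k] is the fraction of operations executed in mode k+1,
    r[k] is the execution rate in that mode (ideally R_i = i).
    """
    return 1.0 / sum(f_i / r_i for f_i, r_i in zip(f, r))

# 40% of operations run sequentially (R_1 = 1), 60% on 2 processors (R_2 = 2):
# S = 1 / (0.4/1 + 0.6/2) = 1/0.7, roughly 1.43
```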
Harmonic Mean Speedup Performance

Amdahl’s Law

• Assume R_i = i and the mode distribution w = (α, 0, 0, …, 0, 1 − α)

• The system is either sequential, with probability α, or fully parallel on n processors, with probability 1 − α

  S_n = n / (1 + (n − 1) · α)

• Implies S_n → 1/α as n → ∞
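Amdahl's law is the harmonic mean speedup specialized to w = (α, 0, …, 0, 1 − α); a quick sketch (the function name is illustrative):

```python
def amdahl_speedup(n, alpha):
    """S_n = n / (1 + (n - 1) * alpha).

    alpha is the probability (fraction of operations) that the program
    runs in purely sequential mode (DOP = 1); the rest runs on all n.
    """
    return n / (1 + (n - 1) * alpha)

# With a 10% sequential fraction, speedup saturates near 1/alpha = 10:
# amdahl_speedup(16, 0.1) = 6.4; amdahl_speedup(1024, 0.1) is about 9.9
```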
Speedup Performance

System Efficiency

• O(n) is the total number of unit operations performed by an n-processor system

• T(n) is the execution time in unit time steps

• In general, T(n) < O(n) and T(1) = O(1)

  S(n) = T(1) / T(n)

  E(n) = S(n) / n = T(1) / (n · T(n))
Redundancy and Utilization

• Redundancy signifies the extent of matching between software and hardware parallelism:

  R(n) = O(n) / O(1)

• Utilization indicates the percentage of resources kept busy during execution:

  U(n) = R(n) · E(n) = O(n) / (n · T(n))
Quality of Parallelism

• Directly proportional to the speedup and efficiency, and inversely related to the redundancy

• Upper-bounded by the speedup S(n)

  Q(n) = S(n) · E(n) / R(n) = T³(1) / (n · T²(n) · O(n))
Example of Performance

• Given O(1) = T(1) = n³, O(n) = n³ + n² · log n, and T(n) = 4n³ / (n + 3):

  S(n) = (n + 3) / 4
  E(n) = (n + 3) / (4n)
  R(n) = (n + log n) / n
  U(n) = (n + 3)(n + log n) / (4n²)
  Q(n) = (n + 3)² / (16(n + log n))
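The closed forms above can be verified numerically from the definitions. The sketch below assumes the natural logarithm for "log n", but any consistent base gives the same check, since the base cancels between O(n) and R(n):

```python
import math

def metrics(n):
    """Compute S, E, R, U, Q from their definitions, using this example's
    O(1) = T(1) = n^3, O(n) = n^3 + n^2*log n, T(n) = 4n^3/(n+3)."""
    t1 = o1 = n**3
    on = n**3 + n**2 * math.log(n)
    tn = 4 * n**3 / (n + 3)
    s = t1 / tn    # speedup
    e = s / n      # efficiency
    r = on / o1    # redundancy
    u = r * e      # utilization
    q = s * e / r  # quality
    return s, e, r, u, q

# For n = 16: S = 19/4, E = 19/64, R = (16 + ln 16)/16, and so on,
# matching the closed forms on the slide.
```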
Standard Performance Measures

• MIPS and Mflops describe the instruction execution rate and the floating-point capability of a parallel computer

• MIPS = f · Ic / (C × 10⁶), where f is the clock rate, Ic the instruction count, and C the total cycle count

• The MIPS rating depends on the instruction set and on processor performance

• The Mflops rating depends on the machine hardware design and on program behavior
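A quick sketch of the MIPS formula; the function name and the example numbers are illustrative:

```python
def mips_rating(clock_hz, instruction_count, total_cycles):
    """MIPS = f * Ic / (C * 10^6).

    Equivalent to Ic / (execution_time * 10^6), since the execution
    time is C / f seconds.
    """
    return clock_hz * instruction_count / (total_cycles * 1e6)

# 100 MHz clock, 50 million instructions, 100 million cycles (CPI = 2):
# execution time is 1 s, so the rating is 50 MIPS
```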
Standard Performance Measures

• Dhrystone results

• A CPU-intensive benchmark

• Consists of 100 high-level-language statements and data types

• Balanced with respect to statement type, data type, and locality of reference, with no operating system calls and no use of library functions or subroutines

• A measure of the integer performance of modern processors
Standard Performance Measures

• Whetstone results

• A Fortran-based synthetic benchmark

• A measure of floating-point performance

• Includes both integer and floating-point operations involving array indexing, subroutine calls, parameter passing, and conditional branching
Standard Performance Measures

• Measured performance depends on the compilers used

• Dhrystone is intended to test the CPU only

• Compiler techniques such as procedure in-lining strongly affect Dhrystone performance

• This sensitivity to compilers is a drawback of these benchmarks
Standard Performance Measures

• TPS and KLIPS ratings

• On-line transaction processing applications demand rapid, interactive processing of a large number of relatively simple transactions

• They are supported by very large databases

• Automated teller machines and airline reservation systems are examples

• Measured as transaction performance
Standard Performance Measures

• The throughput of computers for on-line transaction processing is measured in transactions per second (TPS)

• A transaction may involve database search, query answering, and database update operations

• In AI applications, the measure is KLIPS (kilo logic inferences per second)

• KLIPS indicates the reasoning power of an AI machine
Standard Performance Measures

• Japan's fifth-generation computer system targeted a performance of 400 KLIPS

• At roughly 100 instructions per logic inference, 400 KLIPS corresponds to about 40 MIPS

• Logic inference demands symbolic manipulation