ch5 3
Computers as Components 3e
© 2012 Marilyn Wolf
Program-level performance analysis
Need to understand performance in detail:
Real-time behavior, not just typical.
On complex platforms.
Program performance is not the same as CPU performance:
Pipeline, cache are windows into program.
We must analyze the entire program.
Complexities of program performance
How to measure program performance
Program performance metrics
Data-dependent paths in an if statement
if (a || b) { /* T1 */
    if ( c ) /* T2 */
a b c path
0 0 0 T1=F, T3=F: no assignments
Paths in a loop
[Flowchart: loop test i=N; on N, execute f = f + c[i] * x[i] and i=i+1, then repeat the test; on Y, exit the loop.]
Instruction timing
Measurement-driven performance analysis
Feeding the program
Trace-driven measurement
Trace-driven:
Instrument the program.
Save information about the path.
Requires modifying the program.
Trace files are large.
Widely used for cache analysis.
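As a minimal sketch of what "instrument the program" can mean, the code below adds a recording call to each path of a small function and keeps the block IDs in memory. The names trace_record, trace_buf, TRACE_MAX, and clip are illustrative assumptions, not taken from the slides; a real tool would typically stream the trace to a file, which is why trace files grow large.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_MAX 1024            /* hypothetical trace capacity */

int trace_buf[TRACE_MAX];         /* recorded basic-block IDs */
size_t trace_len = 0;             /* number of valid entries in trace_buf */

/* Append one block ID to the trace; entries beyond capacity are dropped. */
void trace_record(int block_id)
{
    if (trace_len < TRACE_MAX)
        trace_buf[trace_len++] = block_id;
}

/* Instrumented version of a simple clipping function (hypothetical example). */
int clip(int v, int lo, int hi)
{
    assert(lo <= hi);
    trace_record(0);              /* function entry */
    if (v < lo) {
        trace_record(1);          /* low-clip path */
        return lo;
    }
    if (v > hi) {
        trace_record(2);          /* high-clip path */
        return hi;
    }
    trace_record(3);              /* pass-through path */
    return v;
}
```

After a run, the sequence in trace_buf identifies which path each call took, at the cost of modifying the program and storing one entry per recorded block.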
Physical measurement
CPU simulation
SimpleScalar FIR filter simulation
int x[N] = {8, 17, … };
int c[N] = {1, 2, … };
main() {
    int i, k, f;
    for (k=0; k<COUNT; k++)
        for (i=0; i<N; i++)
            f += c[i]*x[i];
}

COUNT    total sim cycles    sim cycles per filter execution
100      25854               259
1,000    155759              156
10,000   1451840             145
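The per-execution column is the total simulated cycle count divided by COUNT, rounded to the nearest integer. A small sketch of that arithmetic, using a hypothetical helper name (cycles_per_execution is not from the slides):

```c
#include <assert.h>

/* Average simulated cycles per filter execution, rounded to nearest. */
unsigned long cycles_per_execution(unsigned long total_cycles,
                                   unsigned long count)
{
    assert(count > 0);
    return (total_cycles + count / 2) / count;
}
```

Applied to the three measurements, this reproduces 259, 156, and 145. The falling per-execution cost is consistent with one-time overhead being spread over more runs, though the slides themselves do not state the cause.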
Performance optimization motivation
Loop optimizations
Code motion
[Flowchart: loop with test i<N*M, redrawn with test i<X; N and Y exits from the test; loop body z[i] = a[i] + b[i]; i = i+1;]
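The change from the test i<N*M to the test i<X can be sketched in C as follows. This is an illustration under assumptions: the function names and the size_t parameters are hypothetical, and X is taken to be N*M computed once before the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Before: n*m appears in the loop test, so it may be evaluated every iteration. */
void add_arrays(const int *a, const int *b, int *z, size_t n, size_t m)
{
    assert(a && b && z);
    for (size_t i = 0; i < n * m; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the loop-invariant product is computed once. */
void add_arrays_hoisted(const int *a, const int *b, int *z, size_t n, size_t m)
{
    assert(a && b && z);
    const size_t x = n * m;       /* hoisted out of the loop */
    for (size_t i = 0; i < x; i++)
        z[i] = a[i] + b[i];
}
```

Both versions produce the same result; optimizing compilers often perform this hoisting automatically, so the manual form matters mainly when optimization is off or the invariant is not obvious to the compiler.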
Induction variable elimination
Array conflicts in cache
[Figure: array placement in main memory and cache; surviving labels: a[0,0], addresses 1024 and 4099.]
Performance optimization hints
Energy/power optimization
Measuring energy consumption
while (TRUE)
    a();
Sources of energy consumption
Cache sweet spot
Optimizing for energy
First-order optimization:
high performance = low energy.
Not many instructions trade speed for energy.
Optimizing for energy, cont’d.
General rules:
Don’t use function calls.
Keep loop body small to enable local repeat (only forward branches).
Use unsigned integer for loop counter.
Use <= to test loop counter.
Make use of compiler: global optimization, software pipelining.
Single-instruction repeat loop example
STM #4000h,AR2      ; load pointer to source
STM #100h,AR3       ; load pointer to destination
RPT #(1024-1)
MVDD *AR2+,*AR3+    ; move
Optimizing for program size
Goal:
reduce hardware cost of memory;
reduce power consumption of memory units.
Two opportunities:
data;
instructions.
Data size minimization
Reducing code size
Program validation and testing
Clear-box testing
Controlling and observing programs
firout = 0.0;
for (j=curr, k=0; j<N; j++, k++)
    firout += buff[j] * c[k];
for (j=0; j<curr; j++, k++)
    firout += buff[j] * c[k];
if (firout > 100.0) firout = 100.0;
if (firout < -100.0) firout = -100.0;

Controllability:
Must fill circular buffer with desired N values.
Other code governs how we access the buffer.
Observability:
Want to examine firout before limit testing.
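One way to obtain both properties is to wrap the filter in a small test harness. The sketch below is an assumption-laden illustration: fir_set_state, fir_output, the filter length of 4, and the all-ones coefficients are hypothetical choices, not part of the slides. The first function gives controllability by loading the buffer and start index directly; the second gives observability by reporting firout before the limits are applied.

```c
#include <assert.h>

#define N 4                          /* hypothetical filter length */

double buff[N];                      /* circular buffer of samples */
double c[N] = {1.0, 1.0, 1.0, 1.0};  /* hypothetical coefficients */
int curr = 0;                        /* start index into the circular buffer */

/* Controllability: set the buffer contents and start index directly. */
void fir_set_state(const double *samples, int start)
{
    assert(start >= 0 && start < N);
    for (int j = 0; j < N; j++)
        buff[j] = samples[j];
    curr = start;
}

/* Observability: *raw (if non-NULL) receives firout before limiting.
   The return value is the limited output. */
double fir_output(double *raw)
{
    double firout = 0.0;
    int j, k;
    for (j = curr, k = 0; j < N; j++, k++)
        firout += buff[j] * c[k];
    for (j = 0; j < curr; j++, k++)
        firout += buff[j] * c[k];
    if (raw)
        *raw = firout;
    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;
    return firout;
}
```

With samples {50, 60, 70, 80} and a start index of 2, the unlimited sum is 260 while the returned value is clamped to 100, so a test can check the accumulation and the limiting separately.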
Execution paths and testing
Choosing the paths to test
Possible criteria:
Execute every statement at least once.
Execute every branch direction at least once.
Equivalent for structured programs.
Not true for gotos.
[Figure label: not covered]
Basis paths
Approximate CDFG with undirected graph.
Undirected graphs have basis paths:
All paths are linear combinations of basis paths.
Cyclomatic complexity
Cyclomatic complexity is a bound on the size of basis sets:
e = # edges
n = # nodes
p = number of graph components
M = e - n + 2p.
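The formula is simple enough to check by hand on small graphs. As a sketch, the helper below (cyclomatic_complexity is a hypothetical name) evaluates M from the three counts.

```c
#include <assert.h>

/* Cyclomatic complexity M = e - n + 2p for a graph with
   e edges, n nodes, and p connected components. */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    assert(edges >= 0 && nodes > 0 && components > 0);
    return edges - nodes + 2 * components;
}
```

For example, an if-then-else drawn as four nodes (test, then-block, else-block, join) and four edges in one component gives M = 4 - 4 + 2 = 2, matching its two independent paths. A straight-line sequence of three nodes and two edges gives M = 1.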
Branch testing
Branch testing example
Correct:
if (a || (b >= c)) { printf("OK\n"); }
Incorrect:
if (a && (b >= c)) { printf("OK\n"); }
Test:
a = F
(b >= c) = T
Example:
Correct: [0 || (3 >= 2)] = T
Incorrect: [0 && (3 >= 2)] = F
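The same comparison can be written as a small check. In this sketch the function names are hypothetical; the two conditions and the test vector (a false, b = 3, c = 2) come from the example above. A vector exposes the bug only when the correct and incorrect conditions disagree.

```c
#include <assert.h>
#include <stdbool.h>

/* Intended condition from the example: a || (b >= c). */
bool cond_correct(bool a, int b, int c)
{
    return a || (b >= c);
}

/* Faulty variant with && in place of ||. */
bool cond_incorrect(bool a, int b, int c)
{
    return a && (b >= c);
}

/* A test vector detects the bug when the two versions disagree. */
bool vector_detects_bug(bool a, int b, int c)
{
    return cond_correct(a, b, c) != cond_incorrect(a, b, c);
}

/* Self-check using the vector from the example. */
void branch_test_demo(void)
{
    assert(vector_detects_bug(false, 3, 2));
}
```

By contrast, a vector with a true and b >= c makes both conditions true, so it exercises the branch without revealing the error.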
Another branch testing example
Domain testing
Def-use pairs
Variable def-use:
Def when value is assigned (defined).
Use when used on right-hand side.
Exercise each def-use pair.
Requires testing correct path.
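As an illustration, the hypothetical function below has two definitions of y that both reach a single use. Exercising each def-use pair needs one test per path, which is why the right path has to be selected.

```c
#include <assert.h>

/* Two defs of y reach the single use in the return statement. */
int scale(int x, int use_double)
{
    int y;
    assert(use_double == 0 || use_double == 1);
    if (use_double)
        y = 2 * x;      /* def 1 of y */
    else
        y = x;          /* def 2 of y */
    return y + 1;       /* use of y */
}
```

Calling scale(3, 1) exercises the pair (def 1, use) and scale(3, 0) exercises the pair (def 2, use); a test suite with only one of the two calls leaves a def-use pair untested.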
Loop testing
Black-box test vectors
Random tests.
May weight distribution based on software specification.
Regression tests.
Tests of previous versions, bugs, etc.
May be clear-box tests of previous versions.
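A minimal sketch of random black-box testing follows, under stated assumptions: the function under test (limit), its specification check, and the input range of -200 to 200 are all hypothetical. The check uses only the specified behavior, not the code's structure, and the fixed seed makes a run repeatable so the same vectors can be kept as a regression test.

```c
#include <assert.h>
#include <stdlib.h>

/* Function under test (hypothetical): limit v to the range [-100, 100]. */
int limit(int v)
{
    if (v > 100) return 100;
    if (v < -100) return -100;
    return v;
}

/* Specification check: in-range inputs pass through, others saturate. */
int limit_meets_spec(int v, int out)
{
    if (out < -100 || out > 100) return 0;
    if (v >= -100 && v <= 100) return out == v;
    return out == (v > 100 ? 100 : -100);
}

/* Run count random vectors drawn from [-200, 200]; return the failure count. */
int run_random_tests(unsigned seed, int count)
{
    int failures = 0;
    assert(count >= 0);
    srand(seed);                       /* fixed seed gives a repeatable run */
    for (int t = 0; t < count; t++) {
        int v = rand() % 401 - 200;    /* range straddles both limits */
        if (!limit_meets_spec(v, limit(v)))
            failures++;
    }
    return failures;
}
```

Choosing the input range to straddle both limits is a simple form of weighting the distribution toward the values the specification singles out.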
How much testing is enough?