Professional Documents
Culture Documents
Lect1 PDF
Lect1 PDF
CS F342
Introduction
memory capacity
processor speed (due to advances in technology
and hardware organization)
Moores Law
P4 Extreme Ed.
178 millions w/ 2MB L3
2,250
42 millions
Exponential growth
Feature Size
Input
secondary (disk,
CD, DVD, )
Datapath Processor
Processor
(CPU)
Control
Control
Datapath
Output
Memory
1001010010110000
0010100101010001
1111011101100110
1001010010110000
1001010010110000
1001010010110000
Abstraction
High Level
Language
main() {
int i,b,c,a[10];
for (i=0; i<10; i++)
a[2] = b + c*i;
}
Compiler
ISA
lw r2, mem[r7]
add r3, r4, r2
st
r3, mem[r8]
Assembler
software
instruction set
hardware
architecture
Application
Operating
System
Compiler
Firmware
I/O system
Instruction Set
Architecture
Technology
Programming
Languages
Applications
Computer
Architecture
Operating
Systems
History
Performance issues
A specific instruction set architecture
Arithmetic and how to build an ALU
Constructing a processor to execute our instructions
Pipelining to improve performance
Memory: caches and virtual memory
I/O
18
Components
Mid Sem Test
:75
Lab (Reg+Test)
:45
Comprehensive
:80
Performance
What do we measure?
Define performance.
Airplane
Boeing 737-100
Boeing 747
BAC/Sud Concorde
Douglas DC-8-50
Passengers
101
470
132
146
Range (mi)
630
4150
4000
8720
Speed (mph)
598
610
1350
544
Defining Performance
Boeing 777
Boeing 747
Boeing 747
BAC/Sud
Concorde
BAC/Sud
Concorde
Douglas
DC-8-50
Douglas DC8-50
0
100
200
300
400
500
Boeing 777
Boeing 777
Boeing 747
Boeing 747
BAC/Sud
Concorde
BAC/Sud
Concorde
Douglas
DC-8-50
Douglas DC8-50
500
1000
4000
6000
8000 10000
Passenger Capacity
2000
1500
Computer Performance:
TIME, TIME, TIME!!!
Individual user
concerns
Throughput:
Systems manager
concerns
Execution Time
Elapsed Time
CPU time
doesn't count waiting for I/O or time spent running other programs
can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
elapsed time = user CPU time + system CPU time + wait time
Our focus: user CPU time (CPU execution time or, simply, execution
time)
time spent executing the lines of code that are in our program
Definition of Performance
Clock Cycles
program program
cycle
tick
tick
cycle
time
Performance Equation I
seconds
cycles
seconds
program
program
cycle
equivalently
CPU execution time
for a program
...
6th
5th
4th
3rd instruction
2nd instruction
time
time
Example
Clock cycles(A)=40x109
6=1.2x 40x109/Clock rate(B)
Clock rate (B)=8GHz
Terminology
Performance Measure
Performance Equation II
CPU execution time
Instruction count average CPI
=
for a program
for a program
seconds
cycles seconds
program program
cycle
CPI Example
by this much
n
Clock Cycles
Instruction Counti
CPI
CPIi
Instruction Count i1
Instruction Count
Relative
frequency
CPI Example
IC in sequence 1
IC in sequence 2
Sequence 1: IC = 5
Clock Cycles
= 21 + 12 + 23
= 10
Avg. CPI = 10/5 =
2.0
Sequence 2: IC = 6
Clock Cycles
= 41 + 12 + 13
=9
Avg. CPI = 9/6 = 1.5
MIPS Example
Performance Summary
Instructions Clock cycles Seconds
CPU Time
Program
Instruction Clock cycle
Performance depends on
Benchmarks
Benchmark suites
SPEC History
Language
What It Is
164.gzip
175.vpr
176.gcc
181.mcf
186.crafty
197.parser
252.eon
253.perlbmk
254.gap
255.vortex
256.bzip2
300.twolf
C
C
C
C
C
C
C++
C
C
C
C
C
Compression
FPGA Circuit Placement and Routing
C Programming Language Compiler
Combinatorial Optimization
Game Playing: Chess
Word Processing
Computer Visualization
PERL Programming Language
Group Theory, Interpreter
Object-oriented Database
Compression
Place and Route Simulator
Language
What It Is
168.wupwise
171.swim
172.mgrid
173.applu
177.mesa
178.galgel
179.art
183.equake
187.facerec
188.ammp
189.lucas
191.fma3d
200.sixtrack
301.apsi
Fortran 77
Fortran 77
Fortran 77
Fortran 77
C
Fortran 90
C
C
Fortran 90
C
Fortran 90
Fortran 90
Fortran 77
Fortran 77
SPEC CPU2006
i1
CPI
Tc (ns)
Exec time
Ref time
SPECratio
2,118
0.75
0.40
637
9,777
15.3
bzip2
Block-sorting compression
2,389
0.85
0.40
817
9,650
11.8
gcc
GNU C Compiler
1,050
1.72
0.40
724
8,050
11.1
mcf
Combinatorial optimization
336
10.00
0.40
1,345
9,120
6.8
go
Go game (AI)
1,658
1.09
0.40
721
10,490
14.6
hmmer
2,783
0.80
0.40
890
9,330
10.5
sjeng
2,176
0.96
0.40
837
12,100
14.5
libquantum
1,623
1.61
0.40
1,047
20,720
19.8
h264avc
Video compression
3,102
0.80
0.40
993
22,130
22.3
omnetpp
587
2.94
0.40
690
6,250
9.1
astar
Games/path finding
1,082
1.79
0.40
773
7,020
9.1
xalancbmk
XML parsing
1,058
2.70
0.40
1,143
6,900
6.0
Name
Description
perl
Geometric mean
11.7
SPEC 95
10
10
SPECfp
SPECint
50
100
150
Clock rate (MHz)
200
250
Pentium
Pentium Pro
50
100
150
200
250
Pentium
Pentium Pro
49
I/O
Network
Graphics
Java
Web server
Transaction processing (databases)
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +( Execution Time Affected / Amount of Improvement )
Time before
Improvement
Time after
Improvement
51
Amdahls Law
Speed-up =
Perfnew / Perfold =Exec_timeold / Exec_timenew =
1
f
(1 f )
P
Told
(1 - f)
Tnew
(1 - f)
f/P
52
Example
"Suppose a program runs in 100 seconds on a machine, with multiply
responsible for 80 seconds of this time. How much do we have to
improve the speed of multiplication if we want the program to run 4
times faster?"
53
Remember
Performance is specific to a particular program/s
Total execution time is a consistent summary of performance
For a given architecture performance increases come from:
increases in clock rate (without adverse CPI affects)
improvements in processor organization that lower CPI
compiler enhancements that lower CPI and/or instruction count
Pitfall: expecting improvement in one aspect of a machines