Professional Documents
Culture Documents
Computer Organization & Architecture: Van-Khoa Pham (PHD.)
Computer Organization & Architecture: Van-Khoa Pham (PHD.)
Computer Organization & Architecture: Van-Khoa Pham (PHD.)
Programming
Language
Function Units
Cycles per second (clock rate)
Transistors Wires Pins
+ Relative Performance
Performance = 1/Execution Time
Performanc e A Performanc eB
Execution time B Execution time A n
an
co di alog
nv git to
er al
CC = 1 / CR
1998, The Computer Language Co.
Performance improved by
o Reducing number of clock cycles
o Increasing clock rate
We are trying to build a computer B, that will run this program in 6 seconds.
If on Computer B, require 1.2 times as many clock cycles as computer A for this
program.
Programming
Language
Compiler
Datapath
Control CPI
Function Units
Transistors Wires Pins Cycle Time
n
Clock Cycles (CPIi Instruction Count i )
i1
average CPI
Clock Cycles n
Instruction Count i
CPI CPIi
Instruction Count i1 Instruction Count
Relative frequency
Clock Cycles Per Instruction: CPI 14
Class A B C
CPI for class 1 2 3
Instruction IC in sequence 1 2 1 2
Count
IC in sequence 2 4 1 1
Sequence 1: IC = 5 Sequence 2: IC = 6
Clock Cycles Clock Cycles
= 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3
= 10 = 9 (Faster)
Avg. CPI = 10/5 = 2.0 Avg. CPI = 9/6 = 1.5
Ic p m k
Instruction set
X X
architecture
Compiler technology X X X
Processor
X X
implementation
Cache and memory
X X
hierarchy
ISA
X X X
Processor
X X
organization
Technology
X
+ CPI and Clock Rate
o Computer A: Cycle Time = 250ps (4 GHz clock rate), CPI = 2.0
o Computer B: Cycle Time = 500ps (2 GHz clock rate), CPI = 1.2
o Same ISA
o Which is faster, and by how much?
B I 600ps 1.2
CPU Time
CPU Time I 500ps …by this much
A
A Simple Example
Op Freq CPIi Freq x CPIi
ALU 50% 1 .5 .5 .5
Load 20% 5 1.0 .4 1.0
Store 10% 3 .3 .3 .3
Branch 20% 2
.4 .4 .2
= 2.2 1.6 2.0
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?
Relative performance is 2.2/1.6 means 37.5% faster
How does this compare with using branch prediction to
shave a cycle off the branch time?
Relative performance is 2.2/2.0 means 10% faster
+ MIPS as a Performance Metric
MIPS
Instruction count (3)
Execution time 10 6
102
Delay increases as the RC product increases
10
As components on the chip decrease in size, the wire
interconnects become thinner, increasing resistance 1
components
Reduce the frequency
Architectural examples of memory access by
incorporating
include: increasingly complex
and efficient cache
structures between
the processor and
main memory