Lecture2 E5231
Ehsan Atoofian
Electrical Engineering Department
Lakehead University
Chip Multiprocessor
Instead of a single processor, several processors (cores) are placed on the same die
Power wall: each core uses a simple architecture
Memory wall: computation overlaps with memory access (a multiprocessor is less sensitive to memory latency than a single core)
ILP wall: exploit thread-level parallelism instead
[Figure: a single processor next to a chip multiprocessor with four cores, P0 to P3]
Outline
Guidelines for Design & Analysis
1)Parallelism
2)Locality
3)Common case
Guidelines for Design & Analysis
1) Parallelism
- In servers, multiple processors and disks improve performance
Datapath of MIPS
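[Figure: MIPS datapath in five stages: Instruction Fetch; Instruction Decode / Register Fetch; Execute / Address Calculation; Memory Access; Write Back. Labelled components include the PC, the next-PC adder and MUX, instruction memory, register fields (RS2, RD), the immediate sign-extend unit, the ALU with its input MUXes, data memory, and the write-back (WB) data path]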
Pipelining in MIPS
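[Figure: pipeline diagram with time (clock cycles) on the horizontal axis and instruction order on the vertical axis. The instructions add, mult, store, and jump each pass through Ifetch, Reg, ALU, DMem, and Reg, with each instruction starting one cycle after the previous one]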
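The overlap shown in the pipeline figure can be sketched in a few lines. This is a minimal illustration, assuming an ideal five-stage pipeline with no stalls or hazards in which every instruction visits every stage; the function names are hypothetical and not part of the lecture.

```python
# Stage names as labelled in the pipeline figure.
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def ideal_pipeline_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to finish all instructions in an ideal pipeline.

    Assumption: one instruction issues per cycle and nothing stalls.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + num_instructions - 1


def pipeline_diagram(instructions):
    """Return one text row per instruction, shifted by its issue cycle."""
    rows = []
    for issue_cycle, name in enumerate(instructions):
        cells = ["-"] * issue_cycle + STAGES
        rows.append(f"{name:<6}" + " ".join(f"{c:<6}" for c in cells).rstrip())
    return rows


program = ["add", "mult", "store", "jump"]
print("\n".join(pipeline_diagram(program)))
print(ideal_pipeline_cycles(len(program)))  # 8 cycles with pipelining
print(len(program) * len(STAGES))           # 20 cycles if run one at a time
```

Under these assumptions, four instructions finish in 8 cycles instead of 20, which is the parallelism the slide refers to.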
Guidelines for Design & Analysis
2) Locality: programs tend to reuse instructions and data they have used recently
A program typically spends 90% of its execution time in 10% of its code
Guidelines for Design & Analysis
3) Common case: in a design trade-off, favour the frequent case over the rare case
e.g., the instruction fetch and decode stages are used by all instructions but the multiplier is not, so optimize fetch and decode first
Amdahl’s Law
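Amdahl's law gives the performance improvement of the whole system that is achieved by enhancing a fraction of the system:
$$\text{speedup}_{overall} = \frac{T_{old}}{T_{new}} = \frac{1}{(1 - \text{fraction}_{enhanced}) + \dfrac{\text{fraction}_{enhanced}}{\text{speedup}_{enhanced}}}$$
[Figure: the old execution time split into T_old1 and T_old2, with the partial enhancement applied to one part]
Example result: speedup_overall = 1.56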
Another Example
80% of a sequential program is parallelizable
Speedup on a dual core relative to a single core:
$$\text{speedup}_{overall} = \frac{1}{(1 - 0.8) + \dfrac{0.8}{2}} = \frac{1}{0.6} \approx 1.67$$
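The dual-core example can be checked with a short sketch of Amdahl's law. It assumes the parallelizable part speeds up linearly with the number of cores (so speedup_enhanced = 2 here); the function name is hypothetical.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the run time is sped up.

    fraction_enhanced: share of the original time that benefits (0..1).
    speedup_enhanced: speedup of that share alone (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    unchanged = 1.0 - fraction_enhanced
    return 1.0 / (unchanged + fraction_enhanced / speedup_enhanced)


# 80% parallelizable program on a dual core (assumed ideal 2x on that part).
dual_core = amdahl_speedup(0.8, 2)
print(f"{dual_core:.3f}")  # 1.667
```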
Maximum Speedup
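The maximum speedup achievable by improving fraction_enhanced (as speedup_enhanced grows without bound):
$$\text{speedup}_{max} = \frac{1}{1 - \text{fraction}_{enhanced}}$$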
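A small sketch shows how the speedup approaches this bound as more cores are added. It reuses the 80% parallelizable figure from the earlier example and assumes ideal linear scaling of the parallel part; the core counts and function names are illustrative only.

```python
def speedup_with_cores(parallel_fraction, cores):
    """Amdahl speedup, assuming the parallel part scales linearly with cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def max_speedup(parallel_fraction):
    """Upper bound on speedup as the enhancement becomes infinitely fast."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


for cores in (2, 4, 16, 256, 4096):
    print(cores, round(speedup_with_cores(0.8, cores), 3))
print("limit", round(max_speedup(0.8), 3))
```

However many cores are used, the 20% sequential part caps the speedup at 5.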
Performance
What do we mean by: computer X is faster than
computer Y?
Performance
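Assuming we are concerned with execution time, "X is n times faster than Y" means:
$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$$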
Execution Time
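$$CPI_{total} = \frac{\text{CPU clock cycles}}{\text{\# of total instructions}} = \frac{\sum_i IC_i \times CPI_i}{\text{\# of total instructions}} = \sum_i \text{frequency}_i \times CPI_i$$
$$\text{frequency}_i = \frac{IC_i}{\text{\# of total instructions}}$$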
Example
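Assume a program with the following instruction mix:
Instruction   Frequency   CPI
ALU           45%         1
Branch        15%         3
Memory        40%         10
Baseline CPI = 0.45 × 1 + 0.15 × 3 + 0.40 × 10 = 4.9
1) CPI = 4.585, speedup = 4.9 / 4.585 ≈ 1.069
2) CPI = 4.5, speedup = 0.98 (a slow-down, despite the lower CPI)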
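The weighted-CPI formula can be applied to the instruction mix in the table with a short sketch. The speedup line assumes the instruction count and clock cycle time stay the same, so that speedup reduces to a ratio of CPIs; the function name is hypothetical.

```python
def average_cpi(mix):
    """Weighted CPI: sum of frequency_i * CPI_i over instruction classes."""
    return sum(frequency * cpi for frequency, cpi in mix.values())


# Instruction mix from the example table: class -> (frequency, CPI).
mix = {
    "ALU": (0.45, 1),
    "Branch": (0.15, 3),
    "Memory": (0.40, 10),
}

baseline_cpi = average_cpi(mix)
print(round(baseline_cpi, 3))  # 4.9

# Option 1 from the example: CPI drops to 4.585.
# Assumption: same instruction count and clock cycle time.
speedup_option_1 = baseline_cpi / 4.585
print(round(speedup_option_1, 3))  # 1.069
```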
SPEC Benchmark Suite
Standard Performance Evaluation Corporation (SPEC): publishes a popular benchmark suite for desktops
Focuses on processor performance
Initially introduced in 1989 (SPEC89)
Fifth generation released in 2006 (SPEC2006)
12 integer benchmarks, 17 floating-point benchmarks
A mix of C, C++, and Fortran programs
Cost of Computers
Yield: the percentage of manufactured products that pass testing
Wafer
[Figure: Pentium 4 wafer in a 130 nm process]
Die
[Figure: Pentium 4 die]
Cost of Chips
Die Yield
Empirical equation
Reliability & Availability
As feature size shrinks, the failure rate increases
Reliability & Availability
A system alternates between two states:
1) Service accomplishment: service is delivered properly
2) Service interruption: the system has failed
Time to repair: T2 - T1
Reliability & Availability
We are interested in averages:
Mean Time To Failure (MTTF)
Failure rate: 1/MTTF
Mean Time To Repair (MTTR)
[Figure: timeline marking T0, T1, and T2]
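These averages can be turned into numbers with a short sketch. The failure-rate function follows the 1/MTTF definition above. The availability function uses the common textbook definition MTTF / (MTTF + MTTR), which is an assumption here because the extracted slide does not state it. The sample values are hypothetical.

```python
def failure_rate(mttf_hours):
    """Failures per hour, using failure rate = 1 / MTTF."""
    return 1.0 / mttf_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of time in service accomplishment.

    Assumption: the common definition MTTF / (MTTF + MTTR),
    which is not stated on the slide itself.
    """
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical component: 1,000,000-hour MTTF, 24-hour MTTR.
print(failure_rate(1_000_000))
print(availability(1_000_000, 24))
```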
Example
Calculate the MTTF of a system with 10 disks (1M-hour MTTF per disk), 1 disk controller (0.5M-hour MTTF), and 1 power supply (0.2M-hour MTTF):
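One way to work this example is to add up the component failure rates and invert the total. The sketch below assumes that failures are independent, that failure rates are constant, and that the system fails as soon as any single component fails; the function name is hypothetical.

```python
def system_mttf(component_mttfs_hours):
    """MTTF of a system that fails when any one component fails.

    Assumption: independent components with constant failure rates,
    so the system failure rate is the sum of the component rates.
    """
    total_failure_rate = sum(1.0 / mttf for mttf in component_mttfs_hours)
    return 1.0 / total_failure_rate


components = (
    [1_000_000] * 10   # 10 disks, 1M-hour MTTF each
    + [500_000]        # 1 disk controller, 0.5M-hour MTTF
    + [200_000]        # 1 power supply, 0.2M-hour MTTF
)

mttf = system_mttf(components)
print(round(mttf))  # 58824 hours
```

The total failure rate is 10/1M + 1/0.5M + 1/0.2M = 17 failures per million hours, so under these assumptions the system MTTF is roughly 59,000 hours, far below that of any single component.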
Readings
Chapter 1
ISA as Interface
Programmer's View    Computer's View
ADD                  01010
SUBTRACT             01110
AND                  10011
OR                   10001
COMPARE              11010
...                  ...
[Figure: the program (instructions) runs on a computer consisting of CPU, memory, and I/O]
General Format of Instructions
Classes of ISA
Based on internal storage:
1) Stack
2) Accumulator
3) Register-register
4) Register-memory
Comparison: how many instructions does each class need for A = B + C?
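The comparison can be made concrete with the usual textbook instruction sequences for A = B + C. This is only a sketch: the mnemonics and register names are illustrative rather than taken from a specific ISA, B and C are assumed to start in memory, and the small stack-machine interpreter is a hypothetical helper used to check one of the sequences.

```python
# Illustrative sequences for A = B + C in each ISA class (assumed mnemonics).
SEQUENCES = {
    "stack": ["Push B", "Push C", "Add", "Pop A"],
    "accumulator": ["Load B", "Add C", "Store A"],
    "register-register": ["Load R1, B", "Load R2, C", "Add R3, R1, R2", "Store R3, A"],
    "register-memory": ["Load R1, B", "Add R3, R1, C", "Store R3, A"],
}


def run_stack(program, memory):
    """Execute a stack-machine program against a dict used as memory."""
    stack = []
    for instruction in program:
        op, _, operand = instruction.partition(" ")
        if op == "Push":
            stack.append(memory[operand])
        elif op == "Add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "Pop":
            memory[operand] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return memory


for isa_class, program in SEQUENCES.items():
    print(f"{isa_class}: {len(program)} instructions")

memory = run_stack(SEQUENCES["stack"], {"B": 2, "C": 3})
print(memory["A"])  # 5
```

With these sequences the stack and register-register classes need four instructions, while the accumulator and register-memory classes need three.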