
Computer Architecture (ENGI-5231)

Ehsan Atoofian
Electrical Engineering Department
Lakehead University
Chip Multiprocessor
 Instead of a single processor, several processors on the same die
 Power wall: each core has a simple architecture
 Memory wall: overlap computation and memory access (a chip
multiprocessor is less sensitive to memory latency than a single core)
 ILP wall: exploit thread-level parallelism

Figure: a single-processor chip vs. a chip multiprocessor with four
cores (P0, P1, P2, P3) on one die

2
Outline
Guidelines for Design & Analysis
1) Parallelism
2) Locality
3) Common case

Performance evaluation & benchmarks


Cost
Reliability & Availability

3
Guidelines for Design & Analysis
1) Parallelism
- In servers, multiple processors and disks improve
performance

- In processors, pipelining overlaps instruction execution,
reducing total execution time

4
Datapath of MIPS
Figure: the classic five-stage MIPS datapath (Instruction Fetch;
Instruction Decode / Register Fetch; Execute / Address Calculation;
Memory Access; Write Back), with the PC and next-PC logic, instruction
memory, register file, ALU, data memory, sign extension, and the
MUXes between stages.

5
Pipelining in MIPS
Figure: four instructions (add, mult, store, jump) flowing through the
five pipeline stages (Ifetch, Reg, ALU, DMem, Reg) over clock cycles
1 to 8; a new instruction enters the pipeline every cycle, so their
executions overlap.
6
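Not part of the slides: a minimal Python sketch of the cycle counts in
the figure, assuming an ideal five-stage pipeline with no stalls or
hazards.

def unpipelined_cycles(num_instructions: int, stages: int = 5) -> int:
    # Each instruction occupies the whole datapath for `stages` cycles.
    return num_instructions * stages

def pipelined_cycles(num_instructions: int, stages: int = 5) -> int:
    # The first instruction takes `stages` cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    return stages + (num_instructions - 1)

# The 4-instruction sequence from the figure (add, mult, store, jump):
print(unpipelined_cycles(4))  # 20 cycles without pipelining
print(pipelined_cycles(4))    # 8 cycles, matching cycles 1-8 in the figure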
Guidelines for Design & Analysis
1) Parallelism
- In servers, multiple processors and disks improve
performance

- In processors, pipelining overlaps instruction execution,
reducing total execution time

- At the component level, memory banks increase
parallelism

7
Guidelines for Design & Analysis
2) Locality: programs tend to reuse instructions and
data they have used recently
90% of a program's time is spent in 10% of its code

Caches work based on locality

Figure: Processor -> Cache -> Memory

8
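Not part of the slides: a minimal Python sketch of a toy LRU cache
showing how the reuse in a loop turns into a high hit rate; the access
pattern, block size, and cache size are made-up illustration values.

from collections import OrderedDict

def hit_rate(addresses, cache_blocks=4, block_size=4):
    cache = OrderedDict()               # LRU cache of block numbers
    hits = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            hits += 1
            cache.move_to_end(block)    # refresh LRU position
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(addresses)

# A loop sweeping a small array 10 times reuses the same few blocks
# (temporal locality) and walks them in order (spatial locality):
loop = [a for _ in range(10) for a in range(16)]
print(hit_rate(loop))  # 0.975: only the 4 cold misses on the first pass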
Guidelines for Design & Analysis
3) Common case: in a design trade-off, favour the
frequent case over the rare case
e.g., the instruction fetch and decode stages are common
to all instructions, but the multiplier is not; optimize
fetch and decode first

How much performance is gained by optimizing the
common case? Amdahl's Law

9
Amdahl’s Law
Amdahl's law gives the performance improvement of the
whole system achieved by enhancing only a fraction of the
system

Before: T_old = T_old1 + T_old2 (T_old2 is the part that can be enhanced)
After the partial enhancement: T_new1 = T_old1, T_new2 = T_old2 / Speedup_enhanced

Speedup_overall = T_old / T_new
                = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Note that Fraction_enhanced is the fraction of time before the enhancement.
10
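Not part of the slides: a minimal Python sketch of the formula above;
it also previews the two examples on the next slides.

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # fraction_enhanced: fraction of the ORIGINAL time that is enhanced
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

print(amdahl_speedup(0.4, 10))  # ~1.56 (next slide: 10X CPU, 40% computation)
print(amdahl_speedup(0.8, 2))   # ~1.67 (slide 12: 80% parallel, dual core)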


Example
A new CPU is 10X faster
60% of the time is spent waiting on I/O; 40% of the time is spent
on computation
Total speedup?

Speedup_overall = 1 / [(1 - 0.4) + 0.4 / 10] = 1 / 0.64 ≈ 1.56

11
Another Example
80% of a sequential program is parallelizable
Speedup on a dual core relative to a single core?

Speedup_overall = 1 / [(1 - 0.8) + 0.8 / 2] = 1 / 0.6 ≈ 1.67

12
Maximum Speedup
The maximum speedup achievable by improving
Fraction_enhanced (let Speedup_enhanced go to infinity):

Speedup_max = 1 / (1 - Fraction_enhanced)
13
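Not part of the slides: a small Python sketch showing the plateau for
Fraction_enhanced = 0.8, a value reused from the previous example.

def amdahl_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

# No matter how fast the enhanced part gets, speedup approaches 1 / (1 - f):
for s in (2, 10, 100, 1_000_000):
    print(s, round(amdahl_speedup(0.8, s), 3))
# 2 -> 1.667, 10 -> 3.571, 100 -> 4.808, 1000000 -> ~5.0 = 1 / (1 - 0.8)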
Outline
Guidelines for Design & Analysis
1) Parallelism
2) Locality
3) Common case

Performance evaluation & benchmarks


Cost
Reliability & Availability

14
Performance
What do we mean by: computer X is faster than
computer Y?

Execution time or throughput?

A desktop user is interested in execution time

A server administrator is interested in throughput

15
Performance
Assuming we are concerned with execution time

"X is n times faster than Y" means:

n = Execution time_Y / Execution time_X = Performance_X / Performance_Y
16
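Not part of the slides: a tiny Python sketch with made-up execution
times for the same program on X and Y.

time_x, time_y = 2.0, 6.0   # hypothetical seconds to run the same program
n = time_y / time_x         # X is n times faster than Y
print(n)                    # 3.0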
Execution Time

CPU time = Instruction Count × CPI_total × Clock cycle time

CPI_total = CPU Clock Cycles / # of total instructions
          = (∑ IC_i × CPI_i) / # of total instructions
          = ∑ frequency_i × CPI_i, where frequency_i = IC_i / # of total instructions
17
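Not part of the slides: a minimal Python sketch of the two equations
above; the instruction count, mix, and clock used below are made-up.

def cpu_time(instruction_count, cpi, clock_cycle_time):
    # Execution time = IC x CPI x clock cycle time
    return instruction_count * cpi * clock_cycle_time

def average_cpi(mix):
    # mix: list of (frequency_i, CPI_i) pairs whose frequencies sum to 1
    return sum(freq * cpi for freq, cpi in mix)

# A hypothetical 50/50 mix of 1-cycle and 2-cycle instructions:
print(average_cpi([(0.5, 1), (0.5, 2)]))  # 1.5
print(cpu_time(1_000_000, 1.5, 1e-9))     # 0.0015 s at a 1 GHz clock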
Example
 Assume a program with the following instruction mix:

Instruction   Frequency   CPI
ALU           45%         1
Branch        15%         3
Memory        40%         10

 What is the average CPI?

 In order to improve performance, we have the two following
options:
 1) Use a better ALU and reduce the CPI of ALU instructions to 0.3
 2) Use a better but more complex memory system, which reduces
the CPI of memory instructions to 9 but requires a 10% increase in
clock cycle time. Which option provides better performance?
18
Example
 CPI = 0.45×1 + 0.15×3 + 0.40×10 = 4.9

 1) CPI = 0.45×0.3 + 0.15×3 + 0.40×10 = 4.585
 Speedup = 4.9 / 4.585 ≈ 1.069

 2) CPI = 0.45×1 + 0.15×3 + 0.40×9 = 4.5
 Speedup = 4.9 / (4.5 × 1.1) ≈ 0.99 < 1: a slow-down

 Execution time is the REAL measure of computer performance!

 We should take into account the side effects of an optimization
technique
19
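Not part of the slides: a small Python check reproducing the numbers
above.

mix = {"alu": (0.45, 1.0), "branch": (0.15, 3.0), "mem": (0.40, 10.0)}

def avg_cpi(m):
    return sum(freq * cpi for freq, cpi in m.values())

base = avg_cpi(mix)                               # 4.9
opt1 = avg_cpi({**mix, "alu": (0.45, 0.3)})       # 4.585
opt2 = avg_cpi({**mix, "mem": (0.40, 9.0)})       # 4.5

# Speedup = old time / new time; IC is unchanged, so time ~ CPI x cycle time.
print(base / opt1)          # ~1.069: option 1 is a speedup
print(base / (opt2 * 1.1))  # ~0.99 (< 1): option 2 is a slow-down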
Execution Time

 # of instructions: depends on programmer, compiler, and ISA
 CISC vs. RISC?

 CPI (clock cycles per instruction): depends on the architecture of the
processor
 e.g., an architecture with a lower cache miss rate has a lower CPI

 Clock cycle time: depends on organization and VLSI technology
 Pipelining reduces the clock cycle time
 Smaller feature sizes make gates faster
20
Benchmarks
Which type of application should we use to measure performance?

A collection of benchmarks representative of real
applications is called a benchmark suite

21
SPEC Benchmark Suite
Standard Performance Evaluation Corporation (SPEC):
a popular benchmark suite for desktops
Focuses on processor performance
Initially introduced in 1989 (SPEC89)
Fifth generation released in 2006 (SPEC2006)
 12 integer benchmarks, 17 floating-point benchmarks
 A mix of C, C++, and Fortran programs

22
Outline
Guidelines for Design & Analysis
1) Parallelism
2) Locality
3) Common case

Performance evaluation & benchmarks


Cost
Reliability & Availability

23
Cost of Computers
Yield: percentage of manufactured parts that pass testing

Time: cost decreases over time due to the learning curve

Volume: increasing volume speeds movement down the learning curve
and amortizes design cost

Competition: reduces the gap between cost and selling
price

24
Wafer
Figure: a Pentium 4 wafer in 130nm technology

From Howe and Sodini, Microelectronics: An Integrated
Approach, Prentice Hall

25
Die
Pentium 4 Die

26
Cost of Chips
Cost of die = Cost of wafer / (Dies per wafer × Die yield)

Dies per wafer ≈ π × (Wafer diameter / 2)² / Die area
                − π × Wafer diameter / √(2 × Die area)
27
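Not part of the slides: a minimal Python sketch of these relations; the
wafer cost, wafer diameter, die area, and yield below are made-up
illustration values.

import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    r = wafer_diameter_cm / 2
    whole = math.pi * r ** 2 / die_area_cm2                           # area ratio
    edge = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)  # edge loss
    return int(whole - edge)

def cost_of_die(wafer_cost, wafer_diameter_cm, die_area_cm2, die_yield):
    return wafer_cost / (dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield)

print(dies_per_wafer(30, 1.5))           # 416 dies on a 300 mm wafer
print(cost_of_die(5000, 30, 1.5, 0.6))   # ~$20 per die at 60% yield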
Die Yield
Empirical equation:

Die yield = Wafer yield × 1 / (1 + Defects per unit area × Die area)^N

Defects per unit area
 Random manufacturing defects
 0.016-0.057 defects per square cm (2010)

N = process-complexity factor = 11.5-15.5 (40 nm, 2010)

At the architectural level, only the die area is controllable; the other
parameters are dictated by the manufacturing process
28
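Not part of the slides: a minimal Python sketch of the empirical
equation above; 0.03 defects/cm² and N = 13.5 are mid-range values
from the slide, and the wafer yield is assumed to be 100%.

def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

# Yield drops quickly as die area grows:
for area in (0.5, 1.0, 2.0, 4.0):
    print(area, round(die_yield(0.03, area, 13.5), 3))
# 0.5 -> 0.818, 1.0 -> 0.671, 2.0 -> 0.455, 4.0 -> 0.217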
Outline
Guidelines for Design & Analysis
1) Parallelism
2) Locality
3) Common case

Performance evaluation & benchmarks


Performance equation
Cost
Reliability & Availability

29
Reliability & Availability
As feature size shrinks, the failure rate increases

What is the chance that a system fails? Let's formulate
reliability and availability.

30
Reliability & Availability
Systems alternate between two states:
 1) Service accomplishment: service is delivered properly
 2) Service interruption: the system fails

Reliability: a measure of continuous service accomplishment

Timeline: T0 = start time, T1 = system fails, T2 = service is restored
Time to failure: T1 - T0
Time to repair: T2 - T1

31
Reliability & Availability
We are interested in averages:
Mean Time To Failure (MTTF)
 Failure rate: 1 / MTTF
Mean Time To Repair (MTTR)

Module availability: MTTF / (MTTF + MTTR)

32
Example
Calculate the MTTF for 10 disks (1M hour MTTF per disk),
1 disk controller (0.5M hour MTTF), and 1 power
supply (0.2M hour MTTF):

FailureRate = 10 × (1 / 1,000,000) + 1 / 500,000 + 1 / 200,000
            = (10 + 2 + 5) / 1,000,000
            = 17 / 1,000,000 failures per hour
            = 17,000 FIT (failures per billion hours)

MTTF = 1,000,000,000 / 17,000 ≈ 59,000 hours

33
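Not part of the slides: a minimal Python sketch reproducing the
calculation; it assumes independent failures (so failure rates simply
add), and the MTTR used for availability is a made-up value.

def system_mttf(mttf_hours):
    rate = sum(1.0 / m for m in mttf_hours)   # failures per hour
    return 1.0 / rate

components = [1_000_000] * 10 + [500_000, 200_000]  # 10 disks, controller, PSU
print(sum(1.0 / m for m in components) * 1e9)  # 17,000 FIT
print(system_mttf(components))                 # ~58,824 hours (slide rounds to 59,000)

mttr = 24.0  # assumed repair time in hours (not from the slide)
mttf = system_mttf(components)
print(mttf / (mttf + mttr))                    # availability ~0.9996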
Readings
Chapter 1

34
ISA as Interface
Figure: the ISA is the interface between the programmer's view (a
program as instructions: ADD = 01010, SUBTRACT = 01110, AND = 10011,
OR = 10001, COMPARE = 11010, ...) and the computer's view (CPU,
memory, and I/O).

 Lasts through generations (portability)
 Efficient implementation in hardware
35
Questions at the ISA Level
What types of operations should be supported?
Is it enough to have just load, store, and branch
instructions?

What data types should be supported?
Characters, integers, floating-point?

How many operands should be supported?
An add operation with 4 source operands would make
compiler writers happy, but what about hardware complexity?

36
General Format of Instructions

Opcode (size?) | Data (size?)

The size of the opcode field determines how many operations can be
encoded; the size of the data fields determines how many operands
can be specified

37
Classes of ISA
Based on internal storage:
1) Stack
2) Accumulator
3) Register-register
4) Register-memory

Figure: evaluating A = B + C on a stack machine; B and C are pushed
onto the stack and then replaced by their sum

Comparison: how many instructions does A = B + C take in each class?

38
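Not part of the slides: a minimal Python sketch of a toy stack machine
running the figure's A = B + C sequence; the instruction names (push,
add, pop) are illustrative, and the other three ISA classes would need
their own sketches.

def run(program, memory):
    stack = []
    for op, *args in program:
        if op == "push":            # push a memory operand onto the stack
            stack.append(memory[args[0]])
        elif op == "add":           # pop two operands, push their sum
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":           # store the top of the stack to memory
            memory[args[0]] = stack.pop()
    return memory

mem = {"B": 2, "C": 3}
print(run([("push", "B"), ("push", "C"), ("add",), ("pop", "A")], mem))
# {'B': 2, 'C': 3, 'A': 5} -- four instructions for A = B + C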
