Download as pdf or txt
Download as pdf or txt
You are on page 1of 30

COMPUTER ORGANIZATION AND DESIGN RISC-V

2nd ed.
The Hardware/Software Interface

Chapter 1
Computer Abstractions
and Technology
Semiconductor Technology
◼ Silicon: semiconductor
◼ Add materials to transform properties:
◼ Conductors
◼ Insulators
◼ Switch

Chapter 1 — Computer Abstractions and Technology — 2


Logic Chips Design Goals
◼ Small area
◼ High performance
◼ Low power/energy

Chapter 1 — Computer Abstractions and Technology — 3


Manufacturing ICs

◼ Yield: proportion of working dies per wafer

Chapter 1 — Computer Abstractions and Technology — 4


Intel® Core 10th Gen

◼ 300mm wafer, 506 chips, 10nm technology


◼ Each chip is 11.4 x 10.7 mm
Chapter 1 — Computer Abstractions and Technology — 5
Integrated Circuit Cost
Cost per wafer
Cost per die =
Dies per wafer  Yield
Dies per wafer  Wafer area Die area
1
Yield =
(1+ (Defects per area  Die area/2))2

◼ Nonlinear relation to area and defect rate


◼ Wafer cost and area are fixed
◼ Defect rate determined by manufacturing process
◼ Die area determined by architecture and circuit design

Chapter 1 — Computer Abstractions and Technology — 6


§1.6 Performance
Defining Performance
◼ Which airplane has the best performance?

Chapter 1 — Computer Abstractions and Technology — 7


Defining Performance
◼ Create a similar example using
automobiles.

Chapter 1 — Computer Abstractions and Technology — 8


What is Performance?

◼ Response Time and Throughput


◼ Response time
◼ How long it takes to do a task
◼ Throughput
◼ Total work done per unit time
◼ e.g., tasks/transactions/… per hour

◼ How are response time and throughput affected


by
◼ Replacing the processor with a faster version?
◼ Adding more processors?
◼ We’ll focus on response time for now…

Chapter 1 — Computer Abstractions and Technology — 9


Relative Performance
◼ Define Performance = 1/Execution Time
◼ “X is n time faster than Y”
Performance X Performance Y
= Execution time Y Execution time X = n

◼ Example: time taken to run a program


◼ 10s on A, 15s on B
◼ Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
◼ So, A is 1.5 times faster than B
◼ You can also say A is 50% faster than B.

Chapter 1 — Computer Abstractions and Technology — 10


Measuring Execution Time
◼ Elapsed time
◼ Total response time, including all aspects
◼ Processing, I/O, OS overhead, idle time
◼ Determines system performance
◼ CPU time
◼ Time spent processing a given job
◼ Discounts I/O time, other jobs’ shares
◼ Comprises user CPU time and system CPU time
◼ Different programs are affected differently by CPU
and system performance

Chapter 1 — Computer Abstractions and Technology — 11


CPU Clocking
◼ Operation of digital hardware governed by a
constant-rate clock
Clock period

Clock (cycles)

Data transfer
and computation
Update state

◼ Clock period: duration of a clock cycle


◼ e.g., 250ps = 0.25ns = 250×10–12s
◼ Clock frequency (rate): cycles per second
◼ e.g., 4.0GHz = 4000MHz = 4.0×109Hz
Chapter 1 — Computer Abstractions and Technology — 12
CPU Time
CPU Time = CPU Clock Cycles  Clock Cycle Time
CPU Clock Cycles
=
Clock Rate
◼ Performance improved by
◼ Reducing number of clock cycles
◼ Increasing clock rate
◼ Hardware designer must often trade off clock
rate against cycle count

Chapter 1 — Computer Abstractions and Technology — 13


CPU Time Example
◼ Computer A: 2GHz clock, 10s CPU time
◼ Designing Computer B
◼ Aim for 6s CPU time
◼ Can do faster clock, but causes 1.2 × clock cycles
◼ How fast must Computer B clock be?
Clock CyclesB 1.2  Clock CyclesA
Clock RateB = =
CPU Time B 6s
Clock CyclesA = CPU Time A  Clock Rate A
= 10s  2GHz = 20  10 9
1.2  20  10 9 24  10 9
Clock RateB = = = 4GHz
6s 6s
Chapter 1 — Computer Abstractions and Technology — 14
Instruction Count and CPI
Clock Cycles = Instructio n Count  Cycles per Instructio n
CPU Time = Instructio n Count  CPI  Clock Cycle Time
Instructio n Count  CPI
=
Clock Rate
◼ Instruction Count for a program
◼ Determined by program, ISA and compiler
◼ Average cycles per instruction
◼ Determined by CPU hardware
◼ If different instructions have different CPI
◼ Average CPI affected by instruction mix

Chapter 1 — Computer Abstractions and Technology — 15


CPI Example
◼ Computer A: Cycle Time = 250ps, CPI = 2.0
◼ Computer B: Cycle Time = 500ps, CPI = 1.2
◼ Same ISA
◼ Which is faster, and by how much?
CPU Time = Instruction Count  CPI  Cycle Time
A A A
= I  2.0  250ps = I  500ps A is faster…
CPU Time = Instruction Count  CPI  Cycle Time
B B B
= I  1.2  500ps = I  600ps

B = I  600ps = 1.2
CPU Time
…by this much
CPU Time I  500ps
A
Chapter 1 — Computer Abstractions and Technology — 16
CPI in More Detail
◼ If different instruction classes take different
numbers of cycles
n
Clock Cycles =  (CPIi  Instruction Counti )
i=1

◼ Weighted average CPI


Clock Cycles n
 Instruction Counti 
CPI = =   CPIi  
Instruction Count i=1  Instruction Count 

Relative frequency

Chapter 1 — Computer Abstractions and Technology — 17


CPI Example
◼ Alternative compiled code sequences using
instructions in classes A, B, C
Class A B C
CPI for class 1 2 3
IC in sequence 1 2 1 2
IC in sequence 2 4 1 1

◼ Sequence 1: IC = 5 ◼ Sequence 2: IC = 6
◼ Clock Cycles ◼ Clock Cycles
= 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3
= 10 =9
◼ Avg. CPI = 10/5 = 2.0 ◼ Avg. CPI = 9/6 = 1.5
Chapter 1 — Computer Abstractions and Technology — 18
Performance Summary
The BIG Picture

Instructions Clock cycles Seconds


CPU Time =  
Program Instruction Clock cycle

◼ Performance depends on
◼ Algorithm: affects IC, possibly CPI
◼ Programming language: affects IC, CPI
◼ Compiler: affects IC, CPI
◼ Instruction set architecture: affects IC, CPI, Tc

Chapter 1 — Computer Abstractions and Technology — 19


§1.7 The Power Wall
Power Trends

◼ In CMOS IC technology
Power = Capacitive load  Voltage 2  Frequency

×30 5V → 1V ×1000

Chapter 1 — Computer Abstractions and Technology — 20


Reducing Power
◼ Suppose a new CPU has
◼ 85% of capacitive load of old CPU
◼ 15% voltage and 15% frequency reduction
Pnew Cold  0.85  (Vold  0.85)2  Fold  0.85
= = 0.85 4
= 0.52
Cold  Vold  Fold
2
Pold

◼ The power wall


◼ We can’t reduce voltage further
◼ We can’t remove more heat
◼ How else can we improve performance?
Chapter 1 — Computer Abstractions and Technology — 21
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance

Constrained by power, instruction-level parallelism,


memory latency

Chapter 1 — Computer Abstractions and Technology — 22


SPEC CPU Benchmark
◼ Programs used to measure performance
◼ Supposedly typical of actual workload
◼ Standard Performance Evaluation Corp (SPEC)
◼ Develops benchmarks for CPU, I/O, Web, …
◼ SPEC CPU2006
◼ Elapsed time to execute a selection of programs
◼ Negligible I/O, so focuses on CPU performance
◼ Normalize relative to reference machine
◼ Summarize as geometric mean of performance ratios
◼ CINT2006 (integer) and CFP2006 (floating-point)

n
n
 Execution time ratio
i=1
i

Chapter 1 — Computer Abstractions and Technology — 23


SPECspeed 2017 Integer benchmarks on a
1.8 GHz Intel Xeon E5-2650L

Chapter 1 — Computer Abstractions and Technology — 24


SPEC Power Benchmark
◼ Power consumption of server at different
workload levels
◼ Performance: ssj_ops/sec
◼ Power: Watts (Joules/sec)

 10   10 
Overall ssj_ops per Watt =   ssj_opsi    poweri 
 i =0   i =0 

Chapter 1 — Computer Abstractions and Technology — 25


SPECpower_ssj2008 for Xeon E5-2650L

Chapter 1 — Computer Abstractions and Technology — 26


§1.11 Fallacies and Pitfalls
Pitfall: Amdahl’s Law
◼ Improving an aspect of a computer and
expecting a proportional improvement in
overall performance
Taffected
Timproved = + Tunaffected
improvemen t factor
◼ Example: multiply accounts for 80s/100s
◼ How much improvement in multiply performance to
get 5× overall?
80 ◼ Can’t be done!
20 = + 20
n
◼ Corollary: make the common case fast
Chapter 1 — Computer Abstractions and Technology — 27
Fallacy: Low Power at Idle
◼ Look back at i7 power benchmark
◼ At 100% load: 258W
◼ At 50% load: 170W (66%)
◼ At 10% load: 121W (47%)
◼ Google data center
◼ Mostly operates at 10% – 50% load
◼ At 100% load less than 1% of the time
◼ Consider designing processors to make
power proportional to load

Chapter 1 — Computer Abstractions and Technology — 28


Pitfall: MIPS as a Performance Metric
◼ MIPS: Millions of Instructions Per Second
◼ Doesn’t account for
◼ Differences in ISAs between computers
◼ Differences in complexity between instructions

Instruction count
MIPS =
Execution time  10 6
Instruction count Clock rate
= =
Instruction count  CPI CPI  10 6
 10 6

Clock rate
◼ CPI varies between programs on a given CPU
Chapter 1 — Computer Abstractions and Technology — 29
§1.12 Concluding Remarks
Concluding Remarks
◼ Cost/performance is improving
◼ Due to underlying technology development
◼ Hierarchical layers of abstraction
◼ In both hardware and software
◼ Instruction set architecture
◼ The hardware/software interface
◼ Execution time: the best performance
measure
◼ Power is a limiting factor
◼ Use parallelism to improve performance
Chapter 1 — Computer Abstractions and Technology — 30

You might also like