Download as pdf or txt
Download as pdf or txt
You are on page 1of 13

§1.

1 Introduction
COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
ARM
Edition The Computer Revolution
n Progress in computer technology
Underpinned by Moore’s Law
Chapter 1
n

n Makes novel applications feasible


n Computers in automobiles
Computer Abstractions n Cell phones
and Technology
n Human genome project
n World Wide Web
n Search Engines
n Computers are pervasive

Chapter 1 — Computer Abstractions and Technology — 2

Classes of Computers Classes of Computers


n Personal computers n Supercomputers
n General purpose, variety of software n High-end scientific and engineering
n Subject to cost/performance tradeoff calculations
n Highest capability but represent a small
fraction of the overall computer market
n Server computers
n Network based
n High capacity, performance, reliability n Embedded computers
n Range from small servers to building sized n Hidden as components of systems
n Stringent power/performance/cost constraints

Chapter 1 — Computer Abstractions and Technology — 3 Chapter 1 — Computer Abstractions and Technology — 4
The PostPC Era The PostPC Era
n Personal Mobile Device (PMD)
n Battery operated
n Connects to the Internet
n Hundreds of dollars
n Smart phones, tablets, electronic glasses
n Cloud computing
n Warehouse Scale Computers (WSC)
n Software as a Service (SaaS)
n Portion of software run on a PMD and a
portion run in the Cloud
n Amazon and Google
Chapter 1 — Computer Abstractions and Technology — 5 Chapter 1 — Computer Abstractions and Technology — 6

What You Will Learn Understanding Performance


n How programs are translated into the n Algorithm
machine language n Determines number of operations executed

n And how the hardware executes them n Programming language, compiler, architecture
n Determine number of machine instructions executed
n The hardware/software interface per operation
n What determines program performance n Processor and memory system
n And how it can be improved n Determine how fast instructions are executed
n How hardware designers improve n I/O system (including OS)
performance n Determines how fast I/O operations are executed

n What is parallel processing


Chapter 1 — Computer Abstractions and Technology — 7 Chapter 1 — Computer Abstractions and Technology — 8
§1.2 Eight Great Ideas in Computer Architecture

§1.3 Below Your Program


Eight Great Ideas Below Your Program
n Design for Moore’s Law n Application software
n Written in high-level language
n Use abstraction to simplify design
n System software
n Make the common case fast n Compiler: translates HLL code to
machine code
n Performance via parallelism
n Operating System: service code
n Performance via pipelining n Handling input/output
n Managing memory and storage
n Performance via prediction n Scheduling tasks & sharing resources
n Hierarchy of memories n Hardware
n Processor, memory, I/O controllers
n Dependability via redundancy

Chapter 1 — Computer Abstractions and Technology — 9 Chapter 1 — Computer Abstractions and Technology — 10

§1.4 Under the Covers


Levels of Program Code Components of a Computer
n High-level language The BIG Picture n Same components for
n Level of abstraction closer all kinds of computer
to problem domain
n Desktop, server,
n Provides for productivity embedded
and portability
n Assembly language n Input/output includes
n Textual representation of n User-interface devices
instructions n Display, keyboard, mouse

n Hardware representation n Storage devices


Hard disk, CD/DVD, flash
n Binary digits (bits) n

n Encoded instructions and n Network adapters


data n For communicating with
other computers

Chapter 1 — Computer Abstractions and Technology — 11 Chapter 1 — Computer Abstractions and Technology — 12
Touchscreen Through the Looking Glass
n PostPC device n LCD screen: picture elements (pixels)
n Supersedes keyboard n Mirrors content of frame buffer memory
and mouse
n Resistive and
Capacitive types
n Most tablets, smart
phones use capacitive
n Capacitive allows
multiple touches
simultaneously

Chapter 1 — Computer Abstractions and Technology — 13 Chapter 1 — Computer Abstractions and Technology — 14

Opening the Box Inside the Processor (CPU)


Capacitive multitouch LCD screen n Datapath: performs operations on data
3.8 V, 25 Watt-hour battery
n Control: sequences datapath, memory, ...
Computer board n Cache memory
n Small fast SRAM memory for immediate
access to data

Chapter 1 — Computer Abstractions and Technology — 15 Chapter 1 — Computer Abstractions and Technology — 16
Inside the Processor Abstractions
n Apple A5 The BIG Picture

n Abstraction helps us deal with complexity


n Hide lower-level detail
n Instruction set architecture (ISA)
n The hardware/software interface
n Application binary interface
n The ISA plus system software interface
n Implementation
n The details underlying and interface
Chapter 1 — Computer Abstractions and Technology — 17 Chapter 1 — Computer Abstractions and Technology — 18

A Safe Place for Data Networks


n Volatile main memory n Communication, resource sharing,
Loses instructions and data when power off
nonlocal access
n

n Non-volatile secondary memory


n Magnetic disk n Local area network (LAN): Ethernet
Flash memory
Wide area network (WAN): the Internet
n
n
n Optical disk (CDROM, DVD)
n Wireless network: WiFi, Bluetooth

Chapter 1 — Computer Abstractions and Technology — 19 Chapter 1 — Computer Abstractions and Technology — 20
§1.5 Technologies for Building Processors and Memory
Technology Trends Semiconductor Technology
n Electronics n Silicon: semiconductor
technology
continues to evolve n Add materials to transform properties:
n Increased capacity n Conductors
and performance
n Reduced cost n Insulators
DRAM capacity
n Switch
Year Technology Relative performance/cost
1951 Vacuum tube 1
1965 Transistor 35
1975 Integrated circuit (IC) 900
1995 Very large scale IC (VLSI) 2,400,000
2013 Ultra large scale IC 250,000,000,000

Chapter 1 — Computer Abstractions and Technology — 21 Chapter 1 — Computer Abstractions and Technology — 22

Manufacturing ICs Intel Core i7 Wafer

n 300mm wafer, 280 chips, 32nm technology


n Yield: proportion of working dies per wafer
n Each chip is 20.7 x 10.5 mm
Chapter 1 — Computer Abstractions and Technology — 23 Chapter 1 — Computer Abstractions and Technology — 24
§1.6 Performance
Integrated Circuit Cost Defining Performance
n Which airplane has the best performance?
Cost per wafer
Cost per die =
Dies per wafer ´ Yield Boeing 777 Boeing 777

Dies per wafer » Wafer area Die area Boeing 747 Boeing 747

BAC/Sud BAC/Sud
Concorde Concorde

1 Douglas Douglas DC-

Yield = DC-8-50 8-50

(1+ (Defects per area ´ Die area/2))2 0 100 200 300 400 500 0 2000 4000 6000 8000 10000

Passenger Capacity Cruising Range (miles)

n Nonlinear relation to area and defect rate Boeing 777 Boeing 777

n Wafer cost and area are fixed Boeing 747 Boeing 747

BAC/Sud BAC/Sud

n Defect rate determined by manufacturing process Concorde


Douglas
Concorde
Douglas DC-
DC-8-50 8-50

n Die area determined by architecture and circuit design 0 500 1000 1500 0 100000 200000 300000 400000

Cruising Speed (mph) Passengers x mph

Chapter 1 — Computer Abstractions and Technology — 25 Chapter 1 — Computer Abstractions and Technology — 26

Response Time and Throughput Relative Performance


n Response time n Define Performance = 1/Execution Time
How long it takes to do a task
n
n “X is n time faster than Y”
n Throughput
n Total work done per unit time
Performanc e X Performanc e Y
n e.g., tasks/transactions/… per hour = Execution time Y Execution time X = n
n How are response time and throughput affected
by
n Example: time taken to run a program
n Replacing the processor with a faster version? n 10s on A, 15s on B
n Adding more processors? n Execution TimeB / Execution TimeA
n We’ll focus on response time for now… = 15s / 10s = 1.5
n So A is 1.5 times faster than B
Chapter 1 — Computer Abstractions and Technology — 27 Chapter 1 — Computer Abstractions and Technology — 28
Measuring Execution Time CPU Clocking
n Elapsed time n Operation of digital hardware governed by a
n Total response time, including all aspects constant-rate clock
n Processing, I/O, OS overhead, idle time Clock period
n Determines system performance Clock (cycles)

n CPU time Data transfer


and computation
n Time spent processing a given job Update state
n Discounts I/O time, other jobs’ shares
n Comprises user CPU time and system CPU n Clock period: duration of a clock cycle
time n e.g., 250ps = 0.25ns = 250×10–12s
n Different programs are affected differently by
n Clock frequency (rate): cycles per second
CPU and system performance
n e.g., 4.0GHz = 4000MHz = 4.0×109Hz
Chapter 1 — Computer Abstractions and Technology — 29 Chapter 1 — Computer Abstractions and Technology — 30

CPU Time CPU Time Example


n Computer A: 2GHz clock, 10s CPU time
CPU Time = CPU Clock Cycles ´ Clock Cycle Time n Designing Computer B
CPU Clock Cycles n Aim for 6s CPU time
= Can do faster clock, but causes 1.2 × clock cycles
Clock Rate n

n How fast must Computer B clock be?


n Performance improved by Clock CyclesB 1.2 ´ Clock Cycles A
Clock RateB = =
n Reducing number of clock cycles CPU Time B 6s
n Increasing clock rate Clock Cycles A = CPU Time A ´ Clock Rate A
n Hardware designer must often trade off clock
= 10s ´ 2GHz = 20 ´ 10 9
rate against cycle count
1.2 ´ 20 ´ 10 9 24 ´ 10 9
Clock RateB = = = 4GHz
6s 6s
Chapter 1 — Computer Abstractions and Technology — 31 Chapter 1 — Computer Abstractions and Technology — 32
Instruction Count and CPI CPI Example
Clock Cycles = Instruction Count ´ Cycles per Instruction
n Computer A: Cycle Time = 250ps, CPI = 2.0
n Computer B: Cycle Time = 500ps, CPI = 1.2
CPU Time = Instruction Count ´ CPI ´ Clock Cycle Time
n Same ISA
Instruction Count ´ CPI n Which is faster, and by how much?
=
Clock Rate CPU Time = Instruction Count ´ CPI ´ Cycle Time
A A A
n Instruction Count for a program = I ´ 2.0 ´ 250ps = I ´ 500ps A is faster…
n Determined by program, ISA and compiler CPU Time = Instruction Count ´ CPI ´ Cycle Time
B B B
n Average cycles per instruction = I ´ 1.2 ´ 500ps = I ´ 600ps
n Determined by CPU hardware
CPU Time
n If different instructions have different CPI B = I ´ 600ps = 1.2
…by this much
CPU Time I ´ 500ps
n Average CPI affected by instruction mix A
Chapter 1 — Computer Abstractions and Technology — 33 Chapter 1 — Computer Abstractions and Technology — 34

CPI in More Detail CPI Example


n If different instruction classes take different n Alternative compiled code sequences using
instructions in classes A, B, C
numbers of cycles
n Class A B C
Clock Cycles = å (CPIi ´ Instruction Count i ) CPI for class 1 2 3
i=1 IC in sequence 1 2 1 2
IC in sequence 2 4 1 1
n Weighted average CPI
n Sequence 1: IC = 5 n Sequence 2: IC = 6
Clock Cycles n
æ Instruction Count i ö
CPI = = å ç CPIi ´ ÷ n Clock Cycles n Clock Cycles
Instruction Count i=1 è Instruction Count ø
= 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3
= 10 =9
Relative frequency
n Avg. CPI = 10/5 = 2.0 n Avg. CPI = 9/6 = 1.5
Chapter 1 — Computer Abstractions and Technology — 35 Chapter 1 — Computer Abstractions and Technology — 36
§1.7 The Power Wall
Performance Summary Power Trends
The BIG Picture

Instructions Clock cycles Seconds


CPU Time = ´ ´
Program Instruction Clock cycle

n Performance depends on
n Algorithm: affects IC, possibly CPI
n Programming language: affects IC, CPI n In CMOS IC technology
n Compiler: affects IC, CPI Power = Capacitive load ´ Voltage 2 ´ Frequency
n Instruction set architecture: affects IC, CPI, Tc
×30 5V → 1V ×1000

Chapter 1 — Computer Abstractions and Technology — 37 Chapter 1 — Computer Abstractions and Technology — 38

§1.8 The Sea Change: The Switch to Multiprocessors


Reducing Power Uniprocessor Performance
n Suppose a new CPU has
n 85% of capacitive load of old CPU
n 15% voltage and 15% frequency reduction
Pnew Cold ´ 0.85 ´ (Vold ´ 0.85)2 ´ Fold ´ 0.85
= 2
= 0.85 4 = 0.52
Pold Cold ´ Vold ´ Fold

n The power wall


n We can’t reduce voltage further
n We can’t remove more heat
Constrained by power, instruction-level parallelism,
n How else can we improve performance? memory latency

Chapter 1 — Computer Abstractions and Technology — 39 Chapter 1 — Computer Abstractions and Technology — 40
Multiprocessors SPEC CPU Benchmark
n Multicore microprocessors n Programs used to measure performance
n Supposedly typical of actual workload
n More than one processor per chip n Standard Performance Evaluation Corp (SPEC)
n Requires explicitly parallel programming n Develops benchmarks for CPU, I/O, Web, …
n Compare with instruction level parallelism n SPEC CPU2006
n Hardware executes multiple instructions at once n Elapsed time to execute a selection of programs
n Negligible I/O, so focuses on CPU performance
n Hidden from the programmer n Normalize relative to reference machine
n Hard to do n Summarize as geometric mean of performance ratios
n CINT2006 (integer) and CFP2006 (floating-point)
n Programming for performance
n Load balancing n

n Optimizing communication and synchronization n


Õ Execution time ratio
i=1
i

Chapter 1 — Computer Abstractions and Technology — 41 Chapter 1 — Computer Abstractions and Technology — 42

CINT2006 for Intel Core i7 920 SPEC Power Benchmark


n Power consumption of server at different
workload levels
n Performance: ssj_ops/sec
n Power: Watts (Joules/sec)

æ 10 ö æ 10 ö
Overall ssj_ops per Watt = ç å ssj_ops i ÷ ç å poweri ÷
è i=0 ø è i=0 ø

Chapter 1 — Computer Abstractions and Technology — 43 Chapter 1 — Computer Abstractions and Technology — 44
§1.10 Fallacies and Pitfalls
SPECpower_ssj2008 for Xeon X5650 Pitfall: Amdahl’s Law
n Improving an aspect of a computer and
expecting a proportional improvement in
overall performance
Taffected
Timproved = + Tunaffected
improvemen t factor
n Example: multiply accounts for 80s/100s
n How much improvement in multiply performance to
get 5× overall?
80 n Can’t be done!
20 = + 20
n
n Corollary: make the common case fast
Chapter 1 — Computer Abstractions and Technology — 45 Chapter 1 — Computer Abstractions and Technology — 46

Fallacy: Low Power at Idle Pitfall: MIPS as a Performance Metric


n Look back at i7 power benchmark n MIPS: Millions of Instructions Per Second
n At 100% load: 258W n Doesn’t account for
n At 50% load: 170W (66%) n Differences in ISAs between computers
n At 10% load: 121W (47%) n Differences in complexity between instructions

n Google data center MIPS =


Instruction count
n Mostly operates at 10% – 50% load Execution time ´ 10 6

At 100% load less than 1% of the time Instruction count Clock rate
n = =
Instruction count ´ CPI CPI ´ 10 6
n Consider designing processors to make ´ 10 6
Clock rate
power proportional to load
n CPI varies between programs on a given CPU
Chapter 1 — Computer Abstractions and Technology — 47 Chapter 1 — Computer Abstractions and Technology — 48
§1.9 Concluding Remarks
Concluding Remarks
n Cost/performance is improving
n Due to underlying technology development
n Hierarchical layers of abstraction
n In both hardware and software
n Instruction set architecture
n The hardware/software interface
n Execution time: the best performance
measure
n Power is a limiting factor
n Use parallelism to improve performance
Chapter 1 — Computer Abstractions and Technology — 49

You might also like