Computer Architecture Slides 1
INTRODUCTION TO
COMPUTER ORGANISATION
Turbo Majumder
turbo@ee.iitd.ac.in
About Instructor and Course
Instructor: Dr. Turbo Majumder
Department of Electrical Engineering
Office: III-335
Email: turbo@ee.iitd.ac.in
Phone: 1073
Course webpage: http://web.iitd.ac.in/~turbo/EEL308_1302.htm
TAs: See course page for full list.
Textbook:
Computer Organization and Design: The Hardware/Software Interface,
ARM Edition, David A. Patterson, John L. Hennessy, Morgan
Kaufmann (Source of most material and figures in the lecture slides)
Class hours: Slot F: Tue, Thu, Fri: 11:00-11:50 am
Tutorial hours: 1:00-1:50 pm
Grading policy
Minor1: 20
Minor2: 20
Major: 30
Class participation
Term paper: 5
Quizzes: 15
Tutorial: 10
Attendance policy: As per Institute rules
Collaboration is good when it is open, honest and given
due credit. Clandestine collaboration invites an F.
Why learn computer architecture?
It is a core course, duh.
Computers (or if you will, microprocessors) are
everywhere. You will probably be designing or using one
in whatever job you do.
To design well, of course.
To use it well (e.g. programming), you need to know what is inside.
Plus, knowing this stuff gives you a geeky edge!
Where are computers used?
What can you hope to learn?
A. How does my computer understand the C-program I
have written?
B. Where do software and hardware interface in a
microprocessor? What does the interface look like?
C. What is performance? How can I characterise it? How
can I improve upon it?
D. Briefly, why do we need multicore processors and
parallel processing?
What impacts program performance?
Algorithm
Programming language, compiler and architecture
Processor and memory design
I/O interface design
We will look at all of these in terms of A, B, C and D
(previous slide).
How does my computer understand my
program?
Applications software
(browser, word processor,
media player)
Systems software (OS)
Computer hardware
(microprocessor)
High-level language: c = a + b;
Compiler
Assembly language: ADD RC, RA, RB
Assembler
Machine language: 0x40af8020
Only binary language,
please!
Instruction set
architecture
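The compile-and-assemble chain above can be sketched with a toy assembler. The opcode table, register numbers and field layout below are made up for illustration. This is not the ARM encoding, and it does not reproduce the 0x40af8020 word shown on the slide.

```python
# Toy assembler: turns one assembly instruction into a 32-bit machine word.
# The opcode table, register numbering and field layout are hypothetical.
OPCODES = {"ADD": 0x01, "SUB": 0x02}
REGISTERS = {"RA": 0, "RB": 1, "RC": 2}


def assemble(line: str) -> int:
    """Encode 'OP RD, RS1, RS2' as opcode | rd | rs1 | rs2 (8 bits each)."""
    mnemonic, operands = line.split(maxsplit=1)
    rd, rs1, rs2 = (REGISTERS[name.strip()] for name in operands.split(","))
    return (OPCODES[mnemonic] << 24) | (rd << 16) | (rs1 << 8) | rs2


word = assemble("ADD RC, RA, RB")
print(f"0x{word:08x}")  # 0x01020001
```

The point is only that the hardware sees a fixed-width binary word. Which bits mean what is defined by the instruction set architecture.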
Basic components of a computer
Processor
Datapath
Control
Memory
Volatile
Non-volatile
I/O
Input
Output
Networking
LAN, WAN, WLAN
Moore's Law
Source: Wikimedia
Commons
Moore's Law again
Rachel Courtland, The Status of Moore's Law: It's Complicated, IEEE Spectrum, 28 Oct 2013 (based on data from
Global Foundries)
Moore's Law: New dimensions?
AMD's Barcelona Architecture
Quad-core, 65 nm process
2007
(Courtesy: AnandTech)
Effect on number of
cores in a
microprocessor
Multicores
More on this later
Performance
Mostly concerned with time performance
Execution time
Performance = 1/(Execution time)
Important for individual applications/tasks
Execution time decreases (performance improves) with faster processors
What is faster?
Higher clock speed?
Greater parallelism?
Computation throughput
Performance = No. of tasks/operations performed per second
Usually from different applications
Measured typically in GFLOPS, TFLOPS, ExaFLOPS
Important for server/cloud applications
Parallelism is key to getting these benefits.
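The two views of performance above can be written as two small functions. This is a minimal sketch; the function names and the numbers are assumptions chosen for illustration.

```python
def performance(execution_time_s: float) -> float:
    """Time performance of a single task: 1 / execution time."""
    return 1.0 / execution_time_s


def throughput(tasks_completed: int, elapsed_s: float) -> float:
    """Computation throughput: tasks (or operations) completed per second."""
    return tasks_completed / elapsed_s


# Hypothetical numbers: one task taking 0.5 s, and a server finishing
# 1200 tasks in 60 s.
print(performance(0.5))        # 2.0
print(throughput(1200, 60.0))  # 20.0
```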
Performance: Deep Dive
Relative performance:
Perf(X)/Perf(Y) = ExTime(Y)/ExTime(X)
Total execution time
Wall clock time, response time or elapsed time
CPU time
User CPU time
System CPU time
Difficult to separate these components
Use the top command in a Linux shell or Task Manager in Windows.
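As a rough illustration of the difference between elapsed time and CPU time, the sketch below times one call with Python's standard `time` module. `measure`, `relative_performance` and `sleepy` are hypothetical helper names, not part of the course material.

```python
import time


def measure(func, *args):
    """Return (wall_clock_seconds, cpu_seconds) for one call of func."""
    wall_start = time.perf_counter()  # elapsed (wall clock) time
    cpu_start = time.process_time()   # user + system CPU time of this process
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def relative_performance(ex_time_x: float, ex_time_y: float) -> float:
    """Perf(X)/Perf(Y) = ExTime(Y)/ExTime(X)."""
    return ex_time_y / ex_time_x


def sleepy(seconds: float) -> None:
    time.sleep(seconds)  # waits without doing work on the CPU


wall, cpu = measure(sleepy, 0.2)
print(f"wall clock: {wall:.3f} s, CPU: {cpu:.3f} s")

# Hypothetical timings: X takes 10 s and Y takes 15 s, so X is 1.5 times faster.
print(relative_performance(10.0, 15.0))  # 1.5
```

A sleeping task accumulates wall clock time but almost no CPU time, which is why the two figures are reported separately.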
CPU Performance
CPU execution time
= CPU clock cycles per program × clock cycle time (clock period)
= CPU clock cycles per program / clock frequency
Program = set of (assembly/machine language) instructions
CPU clock cycles per program
= Instructions per program × average clock cycles per instruction
= Instruction count (IC) × cycles per instruction (CPI)
CPU execution time
= IC × CPI × Tclk
= IC × CPI / fclk
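The last relation can be checked numerically. The sketch below is a direct transcription of IC × CPI / fclk; the instruction count, CPI and clock frequency are hypothetical values picked so that the arithmetic is easy to follow.

```python
def cpu_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """CPU execution time in seconds = IC * CPI / fclk."""
    return instruction_count * cpi / clock_hz


# Hypothetical program: 2 billion instructions, average CPI of 1.5,
# running on a 3 GHz clock.
print(cpu_time(2e9, 1.5, 3e9))  # 1.0
```

Halving the CPI or doubling the clock frequency would each halve the execution time, provided the other two factors stay the same.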
CPU Performance: An example
Two programs foo and faa
Instruction types:
Instr_0: 1 cycle
Instr_1: 2 cycles
Instr_2: 5 cycles
foo: total 10 instructions
Instr_0: 7
Instr_1: 2
Instr_2: 1
faa: total 8 instructions
Instr_0: 4
Instr_1: 2
Instr_2: 2
foo: Total clock cycles = 7*1 + 2*2 + 1*5 = 16
faa: Total clock cycles = 4*1 + 2*2 + 2*5 = 18
CPI details
Different instructions have different individual CPIs
Overall CPI is given by using a weighted average
foo: CPI = 1.6
faa: CPI = 2.25 (higher relative frequency of Instr_2)
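$$\mathrm{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \mathrm{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}$$
where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of instruction $i$.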
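The weighted average can be verified against the foo and faa instruction mixes from the example. This is a small sketch; the dictionary and function names are chosen here for illustration.

```python
# Cycles per instruction type, as given in the example.
CYCLES = {"Instr_0": 1, "Instr_1": 2, "Instr_2": 5}

# Instruction counts for the two programs.
foo = {"Instr_0": 7, "Instr_1": 2, "Instr_2": 1}
faa = {"Instr_0": 4, "Instr_1": 2, "Instr_2": 2}


def total_cycles(mix: dict) -> int:
    """Sum of (cycles for type i) * (count of type i)."""
    return sum(CYCLES[name] * count for name, count in mix.items())


def overall_cpi(mix: dict) -> float:
    """Overall CPI = total clock cycles / total instruction count."""
    return total_cycles(mix) / sum(mix.values())


print(total_cycles(foo), overall_cpi(foo))  # 16 1.6
print(total_cycles(faa), overall_cpi(faa))  # 18 2.25
```

faa executes fewer instructions than foo but needs more cycles, because a larger share of its instructions are the 5-cycle Instr_2.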
Power: Problem with Moore's Law
[Figure: Power (Watts, log scale from 0.1 to 10,000) against year (1971 to 2008) for Intel processors: 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium processors. Annotation: "Power Projections Too High!", with reference levels marked Hot Plate, Nuclear Reactor, Rocket Nozzle and Sun's Surface. Source: Intel]
Circumventing the power wall
$P = C V^2 f$
V: 5 V → 1 V; f: 30 MHz → 3 GHz
We can reduce voltage and capacitive load
by only so much.
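To see what these numbers imply, the sketch below plugs the two operating points into P = C V² f. The capacitive load is held constant at an arbitrary value of 1.0; that is an assumption made purely to isolate the effect of voltage and frequency.

```python
def dynamic_power(capacitance: float, voltage: float, frequency_hz: float) -> float:
    """Dynamic power P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency_hz


C = 1.0  # arbitrary constant capacitive load (assumption)

old = dynamic_power(C, 5.0, 30e6)  # 5 V at 30 MHz
new = dynamic_power(C, 1.0, 3e9)   # 1 V at 3 GHz

print(f"power ratio new/old: {new / old:.1f}")  # power ratio new/old: 4.0
```

Under this assumption, dropping the voltage from 5 V to 1 V cuts power by a factor of 25, but the 100-fold increase in frequency more than cancels it out. With little room left to lower the voltage, further frequency increases translate directly into more power.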
Other limitations in uniprocessors
Constrained by power, instruction-level parallelism (ILP) and memory latency
Moore's Law: New approach
AMD's Barcelona Architecture
Quad-core, 65 nm process
2007
(Courtesy: AnandTech)
Increasing number of cores in a processor to be better prepared for the power wall.
More processing done in parallel at the same clock frequency.
Age of multicore processors
Multiprocessor trends
Larger number of cores
Better performance (speed, energy)
Greater complexity in design and application porting
[Figure: progression from single-core to dual-core and 8-core processors, GPU and NoC]
Benchmarking for performance
Standard Performance Evaluation Corporation (SPEC)
Integer (CINT2006) or Floating point (CFP2006)
Reference: Sun UltraSparc II system at 296MHz
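As described in the textbook, each SPEC benchmark result is reported as a ratio of the reference machine's execution time to the measured execution time, and the ratios are summarised with a geometric mean. The sketch below illustrates that calculation; the two timings are invented and do not come from any real SPEC report.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Ratio of reference-machine time to measured time (higher is faster)."""
    return reference_time_s / measured_time_s


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical (reference time, measured time) pairs in seconds.
timings = [(1000.0, 100.0), (2000.0, 50.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in timings]

print(ratios)                  # [10.0, 40.0]
print(geometric_mean(ratios))  # 20.0
```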