CREDITS
www.cse.scu.edu/~rdaniels/html/courses/Coen1/CPUarch.ppt
(6.2)
Central Processing Unit Architecture
Architecture overview
Machine organization
– von Neumann
Speeding up CPU operations
– multiple registers
– pipelining
– superscalar and VLIW
CISC vs. RISC
(6.3)
Computer Architecture
Major components of a computer
– Central Processing Unit (CPU)
– memory
– peripheral devices
Architecture is concerned with
– internal structures of each
– interconnections
» speed and width
– relative speeds of components
Want maximum execution speed
– Balance is often critical issue
(6.4)
Computer Architecture (continued)
CPU
– performs arithmetic and logical operations
– synchronous operation
– may consider instruction set architecture
» how machine looks to a programmer
– detailed hardware design
(6.5)
Computer Architecture (continued)
Memory
– stores programs and data
– organized as
» bit
» byte = 8 bits (smallest addressable location)
» word = 4 bytes (typically; machine dependent)
– instructions consist of operation codes and
addresses
» format: operation code (oprn) followed by address (addr)
– integer (exact representation)
» 2’s complement
•to negate a value, change each 0 to 1 and each 1 to 0, then add 1
– floating point (approximate representation)
» scientific notation: 0.3481 × 10^6
» inherently imprecise
» format: sign (s), exponent (exp), significand
[Figure: machine organization, showing program memory and Control Unit]
[Figure: memory word holding two instructions; bit positions 0, 8, 20, 28, 39
delimit op code | address | op code | address]
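The two representations above can be sketched in Python (a hypothetical helper, not from the slides): negating a two's complement value means inverting every bit and adding 1, and floating point's approximate nature shows up directly in ordinary arithmetic.

```python
def twos_complement_negate(value, bits=8):
    # Change each 0 to 1 and each 1 to 0 (~value), then add 1;
    # mask to the word width so the result fits in `bits` bits.
    return (~value + 1) & ((1 << bits) - 1)

print(format(twos_complement_negate(5), "08b"))  # 11111011, i.e. -5 in 8 bits

# Floating point is inherently imprecise:
print(0.1 + 0.2 == 0.3)  # False
```

The second print fails because 0.1, 0.2, and 0.3 are all rounded to the nearest representable binary significand, so their sum differs from 0.3 in the last bits.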
(6.8)
Simple Machine Organization (continued)
ALU does arithmetic and logical comparisons
– AC = accumulator holds results
– MQ = multiplier-quotient register; holds second
portion of long results
– MBR = memory buffer register holds data while
operation executes
(6.9)
Simple Machine Organization (continued)
Program control determines what computer does
based on instruction read from memory
– MAR = memory address register holds address of
memory cell to be read
– PC = program counter; address of next instruction
to be read
– IR = instruction register holds instruction being
executed
– IBR = instruction buffer register; holds right half of the word read from memory
(6.10)
Simple Machine Organization (continued)
Machine operates on fetch-execute cycle
Fetch
– PC → MAR
– read M(MAR) into MBR
– copy left and right instructions into IR and IBR
Execute
– address part of IR → MAR
– read M(MAR) into MBR
– execute opcode
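The fetch-execute cycle above can be sketched as a toy simulator (register names follow the slides; the LOAD/ADD/STORE/HALT opcodes and single-instruction words are simplifying assumptions, not the two-instructions-per-word format shown earlier).

```python
# Toy single-address machine: each memory cell holds either a number
# or an (opcode, address) instruction pair.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", 0),
          10: 5, 11: 7, 12: 0}

PC, AC = 0, 0
while True:
    # Fetch
    MAR = PC                      # PC -> MAR
    MBR = memory[MAR]             # read M(MAR) into MBR
    IR = MBR                      # instruction into IR
    PC += 1                       # point at next instruction
    # Execute
    opcode, addr = IR
    if opcode == "HALT":
        break
    MAR = addr                    # address part of IR -> MAR
    if opcode == "LOAD":
        AC = memory[MAR]          # M(MAR) -> AC
    elif opcode == "ADD":
        AC += memory[MAR]         # AC + M(MAR) -> AC
    elif opcode == "STORE":
        memory[MAR] = AC          # AC -> M(MAR)

print(AC, memory[12])  # 12 12
```

Running it loads 5, adds 7, and stores 12, with every memory reference passing through MAR and MBR just as in the cycle description.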
(6.11)
Simple Machine Organization (continued)
(6.12)
Architecture Families
Before mid-60’s, every new machine had a
different instruction set architecture
– programs from previous generation didn’t run on
new machine
– cost of replacing software became too large
IBM System/360 created family concept
– single instruction set architecture
– wide range of price and performance with same
software
Performance improvements based on different
detailed implementations
– memory path width (1 byte to 8 bytes)
– faster, more complex CPU design
– greater I/O throughput and overlap
“Software compatibility” now a major issue
– partially offset by high level language (HLL) software
(6.13)
Architecture Families
(6.14)
Multiple Register Machines
Initially, machines had only a few registers
– 2 to 8 or 16 common
– registers more expensive than memory
Most instructions operated between memory
locations
– operands came from memory and results went
back to memory, so fewer instructions
» although more complex ones
– means smaller programs and (supposedly)
faster execution
» fewer instructions and data to move between
memory and ALU
But registers are much faster than memory
– 30 times faster
(6.15)
Multiple Register Machines (continued)
Also, many operands are reused within a
short time
– waste time loading operand again the next
time it’s needed
Depending on mix of instructions and
operand use, having many registers may
lead to less traffic to memory and faster
execution
Most modern machines use a multiple
register architecture
– up to about 512; 32 integer plus 32 floating
point registers is common
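The memory-traffic argument above can be made concrete with a small counting sketch (the expression and the per-operation costs are illustrative assumptions): evaluating (a + b) * (a - b) reuses both operands, which a register machine exploits but a memory-to-memory machine cannot.

```python
# Memory-to-memory machine: every operation reads both operands from
# memory and writes its result back, so reuse of a and b buys nothing.
ops = [("add", 2, 1), ("sub", 2, 1), ("mul", 2, 1)]  # (name, reads, writes)
mem_to_mem_accesses = sum(reads + writes for _, reads, writes in ops)

# Load/store register machine: load a and b once, run all three
# operations register-to-register, store only the final result.
register_accesses = 2 + 1  # 2 loads + 1 store

print(mem_to_mem_accesses, register_accesses)  # 9 3
```

Nine memory accesses drop to three, which is the "less traffic to memory" the slide describes; the real ratio depends on the mix of instructions and operand reuse.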
(6.16)
Pipelining
One way to speed up CPU is to increase
clock rate
– limitations on how fast clock can run to
complete instruction
Another way is to execute more than one
instruction at one time
(6.17)
Pipelining
Pipelining breaks instruction execution down
into several stages
– put registers between stages to “buffer” data
and control
– execute one instruction
– as first starts second stage, execute second
instruction, etc.
– speedup equals the number of stages as long as
the pipe stays full
(6.18)
Pipelining (continued)
Consider an example with 6 stages
– FI = fetch instruction
– DI = decode instruction
– CO = calculate location of operand
– FO = fetch operand
– EI = execute instruction
– WO = write operand (store result)
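The timing of this 6-stage pipeline can be sketched as a one-line cycle count (a standard idealized model, ignoring stalls and hazards): the first instruction takes all six stages, and each later instruction completes one cycle behind the previous one.

```python
def pipeline_cycles(n_instructions, n_stages=6):
    # First instruction flows through all n_stages; after that,
    # one instruction completes per cycle while the pipe is full.
    return n_stages + (n_instructions - 1)

for n in (1, 6, 1000):
    sequential = n * 6                      # unpipelined: 6 cycles each
    pipelined = pipeline_cycles(n)
    print(n, pipelined, round(sequential / pipelined, 2))
```

For 1000 instructions the speedup is 6000 / 1005 ≈ 5.97, approaching the stage count of 6 as the pipe stays full, which matches the claim on the previous slide.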
(6.19)
Pipelining Example
[Figure: schedule of LD (load), AD (add), and SB instructions
issued in parallel across clock cycles 0–8]
(6.30)
Instruction Level Parallelism
Success of superscalar and VLIW machines
depends on number of instructions that occur
together that can be issued in parallel
– no dependencies
– no branches
Compilers can help create parallelism
Speculation techniques try to overcome
branch problems
– assume branch is taken
– execute instructions but don’t let them store
results until status of branch is known
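The speculation idea above can be sketched as follows (the function and names are illustrative, not a real pipeline model): instructions past a predicted-taken branch execute into a buffer, and the buffer is committed or squashed once the branch resolves.

```python
def run_speculative(predicted_taken, speculative_ops):
    # Execute past the branch immediately, but hold results in a
    # buffer rather than committing them to architectural state.
    buffer = [op() for op in speculative_ops]

    def resolve(actually_taken, state):
        if actually_taken == predicted_taken:
            state.extend(buffer)   # prediction correct: commit results
        # otherwise squash: buffered results are simply discarded
        return state

    return resolve

state = []
resolve = run_speculative(True, [lambda: 1 + 1, lambda: 2 * 3])
resolve(True, state)    # branch was taken as predicted
print(state)            # [2, 6]
```

A misprediction (`resolve(False, state)`) would leave the state untouched, modeling the squash: the speculative work is wasted but never visible.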
(6.31)
CISC vs. RISC
CISC = Complex Instruction Set Computer
RISC = Reduced Instruction Set Computer
(6.32)
CISC vs. RISC (continued)
Historically, machines tend to add features
over time
– instruction opcodes
» IBM 70X, 70X0 series went from 24 opcodes to
185 in 10 years
» over the same period, performance increased 30 times
– addressing modes
– special purpose registers
Motivations are to
– improve efficiency, since complex instructions
can be implemented in hardware and
execute faster
– make life easier for compiler writers
– support more complex higher-level languages
(6.33)
CISC vs. RISC
Examination of actual code indicated many
of these features were not used
RISC advocates proposed
– simple, limited instruction set
– large number of general purpose registers
» and mostly register operations
– optimized instruction pipeline
Benefits should include
– faster execution of instructions commonly
used
– faster design and implementation
(6.34)
CISC vs. RISC
Comparing some architectures