Computer Architecture vs. Instruction Set Architecture

CMSC 411
Computer Systems Architecture

Lecture 4
MIPS ISA & Basic Pipelining
COMPUTER ARCHITECTURE VS. INSTRUCTION SET ARCHITECTURE
CMSC 411 - 1 2
CS252 S05
Instruction Set Architecture: Critical Interface
[Figure: the instruction set drawn as the interface layer between software (above) and hardware (below).]
• Properties of a good abstraction

– Lasts through many generations (portability)
– Used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
Example: MIPS
Programmable storage:
° 2^32 x bytes
° 31 x 32-bit GPRs, r1 to r31 (R0=0)
° 32 x 32-bit FP regs (paired DP)
° PC
Data types? Format? Addressing modes?
Arithmetic logical:
  Add, AddU, Sub, SubU, And, Or, Xor, SLT, SLTU,
  AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI,
  SLL, SRL, SRA
Memory Access:
  LB, LBU, LH, LHU, LW,
  SB, SH, SW
Control:
  J, JAL, JR, JALR
  BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ
32-bit instructions on word boundary
Instruction Set Architecture (ISA)
“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.”
– Amdahl, Blaauw, and Brooks, 1964
What the ISA exposes to software:
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
ISA vs. Computer Architecture

• Old definition of computer architecture
= instruction set design
– Other aspects of computer design called implementation
– Insinuates implementation is uninteresting or less challenging
• H&P’s view is computer architecture >> ISA
• Architect’s job much more than instruction set
design; technical hurdles today more challenging
than those in instruction set design
• Since instruction set design not where action is,
some conclude computer architecture (using old
definition) is not where action is
– H&P disagree on conclusion
– Agree that ISA not where action is (ISA in CA:AQA 4/e appendix)
Computer Architecture Is An Integrated Approach
• What really matters is the functioning of the complete system
– hardware, runtime system, compiler, operating system, and application
– In networking, this is called the “End to End argument”
• Computer architecture is not just about transistors, individual instructions, or particular implementations
– E.g., the original RISC projects replaced complex instructions with a compiler + simple instructions
Computer Architecture Is Design & Analysis

Architecture is an iterative process:
• Searching the space of possible designs
• At all levels of computer systems
[Figure: loop between Design and Analysis. Labels: Creativity; Cost / Performance Analysis; Good Ideas, Mediocre Ideas, Bad Ideas.]
MIPS INSTRUCTION SET ARCHITECTURE
A "Typical" RISC ISA
• 32-bit fixed format instruction (4 formats)
• 32 32-bit GPR (R0 contains zero, DP take pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
– no indirection
• Simple branch conditions
• Delayed branch
see: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
CMSC 411 - 3 (from Patterson) 10
Example: MIPS
Register-Register
  bits:   31-26  25-21  20-16  15-11  10-6  5-0
  field:  op     rs     rt     rd     -     opx
  rd ← rs OP rt
Register-Immediate
  bits:   31-26  25-21  20-16  15-0
  field:  op     rs     rt     immediate
  rt ← rs OP immed
Jump / Call
  bits:   31-26  25-0
  field:  op     target
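The field layouts above can be exercised with a few shifts and masks. The following is a minimal sketch, not part of the original slides: the helper names (`encode_rr`, `encode_ri`, `decode`, `sign_extend16`) are hypothetical, and the example values (opcode 0 with function code 0x20 for `add`, opcode 8 for `addi`) are the usual MIPS32 encodings, which the slides themselves do not list.

```python
def encode_rr(op, rs, rt, rd, opx):
    """Pack a register-register instruction: op | rs | rt | rd | (unused) | opx."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | opx


def encode_ri(op, rs, rt, imm):
    """Pack a register-immediate instruction; imm is truncated to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode(word):
    """Extract every field position; the caller picks those valid for the format."""
    return {
        "op": (word >> 26) & 0x3F,       # bits 31-26
        "rs": (word >> 21) & 0x1F,       # bits 25-21
        "rt": (word >> 16) & 0x1F,       # bits 20-16
        "rd": (word >> 11) & 0x1F,       # bits 15-11
        "opx": word & 0x3F,              # bits 5-0
        "immediate": word & 0xFFFF,      # bits 15-0
        "target": word & 0x3FFFFFF,      # bits 25-0
    }


def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


add_word = encode_rr(op=0, rs=2, rt=3, rd=1, opx=0x20)   # add r1,r2,r3
addi_word = encode_ri(op=8, rs=2, rt=1, imm=-4)          # addi r1,r2,-4
print(hex(add_word), decode(add_word))
print(hex(addi_word), sign_extend16(decode(addi_word)["immediate"]))
```

Because every format keeps `op`, `rs`, and `rt` in the same bit positions, the register file can be read before the instruction is fully decoded, which is one reason a fixed format suits pipelining.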
5 Steps of MIPS Datapath
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
[Figure: MIPS datapath. Labels: Next PC, Next SEQ PC, Adder (+4), MUX, Inst Memory (Address), Reg File (RS, RT, RD), Sign Extend (Imm), Zero?, ALU, Data Memory (LMD), WB Data.]
IR ← mem[PC];
PC ← PC + 4
Reg[IR_rd] ← Reg[IR_rs] op_IRop Reg[IR_rt]
5 Steps of MIPS Datapath
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between stages. Next SEQ PC and RD are carried forward through the pipeline registers.]
IR ← mem[PC];
PC ← PC + 4
A ← Reg[IR_rs];
B ← Reg[IR_rt]
rslt ← A op_IRop B
WB ← rslt
Reg[IR_rd] ← WB
Instruction Set Processor Controller
Ifetch:       IR ← mem[PC]; PC ← PC + 4
opFetch-DCD:  A ← Reg[IR_rs]; B ← Reg[IR_rt]
Then, by instruction class:
br:   if bop(A,b) PC ← PC + IR_im
jmp:  PC ← IR_jaddr
RR:   r ← A op_IRop B;      WB ← r;       Reg[IR_rd] ← WB
RI:   r ← A op_IRop IR_im;  WB ← r;       Reg[IR_rt] ← WB
LD:   r ← A + IR_im;        WB ← Mem[r];  Reg[IR_rt] ← WB
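The register transfers of the controller can be traced in software. The sketch below is an illustration under simplifying assumptions, not the slides' hardware: instructions arrive already decoded as Python dicts, memory is a dict from address to word, branch and jump delay slots are ignored, and the names `step`, `OPS`, and `BRANCH_TESTS` are hypothetical.

```python
import operator

# Hypothetical operation tables; the slides only say "op" and "bop".
OPS = {"add": operator.add, "sub": operator.sub,
       "and": operator.and_, "or": operator.or_}
BRANCH_TESTS = {"beq": operator.eq, "bne": operator.ne}


def step(state, instr):
    """Apply one instruction to state = {"pc", "regs", "mem"} and return it."""
    regs, mem = state["regs"], state["mem"]
    state["pc"] += 4                       # Ifetch: PC <- PC + 4
    a = regs[instr.get("rs", 0)]           # opFetch-DCD: A <- Reg[IR_rs]
    b = regs[instr.get("rt", 0)]           #              B <- Reg[IR_rt]
    kind = instr["kind"]
    dest, wb = None, None
    if kind == "RR":                       # r <- A op B; Reg[IR_rd] <- WB
        dest, wb = instr["rd"], OPS[instr["op"]](a, b)
    elif kind == "RI":                     # r <- A op IR_im; Reg[IR_rt] <- WB
        dest, wb = instr["rt"], OPS[instr["op"]](a, instr["imm"])
    elif kind == "LD":                     # r <- A + IR_im; WB <- Mem[r]
        dest, wb = instr["rt"], mem[a + instr["imm"]]
    elif kind == "br":                     # if bop(A,B) PC <- PC + IR_im
        if BRANCH_TESTS[instr["op"]](a, b):
            state["pc"] += instr["imm"]
    elif kind == "jmp":                    # PC <- IR_jaddr
        state["pc"] = instr["jaddr"]
    if dest is not None and dest != 0:     # R0 stays zero
        regs[dest] = wb & 0xFFFFFFFF       # keep results to 32 bits
    return state
```

A caller would build `state = {"pc": 0, "regs": [0] * 32, "mem": {8: 42}}` and pass it through `step` once per instruction.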
Visualizing Pipelining
Time (clock cycles) runs left to right; instruction order runs top to bottom.

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
Instr 1   Ifetch   Reg      ALU      DMem     Reg
Instr 2            Ifetch   Reg      ALU      DMem     Reg
Instr 3                     Ifetch   Reg      ALU      DMem     Reg
Instr 4                              Ifetch   Reg      ALU      DMem     Reg
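The staggered pattern in the pipelining diagram follows a simple rule: with no stalls, instruction i (counting from 0) occupies stage s in cycle i + s + 1. A minimal sketch of that rule follows; the function names are hypothetical and the model assumes an ideal pipeline with no hazards.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """For each instruction, map cycle number -> stage in an ideal pipeline."""
    return [
        {i + s + 1: name for s, name in enumerate(stages)}
        for i in range(n_instructions)
    ]


def total_cycles(n_instructions, depth=len(STAGES)):
    """Cycles to finish n instructions: fill the pipe once, then one per cycle."""
    return n_instructions + depth - 1


def render(n_instructions):
    """Return the schedule as aligned text, one row per instruction."""
    schedule = pipeline_schedule(n_instructions)
    cycles = total_cycles(n_instructions)
    lines = []
    for i, row in enumerate(schedule):
        cells = [row.get(c, "").ljust(7) for c in range(1, cycles + 1)]
        lines.append(f"Instr {i + 1}: " + " ".join(cells).rstrip())
    return "\n".join(lines)


print(render(4))
print("total cycles:", total_cycles(4))
```

Unpipelined, four five-step instructions would need 20 steps; the ideal pipeline finishes them in 8 cycles, and the gap widens as the instruction count grows.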
Pipelining Is Not Quite That Easy!
• Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
– Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
– Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
– Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
One Memory Port/Structural Hazards
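Time (clock cycles) runs left to right; instruction order runs top to bottom.

          1        2        3        4        5        6        7        8        9
Load      Ifetch   Reg      ALU      DMem     Reg
Instr 1            Ifetch   Reg      ALU      DMem     Reg
Instr 2                     Ifetch   Reg      ALU      DMem     Reg
Instr 3                              Ifetch   Reg      ALU      DMem     Reg
Instr 4                                       Ifetch   Reg      ALU      DMem     Reg

With a single memory port, the Load's DMem access and Instr 3's Ifetch both need the memory in cycle 4.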
One Memory Port/Structural Hazards

Time (clock cycles) runs left to right; instruction order runs top to bottom.

          1        2        3        4        5        6        7        8        9
Load      Ifetch   Reg      ALU      DMem     Reg
Instr 1            Ifetch   Reg      ALU      DMem     Reg
Instr 2                     Ifetch   Reg      ALU      DMem     Reg
Stall                                Bubble   Bubble   Bubble   Bubble   Bubble
Instr 3                                       Ifetch   Reg      ALU      DMem     Reg

How do you “bubble” the pipe?
Speed Up Equation for Pipelining
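$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

For simple RISC pipeline, ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$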
Example: Dual-port vs. Single-port
• Machine A: Dual ported memory (“Harvard Architecture”)
• Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed; on Machine B each load costs one stall cycle
SpeedUpA = Pipeline Depth/(1 + 0) × (clock_unpipe/clock_pipe)
         = Pipeline Depth
SpeedUpB = Pipeline Depth/(1 + 0.4 × 1) × (clock_unpipe/(clock_unpipe/1.05))
         = (Pipeline Depth/1.4) × 1.05
         = 0.75 × Pipeline Depth
SpeedUpA/SpeedUpB = Pipeline Depth/(0.75 × Pipeline Depth) = 1.33
• Machine A is 1.33 times faster
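The comparison can be checked numerically. The sketch below applies the speedup formula for a pipeline with a given stall CPI and clock ratio; the function name `pipeline_speedup` and the pipeline depth of 5 are assumptions made for illustration (the depth cancels in the final ratio).

```python
def pipeline_speedup(depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Speedup over the unpipelined machine.

    clock_ratio is Cycle Time unpipelined / Cycle Time pipelined.
    """
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * clock_ratio


DEPTH = 5  # assumed; any depth gives the same A/B ratio

# Machine A: dual-ported memory, so loads never stall instruction fetch.
speedup_a = pipeline_speedup(DEPTH, stall_cpi=0.0)

# Machine B: 40% of instructions are loads, each costing 1 stall cycle,
# but the clock is 1.05 times faster.
speedup_b = pipeline_speedup(DEPTH, stall_cpi=0.4 * 1, clock_ratio=1.05)

print(f"A: {speedup_a:.2f}  B: {speedup_b:.2f}  A/B: {speedup_a / speedup_b:.2f}")
```

The faster clock on Machine B recovers only a small part of what the load stalls cost, which is why the dual-ported design still comes out ahead.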
Data Hazard on R1
Time (clock cycles) runs left to right; instruction order runs top to bottom.
Stages: IF, ID/RF, EX, MEM, WB

                  1        2        3        4        5        6        7        8        9
add r1,r2,r3      Ifetch   Reg      ALU      DMem     Reg
sub r4,r1,r3               Ifetch   Reg      ALU      DMem     Reg
and r6,r1,r7                        Ifetch   Reg      ALU      DMem     Reg
or  r8,r1,r9                                 Ifetch   Reg      ALU      DMem     Reg
xor r10,r1,r11                                        Ifetch   Reg      ALU      DMem     Reg
Three Generic Data Hazards
• Read After Write (RAW)
InstrJ tries to read operand before InstrI writes it
I: add r1,r2,r3
J: sub r4,r1,r3
• Caused by a “true / flow dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
• Write After Read (WAR)
InstrJ writes operand before InstrI reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5

• Write After Write (WAW)
InstrJ writes operand before InstrI writes it.
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
• Will see WAR and WAW in more complicated pipes
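The three cases can be told apart by comparing the destination and source registers of an earlier instruction I and a later instruction J. The following is a minimal sketch with a hypothetical `classify_dependences` helper; it reports dependences in program order, and whether each one becomes an actual hazard depends on the pipeline, as the bullets above note.

```python
def classify_dependences(instr_i, instr_j):
    """Return the set of dependence types from earlier instr_i to later instr_j.

    Each instruction is a (dest, sources) pair, e.g. ("r1", ["r2", "r3"]).
    """
    dest_i, srcs_i = instr_i
    dest_j, srcs_j = instr_j
    found = set()
    if dest_i is not None and dest_i in srcs_j:
        found.add("RAW")   # J reads what I writes (true / flow dependence)
    if dest_j is not None and dest_j in srcs_i:
        found.add("WAR")   # J writes what I reads (anti-dependence)
    if dest_i is not None and dest_i == dest_j:
        found.add("WAW")   # both write the same name (output dependence)
    return found


# The I/J pairs from the three slides.
raw = classify_dependences(("r1", ["r2", "r3"]), ("r4", ["r1", "r3"]))
war = classify_dependences(("r4", ["r1", "r3"]), ("r1", ["r2", "r3"]))
waw = classify_dependences(("r1", ["r4", "r3"]), ("r1", ["r2", "r3"]))
print(raw, war, waw)
```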
Forwarding to Avoid Data Hazard
Time (clock cycles) runs left to right; instruction order runs top to bottom.

                  1        2        3        4        5        6        7        8        9
add r1,r2,r3      Ifetch   Reg      ALU      DMem     Reg
sub r4,r1,r3               Ifetch   Reg      ALU      DMem     Reg
and r6,r1,r7                        Ifetch   Reg      ALU      DMem     Reg
or  r8,r1,r9                                 Ifetch   Reg      ALU      DMem     Reg
xor r10,r1,r11                                        Ifetch   Reg      ALU      DMem     Reg
HW Change for Forwarding
[Figure: datapath with forwarding paths. Labels: NextPC, Registers, ID/EX, ALU, EX/MEM, Data Memory, MEM/WR, Immediate, and three mux blocks.]
Forwarding to Avoid LW-SW Data Hazard
Time (clock cycles) runs left to right; instruction order runs top to bottom.

lw  r4, 0(r1)     Ifetch   Reg      ALU      DMem     Reg
sw  r4,12(r1)              Ifetch   Reg      ALU      DMem     Reg
or  r8,r6,r9                        Ifetch   Reg      ALU      DMem     Reg
xor r10,r9,r11                               Ifetch   Reg      ALU      DMem     Reg
Data Hazard Even with Forwarding
Time (clock cycles) runs left to right; instruction order runs top to bottom.

                  1        2        3        4        5        6        7        8
lw  r1, 0(r2)     Ifetch   Reg      ALU      DMem     Reg
sub r4,r1,r6               Ifetch   Reg      ALU      DMem     Reg
and r6,r1,r7                        Ifetch   Reg      ALU      DMem     Reg
or  r8,r1,r9                                 Ifetch   Reg      ALU      DMem     Reg
Data Hazard Even with Forwarding
Time (clock cycles) runs left to right; instruction order runs top to bottom.

                  1        2        3        4        5        6        7        8
lw  r1, 0(r2)     Ifetch   Reg      ALU      DMem     Reg
sub r4,r1,r6               Ifetch   Reg      Bubble   ALU      DMem     Reg
and r6,r1,r7                        Ifetch   Bubble   Reg      ALU      DMem     Reg
or  r8,r1,r9                                 Bubble   Ifetch   Reg      ALU      DMem
Software Scheduling Instead

Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:          Fast code:
LW  Rb,b            LW  Rb,b
LW  Rc,c            LW  Rc,c
ADD Ra,Rb,Rc        LW  Re,e
SW  a,Ra            ADD Ra,Rb,Rc
LW  Re,e            LW  Rf,f
LW  Rf,f            SW  a,Ra
SUB Rd,Re,Rf        SUB Rd,Re,Rf
SW  d,Rd            SW  d,Rd

Compiler optimizes for performance. Hardware checks for safety.
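Why the reordered sequence is faster can be checked by counting load-use stalls. The sketch below assumes a 5-stage pipeline with forwarding, where an instruction that immediately follows a load and reads the loaded register costs one bubble. The function name `count_load_use_stalls` and the tuple encoding of instructions are hypothetical, and the model deliberately ignores every other source of stalls.

```python
def count_load_use_stalls(program):
    """Count one-cycle stalls caused by using a loaded value in the very next instruction.

    program is a list of (opcode, dest, sources) tuples in issue order.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        opcode, dest, _ = prev
        if opcode == "LW" and dest in cur[2]:
            stalls += 1
    return stalls


slow = [
    ("LW", "Rb", []), ("LW", "Rc", []),
    ("ADD", "Ra", ["Rb", "Rc"]), ("SW", None, ["Ra"]),
    ("LW", "Re", []), ("LW", "Rf", []),
    ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]
fast = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("LW", "Re", []),
    ("ADD", "Ra", ["Rb", "Rc"]), ("LW", "Rf", []),
    ("SW", None, ["Ra"]), ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]

print("slow stalls:", count_load_use_stalls(slow))
print("fast stalls:", count_load_use_stalls(fast))
```

Under this model the slow sequence stalls after `LW Rc,c` and after `LW Rf,f`, while the fast sequence places an independent instruction after each of those loads and so avoids both bubbles.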

