Professional Documents
Culture Documents
04 Pipeline
04 Pipeline
Pipelining
Giorgio Richelli
Computer Architecture
Instruction Execution
• 5 basic steps
1. fetch instruction (F)
I-Fetch
2. decode instruction/read registers (R)
3. execute (X)
4. access memory (M) Decode
5. store result (W)
Execute
Memory
Write
Result
Computer Architecture
Why Pipeline?
Time (clock cycles)
ALU
I
n Inst 0 Im Reg Dm Reg
ALU
t Inst 1 Im Reg Dm Reg
r.
ALU
O Inst 2 Im Reg Dm Reg
r
d
Inst 3
ALU
Im Reg Dm Reg
e
r
Inst 4
ALU
Im Reg Dm Reg
• Before pipelining:
– Throughput: 1 instruction per cycle
tclk = t F + t R + t X + t M + tW
– (or lower cycle time and CPI=5)
Computer Architecture
Visualizing Pipelining
Time (clock cycles)
ALU
n Ifetch Reg DMem Reg
s
t
r.
ALU
Ifetch Reg DMem Reg
O
r
ALU
Ifetch Reg DMem Reg
d
e
r
ALU
Ifetch Reg DMem Reg
Computer Architecture
Pipeline Hazards
Computer Architecture
Pipeline Hazards
• Data hazards
– an instruction uses the result of a previous instruction
ADD R1, R2, R3 or SW R1, 3(R2)
ADD R4, R1, R5 LW R3, 3(R2)
• Control hazards
– the location of an instruction depends on a previous instruction
JMP R25
…
LOOP: ADD R1, R2, R3
• Structural hazards
– two instructions need access to the same resource
• e.g., single unit shared for instruction fetch and load/store
• collision in reservation table
Computer Architecture
Data Hazards: RAW
I: add r1,r2,r3
J: sub r4,r1,r3
Computer Architecture
Data Hazards: WAR
Computer Architecture
Data Hazards: WAW
Computer Architecture
Data Hazards (RAW)
Cycle
F R X M W
F R X M W
Computer Architecture
Types of Data Hazards
Computer Architecture
Datapath vs Control
Datapath Controller
Regs signals
A
L
U Control Points
Computer Architecture
Control Hazards
Cycle
F R X M W
F R X M W
Computer Architecture
Control Hazards
I Load Ifetch
ALU
Reg DMem Reg
n
s
ALU
Ifetch Reg DMem Reg
Instr 1
t
r.
ALU
Ifetch Reg DMem Reg
Instr 2
O
r
ALU
Ifetch Reg DMem Reg
Instr 3
d
e
ALU
Ifetch Reg DMem Reg
r Instr 4
Computer Architecture
Resolving Hazards: Pipeline Stalls
Computer Architecture
Resolving Hazards
I Load Ifetch
ALU
Reg DMem Reg
n
s Instr 1
ALU
Ifetch Reg DMem Reg
t
r.
ALU
Ifetch Reg DMem Reg
Instr 2
O
Stall Bubble Bubble Bubble Bubble Bubble
r
d
e
ALU
Ifetch Reg DMem Reg
Instr 3
r
Computer Architecture
Example Pipeline Stall (Diagram)
Cycle
F R X M W
F Bubble R X M W
Computer Architecture
Forwarding to Avoid Data Hazard
ALU
add r1,r2,r3 Ifetch Reg DMem Reg
s
t
r.
ALU
sub r4,r1,r3 Ifetch Reg DMem Reg
O
r
ALU
Ifetch Reg DMem Reg
d and r6,r1,r7
e
r
ALU
Ifetch Reg DMem Reg
or r8,r1,r9
ALU
Ifetch Reg DMem Reg
xor r10,r1,r11
Computer Architecture
Speedup Equation for Pipelining
Computer Architecture
Example: Dual-port vs. Single-port
Faster code:
Slow code:
LW Rb,b
LW Rb,b LW Rc,c
LW Rc,c LW Re,e
ADD Ra,Rb,Rc ADD Ra,Rb,Rc
SW a,Ra LW Rf,f
LW Re,e SW a,Ra
LW Rf,f SUB Rd,Re,Rf
SUB Rd,Re,Rf SW d,Rd
SW d,Rd
Computer Architecture
Branch Stall Impact
• Solution:
– Determine branch taken or not sooner, AND
– Compute taken branch address earlier
Computer Architecture
Branch Hazard Alternatives
Computer Architecture
Branch Hazard Alternatives
Delayed Branch
– Define branch to take place AFTER a following instruction
branch instruction
sequential successor1
sequential successor2 Branch delay of length n
........
sequential successorn
branch target if taken
– MIPS, SPARC
– Experience has shown it has issues:
• Makes code difficult to maintain and debug
• Compiler has to find an instruction which is safe to
execute regarless if the branch is taken or not
• If the hardware changes, old programs may not work at all
Computer Architecture
Scheduling Branch Delay Slots
• A is the best choice, fills delay slot & reduces instruction count (IC)
• In B, the sub instruction may need to be copied, increasing IC
• In B and C, must be okay to execute sub when branch fails
Computer Architecture
Delayed Branch
Computer Architecture
Exceptions and Interrupts
Computer Architecture
Exceptions and Interrupts
Computer Architecture
Precise Exceptions (Sequential Processor)
Computer Architecture