Download as pdf or txt
Download as pdf or txt
You are on page 1of 28

Pipeline Hazards

Outline
❖ Pipeline Hazards

❖ Data Hazards and Forwarding

❖ Load Delay, Hazard Detection, and Stall

❖ Control Hazards

❖ Delayed Branch and Dynamic Branch Prediction

EC792 2
Pipeline Hazards
❖ Hazards: Situations that would cause incorrect execution
 If next instruction were launched during its designated clock cycle
1. Structural hazards
 Caused by resource contention - a required resource is busy
 Using same resource by two instructions during the same cycle
2. Data hazards
 An instruction may compute a result needed by next instruction
 Hardware can detect dependencies between instructions
 Need to wait for previous instruction to complete its data
read/write
3. Control hazards
 Caused by instructions that change control flow (branches/jumps)
 Delays in changing the flow of control
❖ Hazards complicate pipeline control and limit performance
EC792 3
Structural Hazards
❖ Problem: Shared code & data memory
❖ Conflict for use of a resource
❖ In RISC-V pipeline with a single memory
 Load/store requires data access
 Instruction fetch would have to stall for that cycle
▪ Would cause a pipeline “bubble”
❖ Hence, pipelined datapaths require separate
instruction/data memories
 Or separate instruction/data caches

EC792 4
Structural Hazards
❖ Problem: Conflict in accessing pipeline stages
 Attempt to use the same hardware resource by two different
instructions during the same cycle
Structural Hazard
❖ Example
Two instructions are
 Writing back ALU result in stage 4 attempting to write
 Conflict with writing load data in stage 5 the register file
during same cycle

lw R14, 8(R21) IF ID EX MEM WB


Instructions

ori R12, R19, 7 IF ID EX WB


sub R13, R18,R19 IF ID EX WB
sw R18, 10(R19) IF ID EX MEM

CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 Time

EC792 5
Structural Hazards
❖ Problem: Conflict in accessing common memory
 Attempt to use the same hardware resource by two different
instructions during the same cycle
❖ Example Structural Hazard
Two instructions are
 Reading data from the memory in stage 4 attempting to access
 Conflict with reading instruction in stage 1 memory during the
same cycle

lw R14, 8(R21) IF ID EX MEM WB


Instructions

ori R12, R19, 7 IF ID EX WB


sub R13, R18,R19 IF ID EX WB
sw R18, 10(R19) IF ID EX MEM

CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 Time

EC792 6
Resolving Structural Hazards
❖ Solution 1: Delay (Stall) Access to Resource
 Must have mechanism to delay instruction access to resource
▪ Hardware includes control logic that stalls until earlier instruction is
no longer using contended resource
 Delay all write backs to the register file to stage 5
▪ ALU instructions bypass stage 4 (memory) without doing anything
❖ Solution 2: Schedule instruction appropriately
 Programmer explicitly avoids scheduling instructions that would
create structural hazards.
❖ Solution 3: Add more hardware resources (more costly)
 Add more hardware to eliminate the structural hazard
▪ Have separate instruction and data memory
 Redesign the register file to have two write ports
▪ First write port can be used to write back ALU results in stage 4
▪ Second write port can be used to write back load data in stage 5
EC792 7
Next . . .
❖ Pipelining versus Serial Execution

❖ Pipelined Datapath and Control

❖ Pipeline Hazards

❖ Data Hazards and Forwarding

❖ Load Delay, Hazard Detection, and Stall

❖ Control Hazards

❖ Delayed Branch and Dynamic Branch Prediction

EC792 8
Data Hazards
❖ Dependency between instructions causes a data hazard
❖ Three types:
 Read after Write hazard (RAW)
 Write after Read hazard (WAR)
 Write after Write hazard (WAW)

❖ The dependent instructions are close to each other


 Pipelined execution might change the order of operand access

EC792 9
Data Hazards
❖ Read After Write – RAW Hazard
 Given two instructions I and J, where I comes before J
 Instruction J reads an operand after it is written by I
 Called a data dependence in compiler terminology
I: add R17, R18, R19 # R17 is written
J: sub R20, R17, R16 # R17 is read
 Hazard occurs when J reads the operand before I writes it

WB = Write Back
IF = Instruction Fetch ID = Decode & EX = Execute MEM = Memory
Register Read Access

Imm NPC2
Next
NPC

PC
+1 Imm26
ALU result 32
Imm16 zero
Instruction 32 Data
Register File

ALUout
Instruction

A
A

BusA

WB Data
Memory Rs 5 Memory
RA L 0
Instruction Rt 5 Address
0 RB E 1 U Data_out
32
1
PC

Address BusB
B

0 0 32
1 RW Data_in
D
1 BusW 32
Rd
32
clk 10
Example of a RAW Data Hazard
Time (cycles) CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8
value of R18 10 10 10 10 10 20 20 20
Program Execution Order

sub R18, R9, R11 IF Reg ALU DM Reg

add R20, R18, R13 IF Reg ALU DM Reg

or R22, R11, R18 IF Reg ALU DM Reg

and R23, R12, R18 IF Reg ALU DM Reg

sw R24, 10(R18) IF Reg ALU DM

❖ Result of sub is needed by add, or, and, & sw instructions


❖ Instructions add & or will read old value of R18 from reg file
❖ During CC5, R18 is written at end of cycle
EC792 11
Solution 1: Stalling the Pipeline
Time (in cycles) CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9
value of R18 10 10 10 10 10 20 20 20 20
Instruction Order

sub R18, R9, R11 IF Reg ALU DM Reg

add R20, R18, R13 IF Reg Reg Reg Reg ALU DM Reg

stall stall stall


or R22, R11, R18 IF Reg ALU DM

❖ Three stall cycles (pipeline bubbles) during CC3 thru CC5


(wasting 3 cycles)
 Stall cycles delay execution of add & fetching of or instruction
❖ The add instruction cannot read R18 until beginning of CC6
 The add instruction remains in the Instruction register until CC6
 The PC register is not modified until beginning of CC6

EC792 12
Solution 2: Forwarding ALU Result
❖ The ALU result is forwarded (fed back) to the ALU input
 No bubbles are inserted into the pipeline and no cycles are wasted
❖ ALU result is forwarded from ALU, MEM, and WB stages
Time (cycles) CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8
value of R18 10 10 10 10 10 20 20 20
Program Execution Order

sub R18, R9, R11 IF Reg ALU DM Reg forward a=0

add R20, R18, R13 IF Reg ALU DM Reg forwardA=1

or R22, R11, R18 IF Reg ALU DM Reg forwardA=0


forwardB=2

and R23, R22, R18 IF Reg ALU DM Reg forwardA=1


forwardB=3

sw R24, 10(R18) IF Reg ALU DM

EC792 13
Implementing Forwarding
❖ Two multiplexers added at the inputs of A & B registers
 Data from ALU stage, MEM stage, and WB stage is fed back
❖ Two signals: ForwardA and ForwardB control forwarding
ForwardA

Im26
Imm26 Imm16
32 32 ALU result

0 A

Result
Register File

Rs BusA 1
L Address
A
RA 2
Instruction

Rt 3 E U Data 0

WData
1
RB BusB
0 Memory
0 32
1 32 32 Data_out 1
RW

D
B

BusW 2 Data_in
3
32
Rd2

Rd4
Rd3
0
1
Rd

clk
ForwardB

EC792 14
Forwarding Control Signals
Signal Explanation
ForwardA = 0 First ALU operand comes from register file = Value of (Rs)

ForwardA = 1 Forward result of previous instruction to A (from ALU stage)

ForwardA = 2 Forward result of 2nd previous instruction to A (from MEM stage)

ForwardA = 3 Forward result of 3rd previous instruction to A (from WB stage)

ForwardB = 0 Second ALU operand comes from register file = Value of (Rt)

ForwardB = 1 Forward result of previous instruction to B (from ALU stage)

ForwardB = 2 Forward result of 2nd previous instruction to B (from MEM stage)

ForwardB = 3 Forward result of 3rd previous instruction to B (from WB stage)

EC792 15
Forwarding Example
Instruction sequence: When sub instruction is in decode stage
lw R12, 4(R8) ori will be in the ALU stage
ori R15, R9, 2
lw will be in the MEM stage
sub R11, R12, R15
ForwardA = 2 from MEM stage ForwardB = 1 from ALU stage
sub R11,R12,R15 ori R15, R9,2 lw R12,4(R8)
Imm26

Imm16 2

Imm
ext
32 32 ALU result

0 A

Result
Register File

Rs BusA 1 A
L Address
RA 2
Instruction

Rt 3 U Data 0

WData
1
RB BusB
0 Memory
0 32
1 32 32 Data_out 1
RW

D
B

BusW 2 Data_in
3
32
Rd2

Rd3

Rd4
0
1
Rd 1

clk

EC792 16
RAW Hazard Detection
❖ Current instruction being decoded is in Decode stage
 Previous instruction is in the Execute stage
 Second previous instruction is in the Memory stage
 Third previous instruction in the Write Back stage

If ((Rs != 0) and (Rs == Rd2) and (EX.RegWrite)) ForwardA  1


Else if ((Rs != 0) and (Rs == Rd3) and (MEM.RegWrite)) ForwardA  2
Else if ((Rs != 0) and (Rs == Rd4) and (WB.RegWrite)) ForwardA  3
Else ForwardA  0

If ((Rt != 0) and (Rt == Rd2) and (EX.RegWrite)) ForwardB  1


Else if ((Rt != 0) and (Rt == Rd3) and (MEM.RegWrite)) ForwardB  2
Else if ((Rt != 0) and (Rt == Rd4) and (WB.RegWrite)) ForwardB  3
Else ForwardB  0

EC792 17
Hazard Detect and Forward Logic
Imm26

Im26
32 32 ALU result
E
0 A

Result
Register File
Rs BusA 1
L Address

A
RA 2
Instruction

Rt 3 U Data 0

WData
1
RB BusB
0 Memory
0 32
Rd 1 32 32 Data_out 1
RW

D
B
BusW 2 Data_in
3 ALUCtrl 32

Rd2

Rd3

Rd4
0
1

clk
RegDst
ForwardB ForwardA

Hazard Detect
and Forward
func
RegWrite RegWrite RegWrite
Op Main
EX

& ALU
MEM

Control

WB
EC792 18
Next . . .
❖ Pipelining versus Serial Execution

❖ Pipelined Datapath and Control

❖ Pipeline Hazards

❖ Data Hazards and Forwarding

❖ Load Delay, Hazard Detection, and Pipeline Stall

❖ Control Hazards

❖ Delayed Branch and Dynamic Branch Prediction

EC792 19
Load Delay
❖ Unfortunately, not all data hazards can be forwarded
 Load has a delay that cannot be eliminated by forwarding
❖ In the example shown below …
 The LW instruction does not read data until end of CC4
 Cannot forward data to ADD at end of CC3 - NOT possible
Time (cycles) CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8

lw R18, 20(R17) ALU


However, load can
IF Reg DM Reg
Program Order

forward data to
2nd next and later
add R20, R18, R13 IF Reg ALU DM Reg
instructions

or R14, R11, R18 IF Reg ALU DM Reg

and R15, R18, R12 IF Reg ALU DM Reg

EC792 20
Detecting RAW Hazard after Load
❖ Detecting a RAW hazard after a Load instruction:
 The load instruction will be in the EX stage
 Instruction that depends on the load data is in the decode stage

❖ Condition for stalling the pipeline


if ((EX.MemRead == 1) // Detect Load in EX stage
and (ForwardA==1 or ForwardB==1)) Stall // RAW Hazard

❖ Insert a bubble into the EX stage after a load instruction


 Bubble is a no-op that wastes one clock cycle
 Delays the dependent instruction after load by once cycle
▪ Because of RAW hazard

EC792 21
Stall the Pipeline for one Cycle
❖ ADD instruction depends on LW ➔ stall at CC3
 Allow Load instruction in ALU stage to proceed
 Freeze PC and Instruction registers (NO instruction is fetched)
 Introduce a bubble into the ALU stage (bubble is a NO-OP)
❖ Load can forward data to next instruction after delaying it
Time (cycles) CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8

lw R18, 20(R17) IF Reg ALU DM Reg


Program Order

add R20, R18, R13 IF stall bubble bubble bubble

Reg ALU DM Reg

or R14, R11, R18 IF Reg ALU DM Reg

EC792 22
Showing Stall Cycles
❖ Stall cycles can be shown on instruction-time diagram
❖ Hazard is detected in the Decode stage
❖ Stall indicates that instruction is delayed
❖ Instruction fetching is also delayed after a stall
❖ Example:
Data forwarding is shown using blue & red arrows

lw R17, (R13) IF ID EX MEM WB


lw R18, 8(R17) IF Stall ID EX MEM WB
add R2, R18, R11 IF Stall ID EX MEM WB
sub R1, R18, R2 IF ID EX MEM WB

CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 CC10 Time
R2 data is available at ex stage and for sub alu i/p
is given by r2 and r18
EC792 23
Hazard Detect, Forward, and Stall
Imm26

Im26
32 32 ALU result
E
0 A

Result
Register File
Rs BusA 1
L Address

A
RA 2
Instruction

Rt 3 U Data 0

WData
1
RB BusB
PC

0 Memory
0 32
Rd 1 32 32 Data_out 1
RW

D
B
BusW 2 Data_in
3 32

Rd2

Rd3

Rd4
0
1

clk
RegDst
ForwardB ForwardA
Disable PC

func
Hazard Detect
Forward, & Stall

MemRead
Stall RegWrite RegWrite
Op Control Signals RegWrite
Main & ALU 0
EX

Control Bubble 1 MEM

WB
=0

EC792 24
Code Scheduling to Avoid Stalls
❖ Compilers reorder code in a way to avoid load stalls
❖ Consider the translation of the following statements:
A = B + C; D = E – F; // A thru F are in Memory

❖ Slow code: ❖ Fast code: No Stalls


lw R8, 4(R16) # &B = 4(R16) lw R8, 4(R16)
lw R9, 8(R16) # &C = 8(R16) lw R9, 8(R16)
add R10, R8, R9 # stall cycle lw R11, 16(R16)
sw R10, 0(R16) # &A = 0(R16) lw R12, 20(R16)
lw R11, 16(R16) # &E = 16(R16) add R10, R8, R9
lw R12, 20(R16) # &F = 20(R16) sw R10, 0(R16)
sub R13, R11, R12 # stall cycle sub R13, R11, R12
sw R13, 24(R16) # &D = 24(R16) sw R13, 24(R16)

EC792 25
Name Dependence: Write After Read
❖ Instruction J writes its result after it is read by I
❖ Called anti-dependence by compiler writers
I: sub R12, R9, R11 # R9 is read
J: add R9, R10, R11 # R9 is written
❖ Results from reuse of the name R9
❖ NOT a data hazard in the 5-stage pipeline because:
 Reads are always in stage 2
 Writes are always in stage 5, and
 Instructions are processed in order
❖ Anti-dependence can be eliminated by renaming
 Use a different destination register for add (eg, R13)
EC792 26
Name Dependence: Write After Write
❖ Same destination register is written by two instructions
❖ Called output-dependence in compiler terminology
I: sub R9, R12, R11 # R9 is written
J: add R9, R10, R11 # R9 is written again
❖ Not a data hazard in the 5-stage pipeline because:
 All writes are ordered and always take place in stage 5
❖ However, can be a hazard in more complex pipelines
 If instructions are allowed to complete out of order, and
 Instruction J completes and writes R9 before instruction I
❖ Output dependence can be eliminated by renaming R9
❖ Read After Read is NOT a name dependence
EC792 27
Hazards due to Loads and Stores
❖ Consider the following statements:
sw R10, 0(R1)
lw R11, 6(R5)
Is there any Data Hazard possible in the above code sequence?
If 0(R1) == 6(R5), then it means the same memory location!!
 Data dependency through Data Memory
 But no Data Hazard since the memory is not accessed by the two
instructions simultaneously – writing and reading happens in two
consecutive clock cycles
 But, in an out-of-order execution processor, this can lead to a
Hazard!
❖ Solving data hazards: Stall / Schedule / Forward / Speculate
 Speculate: Guess that there is not a problem, if incorrect kill speculative
instruction and restart (later in the course)
EC792 28

You might also like