Data Hazards: Danger!Danger!Danger!

Data Hazards
Time (in clock cycles)
IM
ALU
Reg
Reg
IM
AND R6, R1, R7
OR R8, R1, R9
CC 4
DM
Reg
IM
CC 5
CC 6
Reg
DM
Reg
Reg
DM
ALU
or
Danger!Danger!Danger!
SUB R4, R1, R5
IM
CC 3
ALU
Pipeline Hazards
CC 2
ALU
Program execution order (in instructions)
ADD R1, R2, R3
CC 1
XOR R10, R1, R11

IM
CSE 240A
Dean Tullsen
Data Hazard
lw R8, 10000(R3)
ID/EX
IF/ID
4
ADD
add R6, R2, R1
M
u
x
addi R3, R1, #35
MEM/WB
Instruction
memory
IR
IR 11..15
MEM/WB.IR
Registers
M
u
x
M
u
x
16
CSE 240A
Sign
extend
Data hazards are caused by data dependences

Data dependences, and thus data hazards, come in
3 flavors (not all of which apply to this pipeline).
RAW (read-after-write)
WAW (write-after-write)
WAR (write-after-read)
Branch
taken
IR 6..10
PC
Dean Tullsen
Data Dependence
EX/MEM
Zero?
Reg
CSE 240A
ALU
Data
memory
M
u
x
32
Dean Tullsen
CSE 240A
Dean Tullsen
RAW Hazard
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
RAW hazard is extremely common
CSE 240A
Dean Tullsen
CSE 240A
Dealing with Data Hazards

through Forwarding
WAR Hazard
Dean Tullsen
later instruction tries to write an operand before earlier instruction

reads it
The dependence
add R1, R2, R3
sub R2, R5, R4
The hazard?
add R1, R2, R3
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
sub R2, R5, R4
WAR hazard is uncommon/impossible in a reasonable (in-order)
pipeline
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
CC 1
CC 2
IM
Reg
IM
CC 3
Reg
IM
CC 4
DM
Reg
IM
CC 5
CC 6
Reg
DM
Reg
Reg
DM
ALU
The hazard
add R1, R2, R3
sub R5, R1, R4
ALU
later instruction tries to write an operand before earlier

instruction writes it
The dependence
add R1, R2, R3
sub R1, R2, R4
The hazard
ID
EX
MEM MEM2 MEM3
lw R1, R2, R3 IF
IF
ID
EX
MEM
WB
sub R1, R2, R4
WAW hazard possible in a reasonable pipeline, but not in
the very simple pipeline were assuming.
ALU
later instruction tries to read an operand before earlier instruction

writes it
The dependence
add R1, R2, R3
sub R5, R1, R4
ALU
WAW Hazard
XOR R10, R1, R11

IM
CSE 240A
Dean Tullsen
CSE 240A
Reg
Dean Tullsen
WB
Dealing with Data Hazards through

Forwarding
AND R6, R4, R7
ID/EX
IF/ID
Zero?
MEM/WB
Branch
taken
IR 6..10
Instruction
memory
M
u
x
IR 11..15
IR
Registers
MEM/WB.IR
ALU
M
u
x
16
Sign
extend
Data
memory
M
u
x
(Im letting ADD stand in for all ALU operations)
32
CSE 240A
Dean Tullsen
CSE 240A
Dean Tullsen
More Forwarding
CC 1
CC 2
CC 3
CC 4
Forwarding and Stalling

CC 5
IM
Reg
ALU
LW
ADD R1, R2, R3
DM
R1, 0(R2)
Reg
ALU
IM
DM
Reg
ALU
IM
Reg
CC 4
CC 5
CC 6
DM
Reg
IM
Reg
Bubble
IM
Bubble
Reg
Bubble
IM
DM
DM
OR R8, R1, R9
CSE 240A
IM
CC 3
Reg
AND R6, R1, R7
SW 12(R1), R4
CC 2
Reg
SUB R4, R1, R5
LW R4, 0(R1)
CC 1
CC 6
ALU
PC
ADD -> ADD

ADD -> LW
ADD -> SW (2 operands)
LW -> ADD
LW -> LW
LW -> SW (2 operands)
ALU
ADD
EX/MEM
M
u
x
ADD R1, R2, R3
ALU
SUB R4, R1, R5
Forwarding Options
Dean Tullsen
CSE 240A
Reg
Dean Tullsen
Example
Avoiding Pipeline Stalls

lw R1, 1000(R2)
lw R3, 2000(R2)
add R4, R1, R3
lw R1, 3000(R2)
add R6, R4, R1
sw R6, 1000(R2)
ADD R1, R2, R3

SW R1, 1000(R2)
LW R7, 2000(R2)
ADD R5, R7, R1
LW R8, 2004(R2)
SW R7, 2008(R8)
ADD R8, R8, R2
this is a compiler technique called instruction scheduling.
LW R9, 1012(R8)
SW R9, 1016(R8)
CSE 240A
How big a problem are these pipeline

stalls?
Detecting ALU Input Hazards
13% of the loads in FP programs

25% of the loads in integer programs
ID/EX
opcode rd rs1 rs2
Dean Tullsen
CSE 240A
Dean Tullsen
CSE 240A
EX/MEM
MEM/WB
opcode rd
Dean Tullsen
opcode rd
CSE 240A
ALU
Dean Tullsen
Inserting Bubbles
Adding Datapaths
Set all control values in the EX/MEM register to safe

values (equivalent to a nop)
Keep same values in the ID/EX register and IF/ID register
Keep PC from incrementing
ID/EX
EX/MEM
MEM/WB
Zero?
M
u
x
ALU
Data
memory
M
u
x
CSE 240A
Dean Tullsen
CSE 240A
Dean Tullsen
Control Hazards
Branch Hazards
Instructions are not only dependent on instructions that

produce their operands, but also on all previous control
flow (branch, jump) instructions that lead to that
instruction.
add
add
bne
sub
and
beq
Branch
IF
I2
I3
I4
correct instruction
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
branch taken
and
sub
CSE 240A
Dean Tullsen
CSE 240A
Dean Tullsen
Branch Stall Impact
New Datapath
ID/EX
If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!

Two part solution:
ADD
IF/ID
Determine branch taken or not sooner, AND

Compute taken branch address earlier
ADD
IR
IR
Instruction
memory
Dean Tullsen
correct instruction
I4
I5
CSE 240A
MEM/WB.IR
Registers
ALU
Sign
extend
Data
M
u
x
memory
32
CSE 240A
Branch Hazards
I2
6..10
11..15
16
CSE 240A
IF
IR
M
u
x
Move Zero test to ID/RF stage

Adder to calculate new PC in ID/RF stage
1 clock cycle penalty for branch versus 3
Branch
M
u
x
PC
(limited MIPS) branch tests if register = 0 or 0

MIPS Solution:
MEM/WB
EX/MEM
Zero?
Dean Tullsen
What We Know About Branches
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
more conditional branches than unconditional

more forward than backward
67% of branches taken
backward branches taken 80%
WB
Dean Tullsen
CSE 240A
Dean Tullsen
Four Branch Hazard Alternatives
Four Branch Hazard Alternatives

#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
#4: Delayed Branch

Define branch to take place AFTER a following instruction
Execute successor instructions in sequence

Squash instructions in pipeline if branch actually taken
Advantage of late pipeline state update
33% MIPS branches not taken on average
PC+4 already calculated, so use it to get next instruction
branch instruction
sequential successor1
sequential successor2
........
sequential successorn
branch target if taken
#3: Predict Branch Taken

67% MIPS branches taken on average
But havent calculated branch target address in this MIPS architecture
1 slot delay allows proper decision and branch target address in 5 stage
pipeline
MIPS uses this
MIPS still incurs 1 cycle branch penalty

Other machines: branch target known before outcome
CSE 240A
Dean Tullsen
CSE 240A
Delayed Branch
Dean Tullsen
Key Points
Where to get instructions to fill branch delay slot?
Branch delay of length n
Before branch instruction

From the target address: only valuable when branch taken
From fall through: only valuable when branch not taken
Cancelling branches allow more slots to be filled
Compiler effectiveness for single branch delay slot:
Hard to keep the pipeline completely full

Data Hazards require dependent instructions to wait for the
producer instruction
Most of the problem handled with forwarding (bypassing)
Sometimes stall still required (especially in modern processors)
Control hazards require control-dependent (post-branch)

instructions to wait for the branch to be resolved
Fills about 60% of branch delay slots

About 80% of instructions executed in branch delay slots useful in
computation
About 50% (60% x 80%) of slots usefully filled
CSE 240A
Dean Tullsen
CSE 240A
Dean Tullsen

Data Hazards: Danger!Danger!Danger!

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Hazards: Danger!Danger!Danger!

Uploaded by

Copyright:

Available Formats

Data Hazards

Time (in clock cycles)

AND R6, R1, R7

SUB R4, R1, R5

Program execution order (in instructions)

ADD R1, R2, R3

XOR R10, R1, R11

add R6, R2, R1

addi R3, R1, #35

Data hazards are caused by data dependences

RAW hazard is extremely common

Dealing with Data Hazards

later instruction tries to write an operand before earlier instruction

ADD R1, R2, R3

SUB R4, R1, R5

AND R6, R1, R7

later instruction tries to write an operand before earlier

later instruction tries to read an operand before earlier instruction

XOR R10, R1, R11

Dealing with Data Hazards through

(Im letting ADD stand in for all ALU operations)

Forwarding and Stalling

ADD R1, R2, R3

AND R6, R1, R7

SUB R4, R1, R5

ADD -> ADD

ADD R1, R2, R3

SUB R4, R1, R5

Avoiding Pipeline Stalls

ADD R1, R2, R3

ADD R8, R8, R2

this is a compiler technique called instruction scheduling.

How big a problem are these pipeline

Detecting ALU Input Hazards

13% of the loads in FP programs

opcode rd rs1 rs2

Set all control values in the EX/MEM register to safe

Instructions are not only dependent on instructions that

Branch Stall Impact

If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!

Determine branch taken or not sooner, AND

Move Zero test to ID/RF stage

(limited MIPS) branch tests if register = 0 or 0

What We Know About Branches

more conditional branches than unconditional

Four Branch Hazard Alternatives

Four Branch Hazard Alternatives

#4: Delayed Branch

Execute successor instructions in sequence

#3: Predict Branch Taken

MIPS still incurs 1 cycle branch penalty

Where to get instructions to fill branch delay slot?

Branch delay of length n

Before branch instruction

Compiler effectiveness for single branch delay slot:

Hard to keep the pipeline completely full

Control hazards require control-dependent (post-branch)

Fills about 60% of branch delay slots

You might also like