Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

Data Hazards

Time (in clock cycles)

IM

ALU

Reg

Reg

IM

AND R6, R1, R7

OR R8, R1, R9

CC 4

DM

Reg

IM

CC 5

CC 6

Reg

DM

Reg

Reg

DM

ALU

or
Danger!Danger!Danger!

SUB R4, R1, R5

IM

CC 3

ALU

Pipeline Hazards

CC 2

ALU

Program execution order (in instructions)

ADD R1, R2, R3

CC 1

XOR R10, R1, R11


IM

CSE 240A

Dean Tullsen

Data Hazard
lw R8, 10000(R3)

ID/EX

IF/ID
4

ADD

add R6, R2, R1

M
u
x

addi R3, R1, #35

MEM/WB

Instruction
memory

IR

IR 11..15
MEM/WB.IR

Registers

M
u
x
M
u
x

16

CSE 240A

Sign
extend

Data hazards are caused by data dependences


Data dependences, and thus data hazards, come in
3 flavors (not all of which apply to this pipeline).
RAW (read-after-write)
WAW (write-after-write)
WAR (write-after-read)

Branch
taken

IR 6..10
PC

Dean Tullsen

Data Dependence

EX/MEM

Zero?

Reg

CSE 240A

ALU
Data
memory

M
u
x

32

Dean Tullsen

CSE 240A

Dean Tullsen

RAW Hazard

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

RAW hazard is extremely common

CSE 240A

Dean Tullsen

CSE 240A

Dealing with Data Hazards


through Forwarding

WAR Hazard

Dean Tullsen

later instruction tries to write an operand before earlier instruction


reads it
The dependence
add R1, R2, R3
sub R2, R5, R4
The hazard?
add R1, R2, R3
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
sub R2, R5, R4
WAR hazard is uncommon/impossible in a reasonable (in-order)
pipeline

ADD R1, R2, R3

SUB R4, R1, R5

AND R6, R1, R7

OR R8, R1, R9

CC 1

CC 2

IM

Reg

IM

CC 3

Reg

IM

CC 4

DM

Reg

IM

CC 5

CC 6

Reg

DM

Reg

Reg

DM

ALU

The hazard
add R1, R2, R3
sub R5, R1, R4

ALU

later instruction tries to write an operand before earlier


instruction writes it
The dependence
add R1, R2, R3
sub R1, R2, R4
The hazard
ID
EX
MEM MEM2 MEM3
lw R1, R2, R3 IF
IF
ID
EX
MEM
WB
sub R1, R2, R4
WAW hazard possible in a reasonable pipeline, but not in
the very simple pipeline were assuming.

ALU

later instruction tries to read an operand before earlier instruction


writes it
The dependence
add R1, R2, R3
sub R5, R1, R4

ALU

WAW Hazard

XOR R10, R1, R11


IM

CSE 240A

Dean Tullsen

CSE 240A

Reg

Dean Tullsen

WB

Dealing with Data Hazards through


Forwarding
AND R6, R4, R7

ID/EX

IF/ID

Zero?

MEM/WB

Branch
taken

IR 6..10

Instruction
memory

M
u
x

IR 11..15

IR

Registers

MEM/WB.IR

ALU

M
u
x

16

Sign
extend

Data
memory

M
u
x

(Im letting ADD stand in for all ALU operations)

32

CSE 240A

Dean Tullsen

CSE 240A

Dean Tullsen

More Forwarding
CC 1

CC 2

CC 3

CC 4

Forwarding and Stalling


CC 5

IM

Reg

ALU

LW

ADD R1, R2, R3

DM

R1, 0(R2)

Reg

ALU

IM

DM

Reg

ALU

IM

Reg

CC 4

CC 5

CC 6

DM

Reg

IM

Reg

Bubble

IM

Bubble

Reg

Bubble

IM

DM

DM
OR R8, R1, R9

CSE 240A

IM

CC 3

Reg

AND R6, R1, R7

SW 12(R1), R4

CC 2

Reg

SUB R4, R1, R5

LW R4, 0(R1)

CC 1

CC 6

ALU

PC

ADD -> ADD


ADD -> LW
ADD -> SW (2 operands)
LW -> ADD
LW -> LW
LW -> SW (2 operands)

ALU

ADD

EX/MEM

M
u
x

ADD R1, R2, R3

ALU

SUB R4, R1, R5

Forwarding Options

Dean Tullsen

CSE 240A

Reg

Dean Tullsen

Example

Avoiding Pipeline Stalls


lw R1, 1000(R2)
lw R3, 2000(R2)
add R4, R1, R3
lw R1, 3000(R2)
add R6, R4, R1
sw R6, 1000(R2)

ADD R1, R2, R3


SW R1, 1000(R2)
LW R7, 2000(R2)
ADD R5, R7, R1
LW R8, 2004(R2)
SW R7, 2008(R8)

ADD R8, R8, R2

this is a compiler technique called instruction scheduling.

LW R9, 1012(R8)
SW R9, 1016(R8)
CSE 240A

How big a problem are these pipeline


stalls?

Detecting ALU Input Hazards

13% of the loads in FP programs


25% of the loads in integer programs

ID/EX

opcode rd rs1 rs2

Dean Tullsen

CSE 240A

Dean Tullsen

CSE 240A

EX/MEM

MEM/WB

opcode rd

Dean Tullsen

opcode rd

CSE 240A

ALU

Dean Tullsen

Inserting Bubbles

Adding Datapaths

Set all control values in the EX/MEM register to safe


values (equivalent to a nop)
Keep same values in the ID/EX register and IF/ID register
Keep PC from incrementing

ID/EX

EX/MEM

MEM/WB

Zero?

M
u
x

ALU
Data
memory

M
u
x

CSE 240A

Dean Tullsen

CSE 240A

Dean Tullsen

Control Hazards

Branch Hazards

Instructions are not only dependent on instructions that


produce their operands, but also on all previous control
flow (branch, jump) instructions that lead to that
instruction.
add
add
bne
sub
and
beq

Branch

IF

I2
I3
I4
correct instruction

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

branch taken
and
sub

CSE 240A

Dean Tullsen

CSE 240A

Dean Tullsen

Branch Stall Impact

New Datapath
ID/EX

If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!


Two part solution:

ADD
IF/ID

Determine branch taken or not sooner, AND


Compute taken branch address earlier

ADD

IR
IR
Instruction
memory

Dean Tullsen

correct instruction
I4
I5

CSE 240A

MEM/WB.IR

Registers

ALU

Sign
extend

Data

M
u
x

memory

32

CSE 240A

Branch Hazards

I2

6..10

11..15

16

CSE 240A

IF

IR

M
u
x

Move Zero test to ID/RF stage


Adder to calculate new PC in ID/RF stage
1 clock cycle penalty for branch versus 3

Branch

M
u
x

PC

(limited MIPS) branch tests if register = 0 or 0


MIPS Solution:

MEM/WB

EX/MEM
Zero?

Dean Tullsen

What We Know About Branches

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

more conditional branches than unconditional


more forward than backward
67% of branches taken
backward branches taken 80%

WB

Dean Tullsen

CSE 240A

Dean Tullsen

Four Branch Hazard Alternatives

Four Branch Hazard Alternatives


#1: Stall until branch direction is clear
#2: Predict Branch Not Taken

#4: Delayed Branch


Define branch to take place AFTER a following instruction

Execute successor instructions in sequence


Squash instructions in pipeline if branch actually taken
Advantage of late pipeline state update
33% MIPS branches not taken on average
PC+4 already calculated, so use it to get next instruction

branch instruction
sequential successor1
sequential successor2
........
sequential successorn
branch target if taken

#3: Predict Branch Taken


67% MIPS branches taken on average
But havent calculated branch target address in this MIPS architecture

1 slot delay allows proper decision and branch target address in 5 stage
pipeline
MIPS uses this

MIPS still incurs 1 cycle branch penalty


Other machines: branch target known before outcome

CSE 240A

Dean Tullsen

CSE 240A

Delayed Branch

Dean Tullsen

Key Points

Where to get instructions to fill branch delay slot?

Branch delay of length n

Before branch instruction


From the target address: only valuable when branch taken
From fall through: only valuable when branch not taken
Cancelling branches allow more slots to be filled

Compiler effectiveness for single branch delay slot:

Hard to keep the pipeline completely full


Data Hazards require dependent instructions to wait for the
producer instruction
Most of the problem handled with forwarding (bypassing)
Sometimes stall still required (especially in modern processors)

Control hazards require control-dependent (post-branch)


instructions to wait for the branch to be resolved

Fills about 60% of branch delay slots


About 80% of instructions executed in branch delay slots useful in
computation
About 50% (60% x 80%) of slots usefully filled
CSE 240A

Dean Tullsen

CSE 240A

Dean Tullsen

You might also like