2 Pipelining 2
Architecture
Lecture – 15, 16: Pipelined Processor – Data and Control
Hazards
07, 08 September 2020
Data Hazards
These occur when at any time, there are instructions
active that need to access the same data (memory or
register) locations.
instruction A
instruction B
I: add r1,r2,r3
J: sub r4,r1,r3
• Caused by a “Dependence” (in compiler nomenclature). This hazard
results from an actual need for communication.
Data Hazards
Write After Read (WAR)
InstrJ tries to write operand before InstrI reads it
– InstrI gets wrong operand
Execution order is: InstrI, then InstrJ
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Write After Write (WAW)
InstrJ tries to write operand before InstrI writes it
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
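The register dependences in the examples above can be found mechanically by comparing destination and source registers. The sketch below is illustrative only: it handles just the three-register `op rd,rs,rt` form used on the slides, the names `Instr`, `parse`, and `dependences` are made up for this example, and whether a dependence becomes an actual hazard depends on the pipeline's timing.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def parse(text: str) -> Instr:
    """Parse 'op rd,rs,rt' into its destination and source registers."""
    op, operands = text.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(op, regs[0], tuple(regs[1:]))


def dependences(first: Instr, second: Instr) -> Set[str]:
    """Classify register dependences of `second` on an earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")   # second writes what first reads
    if first.dest == second.dest:
        found.add("WAW")   # both write the same register
    return found


if __name__ == "__main__":
    print(dependences(parse("add r1,r2,r3"), parse("sub r4,r1,r3")))
    print(dependences(parse("sub r4,r1,r3"), parse("add r1,r2,r3")))
    print(dependences(parse("sub r1,r4,r3"), parse("add r1,r2,r3")))
```

In each example the following `mul r6,r1,r7` reads `r1` after the `add` writes it, which the same function reports as a RAW dependence.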
• Bypass/forward/short-circuit
• Use data before it is in the register
  + reduces/avoids stalls
  – complex
• Crucial for common RAW hazards
Data Hazards
Time (clock cycles); stages: IF, ID/RF, EX, MEM, WB
Instruction order:
add r1,r2,r3     Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r3             Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                     Ifetch  Reg     ALU     DMem    Reg
or  r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg
xor r10,r1,r11                                   Ifetch  Reg     ALU     DMem    Reg
The use of the result of the ADD instruction in the next three instructions causes a hazard,
since the register is not written until after those instructions read it.
Data Hazards
Forwarding To Avoid Data Hazard
Forwarding is the concept of making data available to the input of the ALU for
subsequent instructions, even though the generating instruction hasn’t gotten to
WB in order to write the memory or registers.
Time (clock cycles)
Instruction order:
add r1,r2,r3     Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r3             Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                     Ifetch  Reg     ALU     DMem    Reg
or  r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg
xor r10,r1,r11                                   Ifetch  Reg     ALU     DMem    Reg
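The effect of forwarding on the two diagrams above can be expressed as a stall count. The following is a minimal sketch, not taken from the lecture: the function name, its parameters, and the `split_cycle_regfile` option (a register file that writes in the first half of a cycle and reads in the second half) are assumptions made for illustration, and the timing assumes the classic five-stage pipeline shown on the slides.

```python
def stalls_needed(distance: int,
                  producer_is_load: bool,
                  forwarding: bool,
                  split_cycle_regfile: bool = True) -> int:
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Assumed timing: the producer is in IF at cycle 1, EX at 3, MEM at 4, WB at 5.
    The consumer reads registers in ID (cycle 2 + distance) and uses its
    operands in EX (cycle 3 + distance).
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if not forwarding:
        # The value must be in the register file before the consumer's ID stage.
        # With a split-cycle register file, ID may share a cycle with WB.
        earliest_gap = 3 if split_cycle_regfile else 4
        return max(0, earliest_gap - distance)
    if producer_is_load:
        # Load data exists only after MEM, so the next instruction waits one cycle.
        return max(0, 2 - distance)
    # ALU results are forwarded from the end of EX straight to the next EX.
    return 0


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d,
              stalls_needed(d, producer_is_load=False, forwarding=False),
              stalls_needed(d, producer_is_load=False, forwarding=True))
```

Passing `split_cycle_regfile=False` reproduces the slide's statement that all three instructions after the `add` are affected when nothing is forwarded.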
Data Hazards
The data isn’t loaded until after the MEM stage.
Time (clock cycles)
Instruction order:
lw  r1, 0(r2)    Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r6             Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                     Ifetch  Reg     ALU     DMem    Reg
or  r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg
There are some instances where hazards occur, even with forwarding.
Data Hazards
The stall is necessary as shown here.
Time (clock cycles)
Instruction order:
lw  r1, 0(r2)    Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r6             Ifetch  Reg     Bubble  ALU     DMem    Reg
and r6,r1,r7                     Ifetch  Bubble  Reg     ALU     DMem    Reg
or  r8,r1,r9                             Bubble  Ifetch  Reg     ALU     DMem
There are some instances where hazards occur, even with forwarding.
Pipeline interlock – detects hazards and stalls.
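The interlock check can be sketched as a comparison between two pipeline registers. This is an illustrative sketch only: the class and field names (`IdEx`, `IfId`, `mem_read`, `dest`, `src1`, `src2`) are hypothetical and chosen for readability, and the check assumes full forwarding, so only a load immediately followed by a consumer forces a stall.

```python
from dataclasses import dataclass


@dataclass
class IdEx:
    """Hypothetical view of the instruction currently entering EX."""
    mem_read: bool   # True when the instruction is a load
    dest: str        # register the instruction will write


@dataclass
class IfId:
    """Hypothetical view of the instruction currently in ID."""
    src1: str
    src2: str


def must_stall(in_ex: IdEx, in_id: IfId) -> bool:
    """Load-use interlock: stall when the load ahead feeds the instruction in ID."""
    return in_ex.mem_read and in_ex.dest in (in_id.src1, in_id.src2)


if __name__ == "__main__":
    lw = IdEx(mem_read=True, dest="r1")       # lw  r1, 0(r2)
    sub = IfId(src1="r1", src2="r6")          # sub r4,r1,r6
    print(must_stall(lw, sub))
```

When `must_stall` is true, the hardware would hold the instructions in IF and ID for one cycle and send a bubble down the pipeline, as in the diagram above.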
Data Hazards
This is another representation of the stall.
Data Hazards
Pipeline Scheduling
Unscheduled code:
lw Rb, b
lw Rc, c
add Ra, Rb, Rc
sw a, Ra
lw Re, e
lw Rf, f
sub Rd, Re, Rf
sw d, Rd
Scheduled code:
lw Rb, b
lw Rc, c
lw Re, e
add Ra, Rb, Rc
lw Rf, f
sw a, Ra
sub Rd, Re, Rf
sw d, Rd
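To see why the reordering helps, the two sequences can be checked for load-use stalls. The sketch below is an assumption-laden illustration rather than the lecture's method: it assumes full forwarding (so only a load immediately followed by a consumer costs one cycle), it understands only the `lw`, `sw`, and three-register forms used on the slide, and the function names are made up for this example.

```python
def parse(line):
    """Return (op, dest_register, source_registers) for the slide's notation."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    op = op.lower()
    if op == "lw":                 # lw Rd, addr  -> writes Rd
        return op, args[0], []
    if op == "sw":                 # sw addr, Rs  -> reads Rs
        return op, None, [args[1]]
    return op, args[0], args[1:]   # op Rd, Rs, Rt


def load_use_stalls(program):
    """Count one stall for each instruction that uses the register loaded just before it."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, srcs = parse(line)
        if prev_op == "lw" and prev_dest in srcs:
            stalls += 1
        prev_op, prev_dest = op, dest
    return stalls


UNSCHEDULED = ["lw Rb, b", "lw Rc, c", "add Ra, Rb, Rc", "sw a, Ra",
               "lw Re, e", "lw Rf, f", "sub Rd, Re, Rf", "sw d, Rd"]
SCHEDULED = ["lw Rb, b", "lw Rc, c", "lw Re, e", "add Ra, Rb, Rc",
             "lw Rf, f", "sw a, Ra", "sub Rd, Re, Rf", "sw d, Rd"]

if __name__ == "__main__":
    print(load_use_stalls(UNSCHEDULED), load_use_stalls(SCHEDULED))
```

Under these assumptions the unscheduled order stalls twice (before `add` and before `sub`), while the scheduled order places an independent instruction after each critical load and does not stall.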
Data Hazards
Pipeline Scheduling
         scheduled   unscheduled
gcc      31%         54%
spice    14%         42%
tex      25%         65%
(branch instruction)    Ifetch  Reg     ALU     DMem    Reg
14: and r2,r3,r5                Ifetch  Reg     ALU     DMem    Reg
18: or  r6,r1,r7                        Ifetch  Reg     ALU     DMem    Reg
22: add r8,r1,r9                                Ifetch  Reg     ALU     DMem    Reg
36: xor r10,r1,r11                                      Ifetch  Reg     ALU     DMem    Reg
The PC is not written until the end of the 4th clock cycle!
Control Hazards
Branch Stall Impact
• If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!
(How did we get that 1.9? New CPI = base CPI + branch fraction × stall cycles = 1 + 0.30 × 3 = 1.9)
• Two part solution to this dramatic increase:
– Determine branch taken or not sooner, AND
– Compute taken branch address earlier
• MIPS Solution:
– Move Zero test to ID/RF stage
– Adder to calculate new PC in ID/RF stage
● must be fast
● can't afford to subtract
● compares with 0 are simple
● Greater-than, Less-than test sign-bit, but not-equal must OR all bits
● more general compares need ALU
– 1 clock cycle penalty for branch versus 3
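The branch stall arithmetic above can be checked with a one-line model. This is a minimal sketch assuming every branch pays the full penalty and nothing else stalls; the function and parameter names are illustrative, not from the lecture.

```python
def cpi_with_branch_stalls(base_cpi: float,
                           branch_fraction: float,
                           penalty_cycles: int) -> float:
    """Effective CPI when each branch adds `penalty_cycles` stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


if __name__ == "__main__":
    # 30% branches with a 3-cycle stall, then with the 1-cycle penalty
    # obtained by resolving the branch in the ID/RF stage.
    print(round(cpi_with_branch_stalls(1.0, 0.30, 3), 2))
    print(round(cpi_with_branch_stalls(1.0, 0.30, 1), 2))
```

With the same 30% branch frequency, cutting the penalty from 3 cycles to 1 lowers the modeled CPI from about 1.9 to about 1.3.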
Control Hazards
Four Branch Hazard Alternatives
branch instruction
sequential successor 1
sequential successor 2
........
sequential successor n
branch target if taken
(successors 1 to n: branch delay of length n)
– 1 slot delay allows proper decision and branch target address in 5 stage
pipeline
– MIPS uses this – branch delay slot
Control Hazards
Delayed Branch
• Delayed Branch downside: 7-8 stage pipelines, multiple instructions issued per
clock (superscalar)
Delay slot example
● From before:
    Before:                     After:
    ADD  $r1, $r2, $r3          BEQZ $r2, label
    BEQ  $r2, $0, label         ADD  $r1, $r2, $r3
    Delay slot
● From target
● From fall-through
Control Hazards
Evaluating Branch Alternatives
• Branch Prediction:
– Static low-cost Branch Prediction: rely on information available at compile
time
– Dynamic Branch Prediction: based on program behavior.
Control Hazards
● Dynamic Branch Prediction: branch-prediction buffer or branch history table
(1-bit scheme)
– Small memory indexed by the lower portion of the
address of the branch instruction.
– A bit to indicate taken or not taken
– If the hint turns out to be wrong, the prediction bit is
inverted and stored back.
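● Even if a branch is almost always taken, we will likely predict incorrectly
twice, rather than once, when it is not taken: once on the not-taken outcome
itself, and again on the next execution, because the bit was inverted.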
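The 1-bit scheme can be simulated in a few lines. The sketch below is an illustration under stated assumptions, not a description of any particular processor: the table size, the index function (word-aligned 4-byte instructions, so the two lowest address bits are dropped), the initial "not taken" state, and all names are chosen for this example.

```python
class OneBitBHT:
    """Branch history table with one taken/not-taken bit per entry."""

    def __init__(self, entries: int = 16):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.taken = [False] * entries   # assumed initial state: not taken

    def _index(self, pc: int) -> int:
        # Lower portion of the branch address, ignoring the byte offset.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.taken[self._index(pc)]

    def update(self, pc: int, outcome: bool) -> None:
        # Storing the outcome inverts the bit exactly when the hint was wrong.
        self.taken[self._index(pc)] = outcome


def count_mispredictions(bht: OneBitBHT, pc: int, outcomes) -> int:
    misses = 0
    for outcome in outcomes:
        if bht.predict(pc) != outcome:
            misses += 1
        bht.update(pc, outcome)
    return misses


if __name__ == "__main__":
    # A loop branch: taken 9 times, then not taken at loop exit, run 3 times.
    pattern = ([True] * 9 + [False]) * 3
    print(count_mispredictions(OneBitBHT(), 0x40, pattern))
```

For this loop pattern the simulation mispredicts twice per loop execution: once at the exit and once on re-entry. Because only the lower address bits are used, two branches whose addresses differ by a multiple of the table size share an entry.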
Control Hazards
● Dynamic Branch Prediction: branch-prediction buffer or branch history table
(2-bit scheme)
Studies of n-bit predictors have shown that the 2-bit predictors do almost as well, thus most
systems rely on 2-bit branch predictors rather than the more general n-bit predictors.
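One common form of the 2-bit scheme is a saturating counter, where a prediction changes only after two consecutive mispredictions. The slides do not specify the state machine, so the following is a sketch under that assumption; the class name, state encoding (0 and 1 predict not taken, 2 and 3 predict taken), and helper function are illustrative.

```python
class TwoBitCounter:
    """2-bit saturating counter for a single branch (assumed variant)."""

    def __init__(self, state: int = 3):
        if state not in (0, 1, 2, 3):
            raise ValueError("state must be 0..3")
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(counter: TwoBitCounter, outcomes) -> int:
    misses = 0
    for outcome in outcomes:
        if counter.predict() != outcome:
            misses += 1
        counter.update(outcome)
    return misses


if __name__ == "__main__":
    # A loop branch: taken 9 times, then not taken at loop exit, run 3 times.
    pattern = ([True] * 9 + [False]) * 3
    print(count_mispredictions(TwoBitCounter(state=3), pattern))
```

Starting in the "strongly taken" state, this counter mispredicts only at each loop exit, since a single not-taken outcome moves it to "weakly taken" without flipping the prediction.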
Control Hazards
● Dynamic Branch Prediction: branch-prediction buffer or branch history table
(2-bit scheme)
– A branch-prediction buffer can be implemented as a small, special
“cache” accessed with the instruction address during the IF pipe
stage, or
– as a pair of bits attached to each block in the instruction cache and
fetched with the instruction.
– If the instruction is decoded as a branch and if the branch is
predicted as taken, fetching begins from the target as soon as the
PC is known.
– Otherwise, sequential fetching and executing continue.
Control Hazards