Professional Documents
Culture Documents
L11 PipelineHazards 4
L11 PipelineHazards 4
Next . . .
❖ Control Hazards
EC792 2
Correlating Branches
❖ Basic two-bit predictor schemes
addi R2, R0, 2
use recent behavior of a branch bne R10, R2, B1
to predict its future behavior xor R10, R10, R10
j B3
❖ Improve the prediction accuracy
B1: bne R11, R2, B2
look also at recent behavior
xor R11, R11, R11
of other branches
j B3
if (aa==2) // B1 B2 B2
aa = 0; 0 1 0 1
if (bb==2) // B2
bb = 0;
B3 B3 B3 B3
if (aa!=bb) { // B3
……. Path: A:0-0 B:0-1 C:1-0 D:1-1
} aa=0 aa=0 aa2 aa2
bb=0 bb2 bb=0 bb2
❖ Branch direction
Not independent
Correlated to the path taken
❖ Example: Decision of B3 can be surely known beforehand if
the path to B3 is 0-0
❖ Track path using a 2-bit register
4
Correlating Branches
Example: BNE R1, R0 B1 ; (B1)(d!=0)
if (d==0) ADDI R1,R1,#1; since d==0, make d=1
d=1; B1 : SUBI R3,R1,#1
if (d==1) BNE R3, R0 B2; (B2)(d!=1)
.....
B2 :
EC792 7
Correlated Branch Prediction
❖ Idea: record m most recently executed branches as
taken or not taken, and use that pattern to select the
proper n-bit branch history table
❖ In general, (m,n) predictor means record last m branches
to select between 2m history tables, each with n-bit
counters
Thus, old 2-bit BHT is a (0,2) predictor
❖ Global Branch History: m-bit shift register keeping T/NT
status of last m branches.
❖ Each entry in table has m n-bit predictors.
EC792 8
Correlated Branch Predictor
2-bit shift register
Subsequent (global branch history)
branch
direction
select
Branch PC Branch PC
X X X X
Prediction w Prediction
hash . hash . . . .
. . . . .
. . . . .
. . . . .
2w
2-bit 2-bit 2-bit 2-bit
2-bit counter counter counter counter
counter
(2,2) Correlation Scheme
2-bit Sat. Counter Scheme
❖ (M,N) correlation scheme
M: shift register size (# bits)
N: N-bit counter
EC792 9
Correlating Branches
EC792 10
Tournament Predictors
Local
Predictor M
U
Global X
Predictor
Branch PC Tournament
Predictor
Table of 2-bit
saturating counters
EC792 11
Tournament Predictors
❖ Multilevel branch predictor
Selector for the Global and Local predictors of correlating branch
prediction
❖ Use n-bit saturating counter to choose between predictors
❖ Usual choice between global and local predictors
EC792 12
Tournament Predictors
EC792 13
Tournament Predictors
EC792 14
Tournament Predictors
• Advantage of tournament predictor is the ability to select the right
predictor for a particular branch
EC792 15
Accuracy v. Size (SPEC89)
10%
Conditional branch misprediction rate
9%
8%
5%
4%
Correlating - (2,2) scheme
3%
2% Tournament
1%
0%
0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128
EC792 17
Accuracy of Different Schemes
20%
16%
8%
6% 6% 6%
6%
5% 5%
4%
4%
2%
1% 1%
0%
0%
spice
gcc
fpppp
nasa7
doducd
li
matrix300
expresso
eqntott
tomcatv
4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)
EC792 18
Branch Prediction is More Important Today
Conditional branches still comprise about 20% of instructions
Correct predictions are more important today - why?
• Pipelines deeper
Branch not resolved until more cycles from fetching - therefore the
misprediction penalty is greater
▪ Cycle times smaller - more emphasis on throughput
(performance)
▪ More functionality between fetch & execute
• Multiple instruction issue (superscalars & VLIW)
Branch occurs almost every cycle
▪ Flushing & refetching more instructions
• Object-oriented programming
More indirect branches - which are harder to predict
• Dual of Amdahl’s Law
Other forms of pipeline stalling are being addressed - so the portion
of CPI due to branch delays is relatively larger
All this means that the potential stalling due to branches is greater
EC792 19
Branch Prediction is More Important Today
On the other hand,
• Chips are denser so we can consider sophisticated
HW solutions
• Hardware cost is small compared to the performance
gain
EC792 20
Directions in Branch Prediction
1: Improve the prediction
▪ Correlated (2-level) predictor (Pentium III - 512 entries, 2-bit,
Pentium Pro - 4 history bits)
• Hybrid local/global predictor (Alpha 21264)
• Confidence predictors
2: Determine the target earlier
• Branch target buffer (Pentium Pro, IA-64 Itanium)
• Next address in I-cache (Alpha 21264, UltraSPARC)
• Return address stack (Alpha 21264, IA-64 Itanium, MIPS R10000,
Pentium Pro, UltraSPARC-3)
3: Reduce misprediction penalty
• Fetch both instruction streams (IBM mainframes, SuperSPARC)
4: Eliminate the branch
• Predicated execution (IA-64 Itanium, Alpha 21264)
EC792 21
Predicated Execution
❖ Predicated instructions execute conditionally
Some other (previous) instruction sets a condition
Predicated instruction tests the condition & executes if the condition is
true
If the condition is false, predicated instruction isn’t executed
▪ i.e., instruction execution is predicated on the condition
Eliminates conditional branch (expensive if mispredicted)
▪ changes a control hazard to a data hazard
Fetch both true & false paths
EC792 22
Predicated Execution
❖ Example:
Without predicated execution
SUB R10, R4, R5 ❖ Advantages
➢ No branch hazard,
BEQZ R10, Label
➢ Creates straight line code
SUB R2, R1, R6
➢ More independent instructions
Label: OR R3, R2, R7
❖ Disadvantages
With predicated execution ➢ Instructions on both paths are
executed
SUB R10, R4, R5
➢ Hard to add predicated instructions
SUB R2, R1, R6, R10 to an existing instruction set;
OR R3, R2, R7 ➢ Additional register pressure
➢ Complex conditions if nested loops
EC792 23
Conditional Execution in ARM ISA
ARM ISA Example:
This is conditionally executed
; flags set by a previous based on the zero flag as,
instruction
; flags set by a previous
BNE Next instruction
LSL R0, R1, #24
ADD R0, R0, #2 LSLEQ R0, R1, #24
Next ADDEQ R0, R0, #2
;... ;...
EC792 24
Pipelining Complications
❖ Exceptions: Events other than branches or jumps that change the
normal flow of instruction execution. Some types of exceptions:
EC792 27
Pipelining Complications
❖ Our MIPS pipeline only writes results at the end of the
instruction’s execution. Not all processors do this.
❖ Address modes: Auto-increment causes register change
during instruction execution
Interrupts Need to restore register state
Adds WAR and WAW hazards since writes happen not only in
last stage
❖ Memory-Memory Move Instructions
Must be able to handle multiple page faults
VAX and x86 store values temporarily in registers
❖ Condition Codes
Need to detect the last instruction to change condition codes
EC792 28
Fallacies and Pitfalls
❖Pipelining is easy!
The basic idea is easy
The devil is in the details
▪ Detecting data hazards and stalling pipeline
EC792 29
Pipeline Hazards Summary
❖ Three types of pipeline hazards
Structural hazards: conflicts using a resource during same cycle
Data hazards: due to data dependencies between instructions
Control hazards: due to branch and jump instructions