Instruction Level Parallelism: Pipelining
Pipelining (ILP)
The concept:
A task is broken down into steps. Assume that there are N steps, each takes the same amount of
time.
(Mark Hill's) EXAMPLE: car wash
steps:
   P -- prep
   W -- wash
   R -- rinse
   D -- dry
   X -- wax
                    time units
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 P  W  R  D  X
                P  W  R  D  X
                               P  W  R  D  X
   time units
 1  2  3  4  5 . . .
 P  W  R  D  X
    P  W  R  D
       P  W  R
          P  W
IT STILL TAKES 5 TIME UNITS TO WASH 1 CAR, BUT THE RATE OF CAR WASHES GOES UP!
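To put numbers on the car wash, here is a small sketch in Python. It is not part of the original
notes, and the function names are made up for illustration. It assumes N equal-length steps (N = 5
here) and that a new car can enter the wash every time unit once pipelining is used.

```python
def sequential_time(num_cars, num_steps=5):
    """Time units to wash num_cars one after another, with no overlap."""
    return num_cars * num_steps


def pipelined_time(num_cars, num_steps=5):
    """Time units when a new car enters the wash every time unit.

    The first car needs num_steps units; every later car finishes
    one time unit after the car in front of it.
    """
    if num_cars == 0:
        return 0
    return num_steps + (num_cars - 1)


# Columns: number of cars, time without pipelining, time with pipelining.
for cars in (1, 3, 100):
    print(cars, sequential_time(cars), pipelined_time(cars))
```

One car takes 5 time units either way, but for 100 cars the pipelined wash needs 104 time units
instead of 500.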
Two very important terms when discussing pipelining:
        time units
 1  2  3  4  5  6  7  8 . . .
 F  E
    F  E
       F  E
          F  E
2 time units
5-stage pipeline
A popular pipelined implementation:
(Note: the R2000/3000 has 5 stages, the R6000 has 5 stages (but different), and the R4000 has 8
stages)
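steps:
   IF -- instruction fetch
   ID -- instruction decode (and register read)
   EX -- execute
   MA -- memory access
   WB -- write back (result written to the register)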
which                  time units
instruction   1   2   3   4   5   6   7   8 . . .
     1        IF  ID  EX  MA  WB
     2            IF  ID  EX  MA  WB
     3                IF  ID  EX  MA  WB
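The staircase pattern of a pipeline with no stalls can be generated mechanically. The sketch below
is an illustration in Python, not something from the original notes; `pipeline_rows` and
`total_time` are hypothetical helper names, and the model assumes one instruction enters the pipe
every time unit.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def pipeline_rows(num_instructions, stages=STAGES):
    """Return one list per instruction: the stage occupied in each time unit.

    Instruction i (0-based) enters the first stage in time unit i + 1,
    so its row starts with i empty slots.
    """
    rows = []
    for i in range(num_instructions):
        rows.append([""] * i + list(stages))
    return rows


def total_time(num_instructions, stages=STAGES):
    """Time units until the last instruction leaves the last stage."""
    if num_instructions == 0:
        return 0
    return len(stages) + num_instructions - 1


# Print the diagram for three instructions, one row per instruction.
for number, row in enumerate(pipeline_rows(3), start=1):
    print(number, " ".join(f"{stage:<2}" for stage in row))
```

Under these assumptions three instructions finish in `total_time(3)`, which is 7 time units,
compared with 15 if each instruction had to finish before the next one started.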
   lw   $8, data1
   addi $9, $8, 1
The data loaded does not get written to $8 until WB, but the addi instruction wants to get the data
out of $8 in its ID stage. . .
which                  time units
instruction   1   2   3   4   5   6   7   8 . . .
    lw        IF  ID  EX  MA  WB
                              ^^
   addi           IF  ID  EX  MA  WB
                      ^^
The simplest solution is to STALL the pipeline. (Also called HOLES, HICCOUGHS or
BUBBLES in the pipe.)
which                  time units
instruction   1   2   3   4   5   6   7   8 . . .
    lw        IF  ID  EX  MA  WB
                              ^^
   addi           IF  ID  ID  ID  EX  MA  WB
                      ^^  ^^ (pipeline stalling)
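The number of stall cycles can be worked out from the stage positions. The sketch below is a
simplified illustration in Python, not the notes' own method. It assumes that a register written in
WB can be read by an ID stage in the same time unit, and that there is no forwarding hardware. The
parser is hypothetical and deliberately minimal: it treats the first operand as the destination and
only recognizes operands that start with `$` as registers.

```python
WB_OFFSET = 4  # WB is the 5th stage: fetch time + 4
ID_OFFSET = 1  # ID is the 2nd stage: fetch time + 1


def parse(instr):
    """Split 'addi $9, $8, 1' into (opcode, dest, sources).

    Hypothetical, minimal parser: the first operand is taken to be the
    destination; any later operand starting with '$' is a source.
    """
    opcode, _, rest = instr.partition(" ")
    operands = [op.strip() for op in rest.split(",")]
    dest = operands[0]
    sources = [op for op in operands[1:] if op.startswith("$")]
    return opcode, dest, sources


def stalls_needed(producer, consumer, distance=1):
    """Stall cycles for `consumer`, fetched `distance` time units after `producer`."""
    _, dest, _ = parse(producer)
    _, _, sources = parse(consumer)
    if dest not in sources:
        return 0  # no data dependency, no stall
    # Without stalls the consumer would read in ID at distance + ID_OFFSET,
    # but the value only arrives at WB_OFFSET (both relative to the producer's fetch).
    return max(0, WB_OFFSET - (distance + ID_OFFSET))


print(stalls_needed("lw $8, data1", "addi $9, $8, 1"))  # prints 2
```

Under these assumptions the lw/addi pair needs two stall cycles, and a dependent instruction that
is fetched three or more time units after the load needs none.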
which                  time units
instruction   1   2   3   4   5   6   7   8 . . .
     b        IF  ID  EX  MA  WB
                              ^^ (PC changed here)
   addi           IF  ID  EX  MA  WB
                  ^^ (WRONG instruction fetched here!)
Whenever the PC changes (except for PC <- PC + 4), we have a control dependency.
Control dependencies break pipelines. They cause performance to plummet.
So, lots of (partial) solutions have been implemented to try to help the situation. In the worst
case, the pipeline must be stalled so that instructions go through one at a time, sequentially.
Note that just stalling does not really help, since the (potentially) wrong instruction is fetched
before the hardware has determined that the previous instruction is a branch.
How to minimize the effect of control dependencies on pipelines.
Easiest solution (poor performance):
Cancel anything (later) in the pipe when a branch (jump) is decoded. This works as long as
nothing changes the program's state before the cancellation. Then let the branch instruction
finish ("flush the pipe"), and start up again.
which                  time units
instruction   1   2   3   4   5   6   7   8 . . .
     b        IF  ID  EX  MA  WB
                              ^^ (PC changed here)
   addi           IF
                  ^^ (cancelled)
   mult                           IF  ID  EX  MA  WB
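The cost of flushing can be estimated with a small model. The Python sketch below is an
illustration only. It assumes that after any branch the next useful fetch waits until the branch
has left the pipeline, which is one reading of "let the branch instruction finish, and start up
again". `BRANCH_OPCODES` and `total_time_with_flush` are hypothetical names, and cancelled
instructions are simply left out of the input list.

```python
BRANCH_OPCODES = {"b", "beq", "bne", "j"}  # assumed set, for illustration


def total_time_with_flush(program, num_stages=5):
    """Time units to run `program` when every branch flushes the pipe.

    `program` lists opcodes in the order they actually execute.
    After a branch, the next fetch waits until the branch has finished.
    """
    fetch_time = 1   # time unit in which the next instruction is fetched
    finish_time = 0  # time unit in which the latest instruction finished
    for opcode in program:
        finish_time = fetch_time + num_stages - 1
        if opcode in BRANCH_OPCODES:
            fetch_time = finish_time + 1  # restart after the branch is done
        else:
            fetch_time += 1               # normal overlapped fetch
    return finish_time


print(total_time_with_flush(["add", "add", "add"]))  # prints 7
print(total_time_with_flush(["b", "mult"]))          # prints 10
```

In this model every branch costs four extra time units, which is why control dependencies hurt a
pipeline so badly when nothing smarter is done.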
Note that in this code example, we want one of two possibilities for the code that gets
executed:
1)
add $8, $9, $10
beq $3, $4, label
move $18, $5
or 2)
If the assembler has any smarts at all, it would REARRANGE the code to be
beq $3, $4, label
add $8, $9, $10
move $18, $5
.
.
.
label:
This code can be rearranged only if there are no data dependencies between the branch and
the add instruction. In fact, any instruction from before the branch (and after any previous
branch) can be moved into the DELAY SLOT, as long as nothing between its old position and the
branch (including the branch itself) depends on it.
Delayed branching depends on a smart assembler (software) to make the hardware perform at
peak efficiency. This is a general trend in the field of computer science: let the software do
more and more to improve the performance of the hardware.
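The rearrangement an assembler might perform can be sketched as follows. This is a simplified
illustration in Python, not how any real assembler works. It only considers the single instruction
directly before the branch, treats every opcode starting with `b` as a branch, assumes the first
register operand of any other instruction is the one it writes, and falls back to a `nop` when the
move is not safe. All function names are hypothetical.

```python
def split(instr):
    """Return (opcode, register operands) for a simple 'op a, b, c' line."""
    opcode, _, rest = instr.partition(" ")
    registers = [op.strip() for op in rest.split(",") if op.strip().startswith("$")]
    return opcode, registers


def is_branch(instr):
    # Assumption: opcodes starting with 'b' (b, beq, bne, ...) are branches.
    return split(instr)[0].startswith("b")


def can_move(candidate, branch):
    """True if `candidate` writes no register that `branch` reads."""
    _, cand_regs = split(candidate)
    _, branch_regs = split(branch)
    written = cand_regs[0] if cand_regs else None
    return written not in branch_regs


def fill_delay_slot(code):
    """Put the instruction before each branch into its delay slot if safe."""
    result = []
    movable = False  # is result[-1] an ordinary, not yet moved instruction?
    for instr in code:
        if not is_branch(instr):
            result.append(instr)
            movable = True
            continue
        if movable and can_move(result[-1], instr):
            previous = result.pop()
            result.extend([instr, previous])  # previous now sits in the delay slot
        else:
            result.extend([instr, "nop"])     # nothing safe to move
        movable = False
    return result


print(fill_delay_slot(["add $8, $9, $10", "beq $3, $4, label", "move $18, $5"]))
```

For the example in the notes this moves the add behind the beq. If the add wrote $3 or $4 instead,
the sketch would leave it in place and put a `nop` in the delay slot.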
An aside, on condition codes
A historically significant way of branching. Condition codes were used on MANY machines before
pipelining became popular.
4 1-bit registers (condition code register):
N -- negative
V -- overflow
P -- positive
Z -- zero
The result of each instruction set these 4 bits. Conditional branches were then based on these flags.
Example: bn label # branch to label if the N bit is set
Earlier computers had virtually every instruction set the condition codes. This had the effect that the
test (for the branch) needed to come directly before the branch.
Example:
sub r3, r4, r5
bn label
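How the four flags follow from a result can be shown with a short sketch. The Python below is an
illustration under stated assumptions, not a model of any particular machine. It assumes 32-bit
two's complement registers, that `sub r3, r4, r5` computes r3 = r4 - r5, and that V is set when the
exact difference does not fit in 32 bits. The function names are hypothetical.

```python
BITS = 32  # assumed register width
MAX_INT = (1 << (BITS - 1)) - 1
MIN_INT = -(1 << (BITS - 1))


def wrap(value):
    """Wrap an integer to a signed BITS-bit two's complement value."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value > MAX_INT else value


def sub_with_flags(a, b):
    """Return (a - b wrapped to BITS bits, condition code flags)."""
    exact = a - b
    result = wrap(exact)
    flags = {
        "N": result < 0,        # negative
        "V": exact != result,   # overflow: exact result did not fit
        "P": result > 0,        # positive
        "Z": result == 0,       # zero
    }
    return result, flags


def bn_taken(flags):
    """bn branches if the N bit is set."""
    return flags["N"]


r3, cc = sub_with_flags(3, 5)  # sub r3, r4, r5 with r4 = 3, r5 = 5
print(r3, bn_taken(cc))        # prints -2 True
```

Because the flags here are overwritten by every call, the test has to come directly before the
branch, just as on the early machines described above.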