Professional Documents
Culture Documents
Lect8 Pipelined DP Control
Lect8 Pipelined DP Control
Lect8 Pipelined DP Control
Pipelining
Pipelining
Start work ASAP!! Do not waste time!
6 PM 7 8 9 10 11 12 1 2 AM
Time
Task
order
Not pipelined
A
Assume 30 min. each task – wash, dry, fold, store – and that
separate tasks use separate hardware and so can be overlapped
6 PM 7 8 9 10 11 12 1 2 AM
Time
Task
order
A
Pipelined
B
D
Pipelined vs. Single-Cycle
Instruction Execution: the Plan
Program
execution 2 4 6 8 10 12 14 16 18
order Time
Single-cycle
(in instructions)
Instruction Data
lw $1, 100($0) fetch
Reg ALU
access
Reg
Instruction Data
lw $2, 200($0) 8 ns fetch
Reg ALU
access
Reg
Instruction
lw $3, 300($0) 8 ns fetch
...
8 ns
Instruction Data
Pipelined
lw $2, 200($0) 2 ns Reg ALU Reg
fetch access
Instruction Data
lw $3, 300($0) 2 ns Reg ALU Reg
fetch access
2 ns 2 ns 2 ns 2 ns 2 ns
Pipelining: Keep in Mind
Pipelining does not reduce latency of a single task, it
increases throughput of entire workload
Pipeline rate limited by longest stage
potential speedup = number pipe stages
unbalanced lengths of pipe stages reduces speedup
Time to fill pipeline and time to drain it – when there is
slack in the pipeline – reduces speedup
Pipelining MIPS
What makes it easy with MIPS?
all instructions are same length
so fetch and decode stages are similar for all instructions
just a few instruction formats
simplifies instruction decode and makes it possible in one stage
memory operands appear only in load/stores
so memory access can be deferred to exactly one later stage
operands are aligned in memory
one data transfer instruction requires one memory access stage
Pipelining MIPS
What makes it hard?
structural hazards: different instructions, at different stages,
in the pipeline want to use the same hardware resource
control hazards: succeeding instruction, to put into pipeline,
depends on the outcome of a previous branch instruction,
already in pipeline
data hazards: an instruction in the pipeline requires data to
be computed by a previous instruction still in the pipeline
2 ns 2 ns 2 ns 2 ns 2 ns
Program
execution 2 4 6 8 10 12 14 16
order Time
(in instructions)
Instruction
Reg ALU
Data
Reg Note that branch outcome is
add $4, $5, $6 fetch access computed in ID stage with
Instruction Data added hardware (later…)
beq $1, $2, 40 Reg ALU Reg
2ns fetch access
Instruction Data
lw $3, 300($0) bubble fetch
Reg ALU
access
Reg
4 ns 2ns
Pipeline stall
Control Hazards
Solution 2 Predict branch outcome
e.g., predict branch-not-taken :
Program
execution 2 4 6 8 10 12 14
order Time
(in instructions)
Instruction Data
add $4, $5, $6 fetch
Reg ALU
access
Reg
Instruction Data
beq $1, $2, 40 fetch
Reg ALU
access
Reg
2 ns
Instruction Data
lw $3, 300($0) fetch
Reg ALU
access
Reg
2 ns
Prediction success
Program
execution 2 4 6 8 10 12 14
order Time
(in instructions)
Instruction Data
add $4, $5 ,$6 Reg ALU Reg
fetch access
Instruction Data
beq $1, $2, 40 fetch
Reg ALU
access
Reg
2 ns
bubble bubble bubble bubble bubble
Instruction Data
Reg ALU Reg
or $7, $8, $9 fetch access
4 ns
Prediction failure: undo (=flush) lw
Control Hazards
Solution 3 Delayed branch: always execute the sequentially next
statement with the branch executing after one instruction delay
– compiler’s job to find a statement that can be put in the slot
that is independent of branch outcome
MIPS does this
Program
execution 2 4 6 8 10 12 14
order Time
(in instructions)
Instruction Data
lw $3, 300($0) Reg ALU Reg
2 ns fetch access
2 ns
2 4 6 8 10
Time
Instruction pipeline diagram:
shade indicates use –
add $s0, $t0, $t1 IF ID EX MEM WB left=write, right=read
P rogra m
e xe cution 2 4 6 8 10
order T im e
(in instructions)
Without forwarding – blue line –
a dd $s0 , $ t0 , $t1 IF ID EX MEM WB data has to go back in time;
with forwarding – red line –
data is available in time
2 4 6 8 10 12 14
P rogra m T ime
execution
order
(in instructions)
Without a stall it is impossible
IF ID
lw $ s0 , 2 0($t1 ) EX MEM WB
to provide input to the sub
instruction in time
sub $ t2 , $ s0 , $ t3 IF ID EX MEM WB
2 4 6 8 10 12 14
P rogra m T ime
execution
order
(in instructions)
With a one-stage stall, forwarding
lw $ s0 , 2 0($t1 ) IF ID EX MEM WB can get the data to the sub
instruction in time
sub $ t2 , $ s0 , $ t3 IF ID WB
EX MEM
Reordering Code to Avoid
Pipeline Stall (Software Solution)
Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
Data hazard
sw $t2, 0($t1)
sw $t0, 4($t1)
Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
Interchanged
sw $t2, 0($t1)
Pipelined Datapath
We now move to actually building a pipelined datapath
First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
Review: single-cycle processor
all 5 steps done in a single clock cycle
dedicated hardware required for each step
ADD
4 ADD
PC <<2
Instruction I
ADDR RD
32 16 32
5 5 5
Instruction
Memory RN1 RN2 WN
RD1 Zero
Register File ALU
WD
RD2 M
U ADDR
X
Data
RD M
E Memory U
16 X 32 X
T WD
N
D
IF ID EX MEM WB
Instruction Fetch Instruction Decode Execute/ Address Calc. Memory Access Write Back
Pipelined Datapath – Key Idea
What happens if we break the execution into multiple cycles, but keep
the extra hardware?
Answer: We may be able to start executing a new instruction at each
clock cycle - pipelining
…but we shall need extra registers to hold data between cycles
– pipeline registers
Pipelined Datapath
4 ADD
64 bits 128 bits
PC <<2 97 bits 64 bits
Instruction I
ADDR RD
32 16 32
5 5 5
Instruction
Memory RN1 RN2 WN
RD1
Zero
Register File ALU
WD
RD2 M
U ADDR
X
Data
RD M
E Memory U
16 X 32 X
T WD
N
D
4 ADD
64 bits 128 bits
PC <<2 97 bits 64 bits
Instruction I
ADDR RD
32 16 32
5 5 5
Instruction
Memory RN1 RN2 WN
RD1
Zero
Register File ALU
WD
RD2 M
U ADDR
X
Data
RD M
E Memory U
16 X 32 X
T WD
N
D
Wrong
register
number
Corrected Datapath for Load
EX for Store
MEM for Store
WB for Store
Bug in the Datapath
4 ADD
PC <<2
Instruction I
ADDR RD
32 16 32
5 5 5
Instruction
Memory RN1 RN2 WN
RD1
Register File ALU
WD
RD2 M
U ADDR
X
Data
Memory RD M
E U
16 X 32 X
T WD
N
D
ADD
ADD
4 64 bits 133 bits
102 bits 69 bits
<<2
PC
ADDR RD 5
RN1 RD1
32
ALU Zero
Instruction RN2
5
Memory Register
5
WN File RD2 M
WD U ADDR
X
Data
E Memory RD M
U
16 X 32 X
T WD
N
5 D
LW
Single-Clock-Cycle Diagram:
Clock Cycle 2
SW LW
Single-Clock-Cycle Diagram:
Clock Cycle 3
ADD SW LW
Single-Clock-Cycle Diagram:
Clock Cycle 4
SUB ADD SW LW
Single-Clock-Cycle Diagram:
Clock Cycle 5
SUB ADD SW LW
Single-Clock-Cycle Diagram:
Clock Cycle 6
SUB ADD SW
Single-Clock-Cycle Diagram:
Clock Cycle 7
SUB ADD
Single-Clock-Cycle Diagram:
Clock Cycle 8
SUB
Alternative View –
Multiple-Clock-Cycle Diagram
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8
Time axis
lw $t0, 10($t1) IM REG ALU DM REG
Instruction [5 0]
Recall Single-Cycle – ALU Control
Instruction AluOp Instruction Funct Field Desired ALU control
opcode operation ALU action input
LW 00 load word xxxxxx add 010
SW 00 store word xxxxxx add 010
Branch eq 01 branch eq xxxxxx subtract 110
R-type 10 add 100000 add 010
R-type 10 subtract 100010 subtract 110
R-type 10 AND 100100 and 000
R-type 10 OR 100101 or 001
R-type 10 set on less 101010 set on less 111
RegDst The register destination number for the The register destination number for the
Write register comes from the rt field (bits 20-16) Write register comes from the rd field (bits 15-11)
RegWrite None The register on the Write register input is written
with the value on the Write data input
AlLUSrc The second ALU operand comes from the The second ALU operand is the sign-extended,
second register file output (Read data 2) lower 16 bits of the instruction
PCSrc The PC is replaced by the output of the adder The PC is replaced by the output of the adder
that computes the value of PC + 4 that computes the branch target
MemRead None Data memory contents designated by the address
input are put on the first Read data output
MemWrite None Data memory contents designated by the address
input are replaced by the value of the Write data input
MemtoReg The value fed to the register Write data input The value fed to the register Write data input
comes from the ALU comes from the data memory
0
M
u
x
1
Add
Add
4 Add result
Branch
Shift
RegWrite left 2
Read
Instruction
MemWrite
PC Address register 1 Read
Read data 1
ALUSrc MemtoReg
register 2 Zero
Zero
Instruction
Registers Read ALU ALU
memory Write 0 Read
data 2 result Address 1
register M data
u M
Data u
Write x memory
data x
1
0
Write
data
Instruction
[15– 0] 16 6
Sign 32 ALU
extend MemRead
Same control
control
Instruction
signals as the [20– 16]
0
single-cycle Instruction
M
u
ALUOp
datapath
[15– 11] x
1
RegDst
Pipeline Control Signals
There are five stages in the pipeline
instruction fetch / PC increment Nothing to control as instruction memory
read and PC write are always enabled
instruction decode / register fetch
memory access
write back
Execution/Address Write-back
Calculation stage control Memory access stage stage control
lines control lines lines
Reg ALU ALU ALU Branc Mem Mem Reg Mem
Instruction Dst Op1 Op0 Src h Read Write write to Reg
R-format 1 1 0 0 0 0 0 1 0
lw 0 0 0 1 0 1 0 1 1
sw X 0 0 1 0 0 1 0 X
beq X 0 1 0 1 0 0 0 X
Pipeline Control Implementation
Pass control signals along just like the data – extend each
pipeline register to hold needed control bits for succeeding stages
WB
Instruction
Control M WB
EX M WB
ID/EX
0
M
u WB
x EX/MEM
1
Control M WB
MEM/WB
EX M WB
IF/ID
Add
4 Add
Add result
RegWrite
Shift Branch
MemWrite
left 2
ALUSrc
MemtoReg
Read
Instruction
Instruction 16 32 6
[15– 0] Sign ALU MemRead
extend control
Control signals Instruction
emanate from
[20– 16]
0 ALUOp
M
the control Instruction
[15– 11]
u
x
portions of the 1
RegDst
pipeline registers
Pipelined Execution and Control
Instruction sequence:
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $7
or $13, $6, $7
add $14, $8, $9
Add
Add
4 Add result
RegWrite
Branch
Shift
MemWrite
left 2
ALUSrc
MemtoReg
Read
Instruction
Instruction
[15– 0] Sign ALU MemRead
extend control
Instruction
[20–16]
0 ALUOp
M
Instruction u
[15–11] x
C lo c k 1 1
RegDst
IF : s u b $ 1 1 , $ 2 , $ 3 I D : lw $ 1 0 , 2 0 ( $ 1 ) E X : b e fo re < 1 > M E M : b e fo r e < 2 > W B : b e f o re < 3 >
Add
4 Add
Add result
RegWrite
Shift Branch
MemWrite
left 2
ALUSrc
1
MemtoReg
Instruction
Read
register 1
PC Address Read $1
X data 1
Read
register 2 Zero
Instruction
Registers Read $X ALU ALU
memory Write 0 Read
data 2 result Address 1
register M data
u Data M
Write x memory u
data x
1
0
Write
data
Instruction
20 [15– 0] Sign 20 ALU MemRead
extend control
Instruction
10 [20–16] 10
0 ALUOp
M
Instruction u
X [15–11] X x
C lo c k 2 1
RegDst
IF : a n d $ 1 2 , $ 4 , $ 5 ID : s u b $ 1 1 , $ 2 , $ 3 E X : lw $ 1 0 , . . . M E M : b e fo r e < 1 > W B : b e fo r e < 2 >
Add
Add
4 Add result
RegWrite
Shift Branch
MemW rite
left 2
ALUSrc
2
M emtoReg
I nstruction
Read
PC Address register 1 Read $2 $1
3 Read data 1
register 2 Zero
Instruction
Registers Read $3 ALU ALU
memory Write 0 Read
data 2 result Address 1
register M data
u Data M
Write x memory u
data x
1
0
Write
data
Instruction
X [15–0] Sign X 20 ALU MemRead
extend control
Instruction
X [20–16] X 10
0 ALUOp
M
Instruction u
11 [15–11] 11 x
C lo c k 3 1
RegDst
IF : o r $ 1 3 , $ 6 , $ 7 ID : a n d $ 1 2 , $ 2 , $ 3 E X : sub $1 1, . . . M E M : lw $ 1 0 , . . . W B : b e fo r e < 1 >
Add
4 Add
Add result
RegWrite
Shift Branch
MemWrite
left 2
ALUSrc
4
MemtoReg
Instruction
Read
register 1
PC Address Read $4 $2
5 data 1
Read
register 2 Zero
Instruction
Registers Read $5 $3 ALU ALU
memory Write 0 Address Read
data 2 result 1
register M data
u Data M
Write x u
memory x
data 1
0
Write
data
Instruction
X [15–0] Sign X ALU MemRead
extend control
Instruction
X [20–16] X
0 ALUOp
M 10
Instruction u
12 [15–11] 12 11 x
C lo c k 4 1
RegDst
IF : a dd $ 1 4 , $ 8 , $ 9 ID : o r $ 1 3 , $ 6 , $ 7 E X : and $12, . . . M E M : sub $1 1, . . . W B : lw $ 1 0 , . . .
Add
Add
4 Add result
RegWrite
Shift Branch
MemWrite
left 2
ALUSrc
6
MemtoReg
Read
Instruction
Instruction
X [15–0] Sign X ALU MemRead
extend control
Instruction
X [20–16] X
0 ALUOp
M 11 10
Instruction u
13 [15–11] 13 12 x
C lo c k 5 1
RegDst
I F : a fte r< 1 > ID : a d d $ 1 4 , $ 8 , $ 9 E X : or $13 , . . . M E M : and $12, . . . W B : su b $ 1 1 , . . .
Add
4 Add
Add result
RegWrite
Shift Branch
MemWrite
left 2
ALUSrc
8
MemtoReg
Instruction
Read
register 1
PC Address Read $8 $6
9 data 1
Read
register 2 Zero
Instruction
Registers Read $9 $7 ALU ALU
memory 11 Write 0 Read
data 2 result Address 1
register M data
u Data M
Write x memory u
data x
1
0
Write
data
Instruction
X [15–0] Sign X ALU MemRead
extend control
Instruction
X [20–16] X
0 ALUOp
M 12 11
Instruction u
14 [15–11] 14 13 x
C lo c k 6 1
RegDst
I F : a fte r < 2 > I D : a fte r< 1 > E X : add $ 14, . . . M E M : or $ 1 3 , . . . W B: and $1 2, . . .
Add
Add
4 Add result
RegWrite
Shift Branch
MemWrite
left 2
ALUSrc
MemtoReg
Read
Instruction
Instruction
[15–0] Sign ALU MemRead
extend control
Instruction
[20–16]
0 ALUOp
M 13 12
Instruction u
[15–11] 14 x
C lo c k 7 1
RegDst
I F : a fte r < 3 > I D : a fte r< 2 > E X : a fte r < 1 > M E M : add $14, . . . W B : or $ 13 , . . .
Add
4 Add
Add result
RegWrite
Shift Branch
MemWrite
left 2
ALUSrc
MemtoReg
Instruction
Read
PC Address register 1
Read
data 1
Read
register 2 Zero
Instruction
Registers Read ALU ALU
memory 13 Write 0 Read
data 2 result Address 1
register M data
u Data M
Write x memory u
data x
1
0
Write
data
Instruction
[15–0] Sign ALU MemRead
extend control
Instruction
[20–16]
0 ALUOp
M 14 13
Instruction u
[15–11] x
C lo c k 8 1
RegDst
Pipelined Execution and Control
I F : a ft e r < 4 > I D : a fte r < 3 > E X : a f te r < 2 > M E M : a f te r < 1 > W B: add $1 4, . . .
Add
4 Add
Add result
RegWrite
Shift Branch
MemWrite
left 2
ALUSrc
MemtoReg
Instruction
Read
PC Address register 1 Read
Read data 1
register 2 Zero
Instruction
Registers Read ALU ALU
memory 14 Write 0 Read
data 2 result Address data 1
register M M
u Data
Write x memory u
data x
1 0
Write
data
Instruction
[15–0] Sign ALU MemRead
extend control
Instruction
[20–16]
0 ALUOp
M 14
Instruction u
[15–11] x
C lo c k 9 1
RegDst