Lec12 - Pipelined Implementation I

VLSI Architecture
ES ZG642 / MEL ZG 642

Session 12
BITS Pilani Pawan Sharma
ps@pilani.bits-pilani.ac.in
Pilani Campus 01/11/2023
Last Lecture
• Multi cycle processor Implementation contd…….

• Pipelined Implementation
Today’s Lecture
• Pipelined Implementation contd……

PIPELINING
• Implementation technique: multiple instructions are overlapped in execution.
• Each step in the pipeline completes a part of the instruction; each step operates in parallel with the other steps.
• Instructions enter at one end and exit at the other end.
• Throughput is determined by how often an instruction exits the pipeline.
• If the stages are perfectly balanced, the time per instruction on the pipelined machine equals the time per instruction on the unpipelined machine divided by the number of pipe stages.
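As a rough worked example of the balanced-stage claim, the sketch below computes the ideal time per instruction and the total run time of a pipelined machine. The function names and the 10 ns, 5-stage figures are illustrative assumptions, not values from the lecture, and pipeline register overhead is ignored.

```python
def pipelined_time_per_instruction(unpipelined_ns: float, stages: int) -> float:
    """Ideal case: perfectly balanced stages and no pipeline overhead."""
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    return unpipelined_ns / stages


def total_time(n_instructions: int, stage_ns: float, stages: int) -> float:
    """Fill the pipeline once, then one instruction exits every cycle."""
    return (stages + n_instructions - 1) * stage_ns


# Hypothetical numbers: a 10 ns instruction split into 5 balanced stages.
stage_ns = pipelined_time_per_instruction(10.0, 5)
print(stage_ns)                        # time between instruction completions
print(total_time(1000, stage_ns, 5))   # compare with 1000 * 10 ns unpipelined
```

For long instruction streams the fill time becomes negligible, so the speedup approaches the number of stages.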
BITS Pilani, Pilani Campus

Review - Single-Cycle Datapath “Steps”
Dividing an instruction into five stages (IF, ID, EX, MEM, WB) means a five-stage pipeline, which in turn means that up to five instructions will be in execution during any single clock cycle. Thus, we must separate the datapath into five pieces, with each piece named after the corresponding stage of instruction execution.

• Pipelining the datapath requires values to be passed from one pipe stage to the next, so we need pipeline registers.
• All registers needed to hold values temporarily between clock cycles within one instruction are included in these pipeline registers.
• Pipeline registers carry both data and control from one stage to the next.
• Instructions and data generally move from left to right through the five stages as they complete execution, with two exceptions:
• The write-back stage, which places the result back into the register file in the middle of the datapath. This can cause a data hazard.
• The selection of the next value of the PC, choosing between the incremented PC and the branch address from the MEM stage. This can cause a control hazard.
• Data flowing from right to left does not affect the current instruction; these reverse data movements influence only later instructions in the pipeline.
• Any instruction is active in exactly one stage of the pipeline at a time.

Pipelined Datapath
Pipeline registers are wide enough to hold the data coming in.
[Figure: pipelined datapath. Left to right: PC, instruction memory and PC+4 adder (IF); register file and sign extender (ID); ALU, shift-left-2 and branch-target adder (EX); data memory (MEM); write-back multiplexer (WB). Pipeline registers and widths: IF/ID 64 bits, ID/EX 128 bits, EX/MEM 97 bits, MEM/WB 64 bits.]

• During each cycle, an instruction advances from one
pipeline register to the next pipeline register.
• The registers are labeled by the stages that they
separate.
• Pipeline registers are as wide as necessary to hold all of
the data passed into them.
• For instance, IF/ID is 64 bits wide because it must hold a
32-bit instruction and a 32-bit PC+4 result.
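The same bookkeeping can be done for every pipeline register. The sketch below is only an illustration: the field names are assumptions chosen to match the labels in the datapath figure, and the totals reproduce the widths shown there (64, 128, 97 and 64 bits).

```python
WORD = 32  # bits in one MIPS word

# Assumed field lists for the basic datapath (before control bits are added).
PIPELINE_REGISTER_FIELDS = {
    "IF/ID":  {"instruction": WORD, "pc_plus_4": WORD},
    "ID/EX":  {"pc_plus_4": WORD, "read_data_1": WORD,
               "read_data_2": WORD, "sign_ext_imm": WORD},
    "EX/MEM": {"branch_target": WORD, "zero": 1,
               "alu_result": WORD, "read_data_2": WORD},
    "MEM/WB": {"mem_read_data": WORD, "alu_result": WORD},
}


def width(register_name: str) -> int:
    """Total number of bits a pipeline register must hold."""
    return sum(PIPELINE_REGISTER_FIELDS[register_name].values())


for name in PIPELINE_REGISTER_FIELDS:
    print(name, width(name))
```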

Pipelined datapath for load instruction
Load word is a good instruction to start with because it is active in every stage of the pipelined datapath.
lw $rt, immed($rs)
The load word instruction adds an immediate (constant) value to the contents of the source register ($rs) to obtain the memory address whose contents are written to the destination register ($rt).

Instruction Fetch (IF):
The instruction is read from memory using the contents of the PC and placed in the IF/ID register. The PC is incremented by 4 and written back to the PC register, and the incremented value is also placed in the IF/ID register in case it is needed later for an instruction such as beq.
Note: the datapath does not know at this point that we are performing a load word, so it must prepare for any instruction by passing potentially needed information, including the PC+4 value, down the pipeline.
In the diagram, the right half of the instruction memory is shaded, meaning it is being read. The left half of IF/ID is shaded because it is being written.
Instruction Decode and Register File Read (ID):
The registers numbered by the $rs and $rt fields are read from the register file, and their contents are stored in the ID/EX pipeline register.
The 16-bit immediate field is sign-extended to 32 bits and stored in the ID/EX pipeline register.
The PC+4 value is copied from the IF/ID register into the ID/EX register in case the instruction needs it later.

Execute or Address Calculation (EX):
From the ID/EX pipeline register, take the contents of $rs and the sign-extended immediate field as inputs to the
ALU, which performs an add operation. The sum is placed in the EX/MEM pipeline register.

Memory Access (MEM):
Take the address stored in the EX/MEM pipeline register and use it to access data memory. The data read from
memory is stored in the MEM/WB pipeline register.

Write Back (WB)
Read the data from the MEM/WB register and write it back to the register file in the middle of the datapath.
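
The five steps can be traced in a few lines of code. This is a minimal sketch, not the lecture's datapath: pipeline registers are modelled as dictionaries, the register file and data memory as plain Python dicts, and the function and key names are assumptions. The destination register number is carried along with the data so that WB knows where to write.

```python
def run_lw(rs, rt, imm16, pc, regs, data_mem):
    """Walk one lw $rt, imm($rs) through IF, ID, EX, MEM and WB."""
    # IF: latch the instruction fields and PC+4 into IF/ID.
    if_id = {"rs": rs, "rt": rt, "imm16": imm16 & 0xFFFF, "pc_plus_4": pc + 4}

    # ID: read the register file and sign-extend the 16-bit immediate.
    imm = if_id["imm16"]
    if imm & 0x8000:
        imm -= 0x10000
    id_ex = {"read_data_1": regs[if_id["rs"]], "imm": imm,
             "pc_plus_4": if_id["pc_plus_4"], "write_reg": if_id["rt"]}

    # EX: the ALU adds the base register and the immediate.
    ex_mem = {"alu_result": id_ex["read_data_1"] + id_ex["imm"],
              "write_reg": id_ex["write_reg"]}

    # MEM: use the address to read data memory.
    mem_wb = {"mem_data": data_mem[ex_mem["alu_result"]],
              "write_reg": ex_mem["write_reg"]}

    # WB: write the loaded value back into the register file.
    regs[mem_wb["write_reg"]] = mem_wb["mem_data"]
    return if_id, id_ex, ex_mem, mem_wb


# Hypothetical state: $8 holds 0x100 and memory[0x104] holds 42.
regs = {8: 0x100, 9: 0}
run_lw(rs=8, rt=9, imm16=4, pc=0, regs=regs, data_mem={0x104: 42})
print(regs[9])
```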

• It’s important to note that any information we need must be passed from pipeline register to pipeline register while the instruction executes.
• Because the instructions share the datapath elements, we cannot assume anything from a previous cycle is still there. We must carry the data with us as we move along the datapath.

• Which register is changed in the final stage of the load?
• By then, the IF/ID pipeline register no longer contains the necessary instruction field (the destination register number); it has already been overwritten by three other instructions.
• So, we have a bug in the design!!

Pipelined Datapath
[Figure: the pipelined datapath shown earlier, repeated.]
Data flowing from right to left may cause hazards. Why?
Bug in the Datapath
[Figure: the pipelined datapath, with the write register number (WN) input of the register file still taken from the instruction held in the IF/ID pipeline register.]
The write register number comes from another, later instruction!

Solution
• Carry the information through each stage using the pipeline registers.
• To do this, we’ll modify the datapath a little bit.
• Now, we’ll pass the write register number from the MEM/WB pipeline register along with the data.
• This register number is initially discovered in the ID stage and must be passed through the pipeline registers until we need it in the WB stage.

Corrected Datapath
[Figure: corrected pipelined datapath. The 5-bit write register number now travels through ID/EX, EX/MEM and MEM/WB, and the register file's WN input is driven from MEM/WB. Pipeline register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits.]
To preserve the destination register number of the load instruction, it is also passed through the ID/EX, EX/MEM and MEM/WB registers, which are now each wider by 5 bits.
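
The effect of the bug and of the fix can be sketched as follows. This is a simplified model under stated assumptions: one instruction enters the pipeline per cycle with no stalls, every instruction writes a register, and the function name is hypothetical. When instruction i is in WB, the IF/ID register holds the fields of instruction i + 3.

```python
def destinations_written(dest_regs, carry_write_reg):
    """Return the register number each instruction actually writes in WB.

    dest_regs[i] is the intended destination of the i-th instruction.
    With carry_write_reg=False the write register number is taken from
    IF/ID (the buggy datapath); with True it travels through ID/EX,
    EX/MEM and MEM/WB with the instruction (the corrected datapath).
    """
    written = []
    for i, intended in enumerate(dest_regs):
        if carry_write_reg:
            written.append(intended)
        else:
            j = i + 3  # instruction latched in IF/ID while i is in WB
            written.append(dest_regs[j] if j < len(dest_regs) else None)
    return written


program = [1, 2, 3, 4, 5]  # hypothetical destination registers
print(destinations_written(program, carry_write_reg=False))
print(destinations_written(program, carry_write_reg=True))
```

`None` marks a cycle in which the model has no later instruction in IF/ID; real hardware would still present whatever bits happen to be there.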

Key Points
• Do not perform two different operations that use the same datapath resource in the same clock cycle.
• Ex: the ALU (used for effective address calculation and for a SUB operation).
• Using separate data and instruction memories eliminates the conflict for a single memory that would otherwise arise between instruction fetch and data memory access.
• A single register file can be used in two stages, ID for reading and WB for writing, because reads and writes go to separate ports on the register file. Writes occur in the first half of the cycle; reads occur in the second half.

Pipeline Control
A lot of the control logic for the pipelined implementation is borrowed from the single-cycle and multi-cycle implementations.
So the initial design, motivated by single-cycle datapath control, uses the same control signals.
Observe:
– No separate write signal for the PC, as it is written every cycle
– No separate write signals for the pipeline registers, as they are written every cycle
(These two points will be modified by the hazard detection unit!!)
– No separate read signal for instruction memory, as it is read every clock cycle
– No separate read signal for the register file, as it is read every clock cycle
Need to set control signals during each pipeline stage.

Since control signals are associated with components active during a single pipeline stage, we can group the control lines into five groups according to pipeline stage.

Pipeline Control Signals
There are five stages in the pipeline:
– instruction fetch / PC increment: nothing to control, as instruction memory read and PC write are always enabled
– instruction decode / register fetch
– execution / address calculation
– memory access
– write back

              Execution/address calculation     Memory access stage         Write-back stage
              stage control lines               control lines               control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite   RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0          1         0
lw            0       0       0       1         0       1        0          1         1
sw            X       0       0       1         0       0        1          0         X
beq           X       0       1       0         1       0        0          0         X
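
One way to make the grouping concrete is to store the table as a nested mapping keyed by instruction class and pipeline stage. This is an illustrative sketch only: the dictionary layout and the helper name are assumptions, and `None` stands for a don't-care (X) entry.

```python
CONTROL = {
    "R-format": {
        "EX":  {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB":  {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX":  {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB":  {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX":  {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB":  {"RegWrite": 0, "MemtoReg": None},
    },
    "beq": {
        "EX":  {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB":  {"RegWrite": 0, "MemtoReg": None},
    },
}


def signals_for_stage(instruction: str, stage: str) -> dict:
    """Control bits one instruction class needs in one pipeline stage.

    IF and ID have no optional control lines, so they map to an empty dict.
    """
    return CONTROL[instruction].get(stage, {})


print(signals_for_stage("lw", "MEM"))
```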

Events on the Pipestage
In IF, in addition to fetching the instruction and computing the new PC, we store the incremented PC both into the PC and into a pipeline register (NPC) for later use in computing the branch-target address. The PC is updated in IF from one of two sources: the incremented PC or the branch target.
In ID, we fetch the registers, sign-extend the lower 16 bits of the IR (the immediate field), and pass along the IR and NPC.

During EX, we perform an ALU operation or an address calculation; we pass along the IR and the B register (if the instruction is a store). We also set the value of cond to 1 if the instruction is a taken branch.
During the MEM phase, we cycle the memory, write the PC if needed, and pass along values needed in the final pipe stage.
Finally, during WB, we update the register file from either the ALU output or the loaded value.
For simplicity, we always pass the entire IR from one stage to the next, although as an instruction proceeds down the pipeline, less and less of the IR is needed.

Pipelined Datapath with Control I

Control lines
• As was the case for the single-cycle implementation, we assume that the PC is written on each clock cycle, so there is no separate write signal for the PC.
• By the same argument, there are no separate write signals for the pipeline registers (IF/ID, ID/EX, EX/MEM, and MEM/WB), since the pipeline registers are also written during each clock cycle.
• To specify control for the pipeline, we need only set the control values during each pipeline stage.
• Because each control line is associated with a component active in only a single pipeline stage, we can divide the control lines into five groups according to the pipeline stage.

• Instruction fetch: The control signals to read instruction
memory and to write the PC are always asserted, so there is
nothing special to control in this pipeline stage.
• Instruction decode/register file read: As in the previous stage,
the same thing happens at every clock cycle, so there are no
optional control lines to set.
• Execution/address calculation: The signals to be set are
RegDst, ALUOp, and ALUSrc. The signals select the Result
register, the ALU operation, and either Read data 2 or a sign-
extended immediate for the ALU.

• Memory access: The control lines set in this stage are Branch, MemRead, and MemWrite. The branch equal, load, and store instructions set these signals, respectively. Recall that PCSrc selects the next sequential address unless control asserts Branch and the ALU result was 0.
• Write-back: The two control lines are MemtoReg, which decides between sending the ALU result or the memory value to the register file, and RegWrite, which writes the chosen value.

The values of the control lines are the same as in the single-cycle implementation, but the nine control lines have been grouped by pipeline stage into three groups corresponding to the last three pipeline stages.

Pipeline Control Implementation
Pass control signals along just like the data: extend each pipeline register to hold the control bits needed by succeeding stages. In the figure, four of the nine control lines are used in the EX stage; the remaining five are passed on to the EX/MEM pipeline register, which is extended to hold them. Three are used during the MEM stage, and the last two are passed to MEM/WB for use in the WB stage.
Note that we now need the 6-bit funct field (function code) of the instruction in the EX stage as input to ALU control, so these bits must also be included in the ID/EX pipeline register. Recall that these 6 bits are also the 6 least significant bits of the immediate field in the instruction, so the ID/EX pipeline register can supply them from the immediate field, since sign extension leaves these bits unchanged.
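
The claim that sign extension leaves the funct bits unchanged is easy to check exhaustively. The sketch below uses hypothetical helper names and represents the 32-bit sign-extended value as an unsigned Python integer.

```python
def sign_extend_16(imm16: int) -> int:
    """Sign-extend a 16-bit field to 32 bits (result kept as an unsigned int)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def funct_from_id_ex(sign_extended_imm: int) -> int:
    """Recover the 6-bit funct field from the sign-extended immediate in ID/EX."""
    return sign_extended_imm & 0x3F


# Every possible 16-bit immediate keeps its low 6 bits after sign extension.
unchanged = all(
    funct_from_id_ex(sign_extend_16(imm)) == (imm & 0x3F)
    for imm in range(0x10000)
)
print(unchanged)
```

Sign extension only replicates bit 15 into bits 16 to 31, so the low 16 bits, and therefore the low 6, are never touched.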

PIPELINE HAZARDS

Hazards prevent the next instruction in the instruction stream from executing during its designated clock cycle. There are three classes of hazards: structural, data and control.
Structural Hazards: arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution. In the laundry analogy, a structural hazard would occur if we used a washer-dryer combination instead of a separate washer and dryer.
E.g., suppose the pipeline has a single memory with one read port instead of separate instruction and data memories. In the same clock cycle, the first instruction is accessing data from memory while the fourth instruction is fetching an instruction from that same memory. Without two memories, our pipeline could have a structural hazard.
[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0). Vertical axis: program execution order (in instructions); horizontal axis: time, marked 2 to 14. Each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg, with each stage taking 2 ns and each instruction starting 2 ns after the previous one.]
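
The single-memory conflict described above can be located mechanically. This is a minimal sketch under stated assumptions: one instruction is issued per cycle starting at cycle 1, only lw and sw use memory in the MEM stage, and the function name is hypothetical.

```python
def memory_conflicts(instructions):
    """Find cycles in which one shared memory is needed by MEM and IF at once.

    instructions is a list of mnemonics in program order. Returns a list of
    (cycle, index_of_data_access, index_of_fetch) tuples, indices 0-based.
    """
    conflicts = []
    for i, op in enumerate(instructions):
        if op not in ("lw", "sw"):
            continue
        mem_cycle = (i + 1) + 3        # IF in cycle i + 1, MEM three cycles later
        fetch_index = mem_cycle - 1    # instruction whose IF falls in that cycle
        if fetch_index < len(instructions):
            conflicts.append((mem_cycle, i, fetch_index))
    return conflicts


# The four loads from the figure: the first (index 0) and fourth (index 3) collide.
print(memory_conflicts(["lw", "lw", "lw", "lw"]))
```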

Another example: a machine with only one register-file write port may need to perform two writes in the same clock cycle. When a sequence of instructions encounters such a hazard, the pipeline stalls one of the instructions until the required unit is available.
A stall is commonly called a bubble.
- When an instruction is stalled, all instructions issued later than the stalled instruction are also stalled.
- Instructions issued earlier than the stalled instruction must continue.
- No new instructions are fetched during the stall.
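
The three stall rules can be illustrated by computing when each instruction reaches WB. This is a simplified sketch with an assumed function name: a single stall of a given length is applied to one instruction in an otherwise ideal five-stage pipeline.

```python
def wb_cycles(n_instructions, stalled_index, stall_cycles):
    """Cycle in which each instruction reaches WB in a five-stage pipeline.

    Instruction i (0-based) is ideally fetched in cycle i + 1 and reaches WB
    in cycle i + 5. The stalled instruction and everything issued after it
    slip by stall_cycles; earlier instructions are unaffected.
    """
    cycles = []
    for i in range(n_instructions):
        ideal = i + 5
        delay = stall_cycles if i >= stalled_index else 0
        cycles.append(ideal + delay)
    return cycles


# One bubble inserted in front of the third instruction (index 2).
print(wb_cycles(4, stalled_index=2, stall_cycles=1))
```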

Data Hazards:
Data hazards arise when an instruction depends on the result of an earlier instruction that is still in the pipeline.
For example, in the sequence below the last four instructions all depend on the result that the first instruction, the ADD, places in register R1. If R1 held the value 10 before the ADD and −20 afterwards, the programmer intends that −20 be used by the following instructions that refer to R1. However, if we do not intervene, the instructions that read R1 before the ADD writes it back (SUB, AND and, without a split-cycle register file, OR) will read the value 10.
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R5, R1
OR R8, R1, R9
XOR R10, R1, R11

All the instructions after the ADD use the result of the ADD instruction.
                  CC1  CC2  CC3  CC4  CC5  CC6
ADD R1, R2, R3    IM   REG  ALU  DM   REG
SUB R4, R1, R5         IM   REG  ALU  DM   REG
The ADD instruction writes the value of R1 in its WB stage (CC5), but the SUB instruction needs to read it in its ID stage (CC3): a data hazard (wrong value read).

                  CC1  CC2  CC3  CC4  CC5  CC6
ADD R1, R2, R3    IM   REG  ALU  DM   REG
SUB R4, R1, R5         IM   REG  ALU  DM   REG
AND R6, R1, R7              IM   REG  ALU  DM
The AND also suffers from a data hazard.

The ADD instruction is writing the value of R1 to the register file in the same cycle in which the OR instruction is reading the value of R1. We assume that the write operation takes place in the first half of the clock cycle (W|R), while the read operation takes place in the second half. Therefore, the updated R1 value is available to the OR.
[Figure: the five instructions drawn against the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB, with dependency lines from the ADD to each later instruction.]
So the only data hazards occur for instructions 2 and 3. In this style of representation, we can easily identify true data hazards, as they are the only ones whose dependency lines go back in time. Instruction 2 reads R1 in cycle 3 and instruction 3 reads R1 in cycle 4.
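
The same conclusion can be reached by comparing read and write cycles. The sketch below is a simplified model with assumed names: instructions are issued one per cycle without stalls or forwarding, registers are read in ID and written in WB, and a flag selects whether the register file writes in the first half of the cycle.

```python
def stale_readers(program, split_cycle_regfile=True):
    """Names of instructions that read a register before its producer writes it.

    program is a list of (name, dest, src1, src2) tuples in issue order.
    Instruction i (0-based) is in ID in cycle i + 2 and in WB in cycle i + 5.
    """
    stale = []
    for i, (name, _dest, *srcs) in enumerate(program):
        read_cycle = i + 2
        for j in range(i):
            producer_dest = program[j][1]
            write_cycle = j + 5
            if split_cycle_regfile:
                in_time = write_cycle <= read_cycle   # write, then read, same cycle
            else:
                in_time = write_cycle < read_cycle
            if producer_dest in srcs and not in_time:
                stale.append(name)
                break
    return stale


program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("AND", "R6", "R5", "R1"),
    ("OR",  "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]
print(stale_readers(program))                             # hazards with write-then-read
print(stale_readers(program, split_cycle_regfile=False))  # hazards without it
```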

Solution?
• The primary solution is based on the observation that we don’t need to wait for the instruction to complete before trying to resolve the data hazard.
• If we could somehow bypass the write-back and register read stages when needed, then we could eliminate these data hazards.
• For the code sequence, as soon as the ALU creates the sum for the ADD, we can supply it as an input for the SUB.
• Luckily, the ADD instruction calculates the new value in cycle 3. If we simply forward the data as soon as it is calculated, then we will have it in time for the subsequent instructions to execute.
• Essentially, we need to pass the ALU output from the ADD directly to the SUB and AND instructions, without going through the register file.
• It’s ok if we read the wrong values from the register file, but we need to make sure the right values are used as input to the ALU!

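Forwarding logic
[Figure: pipeline diagram of the ADD, SUB and AND instructions (IM, REG, ALU, DM, REG), with the ADD's ALU result forwarded to the ALU inputs of the SUB and AND.]
The ADD instruction produces its result in its ALU (EX) stage, during cycle 3.
The SUB and AND need the new value of R1 in their EX stages, during clock cycles 4 and 5.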

Minimizing Data Hazard Stalls by Forwarding
If the result can be moved from where the ADD produces it, the EX/MEM register, to where the SUB needs it, the ALU input latches, then the need for a stall can be avoided.
The ALU result from the EX/MEM register is always fed back to the ALU input latches.
Forwarding logic detects whether the previous ALU operation has written the register corresponding to a source for the current ALU operation; if so, control logic selects the forwarded result as the ALU input rather than the value read from the register file.
Results must be forwarded not only from the immediately previous instruction, but possibly from an instruction started two clock cycles earlier (via MEM/WB); an instruction started three clock cycles earlier is covered by the register file writing in the first half of the cycle.

Forwarding hardware outline
• If there is no hazard, the ALU’s operands will come from the register file, just like before.
• If there is a hazard, the operands will come from either the EX/MEM or MEM/WB pipeline registers instead.
• If we can take the inputs to the ALU from any pipeline register rather than just ID/EX, then we can forward the proper data.
• The ALU sources will be selected by two new multiplexers, with control signals named ForwardA and ForwardB.
• By adding wider multiplexers to the input of the ALU, and with the proper controls, we can run the pipeline at full speed in the presence of these data dependences.
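
The selection made by ForwardA or ForwardB for one ALU operand can be sketched as below. This is an assumption-laden illustration, not the lecture's circuit: the function name, the dictionary keys and the string return values are hypothetical, and register 0 is assumed to be hard-wired to zero (as in MIPS) and therefore never forwarded. The newer result in EX/MEM takes priority over the older one in MEM/WB.

```python
def forward_select(id_ex_src, ex_mem, mem_wb):
    """Choose the source of one ALU operand.

    id_ex_src is the source register number of the instruction in EX.
    ex_mem and mem_wb describe the instructions one and two cycles ahead,
    each as {"reg_write": bool, "rd": destination register number}.
    Returns "EX/MEM", "MEM/WB" or "ID/EX" (no forwarding needed).
    """
    if id_ex_src != 0:
        if ex_mem["reg_write"] and ex_mem["rd"] == id_ex_src:
            return "EX/MEM"
        if mem_wb["reg_write"] and mem_wb["rd"] == id_ex_src:
            return "MEM/WB"
    return "ID/EX"


no_write = {"reg_write": False, "rd": 0}
add_result = {"reg_write": True, "rd": 1}   # ADD R1, R2, R3

# SUB R4, R1, R5 in EX while the ADD is in MEM.
print(forward_select(1, ex_mem=add_result, mem_wb=no_write))
# AND in EX one cycle later: the ADD is in WB, the SUB (writes R4) is in MEM.
print(forward_select(1, ex_mem={"reg_write": True, "rd": 4}, mem_wb=add_result))
```

The function would be evaluated twice per cycle, once for each ALU input, which corresponds to the two multiplexers controlled by ForwardA and ForwardB.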

Wider MUXs
