L12 Pipelining 1


Computer Architecture

Lecture – 12
• Pipelining:
• Pipelining is a way of organizing concurrent activity in a computer system.
• The basic idea is to start work that will be needed in the near future (such as fetching the next instruction) ahead of time, so that no part of the computer that performs a task is kept idle.
• The main objective of pipelining in modern computers is to achieve high performance.

09/23/22 Sumaiya Iqbal, Lecturer, CSE, BUET

• There are many ways to improve the speed of execution of a program (performance), including:
1. Use faster circuit technology to build the processor and the main memory.
2. Arrange the hardware so that more than one instruction can be processed at the same time.
▪ The number of operations performed per second is then increased, even though the time needed to complete any single operation is unchanged.
▪ This approach is called pipelining.
• The idea behind implementing pipelining:
• The execution of a program is divided into two kinds of steps, each handled by its own unit:
1. Fetch (performed by the fetch unit)
2. Execute (performed by the execution unit)
• For instruction Ii, let Fi and Ei refer to its fetch and execute steps.
• Execution of a program consists of a sequence of fetch and execute steps, as shown in the following figure (explanation: see class notes/book).
(a) Sequential execution: F1 E1 F2 E2 F3 E3 (I1, then I2, then I3, one step after another in time)

(b) Hardware organization: Instruction fetch unit → Interstage buffer (B1) → Execution unit

(c) Pipelined execution
Clock cycle   1    2    3    4
I1            F1   E1
I2                 F2   E2
I3                      F3   E3


• Consider a computer that has two separate units:
• Fetch Unit: fetches an instruction.
• Execution Unit: executes that instruction.
• After an instruction is fetched, it is deposited into an intermediate storage buffer, B1.
• This buffer frees the fetch unit so that it can fetch the next instruction.
• It also enables the execution unit to perform its operation on the buffered instruction at the same time.
• It is assumed that both the source and the destination of the data operated on by an instruction are inside the 'Execution Unit' block.
• The results of execution are deposited in the destination location specified by the instruction.
• The computer is controlled by a clock whose period is long enough for the fetch and execute steps of any instruction to be completed in one clock cycle.
• The above arrangement is called two-stage pipelining, because:
• The processing of an instruction is divided into two stages, and its two consecutive steps (fetch and execute) are performed in different stages.
• The new result of each stage is loaded into the intermediate buffer at the end of each clock cycle.
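A minimal Python sketch of the two-stage timing, assuming every fetch and every execute step takes exactly one clock cycle and nothing stalls. The function names `two_stage_schedule` and `sequential_finish_cycles` are hypothetical, chosen only for illustration. In each cycle the fetch unit works on I(i+1) while the execution unit works on Ii, taken from B1.

```python
def sequential_finish_cycles(n):
    """Cycle in which each of n instructions finishes when F and E never overlap."""
    # Each instruction takes 2 cycles (Fi then Ei), strictly one after another.
    return [2 * i for i in range(1, n + 1)]


def two_stage_schedule(n):
    """Map each clock cycle to the (fetch, execute) steps performed in it.

    Ii is fetched in cycle i and executed in cycle i + 1, while the
    fetch unit is already fetching I(i+1).
    """
    schedule = {}
    for cycle in range(1, n + 2):
        fetch = f"F{cycle}" if cycle <= n else None
        execute = f"E{cycle - 1}" if cycle >= 2 else None
        schedule[cycle] = (fetch, execute)
    return schedule


for cycle, (f, e) in two_stage_schedule(3).items():
    print(cycle, f or "-", e or "-")
print("sequential finish cycles:", sequential_finish_cycles(3))
```

For three instructions the pipelined version finishes in cycle 4 instead of cycle 6; in general it needs n + 1 cycles instead of 2n.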


• Another way of implementing pipelining:
• The processing of an instruction is divided into four stages (four-stage pipelining) instead of two:
1. F (Fetch): Read the instruction from the memory.
2. D (Decode): Decode the instruction (what is it?) and fetch the source operand(s).
3. E (Execute): Perform the operation specified by the instruction.
4. W (Write): Store the result in the destination location.


• In this arrangement, four separate instructions are in progress at the same time, each in a different stage.
• So four distinct hardware units are needed, one for each stage.
• These units must be capable of performing their tasks simultaneously, without interfering with one another's operation.


(a) Instruction execution divided into four steps
Clock cycle   1    2    3    4    5    6    7
I1            F1   D1   E1   W1
I2                 F2   D2   E2   W2
I3                      F3   D3   E3   W3
I4                           F4   D4   E4   W4

(b) Hardware organization (interstage buffers B1, B2, B3)
F: Fetch instruction → B1 → D: Decode instruction and fetch operands → B2 → E: Execute operation → B3 → W: Write results
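The regular pattern in the timing diagram can be generated with a short Python sketch, assuming an ideal four-stage pipeline in which every step takes one clock cycle and no hazards occur, so Ii enters stage s (counted from 0) in cycle i + s. `pipeline_table` and `render` are hypothetical helper names.

```python
STAGES = ["F", "D", "E", "W"]   # Fetch, Decode, Execute, Write


def pipeline_table(n_instructions, stages=STAGES):
    """Return (label, {cycle: step}) rows for an ideal pipeline.

    Instruction Ii enters stage s (counted from 0) in clock cycle i + s,
    so one instruction starts and one finishes in every cycle.
    """
    rows = []
    for i in range(1, n_instructions + 1):
        steps = {i + s: f"{name}{i}" for s, name in enumerate(stages)}
        rows.append((f"I{i}", steps))
    return rows


def render(rows):
    """Lay the rows out like the timing diagram in (a)."""
    last = max(max(steps) for _, steps in rows)
    lines = ["Cycle " + "".join(f"{c:>4}" for c in range(1, last + 1))]
    for label, steps in rows:
        cells = "".join(f"{steps.get(c, ''):>4}" for c in range(1, last + 1))
        lines.append(f"{label:<6}" + cells)
    return "\n".join(lines)


print(render(pipeline_table(4)))
```

Changing `stages` to two entries gives the two-stage diagram; the last instruction always finishes in cycle n + k - 1 for k stages.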


• Information is passed from one unit to the next through a storage buffer.
• As an instruction progresses through the pipeline, all the information needed by the later stages is passed along with it.


• Tasks of each intermediate storage buffer:
• B1: holds the instruction fetched by the fetch unit.
• B2: holds the decoded instruction, that is, the specification of the operation (what is to be done), together with the source operands and the destination location for the result of the instruction.
• B3: holds the result of the instruction and its destination.
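The following Python sketch models the three buffers as variables that are updated once per clock cycle. It is an illustration under strong assumptions, not a description of real hardware: instructions are hypothetical register-register tuples `(op, src1, src2, dest)`, only `Add` and `Sub` are supported, every stage takes one cycle, and the instructions are independent so no hazards arise. The class and function names are also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodedInstruction:   # contents of B2
    op: str                 # operation to perform
    operands: tuple         # source operand values read in the D stage
    dest: str               # destination register


@dataclass
class Result:               # contents of B3
    value: int
    dest: str


# Hypothetical operation set for the sketch.
OPS = {"Add": lambda a, b: a + b, "Sub": lambda a, b: a - b}


def run(program, regs):
    """Run independent (op, src1, src2, dest) instructions through F, D, E, W.

    Returns the number of clock cycles used; results are written into regs.
    """
    b1: Optional[tuple] = None               # B1: fetched instruction
    b2: Optional[DecodedInstruction] = None  # B2: decoded op, operands, destination
    b3: Optional[Result] = None              # B3: result and destination
    pc, cycle = 0, 0
    while pc < len(program) or any(b is not None for b in (b1, b2, b3)):
        cycle += 1
        # Stages are evaluated from W back to F so that each stage reads what
        # its input buffer held at the start of the cycle, as clocked
        # buffers would provide.
        if b3 is not None:                                             # W
            regs[b3.dest] = b3.value
        b3 = Result(OPS[b2.op](*b2.operands), b2.dest) if b2 is not None else None  # E
        if b1 is not None:                                             # D
            op, src1, src2, dest = b1
            b2 = DecodedInstruction(op, (regs[src1], regs[src2]), dest)
        else:
            b2 = None
        if pc < len(program):                                          # F
            b1 = program[pc]
            pc += 1
        else:
            b1 = None
    return cycle


regs = {"R1": 2, "R2": 3, "R3": 0, "R4": 10, "R5": 0, "R6": 0}
program = [("Add", "R1", "R2", "R3"),   # R3 <- R1 + R2
           ("Sub", "R4", "R1", "R5"),   # R5 <- R4 - R1
           ("Add", "R4", "R4", "R6")]   # R6 <- R4 + R4
cycles = run(program, regs)
print(cycles, regs["R3"], regs["R5"], regs["R6"])
```

Three instructions take 3 + 3 = 6 cycles: three cycles to fill the pipeline, then one completion per cycle.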


• Each stage in the pipeline is expected to complete its task in one clock cycle.
• So the clock period must be long enough for the task in any stage to be completed.
• If different units of the pipeline take different amounts of time, the clock period must be long enough for the slowest unit to complete its task.
• The faster units will then remain idle for part of each clock cycle.
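A small Python sketch of this rule; the stage delays are made-up numbers chosen only to show the arithmetic, and the function names are hypothetical.

```python
def clock_period(stage_delays):
    """The clock period must be long enough for the slowest stage."""
    return max(stage_delays)


def idle_time_per_cycle(stage_delays):
    """How long each stage sits idle in every cycle, waiting for the slowest one."""
    period = clock_period(stage_delays)
    return [period - d for d in stage_delays]


# Hypothetical delays in ns for the F, D, E and W stages.
delays = [2.0, 1.5, 3.0, 1.0]
print(clock_period(delays))         # 3.0
print(idle_time_per_cycle(delays))  # [1.0, 1.5, 0.0, 2.0]
```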
• So pipelining gives the best performance when the tasks performed by the stages take nearly the same amount of time.
• This is especially important when an instruction must be fetched from the main memory.
• Fetching information from the main memory takes much longer than the time assigned to each stage.
• So, when the main memory must be accessed, pipelining cannot improve the performance significantly.
• The use of cache memory solves this problem to some extent.
• Since the cache memory is attached to the processor, fetching from it takes about the same time as the other basic operations.
• It is then possible to divide the processing of an instruction (fetch & execute) into steps of roughly the same duration.
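The effect can be estimated with the following Python sketch, which compares, per instruction, the time for sequential execution (the sum of all stage delays) with the time for an ideal pipeline (one clock period, set by the slowest stage). Pipeline fill time is ignored. The delays, including the assumption that a main-memory fetch is 10 times slower than a cache fetch, are hypothetical, as are the function names.

```python
def time_per_instruction(stage_delays):
    """Return (sequential, pipelined) time per instruction, ignoring pipeline fill.

    Sequential: an instruction passes through every stage before the next starts.
    Pipelined: one instruction completes per clock period, and the period
    must fit the slowest stage.
    """
    return sum(stage_delays), max(stage_delays)


def gain(stage_delays):
    sequential, pipelined = time_per_instruction(stage_delays)
    return sequential / pipelined


# Hypothetical delays in ns for the F, D, E and W stages.
fetch_from_cache = [1, 1, 1, 1]
fetch_from_main_memory = [10, 1, 1, 1]   # assumed 10x slower fetch

print(gain(fetch_from_cache))        # 4.0
print(gain(fetch_from_main_memory))  # 1.3
```

With balanced stages the four-stage pipeline approaches a fourfold gain; one slow fetch stage stretches the clock period for every stage and most of the gain disappears.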
• In four-stage pipelining, the processor completes the processing of one instruction in each clock cycle (although each individual instruction still takes four clock cycles).
• That means the rate of instruction processing is four times that of sequential operation.
• So the potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
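For n instructions on a k-stage pipeline with one-cycle stages and no stalls, sequential execution needs nk cycles, while the pipeline needs k cycles for the first instruction and one more cycle for each of the remaining n - 1. The speedup is therefore nk / (k + n - 1), which approaches k as n grows. A minimal Python sketch of this arithmetic (function names hypothetical):

```python
def sequential_cycles(n, k):
    return n * k            # each instruction occupies all k cycles on its own


def pipelined_cycles(n, k):
    return k + (n - 1)      # k cycles for I1, then one completion per cycle


def speedup(n, k):
    return sequential_cycles(n, k) / pipelined_cycles(n, k)


for n in (4, 100, 10_000):
    print(n, round(speedup(n, 4), 3))
```

For k = 4 this prints speedups of 2.286, 3.883 and 3.999 for n = 4, 100 and 10,000, approaching the factor of four stated above.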
• However, this increase is achieved only if pipelined operation continues without interruption throughout program execution.
• In practice, several kinds of interruption can occur.
• For many reasons, one of the pipeline stages may not be able to complete its task in the allotted time.


• When this kind of situation occurs, pipelined operation is said to have been stalled.
• Any condition that causes pipelined operation to stall is called a hazard.
• Three kinds of hazards can occur:
1. Data Hazard
2. Instruction Hazard &
3. Structural Hazard


• Example Data Hazard:
• The Execution Unit of the pipeline is responsible for performing all arithmetic and logic operations.
• It is assumed that every operation takes the same time (one clock cycle).
• But if some operation needs more time, its result is not ready when the Write unit expects it, so the Write unit has no data to work with and a Data Hazard occurs.
• An example scenario is shown below:


Clock cycle   1    2    3    4    5    6    7    8    9
I1            F1   D1   E1   W1
I2                 F2   D2   E2   E2   E2   W2
I3                      F3   D3             E3   W3
I4                           F4             D4   E4   W4
I5                                          F5   D5   E5

Figure 8.3. Effect of an execution operation taking more than one clock cycle.


• In the above example, the execution unit was supposed to finish executing I2 at the end of clock cycle 4, but it took two extra clock cycles.
• So the pipeline is stalled for two clock cycles.
• One important thing to note about this kind of hazard:
▪ The delay of two clock cycles carries over to all the successive instructions.
▪ So, in the long run, there is no chance to make up the delay.
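The stall in Figure 8.3 can be reproduced with a small Python sketch of an in-order pipeline. It assumes that each stage holds one instruction at a time and that an instruction which has finished a stage waits there (in its interstage buffer) until the next stage is free. `simulate` and the latency table are hypothetical names, not from the text.

```python
STAGES = ("F", "D", "E", "W")


def simulate(latencies):
    """Return the start cycle of every stage for every instruction.

    latencies[i][s] is the number of clock cycles instruction i spends in
    stage s. Each stage holds one instruction; an instruction that has
    finished a stage waits there until the next stage is free, which also
    holds up every instruction behind it.
    """
    starts = []
    for i, lat in enumerate(latencies):
        row = []
        for s in range(len(STAGES)):
            # Earliest cycle this instruction is ready for stage s.
            ready = row[s - 1] + lat[s - 1] if s > 0 else 1
            # Earliest cycle the previous instruction has vacated stage s.
            if i == 0:
                free = 1
            elif s + 1 < len(STAGES):
                free = starts[i - 1][s + 1]
            else:
                free = starts[i - 1][s] + latencies[i - 1][s]
            row.append(max(ready, free))
        starts.append(row)
    return starts


# Figure 8.3: the Execute step of I2 takes three cycles instead of one.
latencies = [[1, 1, 1, 1] for _ in range(5)]
latencies[1][2] = 3
for number, row in enumerate(simulate(latencies), start=1):
    print(f"I{number}", dict(zip(STAGES, row)))
```

With I2's Execute step lasting three cycles, W3 moves from cycle 6 to cycle 8 and every later instruction also finishes two cycles late, which is the delay that is never made up.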
• So, data hazards are hazards that occur because a source or destination operand is not available at the time expected in the pipeline.
• Instruction hazards (control hazards) are those that occur because an instruction is not available at the expected time, so the pipeline stalls.


• Example Instruction Hazard:
• Assume an instruction fetch operation misses in the cache.
• So the instruction has to be fetched from the main memory.
• For that, the fetch unit needs three extra clock cycles.
• So the pipeline is stalled for three clock cycles.
• These extra clock cycles, or idle periods, are called stalls or bubbles.
(a) Instruction execution steps in successive clock cycles
Clock cycle   1    2    3    4    5    6    7    8    9
I1            F1   D1   E1   W1
I2                 F2   F2   F2   F2   D2   E2   W2
I3                                     F3   D3   E3   W3

(b) Function performed by each processor stage in successive clock cycles
Clock cycle   1    2    3    4    5    6    7    8    9
F: Fetch      F1   F2   F2   F2   F2   F3
D: Decode          D1   idle idle idle D2   D3
E: Execute              E1   idle idle idle E2   E3
W: Write                     W1   idle idle idle W2   W3

Figure 8.4. Pipeline stall caused by a cache miss in F2.
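Under simple assumptions (one-cycle stages, in-order flow, a single cache miss), the cost of the miss can be counted directly: the bubbles delay the instruction that missed and every instruction after it by the same amount. A minimal Python sketch; `completion_cycles` and its parameters are hypothetical names.

```python
def completion_cycles(n, k, miss_instruction=None, penalty=0):
    """Cycle in which each of n instructions leaves the last of k stages.

    Without stalls, Ii finishes in cycle i + k - 1. A cache miss while
    fetching instruction number `miss_instruction` adds `penalty` idle
    cycles (bubbles) that delay it and every later instruction.
    """
    finish = []
    for i in range(1, n + 1):
        delayed = miss_instruction is not None and i >= miss_instruction
        finish.append(i + k - 1 + (penalty if delayed else 0))
    return finish


print(completion_cycles(3, 4))                                 # [4, 5, 6]
print(completion_cycles(3, 4, miss_instruction=2, penalty=3))  # [4, 8, 9]
```

The second result matches Figure 8.4: W2 in cycle 8 and W3 in cycle 9.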


• When a delay occurs because more than one instruction requires the same hardware at the same time, the hazard is called a Structural Hazard.
• In most cases, this hazard occurs when more than one instruction wants to access the memory at the same time.


• A sample situation: one instruction wants to fetch its source operands from the memory while the fetch unit wants to fetch an instruction from the memory at the same time.
• If the same cache is used for data and instructions, then one of them gets access and the other is delayed (stalled).
• For this reason, processors sometimes have two separate caches, one for data and one for instructions.
• Example Structural Hazard:
• Load X(R1), R2, i.e., load the word at memory address X + [R1] into register R2.
• The memory access requires one extra clock cycle.
• Because of this, the Load and the next instruction both want to access the Register File for writing at the same time, which causes a Structural Hazard.
• It causes the next instruction (I3) to be stalled by one clock cycle.
• It occurs because one piece of hardware, the Register File, cannot handle two operations at once.
Clock cycle 1 2 3 4 5 6 7

I1 F1 D1 E1 W1

I2 (Load) F2 D2 E2 M2 W2

I3 F3 D3 E3 W3

I4 F4 D4 E4

I5 F5 D5

Figure 8.5. Effect of a Load instruction on pipeline timing.

• If the Register File had more than one input port, two simultaneous accesses would be possible and the pipeline would not be stalled.
• In general, structural hazards are avoided by providing the necessary hardware resources.
• Now, in the figure, I4 is delayed because the stalled I3 still occupies the pipeline stage that I4 needs next, so I4 cannot move ahead at that time.
• For the same reason, I5 is stalled by one clock cycle.
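The register-file conflict can be sketched in Python by scheduling only the Write steps. The sketch assumes each instruction wants to write in the cycle it would reach W in an ideal four-stage pipeline (cycle i + 3 for Ii), and that the Load I2 wants cycle 6 because of its extra memory cycle. The function name and port counts are hypothetical, and everything except the write-port limit is ignored.

```python
def schedule_writes(desired, write_ports):
    """Assign register-file write cycles in program order.

    desired[i] is the cycle in which instruction i would like to write its
    result. At most `write_ports` writes fit in one cycle, and no instruction
    writes before the one ahead of it, so a conflict pushes the later
    instruction, and everything behind it, to a later cycle.
    """
    writes_in_cycle = {}    # cycle -> writes already scheduled in it
    actual = []
    earliest = 1
    for want in desired:
        cycle = max(want, earliest)
        while writes_in_cycle.get(cycle, 0) >= write_ports:
            cycle += 1
        writes_in_cycle[cycle] = writes_in_cycle.get(cycle, 0) + 1
        actual.append(cycle)
        earliest = cycle
    return actual


# I1..I5 in an ideal four-stage pipeline would write in cycles 4, 5, 6, 7, 8;
# the Load (I2) needs an extra memory cycle, so it wants cycle 6.
desired = [4, 6, 6, 7, 8]
print(schedule_writes(desired, write_ports=1))  # [4, 6, 7, 8, 9]
print(schedule_writes(desired, write_ports=2))  # [4, 6, 6, 7, 8]
```

With one write port, I3 is pushed from cycle 6 to cycle 7 and the instructions behind it move along with it; with two ports, no instruction is delayed.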


• Zaky
• Chapter 8: 8.1 (8.1.1, 8.1.2)
• Class Lecture

