L12 Pipelining 1


Computer Architecture

Lecture – 12
 Pipelining:
 Pipelining is a way of organizing concurrent activity
in a computer system.
 The basic idea is to begin tasks that will be needed
later ahead of time, so that no task-performing part
of the computer sits idle.
 Achieving high performance is the main objective
of pipelining in modern computers.

09/23/22 Sumaiya Iqbal, Lecturer, CSE, BUET


 There are several ways to improve the speed of
program execution (performance):
1. Use faster circuit technology to build the
processor and the main memory.
2. Arrange the hardware so that more than one
instruction can be processed at the same time.
▪ The number of operations performed per second
increases, even though the time needed to perform
any one operation does not change.
▪ This approach is called pipelining.
 The idea behind implementing pipelining:
 The execution of each instruction is divided into two
steps:
1. Fetch
2. Execute
 For instruction Ii, let Fi and Ei refer to its fetch
and execute steps.
 Execution of a program consists of a sequence of
fetch and execute steps, as in the following figure
(explanation: [class note/see book]).
Time
      I1          I2          I3
   F1    E1    F2    E2    F3    E3

(a) Sequential execution

Instruction fetch unit -> Interstage buffer B1 -> Execution unit

(b) Hardware organization

Time
Clock cycle    1    2    3    4
Instruction
I1             F1   E1
I2                  F2   E2
I3                       F3   E3

(c) Pipelined execution



 Consider a computer that has two separate units:
 Fetch Unit: fetches an instruction
 Execution Unit: executes that instruction
 After an instruction is fetched, it is deposited
in an intermediate storage buffer, B1.
 This buffer frees the fetch unit so that it
can fetch the next instruction.
 It also lets the execution unit work on the
buffered instruction at the same time.
 It is assumed that both the source and
destination of the data are extracted from the
instruction inside the ‘Execution Unit’ block.
 The results of execution are deposited in the
destination location specified by the
instruction.
 The computer is controlled by a clock whose
period is such that the fetch and execute
steps of any instruction can each be completed in
one clock cycle.
 The above arrangement is called a
two-stage pipeline, because
 The execution of an instruction is divided into two
stages, so that in each clock cycle the two stages
carry out different steps of two consecutive
instructions.
 The result produced by a stage is loaded into
the interstage buffer at the end of each clock
cycle.
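
The overlap in the pipelined figure can be reproduced with a short Python sketch. It assumes an ideal pipeline in which every step takes exactly one clock cycle and nothing stalls; the function names are illustrative and not part of the lecture.

```python
def ideal_schedule(num_instructions, stages):
    """Return {instruction: {stage: clock_cycle}} for a stall-free pipeline.

    Assumption: instruction i (1-based) enters the first stage in cycle i
    and then advances one stage per clock cycle.
    """
    schedule = {}
    for i in range(1, num_instructions + 1):
        schedule[f"I{i}"] = {stage: i + s for s, stage in enumerate(stages)}
    return schedule


def total_cycles(schedule):
    """Clock cycle in which the last step of the last instruction finishes."""
    return max(cycle for steps in schedule.values() for cycle in steps.values())


two_stage = ideal_schedule(3, ["F", "E"])
for name, steps in two_stage.items():
    print(name, steps)
print("total cycles:", total_cycles(two_stage))
```

Three instructions finish in four clock cycles here, compared with six cycles when the same fetch and execute steps run one after another.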



 Another way of implementing pipelining:
 The processing of an instruction is divided into four
stages (four-stage pipelining) instead of two:
1. F (Fetch): Read the instruction from memory.
2. D (Decode): Decode the instruction (determine what it is) and
fetch the source operand(s).
3. E (Execute): Perform the operation specified by the
instruction.
4. W (Write): Store the result in the destination location.



 In this arrangement, the execution of four
separate instructions progresses at the same time.
 So four distinct hardware units are needed.
 These units must be capable of performing their tasks
simultaneously.
 They must not interfere with one another’s
tasks.



Time
Clock cycle    1    2    3    4    5    6    7
Instruction
I1             F1   D1   E1   W1
I2                  F2   D2   E2   W2
I3                       F3   D3   E3   W3
I4                            F4   D4   E4   W4

(a) Instruction execution divided into four steps

F: Fetch instruction -> B1 -> D: Decode instruction and fetch operands -> B2 -> E: Execute operation -> B3 -> W: Write results
(B1, B2 and B3 are the interstage buffers)

(b) Hardware organization



 Information is passed from one unit to the next
through a storage buffer.
 As an instruction progresses through the
pipeline, all the information needed by the
later stages is passed along with it.



 Contents of each interstage
buffer:
 B1: holds the instruction fetched by the fetch unit.
 B2: holds the decoded instruction, that is, the
specification of the operation to be performed,
together with the source operands and the
location where the result is to be stored.
 B3: holds the result of the instruction and its destination.
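
As a rough illustration of what each buffer carries, the following Python sketch pushes one instruction through the four stages. The tuple instruction format, the register names and values, and the `OPS` table are all hypothetical, chosen only to make the example runnable; they do not come from the lecture.

```python
import operator
from dataclasses import dataclass

# Hypothetical toy format: (operation, source 1, source 2, destination).
program = [("add", "R1", "R2", "R3")]
registers = {"R1": 5, "R2": 7, "R3": 0}
OPS = {"add": operator.add, "sub": operator.sub}


@dataclass
class B1:  # filled by the Fetch stage
    instruction: tuple


@dataclass
class B2:  # filled by the Decode stage
    operation: str
    operands: tuple
    destination: str


@dataclass
class B3:  # filled by the Execute stage
    result: int
    destination: str


def fetch(pc):
    return B1(instruction=program[pc])


def decode(b1):
    op, src1, src2, dest = b1.instruction
    # Decode also reads the source operands, as in stage D of the lecture.
    return B2(operation=op, operands=(registers[src1], registers[src2]), destination=dest)


def execute(b2):
    return B3(result=OPS[b2.operation](*b2.operands), destination=b2.destination)


def write(b3):
    registers[b3.destination] = b3.result


b1 = fetch(0)
b2 = decode(b1)
b3 = execute(b2)
write(b3)
print(b1, b2, b3, registers, sep="\n")
```

Each stage reads only the buffer in front of it, which is why the destination has to travel through B2 and B3 until the Write stage needs it.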



 Each stage in the pipeline is expected to
complete its task in one clock cycle.
 So the clock period should be long enough for
any stage to complete its task.
 If different units of the pipeline take different
amounts of time, the clock period must be
long enough for the slowest unit to complete
its task.
 The faster units then remain idle for part of each cycle.
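
A small numeric sketch of this rule follows. The stage delays are made-up values chosen for illustration, not measurements from the lecture or the textbook.

```python
# Hypothetical stage delays in nanoseconds (illustrative values only).
stage_delay_ns = {"F": 2.0, "D": 1.0, "E": 1.5, "W": 1.0}

# The clock period is set by the slowest stage.
clock_period_ns = max(stage_delay_ns.values())
slowest = max(stage_delay_ns, key=stage_delay_ns.get)

# Every faster stage waits out the rest of the cycle.
idle_ns = {stage: clock_period_ns - delay for stage, delay in stage_delay_ns.items()}

print("clock period:", clock_period_ns, "ns, set by stage", slowest)
for stage, idle in idle_ns.items():
    print(f"stage {stage} idles {idle} ns per cycle")
```

With these assumed delays, the decode and write units sit idle for half of every cycle, which is the cost of unbalanced stages.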
 But pipelining gives the best performance when
the tasks performed by the stages all take
nearly the same amount of
time.
 This is especially important when an
instruction has to be fetched
from main memory.
 Fetching anything from main
memory takes much longer than the
time allotted to each stage.
 So, if every fetch had to access main memory,
pipelining would improve performance very little.
 The use of cache memory solves this problem
to some extent.
 Since cache memory is attached to the
processor, fetching from cache memory takes
the same time as the other operations.
 It then becomes possible to divide the
processing (fetch and execute) of an instruction into
steps of equal duration.
 In the four-stage pipeline, each instruction still
takes four clock cycles from fetch to write, but once
the pipeline is full the processor completes one
instruction in every clock cycle.
 That means the rate of instruction processing
is four times that of sequential
operation.
 So, the potential increase in performance
resulting from pipelining is proportional to the
number of pipeline stages.
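
The proportionality can be checked with a little arithmetic, sketched here in Python. The sketch assumes that every stage takes one clock cycle, that nothing stalls, and that sequential operation spends k cycles on each instruction; the function names are illustrative.

```python
from fractions import Fraction


def sequential_cycles(n, k):
    """n instructions, each taking all k steps before the next one starts."""
    return n * k


def pipelined_cycles(n, k):
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    return k + (n - 1)


def speedup(n, k):
    return Fraction(sequential_cycles(n, k), pipelined_cycles(n, k))


for n in (4, 100, 10_000):
    print(f"n = {n:>6}: speedup with 4 stages = {float(speedup(n, 4)):.3f}")
```

For a short program the fill time dominates (four instructions give 16/7), but as the number of instructions grows the speedup approaches the number of stages.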
 However, this increase is achieved only if
nothing interrupts the program execution.
 In practice, several kinds of
interruption can occur.
 For many reasons, one of
the pipeline stages may not be able to
complete its task in the allotted time.



 When this happens, the pipeline
is said to have been stalled.
 Any condition that causes the pipeline
to stall is called a hazard.
 Three kinds of hazards can occur:
1. Data Hazard
2. Instruction Hazard
3. Structural Hazard



 Example of a Data Hazard:
 The Execution Unit of the pipeline is responsible for
all arithmetic and logic operations.
 It is assumed that every operation takes the same time.
 But if an operation needs more time, the
data is not ready at the allotted time for the Write
Unit, and this lack of data causes a Data Hazard.
 An example scenario is shown below:



Time
Clock cycle    1    2    3    4    5    6    7    8    9
Instruction
I1             F1   D1   E1   W1
I2                  F2   D2   [---- E2 ----] W2
I3                       F3   D3             E3   W3
I4                            F4             D4   E4   W4
I5                                           F5   D5   E5

Figure 8.3. Effect of an execution operation taking more than one clock cycle.



 In the above example, the execution unit was
supposed to finish its task at the end of clock cycle 4,
but it took two extra clock cycles.
 So, the pipeline is stalled for two clock cycles.
 One important thing to note about this kind of
hazard:
▪ The delay of two clock cycles carries over to all the
successive instructions.
▪ So, in the long run, there is no chance to make
up the delay.
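
The way the two-cycle delay carries over can be seen in a small simulation. This is only a sketch under one assumed model: instructions move in order, and an instruction cannot enter a stage until the instruction ahead of it has moved on to the next stage (the interstage buffer must not be overwritten). The function and variable names are illustrative.

```python
STAGES = ["F", "D", "E", "W"]


def simulate(durations):
    """durations: one dict per instruction, mapping stage -> cycles needed.

    Returns one dict per instruction, mapping stage -> first cycle spent in it.
    """
    starts = []
    for i, dur in enumerate(durations):
        start = {}
        for s, stage in enumerate(STAGES):
            # Earliest cycle after this instruction finishes its previous stage.
            if s == 0:
                ready = 1
            else:
                prev = STAGES[s - 1]
                ready = start[prev] + dur[prev]
            # The stage must first be vacated by the instruction ahead.
            if i > 0:
                ahead, ahead_dur = starts[i - 1], durations[i - 1]
                if s + 1 < len(STAGES):
                    free = ahead[STAGES[s + 1]]
                else:
                    free = ahead[stage] + ahead_dur[stage]
                ready = max(ready, free)
            start[stage] = ready
        starts.append(start)
    return starts


one = {"F": 1, "D": 1, "E": 1, "W": 1}
slow_e = dict(one, E=3)  # E2 needs three cycles instead of one

ideal = simulate([one] * 5)
stalled = simulate([one, slow_e, one, one, one])
delays = [s["W"] - i["W"] for s, i in zip(stalled, ideal)]
print("write cycles, ideal:  ", [x["W"] for x in ideal])
print("write cycles, stalled:", [x["W"] for x in stalled])
print("delay per instruction:", delays)
```

Under this model I2 and every instruction after it finish exactly two cycles late, so the lost time is never recovered.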
 So, Data Hazards are hazards that
occur because a
source or destination operand is not available at the expected
time in the pipeline.
 Instruction Hazards (Control Hazards)
are those that occur because an instruction is not
available at the expected time,
so the pipeline stalls.



 Example of an Instruction Hazard:
 Assume that an instruction fetch misses
the cache.
 The fetch unit then has to access main memory to fetch that
instruction.
 For that, the fetch unit needs three extra clock cycles.
 So, the pipeline is stalled for three clock cycles.
 These extra clock cycles, or idle periods, are called stalls
or bubbles.
Time
Clock cycle    1    2    3    4    5    6    7    8    9
Instruction
I1             F1   D1   E1   W1
I2                  [------ F2 ------]  D2   E2   W2
I3                                      F3   D3   E3   W3

(a) Instruction execution steps in successive clock cycles

Time
Clock cycle    1    2    3    4    5    6    7    8    9
Stage
F: Fetch       F1   F2   F2   F2   F2   F3
D: Decode           D1   idle idle idle D2   D3
E: Execute               E1   idle idle idle E2   E3
W: Write                      W1   idle idle idle W2   W3

(b) Function performed by each processor stage in successive clock cycles

Figure 8.4. Pipeline stall caused by a cache miss in F2.
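
The stage-by-stage view in part (b) of Figure 8.4 can be derived mechanically from the instruction-by-instruction view in part (a). The Python sketch below does this; the `schedule` table is transcribed from the figure, while the function name and the data layout are illustrative choices.

```python
# Instruction view of Figure 8.4(a): stage -> (first cycle, number of cycles).
schedule = {
    1: {"F": (1, 1), "D": (2, 1), "E": (3, 1), "W": (4, 1)},
    2: {"F": (2, 4), "D": (6, 1), "E": (7, 1), "W": (8, 1)},  # F2 misses the cache
    3: {"F": (6, 1), "D": (7, 1), "E": (8, 1), "W": (9, 1)},
}
LAST_CYCLE = 9


def stage_view(schedule, stage):
    """Return what one stage does in each of the cycles 1..LAST_CYCLE."""
    row = [""] * LAST_CYCLE
    for i, steps in schedule.items():
        first, length = steps[stage]
        for cycle in range(first, first + length):
            row[cycle - 1] = f"{stage}{i}"
    busy = [c for c, label in enumerate(row) if label]
    # A gap between two busy cycles is a bubble: the stage is idle.
    for c in range(busy[0], busy[-1]):
        if not row[c]:
            row[c] = "idle"
    return row


for stage in "FDEW":
    print(stage, stage_view(schedule, stage))
```

Each of the Decode, Execute and Write stages shows three idle cycles, one for every extra cycle the fetch unit spends on F2.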



 When a delay occurs because more than one
instruction needs the same hardware at the
same time, the resulting
hazard is called a Structural Hazard.
 In most cases, this hazard occurs when
more than one instruction needs to access
memory at the same time.



 A sample situation: one instruction wants to
fetch its source operands while the fetch
unit wants to fetch an instruction from
memory at the same time.
 If the same cache is used for data and instructions,
one access proceeds and the other is
delayed (stalled).
 For this reason, processors sometimes have two separate
caches, one for data and one for instructions.
 Example of a Structural Hazard:
 Load X(R1), R2 (the operand address is X + [R1])
 The memory access requires one extra clock cycle.
 Because of that, this instruction and the next instruction
want to access the Register File at the same time
for writing, which causes a Structural Hazard.
 The next instruction (I3) is therefore stalled for one
clock cycle.
 This happens because a single piece of hardware, the Register File, can
not handle two write operations at once.
Time
Clock cycle    1    2    3    4    5    6    7
Instruction
I1             F1   D1   E1   W1
I2 (Load)           F2   D2   E2   M2   W2
I3                       F3   D3   E3        W3
I4                            F4   D4        E4
I5                                 F5        D5

Figure 8.5. Effect of a Load instruction on pipeline timing.


 If the Register File had more than one input
port, two simultaneous writes would be
possible and the pipeline would not be stalled.
 In general, structural hazards are avoided by
providing sufficient hardware resources.
 In the figure, I4 is delayed because I3 is already
using the memory at that time, so I4
can not access the memory.
 For the same reason, I5 is stalled for one clock cycle.
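
The effect of the number of Register File write ports can be sketched as a simple in-order arbitration. This models only the Write step, not the knock-on delays of later instructions, and the function name and port-counting scheme are assumptions made for illustration.

```python
def schedule_writes(requested_cycles, write_ports=1):
    """Grant each instruction, in program order, a clock cycle for its Write step.

    requested_cycles: the cycle in which each instruction is ready to write.
    An instruction waits if all write ports are taken in that cycle.
    """
    used = {}      # clock cycle -> number of write ports already taken
    granted = []
    earliest = 0   # in-order: never write before an earlier instruction
    for wanted in requested_cycles:
        cycle = max(wanted, earliest)
        while used.get(cycle, 0) >= write_ports:
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        granted.append(cycle)
        earliest = cycle
    return granted


# I1 is ready to write in cycle 4; the Load (I2) and I3 are both ready in cycle 6.
ready = [4, 6, 6]
print("one write port: ", schedule_writes(ready, write_ports=1))
print("two write ports:", schedule_writes(ready, write_ports=2))
```

With a single port I3 has to wait until cycle 7, while a second port lets both writes happen in cycle 6.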



 Zaky
 Chapter 8 : 8.1(8.1.1, 8.1.2)
 Class Lecture
