
Computer Architecture

Lecture – 14
  The task of the instruction fetch unit is to
supply the next pipeline stage with a
steady stream of instructions.
  Whenever this stream is interrupted, the pipeline
stalls. This situation is called an
instruction hazard.
  Cache misses and branch instructions are the two
important cases in which an instruction hazard can
occur.
09/23/22 Sumaiya Iqbal, Lecturer, CSE, BUET
  The effect of a cache miss on the pipeline, and
the delay incurred in fetching the
instruction from main memory, has already been
shown.
  Here, we examine how branch instructions
affect pipeline performance.
  We first examine unconditional
branch instructions.



  Example 1:
  Let us first assume a two-stage pipeline.
Time
Clock cycle     1     2     3     4     5     6

I1              F1    E1
I2 (Branch)           F2    E2
I3                          F3    X
Ik                                Fk    Ek
Ik+1                                    Fk+1  Ek+1

(The execution unit is idle in cycle 4.)

Figure 8.8. An idle cycle caused by a branch instruction.


  Here, I1, I2 and I3 are stored in successive memory
locations.
  Of these, I2 is a branch instruction whose target
instruction is Ik.
  By the end of cycle 3, I3 has been completely fetched, and
only then is I2 found to be a branch instruction
whose target is Ik.
  I3 must therefore be discarded, since it was
fetched incorrectly.



  The fetch unit then starts fetching the target
instruction, Ik.
  Meanwhile, in cycle 4, when I3 would have been
executed, the execution
unit must be told to do nothing.
  So the execution unit is idle for one cycle.
  The time lost as a result of a branch
instruction (mainly because of incorrectly fetched
instructions) is called the branch penalty.
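
As a rough sketch of what the penalty costs, the cycle count of a short instruction sequence can be worked out under simple assumptions: an ideal k-stage pipeline finishes n instructions in n + k - 1 cycles, and each taken branch adds a fixed penalty. The function name and its parameters are illustrative and do not come from the lecture.

```python
def total_cycles(n_instructions, n_stages, taken_branches=0, penalty=0):
    """Cycles needed to finish n_instructions on an idealised pipeline.

    Assumptions (not from the lecture): one instruction enters the pipeline
    per cycle, every stage takes one cycle, and each taken branch wastes
    `penalty` cycles.
    """
    fill_and_drain = n_instructions + n_stages - 1
    return fill_and_drain + taken_branches * penalty


# Example 1: I1, I2 (branch), Ik and Ik+1 on a two-stage pipeline.
print(total_cycles(4, 2))                               # 5 cycles with no penalty
print(total_cycles(4, 2, taken_branches=1, penalty=1))  # 6 cycles, as in Figure 8.8
```

The second call gives the six clock cycles shown in Figure 8.8; the one extra cycle is the branch penalty.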
  In the first example, the branch penalty is one
clock cycle.
  If the number of pipeline stages increases, the branch penalty for the same
branch instruction may increase.
  Example 2:
  The same example is run on a four-stage
pipeline.
  This time the branch decision is made at the end of the
execute phase of I2.
  Here, two instructions (I3 and I4) are fetched incorrectly.
  Both must be discarded, so the
branch penalty becomes two clock cycles.
  The earlier the branch is detected,
the smaller the branch penalty.
  For example, in part (a) of Figure 8.9 the branch is
resolved in the execute stage, so the branch
penalty is two clock cycles; in part (b) it is resolved in the
decode stage, and the branch penalty drops to one clock
cycle.
Time
Clock cycle     1     2     3     4     5     6     7     8

I1              F1    D1    E1    W1
I2 (Branch)           F2    D2    E2
I3                          F3    D3    X
I4                                F4    X
Ik                                      Fk    Dk    Ek    Wk
Ik+1                                          Fk+1  Dk+1  Ek+1

(a) Branch address computed in Execute stage

Time
Clock cycle     1     2     3     4     5     6     7

I1              F1    D1    E1    W1
I2 (Branch)           F2    D2
I3                          F3    X
Ik                                Fk    Dk    Ek    Wk
Ik+1                                    Fk+1  Dk+1  Ek+1

(b) Branch address computed in Decode stage

Figure 8.9. Branch timing.
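
The two timings in Figure 8.9 follow one pattern, which the sketch below makes explicit. It assumes that stages are numbered from 1 (F = 1, D = 2, E = 3, W = 4), that every stage takes one cycle, and that the target is fetched in the cycle after the branch is resolved. The function name `branch_timing` is hypothetical.

```python
def branch_timing(branch_fetch_cycle, resolve_stage):
    """Return (resolve_cycle, target_fetch_cycle, penalty) for a taken branch.

    Assumed model: the branch is fetched in `branch_fetch_cycle`, moves one
    stage per cycle, and is resolved at the end of stage `resolve_stage`.
    Every instruction fetched in between is discarded.
    """
    resolve_cycle = branch_fetch_cycle + resolve_stage - 1
    target_fetch_cycle = resolve_cycle + 1
    penalty = resolve_stage - 1  # one wasted fetch per cycle spent waiting
    return resolve_cycle, target_fetch_cycle, penalty


# I2 is fetched in cycle 2 in both parts of Figure 8.9.
for label, stage in (("(a) Execute", 3), ("(b) Decode", 2)):
    resolved, target_fetch, penalty = branch_timing(2, stage)
    print(f"{label}: resolved in cycle {resolved}, "
          f"Ik fetched in cycle {target_fetch}, penalty {penalty}")
```

Under these assumptions part (a) gives a penalty of two cycles with Ik fetched in cycle 5, and part (b) a penalty of one cycle with Ik fetched in cycle 4.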



  Both cache misses and branch instructions
can stall the pipeline.
  The same hardware mechanism
can be used in both cases.
  A more sophisticated fetch unit
can fetch instructions
before they are needed.
  These prefetched instructions are kept in a queue
called the instruction queue.
  A separate unit, called the dispatch unit, takes
instructions from the front of the queue.
  It also decodes each instruction and then sends it
to the execution unit.
[Figure: the instruction fetch unit (F: fetch instruction) fills the instruction queue; the dispatch/decode unit (D) takes instructions from the queue and passes them on to E: execute instruction and W: write results.]



  When the pipeline stalls because of a
data hazard, for example, the dispatch unit cannot
issue instructions for some clock
cycles.
  The fetch unit should not stop in that case; it
should continue fetching instructions and
storing them in the queue.
  Similarly, if the fetch unit is delayed in
fetching instructions, the dispatch unit continues to
take instructions from the queue.
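
The decoupling described above can be sketched with a small simulation. This is a minimal model under stated assumptions, not the hardware itself: at most one dispatch and one fetch per cycle, dispatch happens before fetch within a cycle, and the stall cycles are supplied by hand. The function `simulate_queue` and all of its parameters are hypothetical.

```python
from collections import deque


def simulate_queue(cycles, capacity, dispatch_stalled, fetch_stalled):
    """Return (queue length after each cycle, list of (cycle, instruction) issued)."""
    queue = deque()
    lengths = []
    issued = []
    next_instruction = 1
    for cycle in range(1, cycles + 1):
        # The dispatch unit takes from the front of the queue unless it is stalled.
        if cycle not in dispatch_stalled and queue:
            issued.append((cycle, queue.popleft()))
        # The fetch unit keeps filling the queue unless it is stalled or the queue is full.
        if cycle not in fetch_stalled and len(queue) < capacity:
            queue.append(f"I{next_instruction}")
            next_instruction += 1
        lengths.append(len(queue))
    return lengths, issued


lengths, issued = simulate_queue(
    cycles=8, capacity=4, dispatch_stalled={3, 4}, fetch_stalled={6, 7}
)
print(lengths)  # [1, 1, 2, 3, 3, 2, 1, 1]
print(issued)   # I1 in cycle 2, then I2..I5 in cycles 5 to 8
```

In this run the queue grows while dispatch is stalled (cycles 3 and 4) and drains while fetch is stalled (cycles 6 and 7), yet the dispatch unit still issues an instruction in every cycle in which it is not itself stalled.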
  Example 3:
  This example shows how the queue
length changes and how that affects the relationship
between the pipeline stages.
  Assume the queue initially holds one instruction.
  Every completed fetch adds one instruction to the queue, and
every dispatch removes one instruction from the
queue.
  I1 stalls in the execute stage for 2 extra clock cycles.



  Because there is space in the instruction queue,
the fetch unit continues to fetch instructions I2, I3, I4,
I5 and I6.
  The 2-cycle delay caused by I1 carries over
to the instructions that follow it.
  When I5 is decoded, it is found to be a branch
instruction whose target is Ik.
  Meanwhile, all the other instructions complete
their processing, except I6.



  So I6 is discarded, and the queue length drops to one.
Time
Clock cycle     1     2     3     4     5     6     7     8     9     10
Queue length    1     1     1     1     2     3     2     1     1     1

I1              F1    D1    E1    E1    E1    W1
I2                    F2    D2                E2    W2
I3                          F3    D3                E3    W3
I4                                F4                D4    E4    W4
I5 (Branch)                             F5    D5
I6                                            F6    X
Ik                                                  Fk    Dk    Ek    Wk
Ik+1                                                      Fk+1  Dk+1  Ek+1



  Note an interesting observation from the
figure:
  Instructions I1, I2, I3, I4 and then Ik complete
in consecutive clock cycles,
as desired.
  So although an instruction is discarded, the
overall execution time does not increase.
  This happens because the instruction fetch unit
keeps working concurrently, regardless of the
state of the other instructions.
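
The completion order can be checked with a small model. This is a sketch under assumptions that are not stated in the lecture: decode takes the cycle after fetch, decoded instructions wait in a buffer until the single execute unit is free, and write-back takes the cycle after execution ends. The fetch cycles are read off Example 3 (one fetch per cycle, with Ik fetched in cycle 7 after I5 is decoded in cycle 6); the function name is hypothetical.

```python
def write_back_cycles(instructions):
    """Map each instruction name to the cycle of its write-back.

    `instructions` is a list of (name, fetch_cycle, execute_cycles) tuples
    in program order.
    """
    result = {}
    execute_free_at = 0  # first cycle in which the execute unit is free
    for name, fetch_cycle, execute_cycles in instructions:
        # Execution cannot start before fetch and decode are done,
        # nor before the previous instruction has left the execute unit.
        start = max(fetch_cycle + 2, execute_free_at)
        end = start + execute_cycles - 1
        execute_free_at = end + 1
        result[name] = end + 1  # write-back follows the last execute cycle
    return result


example_3 = [
    ("I1", 1, 3),  # 2-cycle stall: execution takes 3 cycles instead of 1
    ("I2", 2, 1),
    ("I3", 3, 1),
    ("I4", 4, 1),
    ("Ik", 7, 1),  # branch target, fetched after I5 is decoded
]
print(write_back_cycles(example_3))
# {'I1': 6, 'I2': 7, 'I3': 8, 'I4': 9, 'Ik': 10}
```

In this model Ik is ready to execute in cycle 9, which is exactly when the execute unit becomes free after I4, so no cycle is lost to the branch.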
  This technique is called branch folding.
  Branch folding can occur only if, at the time the
branch instruction is encountered, there is at least one
instruction in the instruction queue other than the
branch instruction.
  So the instruction queue makes branch folding possible
and reduces the overall delay.
  For this reason it is desirable to keep the instruction queue
full as much of the time as possible.



  This can be done by increasing the rate at which the
fetch unit reads instructions from the cache.
  In many processors, this is done by increasing the
width of the connection between the instruction cache
and the fetch unit.
  If the fetch unit refills the instruction queue quickly
after a branch instruction occurs, the probability of
branch folding increases.
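
A back-of-the-envelope sketch shows why fetch width matters. It assumes the fetch unit adds `fetch_width` instructions per cycle while the dispatch unit removes `dispatch_rate` per cycle, with no stalls on either side; both parameter names and the function are hypothetical.

```python
import math


def cycles_to_fill(target_length, fetch_width, dispatch_rate=1):
    """Cycles until the queue holds target_length instructions, or None if it never does."""
    net_growth = fetch_width - dispatch_rate
    if net_growth <= 0:
        return None  # fetch only keeps pace with dispatch, so the queue never grows
    return math.ceil(target_length / net_growth)


print(cycles_to_fill(4, fetch_width=1))  # None
print(cycles_to_fill(4, fetch_width=2))  # 4
print(cycles_to_fill(4, fetch_width=4))  # 2
```

With a width of one, the queue only grows when the rest of the pipeline stalls; a wider connection lets it refill even while instructions are being dispatched every cycle.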



  An instruction queue also helps
in dealing with cache misses.
  When a cache miss occurs, the dispatch unit continues to
take instructions from the instruction queue and
send them for execution until the queue is
empty.
  Meanwhile, the required instruction block is brought
from main memory or the secondary cache into
the primary cache.
  The fetch unit then refills the
instruction queue.
  If the instruction queue does not become empty in the
meantime, the cache miss has no effect
on instruction execution performance.
  So the instruction queue can hide the
stall caused by a cache miss by continuing to
supply instructions to the next stage.
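
How much of a miss the queue can hide is easy to estimate under a simple model: the fetch unit delivers nothing for `miss_penalty` cycles, and the dispatch unit takes one queued instruction per cycle. Both names, and the function itself, are assumptions made for illustration.

```python
def stall_cycles(miss_penalty, queued_instructions):
    """Cycles in which the dispatch unit finds the queue empty during a miss."""
    return max(0, miss_penalty - queued_instructions)


print(stall_cycles(miss_penalty=3, queued_instructions=4))  # 0: the miss is fully hidden
print(stall_cycles(miss_penalty=5, queued_instructions=2))  # 3: the queue runs dry
```

The miss is invisible to the execution stages only when the queue holds at least as many instructions as the miss lasts in cycles.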



  The instruction queue can also hide the
execution delay of an unconditional branch
instruction through prefetching and
branch folding.
  The effectiveness of this technique
increases if more than one instruction can be
fetched at a time.



 Zaky
 Chapter 8 : 8.3(8.3.1)
 Class Lecture
