L13 Pipelining 2

Computer Architecture
Lecture – 13
 As we already know, a data hazard occurs when the data to be
operated on is not available when it is needed.
 Assume that a program contains two
instructions, I1 and I2.
 If the two instructions are executed in a pipeline,
the execution of I2 can begin before the
execution of I1 ends.
09/23/22 Sumaiya Iqbal, Lecturer, CSE, BUET

 That means the result of I1 may not yet be
available when I2 needs it.
 The result of a program must be the same
whether it is executed in pipelined or
sequential mode.
 Example 1:
 I1: A ← 3 + A and I2: B ← 4 × A
 Here, the result of I1 (the value of A) is needed to
compute the correct result of I2 (the value of B).
 So the correct evaluation of the second instruction
depends on the correct completion of the first
instruction.
 Operations of this kind are called dependent
operations.
 In this situation, pipelining can give an incorrect
result.
 This situation is also called a data dependency.
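A minimal Python sketch of Example 1, assuming an initial value A = 5 (the value and the function names are illustrative, not from the lecture). It compares sequential execution with an overlapped execution in which I2 reads A before I1 has written it:

```python
def run_sequential(a):
    a = 3 + a          # I1: A <- 3 + A
    b = 4 * a          # I2 reads the updated A
    return a, b

def run_overlapped_without_interlock(a):
    stale_a = a        # I2 picks up its operand before I1 has written A
    a = 3 + a          # I1: A <- 3 + A
    b = 4 * stale_a    # I2 computes with the old value of A
    return a, b

print(run_sequential(5))                    # (8, 32)
print(run_overlapped_without_interlock(5))  # (8, 20)
```

The two runs agree on A but not on B, which is the incorrect result that the dependency can cause.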

 Example 2:
 I1: A ← 5 × C and I2: B ← 20 + C
 Here, the result of I1 (the value of A) and the result of
I2 (the value of B) are totally independent of one another.
 So these two instructions can be performed at the same time.
 There is no data dependency here.
 Operations of this kind are called independent
operations.
 For this kind of operation, pipelining gives very good
performance.
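The difference between the two examples can be checked mechanically. The sketch below is one possible way to do it, assuming each operation is described only by its destination and its sources; the `Op` type and `depends_on` helper are hypothetical names introduced for illustration:

```python
from typing import NamedTuple, Tuple

class Op(NamedTuple):
    dest: str                 # location written by the operation
    sources: Tuple[str, ...]  # locations read by the operation

def depends_on(later: Op, earlier: Op) -> bool:
    """True if `later` reads the location that `earlier` writes."""
    return earlier.dest in later.sources

# Example 1: I1: A <- 3 + A, I2: B <- 4 x A
ex1_i1, ex1_i2 = Op("A", ("A",)), Op("B", ("A",))
# Example 2: I1: A <- 5 x C, I2: B <- 20 + C
ex2_i1, ex2_i2 = Op("A", ("C",)), Op("B", ("C",))

print(depends_on(ex1_i2, ex1_i1))  # True  (dependent operations)
print(depends_on(ex2_i2, ex2_i1))  # False (independent operations)
```

This check covers only the case discussed here, where a later operation reads what an earlier one writes.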

 So it is now clear that when two consecutive
operations are dependent, they must be executed
sequentially, in the correct order, for both
results to be correct.
 The following is another example of a data
dependency, in which the destination of the first
instruction is used as one of the sources of the
next instruction.

 I1: Mul R2, R3, R4 and I2: Add R5, R4, R6
 Here, R4, the destination of the multiplication, is used
as a source of the addition.
 So, until I1 has written its result, I2 cannot be computed
correctly.

Clock cycle    1   2   3   4   5   6   7   8   9
I1 (Mul)       F1  D1  E1  W1
I2 (Add)           F2  D2      D2A E2  W2
I3                     F3          D3  E3  W3
I4                         F4          D4  E4  W4

(D2A marks the delayed completion of the decode step of I2.)
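The timing above can be approximated with a few lines of Python. This is a sketch under stated assumptions: a four-stage pipeline (F, D, E, W), one instruction issued per cycle, and a two-cycle stall that first affects I2. The function name and its parameters are hypothetical:

```python
def completion_cycles(num_instructions, num_stages=4,
                      stalled_from=None, stall_cycles=0):
    """Clock cycle in which each instruction finishes its last stage.

    Instructions numbered `stalled_from` and later are all pushed back
    by `stall_cycles`, because a stalled instruction blocks the ones
    behind it.
    """
    cycles = []
    for i in range(1, num_instructions + 1):
        delay = 0
        if stalled_from is not None and i >= stalled_from:
            delay = stall_cycles
        cycles.append(num_stages + (i - 1) + delay)
    return cycles

print(completion_cycles(4))                                  # [4, 5, 6, 7]
print(completion_cycles(4, stalled_from=2, stall_cycles=2))  # [4, 7, 8, 9]
```

Without the stall the four instructions finish in 7 cycles; with it they need 9, and every instruction after I1 is delayed by the same two cycles.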

 So dependent operations have far-reaching
consequences for pipeline performance.
 Once a delay occurs, it carries over to all
subsequent instructions.
 So we now have to consider trade-offs and
alternative methods.

 Operand forwarding is one method of reducing
the performance loss that data hazards cause
in a pipeline:
 In the previous examples of data hazards, the
pipeline stalls because the execute phase of
I2 has to wait for the write phase of I1 to update the
register file.
 However, the result of I1 is already available at the output
of the ALU at the end of its execute phase.

 So if the result of I1 can be forwarded to the next
instruction before it is written to the register,
the delay can be reduced.
 In this case, the result is sent at the same time to
the next instruction, which uses it as an operand, and to
the write phase of I1, which writes it
into the destination register.
 This is a hardware arrangement for reducing
the delay due to data hazards.
 The hardware arrangement is as follows:
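[Figure: Position of the source and result registers in the pipeline. SRC1, SRC2 → E: Execute (ALU) → RSLT → W: Write (Register File), with a forwarding path.]
[Figure: Datapath. Components shown: Register File, Source 1 and Source 2 buses, SRC1 and SRC2 registers, two MUXes, ALU, RSLT register, Destination bus.]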

 The datapath above shows the part of the
processor that involves the ALU and the register file.
 This arrangement is similar to the three-bus
architecture, except that three registers, SRC1,
SRC2 and RSLT, have been added.
 These registers are used as interstage buffers.
 Operand forwarding is done through the
blue lines in the figure.

 The task of the two multiplexers is to select
whether each ALU input comes from the
destination line or from the
SRC1/SRC2 register.
 Operand forwarding is done in the following
way:
 When the execution of I2 depends on the result
of I1, at the end of the execution of I1 the result is
passed along the destination line to the ALU input.
 At the same time, the result is also sent to the
register file to be written into the
destination register.
 As a result, no delay occurs in the execution
of I2, since its operand has
already been forwarded.
 In this way, by adding some extra hardware, the delay
due to data hazards can be minimized by operand
forwarding.
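The multiplexer selection described above can be sketched in Python. This is an illustrative model only, not the actual hardware: the register values are made up, the `mux` helper is hypothetical, and the operand order assumes the destination is the last operand (as in Mul R2, R3, R4, where R4 is the destination):

```python
def mux(forward, forwarded_value, latched_value):
    """Select the forwarded result or the value latched in SRC1/SRC2."""
    return forwarded_value if forward else latched_value

regs = {"R2": 6, "R3": 7, "R4": 0, "R5": 10, "R6": 0}  # illustrative values

# I1: Mul R2, R3, R4  (R4 <- R2 * R3); after Execute the result is in RSLT.
rslt = regs["R2"] * regs["R3"]
i1_dest = "R4"

# I2: Add R5, R4, R6  (R6 <- R5 + R4); its operands were read from the
# register file before I1's Write step, so SRC2 holds the stale R4.
i2_sources = ("R5", "R4")
src1, src2 = regs["R5"], regs["R4"]

# Each MUX forwards RSLT only when that source is I1's destination.
alu_in1 = mux(i2_sources[0] == i1_dest, rslt, src1)
alu_in2 = mux(i2_sources[1] == i1_dest, rslt, src2)

regs[i1_dest] = rslt             # Write step of I1
regs["R6"] = alu_in1 + alu_in2   # result of I2 with forwarding

without_forwarding = src1 + src2  # what I2 would compute from stale SRC2
print(regs["R6"], without_forwarding)  # 52 10
```

With forwarding, I2 uses the fresh product without waiting for the register file to be updated.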

 Until now, we have assumed that a data
dependency between instructions is
detected by the hardware, in the decode unit of
the pipeline.
 If operand forwarding is not used, a data
dependency stalls the pipeline for
some clock cycles.
 But the duty of detecting data dependencies
and dealing with them can also be given entirely to
the software.
 In that case, the duty falls to the compiler.
 The compiler then inserts NOP instructions into
the time slots where the pipeline would otherwise stall, so
that the result is correct.
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6
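A compiler pass of this kind might look like the sketch below. It is a simplification under stated assumptions: instructions are tuples with the destination as the last operand, only a dependency on the immediately preceding instruction is checked, and two NOPs are needed, as in the listing above. The `insert_nops` function is a hypothetical name, and the Mul/Add pair from the earlier example is used as input:

```python
NOP = ("NOP",)

def insert_nops(program, nops_needed=2):
    """Insert NOPs before any instruction that reads the destination
    of the instruction immediately before it."""
    out = []
    prev_dest = None
    for instr in program:
        opcode, src1, src2, dest = instr
        if prev_dest is not None and prev_dest in (src1, src2):
            out.extend([NOP] * nops_needed)
        out.append(instr)
        prev_dest = dest
    return out

program = [
    ("Mul", "R2", "R3", "R4"),  # I1: R4 <- R2 * R3
    ("Add", "R5", "R4", "R6"),  # I2: R6 <- R5 + R4, reads R4
]
for instr in insert_nops(program):
    print(instr)
```

A fuller version would also handle dependencies that are separated by one intervening instruction, which need fewer NOPs.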
 So the same task can be done by either software
or hardware.
 Handling it in software makes the hardware a little
less complex.
 If the software detects the data dependency
in advance, it can, where possible, rearrange
the code and place useful instructions in the
NOP slots instead of NOPs, and so
achieve better performance.
 But such instructions cannot always be found.
The NOPs then stay in the code: code size
increases and the delay is not removed.
 For this reason, processors normally rely on
a hardware organization to handle the delay
due to data hazards, rather than on this
software method.
 A further reason is that inserted NOP
instructions may turn out to be unnecessary in
some cases, and they then only reduce performance.

 The data dependencies in the previous
examples are explicit.
 They are easy to see because the content of a
destination register is used as a source
operand by the next instruction.
 But sometimes an instruction changes the
content of a register implicitly, in a way that cannot
be seen directly in the instruction and is harder to handle, e.g.
autoincrement and autodecrement operations.
 An instruction is said to have a side effect if it
changes the content of any location other than
the one explicitly named as its destination
operand.
 Examples: autoincrement, autodecrement,
push and pop operations.
 Other instructions with side effects are
branch and add-with-carry instructions, which
involve the condition code flags.
 Example: R1, R2, R3 and R4 hold double-precision
integers.
 The task is to add R1 and R3, and then add R2 and R4.
 An implicit dependency exists between
these two instructions.
 The first instruction may generate a carry,
which sets the carry flag.
 So the second instruction becomes
AddWithCarry R2, R4.
 If the second instruction reads the carry flag
before the first has set it, the second instruction
gives an incorrect result.
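The carry-flag dependency can be illustrated with a short Python sketch. The 32-bit word size, the operand values, and the function names are assumptions made for illustration; the lecture does not specify them. R1 and R3 are treated as the low-order words and R2 and R4 as the high-order words:

```python
WORD_BITS = 32                 # assumed word size
MASK = (1 << WORD_BITS) - 1

def add(a, b, carry_in=0):
    """Add two words; return (result word, carry out)."""
    total = a + b + carry_in
    return total & MASK, total >> WORD_BITS

def add_double(r1, r2, r3, r4):
    low, carry = add(r1, r3)        # Add R1, R3 sets the carry flag
    high, _ = add(r2, r4, carry)    # AddWithCarry R2, R4 uses it
    return low, high

def add_double_stale_carry(r1, r2, r3, r4, stale_carry=0):
    low, _ = add(r1, r3)
    high, _ = add(r2, r4, stale_carry)  # flag read before it was updated
    return low, high

# Low words 0xFFFFFFFF + 1 produce a carry into the high words.
print(add_double(0xFFFFFFFF, 0, 1, 0))              # (0, 1)
print(add_double_stale_carry(0xFFFFFFFF, 0, 1, 0))  # (0, 0)
```

The second call loses the carry, so the high-order word is wrong even though neither instruction names the flag as an operand.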
 Instructions with side effects give rise to
multiple data dependencies, which greatly
increases the complexity of both hardware and
software.
 For this reason, instructions meant for a
pipelined processor should have few side
effects.
 The best pipelining performance and the
fewest data hazards are achieved when
each instruction changes only its
destination register or location.
 For good performance, side effects such as
flag changes and address-pointer updates
should be kept to a minimum in a
pipelined processor.

 Zaky
 Chapter 8: 8.2 (8.2.1, 8.2.2, 8.2.3)
 Class Lecture
