
Computer Architecture

Lecture – 13
• We already know that data hazards occur when the data to be operated on is not available at the time it is needed.
• Assume that a program contains two instructions, I1 and I2.
• If the two instructions are executed in a pipeline, the execution of I2 can begin before the execution of I1 ends.

09/23/22 Sumaiya Iqbal, Lecturer, CSE, BUET


• That means the result of I1 may not be available when I2 needs it.
• It must be ensured that a program produces the same result whether it is executed in pipelined or sequential mode.
• Example 1:
• I1: A ← 3 + A and I2: B ← 4 × A
• Here, the result of I1 (the value of A) is needed to obtain the correct result of I2 (the value of B).
• So, the correct evaluation of the second instruction depends on the correct completion of the first instruction.
• Operations of this kind are called dependent operations.
• In this situation, pipelining can give an incorrect result.
• This situation is also called a data dependency.
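To make Example 1 concrete, here is a minimal sketch in Python. The starting value A = 5 and the function names are assumptions made for illustration; the slides give no values. The second function imitates a pipeline in which I2 reads A before I1 has written it.

```python
# Example 1 with an assumed starting value (not given in the slides).

def run_sequential(a):
    a = 3 + a      # I1: A <- 3 + A
    b = 4 * a      # I2: B <- 4 x A, reads the updated A
    return a, b

def run_with_stale_read(a):
    old_a = a      # I2 picks up its operand before I1 has written A
    a = 3 + a      # I1: A <- 3 + A
    b = 4 * old_a  # I2 computes with the stale value of A
    return a, b

print(run_sequential(5))       # (8, 32)
print(run_with_stale_read(5))  # (8, 20)
```

The two runs agree on A but not on B, which is the incorrect result that the dependency can cause.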



• Example 2:
• I1: A ← 5 × C and I2: B ← 20 + C
• Here, the result of I1 (the value of A) and the result of I2 (the value of B) are totally independent of one another.
• So, these two instructions can be performed at the same time.
• There is no data dependency here.
• Operations of this kind are called independent operations.
• For such operations, pipelining gives very good performance.
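The difference between the two examples can be checked mechanically. The sketch below is only an illustration: the `Instr` record and the `depends_on` helper are hypothetical names, and each instruction is reduced to one destination and a tuple of sources.

```python
from collections import namedtuple

# Hypothetical instruction record: one destination and a tuple of sources.
Instr = namedtuple("Instr", ["dest", "sources"])

def depends_on(first, second):
    """Return True if `second` reads the location that `first` writes."""
    return first.dest in second.sources

# Example 1: I1: A <- 3 + A, I2: B <- 4 x A
ex1 = (Instr("A", ("A",)), Instr("B", ("A",)))
# Example 2: I1: A <- 5 x C, I2: B <- 20 + C
ex2 = (Instr("A", ("C",)), Instr("B", ("C",)))

print(depends_on(*ex1))  # True  (dependent operations)
print(depends_on(*ex2))  # False (independent operations)
```

This checks only the case discussed on these slides, where a later instruction reads what an earlier one writes.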



• So it is now clear that when two consecutive operations are dependent, they must be executed sequentially, in the correct order, for both results to be correct.
• The following is another example of a data dependency, where the destination of the first instruction is used as one of the sources of the next instruction.



• I1: Mul R2, R3, R4 and I2: Add R5, R4, R6
• Here, the destination of the multiplication, R4, is used as a source of the addition.
• So, until I1 has been executed, I2 cannot be computed correctly.

Clock cycle    1    2    3    4    5    6    7    8    9     (Time →)
Instruction
I1 (Mul)       F1   D1   E1   W1
I2 (Add)            F2   D2        D2A  E2   W2
I3                       F3             D3   E3   W3
I4                            F4             D4   E4   W4
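The timing in the diagram can be reproduced with a small model. This is a sketch under stated assumptions, not the lecture's own method: a four-stage pipeline (F, D, E, W) without forwarding, in-order decoding, one instruction fetched per cycle, and a decode step that can complete only in the cycle after the producing instruction's write. The function name `schedule` and the registers used by I3 and I4 are hypothetical; the slides do not say what I3 and I4 are, so they are taken to be independent.

```python
def schedule(program):
    """program: list of (name, sources, dest).
    Returns {name: (F, D, E, W)} where D is the cycle in which decoding completes."""
    timing = []
    for i, (name, sources, dest) in enumerate(program):
        fetch = i + 1                      # assumed: one fetch per cycle
        decode = fetch + 1
        if timing:
            # In-order pipeline: decode no earlier than one cycle after the predecessor.
            decode = max(decode, timing[-1][1] + 1)
        for j in range(i):
            if program[j][2] in sources:
                # No forwarding: wait until the producer has written its result.
                decode = max(decode, timing[j][3] + 1)
        timing.append((fetch, decode, decode + 1, decode + 2))
    return {instr[0]: t for instr, t in zip(program, timing)}

program = [
    ("I1", ("R2", "R3"), "R4"),     # Mul R2, R3, R4
    ("I2", ("R5", "R4"), "R6"),     # Add R5, R4, R6
    ("I3", ("R7", "R8"), "R9"),     # hypothetical independent instruction
    ("I4", ("R10", "R11"), "R12"),  # hypothetical independent instruction
]

for name, cycles in schedule(program).items():
    print(name, cycles)
# I1 (1, 2, 3, 4)
# I2 (2, 5, 6, 7)
# I3 (3, 6, 7, 8)
# I4 (4, 7, 8, 9)
```

Under these assumptions I2 finishes decoding in cycle 5 (the D2A slot), and the two lost cycles are inherited by I3 and I4.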



• So, these dependent operations have far-reaching bad consequences for pipelining performance.
• Once the delay occurs, it carries over to the consecutive instructions that follow.
• So, we now have to think about trade-offs and alternative methods.



• Operand forwarding is one of the methods for reducing the performance loss that data hazards cause in pipelining:
• In the previous examples of data hazards, the pipeline is stalled because the execution phase of I2 has to wait for the write phase of I1 to update the register file.
• However, the result of I1 is already available at the output of the ALU at the end of its execution phase.



• So, if the result of I1 can be forwarded for use by the next instruction before it is written into the register, the delay can be reduced.
• In this case, the result is forwarded to the next instruction and, at the same time, passed to the write phase of the instruction that produced it, to be written into the destination register.
• This is a hardware arrangement for reducing the delay due to data hazards.
• The hardware arrangement is as follows:

[Figure: position of the forwarding path in the pipeline. Labels: SRC1, SRC2; E: Execute (ALU); RSLT; W: Write (Register File); Forwarding path.]



[Figure: Datapath. Labels: Source 1, Source 2, SRC1, SRC2, Register File, MUX, MUX, ALU, RSLT, Destination.]



• The above datapath shows the part of the processor involving the ALU and the register file.
• This arrangement is similar to the three-bus architecture, except for the three registers SRC1, SRC2 and RSLT.
• These registers are used as interstage buffers.
• Operand forwarding is done through the blue lines in the figure.



• The task of the two multiplexers is to select whether each ALU input comes from the destination line or from the SRC1/SRC2 register.
• Operand forwarding is done in the following way:
• When the execution of I2 depends on the result of I1, then at the end of the execution of I1 the result is forwarded from the destination line to the ALU input through the multiplexer.
• At the same time, the result is also sent to the register file to be written into its destination register.
• As a result, there is no delay in the execution of instruction I2, since the operand it needs has already been forwarded.
• In this way, by adding some extra hardware, the delay due to data hazards can be minimized by operand forwarding.
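The multiplexer selection described above can be sketched in a few lines of Python. This is an illustrative model only: the helper names `mux` and `run_mul_then_add` and the register values are assumptions, and the pipeline stages are collapsed into ordinary statements.

```python
def mux(src_value, forwarded_value, select_forward):
    """Model of one ALU input multiplexer: choose the forwarded result or the SRC register."""
    return forwarded_value if select_forward else src_value

def run_mul_then_add(regs):
    """I1: Mul R2, R3, R4 followed by I2: Add R5, R4, R6, with operand forwarding."""
    regs = dict(regs)
    i1_dest = "R4"
    i2_sources = ("R5", "R4")

    rslt = regs["R2"] * regs["R3"]            # end of E1: result sits at the ALU output
    # SRC1/SRC2 were loaded from the register file before I1 wrote R4,
    # so the value read for R4 is stale.
    src_values = [regs[r] for r in i2_sources]
    # Each multiplexer forwards the result when the source matches I1's destination.
    a, b = (mux(v, rslt, r == i1_dest) for r, v in zip(i2_sources, src_values))

    regs[i1_dest] = rslt                      # W1: result is also written to the register file
    regs["R6"] = a + b                        # I2 executes without waiting for W1
    return regs

result = run_mul_then_add({"R2": 3, "R3": 4, "R4": 0, "R5": 10, "R6": 0})
print(result["R4"], result["R6"])  # 12 22
```

Without the forwarding selection, I2 would have added the stale R4 (0 here) and produced 10 instead of 22.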



• Until now, we have assumed that a data dependency between instructions is detected by the hardware, namely the decoder unit of the pipeline.
• If operand forwarding is not used, a data dependency stalls the pipeline for some clock cycles.
• But the duty of detecting data dependencies and dealing with them can also be given entirely to software.
• In that case, the duty is given to the compiler.
• The compiler inserts NOP instructions into the time slots where the pipeline would otherwise stall, so that the result is correct.
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6

• So, the same task can be done by either software or hardware.
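A compiler pass of this kind might look roughly like the sketch below. It is a simplified illustration, not an actual compiler algorithm: the function name `insert_nops`, the tuple format `(opcode, sources, dest)` and the default gap of two slots (matching the two NOPs shown above) are assumptions.

```python
NOP = ("NOP", (), None)

def insert_nops(program, gap=2):
    """Keep at least `gap` instruction slots between a producer and a consumer
    of the same register, padding with NOPs where needed."""
    out = []
    for instr in program:
        _, sources, _ = instr
        needed = 0
        # Look back over the last `gap` emitted instructions for a producer.
        for distance, prev in enumerate(reversed(out[-gap:]), start=1):
            if prev[2] is not None and prev[2] in sources:
                needed = max(needed, gap - distance + 1)
        out.extend([NOP] * needed)
        out.append(instr)
    return out

program = [
    ("Mul", ("R2", "R3"), "R4"),
    ("Add", ("R5", "R4"), "R6"),   # reads R4, which the Mul writes
]

print([instr[0] for instr in insert_nops(program)])
# ['Mul', 'NOP', 'NOP', 'Add']
```

If an independent instruction already sits between the producer and the consumer, the same function pads with only one NOP.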
• But handling it in software leads to slightly less complex hardware.
• If the software detects the data dependency in advance, then instead of inserting NOPs it can, where possible, rearrange the code to put useful instructions into those NOP slots and achieve better performance.
• But such instructions cannot always be found; inserting NOPs then increases the code size, though without adding any delay.
• Instead of relying on this software method, processors normally offer different hardware organizations to handle the delay due to data hazards.
• This is because, for some instructions, the inserted NOPs may turn out to be unnecessary, which would reduce performance.



• The data dependencies in the previous examples are explicit and easy to see.
• This is because the content of the destination register is used as a source operand by the next instruction.
• But sometimes an instruction changes a register's content internally, in a way that cannot be seen and handled explicitly, e.g. autoincrement and autodecrement operations.
• An instruction is said to have a side effect if it changes the content of any location other than the one explicitly named as its destination operand.
• Examples: autoincrement, autodecrement, push and pop operations.
• Other instructions with side effects are branch and add-with-carry instructions, which involve the condition code flags.
• Example: R1, R2, R3 and R4 hold two double-precision integers (one in R1 and R2, the other in R3 and R4).
• The task is to add R1 and R3, and then add R2 and R4.
• An implicit dependency exists between these two instructions.
• That is, the first instruction may generate a carry, which sets the carry flag.
• So the second instruction becomes AddWithCarry R2, R4.
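The carry-flag dependency can be traced with a short sketch. Several details here are assumptions for illustration: 32-bit registers, the low-order words in R1 and R3 and the high-order words in R2 and R4, the second operand acting as the destination, and the sample values.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit registers

def add(a, b):
    """Add: returns (result, carry flag)."""
    total = a + b
    return total & MASK, total >> 32

def add_with_carry(a, b, carry):
    """AddWithCarry: also adds the carry flag left by the previous instruction."""
    total = a + b + carry
    return total & MASK, total >> 32

# Assumed layout: first number in (R2:R1), second number in (R4:R3).
r1, r2 = 0xFFFFFFFF, 0x00000001   # 0x1_FFFFFFFF
r3, r4 = 0x00000001, 0x00000002   # 0x2_00000001

r3, carry = add(r1, r3)                  # Add R1, R3 (sets the carry flag as a side effect)
r4, _ = add_with_carry(r2, r4, carry)    # AddWithCarry R2, R4 (reads the carry flag)

print(hex(r4), hex(r3), carry)  # 0x4 0x0 1
```

The second instruction names only R2 and R4, yet its result depends on the flag produced by the first; with the carry ignored, the high word would be 3 instead of 4.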
• So, if this implicit dependency on the flag is not handled, the second instruction gives an incorrect result.
• Instructions with side effects give rise to multiple data dependencies, which greatly increases the complexity of both the hardware and the software.
• For this reason, instructions to be executed on a pipelined processor should have few side effects in order to achieve good performance.
• The best pipelining performance and the greatest reduction of data hazards are achieved if instructions change only their destination register or location.
• For good performance, side effects such as flag changes and address pointer updates should be kept to a minimum in a pipelined processor.



• Zaky
• Chapter 8: 8.2 (8.2.1, 8.2.2, 8.2.3)
• Class Lecture
