Exe On Pipelining
Miscellanea of pipelining
Complex pipelining, Pipeline
ACA
Complex Pipeline
In this problem we will examine the execution of a code segment
on the following single-issue out-of-order processor:
You can assume that:
- All functional units are pipelined
- ALU operations take 1 cycle
- Memory operations take 3 cycles (includes time in ALU)
- Floating-point add instructions take 3 cycles
- Floating-point multiply instructions take 5 cycles
- There is no register renaming
- Instructions are fetched, decoded and issued in order
- The issue stage is a buffer of unlimited length that holds instructions waiting to start execution
- An instruction will only enter the issue stage if it does not cause a WAR or WAW hazard
- Only one instruction can be issued at a time, and in case multiple instructions are ready, the oldest one will go first
Code
I1 L.D F3, 0(R0)
I2 ADD.D F2, F2, F3
I3 MUL.D F5, F4, F4
I4 ADD.I R0, R0, 8
I5 L.D F3, 0(R0)
I6 ADD.D F2, F3, F5
Code and Architecture
I1 L.D F3, 0(R0)
I2 ADD.D F2, F2, F3
I3 MUL.D F5, F4, F4
I4 ADD.I R0, R0, 8
I5 L.D F3, 0(R0)
I6 ADD.D F2, F3, F5
ALU OP: 1 cycle
MEM OP: 3 cycles
FP ADD: 3 cycles
FP MULT: 5 cycles
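To see how the latencies combine with the code, the sketch below computes the longest chain of producer-to-consumer latencies. It is only a lower bound on execution time: it ignores in-order issue, the single issue port, and any WAR or WAW stalls. The names LATENCY, UNIT and RAW_PRODUCERS are illustrative, and the producer table is one reading of the register usage in the listing, not something the slide states.

```python
from functools import lru_cache

# Latencies taken from the slide.
LATENCY = {"ALU": 1, "MEM": 3, "FPADD": 3, "FPMUL": 5}

# Functional unit class used by each instruction in the listing.
UNIT = {"I1": "MEM", "I2": "FPADD", "I3": "FPMUL",
        "I4": "ALU", "I5": "MEM", "I6": "FPADD"}

# Assumed producers of each instruction's source registers
# (e.g. I2 reads F3 written by I1; I6 reads F3 from I5 and F5 from I3).
RAW_PRODUCERS = {"I1": [], "I2": ["I1"], "I3": [],
                 "I4": [], "I5": ["I4"], "I6": ["I5", "I3"]}

@lru_cache(maxsize=None)
def earliest_result(name):
    """Earliest cycle at which `name` can produce its result,
    counting only true (RAW) dependences and unit latencies."""
    start = max((earliest_result(p) for p in RAW_PRODUCERS[name]), default=0)
    return start + LATENCY[UNIT[name]]

critical_path = max(earliest_result(name) for name in UNIT)
print(critical_path)  # prints 8 (I3 -> I6: 5 + 3)
```

Under these assumptions the MUL.D to ADD.D chain dominates, so no schedule on this machine can finish the segment's execution in fewer than 8 cycles of functional-unit time.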
Conflicts
I1 L.D F3, 0(R0)
I2 ADD.D F2, F2, F3
I3 MUL.D F5, F4, F4
I4 ADD.I R0, R0, 8
I5 L.D F3, 0(R0)
I6 ADD.D F2, F3, F5
RAW I1-I2 on F3
WAW I1-I5 on F3
WAR I2-I5 on F3
WAR I1-I4 on R0
RAW I4-I5 on R0
RAW I5-I6 on F3
RAW I3-I6 on F5
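The conflicts listed above can be derived mechanically from each instruction's destination and source registers. The following is a minimal sketch, not the slide's method: Instr, PROGRAM and find_hazards are hypothetical names, and each hazard is reported against the nearest earlier instruction that touches the register. Note that, run on this code, the sketch also reports a WAW and a WAR between I2 and I6 on F2, which the slide does not list.

```python
from collections import namedtuple

# Hypothetical encoding: one destination register, a tuple of sources.
Instr = namedtuple("Instr", "name dst srcs")

PROGRAM = [
    Instr("I1", "F3", ("R0",)),        # L.D   F3, 0(R0)
    Instr("I2", "F2", ("F2", "F3")),   # ADD.D F2, F2, F3
    Instr("I3", "F5", ("F4",)),        # MUL.D F5, F4, F4
    Instr("I4", "R0", ("R0",)),        # ADD.I R0, R0, 8
    Instr("I5", "F3", ("R0",)),        # L.D   F3, 0(R0)
    Instr("I6", "F2", ("F3", "F5")),   # ADD.D F2, F3, F5
]

def find_hazards(program):
    """Return a set of (kind, earlier, later, register) tuples."""
    hazards = set()
    for j, later in enumerate(program):
        # RAW: each source depends on its most recent earlier writer.
        for reg in later.srcs:
            for i in range(j - 1, -1, -1):
                if program[i].dst == reg:
                    hazards.add(("RAW", program[i].name, later.name, reg))
                    break
        # WAR / WAW: walk back from `later` until the previous writer
        # of its destination register is found.
        for i in range(j - 1, -1, -1):
            earlier = program[i]
            if later.dst in earlier.srcs:
                hazards.add(("WAR", earlier.name, later.name, later.dst))
            if earlier.dst == later.dst:
                hazards.add(("WAW", earlier.name, later.name, later.dst))
                break
    return hazards

for kind, a, b, reg in sorted(find_hazards(PROGRAM), key=lambda h: (h[2], h[1])):
    print(f"{kind} {a}-{b} on {reg}")
```

Stopping the backward walk at the previous writer keeps the report to the nearest dependences, so the sketch does not flag, for example, a RAW between I1 and I6 on F3, since I5 overwrites F3 in between.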
Pipeline schema
I1 L.D F3, 0(R0)
I2 ADD.D F2, F2, F3
I3 MUL.D F5, F4, F4
I4 ADD.I R0, R0, 8
I5 L.D F3, 0(R0)
I6 ADD.D F2, F3, F5
RAW I1-I2 on F3
WAR I1-I4 on R0
WAR I2-I5 on F3
RAW I5-I6 on F3
RAW I3-I6 on F5
[Pipeline diagram: one row of stages I, E, C for each instruction I1 to I6]
while (i != N)
The Code
The program has been compiled in MIPS assembly
The symbols BASEA, BASEB and BASEC are 16-
L1: lw $2, BASEA ($4)
addi $2, $2, INC1
lw $3, BASEB ($4)
addi $3, $3, INC2
add $5, $2, $3
sw $5, BASEC ($4)
addi $4, $4, 4
bne $4, $7, L1
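As a reading aid, the loop body can be sketched in Python. This assumes that $4 is a byte offset starting at 0, that $7 holds the final offset (4 times the number of elements), and that BASEA, BASEB and BASEC address three word arrays; none of this is stated in the surviving text, and run_loop is a hypothetical name.

```python
def run_loop(a, b, inc1, inc2):
    """Mimic the MIPS loop: c[i] = (a[i] + inc1) + (b[i] + inc2).

    Like the assembly, this is a do-while loop: the body runs once
    before the exit test, so the arrays must be non-empty.
    """
    limit = 4 * len(a)      # assumed content of $7
    c = [0] * len(a)
    offset = 0              # assumed initial content of $4
    while True:
        r2 = a[offset // 4] + inc1   # lw $2, BASEA($4) ; addi $2, $2, INC1
        r3 = b[offset // 4] + inc2   # lw $3, BASEB($4) ; addi $3, $3, INC2
        r5 = r2 + r3                 # add $5, $2, $3
        c[offset // 4] = r5          # sw $5, BASEC($4)
        offset += 4                  # addi $4, $4, 4
        if offset == limit:          # bne $4, $7, L1 falls through
            break
    return c

print(run_loop([1, 2, 3], [10, 20, 30], inc1=1, inc2=2))  # prints [14, 25, 36]
```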
Pipelining
Consider the code executed on a 2-issue superscalar MIPS architecture with static branch prediction BTFNT (Backward Taken, Forward Not Taken) and a Branch Target Buffer.
Assume the following optimizations in the pipeline:
- Each issue can contain 1 ALU/branch instruction and 1 load/store instruction
- Register file with 4 read ports and 2 write ports; a read and a write to the same register can be executed in the same clock cycle
- Forwarding
- Computation of the PC and target address for branch and jump instructions anticipated in the ID stage
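The BTFNT rule named above can be sketched as follows: a branch is predicted taken when its target lies at a lower address than the branch itself. The addresses are assumptions (L1 placed at 0x00 and the bne at 0x1C, eight 4-byte instructions later), and the function names are hypothetical.

```python
def btfnt_predict_taken(branch_pc, target_pc):
    """Static BTFNT: backward branches are predicted taken,
    forward branches are predicted not taken."""
    return target_pc < branch_pc

def count_mispredictions(branch_pc, target_pc, outcomes):
    """Count how often the fixed static prediction disagrees
    with the actual outcomes (True = taken)."""
    predicted = btfnt_predict_taken(branch_pc, target_pc)
    return sum(1 for taken in outcomes if taken != predicted)

# Assumed layout: L1 at 0x00, bne at 0x1C.
BNE_PC, L1_PC = 0x1C, 0x00

# A loop of 10 iterations: taken 9 times, then falls through once.
outcomes = [True] * 9 + [False]
print(count_mispredictions(BNE_PC, L1_PC, outcomes))  # prints 1
```

Under these assumptions the loop-closing bne is a backward branch, so the static prediction is wrong only on the final iteration, when the loop exits.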
Open issues…