Lect12 TomasuloExample PDF


Lecture-12

Tomasulo's Approach
(Detailed Examples)

Quiz Time

EE/CS520- Comp. Archi.

10/8/2012

MIPS FP Unit Using Tomasulo's Algorithm

[Figure: MIPS FP unit organization, with the Issue (IS), Execute (EX), and Write Result (WR) stages marked]

Data Structure

Reservation station fields
  Busy: the reservation station and its functional unit are busy
  Op: operation to be performed
  Vj, Vk: values of the two source operands; a value is valid when the corresponding Qj or Qk is zero
  Qj, Qk: reservation stations that will produce source operands 1 and 2
  A: memory address for LD/SD (initially holds the immediate value)

Register file field
  Qi: kept in the register file; names the reservation station that will produce the register's value (like the scoreboard)

Instruction status columns: Is, Ex, W

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1
LD2
AD1
AD2
AD3
ML1
ML2
SD1
SD2

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
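The fields listed on the Data Structure slide can be written down as a small record type. The following is only a minimal sketch: the class name `ReservationStation`, the `ready()` helper, and the use of `None` in place of a "zero" Q field are illustrative assumptions, not part of the lecture.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    """One reservation station entry (hypothetical layout for illustration)."""
    name: str
    busy: bool = False
    op: Optional[str] = None      # operation to be performed
    vj: Optional[float] = None    # value of source operand 1
    vk: Optional[float] = None    # value of source operand 2
    qj: Optional[str] = None      # station producing source 1 (None = "zero")
    qk: Optional[str] = None      # station producing source 2 (None = "zero")
    a: Optional[int] = None       # immediate, later the effective address

    def ready(self) -> bool:
        # An entry may start executing once neither operand is pending.
        return self.busy and self.qj is None and self.qk is None


STATION_NAMES = ["LD1", "LD2", "AD1", "AD2", "AD3", "ML1", "ML2", "SD1", "SD2"]
stations: Dict[str, ReservationStation] = {
    n: ReservationStation(n) for n in STATION_NAMES
}

# Register status (Qi): which station will write each FP register, if any.
reg_status: Dict[str, Optional[str]] = {f"F{i}": None for i in range(0, 14, 2)}

# Example: MUL.D F0, F2, F4 issued to ML1 while F2 is still being loaded by LD2.
ml1 = stations["ML1"]
ml1.busy, ml1.op = True, "MUL.D"
ml1.qj = "LD2"      # F2 is pending, so record the producing station
ml1.vk = 2.5        # F4 is available, so copy its value
reg_status["F0"] = "ML1"
```

With this layout, `ml1.ready()` stays false until the LD2 tag in `qj` is replaced by a value.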

Detailed Example (1)

The same example that we worked through with the scoreboard.

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 0

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     -    -    -
2.   L.D    F2, 45(R3)     -    -    -
3.   MUL.D  F0, F2, F4     -    -    -
4.   SUB.D  F8, F2, F6     -    -    -
5.   DIV.D  F10, F0, F6    -    -    -
6.   ADD.D  F6, F8, F2     -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1
LD2
AD1
AD2
AD3
ML1
ML2

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
-     -     -     -     -     -     -

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 1

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    -    -
2.   L.D    F2, 45(R3)     -    -    -
3.   MUL.D  F0, F2, F4     -    -    -
4.   SUB.D  F8, F2, F6     -    -    -
5.   DIV.D  F10, F0, F6    -    -    -
6.   ADD.D  F6, F8, F2     -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   1     L.D     -           -           -     -     34
LD2
AD1
AD2
AD3
ML1
ML2

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
-     -     -     LD1   -     -     -

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 2

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    -
2.   L.D    F2, 45(R3)     2    -    -
3.   MUL.D  F0, F2, F4     -    -    -
4.   SUB.D  F8, F2, F6     -    -    -
5.   DIV.D  F10, F0, F6    -    -    -
6.   ADD.D  F6, F8, F2     -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   1     L.D     -           -           -     -     134
LD2   1     L.D     -           -           -     -     45
AD1
AD2
AD3
ML1
ML2

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
-     LD2   -     LD1   -     -     -

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 3

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    -
2.   L.D    F2, 45(R3)     2    3    -
3.   MUL.D  F0, F2, F4     3    -    -
4.   SUB.D  F8, F2, F6     -    -    -
5.   DIV.D  F10, F0, F6    -    -    -
6.   ADD.D  F6, F8, F2     -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   1     L.D     -           -           -     -     134
LD2   1     L.D     -           -           -     -     245
AD1
AD2
AD3
ML1   1     MUL.D   -           2.5         LD2   -     -
ML2

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
ML1   LD2   -     LD1   -     -     -

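Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 4

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    -
3.   MUL.D  F0, F2, F4     3    -    -
4.   SUB.D  F8, F2, F6     4    -    -
5.   DIV.D  F10, F0, F6    -    -    -
6.   ADD.D  F6, F8, F2     -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   1     L.D     -           -           -     -     134
LD2   1     L.D     -           -           -     -     245
AD1   1     SUB.D   -           @[134]=0.5  LD2   -     -
AD2
AD3
ML1   1     MUL.D   -           2.5         LD2   -     -
ML2

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
ML1   LD2   -     LD1   AD1   -     -

LD1 writes @[134] = 0.5 on the CDB in this cycle. AD1 was issued with Qk = LD1 and captures the value as Vk; station LD1 and the LD1 tag on F6 are released at the end of the cycle.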

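Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 5

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    -    -
4.   SUB.D  F8, F2, F6     4    -    -
5.   DIV.D  F10, F0, F6    5    -    -
6.   ADD.D  F6, F8, F2     -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   1     L.D     -           -           -     -     245
AD1   1     SUB.D   @[245]=1.5  0.5         -     -     -
AD2
AD3
ML1   1     MUL.D   @[245]=1.5  2.5         -     -     -
ML2   1     DIV.D   -           0.5         ML1   -     -

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
ML1   LD2   -     -     AD1   ML2   -

LD2 writes @[245] = 1.5 on the CDB in this cycle. AD1 and ML1 were both waiting with Qj = LD2 and capture the value as Vj; station LD2 and the LD2 tag on F2 are released at the end of the cycle.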
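The write-result step seen in cycles 4 and 5, where a load's value reaches every waiting station over the CDB, can be sketched as follows. This is an illustrative sketch only: the helper names `station` and `broadcast` and the dictionary layout are assumptions made for the example, and the state is the cycle-5 state of the walk-through just before LD2 writes.

```python
def station(op=None, vj=None, vk=None, qj=None, qk=None):
    """Build one reservation station entry (hypothetical dict layout)."""
    return {"Busy": op is not None, "Op": op,
            "Vj": vj, "Vk": vk, "Qj": qj, "Qk": qk}


def broadcast(tag, value, stations, reg_status, reg_file):
    """Write-result step: put (tag, value) on the CDB."""
    # Every station waiting on this tag captures the value.
    for rs in stations.values():
        if rs["Qj"] == tag:
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == tag:
            rs["Vk"], rs["Qk"] = value, None
    # Registers still renamed to this tag receive the value.
    for reg, producer in reg_status.items():
        if producer == tag:
            reg_file[reg] = value
            reg_status[reg] = None
    # The producing station is released.
    stations[tag] = station()


# State just before LD2 writes in cycle 5 of the example.
stations = {
    "LD2": station("L.D"),
    "AD1": station("SUB.D", vk=0.5, qj="LD2"),
    "ML1": station("MUL.D", vk=2.5, qj="LD2"),
    "ML2": station("DIV.D", vk=0.5, qj="ML1"),
}
reg_status = {"F0": "ML1", "F2": "LD2", "F8": "AD1", "F10": "ML2"}
reg_file = {}

broadcast("LD2", 1.5, stations, reg_status, reg_file)
```

After the call, AD1 and ML1 both hold 1.5 in Vj with Qj cleared, while ML2 is untouched because it waits on ML1, not LD2.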

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 6

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    6    -
4.   SUB.D  F8, F2, F6     4    6    -
5.   DIV.D  F10, F0, F6    5    -    -
6.   ADD.D  F6, F8, F2     6    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
AD1   1     SUB.D   1.5         0.5         -     -     -
AD2   1     ADD.D   -           1.5         AD1   -     -
AD3
ML1   1     MUL.D   1.5         2.5         -     -     -
ML2   1     DIV.D   -           0.5         ML1   -     -

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
ML1   -     -     AD2   AD1   ML2   -

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 8

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    6    -
4.   SUB.D  F8, F2, F6     4    6    8
5.   DIV.D  F10, F0, F6    5    -    -
6.   ADD.D  F6, F8, F2     6    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
AD1   1     SUB.D   1.5         0.5         -     -     -
AD2   1     ADD.D   1.0         1.5         -     -     -
AD3
ML1   1     MUL.D   1.5         2.5         -     -     -
ML2   1     DIV.D   -           0.5         ML1   -     -

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
ML1   -     -     AD2   AD1   ML2   -

AD1 writes 1.0 on the CDB in this cycle. AD2 was waiting with Qj = AD1 and captures the value as Vj; station AD1 and the AD1 tag on F8 are released at the end of the cycle.

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 9

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    6    -
4.   SUB.D  F8, F2, F6     4    6    8
5.   DIV.D  F10, F0, F6    5    -    -
6.   ADD.D  F6, F8, F2     6    9    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   1     ADD.D   1.0         1.5         -     -     -
AD3
ML1   1     MUL.D   1.5         2.5         -     -     -
ML2   1     DIV.D   -           0.5         ML1   -     -

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
ML1   -     -     AD2   -     ML2   -

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 11

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    6    -
4.   SUB.D  F8, F2, F6     4    6    8
5.   DIV.D  F10, F0, F6    5    -    -
6.   ADD.D  F6, F8, F2     6    9    11

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   1     ADD.D   1.0         1.5         -     -     -
AD3
ML1   1     MUL.D   1.5         2.5         -     -     -
ML2   1     DIV.D   -           0.5         ML1   -     -

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
ML1   -     -     AD2   -     ML2   -

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 16

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    6    16
4.   SUB.D  F8, F2, F6     4    6    8
5.   DIV.D  F10, F0, F6    5    -    -
6.   ADD.D  F6, F8, F2     6    9    11

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   0
AD3
ML1   1     MUL.D   1.5         2.5         -     -     -
ML2   1     DIV.D   3.75        0.5         -     -     -

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
ML1   -     -     -     -     ML2   -

ML1 writes 3.75 on the CDB in this cycle. ML2 was waiting with Qj = ML1 and captures the value as Vj; station ML1 and the ML1 tag on F0 are released at the end of the cycle.

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 17

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    6    16
4.   SUB.D  F8, F2, F6     4    6    8
5.   DIV.D  F10, F0, F6    5    17   -
6.   ADD.D  F6, F8, F2     6    9    11

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   0
AD3
ML1   0
ML2   1     DIV.D   3.75        0.5         -     -     -

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
-     -     -     -     -     ML2   -

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 57

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    6    16
4.   SUB.D  F8, F2, F6     4    6    8
5.   DIV.D  F10, F0, F6    5    17   57
6.   ADD.D  F6, F8, F2     6    9    11

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   0
AD3
ML1   0
ML2   1     DIV.D   3.75        0.5         -     -     -

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
-     -     -     -     -     ML2   -

Detailed Example

Latencies: Load 2 cycles, Add 2 cycles, Mult 10 cycles, Divide 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Cycle: 57 (after DIV.D has written its result)

Instruction status
     Instruction           Is   Ex   W
1.   L.D    F6, 34(R2)     1    2    4
2.   L.D    F2, 45(R3)     2    3    5
3.   MUL.D  F0, F2, F4     3    6    16
4.   SUB.D  F8, F2, F6     4    6    8
5.   DIV.D  F10, F0, F6    5    17   57
6.   ADD.D  F6, F8, F2     6    9    11

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
AD1   0
AD2   0
AD3
ML1   0
ML2   0

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
-     -     -     -     -     -     -
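The Is/Ex/W numbers of this example can be reproduced with a very small timing model. The sketch below is not a full Tomasulo simulator and rests on stated assumptions: one instruction issues per cycle in program order, reservation stations and the CDB are never a bottleneck, the integer base registers of the loads are ready, the Ex column records the cycle in which execution starts, and a result is written `latency` cycles after that start. The names `PROGRAM`, `LATENCY`, and `tomasulo_timing` are illustrative.

```python
LATENCY = {"L.D": 2, "ADD.D": 2, "SUB.D": 2, "MUL.D": 10, "DIV.D": 40}

# (opcode, destination register, FP source registers)
PROGRAM = [
    ("L.D",   "F6",  []),
    ("L.D",   "F2",  []),
    ("MUL.D", "F0",  ["F2", "F4"]),
    ("SUB.D", "F8",  ["F2", "F6"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6",  ["F8", "F2"]),
]


def tomasulo_timing(program, latency):
    """Return a list of (issue, execution start, write result) cycles."""
    producer = {}   # register -> index of the latest instruction writing it
    rows = []
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1
        # Each source is bound at issue time to its current producer, so a
        # later writer of the same register (renaming) cannot affect it.
        waits = [rows[producer[s]][2] for s in srcs if s in producer]
        # Execution starts the cycle after issue and after the last
        # awaited result has appeared on the CDB.
        start = max([issue] + waits) + 1
        write = start + latency[op]
        rows.append((issue, start, write))
        producer[dest] = i
    return rows


for (op, dest, _), (is_, ex, w) in zip(PROGRAM, tomasulo_timing(PROGRAM, LATENCY)):
    print(f"{op:6s} {dest:4s} Is={is_:2d} Ex={ex:2d} W={w:2d}")
```

Note how DIV.D waits on the load that produced F6, not on the later ADD.D that also writes F6: the binding happens at issue, which is the renaming effect the walk-through illustrates.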

Scoreboard vs. Tomasulo vs. MC FP

                      Scoreboard                                              Tomasulo        MC Pipeline
Instruction           Issue   Read Operand  Execution Complete  Write Result  Is   Ex   W     WB
L.D    F6, 34(R2)     1       2             3                   4             1    2    4     5
L.D    F2, 45(R3)     2-5     6             7                   8             2    3    5     6
MUL.D  F0, F2, F4     6       7-9           10-19               20            3    6    16    17
SUB.D  F8, F2, F6     7       8-9           10-11               12            4    6    8     10
DIV.D  F10, F0, F6    8       9-21          22-61               62            5    17   57    57
ADD.D  F6, F8, F2     9-13    14            15-16               22            6    9    11    20

Observations:
Scoreboard vs. Tomasulo's Approach
  Structural hazards stall the scoreboard (Tomasulo's approach allows multiple loads in flight)
  No forwarding in the scoreboard (Tomasulo's approach forwards data over the CDB)
  WAR and WAW hazards are avoided by register renaming in Tomasulo's approach
MC Pipeline vs. Tomasulo's Approach
  Dynamic scheduling (out-of-order execution start)
  Facilitates execution of independent instructions
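The renaming observation can be made concrete with the DIV.D / ADD.D pair of the example, which has a WAR hazard on F6. The sketch below is a simplification for illustration: the `issue` helper and the dictionary layout are hypothetical, and F8 is assumed to be already available when ADD.D issues (in the walk-through it is still pending in AD1).

```python
# Register file values and register status at the moment DIV.D issues.
reg_file = {"F0": 0.0, "F2": 1.5, "F6": 0.5, "F8": 1.0}
reg_status = {"F0": "ML1"}   # F0 is still being produced by MUL.D in ML1


def issue(op, dest, src1, src2, station_name):
    """Issue step: copy ready operands, record tags for pending ones."""
    entry = {"Op": op}
    for v, q, src in (("Vj", "Qj", src1), ("Vk", "Qk", src2)):
        tag = reg_status.get(src)
        entry[v] = None if tag else reg_file[src]   # copy the value if ready
        entry[q] = tag                              # else remember the producer
    reg_status[dest] = station_name                 # rename the destination
    return entry


div = issue("DIV.D", "F10", "F0", "F6", "ML2")   # reads F6 (0.5) at issue
add = issue("ADD.D", "F6", "F8", "F2", "AD2")    # later writer of F6

# ADD.D finishes long before DIV.D starts and overwrites F6 ...
reg_file["F6"] = add["Vj"] + add["Vk"]
reg_status["F6"] = None

# ... but DIV.D still holds the old F6 value in its reservation station.
print(div["Vk"], reg_file["F6"])
```

Because DIV.D copied F6 into Vk when it issued, the early write by ADD.D is harmless, so no WAR stall is needed.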

Loop-based Example

Loop:  LD     F0, 0(R1)
       MUL.D  F4, F0, F2
       SD     F4, 0(R1)
       SUBI   R1, R1, #8
       BNEZ   R1, Loop

Assume: Load 2 cycles, Mul 4 cycles, but the first load takes 8 cycles in EX (data cache miss)

Cycle: 0

Instruction status
Iter  Instruction           Is   Ex   W
1     L.D    F0, 0(R1)      -    -    -
1     MUL.D  F4, F0, F2     -    -    -
1     SD     F4, 0(R1)      -    -    -
2     L.D    F0, 0(R1)      -    -    -
2     MUL.D  F4, F0, F2     -    -    -
2     SD     F4, 0(R1)      -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
LD3   0
SD1   0
SD2   0
SD3   0
AD1   0
AD2   0
AD3   0
ML1   0
ML2   0

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
-     -     -     -     -     -     -

Loop-based Example CC1

Loop:  LD     F0, 0(R1)
       MUL.D  F4, F0, F2
       SD     F4, 0(R1)
       SUBI   R1, R1, #8
       BNEZ   R1, Loop

Assume: Load 2 cycles, Mul 4 cycles, but the first load takes 8 cycles in EX (data cache miss)

Cycle: 1    R1: 80

Instruction status
Iter  Instruction           Is   Ex   W
1     L.D    F0, 0(R1)      1    -    -
1     MUL.D  F4, F0, F2     -    -    -
1     SD     F4, 0(R1)      -    -    -
2     L.D    F0, 0(R1)      -    -    -
2     MUL.D  F4, F0, F2     -    -    -
2     SD     F4, 0(R1)      -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   1     LD      -           -           -     -     0
LD2   0
LD3   0
SD1   0
SD2   0
SD3   0
AD1   0
AD2   0
AD3   0
ML1   0
ML2   0

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
LD1   -     -     -     -     -     -

Loop-based Example CC2

Loop:  LD     F0, 0(R1)
       MUL.D  F4, F0, F2
       SD     F4, 0(R1)
       SUBI   R1, R1, #8
       BNEZ   R1, Loop

Assume: Load 2 cycles, Mul 4 cycles, but the first load takes 8 cycles in EX (data cache miss)

Cycle: 2    R1: 80

Instruction status
Iter  Instruction           Is   Ex   W
1     L.D    F0, 0(R1)      1    2    -
1     MUL.D  F4, F0, F2     2    -    -
1     SD     F4, 0(R1)      -    -    -
2     L.D    F0, 0(R1)      -    -    -
2     MUL.D  F4, F0, F2     -    -    -
2     SD     F4, 0(R1)      -    -    -

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   1     LD      -           -           -     -     80
LD2   0
LD3   0
SD1   0
SD2   0
SD3   0
AD1   0
AD2   0
AD3   0
ML1   1     MUL.D   -           R(F2)       LD1   -     -
ML2   0

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
LD1   -     ML1   -     -     -     -

Loop-based Example CC3


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

25

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 3
80
R1:

Is Ex W
1 3
2
3

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 1
LD
LD2 0
LD3 0
SD1 1
SD
ML1
SD2 0
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2) LD1
ML2 0
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD1

ML1

A
80
0

10/8/2012

Loop-based Example CC4


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

26

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 4
R1:
80

Is Ex W
1 4
2
3 4

0
F0
0
R1
Loop

Dispatching SUBI Instruction

Busy Op
Vj
Vk
Qj
Qk
LD1 1
LD
LD2 0
LD3 0
SD1 1
SD
ML1
SD2 0
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2) LD1
ML2 0
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD1

ML1

A
80
80

10/8/2012

Loop-based Example CC5


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

27

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 5
R1:
72

Is Ex W
1 5
2
3 4

0
F0
0
R1
Loop

Dispatching BNEZ Instruction

Busy Op
Vj
Vk
Qj
Qk
LD1 1
LD
LD2 0
LD3 0
SD1 1
SD
ML1
SD2 0
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2) LD1
ML2 0
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD1

ML1

A
80
80

10/8/2012

Loop-based Example CC6


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

28

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 6
R1:
72

Is Ex W
1 6
2
3 4
6

0
F0
0
R1
Loop

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

Busy Op
Vj
Vk
Qj
Qk
LD1 1
LD
LD2 1
LD
LD3 0
SD1 1
SD
ML1
SD2 0
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2) LD1
ML2 0
F0 F2 F4 F6 F8 F10 F12

ML1
Register Status: LD2
Notice that F0 never sees the load from location 80.

EE/CS520- Comp. Archi.

A
80
0
80

10/8/2012

Loop-based Example CC7


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Is
1
2
3
6
7

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

29

Cycle: 7
R1:
72

Ex W
7
4
7

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 1
LD
LD2 1
LD
LD3 0
SD1 1
SD
ML1
SD2 0
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2) LD1
ML2 1 MUL.D
R(F2) LD2
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD2

ML2

A
80
72
80

10/8/2012

Loop-based Example CC8


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

30

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 8
R1:
72

Is
1
2
3
6
7
8

Ex W
8
4
7

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 1
LD
LD2 1
LD
LD3 0
SD1 1
SD
ML1
SD2 1
SD
ML2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2) LD1
ML2 1 MUL.D
R(F2) LD2
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD2

ML2

A
80
72
80
0

10/8/2012

Loop-based Example CC9


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

31

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 9
R1:
72

Is
1
2
3
6
7
8

Ex W
9
4
7
9

0
F0
0
R1
Loop

Dispatching SUBI Instruction

Busy Op
Vj
Vk
Qj
Qk
LD1 1
LD
LD2 1
LD
LD3 0
SD1 1
SD
ML1
SD2 1
SD
ML2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2) LD1
ML2 1 MUL.D
R(F2) LD2
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD2

ML2

A
80
72
80
72

10/8/2012

Loop-based Example CC10


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

32

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 10
R1:
64

Is
1
2
3
6
7
8

Ex W
9 10

4
10
9

0
F0
0
R1
Loop

Dispatching BNEZ Instruction

Busy Op
Vj
Vk
Qj
Qk
LD1 0
LD2 1
LD
LD3 0
SD1 1
SD
ML1
SD2 1
SD
ML2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D @[80] R(F2)
ML2 1 MUL.D
R(F2) LD2
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD2

ML2

72

80
72

10/8/2012

Loop-based Example CC11


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

33

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 11
R1:
64

Is
1
2
3
6
7
8

Ex W
9 10
11
4
10 11
9

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 0
LD2 0
LD3 1
LD
SD1 1
SD
ML1
SD2 1
SD
ML2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D @[80] R(F2)
ML2 1 MUL.D @[72] R(F2)
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD3

ML2

0
80
72

10/8/2012

Loop-based Example CC12


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

34

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 12
R1:
64

Is
1
2
3
6
7
8

Ex W
9 10
12
4
10 11
12
9

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 0
LD2 0
LD3 1
LD
SD1 1
SD
ML1
SD2 1
SD
ML2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D @[80] R(F2)
ML2 1 MUL.D @[72] R(F2)
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD3

ML2

Why not issue the third multiply?

64
80
72

10/8/2012

Loop-based Example CC13


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 13
R1:
64

35

Is
1
2
3
6
7
8

Ex W
9 10
13
4
10 11
13
9

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 0
LD2 0
LD3 1
LD
SD1 1
SD
ML1
SD2 1
SD
ML2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D @[80] R(F2)
ML2 1 MUL.D @[72] R(F2)
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD3

ML2

64
80
72

10/8/2012

Loop-based Example CC14


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 14
R1:
64

36

Is
1
2
3
6
7
8

Ex W
9 10
14
4
10 11
14
9

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 0
LD2 0
LD3 1
LD
SD1 1
SD
ML1
SD2 1
SD
ML2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D @[80] R(F2)
ML2 1 MUL.D @[72] R(F2)
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD3

ML2

64
80
72

10/8/2012

Loop-based Example CC15


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

37

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 15
R1:
64

Is
1
2
3
6
7
8

Ex W
9 10
14 15
4
10 11
15
9

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 0
LD2 0
LD3 1
LD
SD1 1
SD @[80]*F2
SD2 1
SD
ML2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 0
ML2 1 MUL.D @[72] R(F2)
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD3

ML2

64
80
72

10/8/2012

Loop-based Example CC16


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

38

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 16
R1:
64

Is
1
2
3
6
7
8

Ex W
9 10
14 15
16
10 11
15 16
9

0
F0
0
R1
Loop

Busy Op
Vj
Vk
LD1 0
LD2 0
LD3 1
LD
SD1 1
SD @[80]*F2
SD2 1
SD @[72]*F2
SD3 0
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2)
ML2 0
F0 F2 F4 F6 F8

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD3

ML1

Qj

Qk

64
80
72

LD3

F10 F12

10/8/2012

Loop-based Example CC17


Loop:

LD
MUL.D
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Reservation Stations

Iter Inst
1. L.D
1. MUL.D
1. SD
2. L.D
2. MUL.D
2. SD

F0, 0(R1)
F4, F0, F2
F4, 0(R1)
F0, 0(R1)
F4, F0, F2
F4, 0(R1)

Cycle: 17
R1:
64

39

Is
1
2
3
6
7
8

Ex W
9 10
14 15
16 17
10 11
15 16
17

0
F0
0
R1
Loop

Busy Op
Vj
Vk
Qj
Qk
LD1 0
LD2 0
LD3 1
LD
SD1 0
SD2 1
SD @[72]*F2
SD3 1
SD
ML1
AD1 0
AD2 0
AD3 0
ML1 1 MUL.D
R(F2) LD3
ML2 0
F0 F2 F4 F6 F8 F10 F12

Register Status:

EE/CS520- Comp. Archi.

R1
F2
R1
#8

Assume
Load: 2 cycles
Mul: 4 cycles
But first load takes
8 cycles in EX
(data cache miss)

LD3

ML1

64
72
0

10/8/2012

Loop-based Example CC18

Loop:  LD     F0, 0(R1)
       MUL.D  F4, F0, F2
       SD     F4, 0(R1)
       SUBI   R1, R1, #8
       BNEZ   R1, Loop

Assume: Load 2 cycles, Mul 4 cycles, but the first load takes 8 cycles in EX (data cache miss)

Cycle: 18    R1: 64

Dispatching SUBI Instruction

Instruction status
Iter  Instruction           Is   Ex   W
1     L.D    F0, 0(R1)      1    9    10
1     MUL.D  F4, F0, F2     2    14   15
1     SD     F4, 0(R1)      3    16   17
2     L.D    F0, 0(R1)      6    10   11
2     MUL.D  F4, F0, F2     7    15   16
2     SD     F4, 0(R1)      8    17   18

Reservation Stations
Name  Busy  Op      Vj          Vk          Qj    Qk    A
LD1   0
LD2   0
LD3   1     LD      -           -           -     -     64
SD1   0
SD2   0
SD3   1     SD      -           -           ML1   -     64
AD1   0
AD2   0
AD3   0
ML1   1     MUL.D   -           R(F2)       LD3   -     -
ML2   0

Register Status (Qi)
F0    F2    F4    F6    F8    F10   F12
LD3   -     ML1   -     -     -     -

What Can Be Concluded from the Loop-Based Example?