L6 Tomasulo
Nov. 2, 2004
Lec. 7
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.
Instruction status: Wait until / Bookkeeping
Issue: (WAW check)
Read operands: wait until Rj and Rk; bookkeeping Rj <- No; Rk <- No
Execution complete: wait until functional unit done
Write result (WAR check): wait until $\forall f\,\big((F_j(f) \neq F_i(FU) \lor R_j(f) = \text{No}) \land (F_k(f) \neq F_i(FU) \lor R_k(f) = \text{No})\big)$
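The write-result condition can be read as "no other unit still has to read the register this unit is about to overwrite". A minimal Python sketch of that check follows. The `FU` record and the function name are hypothetical, and skipping idle units and the writing unit itself is an assumption made here, not something the slide states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FU:
    """One functional-unit row of the scoreboard (hypothetical layout)."""
    name: str
    busy: bool = False
    Fi: Optional[str] = None  # destination register
    Fj: Optional[str] = None  # first source register
    Fk: Optional[str] = None  # second source register
    Rj: bool = False          # True ("Yes"): Fj is ready but not yet read
    Rk: bool = False          # True ("Yes"): Fk is ready but not yet read


def can_write_result(fu: FU, units: List[FU]) -> bool:
    """WAR check: every other unit must satisfy
    (Fj != Fi(FU) or Rj == No) and (Fk != Fi(FU) or Rk == No)."""
    for f in units:
        if f is fu or not f.busy:  # assumption: only other busy units matter
            continue
        if (f.Fj == fu.Fi and f.Rj) or (f.Fk == fu.Fi and f.Rk):
            return False
    return True


# ADDD wants to write F6 while DIVD has not yet read its F6 operand.
divide = FU("Divide", busy=True, Fi="F10", Fj="F0", Fk="F6", Rj=False, Rk=True)
add = FU("Add", busy=True, Fi="F6", Fj="F8", Fk="F2")
print(can_write_result(add, [divide, add]))
```

In this setup ADDD must wait. Once DIVD has read its operands (Rk becomes No), the same call allows the write.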
Scoreboard Example
The following numbers illustrate behavior and are not representative:
LD: 1 cycle (compute address + data cache access)
Scoreboard Example
Instruction status (columns: Instruction, j, k, Issue, Read operands, Execution complete, Write result):
  LD    F6  34+ R2
  LD    F2  45+ R3
  MULTD F0  F2  F4
  SUBD  F8  F6  F2
  DIVD  F10 F0  F6
  ADDD  F6  F8  F2
Functional unit status (columns: Time, Name, Busy, Op, dest Fi, S1 Fj, S2 Fk, Fk? Rk):
  Integer: Busy No
  Mult1: Busy No
  Mult2: Busy No
  Add: Busy No
  Divide: Busy No
Register result status (columns: Clock, F0, F2, F4, F6, F8, F10, F12, ..., F30): the FU row is blank.
Scoreboard Example
Instruction status (Issue, Read operands, Execution complete, Write result):
  LD    F6  34+ R2:  1
  LD    F2  45+ R3
  MULTD F0  F2  F4
  SUBD  F8  F6  F2
  DIVD  F10 F0  F6
  ADDD  F6  F8  F2
Functional unit status:
  Integer: Busy Yes, Op Load, Fi F6, Fk R2, Rk Yes
  Mult1: Busy No
  Mult2: Busy No
  Add: Busy No
  Divide: Busy No
Register result status: F6 Integer
[Scoreboard Example, three snapshots, table residue only. In the first two, the Integer unit holds Load with Fi F6, Fk R2, Rk No, and Integer appears in the register result status. In the third, the Integer row reads the same but no functional unit appears in the register result status.]
[Scoreboard Example, two snapshots, table residue only. First: the Integer unit holds Load with Fi F2, Fk R3, Rk Yes, and Integer appears in the register result status. Second: the functional unit rows list Load F2 and Mult F0, with Rk No and Yes respectively, and Integer still appears in the register result status.]
I3 stalled at read because I2 isn't complete.
[Scoreboard Example, table residue only. Functional unit rows: Load F2; Mult F0; Subd F8 with Fj F6, Fk F2, Qk Integer, Rj Yes, Rk No. Integer and Add appear in the register result status.]
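[Scoreboard Example, table residue; the instruction status columns are only partly preserved.]
Functional unit status:
  Mult1: Op Mult, Fi F0, Fj F2, Fk F4, Rj Yes, Rk Yes
  Add: Op Sub, Fi F8, Fj F6, Fk F2, Rj Yes, Rk Yes
  Divide: Op Div, Fi F10, Fj F0, Fk F6, Qj Mult1, Rj No, Rk Yes
Register result status: Add and Divide appear as pending writers.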
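Scoreboard Example
Instruction status (Issue, Read operands, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2, 3, 4
  LD    F2  45+ R3:  5, 6, 7, 8
  MULTD F0  F2  F4:  6, 9
  SUBD  F8  F6  F2:  7, 9
  DIVD  F10 F0  F6:  8
  ADDD  F6  F8  F2
Functional unit status:
  Integer: Busy No
  Mult1: Busy Yes, Op Mult, Fi F0, Fj F2, Fk F4, Rj No, Rk No
  Mult2: Busy No
  Add: Busy Yes, Op Sub, Fi F8, Fj F6, Fk F2, Rj No, Rk No
  Divide: Busy Yes, Op Div, Fi F10, Fj F0, Fk F6, Qj Mult1, Rj No, Rk Yes
Register result status: F0 Mult1; F8 Add; F10 Divide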
[Scoreboard Example, snapshots for the following cycles, table residue only. Mult1 and Divide remain in the register result status, and the surviving Rj/Rk flags mostly read No, No, No, No, No, Yes.]
MULT completes after 10 cycles.
[Scoreboard Example, table residue only. Mult1 and Divide still appear in the register result status.]
[Scoreboard Example, three snapshots, table residue only. Only Divide remains in the register result status.]
DIVD completes execution.
[Scoreboard Example, table residue only. Divide still appears in the register result status, with Rj No and Rk No.]
Scoreboard Example
Instruction status (Issue, Read operands, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2, 3, 4
  LD    F2  45+ R3:  5, 6, 7, 8
  MULTD F0  F2  F4:  6, 9, 19, 20
  SUBD  F8  F6  F2:  7, 9, 11, 12
  DIVD  F10 F0  F6:  8, 21, 61, 62
  ADDD  F6  F8  F2:  13, 14, 16, 22
Functional unit status: Integer, Mult1, Mult2, Add, and Divide are all not busy.
Register result status: Clock 62; the FU row is blank.
Execution is finished.
Review: Scoreboard
Limitations of 6600 scoreboard
No forwarding
Limited to instructions in basic block (small window)
Small number of functional units (structural hazards)
Stall on WAR hazards
Stall on WAW hazards
DIV.D  F0, F2, F4
ADD.D  F6, F0, F8
S.D    F6, 0(R1)
SUB.D  F8, F10, F14
MUL.D  F6, F10, F8
WAR (antidependence): ADD.D reads F8, which SUB.D later writes.
WAW (output dependence): ADD.D and MUL.D both write F6.
Both are name dependences.
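The dependences in this five-instruction sequence can be found mechanically by comparing the destination and source registers of every instruction pair. The sketch below is illustrative only: the tuple encoding and the `dependences` helper are hypothetical, and it compares pairs naively without checking for intervening redefinitions.

```python
# (opcode, destination, sources). S.D writes memory, so it has no register destination.
program = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D",   None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]


def dependences(prog):
    """Return (kind, earlier op, later op, register) for every instruction pair."""
    found = []
    for i, (op_i, dst_i, src_i) in enumerate(prog):
        for op_j, dst_j, src_j in prog[i + 1:]:
            if dst_i is not None and dst_i in src_j:
                found.append(("RAW", op_i, op_j, dst_i))  # true dependence
            if dst_j is not None and dst_j in src_i:
                found.append(("WAR", op_i, op_j, dst_j))  # antidependence
            if dst_i is not None and dst_i == dst_j:
                found.append(("WAW", op_i, op_j, dst_i))  # output dependence
    return found


name_deps = [d for d in dependences(program) if d[0] != "RAW"]
for dep in name_deps:
    print(dep)
```

Filtering out the RAW entries leaves only the name dependences, which are the ones renaming can remove.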
The same sequence after register renaming:
DIV.D  F0, F2, F4
ADD.D  S, F0, F8
S.D    S, 0(R1)
SUB.D  T, F10, F14
MUL.D  F6, F10, T
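A minimal sketch of how such a renaming could be automated follows. Unlike the slide, which renames only two registers to S and T, this version gives every destination a fresh temporary name (T0, T1, ...). That is a simplification chosen here, and the function and tuple encoding are hypothetical.

```python
import itertools

# (opcode, destination, sources). S.D has no register destination.
program = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D",   None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]


def rename(prog):
    """Give each destination a fresh name; sources follow the latest mapping."""
    mapping = {}                                   # architectural name -> current name
    fresh = (f"T{n}" for n in itertools.count())   # hypothetical temporaries
    renamed = []
    for op, dst, srcs in prog:
        new_srcs = [mapping.get(s, s) for s in srcs]  # read before remapping dst
        new_dst = None
        if dst is not None:
            new_dst = next(fresh)
            mapping[dst] = new_dst
        renamed.append((op, new_dst, new_srcs))
    return renamed


renamed = rename(program)
for instr in renamed:
    print(instr)
```

Because no destination name is ever reused, the WAR and WAW hazards disappear while the true (RAW) dependences are preserved through the mapping.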
[Tomasulo Example, early snapshots, table residue only. One snapshot shows the load-buffer address 34+R2. A later one shows the load buffers as Busy Yes, Yes, No with addresses 34+R2 and 45+R3.]
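Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4
  MULTD F0  F2  F4:  3
  SUBD  F8  F6  F2:  4
  DIVD  F10 F0  F6:  (not issued)
  ADDD  F6  F8  F2:  (not issued)
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: Yes, Sub, Vj M(A1), Qk Load2
  Add2: No
  Add3: No
  Mult1: Yes, Mult, Vk R(F4), Qj Load2
  Mult2: No
Load buffers: Load1 No; Load2 Yes, address 45+R3; Load3 No
Register result status: F0 Mult1; F2 Load2; F6 M(A1); F8 Add1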
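The Vj/Vk/Qj/Qk fields in this snapshot suggest how a result reaches its consumers: a station waiting on a tag (Q field) captures the value when that tag is broadcast. The sketch below shows Load2 delivering M(A2). The `Station` record and `broadcast` function are hypothetical names, and storing either a tag or a value string per register simply mirrors how the slide's register result status row is drawn.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Station:
    """One reservation station (hypothetical layout)."""
    name: str
    op: str
    Vj: Optional[str] = None  # operand values, once available
    Vk: Optional[str] = None
    Qj: Optional[str] = None  # tags of the producers still being waited on
    Qk: Optional[str] = None

    def ready(self) -> bool:
        return self.Qj is None and self.Qk is None


def broadcast(tag: str, value: str, stations: List[Station],
              reg_status: Dict[str, str]) -> None:
    """Deliver a finished result to every station and register waiting on tag."""
    for rs in stations:
        if rs.Qj == tag:
            rs.Vj, rs.Qj = value, None
        if rs.Qk == tag:
            rs.Vk, rs.Qk = value, None
    for reg, producer in reg_status.items():
        if producer == tag:
            reg_status[reg] = value


add1 = Station("Add1", "Sub", Vj="M(A1)", Qk="Load2")
mult1 = Station("Mult1", "Mult", Vk="R(F4)", Qj="Load2")
regs = {"F0": "Mult1", "F2": "Load2", "F6": "M(A1)", "F8": "Add1"}

broadcast("Load2", "M(A2)", [add1, mult1], regs)
print(add1, mult1, regs, sep="\n")
```

After the broadcast both stations have all their operands, so SUBD and MULTD can start executing without reading the register file.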
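Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3
  SUBD  F8  F6  F2:  4
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  (not issued)
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: Yes, Sub, Vj M(A1), Vk M(A2)
  Add2: No
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 M(A1); F8 Add1; F10 Mult2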
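Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3, 6--
  SUBD  F8  F6  F2:  4, 6--
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  6
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: Yes, Sub, Vj M(A1), Vk M(A2)
  Add2: Yes, Add, Vk M(A2), Qj Add1
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 Add2; F8 Add1; F10 Mult2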
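Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3, 6--
  SUBD  F8  F6  F2:  4, 6--7
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  6
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: Yes, Sub, Vj M(A1), Vk M(A2)
  Add2: Yes, Add, Vk M(A2), Qj Add1
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 Add2; F8 Add1; F10 Mult2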
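Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3, 6--
  SUBD  F8  F6  F2:  4, 6--7, 8
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  6
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: No
  Add2: Yes, Add, Vj M1-M2, Vk M(A2)
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 Add2; F8 M1-M2; F10 Mult2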
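Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3, 6--
  SUBD  F8  F6  F2:  4, 6--7, 8
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  6, 9--
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: No
  Add2: Yes, Add, Vj M1-M2, Vk M(A2)
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 Add2; F8 M1-M2; F10 Mult2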
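Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3, 6--
  SUBD  F8  F6  F2:  4, 6--7, 8
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  6, 9--10
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: No
  Add2: Yes, Add, Vj M1-M2, Vk M(A2)
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 Add2; F8 M1-M2; F10 Mult2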
Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3, 6--
  SUBD  F8  F6  F2:  4, 6--7, 8
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  6, 9--10, 11
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: No
  Add2: No
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 M1-M2+M(A2); F8 M1-M2; F10 Mult2
Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3, 6--
  SUBD  F8  F6  F2:  4, 6--7, 8
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  6, 9--10, 11
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: No
  Add2: No
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 M1-M2+M(A2); F8 M1-M2; F10 Mult2
Tomasulo Example
Instruction status (Issue, Execution complete, Write result):
  LD    F6  34+ R2:  1, 2--3, 4
  LD    F2  45+ R3:  2, 3--4, 5
  MULTD F0  F2  F4:  3, 6--15
  SUBD  F8  F6  F2:  4, 6--7, 8
  DIVD  F10 F0  F6:  5
  ADDD  F6  F8  F2:  6, 9--10, 11
Reservation stations (Busy, Op, Vj, Vk, Qj, Qk):
  Add1: No
  Add2: No
  Add3: No
  Mult1: Yes, Mult, Vj M(A2), Vk R(F4)
  Mult2: Yes, Div, Vk M(A1), Qj Mult1
Load buffers: Load1 No; Load2 No; Load3 No
Register result status: F0 Mult1; F2 M(A2); F6 M1-M2+M(A2); F8 M1-M2; F10 Mult2
Branch Prediction
Easiest (static prediction)
Next easiest
1-bit predictor: remember last taken/not taken per branch
Use a branch-prediction buffer or branch-history table
Use part of the PC (low-order bits) to index buffer/table
Multiple branches may share the same bit
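Indexing the table with low-order PC bits, and the sharing it causes, can be sketched as below. The class name, the table size, and the choice to drop the two lowest bits (assuming 4-byte aligned instructions) are all assumptions for illustration.

```python
class OneBitBHT:
    """1-bit branch-history table indexed by low-order PC bits (sketch)."""

    def __init__(self, entries=1024):
        assert entries > 0 and entries & (entries - 1) == 0, "power of two"
        self.mask = entries - 1
        self.bits = [False] * entries  # False = predict not taken

    def index(self, pc):
        # Assumption: 4-byte instructions, so the two lowest bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.bits[self.index(pc)]

    def update(self, pc, taken):
        self.bits[self.index(pc)] = taken  # remember only the last outcome


bht = OneBitBHT(entries=16)
branch_a, branch_b = 0x1000, 0x1040  # differ only above the index bits

bht.update(branch_a, True)
print(bht.index(branch_a), bht.index(branch_b), bht.predict(branch_b))
```

The two branches map to the same entry, so the outcome of one silently becomes the prediction for the other. There is no tag check, which keeps the table small at the cost of this aliasing.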
Q: Assume a loop branch is taken nine times in a row, then not taken once. What is the prediction accuracy using a 1-bit predictor?
A: On the first iteration, the predictor says not taken, because the last time execution left the loop it set the bit to 0. So it's a misprediction, and the bit is now set to 1. Prediction then works fine until the last iteration, where the branch is predicted taken but is not taken. So, 2 mispredictions in 10 loop iterations => 80% accuracy.
How about a 2-bit predictor? Let the prediction be changed only after it misses twice in a row.
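The two cases can be checked with a small simulation. This is a sketch under assumptions: the 2-bit scheme is modeled as a saturating counter (one common encoding, which flips its prediction only after two consecutive misses when starting from a strong state), the loop is entered 100 times, and the start states represent a predictor that has already seen the loop exit once.

```python
def accuracy(outcomes, bits, start):
    """Fraction of correct predictions for one n-bit saturating counter."""
    top = (1 << bits) - 1          # strongest "taken" state
    threshold = 1 << (bits - 1)    # states at or above this predict taken
    state, correct = start, 0
    for taken in outcomes:
        if (state >= threshold) == taken:
            correct += 1
        # Move toward "taken" or "not taken", saturating at the ends.
        state = min(top, state + 1) if taken else max(0, state - 1)
    return correct / len(outcomes)


loop = [True] * 9 + [False]   # taken nine times, then not taken once
trace = loop * 100            # assumption: the loop is entered 100 times

one_bit = accuracy(trace, bits=1, start=0)  # bit was left at 0 by the last exit
two_bit = accuracy(trace, bits=2, start=2)  # one step down from "strongly taken"
print(one_bit, two_bit)  # prints "0.8 0.9"
```

The 2-bit counter mispredicts only the loop exit, because a single not-taken outcome is not enough to flip its prediction for the next entry into the loop.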
[Figure: branch-history table (BHT) with an entry holding 01]
Can we do better?
Correlating branch predictors also look at other branches for clues
if (aa==2)
    aa = 0;
if (bb==2)
    bb = 0;
if (aa!=bb) {
(Branch outcome labels on the slide: T, NT)
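To see why the earlier branches help, the fragment can be run for every combination of small `aa` and `bb` values while a predictor for the third condition keeps one 2-bit counter per history of the first two conditions. This is only a sketch: it tracks condition outcomes rather than compiled taken/not-taken directions, and the counter width, initial state, and value range are assumptions.

```python
from itertools import product


def run_fragment(aa, bb):
    """Return the outcomes of the three conditions in the C fragment."""
    c1 = aa == 2
    if c1:
        aa = 0
    c2 = bb == 2
    if c2:
        bb = 0
    c3 = aa != bb
    return c1, c2, c3


# One 2-bit saturating counter for the third branch per (c1, c2) history.
counters = {}
for _ in range(4):                                 # a few training passes
    for aa, bb in product(range(3), repeat=2):     # assumption: values 0..2
        c1, c2, c3 = run_fragment(aa, bb)
        state = counters.get((c1, c2), 1)          # assumption: start weakly false
        counters[(c1, c2)] = min(3, state + 1) if c3 else max(0, state - 1)


def predict_third(c1, c2):
    """Predict the third condition from the outcomes of the first two."""
    return counters[(c1, c2)] >= 2


print(predict_third(True, True))
```

When both earlier conditions are true, `aa` and `bb` have both been set to 0, so the third condition is always false. A predictor that only looks at the third branch's own history cannot see this.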
n-bit predictors
Can we eliminate the one cycle delay for the 5-stage pipeline?
Need to fetch from branch target immediately after branch
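One common way to get the target that early is a branch-target buffer looked up with the fetch PC. The slide does not name the mechanism, so the sketch below is an assumption about what is meant: class and method names are hypothetical, instructions are assumed to be 4 bytes, and the buffer is modeled as an unbounded dictionary rather than a fixed-size cache.

```python
class BranchTargetBuffer:
    """Maps the PC of a predicted-taken branch to its target (sketch)."""

    def __init__(self):
        self.targets = {}

    def next_fetch_pc(self, pc):
        # Hit: fetch from the stored target. Miss: fall through to pc + 4.
        return self.targets.get(pc, pc + 4)

    def resolve(self, pc, taken, target):
        """Update the buffer once the branch outcome is known."""
        if taken:
            self.targets[pc] = target
        else:
            self.targets.pop(pc, None)


btb = BranchTargetBuffer()
first = btb.next_fetch_pc(0x400)   # unknown branch: sequential fetch
btb.resolve(0x400, taken=True, target=0x380)
second = btb.next_fetch_pc(0x400)  # now redirected in the fetch stage
print(hex(first), hex(second))
```

Because the lookup uses only the fetch PC, the redirect can happen before the instruction is even decoded as a branch.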