Computer Science 146 Computer Architecture

Computer Science 146

Computer Architecture
Spring 2004
Harvard University
Instructor: Prof. David Brooks
dbrooks@eecs.harvard.edu
Lecture 6: Scoreboarding Example, Tomasulo's Algorithm
Computer Science 146
David Brooks

Lecture Outline
Scoreboarding Review (A.8)
Tomasulo's Algorithm (3.1-3.3)
Dynamic Scheduling + Register Renaming
Example 1: Same code as last time
Example 2: Hardware Loop Unrolling

Pointer-Based Renaming (MIPS R10000)


An aside
Interested in the real-world, business side of computer
architecture?
Bob Colwell (Chief Architect of Intel P6-core, used in
PentiumPro, PII, PIII) gave a talk at Stanford recently.
Technical architecture vs. marketing/management, many
anecdotes
http://stanford-online.stanford.edu/courses/ee380/040218-ee380-100.asx


Scoreboarding
Centralized scheme
No bypassing
WAR/WAW hazards are a problem
Originally proposed in the CDC 6600 (S. Cray, 1964)

Scoreboarding Stages: Issue (Or Dispatch)
Fetch: same as before
Issue (check structural hazards)
If the FU is free and no other active instruction has the same destination register (WAW), then issue the instruction
Do not issue until structural hazards are cleared
Stalled instructions stay in the I-Buffer
The size of the buffer is also a structural hazard
May have to stall Fetch if the buffer fills
Note: Issue is in order, so a stall stops all younger instructions
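
As a rough illustration of the issue check on this slide, the Python sketch below tests the two conditions (free FU, no WAW on the destination). The function name `can_issue` and the dictionary/set representation are assumptions made for this example and are not part of the lecture; in-order issue is not modelled.

```python
def can_issue(fu_busy, pending_dest, fu, dest):
    """Return True if an instruction may issue.

    fu_busy:      dict mapping FU name -> bool (True while the FU is in use)
    pending_dest: set of registers that active instructions will still write
    fu, dest:     FU required by the instruction and its destination register
    """
    structural_hazard = fu_busy[fu]
    waw_hazard = dest in pending_dest
    return not structural_hazard and not waw_hazard


# Hypothetical state while the first load (LD F6) is still active.
fu_busy = {"Integer": True, "Mult1": False, "Add": False}
pending_dest = {"F6"}

second_load = can_issue(fu_busy, pending_dest, "Integer", "F2")  # Integer unit busy
multiply = can_issue(fu_busy, pending_dest, "Mult1", "F0")       # free FU, no WAW
add_to_f6 = can_issue(fu_busy, pending_dest, "Add", "F6")        # WAW on F6
print(second_load, multiply, add_to_f6)
```

Because issue is in order, a real scoreboard would not reach the multiply while the second load is stalled; the check here only covers one instruction at a time.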

Scoreboarding Stages: Read Operands (Or Issue!)
Read Operands (check data hazards)
Check the scoreboard for whether source operands are available
Available?
No earlier issued active instruction will write the register
No currently active FU is going to write it
Dynamically avoids RAW hazards
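
The availability test above can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions: `operands_ready` and the `pending` dictionary (register to producing FU, holding only earlier instructions) are hypothetical names, not the lecture's notation.

```python
def operands_ready(sources, pending_writes):
    """Return True when no earlier active instruction will still write a source.

    sources:        registers the instruction reads
    pending_writes: dict mapping register -> FU that will write it
                    (an assumed, simplified register result status)
    """
    return all(reg not in pending_writes for reg in sources)


# Roughly cycle 7 of the lecture example: the second load still owes F2.
pending = {"F2": "Integer", "F0": "Mult1"}
multd_before = operands_ready(["F2", "F4"], pending)   # MULTD F0, F2, F4

# Once the load has written F2, MULTD can read its operands,
# while DIVD F10, F0, F6 still waits for Mult1 to produce F0.
pending = {"F0": "Mult1"}
multd_after = operands_ready(["F2", "F4"], pending)
divd_after = operands_ready(["F0", "F6"], pending)
print(multd_before, multd_after, divd_after)
```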

Scoreboarding Stages: Execution/Write Result
Execution
Execute/Update scoreboard
Write Result
Scoreboard checks for WAR hazards and stalls the completing instruction if necessary
Before, stalls only occurred at the beginning of instructions; now they can occur at the end as well
Can happen if:
The completing instruction's destination register matches a source register of an older instruction that has not yet read its source operands
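
The write-result condition can be expressed as a small check over the other functional units. The sketch below is an illustration only: the tuple layout `(fj, rj, fk, rk)` and the name `can_write_result` are assumptions for this example, with `rj`/`rk` set to True while a source is ready but not yet read.

```python
def can_write_result(dest, other_units):
    """Return True if writing `dest` now cannot clobber an unread source.

    other_units: iterable of (fj, rj, fk, rk) tuples for the other busy FUs,
                 where rj/rk are True while that source is ready but not yet read.
    """
    for fj, rj, fk, rk in other_units:
        if (fj == dest and rj) or (fk == dest and rk):
            return False  # WAR hazard: another instruction still needs the old value
    return True


# Cycle 17 of the example: ADDD wants to write F6, but DIVD (F0, F6) has not read F6.
cycle_17 = [("F2", False, "F4", False),   # Mult1: MULTD F0, F2, F4 (operands already read)
            ("F0", False, "F6", True)]    # Divide: DIVD F10, F0, F6 (F6 ready, not read)
stalled = not can_write_result("F6", cycle_17)

# After DIVD has read its operands the flag is cleared and ADDD may write.
after_read = [("F0", False, "F6", False)]
released = can_write_result("F6", after_read)
print(stalled, released)
```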

Scoreboarding Control Hardware
Three main parts
Instruction Status Bits
Indicate which of the four stages the instruction is in
Functional Unit Status Bits
Busy (in use or not), Op (operation being performed)
Fi: destination register; Fj, Fk: source registers
Qj, Qk: FUs producing source registers Fj, Fk
Rj, Rk: flags indicating when Fj, Fk are ready but not yet read
Register Result Status
Which FU will write each register
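
One way to picture the functional unit status and register result status is as two small tables that are filled in at issue time. The Python sketch below is a minimal model under assumptions: the class `FUStatus`, the function `issue`, and the use of `None` for "no producer" are illustrative choices, not the lecture's hardware description.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # FUs producing Fj / Fk (None if no producer)
    qk: Optional[str] = None
    rj: bool = False           # True: Fj ready but not yet read
    rk: bool = False


def issue(fu_table: Dict[str, FUStatus], result_status: Dict[str, str],
          fu: str, op: str, dest: str, src1: str, src2: str) -> None:
    """Fill in the FU entry and register result status at issue time."""
    qj = result_status.get(src1)   # look up producers before claiming dest
    qk = result_status.get(src2)
    fu_table[fu] = FUStatus(True, op, dest, src1, src2, qj, qk,
                            rj=qj is None, rk=qk is None)
    result_status[dest] = fu


fu_table: Dict[str, FUStatus] = {}
result_status = {"F2": "Integer"}          # the second load has not written F2 yet
issue(fu_table, result_status, "Mult1", "Mult", "F0", "F2", "F4")
issue(fu_table, result_status, "Add", "Sub", "F8", "F6", "F2")
issue(fu_table, result_status, "Divide", "Div", "F10", "F0", "F6")
print(fu_table["Divide"])
```

The resulting entries mirror the Qj/Qk and Rj/Rk columns shown for Mult1, Add and Divide in the cycle 8 tables of the example that follows.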

Scoreboard Example

Instruction status:
Instruction  j     k     Issue  Read  Exec  Write
                                Oper  Comp  Result
LD     F6    34+   R2
LD     F2    45+   R3
MULTD  F0    F2    F4
SUBD   F8    F6    F2
DIVD   F10   F0    F6
ADDD   F6    F8    F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock       F0       F2       F4       F6       F8       F10      F12      ...  F30
       FU

Example courtesy of Prof. Broderson, CS152, UCB, Copyright (C) 2001 UCB

Scoreboard Example: Cycle 1

Instruction status:
Instruction  j     k     Issue  Read  Exec  Write
                                Oper  Comp  Result
LD     F6    34+   R2    1
LD     F2    45+   R3
MULTD  F0    F2    F4
SUBD   F8    F6    F2
DIVD   F10   F0    F6
ADDD   F6    F8    F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock       F0       F2       F4       F6       F8       F10      F12      ...  F30
1      FU                              Integer


Scoreboard Example: Cycle 2


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1

Busy

Op

dest
Fi

Yes
No
No
No
No

Load

F6

F2

F4

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

Register result status:


F0
Clock
FU

S1
Fj

S2
Fk

FU
Qj

FU
Qk

Fj?
Rj

R2

F6

F8

Fk?
Rk
Yes

F10 F12

...

F30

Fj?
Rj

Fk?
Rk

Integer

Issue 2nd LD?


Scoreboard Example: Cycle 3


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1

Busy

Op

dest
Fi

Yes
No
No
No
No

Load

F6

F2

F4

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

Register result status:


F0
Clock
3

FU

Issue MULT?

S1
Fj

S2
Fk

FU
Qj

FU
Qk

R2

F6

F8

No

F10 F12

...

F30

Integer


Scoreboard Example: Cycle 4


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1

Busy

Op

dest
Fi

S1
Fj

S2
Fk

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

FU
Qk

Fj?
Rj

Fk?
Rk

...

F30

Fj?
Rj

Fk?
Rk

No
No
No
No
No

Register result status:


F0
Clock
FU

FU
Qj

F10 F12

Integer


Scoreboard Example: Cycle 5


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5

Busy

Op

dest
Fi

S1
Fj

Yes
No
No
No
No

Load

F2

F2

F4

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

Register result status:


F0
Clock
5

FU

S2
Fk

FU
Qj

FU
Qk

R3

F6

F8

Yes

F10 F12

...

F30

Integer


Scoreboard Example: Cycle 6


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6

2
6

Busy

Op

dest
Fi

S1
Fj

S2
Fk

FU
Qj

Yes
Yes
No
No
No

Load
Mult

F2
F0

F2

R3
F4

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

Register result status:


F0
Clock

FU
Qk

Fj?
Rj

Fk?
Rk

Integer

No

Yes
Yes

F10 F12

...

F30

Fj?
Rj

Fk?
Rk

No

No
Yes

Yes

No

...

F30

FU Mult1 Integer


Scoreboard Example: Cycle 7


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7

2
6

3
7

Busy

Op

dest
Fi

S1
Fj

S2
Fk

FU
Qj

Yes
Yes
No
Yes
No

Load
Mult

F2
F0

F2

R3
F4

Integer

Sub

F8

F6

F2

F2

F4

F6

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

Register result status:


F0
Clock
7

FU Mult1 Integer

F8

FU
Qk

Integer

F10 F12

Add

Read multiply operands?


Scoreboard Example: Cycle 8a (First half of clock cycle)


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8

2
6

3
7

Busy

Op

dest
Fi

S1
Fj

S2
Fk

FU
Qj

Yes
Yes
No
Yes
Yes

Load
Mult

F2
F0

F2

R3
F4

Integer

Sub
Div

F8
F10

F6
F0

F2
F6

F2

F4

F6

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

Register result status:


F0
Clock

FU Mult1 Integer

F8

FU
Qk

Fj?
Rj

Fk?
Rk

No

No
Yes

Mult1

Yes
No

No
Yes

F10 F12

...

F30

Integer

Add Divide


Scoreboard Example: Cycle 8b (Second half of clock cycle)


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8

2
6

3
7

4
8

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Sub
Div

F8
F10

F6
F0

F2
F6

F2

F4

F6

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock
8

FU Mult1

F8

FU
Qj

FU
Qk

Fj?
Rj

Fk?
Rk

Yes

Yes

Mult1

Yes
No

Yes
Yes

F10 F12

...

F30

Add Divide


Scoreboard Example: Cycle 9

Instruction status:
Instruction  j     k     Issue  Read  Exec  Write
                                Oper  Comp  Result
LD     F6    34+   R2    1      2     3     4
LD     F2    45+   R3    5      6     7     8
MULTD  F0    F2    F4    6      9
SUBD   F8    F6    F2    7      9
DIVD   F10   F0    F6    8
ADDD   F6    F8    F2

Functional unit status (Note: Time = cycles remaining):
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
10    Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock       F0       F2       F4       F6       F8       F10      F12      ...  F30
9      FU   Mult1                               Add      Divide

Read operands for MULT & SUB? Issue ADDD?

Scoreboard Example: Cycle 10


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8

2
6
9
9

3
7

4
8

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Sub
Div

F8
F10

F6
F0

F2
F6

F2

F4

F6

Functional unit status:


Time Name
Integer
9 Mult1
Mult2
1 Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock
10

FU Mult1

F8

FU
Qj

FU
Qk

Fj?
Rj

Fk?
Rk

No

No

Mult1

No
No

No
Yes

F10 F12

...

F30

Add Divide


Scoreboard Example: Cycle 11


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8

2
6
9
9

11

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Sub
Div

F8
F10

F6
F0

F2
F6

F2

F4

F6

Functional unit status:


Time Name
Integer
8 Mult1
Mult2
0 Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock

3
7

4
8

FU Mult1

11

F8

FU
Qj

FU
Qk

Fj?
Rj

Fk?
Rk

No

No

Mult1

No
No

No
Yes

F10 F12

...

F30

Fj?
Rj

Fk?
Rk

No

No

Mult1

No

Yes

F10 F12

...

F30

Add Divide


Scoreboard Example: Cycle 12


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8

2
6
9
9

3
7

4
8

11

12

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Div

F10

F0

F6

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
7 Mult1
Mult2
Add
Divide

No
Yes
No
No
Yes

Register result status:


F0
Clock
12

FU Mult1

FU
Qj

FU
Qk

Divide

Read operands for DIVD?


Scoreboard Example: Cycle 13


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9

3
7

4
8

11

12

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Add
Div

F6
F10

F8
F0

F2
F6

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
6 Mult1
Mult2
Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock
FU Mult1

13

Add

FU
Qj

FU
Qk

Fj?
Rj

Fk?
Rk

No

No

Mult1

Yes
No

Yes
Yes

F10 F12

...

F30

Fj?
Rj

Fk?
Rk

No

No

Mult1

Yes
No

Yes
Yes

F10 F12

...

F30

Divide


Scoreboard Example: Cycle 14


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9

3
7

4
8

11

12

14

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Add
Div

F6
F10

F8
F0

F2
F6

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
5 Mult1
Mult2
2 Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock
14

FU Mult1

Add

FU
Qj

FU
Qk

Divide


Scoreboard Example: Cycle 15


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9

3
7

4
8

11

12

14

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Add
Div

F6
F10

F8
F0

F2
F6

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
4 Mult1
Mult2
1 Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock
FU Mult1

15

Add

FU
Qj

FU
Qk

Fj?
Rj

Fk?
Rk

No

No

Mult1

No
No

No
Yes

F10 F12

...

F30

Fj?
Rj

Fk?
Rk

No

No

Mult1

No
No

No
Yes

F10 F12

...

F30

Divide


Scoreboard Example: Cycle 16


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9

3
7

4
8

11

12

14

16

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Add
Div

F6
F10

F8
F0

F2
F6

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
3 Mult1
Mult2
0 Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock
16

FU Mult1

Add

FU
Qj

FU
Qk

Divide


Scoreboard Example: Cycle 17

Instruction status:
Instruction  j     k     Issue  Read  Exec  Write
                                Oper  Comp  Result
LD     F6    34+   R2    1      2     3     4
LD     F2    45+   R3    5      6     7     8
MULTD  F0    F2    F4    6      9
SUBD   F8    F6    F2    7      9     11    12
DIVD   F10   F0    F6    8
ADDD   F6    F8    F2    13     14    16

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
2     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     No   No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock       F0       F2       F4       F6       F8       F10      F12      ...  F30
17     FU   Mult1                      Add               Divide

WAR Hazard!
Why not write result of ADD???

Scoreboard Example: Cycle 18


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9

3
7

4
8

11

12

14

16

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Add
Div

F6
F10

F8
F0

F2
F6

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
1 Mult1
Mult2
Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock
18

FU Mult1

Add

FU
Qj

FU
Qk

Divide


Scoreboard Example: Cycle 19


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9

3
7
19
11

14

16

Busy

Op

dest
Fi

S1
Fj

S2
Fk

Mult

F0

F2

F4

Add
Div

F6
F10

F8
F0

F2
F6

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
0 Mult1
Mult2
Add
Divide

No
Yes
No
Yes
Yes

Register result status:


F0
Clock
FU Mult1

19

4
8
12

Add

FU
Qj

FU
Qk

Fj?
Rj

Fk?
Rk

No

No

Mult1

No
No

No
Yes

F10 F12

...

F30

Fj?
Rj

Fk?
Rk

No
Yes

No
Yes

...

F30

Divide


Scoreboard Example: Cycle 20


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9

3
7
19
11

14

16

Busy

Op

dest
Fi

S1
Fj

S2
Fk

No
No
No
Yes
Yes

Add
Div

F6
F10

F8
F0

F2
F6

Register result status:


F0
Clock

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

20

FU

4
8
20
12

Add

FU
Qj

FU
Qk

F10 F12
Divide


Scoreboard Example: Cycle 21


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9
21
14

16

Busy

Op

dest
Fi

S1
Fj

S2
Fk

No
No
No
Yes
Yes

Add
Div

F6
F10

F8
F0

F2
F6

Register result status:


F0
Clock

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

3
7
19
11

FU

21

4
8
20
12

Add

FU
Qj

FU
Qk

F10 F12

Fj?
Rj

Fk?
Rk

No
Yes

No
Yes

...

F30

Fj?
Rj

Fk?
Rk

No

No

...

F30

Divide

WAR Hazard is now gone...

Scoreboard Example: Cycle 22


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9
21
14

3
7
19
11

4
8
20
12

16

22

Busy

Op

dest
Fi

S1
Fj

S2
Fk

No
No
No
No
Yes

Div

F10

F0

F6

Register result status:


F0
Clock

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
39 Divide

22

FU

FU
Qj

FU
Qk

F10 F12
Divide


Faster than light computation
(skip a couple of cycles)

Scoreboard Example: Cycle 61


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9
21
14

3
7
19
11
61
16

22

Busy

Op

dest
Fi

S1
Fj

S2
Fk

No
No
No
No
Yes

Div

F10

F0

F6

Register result status:


F0
Clock

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
0 Divide

61

4
8
20
12

FU

FU
Qj

FU
Qk

F10 F12

Fj?
Rj

Fk?
Rk

No

No

...

F30

Divide


Scoreboard Example: Cycle 62


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Read Exec Write


Issue Oper Comp Result
1
5
6
7
8
13

2
6
9
9
21
14

3
7
19
11
61
16

4
8
20
12
62
22

Busy

Op

dest
Fi

S1
Fj

S2
Fk

F2

F4

F6

F8

Functional unit status:


Time Name
Integer
Mult1
Mult2
Add
Divide

FU
Qj

FU
Qk

Fj?
Rj

Fk?
Rk

...

F30

No
No
No
No
No

Register result status:


F0
Clock

F10 F12

FU

62


Review: Scoreboard Example: Cycle 62

Instruction status:
Instruction  j     k     Issue  Read  Exec  Write
                                Oper  Comp  Result
LD     F6    34+   R2    1      2     3     4
LD     F2    45+   R3    5      6     7     8
MULTD  F0    F2    F4    6      9     19    20
SUBD   F8    F6    F2    7      9     11    12
DIVD   F10   F0    F6    8      21    61    62
ADDD   F6    F8    F2    13     14    16    22

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock       F0       F2       F4       F6       F8       F10      F12      ...  F30
62     FU

In-order issue; out-of-order execute & commit

Scoreboarding Review

Cycle              1    2    3    4    5    6    7    8    9    10   11   12   13
LD    F6, 34(R2)   Iss  Rd   Ex   Wb
LD    F2, 45(R3)                       Iss  Rd   Ex   Wb
MULTD F0, F2, F4                            Iss  Iss  Iss  Rd   M1   M2   M3   M4
SUBD  F8, F6, F2                                 Iss  Iss  Rd   A1   A2   Wb
DIVD  F10, F0, F6                                     Iss  Iss  Iss  Iss  Iss  Iss
ADDD  F6, F8, F2                                                               Iss

Scoreboarding Review

Cycle              11   12   13   14   15   16   17   18   19   20   21   22   ...  62
LD    F6, 34(R2)
LD    F2, 45(R3)
MULTD F0, F2, F4   M2   M3   M4   M5   M6   M7   M8   M9   M10  Wb
SUBD  F8, F6, F2   A2   Wb
DIVD  F10, F0, F6  Iss  Iss  Iss  Iss  Iss  Iss  Iss  Iss  Iss  Iss  Rd   D1   ...  Wb
ADDD  F6, F8, F2             Iss  Rd   A1   A2   A2   A2   A2   A2   A2   Wb

Scoreboarding Limitations
Number and type of functional units
Number of instruction buffer entries (scoreboard size)
Amount of application ILP (RAW hazards)
Presence of antidependencies (WAR) and output dependencies (WAW)
In-order issue for WAW/structural hazards limits the scheduler
WAR stalls are critical for loops (hardware loop unrolling)

Tomasulo's Approach
Used in IBM 360/91 machines (late 1960s)
Similar to scoreboarding, but adds renaming
Key concept: Reservation Stations
Very important topic
Scheduling ideas led to Alpha 21264, HP PA-8000, MIPS R10K, Pentium III, Pentium 4, PowerPC 604, etc.

Reservation Stations (RS)
Distributed (rather than centralized) control scheme
Bypassing is allowed via the Common Data Bus (CDB) to the RS
Register renaming eliminates WAR/WAW hazards
Scoreboard/Instruction Buffer => Reservation Stations
Fetch and buffer operands as soon as they are available
Eliminates the need to always get values from registers at execute
Pending instructions designate the reservation stations that will provide their inputs
Successive writes to a register cause only the last one to update the register

Register Renaming
Compiler can eliminate some WAW/WAR false hazards, but not all
Not enough registers
Hazards across branches (common!): the compiler can eliminate them on the taken path or the fall-through path, but not both
Hazards with itself: dynamic loops (example later)
Example (spill code causing false hazards)
C=A+B    ADD R3, R1, R2
         SW  R3, 0(R4)
D=A-B    SUB R3, R1, R2
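
To make the false hazards in the spill-code example concrete, the following Python sketch classifies the dependences between pairs of instructions. The `(dest, sources)` tuple encoding and the function name `hazards` are assumptions for this illustration only.

```python
def hazards(earlier, later):
    """Classify hazards between two instructions given in program order.

    Each instruction is (dest, sources); dest is None when nothing is written
    to a register (for example a store).
    """
    d1, s1 = earlier
    d2, s2 = later
    found = set()
    if d1 is not None and d1 in s2:
        found.add("RAW")   # true dependence: later reads what earlier writes
    if d2 is not None and d2 in s1:
        found.add("WAR")   # false: later overwrites a register earlier still reads
    if d1 is not None and d1 == d2:
        found.add("WAW")   # false: both write the same register
    return found


add = ("R3", ("R1", "R2"))   # ADD R3, R1, R2
sw = (None, ("R3", "R4"))    # SW  R3, 0(R4)
sub = ("R3", ("R1", "R2"))   # SUB R3, R1, R2

print(hazards(add, sw), hazards(sw, sub), hazards(add, sub))
```

Only the ADD-to-SW dependence is a true (RAW) one; the other two exist solely because R3 is reused.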

Register Renaming
Dynamically change register names to eliminate false dependencies (WAR/WAW hazards)
Architectural registers: names, not locations
Many more locations (reservation stations or physical registers) than names (logical or architectural registers)
Dynamically map names to locations

Register Renaming Example
Assume temporary registers S and T

Original              Renamed
DIV F0, F2, F4        DIV F0, F2, F4
ADD F6, F0, F8        ADD S, F0, F8
SW  F6, 0(R1)         SW  S, 0(R1)
SUB F8, F10, F14      SUB T, F10, F14
MUL F6, F10, F8       MUL F6, F10, T
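
A map table gives one simple way to automate this kind of renaming. The Python sketch below is an assumption-laden illustration: unlike the slide, which renames only the conflicting registers to S and T, it gives every destination a fresh hypothetical name (`P0`, `P1`, ...), and the tuple encoding of instructions is invented for the example.

```python
from itertools import count


def rename(program):
    """Rename destinations so no two instructions write the same name.

    program: list of (op, dest, sources); dest is None for stores.
    """
    fresh = (f"P{i}" for i in count())   # unlimited supply of new names
    mapping = {}                         # architectural name -> latest new name
    renamed = []
    for op, dest, sources in program:
        # Sources are looked up first so they see the previous producer.
        new_sources = tuple(mapping.get(r, r) for r in sources)
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


program = [
    ("DIV", "F0", ("F2", "F4")),
    ("ADD", "F6", ("F0", "F8")),
    ("SW", None, ("F6", "R1")),
    ("SUB", "F8", ("F10", "F14")),
    ("MUL", "F6", ("F10", "F8")),
]
renamed = rename(program)
for instr in renamed:
    print(instr)
```

After renaming, the only remaining dependences are the true ones (DIV to ADD, ADD to SW, SUB to MUL).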

Register Renaming with Tomasulo
At instruction issue:
Register specifiers for source operands are renamed to the names of the reservation stations
Values can exist in a reservation station or in the register file
To eliminate WAR hazards, register file values are copied to reservation stations at issue
Other methods use pointer-based renaming (map table), for example
Technique used in Pentium III, PowerPC 604

Reservation Station Components
Op: Operation to perform in the unit
Qj, Qk: Reservation stations producing the source registers (value to be written)
Note: No ready flags needed as in the scoreboard
Qj, Qk = 0 => ready
Store buffers only have Qi for the RS producing the result
Vj, Vk: Values of the source operands
Store buffers have a V field: the result to be stored
Busy: Indicates that the reservation station or FU is occupied
Register Result Status: Indicates which functional unit will write each register, if one exists. Blank when there are no pending instructions that will write that register.
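
The fields listed above map naturally onto a small record type. The following Python sketch is illustrative only: the class name, the `ready` method, and the use of `None` in place of "Qj, Qk = 0" are assumptions, and the operand values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, once captured
    vk: Optional[float] = None
    qj: Optional[str] = None     # producing stations; None plays the role of 0
    qk: Optional[str] = None

    def ready(self) -> bool:
        """An occupied station may execute once neither operand is pending."""
        return self.busy and self.qj is None and self.qk is None


# MULTD F0, F2, F4 waiting on Load2 for F2, with F4 already captured.
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk=2.0, qj="Load2")
waiting = not mult1.ready()

# When Load2's result arrives, the value is stored and the tag is cleared.
mult1.vj, mult1.qj = 3.0, None
now_ready = mult1.ready()
print(waiting, now_ready)
```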

Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
If a reservation station is free (no structural hazard), control issues the instruction and sends operands (renames registers).
2. Execution: operate on operands (EX)
When both operands are ready, execute; if not ready, watch the Common Data Bus for the result
3. Write result: finish execution (WB)
Write on the Common Data Bus to all awaiting units; mark the reservation station available
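
The issue stage described above can be sketched as follows. This is a simplified model under assumptions: stations are plain dictionaries, `issue` and `new_station` are hypothetical helper names, the caller passes only stations of the right functional unit type, and the register values are invented.

```python
def new_station():
    return {"busy": False, "op": None, "vj": None, "vk": None, "qj": None, "qk": None}


def issue(stations, reg_status, reg_file, op, dest, src1, src2):
    """Issue into the first free station; return its name, or None on a structural hazard.

    stations:   dict name -> station dict (only stations of the right FU type)
    reg_status: dict register -> name of the station that will write it
    reg_file:   dict register -> current value
    """
    free = [name for name, rs in stations.items() if not rs["busy"]]
    if not free:
        return None                      # no reservation station free: stall issue
    name = free[0]
    rs = stations[name]
    rs.update(busy=True, op=op, vj=None, vk=None, qj=None, qk=None)
    for src, v, q in ((src1, "vj", "qj"), (src2, "vk", "qk")):
        if src in reg_status:
            rs[q] = reg_status[src]      # rename: wait for the producing station
        else:
            rs[v] = reg_file[src]        # value available: copy it now
    reg_status[dest] = name              # later readers of dest wait on this station
    return name


adders = {"Add1": new_station(), "Add2": new_station()}
reg_status = {"F2": "Load2"}
reg_file = {"F2": 0.0, "F6": 1.5, "F8": 0.0}   # values are made up

first = issue(adders, reg_status, reg_file, "SUBD", "F8", "F6", "F2")
second = issue(adders, reg_status, reg_file, "ADDD", "F6", "F8", "F2")
third = issue(adders, reg_status, reg_file, "ADDD", "F8", "F6", "F2")
print(first, second, third)
```

Copying the value of F6 into Add1 at issue is what lets the later ADDD claim F6 without a WAR stall.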

Data Buses in Tomasulo Algorithm
Normal data bus: data + destination ("go to" bus)
Common data bus: data + source ("come from" bus)
64 bits of data + 4 bits of Functional Unit source address
Write if it matches the expected Functional Unit (the one that produces the result)
Does the broadcast
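
The "come from" behaviour of the common data bus can be illustrated with a short Python sketch. Everything here is an assumption for the sake of the example: the `broadcast` function, the dictionary-based stations, and the numeric values are not from the lecture.

```python
def broadcast(stations, reg_status, reg_file, tag, value):
    """Put (tag, value) on the CDB: every waiting consumer captures the value."""
    for rs in stations.values():
        if rs["qj"] == tag:
            rs["vj"], rs["qj"] = value, None
        if rs["qk"] == tag:
            rs["vk"], rs["qk"] = value, None
    # A register is updated only if this tag is still its latest producer.
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        reg_file[reg] = value
        del reg_status[reg]


stations = {
    "Add1": {"vj": 1.5, "vk": None, "qj": None, "qk": "Load2"},
    "Mult1": {"vj": None, "vk": 2.0, "qj": "Load2", "qk": None},
}
reg_status = {"F0": "Mult1", "F2": "Load2", "F8": "Add1"}
reg_file = {"F2": 0.0}

broadcast(stations, reg_status, reg_file, "Load2", 3.0)
print(stations["Add1"], stations["Mult1"], reg_file)
```

Each consumer matches on the source tag rather than being addressed by the producer, so one broadcast can satisfy any number of waiting stations.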

Tomasulo Organization
[Figure: block diagram showing the FP Op Queue, FP Registers, Load Buffers (Load1-Load6, from memory), Store Buffers (to memory), Reservation Stations (Add1-Add3 for the FP adders, Mult1-Mult2 for the FP multipliers), and the Common Data Bus (CDB)]

Tomasulo Example

Instruction status:
Instruction  j     k     Issue  Exec  Write
                                Comp  Result
LD     F6    34+   R2
LD     F2    45+   R3
MULTD  F0    F2    F4
SUBD   F8    F6    F2
DIVD   F10   F0    F6
ADDD   F6    F8    F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock       F0       F2       F4       F6       F8       F10      F12      ...  F30
0      FU

Tomasulo Example Cycle 1


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result

Reservation Stations:
Time Name Busy
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No

Register result status:


Clock

Load1
Load2
Load3

Op

S1
Vj

S2
Vk

F0

F2

F4

FU

Busy Address

RS
Qj

RS
Qk

F6

F8

Yes
No
No

34+R2

F10

F12

...

F30

...

F30

Load1


Tomasulo Example Cycle 2


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result

Reservation Stations:
Time Name Busy
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No

Register result status:


Clock
2

FU

Busy Address

1
2

Op

F0

Load1
Load2
Load3

S1
Vj

S2
Vk

F2

F4

Load2

RS
Qj

RS
Qk

F6

F8

Yes
Yes
No

34+R2
45+R3

F10

F12

Load1

Note: Unlike 6600, can have multiple loads outstanding



Tomasulo Example Cycle 3


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3

Reservation Stations:
Time Name Busy Op
Add1 No
Add2 No
Add3 No
Mult1 Yes MULTD
Mult2 No

Register result status:


Clock

F0

FU

Busy Address

S1
Vj

Load1
Load2
Load3

S2
Vk

RS
Qj

Yes
Yes
No

34+R2
45+R3

F10

F12

RS
Qk

R(F4) Load2

F2

F4

Mult1 Load2

F6

F8

...

F30

Load1

Note: register names are removed (renamed) in Reservation Stations; MULT issued vs. scoreboard
Load1 completing; what is waiting for Load1?

Tomasulo Example Cycle 4

Instruction status:
Instruction  j     k     Issue  Exec  Write
                                Comp  Result
LD     F6    34+   R2    1      3     4
LD     F2    45+   R3    2      4
MULTD  F0    F2    F4    3
SUBD   F8    F6    F2    4
DIVD   F10   F0    F6
ADDD   F6    F8    F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Reservation Stations:
                          S1       S2       RS     RS
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   Yes   SUBD   M(A1)                    Load2
      Add2   No
      Add3   No
      Mult1  Yes   MULTD           R(F4)    Load2
      Mult2  No

Register result status:
Clock       F0       F2       F4       F6       F8       F10      F12      ...  F30
4      FU   Mult1    Load2             M(A1)    Add1

Load2 completing; what is waiting for Load2?

Tomasulo Example Cycle 5


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5

Reservation Stations:

3
4

4
5

S1

S2

Busy Address
Load1
Load2
Load3

RS

RS
Qk

Vj
Vk
Qj
Time Name Busy Op
2 Add1 Yes SUBD M(A1) M(A2)
Add2 No
Add3 No
10 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock

F0

FU

F2

F4

Mult1 M(A2)

No
No
No

F6

F8

F10

F12

...

F30

...

F30

M(A1) Add1 Mult2


Tomasulo Example Cycle 6


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4

4
5

S1

S2

Busy Address
Load1
Load2
Load3

RS

Vj
Vk
Qj
Time Name Busy Op
1 Add1 Yes SUBD M(A1) M(A2)
Add2 Yes ADDD
M(A2) Add1
Add3 No
9 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock
6

FU

F0

F2

Mult1 M(A2)

F4

F6
Add2

No
No
No

RS
Qk

F8

F10

F12

Add1 Mult2

Issue ADDD here vs. scoreboard?



Tomasulo Example Cycle 7


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4

Busy Address

4
5

Load1
Load2
Load3

S1

S2

RS

Vj
Vk
Qj
Time Name Busy Op
0 Add1 Yes SUBD M(A1) M(A2)
Add2 Yes ADDD
M(A2) Add1
Add3 No
8 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock

F0

FU

No
No
No

F2

F4

Mult1 M(A2)

F6
Add2

RS
Qk

F8

F10

F12

...

F30

...

F30

Add1 Mult2

Add1 completing; what is waiting for it?



Tomasulo Example Cycle 8


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4

4
5

S1

S2

Busy Address
Load1
Load2
Load3

RS

Vj
Vk
Qj
Time Name Busy Op
Add1 No
2 Add2 Yes ADDD (M-M) M(A2)
Add3 No
7 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock
8

FU

F0

F2

Mult1 M(A2)

F4

F6

No
No
No

RS
Qk

F8

F10

F12

Add2 (M-M) Mult2


Tomasulo Example Cycle 9


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4

4
5

S1

S2

Busy Address
Load1
Load2
Load3

RS

Vj
Vk
Qj
Time Name Busy Op
Add1 No
1 Add2 Yes ADDD (M-M) M(A2)
Add3 No
6 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock

F0

FU

F2

F4

Mult1 M(A2)

F6

No
No
No

RS
Qk

F8

F10

F12

...

F30

...

F30

Add2 (M-M) Mult2


Tomasulo Example Cycle 10


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4

4
5

Busy Address
Load1
Load2
Load3

10

S1

S2

RS

Vj
Vk
Qj
Time Name Busy Op
Add1 No
0 Add2 Yes ADDD (M-M) M(A2)
Add3 No
5 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock
10

FU

F0

No
No
No

F2

Mult1 M(A2)

F4

F6

RS
Qk

F8

F10

F12

Add2 (M-M) Mult2

Add2 completing; what is waiting for it?



Tomasulo Example Cycle 11


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4

4
5

10

11

S1

S2

Busy Address
Load1
Load2
Load3

RS

Vj
Vk
Qj
Time Name Busy Op
Add1 No
Add2 No
Add3 No
4 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock

F0

FU

11

F2

F4

F6

No
No
No

RS
Qk

F8

F10

F12

...

F30

...

F30

(M-M+M(M-M) Mult2

Mult1 M(A2)

Write result of ADDD here vs. scoreboard?
All quick instructions complete in this cycle!

Tomasulo Example Cycle 12


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4

4
5

10

11

S1

S2

Busy Address
Load1
Load2
Load3

RS

Vj
Vk
Qj
Time Name Busy Op
Add1 No
Add2 No
Add3 No
3 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock
12

FU

F0

F2

Mult1 M(A2)

F4

F6

No
No
No

RS
Qk

F8

F10

F12

(M-M+M(M-M) Mult2


Tomasulo Example Cycle 13


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4

4
5

10

11

S1

S2

Busy Address
Load1
Load2
Load3

RS

Vj
Vk
Qj
Time Name Busy Op
Add1 No
Add2 No
Add3 No
2 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD
M(A1) Mult1

Register result status:


Clock

F0

FU

13

F2

F4

F6

No
No
No

RS
Qk

F8

F10

F12

...

F30

...

F30

(M-M+M(M-M) Mult2

Mult1 M(A2)


Tomasulo Example Cycle 14

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD    F6     34+  R2   1      3          4
LD    F2     45+  R3   2      4          5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7          8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10         11

       Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
1     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
14     FU  Mult1  M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example Cycle 15

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD    F6     34+  R2   1      3          4
LD    F2     45+  R3   2      4          5
MULTD F0     F2   F4   3      15
SUBD  F8     F6   F2   4      7          8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10         11

       Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
0     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
15     FU  Mult1  M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example Cycle 16

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD    F6     34+  R2   1      3          4
LD    F2     45+  R3   2      4          5
MULTD F0     F2   F4   3      15         16
SUBD  F8     F6   F2   4      7          8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10         11

       Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
40    Mult2  Yes   DIVD   M*F4   M(A1)

Register result status:
Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
16     FU  M*F4   M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example Cycle 55

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD    F6     34+  R2   1      3          4
LD    F2     45+  R3   2      4          5
MULTD F0     F2   F4   3      15         16
SUBD  F8     F6   F2   4      7          8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10         11

       Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
1     Mult2  Yes   DIVD   M*F4   M(A1)

Register result status:
Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
55     FU  M*F4   M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example Cycle 56

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD    F6     34+  R2   1      3          4
LD    F2     45+  R3   2      4          5
MULTD F0     F2   F4   3      15         16
SUBD  F8     F6   F2   4      7          8
DIVD  F10    F0   F6   5      56
ADDD  F6     F8   F2   6      10         11

       Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
0     Mult2  Yes   DIVD   M*F4   M(A1)

Register result status:
Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
56     FU  M*F4   M(A2)      (M-M+M)  (M-M)  Mult2

Mult2 is completing; what is waiting for it?

Tomasulo Example Cycle 57


Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6

j
34+
45+
F2
F6
F0
F8

k
R2
R3
F4
F2
F6
F2

Exec Write
Issue Comp Result
1
2
3
4
5
6

Reservation Stations:

3
4
15
7
56
10

4
5
16
8
57
11

S1

S2

Vj
Vk
Time Name Busy Op
Add1 No
Add2 No
Add3 No
Mult1 No
0 Mult2 Yes DIVD M*F4 M(A1)

Register result status:


Clock
FU

57

F0

F2

M*F4 M(A2)

Busy Address
Load1
Load2
Load3

F4

RS
Qj

RS
Qk

F6

F8

No
No
No

F10

F12

...

F30

(M-M+M(M-M) Mult2

Once again: In-order issue, out-of-order execution and completion.

Compare to Scoreboard Cycle 62

Instruction status:
                       Scoreboard                                   Tomasulo
Instruction  j    k    Issue  Read Oper  Exec Comp  Write Result    Issue  Exec Comp  Write Result
LD    F6     34+  R2   1      2          3          4               1      3          4
LD    F2     45+  R3   5      6          7          8               2      4          5
MULTD F0     F2   F4   6      9          19         20              3      15         16
SUBD  F8     F6   F2   7      9          11         12              4      7          8
DIVD  F10    F0   F6   8      21         61         62              5      56         57
ADDD  F6     F8   F2   13     14         16         22              6      10         11

Why does it take longer on the scoreboard (CDC 6600)?

Structural hazards
Lack of forwarding
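
The gap between the two schedules can be tallied directly from the Write Result columns of the comparison. A minimal sketch; the variable names are ours, and the cycle numbers are the ones shown on this slide:

```python
# Write Result cycle of each instruction, copied from the comparison slide.
instructions = ["LD F6", "LD F2", "MULTD F0", "SUBD F8", "DIVD F10", "ADDD F6"]
scoreboard_write = [4, 8, 20, 12, 62, 22]
tomasulo_write = [4, 5, 16, 8, 57, 11]

# Cycles by which Tomasulo writes each result earlier than the scoreboard.
cycles_saved = {
    name: sb - tom
    for name, sb, tom in zip(instructions, scoreboard_write, tomasulo_write)
}
for name, saved in cycles_saved.items():
    print(f"{name:9s} writes {saved:2d} cycles earlier under Tomasulo")

# Last write-back of the whole sequence under each scheme.
print("last write:", max(scoreboard_write), "vs.", max(tomasulo_write))
```

ADDD gains the most. On the scoreboard its write is held until cycle 22, after DIVD has read F6 in cycle 21 (a WAR hazard), while under Tomasulo DIVD already holds its own copy of the operand, so ADDD can write in cycle 11.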


Tomasulo Review

[Timing diagram, cycles 1-13. Per-cycle stage labels shown for each instruction: Iss, Ex, Wb, M1-M8 (multiply), A1-A2 (add/subtract).]

LD    F6, 34(R2)
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Tomasulo Review

[Timing diagram, cycles 11-22 and 57. Per-cycle stage labels shown: Iss, M6-M10 (multiply), D1-D6 (divide), Wb.]

LD    F6, 34(R2)
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

How can Tomasulo overlap iterations of loops?

Register renaming
Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
Replace static register names from code with dynamic register locations
Increases effective size of register file
Permit instruction issue to advance past integer control flow operations.
Crucial: integer unit must get ahead of floating point unit so that we can issue multiple iterations
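
One way to picture "different physical destinations" is to model issue as a table lookup: each destination register is renamed to the tag of the reservation station or load buffer that will produce it. The sketch below is a simplified illustration, not the hardware's actual logic; the function `issue`, the table `reg_status`, and the tuple encoding of operands are hypothetical names chosen for this example.

```python
# Hypothetical sketch of Tomasulo-style renaming at issue time.
def issue(op, dest, srcs, station, reg_status):
    """Issue one instruction into the reservation station named `station`.

    Each source becomes ("tag", producer) if some station will still write
    that register, or ("value", reg) if the register file already holds it.
    """
    operands = []
    for reg in srcs:
        if reg in reg_status:
            operands.append(("tag", reg_status[reg]))
        else:
            operands.append(("value", reg))
    if dest is not None:
        # Rename: from now on, `dest` means "the result of `station`".
        reg_status[dest] = station
    return {"station": station, "op": op, "operands": operands}

reg_status = {}  # register name -> tag of the station that will write it
trace = [
    issue("LD", "F0", [], "Load1", reg_status),               # iteration 1
    issue("MULTD", "F4", ["F0", "F2"], "Mult1", reg_status),
    issue("SD", None, ["F4"], "Store1", reg_status),
    issue("LD", "F0", [], "Load2", reg_status),               # iteration 2
    issue("MULTD", "F4", ["F0", "F2"], "Mult2", reg_status),
    issue("SD", None, ["F4"], "Store2", reg_status),
]
for entry in trace:
    print(entry["station"], entry["op"], entry["operands"])
```

Both iterations name F0 and F4, yet the first MULTD waits on Load1 and the second on Load2, so the two iterations never contend for the same storage.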

Tomasulo Loop Example

Loop: LD    F0, 0(R1)
      MULTD F4, F0, F2
      SD    F4, 0(R1)
      SUBI  R1, R1, #8
      BNEZ  R1, Loop

Multiply takes 4 clocks
Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)
Will show clocks for SUBI, BNEZ
Reality: integer instructions run ahead

Loop Example
Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Op

Vj

Reservation Stations:
Time

Name Busy
Add1
No
Add2
No
Add3
No
Mult1 No
Mult2 No

Exec Write
Issue CompResult

S1
Vk

S2
Qj

RS
Qk

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

No
No
No
No
No
No

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Fu

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
0

F0

R1
80

F2

F4

F6

F8

F10 F12

Fu

Loop Example Cycle 1


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Op

Vj

Reservation Stations:
Time

Name Busy
Add1
No
Add2
No
Add3
No
Mult1 No
Mult2 No

Exec Write
Issue CompResult
1

S1
Vk

S2
Qj

RS
Qk

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

Yes
No
No
No
No
No

80

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
1

R1
80

F0

F2

F4

F6

F8

F10 F12

Fu Load1

Loop Example Cycle 2


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Reservation Stations:
Time

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

Vj

Exec Write
Issue CompResult
1
2

S1
Vk

S2
Qj

RS
Qk

R(F2) Load1

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

Yes
No
No
No
No
No

80

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
2

F0

R1
80

F2

Fu Load1

F4

F6

F8

F10 F12

Mult1

Loop Example Cycle 3


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Reservation Stations:
Time

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

Vj

Exec Write
Issue CompResult
1
2
3

S1
Vk

S2
Qj

RS
Qk

R(F2) Load1

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

Yes
No
No
Yes
No
No

80

80

Mult1

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
3

R1
80

F0
Fu Load1

F2

F4

F6

F8

F10 F12

Mult1

Implicit renaming sets up DataFlow graph



What does this mean physically?

[Figure: Tomasulo hardware. Labeled blocks: FP Op Queue, FP Registers, Load Buffers (Load1-Load6, From Mem), Store Buffers (To Mem), Reservation Stations (Add1-Add3, Mult1-Mult2), FP adders, FP multipliers, Common Data Bus (CDB).]

FP Registers:          F0: Load1, F4: Mult1
Load Buffers:          Load1 addr: 80
Store Buffers:         Addr: 80, waiting on Mult1
Reservation Stations:  Mult1: mul, R(F2), waiting on Load1

Loop Example Cycle 4


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Reservation Stations:
Time

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

Vj

Exec Write
Issue CompResult
1
2
3

S1
Vk

S2
Qj

RS
Qk

R(F2) Load1

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

Yes
No
No
Yes
No
No

80

80

Mult1

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
4

R1
80

F0
Fu Load1

F2

F4

F6

F8

F10 F12

Mult1

Dispatching SUBI Instruction


Loop Example Cycle 5


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Reservation Stations:
Time

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

Vj

Exec Write
Issue CompResult
1
2
3

S1
Vk

S2
Qj

RS
Qk

R(F2) Load1

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

Yes
No
No
Yes
No
No

80

80

Mult1

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
5

F0

R1
72

F2

Fu Load1

F4

F6

F8

F10 F12

Mult1

And, BNEZ instruction


Loop Example Cycle 6


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Reservation Stations:
Time

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

Vj

Exec Write
Issue CompResult
1
2
3
6

S1
Vk

S2
Qj

RS
Qk

R(F2) Load1

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

Yes
Yes
No
Yes
No
No

80
72
80

Mult1

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
6

R1
72

F0
Fu Load2

F2

F4

F6

F8

F10 F12

Mult1

Notice that F0 never sees Load from location 80
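
The effect can be reproduced with a toy model of a result broadcast. This is a sketch under simplifying assumptions: the function `broadcast` and the dictionaries below are hypothetical, and the state is roughly the one reached once both iterations have issued (F0 renamed to Load2, F4 to Mult2).

```python
# Hypothetical model of a Common Data Bus broadcast.
def broadcast(tag, value, reg_status, reg_file, stations):
    """Deliver (tag, value) to every consumer still waiting on `tag`."""
    for reg, waiting_on in list(reg_status.items()):
        if waiting_on == tag:          # register still expects this producer
            reg_file[reg] = value
            del reg_status[reg]
    for station in stations.values():
        for slot in ("j", "k"):
            if station["Q" + slot] == tag:   # operand was waiting on this tag
                station["V" + slot] = value
                station["Q" + slot] = None

reg_file = {"F0": "old F0"}
reg_status = {"F0": "Load2", "F4": "Mult2"}
stations = {
    "Mult1": {"Vj": None, "Vk": "R(F2)", "Qj": "Load1", "Qk": None},
    "Mult2": {"Vj": None, "Vk": "R(F2)", "Qj": "Load2", "Qk": None},
}

# Load1 (address 80) finishes: only Mult1 captures the value.
broadcast("Load1", "M[80]", reg_status, reg_file, stations)
f0_after_load1 = reg_file["F0"]

# Load2 (address 72) finishes: Mult2 and register F0 both capture it.
broadcast("Load2", "M[72]", reg_status, reg_file, stations)
print(f0_after_load1, "->", reg_file["F0"])
```

Because F0's status entry already names Load2 when Load1 broadcasts, the value from location 80 goes only to Mult1 and is never written to F0.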



Loop Example Cycle 7


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Reservation Stations:
Time

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 Yes Multd

Vj

Exec Write
Issue CompResult
1
2
3
6
7
S1
Vk

S2
Qj

RS
Qk

R(F2) Load1
R(F2) Load2

Fu

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

Yes
Yes
No
Yes
No
No

80
72
80

Mult1

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
7

F0

R1
72

F2

Fu Load2

F4

F6

F8

F10 F12

Mult2

Register file completely detached from iteration 1



Loop Example Cycle 8


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

1
2
3
6
7
8

Vj

S1
Vk

Reservation Stations:
Time

Exec Write
Issue CompResult

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 Yes Multd

S2
Qj

RS
Qk

R(F2) Load1
R(F2) Load2

Fu

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

Yes
Yes
No
Yes
Yes
No

80
72
80
72

Mult1
Mult2

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
8

R1
72

F0
Fu Load2

F2

F4

F6

F8

F10 F12

Mult2

First and Second iteration completely overlapped



What does this mean physically?

[Figure: Tomasulo hardware. Labeled blocks: FP Op Queue, FP Registers, Load Buffers (Load1-Load6, From Mem), Store Buffers (To Mem), Reservation Stations (Add1-Add3, Mult1-Mult2), FP adders, FP multipliers, Common Data Bus (CDB).]

FP Registers:          F0: Load2, F4: Mult2
Load Buffers:          Load1 addr: 80
                       Load2 addr: 72
Store Buffers:         Addr: 80, waiting on Mult1
                       Addr: 72, waiting on Mult2
Reservation Stations:  Mult1: mul, R(F2), waiting on Load1
                       Mult2: mul, R(F2), waiting on Load2

Loop Example Cycle 9


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

1
2
3
6
7
8

Vj

S1
Vk

S2
Qj

Reservation Stations:
Time

Exec Write
Issue CompResult

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 Yes Multd

RS
Qk

R(F2) Load1
R(F2) Load2

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

Yes
Yes
No
Yes
Yes
No

80
72
80
72

Mult1
Mult2

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
9

R1
72

F0
Fu Load2

F2

F4

F6

F8

F10 F12

Mult2

Load1 completing: who is waiting?
Note: Dispatching SUBI

Loop Example Cycle 10


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Exec Write
Issue CompResult
1
2
3
6
7
8

10

10

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

No
Yes
No
Yes
Yes
No

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

Fu

72
80
72

Mult1
Mult2

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Reservation Stations:
Time

S1
S2
RS
Name Busy Op
Vj
Vk
Qj
Qk
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd
R(F2) Load2

Register result status

Clock
10

F0

R1
64

F2

Fu Load2

F4

F6

F8

F10 F12

Mult2

Load2 completing: who is waiting?
Note: Dispatching BNEZ

Loop Example Cycle 11


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Exec Write
Issue CompResult
1
2
3
6
7
8

Reservation Stations:
Time

3
4

S1
Name Busy Op
Vj
Vk
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd M[72] R(F2)

10

10

11

S2
Qj

RS
Qk

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
Yes
Yes
No

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

64
80
72

Fu

Mult1
Mult2

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
11

R1
64

F0
Fu Load3

F2

F4

F6

F8

F10 F12

Mult2

Next load in sequence



Loop Example Cycle 12


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Exec Write
Issue CompResult
1
2
3
6
7
8

Reservation Stations:
Time

2
3

S1
Name Busy Op
Vj
Vk
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd M[72] R(F2)

10

10

11

S2
Qj

RS
Qk

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
Yes
Yes
No

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

64
80
72

Fu

Mult1
Mult2

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
12

F0

R1
64

F2

Fu Load3

F4

F6

F8

F10 F12

Mult2

Why not issue third multiply?
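
A likely answer, judging from the reservation-station table on this slide: Mult1 and Mult2 are both still busy, so in-order issue stalls on a structural hazard. The check can be sketched as follows; `try_issue` and the `busy` dictionary are hypothetical names used only for illustration.

```python
# Hypothetical issue-stage check for a free reservation station.
def try_issue(op_class, busy):
    """Claim a free station whose name starts with `op_class`, or return None."""
    for name, is_busy in busy.items():
        if name.startswith(op_class) and not is_busy:
            busy[name] = True
            return name
    return None  # structural hazard: issue stalls, and so does everything behind it

# Station occupancy as shown on this slide.
busy = {"Add1": False, "Add2": False, "Add3": False, "Mult1": True, "Mult2": True}
stalled = try_issue("Mult", busy)       # no multiply station is free

busy["Mult1"] = False                   # Mult1 later writes its result and frees up
issued = try_issue("Mult", busy)        # the third multiply can now issue
print(stalled, issued)
```

This matches the later slides, where the third MULTD appears in Mult1 only after the first multiply has written its result.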



Loop Example Cycle 13


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Exec Write
Issue CompResult
1
2
3
6
7
8

Reservation Stations:
Time

1
2

S1
Name Busy Op
Vj
Vk
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd M[72] R(F2)

10

10

11

S2
Qj

RS
Qk

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
Yes
Yes
No

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

64
80
72

Fu

Mult1
Mult2

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
13

R1
64

F0
Fu Load3

F2

F4

F6

F8

F10 F12

Mult2

Loop Example Cycle 14


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Exec Write
Issue CompResult
1
2
3
6
7
8

Reservation Stations:
Time

0
1

S1
Name Busy Op
Vj
Vk
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd M[72] R(F2)

9
14

10

10

11

S2
Qj

RS
Qk

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
Yes
Yes
No

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

64
80
72

Fu

Mult1
Mult2

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
14

F0

R1
64

F2

Fu Load3

F4

F6

F8

F10 F12

Mult2

Mult1 completing. Who is waiting?



Loop Example Cycle 15


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

Exec Write
Issue CompResult
1
2
3
6
7
8

Reservation Stations:
Time

S1
Name Busy Op
Vj
Vk
Add1
No
Add2
No
Add3
No
Mult1 No
Mult2 Yes Multd M[72] R(F2)

9
14

10
15

10
15

11

S2
Qj

RS
Qk

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
Yes
Yes
No

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

64
80
72

Fu

[80]*R2
Mult2

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
15

R1
64

F0
Fu Load3

F2

F4

F6

F8

F10 F12

Mult2

Mult2 completing. Who is waiting?



Loop Example Cycle 16


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

1
2
3
6
7
8

Vj

S1
Vk

Reservation Stations:
Time

Exec Write
Issue CompResult

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

9
14

10
15

10
15

11
16

S2
Qj

RS
Qk

R(F2) Load3

Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
Yes
Yes
No

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

64
80
72

Fu

[80]*R2
[72]*R2

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

Register result status

Clock
16

F0

R1
64

F2

Fu Load3

F4

F6

F8

F10 F12

Mult1

Loop Example Cycle 17


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

1
2
3
6
7
8

Vj

S1
Vk

Reservation Stations:
Time

Exec Write
Issue CompResult

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

9
14

10
15

10
15

11
16

S2
Qj

RS
Qk

R(F2) Load3

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
Yes
Yes
Yes

64
80
72
64

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

[80]*R2
[72]*R2
Mult1

Register result status

Clock
17

R1
64

F0
Fu Load3

F2

F4

F6

F8

F10 F12

Mult1

Loop Example Cycle 18


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

1
2
3
6
7
8

9
14
18
10
15

11
16

Vj

S1
Vk

S2
Qj

RS
Qk

Reservation Stations:
Time

Exec Write
Issue CompResult

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

10
15

R(F2) Load3

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
Yes
Yes
Yes

64
80
72
64

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

[80]*R2
[72]*R2
Mult1

Register result status

Clock
18

F0

R1
64

F2

Fu Load3

F4

F6

F8

F10 F12

Mult1

Loop Example Cycle 19


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

1
2
3
6
7
8

9
14
18
10
15
19

10
15
19
11
16

Vj

S1
Vk

S2
Qj

RS
Qk

Reservation Stations:
Time

Exec Write
Issue CompResult

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

R(F2) Load3

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
No
Yes
Yes

72
64

[72]*R2
Mult1

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

64

Register result status

Clock
19

R1
64

F0
Fu Load3

F2

F4

F6

F8

F10 F12

Mult1

Loop Example Cycle 20


Instruction status:
ITER Instruction
1
1
1
2
2
2

LD
MULTD
SD
LD
MULTD
SD

F0
F4
F4
F0
F4
F4

Exec Write
Issue CompResult

0
F0
0
0
F0
0

R1
F2
R1
R1
F2
R1

1
2
3
6
7
8

9
14
18
10
15
19

10
15
19
11
16
20

Vj

S1
Vk

S2
Qj

RS
Qk

Reservation Stations:
Time

Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No

R(F2) Load3

Busy Addr

Fu

Load1
Load2
Load3
Store1
Store2
Store3

No
No
Yes
No
No
Yes

64

Mult1

Code:
LD
MULTD
SD
SUBI
BNEZ

F0
F4
F4
R1
R1

0
F0
0
R1
Loop

R1
F2
R1
#8

...

F30

64

Register result status

Clock
20

R1
64

F0

F2

Fu Load3

F4

F6

F8

F10 F12

Mult1

Tomasulo Review

Reservation Stations
Distribute RAW hazard detection
Renaming eliminates WAW hazards
Buffering values in Reservation Stations removes WAR hazards
Tag match on the CDB requires many associative compares

Common Data Bus
Achilles' heel of Tomasulo
Multiple writebacks (multiple CDBs) are expensive

Load/Store reordering
Load address compared with store addresses in the store buffer
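
The load/store check in the last point can be sketched in a few lines. This is an illustrative model only: `load_may_bypass` and the dictionary layout of the store buffer are assumptions, and the addresses are borrowed from the loop example.

```python
# Hypothetical check performed before a load is allowed to run ahead of stores.
def load_may_bypass(load_addr, store_buffer):
    """True if no pending store in the buffer targets the load's address."""
    return all(entry["addr"] != load_addr for entry in store_buffer)

# Two stores are still waiting for their multiply results.
store_buffer = [
    {"addr": 80, "waiting_on": "Mult1"},
    {"addr": 72, "waiting_on": "Mult2"},
]

third_load_ok = load_may_bypass(64, store_buffer)   # different address: may proceed
conflicting_ok = load_may_bypass(80, store_buffer)  # same address as a pending store
print(third_load_ok, conflicting_ok)
```

A load that matches a pending store must wait for (or be forwarded) that store's data; otherwise it would read a stale value from memory.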

Tomasulo vs. Scoreboarding

1. No explicit checking for WAW or WAR hazards
2. CDB broadcasts results rather than waiting on registers
3. Loads and stores are treated like basic FUs
4. Distributed vs. centralized control


Register Renaming: Pointer-Based

MIPS R10K, Alpha 21264, Pentium 4, POWER4
Mapper/Map Table: Hardware to hold the mappings from register names to physical locations
Register Writes: Allocate new location, note mapping in table
Register Reads: Look in map table, find location of most recent write
Deallocate mappings when done
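
The four steps above can be sketched with a map table and a free list. This is a minimal illustration, not the design of any of the processors named on the slide; the class `Renamer`, its methods, and the policy of returning the previous mapping to the caller are assumptions made for the example.

```python
from collections import deque

class Renamer:
    """Hypothetical map table plus free list of physical registers."""

    def __init__(self, logical, num_physical):
        physical = [f"P{i}" for i in range(1, num_physical + 1)]
        # Initially each logical register maps to the same-numbered physical one.
        self.map_table = dict(zip(logical, physical))
        self.free_list = deque(physical[len(logical):])

    def rename(self, dest, src1, src2):
        # Register reads: look up the location of the most recent write.
        p1, p2 = self.map_table[src1], self.map_table[src2]
        # Register write: allocate a new location and note the mapping.
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        previous = self.map_table[dest]
        new = self.free_list.popleft()
        self.map_table[dest] = new
        return (new, p1, p2), previous

    def release(self, physical):
        """Deallocate a location once nothing in flight can still read it."""
        self.free_list.append(physical)

renamer = Renamer(["R1", "R2", "R3", "R4"], 8)
renamed, previous = renamer.rename("R1", "R2", "R4")   # ADD R1, R2, R4
renamer.release(previous)                               # old R1 location is reusable
print(renamed, previous, list(renamer.free_list))
```

Deciding when a mapping is "done" is the hard part in real hardware; here the caller simply releases the previous location by hand.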


Register Renaming: Example


Mapper/Map Table: Hardware to hold these mappings
Register Writes: Allocate new location, note mapping in table
Register Reads: Look in map table, find location of most recent write

Deallocate mappings when done

Assume
4 Architected/Logical Registers (R1, R2, R3, R4): names
8 Physical/Rename Registers (P1-P8): locations

Code: lots of potential WAR/WAW hazards, also RAWs

ADD R1, R2, R4
SUB R4, R1, R2
ADD R3, R1, R3
ADD R1, R3, R2


Register Renaming: Example


                  Map Table
                  R1  R2  R3  R4    Renamed code
Initial Mapping   P1  P2  P3  P4
ADD R1, R2, R4    P5  P2  P3  P4    ADD P5, P2, P4
SUB R4, R1, R2    P5  P2  P3  P6    SUB P6, P5, P2
ADD R3, R1, R3    P5  P2  P7  P6    ADD P7, P5, P3
ADD R1, R3, R2    P8  P2  P7  P6    ADD P8, P7, P2
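
The walkthrough on this slide can be replayed mechanically. The sketch below assumes the simplest allocation policy (hand out P5, P6, ... in order, never deallocate); `rename_program` is a hypothetical helper written for this example.

```python
def rename_program(program, logical, num_physical):
    """Rename (op, dest, src1, src2) tuples; return renamed code and map-table snapshots."""
    map_table = {reg: f"P{i + 1}" for i, reg in enumerate(logical)}
    next_free = len(logical) + 1
    renamed, snapshots = [], [dict(map_table)]
    for op, dest, src1, src2 in program:
        # Read the sources before remapping the destination (matters for ADD R3, R1, R3).
        s1, s2 = map_table[src1], map_table[src2]
        if next_free > num_physical:
            raise RuntimeError("out of physical registers")
        map_table[dest] = f"P{next_free}"
        next_free += 1
        renamed.append((op, map_table[dest], s1, s2))
        snapshots.append(dict(map_table))
    return renamed, snapshots

program = [
    ("ADD", "R1", "R2", "R4"),
    ("SUB", "R4", "R1", "R2"),
    ("ADD", "R3", "R1", "R3"),
    ("ADD", "R1", "R3", "R2"),
]
renamed, snapshots = rename_program(program, ["R1", "R2", "R3", "R4"], 8)
for op, dest, s1, s2 in renamed:
    print(f"{op} {dest}, {s1}, {s2}")
```

After renaming, every destination is unique, so only the true (RAW) dependences through P5 and P7 remain.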


For next time


Branch Prediction
Section 3.4/3.5 of H&P
"A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History," Tse-Yu Yeh and Yale Patt, ISCA 1993
