Hardware

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 24

Hardware based Speculation

● Execute instructions along predicted execution paths but


only commit the results if prediction was correct
● Instruction commit: allowing an instruction to update the
register file when instruction is no longer speculative
● Need an additional piece of hardware to prevent any
irrevocable action until an instruction commits
– Reorder Buffer
● In-order commit
● Stores instruction results before instruction commits
● Clear ROB on misprediction
● Exceptions
Tomasulo's Algorithm with Speculation
ROB – Loop Based Example
ROB
Entry Busy Instruction State Destination Value
1 no L.D F0, 0(R1) Commit F0 Mem[0+Regs[R1]]
2 no MUL.D F4, F0, F2 Commit F4 #1 * Regs[F2]
3 yes S.D F4, 0(R1) Write Result 0+Regs[R1] #2
4 yes DADDIU R1, R1, #-8 Write Result R1 Regs[R1]-8
5 yes BNE R1, R2, LOOP Write Result
6 yes L.D F0, 0(R1) Write Result F0 Mem[#4]
7 yes MUL.D F4, F0, F2 Write Result F4 #6 * Regs[F2]
8 yes S.D F4, 0(R1) Write Result 0+Regs[R1] #7
9 yes DADDIU R1, R1, #-8 Write Result R1 #4 - #8
10 yes BNE R1, R2, LOOP Write Result
Multiple Issue and Static Scheduling
To achieve CPI < 1, need to complete multiple
instructions per clock

● Statically scheduled superscalar processors


● VLIW (Very Long Instruction Word) processors
● Dynamically scheduled superscalar processors
Multiple Issue Processors
Dynamic Scheduling + Multiple Issue + Speculation
Limit the number of instructions of a given class that can be
issued in a “bundle”
I.e. on FP, one integer, one load, one store

Examine all the dependencies among the instructions in the


bundle

Also need multiple completion/commit


Dynamic Scheduling + Multiple Issue
2-way Superscalar
Instructions Issues Executes Mem Access Write CDB
at clock at clock at clock at clock
1 LD R2, 0(R1) 1 2 3 4
1 DADDIU R2, R2, #1 1 5 6
1 SD R2, 0(R1) 2 3 7
1 DADDIU R1, R1, #8 2 3 4
1 BNE R2, R3, L 3 7
2 LD R2, 0(R1) 4 8 9 10
2 DADDIU R2, R2, #1 4 11 12
2 SD R2, 0(R1) 5 9 13
2 DADDIU R1, R1, #8 5 8 9
2 BNE R2, R3, L 6 13
3 LD R2, 0(R1) 7 14 15 16
3 DADDIU R2, R2, #1 7 17 18
3 SD R2, 0(R1) 8 15 19
3 DADDIU R1, R1, #8 8 14 15
3 BNE R2, R3, L 9 19
Dynamic Scheduling + Multiple Issue + Speculation
2-way Superscalar
Instructions Issues Executes Mem Access Write CDB Commits at
at clock at clock at clock at clock clock
1 LD R2, 0(R1) 1 2 3 4 5
1 DADDIU R2, R2, #1 1 5 6 7
1 SD R2, 0(R1) 2 3 7
1 DADDIU R1, R1, #8 2 3 4 8
1 BNE R2, R3, L 3 7 8
2 LD R2, 0(R1) 4 5 6 7 9
2 DADDIU R2, R2, #1 4 8 9 10
2 SD R2, 0(R1) 5 6 10
2 DADDIU R1, R1, #8 5 6 7 11
2 BNE R2, R3, L 6 10 11
3 LD R2, 0(R1) 7 8 9 10 12
3 DADDIU R2, R2, #1 7 11 12 13
3 SD R2, 0(R1) 8 9 13
3 DADDIU R1, R1, #8 8 9 10 14
3 BNE R2, R3, L 9 13 14
Literature on Processors
● Efficient Reading of Papers in Science and Technolo
gy
● Yeager, The MIPS R10000 Processor, MICRO,
1996.
● Hinton et. al., The Microarchitecture of the Pentium 4
Processor. Intel Technology Journal Q1, 2001.
● Smith and Sohi. Microarchitecture of Superscalar
Processors. Proc. of IEEE. 1995.
● Kahle, et. al. Introduction to the Cell multiprocessor.
IBM J. RES. & DEV. 2005.
● Hammerlund, et. al., Haswell: The fourth generation
Intel Processor, MICRO 2014.
Extra
Tomasulo's - Loop based Example 1

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √
MUL.D F4, F0, F2 √
S.D F4, 0(R1) √
L.D F0, 0(R1) √
MUL.D F4, F0, F2 √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 yes
no Load Regs[R1] + 0
Load2 yes
no Load Regs[R1] - 8
Add1 no
Mult1 yes
no MUL Regs[F2] Load1
Mult2 yes
no MUL Regs[F2] Load2
Store1 yes
no Store Regs[R1]+0 Mult1
Store2 yes
no Store Regs[R1]-8 Mult2

Register Status
Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Load1
Load2 Mult2
Mult1
Tomasulo's - Loop based Example 2

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √
MUL.D F4, F0, F2 √
S.D F4, 0(R1) √
L.D F0, 0(R1) √
MUL.D F4, F0, F2 √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 yes Load Regs[R1] + 0
Load2 yes Load Regs[R1] - 8
Add1 no
Mult1 yes MUL Regs[F2] Load1
Mult2 yes MUL Regs[F2] Load2
Store1 yes Store Regs[R1]+0 Mult1
Store2 yes Store Regs[R1]-8 Mult2

Register Status
Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Load2 Mult2
Tomasulo's - Loop based Example 3

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √
S.D F4, 0(R1) √
L.D F0, 0(R1) √ √
MUL.D F4, F0, F2 √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 yes
no Load Regs[R1] + 0
Load2 yes Load Regs[R1] - 8
Add1 no
Mult1 yes MUL Mem[Regs[R1] + 0] Regs[F2] Load1
Mult2 yes MUL Regs[F2] Load2
Store1 yes Store Regs[R1]+0 Mult1
Store2 yes Store Regs[R1]-8 Mult2

Register Status
Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Load2 Mult2
Tomasulo's - Loop based Example 4

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √
S.D F4, 0(R1) √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 yes
no Load Regs[R1] - 8
Add1 no
Mult1 yes MUL Mem[Regs[R1] + 0] Regs[F2]
Mult2 yes MUL Regs[F2] Load2
Store1 yes Store Regs[R1]+0 Mult1
Store2 yes Store Regs[R1]-8 Mult2

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Load2 Mult2
Tomasulo's - Loop based Example 5

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √
S.D F4, 0(R1) √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 no
Add1 no
Mult1 yes MUL Mem[Regs[R1] + 0] Regs[F2]
Mult2 yes MUL Mem[Regs[R1] - 8] Regs[F2] Load2
Store1 yes Store Regs[R1]+0 Mult1
Store2 yes Store Regs[R1]-8 Mult2

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Mult2
Tomasulo's - Loop based Example 6

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √
S.D F4, 0(R1) √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 no
Add1 no
Mult1 yes MUL Mem[Regs[R1] + 0] Regs[F2]
Mult2 yes MUL Mem[Regs[R1] - 8] Regs[F2]
Store1 yes Store Regs[R1]+0 Mult1
Store2 yes Store Regs[R1]-8 Mult2

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Mult2
Tomasulo's - Loop based Example 7

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √
S.D F4, 0(R1) √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 no
Add1 no
Mult1 yes MUL Mem[Regs[R1] + 0] Regs[F2]
Mult2 yes MUL Mem[Regs[R1] - 8] Regs[F2]
Store1 yes Store Regs[R1]+0 Mult1
Store2 yes Store Regs[R1]-8 Mult2

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Mult2
Tomasulo's - Loop based Example 8

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √ √
S.D F4, 0(R1) √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 no
Add1 no
Mult1 yes
no MUL Mem[Regs[R1] + 0] Regs[F2]
Mult2 yes MUL Mem[Regs[R1] - 8] Regs[F2]
Store1 yes Store Regs[R1]+0 Mul[F4] Mult1
Store2 yes Store Regs[R1]-8 Mult2

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Mult2
Tomasulo's - Loop based Example 9

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √ √
S.D F4, 0(R1) √ √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 no
Add1 no
Mult1 no
Mult2 yes MUL Mem[Regs[R1] - 8] Regs[F2]
Store1 yes Store Regs[R1]+0 Mul[F4]
Store2 yes Store Regs[R1]-8 Mult2

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Mult2
Tomasulo's - Loop based Example 10

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √ √
S.D F4, 0(R1) √ √ √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √ √
S.D F4, 0(R1) √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 no
Add1 no
Mult1 no
Mult2 yes
no MUL Mem[Regs[R1] - 8] Regs[F2]
Store1 yes
no Store Regs[R1]+0 Mul[F4]
Store2 yes Store Regs[R1]-8 Mem[Regs[R1] - 8] Mult2

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi Mult2
Tomasulo's - Loop based Example 11

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √ √
S.D F4, 0(R1) √ √ √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √ √
S.D F4, 0(R1) √ √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 no
Add1 no
Mult1 no
Mult2 no
Store1 no
Store2 yes Store Regs[R1]-8 Mem[Regs[R1] - 8]

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi
Tomasulo's - Loop based Example 12

Instruction Status
Instruction Issue Execute Write result
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √ √
S.D F4, 0(R1) √ √ √
L.D F0, 0(R1) √ √ √
MUL.D F4, F0, F2 √ √ √
S.D F4, 0(R1) √ √ √
Reservation Stations
Name Busy Op Vj Vk Qj Qk A
Load1 no
Load2 no
Add1 no
Mult1 no
Mult2 no
Store1 no
Store2 yes Store Regs[R1]-8 Mem[Regs[R1] - 8]

M: Register Status
4 Field F0 F2 F4 F6 F8 F10 12 ... F30
Qi
VLIW Example

● Performance?
● Overhead?

You might also like