Lecture 5 ILP2
Computers
Instruction-Level Parallelism
Part II
Tomasulo's Algorithm
For the IBM 360/91, about 3 years after the CDC 6600 (1966)
Goal: high performance without special compilers
Differences between the IBM 360 and CDC 6600 ISAs:
IBM has only 2 register specifiers per instruction vs. 3 in the CDC 6600
IBM has 4 FP registers vs. 8 in the CDC 6600
Why study it? The Alpha 21264, HP 8000, and MIPS designs follow it
Tomasulo Organization
[Figure: the FP Op Queue and the FP Registers feed the Reservation Stations (Add1, Add2, Add3 for the FP adders; Mult1, Mult2 for the FP multipliers). Load Buffers Load1 to Load6 receive data from memory (From Mem); the Store Buffers send data to memory (To Mem).]
Reservation Station (RS) Components
Op: the operation to perform in the unit (e.g., + or -)
Vj, Vk: the values of the source operands
Store buffers have a V field: the result to be stored
Qj, Qk: the reservation stations that will produce the source registers (the value to be written)
Note: there are no ready flags as in the scoreboard; Qj, Qk = 0 => ready
Store buffers have only Qi, for the RS producing the result
Busy: indicates that the reservation station or FU is busy
Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
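
The fields listed above can be pictured as a small record. The following is a minimal sketch, not the IBM 360/91 hardware layout: the class name, the use of `None` in place of "Q = 0", and the `capture` method that models a station picking up a broadcast result are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """One reservation station entry (field names follow the slide)."""
    name: str
    busy: bool = False
    op: Optional[str] = None      # operation to perform, e.g. "MULTD"
    vj: Optional[float] = None    # value of the first source operand
    vk: Optional[float] = None    # value of the second source operand
    qj: Optional[str] = None      # RS producing operand j; None plays the role of Qj = 0
    qk: Optional[str] = None      # RS producing operand k; None plays the role of Qk = 0

    def ready(self) -> bool:
        # No separate ready flags: the entry is ready when both Q fields are clear.
        return self.busy and self.qj is None and self.qk is None

    def capture(self, tag: str, value: float) -> None:
        # Hypothetical helper: fill any operand that was waiting on `tag`.
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None


# MULTD F0,F2,F4 waiting for Load2 to deliver F2; 2.5 and 4.0 are made-up values.
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk=2.5, qj="Load2")
print(mult1.ready())          # still waiting on Load2
mult1.capture("Load2", 4.0)
print(mult1.ready())          # both operands present
```

Using `None` instead of a numeric zero keeps the "Q = 0 means ready" rule while letting the tag be a readable station name.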
Tomasulo Example

Instruction status (instruction stream):
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2
LD     F2      45+   R3
MULTD  F0      F2    F4
SUBD   F8      F6    F2
DIVD   F10     F0    F6
ADDD   F6      F8    F2

Load buffers (3 load buffers; columns Busy, Address): Load1 No, Load2 No, Load3 No.

Reservation stations (3 FP adder R.S., 2 FP mult R.S.; the Time column is the FU countdown):
Time   Name    Busy   Op   Vj (S1)   Vk (S2)   Qj (RS)   Qk (RS)
       Add1    No
       Add2    No
       Add3    No
       Mult1   No
       Mult2   No

Register result status (FU), registers F0, F2, F4, F6, F8, F10, F12, ..., F30: all empty.
A clock cycle counter is shown next to the tables.
Tomasulo Example, Cycle 1

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1
LD     F2      45+   R3
MULTD  F0      F2    F4
SUBD   F8      F6    F2
DIVD   F10     F0    F6
ADDD   F6      F8    F2

Reservation stations: Add1, Add2, Add3, Mult1, Mult2 all free (Busy = No).
Load buffers: Load1 Yes (34+R2), Load2 No, Load3 No.
Register result status: F6 = Load1; all other registers empty.
Tomasulo Example, Cycle 2

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1
LD     F2      45+   R3    2
MULTD  F0      F2    F4
SUBD   F8      F6    F2
DIVD   F10     F0    F6
ADDD   F6      F8    F2

Reservation stations: Add1, Add2, Add3, Mult1, Mult2 all free (Busy = No).
Load buffers: Load1 Yes (34+R2), Load2 Yes (45+R3), Load3 No.
Register result status: F2 = Load2, F6 = Load1; all other registers empty.
Tomasulo Example, Cycle 3

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3
LD     F2      45+   R3    2
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2
DIVD   F10     F0    F6
ADDD   F6      F8    F2

Reservation stations:
Time   Name    Busy   Op      Vj (S1)   Vk (S2)   Qj (RS)   Qk (RS)
       Add1    No
       Add2    No
       Add3    No
       Mult1   Yes    MULTD             R(F4)     Load2
       Mult2   No

Load buffers: Load1 Yes (34+R2), Load2 Yes (45+R3), Load3 No.
Register result status: F0 = Mult1, F2 = Load2, F6 = Load1; all other registers empty.
Tomasulo Example, Cycle 4

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4
DIVD   F10     F0    F6
ADDD   F6      F8    F2

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 Yes (45+R3), Load3 No.
Register result status: F0 = Mult1, F2 = Load2, F6 = M(A1), F8 = Add1; all other registers empty.
Tomasulo Example, Cycle 5

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = M(A1), F8 = Add1, F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 6

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = Add1, F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 7

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4       7
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = Add1, F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 8

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = (M-M), F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 9

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = (M-M), F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 10

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6       10

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = (M-M), F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 11

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2; all other registers empty.
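Tomasulo Example, Cycle 12

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2; all other registers empty.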
Tomasulo Example, Cycle 13

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 14

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 15

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3       15
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = Mult1, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 16

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3       15          16
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = M*F4, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2; all other registers empty.
Faster-than-light computation
(let us skip a few cycles)
Tomasulo Example, Cycle 55

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3       15          16
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = M*F4, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 56

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3       15          16
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5       56
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = M*F4, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2; all other registers empty.
Tomasulo Example, Cycle 57

Instruction status:
Instruction    j     k     Issue   Exec Comp   Write Result
LD     F6      34+   R2    1       3           4
LD     F2      45+   R3    2       4           5
MULTD  F0      F2    F4    3       15          16
SUBD   F8      F6    F2    4       7           8
DIVD   F10     F0    F6    5       56          57
ADDD   F6      F8    F2    6       10          11

[Reservation stations table: entries not recoverable]
Load buffers: Load1 No, Load2 No, Load3 No.
Register result status: F0 = M*F4, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Result; all other registers empty.
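
The Issue, Exec Comp, and Write Result timestamps of the example can be reproduced with a very small timing model. This is a sketch under stated assumptions, not a full Tomasulo simulator: the latencies (2 cycles for loads and adds, 10 for multiply, 40 for divide) are inferred from the timestamps rather than stated on the slides, one instruction issues per cycle, and structural hazards on the reservation stations and on the result bus are ignored. The names `LATENCY`, `PROGRAM`, and `tomasulo_timing` are illustrative.

```python
# Assumed latencies in cycles, chosen to match the example's timestamps.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (operation, destination, source registers) in program order.
PROGRAM = [
    ("LD", "F6", ("R2",)),
    ("LD", "F2", ("R3",)),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]


def tomasulo_timing(program):
    """Return (op, dest, issue, exec_comp, write) for each instruction."""
    write_cycle = {}  # register -> cycle in which its latest producer writes
    rows = []
    for issue, (op, dest, srcs) in enumerate(program, start=1):
        # Sources are looked up at issue, so a later writer of the same
        # register (ADDD writing F6 after DIVD reads it) cannot interfere.
        operands_ready = max((write_cycle.get(s, 0) for s in srcs), default=0)
        start = max(issue, operands_ready) + 1   # execution begins the next cycle
        exec_comp = start + LATENCY[op] - 1
        write = exec_comp + 1
        write_cycle[dest] = write
        rows.append((op, dest, issue, exec_comp, write))
    return rows


for row in tomasulo_timing(PROGRAM):
    print(*row)
```

Under these assumptions ADDD completes long before DIVD even though it issues after it, which is the out-of-order completion the example illustrates.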
Summary: Tomasulo
Prevents the registers from becoming a bottleneck
Avoids the WAR and WAW hazards of the scoreboard
Allows loop unrolling in hardware
Not limited to basic blocks (with branch prediction)
Lasting contributions
Dynamic scheduling
Register renaming
Load/store disambiguation
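
Register renaming, one of the contributions listed in the summary, can be illustrated in a few lines. This is a minimal sketch, assuming fresh tags `T0`, `T1`, ... stand in for the reservation station names that Tomasulo's hardware uses; the function name `rename` and the tuple layout are illustrative.

```python
def rename(program):
    """Give every destination a fresh tag; sources read the latest tag."""
    latest = {}    # architectural register -> tag of its most recent producer
    renamed = []
    for n, (op, dest, srcs) in enumerate(program):
        # Registers with no earlier producer in the program keep their name.
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        tag = f"T{n}"
        latest[dest] = tag
        renamed.append((op, tag, new_srcs))
    return renamed


program = [
    ("LD", "F6", ("R2",)),
    ("LD", "F2", ("R3",)),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]

for instruction in rename(program):
    print(instruction)
```

After renaming, DIVD reads `T0` (the first load) while ADDD writes `T5`, so the WAR hazard on F6 between them no longer exists.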
Other schemes predict by collecting profile information from earlier executions.
[Figure: SPEC benchmarks, grouped into Integer and Floating Point]
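[Figure: the address (bits 31 to 0) selects an entry of the prediction table through bits 13 - 2; the table entries are numbered 0, 1, ..., 1023 and the table is labeled "Prediction"]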
Dynamic Branch Prediction
Solution: a 2-bit scheme in which the prediction changes only after two mispredictions.
[State diagram: four states, two labeled "Predict Taken" and two labeled "Predict Not Taken", with transitions on taken (T) and not taken (NT) outcomes]
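
The 2-bit scheme can be sketched as a saturating counter. The state diagram on the slide did not survive extraction, so the counter encoding below (states 0 and 1 predict not taken, 2 and 3 predict taken) is an assumption, and the class and function names are illustrative.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def mispredictions(outcomes, state=3):
    """Count wrong predictions over a sequence of branch outcomes."""
    predictor = TwoBitPredictor(state)
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch: taken 9 times, then not taken at loop exit, executed twice.
loop = ([True] * 9 + [False]) * 2
print(mispredictions(loop))
```

Starting from the strongly taken state, only the two loop exits are mispredicted; the single not-taken outcome does not flip the prediction for the next run of the loop.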
BHT Accuracy
Mispredictions occur because either:
The guess for that branch was wrong
The history of the wrong branch was read when indexing the table
With a 4096-entry table, programs vary from 1%
Correlated Branches
(2,2) predictor
The behavior of the most recent branches selects among 4 predictions of the next branch, updating only that prediction
[Figure labels: Branch address; 2-bits per branch predictors; Prediction]
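
A (2,2) predictor can be sketched as a table in which each entry holds four 2-bit counters, with the last two branch outcomes choosing which counter to use. This is a sketch under assumptions: the indexing by address bits above the byte offset, the initial counter value, and all names are illustrative choices, not details taken from the slide.

```python
class CorrelatingPredictor:
    """(2,2) predictor: 2 bits of global history pick 1 of 4 two-bit counters."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.history = 0  # outcomes of the last 2 branches, newest in bit 0
        # Assumption: every counter starts at 1 (weakly not taken).
        self.counters = [[1] * 4 for _ in range(entries)]

    def _index(self, address):
        # Assumption: drop the 2 byte-offset bits, then wrap to the table size.
        return (address >> 2) % self.entries

    def predict(self, address):
        return self.counters[self._index(address)][self.history] >= 2

    def update(self, address, taken):
        row = self.counters[self._index(address)]
        count = row[self.history]
        # Only the counter selected by the current history is updated.
        row[self.history] = min(3, count + 1) if taken else max(0, count - 1)
        self.history = ((self.history << 1) | int(taken)) & 0b11


predictor = CorrelatingPredictor()
misses = 0
for i in range(40):
    taken = (i % 2 == 0)           # strictly alternating branch
    if predictor.predict(0x40) != taken:
        misses += 1
    predictor.update(0x40, taken)
print(misses)
```

On a strictly alternating branch the history bits separate the two cases, so the predictor stops mispredicting after a short warm-up, whereas a single 2-bit counter starting in the same state would keep missing.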
[Figure: bar chart of the frequency of mispredictions (0% to 18%) for nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott, and li, comparing three predictors: a 4096-entry 2-bit BHT, an unlimited-entry 2-bit BHT, and a 1024-entry (2,2) BHT]
Local Predictors
Local history table: 1024 entries of 10 bits each, for the 10 branches ...
Comparison of Predictors
The advantage of tournament predictors is the ability to ...
almost 40% of the time for SPEC integer and less than 15% of the time for SPEC FP
Branch Target Buffer
Branch Target Buffer (BTB): uses the address of the branch as an index to obtain the prediction and the branch target address (if taken)
Note: the entry must be checked for a matching branch now, because the target address of the wrong branch cannot be used.
Branch prediction: taken or not taken
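
The branch match check in a BTB can be sketched as a table that stores the branch address next to its target. This is a minimal sketch, assuming a direct-mapped table indexed by the address bits above the byte offset; the class name, the method names, and the table size are illustrative.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB; each slot holds (branch_pc, target) or None."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = [None] * entries

    def _index(self, pc):
        # Assumption: drop the 2 byte-offset bits, then wrap to the table size.
        return (pc >> 2) % self.entries

    def lookup(self, pc):
        """Return the predicted target, or None if no matching branch is stored."""
        slot = self.table[self._index(pc)]
        # The stored address must match: another branch may share this slot.
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None

    def record_taken(self, pc, target):
        self.table[self._index(pc)] = (pc, target)


btb = BranchTargetBuffer(entries=16)
btb.record_taken(0x1000, 0x2000)
print(btb.lookup(0x1000))   # stored branch: target is returned
print(btb.lookup(0x1040))   # same slot, different branch: no prediction
```

Without the address comparison, the second lookup would redirect fetch to the target of an unrelated branch.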
Example
Instruction in buffer   Prediction   Actual branch   Penalty cycles
Yes                     Taken        Taken           0
Yes                     Taken        Not taken       2
No                                   Taken           2
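
The penalty table can be turned into an expected penalty per branch. This is a sketch under assumptions: the buffer hit rate, prediction accuracy, and taken rate used below are hypothetical values, and a branch that is neither in the buffer nor taken is assumed to cost 0 cycles, a case the table does not list.

```python
def expected_branch_penalty(hit_rate, accuracy, taken_rate, penalty=2):
    """Average penalty cycles per branch for the table above."""
    # In the buffer and predicted taken, but actually not taken.
    mispredicted_hit = hit_rate * (1 - accuracy) * penalty
    # Not in the buffer, but the branch is taken.
    taken_miss = (1 - hit_rate) * taken_rate * penalty
    return mispredicted_hit + taken_miss


# Hypothetical rates: 90% buffer hits, 90% accuracy, 60% of branches taken.
print(round(expected_branch_penalty(0.9, 0.9, 0.6), 2))
```

With these made-up rates the two penalty cases contribute about 0.18 and 0.12 cycles, so the average is roughly 0.3 cycles per branch.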
the next branch
Either different branches (GA)
Or different executions of the same branches (PA)
the prediction