
Advanced Computer Architecture
Instruction-Level Parallelism
Part II

Tomasulo's Algorithm
Developed for the IBM 360/91, about 3 years after the CDC 6600 (1966)
Goal: high performance without special compilers
Differences between the IBM 360 and CDC 6600 ISAs:
  The IBM has only 2 register specifiers per instruction vs. 3 in the CDC 6600
  The IBM has 4 FP registers vs. 8 in the CDC 6600
Why study it? The Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, ... were designed after it

Lecture 5

Tomasulo's Algorithm vs. Scoreboard

Control and buffers are distributed with the functional units (FUs) vs. centralized in the scoreboard
  The FU buffers are called reservation stations (RS); they hold pending operands
Registers in instructions are replaced by values or by pointers to reservation stations; this is called register renaming
  Avoids WAR and WAW hazards
  There are more reservation stations than registers, so the hardware can perform optimizations that compilers cannot
Results go to the FUs from the reservation stations, not through the registers, over a Common Data Bus (CDB) that broadcasts results to all FUs
Loads and stores are treated as FUs with reservation stations as well
Integer instructions can go past branches, allowing FP operations in the FP queue to go beyond basic blocks

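Tomasulo Organization

[Figure: block diagram. The FP Op Queue and the FP Registers feed the reservation stations: Add1 to Add3 in front of the FP adders and Mult1 and Mult2 in front of the FP multipliers. The Load Buffers (Load1 to Load6) receive data from memory, the Store Buffers send data to memory, and the Common Data Bus (CDB) carries results back to all units.]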

Reservation Station Components
Op: the operation to perform in the unit (e.g., + or -)
Vj, Vk: the values of the source operands
  Store buffers have a V field holding the result to be stored
Qj, Qk: the reservation stations producing the source registers (the value to be written)
  Note: there are no ready flags as in the scoreboard; Qj, Qk = 0 => ready
  Store buffers have only Qi, for the RS producing the result
Busy: indicates that the reservation station or FU is busy
Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
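The fields listed above can be pictured as a small record. The following is only a sketch: the class name `ReservationStation`, the `ready` helper, and the use of `None` in place of the slide's 0 for "no pending producer" are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """One reservation-station entry; field names follow the slide."""
    name: str
    busy: bool = False
    op: Optional[str] = None      # operation to perform, e.g. "MULTD"
    vj: Optional[float] = None    # value of the first source operand
    vk: Optional[float] = None    # value of the second source operand
    qj: Optional[str] = None      # station that will produce Vj (None = ready)
    qk: Optional[str] = None      # station that will produce Vk (None = ready)

    def ready(self) -> bool:
        """An entry may start executing once neither operand is pending."""
        return self.busy and self.qj is None and self.qk is None


# Register result status: register name -> station that will write it.
# A missing key plays the role of the blank entry described on the slide.
register_status = {}

mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk=2.5, qj="Load2")
register_status["F0"] = "Mult1"
print(mult1.ready())  # operand j is still pending on Load2
```

Keeping a tag (`qj`, `qk`) next to each value field is what lets a station wait for a producer by name instead of waiting for a register.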

Three Stages of Tomasulo's Algorithm

1. Issue: get an instruction from the FP operation queue
   If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers).
2. Execute: operate on the operands (EX)
   When both operands are ready, execute; if not, watch the CDB for the results.
3. Write result: finish execution (WB)
   Write on the CDB to all awaiting units; mark the reservation station as available.
Normal data bus: data + destination ("go to" bus)
Common data bus (CDB): data + source ("come from" bus)
  64 bits of data + 4 bits of functional unit source address
  A waiting unit writes the value if the source matches the functional unit it expects (the one producing its result)
  The CDB performs the broadcast
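The issue and write-result stages can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the IBM 360/91 design: stations are plain dictionaries, the function names `issue` and `broadcast` are hypothetical, and execution itself is left out (the result value is simply supplied to `broadcast`).

```python
def issue(stations, reg_status, regs, name, op, dest, src_j, src_k):
    """Issue into station `name`: copy ready values, else record the producer tag."""
    entry = {"op": op, "vj": None, "vk": None, "qj": None, "qk": None}
    for v, q, src in (("vj", "qj", src_j), ("vk", "qk", src_k)):
        if src in reg_status:          # a pending instruction will write src
            entry[q] = reg_status[src]
        else:                          # the value is already in the register file
            entry[v] = regs[src]
    stations[name] = entry
    reg_status[dest] = name            # renaming: dest now refers to this station


def broadcast(stations, reg_status, regs, tag, value):
    """Write result: put (tag, value) on the CDB and free the producing station."""
    stations.pop(tag, None)
    for entry in stations.values():    # every waiting station snoops the bus
        if entry["qj"] == tag:
            entry["vj"], entry["qj"] = value, None
        if entry["qk"] == tag:
            entry["vk"], entry["qk"] = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:            # only the latest writer updates the register
            regs[reg] = value
            del reg_status[reg]


regs = {"F2": 2.0, "F4": 4.0, "F6": 6.0, "F8": 8.0}
stations, reg_status = {}, {}
issue(stations, reg_status, regs, "Mult1", "MULTD", "F0", "F2", "F4")
issue(stations, reg_status, regs, "Add1", "ADDD", "F6", "F0", "F8")
broadcast(stations, reg_status, regs, "Mult1", 8.0)
print(stations["Add1"])
```

The second instruction never reads F0 from the register file: it waits on the tag `Mult1` and picks the value off the bus, which is the "come from" behavior of the CDB.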

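Tomasulo Example

Instruction status (the instruction stream):
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   -      -          -
LD    F2      45+  R3   -      -          -
MULTD F0      F2   F4   -      -          -
SUBD  F8      F6   F2   -      -          -
DIVD  F10     F0   F6   -      -          -
ADDD  F6      F8   F2   -      -          -

Load buffers (3 load buffers):
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations (3 FP adder RS, 2 FP multiplier RS):
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  No    -      -       -       -      -
-     Mult2  No    -      -       -       -      -

Register result status (Clock = 0):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  -      -      -   -        -      -       -    ...  -

Legend: Time is the FU countdown; Vj and Vk hold the source values S1 and S2; Qj and Qk name the reservation stations that will produce them; Clock is the clock cycle counter.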

Tomasulo Example, Cycle 1

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      -          -
LD    F2      45+  R3   -      -          -
MULTD F0      F2   F4   -      -          -
SUBD  F8      F6   F2   -      -          -
DIVD  F10     F0   F6   -      -          -
ADDD  F6      F8   F2   -      -          -

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  No    -      -       -       -      -
-     Mult2  No    -      -       -       -      -

Register result status (Clock = 1):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  -      -      -   Load1    -      -       -    ...  -

Tomasulo Example, Cycle 2

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      -          -
LD    F2      45+  R3   2      -          -
MULTD F0      F2   F4   -      -          -
SUBD  F8      F6   F2   -      -          -
DIVD  F10     F0   F6   -      -          -
ADDD  F6      F8   F2   -      -          -

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  No    -      -       -       -      -
-     Mult2  No    -      -       -       -      -

Register result status (Clock = 2):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  -      Load2  -   Load1    -      -       -    ...  -

Tomasulo Example, Cycle 3

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          -
LD    F2      45+  R3   2      -          -
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   -      -          -
DIVD  F10     F0   F6   -      -          -
ADDD  F6      F8   F2   -      -          -

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  Yes   MULTD  -       R(F4)   Load2  -
-     Mult2  No    -      -       -       -      -

Register result status (Clock = 3):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  Load2  -   Load1    -      -       -    ...  -

Note: register names are removed (renamed) in the reservation stations; MULTD is issued.
Load1 is completing; what is waiting for Load1?

Tomasulo Example, Cycle 4

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          -
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      -          -
DIVD  F10     F0   F6   -      -          -
ADDD  F6      F8   F2   -      -          -

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  Yes   45+R3
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   Yes   SUBD   M(A1)   -       -      Load2
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  Yes   MULTD  -       R(F4)   Load2  -
-     Mult2  No    -      -       -       -      -

Register result status (Clock = 4):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  Load2  -   M(A1)    Add1   -       -    ...  -

Load2 is completing; what is waiting for Load2?

Tomasulo Example, Cycle 5

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      -          -
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   -      -          -

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
2     Add1   Yes   SUBD   M(A1)   M(A2)   -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
10    Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 5):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   M(A1)    Add1   Mult2   -    ...  -

The timers start counting down for Add1 and Mult1.

Tomasulo Example, Cycle 6

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      -          -
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      -          -

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
1     Add1   Yes   SUBD   M(A1)   M(A2)   -      -
-     Add2   Yes   ADDD   -       M(A2)   Add1   -
-     Add3   No    -      -       -       -      -
9     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 6):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   Add2     Add1   Mult2   -    ...  -

Do we issue ADDD here despite the name dependency on F6?

Tomasulo Example, Cycle 7

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      7          -
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      -          -

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
0     Add1   Yes   SUBD   M(A1)   M(A2)   -      -
-     Add2   Yes   ADDD   -       M(A2)   Add1   -
-     Add3   No    -      -       -       -      -
8     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 7):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   Add2     Add1   Mult2   -    ...  -

Add1 (SUBD) is completing; what is waiting for it?

Tomasulo Example, Cycle 8

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      -          -

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
2     Add2   Yes   ADDD   (M-M)   M(A2)   -      -
-     Add3   No    -      -       -       -      -
7     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 8):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   Add2     (M-M)  Mult2   -    ...  -

Tomasulo Example, Cycle 9

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      -          -

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
1     Add2   Yes   ADDD   (M-M)   M(A2)   -      -
-     Add3   No    -      -       -       -      -
6     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 9):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   Add2     (M-M)  Mult2   -    ...  -

Tomasulo Example, Cycle 10

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      10         -

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
0     Add2   Yes   ADDD   (M-M)   M(A2)   -      -
-     Add3   No    -      -       -       -      -
5     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 10):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   Add2     (M-M)  Mult2   -    ...  -

Add2 (ADDD) is completing; what is waiting for it?

Tomasulo Example, Cycle 11

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
4     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 11):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2   -    ...  -

Do we write the result of ADDD here?
All the quick instructions complete in this cycle!

Tomasulo Example, Cycle 12

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
3     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 12):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2   -    ...  -

Tomasulo Example, Cycle 13

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
2     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 13):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2   -    ...  -

Tomasulo Example, Cycle 14

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      -          -
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
1     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 14):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2   -    ...  -

Tomasulo Example, Cycle 15

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      15         -
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
0     Mult1  Yes   MULTD  M(A2)   R(F4)   -      -
-     Mult2  Yes   DIVD   -       M(A1)   Mult1  -

Register result status (Clock = 15):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2   -    ...  -

Mult1 (MULTD) is completing; what is waiting for it?

Tomasulo Example, Cycle 16

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      15         16
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  No    -      -       -       -      -
40    Mult2  Yes   DIVD   M*F4    M(A1)   -      -

Register result status (Clock = 16):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  M*F4   M(A2)  -   (M-M+M)  (M-M)  Mult2   -    ...  -

Just waiting for Mult2 (DIVD) to complete.

Faster-than-light computation
(let's skip a few cycles)

Tomasulo Example, Cycle 55

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      15         16
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      -          -
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  No    -      -       -       -      -
1     Mult2  Yes   DIVD   M*F4    M(A1)   -      -

Register result status (Clock = 55):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  M*F4   M(A2)  -   (M-M+M)  (M-M)  Mult2   -    ...  -

Tomasulo Example, Cycle 56

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      15         16
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      56         -
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  No    -      -       -       -      -
0     Mult2  Yes   DIVD   M*F4    M(A1)   -      -

Register result status (Clock = 56):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  M*F4   M(A2)  -   (M-M+M)  (M-M)  Mult2   -    ...  -

Mult2 (DIVD) is completing; what is waiting for it?

Tomasulo Example, Cycle 57

Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      15         16
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      56         57
ADDD  F6      F8   F2   6      10         11

Load buffers:
Name   Busy  Address
Load1  No    -
Load2  No    -
Load3  No    -

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No    -      -       -       -      -
-     Add2   No    -      -       -       -      -
-     Add3   No    -      -       -       -      -
-     Mult1  No    -      -       -       -      -
-     Mult2  No    -      -       -       -      -

Register result status (Clock = 57):
    F0     F2     F4  F6       F8     F10     F12  ...  F30
FU  M*F4   M(A2)  -   (M-M+M)  (M-M)  Result  -    ...  -

Once again: in-order issue, out-of-order execution, and out-of-order completion.
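The Issue, Exec Comp and Write Result columns of the example can be reproduced with a small timing model. This is a sketch under simplifying assumptions that are not stated on the slides: one instruction issues per cycle, a reservation station is always free, no two results compete for the CDB, and execution starts the cycle after the last operand is broadcast. The latencies are read off the Time column of the tables (2 for add and subtract, 10 for multiply, 40 for divide), and the load latency of 2 is inferred from the load rows.

```python
# Execution latencies in cycles (see the assumptions in the text above).
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (op, destination, FP source registers); R2 and R3 are treated as always ready.
PROGRAM = [
    ("LD", "F6", []),
    ("LD", "F2", []),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]


def schedule(program):
    """Return (issue, exec_complete, write_result) for each instruction."""
    last_write = {}  # register -> cycle in which its current producer writes the CDB
    rows = []
    for i, (op, dest, sources) in enumerate(program):
        issue = i + 1
        # Operands are bound to their producers at issue time (renaming), so a
        # later writer of the same register name cannot disturb this instruction.
        operands_ready = max([last_write.get(s, 0) for s in sources], default=0)
        start = max(issue, operands_ready) + 1
        exec_complete = start + LATENCY[op] - 1
        write = exec_complete + 1
        last_write[dest] = write
        rows.append((issue, exec_complete, write))
    return rows


for (op, dest, _), row in zip(PROGRAM, schedule(PROGRAM)):
    print(op, dest, row)
```

DIVD reads the F6 produced by the first load even though ADDD later overwrites F6 and finishes long before DIVD does, which is the WAR hazard that renaming removes.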

Summary: Tomasulo
Prevents the registers from becoming a bottleneck
Avoids the WAR and WAW hazards of the scoreboard
Allows loop unrolling in hardware
Not limited to basic blocks (given branch prediction)
Lasting contributions
  Dynamic scheduling
  Register renaming
  Load/store disambiguation
Descendants of the 360/91 include the PowerPC 604 and 620, MIPS R10000, HP-PA 8000, and Intel Pentium Pro

Static Branch Prediction

To reorder code around branches, the branch has to be predicted statically at compile time.
Simplest scheme: predict that the branch is taken.
  Average misprediction rate = frequency of untaken branches = 34% (SPEC)
Other schemes predict using profile information collected from earlier runs.
[Figure: chart with results for the Integer and Floating Point benchmarks]

Dynamic Branch Prediction

Performance = f(accuracy, cost of misprediction)
Branch History Table (BHT)
  The low-order bits of the PC address index a table of 1-bit values
  Each entry says whether the branch was taken the last time
  The address is not checked
Problem: in a loop, a 1-bit BHT causes 2 mispredictions:
  Last iteration: it predicts another iteration when the loop actually exits
  First iteration (the next time the loop is entered): it predicts exit when the loop actually continues
[Figure: a 32-bit address (bits 31 to 0); bits 13-2 index the table, whose entries are numbered 0 to 1023]
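The two mispredictions per loop can be checked with a few lines of simulation. This is a minimal sketch: the function name `run_one_bit`, the table size, and the example address `0x40` are illustrative assumptions.

```python
def run_one_bit(outcomes, pc=0x40, entries=1024):
    """Count mispredictions of a 1-bit BHT for one branch's outcome sequence."""
    table = [False] * entries      # one bit per entry: last outcome (True = taken)
    index = (pc >> 2) % entries    # low-order PC bits select the entry, no tag check
    misses = 0
    for taken in outcomes:
        if table[index] != taken:
            misses += 1
        table[index] = taken       # remember only the most recent outcome
    return misses


# A loop branch: taken 9 times, then not taken on exit; the loop is run 3 times.
loop = ([True] * 9 + [False]) * 3
print(run_one_bit(loop))  # 2 mispredictions per run of the loop
```

Each run of the loop costs one miss on exit and one more on re-entry, because the exit flipped the single history bit.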

Dynamic Branch Prediction
Solution: a 2-bit scheme in which the prediction changes only after mispredicting twice.
[Figure: four-state diagram with two Predict Taken states and two Predict Not Taken states; a taken branch (T) moves toward Predict Taken and a not-taken branch (NT) moves toward Predict Not Taken.]
Red: stop, not taken
Green: go, taken
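A common way to realize the 2-bit scheme is a saturating counter, sketched below for a single branch. The exact transitions of the slide's state diagram may differ from a plain saturating counter, and the function name `run_two_bit` is an illustrative assumption.

```python
def run_two_bit(outcomes, counter=0):
    """Count mispredictions of one 2-bit saturating counter (values 0..3)."""
    misses = 0
    for taken in outcomes:
        predict_taken = counter >= 2   # states 2 and 3 predict taken
        if predict_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# A loop branch: taken 9 times, then not taken on exit; the loop is run 3 times.
loop = ([True] * 9 + [False]) * 3
print(run_two_bit(loop))             # includes two warm-up misses
print(run_two_bit(loop, counter=3))  # already warmed up: one miss per run
```

Once warmed up, the single not-taken outcome at loop exit only weakens the counter, so the first iteration of the next run is still predicted taken.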

Branch Prediction Buffers

BHT Accuracy
Mispredictions occur because:
  The guess for that branch was wrong
  The history of the wrong branch was fetched when indexing the table
Programs using a 4096-entry table have misprediction rates ranging from 1% (nasa7, tomcatv) to 18% (eqntott).
A 4096-entry table performs about as well as an infinite table, but it needs a lot of hardware.

Correlating Branch Prediction

Idea: record whether the m most recently executed branches were taken or not taken, and use that pattern to select the proper n-bit branch history table
In general, an (m,n) predictor records the last m branches to select among 2^m history tables, each with n-bit counters
  Thus the old 2-bit BHT is a (0,2) predictor
Global branch history: an m-bit shift register that keeps the taken/not-taken (T/NT) status of the last m branches.
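An (m,n) predictor can be sketched as a table of counters with one extra index supplied by the global history. The class and helper names, the table size, and the alternating test pattern below are illustrative assumptions, and the counters are plain saturating counters.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history pick one of 2**m n-bit counters."""

    def __init__(self, m, n, entries=1024):
        self.m = m
        self.max_count = (1 << n) - 1
        self.entries = entries
        self.history = 0  # m-bit global shift register of recent outcomes
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _row(self, pc):
        return self.table[(pc >> 2) % self.entries]

    def predict(self, pc):
        # Predict taken when the selected counter is in its upper half.
        return self._row(pc)[self.history] > self.max_count // 2

    def update(self, pc, taken):
        row = self._row(pc)
        c = row[self.history]
        row[self.history] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def misses(predictor, outcomes, pc=0x40):
    count = 0
    for taken in outcomes:
        count += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return count


pattern = [True, False] * 50  # a branch that strictly alternates
print(misses(CorrelatingPredictor(0, 2), pattern))
print(misses(CorrelatingPredictor(2, 2), pattern))
```

With m = 0 there is a single counter per entry, which is the plain 2-bit BHT; with m = 2 the alternating pattern lands on different counters, so the predictor can learn it after a short warm-up.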

Correlating Branches

(2,2) predictor
The behavior of the most recent branches selects among four predictions for the next branch, updating only that prediction.
[Figure: the branch address and a 2-bit global branch history together select one of the 2-bit per-branch predictors, which supplies the prediction]

Accuracy of Different Schemes

[Figure: bar chart of the frequency of mispredictions (axis from 0% to 20%, tallest bar 18%) for the benchmarks nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott and li, comparing three predictors: a 4096-entry 2-bit BHT, an unlimited-entry 2-bit BHT, and a 1024-entry (2,2) BHT]

Tournament Predictors

Multilevel branch predictors
Use an n-bit counter to choose between predictors

Tournament Predictors

A tournament predictor using, for example, 4K 2-bit counters indexed by the local branch address. It chooses between:
Global predictor
  4K entries, indexed by the history of the last 12 branches (2^12 = 4K)
  Each entry is a standard 2-bit predictor
Local predictor
  Local history table: 1024 entries of 10 bits each, recording the last 10 branches, indexed by the branch address
  The pattern of the last 10 outcomes of a particular branch is used to index a table of 1K entries with 3-bit counters
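The selection step can be sketched independently of how the global and local predictions are produced. The function names, the single selector counter, and the trace below are hypothetical; a real design keeps one selector per table entry.

```python
def choose(selector, global_pred, local_pred):
    """Selector counter 0..3: values 2 and 3 trust the global predictor."""
    return global_pred if selector >= 2 else local_pred


def update_selector(selector, global_pred, local_pred, taken):
    """Move toward whichever predictor was right; no change when both agree."""
    if global_pred == local_pred:
        return selector
    return min(selector + 1, 3) if global_pred == taken else max(selector - 1, 0)


# Hypothetical trace of (global prediction, local prediction, actual outcome).
trace = [
    (True, False, True),
    (True, False, True),
    (False, True, False),
    (True, True, True),
]
selector, correct = 0, 0
for glob, loc, taken in trace:
    correct += choose(selector, glob, loc) == taken
    selector = update_selector(selector, glob, loc, taken)
print(selector, correct)
```

The selector starts out trusting the local predictor, misses twice while the global predictor is right, and then switches over.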

Comparing Predictors
The advantage of tournament predictors is the ability to select the right predictor for a particular branch
  Particularly crucial for integer performance
  A typical tournament predictor selects the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks

Branch Target Buffer
Branch Target Buffer (BTB): the address of the branch is used as an index to get both the prediction and the branch target address (if taken)
  Note: the entry must now be checked for a branch match, since the address of a wrong branch cannot be used
Return instruction addresses are predicted with a stack
Branch prediction: taken or not taken
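The need for the match check can be seen in a direct-mapped sketch. The class name, the table size, the eviction policy for not-taken branches, and the example addresses are illustrative assumptions rather than details from the slides.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: each entry stores (branch address, target)."""

    def __init__(self, entries=64):
        self.entries = entries
        self.table = [None] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def lookup(self, pc):
        """Return the predicted target, or None to fall through to the next PC."""
        entry = self.table[self._index(pc)]
        # The full address is compared so another branch's target is never used.
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc, taken, target):
        """After the branch resolves: keep taken branches, evict not-taken ones."""
        i = self._index(pc)
        if taken:
            self.table[i] = (pc, target)
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None


btb = BranchTargetBuffer()
btb.update(0x1000, taken=True, target=0x2000)
print(btb.lookup(0x1000), btb.lookup(0x1100))
```

Addresses 0x1000 and 0x1100 map to the same entry here, and only the stored branch address keeps the second lookup from returning the first branch's target.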

Example

Instruction in buffer  Prediction  Actual branch  Penalty cycles
Yes                    Taken       Taken          0
Yes                    Taken       Not taken      2
No                     -           Taken          2

Determine the total branch penalty for a BTB using the penalties above. Assume also that:
  Prediction accuracy = 90%
  Hit rate in the buffer = 90%
  Taken branch frequency = 60%

Branch penalty = buffer hit rate x fraction of incorrect predictions x 2 + (1 - buffer hit rate) x taken branch frequency x 2
Branch penalty = (90% x 10% x 2) + (10% x 60% x 2)
Branch penalty = 0.18 + 0.12 = 0.30 clock cycles
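The same arithmetic as a small function, so the assumptions can be varied. The function and parameter names are illustrative, and the accuracy of 0.90 corresponds to the 10% misprediction rate used in the worked calculation.

```python
def btb_branch_penalty(hit_rate, accuracy, taken_freq, penalty_cycles=2):
    """Expected penalty cycles per branch for the two penalized cases."""
    in_buffer_mispredicted = hit_rate * (1 - accuracy) * penalty_cycles
    not_in_buffer_taken = (1 - hit_rate) * taken_freq * penalty_cycles
    return in_buffer_mispredicted + not_in_buffer_taken


penalty = btb_branch_penalty(hit_rate=0.90, accuracy=0.90, taken_freq=0.60)
print(round(penalty, 2))  # 0.3
```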

Dynamic Branch Prediction Summary

Prediction is an important part of execution
Branch History Table (BHT): 2 bits for loop accuracy
Correlation: recently executed branches are correlated with the next branch
  Either different branches (GA)
  Or different executions of the same branch (PA)
Tournament predictors take this insight to the next level by using multiple predictors
  Usually one based on global information and one based on local information, combined with a selector
  In 2006, tournament predictors using about 30K bits were in processors such as the Power5 and Pentium 4
Branch Target Buffer: includes the branch address and the prediction
