Professional Documents
Culture Documents
Chapter 5 - Caches
Chapter 5 - Caches
Ni dung
Phn cp b nh
Lm th no to ra mt b nh ln v nhanh?
Lin kt SRAM, DRAM, v a cng
Caching
Nhng b nh nh lu nhng d liu quan trng
V d
B nh cache lm vic nh th no?
Cc th: Tags
Cc khi: Blocks (lines)
Thc thi
3 loi cache: kt hp hon ton (Fullyassociative), kt hp theo tp
hp (setassociative), nh x trc tip (directmapped)
Hiu nng
Phn cp b nh
t vn
Cn b nh ln v nhanh
B nh lnh ln ISA : 232 memory address (4GB)
Yu cu nhanh v 33% cc lnh l loads/stores v 100% cc lnh cn phi ti
v thanh ghi lnh
Tn ti b nh c th c dung lng ln v truy nhp nhanh?
B nh ln v nhanh
Cc loi b nh c?
Hard disk: Huge (1000 GB) Super slow (1M cycles)
Flash: Big (100 GB) Very slow (1k cycles)
DRAM: Medium (10 GB) Slow (100 cycles)
SRAM: Small (10 MB) Fast (110 cycles)
Cn b nh nhanh v ln
Khng th s dng SRAM (too small)
Khng th s dng DRAM (too slow and small)
Khng th s dng Flash/Hard disk (way too slow)
C th kt ni gia chng:
Speed t (small) SRAMs
Size t (big) DRAM v Hard disk
Xy dng mt phn cp s dng cng ngh khc tn dng cc u im ca
cc b nh c sn.
Phn cp b nh
Phn loi:
Dung lng nh v nhanh: SRAM
Chm: DRAM
a cng dung lng ln nhng
rt chm
Vin cnh:
Rt ln
Rt nhanh (on average)
Mc tiu?
Lu tr thng tin quan trng trong
b nh nhanh.
Di chuyn nhng thng tin khng
quan trng vo b nh chm
V d: sa video
Video dung lng ln (ln hn DRAM)
Lu vo cng
Ti phn cn chnh sa vo DRAM
CPU ti d liu x l vo cache.
Di chuyn d liu mi vo DRAM v
cache khi x l video
Ch :
Lu nhng d liu quan trng vo
b nh nhanh
Di chuyn nhng d liu khng
quan trng vo b nh chm
Lm th no SRAMs c
dung lng ln hn, DRAM
truy cp nhanh hn?
Cc tng c bn v cache
t nhng d liu quan trng trong b
nh nh v nhanh (cache).
Nu truy cp (load/store) nhng d liu
quan trng, cn thc hin nhanh.
Nu truy cp (load/store) nhng d liu
khc, dch chuyn d liu vo trong cache.
Nu t chnh xc d liu cn dng vo
cache, khi hu ht cc truy cp s tm
ra d liu hu ch trong cache v tr nn
nhanh hn.
V d: b nh m
Ci g trong cache
(0x8)
(add r2, r2, r1)
(1)
A: 30
a ch th xc nh a ch d liu
trong cache, cn bits nhn din a
ch t.
(We would need 32 bits if the memory
was not wordaligned.)
1/2 wasted
1/3 wasted
1/5 wasted
1/9 wasted
1/17 wasted (c s dng trong b x l ngy nay)
nh a ch cho cc cache c
kch thc khi khc nhau
a ch:
2 bit cui : xc nh byte trong t
(khng cn thit bi v truy nhp b nh bng cc t)
N bit tip theo: xc nh t trong khi
(v d, kch thc khi l 4 cn 2 bits xc nh t)
32N2 bit cn li: tag
(cn thit xc nh a ch b nh)
A: 2
Bng vic ti 3 t tip
theo cng vi t cn
dng, t l thnh cng
cao hn. Phng
Q: Ti sao kch thc block 4 php ny c gi l
khng gian cc b t l tt ?
a phng: Nhng
1. t tn khng gian hn
d liu gn vi nhng
2. Ti trc cc d liu cn
d liu va s dng l
dng sau
nhng d liu hp l
3. Th t bit hn.
nht.
Cc khi cache ln hn
u im:
t tn khng gian cho cc th (higher % data)
Quy nh cc b:
N1 words tip theo c s c s dng sau
Nhc im:
Nu khng s dng d liu trong cache s lm tn khng gian d liu
V d:
for (int i=0; i<100; i+=2) {
a[i]++;
}
Tn mt na khng gian
64 bytes (8 words) l kch thc chun
Intel lun np 2 khi cache ti mt thi im tng ng vi 128 bytes hay 16
words
B nh m nh x trc tip
tng hiu qu tm kim
t vn
Tm kim khi
Cn n b so snh
Hoc cn n chu k
Mc tiu?
C th t d liu bt k u
trong cache: flexibility
N blocks c th lu d n mng
d liu, mi v tr t khi c th
cha mt trong mt trong tt c cc
khi trong b nh.
Q: Nhc im?
1. t lng ph khng gian
2. Ch cn mt b so
snh
3. Ch mt block c
tm kim
A: Ch mt block c tm
kim
Tm kim ton b cache, vi
n b so snh hoc vi mt
b snh trong n chu k
V tr khi trong b m nh
x trc tip
Mi a ch nh x ti mt block
nh a ch theo modulo (K = i mod n)
K : v tr khi t trong cache
i : s th t khi trong b nh trong
n : s khi ca cache
V d:
0 (0 mod 8) = 0
1 (1 mod 8) = 1
2 (2 mod 8) = 2
8 (8 mod 8) = 0
9 (9 mod 8) = 1
18 (18 mod 8) = 2
Ch 3 bits cui:
0 000000 0
1 000001 1
Tag bits 2 000010 2
8 001000 0
9 001001 1
18 010010 2
(b qua nh
a ch byte)
V d b m nh x trc tip
u im: d dng trong vic tm kim
S dng a ch tm mt v tr (modulo
addressing)
So snh vi tag
Nhc im: km linh hot (1 chu k tm kim
1 khi )
Nu 2 a ch nh x ti cng mt dng khi
ch c th lu tr mt khi d liu
Xy ra xung t
Nu 2 a ch khi trong b nh trong c
cng a ch modulo, c 2 khng c lu tr
cng nhau trong cache, k c khi cache cn
trng. (Inflexible Conflicts)
B nh m kt hp theo tp
hp: Setassociative caches
linh hot hn
C th tha hip?
C th kt hp vic tm kim d dng trong b m nh x trc tip
(directmapped) v s linh hot trong b m kt hp ton khi
(fullyassociative caches)?
C: B m kt hp theo tp hp (setassociative)
Mi khi nm trong mt tp (1 set gm nhiu khi - c cng a ch
modulo)
Mi tp c nhiu khi (tm kim kt hp)
D dng hn trong vic tm km:
Xc nh c khi da trn a ch
Ch cn kim tra mt s khi cache trong tp.
Linh hot:
C th t khi bt k u trong tp
Gim s ln trt b m.
B m phi hp kiu tp
hp (Setassociative cache)
Mi a ch nh x ti mt khi
S dng a ch modulo (K = i mod s)
K : v tr khi t trong cache
i : s th t khi trong b nh trong
s : s lng tp hp trong cache
0 (0 mod 4) = anywhere in Set 0
1 (1 mod 4) = anywhere in Set 1
2 (2 mod 4) = anywhere in Set 2
8 (8 mod 4) = anywhere in Set 0
9 (9 mod 4) = anywhere in Set 1
18 (18 mod 4) = anywhere in Set 2
Xem xt 2 bits cui:
0 000000 Set 0
1 000001 Set 1
2 000010 Set 2
8 001000 Set 0
Tag bits
9 001001 Set 1
18 010010 Set 2
(Ignoring byte
addressing for
now)
V d: Setassociative
u im: D dng trong vic tm kim
S dng a ch tm kim cc tp
So snh tt c cc tag trong tp
Nhc im:
t linh hot hn kiu fullyassociative (can
only go anywhere in the set)
Phc tp hp (kim tra nhiu tags hn)
Cc loi caches
a ch tng ng mt khi
(9 mod 8) = 1
Tm kim d dng, khng linh hot
a ch tng ng vi mt tp .
(9 mod 4) = Set 1
Tm kim tt hn, linh hot hn
a ch tng ng vi bt k v tr no
trong cache.
Kh trong vic tm kim, rt linh hot
Thit k cache
B nh m ngy nay
Intel Ivy Bridge (2012, 4core x86)
L1 Data:
32kB
8way setassociative
L1 Instruction: 32kB
8way setassociative
L2 I&D:
256kB
8way setassociative
L3 I&D:
8MB
16way setassociative
64byte line
64byte line
64byte line
64byte line
64byte line
64byte line
64byte line
64byte line
64byte line
Higher associativity
for larger caches.
Filter cache.
Designed to
save energy for
small pieces of
code
Dung lng b nh m
iu g xy ra nu b nh m
y?
Ghi vo b nh m
Chin thut ghi?
Ghi vo b nh m
Khi c lnh hoc d liu: t vo cache
Ghi nh th no?
Writethrough (ghi ng thi)
Thng tin c ghi ng thi vo khi ca cache v khi ca b nh
trong
n gin, nhng chm (phi i ghi vo DRAM)
Writeback (ghi li)
Ch ghi d liu vo cache
Nu khng c trong cache, cn np li vo cache
Cn nh du li nu d liu trong cache c cp nht
Khi mt khi b thay th, khi ny s c ghi li vo b nh trong.
Writethrough
Lun ghi vo DRAM
Nu d liu trong cache, khi ghi
ng thi vo cache.
Q: Ghi chm hn v sao?
1. Ph thuc vo v tr ca d liu
trong cache.
2. Chm do ghi vo DRAM
3. Chm do ghi vo DRAM v
cache
A: Chm do ghi vo DRAM
Ghi vo DRAM mi thi im.
Writethrough with
allocateonwrite
Lun ghi vo DRAM
Nu d liu trong cache, ghi ng thi vo
cache
Allocateonwrite
Nu d liu khng c trong cache, ti d liu v
cache
Q: Tt hn ti sao?
1. It isnt
2. Faster to read and write than just write
3. Subsequent reads will hit in the cache
A: Subsequent reads will hit in the cache
Thng hay truy cp d liu v ghi li, ch ghi mt
phn ca khi nh mt t hoc mt byte, do vy ti vo
cache s nhanh hn.
Writeback
Lun ghi vo cache
Nu d liu khng c trong cache,
ti vo cache
Note: cache v DRAM l khng
nh nhau!
Khi chng ta loi b mt dng cn
phi bit khc vi d liu ghi t
DRAM nh th no?
Ghi vo b nh m
Writethrough n gin, nhng chm
Phi i ghi ng thi vi DRAM trong mi ln ghi
Writeback phc tp hn, nhng nhanh
Phi nh du d liu bn (dirty data) trong cache v ghi li vo b
nh trong nu n b loi b
Ghi nhanh hn
Allocateonwrite ti d liu vo cache khi ghi
Mt khc ch ti d liu vo cache khi c.
Ghi vo cache ngy nay l kiu writeback
Intels new Xeon Phi c mc L2 writethrough cache
Hiu nng b nh m
T s trt b m
Miss ratio = % of cache misses = (# cache misses / # memory accesses)
T l trt (%) = (s ln trt/s ln truy nhp)
Q: Hiu nng thay i th
no khi t s trt b m
gim t 10.5% n 3.5%?
1. Slower
2. Stays the same
3. Faster
A: Faster
Cc ng dng c th chy
nhanh hn, nhng khng
th bit l nhanh n bao
nhiu. T l Trng cache ln
hn rt nhiu so vi trt.
Hit time
Miss time
Miss penalty
= 1 Hit (1cycle)
= 1 Miss (1 cycle)
= 3 Penalty
V d: AMAT
Machine 1
100 truy nhp DRAM
1 chu k truy nhp cache (hit or miss)
Tnh AMAT cho lbm?
cache c dung lng 256kB ?
6% miss ratio
(1) + (6%)*(1+100) = 7.06 cycles per memory
access
cache c dung lng 8MB ?
3% miss ratio
(1) + (3%)*(1+100) = 4.03 cycles per memory
access
Machine 2
100 cycles to DRAM
2 cycle cache access time (hit or miss)
Tnh AMAT cho bzip2?
Q: Ti sao bzip2s AMAT vi
256kB cache?
256kB cache gn ging vi
1.5% miss ratio
lbms vi 8MB?
(2) + (1.5%)*(2+100) = 3.53
1. Slower cache
2. Different miss ratios
3. Different applications
A: Slower cache
Cache mt 2 cycles cho
Machine 2 v ch mt chu k
cho Machine 1. Diu ny lm
nh hng n AMAT.ng
dng khc nhau s thy 8MB
cache c 3% miss ratio i vi
lbm trong khi bzip2 c 1.5%
miss ratio vi cache 256kB
Performance impacts of
memory access time
nh thng ca cace n hiu nng nh th no(CPI)?
100 chu k trng pht trt b m
3% t l trt
1.33 ln truy nhp b nh trn mt lnh (1 for instruction 33% for data)
CPI = 1.0 khi thc thi thng thng (l tng)
Hiu nng thay i th no?
CPI = CPIExecution + Memory stall cycles per instruction = 1.0 +
1.33*0.03*100 = 4.99
Phn cp b nh ny lm b x l chy chm hn 5 ln!
Caches l rt quan trng! Caches l quan trng nht tng hiu nng.
AMD/Intel caches
Heterogeneous processor
caches
Memory hierarchy
We want big and fast
Build a hierarchy where we keep the most important data in fast
memory
Other data goes in slow memory
If we move the data correctly we provide the illusion of fast and
big
Registers 3 accesses/cycle 3264
Cache 110 cycles 8kB256kB
Cache 40 cycles 420MB
DRAM 200 cycles 416GB
Flash 1000+ cycles 64512GB
Hard Disk 1M+ cycles 2 4TB