ASA Chapter 4
dce 2019

ADVANCED SYSTEM ARCHITECTURES
Memory Hierarchy Design

Khoa Khoa học và Kỹ thuật Máy tính (Faculty of Computer Science and Engineering)
BM Kỹ thuật Máy tính (Department of Computer Engineering), BK TP.HCM
©2019, dce
Processor-DRAM Performance Gap Impact

• Since 1980, CPU performance has outpaced DRAM.
• A four-issue 2 GHz superscalar accessing 100 ns DRAM could execute 800 instructions during the time for one memory access!
[Figure: performance (1/latency) vs. year. CPU improves about 60% per year, 2X in 1.5 years, far outpacing DRAM.]

• To illustrate the performance impact, assume a single-issue pipelined CPU with CPI = 1 using non-ideal memory.
• Ignoring other factors, the minimum cost of a full memory access, in number of wasted CPU cycles:
[Table (values not recovered): Year | CPU speed (MHz) | CPU cycle (ns) | Memory access (ns) | Minimum CPU memory stall cycles or instructions wasted.]
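The wasted-cycle counts in that table follow from a one-line formula: minimum stall cycles = memory access time / CPU cycle time. A minimal sketch (the function name is illustrative; the 2 GHz / 100 ns numbers are the slide's own example):

```python
# Minimum CPU cycles wasted on one full memory access:
# stall cycles = memory access time / CPU cycle time.
def min_stall_cycles(cpu_mhz: float, mem_access_ns: float) -> float:
    cycle_ns = 1000.0 / cpu_mhz      # CPU cycle time in ns
    return mem_access_ns / cycle_ns  # cycles wasted per access

# The slide's example: a 2 GHz CPU waiting on 100 ns DRAM stalls
# for 200 cycles; a four-issue machine could have executed
# 4 * 200 = 800 instructions in that time.
stalls = min_stall_cycles(2000, 100)  # 2 GHz = 2000 MHz
print(stalls)                         # 200.0 cycles
print(4 * stalls)                     # 800.0 lost instruction slots
```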
Addressing the Processor-Memory Performance Gap

• Goal: the illusion of large, fast, cheap memory. Let programs address a memory space that scales to the disk size, at a speed that is usually as fast as register access.
• Solution: put smaller, faster "cache" memories between the CPU and DRAM, creating a "memory hierarchy".

Levels of the Memory Hierarchy (upper levels are faster; lower levels are larger)

• CPU registers: 100s of bytes, <10s ns. Staging/transfer unit: instruction operands (1-8 bytes), managed by the programmer/compiler.
• Cache (today's focus): KBytes, 10-100 ns, 1-0.1 cents/bit. Transfer unit: blocks (8-128 bytes), managed by the cache controller.
• Main memory: MBytes, 200-500 ns, $.0001-.00001 cents/bit. Transfer unit: pages (512 bytes-4 KB), managed by the OS.
• Disk: GBytes, 10 ms (10,000,000 ns), 10^-5 - 10^-6 cents/bit. Transfer unit: files (MBytes), managed by the user/operator.
• Tape: effectively infinite capacity, seconds-minutes access time, 10^-8 cents/bit.
Common Predictable Patterns

• Temporal locality: recently accessed items are likely to be accessed again soon.
• Spatial locality: items near a recently accessed item are likely to be accessed soon.

Caches
Simple view of cache

[Diagram: Processor <-> Cache <-> Main Memory, connected by address and data lines.]

• The processor accesses the cache first.
• Cache hit: just use the data.
• Cache miss: replace a block in the cache with a block from main memory, then use the data.
• The data transferred between cache and main memory moves in blocks, and the transfer is controlled by independent hardware.
• Hit rate: fraction of accesses that hit in the cache.
• Miss rate: 1 - hit rate.
– Miss penalty: time to replace a block plus time to deliver the data to the processor.

• Example: for (i = 0; i < 10; i++) S = S + A[i];
[Diagram: memory block numbers 0-31 and their mapping into the cache.]

Replacement
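The loop above streams sequentially through A[0..9], so with multi-byte blocks only the first access to each block misses. A minimal direct-mapped simulation of that pattern (the block size, line count, and 4-byte element layout are illustrative assumptions, not from the slide):

```python
# Count hits/misses for the sequential loop  S = S + A[i], i = 0..9,
# assuming 4-byte elements packed contiguously and 16-byte blocks.
BLOCK_BYTES = 16
NUM_LINES = 8                    # small direct-mapped cache

cache = [None] * NUM_LINES       # each line holds a block number (or None)
hits = misses = 0
for i in range(10):
    addr = i * 4                 # byte address of A[i]
    block = addr // BLOCK_BYTES  # memory block number
    line = block % NUM_LINES     # direct-mapped placement
    if cache[line] == block:
        hits += 1
    else:
        misses += 1              # compulsory miss: fetch the whole block
        cache[line] = block

print(hits, misses)              # 7 hits, 3 misses
```

Spatial locality turns ten accesses into only three block transfers from memory.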
Basic Cache Design & Operation Issues

• Q1: Where can a block be placed? (block placement)
• Q2: How is a block found? (block identification)
• Q3: Which block should be replaced on a miss? (replacement)
• Q4: What happens on a write? (write strategy)
Direct-Mapped Cache

[Diagram: the address is split into Tag (t bits), Index (k bits), and Offset (b bits). The index selects one of the 2^k lines, each holding a Valid bit, a Tag, and a Data Block; the stored tag is compared (=) with the address tag to signal HIT and deliver the data word or byte.]

• Address: N bits (2^N bytes of memory).
• The cache has 2^k lines (blocks).
• Each line holds 2^b bytes.
• Block M is mapped to line M % 2^k.
• Need t = N - k - b tag bits to identify the memory block.
• Advantage: simple.
• Disadvantage: high miss rate.
• What if the CPU accesses blocks N0 and N1 where N0 % 2^k = N1 % 2^k?
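The tag/index/offset split described above can be sketched as a small helper (the function is illustrative, not from the slides; field widths are parameters):

```python
def split_address(addr: int, k: int, b: int):
    """Split an address into (tag, index, offset) for a direct-mapped
    cache with 2**k lines of 2**b bytes each."""
    offset = addr & ((1 << b) - 1)        # low b bits: byte within block
    index = (addr >> b) & ((1 << k) - 1)  # next k bits: cache line
    tag = addr >> (b + k)                 # remaining t = N - k - b bits
    return tag, index, offset

# Two blocks congruent mod 2**k map to the same line and evict
# each other (the N0/N1 conflict from the slide):
t0, i0, o0 = split_address(0x00001234, k=10, b=2)
t1, i1, o1 = split_address(0x00041234, k=10, b=2)
print(i0 == i1, t0 != t1)  # True True -> same line, different tags
```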
Direct-mapped Cache (cont.)

[Diagram: memory blocks 0-31 mapped onto 8 cache lines (0-7); accesses to N0 and N1 with N0 % 2^k = N1 % 2^k collide on the same line.]

4KB Direct Mapped Cache Example

[Diagram: 32-bit address (bit positions 31..0). Tag field = bits 31-12 (20 bits), Index field = bits 11-2 (10 bits), byte offset = bits 1-0. 1K = 1024 one-word blocks, each with a Valid bit, a 20-bit Tag, and 32-bit Data; a tag compare produces Hit and the Data output.]

• Can cache up to 2^32 bytes = 4 GB of memory.
• Mapping function: cache block frame number = (block address) MOD 1024, i.e. the index field, the 10 low bits of the block address.
64KB Direct Mapped Cache Example

[Diagram: 4K entries, each with a Valid bit, a 16-bit Tag, and a 128-bit (four-word) Data block; a 16-bit tag compare produces Hit, and a Mux selects one of the four 32-bit words using the block offset.]

• Can cache up to 2^32 bytes = 4 GB of memory.
• Block address = 28 bits: Tag = 16 bits, Index = 12 bits; block offset = 4 bits (word or byte within the block).
• Larger cache blocks take better advantage of spatial locality and thus may result in a lower miss rate.
• Mapping function: cache block frame number = (block address) MOD 4096, i.e. the index field, the 12 low bits of the block address.
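The field widths in this example can be sanity-checked from the definitions (N = 32, 4K lines, 16-byte blocks):

```python
# Verify the 64KB example's parameters against t = N - k - b.
N = 32                            # address bits -> 2**32 B = 4 GB cacheable
lines = 4096                      # 4K entries
block_bytes = 16                  # 128-bit (four-word) blocks
k = lines.bit_length() - 1        # index bits: 2**k = 4096 -> k = 12
b = block_bytes.bit_length() - 1  # offset bits: 2**b = 16  -> b = 4
t = N - k - b                     # tag bits: 32 - 12 - 4 = 16

print(k, b, t)              # 12 4 16
print(lines * block_bytes)  # 65536 bytes = 64 KB of data
```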
Fully Associative Cache

• CAM: Content Addressable Memory.
• Each block can be mapped to any line in the cache.
• Tag bits: t = N - b; the address tag is compared with the tags of all lines.
• Advantage: replacement occurs only when no room is available.

Set-Associative Cache

[Diagram: the address is split into Tag (t bits), Index (k bits), and Offset (b bits). Each of the 2^k sets holds several lines (ways), each with a Valid bit, a Tag, and a Data Block; the tags of all ways in the selected set are compared in parallel to produce HIT. The example shows 256 sets (0-255), 22-bit tags, and 32-bit data, and can cache up to 2^32 bytes = 4 GB of memory.]

• Block M is mapped to one of the w lines in set M % 2^k.
• Tag bits: t = N - k - b.
• A set-associative cache requires parallel tag matching and more complex hit logic, which may increase hit time.
• Mapping function for the example: cache set number = index = (block address) MOD 256.
Q2: How is a block found?

• The index selects which set to look in.
• Compare the tag to find the block.
• Increasing associativity shrinks the index and expands the tag. Fully associative caches have no index field.
• Direct-mapped: 1-way set associative.
• Fully associative: a single set.

Memory address layout: | Tag | Index | Block offset |, where Tag + Index form the block address.

What causes a MISS?

• Three major categories of cache misses:
– Compulsory misses: the first access to a block.
– Capacity misses: the cache cannot contain all the blocks needed to execute the program.
– Conflict misses: a block is replaced by another block and then retrieved again later (affects set-associative and direct-mapped caches).
• Nightmare scenario: the ping-pong effect!
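The ping-pong scenario, two blocks congruent mod 2^k alternately evicting each other, can be demonstrated with a small simulation; a 2-way set-associative cache of the same total size removes these conflict misses (the trace and sizes are illustrative assumptions):

```python
def count_misses(blocks, num_sets, ways):
    """Misses for a trace of block numbers; each set is an LRU list
    (least recently used first, most recently used last)."""
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for blk in blocks:
        s = sets[blk % num_sets]     # set selected by the index
        if blk in s:
            s.remove(blk)            # hit: refresh recency
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)             # evict the LRU block
        s.append(blk)
    return misses

# Ping-pong: blocks 0 and 8 are congruent mod 8 and alternate.
trace = [0, 8] * 5
print(count_misses(trace, num_sets=8, ways=1))  # 10: every access misses
print(count_misses(trace, num_sets=4, ways=2))  # 2: only compulsory misses
```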
Block Size and Spatial Locality
Q4: What happens on a write?

• Cache hit:
– Write-through: write both the cache and memory.
  • Generally higher traffic, but simplifies cache coherence.
– Write-back: write the cache only (memory is written only when the entry is evicted).
  • A dirty bit per block can further reduce the traffic.
• Cache miss:
– No-write-allocate: only write to main memory.
– Write-allocate (aka fetch-on-write): fetch the block into the cache.
• Common combinations:
– Write-through with no-write-allocate.
– Write-back with write-allocate.

Reading assignment 1

• Cache performance
– Replacement policy (algorithms)
– Optimization (miss rate, miss penalty, ...)
• References
– Hennessy & Patterson, Computer Architecture: A Quantitative Approach
– www2.lns.mit.edu/~avinatan/research/cache.pdf
– More on the internet
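The traffic difference between the two hit policies can be contrasted with a toy model that only counts memory writes; this is a sketch under stated assumptions (single block, counters only, class names invented here), not a full cache model:

```python
class WriteThroughCache:
    """Write-through: every write updates both cache and memory."""
    def __init__(self):
        self.mem_writes = 0
    def write(self, block):
        self.mem_writes += 1       # memory written on every store

class WriteBackCache:
    """Write-back: memory is written only when a dirty block is evicted."""
    def __init__(self):
        self.dirty = set()         # dirty bit per cached block
        self.mem_writes = 0
    def write(self, block):
        self.dirty.add(block)      # write hits stay in the cache
    def evict(self, block):
        if block in self.dirty:    # dirty block must be written back
            self.mem_writes += 1
            self.dirty.discard(block)

wt, wb = WriteThroughCache(), WriteBackCache()
for _ in range(100):               # 100 writes to the same block
    wt.write(0)
    wb.write(0)
wb.evict(0)
print(wt.mem_writes, wb.mem_writes)  # 100 vs 1: the dirty bit pays off
```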
Reading Log 1

• List some replacement-policy algorithms. Can they affect cache performance?
• Multi-level caches