Part04 2 Caches
Vietnam National University, Hanoi (VNU)
VNU University of Engineering and Technology
Computer Architecture
Lecture 5: Caches
Xuan-Tu Tran
VNU University of Engineering and Technology &
VNU Information Technology Institute
Vietnam National University, Hanoi
Key Characteristics of Computer Memory Systems
• Location
– Internal (e.g. processor registers, cache, main memory)
– External (e.g. optical disks, magnetic disks, tapes)
• Capacity
– Number of words
– Number of bytes
• Unit of Transfer
– Word
– Block
• Access Method
– Sequential
– Direct
– Random
– Associative
• Performance
– Access time
– Cycle time
– Transfer rate
• Physical Type
– Semiconductor
– Magnetic
– Optical
– Magneto-optical
• Physical Characteristics
– Volatile/nonvolatile
– Erasable/nonerasable
• Organization
– Memory modules
9/27/2022
Characteristics of Memory Systems
• Location
– Refers to whether memory is internal or external to the
computer
– Internal memory is often equated with main memory
– Processor requires its own local memory, in the form of registers
– Cache is another form of internal memory
– External memory consists of peripheral storage devices that are
accessible to the processor via I/O controllers
• Capacity
– Memory capacity is typically expressed in terms of bytes
• Unit of transfer
– For internal memory the unit of transfer is equal to the number of
electrical lines into and out of the memory module
Method of Accessing Units of Data
Capacity and Performance:
Memory
Memory Hierarchy
[Figure: the memory hierarchy pyramid, top to bottom. Inboard memory: registers, cache, main memory. Outboard storage: magnetic disk, CD-ROM, CD-RW, DVD-RW, DVD-RAM, Blu-Ray. Off-line storage: magnetic tape.]
Cache example
Memory
Cache & main memory
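[Figure: cache and main memory. Data moves by word transfer on the processor side of the cache and by block transfer on the main memory side. Speed labels across the levels, from processor to main memory: fastest, fast, less fast, slow.]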
Cache/Main memory structure
[Figure: (a) Cache: C lines, numbered 0 to C–1, each holding a tag and a block of K words (the block length). (b) Main memory: words at addresses 0 to 2^n – 1, each one word length wide, grouped into blocks Block 0 to Block M – 1 of K words each.]
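The structure above implies a simple relation: a main memory of 2^n addressable words, split into blocks of K words, contains M = 2^n / K blocks, which is normally far more than the C lines of the cache. A minimal sketch of that arithmetic follows; the function name and the numbers are illustrative assumptions, not values from the slides.

```python
def num_blocks(address_bits: int, words_per_block: int) -> int:
    """Number of main-memory blocks, M = 2**n / K."""
    total_words = 2 ** address_bits
    if total_words % words_per_block != 0:
        raise ValueError("block size must divide the memory size")
    return total_words // words_per_block


# Hypothetical example: n = 24 address bits, K = 4 words per block.
print(num_blocks(24, 4))  # 4194304
```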
[Flowchart: cache read operation. START → receive address RA from CPU → if the block containing RA is not in the cache, load the main memory block into a cache line → deliver RA word to CPU → DONE.]
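The read flow above can be sketched in software. This is a toy model under stated assumptions, not a hardware-accurate design: the class name `SimpleCache`, the block size of 4 words, and the unlimited number of lines (so no replacement is ever needed) are all choices made for illustration.

```python
class SimpleCache:
    """Toy model of the cache read flow (hypothetical, unlimited capacity)."""

    def __init__(self, main_memory, block_size=4):
        self.main_memory = main_memory  # list of words, indexed by address
        self.block_size = block_size    # K words per block (assumed value)
        self.lines = {}                 # block number -> list of K words
        self.hits = 0
        self.misses = 0

    def read(self, ra):
        """Return the word at address RA, loading its block on a miss."""
        block_no, offset = divmod(ra, self.block_size)
        if block_no in self.lines:
            self.hits += 1
        else:
            # Miss: load the whole main-memory block into a cache line.
            self.misses += 1
            start = block_no * self.block_size
            self.lines[block_no] = self.main_memory[start:start + self.block_size]
        # Deliver the requested word to the CPU.
        return self.lines[block_no][offset]


memory = list(range(100, 116))  # 16 words of made-up data
cache = SimpleCache(memory)
first = cache.read(5)   # miss: block 1 is loaded
second = cache.read(6)  # hit: same block
print(first, second, cache.hits, cache.misses)  # 105 106 1 1
```

The second read hits because address 6 lies in the block that the first read brought in, which is the locality effect the cache relies on.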
Typical Cache organization
[Figure: typical cache organization. The processor and cache exchange address, control, and data signals; an address buffer and a data buffer connect them to the system bus.]
Elements of Cache Design
Cache Addresses
• Virtual memory
– Facility that allows programs to address memory from
a logical point of view, without regard to the amount of
main memory physically available
– When used, the address fields of machine instructions
contain virtual addresses
– For reads from and writes to main memory, a
hardware memory management unit (MMU) translates
each virtual address into a physical address in main
memory
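To make the translation step concrete, here is a minimal sketch that assumes a paged scheme, which is one common way an MMU maps addresses. The page size, the page table contents, and the function name `translate` are all hypothetical; the slides do not specify them.

```python
PAGE_SIZE = 4096  # bytes per page (assumed; the slides give no page size)

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}


def translate(virtual_address: int) -> int:
    """Map a virtual address to a physical address, as an MMU would."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError(f"page fault: virtual page {vpn} is not mapped")
    # The offset within the page is unchanged; only the page number is mapped.
    return page_table[vpn] * PAGE_SIZE + offset


print(hex(translate(0x1234)))  # 0x2234
```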
Caches with virtual memory
[Figure: two arrangements of processor, cache, and main memory, each with a data path linking the three.]
Cache sizes of some processors
Processor | Type | Year of Introduction | L1 Cache (a) | L2 Cache | L3 Cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 kB | - | -
PDP-11/70 | Minicomputer | 1975 | 1 kB | - | -
VAX 11/780 | Minicomputer | 1978 | 16 kB | - | -
IBM 3033 | Mainframe | 1978 | 64 kB | - | -
IBM 3090 | Mainframe | 1985 | 128 to 256 kB | - | -
Intel 80486 | PC | 1989 | 8 kB | - | -
Pentium | PC | 1993 | 8 kB/8 kB | 256 to 512 KB | -
PowerPC 601 | PC | 1993 | 32 kB | - | -
PowerPC 620 | PC | 1996 | 32 kB/32 kB | - | -
PowerPC G4 | PC/server | 1999 | 32 kB/32 kB | 256 KB to 1 MB | 2 MB
IBM S/390 G6 | Mainframe | 1999 | 256 kB | 8 MB | -
Pentium 4 | PC/server | 2000 | 8 kB/8 kB | 256 KB | -
IBM SP | High-end server/supercomputer | 2000 | 64 kB/32 kB | 8 MB | -
CRAY MTA (b) | Supercomputer | 2000 | 8 kB | 2 MB | -
Itanium | PC/server | 2001 | 16 kB/16 kB | 96 KB | 4 MB
Itanium 2 | PC/server | 2002 | 32 kB | 256 KB | 6 MB
IBM POWER5 | High-end server | 2003 | 64 kB | 1.9 MB | 36 MB
CRAY XD-1 | Supercomputer | 2004 | 64 kB/64 kB | 1 MB | -
IBM POWER6 | PC/server | 2007 | 64 kB/64 kB | 4 MB | 32 MB
IBM z10 | Mainframe | 2008 | 64 kB/128 kB | 3 MB | 24-48 MB
Intel Core i7 EE 990 | Workstation/server | 2011 | 6 × 32 kB/32 kB | 1.5 MB | 12 MB
IBM zEnterprise 196 | Mainframe/server | 2011 | 24 × 64 kB/128 kB | 24 × 1.5 MB | 24 MB L3, 192 MB L4
Mapping Function
Example system
[Figure: mapping from main memory to cache. b = length of block in bits; t = length of tag in bits.
(a) Direct mapping: the first m blocks of main memory (B0 to Bm–1, equal to the size of the cache) map one-to-one onto the m cache lines L0 to Lm–1; each line holds a t-bit tag and a b-bit block.
(b) Associative mapping: one block of main memory can be placed in any of the cache lines L0 to Lm–1.]
Direct mapping
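[Figure: direct-mapping cache organization. The (s+w)-bit memory address is split into a tag of s–r bits, a line field of r bits, and a word field of w bits. The line field selects one cache line Li, and the tag stored there is compared with the address tag. On a match (hit in cache), the word field selects one of the words W4j to W(4j+3) of block Bj; on no match (miss in cache), the block is read from main memory.]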
[Figure: direct mapping example. Memory address values are in binary representation; other values are in hexadecimal.]
16-MByte main memory (32-bit data):
Tag | Address | Data
00 | 000000001111111111111000 | (not shown)
00 | 000000001111111111111100 | (not shown)
16 | 000101100000000000000000 | 77777777
16 | 000101100000000000000100 | 11235813
16 | 000101101111111111111100 | 12345678
FF | 111111110000000000000000 | (not shown)
FF | 111111110000000000000100 | (not shown)
FF | 111111111111111111111000 | 11223344
FF | 111111111111111111111100 | 24682468
16-Kline cache (8-bit tag, 32-bit data):
Tag | Data | Line Number
00 | 13579246 | 0000
16 | 11235813 | 0001
FF | 11223344 | 3FFE
16 | 12345678 | 3FFF
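The example above uses a 24-bit address (16 MByte of memory), an 8-bit tag, and a 16-Kline cache, which leaves 14 bits for the line number and 2 bits for the word within a block. A minimal sketch of the address split under those widths follows; the function name `split_direct` is hypothetical.

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2  # widths taken from the example figure


def split_direct(address: int):
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


# Address from the figure whose data word is 12345678.
tag, line, word = split_direct(0b000101101111111111111100)
print(f"{tag:02X} {line:04X} {word}")  # 16 3FFF 0
```

The result agrees with the figure, where the word 12345678 sits in line 3FFF with tag 16.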
Direct Mapping Summary
Victim Cache
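[Figure: fully associative cache organization. The (s+w)-bit memory address is split into an s-bit tag and a w-bit word field. The address tag is compared with the tag of every cache line, up to Lm–1. On a match (hit in cache), the word field selects one of the words W4j to W(4j+3) of block Bj; on no match (miss in cache), the block is read from main memory.]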
[Figure: associative mapping example.]
Main Memory Address = Tag (22 bits) + Word (2 bits)
Main memory:
Tag | Address | Data
058CE6 | 000101100011001110011000 | (not shown)
058CE7 | 000101100011001110011100 | FEDCBA98
058CE8 | 000101100011001110100000 | (not shown)
16 Kline Cache (22-bit tag, 32-bit data):
Tag | Data | Line Number
3FFFFE | 11223344 | 0000
058CE7 | FEDCBA98 | 0001
3FFFFD | 33333333 | 3FFD
000000 | 13579246 | 3FFE
3FFFFF | 24682468 | 3FFF
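With associative mapping the tag is simply the address without its 2 word bits, and a lookup compares that tag against every line. The sketch below uses a loop where hardware would compare all tags in parallel; the function names and the short list of lines are assumptions for illustration, with values copied from the example.

```python
WORD_BITS = 2  # the low 2 bits of the address select the word field


def split_associative(address: int):
    """Split a 24-bit address into (22-bit tag, 2-bit word)."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


def lookup(cache_lines, address):
    """cache_lines: list of (tag, data) pairs. Returns data on a hit, else None."""
    tag, _ = split_associative(address)
    for stored_tag, data in cache_lines:  # hardware checks all lines at once
        if stored_tag == tag:
            return data
    return None


lines = [(0x3FFFFE, 0x11223344), (0x058CE7, 0xFEDCBA98), (0x3FFFFF, 0x24682468)]
hit = lookup(lines, 0b000101100011001110011100)
print(f"{hit:08X}")  # FEDCBA98
```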
Associative Mapping Summary
Set Associative Mapping
[Figure: mapping from main memory to a set-associative cache, seen two ways.
(a) v associative-mapped caches: the first v blocks of main memory (B0 to Bv–1, equal to the number of sets) map to the v sets; each set, such as cache memory set 0, holds k lines L0 to Lk–1.
(b) k direct-mapped caches: the first v blocks of main memory (equal to the number of sets) map to the v lines L0 to Lv–1 of each way, from cache memory way 1 to cache memory way k; the lines at the same index across the ways form one set.]
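[Figure: k-way set-associative cache organization. The (s+w)-bit memory address is split into a tag of s–d bits, a set field of d bits, and a word field of w bits. The set field selects one set (Set 0, and so on), and the address tag is compared with the tags of the k lines in that set. On no match (miss in cache), block Bj is read from main memory.]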
Set Associative Mapping Summary
[Figure: two-way set-associative mapping example. Memory address values are in binary representation; other values are in hexadecimal.]
16 MByte Main Memory (32-bit data):
Tag | Address | Data
000 | 000000001111111111111000 | (not shown)
000 | 000000001111111111111100 | (not shown)
02C | 000101100000000000000000 | 77777777
02C | 000101100000000000000100 | 11235813
Cache:
Set Number | Tag, Data | Tag, Data
0000 | 000, 13579246 | 02C, 77777777
0001 | 02C, 11235813 | (not shown)
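In this example the tag 02C is the top 9 bits of the 24-bit address, so the remaining bits divide into a 13-bit set number and a 2-bit word field. These widths are inferred from the tag and set values shown, not stated on the slide. A minimal sketch, with a hypothetical function name:

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2  # inferred from the example values


def split_set_associative(address: int):
    """Split a 24-bit address into (tag, set number, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word


# Address from the figure whose data word is 11235813.
tag, set_no, word = split_set_associative(0b000101100000000000000100)
print(f"{tag:03X} {set_no:04X} {word}")  # 02C 0001 0
```

Only the set number picks a location; within the selected set the block may occupy any of the k lines, which is why the tag comparison covers k lines rather than one.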
[Figure: hit ratio (0.0 to 1.0) versus cache size (1k to 1M bytes) for direct, 2-way, 4-way, 8-way, and 16-way set-associative caches.]
Replacement Algorithms
• Once the cache has been filled, when a new block is brought
into the cache, one of the existing blocks must be replaced
• For direct mapping there is only one possible line for any
particular block and no choice is possible
• For the associative and set-associative techniques a
replacement algorithm is needed
• To achieve high speed, an algorithm must be implemented in
hardware
The most common replacement algorithms are:
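As an illustration, least recently used (LRU), one widely used policy, can be sketched in software for a single set of a set-associative cache. The class name `LRUSet` and the use of `OrderedDict` are choices made for this sketch; real caches track recency in hardware with a few status bits per line.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement (hypothetical software model)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag, data=None):
        """Return True on a hit. On a miss, insert tag, evicting the LRU line if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = data
        return False


s = LRUSet(ways=2)
results = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(results)  # [False, False, True, False, False]
```

In the trace, the access to C evicts B rather than A because A was touched more recently, so the final access to B misses again.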
Write Policy
• If at least one write operation has been performed on a word in a cache line that is about to be replaced, then main memory must be updated by writing the line of cache out to the block of memory before bringing in the new block
• A more complex problem occurs when multiple processors are attached to the same bus and each processor has its own local cache: if a word is altered in one cache, it could conceivably invalidate a word in other caches
Write Through and Write Back
• Write through
– Simplest technique
– All write operations are made to main memory as well as to the cache
– The main disadvantage of this technique is that it generates
substantial memory traffic and may create a bottleneck
• Write back
– Minimizes memory writes
– Updates are made only in the cache
– Portions of main memory are invalid and hence accesses by I/O
modules can be allowed only through the cache
– This makes for complex circuitry and a potential bottleneck
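The difference in memory traffic between the two policies can be sketched by counting writes that reach main memory for one cache line. This is a simplified model under stated assumptions: the class and function names are hypothetical, a write-back counts as one block write, and the dirty flag stands for the "at least one write has been performed" condition.

```python
class CacheLine:
    """One cache line under a given write policy (hypothetical model)."""

    def __init__(self, policy):
        self.policy = policy        # "write-through" or "write-back"
        self.dirty = False          # set once the line has been written
        self.memory_writes = 0      # writes that reach main memory

    def write(self):
        if self.policy == "write-through":
            self.memory_writes += 1  # every write also goes to main memory
        else:
            self.dirty = True        # update the cache only

    def evict(self):
        if self.policy == "write-back" and self.dirty:
            self.memory_writes += 1  # write the whole line back once
            self.dirty = False


def count_memory_writes(policy, n_writes):
    line = CacheLine(policy)
    for _ in range(n_writes):
        line.write()
    line.evict()
    return line.memory_writes


print(count_memory_writes("write-through", 10),
      count_memory_writes("write-back", 10))  # 10 1
```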
Line Size
• When a block of data is retrieved, adjacent words are retrieved with it
• As block size increases, more useful data are brought into the cache
• Two specific effects come into play:
– Larger blocks reduce the number of blocks that fit into a cache
– As a block becomes larger, each additional word is farther from the requested word
Multilevel Caches
Figure 4.17 Total Hit Ratio (L1 and L2) for 8 Kbyte and 16 Kbyte L1
[Plot: hit ratio (0.78 to 0.98) versus L2 cache size (1k to 2M bytes), with one curve for L1 = 8k and one for L1 = 16k.]
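A total hit ratio of this kind counts a reference as a hit if either L1 or L2 can supply it. A minimal sketch of that arithmetic follows, assuming `h2_local` is the fraction of L1 misses that hit in L2; the function name and the numbers are hypothetical and are not read from the figure.

```python
def total_hit_ratio(h1: float, h2_local: float) -> float:
    """Fraction of references satisfied by L1 or, failing that, by L2."""
    return h1 + (1.0 - h1) * h2_local


# Hypothetical values: L1 hits 90% of references, L2 catches half of the rest.
print(round(total_hit_ratio(0.90, 0.50), 2))  # 0.95
```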
Unified Versus Split Caches
Intel Cache Evolution
• Problem: External memory slower than the system bus.
– Solution: Add external cache using faster memory technology. (Feature first appears on: 386)
• Problem: Increased processor speed results in external bus becoming a bottleneck for cache access.
– Solution: Move external cache on-chip, operating at the same speed as the processor. (Feature first appears on: 486)
• Problem: Internal cache is rather small, due to limited space on chip.
– Solution: Add external L2 cache using faster technology than main memory. (Feature first appears on: 486)
• Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place.
– Solution: Create separate data and instruction caches. (Feature first appears on: Pentium)
• Problem: Increased processor speed results in external bus becoming a bottleneck for L2 cache access.
– Solution: Create separate back-side bus (BSB) that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. (Feature first appears on: Pentium Pro)
– Solution: Move L2 cache on to the processor chip. (Feature first appears on: Pentium II)
• Problem: Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small.
– Solution: Add external L3 cache. (Feature first appears on: Pentium III)
– Solution: Move L3 cache on-chip. (Feature first appears on: Pentium 4)
Pentium 4 Block Diagram
System Bus
Summary
Chapter 4 Cache Memory
• Computer memory system overview
– Characteristics of Memory Systems
– Memory Hierarchy
• Cache memory principles
• Elements of cache design
– Cache addresses
– Cache size
– Mapping function
– Replacement algorithms
– Write policy
– Line size
– Number of caches
• Pentium 4 cache organization
Intel Coffee Lake
ARM Cache
ARM Cache Organization …
Apple A8 CPU
• 2 cores
• Max. CPU clock: 1.38 GHz
• Min. feature size: 20 nm
• Instruction set: ARMv8-A
• L1 cache: Per core: 64 KB instruction + 64 KB data
• L2 cache: 1 MB shared
• L3 cache: 4 MB
• 1 GB of LPDDR3 RAM included in the package
• GPU: PowerVR Series 6XT GX6450 (quad core)
• 2 billion transistors, physical size reduced by 13% to 89 mm²
• Produced by Taiwan Semiconductor Manufacturing Company Limited (TSMC)
Apple A8X, 10/2014
• Cores: 3
• Max. CPU clock rate: 1.5 GHz
• Min. feature size: 20 nm
• Instruction set: A64, A32, T32
• Microarchitecture: Typhoon ARMv8-A-compatible
• L1 cache: per core 64 KB instruction + 64 KB data
• L2 cache: 2 MB shared
• L3 cache: 4 MB
• Predecessor: Apple A7
• Successor: Apple A9X
• GPU: PowerVR Series 6XT GXA6850 (octa-core)
Apple A9
• Cores: 2
• Max. CPU clock rate: 2.16–2.26 GHz
• Min. feature size: 16 nm (TSMC) or 14 nm (Samsung)
• Instruction set: A64, A32, T32
• Microarchitecture: Twister, ARMv8-A-compatible
• L1 cache: per core 64 KB instruction + 64 KB data
• L2 cache: 3 MB shared
• L3 cache: 4 MB (not for A9X)
• GPU: PowerVR Series 7XT GT7600 (six-core); A9X: PowerVR Series 7XT (12 cores)
• Included RAM: 2 GB of LPDDR4 (A9X: 4 GB)
Apple A10 Fusion
• Cores: 2
• Max. CPU clock rate: 2.34 GHz
• Min. feature size: 16 nm (TSMC)
• Instruction set: A64, A32, T32
• Microarchitecture: Hurricane, ARMv8-A-compatible
• L1 cache: per core 64 KB instruction + 64 KB data
• L2 cache: 3 MB shared
• L3 cache: 4 MB
• GPU: six-core
• Included LPDDR4 RAM: 2 GB (iPhone 7), 3 GB (iPhone 7 Plus)
A11 Bionic
A12 Bionic
Qualcomm Snapdragon
Exynos 7 Octa 7420, 2015
• Technology: 14 nm LPE
• Instruction Set: ARMv8-A
• Microarchitecture: Cortex-A57 + Cortex-A53 (big.LITTLE with GTS)
• Cores: 4 (2.1GHz) + 4 (1.5GHz)
• GPU: Mali-T760 MP8 @ 772 MHz
Exynos 8 Octa 8890
• Technology: 14 nm LPE
• Instruction Set: ARMv8-A
• Microarchitecture: Exynos M1 "Mongoose" + Cortex-A53 (GTS)
• Cores: 4 (2.2-2.6 GHz) + 4 (1.6 GHz)
• GPU: Mali-T880 MP12 @ 650 MHz
• RAM: LPDDR4, 1794 MHz
• Used in: Samsung Galaxy S7, S7 Edge, Note 7
Exynos 9 Series (8895)