The Memory Hierarchy: Topics
Topics
class10.ppt
        Tran.     Access
        per bit   time     Persist?   Sensitive?   Cost   Applications
SRAM    6         1X       Yes        No           100x   cache memories
DRAM    1         10X      No         Yes          1X     Main memories, frame buffers
15-213, F02
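[Figure: DRAM chip organization. The memory controller sends addr to the chip and exchanges 8 bits of data with it (to CPU); the chip's supercells are arranged in rows, with supercell (2,1) highlighted.]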
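[Figure: reading supercell (2,1), step 1. The memory controller sends RAS = 2 over the 2-bit addr lines to select a row.]
[Figure: reading supercell (2,1), step 2. The memory controller sends CAS = 1 over the 2-bit addr lines; supercell (2,1) is returned over the 8-bit data lines and passed to the CPU.]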
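The RAS and CAS requests select supercell (2,1) by sending the row number first and the column number second. A minimal sketch of that split, assuming a hypothetical 4 x 4 array of supercells numbered in row-major order (the struct and function names are illustrative, not from the slides):

```c
/* Hypothetical model: supercells form a rows x cols array, numbered row by row. */
typedef struct {
    unsigned row;  /* sent first, as the RAS request */
    unsigned col;  /* sent second, as the CAS request */
} supercell_addr;

/* Split a linear supercell index into its (row, col) pair. */
supercell_addr split_supercell_index(unsigned index, unsigned cols)
{
    supercell_addr a;
    a.row = index / cols;
    a.col = index % cols;
    return a;
}
```

With 4 columns, index 9 splits into row 2 and column 1, matching RAS = 2 and CAS = 1.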
Memory Modules
addr (row = i, col = j): supercell (i,j)
64 MB memory module consisting of eight 8Mx8 DRAMs (DRAM 0 through DRAM 7).
Each DRAM supplies 8 bits of the 64-bit doubleword sent to the memory controller: DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, and so on up to DRAM 7, which supplies bits 56-63.
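The way the eight 8-bit supercells combine into one 64-bit doubleword can be sketched in C. This is only an illustration of the bit layout described above; the function name and the array representation of the eight DRAM outputs are assumptions:

```c
#include <stdint.h>

/* Combine the bytes supplied by DRAM 0..DRAM 7 into one 64-bit doubleword.
   bytes[k] is the supercell read from DRAM k; DRAM 0 supplies bits 0-7. */
uint64_t assemble_doubleword(const uint8_t bytes[8])
{
    uint64_t word = 0;
    for (int k = 0; k < 8; k++)
        word |= (uint64_t)bytes[k] << (8 * k);  /* DRAM k fills bits 8k..8k+7 */
    return word;
}
```

For example, if DRAM 0 through DRAM 7 return 0x01 through 0x08, the assembled doubleword is 0x0807060504030201.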
Enhanced DRAMs
All enhanced DRAMs are built around the conventional
DRAM core.
Nonvolatile Memories
DRAM and SRAM are volatile memories
Types of ROMs
Firmware
[Figure: bus structure. The bus interface connects to the I/O bridge, which connects over the memory bus to main memory.]
[Figures: step-by-step memory transaction diagrams showing the register file (%eax), the bus interface, the I/O bridge, and main memory, with address A and the words x and y.]
Disk Geometry
Disks consist of platters, each with two surfaces.
Each surface consists of concentric rings called tracks.
Each track consists of sectors separated by gaps.
[Figure: a disk surface with its spindle, tracks (track k highlighted), sectors, and gaps.]
[Figure: multiple-platter view, with numbered surfaces on platters stacked on a common spindle.]
Disk Capacity
Capacity: maximum number of bits that can be stored.
Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
Example:
512 bytes/sector
300 sectors/track (on average)
20,000 tracks/surface
2 surfaces/platter
5 platters/disk
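The example can be worked through by multiplying the five per-unit counts together. A small sketch of that arithmetic (the function and parameter names are illustrative):

```c
#include <stdint.h>

/* Disk capacity in bytes: the product of the five per-unit counts. */
uint64_t disk_capacity_bytes(uint64_t bytes_per_sector,
                             uint64_t avg_sectors_per_track,
                             uint64_t tracks_per_surface,
                             uint64_t surfaces_per_platter,
                             uint64_t platters_per_disk)
{
    return bytes_per_sector * avg_sectors_per_track * tracks_per_surface
         * surfaces_per_platter * platters_per_disk;
}
```

Calling disk_capacity_bytes(512, 300, 20000, 2, 5) gives 30,720,000,000 bytes, or 30.72 GB when 1 GB is taken as 10^9 bytes.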
[Figures: disk views showing the spindle and the arm.]
Rotational latency: time waiting for the first bit of the target sector to pass under the r/w head.
Tavg rotation = 1/2 x (1/RPM) x (60 sec/1 min)
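As a quick check of the formula, the average rotational latency can be computed for a given rotation rate. The sketch below assumes a drive spinning at 7,200 RPM purely as an example value; the function name is illustrative:

```c
/* Average rotational latency in milliseconds: half of one full rotation. */
double avg_rotation_ms(double rpm)
{
    double secs_per_rotation = 60.0 / rpm;    /* (1/RPM) x (60 sec/1 min) */
    return 0.5 * secs_per_rotation * 1000.0;  /* half a rotation, in ms */
}
```

At 7,200 RPM one rotation takes about 8.3 ms, so the average wait is roughly 4.2 ms.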
Derived:
Important points:
The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)
I/O Bus
[Figure: I/O bus architecture. On the CPU chip, the register file and ALU connect through the bus interface to the system bus and the I/O bridge. The I/O bridge connects over the memory bus to main memory and over the I/O bus to the USB controller (mouse, keyboard), the graphics adapter (monitor), and the disk controller (disk).]
[Figures: three further views of the same diagram, showing main memory, the bus interface, the I/O bus, and the attached controllers.]
Storage Trends
SRAM
metric              1980     1985    1990    1995    2000    2000:1980
$/MB                19,200   2,900   320     256     100     190
access (ns)         300      150     35      15      2       100

DRAM
metric              1980     1985    1990    1995    2000    2000:1980
$/MB                8,000    880     100     30      1       8,000
access (ns)         375      200     100     70      60      6
typical size (MB)   0.064    0.256   4       16      64      1,000

Disk
metric              1980     1985    1990    1995    2000    2000:1980
$/MB                500      100     8       0.30    0.05    10,000
access (ms)         87       75      28      10      8       11
typical size (MB)   1        10      160     1,000   9,000   9,000
                    1980     1985    1990    1995    2000    2000:1980
processor           8080     286     386     Pent    P-III
clock rate (MHz)    1        6       20      150     750     750
cycle time (ns)     1,000    166     50      6       1.6     750
[Figure: plot of time in ns (log scale, 1 to 100,000) against year (1980 to 2000).]
Locality
Principle of Locality:
Locality Example:
    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
Data
Reference array elements in succession
(stride-1 reference pattern): Spatial locality
Reference sum each iteration: Temporal locality
Instructions
Reference instructions in sequence: Spatial locality
Cycle through loop repeatedly: Temporal locality
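For reference, the loop can be wrapped in a complete function so the locality of each reference is visible in context. The function name sumvec and its signature are assumptions added for illustration:

```c
/* Sum the n elements of vector a. */
int sumvec(const int *a, int n)
{
    int i, sum = 0;      /* sum is referenced on every iteration: temporal locality */

    for (i = 0; i < n; i++)
        sum += a[i];     /* a[0], a[1], ... in order: stride-1, spatial locality */
    return sum;
}
```

The loop body is a handful of instructions fetched in sequence and re-executed n times, which gives the instruction stream both spatial and temporal locality.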
Locality Example
Claim: Being able to look at code and get a qualitative
sense of its locality is a key skill for a professional
programmer.
Locality Example
Question: Does this function have good locality?
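To make the question concrete, here is a sketch of two hypothetical functions that sum the same 2-d array in different orders. The names, the dimensions M and N, and the functions themselves are assumptions for illustration. C stores arrays in row-major order, so the order of the loops determines the stride:

```c
#define M 3   /* hypothetical number of rows */
#define N 4   /* hypothetical number of columns */

/* Row-wise scan: the inner loop walks along a row, so consecutive
   accesses touch adjacent elements (stride 1, good spatial locality). */
int sumarrayrows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-wise scan: the inner loop walks down a column, so consecutive
   accesses are N elements apart (stride N, poor spatial locality). */
int sumarraycols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```

Both functions return the same sum; only the order in which memory is touched differs.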
Locality Example
Question: Can you permute the loops so that the
function scans the 3-d array a[] with a stride-1
reference pattern (and thus has good spatial
locality)?
int sumarray3d(int a[M][N][N])
{
    int i, j, k, sum = 0;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++)   /* k indexes the first dimension (size M) */
                sum += a[k][i][j];
    return sum;
}
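One permutation that answers the question makes the loop over the first subscript outermost and the loop over the last subscript innermost, so the innermost loop moves through adjacent elements. In this sketch the sizes M and N are small hypothetical values, the function name is illustrative, and each loop bound matches the dimension its variable indexes:

```c
#define M 2   /* hypothetical size of the first dimension */
#define N 3   /* hypothetical size of the second and third dimensions */

/* Scan a[k][i][j] with k outermost and j innermost: stride-1 accesses. */
int sumarray3d_stride1(int a[M][N][N])
{
    int i, j, k, sum = 0;

    for (k = 0; k < M; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                sum += a[k][i][j];
    return sum;
}
```

Because the rightmost subscript varies fastest, the scan visits elements in exactly the order they are laid out in memory.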
Memory Hierarchies
Some fundamental and enduring properties of hardware
and software:
Fast storage technologies cost more per byte and have less
capacity.
The gap between CPU and main memory speed is widening.
Well-written programs tend to exhibit good locality.
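The widening gap can be made concrete by expressing one DRAM access in CPU cycles, using the DRAM access times and CPU cycle times listed under Storage Trends (375 ns and 1,000 ns in 1980; 60 ns and 1.6 ns in 2000). A minimal sketch with an illustrative function name:

```c
/* Number of CPU cycles that elapse during one memory access. */
double access_in_cycles(double access_ns, double cycle_ns)
{
    return access_ns / cycle_ns;
}
```

With the 1980 figures a DRAM access costs well under one cycle (375 / 1,000), while with the 2000 figures it costs about 37.5 cycles (60 / 1.6).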
L0: registers
L1: on-chip L1 cache (SRAM)
L2: off-chip L2 cache (SRAM)
L3: main memory (DRAM)
Moving down the hierarchy, through L4 and L5, storage devices become larger, slower, and cheaper (per byte).
Caches
Cache: A smaller, faster storage device that acts as a
staging area for a subset of the data in a larger,
slower device.
Fundamental idea of a memory hierarchy:
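[Figure: caching in a memory hierarchy. Level k holds a subset of the numbered blocks stored at level k+1.]
[Figure: cache hit and cache miss. Requests for blocks 12 and 14 are served from level k when the block is present there (cache hit) and fetched from level k+1 when it is not (cache miss).]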
Conflict miss
Most caches limit blocks at level k+1 to a small subset of the block positions at level k.
Capacity miss
Occurs when the set of active cache blocks (working set) is larger than the cache.
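A conflict miss can be demonstrated with a tiny simulation. The sketch below assumes a hypothetical placement rule in which block b of level k+1 may only occupy slot (b mod 4) at level k; the slot count, the function name, and the trace representation are all illustrative assumptions:

```c
#define NUM_SLOTS 4   /* hypothetical: level k holds 4 blocks */

/* Count misses for a trace of non-negative block numbers when block b
   may only be placed in slot (b mod NUM_SLOTS) of level k. */
int count_misses(const int *trace, int n)
{
    int slots[NUM_SLOTS];
    int misses = 0;

    for (int s = 0; s < NUM_SLOTS; s++)
        slots[s] = -1;                 /* -1 marks an empty slot */

    for (int t = 0; t < n; t++) {
        int s = trace[t] % NUM_SLOTS;  /* the only slot this block may use */
        if (slots[s] != trace[t]) {    /* miss: fetch block, evict occupant */
            slots[s] = trace[t];
            misses++;
        }
    }
    return misses;
}
```

The trace 0, 8, 0, 8, 0, 8 misses on all six references, because blocks 0 and 8 compete for slot 0 even though three slots stay empty. The trace 0, 1, 0, 1, 0, 1 misses only on its first two references.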
Cache Type             What Cached            Where Cached          Latency (cycles)   Managed By
Registers              4-byte word            CPU registers         0                  Compiler
TLB                    Address translations   On-Chip TLB           0                  Hardware
L1 cache               32-byte block          On-Chip L1            1                  Hardware
L2 cache               32-byte block          Off-Chip L2           10                 Hardware
Virtual Memory         4-KB page              Main memory           100                Hardware+OS
Buffer cache           Parts of files         Main memory           100                OS
Network buffer cache   Parts of files         Local disk            10,000,000         AFS/NFS client
Browser cache          Web pages              Local disk            10,000,000         Web browser
Web cache              Web pages              Remote server disks   1,000,000,000      Web proxy server