Basics of Memories


The Memory Hierarchy

1. Overview of the memory hierarchy
2. Basics of Cache Memories
3. Virtual Memory
4. Satisfying a memory request
5. Performance and some cache memory optimizations

The Memory Hierarchy

Basic Problem: How to design a reasonable-cost memory system that can
deliver data at speeds close to the CPU's consumption rate.

Current Answer: Construct a memory hierarchy with slow (inexpensive,
large) components at the higher levels and with fast (most expensive,
smallest) components at the lowest level.

Migration: As information is referenced, migrate the data into and out
of the lowest-level memories.

[Figure: the memory hierarchy, from slowest/largest to fastest/smallest:
Disk, Main Memory, Cache (L3), Cache (L2), I-Cache/D-Cache, CPU/core]

How does this help?

- Programs are well behaved and tend to follow the observed
  "principles of locality" (illustrated in the sketch below):
  - Principle of temporal locality: a referenced data object will
    likely be referenced again in the near future.
  - Principle of spatial locality: if data at location x is referenced
    at time t, then it is likely that a nearby data location x + ∆x
    will be referenced at a nearby time t + ∆t.
- Also consider the 90/10 rule: a program executes about 90% of its
  instructions from about 10% of its code space.

Cache Memories

- Fixed-size blocks mapped to the cache (from the [physical or
  virtual] memory space).
- Due to speed considerations, all operations are implemented
  in hardware.
- 4 types of mapping policies:
  - fully associative
  - direct mapped
  - set associative
  - sector mapped (not discussed further)

Fully Associative Cache

Any block in the (physical or virtual) memory can be mapped into any
block in the cache (just like paged virtual memory). Memory addresses
are formed as ⟨n | d⟩, where n is the memory tag and d is the
address in the block.

[Figure: any of memory blocks 0..M can occupy any of cache blocks
0..N−1; each cache block stores a tag alongside its data]

Lookup Algorithm:
    if ∃α : 0 ≤ α < N ∧ cache[α].tag = n
    then return cache[α].word[d]
    else cache-miss
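The same lookup as a C sketch (N, block size, and structure names are
hypothetical; real hardware compares all N tags in parallel rather than
looping):

    #include <stdbool.h>
    #include <stdint.h>

    #define N 64   /* cache blocks (assumption) */
    #define B 16   /* words per block (assumption) */

    struct cache_block {
        bool     valid;
        uint64_t tag;
        uint32_t word[B];
    };

    struct cache_block cache[N];

    /* Fully associative lookup: the tag n may live in any block,
     * so every valid tag is compared against n. */
    bool fa_lookup(uint64_t n, unsigned d, uint32_t *out) {
        for (unsigned a = 0; a < N; a++) {
            if (cache[a].valid && cache[a].tag == n) {
                *out = cache[a].word[d];
                return true;    /* hit */
            }
        }
        return false;           /* cache-miss */
    }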

Direct Mapped Cache

If the cache is partitioned into N fixed-size blocks, then each cache
block k will contain only the (physical or virtual) memory blocks
k + nN, where n = 0, 1, .... Memory addresses are formed as
⟨n | k | d⟩, where n is the memory tag, k is the cache block index,
and d is the address in the block.

[Figure: memory block m maps to exactly one cache block, m mod N; each
cache block stores a tag alongside its data]

Lookup Algorithm:
    if cache[k].tag = n
    then return cache[k].word[d]
    else cache-miss
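A sketch of how the ⟨n | k | d⟩ fields fall out of an address, under
assumed widths (16 words per block and 64 cache blocks, i.e. 4 offset
bits and 6 index bits; both values are hypothetical):

    #include <stdint.h>

    #define OFFSET_BITS 4   /* 16 words per block (assumption) */
    #define INDEX_BITS  6   /* 64 cache blocks (assumption) */

    /* d: the low OFFSET_BITS bits select the word in the block */
    static inline uint64_t addr_d(uint64_t addr) {
        return addr & (((uint64_t)1 << OFFSET_BITS) - 1);
    }

    /* k: the next INDEX_BITS bits select the cache block */
    static inline uint64_t addr_k(uint64_t addr) {
        return (addr >> OFFSET_BITS) & (((uint64_t)1 << INDEX_BITS) - 1);
    }

    /* n: everything above k and d is the tag stored in the cache */
    static inline uint64_t addr_n(uint64_t addr) {
        return addr >> (OFFSET_BITS + INDEX_BITS);
    }

Two addresses with equal addr_k compete for the same cache block even
when the rest of the cache is empty; only the stored tag tells them
apart, which is the origin of conflict misses.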

Set Associative Cache
Midpoint between direct mapped and fully associative (at the extremes it
degenerates to each). Basic idea: divide the cache into S sets with
E = N/S block frames per set (N total blocks in the cache). Memory
addresses are formed as ⟨n | k | d⟩, where n is the memory tag, k is
the cache set index, and d is the address in the block.

[Figure: memory block m maps to set m mod S and may occupy any of that
set's E block frames; each frame stores a tag alongside its data]

Lookup Algorithm:
    if ∃α : 0 ≤ α < E ∧ cache[k].block[α].tag = n
    then return cache[k].block[α].word[d]
    else cache-miss
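The same lookup as a C sketch (S, E, and block size are hypothetical):
only the E frames of set k are searched, so E tag comparators suffice in
hardware; E = 1 gives direct mapped and S = 1 gives fully associative.

    #include <stdbool.h>
    #include <stdint.h>

    #define S 16   /* sets (assumption) */
    #define E 4    /* frames per set, i.e. 4-way (assumption) */
    #define B 16   /* words per block (assumption) */

    struct frame {
        bool     valid;
        uint64_t tag;
        uint32_t word[B];
    };

    struct frame cache[S][E];

    /* Set-associative lookup: index with k, then compare the E tags
     * of that set (in parallel in real hardware). */
    bool sa_lookup(uint64_t n, unsigned k, unsigned d, uint32_t *out) {
        for (unsigned a = 0; a < E; a++) {
            if (cache[k][a].valid && cache[k][a].tag == n) {
                *out = cache[k][a].word[d];
                return true;    /* hit */
            }
        }
        return false;           /* cache-miss */
    }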
Cache Policies
- Read/Write Policies
  - Read through
  - Early restart
  - Critical word first
  - Write through
  - Write back
  - Way prediction
  - Write allocate/no-allocate
- Structure
  - Unified or Split Caches
- Contents across Cache Levels
  - Multilevel inclusion
  - Multilevel exclusion

[Figure: the memory hierarchy again: Disk, Main Memory, Cache (L3),
Cache (L2), I-Cache/D-Cache, CPU/core]
Cache Miss Rates by Size

[Figure: cache miss rate as a function of cache size]
Cache Miss Rates by Type

[Figure: cache miss rates broken down by miss type]
Lecture break; continue next class

Virtual Memory

Addresses used by programs do not correspond to the actual addresses of
the program/data locations in main memory. Instead, there exists a
translation mechanism between the CPU and memory that associates CPU
(virtual) addresses with their actual (physical) locations in memory.

[Figure: CPU → Address Translation → Main Memory; the CPU issues a
virtual address, translation yields the physical address]

The two most important types:

- Paged virtual memory
- Segmented virtual memory

Demand Paged Virtual Memory

- Logically subdivide virtual and physical spaces into fixed-size
  units called pages.
- Keep virtual space on disk (swap for working/dirty pages).
- As referenced (on demand), bring pages into main memory (and into
  the memory hierarchy) and update the page table.
- Virtual pages can be mapped into any available physical page.
- Requires some page replacement algorithm: random, FIFO, LRU, LFU
  (an LRU sketch follows below).
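A minimal C sketch of LRU victim selection over physical page frames
(the frame count and the software timestamps are assumptions; real
systems approximate LRU with reference bits rather than exact clocks):

    #include <stdint.h>

    #define NFRAMES 8           /* physical frames (assumption) */

    struct pframe {
        int      vpage;         /* virtual page held; -1 if free */
        uint64_t last_used;     /* logical time of last reference */
    };

    struct pframe frames[NFRAMES];
    uint64_t now;               /* logical clock */

    /* Record a reference: the frame becomes most recently used. */
    void touch(int f, int vpage) {
        frames[f].vpage = vpage;
        frames[f].last_used = ++now;
    }

    /* LRU replacement: evict the frame whose last reference is
     * oldest; free frames are taken immediately. */
    int lru_victim(void) {
        int victim = 0;
        for (int f = 0; f < NFRAMES; f++) {
            if (frames[f].vpage == -1)
                return f;
            if (frames[f].last_used < frames[victim].last_used)
                victim = f;
        }
        return victim;
    }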

Paged Virtual Memory: Structure

[Figure: virtual pages 0..M−1 of the program space map to physical
pages 0..N−1 of main memory; the virtual space may far exceed the
physical one]

Paged Virtual Memory: Address Translation

A virtual address is split into ⟨page number | page offset⟩. The page
number indexes the page table, whose entry supplies the physical page
number together with a valid bit and dirty bit/other info; the page
offset passes through unchanged.

[Figure: virtual address (page number, page offset) → page table →
physical address (physical page number, page offset)]
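This translation as a C sketch (4 KiB pages and a flat, single-level
table are assumptions; a clear valid bit models the page-fault trap to
the O/S):

    #include <stdbool.h>
    #include <stdint.h>

    #define OFFSET_BITS 12      /* 4 KiB pages (assumption) */
    #define NVPAGES     1024    /* virtual pages (assumption) */

    struct pte {
        bool     valid;         /* page resident in memory? */
        bool     dirty;         /* modified since it was loaded? */
        uint64_t pframe;        /* physical page number */
    };

    struct pte page_table[NVPAGES];

    /* Split the virtual address, index the page table, and splice
     * the physical frame number onto the unchanged offset. */
    bool va_translate(uint64_t vaddr, uint64_t *paddr) {
        uint64_t vpage  = vaddr >> OFFSET_BITS;
        uint64_t offset = vaddr & (((uint64_t)1 << OFFSET_BITS) - 1);
        if (vpage >= NVPAGES || !page_table[vpage].valid)
            return false;       /* page fault: trap to the O/S */
        *paddr = (page_table[vpage].pframe << OFFSET_BITS) | offset;
        return true;
    }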

Segmented Virtual Memory

- Identify segments in virtual space by the logical structure of the
  program. Using the logical structure, we can attach access control
  to each segment.
- Dynamically map segments into physical space as segments are
  referenced (on demand). Update the segment table accordingly.
- Keep virtual space on disk (swap for working/dirty segments).
- Needs a segment placement algorithm: best fit, worst fit, first fit
  (a first-fit sketch follows below).
- Needs some segment replacement algorithm.
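A minimal first-fit placement sketch over a free list of holes (the
array-based free list is an assumption): best fit and worst fit differ
only in scanning every hole for the smallest or largest sufficient one.

    #include <stddef.h>
    #include <stdint.h>

    #define NHOLES 32           /* free-list capacity (assumption) */

    struct hole { uint64_t base, size; };

    struct hole holes[NHOLES];
    size_t nholes;

    /* First fit: take the first hole large enough, carving the
     * segment from its front. Returns the placement address, or -1
     * when nothing fits and a segment must be replaced. */
    int64_t first_fit(uint64_t seg_size) {
        for (size_t i = 0; i < nholes; i++) {
            if (holes[i].size >= seg_size) {
                int64_t base = (int64_t)holes[i].base;
                holes[i].base += seg_size;
                holes[i].size -= seg_size;   /* may leave a 0-size hole */
                return base;
            }
        }
        return -1;
    }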

Segmented Virtual Memory: Structure

[Figure: variable-size segments 0..m of the program space map to
scattered, non-contiguous regions of main memory, placed on demand]

Seg. Virt. Memory: Address Translation

A virtual address is split into ⟨segment number | seg. offset⟩. The
segment number indexes the segment table, whose entry supplies the
segment's physical base address together with a valid bit and dirty
bit/length/other info; adding the offset to the base yields the
physical address.

[Figure: virtual address (segment number, seg. offset) → segment
table → physical address]

Segmented-Paged Virtual Memory

- Page the segments and map the pages into physical memory. This
  considerably improves and simplifies physical memory
  use/management.
- Adds an additional translation step.
- Drastically shrinks page table sizes.
- Retains the protection capabilities of segments.
- Mechanism used in most contemporary systems (ok, virtualization
  adds yet another level of indirection... we will discuss this
  later).

Segmented-Paged: Address Translation

A virtual address is split into ⟨seg # | page # | pg. offset⟩. The
segment number indexes the segment table to locate that segment's page
table; the page number indexes the page table to locate the physical
page; the page offset passes through unchanged.

[Figure: virtual address (seg #, page #, pg. offset) → segment table →
page table → physical address]
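Extending the earlier va_translate() sketch to this two-level walk (all
field widths and table sizes remain hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    #define OFFSET_BITS 12      /* 4 KiB pages (assumption) */
    #define PAGE_BITS   8       /* up to 256 pages per segment (assumption) */
    #define NSEGS       64      /* segments (assumption) */

    struct spte { bool valid; uint64_t pframe; };
    struct sseg { bool valid; struct spte *ptable; };  /* per-segment page table */

    struct sseg seg_table[NSEGS];

    /* Two translation steps: segment table, then that segment's page
     * table. Either invalid entry traps to the O/S. */
    bool sp_translate(uint64_t vaddr, uint64_t *paddr) {
        uint64_t off  = vaddr & (((uint64_t)1 << OFFSET_BITS) - 1);
        uint64_t page = (vaddr >> OFFSET_BITS) & (((uint64_t)1 << PAGE_BITS) - 1);
        uint64_t seg  = vaddr >> (OFFSET_BITS + PAGE_BITS);
        if (seg >= NSEGS || !seg_table[seg].valid)
            return false;       /* segment fault */
        struct spte *pte = &seg_table[seg].ptable[page];
        if (!pte->valid)
            return false;       /* page fault */
        *paddr = (pte->pframe << OFFSET_BITS) | off;
        return true;
    }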

Hardware Implications of Virtual Memory

- Virtual memory requires careful coordination between the hardware
  and the O/S.
- Virtual memory translation will be performed (at least mostly) in
  hardware.
- When hardware cannot translate a virtual address, it will trap to
  the O/S.
- Hardware structures for translation will limit the type of virtual
  memory that an O/S can deploy.

Lecture break; continue next class

The Memory Hierarchy: Who does what?

- MMU: Memory Management Unit
- TB: (address) Translation Buffer
- H/W: search down to main memory
- O/S: handles page faults (invoking DMA operations)
  - if the page being replaced is dirty, copy it out first (DMA)
  - move the new page to main memory (DMA)
- O/S: context swaps on page fault
- DMA: operates concurrently w/ other tasks

[Figure: the hierarchy with the MMU and TB between the CPU and the
caches: Disk, Main Memory, Cache (L3), Cache (L2), I-Cache/D-Cache,
TB/MMU, CPU]
Notes

- Memory Management Unit
  - Manages memory subsystems to main memory
  - Translates addresses, searches caches, migrates data (to/from
    main memory, out/in)
- Translation Buffer
  - Small cache to assist the virtual → physical address translation
    process (a sketch follows below)
  - Generally small (e.g., 64 entries); size does not need to
    correspond to cache sizes
- Simplifying Assumptions
  - L1, L2, & L3 use physical addresses
  - all caches are inclusive
  - paged virtual memory
  - ignoring details of TB misses
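A software model of the TB as a tiny fully associative cache of recent
translations (64 entries per the note above; the structure itself is an
assumption). On a miss, the MMU falls back to the page-table walk
sketched earlier.

    #include <stdbool.h>
    #include <stdint.h>

    #define TB_ENTRIES 64       /* e.g., 64 entries */

    struct tb_ent {
        bool     valid;
        uint64_t vpage;         /* virtual page number (the tag) */
        uint64_t pframe;        /* cached translation */
    };

    struct tb_ent tb[TB_ENTRIES];

    /* TB hit: the translation is served without touching the page
     * table. Hardware compares all tags in parallel; the loop is
     * only a software model. */
    bool tb_lookup(uint64_t vpage, uint64_t *pframe) {
        for (int i = 0; i < TB_ENTRIES; i++) {
            if (tb[i].valid && tb[i].vpage == vpage) {
                *pframe = tb[i].pframe;
                return true;
            }
        }
        return false;           /* TB miss: walk the page table */
    }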

Satisfying A Memory Request

Satisfied in L1 cache:
1. MMU: translate address
2. MMU: search I or D cache as indicated by CPU, success
(sometimes simultaneously with translation)
3. MMU: read/write information to/from CPU

Satisfying A Memory Request

Satisfied in L2 cache:
1. MMU: translate address
2. MMU: search I or D cache as indicated by CPU, failure
3. MMU: search L2 cache, success
4. MMU: place block from L2 → L1, critical word first?
5. MMU: read/write information to/from CPU

Satisfying A Memory Request

Satisfied in L3 cache:
1. MMU: translate address
2. MMU: search I or D cache as indicated by CPU, failure
3. MMU: search L2 cache, failure
4. MMU: search L3 cache, success
5. MMU: place block from L3 → L2, critical word first?
6. MMU: place block from L2 → L1, critical word first?
7. MMU: read/write information to/from CPU

Satisfying A Memory Request

Satisfied in main memory:
1. MMU: translate address
2. MMU: search I or D cache as indicated by CPU, failure
3. MMU: search L2 cache, failure
4. MMU: search L3 cache, failure
5. MMU: place block from memory → L3, critical word first?
6. MMU: place block from L3 → L2, critical word first?
7. MMU: place block from L2 → L1, critical word first?
8. MMU: read/write information to/from CPU
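The four cases above are one descending search; a compact C model
(hit_in() and fill() are hypothetical stand-ins for each level's tag
check and block transfer) returns which level satisfied the request:

    #include <stdbool.h>
    #include <stdint.h>

    /* Placeholder stand-ins for the per-level tag comparison and
     * the block copy between levels (both assumptions). */
    static bool hit_in(int level, uint64_t paddr) { (void)level; (void)paddr; return false; }
    static void fill(int level, uint64_t paddr)   { (void)level; (void)paddr; }

    /* Search L1 (1) -> L2 (2) -> L3 (3) -> memory (4). On a hit at
     * some level, copy the block up through every level above it
     * (each fill could deliver the critical word first). */
    int satisfy(uint64_t paddr) {
        for (int level = 1; level <= 3; level++) {
            if (hit_in(level, paddr)) {
                for (int up = level - 1; up >= 1; up--)
                    fill(up, paddr);
                return level;
            }
        }
        for (int up = 3; up >= 1; up--)   /* block comes from memory */
            fill(up, paddr);
        return 4;
    }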

Satisfying A Memory Request

Not in main memory:

1. MMU: translate address; failure, trap to O/S
2. O/S: page fault, block task for page fault
   - O/S: if page dirty
     - O/S: initiate DMA transfer to copy page to swap
     - O/S: block task for DMA interrupt
     - O/S: invoke task scheduler
     - O/S: on interrupt, continue
   - O/S: initiate DMA transfer to copy page to main memory
   - O/S: block task for DMA interrupt
   - O/S: invoke task scheduler
   - O/S: on interrupt:
     - update page table
     - return task to ready-to-run list

Memory Performance Impact: Naive

CPU exec time
    = (CPU clock cycles + Memory stall cycles) × Clock cycle time

Memory stall cycles (doesn't consider hit times ≠ 1)
    = Number of misses × Miss penalty
    = IC × (Misses / Instruction) × Miss penalty
    = IC × (Memory accesses / Instruction) × Miss rate × Miss penalty
    = IC × (Memory reads / Instruction) × Read miss rate × Read miss penalty
      + IC × (Memory writes / Instruction) × Write miss rate × Write miss penalty
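A worked example with hypothetical numbers (not from the slides): with
IC = 10^9, 1.2 memory accesses per instruction, a 2% miss rate, and a
100-cycle miss penalty,

    Memory stall cycles = 10^9 × 1.2 × 0.02 × 100 = 2.4 × 10^9,

i.e., 2.4 extra stall cycles per instruction on top of the base CPI.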

Average Memory Access Time

Average memory access time
    = Hit time + Miss rate × Miss penalty
    = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)

Accounting for out-of-order execution (discussion probably premature at
this time):

Memory stall cycles / Instruction
    = (Misses / Instruction) × (Total miss latency − Overlapped miss latency)
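Plugging in hypothetical numbers (not from the slides): Hit time_L1 = 1
cycle, Miss rate_L1 = 5%, Hit time_L2 = 10 cycles, Miss rate_L2 = 20%,
and Miss penalty_L2 = 100 cycles gives

    AMAT = 1 + 0.05 × (10 + 0.20 × 100) = 1 + 0.05 × 30 = 2.5 cycles.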

Cache Misses

- Miss Rate (see the worked example after this list)

      Local miss rate = (# misses in a cache) /
                        (total # of memory accesses to this cache)

      Global miss rate = (# misses in a cache) /
                         (total # of memory accesses generated by CPU)

- Categories of Cache Misses
  - Compulsory: cold-start or first-reference misses.
  - Capacity: not all blocks needed fit into cache.
  - Conflict: some needed blocks map to same cache block.
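A worked example with hypothetical counts (not from the slides): if the
CPU issues 1000 memory accesses, L1 misses 40 of them, and L2 then
misses 8 of those 40, L2's local miss rate is 8/40 = 20% while its
global miss rate is 8/1000 = 0.8%, which equals the L1 miss rate (4%)
times L2's local miss rate.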

Basic Cache Memory Optimizations

- Larger block sizes to reduce miss rate
- Larger caches to reduce miss rate
- Higher associativity to reduce miss rate
- Multi-level caches to reduce miss penalty
- Giving priority to read misses over writes to reduce miss penalty
- Avoiding address translation to reduce hit time
