Memory Hierarchy - Caches
DRAM Technology
• Uses one transistor per bit
• Data stored as a charge in a capacitor
• Must periodically be refreshed
• Read contents and write back
• Performed on a DRAM “row”
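To make the refresh requirement concrete, a back-of-the-envelope figure (the 64 ms retention window and 8192-row count are typical illustrative values, not from this slide): refreshing every row within 64 ms means the controller must refresh one row roughly every 64 ms / 8192 ≈ 7.8 µs.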
Flash Storage
• Nonvolatile semiconductor storage
• 100× – 1000× faster than disk
• Smaller, lower power, more robust
• But more $/GB (between disk and DRAM)
Flash Types
• NOR flash: bit cell like a NOR gate
• Random read/write access
• Used for instruction memory in embedded systems
• NAND flash: bit cell like a NAND gate
• Denser (bits/area), but block-at-a-time access
• Cheaper per GB
• Used for USB keys, media storage, …
• Flash bits wear out after 1000s of accesses
• Not suitable for direct RAM or disk replacement
• Wear leveling: remap data to less-used blocks (see the sketch below)
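As a rough illustration of wear leveling, the sketch below always remaps a logical block to the physical block with the fewest erase cycles so far. Every name here (wear_count, remap, NUM_BLOCKS, wear_level) is invented for this toy model; a real flash translation layer also has to relocate live data and manage a free list.

    #include <stdint.h>

    #define NUM_BLOCKS 1024                 /* hypothetical flash size in blocks */

    static uint32_t wear_count[NUM_BLOCKS]; /* erase cycles per physical block */
    static uint32_t remap[NUM_BLOCKS];      /* logical block -> physical block */

    /* Before rewriting a logical block, steer it to the least-worn
     * physical block (assumed free here; real FTLs track free blocks). */
    uint32_t wear_level(uint32_t logical)
    {
        uint32_t best = 0;
        for (uint32_t p = 1; p < NUM_BLOCKS; p++)
            if (wear_count[p] < wear_count[best])
                best = p;
        remap[logical] = best;   /* future accesses go to the new block */
        wear_count[best]++;      /* the rewrite costs one erase cycle */
        return best;
    }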
Memory Pyramid
• Static RAM (SRAM)
• 0.5ns – 2.5ns, $2000 – $5000 per GB
• Dynamic RAM (DRAM)
• 50ns – 70ns, $20 – $75 per GB
• Magnetic disk
• 5ms – 20ms, $0.20 – $2 per GB
Memory Hierarchy
• Multiple levels of memory
• Closest to CPU
• SRAM (fast), but only small amounts are affordable.
• May be multiple levels (bigger is slower)
• Further away
• DRAM (slower), but can use more of it (cheap)
• Farthest from CPU
• Disk drive, slowest, but can be huge.
Hierarchy in Action
• Store everything on disk
• Copy recently accessed (and nearby) items from disk to
smaller DRAM memory
• Main memory
• Copy more recently accessed (and nearby) items from
DRAM to smaller SRAM memory
• Cache memory attached to CPU
Cache Use
• The cache controller copies data from memory into the cache as needed.
• Two factors determine how much data to copy at once:
• Memory access delay
• Locality of reference
First factor: DRAM access delay
• Access time for the 1st data item:
• Long (100s of ns)
• Due to the structure of DRAM
• Possible queuing for DRAM access behind other CPUs’ accesses
• Access time for the 2nd, 3rd, … data items:
• Relatively fast (e.g., 10 ns)
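Using the figures above: fetching a 16-byte block as four 4-byte transfers costs roughly 100 ns + 3 × 10 ns = 130 ns, about 8 ns per byte, whereas fetching one 4-byte word by itself costs the full ~100 ns, or 25 ns per byte. Bulk transfers amortize the long initial access.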
Summing Up
• There are two benefits to copying many bytes from
memory to cache in one access:
• Reducing cost of transfer per byte &
• Exploiting spatial locality
• However
• If the data brought in is not reused often enough, copying too much can be wasteful: some of the copied data may never be used.
Cache Memory
• Cache memory
• The level of the memory hierarchy closest to the CPU
• Given a sequence of accesses X1, …, Xn−1, Xn:
• How do we know if the data is present?
• Where do we look?
Valid Bits
• What if there is no meaningful data in a cache line? (e.g., at startup time)
• Add one valid bit per cache line.
• Valid bit: 1 = tag is for real data, 0 = junk tag
• Valid bit is initially 0 (at power-up).
• Hit condition:
• Relevant part of address matches tag of some cache line
AND
• Valid bit is 1 for matched cache line.
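A minimal C sketch of this hit condition (the struct layout and 16-byte line size are illustrative, chosen to match the direct-mapped example a few slides below):

    #include <stdbool.h>
    #include <stdint.h>

    struct cache_line {
        bool     valid;     /* 0 = junk tag, 1 = tag describes real data */
        uint32_t tag;       /* high-order bits of the cached block's address */
        uint8_t  data[16];  /* the cached block itself */
    };

    /* Hit = the line is valid AND its tag matches the address's tag bits. */
    bool is_hit(const struct cache_line *line, uint32_t addr_tag)
    {
        return line->valid && line->tag == addr_tag;
    }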
Cache Misses
• On a cache hit, the CPU proceeds normally
• On a cache miss, the hardware will
• Stall the CPU pipeline
• Fetch block from next level of hierarchy
• Instruction cache miss
• Restart instruction fetch
• Data cache miss
• Complete data access
Address Breakdown
• Example:
• Direct-mapped cache
• 64 cache lines, 16 bytes per line
• To what cache line number does address 1200 map?
• Memory block address = 1200 / 16 = 75
• Cache line # to be checked = 75 mod 64 = 11
Address fields (32-bit address):
• Tag: bits 31–10 (22 bits)
• Index: bits 9–4 (6 bits)
• Offset: bits 3–0 (4 bits)
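The same breakdown in C, as a sketch: the shift and mask constants follow directly from the field widths above (16 bytes/line gives 4 offset bits, 64 lines gives 6 index bits).

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 4  /* log2(16 bytes per line) */
    #define INDEX_BITS  6  /* log2(64 cache lines)    */

    int main(void)
    {
        uint32_t addr   = 1200;
        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                 /* bits 3-0   */
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* bits 9-4   */
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);               /* bits 31-10 */
        printf("tag=%u index=%u offset=%u\n", tag, index, offset);          /* index is 11 */
        return 0;
    }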
Spectrum of Associativity
• For a cache with 8 lines:
• 1-way (direct mapped): 8 sets of 1 line
• 2-way set associative: 4 sets of 2 lines
• 4-way set associative: 2 sets of 4 lines
• 8-way (fully associative): 1 set of 8 lines
Replacement Policy
• When a memory block is brought into an associative cache, the controller must select which cache line to replace:
• Direct mapped:
• There is a single cache line corresponding to a memory block, so no policy is needed.
• Set associative:
• A memory block maps to a cache set.
• If the set has an unused line (valid bit is 0), put the memory block there.
• Otherwise, choose a “victim” line from the lines in the set.
• Need a replacement policy (i.e., an algorithm); a sketch of the selection logic follows.
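A sketch of that selection logic for one S-way set; pick_victim is a placeholder for the replacement policy (the pseudo-LRU sketch after the next slide gives one possible body for it):

    #include <stdbool.h>

    struct line { bool valid; /* ... tag, data ... */ };

    int pick_victim(int ways);  /* replacement policy hook, defined elsewhere */

    /* Choose which way of an S-way set receives the incoming block. */
    int choose_way(const struct line set[], int ways)
    {
        for (int w = 0; w < ways; w++)
            if (!set[w].valid)
                return w;           /* unused line: no policy decision needed */
        return pick_victim(ways);   /* all lines valid: policy picks the victim */
    }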
Pseudo-LRU
• Real LRU:
• Maintain a total (linear) order over all lines in a cache set (needs log2(S) bits per line to store the order)
• Pseudo-LRU:
• Only remember which line was accessed most recently
• On a miss, select one of the other S−1 lines for replacement (some pseudo-random choice)
• Needs only one ordering bit per line
• Easier to implement in hardware (see the sketch below)
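A sketch of the pseudo-LRU policy described above (assumes at least 2 ways; for simplicity it stores the MRU way as an index, which carries the same information as one MRU bit per line):

    #include <stdlib.h>

    static int mru_way;  /* the one line we remember: most recently accessed */

    /* Called on every hit (and on fill) to record the MRU way. */
    void touch(int way) { mru_way = way; }

    /* On a miss with all lines valid: evict any way except the MRU one. */
    int pick_victim(int ways)
    {
        int v = rand() % (ways - 1);        /* pseudo-random pick among S-1 ways */
        return (v >= mru_way) ? v + 1 : v;  /* skip over the MRU way */
    }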
Handling Writes
• Cache writes impose a requirement:
• Need to keep the cache and memory copies of the same variable (e.g., X) consistent!
• Techniques for handling:
• Write hits
• Write misses
Write Hit Option 1: Write-Through
• On a write hit, update both the cache and memory.
• When writes are sparse enough, write data is sent on to memory and the CPU can keep running.
• Otherwise, writes can “clog” the cache-memory interface and cause the CPU to stall.
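The “keep running vs. stall” behavior is what a write buffer provides (a standard mechanism, though not named on the slide); a toy sketch with invented names, showing only the CPU side:

    #include <stdbool.h>
    #include <stdint.h>

    #define BUF_ENTRIES 4  /* small FIFO between CPU and memory */

    static uint32_t buf_addr[BUF_ENTRIES], buf_data[BUF_ENTRIES];
    static int tail, count;  /* memory-side drain (head) omitted */

    /* Returns true if the write was absorbed and the CPU may continue;
     * false means the buffer is full (writes too dense) and the CPU stalls. */
    bool buffer_write(uint32_t addr, uint32_t data)
    {
        if (count == BUF_ENTRIES)
            return false;                 /* writes arriving faster than memory drains */
        buf_addr[tail] = addr;
        buf_data[tail] = data;
        tail = (tail + 1) % BUF_ENTRIES;
        count++;
        return true;                      /* sparse writes: CPU keeps running */
    }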
Write Hit Option 2: Write-Back
• On a write hit, update only the block in the cache.
• Track whether each block has been modified with a dirty bit.
• When a dirty block is replaced, write it back to memory.
Performance Summary
• As CPU performance increases
• Miss penalty becomes more significant
• Decreasing base CPI
• Greater proportion of time spent on memory stalls
• Increasing clock rate
• Memory stalls account for more CPU cycles
• Can’t neglect cache behavior when evaluating system
performance
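Illustrative arithmetic (numbers invented for the example): suppose the memory system adds 1.5 stall cycles per instruction. With a base CPI of 2, stalls are 1.5 / (2 + 1.5) ≈ 43% of execution time; improve the base CPI to 1 with the same memory system and stalls grow to 1.5 / 2.5 = 60%. Doubling the clock rate has a similar effect, since a fixed miss penalty in nanoseconds costs twice as many cycles.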