
Cache Memory

Cache Memory

• Cache memory is a small amount of high-speed memory that is placed between the Processor and the Main Memory unit.
• Due to the locality of reference phenomenon, the references to memory locations are confined to a small locality.
• The cache memory tries to accommodate the content of this small locality so that the processor can access these contents with a significantly lower access time.

[Figure: Processor → Cache → Main Memory hierarchy]
Cache Memory

• When the processor issues a memory access request, it is first referred to the cache.
• If the cache contains the location, a hit is said to occur, and the operation is performed from the cache at the cost of access from the cache.
• Else a miss is said to occur, and in that case the operation is performed on the main memory at the cost of access from the main memory.
• As per the phenomenon of Temporal Locality, a memory location referred to once is likely to be referred to again in the near future.
• To exploit this, after a miss, a copy of the content of the location accessed from main memory is saved in the cache so that the next time a reference to this location occurs it can be served at the cost of access from the cache.

[Figure: Processor → Cache → Main Memory hierarchy]
Cache Memory

• As per the phenomenon of Spatial Locality, if a memory location is referred to, it is likely that the locations adjacent to it will also be referred to in the near future.
• To exploit this, after a miss, while fetching the content of the location that was referred to, the contents of the adjacent locations are also fetched, in a small block (called a Cache Line) transfer, and saved in the cache.
• You will remember that block transfers from main memory cost much less than transferring the contents of the locations as individual bytes.
• A reference to any of these adjacent locations, in the near future, results in a hit, and the access is made at the cost of access from the cache.

[Figure: Processor → Cache → Main Memory hierarchy]
Memory Access with Cache
Performance of Cache
• Define Hit Ratio (h) as the fraction of memory references that are located in
the cache.
• If 8 out of 10 references are found in the cache, then h = 0.8;
• These 8 Hit accesses will cost 8Tc, where Tc is the cache access time.
• Rest 2 Miss accesses will cost 2Tm, where Tm is the main memory access time.
• The total access time for the 10 memory accesses will be- 8Tc + 2Tm
And average access time per memory access will be- 0.8Tc + 0.2Tm

• In general, the Average Access Time will be,


Ta = h Tc + (1 – h) Tm
• For two level cache Average Access Time will be,
Ta = h1 Tc1 + (1- h1) h2 Tc2 + (1 - h1)(1 - h2)Tm
where
• Tc1, Tc2 are the level1 & level2 cache access times
• Tm is the main memory access time for a Cache Line
• h1, h2 are the Hit Ratios of the level1 & level2 caches respectively

Note: The Hit Ratios of a cache depend on the design and size of the cache as well
as the behavior of the program in terms of locality of reference. Instances
of Hit Ratios exceeding 0.9 are not uncommon. Generally, higher-level
caches have higher Hit Ratios due to their larger size.
Example Scenarios of Single Level Cache:

For single level cache Average Access Time, Ta = h Tc + (1 – h) Tm


• Ta is the average access time
• Tc is the cache access time
• Tm is the memory access time
• h is the Hit ratio

Ex-1. Tc = 5 ns, Tm = 100 ns, h = 0.95


Ta = 0.95 x 5 + 0.05 x 100 ns = 4.75 + 5 ns = 9.75 ns

Ex-2. Tc = 5 ns, Tm = 100 ns, h = 0.99


Ta = 0.99 x 5 + 0.01 x 100 ns = 4.95 + 1 ns = 5.95 ns
Example Scenarios of Two-Level Cache:

• For two level cache Average Access Time,


Ta = h1 Tc1 + (1- h1) h2 Tc2 + (1 - h1)(1 - h2)Tm
• Tc1, Tc2 are the level1 & level2 cache access times
• Tm is the main memory access time
• h1, h2 are the Hit ratios

Ex-3. Tc1 = 5 ns, Tc2 = 10 ns, Tm = 100 ns, h1 = 0.95, h2 = 0.99


Ta = 0.95 x 5 + 0.05 x 0.99 x 10 + 0.05 x 0.01 x 100 ns
   = 4.75 + 0.495 + 0.05 ns = 5.295 ns
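
The formulas above are easy to check numerically. Below is a small Python sketch (not part of the original slides; the function names are illustrative) that reproduces Ex-1 to Ex-3:

```python
def avg_access_time_1level(h, tc, tm):
    """Average access time for a single-level cache (all times in ns)."""
    return h * tc + (1 - h) * tm

def avg_access_time_2level(h1, h2, tc1, tc2, tm):
    """Average access time for a two-level cache (all times in ns)."""
    return h1 * tc1 + (1 - h1) * h2 * tc2 + (1 - h1) * (1 - h2) * tm

print(avg_access_time_1level(0.95, 5, 100))            # Ex-1: 9.75 ns
print(avg_access_time_1level(0.99, 5, 100))            # Ex-2: 5.95 ns
print(avg_access_time_2level(0.95, 0.99, 5, 10, 100))  # Ex-3: ~5.295 ns
```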
Questions?
Cache Organization
Main Memory Blocks:
• Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address.
• For mapping purposes, the memory is grouped into a number of fixed-length blocks of K words each.
• No. of blocks in main memory, M = 2^n / K.
Cache Organization
• Cache cannot be organized as Random Access like the Main Memory, as the same cache location has to hold the contents of different memory locations at different times.
• It is organized as Content Addressable Memory (CAM).
• A part of the memory address of the data, referred to as the Tag, is stored as part of the content in the cache.
• The cache is organized as an array of C Cache Lines.
• Each cache line has space for holding K words, a Tag (of a few bits) and a couple of Control bits.
• Length of a line, not including tag and control bits, is line size.
• Tag identifies the particular block in memory that is currently being
stored in the Cache Line.
• No. of Cache Lines in the cache, C << M, the number of blocks in main
memory.
Cache Memory
[Figure: A Memory Address (MSB to LSB), e.g. 0011000 0101101101000 011, divided into a Block No. field and Word bits; the block number identifies Block j in main memory]
Cache Memory - Mapping

• There are far fewer cache lines than main memory blocks
• A method for mapping blocks to cache lines is needed
i.e. To determine which block goes to which line

• Based on the type of mapping technique, the cache


organization differs
• Three mapping techniques
• Direct mapping
• Associative mapping
• Set Associative mapping
Cache Memory – Direct Mapping
• Maps each block of main memory into only one specific Cache Line.
• The mapping is expressed as-
• i = j modulo C , where i = cache line number,
j = main memory block number,
C = no. of lines in the cache
• Thus the first C blocks map to the C cache lines in a one-to-one manner; similarly the
next C blocks map to the C cache lines, and so on.
• Mapping function is easily implemented using the main memory address
• A main memory address can be viewed as consisting of three fields.
• w LS bits identify a unique word or byte within a block of main memory
• t MS bits form the tag and the l bits in the middle specify the cache line
• l + t MS bits specify which block of memory

Address layout:  Tag (t bits) | Line (l bits) | Word (w bits)
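
As a minimal illustration (a Python sketch, not from the slides; the helper name is an assumption), the three fields can be pulled out of an address with shifts and masks, and the line selected this way is exactly i = j modulo C:

```python
def direct_map_fields(address, t, l, w):
    """Split a (t+l+w)-bit address into (tag, line, word) fields for direct mapping."""
    word = address & ((1 << w) - 1)          # w LS bits: word/byte within the block
    line = (address >> w) & ((1 << l) - 1)   # next l bits: cache line number
    tag  = address >> (w + l)                # t MS bits: tag
    return tag, line, word

# With the 8-bit tag / 14-bit line / 2-bit word split of the example a few slides
# later, address 0x010004 falls in cache line 1 with tag 0x01:
print(direct_map_fields(0x010004, t=8, l=14, w=2))   # (1, 1, 0)

block = 0x010004 >> 2                 # block number = the address without its word bits
assert block % (1 << 14) == 1         # i = j modulo C gives the same line number
```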


Cache Memory – Direct Mapping
[Figure: the same memory address (MSB to LSB) divided into Tag bits (0011000), Line bits (0101101101000) and Word bits (011); the Tag and Line bits together form the Block No. identifying Block j]
Cache Memory – Direct Mapping
Memory Address length = (t + l + w) bits
No. of addressable units = 2^(t+l+w) words or bytes
Length of Block = 2^w words or bytes
No. of blocks in main memory, M = 2^(t+l)
No. of cache lines, C = 2^l
Size of cache = 2^(l+w) words or bytes (excluding Tags & Control bits)
No. of blocks mapped to each cache line = M/C = 2^t
Length of tag = t bits

Mapping of Main Memory Blocks to Cache Lines:
Cache Line   Main Memory Blocks Mapped
0            0, C, 2C, …, M – C
1            1, C+1, 2C+1, …, M – C + 1
2            2, C+2, 2C+2, …, M – C + 2
...
C-1          C-1, 2C-1, 3C-1, …, M – 1
Cache Memory – Direct Mapping
• Example (8-bit tag + 14-bit line + 2-bit word = 24-bit address; 4-byte cache lines):
• 14-bit line number is used as an index into the cache to access a specific line,
• 8-bit tag number matches the tag number currently stored in that line,
• 2-bit word number is used to select one of the 4 bytes in that line.

Mapping of Memory Blocks to Cache Lines:
Cache Line   Starting Memory Addresses of Blocks Mapped to the Line
0            000000, 010000, 020000, …, FF0000
1            000004, 010004, 020004, …, FF0004
2            000008, 010008, 020008, …, FF0008
...
C-1          00FFFC, 01FFFC, 02FFFC, …, FFFFFC
Cache Memory – Direct Mapping
[Figure: Direct mapping cache organization – the l line bits of the (t+l+w)-bit address select a cache line, and the t tag bits are compared with the tag stored in that line]
Cache Memory – Direct Mapping

• Advantage: The direct mapping technique is simple and inexpensive to implement.

• Disadvantage: There is a fixed cache location for any given block.
• If two Localities of the program, required simultaneously during
execution, occupy memory blocks that map into the same cache
line, then the blocks will be repeatedly swapped between the
Cache and the Main Memory – leading to Thrashing
• Hit ratio will suffer
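
For instance, with the 64-KB direct-mapped cache used in the worked example later (13-bit line field, 3-bit word field), two addresses that are a multiple of 64 KB apart always land on the same line; a quick illustrative Python check (not from the slides):

```python
def line_of(address, l=13, w=3):
    """Cache line selected by a direct-mapped cache with 2**l lines of 2**w bytes."""
    return (address >> w) & ((1 << l) - 1)

a, b = 0x000100, 0x010100          # two addresses exactly 64 KB apart
print(line_of(a), line_of(b))      # 32 32 -> same line, so the blocks keep evicting each other
```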
Questions?
Cache Memory – Associative Mapping
• Associative mapping overcomes the disadvantage of direct mapping by permitting a
main memory block to be loaded into any of the Cache Lines
• Memory address is interpreted as a Tag and a Word field
• Memory Block No. itself is the Tag; i.e. the Tag field uniquely identifies a block
of main memory
• Since any memory block j can be mapped to any cache line i, to determine whether a
block is in the cache, the cache control logic must compare the tag field of every
Cache Line with the tag field of the main memory address.
• To achieve the desired speed of access these comparisons must be done in parallel.
Address layout:  Tag (t bits) | Word (w bits)

• Address length, L = (t + w) bits


• Number of Tag bits, t = b, where there are 2^b blocks in main memory
Cache Memory – Associative Mapping
[Figure: Associative mapping cache organization – the t-bit tag of the (t+w)-bit address is compared in parallel with the tags of all cache lines]
Cache Memory – Associative Mapping

• Advantage:
• Can support a large number of simultaneous Localities in the program,
• Flexibility as to which block to replace when a new block is read into the cache.

• Disadvantage:
• Requires expensive circuitry for parallel examination of the tags in all the cache
lines,
• Requires hardware implementation of suitable Cache Replacement Algorithm to
maximize the hit ratio.
Cache Memory – Set Associative Mapping
• Combines the strengths of both, Direct and Associative mapping
approaches while overcoming their disadvantages
• The Cache Lines are grouped into ν Sets, each consisting of α Lines
such that-
Total Cache Lines, C = ν x α
and i = j modulo ν
where, i = Set no. of cache lines
j = Main memory block no.
C = No. of lines in cache
ν = No. of cache line sets
α = No. of lines in each set
• This is an α-way set associative mapping
• A block Bj can be loaded into any of the α lines in set i
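
A minimal Python sketch of this mapping (illustrative only; the class and field names are assumptions): the set is chosen by j modulo ν and only the α tags of that set are searched:

```python
class SetAssociativeCache:
    """Toy model: v sets (ν) of alpha lines (α) each; a line stores only its tag here."""
    def __init__(self, v, alpha):
        self.v = v
        self.sets = [[None] * alpha for _ in range(v)]

    def lookup(self, block_no):
        s = block_no % self.v        # i = j modulo ν : which set to search
        tag = block_no // self.v     # the remaining high bits of the block no. form the tag
        hit = tag in self.sets[s]    # compare against the α tags of this one set only
        return hit, s, tag

cache = SetAssociativeCache(v=1024, alpha=8)   # matches the 8-way example later
print(cache.lookup(block_no=4096))             # (False, 0, 4) -> a miss on a cold cache
```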
Cache Memory – Set Associative Mapping

[Figure: the cache lines grouped into ν sets, each set containing α lines L0 … Lα-1]
Cache Memory – Set Associative Mapping
• Cache control logic interprets a memory address as three fields: Tag, Set, and Word
• Tag in a memory address is much smaller and is only compared to the α tags within a single set
• Address length, L = (t + s + w) bits
• No. of addressable units = 2^(t+s+w) words or bytes
• Block size = Line size = 2^w words or bytes
• No. of blocks in main memory = 2^(t+s)
• No. of lines in a set = α
• No. of sets, ν = 2^s
• No. of lines in cache, C = α x ν = α x 2^s
• Size of cache = α x 2^(s+w) words or bytes
• Length of tag, t = L – (s + w) bits
• Total tag size = t x C = t x α x 2^s bits

Address layout:  Tag (t bits) | Set (s bits) | Word (w bits)
Cache Memory – Set Associative Mapping
[Figure: Set-associative mapping cache organization – the s set bits of the (t+s+w)-bit address select a set, and the t-bit tag is compared in parallel with the α tags of that set]
Example: Direct Mapping
• A 64KB size cache memory with 8-byte cache lines uses Direct Mapping.
• The main memory size is 64MB
• The memory address length is- 26 bits [2^26 = 64M]
• The Word field length is- 3 bits [2^3 = 8]
• No. of cache lines = 64 KBytes/ 8 Bytes = 8K
• The Line field length is- 13 bits [2^13 = 8K]
• The Tag field length= 26 – 13 – 3 = 10 bits
Tag (bits 16–25) | Line (bits 3–15) | Word (bits 0–2)

• Total Tag size is- Tag field length x No. of Cache lines = 10 x 8K bits = 10 KB
• Comparator size= Tag length = 10 bits
Example: Associative Mapping
• A 64KB size cache memory with 8-byte cache lines uses Associative mapping.
• The main memory size is 64MB

• The memory address length is- 26 bits [2^26 = 64M]
• The Word field length is- 3 bits [2^3 = 8]
• No. of cache lines = 64 KBytes/ 8 Bytes = 8K
• The Tag field length= 26 – 3 = 23 bits
Tag (bits 3–25) | Word (bits 0–2)

• Total Tag size is- Tag field length x No. of Cache lines = 23 x 8 K bits = 23 KB
• Comparator size= Tag length x No. of cache lines = 23 x 8K bits = 188,416 bits
Example: Set-Associative Mapping
• A 64KB size 8-way set-associative cache memory uses 8-byte cache lines.
• The main memory size is 64MB
• The memory address length is- 26 bits [2^26 = 64M]
• The Word field length is- 3 bits [2^3 = 8]
• No. of cache lines = 64 KBytes/ 8 Bytes = 8K
• No. of cache line sets = 8K/ 8 = 1K
• The Set field length is- 10 bits [2^10 = 1K]
• The Tag field length= 26 – 10 – 3 = 13 bits
Tag (bits 13–25) | Set (bits 3–12) | Word (bits 0–2)

• Total Tag size is- Tag field length x No. of Cache lines = 13 x 8K bits = 13 KB
• Comparator size= Tag length x No. of cache lines per set = 13 x 8 bits = 104 bits
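
The three worked examples follow directly from the memory size, cache size, line size and associativity; a small Python sketch (illustrative, not from the slides) that derives the field lengths for each organization:

```python
import math

def cache_fields(mem_bytes, cache_bytes, line_bytes, ways=None):
    """Return (tag, index, word) field lengths in bits.
    ways=None -> fully associative; ways=1 -> direct mapped; ways=k -> k-way set-associative."""
    addr_bits = int(math.log2(mem_bytes))           # 26 bits for 64 MB
    word_bits = int(math.log2(line_bytes))          # 3 bits for 8-byte lines
    lines = cache_bytes // line_bytes               # 8K lines for a 64-KB cache
    index_bits = 0 if ways is None else int(math.log2(lines // ways))
    return addr_bits - index_bits - word_bits, index_bits, word_bits

MB, KB = 1 << 20, 1 << 10
print(cache_fields(64 * MB, 64 * KB, 8, ways=1))   # direct mapped:   (10, 13, 3)
print(cache_fields(64 * MB, 64 * KB, 8))           # associative:     (23,  0, 3)
print(cache_fields(64 * MB, 64 * KB, 8, ways=8))   # 8-way set-assoc: (13, 10, 3)
```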
A Comparison of the Mapping Techniques
Parameter                     Direct          Associative                          Set-Associative
No. of Localities Supported   Limited         Very Large                           Large Enough
Replacement Algorithm         Not Required    Need selection of 1 among C          Need selection of 1 among α
No. of Parallel Comparisons   1               As many as no. of cache lines (C)    Equal to no. of lines per Set (α)
Tag Length (t)                Short           Very Long                            Short
Comparator Size               Small           Very Big                             Not Big

• Direct Mapping is simple and less costly, but does not support multiple simultaneous
localities – prone to Thrashing.
• Associative Mapping supports a very large number of localities, but the Comparator
Complexity and Tag Space requirements are Very High.
• Set-Associative Mapping supports a sufficient number of localities and the Comparator
Complexity and Tag Space requirements are within acceptable limits.
Questions?
Cache Replacement Algorithm
• The size of cache memory is generally much smaller than the main memory.
• Only a portion of a program, which may contain one or more localities in it,
can be accommodated in the cache.
• As the program execution shifts from one locality to another, there is a need to
accommodate the new localities in the cache, replacing the older localities.
• For this the cache uses Replacement Algorithms.
• The commonly used cache replacement algorithm - Least Recently Used
(LRU) - assumes that the location referred to farthest-in-time is least likely to
be referred to in the near future.
• The memory location content that was referred to
farthest-in-time, among those contained in the cache, is
replaced with the newly referred location content.
Cache Memory – Replacement Algorithms
• When the cache is full and a new block is required to be
brought into the cache, one of the existing blocks must be
replaced
• For direct mapping, there is only one possible line for any
particular block – no choice is possible
• For the associative and set-associative mapping, a replacement
algorithm is needed to choose the cache line
• Some common replacement algorithms are:
• Least Recently Used (LRU)
• First in First Out (FIFO)
• Least Frequently Used (LFU)
• Random
Replacement Algorithm - LRU

• Least Recently Used Algorithm


• Replace that block in the cache/set that has not been referred
to for the longest period of time
• To implement:
• Cache maintains a separate list of indexes of all the cache lines
• When a line is referenced, the index of that cache line is moved
to the front of the list
• Replace the block in the last cache line in the list
• Simple, popular and effective.
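
A minimal Python sketch of this list-based LRU bookkeeping (illustrative only; names are assumptions):

```python
from collections import OrderedDict

class LRUCacheLines:
    """Tracks which blocks occupy a fixed number of cache lines, in LRU order."""
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.blocks = OrderedDict()           # least recently used first, most recent last

    def access(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no) # hit: mark this block as most recently used
            return "hit"
        if len(self.blocks) >= self.num_lines:
            self.blocks.popitem(last=False)   # miss on a full cache: evict the LRU block
        self.blocks[block_no] = None
        return "miss"

c = LRUCacheLines(num_lines=2)
print([c.access(b) for b in [1, 2, 1, 3, 2]])   # ['miss', 'miss', 'hit', 'miss', 'miss']
```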
Replacement Algorithm – FIFO, LFU, Random

• First In First Out Algorithm


• Replace that block in the Cache/ Set that has been in the cache the
longest
• In other words, the block that came in first also goes out first
• Least Frequently Used Algorithm
• Replace that block in the Cache/ Set that has experienced the fewest
references
• Implemented by associating a counter with each line; replace the block
in the line which has the lowest counter value
• Random
• Not a usage-based algorithm
• Pick a line at random for replacement
• Inferior in performance to algorithms based on usage
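
A small Python sketch of how a victim line could be picked under each of these policies (illustrative only; the per-line bookkeeping fields are assumptions):

```python
import random

def choose_victim(lines, policy):
    """Pick the index of the line to replace within a full cache/set.
    Each line dict carries 'loaded_at' (fill order) and 'ref_count' bookkeeping."""
    if policy == "FIFO":     # the block that came in first goes out first
        return min(range(len(lines)), key=lambda i: lines[i]["loaded_at"])
    if policy == "LFU":      # the block with the fewest references goes out
        return min(range(len(lines)), key=lambda i: lines[i]["ref_count"])
    if policy == "Random":   # not usage-based: pick any line
        return random.randrange(len(lines))
    raise ValueError(policy)

set_lines = [{"loaded_at": 3, "ref_count": 2},
             {"loaded_at": 1, "ref_count": 5},
             {"loaded_at": 2, "ref_count": 9}]
print(choose_victim(set_lines, "FIFO"))   # 1 (oldest fill)
print(choose_victim(set_lines, "LFU"))    # 0 (fewest references)
```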
Questions?
Cache Memory – Write Policy
• A line in the cache may contain Instructions or Data Block. A Block of
Instructions is never modified. However, a Block of Data may be
modified.
• When a Data Block resident in the Cache is to be replaced –
• If the Data Block has been Modified (i.e. a write operation has been
performed on one or more words in the cache line), then the
corresponding Main Memory block must be updated before the
Replacement.
• If the Data Block in the Cache has not been modified, then it may be
overwritten with a new block without first writing the old data block back
in the Main Memory.
• Write Policy determines when a data block that has been
modified in the Cache should be updated (or written) in
the main memory.
Cache Memory – Write Policy

• Several modules/devices may be connected to the main memory – e.g., an I/O module may directly read from memory
• Then, if the block that has been changed in the cache is not updated
in main memory, then the main memory may contain invalid data
• Invalid data may in turn be read by other modules
• The simplest solution is the write policy called write through
• All write operations done on the cache are also immediately written
into main memory
• Data in main memory are always valid
• Generates a lot of write operations and memory traffic
Cache Memory – Write Policy

• Alternative policy - write back


• Updates are made only in the cache
• When an update occurs, a Control bit called ‘dirty bit’, or ‘used
bit’, associated with the cache line is set.
• When the block is to be replaced, it is written back to main
memory if and only if the dirty bit is set.
• Disadvantage:
• The Data Block in Main Memory is invalid till the Write Back
operation is performed => Any access by I/O modules can be
allowed only through the cache. Else, I/O modules will receive
invalid data.
• Requires complex circuitry and may lead to a bottleneck.
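
A minimal Python sketch of the write-back bookkeeping (illustrative only; names and structure are assumptions): writes touch only the cache and set the dirty bit, and main memory is updated only when a dirty line is replaced:

```python
class WriteBackLine:
    def __init__(self):
        self.tag, self.data, self.valid, self.dirty = None, None, False, False

def write(line, tag, data):
    """A write updates the cache only and marks the line dirty."""
    line.tag, line.data, line.valid, line.dirty = tag, data, True, True

def replace(line, new_tag, new_data, main_memory):
    """On replacement, the old block is written back only if the dirty bit is set."""
    if line.valid and line.dirty:
        main_memory[line.tag] = line.data     # deferred update of main memory
    line.tag, line.data, line.valid, line.dirty = new_tag, new_data, True, False

main_memory = {}
line = WriteBackLine()
write(line, tag=0x12, data="modified block")                      # no memory traffic yet
replace(line, new_tag=0x34, new_data="new block", main_memory=main_memory)
print(main_memory)                                                # written back exactly once
```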
Cache Memory – Write Policy
• Another alternative policy – Buffered Write
• Write operations to the main memory are performed whenever
a Data Block in Cache is modified.
• A queue of write buffers is maintained where the main
memory write requests are queued and the main memory gets
updated from the queue.
[Figure: Processor → Cache → Write Queue → Main Memory]
• Advantage:
• Processor does not have to wait for the main memory write
operation to complete.
Cache Memory – Control Bits
[Figure: Cache with the Control Bits – each cache line holds its data together with a Tag, a Valid Bit and a Dirty Bit]
Each cache line has a set of Control Bits; two such bits commonly found are-
• Valid bit: To indicate that the contents in the cache line are valid in the current program context.
• Dirty bit: Used in case of the Write Back policy.
Questions?
Multilevel Cache
• On-chip cache – when cache and processor are on the
same chip
• Compared to a cache reachable via an external bus – reduces the processor's
external bus activity and therefore speeds up execution times and
increases overall system performance
• Simplest multilevel cache
• A two-level cache – internal cache, level 1 (L1) + external cache,
level 2 (L2)
• If there is no L2 cache and the processor makes an access
request for a memory location not in the L1 cache, then the
processor must access RAM via the bus – the slow RAM access time
results in poor performance
• Using an L2 cache improves performance
Multilevel Cache
• Two features of multilevel cache design:
• For an off-chip L2 cache, use a separate data path instead of the
system bus, so as to reduce the burden on the system bus.
• With the further shrinkage of processor components, a number
of processors now incorporate the L2 cache on the processor
chip, improving performance
• Complicates design issues – size, replacement algorithm, and
write policy
• Potential gains due to the use of an L2 cache depend on
the hit rates in both the L1 and L2 caches
Unified & Split Cache

• When caches are split into two parts – one part dedicated
to instructions and the other to data – it is a split cache
• Both exist at the same level, typically as two L1 caches
• Advantage
• Eliminates contention for the cache between the instruction
fetch/decode unit and the execution unit
Unified & Split Cache
• When no such splitting is done it is a unified cache
• Advantages of unified cache:
• For a given size, a unified cache has a higher hit rate than split
caches because it balances the load between instruction and
data fetches automatically
• Only one cache needs to be designed and implemented
• Usually split caches are used for L1 and unified caches for
higher levels
Cache Memory Operation

• Cache connects to the processor via


data, control, and address lines.
• Data and address lines also attach
to data and address buffers, which
attach to the system bus over which
the main memory is reached.
• When a cache hit occurs, the data and address buffers are disabled and
communication is only between processor and cache, with no system bus traffic.
• When a cache miss occurs, the desired address is loaded onto the system bus and
the data are returned through the data buffer to both the cache and the processor.
Cache Memory Addresses
• A physical cache stores data using main memory physical
addresses
• Hardware memory management unit (MMU) translates the Virtual/ Logical
Address into a Physical Address in main memory
Cache Memory Addresses
• A logical cache (or virtual cache) stores data using virtual
addresses
Questions?
Thank You
