
Chapter 6: Memory

Memory is organized into a hierarchy


Memory near the top of the hierarchy is faster, but also more expensive, so we have less of it in the computer; this presents a challenge
The CPU accesses memory at least once per fetch-execute cycle:
Instruction fetch
Possible operand reads
Possible operand write
RAM is much slower than the CPU, so we need a compromise: cache
How do we make use of faster memory without having to go down the hierarchy to slower memory?
We will explore memory here: RAM, ROM, cache, virtual memory

Types of Memory
RAM
stands for random access memory because you access a location by supplying its address
it would be better called read/write memory (cache and ROMs are also random access memories)
Main memory is actually DRAM (dynamic RAM) and is built out of capacitors
Capacitors lose their charge, so they must be refreshed often (every couple of milliseconds), and reads are destructive, so a cell must be recharged after a read

Cache
SRAM (static RAM), made up of flip-flops (like registers)
Slower than registers because of the added circuits to find the proper cache location, but much faster than RAM
DRAM is 10-100 times slower than SRAM

ROM
Read-only memory: contents are fused into place
Variations:
PROM - programmable ROM (comes blank and the user can program it once)
EPROM - erasable PROM, where the contents of the entire PROM can be erased using ultraviolet light
EEPROM - electrically erasable PROM: electrical fields can alter parts of the contents, so it is selectively erasable; a newer variation, flash memory, provides greater speed

Memory Hierarchy Terms


The goal of the memory hierarchy is to keep the contents that are needed now at or near the top of the hierarchy
We discuss the performance of the memory hierarchy using the following terms:
Hit - when the datum being accessed is found at the current level
Miss - when the datum being accessed is not found and the next level of the hierarchy must be examined
Hit rate - the fraction of all memory accesses that are hits
Miss rate - the fraction of all memory accesses that are misses
NOTE: hit rate = 1 - miss rate, and miss rate = 1 - hit rate
Hit time - the time to access this level of the hierarchy
Miss penalty - the time to access the next level

Effective Access Time Formula


We want to determine the impact that the memory hierarchy has on the CPU
In a pipelined machine, we expect 1 instruction to leave the pipeline each cycle
the system clock is usually set to the speed of the cache
but a memory access to DRAM takes more time, so this impacts the CPU's performance
On average, we want to know how long a memory access takes (whether it is to cache, DRAM or elsewhere)
effective access time = hit time + miss rate * miss penalty
that is, our memory access, on average, is the time it takes to access the cache, plus, for a miss, the time it takes to access memory
With a 2-level cache, we can expand our formula:
average memory access time = hit time0 + miss rate0 * (hit time1 + miss rate1 * miss penalty1)
We can expand the formula more to include access to swap space (hard disk)
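
As a rough sketch (the numbers are illustrative, chosen to be consistent with the worked example near the end of these notes), the 2-level formula can be computed directly in Python:

    def effective_access_time(hit_time0, miss_rate0, hit_time1, miss_rate1, miss_penalty1):
        # level-0 hit time, plus (on a level-0 miss) the level-1 hit time,
        # plus (on a level-1 miss) the penalty of going to main memory
        return hit_time0 + miss_rate0 * (hit_time1 + miss_rate1 * miss_penalty1)

    # e.g., 5 ns / 10% miss rate at level 0, 10 ns / 4% miss rate at level 1, 60 ns DRAM
    print(effective_access_time(5, 0.10, 10, 0.04, 60))    # 6.24 ns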

Locality of Reference
The better the hit rate for level 0, the better off we are
Similarly, if we use 2 caches, we want the hit rate of level 1 to be as high as possible
We want to implement the memory hierarchy to take advantage of locality of reference
accesses to memory will generally be near recent memory accesses, and those in the near future will be around this current access
Three forms of locality:
Temporal locality - recently accessed items tend to be accessed again in the near future (local variables, instructions inside a loop)
Spatial locality - accesses tend to be clustered (accessing a[i] will probably be followed by a[i+1] in the near future)
Sequential locality - instructions tend to be accessed sequentially
How do we support locality of reference?
If we bring something into cache, bring in its neighbors as well
Keep an item in the cache for a while, as we hope to keep using it
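
A tiny Python loop shows all three forms at once (the list a is just made-up data):

    a = list(range(1000))        # made-up data
    total = 0
    for i in range(len(a)):      # the loop body's instructions repeat: temporal and sequential locality
        total += a[i]            # a[0], a[1], a[2], ... are neighbors: spatial locality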

Cache
Cache is fast memory
Used to store instructions and data
It is hoped that what is needed will be in cache and what isn't needed will be moved out of cache back to memory
Issues:
What size cache? How many caches?
How do you access what you need?
since cache only stores part of what is in memory, we need a mechanism to map from the memory address to the location in cache
this is known as the cache's mapping function
If you have to bring in something new, what do you discard?
this is known as the replacement strategy
What happens if you write a new value to cache?
we must update the now obsolete value(s) in memory

Cache and Memory Organization


Group memory locations into lines (or refill lines)
For instance, 1 line might store 16 bytes or 4 words
The line size varies architecture-to-architecture
All main memory addresses are broken into two parts:
the line #
the location in the line
If we have 256 megabytes, word accessed, with 4-byte words and 4 words per line, we would have 16,777,216 lines, so our 26-bit word address has 24 bits for the line number and 2 bits for the word in the line
The cache has the same organization, but there are far fewer line numbers (say 1024 lines of 4 words each)
So the remainder of the address becomes the tag
The tag is used to make sure that the line we want is the line we found
A dirty bit is used to record whether the given line has been modified (is the copy in memory still valid or outdated?)
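
As a small sketch of the example above (26-bit word addresses, 4 words per line); the function name is made up for illustration:

    WORD_BITS = 2                              # 4 words per line -> 2-bit word field

    def split_memory_address(addr):
        word = addr & ((1 << WORD_BITS) - 1)   # low 2 bits: word within the line
        line = addr >> WORD_BITS               # remaining 24 bits: line number in memory
        return line, word

    print(split_memory_address(0b10110110))    # (45, 2): line 45, word 2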

Types of Cache
The mapping function is based on the type of cache
Direct-mapped - each entry in memory has 1 specific place where it can be placed in cache
this is a cheap, easy to implement and fast cache, but because each line has only one possible location (there is no choice of placement and so no replacement strategy), it has the poorest hit rate
Associative - any memory item can be placed in any cache line
this cache uses associative memory so that an entry is searched for in parallel; this is expensive and tends to be slower than a direct-mapped cache; however, because we are free to place an entry anywhere, we can use a replacement strategy and thus get the best hit rate
Set-associative - a compromise between these two extremes
lines are grouped into sets so that a line is mapped into a given set, but within that set, the line can go anywhere
a replacement strategy is used to determine which line within a set should be used, so this cache improves on the hit rate of the direct-mapped cache while not being as expensive or as slow as the associative cache

Direct Mapped Cache


Assume the cache has m refill lines
Line j in memory will be found in cache at location j mod m
Since each line has 1 and only 1 location in cache, there is no need for a replacement strategy
This yields a poor hit rate but fast performance (and a cheap cache)
All addresses are broken into 3 parts:
a line number (to determine the line in cache)
a word number
the rest is the tag - compare the tag to make sure you have the right line
Assume 24-bit addresses; if the cache has 16384 lines, each storing 4 words, then we have the following:
Tag (s-r): 8 bits, Line or Slot (r): 14 bits, Word (w): 2 bits
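
The same breakdown sketched in Python (assuming the 24-bit address and 16384-line cache above); a lookup is a hit only when the stored tag matches:

    def direct_mapped_fields(addr):
        word = addr & 0x3              # bits 1..0  : word within the line
        line = (addr >> 2) & 0x3FFF    # bits 15..2 : cache line (slot) number
        tag  = addr >> 16              # bits 23..16: tag
        return tag, line, word

    tag, line, word = direct_mapped_fields(0x3A7F21)
    # hit if cache[line] is valid and cache[line].tag == tag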

Associative Cache
Any line in memory can be placed in any line in cache
There is no line number portion of the address, just a tag and a word within the line
Because the tag is longer, more tag storage space is needed in the cache, so these caches need more space and are more costly
All tags are searched simultaneously using associative memory to find the tag requested
This is both more expensive and slower than a direct-mapped cache, but because there are choices of where to place a new line, associative caches require a replacement strategy, which might require additional hardware to implement
From our previous example, our address now looks like this:
Tag: 22 bits, Word: 2 bits
Notice how big the tag is - our cache now requires more space to store the tags!
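
A sketch of a fully associative lookup (22-bit tag, 2-bit word, as above); the cache is modeled as a list of dictionaries, and the Python loop stands in for the parallel hardware search:

    def associative_lookup(cache_lines, addr):
        tag, word = addr >> 2, addr & 0x3
        for line in cache_lines:                    # done in parallel in real hardware
            if line["valid"] and line["tag"] == tag:
                return line["data"][word]           # hit
        return None                                 # miss: fetch the line from memory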

Set Associative Cache


In order to provide some degree of variability in placement, we need more than a direct-mapped cache
A 2-way set associative cache provides 2 refill lines for each line number
We can expand this to: 4-way set associative, 8-way set associative, 16-way set associative, etc.
Instead of n refill lines, there are now n / 2 sets, each set storing 2 refill lines
We can think of this as having 2 direct-mapped caches of half the size
Because there are half as many sets as there were lines, the set number has 1 fewer bit and the tag has 1 more
From our previous example: Tag (s-r): 9 bits, Line or Slot (r): 13 bits, Word (w): 2 bits
As the number of ways increases, the hit rate improves, but the expense also increases and the hit time gets worse
Eventually we reach an n-way cache, which is a fully associative cache
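
The same 24-bit address, split for the 2-way cache described above (9-bit tag, 13-bit set number, 2-bit word); a minimal sketch:

    def two_way_fields(addr):
        word   = addr & 0x3
        set_no = (addr >> 2) & 0x1FFF   # 13 bits: selects one of 8192 sets (2 lines each)
        tag    = addr >> 15             # 9 bits: compared against both lines in the set
        return tag, set_no, word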

Replacement And Write Strategies


When we need to bring in a new line from memory, we will have to throw out a line
Which one?
No choice in a direct-mapped cache
For associative and set-associative caches, we have choices
We rely on a replacement strategy to make the best choice
this should promote locality of reference
Three replacement strategies are:
Least recently used (hard to implement - how do we determine which line was least recently used?)
First-in first-out (easy to implement, but not very good results)
Random
If we write a datum to cache, what about writing it to memory?
Write-through - write to both cache and memory at the same time
if we write to several data in the same line, though, this becomes inefficient
Write-back - wait until the refill line is being discarded and write back any changed values to memory at that time
this leaves stale (dirty) values in memory until the write-back occurs
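
A software model (not how the hardware actually tracks usage) of least-recently-used replacement within one set, with write-back of dirty lines on eviction; fetch_line and write_line_back are made-up placeholder functions:

    from collections import OrderedDict

    class LRUSet:
        def __init__(self, ways, fetch_line, write_line_back):
            self.ways = ways
            self.fetch_line = fetch_line              # placeholder: read a line from memory
            self.write_line_back = write_line_back    # placeholder: write a dirty line to memory
            self.lines = OrderedDict()                # tag -> {"data", "dirty"}, oldest first

        def access(self, tag, write=False):
            if tag not in self.lines:
                if len(self.lines) >= self.ways:               # set full: evict the LRU line
                    victim_tag, victim = self.lines.popitem(last=False)
                    if victim["dirty"]:                        # write-back policy
                        self.write_line_back(victim_tag, victim["data"])
                self.lines[tag] = {"data": self.fetch_line(tag), "dirty": False}
            self.lines.move_to_end(tag)                        # mark as most recently used
            if write:
                self.lines[tag]["dirty"] = True
            return self.lines[tag]["data"]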

Virtual Memory
Just as DRAM acts as a backup for cache, the hard disk (known as the swap space) acts as a backup for DRAM
This is known as virtual memory
Virtual memory is necessary because programs and their data may be too large to store entirely in memory
Also, there are parts of a program that are not used very often, so why waste the time loading those parts into memory if they won't be used?
Page - a fixed-size unit of memory; all programs and data are broken into pages
Paging - the process of bringing in a page when it is needed (this might require throwing a page out of memory, moving it back to the swap disk)
The operating system is in charge of virtual memory for us
it moves needed pages into memory from disk and keeps track of where a specific page is placed

The Paging Process


When the CPU generates a memory address, it is a logical (or virtual) address
The first address of a program is 0, so the logical address is merely an offset into the program or into the data segment
For instance, address 25 is located 25 locations from the beginning of the program
But 25 is not the physical address in memory, so the logical address must be translated (or mapped) into a physical address
Assume memory is broken into fixed-size units known as frames (1 page fits into 1 frame)
We view the logical address as its page # and the offset into the page
We have to translate the page # into the frame # (that is, where is that particular page currently stored in memory - or is it even in memory?)
Thus, the mapping process for paging means finding the frame # and replacing the page # with it
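
A minimal sketch of that mapping, assuming a tiny 2-item page (1 offset bit) so it lines up with the "A More Complete Example" slide below; page_table is just a dictionary from page # to frame #:

    OFFSET_BITS = 1                                   # assumed page size: 2 items per page

    def translate(logical_addr, page_table):
        page   = logical_addr >> OFFSET_BITS          # page number
        offset = logical_addr & ((1 << OFFSET_BITS) - 1)
        frame  = page_table[page]                     # a missing entry would mean a page fault
        return (frame << OFFSET_BITS) | offset        # physical address = frame # plus offset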

Example of Paging

Here, we have a process of 8 pages but only 4 physical frames in memory; therefore we must place a page into one of the available frames in memory whenever a page is needed
At this point in time, pages 0, 3, 4 and 7 have been moved into memory at frames 2, 0, 1 and 3 respectively
This information (of which page is stored in which frame) is stored in memory in a location known as the page table; the page table also stores whether the given page has been modified (a dirty bit, much like our cache)

A More Complete Example

The figure shows a virtual address mapped to a physical address through the page table, along with the logical and physical memory for our program
Address 1010 is page 101, item 0
Page 101 (5) is located in frame 11 (3), so the item at 1010 is found at physical address 110
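
Plugging those numbers into the translate sketch from the paging-process slide (only the one page-table entry is needed here):

    page_table = {0b101: 0b11}                 # page 5 lives in frame 3
    print(bin(translate(0b1010, page_table)))  # 0b110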

Page Faults
Just as cache is limited in size, so is main memory - a process is usually given a limited number of frames
What if a referenced page is not currently in memory?
The memory reference causes a page fault
The page fault requires that the OS handle the problem
The process status is saved and the CPU switches to the OS
The OS determines if there is an empty frame for the referenced page; if not, then the OS uses a replacement strategy to select a page to discard
if that page is dirty, then the page must be written back to disk instead of simply discarded
The OS locates the requested page on disk and loads it into the appropriate frame in memory
The page table is modified to reflect the change
Page faults are time consuming because of the disk access - this causes our effective memory access time to deteriorate badly!
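
A rough Python sketch of those steps (a FIFO queue stands in for whatever replacement strategy the OS uses, and the disk read itself is only a comment):

    def handle_page_fault(page, page_table, free_frames, loaded_order):
        # page_table maps resident page # -> frame #; loaded_order lists resident pages oldest-first
        if free_frames:
            frame = free_frames.pop()          # an empty frame is available
        else:
            victim = loaded_order.pop(0)       # FIFO stand-in for the replacement strategy
            frame = page_table.pop(victim)     # (a dirty victim would be written to disk first)
        # ... the lengthy disk read of `page` into `frame` happens here ...
        page_table[page] = frame               # update the page table
        loaded_order.append(page)
        return frame

    pt, free, order = {}, [0, 1, 2, 3], []     # 4 frames, all free at first
    for p in [0, 3, 4, 7, 1]:                  # the fifth reference forces an eviction
        handle_page_fault(p, pt, free, order)
    print(pt)                                  # {3: 2, 4: 1, 7: 0, 1: 3}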

Another Paging Example

Here, we have 13 bits for our addresses even though main memory is only 4K = 2^12

The Full Paging Process

We want to avoid memory accesses (we prefer cache accesses), but if every memory access now requires first accessing the page table, which is in memory, it slows down our computer
So we move the most used portion of the page table into a special cache known as the Table Lookaside Buffer or Translation Lookaside Buffer, abbreviated as the TLB
The process is also shown in the next slide as a flowchart
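
A minimal sketch of the idea, with the TLB modeled as a small dictionary that is consulted before the in-memory page table (a real TLB holds only a handful of entries and must evict old ones):

    def lookup_frame(page, tlb, page_table):
        if page in tlb:                  # TLB hit: no extra memory access for the page table
            return tlb[page]
        frame = page_table[page]         # TLB miss: read the page table (an extra memory access)
        tlb[page] = frame                # remember the translation for next time
        return frame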

A Variation: Segmentation
One flaw of paging is that, because a page is fixed in size, a chunk of code might be divided into two or more pages
So page faults can occur at any time
Consider, as an example, a loop which crosses 2 pages
If the OS must remove one of the two pages to load the other, then the OS generates 2 page faults for each loop iteration!
A variation of paging is segmentation
instead of fixed-size blocks, programs are divided into procedural units equal to their own size
We subdivide programs into procedures
We subdivide data into structures (e.g., arrays, structs)
We still use the on-demand approach of virtual memory, but when a block of code is needed, the entire block is loaded into memory
Segmentation uses a segment table instead of a page table and works similarly, although addresses are put together differently
But segmentation causes fragmentation - when a segment is discarded from memory for a new segment, there may be a chunk of memory that goes unused
One solution to fragmentation is to use paging with segmentation

Effective Access With Paging


We modify our previous formula to include the impact of paging:
effective access time = hit time0 + miss rate0 * (hit time1 + miss rate1 * (hit time2 + miss rate2 * miss penalty2))
Level 0 is on-chip cache
Level 1 is off-chip cache
Level 2 is main memory
Level 3 is disk (miss penalty2 is the disk access time, which is lengthy)
Example:
On-chip cache hit rate is 90%, hit time is 5 ns; off-chip cache hit rate is 96%, hit time is 10 ns; main memory hit rate is 99.8%, hit time is 60 ns; memory miss penalty is 10,000 ns
the memory miss penalty is the same as the disk hit time, or disk access time, in this example
Access time = 5 ns + .10 * (10 ns + .04 * (60 ns + .002 * 10,000 ns)) = 6.32 ns
So our memory hierarchy adds over 25% to our memory access time (6.32 ns vs. the 5 ns cache hit time)
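
Checking that arithmetic in Python:

    eat = 5 + 0.10 * (10 + 0.04 * (60 + 0.002 * 10_000))
    print(eat)    # 6.32 (ns), roughly 26% more than the 5 ns cache hit time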

Memory Organization

Here we see a typical memory layout:
Two on-chip caches: one for data, one for instructions, with part of each cache reserved for a TLB
One off-chip cache to back up both on-chip caches
Main memory, backed up by virtual memory
