
Miss Rate versus Block Size

[Figure: miss rate (0%–25%) versus block size (16–256 bytes), with one curve per cache size: 1K, 4K, 16K, 64K, and 256K.]
Review

CS501 Advanced Computer Architecture

Lecture 39

Dr. Noor Muhammad Sheikh
Performance versus Year

[Figure: processor and DRAM performance, 1980–2000, on a log scale from 1 to 1000. Microprocessor performance grows about 60% per year (“Moore’s Law”) while DRAM performance grows about 7% per year, so the processor–memory performance gap grows about 50% per year.]
Block Diagram of Cache
[Figure: block diagram of a cache, consisting of a fast memory, a tag RAM, and control logic containing a determine-and-comparison unit.]
int ALFA[100], SUM, i;
SUM = 0;
for ( i = 0; i < 100; i++ )      /* visit the 100 elements in order: sequential, cache-friendly access */
{ SUM = SUM + ALFA[i]; }

Associative Cache
fig. 7.31 (Jordan)

[Figure 7.31: a fully associative cache. The tag memory holds a 13-bit tag field and a valid bit for each of the 256 cache lines (8 bytes per line); any main memory block (0–8191) may be placed in any cache line, e.g. main memory blocks 421 and 119 held in cache blocks 0 and 2. The main memory address is split into a 13-bit tag field and a 3-bit byte field.]
• Main memory address references have two fields:

• 3-bit word field

• 13-bit tag field
• The 3-bit word field becomes a “cache address”.

• The cache address specifies where to find the word in the cache.

• The 13-bit field must be compared against every 13-bit tag in the tag memory.
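A minimal C sketch (not from the lecture) of how this fully associative lookup could be modelled in software; the sizes (256 lines of 8 bytes, 13-bit tag, 3-bit byte field) follow fig. 7.31, while the structure and function names are illustrative assumptions.

#include <stdint.h>

typedef struct {
    uint16_t tag;      /* 13-bit tag of the memory block held in this line */
    int      valid;    /* valid bit */
    uint8_t  data[8];  /* one cache line, 8 bytes */
} CacheLine;

CacheLine cache[256];

/* Returns the addressed byte on a hit, -1 on a miss (16-bit address). */
int assoc_lookup(uint16_t addr)
{
    uint16_t tag  = addr >> 3;       /* upper 13 bits: tag field */
    uint16_t byte = addr & 0x7;      /* lower 3 bits: byte within the line */
    for (int i = 0; i < 256; i++)    /* the hardware compares all tags in parallel */
        if (cache[i].valid && cache[i].tag == tag)
            return cache[i].data[byte];
    return -1;                       /* miss */
}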
Associative Cache Mechanism
fig. 7.32 (Jordan)
[Figure 7.32: associative cache mechanism. The 13-bit tag field of the main memory address is placed in an argument register and compared against every entry of the associative tag memory; a match, qualified by the valid bit, selects one of the 256 cache blocks (8-byte lines), and a selector uses the 3-bit byte field to deliver the requested byte to the CPU.]
Direct-mapped cache
The main memory address is partitioned into three fields:

• Word field

• Group field

• Tag field
Direct-mapped cache
• Cache address is composed of two fields:

• Group field

• Word field
Figure 7.34 (Jordan)
Continued

• The data cache RAM is a block of fast memory, usually a static RAM, and it stores copies of data or instructions frequently requested by the CPU.
• The tag RAM contains the part of the memory address, called the tag, of the data stored in the data cache RAM.
Continued
• Associative memories are considerably more expensive in terms of gates than ordinary access-by-address memories.
• Each bit comparison is made with an XOR gate, whose output will be 0 if there is a match between the two bits.
• A 1 output from the NOR gate indicates a word match.
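A small C sketch (illustrative, not from the lecture) of the match logic just described: each pair of bits is compared with XOR, and a NOR across all the XOR outputs yields 1 only when every bit matched.

#include <stdint.h>

/* Returns 1 if a stored 13-bit tag matches the argument tag, 0 otherwise. */
int tag_match(uint16_t stored_tag, uint16_t arg_tag)
{
    uint16_t diff = (stored_tag ^ arg_tag) & 0x1FFF;  /* per-bit XOR: 0 where bits match */
    return diff == 0;                                 /* NOR of all XOR outputs: 1 only on a full match */
}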
Continued

• The valid bit specifies that the information in the selected block is valid.
Figure 7.33 (Jordan)
Direct mapped cache
fig. 7.33 (Jordan)
[Figure 7.33: a direct-mapped cache. The tag memory holds a 5-bit tag field and a valid bit for each of the 256 cache lines (8 bytes per line). A main memory block can be placed only in the line whose group number equals the block number modulo 256; for example, group 0 can hold one of blocks 0, 256, 512, …, 7936 (tag numbers 0–31). The cache address is an 8-bit group field plus a 3-bit byte field; the main memory address is split into a 5-bit tag, an 8-bit group and a 3-bit byte field.]
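A minimal C sketch (illustrative, not the lecture's code) of a lookup in the direct-mapped organization of fig. 7.33, using the 5/8/3 field split shown above; the structure and function names are assumptions.

#include <stdint.h>

typedef struct {
    uint8_t tag;      /* 5-bit tag field */
    int     valid;    /* valid bit */
    uint8_t data[8];  /* one cache line, 8 bytes */
} Line;

Line cache[256];      /* one line per group */

/* Returns the addressed byte on a hit, -1 on a miss (16-bit address, 5-8-3 split). */
int direct_lookup(uint16_t addr)
{
    uint16_t byte  = addr & 0x7;          /* 3-bit byte field */
    uint16_t group = (addr >> 3) & 0xFF;  /* 8-bit group field selects the line directly */
    uint16_t tag   = (addr >> 11) & 0x1F; /* 5-bit tag field */
    if (cache[group].valid && cache[group].tag == tag)
        return cache[group].data[byte];
    return -1;                            /* miss */
}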
Direct-mapped cache

• Imposes a considerable amount of rigidity on the cache organization.

• Relies on the principle of locality.
Direct mapped cache

• Advantage:
simplicity

• Disadvantage:
only a single block from a given group can be present in the cache at any time.
2-Way Set-Associative Cache
fig. 7.35 (Jordan)

[Figure 7.35: a two-way set-associative cache. The cache has 256 sets, each holding two 8-byte lines with a 5-bit tag per line. A main memory block may occupy either line of the set given by its block number modulo 256; for example, set 0 can hold any two of blocks 0, 256, 512, …, 7936. The cache group address is an 8-bit set field plus a 3-bit byte field; the main memory address is split into a 5-bit tag, an 8-bit set and a 3-bit byte field.]
Continued

• The cache hardware is a combination of direct and associative mapping.
Block Replacement

2-Way Set-Associative Cache

• Similar to the direct-mapped cache.

• There are twice as many blocks in the cache, so that a set of any two blocks from each main memory group can be stored in the cache.
2-Way-Set-Associative Cache

The main memory address is divided into two fields:

• 8-bit set field

• 5-bit tag field
Continued
• The group field is called the set field.
• The set field is decoded and directs the search to the correct group.
• The tags in the selected group are then searched.
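A short C sketch (illustrative, not from the lecture) of the set-associative lookup just described: the 8-bit set field selects a set directly, and the two tags in that set are then compared associatively. The 5-8-3 split follows the earlier figures; everything else is an assumption.

#include <stdint.h>

typedef struct {
    uint8_t tag;       /* 5-bit tag field */
    int     valid;     /* valid bit */
    uint8_t data[8];   /* 8-byte line */
} Line;

Line cache[256][2];    /* 256 sets, two ways per set */

/* Returns the addressed byte on a hit, -1 on a miss (16-bit address, 5-8-3 split). */
int set_assoc_lookup(uint16_t addr)
{
    uint16_t byte = addr & 0x7;
    uint16_t set  = (addr >> 3) & 0xFF;   /* decoded set (group) field */
    uint16_t tag  = (addr >> 11) & 0x1F;
    for (int way = 0; way < 2; way++)     /* search both tags in the selected set */
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return cache[set][way].data[byte];
    return -1;                            /* miss */
}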
Continued

• Multiple copies of the same data can exist in the memory hierarchy simultaneously.
• The cache needs an updating mechanism to prevent old data values from being used; this is the problem of cache coherence.
• The write policy is the method used by the cache to keep the main memory updated.
2-Way-Set-Associative Cache

• There are two possible places in which a block can reside.

• Both places must be searched associatively.

• The cache group address is the same as that of the direct-mapped cache.
Continued
• The dirty bit is a status bit which indicates whether the block in the cache is dirty (has been modified).
• If the block is clean, it is not written back on a miss, since the lower level contains the same information as the cache.
• Writing to the cache is not as easy as reading from it; e.g. modifying a block cannot begin until the tag has been checked to see whether the address is a hit.
Continued
• In the case of write through, also called store through, the information is written both to the block in the cache and to the block in the next lower-level memory, which is the main memory.
• Read misses never result in a write to the lower level.
• The next lower level holds the most current copy of the information at all times.
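A minimal C sketch (illustrative, not the lecture's code) of the write-through policy just described: every write goes to main memory, and the cache line is also updated when the address hits. The direct-mapped layout and all names are assumptions carried over from the earlier sketches.

#include <stdint.h>

typedef struct { uint8_t tag; int valid; uint8_t data[8]; } Line;

extern Line    cache[256];          /* direct-mapped cache from the earlier sketch */
extern uint8_t main_memory[65536];  /* 8192 blocks of 8 bytes (assumed) */

/* Write through: main memory is always updated; the cached copy is updated on a hit. */
void write_through(uint16_t addr, uint8_t value)
{
    uint16_t byte  = addr & 0x7;
    uint16_t group = (addr >> 3) & 0xFF;
    uint16_t tag   = (addr >> 11) & 0x1F;

    if (cache[group].valid && cache[group].tag == tag)
        cache[group].data[byte] = value;   /* keep the cached copy current */
    main_memory[addr] = value;             /* the lower level always holds the latest value */
}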
Continued
• Write stall:
For a write to complete in write through, the CPU has to wait. This wait is called a write stall.
• Write buffer:
A write buffer reduces the write stall by permitting the processor to continue as soon as the data has been written into the buffer, thus allowing the instruction execution to overlap with the memory update.
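A small C sketch (illustrative only) of the write-buffer idea: the processor deposits a write into a FIFO and continues, and the memory system drains the buffer later. The buffer depth and all names are assumptions.

#include <stdint.h>

#define WB_SIZE 4                      /* assumed buffer depth */

typedef struct { uint16_t addr; uint8_t value; } WriteEntry;

static WriteEntry buf[WB_SIZE];
static int head = 0, tail = 0, count = 0;

extern uint8_t main_memory[65536];

/* Processor side: returns 1 and continues as soon as the write is buffered,
   0 if the buffer is full and the CPU must stall. */
int buffer_write(uint16_t addr, uint8_t value)
{
    if (count == WB_SIZE)
        return 0;                      /* write stall: buffer full */
    buf[tail].addr  = addr;
    buf[tail].value = value;
    tail = (tail + 1) % WB_SIZE;
    count++;
    return 1;                          /* CPU overlaps execution with the memory update */
}

/* Memory side: drain one buffered write into main memory. */
void drain_one(void)
{
    if (count > 0) {
        main_memory[buf[head].addr] = buf[head].value;
        head = (head + 1) % WB_SIZE;
        count--;
    }
}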
Continued
Write back:
• The information is written only to the block in the cache when it is modified.
• The modified block is written to the lower level only when it is replaced in the cache.
• Writes occur at the speed of the cache memory.
• Multiple writes within a block require only one write to the lower-level memory.
• It uses less memory bandwidth.
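A minimal C sketch (illustrative, not the lecture's code) of write back with a dirty bit: a write hit updates only the cache and marks the line dirty, and a dirty block is copied to main memory only when it is replaced. The line structure extends the earlier direct-mapped sketch; all names are assumptions.

#include <stdint.h>

typedef struct {
    uint8_t tag;
    int     valid;
    int     dirty;                        /* set when the cached block has been modified */
    uint8_t data[8];
} Line;

Line cache[256];
extern uint8_t main_memory[65536];

/* Write back: a write hit updates only the cache, at cache speed. */
void write_back_store(uint16_t addr, uint8_t value)
{
    uint16_t byte  = addr & 0x7;
    uint16_t group = (addr >> 3) & 0xFF;
    uint16_t tag   = (addr >> 11) & 0x1F;

    if (cache[group].valid && cache[group].tag == tag) {
        cache[group].data[byte] = value;  /* main memory is now stale */
        cache[group].dirty = 1;
    }
    /* On a miss, write allocate or no-write allocate decides what happens (next slide). */
}

/* When a line is replaced, a dirty block is first written to main memory. */
void evict(uint16_t group, uint8_t new_tag)
{
    if (cache[group].valid && cache[group].dirty) {
        uint16_t base = ((uint16_t)cache[group].tag << 11) | (group << 3);
        for (int b = 0; b < 8; b++)
            main_memory[base + b] = cache[group].data[b];  /* one write covers all the block's updates */
    }
    cache[group].tag   = new_tag;         /* the new block will be filled in next */
    cache[group].valid = 1;
    cache[group].dirty = 0;
}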
Continued
Write allocate:
The block is loaded into the cache, followed by the write. This action is similar to a read miss. It is used in write-back caches, since subsequent writes to that particular block will be captured by the cache.

No-write allocate:
The block is modified in the lower level and not loaded into the cache. This method is generally used in write-through caches, as subsequent writes to that block still have to go to the lower level.
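An illustrative C fragment (assumed helper functions, not from the lecture) contrasting the two write-miss policies just described.

#include <stdint.h>

extern uint8_t main_memory[65536];

/* Assumed helpers, in the spirit of the earlier sketches. */
extern int  cache_hit(uint16_t addr);              /* 1 on hit, 0 on miss */
extern void load_block(uint16_t addr);             /* fill the line from main memory */
extern void cache_store(uint16_t addr, uint8_t v); /* write into the cached line */

/* Write allocate: on a miss, load the block first (like a read miss), then write. */
void store_write_allocate(uint16_t addr, uint8_t value)
{
    if (!cache_hit(addr))
        load_block(addr);            /* subsequent writes to this block will hit in the cache */
    cache_store(addr, value);
}

/* No-write allocate: on a miss, only the lower level is updated; the cache is untouched. */
void store_no_write_allocate(uint16_t addr, uint8_t value)
{
    if (cache_hit(addr))
        cache_store(addr, value);
    else
        main_memory[addr] = value;   /* the block is not brought into the cache */
}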
