DigitalLogic ComputerOrganization L22 CachesP3 Handout
COMPUTER ORGANIZATION
Lecture 22: Caches (P3)
Measuring Performance
ELEC3010
ACKNOWLEDGEMENT
2
COVERED IN THIS COURSE
❑ Binary numbers and logic gates
❑ Boolean algebra and combinational logic
❑ Sequential logic and state machines
❑ Binary arithmetic
Digital logic
❑ Memories
4
BLOCK REPLACEMENT POLICY
❑Direct mapped: no choice
❑Set associative and fully associative
▪ Pick non-valid entry, if there is one
▪ Otherwise, choose among entries in the set
❑Least recently used (LRU)
▪ Choose the one unused for the longest time
▪ Requires extra bits to order the blocks
▪ High overhead beyond 4-way set associative
❑Random
▪ Similar performance to LRU at high associativity
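The LRU policy above can be sketched for a single cache set. This is a minimal illustration, not the handout's own example: the class name `LRUSet`, the method `access`, and the tags "A", "B", "C" are all hypothetical, and an ordered dictionary stands in for the extra ordering bits that real hardware would keep.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` entries and LRU replacement (illustrative)."""

    def __init__(self, ways):
        self.ways = ways
        # Keys are tags; insertion order runs from least to most recently used.
        self.blocks = OrderedDict()

    def access(self, tag):
        """Access a block; return (hit, evicted_tag)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # now the most recently used
            return True, None
        evicted = None
        if len(self.blocks) == self.ways:     # no free (non-valid) entry left
            evicted, _ = self.blocks.popitem(last=False)  # drop the LRU block
        self.blocks[tag] = None
        return False, evicted


# Hypothetical access trace on a 2-way set.
s = LRUSet(ways=2)
trace = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
```

In this trace the access to "C" evicts "B" (unused for the longest time), and the final access to "B" then evicts "A".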
5
LRU REPLACEMENT EXAMPLE
6
ANOTHER LRU REPLACEMENT EXAMPLE
7
WHAT ABOUT WRITES?
❑Where do we put the result of a store?
❑Cache hit (block is in cache)
▪ Write new data value to the cache
▪ Also write to memory (write through)
▪ Don’t write to memory (write back)
• Requires an additional dirty bit for each cache block
• Writes back to memory when a dirty cache block is evicted
❑Cache miss (block is not in cache)
▪ Allocate the line (bring it into the cache) (write allocate)
▪ Write to memory without allocation (no write allocate or write around)
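The store-handling choices above can be compared with a small counter-based model. This is a sketch under stated assumptions: the class `TinyCache`, its flags, and the access sequence in `run` are hypothetical, the cache is modeled as fully associative with LRU replacement, and only memory reads and writes are counted (no timing).

```python
from collections import OrderedDict


class TinyCache:
    """Illustrative fully associative cache that counts memory traffic."""

    def __init__(self, num_blocks, write_back, write_allocate):
        self.num_blocks = num_blocks
        self.write_back = write_back
        self.write_allocate = write_allocate
        self.lines = OrderedDict()   # block number -> dirty bit, LRU first
        self.mem_reads = 0
        self.mem_writes = 0

    def _allocate(self, block):
        if len(self.lines) == self.num_blocks:
            _, dirty = self.lines.popitem(last=False)
            if dirty:
                self.mem_writes += 1   # dirty block written back on eviction
        self.mem_reads += 1            # bring the block into the cache
        self.lines[block] = False

    def load(self, block):
        if block not in self.lines:
            self._allocate(block)
        self.lines.move_to_end(block)

    def store(self, block):
        if block not in self.lines:
            if not self.write_allocate:
                self.mem_writes += 1   # write around: memory only
                return
            self._allocate(block)
        self.lines.move_to_end(block)
        if self.write_back:
            self.lines[block] = True   # set dirty bit, defer the memory write
        else:
            self.mem_writes += 1       # write through: update memory now


def run(write_back):
    c = TinyCache(num_blocks=2, write_back=write_back, write_allocate=True)
    for _ in range(4):
        c.store(0)                     # four stores to the same block
    c.load(1)
    c.load(2)                          # evicts block 0
    return c.mem_reads, c.mem_writes


write_through_traffic = run(write_back=False)
write_back_traffic = run(write_back=True)
```

For this sequence, write through issues one memory write per store, while write back issues a single write when the dirty block is evicted.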
8
WRITE THROUGH EXAMPLE
❑Assume write allocate
❑Size of each block is 8 bytes
❑Cache holds 2 blocks
❑Memory holds 8 blocks
❑6-bit memory address
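The example parameters are self-consistent: 8 blocks of 8 bytes give 64 bytes, which a 6-bit address covers exactly. A minimal sketch of splitting an address into block number and byte offset follows; the function name `split` and the sample address are hypothetical, and how a block number maps onto the 2 cache blocks depends on the cache organization, which is not stated here.

```python
BLOCK_SIZE = 8   # bytes per block
MEM_BLOCKS = 8   # blocks in memory
ADDR_BITS = 6    # address width in bits

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(8) = 3
BLOCK_BITS = ADDR_BITS - OFFSET_BITS        # remaining 3 bits pick the block


def split(addr):
    """Return (block number, byte offset) for a 6-bit address."""
    assert 0 <= addr < 2 ** ADDR_BITS
    return addr >> OFFSET_BITS, addr & (BLOCK_SIZE - 1)


block, offset = split(0b101110)   # upper bits 101, lower bits 110
```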
9
WRITE BACK EXAMPLE
❑Assume write allocate
❑Size of each block is 8 bytes
❑Cache holds 2 blocks
❑Memory holds 8 blocks
❑6-bit memory address
21
CACHE HIERARCHY
❑Time to get a block from memory is so long that performance suffers even with a low miss rate
34
PIPELINE WITH A CACHE HIERARCHY
35
THE MEMORY HIERARCHY
36
CACHE HIERARCHY
❑Example: assume 1 cycle to access L1 (3% miss rate), 10 cycles to access L2 (10% miss rate), and 100 cycles to access main memory
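One way to work this example is as an average memory access time. The sketch below assumes that each level's access time is added on top of the level above it (an L1 miss pays the L2 access, and an L2 miss additionally pays the memory access); the function name `amat` and the no-L2 comparison are hypothetical additions.

```python
def amat(l1_time, l1_miss, l2_time, l2_miss, mem_time):
    """Average access time in cycles for a two-level cache hierarchy."""
    return l1_time + l1_miss * (l2_time + l2_miss * mem_time)


# Numbers from the example: 1 + 0.03 * (10 + 0.10 * 100), about 1.6 cycles.
with_l2 = amat(l1_time=1, l1_miss=0.03, l2_time=10, l2_miss=0.10, mem_time=100)

# Hypothetical comparison: same L1, but every L1 miss goes straight to memory.
without_l2 = 1 + 0.03 * 100   # about 4 cycles
```

Under these assumptions the L2 cuts the average access time from about 4 cycles to about 1.6 cycles.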
38
HOW DO WE MEASURE PERFORMANCE?
39
CPU EXECUTION TIME
40
INSTRUCTION COUNT (I)
41
CYCLE TIME (CT)
42
CYCLES PER INSTRUCTION (CPI)
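The three quantities named above combine in the standard relation: CPU execution time = I × CPI × CT. A minimal sketch follows; the function name and the sample numbers are hypothetical, chosen only to show the units.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU execution time in seconds: I x CPI x CT."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1 million instructions, CPI of 2, 1 ns cycle (1 GHz).
t = cpu_time(instruction_count=1_000_000, cpi=2.0, cycle_time_s=1e-9)
# t is about 0.002 s (2 ms)
```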
43
A ROUGH BREAKDOWN OF CPI
44
IMPACT OF L1 CACHES
❑ With L1 caches
– L1 instruction cache miss rate = 2%
– L1 data cache miss rate = 5%
– Miss penalty = 100 cycles (access main memory)
– 20% of all instructions are loads, 10% are stores
❑ CPImemhier =
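The memory-hierarchy CPI for these numbers can be worked as follows. This is a sketch under stated assumptions: every instruction is fetched through the L1 instruction cache, both loads and stores access the L1 data cache, and every miss pays the full miss penalty. The function name `cpi_memhier` is hypothetical.

```python
def cpi_memhier(i_miss_rate, d_miss_rate, mem_instr_frac, miss_penalty):
    """Stall cycles per instruction contributed by L1 misses."""
    fetch_stalls = i_miss_rate * miss_penalty                   # per instruction
    data_stalls = mem_instr_frac * d_miss_rate * miss_penalty   # loads + stores
    return fetch_stalls + data_stalls


# 2% I-cache misses, 5% D-cache misses, 20% loads + 10% stores, 100 cycles.
stall_cpi = cpi_memhier(0.02, 0.05, 0.20 + 0.10, 100)
# 0.02 * 100 + 0.30 * 0.05 * 100 = 2 + 1.5, about 3.5 cycles per instruction
```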
45
IMPACT OF L1+L2 CACHES
❑ CPImemhier =
47
PROCESSOR ORGANIZATION IMPACT ON CPI (EXAMPLE 1)
49
PROCESSOR ORGANIZATION IMPACT ON CPI (EXAMPLE 2)
50
COMPILER IMPACT ON CPI (EXAMPLE 3)
51
BEFORE NEXT CLASS
• Next time: Multicore
52