
EE204

Computer Architecture

Lecture 31 - Cache Performance
14th Apr, 2011

EE204 L31 Humaira, Spring 11 1


Measuring & Improving Cache Performance
• Reducing Miss Rate
– Reducing the probability that two different memory addresses contend for the same cache location
• Reducing Miss Penalty
– Adding an additional level of cache
• CPU time = (CPU execution clock cycles + memory-stall clock cycles) x clock cycle time
– CPU execution clock cycles: cycles spent executing instructions
– Memory-stall clock cycles: cycles spent waiting for the memory system



Cache Performance
• Memory-Stall Clock Cycles
– Clock cycles lost to cache misses
– = Read-Stall Cycles + Write-Stall Cycles
• Read-Stall Cycles
– = Read accesses per program x Read Miss Rate x Read Miss Penalty (clock cycles)
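The read-stall product can be sketched in a few lines of Python. The function name and the workload numbers below are illustrative assumptions, not values from the lecture.

```python
def read_stall_cycles(reads_per_program, read_miss_rate, read_miss_penalty):
    """Clock cycles lost to read misses over a whole program."""
    return reads_per_program * read_miss_rate * read_miss_penalty


# Hypothetical workload: 1,000,000 reads, 4% read miss rate, 40-cycle penalty.
cycles = read_stall_cycles(1_000_000, 0.04, 40)
print(f"read-stall cycles: {cycles:,.0f}")
```

Each factor scales the result linearly, so halving either the miss rate or the miss penalty halves the read-stall cycles.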



Cache Performance
• Write-Stall Cycles
• Write-through scheme has two sources of stalls
– Write misses (the block must be fetched before the write continues)
– Write-buffer stalls (the write buffer is full when a write occurs)
– Write-buffer stalls depend on the timing of writes, not just their frequency; with a sufficiently deep write buffer they are insignificant and can be ignored



Cache Performance
• Write-Stall Cycles (write-through scheme)
– = Write accesses per program x Write Miss Rate x Write Miss Penalty (clock cycles), plus write-buffer stalls when they are not negligible
• Write-back scheme
– Additional stalls can occur when a replaced cache block has to be written back to memory
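A minimal sketch of the write-through write-stall calculation, assuming write-buffer stalls are supplied as a separate cycle count that defaults to zero (the "sufficiently deep buffer" case). The function name, parameter names and sample numbers are hypothetical.

```python
def write_stall_cycles(writes_per_program, write_miss_rate,
                       write_miss_penalty, write_buffer_stalls=0.0):
    """Write-through write stalls: miss stalls plus optional buffer stalls."""
    miss_stalls = writes_per_program * write_miss_rate * write_miss_penalty
    return miss_stalls + write_buffer_stalls


# Hypothetical workload: 200,000 writes, 5% write miss rate, 40-cycle penalty.
ignoring_buffer = write_stall_cycles(200_000, 0.05, 40)
with_buffer = write_stall_cycles(200_000, 0.05, 40, write_buffer_stalls=1_000)
print(ignoring_buffer, with_buffer)
```

The sketch does not model write-back caches, whose extra stalls depend on how often a replaced block must be written to memory.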



Cache Performance
• Write-through cache scheme
– Read Miss Penalty = Write Miss Penalty, so reads and writes can be combined under a single miss rate and miss penalty
• Memory-Stall Clock Cycles
– = Memory accesses per program x Miss Rate x Miss Penalty
• Equivalently, Memory-Stall Clock Cycles
– = Instructions per program x Misses per instruction x Miss Penalty
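The two forms give the same result once misses per instruction is taken as accesses per instruction times miss rate. The sketch below checks this numerically; the function names and all input values are assumptions chosen for illustration.

```python
import math


def stalls_from_accesses(accesses_per_program, miss_rate, miss_penalty):
    """Memory-stall cycles from total memory accesses."""
    return accesses_per_program * miss_rate * miss_penalty


def stalls_from_instructions(instructions, misses_per_instruction, miss_penalty):
    """Memory-stall cycles from instruction count."""
    return instructions * misses_per_instruction * miss_penalty


# Hypothetical program: 1,000,000 instructions, 1.36 memory accesses per
# instruction (one fetch plus 0.36 data accesses), 3% combined miss rate.
instructions = 1_000_000
accesses_per_instruction = 1.36
miss_rate = 0.03
miss_penalty = 40

by_access = stalls_from_accesses(
    instructions * accesses_per_instruction, miss_rate, miss_penalty)
by_instruction = stalls_from_instructions(
    instructions, accesses_per_instruction * miss_rate, miss_penalty)

print(math.isclose(by_access, by_instruction))
```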



Cache Performance Example
• Instruction Cache Miss Rate = 2%
• Data Cache Miss Rate = 4%
• Machine CPI = 2 without memory stalls
• Miss Penalty = 40 cycles
• 36% of instructions are data-access instructions (loads and stores)
• How much faster would the machine run with a perfect cache that never misses?



Cache Performance Example
• Instruction Miss cycles = I x 2% x 40 = 0.80I
• Data Miss cycles = I x 36% x 4% x 40 = 0.57I
• Total Memory Stall cycles = 1.37I
• CPI with Memory stall = 2+1.37 = 3.37
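The arithmetic on this slide can be reproduced with a short Python sketch, where I is the instruction count and every result is expressed per instruction. The function and variable names are illustrative. The unrounded products are 0.576 and 1.376 cycles per instruction, which the slide shows to two decimals as 0.57 and 1.37.

```python
def stall_cycles_per_instruction(i_miss_rate, d_miss_rate,
                                 data_access_fraction, miss_penalty):
    """Return (instruction-miss, data-miss) stall cycles per instruction."""
    instruction_stalls = i_miss_rate * miss_penalty
    data_stalls = data_access_fraction * d_miss_rate * miss_penalty
    return instruction_stalls, data_stalls


# Values from the example: 2% I-cache misses, 4% D-cache misses,
# 36% data-access instructions, 40-cycle miss penalty, base CPI of 2.
i_stalls, d_stalls = stall_cycles_per_instruction(0.02, 0.04, 0.36, 40)
total_stalls = i_stalls + d_stalls
cpi_with_stalls = 2 + total_stalls

print(f"instruction miss cycles per instruction: {i_stalls:.3f}")
print(f"data miss cycles per instruction:        {d_stalls:.3f}")
print(f"CPI with memory stalls:                  {cpi_with_stalls:.3f}")
```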



Cache Performance Example
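• Speedup with a perfect cache
= CPU time with stalls / CPU time with perfect cache
= (I x CPIstall x clock cycle) / (I x CPIperfect x clock cycle)
= CPIstall / CPIperfect
= 3.37 / 2
= 1.68 times faster
• If the processor is made faster so that CPI = 1 without memory stalls, the stall cycles stay at 1.37 per instruction
• Speedup with a perfect cache = 2.37 / 1 = 2.37 times faster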
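Because instruction count and clock cycle time cancel, the speedup reduces to a ratio of CPIs. The sketch below assumes the slide's stall figure of 1.37 cycles per instruction; the function name is hypothetical.

```python
def perfect_cache_speedup(base_cpi, stall_cpi):
    """Speedup of a never-missing cache over one that adds stall_cpi."""
    return (base_cpi + stall_cpi) / base_cpi


STALL_CPI = 1.37  # memory-stall cycles per instruction, as on the slide

print(f"base CPI 2: {perfect_cache_speedup(2.0, STALL_CPI):.3f}")
print(f"base CPI 1: {perfect_cache_speedup(1.0, STALL_CPI):.3f}")
```

The same stall cost produces a larger ratio at the lower base CPI, which is why the perfect cache looks more valuable on the faster processor.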



Cache Performance Example
• Clock rate is doubled while memory speed stays the same, so the miss penalty becomes 80 cycles
• Total miss cycles per instruction
= (2% x 80) + 36% x (4% x 80) = 2.75
• CPI with the fast clock = 2 + 2.75 = 4.75
• Performance with fast clock / Performance with slow clock
= Execution time with slow clock / Execution time with fast clock
= (IC x CPIslow x clock cycle) / (IC x CPIfast x clock cycle/2)
= 3.37 / (4.75 x 1/2)
= 1.42
• The machine with the faster clock is about 1.42 times faster, not 2.00 times faster

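The clock-doubling comparison can be sketched as follows, assuming the memory access time is fixed in absolute terms so that the miss penalty in cycles scales with the clock factor. The function names are illustrative, and the code keeps unrounded intermediate values.

```python
def cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate,
                    data_access_fraction, miss_penalty):
    """Base CPI plus memory-stall cycles per instruction."""
    misses_per_instruction = i_miss_rate + data_access_fraction * d_miss_rate
    return base_cpi + misses_per_instruction * miss_penalty


def clock_scaling_speedup(clock_factor, base_cpi, i_miss_rate, d_miss_rate,
                          data_access_fraction, slow_miss_penalty):
    """Speedup from multiplying the clock rate by clock_factor."""
    cpi_slow = cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate,
                               data_access_fraction, slow_miss_penalty)
    # Same memory latency in nanoseconds costs more of the shorter cycles.
    cpi_fast = cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate,
                               data_access_fraction,
                               slow_miss_penalty * clock_factor)
    # Execution time is proportional to CPI x cycle time; the fast cycle
    # time is the slow one divided by clock_factor.
    return cpi_slow / (cpi_fast / clock_factor)


speedup = clock_scaling_speedup(2, 2.0, 0.02, 0.04, 0.36, 40)
print(f"speedup from doubling the clock: {speedup:.2f}")
```

Passing a clock factor of 1 returns a speedup of exactly 1, which is a quick sanity check on the formula.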
Cache Performance Example
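• Relative cache penalties increase as the machine becomes faster
• If both clock rate and CPI improve, memory stalls hurt performance twice over
– The lower the CPI, the more pronounced the effect of stall cycles
– A higher CPU clock rate leads to a larger miss penalty in clock cycles, because main memory is unlikely to speed up as fast as the processor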
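The first effect can be illustrated by computing the share of execution time spent in memory stalls as the base CPI falls. This sketch reuses the 1.37 stall cycles per instruction from the earlier example; the base CPI of 0.5 is a hypothetical extra data point, not a value from the lecture.

```python
def stall_fraction(base_cpi, stall_cpi):
    """Fraction of total cycles spent waiting on memory."""
    return stall_cpi / (base_cpi + stall_cpi)


STALL_CPI = 1.37  # memory-stall cycles per instruction from the example
base_cpis = (2.0, 1.0, 0.5)  # 0.5 is an assumed, hypothetical value

fractions = [stall_fraction(cpi, STALL_CPI) for cpi in base_cpis]
for cpi, fraction in zip(base_cpis, fractions):
    print(f"base CPI {cpi}: {fraction:.0%} of cycles are memory stalls")
```

For base CPIs of 2 and 1 the stall share is roughly 41% and 58%, so the same cache behaviour consumes a growing portion of run time as the core improves.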
