Seminar PPT SJ

Journal Paper

 B-FETCH: Branch Prediction Directed Prefetching
for In-Order Processors

 Proposed by Reena Panda, a graduate student at
Texas A&M, her advisor Dr. Paul V. Gratz, and
their collaborator Dr. Daniel Jimenez at UTSA.

 One of four papers chosen to receive the "Best
Papers of IEEE Computer Architecture Letters"
award for 2011.
Outline
2

 Problem Definition

 In-Order Processors

 Data Cache Prefetching

 Spatial Memory Streaming (SMS)

 B-FETCH

 Comparison with SMS

Problem Definition
3

 Energy efficiency is a key constraint in modern
computer system design.

 One solution is the chip multiprocessor: a CPU
chip with many simple processor cores, to reduce
power consumption while increasing overall
performance.

 But the simple in-order cores used in such designs
find it hard to hide memory latency.

In-Order Processors
4

 Execute instructions in their original program order.

 Can sit idle while data is retrieved for the next
instruction in the program.

 Instructions are fetched, executed, and completed in
compiler-generated order.

 If one instruction stalls, all the instructions behind
it stall.

 Instructions are statically scheduled.

Data Cache Prefetching
5

 Brings data into the caches before it is needed,
i.e., before a miss occurs.

 Reduces the cache miss rate by eliminating some
on-demand data movement in the cache
hierarchy.

 The central problem is to detect and predict which
memory references will be needed.
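
The idea on this slide can be shown with a toy simulation (my illustration, not any specific hardware design): a simple cache with an optional next-line prefetcher, run over a sequential access stream.

```python
# Toy cache model: count misses with and without a next-line
# prefetcher on a sequential access stream. This is a sketch of the
# general prefetching idea, not the B-Fetch mechanism.

def run(addresses, prefetch=False):
    cache = set()                      # set of resident cache lines
    misses = 0
    for addr in addresses:
        line = addr // 64              # 64-byte cache lines
        if line not in cache:
            misses += 1
            cache.add(line)
        if prefetch:
            cache.add(line + 1)        # prefetch the next sequential line

    return misses

stream = list(range(0, 4096, 8))       # sequential 8-byte accesses
print(run(stream))                     # one demand miss per line: 64
print(run(stream, prefetch=True))      # only the very first line misses: 1
```

With prefetching enabled, every miss after the first is eliminated because the next line is already resident when the program reaches it.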
Spatial Memory Streaming
6

 Predicts the future access pattern within a spatial region
around a miss.

 Based on a history of the access patterns triggered by the
same missing instruction in the past.

 Indirectly infers future program control flow to speculate
on the misses in a spatial region.

 As a result, its overhead in terms of required state can be
high.
B-FETCH
7

 Employs two speculative components, speculating on
a) the memory instructions to be executed, and
b) the effective addresses of those instructions.

 For the first, a look-ahead mechanism predicts the
future execution path.

 For the second, it exploits the correlation
between effective address values in a basic
block and their dependent register values at
earlier branch instructions.
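
The first component can be sketched as follows (my simplification of the look-ahead idea, not the paper's hardware): starting from the current basic block, follow the branch predictor several blocks down the predicted path and queue prefetches for the loads expected in each block. The CFG and predictor below are hypothetical stand-ins.

```python
# Branch-directed look-ahead sketch.
# Hypothetical CFG: block -> (taken target, not-taken target, load addrs)
cfg = {
    "A": ("B", "C", [0x100]),
    "B": ("D", "D", [0x200, 0x240]),
    "C": ("D", "D", [0x300]),
    "D": (None, None, [0x400]),
}

def predict_taken(block):
    # Stand-in for a real branch predictor: always predict taken.
    return True

def lookahead_prefetch(block, depth):
    queue = []
    for _ in range(depth):
        taken, not_taken, loads = cfg[block]
        queue.extend(loads)            # prefetch this block's loads
        nxt = taken if predict_taken(block) else not_taken
        if nxt is None:                # predicted path ends
            break
        block = nxt
    return queue

print([hex(a) for a in lookahead_prefetch("A", depth=3)])
```

Walking A → B → D issues prefetches for all loads on the predicted path, several blocks ahead of the fetch stage.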
Contd…
8

 Modes of Operation:
B-Fetch supports two modes of operation,
Non-Loop mode and Loop mode.

 Non-Loop mode is used when each branch leads to a
new basic block,

 while Loop mode is used when executing loops, which
cause repeated branches back to the same basic block.
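
What Loop mode can exploit might be sketched like this (an assumption on my part, not the paper's exact mechanism): when the same basic block repeats, a load's effective addresses often advance by a fixed stride, so the prefetcher can run several iterations ahead.

```python
# Loop-mode sketch: detect a stable stride across iterations of the
# same load and prefetch a few iterations ahead. Illustrative only.

def loop_mode_prefetch(history, ahead=4):
    # history: effective addresses of one load over recent iterations
    stride = history[-1] - history[-2]
    if stride != history[-2] - history[-3]:
        return []                      # stride not stable; issue nothing
    return [history[-1] + stride * k for k in range(1, ahead + 1)]

print(loop_mode_prefetch([0x1000, 0x1010, 0x1020]))
```

An unstable stride yields no prefetches, which keeps the speculation quiet when the loop's access pattern is irregular.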
Overall System Architecture
9
Operation
10

 Consider the following code:

if (x > 0)
{
    a = 0;
    b = 1;
    c = 2;
}
d = 3;

Assume that x is not greater than zero.
Contd…
11

 In normal execution, the pipeline proceeds as follows
("squash" marks instructions cancelled once the branch
resolves not-taken):

Cycle  Fetch       Decode       Execute      Save
1      if (x>0)
2      a=0         if (x>0)
3      b=1         a=0          if (x>0)
4      d=3         squash b=1   squash a=0   if (x>0)
5                  d=3          squash b=1   squash a=0
6                               d=3          squash b=1
7                                            d=3
Contd…
12

 Using B-FETCH, the not-taken path is predicted, so
d=3 is fetched immediately and nothing is squashed:

Cycle  Fetch       Decode       Execute      Save
1      if (x>0)
2      d=3         if (x>0)
3                  d=3          if (x>0)
4                               d=3          if (x>0)
5                                            d=3
Comparison
13
Comparison with SMS
14

Prefetcher  System Component           No. of Entries  Size
B-FETCH     Branch Trace Cache         256             2 KB
            Branch Register Table      128             6.125 KB
            Register File              64              256 Bytes
            Prefetch Filter            1K              384 Bytes
            Prefetch Queue             100             75 Bytes
            Path Confidence Estimator  2K              2 KB
            Misc.                      -               300 Bytes
            Total Size                                 11.1 KB
Contd…
15

Prefetcher  System Component         No. of Entries  Size
SMS         Active Generation Table  64              2.937 KB
            Filter Table             32              1.46 KB
            Pattern History Table    2K              28 KB
            Total Size                               32.4 KB


Conclusion
16

 B-Fetch accurately generates the future basic block
trace and initiates data prefetching for the memory
instructions in those future basic blocks.

 It is capable of generating accurate and timely
prefetches for data exhibiting both regular and
irregular access patterns.

 It provides a mean benefit of 39% over the baseline
while incurring a minimal additional hardware cost
of 11.1 KB.
Future Work
17

 Investigate whether B-Fetch performs comparably on
superscalar processors.

 Explore how B-Fetch performs on other workloads,
including commercial workloads, which tend to be
more irregular and hence may benefit from the design.

 Finally, explore how the B-Fetch prefetcher
might be virtualized to further reduce overheads.
References
18

 Reena Panda, Paul V. Gratz, and Daniel A. Jimenez,
"B-Fetch: Branch Prediction Directed Prefetching
for In-Order Processors," IEEE Computer
Architecture Letters, 2011.

 Stefan G. Berg, "Cache Prefetching," Technical
Report UW-CSE 02-02-04, University of
Washington, February 2002.