Instruction Level Parallelism
Scheduling
CPI
Pipelined CPI = Ideal CPI + Structural Stalls + Data Hazard Stalls + Control Stalls
Reduce Stalls!
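A small worked example of the CPI equation may help. The stall values below are hypothetical numbers chosen only for illustration (they are not measurements from any machine), and the function name `pipelined_cpi` is an assumption of this sketch.

```python
def pipelined_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipelined CPI; every stall term is in stall cycles per instruction."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls

# Hypothetical stall contributions, for illustration only.
baseline = pipelined_cpi(1.0, 0.25, 0.5, 0.25)

# Same hypothetical machine after scheduling removes half of the data-hazard stalls.
scheduled = pipelined_cpi(1.0, 0.25, 0.25, 0.25)

# With the same instruction count and clock rate, speedup is the CPI ratio.
speedup = baseline / scheduled
print(baseline, scheduled, round(speedup, 3))
```

Under these assumed numbers, reducing only the data-hazard term lowers the CPI from 2.0 to 1.75, which is why the techniques that follow each target one of the stall terms.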
Parallelism in loops
for (i = 0; i < 100; i++)
    x[i] = x[i] + y[i];
Every iteration is independent of the others, so unrolling the loop will expose the parallelism!
Data dependences
Instruction I is data dependent on instruction J (I uses a result produced by J)
Or I is dependent on J and J is dependent on K, and so on
This forms a chain of dependences!
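Name Dependences
Anti-dependence: a later instruction writes a register or memory location that an earlier instruction reads
Output dependence: two instructions write the same register or memory location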
Data Hazards
RAW: read after write (due to data dependence, also called true dependence)
WAR: write after read (due to anti-dependence)
WAW: write after write (due to output dependence)
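The three hazard types can be told apart by comparing the registers two instructions read and write. The following is a minimal sketch; the `Instr` record, the `hazards` function, and the register names are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# Hypothetical instruction record: one destination register and a tuple of sources.
Instr = namedtuple("Instr", ["dest", "srcs"])

def hazards(first, second):
    """Return the hazards that the later instruction `second` has with `first`."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")  # second reads what first writes (true dependence)
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")  # second writes what first reads (anti-dependence)
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")  # both write the same register (output dependence)
    return found

add = Instr(dest="F0", srcs=("F2", "F4"))   # F0 = F2 + F4
mul = Instr(dest="F2", srcs=("F0", "F6"))   # F2 = F0 * F6
sub = Instr(dest="F0", srcs=("F8", "F10"))  # F0 = F8 - F10

print(sorted(hazards(add, mul)))  # RAW on F0 and WAR on F2
print(sorted(hazards(add, sub)))  # WAW on F0
```

Only RAW reflects a real flow of data; WAR and WAW come from reusing a register name, which is why renaming can remove them.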
Control dependence
if (p1) {
    S1;
}
if (p2) {
    S2;
}
S1 is control dependent on p1; S2 is control dependent on p2 but not on p1
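Loop Unrolling
Unroll the loop: replicate the loop body several times per iteration
Use different symbolic names (registers) for each copy of the body
Fewer branch instructions are executed, because the loop test runs once per unrolled iteration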
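As a rough illustration of these three points, here is the x[i] = x[i] + y[i] loop unrolled by a factor of 4. It is written in Python as a stand-in for the compiler-level transformation; the unroll factor, the function names, and the temporaries `t0`..`t3` are assumptions of this sketch.

```python
def add_rolled(x, y):
    """Original loop: one element and one loop test per iteration."""
    for i in range(len(x)):
        x[i] = x[i] + y[i]

def add_unrolled_by_4(x, y):
    """Loop unrolled 4 times; assumes len(x) is a multiple of 4 (100 is)."""
    for i in range(0, len(x), 4):
        # Distinct temporaries play the role of different symbolic names:
        # the four additions do not depend on each other.
        t0 = x[i] + y[i]
        t1 = x[i + 1] + y[i + 1]
        t2 = x[i + 2] + y[i + 2]
        t3 = x[i + 3] + y[i + 3]
        x[i], x[i + 1], x[i + 2], x[i + 3] = t0, t1, t2, t3

a = list(range(100))
b = list(range(100))
y = [2 * v for v in range(100)]
add_rolled(a, y)
add_unrolled_by_4(b, y)
print(a == b)
```

The unrolled version performs the loop test 25 times instead of 100, and its four additions per iteration are candidates for overlapped execution.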
Branch Prediction
The Frequency of Conditional Branches
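1-bit predictors
Each entry records whether the branch was last taken or not taken
Use the low-order bits of the branch address to index the branch-prediction buffer (and read the prediction bit)
Multiple branch addresses can map to the same predictor entry
If a branch is almost always taken, a single not-taken outcome causes 2 mispredictions: one when the branch is not taken, and one the next time it is taken
Use 2-bit predictors to avoid this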
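The double misprediction of a 1-bit predictor can be checked with a tiny simulation of a single predictor entry. This is a sketch only: the function name and the branch trace are hypothetical, and aliasing between branches is ignored.

```python
def one_bit_mispredictions(outcomes, initial_prediction=True):
    """Count mispredictions of one 1-bit predictor entry.

    outcomes: iterable of booleans, True meaning the branch was taken.
    """
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # 1-bit scheme: remember only the last outcome
    return misses

# Hypothetical trace: taken 9 times, not taken once, then taken 9 more times.
trace = [True] * 9 + [False] + [True] * 9
print(one_bit_mispredictions(trace))
```

The single not-taken outcome costs two mispredictions: the not-taken execution itself, and the following taken execution, because the bit was flipped.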
2-bit Predictors
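A common way to realise a 2-bit predictor is a saturating counter, sketched below. This is one variant and may differ in detail from the state diagram on the original slide; the function name and the trace are hypothetical.

```python
def two_bit_mispredictions(outcomes, counter=3):
    """Count mispredictions of one 2-bit saturating counter.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# Hypothetical trace: taken 9 times, not taken once, then taken 9 more times.
trace = [True] * 9 + [False] + [True] * 9
print(two_bit_mispredictions(trace))
```

On this trace the counter mispredicts only once, because a prediction must be wrong twice in a row before it changes.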
if (bb == 2)
    bb = 0;
if (aa != bb)
{
}
(m, n) Predictors
Total number of bits = 2^m x n x number of prediction entries selected by the branch address
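The size formula and the lookup it implies can be sketched as follows. The class name, the saturating-counter update, and the use of `pc % entries` as the index are assumptions of this sketch, not details taken from the slides.

```python
class CorrelatingPredictor:
    """(m, n) predictor sketch: m bits of global history choose among 2**m
    n-bit saturating counters in the entry indexed by the branch address."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.history = 0  # last m branch outcomes, newest in bit 0
        self.max_count = (1 << n) - 1
        # table[entry][history] is an n-bit counter, initialised to 0.
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def size_in_bits(self):
        return (1 << self.m) * self.n * self.entries

    def predict(self, pc):
        counter = self.table[pc % self.entries][self.history]
        return counter > self.max_count // 2  # upper half of the range means taken

    def update(self, pc, taken):
        row = self.table[pc % self.entries]
        c = row[self.history]
        row[self.history] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        mask = (1 << self.m) - 1
        self.history = ((self.history << 1) | int(taken)) & mask

predictor = CorrelatingPredictor(m=2, n=2, entries=1024)
print(predictor.size_in_bits())
```

For a (2, 2) predictor with 1024 entries the formula gives 4 x 2 x 1024 = 8192 bits. Because the history selects the counter, a strictly alternating branch becomes predictable after a short warm-up, which a plain 2-bit counter cannot achieve.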
Tournament Predictors
Score-boarding revision
Tomasulo's approach
Stages
1. Issue
2. Execute
Delay execution till the operands are available (this avoids RAW hazards)
3. Write result
Write the result onto the common data bus (CDB).
This in turn writes to the registers and to any reservation stations waiting for this operand
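Reservation station fields
Op: operation to perform on the source operands
Qj, Qk: reservation stations that will produce the source operand values
Vj, Vk: actual values of the source operands that are already available
A: address information (for a load or store)
Busy: indicates that the station is busy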
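The fields listed above map naturally onto a small record, and a write on the CDB becomes a loop over the waiting stations. This is a minimal sketch under simplifying assumptions: the station names, the `broadcast` helper, and the example values are hypothetical, and register-file updates are left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # value of the first operand, once available
    vk: Optional[float] = None  # value of the second operand, once available
    qj: Optional[str] = None    # station that will produce the first operand
    qk: Optional[str] = None    # station that will produce the second operand
    a: Optional[int] = None     # address field, used by loads and stores

    def ready(self):
        """Ready to execute once no producer is still pending (RAW resolved)."""
        return self.busy and self.qj is None and self.qk is None

def broadcast(stations, producer, value):
    """Model a CDB write: every station waiting on `producer` captures `value`."""
    for rs in stations:
        if rs.qj == producer:
            rs.vj, rs.qj = value, None
        if rs.qk == producer:
            rs.vk, rs.qk = value, None

# Hypothetical state: Add1 waits on Mult1 for its first operand.
mult1 = ReservationStation("Mult1", busy=True, op="MUL", vj=2.0, vk=3.0)
add1 = ReservationStation("Add1", busy=True, op="ADD", qj="Mult1", vk=1.0)
print(mult1.ready(), add1.ready())
```

Calling `broadcast([mult1, add1], "Mult1", 6.0)` fills `add1.vj` and clears `add1.qj`, after which `add1.ready()` becomes true.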
Advantages
Independent of the pipeline used
Effective in the presence of caches (even though it was designed before caches existed!)
Hardware-based speculation
Basic idea
Why can't an instruction speculatively proceed ahead if its operands are available?
Just don't commit the instruction until you are sure it should have executed
In a sense, separate the execution of an instruction from its commit
Reorder buffer (ROB)
Four steps
Issue
Issue only if there is an empty ROB slot AND an empty reservation station; otherwise instruction issue is stalled
Execute
Execute when the operands are available (this avoids RAW hazards)
Write result
Write onto the CDB, then to the ROB and to any reservation stations waiting for the result
Commit
Normal commit: when the instruction reaches the head of the ROB
In case of a store, write to memory
In case of wrong speculation, flush the remaining entries in the ROB
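The commit rules above can be sketched with a queue. This is a simplified model under stated assumptions: only the ROB is modelled (reservation stations are left out), at most one instruction commits per call, and the class, method, and field names are hypothetical.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()  # oldest instruction at the left (head)
        self.registers = {}     # committed register state
        self.memory = {}        # committed memory state

    def issue(self, kind, dest):
        """kind is 'alu', 'store' or 'branch'. Returns None when issue must stall."""
        if len(self.entries) == self.size:
            return None
        entry = {"kind": kind, "dest": dest, "value": None,
                 "done": False, "mispredicted": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value, mispredicted=False):
        entry["value"], entry["done"], entry["mispredicted"] = value, True, mispredicted

    def commit(self):
        """Commit one instruction, only from the head and only when it is done."""
        if not self.entries or not self.entries[0]["done"]:
            return False
        entry = self.entries.popleft()
        if entry["kind"] == "store":
            self.memory[entry["dest"]] = entry["value"]
        elif entry["kind"] == "branch":
            if entry["mispredicted"]:
                self.entries.clear()  # flush everything younger than the branch
        else:
            self.registers[entry["dest"]] = entry["value"]
        return True

rob = ReorderBuffer(size=4)
i1 = rob.issue("alu", "R1")
br = rob.issue("branch", None)
i3 = rob.issue("alu", "R2")
rob.write_result(i3, 99)   # finishes early, but only speculatively
print(rob.commit())        # the head (i1) is not done yet, so nothing commits
```

Even though `i3` has its result, nothing reaches the registers until it arrives at the head of the buffer. If the branch ahead of it turns out to be mispredicted, `i3` is flushed without ever updating `R2`.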
Recovery
1.
Or
2. Build the logic necessary to handle two instructions at once, including any possible dependences between the instructions
Additional challenge: how to complete and commit multiple instructions per clock cycle?
Speculation: Implementation Issues & Execution