
CS 6461: Computer Architecture

Instruction Level Parallelism


Instructor: M. Lancaster
Corresponding to Hennessy and Patterson
Fifth Edition
Section 3.1

Instruction Level Parallelism


Almost all processors since 1985 have used pipelining to overlap
the execution of instructions and improve performance.
This potential overlap among instructions is called
instruction-level parallelism (ILP)
Pipelining was first introduced in the IBM Stretch (Model 7030)
around 1959
Later, the CDC 6600 incorporated pipelining and the use of
multiple functional units
The Intel i486 was the first pipelined implementation of
the IA-32 architecture

Instruction Level Parallelism


Instruction level parallel processing is the concurrent
processing of multiple instructions
Difficult to achieve within a basic code block
Typical MIPS programs have a dynamic branch frequency of
between 15% and 25%
That is, between three and six instructions execute between a
pair of branches, and data hazards usually exist within these
instructions as they are likely to be dependent
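
A quick arithmetic check on those numbers (mine, not the text's): a 25%
dynamic branch frequency means one branch per 1/0.25 = 4 instructions,
leaving about three non-branch instructions between branches; 15% gives
1/0.15, or roughly six between branches.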

Given the small size of basic blocks, ILP must be exploited
across multiple blocks


Instruction Level Parallelism


The current trend is toward very deep pipelines, increasing
from a depth of < 10 to > 20.
With more stages, each stage can be simpler, with less gate delay
per stage, so very high clock rates are possible.


Loop Level Parallelism


Exploitation among Iterations of a Loop
Loop adding two 1000 element arrays
Code
for(i=1;i<=1000;i=i+1)
x[i]=x[i]+y[i];

If we look at the generated code, within a single iteration there may
be little opportunity for overlap of instructions, but each
iteration of the loop can overlap with any other iteration
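
A sketch of why the iterations can overlap (the unroll factor and the
function name are mine, for illustration): unrolling the loop makes the
independent work explicit, and no statement reads a value that another
statement writes.

   void add_arrays(double x[1001], double y[1001]) {
       /* unrolled by 4; 1000 is divisible by 4, so no cleanup loop */
       for (int i = 1; i <= 1000; i += 4) {
           x[i]   = x[i]   + y[i];
           x[i+1] = x[i+1] + y[i+1];   /* each statement touches      */
           x[i+2] = x[i+2] + y[i+2];   /* distinct elements, so all   */
           x[i+3] = x[i+3] + y[i+3];   /* four can execute in overlap */
       }
   }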


Concepts and Challenges


Approaches to Exploiting ILP
Two major approaches
Dynamic: these approaches depend upon the hardware to
locate the parallelism
Static: fixed solutions generated by the compiler, and thus
bound at compile time

These approaches are not totally disjoint; some techniques
require both
Limitations are imposed by data and control hazards


Features Limiting Exploitation of Parallelism


Program features
Instruction sequences

Processor features
Pipeline stages and their functions

Interrelationships
How do program properties limit performance? Under what
circumstances?


Approaches to Exploiting ILP


Dynamic Approach
Hardware-intensive approach
Dominates the desktop and server markets

Pentium III, 4, Athlon


MIPS R10000/12000
Sun UltraSPARC III
PowerPC 603, G3, G4
Alpha 21264


Approaches to Exploiting ILP


Static Approach
Compiler-intensive approach
Dominates the embedded market and IA-64


Terminology and Ideas


Cycles Per Instruction
Pipeline CPI = Ideal Pipeline CPI + Structural Stalls + Data Hazard Stalls
+ Control Stalls

Ideal pipeline CPI is a measure of the maximum performance
attainable in a given architecture; stalls and/or their impacts
must be minimized
During the 1980s, CPI = 1 was a target objective for single-chip
microprocessors
1990s objective: reduce CPI below 1
Scalar processors are pipelined processors that are designed to fetch and
issue at most one instruction every machine cycle
Superscalar processors are those that are designed to fetch and issue
multiple instructions every machine cycle
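
A worked example with made-up stall rates (illustrative only, not from
the text): suppose the ideal pipeline CPI is 1.0 and a program incurs
0.05 structural, 0.20 data hazard, and 0.15 control stall cycles per
instruction. Then

   Pipeline CPI = 1.0 + 0.05 + 0.20 + 0.15 = 1.40

so the processor runs at 1.0/1.40, about 71% of its ideal throughput.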


Approaches to Exploiting ILP


That We Will Explore

Technique and what it reduces:
Forwarding and bypassing: potential data hazards and stalls
Delayed branches and simple branch scheduling: control hazard stalls
Basic dynamic scheduling (scoreboarding): data hazard stalls from true dependences
Dynamic scheduling with renaming: data hazard stalls, and stalls from antidependences and output dependences
Branch prediction: control stalls
Issuing multiple instructions per cycle: ideal CPI
Hardware speculation: data hazard and control hazard stalls
Dynamic memory disambiguation: data hazard stalls with memory
Basic compiler pipeline scheduling: data hazard stalls
Loop unrolling: control hazard stalls
Compiler dependence analysis, software pipelining, trace scheduling: ideal CPI, data hazard stalls
Hardware support for compiler speculation: ideal CPI, data hazard stalls, control stalls


Approaches to Exploiting ILP


Review of Terminology
Instruction issue:
The process of letting an instruction move from the instruction
decode phase (ID) into the instruction execution (EX) phase

Interlock (pipeline interlock, instruction interlock) is the
resolution of pipeline hazards via hardware. Pipeline
interlock hardware must detect all pipeline hazards and
ensure that all dependencies are satisfied


Data Dependencies and Hazards


How much parallelism exists in a program and how it can
be exploited
If two instructions are parallel, they can execute
simultaneously in a pipeline without causing any stalls
(assuming no structural hazards exist)
There are no dependencies in parallel instructions
If two instructions are not parallel and must be executed in
order, they may often be partially overlapped.


Pipeline Hazards
Hazards make it necessary to stall the pipeline.
Some instructions in the pipeline are allowed to proceed while
others are delayed
In this example pipeline, when an instruction is
stalled, all instructions further back in the pipeline are also
stalled
No new instructions are fetched during the stall
Instructions issued earlier in the pipeline must continue


Data Dependencies and Hazards


Data dependences: an instruction j is data dependent on
instruction i if either of the following holds
Instruction i produces a result that may be used by
instruction j
Instruction j is data dependent on instruction k, and
instruction k is data dependent on instruction i; that is, one
instruction is dependent on another if there exists a chain of
dependences of the first type between the two instructions.
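
A small C illustration of the chained (second) case, with variable
names of my own choosing:

   int chain(int b, int c) {
       int a = b + c;   /* instruction i                        */
       int d = a * 2;   /* instruction k: data dependent on i   */
       int e = d - 1;   /* instruction j: dependent on k, hence */
       return e;        /* transitively data dependent on i     */
   }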


Data Dependencies and Hazards

Data Dependences
Code Example

LOOP: L.D    F0,0(R1)     ;F0 = array element
      ADD.D  F4,F0,F2     ;add scalar in F2
      S.D    F4,0(R1)     ;store result
      DADDUI R1,R1,#-8    ;decrement pointer by 8
      BNE    R1,R2,LOOP   ;branch if R1 != R2

The first two dependences (L.D to ADD.D, and ADD.D to S.D) flow
through floating-point data; the last two instructions (DADDUI and
BNE) are linked through integer data (R1)


Data Dependencies and Hazards


Data Dependences
These dependences show where the order of instructions must be
preserved
If two instructions are dependent, they cannot be
simultaneously executed or be completely overlapped


Data Dependencies and Hazards


Dependencies are properties of programs
Whether a given dependence results in an actual hazard
being detected and whether that hazard actually causes a
stall are properties of the pipeline organization


Data Dependencies and Hazards


Hazard created
Code Example

DADDUI R1,R1,#-8    ;decrement pointer by 8
BNE    R1,R2,LOOP   ;branch if R1 != R2

A hazard is created when the branch test is moved from the EX stage
to the ID stage: BNE then needs R1 a cycle earlier than DADDUI can
produce it
If the test had stayed in EX, the dependence would not cause a stall
(but the branch delay would still be two cycles)


Data Dependencies and Hazards


[Figure: two versions of the MIPS pipeline datapath, with hazard
detection and forwarding units. In one, the branch destination and
test are known at the end of the second cycle of execution; in the
other, at the end of the third cycle.]

Data Dependencies and Hazards


Presence of a dependence indicates the potential for a hazard, but
the actual hazard and the length of any stall are properties of the
pipeline.
Data dependence
Indicates the possibility of a stall
Determines the order in which results are calculated
Sets an upper bound on how much parallelism can possibly be
exploited.

We will focus on overcoming these limitations


Overcoming Dependences

Two Ways
1. Maintain dependence but avoid the hazard

Schedule the code dynamically

2. Transform the code to eliminate the dependence


Difficulty in Detecting Dependences


A data value may flow between instructions either through
registers or through memory locations
Therefore, detection is not always straightforward
Dependences that flow through registers are easy to detect;
dependences that flow through memory locations are harder
For example, with R4 = 20 and R6 = 100, the references 100(R4)
and 20(R6) look different but both address location 120
Conversely, if R4 is incremented between two references that are
both written 20(R4), the references look identical but address
different locations
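
A C sketch of the same difficulty (the function and names are mine):
whether the two memory references conflict depends on run-time pointer
values, which neither the compiler nor simple hardware can always
resolve.

   /* If a == b + 1 at run time, iteration i writes the location that
      iteration i+1 reads, so the iterations are dependent; if the
      arrays do not overlap, the iterations are independent. */
   void update(double *a, double *b, int n) {
       for (int i = 0; i < n; i++)
           a[i] = b[i] + 1.0;
   }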


Name Dependences; Two Categories


Two instructions use the same register or memory location,
called a name, but there is actually no flow of data between
the instructions associated with that name. Assume that i
precedes j:
1. An antidependence between instructions i and j occurs when
instruction j writes a register or memory location that instruction i
reads. The original ordering must be preserved
2. An output dependence occurs when instruction i and instruction
j write the same register or memory location, the order again must
be preserved


Name Dependences; Two Categories


1. An antidependence
   i: DADDUI R1,R2,#-8
   j: DADDUI R2,R5,#0    ;j writes R2, which i reads

2. An output dependence
   i: DADDUI R1,R2,#-8
   j: DADDUI R1,R4,#10   ;i and j both write R1

Name Dependences
Not true data dependences, and therefore we could execute
them simultaneously or reorder them if the name (register or
memory location) used in the instructions is changed so that
the instructions do not conflict
Renaming registers is easier than renaming memory locations

Before renaming (antidependence on R2):
   i: DADDUI R1,R2,#-8
   j: DADDUI R2,R4,#10

After renaming R2 to R5 in j:
   i: DADDUI R1,R2,#-8
   j: DADDUI R5,R4,#10
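
The same idea at source level, as a sketch with names of my own: giving
the second definition its own name removes the name dependences without
changing the result.

   int renamed(int a, int b, int c, int d) {
       int t1 = a - 8;    /* was: t = a - 8                        */
       int u  = t1 + b;   /* reads the first value                 */
       int t2 = c + 10;   /* was: t = c + 10; renaming t to t2     */
       int v  = t2 + d;   /* removes the WAW and WAR hazards on t  */
       return u + v;
   }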


Data Hazards
A hazard is created whenever there is a dependence between
instructions, and they are close enough that the overlap caused
by pipelining or other reordering of instructions would change
the order of access to the operand involved in the dependence.
We must preserve program order: the order in which the instructions
would execute if run one at a time on a non-pipelined machine
However, program order need only be maintained where it
affects the outcome of the program


Data Hazards: Three Types


Two instructions i and j, with i occurring before j in program
order; the possible hazards are:
RAW (read after write): j tries to read a source before i writes it,
so j incorrectly gets the old value
The most common type
Program order must be preserved
In a simple static pipeline, a load instruction followed by an
integer ALU instruction that directly uses the load result will lead
to a RAW hazard
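
A minimal C sketch of a RAW (true) dependence, with illustrative names:

   int raw(int b, int c, int e) {
       int a = b + c;   /* instruction i writes a              */
       int d = a + e;   /* instruction j reads a: j must see   */
       return d;        /* i's result, so order is preserved   */
   }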


Data Hazards: Three Types


Second type:
WAW (write after write): j tries to write an operand before it is
written by i, with the writes ending up in the wrong order, leaving
the value written by i rather than the one written by j
Corresponds to an output dependence
Present in pipelines that write in more than one pipe stage, or that
allow an instruction to proceed even when a previous instruction is
stalled
In the classic five-stage pipeline, writes occur only in the WB
stage, so this class of hazard is avoided
If reordering of instructions is allowed, this is a possible hazard:
for example, an integer instruction that follows a floating-point
instruction in program order may reach its write stage first


Data Hazards: Three Types


Third type:
WAR (write after read): j tries to write an operand before it is
read by i, so i incorrectly gets the new value
Corresponds to an antidependence
Cannot occur in most static pipelines, since reads happen early (in
ID) and writes happen late (in WB)


Control Dependencies
A control dependence determines the ordering of an instruction i
with respect to a branch instruction, so that i is executed in the
correct program order and only when it should be.
Example
if p1 {
S1;
};
if p2 {
S2;
}


Control Dependencies
Example
if p1 {
S1;
};
if p2 {
S2;
}

S1 is control dependent on p1, and S2 is control dependent on
p2 but not on p1


Control Dependencies
Two constraints are imposed:
An instruction that is control dependent on a branch cannot be moved
before the branch, so that its execution is no longer controlled by
the branch. For example, we cannot take a statement from the then
portion of an if statement and move it before the if statement.
An instruction that is not control dependent on a branch cannot be
moved after the branch, so that its execution is controlled by the
branch. For example, we cannot take a statement before the if and
move it into the then portion (see the sketch after the code below).
if p1 {
S1;
};
if p2 {
S2;
}
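
A C sketch of the second constraint (function and names are mine):
moving an assignment that is not control dependent on the branch into
the then portion changes the result whenever the branch is not taken.

   /* original: the assignment to x always executes */
   int before(int p1) {
       int x = 42;
       if (p1) { x = x + 1; /* S1 */ }
       return x;
   }

   /* illegal move: x is now assigned only when p1 is true */
   int after(int p1) {
       int x = 0;
       if (p1) { x = 42; x = x + 1; /* S1 */ }
       return x;   /* wrong when p1 is false */
   }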


Control Dependencies
Two properties of our simple pipeline preserve control
dependencies
Instructions execute in program order
Detection of control or branch hazards ensures that an instruction
that is control dependent on a branch is not executed until the
branch direction is known

We can execute instructions that should not have been
executed (violating the control dependences) if we can do so
without affecting the correctness of the program


Control Dependence Is Not Really the Issue

What must actually be preserved:
Exception behavior
Data flow


Preserving Exception Behavior


Preserving exception behavior means that any changes in the ordering of
instruction execution must not change how exceptions are raised in the
program
We may relax this rule and say that reordering of instruction execution must
not cause any new exceptions

L1:

DADDU R2,R3,R4
BEQZ
R2, L1
LW
R1,0(R2) ;Could cause illegal mem acc

In the above, if we do not maintain the data dependence of R2, we may


change the program. If we ignore the control dependency and move the load
instruction before the branch, the load instruction may cause a memory
protection exception
There is no visible data dependence that prevents this interchange, only
control dependence
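
The same situation in C (a sketch; the functions and names are mine):
hoisting a load above its guard can raise a fault that the original
program never raises.

   /* safe: the dereference is control dependent on the test */
   int read_guarded(int *p) {
       if (p != 0)
           return *p;
       return 0;
   }

   /* after an illegal hoist: the load may fault when p == 0 */
   int read_hoisted(int *p) {
       int v = *p;   /* speculative load: new exception possible */
       if (p != 0)
           return v;
       return 0;
   }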


Preserving Exception Behavior


To allow reordering of these instructions (which, as we said,
preserves the data dependence), we would like to be able to simply
ignore the exception.


Preserving Data Flow


This means preserving the actual flow of data values between
instructions that produce results and those that consume them.
Branches make data flow dynamic, since they allow the source
of data for a given instruction to come from many points


Preserving Data Flow


Example

      DADDU R1,R2,R3
      BEQZ  R4,L
      DSUBU R1,R5,R6
L:    OR    R7,R1,R8   ;value of R1 depends on whether branch taken

The DSUBU cannot be moved above the branch
By preserving the control dependence of the OR on the branch,
we prevent an illegal change to the data flow


Preserving Data Flow


Sometimes violating the control dependence cannot affect either the
exception behavior or the data flow

      DADDU R1,R2,R3
      BEQZ  R1,skip
      DSUBU R4,R5,R6
      DADDU R5,R4,R9
skip: OR    R7,R1,R8   ;suppose R4 is not used after this point

If R4 is unused after this point, changing the value of R4 just
before the branch would not affect the data flow
If R4 were dead here and DSUBU could not generate an exception, we
could move the DSUBU instruction before the branch
This is called speculation, since the compiler is betting on the
branch outcome


Control Dependence Again


Control dependence in the simple pipeline is preserved by
control and branch hazard detection that can cause
control stalls
These stalls can be reduced or eliminated by a variety of
hardware techniques
Delayed branches can reduce stalls arising from control
hazards, but require that the compiler preserve data flow
