
18PD08 / Computer Architecture

Presented By
Dr T Ravichandran
Professor
Department of Computer Science and Engineering
MODULE 2 (PART 3)

PIPELINING
• Pipelining – Basic concepts, Pipeline organization, Pipelining issues
• Data dependencies
• Memory delays
• Branch delays
• Resource limitations
• Performance evaluation
• Superscalar operation
• Pipelining in CISC processors
Pipelining – Basic concepts, Pipeline organization, Pipelining issues
Understanding Pipelining
Pipelining in computer architecture implements a form of parallelism in instruction execution: a pipelined processor works on multiple instructions at the same time, each in a different stage of completion.
Pipelining - Basic Concepts

There are two ways to improve execution speed:

1. Use faster circuit technology in the implementation
2. Arrange the hardware so that more than one operation is performed at the same time

Advantages:
• More efficient use of the processor
• Faster execution of a large number of instructions
Disadvantages:
• Pipelining involves adding hardware to the chip
• The pipeline cannot always run at full speed, because pipeline hazards disrupt its smooth execution
Pipelining -Organisation
Pipelining creates and organizes a pipeline of instructions that the processor can execute in parallel, improving efficiency. The pipeline is divided into logical stages connected to one another to form a pipe-like structure. Instructions enter at one end, progress through the stages, and exit at the other end.

• In the first stage of the pipeline, the program counter (PC) is used to fetch a new instruction.

• At any given time, each stage of the pipeline is processing a different instruction

Interstage buffers
• Information such as register addresses, immediate data, and the operations to be performed must be carried
through the pipeline.

• Interstage Registers RA, RB, RM, RY, and RZ


Pipelining - Organisation

Interstage buffers – uses

• Buffer B1: feeds the Decode stage with a newly fetched instruction.
• Buffer B2: feeds the Compute stage with the two operands, the immediate value, and the incremented PC value used as the return address.
• Buffer B3: feeds the Memory stage with the result of the ALU operation.
• Buffer B4: feeds the Write stage with the value to be written into the register file.

Pipelining issues

Consider the two instructions:

Add R2, R3, #100        (Ij)
Subtract R9, R2, #30    (Ij+1)

• The result of Ij is not written into the register file until cycle 5, while Ij+1 would read R2 in cycle 3.
• If execution proceeds without waiting, the result will be incorrect.
• Therefore Ij+1 must wait until Ij writes its result; that is, Ij+1 must be stalled.
• Subsequent instructions cannot enter the pipeline, which increases execution time.
Pipelining: Pipelining Issues

Any condition that causes the pipeline to stall is called a HAZARD.

Hazards occur due to:

• Unavailability of data (DATA HAZARD)
• Memory delay (MEMORY HAZARD)
• Branch instructions (BRANCH HAZARD)
Pipelining – Unavailability of data (DATA HAZARD)

Consider the two instructions:

Add R2, R3, #100        (Ij)
Subtract R9, R2, #30    (Ij+1)

Clock cycle        1   2   3   4   5   6   7   8   9
Ij   (Add)         F   D   C   M   W
Ij+1 (Subtract)        F   -   -   -   D   C   M   W

• The Subtract instruction Ij+1 is stalled during cycles 3 to 5, until Ij writes its result in cycle 5; Ij+1 then decodes in cycle 6.
• For each stalled cycle, a NOP is sent into the interstage buffer, creating one clock cycle of idle time (a bubble) as it passes down the pipeline.
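The stall arithmetic in the diagram can be checked with a short sketch (illustrative Python, not part of the slides): in this five-stage pipeline, an instruction issued in cycle t completes its Write stage in cycle t + stalls + 4.

```python
# Sketch (illustrative): completion cycle of an instruction in a
# 5-stage pipeline, with any stall cycles added.
STAGES = 5

def write_cycle(issue_cycle, stalls=0):
    """Cycle in which the instruction completes its Write stage."""
    return issue_cycle + stalls + STAGES - 1

print(write_cycle(1))            # Ij: issued in cycle 1, writes in cycle 5
print(write_cycle(2, stalls=3))  # Ij+1: 3-cycle stall, writes in cycle 9
```

This matches the table above: Ij writes in cycle 5, and the stalled Ij+1 writes in cycle 9.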
Pipelining – Unavailability of data (DATA HAZARD)

Data Dependencies – Operand Forwarding

• This approach is one solution to overcome data dependency.
• It provides the value of the operand to the stage that needs it, before the operand is stored in the register file.

Clock cycle   1   2   3   4   5   6
Ij            F   D   C   M   W
Ij+1              F   D   C   M   W

• The forwarding paths use the interstage registers (RA, RB, RY, RZ).
• The result of the Compute (ALU) stage is passed directly to the stage that needs it, so no stall is required.
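A minimal sketch of the forwarding decision (illustrative Python; the function and the register-pair representation are assumptions, not the slides' hardware): if a source register of the current instruction matches the destination of an instruction still in the pipeline, the value is taken from the interstage register instead of the stale register file.

```python
# Sketch (illustrative) of operand selection with forwarding.
def select_operand(src_reg, reg_file, rz=None, ry=None):
    """rz/ry are (dest_reg, value) pairs held in interstage registers."""
    if rz and rz[0] == src_reg:
        return rz[1]          # forward the just-computed ALU result
    if ry and ry[0] == src_reg:
        return ry[1]          # forward from one stage further down
    return reg_file[src_reg]  # no hazard: read the register file

regs = {"R2": 0, "R3": 50}
# Add R2, R3, #100 has just computed 150 into RZ; Subtract reads R2.
val = select_operand("R2", regs, rz=("R2", 150))
print(val)  # 150, forwarded instead of the stale register-file value 0
```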
DATA DEPENDENCY – Unavailability of data (DATA HAZARD)

• The task of detecting data dependencies can be left to the compiler.
• The compiler identifies a data dependency between two successive instructions Ij and Ij+1 and inserts three explicit NOP (no-operation) instructions between them.
• The NOPs introduce the necessary delay.
• This simplifies the hardware implementation, but increases code size.
• Reordering: the compiler can instead place useful, independent instructions in place of the NOPs.


Pipelining – Memory delays

A delay arises when an instruction's memory access causes the pipeline to stall.

Example:
Load R2, (R3)           (Ij)
Subtract R9, R2, #30    (Ij+1)

• Ij may require more than one clock cycle to obtain its operand (the memory word whose address is in R3).
• The operand may not be found in the cache – a CACHE MISS.
• This causes all subsequent instructions to stall.
• If there is also a data dependency, as with Ij+1 above, an additional one-cycle stall occurs even on a cache hit, because the loaded value is only available after the Memory stage.

The compiler can eliminate the one-cycle data-dependency stall by:

• Reordering – inserting between Ij and Ij+1 a useful instruction that does not depend on the Load.
• Inserting a NOP if no useful instruction is found.

Example: operand forwarding from the Memory stage supplies the loaded value directly to the dependent instruction.
Pipelining – Branch Delay (Control Hazard)

A branch delay occurs for control-transfer instructions such as BRANCH, CALL, and JUMP.

• The pipeline begins fetching the instructions that follow the branch before the target address has been calculated; the wrongly fetched instructions must be discarded, causing delay.
• Consider a sequence in which Ij is Add R8, R8, R9 and Ij+1 is Jump LOOP. The next instruction, Ij+2 (Subtract R8, R5, #8), is fetched in the following cycle; once the jump's target address is known, Ij+2 is discarded and replaced by a NOP, creating a one-cycle bubble. Fetching then resumes at the branch target Ij+k.

Pipeline execution order: Ij → Ij+1 → NOP → Ij+k → Ij+l → …

(Slide figure: F D C M W pipeline rows for each of these instructions, with the discarded Subtract crossed out and a bubble after the Jump.)
Performance Evaluation

Instruction throughput: the number of instructions executed per second.

Non-pipelined instruction throughput: Pnp = R / S

R – clock rate (cycles per second)
S – average number of clock cycles to execute one instruction

Pipelined instruction throughput: Pp = R

• With no cache misses, a five-stage pipeline executes one instruction per cycle without stalls (S = 1).
• The ideal pipeline therefore has S = 1.

Performance is affected by:
i) Data dependencies (data hazard)
ii) Branch penalty (branch or control hazard)
iii) Cache misses (memory hazard)
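The throughput formula above can be sketched as a one-line helper (illustrative Python; the names and the 1 GHz clock are assumptions, not from the slides):

```python
# Sketch: instruction throughput P = R / S, where R is the clock rate
# and S the average number of cycles per instruction.
def throughput(clock_rate_hz, avg_cycles_per_instr):
    return clock_rate_hz / avg_cycles_per_instr

R = 1_000_000_000                # assumed 1 GHz clock
print(throughput(R, 4))          # non-pipelined, S = 4: 250000000.0
print(throughput(R, 1))          # ideal pipeline, S = 1: 1000000000.0
```

At the same clock rate, the ideal pipeline (S = 1) gives four times the throughput of the S = 4 non-pipelined case.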
Performance Evaluation – Data Dependency

Effect of stalls due to data dependencies:

Assume that Load instructions constitute 25 percent of the dynamic instruction count, and that 40 percent of these Loads are followed by a dependent instruction. A one-cycle stall is needed in such cases. Evaluate the increase over the ideal case of S = 1.

δstall = 0.25 × 0.40 × 1 = 0.10

Execution time T increases by 10 percent, and throughput is reduced to:

Pp = R / (1 + δstall) = R / 1.1 = 0.91R
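The slide's arithmetic can be reproduced directly (illustrative Python; variable names are assumptions):

```python
# Data-dependency stall penalty: 25% Loads x 40% dependent x 1 cycle.
delta_stall = 0.25 * 0.40 * 1
relative_throughput = 1 / (1 + delta_stall)   # Pp as a fraction of R
print(round(delta_stall, 2))                  # 0.1
print(round(relative_throughput, 2))          # 0.91
```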

07/07/2024 Dr T Ravichandran HOD / CSE 23


Performance Evaluation – Branch Hazard

Effect of the branch penalty:

Assume that branches constitute 20 percent of the dynamic instruction count of a program, and that the average prediction accuracy for branch instructions is 90 percent, i.e., 10 percent of all executed branch instructions incur a one-cycle penalty due to misprediction. The increase in the average number of cycles (S) is:

δbranch_penalty = 0.20 × 0.10 × 1 = 0.02

Throughput is reduced to:

Pp = R / (1 + δbranch_penalty) = R / 1.02 = 0.98R
Performance Evaluation – Memory Delay

The increase over the ideal case of S = 1 due to cache misses is:

δmiss = (mi + d × md) × pm

where
pm – number of penalty cycles for a cache miss
mi – fraction of fetched instructions that cause a cache miss
d – fraction of instructions that are Loads or Stores
md – fraction of Load/Store data accesses that cause a cache miss
Performance Evaluation – Memory Delay

Suppose that 5 percent of all fetched instructions incur a cache miss, 30 percent of all executed instructions are Load or Store instructions, and 10 percent of their data-operand accesses incur a cache miss. Assume that the penalty to access main memory on a cache miss is 10 cycles.

The increase over the ideal case of S = 1 due to cache misses is:

δmiss = (mi + d × md) × pm = (0.05 + 0.30 × 0.10) × 10 = 0.8

• Compared to δstall and δbranch_penalty, the effect of slow main memory on cache misses (δmiss) is far more significant.
• When all factors are combined, S increases from the ideal value of 1 to 1 + δstall + δbranch_penalty + δmiss.
• The contribution of cache misses is often the dominant one.
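Combining the three penalty terms, using the values from the preceding examples (illustrative Python; variable names are assumptions):

```python
# Combined effect of the three hazards on S.
delta_stall  = 0.25 * 0.40 * 1             # data-dependency stalls
delta_branch = 0.20 * 0.10 * 1             # branch misprediction penalty
delta_miss   = (0.05 + 0.30 * 0.10) * 10   # cache misses, pm = 10 cycles
S = 1 + delta_stall + delta_branch + delta_miss
print(round(delta_miss, 2))                # 0.8
print(round(S, 2))                         # 1.92
```

With these numbers, cache misses contribute 0.8 of the 0.92-cycle increase, confirming that they dominate.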
Superscalar Operation

• This approach equips the processor with multiple execution units.
• Each execution unit is itself pipelined.
• Several instructions can start execution in the same clock cycle.
• This achieves a throughput of more than one instruction per cycle.

Processors organized this way are called "superscalar processors".
Superscalar Operation

• Two instructions are fetched in the same clock cycle.
• Store/Load instructions use the Memory stage.
• Add/Subtract instructions do not perform a memory access, so the two kinds of instruction can proceed in separate execution units.
Superscalar Operation

Out-of-order execution:
• Suppose a Load instruction is fetched before a Subtract.
• The Subtract may write its result before the Load does.
• This becomes a problem when there is a data dependency between them.
• Instructions must be dispatched in the same sequence as they appear in the program, but a superscalar processor may complete them out of order.
Superscalar Operation

Out-of-order execution:
• To guarantee a consistent state, the result of a later instruction must be buffered until the preceding instruction has written its result.
• Temporary registers can be used for this buffering.
• In the example, the result of the Subtract instruction is buffered until the Load instruction completes.
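A minimal sketch of this in-order commit buffering (illustrative Python; the data structures are assumptions, not the slides' hardware): a later instruction's finished result waits in a buffer until every earlier instruction has committed, so the register file is updated in program order.

```python
# Sketch: results are retired strictly in program order.
commit_order = ["load", "subtract"]   # program order
done = {"subtract": 42}               # subtract finished early (out of order)
registers = {}                        # architectural state
pending = list(commit_order)

def try_commit():
    # A finished later instruction stays buffered in `done` until its
    # predecessor at the head of `pending` has committed.
    while pending and pending[0] in done:
        instr = pending.pop(0)
        registers[instr] = done[instr]

try_commit()                 # subtract is done, but load is not: nothing commits
assert registers == {}
done["load"] = 7             # the load finally completes
try_commit()                 # now both commit, in program order
assert registers == {"load": 7, "subtract": 42}
```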
Pipelining in CISC

Complications arise because CISC instructions may:
• Be variable in size
• Have multiple memory operands
• Set condition codes

Instructions that occupy more than one word:
• Take several clock cycles to fetch
• Complicate decoding and operand access
• Complicate dispatch in a superscalar processor
Pipelining in CISC

ColdFire processors:

• V1 – fetched instructions are buffered in a FIFO.
• V2 – two-stage fetch pipeline:
  1. Instructions involving only registers pass through the execution stage once.
  2. Instructions involving memory pass through the execution stage twice.
• V4 – extended to four stages, with branch prediction added.
• V5 – same as V4, but provides superscalar processing.
Pipelining in CISC

Intel processors:

• Superscalar, high-performance designs.
• The Core 2 and Core i7 have a 14-stage pipeline.
• Branch prediction and buffering techniques are used.
• To reduce complexity, CISC instructions are converted into RISC-like micro-operations.
