Computer Architecture M2 (Part 3) PPT
Presented By
Dr T Ravichandran
Professor
Department of Computer Science and Engineering
MODULE 2 (PART 3)
PIPELINING
Pipelining - Basic concepts, Pipeline organization, Pipelining issues
Data dependencies
Memory delays
Branch delays
Resource limitations
Performance evaluation
Superscalar operation
Pipelining in CISC processors
Pipelining - Basic concepts, Pipeline organization, Pipelining issues
Understanding Pipelining
Pipelining in computer architecture implements a form of parallelism in the execution of instructions.
A pipelined processor works on multiple instructions at the same time, each in a different stage of execution.
Pipelining - Basic Concepts
Advantages:
• More efficient use of the processor hardware
• Faster execution of a large number of instructions (higher throughput)
Disadvantages:
• Pipelining requires additional hardware on the chip
• The pipeline cannot always run at full speed, because pipeline hazards disrupt its smooth
execution.
Pipelining - Organisation
Pipelining organizes instruction execution as a pipeline through which the processor moves several
instructions at once. Overlapping the execution of instructions in this way makes better use of the
hardware. The pipeline is divided into logical stages connected to each other in a pipe-like structure:
instructions enter at one end and exit at the other.
• In the first stage of the pipeline, the program counter (PC) is used to fetch a new instruction.
• At any given time, each stage of the pipeline is processing a different instruction.
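As a rough illustration of every stage working on a different instruction, the sketch below prints a space-time diagram for an ideal pipeline with no hazards. The five stage names (F, D, C, M, W for Fetch, Decode, Compute, Memory, Write) are an assumption for illustration only. The cycle count k + n - 1 for n instructions on a k-stage pipeline follows from each instruction entering the pipeline one cycle after the previous one.

```python
# Assumed five-stage pipeline: Fetch, Decode, Compute, Memory, Write.
STAGES = ["F", "D", "C", "M", "W"]


def ideal_schedule(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage it occupies in cycle c + 1.

    With no hazards, instruction i (0-based) is in stage s during cycle i + s + 1,
    so n instructions on a k-stage pipeline finish after k + n - 1 cycles.
    """
    k = len(stages)
    total_cycles = k + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles  # "." means the instruction is not in the pipeline
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def print_diagram(rows):
    """Print a space-time diagram: instructions down, clock cycles across."""
    n_cycles = len(rows[0])
    print("      " + " ".join(f"{c:>2}" for c in range(1, n_cycles + 1)))
    for i, row in enumerate(rows):
        print(f"I{i + 1:<4} " + " ".join(f"{cell:>2}" for cell in row))


rows = ideal_schedule(4)
print_diagram(rows)
print("cycles:", len(rows[0]))  # 5 + 4 - 1 = 8
```

Reading down any single column, for example cycle 5, shows four different instructions occupying four different stages.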
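Interstage buffers
• Information such as register addresses, immediate data, and the operations to be performed must be carried
through the pipeline along with each instruction.
• Interstage buffers placed between successive stages hold this information as the instruction moves from one
stage to the next.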
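A minimal model of interstage buffers, assuming four buffers (called B1 to B4 here) between five stages. On every clock cycle each buffer takes the contents of the buffer before it, so the information recorded for an instruction travels along with it. The record fields and the third instruction are hypothetical; the first two instructions are the Load and Subtract used in the memory-delay example of these slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InstrInfo:
    """Hypothetical record of the information an instruction carries between stages."""
    name: str
    operation: str
    dest_reg: Optional[str] = None
    src_regs: Tuple[str, ...] = ()
    immediate: Optional[int] = None


def clock(buffers, fetched):
    """Advance one clock cycle.

    buffers[0] models B1 (after Fetch) and buffers[-1] models B4 (before Write).
    Each buffer takes the contents of the one before it; the old B4 contents
    leave the pipeline. None stands for an empty buffer.
    """
    return [fetched] + buffers[:-1]


program = [
    InstrInfo("I1", "Load", dest_reg="R2", src_regs=("R3",)),
    InstrInfo("I2", "Subtract", dest_reg="R9", src_regs=("R2",), immediate=30),
    InstrInfo("I3", "Add", dest_reg="R5", src_regs=("R6", "R7")),  # hypothetical
]

buffers = [None, None, None, None]  # B1..B4, initially empty
for instr in program + [None, None]:  # two extra cycles with nothing fetched
    buffers = clock(buffers, instr)
    print([b.name if b else "-" for b in buffers])
```

A real processor stores decoded fields and intermediate results in these buffers rather than whole instruction records; the list shift only shows how the information advances one stage per cycle.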
Pipelining issues
Consider two consecutive instructions, Ij and Ij+1, where Ij+1 uses the result of Ij as a source operand.
The result of Ij has not yet been written into the register file when Ij+1 reads it in cycle 3.
• If execution proceeds, Ij+1 uses the old register value and the result is incorrect
• Therefore Ij+1 must wait until Ij writes its result
• This means Ij+1 must be stalled
• The instructions that follow cannot advance either, which
increases the execution time
Pipelining: Pipelining Issues
The Subtract instruction Ij+1 has to be stalled from clock cycle 3 to clock cycle 5, until Ij has written its result.
In each stall cycle a NOP is sent into the interstage buffer, creating one clock cycle of idle time in the stages that follow.
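The stall length follows from the cycle in which Ij writes its result and the cycle in which Ij+1 would read it. This sketch assumes no operand forwarding, so a value written in cycle t can be read from cycle t + 1 onward. It also assumes the stage names F, D, C, M, W, with Ij+1 fetched in cycle 2, reading registers in Decode in cycle 3, and Ij writing in cycle 5, as in the stall described above.

```python
STAGES = ("F", "D", "C", "M", "W")  # assumed stage names


def stall_cycles(read_cycle, write_cycle):
    """Number of cycles Ij+1 must be held so that it reads after Ij writes.

    Assumes no operand forwarding: a value written to the register file in
    cycle t can be read by a following instruction from cycle t + 1 onward.
    """
    return max(0, write_cycle + 1 - read_cycle)


def dependent_timeline(stalls, stages=STAGES):
    """Stages occupied by Ij+1 in successive cycles, starting with its fetch.

    While stalled, Ij+1 stays in Decode; meanwhile a NOP is passed on
    to the next interstage buffer in each stall cycle.
    """
    return [stages[0]] + [stages[1]] * (stalls + 1) + list(stages[2:])


n = stall_cycles(read_cycle=3, write_cycle=5)
print(n)                      # 3
print(dependent_timeline(n))  # ['F', 'D', 'D', 'D', 'D', 'C', 'M', 'W']
```

Under these assumptions Ij+1 stays in Decode for cycles 3 to 6 and reads the new register value in cycle 6, while three NOPs pass through the later stages.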
Pipelining - Memory delays (unavailability of data)
Memory accesses can also delay an instruction and cause the pipeline to stall.
Example:
Ij:   Load R2, (R3)
Ij+1: Subtract R9, R2, #30
Ij may require more than one clock cycle to obtain its operand from memory (the word at the address held in R3, which is loaded into R2).
This happens when the operand is not found in the cache (a CACHE MISS).
All subsequent instructions are then stalled.
If there is also a data dependency, as here where Ij+1 uses R2, an additional stall results (a memory-related stall).
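To show how one slow memory access holds up every later instruction, the sketch below computes the cycle in which each instruction completes when one of them spends extra cycles in the Memory stage. These assumptions are not from the slides: five stages F, D, C, M, W; a cache miss modelled simply as a longer Memory stage; and an instruction starting a stage once it has finished the previous stage and the instruction ahead of it has finished that stage. The extra data-dependency stall of Ij+1 is not modelled.

```python
def completion_cycles(latencies):
    """Cycle in which each instruction finishes its last stage.

    latencies[i][s] is the number of cycles instruction i spends in stage s.
    Simplification: instruction i starts stage s once it has finished stage
    s - 1 and instruction i - 1 has finished stage s.
    """
    n, k = len(latencies), len(latencies[0])
    done = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            ready = done[i][s - 1] if s > 0 else 0  # own previous stage finished
            free = done[i - 1][s] if i > 0 else 0   # stage vacated by predecessor
            done[i][s] = max(ready, free) + latencies[i][s]
    return [row[-1] for row in done]


HIT = [1, 1, 1, 1, 1]   # F, D, C, M, W each take one cycle
MISS = [1, 1, 1, 3, 1]  # hypothetical: Memory stage takes 3 cycles on a miss

print(completion_cycles([HIT, HIT, HIT]))   # [5, 6, 7]
print(completion_cycles([MISS, HIT, HIT]))  # [7, 8, 9]
```

The two extra Memory cycles of the first instruction delay every following instruction by the same two cycles.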
Performance is affected by
i) Data dependencies (data hazard)
ii) Branch penalty (branch or control hazard)
iii) Cache miss (memory hazard)
Performance Evaluation
Performance Evaluation - Branch penalty
Assume that branches constitute 20 percent of the dynamic instruction count of a program, and that
the average prediction accuracy for branch instructions is 90%, i.e., 10% of all branch instructions
that are executed incur a one-cycle penalty due to misprediction. The increase in the average number
of cycles per instruction (S) due to the branch penalty is then
$$\delta_{branch\_penalty} = 0.20 \times 0.10 \times 1 = 0.02$$
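The branch-penalty arithmetic can be checked with a small helper. The function name is hypothetical. It multiplies the fraction of branch instructions, the fraction of branches that are mispredicted, and the penalty in cycles.

```python
def branch_penalty_increase(branch_fraction, prediction_accuracy, penalty_cycles):
    """Increase in the average cycles per instruction (S) due to mispredicted branches.

    Only the mispredicted fraction (1 - accuracy) of branches pays the penalty.
    """
    mispredict_rate = 1 - prediction_accuracy
    return branch_fraction * mispredict_rate * penalty_cycles


delta_branch = branch_penalty_increase(0.20, 0.90, 1)
print(round(delta_branch, 2))  # 0.02
```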
Suppose that 5 percent of all fetched instructions incur a cache miss, 30 percent of all
instructions executed are Load or Store instructions, and 10 percent of their data-operand
accesses incur a cache miss. Assume that the penalty to access the main memory for a cache
miss is 10 cycles.
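The increase over the ideal case of S = 1 due to cache misses is given by
$$\delta_{miss} = (m_i + d \times m_d) \times p_m$$
where $m_i = 0.05$ is the fraction of fetched instructions that incur a cache miss, $d = 0.30$ is the fraction of Load/Store instructions, $m_d = 0.10$ is the miss rate for their data-operand accesses, and $p_m = 10$ cycles is the miss penalty. Hence
$$\delta_{miss} = (0.05 + 0.30 \times 0.10) \times 10 = 0.8$$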
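Expanding the formula as $m_i \times p_m + d \times m_d \times p_m$ separates misses on instruction fetches from misses on data accesses. The sketch below, with a hypothetical helper name, computes both parts for the numbers above.

```python
def cache_miss_increase(m_i, d, m_d, p_m):
    """Return (instruction-fetch part, data-access part, total) of delta_miss.

    m_i: fraction of instruction fetches that miss
    d:   fraction of instructions that are Load or Store
    m_d: fraction of data-operand accesses that miss
    p_m: miss penalty in cycles
    """
    fetch_part = m_i * p_m
    data_part = d * m_d * p_m
    return fetch_part, data_part, fetch_part + data_part


fetch_part, data_part, delta_miss = cache_miss_increase(m_i=0.05, d=0.30, m_d=0.10, p_m=10)
print(round(fetch_part, 2), round(data_part, 2), round(delta_miss, 2))  # 0.5 0.3 0.8
```

With these numbers, misses on instruction fetches account for 0.5 of the 0.8 cycles.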
Compared to $\delta_{stall}$ and $\delta_{branch\_penalty}$, the effect of a slow main memory on cache misses, $\delta_{miss}$, is much more significant.
When all factors are combined, S increases from the ideal value of 1 to
$$S = 1 + \delta_{stall} + \delta_{branch\_penalty} + \delta_{miss}$$
The contribution of cache misses is often the dominant one.
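A sketch of the combined value of S, using the branch and cache-miss figures computed above. The slides do not give a value for $\delta_{stall}$ here, so `DELTA_STALL = 0.1` is a purely hypothetical placeholder. The loop reports each factor's share of the increase over the ideal S = 1.

```python
def cycles_per_instruction(delta_stall, delta_branch, delta_miss):
    """S = 1 + delta_stall + delta_branch_penalty + delta_miss."""
    return 1 + delta_stall + delta_branch + delta_miss


DELTA_STALL = 0.1                               # hypothetical value, not from the slides
delta_branch = 0.20 * 0.10 * 1                  # branch penalty example
delta_miss = (0.05 + 0.30 * 0.10) * 10          # cache miss example

S = cycles_per_instruction(DELTA_STALL, delta_branch, delta_miss)
extra = S - 1
for name, value in [("stall", DELTA_STALL), ("branch", delta_branch), ("miss", delta_miss)]:
    print(f"{name:>6}: {value:.2f} cycles ({value / extra:.0%} of the increase)")
print(f"S = {S:.2f}")
```

For any small value of $\delta_{stall}$, the cache-miss term remains the largest share of the increase, which is the point made above.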
Superscalar Operation
INTEL processors