Unit 4

Characteristics Of Pipeline Processors

Pipelining refers to the temporal overlapping of processing. Pipelines are
nothing more than assembly lines in computing that can be used for
instruction processing. A basic pipeline processes a sequence of tasks or
instructions according to the following principle of operation.
Each task is subdivided into a number of successive subtasks. The processing of
each single instruction can be broken down into four subtasks:
1. Instruction Fetch
2. Instruction Decode
3. Execute
4. Write back
It is assumed that there is a pipeline stage associated with each subtask.
The same amount of time is available in each stage for performing the
required subtask.
All the pipeline stages operate like an assembly line, that is, each receives its
input from the previous stage and delivers its output to the next stage. We
also assume that the basic pipeline is clocked, in other words that it operates
synchronously. This means that each stage accepts a new input at the start of a
clock cycle, each stage has a single clock cycle available for performing the
required operation, and each stage delivers its result to the next stage by
the beginning of the subsequent clock cycle.
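The clocked operation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the four subtasks listed earlier, one clock cycle per stage, and no stalls; the stage abbreviations and the function names `schedule` and `total_cycles` are hypothetical and chosen only for this example.

```python
# Hypothetical abbreviations for the four subtasks named in the text.
STAGES = ["IF", "ID", "EX", "WB"]


def schedule(num_instructions, stages=STAGES):
    """Map each instruction index to its (clock cycle, stage) pairs.

    Assumes an ideal synchronous pipeline: instruction i (0-based) enters
    the first stage in cycle i + 1 and advances one stage per cycle.
    """
    table = {}
    for i in range(num_instructions):
        table[i] = [(i + s + 1, name) for s, name in enumerate(stages)]
    return table


def total_cycles(num_instructions, stages=STAGES):
    """Cycles until the last instruction leaves the last stage."""
    return len(stages) + num_instructions - 1


if __name__ == "__main__":
    for instr, slots in schedule(3).items():
        print(instr, slots)
    print("total cycles:", total_cycles(3))
```

With three instructions, the third one enters Instruction Fetch in cycle 3 and leaves Write Back in cycle 6, so the overlapped execution needs 4 + 3 - 1 = 6 cycles rather than 3 x 4 = 12.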
Linear Pipeline Processors:

A linear pipeline processor is a cascade of processing stages which are
linearly connected to perform a fixed function over a stream of data flowing
from one end to the other. In modern computers, linear pipelines are applied for
instruction execution, arithmetic computation, and memory access operations.
A linear pipeline processor is constructed with k processing stages. External
inputs are fed into the pipeline at the first stage S1. The processed results are
passed from stage Si to stage Si+1 for all i = 1, 2, ..., k-1. The final result
emerges from the pipeline at the last stage Sk. Depending on the control of data
flow along the pipeline, linear pipelines are modeled in two categories:
asynchronous and synchronous.
Asynchronous Model: Data flow between adjacent stages in an asynchronous
pipeline is controlled by a handshaking protocol. When stage Si is ready to
transmit, it sends a ready signal to stage Si+1. After stage Si+1 receives the
incoming data, it returns an acknowledge signal to Si.
NON LINEAR PIPELINE PROCESSOR:
A Three Stage Pipeline
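Clock period: The logic circuitry in each stage $S_i$ has a time delay denoted by $\tau_i$.
Let $\tau_l$ be the time delay of each interface latch. The clock period of a linear
pipeline is defined by
$\tau = \max_{1 \le i \le k} \{\tau_i\} + \tau_l = \tau_m + \tau_l$
where $\tau_m$ is the largest stage delay.
The reciprocal of the clock period is called the frequency $f = 1/\tau$.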

Ideally, a linear pipeline with k stages can process n tasks in
$T_k = k + (n-1)$ clock periods, where k cycles are used to fill up the pipeline or to
complete execution of the first task and $n-1$ cycles are needed to complete
the remaining $n-1$ tasks. The same number of tasks (operand pairs) can be
executed in a nonpipeline processor with an equivalent function in
$T_1 = n \cdot k$ time delay.
Speedup: We define the speedup of a k-stage linear pipeline processor
over an equivalent nonpipeline processor as
$S_k = \frac{T_1}{T_k} = \frac{n \cdot k}{k + (n-1)}$
It should be noted that the maximum speedup is $S_k \to k$ for $n \gg k$. In other
words, the maximum speedup that a linear pipeline can provide is k,
where k is the number of stages in the pipe. The maximum speedup is never
fully achievable because of data dependencies between instructions,
interrupts, and other factors.
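The speedup relation can be checked numerically. The sketch below assumes the ideal timing model used in this section, with k + (n - 1) clock periods for the pipeline and n * k for the equivalent nonpipeline processor; the function names are hypothetical.

```python
def pipeline_cycles(n, k):
    """Clock periods for n tasks on an ideal k-stage linear pipeline."""
    return k + (n - 1)


def nonpipelined_cycles(n, k):
    """Clock periods for n tasks when each task takes all k steps alone."""
    return n * k


def speedup(n, k):
    """Ideal speedup of the k-stage pipeline over the nonpipelined unit."""
    return nonpipelined_cycles(n, k) / pipeline_cycles(n, k)


if __name__ == "__main__":
    k = 4
    for n in (1, 10, 100, 10_000):
        print(f"n={n:>6}  speedup={speedup(n, k):.3f}")
```

For a single task the speedup is exactly 1, and as n grows it approaches k without ever reaching it, which matches the limit stated above.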
Efficiency: The efficiency of a linear pipeline is measured by the percentage
of busy time-space spans over the total time-space span, which equals the sum
of all busy and idle time-space spans. Let n, k, and $\tau$ be the number of tasks
(instructions), the number of pipeline stages, and the clock period of a linear
pipeline, respectively. The pipeline efficiency is defined by
$\eta = \frac{n \cdot k \cdot \tau}{k \, [k\tau + (n-1)\tau]} = \frac{n}{k + (n-1)}$
Note that $\eta \to 1$ as $n \to \infty$. This implies that the larger the number of
tasks flowing through the pipeline, the better its efficiency. Moreover, we
realize that $\eta = S_k / k$. This provides another view of the efficiency of a linear
pipeline as the ratio of its actual speedup to the ideal speedup k. In the steady
state of a pipeline, we have $n \gg k$, and the efficiency should approach 1.
However, this ideal case may not hold all the time because of program
branches and interrupts, data dependency, and other reasons.
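As a worked illustration of the relation between efficiency and speedup, take the assumed values k = 4 stages and n = 100 tasks (these numbers are chosen only for the example), with the ideal timing of k + (n - 1) clock periods for the pipeline:

```latex
\begin{align*}
S_k  &= \frac{n k}{k + (n-1)} = \frac{100 \cdot 4}{4 + 99} = \frac{400}{103} \approx 3.88 \\
\eta &= \frac{n}{k + (n-1)} = \frac{100}{103} \approx 0.971 \\
\frac{S_k}{k} &= \frac{400/103}{4} = \frac{100}{103} = \eta
\end{align*}
```

About 97% of the time-space span is busy in this case; the remaining 3% is the fill and drain overhead of the k - 1 = 3 extra cycles.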
Throughput: The number of results (tasks) that can be completed by a
pipeline per unit time is called its throughput. This rate reflects the computing
power of a pipeline. In terms of the efficiency $\eta$ and clock period $\tau$ of a linear
pipeline, we define the throughput as follows:
$w = \frac{n}{k\tau + (n-1)\tau} = \frac{\eta}{\tau}$
where n equals the total number of tasks being processed during an
observation period $k\tau + (n-1)\tau$. In the ideal case, $w = 1/\tau = f$
when $\eta \to 1$. This means that the maximum throughput of a linear pipeline is
equal to its frequency, which corresponds to one output result per clock period.
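A short numeric sketch of the throughput definition follows. It assumes n tasks complete in an observation period of (k + n - 1) clock periods of length tau; the 10 ns clock period and the function name `throughput` are assumptions made for illustration.

```python
def throughput(n, k, tau):
    """Tasks completed per second on an ideal k-stage pipeline.

    n   -- number of tasks
    k   -- number of stages
    tau -- clock period in seconds
    """
    return n / ((k + (n - 1)) * tau)


if __name__ == "__main__":
    tau = 10e-9          # assumed 10 ns clock period
    f = 1 / tau          # clock frequency in Hz
    for n in (10, 1_000, 1_000_000):
        w = throughput(n, 4, tau)
        print(f"n={n:>8}  w={w:.4e} tasks/s  w/f={w / f:.4f}")
```

The ratio w/f equals the efficiency, so it stays below 1 and approaches it only when the number of tasks is much larger than the number of stages.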
According to the levels of processing, pipeline processors can be classified into
the following classes: arithmetic, instruction, and processor pipelines. They can
further be described as unifunction vs. multifunction, static vs. dynamic, and
scalar vs. vector pipelines.
Reservation Table in linear pipelining:

The utilization pattern of successive stages in a synchronous pipeline is
specified by a reservation table. The table is essentially a space-time diagram
depicting the precedence relationship in using the pipeline stages. For a
k-stage linear pipeline, k clock cycles are needed to flow through the pipeline.
Reservation Table in Non-linear pipelining:

The reservation table for a dynamic pipeline becomes more complex and
interesting because a non-linear pattern is followed. For a given non-linear
pipeline configuration, multiple reservation tables can be generated. Each
reservation table represents the evaluation of a different function and
displays the time-space flow of data through the pipeline
for one function evaluation. Different functions may follow different paths on
the reservation table.
Processing sequence for function X (stage used in clock cycles 1 to 8):
S1 S2 S3 S2 S3 S1 S3 S1
(Reservation table for function X)
Latency Analysis:
Latency: The number of time units (clock cycles) between two
initiations of a pipeline is the latency between them.
A latency value of k means that two initiations are separated by k
clock cycles.
Any attempt by two or more initiations to use the same pipeline
stage at the same time will cause a collision.
A collision implies resource conflicts between two initiations in a
pipeline.
Collision with scheduling latency 2:
Latencies that cause collisions are called forbidden latencies.
Forbidden latencies for function X are 2, 4, 5, 7.
Latencies 1, 3, 6 do not cause collisions.
Let m be the maximum forbidden latency and
n the number of columns in the reservation table. Then
$m \le n-1$
All latencies greater than m do not cause
collisions.
A permissible latency p lies in the range
$1 \le p \le m-1$
The value of p should be as small as possible.
Permissible latency p = 1 corresponds to the ideal case, which can
be achieved by a static pipeline.
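Forbidden latencies can be read off a reservation table mechanically: any distance between two checkmarks in the same row is forbidden. The sketch below assumes a reservation table for function X in which S1 is used in cycles 1, 6, 8, S2 in cycles 2, 4, and S3 in cycles 3, 5, 7. This table is an assumption chosen because it reproduces the forbidden latencies 2, 4, 5, 7 listed above; the names `RESERVATION_X`, `forbidden_latencies`, and `permissible_latencies` are hypothetical.

```python
from itertools import combinations

# Assumed reservation table: stage -> clock cycles in which it is used.
RESERVATION_X = {
    "S1": [1, 6, 8],
    "S2": [2, 4],
    "S3": [3, 5, 7],
}


def forbidden_latencies(table):
    """Distances between any two checkmarks in the same row."""
    forbidden = set()
    for cycles in table.values():
        for earlier, later in combinations(sorted(cycles), 2):
            forbidden.add(later - earlier)
    return sorted(forbidden)


def permissible_latencies(table):
    """Latencies in 1..m-1 that are not forbidden (m = largest forbidden)."""
    forbidden = forbidden_latencies(table)
    m = max(forbidden, default=0)
    return [p for p in range(1, m) if p not in forbidden]


if __name__ == "__main__":
    print("forbidden:  ", forbidden_latencies(RESERVATION_X))
    print("permissible:", permissible_latencies(RESERVATION_X))
```

For the assumed table the largest forbidden latency is m = 7, which satisfies m <= n - 1 for n = 8 columns.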
Non Linear Pipeline:
Collision Vectors:
A collision vector is the combined set of permissible and forbidden latencies,
written as an m-bit binary vector:
$C = (C_m C_{m-1} \ldots C_2 C_1)$
The value of $C_i = 1$ if latency i causes a collision; $C_i = 0$ if latency i is permissible.
$C_m = 1$ always; it corresponds to the maximum forbidden
latency.
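Building the collision vector from a set of forbidden latencies is a direct translation of the definition. The following sketch uses the forbidden latencies 2, 4, 5, 7 given for function X; the function name `collision_vector` is hypothetical.

```python
def collision_vector(forbidden):
    """Return the collision vector as a bit string C_m ... C_2 C_1.

    Bit C_i is '1' when latency i is forbidden and '0' when it is
    permissible; m is the largest forbidden latency, so C_m is always '1'.
    """
    forbidden = set(forbidden)
    if not forbidden:
        raise ValueError("at least one forbidden latency is required")
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


if __name__ == "__main__":
    print(collision_vector([2, 4, 5, 7]))
```

The leftmost bit corresponds to latency 7 and the rightmost to latency 1, so the zeros in positions 6, 3, and 1 mark the permissible latencies.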
State Diagrams
State diagrams can be constructed to specify the permissible
transitions among successive initiations.
The collision vector corresponding to the initial state of the pipeline
at time 1 is called the initial collision vector.
The next state of the pipeline at time t+p can be obtained by
using an m-bit right shift register:
The initial CV is loaded into the register.
The register is then shifted to the right one bit at a time.
When a 0 emerges from the right end after p shifts, p is a
permissible latency.
When a 1 emerges, the corresponding latency is
forbidden.
A logical 0 enters from the left end of the shift register on each shift.
The next state after p shifts is obtained by bitwise-ORing the
initial CV with the shifted register contents.
This bitwise-ORing of the shifted contents is meant to prevent
collisions from future initiations starting at time t+1 and
onward.
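The shift-and-OR procedure above can be sketched as follows. The example assumes the initial collision vector 1011010, which corresponds to the forbidden latencies 2, 4, 5, 7 of function X; states are held as integers whose lowest bit is C1, and the function name `build_state_diagram` is hypothetical. All latencies of m + 1 or more are represented by the single edge labelled m + 1, since they shift the register to all zeros and return the pipeline to the initial state.

```python
def build_state_diagram(initial_cv):
    """Return {state: {latency: next_state}} with states as bit strings."""
    m = len(initial_cv)
    init = int(initial_cv, 2)

    def show(state):
        return format(state, f"0{m}b")

    diagram = {}
    pending = [init]
    while pending:
        state = pending.pop()
        if show(state) in diagram:
            continue
        edges = {}
        for p in range(1, m + 1):
            # Bit p-1 is the bit that emerges on the p-th right shift.
            if (state >> (p - 1)) & 1 == 0:
                next_state = (state >> p) | init  # shift, then OR with initial CV
                edges[p] = show(next_state)
                pending.append(next_state)
        edges[m + 1] = show(init)  # stands for every latency >= m + 1
        diagram[show(state)] = edges
    return diagram


if __name__ == "__main__":
    for state, edges in sorted(build_state_diagram("1011010").items()):
        print(state, edges)
```

For this collision vector the diagram has three states: the initial state 1011010, the state 1011011 reached with latency 3 or 6, and the state 1111111 reached with latency 1, from which only latencies of 8 or more are permissible.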
Latency Cycles
Simple cycle: a latency cycle in which each state appears only
once.
Greedy cycle: a simple cycle whose edges are all made with
minimum latencies from their respective starting
states.
MAL: minimum average latency.
At least one of the greedy cycles will lead to the MAL.
Collision-free scheduling
Find the greedy cycles from the set of simple cycles.
The greedy cycle yielding the MAL is the final
choice.
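The selection of a greedy cycle can be sketched in code. This is a minimal illustration that relies on the statement above that at least one greedy cycle leads to the MAL: from every reachable state it repeatedly follows the smallest permissible latency until a state repeats, then keeps the cycle with the lowest average. It assumes the initial collision vector 1011010 (forbidden latencies 2, 4, 5, 7); all function names are hypothetical.

```python
def min_latency(state, m):
    """Smallest permissible latency from `state` (m + 1 if bits 1..m are all set)."""
    for p in range(1, m + 1):
        if (state >> (p - 1)) & 1 == 0:
            return p
    return m + 1


def greedy_cycle(start, init, m):
    """Follow minimum-latency edges from `start` until a state repeats."""
    first_seen = {}   # state -> position in `latencies` where it was visited
    latencies = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(latencies)
        p = min_latency(state, m)
        latencies.append(p)
        state = (state >> p) | init   # shift, then OR with the initial CV
    return tuple(latencies[first_seen[state]:])


def reachable_states(init, m):
    """All states reachable from the initial collision vector."""
    found, pending = set(), [init]
    while pending:
        state = pending.pop()
        if state in found:
            continue
        found.add(state)
        for p in range(1, m + 1):
            if (state >> (p - 1)) & 1 == 0:
                pending.append((state >> p) | init)
    return found


def minimum_average_latency(initial_cv):
    """Return (best greedy cycle, its average latency)."""
    m = len(initial_cv)
    init = int(initial_cv, 2)
    cycles = {greedy_cycle(s, init, m) for s in reachable_states(init, m)}
    best = min(cycles, key=lambda c: sum(c) / len(c))
    return best, sum(best) / len(best)


if __name__ == "__main__":
    print(minimum_average_latency("1011010"))
```

For this collision vector the greedy cycles are (1, 8), with average latency 4.5, and the constant cycle (3), with average latency 3, so the sketch selects (3). The cycle (8, 1) also appears in the search but is only a rotation of (1, 8).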
Optimization technique:
Insertion of Delay stages
Modification of reservation table
New CV
Improved state diagram
To yield an optimal latency cycle
Bounds on MAL
The MAL is lower-bounded by the maximum number of
checkmarks in any row of the reservation table.
The MAL is lower than or equal to the average latency of
any greedy cycle in the state diagram.
The average latency of any greedy cycle is upper-bounded
by the number of 1s in the initial CV plus 1.
The optimal latency cycle is selected from among the greedy
cycles with the lowest average latency.
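The two outer bounds can be computed directly from the reservation table and the initial collision vector. The sketch below assumes a reservation table for function X with S1 used in cycles 1, 6, 8, S2 in cycles 2, 4, and S3 in cycles 3, 5, 7, together with the initial collision vector 1011010; both are assumptions consistent with the forbidden latencies 2, 4, 5, 7 given earlier, and the function name `mal_bounds` is hypothetical.

```python
# Assumed reservation table: stage -> clock cycles in which it is used.
RESERVATION_X = {
    "S1": [1, 6, 8],
    "S2": [2, 4],
    "S3": [3, 5, 7],
}


def mal_bounds(table, initial_cv):
    """Return (lower, upper) bounds on the minimum average latency.

    lower -- maximum number of checkmarks in any row of the table
    upper -- number of 1s in the initial collision vector, plus 1
    """
    lower = max(len(cycles) for cycles in table.values())
    upper = initial_cv.count("1") + 1
    return lower, upper


if __name__ == "__main__":
    print(mal_bounds(RESERVATION_X, "1011010"))
```

For the assumed table the bounds are 3 and 5. The constant latency cycle (3) is collision-free for this collision vector, so an average latency of 3 meets the lower bound and no delay insertion is needed in this case.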
Instruction Pipeline Design:

A stream of instructions can be executed by a pipeline in an overlapped
manner. A typical instruction execution consists of a sequence of operations,
including the following phases:
(1) Instruction fetch
(2) Decode
(3) Operand fetch
(4) Execute
(5) Write back
Pipeline instruction processing:
A typical instruction pipeline has seven stages, as depicted in the figures below:
Fetch stage (F) fetches instructions from a cache memory.
Decode stage (D) decodes the instruction in order to find the function to be
performed and identifies the resources needed.
Issue stage (I) reserves resources. Resources include GPRs, buses, and
functional units.
The instructions are executed in one or several execute stages (E).
Write back stage (WB) is used to write results into the registers.
Memory load and store (L/S) operations are treated as part of execution.
Floating point add and multiply operations take four execution clock cycles.
In many RISC processors fewer cycles are needed.

Idle cycles occur when instruction issues are blocked due
to resource conflicts or data dependences, for example while an instruction waits
before data Y and Z are loaded in.
The store of the sum to memory location X must wait three cycles for the add to
finish due to flow dependence.
