Chap. 9 Pipeline and Vector Processing

9-1
Chap. 9 Pipeline and Vector Processing

 9-1 Parallel Processing
 Simultaneous data processing tasks for the purpose of increasing the
= computational speed
 Perform concurrent data processing to achieve faster execution time
 Multiple Functional Unit : Fig. 9-1 Parallel Processing Example
 Separate the execution unit into eight functional units operating in parallel
 Computer Architectural Classification Adder-subtractor
 Data-Instruction Stream : Flynn

Integer multiply
 Serial versus Parallel Processing : Feng
 Parallelism and Pipelining : Händler Logic unit
 Flynn’s Classification Shift unit
To Memory
 1) SISD (Single Instruction - Single Data stream) Incrementer
Processor
» for practical purpose: only one processor is useful registers
Floatint-point
add-subtract
» Example systems : Amdahl 470V/6, IBM 360/91
Floatint-point
IS multiply
Floatint-point
divide
CU PU MM
IS DS
Computer System Architecture Chap. 9 Pipeline and Vector Processing Dept. of Info. Of Computer
9-2
 2) SIMD
Shared memmory
(Single Instruction - Multiple Data stream) DS 1
PU 1 MM 1
» vector or array operations one vector
operation includes many operations on a DS 2
data stream PU 2 MM 2
IS
» Example systems : CRAY -1, ILLIAC-IV CU
DS n
PU n MM n
IS
 3) MISD
(Multiple Instruction - Single Data stream)
» Data Stream Bottle neck DS
IS 1 IS 1
CU 1 PU 1
Shared memory
IS 2 IS 2
CU 2 PU 2 MM n MM 2 MM 1
IS n IS n
CU n PU n
DS
9-3
 4) MIMD
(Multiple Instruction - Multiple Data stream)
» Multiprocessor Shared memory
IS 1 IS 1 DS
System CU 1 PU 1 MM 1
IS 2 IS 2
CU 2 PU 2 MM 2
v v
IS n IS n
CU n PU n MM n
Computer Architecture 9-4
Classifications
Processor Organizations
Single Instruction,
Single Instruction, Multiple Instruction
Multiple Instruction
Single Data Stream Multiple Data Stream Single Data Stream
Multiple Data Stream
(SISD) (SIMD) (MISD)
(MIMD)
Uniprocessor Vector Array Shared Memory
Multicomputer
Processor Processor (tightly coupled)
(loosely coupled)
9-5
 9-2 Pipelining
 Pipelining Decomposing a sequential process into suboperations
 Each subprocess is executed in a special dedicated segment concurrently
 Pipelining:
 Multiply and add operation : ( for i = 1, 2, …, 7 )
 Suboperation Segment
A i * Bi  C i
» 1) R1  Ai, R 2  Bi : Input Ai and Bi

» 2) R 3  R1 * R 2, R 4  Ci : Multiply and input Ci
» 3) R 5  R 3  R 4 : Add Ci
 Content of registers in pipeline example :
Figure 9-2 9-6
Example of Pipelining.
Ai Bi Ci
 R1 Ai , R2 Bi
R1 R2
Input Ai and Bi
 R3 R1 * R2, R4 Ci
Multiply and input Ci Multiplier
 R5 R3 + R4
R3 R4
Add Ci to product
Adder
R5
6
9-7
Table 9-1
Content of registers in pipeline example.
Clock
Pulse
Segment1 Segment2 Segment3
number R1 R2 R3 R4 R5
1 A1 B1 ---- ---- ----

2 A2 B2 A1*B1 C1 ----
3 A3 B3 A2*B2 C2 A1*B1+C1
4 A4 B4 A3*B3 C3 A2*B2+C2
5 A5 B5 A4*B4 C4 A3*B3+C3
6 A6 B6 A5*B5 C5 A4*B4+C4
7 A7 B7 A6*B6 C6 A5*B5+C5
8 ---- ---- A7*B7 C7 A6*B6+C6
9 ---- ---- ---- ---- A7*B7+C7
7
9-8
General considerations
 4 segment pipeline :
» S : Combinational circuit for Suboperation
» R : Register(intermediate results between the segments)
 Space-time diagram : Segment
» Show segment utilization as a function of time versus
clock-cycle
 Task : T1, T2, T3,…, T6
» Total operation performed going through all the segment
9-9
 Speedup S : Nonpipeline / Pipeline

 S = n • tn / ( k + n - 1 ) • tp = 6 • 6 tn / ( 4 + 6 - 1 ) • tp = 36 tn / 9 tn = 4
» n : task number ( 6 ) Pipeline= 9 clock cycles
» tn : time to complete each task in nonpipeline ( 6 cycle times = 6 tp)
k+n-1n » tp : clock cycle time ( 1 clock cycle )
» k : segment number ( 4 )
Clock cycles 1 2 3 4 5 6 7 8 9
 If n , S = tn / tp
1 T1 T2 T3 T4 T5 T6
task, nonpipeline ( tn ) = pipeline ( k • tp )
S = tn / tp = k • tp / tp = k 2 T1 T2 T3 T4 T5 T6
Segment
. 3 T1 T2 T3 T4 T5 T6
4 T1 T2 T3 T4 T5 T6
 PipelineArithmetic Pipeline(Sec. 9-3) Instruction

Pipeline(Sec. 9-4) Sec. 9-3 Arithmetic Pipeline
 Floating-point Adder Pipeline Example : Fig. 9-6
 Add / Subtract two normalized floating-point binary number
» X = A x 2a = 0.9504 x 103
» Y = B x 2b = 0.8200 x 102
9-10
 4 segments suboperations
» 1) Compare exponents by subtraction : Exponents
a b
Mantissas
A B
3-2=1
R R
 X = 0.9504 x 103
 Y = 0.8200 x 102 Compare Difference
Segment 1 : exponents
» 2) Align mantissas by subtraction
 X = 0.9504 x 103
 Y = 0.08200 x 103 R
» 3) Add mantissas Segment 2 : Choose exponent Align mantissas

 Z = 1.0324 x 103
» 4) Normalize result R
 Z = 0.1324 x 104 Add or subtract

Segment 3 :
mantissas
R R
Adjust Normalize
Segment 4 :
exponent result
R R
9-11
Fetch instruction
Segment 1 :
9-4 Instruction Pipeline

from memory

Decode instruction
 Instruction Cycle
Segment 2 : and calculate
effective address
1) Fetch the instruction from memory Branch ?
2) Decode the instruction Segment 3 :

Fetch operand
from memory
3) Calculate the effective address

Segment 4 : Execute instruction
4) Fetch the operands from memory

Interrupt
Interrupt ?
5) Execute the instruction handling
6) Store the result in the proper place Update PC
 Example : Four-segment Instruction Pipeline Empty pipe
 Four-segment CPU pipeline : Fig. 9-7

Step : 1 2 3 4 5 6 7 8 9 10 11 12 13
» 1) FI : Instruction Fetch Instruction : 1 FI DA FO EX
» 2) DA : Decode Instruction & calculate EA 2 FI DA FO EX
» 3) FO : Operand Fetch (Branch) 3 FI DA FO EX
4 FI FI DA FO EX
» 4) EX : Execution
5 FI DA FO EX
 Timing of Instruction Pipeline : Fig. 9-8
6 FI DA FO EX
» Instruction Branch 7 FI DA FO EX
No Branch Branch
9-12
 Pipeline Conflicts : 3 major difficulties

 1) Resource conflicts
» memory access by two segments at the same time
 2) Data dependency
» when an instruction depend on the result of a previous instruction, but this result is not
yet available
 3) Branch difficulties
» branch and other instruction (interrupt, ret, ..) that change the value of PC
 Data Dependency Hardware Hardware Interlock
 previous instruction Hardware Delay
» Operand Forwarding
 previous instruction ALU (, register)
 Software Delayed Load
 previous instruction No-operation instruction Handling of Branch Instructions
 Prefetch target instruction
» Conditional branch branch target instruction instruction fetch

Chap. 9 Pipeline and Vector Processing

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chap. 9 Pipeline and Vector Processing

Uploaded by

Copyright:

Available Formats

9-1

Chap. 9 Pipeline and Vector Processing

 Data-Instruction Stream : Flynn

 Flynn’s Classification Shift unit

» 1) R1  Ai, R 2  Bi : Input Ai and Bi

 Content of registers in pipeline example :

Content of registers in pipeline example.

1 A1 B1 ---- ---- ----

 Speedup S : Nonpipeline / Pipeline

 PipelineArithmetic Pipeline(Sec. 9-3) Instruction

» 3) Add mantissas Segment 2 : Choose exponent Align mantissas

 Z = 0.1324 x 104 Add or subtract

9-4 Instruction Pipeline

1) Fetch the instruction from memory Branch ?

2) Decode the instruction Segment 3 :

3) Calculate the effective address

4) Fetch the operands from memory

5) Execute the instruction handling

6) Store the result in the proper place Update PC

 Example : Four-segment Instruction Pipeline Empty pipe

 Four-segment CPU pipeline : Fig. 9-7

» 2) DA : Decode Instruction & calculate EA 2 FI DA FO EX

» 3) FO : Operand Fetch (Branch) 3 FI DA FO EX

 Pipeline Conflicts : 3 major difficulties

You might also like