

Chap. 9 Pipeline and Vector Processing


• 9-1 Parallel Processing
  • Simultaneous data-processing tasks for the purpose of increasing the computational speed
  • Perform concurrent data processing to achieve faster execution time
  • Multiple functional units : Fig. 9-1 Parallel Processing Example
    » Separate the execution unit into eight functional units operating in parallel
    » [Fig. 9-1: the processor registers, connected to memory, feed eight units: adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply, floating-point divide]
  • Computer Architectural Classification
    » Data-Instruction Stream : Flynn
    » Serial versus Parallel Processing : Feng
    » Parallelism and Pipelining : Händler
  • Flynn's Classification
    • 1) SISD (Single Instruction - Single Data stream)
      » for practical purposes, only one processor is useful
      » Example systems : Amdahl 470V/6, IBM 360/91
      » [Figure: CU, PU, and MM connected by an instruction stream (IS) and a data stream (DS)]

Computer System Architecture Chap. 9 Pipeline and Vector Processing Dept. of Info. Of Computer

    • 2) SIMD (Single Instruction - Multiple Data stream)
      » vector or array operations: one vector operation includes many operations on a data stream
      » Example systems : CRAY-1, ILLIAC-IV
      » [Figure: one CU sends a single IS to PU 1 ... PU n; each PU i exchanges its data stream DS i with module MM i of the shared memory]

    • 3) MISD (Multiple Instruction - Single Data stream)
      » data stream bottleneck
      » [Figure: CU 1 ... CU n send IS 1 ... IS n to PU 1 ... PU n; a single DS passes through the PUs; the shared memory consists of MM 1 ... MM n]


    • 4) MIMD (Multiple Instruction - Multiple Data stream)
      » Multiprocessor system
      » [Figure: CU 1 ... CU n send IS 1 ... IS n to PU 1 ... PU n; each PU i exchanges a data stream (DS) with module MM i of the shared memory]

Computer Architecture Classifications

Processor Organizations
  • Single Instruction, Single Data Stream (SISD)
    » Uniprocessor
  • Single Instruction, Multiple Data Stream (SIMD)
    » Vector Processor
    » Array Processor
  • Multiple Instruction, Single Data Stream (MISD)
  • Multiple Instruction, Multiple Data Stream (MIMD)
    » Shared Memory (tightly coupled)
    » Multicomputer (loosely coupled)
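
The four classes differ only in whether there is one or more than one instruction stream and data stream. A minimal sketch of that rule follows; the function name `flynn_class` and the example stream counts are made up for illustration.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class for the given numbers of concurrent streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    first = "S" if instruction_streams == 1 else "M"   # single or multiple IS
    second = "S" if data_streams == 1 else "M"         # single or multiple DS
    return f"{first}I{second}D"


# Illustrative stream counts only; they are not taken from the slides.
examples = {
    "uniprocessor": (1, 1),
    "array processor": (1, 8),
    "multiprocessor": (4, 4),
}
for name, (n_is, n_ds) in examples.items():
    print(name, "->", flynn_class(n_is, n_ds))
```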


• 9-2 Pipelining
  • Pipelining : decomposing a sequential process into suboperations
    » Each suboperation is executed in its own dedicated segment, concurrently with the other segments
  • Pipelining example :
    » Multiply and add operation : $A_i * B_i + C_i$ ( for i = 1, 2, …, 7 )
    » Suboperations performed in each segment :
      • 1) R1 ← Ai, R2 ← Bi : Input Ai and Bi
      • 2) R3 ← R1 * R2, R4 ← Ci : Multiply and input Ci
      • 3) R5 ← R3 + R4 : Add Ci to product
  • Content of registers in pipeline example : Table 9-1

Figure 9-2 Example of Pipelining.
[Figure: Segment 1 loads Ai and Bi into R1 and R2; Segment 2 passes R1 and R2 through the multiplier into R3 and loads Ci into R4; Segment 3 passes R3 and R4 through the adder into R5]
Table 9-1 Content of registers in pipeline example.

Clock pulse   Segment 1       Segment 2         Segment 3
number        R1     R2       R3       R4       R5
1             A1     B1       ----     ----     ----
2             A2     B2       A1*B1    C1       ----
3             A3     B3       A2*B2    C2       A1*B1+C1
4             A4     B4       A3*B3    C3       A2*B2+C2
5             A5     B5       A4*B4    C4       A3*B3+C3
6             A6     B6       A5*B5    C5       A4*B4+C4
7             A7     B7       A6*B6    C6       A5*B5+C5
8             ----   ----     A7*B7    C7       A6*B6+C6
9             ----   ----     ----     ----     A7*B7+C7
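
The register contents in the table can be reproduced with a small clocked simulation. This is a sketch under stated assumptions: the function name `simulate_multiply_add` and the sample operand values are illustrative, and `None` stands for an empty register (shown as ---- in the table).

```python
def simulate_multiply_add(a, b, c):
    """Clock a three-segment pipeline computing a[i] * b[i] + c[i].

    Returns (rows, results): rows holds (clock, R1, R2, R3, R4, R5) after
    each clock pulse; results holds the values that reached R5, in order.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None        # None marks an empty register
    rows, results = [], []
    for clock in range(1, n + 3):        # n + 2 pulses drain a 3-segment pipe
        # All registers load on the same pulse, so each new value must be
        # computed from the old contents: update the last segment first.
        r5 = r3 + r4 if r3 is not None else None      # segment 3: add
        if r1 is not None:                            # segment 2: multiply, input Ci
            r3, r4 = r1 * r2, c[clock - 2]
        else:
            r3 = r4 = None
        if clock <= n:                                # segment 1: input Ai, Bi
            r1, r2 = a[clock - 1], b[clock - 1]
        else:
            r1 = r2 = None
        rows.append((clock, r1, r2, r3, r4, r5))
        if r5 is not None:
            results.append(r5)
    return rows, results


# Sample operands, chosen only for illustration.
a = [1, 2, 3, 4, 5, 6, 7]
b = [2, 2, 2, 2, 2, 2, 2]
c = [1, 1, 1, 1, 1, 1, 1]
rows, results = simulate_multiply_add(a, b, c)
for row in rows:
    print(row)
```

With seven operand sets the loop runs for nine clock pulses, the same count as the table; the first result appears in R5 at pulse 3.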


• General considerations
  • 4-segment pipeline :
    » S : combinational circuit that performs a suboperation
    » R : register (holds intermediate results between the segments)
  • Space-time diagram :
    » Shows segment utilization as a function of time (segment versus clock cycle)
  • Task : T1, T2, T3, …, T6
    » The total operation performed going through all the segments
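
A space-time diagram for any number of segments and tasks can be generated mechanically, because task j occupies segment s during clock cycle j + s - 1. The sketch below assumes exactly that; the function name `space_time_diagram` is hypothetical.

```python
def space_time_diagram(k: int, n: int) -> list[list[str]]:
    """Return k rows; grid[s][c] names the task in segment s+1 at clock cycle c+1."""
    cycles = k + n - 1                      # total clock cycles to finish n tasks
    grid = [["" for _ in range(cycles)] for _ in range(k)]
    for task in range(n):
        for seg in range(k):
            # A task enters segment 1 one cycle after its predecessor and
            # moves one segment forward on every clock cycle.
            grid[seg][task + seg] = f"T{task + 1}"
    return grid


# k = 4 segments and n = 6 tasks, the case described on this slide.
for seg, row in enumerate(space_time_diagram(4, 6), start=1):
    print(f"Segment {seg}: " + " ".join(f"{cell:>3}" for cell in row))
```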


  • Speedup S : nonpipeline time / pipeline time
    • $S = \frac{n \, t_n}{(k + n - 1) \, t_p} = \frac{6 \cdot 6 t_p}{(4 + 6 - 1) \, t_p} = \frac{36 t_p}{9 t_p} = 4$
      » n : number of tasks ( 6 )
      » $t_n$ : time to complete each task in the nonpipeline ( 6 cycle times = $6 t_p$ )
      » $t_p$ : clock cycle time ( 1 clock cycle )
      » k : number of segments ( 4 )
      » Pipeline : k + n - 1 = 9 clock cycles
    • If $n \to \infty$, then $k + n - 1 \to n$ and $S = t_n / t_p$
    • If one task takes the same time in the nonpipeline and in the pipeline, $t_n = k \, t_p$, so $S = t_n / t_p = k \, t_p / t_p = k$

  Space-time diagram (k = 4 segments, n = 6 tasks, 9 clock cycles) :

  Clock cycle :   1   2   3   4   5   6   7   8   9
  Segment 1   :  T1  T2  T3  T4  T5  T6
  Segment 2   :      T1  T2  T3  T4  T5  T6
  Segment 3   :          T1  T2  T3  T4  T5  T6
  Segment 4   :              T1  T2  T3  T4  T5  T6
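
The speedup formula is easy to check numerically. The sketch below evaluates it for the slide's numbers and then, assuming $t_n = k \, t_p$, shows S approaching k as n grows; the function name `pipeline_speedup` and the larger values of n are illustrative.

```python
def pipeline_speedup(n: int, k: int, tn: float, tp: float) -> float:
    """Speedup S = n*tn / ((k + n - 1)*tp) of a k-segment pipeline."""
    return (n * tn) / ((k + n - 1) * tp)


tp = 1.0  # one clock cycle, used as the unit of time

# Slide example: n = 6 tasks, k = 4 segments, tn = 6 tp.
print(pipeline_speedup(n=6, k=4, tn=6 * tp, tp=tp))

# With tn = k * tp, the speedup tends to k = 4 as n increases.
for n in (6, 100, 10_000):
    print(n, pipeline_speedup(n=n, k=4, tn=4 * tp, tp=tp))
```

The first call evaluates 36 / 9; the loop shows how far a short run of tasks falls below the limiting speedup of k.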

  • Pipelines : Arithmetic Pipeline (Sec. 9-3), Instruction Pipeline (Sec. 9-4)

• 9-3 Arithmetic Pipeline
  • Floating-point Adder Pipeline Example : Fig. 9-6
    • Add / subtract two normalized floating-point binary numbers (the example uses decimal numbers for simplicity)
      » $X = A \times 2^a = 0.9504 \times 10^3$
      » $Y = B \times 2^b = 0.8200 \times 10^2$


    • 4 segment suboperations
      » 1) Compare exponents by subtraction : 3 - 2 = 1
        • $X = 0.9504 \times 10^3$
        • $Y = 0.8200 \times 10^2$
      » 2) Align mantissas : shift the mantissa with the smaller exponent right by the difference
        • $X = 0.9504 \times 10^3$
        • $Y = 0.0820 \times 10^3$
      » 3) Add mantissas
        • $Z = 1.0324 \times 10^3$
      » 4) Normalize result
        • $Z = 0.10324 \times 10^4$

    [Fig. 9-6: exponents a, b and mantissas A, B enter through registers R. Segment 1: compare exponents by subtraction (difference). Segment 2: choose exponent, align mantissas. Segment 3: add or subtract mantissas. Segment 4: adjust exponent, normalize result. Registers R separate the segments.]
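
The four suboperations can be traced in code on the decimal example. This is a minimal sketch, not a hardware model: the function name `fp_add` is hypothetical, Python's `decimal` module is used so that the decimal mantissas stay exact, and the four steps run one after another here, whereas in the pipeline each segment would work on a different operand pair in the same clock cycle.

```python
from decimal import Decimal


def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add two normalized decimal numbers mant * 10**exp in four steps."""
    # Segment 1: compare the exponents by subtraction.
    diff = a_exp - b_exp

    # Segment 2: choose the larger exponent and align the other mantissa
    # by shifting it right |diff| decimal places.
    if diff >= 0:
        exp = a_exp
        b_mant = b_mant.scaleb(-diff)
    else:
        exp = b_exp
        a_mant = a_mant.scaleb(diff)

    # Segment 3: add the mantissas.
    mant = a_mant + b_mant

    # Segment 4: normalize so that 0.1 <= |mant| < 1, adjusting the exponent.
    while abs(mant) >= 1:
        mant = mant.scaleb(-1)
        exp += 1
    while mant != 0 and abs(mant) < Decimal("0.1"):
        mant = mant.scaleb(1)
        exp -= 1
    return mant, exp


# X = 0.9504 x 10^3 and Y = 0.8200 x 10^2 from the example.
mantissa, exponent = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(mantissa, exponent)
```

For the example operands the mantissa overflows to 1.0324 in segment 3, so segment 4 shifts it right once and raises the exponent from 3 to 4.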


• 9-4 Instruction Pipeline
  • Instruction Cycle
    » 1) Fetch the instruction from memory
    » 2) Decode the instruction
    » 3) Calculate the effective address
    » 4) Fetch the operands from memory
    » 5) Execute the instruction
    » 6) Store the result in the proper place
  • Example : Four-segment Instruction Pipeline
    • Four-segment CPU pipeline : Fig. 9-7
      » 1) FI : Fetch instruction
      » 2) DA : Decode instruction and calculate effective address (EA)
      » 3) FO : Fetch operand
      » 4) EX : Execute instruction
      [Fig. 9-7: Segment 1 fetches the instruction from memory. Segment 2 decodes the instruction and calculates the effective address, then tests "Branch?". Segment 3 fetches the operand from memory. Segment 4 executes the instruction, then tests "Interrupt?". Remaining blocks: interrupt handling, update PC, empty pipe.]
    • Timing of Instruction Pipeline : Fig. 9-8
      » Instruction 3 is a branch: instruction 4 is fetched in step 4, held until the branch has executed in step 6, and fetched again in step 7

      Step          :   1   2   3   4   5   6   7   8   9  10  11  12  13
      Instruction 1 :  FI  DA  FO  EX
                  2 :      FI  DA  FO  EX
         (Branch) 3 :          FI  DA  FO  EX
                  4 :              FI          FI  DA  FO  EX
                  5 :                              FI  DA  FO  EX
                  6 :                                  FI  DA  FO  EX
                  7 :                                      FI  DA  FO  EX
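
The timing pattern of Fig. 9-8 can be generated with a short scheduler. This is a sketch under a simplifying assumption taken from the figure: the instruction after a branch is fetched once (a wasted fetch) and fetched again in the step after the branch's EX. The names `pipeline_timing` and `print_timing` are hypothetical.

```python
STAGES = ("FI", "DA", "FO", "EX")


def pipeline_timing(n_instructions, branches=()):
    """Return {instruction: {step: stage}} for a four-segment pipeline."""
    timing = {}
    fetch_step = 1          # step in which the next useful FI takes place
    wasted_fetch = None     # step of a fetch that a branch made useless
    for instr in range(1, n_instructions + 1):
        schedule = {}
        if wasted_fetch is not None:
            schedule[wasted_fetch] = "FI"
            wasted_fetch = None
        for offset, stage in enumerate(STAGES):
            schedule[fetch_step + offset] = stage
        timing[instr] = schedule
        if instr in branches:
            # The next instruction is fetched right away, but that fetch is
            # repeated once the branch has finished its EX segment.
            wasted_fetch = fetch_step + 1
            fetch_step += 4
        else:
            fetch_step += 1
    return timing


def print_timing(timing):
    last = max(max(schedule) for schedule in timing.values())
    print("Step :", " ".join(f"{s:>2}" for s in range(1, last + 1)))
    for instr, schedule in timing.items():
        cells = " ".join(f"{schedule.get(s, ''):>2}" for s in range(1, last + 1))
        print(f"{instr:>4} :", cells)


# Seven instructions with instruction 3 as the branch, as in Fig. 9-8.
print_timing(pipeline_timing(7, branches={3}))
```

Without the branch, seven instructions would finish in 4 + 7 - 1 = 10 steps; the branch adds three steps, giving the 13 steps of the figure.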


  • Pipeline Conflicts : 3 major difficulties
    • 1) Resource conflicts
      » memory access by two segments at the same time
    • 2) Data dependency
      » when an instruction depends on the result of a previous instruction, but this result is not yet available
    • 3) Branch difficulties
      » branch and other instructions (interrupt, ret, ...) that change the value of PC
  • Handling of Data Dependency
    • Hardware : Hardware Interlock
      » detects that the result of a previous instruction is not yet available and inserts a hardware delay
    • Hardware : Operand Forwarding
      » passes the result of the previous instruction directly to the ALU (instead of going through a register)
    • Software : Delayed Load
      » a no-operation instruction is inserted after the previous instruction
  • Handling of Branch Instructions
    • Prefetch target instruction
      » for a conditional branch, fetch the branch target instruction in addition to the next sequential instruction
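
The delayed-load approach can be sketched as a small pass over a program that inserts a no-operation instruction wherever an instruction reads a register loaded by the instruction just before it. Everything here is an assumption for illustration: the function name `insert_delay_slots`, the `(opcode, destination, sources)` tuple format, and the choice of a single delay slot after a load.

```python
def insert_delay_slots(program):
    """Insert a NOP after every LOAD whose destination the next instruction reads.

    program: list of (opcode, destination, sources) tuples. This tuple
    format is hypothetical and exists only for this sketch.
    """
    scheduled = []
    for instr in program:
        if scheduled:
            prev_op, prev_dest, _ = scheduled[-1]
            _, _, sources = instr
            # Data dependency: the loaded value is not yet available, so
            # delay the dependent instruction by one no-operation slot.
            if prev_op == "LOAD" and prev_dest in sources:
                scheduled.append(("NOP", None, ()))
        scheduled.append(instr)
    return scheduled


# Illustrative program: C = A + B using two loads, an add, and a store.
program = [
    ("LOAD", "R1", ("A",)),
    ("LOAD", "R2", ("B",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "C", ("R3",)),
]
for instr in insert_delay_slots(program):
    print(instr)
```

In this program only the ADD follows a load of one of its operands, so a single NOP lands between the second LOAD and the ADD. A hardware interlock would reach the same timing by stalling the pipeline instead of changing the program.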
