7-VECTOR PROCESSING-04-Jan-2020Material - I - 04-Jan-2020 - VECTOR - PROCESSING PDF
Architecture Classification
SISD
Single Instruction Single Data
SIMD
Single Instruction Multiple Data
MIMD
Multiple Instruction Multiple Data
MISD
Multiple Instruction Single Data
Alternative Forms of Machine
Parallelism
Drawbacks of ILP and TLP
Coherency
Synchronization
Large Overhead
instruction fetch and decode: at some point, it is hard to
fetch and decode more instructions per clock cycle
cache hit rate: some long-running (scientific) programs
have very large data sets accessed with poor locality
Alternative: Vector Processors
Vector Processing: Introduction
A vector is an ordered set of elements.
A vector operand contains an ordered set of n elements, where n
is called the length of the vector.
Each element in a vector is a scalar quantity, which may be a
floating-point number, an integer, a logical value, or a character
(byte).
Example vectors would be 64 or 128 elements in length
Small vectors are about 4 elements in length
Simple task of adding two groups of 10 numbers
together
Execute this loop 10 times
read the next instruction and decode it
fetch this number
fetch that number
add them
put the result here
end loop
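The loop above can be sketched in Python as follows. This is only a minimal illustration: the names `scalar_add` and `vector_add` are hypothetical, and the list comprehension merely stands in for a single vector instruction (Python does not execute it on vector hardware).

```python
def scalar_add(a, b):
    """Add two groups of numbers one element at a time, as the scalar loop does."""
    result = []
    for i in range(len(a)):       # loop control is paid on every pass
        x = a[i]                  # fetch this number
        y = b[i]                  # fetch that number
        result.append(x + y)      # add them and put the result here
    return result


def vector_add(a, b):
    """Stand-in for one vector instruction that names the whole operands at once."""
    return [x + y for x, y in zip(a, b)]


a = list(range(10))        # first group of 10 numbers
b = list(range(10, 20))    # second group of 10 numbers
print(scalar_add(a, b) == vector_add(a, b))
```

Both functions produce the same result; the difference a vector processor exploits is that the instruction fetch and decode happen once for the whole operand rather than once per element.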
Vector instructions are classified into four basic types:
f1: V → V        f2: V → S
f3: V × V → V    f4: V × S → V
Here V denotes a vector operand and S a scalar operand.
The operations f1 and f2 are unary operations. f1 maps a vector to a vector (vector
square root, vector sine, vector complement); f2 maps a vector to a scalar (vector summation).
Operations f3 and f4 are binary operations. f3 combines two vectors (vector add, vector
multiply); f4 combines a vector with a scalar (vector-scalar add).
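The four instruction types can be illustrated with one small function each. This is a sketch only: the function names are hypothetical, and Python lists stand in for vector operands.

```python
import math


def f1_sqrt(v):
    """f1, vector to vector: element-wise square root."""
    return [math.sqrt(x) for x in v]


def f2_sum(v):
    """f2, vector to scalar: summation of all elements."""
    return sum(v)


def f3_add(v1, v2):
    """f3, two vectors to a vector: element-wise add."""
    return [x + y for x, y in zip(v1, v2)]


def f4_scale(v, s):
    """f4, vector and scalar to a vector: multiply each element by s."""
    return [s * x for x in v]


print(f1_sqrt([4.0, 9.0]))      # [2.0, 3.0]
print(f2_sum([1, 2, 3]))        # 6
print(f3_add([1, 2], [3, 4]))   # [4, 6]
print(f4_scale([1, 2], 3))      # [3, 6]
```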
Vector Instruction Fields
A vector instruction includes the initial addresses of the two source operands and of the
destination operand, the length of the vectors, and the operation to be performed.
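As a rough sketch of these fields, the record below models a memory-to-memory vector instruction and a toy interpreter for it. The field names, the `OPS` table, and the flat-list memory are all assumptions made for illustration; they do not describe the encoding of any real machine.

```python
from dataclasses import dataclass
import operator


@dataclass
class VectorInstruction:
    opcode: str   # operation to be performed, e.g. "add"
    src1: int     # initial address of the first source operand
    src2: int     # initial address of the second source operand
    dest: int     # initial address of the destination operand
    length: int   # number of elements in each vector


# Hypothetical operation table for the toy interpreter.
OPS = {"add": operator.add, "mul": operator.mul}


def execute(instr, memory):
    """Apply the operation to `length` consecutive element pairs in memory."""
    op = OPS[instr.opcode]
    for i in range(instr.length):
        memory[instr.dest + i] = op(memory[instr.src1 + i],
                                    memory[instr.src2 + i])


memory = [1, 2, 3, 10, 20, 30, 0, 0, 0]
execute(VectorInstruction("add", src1=0, src2=3, dest=6, length=3), memory)
print(memory[6:9])   # [11, 22, 33]
```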
Fig. Simplified view of a vector processor with one functional unit for arithmetic
operations
What is a Vector Processor?
A Vector processor is a processor that can operate on an
entire vector in one instruction.
The operands of the instructions are complete vectors
instead of single elements.
Provides high-level operations that work on vectors
What is a Vector Processor?
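Based on how the operands are fetched, vector processors
can be divided into two categories: memory-memory
architectures, which stream operands directly from memory to
the functional units, and vector-register architectures, which
first load operands into vector registers.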
Components of Vector Processor
Vector Registers
Fixed length bank holding a single vector
Has at least 2 read and 1 write ports
Typically 8-32 vector registers, each holding 64-128
elements of 64 bits
Vector Functional Units
Fully pipelined, start new operation every clock
Typically 4-8 FUs: FP add, FP mult, FP reciprocal, integer
add, logical, shift
Scalar Registers
Single element for FP scalar or address
Load Store Units
Vector-Register Architecture
Memory operations
Load/store operations move groups of data
between registers and memory
Vector Stride
Vector Processor Properties
Advantages of Vector
Processors
Disadvantages of Vector
Processors
Expansion of the Instruction Set
Architecture (ISA) is needed
Additional vector functional units and
registers
Modification of the memory system
Example Vector Machines
Machine      Year  Clock    Regs   Elements  FUs  LSUs
Cray 1       1976   80 MHz  8      64        6    1
Cray XMP     1983  120 MHz  8      64        8    2 L, 1 S
Cray YMP     1988  166 MHz  8      64        8    2 L, 1 S
Cray C-90    1991  240 MHz  8      128       8    4
Cray T-90    1996  455 MHz  8      128       8    4
Conv. C-1    1984   10 MHz  8      128       4    1
Conv. C-4    1994  133 MHz  16     128       3    1
Fuj. VP200   1982  133 MHz  8-256  32-1024   3    2
NEC SX/2     1984  160 MHz  8+8K   256+var   16   8
NEC SX/3     1995  400 MHz  8+8K   256+var   16   8
Vectorization Example 1
DO 100 I = 1, N
A(I) = B(I) + C(I)
100 CONTINUE
Scalar process:
1. B(1) will be fetched from memory
2. C(1) will be fetched from memory
3. A scalar add instruction will operate on B(1) and C(1)
4. A(1) will be stored back to memory
5. Steps (1) to (4) will be repeated N times.
Vector process:
1. A vector of values in B(I) will be fetched from memory
2. A vector of values in C(I) will be fetched from memory.
3. A vector add instruction will operate on pairs of B(I) and C(I) values.
4. After a short start-up time, a stream of A(I) values will be stored back to
memory, one value every clock cycle.
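The timing difference between the two processes can be sketched with a simple cycle model. The model and its parameters are assumptions for illustration only: the scalar loop is charged a fixed number of cycles per iteration, and the vector version is charged a start-up latency plus one cycle per element. The values 4 and 12 below are made up, not measurements of any machine.

```python
def scalar_cycles(n, cycles_per_iteration):
    """Scalar process: every one of the n iterations pays the full per-iteration cost."""
    return n * cycles_per_iteration


def vector_cycles(n, startup):
    """Vector process: pay the start-up once, then one result per clock cycle."""
    return startup + n


n = 64
print(scalar_cycles(n, cycles_per_iteration=4))   # 256
print(vector_cycles(n, startup=12))               # 76
```

Under this model the start-up cost is amortized over the vector length, so longer vectors bring the vector version closer to one result per cycle.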
Example (2): Y = aX + Y
Scalar code:
      LD    F0, A
      ADDI  R4, Rx, #512   ; Last addr
Loop: LD    F2, 0(Rx)
      MULTD F2, F0, F2     ; A * X[I]
      LD    F4, 0(Ry)
      ADDD  F4, F2, F4     ; + Y[I]
      SD    0(Ry), F4
      ADDI  Rx, Rx, #8     ; Inc index
      ADDI  Ry, Ry, #8
      SUB   R20, R4, Rx
      BNEZ  R20, Loop
2 + 9*64 = 578 operations
Vector code:
      LD     F0, A
      LV     V1, Rx        ; Load vecX
      MULTSV V2, F0, V1    ; Vec Mult
      LV     V3, Ry        ; Load vecY
      ADDV   V4, V2, V3    ; Vec Add
      SV     Ry, V4        ; Store result
1 + 5*64 = 321 operations
The vector length is 64 elements (512 bytes at 8 bytes per element), so the vector
code needs no loop.
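The operation counts above follow directly from the two listings, and the short sketch below reproduces the arithmetic. The function names are hypothetical; `daxpy` is included only to show the computation that both listings perform.

```python
def daxpy(a, x, y):
    """Compute a*X + Y element by element (the result both listings store in Y)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def scalar_ops(n):
    """2 set-up instructions plus 9 loop-body instructions for each of n elements."""
    return 2 + 9 * n


def vector_ops(n):
    """1 scalar load plus 5 vector instructions, each operating on n elements."""
    return 1 + 5 * n


print(scalar_ops(64))                            # 578
print(vector_ops(64))                            # 321
print(daxpy(2.0, [1.0, 2.0], [3.0, 4.0]))        # [5.0, 8.0]
```

Note that the vector count of 321 tallies element operations; the vector program itself is only 6 instructions long, which is where the saving in instruction fetch and decode comes from.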
Applications