Systolic Array


Fault Tolerance in Systolic Arrays

Presented on : 09 March 2012

Presentation Overview
Systolic arrays
  Introduction
  Structures
  Matrix Multiplication
  Applications

Fault Tolerance in Systolic Arrays
  Hardware Schemes
  Software/Algorithm based
  Reconfigurable SA

What is Systolic Computing?


A set of simple processing elements with regular and local connections, which takes external inputs and processes them in a predetermined manner in a pipelined fashion.

Systolic computers pump data through the array of processing elements. The architectures are not general but tied to specific algorithms.

Systolic computers show both pipelining and parallel computation.

[Figure: Memory connected to a single PE, and Memory connected to a chain of PEs (PE, PE, ..., PE). Generalization of pipelined array architecture]

Functions of a Cell in a Systolic System


Systolic systems consist of an array of processing elements (PEs), also called cells. Each cell is connected to a small number of nearest neighbours in a mesh-like topology.

Each cell performs a sequence of operations on the data that flows between the cells. Generally the operations are the same in each cell. Each cell performs an operation, or a small number of operations, on a data item and then passes it to its neighbour. Systolic arrays compute in lock-step, with each cell alternating between compute and communicate phases.
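The lock-step compute/communicate cycle can be sketched in a few lines of Python. This is an illustrative simulation, not something taken from the slides: it assumes a linear array in which cell i holds one polynomial coefficient and applies one Horner step to every item that passes through. The names systolic_horner, regs and stream are hypothetical.

```python
def systolic_horner(coeffs, xs):
    """Evaluate a polynomial (coefficients highest degree first) at each x in xs.

    Assumed layout: a linear array with one cell per coefficient. Every cell
    performs the same operation, y = y * x + c, and passes (x, y) to its
    right-hand neighbour.
    """
    n = len(coeffs)
    regs = [None] * n          # regs[i]: item waiting at the input of cell i
    stream = list(xs)          # inputs piped in from the host
    results = []               # outputs piped back to the host
    for _ in range(len(xs) + n):
        # Compute phase: all cells work on their current input simultaneously.
        outs = [None] * n
        for i, item in enumerate(regs):
            if item is not None:
                x, y = item
                outs[i] = (x, y * x + coeffs[i])
        # Communicate phase: every result moves one cell to the right.
        if outs[-1] is not None:
            results.append(outs[-1][1])
        regs = [None] + outs[:-1]
        regs[0] = (stream.pop(0), 0) if stream else None
    return results


# 2x^2 + 1 evaluated at x = 0, 1, 2, 3
print(systolic_horner([2, 0, 1], [0, 1, 2, 3]))  # prints [1, 3, 9, 19]
```

Once the pipeline is full, one result leaves the array on every tick even though each input needs as many ticks as there are cells to traverse it.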

SIMD Array Vs Systolic Array [1]

[Figure: SIMD array. A single Control Unit drives several Processing Units over a Control Bus; the Processing Units share a Data Bus and a local Interconnection Network.]

SIMD array: the PEs operate under the supervision of one control unit.

All PEs receive the same instruction, broadcast by the control unit. The PEs operate on different data sets from distinct data streams.

[Figure: systolic array. Each Processing Unit has its own Control Unit; the units are joined by a local Interconnection Network.]

A SIMD array usually loads data into its local memories before starting the computation. A systolic array usually pipes data in from an outside host and pipes the results back to the host.
Figure Ref [1] Systolic Computing Fundamentals, http://web.cecs.pdx.edu/~mperkows/temp/May13/systolic.pdf

Variations of Systolic Arrays


Systolic arrays can be built with variations in:
Connection topology
  2D meshes
  Hypercubes

Processor capability, ranging through:
  Trivial: just an ALU
  ALU with several registers
  Simple CPU: registers, runs its own program
  Powerful CPU: local memory also

Typical Structures of a Systolic Architecture

Linear array with 1D I/O (1D Linear Array).

Linear array with 2D I/O (2D Linear Array).

Matrix Multiplication [2]


Consider multiplying a 3x2 matrix by a 2x1 matrix:

Systolic Arrays [2]

Y values go left, X values go right, and A values fan in.

[Figure: snapshots of the array at time steps T0 through T7]
Figure Ref [2] Jason HandUber , Systolic Arrays , February 12, 2003 , http://web.cecs.pdx.edu/~mperkows/temp/May22/jhanduber2.pdf
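
As a rough illustration of how a product such as the 3x2 by 2x1 example can be computed systolically, the sketch below simulates an output-stationary 2D mesh. This layout is an assumption made for the example and is not the bidirectional linear array shown in the figure from [2]: rows of A are piped in from the left, columns of B from the top, each skewed by one time step per row or column, and every PE accumulates one element of the product. The function name systolic_matmul is hypothetical.

```python
def systolic_matmul(A, B):
    """Multiply an n x k matrix A by a k x m matrix B on a simulated n x m mesh."""
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]  # A values travelling left to right
    b_reg = [[0] * m for _ in range(n)]  # B values travelling top to bottom
    for t in range(n + m + k - 2):
        # Communicate phase: shift A right and B down, then feed the edges.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            s = t - i                    # row i is delayed by i steps
            a_reg[i][0] = A[i][s] if 0 <= s < k else 0
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            s = t - j                    # column j is delayed by j steps
            b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Compute phase: every PE performs one multiply-accumulate.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4], [5, 6]], [[7], [8]]))  # prints [[23], [53], [83]]
```

The skew guarantees that A[i][s] and B[s][j] meet in PE (i, j) at time step s + i + j, so the whole product is ready after n + m + k - 2 steps.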

Structures of a Systolic Architecture [1]


Bi-directional two-dimensional network

Planar array with perimeter I/O. This configuration allows I/O only through its boundary cells.

Focal Plane array with 3D I/O. This configuration allows I/O to each systolic cell.

Figure Ref [3] Jop Sibeyn , Systolic Matrix Product, http://users.informatik.uni-halle.de/~jopsi/dpar03/chap3.shtml

Figure Ref [4] Shaaban, Systolic Architectures , http://web.cecs.pdx.edu/~mperkows/temp/May22/0020.Matrix-multiplication-systolic.pdf

Hexagonal network

Trees

Applications Of Systolic Arrays


Matrix inversion and decomposition
Solution of difference and differential equations
Linear programming
Sorting and searching
Polynomial evaluation
Convolution
Matrix multiplication
Image processing
Image recognition
Computational geometry
CAD
Systolic lattice filters used for speech and seismic signal processing
Artificial neural networks
Robotics
Equation solving
Combinatorial problems

Features of Systolic Arrays


Synchrony: data is rhythmically computed (timed by a global clock) and passed through the network.
Modularity: the array (finite or infinite) consists of modular processing units.
Regularity: the processing units are interconnected homogeneously.
Spatial locality: the cells have local communication interconnections.
Temporal locality: signals transmitted from one cell to another require at least one unit of time delay.
Pipelinability: the array can achieve high speed.

Systolic Disadvantages
Complicated in both hardware and software. In fact, entire volumes exist outlining systolic array verification.

Expensive in comparison to uniprocessor systems, although much faster.

A systolic array is used as an attached array processor, integrated into an existing host as a back-end processor: it receives data and outputs the results through the attached host computer.

Fault Tolerance
One-for-One Redundancy
Each PE of the SA has a redundant (standby) PE.
The standby PE monitors the active one at all times and becomes active if the active PE fails. To do so, it has to keep itself synchronized with the operations of the active unit.

N + X Redundancy
The array consists of N + X PEs, where X is typically much smaller than N. Whenever any of the N modules fails, one of the X modules takes over its functions.

Having the X units monitor the health of all N units at all times is not practical, so a higher-level module monitors the health of the N units. If one of the N units fails, it selects one of the X units to replace it.
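
The bookkeeping done by the higher-level module can be sketched as follows. This is a minimal illustration under stated assumptions: the N logical positions of the array are mapped onto physical PE identifiers, and failures are reported to the supervisor by some external health check that is not modelled here. The class and method names (SparePool, report_failure) are hypothetical.

```python
class SparePool:
    """Supervisor for N active PEs backed by X spares (N + X redundancy)."""

    def __init__(self, n, x):
        self.mapping = list(range(n))        # logical position -> physical PE
        self.spares = list(range(n, n + x))  # idle physical PEs

    def report_failure(self, physical_pe):
        """Swap a spare into the position held by the failed PE."""
        position = self.mapping.index(physical_pe)
        if not self.spares:
            raise RuntimeError("no spare PE left")
        self.mapping[position] = self.spares.pop(0)
        return self.mapping[position]


pool = SparePool(n=4, x=2)
pool.report_failure(2)
print(pool.mapping)  # prints [0, 1, 4, 3]
```

With X spares the array survives at most X failures before the supervisor has to give up.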

Load Sharing
All the PEs that are equipped to perform the SA function share the load.

A higher-level module performs load distribution and maintains the health status of the PEs. If one load-sharing PE fails, the higher-level module starts distributing the load among the rest of the units.

Performance degrades gracefully with hardware failure.
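
Graceful degradation under load sharing can be illustrated with a simple round-robin distributor. The sketch assumes that the work can be split into independent items and that the higher-level module already knows which PEs are healthy; the function name distribute is hypothetical.

```python
def distribute(work_items, healthy_pes):
    """Assign work items to the healthy PEs in round-robin order."""
    if not healthy_pes:
        raise RuntimeError("no healthy PE left to carry the load")
    assignment = {pe: [] for pe in healthy_pes}
    for index, item in enumerate(work_items):
        assignment[healthy_pes[index % len(healthy_pes)]].append(item)
    return assignment


items = list(range(6))
print(distribute(items, [0, 1, 2]))  # prints {0: [0, 3], 1: [1, 4], 2: [2, 5]}
print(distribute(items, [0, 2]))     # PE 1 failed: {0: [0, 2, 4], 2: [1, 3, 5]}
```

After the failure of PE 1 all six items are still processed, but each surviving PE carries three items instead of two.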

SA with N + 1 Redundancy [5]


Regular SA (N=4)
N PEs, N+1 interconnections

SA with N + 1 redundancy (N=4)


N+1 PEs, N mux, N demux, 2N+1 interconnections
Figures Ref [5] I.N. Tselepis and M.P. Bekakos, Fault-Tolerant Implementation of Systolic Arrays, http://www.aueb.gr/pympe/hercma/proceedings2009/H09-FULL-PAPERS-1/TSELEPIS-BEKAKOS-1.pdf
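
The effect of the multiplexers and demultiplexers can be sketched as a routing function that bypasses one faulty PE. This is a simplified illustration that assumes all PEs are interchangeable and that at most one of them has failed; it does not reproduce the circuit of [5], and the function name active_path is hypothetical.

```python
def active_path(n, faulty=None):
    """Return, in order, the physical PEs that data visits in an array of n + 1 PEs."""
    if faulty is not None and not 0 <= faulty <= n:
        raise ValueError("faulty PE index out of range")
    if faulty is None:
        return list(range(n))        # the last PE stays idle as the spare
    # Every PE after the faulty one takes over the role of its left neighbour.
    return [pe for pe in range(n + 1) if pe != faulty]


print(active_path(4))            # prints [0, 1, 2, 3]
print(active_path(4, faulty=1))  # prints [0, 2, 3, 4]
```

In both cases the data still passes through exactly N working PEs, which is what lets the array tolerate a single fault without loss of throughput.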

Three Versions of the Computation Structure


An SA with pipeline period = 3 can perform the original algorithm and two redundant algorithms concurrently. The redundant computations can be performed by the idle PEs during idle clock cycles.

Redundancies are introduced at the computational level by deriving three equivalent algorithms with disjoint index spaces.

Re-computing with Shifted Operands [6]

Figure Ref [6] Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75
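
The idea behind recomputing with shifted operands can be illustrated in software: each operation is performed twice, the second time with both operands shifted left, and the two results are compared after shifting the second one back. A fault confined to one bit position of the functional unit then corrupts different result bits in the two runs. The sketch below assumes an adder with one output bit stuck at 0, a fault model chosen only for illustration; faulty_add and reso_add are hypothetical names.

```python
def faulty_add(a, b, stuck_bit=None):
    """Add two non-negative integers; optionally force one output bit to 0."""
    result = a + b
    if stuck_bit is not None:
        result &= ~(1 << stuck_bit)
    return result


def reso_add(a, b, shift=1, stuck_bit=None):
    """Return (sum, consistent) using recomputation with shifted operands."""
    first = faulty_add(a, b, stuck_bit)
    second = faulty_add(a << shift, b << shift, stuck_bit) >> shift
    return first, first == second


print(reso_add(5, 6))               # prints (11, True)
print(reso_add(5, 6, stuck_bit=0))  # prints (10, False)
```

The second call returns a wrong sum, but the mismatch between the two runs flags it. The cost is time rather than hardware, since the same adder is used twice.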

Triple Modular Redundancy (TMR) [5]

Three processors all work on the same problem and compare results.

3N PEs, 2 Voters, 3(N+1) interconnections


Figures Ref [5] I.N. Tselepis and M.P. Bekakos, Fault-Tolerant Implementation of Systolic Arrays, http://www.aueb.gr/pympe/hercma/proceedings2009/H09-FULL-PAPERS-1/TSELEPIS-BEKAKOS-1.pdf
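
A majority voter for triple modular redundancy can be sketched as below. The example assumes each PE is a function performing one multiply-accumulate and models a fault as a replica that returns a wrong value; vote, tmr, good_pe and bad_pe are hypothetical names, and the sketch ignores the voter arrangement of [5].

```python
def vote(a, b, c):
    """Return the value that at least two of the three replicas agree on."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise ValueError("no two replicas agree")


def tmr(replicas, *args):
    """Run three replicas of a PE on the same inputs and vote on the outputs."""
    outputs = [pe(*args) for pe in replicas]
    return vote(*outputs)


def good_pe(acc, a, b):
    return acc + a * b


def bad_pe(acc, a, b):
    return acc + a * b + 1  # simulated fault: result is off by one


print(tmr([good_pe, bad_pe, good_pe], 10, 2, 3))  # prints 16
```

A single faulty replica is outvoted and masked; two simultaneous faults that produce different wrong values leave no majority and raise an error here.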

Triple Time Redundancy [7]

Gracefully degradable linear systolic arrays
TMR fault detection/correction
Time redundancy achieved: concurrent error correction/detection
Figure Ref [7] Majumdar, A.; Raghavendra, C.S.; Breuer, M.A.; , "Fault tolerance in linear systolic arrays using time redundancy," System Sciences, 1988. Vol.I. Architecture Track, Proceedings of the Twenty-First Annual Hawaii International Conference on , vol.1, no., pp.311-320, 0-0 1988

Algorithm-based Error Detection & Fault Location [6]

[Figure: checksum matrix multiplication on a systolic array, with input arrays A and B]
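
Checksum matrix multiplication can be sketched in plain Python. A row of column sums is appended to A and a column of row sums to B; their product then carries its own row and column checksums, and a single wrong element shows up as exactly one inconsistent row and one inconsistent column. The code below illustrates the general technique and is not taken from [6]; all function names are hypothetical, and it only locates errors in the data elements, not in the checksums themselves.

```python
def column_checksum(A):
    """Append a row holding the sum of each column of A."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]


def row_checksum(B):
    """Append to each row of B the sum of that row."""
    return [row[:] + [sum(row)] for row in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def locate_error(C):
    """Return (row, col) of a single erroneous data element, or None if consistent."""
    n, m = len(C) - 1, len(C[0]) - 1
    bad_rows = [i for i in range(n) if sum(C[i][:m]) != C[i][m]]
    bad_cols = [j for j in range(m) if sum(C[i][j] for i in range(n)) != C[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("not a single data-element error")


C = matmul(column_checksum([[1, 2], [3, 4]]), row_checksum([[5, 6], [7, 8]]))
print(locate_error(C))  # prints None
C[1][0] += 5            # simulate a faulty PE corrupting one result
print(locate_error(C))  # prints (1, 0)
```

Once the faulty element is located, it can be recomputed from the checksum of its row or column, so the same redundancy supports both detection and correction of a single error.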

Reconfigurable Systolic Structures


Independent switches
Switches are separated from the PEs and treated as independent elements instead of as part of a PE.

Local switches
Switches are placed immediately around each PE. Information entering a faulty PE can be directed to one of its neighbours without processing.

Bus-structured switches
The PEs are in a collinear layout, connected to bundles of communication lines that run parallel to the row of PEs.

Address renaming
Each processor has a modifiable address, and redundant processors and links are provided. Once a faulty PE is detected, the processor addresses are rearranged so that the faulty PE is excluded and a redundant PE is included.

Processor-Switch Lattice

Figure Ref [6] Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75

References
1. Systolic Computing Fundamentals, http://web.cecs.pdx.edu/~mperkows/temp/May13/systolic.pdf
2. Jason HandUber, Systolic Arrays, February 12, 2003, http://web.cecs.pdx.edu/~mperkows/temp/May22/jhanduber2.pdf
3. Shaaban, Systolic Architectures, http://web.cecs.pdx.edu/~mperkows/temp/May22/0020.Matrix-multiplication-systolic.pdf
4. Jop Sibeyn, Systolic Matrix Product, http://users.informatik.uni-halle.de/~jopsi/dpar03/chap3.shtml
5. I.N. Tselepis and M.P. Bekakos, Fault-Tolerant Implementation of Systolic Arrays, http://www.aueb.gr/pympe/hercma/proceedings2009/H09-FULL-PAPERS-1/TSELEPIS-BEKAKOS-1.pdf
6. Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75
7. Majumdar, A.; Raghavendra, C.S.; Breuer, M.A., "Fault tolerance in linear systolic arrays using time redundancy," Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences, Vol. I: Architecture Track, pp. 311-320, 1988

Thank you
