VLSI Programming
Systolic Design
Book Parhi, Chp. 7
Rudolf Mak
r.h.mak@tue.nl
[Figure: a host (Turing-equivalent machine) connected to a systolic array of PEs; each PE is a Moore machine built from combinational logic (CL) and state.]
[Slide on RIA: a series of example equations, one of which is marked "does not work!!!"]
18-May-16 Rudolf Mak TU/e Computer Science Systolic 9
Regular Iterative Algorithm
A RIA is a triple consisting of
1. An index space: a set of index points $(i, j)$ bounded below by $0 \le i$, $0 \le j$
2. A finite set of variables, one of which is the input
3. A set of direct dependencies among indexed
variables (given as equalities)
• with associated index displacement vectors
• also called fundamental edges by Parhi
Canonical forms:
1. Standard input
2. Standard output
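The triple above can be sketched as a small data structure. This is only an illustration: the class name `RIA`, the field names, and the example instance (three variables `h`, `x`, `y` with displacement vectors (1,0), (0,1), (1,-1) on an index space with three rows) are assumptions made for the sketch, not definitions from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Index = Tuple[int, int]


@dataclass(frozen=True)
class RIA:
    """A regular iterative algorithm: index space, variables, dependencies."""
    in_index_space: Callable[[Index], bool]  # membership test for the index space
    variables: Tuple[str, ...]               # finite set of variable names
    displacements: Dict[str, Index]          # one displacement vector per dependency

    def source(self, var: str, point: Index) -> Index:
        """Index point that `var` at `point` directly depends on."""
        d = self.displacements[var]
        return (point[0] - d[0], point[1] - d[1])


# Hypothetical example: first coordinate unbounded above, three rows.
example = RIA(
    in_index_space=lambda p: p[0] >= 0 and 0 <= p[1] < 3,
    variables=("h", "x", "y"),
    displacements={"h": (1, 0), "x": (0, 1), "y": (1, -1)},
)

print(example.source("y", (2, 1)))  # (1, 2)
```

Because every dependency has a constant displacement vector, one entry per variable is enough to describe all arcs of the dependence graph.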
Index displacement vectors of the example dependencies: (0, 1), (1, 0), (0, 0), (0, 0), (1, 1).
I(g) is the index vector, i.e., the sequence of coordinates of g in index space.
Dependence graphs
1. The nodes of a dependence graph represent (small) computations. There is a separate node for each computation.
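[Figure: dependence graph with rows h(0), h(1), h(2); the edges have displacements (1,0), (0,1), and (1,-1); boundary inputs are 0.]
Fundamental edges: $E = (e_h \;\; e_x \mid e_y) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{pmatrix}$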
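A dependence graph with these fundamental edges can be enumerated mechanically. The sketch below assumes a rectangular index space of 5 columns by 3 rows (matching x(0)..x(4) and h(0)..h(2) in the figures) and assumes that the edges (1,0), (0,1), (1,-1) belong to h, x, and y respectively; the function name `dependence_graph` is hypothetical.

```python
from itertools import product

# Fundamental edges; the assignment of edges to variables is an assumption.
EDGES = {"h": (1, 0), "x": (0, 1), "y": (1, -1)}


def dependence_graph(n_cols, n_rows):
    """Return (nodes, arcs) for a rectangular index space of n_cols x n_rows points."""
    nodes = set(product(range(n_cols), range(n_rows)))
    arcs = []
    for (i, j) in sorted(nodes):
        for var, (di, dj) in EDGES.items():
            target = (i + di, j + dj)
            if target in nodes:  # edges that leave the index space are I/O, not arcs
                arcs.append(((i, j), target, var))
    return nodes, arcs


nodes, arcs = dependence_graph(5, 3)
print(len(nodes), len(arcs))  # 15 30
```

Every arc is a translate of one of the three fundamental edges, which is what makes the graph regular.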
Systolic array design
The design of a systolic array for a computation
given in the form of a regular dependence graph
involves:
[Figure: the rows h(0), h(1), h(2) of the dependence graph are mapped to processors by the processor space vector $p^T$.]
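Scheduling: $s^T = (1, 0)$, $d^T = (1, 0)$
[Figure: dependence graph with columns x(0) ... x(4) and rows h(2), h(1), h(0); in every row the nodes are scheduled at times 0, 1, 2, 3, 4, so $s^T$ maps each node to its time step.]
[Figure: resulting array of three PEs holding h0, h1, h2, with signals x(i), y(i), u(i), v(i) and a 0 input.]
HUE = 1 / |$s^T d$| = 1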
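The constraints behind such a design can be checked with a few dot products. The sketch below is a minimal check under stated assumptions: the scheduling vector is s = (1, 0) and the projection vector is d = (1, 0); the processor space vector p = (0, 1) and the edge assignment h: (1,0), x: (0,1), y: (1,-1) are assumptions, as is the function name `check_design`.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def check_design(d, p, s, edges):
    """Check projection d, processor vector p, and schedule s against the edges."""
    if dot(p, d) != 0:
        raise ValueError("p^T d must be 0: points along d share one processor")
    if dot(s, d) == 0:
        raise ValueError("s^T d must be nonzero: one processor, distinct times")
    report = {}
    for name, e in edges.items():
        delay = dot(s, e)
        if delay < 0:
            raise ValueError(f"edge {name} would run backwards in time")
        # p^T e is the direction of travel in the array, s^T e the number of delays
        report[name] = {"direction": dot(p, e), "delay": delay}
    hue = 1 / abs(dot(s, d))  # hardware utilization efficiency
    return report, hue


edges = {"h": (1, 0), "x": (0, 1), "y": (1, -1)}
report, hue = check_design(d=(1, 0), p=(0, 1), s=(1, 0), edges=edges)
print(report)
print(hue)  # 1.0
```

Under these assumptions h has direction 0 and delay 1 (it stays in its PE), x has delay 0 (it is broadcast), and y moves one PE per time step.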
Determining $P$, $s$, and $d$
• Trial-and-error approach
– Pick a combination and check whether the design constraints are fulfilled.
• Constructive approach
1. Determine a schedule $s^T$.
2. Determine a projection vector $d$ such that $s^T d \neq 0$.
3. Let $Q = (d^T d)\, I - d\, d^T$. Then $Q$ is a matrix of rank $n - 1$ such that $Q d = 0$. By sweeping, a zero column can be created in $Q$. Drop this column to obtain an $n \times (n-1)$-matrix $P$.
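Step 3 of the constructive approach can be sketched in a few lines. The sketch assumes that the matrix in question is $Q = (d^T d) I - d d^T$ and uses a shortcut for the sweep: since $Q d = 0$, the columns of $Q$ weighted by the entries of $d$ sum to zero, so any column whose weight is nonzero can be swept to zero and dropped while the remaining columns stay as they are. The function name `processor_space` is hypothetical.

```python
def processor_space(d):
    """Build P with P^T d = 0 from Q = (d^T d) I - d d^T by dropping one column."""
    n = len(d)
    dd = sum(x * x for x in d)
    q = [[(dd if r == c else 0) - d[r] * d[c] for c in range(n)] for r in range(n)]
    # Q d = 0 means sum_c d[c] * column_c = 0, so a column with d[c] != 0 is a
    # combination of the others and can be swept to zero, then dropped.
    drop = next(c for c in range(n) if d[c] != 0)
    return [[row[c] for c in range(n) if c != drop] for row in q]


print(processor_space((1, 0)))     # [[0], [1]]
print(processor_space((1, -1)))    # [[1], [1]]
print(processor_space((1, 1, 1)))  # [[-1, -1], [2, -1], [-1, 2]]
```

Each column of the result is orthogonal to d, so all index points on a line parallel to d are mapped to the same processor.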
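[Figure: dependence graph with rows h(0), h(1), h(2) and edges (1,0), (0,-1), and (1,-1).]
Fundamental edges: $E = (e_h \mid -e_x \mid e_y) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & -1 & -1 \end{pmatrix}$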
Space-time diagram R1
[Figure: space-time diagram of design R1, annotated with the vectors $d^T$, $p^T$, and $s^T$ (all entries of magnitude 1); the horizontal axis runs from 0 to 12.]
[Figure: dependence graph with rows h(0), h(1), h(2) and a processor allocation taken modulo 3; node labels per row are h(2): 0 4 5 6 10, h(1): 1 2 6 7 8, h(0): 2 3 4 8 9.]
[Figure: resulting array of three PEs holding h0, h1, h2, followed by a table of the array contents at time steps 3 to 13.]
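$c(i,j,k) = \sum_{l:\, 0 \le l < k} A(i,l)\, B(l,j)$
$a(i,j,k) = A(i,k-1)$
$b(i,j,k) = B(k-1,j)$
$c(i,j,0) = 0$
$c(i,j,k) = c(i,j,k-1) + a(i,j,k)\, b(i,j,k)$
$a(i,j,k) = a(i,j-1,k)$
$b(i,j,k) = b(i-1,j,k)$
for all index points $(i, j, k)$ of the index space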
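The recurrences for matrix multiplication can be executed directly to confirm that they compute the product. The sketch below assumes square N x N inputs, 0-based indices i and j, and k running from 1 to N, with the boundary values of the propagated variables taken from the input matrices as a(i,0,k) = A(i,k-1) and b(0,j,k) = B(k-1,j); the function name `matmul_ria` is hypothetical.

```python
def matmul_ria(A, B):
    """Evaluate c(i,j,k) = c(i,j,k-1) + a(i,j,k) * b(i,j,k) with propagated a and b."""
    n = len(A)
    a, b, c = {}, {}, {}
    for k in range(1, n + 1):
        for i in range(n):
            for j in range(n):
                # a travels along j, b travels along i; boundaries come from A and B
                a[i, j, k] = A[i][k - 1] if j == 0 else a[i, j - 1, k]
                b[i, j, k] = B[k - 1][j] if i == 0 else b[i - 1, j, k]
                prev = 0 if k == 1 else c[i, j, k - 1]  # c(i,j,0) = 0
                c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
    return [[c[i, j, n] for j in range(n)] for i in range(n)]


print(matmul_ria([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The three displacement vectors that appear here are (0,0,1) for c, (0,1,0) for a, and (1,0,0) for b.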
[Figure: three-dimensional index space with axes i, j, k (ticks 0, 1, 2), labeled with the matrices A, B, and C.]
• HUE = 1 / 3
Kung-Leiserson (3x3)-matrix multiplication systolic array
x = i-k, y = j-k
Delay elements not drawn: one on each edge!
KL-array
• processor allocation (binding): unbalanced workload
• 3-slow schedule: HUE = 1/3
[Figure: index space with axes i, j, k (ticks 0, 1, 2), the matrices A, B, C, and the projection vector d.]
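The unbalanced workload and the HUE of 1/3 can be reproduced by projecting the index space onto the processor coordinates x = i-k, y = j-k. The sketch assumes a 3 x 3 x 3 index space with all indices running over 0, 1, 2, the projection vector d = (1, 1, 1) implied by those coordinates, and a schedule s = (1, 1, 1); the schedule in particular is an assumption chosen to match the 3-slow behaviour.

```python
from collections import Counter
from itertools import product

N = 3
d = (1, 1, 1)  # projection direction implied by x = i-k, y = j-k
s = (1, 1, 1)  # assumed schedule, consistent with HUE = 1/3

# Number of index points (computations) mapped to each processor (x, y).
load = Counter((i - k, j - k) for i, j, k in product(range(N), repeat=3))
hue = 1 / abs(sum(a * b for a, b in zip(s, d)))

print(len(load))                   # 19 processors
print(load[(0, 0)], load[(2, 2)])  # 3 1
print(hue)                         # 0.333...
```

The central processor executes three computations while a corner processor executes only one, and no processor is active more often than once every three time steps.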
[Equation defining the output stream in terms of the input stream]
where $x$ is the input stream and $y$ the output stream.
a) Derive a RIA (in standard output form) for this system that satisfies
the equations
[Three equations that the RIA must satisfy, followed by a note marked "‼!"]
b) Draw the dependence graph of this RIA for the case in which the parameter equals 4 (you need to draw only the part in which the index runs from 0 to 6).