ECE 565
High-Level Synthesis: An Introduction
Shantanu Dutt
ECE Dept., UIC
HLS Flow
Code/Algorithm → Architecture (functional units (FUs) and memory units (MUs) interconnected via muxes, demuxes, tristate buffers, buses, and dedicated interconnects)
Classically, these 3 stages were performed sequentially, but they are currently performed together (which leads to better optimization).
HLS Flow (contd)
HLS Flow (contd)
(Binding)
Allocation: Simple counting of FUs after the above 2 stages
Simple HLS Examples
Simple HLS Examples (contd)

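2) Mapping to h/w w/ constraints: use only 1 multiplier (X) and 1 adder (+), w/ X delay of 2 ccs and + delay of 1 cc

(a) Scheduling
i) Non-overlapped pipelined scheduling
[Figure: schedule over ccs 1 to 6 for two iterations. X row: c1(1), c1(2). + row: c2(1), c3(1), c2(2), c3(2).]

(b) Arch. Synthesis
[Figure: datapath with input registers a, b, c, d (load signals lda, ldb, ldc, ldd), registers x and y (ldx, ldy), mux1 and mux2 (inputs I0, I1) at the adder inputs, a demux (outputs O0, O1) at the adder output, and an output register loaded by ldz.]

(c) Controller FSM Synthesis
Controller FSM (entered on Reset; states labeled cc 3i, cc 3(i+1), cc 3(i+2) in the figure). Asserted control signals and the operation performed:
mux1=0, mux2=0, demux=0, ldy=1 [y ← c+d] (c2)
ldx=1 [x ← a × b] (c1)
lda=1, ldb=1, ldc=1, ldd=1, mux1=1, mux2=1, demux=1, ldz=1 [z ← x+y] (c3)

Note: Unspecified control signals have either an inactive value or, if such a concept doesn't exist for the control signal (cs), the don't-care value.

Note: A register is loaded at the +ve/-ve edge (in a +ve/-ve edge-triggered system) of the cc after the one in which its load signal is asserted.
[Figure: timing diagram showing lda = 1 in one cc and reg. a loaded at the following clock edge.]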
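The register-loading note above can be illustrated with a small simulation. This is only a sketch under assumptions: the `Register` class and the `simulate` helper are hypothetical names introduced here, and each list entry stands for one cc.

```python
class Register:
    """Edge-triggered register with a load-enable input (illustrative model)."""

    def __init__(self, value=0):
        self.value = value

    def clock_edge(self, load, data_in):
        # At the active edge, capture data_in only if the load signal
        # was asserted during the cc that just ended.
        if load:
            self.value = data_in


def simulate(load_by_cc, data_by_cc, initial=0):
    """Return the register value visible during each cc."""
    reg = Register(initial)
    visible = []
    for load, data in zip(load_by_cc, data_by_cc):
        visible.append(reg.value)   # value seen during this cc
        reg.clock_edge(load, data)  # edge that ends this cc
    return visible


# Load is asserted only in the second cc, so the new value (7)
# first becomes visible in the third cc.
trace = simulate([0, 1, 0, 0], [5, 7, 9, 11])
print(trace)  # [0, 0, 7, 7]
```

The point of the sketch is that a value computed and "loaded" in one cc can only be read by the datapath in the next cc, which is why each scheduled operation ends with a register load.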

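2) Mapping to h/w w/ constraints: use only 1 multiplier (X) and 1 adder (+) (contd)

(a) Scheduling
ii) Overlapped pipelined scheduling
[Figure: schedule over ccs 1 to 6. X row: c1(1), c1(2). + row: c2(1), c3(1), c2(2), c3(2).]

(b) Arch. Synthesis
[Figure: datapath with input registers a, b, c, d (load signals lda, ldb, ldc, ldd), registers x, y, z (ldx, ldy, ldz), mux1 and mux2 (inputs I0, I1) at the adder inputs, and a demux at the adder output.]

(c) Controller FSM Synthesis
Controller FSM (entered on Reset; states labeled cc 3i and cc 3(i+1) in the figure). Asserted control signals and the operations performed:
lda=1, ldb=1, mux1=0, mux2=0, demux=0, ldy=1, ldx=1 [y ← c+d, x ← a × b] (c1, c2)
ldc=1, ldd=1, mux1=1, mux2=1, demux=1, ldz=1 [z ← x+y] (c3)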
For 4 iterations, the overlapped schedule takes 9 ccs versus 12 ccs for the non-overlapped schedule.
Overlapped schedule: time for n iterations = 2n+1 ccs; throughput = n/(2n+1) ≈ 0.5 outputs/cc for large n
Non-overlapped schedule: time for n iterations = 3n ccs; throughput = n/(3n) ≈ 0.33 outputs/cc
≈ 33% fewer ccs per output, i.e., ≈ 50% higher throughput (0.5 vs. 0.33 outputs/cc), using an overlapped schedule
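The cycle counts above can be checked numerically. The sketch below simply evaluates the two formulas from the slide (3n and 2n+1); the function names are introduced here for illustration.

```python
def nonoverlapped_cycles(n):
    """ccs for n iterations when each iteration takes 3 ccs."""
    return 3 * n


def overlapped_cycles(n):
    """ccs for n iterations when a new iteration starts every 2 ccs."""
    return 2 * n + 1


def throughput(n, cycles):
    """Outputs per cc."""
    return n / cycles


for n in (1, 4, 100):
    t_non = nonoverlapped_cycles(n)
    t_ovl = overlapped_cycles(n)
    print(n, t_non, t_ovl,
          round(throughput(n, t_non), 3),
          round(throughput(n, t_ovl), 3))
```

For n = 4 this gives 12 and 9 ccs. The ratio of the two throughputs is 3n/(2n+1), which tends to 1.5 as n grows.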

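Some DFG control operation nodes:
Selector: data inputs in1 and in2 (on its T and F ports), a Condition (T/F) input, and one output out.
Distributor: one data input in, a Condition (T/F) input, and two outputs out1 and out2 (on its T and F ports).
Conditional code:
if (a > b) then
    c ← a-b;
else
    c ← b-a;
Possible DFGs corresponding to the above conditional code:
[Figure: possible DFGs for the conditional code]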
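A minimal functional sketch of the two control nodes follows. It assumes that in1/out1 correspond to the T port and in2/out2 to the F port, which the extracted slide text does not confirm; both function names and the two `abs_diff_*` helpers are hypothetical.

```python
def selector(cond, in1, in2):
    """Pass in1 to the output when cond is True, else in2 (assumed port mapping)."""
    return in1 if cond else in2


def distributor(cond, value):
    """Route value to out1 when cond is True, else to out2; the other output is None."""
    return (value, None) if cond else (None, value)


def abs_diff_selector(a, b):
    # Both differences are computed; a selector picks the one to keep.
    return selector(a > b, a - b, b - a)


def abs_diff_distributor(a, b):
    # Operands are steered to only one of the two subtractions.
    cond = a > b
    a_t, a_f = distributor(cond, a)
    b_t, b_f = distributor(cond, b)
    return a_t - b_t if cond else b_f - a_f


print(abs_diff_selector(5, 2), abs_diff_distributor(2, 5))  # 3 3
```

The selector form evaluates both branches and discards one result, while the distributor form activates only the branch that is taken; this is one way to read why more than one DFG can represent the same conditional code.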

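Iterative code: while (a > b)
                    a ← a-b;
[Figure: DFG with inputs a and b, a selector (T sel F), a > comparison node, and a distributor (T dist F); the condition is initialized to F.]

(a) Scheduling (using only 1 adder/sub)
Scheduling & binding: steps c1 and c2 are both bound to the single adder/subtractor (+) and alternate over the ccs: c1, c2, c1, c2, ...

(b) Arch. Synthesis
[Figure: datapath with registers a and b (lda, ldb), register r1 (ldr1), a mux and a demux around the single adder/subtractor (cin = 1), and an output register "final a" (ldfina). Mux and demux settings are annotated with steps c1 and c2.]
Subtraction: b̄ + 1 = 2's compl. of b = -b, so a - b is computed as a + b̄ with cin = 1.
Sign of the result (sent to the FSM): s xor ovfl = 1 → -ve; = 0 → +ve.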
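The following sketch models how a single adder/subtractor can serve both the comparison and the subtraction in the loop above, using the "s xor ovfl" sign test. The 8-bit width, the function names, and the choice to test a > b as "b - a is negative" are assumptions made for illustration, not details taken from the slide.

```python
WIDTH = 8                      # assumed word width
MASK = (1 << WIDTH) - 1


def add_sub(a, b, subtract):
    """WIDTH-bit adder/subtractor: a + b, or a + ~b + 1 (cin = 1) when subtract."""
    a &= MASK
    b_in = (~b & MASK) if subtract else (b & MASK)
    cin = 1 if subtract else 0
    result = (a + b_in + cin) & MASK
    s = result >> (WIDTH - 1)              # sign bit of the result
    sa = a >> (WIDTH - 1)
    sb = b_in >> (WIDTH - 1)
    ovfl = int(sa == sb and s != sa)       # signed overflow
    return result, s, ovfl


def is_negative(a, b):
    """True if a - b < 0 for signed WIDTH-bit operands (s xor ovfl = 1)."""
    _, s, ovfl = add_sub(a, b, True)
    return (s ^ ovfl) == 1


def iterate(a, b):
    """while (a > b): a <- a - b, assuming b > 0 so that the loop terminates."""
    while is_negative(b, a):               # compare step: a > b iff b - a < 0
        a, _, _ = add_sub(a, b, True)      # subtract step: a <- a - b
    return a


print(iterate(17, 5))  # 2
```

XORing the sign bit with the overflow flag gives the true sign even when the subtraction overflows, for example 100 - (-100) in 8 bits.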
Delay Nodes in DFGs
A delay node is generally implemented as a register; a delay node thus becomes a state variable.
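As a small illustration of a delay node acting as a state variable, the sketch below models a hypothetical DFG (not one from the slides) that computes y[n] = x[n] + y[n-1]; the delay node on y is held in a variable that plays the role of the register.

```python
def run_accumulator(inputs, initial_state=0):
    """Simulate y[n] = x[n] + y[n-1], with the delay node modeled as a register."""
    state = initial_state      # register implementing the delay node
    outputs = []
    for x in inputs:
        y = x + state          # adder reads the delayed value
        outputs.append(y)
        state = y              # register is loaded for the next iteration
    return outputs


print(run_accumulator([1, 2, 3, 4]))  # [1, 3, 6, 10]
```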
Delay Nodes in DFGs (contd)
[Figure: a delay node replaced by a register. Panels: "Transformation in the DFG" and "Mapping to the architecture".]
Detailed HLS Example
Detailed HLS Example (contd)

Different paths (i/p → o/p) in the DFG
Scheduling heuristic: Among available opers, schedule on available FUs those whose delay to o/p is the highest, breaking ties in favor of those opers u whose sibling o/ps (o/ps to the same children) that are avail. or will be avail. at u's earliest finish will have the largest lifetime at that point.
(a) Scheduling w/ one X (2 ccs) & one + (1 cc); goal: min. latency
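The primary rule of the scheduling heuristic (pick the available operation with the highest delay to the output) can be sketched as a simple list scheduler. This is an illustration under assumptions: the DFG in `EXAMPLE_DFG` is hypothetical and is not the one on the slide, ties are broken arbitrarily rather than by the sibling-lifetime rule, and all names are introduced here.

```python
DELAY = {"*": 2, "+": 1}   # one multiplier (2 ccs) and one adder (1 cc)

# Hypothetical DFG: op name -> (FU type, predecessor ops); primary inputs omitted.
EXAMPLE_DFG = {
    "m1": ("*", []),
    "m2": ("*", []),
    "a1": ("+", []),
    "a2": ("+", ["m1", "a1"]),
    "a3": ("+", ["a2", "m2"]),
}


def delay_to_output(ops):
    """Longest path delay (ccs) from the start of each op to a DFG output."""
    succs = {name: [] for name in ops}
    for name, (_, preds) in ops.items():
        for p in preds:
            succs[p].append(name)
    memo = {}

    def longest(name):
        if name not in memo:
            tail = max((longest(s) for s in succs[name]), default=0)
            memo[name] = DELAY[ops[name][0]] + tail
        return memo[name]

    return {name: longest(name) for name in ops}


def list_schedule(ops):
    """Return {op: start cc}, using one FU of each type."""
    prio = delay_to_output(ops)
    start, finish = {}, {}
    fu_free_at = {kind: 1 for kind in DELAY}   # first cc in which each FU is free
    cc = 1
    while len(start) < len(ops):
        for kind in DELAY:
            if fu_free_at[kind] > cc:
                continue                        # FU still busy
            ready = [n for n, (k, preds) in ops.items()
                     if k == kind and n not in start
                     and all(p in finish and finish[p] < cc for p in preds)]
            if ready:
                best = max(ready, key=lambda n: prio[n])   # highest delay to o/p
                start[best] = cc
                finish[best] = cc + DELAY[kind] - 1
                fu_free_at[kind] = cc + DELAY[kind]
        cc += 1
    return start


print(list_schedule(EXAMPLE_DFG))
```

On this example the multiplier runs m1 in ccs 1 to 2 and m2 in ccs 3 to 4, and the final addition a3 starts in cc 5, for a latency of 5 ccs.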
(b) Reg. alloc. for o/p of operations
[Figure annotation: "For WAR constraint"]
(c) Arch. synthesis
Note: Not clear how register allocation has been done. It is sub-optimal (4 non-primary i/p regs. needed).
[Figure: the synthesized architecture]
Detailed HLS Example (contd)
Detailed HLS Example: Register Allocation
Detailed HLS Example: Register Allocation (contd)

Scheduling heuristic: Among available opers, schedule on avail. FUs those whose delay to o/p is the highest, breaking ties in favor of those opers u whose sibling o/ps (o/ps to the same children) that are avail. or will be avail. at u's earliest finish will have the largest lifetime at that point.
3 non-primary i/p regs. needed
In the conflict graph (one per FU), there is an edge between 2 var. nodes if their lifetimes overlap (indicating that different registers need to be allocated to them).
Graph coloring (using the min. # of colors to color nodes s.t. connected node pairs have different colors) is in general NP-hard.
The above type of conflict graph is called an interval graph (derived from the 1-dimensional intervals of the lifetimes).
Min. graph coloring can be solved optimally in linear time for interval graphs.
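One common way to color an interval conflict graph optimally is a greedy pass over the lifetimes in order of their start times (often called the left-edge algorithm). The sketch below is an illustration, not necessarily the method used on the slides: the lifetimes are hypothetical, intervals are treated as half-open [start, end), and this simple version sorts and scans registers rather than running in strictly linear time.

```python
def overlaps(a, b):
    """True if half-open lifetimes a = (start, end) and b overlap (a conflict edge)."""
    return a[0] < b[1] and b[0] < a[1]


def left_edge(lifetimes):
    """Assign each variable a register index so that overlapping lifetimes differ."""
    reg_end = []        # reg_end[r] = end of the last lifetime placed in register r
    assignment = {}
    for var, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        for r, free_at in enumerate(reg_end):
            if free_at <= start:            # register r is free again: reuse it
                reg_end[r] = end
                assignment[var] = r
                break
        else:
            reg_end.append(end)             # no free register: allocate a new one
            assignment[var] = len(reg_end) - 1
    return assignment


# Hypothetical variable lifetimes in ccs.
LIFETIMES = {"v1": (1, 3), "v2": (2, 4), "v3": (3, 5), "v4": (4, 6), "v5": (1, 2)}
regs = left_edge(LIFETIMES)
print(regs, "registers used:", len(set(regs.values())))
```

Here at most two lifetimes overlap at any time, and the greedy pass uses exactly two registers.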
Detailed HLS Example: Register Allocation (contd)
3 non-primary i/p regs. needed
Scheduling heuristic: Among available opers, schedule on available FUs those whose delay to o/p is the highest, breaking ties arbitrarily: B's lifetime increases, but D's (dep. of B) decreases.
