
Introduction to

High−Level Synthesis

Chapter 1
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

1.1

Copyright © 1993 by Daniel Gajski UC Irvine


NEED FOR HIGH LEVELS
OF ABSTRACTION

VLSI complexity requires hierarchy

VLSI technology reached maturity

First silicon and first specification

Shorter design cycle

Better exploration of design space

Algorithms outperform designers

Two schools of thought:

1. capture−and−simulate
2. describe−and−synthesize

1.2



LEVELS OF ABSTRACTION

[Figure: Y-chart with three axes: structural domain, behavioral domain, and physical domain]

Level               Structural domain             Behavioral domain        Physical domain      Synthesis step
System              Processors, Memories, Buses   Flowcharts, algorithms   Boards, MCMs         System synthesis
Register-transfer   Registers, ALUs, MUXs         Register transfers       Chips                Register-transfer synthesis
Logic               Gates, flip-flops             Boolean expressions      Cells                Logic synthesis
Circuit             Transistors                   Transistor functions     Transistor layouts   Circuit synthesis

1.3



THREE DESIGN VIEWS

BEHAVIOR

    if IR(3) = '0' then
        PC := PC + 1;
    else
        DBUF := MEM(PC);
        MEM(SP) := PC + 1;
        SP := SP - 1;
        PC := DBUF;
    end if;

STRUCTURE

[Figure: control unit, mux1, mux2, DBUF, SP, PC, a +/- unit with constant input 1, and MEM, connected by the address bus and the data bus]

FLOORPLAN

[Figure: placement of mux1, mux2, DBUF, PC, SP, ADD/SUB, and MEM along the address bus and the data bus]

1.4

DEFINITION OF SYNTHESIS

Behavior-to-structure
    Circuit synthesis
    Logic synthesis
    Register-transfer synthesis
    System synthesis

Structure-to-layout
    Cell layout generation
    Module layout generation
    Chip floorplanning
    System partitioning and placement

1.5



DEPENDENCE OF LANGUAGES,
DESIGNS AND TECHNOLOGIES

[Figure: diagram relating MODELS, DESCRIPTIONS, DESIGN, STYLES, ABSTRACTIONS, and TECHNOLOGY]

Several descriptions for the same behavior

Several styles for the same description

Different abstractions for the same design

1.6



DIFFERENT DESIGNS
FOR THE SAME BEHAVIOR

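Level sensitive

    if CNT /= LIM then
        EN <= ENIT;
    else
        EN <= '0';
    end if;

[Figure: comparator (<, =, >) with inputs LIM and CNT; signals ENIT and EN]

Edge sensitive

    if ENIT = '1' and not ENIT'stable then
        EN <= '1';
    elsif CNT = LIM then
        EN <= '0';
    end if;

[Figure: comparator (<, =, >) with inputs LIM and CNT; D flip-flop with D input 1, signal ENIT, and output EN]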

1.7



DIFFERENT STYLES
FOR THE SAME DESCRIPTIONS

[Figure: EXOR (A,B) built from transmission gates]

[Figure: EXOR (A,B) built from an AND-OR-INVERT gate]

1.8



DIFFERENT CONSTRUCTS
FOR THE SAME BEHAVIOR

[Figure: control logic with a state register and a status register, inputs X, A, B, and a +/- unit]

1 state (no status register)

    if x = 0 then y = a+b else y = a-b

2 states (with status register)

    if x = 0 then status = 1
    if status = 1 then y = a+b else y = a-b

1.9



Architectural
Models in Synthesis

Chapter 2
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

2.1



DESIGN STYLES AND
TARGET ARCHITECTURE

[Figure: 3-bus nonpipelined design (left, right, and result buses, register file, ALU) and 2-bus pipelined design (left and right buses, register file, ALU input registers LIR and RIR, ALU)]

             3-bus nonpipelined design    2-bus pipelined design

Program 1:   x <= a + b;  (100ns)         LIR <= a; RIR <= b;              (50ns)
             y <= c - x;  (100ns)         x, RIR <= LIR + RIR; LIR <= c;   (50ns)
                                          y <= LIR - RIR;                  (50ns)

Program 2:   x <= a + b;  (100ns)         LIR <= a; RIR <= b;              (50ns)
             y <= c - d;  (100ns)         x <= LIR + RIR;                  (50ns)
                                          LIR <= c; RIR <= d;              (50ns)
                                          y <= LIR - RIR;                  (50ns)
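
The step times above can be totalled to compare the two target architectures. The following is a minimal Python sketch, not part of the original slides; the dictionary layout and the names `total_time` and `compare` are illustrative assumptions, while the step strings and durations are copied from the slide.

```python
# Register-transfer steps and their durations in ns, copied from the slide.
NONPIPELINED = {
    "Program 1": [("x <= a + b", 100), ("y <= c - x", 100)],
    "Program 2": [("x <= a + b", 100), ("y <= c - d", 100)],
}
PIPELINED = {
    "Program 1": [
        ("LIR <= a; RIR <= b", 50),
        ("x, RIR <= LIR + RIR; LIR <= c", 50),
        ("y <= LIR - RIR", 50),
    ],
    "Program 2": [
        ("LIR <= a; RIR <= b", 50),
        ("x <= LIR + RIR", 50),
        ("LIR <= c; RIR <= d", 50),
        ("y <= LIR - RIR", 50),
    ],
}


def total_time(steps):
    """Sum the durations of a list of (operation, duration) steps."""
    return sum(duration for _, duration in steps)


def compare():
    """Return {program: (nonpipelined_ns, pipelined_ns)}."""
    return {
        name: (total_time(NONPIPELINED[name]), total_time(PIPELINED[name]))
        for name in NONPIPELINED
    }


if __name__ == "__main__":
    for name, (slow, fast) in compare().items():
        print(f"{name}: 3-bus nonpipelined {slow} ns, 2-bus pipelined {fast} ns")
```

Under these numbers the pipelined design wins only on Program 1, where the second operation depends on the first and the result can be forwarded into RIR; on Program 2 the extra operand-load step cancels the gain.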

2.2



COMBINATORIAL LOGIC

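[Figure: ROM implementation of a full adder (FA): a decoder on inputs A, B, Cin drives a programmable OR array that produces Cout and S]

[Figure: PLA implementation of a FA: a programmable AND array on inputs A, B, Cin drives a programmable OR array that produces Cout and S]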

2.3



COMBINATORIAL LOGIC

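[Figure: decoder implementation of an EXOR gate: a decoder on inputs A and B with outputs 0 to 3, combined with the bit pattern 0 1 1 0 to form the EXOR output]

[Figure: logic gate implementation of an EXOR gate with inputs A and B]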

2.4



DESIGN PROCESS FOR
COMBINATORIAL FUNCTIONS

1. Compilation

2. Minimization

3. Technology mapping

4. Optimization

5. Transistor sizing

2.5



FINITE STATE MACHINES

<S, I, O, f: S x I −> S, h: S x I −> O>

FSM types
1. Autonomous

2. State based

3. Transition based

4. Machines with datapath


5. Communicating machines

2.6



AUTONOMOUS FSM

Modulo−3 counter

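[Figure: state diagram cycling through s0, s1, s2 and back to s0]

Next-state table

Present state (Q1 Q0)    Next state (Q1 Q0)
s0 = 0 0                 s1 = 0 1
s1 = 0 1                 s2 = 1 0
s2 = 1 0                 s0 = 0 0

[Figure: logic implementation with flip-flops FF1 (D1, Q1, Q'1) and FF0 (D0, Q0, Q'0) on a common Clock]

[Figure: state waveforms for Clock, Q1, and Q0]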
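
The next-state table of the modulo-3 counter can be exercised in software. The sketch below is an illustration in Python, not part of the slides. The flip-flop input equations D1 = Q0 and D0 = Q1'·Q0' are an assumption derived here from the table (the gate-level figure is not recoverable from the text), and the function names are hypothetical.

```python
def next_state(q1, q0):
    """Next-state logic for the modulo-3 counter.

    Assumed equations, derived from the next-state table:
    D1 = Q0 and D0 = (not Q1) and (not Q0).
    """
    d1 = q0
    d0 = (1 - q1) & (1 - q0)
    return d1, d0


def run(cycles, state=(0, 0)):
    """Clock the autonomous FSM and return the visited (Q1, Q0) states."""
    trace = [state]
    for _ in range(cycles):
        state = next_state(*state)
        trace.append(state)
    return trace


if __name__ == "__main__":
    for q1, q0 in run(6):
        print(q1, q0)
```

Because the machine is autonomous, the only input is the clock: each call to `next_state` models one clock edge.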
2.7



FSM WITH OUTPUT
MODULO−3 DIVIDER

State table

Present state (Q1 Q0)    Next state (Q1 Q0)    Output Y
s0 = 0 0                 s1 = 0 1              0
s1 = 0 1                 s2 = 1 0              0
s2 = 1 0                 s0 = 0 0              1

[Figure: logic implementation with flip-flops FF1 (D1, Q1, Q'1) and FF0 (D0, Q0, Q'0) on a common Clock, producing output Y]

[Figure: state and output waveforms for Clock, Q1, and Q0]

2.8



STATE−BASED
MODULO−3 DIVIDER

Next-state table

Present state (Q1 Q0)    Input Count    Next state (Q1 Q0)
s0 = 0 0                 1              s1 = 0 1
s1 = 0 1                 1              s2 = 1 0
s2 = 1 0                 1              s0 = 0 0
don't care               0              s0 = 0 0

Output table

Present state (Q1 Q0)    Output Y
s0 = 0 0                 0
s1 = 0 1                 0
s2 = 1 0                 1

[Figure: logic implementation with flip-flops FF1 and FF0, input Count, a common Clock, and output Y]

[Figure: input and output waveforms for Clock, Count, Q1, and Q0]



2.9



TRANSITION−BASED
MODULO−3 DIVIDER

Next-state and output table

Present state (Q1 Q0)    Input Count    Next state (Q1 Q0)    Output Y
s0 = 0 0                 1              s1 = 0 1              0
s1 = 0 1                 1              s2 = 1 0              0
s2 = 1 0                 1              s0 = 0 0              1
don't care               0              s0 = 0 0              0

[Figure: logic implementation with flip-flops FF1 and FF0, input Count, a common Clock, and output Y]

[Figure: input and output waveforms for Clock, Count, Q1, and Q0]
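
The difference between the state-based and the transition-based modulo-3 divider shows up only in how the output Y is computed. The following Python sketch, which is an illustration and not part of the slides, encodes both tables; the names `step`, `state_based_output`, `transition_based_output`, and `simulate` are hypothetical.

```python
# Next-state table shared by both dividers (applies when Count = 1).
NEXT = {"s0": "s1", "s1": "s2", "s2": "s0"}


def step(state, count):
    """Advance on Count = 1; return to s0 on Count = 0 (the don't-care row)."""
    return NEXT[state] if count == 1 else "s0"


def state_based_output(state):
    """State-based machine: Y depends on the present state only."""
    return 1 if state == "s2" else 0


def transition_based_output(state, count):
    """Transition-based machine: Y depends on present state and input."""
    return 1 if (state == "s2" and count == 1) else 0


def simulate(counts):
    """Return rows of (state, count, Y_state_based, Y_transition_based)."""
    state = "s0"
    rows = []
    for count in counts:
        rows.append((state, count,
                     state_based_output(state),
                     transition_based_output(state, count)))
        state = step(state, count)
    return rows


if __name__ == "__main__":
    for row in simulate([1, 1, 1, 1, 1, 0, 1]):
        print(row)
```

The two outputs agree while Count stays at 1; they differ in the cycle where the machine sits in s2 and Count drops to 0, because only the transition-based output looks at the input.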


2.10



DESIGN PROCESS FOR
FINITE−STATE MACHINES

1. Compilation

2. State minimization

3. State encoding

4. Synthesis of next−state,
output functions

2.11



FINITE−STATE MACHINES
WITH A DATAPATH

FSMD = < S, I U B, O U A, f, h >

where

S = set of states
f = next state function

h = output function

B = set of some status variables

A = set of storage variable assignments

2.12



TRANSITION−BASED FSMD

Next-state and output table

Present state    Input                        Next state    Output
s0               (Count = 1) AND (x /= 2)     s0            x = x + 1, Y = 0
s0               (Count = 1) AND (x = 2)      s0            x = 0, Y = 1
s0               Count = 0                    s0            x = 0, Y = 0

[Figure: datapath implementation: a control unit (selector, 1-bit state register, decoder) driven by Count and by status (x = 2), and a data path (+ unit, selector, register) producing y]

2.13



STATE−BASED FSMD

Present State Input Next State Output

Count = 0 s0
s (Count = 1) AND (x = 2) s1 x = 0, Y=0
0
(Count = 1) AND (x = 2) s0

Count = 0 s0
s (Count = 1) AND (x = 0) s1 x = x + 1, Y = 0
1
(Count = 1) AND (x = 1) s2

Count = 0 s0
s2 x = 0, Y=1
Count = 1 s1

Next state and output table

[Figure: datapath implementation: a control unit (selectors, state register, decoder) driven by Count, and a data path (selector, + unit, register, decoder) producing Y]

2.14



GENERIC FSMD BLOCK DIAGRAM

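[Figure: a control unit (state register, next-state function, output function) with control inputs and control outputs, and a datapath with datapath inputs and datapath outputs; datapath control signals run from the control unit to the datapath, and status signals run back]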

FSMD = Control unit + Data Path

2.15



NEXT−STATE FUNCTION

[Figure: state register, adder with constant 1, ROM/PLA, address selector, and a status selector that picks a test bit from the status bits and control inputs]

Typical processor implementation

2.16



DESIGN PROCESS FOR FSMDs

1. Compilation

2. Unit selection

3. Storage binding

4. Unit binding

5. Interconnection binding

6. Control definition

7. Control−unit synthesis

8. Functional−unit synthesis

2.17



BEHAVIORAL DESCRIPTION
FOR FSMDs

loop forever
    if count=1
    then
        if x=2
        then
            begin
                x=0
                y=1
            end
        else
            begin
                x=x+1
                y=0
            end
        endif
    else
        begin
            x=0
            y=0
        end
    endif
endloop
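
The behavioral description above translates almost line for line into software. The following Python sketch is an illustration, not part of the slides; it treats one pass through the loop as one clock cycle, which is an assumption, and the names `fsmd_step` and `run` are hypothetical.

```python
def fsmd_step(x, count):
    """One iteration of the loop: return the new (x, y) pair."""
    if count == 1:
        if x == 2:
            return 0, 1      # wrap around and raise the output
        return x + 1, 0      # keep counting
    return 0, 0              # Count = 0 resets the storage variable


def run(counts, x=0):
    """Apply a sequence of Count values and collect the y outputs."""
    outputs = []
    for count in counts:
        x, y = fsmd_step(x, count)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    print(run([1, 1, 1, 1, 1, 1]))
    print(run([1, 1, 0, 1, 1, 1]))
```

Here `x` plays the role of the storage variable held in the datapath and the test `x == 2` plays the role of the status signal fed back to the control unit.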

2.18



FSMD COMMUNICATION

[Figure: two communicating FSMDs, FSM 1 (Clock 1) and FSM 2 (Clock 2), each with a state register, next-state logic, output logic, and a datapath; they exchange Request and Acknowledge signals and share a data bus]

2.19



DESIGN PROCESS FROM
SYSTEM DESCRIPTIONS

1. Compilation

2. Partitioning

3. Interface synthesis

4. Scheduling

5. FSMD synthesis

2.20



ENGINEERING CONSIDERATIONS

1. Clocking

2. Busing

3. Pipelining

2.21



CLOCKING AND STORAGE

[Figure: D-latch with inputs D and Clock and outputs Q and Q']

[Figure: I/O waveforms of the D-latch]

2.22



CLOCKING STRATEGIES

[Figure: shift register with latches: three latches (D, Q, C) in series between Data in and Data out, on a common Clock]

[Figure: shift register with MS flip-flops: three master-slave flip-flops in series, each made of a master latch and a slave latch, clocked by Clock and Clock']

[Figure: single-phase clock: waveforms of Clock and Clock' showing clock width and clock period]

[Figure: 2-phase clock: waveforms of Phase 1 and Phase 2]

2.23



BUSING

[Figure: tri-state driver with inputs Data and Control and output Y]

[Figure: bus interface: an output latch and an input latch connected to the data bus, with control logic and a bus-released signal]

2.24



DATAPATH PIPELINING

[Figure: non-pipelined datapath (register file, register, selectors, ALU) next to a pipelined datapath with a 2-stage adder (register file, register, selectors, two-stage ALU)]

2.25



CONTROL PIPELINING
[Figure: non-pipelined control unit: control unit and datapath linked by datapath control and status signals, with control input/output and datapath input/output]

[Figure: pipelined control unit: the same organization with a datapath control register and a status register placed between the control unit and the datapath]

2.26



FUTURE DIRECTIONS

Expansion of existing models

Design processes for new models

Algorithms with engineering considerations

Synthesis of mixed synch/asynch systems

2.27



Quality Measures

Chapter 3
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

3.1



QUALITY MEASURES

1. Cost
2. Area
3. Performance
4. Power
5. Testability
6. Verifiability
7. Reliability
8. Manufacturability

3.2



RELATIONSHIP BETWEEN
STRUCTURAL AND PHYSICAL DESIGNS

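[Figure: the structural design consists of a control unit (table of present state, condition, next state, register transfers) and a datapath (AR, DR, RAM, register, register file, muxes, FU, status register). Technology mapping turns the control unit into a PLA or standard cells and the datapath into general cells, a bit-sliced stack, or standard cells. The resulting floorplan contains the PLA, the bit-sliced stack, the RAM, and standard cells.]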

3.3



DATAPATH LAYOUT
ARCHITECTURE

[Figure: bit-sliced datapath layout architectures, LSB to MSB, for custom cells and for standard cells, showing data lines (metal1 or poly, metal2), control lines (metal1, metal2), and the routing channel]

3.4



DATAPATH LAYOUT

[Figure: datapath of width W_dp and height H_dp built from units 1 to n stacked vertically, each bit slice W_bit wide from LSB to MSB. Two variants are shown: one with over-the-cell routing tracks and one with a data routing channel and extra routing area (cell height H_cell, channel height H_ch). Layers: diffusion, poly, metal 1, metal 2, with power and ground rails.]

$$\text{Area} = W_{dp} \times H_{dp}$$

$$W_{unit} = \text{const}_1 \times tr(\text{unit})$$

$$H_{ch} = \text{const}_2 \times \#\text{tracks}_{est}$$

3.5



CONTROL UNIT LAYOUT
State table

           Input                                    Output
           I1   I2   I3   I4   I5                   O1   O2   O3   O4
           Present state   Conditions/status        Next state   Control signals
           p1   p0         s2   s1   s0             r1   r0      c1   c0
State 1    0    1          1    0    0              1    0       0    1
State 2    1    0          1    0    1              1    0       1    0
State 3    1    0          0    0    1              1    1       1    1

Output signals: Boolean equations

O1 = ( I1' I2 I3 I4' I5' ) OR ( I1 I2' I3 I4' I5 ) OR ( I1 I2' I3' I4' I5 )
O2 = ( I1 I2' I3' I4' I5 )
O3 = ( I1 I2' I3 I4' I5 ) OR ( I1 I2' I3' I4' I5 )
O4 = ( I1' I2 I3 I4' I5' ) OR ( I1 I2' I3' I4' I5 )

[Figure: 2-level AND-OR implementation and 2-level NAND-NAND implementation of O1, with product-term nets n1, n2, n3 fed by I1 to I5 and their complements]

[Figure: standard cell layout of width W_sc and height H_sc: the inputs I1, I1', ..., I5, I5' run in a channel of height H_ch with input nets and internal nets above a row of AND and OR cells of height H_cell, clustered per output O1 to O4]

Assumptions:
1. single row
2. signal clustering
3. no sharing between signals
4. track per signal
5. no logic optimization
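
The Boolean equations can be checked against the state table mechanically. The Python sketch below is an illustration and not part of the slides; the helper names `inv` and `outputs` and the table encoding are assumptions made for the example.

```python
# (I1..I5) -> (O1..O4), copied row by row from the state table.
STATE_TABLE = {
    "State 1": ((0, 1, 1, 0, 0), (1, 0, 0, 1)),
    "State 2": ((1, 0, 1, 0, 1), (1, 0, 1, 0)),
    "State 3": ((1, 0, 0, 0, 1), (1, 1, 1, 1)),
}


def inv(bit):
    """Complement of a single bit."""
    return 1 - bit


def outputs(i1, i2, i3, i4, i5):
    """Evaluate O1..O4 from the two-level AND-OR equations."""
    t1 = inv(i1) & i2 & i3 & inv(i4) & inv(i5)        # I1' I2 I3 I4' I5'
    t2 = i1 & inv(i2) & i3 & inv(i4) & i5             # I1 I2' I3 I4' I5
    t3 = i1 & inv(i2) & inv(i3) & inv(i4) & i5        # I1 I2' I3' I4' I5
    o1 = t1 | t2 | t3
    o2 = t3
    o3 = t2 | t3
    o4 = t1 | t3
    return o1, o2, o3, o4


if __name__ == "__main__":
    for name, (inputs, expected) in STATE_TABLE.items():
        print(name, outputs(*inputs), "expected", expected)
```

Each product term corresponds to one row of the table, so every output is simply the OR of the rows in which that output column holds a 1. This matches assumption 5: no logic optimization has been applied.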
3.6



MULTIPLE−ROW
CONTROL LAYOUT

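[Figure: single-row implementation of width W_sc and height H_sc, made up of channel height H_ch and cell height H_cell]

[Figure: 3-row implementation of width W_sc / R, with block height H_block made up of three rows of height H_sc]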

3.7



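PLA LAYOUT ARCHITECTURE

[Figure: logic mapping: inputs I1 to I5 enter the AND array through buffers and latches, product terms pass through product-term buffers into the OR array, and outputs O1 to O4 leave through latches and buffers; two clocks are used]

[Figure: layout model: PLA of width W_PLA (input part W_in, product-term part W_p, output part W_out) and height H_PLA, with AND array, OR array, latches, and buffers, and dimension parameters r, lw, lh, bw, bh]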

3.8



MODELING PHYSICAL DESIGN

1. Probabilistic distribution for pins, wire length

2. Placement, routing models

3. Linear algorithms (min−cut)

4. State encoding model

5. Logic minimization

6. Technology mapping

7. Transistor sizing

3.9



WIRE MODELING

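[Figure: RT model: a wire connecting Comp i to Comp j]

[Figure: equivalent RC delay model: driver output resistance R_out, wire resistance R_w, wire capacitance C_w, and load input capacitance C_in]

$$R_w = R_s \left( \frac{L_w}{W_w} \right)$$

$$C_w = (L_w \times W_w) \left( \frac{\varepsilon}{t} \right)$$

$$t_p = (R_{out} + R_w)(C_w + C_{in})$$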
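
The three wire-model formulas chain together into a single delay estimate. The Python sketch below is an illustration and not part of the slides; the function names are hypothetical, and every numeric value in the example (sheet resistance, wire geometry, permittivity, dielectric thickness, driver and load values) is an assumed placeholder rather than data from the source.

```python
def wire_resistance(r_sheet, length, width):
    """R_w = R_s * (L_w / W_w)."""
    return r_sheet * length / width


def wire_capacitance(length, width, permittivity, thickness):
    """C_w = (L_w * W_w) * (epsilon / t)."""
    return length * width * permittivity / thickness


def propagation_delay(r_out, r_w, c_w, c_in):
    """t_p = (R_out + R_w) * (C_w + C_in)."""
    return (r_out + r_w) * (c_w + c_in)


if __name__ == "__main__":
    # Assumed example values in SI units (ohm/square, m, F/m, ohm, F).
    r_w = wire_resistance(r_sheet=0.05, length=1e-3, width=2e-6)
    c_w = wire_capacitance(length=1e-3, width=2e-6,
                           permittivity=3.45e-11, thickness=1e-6)
    t_p = propagation_delay(r_out=1000.0, r_w=r_w, c_w=c_w, c_in=1e-14)
    print(f"R_w = {r_w:.1f} ohm, C_w = {c_w:.3e} F, t_p = {t_p:.3e} s")
```

Doubling the wire length doubles both R_w and C_w, so the wire's own contribution to the delay grows roughly quadratically once R_w dominates R_out.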

3.10



COMBINATORIAL DELAY

[Figure: combinational network of gates A to F with inputs In 1 to In 4, internal nets n1 to n5, and outputs Out 1 to Out 3; the critical path runs through gates A, C, E, F]

3.11



D−LATCH DELAYS

[Figure: D-latch with inputs D and C and output Q]

[Figure: timing diagram of Clock and Data showing t_setup, t_hold, t_CQ, and t_DQ]

3.12



MASTER−SLAVE DELAYS

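[Figure: MS flip-flop (MSFF): a master latch with input D and output QM feeding a slave latch with output Q]

[Figure: timing diagram of Clock, D, QM, and Q showing t_setup(MS), t_setup(S), t_DQ(M), and t_CQ(S)]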

3.13



REGISTER−TRANSFER PATH

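[Figure: Reg1 and Reg2 drive the ALU through nets n1 and n2; the ALU drives Reg3 through net n3; the registers share Clock]

Delay components along the path:

    MAX ( t_p(Reg1), t_p(Reg2) )
    MAX ( t_p(n1), t_p(n2) )
    t_p(ALU)
    t_p(n3)
    t_setup(Reg3)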
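
A common way to use the delay components listed for this path is to add them up into a lower bound on the clock period. The Python sketch below is an illustration and not part of the slides; treating the bound as a plain sum is an assumption of this example, the function name `min_clock_period` is hypothetical, and the delay values are placeholders.

```python
def min_clock_period(t_p_source_regs, t_p_input_nets, t_p_alu,
                     t_p_output_net, t_setup_dest):
    """Sum the register-to-register delay components.

    The slowest source register and the slowest input net are used,
    followed by the ALU, the output net, and the destination setup time.
    """
    return (max(t_p_source_regs)
            + max(t_p_input_nets)
            + t_p_alu
            + t_p_output_net
            + t_setup_dest)


if __name__ == "__main__":
    # Placeholder delays in ns for Reg1/Reg2, n1/n2, ALU, n3, and Reg3 setup.
    period = min_clock_period([3, 4], [1, 2], 20, 1, 2)
    print(f"clock period must be at least {period} ns")
```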

3.14



SYSTEM CLOCKING MODELS

[Figure: non-pipelined control: control unit (state register, next-state logic, control logic), datapath (register file, functional unit), and memory (AR, DR, RAM) connected by nets n1 to n10, with Path1 marked]

[Figure: one-stage pipeline: the same organization with a status register added (net n11), with Path1 and Path2 marked]

[Figure: two-stage pipeline: the same organization with a status register and a control register added (nets n11, n12), with Path1, Path2, and Path3 marked]

3.15



FUTURE DIRECTIONS

Better models

Modeling algorithms

Control Optimization
State encoding
Logic optimization
Microarchitecture optimization

Floorplanning models
Other measures

Other technologies

3.16



Design Description
Languages

Chapter 4
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

4.1

Copyright © 1993 by Daniel Gajski and Nikil Dutt UC Irvine


NEED FOR HARDWARE
DESCRIPTION LANGUAGE

[Figure: two routes from concept to register-transfer design: an English specification followed by manual design (schematic capture and simulation), and an HDL description followed by synthesis tools (high-level synthesis)]

Design specification

Documentation/Redesign

Verification/Simulation

Communication between designers

4.2



DESIGN SPECIFICATION

1. Conceptual capture

2. Higher abstraction level

3. Detect Early Design Errors

4. Model hardware realistically

5. Facilitate synthesis, simulation and verification

6. Good spec ==> Good design
   Poor spec ==> ?

4.3



SPECIFICATION:
Programming Language Features

1. Data types
Integer, boolean, arrays

2. Operators
Arithmetic, logic, manip, access

3. Control
If, case, repeat, decode

4. Conciseness
Macros, subroutines

5. Extensibility
Operator overloading, user definitions

6. Expressivity
Hardware constructs, constraints

7. Bindings, user annotations


Component/Time allocation and binding
4.4



SPECIFICATION: Design Features

1. Design model
Target architecture: DSP, microprocessor

2. Execution ordering
Sequential, parallel, pipeline

3. Hierarchy
Complex descriptions

4. Timing specification
Clocks, delays

5. Synchronization
Communication protocols

6. Asynchrony
Global events, resets

4.5



HDL FORMATS

Textual
Programming languages
Pascal, Ada, ISPS, VHDL, Verilog

Applicative
DSL
Structural
EDIF, VHDL

Formal
HOP, CIRCAL

Graphical
Hierarchical FSM
StateCharts
Petri−Nets
GDL

FlowCharts
ASM, EXEL

Waveforms

Tabular
Symbolic MicroCode

State Tables
4.6



TYPICAL HDLs

ISPS (Barb 81)

Hardware C (Ku DE 91)

Silage (Hilf 85)

VHDL (IEEE 88)

Verilog (ThMo 91)

4.7



ISPS PARADIGM [Barb81]

[Figure: an ISPS description passes through a parser into a global data base (parse trees), which serves applications such as fault analysis, synthesis (design automation), architecture evaluation, architecture certification, simulation, and others]

4.8



ISPS MODEL

NETWORK OF ENTITIES

    MAIN ENTITY (PARAMETERS)
    label :=
    BEGIN
        ** SECTION1 **
        ** SECTION2 **
    END

[Figure: the main entity invokes other entities (ENTITY1, ENTITY2, ENTITY3) in three ways: SEQUENTIAL (call), CONCURRENT (1 copy), and CRITICAL (one activation only)]

4.9

Copyright © 1992 by Daniel Gajski and Nikil Dutt UC Irvine


ISPS: FEATURES
Data types

    constants

        prefix    meaning        example
        '         Boolean        '10?0
        #         Octal          #17
        {0-9}     Decimal        12
        "         Hexadecimal    "A

    bit-vectors & arrays

        Acc\accumulator<0:31>
        Mem\register.file[0:15]<0:31>

Control constructs

    if x => PC = PC + 1

    decode x =>
    begin
        0 := Acc = 0,
        1 := Acc = Acc + 1,
    end

    Label := repeat
    begin
        if (done) => leave Label
    end

4.10



ISPS: FEATURES (cont’d)
Operation execution
    All operations executed in parallel
    Sequentiality enforced by NEXT

        PC = PC + 1;
        ACC = 0;
        next
        IR = M[PC]

Concurrency
    a PROCESS entity executes asynchronously
    a CRITICAL entity is queued

Synchronization

    process Master := begin
        ...
        nbsend (5) {Messg:Inp1};
        nbsend (1) {Messg:Start};
        L1 := repeat begin
            nbrecv(:Done) {Messg:Done};
            if (Done) => leave L1
        end;
        nbsend (0) {Messg:Start};
        ...
    end

    process Slave := begin
        ...
        L2 := repeat begin
            nbrecv(:Start)
            if (Start) => leave L2
        end;
        nbsend (0) {Messg:Done};
        nbrecv (:X) {Messg:Inp1};
        ...
        nbsend (1) {Messg:Done};
        ...
    end

4.11



ISPS MARK 1 DESCRIPTION

Mark1 :=
begin
    **Memory.Primary.State**
    M[0:8191]<31:0>,

    **Central.Processor.State**
    PI\Present.Instruction<0:15>,
    f\function = PI<0:2>,
    s<0:12> := PI<3:15>,
    Acc\Accumulator<0:31>,

    **Instruction Execution**
    Icycle\Instruction.Cycle(main) :=
    begin
        repeat
            PI = M[CR]<0:15> next
            decode f =>
            begin
                #0 := CR = M[s]
                #1 := CR = CR + M[s]
                #2 := Acc = Acc - M[s]
                #3 := M[s] = Acc
                #4, #5 := Acc = Acc + M[s]
                #6 := if Acc < 0 =>
                          CR = CR + 1
                #7 := stop()
            end next
            CR = CR + 1
        end
    end
end

4.12



DSL: Paradigm and Model

Paradigm

[Figure: a DSL behavioral description is compiled into flow graphs, which feed automatic synthesis and other applications]

Model

[Figure: CIRCUIT chip containing an INTERFACE section (AREA, FREQUENCY, POWER) and an APPLICATIVE section that CALLs the SEQUENTIAL modules M1 and M2]

4.13



DSL FEATURES
Description styles

Applicative
Single assignment, concurrent
Global resets, interrupts
Imperative
Pascal−like procedures
Default sequential execution
Operation−level concurrency:
FORK A := B, C := D JOIN;

Delay specification

CLOCK CYCLES
(A := A + 5; B := 3 CYCLES < 4);

ABSOLUTE DELAY
(IF X THEN A := B DELAY < 20);

Chip−level constraints

POWER
VOLTAGE
LEVELS
AREA
FREQUENCY
4.14



DSL SPECIFICATION (CaRo 85)

CIRCUIT exponentiation;

INTERFACE
    vcc :12v;
    gnd :GND;
    input(15..0) :INPUT FANIN 1;
    output(15..0) :OUTPUT FANOUT 10;
    enable :INPUT FANIN 1;
    clk :CLOCK FANIN 1;

    POWER 100 mW;
    VOLTAGE 12.00V;
    TECHNOLOGY CMOS;
    AREA 30 sq. mm;
    FREQUENCY 0 to 500 kHz;
    CLOCKBASE clk;

    PERFORMED FUNCTION
        output := #exp(input);
    CONTROL
        (enable := 0 CYCLES = 1);
        (enable := 1 CYCLES = 48);
END;

VAR x,y :FIXED(8,8);
    I :LOGICAL(4..0);

APPLICATIVE
    output := y;
    IF enable = 0 THEN x :=1, y := 1
    ELSE Start calc;
    FI;
END APPLICATIVE;

IMPERATIVE calc;
    (FOR i := 1 to 16 DO
        x := x * input / i;
        y := y + x;
    OD CYCLES = 3);
END IMPERATIVE;
END.
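
The IMPERATIVE part of this specification accumulates the power series of the exponential function. The Python sketch below is an illustration and not part of the slides; it uses floating point instead of the FIXED(8,8) type, which is a simplifying assumption, and the name `dsl_exp` is hypothetical.

```python
import math


def dsl_exp(value, terms=16):
    """Mirror the loop body: x := x * input / i; y := y + x."""
    x = 1.0  # reset value of x when enable = 0
    y = 1.0  # reset value of y when enable = 0
    for i in range(1, terms + 1):
        x = x * value / i   # x now holds value**i / i!
        y = y + x           # running sum of the series
    return y


if __name__ == "__main__":
    for v in (0.0, 1.0, 2.0):
        print(v, dsl_exp(v), math.exp(v))
```

With 16 iterations at 3 cycles each, the loop takes 48 cycles, which appears consistent with the `(enable := 1 CYCLES = 48)` line in the CONTROL section.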

4.15



PROCESS SYNCHRONIZATION
IN HARDWARE C

Processes with ports (P1 has ports a, b; P2 has ports x, y):

    process P1(a,b)              process P2(x,y)
        in port a;                   out port x;
        out port b;                  in port y;
    {                            {
        ......                       ......
    }                            }

Processes connected by channels a and b:

    process P1(a,b)              process P2(a,b)
        in channel a;                out channel a;
        out channel b;               in channel b;
    {                            {
        receive (a,buf);             send (a,msg);
        ......                       ......
        send (b,msg);                receive (b,buf);
    }                            }

4.16



MEMORY BOARD DESIGN
AND TIMING DIAGRAM

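[Figure: block diagram of the CPU, BUS CNTRL, and the memory board (Mem cntrl, ROM) with signals ABus, DBus, MemReq, DataRdy, BusReq, BusAck, Addr, MR, and Data]

[Figure: timing diagram of MemReq, BusAck, MR, Addr, BusReq, DBus, and DataRdy, with a 175 ns delay marked on BusReq]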

4.17



MEMORY BOARD DESCRIPTION
IN BIF

[Figure: CPU-memory board block diagram: CPU, BUS CNTRL, and the memory board (Mem cntrl, ROM) with signals ABus, DBus, MemReq, DataRdy, BusReq, BusAck, Addr, MR, and Data]

Memory board read cycle in BIF

Present state   Condition                    Value   Actions                                   Next state   Event
0                                            T       DBus = 'X'; DataRdy = 1; BusReq = 1;      1            Falling(MemReq)
1               ABus{18..16} == Board_Id     T       MR = 0; Addr = ABus;                      2            Falling(BusAck)
                                                     BusReq|(delay 175ns) = 0;
1               ABus{18..16} == Board_Id     F                                                 1            Falling(MemReq)
2                                            T       BusReq = 1; DBus = Data; DataRdy = 0;     3            Rising(MemReq)
3                                            T       MR = 1; Addr = 'X';                       0            Rising(BusAck)



4.18



BEHAVIORAL HIERARCHY IN
SPEC CHARTS (NaVG 91)

SYSTEM
    declarations: port RESET_IN : in bit;
    connections:  CPU.CLK : CLK_GEN.CLK;
    constraints:  num_chips <= 3; area_per_chip <= 60sqmm;

    CPU
        port CLK : in bit;
        variable ACCUM, INSTR, PC : integer;
        variable MEM : mem_array (255 downto 0);

        RESET
            ACCUM, INSTR, PC := 0;

        ACTIVE
            signal OPCODE, ADDR : integer;

            FETCH
                INSTR := MEM(PC) ;
                PC := PC + 1;

            DECODE
                OPCODE <= INSTR/10;
                ADDR <= INSTR mod 10;
                wait for 30 ns;

            EXECUTE
                case OPCODE is
                    when 1 =>
                        ACCUM := 0;
                    when 2 =>
                        ACCUM := ACCUM + 1;
                    ...
                end case;

        Transition arc labels: rising(RESET_IN), (OPCODE=0), not (OPCODE=0)

    CLK_GEN
        port CLK : out bit;

        /* code behavior that generates a clock */
        loop
            CLK <= '0';
            wait for 100 ns;
            CLK <= '1';
            wait for 100 ns;
        end loop;

4.19



DESIGN HIERARCHY IN VHDL

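[Figure: a VHDL configuration contains design entities; each design entity has an interface and an architectural body; the body may contain process blocks, dataflow blocks, and structure blocks, and may refer to further design entities]

Process Block (sequential behavior)

    p1: process(clock)
    begin
        .....
    end process;

DataFlow Block (concurrent behavior)

    b1: block
    begin
        .....
    end block;

Structure Block (netlist)

[Figure: netlist of two Reg components and an ALU connected to a Bus]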

4.20



VHDL [IEEE87]

VHDL Hardware Description Language

Paradigm

[Figure: a VHDL description is compiled into an internal model, which feeds the event-driven simulator and other applications]

Simulation Language

Signals, Registers, Ports: containers w/ drivers

Driver Values Scheduled as Events in Simulation Time


Bus Resolution Function for Multiple Drivers

4.21



VHDL FEATURES

Strongly Typed Language

Operator Overloading

Concurrency
Process Level

Operation Level:
DataFlow Blocks

Timing Specification
Transport & Inertial Delays
A <= {TRANSPORT} B AFTER 5 ns

WAIT
WAIT FOR 20 ns

4.22



VHDL FEATURES

Resolution Function
Resolve multiple drivers for a signal

Packages
Definitions, Macros

Attributes
Signals:
S’STABLE

S’QUIET

User Extensions
Constraints, Annotations (not simulated)
attribute Performance : Integer

attribute LayoutSize : Integer

attribute Power : Integer

4.23



SEQUENTIAL AND PARALLEL
EXECUTION IN VHDL

P1: PROCESS (clock)
begin
    A <= B;
    B <= A;
end PROCESS P1;

B1: BLOCK (clock'event AND clock = '1')
begin
    A <= guarded B;
    B <= guarded A;
end BLOCK B1;

4.24



VHDL DESCRIPTION STYLES

Behavior
Abstract algorithm
Sequential description of functionality

No implied structural implementation

DataFlow
Parallel execution of operations

Data transformations, register transfers

Hints at structural implementations

Structural
Component instantiations, interconnections
Netlist description

4.25



FULL ADDER:
DATAFLOW DESCRIPTION

entity FULL_ADDER is
    port (X,Y: in BIT;
          CIN: in BIT;
          SUM: out BIT;
          COUT: out BIT );
end FULL_ADDER;

Entity interface description in VHDL

[Figure: FULL_ADDER block with inputs X, Y, CIN and outputs COUT, SUM]

architecture FA_BOOLEAN of FULL_ADDER is
    signal S1, S2, S3: BIT;
begin
    S1 <= X xor Y;
    SUM <= S1 xor CIN after 3 ns;
    S2 <= X and Y;
    S3 <= S1 and CIN;
    COUT <= S2 or S3 after 5 ns;
end;

Data flow

[Figure: synthesized structure: X and Y produce S1 and S2; S1 and CIN produce SUM and S3; S2 and S3 produce COUT]
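
The dataflow equations can be checked exhaustively against binary addition. The Python sketch below is an illustration and not part of the slides; it ignores the `after 3 ns` and `after 5 ns` delays, and the function name `full_adder` is hypothetical.

```python
from itertools import product


def full_adder(x, y, cin):
    """Evaluate the FA_BOOLEAN signal assignments on single bits."""
    s1 = x ^ y           # S1 <= X xor Y
    total = s1 ^ cin     # SUM <= S1 xor CIN
    s2 = x & y           # S2 <= X and Y
    s3 = s1 & cin        # S3 <= S1 and CIN
    cout = s2 | s3       # COUT <= S2 or S3
    return total, cout


if __name__ == "__main__":
    for x, y, cin in product((0, 1), repeat=3):
        total, cout = full_adder(x, y, cin)
        print(x, y, cin, "->", cout, total)
```

In the VHDL original the five assignments are concurrent, so their textual order does not matter; the Python version has to evaluate them in dependency order.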
4.26



FULL ADDER:
BEHAVIORAL DESCRIPTION

architecture FA_BEHAV of FULL_ADDER is
begin
    process(X,Y,CIN)
        variable BV: BIT_VECTOR(1 to 3);
        variable NUM,I: INTEGER;
        variable Stemp, Ctemp: BIT;
    begin
        NUM := 0;
        BV := X & Y & CIN;
        for I := 1 to 3 loop
            if (BV(I) = '1') then
                NUM := NUM + 1;
            end if;
        end loop;
        case NUM is
            when 0 => Ctemp:='0'; Stemp:='0';
            when 1 => Ctemp:='0'; Stemp:='1';
            when 2 => Ctemp:='1'; Stemp:='0';
            when 3 => Ctemp:='1'; Stemp:='1';
        end case;
        SUM <= Stemp after 3 ns;
        COUT <= Ctemp after 5 ns;
    end process;
end FA_BEHAV;

VHDL description

[Figure: synthesized full adder: starting from NUM = 0, three incrementers (INC) controlled by X, Y, and CIN compute NUM; NUM selects through two 4-input MUXes with data inputs (3 2 1 0) = 1 1 0 0 for COUT and 1 0 1 0 for SUM]



4.27



FULL ADDER:
STRUCTURAL DESCRIPTION

entity FULL_ADDER is
    port (X,Y: in BIT;
          CIN: in BIT;
          SUM: out BIT;
          COUT: out BIT );
end FULL_ADDER;

[Figure: FULL_ADDER block with inputs X, Y, CIN and outputs COUT, SUM]

architecture Structure_View of FULL_ADDER is
    component Half_adder
        port (A,B: in BIT;
              S,C: out BIT);
    end component;
    component Or_gate
        port (A,B: in BIT;
              O: out BIT);
    end component;

    signal C1, S1, C2: BIT;

begin
    HA1: Half_adder port map
        (A=>X, B=>Y, S=>S1, C=>C1);
    HA2: Half_adder port map
        (A=>S1, B=>CIN, S=>SUM, C=>C2);
    OR1: Or_gate port map
        (A=>C1, B=>C2, O=>COUT);
end;

[Figure: component symbols for the half adder HA (inputs A, B; outputs C, S) and the OR gate (inputs A, B; output O), and the netlist in which HA1 takes X and Y, HA2 takes S1 and CIN and produces SUM, and the OR gate combines C1 and C2 into COUT]

4.28



MODELING

1. Language and architecture matching

2. General languages induce modeling styles

3. Description disambiguation and design optimization

4. Real vs simulated delays

5. Language constructs with no hardware realization

6. Modeling guidelines

4.29



LANGUAGE INDUCED
UNNECESSARY HARDWARE

CNT_CLR: block(CLR = '1')
begin
    CNT1 <= guarded B"0000" after CLRDEL;
end block;

CNT_UP: block(EN = '1' and CLK = '1' and CLK'event and INC = '1')
begin
    CNT2 <= guarded CNT + B"0001" after INCDEL;
end block;

SEL: CNT <= CNT1 when not CNT1'quiet else
            CNT2 when not CNT2'quiet else
            CNT;

VHDL counter description

[Figure: initial hardware synthesized for the RT counter: mux1 loads "0000" into CNT1 under CLR; an adder with constant "0001" and mux2 load CNT2 under EN, INC, and CLK; mux3 and mux4, controlled by not CNT1'quiet and not CNT2'quiet, select among CNT1, CNT2, and CNT to produce CNT]

4.30



VHDL SYNTHESIS PROBLEMS

Identification of storage elements, signals

Language constructs with no realizable hardware

Collecting, identifying component attributes

Specification of asynchronous events

Use of multiple blocks/processes to describe one component

4.31



MODELING GUIDELINES

1. Matching of semantic models to architectural models


(a) specialized languages
(b) modeling styles (structured properties)

2. Combinatorial style
(a) concurrent execution semantics
(b) connection of logic gates
(c) no clocks
(d) dataflow VHDL constructs

3. Functional style
(a) one−state FSMD
(b) synchronous and asynchronous behavior
(c) signal typing
(d) dataflow block VHDL constructs

4. Register−transfer style
(a) FSMD model
(b) states, condition, actions
(c) no explicit VHDL constructs

5. Behavioral style
(a) communicating processes
(b) shared memory or message passing
(c) no allocation, no binding, no schedule
(d) VHDL process statements, wait statements

4.32



VHDL FUNCTIONAL
DESCRIPTION STYLE

CNT_UP_CLR: block( CLR = '1' or (EN = '1' and CLK'event and CLK = '1'))
begin
    CNT <= guarded
        B"0000" after CLRDEL when CLR='1' else
        CNT + B"0001" after INCDEL when INC='1' else
        CNT;
end block;

4.33



VHDL RT
DESCRIPTION STYLE

State_Fetch: block ( (CLK'event and CLK='1') and (state=S0))
begin
    IR <= M(PC);
    state <= S1;
end block;

State_Decode: block ( (CLK'event and CLK='1') and (state=S1))
begin
    case IR is
        when "0000" => ACC <= ACC + 1;
                       state <= S2;
        when "0001" => ACC <= 0;
                       state <= S3;
        .....
    end case;
end block;

4.34



BEHAVIORAL
DESCRIPTION STYLE

entity MULT is
    port ( A_PORT,
           B_PORT: in bit_vector(3 downto 0);
           M_OUT: out bit_vector(7 downto 0);
           CLK: in CLOCK;
           START: in BIT;
           DONE: out BIT
    );
end MULT;

[Figure: MULT block with inputs A_PORT, B_PORT, START, CLK and outputs M_OUT, DONE]

architecture SHIFT_MULT of MULT is
begin
    process
        variable A, B, M : BIT_VECTOR;
        variable COUNT : INTEGER;
    begin
        wait until (START = '1');
        A := A_PORT; COUNT := 0;
        B := B_PORT; DONE <= '0';
        M := B"0000";
        while (COUNT < 4) loop
            if (A(0) = '1') then
                M := M + B;
            end if;
            A := SHR(A, M(0));
            M := SHR(M, '0');
            COUNT := COUNT + 1;
        end loop;
        M_OUT <= M & A;
        DONE <= '1';
    end process;
end SHIFT_MULT;
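
The shift-and-add loop in SHIFT_MULT can be traced in software. The Python sketch below is an illustration and not part of the slides. It assumes that `SHR(v, bit)` shifts `v` right by one position and inserts `bit` at the most significant end, and that `M` keeps the carry out of `M + B` until the next shift (one extra bit beyond the 4-bit initial value on the slide); both are assumptions of this example, and the name `shift_multiply` is hypothetical.

```python
def shift_multiply(a, b, width=4):
    """Multiply two unsigned width-bit numbers by shift-and-add."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    m = 0                                   # M := B"0000"
    for _ in range(width):                  # while (COUNT < 4) loop
        if a & 1:                           # if (A(0) = '1')
            m += b                          # M := M + B (carry kept, assumed)
        # A := SHR(A, M(0)): shift right, insert M's LSB at the top.
        a = (a >> 1) | ((m & 1) << (width - 1))
        m >>= 1                             # M := SHR(M, '0')
    return (m << width) | a                 # M_OUT <= M & A


if __name__ == "__main__":
    print(shift_multiply(3, 5))
    print(shift_multiply(15, 15))
```

After `width` iterations the multiplier bits have all been shifted out of `A`, which then holds the low half of the product while `M` holds the high half.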

4.35



CURRENT ISSUES

Raise abstraction level

User−interaction / annotation

Design frameworks

Unified design representation

Constraint specification & representation

Design verification

4.36



FUTURE DIRECTIONS

Specialized languages

Variety of Intermediate forms

Architectural taxonomies

Modeling guidelines

Design scenarios

4.37



Design Representation

Chapter 5
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

5.1



ROLE OF INTERMEDIATE
REPRESENTATION

[Figure: a canonical intermediate representation sits between the input HDLs (L1, L2, L3), the synthesis tools (T1, T2, T3), and the target architectures (A1, A2, A3)]

− Database for complete design information

− Uniform view across tools and users

− Language independent

− Support all architectural styles

5.2



High−Level Synthesis Trajectory

VHDL Description
    Compilation, Transformations
CDFG
    Control & data dependencies, value lifetimes
Scheduling
    Partition into control steps
Allocation
    Select component types (resources)
    Assign resources to ops in each step
FSM Controller + DP Structure

5.3



SHIFT−MULTIPLIER: VHDL BEHAVIOR

entity MULT is
  port ( A_PORT, B_PORT : in  bit_vector(3 downto 0);
         M_OUT          : out bit_vector(7 downto 0);
         CLK            : in  CLOCK;
         START          : in  BIT;
         DONE           : out BIT );
end MULT;

architecture SHIFT_MULT of MULT is
begin
  process
    variable A, B, M : BIT_VECTOR;
    variable COUNT   : INTEGER;
  begin
    wait until (START = 1);
    A := A_PORT; COUNT := 0;
    B := B_PORT; DONE <= '0';
    M := B"0000";
    while (COUNT < 4) loop
      if (A(0) = '1') then
        M := M + B;
      end if;
      A := SHR(A, M(0));
      M := SHR(M, '0');
      COUNT := COUNT + 1;
    end loop;
    M_OUT <= M & A;
    DONE <= '1';
  end process;
end SHIFT_MULT;

[Figure: block symbol of MULT with inputs A_PORT, B_PORT, START, CLK and outputs M_OUT, DONE.]

5.4



DESIGN FLOW IN HLS
(Shift−Multiplier Example)

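[Figure: control-flow graph (left) and per-block data-flow graphs (right) for the shift-multiplier.
 Control flow: test START = '1'; block B1; test COUNT < 4; on 1, test A[0] = '1' (block B2 when true), then block B3 and back to the COUNT test; on 0, block B4.
 B1: Read A_PORT -> Write A; Read B_PORT -> Write B; '0' -> Write COUNT; '0' -> Write DONE; B"0000" -> Write M.
 B2: Read M, Read B -> + -> Write M.
 B3: Read A, Read M[0] -> SHR -> Write A; Read M, '0' -> SHR -> Write M; Read COUNT, '1' -> + -> Write COUNT.
 B4: Read M, Read A -> & -> Write M_OUT; '1' -> Write DONE.]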

5.5



SCHEDULED CDFG (Shift−Multiplier Example)

State  Test          Block  Statements
S0     START = 1     B1     A := A_PORT; COUNT := 0;
                            B := B_PORT; DONE <= '0';
                            M := B"0000";
S1     COUNT < 4     B4     (branch 0) M_OUT <= M & A; DONE <= '1';
S2     A(0) = '1'    B2     (branch 1) M := M + B;
                            ENDIF
S3                   B3     A := SHR(A, M(0)); COUNT := COUNT + 1;
                            M := SHR(M, '0');

Initial Allocation

[Figure: ports A_PORT, B_PORT, START, CLK, DONE, M_Out; registers Mult, A_Reg, Count_Reg, B_Reg; units Shift1, Shift2, Compar, Adder, Concat.]

5.6



DESIGN VIEW: Finite State Machine with DataPath (FSMD)

Present  Condition   Value  Actions                  Next
State                                                State

S0       START = 1   T      A := A_PORT;             S1
                            B := B_PORT;
                            COUNT := 0;
                            DONE := '0';
                            M := "0000";
                     F                               S0

S1       COUNT < 4   T                               S2
                     F      M_OUT := M @ A;          S0
                            DONE := '1';

S2       A(0) = 1    T      M := M + B;              S3
                     F                               S3

S3                          A := SHR(A, M(0));       S1
                            M := SHR(M, '0');
                            COUNT := COUNT + 1;

FSMD state table

[Figure: I/O ports A_PORT, B_PORT, START, CLK (inputs) and DONE, M_OUT (outputs).]
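The state table can be exercised with a small software model. The following is a minimal Python sketch, not part of the original slides: the function name `simulate_fsmd` is hypothetical, START is assumed to be asserted already, and M is held as an unbounded integer so that the carry of M + B is not lost (the slides leave the width of M open).

```python
def simulate_fsmd(a_port, b_port):
    """Step the shift-multiplier FSMD one state per clock until DONE is set."""
    a = b = m = count = 0
    state, trace = "S0", []
    start = 1                                # assumption: START is already '1'
    while True:
        trace.append(state)
        if state == "S0":
            if start == 1:
                a, b, count, m = a_port, b_port, 0, 0
                state = "S1"
        elif state == "S1":
            if count < 4:
                state = "S2"
            else:
                # M_OUT := M @ A; DONE := '1'; next state would be S0
                return (m << 4) | a, trace
        elif state == "S2":
            if a & 1:                        # A(0) = 1
                m = m + b
            state = "S3"
        else:                                # S3
            a = (a >> 1) | ((m & 1) << 3)    # A := SHR(A, M(0)), 4-bit A
            m = m >> 1                       # M := SHR(M, '0')
            count += 1
            state = "S1"


product, trace = simulate_fsmd(3, 5)
print(product, len(trace))
```

Each loop iteration visits S1, S2 and S3 once, so a full multiplication passes through 14 states including the initial S0 and the final S1.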


5.7



SHIFT−MULTIPLIER: AFTER ALLOCATION

Present  Condition   Value  Actions                                   Next
State                                                                 State

S1       Compar.LT   1                                                S2
                     0      Concat(OP: concat, INPS: Mult, A_Reg);    S0
                            M_OUT(OP: load, INPS: Concat);
                            Mux5(OP: c1, INPS: '0', '1');
                            DONE(OP: load, INPS: Mux5);

S2       A_Reg(0)    1      Mux3(OP: c0, INPS: Mult, Count_Reg);      S3
                            Mux4(OP: c0, INPS: B_Reg, "0001");
                            Adder(OP: add, INPS: Mux3, Mux4);
                            Mux1(OP: c1, INPS: Shift1, Adder);
                            Mult(OP: load, INPS: Mux1);
                     0                                                S3

Component-based state table

[Figure: partial design. A_PORT and B_PORT feed Mux1 and Mux2; registers Mult, Count_Reg, A_Reg, B_Reg; Mux3 and Mux4 feed the Adder; Shift1, Shift2, Concat, Compar (against the constant 4) and Mux5; the Control Unit takes START, CLK, A_Reg(0) and Compar.LT; outputs M_OUT and DONE.]

5.8



SHIFT−MULTIPLIER: After Control Generation

Present  Condition   Value  Actions             Next
State                                           State

S1       Compar.LT   1                          S2
                     0      DONE := 1;          S0

S2       A_Reg(0)    1      Mux3.sel := 0;      S3
                            Mux4.sel := 0;
                            Adder.add := 1;
                            Mux1.sel := 1;
                            Mult.load := 1;
                     0                          S3

Symbolic Control Table

Control unit (columns are state/condition combinations):
  c1 = S0 & START = 1      c2 = S0 & ~(START = 1)
  c3 = S1 & COUNT < 4      c4 = S1 & ~(COUNT < 4)
  c5 = S2 & A(0) = 1       c6 = S2 & ~(A(0) = 1)
  c7 = S3

Signal            c1  c2  c3  c4  c5  c6  c7
Mux1              -   -   -   -   1   -   0
Mux2              1   -   -   -   -   -   0
Mux3              -   -   -   -   0   -   1
Mux4              -   -   -   -   0   -   -
Load A_Reg        1   -   -   -   -   -   1
Load B_Reg        1   -   -   -   -   -   -
Clear Count_Reg   1   -   -   -   -   -   -
Load Count_Reg    -   -   -   -   -   -   1
Clear Mult        1   -   -   -   -   -   -
Load Mult         -   -   -   -   1   -   1
Adder             -   -   -   -   1   -   1
Shift1            -   -   -   -   -   -   1
Shift2            -   -   -   -   -   -   1
DONE              0   -   -   1   -   -   -
Next State        s1  s0  s2  s0  s3  s3  s1

Complete Design

[Figure: complete design. The datapath of the partial design (Mux1-Mux5, Mult, Count_Reg, A_Reg, B_Reg, Adder, Shift1, Shift2, Concat, Compar, constants 0001 and 0100) is driven by a control unit built from the State Reg and the control table, with inputs START, CLK, A_Reg(0), Compar.LT and outputs DONE, M_OUT.]
5.9



HDL COMPILATION

A := B + C;
D := A * E;
X := D − A;

HDL description

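[Figure, parse tree: Stmt1 = Read B, Read C -> + -> Write A; Stmt2 = Read A, Read E -> * -> Write D; Stmt3 = Read D, Read A -> - -> Write X.]

[Figure, DFG: Read B, Read C -> +; the sum and Read E -> *; the product and the sum -> - -> Write X.]

Parse tree          DFG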
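The step from parse tree to DFG amounts to replacing each Write/Read pair on an intermediate variable by a direct edge from the producing operation to the consuming one. A minimal Python sketch under that reading follows; `build_dfg` and the tuple encoding of statements are assumptions made for this illustration only.

```python
def build_dfg(statements):
    """statements: list of (target, op, left, right) in program order.
    Returns (primary_inputs, edges); an edge (i, j) means the result of
    statement i is consumed by statement j."""
    producer = {}                 # variable -> index of the statement that last wrote it
    inputs, edges = set(), []
    for idx, (target, op, left, right) in enumerate(statements):
        for operand in (left, right):
            if operand in producer:
                edges.append((producer[operand], idx))   # value flows op -> op
            else:
                inputs.add(operand)                      # read from outside the block
        producer[target] = idx
    return inputs, edges


stmts = [("A", "+", "B", "C"),    # A := B + C;
         ("D", "*", "A", "E"),    # D := A * E;
         ("X", "-", "D", "A")]    # X := D - A;
inputs, edges = build_dfg(stmts)
print(sorted(inputs), edges)
```

For the three statements on this slide the sketch finds the primary inputs B, C and E and three value edges: + to *, * to -, and + to -.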

5.10



CONTROL AND DATAFLOW
REPRESENTATION

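case C is
  when 1 => X := X + 2;
            A := X + 5;
  when 2 => A := X + 3;
  when others => A := X + W;
end case;

VHDL Description

[Figure, control flow representation: a three-way branch on C (1, 2, E for "others") leading to the blocks {X := X + 2; A := X + 5;}, {A := X + 3;} and {A := X + W;}.]

[Figure, dataflow representation: Read X and Read W feed the adders (constants 2, 5, 3); Read C drives selectors with inputs 1, 2, E that choose the values written by Write X and Write A.]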

5.11



DATAFLOW WITH
PRECEDENCE ARCS

b <= a + 1;
a <= b + 1;

Concurrent VHDL

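[Figure: representation of the two concurrent assignments as a DFG with nodes Read a, Read b, Const 1, two adders, Write a and Write b, plus precedence arcs.]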

5.12



VARIABLE ACCESS REPRESENTATION

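a := b + 1;
b := a + 1;

Seq. VHDL

[Figure, DFG with variable nodes: Read b, Const 1 -> + -> Write a; Read a, Const 1 -> + -> Write b.]

[Figure, DFG with variable traces: b and 1 feed the first +, whose output trace a feeds the second + together with 1 to produce b.]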

5.13



TIMING REPRESENTATION

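[Figure: three ways to represent timing.
 Dataflow annotation: a node annotated "delay: min=500, max=1000".
 Timing in DFG: min/max delay arcs between nodes of a DFG (Read a, Read b, Read req, +, shr, Const 1, Write ack, Write c).
 Timing in CFG: min/max delay arcs between nodes of a CFG (loop_init, loop_join, loop_test, loop_body, loop_exit).]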

5.14



OTHER FLOW GRAPH
REPRESENTATIONS

1. case C is
2.   when 1 => X := X + 2;
3.             A := X + 5;
4.   when 2 => A := X + 3;
5.   when others => A := X + W;
6. end case;

VHDL Description

[Figure, partitioned CDFG: a control-flow branch on C (1, 2, E) with one data-flow graph per branch: {Read X, Const 2 -> + -> Write X; Const 5 -> + -> Write A}, {Const 3, Read X -> + -> Write A}, {Read X, Read W -> + -> Write A}.]

[Figure, DeJong's hybrid flow graph: data sources DX, DW, DC, branch (BR) and merge (ME) nodes controlled by C with alternatives 1, 2, E, and sinks UX, UA.]

[Figure, SSIM flow graph: control-flow nodes numbered after statements 1-6 with conditions (C=1), (C=2), (C=E), each attached to its data-flow operations.]

5.15



VHDL PROCESS
REPRESENTATION

Process <=> Control/Data Flow Graph (CDFG)

Sequential execution
Control Flow Graph
Data Flow Graph

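P1: process
begin
  CREG := DATA_BUS;
  if CREG = B"00" then
    COUNT := COUNT + 1;
  else
    COUNT := 0;
  end if;
end process;

VHDL process

[Figure: CDFG of P1. Data_bus.r feeds Creg.w; the control-flow test CREG = B"00" selects between a data-flow graph that adds 1 to Count.r and writes Count.w, and one that writes Count.w directly.]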

5.16



VHDL BLOCK
REPRESENTATION

VHDL block <=> Directed acyclic Graph (DAG)

Parallel execution

architecture swap of design is
begin
  L: block (clock'event AND clock = '1')
  begin
    L1: A <= guarded B + 1;
    L2: B <= guarded A + 1;
  end L;
end swap;

VHDL Block

[Figure, DAGs: one DAG computes A.w from B.r + 1, the other computes B.w from A.r + 1.]

5.17



TRANSFORMATIONS

1. Compiler transformations

2. Flowgraph transformations

3. Hardware transformations

5.18



COMPILER TRANSFORMATIONS

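Constant folding

[Figure: an operation on Const 4 and Const 5 whose result goes to Write C is replaced by Const 9 feeding Write C.]

Redundant operator elimination

[Figure: two identical multiplications of Read A and Read B, feeding Write C and Write D, are replaced by a single multiplication whose result feeds both Write C and Write D.]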

5.19



SIGNAL ATTRIBUTE TRANSFORMATIONS

X = 1 and not(X'stable)

[Figure: the flow graph of this expression (Read Signal X, Const 1, =, Read Signal X with Const Stable, =, NOT, AND) is replaced by a single Read Signal X node annotated with
   sensitivity: EDGE
   active edge: POSITIVE]

5.20



TREE−HEIGHT TRANSFORMATION

a := b + c - d + f - g + h + k;

VHDL code

Initial parse tree:
  t1 := h + k
  t2 := g + t1
  t3 := f - t2
  t4 := d + t3
  t5 := c - t4
  a  := b + t5

After tree height reduction:
  t1 := b + c      }
  t2 := d + f      }  potential parallelism
  t3 := g + h      }
  t4 := t1 - t2    }  potential parallelism
  t5 := t3 + k     }
  a  := t4 - t5

[Figure: the initial parse tree is a chain of six operations over b, c, d, f, g, h, k; the reduced tree has three levels.]

5.21



CONTROL−TO−DATAFLOW
TRANSFORMATIONS

if (X = 0) then
A := B + C;
D := B − C;
else
D := D − 1;
end if;

Textual representation

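[Figure, CF representation: If_test compares Read X with 0. Branch 1 executes Stmt_blk1 (Read B, Read C -> + -> Write A; Read B, Read C -> - -> Write D); branch 0 executes Stmt_blk2 (Read D, 1 -> - -> Write D); both branches meet at If_join.]

[Figure, DF representation: B + C, B - C and D - 1 are all computed; the result of the comparison X = 0 drives two selectors (inputs 1 and 0) that choose between B + C and Read A for Write A, and between B - C and D - 1 for Write D.]

CF representation          DF representation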
5.22



CONTROL FLATTENING

[Figure: original CFG with nested If_test / If_join pairs around statement blocks Stmt_blk1 to Stmt_blk7, and the final CFG after control flattening, which keeps fewer If_test / If_join pairs.]

Original CFG          Final CFG

5.23



LOGIC−LEVEL
TRANSFORMATIONS

c = (a’ NAND (a NAND b)) = a

HDL boolean expression

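[Figure, original flow graph: Read a -> NOT; Read a, Read b -> NAND; both results -> NAND -> Write C.]

[Figure, after logic optimization: Read a -> Write C.]

Original flow graph          After logic optimization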
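The simplification can be confirmed by exhaustive enumeration, since only two inputs are involved. A short Python check (the helper names `nand` and `c_original` are illustrative):

```python
from itertools import product


def nand(x, y):
    return not (x and y)


def c_original(a, b):
    # c = (a' NAND (a NAND b))
    return nand(not a, nand(a, b))


# Compare against the optimized form c = a for all four input combinations.
equivalent = all(c_original(a, b) == a
                 for a, b in product([False, True], repeat=2))
print(equivalent)
```

By De Morgan, a' NAND (a NAND b) = a OR (a AND b), which absorbs to a; the enumeration agrees.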

5.24



RT TRANSFORMATION

RT−level function recognition

RT−Component specific transformation

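[Figure: Adder/Incrementer Transformation. Flow-graph additions over A, B and the constant 1 (nodes n1, n2, n3) are mapped onto RT components as add or inc operations.]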

5.25



COMPLEX FUNCTION
RECOGNITION

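case F is
  when "00" => OUT <= A + B;
  when "01" => OUT <= A + B + "0001";
  when "10" => OUT <= A - B;
  when "11" => OUT <= A - B + "0001";
end case;

VHDL

[Figure, DF graph: A, B and "0001" feed separate +, + +, - and - + chains, one per alternative; F selects among the inputs "00", "01", "10", "11" to drive OUT.]

[Figure, simplified DF graph: A and B feed four nodes +, AI, -, SI; F selects among "00", "01", "10", "11" to drive OUT.]

[Figure, complex-node DF graph: A and B feed a single node controlled by F with the function table
   "00" : +
   "01" : AI
   "10" : -
   "11" : SI
 whose output is OUT.]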

5.26



FUTURE DIRECTIONS

1. Common representation

2. Different views

3. Different architectures

4. Description disambiguation

5. Layout−Driven Transformations

6. Transformation scripts

7. Representation for interactive design

5.27



Partitioning

Chapter 6
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

6.1

Copyright © 1993 by Daniel Gajski UC Irvine


PARTITIONING

Used in HLS for:

Scheduling

Allocation

Unit selection

Chip partitioning

Problem decomposition
for tractability

6.2



COMPONENT PARTITIONING

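[Figure, design structure: flip-flops FF1 and FF2, a register (REG.) and a gate G connected through nets ni and nj, with external signals a to g and a cutline through the design.]

[Figure, graph model: nodes v1 to v6 with edges such as e24, e25, e36; the cutline divides the graph into subgraphs G1 and G2.]

[Figure, partitioned design: Chip 1 (signals a, b, c) and Chip 2 (signals d, e, f, g) connected by the nets ni and nj.]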

6.3



BEHAVIORAL PARTITIONING

- Time utilization
- Component utilization

entity VHDL EXAMPLE is
  port (I1, I2, I3 : in integer;
        O1 : out integer;)
  signal B, F, H : integer;
end entity;

architecture BEHAVIOR of EXAMPLE is
begin
  process1
    var : A, C, E : integer;
    while (I1 > 0) loop
      if (B > 0)
        B <= C - I2;
      else
        B <= A - I2;
      end if;
      wait until (H > 0);
    end loop;
  end process1;

  process2
    var : D : integer;
    wait until (B <= 0);
    D := I3 + B;
    F <= I3 + I1;
  end process2;

  process3
    var : G : integer;
    wait until (F > 0);
    O1 <= I3 + G;
    H <= I3 + I1;
  end process3;
end

Textual Description

[Figure, inter-process communication: process1 (inputs I1, I2) sends B to process2 (input I3); process2 sends F to process3; process3 returns H to process1 and drives O1.]

[Figure, Control/Data Flow Graph: process1 loops while I1 > 0, branches on B > 0 to compute B from C and I2 or from A and I2, then waits for H > 0; process2 waits for B <= 0 and computes D and F; process3 waits for F > 0 and computes G and H.]
6.4



PARTITIONING TECHNIQUES

Constructive methods
1. Random selection

2. Cluster growth

3. Hierarchical clustering

Iterative−Improvement methods
1. Min−cut partitioning

2. Simulated annealing

6.5



CLUSTER GROWTH ALGORITHM

Algorithm 6.1

page 185

6.6



HIERARCHICAL CLUSTERING

Graph Closeness measure Cluster tree

(a) Graph: edges v1-v2 (5), v1-v3 (4), v2-v3 (1), v2-v4 (6), v2-v5 (3)

    Closeness measure:
          v1   v2   v3   v4   v5
    v1    -    -    -    -    -
    v2    5    -    -    -    -
    v3    4    1    -    -    -
    v4    0    6    0    -    -
    v5    0    3    0    0    -

    Cluster tree: v2 and v4 form v(24)

(b) Closeness measure:
            v1   v(24)  v3   v5
    v1      -    -      -    -
    v(24)   5    -      -    -
    v3      4    1      -    -
    v5      0    3      0    -

    Cluster tree: v1 and v(24) form v(241)

(c) Closeness measure:
            v(241)  v3   v5
    v(241)  -       -    -
    v3      4       -    -
    v5      3       0    -

    Cluster tree: v(241) and v3 form v(2413)

(d) Closeness measure:
             v(2413)  v5
    v(2413)  -        -
    v5       3        -

    Cluster tree: v(2413) and v5 form v(24135)

Cluster tree formation
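The four steps can be reproduced with a few lines of code. The sketch below is an illustration in Python, not the book's Algorithm 6.2: the function name is hypothetical, and the rule for the closeness of a merged cluster (the maximum of its parts' closeness values) is inferred from the matrices in steps (a) to (d).

```python
def hierarchical_clustering(nodes, closeness):
    """Repeatedly merge the two closest clusters.
    closeness maps (node, node) pairs to a closeness value; missing pairs are 0.
    Returns the merges in order as (merged_cluster, closeness) tuples."""
    clusters = [(n,) for n in nodes]
    close = {frozenset(((a,), (b,))): c for (a, b), c in closeness.items()}
    merges = []
    while len(clusters) > 1:
        # Closest pair; ties are resolved by list order.
        x, y = max(((p, q) for i, p in enumerate(clusters) for q in clusters[i + 1:]),
                   key=lambda pair: close.get(frozenset(pair), 0))
        merged = x + y
        merges.append((merged, close.get(frozenset((x, y)), 0)))
        clusters = [c for c in clusters if c not in (x, y)]
        for c in clusters:
            # Assumed update rule: closeness to the new cluster is the maximum
            # of the closeness values to its two parts.
            close[frozenset((merged, c))] = max(close.get(frozenset((x, c)), 0),
                                                close.get(frozenset((y, c)), 0))
        clusters.append(merged)
    return merges


graph = {("v1", "v2"): 5, ("v1", "v3"): 4, ("v2", "v3"): 1,
         ("v2", "v4"): 6, ("v2", "v5"): 3}
merges = hierarchical_clustering(["v1", "v2", "v3", "v4", "v5"], graph)
for cluster, value in merges:
    print(sorted(cluster), value)
```

On the slide's graph the merges happen at closeness 6, 5, 4 and 3, building {v2, v4}, then adding v1, v3 and v5 in that order.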

6.7



HIERARCHICAL CLUSTERING

Algorithm 6.2

page 188

6.8



CLUSTERING WITH SEVERAL
CRITERIA

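[Figure (a): graph over nodes a to f with edge weights under Criterion A, and the resulting cluster tree with leaf order f, c, a, e, d, b and a first cutline.]

[Figure (b): the same nodes with edge weights under Criterion B, and the resulting cluster tree with leaf order c, e, f, b, a, d and a first cutline.]

[Figure (c): A then B. Clusters {a,e}, {f,c}, {b}, {d} taken at the first cutline of Criterion A are clustered again under Criterion B.]

[Figure (d): B then A. Clusters {c,e}, {f}, {a,d}, {b} taken at the first cutline of Criterion B are clustered again under Criterion A.]

[Figure: A then B using the second cutline of Criterion A, which gives the clusters {a,e,d}, {f,c}, {b}.]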

6.9



ITERATIVE ALGORITHMS

[Figure: a graph divided by a cutline into subgraphs G1 and G2.]

Two-way partitioning (Kernighan-Lin)

Start with 2 equal subgraphs

Each iteration, exchange K pairs between partitions

Continue until no further improvement

6.10



MIN−CUT PARTITIONING

Interconnection Reduction

[Figure: subgraphs G1 and G2 before and after the interchange of vi and vj across the cutline; after the interchange fewer connections cross the cutline.]

6.11



MIN−CUT PARTITIONING

Algorithm 6.3
page 194

6.12



MIN−CUT SEARCH STRATEGY

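[Figure: plot of the cumulative gain GAIN(k) against the number of exchanges k = 1 to 10; the vertical axis runs from -5 to 20 and the curve peaks at k = 5.]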

GAIN (5) is maximum.

Thus, perform the first 5 exchanges
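The selection rule is a prefix-sum maximum: GAIN(k) is the sum of the gains of the first k tentative exchanges, and the pass commits the prefix with the largest sum. A minimal Python sketch follows; the per-exchange gains are hypothetical values chosen so that the peak falls at k = 5, because the slide does not list them.

```python
from itertools import accumulate


def best_exchange_count(pair_gains):
    """Return (k, GAIN(k)) for the k that maximizes the cumulative gain."""
    cumulative = list(accumulate(pair_gains))
    best = max(range(len(cumulative)), key=cumulative.__getitem__)
    return best + 1, cumulative[best]


# Hypothetical gains of the 10 tentative exchanges (not taken from the slide).
gains = [5, 5, -3, 3, 10, -8, -2, -5, -5, 0]
k, total = best_exchange_count(gains)
print(k, total)
```

Individual exchanges may have negative gain (the third one here) and still be worth keeping when later exchanges more than compensate, which is how the method escapes some local minima.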

6.13



SIMULATED ANNEALING

Algorithm 6.4
page 197

6.14



CLUSTERING EXAMPLE

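[Figure, two cluster partition: operations o1 (+) and o2 (+) over inputs a, b, c form cluster G1, bound to add1; operation o3 (*) forms cluster G2, bound to mult1; edges e13 and e23 cross between the clusters.]

[Figure, three cluster partition: o1 (+) forms G1 (add1), o2 (+) forms G2 (add2) and o3 (*) forms G3 (mult1); edges e13 and e23 connect G1 and G2 to G3.]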

6.15



CLUSTERING FOR UNIT
SELECTION

1. Functional proximity

2. Communication proximity

3. Potential parallelism

4. Closeness distance

6.16



PARTITIONING SCRIPTS IN APARTY

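[Figure: APARTY partitioning flow.
 Input: CDFG.
 User choice of clustering procedure: Control, Data, Procedure/control, Procedure/data, Operator.
 Physical constraints and user choice of criterion: Area, Connections, Schedule length.
 Cutline selection, then a "Done?" test: No returns to the clustering choice, Yes yields the Partitioned CDFG.]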

6.17



DESCRIPTION PARTITIONING

main
  num_msgs : register(8);
  user_id_ram : memory(4x4);

[Figure: hierarchical state diagram of the answering machine. Behaviors: system_off, system_on, initialize, respond_to_machine_button, respond_to_external_line, monitor, answer, play_announcement, record_msg, remote_operation, check_user_id, respond_to_cmds. Transition conditions: reset, system_on, not system_on, machine_button_pushed, dialtone, tone=1, code_ok, not code_ok.]

Answering machine description

check_user_id
  entered_code : memory(4x4);
  i : integer range 1 to 5;
i := 1;
while i <= 4 loop
wait until button_tone;
entered_code[i] := button;
i := i + 1;
end loop;
if (entered_code[1] = user_id_ram[1]) and
...
(entered_code[4] = user_id_ram[4]) then
code_ok <= true;
else
code_ok <= false;
end if;

Object description
6.18



PARTITIONED DESCRIPTION

Partition 1    AREA: 12641    PINS: 62
  num_msgs (200)
  main (3412)
  respond_to_machine_button (3461)
  respond_to_cmds (5568)

Partition 2    AREA: 9511     PINS: 21
  monitor (4489)
  check_user_id (4272)
  user_id_ram (750)

[Figure: the objects are connected to each other and to the external ports by channels annotated with their bit widths.]

6.19



FUTURE DIRECTIONS

Partitioning of CDFG

Partitioning of specifications

Estimation of quality measures

Software/Hardware partitioning

6.20



Scheduling

Chapter 7
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

7.1

Copyright © 1993 by Daniel Gajski and Nikil Dutt UC Irvine


High−Level Synthesis Trajectory

VHDL Description
   |  Compilation, Transformations
CDFG (value lifetimes, control & data dependencies)
   |  Scheduling: partition into control steps
   |  Allocation: select component types (resources),
   |              assign resources to ops in each step
FSM Controller + DP Structure

7.2



Scheduling

Definition: Task of assigning behavioral operators to control steps

Input: CDFG (data lifetimes with control and data dependencies)

Output: Temporal ordering of individual operations (FSM states)

Constraints: Hardware Resources, Timing, Testability, Power, ...

Goal: Exploit parallelism to achieve fastest design within constraints

7.3



SCHEDULED CDFG

State  Test          Block  Statements
S0     START = 1     B1     A := A_PORT; COUNT := 0;
                            B := B_PORT; DONE <= '0';
                            M := B"0000";
S1     COUNT < 4     B4     (branch 0) M_OUT <= M & A; DONE <= '1';
S2     A(0) = '1'    B2     (branch 1) M := M + B;
                            ENDIF
S3                   B3     A := SHR(A, M(0)); COUNT := COUNT + 1;
                            M := SHR(M, '0');

7.4



Scheduling: Assumptions

Target Architecture:
   Pipelining, Clocking, Busing, Component Sets, ...

Language Constructs Permitted:
   Conditionals, Loops, Arrays, ADTs, ...

Temporary Assumptions for Illustration:
   1. No pipelining, single-phase clock
   2. Straight-line code, simple data-types
   3. Each operation executes in 1 control step
   4. Each operation performed by 1 component type
7.5



Scheduling Example: HAL

while (x < a) loop
  x1 := x + dx;
  u1 := u - (3 * x * u * dx) - (3 * y * dx);
  y1 := y + (u * dx);
  x := x1; u := u1; y := y1;
end loop;

VHDL Behavior

[Figure, DFG representation: inputs u, dx, 3, x, y, a; nodes v1 to v6 (*), v7 and v8 (-), v9 and v10 (+), v11 (<); edges e1,5, e2,5, e3,6, e4,9, e5,7, e6,8, e7,8, e10,11.]

7.6



ASAP ALGORITHM

Algorithm 7.1

page 217

7.7



ALAP ALGORITHM

Algorithm 7.2

page 219

(Check errata)

7.8



HAL: ASAP and ALAP Schedules

Step   ASAP schedule (E)                   ALAP schedule (L)
1      v1 *, v2 *, v3 *, v4 *, v10 +       v1 *, v2 *
2      v5 *, v6 *, v9 +, v11 <             v3 *, v5 *
3      v7 -                                v4 *, v6 *, v7 -, v10 +
4      v8 -                                v8 -, v9 +, v11 <

ASAP Schedule          ALAP Schedule
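Both schedules follow from the HAL DFG edges (e1,5, e2,5, e3,6, e4,9, e5,7, e6,8, e7,8, e10,11) by a longest-path computation. The Python sketch below is an illustration under the slides' unit-delay assumption and is not the book's Algorithm 7.1 or 7.2; the function names are chosen for this example.

```python
NODES = range(1, 12)
EDGES = [(1, 5), (2, 5), (3, 6), (4, 9), (5, 7), (6, 8), (7, 8), (10, 11)]


def asap(nodes, edges):
    """Earliest control step of each node (1-based, unit-delay operations)."""
    preds = {n: [u for u, v in edges if v == n] for n in nodes}
    step = {}

    def visit(n):
        if n not in step:
            step[n] = 1 + max((visit(p) for p in preds[n]), default=0)
        return step[n]

    for n in nodes:
        visit(n)
    return step


def alap(nodes, edges, last_step):
    """Latest control step of each node for a schedule of last_step steps."""
    succs = {n: [v for u, v in edges if u == n] for n in nodes}
    step = {}

    def visit(n):
        if n not in step:
            step[n] = min((visit(s) for s in succs[n]), default=last_step + 1) - 1
        return step[n]

    for n in nodes:
        visit(n)
    return step


early = asap(NODES, EDGES)
late = alap(NODES, EDGES, last_step=max(early.values()))
mobility = {n: late[n] - early[n] for n in NODES}
print(early, late, mobility, sep="\n")
```

The difference ALAP minus ASAP is the mobility of each operation; nodes with mobility 0 (v1, v2, v5, v7, v8) lie on the critical path.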

7.9



Scheduling Formulations

Resource Constrained Scheduling:
   Minimize control steps for given resources

   List-Based Scheduling
   Static-List Scheduling [JMSW91]

Time Constrained Scheduling:
   Minimize resources for given number of time steps

   Integer Linear Programming (ILP) [LeHL89]
   Force-Directed Scheduling (FDS) [PaKn89]
   Iterative Refinement [PaKy91]

7.10



LIST−BASED SCHEDULING

Algorithm 7.5

page 234

7.11



HAL: List Scheduling

[Figure: ASAP schedule (left) and ALAP schedule (right) of the HAL DFG, with the mobility range of each operation 1 to 11 over control steps s1 to s4 drawn between them.]

Node:        v1  v2  v3  v4  v5  v6  v7  v8  v9  v10  v11
Operation:   *   *   *   *   *   *   -   -   +   +    <
Mobility:    0   0   1   2   0   1   0   0   2   2    2

Mobility(op) = ALAP(op) - ASAP(op)

Maintain priority list for each component type

Schedule critical nodes first
7.12



List Scheduling (Cont’d)

ASAP + Priority function (mobility) for each resource
Resource conflicts resolved by priority function
Constructive Method: schedule, reevaluate priority, sort, ...

[Figure: DFG with mobilities, each node annotated with its mobility in angle brackets.]

Ready List
   PList* : 1<0>, 2<0>, 3<1>, 4<2>
   PList+ : 10<2>
   PList- : NIL
   PList< : NIL

Resource Constraints
   Resources* : 2
   Resources+ : 1
   Resources- : 1
   Resources< : 1

Scheduled DFG
   s1: *1, *2, +10
   s2: *5, *3, <11
   s3: -7, *6, *4
   s4: -8, +9
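The scheduled DFG can be reproduced with a compact list scheduler. The Python sketch below is an illustration, not the book's Algorithm 7.5: it uses the mobilities from the slides as a fixed priority (lower mobility first, ties broken by node number) instead of re-evaluating priorities after every step, and all identifiers are chosen for this example.

```python
OPS = {1: "*", 2: "*", 3: "*", 4: "*", 5: "*", 6: "*",
       7: "-", 8: "-", 9: "+", 10: "+", 11: "<"}
EDGES = [(1, 5), (2, 5), (3, 6), (4, 9), (5, 7), (6, 8), (7, 8), (10, 11)]
MOBILITY = {1: 0, 2: 0, 3: 1, 4: 2, 5: 0, 6: 1, 7: 0, 8: 0, 9: 2, 10: 2, 11: 2}
RESOURCES = {"*": 2, "+": 1, "-": 1, "<": 1}


def list_schedule(ops, edges, priority, resources):
    """Resource-constrained list scheduling with unit-delay operations.
    Returns one sorted list of node numbers per control step."""
    preds = {n: {u for u, v in edges if v == n} for n in ops}
    done, schedule = set(), []
    while len(done) < len(ops):
        # Ready operations: all predecessors finished in an earlier step.
        ready = [n for n in ops if n not in done and preds[n] <= done]
        free = dict(resources)
        chosen = []
        for n in sorted(ready, key=lambda n: (priority[n], n)):
            if free[ops[n]] > 0:          # a unit of the right type is still free
                free[ops[n]] -= 1
                chosen.append(n)
        done.update(chosen)
        schedule.append(sorted(chosen))
    return schedule


schedule = list_schedule(OPS, EDGES, MOBILITY, RESOURCES)
for step, nodes in enumerate(schedule, start=1):
    print(f"s{step}:", nodes)
```

With two multipliers and one adder, subtracter and comparator, the sketch needs four control steps, the same as the unconstrained ASAP schedule.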

7.13



STATIC−LIST SCHEDULING

[Figure: HAL DFG with nodes *1, *2, *3, *4, *5, *6, -7, -8, +9, +10, <11.]

Priority List

node       8   9   11  7   6   4   10  5   3   1   2
ALAP       1   1   1   2   2   2   2   3   3   4   4
ASAP       4   2   2   3   2   1   1   2   1   1   1
priority   1   2   3   4   5   6   7   8   9   10  11

Partial Schedule
   s1: *2, *1, +10
   s2: *3, *5

Final Schedule
   s1: *1, *2, +10
   s2: *3, *5, <11
   s3: *4, *6, -7
   s4: -8, +9

7.14



Scheduling Formulations

Resource Constrained Scheduling:
   Minimize control steps for given resources

   List-Based Scheduling
   Static-List Scheduling [JMSW91]

Time Constrained Scheduling:
   Minimize resources for given number of time steps

   Integer Linear Programming (ILP) [LeHL89]
   Force-Directed Scheduling (FDS) [PaKn89]
   Iterative Refinement [PaKy91]

7.15



HAL: ILP FORMULATION

[Figure, operation ranges: for each operation 1 to 11, the range of control steps s1 to s4 between its ASAP and ALAP step.]

Final Schedule
   s1: v1 *, v2 *
   s2: v5 *, v3 *, v10 +
   s3: v7 -, v6 *, v4 *
   s4: v8 -, v9 +, v11 <

7.16



ILP FORMULATION

Formulation on

pages 221−222

(Check errata)

7.17



HAL: ILP FORMULATION

[Figure, operation ranges: for each operation 1 to 11, the range of control steps s1 to s4 between its ASAP and ALAP step.]

Eq. 7.5 with constraints (pages 222-223)

Final Schedule
   s1: v1 *, v2 *
   s2: v5 *, v3 *, v10 +
   s3: v7 -, v6 *, v4 *
   s4: v8 -, v9 +, v11 <
7.18



Force−Directed Scheduling [PaKn89]

Time-constrained scheduling, minimize resources

Goal: Achieve high unit utilization by uniformly distributing operations of a particular type over all states

Iterative Approach:
   Determine global effect of scheduling an operation into a state
      Compute probability of scheduling op into a state
      Compute expected operator cost for each op type
   Schedule each operation to balance hardware utilization
      Choose assignment of op to state with min cost
      Recompute probability distr. and expected op costs

7.19



HAL: Force−Directed Scheduling

[Figure: probabilities of op scheduling for operations 1 to 11 over states s1 to s4 (uniform distribution over each operation's mobility range).]

Operator cost for Mult (*) (sum of op probabilities per state)
   s1   2.83
   s2   2.33
   s3   0.83
   s4   0.00

For each iteration of FDS do:
   Compute Expected Operator Costs (EOC) for scheduling each operator type in every state
   Assign one operation to control step based on minimal EOC
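The multiplier costs can be recomputed from the ASAP/ALAP ranges of the six multiplications. A minimal Python sketch, assuming each operation is equally likely in every step of its range; the names `distribution` and `MULT_RANGES` are illustrative, and the ranges are derived from the HAL mobilities (v1, v2 fixed in s1; v3 in s1-s2; v4 in s1-s3; v5 fixed in s2; v6 in s2-s3).

```python
def distribution(ranges, steps):
    """ranges maps op -> (asap, alap). Each op contributes a probability of
    1 / (range length) to every control step in its range."""
    dist = [0.0] * steps
    for first, last in ranges.values():
        p = 1.0 / (last - first + 1)
        for s in range(first, last + 1):
            dist[s - 1] += p
    return dist


MULT_RANGES = {1: (1, 1), 2: (1, 1), 3: (1, 2), 4: (1, 3), 5: (2, 2), 6: (2, 3)}
before = [round(x, 2) for x in distribution(MULT_RANGES, 4)]

# Fixing v3 in s2 also forces its successor v6 into s3.
fixed = dict(MULT_RANGES)
fixed[3] = (2, 2)
fixed[6] = (3, 3)
after = [round(x, 2) for x in distribution(fixed, 4)]
print(before, after)
```

Scheduling v3 in s2 lowers the peak multiplier demand from 2.83 to 2.33, which is why that assignment is attractive when the goal is to get by with two multipliers.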

7.20



HAL: Force−Directed Scheduling(Cont’d)

[Figure: probabilities of op scheduling for operations 1 to 11 over states s1 to s4, before and after o3 is scheduled in s2.]

Operator cost for Mult (*)
   s1   2.83
   s2   2.33
   s3   0.83
   s4   0.00

Revised operator cost for Mult (*), after o3 is scheduled in s2
   s1   2.33
   s2   2.33
   s3   1.33
   s4   0.00


7.21



FORCE−DIRECTED SCHEDULING

Algorithm 7.3

page 227

7.22



Iterative Refinement Scheduling [PaKy91]

FDS: Constructive, no backtracking (rescheduling)

Iterative Refinement Scheduling: allows rescheduling
   Based on KL Graph Bisection [KeLi70]
   Start with any initial schedule
   Reschedule one operation at a time, and lock operation
   Maximize cumulative gain

Initial Schedule
   s1: 1, 2, 3, 4, 10
   s2: 5, 6, 9, 11
   s3: 7
   s4: 8

After Op 6 is moved and locked
   s1: 1, 2, 3, 4, 10
   s2: 5, 9, 11
   s3: 7, 6
   s4: 8


7.23



ITERATIVE RESCHEDULING

Algorithm 7.4

page 231

7.24



Scheduling with Realistic Assumptions

Functional Units with Varying Delays

[Figure: the same operations (+, *, -) scheduled under four assumptions: unit-delay, multicycling, chaining and pipelining.]

Multi-Functional Units
   ALUs, Shift-registers, etc.

Realistic Design Descriptions
   Conditionals, loops, nested loops
7.25



LOOP SCHEDULING

[Figure: timelines for 12 loop iterations under sequential execution (iterations 1 to 12 one after another), partial loop unrolling (iterations grouped as 1-3, 4-6, 7-9, 10-12) and loop folding.]

7.26



LOOP SCHEDULING (Cont’d)

[Figure: DFG with dependencies across iterations, over operations A to R.]

Seq. Schedule w/ 3 FU's
   s1: A, C
   s2: I, F, B
   s3: K, L, D
   s4: M, E, G
   s5: J, N, H
   s6: Q, P, R

7.27



LOOP UNROLLING AND
LOOP UNFOLDING

Loop Unrolling (two iterations, operations suffixed 1 and 2)
   s1: A1, A2, B1, C1
   s2: I1, F1, E1, D1
   s3: L1, J1, H1, G1
   s4: K1, M1, I2, C2
   s5: N1, L2, F2, D2
   s6: P1, R1, Q1, B2
   s7: J2, E2, M2, G2
   s8: K2, N2, H2
   s9: Q2, P2, R2

[Figure, Loop Folding: successive iterations overlap; the schedule consists of a loop overhead section, a repeated loop body and a closing loop overhead section.]

7.28



SIMULATED ANNEALING
FORMULATION
ALU1 ALU2 ALU3

s1 v1 = v2 + v3 v4 = v2 * v3

s2 v5 = v1 + v4 v6 = v4 / v1

s3 v2 = v4 * v5

Initial Schedule

ALU1 ALU2 ALU3

s1 v4 = v2 * v3 v1 = v2 + v3

s2 v5 = v1 + v4 v6 = v4 / v1

s3 v2 = v4 * v5

After Swapping Two Operations

ALU1 ALU2 ALU3

s1 v4 = v2 * v3 v1 = v2 + v3

s2 v5 = v1 + v4 v6 = v4 / v1

s3 v2 = v4 * v5

After Displacing an Operation

7.29



PATH−BASED SCHEDULING

Input Ports: branchpc, ibus, branch, ire
Output Ports: ppc, popc, obus
Variables: pc, oldpc

 1  ppc <= pc
 2  popc <= oldpc
 3  obus <= ibus + 4
 4  if (branch = '1')
 5  then pc <= branchpc
 6  end if
 7  wait until (ire = '1')
 8  oldpc <= pc
 9  pc <= pc + 4
10

CDFG

[Figure, constraint intervals: nodes 1 to 10 of the CDFG with intervals i1 and i2 and the constraints labelled Write, Branch and Loop.]

[Figure, scheduled CDFG: the same nodes 1 to 10 after control steps have been inserted.]

7.30



DFG RESTRUCTURING

Tree-Height Reduction

   before: (((a + b) + c) + d) + (e + f)
   after:  ((a + b) + c) + (d + (e + f))

[Figure: DFGs over inputs a to f before and after the transformation; the longest chain of additions drops from four to three.]

Redundant Operator Insertion

   before: d (a + b + c) + ab + ab
   after:  d (a + b + c) + 2ab

[Figure: DFGs over inputs a, b, c, d before and after the transformation.]
7.31



Scheduling: Future Directions

More realistic libraries


Realistic target architectures
Better cost functions (layout−driven)
Scheduling with allocation
Arbitrary descriptions (nested loops, conditionals)
Loop−pipelining, tree−height reduction
Complex data structures (arrays, ADT)
Application−specific scheduling (RISC, VLIW, DSP,..)

7.32



Allocation

Chapter 8
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

8.1

Copyright © 1993 by Daniel Gajski and Nikil Dutt UC Irvine


Allocation

Selection of components to be used in RT design

Binding of hardware structures (units, regs, connections) to behavioral operators and variables

Define target DP architecture
   Clocking, Busing, ...

Approaches
   Greedy
   Decomposition
   Iterative

8.2



Unit Selection and Binding: Example

Unit Selection: 2 adders, 4 registers

Mapping Behavior to RT Structure:

DFG
   s1: o1 (+) reads a, b and produces e;  o2 (+) reads c, d and produces f
   s2: o3 (+) produces g;  o4 (+) produces h

Allocated RT-Structure
   r1: a     r2: b, e, g     r3: c, f, h     r4: d
   ADD1: +1, +3     ADD2: +2, +4

8.3



Point−to−Point DP Interconnection Architectures

[Figure, mux-oriented DP: register file r1 to r6 connected to ALU1 and ALU2 through an input interconnection network, with results returned through an output interconnection network.]

[Figure, bus-oriented DP: the same register file and ALUs connected through input buses InBus1 to InBus4 and output buses OutBus1 and OutBus2.]

Register-transfers
   s1: r3 <= ALU1(r1,r2); r1 <= ALU2(r3,r4);
   s2: r1 <= ALU1(r5,r6); r6 <= ALU2(r2,r5);
   s3: r3 <= ALU1(r1,r6)

8.4



One−Phase Clocking of Point−to−Point DP

[Figure: bus-oriented DP with register file r1 to r6, input buses InBus1 to InBus4, ALU1, ALU2 and output buses OutBus1, OutBus2.]

Resource   Cycle 1     Cycle 2
InBus1     Read r1     Read r5
InBus2     Read r2     Read r6
InBus3     Read r3     Read r2
InBus4     Read r4     Read r5
ALU1       Execute     Execute
ALU2       Execute     Execute
OutBus1    Write r3    Write r1
OutBus2    Write r1    Write r6

Within each cycle the read (tr), execute (te) and write (tw) phases follow one another.

Requirement: Cycle Time > tr + te + tw

Sequential execution

8.5



One−Phase Pipelined Datapath

[Figure: bus-oriented DP with register file r1 to r6, input buses InBus1 to InBus4, ALU1, ALU2 and output buses OutBus1, OutBus2.]

Resource   Cycle 1     Cycle 2     Cycle 3
InBus1     Read r1     Read r5
InBus2     Read r2     Read r6
InBus3     Read r3     Read r2
InBus4     Read r4     Read r5
ALU1       Execute     Execute
ALU2       Execute     Execute
OutBus1                Write r3    Write r1
OutBus2                Write r1    Write r6

Clock cycle = max (tr + te, tw)

Overlapping read and write data transfers
8.6



Two−Phase Pipelined Datapath

[Figure: datapath with register file r1 to r6, ALU1 and ALU2, and four buses: InBus1, InBus2, InBus3/OutBus1 and InBus4/OutBus2 (the last two shared between reads and writes).]

Activity per resource over Cycles 1 to 4, in order:
   InBus1:           Read r1, Read r5, Read r1
   InBus2:           Read r2, Read r6, Read r6
   InBus3/OutBus1:   Read r3, Read r2, Write r3, Write r1
   InBus4/OutBus2:   Read r4, Read r5, Write r1, Write r6
   ALU1:             Execute, Execute, Execute
   ALU2:             Execute, Execute, Execute

Clock cycle = max (te, tr + tw)

Overlapping data transfers with FU execution
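The three clocking schemes differ only in which delays have to fit into one clock period. The Python sketch below compares them using the expressions from these slides; the delay values are hypothetical and the function name is chosen for this example.

```python
def cycle_times(t_r, t_e, t_w):
    """Minimum clock period for each datapath clocking scheme."""
    return {
        # Read, execute and write happen one after another in the same cycle.
        "one_phase_sequential": t_r + t_e + t_w,
        # The write of one transfer overlaps read and execute of the next.
        "one_phase_pipelined": max(t_r + t_e, t_w),
        # Both data transfers overlap the functional-unit execution.
        "two_phase_pipelined": max(t_e, t_r + t_w),
    }


# Hypothetical delays in ns: register read, ALU execution, register write.
times = cycle_times(t_r=10, t_e=40, t_w=10)
for scheme, period in times.items():
    print(f"{scheme}: {period} ns")
```

With these numbers the period drops from 60 ns to 50 ns to 40 ns. The shorter period comes at the price of extra latency: the pipelined schemes need three and four cycles for the same two or three register transfers.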

8.7



Allocation Tasks

Unit Selection

Functional−unit Binding

Storage Binding

Interconnection Binding

8.8



Interdependence and Ordering of Binding

Scheduled DFG
   s1: o1 (+) reads a, b and produces e;  o2 (+) reads c, d and produces f
   s2: o3 (+) produces g;  o4 (+) produces h

FU Binding with 6 Muxes
   r1: a     r2: b, e, g     r3: c, f, h     r4: d
   ADD1: +1, +4     ADD2: +2, +3

Register Reallocation: 4 Muxes, and FU Rebinding: Optimal Design
   Both use r1: a, g     r2: b, e     r3: c, f     r4: d, h
   They differ in the adder binding: ADD1: +1, +4 and ADD2: +2, +3 versus ADD1: +1, +3 and ADD2: +2, +4

8.9



Greedy Binding: Examples

[Figure: an initial partial design (registers r1 to r5, Mux1, Bus1, ALU1, ALU2) and the designs obtained by each candidate greedy step:
   add 2 inputs to mux
   add tristate buffer to bus
   add mux to FU input (Mux2)
   add FU and tristate to bus (ALU3)
   add FU and mux (ALU3, Mux2)
   convert muxes to shared bus]
8.10



CONSTRUCTIVE ALLOCATION

Algorithm 8.1

page 276

8.11



Allocation: Decomposition Methods

Clique Partitioning [TsSi86]


Left−Edge Algorithm [KuPa87]
Weighted Bipartite Matching [HCLH90]

8.12



CLIQUE PARTITIONING

Algorithm 8.2

page 279

8.13



CLIQUE PARTITIONING EXAMPLE
Graph G: nodes v1 to v5 (as supernodes s1 to s5), edges e1,3, e1,4, e2,3, e2,5, e3,4, e4,5

Common Neighbors
   Edge      Common neighbors
   e'1,3     1
   e'1,4     1
   e'2,3     0
   e'2,5     0
   e'3,4     1
   e'4,5     0

Supernode Creation 1: s1 and s3 merge into s13
   Edge      Common neighbors
   e'13,4    0
   e'2,5     0
   e'4,5     0

Supernode Creation 2: s13 and s4 merge into s134
   Edge      Common neighbors
   e'2,5     0

Supernode Creation 3: s2 and s5 merge into s25

Cliques for Graph G
   s134 = {v1, v3, v4}
   s25  = {v2, v5}
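The supernode sequence can be replayed in code. The Python sketch below is written for this example and is not the book's Algorithm 8.2: it repeatedly merges the edge whose endpoints have the most common neighbors, keeps only the common neighbors as neighbors of the new supernode, and breaks ties by taking the first edge in sorted order (an assumption that happens to match the order used in the example).

```python
def clique_partition(nodes, edges):
    """Greedy clique partitioning by supernode merging.
    Returns the cliques as sorted tuples of node numbers."""
    adj = {(n,): set() for n in nodes}
    for u, v in edges:
        adj[(u,)].add((v,))
        adj[(v,)].add((u,))
    while True:
        pairs = sorted((u, v) for u in adj for v in adj[u] if u < v)
        if not pairs:
            break
        # Edge with the most common neighbors; max() keeps the first on ties.
        u, v = max(pairs, key=lambda p: len(adj[p[0]] & adj[p[1]]))
        common = adj[u] & adj[v]
        merged = tuple(sorted(u + v))
        del adj[u], adj[v]
        for neighbors in adj.values():
            neighbors.discard(u)
            neighbors.discard(v)
        # Only nodes adjacent to both parts stay adjacent to the supernode.
        adj[merged] = set(common)
        for w in common:
            adj[w].add(merged)
    return sorted(adj)


G_EDGES = [(1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (4, 5)]
cliques = clique_partition([1, 2, 3, 4, 5], G_EDGES)
print(cliques)
```

In the first step the edge e2,3 disappears, because v2 is adjacent to v3 but not to v1 and therefore cannot join a clique that contains both.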

8.14



CLIQUE PARTITIONING FOR
REGISTER BINDING
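[Figure, DFG: operations +, *, -, /, +, +, &, | scheduled in control steps s1 to s4, with values v1 to v11.]

[Figure, lifetime intervals: for each variable v1 to v11, the interval between its write (W) and its last read (R) over control steps s1 to s4.]

[Figure, graph model: one node per variable v1 to v11, with an edge between two variables whose lifetimes do not overlap.]

A CP solution
   Cliques:
   r1 = {v1, v8}
   r2 = {v2, v3, v9}
   r3 = {v4, v5, v11}
   r4 = {v6, v7}
   r5 = {v10}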


8.15



REGISTER ALLOCATION USING
LEFT−EDGE ALGORITHM

Algorithm 8.3

page 285

8.16



Register Binding with Left Edge Algorithm

[Figure: two panels. Lifetimes: lifetime intervals across control steps s1 to s4, drawn in the order v1, v10, v4, v6, v2, v3, v5, v7, v8, v9, v11, v1', v2'. Registers: the same intervals packed into registers r1 to r5.]
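The left-edge idea can be sketched as follows: sort the lifetimes by start time, then fill one register at a time with the earliest-starting intervals that do not overlap what is already in it. This is a minimal sketch under stated assumptions, not a transcription of Algorithm 8.3: the half-open interval convention, the function name `left_edge` and the sample lifetimes are hypothetical.

```python
def left_edge(lifetimes):
    """Pack lifetimes [start, end) into registers, one register per pass."""
    # Sort by start time (the "left edge"); names break ties deterministically.
    remaining = sorted(lifetimes, key=lambda v: (lifetimes[v][0], v))
    registers = []
    while remaining:
        current, last_end, deferred = [], None, []
        for v in remaining:
            start, end = lifetimes[v]
            if last_end is None or start >= last_end:
                current.append(v)      # fits after the previous interval
                last_end = end
            else:
                deferred.append(v)     # overlaps; try again in a later register
        registers.append(current)
        remaining = deferred
    return registers

# Hypothetical lifetimes in control steps, not the ones drawn on the slide.
example = {"a": (0, 2), "b": (0, 1), "c": (1, 3), "d": (2, 4), "e": (3, 4)}
print(left_edge(example))  # [['a', 'd'], ['b', 'c', 'e']]
```

For interval lifetimes like these, the number of registers produced equals the largest number of variables alive at the same time.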

8.17



WEIGHTED BIPARTITE
MATCHING ALGORITHM

Algorithm 8.4

page 288

8.18



BIPARTITE MATCHING EXAMPLE
FOR REGISTER ALLOCATION
[Figure: three panels. Sorted Lifetime Intervals with Clusters: variables in the order v1, v10, v4, v6, v2, v3, v5, v7, v8, v9, v11, v1', v2' across control steps s1 to s4, grouped into Cluster 1 to Cluster 4. Bipartite Graph for Binding: register set R (r1 to r5) against variable set V, shown for the vars in Cluster 2 after Cluster 1. Final Register Binding: listed below.]

Final Register Binding:
r1 = {v1, v8, v1'}
r2 = {v9, v10}
r3 = {v4, v5, v11}
r4 = {v6, v7}
r5 = {v2, v3, v2'}
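One matching step, binding the variables of a cluster to registers, can be illustrated with a brute-force search in Python. This is only a sketch: it enumerates assignments instead of running a real weighted bipartite matching algorithm, and the weight function, the `free_at` and `starts` tables, and all names are hypothetical (the slides do not give the edge weights).

```python
from itertools import permutations

def best_assignment(variables, registers, weight):
    """Return the variable-to-register map with the highest total weight.

    weight(v, r) gives the edge weight, or None when v cannot use r.
    Brute force, so only suitable for small clusters.
    """
    best, best_score = None, None
    for regs in permutations(registers, len(variables)):
        score = 0
        for v, r in zip(variables, regs):
            w = weight(v, r)
            if w is None:
                break                  # illegal edge, discard this assignment
            score += w
        else:
            if best_score is None or score > best_score:
                best, best_score = dict(zip(variables, regs)), score
    return best

# Hypothetical data: the step at which each register becomes free after the
# previous cluster, and the step at which each new variable starts.
free_at = {"r1": 2, "r2": 3}
starts = {"p": 3, "q": 2}

def weight(v, r):
    # Unit weight for every legal edge (assumption).
    return 1 if starts[v] >= free_at[r] else None

binding = best_assignment(["p", "q"], ["r1", "r2"], weight)
print(binding)  # {'p': 'r2', 'q': 'r1'}
```

Binding `p` greedily to `r1` would leave `q` without a legal register, which is why the cluster is matched as a whole.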

8.19



PAIRWISE EXCHANGE
ALGORITHM

Algorithm 8.5

page 292

8.20



Interdependence of Scheduling and Allocation

Scheduling + Allocation => CU + DP

Scheduling: needs preliminary component selection


Allocation: needs "rough" schedule

Which one first?

Iteration and Interleaving

8.21



Allocation: Future Directions

Interaction between scheduling and allocation

Realistic cost functions

Allocation of memories

Different target architectures

8.22



Design Methodology
for High−Level Synthesis

Chapter 9
Source: Gajski, Dutt, Wu, Lin

"High−Level Synthesis"

Kluwer Academic Publishers, 1992

9.1



DESIGN METHODOLOGY
REQUIREMENTS

1. The syntax and semantics of the input and output descriptions.

2. The set of algorithms for translating input into output descriptions.

3. The set of components to be used in the design implementation.

4. The definition and ranges of design constraints.

5. The mechanism for selection of design styles, architectures, topologies and components.

6. Control strategies (usually called scenarios or scripts) that define synthesis tasks and the order in which they are executed.

9.2



TRIVIAL SYNTHESIS SYSTEM
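Assumptions:
Sample/clock cycle
Computation/2 clock cycles
Operation/clock cycle
Same bit width

[Figure: four panels. DFG: inputs a, b, c, d feed an adder (+) and a subtractor (−), whose results feed a multiplier (*) that produces y. Annotated DFG: the same graph with registers r1 to r4 on the inputs, r5 and r6 after the adder and subtractor, and r7 before the output y. Datapath: the register and unit structure of the annotated DFG. Synthesis system: Behavioral description, Hardware resources and Design constraints enter a chain of Compiler, Scheduler, Allocator and Netlist generator working on a DFG, followed by Physical design, which produces the ASIC description to manufacturing.]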
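A small Python sketch of how such a registered datapath behaves clock by clock, assuming the DFG computes y = (a + b) * (c − d) and that all registers load on the same clock edge. The function name `simulate_pipeline`, the zero initial register values and the sample data are assumptions for illustration.

```python
def simulate_pipeline(samples):
    """Clock the register pipeline once per sample; record r7 after each edge."""
    r1 = r2 = r3 = r4 = r5 = r6 = r7 = 0
    outputs = []
    for a, b, c, d in samples:
        # All registers update together, so the right-hand side uses old values.
        r7, r5, r6, r1, r2, r3, r4 = r5 * r6, r1 + r2, r3 - r4, a, b, c, d
        outputs.append(r7)
    return outputs

# Two real samples followed by two zero samples to flush the pipeline.
stream = [(1, 2, 5, 3), (2, 2, 4, 1), (0, 0, 0, 0), (0, 0, 0, 0)]
print(simulate_pipeline(stream))  # [0, 0, 6, 12]
```

In this model a new sample enters on every clock and each result reaches r7 two clocks after its operands were loaded into r1 to r4.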


9.3



PRACTICALITY OF ASSUMPTIONS

1. Not all units have the same bit width or the same propagation delay.

2. Dataflow architecture is too expensive.

3. I/O rates do not match architecture.

4. Synchronous I/O is not always available.

9.4



EXAMPLE WITH MEMORIES
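[Figure: DFG and Annotated DFG. Inputs a, b, c, d feed + and −, whose results feed * to produce y. The annotated version divides the graph into State 1, State 2 and State 3 and marks the registers used (r1 to r5).]

[Figure: FSMD implementation. Blocks: Memory 1, Memory 2, Address bus1, Address bus2, Control bus1, Control bus2, address registers AR1 and AR2, Control unit, Address generator, 16-bit registers r1 to r4 loaded with a, b, c, d, muxes, a +/− unit, a multiplier (*), and the 32-bit register r5.]

[Figure: Resource utilization. Rows: Memory1 (Load1, Load2, Load3), ALU (+ −, three times), Multiplier (*, three times), Memory2 (Store1, Store2, Store3).]

[Figure: Improved resource utilization. Same rows: Memory1 (Load1, Load2, Load3), ALU (+ −, three times), Multiplier (*, three times), Memory2 (Store1, Store2, Store3).]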


9.5



EXAMPLE SYNTHESIS SYSTEM

[Figure: Synthesis flow. Inputs: Behavioral description, Design constraints. HL synthesis: Compiler, Scheduler, Allocator and Netlist generator working on a DFG, with Style, architecture, resource selection. Logic/Sequential synthesis: Memory synthesis, Control synthesis, Functional synthesis, with the CDB. Final stage: Physical design.]

Synthesis system with a component database and user controlled resource selection

9.6



EXAMPLE SYNTHESIS SYSTEM

[Figure: Synthesis flow. Inputs: Behavioral description, Design constraints. HL synthesis: Compiler, Scheduler, Allocator and Netlist generator working on a CDFG, with Architecture, topology, resource selection. Logic/Sequential synthesis: Memory synthesis, Control synthesis, Functional synthesis, with the CDB. Final stage: Physical design. Additional blocks: Designer, Design quality assessment.]

Synthesis system with automatic iterative improvement

9.7



GENERIC SYNTHESIS SYSTEM

Completeness
1. All levels of design

2. Different target architectures

Extensibility
1. Addition of new algorithms and tools
2. Addition of new architecture styles

3. Addition of new libraries

Controllability
1. Control of tools

2. Control of design exploration

3. Quality metrics of design assessment

Interactivity
1. Partial design definition

2. Modification during and after synthesis

Upgradability
1. Capture−and−simulate to describe−and−synthesize

2. Mixing of strategies
9.8



HYPOTHETICAL SYNTHESIS
SYSTEM
[Figure: Hypothetical synthesis system. Blocks: System Specification, Designer, Conceptualization environment, System synthesis, Chip synthesis, Logic/Sequential synthesis, Physical design synthesis, SDB, CDB, Intermediate forms, Simulation code generators, Simulation suite. Output: ASIC description to manufacturing.]

1. Supports capture-and-simulate and describe-and-synthesize methodologies.

2. Separation of synthesis and simulation.

3. Hierarchical interactive synthesis.

2. Separation of synthesis and simulation.

3. Hierarchical interactive synthesis.

9.9



SYSTEM SYNTHESIS METHODOLOGY

[Figure: System synthesis flow. Blocks: System description, Compiler, SR, Estimator, Standard component binding, Partitioner, Interface & arbitration synthesis, Port minimization, Partitioned system description, Scheduling. Outputs: To chip synthesis, To RT synthesis.]

9.10



CHIP SYNTHESIS METHODOLOGY

[Figure: Chip synthesis flow. Blocks: Behavioral description, Compiler, CDFG, Scheduler, Storage allocator, Storage merger, Functional unit allocator, Interconnection allocator, Module selector, Technology mapper, Microarchitecture optimizer, Logic/Sequential synthesis. Side blocks: Architecture, topology, style selection; CDB. Output: To physical design.]

Technology mapping strategies:
1. Top-down
2. Meet-in-the-middle
3. Bottom-up

9.11



LOGIC−SYNTHESIS SYSTEM

[Figure: Logic-synthesis flow. Inputs: State tables, Boolean expressions, Timing diagrams, Memory specifications. Steps: State minimization, State encoding, Timing graph compiler, Interface synthesis, Memory synthesis, Logic minimization, Technology mapping. Output: Physical design.]

9.12



PHYSICAL DESIGN METHODOLOGY

[Figure: Physical design flow. Input: RT netlist. Blocks: Style selection, Partitioning (1D, 2D, 1Bit), Component instances from CDB, Stack partitioning, Glue logic partitioning, Floorplanning, Stack layout, Array layout, Glue logic layout, Routing. Output: To ASIC manufacturing.]

9.13



PHYSICAL DESIGN METHODOLOGY

[Figure: Datapath floorplan. Blocks: Register, Counter, Mux, ALU, Comparator, Glue Logic.]

[Figure: Chip floorplan. Blocks: Stack 1, Stack 2, Memory, A/D, Register file.]

9.14



SYSTEM DATABASES

Phase 1: Collection of tools

Phase 2: Tool integration

Phase 3: Common data model

Phase 4: Design views and consistency checks

9.15



DATABASE ARCHITECTURE

Design entity graphs: hierarchy, version control, configuration management

Design data graphs: behavior, structure, geometry, timing

[Figure: Database architecture. Blocks: Designer, Version manager, Transaction manager, Schema browser, Design entity manager, Design entity graph, Database interface, Design view manager, Design data manager, Design data representation graphs, Design tools, Consistency checker, Design quality evaluator.]

9.16



COMPONENT DATABASE

[Figure: Component database. Inputs and requests: Component descriptions, Component generators, Component-optimization tools, Component netlist, Schematic diagram, Component request, Component query. Front ends: Schematic capture, Schematic generator. Servers: Knowledge server, Component server (RT, logic, layout). Returned data: Component descriptions, Estimates. Component store: Fixed (Component descriptions), Parameterized (Component generators).]

9.17



CONCEPTUALIZATION
ENVIRONMENT

Data and design manager

Displays and editors


Design−quality estimators
Design−consistency checkers

Synthesis algorithms

9.18



BEHAVIORAL DESCRIPTION
DISPLAY

Variables: t, count, opd, result
Ports: R, A, B
Operators: <, +, −

[Figure: Flowchart. Begin; t = A<B, R = 0, result = 0; test t==1 selects either count = A, opd = B or count = B, opd = A; Join; loop: t = count > 0; test t==1 selects either the body (result = result + opd, count = count − 1, back to loop) or the exit (R = result); End.]

State table:

State   Condition   CondValue   Actions                            NextState
BEGIN                           R=0, result=0, t=(A<B)             test1
test1   (t)         ("1")                                          testT
                    ("0")                                          testF
testT                           count=A, opd=B                     join
testF                           count=B, opd=A                     join
join                                                               loop
loop                            t=(count>0)                        test2
test2   (t)         ("1")                                          body
                    ("0")                                          1
body                            result=result+opd, count=count−1   loop
1                               R=result                           END
END
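The state table can be exercised with a small interpreter. The Python sketch below follows the states and actions listed on the slide (BEGIN, test1, testT, testF, join, loop, test2, body, 1, END); the function name `run_state_table` and the zero initial values of the variables are assumptions made for illustration.

```python
def run_state_table(A, B):
    """Step through the states until END and return the value on port R."""
    R = result = count = opd = 0
    t = False
    state = "BEGIN"
    while state != "END":
        if state == "BEGIN":
            R, result, t = 0, 0, A < B
            state = "test1"
        elif state == "test1":
            state = "testT" if t else "testF"
        elif state == "testT":
            count, opd = A, B          # the smaller operand becomes the counter
            state = "join"
        elif state == "testF":
            count, opd = B, A
            state = "join"
        elif state == "join":
            state = "loop"
        elif state == "loop":
            t = count > 0
            state = "test2"
        elif state == "test2":
            state = "body" if t else "1"
        elif state == "body":
            result, count = result + opd, count - 1
            state = "loop"
        elif state == "1":
            R = result
            state = "END"
    return R

print(run_state_table(3, 5))  # 15
```

For non-negative inputs the described behavior is multiplication by repeated addition, with the smaller operand used as the loop count.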

9.19



FLOORPLAN DISPLAY

[Figure: Floorplan display. Labels, in order of appearance: Datapath1, bus1, Reg4.1, Reg4.2, Reg4.3, Memory, Control Unit, ALU4.1, ALU4.2, bus2, Datapath2, MUL4.1, Control Unit, Glue Logic, MUL4.2.]

9.20



INTERACTIVE SYNTHESIS

[Figure: Interactive synthesis tasks. Description capture, Description partitioning, Component selection, then Module/Port placement, Component binding and Connection allocation, then Scheduling. Output: To physical design.]

Possible scenarios for interactive synthesis

9.21



FUTURE DIRECTIONS

Complete synthesis systems/frameworks

Descriptions and modeling guidelines


Quality metrics and estimation
Component taxonomy and generators

Databases and environments

Design exploration strategies


Hardware/software codesign

9.22

