Architectural and System Synthesis: Camposano, J. Hofstede, Knapp, Macmillen Lin
Synthesis
Sources: DeMicheli, Mark Manwaring, Kia Bazargan, Giovanni De Micheli, Gupta, Youn-Long Lin, Camposano, J. Hofstede, Knapp, MacMillen, Lin
Outline
• Motivation.
• Compiling language models into abstract models.
• Behavioral-level optimization and program-level transformations.
• Architectural synthesis: an overview.
Architectural Synthesis
Architectural Synthesis Problem
• Specification
• A sequencing graph
• A set of functional resources
• characterized by area and execution delay
• Constraints
• Tasks
• Place operations in time and space
• Determine detailed interconnection and control
• This is what we need to do in behavioral synthesis! :)
• Architectural-level synthesis:
• Architectural abstraction level.
• Determine macroscopic structure.
• Example of synthesis: major building blocks.
• Logic-level synthesis:
• Logic abstraction level.
• Determine microscopic structure.
• Example of synthesis: logic gate interconnection.
Synthesis and optimization
Example of HDL description of architecture
diffeq {
read (x, y, u, dx, a);
repeat {
xl = x + dx;
ul = u - (3 * x * u * dx) - (3 * y * dx);
yl = y + u * dx;
c = x < a;
x = xl; u = ul; y = yl;
}
until ( c ) ;
write (y);
}
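The loop body of diffeq can be mirrored in software to see which operations the hardware must provide. The sketch below is only an illustration: the function name `diffeq_step` is made up here, and it models a single pass through the body (six multiplications, two additions or subtractions on `x` and `y`, two subtractions on `u`), not the `repeat ... until` control.

```python
def diffeq_step(x, y, u, dx):
    """One pass through the diffeq loop body; returns the new (x, u, y)."""
    xl = x + dx
    ul = u - (3 * x * u * dx) - (3 * y * dx)
    yl = y + u * dx
    return xl, ul, yl


# Integer inputs keep the arithmetic exact for a quick check.
print(diffeq_step(x=1, y=2, u=3, dx=1))  # (2, -12, 5)
```

The three results depend only on the old values of x, y and u, so the three assignments can be computed in parallel; this is the parallelism that scheduling later exploits.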
Example of structures to implement this architecture
Processes
control and
data
Principle of scheduling and allocation
[Figure: a CDFG scheduled over time units 1 to 4 and allocated to resources. Control step 1: +1 on adder A1. Control step 2: +3 on adder A2 and +2 on adder A1. Control step 3: *1 on multiplier M1.]
Internal representations
• Internal representation is the design back-bone of synthesis
• Representations
• Parse tree
• Control-flow graph (CFG)
• Data-flow graph (DFG, SFG)
• Control/data-flow graph (CDFG)
• Example:
e=a+b;
g=c+d;
f=e+b;
h=f*g;
[Figure: CDFG (control data flow graph) of this code. Operation +1 produces e from a and b, +2 produces g from c and d, +3 produces f, and *1 produces h.]
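As a rough illustration of how the four statements e=a+b; g=c+d; f=e+b; h=f*g become a data-flow graph, the sketch below records, for each assigned variable, its operator and operands, and then derives the data dependencies. The helper name `build_dfg` and the dictionary layout are assumptions made for this example, not a representation used by any particular tool.

```python
import re

statements = ["e=a+b", "g=c+d", "f=e+b", "h=f*g"]


def build_dfg(stmts):
    """Map each assigned variable to (operator, (left operand, right operand))."""
    dfg = {}
    for s in stmts:
        target, expr = s.split("=")
        left, op, right = re.match(r"(\w+)([+*])(\w+)", expr).groups()
        dfg[target] = (op, (left, right))
    return dfg


dfg = build_dfg(statements)
# Data dependencies: operands that are themselves produced by an operation.
deps = {v: [o for o in ops if o in dfg] for v, (_, ops) in dfg.items()}
print(deps)  # {'e': [], 'g': [], 'f': ['e'], 'h': ['f', 'g']}
```

Operations with an empty dependency list (here e and g) read only primary inputs and can start immediately.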
Example of trade-off in architectural design
Architectural-level synthesis motivation
• Raise input abstraction level.
1. Reduce specification of details.
2. Extend designer base.
3. Self-documenting design specifications.
4. Ease modifications and extensions.
[Figure: area trade-off among three architectures: Arch I, Arch II, Arch III.]
Stages of architectural-level synthesis
1. Translate HDL models into sequencing graphs.
2. Behavioral-level optimization:
1. Optimize abstract models independently from the implementation parameters.
software compilation.
hardware compilation.
High Level Synthesis Compilation Flow
Compilation and behavioral optimization
• Software compilation:
• Compile program into intermediate form.
• Optimize intermediate form.
• Generate target code for an architecture.
• Hardware compilation:
• Compile HDL model into sequencing graph.
• Optimize sequencing graph.
• Generate gate-level interconnection for a cell library.
Compilation
• Front-end:
1. Lexical and syntax analysis.
2. Parse-tree generation.
3. Macro-expansion.
4. Expansion of meta-variables.
• Semantic analysis:
1. Data-flow and control-flow analysis.
2. Type checking.
3. Resolve arithmetic and relational operators.
Parse tree example
a = p + q * r
Behavioral-level optimization
• Taxonomy:
1. Data-flow based transformations.
2. Control-flow based transformations.
Data-Flow Based Transformations (review)
1. Tree-height reduction.
2. Constant and variable propagation.
3. Common subexpression elimination.
4. Dead-code elimination.
5. Operator-strength reduction.
6. Code motion.
Tree-height reduction
• Goal:
• Split into two-operand expressions to best exploit hardware parallelism.
• Techniques:
• Balance the expression tree.
• Exploit commutativity, associativity and distributivity.
Example of tree-height reduction using commutativity and associativity
• Before: x = (a + (b * c)) + d
• After: x = (a + d) + (b * c)
Example of tree-height reduction using distributivity
• Before: x = a * (b * c * d + e)
• After: x = (a * b) * (c * d) + (a * e);
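The effect of both tree-height reduction examples can be checked by measuring the height of the expression trees. This is a minimal sketch under two assumptions: the second example is read as x = a * (b * c * d + e), and b * c * d is grouped left to right as (b * c) * d. The nested-tuple encoding and the function name `height` are illustrative only.

```python
def height(expr):
    """Height of an expression tree given as nested (op, left, right) tuples."""
    if isinstance(expr, str):  # a leaf (variable) has height 0
        return 0
    _, left, right = expr
    return 1 + max(height(left), height(right))


# Commutativity and associativity example.
before_assoc = ("+", ("+", "a", ("*", "b", "c")), "d")
after_assoc = ("+", ("+", "a", "d"), ("*", "b", "c"))

# Distributivity example.
before_dist = ("*", "a", ("+", ("*", ("*", "b", "c"), "d"), "e"))
after_dist = ("+", ("*", ("*", "a", "b"), ("*", "c", "d")), ("*", "a", "e"))

print(height(before_assoc), height(after_assoc))  # 3 2
print(height(before_dist), height(after_dist))    # 4 3
```

With unit-delay two-operand operators and enough resources, the tree height is the number of steps needed, so each transformation saves one step. The distributivity case pays for this with an extra multiplication.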
Examples of propagation
• First transformation type: constant propagation:
• a = 0; b = a + 1; c = 2 * b;
• becomes: a = 0; b = 1; c = 2;
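Constant propagation on straight-line code can be sketched as a single forward pass that remembers which variables currently hold known constants. The function name `propagate_constants` and the (target, expression) statement format are assumptions for this example, and `eval` is used only to keep the sketch short; a real compiler would fold constants on its own intermediate form.

```python
def propagate_constants(stmts):
    """stmts: list of (target, expression string). Returns the folded statements."""
    known = {}   # variable -> constant value discovered so far
    result = []
    for target, expr in stmts:
        try:
            # Evaluate using only the constants discovered so far.
            value = eval(expr, {"__builtins__": {}}, dict(known))
        except NameError:
            # The expression reads a variable whose value is not a known constant.
            result.append((target, expr))
            known.pop(target, None)
        else:
            known[target] = value
            result.append((target, str(value)))
    return result


print(propagate_constants([("a", "0"), ("b", "a + 1"), ("c", "2 * b")]))
# [('a', '0'), ('b', '1'), ('c', '2')]
```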
Common subexpression elimination
• Logic expressions:
• Performed by logic optimization.
• Kernel-based methods.
• Discussed earlier, together with factorization.
• Arithmetic expressions:
• Search isomorphic patterns in the parse trees.
• Example:
• a = x + y; b = a + 1; c = x + y;
• becomes: a = x + y; b = a + 1; c = a;
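For arithmetic expressions, the search for repeated patterns can be sketched with a table of already computed (operator, operand, operand) triples. The sketch assumes every variable is assigned only once, so no table entry ever has to be invalidated; the tuple format and the name `eliminate_common_subexpressions` are made up for this illustration.

```python
def eliminate_common_subexpressions(stmts):
    """stmts: list of (target, op, left, right); op is None for a plain copy."""
    available = {}  # (op, left, right) -> variable already holding that value
    result = []
    for target, op, left, right in stmts:
        key = (op, left, right)
        if op is not None and key in available:
            # Same computation seen before: replace it with a copy.
            result.append((target, None, available[key], None))
        else:
            result.append((target, op, left, right))
            if op is not None:
                available[key] = target
    return result


code = [("a", "+", "x", "y"), ("b", "+", "a", "1"), ("c", "+", "x", "y")]
print(eliminate_common_subexpressions(code))
# [('a', '+', 'x', 'y'), ('b', '+', 'a', '1'), ('c', None, 'a', None)]
```

The last statement becomes the copy c = a, matching the example above. The sketch does not exploit commutativity, so x + y and y + x would be treated as different patterns.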
Examples of other transformations
• Dead-code elimination:
• a = x; b = x +1; c = 2 * x;
• a = x; can be removed if not referenced.
• Operator-strength reduction:
• a = x^2; b = 3 * x;
• a = x * x; t = x << 1; b = x + t;
• Code motion:
• for (i = 1; i <= a * b) { }
• t = a * b; for (i = 1; i <= t) { }
• Multiplication only once.
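The operator-strength reduction and code motion examples can be checked numerically. The sketch below assumes integer operands, and for the loop it assumes the test is i <= a * b with i incremented once per iteration; the function names are illustrative only.

```python
def b_original(x):
    return 3 * x


def b_reduced(x):
    t = x << 1  # t = 2 * x, a shift instead of a multiplication
    return x + t


def loop_original(a, b):
    """Count multiplications when a * b is recomputed at every loop test."""
    mults, i = 0, 1
    while True:
        mults += 1  # a * b evaluated for this test
        if not i <= a * b:
            break
        i += 1
    return mults


def loop_hoisted(a, b):
    """Count multiplications when t = a * b is moved out of the loop."""
    t, mults, i = a * b, 1, 1
    while i <= t:
        i += 1
    return mults


assert all(b_original(x) == b_reduced(x) for x in range(-100, 101))
print(loop_original(2, 3), loop_hoisted(2, 3))  # 7 1
```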
Control-flow based transformations
1. Model expansion.
2. Conditional expansion.
3. Loop expansion.
4. Block-level transformations.
• (will be discussed in more detail separately, presented on Friday)
Model expansion
• Expand subroutine calls and flatten the hierarchy as a result.
• Example:
• y = ab; if (a) {x = b + d; } else {x = bd;}
• can be expanded to: x = a(b +d) +a’ bd
• and simplified as: y = ab; x = y +d(a +b)
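The expansion of the if/else example can be verified with a truth table. The sketch reads the expressions as Boolean: juxtaposition is AND, + is OR, and a' is NOT a. That reading, and the function names, are assumptions of this example.

```python
from itertools import product


def x_from_branch(a, b, d):
    """x as written with the conditional: if (a) x = b + d; else x = bd."""
    return (b or d) if a else (b and d)


def x_expanded(a, b, d):
    """x = a(b + d) + a'bd"""
    return (a and (b or d)) or ((not a) and b and d)


def x_simplified(a, b, d):
    """y = ab; x = y + d(a + b)"""
    y = a and b
    return y or (d and (a or b))


for a, b, d in product([False, True], repeat=3):
    assert x_from_branch(a, b, d) == x_expanded(a, b, d) == x_simplified(a, b, d)
print("all 8 input combinations agree")
```

The simplified form reuses y = ab, which is already computed, so the conditional disappears without adding a separate product term for ab.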
• Design space:
• Set of all feasible implementations.
• Implementation parameters:
• Area.
• Performance:
• Cycle-time.
• Latency.
• Throughput (for pipelined implementations).
• Power consumption
Three-dimensional design evaluation space
Hardware modeling
1. Circuit behavior:
• Sequencing graphs.
2. Building blocks:
• Resources.
3. Constraints:
• Timing and resource usage.
2. Memory resources:
• Store data.
• Example: memory and registers.
3. Interface resources:
• Example: busses and ports.
Functional resources
1. Standard resources:
• Existing macro-cells.
• Well characterized (area/delay).
• Example: adders, multipliers, ...
2. Application-specific resources:
• Circuits for specific tasks.
• Yet to be synthesized.
• Example: instruction decoder.
Resources and circuit families
• Resource-dominated circuits.
• Area and performance depend on few, well-characterized blocks.
• Example: DSP circuits.
• Timing constraints:
• Cycle-time.
• Latency of a set of operations.
• Time spacing between operation pairs.
• Resource constraints:
• Resource usage (or allocation).
• Partial binding.
Synthesis in the temporal domain
• Scheduling:
• Associate a start-time with each operation.
• Determine latency and parallelism of the implementation.
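An unconstrained ASAP (as soon as possible) schedule is one simple way to associate a start time with each operation. The sketch below is a minimal illustration on the four-operation example e=a+b; g=c+d; f=e+b; h=f*g, with the operation labels +1, +2, +3, *1 and unit delays assumed here; the function name `asap` and the data layout are not taken from any tool.

```python
def asap(preds, delay):
    """preds: op -> list of predecessor ops; delay: op -> execution delay.
    Returns op -> start step, with the first step numbered 1."""
    start = {}

    def start_time(op):
        if op not in start:
            # An operation starts once all of its predecessors have finished.
            start[op] = max((start_time(p) + delay[p] for p in preds[op]), default=1)
        return start[op]

    for op in preds:
        start_time(op)
    return start


preds = {"+1": [], "+2": [], "+3": ["+1"], "*1": ["+3", "+2"]}
delay = {op: 1 for op in preds}
schedule = asap(preds, delay)
latency = max(schedule[op] + delay[op] for op in preds) - 1
print(schedule, latency)  # {'+1': 1, '+2': 1, '+3': 2, '*1': 3} 3
```

Here +1 and +2 run in parallel in step 1, so this schedule needs two adders; a schedule that limits the number of adders may need more steps.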
Example of synthesis in the temporal domain
[Figure: result of ASAP scheduling, shown on the sequencing graph.]
Synthesis in the spatial domain
1. Binding:
• Associate each operation with a resource of the same type.
• Determine area of the implementation.
2. Sharing:
• Bind a resource to more than one operation.
• Operations must not execute concurrently.
• First ALU
• Second ALU
• Solution
• Four Multipliers
• Two ALUs
• Four Cycles
Binding specification
• Mapping from the vertex set to the set of resource instances, for each given type.
1. Partial binding:
• Partial mapping,
• given as design constraint.
2. Compatible binding:
• A binding that satisfies the constraints of the partial binding.
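A binding with sharing can be sketched as a greedy assignment: operations are visited in schedule order, and each one takes the first resource instance of its type that is free in its step. The sketch assumes unit-delay operations and an already given schedule, and it does not handle a partial binding supplied as a constraint; the function name `bind` and the instance numbering are assumptions.

```python
def bind(schedule, op_type):
    """schedule: op -> start step; op_type: op -> resource type.
    Returns op -> (resource type, instance index)."""
    busy = {}     # (type, instance index) -> set of steps already in use
    binding = {}
    for op in sorted(schedule, key=schedule.get):
        rtype, step, k = op_type[op], schedule[op], 0
        while step in busy.get((rtype, k), set()):
            k += 1  # this instance is taken in this step, try the next one
        busy.setdefault((rtype, k), set()).add(step)
        binding[op] = (rtype, k)
    return binding


schedule = {"+1": 1, "+2": 1, "+3": 2, "*1": 3}
op_type = {"+1": "adder", "+2": "adder", "+3": "adder", "*1": "multiplier"}
print(bind(schedule, op_type))
# {'+1': ('adder', 0), '+2': ('adder', 1), '+3': ('adder', 0), '*1': ('multiplier', 0)}
```

Operations +1 and +3 share adder 0 because they execute in different steps, while +1 and +2 are concurrent and need separate adders.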
Example of Binding specification
• Binding to the same multiplier
Estimation: area, latency, cycle time
• Resource-dominated circuits.
• Area = sum of the area of the resources bound to the operations.
• Determined by binding.
• Latency = start time of the sink operation (minus start time of the source operation).
• Determined by scheduling.
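For a resource-dominated circuit, the two estimates follow directly from a binding and a schedule. In the sketch below the area figures (adder = 1, multiplier = 5) are invented units, the source and sink are modeled as extra entries in the start-time table, and the function name `estimate` is hypothetical.

```python
def estimate(binding, start, area_of, source="source", sink="sink"):
    """Area from the binding, latency from the schedule."""
    instances = set(binding.values())  # a shared instance is counted once
    area = sum(area_of[rtype] for rtype, _ in instances)
    latency = start[sink] - start[source]
    return area, latency


binding = {"+1": ("adder", 0), "+2": ("adder", 1),
           "+3": ("adder", 0), "*1": ("multiplier", 0)}
start = {"source": 1, "+1": 1, "+2": 1, "+3": 2, "*1": 3, "sink": 4}
area_of = {"adder": 1, "multiplier": 5}  # made-up area units

print(estimate(binding, start, area_of))  # (7, 3)
```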
2. Cycle-time/latency trade-off,
• for some binding (area).
3. Area/cycle-time trade-off,
• for some schedules (latency).
Area/latency trade-off for various cycle times
• Area/latency for cycle time = 30
• Area/latency for cycle time = 40
Pareto points in three dimensions
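Pareto points in the three-dimensional space (area, latency, cycle time) are the design points that no other point beats or equals in all three dimensions at once. A minimal sketch, with invented design points and a hypothetical function name:

```python
def pareto_points(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    def dominates(p, q):
        # p is at least as good as q everywhere and differs from q.
        return all(a <= b for a, b in zip(p, q)) and p != q

    return [q for q in points if not any(dominates(p, q) for p in points)]


# (area, latency, cycle time); the values are made up for illustration.
designs = [(5, 4, 30), (7, 3, 30), (7, 4, 30), (4, 6, 40), (6, 6, 40)]
print(pareto_points(designs))  # [(5, 4, 30), (7, 3, 30), (4, 6, 40)]
```

The point (7, 4, 30) is dropped because (5, 4, 30) has a smaller area at the same latency and cycle time, and (6, 6, 40) is dropped for the same reason against (4, 6, 40).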
Area-latency trade-off
• Rationale:
• Cycle-time dictated by system constraints.
• Resource-dominated circuits:
• Area is determined by resource usage.
• Approaches:
1. Schedule for minimum latency under resource constraints.
2. Schedule for minimum resource usage under latency constraints,
• for varying constraints.
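The first approach, scheduling for low latency under resource constraints, is often attacked with a list-scheduling heuristic. The sketch below is one such heuristic, not a method named in these slides: it assumes unit-delay operations, at least one instance of every resource type, and a fixed priority given by dictionary order, so the result is not guaranteed to be the minimum latency. All names are illustrative.

```python
def list_schedule(preds, op_type, limit):
    """preds: op -> predecessor ops; limit: resource type -> available instances.
    Returns op -> start step for unit-delay operations."""
    start, step = {}, 1
    while len(start) < len(preds):
        used = {}  # resource type -> instances taken in this step
        for op in preds:  # fixed priority: dictionary order
            if op in start:
                continue
            # Ready only if every predecessor started in an earlier step.
            ready = all(p in start and start[p] < step for p in preds[op])
            rtype = op_type[op]
            if ready and used.get(rtype, 0) < limit[rtype]:
                start[op] = step
                used[rtype] = used.get(rtype, 0) + 1
        step += 1
    return start


preds = {"+1": [], "+2": [], "+3": ["+1"], "*1": ["+3", "+2"]}
op_type = {"+1": "adder", "+2": "adder", "+3": "adder", "*1": "multiplier"}

print(list_schedule(preds, op_type, {"adder": 1, "multiplier": 1}))
# {'+1': 1, '+2': 2, '+3': 3, '*1': 4}
print(list_schedule(preds, op_type, {"adder": 2, "multiplier": 1}))
# {'+1': 1, '+2': 1, '+3': 2, '*1': 3}
```

Running the scheduler for several resource limits, as in the two calls above, yields the (area, latency) pairs from which the trade-off curve is built.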
Summary on behavioral and architectural synthesis and optimization
• Behavioral optimization:
• Create abstract models from HDL models.
• Optimize models without considering implementation parameters.
REGISTER ALLOCATION
DATAPATH GENERATION
AND CONTROLLER
SYNTHESIS
WRITE VHDL
High Level Synthesis for low power
for(I=0;I<=2;I=I+1) begin
@(posedge clk);
if(fgb[I]%8; begin
p=rgb[I]%8;
g=filter(x,y)*8;
end
............
[Figure: high level synthesis turns a specification with constraints into an RTL (register transfer level) architecture made of control, datapath and memory. Labels: instructions, operators, operations, variables, registers, arrays, memory, multiplexor, control signals. Steps: scheduling, hardware allocation, memory inferencing, register sharing, control inferencing.]
Low Power design
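$$\mathrm{Power}(\mathrm{Register}) = \mathrm{switching}(x)\,(C_{out,Mux} + C_{in,Register}) + \mathrm{switching}(y)\,(C_{out,Register} + C_{in,DeMux})$$
If $\mathrm{switching}(x) = \mathrm{switching}(y)$, then $\mathrm{Power}(\mathrm{Register}) = \mathrm{switching}(y) \times C_{total}$, where $C_{total} = C_{out,Mux} + C_{in,Register} + C_{out,Register} + C_{in,DeMux}$.
[Figure: under control, a MUX selects among inputs i, j, k and drives the register input x; the register output y drives a DeMux with outputs i*, j*, k*. The capacitances Cout,Mux, Cin,Register, Cout,Register and Cin,DeMux are marked on the corresponding nets.]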
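The register power expression is easy to evaluate once switching activities and capacitances are known. The sketch below only restates the formula; the parameter names and the numbers are invented, and units are left abstract.

```python
def register_power(sw_x, sw_y, c_out_mux, c_in_reg, c_out_reg, c_in_demux):
    """Power(Register) as switching activity times switched capacitance."""
    return sw_x * (c_out_mux + c_in_reg) + sw_y * (c_out_reg + c_in_demux)


# With equal switching on x and y the result is switching * C_total.
sw = 3
caps = (2, 4, 5, 1)  # made-up capacitance values
c_total = sum(caps)
print(register_power(sw, sw, *caps), sw * c_total)  # 36 36
```

Because power scales with switching activity, a binding that assigns variables with correlated values to the same register lowers the switching terms without changing the capacitances.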
Comparison of benchmarks for low power synthesis methods
[Figure: bar chart of power in mW (axis 0 to 100) for the non-low-power and low-power designs on the benchmarks Example, Cascade, Fir11, IIR and Wave, together with the % power reduction (axis 10 to 25).]
CDFG Synthesis
Structural
RTL
Transformation
Design Flow of specialized high level synthesis systems
• Synthesizable (and executable) specification
• Loop pipelining
• Retiming
• Memory design
• Reset, clock
• Interface design
• Clocks
• Resets
• Registered outputs
• Loop pipelining
Behavioral Specification Languages
• Add hardware-specific constructs to existing languages
• HardwareC
• Popular HDL
• Verilog, VHDL
• Synthesis-oriented HDL
• UDL/I
VHDL synthesis tools
RTL-synthesis
• FU allocation
• Limited register allocation
• Interconnect allocation
• Binding
• Logic and physical synthesis
Behavioral synthesis
• HL Optimizations
• Scheduling
• RTL-synthesis
Many issues that arise in VLSI chip design do not arise in FPGA or architectural synthesis, which use ready-made blocks.
Chip Synthesis
System on a chip
Variants of the robot system
Decomposition is not the same as partitioning
System “knows” typical blocks and libraries of commercial components
Example of a System-on-a-Chip
[Figure: blocks labeled Bridge, USB and External Memory Interface.]
Paradigm Shift
[Figure labels: Library/IP Vendors, Integrators, (Chipless), EDA Vendors.]
Move of EDA vendors to production
Essential Current and Open Issues in Design Automation
• Behavioral Specification Languages
• From Matlab to chip, from Prolog to chip, etc.
• Target Architectures
• Network on a chip, sensors and motion control integrated.
• Intermediate Representation
• For users to exchange, to understand the design better
• Operation Scheduling
• On the level of complex operations such as transforms or filters.
• Allocation/Binding
• On many levels of operations and processors
• Control Generation
• State machine optimization for large controllers
• New technologies, integrate FSM-logic-layout
Still areas of active research
Future research areas in High Level Synthesis
• System level design
• Software-hardware system co-design
• Reuse
• Intellectual Property (IP) or Virtual Components (VC)
• Synthesis utilizing IP
• Synthesizing IPs
More than
IP Wrapper just the
Port Interface
IP
Future Directions for system design
• Realistic Methodology
• Evolutional Transition from Current Practice
• Domain Specific
• IP-Centric
• As both Authoring Aid and Integrator
• Software research
• Co-design and Code Generation
Needs better collaboration of universities and companies
Exam Problem 1
1. Allocate to time
2. Allocate to logic blocks
3. Design a complete controller
4. Design a controller for pipelined design
Exam Problem 3: Scheduling
• Set area constraint
• 2 multipliers
• 2 general-purpose ALUs
• Set the cycle time = latency of a multiplier
• Goal: minimize latency of circuit
Exam Problem 4
1. Give the set of functional resources: two multipliers, two ALUs.
2. Scheduling example with the constraints (set two constraints, then optimize the third).
3. We need to maintain the data dependencies (e.g. vertex 6 must be scheduled at least one cycle after vertex 1).
4. This is the same differential equation dataflow graph from a previous slide.
5. Edges that are not necessary to show dependencies between vertices have been removed.
6. Complete this problem.
Exam Problem 5: Binding
Second Exam Problem 5: Competition for Students
1. The student with the smallest area gets a prize, and the student with the smallest latency gets a prize.