

A layout is the geometric description of a circuit: a set of planar geometric shapes
arranged in several layers. The process of converting the specifications of an electrical
circuit into a layout is called physical design.

Due to the large number of components and the fine details required by the fabrication
process, physical design is not practical without the help of computers. As
a result, almost all phases of physical design extensively use computer-aided design
(CAD) tools and many phases are either partially or fully automated. This automation of
the physical design process has increased the level of integration, reduced the turnaround
time, and enhanced chip performance.
Various CAD tools are available on the market, and each has its own strengths and
weaknesses. Electronic Design Automation (EDA) companies such as Cadence, Synopsys,
Magma, and Mentor Graphics provide these CAD tools.
VLSI physical design automation deals mainly with the study of algorithms related to
the physical design process. The objective is to study optimal arrangements of devices on
a plane (or in a three-dimensional space) and various interconnection schemes between
these devices to obtain the desired functionality. Because space on a wafer is very
expensive, algorithms must use the space very efficiently to decrease the costs and
improve the yield. In addition, the arrangement of devices (placement) plays a key role in
determining the performance of a chip. Algorithms for physical design must also ensure
that all the rules required by the fabrication are followed and that the layout is within the
tolerance limits of the fabrication process. Finally, algorithms must be efficient and
should be able to handle very large designs. Efficient algorithms not only lead to fast
turnaround time, but also permit designers to iteratively improve the layouts.
Producing a packaged VLSI chip follows a sequence of steps popularly known as the VLSI
design cycle. This design cycle is normally represented by a flow chart, shown below.
The various steps involved in the design cycle are elaborated below.
(i). System specification: The specifications of the system to be designed are stated
precisely in this step. It considers performance, functionality, and the physical
dimensions of the design. The choice of fabrication technology and design techniques
are also considered. The end results are specifications for the size, speed, power, and
functionality of the VLSI system to be designed.
(ii) Functional design: In this step, behavioral aspects of the system are considered. The
outcome is usually a timing diagram or other relationships between sub-units. This
information is used to improve the overall design process and to reduce the complexity of
the subsequent phases.

(iii). Logic design: In this step, the functional design is converted into a logical design,
using the Boolean expressions. These expressions are minimized to achieve the smallest
logic design which conforms to the functional design. This logic design of the system is
simulated and tested to verify its correctness.
(iv). Circuit design: This step converts the Boolean expressions into a circuit
representation, taking into consideration the speed and power requirements of the
original design. The electrical behavior of the various components is also considered
in this phase. The circuit design is usually expressed in a detailed circuit diagram.
(v). Physical design: In this step, the circuit representation of each component is
converted into a geometric representation. This representation is a set of geometric
patterns which perform the intended logic function of the corresponding component.
Connections between different components are also expressed as geometric patterns.
(This geometric representation of a circuit is called a layout). The exact details of the
layout also depend on design rules, which are guidelines based on the limitations of the
fabrication process and the electrical properties of the fabrication materials. Physical
design is a very complex process, so it is usually broken down into various sub-steps in
order to handle the complexity of the problem.
(vi). Design verification: In this step, the layout is verified to ensure that the layout
meets the system specifications and the fabrication requirements. Design verification
consists of design rule checking (DRC) and circuit extraction. DRC is a process which
verifies that all geometric patterns meet the design rules imposed by the fabrication
process. After checking the layout for design rule violations and removing them, the
functionality of the layout is verified by circuit extraction. This is a reverse engineering
process and generates the circuit representation from the layout. This reverse engineered
circuit representation can then be compared to the original circuit representation to verify
the correctness of the layout.
(vii). Fabrication: This step follows design verification. The fabrication process
consists of several steps, such as preparation of the wafer and deposition and diffusion
of various materials on the wafer according to the layout description. A typical wafer is
10 cm in diameter and can be used to produce between 12 and 30 chips. Before the chip is
mass produced, a prototype is made and tested.
(viii). Packaging, testing, and debugging : In this step, the chip is fabricated and diced
in a fabrication facility. Each chip is then packaged and tested to ensure that it meets all
the design specifications and that it functions properly. Chips used in printed circuit
boards (PCBs) are packaged in a dual in-line package (DIP) or pin grid array (PGA).
Chips which are to be used in a multichip module (MCM) are not packaged because
MCMs use bare or naked chips.
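The design rule check described in step (vi) can be illustrated with a minimal sketch. It assumes axis-aligned rectangles on a single layer and a single hypothetical rule, a minimum spacing `min_space`; the names `Rect`, `spacing`, and `drc_spacing_violations` are illustrative and do not come from any real DRC tool.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with lower-left (x1, y1) and upper-right (x2, y2)."""
    x1: int
    y1: int
    x2: int
    y2: int


def spacing(a, b):
    """Euclidean gap between two rectangles (0.0 if they touch or overlap)."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0)
    return (dx * dx + dy * dy) ** 0.5


def drc_spacing_violations(rects, min_space):
    """Return index pairs of separate shapes that are closer than min_space."""
    violations = []
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        gap = spacing(a, b)
        # Assumption: touching or overlapping shapes are treated as connected.
        if 0 < gap < min_space:
            violations.append((i, j))
    return violations


shapes = [Rect(0, 0, 4, 4), Rect(6, 0, 10, 4), Rect(13, 0, 17, 4)]
print(drc_spacing_violations(shapes, min_space=3))
```

A real rule deck contains many more rule types (width, enclosure, overlap, and so on); this sketch checks only one of them.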
The physical design cycle converts a circuit diagram into a layout. This complex task is
completed in several steps, such as partitioning, floor-planning, placement, routing, and
layout compaction. The details of these steps are given below.
(a). Partitioning: Laying out a chip is a complex task, so it is divided into several
smaller tasks. A chip may contain several million transistors, and the layout of the
entire circuit cannot be handled at once because of the limited memory space and
computation power available.
Therefore, it is normally partitioned by grouping the components into blocks. The actual
partitioning process considers many factors such as size of the blocks, number of blocks,
and number of interconnections between the blocks. The output of partitioning is a set of
blocks along with the interconnections required between blocks. The set of
interconnections required is referred to as a net list. In large circuits the partitioning
process is hierarchical and at the topmost level a chip may have between 5 and 25 blocks.
Each module is then partitioned recursively into smaller blocks.
A disadvantage of the partitioning process is that it may degrade the performance of the
final design. During partitioning, critical components should be assigned to the same
partition. If such an assignment is not possible, then appropriate timing constraints must
be generated to keep the two critical components close together. Usually, several
components, forming a critical path, determine the chip performance. If each component
is assigned to a different partition, the critical path may be too long. Minimizing the
length of critical paths improves system performance.
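One of the factors mentioned above, the number of interconnections between blocks, can be measured with a short sketch. It assumes each net is given as a tuple of component names and that a partition is a mapping from component to block index; `cut_size` is a hypothetical helper name.

```python
def cut_size(nets, block_of):
    """Count the nets whose components are spread over more than one block."""
    return sum(1 for net in nets if len({block_of[c] for c in net}) > 1)


nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]

# Two candidate partitions of the same four components into two blocks.
good = {"a": 0, "b": 0, "c": 1, "d": 1}
bad = {"a": 0, "b": 1, "c": 0, "d": 1}

print(cut_size(nets, good), cut_size(nets, bad))
```

A partitioning algorithm would search for an assignment with a small cut size subject to limits on block size and number of blocks; this sketch only evaluates a given assignment.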
After a chip has been partitioned, each of the sub-circuits must be placed on a fixed plane
and the nets between all the partitions must be interconnected. The placement of the
sub-circuits is done by placement algorithms and the nets are routed by routing
algorithms.
(b) Placement: Placement is the process of arranging a set of modules on the layout
surface. Each module has a fixed shape and fixed terminal locations. A poor placement
uses a larger area and hence degrades performance.
The placement process determines the exact positions of the blocks on the chip, so as to
find a minimum area arrangement for the blocks that allows completion of
interconnections between the blocks. Placement is typically done in two phases. In the
first phase an initial placement is created. In the second phase the initial placement is
evaluated and iterative improvements are made until the layout has minimum area and
conforms to design specifications.
It is important to note that some space between the blocks is intentionally left empty to
allow interconnections between blocks. Placement may lead to un-routable design, i.e.,
routing may not be possible in the space provided. Thus, another iteration of placement is
necessary. To limit the number of iterations of the placement algorithm, an estimate of the
required routing space is used during the placement phase. Good routing and circuit
performance depend heavily on a good placement algorithm. This is due to the fact that
once the position of each block is fixed, very little can be done to improve the routing and
the overall circuit performance.
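A placement tool needs a quick estimate of the wiring a placement will require. One common estimator, used here as an assumption rather than as the method of this text, is the half-perimeter wire length (HPWL): for each net, the half perimeter of the smallest rectangle enclosing its pins. The names `hpwl` and `total_wirelength` are illustrative.

```python
def hpwl(points):
    """Half perimeter of the bounding box of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, position):
    """Sum of HPWL over all nets; position maps block name -> (x, y)."""
    return sum(hpwl([position[block] for block in net]) for net in nets)


position = {"A": (0, 0), "B": (3, 4), "C": (1, 1)}
nets = [("A", "B"), ("A", "B", "C")]
print(total_wirelength(nets, position))
```

Each block is treated as a single point here; a more detailed estimate would use the actual terminal locations on each block.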
There are various types of placement.
System-level placement: All the PCBs are placed together such that the area occupied is
minimum and heat dissipation is within limits.
Board-level placement: All the chips have to be placed on a PCB. The area is fixed and all
modules are rectangular. The objective is to minimize the number of routing layers and
to meet system performance requirements.
Chip-level placement: Normally, floor planning and placement are carried out along with
pin assignment. Only a limited number of routing layers (2 to 4) is available. A bad
placement may be detected only later, during routing, which causes costly delays in the
design cycle. The objective is minimization of area.
Floor-plan design is an important step in the physical design of VLSI circuits: it plans
the positions of a set of circuit modules on a chip in order to optimize the circuit.
In floor-planning, information about a set of modules, including their areas and
interconnections, is considered, and the goal is to plan their positions on a chip to
minimize the total chip area and interconnect cost.
In the floor planning phase, the macro cells are positioned on the layout surface in such
a way that no blocks overlap and that there is enough space left to complete the
interconnections. The input for the floor planning is a set of modules, a list of terminals
(pins for interconnections) for each module and a net list, which describes the terminals
which have to be connected.
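The floor planning input and the no-overlap requirement can be sketched as follows. This is a minimal illustration, assuming rectangular modules placed at integer coordinates; `Module`, `overlaps`, `is_legal`, and `bounding_area` are hypothetical names, and terminals and the net list are omitted for brevity.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Module:
    name: str
    width: int
    height: int
    x: int = 0  # lower-left corner on the layout surface
    y: int = 0


def overlaps(a, b):
    """True if the interiors of two modules intersect (shared edges are allowed)."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def is_legal(modules):
    """A floor plan is legal when no two modules overlap."""
    return not any(overlaps(a, b) for a, b in combinations(modules, 2))


def bounding_area(modules):
    """Area of the smallest rectangle enclosing all modules."""
    width = max(m.x + m.width for m in modules) - min(m.x for m in modules)
    height = max(m.y + m.height for m in modules) - min(m.y for m in modules)
    return width * height


plan = [Module("A", 4, 3, 0, 0), Module("B", 2, 3, 4, 0)]
print(is_legal(plan), bounding_area(plan))
```

A floor planner would search over positions (and module shapes) for a legal plan with small area; space for interconnections is not modeled in this sketch.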

Different approaches to the floor planning problem have been followed. Wimer et al. describe a
branch and bound approach for the floor plan sizing problem, i.e. finding an optimal
combination of all possible layout-alternatives for all modules after placement. While
their algorithm is able to find the best solution for this problem, it is very time
consuming, especially for real problem instances. Cohoon et al. implemented a genetic
algorithm for the whole floor planning problem. Their algorithm makes use of estimates
for the required routing space to ensure completion of the interconnections. Another,
more often used heuristic solution method for placement is simulated annealing.
(c) Routing: The main objective in this step is to complete the interconnections between
blocks according to the specified netlist. First, the space not occupied by the blocks
(called the routing space) is partitioned into rectangular regions called channels and
switchboxes. The goal of a router is to complete all circuit connections using the shortest
possible wire length and using only the channels and switchboxes. This is usually done in
two phases, referred to as the global routing and detailed routing phases.
In global routing, connections are completed between the proper blocks of the circuit
disregarding the exact geometric details of each wire and pin. For each wire, the global
router finds a list of channels which are to be used as a passage way for that wire. In other
words, global routing specifies the loose route of a wire through different regions in
the routing space.
Global routing is followed by detailed routing, which completes point-to-point
connections between pins on the blocks. Loose routing is converted into exact routing by
specifying geometric information such as width of wires and their layer assignments.
Detailed routing includes channel routing and switchbox routing.
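To make the idea of finding an exact path for one connection concrete, here is a small sketch of grid-based maze routing in the style of Lee's algorithm (breadth-first search). Note that this is a different, simpler technique than the channel and switchbox routing named above; the grid encoding (`#` for blocked cells) and the name `maze_route` are assumptions made for illustration.

```python
from collections import deque


def maze_route(grid, src, dst):
    """Shortest rectilinear path from src to dst avoiding '#' cells, or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # visited cells and the cell each was reached from
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:   # walk back from target to source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no path exists: the connection cannot be completed


grid = ["....",
        ".##.",
        "...."]
print(maze_route(grid, (1, 0), (1, 3)))
```

The `None` result corresponds to the situation in which a connection cannot be routed in the space provided.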
Because all problems in routing are computationally hard, researchers have focused on
heuristic algorithms. As a result, experimental evaluation has become an integral part of
all algorithms and several benchmarks have been standardized. Due to the nature of the
routing algorithms, complete routing of all the connections cannot be guaranteed in many
cases.
(d). Compaction: Compaction is the operation of minimizing the layout area without
violating the design rules and without altering the original functionality of the layout.
The input of compaction is a layout and the output is also a layout, but with a smaller area.
Compaction is done in three ways:
(i) By reducing the space between blocks without violating the design spacing rules.
(ii) By reducing the size of each block without violating the design size rules.
(iii) By reshaping blocks without violating their electrical characteristics.
Compaction is therefore a very complex process, because it requires knowledge of all the
design rules. Depending on the strategy used, compaction algorithms are divided into
one-dimensional algorithms (in either the x-dimension or the y-dimension),
two-dimensional algorithms (in both the x-dimension and the y-dimension), and topological
algorithms (moving separate cells according to routing constraints).
Types of compaction techniques:
(i) 1-Dimensional compaction:

In this technique compaction is done in only one dimension, either the x-direction or the
y-direction, until no further compaction is possible. Two types of constraints apply to
this compaction technique: (i) separation constraints and (ii) connectivity constraints.
(ii). 2-Dimensional compaction:
In this method compaction is done in both the x-dimension and the y-dimension. 2-D
compaction is in general much better than 1-D compaction. If solved optimally, 2-D
compaction produces minimum-area layouts. The trade-off is that this technique is very
time consuming. Thus 3/2-D compaction is used.
(iii) 3/2-D Compaction:
In this technique the blocks are moved in such a way that the move not only compacts the
circuit but also resolves interferences, although the geometry is not as free as in 2-D
compaction. In this method two lists are formed, a ceiling and a floor. The ceiling
consists of the blocks that are visible from the top, and the floor consists of the blocks
that are visible from the bottom. The algorithm selects the lowest block in the ceiling
list and moves it to the place on the floor which maximizes the gap between floor and
ceiling. The process continues until all blocks have been moved from the ceiling to the floor.
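One-dimensional compaction with separation constraints, as in (i) above, can be sketched as follows. The sketch assumes a legal input layout of non-overlapping rectangles, compacts only in the x-direction toward x = 0, and uses a single hypothetical spacing rule `min_sep`; connectivity constraints are ignored, and `compact_x` is an illustrative name.

```python
def compact_x(blocks, min_sep):
    """Push blocks left as far as separation constraints allow.

    blocks maps name -> (x, y, width, height); returns name -> new x.
    """
    new_x = {}
    # Process blocks from left to right so every left neighbour is placed first.
    for name in sorted(blocks, key=lambda n: blocks[n][0]):
        _, y, _, h = blocks[name]
        left = 0
        for other in new_x:
            _, oy, ow, oh = blocks[other]
            # A separation constraint applies only if the vertical spans overlap.
            if y < oy + oh and oy < y + h:
                left = max(left, new_x[other] + ow + min_sep)
        new_x[name] = left
    return new_x


layout = {"A": (2, 0, 3, 2), "B": (10, 0, 2, 2), "C": (8, 5, 4, 2)}
print(compact_x(layout, min_sep=1))
```

Block B must stay to the right of A because they share a vertical span, while C is free to slide all the way to the left edge.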
Though the partitioning of physical design decomposes it into several conceptually
easier steps, each step is still computationally very hard. So, in order to reduce the
complexity of physical design and to get a high yield, certain restricted models and
design styles have been proposed. They are (i) full-custom design style, (ii) standard
cell design style, (iii) gate array design style, and (iv) field programmable gate array
(FPGA) design style.
(i).Full-custom design style :
It is the most general form of layout in which the circuit is partitioned into a collection
of sub-circuits according to some criteria such as functionality of each sub-circuit. In this
design style, each sub-circuit is called a functional block or simply a block. The full
custom design style allows functional blocks to be of any size. Blocks can be placed at
any location on the chip surface without restriction. In other words, this style is
characterized by the absence of any constraints on the physical design process. This
design style allows for very compact designs. However, automating a full-custom design
style is much more complex than automating the restricted models. For this reason, it is
used only when the final design must have a minimum area and design time is less of a
factor. The full-custom structure of a design is shown below.
The automation process for a full-custom layout is still a topic of intensive research.
Some phases of physical design of a full-custom chip may be done manually to optimize
the layout. Layout compaction is a very important aspect in full-custom. The rectangular
solid boxes around the boundary of the circuit are called IO pads.

Pads are used to complete interconnections between chips or interconnections between
chip and the board. The space not occupied by blocks is used for routing of
interconnecting wires. Initially all the blocks are placed within the chip area, with the
objective of minimizing the total area. However, enough space must be left between the
blocks to complete the routing. Usually several metal layers are used for routing
interconnections. Currently, two metal layers are common for routing and the three-metal
layer process is gaining acceptance, as the fabrication costs become more feasible. The
routing area needed between the blocks becomes increasingly smaller as more routing
layers are used. This is because some routing is done on top of the transistors in the
additional metal layers. If all the routing can be done on top of the transistors, the total
chip area is determined by the area of the transistors.
In a hierarchical design of circuit each block in full-custom design may be very complex
and may consist of several sub blocks, which in turn may be designed using the full-
custom design style or other design styles. It is clear that as any block is allowed to be
placed anywhere on the chip, the problem of optimizing area and interconnection of wires
becomes difficult. Full-custom design is very time consuming; thus, the method is
inappropriate for very large circuits unless performance is of utmost importance.
Full-custom is usually used for the layout of chips such as microprocessors.
(ii). Standard cell design style: This is a more restricted design style, and the design
process is simpler than in the full-custom design style. Standard cell methodology considers
the layout to consist of rectangular cells of the same height. Initially, a circuit is
partitioned into several smaller blocks, each of which is equivalent to some predefined
sub-circuit or cell. The functionality and electrical characteristics of each predefined cell
are tested, analyzed, and specified. A collection of these cells is called a cell library,
usually consisting of 200-400 cells. Terminals on cells may be located either on the
boundary or in the center of the cells. Cells are placed in rows and the space between two
rows is called a channel. These channels are used to perform interconnections between
cells. If two cells to be interconnected lie in the same row or in adjacent rows, then the
channel between the rows is used for interconnection. However, if two cells to be
connected lie in two nonadjacent rows, then their interconnection wire passes through the
empty space between any two cells, called a feedthrough.
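The rule above can be turned into a small calculation of how many feedthroughs a placement needs in each row. This is a sketch under simplifying assumptions: rows are numbered consecutively, every net has exactly two pins, and a connection between nonadjacent rows uses one feedthrough in each row strictly between them. The names `feedthrough_rows` and `feedthrough_demand` are illustrative.

```python
from collections import Counter


def feedthrough_rows(row_a, row_b):
    """Rows a wire must cross when connecting cells in row_a and row_b."""
    lo, hi = sorted((row_a, row_b))
    return list(range(lo + 1, hi))  # empty for the same row or adjacent rows


def feedthrough_demand(two_pin_nets, row_of):
    """Number of feedthroughs required in each row for the given nets."""
    demand = Counter()
    for a, b in two_pin_nets:
        demand.update(feedthrough_rows(row_of[a], row_of[b]))
    return demand


row_of = {"u": 0, "v": 1, "w": 3, "x": 0}
nets = [("u", "v"), ("u", "w"), ("x", "w"), ("v", "w")]
print(dict(feedthrough_demand(nets, row_of)))
```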
Standard cell design is well suited for moderate-size circuits and medium production
volumes. Physical design using standard cells is simpler than full-custom design and is
efficient using modern design tools. The standard cell design style is also widely used to
implement the random logic of a full-custom design. While standard cell designs are
developed more quickly, a substantial initial investment is needed in the development of
the cell library, which may consist of several hundred cells. Each cell in the cell library is
handcrafted and requires a highly skilled design engineer. Each type of cell must be
created with several transistor sizes. Each cell must then be tested by simulation and its
performance must be characterized. A standard cell design usually takes more area than a
full-custom or a handcrafted design. However, as more metal layers become available for
routing, the difference in area between the two design styles will gradually be reduced.
(iii). Gate array design style : This design style is a simplified version of the standard
cell design style. Unlike the cells in standard cell designs, all the cells in gate array are
identical. The entire wafer is prefabricated with an array of identical gates or cells. These
cells are separated by both vertical and horizontal spaces called vertical and horizontal
channels. The circuit design is modified such that it can be partitioned into a number of
identical blocks. Each block must be logically equivalent to a cell on the gate array. The
name gate array signifies the fact that each cell may simply be a gate, such as a three-
input NAND gate. Each block in the design is mapped or placed onto a prefabricated cell
on the wafer during the partitioning / placement phase, which is reduced to a block-to-
cell assignment problem.
The number of partitioned blocks must be less than or equal to the total number of
cells on the wafer. Once the circuit is partitioned into identical blocks, the task is to
make the interconnections between the prefabricated cells on the wafer using horizontal
and vertical channels to form the actual circuit. The uncommitted gate array is taken into
the fabrication facility and routing layers are fabricated on top of the wafer. The
completed wafer is also called a customized wafer.
This simplicity of gate array design is gained at the cost of rigidity imposed upon the
circuit both by the technology and the prefabricated wafers. The advantage of gate arrays
is that the steps involved for creating any prefabricated wafer are the same, and only the
last few steps in the fabrication process actually depend on the application for which the
design will be used. Hence, gate arrays are cheaper and easier to produce than full-
custom or standard cell. Similar to standard cell design, gate array is also a
nonhierarchical structure. The gate array architecture is the most restricted form of
layout. It means that it is the simplest for algorithms to work with. For example, the
task of routing in gate array is to determine if a given placement is routable. The
routability problem is conceptually simpler as compared to the routing problem in
standard cell and full-custom design styles.
(iv). Field Programmable Gate Array (FPGA) design:
In this design style, chips are prefabricated with logic blocks and interconnects. The
logic and interconnects can be programmed (erased and reprogrammed) by users, so no
fabrication is required. Interconnects are predefined wire segments of fixed lengths with
switches in between.
Conclusion: The choice of design style for a particular circuit depends on many factors
like functionality of the chip, time-to-market, and the volume of chips to be
manufactured. Full-custom is typically reserved for high-performance, high-volume
chips, while standard cells are used for moderate performance, where the cost of full-
custom cannot be justified. Gate arrays are typically used for low-performance, low-cost
applications. A design style may be applicable to the entire chip or a block of the chip.
Placement Algorithms :
The placement algorithms can be classified on the basis of the input to the algorithms, the
nature of output generated by the algorithms, and the process used by the algorithms.
Depending on the input, the placement algorithms can be classified into two major
groups: constructive placement and iterative improvement methods. The input to the
constructive placement algorithms consists of a set of blocks along with the net list.
The algorithm finds the locations of blocks. On the other hand, iterative improvement
algorithms start with an initial placement. These algorithms modify the initial placement
in search of a better placement. These algorithms are typically used in an iterative manner
until no improvement is possible.
Simulated Annealing Placement Algorithm:
This is one of the popular placement algorithms used in the VLSI design process. The
simulated annealing technique has been successfully used in many phases of VLSI
physical design, e.g., circuit partitioning. Simulated annealing is used in placement as an
iterative improvement algorithm.
Given a placement configuration, a change to that configuration is made by moving a
component or interchanging locations of two components. In the case of the simple pairwise
interchange algorithm, it is possible that an achieved configuration has a cost higher
than that of the optimum, but no interchange can cause a further cost reduction. In such a
situation, the algorithm is trapped at a local optimum, and cannot proceed further.
Actually, this happens quite often when this algorithm is used in real-life examples.
Simulated annealing avoids getting stuck at a local optimum by occasionally accepting
moves that result in a cost increase.
The simulated annealing algorithm is given below.
temp = INIT-TEMP;
place = INIT-PLACEMENT;
while (temp > FINAL-TEMP) do
    while (inner_loop_criterion = FALSE) do
        new_place = PERTURB(place);
        ΔC = COST(new_place) - COST(place);
        if (ΔC < 0) then
            place = new_place;
        else if (RANDOM(0,1) < e^(-ΔC/temp)) then
            place = new_place;
    temp = SCHEDULE(temp);
In simulated annealing, all moves that result in a decrease in cost are accepted. Moves
that result in an increase in cost are accepted with a probability that decreases over the
iterations. The analogy to the actual annealing process is heightened with the use of a
parameter called temperature T. This parameter controls the probability of accepting
moves that result in an increased cost. More such moves are accepted at higher values of
temperature than at lower values. The acceptance probability can be given by e^(-ΔC/T),
where ΔC is the increase in cost. The algorithm starts with a very high value of
temperature, which gradually decreases so that moves that increase cost have a lower
probability of being accepted.
Finally, the temperature reduces to a very low value which causes only moves that reduce
cost to be accepted. In this way the algorithm converges to an optimal or near-optimal
configuration. A configuration can be shuffled randomly by displacing a block to a
random location, interchanging two blocks, or making any other move that can change the
wire length. After the shuffle, the change in cost is evaluated. If a decrease in cost occurs,
the configuration is accepted; otherwise, the new configuration is accepted with a
probability that depends on the temperature. The temperature is then lowered using some
function which, for example, could be exponential in nature. The process is stopped when
the temperature has dropped to a certain level. The parameters and functions used in a
simulated annealing algorithm determine the quality of the placement produced.
These parameters and functions include the cooling schedule, consisting of the initial
temperature (init_temp), the final temperature (final_temp), and the function used for
changing the temperature (SCHEDULE); inner_loop_criterion, which is the number of
trials at each temperature; the process used for shuffling a configuration (PERTURB);
the acceptance probability (F); and the cost function (COST). A good choice of these
parameters and functions can result in a good placement in a relatively short time.
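The simulated annealing loop can be sketched in runnable form as follows. Several choices here are assumptions made only for illustration: the cost is total half-perimeter wire length, PERTURB is a pairwise interchange, SCHEDULE is geometric cooling with factor `alpha`, the inner loop runs a fixed number of `trials`, and the best placement seen so far is remembered and returned. All parameter values are arbitrary.

```python
import math
import random


def cost(place, nets):
    """Total half-perimeter wire length; place maps block -> (x, y)."""
    total = 0
    for net in nets:
        xs = [place[b][0] for b in net]
        ys = [place[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def perturb(place, rng):
    """Interchange the locations of two randomly chosen blocks."""
    a, b = rng.sample(sorted(place), 2)
    new_place = dict(place)
    new_place[a], new_place[b] = place[b], place[a]
    return new_place


def simulated_annealing(place, nets, init_temp=10.0, final_temp=0.01,
                        alpha=0.9, trials=50, seed=0):
    rng = random.Random(seed)
    best = place
    temp = init_temp
    while temp > final_temp:
        for _ in range(trials):                      # inner loop criterion
            new_place = perturb(place, rng)
            delta = cost(new_place, nets) - cost(place, nets)
            # Always accept improvements; accept a cost increase
            # with probability exp(-delta / temp).
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                place = new_place
                if cost(place, nets) < cost(best, nets):
                    best = place
        temp = alpha * temp                          # geometric cooling
    return best


nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
slots = [(x, y) for y in range(2) for x in range(3)]
initial = dict(zip("aecfbd", slots))
result = simulated_annealing(initial, nets)
print(cost(initial, nets), "->", cost(result, nets))
```

Because the best placement seen is returned, the result is never worse than the initial placement, although it is not guaranteed to be optimal.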
Backtrack Search Algorithm:
In finding a solution to a problem, a sequence of decision points is considered, in
which each choice leads further along some path. If the correct set of choices is made,
the path ends at the solution. On the other hand, if a dead end is reached, or it is
otherwise found that an incorrect choice was made, the search backtracks to a previous
decision point and tries a different path. Algorithms that use this approach are
called backtracking algorithms.

So, this algorithm uses recursive depth-first search to explore the search space.
Depth-first search keeps going down a path as long as it can. If it reaches a node with no
children (a "dead end"), then it backtracks to its parent and tries another child node that
it hasn't already explored. If it has explored all child nodes, then it backtracks up one
more level and continues. If the average branching factor is b and the depth of the search
tree is k, then backtrack search requires O(b^k) time, which is exponential. Classic
examples of problems requiring backtrack search.
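As an illustration of recursive depth-first search with backtracking, the sketch below solves the n-queens puzzle. The choice of n-queens is an assumption made for this example and is not taken from the text; `solve_queens` is a hypothetical name. Each row is a decision point with n choices, so the branching factor is n and the depth is n.

```python
def solve_queens(n, cols=()):
    """Return one solution as a tuple of column indices (one per row), or None."""
    row = len(cols)
    if row == n:
        return cols                       # every row has a queen: solution found
    for c in range(n):
        # The choice is valid if no earlier queen shares the column or a diagonal.
        if all(c != pc and abs(c - pc) != row - pr
               for pr, pc in enumerate(cols)):
            result = solve_queens(n, cols + (c,))
            if result is not None:
                return result
    return None                           # dead end: backtrack to the caller


print(solve_queens(4))
print(solve_queens(3))
```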
Branch and Bound algorithm:
Branch and bound is a variant of backtracking search that takes advantage of information
about the optimality of partial solutions to avoid considering solutions that cannot be
optimal. So we are still doing an exhaustive search but potentially avoiding exploring
large parts of the search space that are not going to give us a solution. Given an initial
problem and some objective function f to be minimized, the branch and bound technique
works as follows.
If the problem is small enough, then solve it directly.
Otherwise the problem is decomposed into two or more sub problems. Each sub problem
is characterized by the inclusion of one or more constraints.
For each sub problem, we compute a lower bounding function g. This lower bound
represents the smallest possible cost of a solution to the sub problem, given the
constraints on the given sub problem.
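The steps above can be sketched on a small assignment problem: assign each block to a distinct slot so that the total cost f is minimized. The problem, the cost matrix, and the particular lower bound g (cost so far plus the cheapest still-free slot for every unassigned block) are assumptions chosen for illustration. A sub problem is pruned when its lower bound cannot beat the best complete solution found so far.

```python
def branch_and_bound(cost):
    """Minimum-cost assignment of block i to slot assign[i]; cost[i][j] >= 0."""
    n = len(cost)
    best = {"value": float("inf"), "assign": None}

    def lower_bound(row, used, so_far):
        # g: cost so far plus the cheapest free slot for each remaining block.
        return so_far + sum(
            min(cost[r][c] for c in range(n) if c not in used)
            for r in range(row, n))

    def branch(row, used, so_far, assign):
        if row == n:                       # small enough: a complete solution
            if so_far < best["value"]:
                best["value"], best["assign"] = so_far, assign
            return
        if lower_bound(row, used, so_far) >= best["value"]:
            return                         # bound: cannot improve on the incumbent
        for c in range(n):                 # branch: one sub problem per free slot
            if c not in used:
                branch(row + 1, used | {c}, so_far + cost[row][c], assign + (c,))

    branch(0, frozenset(), 0, ())
    return best["value"], best["assign"]


costs = [[9, 2, 7],
         [6, 4, 3],
         [5, 8, 1]]
print(branch_and_bound(costs))
```

The bound is valid because it never overestimates: it ignores the requirement that remaining blocks use distinct slots.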
The objective of any simulation tool is to create a computer-based model for design
verification, for analyzing the behavior of circuits under construction, and for
checking the design at the current level of abstraction.

Types of Simulation :

Device level simulation
Circuit level simulation
Timing level and macro level simulation
Switch level simulation
Gate level simulation
RTL simulation
System level simulation
Device level simulation: This model deals with a single semiconductor device, such as a
MOS transistor, and is used to test the effect of fabrication parameters. Simulation
techniques based on the finite-element method are used for this purpose.
Circuit level simulation: It deals with small groups of transistors modeled in the analog
domain. The variables computed are currents and voltages, and the computations are
based on numerical methods.
Switch level simulation: This simulation method models the MOS transistors as
switches that pass signals. The values of signals are discrete, but it also includes certain
analog features to account for components such as resistance and capacitance.
Gate level simulation: In this model a circuit is composed of several logic gates
connected by unidirectional memoryless wires. The logic gates themselves are
collections of transistors and other circuit elements which perform a logic function. A
logic gate may be a simple inverter or NAND gate or NOR gate or more complex
functional unit like a flip-flop or register.
Register Transfer Level (RTL) simulation: This model is used for synchronous circuits
where all registers are controlled by a system clock signal. The registers store the state
of the system, while the combinational logic computes the next state and the output based
on the current state and the input. Here the important consideration is the state
transitions; the precise timing of intermediate signals in the computation of the next
state is not considered.
System level simulation: It deals with hardware described in terms of primitives
that need not correspond to hardware building blocks. VHDL is the most popular
hardware description language used for system level simulation. When used in the initial
stages of a design, it can describe the behavior of a circuit, such as a processor, as a
set of communicating processes.
Gate Level Modeling and Simulation:
The gate level model forms the theoretical basis for logic design. In this model a
circuit is composed of several logic gates connected by unidirectional memoryless
wires. The logic gates themselves are collections of transistors and other circuit elements
which perform a logic function. A logic gate may be a simple inverter, a NAND gate, a
NOR gate, or a more complex functional unit like a flip-flop or register. The logic gates
compute the Boolean functions corresponding to their input signals and transmit the values
along wires to the inputs of the other gates to which they are connected. Each input of a
gate has a unique signal source. Information is stored only in the feedback paths of
sequential circuits.
Gate level modeling and simulation is classified into the following four types.

Signal Modeling: Signal modeling deals with the signals applied to a logic gate.
Normally Boolean signals take the value 0 or 1. A signal that is neither 0 nor
1 is denoted by X, the unknown value; it can, for example, indicate a transition from one state
to the other. The more values are used for a signal, the more complex the modeling
of a gate becomes. If a gate has n input signals, each with N possible values, the output must
be specified for N^n input combinations. The logic involved in dealing with a circuit modeled using
multiple valued discrete signals is called multiple-valued logic.
Gate Modeling: This deals with modeling the behavior of a single gate. The model
should be such that the signal values at the gate's outputs are efficiently computed as a
function of the gate's inputs. The outputs of a gate are represented either by a truth table
representation or by a subroutine representation.
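The two representations can be compared with a small sketch. It assumes a two-input NAND gate and the three signal values 0, 1 and X; the names nand_subroutine, NAND_TABLE and nand_table are illustrative and do not come from any particular simulator.

```python
from itertools import product

VALUES = ("0", "1", "X")  # N = 3 signal values


def nand_subroutine(a, b):
    """Subroutine representation: the output is computed by rule."""
    if a == "0" or b == "0":
        return "1"  # a 0 on either input forces the output to 1
    if a == "1" and b == "1":
        return "0"
    return "X"  # in every other case the output is unknown


# Truth-table representation: one stored entry per input combination,
# that is N**n = 3**2 = 9 entries for a two-input gate.
NAND_TABLE = {(a, b): nand_subroutine(a, b) for a, b in product(VALUES, repeat=2)}


def nand_table(a, b):
    """Truth-table representation: the output is looked up."""
    return NAND_TABLE[(a, b)]


print(len(NAND_TABLE), nand_table("0", "X"), nand_table("1", "X"))
```

The table grows as N^n, so a lookup is fast but only practical for gates with few inputs and few signal values; the subroutine form trades table size for a little computation.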
Delay Modeling:
At the gate level, time is modeled in a discrete way and all delays in the circuit are
expressed as integer multiples of a time unit. The output of any physical gate takes
some time to switch after the moment the input is switched. This delay
can affect the correct functionality of the circuit, especially when the circuit is
asynchronous, so correct modeling of the delays is needed. The important delay models are:
Propagation delay model: It associates a fixed delay with the gate's output, so any
effect of switching the inputs is observed at the output after that delay.
Rise-fall delay model: This model assigns separate delays to the rising (0 to 1) and falling
(1 to 0) transitions at the output of a gate, since the two generally take different times.
Inertial delay model: The input pulse must have a minimum width in order to have any
effect at the output. Inertial delays occur due to the capacitive elements in the gate. The
inertial delay model can be combined with the propagation and rise-fall delay models.
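A minimal sketch of how the three delay models combine, assuming a non-inverting buffer whose input is given as a time-sorted list of (time, new value) edges in integer time units. The function names and the example numbers are illustrative assumptions; setting rise equal to fall gives the plain propagation delay model.

```python
def inertial_filter(events, min_width):
    """Drop input pulses that are narrower than min_width time units."""
    kept = []
    for time, value in events:
        if kept and time - kept[-1][0] < min_width:
            # The previous edge opened a pulse that is too narrow:
            # discard both the opening edge and this closing edge.
            kept.pop()
        else:
            kept.append((time, value))
    return kept


def delayed_output(events, rise, fall, min_width):
    """Apply inertial filtering, then a rise or fall delay to each edge."""
    filtered = inertial_filter(events, min_width)
    return [(t + (rise if v == 1 else fall), v) for t, v in filtered]


# The input goes high at 0, shows a 2-unit low glitch at 10..12, goes low at 30.
input_edges = [(0, 1), (10, 0), (12, 1), (30, 0)]
output_edges = delayed_output(input_edges, rise=2, fall=3, min_width=3)
print(output_edges)  # [(2, 1), (33, 0)]
```

The glitch is swallowed because it is shorter than the minimum pulse width, and the two surviving edges are shifted by the rise and fall delays respectively.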
Connectivity Modeling: This model is concerned with the connections between all the gates in
the network. For this the simulator should have suitable data structures to represent the
network.
The unilateral nature of logic gates is the basis for the operation of gate level simulators.
For each binary vector at the input nodes of a logic gate, the binary value (0 or 1) at the
output is computed and propagated to the inputs of the other gates connected to it.
During the propagation of the signal a certain time delay occurs due to inertial
elements such as the node capacitances present in the circuit. Simulators that do not
consider this delay can analyze only combinational circuits. Simulators that
handle sequential circuits must therefore estimate the propagation delay through a logic gate,
which can be done in several ways. Some simulators operate in the unit-delay mode, where all
logic gates are assumed to have the same delay. Such unit-delay simulators, however, can verify
only the steady state behavior or the logic functionality of the digital circuit.
Differences in the propagation delays through different signal paths in a network of
logic gates sometimes cause undesirable situations such as static hazards and dynamic
hazards. Hazards are situations where spurious glitches or spikes occur in an otherwise
smooth waveform at the output of a logic gate.
Compiler Driven Simulation:

There are two basic mechanisms for simulating a circuit at the gate level: (i)
compiler-driven simulation and (ii) event-driven simulation.

Compiler-driven simulation is applied to synchronous circuits. The core of such circuits
consists of registers that store the state of the system and combinational logic that
computes the next state.

Event-driven simulation is based on the observation that, under normal conditions, very few
gates switch simultaneously, and that computing the signal propagation through all gates in
the network over and over again at each time instant leads to unnecessary
computations. It is therefore economical to compute only those signals that are actually
changing their states. A change in signal state is called an event, hence this simulation is
called event-driven simulation.
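The idea can be sketched with a priority queue of events. This is a minimal illustration under stated assumptions, not the algorithm of any specific simulator: the netlist format (output name mapped to function, inputs and delay), the two-gate circuit and the function name simulate are all hypothetical, signals are binary, and delays are pure transport delays.

```python
import heapq

# Hypothetical netlist: output name -> (function, input names, delay)
GATES = {
    "n1": (lambda a, b: 1 - (a & b), ("a", "b"), 2),  # NAND gate, delay 2
    "y": (lambda a: 1 - a, ("n1",), 1),               # inverter, delay 1
}


def simulate(gates, initial, stimuli):
    """Process (time, signal, value) events in time order.

    Only gates in the fanout of a signal that actually changed are re-evaluated.
    """
    values = dict(initial)
    fanout = {}
    for out, (_, inputs, _) in gates.items():
        for name in inputs:
            fanout.setdefault(name, []).append(out)

    queue = list(stimuli)
    heapq.heapify(queue)
    trace = []
    while queue:
        time, signal, value = heapq.heappop(queue)
        if values[signal] == value:
            continue  # no change in state, so this is not an event
        values[signal] = value
        trace.append((time, signal, value))
        for out in fanout.get(signal, []):
            func, inputs, delay = gates[out]
            new_value = func(*(values[i] for i in inputs))
            heapq.heappush(queue, (time + delay, out, new_value))
    return trace


# A steady state consistent with the netlist, then input b rises at time 5.
initial_state = {"a": 1, "b": 0, "n1": 1, "y": 0}
trace = simulate(GATES, initial_state, [(5, "b", 1)])
print(trace)  # [(5, 'b', 1), (7, 'n1', 0), (8, 'y', 1)]
```

Each gate is evaluated once here, at the moment one of its inputs changes, instead of being recomputed at every time instant.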

In a sequential circuit, the occurrence of a glitch can cause the circuit to malfunction.
Therefore the detection of hazards and race conditions is very important, and as a
result most digital simulators alert the user when they occur. The
detection of hazards is made possible by introducing a third state, denoted by X, which
denotes a signal transition.

Many simulators use a third value, denoted by X, to represent an unknown or undefined logic
level. This X state indicates an uninitialized signal, a signal held between the two
logic thresholds, or a signal in a 0-to-1 or 1-to-0 transition. The X state is handled
algebraically by extending the binary Boolean algebra to a ternary (three-valued) De
Morgan algebra, which preserves most of the desired properties of the gate model.

Some other simulators implement the X state by an enumeration technique in which
the simulation is repeated with the nodes in the X state set to all possible combinations of
0s and 1s. Nodes that remain in a unique binary state for all combinations are set to this
state, while the others are set to X.
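A small sketch of the enumeration technique for a single output, assuming the gate (or sub-network) is available as a binary Python function. The helper name resolve_x is hypothetical.

```python
from itertools import product


def resolve_x(func, inputs):
    """Evaluate func for every 0/1 assignment of the inputs that are 'X'.

    inputs is a tuple of '0', '1' or 'X'. The result is '0' or '1' if every
    assignment agrees, and 'X' otherwise.
    """
    x_positions = [i for i, v in enumerate(inputs) if v == "X"]
    results = set()
    for combo in product((0, 1), repeat=len(x_positions)):
        # Known inputs keep their value; X inputs get a placeholder
        # that is overwritten with the enumerated bit just below.
        trial = [0 if v == "0" else 1 for v in inputs]
        for position, bit in zip(x_positions, combo):
            trial[position] = bit
        results.add(func(*trial))
    return str(results.pop()) if len(results) == 1 else "X"


print(resolve_x(lambda a, b: a & b, ("0", "X")))      # 0: a 0 input decides an AND
print(resolve_x(lambda a, b: a & b, ("1", "X")))      # X: the output depends on b
print(resolve_x(lambda a: a | (1 - a), ("X",)))       # 1: a OR NOT a is 1 either way
```

The last call shows the benefit over a purely algebraic three-valued evaluation, which would report X for a OR NOT a; the cost is that the number of runs doubles with every additional X input.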

To simulate tri-state gates and logic buses, some simulators use a fourth state called the
high impedance state, normally denoted by H (or Z). This H state is also sometimes used
to model dynamic memory, by allowing a node to retain its previous logic
state if the outputs of all logic gates connected to the node are at the H level.

Gate level simulators are not completely suitable for the logic simulation of MOS
circuits, because there is a mismatch between the Boolean gate model and the behavior
of MOS logic circuits. Hence a different approach is needed for the digital
modeling and simulation of MOS circuits, namely switch level logic
simulation.

Switch level modeling and simulation: Switch level simulators were developed for the
simulation of MOS circuits. One of the first switch-level simulators to be
implemented was MOSSIM.

In contrast to gate-level modeling and simulation, switch level techniques operate
directly on the transistor circuit structure and capture many circuit properties that cannot
be represented in the gate level model, for example bi-directionality of signal flow, charge
sharing effects and transistor sizes. In contrast to circuit level simulation, node voltages are
represented by discrete logic levels and transistors by bi-directional resistive switches in
switch level modeling.

So, a MOS circuit is modeled by a set of nodes connected by resistive switches. Node
voltages are represented by a small number of discrete logic levels such as 0, 1, and X or U.
Here U denotes undefined or unknown. In some cases the high impedance state Z is also used.

The resistive switch model of a transistor is controlled by the voltage level at its gate
terminal. An n-type transistor is conducting when its gate voltage is 1 and a p-type
transistor is conducting when its gate voltage is 0. Transistors are allowed to have
discrete strength values depending on the values of their conductances when fully
ON. This is done to model the behavior of ratioed logic.

As an example, a depletion load transistor used in n-channel MOS circuit design has its
gate logic level set to 1 and its strength is weaker than that of an enhancement type
transistor. Transistors in series are equivalent to a single transistor of strength equal to the
weakest one, while transistors in parallel are equivalent to a single transistor of strength
equal to the strongest one (maximum conductance).
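The series and parallel rules reduce to a minimum and a maximum over strength values. The sketch below assumes just two hypothetical strength levels, WEAK for a depletion load and STRONG for an enhancement transistor, and applies them to the pull-down network of an nMOS NAND gate with both inputs at 1.

```python
WEAK, STRONG = 1, 2  # hypothetical discrete strength levels


def series(*strengths):
    """Transistors in series behave like the weakest one."""
    return min(strengths)


def parallel(*strengths):
    """Transistors in parallel behave like the strongest one."""
    return max(strengths)


# nMOS NAND with both inputs at 1: two conducting enhancement
# transistors in series pull down, one depletion load pulls up.
pull_down = series(STRONG, STRONG)
pull_up = WEAK
output = "0" if pull_down > pull_up else "1"
print(pull_down, pull_up, output)  # 2 1 0
```

Because the conducting pull-down path is stronger than the load, the node resolves to 0, which is the ratioed-logic behavior the strength values are meant to capture.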

In most switch level simulators, the circuit is partitioned into channel-connected
sub-circuits. This partitioning can be done once at the outset, where every transistor is
included, or dynamically at every iteration, where only conducting transistors are
included. Dynamic partitioning adds some overhead to the simulation.

The simulation of the entire circuit follows an event scheduler similar in many ways to that
of gate level logic simulators, except that now the gates consist of channel-connected
sub-circuits.

Logic synthesis is the process of converting a high-level description of a design into an
optimized gate-level representation. Logic synthesis uses a standard cell library, which
contains simple cells, such as the basic logic gates AND, OR, and NOR, and macro cells, such as
adders, multiplexers, memories, and flip-flops. The standard cells put together are called the
technology library. Normally the technology library is identified by its transistor feature
size (for example 0.18 µm or 90 nm).

Boolean Logic Functions Representation: There are various methods in use to
represent logic functions, for example (i) the truth table method, (ii) the equation method,
(iii) the K-map method and (iv) the circuit method.

However, these representations are either not canonical (unique) or not practical for large
functions. To synthesize, optimize, verify or manipulate large Boolean functions, they must be
represented efficiently by a suitable method. One such method for representing complex Boolean
functions is the Binary Decision Diagram (BDD). The BDD, which is canonical in its reduced
ordered form, is among the most popular of these methods.
Binary Decision Diagram (BDD)
A binary decision diagram (BDD) is a graphical representation of a Boolean function
that is derivable from Shannon's expansion theorem. It is similar to a binary tree. A
binary decision diagram is a finite DAG (Directed Acyclic Graph) with the
following features:
It has a unique initial node.
All non-terminals are labelled with a Boolean variable.
All terminals are labelled with 0 or 1.
All edges are labelled with 0 (dashed edge) or 1 (solid edge).
Each non-terminal has exactly one out-edge labelled 0 and one out-edge labelled 1.
Shannon's Expansion Theorem:

Shannon's expansion theorem is applied iteratively to build a BDD for a given
Boolean function.

Shannon's expansion theorem states that any switching function of n variables can be
expressed as a sum of products of n literals, one for each variable.

Let us assume that f(x1, x2, ..., xn) is a switching function of n variables. According to
Shannon, one way of expressing this function is
f(x1, x2, ..., xn) = x1 f(1, x2, ..., xn) + x1' f(0, x2, ..., xn)
On the right side, the function is the sum of two terms, one of them relevant when x1 is
equal to 1 and the other when x1 is equal to 0. The first term is x1 times what remains of
f when x1 is equal to 1, and the second term is x1' times what remains of f when
x1 is equal to 0.
Shannon's expansion theorem in the general case is
f = a0 x1'x2'...xn' + a1 x1'x2'...x(n-1)'xn + a2 x1'x2'...x(n-1)xn' + ... + a(2^n-2) x1x2...x(n-1)xn' + a(2^n-1) x1x2...xn

Each ai is a constant whose subscript is the decimal equivalent of the multiplier of
ai viewed as a binary number. Thus, for three variables, a5 (binary 101) is the coefficient
of x1x2'x3.
In a similar way, it can be stated that any switching function of n variables can be
expressed as a product of sums of n literals, one for each variable.
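The expansion can be checked numerically. The sketch below assumes a three-variable majority function as the example f; the helper names cofactor, shannon and minterm_coefficients are illustrative. Each coefficient ai is obtained by evaluating f at the input whose bits spell i in binary.

```python
from itertools import product


def cofactor(f, index, value):
    """Return f with the variable at position index fixed to value."""
    def fixed(*args):
        args = list(args)
        args[index] = value
        return f(*args)
    return fixed


def shannon(f, index):
    """Rebuild f as x * f(x=1) + x' * f(x=0) for the chosen variable."""
    f1, f0 = cofactor(f, index, 1), cofactor(f, index, 0)

    def expanded(*args):
        x = args[index]
        return (x & f1(*args)) | ((1 - x) & f0(*args))
    return expanded


def minterm_coefficients(f, n):
    """a_i = f evaluated at the binary digits of i (x1 is the most significant)."""
    return [f(*((i >> (n - 1 - k)) & 1 for k in range(n))) for i in range(2 ** n)]


def majority(x1, x2, x3):
    return (x1 & x2) | (x2 & x3) | (x1 & x3)


same = all(shannon(majority, 0)(*v) == majority(*v) for v in product((0, 1), repeat=3))
coefficients = minterm_coefficients(majority, 3)
print(same, coefficients)  # True [0, 0, 0, 1, 0, 1, 1, 1]
```

Here a5 is 1, so the term x1x2'x3 is present in the sum-of-products form of the majority function.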
Binary Decision Diagram (BDD) - Example:
Consider an example of constructing a BDD. The output of a Boolean function S is
given by the truth table, and the function is implemented by the MUX based circuit shown in
the diagram.

Figure (i) Truth table    Figure (ii) MUX based circuit

This Boolean function is represented by the Binary Decision Diagram shown in Figure (c).

Figure (c) - Binary Decision Diagram

In the BDD, a line with a bubble on it denotes value = 0 and a line without
a bubble denotes value = 1.
Consider S(0,0,0) in Figure (d) and S(1,1,1) in Figure (e).

There are several notations for distinguishing value = 1 from value = 0, for example:
(i) Bubble vs. non-bubble line
(ii) Dashed line vs. solid line
(iii) T (then) vs. E (else) labels

Figure: (i) Bubble vs. no bubble, (ii) Dashed line vs. solid line, (iii) T (then) vs. E (else) labels

BDDs for Basic Logic Functions :

The following diagrams show the BDDs of some basic logic functions such as NOT, OR and AND.

Ordered Binary Decision Diagram (OBDD)

A Binary Decision Diagram is said to be an ordered binary decision diagram if the variables
appear in the same order along all paths from the root to the leaves.
Let [x1, x2, ..., xn] be an ordered list of variables without duplicates. A BDD is
said to have the ordering [x1, x2, ..., xn] if all of its variable labels occur in the list
and, whenever xj follows xi along a path, j > i.

For variables a, b, c, d the ordering a < b < c < d is shown in the
diagram below.

Diagram (b) above shows an optimal ordering because there is exactly one node
for each variable. The order is b < c < a < d.
Reduction operations (ROBDD):
1. Removal of duplicate terminals. If a BDD contains more than one terminal 0-node,
redirect all edges which point to such a 0-node to just one of them. Proceed in the same
way with terminal nodes labelled with 1.
2. Removal of redundant tests. If both outgoing edges of a node n point to the same node m,
then eliminate node n, sending all its incoming edges to m.
3. Removal of duplicate non-terminals. If two distinct nodes n and m in the BDD are the
roots of structurally identical sub-BDDs, then eliminate one of them, say m, and redirect all
its incoming edges to the other one.
A BDD is reduced if it has been simplified as much as possible using these reduction
operations.
Examples:
(i) Remove duplicate terminals.

(ii) Remove redundant tests.

(iii) Remove duplicate non-terminals.

(iv) Remove duplicate non-terminals and redundant tests.
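The three reduction rules can be applied while a BDD is being built, so that the result is already reduced. The following is a minimal sketch under stated assumptions: the class name ROBDD and its methods are hypothetical, the variable order is simply the argument order of the Python function, and the construction enumerates all 2^n assignments, so it is only meant for small examples.

```python
class ROBDD:
    """Tiny reduced ordered BDD builder; terminals are the ids 0 and 1."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = {}   # unique table: (var, low, high) -> node id
        self.next_id = 2  # ids 0 and 1 are reserved for the terminals

    def make(self, var, low, high):
        if low == high:
            return low  # rule 2: both edges agree, the test is redundant
        key = (var, low, high)
        if key not in self.nodes:  # rule 3: reuse an identical non-terminal
            self.nodes[key] = self.next_id
            self.next_id += 1
        return self.nodes[key]

    def build(self, f, var=0, assignment=()):
        """Shannon-expand f on one variable at a time, in argument order."""
        if var == self.num_vars:
            return int(bool(f(*assignment)))  # rule 1: a single 0 and a single 1
        low = self.build(f, var + 1, assignment + (0,))
        high = self.build(f, var + 1, assignment + (1,))
        return self.make(var, low, high)


bdd = ROBDD(3)
root_f = bdd.build(lambda a, b, c: (a & b) | c)
root_g = bdd.build(lambda a, b, c: (a | c) & (b | c))
print(root_f == root_g, len(bdd.nodes))  # True 3
```

Both expressions describe the same function, so for the same variable order they end up at the same node, and only three non-terminal nodes are needed instead of the seven of the full decision tree.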

BDD Canonical Form: Binary Decision Diagrams are said to be canonical (unique)
for a given ordering if all internal nodes are descendants of some node, there are no
isomorphic sub-graphs, and for every node fT ≠ fE (its then-child differs from its else-child).

Advantages of Canonical representation: Representing a Boolean function in canonical
form has the advantage that redundant variables can be detected easily. A Boolean function
f does not depend on an input variable x if no nodes occur for x in the Reduced Ordered
Binary Decision Diagram.

The equivalence of two functions f and g can be checked easily by comparing the structure of
their ROBDDs. The various manipulations on BDDs can be performed directly if the function
is represented in canonical form.


The basic objective of high-level synthesis is the mapping of a behavioral description of a
digital system into an RTL design consisting of a data path and a control unit.
High-level synthesis (HLS) is thus a sequence of tasks that transforms a behavioral
representation into an RTL design. The design consists of functional units such as ALUs
and multipliers, storage units such as memories and register files, and interconnection
units such as multiplexers and buses.
One of the tasks of synthesis is to find the structure that best meets the constraints, such
as limitations on cycle time, area or power, while minimizing other costs. For example,
the goal might be to minimize area while achieving a certain minimum processing rate.
From the input specification, the synthesis system produces a description of a
register-transfer level structure that realizes the specified behavior. This structure includes
a data path, that is, a network of registers, functional units, multiplexers and buses, as well
as hardware to control the data transfers in that network. If the control is not integrated into
the data path, the synthesis system must also produce the specification of a finite state
machine that drives the data path so as to produce the required behavior. The control
specification could be in terms of microcode, a PLA profile or random logic.
High-level synthesis is different from logic synthesis in the sense that the input to logic
synthesis is specified in terms of logic equations, which must be optimized and mapped
into a given technology. Logic synthesis is used on a design after high-level synthesis
has been done.
Thus high-level synthesis (HLS) is a new step in the design flow of a
digital electronic circuit, moving the design effort to higher abstraction levels.
The advantage of high-level synthesis is that it raises the design abstraction level and
allows rapid generation of RTL hardware optimized for performance, area, and power
requirements.
In addition, several advantages arise from the use of HLS in the design flow.
First of all, the amount of code to be written by designers is reduced dramatically, which
saves time and reduces the risk of mistakes.
Verification time is also reduced considerably, because the HLS tool can, in addition to the
design itself, generate test benches, thereby reusing test data that was used to validate the
source code. This is particularly relevant for the design of FPGA based embedded systems.

All HLS systems need to restrict the target hardware. Most systems generate
synchronous hardware and build it with the following parts:
Functional units: They can perform one or more computations, e.g. addition,
comparison, ALU operations.
Registers: They store inputs, intermediate results and outputs; sometimes several
registers are taken together to form a register file.
Multiplexers: From several inputs, one is passed to the output.
Buses: A bus is a connection shared between several hardware elements, such that only one
element can write data at a specific time.
Three-state (tri-state) drivers: They control the exclusive writing on the bus.

Hardware is normally partitioned into two parts:

The data path: A network of functional units, registers, multiplexers and buses. The
actual computation takes place in the data path.
Control: The part of the hardware that takes care of having the data present at the right
place at a specific time, of presenting the right instructions to a programmable unit, etc.
High-level synthesis often concentrates on data-path synthesis. The control part is then
realized as a finite state machine or in microcode.
Synthesis tasks
High-level synthesis maps a behavioral description into the FSMD model (FSM with a data path)
so that the data path executes variable assignments and the control unit implements the control
constructs. Since the FSMD model determines the amount of
computation in each state, one must first define the number and type of resources
(storage units, functional units, and interconnection units) to be used in the data path.
Allocation is the task of defining the necessary resources for a given design constraint.
The next task in mapping a behavioral description into an FSMD model is to partition the
behavioral description into states (or control steps) so that the allocated resources can
compute all the variable assignments in each state. This partitioning of behavior into
time intervals is called scheduling.
Although scheduling assigns each operation to a particular state, it does not assign it to a
particular component. To obtain the proper implementation, we assign each variable to a
storage unit, each operation to a functional unit, and each transfer from I/O ports to units
and among units to an interconnection unit. This task is called binding (or resource
sharing).
Binding defines the structure of the data path but not the structure of the control unit. The
final task, control synthesis, consists of reducing and encoding states and deriving the
logic network for next-state and control signals in the control unit. Control synthesis
employs well-known logic synthesis techniques.
Allocation: The allocation task determines the type and quantity of resources used in the
chip architecture. It also determines the clocking scheme, memory hierarchy, and
pipelining style. The goal of allocation is to make appropriate trade-offs between the
design's cost and performance. If the original description contains inherent parallelism,
allocating more hardware resources increases area and cost, but it also creates more
opportunities for parallel operations or storage accesses, resulting in better performance.
On the other hand, allocating fewer resources decreases area and cost, but it also forces
operations to execute sequentially, resulting in poorer performance. To perform the
required trade-offs, allocation must determine the exact area and performance values.
Scheduling:
The next step schedules operations and memory accesses into clock cycles. Scheduling
algorithms are of two types, based on the optimization goal and the specified constraints.
If the user has completely specified all the available resources and the clock cycle length
during allocation, the scheduling algorithm's goal is to produce a design with the best
possible performance, or the fewest clock cycles. In other words, scheduling must
maximize usage of the allocated resources. We call this approach resource-constrained
scheduling. If a list of resources is not available prior to scheduling, but a desired overall
performance is specified, the scheduling algorithm's goal is to produce a design with the
lowest possible cost, or the fewest functional units. This is the time-constrained
scheduling approach. Resource-constrained scheduling usually constructs the schedule
one state at a time. It schedules operations so as not to exceed resource constraints or
violate data dependencies. It ensures that at the instant for which it schedules an
operation Oi into control step Sj, a resource capable of executing Oi is available and all
the predecessors of node Oi have been scheduled.
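A minimal sketch of resource-constrained scheduling that builds the schedule one control step at a time. It is a simplification under stated assumptions: every operation takes exactly one control step, there is a single type of functional unit, and ready operations are picked in list order with no priority function. The name list_schedule and the example graph are hypothetical.

```python
def list_schedule(ops, preds, num_units):
    """Assign each operation a control step, using at most num_units per step.

    ops is a list of operation names, preds maps an operation to the
    operations it depends on. All operations are assumed to take one step.
    """
    start = {}
    step = 1
    while len(start) < len(ops):
        # An operation is ready when all its predecessors finished in an earlier step.
        ready = [
            op for op in ops
            if op not in start
            and all(p in start and start[p] < step for p in preds.get(op, ()))
        ]
        for op in ready[:num_units]:  # never exceed the resource constraint
            start[op] = step
        step += 1
    return start


operations = ["a", "b", "c", "d"]
dependencies = {"c": ("a", "b"), "d": ("c",)}
print(list_schedule(operations, dependencies, num_units=1))
print(list_schedule(operations, dependencies, num_units=2))
```

With one unit the four operations need four control steps; with two units a and b share the first step and the schedule shrinks to three steps, which illustrates the area versus performance trade-off made during allocation.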
Binding: The binding task assigns the operations and memory accesses within each
clock cycle to available hardware units. A resource such as a functional, storage, or
interconnection unit can be shared by different operations, data accesses, or data transfers
if they are mutually exclusive. For example, two operations assigned to two different
control steps are mutually exclusive since they will never execute simultaneously; hence
they can be bound to the same hardware unit. Binding consists of three subtasks based on
the unit type: functional-unit binding, storage binding and interconnection binding.

Scheduling Algorithms: Scheduling is very important because it determines the
amount of concurrency of the resulting implementation. It also determines the amount of
hardware resources required and hence the area.

Two important scheduling algorithms are ASAP (as soon as possible)
and ALAP (as late as possible).

ASAP Scheduling: It is an unconstrained minimum-latency scheduling problem. In this
situation we assume unlimited resources; all that is needed is the minimum time to perform
the computation. Hence it is referred to as ASAP (as soon as possible) scheduling.
The pseudocode for ASAP scheduling is given below.
ASAP( GS(V,E) ) {
    Schedule v0 by setting t0 = 1;
    repeat {
        Select a vertex vi whose predecessors are all scheduled;
        Schedule vi by setting ti = max (tj + dj), taken over all predecessors vj of vi;
    } until (vn is scheduled);
    return t;
}

Explanation: The various steps are explained below.

Step 1: Schedule v0 at time 1.
Step 2: Select a vertex vi whose predecessors are all scheduled.
Step 3: Schedule vi at time = predecessor's scheduled time + time required for the
predecessor to execute (if there are several predecessors, take the maximum).
Step 4: Has vn been scheduled yet? If not, repeat from Step 2.
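The steps above can be sketched in Python. This is an illustration under stated assumptions rather than a reference implementation: the source node v0 is not stored explicitly, so operations without predecessors start at time 1 (which is what a zero-delay v0 scheduled at t0 = 1 gives), and the function name asap and the example graph are hypothetical.

```python
def asap(delays, preds):
    """Return the earliest start time of every operation.

    delays maps an operation to its execution delay, preds maps an
    operation to the list of operations it depends on.
    """
    start = {}
    while len(start) < len(delays):
        for op in delays:
            if op in start:
                continue
            parents = preds.get(op, [])
            if all(p in start for p in parents):
                # ti = max over predecessors of (tj + dj); 1 if there are none.
                start[op] = max((start[p] + delays[p] for p in parents), default=1)
    return start


# Two multiplications (2 steps each) feed an addition, which feeds another addition.
op_delays = {"m1": 2, "m2": 2, "a1": 1, "a2": 1}
op_preds = {"a1": ["m1", "m2"], "a2": ["a1"]}
print(asap(op_delays, op_preds))  # {'m1': 1, 'm2': 1, 'a1': 3, 'a2': 4}
```

The addition a1 cannot start before step 3 because both multiplications occupy steps 1 and 2.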
ALAP Scheduling:
It is a latency-constrained scheduling problem. Here the schedule must satisfy an upper
bound on latency. The goal of ALAP (as late as possible) scheduling is to schedule each
operation as late as possible without violating that bound.
The pseudocode for ALAP scheduling is given below.
ALAP( GS(V,E), λ ) {    ; perform ALAP scheduling on the graph; λ is the upper time bound
    Schedule vn by setting tn = λ + 1;    ; schedule the sink node vn at upper bound + 1
    repeat {
        Select a vertex vi whose successors are all scheduled;    ; look for operations whose successors are already scheduled
        Schedule vi by setting ti = min (tj) - di;    ; time = earliest start time among the successors vj
                                                      ; minus the time required for vi itself to execute;
                                                      ; vi may have multiple successors, so take the minimum
    } until (v0 is scheduled);    ; keep going until the source node v0 has been scheduled
    return t;
}
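A Python sketch of ALAP scheduling, using the usual recurrence in which an operation starts at the earliest start time of its successors minus its own delay. As assumptions of this sketch, the sink node vn is not stored explicitly (operations without successors are placed against the bound λ + 1), and the names alap, latency_bound and the example graph are hypothetical.

```python
def alap(delays, succs, latency_bound):
    """Return the latest start time of every operation for a given latency bound.

    delays maps an operation to its execution delay, succs maps an
    operation to the list of operations that depend on it.
    """
    start = {}
    while len(start) < len(delays):
        for op in delays:
            if op in start:
                continue
            children = succs.get(op, [])
            if all(c in start for c in children):
                # The operation must finish before its earliest successor starts;
                # without successors it must finish before latency_bound + 1.
                deadline = min((start[c] for c in children), default=latency_bound + 1)
                start[op] = deadline - delays[op]
    return start


unit_delays = {"m1": 1, "m2": 1, "m3": 1, "a1": 1, "a2": 1}
op_succs = {"m1": ["a1"], "m2": ["a1"], "m3": ["a2"], "a1": ["a2"]}
print(alap(unit_delays, op_succs, latency_bound=3))
```

With a bound of three steps, m1, m2, a1 and a2 are pinned to steps 1, 1, 2 and 3, while m3 is pushed back to step 2 because only a2 depends on it.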
Mobility:
ASAP scheduling computes the earliest time at which an operation can be scheduled.
One can also do the opposite and compute the latest time at which an operation can
be scheduled, which is what ALAP scheduling does. ALAP scheduling has the same disadvantages
as ASAP scheduling, but the combination of both schedules gives more powerful heuristics.
Mobility is used as a basis for several scheduling heuristics.
Mobility, or slack, is an important quantity used by some scheduling
algorithms. It is given by
Mobility = (start time in ALAP schedule) - (start time in ASAP schedule)
When mobility = 0, the operation can only be started at the given time if the
overall latency constraint is to be met. Mobility > 0 indicates the span of possible start
times, and this freedom helps with minimizing resources (adders, multipliers, etc.).
Hence the concept of mobility can be used to reduce resource requirements.
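As a small worked example, assume a hypothetical graph of five unit-delay operations in which m1 and m2 feed a1, and a1 and m3 feed a2, with a latency bound of three steps. The start times below are the ASAP and ALAP schedules of that assumed graph, written out by hand.

```python
# Hand-computed schedules for the assumed example graph (unit delays, bound of 3 steps).
asap_start = {"m1": 1, "m2": 1, "m3": 1, "a1": 2, "a2": 3}
alap_start = {"m1": 1, "m2": 1, "m3": 2, "a1": 2, "a2": 3}

# Mobility = ALAP start time - ASAP start time
mobility = {op: alap_start[op] - asap_start[op] for op in asap_start}
critical = [op for op, slack in mobility.items() if slack == 0]

print(mobility)  # {'m1': 0, 'm2': 0, 'm3': 1, 'a1': 0, 'a2': 0}
print(critical)  # ['m1', 'm2', 'a1', 'a2']
```

Only m3 has freedom: moving it from step 1 to step 2 means that at most two multiplications run in any step, so two multipliers suffice instead of the three that the ASAP schedule would need.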
Physical Design Automation of FPGAs

The FPGA is a new approach to ASIC design that can dramatically reduce manufacturing
turnaround time and cost. In its simplest form, an FPGA consists of a regular array of
programmable logic blocks interconnected by a programmable routing network. A
programmable logic block is a RAM and can be programmed by the user to act as a small
logic module. Given a circuit, the user can program the programmable logic modules using an
FPGA programming tool. The key advantage of FPGAs is re-programmability. The RAM
nature of FPGAs allows for in-circuit flexibility that is most useful when the
specifications are likely to change in the final application. In some applications, such as
remote sensors, it is necessary to make system updates via software. In an FPGA, a data
channel is provided, which allows easy transfer of the new logic function and
reprogramming of the FPGA.
The physical design automation of FPGAs mainly involves three steps: partitioning,
placement and routing.
The partitioning problem in FPGAs is significantly different from the partitioning problems in
other design styles. It depends mainly on the architecture in which the circuit
has to be implemented.
The placement problem in FPGAs is very similar to the gate array placement problem.
The routing problem in FPGAs is to find a connection path and program the appropriate
interconnection points.
FPGA Technologies:
An FPGA architecture mainly consists of two parts: the logic blocks and the routing
network. A logic block has a fixed number of inputs and one output. A wide range of
functions can be implemented using a logic block. Given a circuit to be implemented
using FPGAs, it is first decomposed into smaller sub-circuits such that each
sub-circuit can be implemented using a single logic block. There are two types of logic
blocks. The first type is based on Look-Up Tables (LUTs), while the second type is based on
multiplexers.
Look-up table based logic blocks:

A LUT based logic block is just a segment of RAM. A function is implemented by
simply loading its truth table (its LUT) into the logic block at power up. On receiving a
certain set of inputs, the logic block simply looks up the appropriate output and sets the
output line accordingly. Because of the reconfigurable nature of the LUT based logic
blocks, they are also called Configurable Logic Blocks (CLBs). It is clear that 2^K bits are
required in a logic block to represent a K-bit input, 1-bit output combinational logic
function.
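A LUT can be pictured as a list of 2^K stored bits addressed by the K inputs. The sketch below is an illustrative model only; the class name LUT and the three-input majority function used to fill it are assumptions, not part of any FPGA vendor's tools.

```python
class LUT:
    """K-input, 1-output look-up table holding 2**K configuration bits."""

    def __init__(self, k, func):
        self.k = k
        # "Programming" the block: store the truth table of func, one bit per
        # input combination, with the first input as the most significant address bit.
        self.bits = [
            func(*((address >> (k - 1 - j)) & 1 for j in range(k))) & 1
            for address in range(2 ** k)
        ]

    def read(self, *inputs):
        """Use the input bits as an address and return the stored bit."""
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.bits[address]


def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)


lut = LUT(3, majority)
print(len(lut.bits), lut.bits, lut.read(1, 0, 1))  # 8 [0, 0, 0, 1, 0, 1, 1, 1] 1
```

Loading a different list of eight bits turns the same block into any other three-input function, which is what makes the block configurable.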
Multiplexer based logic blocks: Typically a multiplexer based logic block consists of
three 2-to-1 multiplexers and one two-input OR gate, as shown in the figure below.

The number of inputs is eight. The circuit within the logic block can be used to
implement a wide range of functions. One such function, shown in Figure (a), can be
mapped to a logic block as shown in Figure (b). Thus, the programming of a multiplexer
based logic block is achieved by routing different inputs into the block.
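Since the figure is not reproduced here, the sketch below assumes one common wiring of such a block: two 2-to-1 multiplexers feed a third one, whose select line is driven by the OR gate. The input names (a0, a1, sa, b0, b1, sb, s0, s1) are hypothetical. It shows how routing constants and signals to the eight inputs yields different two-input functions.

```python
def mux(d0, d1, select):
    """2-to-1 multiplexer: pass d1 when select is 1, otherwise d0."""
    return d1 if select else d0


def logic_block(a0, a1, sa, b0, b1, sb, s0, s1):
    """Assumed structure: two input muxes, one output mux selected by s0 OR s1."""
    return mux(mux(a0, a1, sa), mux(b0, b1, sb), s0 | s1)


def and2(x, y):
    # Output mux picks 0 when x = 0 and y when x = 1.
    return logic_block(0, 0, 0, y, y, 0, x, 0)


def or2(x, y):
    # Output mux picks y when x = 0 and 1 when x = 1.
    return logic_block(y, y, 0, 1, 1, 0, x, 0)


def xor2(x, y):
    # Second mux builds NOT y (1 when y = 0, 0 when y = 1).
    return logic_block(y, y, 0, 1, 0, y, x, 0)


for x in (0, 1):
    for y in (0, 1):
        print(x, y, and2(x, y), or2(x, y), xor2(x, y))
```

No storage bits are involved in this style of block: the function is chosen entirely by which signals and constants the router connects to the eight inputs.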

There are two models of routing network: the segmented and the non-segmented routing
network.
Physical Design Cycle for FPGAs: The physical design cycle for FPGAs consists of the
following three important steps:
Partitioning: The circuit to be mapped onto the FPGA has to be partitioned into smaller
sub-circuits, such that each sub-circuit can be mapped to a programmable logic block.
Unlike the partitioning in other design styles, there are no constraints on the size of a
partition. However, there are constraints on the inputs and outputs of a partition. This is
due to the unique architecture of FPGAs.
Placement: In this step of the design cycle, the sub-circuits formed in the
partitioning phase are allocated physical locations on the FPGA, i.e., the logic block on
the FPGA is programmed to behave like the sub-circuit that is mapped to it. This
placement must be carried out in such a manner that the routers can complete the
interconnections. This is very critical as the routing resources of the FPGA are limited.

Routing: In this phase, all the sub-circuits which have been programmed on the FPGA
blocks are interconnected by blowing the fuses between the routing segments to achieve
the interconnections.
The figure above shows the complete physical design cycle of FPGAs. The system design is
available as a directed graph, which is partitioned in the second step. Placement involves
mapping the sub-circuits onto CLBs. Shaded rectangles represent CLBs which have been
programmed. The final step is the routing of the channels.

