UNIVERSITY OF CINCINNATI
Date: January 23, 2009
I, Pritesh Johari ,
hereby submit this original work as part of the requirements for the degree of:
Master of Science
in Computer Engineering
It is entitled:
Distributed Decap-Padded Standard Cell based
On-Chip Voltage Drop Compensation Framework
Student Signature:
Pritesh Johari
Committee Chair:
Dr. Ranga Vemuri
Dr. Wen Ben Jone
Dr. Carla Purdy
Dr. Hal Carter
I have reviewed the Thesis/Dissertation in its final electronic format and certify that it is an
accurate copy of the document reviewed and approved by the committee.
MASTER OF SCIENCE
in the Department of
Electrical and Computer Engineering
of the College of Engineering
by
Pritesh Johari
Technology-induced voltage scaling, coupled with faster switching, has made circuit behavior very sensitive to power supply variations. The effect is classified as the power and ground bounce problem. Power and ground bounce can inject random glitches which propagate as malfunctioning logic.
On-chip decoupling capacitors (decaps) are used to reduce power supply noise. Traditionally, lumped decaps are placed in the available whitespace during the chip-finishing stages. However, insufficient budgeting at an early stage and the lack of placement estimation often position the decaps far from the switching nodes. Experimental results show that placing decaps in close proximity to the violating switching nodes is more effective for power supply noise cancellation. This work develops an alternative framework to incorporate decaps in a design close to the switching nodes, thus making them more effective.
The proposed voltage drop optimization framework comprises three components. First, a special standard cell library with minimal decap padding is developed to place decaps closest to the victim nodes. Second, we propose an optimization algorithm to incorporate these standard cells, together with a minimal amount of lumped decap, in the physical synthesis stages. Lastly, we develop an engineering change order (ECO) placer to generate a valid decap-optimized placement.
The developed framework is integrated with commercial backend design tools (Cadence and Synopsys). The effectiveness of our work has been demonstrated on standard benchmark circuits.
To,
My Dearest Parents
Acknowledgments
I would like to extend my sincere thanks to Dr. Ranga Vemuri, my research advisor. He is not only a good teacher and a great researcher but also a very down-to-earth and humble person. I admire his qualities and would like to imbibe them. I thank him for his guidance and regular discussions. I am also grateful to him for giving me the opportunity to teach a few lectures of the Physical VLSI and VLSI Design Automation courses. I would like to thank Dr. Wen-Ben Jone, Dr. Carla Purdy, and
Dr. Harold Carter for being on my thesis committee. It is a great honor for me to present my work
to these distinguished professors. I thank them for their valuable time. The distinctive quality of
Dr. Wen-Ben Jone to blend the academic discourse with humor is unforgettable. I would also like
to thank Mr. Rob Montjoy, ECE Department System Administrator, for resolving numerous tool
related issues.
I would like to express my deepest gratitude to Shubhankar Basu, a DDELite. The base for this
research work, the decap-padded standard cell approach, was provided by him. I am deeply indebted
to him for his long discussions about the research. He has been a great mentor to me. In a short
span of time, he taught me a lot about VLSI CAD. I am really thankful to him. I would also like
to thank my fellow DDELites, Surya, Angan, Annie, Almitra, Mike, Hao, Romana, Ajaay, Balaji,
Suman, Manoj, Bala, and Vijay for making life fun and memorable. Their presence made the
DDEL more than just a laboratory. Regular technical discussions with Surya, Angan, Hao, Almitra
have enriched my knowledge. I thank Romana for helping me with the Virtuoso tool during the
beginning phase of my research. Angan and Annie, full of energy and sense of humor, have added
a lot of fun to the research life. I would like to thank them all for providing such a nice time.
My parents, my brother, and my family members have always been supportive of me. I owe them for their trust and belief in me, and for standing by me through many ups and downs during my stay at UC. Their constant support and encouragement gave me the strength to complete my studies at UC.
I would like to thank my friends, Kiran, Srikara, Surya, Sidhhartha, Kalyan, Ravish, Kartik, Sagar, Srikanth, Nishant, Skanda, Dipti, Shubham, and Aparna for creating a very healthy and scholarly atmosphere around me. The cherished moments I spent with them are truly worth remembering. Apart from this, there were many inconspicuous contributions, so necessary at times to lead us out of a stalemate. I gratefully acknowledge all those who could not be mentioned here explicitly.
Contents
List of Figures v
1 Introduction 1
1.1 Power Supply Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Causes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.2 Adverse Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Approaches to Tackle Power Supply Noise . . . . . . . . . . . . . . . . . . . . . . 9
1.2.1 Power Distribution Network Design . . . . . . . . . . . . . . . . . . . . . 9
1.2.2 Logic Level Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.3 Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Research Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Static Voltage Drop Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Dynamic Voltage Drop Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5 On-Chip Decoupling Capacitance (Decap) . . . . . . . . . . . . . . . . . . . . . . 26
2.5.1 Sources of On-Chip Decap . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.6 Research in Voltage Drop Optimization . . . . . . . . . . . . . . . . . . . . . . . 28
5.3 Decap Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3.1 Decap Measurement using HSPICE . . . . . . . . . . . . . . . . . . . . . 57
5.3.2 DCFLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.3.3 UCDCLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.4 Cell Characterization and View Generation . . . . . . . . . . . . . . . . . . . . . 62
5.4.1 Symbol Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.4.2 Physical View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.4.3 Timing and Netlist View . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.4.4 Parasitic View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3 B18 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Bibliography 135
List of Figures
4.4 PrimeRail Voltage Drop Analysis Flow . . . . . . . . . . . . . . . . . . . . . . . . 46
8.1 Optimization Result Graphs for Barcode 16: Peak VD and Decap Budget . . . . . 109
8.2 Optimization Result for Barcode16: Voltage Drop Maps . . . . . . . . . . . . . . 110
8.3 Optimization Result Graphs for B14: Peak VD and Decap Budget . . . . . . . . . 111
8.4 Optimization Result for B14: Voltage Drop Maps . . . . . . . . . . . . . . . . . . 112
8.5 Optimization Result Graphs for B18: Peak VD and Decap Budget . . . . . . . . . 113
8.6 Optimization Result for B18: Voltage Drop Maps . . . . . . . . . . . . . . . . . . 114
8.7 Manual Optimization Results for Barcode16 . . . . . . . . . . . . . . . . . . . . . 118
8.8 Manual Optimization Results for B14 . . . . . . . . . . . . . . . . . . . . . . . . 119
List of Tables
5.1 Channel Capacitance of MOSFET for Different Operating Regions (Source [4]) . . 55
5.2 NMOS Decap Measurement Results using HSPICE . . . . . . . . . . . . . . . . . 59
5.3 PMOS Decap Measurement Results using HSPICE . . . . . . . . . . . . . . . . . 59
5.4 DCFLIB Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5 OSULIB and UCDCLIB Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.6 Library Views and their Description . . . . . . . . . . . . . . . . . . . . . . . . . 63
Chapter 1
Introduction
The semiconductor industry has seen unprecedented growth since the advent of the transistor era. The invention of the transistor and the associated integrated circuits in the 1960s soon replaced bulky vacuum-tube based devices, leading to highly compact, more reliable, and affordable electronics. This propelled an ever-increasing demand for electronic products, and advances in research and technology have kept the trend going. Following Moore's law, which predicts a doubling of transistor count every 1.5 to 2 years [5], the semiconductor industry has come a long way from about 1000 transistors in the early 1970s to hundreds of millions of transistors in present-day integrated circuits. Today, the emphasis is on the "More than Moore" approach, which allows for functional diversification by integrating devices belonging to different domains (e.g., analog, RF communication, sensors) onto a single chip (known as a System-on-Chip or SoC) [1]. This will enable increasingly complex and cost-effective devices for at least a few more decades.
The remarkable increase in the complexity and performance of integrated circuits over the years has been made possible by advances in circuit and process technologies. The demand for power- and area-efficient devices paved the way for CMOS technology, while advances in process technology led to a steady decrease in the feature size of CMOS devices. Technology scaling from sub-micron to deep-submicron (less than 0.5um) has benefited CMOS devices in terms of area, power, and speed. The transition into the nanometer regime (less than 100nm) continues to improve device characteristics.
The motivating factors for these technological advances are performance and cost-effectiveness. Nevertheless, nothing comes without a price. With the continual shrinking of device sizes and increasing on-chip logic complexity, various deep-submicron and nanometer parasitic effects are becoming prominent, raising reliability concerns for integrated circuits [6]. Effects such as increased heat dissipation, process variation, crosstalk, and power supply noise pose serious problems. Unless adequately addressed at various steps during the design process, these effects can lead to anything from intermittent malfunction to permanent yield loss.
In this thesis, we propose a new framework to address one of these deep-submicron effects, namely power supply integrity. For reliable operation of any electronic circuit, the power supply must be stable over its range of operation. The data from ITRS 2007 [1] in Figure 1.1 shows IC supply voltage, cost-performance, and frequency trends. Cost-performance refers to the highest chip performance achievable with economical power consumption. This technology-induced voltage scaling, coupled with faster switching frequencies and higher power consumption, makes circuit behavior very sensitive to power supply variations. The effect is classified as the power and ground bounce problem. Power and ground bounce can inject random glitches which propagate as malfunctioning logic.
The scope of work presented in this thesis lies in providing a new framework to stabilize the
power supply to the on-chip logic circuit by optimizing the on-chip decoupling capacitance. To
develop an understanding of the power supply noise effects and the need to control it, subsequent
sections start with a brief overview of the power supply noise, the factors causing the power supply
noise, and its effects on circuit performance. This is followed by the research overview and contributions. Finally, the overall structure of the thesis is presented in Section 1.6.
1.1 Power Supply Noise

Reliable operation of a digital logic circuit requires a stable DC power supply. Such a supply is designed to deliver, instantly, the required amount of charge to the logic circuit during its switching period to meet timing specifications, while maintaining the supply voltage level at various points in the circuit. Any deviation of the supply voltage from the ideal DC level affects circuit performance. This change in the supply voltage is termed power supply noise.
A typical electronic system consists of a number of integrated circuits housed in different packages, interconnected on a single- or multi-layer printed circuit board as shown in Figure 1.2. The power
supply to a logic circuit on a specific chip has to pass through a hierarchy of levels.

[Figure 1.2: Hierarchical power delivery path — on-board interconnect, package interconnect (package-to-die interface), and on-chip interconnect]

Originating at the on-board voltage regulator, the supply voltage reaches the on-chip circuit through the on-board interconnect, the package interconnect (package-to-die interface), and finally the on-chip interconnection network. Assuming an ideal interconnect network and a perfect on-board voltage regulator,
the logic circuitry would get the required amount of charge instantly during its switching period, and the voltage level at the logic supply node would be identical to that at the output of the voltage regulator. In reality, however, neither the voltage regulator nor the power distribution interconnect is ideal. The on-board voltage regulator has a finite output impedance, and the long supply and ground paths present significant parasitics, namely resistance, capacitance, and inductance. Current flowing through a resistive network causes a voltage drop as per Ohm's law, reducing the supply voltage at the logic supply nodes. Figure 1.3 shows a circuit model illustrating this effect [7]. Current I drawn by gate G3 causes a voltage drop of I · (R1 + R2 + R3) at node VD3. At the same time, node VD2 sits at VDD − I · (R1 + R2), which affects the performance of gate G2.
The voltage drop due to the current drawn by a logic gate thus affects not only its own performance but also that of neighboring gates. The drop in supply voltage due to resistive parasitics is commonly referred to as IR drop, or simply voltage drop [8]. The ground network is usually similar to the supply network and suffers similar noise when the current takes its return path: due to the parasitics in the ground network, the ground potential rises. This effect is referred to as ground bounce. With interconnect inductance becoming increasingly important, the transient noise due to L · di/dt also contributes to the overall voltage drop. The noise at supply nodes due to inductance is termed ∆i noise.
[Figure 1.3: Resistive supply network model — VDD feeding nodes VD1, VD2, VD3 through resistances R1, R2, R3, driving gates G1, G2, G3]
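The resistive model of Figure 1.3 can be sketched numerically. The following Python fragment is an illustrative sketch only; the resistance and current values are assumptions, not data from the thesis:

```python
# Illustrative sketch of the Figure 1.3 model: a single gate current I
# flowing through series rail resistances R1..R3. All values are assumed.
VDD = 1.8              # nominal supply (V)
R = [0.5, 0.5, 0.5]    # segment resistances R1, R2, R3 (ohms)
I = 0.010              # current drawn by gate G3 (A)

# Node voltage VDk = VDD - I * (R1 + ... + Rk), per Ohm's law.
node_voltages = []
acc = 0.0
for r in R:
    acc += r
    node_voltages.append(VDD - I * acc)

print(node_voltages)   # voltages at VD1, VD2, VD3
```

Note how VD1 and VD2 droop even though only G3 draws current, mirroring the observation that one gate's switching current affects its neighbors.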
As discussed in Section 1.1.1, there are various reasons that make power supply noise an area of concern. On one hand, a drop in the voltage level at a logic supply node adversely affects performance; on the other, any rise above the ideal voltage level raises reliability concerns. Hence, reliable operation of a logic circuit requires strict control of the absolute change in voltage level from the ideal. Typically, the supply voltage is allowed to vary within a 10% band around its nominal value (VDDnominal) as shown in Figure 1.4. However, the noise margin shrinks as the supply voltage scales down, requiring a tighter 5% band for supply voltage variations. As long as the supply voltage variation is confined to this band, the circuit is said to operate reliably, meeting its functional and logical specifications.

[Figure 1.4: Supply Voltage Band for Reliable Operation at the 180nm Node — node voltage (V) vs. time (ns); voltage below VDDLower = 1.89 V affects circuit performance]
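A voltage-drop check against such a band reduces to a simple per-sample comparison. A minimal sketch, assuming a 1.8 V nominal supply and the 10% band; the waveform samples are hypothetical:

```python
# Flag supply-node samples that leave the +/-10% reliability band.
# Nominal voltage and waveform samples are illustrative assumptions.
VDD_NOM = 1.8
TOL = 0.10
lower, upper = VDD_NOM * (1 - TOL), VDD_NOM * (1 + TOL)

samples = [1.80, 1.75, 1.66, 1.58, 1.79]   # hypothetical node voltages (V)
violations = [v for v in samples if not (lower <= v <= upper)]
print(violations)                          # samples outside the band
```

Tightening TOL to 0.05 models the stricter band required as the supply voltage scales down.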
A major contribution to on-chip power supply noise comes from output drivers. Output drivers are typically large, designed to meet the high current requirements of off-chip loads. The power supply noise due to simultaneous switching of multiple output I/O drivers is commonly referred to as simultaneous switching noise, or SSN (Figure 1.5). Although SSN can also be attributed to parallel switching of other circuit components, the term is most often used in the context of output driver switching [2].
1.1.1 Causes
As is apparent from the previous section, interconnect parasitics play a big role in intensifying power supply noise; they are, however, not the only cause. Various other design parameters affect the power supply noise. Many of them are difficult to control, being a result of our craving for increasingly complex circuits on a single die. This subsection briefly summarizes the important causes.
[Figure 1.5: Simultaneous Switching Noise due to Output Drivers (Adapted from [2]) — output drivers D0..D31, each driving a load CL, sharing package VDD/GND connections with on-chip gate G1]
Cruising along with the Moore’s law, number of transistors per unit area on an integrated circuit
has been consistently increasing with each technology scaling. Although, higher transistor density
on an IC allows for increasingly complex and higher performance circuits, proper fabrication of
the circuit requires more number of metal layers for signal and power routing. Higher number of
metal wires translates into longer power supply connections to the logic circuit, which increases
the associated parasitic. Moreover, narrow interconnect wires at lower process nodes lead to higher
resistance. Increased metal layer count results into more number of via connections between metal
layers. The resistance due to via does not scale well with the technology scaling, and in fact, is
increasing.
At the same time, increased transistor density implies higher overall current, and hence higher power consumption, for the design. The "More than Moore" trend worsens the situation by integrating diverse functions onto a single die. The power consumption of logic is further increased by higher switching activity, attributable to higher clock frequencies. Another important factor affecting the power supply noise is the reduction in supply voltage at each technology node. To avoid device breakdown due to excessive electric fields, the supply voltage is reduced with each technology node. This reduction lowers the available noise margin and makes the circuit more sensitive to power supply variations.
Existing CAD tools are not advanced enough to reliably predict the impact of power supply noise at the early stages of the backend flow. To meet timing specifications, CAD tools try to keep timing-critical components close together, increasing the switching activity in a region. The increased regional voltage drop due to higher power dissipation leads to hotspots (regions of elevated temperature). A major contributor to the power supply noise is the clock network: the synchronous operation of clocked registers leads to excessive voltage drop.
Advanced low power techniques also contribute to voltage drop problems. Techniques such as power gating lead to uneven distribution of currents. When a power-gated circuit block wakes up, a large rush current produces regions of hotspots, and hence contributes to the voltage drop. Clock gating is another example.
Selection of a proper package for the die also affects the power supply noise. Until the deep sub-micron node, the package-pin-to-die connection was the main contributor to interconnect inductance parasitics. However, on-chip inductance is also becoming increasingly important due to the longer current loops of complex circuits at lower technology nodes. Lastly, the number and location of supply pads on a chip also affect the IR drop.

1.1.2 Adverse Effects

Power supply noise affects circuit performance in many ways. Some of the important effects are summarized in this subsection [8].
Reduction in the supply voltage of a logic gate increases its propagation delay. This happens because the reduced gate-to-source voltage of the PMOS transistor decreases the drain current available to charge the output load. A similar phenomenon on the NMOS side slows the discharge rate when the ground potential rises. Further, the voltage drop due to the current drawn by a logic gate can affect nearby logic as well, if the neighboring logic switches during the same period. In fact, a 5% reduction in supply voltage can make a gate 15% slower [10]. The increase in logic delay can lead to intermittent timing violations, and hence can restrict the operating frequency of the circuit. Typically, in synchronous logic, voltage drop in a combinational path leads to setup time violations, because signal propagation is delayed by the reduced supply voltage. Similarly, voltage drop in the clock network delays clock arrival, leading to hold time violations. Therefore, to ensure the timing goals of a design, variation in the supply voltage level must be minimized.
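The delay sensitivity described above can be illustrated with the alpha-power-law delay model, delay ∝ VDD/(VDD − Vth)^α. This is a standard first-order model, not one used in the thesis itself, and the threshold voltage and α below are illustrative assumptions:

```python
# Alpha-power-law sketch of gate delay vs. supply voltage.
# Vth and alpha are assumed, illustrative parameters.
def delay(vdd, vth=0.7, alpha=2.0):
    return vdd / (vdd - vth) ** alpha

nominal = delay(1.8)
drooped = delay(1.8 * 0.95)               # gate seeing a 5% IR drop
slowdown = (drooped / nominal - 1) * 100
print(f"{slowdown:.1f}% slower")
```

With these assumed parameters the 5% droop yields a double-digit percentage slowdown, of the same order as the ~15% figure cited above.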
Power supply noise also degrades the available noise margin of the logic circuit. The output voltage of a typical logic gate is measured with respect to the supply voltage or the ground potential, so any change in the supply or ground potential shifts the output voltage seen by subsequent gates. If the supply level of the connected gate is also reduced by power supply noise, the state of the gate can become unpredictable due to the inconsistency in voltage references. The situation is similar to a multi-vdd logic design style, where proper operation requires level shifters. Due to spatial variation in the supply voltage of connected logic gates, the available noise margin of the receiving gate degrades.
Power supply variations can also introduce jitter and/or skew in the clock signal. The on-chip clock is typically generated using a PLL; any supply variation in the PLL components changes the phase of the clock signal, thereby introducing jitter. The on-chip clock network is designed to balance the skew between clock endpoints. Supply voltage variation in one path of the clock network can delay clock propagation relative to the other paths, leading to non-zero clock skew. The resulting skew can cause setup or hold time violations depending on the clock direction with respect to the data flow.
Voltage overshoot at the power and ground nodes can affect transistor reliability. The reduction of gate oxide thickness with technology scaling makes transistors more susceptible to this damage.
Another effect attributed to voltage drop is the phenomenon of "Joule heating" [11]. With interconnect scaling, the resistance per unit length of interconnect is increasing, and the high current density through such interconnect produces a voltage drop. This voltage drop in the interconnect leads to self-heating (known as Joule heating). The increased temperature causes a proportional increase in the resistivity of the metal interconnect, which further increases the IR drop. Hence, aggressive technology scaling requires a holistic consideration of power, temperature, and voltage drop effects.
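The resistivity feedback loop can be quantified with the linear temperature coefficient of resistance. A minimal sketch: the baseline resistance and temperature rise are assumed values, while the coefficient for copper (~0.0039 per K) is a textbook figure:

```python
# Joule-heating feedback: R(T) = R0 * (1 + a * (T - T0)).
# R0 and T are illustrative; a is the textbook coefficient for copper.
R0, a, T0 = 1.0, 0.0039, 25.0
T = 85.0                     # interconnect temperature after self-heating (C)
R = R0 * (1 + a * (T - T0))
print(R)                     # hotter wire -> higher resistance -> larger IR drop
```

The higher resistance in turn increases the IR drop and the dissipated heat, which is why power, temperature, and voltage drop must be analyzed together.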
1.2 Approaches to Tackle Power Supply Noise
A power supply network in a typical electronic system spans multiple levels of hierarchy. This
includes the on-board supply network, package connections, and the on-chip supply network. With
increasing complexity of integrated circuits, as well as of board-level designs, a system-level approach to tackling power supply noise is necessary. Optimizing noise at all levels of the system hierarchy at once, however, is extremely daunting. Fortunately, it is possible to optimize the noise at different levels independently by making certain boundary assumptions. In this section, we concentrate on approaches to tackling on-chip power supply noise.
On-chip power supply noise not only depends on the design of the on-chip power distribution
network, but also depends on the underlying logic circuitry as described in Section 1.1.1. Therefore,
a proper design of the power distribution network combined with logic-level optimizations can significantly improve the power supply noise profile of a chip. The following subsections summarize some of the approaches used to tackle power supply noise:
1.2.1 Power Distribution Network Design

The on-chip power distribution network is designed to provide the required current and voltage to the logic circuitry for its proper functionality. Multiple metal layers are used in a high-performance integrated circuit to form the signal and power routing network. Usually, the higher metal layers form the global power grid owing to their low resistance, and power connections are brought down to the transistor level through the lower metal layers, connected by vias. Power routing for the logic cells usually conflicts with signal routing: the more metal resources committed early to power routing, the fewer remain for signal routing.
Conversely, devoting too few resources to the power network increases the current density on the power conductors, leading to electromigration and voltage drop problems. Moreover, the power distribution grid is typically designed during an early stage of the backend flow, when the placement information for the logic cells is not yet known; prior tape-out experience and gate-level power consumption information are used to design the power network. For these reasons, the power distribution network is conservatively designed. Adding extra resources for power routing at later stages could force a complete redesign of the network and is therefore preferably avoided. This leaves designers with little flexibility in tackling voltage drop problems purely by optimizing the power distribution network. Nevertheless, the following approaches can still have a significant impact on controlling power supply noise:
• An optimal topology for the power distribution network can be selected, reducing total power routing area and improving overall chip voltage drop [12]. Multiple supply and ground stripes can improve the current profile of the chip; this also helps relieve electromigration problems. Further, multiple supply pads, sufficiently spaced around the die periphery, can be provided to lower the overall power distribution network impedance.
• Depending on the design and available metal resources, VDD/GND power planes can be used
at higher metal layers. Power planes significantly reduce the resistance parasitic, and provide
shielding effect for the noise. However, use of power planes complicates signal routing due to
reduction in metal resources. In many cases, VDD and GND rings around the core periphery suffice.
• Sizing of power/ground network is another way to relieve power grid noise. Wire widths of
power and ground conductors are optimized such that total weighted area is reduced while
satisfying the electromigration and voltage drop constraints [13].
• Using multiple vias also helps reduce voltage drop. Multiple vias reduce the via resistance and provide alternate paths for the current; reducing the current density along any one path lowers the effective IR drop.
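The via-doubling point follows from parallel resistance: n identical vias present one nth of the single-via resistance. A small sketch with an assumed per-via value:

```python
# Effective resistance of n identical parallel vias: R_eff = R_via / n.
# R_via is an assumed, illustrative per-via value.
R_via = 2.0                                  # ohms per via (assumed)
r_eff = {n: R_via / n for n in (1, 2, 4)}
print(r_eff)                                 # doubling vias halves via resistance
```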
1.2.2 Logic Level Optimizations

Logic-level optimization techniques to control power supply noise work mainly by reducing the total power consumption or by redistributing the overall current requirements in a region. Some of the important techniques are discussed below:
• The total power consumption of a logic circuit can be reduced in various ways. Power supply noise matters only while the logic circuit is switching, so controlling the transient power is important. However, excessive leakage current from logic under shutdown can affect nearby active circuitry if their power rails are shared. Many low power techniques, such as power gating and clock gating, can be used to lower the power consumption. The clock network is one of the major contributors to power supply noise owing to the clock signal's unity activity factor, so gating the clock to inactive logic can significantly reduce overall power consumption. However, as described in the previous section, low power techniques can create regions of hotspots, which might affect the power supply noise negatively. Hence, a careful analysis must be done before adopting a specific technique.
• Proper buffer sizing is also important. Buffers are often conservatively upsized to meet timing specifications; these oversized buffers exacerbate noise problems by demanding more current from the power supply network.
• Reducing the system frequency is another way to reduce the overall noise, since high frequency signals bring inductive (L · di/dt) noise into the picture.
• Stagger the switching of sequential elements rather than switching them all at the same time.
• Power supply noise affects timing as described in the previous section. Hence, the logic can be designed conservatively by introducing a 10% timing margin in the cell libraries [7].
1.2.3 Other Approaches

The voltage drop profile of a design depends on the placement of its logic blocks. Most of the approaches discussed above are typically applied prior to placement, which limits their usefulness in controlling the voltage drop effectively. By far the most powerful approach to controlling power supply noise at the placement stage is the use of on-chip decoupling capacitors [14, 15].
On-chip decoupling capacitors act as local charge reservoirs, fulfilling the instantaneous charge requirements of switching nodes, and are very effective in lowering the power supply network impedance. However, decoupling capacitors are not part of the original logic design; they are added separately to subdue power supply noise effects, and can account for a substantial percentage of total chip area. Therefore, it is important to optimize the placement and number of capacitors for effective voltage drop management.
capacitors can further be enhanced by using an on-chip switching voltage regulator [16]. On-chip
switching voltage regulators can be used to increase the charge transfer efficiency by dynamically
making series or parallel connections of capacitors. Voltage drop compensation based on on-chip
decoupling capacitor optimization is the topic of this thesis. On-chip decoupling capacitors are
further discussed in subsequent chapters.
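The charge-reservoir role of a decap yields a useful first-order sizing rule: to supply a transient current I for a duration dt while limiting the local droop to dV, the decap must hold at least C = I·dt/dV. This sketch uses illustrative values, not figures from the thesis:

```python
# First-order decap sizing from Q = C*V: C >= I * dt / dV.
# All numeric values are illustrative assumptions.
I = 0.050        # transient switching current (A)
dt = 100e-12     # duration of the current demand (s)
dV = 0.09        # allowed local droop, 5% of a 1.8 V supply (V)

C = I * dt / dV
print(f"required decap ~ {C * 1e12:.1f} pF")
```

The estimate scales linearly with current and duration and inversely with the allowed droop, which is why a tighter noise band demands a larger decap budget.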
1.3 Research Overview
In order to reduce the adverse effects of on-chip power supply variation, decoupling capacitors (also known as decaps) are widely used as local charge reservoirs. Decaps are not part of the logic circuit; they are added separately, in the available whitespace (core area not utilized by logic cells), during later stages of the design flow to provide instantaneous charge to the logic circuit. Typically, whitespace is created in the design by the conservative approach used to define core dimensions: the core area calculation has to account for signal routing as well as decap budget requirements. A timing- or wirelength-driven placement tool causes this whitespace to migrate toward the core periphery as it places logic cells in close proximity. Once placement of the logic cells is complete, the whitespace is filled with filler cells to provide well connectivity.
Since the voltage drop effects depend on the placement and routing of logic cells, problems revealed
by voltage drop analysis at this stage can be addressed by inserting decaps in place of filler cells.
We call this approach filler-based decap allocation. It offers an easy solution, since replacing filler cells with decaps does not call for placement modifications; therefore, placement-optimized design metrics remain unaffected. With technology scaling, however, the filler-cell-based decap approach not only compromises power supply integrity by placing decaps away from the switching nodes, but it also consumes more of the decap budget than necessary. The extra
decap budget translates into larger chip area, and hence, higher cost.
As is evident from the above problem, we seek to increase the cost-effectiveness of decaps in addressing the on-chip voltage drop effects. For decaps to be more effective, they must be placed close to switching nodes. A distributed decap approach would be more effective as
compared to the filler cell based lumped decap approach. Experimental results shown in Chapter 3
validate the usefulness of the distributed decap approach, and provide a motivation for the proposed
framework discussed below. Further, adding extra steps to an existing design flow, requiring out-of-flow data processing, significantly affects the total development time, and hence is highly
discouraged. Therefore, we seek a solution which does not call for significant changes in the existing
design flow.
In this work, we propose a library based distributed decap approach to control the voltage drop
problems around the chip. We provide a complete design framework to analyze the voltage drop in
a chip, and compensate it by incorporating the decoupling capacitors close to the violating nodes.
This is done by providing designers with an additional library, containing decap-padded logic
cells, along with the nominal cell library. We analyze the initial voltage drop and algorithmically identify the voltage drop regions exceeding a user-defined threshold. Distributed decap placement
is achieved by selectively replacing logic cells in the affected regions with equivalent decap-padded logic cells to meet the decap budget requirement of the design. We develop the necessary components to calculate the optimum number of cell replacements and a method to incorporate new
cells in the design with minimal perturbation to the original placement. Chapter 4 provides details
of the proposed approach.
This section describes the specific contributions of the thesis. The components described below, stitched together, complete the framework described in the previous section.
the standard voltage drop analysis tool, and generates a list of cells to be replaced. Chapter 6
provides necessary details about the decap optimization algorithm.
1.5 Assumptions
Before delving into the details of the proposed framework, we provide in this section a brief discussion of the assumptions and considerations made during the development of the proposed approach. Each assumption is accompanied by an explanation of its implications, and in most cases does not affect the usefulness of the approach.
1. Row based standard cell designs are prevalent in ASIC designs. In this work, we consider
issues of voltage drop and decap optimization in context of single-height standard cell based
designs only. An extension to macro-based designs and multiple-height cells can be made with minor modifications.
2. For voltage drop analysis, we consider only the resistance parasitics of the power distribution network (PDN). With technology scaling, PDN inductance characteristics are also becoming significant. However, due to unavailability of interconnect technology information, we only extract PDN resistance. The decap optimization algorithm works on the voltage drop information and is agnostic to the interconnect parasitic model. Including inductance in the analysis would only affect the noise level, and hence does not call for any change in the proposed approach.
3. We assume that the placement tool can provide some timing margin (e.g., 10%) while
placing the logic cells. This is due to the fact that addition of decap cells will affect the
timing metrics of the original placement. A relaxed timing constraint allows for voltage drop compensation without significantly degrading the timing metrics.
4. We restrict our analysis to the on-chip supply network. Due to power distribution symmetry, the model for the on-chip ground network turns out to be similar to that of the supply network. Therefore, the ground network can be analyzed in exactly the same manner.
5. While creating the model of the power distribution network, we do not consider package parasitics. We assume that the supply and ground voltages at the chip I/O pads are ideal. Again, for the same reasons as given for not including the inductance parasitics, this assumption does not limit the usefulness of the approach; including package parasitics would only affect the noise level.
6. We assume that the chip core utilization is sufficient to accommodate the decap cells. This is a valid assumption for two reasons. First, about 10% of the core area is typically used for decap allocation. Second, some percentage of the core area (typically 23-30%) is left for routing considerations. Together, these two factors leave enough whitespace in the core so that decap cells can be accommodated. Moreover, additional whitespace, if required, can be inserted using the developed ECO-Placer.
The remainder of the thesis is organized as follows. Chapter 2 provides an overview of the voltage drop analysis flow. Requirements for the voltage drop analysis and different approaches
are discussed. Experimental results in Chapter 3 highlight the effectiveness of placing decaps near the switching nodes, providing a motivation for voltage drop optimization using decap padding of standard cells. Chapter 4 stitches together the components of the overall framework, and provides a complete picture. Design and
characterization of decap-padded standard cell library is described in Chapter 5. Chapter 6 provides
the details of the C++ based decap optimization algorithm, and in Chapter 7 we describe the C++ based ECO-Placer algorithm. Finally, we provide experimental results of the proposed framework on a set of benchmarks in Chapter 8. We conclude our work in Chapter 9 and discuss possible directions for future work.
Chapter 2
Considering the adverse effects of the power supply noise on the circuit performance, it is im-
perative to ensure the power integrity of a design before tape-out. Ensuring the power integrity
of a chip requires proper design of on-chip power distribution network, and subsequent full-chip
noise analysis at the power supply nodes. While the design and refinement of the power distribution
network can be done at various stages of a design flow, an accurate full-chip voltage drop analysis is only possible during the late stages of the design flow. This is due to the fact that the on-chip
voltage drop not only depends on the power consumption of logic blocks, but it also depends on
their placement locations. A logic block placed near the core boundary sees a small value of supply
interconnect parasitic as compared to a block placed at the center of a chip.
The analysis of a power supply noise at late stages of a design flow, however, poses unique chal-
lenges in terms of memory and time requirements. With increasing complexity of VLSI chips, the
power distribution network is also becoming more and more complex. Use of multiple metal layers
and via connections makes the power distribution network a three dimensional network. Further,
the power current requirement of underlying logic circuit depends on input data and varies from
location to location within a chip. Unless these issues are considered methodically, the problem of
voltage drop analysis becomes intractable. Therefore, a systematic computer-aided approach is a must to handle the complexity of the power supply noise analysis, and to make the problem tractable.
In this chapter, we discuss standard approaches to handle the complexity of the power supply
noise analysis. We start with a typical design flow in Section 2.1, and identify the stage where
the voltage drop analysis fits into the design flow. Section 2.2 provides modeling requirements for
voltage drop analysis. This is followed by the two popular approaches, static and dynamic analysis,
in Sections 2.3 and 2.4 respectively. Decaps are a powerful way to control on-chip noise. The importance and types of on-chip decoupling capacitance are discussed in Section 2.5. Lastly, we
discuss the previous research in the domain of power supply noise optimization in Section 2.6.
2.1 Typical Design Flow
Figure 2.1 shows a typical top-down digital IC design flow. The design flow can be divided
into two parts: frontend, and backend. Frontend deals in logical design, whereas backend involves
physical design of the IC.
As shown in the figure, a standard cell based design flow starts with converting the input design specifications into an RTL description using design languages such as Verilog, VHDL, or SystemC.
After functional testing of RTL model using test benches, the design is passed to the logic synthesis
step. A standard tool used for logic synthesis process is Synopsys Design Compiler. In addition to
RTL model, the logic synthesis tool also takes in design constraints and standard cell library as its
inputs. Based on the constraints, the design is optimized in terms of area, power and timing, and
a technology independent gate-level netlist is generated. The technology mapping process converts
the generic netlist into the technology-dependent gate-level netlist. This netlist describes the design
in the form of standard cells present in the library. Finally, design is verified for timing violations
using a static timing analyzer tool such as Synopsys PrimeTime.
The synthesized design is placed and routed in the backend stage. The backend design involves
floorplanning, power grid design, cell placement, power routing, clock tree synthesis (CTS), filler
cell insertion, and signal routing in order (not shown in the figure). The physical layout information
for cells at this stage is provided by the standard cell library. Timing verification is performed at
various stages during the backend flow. For example, timing specifications of design are typically
verified at pre-CTS, post-CTS, and post-route stages. The routing parasitic information for the
design can be captured in SPEF or SDF formats. Cell delay information together with routing
delays can be used to perform an accurate post-route timing sign-off analysis of the design.
In addition to timing sign-off, the chip needs to be analyzed for the power integrity. Power-Grid
sign-off (P/G sign-off) involves verifying the design against any potential voltage drop problems.
Although an early analysis, such as at the floorplanning stage, can prevent a costly re-spin of the design
process, it is not very accurate due to lack of placement information. For accurate results, voltage
drop analysis is typically performed either after initial design placement or at post-route stage.
Design optimizations are applied to correct any voltage drop problems revealed at this stage.
Once the timing and P/G sign-off of the design is successful, the last step involves generation of the GDSII/OASIS format of the design for final tape-out. As is evident, a typical IC design flow involves a number of steps to realize a final workable chip. Any design changes required at the later stages can result in a costly re-spin and hence should be avoided.
[Figure 2.1: Typical top-down digital IC design flow: design specifications, RTL simulation (ModelSim), logic synthesis (Synopsys DC, with LIB/DB library inputs), timing violation checks and design optimizations, and final tape-out (GDSII/OASIS).]
2.2 Voltage Drop Analysis Flow
As discussed in the chapter prelude, full-chip voltage drop analysis of an integrated circuit re-
quires a systematic computer aided approach due to the sheer magnitude of the problem size. Hi-
erarchical design of a complex integrated circuit suggests that the power supply noise analysis can
be performed at the individual logic block level. Hence, local block-level optimizations can be performed to meet the noise margins at the supply and ground nodes. Although block-level analysis offers advantages in terms of reduced memory and runtime requirements, a full-chip power integrity analysis is a must. When locally optimized logic blocks are combined together to form a complete
design, the current flowing through the power grid due to an adjacent block can affect the power
integrity of the logic block under analysis. Hence, only a full-chip analysis can ensure the power integrity of the design.
Further, the problem is complicated by the non-linear behavior of the transistors loading the
power supply grid. A non-linear circuit simulator such as SPICE can be used to perform an accurate
analysis. However, full-chip analysis of a design netlist containing hundreds of millions of power
grid segments and transistors makes the process intractable.
The process of voltage drop analysis therefore requires creation of a full-chip model of a design.
The full-chip model of design is typically created in two steps. First, the model for the power
and ground network is created. This involves parasitic extraction of the power grid interconnect
discussed in Section 2.2.1. In the second step, transistor circuits loading the power distribution
network are represented by an equivalent linear current source model based on the current profile
of the circuit as discussed in Section 2.2.2. Finally, the two models are combined and a complete
linear model for full-chip power distribution analysis is created. Brief discussion of model analysis
is presented in Section 2.2.3.
Evidently, model creation and analysis of the resulting network is the method of choice. However, this approach results in a conservative analysis, and slightly overestimates the power supply
noise levels. The reason for this behavior is the negative feedback between the current consumed
by the logic circuit and the power grid voltage drop. The high current consumed by a logic block
results in a significant voltage drop. This voltage drop, in turn, results in the decrease in the logic
block current, and hence reduces the overall voltage drop levels. Hence, an iterative analysis is
necessary to get an accurate picture of voltage drop profile of a chip.
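This negative feedback can be illustrated with a toy fixed-point iteration, sketched below in C++ (matching the implementation language used elsewhere in this thesis). The linear current-voltage sensitivity model and the function name are purely hypothetical, not part of any tool described in this work:

```cpp
#include <cassert>
#include <cmath>

// Toy fixed-point iteration for the negative feedback between logic current
// and supply droop. Hypothetical linear sensitivity: the block current scales
// with its local supply voltage, i(v) = i_nom * (v / vdd), while the grid
// drops v = vdd - r_grid * i(v). Iterate until self-consistent.
double converged_droop(double vdd, double r_grid, double i_nom,
                       int max_iter = 100, double tol = 1e-9) {
    double v = vdd;                        // start from the ideal supply
    for (int k = 0; k < max_iter; ++k) {
        double i = i_nom * (v / vdd);      // current shrinks as v droops
        double v_next = vdd - r_grid * i;  // IR drop across the grid
        if (std::fabs(v_next - v) < tol) { v = v_next; break; }
        v = v_next;
    }
    return vdd - v;                        // self-consistent droop
}
```

The converged droop comes out slightly smaller than the single-pass estimate r_grid * i_nom, which is exactly why a non-iterative analysis is conservative.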
2.2.1 Power Grid Modeling
RLC extraction
The on-chip power distribution network is formed using multiple metal layers. Contacts and vias form the connections between layers. Long metal lines are typically divided into multiple segments of smaller length, and each segment is modeled using a Π-network consisting of a resistor and two capacitors. A distributed RC network yields more accurate results than a lumped RC model, where a long metal line is replaced by single R and C elements. However, the downside is the increased amount of model data.
Layer resistance can be characterized either using shape-based extraction algorithms or using the standard sheet-resistance formula shown below.
R = s · (l / w) (2.1)
where s is the sheet resistance in ohms/square, l is the length of the line in um, and w is the width of the line in um. Further, each contact and via contributes a fixed resistance. Contact and via
resistance needs to be included in the overall resistance extraction data. Lastly, the effects of temperature and electromigration on metal resistivity can also be included for a more accurate analysis.
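As a minimal sketch, the sheet-resistance formula of equation (2.1) and the fixed contact/via resistances can be combined into a simple path-resistance estimate. The function names and sample values below are illustrative only:

```cpp
#include <cassert>
#include <cmath>

// R = s * (l / w): resistance of a metal segment from sheet resistance
// (ohm/square), with length and width in the same units (e.g., um).
double segment_resistance(double sheet_res, double length_um, double width_um) {
    return sheet_res * (length_um / width_um);
}

// Series path resistance: one metal segment plus a number of contacts/vias,
// each contributing a fixed resistance.
double path_resistance(double sheet_res, double length_um, double width_um,
                       int num_vias, double via_res) {
    return segment_resistance(sheet_res, length_um, width_um)
         + num_vias * via_res;
}
```

For example, a 100 um line of width 2 um on a 0.1 ohm/square layer contributes 5 ohms; four vias of 0.5 ohm each raise the path to 7 ohms.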
Grid capacitance calculation can be based on unit-length overlap, fringe, and lateral capacitance models [9]. Although the complex geometrical layout can produce an overwhelming amount of capacitance data, because a capacitance can form between any two overlapping segments in a layout, the model size can be substantially reduced by ignoring the capacitive components between non-adjacent lines. This small compromise in accuracy is acceptable for two reasons. First, it puts less burden on time and memory requirements. Second, the overall capacitance of a power grid is usually dominated by the capacitance contributed by the logic circuits [8]. Types of logic circuit capacitance are discussed in subsequent sections.
The inductive properties of on-chip power distribution network are difficult to characterize.
Based on the shape and size of the current loops, loop inductance can be estimated, but the main
hurdle to this approach is that the current paths are not known in advance. Electromagnetic analysis
based PEEC models as described in [17] can be used to characterize the inductance of the grid.
Another approach involves creation of partial inductance matrix as described in [8].
2.2.2 Logic Circuit Modeling
The level of power supply noise is greatly influenced by the logic circuit loading the power distribution network. High power consumption by a logic block does not necessarily lead to a large voltage drop; only when the current consumed by the block flows through a highly parasitic interconnect path does it lead to a substantial voltage drop. Hence, the voltage drop profile of a chip depends on the placement and power current profiles of its logic blocks. Modeling a logic block involves calculating its power current profile as well as the parasitics it contributes, as shown in Figure 2.3. The parasitic information of a transistorized circuit includes the resistance and capacitance it offers. The total capacitance offered by the logic circuit can act as a decoupling capacitance, and can come from two sources (intrinsic and intentional) as described in Section 2.5.1. The following describes switching current modeling for the logic circuit.
The current profile of a logic circuit is determined by three components of current: dynamic, short-circuit, and leakage current. These current components are used to model the logic circuit as a triangular current source, as shown in Figure 2.3. The accuracy of the triangular current source representation is determined by the number of current samples used to generate the model. During the start of the switching period, the current magnitude of the circuit increases linearly and attains a peak value; the current magnitude then decreases linearly. This current profile is known as the tap current. The calculation of the tap current is complicated by the fact that the current profile of a logic circuit depends on the input pattern. In the case of multiple inputs, the worst-case switching pattern is used to determine the current profile.
Each transistor connection to the power grid creates a tap point. Although it is possible, and
indeed easy, to calculate the tap current for each transistor, the resulting tap current information
would be difficult to handle. Instead, tap current information is captured for individual logic gates
or small macros. The worst case current profile can easily be calculated for such small circuits due
to relatively small number of inputs.
Further, depending on the type of analysis, as discussed in the next section, the tap current information can represent either the average current profile of a logic gate or its transient current.
The RLC model of power distribution network is combined with the logic circuit tap current
information along with the decoupling capacitors at various grid nodes. The resulting model as
shown in Figure 2.4 forms a linear network for power supply noise analysis, and can be represented
by equation 2.2.
G · v(t) + C · v′(t) = i(t) (2.2)
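Equation (2.2) can be illustrated for a single node by discretizing with backward Euler at step h, giving (G + C/h) · v[n+1] = i[n+1] + (C/h) · v[n]. The one-node sketch below is purely illustrative (names and parameters hypothetical), not the full-chip solver:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One-node sketch of G*v(t) + C*v'(t) = i(t), discretized with backward
// Euler at step h: (G + C/h) * v[n+1] = i[n+1] + (C/h) * v[n].
// 'i' holds current samples at t = 0, h, 2h, ...; v0 is the initial voltage.
std::vector<double> simulate_node(double G, double C, double h,
                                  const std::vector<double>& i, double v0) {
    std::vector<double> v;
    v.push_back(v0);
    for (std::size_t n = 1; n < i.size(); ++n) {
        double v_next = (i[n] + (C / h) * v.back()) / (G + C / h);
        v.push_back(v_next);
    }
    return v;  // same length as i (for non-empty i)
}
```

With a constant current input, the node voltage settles to the DC value i/G, as expected from dropping the C · v′ term.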
2.3 Static Voltage Drop Analysis
Static voltage drop analysis is based on the average current of the logic circuit. The tap current captures the average value of the logic switching current. The goal of static analysis is not to find the accurate voltage drop in the circuit; rather, its main value lies in verifying the effectiveness of the power grid structure. Problems in the power grid structure, such as shorts, opens, or insufficient width of the power interconnect, can easily be identified using average current analysis.
[Figure 2.4: Linear model of the power distribution network for power supply noise analysis.]
The advantage of static analysis is its simplicity. Calculation and storage of average tap current
information for each transistor or gate is relatively easy. The average current of a gate can be
determined statistically using the gate switching activity information. Switching activity of a gate
can in turn be determined using input switching activity propagation algorithms or by performing a
gate level simulation of the design using test benches. Once the gate switching activity is known,
the average current of a gate can be given by [9]:
Iavg = A · Cgate · Vdd · Fclk (2.3)
where A is the gate switching activity value, Cgate is the total gate capacitance of the nets in the gate including the load capacitance, Vdd is the supply voltage, and Fclk is the chip clock frequency.
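The average-current expression can be sketched directly; the function name and the sample values are illustrative only:

```cpp
#include <cassert>
#include <cmath>

// I_avg = A * C_gate * Vdd * Fclk (farads, volts, hertz -> amperes).
double average_gate_current(double activity, double c_gate_farad,
                            double vdd, double fclk) {
    return activity * c_gate_farad * vdd * fclk;
}
```

For example, an activity of 0.2 with 10 fF of switched capacitance at 1.2 V and 1 GHz yields an average tap current of 2.4 uA.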
Average current analysis simplifies the overall power grid model, since only the resistance parasitics need to be considered; the power grid simplifies to a two-dimensional linear resistive network. Simple nodal analysis based on Ohm's law can be used to calculate the nodal voltages. Another advantage of static analysis is that it does not require input vectors, which greatly simplifies the analysis.
If some part of the power grid contains an open, the current flowing through that part will encounter more resistance, and the resulting higher voltage drop will clearly point out the problem. Static analysis can also be used to analyze the electromigration phenomenon, which depends on the transport of metal ions by the direct (average) current.
1. Parasitic resistance of the power grid is extracted, and resistance matrix is formed.
2. Average tap current for each transistor or gate is calculated.
3. The tap currents are attached to the resistive power grid network at designated tap points.
4. Depending on the VDD pad location, the ideal supply voltage is attached to the power grid network.
5. The resulting linear resistive network is solved using nodal analysis to calculate the current
and voltage levels at various nodes.
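Steps 1-5 above can be sketched for a toy two-node rail: the ideal pad voltage is folded into the right-hand side, and the resulting conductance system G · v = i is solved by Gaussian elimination. This is an illustrative sketch under assumed values, not the analysis tool used in this work:

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Solve G * v = i by Gaussian elimination with partial pivoting (dense, small).
std::vector<double> solve_linear(std::vector<std::vector<double>> G,
                                 std::vector<double> i) {
    const int n = static_cast<int>(i.size());
    for (int col = 0; col < n; ++col) {
        int piv = col;  // pick the largest pivot in this column
        for (int r = col + 1; r < n; ++r)
            if (std::fabs(G[r][col]) > std::fabs(G[piv][col])) piv = r;
        std::swap(G[col], G[piv]);
        std::swap(i[col], i[piv]);
        for (int r = col + 1; r < n; ++r) {  // eliminate below the pivot
            double f = G[r][col] / G[col][col];
            for (int c = col; c < n; ++c) G[r][c] -= f * G[col][c];
            i[r] -= f * i[col];
        }
    }
    std::vector<double> v(n);  // back-substitution
    for (int r = n - 1; r >= 0; --r) {
        double s = i[r];
        for (int c = r + 1; c < n; ++c) s -= G[r][c] * v[c];
        v[r] = s / G[r][r];
    }
    return v;
}

// Toy rail: pad --R1-- n1 --R2-- n2, with tap currents I1, I2 drawn at the
// nodes; the ideal pad voltage is folded into the RHS as Vdd/R1.
std::vector<double> rail_voltages(double vdd, double r1, double r2,
                                  double i1, double i2) {
    double g1 = 1.0 / r1, g2 = 1.0 / r2;
    std::vector<std::vector<double>> G = {{g1 + g2, -g2},
                                          {-g2,      g2}};
    std::vector<double> rhs = {vdd * g1 - i1, -i2};
    return solve_linear(G, rhs);
}
```

With Vdd = 1 V, R1 = R2 = 1 ohm, and 10 mA drawn at each node, the full 20 mA flows through R1 (drop to 0.98 V at n1) while only 10 mA flows through R2 (0.97 V at n2), showing how the farthest node sees the largest static drop.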
2.4 Dynamic Voltage Drop Analysis
Dynamic voltage drop analysis is based on the transient current of the logic circuit. As described in Section 2.2.2, the tap current captures the logic current with respect to time as a triangular
current source. The goal of dynamic analysis is an accurate voltage drop analysis. The instantaneous current drawn by a gate during the clock period can be high compared to the average current over the same clock cycle, as shown in Figure 2.5. Hence, the instantaneous voltage drop can be significantly high, and can only be captured using transient analysis of the power distribution network.
The main advantage of dynamic analysis is its accuracy. However, dynamic analysis poses a number of challenges. It requires extraction of R, L, and C parasitics (inductance can be ignored if it is small). The resulting network, containing a huge number of elements, strains circuit simulation time and memory capacity. Moreover, the tap current information for a gate is no longer a single value; rather, it must contain a series of 2-tuples representing transient current values and associated time stamps. This significantly increases the memory requirement. Further, dynamic analysis requires good input vector coverage; some portion of the design might not be analyzed for voltage drop effects due to insufficient vector coverage.
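The 2-tuple tap-current representation can be sketched as a sampled triangular waveform. The helper below is illustrative (name and parameters hypothetical), assuming at least two samples and 0 < t_peak < t_end:

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Sample a triangular tap-current waveform as a series of (time, current)
// 2-tuples: current ramps linearly from 0 to i_peak at t_peak, then falls
// back to 0 at t_end. More samples give a more faithful piecewise model.
// Assumes samples >= 2 and 0 < t_peak < t_end.
std::vector<std::pair<double, double>>
triangular_tap_current(double i_peak, double t_peak, double t_end, int samples) {
    std::vector<std::pair<double, double>> pts;
    for (int k = 0; k < samples; ++k) {
        double t = t_end * k / (samples - 1);
        double i = (t <= t_peak) ? i_peak * t / t_peak
                                 : i_peak * (t_end - t) / (t_end - t_peak);
        pts.emplace_back(t, i);
    }
    return pts;
}
```

A 5-sample model of a 1 mA peak over a 1 ns window, peaking at 0.5 ns, yields the samples (0, 0), (0.25 ns, 0.5 mA), (0.5 ns, 1 mA), (0.75 ns, 0.5 mA), (1 ns, 0).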
2.5 On-Chip Decoupling Capacitance
Decoupling capacitors play an important role in stabilizing power supply variations. Acting as local charge reservoirs, they provide a low-impedance path for current to the logic circuit, and lower the overall impedance of the power distribution system as seen from the load by supplying instantaneous charge during the switching period of a logic gate.
Figure 2.6 highlights the significance of the decoupling capacitor. The Figure 2.6[A] shows
a circuit without decoupling capacitors. The power distribution network is modeled by R and L
elements. As described in Section 1.1, whenever the load draws current from the input supply, the supply node at the load suffers a voltage drop, and the voltage rises at the ground node when the current takes the return path. This decrease in the voltage level across the load affects circuit performance. This performance
penalty can be reduced by the use of a decoupling capacitor, as shown in Figure 2.6[B]. During the inactive period of the load, the decoupling capacitor charges from the supply pad at a slower rate and acts as a charge reservoir. It then provides the required charge to the load instantaneously during its switching period. Depending on the capacity of the capacitor, a major portion of the current is supplied by the decoupling capacitor and only a small portion is drawn from the input supply, resulting in a relatively small voltage drop across the load. The impedance of the power supply network, as seen from the load, is thus lowered by the addition of the decoupling capacitor.
[Figure 2.6: RL model of the power distribution network (A) without and (B) with a decoupling capacitor.]
Although decoupling capacitors can improve the voltage drop profile of a chip, their unintelligent addition can raise several concerns. Typically, decoupling capacitors are added in the unused areas of the chip core, known as whitespaces. If the decoupling capacitors require more area than the available whitespace, the result is an increase in die area and a decrease in yield of the integrated circuit. Further, each decoupling capacitor contributes leakage current; the static power dissipation of the chip can thus increase with the number of decoupling capacitors. This factor also directly affects the circuit yield. Further, as described in the previous chapter, with technology scaling, distributed placement of decoupling capacitors is a must. Hence, optimization of the number and placement of decoupling capacitors is important.
2.5.1 Types of Decoupling Capacitance
Typically, the total decoupling capacitance of a design can be classified into two categories [8]: intrinsic and intentional. Intrinsic decoupling is offered by the parasitic capacitance of the logic
circuit. One source of intrinsic decoupling, the power grid capacitance, was already discussed in the previous section. The logic circuit also offers capacitances such as drain junction and gate-source capacitance. The pn-junction capacitance of the N and P wells also contributes to the intrinsic decoupling repository; due to the large well area, the parasitic well decoupling capacitance usually dominates the intrinsic capacitance. Non-switching logic circuits can also provide significant decoupling capacitance. The intrinsic capacitance contributed by the logic circuit can be determined from the power consumption of the circuit [8, 9]:
Cdecap = P / (A · Vdd² · Fclk) (2.4)
where Cdecap is the total intrinsic decap of the circuit, P is the power of the circuit, Vdd is the power supply voltage, Fclk is the clock frequency, and A is the switching activity factor of the circuit.
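The intrinsic-decap estimate follows directly from rearranging the dynamic power relation P = A · C · Vdd² · Fclk; the helper below is an illustrative sketch with a hypothetical name:

```cpp
#include <cassert>
#include <cmath>

// C_decap = P / (A * Vdd^2 * Fclk), rearranged from P = A * C * Vdd^2 * Fclk.
double intrinsic_decap(double power_w, double activity, double vdd, double fclk) {
    return power_w / (activity * vdd * vdd * fclk);
}
```

For example, a circuit burning 0.2 W at 1 V and 1 GHz with activity 0.2 corresponds to roughly 1 nF of intrinsic switched capacitance.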
Apart from the intrinsic device decoupling capacitance, designers can add MOS-based decoupling capacitors. These are known as intentional decoupling capacitors, and are typically realized as MOSFET gate-oxide capacitance. The design, modeling, and characterization of intentional decoupling capacitors is discussed in Chapter 5.
2.6 Previous Research
Since the power supply voltage directly affects circuit performance, containing the on-chip power supply noise within bounds has been a topic of research for over a decade. As described
in Section 1.2.3, decoupling capacitors are indispensable means for controlling the on-chip power
supply noise. Efficacy and placement of decaps on-chip has been analyzed in [19, 20] based on
effective radii of decaps. Decaps must be placed within the effective radius determined by current
load and the input power supply. Authors in [15] provides an early work on on-chip decap opti-
mization for controlling power supply noise. Several contributions since then have been made to
address the issue of power supply noise by optimizing on-chip decap [21, 22, 23, 24]. In [15, 21],
authors consider decap allocation and optimization at floorplan level for full-custom design style.
Given the initial floorplan and switching profile of circuit modules, noise levels at circuit modules
are calculated and decap is allocated to available whitespace in the floorplan using linear programming. If required, additional whitespace is also inserted into the floorplan based on heuristic criteria to meet the design decap demand. Architectural-level current signatures for various functional blocks
in a processor are used in [22] to estimate power supply noise level and required decap budget for
28
functional blocks. The authors in [22] evaluate different decap placement strategies by analyzing four decap cases, and show that a distributed decap placement approach provides the best noise attenuation.
The research focus of these works has primarily been on decap optimization for full-custom design styles. Designs are analyzed at the floorplan level, where large functional modules are abstracted by a current source representation. Adding decaps in such cases results in their placement away from the switching nodes and requires a large decap budget.
With these effects becoming pronounced at deep-submicron nodes, the need for decap placement close to switching nodes is highlighted in [19]. This requires that power supply noise and decap optimization be analyzed at a finer abstraction level of the design. On-chip decap optimization at the standard
cell level has been analyzed in [23, 24]. In [23], authors propose a non-linear programming based
decap optimization scheme applicable subsequent to placement stage, and calculate optimal decap
allocation for standard cell rows by analyzing an adjoint network of the original power distribution
model of the design. The authors in [24] pad the standard cells with decap to reduce the power supply noise. Decap padding of standard cells is predicted based on gate switching activity prior to placement; the padded decap amount is corrected after placement and power grid noise analysis
by gate sizing. Although these approaches are shown to provide effective distributed control of
power grid noise, they add additional steps to power grid noise analysis, increasing its complexity
further. Moreover, these approaches are not very conducive to a traditional library-based design flow. Therefore, as discussed in Section 1.3, we aim to develop an alternative framework for distributed decap optimization with the help of a decap-padded standard cell library.
Chapter 3
As discussed in Section 1.3, effective control of voltage drop requires placement of decaps close to the switching nodes. The filler-based decap allocation approach compromises power integrity by placing decaps away from switching nodes, and results in a larger-than-necessary decap budget; this directly affects the reliability of the circuit. Therefore, a distributed approach to decap placement is necessary, where decaps are placed physically close to the switching loads [19]. We verify the effectiveness of decap placement close to the switching loads in this chapter. Experimental results presented in this chapter provide a motivation for the development of the voltage drop compensation framework (described in Chapter 4) using a decap-padded standard cell library (described in Chapter 5). We analyze different circuit configurations, and compare the efficacy of the distributed approach with the lumped decap placement approach, which emulates the filler-based decap allocation method.
Figure 3.1 shows a circuit model for evaluating the effectiveness of the distributed decap placement approach on power supply noise. The model is a coarse representation of a typical row in a row-based standard cell design style, where logic circuits in a row share common power and ground lines. Power supply to a row comes from the die pad and is assumed to be ideal. The logic circuit is approximated by a block of 20 parallel inverters (Figure 3.2). The rationale for representing the logic block by parallel inverters is as follows:
• Inverters are the backbone of digital logic design. All complex logic gates can be converted to an equivalent inverter representation for analysis, so the behavior of complex logic gates can be derived by extrapolating the results obtained for an inverter [4].
• Unlike a ring oscillator, the rise and fall transitions of a block of parallel inverters can be controlled independently. This is important for analyzing the effect of voltage drop.
• Lastly, a block of 20 parallel inverters (INVX8) is used to emulate a power-hungry logic circuit block. Simultaneous switching of the parallel inverters draws enough current from the input power supply to produce an appreciable voltage drop.
Figure 3.1: Circuit Model for Evaluating Effectiveness of Distributed Decap Placement
Figure 3.2: Logic Circuit Representing Basic Block Shown in Figure 3.1
The distributed and lumped decoupling capacitors are represented by C1 and C2, respectively. For simplicity, the distributed capacitance is drawn as a single capacitance C1; in actuality, we distribute C1 across each of the 20 parallel inverters in a logic block. Both the lumped and the distributed capacitances are realized as MOSFET gate capacitance. A standard
decoupling cell, as described in chapter 6, is used to add the required value of lumped capacitance to the circuit. Distributed capacitance for a block is added by padding the standard inverter cell with the required amount of decap.
The power lines connecting the logic block to the input supply voltage are modeled by equivalent parasitic elements representing the interconnect segment impedance Zx. A global wire is assumed for power and ground routing. The Arizona State University (ASU) interconnect model [25, 26] is used to derive the global wire parasitic values. Table 3.2 shows the parasitic values for a global wire with the wire parameters shown in Table 3.1 for the 0.18µm technology node.
Table 3.1: Global Wire Parameters (0.18µm)

Parameter    Value
Width        0.8µm
Space        0.8µm
Thickness    1.25µm
Height_ILD   0.65
K_ILD        3.5
Material     Cu

Table 3.2: Global Wire Parasitics

Element      Value
Resistance   22.92 Ω/mm
Inductance   1.66 nH/mm
Capacitance  238.8 fF/mm
For this experiment, R and L are varied in the ranges [0.1, 1.6] Ω and [0.1pH, 1pH], which translates to wire lengths in the range of 10 to 50µm. Since the parasitic capacitance for this wire-length range is very small, it is ignored in the analysis. The supply voltage for the 0.18µm technology is 1.8 V, and the maximum tolerable ripple at the logic block nodes is assumed to be 5% of the power supply voltage. Hence the power supply is considered noise free if the voltage at the power supply nodes is within the range [1.71 V, 1.89 V]. Any node with a voltage outside this range is considered noisy, and a decap must be added to reduce the noise. The input waveform for the blocks is shown in Figure 3.3. The rise and fall times are set to 80ps. The power supply voltage drop is measured only during output load charging (input falling transition). The output load for a logic block is assumed to be 1pF.
We start by creating a layout of the logic block in 0.18µm technology using the Magic layout editor. The resulting block is extracted to SPICE, and the interconnect parasitics are manually added to the SPICE netlist to form a power distribution network. This is followed by a transient analysis, where the worst voltage drop at various supply nodes is recorded.
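The transient measurement just described can also be sketched numerically. Below is a minimal, hedged Python sketch (not the thesis's actual SPICE deck): an explicit-Euler simulation of an R-L supply branch feeding a block with a local decap C, driven by a triangular switching-current pulse. All element values and the pulse shape are illustrative assumptions.

```python
# Hedged sketch: explicit-Euler transient of an R-L supply branch feeding a
# switching block with a local decap C. Element values and the triangular
# load-current pulse are illustrative assumptions, not the thesis experiment.
def worst_case_voltage(R, L, C, vdd=1.8, i_peak=0.02, t_rise=80e-12,
                       dt=1e-13, t_end=2e-9):
    """Return the minimum voltage seen at the block's supply node."""
    v = vdd       # node voltage across the decap
    i_l = 0.0     # current through the supply inductance
    v_min = vdd
    t = 0.0
    while t < t_end:
        # Triangular load-current pulse during the output-charging transition.
        if t < t_rise:
            i_load = i_peak * t / t_rise
        elif t < 2 * t_rise:
            i_load = i_peak * (2.0 - t / t_rise)
        else:
            i_load = 0.0
        di_l = (vdd - i_l * R - v) / L   # L di/dt = voltage across the inductor
        dv = (i_l - i_load) / C          # C dV/dt = supply current in - load out
        i_l += di_l * dt
        v += dv * dt
        v_min = min(v_min, v)
        t += dt
    return v_min

# Same supply parasitics, more local decap -> smaller worst-case droop.
drop_small_decap = 1.8 - worst_case_voltage(1.0, 1e-9, 1e-12)
drop_large_decap = 1.8 - worst_case_voltage(1.0, 1e-9, 10e-12)
assert drop_large_decap < drop_small_decap
```

With the supply parasitics held fixed, increasing the local decap reduces the worst-case droop; the decap cases analyzed next quantify this trade-off.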
Table 3.3: Decap Cases

Case   Description
N      No decap (C1 = 0 & C2 = 0)
A      Only lumped decap (C1 = 0 & C2 ≥ 0)
B      Distributed decap C1 s.t. Acell += (20% · Acell) & lumped C2 ≥ 0
C      Distributed decap C1 s.t. Acell += (30% · Acell) & lumped C2 ≥ 0
In order to analyze the effect of lumped and distributed decap placement on the resulting voltage drop, we compare the decap cases shown in Table 3.3. Case N represents a circuit without decoupling capacitances C1 and C2. In case A, we add enough lumped decap C2 that the voltage drop at the various supply nodes is within the tolerable band. Cases B and C analyze the effect of adding distributed decap. Adding decap to a standard cell increases the cell area (refer to chapter 5); hence, the amount of decoupling capacitance added to a standard cell can be controlled by changing the cell area. In case B, the amount of distributed decap added to the logic block leads to a 20% increase in the logic block area. In case C, a higher value of decoupling is added by increasing the logic block area by 30%. In both cases B and C, lumped decap is also added along with the distributed decap such that the voltage drop at the various supply nodes is within the tolerable noise band.
Table 3.4: Effect of Decap Addition on Block Area
Table 3.4 shows the change in area of the basic block with increasing amount of decap per cell. The intrinsic decap of a cell is assumed to be zero; hence a basic block without any intentional decap contributes zero decoupling capacitance. The basic block uses 20 parallel single-height inverters (INVX8), so the block area is represented by the block width, measured in λ. The λ for the 0.18µm technology node is 0.1µm. The decap value per cell, measured in femtofarads, shown in Table 3.4 corresponds to the increased cell area. A decap cell with MOSFET width W and channel length L is added to INVX8 such that it leads to the given increase in cell area. The decap value for this cell with W and L is calculated as per the capacitance equation given in chapter 5.
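As a rough, hedged illustration of how such per-cell decap values arise (the thesis's exact equation from chapter 5 is not reproduced here), a MOS decap behaves approximately as a parallel-plate capacitor C = Cox · W · L. The oxide thickness below is an assumed value for a generic 0.18µm process, not a figure from this work.

```python
# Hedged sketch: decap value of a W x L MOS gate, assuming the simple
# parallel-plate model C = Cox * W * L. The oxide thickness is an assumed
# value for a generic 0.18um process, not taken from the thesis.
EPS0 = 8.854e-12           # F/m, vacuum permittivity
EPS_SIO2 = 3.9 * EPS0      # F/m, SiO2 gate-oxide permittivity
TOX = 4.0e-9               # m, assumed oxide thickness

def gate_decap_fF(w_um, l_um):
    """Gate capacitance of a W x L (in um) MOS decap, in femtofarads."""
    cox = EPS_SIO2 / TOX                    # F/m^2, oxide capacitance per area
    area = (w_um * 1e-6) * (l_um * 1e-6)    # m^2
    return cox * area * 1e15                # F -> fF

# A 2um x 2um decap pad yields a few tens of femtofarads with these numbers.
c = gate_decap_fF(2.0, 2.0)
```

The linear dependence on gate area is what ties the decap value directly to the cell-area increase in Table 3.4.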
3.3 Results
We analyze different circuit configurations by varying the number of basic blocks and the interconnect parasitic values in the model. Since similar results are obtained for the various circuit configurations, we show results for only two of them. Figure 3.1 shows the first circuit configuration, containing one basic block and both supply and ground parasitics. Figure 3.4 shows another configuration with two basic blocks.
Figure 3.4: Circuit Model with Two Basic Blocks for Distributed Decap Experiment
Table 3.5 and Table 3.6 show the results for these two circuit configurations, and Figure 3.5 and Figure 3.6 show the corresponding graphical representations. An impedance Zx value given in the tables as x/y represents a series connection of a resistance of value x Ω and an inductance of value y H. The voltage drop results in Table 3.6 correspond to simultaneous switching of both basic blocks.
3.4 Analysis
As evident from the experimental results for both models, the distributed decoupling capacitor approach provides significant benefits in terms of total decap requirement. The total amount of decap required in the distributed case, as compared to the lumped decap approach, reduces significantly as the amount of decoupling capacitance per logic block is increased. Even a minimum amount of decap added to the logic block provides a sufficient gain in the overall decap budget. Although a slight area penalty with minimum decap per block can be observed, the area increase is not consistent; it depends on the interconnect parasitic values. As seen in Figures 3.5 and 3.6, there is a decrease in design area for some parasitic cases, and with decap case C, the design area in all cases is less than the design area with lumped decap case A. The slight area increase in decap case B as compared to case A is not of great concern for the following reasons:
1. Typically, design core area utilization is kept below 70-75% to accommodate signal routing requirements and late-stage design changes. Therefore, a slight area penalty can be amortized at the overall design level due to the available whitespace in the core and the decrease in decap requirement.
2. Additionally, 10-20% of the area is reserved for decap allocation, which also creates enough whitespace to accommodate decap cells. Figures 3.5 and 3.6 show decap area requirements larger than 10-20%. This is a result of model simplification, where the power supply current can take only one path. In actual designs, current to a logic block can come from multiple paths, substantially reducing the overall path resistance.
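The whitespace argument in items 1 and 2 above is simple arithmetic; the sketch below works through it with assumed numbers (70% logic utilization, 20% padding) that are illustrative rather than taken from the benchmarks.

```python
# Hedged arithmetic check of the whitespace argument, with assumed numbers:
# a core sized for 70% logic utilization, whose remaining 30% covers routing
# whitespace plus the usual 10-20% decap reserve.
core_area = 100.0             # arbitrary area units
logic_util = 0.70             # fraction of the core occupied by logic cells

logic_area = core_area * logic_util
slack = core_area - logic_area        # whitespace available in the core

# Case B pads the logic block area by 20%; the padding eats into slack
# instead of growing the die, provided it fits.
padding = 0.20 * logic_area
assert padding < slack                # 14 < 30: absorbed by whitespace

fraction_left = (slack - padding) / core_area   # whitespace remaining
```

Under these assumptions the 20% per-block padding consumes less than half the existing slack, which is the sense in which the area penalty is amortized at the design level.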
Table 3.5: Analysis Results for Circuit Model Shown in Figure 3.1
Figure 3.5: Graphs Show Change in Area and Decap Requirement for Circuit Model in Figure 3.1
analyzed for different interconnect parasitic values
Table 3.6: Analysis Results for Circuit Model Shown in Figure 3.4
Figure 3.6: Graphs Show Change in Area and Decap Requirement for Circuit Model Shown in
Figure 3.4 analyzed for different interconnect parasitic values
Chapter 4
As discussed in chapter 2, an accurate voltage drop analysis is possible only during the late stages of the design flow. A design needs to go through synthesis followed by place-and-route stages before the adverse effects of the voltage drop problem can be analyzed. The reason is that the voltage drop inside a chip depends not only on the power consumption of the logic circuits and the power distribution network parasitics, but also on the physical placement of logic blocks and their connectivity. Decoupling capacitors offer an effective way to tame the power supply noise at this level. Decoupling capacitors are placed in the available whitespace (core area not utilized by logic cells) on the chip. Special decoupling cells are designed to enable their placement with standard cells in row-based designs. The design of decoupling capacitor cells is discussed in chapter 5. For the following discussion, a decoupling capacitor cell can be thought of as a passive element offering a specific value of capacitance.
Figure 4.1 illustrates the complete framework to analyze and control the voltage drop by placing decoupling capacitors in the available whitespace. A design in RTL specification is synthesized using the nominal standard cell library, OSULIB (described in chapter 5). Synopsys Design Compiler is used to synthesize the design and to generate a gate-level netlist. This gate-level netlist is placed and routed using Cadence Encounter; the synthesis and place-and-route flow can be found in [27]. The voltage drop analysis is then performed on the placed-and-routed design. The left branch in the figure refers to the traditional method of controlling voltage drop, and the right branch highlights our approach. Both of these approaches are discussed in subsequent sections. We use Synopsys' PrimeRail tool for dynamic voltage drop analysis. In Section 4.3, we discuss the data preparation needed to perform analysis using PrimeRail. Section 4.4 depicts the flow for performing the analysis using PrimeRail. We end this chapter with a discussion of the benchmarks used for analysis in Section 4.5.
[Figure 4.1: Framework flow — design specifications → behavioral/RTL design → logic synthesis (Design Compiler with OSULIB LIB/DB) → gate-level netlist + SDC → perform DvD → traditional approach (DCFLIB, filler-based) or proposed approach (DCOPT, ECO routing in Encounter) → voltage drop sign-off.]
4.1 Traditional Approach
The traditional method of controlling the voltage drop problem involves placing decoupling capacitors by replacing filler cells [3]. Typically, a placed design contains enough whitespace for two reasons: the core dimension calculation has to account for 10 to 20% of the area for decap placement, and core utilization is typically kept below 100% (around 70 to 75%) to meet routing requirements. Filler cells are inserted in this whitespace to provide proper well connectivity. Once the analysis reveals voltage drop problems, filler cells in the affected area are replaced with decoupling capacitor (decap) cells. We call this approach filler-based decap allocation. It offers an easy solution, since replacing filler cells with decap cells does not call for placement modification; the standard cells in the design are left untouched.
Figure 4.2: Filler-based decap optimization. [A] Design with filler cells. [B] Replace all filler cells (virtually); perform DvD. [C] Iteratively remove decap cells to meet the voltage drop target; instantiate decaps.
Synopsys' PrimeRail tool provides voltage drop optimization based on filler-based decap placement. Figure 4.2 shows the decap optimization procedure. For this optimization to work, the design must contain filler cells. The original design (Figure 4.2[A]) contains only standard cells and filler cells. The tool first virtually replaces all filler cells with equivalently sized decap cells (Figure 4.2[B]); a one-to-one correspondence between filler cell masters and decap masters is required. It then performs multiple iterations of voltage drop analysis and selective removal of decap cells to achieve the user-defined target reduction in the voltage drop (Figure 4.2[C]). For each iteration, it reports the reduction in voltage drop and the required decap budget. At the end of the analysis, PrimeRail provides an option to select the result of any iteration and perform the actual decap insertion. The modified design is saved in the Milkyway database format. We design and characterize a library
DCFLIB (described in chapter 5), containing filler cells and equivalent decap masters, to enable this functionality. We compare the results of our approach with Synopsys PrimeRail's filler-based decap insertion results; these results are presented in chapter 8.
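PrimeRail's internal algorithm is proprietary, but the iterative procedure described above can be sketched as a greedy loop. In this hedged Python sketch, `analyze_drop` is a made-up placeholder for a real rail analysis, and the cell representation is deliberately minimal.

```python
# Hedged sketch of the filler-based flow: virtually swap every filler for its
# equal-width decap master, then greedily revert decaps back to fillers as long
# as the worst drop still meets the target. analyze_drop() is a made-up
# placeholder for a real rail analysis.
def optimize_fillers(cells, decap_master_for, analyze_drop, target_drop):
    """cells: dicts with 'master' and 'is_filler' keys; modified in place."""
    # Step 1: virtually replace all filler cells with equivalent decap masters.
    swapped = []
    for cell in cells:
        if cell["is_filler"]:
            cell["original"] = cell["master"]
            cell["master"] = decap_master_for[cell["master"]]
            swapped.append(cell)
    # Step 2: iterative analysis; keep a reversion only if the target holds.
    for cell in swapped:
        cell["master"], trial = cell["original"], cell["master"]
        if analyze_drop(cells) > target_drop:  # reverting hurt too much
            cell["master"] = trial             # keep this decap
    return [c for c in swapped if c["master"] != c["original"]]

# Toy model: each retained decap lowers the worst drop by 20 mV.
cells = [{"master": "FILL1", "is_filler": True} for _ in range(10)]
drop = lambda cs: 0.30 - 0.02 * sum(c["master"] == "DECAP1" for c in cs)
kept = optimize_fillers(cells, {"FILL1": "DECAP1"}, drop, target_drop=0.20)
```

Under this toy drop model, the greedy pass keeps just enough decaps to hold the worst drop at the target, mirroring the iteration-by-iteration budget reports described above.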
Although filler-based decap allocation provides an easy and effective solution, the transition into the deep-submicron and nanometer regimes renders this approach less cost-effective. A combination of factors, such as voltage scaling, increasing operating frequency and design complexity, and increasing interconnect parasitics per unit length with technology scaling, is making designs more susceptible to power supply noise. As a result, filler-based decap allocation leads to the following two problems [19]:
• Filler-based decap allocation compromises power supply integrity by placing decaps away from the switching nodes. Timing-driven placement usually packs logic cells close together, pushing the whitespace toward the block or core periphery. Placing decaps in this whitespace by replacing filler cells increases the effective distance between the switching nodes and the decoupling capacitors. The increased effective distance corresponds to larger supply and ground parasitics, which reduces the effectiveness of filler-based decap.
• As a consequence of the above problem, a higher value of decap is required to control the power supply noise. The extra decap results in wasted area, higher power consumption, and reduced chip yield.
4.2 Proposed Approach
As is apparent from the problems indicated above, a distributed decap placement approach is needed [19]. In fact, decaps must be placed near the switching nodes for them to be cost-effective. The experimental results presented in chapter 3 highlight the effectiveness of placing decaps near the switching nodes: as decap is moved nearer to the logic cells, the overall decap budget requirement reduces significantly, with slight or no penalty in overall area as compared to the traditional lumped decap placement approach.
We therefore propose a distributed decap approach to control voltage drop problems around the chip. We provide a complete design framework to analyze the voltage drop in a chip and compensate for it by incorporating decoupling capacitors close to the violating nodes. This is done by providing designers with an additional library, UCDCLIB, containing decap-padded logic cells, along with the nominal cell library, OSULIB. Each logic cell in UCDCLIB is logically equivalent to a logic cell in OSULIB, but additionally contains a specific amount of decap padded to it. The design and characterization of UCDCLIB cells is presented in chapter 5.
As shown in Figure 4.1, the design is synthesized using the nominal cell library, which in our case is the OSU standard cell library [28]. The design is placed and routed, and is analyzed for initial voltage drop. We then algorithmically identify the voltage drop regions exceeding the user-defined threshold. The decap requirement for these affected regions is satisfied iteratively by replacing standard logic cells with equivalent decap-padded logic cells from UCDCLIB. Optimal selection of the number of standard cells for replacement and calculation of the decap budget is done by a C++ based decap optimization procedure, DCOPT, described in chapter 6. Once the decap budget is calculated and the number of cell replacements is decided, the original placement needs to be modified. Modifying the original placement optimally from a voltage drop point of view is done by a C++ based Engineering Change Order placement tool, ECO-Placer, described in chapter 7. The results shown in chapter 8 highlight the effectiveness of our approach: it substantially reduces the total decap budget while providing a better voltage drop profile than the traditional filler-based decap allocation approach.
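At a high level, the replacement loop just described can be sketched as follows. This is a simplified illustration, not the DCOPT algorithm itself; `analyze_drop` and `padded_equiv` are made-up placeholders for the real rail analysis and the OSULIB-to-UCDCLIB master mapping.

```python
# Hedged sketch of the proposed replacement loop (a simplification, not the
# DCOPT algorithm itself). padded_equiv maps nominal masters to their
# decap-padded UCDCLIB equivalents; analyze_drop is a made-up placeholder
# returning per-cell worst-case drops in volts.
def distributed_decap_pass(cells, padded_equiv, analyze_drop,
                           vdd=1.8, threshold=0.05, max_iters=50):
    """Pad the worst violator each iteration until all drops are in-band."""
    limit = vdd * threshold
    for _ in range(max_iters):
        drops = analyze_drop(cells)   # per-cell worst drop, index-aligned
        violators = [i for i, d in enumerate(drops) if d > limit]
        if not violators:
            break
        # Pad the worst violator first; nearby cells benefit too.
        worst = max(violators, key=lambda i: drops[i])
        master = cells[worst]["master"]
        if master in padded_equiv:
            cells[worst]["master"] = padded_equiv[master]
    return cells

# Toy model: an unpadded INVX8 sees a 120 mV drop, a padded one 50 mV.
cells = [{"master": "INVX8"} for _ in range(3)]
drops = lambda cs: [0.12 if c["master"] == "INVX8" else 0.05 for c in cs]
distributed_decap_pass(cells, {"INVX8": "INVX8_DC"}, drops)
```

The actual framework re-runs full PrimeRail analysis between iterations and then hands the cell-replacement list to ECO-Placer; this sketch only conveys the violate-then-pad control flow.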
4.3 Data Preparation
Data preparation is an important step in performing voltage drop analysis. The PrimeRail tool requires cell and design related data in a specific format before it can analyze a design for potential voltage drop problems. A brief overview of the data preparation needs [3] is given in this section.
4.3.1 Milkyway Database
The Milkyway database is a common data repository for Synopsys integrated circuit (IC) design tools. Easy and efficient interoperability among various Synopsys tools is achieved by capturing cell library and design data in the Milkyway database format. A common database eliminates the need for exchanging large design files, thereby saving data translation time and preventing errors and inconsistencies due to semantic mismatches between tools. The database provides an application programming interface (API) for database access and a Scheme language extension for easy integration and customization.
The Synopsys Milkyway database contains multiple directories and files in a tree structure. The root node of the directory structure can be a reference or a design library. Each library can contain a complete design, modules representing a design, or logic cells. Various data views are generated for each of these library components to characterize the library; Figure 4.3 lists the important views and their content. Each view provides specific information during the design flow. If a library and its components are used within another library (or a design), it is referred to as a reference library. A design library is a library that instantiates components from reference libraries. A design library can also serve as a reference library if it is to be used in another design in a hierarchical fashion.
We generate Milkyway reference libraries for our libraries, UCDCLIB and DCFLIB, as well as for the nominal library, OSULIB, since cells from these libraries are instantiated in the benchmark designs. The reference libraries are generated using the physical (LEF) and timing (LIB, DB) views of the libraries (library view generation is described in chapter 5). We also generate a Milkyway design library for each of our benchmark designs. Steps to generate design and reference libraries are given in Appendix A.
4.3.2 Cell Parasitic Extraction
As discussed in chapter 2, logic gates present a significant load to the power distribution network due to various intrinsic capacitive effects. The cell load capacitance is usually shielded by the transistor channel resistance and hence does not explicitly appear as a direct load to the power distribution network. However, the intrinsic cell capacitance does load the power grid network, and it is important to characterize it since it can provide a decoupling effect. PrimeRail uses HSPICE to characterize the cell resistance and capacitance by performing DC and AC analysis, respectively. During AC analysis, a sinusoidal waveform is applied to a gate, and the intrinsic capacitance is calculated from the magnitude and phase of the current response. A characterized gate is then represented by a Π-model (C–R–C, with the first C for intrinsic capacitance, the last C for load capacitance, and R representing the channel resistance) during the final voltage drop analysis. The cell characterization information is captured as a PARA (parasitic) view in the Milkyway reference library. Appendix B shows the steps for cell parasitic characterization.
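The AC-analysis idea can be illustrated with a small numeric sketch: drive the pin with a sinusoid, take the complex current response, and read the capacitance off the imaginary part of the admittance. The series R-C "measurement" below is synthetic; the values and the procedure are illustrative assumptions, not PrimeRail's internals.

```python
# Hedged sketch of the AC-analysis idea: apply a small sinusoid, take the
# complex current response, and read the intrinsic capacitance off the
# imaginary part of the admittance. The series R-C "gate" below is a synthetic
# stand-in, not PrimeRail's actual characterization procedure.
import math

def intrinsic_cap(v_amp, i_phasor, freq):
    """C = Im(I/V) / (2*pi*f) for a shielded capacitive load at low frequency."""
    y = i_phasor / v_amp                 # complex admittance seen at the pin
    return y.imag / (2 * math.pi * freq)

# Synthetic "measurement": channel resistance in series with the gate cap.
R, C_true, f = 500.0, 20e-15, 1e6        # 500 ohm, 20 fF, 1 MHz probe
w = 2 * math.pi * f
z = R + 1.0 / (1j * w * C_true)          # series R-C impedance
i_meas = 1.0 / z                         # current phasor for a 1 V drive
c_est = intrinsic_cap(1.0, i_meas, f)    # recovers C_true to well under 1%
```

Probing at a low frequency keeps ωRC small, so the shielding resistance barely perturbs the estimate; this is why the magnitude-and-phase measurement can separate C from R.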
Dynamic voltage drop analysis requires peak current information during a switching event to calculate the peak voltage drop, as discussed in chapter 2. PrimeRail captures the current waveforms of logic gates by performing HSPICE simulations for various input slope and output load conditions. Only specific points in the current waveform are captured, with their time stamps; the piecewise-linear data is stored in the form of a lookup table in the Milkyway reference library database. Library current characterization steps are discussed in Appendix B. Cell library characterization for parasitic extraction and current waveform generation requires a transistor model file (for the 0.18µm node) and a SPICE netlist for each cell.
Apart from the cell library characterization information, PrimeRail requires the following additional data to successfully perform a dynamic voltage drop analysis:
1. Design placement (.def): The final placed-and-routed design can be saved in DEF format. The Milkyway design library is generated by importing the placed-and-routed design in DEF format.
2. Synopsys design constraints (.sdc): The SDC file is generated by Synopsys Design Compiler during logic synthesis. This file is required for performing gate-level power analysis as described in the next section.
3. Signal net parasitic file (.spef): A Standard Parasitic Exchange Format (SPEF) file can be generated after a design is placed and routed. SPEF is an IEEE standard format for capturing the parasitic information associated with the signals in a design. Signal net parasitics are required to calculate signal delay and gate power consumption.
4. Post-route Verilog netlist (.v): Once a design goes through the placement stages, the tool adds buffers for timing optimization and clock tree synthesis. Hence, the final placed design netlist will not be the same as the gate-level netlist after logic synthesis. The post-route netlist is required for design power analysis as described in the next section.
5. Value change dump file (.vcd): A Value Change Dump (VCD) file contains signal transitions generated by performing design simulation using ModelSim. This file is required only if vector-based dynamic analysis is to be performed.
4.4 Dynamic Voltage Drop Analysis Flow
Figure 4.4 illustrates the cell-level dynamic voltage drop analysis flow. The input placed-and-routed design is captured as a Milkyway design library, and Milkyway reference libraries are created for the OSULIB, DCFLIB, and UCDCLIB libraries. Cell parasitic and current information is stored in the Milkyway database. Dynamic voltage drop analysis is then performed in the three steps discussed below. When the analysis is over, PrimeRail creates a voltage waveform database (stored in the Milkyway design library) and voltage violation reports. The voltage drop profile of a design can be analyzed graphically using the generated maps. A detailed description of the steps to perform dynamic voltage drop analysis using PrimeRail is given in Appendix C.
4.4.1 Power Analysis and Current Waveform Generation
The PrimeRail dynamic analysis matrix solver needs current waveforms at the cell instance power and ground ports to calculate the timing-dependent voltage drops (or rises) on the power and ground
[Figure 4.4: Dynamic voltage drop analysis flow — the input design (DEF, LEF) and the OSULIB/DCFLIB/UCDCLIB libraries feed Milkyway design and reference library generation; library characterization produces current waveforms and extracted parasitics; PrimeTime-PX power analysis (Verilog netlist, SDC, SPEF, VCD/SAIF) drives current waveform generation; power grid (PG) extraction and rail analysis (DvD) complete the flow.]
parasitic network. Library characterization has already generated current waveforms for individual cells. However, in a design, the current consumption, and hence the power, of a cell depends on its connectivity with other cells. PrimeRail uses PrimeTime-PX (PT-PX) to first perform gate-level power analysis. PT-PX builds a detailed power profile of the design based on the circuit connectivity (generated from the Verilog netlist), the switching activity information (VCD/SAIF), the net parasitics (SPEF), and the cell-level power behavior data in the Synopsys database format (.db) library, which can be either a nonlinear power model (NLPM) or a Composite Current Source (CCS) library. It then calculates the power behavior of the circuit at the cell level and reports the power consumption at the chip, block, and cell levels. Gate-level power analysis depends on input vectors, which can be provided through a VCD file. In the absence of a VCD file, switching activity can instead be provided through a Switching Activity Interchange Format (SAIF) file, generated from either gate-level or RTL simulation. An RTL SAIF captures switching activity for only part of the design; PT-PX propagates the partial switching activity throughout the whole design.
Once the power profile of a design is calculated, the power consumption values of the cells are used to scale the cell current waveforms available from the library characterization data. Hence, based on the library characterization data and the PT-PX power reports, PrimeRail creates cell instance profiles, which include dynamic current waveforms and parasitics of the power supply network for all the power and ground ports of each cell in the design.
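The scaling step can be illustrated with a hedged sketch: rescale the library's piecewise-linear current waveform so that the charge it delivers per cycle matches the PT-PX-reported average power. The trapezoidal integration and the P = Vdd·Q/T scaling rule are illustrative assumptions about how such scaling can be done, not PrimeRail internals.

```python
# Hedged sketch of the waveform-scaling step: rescale the library's piecewise-
# linear (PWL) current waveform so the charge it delivers matches the per-cell
# average power that PT-PX reported. The trapezoidal integration and the
# P = Vdd * Q / T rule are illustrative assumptions, not PrimeRail internals.
def scale_waveform(pwl, reported_power, vdd, period):
    """pwl: [(time_s, current_A), ...]; returns a scaled copy."""
    # Charge under the library waveform (trapezoidal rule).
    q_lib = sum((t2 - t1) * (i1 + i2) / 2.0
                for (t1, i1), (t2, i2) in zip(pwl, pwl[1:]))
    # Charge implied by the reported average power over one clock period.
    q_target = reported_power * period / vdd
    k = q_target / q_lib
    return [(t, i * k) for t, i in pwl]

# An 80 ps triangular pulse with a 5 mA library peak, scaled to 0.9 mW at
# 1.8 V over a 1 ns cycle: the peak becomes 12.5 mA.
wave = [(0.0, 0.0), (40e-12, 5e-3), (80e-12, 0.0)]
scaled = scale_waveform(wave, reported_power=0.9e-3, vdd=1.8, period=1e-9)
```

Scaling preserves the waveform's shape (and hence its timing) while matching its integral to the design-dependent power, which is exactly what the solver needs per instance.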
4.4.2 PG Extraction
The chip power grid (PG) consists of multiple metal layers, and the PG extraction step extracts the parasitics of the power grid network. PrimeRail's built-in extraction engine can extract both resistance and capacitance parasitics. However, for RC extraction it requires a TLUPlus model, which defines the technology parameters for the interconnect. A TLUPlus model can be generated from an Interconnect Technology File (ITF) available from the Star-RCXT tool or from the foundry. In the absence of TLUPlus models, we extract only resistance parasitics, which can be done using the Milkyway technology file. The Milkyway technology file for the 0.18µm node can be obtained from the OSU library.
4.4.3 Rail Analysis
During rail analysis, PrimeRail combines the cell current and parasitic models with the resistive power distribution network to solve for the voltage drop values at each node in the resistive network. For more accurate results, PrimeRail needs the locations of the ideal voltage sources in the design. The voltage sources can be identified graphically or by specifying locations on the die, and the locations can be saved in a file and loaded during the analysis. We provide an ideal voltage source at the middle of the power ring on each side of the die, placed on the VDD ring around the core.
PrimeRail reports the minimum and maximum values of voltage drop (for a power net) or voltage rise (for a ground net) to the command window and the log file. By default, the tool also reports the top five instances of peak voltage drop and the times when they occur. A voltage drop violation report can also be generated, listing all cell instances experiencing an absolute voltage drop greater than a user-defined level; PrimeRail reports at most the top 100 cell instances by peak voltage drop. PrimeRail also reports voltage drop values for each metal segment. We use this report to perform the decoupling capacitor optimization described in chapter 6.
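Consuming such a report amounts to filtering and ranking. The sketch below assumes a simplified whitespace-separated "instance drop" format (PrimeRail's actual report syntax differs) and returns the violators worst-first, which is the ordering a decap optimizer would process.

```python
# Hedged sketch of consuming a per-instance drop report. The whitespace-
# separated "instance drop" format is an assumed simplification of the real
# report syntax; the returned list is ordered worst-first for the optimizer.
def violating_instances(report_text, level):
    """Return (instance, drop_V) pairs whose drop exceeds `level`, worst first."""
    rows = []
    for line in report_text.strip().splitlines():
        name, drop = line.split()
        rows.append((name, float(drop)))
    victims = [(n, d) for n, d in rows if d > level]
    victims.sort(key=lambda nd: nd[1], reverse=True)
    return victims

report = """\
U1/inv_3 0.11
U2/nand_7 0.06
U3/dff_1 0.14
"""
worst = violating_instances(report, level=0.09)
# worst -> [('U3/dff_1', 0.14), ('U1/inv_3', 0.11)]
```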
The proposed approach discussed in this chapter requires three components, namely UCDCLIB, DCOPT, and ECO-Placer. These three components, along with the PrimeRail-based dynamic voltage drop analysis, complete the overall framework. We compare the results of our approach with the filler-based decap optimization approach, specifically in terms of overall decap budget requirement and effective voltage drop reduction. We categorize the results into the four cases shown in Table 4.1. Subsection 4.5.1 provides further details about these cases. The results of the two approaches, categorized into the four cases for each benchmark, are presented in chapter 8.
4.5 Benchmarks
In order to observe the effects of voltage drop and apply the optimization process, we need designs with sufficient complexity. We choose benchmarks from the HLS'95 [29] and ITC'99 [30] pools, and perform the comparative analysis on the benchmarks shown in Table 4.2.
The following steps are used to analyze each benchmark shown in Table 4.2 for the four design cases given in Table 4.1:
1. The RTL description of the design, in VHDL, is synthesized using Synopsys Design Compiler [31]. The technology library used during synthesis is OSULIB for the 0.18µm node from Oklahoma State University; this is the reference library for the analysis. The synthesis constraints given are the clock
Table 4.1: Design Cases for Voltage Drop Analysis

Case 1  Pre-Opt       Libraries: OSULIB. Decap sources: intrinsic cell decap only.
        Design before voltage drop optimization; no explicit decap added.

Case 2  Post-Opt(F)   Libraries: OSULIB, DCFLIB. Decap sources: intrinsic cell
        decap, filler-based decap.
        Post-optimized design (filler-based decap optimization).
        * Design contains standard logic cells and filler cells.
        * Voltage drop optimization performed by the filler-based decap approach.

Case 3  Post-Opt(D)   Libraries: OSULIB, DCFLIB, UCDCLIB. Decap sources:
        intrinsic cell decap, decap-padded cells.
        Post-optimized design (decap-padded-cell-based optimization).
        * Design contains standard logic cells and decap-padded logic cells
          along with filler cells.
        * Voltage drop optimization performed using our approach.

Case 4  Post-Opt(DF)  Libraries: OSULIB, DCFLIB, UCDCLIB. Decap sources:
        intrinsic cell decap, filler-based decap, decap-padded cells.
        Post-optimized design (decap-padded cells and filler-based decap
        optimization).
        * Design contains standard logic cells and decap-padded logic cells
          along with filler cells.
        * Voltage drop optimization performed using our approach as well as the
          filler-based decap approach.
frequency and clock transition. The outputs of synthesis are a gate-level netlist (.v) and a Synopsys design constraints file (.sdc).
2. The gate-level netlist is placed and routed using Cadence SOC Encounter [32]. The inputs to Encounter are the gate-level netlist (.v), the physical (.lef) and timing (.lib, .db) views of OSULIB, and the design constraints (.sdc). Filler cells from the DCFLIB library are inserted in the design. The design is supplied with four ideal vdd and gnd input points, attached at the middle of the power ring stripe on each side of the die. Various placement and routing
Table 4.2: Benchmarks Used for Analysis
constraints, such as core utilization, aspect ratio, and power grid design attributes, are specified in chapter 7 with the benchmark results. The post-layout outputs are the design placement (.def) and the signal parasitic file (.spef).
3. The post-layout design is imported into the Synopsys PrimeRail voltage drop analysis tool to generate a Milkyway design library, as discussed earlier in this chapter. The OSULIB and DCFLIB Milkyway reference libraries are attached to the design, and dynamic voltage drop analysis is performed. We use the vector-less flow by providing a switching activity interchange file (.saif) for each design. The results are classified under case 1: "pre-opt" (results before voltage drop optimization). Note that the decap contribution in the pre-opt design comes from intrinsic cell decap; no explicit decap is added to the design at this stage.
4. We then use PrimeRail's filler-based decap insertion flow to optimize the voltage drop around the chip. The decap insertion flow requires a one-to-one correspondence between filler cell dimensions and decap cell dimensions. Master decap cells are referenced from the DCFLIB library. PrimeRail optimizes the voltage drop iteratively using the filler-based decap insertion approach discussed earlier in this chapter. The final voltage drop results, the number of filler cells replaced, and the total decap budget required are stored under case 2: "post-opt (F)" (results after voltage drop optimization using only filler-based decaps).
5. The pre-opt design from step 3 is now optimized for voltage drop using our approach. We use a C++ based decap optimization procedure (DCOPT), described in Chapter 6, in conjunction with PrimeRail's voltage drop analysis to reduce the voltage drop around the chip iteratively. The outputs of DCOPT are the total decap budget and the list of logic cells to be replaced by decap-padded standard cells from the UCDCLIB library and standalone decap cells from DCFLIB. The original design placement is modified optimally to incorporate these design changes using a C++ based Engineering Change Order placer (ECO-Placer), described in Chapter 7. The modified design is ECO routed using the Cadence SOC Encounter ECO flow, and the post-layout design is saved (in .def format).
6. Similar to step 3, the post-layout design from step 5 is analyzed for voltage drop. The design analysis requires the UCDCLIB Milkyway reference library to be attached at this stage. The voltage drop analysis results so obtained are classified under case 3: "Post-Opt (D)" (results after voltage drop optimization using only decap-padded standard cells).
Chapter 5
The standard cell based design approach is a widely popular method for creating Application Specific Integrated Circuits (ASICs). The use of standard cells in a design flow reduces ASIC development time considerably by providing a high degree of automation. From the synthesis phase to final layout generation, a standard cell library promotes a highly modular and independent design framework.
Results from Chapter 3 highlight the importance of placing decaps near the switching nodes. To enable this, we develop a special class of standard cell library, UCDCLIB (University of Cincinnati Decoupling Capacitor padded Standard Cell Library), in which standard logic cells are padded with a decoupling capacitor. We modify logic cells from the OSU standard cell library to develop the UCDCLIB cells. Section 5.1 provides details about the OSU library cells. A brief overview of the various sources of MOSFET capacitance is presented in Section 5.2, which forms the basis for the on-chip decoupling capacitor design. The developed libraries are discussed in Section 5.3. Lastly, in order to facilitate the use of decap-padded standard cells at the various stages of the design flow, we discuss the method to characterize the library cells in Section 5.4.
5.1 Nominal Standard Cell Library
The OSU standard cell library (formerly the IIT standard cell library) offers various logic cells, ranging from a basic inverter to a complex full-adder [28, 33]. We treat this library as the nominal standard cell library, and use it for the synthesis and layout of the benchmark designs. As described in Section 5.2, each OSU library cell has an associated parasitic capacitance due to the cell structure. This parasitic capacitance acts as intrinsic decap. The intrinsic decap, along with other details such as the logic function and cell area (only the width in λ is considered for area, since all cells are single-height cells), is shown for each OSU library cell in Table 5.5 (after Section 5.3). The cell nomenclature follows the pattern gate name<#n>X<#m>, where gate name refers to the gate function name, #n is the number of inputs the gate has, and #m refers to the driving strength of the gate.
1. Junction capacitance: due to the reverse-biased pn junctions formed at the substrate-source and substrate-drain interfaces.
2. Overlap capacitance (CO): due to the overlap of the gate with the source and drain diffusion regions.
3. Channel capacitance (CC): between the gate and the conducting channel.
Out of these three sources, only the last two contribute toward the gate capacitance of the MOSFET. The total gate capacitance (CG) is thus given as:
CG = CO + CC (5.1)
Figure 5.1: Capacitance Sources in MOSFET
CO = COX Xd W (5.2)
where COX is the gate oxide capacitance per unit area, defined as the parallel plate capacitance between the MOSFET gate and the underlying conducting region; here the overlapped drain and source regions act as that conducting region. Xd is the lateral diffusion amount, and W is the width of the channel.
The presence of a uniform channel during the linear mode (VDS < VGS − VT) divides the total channel capacitance, given by Equation 5.3, equally between gate and source (CGCS) and gate and drain (CGCD). During the saturation mode (VDS > VGS − VT), the capacitance exists only between gate and source due to the pinch-off effect; its value depends on the area of the parallel plates formed by the gate and the channel connecting the source.
The variation of gate capacitance with respect to gate-to-source voltage (VGS) is shown in Figure 5.3. As seen from the figure, in order to obtain a stable gate capacitance, the MOSFET must be operated in the linear region. This can be done easily by ensuring that the drain-to-source voltage (VDS) remains less than VGS − VT at all times during operation. This design insight is used to realize a stable capacitance using MOSFET devices, as described in the next section. Table 5.1 summarizes the channel capacitance values of a MOSFET for the different operating regions.
Table 5.1: Channel Capacitance of MOSFET for Different Operating Regions (Source [4])
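The region-by-region breakdown that Table 5.1 summarizes can be sketched as a small C++ helper. Since the table itself is not reproduced here, the values below follow the standard first-order channel capacitance model from the literature; the struct and function names are illustrative, not taken from the thesis.

```cpp
#include <cassert>

// First-order channel capacitance of a MOSFET by operating region
// (the standard breakdown summarized in Table 5.1). Units are the caller's
// choice; e.g. cox in fF/um^2 with w, l in um yields capacitances in fF.
struct ChannelCaps {
    double cgcs;  // gate-to-channel capacitance seen at the source
    double cgcd;  // gate-to-channel capacitance seen at the drain
    double cgcb;  // gate-to-bulk capacitance
};

ChannelCaps channelCaps(double cox, double w, double l,
                        double vgs, double vds, double vt) {
    const double c = cox * w * l;                        // total oxide capacitance
    if (vgs < vt)       return {0.0, 0.0, c};            // cutoff: gate couples to bulk
    if (vds < vgs - vt) return {c / 2.0, c / 2.0, 0.0};  // linear: equal S/D split
    return {2.0 * c / 3.0, 0.0, 0.0};                    // saturation: pinch-off at drain
}
```

With source and drain tied (as in the decap cells of Section 5.3), VDS = 0 and the device stays in the linear branch whenever VGS exceeds VT.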
5.3 Decap Library
It is clear from the previous section that the MOSFET gate capacitance can be used to design a decoupling capacitor. Either a PMOS or an NMOS can be used for this purpose. For a stable decoupling capacitor, the MOSFET must be operated in the linear region. A stable NMOS decap can be designed by connecting the gate terminal to the supply voltage, and shorting the source, drain, and substrate terminals to ground, as shown in Figure 5.4. Likewise, a stable decap using a PMOS can be realized by connecting the gate terminal to ground, with the source, drain, and substrate connected to the supply voltage. From Table 5.1, MOS decaps can be characterized by the following equation:
CG = CO + CC = COX Xd W + COX W L (5.4)
where the parameters are as defined earlier. The transient response of a standard decap is affected by the parasitic resistance offered by the channel. A higher channel resistance slows down the decap charge release rate and makes it ineffective. Since the decap is designed using a standard MOSFET transistor, the same governing equation for a standard MOSFET can be used to characterize the low frequency resistance of a standard decap [34]:
Reff = L / (6 µ COX W (VGS − VT)) (5.5)
where µ is the mobility, VGS (or VGD, since source and drain are tied) is the voltage across the oxide, and VT is the threshold voltage.
From Equation 5.5, it is clear that Reff is proportional to the channel length L. That is, for a faster transient response, a decap design should keep L reasonably small so that Reff stays small. To capture the transient behavior, a decap can be modeled as a series RC circuit consisting of Reff and Ceff, as shown in Figure 5.4.
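Equation 5.5 and the series RC model can be captured in a short helper. The example values in the usage note are illustrative SI numbers, not device parameters from the thesis.

```cpp
#include <cassert>
#include <cmath>

// Low-frequency effective resistance of a MOS decap, Equation 5.5:
//   Reff = L / (6 * mu * Cox * W * (VGS - VT))
// All quantities in SI units (m, F/m^2, m^2/Vs, V) give Reff in ohms.
double decapReff(double mu, double cox, double w, double l,
                 double vgs, double vt) {
    return l / (6.0 * mu * cox * w * (vgs - vt));
}

// Series RC model of the decap: transient time constant tau = Reff * Ceff.
double decapTau(double reff, double ceff) { return reff * ceff; }
```

Doubling L doubles Reff while also raising the capacitance; halving W likewise doubles Reff. This is why the decap cells in Section 5.3.2 keep L at the technology minimum and grow W (adding fingers when needed).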
The cost-effectiveness of a MOS decap for a row-based standard cell design style can be improved by designing a single-height decap cell consisting of both a PMOS and an NMOS, as shown in Figure 5.5. This decap cell can be treated just like any other standard logic cell. Decap standard cells do not have any signal connectivity; they connect only to the supply and ground rails. Hence, they are also referred to as physical-only cells. Although other approaches to decap design have been proposed based on application and reliability requirements [34, 35], the decap structure shown in Figure 5.5 allows ease of use and offers the best decap value per unit area [34]. We therefore chose this structure to design our decap-padded standard cell libraries, as discussed in Sections 5.3.2 and 5.3.3. Before we provide details about the developed decap libraries, we present decap measurement results using HSPICE in the following section.
In this section, we experimentally measure the NMOS and PMOS decoupling capacitances using HSPICE, and compare the measurement results with those obtained from the decap equation (Equation 5.4).
HSPICE supports various MOS capacitor models [36] for capacitance measurement, including the Meyer model, the charge conservation model, the BSIM model, and the AMI model. Selection of the appropriate model depends on the trade-off between the desired accuracy in a specific frequency range and the modeling time. The appropriate model for calculating the nonlinear, voltage-dependent MOS gate capacitance can be included in the HSPICE simulation by setting the model parameter CAPOP.
For MOSFET model level 49, CAPOP is set to 2 by default, which selects the parameterized Modified Meyer model. We use this model to compare the simulation results with those obtained from the equations. The Meyer model is a first order approximation of MOS capacitance and is reported to predict the high frequency capacitance more accurately [37, 38]. A DC sweep analysis using HSPICE [39] can be used to obtain the variation of CG with respect to VGS. An HSPICE input file for the characterization of the NMOS decap (for the 0.18µm node) shown in Figure 5.4 is given below:
* NMOS decap of Figure 5.4 (W here is illustrative); level-49 model library included separately
M1 d g s b nmos W=10u L=0.18u
Vg g 0 0
Vd d 0 0
Vb b 0 0
Vs s 0 0
.DC Vg -1 1.8 0.1
.OPTION POST DCCAP=1
.END
The parameter LX18(<transistor name>), defined in the model, measures the MOS gate capacitance, and the option DCCAP is set to 1 to enable capacitance calculation. Simulation of the above HSPICE file at the 0.18µm technology node results in the graph shown in Figure 5.6. The plot in Figure 5.6 matches well with the graph shown in Figure 5.3. It is clear from the graphs that the gate capacitance drops when VGS is near the MOS threshold voltage VT, and that a stable decap can be obtained by operating the MOS in the linear region (i.e., VGS ≈ VDD). A similar analysis can be done for the PMOS decap; Figure 5.7 shows the capacitance curve for the PMOS decap. The capacitance curves are in close agreement with the MOS capacitance theory discussed in the previous section. Tables 5.2 and 5.3 show the NMOS and PMOS decap measurement results at the 0.18µm node for various transistor widths. (CG)sim and (CG)eq refer to the capacitance obtained from HSPICE simulation and from Equation 5.4, respectively. The simulation results match well with those obtained from the equation; the small discrepancy in values can be attributed to Meyer's simplified piecewise linear capacitance model.
Table 5.2: NMOS Decap Measurement Results using HSPICE
Table 5.3: PMOS Decap Measurement Results using HSPICE
Figure 5.6: NMOS Decap Measurement using HSPICE (CG vs. VGS plot)
Figure 5.7: PMOS Decap Measurement using HSPICE (CG vs. VGS plot)
5.3.2 DCFLIB
DCFLIB (Standalone Decap Cells Library) contains decap standard cells of varying sizes. These decap standard cells serve as filler cell replacements to provide a specific value of decap in the design. The library also contains an equal number of same-sized filler cells as there are decap standard cells; this is required for the proper functioning of PrimeRail's filler-based decap insertion. The reason to develop a separate library containing only filler and decap standard cells is that filler-based decap optimization is required for both the traditional and our voltage drop optimization approach.
Equation 5.4 can be used to arrive at the transistor dimensions for a required decap value. For example, substituting the values of COX and CO for the 0.18µm technology in Equation 5.4, we can design a decap cell with 2X femtofarads of capacitance by solving X = W(7L + 5) for each MOSFET in the decap cell, where W and L are defined in terms of λ. We keep L at the minimum value allowed by the technology to lower the resistance, and increase W to achieve the required decap value. Multiple fingers are used for the decap design when W reaches its maximum value. Table 5.4 lists the decap standard cells in DCFLIB. The decap values listed are for the 0.18µm technology node. The cell nomenclature is defined as DCPX<#m>, where DCP refers to decoupling capacitor and #m refers to the decoupling capacitance provided by the cell in femtofarads.

Table 5.4: DCFLIB Cells
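The sizing procedure described above can be sketched as follows, taking the fitted 0.18µm relation X = W(7L + 5) (W, L in λ; the cell providing 2X fF) as given. The function name, λ limits in the usage example, and the finger-splitting policy are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// Sizing sketch for a DCFLIB decap cell: keep L at the technology minimum
// (to keep Reff low), solve X = W*(7L + 5) for the total width W that yields
// the target 2X fF, and split W into fingers once it exceeds wmax (all in lambda).
struct DecapSize {
    double w_per_finger;  // width of each finger, in lambda
    int fingers;          // number of fingers
};

DecapSize sizeDecap(double target_fF, double lmin, double wmax) {
    const double x = target_fF / 2.0;         // cell delivers 2X fF
    const double w = x / (7.0 * lmin + 5.0);  // total width, in lambda
    const int fingers = static_cast<int>(std::ceil(w / wmax));
    return {w / fingers, fingers};
}
```

For a small target the single-finger solution suffices; large targets fold the width into several equal fingers so no finger exceeds wmax.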
5.3.3 UCDCLIB
• Cell Pins:
Input and output pins of a cell are placed at intersections of multiples of the xPitch and yPitch. That is, the pins are placed on an xPitch*yPitch grid. This is done to facilitate signal routing.
• Cell Width:
Cells are designed with a width that is a multiple of the yPitch value for the given technology node.
• Cell Origin:
The cell layout is drawn with its lower left corner at (0, 0). This helps in defining the place and route boundary for the cell when generating the abstract view of the cell for layout purposes.
Figure 5.8: [A] Standard Inverter (INVX8), [B] Decap-padded Standard Inverter (INVDCX8)
Table 5.5 shows the standard cells from OSULIB and the equivalent decap-padded standard cells from UCDCLIB. The intrinsic capacitance and area of the OSULIB cells are denoted by C1 in fF and A1 in λ (only the width is specified, since all cells have the same height), respectively. Similarly, C2 and A2 refer to the capacitance and area of the UCDCLIB cells. The percentage increases in cell capacitance and area due to decap padding are given in the last two columns as %∆C and %∆A, respectively.
Table 5.5: OSULIB and UCDCLIB Cells
Cell        C1 (fF)   A1 (λ)   C2 (fF)   A2 (λ)   %∆C     %∆A
(C1, A1: OSULIB cells; C2, A2: equivalent UCDCLIB cells with DC suffix)
INVX1 5.37 16 22.8 24 324.58 50
INVX2 10.36 16 28.2 24 172.2 50
INVX4 20.8 24 41.05 32 97.36 33.33
INVX8 41.5 40 61.8 48 48.92 20
AND2X1 19.55 32 70.8 48 262.15 50
AND2X2 25.65 32 77.78 48 203.24 50
BUFX2 18 24 40.55 40 125.28 66.67
BUFX4 33.6 32 53.9 40 60.42 25
AOI21X1 27.36 32 68.88 48 151.75 50
AOI22X1 36.09 40 73.74 56 104.32 40
DFFNEGX1 66.55 96 88.05 112 32.31 16.67
DFFPOSX1 70.15 96 91.58 112 30.55 16.67
FAX1 114.44 120 172.63 136 50.85 13.33
HAX1 52 80 91.18 88 75.35 10
MUX2X1 38.24 48 95.86 56 150.68 16.67
NAND2X1 11.08 24 39.95 32 260.56 33.33
NAND3X1 19.91 32 52.39 40 163.13 25
NOR2X1 14.67 24 99.25 40 576.55 66.67
NOR3X1 31.31 64 88.7 64 183.3 0
OAI21X1 23.06 32 70.7 48 206.59 50
OAI22X1 33.69 40 81.38 56 141.56 40
OR2X1 23.7 32 60 48 153.16 50
OR2X2 30.03 32 80.68 48 168.66 50
XNOR2X1 55.6 56 76.33 72 37.28 28.57
XOR2X1 56.03 56 76.75 72 36.98 28.57
A successful tape-out of a cell-based design depends strongly on the level of accuracy with which information about individual cells is accessible to CAD tools during the various stages of the design flow. Further, a design flow involving CAD tools from different vendors requires this information to be presented in a standard format to address the issue of interoperability. These requirements are satisfied by characterizing the library cells and categorizing the generated information in the form of standard library views. The characterized views include a symbol view for making schematics, logical and timing views for synthesis and design analysis, and physical views for layout generation and fabrication. In this section, we present a method to characterize our library cells and generate all popular library views to facilitate seamless integration of our libraries with standard CAD tools [27]. Table 5.6 summarizes the standard library views and their descriptions. Figure 2.1 shows the library views required by popular CAD tools at various stages of the digital IC design flow. In the subsequent subsections, we discuss methods to generate the library views shown in Table 5.6.

Table 5.6: Library Views and their Description
The symbol library provides a graphical representation of the library cells. Using the symbol library, a CAD tool can generate the schematic of a design by performing a one-to-one mapping of cells in the design netlist to cell symbols in the library. Symbol libraries for our library cells have not been generated; a brief description of symbol library generation [40] is provided for completeness. The symbol library generation process starts with drawing a representative symbol for each cell in a schematic editor, such as Cadence Virtuoso Schematic. The symbols are then exported to an EDIF file (.edif), from which an ASCII symbol library (.slib) can be generated using Synopsys Design Compiler. Finally, Synopsys Design Compiler can be used to generate the standard symbol library (.sdb).
Physical views of library cells provide information related to the physical representation of logic cells, including cell layouts, the layers used, layer numbers, etc. Physical views are required during the backend stage of the design flow. Two popular physical views, the GDSII and LEF file formats, are described below:
Figure 5.9: Method to Generate Various Library Views
by MOSIS. Cell layouts are then extracted to CIF and transferred to the Cadence Virtuoso Layout Editor. The process design kit used with Virtuoso is the NCSU CDK (Cadence Design Kit) [41]. The design kit includes layermap files for CIF and GDSII import/export, layer assignments, DRC and parasitic extraction rules, and transistor model files. Reference [27] provides a step-by-step description of generating a GDSII file from the Virtuoso editor (Cadence ICFB platform toolset).
1. Routing layer information: The tool needs to know how many routing layers of the specified technology node can be used to connect the standard cells and macros in the design. Additionally, for each routing layer, various attributes such as layer type, preferred routing direction (horizontal or vertical), layer width/spacing/pitch rules, and parasitics per unit area are required. The layer type can be routing, cut (contact), masterslice (poly/active), or overlap. Via attributes for connecting adjacent layers are also required.
2. Standard cell information: Various attributes related to standard cells, such as name, site name, orientation, place and route (PR) boundary, and cell pins, are required. Standard cells are laid out using a few metal layers. The locations and sizes of these metal shapes are required by tools to avoid routing the same metal layers over the same area inside standard cells, which is necessary to prevent shorts on the same metal layer.
These attributes are typically captured in a LEF file and form an abstract view of the library. A single LEF file can be defined to include all library information. However, if a library contains a large number of cells, a single LEF file can become very large and hard to manage. In such cases, the abstract view can be separated into two parts: a technology LEF and a cell library LEF. The technology LEF file contains the technology information for a design, such as routing layer and via attributes, while the cell library LEF contains the standard cell and macro attributes. We generate a single LEF file for our libraries. Figure 5.10 shows a 2-input NAND gate layout and its abstract view. As seen from the figure, the abstract view is like a cell skeleton: it contains no active layers, only the cell geometry (X and Y), obstruction information for the routing layers, and the I/O pin locations.
A LEF file is typically defined by the following sections [42]:
1. Technology: The technology section defines various attributes related to routing layers and vias, as discussed above.
2. Site: The site section contains generic information about all the standard cells and macros defined after this section. It defines the cell class (whether cells belong to CORE or PAD), the cell symmetry (whether cells can be flipped along the X axis, the Y axis, or both for optimization), and generic cell dimensions.
3. Macro: The macro section defines attributes related to standard cells, as discussed above. There are as many macro sections as there are standard cells in a library.
Cadence Abstract generator [43] can be used to generate cell abstract views. Detailed steps to
generate the abstract view of library cells using Cadence Abstract generator can be found in [27].
Correct and reliable operation of a design can be ensured only if the design meets its timing and power specifications. Although it is possible to estimate these parameters by performing circuit- or switch-level simulations with SPICE-like simulators, the memory and time overhead would simply be unacceptable. Moreover, running a full-chip circuit-level analysis every time a design goes through any change would be prohibitively expensive. Instead, a convenient approach is to generate delay and timing models for the individual logic cells, and use these models to estimate the parameters for the design. This approach saves considerable time, since model generation is a one-time process, and design parameter estimation based on these models does not require transistor-level analysis. Detailed and accurate model generation is a must to produce results comparable to full-chip analysis results. Library characterization is the process of generating timing and power models for individual logic cells on the basis of their physical netlists. The power consumption and speed of logic cells depend on the input slope and output loading conditions. Cell area, logic functionality, and state-dependent leakage power are some of the attributes required by CAD tools to perform various optimizations. Library characterization generates this information by simulating individual logic cells for different input slew and output loading conditions at each process corner. Nonlinear models for cell delay, output transition, and power are typically represented in the form of a 2D lookup table. For example, to calculate the cell delay with respect to one input for 5 different input slopes and 5 different output load conditions, 50 cell simulations (known as timing arcs) are performed: 25 each for the rise and fall transitions. The characterized data is stored in a standard format such as .lib, .db, .tlf, or .alf so that CAD tools can read it. The .lib (Synopsys Liberty format) and .db (database format) formats are typically used by Synopsys products, whereas the .tlf (timing library format) is used mostly by Cadence tools. The .alf (advanced library format) is an extension of the .lib format.
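Between characterized grid points, such a 2D table is typically queried by bilinear interpolation. A minimal sketch follows; the function names and the edge-extrapolation policy are our assumptions, not any particular tool's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Bilinear interpolation in a characterized 2D lookup table (e.g. cell delay
// indexed by input slew and output load, as in NLDM-style .lib models).
// Axis values must be strictly ascending; queries outside the axis range
// extrapolate from the nearest edge cell.
double lookup2d(const std::vector<double>& slews,
                const std::vector<double>& loads,
                const std::vector<std::vector<double>>& delay,  // [slew][load]
                double s, double c) {
    auto idx = [](const std::vector<double>& ax, double v) -> size_t {
        // pick i so that the cell [ax[i], ax[i+1]] is used for interpolation
        size_t i = std::upper_bound(ax.begin(), ax.end(), v) - ax.begin();
        if (i == 0) i = 1;
        if (i >= ax.size()) i = ax.size() - 1;
        return i - 1;
    };
    const size_t i = idx(slews, s), j = idx(loads, c);
    const double t = (s - slews[i]) / (slews[i + 1] - slews[i]);
    const double u = (c - loads[j]) / (loads[j + 1] - loads[j]);
    return (1 - t) * (1 - u) * delay[i][j] + t * (1 - u) * delay[i + 1][j]
         + (1 - t) * u * delay[i][j + 1] + t * u * delay[i + 1][j + 1];
}
```

With a characterized 5x5 grid, one such call replaces a fresh transistor-level simulation for every (slew, load) query the optimizer makes.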
We use the Cadence SignalStorm tool [44] to characterize our cell libraries at the nominal process corner and generate the .lib, .alf, and .db formats. A netlist view (.v) is also generated; it captures the Verilog description of the logic cell, required for simulation purposes. The library characterization process using SignalStorm is detailed in [27].
As discussed in Section 2.2.2, the parasitic effects of standard cells, appearing as a load on the power distribution network, affect the overall voltage drop. The parasitic view captures the cell parasitics, which can then be used for cell-based voltage drop analysis. Parasitic view creation is discussed in Chapter 5. Steps to create the parasitic view using the Synopsys PrimeRail tool are given in Appendix B.
Chapter 6
The traditional approach of filler-based decap optimization places decaps away from the switching nodes, rendering the decaps less effective. A higher decap budget is required to compensate for this efficiency loss. As discussed in earlier chapters, distributed decap placement can improve the cost-effectiveness of decaps. We propose to achieve a distributed decap placement using decap-padded standard cells from the UCDCLIB library. Satisfying the decap requirement of a design using decap-padded cells raises two questions. First, how do we calculate the decap budget of a design in terms of decap-padded standard cells? And second, which standard cells in the design need to be replaced with decap-padded standard cells? Inserting decap-padded standard cells without proper guidance would result in a larger number of standard cell replacements, leading to an unacceptable increase in design area and to yield degradation.
To address these issues, we develop a C++ based Decoupling Capacitor (Decap) Optimization procedure: DCOPT. DCOPT works in conjunction with Synopsys' voltage drop analysis tool, PrimeRail, to calculate the decap budget of a design. DCOPT identifies the standard cells in a design that are to be replaced with equivalent decap-padded standard cells from UCDCLIB. It also allows for tighter decap optimization by using a concept of dynamic thresholding, described later in this chapter. Implementation details of DCOPT are presented in this chapter.
6.1 DCOPT Block Diagram
Figure 6.1 shows the block diagram of the decap optimization flow using DCOPT. The DCOPT architecture is based on a client-server model, where the client and the server act as two different processes, and the exchange of data between them is accomplished by an interprocess communication (IPC) mechanism. DCOPT comprises three building blocks, namely PrimeRail Analysis, the Decap Optimization Client (DCOPT-Client), and the Decap Optimization Server (DCOPT-Server). DCOPT uses PrimeRail to perform voltage drop analysis on a design. The result of the voltage drop analysis is communicated to DCOPT-Server by DCOPT-Client through a named pipe IPC mechanism. DCOPT-Server virtually incorporates decaps into the design, and the modified design is re-analyzed for voltage drop effects. DCOPT works in an iterative fashion to bring the voltage drop within a user-defined threshold.
The main reason to implement DCOPT in a client-server configuration is to combine PrimeRail's voltage drop analysis with the execution efficiency offered by C++. Synopsys PrimeRail supports TCL and Scheme based scripting as its main interface, and can be configured to run in batch mode using scripts written in either TCL or Scheme. However, implementing the decoupling capacitor optimization algorithm in a script and integrating it with the PrimeRail analysis would make the process unwieldy for two reasons. First, the optimization process requires an iterative analysis to arrive at the decap budget, so the execution time penalty must be minimized. Second, efficient data structures are required to capture the design complexity and to enable fast budgeting. For these reasons, DCOPT-Server is implemented in a compiled language, C++. Implementing DCOPT-Server in C++ raises another issue: the integration of script-based PrimeRail analysis with a C++ based process. The problem can be solved by invoking a separate process through the TCL script. DCOPT-Server could be repeatedly invoked to perform the decap budgeting after each voltage drop analysis. However, repetitive invocation of DCOPT-Server would defeat the very purpose of implementing a separate process: DCOPT-Server needs to perform its initial database construction once and remember the updated design for the next iterative analysis, and invoking it with each iteration would result in an unacceptable execution time overhead. Hence, a better approach is to keep DCOPT-Server running in the background for the entire duration of the optimization process, and supply it with the necessary parameters to perform decap budgeting. This functionality is enabled by implementing a lightweight C++ based DCOPT-Client process. DCOPT-Client serves as an interface between the PrimeRail analysis and the DCOPT-Server optimization process. The improvement in execution efficiency due to DCOPT-Server is described in subsequent sections.
Figure 6.1: DCOPT Block Diagram (DCOPT-Server takes a command file as input and outputs a list of cell operations)
The input requirements of DCOPT can be subdivided into the input requirements of its building blocks. The inputs needed for PrimeRail analysis are the same as described in Section 4.3. DCOPT-Client is invoked through the PrimeRail script, and hence needs no additional inputs. The following summarizes the input requirements of DCOPT-Server, supplied in the form of a command file:
with its location attribute is specified in the DEF file. However, instances are not listed in any specific order, and DCOPT-Server needs the instances in sorted order to perform the decap budget calculation. We sort all cell instances first row-wise; within each row, instances are further sorted along the row width (based on their X locations). A Perl script generates this sorted, post-processed design placement file (stored as .def.rpt).
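The sorting step can be sketched in C++ (the thesis uses a Perl script; the struct and field names here are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sketch of the placement pre-processing: sort DEF cell instances row-wise
// (rows identified by the Y coordinate), then by position within each row,
// before DCOPT-Server builds its physical map of the design.
struct Instance {
    std::string name;
    double x, y;  // lower-left placement location from the DEF
};

void sortRowWise(std::vector<Instance>& cells) {
    std::sort(cells.begin(), cells.end(),
              [](const Instance& a, const Instance& b) {
                  if (a.y != b.y) return a.y < b.y;  // row order
                  return a.x < b.x;                  // order within a row
              });
}
```

After this pass, a linear scan of the vector visits the die row by row, which makes mapping reported voltage-drop nodes onto neighboring instances straightforward.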
The value is specified in femtofarads. The decap step allows a trade-off between tight optimization and faster runtime (fewer iterations).
The PrimeRail analysis is performed through a mixed-mode script containing TCL and Scheme language commands. Algorithm 1 shows the pseudo-code for the analysis. The input design in Milkyway format, Ω, is first analyzed for dynamic voltage drop (DvD). The regions of the placement on specific metal layers having a voltage drop greater than the user-defined threshold are reported to a file; we call this file the violation report, Φ. This is followed by the invocation of DCOPT-Server in non-blocking mode. DCOPT-Server runs as a background process and waits for the necessary data from the PrimeRail analysis. This data is supplied by invoking DCOPT-Client in blocking mode, which halts the PrimeRail script execution until DCOPT-Client terminates. This step provides explicit synchronization between the PrimeRail analysis and DCOPT-Server. DCOPT-Client sends Φ to DCOPT-Server, which then returns design modifications χ or a status flag to DCOPT-Client. On receiving the server acknowledgement, DCOPT-Client terminates and unblocks PrimeRail execution. PrimeRail updates Ω with χ, and performs DvD again with the What-If Capacitance feature. The new violation report Φ is used to repeat the analysis. The iteration continues until the voltage drop is optimized or an exit status is issued by DCOPT-Server.
The What-If Capacitance feature in PrimeRail allows changes in voltage drop to be evaluated by virtually incorporating capacitors at specific locations within the design. DCOPT-Server sends χ in the form of a Scheme script that describes the placement of these virtual capacitors in Ω. By virtual capacitors we mean that the design Ω is not actually modified; instead, the capacitors are added virtually just to observe the effect of the design modifications.
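The iterative loop that the script builds around PrimeRail can be sketched with the tool steps abstracted as callbacks. All names below are illustrative; the real flow drives PrimeRail through TCL/Scheme and the named-pipe IPC described next.

```cpp
#include <cassert>
#include <functional>
#include <string>

// High-level sketch of the iterative analysis: run DvD, hand the violation
// report (phi) to the budgeting step, virtually apply the returned design
// modifications (chi), and repeat until clean or the server signals exit.
// Empty strings stand in for "no violations" / "exit status".
int optimizeVoltageDrop(
        const std::function<std::string()>& runDvD,                    // returns phi
        const std::function<std::string(const std::string&)>& budget,  // phi -> chi
        const std::function<void(const std::string&)>& applyWhatIf,    // apply chi
        int maxIter) {
    int iters = 0;
    while (iters < maxIter) {
        const std::string phi = runDvD();
        if (phi.empty()) break;              // all drops within threshold
        const std::string chi = budget(phi);
        if (chi.empty()) break;              // server issued an exit status
        applyWhatIf(chi);                    // virtual decap insertion
        ++iters;
    }
    return iters;
}
```

Structuring the loop this way keeps the expensive analysis and budgeting steps replaceable, which mirrors how the script-driven flow swaps in the What-If analysis between iterations.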
6.4 DCOPT-Client
DCOPT-Client is a lightweight C++ based process whose function is to synchronize the operation of DCOPT-Server with the PrimeRail analysis. As discussed in the previous section, explicit synchronization is provided by DCOPT-Client by blocking the PrimeRail execution. DCOPT-Client communicates with the server through communication channels set up using the named-pipe IPC mechanism.
Algorithm 1: PrimeRail Analysis Script Pseudo-code
Input: Refer to Section 4.3
Output: Voltage Drop Violations
Two unidirectional named-pipe channels are created: a write-only channel for sending a command to the server, and a read-only channel for receiving the acknowledgment from the server. A command is sent to DCOPT-Server to indicate the availability of the violation report. The received acknowledgment terminates DCOPT-Client and unblocks PrimeRail execution.
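A minimal POSIX sketch of the client's two-pipe handshake follows; the pipe paths and message strings are illustrative, and error handling is trimmed to the essentials.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Client-side handshake over two unidirectional named pipes: send one command
// on the write-only channel, then block on the read-only channel until the
// server acknowledges. The blocking read is what stalls the caller (here,
// the PrimeRail script that spawned the client).
int sendAndWait(const char* cmd, char* ack, size_t acklen) {
    mkfifo("/tmp/dcopt_cmd", 0600);  // no-ops if the pipes already exist
    mkfifo("/tmp/dcopt_ack", 0600);
    int w = open("/tmp/dcopt_cmd", O_WRONLY);  // blocks until server reads
    if (w < 0) return -1;
    write(w, cmd, strlen(cmd));
    close(w);
    int r = open("/tmp/dcopt_ack", O_RDONLY);  // blocks until server writes
    if (r < 0) return -1;
    ssize_t n = read(r, ack, acklen - 1);      // blocks for the acknowledgment
    close(r);
    if (n < 0) return -1;
    ack[n] = '\0';
    return 0;
}
```

Because both `open` calls and the final `read` block, no extra locking is needed: the client (and hence the script) simply cannot proceed until the server has consumed the command and answered.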
6.5 DCOPT-Server
DCOPT-Server is the main component of the decap optimization process. It is responsible for calculating the decap budget of a design, communicating the modified design (with decaps) to PrimeRail, terminating PrimeRail after the analysis, and reporting the total design decap in terms of UCDCLIB and DCFLIB cells. DCOPT-Server analyzes the voltage drop values at the various nodes in the design, and compensates for the drop by inserting a decap at each violating node.
DCOPT-Server prepares a physical map of the design from the input DEF. The physical map captures the location and orientation details of all instances in the design. The physical locations of the various nodes on the metal 1 layer used to create the power trunks are also determined at this stage. The physical map is necessary to determine which cell instances suffer from voltage drop; this is made possible by mapping the voltage drop violations reported by PrimeRail onto the prepared physical map. DCOPT-Server receives the violation report from DCOPT-Client through the IPC mechanism. A valid command from DCOPT-Client signals the availability of the violation report. The violation report contains the voltage drop values at various nodes on all metal layers. These nodal voltage drop values are mapped onto the prepared physical map and associated with cell instances according to their physical locations.
Note that voltage drop values are specified as negative numbers, so a lower voltage drop value
refers to a higher drop. Although all cell instances having a voltage drop value less than the user-defined
threshold are under violation and in need of optimization, we do not consider all such
instances at the first step. We first optimize the cell instances suffering from a higher
voltage drop (more negative value) using the concept of a dynamic threshold. Instead of identifying
all cell instances suffering from voltage drop with respect to the user-defined threshold, we dynamically
modify the voltage drop threshold as shown in Figure 6.2. During each iteration, the voltage drop
threshold is set to a value X% lower than the peak voltage drop (indicated by VDthres1 in Figure
6.2). Cell instances identified with respect to this current threshold (shown by band I) are optimized
by adding decap at the cell nodes. Hence, with each iteration, the peak voltage drop decreases, and so
does the current threshold (shown by VDthres2). When the current threshold becomes equal to the user-defined
threshold, all cell instances under violation are considered for optimization. The concept
of the dynamic threshold allows for a tighter decap budget. Decoupling capacitors work on the principle
of locality: a decap placed near a violating cell may also provide charge to
neighboring cells, provided not all cells switch simultaneously. Hence, optimizing first the cells
suffering from a higher voltage drop can absorb a number of violations due to lower-drop cells. This
reduces the total number of violations and results in a smaller decap repository. Experimental results in
Section 6.6 exemplify the usefulness of the dynamic threshold concept.
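The dynamic-threshold computation described above can be sketched as follows; this is a minimal illustration (function and variable names are ours), with the current threshold clamped at the user-defined value as the text describes:

```cpp
#include <map>
#include <string>
#include <vector>

// Current threshold: X% above the (negative) peak drop, never allowed to rise
// past the user-defined threshold. All values are signed millivolts.
double dynamic_threshold(double peak_drop_mv, double x_percent,
                         double user_thr_mv) {
    double thr = peak_drop_mv * (1.0 - x_percent / 100.0);
    return thr < user_thr_mv ? thr : user_thr_mv;  // clamp at user threshold
}

// Band I: instances whose drop is at or below the current threshold.
std::vector<std::string> select_violators(
        const std::map<std::string, double>& drop_mv, double thr_mv) {
    std::vector<std::string> band;
    for (const auto& entry : drop_mv)
        if (entry.second <= thr_mv) band.push_back(entry.first);
    return band;
}
```

For example, with a peak drop of −100 mV and a 5% dynamic threshold, only instances at or below −95 mV fall into band I; as the peak improves toward the user-defined threshold, the clamp takes over and all remaining violators are processed.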
Figure 6.2: Dynamic threshold concept. The node supply voltage axis shows VDDNominal, the dynamic thresholds VDthres1 and VDthres2, and the user-defined threshold, with band I marking the instances optimized in the current iteration.

The decap value for each violating cell identified by the current threshold is updated on each
iteration. During the first iteration, the decap value for a violating cell is set to zero. On
each successive iteration, the decap is incremented by the input 'decap increment amount' (Cincr) if that
cell is found to be under violation again. The decap increment amount offers a trade-off between the
convergence time of the optimization and the decap budget required for the design. A higher value for
the increment amount enables faster convergence, but can lead to a larger than required decap budget.
Alternatively, a smaller value allows finer control of the voltage drop, leading to a more accurate
decap budget, but adversely affects the overall optimization time.
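The per-iteration decap update can be sketched as follows, assuming a simple map from instance name to accumulated virtual decap (the container and names are ours):

```cpp
#include <map>
#include <string>
#include <vector>

// Per-iteration decap update: a cell seen in violation for the first time is
// registered at zero; on every later violation its budget grows by c_incr (pF).
void update_decaps(std::map<std::string, double>& decap_pf,
                   const std::vector<std::string>& violators,
                   double c_incr_pf) {
    for (const auto& inst : violators) {
        auto it = decap_pf.find(inst);
        if (it == decap_pf.end())
            decap_pf[inst] = 0.0;        // first iteration for this cell
        else
            it->second += c_incr_pf;     // violating again: increment budget
    }
}
```

A larger `c_incr_pf` reaches a non-violating state in fewer iterations but overshoots the budget; a smaller one converges slowly but more accurately, matching the trade-off above.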
During each iteration, decap additions to the design are communicated to PrimeRail through
DCOPT-Client interface. DCOPT-Server compiles all design modifications (decap additions at var-
ious locations) as a Scheme script, which is sent to PrimeRail. PrimeRail virtually updates the
design under analysis through this script, and performs a new voltage drop analysis. During each
iteration, the voltage drop results are communicated to DCOPT-Server for further optimization. The
optimization process continues until design contains no violating cell. At that time, DCOPT-Server
informs PrimeRail to terminate the analysis.
A final step performed by DCOPT-Server is to legalize the decap additions to the violating cells.
After the complete analysis, each violating cell has been appended with some value of decap; this
decap is still virtual. To accommodate this decap in the design, we replace the violating
cell with its equivalent cell from UCDCLIB. However, each UCDCLIB cell contains only a minimum decap
padding. Hence, if the virtual decap of a violating cell exceeds the decap padding of the
equivalent UCDCLIB cell, the extra virtual decap is combined with the adjacent cells, and these
cells are also considered for replacement from UCDCLIB. If the adjacent cells cannot accommodate
the extra decap, a decap cell from DCFLIB is inserted at that node to satisfy the decap budget of
the design. DCOPT-Server outputs a list of cell operations, consisting of violating-cell replacements
with equivalent cells from UCDCLIB and decap insertions from DCFLIB.
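The legalization decision for a single violating cell can be sketched as follows. The function name, units, and the greedy spill order are our assumptions; the text only specifies that excess decap is combined with adjacent cells before a DCFLIB cell is inserted:

```cpp
#include <vector>

// Legalize one violating cell's virtual decap (all values in pF).
// pad:      decap padding of the equivalent UCDCLIB cell
// adj_pads: padding offered by adjacent cells if they are also replaced
// Returns the number of adjacent cells to replace; *need_filler is set when a
// DCFLIB decap cell must additionally be inserted at the node.
int legalize_cell(double virtual_decap, double pad,
                  const std::vector<double>& adj_pads, bool* need_filler) {
    double remaining = virtual_decap - pad;   // the UCDCLIB swap absorbs `pad`
    int used = 0;
    for (double p : adj_pads) {               // spill the excess to neighbors
        if (remaining <= 0.0) break;
        remaining -= p;
        ++used;
    }
    *need_filler = remaining > 0.0;           // neighbors insufficient: DCFLIB
    return used;
}
```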
As is apparent from the above description, DCOPT-Server needs to keep track of the updated decap values at
various nodes in the design. This is the reason for keeping DCOPT-Server always active by running it
as a separate process. If DCOPT-Server were invoked on each iteration of the PrimeRail analysis, the
updated decap values would have to be stored in a database or file, leading to a huge execution
time penalty. Moreover, on each invocation, the physical map preparation would add to the overall
optimization time. Therefore, a lightweight client interface is provided to improve the total execution
time.
Algorithm 2: DCOPT-Server
Input: DCOPT-Server Command File
Output: List of Cell Operations
6.6 Experimental Results
In this section, we present experimental results on benchmarks to highlight the functional behavior
and effectiveness of the DCOPT algorithm in optimizing power supply noise. All experiments
were performed on a Sun Blade 1000 workstation (SPARC V9 processor at 750 MHz) running the Solaris
operating system.
Tables 6.1 and 6.2 show the DCOPT results for the Barcode16 and B14 benchmarks respectively.
The first row in each table shows the initial voltage drop analysis results for the benchmark. Peak VD
and Cintrinsic give the peak voltage drop (mV) and the intrinsic capacitance (pF) of the design;
these form the input to DCOPT for optimization. The optimization goal is set by the user-defined
threshold VDusr. DCOPT performs multiple iterations to bring the peak voltage drop of the design
within this threshold. The results of the iterations are shown in the subsequent rows. We highlight the
usefulness of the dynamic threshold concept by performing decap optimization under the following three
cases:
1. Without DT: Decap optimization with no dynamic threshold. A fixed user-defined threshold
is assumed during each iteration of DCOPT.
2. With 5% DT: Optimization with a 5% dynamic threshold. During each iteration of DCOPT,
the current voltage drop threshold is dynamically set to 5% higher than the peak voltage drop
(a negative value) in that iteration. The user-defined threshold acts as an upper limit.
3. With 10% DT: Same as case 2, except that the current threshold is set 10% higher during each
iteration.
For each case, the results show the number of nodes, #Nd, analyzed for decap addition, the total design
decap repository after decap addition to the #Nd nodes, and the change in the peak voltage drop with each
iteration. For the dynamic threshold cases, an additional parameter, the current threshold Vcur, is also shown.
The last row in each table shows the total decap added to the design (Cadded − Cintrinsic) and the number of
standard cells with decap after iteration i.
The results clearly show the effectiveness of the DCOPT approach in reducing the peak voltage drop with
each iteration. As discussed before, without the dynamic threshold, DCOPT starts optimization by
processing all nodes under violation and gradually moves on to concentrate on a smaller set of
nodes with higher voltage drops. This results in a large decap repository, for the reasons discussed
previously, and more standard cells are thus padded with decap; but without DT, DCOPT
converges faster. In contrast, by dynamically changing the voltage drop threshold, DCOPT
is made to concentrate initially on a smaller region with high voltage drop nodes; low voltage
drop nodes are optimized later. This provides tighter control of the voltage drop, which results in a
smaller decap budget and hence a smaller number of cells with decap. The downside is that more
iterations are required for complete optimization. As seen from Table 6.1, the 5% case
sets the tightest control and reduces the peak voltage drop to approximately −94 mV after 6 iterations,
whereas the without-DT and 10% DT cases reduce the drop to −92 mV and −93 mV respectively after
6 iterations, with the without-DT case being the fastest. Also observe the total decap requirement for
the three cases. To bring the peak VD down to approximately −94 mV, the without-DT case adds
10.309 pF of decap after 5 iterations. This budget reduces to 8.711 pF for the 10% DT case after 5
iterations, and further to 6.822 pF for the 5% DT case, although the latter takes 6
iterations to arrive at a −94 mV voltage drop. Table 6.2 shows a similar analysis for benchmark B14.
Figures 6.3 and 6.4 show the change in the voltage drop map for the two benchmarks after i iterations for
the given cases.
Table 6.1: DCOPT Results for Benchmark Barcode16
Figure 6.3: Change in voltage drop map after i iterations (color scale from highest drop to lowest drop)
Figure 6.4: Change in voltage drop map after i iterations (color scale from highest drop to lowest drop)
Chapter 7
Engineering Change Order (ECO) is a process of incorporating late-stage changes into a design.
Often, a design after the placement and routing phases requires small local changes, such as gate sizing
or buffer insertion for timing and power fixes, or layout modifications to reduce noise problems [45].
Given the time complexity of the algorithms involved in physical design, it is unacceptable to re-iterate
through the complete design flow to incorporate these changes. Moreover, the algorithms in
a typical design flow are general purpose, designed to generate physical placement and routing
from scratch. In most cases, design changes are requested to optimize certain design metrics
with respect to the current layout configuration; a placement from scratch may invalidate those
optimizations. The ECO process saves considerable development time by incorporating these changes
incrementally in the design. Since changes are applied locally and incrementally, ECO placement
can optimize design metrics without significant perturbation of the original placement.
An important characteristic of an ECO placer is that it should apply design changes to the original
placement with minimal perturbation, so as to maintain the design quality metrics of the original
placement. Accommodating design changes with certain design objectives, such as voltage drop optimization,
places an additional requirement on the ECO placer: maintaining the relative placement
order of the standard cells. It is equally important that the ECO process take substantially less time
than general-purpose placement algorithms, which can take many hours or even days,
depending on design complexity and available resources, to produce a good placement.
ECOs can be applied at various stages of the design flow. In this chapter, we are concerned with
accommodating changes in an already placed standard cell based design. More specifically, we
are interested in an algorithm for post-layout design optimization using an ECO process. In Chapter 5,
we presented a voltage drop optimization framework which requires incorporating equivalent decap-padded
standard cells in place of OSULIB standard cells. As discussed in previous chapters, precise
physical placement control of decaps is important from a voltage drop point of view. Therefore, in
this chapter, we present a C++ based Engineering Change Order (ECO) placement tool to cater to
this need.
Increasing design complexity has made the process of design optimization so expensive that incremental
techniques to evaluate different alternatives are often sought. The research literature contains
a number of incremental placement techniques. Many of these techniques are geared
toward modifying placement and accommodating changes incrementally during the physical synthesis
stages themselves; they work in close synchronization with global and detailed placers to apply
design changes incrementally [46, 47]. Our focus is on applying changes to a design which has already
been placed (i.e., post-layout designs). We further restrict the application of the ECO technique to standard
cell based layouts.
Many standard CAD tools support various ECO flows [32]. These ECO flows typically update
the placement by comparing the old design netlist with the new netlist. The netlist changes are incorporated
into the original placement such that the total movement of cells is minimized, thereby generating
a new placement close to the original. However, a placement modification with
only a minimal cell movement objective may cause some of the cells to be placed at undesirable
locations. Some objectives, like voltage drop optimization, are highly dependent on the locations of
cells. Updating a placement with least cell movement can result in unacceptable voltage drop results,
thereby defeating the very purpose of incorporating the design changes.
Algorithms for incremental placement modification of post-layout standard cell based designs
are presented in [48, 49, 50]. In [48], the authors presented an ECO algorithm to improve the useful clock
skew of a design; cell positions are locally adjusted in an attempt to enlarge positive and negative
skews. Incremental placement modification to improve post-route routing congestion in a standard
cell based design is given in [49]. However, these algorithms do not incorporate new design
changes: cell positions in the original placement are modified so as to optimize a particular design objective,
such as clock routing or routing congestion. In [50], an incremental placement algorithm
for a standard cell layout is presented that incorporates design changes while maintaining a placement
close to the original. Requested changes are applied to the placement with the objectives of wirelength
minimization and least total movement of cells. Although the relative placement order of cells is also
maintained, which is important from a voltage drop point of view, updating a placement for voltage
drop optimization was not the main theme of that work. We present an ECO placement algorithm that
modifies the approach of [50] to apply changes to a post-layout standard cell based
design. The proposed modification significantly reduces the overall computational matrix size, as
described in later sections, thereby enabling a faster and more efficient ECO process. We further extend
the algorithm to support a variable core area. This is important if design changes cannot be incorporated
due to insufficient whitespace in the design. Also, unlike [50], our algorithm allows for
design changes that lead to whitespace recovery, which can be used for subsequent design changes.
With the overall objective of updating a design placement from a voltage drop point of view, we present
an Engineering Change Order placement algorithm, ECO-Placer, for standard cell based designs.
Although the algorithm serves as one component of the voltage drop optimization framework
described in Chapter 5, ECO-Placer can also be used in a standalone configuration to apply post-layout
design changes. The main features of ECO-Placer are:
• As mentioned earlier, the voltage drop inside a design depends not only on the power consumption
of the logic cells, but also on their placement. A logic cell placed away from the supply
voltage input pad is likely to suffer a higher voltage drop than a cell in the vicinity of the
input supply. Therefore, a distributed decap placement approach can only be effective if decap
placement can be controlled properly. ECO-Placer applies design changes at the requested
locations while minimizing total cell movement and maintaining the relative placement order of
instances.
• It applies design changes optimally with fast run time. ECO-Placer generates a significantly
smaller computational matrix to apply changes optimally compared to [50].
• Quite often, design changes during an ECO flow require core area modification. Fixed-die ECO
placers assume enough whitespace in the core area to accommodate the necessary changes. Since
decap-padded standard cells have relatively larger area than nominal standard cells, a
core area increase might be necessary for voltage drop optimization. ECO-Placer supports a
variable core area to accommodate design changes.
• It supports three types of design changes: new cell insertion, existing cell deletion, and replacement
of an existing cell with a new cell. In doing so, it allows for whitespace recovery; the
additional whitespace can be used for subsequent design change requests.
The approach adopted in ECO-Placer is shown in Figure 7.1. Design changes are supplied to
ECO-Placer in the form of a cell operation file, described in Section 7.3. Each requested design change
is referred to as an operation on the design. Operations may include insertion of a new cell, deletion
of an existing cell, or replacement of an existing cell with a new cell. We regard the original timing
and wirelength driven placement as the reference placement. ECO-Placer incrementally modifies the
reference placement in three phases to apply the operations and generate a new placement DEF. In
Phase I, ECO-Placer tries to apply operations optimally if enough whitespace is present in the design.
Phase II generates whitespace to accommodate operations by selecting candidate cells from a row
and moving them to their optimal positions. The core area is increased in Phase III if Phase II is unable
to generate the required whitespace. Algorithmic details of these three phases are discussed in Section 7.4.
The ECO-Placer algorithm requires the following inputs, supplied in the form of a command file.
Figure 7.1: ECO-Placer flow. Input command file: design placement (DEF), design netlist (Verilog), cell operations, cell area, max core increase %, and filler prefix. Output: modified placement (DEF).
• Design Placement (DEF)
Generated using a Perl script, this file is the same as the one used for DCOPT-Server. It acts
as the reference placement, which ECO-Placer modifies incrementally.
• Max Core Inc %
The maximum allowable core area increase percentage. ECO-Placer can only increase the core area up to this limit.
• Filler Prefix
Filler cells are inserted in the available whitespace during design placement. Filler cell instance
names are typically prefixed with a user-defined string for easy identification. The filler prefix string
is required to identify filler cells in a design and to recover the available whitespace.
7.4 Algorithm
Algorithms 3, 4, 5, 6, and 7 show the pseudo-code for ECO-Placer, with Algorithm 3 as its main program.
As mentioned earlier, ECO-Placer takes the inputs defined in the command file and generates a modified
placement in the standard Design Exchange Format (DEF). Command parameters such as the filler prefix
and the maximum allowable core area increase are read from the command file. The current core area, rows,
and placement grid size are obtained from the input placement file. Each cell in the design is connected to a
few other cells, referred to as its neighbors, through its inputs and outputs. The design Verilog file is parsed
to generate a cell connectivity database for each cell in the design. The cell connectivity database is
required for wirelength estimation during Phase II. From the cell operations list, the algorithm prepares a
row-wise cell operations list. A row is termed an operation row if one or more operations are
to be performed on that row. The algorithm enters Phase I to apply the operations in each operation row.
Phase I fixes all operations without regard to the available whitespace in the operation row. As a result,
some of the rows at the end of Phase I become violating rows. A row is called a violating row if the
total logic width in that row exceeds the core width. These violating rows are operated on in Phase II:
cells from violating rows are selected and moved to their optimal positions in order to make the rows
non-violating. If Phase II cannot make a row non-violating, Phase III increases the core area to
create additional whitespace, and Phase II is repeated. When all rows become non-violating,
a modified placement in DEF format is generated. Details of the individual phases are given in the
subsequent subsections.
Algorithm 3: ECO-Placer main
Input: ECO-Placer Command File
Output: Modified Placement DEF
The whitespace demand of an operation depends on the type of operation. A positive value indicates consumption of the row's available whitespace, whereas
a negative value makes an addition to the whitespace repository of the row. A zero whitespace demand
causes no change in the available whitespace of the row. The whitespace demand wdi for an operation oi is
calculated as follows:
• For a REPLACE operation, an existing cell of width worg is replaced by a new cell
of width wnew. Hence, the whitespace demand wdi is given as
wdi = wnew − worg (7.1)
• For a DELETE operation, deletion of an existing cell leads to whitespace recovery. Hence we
have
wdi = −worg (7.2)
• For an INSERT operation, the whitespace demand equals the width of the new cell. Hence we
have
wdi = wnew (7.3)
Therefore, the total whitespace demand for applying n operations on row r is the sum of the whitespace
demands of the individual operations. Mathematically, this can be written as
Required Whitespace: WSr = ∑_{i=1}^{n} wdi (7.4)
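Equations 7.1 to 7.4 translate directly into code; the following is a small sketch (type and function names are ours):

```cpp
#include <vector>

// Whitespace demand of a single operation (equations 7.1-7.3).
enum class Op { INSERT, DELETE, REPLACE };

struct CellOp {
    Op op;
    double w_org;   // width of the existing cell (DELETE/REPLACE)
    double w_new;   // width of the new cell (INSERT/REPLACE)
};

double whitespace_demand(const CellOp& o) {
    switch (o.op) {
        case Op::REPLACE: return o.w_new - o.w_org;   // (7.1)
        case Op::DELETE:  return -o.w_org;            // (7.2): recovery
        default:          return o.w_new;             // (7.3): INSERT
    }
}

// Total demand WS_r for a row's operation list (equation 7.4).
double row_demand(const std::vector<CellOp>& ops) {
    double ws = 0.0;
    for (const auto& o : ops) ws += whitespace_demand(o);
    return ws;
}
```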
Let WSa denote the whitespace available in row r. Two cases can then be considered; Algorithm 4 shows the pseudo-code for both.
Case 1: (WSr − WSa) ≥ 0
The whitespace available in operation row r is less than or equal to the whitespace required to
apply the n operations; hence, row r cannot accommodate the requested operations within its current bounds. In this case,
we skip the optimization process, since all cells in the row will be moved anyway. Row r is
allowed to expand beyond the core width, and all requested operations are applied from left to right.
As a result, row r crosses the core boundary, leading to a violating row. Once all
operations are legalized, a design rule check is performed to ensure that adjacent cells do not
overlap. The algorithm then enters Phase-II to operate on the violating rows.
Case 2: (WSr − WSa) < 0
The whitespace demand of the operations on row r is less than the available whitespace
of row r. This indicates that row r can accommodate all operations without crossing the
core boundary. In such cases, an optimal solution is found to apply the operations such that the
perturbation (total movement of cells) is minimized. To apply an operation, an incision
point is defined: the point at which the requested operation is to be applied.
The incision point divides row r into two halves, left and right. For DELETE and
REPLACE operations, the incision point is aligned with the leftmost boundary of the cell under
operation (the cell to be deleted or replaced); hence, the cell under operation becomes part of the
right half of row r. The alignment of the incision point for an INSERT operation depends on
whether the requested insertion location for the new cell is occupied by a logic cell. If it is,
the incision point aligns with the leftmost boundary of the overlapped cell, and the new cell is inserted
Algorithm 4: ECO-Placer Phase-I: Fixing Operations
Input: Row r, Required Whitespace reqws
Output: Row Status rowsts
availws = GetAvailableWhitespace(r);
if reqws ≥ availws then
    Legalize operations in row r from left to right;
    rw ← Update row r width;
    Perform design rule check (DRC) for r;
    if rw > cw then rowsts ← VIOLATED;
else
    foreach Operation o for r in OpList do
        incpt ← Get Incision Point for o in r;
        /* Apply operation optimally */
        if reqws > 0 then
            lbuck ← Create (cell, gain) bucket left of incpt;
            rbuck ← Create (cell, gain) bucket right of incpt;
            s ← FindLeastCellMovement(lbuck, rbuck);
        end
        Legalize operation(o, r, s);
        Perform design rule check (DRC) for r;
    end
end
before or after the overlapped cell, depending on whether the requested location is nearer the left or
the right boundary of the overlapped cell. Otherwise, if the requested insertion location falls onto
a whitespace, the incision point is aligned with the rightmost boundary of the cell just before the whitespace,
and the cell is inserted at that location. The optimization process is applied only if an
operation requires creation of whitespace, in other words, if the whitespace demand is positive.
The whitespace demand of a DELETE operation is always negative, and hence optimization is
skipped for it. A REPLACE operation also does not require optimization if the width of the new cell
is less than that of the existing cell; such cases always result in whitespace recovery, which can
be utilized for subsequent operations. For all other cases, the algorithm discussed next is used to
apply operations optimally.
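The incision-point rule for an INSERT operation can be sketched as follows, using a simplified model of a row as a list of occupied spans (struct and function names are ours):

```cpp
#include <vector>

// Where Phase-I cuts the row for an INSERT at requested x-coordinate `x`.
// (For DELETE/REPLACE the incision point is simply the left edge of the cell
// under operation.)
struct Placed { double left, right; };   // occupied span of one cell in the row

double incision_for_insert(const std::vector<Placed>& row, double x) {
    for (const auto& c : row) {
        if (x >= c.left && x < c.right) {
            // Requested location overlaps a cell: cut at its left boundary;
            // the new cell then goes before or after it depending on whether
            // x is nearer the left or the right edge of the overlapped cell.
            return c.left;
        }
    }
    // Requested location is whitespace: cut at the right edge of the cell
    // just before it, so the new cell is inserted exactly there.
    double cut = 0.0;
    for (const auto& c : row)
        if (c.right <= x && c.right > cut) cut = c.right;
    return cut;
}
```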
Finding an Optimal Solution
We seek an optimal solution for applying an operation o such that the fewest cells are moved
from their original positions and the relative placement order of cells is maintained. The optimal solution
is found by moving the least number of cells horizontally in an operation row r to satisfy the
whitespace demand of o. Assume that the whitespace demand of operation o is wd. As discussed
in the previous paragraph, the incision point of an operation o creates two sub-rows of the operation
row r: a left sub-row and a right sub-row. Relative placement order is maintained by restricting the
movement of cells in the left sub-row to the left only, and of cells in the right sub-row to the right only. Both the left and
the right sub-row contain a number of cells and whitespaces; any unused area between two adjacent
cells is counted as a single whitespace. We define the following parameters for the left sub-row, with
whitespaces counted outward from the incision point: glk, the whitespace gain obtained by consuming the
kth whitespace; nlk, the number of cells moved to consume it; WL, the total number of whitespaces; and
CL, the total number of cells in the left sub-row.
Therefore, if the first whitespace from the incision point is consumed, we get GL1 = gl1 and NL1 = nl1.
If the first two whitespaces are consumed, GL2 and NL2 are (GL1 + gl2) and (NL1 + nl2) respectively.
Continuing this, if the first i whitespaces counted from the incision point are consumed, we get
GLi = ∑_{k=1}^{i} glk, i ≤ WL (7.5)
NLi = ∑_{k=1}^{i} nlk, i ≤ WL (7.6)
Similar equations can be obtained for the right sub-row. In this case, the first j whitespaces are counted
from the incision point to the right; the definitions of the parameters are the same as for the left sub-row,
with L (left) replaced by R (right):
GRj = ∑_{k=1}^{j} grk, j ≤ WR (7.7)
NRj = ∑_{k=1}^{j} nrk, j ≤ WR (7.8)
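The buckets implied by equations 7.5 to 7.8 are prefix sums over a sub-row's whitespaces, truncated at the first element whose cumulative gain covers the demand. A sketch (type names are ours):

```cpp
#include <utility>
#include <vector>

// A sub-row's whitespaces, listed outward from the incision point, as
// (gain, cells) pairs: gl_k is the width recovered by consuming the k-th
// whitespace, nl_k the number of cells shifted to consume it.
using Gap = std::pair<double, int>;
// Bucket element: (cumulative cells moved NL_i, cumulative gain GL_i).
struct Elem { int n; double g; };

// Build one bucket (equations 7.5 and 7.6); the right bucket is built the
// same way with whitespaces counted rightward from the incision point.
std::vector<Elem> build_bucket(const std::vector<Gap>& gaps, double wd) {
    std::vector<Elem> bucket;
    int n = 0;
    double g = 0.0;
    for (const auto& [gain, cells] : gaps) {
        g += gain;
        n += cells;
        bucket.push_back({n, g});
        if (g >= wd) break;   // stop at the first element with GL_i >= wd
    }
    return bucket;
}
```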
Therefore, the optimal solution that moves the fewest cells while maintaining the relative placement order
of cells in row r can be obtained by solving the following integer linear programming formulation
of the problem:
Minimize (NLi + NRj)
Subject to
0 ≤ NLi ≤ CL, and i ≤ WL
0 ≤ NRj ≤ CR, and j ≤ WR
GLi + GRj ≥ wd (7.9)
To find an optimal solution, we construct the following two cell(gain) buckets from equations 7.5
to 7.8:
Left Bucket:
NL1 (GL1), NL2 (GL2), . . . , NLi (GLi)
The left bucket contains i elements in ascending order, such that the ith node is the first element
satisfying the condition GLi ≥ wd, with i ≤ WL. Each bucket element corresponds to a whitespace in the
left sub-row, and is represented as a pair: the cumulative number of cells moved and the cumulative
whitespace gain obtained if that whitespace is consumed.
Right Bucket:
NR1 (GR1), NR2 (GR2), . . . , NRj (GRj)
The right bucket contains j elements in ascending order, such that the jth node is the first element
satisfying the condition GRj ≥ wd, with j ≤ WR. As for the left sub-row, each bucket element
corresponds to a whitespace in the right sub-row.
The number of potential solutions ps satisfying equation 7.9 can be obtained by combining each
element of the left bucket with every element of the right bucket. Every potential solution (NL +
NR) represents the movement of NL cells in the left sub-row and NR cells in the right sub-row; the optimal
solution is the potential solution with the least (NL + NR). Since the sizes of the left and right buckets are WL and WR
respectively, combining left bucket elements with right bucket elements results in a worst-case
search space of size χs = (WR × WL). However, this worst case never occurs, because
the elements in the left and right buckets are arranged in successively increasing order. We
Algorithm 5: FindLeastCellMovement
Input: Left Bucket lbuck with i elements, Right Bucket rbuck with j elements
Output: Optimal solution s
make use of this property to combine bucket elements in a specific order, which reduces the search
space significantly. The search space is explored by scanning the left bucket in descending order.
For each element of the left bucket, we combine it with elements of the right bucket in ascending order until
the first potential solution in the right-bucket direction is encountered. The exploration process stops when
either all elements have been explored or the right bucket has been completely scanned at least once. The
pseudo-code for the process is shown in Algorithm 5. As a result, the solution space explored
by our modified approach is always smaller than χs. By comparison, the solution
space explored in [50] is equal to ψs = [(CR + 1) × (CL + 1)], because the authors analyze the left and
right sub-rows on a cell-by-cell basis. Since the total cell count in a typical row of a standard cell
based design is always significantly higher than the total number of whitespaces, we have χs ≪ ψs.
Hence, the modified approach finds an optimal solution with significantly fewer computations.
Figure 7.2 exemplifies the difference.
Figure 7.2: Example operation row (CL = 12, WL = 5, CR = 13, WR = 7; figure values: 4 8 2 3 2 1 2 3 3 4 4 2)
Figure 7.2 shows an example operation row with the incision point at Cell 12 for an INSERT
operation. Suppose the whitespace demand for this operation is 8. The necessary row parameters are
shown in the figure. The solution space generated by [50] contains 156 matrix elements. Compared to
this, the solution space produced by our approach contains only 15 elements, as shown in Figure 7.3.
All potential solutions are shown in bold, and the optimal solution is underlined. Clearly, the number of
computations required is significantly reduced by this modified approach.
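The search over the two buckets can be sketched as follows. Because both buckets are ascending in cells moved and in gain, for each left-bucket element the first feasible right-bucket element is the cheapest partner; the simplified version below explores at most one right element per left element, plus the single-bucket solutions (names are ours):

```cpp
#include <vector>

struct BElem { int n; double g; };   // (cumulative cells moved, cumulative gain)

// Least total cell movement NL + NR subject to GL + GR >= wd (equation 7.9).
// Either bucket alone may satisfy the demand. Returns -1 if it cannot be met.
int find_least_cell_movement(const std::vector<BElem>& lbuck,
                             const std::vector<BElem>& rbuck, double wd) {
    int best = -1;
    auto consider = [&best](int cost) {
        if (best < 0 || cost < best) best = cost;
    };
    // Single-bucket solutions (consume whitespace on one side only).
    for (const auto& e : lbuck) if (e.g >= wd) { consider(e.n); break; }
    for (const auto& e : rbuck) if (e.g >= wd) { consider(e.n); break; }
    // Scan the left bucket in descending order; for each element take the
    // first right-bucket element (ascending) that completes the demand.
    for (int i = (int)lbuck.size() - 1; i >= 0; --i) {
        for (const auto& r : rbuck) {
            if (lbuck[i].g + r.g >= wd) {        // first feasible pair
                consider(lbuck[i].n + r.n);
                break;
            }
        }
    }
    return best;
}
```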
Phase-II of ECO-Placer is invoked if Phase-I results in one or more violating rows.
Figure 7.4 shows an example row distribution at the end of Phase-I. Three types of rows can
be identified in Figure 7.4: (1) rows having no whitespace and aligning with the core boundary
(e.g., row 2); (2) rows having whitespace and remaining inside the core boundary (e.g., rows
3 and 7); and (3) violating rows crossing the core boundary (rows 4 and 5). Note that, as a result of Phase-I,
violating rows do not contain any whitespace. Phase-II operates on each violating row
and moves enough cells from the violating rows to rows of the second type that each violating row
becomes non-violating. Algorithm 6 shows the pseudo-code for Phase-II.
Figure 7.3: Solution space for the example of Figure 7.2 (left bucket and right bucket)
Figure 7.4: Example row distribution at the end of Phase-I (rows 0-9; rows 4 and 5 are violating rows)
Consider a violating row vr of width Wvr, and let cw be the core width. Cells with
widths w1, w2, . . . , wn are selected and moved from vr to rows which can accommodate them,
such that
(Wvr − ∑_{i=1}^{n} wi) ≤ cw (7.10)
These cells are called candidate cells. Candidate cells are free to move in both the horizontal and the
vertical direction. In contrast, cells that are allowed to move only in the horizontal direction
are referred to as locked cells; cells operated on in Phase-I belong to this latter category. A candidate
cell cc is selected for movement if it satisfies the following three conditions:
1. Cell cc is not a locked cell.
2. The optimal position of cell cc belongs to a row of type (2), referred to as an optimal row.
3. The optimal row can accommodate cell cc without becoming a violating row.
Optimal position of a candidate cell is calculated based on the balance of forces exerted on can-
didate cell by its neighbors. A cell is typically connected to few other cells through its inputs and
outputs. The force experienced by a candidate cell due to its neighbor is proportional to their con-
nection length (wirelength). Optimal position is one where forces due to neighboring cells balance
out, and candidate cell experiences a zero-force. A cell in its zero-force location also minimizes
total wirelength, which is defined as sum of individual connection length to each neighbor. Optimal
position zpos ← (xopt , yopt ) of a candidate cell is calculated as [51]:
x_opt = (∑_{k=1}^{nc} λ_k · x_k) / (∑_{k=1}^{nc} λ_k)    (7.11)

y_opt = (∑_{k=1}^{nc} λ_k · y_k) / (∑_{k=1}^{nc} λ_k)    (7.12)
where nc refers to the number of candidate cell neighbors, and λ_k is the connection weight for each neighbor.
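Equations 7.11 and 7.12 amount to a connection-weight weighted mean of the neighbor coordinates; a minimal sketch:

```python
def zero_force_position(neighbors):
    """Zero-force (optimal) position per Eqs. 7.11/7.12: the weighted mean of
    neighbor coordinates.  `neighbors` is a list of (weight, x, y) tuples,
    where the weights play the role of the lambda_k in the equations."""
    wsum = sum(w for w, _, _ in neighbors)
    x_opt = sum(w * x for w, x, _ in neighbors) / wsum
    y_opt = sum(w * y for w, _, y in neighbors) / wsum
    return x_opt, y_opt
```

For example, two equally weighted neighbors at (0, 0) and (4, 2) pull the cell to their midpoint (2, 1), where the opposing forces cancel.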
Candidate Cost Calculation
A candidate cell is selected for movement to zpos if it minimizes the following weighted cost function:

cc_cost = α·C1(cc) + β·C2(cc) + γ·C3(cc)    (7.13)

where α, β, and γ are weights set empirically, and C1, C2, and C3 are the cost functions described below:
C1(cc) represents the first component of the candidate cell cost in terms of the change in wirelength if the candidate cell is moved to its optimal position. We want to minimize this component. C1(cc) is given as:

C1(cc) = ∆WL(zpos, nc)    (7.14)
The total wirelength of a net connecting a candidate cell cc to its nc neighbors is given by [51]:

WL_curr = ∑_{k=1}^{nc} wt_k · (|x − x_k| + |y − y_k|)    (7.15)

where wt_k is the weight of the edge connecting cc to neighbor k, (x, y) refers to the candidate cell position, and (x_k, y_k) refers to the position of neighbor k.
When a candidate cell moves to its optimal position zpos, Equation 7.15 can be used to calculate the new wirelength, WL_opt, of the net. The change in wirelength is then ∆WL = WL_opt − WL_curr.
C2(cc) is the second cost component and depends on SIZE, the width of candidate cell cc: the higher the width of cell cc, the lower the cost C2. This reflects that fewer candidate cells need to be moved to adjust the violating row.
C3(cc) is the third cost component and depends on PWR, the power consumption of candidate cell cc: the lower the power consumption of cell cc, the lower the cost C3. A candidate cell with lower power consumption is preferred because moving it to a new row adds less load to the power distribution connection of that row, and hence results in less voltage drop.
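The weighted cost of Equation 7.13 can be sketched as follows. Since the thesis states only the monotonic trends for C2 and C3, the concrete forms 1/SIZE and PWR used here are illustrative assumptions, not the thesis formulas.

```python
def wirelength(pos, neighbors):
    """Weighted Manhattan wirelength of the net (Eq. 7.15).
    `neighbors` is a list of (weight, x, y) tuples."""
    x, y = pos
    return sum(w * (abs(x - xk) + abs(y - yk)) for w, xk, yk in neighbors)

def candidate_cost(cur_pos, opt_pos, neighbors, size, power,
                   alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted cost of Eq. 7.13.  C1 is the wirelength change caused by
    moving to the optimal position; C2 and C3 are taken as 1/size and power,
    matching the stated trends (wider cell -> lower C2, lower power -> lower
    C3) but assumed here, since the exact closed forms are not given."""
    c1 = wirelength(opt_pos, neighbors) - wirelength(cur_pos, neighbors)
    c2 = 1.0 / size
    c3 = power
    return alpha * c1 + beta * c2 + gamma * c3
```

A candidate whose zero-force position shortens its nets (negative C1), that is wide, and that draws little power thus gets the smallest cost and is moved first.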
Algorithm 6: ECO-Placer Phase-II: Candidate Cell Selection and Movement
(Input: Violating Row vr; Output: Row Status rowsts)
If one or more violating rows remain even after the core area increase, Phase-II is applied again. Since Phase-III has created extra whitespace in many rows, reapplying Phase-II can now resolve violating rows by moving cells to their optimal positions.
Phase-II and Phase-III are applied in a loop until all rows become non-violating, i.e., belong to the type (1) or type (2) category, or the core dimension reaches the maximum limit defined by the user.
FAILSTS ← FALSE;
if cw < maxcw then
    cw += placement grid size;
    foreach ViolatingRow vr in VList do
        Update vr;
    end
else
    Report: operations cannot be applied;
    FAILSTS ← TRUE;
end
return FAILSTS;
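The Phase-II/Phase-III alternation described above can be sketched as a simple driver loop. This is an illustrative Python sketch, not the thesis code: `phase2` is a hypothetical callback standing in for the candidate-cell movement pass, returning whichever rows it could not resolve.

```python
def eco_place(rows, core_width, max_core_width, grid, phase2):
    """Alternate Phase-II (cell movement) and Phase-III (core widening)
    until no violating row remains or the core reaches the user limit.
    `phase2(rows, cw)` resolves what it can at core width cw and returns
    the remaining violating rows; Phase-III grows the core by one
    placement-grid step, creating fresh whitespace for the next pass."""
    cw = core_width
    violators = phase2(rows, cw)
    while violators and cw < max_core_width:
        cw = min(cw + grid, max_core_width)   # Phase-III: widen the core
        violators = phase2(rows, cw)          # Phase-II: retry with new whitespace
    return cw, violators
```

If the loop exits with violators remaining, the core has hit its maximum width and the failure status of the pseudocode above would be reported.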
Figure 7.5: Reduction in Number of Violating Rows with Core Area Increase
7.5 Experimental Results
The ECO-Placer algorithm discussed in this chapter is evaluated on benchmarks given in Section
4.5.1. Experiments were performed on Sun Blade 1000 machines (SparcV9 Processor, 750MHz,
2GB RAM) with Solaris platform.
We compare the placement results generated by our ECO-Placer with the Cadence Encounter ECO generated placement. We compare the two placements specifically in the following terms:
• Quality of the placement from a voltage drop point of view - Since the developed ECO-Placer is aimed at applying operations for voltage drop optimization, the placement generated after the ECO flow should reflect the expected voltage drop results.
• Total number of cells moved - An important characteristic of the ECO-Placer is to apply operations with the least total number of cells moved. We compare this metric with the Encounter ECO placement.
• Change in worst negative slack - The worst negative slack of the design before and after ECO placement is also compared.
Table 7.1 shows the ECO-Placer results. The upper table gives the specifics of the initial placement of the benchmarks. Columns 2 to 5 show the total number of cells, the peak voltage drop of the design, the total wirelength of the initial placement, and the worst negative slack in the design. Column 2 in the lower table gives the number of operations to be applied through the ECO flow. Here, we consider only REPLACE operations (replacing an OSULIB standard cell with an equivalent decap-padded cell from UCDCLIB). These operations are applied to the initial placement using Cadence Encounter ECO (Column 3) as well as our ECO-Placer approach (Column 4). The last five columns show the percentage change in the ECO-Placer generated placement metrics with respect to the initial placement metrics and the Encounter ECO placement metrics, respectively. As seen from the tabular results for all three benchmarks, our ECO-Placer applies all the requested operations with minimal perturbation and the least total number of cell movements. As a result, unlike the Cadence Encounter ECO placement, the voltage drop profile of an ECO-Placer generated design shows significant improvements. For example, the peak voltage drop of benchmark B14 is −174.84 mV before ECO. A total of 336
operations are applied to the B14 initial placement. Cadence Encounter ECO degrades the peak drop to −211.41 mV, whereas the ECO-Placer improves the voltage drop to −140.82 mV. This is because voltage drop is placement dependent, and the ECO-Placer applies operations such that the relative placement order of cells is maintained. Our algorithm improves the peak voltage drop by −19.46% (Column 5) over the peak drop in the initial placement and by −5% (Column 6) over the peak drop in the Encounter ECO placement. Further, the ECO-Placer requires fewer cell movements (#TCM) than Cadence ECO to apply these operations. Column 7 shows that the total wirelength of the placement generated by the ECO-Placer increases compared to the initial placement wirelength; however, the increase is very small. Moreover, the ECO-Placer is not aimed at wirelength optimization, and maintaining a relative placement order from a voltage drop point of view can result in a slight increase in wirelength. Nevertheless, the ECO-Placer provides impressive wirelength results (Column 8) over Cadence Encounter ECO, decreasing the wirelength relative to Encounter ECO in two cases. Lastly, the ECO-Placer provides better worst negative slack results than Cadence Encounter ECO. The results clearly exemplify the effectiveness of the ECO-Placer algorithm.
Further, to highlight the ability of the ECO-Placer to maintain the relative placement order and generate a placement from a voltage drop point of view, we show the voltage drop profiles for the initial design placement and the ECO-placed design in Figures 7.6, 7.7, and 7.8. From the figures, it can be clearly seen that the ECO-Placer generated placement results in a better voltage drop profile than that generated by the Cadence Encounter ECO placement.
Table 7.1: ECO-Placer Results
(Figure 7.6: Voltage drop maps for Barcode16 - peak VD on initial placement = -105.63 mV; on Encounter ECO placement = -98.49 mV; on ECO-Placer placement = -92.78 mV)
(Figure 7.7: Voltage drop maps for B14 - peak VD on initial placement = -174.84 mV; on Encounter ECO placement = -211.41 mV; on ECO-Placer placement = -140.82 mV)
(Figure 7.8: Voltage drop maps for B18 - peak VD on initial placement = -258.61 mV; on Encounter ECO placement = -248.80 mV; on ECO-Placer placement = -233.44 mV)
Chapter 8
In this chapter, the results of our voltage drop optimization framework on the benchmarks described in Section 4.5.1 are presented. Sections 4.5.1 and 4.5.2 provide the benchmark details and analysis procedure. We analyze the four different cases outlined in Section 4.5.2 for each benchmark, and summarize these cases briefly for easy reference:
• Pre-Opt: refers to voltage drop analysis on the nominal design (initial place-n-routed design). In this case, the total decap budget comes from intrinsic cell decap.
• Post-Opt(F): refers to voltage drop analysis on the nominal design optimized using only the filler-based decap approach (traditional method). In this case, the total decap budget comes from intrinsic and filler-replaced decap cells.
• Post-Opt(D): refers to voltage drop analysis on the nominal design optimized using our approach only. The decap in this case comes from intrinsic cell decap and the decap padding of UCDCLIB cells.
• Post-Opt(DF): refers to voltage drop analysis on the nominal design optimized using both our approach and the traditional approach. The decap sources in this case are intrinsic cell decaps, filler-replaced decaps, and the decap padding of UCDCLIB cells.
8.1 Barcode16 Design
The results of DvD analysis on this nominal design are shown below under the Pre-Opt case. We use the PrimeRail decap insertion flow to optimize voltage drop. After replacing all filler cells with decap cell masters from the DCFLIB library, the DvD analysis results obtained are shown under the Post-Opt(F) case in the following table.
The results of DvD analysis on this design optimized using only UCDCLIB cells are shown under the Post-Opt(D) case. We then replace all filler cells with decap masters from the DCFLIB library using the PrimeRail decap insertion procedure, and report the DvD results under Post-Opt(DF).
Figure 8.1 shows the peak voltage drop and decap budget results for the four different cases in graphical form. From the figure, it is clear that the improvement in voltage drop due to filler-replaced decap is not as significant as that obtained using decap-padded standard cells. After replacing all
Case          Total Design Decap (pF)   Design Peak VD (mV)
Post-Opt(D)   522.584                   -92.78
Post-Opt(DF)  556.196                   -90.93
Figure 8.1: Optimization Result Graphs for Barcode 16: Peak VD and Decap Budget
fillers, the voltage drop improves only to -100.49 mV, with a decap budget of 544.01 pF, whereas our approach reduces the voltage drop to -92.78 mV with a decap budget of only 522.584 pF. Further, replacing all fillers in this optimized design does not improve the drop by a large amount, even though it takes a large decap budget. Figure 8.2 shows the voltage drop maps for these four cases.
8.2 B14 Design
DvD analysis results for the Pre-Opt and Post-Opt(F) cases are shown in the following table. All 1507 fillers are replaced with decap masters for the Post-Opt(F) case.
(Figure 8.2: Voltage drop maps for the four cases for Barcode16; color scale from highest to lowest drop)
Figure 8.3: Optimization Result Graphs for B14: Peak VD and Decap Budget
DvD analysis results for the Post-Opt(D) and Post-Opt(DF) cases are given in the following table. All 1385 filler cells are replaced for the Post-Opt(DF) case.
Figure 8.3 shows the decap budget and voltage drop results for the four different cases in graphical form. From the figure, it is clear that the Post-Opt(F) case improves voltage drop only marginally despite taking 29.825 pF more budget, whereas the Post-Opt(D) case improves voltage drop by approximately 20% with 16.29 pF more decap than the Pre-Opt case. Replacing the fillers in this case too shows the inefficacy of the filler-based decap approach. Figure 8.4 shows the voltage drop maps for these four cases.
(Figure 8.4: Voltage drop maps for the four cases for B14; color scale from highest to lowest drop)
8.3 B18 Design
DvD analysis results for the Pre-Opt and Post-Opt(F) cases are shown in the following table. All 4791 fillers are replaced with decap masters for the Post-Opt(F) case.
Case          Total Design Decap (pF)   Design Peak VD (mV)
Pre-Opt       5522                      -258.61
Post-Opt(F)   5618                      -254.19
DvD analysis results for the Post-Opt(D) and Post-Opt(DF) cases are given in the following table. All 3885 filler cells are replaced for the Post-Opt(DF) case. Figure 8.5 shows the decap budget and voltage drop results for the four different cases in graphical form. From the figure, it is clear that the Post-Opt(F) case improves voltage drop by just 1.7% despite taking 96 pF more budget, whereas the Post-Opt(D) case improves voltage drop by approximately 10% with 111 pF more decap than the Pre-Opt case. Replacing all fillers in the Post-Opt(D) design yields a marginal further improvement of 0.4% in voltage drop, again showing the inefficacy of the filler-based decap approach. Figure 8.6 shows the voltage drop maps for these four cases.
Figure 8.5: Optimization Result Graphs for B18: Peak VD and Decap Budget
(Figure 8.6: Voltage drop maps for the four cases for B18; color scale from highest to lowest drop)
8.4 Summary
Table 8.1 summarizes the results discussed in the previous sections for all benchmarks in one table. For each benchmark, the peak voltage drop and total design decap repository before and after replacement are compared. As seen from the table, compared to the Post-Opt(F) approach, the Post-Opt(D) case offers much better voltage drop results with a relatively small decap requirement. This indicates a significant decap saving, and hence a proportional area saving on the die. The percentage improvement in voltage drop of the Post-Opt(D) case over the Pre-Opt case is given in the last column. Clearly, the results shown highlight the effectiveness of distributed decap placement using the decap-padded standard cell library.
Further, as indicated previously in Section 4.5, due to a technical bug in the Synopsys PrimeRail tool, we identified the replacement cell list graphically instead of by using the DCOPT algorithm. During the DCOPT iterative analysis using scripts, PrimeRail crashes with a segmentation fault after a random number of iterations. Therefore, we resorted to graphical identification of the cells to be replaced with cells from UCDCLIB and DCFLIB in order to demonstrate the approach. Nevertheless, the graphical identification of replacement cells does not undermine the proposed optimization framework. The effectiveness of DCOPT has been demonstrated in Chapter 6 by generating the violation report for DCOPT through the PrimeRail GUI (only this part is executed graphically; all other operations of DCOPT are executed by the algorithm described earlier). Also, the ECO-Placer is not affected by this process; it is an independent stand-alone tool whose effectiveness has been shown in Chapter 7. As of this writing, the PrimeRail bug has not been resolved by Synopsys. If it is resolved in the future, we will include optimization results using DCOPT as well. For the present setup, we include manual iteration results for two benchmarks. The steps for the manual analysis are given below.
The place-n-routed benchmark design is analyzed for DvD, and the initial voltage drop profile is obtained. We set the voltage drop threshold to some value higher than the peak voltage drop (negative), and identify the list of cells which are in violation (having a drop larger than the threshold) using Perl scripts. These cells are replaced with decap-padded standard cells from UCDCLIB using the ECO-Placer, which generates a modified DEF placement. We also generate a modified Verilog netlist using Perl scripts. We then ECO-route the generated placement, and the final place-and-routed design is saved. The design parasitics are also captured in a SPEF file. We perform DvD analysis on this new place-n-routed design and observe the voltage drop profile. Since this design has decap-padded cells, we should expect an improvement in the voltage drop profile. This forms iteration 1. We follow the same steps again to perform one more iteration of DvD analysis. Therefore, for each benchmark, we obtain voltage drop improvement results for two iterations. We perform the manual analysis this way to emulate the behavior of DCOPT iterations. Note, however, that with DCOPT the ECO-placement step is executed only once, whereas the manual analysis requires ECO-placement for each iteration. Hence, the results of the manual analysis do not reflect DCOPT behavior exactly; nevertheless, they help illustrate the approach given the inconsistent behavior of the PrimeRail tool. This is shown in Figures 8.7 and 8.8 for benchmarks barcode16 and B14, respectively. From the figures, it is clear that with multiple iterations, the voltage drop using our approach improves significantly.
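The victim-identification step above (done with Perl scripts in the thesis) can be sketched as follows. This is an illustrative Python sketch assuming a simplified two-column report of instance name and drop in mV; the actual PrimeRail report format differs.

```python
def violating_cells(report_lines, vth_mv):
    """Return instances whose voltage drop exceeds the (negative) threshold,
    i.e. whose drop is more negative than vth_mv.  Lines that do not look
    like 'instance drop_mV' pairs are skipped."""
    victims = []
    for line in report_lines:
        fields = line.split()
        if len(fields) != 2:
            continue                     # skip headers, comments, blanks
        name, drop = fields[0], float(fields[1])
        if drop < vth_mv:                # e.g. -141.2 < -130 -> violation
            victims.append(name)
    return victims
```

The resulting list is what would be handed to the ECO-Placer as REPLACE operations, mapping each victim to its decap-padded UCDCLIB equivalent.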
Table 8.1: Summary of Voltage Drop Optimization Results
Design     #Cells   CD(Pre)   VD(Pre)   CD(F)     VD(F)     Vth_user  #RCells  CD(D)     VD(D)     CD(DF)    VD(DF)    %†
Barcode16  7574     505.476   -105.63   544.010   -100.49   -90       549      522.584   -92.78    556.196   -90.93    -12.17
B14        11780    778.81    -174.84   808.63    -169.46   -130      336      795.095   -140.82   822.506   -139.31   -19.46
B18        71761    5522      -258.61   5618      -254.91   -200      2825     5633      -233.44   5711      -232.57   -9.73
#Cells: number of logic cells in the design; #RCells: number of logic cells (OSULIB) replaced with equivalent cells from the UCDCLIB library
CD: total decap budget of the design in pF; Peak VD: peak voltage drop of the design in mV
Vth_user: user-defined voltage drop threshold in mV; †: % change in voltage drop for the Post-Opt(D) case compared to the Pre-Opt case (a negative percentage indicates improvement)
Figure 8.7: Manual Optimization Results for Barcode16
Figure 8.8: Manual Optimization Results for B14
Chapter 9
In this thesis, we analyzed the on-chip power supply integrity problem in standard cell based ASIC designs and proposed a complete framework to optimize it. We demonstrated that decaps are an effective means to contain the on-chip voltage drop within bounds; however, proper decap placement becomes an important factor as we move down the technology nodes. Experimental results show that the traditional method of decap placement, which involves filler cell replacement, places decaps away from the violating nodes, rendering them ineffective, and requires a more-than-necessary decap budget. A distributed approach to decap placement, which places decaps near the violating nodes, is cost-effective in terms of voltage drop reduction and required decap budget. We proposed a distributed decap placement approach by providing a new standard cell library (UCDCLIB) in which logic cells are padded with decoupling capacitors. A decap optimization algorithm (DCOPT) is developed to calculate the decap budget of a design in terms of decap-padded standard cells and filler-based decaps. Lastly, we developed an efficient engineering change order placement tool (ECO-Placer) to incrementally modify the original design to accommodate these decap-padded standard cells and generate a valid placement DEF. The framework is integrated with commercial tools, and experimental results show its effectiveness.
• Since decaps always work in linear mode and directly connect the power and ground rails, the gate tunneling leakage due to the total design decap can be significant. One possible way to tackle this problem is to make use of both thin- and thick-oxide decaps during optimization. Thin-oxide decaps provide higher decap per unit area with increased leakage, whereas a thick-oxide decap takes more area and provides less decap per unit area, but significantly reduces the leakage. The optimization algorithm can be extended to calculate the decap budget in terms of both types of decaps, taking total design leakage into account.
• In this work, we applied the decap optimization process to post-placement designs, and accordingly developed an ECO-placer algorithm to incrementally modify the original placement. Although this approach requires a re-spin, the cost of the re-spin is minimized with the ECO-placer. A possible alternative to reduce the re-spin cost would be to consider decap optimization during initial design placement. For standard cell based designs, rows can be modeled as an equivalent RC network, and the effect of decap placement in terms of UCDCLIB cells can be analyzed in sync with other placement objectives, enabling the development of a voltage drop driven placement tool. Since accurate voltage drop analysis is not possible until the cells' placement is fixed, decap optimization can be applied in two steps. During the global bin placement step, the decap for individual bins can be calculated based on the coarse power grid structure, the available space in the bin, cell power consumption, and the distance of the bin from the core periphery. The calculated decap budget can then be refined further at the local placement step.
• The proposed framework has been restricted to on-chip voltage drop analysis. In order to simplify the analysis, we assume ideal power supply points at the chip IO interface. In reality, however, package interconnect parasitics also contribute significantly to the overall on-chip voltage drop. Inductance effects due to the package interconnect can be explored further.
• Reduced gate dimensions with technology scaling have a positive effect on the overall decap density, since gate capacitance varies inversely with gate oxide thickness. Hence, with technology scaling, the decap-padded standard cell approach is expected to deliver higher gains in terms of decap budget and area requirements. However, the downside of thin gate oxide is the increased susceptibility to gate breakdown due to the electrostatic discharge phenomenon. Cross-coupled decap approaches [34] have been proposed to address these issues. The developed optimization framework can be extended to nanometer technology nodes considering these phenomena.
Appendix A
Synopsys tools require design data to be available in the Milkyway database format (refer to Section 4.3). Two types of Milkyway libraries are to be created: a reference library and a design library. A Milkyway reference library contains various views for the standard cells and components, which can be instantiated in an upper-level design. A Milkyway design library is created for top-level designs, which instantiate cells and components from the reference library. This appendix describes the steps required to generate Milkyway reference and design libraries [52].
A Milkyway reference library contains physical and logical views for the standard cells in the Milkyway database format. These views are shown in Figure 4.3. The Milkyway physical view can be created using the standard cells' physical views available in either GDSII or LEF format. The Milkyway logical view can be created using the standard cells' timing and power views available in either LIB or DB format. The following describes the steps to create a reference library using the LEF and DB flow.
• Input Requirements:
1. Milkyway Technology File: provides process layer information and per-unit resistance and capacitance for the technology node. The Milkyway technology file (mw_tech.tf) for the 0.18µm process can be obtained from the OSU standard cell library.
2. Library LEF File: contains physical information about the cells. Refer to Section 5.4.2
for information about LEF file creation.
3. Library DB File: contains timing and power information for standard cells. Refer to
Section 5.4.3 for information about library timing view (DB) creation.
• Steps:
1. Start PrimeRail.
2. The PrimeRail user interface supports Scheme as well as Tcl mode for command input. The appropriate mode can be selected either by typing begin_scheme for Scheme mode and begin_tcl for Tcl mode, or by clicking the lower-left buttons marked "Scheme" and "Tcl". Type the following command in Scheme mode to open the dialog box shown in Figure A.1. There are two steps for library preparation.
> read_lib
Figure A.1: Reference Library Creation Dialog Box
(e) Click "Check Wire Track". The defaults are OK.
(f) Click "Create PDB". In the opened dialog box, enter the reference library name. Select "Import PDB". Select "From FRAM View". Click OK.
(a) Once the physical library views are prepared, click the "Prepare Logical Library" button in the read_lib dialog box. The dialog box expands. Click "Logical Input Format: LIB/DB".
(b) In the opened dialog box, click "Import Logic Model DB". Click "Select DB". In the opened dialog box, set the Min, Max, and Typical DB to the standard cell library DB file. Check "Port Directions". Click Apply.
5. At this stage, both the physical and logical views have been created. You should see the Milkyway library files under the folder with the reference library name.
6. To check whether the library cells have been imported properly into the Milkyway database, open the reference library (Library->Open Library). Open a cell (Cell->Open->Browse->All versions). You should see FRAM and LM views for all library cells.
A Milkyway design library can be created from an input design DEF file. A place-n-routed design can be saved in DEF format. The DEF file defines the physical layout of the design, including instantiated cells and macros, the design floorplan, power and signal routing, the netlist, and constraints. The steps to generate a Milkyway design library using a DEF file are described below:
• Input Requirements:
1. Design DEF File: the place-n-routed design saved in DEF format.
2. Milkyway Reference Libraries: the reference libraries created above, if the design instantiates components from a reference library.
• Steps:
1. Start PrimeRail (see above). Create the design library by typing the following in Scheme mode or by clicking Library->Create. Enter the information as discussed above.
> cmCreateLib
2. Attach the Milkyway reference libraries to the design. Since the design instantiates components from the reference libraries, we need to provide a logical reference to the appropriate Milkyway reference libraries. Type the following in Scheme mode or click "Library->Add Ref...". Specify the library paths and click OK.
> cmRefLib
5. Import the DEF. Enter the following to open the DEF import dialog box shown in Figure A.2:
> read_def
Enter the following information and click OK. The DEF file is imported at this stage.
Netlist Input Mode            Reset & Import
Physical Input Mode           Reset & Import
Row Options:
  Core Site Name              core
Via Options:
  Import Incremental LEF...   Checked
  LEF File Name               name of the library LEF file
Others                        default
6. Verify the design library. Click Cell->Open (or type geOpenCell in Scheme mode). Select the cell name (Browse->Cells) and click OK. You should see the design layout in the display window.
Appendix B
As discussed in Section 4.3, an accurate and efficient voltage drop analysis requires the underlying cells' parasitic and transient current information at each switching event. PrimeRail provides a library characterization flow to capture this information for each library cell. The characterization results are stored in the Milkyway cell reference library. Library characterization can also capture cell leakage information. The following steps are used to characterize library cells.
• Input Requirements:
1. Milkyway Reference Library: LM (Logic Model) view from reference library is re-
quired to access cell timing and power information.
2. Transistor Model File: Transistor model file for the technology node.
3. Spice SubCkt Cell File: A file containing spice netlist in .subckt format for all cells.
4. Port Specification File: the port specification file (pg.spec) specifies the power and ground ports used in the reference library, the voltage levels for each port, and the mapping information for each port. The mapping information is needed to express current sources and sinks. An example pg.spec file is shown below:
pg.spec file
definePowerPort “vdd” 1.8
definePowerPort “gnd” 0
defineDefaultPowerPort “vdd”
defineDefaultGroundPort “gnd”
defineGroundPowerMapping “gnd” {“vdd”}
• Steps:
1. Characterize the library by running:
> pgLibCharacterize
This single command accomplishes the task of running four individual commands: pgSpiceSetup, pgPreCharacterize, pgLinkPGSpec, and pgLinkCharacterize. If pgLibCharacterize results in errors, library characterization can be performed by running these individual commands in order. Running pgLibCharacterize opens the dialog box shown in Figure B.1. This dialog box can also be opened using the "Cell-Level Dynamic Analysis->Library Characterization..." menu.
4. Enter the following information in the opened dialog box and click OK.
Distributed Processing Unchecked.
> pgValidateLib
> pgListCharResult
Appendix C
Dynamic voltage drop (DvD) analysis on a design can be performed once the Milkyway design and reference libraries have been created and the libraries have been characterized properly. Refer to Sections 4.3 and 4.4 for an understanding of the PrimeRail voltage drop analysis flow. In this appendix, we provide the steps to perform DvD analysis using PrimeRail.
• Input Requirements: The input requirements for DvD analysis are discussed in Section 4.3. We briefly summarize them here for easy reference.
8. TLU+ Models for interconnect RC extraction or ITF (Interconnect technology file) for
technology node.
• Steps: The following steps describe the commands to perform cell-level DvD analysis. These steps can also be performed graphically from the "Cell-Level Dynamic Analysis" menu.
> geOpenLib
2. Purge Old Rail View, if any. (DvD analysis results are stored in RAIL View)
> poPurgeRail
3. Perform Power Analysis: PrimeRail performs design power analysis by invoking PrimeTime-PX. A script template to perform power analysis using PrimeTime-PX can be created by executing the following command:
> poCreatePTPXScriptTemplate
Modify the output script: specify the Verilog file name, design timing library (DB), switching activity (VCD or SAIF based), SPEF file, and power analysis output file. To perform vector-free power analysis, specify the switching activity for the primary inputs; PrimeTime-PX propagates the switching activity to internal nets. If no input is specified, PrimeTime-PX assumes a default input activity factor. A sample script is shown below:
ptpxscript
set power_enable_analysis true
set link_library [list * <path to library DB file>]
read_verilog [list <path to design verilog netlist>]
current_design <design name>
link
set power_default_toggle_rate 0.5
set power_default_static_probability 0.5
set power_default_toggle_rate_reference_clock [fastest | related]
read_sdc <path to design constraints (sdc)>
read_parasitics <path to design parasitic file (spef)>
set power_rail_output_file <power output file>
update_power
quit
Run the following command and enter the script file name in the opened dialog box:
> poCallPTPX
PrimeTime-PX is invoked, and the design switching, short-circuit, and leakage power are reported to the command window. PrimeRail stores the cell power information in the specified output file.
4. Load Power Supply: this step tells PrimeRail the power net voltage levels. Run the following command and specify the power supply information in TDF format:
> poLoadPowerSupply
5. Calculate Transient Current Waveforms: the cell transient current information from the library characterization data and the cell power information from the PrimeTime-PX results are combined at this step to generate the actual transient current waveform for each cell. The current waveform for each cell, recorded at a few significant points (10%, 50%, 90%, peak value), is stored in the design Milkyway library. The following command calculates the current waveforms:
> poTransientPowerAnalysis
6. Extract Power and Ground Nets: the final step required before DvD analysis is to extract the power grid parasitics. TLU+ models are required to extract the RC parasitics for the power grid; in the absence of TLU+ models, only the resistance parasitics can be extracted. Run the following command to extract the power grid resistance:
> poPGExtraction
7. Perform Cell-Level Dynamic Analysis: at this step, the RC view of the power grid is combined with the current and parasitic information of the cells, and rail analysis can be performed. The following command performs the DvD rail analysis:
> poRailAnalysis
Rail analysis requires an additional input: a tap file. The tap file specifies the power supply inputs to the design. Power supply input points are specified as coordinates in the two-dimensional layout map. The tap file can be created by graphically specifying the power supply points in the design (such as at the power supply rings). A sample tap file is shown below:
Tap File
vdd 14 83.150 143.100 # 14 and 12 refer to supply metal layers
vdd 12 3.100 76.350
8. View Results: finally, the voltage drop results can be viewed by executing the following command. A threshold voltage can be set in the opened dialog box to highlight violating nodes (nodes having a high voltage drop):
> pgMap
Bibliography
[1] International Technology Roadmap for Semiconductors. Technical report, Semiconductor In-
dustry Association, 2007. http://public.itrs.net.
[3] PrimeRail User Guide. Technical report, Synopsys Inc., San Jose, CA, 2008.
[4] Jan Rabaey, Anantha Chandrakasan, and Borivoje Nikolic. Digital Integrated Circuits: A
Design Perspective. Pearson Education, 2nd edition, 2003.
[5] G.E. Moore. Cramming more components onto integrated circuits. Proceedings of the IEEE, 86(1):82–85, Jan 1998.
[6] M.A. Elgamel and M.A. Bayoumi. Interconnect noise analysis and optimization in deep sub-
micron technology. Circuits and Systems Magazine, IEEE, 3(4):6–17, 2003.
[7] Voltagestorm Cell-Level Rail Analysis User Guide. Technical report, Cadence Design Sys-
tems, San Jose, CA, 2007.
[8] A.V. Meziba and E.G. Friedman. Power Distribution Networks in High Speed Integrated
Circuits. Kluwer Academic Publishers, 2004.
[9] Q.K. Zhu. Power Distribution Network Design for VLSI. Wiley-Interscience Publication,
2004.
[10] Joon-Seo Yim, Seong-Ok Bae, and Chong-Min Kyung. A floorplan-based planning method-
ology for power and clock distribution in ASICs [CMOS technology]. Design Automation
Conference, 1999. Proceedings. 36th, pages 766–771, 1999.
134
[11] Yu Zhong and M.D.F. Wong. Thermal-aware IR drop analysis in large power grid. Qual-
ity Electronic Design, 2008. ISQED 2008. 9th International Symposium on, pages 194–199,
March 2008.
[12] K.-H. Erhard, F.M. Johannes, and R. Dachauer. Topology optimization techniques for
power/ground networks in VLSI. Design Automation Conference, 1992. EURO-VHDL ’92,
EURO-DAC ’92. European, pages 362–367, Sep 1992.
[14] L.D. Smith. Decoupling capacitor calculations for CMOS circuits. Electrical Performance of
Electronic packaging, 1994., IEEE 3rd Topical Meeting on, pages 101–105, Nov 1994.
[15] H.H. Chen and D.D. Ling. Power supply noise analysis methodology for deep-submicron
VLSI chip design. Design Automation Conference, 1997. Proceedings of the 34th, pages 638–
643, Jun 1997.
[16] M. Ang, R. Salem, and A. Taylor. An on-chip voltage regulator using switched decoupling
capacitors. Solid-State Circuits Conference, 2000. Digest of Technical Papers. ISSCC. 2000
IEEE International, pages 438–439, 2000.
[17] A. E. Ruehii. Inductance calculations in a complex integrated circuit environment. IBM Jour-
nal of Research and Development, 1972.
[19] M. Popovich, E.G. Friedman, R.M. Secareanu, and O.L. Hartin. Efficient placement of dis-
tributed on-chip decoupling capacitors in nanoscale ics. Computer-Aided Design, 2007. IC-
CAD 2007. IEEE/ACM International Conference on, pages 811–816, Nov. 2007.
[20] Mikhail Popovich, Eby G. Friedman, Michael Sotman, Avinoam Kolodny, and Radu M. Se-
careanu. Maximum effective distance of on-chip decoupling capacitors in power distribution
grids. In GLSVLSI ’06: Proceedings of the 16th ACM Great Lakes symposium on VLSI, pages
173–179, 2006.
135
[21] Shiyou Zhao, K. Roy, and Cheng-Kok Koh. Decoupling capacitance allocation and its ap-
plication to power-supply noise-aware floorplanning. Computer-Aided Design of Integrated
Circuits and Systems, IEEE Transactions on, 21(1):81–92, Jan 2002.
[22] M.D. Pant, P. Pant, and D.S. Wills. On-chip decoupling capacitor optimization using archi-
tectural level prediction. Circuits and Systems, 2000. Proceedings of the 43rd IEEE Midwest
Symposium on, 2:772–775 vol.2, 2000.
[23] Haihua Su, S.S. Sapatnekar, and S.R. Nassif. Optimal decoupling capacitor sizing and place-
ment for standard-cell layout designs. Computer-Aided Design of Integrated Circuits and
Systems, IEEE Transactions on, 22(4):428–436, Apr 2003.
[25] Y. Cao, T. Sato, M. Orshansky, D. Sylvester, and C. Hu. New paradigm of predictive MOSFET
and interconnect modeling for early circuit simulation. Custom Integrated Circuits Confer-
ence, 2000. CICC. Proceedings of the IEEE 2000, pages 201–204, 2000.
[27] Priyanka Thakore. Development of process variation tolerant standard cells. Master’s thesis,
University of Cincinnati, 2007.
[29] P.R. Panda and N.D. Dutt. 1995 high level synthesis design repository. System Synthesis,
1995., Proceedings of the Eighth International Symposium on, pages 170–174, Sep 1995.
[31] Design Compiler User Guide. Technical report, Synopsys Inc., San Jose, CA, 2002.
[32] SOC Encounter User Guide. Technical report, Cadence Design Systems, San Jose, CA, 2007.
[33] J.E. Stine, J. Grad, I. Castellanos, J. Blank, V. Dave, M. Prakash, N. Iliev, and N. Jachimiec.
A framework for high-level hynthesis of system on chip designs. Microelectronic Systems
Education, 2005. (MSE ’05). Proceedings. 2005 IEEE International Conference on, pages
67–68, June 2005.
136
[34] Xiongfei Meng, K. Arabi, and R. Saleh. Novel decoupling capacitor designs for sub-90nm
CMOS technology. Quality Electronic Design, 2006. ISQED ’06. 7th International Sympo-
sium on, pages 6 pp.–271, March 2006.
[35] Yiran Chen, Hai Li, K. Roy, and Cheng-Kok Koh. Gated decap: gate leakage control of on-
chip decoupling capacitors in scaled technologies. Custom Integrated Circuits Conference,
2005. Proceedings of the IEEE 2005, pages 775–778, Sept. 2005.
[36] Star hspice manual. Technical report, Avant! Corporation, June 2001.
[37] J.E. Meyer. MOS models and circuit simulation. RCA Rev., 32:42–63, 1971.
[38] M.A. Cirit. The meyer model revisited: why is charge not conserved? [MOS transis-
tor]. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on,
8(10):1033–1037, Oct 1989.
[39] HSPICE simulation and analysis user guide. Technical report, Synopsys Inc., March.
[40] J.D. Djigbenou, Thien Van Nguyen, Cheng Wei Ren, and Dong Sam Ha. Development of tsmc
0.25m standard cell library. SoutheastCon, 2007. Proceedings. IEEE, pages 566–568, March
2007.
[43] Abstract Generator User Guide. Technical report, Cadence Design Systems, San Jose, CA,
2007.
[44] Signalstorm Library Characterizer User Guide. Technical report, Cadence Design Systems,
San Jose, CA, 2007.
[45] Steve Golson. The human ECO compiler. Synopsys User Group Conference (SNUG), San
Jose, CA, 2004. http://www.trilobyte.com/pdf/golson snug04.pdf.
[46] J.A. Roy and I.L. Markov. ECO-system: Embracing the change in placement. Design Au-
tomation Conference, 2007. ASP-DAC ’07. Asia and South Pacific, pages 147–152, Jan. 2007.
137
[47] Chen Li, Cheng-Kok Koh, and P.H. Madden. Floorplan management: incremental placement
for gate sizing and buffer insertion. Design Automation Conference, 2005. Proceedings of the
ASP-DAC 2005. Asia and South Pacific, 1:349–354 Vol. 1, Jan. 2005.
[48] Yi Liu, Xianlong Hong, Yici Cai, and Weimin Wu. CEP: a clock-driven eco placement al-
gorithm for standard-cell layout. ASIC, 2001. Proceedings. 4th International Conference on,
pages 118–121, 2001.
[49] Zhuoyuan Li, Weimin Wu, and Xianlong Hong. Congestion driven incremental placement
algorithm for standard cell layout. Design Automation Conference, 2003. Proceedings of the
ASP-DAC 2003. Asia and South Pacific, pages 723–728, Jan. 2003.
[50] Zhuoyuan Li, Weimin Wu, Xianlong Hong, and Jun Gu. Incremental placement algorithm for
standard-cell layout. IEEE International Symposium on Circuits and Systems, ISCAS 2002,
2:II–883–II–886 vol.2, 2002.
[51] Sadiq M. Sait and Habib Youssef. VLSI Physical Design Automation - Theory and Practice.
IEEE Press, 1995.
[52] Milkyway Data Preparation User Guide. Technical report, Synopsys Inc., San Jose, CA, 2007.
138