UNIVERSITY OF CINCINNATI
Date: January 23, 2009

I, Pritesh Johari ,

hereby submit this original work as part of the requirements for the degree of:
Master of Science
in Computer Engineering
It is entitled:
Distributed Decap-Padded Standard Cell based
On-Chip Voltage Drop Compensation Framework

Student Signature:
Pritesh Johari

This work and its defense approved by:

Committee Chair:
Dr. Ranga Vemuri
Dr. Wen Ben Jone
Dr. Carla Purdy
Dr. Hal Carter

Approval of the electronic document:

I have reviewed the Thesis/Dissertation in its final electronic format and certify that it is an
accurate copy of the document reviewed and approved by the committee.

Committee Chair signature: Dr. Ranga Vemuri


Distributed Decap-Padded Standard Cell based On-Chip
Voltage Drop Compensation Framework

A thesis submitted to the

Division of Research and Advanced Studies


of the University of Cincinnati

in partial fulfillment of the


requirements for the degree of

MASTER OF SCIENCE

in the Department of
Electrical and Computer Engineering
of the College of Engineering

January 23, 2009

by

Pritesh Johari

Bachelor of Engineering (Electronics)


Shri Vaishnav Institute of Technology and Science, Indore,
Rajiv Gandhi Technical University, Bhopal
July, 2001

Thesis Advisor and Committee Chair: Dr. Ranga Vemuri


Abstract

Technology-induced voltage scaling, coupled with faster switching, has made circuit behavior
very sensitive to power supply variations. The effect is classified as the power and ground bounce
problem. Power and ground bounce can inject random glitches that propagate as malfunctioning
logic.

On-chip decoupling capacitors (decaps) are used to reduce power supply noise. Traditionally,
lumped decaps are placed in the available white space during the chip-finishing stages. However,
insufficient budgeting at an early stage and the lack of placement estimation often leave the decaps
far from the switching nodes. Experimental results show that decaps are more effective at
cancelling power supply noise when they are placed close to the violating switching nodes. This
work develops an alternative framework that incorporates the decaps in a design close to the
switching nodes, thus making them more effective.

The proposed voltage drop optimization framework comprises three components. First, a
special standard cell library with minimum decap padding is developed in order to place decaps
closest to the victim nodes. Second, we propose an optimization algorithm to incorporate these
standard cells, together with a minimal amount of lumped decap, in the physical synthesis stages.
Lastly, we develop an engineering change order placer to generate a valid decap-optimized placement.
The developed framework is integrated with commercial backend design tools (Cadence and
Synopsys). The effectiveness of our work has been demonstrated on standard benchmark circuits.

To,
My Dearest Parents

Acknowledgments

I would like to extend my sincere thanks to Dr. Ranga Vemuri, my research advisor. He is not
only a good teacher and a great researcher but a very down-to-earth and a humble person. I admire
his qualities, and would like to imbibe them. I thank him for his guidance and regular discussions.
I am also grateful to him for giving me the opportunity to teach a few lectures of the Physical VLSI and
VLSI Design Automation courses. I would like to thank Dr. Wen-Ben Jone, Dr. Carla Purdy, and
Dr. Harold Carter for being on my thesis committee. It is a great honor for me to present my work
to these distinguished professors. I thank them for their valuable time. The distinctive quality of
Dr. Wen-Ben Jone to blend the academic discourse with humor is unforgettable. I would also like
to thank Mr. Rob Montjoy, ECE Department System Administrator, for resolving numerous tool
related issues.

I would like to express my deepest gratitude to Shubhankar Basu, a DDELite. The base for this
research work, the decap-padded standard cell approach, was provided by him. I am deeply indebted
to him for his long discussions about the research. He has been a great mentor to me. In a short
span of time, he taught me a lot about VLSI CAD. I am really thankful to him. I would also like
to thank my fellow DDELites, Surya, Angan, Annie, Almitra, Mike, Hao, Romana, Ajaay, Balaji,
Suman, Manoj, Bala, and Vijay for making the life fun and memorable. Their presence made the
DDEL more than just a laboratory. Regular technical discussions with Surya, Angan, Hao, Almitra
have enriched my knowledge. I thank Romana for helping me with the Virtuoso tool during the
beginning phase of my research. Angan and Annie, full of energy and sense of humor, have added
a lot of fun to the research life. I would like to thank them all for providing such a nice time.

My parents, my brother and family members have always been supportive of me. I owe them for
their trust and belief in me, and for being supportive of me during many ups and downs during my
stay at UC. Their constant support and encouragement has given me enough strength to complete
my studies at UC.

I would like to thank my friends, Kiran, Srikara, Surya, Sidhhartha, Kalyan, Ravish, Kartik,
Sagar, Srikanth, Nishant, Skanda, Dipti, Shubham, Aparna for creating a very healthy and scholarly
atmosphere around me. The cherishable moments, I spent with them, are really worth remembering.

This apart, there were a lot of inconspicuous contributions, so necessary at times to lead us out
of stalemate. I gratefully acknowledge all of them which could not be mentioned here explicitly.

Contents

List of Figures
List of Tables

1 Introduction
1.1 Power Supply Noise
1.1.1 Causes
1.1.2 Adverse Effects
1.2 Approaches to Tackle Power Supply Noise
1.2.1 Power Distribution Network Design
1.2.2 Logic Level Optimizations
1.2.3 Other Approaches
1.3 Research Overview
1.4 Research Contributions
1.5 Assumptions
1.6 Thesis Outline

2 On-Chip Power Supply Noise Analysis
2.1 Typical Design Flow
2.2 Voltage Drop Analysis Flow
2.2.1 Power Grid Modeling
2.2.2 Logic Circuit Modeling
2.2.3 Model Analysis
2.3 Static Voltage Drop Analysis
2.4 Dynamic Voltage Drop Analysis
2.5 On-Chip Decoupling Capacitance (Decap)
2.5.1 Sources of On-Chip Decap
2.6 Research in Voltage Drop Optimization

3 Motivation: Initial Experiments
3.1 Circuit Modeling
3.2 Experimental setup
3.3 Results
3.4 Analysis

4 Proposed Voltage Drop Optimization Framework
4.1 Traditional Approach
4.2 Proposed Approach
4.3 Data Preparation for Voltage Drop Analysis
4.3.1 Milkyway Database
4.3.2 Cell Parasitic Extraction
4.3.3 Cell Current Characterization
4.3.4 Additional Design Inputs
4.4 PrimeRail: Voltage Drop Analysis Flow
4.4.1 Power Analysis and Current Waveform Generation
4.4.2 PG Extraction
4.4.3 Rail Analysis
4.5 Experimental Setup
4.5.1 Benchmark Description
4.5.2 Benchmark Analysis

5 UCDCLIB: Decap Padded Standard Cell Library Design
5.1 Nominal Standard Cell Library
5.2 MOSFET Gate Capacitance
5.3 Decap Library
5.3.1 Decap Measurement using HSPICE
5.3.2 DCFLIB
5.3.3 UCDCLIB
5.4 Cell Characterization and View Generation
5.4.1 Symbol Library
5.4.2 Physical View
5.4.3 Timing and Netlist View
5.4.4 Parasitic View

6 DCOPT: Decap Optimization Algorithm
6.1 DCOPT Block Diagram
6.2 DCOPT Input Requirements
6.3 PrimeRail Analysis
6.4 DCOPT-Client
6.5 DCOPT-Server
6.6 Experimental Results

7 ECO-Placer: Engineering Change Order Placement Algorithm
7.1 Related Work
7.2 Proposed Approach: ECO-Placer
7.3 ECO-Placer Input Requirements
7.4 Algorithm
7.4.1 Phase-I: Fixing Operations
7.4.2 Phase-II: Candidate Cell Selection and Movement
7.4.3 Phase-III: Core Area Adjustment
7.5 Experimental Results

8 Overall Optimization Results
8.1 Barcode16 Design
8.2 B14 Design
8.3 B18 Design
8.4 Summary

9 Conclusion and Future Directions

A Milkyway Library Creation using LEF and DEF
A.1 Milkyway Reference Library Creation
A.2 Milkyway Design Library Creation

B Library Characterization for Voltage Drop Analysis

C Dynamic Voltage Drop Analysis using PrimeRail

Bibliography

List of Figures

1.1 Trends in Circuit Parameters (Source [1])
1.2 A Typical Electronic System Consisting of Various ICs
1.3 Circuit Model Highlighting IR Drop Effect
1.4 Supply Voltage Band for Reliable Operation at 180nm Node
1.5 Simultaneous Switching Noise due to Output Drivers (Adapted from [2])

2.1 Digital Design Flow
2.2 RLC Model for On-Chip Power Grid Network
2.3 Transistor Current Model
2.4 Complete Design Model for Voltage Drop Analysis
2.5 Transient and Average Current
2.6 Circuit Model Illustrating Effectiveness of Decoupling Capacitor

3.1 Circuit Model for Evaluating Effectiveness of Distributed Decap Placement
3.2 Logic Circuit Representing Basic Block Shown in Figure 3.1
3.3 Input Waveform for Distributed Decap Experiment
3.4 Circuit Model with Two Basic Blocks for Distributed Decap Experiment
3.5 Distributed Decap Placement Result for Model Shown in Figure 3.1
3.6 Distributed Decap Placement Result for Model Shown in Figure 3.4

4.1 Proposed Voltage Drop Optimization Approach
4.2 PrimeRail Decap Optimization (Source [3])
4.3 Milkyway Library Views and their Description
4.4 PrimeRail Voltage Drop Analysis Flow

5.1 Capacitance Sources in MOSFET
5.2 MOSFET Channel Capacitance
5.3 MOSFET Gate Capacitance Variation with respect to VGS
5.4 NMOS Decap and its Equivalent Model
5.5 Decap Standard N+P Cell
5.6 NMOS Decap Measurement using HSPICE (CG vs. VGS plot)
5.7 PMOS Decap Measurement using HSPICE (CG vs. VGS plot)
5.8 Decap-Padded Inverter Cell Layout
5.9 Method to Generate Various Library Views
5.10 Cell Layout and its Abstract View

6.1 DCOPT Block Diagram
6.2 Illustration of Dynamic Threshold Concept
6.3 Voltage Drop Maps: DCOPT Results for Barcode16
6.4 Voltage Drop Map: DCOPT Results for B14

7.1 ECO-Placer Block Diagram
7.2 Example Operation Row with INSERT Operation
7.3 Solution Space by our Approach
7.4 Example Row Distribution after Phase-I
7.5 Reduction in Number of Violating Rows with Core Area Increase
7.6 ECO-Placer Results for Barcode16
7.7 ECO-Placer Results for B14
7.8 ECO-Placer Results for B18

8.1 Optimization Result Graphs for Barcode16: Peak VD and Decap Budget
8.2 Optimization Result for Barcode16: Voltage Drop Maps
8.3 Optimization Result Graphs for B14: Peak VD and Decap Budget
8.4 Optimization Result for B14: Voltage Drop Maps
8.5 Optimization Result Graphs for B18: Peak VD and Decap Budget
8.6 Optimization Result for B18: Voltage Drop Maps
8.7 Manual Optimization Results for Barcode16
8.8 Manual Optimization Results for B14

A.1 Reference Library Creation Dialog Box
A.2 Design Library Creation Dialog Box

B.1 Library Characterization for DvD Dialog Box

List of Tables

3.1 Global Wire Parameters at 0.18µm Node
3.2 Global Wire Parasitic Values at 0.18µm Node
3.3 Decap Cases Analyzed for Distributed Decap Experiment
3.4 Effect of Decap Addition on Block Area
3.5 Analysis Results for Circuit Model Shown in Figure 3.1
3.6 Analysis Results for Circuit Model Shown in Figure 3.4

4.1 Design Cases for Voltage Drop Analysis
4.2 Benchmarks Used for Analysis

5.1 Channel Capacitance of MOSFET for Different Operating Regions (Source [4])
5.2 NMOS Decap Measurement Results using HSPICE
5.3 PMOS Decap Measurement Results using HSPICE
5.4 DCFLIB Cells
5.5 OSULIB and UCDCLIB Cells
5.6 Library Views and their Description

6.1 DCOPT Results for Benchmark Barcode16
6.2 DCOPT Results for Benchmark B14

7.1 ECO-Placer Results

8.1 Summary of Voltage Drop Optimization Results

Chapter 1

Introduction

The semiconductor industry has seen unprecedented growth since the advent of the transistor
era. The transistor and the integrated circuits built from it soon replaced bulky vacuum-tube
based devices in the 1960s, leading to highly compact, more reliable, and affordable
electronic devices. This propelled an ever-increasing demand for electronic products. Advances
in research ensured that this trend continued. Following Moore's law, which
states that the transistor count doubles every 1.5 to 2 years [5], the semiconductor industry has come
a long way from about 1000 transistors in the early 1970s to hundreds of millions of transistors in
present-day integrated circuits. Today, the emphasis is on the "More than Moore" approach, which
allows for functional diversification by integrating devices belonging to different domains (e.g.,
analog, RF communication, sensors) onto a single chip (known as a System-on-Chip or SoC) [1].
This will enable increasingly complex and cost-effective devices for at least a few more
decades.

The remarkable increase in the complexity and performance of integrated circuits over the years
has been made possible by advances in circuit and process technologies. The demand for
power- and area-efficient devices paved the way for CMOS technology. At the same time,
advances in process technology led to a steady decrease in the feature size of CMOS
devices. Technology scaling from sub-micron to deep-submicron (less than 0.5 µm) has been
beneficial for CMOS devices in terms of area, power, and speed. The transition into
the nanometer regime (less than 100 nm) is expected to further improve device characteristics.

The motivating factors for technology advancement are clearly performance and
cost-effectiveness. Nevertheless, nothing comes without a price. With the continual shrinking of
device sizes and increasing on-chip logic complexity, various deep-submicron and nanometer
parasitic effects are becoming prominent, raising concerns about the reliability of integrated circuits [6].
Effects such as increased heat dissipation, process variation, crosstalk, and power supply noise
pose serious problems. Unless adequately addressed at various steps of the design process,
these effects can lead to anything from intermittent malfunction to permanent yield loss of the
circuit.

In this thesis, we propose a new framework to address one of these deep-submicron effects,
namely power supply integrity. For reliable operation of any electronic circuit, the power supply
must be stable over the range of its operation. Figure 1.1 shows the trends in IC supply voltage,
cost-performance, and frequency, based on data from ITRS 2007 [1]. Cost-performance refers to the
highest chip performance with economical power consumption management. This technology-induced
voltage scaling, coupled with faster switching frequency and higher power consumption, makes circuit
behavior very sensitive to power supply variations. The effect is classified as the power and ground
bounce problem. Power and ground bounce can inject random glitches that propagate as
malfunctioning logic.

Figure 1.1: Trends in Circuit Parameters (Source [1])

The scope of work presented in this thesis lies in providing a new framework to stabilize the
power supply to the on-chip logic circuit by optimizing the on-chip decoupling capacitance. To
develop an understanding of the power supply noise effects and the need to control it, subsequent
sections start with a brief overview of the power supply noise, the factors causing the power supply
noise, and its effects on the circuit performance. This is followed by the research overview, and the
contributions. Finally, the overall structure of the thesis is presented in Section 1.6.

1.1 Power Supply Noise

Reliable operation of a digital logic circuit requires a stable DC power supply. Such a supply
is designed to deliver the required amount of charge to the logic circuit instantly during its
switching period, so that timing specifications are met while the supply voltage level is
maintained at various points in the circuit. Any deviation of the supply voltage from the ideal DC
level affects circuit performance. This deviation is termed power supply noise.

A typical electronic system consists of a number of integrated circuits housed in different packages
and interconnected on a single- or multi-layer printed circuit board, as shown in Figure 1.2. The power
supply to a logic circuit on a specific chip passes through a hierarchy of levels. Originating at
the on-board voltage regulator, the supply voltage reaches the on-chip circuit through the on-board
interconnect, the package interconnect (package-to-die interface), and finally the on-chip
interconnection network. With an ideal interconnect network and a perfect on-board voltage regulator,
the logic circuitry would get the required amount of charge instantly during its switching period, and
the voltage level at the logic supply node would be identical to that at the output of the voltage
regulator. In reality, neither the voltage regulator nor the power distribution interconnect
is ideal. The on-board voltage regulator has a finite output impedance, and the long supply
and ground paths present significant parasitic resistance, capacitance, and inductance.
Current flowing through a resistive network causes a voltage drop as per Ohm's law, which
lowers the supply voltage at the logic supply nodes. Figure 1.3 shows a circuit model illustrating
this effect [7]. The current $I$ drawn by gate G3 causes a voltage drop of $I \cdot (R_1 + R_2 + R_3)$
at node $V_{D3}$. At the same time, node $V_{D2}$ will be at $V_{DD} - I \cdot (R_1 + R_2)$, which
affects the performance of gate G2. The voltage drop due to the current drawn by a logic gate therefore
affects not only its own performance but also that of the neighboring logic gates. The drop in supply
voltage due to parasitic resistance is commonly referred to as IR drop, or simply voltage drop [8].
The ground network is usually similar to the supply network and suffers from similar noise when the
current takes its return path. Because of similar parasitics in the ground network, the ground
potential rises. This effect is referred to as ground bounce. With interconnect inductance becoming
increasingly important, the transient noise due to $L \cdot di/dt$ also contributes to the overall
voltage drop. The noise at supply nodes due to inductance is termed $\Delta i$ noise.

[Figure labels: voltage regulator, on-board decoupling capacitor, on-board interconnect, package
interconnect (package-to-die interface), on-chip interconnect]

Figure 1.2: A Typical Electronic System Consisting of Various ICs

[Figure labels: supply VDD, rail resistances R1 to R3, supply nodes VD1 to VD3, gates G1 to G3,
ground nodes VG1 to VG3]

Figure 1.3: Circuit Model Highlighting IR Drop Effect
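
The IR drop arithmetic described for Figure 1.3 can be sketched in a few lines of Python. This is a minimal illustration that assumes a purely resistive series rail; the function name `chain_node_voltages` and all numerical values are hypothetical and are not taken from this work.

```python
def chain_node_voltages(vdd, resistances, gate_currents):
    """Return the supply-node voltages along a series resistive rail.

    resistances[k] is the rail segment feeding node k from the previous node
    (or from VDD for k = 0); gate_currents[k] is the current drawn at node k.
    """
    if len(resistances) != len(gate_currents):
        raise ValueError("need exactly one rail segment per gate")
    voltages = []
    v = vdd
    for k, r in enumerate(resistances):
        # Segment k carries the current of gate k and of every gate downstream.
        segment_current = sum(gate_currents[k:])
        v -= segment_current * r
        voltages.append(v)
    return voltages


# Only G3 draws current I, as in the example in the text.
VDD = 1.8              # volts (illustrative value)
I = 0.01               # amperes (illustrative value)
R = [1.0, 2.0, 3.0]    # ohms for R1, R2, R3 (illustrative values)

vd1, vd2, vd3 = chain_node_voltages(VDD, R, [0.0, 0.0, I])
print(f"VD1 = {vd1:.3f} V, VD2 = {vd2:.3f} V, VD3 = {vd3:.3f} V")
```

Even though only G3 switches in this sketch, the nodes feeding G1 and G2 also sit below VDD, which is the neighboring-gate effect noted above.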

As Section 1.1.1 discusses, several factors make power supply noise an
area of concern. On the one hand, a drop in the voltage level at a logic supply node adversely affects
performance. On the other hand, any increase above the ideal voltage level raises reliability
concerns. Hence, reliable operation of a logic circuit requires strict control of the absolute deviation
of the voltage level from its ideal value. Typically, the supply voltage is allowed to vary
within a 10% voltage band around its nominal value ($VDD_{nominal}$), as shown in Figure 1.4. However,
the reduction in noise margin that comes with supply voltage scaling requires a tighter band of 5% for
supply voltage variations. As long as the supply voltage variation is confined to this band, the circuit
is said to operate reliably, meeting its functional and logical specifications.

[Figure: node voltage (V) versus time (ns), showing the 10% voltage band for correct circuit operation
around VDDNominal = 1.8 V, with an upper limit of 1.89 V and VDDLower = 1.71 V; voltage below
VDDLower affects circuit performance]

Figure 1.4: Supply Voltage Band for Reliable Operation at 180nm Node
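
As a small illustration of the band check described above, the following Python sketch flags samples of a supply-node waveform that leave the allowed band. It assumes the band in Figure 1.4 is symmetric about the nominal value (1.71 V to 1.89 V around 1.8 V); the function name and the sample waveform are hypothetical.

```python
def supply_band_violations(samples, vdd_nominal, tolerance=0.05):
    """Return (lower, upper, violations) for a list of node-voltage samples.

    tolerance is the allowed fractional deviation on each side of nominal,
    so 0.05 gives a band that is 10% of nominal wide (assumed to be the
    reading of Figure 1.4).
    """
    lower = vdd_nominal * (1.0 - tolerance)
    upper = vdd_nominal * (1.0 + tolerance)
    violations = [(i, v) for i, v in enumerate(samples)
                  if v < lower or v > upper]
    return lower, upper, violations


# Hypothetical sampled supply-node voltages in volts.
waveform = [1.80, 1.76, 1.69, 1.85, 1.91]
lower, upper, violations = supply_band_violations(waveform, vdd_nominal=1.8)

print(f"band: {lower:.2f} V to {upper:.2f} V")
for index, voltage in violations:
    print(f"sample {index}: {voltage:.2f} V is outside the band")
```

Passing a smaller `tolerance` models the tighter band required at scaled supply voltages.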

A major contribution to on-chip power supply noise comes from output drivers. Output drivers
are typically large and are designed to meet the high current requirements of off-chip loads.
The power supply noise due to simultaneous switching of multiple output I/O drivers is commonly
referred to as simultaneous switching noise, or SSN (Figure 1.5). Although SSN can also be attributed
to parallel switching of other circuit components, the term is most often used in the
context of output driver switching [2].
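
A common first-order way to reason about SSN is to treat N identical drivers as sharing a single supply inductance L, so that the induced noise is roughly N · L · di/dt. The Python sketch below applies this estimate. The shared-inductance assumption, the function name, and the numerical values are illustrative only and are not results from this work.

```python
def ssn_estimate(n_drivers, inductance_h, di_dt):
    """First-order SSN estimate in volts: N * L * di/dt.

    Assumes n_drivers identical drivers switch at the same instant and share
    one supply inductance (inductance_h, in henries); di_dt is the current
    slew rate of a single driver in amperes per second.
    """
    if n_drivers < 0 or inductance_h < 0:
        raise ValueError("driver count and inductance must be non-negative")
    return n_drivers * inductance_h * di_dt


L_SHARED = 1e-9   # 1 nH shared supply inductance (hypothetical value)
DI_DT = 5e6       # 5 mA/ns per driver, expressed in A/s (hypothetical value)

for n in (1, 8, 32):
    print(f"{n:2d} drivers: {ssn_estimate(n, L_SHARED, DI_DT) * 1e3:.1f} mV")

noise_32 = ssn_estimate(32, L_SHARED, DI_DT)
```

The estimate grows linearly with the number of drivers that switch together, which is why wide output buses are a typical source of SSN.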

1.1.1 Causes

As the previous section makes apparent, interconnect parasitics play a big role in intensifying power
supply noise problems. They are, however, not the only cause of supply noise. Various other design
parameters affect the power supply noise. Many of them are difficult to control and result from
the drive toward increasingly complex circuits on a single die. This subsection briefly summarizes
the causes of the voltage drop problems [9].

[Figure labels: VDD, package, chip, on-chip supply node V1, output drivers D0 to D31 with inputs
I0 to I31 and loads CL, on-chip ground node G1, GND]

Figure 1.5: Simultaneous Switching Noise due to Output Drivers (Adapted from [2])

In line with Moore's law, the number of transistors per unit area on an integrated circuit
has increased consistently with each technology generation. Although higher transistor density
on an IC allows for increasingly complex and higher-performance circuits, proper fabrication of
the circuit requires more metal layers for signal and power routing. A higher number of
metal wires translates into longer power supply connections to the logic circuit, which increases
the associated parasitics. Moreover, narrow interconnect wires at smaller process nodes lead to higher
resistance. An increased metal layer count also results in more via connections between metal
layers. Via resistance does not scale well with technology scaling and is, in fact,
increasing.

At the same time, increased transistor density implies higher overall current, and hence higher
power consumption, for the design. The "More than Moore" approach worsens the situation by integrating
diverse functions onto a single die. The power consumption of logic increases further with the
higher switching activity that comes with higher clock frequencies. Another important factor
affecting the power supply noise is the reduction in supply voltage at each new technology
node, which is needed to avoid device breakdown due to excessive electric field.
The reduction in supply voltage lowers the available
noise margin and makes the circuit more sensitive to power supply variations.

Existing CAD tools are not advanced enough to reliably predict the impact of power supply noise
at the early stages of the backend flow. In order to meet timing specifications, CAD tools try to
keep timing-critical components close together, which increases the switching activity in a region. The
increased regional voltage drop due to higher power dissipation leads to hotspots (regions with
higher temperature). A major contributor to the power supply noise is the clock network. The
synchronous operation of clocked registers leads to excessive voltage drop.

Advanced low-power techniques also contribute to voltage drop problems. Techniques such as
power gating lead to an uneven distribution of currents. When a power-gated circuit block wakes
up, a high rush current produces hotspots and hence contributes to the voltage drop.
Clock gating is another example.

The selection of a proper package for the die also affects the power supply noise. Until the deep
sub-micron node, the package-pin-to-die connection was the main contributor to interconnect
inductance. However, on-chip inductance is also becoming increasingly important due to
longer current loops for complex circuits at smaller technology nodes. Lastly, the number and location
of supply pads on a chip also affect the IR drop.

1.1.2 Adverse Effects

Power supply noise affects the circuit performance in many ways. Some of the important effects
are summarized in this subsection [8].

Reduction in the supply voltage of a logic gate increases its propagation delay. This happens
due to reduction in the gate to source voltage for a PMOS transistor, which results in decrease in
the available drain current to charge the output load. The similar phenomenon at NMOS side slows
down the discharge rate due to increase in the ground potential. Further, the voltage drop due to
the current taken by the logic gate can affect the nearby logic circuits as well, if the neighboring
logic switches during the same period. In fact, 5% reduction in the supply voltage can make the
gate 15% slower [10]. The increase in the logic delay can lead to intermittent timing violations, and
hence can restrict the operating frequency of the circuit. Typically, in a synchronous logic, voltage
drop in the combinational path leads to setup time violation, because signal propagation is delayed
due to reduction in supply voltage. Similarly, voltage drop in the clock network delays the clock
arrival, and hence leads to the hold time violations. Therefore, in order to ensure the timing goals
of a design, variation in the supply voltage level must be minimized.
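
The supply-to-delay sensitivity described above can be illustrated with a first-order model. The sketch below uses the alpha-power law, in which gate delay scales as V / (V - Vth)^alpha. This model, the function name, and the threshold and alpha values are assumptions chosen for illustration; they are not taken from [10], so the computed slowdown need not match the 15% figure quoted above.

```python
def relative_delay(vdd, vdd_nominal=1.0, vth=0.3, alpha=1.3):
    """Gate delay at vdd relative to the delay at vdd_nominal.

    Assumes the alpha-power law: delay ~ V / (V - vth) ** alpha.
    vth and alpha are hypothetical, technology-dependent values.
    """
    if vdd <= vth or vdd_nominal <= vth:
        raise ValueError("supply voltage must exceed the threshold voltage")

    def delay(v):
        return v / (v - vth) ** alpha

    return delay(vdd) / delay(vdd_nominal)


# Slowdown caused by a 5% droop on a 1.0 V nominal supply.
slowdown = relative_delay(0.95) - 1.0
print(f"5% supply droop -> {slowdown:.1%} slower")
```

The sensitivity grows as the supply approaches the threshold voltage, which is one reason voltage scaling makes the same percentage droop more harmful.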

The power supply noise degrades the available noise margin for the logic circuit. The output
voltage of a typical logic gate is measured with respect to the supply voltage or the ground potential.
Any change in the supply or ground potential would shift the output voltage, which affects the
subsequent logic gates. If the voltage level for the connecting logic gate is also reduced due to
the power supply noise, the state of the gate would be unpredictable due to inconsistency in the voltage
references. The situation is similar to multi-vdd logic design style, where proper operation of circuit
requires level shifters. Due to spatial variation in the supply voltage of connected logic gates, the
available noise margin of the receiving gate degrades.

Power supply variations can introduce jitter and/or skew in the clock signal. The on-chip clock is
typically generated using a PLL. Any supply variation in the PLL components would lead to a change
in the phase of the clock signal, thereby introducing jitter. The on-chip clock network is designed
to balance the skew between clock endpoints. Supply voltage variation in one of the paths of the
clock network can delay the clock propagation as compared to other clock paths, leading to non-zero
skew for the clock signal. The resulting clock skew can lead to setup or hold time violations
depending on clock direction with respect to the data flow.

Voltage overshoot at the power and ground nodes can affect the reliability of the transistor. Re-
duction of the gate oxide thickness of a transistor with technology scaling makes it more susceptible
to this damage.

Another effect attributed to the voltage drop is a phenomenon of “Joule Heating” [11]. With
interconnect scaling, the resistance per unit length of interconnect is increasing. Higher current
density through such an interconnect produces voltage drop. This voltage drop in the interconnect
leads to the self-heating (known as Joule Heating). Increased temperature causes proportional in-
crease in the resistivity of the metal interconnect, which further increases the IR drop. Hence, the
aggressive technology scaling requires a holistic consideration of power, temperature and voltage
drop effects.
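
The feedback between self-heating and IR drop can be sketched as a small fixed-point iteration. The linear temperature coefficient of resistance, the lumped thermal resistance, and all numeric values below are assumptions made for illustration only; they are not data from [11].

```python
def joule_heating_ir_drop(current, r0, tcr, theta, t_ambient=25.0, iterations=50):
    """Iterate wire temperature and resistance to a self-consistent IR drop.

    current   : wire current in A
    r0        : wire resistance in ohm at t_ambient
    tcr       : temperature coefficient of resistance in 1/degC (assumed linear)
    theta     : lumped thermal resistance in degC/W (hypothetical model)
    Returns (temperature, resistance, ir_drop).
    """
    temperature = t_ambient
    for _ in range(iterations):
        resistance = r0 * (1.0 + tcr * (temperature - t_ambient))
        power = current ** 2 * resistance          # Joule heating in W
        temperature = t_ambient + theta * power    # resulting temperature rise
    resistance = r0 * (1.0 + tcr * (temperature - t_ambient))
    return temperature, resistance, current * resistance


temp, res, drop = joule_heating_ir_drop(current=0.01, r0=10.0, tcr=0.004, theta=5000.0)
print(f"T = {temp:.2f} degC, R = {res:.3f} ohm, IR drop = {drop * 1e3:.2f} mV")
```

The iteration converges only while the loop gain (current squared times r0 times tcr times theta) stays below one; a larger value corresponds to thermal runaway in this simple model.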

1.2 Approaches to Tackle Power Supply Noise

A power supply network in a typical electronic system spans multiple levels of hierarchy. This
includes the on-board supply network, package connections, and the on-chip supply network. With
increasing complexity of the integrated circuits, as well as the board level designs, a system level
approach to tackle the power supply noise is necessary. Although apparent, optimizing noise at all
levels of system hierarchy makes the task extremely daunting. Fortunately, it is possible to optimize
the noise at different levels independently by making certain boundary assumptions. In this section,
we concentrate on the approaches to tackle the on-chip power supply noise.

On-chip power supply noise not only depends on the design of the on-chip power distribution
network, but also depends on the underlying logic circuitry as described in Section 1.1.1. Therefore,
a proper design of power distribution network combined with logic level optimizations can lead to
significant improvement in the power supply noise profile for a chip. The following subsections
summarize some of the approaches used to tackle the power supply noise:

1.2.1 Power Distribution Network Design

On-chip power distribution network is designed to provide the required current and voltage to
the logic circuitry for its proper functionality. Multiple metal layers are used in a high performance
integrated circuit to form the signal and power routing network. Usually, higher metal layers are
used to form the global power grid owing to their lower resistance, and the power connections are
brought to transistor circuits at the lowest level using lower metal layers, connected using vias. Power
routing for the logic cells usually conflicts with the signal routing. The more metal
resources are used at an early stage for the power routing, the fewer remain for the signal routing.
Conversely, not using enough resources for the power network leads to increased current density
on the power conductors, leading to the problems of electromigration and voltage drop. Moreover,
typically, the power distribution grid is designed during the early stage of the backend flow, when
the placement information for the logic cells is not known. Prior tape-out experience and gate-level
power consumption information is utilized to design the power network. For these reasons, the
power distribution network is conservatively designed. Adding extra resources for power routing at
the later stages could lead to complete redesign of the network, and hence is preferably avoided. This
leaves the designers with less flexibility in tackling voltage drop problems just by optimizing the
power distribution network. Nevertheless, the following approaches can still have significant impact in
controlling the power supply noise:

• An optimal topology of power distribution network can be selected which results in reduc-
tion in total power routing area and improves overall chip voltage drop [12]. Use of multiple
supply and ground stripes can improve the current profile of the chip. This also helps in reliev-
ing the chip from the electromigration problems. Further, multiple supply pads, sufficiently
spaced around the die periphery, can be provided to lower the overall power distribution net-
work impedance.

• Depending on the design and available metal resources, VDD/GND power planes can be used
at higher metal layers. Power planes significantly reduce the resistance parasitic, and provide
shielding effect for the noise. However, use of power planes complicates signal routing due to
reduction in metal resources. In many cases, VDD and GND rings around the core periphery
would suffice the purpose.

• Sizing of power/ground network is another way to relieve power grid noise. Wire widths of
power and ground conductors are optimized such that total weighted area is reduced while
satisfying the electromigration and voltage drop constraints [13].

• Use of multiple vias also helps in reducing the voltage drop issues. Multiple vias reduce the
via resistance and provide alternate paths for the current. The reduction in the
current density from one particular path lowers the effective IR drop.
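
The effect of the last item can be quantified with a simple parallel-resistance calculation. The sketch below assumes identical vias that share the current equally; the function name and the numeric values are hypothetical.

```python
def via_array_drop(current, r_via, n_vias):
    """Effective resistance, IR drop, and per-via current of a via array.

    Assumes n_vias identical vias in parallel, each of resistance r_via (ohm),
    carrying an equal share of the total current (A).
    """
    if n_vias < 1:
        raise ValueError("at least one via is required")
    r_effective = r_via / n_vias
    return r_effective, current * r_effective, current / n_vias


for n in (1, 2, 4):
    r_eff, drop, i_per_via = via_array_drop(current=0.02, r_via=4.0, n_vias=n)
    print(f"{n} via(s): R = {r_eff:.2f} ohm, drop = {drop * 1e3:.1f} mV, "
          f"current per via = {i_per_via * 1e3:.1f} mA")
```

The per-via current falls along with the IR drop, which is why via arrays also ease electromigration limits.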

1.2.2 Logic Level Optimizations

Logic level optimization techniques to control the power supply noise mainly work either by
reducing the total power consumption or by redistributing the overall current requirements in a
region. Some of the important techniques are discussed below:

• Total power consumption of a logic circuit can be reduced by various methods. The issue
of power supply noise becomes important only if the logic circuit is switching. Hence,
controlling the transient power is important. However, excessive leakage current from the
logic under shutdown can affect the nearby active circuitry, if their power rails are shared.
Many low power techniques such as power gating, clock gating can be used to lower the power
consumption. Clock network is one of the major contributors to the power supply noise owing
to clock signal’s unity activity factor. Hence clock gating to inactive logic can significantly
reduce the overall power consumption. However, as described in previous section, use of
low power techniques creates regions of hotspots, which might affect the power supply noise
negatively. Hence, a careful analysis must be done before adopting a specific technique.

• Proper buffer sizing is also important. Many times, buffers are conservatively sized up to
meet the timing specifications. These higher sized buffers exacerbate the noise problems by
demanding more current from the power supply network.

• Reducing the system frequency is another way to reduce the overall noise. High frequency
signals bring inductive noise into the picture due to L · di/dt.

• Stagger the switching of sequential elements. Do not switch all the elements at the same time.

• Power supply noise affects the timing as described in previous section. Hence, the logic can
be designed conservatively by introducing 10% timing margin in the cell libraries [7].
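
Two of the items above, the L · di/dt term and staggered switching, can be related with a short calculation. The sketch assumes a lumped supply inductance and a linear current ramp, and assumes that staggering splits the current step into equal groups whose edges do not overlap. All names and values are illustrative.

```python
def inductive_noise(inductance, delta_i, delta_t):
    """Inductive supply noise V = L * di/dt for a linear current ramp."""
    return inductance * delta_i / delta_t


def staggered_noise(inductance, delta_i, delta_t, groups):
    """Noise per switching event when the current step is split into
    `groups` equal, non-overlapping events (simplifying assumption)."""
    return inductive_noise(inductance, delta_i / groups, delta_t)


L_SUPPLY = 1e-9   # 1 nH lumped inductance (hypothetical)
DELTA_I = 1.0     # 1 A total current step
DELTA_T = 1e-9    # 1 ns ramp time

print(f"simultaneous: {inductive_noise(L_SUPPLY, DELTA_I, DELTA_T):.2f} V")
print(f"4 groups:     {staggered_noise(L_SUPPLY, DELTA_I, DELTA_T, 4):.2f} V")
```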

1.2.3 Other Approaches

Voltage drop profile of a design depends on the placement of logic blocks in a design. Most of
the approaches discussed above are typically applied prior to design placement which limits their
usefulness in controlling the voltage drop effectively. By far, the most powerful approach to control
the power supply noise at placement stage is the use of on-chip decoupling capacitors [14, 15].
On-chip decoupling capacitors act as local charge reservoirs, and fulfill the instantaneous charge
requirements of switching node. These are very effective in lowering the power supply network
impedance. However, on-chip decoupling capacitors are not part of original logic design. These
are added separately to a design to subdue the power supply noise effects, and can account for a
substantial percentage of total chip area. Therefore, it is important to optimize the placement and
number of capacitors for effective voltage drop management. Efficiency of on-chip decoupling
capacitors can further be enhanced by using an on-chip switching voltage regulator [16]. On-chip
switching voltage regulators can be used to increase the charge transfer efficiency by dynamically
making series or parallel connections of capacitors. Voltage drop compensation based on on-chip
decoupling capacitor optimization is the topic of this thesis. On-chip decoupling capacitors are
further discussed in subsequent chapters.
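
As a rough illustration of the charge reservoir view, the decap needed to supply a current burst within an allowed droop follows from C = Q / ΔV. The sketch below is a back-of-the-envelope estimate, not the budgeting method developed in this thesis; it assumes the decap alone supplies a constant current for the whole burst, and the values are hypothetical.

```python
def decap_estimate(i_burst, t_burst, max_droop):
    """First-order decap estimate in farads: C = I * t / dV.

    i_burst   : average current drawn during the burst (A)
    t_burst   : burst duration (s)
    max_droop : allowed supply droop (V)
    """
    if max_droop <= 0:
        raise ValueError("max_droop must be positive")
    charge = i_burst * t_burst   # charge delivered by the decap (C)
    return charge / max_droop


c_needed = decap_estimate(i_burst=0.1, t_burst=100e-12, max_droop=0.05)
print(f"required decap: {c_needed * 1e12:.0f} pF")
```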

1.3 Research Overview

In order to reduce the adverse effects of on-chip power supply variation, decoupling capacitors
(also known as decaps) are widely used as local charge reservoirs. Decaps are not part of the logic
circuit; they are added separately in the available whitespace (core area not utilized by logic cells)
in the design during later stages of design flow to provide an instantaneous charge to the logic cir-
cuit. Typically, whitespaces are created in the design due to the conservative approach to define
core dimensions. Core area calculation has to account for the signal routing as well as decap budget
requirements. A timing or wirelength driven placement tool causes these whitespaces to migrate
toward core periphery in an attempt to place logic cells in close proximity. Once placement of logic
cells is over, the whitespace area of core is filled with filler cells to provide the well connectivity.
Since the voltage drop effects depend on the placement and routing of logic cells, problems revealed
by voltage drop analysis at this stage can be addressed by inserting decaps in place of filler cells.
We call this approach filler-based decap allocation. This approach offers an easy solution, since
replacement of filler cells with decaps does not call for placement modifications. Therefore, placement
optimized design metrics remain unaffected. With technology scaling, however, the filler cell
based decap approach not only compromises the power supply integrity by placing decaps away
from the switching node, but also consumes more decap budget than necessary. The extra
decap budget translates into larger chip area, and hence, higher cost.

As evident from the above problem, we seek a solution to increase the cost-effectiveness of decap
for addressing the on-chip voltage drop effects. For decaps to be more effective, they are required
to be placed close to switching nodes. A distributed decap approach would be more effective as
compared to the filler cell based lumped decap approach. Experimental results shown in Chapter 3
validate the usefulness of the distributed decap approach, and provide a motivation for the proposed
framework discussed below. Further, addition of extra steps to an existing design flow, requiring
out of flow data processing, significantly affects the total development time, and hence is highly
discouraged. Therefore, we seek a solution which does not call for significant changes in the existing
design flow.

In this work, we propose a library based distributed decap approach to control the voltage drop
problems around the chip. We provide a complete design framework to analyze the voltage drop in
a chip, and compensate it by incorporating the decoupling capacitors close to the violating nodes.
This is done by providing designers with an additional set of library, containing decap-padded logic
cells, along with the nominal cell library. We analyze the initial voltage drop, and algorithmically,
identify the voltage drop regions exceeding the user defined threshold. Distributed decap placement
is achieved by selectively replacing logic cells in the affected regions with the equivalent decap-
padded logic cells to meet the decap budget requirement of the design. We develop necessary
components to calculate optimum number of cell replacements and a method to incorporate new
cells in the design with minimal perturbation to the original placement. Chapter 4 provides details
of the proposed approach.
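
The first step of this flow, screening for cells whose voltage drop exceeds the user defined threshold, can be pictured with a small sketch. This is only an illustration of threshold screening and is not the DCOPT algorithm of Chapter 6; the instance names, the dictionary format, and the worst-first ordering are assumptions.

```python
def cells_over_threshold(cell_drops_mv, threshold_mv):
    """Return instance names whose voltage drop exceeds the threshold,
    ordered from worst drop to mildest (ties broken by name)."""
    violating = [(drop, name) for name, drop in cell_drops_mv.items()
                 if drop > threshold_mv]
    violating.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in violating]


# Hypothetical per-instance voltage drops in mV from a voltage drop report.
drops = {"u1": 42.0, "u2": 71.5, "u3": 55.0, "u4": 12.3}
print(cells_over_threshold(drops, threshold_mv=50.0))  # ['u2', 'u3']
```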

The developed framework addresses the following objectives:

• Nearness of decap to switching loads allows better optimization.
• Significant reduction in overall decap requirement as compared to traditional filler-based de-
cap allocation approach.
• Library based flow - no change in existing design flow.
• Decap optimization applicable both at placement and post-route stages.
• Optimal incorporation of design changes into original design placement.
• Framework integrated with commercial tools (Cadence and Synopsys).

1.4 Research Contributions

This section describes the specific contributions of the thesis. Components described below
stitched together complete the framework described in the previous section.

• UCDCLIB: Decap-padded Standard Cell Library


To accomplish the placement of the decoupling capacitor near the violating node (node where
the voltage drop exceeds the user-defined threshold), we develop a decap-padded standard cell
library which contains set of logic cells, each padded with specific amount of decoupling
capacitor. We call this library UCDCLIB (University of Cincinnati Decoupling Padded Stan-
dard Cell Library). We generate all the necessary library views for easy integration of the
UCDCLIB with the standard design tools. Design and characterization of the decap-padded
standard cells is discussed in Chapter 5.

• DCOPT: Decap Optimization Algorithm


We develop a C++ based decap optimization algorithm to calculate the necessary decap budgeting
for a design in terms of UCDCLIB cells. The algorithm works in synchronization with
the standard voltage drop analysis tool, and generates a list of cells to be replaced. Chapter 6
provides necessary details about the decap optimization algorithm.

• ECO-Placer: Engineering Change Order Placer


Finally, in order to accommodate UCDCLIB cells into the initial placement, we develop a
C++ based engineering change order placer. The ECO-Placer accommodates new logic cells
with minimal perturbation to the original placement. Three phase operation of the ECO-
Placer allows for the core expansion. The ECO-Placer algorithm is discussed in Chapter 7.

1.5 Assumptions

Before we delve into the details of the proposed framework, we provide, in this section, a brief
discussion of the assumptions and the considerations taken during the development of the proposed
approach. The assumptions are supported by explanation and their implication, and in most of the
cases, do not affect the usefulness of the approach.

1. Row based standard cell designs are prevalent in ASIC designs. In this work, we consider
issues of voltage drop and decap optimization in context of single-height standard cell based
designs only. An extension to macro based designs and multiple height cells can be done with
little modifications.

2. For voltage drop analysis, we consider only resistance parasitic for power distribution net-
work (PDN). With technology scaling, PDN inductance characteristics are also becoming
significant. However, due to unavailability of interconnect technology information, we only
extract PDN resistance. The decap optimization algorithm works on the voltage drop infor-
mation. It does not know about interconnect parasitic model. Including the inductance into
the analysis would only affect the level of noise, hence does not call for any change in the
proposed approach.

3. We assume that the placement tool can provide some timing margin (for example, 10%) while
placing the logic cells. This is due to the fact that addition of decap cells will affect the
timing metrics of the original placement. Relaxed timing constraint can allow for voltage
drop compensation without degrading the timing metrics significantly.

4. We restrict our analysis to on-chip supply network. Due to power distribution symmetry,
the model for on-chip ground network turns out to be similar to that of a supply network.
Therefore, ground network can be analyzed in exactly similar manner.

5. While creating the model of the power distribution network, we do not consider the package
parasitic. We assume that the supply and ground voltage at the chip I/O pads is ideal. Again,
for the same reasons as given for not including the inductance parasitic, this assumption does
not limit the usefulness of the approach. Including package parasitic will only affect the level
of noise.

6. We assume that the chip core utilization is sufficient to accommodate the decap cells. This
is a valid assumption due to two reasons. First, 10% of core area is typically used for decap
allocation. Secondly, some percentage of core area (typically 23-30%) is left for routing
considerations. These two factors combined together leave enough whitespace in core so
that decap cells can be accommodated. Moreover, additional whitespace, if required, can be
inserted using the developed ECO-Placer.

1.6 Thesis Outline

The remainder of the thesis is organized as follows. Chapter 2 provides an overview on the
voltage drop analysis flow. Requirements for the voltage drop analysis and different approaches
are discussed. Experimental results in Chapter 3 are used to highlight the effectiveness of placing
decap near the switching nodes. It provides a motivation for voltage drop optimization using decap
padding to the standard cells. In Chapter 4, we discuss the overall framework. Chapter 4 stitches
together the components of the overall framework, and provides a complete picture. Design and
characterization of decap-padded standard cell library is described in Chapter 5. Chapter 6 provides
the details of the C++ based decap optimization algorithm. And in Chapter 7, we describe the C++
based ECO-Placer algorithm. Finally, we provide experimental results of the proposed framework
on set of benchmarks in Chapter 8. We conclude our work in Chapter 9 and discuss possible future
scope of work.

Chapter 2

On-Chip Power Supply Noise Analysis

Considering the adverse effects of the power supply noise on the circuit performance, it is im-
perative to ensure the power integrity of a design before tape-out. Ensuring the power integrity
of a chip requires proper design of on-chip power distribution network, and subsequent full-chip
noise analysis at the power supply nodes. While the design and refinement of the power distribution
network can be done at various stages of a design flow, an accurate full-chip voltage drop analysis
can only be possible during late stages of the design flow. This is due to the fact that the on-chip
voltage drop not only depends on the power consumption of logic blocks, but it also depends on
their placement locations. A logic block placed near the core boundary sees a small value of supply
interconnect parasitic as compared to a block placed at the center of a chip.

The analysis of a power supply noise at late stages of a design flow, however, poses unique chal-
lenges in terms of memory and time requirements. With increasing complexity of VLSI chips, the
power distribution network is also becoming more and more complex. Use of multiple metal layers
and via connections makes the power distribution network a three dimensional network. Further,
the power current requirement of underlying logic circuit depends on input data and varies from
location to location within a chip. Unless these issues are considered methodically, the problem of
voltage drop analysis becomes intractable. Therefore, a systematic computer aided approach is a must
to handle the complexity of the power supply noise analysis, and to make the problem tractable.

In this chapter, we discuss standard approaches to handle the complexity of the power supply
noise analysis. We start with a typical design flow in Section 2.1, and identify the stage where
the voltage drop analysis fits into the design flow. Section 2.2 provides modeling requirements for
voltage drop analysis. This is followed by the two popular approaches, static and dynamic analysis,
in Sections 2.3 and 2.4 respectively. Decap is a powerful way to control the on-chip noise. The
importance and types of on-chip decoupling capacitance are discussed in Section 2.5. Lastly, we
discuss the previous research in the domain of power supply noise optimization in Section 2.6.

2.1 Typical Design Flow

Figure 2.1 shows a typical top-down digital IC design flow. The design flow can be divided
into two parts: frontend, and backend. Frontend deals in logical design, whereas backend involves
physical design of the IC.

As shown in the figure, a standard cell based design flow starts with converting input design
specifications into a RTL description using design languages such as Verilog, VHDL, SystemC etc.
After functional testing of RTL model using test benches, the design is passed to the logic synthesis
step. A standard tool used for logic synthesis process is Synopsys Design Compiler. In addition to
RTL model, the logic synthesis tool also takes in design constraints and standard cell library as its
inputs. Based on the constraints, the design is optimized in terms of area, power and timing, and
a technology independent gate-level netlist is generated. The technology mapping process converts
the generic netlist into the technology-dependent gate-level netlist. This netlist describes the design
in the form of standard cells present in the library. Finally, design is verified for timing violations
using a static timing analyzer tool such as Synopsys PrimeTime.

The synthesized design is placed and routed in the backend stage. The backend design involves
floorplanning, power grid design, cell placement, power routing, clock tree synthesis (CTS), filler
cell insertion, and signal routing in order (not shown in the figure). The physical layout information
for cells at this stage is provided by the standard cell library. Timing verification is performed at
various stages during the backend flow. For example, timing specifications of design are typically
verified at pre-CTS, post-CTS, and post-route stages. The routing parasitic information for the
design can be captured in SPEF or SDF formats. Cell delay information together with routing
delays can be used to perform an accurate post-route timing sign-off analysis of the design.

In addition to timing sign-off, the chip needs to be analyzed for the power integrity. Power-Grid
sign-off (P/G sign-off) involves verifying the design against any potential voltage drop problems.
Although an early analysis, such as at the floorplanning stage, can prevent a costly re-spin of the design
process, it is not very accurate due to lack of placement information. For accurate results, voltage
drop analysis is typically performed either after initial design placement or at post-route stage.
Design optimizations are applied to correct any voltage drop problems revealed at this stage.

Once the timing and P/G sign-off of the design is successful, the last step involves generation of
the GDSII/OASIS format of the design for final tape-out. As evident, a typical IC design flow involves
a number of steps in order to realize a final workable chip. Any design changes required at the later
stages can result in a costly re-spin and hence should be avoided.

[Flowchart: Design Specifications -> RTL Implementation -> RTL Simulation (ModelSim, with Testbench) -> Logic Synthesis (Synopsys DC; inputs: Standard Cell Library LIB/DB, Design Constraints) -> Gate Level Netlist, SDC -> Timing Analysis (PrimeTime) -> Place and Route (SOC Encounter) -> DEF, Netlist, SPEF -> Post-Route Timing Analysis (PrimeTime) -> Voltage Drop Analysis, Static and Dynamic (PrimeRail; inputs: SPEF, VCD, User Defined Voltage Drop Threshold) -> Tape-out (GDSII/OASIS). Timing violations and voltage drop violations feed back into Design Optimizations.]

Figure 2.1: Digital Design Flow

2.2 Voltage Drop Analysis Flow

As discussed in the chapter prelude, full-chip voltage drop analysis of an integrated circuit re-
quires a systematic computer aided approach due to the sheer magnitude of the problem size. Hi-
erarchical design of a complex integrated circuit suggests that the power supply noise analysis can
be performed at the individual logic block level. Hence local block level optimizations can be performed
to meet the noise margins at the supply and ground nodes. Although block level analysis
offers advantages in terms of reduced memory and runtime requirement, a full-chip power integrity
analysis is a must. When locally optimized logic blocks are combined together to form a complete
design, the current flowing through the power grid due to an adjacent block can affect the power
integrity of the logic block under analysis. Hence, only a full-chip analysis can ensure the power
integrity of the design.

Further, the problem is complicated by the non-linear behavior of the transistors loading the
power supply grid. A non-linear circuit simulator such as SPICE can be used to perform an accurate
analysis. However, full-chip analysis of a design netlist containing hundreds of millions of power
grid segments and transistors makes the process intractable.

The process of voltage drop analysis therefore requires creation of a full-chip model of a design.
The full-chip model of design is typically created in two steps. First, the model for the power
and ground network is created. This involves parasitic extraction of the power grid interconnect
discussed in Section 2.2.1. In the second step, transistor circuits loading the power distribution
network are represented by an equivalent linear current source model based on the current profile
of the circuit as discussed in Section 2.2.2. Finally, the two models are combined and a complete
linear model for full-chip power distribution analysis is created. Brief discussion of model analysis
is presented in Section 2.2.3.
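
The combined linear model can be illustrated on the smallest possible case: a single supply rail fed from one ideal pad, with equal resistive segments and a constant current source at each tap. The topology, names, and values below are assumptions for illustration; a real grid is a mesh and is solved with a linear system solver rather than this running sum.

```python
def rail_voltages(vdd, r_segment, tap_currents):
    """Node voltages along a single rail fed from an ideal pad at one end.

    vdd          : pad voltage (V)
    r_segment    : resistance of each rail segment (ohm)
    tap_currents : current drawn at each node, nearest the pad first (A)
    """
    voltages = []
    voltage = vdd
    downstream = sum(tap_currents)   # current through the first segment
    for i_tap in tap_currents:
        voltage -= r_segment * downstream   # drop across the segment feeding this node
        voltages.append(voltage)
        downstream -= i_tap                 # this tap's current leaves the rail here
    return voltages


nodes = rail_voltages(vdd=1.0, r_segment=0.1, tap_currents=[0.01, 0.01, 0.01])
print([f"{v:.3f}" for v in nodes])
print(f"worst-case drop: {(1.0 - min(nodes)) * 1e3:.1f} mV")
```

The node farthest from the pad sees the largest drop because every upstream segment carries the current of all downstream taps.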

Evidently, the model creation and analysis of the resulting network is a method of choice. How-
ever, this approach results in a conservative analysis, and slightly overestimates the power supply
noise levels. The reason for this behavior is the negative feedback between the current consumed
by the logic circuit and the power grid voltage drop. The high current consumed by a logic block
results in a significant voltage drop. This voltage drop, in turn, results in the decrease in the logic
block current, and hence reduces the overall voltage drop levels. Hence, an iterative analysis is
necessary to get an accurate picture of voltage drop profile of a chip.
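
The negative feedback described above can be seen in a single-node sketch. The load model, in which the block current scales linearly with its local supply voltage, is an assumption made purely for illustration, as are the names and values.

```python
def self_consistent_voltage(vdd, r_grid, i_nominal, iterations=50):
    """Iterate the node voltage until load current and voltage drop agree.

    Assumes the load current scales as i_nominal * v / vdd (hypothetical model).
    The first pass reproduces the one-shot estimate vdd - r_grid * i_nominal.
    """
    voltage = vdd
    for _ in range(iterations):
        i_load = i_nominal * voltage / vdd   # current falls as the supply droops
        voltage = vdd - r_grid * i_load
    return voltage


VDD, R_GRID, I_NOM = 1.0, 1.0, 0.1
one_shot_drop = R_GRID * I_NOM
iterated_drop = VDD - self_consistent_voltage(VDD, R_GRID, I_NOM)
print(f"one-shot drop: {one_shot_drop * 1e3:.1f} mV")
print(f"iterated drop: {iterated_drop * 1e3:.1f} mV")
```

Under this assumed load model the one-shot estimate is the larger of the two, consistent with the conservative behavior noted above.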

2.2.1 Power Grid Modeling

Performance characteristic of a power distribution network is greatly affected by the associated
parasitic information. Until the dawn of deep submicron era, power grid networks were typically
characterized by resistance and capacitance parasitic, and the inductance parasitic was mainly at-
tributed to the die-package interconnection wires. However, with increased frequency of opera-
tion as well as increased design complexity leading to larger current loops at deep submicron and
nanometer nodes, on-chip interconnect inductance is also becoming significant. Based on required
accuracy and analysis goals, the interconnect model can be simplified. Alternatively, all three, the
resistance, capacitance, and inductance parasitic can be included to form a full blown RLC network.
This translates a multi-layer power distribution grid into a 2D RLC network as shown in Figure 2.2.
Typically, DC analysis of power grid requires only resistance extraction, whereas the need for ca-
pacitance and inductance extraction arises during the transient analysis. Further, board-level and
package-level interconnect parasitic can also be included in the resulting on-chip model. In this
thesis, we assume ideal voltage levels at the die interface, and hence, restrict our analysis to an on-
chip network. Further, we only create resistance network for the power grid due to unavailability of
interconnect technology information. As described in Section 1.5, this simplification will not affect
the usefulness of our approach. The following subsections briefly describe the R, L, and C extraction
methods [8]:

RLC extraction

On-chip power distribution network is formed using multiple metal layers. Contact and vias are
used to form the connections between layers. Long metal layers are typically divided into multiple
metal segments of smaller length, and each segment is modeled using a Π-network consisting of a
resistor and two capacitors. Distributed RC network results in more accurate results as compared to
a lumped RC model, where a long metal line is replaced by a single R and C elements. However,
the downside is the increased amount of model data.
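
The segmentation described above can be sketched as follows. The representation of each Π-segment as a tuple and the equal split of the line are assumptions for illustration.

```python
def pi_segments(r_total, c_total, n_segments):
    """Split a uniform line into n_segments identical pi-sections.

    Each section is (c_left, r_series, c_right): the series resistance is
    r_total / n and each shunt capacitor holds half of the section's share
    of c_total. n_segments = 1 gives a single lumped pi-section.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    r_series = r_total / n_segments
    c_half = c_total / (2 * n_segments)
    return [(c_half, r_series, c_half) for _ in range(n_segments)]


segments = pi_segments(r_total=10.0, c_total=200e-15, n_segments=4)
print(f"{len(segments)} segments, each R = {segments[0][1]:.2f} ohm, "
      f"C = 2 x {segments[0][0] * 1e15:.1f} fF")
```

The total resistance and capacitance are preserved for any segment count; only the number of circuit elements, and therefore the model size, grows.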

Layer resistance can be characterized either using shape based extraction algorithms or using a
standard sheet resistance based formula as shown below.

$$ R = s \cdot \frac{l}{w} \qquad (2.1) $$

where s is the sheet resistance in the unit of ohm/square, l is the length of line in um, and w is
width of the line in um. Further, each contact and via contributes a fixed resistance. Contact and via
resistance needs to be included in the overall resistance extraction data. Lastly, effect of temperature
and electromigration on metal resistivity can also be included for more accurate analysis.
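
A short numeric example of the sheet resistance formula, with the fixed via contributions added in series, is given below. The sheet resistance, dimensions, and via values are hypothetical, and treating the vias as series elements is a simplifying assumption.

```python
def segment_resistance(sheet_res, length_um, width_um, via_resistances=()):
    """Resistance of a metal segment plus any vias in series with it.

    sheet_res       : sheet resistance in ohm/square
    length_um       : segment length in um
    width_um        : segment width in um
    via_resistances : fixed resistance of each series via or contact in ohm
    """
    if width_um <= 0:
        raise ValueError("width_um must be positive")
    wire = sheet_res * length_um / width_um   # R = s * l / w
    return wire + sum(via_resistances)


r_wire = segment_resistance(0.05, length_um=100.0, width_um=2.0)
r_with_vias = segment_resistance(0.05, 100.0, 2.0, via_resistances=(1.0, 1.0))
print(f"wire only: {r_wire:.2f} ohm, with two vias: {r_with_vias:.2f} ohm")
```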

Grid capacitance calculation can be done based on unit-length overlap, fringe, and lateral capac-
itance models [9]. Although, the complex geometrical layout can result in overwhelming amount of
capacitance data because the capacitance can be formed between any two overlapping segments in a
layout, the model size can be substantially reduced by ignoring the capacitive components between
non-adjacent lines. This little compromise in accuracy is acceptable for two reasons. First, it puts
lesser burden on time and memory requirement. Second, the overall capacitance of a power grid
is usually dominated by the capacitance contributed by the logic circuits [8]. Types of logic circuit
capacitance are discussed in subsequent sections.

The inductive properties of the on-chip power distribution network are difficult to characterize.
Loop inductance can be estimated from the shape and size of the current loops, but the main
hurdle to this approach is that the current paths are not known in advance. PEEC models based on
electromagnetic analysis, as described in [17], can be used to characterize the inductance of the
grid. Another approach involves creation of a partial inductance matrix as described in [8].

Figure 2.2: RLC Model for On-Chip Power Grid Network

2.2.2 Logic Circuit Modeling

The level of power supply noise is greatly influenced by the logic circuits that load the
power distribution network. High power consumption of a logic block does not necessarily lead
to a large voltage drop. Only when the current consumed by the logic block flows through a highly
parasitic interconnect path does it lead to a substantial voltage drop. Hence, the voltage drop
profile of a chip depends on the placement and the supply current profile of the logic blocks. The
model for a logic block involves calculating the supply current profile of the block as well as the
parasitics contributed by the block, as shown in Figure 2.3. Parasitic information of a
transistor-level circuit includes the resistance and capacitance offered by it. The total
capacitance offered by the logic circuit can act as a decoupling capacitance, and can come from two
sources (intrinsic and intentional) as described in Section 2.5.1. The following describes the
switching current modeling for the logic circuit.


Figure 2.3: Transistor Current Model

Switching Current Characterization

The current profile of a logic circuit is determined by three components of current: dynamic,
short-circuit, and leakage current. These current components are used to model the logic circuit
as a triangular current source as shown in Figure 2.3. The accuracy of the triangular current
source representation is determined by the number of current samples used to generate the model.
At the start of the switching period, the current magnitude of the circuit increases linearly and
attains a peak value. The current magnitude then decreases linearly. This current profile is known
as the tap current. The calculation of the tap current is complicated by the fact that the current
profile of a logic circuit depends on the input pattern. In the case of multiple inputs, the worst
case switching pattern is used to determine the current profile.
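
The triangular tap current described above can be sketched as a piecewise-linear function of time. The snippet below is only an illustration: the function names and the example peak current and timing values are assumptions introduced here, not characterized data.

```python
def triangular_current(t, t_start, t_peak, t_end, i_peak):
    """Piecewise-linear triangular tap current at time t."""
    if t <= t_start or t >= t_end:
        return 0.0
    if t <= t_peak:
        # Linear rise from zero at t_start to i_peak at t_peak.
        return i_peak * (t - t_start) / (t_peak - t_start)
    # Linear fall from i_peak at t_peak to zero at t_end.
    return i_peak * (t_end - t) / (t_end - t_peak)


def sample_tap_current(t_start, t_peak, t_end, i_peak, n_points):
    """Return n_points (time, current) samples across the switching period."""
    step = (t_end - t_start) / (n_points - 1)
    times = [t_start + k * step for k in range(n_points)]
    return [(t, triangular_current(t, t_start, t_peak, t_end, i_peak))
            for t in times]


# Hypothetical gate: current ramps to 1 mA at 50 ps and returns to zero at 150 ps.
samples = sample_tap_current(0.0, 50e-12, 150e-12, 1e-3, n_points=7)
```

Using more sample points gives a finer representation of the profile at the cost of more stored data.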

Each transistor connection to the power grid creates a tap point. Although it is possible, and
indeed easy, to calculate the tap current for each transistor, the resulting volume of tap current
information would be difficult to handle. Instead, tap current information is captured for
individual logic gates or small macros. The worst case current profile can easily be calculated for
such small circuits because they have a relatively small number of inputs.

Further, depending on the type of analysis, as discussed in the following sections, the tap current
information can represent either the average current of a logic gate or its transient current profile.

2.2.3 Model Analysis

The RLC model of the power distribution network is combined with the logic circuit tap current
information and the decoupling capacitors at various grid nodes. The resulting model, shown in
Figure 2.4, forms a linear network for power supply noise analysis, and can be represented
by Equation 2.2.

G \cdot v(t) + C \cdot v'(t) = i(t) \qquad (2.2)

where G is a conductance matrix, C is an admittance matrix containing capacitive and inductive
elements, v(t) is a vector consisting of the time-varying voltages and currents at various nodes,
and i(t) is a vector containing the independent current sources. Many efficient linear circuit
analysis techniques can be used to calculate the voltage drop values at various nodes in the grid.
These methods are classified into direct and iterative methods. Direct methods solve the nodal
equation shown above by decomposition and substitution. For example, the equation can be
discretized in time using the backward Euler technique, and the resulting linear system can then be
solved by decomposition and substitution at each time step. Iterative methods start with an initial
guess, and refine the nodal values iteratively until the nodal voltages attain stable values. Many
techniques to enhance the efficiency of the numerical analysis methods can be found in [8].
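
For illustration, one common way to apply a direct solver to Equation 2.2 is to discretize it in time with the backward Euler rule. The step size h and the time index n are notation introduced here as assumptions; they do not appear in the source.

```latex
% Backward Euler approximation of the derivative at t_{n+1} = t_n + h
v'(t_{n+1}) \approx \frac{v(t_{n+1}) - v(t_n)}{h}

% Substituting into G v(t) + C v'(t) = i(t) at t = t_{n+1}
\left( G + \frac{C}{h} \right) v(t_{n+1}) = i(t_{n+1}) + \frac{C}{h} \, v(t_n)
```

With a fixed step h, the matrix (G + C/h) does not change between steps, so it can be decomposed once and reused with forward and backward substitution at every time step.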

2.3 Static Voltage Drop Analysis

Static voltage drop analysis is based on the average current of the logic circuit. The tap
current captures the average value of the logic switching current. The goal of static analysis is not
to find the exact voltage drop in the circuit. Rather, the main value of static analysis lies in
verifying the effectiveness of the power grid structure. Problems in the power grid structure such
as shorts, opens, and insufficient width of the power interconnect can easily be identified using
average current analysis.


Figure 2.4: Complete Design Model for Voltage Drop Analysis

The advantage of static analysis is its simplicity. Calculation and storage of average tap current
information for each transistor or gate is relatively easy. The average current of a gate can be
determined statistically using the gate switching activity information. Switching activity of a gate
can in turn be determined using input switching activity propagation algorithms or by performing a
gate level simulation of the design using test benches. Once the gate switching activity is known,
the average current of a gate can be given by [9]:

I_{avg} = A \cdot C_{gate} \cdot V_{dd} \cdot F_{clk} \qquad (2.3)

where A is the gate switching activity value, C_{gate} is the total gate capacitance of nets in the
gate including the load capacitance, V_{dd} is the supply voltage, and F_{clk} is the chip frequency.
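
Equation 2.3 can be evaluated directly, as in the small sketch below. The function name and the example gate values are assumptions chosen only for illustration.

```python
def average_gate_current(activity, c_gate, vdd, f_clk):
    """Average gate current per Equation 2.3: I_avg = A * C_gate * Vdd * F_clk."""
    return activity * c_gate * vdd * f_clk


# Hypothetical gate: activity 0.2, 50 fF capacitance, 1.8 V supply, 500 MHz clock.
i_avg = average_gate_current(0.2, 50e-15, 1.8, 500e6)
```

For these assumed values the average current is 9 µA; a single number per gate is all that static analysis needs to store.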

The average current analysis simplifies the overall power grid model, since only the resistive
parasitics need to be considered. The power grid network simplifies to a two dimensional linear
resistive network. Simple nodal analysis based on Ohm's law can be used to calculate the nodal
voltages. Another advantage of static analysis is that it does not require input vectors, which greatly
simplifies the analysis.

If some part of the power grid contains an open, the current flowing toward that part will encounter
more resistance, and the resulting higher voltage drop clearly points out the problem. Static
analysis can also be used to analyze the electromigration phenomenon, which depends on the
transport of metal ions by the direct (average) current.

Main steps in static analysis are summarized as follows [18]:

1. Parasitic resistance of the power grid is extracted, and a resistance matrix is formed.
2. The average tap current for each transistor or gate is calculated.
3. The tap currents are attached to the resistive power grid network at designated tap points.
4. An ideal supply voltage is attached to the power grid network at the VDD pad locations.
5. The resulting linear resistive network is solved using nodal analysis to calculate the current
and voltage levels at various nodes.
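
The steps above can be sketched for the simplest possible case: a single resistive rail fed from one ideal VDD pad, with an average tap current at each node. This is an assumption-laden simplification of the two dimensional grid (a real grid requires solving a matrix equation), and the function name and values are hypothetical.

```python
def static_ir_drop(vdd, seg_res, tap_currents):
    """Node voltages of a resistive rail fed from one ideal VDD pad.

    seg_res[k] is the resistance between node k-1 (or the pad for k = 0)
    and node k; tap_currents[k] is the average current drawn at node k.
    """
    voltages = []
    v = vdd
    for k, r in enumerate(seg_res):
        # Current through segment k feeds node k and every node after it.
        i_seg = sum(tap_currents[k:])
        v -= r * i_seg
        voltages.append(v)
    return voltages


# Hypothetical rail: three 0.5-ohm segments, 10 mA average tap current each.
node_v = static_ir_drop(1.8, [0.5, 0.5, 0.5], [0.01, 0.01, 0.01])
worst_drop = 1.8 - min(node_v)
```

In this toy rail the node farthest from the pad sees the largest drop, which is the kind of structural weakness static analysis is meant to expose.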

2.4 Dynamic Voltage Drop Analysis

Dynamic voltage drop analysis is based on the transient current of the logic circuit. As
described in Section 2.2.2, the tap current captures the logic current with respect to time as a
triangular current source. The goal of dynamic analysis is to perform an accurate voltage drop
analysis. The instantaneous current drawn by a gate during a clock period can be much higher than
the average current over the same clock cycle, as shown in Figure 2.5. Hence, the instantaneous
voltage drop can be significantly higher than the static estimate, and can only be captured using
transient analysis of the power distribution network.

Figure 2.5: Transient and Average Current

The main advantage of dynamic analysis is its accuracy. However, dynamic analysis poses a
number of challenges. It requires extraction of R, L, and C parasitics (inductance can be ignored
if it is small). The resulting network contains a huge number of elements, which strains circuit
simulation time and memory capacity. Moreover, the tap current information for a gate is no longer
a single value. Rather, the tap current information for each gate must contain a series of
2-tuples, each representing a transient current value and its associated time stamp. This
significantly increases the memory requirement. Further, dynamic analysis requires good input
vector coverage. Some portions of the design might not be analyzed for voltage drop effects due to
insufficient vector coverage.

Main steps in dynamic analysis are summarized below [18]:

1. The power grid network is modeled using RLC extraction.
2. The tap current for each logic gate is modeled by a triangular current source.
3. Decoupling capacitance due to intentional sources as well as due to inactive logic is
determined.
4. The tap currents along with the decoupling capacitors are attached to the power grid tap points.
5. Based on the input pattern, power consumption of the logic gates is also captured. If input
patterns are not available, value change dump (VCD) information from gate level simulation
can be used.
6. Based on the power consumption values, tap current values are scaled.
7. The resulting network is simulated iteratively, and nodal voltage values with respect to time
are calculated.

2.5 On-Chip Decoupling Capacitance (Decap)

Decoupling capacitors play an important role in stabilizing power supply variations. They act as
local charge reservoirs and provide a low impedance path for the current to the logic circuit. By
supplying instantaneous charge during the switching period of a logic gate, they lower the overall
impedance of the power distribution system as seen from the load.

Figure 2.6 highlights the significance of the decoupling capacitor. Figure 2.6[A] shows
a circuit without decoupling capacitors. The power distribution network is modeled by R and L
elements. As described in Section 1.1, whenever the load draws current from the input supply, the
supply node at the load suffers a voltage drop, and the voltage at the ground node rises as the
current takes the return path. This decrease in the voltage across the load affects the circuit
performance. The performance penalty can be reduced by the use of a decoupling capacitor, as shown
in Figure 2.6[B]. During the inactive period of the load, the decoupling capacitor charges from the
supply pad at a slower rate, and acts as a charge reservoir. It then provides the required charge
to the load instantaneously during its switching period. Depending on the size of the capacitor, a
major portion of the current is supplied by the decoupling capacitor, and only a small portion is
drawn from the input supply, resulting in a relatively small voltage drop across the load. The
impedance of the power supply network as seen from the load is thus lowered by the addition of the
decoupling capacitor.

[Figure 2.6 labels: (A) VDD and Load connected through R1, L1, R2, L2 (upper path) and R4, L4, R3,
L3 (lower path); (B) the same network with a decoupling branch RD, CD added.]

Figure 2.6: Circuit Model Illustrating Effectiveness of Decoupling Capacitor
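
The charge reservoir effect illustrated in Figure 2.6 can be reproduced with a very reduced model: one supply node behind a single resistance, a triangular load current, and an ideal decap to ground. The sketch below ignores the inductances, the return path, and the series resistance RD, and all element values are hypothetical; it uses a backward Euler time step.

```python
def worst_supply_voltage(vdd, r, c_decap, i_peak, t_rise, t_fall, h, t_stop):
    """Backward Euler simulation of one supply node behind resistance r.

    The load draws a triangular current; c_decap sits between the node and
    an ideal ground. Returns the minimum node voltage seen.
    """
    v = vdd  # the decap starts fully charged
    v_min = vdd
    steps = int(round(t_stop / h))
    for n in range(1, steps + 1):
        t = n * h
        if t <= t_rise:
            i_load = i_peak * t / t_rise
        elif t <= t_rise + t_fall:
            i_load = i_peak * (t_rise + t_fall - t) / t_fall
        else:
            i_load = 0.0
        # (C/h + 1/R) * v_new = (C/h) * v_old + VDD/R - i_load
        v = (c_decap / h * v + vdd / r - i_load) / (c_decap / h + 1.0 / r)
        v_min = min(v_min, v)
    return v_min


# Hypothetical values: 1.5 ohm path, 100 mA peak, 80 ps rise and fall times.
no_decap = worst_supply_voltage(1.8, 1.5, 0.0, 0.1, 80e-12, 80e-12, 1e-12, 1e-9)
with_decap = worst_supply_voltage(1.8, 1.5, 100e-12, 0.1, 80e-12, 80e-12, 1e-12, 1e-9)
```

Without the decap the node simply follows VDD minus R times the load current; with the decap, part of the charge comes from the capacitor and the minimum voltage stays closer to VDD.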

Although decoupling capacitors can improve the voltage drop profile of a chip, unintelligent
addition of decoupling capacitors can raise several concerns. Typically, the decoupling
capacitors are added in the unused areas of the chip core, known as whitespaces. If the decoupling
capacitors require more area than the available whitespace, the die area increases and the yield of
the integrated circuit decreases. Further, each decoupling capacitor contributes leakage
current. The static power dissipation of the chip can thus increase with the number of decoupling
capacitors. This factor also directly affects the circuit yield. Further, as described in the
previous chapter, distributed placement of decoupling capacitors becomes a must with technology
scaling. Hence, optimization of the number and placement of decoupling capacitors is important.

2.5.1 Sources of On-Chip Decap

Typically, the total decoupling capacitance of a design can be classified into two categories [8]:
intrinsic and intentional. Intrinsic decoupling is offered by the parasitic capacitance of the
logic circuit. One source of intrinsic decoupling, the power grid capacitance, has already been
discussed in a previous section. The logic circuit also offers capacitances such as drain junction
capacitance and gate-source capacitance. The pn junction capacitance due to the N and P wells also
contributes to the intrinsic decoupling repository. Due to the large well area, the parasitic well
decoupling capacitance usually dominates the intrinsic capacitance. The non-switching logic circuit
can also provide significant decoupling capacitance. The intrinsic capacitance contributed by the
logic circuit can be determined based on the power consumption of the circuit [8, 9]:

C_{decap} = \frac{P}{V_{dd}^{2} \cdot F_{clk}} \cdot \frac{1 - A}{A} \qquad (2.4)

where C_{decap} is the total intrinsic decap of the circuit, P is the power of the circuit, V_{dd}
is the power supply voltage, F_{clk} is the clock frequency, and A is the switching activity factor
of the circuit.
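
As a quick numerical illustration of Equation 2.4, the sketch below evaluates the intrinsic decap for an assumed block. The function name and the power, frequency, and activity values are hypothetical.

```python
def intrinsic_decap(power, vdd, f_clk, activity):
    """Intrinsic decap per Equation 2.4: [P / (Vdd^2 * F_clk)] * (1 - A) / A."""
    switched_cap = power / (vdd ** 2 * f_clk)
    return switched_cap * (1.0 - activity) / activity


# Hypothetical block: 10 mW at 1.8 V and 500 MHz with activity factor 0.2.
c_decap = intrinsic_decap(10e-3, 1.8, 500e6, 0.2)
```

For these assumed values the estimate is roughly 24.7 pF. A lower activity factor means more of the circuit is idle in a given cycle, which increases the estimate.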

Apart from the intrinsic device decoupling capacitance, the designer can add MOS based decoupling
capacitors. These capacitors are known as intentional decoupling capacitors, and are typically
realized as MOSFET gate oxide capacitance. The design, modeling, and characterization of intentional
decoupling capacitors are discussed in chapter 5.

2.6 Research in Voltage Drop Optimization

Since the power supply voltage directly affects the circuit performance, containing the on-chip
power supply noise within bounds has been a topic of research for over a decade. As described
in Section 1.2.3, decoupling capacitors are an indispensable means of controlling the on-chip power
supply noise. The efficacy and placement of on-chip decaps have been analyzed in [19, 20] based on
the effective radii of decaps. Decaps must be placed within the effective radius determined by the
current load and the input power supply. The authors in [15] provide an early work on on-chip decap
optimization for controlling power supply noise. Several contributions have since been made to
address the issue of power supply noise by optimizing on-chip decap [21, 22, 23, 24]. In [15, 21],
the authors consider decap allocation and optimization at the floorplan level for the full-custom
design style. Given the initial floorplan and switching profile of circuit modules, noise levels at
circuit modules are calculated and decap is allocated to available whitespace in the floorplan using
linear programming. If required, additional whitespace is also inserted into the floorplan based on
heuristic criteria to meet the design decap demand. Architectural level current signatures for
various functional blocks in a processor are used in [22] to estimate the power supply noise level
and the required decap budget for functional blocks. The authors in [22] evaluate different decap
placement strategies by analyzing four decap cases and show that the distributed decap placement
approach provides the best noise attenuation. The research focus of these works has primarily been
on decap optimization for full-custom designs. Designs are analyzed at the floorplan level, where
large functional modules are abstracted by a current source representation. Addition of decaps in
such cases results in their placement away from the switching nodes and requires a large decap
budget.

With these effects becoming more pronounced at deep-submicron nodes, the need for decap placement
close to switching nodes is highlighted in [19]. This requires that the power supply noise and decap
optimization be analyzed at a finer abstraction level of the design. On-chip decap optimization at
the standard cell level has been analyzed in [23, 24]. In [23], the authors propose a non-linear
programming based decap optimization scheme applicable after the placement stage, and calculate the
optimal decap allocation for standard cell rows by analyzing an adjoint network of the original
power distribution model of the design. The authors in [24] pad the standard cells with decap to
reduce the power supply noise. Decap padding for standard cells is predicted based on gate switching
activity prior to placement. The padded decap amount is corrected by gate sizing after placement and
power grid noise analysis. Although these approaches are shown to provide effective distributed
control of power grid noise, they add steps to power grid noise analysis, increasing its complexity
further. Moreover, these approaches are not very conducive to the traditional library based design
flow. Therefore, as discussed in Section 1.3, we aim to develop an alternative framework for
distributed decap optimization with the help of a decap-padded standard cell library.

Chapter 3

Motivation: Initial Experiments

As discussed in Section 1.3, effective control of voltage drop requires placement of decaps close
to the switching nodes. The filler-based decap allocation approach compromises power integrity
by placing decaps away from switching nodes and results in a more-than-necessary decap budget.
This directly affects the reliability of the circuit. Therefore, a distributed approach to decap
placement is necessary, where decaps are placed physically close to the switching loads [19]. We
verify the effectiveness of decap placement close to the switching loads in this chapter.
Experimental results presented in this chapter provide a motivation for the development of the
voltage drop compensation framework (described in chapter 5) using a decap-padded standard cell
library (described in chapter 6). We analyze different circuit configurations, and compare the
efficacy of the distributed approach with the lumped decap placement approach, which emulates the
filler-based decap allocation method.

3.1 Circuit Modeling

Figure 3.1 shows a circuit model for evaluating the effect of the distributed decap placement
approach on power supply noise. The model forms a coarse representation of a typical row in a
row-based standard cell design style, where logic circuits in a row share common power and ground
lines. The power supply to a row comes from the die pad, and is assumed to be ideal. The logic
circuit is approximated by a block of 20 parallel inverters (Figure 3.2). The rationale for
representing the logic block by parallel inverters is as follows:

• Inverters are the backbone of all digital logic design. All complex logic gates can be converted
to an equivalent inverter representation for analysis. The behavior of complex logic gates can be
derived by extrapolating the results obtained for the inverter [4].

• Unlike a ring oscillator, the rise and fall transitions for a block of parallel inverters can be
controlled independently. This is important for analyzing the effect of voltage drop.

• Lastly, a block of 20 parallel inverters (INVX8) is used to emulate a power hungry logic
circuit block. Simultaneous switching of the parallel inverters draws enough current from the
input power supply to produce an appreciable voltage drop.

[Figure 3.1 labels: supply VDD; supply nodes V2, V1 and ground nodes G2, G1; lumped decap C2;
distributed decap C1; basic block; power distribution interconnect segment impedance.]

Figure 3.1: Circuit Model for Evaluating Effectiveness of Distributed Decap Placement


Figure 3.2: Logic Circuit Representing Basic Block Shown in Figure 3.1

The distributed and lumped decoupling capacitors are represented by C1 and C2 respectively.
For simplicity, the distributed capacitance is represented by a single capacitance C1. In actuality,
we distribute the capacitance C1 among the 20 parallel inverters in a logic block. Both the
lumped and the distributed capacitances are realized as MOSFET gate capacitance. A standard
decoupling cell as described in chapter 6 is used to add the required value of lumped capacitance to
the circuit. Distributed capacitance for a block is added by padding the standard inverter cell with
the required amount of decap.

The power lines connecting the logic block to the input supply voltage are modeled using their
equivalent parasitic elements, representing the interconnect segment impedance Zx. A global wire is
assumed for power and ground routing. The Arizona State University (ASU) interconnect model [25, 26]
is used to derive the global wire parasitic values. Table 3.2 shows the parasitic values for a
global wire with the wire parameters shown in Table 3.1 for the 0.18µm technology node.

Table 3.1: Global Wire Parameters at 0.18µm Node

Parameter    Value
Width        0.8µm
Space        0.8µm
Thickness    1.25µm
HeightILD    0.65
KILD         3.5
Material     Cu

Table 3.2: Global Wire Parasitic Values at 0.18µm Node

Element Value
Resistance 22.92 Ω/mm
Inductance 1.66 nH/mm
Capacitance 238.8 fF/mm

For this experiment, R and L are varied in the ranges [0.1Ω, 1.6Ω] and [0.1pH, 1pH] respectively,
which translates to wire lengths in the range of 10 to 50µm. Since the parasitic capacitance for
this wire length range is too small, it is ignored in the analysis. The supply voltage for the
0.18µm technology is 1.8 V, and the maximum tolerable ripple at the logic block nodes is assumed to
be 5% of the power supply voltage. Hence the power supply is considered to be noise free if the
voltage at the power supply nodes is within the range [1.71 V, 1.89 V]. Any node having a voltage
outside this range is considered to be noisy, and decap must be added to reduce the noise. The input
waveform for the blocks is shown in Figure 3.3. The rise and fall times are set to 80ps. The power
supply voltage drop is measured only during output load charging (input falling transition). The
output load for a logic block is assumed to be 1pF.
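
The setup values above can be cross-checked with a few lines of arithmetic. The sketch below scales the per-millimetre values of Table 3.2 to a given wire length and computes the 5% noise band; the linear scaling with length and the function names are assumptions made here for illustration.

```python
# Per-millimetre global wire parasitics from Table 3.2.
R_PER_MM = 22.92       # ohm/mm
L_PER_MM = 1.66e-9     # H/mm
C_PER_MM = 238.8e-15   # F/mm

VDD = 1.8              # supply voltage in volts
RIPPLE = 0.05          # tolerable ripple as a fraction of VDD


def wire_parasitics(length_um):
    """Scale the per-mm values linearly to a wire of the given length in um."""
    length_mm = length_um / 1000.0
    return (R_PER_MM * length_mm, L_PER_MM * length_mm, C_PER_MM * length_mm)


def is_noise_free(v_node):
    """True if the node voltage lies within VDD +/- 5%."""
    return VDD * (1 - RIPPLE) <= v_node <= VDD * (1 + RIPPLE)


r, l, c = wire_parasitics(50.0)   # 50 um segment
low, high = VDD * (1 - RIPPLE), VDD * (1 + RIPPLE)
```

For a 50µm segment the scaled capacitance is about 12 fF, which is small next to the 1pF output load and is consistent with ignoring it in the analysis.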


Figure 3.3: Input Waveform for Distributed Decap Experiment

3.2 Experimental setup

We start by creating a layout of the logic block in 0.18µm technology using the Magic layout editor.
The resulting block is extracted to SPICE, and interconnect parasitics are manually added to the
SPICE netlist to form a power distribution network. This is followed by a transient analysis, where
the worst voltage drop at various supply nodes is recorded.

Table 3.3: Decap Cases Analyzed for Distributed Decap Experiment

Case   Description
N      No decap (C1 = 0 & C2 = 0)
A      Only lumped decap (C1 = 0 & C2 ≥ 0)
B      Distributed decap C1 s.t. Acell += (20% · Acell) & C2 ≥ 0
C      Distributed decap C1 s.t. Acell += (30% · Acell) & C2 ≥ 0

In order to analyze the effect of lumped and distributed decap placement on the resulting voltage
drop, we compare the decap cases shown in Table 3.3. Case N represents a circuit without
decoupling capacitances C1 and C2. In case A, we add enough lumped decap C2 that the
voltage drop at various supply nodes is within the tolerable band. Cases B and C are used to analyze
the effect of adding distributed decap. Addition of decap to a standard cell increases the cell
area (refer to chapter 5). Hence, the amount of decoupling capacitance added to a standard cell can
be controlled by changing the cell area. In case B, the amount of distributed decap added to the
logic block is such that it leads to a 20% increase in the logic block area. In case C, a higher
value of decoupling capacitance is added by increasing the logic block area by 30%. In both cases B
and C, lumped decap is also added along with the distributed decap such that the voltage drop at
various supply nodes is within the tolerable noise band.

Table 3.4: Effect of Decap Addition on Block Area

Case   Block Area (λ)   Decap Value per Cell (fF)   Decap Value per Block (fF)
N      800              0                           0
B      960              14.4                        288
C      1050             37.2                        744

Table 3.4 shows the change in area of the basic block with increasing amount of decap per cell.
Intrinsic decap for a cell is assumed to be zero. Hence a basic block without any intentional
decap contributes zero decoupling capacitance. The basic block uses 20 parallel single height
inverters (INVX8), hence the block area is represented by the block width, measured in λ. The λ for
the 0.18µm technology node is 0.1µm. The decap value per cell shown in Table 3.4, measured in
femtofarads, corresponds to the increased cell area. A decap cell with MOSFET width W and channel
length L is added to INVX8 such that it leads to the given increase in cell area. The decap value
for this cell with W and L is calculated as per the capacitance equation given in chapter 5.
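
The relation between the columns of Table 3.4 can be checked with a short sketch. The data below is copied from the table; the dictionary layout and function name are assumptions introduced only for this illustration.

```python
CELLS_PER_BLOCK = 20

# (block width in lambda, decap per cell in fF) from Table 3.4.
table_3_4 = {"N": (800, 0.0), "B": (960, 14.4), "C": (1050, 37.2)}


def block_summary(case):
    """Return (decap per block in fF, area increase over case N in percent)."""
    width, decap_per_cell = table_3_4[case]
    base_width = table_3_4["N"][0]
    decap_per_block = CELLS_PER_BLOCK * decap_per_cell
    area_increase = 100.0 * (width - base_width) / base_width
    return decap_per_block, area_increase


summary_b = block_summary("B")
summary_c = block_summary("C")
```

The per-block decap is simply 20 times the per-cell value, and the area increase follows from the block widths relative to the undecorated block of case N.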

3.3 Results

We analyze different circuit configurations by varying the number of basic blocks and the
interconnect parasitic values in the model. Since similar results are obtained for the various
circuit configurations, we show results for only two of them here. Figure 3.1 shows the first
circuit configuration, containing one basic block and both supply and ground parasitics. Figure 3.4
shows another configuration with two basic blocks.

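[Figure 3.4 labels: supply VDD; supply nodes V3, V2, V1 and ground nodes G3, G2, G1; segment
impedances Z3, Z2, Z1 on both the supply and ground lines; lumped decap C2; basic blocks B2 and B1,
each with distributed decap C1.]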

Figure 3.4: Circuit Model with Two Basic Blocks for Distributed Decap Experiment

Table 3.5 and Table 3.6 show the results for these two circuit configurations respectively. Figure
3.5 and Figure 3.6 show the corresponding graphical representations. The impedance Zx value given in
the tables as x/y represents a series connection of a resistance with value x Ω and an inductance
with value y H. The voltage drop results in Table 3.6 correspond to simultaneous switching of both
basic blocks. A description of the column headings is given below:

Para Cases     Cases for different values of interconnect parasitics
Zx             Interconnect segment impedance represented as a series connection of R and L
Decap Cases    Different decap cases shown in Table 3.3
C1             Distributed decap in pF realized by padding INVX8 in the basic block
C2             Lumped decap in pF placed away from the load
CT             Total capacitance in the circuit (C1 + C2)
Ablk           Area of the basic block in λ
AC2            Area of the lumped capacitance C2 in λ
AT             Total area of the circuit (Ablk + AC2)

3.4 Analysis

As evident from the experimental results for both models, the distributed decoupling capacitor
approach does provide significant benefits in terms of the total decap requirement. The total amount
of decap required in the distributed case, as compared to the lumped decap approach, decreases as
the amount of decoupling capacitance per logic block is increased. Even a minimum amount of decap
added to the logic block reduces the overall decap budget. Although a slight area penalty with
minimum decap per block can be observed, the area increase is not consistent; it depends on the
interconnect parasitic values. As seen from Figures 3.5 and 3.6, there is a decrease in design area
for some parasitic cases. With decap case C, the design area in all cases is less than the design
area with lumped decap case A. The slight area increase in decap case B as compared to case A is
not of big concern for the following reasons:

1. Typically, design core area utilization is kept below 70-75%. This is done to accommodate
signal routing requirements and late stage design changes. Therefore, a slight area penalty
can be amortized at the overall design level due to the available whitespaces in the core and the
decrease in decap requirement.

2. Additionally, 10-20% of the area is reserved for decap allocation. This also creates enough
whitespace for accommodating decap cells. Figures 3.5 and 3.6 show a decap area requirement
larger than 10-20%. This is a result of model simplification, where the power supply current
can take only one path. In actual designs, current to a logic block can come from multiple
paths, substantially reducing the overall path resistance.
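
The trends discussed above can be read off Table 3.5 with a short sketch that compares cases B and C against the lumped case A. The numbers are copied from Table 3.5; the dictionary layout and function name are assumptions made for this illustration.

```python
# (CT in pF, AT in lambda) for decap cases A, B, C from Table 3.5.
table_3_5 = {
    "I":   {"A": (3.32, 1812), "B": (3.28, 1860), "C": (3.22, 1800)},
    "II":  {"A": (6.58, 2784), "B": (6.51, 2838), "C": (6.39, 2756)},
    "III": {"A": (3.61, 1900), "B": (3.57, 1896), "C": (3.50, 1896)},
    "IV":  {"A": (6.83, 2870), "B": (6.75, 2910), "C": (6.63, 2824)},
}


def change_vs_lumped(para_case, decap_case):
    """Percent change in total decap and total area relative to case A."""
    ct_a, at_a = table_3_5[para_case]["A"]
    ct, at = table_3_5[para_case][decap_case]
    return (100.0 * (ct - ct_a) / ct_a, 100.0 * (at - at_a) / at_a)


changes = {(p, d): change_vs_lumped(p, d)
           for p in table_3_5 for d in ("B", "C")}
```

A negative entry means the distributed case needs less decap or area than the lumped case. In this data the total decap change is negative for every parasitic case, while the area change for case B is positive in some parasitic cases and negative in others.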

Table 3.5: Analysis Results for Circuit Model Shown in Figure 3.1

Para    Z1          Z2        Decap   C1      C2      CT      Ablk    AC2     AT
Case    (Ω/H)       (Ω/H)     Case    (pF)    (pF)    (pF)    (λ)     (λ)     (λ)
I       0.1/0       1.5/0     N       0       0       0       800     0       800
                              A       0       3.32    3.32    800     1012    1812
                              B       0.288   2.99    3.28    960     900     1860
                              C       0.744   2.48    3.22    1050    750     1800
II      0.2/0       1.6/0     N       0       0       0       800     0       800
                              A       0       6.58    6.58    800     1984    2784
                              B       0.288   6.22    6.51    960     1878    2838
                              C       0.744   5.65    6.39    1050    1706    2756
III     0.1/0.1p    1.5/1p    N       0       0       0       800     0       800
                              A       0       3.61    3.61    800     1100    1900
                              B       0.288   3.28    3.57    960     996     1896
                              C       0.744   2.76    3.50    1050    846     1896
IV      0.2/0.1p    1.6/1p    N       0       0       0       800     0       800
                              A       0       6.83    6.83    800     2070    2870
                              B       0.288   6.46    6.75    960     1950    2910
                              C       0.744   5.89    6.63    1050    1774    2824

[Figure 3.5 panels: Total Decap Required (pF) and Design Area (λ), each plotted for Para Cases I to
IV with Decap Cases A, B, and C.]

Figure 3.5: Graphs Show Change in Area and Decap Requirement for Circuit Model in Figure 3.1
analyzed for different interconnect parasitic values

Table 3.6: Analysis Results for Circuit Model Shown in Figure 3.4

Para    Z1=Z2       Z3        Decap   2 × C1  C2      CT      2 × Ablk  AC2     AT
Case    (Ω/H)       (Ω/H)     Case    (pF)    (pF)    (pF)    (λ)       (λ)     (λ)
I       0.1/0       1.5/0     N       0       0       0       800       0       1600
                              A       0       35.55   35.55   800       10672   12272
                              B       0.288   34.66   35.24   960       10406   12326
                              C       0.744   33.26   34.75   1050      10046   12146
II      0.2/0       1.6/0     N       0       0       0       800       0       1600
                              A       0       66.21   66.21   800       19876   21476
                              B       0.288   64.30   64.88   960       19296   21216
                              C       0.744   61.39   62.88   1050      18424   20524
III     0.1/0.1p    1.5/1p    N       0       0       0       800       0       1600
                              A       0       35.91   35.91   800       10774   12374
                              B       0.288   35.04   35.62   960       10522   12442
                              C       0.744   33.73   35.22   1050      10134   12234
IV      0.2/0.1p    1.6/1p    N       0       0       0       800       0       1600
                              A       0       67.02   67.02   800       20116   21716
                              B       0.288   65.11   65.69   960       19548   21468
                              C       0.744   62.03   63.52   1050      18622   20722

[Figure 3.6 panels: Total Decap Required (pF) and Design Area (λ), each plotted for Para Cases I to
IV with Decap Cases A, B, and C.]

Figure 3.6: Graphs Show Change in Area and Decap Requirement for Circuit Model Shown in
Figure 3.4 analyzed for different interconnect parasitic values

Chapter 4

Proposed Voltage Drop Optimization Framework

As discussed in chapter 2, an accurate voltage drop analysis is possible only during the late
stages of the design flow. A design needs to go through synthesis followed by the place and route
stages before the adverse effects of the voltage drop problem can be analyzed. The reason is that
the voltage drop inside a chip depends not only on the power consumption of the logic circuits and
the power distribution network parasitics, but also on the physical placement of logic blocks and
their connectivity. Decoupling capacitors offer an effective way to tame the power supply noise at
this level. Decoupling capacitors are placed in the available whitespace (core area not utilized by
logic cells) on the chip. Special decoupling cells are designed to enable their placement with
standard cells in row based designs. The design of decoupling capacitor cells is discussed in
chapter 5. For the following discussion, a decoupling capacitor cell can be thought of as a passive
element offering a specific value of capacitance.

Figure 4.1 illustrates the complete framework to analyze and control the voltage drop by placing
decoupling capacitors in the available whitespaces. A design in RTL specification is synthesized
using the nominal standard cell library, OSULIB (described in chapter 5). Synopsys Design Compiler
is used to synthesize the design and to generate a gate-level netlist. This gate-level netlist is
placed and routed using Cadence Encounter. The synthesis and place-n-route flow can be found in [27].
The voltage drop analysis is then performed on the placed and routed design. The left branch in the
figure refers to the traditional method of controlling voltage drop, and the right branch highlights
our approach. Both of these approaches are discussed in subsequent sections. We use Synopsys'
PrimeRail tool for dynamic voltage drop analysis. In Section 4.3, we discuss the data preparation
needed to perform analysis using PrimeRail. Section 4.4 depicts the flow for performing the analysis
using PrimeRail. We end this chapter with a discussion of the benchmarks used for analysis in
Section 4.5.

[Figure 4.1 flow, top to bottom: Design Specifications; Behavioral/RTL Design; Logic Synthesis with
Design Compiler (libraries in LIB and DB format: OSULIB, DCFLIB; outputs: Gate Level Netlist, SDC);
Place-n-Route with Encounter (libraries in LIB and LEF format: OSULIB, UCDCLIB, DCFLIB; outputs:
DEF, SPEF, SDC); Voltage Drop Analysis (DvD) with PrimeRail (inputs: Milkyway reference library (DB,
LEF, LM, FRAM), PARA, VCD/SAIF or switching activity, user defined voltage drop threshold).
Traditional approach branch: Analyze Filler Cell Replacement With Decap Cells (decap masters);
Perform DvD; Insert Decap Cells; Voltage Drop Sign-off.
Proposed approach branch: DCOPT, Perform Virtual Decap Optimization; Perform DvD; ECO-Placer (cell
operations; LIB, LEF: UCDCLIB); ECO Placement; ECO Routing (Encounter); Perform DvD; Voltage Drop
Sign-off.]

Figure 4.1: Proposed Voltage Drop Optimization Approach

4.1 Traditional Approach

The traditional method of controlling voltage drop places decoupling capacitors by replacing
filler cells [3]. A placed design typically contains enough whitespace for two reasons: the core
dimension calculation has to reserve 10 to 20% of the area for decoupling capacitor placement, and
core utilization is typically kept below 100% (around 70 to 75%) to meet routing requirements.
Filler cells are inserted in this whitespace to provide proper well connectivity. Once the analysis
reveals voltage drop problems, filler cells in the affected area are replaced with decoupling
capacitor (decap) cells. We call this approach filler-based decap allocation. It offers an easy
solution, since replacing filler cells with decap cells does not require any placement
modification. The standard cells in the design are left untouched.

[Figure 4.2 panels: [A] Design with filler cells. [B] Replace all filler cells (virtually) and
perform DvD. [C] Iteratively remove decap cells to meet the voltage drop target, then instantiate
decaps. Legend: standard cell, filler cell, decap cell.]

Figure 4.2: PrimeRail Decap Optimization (Source [3])

Synopsys’ PrimeRail tool provides voltage drop optimization based on filler-based decap placement.
Figure 4.2 shows the decap optimization procedure. For this optimization to work, the design
must contain filler cells. The original design (Figure 4.2[A]) contains only standard cells and filler
cells. The tool first virtually replaces all filler cells with equivalently sized decap cells (Figure
4.2[B]), which requires a one-to-one correspondence between filler cell masters and decap masters. It
then performs multiple iterations of voltage drop analysis and selective removal of decap cells to
achieve the user-defined target reduction in voltage drop (Figure 4.2[C]). For each iteration, it
reports the reduction in voltage drop and the amount of decap required. At the end of the analysis,
PrimeRail provides an option to select the result of any iteration and perform the actual decap
insertion. The modified design is saved in the Milkyway database format. We design and characterize a library

DCFLIB (described in chapter 5) containing filler cells and equivalent decap masters to enable this
functionality. We compare results of our approach with Synopsys’ PrimeRail results of filler-based
decap insertion. Results are presented in chapter 8.
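
The iterative procedure of Figure 4.2 can be sketched as a greedy loop. This is only an illustration under stated assumptions, not PrimeRail's actual algorithm: the `analyze` callback stands in for a dynamic voltage drop run, and the site names, decap values, and toy drop model in the example are hypothetical.

```python
def optimize_filler_decaps(sites, analyze, target_drop):
    """Greedy sketch of filler-based decap optimization.

    sites: dict mapping filler site name -> decap value if that filler is replaced.
    analyze: callable taking the set of sites holding a decap and returning the
             peak voltage drop (stand-in for a voltage drop analysis run).
    target_drop: largest acceptable peak voltage drop.
    """
    kept = set(sites)  # step [B]: every filler virtually replaced by a decap
    history = [(len(kept), analyze(kept))]
    # step [C]: try to remove decaps one at a time, smallest first
    for site in sorted(sites, key=sites.get):
        trial = kept - {site}
        drop = analyze(trial)
        if drop <= target_drop:  # target still met without this decap
            kept = trial
            history.append((len(kept), drop))
    return kept, history


# Hypothetical example: three filler sites (decap in fF) and a toy drop model (mV).
sites = {"f1": 50, "f2": 100, "f3": 150}
toy_drop_mv = lambda kept: 300 - sum(sites[s] for s in kept) // 2
kept, history = optimize_filler_decaps(sites, toy_drop_mv, target_drop=200)
```

Each entry in `history` pairs the number of remaining decaps with the resulting drop, loosely mirroring the per-iteration report from which a designer selects a result.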

Although filler-based decap allocation provides an easy and effective solution, the transition to
deep submicron and nanometer technologies renders this approach less cost-effective. A combination
of factors, such as voltage scaling, increases in operating frequency and design complexity, and
the increase in interconnect parasitics per unit length with technology scaling, makes a design
more susceptible to power supply noise. As a result, filler-based decap allocation suffers from the
following two problems [19]:

• Filler-based decap allocation compromises power supply integrity by placing decaps away
from the switching nodes. Timing-driven placement usually packs logic cells close together, moving
the whitespace toward the block or core periphery. Decaps placed in this whitespace by replacing
filler cells therefore sit at a larger effective distance from the switching nodes. The larger
effective distance between switching node and decap corresponds to larger supply and ground
parasitics, which reduce the effectiveness of filler-based decap.

• As a consequence of the above problem, a higher value of decap is required to control the power
supply noise. The extra decap results in wasted area and higher power consumption, and reduces
the yield of the chip.

4.2 Proposed Approach

As is apparent from the problems indicated in the previous section, a distributed decap placement
approach is needed [19]. Decaps must be placed near the switching nodes to be cost-effective.
The experimental results presented in chapter 3 highlight the effectiveness of
placing decaps near the switching nodes. As decap is moved closer to the logic cells, the overall
decap budget reduces significantly, with little or no penalty in overall area compared
to a traditional lumped decap placement approach.

We, therefore, propose a distributed decap approach to control voltage drop problems around
the chip. We provide a complete design framework to analyze the voltage drop in a chip and
compensate for it by incorporating decoupling capacitors close to the violating nodes. This is done
by providing designers with an additional library, UCDCLIB, containing decap-padded logic
cells, along with the nominal cell library, OSULIB. Each logic cell in UCDCLIB is logically
equivalent to a logic cell in OSULIB, but additionally contains a specific amount of decap padded to it.
The design and characterization of UCDCLIB cells are presented in chapter 5.

As shown in Figure 4.1, the design is synthesized using the nominal cell library, which
in our case is the OSU standard cell library [28]. The design is placed and routed, and is analyzed
for initial voltage drop. We then algorithmically identify the regions whose voltage drop exceeds the
user-defined threshold. The decap requirement for these affected regions is satisfied iteratively by
replacing standard logic cells with equivalent decap-padded logic cells from UCDCLIB. The optimal
selection of the number of standard cells to replace and the calculation of the decap budget are done by
a C++ based decap optimization procedure, DCOPT, described in chapter 6. Once the decap budget
is calculated and the cell replacements are decided, the original placement needs to be
modified. The optimal modification of the original placement from a voltage drop point of view is done
by a C++ based Engineering Change Order placement tool, ECO-Placer, described in chapter 7.
The results shown in chapter 8 highlight the effectiveness of our approach, which reduces the
total decap budget substantially while providing a better voltage drop profile than the traditional
filler-based decap allocation approach.
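
A minimal sketch of how the decap requirement of one affected region might be met by swapping cells for their decap-padded equivalents. This is not the DCOPT algorithm of chapter 6; it is a simple greedy illustration, and the instance names and per-cell decap values are hypothetical.

```python
def pick_cells_for_region(required_ff, candidates):
    """Choose cells to swap until the region's decap requirement is met.

    required_ff: decap required by the region (fF).
    candidates: dict mapping instance name -> decap (fF) gained if that
                instance is replaced by its decap-padded equivalent.
    Returns (chosen instances, total decap gained).
    """
    chosen, total = [], 0
    # Largest contribution first, so fewer cells need to be replaced.
    for inst, cap in sorted(candidates.items(), key=lambda kv: -kv[1]):
        if total >= required_ff:
            break
        chosen.append(inst)
        total += cap
    return chosen, total


# Hypothetical region needing 120 fF with three candidate instances.
chosen, total = pick_cells_for_region(120, {"u1": 40, "u2": 100, "u3": 30})
```

In an iterative flow, the voltage drop would be re-analyzed after each batch of replacements, and the requirement updated before the next pass.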

4.3 Data Preparation for Voltage Drop Analysis

Data preparation is an important step in voltage drop analysis. PrimeRail requires
cell and design data in a specific format before it can analyze a design for potential voltage
drop problems. This section gives a brief overview of the data preparation needs [3].

4.3.1 Milkyway Database

The Milkyway database is a common data repository for Synopsys integrated circuit (IC) design
tools. Easy and efficient interoperability among various Synopsys tools is achieved by capturing
cell library and design data in the Milkyway database format. A common database eliminates the
need to exchange large design files, thereby saving data translation time and preventing errors
and inconsistencies due to semantic mismatches between tools. The database provides an
application programming interface (API) for database access and a Scheme language extension for easy
integration and customization.

The Synopsys Milkyway database contains multiple directories and files in a tree structure. The
root node of the directory structure can be a reference library or a design library. Each library
can contain a complete design, modules representing a design, or logic cells. Various data views are
generated for each of these library components to characterize the library. Figure 4.3 lists the
important views and their content. Each view provides specific information during the design flow.
A library whose components are used within another library (or a design) is referred to as a
reference library. A design library is a library that instantiates components from reference
libraries. A design library can itself serve as a reference library if it is to be used in
another design in a hierarchical fashion.

Milkyway Library views:

CEL      Physical data of cell/design imported through flow
FRAM     Placement and routing details of cell/design
LM       Timing, power, and logic information of cell/design
LOGIC    Timing constraints, clock definitions, and netlist information (same as Netlist + SDC)
NETL     Cell/design netlist information
PARA     Parasitic RC view of cell/design

Figure 4.3: Milkyway Library Views and their Description

We generate Milkyway reference libraries for our libraries, UCDCLIB and DCFLIB, as well as for the
nominal library, OSULIB, since cells from these libraries are instantiated in the benchmark
designs. The reference libraries are generated from the physical (LEF) and timing (LIB, DB) views of the
libraries (library view generation is described in chapter 5). We also generate a Milkyway design
library for each of our benchmark designs. The steps to generate design and reference libraries are given
in Appendix A.

4.3.2 Cell Parasitic Extraction

As discussed in chapter 2, logic gates present a significant load to the power distribution
network due to various intrinsic capacitive effects. The cell load capacitance is usually shielded by
the transistor channel resistance and hence does not appear as a direct load to the power
distribution network. However, the intrinsic cell capacitance does load the power grid and
is therefore important to characterize, since it can provide a decoupling effect. PrimeRail uses HSPICE
to characterize the cell resistance and capacitance by performing DC and AC analysis, respectively.
During AC analysis, a sinusoidal waveform is applied to a gate, and the intrinsic capacitance is
calculated from the magnitude and phase of the current response. A characterized gate is then represented by
a Π-model (C-R-C, with the first C for the intrinsic capacitance, the last C for the load capacitance, and
R for the channel resistance) during final voltage drop analysis. The cell characterization
information is captured as a PARA (parasitic) view in the Milkyway reference library. Appendix B
shows the steps for cell parasitic characterization.
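
As a worked illustration of extracting a capacitance from the magnitude and phase of a sinusoidal current response, the sketch below assumes the gate looks like a simple series R-C between the applied source and ground. That single-branch model is an assumption made here for clarity; the characterization actually performed by PrimeRail and HSPICE may differ, and the function name is hypothetical.

```python
import cmath
import math


def series_rc_from_ac(v_amp, i_amp, phase_rad, freq_hz):
    """Recover R and C of an assumed series R-C load from an AC measurement.

    v_amp: amplitude of the applied sinusoidal voltage (phase reference, 0 rad).
    i_amp: amplitude of the current response.
    phase_rad: phase by which the current leads the voltage.
    freq_hz: frequency of the applied sinusoid.
    """
    omega = 2.0 * math.pi * freq_hz
    z = v_amp / cmath.rect(i_amp, phase_rad)  # complex impedance Z = V / I
    r = z.real                                # series resistance
    c = -1.0 / (omega * z.imag)               # Z = R - j/(omega*C) for a series R-C
    return r, c
```

For a purely capacitive load the current leads the voltage by 90 degrees and the recovered resistance is close to zero.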

4.3.3 Cell Current Characterization

Dynamic voltage drop analysis requires peak current information during a switching event to
calculate peak voltage drop, as discussed in Chapter 2. PrimeRail captures the current waveforms of
logic gates by performing HSPICE simulations for various input slope and output load conditions.
Only specific points in the current waveform are captured, together with their time stamps. The
piecewise linear data is stored in the form of a lookup table in the Milkyway reference library
database. Library current characterization steps are discussed in Appendix B. Cell library
characterization for parasitic extraction and current waveform generation requires a transistor
model file (for the 0.18 µm node) and a SPICE netlist for each cell.
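
A piecewise linear waveform stored as a few time-stamped points can be evaluated by linear interpolation. The sketch below shows one way to do this; the data layout (a sorted list of `(time, current)` pairs) is an assumption for illustration and is not the Milkyway lookup table format.

```python
from bisect import bisect_right


def pwl_current(points, t):
    """Evaluate a piecewise linear current waveform at time t.

    points: list of (time, current) pairs sorted by strictly increasing time.
    Returns 0.0 outside the captured time window.
    """
    times = [p[0] for p in points]
    if t < times[0] or t > times[-1]:
        return 0.0
    k = bisect_right(times, t)
    if k == len(points):          # t is exactly the last time stamp
        return points[-1][1]
    (t0, i0), (t1, i1) = points[k - 1], points[k]
    return i0 + (i1 - i0) * (t - t0) / (t1 - t0)


def peak_current(points):
    """Peak of a piecewise linear waveform, which always lies on a stored point."""
    return max(i for _, i in points)
```

Because the waveform is linear between stored points, its peak is always one of the stored values, which is why capturing only specific points is enough to recover the peak current.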

4.3.4 Additional Design Inputs

Apart from cell library characterization information, PrimeRail requires additional information
to successfully perform a dynamic voltage drop analysis. Additional data required for analysis
include:

1. Design placement (.def): The final placed and routed design can be saved in DEF format. The
Milkyway design library is generated by importing the place-n-routed design in DEF format.

2. Synopsys Design Constraints (.sdc): The SDC file is generated by Synopsys Design
Compiler during logic synthesis. This file is required for performing the gate-level power analysis
described in the next section.

3. Signal Net Parasitic file (.spef): A Standard Parasitic Exchange Format (SPEF) file can be generated
after a design is placed and routed. SPEF is an IEEE standard format for capturing parasitic
information associated with the signal nets in a design. Signal net parasitics are required to
calculate signal delay and gate power consumption.

4. Post-Route Verilog Netlist (.v): As a design goes through the placement stages, the tool adds
buffers for timing optimization and clock tree synthesis. Hence, the final placed
design netlist is not the same as the gate-level netlist produced by logic synthesis. The post-route
netlist is required for the design power analysis described in the next section.

5. Value Change Dump File (.vcd): A Value Change Dump (VCD) file contains signal transitions
generated by simulating the design using ModelSim. This file is required only
if vector-based dynamic analysis is to be performed.

4.4 PrimeRail: Voltage Drop Analysis Flow

Figure 4.4 illustrates the cell-level dynamic voltage drop analysis flow. The input place-n-routed
design is captured as a Milkyway design library. Milkyway reference libraries are created for the OSULIB,
DCFLIB, and UCDCLIB libraries. Cell parasitic and current information is stored in the Milkyway
database. Dynamic voltage drop analysis is then performed in the three steps discussed below. When the
analysis is complete, PrimeRail creates a voltage waveform database (stored in the Milkyway design
library database) and voltage violation reports. The voltage drop profile of a design can be analyzed
graphically using the generated maps. A detailed description of the steps to perform dynamic voltage
drop analysis using PrimeRail is given in Appendix C.

4.4.1 Power Analysis and Current Waveform Generation

The PrimeRail dynamic analysis matrix solver needs current waveforms at cell instance power
and ground ports to calculate timing-dependent voltage drops (or rises) on the power and ground


[Figure 4.4 flow diagram. The input design (DEF, LEF) is captured as a Milkyway design library.
OSULIB, DCFLIB, and UCDCLIB are captured as Milkyway reference libraries and go through library
characterization (current characterization and parasitic extraction). Power analysis using
PrimeTime-PX (PT-PX script, Verilog netlist, SDC, SPEF, VCD/SAIF) feeds current waveform
generation. Current waveform generation and power grid (PG) extraction feed rail analysis (DvD),
which produces the voltage drop map and results.]

Figure 4.4: PrimeRail Voltage Drop Analysis Flow

parasitic network. Library characterization has already generated current waveforms for individual
cells. However, in a design, the current consumption, and hence the power, of a cell depends on its
connectivity with other cells. PrimeRail uses PrimeTime-PX (PT-PX) to first perform gate-level
power analysis. PT-PX builds a detailed power profile of the design based on the circuit connectivity
(from the Verilog netlist), the switching activity information (VCD/SAIF), the net parasitics
(SPEF), and the cell-level power behavior data in the Synopsys database format (.db) library, which
can be either a nonlinear power model (NLPM) or a Composite Current Source (CCS) library. It
then calculates the power behavior of the circuit at the cell level and reports the power consumption

at the chip, block, and cell levels. Gate-level power analysis depends on input vectors. These input
vectors can be provided through a VCD file. In the absence of a VCD file, switching
activity information can be provided through a Switching Activity Interchange Format (SAIF) file. A SAIF
file is generated from either gate-level or RTL simulation. An RTL SAIF file captures switching activity
for only part of the design; PT-PX propagates this partial switching activity throughout the whole design.

Once the power profile of a design is calculated, the power consumption values of the cells are used to
scale the cell current waveforms available from the library characterization data. Hence, based on
the library characterization data and PT-PX power reports, PrimeRail creates cell instance profiles,
which include dynamic current waveforms and parasitics of the power supply network for all the
power and ground ports of each cell in the design.
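
The source does not spell out the scaling rule, so the sketch below illustrates one plausible interpretation only: the characterized waveform is scaled so that the supply voltage times the average current over one clock period equals the power reported for the cell instance. The function name and the rule itself are assumptions.

```python
def scale_waveform(points, vdd, period, cell_power):
    """Scale a piecewise linear current waveform to match a reported power.

    points: list of (time, current) pairs sorted by time (library waveform).
    vdd: supply voltage (V).
    period: clock period over which the power is averaged (s).
    cell_power: average power reported for this cell instance (W).
    """
    # Charge delivered by the unscaled waveform (trapezoidal integration).
    charge = sum((t1 - t0) * (i0 + i1) / 2.0
                 for (t0, i0), (t1, i1) in zip(points, points[1:]))
    library_power = vdd * charge / period   # average power of the unscaled waveform
    k = cell_power / library_power          # assumed uniform scale factor
    return [(t, i * k) for t, i in points]
```

A uniform scale factor preserves the shape and timing of the characterized waveform while changing its peak, which is the quantity that matters most for peak voltage drop.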

4.4.2 PG Extraction

The chip power grid (PG) consists of multiple metal layers. The PG extraction step extracts
the parasitics of the power grid network. PrimeRail’s built-in extraction engine can extract resistance
and capacitance parasitics. However, for RC extraction, it requires a TLUPlus model, which defines
the technology parameters for interconnect. A TLUPlus model can be generated using an Interconnect
Technology File (ITF) available from the Synopsys Star-RCXT tool or from the foundry. In the absence of
TLUPlus models, we extract only resistance parasitics. Resistance extraction can be done
using the Milkyway technology file. The Milkyway technology file for the 0.18 µm node can be obtained from
the OSU library.

4.4.3 Rail Analysis

During rail analysis, PrimeRail combines the cell current and parasitic models with the resistive
power distribution network to solve for the voltage drop at each node in the resistive network.
To obtain more accurate results, PrimeRail needs the locations of the ideal voltage sources and
ideal power supplies in the design. The voltage sources can be identified graphically or by specifying
locations on the die. We provide an ideal voltage source at the middle of the power ring on each side
of the die. The voltage sources are placed on the VDD ring around the core. The power supply
locations can be saved in a file and loaded during the analysis.
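
To make the idea of solving a resistive network for node voltages concrete, the sketch below handles a deliberately simplified case: a single one-dimensional rail with an ideal source at each end, equal segment resistances, and a known current drawn at each node. This is a toy static model for illustration, not the PrimeRail solver, which works on the full extracted grid with time-varying currents.

```python
def rail_voltages(vdd, r_seg, currents):
    """Node voltages on a 1-D rail fed by ideal sources at both ends.

    vdd: ideal source voltage at each end of the rail.
    r_seg: resistance of each rail segment between neighbours.
    currents: current drawn at each interior node, left to right (non-empty).
    """
    n = len(currents)
    # KCL at node k gives: -v[k-1] + 2*v[k] - v[k+1] = -r_seg * I[k]
    rhs = [-r_seg * i for i in currents]
    rhs[0] += vdd    # left neighbour of the first node is an ideal source
    rhs[-1] += vdd   # right neighbour of the last node is an ideal source
    # Thomas algorithm for the tridiagonal system (off-diagonals are -1).
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for k in range(1, n):
        m = 2.0 + c_prime[k - 1]
        c_prime[k] = -1.0 / m
        d_prime[k] = (rhs[k] + d_prime[k - 1]) / m
    v = [0.0] * n
    v[-1] = d_prime[-1]
    for k in range(n - 2, -1, -1):
        v[k] = d_prime[k] - c_prime[k] * v[k + 1]
    return v
```

In this toy model the worst drop appears at the node farthest from both sources, which matches the intuition that supply points should be distributed around the ring.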

PrimeRail reports the minimum and maximum values of voltage drop (for a power net) or voltage

rise (for a ground net) to the command window and to the log file. By default, the tool also reports
the top five instances of peak voltage drop and the times at which they occur. A voltage drop violation
report can also be generated, which lists all the cell instances experiencing an absolute voltage
drop greater than a user-defined level. PrimeRail can report only the top 100 cell instances by peak
voltage drop. PrimeRail also reports voltage drop values for each metal segment. We use this
report to perform the decoupling capacitor optimization described in Chapter 6.

4.5 Experimental Setup

The proposed approach discussed in this chapter requires three components, namely UCDCLIB,
DCOPT, and ECO-Placer. These three components, along with the PrimeRail-based dynamic
voltage drop analysis, complete the overall framework. We compare the results of our approach with the
filler-based decap optimization approach. Specifically, we compare the two approaches in terms of
overall decap budget requirement and effective voltage drop reduction. We categorize the results into
four different cases, as shown in Table 4.1.

Subsection 4.5.1 provides further details about these cases. The results of two approaches cate-
gorized in four different cases for each benchmark are presented in chapter 8.

4.5.1 Benchmark Description

In order to observe the effects of voltage drop and apply the optimization process, we need
designs with sufficient complexity. We choose benchmarks from the HLS’95 [29] and ITC’99 [30]
pools. We perform the comparative analysis on the benchmarks shown in Table 4.2.

4.5.2 Benchmark Analysis

Table 4.1: Design Cases for Voltage Drop Analysis

Case 1: Pre-Opt. Pre-optimized design (nominal design)
    Library cells used: OSULIB, DCFLIB
    Decap source: Intrinsic cell decap
    * Design contains standard logic and filler cells.
    * No voltage drop optimization performed.
    * Design contains intrinsic cell decap only.

Case 2: Post-Opt(F). Post-optimized design (filler-based decap optimization)
    Library cells used: OSULIB, DCFLIB
    Decap source: Intrinsic cell decap, filler-based decap
    * Design contains standard logic cells and filler cells.
    * Voltage drop optimization performed by the filler-based decap approach.

Case 3: Post-Opt(D). Post-optimized design (decap-padded cells based optimization)
    Library cells used: OSULIB, DCFLIB, UCDCLIB
    Decap source: Intrinsic cell decap, decap-padded cells
    * Design contains standard logic cells and decap-padded logic cells along with filler cells.
    * Voltage drop optimization performed using our approach.

Case 4: Post-Opt(DF). Post-optimized design (decap-padded cells and filler-based decap optimization)
    Library cells used: OSULIB, DCFLIB, UCDCLIB
    Decap source: Intrinsic cell decap, filler-based decap, decap-padded cells
    * Design contains standard logic cells and decap-padded logic cells along with filler cells.
    * Voltage drop optimization performed using ours as well as the filler-based decap approach.

The following steps are used to analyze each benchmark shown in Table 4.2 for the four design cases
given in Table 4.1:

1. The RTL description of the design in VHDL format is synthesized using Synopsys Design Compiler
[31]. The technology library used during synthesis is OSULIB for the 0.18 µm node from Oklahoma
State University. This is the reference library for analysis. The synthesis constraints given are clock
frequency and clock transition. The outputs of synthesis are a gate-level netlist (.v) and a Synopsys
design constraint file (.sdc).

2. The gate-level netlist is placed and routed using Cadence SOC Encounter [32]. The inputs to
Encounter are the gate-level netlist (.v), the physical (.lef) and timing (.lib, .db) views of OSULIB,
and the design constraints (.sdc). Filler cells are inserted in the design from the DCFLIB library. The
design is supplied with four ideal vdd and gnd input points. These input supply points are attached
at the middle of the power ring stripe on each side of the die. Various placement and routing
constraints, such as core utilization, aspect ratio, and power grid design attributes, are specified
in Chapter 7 with the benchmark results. The post-layout outputs are the design placement (.def) and
the signal parasitic file (.spef).

Table 4.2: Benchmarks Used for Analysis

Benchmark Name        Description                                    Post-Synthesis Gates   Post-Layout Gates
Barcode16 (HLS’95)    16 parallel copies of Barcode Reader Design    6627                   7574
B14 (ITC’99)          Viper Processor-Subset                         10201                  11780
B18 (ITC’99)          2 copies of B14 and 6 copies of 80386          58972                  71761

3. The post-layout design is imported into Synopsys’ PrimeRail voltage drop analysis tool to
generate a Milkyway design library, as discussed earlier in this chapter. The OSULIB and DCFLIB
Milkyway reference libraries are attached to the design, and dynamic voltage drop analysis is
performed. We use the vector-less flow by providing a Switching Activity Interchange Format (.saif)
file for each design.

The results are classified under case 1: "pre-opt" (results before voltage drop optimization).
Note that the decap contribution in the pre-opt design comes from intrinsic cell decap. No explicit
decap is added to the design at this stage.

4. We then use PrimeRail’s filler-based decap insertion flow to optimize the voltage drop around
the chip. The decap insertion flow requires a one-to-one correspondence between filler cell
dimensions and decap cell dimensions. Master decap cells are referenced from the DCFLIB library.

PrimeRail optimizes voltage drop iteratively using the filler-based decap insertion approach
discussed earlier in this chapter. The final voltage drop results, the number of filler cells replaced,
and the total decap budget required are stored under case 2: "post-opt (F)" (results after
voltage drop optimization using only filler-based decaps).

5. The pre-opt design from step 3 is now optimized for voltage drop using our approach. We use the
C++ based decap optimization procedure (DCOPT, described in Chapter 6) in conjunction
with PrimeRail’s voltage drop analysis to reduce the voltage drop around the chip iteratively.
The outputs of DCOPT are the total decap budget and the list of logic cells to be replaced by
decap-padded standard cells from the UCDCLIB library and standalone decap cells from DCFLIB.
The original design placement is modified optimally to incorporate these design changes
using a C++ based Engineering Change Order placer (ECO-Placer, described in Chapter 7).
The modified design is ECO routed using the Cadence SOC Encounter ECO flow, and the post-layout
design is saved (in .def format).

As described in Chapter 6, the DCOPT optimization algorithm keeps PrimeRail in the iteration loop
for voltage drop analysis. DCOPT depends on the violation report generated by PrimeRail during each
iteration. However, when PrimeRail is run in batch mode to generate a violation report, it crashes
with a segmentation fault after a random number of iterations. The bug in PrimeRail has been
verified by Synopsys (STAR Case 9000269422), but has not been resolved as of this thesis write-up.
Hence, for the final voltage drop optimization experiments, we graphically identify hotspot regions
(regions of high drop) and use Perl scripts to list the cells in each region. These cells are
eventually replaced by cells from UCDCLIB and DCFLIB using ECO-Placer. Although this graphical
identification bypasses the functionality of DCOPT, it does not undermine the overall optimization
framework. We demonstrate the effectiveness of DCOPT in Chapter 6 by performing a few voltage
drop optimization iterations in non-batch mode. PrimeRail works correctly if the violation report
is generated through the GUI. Hence, the PrimeRail report generation is done through the GUI,
and the rest of the DCOPT operation follows the algorithm described in Chapter 6. Note
that ECO-Placer is in no way affected by this process, since it is a stand-alone tool.
The results in Chapter 7 demonstrate the effectiveness of our ECO-Placer algorithm.
If the bug is resolved in the future, we will also include the optimization results shown in
Chapter 8 using the DCOPT algorithm.

6. Similar to step 3, the post-layout design from step 5 is analyzed for voltage drop. The design
analysis requires the UCDCLIB Milkyway reference library to be attached at this stage. The voltage
drop analysis results so obtained are classified under case 3: "Post-Opt (D)" (results after
voltage drop optimization using only decap-padded standard cells).

7. Finally, we perform PrimeRail’s filler-based decap optimization, similar to step 4, on the
Post-Opt (D) design from step 6. The results at this step are denoted under case 4: "Post-Opt
(DF)" (results after voltage drop optimization using both the decap-padded standard cells and
filler-based decaps).
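
The manual workaround in step 5 (listing the cells that fall inside a graphically identified hotspot region) can be sketched as a simple bounding-box filter. The thesis uses Perl scripts for this; the Python sketch below is only a stand-in for illustration, and the instance names, coordinates, and data layout are hypothetical.

```python
def cells_in_region(placements, region):
    """List the instances whose placement origin lies inside a rectangular region.

    placements: dict mapping instance name -> (x, y) placement origin.
    region: (x_min, y_min, x_max, y_max) bounding box of the hotspot.
    """
    x_min, y_min, x_max, y_max = region
    return sorted(
        name
        for name, (x, y) in placements.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    )


# Hypothetical placement data and hotspot box.
placements = {"U1": (10.0, 20.0), "U2": (55.0, 60.0), "U3": (58.0, 40.0)}
hot_cells = cells_in_region(placements, (50.0, 30.0, 70.0, 70.0))
```

In a real flow the coordinates would come from the placed design (for example, the DEF file), and the resulting list would be handed to the cell replacement step.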

Chapter 5

UCDCLIB: Decap Padded Standard Cell Library Design

The standard cell based design approach is a widely used method for creating Application-Specific
Integrated Circuits (ASICs). The use of standard cells in a design flow reduces ASIC development
time considerably by providing a high degree of automation. From the synthesis phase
to final layout generation, a standard cell library promotes a highly modular and independent design
framework.

Results from chapter 3 highlight the importance of placing decaps near the switching nodes. To
enable this, we develop a special class of standard cell library, UCDCLIB (University
of Cincinnati Decoupling Capacitor padded Standard Cell Library), in which standard logic cells are
padded with a decoupling capacitor. We modify logic cells from the OSU standard cell library to
develop the UCDCLIB cells. Section 5.1 provides details about the OSU library cells. A brief overview
of the various sources of MOSFET capacitance is presented in Section 5.2, which forms the basis for
on-chip decoupling capacitor design. The developed libraries are discussed in Section 5.3. Lastly, in
order to facilitate the use of decap-padded standard cells at various stages of the design flow, we
discuss the method to characterize the library cells in Section 5.4.

5.1 Nominal Standard Cell Library

The OSU standard cell library (formerly the IIT standard cell library) offers various logic cells ranging
from a basic inverter to a complex full adder [28, 33]. We treat this library as the nominal standard
cell library and use it for synthesis and layout of the benchmark designs. As described in the
section on decap sources, each OSU library cell has an associated parasitic capacitance due to the
cell structure. This parasitic capacitance acts as intrinsic decap. The intrinsic decap, along with
other details such as logic function and cell area (only the width in λ is considered for area, since
all cells are single-height cells), is shown for each OSU library cell in Table 5.5 (after Section 5.3).
The cell nomenclature follows the pattern <gate name><#n>X<#m>, where gate name refers to the gate
function name, #n is the number of inputs of the gate, and #m is the driving strength of the gate.
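
The naming pattern can be parsed mechanically, as the sketch below shows. The regular expression and the example cell names (such as NAND2X1 and INVX4) are illustrative assumptions based on the pattern described above; cells whose numeric field encodes something other than an input count would need special handling.

```python
import re

# <gate name><#n>X<#m>: letters, optional input count, literal X, drive strength.
_CELL_NAME = re.compile(r"([A-Z]+?)(\d*)X(\d+)")


def parse_cell_name(name):
    """Split a cell name into (gate function, number of inputs or None, drive strength)."""
    match = _CELL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized cell name: {name!r}")
    gate, n_inputs, drive = match.groups()
    return gate, int(n_inputs) if n_inputs else None, int(drive)
```

The gate name group is matched lazily so that a function name containing the letter X (for example XOR) is not confused with the X that separates the input count from the drive strength.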

5.2 MOSFET Gate Capacitance

On-chip decoupling capacitance is typically realized as the gate capacitance of MOSFET devices.
An understanding of MOSFET capacitance is therefore useful for designing a proper
decoupling capacitor. A detailed analysis of MOSFET capacitance can be found in [4]. In this section,
we briefly summarize from [4] the capacitances contributed by a MOSFET transistor (Figure 5.1):

1. Junction capacitance: due to the reverse-biased pn junctions formed at the substrate-source
and substrate-drain interfaces.

2. Overlap capacitance ($C_O$): due to the MOSFET structure.

3. Channel capacitance ($C_C$): due to the conducting channel below the gate terminal.

Of these three sources, only the last two contribute to the gate capacitance of the MOSFET.
The total gate capacitance ($C_G$) is thus given as:

$$ C_G = C_O + C_C \qquad (5.1) $$

Overlap Capacitance ($C_O$):

Overlap capacitance arises from the overlap of the drain and source diffusions with the gate oxide, as
shown in Figure 5.1. The lateral diffusion of source and drain under the gate oxide reduces the drawn
channel length $L_d$ of the transistor by an amount $\Delta L = 2 \cdot X_d$. The overlap capacitance
depends on the fabrication process and assumes a fixed value for a particular process. It is given as:

$$ C_O = C_{OX} X_d W \qquad (5.2) $$

where $C_{OX}$ is the gate oxide capacitance per unit area, defined as the parallel plate capacitance
between the MOSFET gate and the conducting channel. In this case, the overlapped drain and source
region acts as the conducting channel. $X_d$ is the lateral diffusion amount, and $W$ is the width of
the channel.

Figure 5.1: Capacitance Sources in MOSFET

Channel Capacitance ($C_C$):

The separation of the conducting channel and the gate conductor by the gate oxide results in a channel
capacitance similar to a parallel plate capacitance. The channel capacitance depends on the operating
mode of the MOSFET, as shown in Figure 5.2. During cutoff mode ($V_{GS} \le 0$ for NMOS), there
exists no channel, and the total channel capacitance formed by the parallel plates of gate and substrate
($C_{GCB}$) is equal to

$$ C_C = C_{OX} W L \qquad (5.3) $$

where $L$ is the effective length of the channel.

The presence of a uniform channel during the linear mode ($V_{GS} > V_{DS}$) divides the total channel
capacitance given by Equation 5.3 equally between gate and source ($C_{GCS}$) and gate and drain
($C_{GCD}$). During saturation mode ($V_{DS} > V_{GS} - V_T$), the capacitance exists only between gate
and source due to the pinch-off effect. The capacitance value depends on the area of the parallel plates
formed by the gate and the channel connected to the source.

[Figure 5.2 panels: Cut-Off, Linear, Saturation. Each panel shows the source (S), gate (G), and
drain (D) terminals over a P-substrate (P-Sub).]

Figure 5.2: MOSFET Channel Capacitance

The variation of gate capacitance with respect to gate-to-source voltage ($V_{GS}$) is shown in Figure
5.3. As seen from Figure 5.3, in order to obtain a stable gate capacitance, the MOSFET must be
operated in the linear region. This can be done by keeping the drain-to-source voltage ($V_{DS}$) less
than the gate-to-source voltage ($V_{GS}$) at all times during its operation. This design information is
used to realize a stable capacitance using MOSFET devices, as described in the next section. Table 5.1
summarizes the capacitance values of a MOSFET transistor for the different operating regions.

Figure 5.3: MOSFET Gate Capacitance Variation with respect to VGS

Table 5.1: Channel Capacitance of MOSFET for Different Operating Regions (Source [4])

Region       $C_{GCB}$      $C_{GCS}$            $C_{GCD}$        $C_C$                $C_G$
Cut-off      $C_{OX} W L$   0                    0                $C_{OX} W L$         $C_{OX} W (L + 2 X_d)$
Linear       0              $C_{OX} W L / 2$     $C_{OX} W L / 2$ $C_{OX} W L$         $C_{OX} W (L + 2 X_d)$
Saturation   0              $(2/3) C_{OX} W L$   0                $(2/3) C_{OX} W L$   $(2/3) C_{OX} W (L + 2 X_d)$
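
The channel capacitance split in Table 5.1 and the linear-region gate capacitance can be evaluated numerically as in the sketch below. The function names are illustrative, and any parameter values passed in (oxide capacitance per unit area, device dimensions, lateral diffusion) are the caller's assumptions, not values taken from the 0.18 µm process used in this work.

```python
def channel_caps(c_ox, w, l, region):
    """Channel capacitance components per Table 5.1.

    c_ox: gate oxide capacitance per unit area.
    w, l: channel width and effective length (same length unit as c_ox's area).
    region: "cutoff", "linear", or "saturation".
    """
    full = c_ox * w * l
    if region == "cutoff":
        return {"CGCB": full, "CGCS": 0.0, "CGCD": 0.0}
    if region == "linear":
        return {"CGCB": 0.0, "CGCS": full / 2.0, "CGCD": full / 2.0}
    if region == "saturation":
        return {"CGCB": 0.0, "CGCS": 2.0 * full / 3.0, "CGCD": 0.0}
    raise ValueError(f"unknown region: {region!r}")


def linear_gate_cap(c_ox, w, l, x_d):
    """Total gate capacitance in the linear region: C_OX * W * (L + 2 * X_d)."""
    return c_ox * w * (l + 2.0 * x_d)
```

For example, with a hypothetical `c_ox` of 8 fF/µm², `w` of 10 µm, `l` of 1 µm, and `x_d` of 0.05 µm, `linear_gate_cap` evaluates to about 88 fF, of which 80 fF is channel capacitance and 8 fF is overlap capacitance.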

5.3 Decap Library

It is clear from the previous section that the MOSFET gate capacitance can be used to design a decou-
pling capacitor. Either a PMOS or an NMOS device can be used for this purpose. For a stable decoupling
capacitor, the MOSFET must be operated in the linear region. A stable NMOS decap can be designed by
connecting the gate terminal to the supply voltage and shorting the source, drain, and substrate terminals to
ground, as shown in Figure 5.4. Likewise, a stable PMOS decap can be realized by connecting the
gate terminal to ground and the source, drain, and substrate to the supply voltage. From Table
5.1, MOS decaps can be characterized by the following equation:

$$C_G = C_{OX} W (L + 2X_d) \tag{5.4}$$

where the parameters are as defined earlier.

Figure 5.4: NMOS Decap and its Equivalent Model

The transient response of a standard decap is affected by the parasitic resistance of the channel.
A higher channel resistance slows the rate at which the decap releases charge and makes it less
effective. Since the decap is built from a standard MOSFET transistor, the governing equation of a
standard MOSFET can be used to characterize the low-frequency resistance of a standard decap [34]:

$$R_{eff} = \frac{L}{6 \mu C_{OX} W (V_{GS} - V_T)} \tag{5.5}$$

where µ is the mobility, VGS (or VGD, since source and drain are tied) is the voltage across the oxide,
and VT is the threshold voltage.

From Equation 5.5, it is clear that Reff is proportional to the channel length L. For a
fast transient response, a decap design should therefore keep L reasonably small so that Reff stays
small. To capture the transient behavior, a decap can be modeled as a series RC circuit, as shown in
Figure 5.4, in which both Reff and the effective capacitance Ceff are taken into account.
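
As a rough illustration of the series RC view, the sketch below evaluates Equations 5.4 and 5.5 and their product. It assumes that Ceff can be approximated by CG from Equation 5.4, and every numeric value is a placeholder picked for illustration rather than a parameter taken from the thesis or from a TSMC model file.

```python
def decap_rc(cox, mu, w, l, xd, vgs, vt):
    """Return (R_eff, C_G, tau) for a MOS decap in SI units."""
    c_g = cox * w * (l + 2 * xd)               # Equation 5.4
    r_eff = l / (6 * mu * cox * w * (vgs - vt))  # Equation 5.5
    return r_eff, c_g, r_eff * c_g             # tau of the series RC model


# Placeholder device values (assumptions, not thesis data).
params = dict(cox=8e-3,    # F/m^2
              mu=0.035,    # m^2/(V*s)
              xd=0.0,      # m
              vgs=1.8,     # V
              vt=0.5)      # V

r1, c1, tau1 = decap_rc(w=9e-6, l=0.18e-6, **params)
r2, c2, tau2 = decap_rc(w=18e-6, l=0.18e-6, **params)   # doubled width
r3, c3, tau3 = decap_rc(w=9e-6, l=0.36e-6, **params)    # doubled length
print(tau1, tau2, tau3)
```

Under these two equations the time constant does not depend on W, because W cancels in the product of Reff and CG, while a longer channel raises both factors. This is consistent with the guidance above to keep L small and to scale the capacitance through W.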

The cost-effectiveness of a MOS decap in a row-based standard cell design style can be im-
proved by designing a single-height decap cell containing both a PMOS and an NMOS device, as shown in
Figure 5.5. This decap cell can be treated just like any other standard logic cell. Decap standard
cells have no signal connectivity; they connect only to the supply and ground rails. Hence, they are
also referred to as physical-only cells. Although other approaches to decap design have been
proposed for specific application and reliability requirements [34, 35], the decap structure shown in Fig-
ure 5.5 is easy to use and offers the best decap value per unit area [34]. We therefore chose
this structure to design our decap-padded standard cell libraries, as discussed in Sections 5.3.2 and
5.3.3. Before detailing the developed decap libraries, we present decap measurement results
obtained using HSPICE in the following section.

Figure 5.5: Decap Standard N+P Cell

5.3.1 Decap Measurement using HSPICE

In this section, we measure NMOS and PMOS decoupling capacitance by simulation in
HSPICE and compare the results with those obtained from Equation 5.4.

HSPICE supports various MOS capacitor models [36] for capacitance measurement, including
the Meyer model, the Charge Conservation model, the BSIM model, and the AMI model. The
choice of model depends on the trade-off between the desired accuracy in a specific
frequency range and the modeling time. The model used to calculate the nonlinear, voltage-dependent
MOS gate capacitance is selected in the HSPICE simulation by setting the model parameter
CAPOP.

For MOSFET model level 49, CAPOP is set to 2 by default, which selects the parameterized Mod-
ified Meyer model. We use this model to compare the simulation results with those obtained from the
equations. The Meyer model is a first-order approximation of MOS capacitance and is reported to pre-
dict the high-frequency capacitance more accurately [37, 38]. A DC sweep analysis in HSPICE
[39] can be used to obtain the variation of CG with respect to VGS. An HSPICE input file for
characterizing the NMOS decap shown in Figure 5.4 (for the 0.18µm node) is given below:

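HSPICE input file for N-Decap measurement

* SCALE=0.09u expresses W and L below in units of lambda (0.09um)
.OPTION SCALE=0.09u
* DCCAP=1 enables capacitance calculation during the DC sweep
.OPTION DCCAP=1 POST
* Model file; the underscore in the file name is assumed
.INCLUDE 'model_180nm.sp'

* NMOS under test (drain gate source bulk): W = 100 lambda, L = 2 lambda
mn d g s b nmos w=100 l=2

* Drain, source and bulk are held at 0 V
Vd d 0 0
Vs s 0 0
Vb b 0 0
* Gate source swept by the .DC statement
Vg g 0 0

* Sweep the gate voltage from -1 V to 1.8 V in 0.1 V steps
.DC Vg -1 1.8 .1

* LX18 reports the gate capacitance of the named transistor
.PRINT VGS=V(g) CG=LX18(mn)

.END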

The parameter LX18(<transistor name>), defined in the model, measures the MOS gate capac-
itance, and the option DCCAP is set to 1 to enable capacitance calculation. Simulating the above
HSPICE file at the 0.18µm technology node produces the graph shown in Figure 5.6, which
matches well with the graph shown in Figure 5.3. It is clear from the graphs that the
gate capacitance drops when VGS is near the MOS threshold voltage VT, and that a stable decap can be
obtained by operating the MOS device in the linear region (i.e., VGS ≈ VDD). A similar analysis can be
done for the PMOS decap; Figure 5.7 shows its capacitance curve. The capacitance curves
are in close agreement with the MOS capacitance theory discussed in the previous section. Tables 5.2 and
5.3 show the NMOS and PMOS decap measurement results, respectively, at the 0.18µm node for
various transistor widths. (CG)sim and (CG)eq refer to the capacitance obtained from HSPICE sim-
ulation and from Equation 5.4, respectively. The simulation results match those obtained from the
equation to within about 8% for NMOS and 2% for PMOS. The small discrepancy could be attributed
to Meyer's simplified piecewise-linear capacitance model.

Table 5.2: NMOS Decap Measurement Results using HSPICE

W (λ)   L (λ)   (CG)sim (fF)   (CG)eq (fF)
10      2       1.75           1.89
20      2       3.5            3.78
40      2       7.0            7.56
100     2       17.5           18.9

Table 5.3: PMOS Decap Measurement Results using HSPICE

W (λ)   L (λ)   (CG)sim (fF)   (CG)eq (fF)
10      2       1.86           1.83
20      2       3.72           3.66
40      2       7.44           7.32
100     2       18.6           18.3

Figure 5.6: NMOS Decap Measurement using HSPICE (CG vs. VGS plot)

Figure 5.7: PMOS Decap Measurement using HSPICE (CG vs. VGS plot)
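
The agreement between simulation and equation can be quantified directly from Tables 5.2 and 5.3. The short sketch below does that; the data are the table entries, while the function and variable names are only illustrative.

```python
# (W in lambda, (CG)sim in fF, (CG)eq in fF), copied from Tables 5.2 and 5.3.
NMOS = [(10, 1.75, 1.89), (20, 3.5, 3.78), (40, 7.0, 7.56), (100, 17.5, 18.9)]
PMOS = [(10, 1.86, 1.83), (20, 3.72, 3.66), (40, 7.44, 7.32), (100, 18.6, 18.3)]


def relative_error_percent(sim, eq):
    """Deviation of the simulated value from the equation value, in percent."""
    return 100.0 * (sim - eq) / eq


for label, rows in (("NMOS", NMOS), ("PMOS", PMOS)):
    for width, sim, eq in rows:
        err = relative_error_percent(sim, eq)
        print(f"{label} W={width:>3} lambda: {err:+.2f}%")
```

The relative deviation is the same for every width within a table, since both the simulated and the calculated capacitance scale linearly with W at fixed L.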

5.3.2 DCFLIB

DCFLIB (Standalone Decap Cells Library) contains decap standard cells of varying sizes. These
decap standard cells serve as filler cell replacements that provide a specific value of decap in the design.
For each decap standard cell, the library also contains a filler cell of the same size.
This is required for PrimeRail's filler-based decap insertion to function properly. We
develop a separate library containing only filler and decap standard cells because filler-based decap
optimization is required for both the traditional approach and our voltage drop optimization approach.
Equation 5.4 can be used to arrive at the transistor dimensions for a required decap value. For example,
substituting the values of COX and CO for the 0.18µm technology in Equation 5.4, we can design a decap
cell with 2X femtofarads of capacitance by solving X = W (7L + 5) for each MOSFET in the decap cell,
where W and L are expressed in terms of λ. We keep L at the minimum value allowed by
the technology to lower the resistance, and increase W to achieve the required decap value. Multiple
fingers are used in the decap design when W reaches its maximum value. Table 5.4 lists the decap
standard cells in DCFLIB. The decap values listed are for the 0.18µm technology node. The cell
nomenclature is DCPX<#m>, where DCP refers to the decoupling capacitor and #m indicates
the decoupling capacitance provided by the cell; per Table 5.4, DCPX<#m> provides #m × 10 fF.

Table 5.4: DCFLIB Cells

Decap Cell Name   Value (fF)   Area (h × w) (λ²)   Fingers
DCPX1             10           100 × 14            1
DCPX2             20           100 × 16            1
DCPX5             50           100 × 22            1
DCPX10            100          100 × 34            1
DCPX20            200          100 × 62            2
DCPX50            500          100 × 146           5
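
To make the role of the cell sizes concrete, the sketch below covers a requested decap value with cells from Table 5.4, largest first. This greedy rule is an assumption used for illustration; it is not a description of how PrimeRail's filler-based insertion chooses cells, and the function name is hypothetical.

```python
# name: (decap value in fF, cell width in lambda), copied from Table 5.4.
DCFLIB = {
    "DCPX1": (10, 14), "DCPX2": (20, 16), "DCPX5": (50, 22),
    "DCPX10": (100, 34), "DCPX20": (200, 62), "DCPX50": (500, 146),
}


def cover_budget(target_ff):
    """Pick cell counts whose total decap is at least target_ff (integer fF)."""
    picks = {}
    remaining = target_ff
    # Largest cells first: they give the most decap per unit width in Table 5.4.
    for name, (value, _width) in sorted(DCFLIB.items(), key=lambda kv: -kv[1][0]):
        count, remaining = divmod(remaining, value)
        if count:
            picks[name] = count
    if remaining > 0:
        # Round up with one more of the smallest cell.
        picks["DCPX1"] = picks.get("DCPX1", 0) + 1
    return picks


def total_width(picks):
    """Total row width, in lambda, occupied by the selected cells."""
    return sum(DCFLIB[name][1] * count for name, count in picks.items())


print(cover_budget(380), total_width(cover_budget(380)))
```

The table shows why larger cells are preferred where space allows: DCPX50 delivers 500 fF in 146λ of width, whereas DCPX1 delivers 10 fF in 14λ.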

5.3.3 UCDCLIB

UCDCLIB (University of Cincinnati Decap-padded Standard Cell Library) contains standard
logic cells with minimum decap padding. Each standard logic cell from the nominal library is increased
in area to realize a minimum-value decap. The diffusion regions of the padded decap are shared with
the diffusion regions of the logic cell to limit the increase in area. Figure 5.8[A] shows the
layout of a standard inverter, INVX8. Its equivalent decap-padded inverter, INVDCX8, is shown
in Figure 5.8[B]. The name of a decap-padded standard logic cell is the name of the standard logic
cell with DC added after the function name (for example, INVX8 becomes INVDCX8). Care must be
taken to ensure that the addition of decap does not change the
input pin capacitance of a logic cell. An increase in the input pin capacitance of a logic cell
increases the load on the preceding gate driving it, which in turn degrades timing. With
careful layout drawing, we ensure that the pin capacitance of a logic cell does not change. Further,
a modified standard cell has to conform to design guidelines for its proper use at various stages
of the design flow. Decap padding is added to a standard cell in conformance with the following
standard cell design guidelines:

• Cell Pins:
Input and output pins of a cell are placed at intersections of multiples of the xPitch and
yPitch. That is, the pins are placed on an xPitch × yPitch grid. This is done to facilitate signal
routing.

• Cell Width:
Cells are designed with a width that is a multiple of the yPitch value for the given technology node.

• Cell Origin:
The cell layout is drawn with its lower left corner at (0, 0). This helps in defining the place and route
boundary for the cell when generating the abstract view of the cell for layout purposes.
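
The three guidelines above lend themselves to a mechanical check. The following sketch is one possible way to express them; the function, its arguments, and the pitch of 8λ are assumptions made for illustration (8λ is merely consistent with the cell widths in Table 5.5, all of which are multiples of 8λ, and is not stated in the text).

```python
def check_cell(width, pins, lower_left, x_pitch, y_pitch):
    """Return a list of guideline violations for one cell (empty if none).

    width:      cell width in lambda
    pins:       {pin name: (x, y)} pin locations in lambda
    lower_left: (x, y) of the lower left corner of the layout
    """
    problems = []
    # Cell Pins: every pin must sit on the xPitch x yPitch grid.
    for name, (x, y) in pins.items():
        if x % x_pitch or y % y_pitch:
            problems.append(f"pin {name} at ({x}, {y}) is off the routing grid")
    # Cell Width: multiple of the pitch named in the width guideline.
    if width % y_pitch:
        problems.append(f"width {width} is not a multiple of {y_pitch}")
    # Cell Origin: lower left corner at (0, 0).
    if tuple(lower_left) != (0, 0):
        problems.append(f"lower left corner is {lower_left}, expected (0, 0)")
    return problems


# Hypothetical cell data, used only to exercise the checks.
print(check_cell(24, {"A": (8, 16), "Y": (16, 24)}, (0, 0), x_pitch=8, y_pitch=8))
print(check_cell(20, {"A": (5, 16)}, (0, 0), x_pitch=8, y_pitch=8))
```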


Figure 5.8: [A] Standard Inverter (INVX8) [B] Decap-padded Standard Inverter (INVDCX8)

Table 5.5 lists standard cells from OSULIB and the equivalent decap-padded standard cells from
UCDCLIB. The intrinsic capacitance and area of OSULIB cells are denoted by C1 (in fF) and A1 (in λ;
only the width is specified since all cells have the same height), respectively. Similarly, C2 and A2 refer
to the capacitance and area of UCDCLIB cells. The percentage increase in cell capacitance and area due to
decap padding is given in the last two columns as %∆C and %∆A, respectively.

Table 5.5: OSULIB and UCDCLIB Cells

            OSULIB            UCDCLIB (with DC suffix)
Cell        C1 (fF)  A1 (λ)   C2 (fF)  A2 (λ)   %∆C      %∆A
INVX1       5.37     16       22.8     24       324.58   50
INVX2       10.36    16       28.2     24       172.2    50
INVX4       20.8     24       41.05    32       97.36    33.33
INVX8       41.5     40       61.8     48       48.92    20
AND2X1      19.55    32       70.8     48       262.15   50
AND2X2      25.65    32       77.78    48       203.24   50
BUFX2       18       24       40.55    40       125.28   66.67
BUFX4       33.6     32       53.9     40       60.42    25
AOI21X1     27.36    32       68.88    48       151.75   50
AOI22X1     36.09    40       73.74    56       104.32   40
DFFNEGX1    66.55    96       88.05    112      32.31    16.67
DFFPOSX1    70.15    96       91.58    112      30.55    16.67
FAX1        114.44   120      172.63   136      50.85    13.33
HAX1        52       80       91.18    88       75.35    10
MUX2X1      38.24    48       95.86    56       150.68   16.67
NAND2X1     11.08    24       39.95    32       260.56   33.33
NAND3X1     19.91    32       52.39    40       163.13   25
NOR2X1      14.67    24       99.25    40       576.55   66.67
NOR3X1      31.31    64       88.7     64       183.3    0
OAI21X1     23.06    32       70.7     48       206.59   50
OAI22X1     33.69    40       81.38    56       141.56   40
OR2X1       23.7     32       60       48       153.16   50
OR2X2       30.03    32       80.68    48       168.66   50
XNOR2X1     55.6     56       76.33    72       37.28    28.57
XOR2X1      56.03    56       76.75    72       36.98    28.57
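
The last two columns of Table 5.5 follow from the first four. As a worked check, the sketch below recomputes %∆C and %∆A for three rows of the table; the helper name is illustrative.

```python
def percent_increase(before, after):
    """Percentage increase from the OSULIB value to the UCDCLIB value."""
    return 100.0 * (after - before) / before


# (cell, C1 fF, A1 lambda, C2 fF, A2 lambda), copied from Table 5.5.
rows = [
    ("INVX1", 5.37, 16, 22.8, 24),
    ("INVX8", 41.5, 40, 61.8, 48),
    ("NOR3X1", 31.31, 64, 88.7, 64),
]

for cell, c1, a1, c2, a2 in rows:
    d_c = percent_increase(c1, c2)
    d_a = percent_increase(a1, a2)
    print(f"{cell}: %dC = {d_c:.2f}, %dA = {d_a:.2f}")
```

The rows illustrate the spread in the table: small cells such as INVX1 gain the most capacitance relative to their intrinsic value, while NOR3X1 is padded without any increase in width.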

5.4 Cell Characterization and View Generation

A successful tape-out of a cell-based design depends strongly on the accuracy
with which information about individual cells is made available to CAD tools during the various stages
of the design flow. Further, a design flow involving CAD tools from different vendors requires this
information to be presented in a standard format for interoperability. These
requirements are satisfied by characterizing library cells and organizing the generated information
in the form of standard library views. The characterized views include a symbol view for drawing
schematics, logical and timing views for synthesis and design analysis, and physical views for layout
generation and fabrication. In this section, we present the method used to characterize our library cells and
generate all popular library views to facilitate seamless integration of our library with standard CAD
tools [27]. Table 5.6 summarizes the standard library views and their descriptions. Figure 2.1 shows
the library views required by popular CAD tools at various stages of the digital IC design flow. In
the subsequent subsections, we discuss the method used to generate the library views shown in Table 5.6.

Table 5.6: Library Views and their Description

Category           View    Description
Symbol             SDB     Symbol Database
Physical           GDSII   Graphical Data System
                   LEF     Library Exchange Format
Timing and Power   TLF     Timing Library Format
                   DB      Timing Database
                   LIB     Liberty Format
                   ALF     Advanced Library Format
Netlist            V       Stub Verilog
Parasitic          PARA    Parasitic format

5.4.1 Symbol Library

The symbol library provides a graphical representation of library cells. Using the symbol library, a
CAD tool can generate the schematic of a design by performing a one-to-one mapping of cells
in the design netlist to cell symbols in the library. Symbol libraries for our library cells have not
been generated; a brief description of symbol library generation [40] is provided for completeness.
The process starts with drawing a representative symbol for each cell in
a schematic editor such as Cadence Virtuoso Schematic. The symbols are then exported to an EDIF
file (.edif), from which an ASCII symbol library (.slib) can be generated using Synopsys Design
Compiler. Finally, Synopsys Design Compiler can be used to generate the standard symbol library
(.sdb).

5.4.2 Physical View

Physical views of library cells provide information related to the physical representation of logic
cells. This information includes cell layouts, layers used, layer numbers, etc. Physical views are
required during the backend stage of the design flow. Two popular physical views, the GDSII and LEF file
formats, are described below:

Figure 5.9: Method to Generate Various Library Views

The GDSII Stream Format

The GDSII stream format is a de facto industry standard for exchanging design layout information be-
tween IC designers and fabrication engineers. A foundry typically requires design information in
GDSII format, which is then used to generate the photomasks needed for silicon fabrication.
GDSII stands for “Graphical Design System II” and is an extension of the GDS format. Originally
developed by Calma, it is now owned by Cadence Design Systems. GDSII provides a
binary, platform-independent representation of a 2D design layout in the form of a hierarchy of struc-
tures. The objects are grouped by numeric attributes such as layer number, datatype, and texttype. In a
typical design flow, the physical views of library cells are combined hierarchically with the chip
layout, and a final GDSII output for the design is generated. Figure 5.9 shows the flow used to generate
the GDSII view. We lay out the library cells in Magic using the TSMC 0.18µm technology rules provided
by MOSIS. Cell layouts are then extracted to CIF and transferred to the Cadence Virtuoso Layout Editor.
The process design kit used with Virtuoso is the NCSU CDK (Cadence Design Kit) [41]. The design
kit includes layer map files for CIF and GDSII import/export, layer assignments, DRC and parasitic
extraction rules, and transistor model files. Reference [27] provides a step-by-step description of
generating a GDSII file from the Virtuoso editor (Cadence ICFB platform toolset).
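
To give a feel for the binary record structure mentioned above, the sketch below packs and walks a few GDSII-style records. Each record starts with a 2-byte big-endian length, a 1-byte record type, and a 1-byte data type. The byte string built here is a hand-made fragment for illustration, not a complete or valid GDSII file; the layer number 49 and the helper names are arbitrary assumptions, and only four record types are named.

```python
import struct

# A few record types of the stream format; the dictionary is deliberately partial.
RECORD_NAMES = {0x00: "HEADER", 0x0D: "LAYER", 0x0E: "DATATYPE", 0x11: "ENDEL"}
NO_DATA, INT16 = 0x00, 0x02  # data type codes used in this sketch


def make_record(record_type, values=()):
    """Pack one record whose payload is a sequence of 2-byte signed integers."""
    payload = struct.pack(f">{len(values)}h", *values)
    data_type = INT16 if values else NO_DATA
    return struct.pack(">HBB", 4 + len(payload), record_type, data_type) + payload


def iter_records(stream):
    """Yield (record name, values) for each record in a byte string."""
    offset = 0
    while offset < len(stream):
        length, record_type, data_type = struct.unpack_from(">HBB", stream, offset)
        if length < 4:
            raise ValueError("record length must include the 4-byte header")
        payload = stream[offset + 4: offset + length]
        values = ()
        if data_type == INT16:
            values = struct.unpack(f">{len(payload) // 2}h", payload)
        yield RECORD_NAMES.get(record_type, hex(record_type)), values
        offset += length


fragment = (make_record(0x00, [600]) + make_record(0x0D, [49])
            + make_record(0x0E, [0]) + make_record(0x11))
for name, values in iter_records(fragment):
    print(name, values)
```

The LAYER and DATATYPE records are the numeric attributes by which layout objects are grouped, which is why the layer map files in the design kit are needed when importing or exporting GDSII.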

Library Exchange Format


Library Exchange Format (popularly referred to as LEF), introduced by Cadence, is a standard
ASCII format for providing an abstract view of library cells to automated place and route (APR) tools.
Typically, APR tools require the following to successfully place and route a standard cell based
design:

1. Routing layer information: The tool needs to know how many routing layers of the specified
technology node can be used to connect the standard cells and macros in the design.
Additionally, for each routing layer, attributes such as layer type, preferred routing direc-
tion (horizontal or vertical), layer width/spacing/pitch rules, and parasitics per unit area are
required. The layer type can be routing, cut (contact), masterslice (poly/active), or overlap. Via
attributes for connecting adjacent layers are also required.

2. Standard cell information: Attributes of standard cells such as name, site name,
orientation, place and route (PR) boundary, and cell pins are required. Standard
cells are laid out using a few metal layers. The locations and sizes of these metal shapes are required
by the tools to avoid routing on the same metal layers over the same area inside standard cells. This is
necessary to prevent shorts between routing and the metal inside the cells.

These attributes are typically captured in a LEF file and form the abstract view for the library. A
single LEF file can include all library information. However, if the library contains a large
number of cells, a single LEF file can become large and hard to manage. In
such cases, the abstract view can be separated into two parts: a technology LEF and a cell library LEF.
The technology LEF file contains technology information for a design, such as routing layer and via
attributes. The cell library LEF contains standard cell and macro attributes. We generate a
single LEF file for our libraries. Figure 5.10 shows a 2-input NAND gate layout and its abstract
view. As seen from the figure, the abstract view is like a cell skeleton and contains no active layers. It
contains only the cell geometry (X and Y), obstruction information for routing layers, and I/O pin locations.

The following sections typically define a LEF file [42]:

1. Technology: The technology section defines the attributes of routing layers and vias,
as discussed above.

2. Site: The site section contains generic information about all the standard cells and macros defined
after this section. It defines the cell class (whether cells belong to CORE or PAD), the cell symmetry
(whether cells can be flipped about the X axis, the Y axis, or both for optimization), and the generic
cell dimensions.

3. Macro: The macro section defines the attributes of standard cells, as discussed above. There
are as many macro sections as there are standard cells in a library.
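
As an illustration of what a macro section looks like, the sketch below emits a minimal MACRO block as text. All geometry, the layer name metal1, the site name core, and the pin list are hypothetical; an abstract produced by the Cadence Abstract generator would contain considerably more detail than this skeleton.

```python
def lef_macro(name, width, height, pins, obstructions, site="core"):
    """Build a minimal LEF MACRO block as a string.

    pins:         {pin name: (direction, layer, (x1, y1, x2, y2))}
    obstructions: [(layer, (x1, y1, x2, y2)), ...]
    """
    lines = [
        f"MACRO {name}",
        "  CLASS CORE ;",
        "  ORIGIN 0 0 ;",                 # lower left corner at (0, 0)
        f"  SIZE {width} BY {height} ;",  # place and route boundary
        "  SYMMETRY X Y ;",
        f"  SITE {site} ;",
    ]
    for pin, (direction, layer, (x1, y1, x2, y2)) in pins.items():
        lines += [
            f"  PIN {pin}",
            f"    DIRECTION {direction} ;",
            "    PORT",
            f"      LAYER {layer} ;",
            f"        RECT {x1} {y1} {x2} {y2} ;",
            "    END",
            f"  END {pin}",
        ]
    if obstructions:
        lines.append("  OBS")  # metal inside the cell that routing must avoid
        for layer, (x1, y1, x2, y2) in obstructions:
            lines += [f"    LAYER {layer} ;", f"      RECT {x1} {y1} {x2} {y2} ;"]
        lines.append("  END")
    lines.append(f"END {name}")
    return "\n".join(lines)


# Hypothetical values for a 2-input NAND abstract.
print(lef_macro(
    "NAND2X1", 2.4, 10.0,
    pins={"A": ("INPUT", "metal1", (0.2, 0.4, 0.6, 0.8)),
          "Y": ("OUTPUT", "metal1", (1.8, 0.4, 2.2, 0.8))},
    obstructions=[("metal1", (0.0, 0.0, 2.4, 0.3))],
))
```

The block contains only the boundary, pin shapes, and obstructions, which matches the description of the abstract view as a cell skeleton with no active layers.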

Figure 5.10: [A] Cell Layout [B] its Abstract View

Cadence Abstract generator [43] can be used to generate cell abstract views. Detailed steps to
generate the abstract view of library cells using Cadence Abstract generator can be found in [27].

5.4.3 Timing and Netlist View

Correct and reliable operation of a design can be ensured if the design meets its timing and
power specifications. Although it is possible to perform circuit- or switch-level simulations using
SPICE-like simulators to estimate these parameters for a design, the memory and time overhead
would be unacceptable. Moreover, running a full-chip circuit-level analysis every time a
design changes would be prohibitively expensive. A more convenient approach
is to generate delay and timing models for individual logic cells and use these models to
estimate the parameters of the design. This approach saves considerable time, since model generation
is a one-time process and design parameter estimation based on these models does not require
transistor-level analysis. Detailed and accurate model generation is a must to produce results comparable
to full-chip analysis results.

Library characterization is the process of generating timing and power
models for individual logic cells on the basis of their physical netlist. The power consumption and speed
of logic cells depend on the input slope and output loading conditions. Cell area, logic functionality,
and state-dependent leakage power are some of the attributes required by CAD tools to perform
various optimizations. Library characterization generates this information by simulating individual
logic cells under different input slew and output loading conditions for each process corner. Non-
linear models for cell delay, output transition, and power are typically represented in the form of
a 2D lookup table. For example, to calculate cell delay with respect to one input for 5 different
input slopes and 5 different output loads, 50 cell simulations (known as timing arcs) are
performed: 25 each for the rise and fall transitions. The characterized data is stored in a standard
format such as .lib, .db, .tlf, or .alf that CAD tools can read. The formats .lib (Synopsys Liberty
format) and .db (database format) are typically used by Synopsys products, whereas .tlf
(Timing Library Format) is used mostly by Cadence tools. The format .alf (Advanced Library Format) is an
extension to .lib format.
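
The counting in the example above, and the way a 2D lookup table is typically used, can be sketched as follows. The slew and load axes are placeholder numbers, and bilinear interpolation is shown as one common way to read between grid points; the text does not state which interpolation the CAD tools apply, so treat that part as an assumption.

```python
from bisect import bisect_right
from itertools import product


def characterization_runs(slews, loads):
    """One simulation per (transition, input slew, output load) combination."""
    return [(edge, s, c) for edge in ("rise", "fall") for s, c in product(slews, loads)]


def _bracket(axis, x):
    """Index of the grid interval used for x and the fractional position in it."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    return i, (x - axis[i]) / (axis[i + 1] - axis[i])


def lookup(table, slews, loads, slew, load):
    """Bilinear interpolation in table[slew index][load index]."""
    i, ts = _bracket(slews, slew)
    j, tl = _bracket(loads, load)
    return ((1 - ts) * (1 - tl) * table[i][j] + ts * (1 - tl) * table[i + 1][j]
            + (1 - ts) * tl * table[i][j + 1] + ts * tl * table[i + 1][j + 1])


# Placeholder axes: 5 input slews and 5 output loads (arbitrary units).
slews = [0.1, 0.2, 0.4, 0.8, 1.6]
loads = [1.0, 2.0, 4.0, 8.0, 16.0]
print(len(characterization_runs(slews, loads)))

# Synthetic delay table, used only to exercise the interpolation.
table = [[10 + 2 * s + 3 * c for c in loads] for s in slews]
print(lookup(table, slews, loads, slew=0.3, load=3.0))
```

Points outside the characterized range are extrapolated linearly from the nearest grid interval in this sketch, which is one reason the slew and load ranges chosen during characterization should cover the conditions expected in the design.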

We use the Cadence SignalStorm tool [44] to characterize our cell libraries for the nominal process
corner and generate the .lib, .alf, and .db formats. A netlist view (.v) is also generated. It captures
the Verilog description of each logic cell, which is required for simulation purposes. The library
characterization process using SignalStorm is detailed in [27].

5.4.4 Parasitic View

As discussed in Section 2.2.2, the parasitic effects of standard cells, which appear as a load to the power
distribution network, affect the overall voltage drop. The parasitic view captures the cell parasitics,
which can then be used for cell-based voltage drop analysis. Parasitic view creation is discussed in
Chapter 5. Steps to create the parasitic view using the Synopsys PrimeRail tool are given in Appendix B.

Chapter 6

DCOPT: Decap Optimization Algorithm

The traditional approach of filler-based decap optimization places decaps away from the switching
nodes, thereby rendering the decaps less effective. A higher decap budget is required to compensate
for this loss of efficiency. As discussed in earlier chapters, distributed decap placement can improve
the cost-effectiveness of decaps. We propose to achieve a distributed decap placement using decap-
padded standard cells from the UCDCLIB library. Satisfying the decap requirement of a design using
decap-padded cells raises two questions: first, how to calculate the decap budget of a design in
terms of decap-padded standard cells, and second, which standard cells in the design need to
be replaced with decap-padded standard cells. Inserting decap-padded standard cells without
proper guidance would result in more standard cell replacements than necessary, leading to an
unacceptable increase in design area and to yield degradation.

To address these issues, we develop a C++ based Decoupling Capacitor (Decap) Optimization
procedure: DCOPT. DCOPT works in conjunction with Synopsys’ voltage drop analysis tool,
PrimeRail, to calculate the decap budget of a design. DCOPT identifies the standard cells in a de-
sign that are to be replaced with equivalent decap-padded standard cells from UCDCLIB. It also
allows for tighter decap optimization by using the concept of dynamic thresholding described later in
the chapter. Implementation details of DCOPT are presented in this chapter.

6.1 DCOPT Block Diagram

Figure 6.1 shows the block diagram of the decap optimization flow using DCOPT. The DCOPT architec-
ture is based on a client-server model, in which the client and server run as two different processes and
exchange data through an interprocess communication (IPC) mecha-
nism. DCOPT comprises three building blocks: PrimeRail Analysis, the Decap Optimization
Client (DCOPT-Client), and the Decap Optimization Server (DCOPT-Server). DCOPT uses PrimeRail
to perform voltage drop analysis on a design. The result of the voltage drop analysis is communicated
to DCOPT-Server by DCOPT-Client through a named-pipe IPC mechanism. DCOPT-Server virtually
incorporates decaps into the design, and the modified design is re-analyzed for voltage drop effects.
DCOPT works iteratively to bring the voltage drop within a user-defined threshold.

The main reason for implementing DCOPT in a client-server configuration is to combine PrimeRail’s
voltage drop analysis with the execution efficiency offered by C++. Synopsys PrimeRail supports TCL
and Scheme based scripting as its main interface, and can be configured to run in batch mode
using scripts written in either language. However, implementing the decoupling capaci-
tor optimization algorithm as a script and integrating it with PrimeRail analysis
would make the process unwieldy for two reasons. First, the optimization process requires an it-
erative analysis to arrive at the decap budget, so the execution time penalty must be minimized.
Second, efficient data structures are required to capture the design complexity and to enable fast
budgeting. For these reasons, DCOPT-Server is implemented in a compiled language, C++. Im-
plementing DCOPT-Server in C++ raises another issue: integrating the script-based PrimeRail
analysis with a C++ based process. The problem can be solved by invoking a separate process
from the TCL script. DCOPT-Server could be invoked repeatedly to perform the decap budgeting af-
ter each voltage drop analysis. However, repeated invocation would defeat the
purpose of implementing a separate process: DCOPT-Server needs to build its initial database and
remember the updated design for the next iteration of the analysis, so invoking it anew in each
iteration would result in an unacceptable execution time overhead. A better approach is to keep
DCOPT-Server running in the background for the entire duration of the optimization process and
supply it with the parameters necessary to perform decap budgeting. This functionality is enabled by
a lightweight C++ based DCOPT-Client process, which serves as an interface between
the PrimeRail analysis and the DCOPT-Server optimization process. The improvement in execution
efficiency due to DCOPT-Server is described in subsequent sections.

Figure 6.1: DCOPT Block Diagram (blocks: TCL/Scheme script running the PrimeRail analysis, DCOPT-Client, and DCOPT-Server; input: DCOPT-Server command file; output: list of cell operations)

6.2 DCOPT Input Requirements

The input requirements of DCOPT can be subdivided into the input requirements of its building
blocks. The input needs of the PrimeRail analysis are the same as described in Section 4.3. DCOPT-Client
is invoked through the PrimeRail script and hence does not need additional inputs. The following list
summarizes the input requirements of DCOPT-Server, which are supplied in the form of a command file:

1. Post Process Design Placement:
Design exchange format (DEF) describes the placement and routing information of a design.
A post-route design DEF can be generated from a place and route tool. Each cell instance
is specified in the DEF file with its location attribute. However, instances are not listed in any
specific order. DCOPT-Server needs the instances in sorted order to perform the decap budget
calculation. We first sort all cell instances row-wise. Within each row, instances are further sorted along
the width (based on their Y location). We use a Perl script to generate a sorted post process
design placement file (stored as .def.rpt).

2. Standard Cell Area:
DCOPT-Server requires area information for the standard cells in the OSULIB library. Since all
standard cells are single-height cells, the cell width alone is sufficient. Cell area information is required
to map the voltage drop results from PrimeRail onto the design placement, as described in the next
section.

3. Violation Report Name:
The violation report contains region-wise voltage drop values for each metal layer. It
is generated by the PrimeRail tool. DCOPT-Server performs decap budgeting based on this violation
report and hence needs to know its name.

4. WhatIf Script Name:
The WhatIf script name specifies the name of the file used to communicate decap insertions in the
design to PrimeRail.

5. Cell Operations File Name:
This is the name of an output file generated by DCOPT-Server. The file contains the list of standard
cells to be replaced by decap-padded standard cells.

6. Design Supply Voltage:
The supply voltage for the technology node needs to be specified. The default value is 1.8V for the
TSMC 0.18µm technology node.

7. Tolerable Voltage Drop Band:
As shown in Chapter 1, reliable operation of a design requires that the variation in supply
voltage stay within a tolerable voltage drop band. The default value is 10% for the TSMC
0.18µm technology node.

8. Decap Increment Amount:
This specifies the minimum amount by which decap must be incremented during the analysis.
The value is specified in femtofarads. The decap step allows for a trade-off between tight optimiza-
tion and faster runtime (number of iterations).
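
Two of the inputs above reduce to simple operations: ordering the placement and turning the supply voltage and tolerance band into an absolute limit. The sketch below illustrates both. The record layout, field names, and instance names are hypothetical; the thesis performs the sort with a Perl script, and which DEF coordinate identifies the row depends on the orientation of the rows in the design.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    name: str     # instance name from the DEF file
    cell: str     # library cell name
    row: int      # coordinate that identifies the placement row
    offset: int   # position of the instance along its row


def sort_placement(instances):
    """Order instances row by row, then by position within each row."""
    return sorted(instances, key=lambda inst: (inst.row, inst.offset))


def allowed_drop(supply_voltage=1.8, band=0.10):
    """Largest tolerable voltage drop in volts for the given band."""
    return supply_voltage * band


# Hypothetical placement records in arbitrary order.
placement = [
    Instance("u3", "NAND2X1", row=100, offset=40),
    Instance("u1", "INVX1", row=0, offset=24),
    Instance("u2", "INVX8", row=100, offset=0),
    Instance("u0", "BUFX2", row=0, offset=0),
]
print([inst.name for inst in sort_placement(placement)])
print(allowed_drop())
```

With the default inputs listed above (1.8V supply and a 10% band), the limit works out to 0.18V.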

6.3 PrimeRail Analysis

PrimeRail analysis is performed through a mixed-mode script containing TCL and Scheme lan-
guage commands. Algorithm 1 shows the pseudo-code for the analysis. The input design in Milkyway
format, Ω, first undergoes an initial dynamic voltage drop (DvD) analysis. The regions of the placement
on specific metal layers that have a voltage drop greater than the user-defined threshold are reported to a
file, which we call the violation report, Φ. This is followed by invocation of DCOPT-Server in non-
blocking mode. DCOPT-Server runs as a background process and waits for the necessary data from
the PrimeRail analysis. This data is supplied by invoking DCOPT-Client in blocking mode, which
halts execution of the PrimeRail script until DCOPT-Client termi-
nates. This step provides explicit synchronization between the PrimeRail analysis and
DCOPT-Server. DCOPT-Client sends Φ to DCOPT-Server, which then returns the design modifications χ
or a status flag to DCOPT-Client. On receiving the server acknowledgment, DCOPT-Client terminates
and unblocks PrimeRail execution. PrimeRail updates Ω with χ and performs DvD again using the
What-If Capacitance feature. The new violation report Φ is used to repeat the analysis. The iteration
continues until the voltage drop is optimized or an exit status is issued by DCOPT-Server.

The What-If Capacitance feature in PrimeRail makes it possible to evaluate changes in voltage drop by
virtually incorporating capacitors at specific locations within the design. DCOPT-Server sends χ in the form
of a Scheme script that specifies the placement of these virtual capacitors in Ω. By virtual
capacitors we mean that the design Ω is not actually modified; the capacitors are added
virtually, only to observe the effect of the design modifications.

6.4 DCOPT-Client

DCOPT-Client is a lightweight C++ based process whose function is to synchronize the opera-
tion of DCOPT-Server with the PrimeRail analysis. As discussed in the previous section, DCOPT-Client
provides explicit synchronization by blocking PrimeRail execution. DCOPT-Client com-
municates with the server by setting up communication channels using the named-pipe IPC mechanism.

Algorithm 1: PrimeRail Analysis Script Pseudo-code
Input: Refer to Section 4.3
Output: Voltage Drop Violations

Ω ← Load Milkyway Design Library;
Attach Reference Libraries to Ω;
Perform DvD on Ω;
Invoke DCOPT-Server;
Φ ← Generate Violation Results;
while TRUE do
    Invoke DCOPT-Client;
    while DCOPT-Client Alive do
        /* wait */ ;
    end
    /* χ ← Design Modifications */
    Ψ ← χ .or. Status from DCOPT-Server;
    if Ψ = EXIT then
        break;
    end
    Update Ω with Ψ;
    Perform What-If C Analysis;
    Generate Φ;
end

Two unidirectional named-pipe channels are created: a write-only channel for sending a command
to the server, and a read-only channel for receiving the acknowledgment from the server. The command is
sent to DCOPT-Server to indicate the availability of the violation report. The received acknowledgment
terminates DCOPT-Client and unblocks PrimeRail execution.
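
The command and acknowledgment exchange over two unidirectional named pipes can be sketched as below. This is a simplified model, not the DCOPT implementation: DCOPT-Client and DCOPT-Server are separate C++ processes, whereas the sketch runs the server side in a thread so that it is self-contained, and the pipe names and the command string are hypothetical. It relies on POSIX named pipes and therefore does not run on Windows.

```python
import os
import tempfile
import threading


def server(cmd_path, ack_path, handled):
    """Wait for one command, record it, then acknowledge."""
    with open(cmd_path) as cmd:              # blocks until the client opens the pipe
        command = cmd.readline().strip()
    handled.append(command)                  # stand-in for the decap budgeting step
    with open(ack_path, "w") as ack:         # blocks until the client starts reading
        ack.write("ACK\n")


def client(cmd_path, ack_path, command):
    """Send one command and block until the acknowledgment arrives."""
    with open(cmd_path, "w") as cmd:         # write-only channel to the server
        cmd.write(command + "\n")
    with open(ack_path) as ack:              # read-only channel from the server
        return ack.readline().strip()


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        cmd_path = os.path.join(workdir, "cmd.fifo")
        ack_path = os.path.join(workdir, "ack.fifo")
        os.mkfifo(cmd_path)
        os.mkfifo(ack_path)
        handled = []
        worker = threading.Thread(target=server, args=(cmd_path, ack_path, handled))
        worker.start()
        reply = client(cmd_path, ack_path, "VIOLATION_REPORT_READY")
        worker.join()
        return reply, handled


if __name__ == "__main__":
    print(demo())
```

Because the client does not return until it has read the acknowledgment, a script that invokes it in blocking mode is held for exactly as long as the server needs, which is the synchronization role described above.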

6.5 DCOPT-Server

DCOPT-Server is the main component of the decap optimization process. It is responsible for
calculating the decap budget of a design, communicating the modified design (with decaps) to PrimeRail,
terminating PrimeRail after the analysis, and reporting the total design decap in terms of UCDCLIB and
DCFLIB cells. DCOPT-Server analyzes the voltage drop values at various nodes in the design and
compensates for the drop by inserting a decap at the affected node.

DCOPT-Server is implemented in C++. Algorithm 2 shows the pseudo-code for DCOPT-Server.
DCOPT-Server prepares a physical map of the design from the input DEF. The physical map cap-
tures the location and orientation details of all instances in the design. The physical locations of the
nodes on the metal 1 layer used to create power trunks are also determined at this stage. The physical map is
necessary to determine which cell instances suffer from voltage drop. This is done by mapping
the voltage drop violations reported by PrimeRail onto the prepared physical map. DCOPT-Server
receives the violation report from DCOPT-Client through the IPC mechanism. A valid command from
DCOPT-Client signals the availability of the violation report. The violation report contains voltage drop val-
ues at various nodes on all metal layers. These nodal voltage drop values are mapped onto the prepared
physical map and associated with cell instances according to physical location.

Note that a voltage drop value is specified as a negative number, so a lower (more negative)
value refers to a higher drop. Although all cell instances having a voltage drop value less than
the user-defined threshold are under violation and need optimization, we do not consider all such
instances at the first step. Using the concept of dynamic threshold, we first optimize the cell
instances suffering from the highest voltage drop (most negative values). Instead of identifying
all violating cell instances with respect to the user-defined threshold, we dynamically modify the
voltage drop threshold as shown in Figure 6.2. During each iteration, the voltage drop threshold
is set to a value whose magnitude is X% lower than the peak voltage drop (indicated by VDthes1 in
Figure 6.2). Cell instances identified with respect to this current threshold (shown by band I)
are optimized by adding decap at the cell nodes. Hence, with each iteration, the peak voltage drop
decreases, and so does the current threshold (shown by VDthes2). When the current threshold
becomes equal to the user-defined threshold, all cell instances under violation are considered for
optimization. The concept of dynamic threshold allows for a tighter decap budget. Decoupling
capacitors work on the principle of locality: a decap placed near a violating cell may also
provide charge to neighboring cells, if not all cells are switching simultaneously. Hence,
optimizing first the cells suffering from higher voltage drop can absorb a number of violations
due to lower-drop cells. This reduces the total number of violations, and results in a smaller
decap repository. Experimental results in Section 6.6 exemplify the usefulness of the dynamic
threshold concept.
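
As a small worked illustration of the dynamic threshold rule, the sketch below computes the current threshold from the peak drop of the preceding analysis. The function name and signature are hypothetical and not taken from DCOPT-Server; the sample numbers are the initial peak drop, the user-defined threshold, and one later peak value reported for Barcode16 in Section 6.6.

```python
def dynamic_threshold(peak_vd_mv, dt_percent, user_threshold_mv):
    """Return the current voltage drop threshold in mV (all drops are negative).

    peak_vd_mv        -- peak voltage drop from the latest analysis, e.g. -105.625
    dt_percent        -- dynamic threshold percentage X, e.g. 5 or 10
    user_threshold_mv -- user-defined threshold, e.g. -90.0
    """
    # Shrink the magnitude of the peak drop by X percent.
    candidate = peak_vd_mv * (1.0 - dt_percent / 100.0)
    # The user-defined threshold is the upper limit: once the candidate would
    # rise above it, the user-defined value is used directly.
    return min(candidate, user_threshold_mv)
```

With a peak drop of −105.625 mV and X = 5, the rule gives about −100.34 mV, so only nodes with a drop more negative than that are processed in the first pass. With a peak of −98.56 mV and X = 10, the candidate (about −88.70 mV) exceeds the −90 mV limit, so the threshold stays at −90 mV.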

The decap value for each violating cell identified based on the current threshold is updated with
each iteration. During the first iteration, the decap value for a violating cell is set to zero.
With each successive iteration, the decap is incremented by the input "decap increment amount"
(Cincr), if that cell is found to be under violation again. The decap increment amount offers a
trade-off between the convergence time for optimization and the decap budget required for the
design. A higher value for the increment amount enables faster convergence, but can lead to a
larger decap budget than required. Alternatively, a smaller value allows for finer control of
voltage drop, leading to a more accurate decap budget, but adversely affects the overall
optimization time.

Figure 6.2: Illustration of Dynamic Threshold Concept
(Vertical axis: Node Supply Voltage (V). Marked levels, from top: VDDNominal,
VDuser-defined-thres, VDthes2, VDthes1, Peak Voltage Drop; band I marks the nodes selected by the
current threshold.)

During each iteration, decap additions to the design are communicated to PrimeRail through the
DCOPT-Client interface. DCOPT-Server compiles all design modifications (decap additions at various
locations) as a Scheme script, which is sent to PrimeRail. PrimeRail virtually updates the
design under analysis through this script, and performs a new voltage drop analysis. During each
iteration, the voltage drop results are communicated to DCOPT-Server for further optimization. The
optimization process continues until the design contains no violating cell. At that time,
DCOPT-Server informs PrimeRail to terminate the analysis.

A final step performed by DCOPT-Server is to legalize the decap additions to violating cells.
After the complete analysis, each violating cell has been assigned some value of decap. This
decap is still virtual. In order to accommodate this decap into the design, we replace the
violating cell with the equivalent cell from UCDCLIB. However, each UCDCLIB cell contains a
minimum decap padding. Hence, if the virtual decap of a violating cell is more than the decap
padding value of the equivalent UCDCLIB cell, then the extra virtual decap is combined with the
adjacent cells, and these cells are also considered for replacement from UCDCLIB. If the adjacent
cells cannot accommodate the extra decap, then a decap cell from DCFLIB is inserted at that node
to satisfy the decap budget of the design. DCOPT-Server outputs a list of cell operations
consisting of violating cell replacements with equivalent cells from UCDCLIB, and decap insertions
from DCFLIB.
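
The legalization rule above can be sketched as a simple greedy split. This is a minimal sketch under assumptions, not the DCOPT-Server implementation: the function name is hypothetical, capacitances are given as integers in fF to keep the arithmetic exact, and adjacent cells are assumed to be visited in the order supplied by the caller.

```python
def legalize_decap(virtual_ff, own_padding_ff, neighbor_paddings_ff):
    """Distribute one cell's virtual decap over padded cells and a filler decap.

    virtual_ff           -- virtual decap assigned to the violating cell (fF)
    own_padding_ff       -- padding of its equivalent UCDCLIB cell (fF)
    neighbor_paddings_ff -- paddings of the adjacent cells' UCDCLIB equivalents (fF)

    Returns (indices of adjacent cells to replace, DCFLIB decap still needed in fF).
    """
    # The violating cell itself is always replaced by its padded equivalent.
    remaining = virtual_ff - own_padding_ff
    replaced_neighbors = []
    for index, padding in enumerate(neighbor_paddings_ff):
        if remaining <= 0:
            break
        # Push the excess onto the next adjacent cell.
        replaced_neighbors.append(index)
        remaining -= padding
    # Whatever the adjacent cells cannot absorb goes to a DCFLIB decap cell.
    return replaced_neighbors, max(remaining, 0)
```

For example, a cell with 50 fF of virtual decap, a 20 fF padded equivalent, and a single adjacent cell offering 10 fF would still need a 20 fF decap cell from DCFLIB.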

As is apparent from the above description, DCOPT-Server needs to keep track of the updated decap
values at various nodes in the design. This is the reason for keeping DCOPT-Server always active
by running it as a separate process. If DCOPT-Server were invoked on each iteration of PrimeRail
analysis, the updated decap values would have to be stored in a database or file, leading to a
large execution time penalty. Moreover, on each invocation, physical map preparation would add to
the overall optimization time. Therefore, a lightweight client interface is provided to improve
the total execution time.

Algorithm 2: DCOPT-Server
Input: DCOPT-Server Command File
Output: List of Cell Operations

P ← Prepare Placement Database;
Create IPC Channels;
Continue ← TRUE;
while Continue do
    while α is not a valid command do
        α ← Wait for DCOPT-Client;
    end
    Continue ← FALSE;
    Γ ← Read Violations;
    δcur ← DynamicThreshold(Γ, δreq);
    foreach violation γ in Γ do
        Map it to instance in P;
        if γ < δcur then
            Cnode += Cincr;
            Update P;
            Continue ← TRUE;
        end
    end
    χ ← Generate Design Modifications;
    Send χ to DCOPT-Client;
end
Legalize Decaps;
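
The inner foreach loop of Algorithm 2 can be sketched as below. This is an illustrative sketch only: the dictionary-based data layout and the function name are assumptions, the mapping of violations onto the placement database P is omitted, and capacitance is kept as an integer in fF.

```python
def update_decaps(violations_mv, threshold_mv, decap_ff, c_incr_ff):
    """Apply one pass of the Algorithm 2 update.

    violations_mv -- {node name: voltage drop in mV (negative)}
    threshold_mv  -- current threshold (delta_cur), negative
    decap_ff      -- {node name: accumulated decap in fF}, updated in place
    c_incr_ff     -- decap increment amount (C_incr) in fF

    Returns the nodes that received decap; an empty list corresponds to
    Continue staying FALSE, which ends the outer loop.
    """
    touched = []
    for node, drop in violations_mv.items():
        if drop < threshold_mv:          # more negative means a larger drop
            decap_ff[node] = decap_ff.get(node, 0) + c_incr_ff
            touched.append(node)
    return touched
```

Keeping `decap_ff` alive between calls mirrors why the server runs as a persistent process: the accumulated per-node values must survive from one PrimeRail iteration to the next.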

6.6 Experimental Results

In this section, we present experimental results on benchmarks to highlight the functional
behavior and effectiveness of the DCOPT algorithm in optimizing the power supply noise. All
experiments were performed on a Sun Blade 1000 workstation (SparcV9 processor at 750MHz) running
the Solaris operating system.

Tables 6.1 and 6.2 show the DCOPT results for the Barcode16 and B14 benchmarks respectively.
The first row in each table shows the initial voltage drop analysis results for the benchmark.
Peak VD and Cintrinsic give the peak value of voltage drop in mV and the intrinsic capacitance
(pF) in the design. These form the input to DCOPT for optimization. The optimization goal is set
by the user-defined threshold Vthusr. DCOPT performs multiple iterations to bring the peak voltage
drop of the design within this threshold. The results of each iteration are shown in subsequent
rows. We highlight the usefulness of the dynamic threshold concept by performing decap
optimization under the following three cases:

1. Without DT: Decap optimization with no dynamic threshold. A fixed user defined threshold
is assumed during each iteration of DCOPT.

2. With 5% DT: Optimization with 5% dynamic threshold. During each iteration of DCOPT, the
current voltage drop threshold is dynamically set to 5% higher than the peak voltage drop
(negative value) in that iteration. The user-defined threshold acts as an upper limit.

3. With 10% DT: Same as 2, except that the current threshold is set to 10% higher during each
iteration.

For each case, the results show the number of nodes, #Nd, analyzed for decap addition, the total
design decap repository after decap addition to #Nd nodes, and the change in peak voltage drop
with each iteration. For the dynamic threshold cases, an additional parameter, the current
threshold Vthcur, is also shown. The last rows in the tables show the total decap added to the
design (Cadded − Cintrinsic), and the number of standard cells with decap after iteration i.

Results clearly show the effectiveness of the DCOPT approach in reducing the peak voltage drop
with each iteration. As discussed before, without dynamic threshold, DCOPT starts optimization by
processing all nodes under violation, and gradually moves on to concentrate on a smaller set of
nodes with higher voltage drops. This results in a large decap repository for the reasons
discussed previously, and a larger number of standard cells receive decap. However, without DT,
DCOPT converges fast. In comparison, by dynamically changing the voltage drop threshold, DCOPT
is made to concentrate initially on a smaller region with higher voltage drop nodes. Low voltage
drop nodes are optimized later. This provides tighter control of voltage drop, which results in a
smaller decap budget and, hence, a smaller number of cells with decap. The downside is that more
iterations are required for complete optimization. As seen from Table 6.1, the 5% case
sets the tightest control, and reduces the peak voltage drop to approximately −94mV after 6
iterations, whereas the “without DT” and 10% DT cases reduce the drop to approximately −92mV and
−93mV respectively after 6 iterations, with the “without DT” case being the fastest. Also, observe
the total decap requirement for the three cases. In order to bring the peak VD down to
approximately −94mV, the “without DT” case adds 10.309pF decap after 5 iterations. This budget
reduces to 8.711pF for the “with 10% DT” case after 5 iterations. The decap budget further reduces
to 6.882pF for the “with 5% DT” case, but it takes 6 iterations to arrive at −94mV voltage drop.
Table 6.2 shows a similar analysis for benchmark B14. Figures 6.3 and 6.4 show the change in the
voltage drop map for the two benchmarks after i iterations for the given cases.

Table 6.1: DCOPT Results for Benchmark Barcode16

Initial DvD: Peak VD = −105.625 mV; Cintrinsic = 505.472 pF; Vthusr = −90 mV

    Without DT                With 5% DT                         With 10% DT
#i    #Nd   Cadded  Peak VD     #Nd   Cadded  Peak VD   Vthcur     #Nd   Cadded  Peak VD   Vthcur
              (pF)     (mV)             (pF)     (mV)     (mV)             (pF)     (mV)     (mV)
1     525  510.722  −102.08      88  506.355  −103.59  −100.34     207  507.544  −103.10   −95.06
2     256  513.282   −99.77      84  507.195  −101.82   −98.41     241  509.953  −100.78   −92.79
3     128  514.561   −97.75      89  508.084  −100.05   −96.73     222  512.172   −98.56   −90.70
4      85  515.411   −95.85     123  509.315   −98.20   −95.04     152  513.694   −96.46   −90
5      37  515.781   −94.12     140  510.714   −96.28   −93.29      49  514.183   −94.65   −90
6      26  516.041   −92.70     164  512.354   −94.41   −91.47      32  514.503   −93.23   −90

Total Cadded:       Without DT: 10.569 pF;  With 5% DT: 6.882 pF;  With 10% DT: 9.031 pF
#Cells with Decap:  Without DT: 525;        With 5% DT: 206;       With 10% DT: 289

Table 6.2: DCOPT Results for Benchmark B14

Initial DvD: Peak VD = −174.838 mV; Cintrinsic = 778.799 pF; Vthusr = −90 mV

    Without DT                With 5% DT                         With 10% DT
#i    #Nd   Cadded  Peak VD     #Nd   Cadded  Peak VD   Vthcur     #Nd   Cadded  Peak VD   Vthcur
              (pF)     (mV)             (pF)     (mV)     (mV)             (pF)     (mV)     (mV)
1    1421  793.009  −161.34      20  779.009  −170.92  −166.10      30  779.109  −169.50  −157.35
2     825  801.255  −151.22      20  779.209  −167.18  −162.37      36  779.469  −164.46  −152.55
3     694  808.191  −142.74      21  779.419  −163.45  −158.82      84  780.308  −159.49  −148.02
4     575  813.941  −135.55      21  779.629  −159.88  −155.28     108  781.387  −154.99  −143.54
5     461  818.548  −129.74      29  779.918  −156.66  −151.89     160  782.987  −150.70  −139.50
6     388  822.426  −124.81      66  780.578  −153.60  −148.82     133  784.316  −146.69  −135.63
7     321  825.652  −120.59      57  781.149  −150.49  −145.92     179  786.106  −142.69  −132.02
8     277  828.420  −116.93      95  782.098  −147.59  −142.96     135  787.456  −138.89  −128.38

Total Cadded:       Without DT: 49.621 pF;  With 5% DT: 3.299 pF;  With 10% DT: 8.657 pF
#Cells with Decap:  Without DT: 1421;       With 5% DT: 95;        With 10% DT: 187

The legend for Tables 6.1 and 6.2:

#i          Iteration count
#Nd         Number of nodes updated with decap during iteration #i
Cadded      Total design decap in pF (intrinsic plus intentionally added) after iteration #i
Cintrinsic  Intrinsic amount of decap in pF present in the design
Peak VD     Peak value of voltage drop in mV
DT          Dynamic Threshold
Vthusr      User-defined voltage drop threshold in mV
Vthcur      Current voltage drop threshold in mV for the cases with dynamic threshold
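
As an arithmetic cross-check of the "Total Cadded" rows, the snippet below subtracts the intrinsic capacitance from the total decap after the last iteration, using the Barcode16 values from Table 6.1. The variable and function names are illustrative only.

```python
# Values copied from Table 6.1 (Barcode16), all in pF.
C_INTRINSIC_PF = 505.472
FINAL_TOTAL_PF = {
    "without DT": 516.041,   # Cadded after iteration 6
    "5% DT": 512.354,
    "10% DT": 514.503,
}


def added_decap(final_total_pf, intrinsic_pf=C_INTRINSIC_PF):
    """Intentional decap = total design decap minus intrinsic decap (pF)."""
    return round(final_total_pf - intrinsic_pf, 3)


totals = {case: added_decap(value) for case, value in FINAL_TOTAL_PF.items()}
```

The same subtraction applied to the iteration-5 rows gives the intermediate budgets quoted in the discussion (for example, 515.781 − 505.472 = 10.309 pF without DT).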

Figure 6.3: Voltage Drop Maps: DCOPT Results for Barcode16
(Map scale runs from highest drop to lowest drop.)

Figure 6.4: Voltage Drop Map: DCOPT Results for B14
(Map scale runs from highest drop to lowest drop.)

Chapter 7

ECO-Placer: Engineering Change Order Placement Algorithm

Engineering Change Order (ECO) is a process of incorporating late stage changes in a design.
Often, a design after the placement and routing phases requires small local changes such as gate
sizing or buffer insertions for timing and power fixes, or layout modifications to reduce noise
problems [45]. Given the time complexity of the algorithms involved in physical design, it is
unacceptable to reiterate through the complete design flow to incorporate these changes. Moreover,
algorithms in a typical design flow are general purpose, designed to generate physical placement
and routing from scratch. In most cases, design changes are requested to optimize certain design
metrics with respect to the current layout configuration. A placement from scratch may invalidate
those optimizations. The ECO process saves considerable development time by incorporating these
changes incrementally in the design. Since changes are applied locally and incrementally, ECO
placement can optimize design metrics without significant perturbation to the original placement.

An important characteristic of an ECO placer is that it should apply design changes to the
original placement with minimal perturbation so as to maintain the design quality metrics of the
original placement. Accommodating design changes with certain design objectives, such as voltage
drop optimization, places an additional requirement on the ECO placer of maintaining the relative
placement order of the standard cells. It is equally important that the ECO process takes
substantially less time than general purpose placement algorithms, which can take many hours or
even days, depending on design complexity and available resources, to produce a good placement.

ECOs can be applied at various stages of the design flow. In this chapter, we are concerned with
accommodating changes in an already placed standard cell based design. More specifically, we
are interested in an algorithm for post-layout design optimization using an ECO process. In
Chapter 5, we presented a voltage drop optimization framework which requires incorporating
equivalent decap-padded standard cells in place of OSULIB standard cells. As discussed in previous
chapters, precise physical placement control of decaps is important from the voltage drop point of
view. Therefore, we present, in this chapter, a C++ based Engineering Change Order (ECO) placement
tool to cater to this need.

7.1 Related Work

Increasing design complexity has made the process of design optimization so expensive that
incremental techniques to evaluate different alternatives are often sought. The research
literature contains a number of incremental placement techniques. Many of these techniques are
geared toward modifying placement and accommodating changes incrementally during the physical
synthesis stages themselves. They work in close synchronization with global and detail placers to
apply design changes incrementally [46, 47]. Our focus is to apply changes to a design which has
already been placed (i.e., post-layout designs). We further restrict the application of the ECO
technique to standard cell based layouts.

Many standard CAD tools support various ECO flows [32]. These ECO flows typically update
the placement by comparing the old design netlist with the new netlist. The netlist changes are
incorporated into the original placement such that they lead to the least total movement of cells,
thereby generating a new placement which is close to the original placement. However, a placement
modification with minimal cell movement as its only objective may cause some of the cells to be
placed at undesirable locations. Some objectives, like voltage drop optimization, are highly
dependent on the location of cells. Updating a placement with least cell movement can result in
unacceptable voltage drop results, thereby defeating the very purpose of incorporating the design
changes.

Algorithms for incremental placement modification for post-layout standard cell based designs
are presented in [48, 49, 50]. In [48], the authors presented an ECO algorithm to improve the
useful clock skew of a design. Cell positions are locally adjusted in an attempt to enlarge
positive and negative skews. Incremental placement modification to improve post-route routing
congestion in a standard cell based design is given in [49]. However, these algorithms do not
incorporate new design changes. Cell positions in the original placement are modified so as to
optimize a particular design objective such as clock routing or routing congestion. In [50], an
incremental placement algorithm for a standard cell layout is presented to incorporate design
changes while keeping the placement close to the original. Requested changes are applied to the
placement with the objectives of wirelength minimization and least total movement of cells.
Although the relative placement order of cells is also maintained, which is important from the
voltage drop point of view, updating a placement for voltage drop optimization was not the main
theme of that work. We present an ECO placement algorithm that modifies the approach presented in
[50] to apply changes to a post-layout standard cell based design. The proposed modification
significantly reduces the overall computational matrix size, as described in later sections,
thereby enabling a faster and more efficient ECO process. We further extend the algorithm to
support variable core area. This is important if design changes cannot be incorporated due to
insufficient whitespace in the design. Also, unlike [50], our algorithm allows for design changes
that lead to whitespace recovery, which can be used for subsequent design changes.

7.2 Proposed Approach: ECO-Placer

With the overall objective of updating a design placement from the voltage drop point of view, we
present an Engineering Change Order placement algorithm, ECO-Placer, for standard cell based
designs. Although the algorithm serves as one of the components of the voltage drop optimization
framework described in Chapter 5, ECO-Placer can also be used in a standalone configuration to
apply post-layout design changes. The main features of ECO-Placer are:

• As mentioned earlier, voltage drop inside a design depends not only on the power consumption
of logic cells, but also on their placement. A logic cell placed away from the supply
voltage input pad is likely to suffer a higher voltage drop than a cell in the vicinity of
the input supply. Therefore, a distributed decap placement approach can only be effective if
decap placement can be controlled properly. ECO-Placer applies design changes at the requested
locations while minimizing total cell movement and maintaining the relative placement order of
instances.

• Original placement is minimally perturbed so as to maintain design quality metrics of original
placement.

• It applies design changes optimally with fast run time. ECO-Placer generates a significantly
smaller computational matrix to apply changes optimally as compared to [50].

• Quite often, design changes during ECO flow require core area modification. Fixed-die ECO
placers assume enough whitespace in the core area to accommodate necessary changes. Since
decap-padded standard cells have relatively higher area compared to nominal standard cells, a
core area increase might be necessitated for voltage drop optimization. ECO-Placer supports
variable core area to accommodate design changes.

• It supports three types of design changes: new cell insertion, existing cell deletion, replace-
ment of an existing cell with a new cell. In doing so, it allows for whitespace recovery. The
additional whitespaces can be used for subsequent design change requests.

• Finally, it is equally important to generate the modified placement in a standard format so
that it can be read by standard placement tools. ECO-Placer generates the final placement in a
standard format such as Design Exchange Format (DEF).

The approach adopted in ECO-Placer is shown in Figure 7.1. Design changes are supplied to
ECO-Placer in the form of a cell operation file described in Section 7.3. Each requested design
change is referred to as an operation on the design. Operations may include insertion of a new
cell, deletion of an existing cell, or replacement of an existing cell with a new cell. We regard
the original timing and wirelength driven placement as a reference placement. ECO-Placer
incrementally modifies the reference placement in three phases to apply the operations and
generate a new placement DEF. In Phase-I, ECO-Placer tries to apply operations optimally, if
enough whitespace is present in the design. Phase-II generates whitespace to accommodate
operations by selecting candidate cells from a row and moving them to their optimal position. The
core area is increased in Phase-III, if Phase-II is unable to generate the required whitespace.
Algorithmic details of these three phases are discussed in Section 7.4.

7.3 ECO-Placer Input Requirements

Figure 7.1: ECO-Placer Block Diagram
(Flow, top to bottom:
 Input Command File: Design Placement (DEF), Design Netlist (Verilog), Cell Operations, Cell Area,
   Filler Prefix, Max Core Inc %
 Prepare Cell Operation Database; Prepare Placement Database; Prepare Cell Connectivity Database
 Phase-I: Fixing Operations: apply cell operations (INSERT, DELETE, REPLACE) with least cell
   movement and maintain relative placement order; lock operated cells; check DRC
 Phase-II: Candidate Cell Selection and Movement: operate on violating rows from Phase-I;
   cost based candidate selection, Cost = α*C1 + β*C2 + γ*C3; move candidate to optimal position;
   check DRC
 Phase-III: Core Area Adjustment: increase core area by placement grid size; update violating
   rows from Phase-II; reiterate through Phase-II
 Output: Modified Placement (DEF))

The ECO-Placer algorithm requires the following inputs, supplied in the form of a command file.

• Post Process Design Placement:
Post process design placement (DEF) is a sorted version of the post-route design placement
(DEF), generated using a Perl script. This file is the same as the one used for DCOPT-Server. It
acts as a reference placement, which ECO-Placer modifies incrementally.

• Design Verilog (.V)
Post-routed design netlist in Verilog format is supplied to ECO-Placer to generate the cell
connectivity database. The cell connectivity information is required for Phase-II of ECO-Placer.

• Cell Area information
Standard cell area information is required in ECO-Placer to avoid DRC violations with adjacent
cells and to calculate available whitespace in a row.

• List of cell operations
ECO-Placer supports three types of operations, as mentioned in the previous section. These
operations are supplied in the following format.
    START ROW <row_number>
    - <command> <parameters>
Here, <row_number> refers to the row to which the operations are to be applied, and <command>
can be REPLACE, INSERT, or DELETE.
Based on the command type, <parameters> assumes different definitions as shown below:
    - REPLACE <instance_name> <cell_name> <new_cell_name>
    - INSERT <new_instance_name> <new_cell_name> <X> <Y>
    - DELETE <instance_name> <cell_name>
Here, <instance_name> and <cell_name> refer to the original cell in the placement. In the case
of a REPLACE operation, the original cell is replaced by a new cell. The instance name of the
cell is kept the same, while an optional <new_cell_name> can be specified. An INSERT
operation requires a new cell name as well as a unique instance name (<new_instance_name>).
The new cell is inserted at location <X> <Y> in the design.
For example, the following shows three operations on row 2.
    START ROW 2
    - REPLACE U31 INVX4 INVDCX4
    - INSERT U33n AND2X1 150 2000
    - DELETE U23 BUFX2

• Maximum core increase percentage
A maximum limit on the allowable increase in core area is specified as a percentage of the core
area. ECO-Placer can only increase the core area up to this limit.

• Filler Prefix
Filler cells are inserted in available whitespace during a design placement. Filler cell instance
name is typically prefixed with a user defined string for easy identification. Filler prefix string
is required to identify filler cells in a design and to recover available whitespace in a design.
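
The cell operation format listed among the inputs above is simple enough to read with a short parser. The following is a minimal sketch, not the ECO-Placer reader itself (which is part of a C++ tool): the function name, the returned data layout, and the error handling are assumptions made for illustration, and coordinates are left as strings.

```python
# Allowed parameter counts per command; REPLACE may omit <new_cell_name>.
ARITY = {"REPLACE": (2, 3), "INSERT": (4, 4), "DELETE": (2, 2)}

SAMPLE = """\
START ROW 2
- REPLACE U31 INVX4 INVDCX4
- INSERT U33n AND2X1 150 2000
- DELETE U23 BUFX2
"""


def parse_cell_operations(text):
    """Group operations by row: {row_number: [(command, [parameters...]), ...]}."""
    rows, current = {}, None
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue                       # skip blank lines
        if tokens[:2] == ["START", "ROW"]:
            current = int(tokens[2])
            rows.setdefault(current, [])
        elif tokens[0] == "-" and current is not None:
            command, params = tokens[1], tokens[2:]
            if command not in ARITY:
                raise ValueError(f"unknown command: {raw!r}")
            low, high = ARITY[command]
            if not low <= len(params) <= high:
                raise ValueError(f"bad parameter count: {raw!r}")
            rows[current].append((command, params))
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return rows
```

Grouping by row at parse time matches how the algorithm later walks the design one operation row at a time.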

7.4 Algorithm

Algorithms 3 through 7 show the pseudo-code for ECO-Placer, with Algorithm 3 as its main program.
As mentioned earlier, ECO-Placer takes the inputs defined in the command file, and generates a
modified placement in the standard Design Exchange Format (DEF). Command parameters such as the
filler prefix and the maximum allowable core area increase are read from the command file. The
current core area, rows, and placement grid size are obtained from the input placement file. Each
cell in the design is connected to a few other cells, referred to as neighbors, through its inputs
and outputs. The design Verilog file is parsed to generate the cell connectivity database for each
cell in the design. The cell connectivity database is required for wirelength estimation during
Phase-II. From the cell operations list, the algorithm prepares a row-wise cell operations list. A
row is termed an operation row if one or more operations are to be performed on that row. The
algorithm enters Phase-I to apply the operations in each operation row. Phase-I fixes all
operations without regard to the available whitespace in the operation row. As a result, some of
the rows at the end of Phase-I become violating rows. A row is called a violating row if the
total logic width in that row exceeds the core width. These violating rows are operated on in
Phase-II. Cells from violating rows are selected and moved to their optimal position in order to
make each row non-violating. If Phase-II cannot make a row non-violating, Phase-III increases the
core area to create additional whitespace and Phase-II is repeated. When all rows become
non-violating, a modified placement in DEF format is generated. Details of the individual phases
are given in subsequent subsections:
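
The control flow described above can be sketched compactly as follows. This is only an outline under assumptions: the three phases are passed in as caller-supplied functions, the status strings are hypothetical, and, following the main program's FAIL flag, `phase3` is assumed to return True when the core area cannot be increased any further.

```python
def run_eco(operation_rows, phase1, phase2, phase3):
    """Drive the three ECO phases; return True if a legal placement was reached.

    phase1(row) and phase2(row) return "VIOLATED" or "OK" for a row.
    phase3() grows the core area and returns True on failure
    (for example, when the maximum core increase is exhausted).
    """
    # Phase-I: fix all operations, collecting rows that overflow the core width.
    violating = [row for row in operation_rows if phase1(row) == "VIOLATED"]
    while violating:
        # Phase-II: move candidate cells out of each violating row.
        violating = [row for row in violating if phase2(row) == "VIOLATED"]
        # Phase-III: only if Phase-II left violations, enlarge the core and retry.
        if violating and phase3():
            return False
    return True
```

A caller would generate the modified DEF only when `run_eco` returns True.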

7.4.1 Phase-I: Fixing Operations

Algorithm 3: ECO-Placer main
Input: ECO-Placer Command File
Output: Modified Placement DEF

para ← Get parameters from the command file;
OpList ← Prepare operation row list;
ConnList ← Prepare cell connectivity database;
VList ← List of violating rows;
foreach Row r in OpList do
    reqws = GetRequiredWhitespace(r);
    rowsts ← Mark row as non-violating;
    rowsts = PhaseI(r, reqws);
    if rowsts is violating then VList ← Add(r);
end
FAIL ← FALSE;
while VList not empty .and. not FAIL do
    foreach ViolatingRow vr in VList do
        rowsts = PhaseII(vr);
        if rowsts is nonviolating then VList ← Remove(vr);
    end
    if VList still not empty then FAIL = PhaseIII(para);
end
if not FAIL then GeneratePlacement();

Suppose n operations $o_1, o_2, \ldots, o_n$ are to be performed on an operation row r having
available whitespace $WS_a$. Further consider that $wd_1, wd_2, \ldots, wd_n$ are the whitespace
demands of the respective operations. A whitespace demand can be positive, negative, or zero
depending on the type of operation. A positive value indicates consumption of the available
whitespace of the row, whereas a negative value adds to the whitespace repository of the row. A
zero whitespace demand causes no change in the available whitespace of the row. The whitespace
demand $wd_i$ for an operation $o_i$ is calculated as follows:

• For a REPLACE operation, an existing cell having width $w_{orig}$ is replaced by a new cell
with width $w_{new}$. Hence, the whitespace demand $wd_i$ is given as

$$wd_i = w_{new} - w_{orig} \tag{7.1}$$

• For a DELETE operation, deletion of an existing cell leads to whitespace recovery. Hence we
have

$$wd_i = -w_{orig} \tag{7.2}$$

• For an INSERT operation, the whitespace demand is equal to the width of the new cell. Hence we
have

$$wd_i = w_{new} \tag{7.3}$$

Therefore, the total whitespace demand for applying n operations on row r is the sum of the
whitespace demands of the individual operations. Mathematically, the required whitespace can be
written as

$$WS_r = \sum_{i=1}^{n} wd_i \tag{7.4}$$
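
Equations 7.1 to 7.4 translate directly into code. The sketch below is illustrative: the function names and the tuple layout for an operation are assumptions, and the cell widths used in the example are made-up values in placement grid units, not widths from any real library.

```python
def whitespace_demand(command, w_new=0, w_orig=0):
    """Whitespace demand wd_i of a single operation (Eqs. 7.1 to 7.3)."""
    if command == "REPLACE":
        return w_new - w_orig      # Eq. 7.1: may be positive, zero, or negative
    if command == "DELETE":
        return -w_orig             # Eq. 7.2: always recovers whitespace
    if command == "INSERT":
        return w_new               # Eq. 7.3: always consumes whitespace
    raise ValueError(f"unknown command: {command!r}")


def required_whitespace(operations):
    """Required whitespace WS_r of a row (Eq. 7.4).

    operations -- iterable of (command, w_new, w_orig) tuples for one row.
    """
    return sum(whitespace_demand(*op) for op in operations)
```

For instance, replacing a 5-unit cell with an 8-unit cell, inserting a 6-unit cell, and deleting a 4-unit cell gives a required whitespace of 3 + 6 − 4 = 5 units.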

Two cases can be considered. Algorithm 4 shows the pseudo-code for these two cases:

Case 1: $(WS_r - WS_a) \ge 0$
The whitespace available in operation row r is less than or equal to the whitespace required to
apply the n operations. Hence, row r cannot accommodate the requested operations. In this case,
we skip the optimization process since all cells in the row will be moved anyway. Row r is
allowed to expand beyond the core width. All requested operations are applied from left to right.
As a result, row r crosses the core boundary. This situation leads to a violating row. Once all
operations are legalized, a design rule check is performed to ensure that adjacent cells do not
overlap. The algorithm enters Phase-II to operate on violating rows.

Case 2: $(WS_r - WS_a) < 0$
The whitespace demand to apply the operations to an operation row r is less than the available
whitespace of row r. This indicates that row r can accommodate all operations without crossing the
core boundary. In such cases, an optimal solution is found to apply the operations such that it
leads to minimum perturbation (total movement of cells). In order to apply an operation, an
incision point is defined. The incision point is the point at which the requested operation is to
be applied. Thus, the incision point divides row r into two halves: left and right. In the case of
DELETE and REPLACE operations, the incision point is aligned with the leftmost boundary of the
cell under operation (the cell to be deleted or replaced). Hence, the cell under operation becomes
part of the right half of row r. The alignment of the incision point in the case of an INSERT
operation depends on whether the requested insertion location for the new cell is occupied by a
logic cell. If it is, the incision point aligns with the leftmost boundary of the overlapped cell,
and the new cell is inserted before or after the overlapped cell depending on whether the
requested location is nearer the left or the right boundary of the overlapped cell. Otherwise, if
the requested insertion location falls onto a whitespace, the incision point is aligned with the
rightmost boundary of the cell just before the whitespace, and the cell is inserted at that
location. The optimization process is applied only if an operation requires creation of
whitespace or, in other words, if its whitespace demand is positive. The whitespace demand for a
DELETE operation is always negative, and hence optimization is skipped for it. A REPLACE operation
also does not require optimization if the width of the new cell is less than that of the existing
cell. Such cases always result in whitespace recovery, which can be utilized for subsequent
operations. For all other cases, the algorithm discussed next is used to apply operations
optimally.

Algorithm 4: ECO-Placer Phase-I: Fixing Operations
Input: Row r, Required Whitespace reqws
Output: Row Status rowsts

availws = GetAvailableWhitespace(r);
if reqws ≥ availws then
    Legalize operations in row r from left to right;
    rw ← Update row r width;
    Perform design rule check (DRC) for r;
    if rw > cw then rowsts ← VIOLATED;
else
    foreach Operation o for r in OpList do
        incpt ← Get Incision Point for o in r;
        /* Applying operation optimally */
        if reqws > 0 then
            lbuck ← Create (cell, gain) bucket left of incpt;
            rbuck ← Create (cell, gain) bucket right of incpt;
            s ← FindLeastCellMovement(lbuck, rbuck);
        end
        Legalize operation(o, r, s);
        Perform design rule check (DRC) for r;
    end
end

Finding an Optimal Solution

We seek an optimal solution for applying an operation o such that the least number of cells are
moved from their original position, and the relative placement order of cells is maintained. The
optimal solution is found by moving the least number of cells in the horizontal direction in an
operation row r to satisfy the whitespace demand of o. Assume that the whitespace demand for
operation o is wd. As discussed in the previous paragraph, an incision point for an operation o
creates two sub-rows, a left sub-row and a right sub-row, of an operation row r. Relative
placement order is maintained by restricting the movement of cells in the left sub-row only to the
left, and of cells in the right sub-row only to the right. Both the left and right sub-rows
contain a number of cells and whitespaces. Any unused area between two adjacent cells is counted
as a single whitespace. We define the following parameters for the left sub-row:

CL : Total number of cells in the left sub-row
WL : Total number of whitespaces in the left sub-row
gli : Whitespace gain if whitespace i, counted from the incision point to the left, is consumed
nli : Number of cells between whitespaces i and i − 1
GLi : Cumulative whitespace gain if the first i whitespaces from the incision point to the left are consumed
NLi : Cumulative number of cells moved to the left to obtain the cumulative whitespace gain GLi

Therefore, if the first whitespace from the incision point is consumed, we get GL1 = gl1 and NL1 = nl1.
If the first two whitespaces are consumed, GL2 and NL2 are (GL1 + gl2) and (NL1 + nl2) respectively.
Continuing in this way, if the first i whitespaces counted from the incision point are consumed, we get
\[ GL_i = \sum_{k=1}^{i} gl_k, \qquad i \le W_L \tag{7.5} \]

\[ NL_i = \sum_{k=1}^{i} nl_k, \qquad i \le W_L \tag{7.6} \]
Similar equations for right sub-row can be obtained. In this case, first i whitespaces are counted
from incision point to the right. Definition of parameters is similar to left sub-row, only L (left) is
replaced by R (right).
\[ GR_j = \sum_{k=1}^{j} gr_k, \qquad j \le W_R \tag{7.7} \]

\[ NR_j = \sum_{k=1}^{j} nr_k, \qquad j \le W_R \tag{7.8} \]

Therefore, the least movement of cells that maintains the relative placement order of cells
in a row r can be obtained by solving the following integer linear programming formulation
of the problem:

\[
\begin{aligned}
\text{Minimize} \quad & NL_i + NR_j \\
\text{subject to} \quad & 0 \le NL_i \le C_L, \quad i \le W_L \\
& 0 \le NR_j \le C_R, \quad j \le W_R \\
& GL_i + GR_j \ge wd
\end{aligned}
\tag{7.9}
\]

In order to find an optimal solution, we build the following two cell(gain) buckets from equations 7.5
to 7.8:
Left Bucket:
NL1 (GL1), NL2 (GL2), ..., NLi (GLi)
The left bucket contains i elements in ascending order, such that the ith element is the first one
satisfying the condition GLi ≥ wd and i ≤ WL. Each bucket element corresponds to a whitespace in the
left sub-row, and is represented as a pair: the cumulative number of cells moved and the cumulative
whitespace gain obtained if that whitespace is consumed.

Right Bucket:
NR1 (GR1), NR2 (GR2), ..., NRj (GRj)
The right bucket contains j elements in ascending order, such that the jth element is the first one
satisfying the condition GRj ≥ wd and j ≤ WR. As in the left sub-row, each bucket element
corresponds to a whitespace in the right sub-row.
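
As an illustration, a bucket can be built by accumulating the per-whitespace gains and cell counts outward from the incision point. The sketch below uses a hypothetical function name and hypothetical input lists; it also assumes that a bucket simply holds all whitespaces of its sub-row when no single element reaches the demand wd.

```python
def build_bucket(gains, cells, wd):
    """Build a cell(gain) bucket for one sub-row.

    gains[k] : gain of the (k+1)-th whitespace counted from the incision point
    cells[k] : number of cells that must move to consume that whitespace
    Returns (N, G) pairs of cumulative cell count and cumulative gain,
    truncated at the first element whose cumulative gain reaches wd.
    """
    bucket = []
    n_cum = 0
    g_cum = 0
    for g, n in zip(gains, cells):
        g_cum += g
        n_cum += n
        bucket.append((n_cum, g_cum))
        if g_cum >= wd:
            break  # later whitespaces can only cost more cell movement
    return bucket
```

The same function serves both sub-rows; only the direction in which the whitespaces are counted differs.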

The potential solutions ps satisfying equation 7.9 can be obtained by combining each
element of the left bucket with every element of the right bucket. A potential solution (NL +
NR) represents the movement of NL cells in the left sub-row and NR cells in the right sub-row. The optimal
solution is the potential solution with the least (NL + NR). Since the left and right buckets hold at most WL and WR
elements respectively, combining left bucket elements with right bucket elements results in a worst-case
search space of size χs ← (WR × WL). However, this worst case never occurs, because
the elements in the left and right buckets are arranged in successively increasing order. We

Algorithm 5: FindLeastCellMovement
Input: Left Bucket lbuck with i elements, Right Bucket rbuck with j elements
Output: Optimal solution s

ps: Potential solutions;
lhigh ← length[lbuck];
rhigh ← length[rbuck];
/* see text description for GL, GR, NL, NR definitions */
if GLlhigh ≥ wd then
    ps ← (NLlhigh + 0);
    lhigh ← lhigh − 1;
end
if GRrhigh ≥ wd then
    ps ← (0 + NRrhigh);
    rhigh ← rhigh − 1;
end
foreach element l in lbuck in descending order do
    foreach element r in rbuck in ascending order do
        /* wd: whitespace demand */
        if (GLl + GRr) ≥ wd then
            ps ← (NLl + NRr);
            break;
        end
    end
    if rbuck scanned to rhigh then break;
end
s ← ps with lowest (NL + NR);
return(s);

make use of this property to combine bucket elements in a specific order, which reduces the search
space significantly. The search space is explored by scanning the left bucket in descending order.
Each element of the left bucket is combined with elements of the right bucket in ascending order until
the first potential solution in the right bucket direction is encountered. The exploration stops when
either all elements have been explored or the right bucket has been completely scanned at least once.
Pseudo-code for the process is shown in Algorithm 5. As a result, the solution space explored

[Figure 7.2: operation row with CL = 12, WL = 5, CR = 13, WR = 7; whitespace gains labeled in the row: 4, 8, 2, 3, 2, 1, 2, 3, 3, 4, 4, 2; legend: standard cell, whitespace with gain gl, incision point]

Figure 7.2: Example Operation Row with INSERT Operation

by our modified approach is always smaller than χs. In comparison, the solution
space explored in [2] is equal to ψs ← [(CR + 1) × (CL + 1)], because the authors analyze the left and
right sub-rows on a cell-by-cell basis. Since the total cell count of a typical row in a standard cell
based design is always significantly higher than the total number of whitespaces, we have χs ≪ ψs.
Hence, the modified approach finds an optimal solution with significantly fewer computations.
Figure 7.2 exemplifies the difference.

Figure 7.2 shows an example operation row with the incision point at Cell 12 for an INSERT
operation. Suppose the whitespace demand for this operation is 8. The necessary row parameters are
shown in the figure. The solution space generated by [50] contains 156 matrix elements. In comparison,
the solution space produced by our approach contains only 15 elements, as shown in Figure 7.3.
All potential solutions are shown in bold, and the optimal solution is underlined. Clearly, the number of
computations required is significantly reduced using this modified approach.
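
The ordered exploration of the two buckets can be sketched in Python as follows. This is a two-pointer variant written for illustration, not a transcription of Algorithm 5: it prepends a (0, 0) element to each bucket to represent moving no cells on that side, and it relies only on the property stated above, namely that cumulative gains and cell counts never decrease along a bucket. The function name and the bucket values in the usage note are hypothetical.

```python
def find_least_cell_movement(lbuck, rbuck, wd):
    """Return (NL, NR) with the least NL + NR such that GL + GR >= wd.

    lbuck, rbuck : lists of (N, G) pairs, cumulative and non-decreasing.
    Returns None when the two sub-rows together cannot supply wd.
    """
    left = [(0, 0)] + list(lbuck)    # (0, 0): consume nothing on the left
    right = [(0, 0)] + list(rbuck)   # (0, 0): consume nothing on the right
    best = None
    j = 0                            # right pointer, only ever moves forward
    for nl, gl in reversed(left):    # scan the left bucket in descending order
        # advance to the first right element that satisfies the demand
        while j < len(right) and gl + right[j][1] < wd:
            j += 1
        if j == len(right):
            break                    # smaller left gains cannot be satisfied either
        nr = right[j][0]
        if best is None or nl + nr < best[0] + best[1]:
            best = (nl, nr)
    return best
```

For example, with lbuck = [(1, 2), (3, 5), (6, 9)], rbuck = [(1, 1), (2, 5), (6, 10)] and wd = 8, the sketch examines four combinations instead of all sixteen and selects three cells moved on the left and two on the right.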

7.4.2 Phase-II: Candidate Cell Selection and Movement

Phase-II of ECO-Placer is invoked if Phase-I leaves one or more violating
rows. Figure 7.4 shows an example row distribution at the end of Phase-I. Three types of rows can
be identified in Figure 7.4: (1) rows having no whitespaces and aligned with the core boundary
(e.g., row 2), (2) rows having whitespaces and remaining inside the core boundary (e.g., rows
3 and 7), and (3) violating rows crossing the core boundary (rows 4 and 5). Note that, as a result of Phase-I,
violating rows do not contain any whitespaces. Phase-II operates on each violating row
and moves enough cells from it to rows of the second type that the violating row
becomes non-violating. Algorithm 6 shows the pseudo-code for Phase-II.

[Figure 7.3: matrix of left bucket elements against right bucket elements, each written as Nx(Gx); legend: optimal solution, potential solutions, other solutions]

Figure 7.3: Solution Space by our Approach

[Figure 7.4: rows ROW 0 to ROW 9 of standard cells and whitespaces drawn against the core width (cw), with the violating rows marked]

Figure 7.4: Example Row Distribution after Phase-I

Consider a violating row vr having width Wvr, and let the core width be cw. A number of cells with
widths w1, w2, ..., wn are selected and moved from vr to rows which can accommodate these cells,
such that

\[ \left( W_{vr} - \sum_{i=1}^{n} w_i \right) \le cw \tag{7.10} \]

These cells are called Candidate Cells. Candidate cells are free to move in the horizontal as well as
the vertical direction. In contrast, cells that are allowed to move only in the horizontal direction
are referred to as Locked Cells. Cells operated on in Phase-I belong to this latter category. A candidate
cell cc is selected for movement if it satisfies the following three conditions:

1. It minimizes the cell cost (discussed in a following subsection).

2. Optimal position of cell cc belongs to a row of type (2), referred to as an optimal row.

3. Optimal row can accommodate cell cc without being converted to a violating row.

Candidate Optimal Position Calculation

The optimal position of a candidate cell is calculated from the balance of forces exerted on the
candidate cell by its neighbors. A cell is typically connected to a few other cells through its inputs and
outputs. The force experienced by a candidate cell due to a neighbor is proportional to their
connection length (wirelength). The optimal position is the one where the forces due to neighboring cells balance
out and the candidate cell experiences zero net force. A cell in its zero-force location also minimizes
the total wirelength, which is defined as the sum of the individual connection lengths to each neighbor. The optimal
position zpos ← (xopt, yopt) of a candidate cell is calculated as [51]:

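\[ x_{opt} = \frac{\sum_{k=1}^{nc} \lambda_k \cdot x_k}{\sum_{k=1}^{nc} \lambda_k} \tag{7.11} \]

\[ y_{opt} = \frac{\sum_{k=1}^{nc} \lambda_k \cdot y_k}{\sum_{k=1}^{nc} \lambda_k} \tag{7.12} \]

where nc is the number of neighbors of the candidate cell, and λk is the connection weight of neighbor k.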
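
Equations 7.11 and 7.12 amount to a weighted centroid of the neighbor positions. A minimal Python sketch follows; the function name and the (weight, x, y) tuple layout are our own choices for illustration.

```python
def optimal_position(neighbors):
    """Zero-force position of a candidate cell (equations 7.11 and 7.12).

    neighbors : list of (weight, x, y) tuples, one per connected neighbor.
    """
    total_weight = sum(w for w, _, _ in neighbors)
    if total_weight == 0:
        raise ValueError("candidate cell has no weighted neighbors")
    x_opt = sum(w * x for w, x, _ in neighbors) / total_weight
    y_opt = sum(w * y for w, _, y in neighbors) / total_weight
    return x_opt, y_opt
```

With neighbors at (0, 0) and (10, 0) of weight 1 and a neighbor at (5, 8) of weight 2, the zero-force position is (5.0, 4.0): the heavier connection pulls the cell toward the third neighbor.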

Candidate Cost Calculation

A candidate cell is selected for movement to zpos if it minimizes the following weighted cost
function:

\[ cc_{cost} = \alpha \cdot C1(cc) + \beta \cdot C2(cc) + \gamma \cdot C3(cc) \tag{7.13} \]

where α, β, and γ are empirically set weights, and C1, C2, C3 are the cost functions described
below.

C1(cc) is the first component of the candidate cell cost: the change in wirelength
if the candidate cell is moved to its optimal position. We want to minimize this component. C1(cc) is
given as:

\[ C1(cc) = \Delta WL(zpos, nc) \tag{7.14} \]

The total wirelength of the net connecting a candidate cell cc to its nc neighbors is given by [51]:

\[ WL_{curr} = \sum_{k=1}^{nc} (x - x_k) + \sum_{k=1}^{nc} (y - y_k) \tag{7.15} \]

where wtk is the weight of the edge connecting cc to neighbor k, (x, y) is the candidate cell position,
and (xk, yk) is the position of neighboring cell k.

When a candidate cell moves to its optimal position zpos, equation 7.15 can be used to calculate
the new wirelength, WLopt, of the net. The change in wirelength is then defined as:

\[ \Delta WL(zpos, nc) = WL_{curr} - WL_{opt} \tag{7.16} \]

The second component of the cost is defined as:

\[ C2(cc) = -1 \cdot SIZE(cc) \tag{7.17} \]

where SIZE refers to the size of candidate cell cc. The wider the cell cc, the lower the cost C2.
A wider cell means that fewer candidate cells need to be moved to fix the violating row.

The final component of the cost is defined as:

\[ C3(cc) = PWR(cc) \tag{7.18} \]

where PWR refers to the power consumption of candidate cell cc. The lower the power consumption of
cell cc, the lower the cost C3. A candidate cell with lower power consumption is preferred because
moving it to a new row puts less load on the power distribution connection of that
row, and hence results in less voltage drop.
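
The three cost components can be combined as in equation 7.13. The Python sketch below is illustrative only: the function names are hypothetical, the wirelength is computed as an unweighted Manhattan distance (taking absolute values is our assumption), and the default weights of 1.0 are placeholders rather than the empirically set values.

```python
def wirelength(pos, neighbors):
    """Total connection length from pos to each neighbor (cf. equation 7.15).

    Assumption: Manhattan distance with absolute values, all edges unweighted.
    """
    x, y = pos
    return sum(abs(x - xk) + abs(y - yk) for xk, yk in neighbors)


def candidate_cost(curr_pos, zpos, neighbors, size, power,
                   alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted candidate cell cost following equations 7.13 to 7.18."""
    delta_wl = wirelength(curr_pos, neighbors) - wirelength(zpos, neighbors)  # eq. 7.16
    c1 = delta_wl   # eq. 7.14: change in wirelength
    c2 = -size      # eq. 7.17: wider cells lower the cost
    c3 = power      # eq. 7.18: low-power cells lower the cost
    return alpha * c1 + beta * c2 + gamma * c3
```

Because the weights multiply quantities with different units (length, width, power), in practice they would also have to absorb any normalization between the three terms.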

Algorithm 6: ECO-Placer Phase-II: Candidate Cell Selection and Movement
Input: Violating Row vr
Output: Row Status rowsts

rowsts ← VIOLATED;
cclist ← Get candidate cells in vr;
foreach Candidate Cell cc in cclist do
    zpos ← Calculate optimal position for cc;
    nc ← Get cc neighbors;
    cccost ← α · ∆WL(zpos, nc) − β · SIZE(cc) + γ · PWR(cc);
end
Sort Candidate Cells in ascending order wrt cccost;
foreach cc in cclist do
    if cc can be moved to zpos then
        Move the cc to its zpos in optimal row;
        rowsts ← Update vr;
        if rowsts is NONVIOLATED then break;
    end
end
return rowsts;
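
The selection loop of Algorithm 6 can be sketched in Python as below. The data layout (dictionaries with name, width and cost keys) and the can_move callback are hypothetical; the callback stands in for the check that the optimal row can accept the cell, and the sketch does not track the occupancy of the target rows.

```python
def phase2_select(row_width, core_width, candidates, can_move):
    """Greedy candidate selection for one violating row.

    candidates : list of dicts with keys 'name', 'width', 'cost'
    can_move   : callable(candidate) -> bool, True if the candidate's optimal
                 row can accommodate it without becoming a violating row
    Returns (moved cell names, final row width, row status).
    """
    moved = []
    for cc in sorted(candidates, key=lambda c: c["cost"]):  # ascending cost
        if row_width <= core_width:
            break                      # row is already non-violating
        if can_move(cc):
            moved.append(cc["name"])
            row_width -= cc["width"]   # the cell leaves the violating row
    status = "NONVIOLATED" if row_width <= core_width else "VIOLATED"
    return moved, row_width, status
```

Cells are visited in ascending cost order, so the row is repaired with the cheapest movable cells first and the loop stops as soon as equation 7.10 holds.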

7.4.3 Phase-III: Core Area Adjustment

Finally, if Phase-II is unable to make a violating row non-violating, Phase-III is used to
increase the core area, as shown in Figure 7.5. The core width is increased by the minimum placement
grid size, as defined in the initial placement DEF of the design for the specific technology node.
Increasing the core area creates extra whitespace. As a result, some rows may change their type, as
defined in the previous section. Type (1) rows change to type (2) rows with the added whitespace, and
type (3) rows, the violating ones, may change to type (1) or type (2) depending on the amount of core
increase.

If one or more violating rows remain even after the core area increase, Phase-II is applied
again. Since Phase-III has created extra whitespace in many rows, applying Phase-II again
can now resolve violating rows by moving cells to their optimal positions.

Phase-II and Phase-III are applied in a loop until all rows become non-violating,
i.e. belong to the type (1) or type (2) category, or the core dimension reaches its maximum limit as
defined by the user.

Algorithm 7: ECO-Placer Phase-III: Core Area Adjustment
Input: Core Width cw, Max Core Width Allowed maxcw
Output: Status Flag FAILSTS

FAILSTS ← FALSE;
if cw < maxcw then
    cw += placement grid size;
    foreach ViolationRow vr in VList do
        Update vr;
    end
else
    Report: operations cannot be applied;
    FAILSTS ← TRUE;
end
return FAILSTS;
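
The interplay of Phase-II and Phase-III described above can be sketched as a simple loop. In this Python illustration the function name is hypothetical and Phase-II is passed in as a callback that returns the row widths after cell movement, so that the sketch stays independent of how cells are actually selected.

```python
def resolve_violations(row_widths, cw, max_cw, grid, phase2):
    """Alternate Phase-II and Phase-III until no row exceeds the core width.

    row_widths : list of current row widths
    cw, max_cw : current and maximum allowed core width
    grid       : placement grid size added to cw by each Phase-III step
    phase2     : callable(row_widths, cw) -> new list of row widths
    Returns (final core width, final row widths, fail flag).
    """
    while True:
        row_widths = phase2(row_widths, cw)
        if all(w <= cw for w in row_widths):
            return cw, row_widths, False   # every row is type (1) or (2)
        if cw >= max_cw:
            return cw, row_widths, True    # Algorithm 7: operations cannot be applied
        cw += grid                         # Phase-III: widen the core by one grid step
```

Passing the identity function as phase2 shows the effect of Phase-III alone: the core is widened one grid step at a time until the widest row fits or the user limit is reached.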

[Figure 7.5: rows ROW 0 to ROW 9 of standard cells and whitespaces drawn against the increased core width (cw), with the remaining violating rows marked]

Figure 7.5: Reduction in Number of Violating Rows with Core Area Increase

7.5 Experimental Results

The ECO-Placer algorithm discussed in this chapter is evaluated on benchmarks given in Section
4.5.1. Experiments were performed on Sun Blade 1000 machines (SparcV9 Processor, 750MHz,
2GB RAM) with Solaris platform.

We compare the placement results generated by our ECO-Placer with the placement generated by
Cadence Encounter ECO. Specifically, we compare the two placements in the following terms:

• Quality of the generated placement from the voltage drop point of view - Since the developed
ECO-Placer is aimed at applying operations for voltage drop optimization, the placement generated
after the ECO flow should reflect the expected voltage drop results.

• Total number of cells moved - One of the important characteristics of ECO-Placer is that it applies
operations with the least total number of cells moved. We compare this metric with the Encounter
ECO placement.

• Change in wirelength - We compare the total wirelength of the ECO-Placer generated placement
with that of the initial placement and the Encounter ECO placement.

• Change in worst negative slack - The worst negative slack of the design before and after ECO
placement is also compared.

Table 7.1 shows the ECO-Placer results. The upper table gives the specifics of the initial placement of
the benchmarks. Columns 2 to 5 show the total number of cells, the peak voltage drop of the design,
the total wirelength of the initial placement, and the worst negative slack in the design. Column 2 in the lower
table gives the number of operations to be applied through the ECO flow. Here, we consider only
REPLACE operations (replacing an OSULIB standard cell with an equivalent decap-padded cell from
UCDCLIB). These operations are applied to the initial placement using Cadence Encounter ECO
(Column 3) as well as using our ECO-Placer approach (Column 4). The last 5 columns show the
percentage change in the ECO-Placer placement metrics with respect to the initial placement
metrics and the Encounter ECO placement metrics respectively. As seen from the tabular results for all
three benchmarks, ECO-Placer applies all the requested operations with minimal
perturbation and the fewest cell movements. As a result, unlike the Cadence Encounter ECO
placement, the voltage drop profile of an ECO-Placer generated design shows significant
improvements. For example, the peak voltage drop of benchmark B14 is −174.84mV before ECO. A total of 336
operations are applied to the B14 initial placement. Cadence Encounter ECO degrades the peak
drop to −211.41mV, whereas ECO-Placer improves it to −140.82mV. This is because
voltage drop is placement dependent, and ECO-Placer applies operations such that the
relative placement order of cells is maintained. Our algorithm improves the peak voltage drop by
−19.46% (Column 5) over the peak drop in the initial placement and by −5% (Column 6) over the peak drop
in the Encounter ECO placement. Further, ECO-Placer requires fewer cell
movements (#TCM) than Cadence ECO to apply these operations. Column 7 shows that
the total wirelength of the placement generated by ECO-Placer increases compared to the initial placement
wirelength; however, the increase is small (2.4% to 6.2%). Moreover, ECO-Placer is not aimed
at wirelength optimization, and maintaining a relative placement order from the voltage drop point of view
can result in a slight increase in wirelength. Compared to Cadence Encounter ECO, ECO-Placer gives
competitive wirelength results (Column 8), reducing wirelength in two of the three cases.
Lastly, ECO-Placer provides better worst negative slack than Cadence Encounter ECO. These results
demonstrate the effectiveness of the ECO-Placer algorithm.

Further, to highlight how ECO-Placer maintains the relative placement order and generates a
placement suited to voltage drop, we show the voltage drop
profiles for the initial design placement and the ECO placed designs in Figures 7.6, 7.7, and 7.8. The
figures show clearly that the ECO-Placer generated placement yields a better voltage
drop profile than the placement generated by Cadence Encounter ECO.
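
The percentage columns of Table 7.1 relative to the initial placement can be reproduced with a one-line helper. The Python sketch below uses a hypothetical function name and assumes the sign convention stated in the table legend, where a negative value indicates an improvement; magnitudes are compared so that the same helper works for negative voltage drops and positive wirelengths.

```python
def pct_change(reference, new):
    """Percentage change of |new| relative to |reference| (negative: reduction)."""
    return (abs(new) - abs(reference)) / abs(reference) * 100.0


# Values taken from Table 7.1 (initial placement versus ECO-Placer).
b14_vd = pct_change(-174.84, -140.82)      # peak voltage drop of B14, in mV
b18_vd = pct_change(-258.61, -233.44)      # peak voltage drop of B18, in mV
bar_wl = pct_change(212523.15, 217629.85)  # wirelength of Barcode16, in µm
print(round(b14_vd, 2), round(b18_vd, 2), round(bar_wl, 2))
```

These three values agree with the corresponding entries of the %VD† and %WL§ columns.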

Table 7.1: ECO-Placer Results

Initial Placement & DvD (1)

Benchmark    #Cells    Peak VD      WL (µm)       WNS
Barcode16    7574      −105.63      212523.15     −1.783
B14          11780     −174.84      680912.15     −3.035
B18          71761     −258.61      3287571.1     −10.603

ECO Placement & DvD

                 Encounter ECO (2)                      ECO-Placer (3)
Benchmark  #OP   Peak VD   WL (µm)     #TCM   WNS       Peak VD   WL (µm)     #TCM   WNS       %VD†    %VD††   %WL§   %WL§§   %WNS¶
Barcode16  549   −98.49    229016.55   2325   −2.987    −92.78    217629.85   911    −2.977    −12.14  −6.12   2.40   −5.23   −0.3
B14        336   −211.41   705297.5    3315   −3.133    −140.82   712394.3    2168   −3.104    −19.46  −5.0    4.6    1.0     −0.9
B18        2825  −248.80   3561509.8   22700  −10.669   −233.44   3492255.7   15482  −10.544   −9.73   −6.6    6.2    −2.0    −1.2

Negative value for last 5 columns indicates an improvement
† Change in Voltage Drop Compared to Initial Placement (1)
†† Change in Voltage Drop Compared to Encounter ECO Placement (2)
§ Change in Wirelength Compared to Initial Placement (1)
§§ Change in Wirelength Compared to Encounter ECO Placement (2)
¶ Change in WNS Compared to Encounter ECO Placement (2)
WNS: Worst Negative Slack
#OP: Number of Operations (REPLACE Operations)
#TCM: Total Number of Cells Moved
Peak VD: Peak Voltage Drop (mV)
WL: Total Wirelength
[Figure 7.6: DvD voltage drop maps for design Barcode16, with a VD color map from highest drop to lowest drop. Initial placement: Peak VD = -105.63 mV. Encounter ECO generated placement: Peak VD = -98.49 mV. ECO-Placer generated placement: Peak VD = -92.78 mV.]

Figure 7.6: ECO-Placer Results for Barcode16

[Figure 7.7: DvD voltage drop maps for design B14, with a VD color map from highest drop to lowest drop. Initial placement: Peak VD = -174.84 mV. Encounter ECO generated placement: Peak VD = -211.41 mV. ECO-Placer generated placement: Peak VD = -140.82 mV.]

Figure 7.7: ECO-Placer Results for B14

[Figure 7.8: DvD voltage drop maps for design B18, with a VD color map from highest drop to lowest drop. Initial placement: Peak VD = -258.61 mV. Encounter ECO generated placement: Peak VD = -248.80 mV. ECO-Placer generated placement: Peak VD = -233.44 mV.]

Figure 7.8: ECO-Placer Results for B18

Chapter 8

Overall Optimization Results

In this chapter, the results of our voltage drop optimization framework on the benchmarks described
in Section 4.5.1 are presented. Sections 4.5.1 and 4.5.2 provide the benchmark details and the analysis
procedure. We analyze the four different cases outlined in Section 4.5.2 for each benchmark. We
summarize these cases briefly for easy reference:

• Pre-Opt: voltage drop analysis on the nominal design (the initial place-n-routed design).
In this case, the total decap budget comes from intrinsic cell decap.

• Post-Opt(F): voltage drop analysis on the nominal design optimized using only the
filler-based decap approach (traditional method). In this case, the total decap budget comes from
intrinsic decap and filler-replaced decap cells.

• Post-Opt(D): voltage drop analysis on the nominal design optimized using our
approach only. The decap in this case comes from intrinsic cell decap and decap-padding
from UCDCLIB cells.

• Post-Opt(DF): voltage drop analysis on the nominal design optimized using both
our approach and the traditional approach. The decap sources in this case are intrinsic cell decaps,
filler-replaced decaps, and decap-padding from UCDCLIB cells.

8.1 Barcode16 Design

• Voltage drop analysis using Traditional Approach


The synthesized design is place-n-routed using Cadence SOC Encounter. Logic cells from the OSULIB
library and filler cells from the DCFLIB library are used for design placement. The design contains 7574
logic gates and 1947 filler cells. The placement parameters are:

Core utilization    75%
Power Ring          Metal 5 and Metal 4
Power Stripe        One in middle (Metal 4)
Die Area            585250 x 576000

The results of DvD analysis on this nominal design are shown below under the Pre-Opt case. We use the
PrimeRail decap insertion flow to optimize voltage drop. After all filler cells are replaced with decap
cell masters from the DCFLIB library, the DvD analysis results obtained are shown under the Post-Opt(F)
case in the following table.

Design Case     Total Design Decap (pF)   Peak VD (mV)
Pre-Opt         505.476                   -105.63
Post-Opt(F)     544.01                    -100.49

• Voltage drop analysis using Our Approach


We demonstrate the efficacy of our approach by graphically selecting a list of cells from the peak voltage
drop regions and replacing them with equivalent cells from the UCDCLIB library (refer to Section 4.5.2
for details). In this case, we replace 525 standard cells belonging to the region where the peak voltage drop
is higher than 90mV. We invoke ECO-Placer to replace these 525 cells with equivalent UCDCLIB
cells. All three libraries, OSULIB, UCDCLIB, and DCFLIB, are utilized for placement. The
design contains 7574 logic and 1698 filler cells. The core area after ECO placement remains the same as
before.

The results of DvD analysis on this design, optimized using only UCDCLIB cells, are shown under the
Post-Opt(D) case. We then replace all filler cells with decap masters from the DCFLIB library using the
PrimeRail decap insertion procedure, and report the DvD results under Post-Opt(DF).

Design Case     Total Design Decap (pF)   Peak VD (mV)
Post-Opt(D)     522.584                   -92.78
Post-Opt(DF)    556.196                   -90.93

Figure 8.1: Optimization Result Graphs for Barcode 16: Peak VD and Decap Budget

Figure 8.1 shows the peak voltage drop and decap budget results for the four cases in
graphical form. The figure makes clear that the improvement in voltage drop due to filler-replaced
decap is not as significant as that obtained using decap-padded standard cells. After replacing all
filler cells, the voltage drop improves only to -100.49 mV with a decap budget of 544.01 pF, whereas our
approach reduces the voltage drop to -92.78 mV with a decap budget of only 522.584 pF. Further,
replacing all filler cells in this optimized design improves the drop only slightly (to -90.93 mV), even
though it takes a large decap budget. Figure 8.2 shows the voltage drop maps for these four cases.

8.2 B14 Design

• Voltage drop analysis using Traditional Approach


The synthesized design is place-n-routed using Cadence SOC Encounter. Logic cells from the OSULIB
library and filler cells from the DCFLIB library are used for design placement. The design contains
11780 logic gates and 1507 filler cells. The placement parameters are:

Core utilization    75%
Power Ring          Metal 5 and Metal 4
Power Stripe        One in middle (Metal 4)
Die Area            685800 x 656000

DvD analysis results for the Pre-Opt and Post-Opt(F) cases are shown in the following table. All 1507
fillers are replaced with decap masters for the Post-Opt(F) case.

[Voltage drop maps; color scale runs from highest drop to lowest drop]

Figure 8.2: Optimization Result for Barcode16: Voltage Drop Maps

Design Case     Total Design Decap (pF)   Peak VD (mV)
Pre-Opt         778.809                   -174.84
Post-Opt(F)     808.634                   -169.46

• Voltage drop analysis using Our Approach


We graphically select from the voltage drop map 336 logic cells belonging to regions where the peak
voltage drop exceeds 130mV (refer to Section 4.5.2 for details). We invoke ECO-Placer to replace these
336 cells with equivalent UCDCLIB cells. All three libraries, OSULIB, UCDCLIB, and DCFLIB,
are utilized for placement. The design after ECO placement contains 11780 logic and 1385 filler
cells. The core area after ECO placement remains the same.

DvD analysis results for Post-Opt(D) and Post-Opt(DF) are given in the following table. All 1385
filler cells are replaced for the Post-Opt(DF) case.

Design Case     Total Design Decap (pF)   Peak VD (mV)
Post-Opt(D)     795.095                   -140.82
Post-Opt(DF)    822.506                   -139.31

Figure 8.3: Optimization Result Graphs for B14: Peak VD and Decap Budget

Figure 8.3 shows the decap budget and voltage drop results for the four cases in graphical
form. The Post-Opt(F) case improves the voltage drop only marginally (by about 3%) despite taking
29.825 pF more decap budget, whereas the Post-Opt(D) case improves the voltage drop by
approximately 20% with 16.29 pF more decap than the Pre-Opt case. Replacing fillers in this case
again shows the inefficacy of the filler-based decap approach. Figure 8.4 shows the voltage drop maps
for these four cases.

8.3 B18 Design

• Voltage drop analysis using Traditional Approach


The synthesized design is place-n-routed using Cadence SOC Encounter. Logic cells from the OSULIB
library and filler cells from the DCFLIB library are used for design placement. The design contains
71761 logic gates and 4791 filler cells. The placement parameters are given below:

[Voltage drop maps; color scale runs from highest drop to lowest drop]

Figure 8.4: Optimization Result for B14: Voltage Drop Maps

Core utilization    75%
Power Ring          Metal 6 and Metal 5
Power Stripe        5 equidistant stripes (Metal 6)
Die Area            1764300 x 1748000

DvD analysis results for Pre-Opt and Post-Opt(F) cases are shown in following table. All 4791
fillers are replaced with decap masters for Post-Opt(F) case.

Design Case     Total Design Decap (pF)   Peak VD (mV)
Pre-Opt         5522                      -258.61
Post-Opt(F)     5618                      -254.19

• Voltage drop analysis using Our Approach


We graphically select from the voltage drop map 2825 logic cells belonging to regions where the peak
voltage drop exceeds 200mV (refer to Section 4.5.2 for details). We invoke ECO-Placer to replace these
2825 cells with equivalent UCDCLIB cells. All three libraries, OSULIB, UCDCLIB, and DCFLIB,
are utilized for placement. The design after ECO placement contains 71761 logic and 3885 filler
cells. The core area after ECO placement remains the same.

DvD analysis results for Post-Opt(D) and Post-Opt(DF) are given in the following table. All 3885
filler cells are replaced for the Post-Opt(DF) case.

Design Case     Total Design Decap (pF)   Peak VD (mV)
Post-Opt(D)     5633                      -233.44
Post-Opt(DF)    5711                      -232.57

Figure 8.5: Optimization Result Graphs for B18: Peak VD and Decap Budget

Figure 8.5 shows the decap budget and voltage drop results for the four cases in graphical form.
The Post-Opt(F) case improves the voltage drop by just 1.7% despite taking 96 pF more decap budget,
whereas the Post-Opt(D) case improves the voltage drop by approximately 10% with 111 pF more decap
than the Pre-Opt case. Replacing all fillers in the Post-Opt(D) design gives a marginal further
improvement of 0.4% in voltage drop, again showing the inefficacy of the filler-based decap approach.
Figure 8.6 shows the voltage drop maps for these four cases.

[Voltage drop maps; color scale runs from highest drop to lowest drop]

Figure 8.6: Optimization Result for B18: Voltage Drop Maps

8.4 Summary

Table 8.1 summarizes the results discussed in the previous sections. For
each benchmark, the peak voltage drop and total design decap before and after replacement
are compared. As seen from the table, compared to the Post-Opt(F) approach, the Post-Opt(D) case offers
much better voltage drop results with a relatively small decap requirement. This indicates significant
decap savings, and hence proportional area savings on the die. The percentage improvement in
voltage drop of Post-Opt(D) over the Pre-Opt case is given in the last column. Clearly, the results
highlight the effectiveness of distributed decap placement using a decap-padded standard cell library.
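
One way to make the decap saving concrete is to compute how many millivolts of peak drop are removed per picofarad of added decap. This figure of merit is our own illustrative choice and is not used elsewhere in this work; the Python sketch below applies it to the B14 values reported in Section 8.2, with a hypothetical function name.

```python
def drop_reduction_per_pf(pre, post):
    """Peak-drop reduction (mV) per pF of decap added between two cases.

    pre, post : (total design decap in pF, peak voltage drop in mV)
    """
    added_decap = post[0] - pre[0]
    reduction = abs(pre[1]) - abs(post[1])
    return reduction / added_decap


# B14 values from Section 8.2.
pre_opt = (778.809, -174.84)
post_opt_f = (808.634, -169.46)   # filler-based decap only
post_opt_d = (795.095, -140.82)   # decap-padded cells only

filler_gain = drop_reduction_per_pf(pre_opt, post_opt_f)
padded_gain = drop_reduction_per_pf(pre_opt, post_opt_d)
print(f"filler-based: {filler_gain:.2f} mV/pF, decap-padded: {padded_gain:.2f} mV/pF")
```

On these numbers the decap-padded cells remove more than ten times as much peak drop per added picofarad as the filler-based decaps.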

Further, as indicated previously in Section 4.5, due to a bug in the Synopsys PrimeRail
tool, we identified the replacement cell list graphically instead of using the DCOPT algorithm. During
the scripted DCOPT iterative analysis, PrimeRail crashes with a segmentation fault after a
random number of iterations. Therefore, we resorted to graphical identification of the cells
to be replaced with cells from UCDCLIB and DCFLIB to demonstrate the approach. Nevertheless, the
graphical identification of replacement cells does not undermine the proposed optimization
framework. The effectiveness of DCOPT has been demonstrated in Chapter 6 by generating the violation
report for DCOPT through the PrimeRail GUI (only this part is executed graphically; all other
operations of DCOPT are executed by the algorithm described earlier). Also, ECO-Placer is not affected by
this process: it is an independent stand-alone tool whose effectiveness has
been shown in Chapter 7. As of the writing of this thesis, the PrimeRail bug has not been resolved
by Synopsys. If it is resolved in the future, we will also include optimization results using DCOPT. For
the present setup, we include manual iteration results for at least two benchmarks. The steps of the manual
analysis are given below.

DvD analysis is performed on the place-n-routed benchmark design, and the initial voltage drop
profile is obtained. We set the voltage drop threshold to some value higher than the peak voltage drop
(negative), and identify the list of cells in violation (having a drop greater than the threshold)
using Perl scripts. These cells are replaced with decap-padded standard cells from UCDCLIB
using ECO-Placer, which generates a modified DEF placement. We also generate a modified
Verilog netlist using Perl scripts. We ECO route the generated placement and save the final placed and
routed design. Design parasitics are also captured in a SPEF file. We perform DvD analysis on
this new place-n-routed design and observe the voltage drop profile. Since this design has
decap-padded cells, we expect an improvement in the voltage drop profile. This forms iteration 1.
We follow the same steps again to perform one more iteration of DvD analysis. Therefore, for each
benchmark, we get voltage drop improvement results for two iterations. We perform the manual
analysis this way to emulate the behavior of DCOPT iterations. Note, however, that with DCOPT
the ECO-Placement step is executed only once, whereas the manual analysis requires ECO-Placement in each
iteration. Hence, the results of the manual analysis do not reflect DCOPT behavior accurately, but
they help illustrate the approach despite the inconsistent behavior of the PrimeRail tool. The results
are shown in Figures 8.7 and 8.8 for benchmarks Barcode16 and B14 respectively. The figures make
clear that, with multiple iterations, the voltage drop using our approach improves significantly.
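
The violation filter at the heart of each manual iteration is simple. The scripts used in this work are written in Perl; the sketch below restates the same idea in Python for illustration, with a hypothetical function name, hypothetical cell names, and a per-cell drop report represented as a plain dictionary.

```python
def cells_in_violation(cell_drops, threshold_mv):
    """Return the names of cells whose voltage drop exceeds the threshold.

    cell_drops   : dict mapping cell instance name to its voltage drop in mV
                   (reported as negative values)
    threshold_mv : user threshold in mV (also negative, e.g. -90)
    """
    limit = abs(threshold_mv)
    return sorted(name for name, vd in cell_drops.items() if abs(vd) > limit)


# Hypothetical drop report for three cell instances.
report = {"u1": -95.2, "u2": -80.0, "u3": -101.7}
print(cells_in_violation(report, -90))
```

The returned list plays the role of the replacement cell list that is handed to ECO-Placer in each iteration.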

Table 8.1: Summary of Voltage Drop Optimization Results

Initial Placement                                      Cell Replacement  After ECO-Placement
                   Pre-Opt           Post-Opt(F)                         Post-Opt(D)       Post-Opt(DF)
Design     #Cells  Cd       Peak VD  Cd       Peak VD  Vthuser  #RCells  Cd       Peak VD  Cd       Peak VD  %VD†
                   (pF)     (mV)     (pF)     (mV)     (mV)              (pF)     (mV)     (pF)     (mV)
Barcode16  7574    505.476  -105.63  544.010  -100.49  -90      549      522.584  -92.78   556.196  -90.93   -12.17
B14        11780   778.81   -174.84  808.63   -169.46  -130     336      795.095  -140.82  822.506  -139.31  -19.46
B18        71761   5522     -258.61  5618     -254.91  -200     2825     5633     -233.44  5711     -232.57  -9.73

#Cells: Number of logic cells in the design
#RCells: Number of logic cells (OSULIB) replaced with equivalent cells from UCDCLIB library
Cd: Total decap budget of the design in pF
Peak VD: Peak voltage drop of the design in mV
Vthuser: User defined voltage drop threshold in mV
† % Change in voltage drop for Post-Opt(D) case compared to Pre-Opt case
Negative percentage value indicates improvement
Figure 8.7: Manual Optimization Results for Barcode16

Figure 8.8: Manual Optimization Results for B14

Chapter 9

Conclusion and Future Directions

In this thesis, we analyzed the on-chip power supply integrity problem in standard cell based ASIC
designs and proposed a complete framework to optimize it. We demonstrated that decaps are an
effective means of containing the on-chip voltage drop within bounds; however, proper decap placement
becomes an important factor as technology nodes scale down. Experimental results show that
the traditional method of decap placement, which involves filler cell replacement, places decaps away
from the violating nodes, rendering them ineffective, and requires a more-than-necessary decap
budget. A distributed approach that places decaps near violating nodes is
cost-effective in terms of voltage drop reduction and required decap budget. We proposed a distributed
decap placement approach by providing a new standard cell library (UCDCLIB) in which logic cells
are padded with decoupling capacitors. A decap optimization algorithm (DCOPT) was developed to
calculate the decap budget of a design in terms of decap-padded standard cells and filler-based decaps.
Lastly, we developed an efficient engineering change order placement tool (ECO-Placer) to
incrementally modify the original design to accommodate these decap-padded standard cells and generate
a valid placement DEF. The framework is integrated with commercial tools. Experimental results
show the effectiveness of the developed framework.

The developed framework can be extended further in the following directions:

• Since decaps always operate in linear mode and directly connect the power and ground
rails, the gate tunneling leakage due to the total design decap can be significant. One possible
way to tackle this problem is to use both thin-oxide and thick-oxide decaps during optimization.
Thin-oxide decaps provide higher decap per unit area at the cost of increased leakage, whereas
thick-oxide decaps take more area and provide less decap per unit area but significantly reduce
the leakage. The optimization algorithm can be extended to calculate the decap budget in
terms of both types of decaps while taking total design leakage into account.

• In this work, we applied the decap optimization process to post-placement designs. Hence,
we developed an ECO-placer algorithm to incrementally modify the original placement.
Although this approach requires a re-spin, the cost of the re-spin is minimized with an ECO-placer.
A possible alternative to reduce the re-spin cost would be to consider decap optimization
during initial design placement. For standard cell based designs, rows can be modeled as an
equivalent RC network, and the effect of decap placement in terms of UCDCLIB cells can be
analyzed in sync with other placement objectives, enabling the development of a voltage drop
driven placement tool. Since accurate voltage drop analysis is not possible until the cells’
placement is fixed, decap optimization can be applied in two steps. During the global bin
placement step, the decap for individual bins can be calculated based on the coarse power grid
structure, the available space in the bin, cell power consumption and the distance of the bin from
the core periphery. The calculated decap budget can then be refined at the local placement step.

• The proposed framework has been restricted to on-chip voltage drop analysis. In order to
simplify the analysis, we assume ideal power supply points at the chip IO interface. However, in
reality, package interconnect parasitics also contribute significantly to the overall on-chip voltage
drop. Inductance effects due to the package interconnect can also be explored further.

• Reduced gate dimensions with technology scaling have a positive effect on the overall decap
density, since gate capacitance varies inversely with gate oxide thickness. Hence,
with technology scaling, the decap-padded standard cell approach is expected to deliver higher
gains in terms of decap budget and area requirements. However, the downside of a thin gate
oxide is increased susceptibility to gate breakdown due to electrostatic discharge.
Cross-coupled decap approaches [34] have been proposed to address these issues.
The developed optimization framework can be extended to nanometer technology
nodes by taking these phenomena into account.
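
The inverse dependence of decap density on gate oxide thickness noted above follows from the parallel-plate estimate C_ox = ε0·εr / t_ox. The Python sketch below illustrates it under stated assumptions: an SiO2 dielectric (εr = 3.9) and two purely hypothetical oxide thicknesses (4 nm "thin" and 7 nm "thick"), which are not values taken from this work. It ignores leakage and any voltage dependence of the MOS capacitance.

```python
EPS0_F_PER_M = 8.854e-12  # vacuum permittivity (F/m)
EPS_R_SIO2 = 3.9          # relative permittivity of SiO2 (assumed dielectric)


def oxide_cap_density_ff_per_um2(t_ox_nm):
    """Parallel-plate estimate C_ox = eps0 * eps_r / t_ox, returned in fF/um^2."""
    c_f_per_m2 = EPS0_F_PER_M * EPS_R_SIO2 / (t_ox_nm * 1e-9)
    return c_f_per_m2 * 1e3  # 1 F/m^2 = 1e15 fF / 1e12 um^2


# Hypothetical oxide thicknesses, chosen only to show the trend.
thin = oxide_cap_density_ff_per_um2(4.0)
thick = oxide_cap_density_ff_per_um2(7.0)

print(f"thin-oxide decap density:  {thin:.2f} fF/um^2")
print(f"thick-oxide decap density: {thick:.2f} fF/um^2")
print(f"area ratio for equal decap (thick/thin): {thin / thick:.2f}")
```

Under these assumptions, a thick-oxide decap needs proportionally more area for the same capacitance, which is the area side of the leakage trade-off described in the first bullet.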

Appendix A

Milkyway Library Creation using LEF and DEF

Synopsys tools require design data to be available in the Milkyway database format (refer to Section
4.3). Two types of Milkyway libraries must be created: a reference library and a design library.
A Milkyway reference library contains various views of the standard cells and components that
can be instantiated in an upper-level design. A Milkyway design library is created for a top-level
design, which instantiates cells and components from the reference library. This appendix describes
the steps required to generate Milkyway reference and design libraries [52].

A.1 Milkyway Reference Library Creation

A Milkyway reference library contains physical and logical views of the standard cells in the
Milkyway database format. These views are shown in Figure 4.3. The Milkyway physical view can be
created from the standard cells’ physical views, available in either GDSII or LEF format. The Milkyway
logical view can be created from the standard cells’ timing and power views, available in either LIB or
DB format. The following steps create a reference library using the LEF and DB flow.

• Input Requirements:

1. Technology File: describes the attributes of a Milkyway library. It defines measurement
units, design rules, vias and layer attributes such as parasitic resistance and
capacitance for the technology node. The Milkyway technology file (mw_tech.tf) for the
0.18µm process can be obtained from the OSU standard cell library.
2. Library LEF File: contains physical information about the cells. Refer to Section 5.4.2
for information about LEF file creation.
3. Library DB File: contains timing and power information for standard cells. Refer to
Section 5.4.3 for information about library timing view (DB) creation.

• Steps:

1. Start PrimeRail by typing the following at the command prompt:

> PrimeRail

2. The PrimeRail user interface supports Scheme as well as Tcl mode for command input. The
appropriate mode can be selected either by typing begin scheme for Scheme mode and begin tcl
for Tcl mode, or by clicking the lower-left buttons marked “Scheme” and “Tcl”. Type the
following command in Scheme mode to open the dialog box shown in Figure A.1. Library
preparation then proceeds in two steps, physical and logical.
> read_lib

3. Physical Library Preparation


(a) Click the “Prepare Physical Library” button. The dialog box expands. Set “Physical
Input Format” to TF+LEF.
(b) Click the “Create Library” button. Fill in the following information in the opened dialog
box and click OK. This creates an empty reference library with the name specified in
Library Name.
Library Name                 Such as MW_OSULIB
Technology File Name         mw_tech.tf
Hierarchy Separator          /
Set Case Sensitive           Checked
(c) Click “Set Bus Naming Style”. Enter the bus naming style (default [%d]). This is used
to identify bus signal names. It can be obtained from the LEF/DEF file.
(d) Click “Import LEF”. In the opened dialog box, enter the following information.
Library Name                 MW_OSULIB
Tech LEF Files               Name of LEF file containing technology information.
Cell LEF Files               Name of LEF file containing cell information.
Layer Mapping                LEF to Milkyway layer mapping file.
Advanced Library Prep Mode   Unchecked
Others                       default
Typically, a standard cell library LEF file contains both the technology information (at the
beginning) and the cell information. Therefore, the tech LEF and cell LEF entries refer to the
same file name. The layer mapping file can be obtained from the OSU standard cell library.
Unchecking “Advanced Library Prep Mode” performs the next four steps (“Extract
BPV”, “Set PR Boundary”, “Set Property For Multiple Height Cells”, and “Define
Wire Track”) automatically.

Figure A.1: Reference Library Creation Dialog Box

(e) Click “Check Wire Track”. Defaults OK.
(f) Click “Create PDB”. In the opened dialog box, enter reference library name. Select
“Import PDB”. Select “From FRAM View”. Click OK.

4. Logical Library Preparation

(a) Once the physical library views are prepared, click the “Prepare Logical Library” button
in the read_lib dialog box. The dialog box expands. Click “Logical Input Format:
LIB/DB”.
(b) In the opened dialog box, click “Import Logic Model DB”. Click “Select DB”. In
the opened dialog box, set Min, Max, and Typical DB to standard cell library DB
format file. Set “Port Directions” Checked. Click Apply.

5. At this stage, both physical and logical views have been created. You should see the
Milkyway library files under the folder with reference library name.

6. To check if library cells have been imported properly in Milkyway database, open the
reference library (Library->Open Library). Open cell (Cell->Open->Browse->All
versions). You should see FRAM and LM views for all library cells.

A.2 Milkyway Design Library Creation

A Milkyway design library can be created from the input design DEF file. A placed-and-routed design
can be saved in the DEF file format. The DEF file defines the physical layout of the design, including
instantiated cells and macros, the design floorplan, power and signal routing, the netlist, and constraints.
The steps to generate a Milkyway design library from a DEF file are described below:

• Input Requirements:

1. Technology File: same as above (mw tech.tf)

2. Milkyway Reference Libraries: reference libraries created above, if the design instanti-
ates components from reference library.

3. Library LEF File: same as above

4. Input Design in DEF File Format.

• Steps:

1. Start PrimeRail (see above). Create the design library by typing the following in Scheme mode
or by clicking Library->Create. Enter the information as discussed above.
> cmCreateLib

2. Attach the Milkyway reference libraries to the design. Since the design instantiates components
from the reference libraries, we need to provide a logical reference to the appropriate Milkyway
reference libraries. Type the following in Scheme mode or click “Library->Add Ref...”.
Specify the library paths and click OK.
> cmRefLib

3. Verify the library references. Use the corresponding Scheme command or click “Library->Show
Ref...”. Enter the design library name and click OK. All attached reference libraries must be
displayed in the command window.

4. Open the library. (Library->Open or geOpenLib in Scheme mode).

5. Import DEF. Enter the following to open the DEF import dialog box shown in Figure A.2:
> read_def

Figure A.2: Design Library Creation Dialog Box

Enter the following information and click OK. The DEF file is imported at this stage.

Library Name                 Design library name created at step 1.
Cell Name                    Design cell name. This is the design name used during design
                             placement. It can also be obtained from the DEF file.
DEF File Name                Name of design DEF file.
Advance Cell Version         Unchecked
Netlist Input Mode           Reset & Import
Physical Input Mode          Reset & Import
Row Options
Core Site Name               core
Via Options
Import Incremental LEF...    Checked
LEF File Name                Name of library LEF file.
Others                       default

6. Verify the design library. Click Cell->Open (or type geOpenCell in Scheme mode). Select the cell
name (Browse->Cells) and click OK. You should see the design layout in the display
window.
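
As an illustration of the placement data that a DEF file carries, the Python sketch below extracts placed cell instances from a DEF COMPONENTS section. It is not part of the Milkyway flow. The sample text, the instance names (U1, U2), the cell names and the coordinates are illustrative assumptions, and the regular expression handles only simple single-line component statements.

```python
import re

# Hypothetical DEF fragment; instance names, cells and coordinates are made up.
SAMPLE_DEF = """\
COMPONENTS 2 ;
- U1 NAND2X1 + PLACED ( 1000 2000 ) N ;
- U2 INVX1 + PLACED ( 3000 2000 ) FS ;
END COMPONENTS
"""

# "- <instance> <cell> + PLACED|FIXED ( <x> <y> ) <orientation>"
COMPONENT_RE = re.compile(
    r"^-\s+(\S+)\s+(\S+)\s+\+\s+(?:PLACED|FIXED)\s+"
    r"\(\s*(-?\d+)\s+(-?\d+)\s*\)\s+(\S+)",
    re.MULTILINE,
)


def read_placed_components(def_text):
    """Return one dict per placed component found in the DEF text."""
    return [
        {"inst": inst, "cell": cell, "x": int(x), "y": int(y), "orient": orient}
        for inst, cell, x, y, orient in COMPONENT_RE.findall(def_text)
    ]


components = read_placed_components(SAMPLE_DEF)
for comp in components:
    print(comp)
```

A check of this kind can help confirm that the cell name and instance count seen after the DEF import match the DEF file itself.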

Appendix B

Library Characterization for Voltage Drop Analysis

As discussed in Section 4.3, an accurate and efficient voltage drop analysis requires the underlying
cells’ parasitic and transient current information at each switching event. PrimeRail provides a
library characterization flow to capture this information for each library cell. The characterization
results are stored in the Milkyway cell reference library. Library characterization can also capture
cell leakage information. The following steps are used to characterize library cells.

• Input Requirements:

1. Milkyway Reference Library: LM (Logic Model) view from reference library is re-
quired to access cell timing and power information.

2. Transistor Model File: Transistor model file for the technology node.

3. Spice SubCkt Cell File: A file containing spice netlist in .subckt format for all cells.

4. Port Specification File: The port specification file (pg.spec) specifies the power and ground
ports used in the reference library, the voltage level of each port, and mapping information for
each port. The mapping information is needed to express current sources and sinks. An example
pg.spec file is shown below:
pg.spec file
definePowerPort "vdd" 1.8
definePowerPort "gnd" 0
defineDefaultPowerPort "vdd"
defineDefaultGroundPort "gnd"
defineGroundPowerMapping "gnd" {"vdd"}

5. HSPICE tool: PrimeRail invokes HSPICE to perform characterization.

• Steps:

1. Start PrimeRail by typing following at command prompt:


> PrimeRail

2. Open library to be characterized (Library->Open or geOpenLib in Scheme mode).

3. Library characterization can be performed by running the following single command:

> pgLibCharacterize

This single command runs four individual commands:
pgSpiceSetup, pgPreCharacterize, pgLinkPGSpec, and pgLinkCharacterize. If pgLibCharacterize results in
errors, library characterization can be performed by running these individual commands in
order. Running pgLibCharacterize opens the dialog box shown in Figure B.1. This dialog box
can also be opened using the “Cell-Level Dynamic Analysis->Library Characterization...”
menu.

4. Enter the following information in the opened dialog box and click OK.

SPICE binary                      HSPICE executable name
Spice Files:
Transistor Library File           Transistor model file
Subcircuit File                   Spice subckt file for cells
PVT Condition:
Process                           1.0
Temperature                       25
P/G Port Spec File                pg.spec file
Directory to Store Spice Error    Path to store errors
Include Cell Name From File       Provide a file containing cell names
                                  to be characterized. Leave blank to
                                  characterize all cells.
Pre-Characterization Type         Select type of characterization.
Library Info. Source              Select LM View
Pre-Characterization Base         Based on SPICE Simulation.
Distributed Processing            Unchecked.

5. To verify characterization results, use following commands:


> pgDumpCharacterize

> pgValidateLib

> pgListCharResult

Figure B.1: Library Characterization for DvD Dialog Box

Appendix C

Dynamic Voltage Drop Analysis using PrimeRail

Dynamic voltage drop (DvD) analysis can be performed on a design once the Milkyway design
and reference libraries have been created and the libraries have been characterized properly. Refer to
Sections 4.3 and 4.4 for an explanation of the PrimeRail voltage drop analysis flow. In this appendix, we
provide the steps to perform DvD analysis using PrimeRail.

• Input Requirements: Input requirements for DvD analysis are discussed in Section 4.3. We
briefly summarize requirements here for easy reference.

1. Milkyway Reference Library

2. Milkyway Design Library

3. Library Characterization Data

4. Design Verilog Netlist

5. Switching Activity in VCD or SAIF Format

6. Design Parasitic Information (SPEF Format)

7. Design Constraints (SDC File)

8. TLU+ Models for interconnect RC extraction or ITF (Interconnect technology file) for
technology node.

• Steps: Following steps describe commands to perform cell-level DvD. These steps can also
be performed graphically from “Cell-Level Dynamic Analysis” menu.

1. Start PrimeRail. Open Milkyway design library to be analyzed.


> PrimeRail

> geOpenLib

2. Purge Old Rail View, if any. (DvD analysis results are stored in RAIL View)
> poPurgeRail

3. Perform Power Analysis: PrimeRail performs design power analysis by invoking PrimeTime-
PX. A script template to perform power analysis using PrimeTime-PX can be created
by executing following command:
> poCreatePTPXScriptTemplate

Modify the output script. Specify the Verilog file name, design timing library (DB), switching
activity (VCD or SAIF based), SPEF file and power analysis output file. To perform
vector-free power analysis, specify switching activity for the primary inputs; PrimeTime-PX
propagates the switching activity to internal nets. If no input is specified, PrimeTime-PX
assumes a default input activity factor. A sample script is shown below:
ptpxscript
set power_enable_analysis true
set link_library [list * <path to library DB file>]
read_verilog [list <path to design verilog netlist>]
current_design <design name>
link
pwr_default_toggle_rate 0.5
pwr_default_static_probability 0.5
pwr_default_tr_reference_clock [fastest | related]
read_sdc <path to design constraints (sdc)>
read_parasitics <path to design parasitic file (spef)>
set power_rail_output_file <power output file>
update_power
quit

Run following command and enter the script file name in the opened dialog box.
> poCallPTPX

PrimeTime-PX will be invoked and design switching, short-circuit and leakage power
will be reported to the command window. PrimeRail stores cell power information in
the specified output file.

4. Load Power Supply: This step tells PrimeRail about power net voltage levels. Run
following command and specify power supply information in TDF format.
> poLoadPowerSupply

A sample TDF file is given below:
Supply.tdf file
tdfSetPowerSupply "vdd" 1.8

5. Calculate Transient Current Waveforms: Cell transient current information from the library
characterization data and cell power information from the PrimeTime-PX results are
combined at this step to generate actual transient current waveforms for each cell. The current
waveform for each cell, recorded at a few significant points (10%, 50%, 90%, and peak value),
is stored in the design Milkyway library. The following command calculates the current
waveforms:
> poTransientPowerAnalysis

6. Extract Power and Ground Nets: The final step required before DvD analysis is to extract the
power grid parasitics. TLU+ models are required to extract RC parasitics for the power grid. In
the absence of a TLU+ model, only resistance parasitics can be extracted. Run the following
command to extract the power grid resistance:
> poPGExtraction

7. Perform Cell Level Dynamic Analysis: At this step, RC view of power grid can be com-
bined with current and parasitic information of cells and Rail analysis can be performed.
Following command performs the DvD rail analysis:
> poRailAnalysis

Rail analysis requires an additional input, the tap file, which specifies the power supply inputs
to the design. Power supply input points are specified as coordinates in the two-dimensional
layout map. The tap file can be created by graphically specifying the power supply points
in the design (such as at the power supply rings). A sample tap file is shown below:
Tap File
vdd 14 83.150 143.100 # 14 and 12 refer to supply metal layers.
vdd 12 3.100 76.350

8. View Results: Finally, results of voltage drop can be seen by executing following com-
mand. A threshold voltage can be set in the opened dialog box to observe violating
nodes (nodes having high voltage drop).
> pgMap
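
The tap file used in step 7 is a simple line-oriented text format, so it can be sanity-checked with a few lines of script before rail analysis. The Python sketch below parses the sample shown above. It assumes each line carries the net name, the metal layer number and two coordinates; reading the coordinates as x followed by y is our assumption, and the names Tap and parse_tap_file are illustrative rather than part of PrimeRail.

```python
from typing import NamedTuple


class Tap(NamedTuple):
    net: str      # supply net name, e.g. "vdd"
    layer: int    # supply metal layer number
    x: float      # first coordinate (assumed to be x)
    y: float      # second coordinate (assumed to be y)


def parse_tap_file(text):
    """Parse tap-file text, ignoring blank lines and '#' comments."""
    taps = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        net, layer, x, y = fields  # raises ValueError on malformed lines
        taps.append(Tap(net, int(layer), float(x), float(y)))
    return taps


SAMPLE_TAP = """\
vdd 14 83.150 143.100 # 14 and 12 refer to supply metal layers.
vdd 12 3.100 76.350
"""

taps = parse_tap_file(SAMPLE_TAP)
for tap in taps:
    print(tap)
```

Malformed lines raise an error rather than being skipped, so a missing supply point is noticed before the analysis is run.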

Bibliography

[1] International Technology Roadmap for Semiconductors. Technical report, Semiconductor In-
dustry Association, 2007. http://public.itrs.net.

[2] Fei Yuan. Simultaneous Switching Noise. http://www.ee.ryerson.ca/~fyuan/ssn.pdf.

[3] PrimeRail User Guide. Technical report, Synopsys Inc., San Jose, CA, 2008.

[4] Jan Rabaey, Anantha Chandrakasan, and Borivoje Nikolic. Digital Integrated Circuits: A
Design Perspective. Pearson Education, 2nd edition, 2003.

[5] G.E. Moore. Cramming more components onto integrated circuits. Proceedings of the IEEE,
86(1):82–85, Jan 1998. Reprinted from Electronics, 38(8), April 1965.

[6] M.A. Elgamel and M.A. Bayoumi. Interconnect noise analysis and optimization in deep sub-
micron technology. Circuits and Systems Magazine, IEEE, 3(4):6–17, 2003.

[7] Voltagestorm Cell-Level Rail Analysis User Guide. Technical report, Cadence Design Sys-
tems, San Jose, CA, 2007.

[8] A.V. Mezhiba and E.G. Friedman. Power Distribution Networks in High Speed Integrated
Circuits. Kluwer Academic Publishers, 2004.

[9] Q.K. Zhu. Power Distribution Network Design for VLSI. Wiley-Interscience Publication,
2004.

[10] Joon-Seo Yim, Seong-Ok Bae, and Chong-Min Kyung. A floorplan-based planning method-
ology for power and clock distribution in ASICs [CMOS technology]. Design Automation
Conference, 1999. Proceedings. 36th, pages 766–771, 1999.

[11] Yu Zhong and M.D.F. Wong. Thermal-aware IR drop analysis in large power grid. Qual-
ity Electronic Design, 2008. ISQED 2008. 9th International Symposium on, pages 194–199,
March 2008.

[12] K.-H. Erhard, F.M. Johannes, and R. Dachauer. Topology optimization techniques for
power/ground networks in VLSI. Design Automation Conference, 1992. EURO-VHDL ’92,
EURO-DAC ’92. European, pages 362–367, Sep 1992.

[13] R. Dutta and M. Marek-Sadowska. Automatic sizing of power/ground (P/G) networks in
VLSI. Design Automation, 1989. 26th Conference on, pages 783–786, June 1989.

[14] L.D. Smith. Decoupling capacitor calculations for CMOS circuits. Electrical Performance of
Electronic packaging, 1994., IEEE 3rd Topical Meeting on, pages 101–105, Nov 1994.

[15] H.H. Chen and D.D. Ling. Power supply noise analysis methodology for deep-submicron
VLSI chip design. Design Automation Conference, 1997. Proceedings of the 34th, pages 638–
643, Jun 1997.

[16] M. Ang, R. Salem, and A. Taylor. An on-chip voltage regulator using switched decoupling
capacitors. Solid-State Circuits Conference, 2000. Digest of Technical Papers. ISSCC. 2000
IEEE International, pages 438–439, 2000.

[17] A. E. Ruehli. Inductance calculations in a complex integrated circuit environment. IBM
Journal of Research and Development, 1972.

[18] Power Grid Verification. White Paper, 2002. Available at
www.cadence.com/whitepapers/4101_PowerGridVerif_WP.pdf.

[19] M. Popovich, E.G. Friedman, R.M. Secareanu, and O.L. Hartin. Efficient placement of
distributed on-chip decoupling capacitors in nanoscale ICs. Computer-Aided Design, 2007.
ICCAD 2007. IEEE/ACM International Conference on, pages 811–816, Nov. 2007.

[20] Mikhail Popovich, Eby G. Friedman, Michael Sotman, Avinoam Kolodny, and Radu M. Se-
careanu. Maximum effective distance of on-chip decoupling capacitors in power distribution
grids. In GLSVLSI ’06: Proceedings of the 16th ACM Great Lakes symposium on VLSI, pages
173–179, 2006.

[21] Shiyou Zhao, K. Roy, and Cheng-Kok Koh. Decoupling capacitance allocation and its ap-
plication to power-supply noise-aware floorplanning. Computer-Aided Design of Integrated
Circuits and Systems, IEEE Transactions on, 21(1):81–92, Jan 2002.

[22] M.D. Pant, P. Pant, and D.S. Wills. On-chip decoupling capacitor optimization using archi-
tectural level prediction. Circuits and Systems, 2000. Proceedings of the 43rd IEEE Midwest
Symposium on, 2:772–775 vol.2, 2000.

[23] Haihua Su, S.S. Sapatnekar, and S.R. Nassif. Optimal decoupling capacitor sizing and place-
ment for standard-cell layout designs. Computer-Aided Design of Integrated Circuits and
Systems, IEEE Transactions on, 22(4):428–436, Apr 2003.

[24] Chao-Yang Yeh and Malgorzata Marek-Sadowska. Timing-aware power-noise reduction in
placement. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions
on, 26(3):527–541, March 2007.

[25] Y. Cao, T. Sato, M. Orshansky, D. Sylvester, and C. Hu. New paradigm of predictive MOSFET
and interconnect modeling for early circuit simulation. Custom Integrated Circuits Confer-
ence, 2000. CICC. Proceedings of the IEEE 2000, pages 201–204, 2000.

[26] Arizona State University Predictive Technology Model. http://www.eas.asu.edu/~ptm/.

[27] Priyanka Thakore. Development of process variation tolerant standard cells. Master’s thesis,
University of Cincinnati, 2007.

[28] OSU Standard Cell Library. http://www.eda.ncsu.edu/wiki/NCSU_CDK.

[29] P.R. Panda and N.D. Dutt. 1995 high level synthesis design repository. System Synthesis,
1995., Proceedings of the Eighth International Symposium on, pages 170–174, Sep 1995.

[30] ITC’99 Benchmarks. http://www.cad.polito.it/tools/itc99.html.

[31] Design Compiler User Guide. Technical report, Synopsys Inc., San Jose, CA, 2002.

[32] SOC Encounter User Guide. Technical report, Cadence Design Systems, San Jose, CA, 2007.

[33] J.E. Stine, J. Grad, I. Castellanos, J. Blank, V. Dave, M. Prakash, N. Iliev, and N. Jachimiec.
A framework for high-level synthesis of system on chip designs. Microelectronic Systems
Education, 2005. (MSE ’05). Proceedings. 2005 IEEE International Conference on, pages
67–68, June 2005.

[34] Xiongfei Meng, K. Arabi, and R. Saleh. Novel decoupling capacitor designs for sub-90nm
CMOS technology. Quality Electronic Design, 2006. ISQED ’06. 7th International Sympo-
sium on, pages 6 pp.–271, March 2006.

[35] Yiran Chen, Hai Li, K. Roy, and Cheng-Kok Koh. Gated decap: gate leakage control of on-
chip decoupling capacitors in scaled technologies. Custom Integrated Circuits Conference,
2005. Proceedings of the IEEE 2005, pages 775–778, Sept. 2005.

[36] Star hspice manual. Technical report, Avant! Corporation, June 2001.

[37] J.E. Meyer. MOS models and circuit simulation. RCA Rev., 32:42–63, 1971.

[38] M.A. Cirit. The meyer model revisited: why is charge not conserved? [MOS transis-
tor]. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on,
8(10):1033–1037, Oct 1989.

[39] HSPICE simulation and analysis user guide. Technical report, Synopsys Inc., March.

[40] J.D. Djigbenou, Thien Van Nguyen, Cheng Wei Ren, and Dong Sam Ha. Development of TSMC
0.25µm standard cell library. SoutheastCon, 2007. Proceedings. IEEE, pages 566–568, March
2007.

[41] NCSU Cadence Design Kit. http://www.eda.ncsu.edu/wiki/NCSU_CDK.

[42] Chintan Patel. Advanced VLSI Design Abstract Generation. Available at
www.csee.umbc.edu/cpatel2/links/414/slides/lect02_abstract.pdf.

[43] Abstract Generator User Guide. Technical report, Cadence Design Systems, San Jose, CA,
2007.

[44] Signalstorm Library Characterizer User Guide. Technical report, Cadence Design Systems,
San Jose, CA, 2007.

[45] Steve Golson. The human ECO compiler. Synopsys User Group Conference (SNUG), San
Jose, CA, 2004. http://www.trilobyte.com/pdf/golson_snug04.pdf.

[46] J.A. Roy and I.L. Markov. ECO-system: Embracing the change in placement. Design Au-
tomation Conference, 2007. ASP-DAC ’07. Asia and South Pacific, pages 147–152, Jan. 2007.

[47] Chen Li, Cheng-Kok Koh, and P.H. Madden. Floorplan management: incremental placement
for gate sizing and buffer insertion. Design Automation Conference, 2005. Proceedings of the
ASP-DAC 2005. Asia and South Pacific, 1:349–354 Vol. 1, Jan. 2005.

[48] Yi Liu, Xianlong Hong, Yici Cai, and Weimin Wu. CEP: a clock-driven eco placement al-
gorithm for standard-cell layout. ASIC, 2001. Proceedings. 4th International Conference on,
pages 118–121, 2001.

[49] Zhuoyuan Li, Weimin Wu, and Xianlong Hong. Congestion driven incremental placement
algorithm for standard cell layout. Design Automation Conference, 2003. Proceedings of the
ASP-DAC 2003. Asia and South Pacific, pages 723–728, Jan. 2003.

[50] Zhuoyuan Li, Weimin Wu, Xianlong Hong, and Jun Gu. Incremental placement algorithm for
standard-cell layout. IEEE International Symposium on Circuits and Systems, ISCAS 2002,
2:II–883–II–886 vol.2, 2002.

[51] Sadiq M. Sait and Habib Youssef. VLSI Physical Design Automation - Theory and Practice.
IEEE Press, 1995.

[52] Milkyway Data Preparation User Guide. Technical report, Synopsys Inc., San Jose, CA, 2007.
