
Power Conscious CAD Tools and
Methodologies: A Perspective

DEO SINGH, MEMBER, IEEE, JAN M. RABAEY, FELLOW, IEEE,
MASSOUD PEDRAM, MEMBER, IEEE, FRANCKY CATTHOOR, MEMBER, IEEE,
SURESH RAJGOPAL, MEMBER, IEEE, NARESH SEHGAL, MEMBER, IEEE,
AND THOMAS J. MOZDZEN, MEMBER, IEEE

Invited Paper

Power consumption is rapidly becoming an area of growing concern in IC and system design houses. Issues such as battery life, thermal limits, packaging constraints, and cooling options are becoming key factors in the success of a product. As a consequence, IC and system designers are beginning to see the impact of power on design area, design speed, design complexity, and manufacturing cost. While process and voltage scaling can achieve significant power reductions, these are expensive strategies that require industry momentum and only pay off in the long run. Technology-independent gains for power come from the area of design for low power, which has a much higher return on investment (ROI). But low power design is not only a new area; it is also a complex endeavor requiring a broad range of synergistic capabilities, from architecture/microarchitecture design to package design. It changes traditional IC design from a two-dimensional problem (Area/Performance) to a three-dimensional one (Area/Performance/Power). This paper describes the CAD tools and methodologies required to effect efficient design for low power. It is targeted to a wide audience and tries to convey an understanding of the breadth of the problem. It explains the state of the art in CAD tools and methodologies. The paper is written in the form of a tutorial, making it easy to read by keeping the technical depth to a minimum while supplying a wealth of technical references. Simultaneously, the paper identifies unresolved problems in an attempt to incite research in these areas. Finally, an attempt is made to provide commercial CAD tool vendors with an understanding of the needs and time frames for new CAD tools supporting low power design.

I. INTRODUCTION

Over the last year there has been increasing concern in system design houses, IC suppliers, and the academic community that power consumption of integrated circuits is beginning to impact design complexity, design speed, design area, and manufacturing costs. The motivation for this paper at this time (since the problem is still at a relatively early stage) is to convey an understanding of the breadth of the problem and the complexity of the solutions that need to be developed (in a relatively short period of time) to hold down cost by driving down power consumption. The paper is written in the form of a tutorial and is targeted at a wide audience: not only low power design experts and researchers in the field, but also system and IC designers who do not have a strong background in low power design and the issues surrounding it. It is also targeted at the commercial CAD community in the hope that the CAD companies will reexamine their development plans and develop new value-added products to support low power design. As such, the authors have made a concerted effort to make the tutorial easy to read by keeping the technical depth to a minimum while supplying the reader with a wealth of references for further study.

A. Market Forces

Historically, power consumption has been neither an issue nor a high priority in the semiconductor industry. The industry was driven by the competitive market forces of simultaneously increasing integration and reducing chip area (and cost). In the DRAM market segment, the increase in integration came in the form of larger memories and higher memory density. In the high volume microprocessor market segment, integration was driven by the need to increase performance. Higher performance was achieved through the use of new architectures (e.g., superscalar and superpipelined), integrating more and more system functions on chip, increasing the processor speed, and increasing the processor's ability to perform computation on larger word sizes, e.g., 8, 16, and 32 b. The trend in the large, and cost

Manuscript received May 21, 1994; revised August 25, 1994.
D. Singh, S. Rajgopal, N. Sehgal, and T. J. Mozdzen are with Intel Corp., Santa Clara, CA 95052-8119 USA.
J. M. Rabaey is with the University of California, Berkeley, CA 94720 USA.
M. Pedram is with the Department of Electrical Engineering Systems, The University of Southern California, Los Angeles, CA 90089-2562 USA.
F. Catthoor is with IMEC Laboratory, B-3030 Leuven, Belgium.
IEEE Log Number 9409205.

0018-9219/95$04.00 © 1995 IEEE

570 PROCEEDINGS OF THE IEEE, VOL. 83, NO. 4, APRIL 1995

Authorized licensed use limited to: Rice University. Downloaded on January 3, 2009 at 17:34 from IEEE Xplore. Restrictions apply.
Fig. 1. Electricity consumption by equipment type. Source: Dataquest.

Fig. 2. Optimization changes from two dimensions to three dimensions.

sensitive, embedded-processor market segment is towards larger word sizes and higher frequencies.

The trend in the general purpose DSP markets is also towards higher performance. The embedded processor/microcontroller and DSP chips are targeted at large and significant systems markets such as consumer electronics (e.g., HDTV and video), mobile communications, broadband ISDN, military, aerospace, surveillance systems, automotive, image processing, medical, and office systems. In some of these markets the data throughput requirement is so high (e.g., digital HDTV and video) that application-specific DSP's are needed to deliver billions of operations per second (BOPS). The higher performance is achieved at the cost of greater switching activity.

Higher integration and speeds have resulted in increased power consumption and heat dissipation. The combination of higher speeds and shrinking geometries leads to higher on-chip electric fields, which translate into a decrease in reliability. The increased heat dissipation leads to higher packaging costs; e.g., the addition of a heat sink could easily increase the component cost by $5-$10. The additional packaging costs become significant in the low-margin, price-sensitive embedded-processor market segment, as well as in the high volume microprocessor market segment. An additional $5 cost is not easily absorbed in the embedded market place, and may be prohibitive for entering this market segment. In the high volume microprocessor market, the additional packaging cost could easily (over the life of the product) translate into additional costs of $100 million.

From the system perspective, increased power consumption is reflected in higher system costs; system designers are now required to effect superior thermal design to rapidly remove unnecessary heat and maintain safe system temperatures, as well as to develop more sophisticated power supplies.

Fig. 1 shows the power consumption of office equipment. The mobile computing market segment (notebook, handheld, and pen-based computers) is predicted to grow from approximately 8% (in 1992) to approximately 32% (in 1996) of the total PC and personal workstation market [28]. This market segment compounds the foregoing power requirements with the need to operate the mobile product on a finite amount of energy; the equipment is battery powered! Although significant accomplishments have been made to extend battery life from 1-2 h (between charges) in the early notebook computers to about 4 h in state-of-the-art notebook computers, future more powerful processors are predicted to make even greater demands on the battery.

Although the mobile market segment will be a major driving force towards lower power in electronic equipment, it is by no means the only one. The US Environmental Protection Agency [28] estimates that over 80% of office equipment electricity consumption is attributed to PC's, monitors, minicomputers, and printers, and that non-power-managed PC's consume some 6.5 times the electricity of power-managed systems.

B. Challenge

In rising to the challenge to reduce power, the semiconductor industry has adopted a multifaceted approach, attacking the problem on three fronts:

- Reducing chip capacitance through process scaling. This approach to reducing the power is very expensive and has a very low return on investment (ROI).
- Reducing voltage. While this approach provides the maximum relative reduction in power, it is complex, difficult to achieve, and requires moving the DRAM and systems industries to a new voltage standard. The ROI is low.
- Employing better architectural and circuit design techniques. This approach promises to be the most successful because the investment to reduce power by design is relatively small in comparison to the other two approaches.

The last approach (design for low power) does, however, have one drawback: the design problem has essentially changed from an optimization of the design in two dimensions (performance and area) to optimizing a three-dimensional problem, i.e., performance, area, and power; see Fig. 2.

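Voltage reduction offers the largest relative power savings because dynamic CMOS power grows with the square of the supply voltage (P = a·C·V²·f when the logic swing equals the supply, as developed in Section II). A minimal numeric sketch: the capacitance, frequency, and activity values below are purely illustrative assumptions, not data from this paper.

```python
def dynamic_power(c_eff, vdd, f, alpha=1.0):
    """Dynamic CMOS power: P = alpha * C * Vdd^2 * f
    (assumes the logic swing equals the supply voltage)."""
    return alpha * c_eff * vdd ** 2 * f

# Hypothetical component values: 100 pF switched capacitance,
# 50 MHz clock, activity factor 0.05.
p_5v = dynamic_power(100e-12, 5.0, 50e6, 0.05)
p_3v3 = dynamic_power(100e-12, 3.3, 50e6, 0.05)

# Scaling 5 V -> 3.3 V at constant C, f, and activity:
print(p_3v3 / p_5v)  # (3.3/5)^2 = 0.4356, i.e., ~56% power reduction
```

The ratio depends only on the voltages, which is why voltage scaling dominates the other two fronts in relative terms; the catch, as the text notes, is the performance loss that must be recovered elsewhere.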

Fig. 3. Design data flow accessing a cell library at all phases.

The design for low power problem cannot be solved without good CAD tools. The remainder of this paper describes the CAD tools and methodologies required to effect efficient design for low power. As stated earlier, low power design is a relatively new problem and the paper is targeted at a wide audience to achieve the following:

- Convey an understanding of the breadth of the problem.
- Explain the state of the art of CAD tools and methodologies, as well as give references to additional, more in-depth technical information in specific fields.
- Highlight the areas that need considerably more research.
- Assist the commercial CAD vendors in understanding the needs and time frames for new CAD tools to support low power design.

C. Structure of Paper

It is well accepted that design is not a purely top down process; however, top down is a convenient method of describing the low power activities at the different levels of the design-data hierarchy, as well as the flow of data and control. The different sections of the paper discuss the power issues in the context of a CAD system to support design for low power. Fig. 3 depicts the flow of data through the CAD system.

Complex systems tend to be specified with a mix of architectural and behavioral constructs, primarily because some parts of the system can be described by an algorithm and others cannot. It is assumed that the design starts off with a behavioral/architectural description of the product specification; DSP systems are more amenable to behavioral descriptions, whereas general purpose microprocessor designs are more amenable to microarchitectural specifications. In the case of DSP systems, the behavioral description is transformed through a series of behavior-preserving transformations to generate microarchitectural specifications with timing. The microarchitectural description is then transformed and mapped to functional building blocks (FBB's) or templates representing execution units, controllers, memory elements, etc. The elements of the functional building block are stored in an object-oriented library. Microarchitectural synthesis further refines the FBB's into hardware building blocks (again stored in the library) to meet performance, power, timing, and area constraints. The result of this phase is an RTL description describing datapaths, FSM's, and memories (registers together with necessary busses). The power estimation and optimization tools relating to this phase of the high level design are described in Section II.

Next, the RTL description is processed by logic level tools, such as datapath compilers and logic synthesis tools, to further optimize the design. Again the design is optimized for performance, power, timing, and area, and the tools make use of the design entities residing in the object-oriented library. The power estimation and optimization tools relating to this phase of the high level design are described in Section III.

Section IV describes the estimation and optimization techniques required to optimize the physical design. This section discusses partitioning, floorplanning, placement, routing, and circuit sizing. It also discusses the role of the object-oriented library. The library is key to the successful implementation and support of the CAD tools and activities at all levels of the design data hierarchy as described in this paper. The library does not naturally fall into any one point of the design flow because it is an enabling technology that supports all phases of the design process, and it is often taken for granted. It is described in this section of the paper to emphasize its importance to the layout activities.

Contemporary layout tools have to manipulate standard cell libraries containing a few thousand cells when making tradeoffs of performance and area. The flat library structure associated with mature libraries severely limits large numbers of rapid tradeoffs of power and area. New tools for power estimation and optimization will make the situation a practical impossibility. It is therefore important to apply new library solutions and mechanisms (discussed in Section IV), first at the layout level and then at the other levels in a bottom up fashion. This approach should lead to a high value-added solution.

Sections V and VI discuss reliability and noise-induced packaging issues. There are two factors that make it necessary to address these issues in the context of low power design. To begin with, low power design techniques are a cause of some of these problems, such as low noise margins due to voltage scaling, and increased power supply noise levels due to dynamic on-chip power management. Other noise problems, such as simultaneous switching noise and signal crosstalk, are due to the continuing demand for higher performance/watt (via higher frequency and increased device integration).

Each section of the paper follows a common theme. The scope of the problem is first established. This is then followed by discussions on the power analysis techniques and the power optimization techniques.

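The section roadmap described above can be summarized as a small lookup table. The phase and section names are taken from the text; the code structure itself is only an illustrative sketch, not a representation of any actual tool.

```python
# Illustrative sketch of the design data flow and where the paper
# covers each phase's power estimation/optimization tools.
# None marks the input specification, which has no tool section.
DESIGN_FLOW = [
    ("behavioral/architectural description", None),
    ("microarchitectural synthesis (FBB's -> RTL)", "Section II"),
    ("logic-level optimization (datapath compilers, logic synthesis)", "Section III"),
    ("physical design (partitioning, floorplanning, place & route, sizing)", "Section IV"),
]

def tools_for(phase_prefix):
    """Return the paper section covering the tools for a phase,
    matched by prefix on the phase name."""
    for name, section in DESIGN_FLOW:
        if name.startswith(phase_prefix):
            return section
    raise KeyError(phase_prefix)

print(tools_for("logic-level"))  # Section III
```

The shared object-oriented library sits outside this sequence on purpose: as the text notes, it supports every phase rather than belonging to any one of them.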

The paper concludes with a summary and recommendations for low power tool development, together with their priorities.

II. HIGH LEVEL DESIGN

A. The Behavioral Level

While this is the least explored dimension of the power minimization process, it is potentially the one with the most dramatic impact. Selecting the right algorithm can reduce the power consumption of an application by orders of magnitude. Similarly, a single application can be represented in many different ways, one being more amenable than the other for low power implementation. These observations have been experimentally verified and demonstrated by a number of researchers [98], [18], [106] for various domains such as audio/speech, image/video, telecommunications, and networking. Similarly, finding the right system partitioning can have a profound impact on the overall power budget. For instance, when designing a wireless communication system, it makes sense to transfer the brunt of the power intensive functions to the immobile side of the system, relieving the power consumption at the battery powered portable end [98]. A combination of the preceding decisions can result in orders of magnitude of power reduction. This comes, however, at the price of decreased performance, or increased latency or area, such that tradeoffs are necessary in practice.

In summary, the problem with power minimization at the behavioral level is that a designer has to trade off between multiple contradictory design parameters, the impact of which is mostly unknown in the early phases of the design process. The same holds for the architectural level design as well. As a result, the algorithmic and architectural specifications are often frozen early in the design process, based on "back of the envelope" computations. This reduces the power minimization space to the circuit and logic levels, where generally only limited reductions can be achieved. Providing the designer with a set of tools which can help him to explore, evaluate, compare, and optimize the power dissipation alternatives early in the design process is a must to facilitate and enable fast and accurate low power design.

In the remainder of this section, we discuss the existing and ongoing efforts in design methodology and tool development at the behavioral and architectural levels. As is common throughout this tutorial, design tools are partitioned into two categories: analysis/prediction and synthesis/optimization.

1) Power Prediction/Estimation: A meaningful minimization process requires a reasonable estimate of the optimization target, in this case power consumption. Trying to predict the power consumption of an application in the early phases of the design process is a nontrivial task. Dissipation is a function of a number of parameters, as demonstrated in (1), the well known expression for dynamic power consumption in CMOS circuits:

P = a · C · V_DD · V_swing · f    (1)

where C represents the physical capacitance of the design, V_DD the supply voltage, V_swing the logical voltage swing (which most often equals the supply voltage in CMOS design), a the switching activity, and f the periodicity of the computation, which is most often the sample, instruction, or clock rate. Of those factors, the capacitance and the activity are especially difficult to predict in the early design phases. The former requires a detailed knowledge of the implementation, while the latter is a strong function of the signal statistics and is, hence, nondeterministic. The two factors are often lumped together in a single parameter, called the effective or switching capacitance of a design [80]:

C_eff = a · C.    (2)

a) Determining the Switching Activity: The activity factor is what distinguishes power prediction from, for instance, area or performance analysis, and makes it considerably harder. Activity is a function of both the algorithm structure and the data applied. The latter factor is especially hard to determine deterministically, and most of the activity analysis approaches therefore require extensive simulation or use empirical data.

As an example of the latter, Masaki [62], [63] determined the global switching activity factor for a variety of computing systems by analyzing their power consumption using both simulation and measurement. For minicomputers, a-factors between 0.01 and 0.005 are obtained, while microcomputers display a higher activity, ranging from 0.05 to 0.01.

While the experimental approach is useful when performing a macroanalysis, its accuracy is insufficient to be of any help in algorithm selection and optimization. Most of the high level power prediction tools use a combination of deterministic algorithm analysis with profiling and simulation to address data dependencies. Important statistics include the number of instructions of a given type, the number of bus, register, and memory accesses, and the number of I/O operations [20], [64] executed within a given period. The advantage is that this analysis can be performed at a high abstraction level and can therefore be executed on a large data set without incurring a dramatic time penalty. Instruction level simulation or behavioral DSP simulators are easily adapted to produce this information (e.g., [10]). An example of such an approach is documented in [123], where the switching activity related to transitions in data and address signals for off-chip storage is tabulated by applying random excitations to a board level system model.

b) Determining the Capacitance: To translate the activity factors into actual energy measurements, they have to be weighted with the capacitance switched per access of a given resource. It is important to observe here that, to a first degree, the actual number of instances of a resource does not impact the dissipation. For instance, performing N additions sequentially on a single adder consumes just as much power as performing the additions concurrently on N adders. This observation ignores the architectural overhead involved with either approach, as


Fig. 4. Statistical estimation of interconnect bus capacitance (from [20]). (Bus capacitance plotted against area in mm²; bitwidth = 16, 1.2 µm CMOS.)

Fig. 5. Comparison of two different algorithms for vector comparison. (Energy/iteration plotted against supply voltage for method 1 and method 2.)

well as the impact of signal correlations. The architectural composition, for instance, has an impact on the size of the chip and, consequently, influences the cost of a bus access, an input/output operation, or the clock network.

The power cost of accessing a resource is determined by its type. For computational units (adders, multipliers, registers) or for memories, it is possible to get measured or estimated data from the design libraries. At this point in the design process, it is hard to determine the actual switching properties of the data presented to the modules. Consequently, a white noise data model is generally adopted. While presenting a pessimistic view, this simplification is acceptable at this level of abstraction, as it does not dramatically distort the correlation with the actual dissipation.

Given these modeling assumptions, the effective capacitance of the computational resources is then estimated by the following expression:

C_eff = sum over res of N_res · C_res    (3)

where res ranges over the computational resources: execution units (EXU), memories (Mem), and registers (Reg); N_res is the number of accesses to resource res; and C_res is the capacitance switched per access over a given period.

Predicting the dissipation of other architectural components, such as I/O, busses, clocks, and controllers, is more troublesome, as their capacitance is strongly influenced by the subsequent design phases (such as logic synthesis, floorplanning, and place and route). While just ignoring their contribution produces unacceptable results, the best that can be produced at this point is a reasonable estimate. In the high level synthesis community, there have been considerable efforts in estimating the wiring cost, given the active area of a design and other estimated parameters, such as the number of modules, the number of busses and their average bus-width and fanout, and the number of control signals. The most popular approach is to generate the layouts for a large number of benchmark examples, collect the routing data, and construct a stochastic model [@], [W].

A similar approach can be adapted to predict the average capacitance per bus. This is demonstrated in Fig. 4, which plots the measured bus capacitance as a function of the chip area. The data is obtained from 50 benchmark examples generated using the HYPER synthesis [82] and the LAGER-IV layout generation [99] tools. The close correlation which exists between area and bus capacitance is easily captured in a simple piecewise linear model. The chip area, in turn, can be estimated from the algorithm and the performance constraints [45], [83].

Similar approaches can be used to produce early estimates of control and input/output capacitance [64]. It can be argued that the resulting models are only useful for a given set of design tools and methodologies. While it is obviously true that the model parameters will vary over methodologies and technologies, a more important result of these modeling attempts is the establishment of fundamental dependencies and relationships. The understanding of these dependencies can eventually open the door to more advanced models.

c) Applications of Behavioral Power Modeling: The resulting power model turns out to be particularly useful in the algorithmic optimization process and for design space exploration. This is illustrated with the example of a vector quantization encoder for video compression [64], [56]. A small variation in the formulation of the vector encoding algorithm can result in dramatic power reductions while maintaining the same performance. This is demonstrated in Fig. 5, where the energy/iteration is plotted as a function of the supply voltage for two formulations of the algorithm. The second variant has the advantage of a reduced critical path (which enables lower voltage operation) and a smaller number of operations (which reduces the overall switching capacitance). Behavioral power estimators can be extremely useful in studying this type of tradeoff. This is also demonstrated in [122] and [123].

2) Behavioral Level Power Minimization: Prior to a detailed design of the architecture, the only means to affect power is to modify or transform the specification, such that the resulting description is more amenable to low power solutions. Assuming that the repetition frequency (sample,


execution, or instruction rate) is a given design constraint, the only means to reduce the projected dissipation of an application is by reducing either the supply voltage or the effective capacitance.

Table 1. Impact of behavioral transformations on the number of address line transitions for a segment protocol processor: before (SPP1) and after (SPP2). Three scheduling simulations, each processing about 20k ATM cells, while varying an important system parameter (SSM), have been listed.

            SPP1        SPP2        Reduction
  50% SSM   2,537,541   1,555,690   -39%
  95% SSM   2,301,714   1,523,293   -34%

a) Supply Voltage Reduction: Lowering the supply voltage is the most effective means of reducing power consumption, as it yields a quadratic improvement in the power-delay product of a logic family. Unfortunately, this simple solution comes at a cost. First of all, finding commodity components that operate at different voltage levels (below 3.3 V) is hard, while multiple power supplies are to be avoided. Some of these issues can be addressed by the introduction of power-efficient dc-dc converters [101]. Reducing the supply voltage also degrades the performance of the logic operators [18]. While the performance penalty is initially small (between 3 V and 5 V), a rapid decrease can be observed once V_DD approaches two times the transistor threshold. To maintain performance, this loss has to be compensated by other means. At the architectural level, this means either using faster components and operators or exploiting concurrency. Replacing a single fast operator, operated at high voltage, by a multitude of slow operators running at low V_DD does not impact the effective capacitance (ignoring the overhead of the parallel architecture), but it allows for a lowering of the supply voltage, which results in substantial power savings. This approach is generally known as trading area for power. Design automation is key to successful exploration of the design space to effect area-power tradeoffs.

To make an application amenable to this class of power optimization, it is essential that the algorithm contain sufficient concurrency and that its critical path be smaller than the maximum execution time (under the voltage range of interest). Optimizing transformations to achieve both those goals have been studied extensively in the compiler and high level synthesis communities [1], [78], [40], mostly for speed optimization purposes. The power minimization goal adds another twist to the problem, however: increasing the concurrency or reducing the critical path only makes sense as long as the effective capacitance factor does not increase faster. For instance, at low supply voltages, the amount of concurrency needed to compensate for the increase in propagation delay is so great that the overhead becomes dominant and power dissipation increases.

A good overview of the use of optimizing transformations for supply voltage reduction is given in [19], [20]. Concurrency increasing transformations include (time) loop unrolling, pipelining, and control flow optimizations. The critical path of an application can be reduced by algebraic transformations [40], retiming, and pipelining [54], [31].

b) Reduction in Effective Capacitance: There are many means of reducing the potential effective capacitance of an algorithm. The most obvious way is to reduce the number of operations, either by choosing the right algorithm for a given function or by eliminating redundant operators (for instance, using dead code and common sub-expression elimination). For instance, the computational requirements for the popular Discrete Cosine Transformation (DCT), used extensively in video compression, range over a factor of 10 [98]. Automated techniques have been reported which allow the number of operations, weighted with an area/power cost factor, to be globally reduced over a range of algebraic manipulations and other transformations [46]. This has resulted in a near specification invariance for linear applications such as the DCT and video filters.

Another option is to reduce the "power strength" of an operation by replacing it with less demanding operations (this term is coined from the traditional strength reduction transformation in optimizing compilers [1]). An example of the latter is the expansion of a constant multiplication into a sequence of add/shift operations [19]. In the degenerate case, where no time multiplexing is performed, these shift operations can even be replaced by simple wiring.

Locality of reference is a prime feature to be pursued when optimizing an algorithm for power consumption [81]. Local operations tend to involve less capacitance and are, consequently, cheaper from a power perspective. Obviously, this locality is only meaningful when exploited properly at the architectural level. For instance, operations in the same subfunction of an algorithm should be allocated to the same processor or the same data path partition. This eliminates expensive data transfers across high capacitance busses. Transformations improving the locality of reference are abundant, for instance, in the domain of memory management, although until recently most of the work in that domain had been oriented towards either area or performance optimization [120], [59].

Memory accesses often contribute a substantial part of the dissipation in both computational and signal processing applications. Replacing expensive accesses to background (secondary) memory with foreground memory references, or using distributed memory instead of a single centralized memory, can cause a substantial power reduction [11]. An extensive study into the impact of memory management on system- and architecture-level power consumption has been performed at IMEC [123]. This has shown that the external communication and memory access for, e.g., a table-based telecom network subsystem dominates the power budget, even when compared to the sum of all the other contributions (clock, data path, and control). The combined results for this realistic test vehicle, taken from an ATM network and shown in Table 1, illustrate the extent of the improvements which can be obtained.

Finally, selecting the correct data representation or encoding can reduce the switching activity [21]. For instance, the almost universally used two's complement notation

SINGH et al.: CAD TOOLS AND METHODOLOGIES: A PERSPECTIVE 575

Authorized licensed use limited to: Rice University. Downloaded on January 3, 2009 at 17:34 from IEEE Xplore. Restrictions apply.
has the disadvantage that all bits of the representation are toggled for a transition from 0 to -1, which occurs rather often. This is not the case in the sign magnitude representation, where only the sign bit is toggled. Choosing the correct data encoding can impact the dissipation in data signals with distinct properties, such as signal processing data paths [41], [46], address counters and state machines. In [103], it was demonstrated that choosing a Gray-coded instruction addressing scheme results in an average reduction in switching activity, equal to 37% over a range of benchmarks. Similar techniques were employed in [123].

Fig. 6. Error in modeling multiplier power. (Plot: % error versus bits of dynamic range utilized.)
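As a quick numerical check of the encoding argument above, the Python sketch below counts bit toggles for a small signed sequence under two's complement and sign-magnitude, and for a binary versus Gray-coded address counter. It is purely illustrative; the 37% figure reported in [103] comes from instruction-trace benchmarks, not from this toy sequence.

```python
def twos(v, n=8):
    """n-bit two's complement encoding of a small signed integer."""
    return v & ((1 << n) - 1)

def sign_mag(v, n=8):
    """n-bit sign-magnitude encoding: sign bit plus magnitude."""
    return ((1 << (n - 1)) | -v) if v < 0 else v

def gray(v):
    """Binary-reflected Gray code of a non-negative integer."""
    return v ^ (v >> 1)

def toggles(seq, enc):
    """Total number of bit transitions along an encoded sequence."""
    codes = [enc(v) for v in seq]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))

# A signal hovering around zero: each 0 <-> -1 transition flips all 8 bits
# in two's complement (0x00 <-> 0xFF) but only the sign bit in sign-magnitude.
seq = [0, -1, 0, 1, -1, 0]
print(toggles(seq, twos), toggles(seq, sign_mag))       # 32 8

# Sequential instruction addresses: a Gray-coded counter toggles exactly
# one bit per step, a plain binary counter considerably more.
addr = range(64)
print(toggles(addr, lambda v: v), toggles(addr, gray))  # 120 63
```

The Gray-coded counter spends exactly one transition per address step, which is the mechanism behind the switching-activity reduction cited above.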


B. The Architectural Level
Once the architecture is defined and specified (e.g., using a functional or register-transfer level description), a more refined power profile can be constructed, which opens the way for more detailed optimizations. For instance, the impact of resource multiplexing can be taken into account. The architectural level is the design entry point for the large majority of digital designs, and design decisions at this level can have a dramatic impact on the power budget of a design. The availability of efficient, yet accurate tools at this level of abstraction is therefore of utmost importance.

1) Power Analysis: Consider an architecture described at the structural level, consisting of an assembly of (parameterizable) modules and interconnecting buses and wires. Performing an accurate power analysis at this level of abstraction requires the following information:

- power models for the composing modules (library information)
- physical capacitance for the interconnect
- switching statistics for the interconnect signals.

The latter can be obtained from functional or register transfer level simulation. Given a functional model of the architecture, it might even be possible to derive the signal statistics analytically using, for instance, power spectral analysis techniques, long known in signal processing. This approach turns out to be hopelessly complex, however, and is only useful for linear, time-invariant systems (which are rare). Simulation driven by an actual test-bench is therefore the most appropriate technique to gather signal statistics.

As interconnect in general has a significant impact on the overall power budget, getting a meaningful estimate of its capacitance is important. Meaningful estimates can be obtained from stochastic models, as described in the behavioral section, or from the initial floorplan. More accurate data can be back-annotated into the model once a more actual floorplan or layout has been produced.

While the abstract modeling of architectural macro-modules with regards to area and power is a well understood craft, providing high level power models has only recently attracted major attention. In a first attempt in that direction, Vermassen [118] derived power models for adders and multipliers, based on complexity theory [113] and a probabilistic analysis of the underlying gate structure.

Another parametric model was described by Svensson and Liu [104], where the power dissipation of the various components of a typical processor architecture was expressed as a function of a set of primary parameters. For instance, the capacitance of the clock wire, distributed in an H-tree structure, can be modeled as 24 D_t c_int, with D_t the chip dimension and c_int the interconnect capacitance per unit length. Models were developed for memory, logic, interconnect and clocks and were used to predict the global power consumption of a number of processors and ASIC chips. This modeling approach has the advantage of giving a global picture, suitable for driving optimizations and design tradeoffs. On the other hand, when accuracy is an issue, the technique suffers from an abundance of parameters and is sensitive to mismatches in the modeling assumptions. At that point, empirical models become more attractive.

A first modeling effort in that direction was performed by Powell and Chau [79], who proposed a parameterized power model for macro-modules based on the following considerations. The power model for an array multiplier might, for instance, be represented as P = C_u N^2 V^2 f, where N is the width of the two inputs. Here, the quadratic dependence on N accounts for the N^2 adders of which a typical array multiplier is composed. Given this model, the characterization process consists of simulating or measuring the power dissipation of a multiplier and fitting the capacitive coefficient, C_u, to these results. The problem with this approach is that the module's power consumption depends on the inputs applied. It being impossible, however, to characterize the module for all possible input statistics, purely random inputs (that is to say, independent uniform white-noise (UWN) inputs) are typically applied when deriving C_u.

This leads to an important source of error, as illustrated by Fig. 6, which displays the estimation error (relative to switch-level simulations) for a 16x16 multiplier. Clearly, when the dynamic range of the inputs does not fully occupy the bit width of the multiplier, the UWN model becomes extremely inaccurate. Errors in the range of 60-100% are not uncommon.

A more precise model was proposed by Landman and Rabaey [50]. In the so-called dual bit type (DBT) model, it is projected that typical data in a digital system can be divided into two regions: the LSB's, which act as uncorrelated white noise, and the MSB's, which correspond
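The UWN characterization step described above can be sketched as a one-coefficient least-squares fit. Everything below is illustrative: the "measurements" are generated from an assumed quadratic model (0.9 fF per adder cell, plus or minus 5% noise) rather than from a switch-level simulator, and the operating point (3.3 V, 50 MHz) is an arbitrary assumption.

```python
import random

random.seed(7)

V, F = 3.3, 50e6  # assumed supply voltage (V) and clock frequency (Hz)

def measure_power(n):
    """Stand-in for a switch-level power measurement of an n x n array
    multiplier: a quadratic model (0.9 fF per cell) plus +/-5% noise."""
    return 0.9e-15 * n * n * V * V * F * random.uniform(0.95, 1.05)

widths = [4, 8, 12, 16, 24, 32]
samples = [(n, measure_power(n)) for n in widths]

# Least-squares fit of the single coefficient C_u in P = C_u * N^2 * V^2 * f:
# with x = N^2 * V^2 * f, the closed form is sum(x*y) / sum(x*x).
xs = [n * n * V * V * F for n, _ in samples]
ys = [p for _, p in samples]
c_u = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(f"fitted C_u = {c_u:.3e} F")  # close to the 0.9e-15 F used above
```

The same fit applied to data whose dynamic range does not fill the multiplier width would recover a badly biased coefficient, which is exactly the UWN error shown in Fig. 6.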

576 PROCEEDINGS OF THE IEEE, VOL. 83, NO. 4, APRIL 1995

Fig. 7. Transition activity versus bit. (Plot: P(0→1) versus bit position.)

Fig. 8. Architectural model versus switch-level simulated power dissipation of logarithmic shifter. (Plot: model versus switch-level results for N = 8, 16, 32 and M = 3, 7, 15, 31.)

Fig. 9. Effective capacitance for architectural resources in multiplexed and parallel versions of QMF filter, as predicted by the SPA power prediction tool [50]. (Resources: REG, BUS, EXU, MUX, CLK, BUF, CTL.)

to sign bits. The latter are generally correlated between consequent data values and are far from random. This is illustrated in Fig. 7, which plots the transition probabilities for the different bit-positions in a data word as a function of the correlation factor ρ between consecutive samples. ρ = 0 means that the consecutive data samples are totally uncorrelated and, hence, correspond to white noise data. In that case, no difference exists between LSB and MSB bits. When a meaningful correlation is present between samples, an important deviation in transition activity between LSB's and MSB's can be observed. The DBT model categorizes a hardware module for both those regions. A power model is automatically generated by applying a predefined set of test patterns (correlated and uncorrelated) to the module using a simulator of choice and extracting the switching capacitance for each of the regions of interest. The resulting power models are accurate to within 15% [51], as illustrated in Fig. 8, which compares the power consumption of a logarithmic shifter obtained using both switch-level simulations and the architectural model.

Once appropriate models for the modules are available, performing power prediction at the architectural or register transfer level becomes feasible by combining signal statistics, interconnect capacitance and macro-module power models. An example of a tool doing just that is the SPA architectural power analysis tool from the University of California at Berkeley [50].

Fig. 9 illustrates the potential impact of architectural level power estimation and analysis tools. It compares the projected power for the various hardware resources of two architectures implementing the same quadrature mirror filter (QMF). It is easily observed that the parallel version is far more power effective than the multiplexed one by virtually eliminating the bus and multiplexer effective capacitance [81]. This type of data is instrumental when performing power tradeoffs and is hard to obtain from lower design hierarchy levels or "intuition."

2) Power Optimization: Similar to the behavioral level, architectural level optimizations can have a dramatic impact on power dissipation. Researchers have reported orders of magnitude in dissipation reduction by picking the proper architectural choice [106], [81], [21].

Although architectural level synthesis has received considerable attention in the last decade [121], virtually none of the efforts in this domain has been addressing the power dimension. Some initial results have been reported in [19], [64], [31], and [123]. In lieu of analyzing existing approaches, it is worth discussing where and how architectural synthesis can affect dissipation. Similar to the behavioral level optimization, architectural level power minimization targets a dual goal: 1) minimize the required supply voltage to a minimal allowable level; and 2) minimize the effective capacitance. While the former is obtained at the architecture level by exploiting parallelism and pipelining [19], one of the main architectural means of addressing the latter is the use of locality of reference: whenever possible, accessing global, centralized resources (such as memories, buses or ALU's) should be avoided. How some of these goals can be accomplished at the different levels of the architectural synthesis process is discussed below.

a) Architecture Selection and Partitioning: The choice of the architecture model has a dramatic impact on power dissipation as it determines the amount of concurrency

which can be sustained, the amount of multiplexing of a hardware resource, the required clock frequency, the capacitance per instruction, etc. There is a widespread belief that the fully parallel, nonmultiplexed architecture yields the lowest possible dissipation with the minimum overhead [81], [21]. Unfortunately, such an architecture precludes programming or requires an unwieldy large area. Deciding on the right architecture for a given task or function is therefore a tradeoff between power, area, and flexibility.

Similarly, the partitioning of a task over a number of resources can have a substantial impact on the dissipation. For instance, in [56] it was shown that partitioning of the codebook memory in a vector encoder for video compression reduced the power consumption in the memories (which was the dominant factor in this design) by a factor of 8. In [11], it was demonstrated how the memory hierarchy, the caching approach and the cache sizes impact the power cost of accessing the main memory in a microprocessor. A similar analysis was performed in [123].

Due to a lack of meaningful tools, the only recourse available to a designer at present is to perform back-of-the-envelope calculations. The emerging architectural level power estimators will help to make this task more accurate and meaningful by identifying the power bottlenecks in an architecture. For instance, the signal statistics analysis provided by these tools can help to determine if and when power-down of a hardware module is desirable.

b) Instruction Set and Hardware Selection: Optimizing the instruction set is another means of reducing the power consumption in a processor. As an example, providing a special datapath for often executed instructions reduces the capacitance switched for each execution of that instruction (compared to executing that instruction on a general purpose ALU). This approach is popular in application specific processors, such as modem and voice coder processors [5].

Another dimension of the power minimization equation is the choice of the hardware module to execute a given instruction. A detailed gate level study of adder and multiplication modules, for instance, revealed that structures such as carry-lookahead adders and Wallace multipliers operate at lower energy levels than the more area effective ripple adder or carry save multiplier [14]. This study was later confirmed by Nagendra et al. [70], where the power-delay products of various adder structures were analyzed at the circuit level. The picture becomes more complex when additional optimizations such as operator pipelining are introduced.

Dealing with all these extra factors requires a profound revision of the hardware selection process in high level or architectural synthesis. The availability of functional level power models is a first step in that direction [122]. A vision on how cell libraries can interface and communicate with synthesis tools is presented in [95], which proposes an object oriented cell library manager. Based on extensive area, time and power modeling, the library manager helps to select the best fitting cell for a required functionality.

c) Architecture Synthesis: The architectural synthesis process traditionally consists of three phases: allocation, assignment, and scheduling. In a few words, these processes determine how many instances of each resource are needed (allocation), on what resource a computational operation will be performed (assignment) and when it will be executed (scheduling). Traditionally, architectural synthesis attempts to minimize the number of resources to perform a task in a given time, or tries to minimize the execution time for a given set of resources. The underlying concept is to optimize resource utilization, under the assumption that an architecture where all resources are kept busy most of the time is probably also the smallest solution.

This picture is not valid anymore when power minimization is the primary target. The smallest solution is not necessarily the one with the smallest dissipation. Quite the contrary is true. Indeed, it often makes sense to add extra units, if this contributes to minimizing the effective capacitance.

This difference has an important effect on the architectural synthesis techniques. The prime driving function for assignment now becomes the quest towards regularity and locality. Both help to reduce the overhead of interconnect, while simultaneously keeping the signals more correlated. Scheduling can have a similar impact, as it influences the correlation between the signals on the busses, memories and logic operators. This has been illustrated in [123], where operations were reordered over conditional boundaries. In [103], a "cold scheduling" approach for traditional microprocessor architectures was proposed. While not specifically targeting power, a number of researchers have already explored some of these issues, and some of the proposed approaches could become of particular interest in power sensitive architectural synthesis [87], [15], [37], [52].

The result of this phase of the design yields a timed RTL netlist used to drive logic level tools in the next stage of the design process.

III. LOGIC LEVEL

Logic design and synthesis fits between the register transfer level and the netlist of gates specification. It provides the automatic synthesis of netlists minimizing some objective function subject to various constraints. Example inputs to a logic synthesis system include two-level logic representations, multilevel Boolean networks, finite state machines and technology mapped circuits. Depending on the input specification (combinational versus sequential, synchronous versus asynchronous), the target implementation (two-level versus multilevel, unmapped versus mapped, ASIC's versus FPGA's), the objective function (area, delay, power, testability) and the delay models used (zero-delay, unit-delay, unit-fanout delay, or library delay models), different techniques are applied to transform and optimize the original RTL description.

Once various system level, architectural and technological choices are made, it is the switching activity of the logic (weighted by the capacitive loading) that determines the

power consumption of a circuit. In this section, a number of techniques for power estimation and minimization during logic synthesis will be presented. The emphasis during power estimation will be on pattern-independent simulation techniques, while the strategy for synthesizing circuits for low power consumption will be to restructure/optimize the circuit to obtain low switching activity factors at nodes that drive large capacitive loads.

A. Power Estimation Techniques

1) Sources of Power Dissipation: Power dissipation in CMOS circuits is caused by three sources: the (subthreshold) leakage current that arises from the inversion charge that exists at gate voltages below the threshold voltage, the short-circuit current that is due to the DC path between the supply rails during output transitions, and the charging and discharging of capacitive loads during logic changes.

The subthreshold current for long channel devices increases linearly with the ratio of the channel width over channel length and decreases nearly exponentially with decreasing V_GT = V_GS - V_T, where V_GS is the gate bias and V_T is the threshold voltage. Several hundred millivolts of "off bias" (say, 300-400 mV) typically reduces the subthreshold current to negligible values. With reduced power supply and device threshold voltages, the subthreshold current will however become more pronounced. In addition, at short channel lengths, the subthreshold current also becomes exponentially dependent on the drain voltage instead of being independent of V_DS (see [35] for a recent analysis). The subthreshold current will remain 10^2-10^5 times smaller than the "on current" even at submicron device sizes.

The short-circuit power consumption for an inverter gate is proportional to the gain of the inverter, the cubic power of the supply voltage minus the device threshold, the input rise/fall time and the operating frequency [117]. The maximum short-circuit current flows when there is no load; this current decreases with the load. If gate sizes are selected so that the input and output rise/fall times are approximately equal, the short-circuit power consumption will be less than 15% of the dynamic power consumption. If, however, design for high performance is taken to the extreme where large gates are used to drive relatively small loads, then there will be a severe penalty in terms of short-circuit power consumption.

It is widely accepted that the short-circuit and subthreshold currents in CMOS circuits can be made small with proper circuit and device design techniques. The dominant source of power dissipation is thus the charging and discharging of the node capacitances (also referred to as the dynamic power dissipation) and is given by:¹

P = (V_dd^2 / (2 T_cycle)) Σ_n C_n E_n(sw)    (4)

where V_dd is the supply voltage, T_cycle is the clock cycle time, C_n is the capacitive load at node n, and E_n(sw) is the expected number of transitions per cycle at node n.

¹ Observe the similarity between (1) and (4). The difference of a factor between them is a result of deviating definitions of the period concept at the architectural and logic/circuit levels.

Calculation of E(sw) is difficult as it depends on:

- the input patterns and the sequence in which they are applied,
- the delay model used,
- the circuit structure.

Switching activity at the output of a gate depends not only on the switching activities at the inputs and the logic function of the gate, but also on the spatial and temporal dependencies among the gate inputs. For example, consider a two-input AND-gate g with independent inputs i and j whose signal probabilities are 1/2; then E_g(sw) = 2 · 1/4 · 3/4 = 3/8. Now suppose it is known that only patterns 00 and 11 can be applied to the gate inputs and that both patterns are equally likely; then E_g(sw) = 1/2. Alternatively, assume that it is known that every 0 applied to input i is immediately followed by a 1, while every 1 applied to input j is immediately followed by a 0; then E_g(sw) = 4/9. The first case is an example of spatial correlation between gate inputs, while the second case illustrates temporal correlation on spatially independent gate inputs.

Based on the delay model used, the power estimation techniques could account for steady-state transitions (which consume power, but are necessary to perform a computational task) and/or hazards and glitches (which dissipate power without doing any useful computation). It is shown in [6] that although the mean value of the ratio of the hazardous component to the total power dissipation varies significantly with the considered circuits (from 9% to 38%), the hazard/glitch power dissipation cannot be neglected in static CMOS circuits. Indeed, an average of 15-20% of the total power is dissipated in glitching. The glitch power problem is likely to become even more important in future scaled technologies.

In real networks, statistical perturbations of circuit parameters may change the propagation delays and produce changes in the number of transitions because of the appearance or disappearance of hazards. It is therefore useful to determine the change in the signal transition count as a function of these statistical perturbations. Variation of gate delay parameters may change the number of hazards occurring during a transition as well as their duration. For this reason, it is expected that the hazardous component of power dissipation is more sensitive to IC parameter fluctuations than the power strictly required to perform the transition between the initial and final state of each node.

The major difficulty in computing the signal probabilities is the presence of reconvergent nodes. Indeed, if a network consists of simple gates and has no reconvergent fanout stems (or nodes), then the exact signal probabilities can be computed during a single post-order traversal of the network. For networks with reconvergent fanout, the problem is much more difficult.

2) Circuit- and Switch-Level Simulation: Circuit simulation based techniques [47], [112] simulate the circuit with a representative set of input vectors. They are accurate and capable of handling various device models, different circuit design styles, dynamic/precharged logic, tristate drivers, latches, flip-flops, etc. Although circuit level simulators are
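The three activity values in the AND-gate example can be reproduced by enumerating pairs of consecutive input patterns. The sketch below is illustrative only; for the third case, the conditional probabilities not pinned down by the text (P(1|1) for input i and P(0|0) for input j) are assumed to be 1/2, which gives the stationary probabilities 2/3 and 1/3 that lead to the quoted 4/9.

```python
from itertools import product

def and_activity(joint):
    """E(sw) of g = i AND j, given a distribution `joint` mapping
    ((i, j), (i2, j2)) -> probability of pattern (i, j) followed by
    (i2, j2); sums the probability that the gate output changes."""
    return sum(p * ((a & b) != (c & d))
               for ((a, b), (c, d)), p in joint.items())

patterns = list(product([0, 1], repeat=2))

# Case 1: i, j spatially and temporally independent, prob 1/2 each -> 3/8.
ind = {(u, v): 1 / 16 for u in patterns for v in patterns}

# Case 2: only patterns 00 and 11, equally likely, cycles independent -> 1/2.
sp = {(u, v): 1 / 4 for u in [(0, 0), (1, 1)] for v in [(0, 0), (1, 1)]}

# Case 3: every 0 on i is immediately followed by a 1; every 1 on j is
# immediately followed by a 0 (remaining conditionals assumed 1/2) -> 4/9.
def chain(p1, p01, p11):
    """Joint (current, next) distribution of a two-state Markov chain with
    stationary prob(1) = p1, P(1|0) = p01 and P(1|1) = p11."""
    return {(0, 0): (1 - p1) * (1 - p01), (0, 1): (1 - p1) * p01,
            (1, 0): p1 * (1 - p11), (1, 1): p1 * p11}

ti = chain(2 / 3, 1.0, 0.5)   # input i: stationary prob(i = 1) = 2/3
tj = chain(1 / 3, 0.5, 0.0)   # input j: stationary prob(j = 1) = 1/3
tmp = {((a, b), (c, d)): ti[(a, c)] * tj[(b, d)]
       for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)}

print(and_activity(ind), and_activity(sp), and_activity(tmp))  # 3/8, 1/2, 4/9
```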

accurate, flexible and easy-to-use, they suffer from memory and execution time constraints and are not suitable for large, cell-based designs. In general, it is difficult to generate a compact stimulus vector set to calculate accurate activity factors at the circuit nodes. The size of such a vector set is dependent on the application and the system environment [85].

A Monte Carlo approach for power estimation that alleviates this problem has been proposed in [13]. The convergence time for this approach is quite good when estimating the total power consumption of the circuit. However, when signal probability (or power consumption) values on individual lines of the circuit are required, the convergence rate is not as good. A method for estimating the typical power consumption of synchronous CMOS circuits using a hierarchy of RTL and circuit-level simulators is presented in [116]. In this method, the number of clock cycles used for simulation is incrementally calculated depending on the considered circuit and on the specified accuracy.

Switch-level simulation techniques are in general orders of magnitude faster than circuit-level simulation techniques, but are not as accurate or versatile.

PowerMill [30] is a transistor-level power simulator and analyzer that applies an event-driven timing simulation algorithm (based on simplified table-driven device models, circuit partitioning and single-step nonlinear iteration) to increase the speed by two to three orders of magnitude over SPICE. PowerMill gives detailed power information (instantaneous, average and rms current values) as well as the total power consumption (due to steady-state transitions, hazards and glitches, transient short-circuit currents, and leakage currents). It also provides diagnostic information (by using both static checks and dynamic diagnosis) to identify the hot spots (which consume a large amount of power) and the trouble spots (which exhibit excessive leakage or transient short-circuit currents). Finally, it tracks the current density and voltage drop in the power net and identifies reliability problems caused by EM failures, ground bounce and excessive voltage drops.

Entice-Aspen [36] is a power analysis system that raises the level of abstraction for power estimation from the transistor level to the gate level. Aspen computes the circuit activity information using the Entice power characterization data as follows. For each cell in the library, Entice requires a transistor level netlist that is generally obtained from the cell layout using a parameter extraction tool. In addition, a stimulus file is to be supplied where power and timing delay vectors are specified. The set of power vectors discretizes all possible events in which power can be dissipated by the cell. With the relevant parameters set according to the user's specifications, a SPICE circuit simulation is invoked to accurately obtain the power dissipation of each vector. During logic simulation, Aspen monitors the transition count of each cell and computes the total power consumption as the sum of the power dissipation for all cells in the power vector path.

In summary, accuracy and efficiency are the key requirements for any power analysis prediction tool. PowerMill and Entice-Aspen are steps in the right direction as they provide intermediate level simulation that bridges the gaps between the circuit-level and switch-level simulation paradigms.

B. Estimation in Combinational Circuits

1) Estimation under a Zero Delay Model: Most of the power in CMOS circuits is consumed during charging and discharging of the load capacitance. To estimate the power consumption, one has to calculate the (switching) activity factors of the internal nodes of the circuit. Methods of estimating the activity factor E_n(sw) at a circuit node n involve estimation of the signal probability prob(n), the probability that the signal value at the node is one. Under the assumption that the values applied to each circuit input are temporally independent, we can write:

E_n(sw) = 2 prob(n)(1 - prob(n)).    (5)

Computing signal probabilities has attracted much attention [75], [39], [96], [17], [72]. These works describe various exact and approximate procedures for signal probability calculation. Notable among them is the exact procedure given in [17], which is based on ordered binary-decision diagrams (OBDD's) [9]. This procedure is linear in the size of the corresponding function graph. The size of the graph, of course, may be exponential in the number of circuit inputs. The signal probability at the output of a node is calculated by first building an OBDD corresponding to the global function of the node and then performing a postorder traversal of the OBDD using

prob(y) = prob(x) prob(f_x) + prob(x̄) prob(f_x̄)    (6)

where f_x and f_x̄ are the cofactors of f with respect to x and x̄, respectively.

The spatial correlations among different signals are modeled in [32], where a procedure is described for propagating signal probabilities from the circuit inputs toward the circuit outputs using only pairwise correlations between signals and ignoring higher order correlation terms.

The temporal correlation between the values of some signal x in two successive clock cycles is modeled in [92] and [61] by a time-homogeneous Markov chain that has two states 0 and 1 and four edges, where each edge ij (i, j = 0, 1) is annotated with the conditional probability prob_ij that x will go to state j at time t + 1 if it is in state i at time t. The transition probability prob(x_ij) is equal to prob(x = i) prob_ij. The activity factor of line x can be expressed in terms of these transition probabilities as

E_x(sw) = prob(x_01) + prob(x_10).    (7)

The transition probabilities can be computed exactly using the OBDD representation of the logic function of x in terms of the circuit inputs. An approximate mechanism for propagating the transition probabilities through the circuit is described in [61]; it is more efficient, as there is no need to build the global function of each node in terms of the circuit inputs, but is less accurate. The loss is often

small while the computational saving is significant. This work is then extended in [61] to account for spatiotemporal correlations (i.e., spatial correlations between temporally dependent events).

Table 2 (taken from [61]) gives the various error measures (compared to exact values obtained from exhaustive binary simulation) for pseudo-random input sequences for the f51m benchmark circuit [124]. This is a small circuit with 8 inputs, 8 outputs, and 46 CMOS gates. It can be seen that accounting for either spatial or temporal correlations improves the accuracy, while the most accurate results are obtained by considering both spatial and temporal correlations.

Table 2  Effect of Spatio-Temporal Correlations on Switching Activity Estimation

          with Spatial Correlations       without Spatial Correlations
          with Temporal  without Temporal  with Temporal  without Temporal
  max        0.0463          0.2020           0.2421          0.2478
  mean       0.0115          0.0591           0.0658          0.0969
  RMS        0.0185          0.0767           0.0722          0.1103
  STD        0.0149          0.0505           0.0960          0.0544

In summary, the OBDD-based approach is the best choice for signal probability calculation if the OBDD representation of the entire circuit can be constructed. Otherwise, a circuit partitioning scheme that breaks the circuit into blocks for which OBDD representations can be built is recommended. In this case, the correlation coefficients must be calculated and propagated from the circuit inputs toward the circuit outputs in order to improve the accuracy.

2) Estimation under a Real Delay Model: The above methods only account for steady-state behavior of the circuit and thus ignore hazards and glitches. This section reviews some techniques that examine the dynamic behavior of the circuit and thus estimate the power dissipation due to hazards and glitches.

In [38], the exact power estimation of a given combinational logic circuit is carried out by creating a set of symbolic functions such that summing the signal probabilities of the functions corresponds to the average switching activity at a circuit line x in the original combinational circuit. The inputs to the created symbolic functions are the circuit input lines at time instances 0⁻ and ∞. Each function is the exclusive-OR of the characteristic functions describing the logic values of x at two consecutive instances. The major disadvantage of this estimation method is its exponential complexity. However, for the circuits that this method is applicable to, the estimates provided by the method can serve as a basis for comparison among different approximation schemes.

The concept of a probability waveform is introduced in [12]. This waveform consists of a sequence of transition edges or events over time from the initial steady state (time 0⁻) to the final steady state (time ∞), where each event is annotated with an occurrence probability. The probability waveform of a node is a compact representation of the set of all possible logical waveforms at that node. Given these waveforms, it is straightforward to calculate the switching activity of x that includes the contribution of hazards and glitches. Given such waveforms at the circuit inputs and with some convenient partitioning of the circuit, the authors examine every sub-circuit and derive the corresponding waveforms at the internal circuit nodes [73].

A tagged probabilistic simulation approach is described in [108] that correctly accounts for reconvergent fanout and glitches. The key idea is to break the set of possible logical waveforms at a node n into four groups, each group being characterized by its steady-state values. Next, each group is combined into a probability waveform with the appropriate steady-state tag. Given the tagged probability waveforms at the input of a simple gate, it is then possible to compute the tagged probability waveforms at the output of the gate. The correlation between probability waveforms at the inputs is approximated by the correlation between the steady-state values of these lines. This approach requires significantly less memory and runs much faster than symbolic simulation, yet achieves very high accuracy, e.g., the average error in aggregate power consumption is about 2%.

In summary, symbolic simulation provides the exact switching activity values under a real delay model. It is however very inefficient and impractical except for small circuits. Probabilistic simulation and its tagged variant constitute the best choice for switching activity estimation at the gate level.

3) Estimation in Sequential Circuits: Recently developed methods for power estimation have primarily focused on combinational logic. The estimates produced by purely combinational methods can greatly differ from those produced by the exact state probability method. Indeed, accurate average switching activity estimation for sequential circuits is considerably more difficult than for combinational circuits, because the probability of the circuit being in each of its possible states has to be calculated.

The exact state probabilities of a sequential machine are calculated in [111] and [66] by solving the Chapman-Kolmogorov (C-K) equations for a discrete-time discrete-state Markov process. This method requires the solution of a linear system of equations of size 2^N, where N is the number of flip-flops in the machine. Thus this method is limited to circuits with <15 flip-flops, since it requires the explicit consideration of each state in the circuit.

A framework for exact and approximate calculation of switching activities in sequential circuits is also described in [67]. The basic computation step is the solution of a nonlinear algebraic system of equations in terms of the signal probabilities of the present state and combinational inputs of the FSM. The fixed point (or zero) of this system of equations can be found using the Picard-Peano (or Newton-Raphson) iteration. Increasing the number of variables or the number of equations in the above system results in increased accuracy. For a wide variety of examples, it is shown that the approximation scheme is within 1-3% of the exact method, but is orders of magnitude faster for large circuits. Previous sequential switching activity estimation methods can have significantly greater inaccuracies.
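The exact sequential method just outlined can be made concrete with a small sketch. The following Python fragment is our illustration, not code from [111], [66], or [67], and the two-flip-flop transition matrix is invented for the example: it solves the Chapman-Kolmogorov steady-state equations for a four-state machine by fixed-point iteration in the Picard-Peano style, then derives each flip-flop's switching activity from the resulting state probabilities.

```python
# Steady-state probabilities of a small FSM viewed as a discrete-time,
# discrete-state Markov chain, then per-flip-flop switching activity.
# The 4-state (2 flip-flop) transition matrix below is a made-up example.

# T[i][j] = probability of moving from state i to state j in one clock cycle.
T = [
    [0.50, 0.50, 0.00, 0.00],
    [0.00, 0.00, 0.50, 0.50],
    [0.25, 0.25, 0.25, 0.25],
    [0.00, 0.50, 0.50, 0.00],
]

def steady_state(T, iters=1000):
    """Fixed-point (power) iteration p <- p*T; for an irreducible,
    aperiodic chain this converges to the C-K steady-state solution."""
    n = len(T)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p

def ff_activity(p, T, k):
    """Switching activity of flip-flop k: probability that bit k of the
    state differs between two consecutive clock cycles."""
    n = len(T)
    return sum(p[i] * T[i][j]
               for i in range(n) for j in range(n)
               if ((i >> k) & 1) != ((j >> k) & 1))

p = steady_state(T)
print("state probabilities:", [round(x, 4) for x in p])
print("FF0 activity:", round(ff_activity(p, T, 0), 4))
print("FF1 activity:", round(ff_activity(p, T, 1), 4))
```

Since the state space has 2^N entries for N flip-flops, this exact formulation scales exponentially, which is why the approximate fixed-point formulations over signal probabilities are preferred for larger machines.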

SINGH et al.: CAD TOOLS AND METHODOLOGIES: A PERSPECTIVE 581

C. Logic Optimization Techniques

Both the switching activity and the capacitive loading can be optimized during logic synthesis. It therefore has more potential for reducing the power dissipation than physical design. On the other hand, less information is available during logic synthesis, and hence, factors such as slew rates, short circuit currents, etc. cannot be captured properly. In the following, we present a number of techniques for power reduction during sequential and combinational logic synthesis that essentially target dynamic power dissipation under a zero-delay or a simple library delay model.

1) Retiming: In [65], it is noted that the flip-flop output may make at most one transition when the clock is asserted. Based on this observation, the authors then describe a circuit retiming technique targeting low power dissipation. The technique does not produce the optimal retiming solution, as the retiming of a single node can dramatically change the switching activity in a circuit and it is very difficult to predict what this change will be. The technique heuristically selects a set of nodes with the property that if flip-flops are placed at their outputs, switching activity in the network will be reduced. Nodes are selected based on the amount of glitching that is present at their outputs and the probability that this glitching propagates through their transitive fanouts. The power dissipated by the 3-stage pipelined circuits obtained by retiming for low power with a delay constraint is about 8% less than that obtained by retiming for a minimum number of flip-flops given a delay constraint.

2) Pre-Computation Logic: A sequential logic optimization method is described in [2] that is based on selectively precomputing the output logic values of the circuit one clock cycle before they are required, and using the precomputed values to reduce internal switching activity in the succeeding clock cycle. The authors present an automatic method of synthesizing precomputation logic so as to achieve maximal reductions in power dissipation. Reductions of up to 62% in average switching activity and power dissipation are reported with marginal increases in circuit area and delay.

3) State Assignment: In the past, many researchers have addressed the encoding problem for minimum area of two-level or multi-level logic implementations (e.g., [119] and [58]). A state assignment procedure is presented in [89] that minimizes the switching activity on the present state input lines. This problem is equivalent to embedding a weighted graph on a hypercube of minimum (or given) dimensionality, which is an NP-hard problem. The authors use simulated annealing to solve this problem.

The shortcomings of the above approach are: (1) it minimizes the switching on the present state bits without any consideration for the loading on the state bits; (2) it does not account for the power consumption in the resulting two- or multi-level logic realization of the next state logic of the FSM. A state assignment technique that accounts not only for the power consumption at the state bit lines, but also for the power consumption in the combinational logic that implements the next state logic function, is presented in [107]. Experimental results on a large number of benchmark circuits show 10% and 17% power reductions for two-level logic and multilevel implementations, respectively.

4) Multi-Level Network Optimization: Network "don't cares" can be used for minimization of nodes in a Boolean network [91]. Two multilevel network optimization techniques for low power are described in [97] and [44]. One difference between these procedures and the procedure in [91] is in the cost function used during the two-level logic minimization. The new cost function minimizes a linear combination of the number of product terms and the weighted switching activity. In addition, [44] considers how changes in the global function of an internal node affect the switching activity (and thus, the power consumption) of nodes in its transitive fanout. In particular, two types of local don't care conditions are identified: 1) Function Preserving Don't Cares (FPDC), which consist of all points in the local space of the node that never occur, and are simply the complement of the range of the primary input space into the space of the local fanins of the node, and 2) Function Modifying Don't Cares (FMDC), which consist of all points in the local space that do not produce observable outputs. FPDC does not affect the global function of the node while FMDC does. A greedy network optimization procedure based on the above concepts is introduced that reduces the power dissipation in the combinational circuits by about 10%.

5) Common Subexpression Extraction: Extraction based on algebraic division (using cube-free primary divisors or kernels) has proven to be very successful in creating an area-optimized multilevel Boolean network [8] and [86]. The kernel extraction procedure is modified in [89] to generate multilevel circuits with low power consumption. The main idea is to calculate a power saving factor for each candidate kernel based on how its extraction will affect the loading on its input lines and the amount of logic sharing. Results show a 12% reduction in power compared to a minimum-literal network.

6) Path Balancing: Balancing path delays reduces hazards/glitches in the circuit, which in turn reduces the average power dissipation in the circuit. This can be achieved before technology mapping by selective collapsing and logic decomposition, or after technology mapping by delay insertion and pin reordering.

The rationale behind selective collapsing is that by collapsing the fanins of a node into that node, the arrival time at the output of the node can be increased. This is however valid only if the node delay is determined by assuming an AND-OR implementation with the delay of AND and OR gates being a function of the number of inputs to the gate. Logic decomposition can be performed so as to minimize the level difference between the inputs of nodes that are driving high capacitive nodes. The key issue in delay insertion is to use the minimum number of delay elements to achieve the maximum reduction in spurious switching activity. This is a difficult task as delay insertion at some node will not only change the spurious activity at the immediate

output of the gate (hopefully, will reduce it), but will also affect the spurious activity in the transitive fanout of the node (unfortunately, due to changes in output waveforms and sensitizability conditions, the activity may be increased).

Path delays may sometimes be balanced by appropriate signal-to-pin assignment. This is possible as the delay characteristics of CMOS gates vary as a function of the input pin that is causing a transition at the output.

7) Technology Decomposition: It is difficult to develop a decomposed network that leads to a minimum power implementation after technology mapping, since gate loading and mapping information are unknown at this stage. Nevertheless, it has been observed that a decomposition scheme which minimizes the sum of the switching activities at the internal nodes of the network is a good starting point for power-efficient technology mapping.

Given the switching activity value at each input of a complex node, a procedure for AND decomposition of the node is described in [109] that minimizes the total switching activity in the resulting two-input AND tree under a zero-delay model. The decomposition procedure (which is similar to Huffman's algorithm [43] for constructing a binary tree with minimum average weighted path length) is optimal for dynamic CMOS circuits and produces very good results for static CMOS circuits. The performance-oriented version of the above problem requires that the increase in the height of the decomposed network (compared to the undecomposed network) be bounded. The solution here is similar to Larmore/Hirschberg's algorithm [53] for solving the tree decomposition problem with minimum average weighted path length subject to a height constraint. It is shown that the low power technology decomposition reduces the total switching activity in the networks by 5% over the conventional balanced tree decomposition method. This translates to a 3% reduction in power consumption after technology mapping.

8) Technology Mapping: A successful and efficient solution to the minimum area mapping problem was suggested in [48] and implemented in programs such as DAGON and MIS. The idea is to reduce technology mapping to DAG covering and to approximate DAG covering by a sequence of tree coverings that can be performed optimally using dynamic programming.

The problem of minimizing the average power consumption during the technology dependent phase of logic synthesis is addressed in [109]. This approach consists of two steps. In the first step, power-delay curves (that capture power consumption versus arrival time tradeoffs) at all nodes in the network are computed. In the second step, the mapping solution is generated based on the computed power-delay curves and the required times at the primary outputs. For a NAND-decomposed tree, subject to load calculation errors, this two-step approach finds the minimum area mapping satisfying any delay constraint if such a solution exists. The algorithm is optimal for trees and has polynomial run time on a node-balanced tree. It is easily extended to mapping a network modeled by a directed acyclic graph.

Fig. 10. Number of nets versus switching rate for s832.

Fig. 11. Average load per net versus switching rate for s832.

Compared to a technology mapper that minimizes the circuit delay, this procedure leads to an average of 18% reduction in power consumption at the expense of a 16% increase in area without any degradation in performance. In Figs. 10 and 11, the results of this power-delay mapper are compared with the area-delay mapper of [23] for the s832 benchmark circuit [124]. This circuit contains 18 inputs, 19 outputs, 245 product terms, and 25 flip-flops. After the mapping, the circuit contains about 200 CMOS gates (the exact gate counts are different for the area-delay and power-delay mappers). From Fig. 10, we can see that the power-delay mapper reduces the number of high switching activity nets at the expense of increasing the number of low switching activity nets. From Fig. 11, we learn that for the remaining high switching activity nets, the power-delay mapper reduces the average load on the nets. By taking these two steps, this mapper minimizes the total weighted switching activity and hence the total power consumption in the circuit.

Under a real delay model, the dynamic programming based tree mapping algorithm does not guarantee to find an optimum solution even for a tree. The dynamic programming approach was adopted based on the assumption that

the current best solution is derived from the best solutions stored at the fanin nodes of the matching gate. This is true for power estimation under a zero delay model, but not for that under a real delay model.

The extension to a real delay model is considered in [108]. Every point on the power-delay curve of a given node uniquely defines a mapped subnetwork from the circuit inputs up to the node. Again, the idea is to annotate each such point with the probability waveform for the node in the corresponding mapped subnetwork. Using this information, the total power cost (due to steady-state transitions and hazards) of a candidate match can be calculated from the annotated power-delay curves at the inputs of the gate and the power-delay values of the gate itself. The spatial correlations among the input waveforms are captured using the tagging mechanism described previously.

The concept of the power-delay curve has been extended to include the area tradeoff. Instead of generating a set of (power, delay) values, the tradeoff curve consists of a set of (power, delay, area) values [57]. A modification of the SIS mapper for low power is described in [105].

9) Signal-to-Pin Assignment: In general, library gates have pins that are functionally equivalent, which means that inputs can be permuted on those pins without changing the function of the gate output. These equivalent pins may have different input pin loads and pin-dependent delays. It is well known that the signal-to-pin assignment in a CMOS logic gate has a sizable impact on the propagation delay through the gate.

If we ignore the power dissipation due to charging and discharging of internal capacitances, it becomes obvious that high switching activity inputs should be matched with pins that have low input capacitance [16]. However, the internal power dissipation also varies as a function of the switching activities and the pin assignment of the input signals. To find the minimum power pin assignment for a gate g, one must solve a difficult optimization problem [110]. As the number of functionally equivalent pins in a typical semi-custom library is not greater than six, it is feasible to exhaustively enumerate all pin permutations to find the minimum power pin assignment. Alternatively, one can use heuristics; for example, a reasonable heuristic assigns the signal with the largest probability of assuming a controlling value (zero for NMOS and one for PMOS) to the transistor near the output terminal of the gate. Alternatively, one could assign the signal with the earliest transition to a controlling value to the transistor near the output terminal. The rationale is that this transistor will switch off as often as (or as early as) possible, thus blocking the internal nodes from nonproductive charge and discharge events.

Another heuristic [57] assigns the signal with the highest switching activity to the input pin with the least capacitance. This is not very effective, as in semi-custom libraries the difference in pin capacitances for logically equivalent pins is small.

The pin permutation for low power should take place on non-critical gates, as it is in general different from the pin permutation for minimum delay.

IV. PHYSICAL LEVEL

Physical design fits between the netlist-of-gates specification and the geometric (mask) representation known as the layout. It provides the automatic layout of circuits minimizing some objective function subject to given constraints. Depending on the target design style (full-custom, standard-cell, gate arrays, FPGAs), the packaging technology (printed circuit boards, multi-chip modules, wafer-scale integration) and the objective function (area, delay, power, reliability), various optimization techniques are used to partition, place, resize and route gates.

A. Layout Optimization Techniques

Under a zero-delay model, the switching activity of gates remains unchanged during layout optimization, and hence, the only way to reduce power dissipation is to decrease the load on high switching activity gates by netlist partitioning and gate placement, gate and wire sizing, transistor reordering, and routing. Simultaneously, if a real-delay model is used, various layout optimization operations influence the hazard activity in the circuit. This is however a very difficult analysis and optimization problem and requires further research.

1) Circuit Partitioning: Netlist partitioning is key in breaking a complex design into pieces that are subsequently optimized and implemented as separate blocks. In general, the off-block capacitances are much higher than the on-block capacitances (one to two orders of magnitude). It is therefore essential to develop partitioning schemes that keep the high switching activity nets entirely within the same block as much as possible. Techniques based on local neighborhood search (e.g., the FM heuristic [33]) can be easily adapted to do this. In particular, it is adequate to assign net weights based on the switching activity values of the driver gates and then find a minimum cost partitioning solution.

2) Floorplanning: Floorplanning plays an important role during layout optimization as it determines the interface characteristics (shape, size, I/O locations) and positions of custom or semi-custom blocks in a hierarchical design environment. [22] describes a floorplanner that considers power management. The idea is to generate a set of power-indexed shape functions and then use implementations for each flexible module that satisfy the timing constraints while minimizing the dynamic power dissipation. In addition, this work considers constraints to mitigate power line noise and thermal reliability problems. Results show an 18% reduction in power and more smoothly distributed power dissipation over the floorplan area compared to conventional floorplanners with the same delay constraint. There is however a small area penalty.

3) Placement: A performance-driven placement algorithm for minimizing the power consumption is presented in [114]. The problem is formulated as a constrained programming problem and is solved in two phases: global optimization and slot assignment. The objective function used during either phase is the total weighted net length

where net weights are calculated as the expected switching activities of gates driving the nets. Constraints on total path delays are also accounted for. On average, this procedure reduces power consumption by about 10% at the expense of a 2% increase in circuit delay compared to a placement program minimizing the total interconnection length.

4) Routing: Routing for low power can be performed by net weighting, where again the net weights are derived from the switching activity values of the driver gates. The nets with higher weights are more critical and should be given priority during routing.

5) Gate Sizing: The treatment of the gate sizing problem is closely related to finding a macro-model that captures the main features of a complex gate (area, delay, power consumption) through a small number of parameters (typically, width or transconductance). The first major contribution to the transistor sizing problem was the work done in [34]. This optimization technique is greedy in the sense that it picks a path that fails to meet the timing requirements, and resizes some transistor on the path so as to meet the constraint. The procedure is iterated until all timing constraints are satisfied or no further optimization is possible. A novel approach to gate sizing is described in [7]. This approach linearizes the path-based timing constraints and uses a linear programming solver to find the global optimum solution. The drawbacks of this approach are the omission of the slope factor (input ramp time) of input waveforms from the delay model and the use of a simple power dissipation model that ignores short-circuit currents.

6) Wire Sizing: Wire sizing and/or driver sizing are often needed to reduce the interconnect delay on time-critical nets. Wire sizing however tends to increase the load on the driver and hence increase the power dissipation. A combined wire sizing and driver sizing approach that reduces the interconnect delay with only a small increase in the power dissipation is described in [26]. Experimental results show that for the same delay constraint, this approach reduces the power by about 10% when compared to the conventional method of driver sizing only. Alternatively, this approach produces delay values that are up to 40% lower when compared to the conventional method (at the cost of increasing the power dissipation by 25%).

7) Clock Tree Generation: The clock is the fastest and most heavily loaded net in a digital system. Power dissipation of the clock net contributes a large fraction of the total power consumption. A two-level clock distribution scheme based on area pad technology for MCMs is described in [126]. The first level of the tree is routed on the MCM substrate connecting the clock source to the clock area pads, while the second level tree lies inside each die with the area pads as the source. The objective is to minimize the load on the clock drivers subject to meeting a tolerable clock skew. A significant power reduction (70% for one benchmark circuit) over the method with one clock pad per die is reported by using this scheme.

Fig. 12. The datapath library cell model (abstraction levels: behavioral, functional, circuit, and layout/leaf cells).

B. Libraries

1) Scope of the Problem: As VLSI layout design dimensions continue to shrink, device sizes tend to shrink faster than the routing distances. For future designs, it means that a significant part of the system power will be due to layout parasitics that are not well modeled at the logic and circuit levels. The conflicting needs for high design productivity, low power consumption, high layout density and higher performance goals are difficult to meet in acceptable run times without a pre-characterized library. Existing design methodologies map high level requirements to the cells in a library early in the design cycle. This causes some selection decisions to be made before all the constraints may be fully known, e.g., the pin positions on the neighboring cells. This calls for mismatched cell views during the layout phase and consequently longer interconnect wires. Another issue is the growing size of the library, as the search time for a cell, with m constraints among n cells in a relational database using separately sorted tables of attribute values, is O(nm).

2) Solutions: One possible solution is to overlap the design steps so that the estimates done at the higher levels are more realistic. The tradeoff is in the increased complexity of the estimated models and an effort to complete part of the layout design before the logic mapping is fully done. An approach is to use a library of cells with views at every design abstraction level for aiding the decision making process. The cell information is stored in a database which is queried at each design phase. At the lowest level of the hierarchy, various cells are stored as objects with attributes, and at higher levels they are grouped by functionality. The concept of delayed binding is used in selecting the layout view of a library cell identified during technology mapping. Each layout view is designed without any pins on the boundary of the cell, and multiple connection flexibility is available after final placement of cell instances during the layout phase. A cell in the datapath library may have multiple levels of abstraction (i.e., behavioral, logic, circuit and layout), and multiple views at each level, as shown in Fig. 12 (i.e., an Adder at the behavioral level will be the operator "+" but at the circuit level the selection can be made between Carry Look-ahead, Ripple Carry style, etc.).

After an individual library cell design is completed, layout parasitics are extracted with simulated routing placed over-the-cell, and this information is used to derive charac-

teristic model equations for current and power consumption in terms of the actual load being driven, voltage, toggle rate, as well as the slope of the input signal. These electrical equations along with behavioral models and attributes such as area and pin-to-pin delays are stored in an object-oriented database. The database implements search for equivalent-functionality cells, and a designer can specify constraints on a cell's performance to limit the search space. Since all the performance constraints may not be known at the beginning, i.e., device strengths are not determined until the circuit phase, a group of cells is identified during the logic phase for each component. As more details of a design become known, the size of the mapped group for each design component becomes smaller. If no corresponding cell is available as a new constraint is applied, that constraint is relaxed and a neighboring cell's constraint is tightened.

If more than one cell in the library matches the final set of constraints, an objective function is computed over each selected cell (e.g., minimize power/area), and then cells are sorted accordingly to select the most desirable candidates. If no cell meets the given constraints, then a closest cell is found by relaxing the design constraints.

In order to do any kind of design or optimization for low power, a power model for the cells has to be available. The sources of power dissipation in a design are as follows, with

    P_dynamic = α C V_DD V_swing f + I_tsc V_DD        (9)

the first part of which is identical to (1). I_tsc is the transient short circuit current, which is the direct current through the series P and N transistors in a CMOS network during a logic transition (i.e., transition time = t_t), and

    P_static = (I_bias + I_leakage) V_DD               (10)

where I_bias is the d.c. current through sense amps and ratioed loads, and I_leakage is the circuit leakage current. During search, the value of these equations is computed at run-time and compared with the given constraints. Special algorithms using set theory to optimize the multiple-constraint-based query time have been developed.

Such a cell library and database are used during low power design to meet the given budgets on area, performance and power consumption. As compared to a relational database, the object-oriented cell library manager reduces the search time for an appropriate cell, with m constraints among n cells, from O(nm) to O(m log n). A prototype implementation with 5000 library cells, with 92 attributes per cell, took only 8.6 s to satisfy power constraints with a matching cell [95].

V. RELIABILITY ISSUES IN LOW POWER DESIGN

The primary objectives in the earlier sections have been [...] the next we discuss reliability and noise problems caused by high performance/high frequency design needs as well as low-power design techniques. Reliability verification (RV) addresses electromigration (EM) and the IR voltage drops on power supply lines. Continuing with the theme followed throughout this paper, we first discuss analysis capabilities to do reliability verification and optimization techniques to do design for reliability.

A. Scope of the Problem

Reliability issues are becoming important due to several design constraints: higher performance and frequency, device miniaturization, higher levels of on-chip integration, and lower supply voltages.

Device miniaturization can lead to several reliability problems: hot electrons that cause threshold voltage shifts, and gate oxide breakdown leading to runaway current and electrostatic discharge (ESD) failures. High current densities from faster circuits, thinner wires and increased device count can cause EM failures resulting in opens in signal and power lines. Another problem caused by high average current (I_avg) is the IR voltage drop along the power lines. This, along with lower power supply levels, can degrade the noise margin and lead to functional failures. In this section we will not address device and material reliability issues, but will deal primarily with signal and power network reliability verification.

B. Reliability Verification

In order to meet EM reliability constraints, the average current in a bidirectional signal line needs to meet the following constraint [3]:

    I_avg = (C_unit l + C_L) V f_wc ≤ J_max W t        (11)

where C_unit is the capacitance per unit length, C_L is the load capacitance, l is the line length, f_wc is the worst case switching frequency, J_max is the limit of current density for a specific layer, and W and t are the line width and thickness, respectively. Different conducting layers (Al, Cu) have different current density limits. The estimated I_avg will not only determine which layer to use but also the maximum line length (l) and width (W).

In order to determine IR drop limits on power supply lines, current density and voltage drop information needs to be extracted from the physical layout. The following is needed to accomplish this task:

1) The transistor netlist must be simulated to estimate the current flowing into and out of each power bus via.
2) The physical layout of the power buses must be accurately extracted into an RC network with current sources for the transistors.
3) This power network can then be simulated to find the current density and voltage at each node.

Techniques for circuit simulation (to extract I_avg and
the analysis and optimization of full-chip power across Ipeak) have been introduced in earlier sections. For large
different stages of the design hierarchy. In this section and networks obtaining the current waveforms for each transis-
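The run-time check described above, evaluating (9) and (10) for each candidate cell and comparing the result against the given power budget, can be sketched as below. The flat attribute dictionary and all parameter names and values are illustrative assumptions, not the paper's library format.

```python
def cell_power(alpha, c_load, vdd, v_swing, f, i_tsc, i_bias, i_leak):
    """Total cell power as the sum of (9) and (10)."""
    p_dynamic = alpha * c_load * vdd * v_swing * f + i_tsc * vdd  # (9)
    p_static = (i_bias + i_leak) * vdd                            # (10)
    return p_dynamic + p_static

def meets_budget(cell_attrs, p_budget):
    # During search, the model is evaluated at run-time and compared
    # with the given power constraint.
    return cell_power(**cell_attrs) <= p_budget
```

A cell passing `meets_budget` would then still be ranked by the chosen objective function (e.g., power/area) against the other matching cells.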

586 PROCEEDINGS OF THE IEEE, VOL. 83, NO. 4, APRIL 1995

Authorized licensed use limited to: Rice University. Downloaded on January 3, 2009 at 17:34 from IEEE Xplore. Restrictions apply.
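The EM constraint in (11) can be rearranged to give the maximum legal line length mentioned in the text: l ≤ (Jmax·W·t/(V·fwc) − CL)/Cunit. A small, unit-agnostic sketch (the function names and toy numbers are illustrative, not taken from [3]):

```python
def avg_current(c_unit, length, c_load, v, f_wc):
    # Left-hand side of (11): Iavg = (Cunit*l + CL) * V * fwc
    return (c_unit * length + c_load) * v * f_wc

def max_line_length(c_unit, c_load, v, f_wc, j_max, w, t):
    # Longest line for which Iavg <= Jmax * W * t still holds.
    cap_budget = j_max * w * t / (v * f_wc) - c_load  # total wire cap allowed
    return max(cap_budget / c_unit, 0.0)
```

For a given layer's Jmax, the same rearrangement can be used the other way around: fixing the length and solving for the minimum width W.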
[Figure: plot of on-die Vcc and Vss versus time, showing the maximum switching oscillations of Vcc-Vss.]
Fig. 13. Vcc and Vss oscillations from the intrinsic LC circuit in the device.

tor can be time-consuming. This drawback can be overcome by using switch-level and logic-level simulators [125], [29] or even pattern-independent simulation techniques (such as [112], [71]). The obvious tradeoff is accuracy.

Resistance extraction algorithms fall into three general categories: numerical solution of Laplace's equation [4], lumped approximation [94] and polygonal reduction [100], [103]. The most accurate method solves Laplace's equation to produce point-to-point resistances. This tends to be too slow for full-chip power supply analysis. The lumped approximation is fast but oversimplifies the network connectivity. The polygonal reduction method provides a speed/accuracy trade-off by producing smaller networks on which point-to-point resistance can be calculated. Capacitance extraction techniques too can vary from the simple [90] to the complex [115], [27].

The simulation of an RC network is a classic problem in circuit analysis that requires circuit formulation and matrix computation [25]. It may, however, be the bottleneck in the reliability verification (RV) process due to the size of the power network, where millions of parasitic elements are common for today's VLSI chips. The number of equations in the matrix computation can reach 10 000 for a 10K gate VLSI circuit. Moreover, a matrix computation is needed for each timestep in the time-domain transient simulation. Simplified methods can speed up the RC network simulation by 1) reducing the massive RC network to a smaller size; or 2) limiting the number of power networks analyzed. In the former case parasitic elements are lumped into a small number of elements. For example, a wire modeled by a distributed RC is lumped into a simplified π-structure. Another simplification is to reduce the network graph to a tree and then solve for currents and voltages [77]. A second method performs the power network analysis only once for a pre-specified time interval such that the average, rms, or peak current of each transistor over this interval is applied to the power network [125]. In the extreme case, only one matrix computation is needed to estimate the average current density and voltage drop. The speedup from these simplifications, however, is achieved at the expense of degraded accuracy.

The output calculated in power net and signal RV analysis contains a large volume of data, making it a tedious process to browse the current density and voltage drop at each location of the power network. For reliability analysis, a graphical display showing violations (and potential violations), where the current density or voltage drop exceeds (or is near) the user-specified threshold, is important for the designer to quickly redesign the power bus layout to meet reliability requirements.

C. Design for Reliability Optimization

For reliability design, signal and power lines need appropriate sizing to meet EM constraints. Adding repeaters to divide long lines into smaller subsections can also help reduce EM [74]. Power distribution can be improved to curb IR voltage drops. This can be done with an interweaved comb-like power distribution network. Using different metal layers for global and local power distribution also helps [3]. The global network is best laid out in the topmost low-resistivity metal layer, which should be thicker than the lower metal layers used for local power distribution. The effective resistance can be further reduced by increasing the number of power connections.

VI. PACKAGING INDUCED ELECTRICAL NOISE ISSUES IN LOW-POWER DESIGN

Noise management tries to contain the problems caused by fluctuations of the Vdd/Vss power supply levels. Many factors have aggravated this noise problem: faster transistors, higher current levels, shorter clock cycles, lower supply voltages, and power savings techniques. One might have the initial impression that low power design techniques must inherently improve the stability of the on-die power supply levels, Vdd and Vss. As we will see shortly, it actually makes the problem worse. In this section, we begin by discussing the noise problems caused by high performance/high frequency design and low-power design techniques. We then discuss analysis capabilities and optimization techniques to better design for on-chip noise management.

A. The Problem

We will deal primarily with power supply noise issues. However, we will briefly consider signal line noise.

1) Power Supply Noise: Power supply noise levels due to increased di/dt in high performance circuits are also becoming a growing concern. Given an initial arbitrary current pulse stimulus, Vdd and Vss will try to oscillate 180° out of phase at their natural ringing frequency, ω0 = 1/√(L·C), where C is the Vdd-Vss capacitance and L is the
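The π-structure lumping described above can be sketched as follows; the Elmore-delay helper is an illustrative way to see what the reduced network preserves (Elmore analysis itself is an assumption here, not a method prescribed by the paper).

```python
def pi_model(r_total, c_total):
    """Collapse a distributed RC wire into a simplified pi-structure:
    half the capacitance at each end, the total resistance in the middle."""
    return (c_total / 2.0, r_total, c_total / 2.0)

def elmore_delay_pi(r_total, c_total, c_load=0.0):
    # Elmore delay at the far end of the pi-model; the near-end capacitor
    # sees zero upstream resistance and so contributes nothing.
    c_near, r, c_far = pi_model(r_total, c_total)
    return r * (c_far + c_load)
```

With no load, the π-model gives a delay of R·C/2, matching the first-order delay of the distributed line it replaces, which is why this reduction trades so little accuracy for a much smaller network.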

SINGH et al.: CAD TOOLS AND METHODOLOGIES: A PERSPECTIVE 587

total power supply loop inductance. See Fig. 13. However, if the stimulus is periodic and strong enough, the oscillator can be forced to oscillate at the frequency of the stimulus. The peaks and valleys of Vdd - Vss can have performance and reliability implications. Timing slowdown may occur when Vdd - Vss is at a minimum. Timing skews may arise from some circuits speeding up at a high Vdd - Vss while others, which switch at a different time in the clock period, may see a low Vdd - Vss and slow down. Hot electron operating limits or gate oxide stress limits may be exceeded during the Vdd - Vss peaks, leading to reliability failures [42].

The magnitude of the oscillations is a function of the power supply inductance, LVCC and LVSS; the Vdd-Vss die capacitance, CDIE; the amount of capacitance which is switching, CSW; the power supply resistance; and the magnitude of the current demand from switching circuits, Iavg. It is expressed by the following equation (from [68]):

ΔV = [Iavg/(ω0·(CSW + CDIE))]·sin(ω0·t).    (12)

In order to reduce the magnitude of the voltage oscillation peaks, the package is supplied with as many Vdd and Vss pins as possible to reduce LVCC-VSS. Decoupling capacitance is added to the die and on the package so that the highest frequency components of di/dt do not need to be supplied by a highly inductive off-package path. Various architectural techniques to limit di/dt can be attempted, but the circuits cannot be slowed down, since performance will be affected.

Low power design introduces its own set of problems. An ideal low power design would result in low values of Iavg and di/dt. All units on the die would use small currents when active and very little current when inactive. Low power designs for microprocessors typically result in:
- reducing the maximum current peaks moderately,
- reducing the time spent at peak levels greatly,
- causing very low values of current when the device is carrying out "easy" tasks or is in standby mode.

That is, the current delivered tracks the MIPS required on a real time basis. This means that the di/dt is going to get worse. Power conscious designs in general will go through periods of inactivity, such as standby mode or sleep mode, followed by intense periods of activity, followed again by periods of inactivity. Fig. 14 shows the differences in current demand between using and not using low power design techniques [93]. Notice how the current differences between peak operation and idle operation are larger in the design using power savings techniques.

[Figure: plot of power versus time (seconds).]
Fig. 14. Plot of power before and after using low power design techniques [93].

One other difficulty caused by low power design is that current activity may be very localized in space. There is naturally occurring decoupling capacitance spread uniformly across the die in the form of N-well to substrate capacitance, nonswitching circuit blocks, and other capacitive parasitics. When the current activity is concentrated in one area of the chip, only the naturally occurring decoupling capacitance close to that circuit will function efficiently; it will not be able to "see" the capacitance at the far end of the chip [ S I. The charge on capacitors far away from the switching circuit will have a transit time determined by its RC time constant. Since the switching circuits are switching faster than this RC time constant, this far away charge can not arrive in time.

One final area of concern is in the use of low voltage to achieve low power. Although low power supply voltages help lower the power consumed, higher transistor counts and higher frequency rates usually keep Idd relatively high. Lower Vdd usually means maintaining a lower absolute value of voltage noise. Considering the I·R drop across the die, power supply guardbands and tester guardbands, very little margin is left for the on-die power supply oscillations. Since the di/dt usually remains fairly high, large values of decoupling capacitance are needed.

2) Signal Line Noise: With neighboring signal lines lying closer to each other on the die and in the package, the mutual coupling capacitance and inductance between them increase. This, along with rapid switching on these lines, can cause capacitive and inductive crosstalk leading to timing slowdowns and inadvertent logic transition faults.

B. Noise Analysis

In order to calculate the fluctuations of the power supply voltage levels on the die and their effect on noise, the following is needed:
- an RLC network of Vdd/Vss from the die to the printed circuit board power supply,
- the switching activity of the device,
- the passive Vdd - Vss capacitance of the device and the associated interconnect parasitics, and
- voltage noise specifications for the device under analysis.

Inductive effects caused by fast switching core logic circuits have not been a problem in the past for digital CMOS circuits. The only inductive effects the designer had to deal with were in the periphery (I/O circuits) and they were easy to deal with because they were very localized, had their own power supply, and the designer knew exactly when they switched [69].

Today, high di/dt problems are distributed all over the surface of the die. The designer cannot manually keep track of which circuits in the block switch with every input vector to the block. The on chip inductance is beginning
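The natural ringing frequency and the oscillation magnitude of (12) can be evaluated directly. A minimal sketch with toy component values (the numbers are illustrative, not representative of a real die or package):

```python
import math

def ringing_frequency(l_loop, c_die):
    # w0 = 1 / sqrt(L * C): natural frequency of the intrinsic LC circuit
    # formed by the supply loop inductance and the Vdd-Vss capacitance.
    return 1.0 / math.sqrt(l_loop * c_die)

def supply_noise_peak(i_avg, c_sw, c_die, w0):
    # Peak of dV = [Iavg / (w0 * (Csw + Cdie))] * sin(w0 * t), per (12)
    return i_avg / (w0 * (c_sw + c_die))
```

Note how increasing the non-switching capacitance CDIE (for example by adding decoupling capacitors) lowers the peak, which is the motivation for the decoupling-capacitance techniques discussed in this section.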

to become significant, further complicating matters. Clearly sophisticated CAD tools are needed to give the designer a chance at managing di/dt.

1) di/dt Extraction Capabilities: While evaluating di/dt noise, the damping effect of resistance or inductance in the power supply lines is needed to model realistic effects [60]. Existing circuit simulators can provide current waveforms depicting di/dt, but they do not include resistance or inductance in the power supply lines. While resistance extraction tools have been around for a while, parasitic inductance extraction tools are non-existent. Extracting the inductance requires, in addition to knowledge of the circuit block's operation, knowledge of the current flow direction, current flow in neighboring lines, and all of the return paths. To simulate this properly, one usually uses a finite element method [24]. Since each situation is unique, an inductance extraction is needed for each particular case of interest.

Once parasitic values are extracted, simulations can be run. Unfortunately, when a simulation is performed with Vdd and Vss as variable circuit nodes, rather than fixed source voltages, it takes up to 25 times longer to run [60]. So now fast simulators are needed that give up some accuracy, but run faster, to handle the cases with parasitic resistance and inductance in the power supply loop.

Information on di/dt helps to estimate the on-chip die and local decoupling capacitance requirements. Determining this estimate at higher levels of abstraction is rapidly gaining importance. A major effort is under way to estimate Iavg at the architecture level, the RTL level, the schematic level, and even the silicon testing level. Considering that Iavg is not easy to calculate, computing di/dt is even more difficult, since a profile over time is needed. One way of doing this at the RTL/schematic level is to generate a switching activity/toggle profile over time.

2) Effective Decoupling Capacitance Calculation: In addition to di/dt, other information is needed, such as the effective capacitance of the block under consideration and the percentage of the gates that are not switching. Basically, we need to know how much capacitance is being charged and discharged and how much is just sitting idle, able to serve as decoupling capacitance. Just as power estimates become more accurate as we move from the architectural level to the circuit level, and eventually the silicon level, we expect our decoupling capacitance estimates to improve in accuracy as well.

Just as the total power of a device is not equal to the sum of the worst case power of all of the circuit blocks, so it is with decoupling capacitance. Tools that calculate the effective decoupling capacitance of a circuit block must also be able to calculate the effective decoupling capacitance of neighboring blocks. A neighboring block may contribute more decoupling capacitance if it is idle than if it is active. The blocks further away from the switching block will have higher parasitic resistance and inductance in the power supplies connecting the blocks. Hence, the effective decoupling capacitance will be lower for these blocks. When all of this is added up, using circuit activity factors to temper the results, one should get a number for the decoupling capacitance which is lower than if neighboring blocks were ignored.

3) Cell Libraries for Decoupling Capacitance Design: Taking this a step further, a set of standard cells or layout design guidelines needs to be accessible to circuit designers and the place and route tools. Standard cells for decoupling capacitors will insure that the intrinsic resistance of the capacitor is low enough to function properly. A set of guidelines for interleaving the power supply lines, Vdd and Vss, needs to be incorporated into the local and global routing tools. This will insure that resistance, rather than the inductance, will be the dominant component of the impedance of the power supply lines.

The step from dealing with Iavg to dealing with di(t)/dt is big and poses a real challenge for CAD tool developers.

4) Crosstalk Analysis Capabilities: Crosstalk induced noise is the voltage that develops upon an idle signal line when another signal line switches. Crosstalk consists of two components: inductive and capacitive. The noise developed is dependent upon Lm·di/dt and Cm·dv/dt, where Lm is the mutual inductance and Cm is the mutual capacitance [84].

When crosstalk is dominated by capacitive coupling, as is the case for most CMOS ICs operating below 100 MHz, one just needs to know the various capacitance components of the active line and the passive line. This is basically an exercise in charge sharing and the various capacitance extraction tools available today can be used.

When the signal lines are closer together, and the switching is faster, then both inductive and capacitive effects should be included. This situation applies for package leads and traces, printed circuit board lines, and lines in high speed ICs. To analyze these situations properly, a finite element analysis or a similar analysis capability is needed. Many tools are available today. They take inputs such as line width and thickness, spacing to neighboring lines, location of return current planes, microstrip or stripline configuration, etc., and return Lm and Cm as well as other useful information [ S I.

C. Design for Noise Minimization

Preventing reliability and noise effects from impacting the performance of the chip requires contributions from the architecture, circuit design, physical layout, packaging, and board design areas. Placing undue burden on any one of them will result in grotesque solutions, if any.

1) Architecture: Architectural techniques can be used to tame the sudden changes in current demand. When the device goes into a standby mode, the circuit blocks can be shut down sequentially over two or three clock periods. Similarly, when the device comes out of standby mode, one or two clock periods could be used to start drawing moderate amounts of current again.

When instructions are pipelined, there is advance warning of when various circuit blocks will switch. A central di/dt bookkeeping unit on the die could estimate in advance the current activities of the die. When the unit
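As a rough illustration of the two crosstalk components, the sketch below uses a first-order charge-sharing estimate for the capacitive part and Lm·di/dt for the inductive part; the specific victim-line model is an assumption made here for illustration, not a formula given in the text.

```python
def capacitive_crosstalk(c_mutual, c_self, v_swing):
    # Charge sharing: a floating victim line rises to roughly
    # Cm / (Cm + Cself) of the aggressor's voltage swing.
    return c_mutual / (c_mutual + c_self) * v_swing

def inductive_crosstalk(l_mutual, di_dt):
    # Inductively induced noise: Lm * di/dt
    return l_mutual * di_dt
```

Below roughly 100 MHz the capacitive term dominates for on-chip CMOS lines, which is why the simpler charge-sharing analysis with extracted capacitances usually suffices there.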

predicts a large surge in current, it could partially power up the appropriate circuit blocks in advance to lower di/dt.

2) Logic Design and Physical Layout: To control power supply noise in addition to decoupling capacitance, the addition of power and ground pins to minimize the effect of Vdd and Vss inductances has already been discussed. The on-die decoupling capacitance should be distributed across the die, and be placed very close to the circuits needing fast charge. If possible, the capacitors should have a clean Vdd/Vss power supply connection. This is to ensure that the capacitor is able to maintain its own nominal value of Vdd/Vss. Circuit block placement could also be optimized to share decoupling capacitance, as long as chip routing is not made worse.

Crosstalk problems are most easily solved using physical layout techniques. In printed circuit boards and in routable packages, good impedance control will minimize crosstalk. This is achieved by placing power planes on either side of the signal lines. The spacing from the lines to the planes should be closer than the line-to-line spacing. This can be achieved by either larger line-to-line spacing or moving the power planes closer to the signal planes.

On the die, this technique is not yet practical because the metal layers are usually used for signal routing and not power planes. One can also minimize the overlap of the signals by routing sensitive signals perpendicular to each other. When this is not possible, one can make the line-to-line spacing wider in long parallel runs.

One final technique is to route power and ground lines between the signal lines to make sure the EM fields couple to the power and ground lines and not the signal lines.

3) Die, Package, and Printed Circuit Board Decoupling Capacitance: Decoupling capacitance should be added on the die, in the package, and on the printed circuit board. The decoupling capacitance is a good local source for fast charge because high frequency current can come from the local capacitor and does not have to come from a more inductive path. The decoupling capacitors are functioning as high frequency filters [3].

[Figure: hierarchy of decoupling capacitors from the die through the package lead to the P.C. board power supply.]
Fig. 15. Decoupling capacitance hierarchy.

In Fig. 15, we demonstrate how the high di/dt demands on the die can be met despite a relatively highly inductive Vdd/Vss connection to the printed circuit board power supply. The on-die decoupling capacitors have to be high performance capacitors. This means that their own intrinsic inductance and resistance must be small so as to not limit the demand for high frequency charge. They will have to supply charge for current demands in the tens of Amps per nanosecond. With this high frequency current shunted through the capacitor, the bondwires now only have to deal with one or two Amps per nanosecond.

Similarly, the on-package decoupling capacitors can supply a few amperes per nanosecond, requiring only perhaps tenths of amperes per nanosecond to flow through the package pins. The process continues to the printed circuit board power supply, where it is now only required to respond in the amperes per hundreds of microseconds range.

The cost per nanofarad of capacitance has to be managed carefully. Small amounts of capacitance are very cheap to build on the die or on the package, but large amounts can get very expensive. An optimum design uses just the right amount of decoupling at each stage to meet noise target goals. Capacitance on the die and in the package costs money, so we need to design with the right amount and not much more. If not enough decoupling capacitance is used, the on-die voltage supply levels will vary too much and there will be yield loss. If the design is done with excessive amounts, the die may need to grow or the package may again become so complex that there will be yield loss.

D. Prognosis

Managing di/dt is a complex task which will become more and more critical as technology advances because it is scaling in the wrong direction. Further development in di/dt management tools is needed to prevent di/dt from becoming a major factor limiting further advances in circuit integration.

VII. SUMMARY AND RECOMMENDATIONS

The need for lower power systems is being driven by many market segments as outlined in Section I. There are several approaches to reducing power. The highest ROI approach is through designing for low power. Unfortunately, designing for low power adds another dimension to the already complex design problem; the design has to be optimized for Power as well as Performance and Area.

Optimizing the three axes necessitates a new class of power conscious CAD tools. The problem is further complicated by the need to optimize the design for power at all design phases. The power savings at the different design phases are, in the best case, multiplicative. For example, a 10% power savings from network optimization together with a 15% power savings from state assignment will yield a total power savings of (1 - 0.9 × 0.85) × 100 = 23.5%. However, more often than not, the total power savings is less, say in this example 20%, since the various optimizations may adversely affect each other. In addition, all the evidence points to the fact that the biggest power savings will be derived from transformations and decisions made early in the design process, i.e., from the high level design phase. However, high level design tools are not as mature as the logic and layout level tools. At all phases of the design process the tools can be classified as either:
- analysis tools,
- optimization tools, or
- libraries and design management tools.
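The best-case multiplicative combination of savings used in the example above can be written as:

```python
def combined_savings(savings):
    """Best-case combination of independent fractional power savings
    applied at different design phases: total = 1 - product(1 - s_i)."""
    remaining = 1.0
    for s in savings:
        remaining *= (1.0 - s)
    return 1.0 - remaining
```

`combined_savings([0.10, 0.15])` reproduces the 23.5% of the text; as noted, real totals are usually lower because the optimizations interact.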

Initially (in the absence of good optimization tools) the analysis tools will have the highest impact, as they will allow the designer to identify areas of the design that need to be optimized for power. In addition, the analysis and power estimation tools will enjoy a large market because of their applicability to different design domains such as CPU, DSP, etc. To ensure designer acceptance and minimum learning time (on the tool), the analysis and optimization tools should, when possible, support power capabilities in a value added way.

The tools' problem will have to be attacked on two fronts. On one front, existing and mature CAD packages (commercial and non-commercial) can be quickly modified to provide basic support for analysis and optimization, e.g., logic simulation and logic synthesis.

On the second front, the research community should embark on longer term, basic and applied, pre-competitive research, to reduce power at all levels of the design, but with a special emphasis on higher level design as described in Sections II and III. Furthermore, it is imperative that the research activity necessarily include work on design and architectures, as well as CAD tools and methodologies, because good high level CAD tools are domain specific. Domain specific design requires tools targeted at specific architectures such as those used in CPU and DSP designs.

The successful development of new power conscious tools and methodologies requires a clear and measurable goal. One realistic and, yet challenging, target is to reduce power by 3x in three years and 5x in 5 years through design and tool development. That is, any power reduction through process scaling or voltage scaling should be above and beyond the 3x and 5x goals. To achieve these goals, it is necessary that a large number of the tools and techniques mentioned in this paper should become widely available (by preference through commercial channels).

To be more specific, in the short term (one year time frame), the following tools are required: logic level power estimation tools (circuit level power estimation is currently available), RTL power estimation, low power logic synthesis, reliability verification and packaging modeling, and standard cell libraries characterized for Power, Performance, and Area. In the longer term (three-year frame), a low power design environment should contain the following components: behavioral and architectural level power estimation, low power behavioral/architectural synthesis, and tools to optimize the micro-architectural description, based on techniques described in Section II.

Achieving these challenging goals will require an active role of the research community, in both the development and proliferation of low power circuit, logic and architectural techniques.

ACKNOWLEDGMENT

The authors gratefully acknowledge the participation of C. Deng of EPIC Design Technology Inc. for his contribution on PowerMill and circuit level power analysis.

REFERENCES

[1] A. Aho and J. Ullman, Principles of Compiler Design. Reading, MA: Addison-Wesley, 1977.
[2] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh, and M. Papaefthymiou, "Precomputation-based sequential logic optimization for low power," in Proc. 1994 Int. Workshop on Low Power Design, pp. 57-62, Apr. 1994.
[3] H. Bakoglu, Circuits, Interconnections and Packaging for VLSI. Reading, MA: Addison-Wesley, 1990.
[4] E. Barke, "Resistance calculation from mask artwork data by finite element method," in Proc. 22nd Design Autom. Conf., P. McCormick, Ed., 1985, pp. 305-311.
[5] Y. Be'ery, S. Berger, and B. Ovadia, "An application specific DSP for portable applications," in Proc. VLSI Signal Proc. Workshop, Veldhoven, The Netherlands, 1993, pp. 48-56.
[6] L. Benini, M. Favalli, and B. Ricco, "Analysis of hazard contribution to power dissipation in CMOS IC's," in Proc. 1994 Int. Workshop on Low Power Design, pp. 27-32, Apr. 1994.
[7] M. Berkelaar and J. Jess, "Gate sizing in MOS digital circuits with linear programming," in Proc. Europ. Design Autom. Conf., pp. 217-221, 1990.
[8] R. K. Brayton and C. McMullen, "The decomposition and factorization of Boolean expressions," in Proc. Int. Symp. on Circ. and Syst., Rome, May 1982, pp. 49-54.
[9] R. Bryant, "Graph-based algorithms for Boolean function manipulation," IEEE Trans. Comput., vol. C-35, pp. 677-691, Aug. 1986.
[10] J. Buck, S. Ha, E. Lee, and D. Messerschmitt, "Ptolemy: A framework for simulating and prototyping heterogeneous systems," Int. J. Comput. Simulation, special issue on Simulation and Software Development.
[11] J. Bunda, D. Fussel, and W. Athas, "Evaluating power implications of CMOS microprocessors," in Proc. Int. Workshop on Low Power Design, Napa Valley, CA, Apr. 1994.
[12] A. R. Burch, F. Najm, P. Yang, and D. Hocevar, "Pattern independent current estimation for reliability analysis of CMOS circuits," in Proc. 25th Design Autom. Conf., pp. 294-299, June 1988.
[13] R. Burch, F. N. Najm, P. Yang, and T. Trick, "A Monte Carlo approach for power estimation," IEEE Trans. Very Large-Scale Integ. Syst., vol. 1, pp. 63-71, Mar. 1993.
[14] T. Callaway and E. Swartzlander, "Optimizing arithmetic elements for signal processing," in Proc. VLSI Signal Proc. Workshop, IEEE Press, pp. 91-100, 1992.
[15] M. Carazao, M. Khalaf, L. Guerra, M. Potkonjak, and J. Rabaey, "Instruction set mapping for performance optimization," in Proc. ICCAD 1993, Santa Clara, CA, Nov. 1993, pp. 518-521.
[16] B. S. Carlson and C.-Y. R. Chen, "Performance enhancement of CMOS VLSI circuits by transistor reordering," in Proc. 30th Design Autom. Conf., pp. 361-366, June 1993.
[17] S. Chakravarty, "On the complexity of using BDD's for the synthesis and analysis of Boolean circuits," in Proc. 27th Annu. Allerton Conf. on Commun., Contr. and Computing, pp. 730-739, 1989.
[18] A. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS design," J. Solid-State Circ., pp. 472-484, Apr. 1992.
[19] A. Chandrakasan, M. Potkonjak, J. Rabaey, and R. W. Brodersen, "HYPER-LP: A system for power minimization using architectural transformations," in Proc. ICCAD 1992.
[20] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. W. Brodersen, "Optimizing power using transformations," to be published.
[21] A. Chandrakasan, R. Allmon, A. Stratakos, and R. Brodersen, "Design of portable systems," in Proc. CICC Conf., May 1994.
[22] K.-Y. Chao and D. F. Wong, "Low power considerations in floorplan design," in Proc. 1994 Int. Workshop on Low Power Design, pp. 45-50, Apr. 1994.
[23] K. Chaudhary and M. Pedram, "A near-optimal algorithm for technology mapping minimizing area under delay constraints," in Proc. 29th Design Autom. Conf., June 1992.
[24] T. Y. Chou, J. Cosentino, and Z. Cendes, "High-speed interconnect modeling and high-accuracy simulation using SPICE and finite element methods," in Proc. 30th ACM/IEEE Design Autom. Conf., June 1993.
[25] L. O. Chua and P. M. Lin, Computer-Aided Analysis: Algorithms and Computational Techniques. Englewood Cliffs, NJ: Prentice-Hall.

SINGH et al.: CAD TOOLS AND METHODOLOGIES: A PERSPECTIVE 591

[26] J. Cong, C.-K. Koh, and K.-S. Leung, "Wire sizing with driver sizing for performance and power optimization," in Proc. 1994 Int. Workshop on Low Power Design, pp. 81-86, Apr. 1994.
[27] R. L. M. Dang and N. Shigyo, "Coupling capacitances for 2-dimensional wires," IEEE Electro XXX.
[28] "Dataquest: The Green PC Revolution," Focus Rep., Oct. 11, 1993.
[29] A. C. Deng, Y.-C. Shiau, and K.-H. Loh, "Time domain current waveform simulation of CMOS circuits," in IEEE Trans. Comp.-Aided Design, pp. 208-211, Nov. 1988.
[30] C. Deng, "Power analysis for CMOS/BiCMOS circuits," in Proc. 1994 Int. Workshop on Low Power Design, pp. 3-8, Apr. 1994.
[31] P. Duncan, S. Swamy, and R. Jain, "Low-power DSP circuit design using retimed maximally parallel architectures," Proc. 1st Symp. on Integ. Syst., pp. 266-275, Mar. 1993.
[32] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco, "Estimate of signal probability in combinational logic networks," in 1st Europe. Test Conf., pp. 132-138, 1989.
[33] C. M. Fiduccia and R. M. Mattheyses, "A linear-time heuristic for improving network partitions," in Proc. 19th Design Autom. Conf., pp. 175-181, 1982.
[34] J. P. Fishburn and A. E. Dunlop, "TILOS: A posynomial programming approach to transistor sizing," in Proc. IEEE Int. Conf. on Comp.-Aided Design, pp. 326-328, Nov. 1985.
[35] T. A. Fjeldly and M. Shur, "Threshold voltage modeling and the subthreshold regime of operation of short-channel MOSFET's," IEEE Trans. Electron Devices, vol. 40, pp. 137-145, Jan. 1993.
[36] B. J. George, D. Gossain, S. C. Tyler, M. G. Wloka, and G. K. H. Yeap, "Power analysis and characterization for semi-custom design," in Proc. 1994 Int. Workshop on Low Power Design, pp. 215-218, Apr. 1994.
[37] W. Geurts, F. Catthoor, and H. De Man, "Heuristic techniques for the synthesis of complex functional units," Proc. 4th ACM/IEEE EDAC Conf., Paris, pp. 552-556, Feb. 1993.
[38] A. A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switching activity in combinational and sequential circuits," in Proc. 29th Design Automation Conf., pp. 253-259, June 1992.
[39] L. H. Goldstein, "Controllability/observability of digital circuits," IEEE Trans. Circ. and Syst., vol. 26, pp. 685-693, Sept. 1979.
[40] R. Hartley and A. Casavant, "Tree-height minimization in pipelined architectures," IEEE Trans. Comp.-Aided Design, vol. 8, pp. 112-115, 1989.
[41] R. I. Hartley, "Optimization of canonic signed digit multipliers for filter design," Proc. IEEE Int. Symp. Circ. and Syst., Singapore, June 1991.
[42] C. Hu, S. C. Tam, F. C. Hsu, P. K. Ko, T. Y. Chan, and K. W. Terrill, "Hot-electron induced MOSFET degradation-Model, monitor, and improvement," IEEE Trans. Electron Devices, vol. ED-32, p. 375, 1985.
[43] D. A. Huffman, "A method for the construction of minimum redundancy codes," in Proc. IRE, vol. 40, pp. 1098-1101, Sept. 1952.
[44] S. Iman and M. Pedram, "Multi-level network optimization for low power," in Proc. IEEE Int. Conf. on Comp. Aided Design, Nov. 1994.
[45] R. Jain et al., "Module Selection for Pipelined Synthesis," Proc. Design Automation Conf., Anaheim, pp. 542-547, 1988.
[46] J. M. Janssen, F. Catthoor, and H. De Man, "A specification invariant technique for operation cost minimisation in flow-graphs," Proc. 7th Int. Workshop on High-Level Synthesis, Niagara-on-the-Lake, Canada, May 1994.
[47] S. M. Kang, "Accurate simulation of power dissipation in VLSI circuits," IEEE J. Solid-State Circ., vol. 21, pp. 889-891, Oct. 1986.
[48] K. Keutzer, "DAGON: Technology mapping and local optimization," in Proc. Design Automation Conf., pp. 341-347, June 1987.
[49] F. J. Kurdahi and A. C. Parker, "Techniques and area estimation of VLSI layouts," IEEE Trans. Comp.-Aided Design, vol. 8, pp. xxx, Jan. 19xx.
[50] P. Landman and J. Rabaey, "Power estimation for high level synthesis," Proc. EDAC-EUROASIC '93, Paris, France, pp. 361-366, Feb. 1993.
[51] -, "Black-box capacitance models for architectural power analysis," Int. Workshop on Low Power Design, Napa Valley, CA, Apr. 1994.
[52] D. Lanneer, M. Cornero, G. Goossens, and H. De Man, "Data routing: a paradigm for efficient data path synthesis and code generation," Proc. 7th ACM/IEEE Int. Symp. on High Level Synthesis, Niagara-on-the-Lake, Canada, May 1994.
[53] L. Larmore and D. S. Hirschberg, "A fast algorithm for optimal length-limited Huffman codes," J. Assoc. for Computing Mach., vol. 37, no. 3, pp. 464-473, 1990.
[54] C. Leiserson, F. Rose, and J. Saxe, "Optimizing synchronous circuits by retiming," in Proc. Third Conf. VLSI, pp. 23-26, 1983.
[55] J. C. Liao, O. A. Palusinski et al., "University of Arizona Capacitance Calculator," Users Guide, June 1986.
[56] D. Lidsky and J. Rabaey, "Low power design of memory intensive applications-Case study: vector quantization," Proc. 1994 Symp. on Low Power Electron., San Diego, CA, 1994.
[57] B. Lin and H. de Man, "Low-power driven technology mapping under timing constraints," in Int. Workshop on Logic Synthesis, pp. 9a.1-9a.16, Apr. 1993.
[58] B. Lin and A. R. Newton, "Synthesis of multiple-level logic from symbolic high-level description languages," in IFIP Int. Conf. on Very Large Scale Integration, pp. 187-196, Aug. 1989.
[59] P. Lippens, J. van Meerbergen, W. Verhaegh, and A. van der Werf, "Allocation of multiport memories for hierarchical datastreams," Proc. IEEE Int. Conf. Comp. Aided Design, Santa Clara, CA, Nov. 1993.
[60] F. Maloberti and G. Torelli, "On the design of CMOS digital output drivers with controlled dI/dt," IEEE Int. Symp. on Circ. and Syst., vol. 4, p. 2236, 1991.
[61] R. Marculescu, D. Marculescu, and M. Pedram, "Logic level power estimation considering spatiotemporal correlations," in Proc. IEEE Int. Conf. on Computer Aided Design, Nov. 1994.
[62] A. Masaki, "Deep-submicron CMOS warms up to high-speed logic," Circuits and Devices, Nov. 1992.
[63] A. Masaki and T. Chiba, "Design aspects of VLSI for computer logic," IEEE Trans. Electron Devices, Apr. 1982.
[64] R. Mehra and J. Rabaey, "High level power estimation and exploration," Proc. Int. Low Power Workshop, Napa Valley, CA, Apr. 1994.
[65] J. Monteiro, S. Devadas, and A. Ghosh, "Retiming sequential circuits for low power," in Proc. IEEE Int. Conf. on Comp. Aided Design, pp. 398-402, Nov. 1993.
[66] -, "Estimation of switching activity in sequential logic circuits with applications to synthesis for low power," in Proc. 31st Design Autom. Conf., pp. xx, June 1994.
[67] J. Monteiro, S. Devadas, B. Lin, C.-Y. Tsui, M. Pedram, and A. M. Despain, "Exact and approximate methods of switching activity estimation in sequential logic circuits," in Proc. 1994 Int. Workshop on Low Power Design, pp. 117-122, Apr. 1994.
[68] T. Mozdzen and B. Bhattacharya, "Electrical design rules for using SiO2 gate oxide as decoupling capacitors," Intel Internal Rep., Nov. 1992.
[69] T. Mozdzen, "The effect of on-die Vcc/Vss I*R voltage drop upon average and dI/dt," Intel Internal Rep., May 1994.
[70] C. Nagendra et al., "A comparison of power-delay characteristics of CMOS adders," Proc. Int. Workshop on Low Power Design, Napa Valley, CA, Apr. 1994.
[71] F. Najm and R. Burch, "CREST-A current estimator for CMOS circuits," Proc. 25th DAC, pp. 204-207, June 1988.
[72] F. Najm, "Transition density, a stochastic measure of activity in digital circuits," in Proc. 28th Design Autom. Conf., pp. 644-649, June 1991.
[73] F. N. Najm, R. Burch, P. Yang, and I. Hajj, "Probabilistic simulation for reliability analysis of CMOS VLSI circuits," in IEEE Trans. Comp.-Aided Design of Integ. Circ. and Syst., vol. 9, pp. 439-450, Apr. 1990.
[74] S. Y. Oh, K. J. Chang, N. Chang, and K. Lee, "Interconnect modeling and design in high-speed VLSI/ULSI systems," Proc. 29th ACM/IEEE DAC, pp. 184-189, June 1992.
[75] K. P. Parker and J. McCluskey, "Probabilistic treatment of general combinational networks," IEEE Trans. Computers, vol. C-24, pp. 668-670, June 1975.
[76] M. Pedram and B. Preas, "Accurate prediction of physical design characteristics for random logic," 1989 IEEE ICCD Conf., Boston, pp. 100-108, 1989.

592 PROCEEDINGS OF THE IEEE, VOL. 83, NO. 4, APRIL 1995

[77] L. T. Pillage and R. A. Rohrer, "Asymptotic waveform evaluation for timing analysis," IEEE Trans. Comp.-Aided Design, vol. 9, pp. 352-366, Apr. 1990.
[78] M. Potkonjak and J. Rabaey, "Optimizing resource utilization using transformations," IEEE Trans. Comp.-Aided Design, vol. 13, pp. 277-292, Mar. 1994.
[79] S. R. Powell and P. M. Chau, "Estimating power dissipation of VLSI signal processing chips: The PFA technique," VLSI Signal Processing IV, pp. 250-259, 1990.
[80] J. Rabaey, Digital Integrated Circuits: A Design Perspective. Englewood Cliffs, NJ: Prentice-Hall, 1995; currently available as EE141 Course Reader, U.C. Berkeley.
[81] -, "Low power design methodologies," in Circuits and Systems Tutorials. Oxford, UK: LTP Electronics Ltd., pp. 373-386, June 1994.
[82] J. Rabaey, C. Chu, P. Hoang, and M. Potkonjak, "Fast prototyping of data path intensive architectures," IEEE Design and Test, vol. 8, pp. 40-51, 1991.
[83] J. Rabaey and M. Potkonjak, "Complexity estimation for real time application specific circuits," Proc. IEEE ESSCIRC Conf., Milan, Italy, 1991.
[84] S. Rajaram, "Electrical characterization of interconnections," The Electrical Engineering Handbook, CRC Press, p. 499, 1993.
[85] S. Rajgopal and G. Mehta, "Experiences with simulation-based schematic level current estimation," in Proc. 1994 Int. Workshop on Low Power Design, pp. 9-14, Apr. 1994.
[86] J. Rajski and J. Vasudevamurthy, "The testability-preserving concurrent decomposition and factorization of Boolean expressions," IEEE Trans. Comp.-Aided Design of Integ. Circ. and Syst., vol. 11, pp. 778-793, June 1993.
[87] D. Rao and F. Kurdahi, "Partitioning by regularity extraction," Proc. 29th ACM/IEEE DAC, pp. 235-238, 1992.
[88] N. Raver, "Modeling CMOS local drop-point voltage dependencies," IBM Res. Rep., vol. RC 17269, no. 76388, Oct. 1991.
[89] K. Roy and S. C. Prasad, "Circuit activity based logic synthesis for low power reliable operations," IEEE Trans. Very Large-Scale Integ. Syst., vol. 1, pp. 503-513, Dec. 1993.
[90] T. Sakurai and K. Tamaru, "Simple formulas for 2- and 3-dimensional capacitances," IEEE Trans. Electron Devices, vol. ED-30, pp. 183-185, Feb. 1983.
[91] H. Savoj, R. K. Brayton, and H. J. Touati, "Extracting local don't cares for network optimization," in Proc. IEEE Int. Conf. on Comp. Aided Design, pp. 514-517, Nov. 1991.
[92] P. Schneider and U. Schlichtmann, "Decomposition of Boolean functions for low power based on a new power estimation technique," in Proc. 1994 Int. Workshop on Low Power Design, pp. 123-128, 1994.
[93] J. Schutz, "A 3.3V 0.6um BiCMOS superscalar microprocessor," ISSCC Dig. Tech. Papers, pp. 202-203, Feb. 1994.
[94] W. S. Scott and J. K. Ousterhout, "Magic's circuit extractor," Proc. 22nd DAC, pp. 286-292, 1985.
[95] N. Sehgal, C. Chen, and J. Acken, "An object-oriented cell library manager," ICCAD, Nov. 1994.
[96] S. C. Seth, L. Pan, and V. Agrawal, "PREDICT-Probabilistic estimation of digital circuit testability," in Proc. Fault Tolerant Comp. Symp., pp. 220-225, June 1985.
[97] A. A. Shen, A. Ghosh, S. Devadas, and K. Keutzer, "On average power dissipation and random pattern testability of CMOS combinational logic networks," in Proc. IEEE Int. Conf. on Computer Aided Design, Nov. 1992.
[98] S. Sheng, A. Chandrakasan, and R. Brodersen, "A portable multimedia terminal," IEEE Commun. Magazine, pp. 64-75, Dec. 1992.
[99] C. S. Shung et al., "An integrated CAD system for algorithm-specific IC design," IEEE Trans. CAD Integ. Circ. and Syst., Apr. 1991.
[100] D. Stark and M. Horowitz, "REDS: Resistance extractor for digital simulation," Proc. 24th DAC, pp. 570-573, 1987.
[101] A. Stratakos, R. Brodersen, and S. Sanders, "High-efficiency low-voltage DC-DC conversion for portable applications," Proc. Int. Low Power Workshop, Napa Valley, CA, Apr. 1994.
[102] S. L. Su, V. B. Rao, and T. N. Trick, "HPEX: A hierarchical parasitic circuit extractor," Proc. 24th DAC, pp. 566-569, 1987.
[103] C. Su, C. Tsui, and A. Despain, "Low power architecture design and compilation techniques for high performance processors," Proc.
[104] C. Svensson and D. Liu, "A power estimation tool and prospects of power savings in CMOS VLSI chips," Proc. Int. Workshop on Low Power Design, Napa Valley, CA, Apr. 1994.
[105] V. Tiwari, P. Ashar, and S. Malik, "Technology mapping for low power," in Proc. 30th Design Autom. Conf., pp. 74-79, June 1993.
[106] E. Tsern, T. Meng, and A. Hung, "Video compression for portable communication using pyramid vector quantization of subband coefficients," Proc. IEEE Workshop on VLSI Signal Processing, Veldhoven, pp. 444-452, Oct. 1993.
[107] C.-Y. Tsui, M. Pedram, C.-A. Chen, and A. M. Despain, "Low power state assignment targeting two- and multi-level logic implementations," in Proc. IEEE Int. Conf. on Comp. Aided Design, Nov. 1994.
[108] C.-Y. Tsui, M. Pedram, and A. Despain, "Efficient estimation of dynamic power dissipation under a real delay model," in Proc. IEEE Int. Conf. on Comp. Aided Design, pp. 224-228, Nov. 1993.
[109] -, "Technology decomposition and mapping targeting low power dissipation," in Proc. 30th Design Automation Conf., pp. 68-73, June 1993.
[110] -, "Power efficient technology decomposition and mapping under an extended power consumption model," IEEE Trans. Comp.-Aided Design of Integ. Circ. and Syst., vol. 13, pp. xx, Sept. 1994.
[111] -, "Exact and approximate methods for calculating signal and transition probabilities in FSM's," in Proc. 31st Design Automation Conf., June 1994.
[112] A. Tyagi, "Hercules: A power analyzer of MOS VLSI circuits," in Proc. IEEE Int. Conf. on Computer Aided Design, pp. 530-533, Nov. 1987.
[113] J. Ullman, Computational Aspects of VLSI. Rockville, MD: Computer Science Press, 1984.
[114] H. Vaishnav and M. Pedram, "Pcube: A performance driven placement algorithm for low power designs," in Proc. Europe. Design Autom. Conf., Sept. 1993.
[115] N. P. Van Der Meijs et al., "An efficient finite element method for submicron IC capacitance extraction," Proc. 26th DAC, pp. 678-681, June 1989.
[116] P. Van Oostende, P. Six, J. Vandewalle, and H. De Man, "Estimation of typical power of synchronous CMOS circuits using a hierarchy of simulators," IEEE J. Solid-State Circ., vol. 28, pp. 26-39, Jan. 1993.
[117] H. J. M. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," IEEE J. Solid-State Circ., vol. 19, pp. 468-473, Aug. 1984.
[118] H. Vermassen, "Mathematical models for the complexity of VLSI," M.S. thesis, Katholieke Universiteit, Leuven, Belgium, 1986.
[119] T. Villa and A. Sangiovanni-Vincentelli, "NOVA: State assignment of finite state machines for optimal two-level logic implementations," IEEE Trans. Comp.-Aided Design of Integ. Circ. and Syst., vol. 9, pp. 905-924, Sept. 1990.
[120] M. Van Swaaij, F. Franssen, F. Catthoor, and H. De Man, "Automating high-level control flow transformations for DSP memory management," Proc. IEEE Workshop on VLSI Signal Processing, Napa Valley, CA, Oct. 1992. Also in VLSI Signal Processing V, K. Yao, R. Jain, W. Przytula, and J. Rabaey, Eds. New York: IEEE Press, pp. 397-406, 1992.
[121] R. A. Walker and R. Camposano, A Survey of High-Level Synthesis Systems. Boston: Kluwer, 1992.
[122] S. Wu, "A generic description of hardware libraries," M.S. thesis, Univ. Calif., Berkeley, May 1994.
[123] S. Wuytack, F. Catthoor, F. Franssen, L. Nachtergaele, and H. De Man, "Global communication and memory optimizing transformations for low power systems," Int. Workshop on Low Power Design, Napa Valley, CA, Apr. 1994.
[124] S. Yang, "Logic synthesis and optimization benchmarks user guide," Microelectronics Center of North Carolina, Research Triangle Park, NC, 1991.
[125] T. Yoshitome, "Hierarchical analyzer for VLSI power supply networks based on a new reduction method," Proc. ICCAD, pp. 298-301, 1991.
[126] Q. Zhu, J. G. Xi, W. W.-M. Dai, and R. Shukla, "Low power clock distribution based on area pad interconnect for multichip modules," in Proc. 1994 Int. Workshop on Low Power Design, pp. 87-92, Apr. 1994.

Deo Singh (Member, IEEE) received Diplomas in electrical engineering from Middlesex Polytechnic and Twickenham Polytechnic, in 1976 and 1977, and the M.Sc.-CAD degree from Kingston Polytechnic in 1981.
In 1977 he joined GEC Semiconductors, Melbourne, FL, where he was responsible for development and support of CAD tools. In 1981 he joined Harris Semiconductors in Melbourne, FL, and worked on logic simulation and cellular/hierarchical design rule checking. From 1983 to 1987 he was Harris' liaison to the MCC CAD program in Austin, TX. From 1988 to 1991 he managed a European Commission multinational R&D CAD tools project (SPRITE) in Belgium. The project developed two prototype high-level synthesis systems targeted at application-specific architectural compilation for video, HDTV, and digital communications. He is presently Principal Engineer and Manager of the Low Power Design Technologies (LPDT) group in Intel Corporation, Santa Clara, CA, where he is responsible for reducing power in Intel microprocessors through the development and proliferation of low-power design methodologies (Tools, Methodologies, and Design Techniques).

Jan M. Rabaey (Fellow, IEEE) received the E.E. and Ph.D. degrees in applied sciences from the Katholieke Universiteit, Leuven, Belgium, in 1978 and 1983, respectively.
From 1983 to 1985 he was a Visiting Research Engineer with the University of California at Berkeley. From 1985 to 1987 he directed the Architectural and Algorithmic Strategies Group of the Design Methodologies for the VLSI System Division at IMEC, Belgium. Since 1987 he has been with the University of California at Berkeley's Electrical Engineering and Computer Science Department, where he is now a Professor. He was a Visiting Professor at the University of Pavia, Italy, in 1991. His current research interests include computer-aided analysis and automated design of digital signal processing circuits, and architectural synthesis. He has authored or co-authored numerous technical papers in the area of signal processing and design automation. He was an Associate Editor of the IEEE Journal of Solid-State Circuits.
Dr. Rabaey received a number of awards, including the 1985 IEEE Transactions on Computer-Aided Design Best Paper Award from the Circuits and Systems Society, the 1989 Presidential Young Investigator Award, and the 1994 Signal Processing Society Senior Award. He is the current Chair of the VLSI Signal Processing Technical Committee of the Signal Processing Society. He is also a member of the Design Automation Conference's Executive Committee.

Massoud Pedram (Member, IEEE) received the B.S. degree in electrical engineering from the California Institute of Technology in 1986 and the M.S. and Ph.D. degrees in electrical engineering and computer sciences from the University of California at Berkeley.
He is Assistant Professor of Electrical Engineering-Systems at the University of Southern California. His research interests and contributions have been in logic synthesis, physical design, and CAD for low power design.
Dr. Pedram received the Best Paper Award in the CAD track in the IEEE International Conference on Computer Design in 1989, and the National Science Foundation's Research Initiation and Young Investigator Awards in 1992 and 1994, respectively. He has served on the program committees for several conferences and workshops (including the Design Automation Conference) and is the cofounder of and General Chair of the International Workshop on Low Power Design.

Francky Catthoor (Member, IEEE) received the engineering degree and the Ph.D. degree in electrical engineering from the Katholieke Universiteit Leuven, Belgium, in 1982 and 1987, respectively.
From 1983 to 1987 he was a Researcher in the VLSI Design Methodologies for Signal Processing at IMEC, Heverlee, Belgium. Since 1987 he has headed research domains in the area of architectural and synthesis methodologies, within the VSDM division at IMEC. His current research activities mainly belong to the field of application-specific architecture design methods and system-level transformations intended for real-time signal processing algorithms in image, video, and telecom applications.
Dr. Catthoor received the Young Scientist Award from the Marconi International Fellowship in 1986.

Suresh Rajgopal (Member, IEEE) received the B.S. degree in computer science and engineering from the Indian Institute of Technology, Kharagpur, in 1985. He received the M.S. in computer science from the University of Tennessee, Knoxville, in 1987 and the Ph.D. in computer science from the University of North Carolina, Chapel Hill, in 1992.
He is currently a Senior CAD Engineer in the Low Power Design Technology Group at Intel Corporation, Santa Clara, CA.

Naresh Sehgal (Member, IEEE) received the B.E. degree (hons) from Panjab University, India, in 1984, and the M.S. and Ph.D. degrees in computer engineering from Syracuse University in 1988 and 1994, respectively.
He has been with Intel's Design Technology division since 1988, and currently manages a full-chip assembly place and route tool team. His research interests are in object-oriented databases and layout algorithms.

Thomas J. Mozdzen (Member, IEEE) received the B.S. in physics and the M.S.E.E. from the University of Illinois at Urbana in 1978 and 1980, respectively. He received the M.S. in physics from the University of Texas at Dallas in 1985.
He joined Mostek in 1979 as a Process Characterization Engineer and worked on thin oxide integrity, hot electron effects, and the modeling of failure mechanisms. He joined the DRAM group in 1984 and codesigned a 256K video DRAM. During 1986-1988 he was with Siemens, Munich, Germany, where he worked on the design of a 4-MEG DRAM and a digital television picture processor. In 1988 he joined the ASIC design group at Intel, where he was the design architect of the Intel ASIC 1.0-um Standard Cell Library. His current area of focus is on dI/dt management across the die, package, and board.
Mr. Mozdzen is a member of the American Physical Society.
