
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

Analysis and Comparison in the Energy-Delay-Area


Domain of Nanometer CMOS Flip-Flops:
Part I—Methodology and Design Strategies
Massimo Alioto, Senior Member, IEEE, Elio Consoli, and Gaetano Palumbo, Fellow, IEEE

Abstract—In this paper (split into Parts I and II), an extensive comparison of existing flip-flop (FF) classes and topologies is carried out. In contrast to previous works, the analysis explicitly accounts for effects that arise in nanometer technologies and affect the energy-delay-area tradeoff (e.g., leakage and the impact of layout and interconnects). Compared to previous papers on FF comparison, the analysis involves a significantly wider range of FF classes and topologies. In particular, this Part I reports the comparison strategy, which includes the simulation setup, the energy-delay estimation methodology, and an overview of an optimum design strategy, together with the introduction of the analyzed FF classes and topologies.

Index Terms—Clocking, energy-delay tradeoff, energy efficiency, flip-flops (FFs), high speed, interconnects, leakage, Logical Effort, low power, nanometer technologies, VLSI.

Manuscript received September 10, 2009; revised December 21, 2009.
M. Alioto is with the Dipartimento di Ingegneria dell’Informazione (DII), Università di Siena, 53100 Siena, Italy and also with the Berkeley Wireless Research Center—Electrical Engineering and Computer Science Department, University of California, Berkeley, CA 94704-1302 USA (e-mail: malioto@dii.unisi.it; alioto@eecs.berkeley.edu).
E. Consoli and G. Palumbo are with the Dipartimento di Ingegneria Elettrica, Elettronica e dei Sistemi (DIEES), Università di Catania, I-95125 Catania, Italy (e-mail: econsoli@diees.unict.it; gpalumbo@diees.unict.it).
Digital Object Identifier 10.1109/TVLSI.2010.2041376
1063-8210/$26.00 © 2010 IEEE

I. INTRODUCTION

THE appropriate choice of flip-flop (FF) topologies is of fundamental importance in the design of VLSI integrated circuits and, in particular, of both high-speed and low-energy microprocessors [1]. Indeed, FFs affect the clock frequency, since their delay occupies a significant fraction of the clock cycle, especially in fast micro-architectures with low logic depth [2]. Moreover, together with the circuits devoted to clock generation and distribution, FFs are part of the clock network, which is responsible for 30%–50% of the whole chip energy budget [3], [4].

Various classes of FFs have been proposed to achieve a desired energy-delay ($E$–$D$) tradeoff, depending on the features of the application (high speed, low energy, low standby energy, etc.). Understanding the suitability of FFs for a given application is difficult and so is their selection, since it involves a large number of existing topologies and depends on transistor sizing. In particular, an appropriate sizing methodology is necessary to get reliable results usable in practical designs.

So far, various analyses have been carried out, each focusing on aspects pertinent to FF comparison [1], [2], [4]–[11]. Among these works, [4]–[6] are the most thorough in terms of adopted figures of merit and evaluated parameters. However, previous comparisons exhibit (some of) the following shortcomings.
• They involve a limited number of topologies and/or do not cover the entire spectrum of applications and constraints that are observed in real designs (therefore, no uniform comparison is available for the wide range of existing FF classes).
• The area-delay and leakage-delay tradeoffs are usually not considered at all.
• Circuit designers are typically accustomed to thinking in terms of minimum energy-delay products ($ED$ or $ED^2$). Instead, a fair comparison should take into account the FF behavior over the whole $E$–$D$ space [6], [12]–[14].
• The FF input capacitance $C_{in}$ is typically assumed fixed or at most swept as a parameter in a narrow range, whose extent is selected in a naïve manner. Hence, it is not clear whether the adopted ranges cover the regions of the $E$–$D$ space involved in real applications. Moreover, this naïve choice does not allow each value of $C_{in}$ to be associated with a well-defined point in the $E$–$D$ space.
• Till now, the most significant FF analyses in the literature have not adopted sub-100-nm technologies, thereby neglecting:
  • the leakage influence in active and in standby modes;
  • the impact of layout parasitics associated with interconnects, which degrade both speed and energy.

In this paper (split into Parts I and II), a novel analysis and comparison strategy is proposed, which suitably accounts for all the previously mentioned aspects to achieve fair and meaningful results. This strategy is applied to compare a large number of FF classes (4) and topologies (19) in a 65-nm CMOS technology. In particular, we show the following.
1) The comparison is carried out by including the parasitics of local wires within the optimization of the transistor sizes. To limit the number of real layouts, wire parasitics are extracted from some reference layouts and are estimated for different transistor sizes from the analysis of stick diagrams.
2) Leakage is separately evaluated and its impact is analyzed in both active and standby mode.
3) The $E$–$D$ space is explored by considering the points where the $E^iD^j$ products are minimized ($i$ and $j$ are widely varied to cover this space). Accordingly, every design is associated with a point in the $E$–$D$ space that has a clear

Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 30,2010 at 02:23:17 EDT from IEEE Xplore. Restrictions apply.

meaning, which links results to the hardware intensity concept in [15]. This allows for gaining a deeper insight into the $E$–$D$ tradeoff.
4) $C_{in}$ is a design variable, which allows the potential of each topology to be further explored in the minimization of different figures of merit. This differs from [6], where separate energy-efficient curves (EECs) were extracted under very few (three) different parametric values.
5) In addition to the thorough investigation of the $E$–$D$ tradeoff, the interdependence of several other circuit parameters is analyzed, including leakage, silicon area and clock load. To this aim, appropriate figures of merit to rank the considered FF classes and topologies are introduced.

This paper is divided into two parts: Part I presents the comparison methodology, the design strategies and the FF topologies to be analyzed, whereas the results are reported and commented on in detail in Part II [42].

Part I is structured as follows. Section II describes the simulation and analysis setup. The $E$–$D$ space analysis and the related topics are in Section III. The design-optimization strategies for FFs are discussed in Section IV. Section V presents the selected FF classes and topologies. Finally, Section VI reports a brief conclusion. This paper also includes an Appendix on the accurate evaluation of the transient energy dissipation.

II. SIMULATION SETUP, ENERGY-DELAY ESTIMATION AND ANALYSIS OF LAYOUT ISSUES

A. Test Bench: Driving Circuits and Output Load

Fig. 1 shows the setup used to test a generic FF, which is similar to that proposed in [1], [2] but with some differences. The clock signal fed to the FF comes from a two-stage buffer, sized to attain a typical slope [1], [2] at the clock input node of the FF (the reference being the slope of the output waveform of an inverter loaded by inverters of the same size [16]). Hence, the size of the clock-driving inverter close to the FF is set to get an electrical effort equal to 3 [16]. When evaluating the FF input capacitance seen from the clock terminal, both the transistor and the local (i.e., internal to the FF) interconnect capacitances are taken into account.

Fig. 1. Test bench circuit used to characterize a generic FF.

As concerns the FF data input signal, we follow a different approach with respect to [1], [2], where a constant-slope policy was again adopted for simplicity. Indeed, in real pipelines, the speed of the logic block driving the FF data input is obviously comparable to the FF speed. Accordingly, the size of the data-driving inverter close to the FF is set so that the slope of the FF data input signal (see Fig. 1) is equal to the slope at the output of the first FF stage driven by that signal. The latter slope is estimated by resorting to the Logical Effort (LE) model [16]. Indeed, during the exploration of the design space, the sizes of all FF transistors are known and the LE model can be applied (also including the layout parasitics).

In the case of circuits that are driven by complementary clock (e.g., master-slave FFs) or data (e.g., differential FFs) signals, both polarities are generated through buffers that are considered as external to the FF, in order to avoid a penalty with respect to other circuits [8]. Moreover, we assume that the comparison of inverting and non-inverting FFs does not require further arrangements and that neither of them is presumptively better than the other [8].

Finally, the output load is swept to test the FF response under light, moderate and heavy loading conditions [8]. Typical reasonable loads are expressed in terms of the input capacitance of a symmetrical minimum inverter. Greater loads are not considered since, according to LE, they usually require the insertion of a buffer at the FF output, which alters the intrinsic energy-delay features of the FF [8]. Observe that the first loading inverter in Fig. 1, which loads the FF output, is in turn loaded by another inverter, which is 4 times wider to avoid an unrealistically strong Miller effect in the gate-drain capacitances at the FF output.

B. Definition of Speed and Timing Parameters

The timing parameters characterizing a FF are well known and are accurately described in [2]. They are as follows:
1) the minimum data-to-output delay, which is obtained by selecting the optimum data-to-clock delay;
2) the setup time, which is the optimum data-to-clock delay that leads to the minimum data-to-output delay;
3) the minimum clock-to-output delay, occurring when the data input transition occurs well before the clock transition;
4) the hold time, which is the clock-to-data delay that leads to a 5% increment of the clock-to-output delay with respect to its minimum value.

In the analysis of the FF behavior within the $E$–$D$ space, the speed is quantified through the minimum achievable data-to-output delay, according to [6]. Indeed, this delay represents the FF timing contribution to the cycle time when the FF is placed in a critical path [5]. Moreover, every delay is evaluated by considering the greatest among all the possible data-to-output paths [namely, two paths for the single-edge-triggered (SET) FFs and four paths for the dual-edge-triggered (DET) FFs]. On the other hand, FFs lying in fast paths do not affect system speed. Anyhow, data races


must be avoided and the race immunity

$$R = t_{CQ,\min} - t_{hold}$$

(i.e., the minimum clock-to-output delay minus the hold time) is the parameter that defines the FF sensitivity to races [2].

As regards the setup and hold times, FFs can be subdivided into the following two main categories:
1) FFs where the setup (hold) time has positive (negative) values, such as the master-slave FFs. They always have a positive race immunity and hence are not prone to data race problems, although they do not allow time borrowing because of their positive setup time [2], [5].
2) FFs where the setup (hold) time has negative (positive) values, such as the Pulsed FFs. They feature an inherent tradeoff related to the duration of their transparency window. Indeed, by enlarging the width of the clock pulse, their soft-clock-edge and time-borrowing properties are improved thanks to an increasingly negative setup time [5], but their race immunity diminishes because the hold time increases [2].

In the first case, the setup and hold times are inherently related to the minimum data-to-output delay. Hence, the only independent figure of merit concerning FF timing is the minimum data-to-output delay. In the second case, the setup and hold times can be arranged regardless of the minimum data-to-output delay. However, this tradeoff depends only on the specific requirements at the micro-architectural level and can be freely regulated through the pulse width. Hence, also in this case, the only real figure of merit concerning FF timing is the minimum data-to-output delay.

C. Estimation of Energy Dissipation

The FF energy dissipation is made up of transient (i.e., dynamic and short-circuit) and static (i.e., leakage) contributions [3].

Transient energy depends on the data input switching activity [9], which is set to the typical values 0.10, 0.25, and 0.5. The evaluation of transient dissipation has been discussed in detail in [2]. However, we feel that some modifications are required to evaluate this contribution more properly. Details about our estimation methodology are reported in the Appendix, where the average transient energy in a clock cycle is evaluated as a function of the switching activity.

In deep submicrometer technologies, the impact of leakage has to be considered not only in standby mode but also in active mode, as it is a sizeable fraction of the chip consumption [18], [19]. For this reason, it is evaluated separately from the transient contribution, as discussed in the following. An average leakage current is estimated by averaging the 8 possible values of the total FF leakage current according to the different steady states of the FF terminals. Each of these currents is identified by three subscripts, which stand for the clock, data and output static values (0 or 1). In order to correctly account for the gate leakage, we also do the following:
• add the leakage contribution at the data and clock inputs when they are at the high logical level (because this current is drawn by the FF under analysis);
• subtract leakage currents that flow from the output when it is logically high (because they are drawn by the load).

Thus, the average static dissipation in a clock cycle is

$$E_{leak} = V_{DD} \cdot I_{leak,avg} \cdot T_{CK} \qquad (1)$$

from which it is apparent that the value of the clock period $T_{CK}$ has to be explicitly set to consistently add and compare the leakage and the transient energy. Equivalently, the impact of leakage in active mode depends on the choices made at the micro-architectural level, which set $T_{CK}$ for a given technology. Actually, to express the impact of micro-architecture regardless of the adopted technology, it is more convenient to refer to the logic depth $T_{CK}/\mathrm{FO4}$ instead of the absolute clock cycle (i.e., the equivalent number of cascaded stages with optimum stage effort equal to 4 [16]). Typical values of $T_{CK}/\mathrm{FO4}$ are equal to 10, 40, and 80 for high-performance, energy-efficient and low-energy microprocessors, respectively [14], [20].

To fairly compare DET and SET FFs, they are analyzed by assuming the same throughput, i.e., by assuming that the clock cycle of the former is half that of the latter ones (see also the Appendix for details).

In regard to the temperature, the analyses are carried out by setting the temperature to realistic values encountered in real applications (i.e., in the order of 70 °C) instead of room temperature [17], [18]. This setting affects the speed of the circuits and the strongly temperature-dependent leakage currents in a realistic way.

Finally, the average FF energy dissipation in one clock cycle is the sum of the transient and leakage contributions, and thus depends on the input data statistics and micro-architectural choices through the switching activity and the logic depth.

D. Layout Issues: Local Wires Parasitics and Area Estimation

The extraction of the parasitic capacitances due to local interconnects is a necessary task when analyzing nanometer circuits. This is especially true for FFs, which have a more complex structure compared to combinational logic gates. For example, we have the following:
• FFs are built with various cascaded logic stages and hence long wires are needed to connect them to each other or to keepers (e.g., see the FF in [21]);
• sometimes there are wires connecting the FF output with the first stages (e.g., see the FF in [22]);
• the interconnect parasitics associated with the clock terminal within FFs (especially the master-slave ones) tend to be rather high (e.g., see the FFs in [10], [23]).

In principle, the FF layout should be redrawn for each set of transistor sizes under evaluation, in order to include the FF interconnect parasitics within the circuit design loop. Since a large number of transistor sizings must be considered to reach each optimal point in the $E$–$D$ space, this would require the realization of hundreds of thousands of layouts, which is obviously impractical. Therefore, a strategy to estimate the interconnect parasitics as a function of the transistor sizes is essential. These parasitics are purely capacitive (parasitic resistive effects within a FF are negligible, due to its small dimensions).

To estimate the interconnect capacitance, one needs to estimate the wire length for each Poly, Metal1, and Metal2 line in a reference layout. In this paper, we refer to standard cell layouts [17], [18], which obviously must be as compact as possible


and where only Poly, Metal1, and Metal2 layers are employed to connect transistors. In particular, the length of these interconnects depends on the relative position of the transistors along their path, as well as on the physical dimensions of these transistors. Stick diagrams are a useful tool to evaluate the relative position [17], whereas the physical dimensions of the transistors can be found from detailed layout considerations. Based on these considerations, the adopted systematic methodology to extract interconnect parasitics is that reported in Appendix A of [24]. Once the interconnect lengths are derived according to the above methodology, the layout parasitics are found by multiplying the lengths by the capacitances per unit length of the connecting layers. Observe that the capacitances per unit length depend on the detailed layout configuration, hence they must be properly extracted from the design kit (see [24, App. A] for details).

The above procedure leads to expressions of the parasitics that depend on the sizes of the FF transistors. Hence, it can be automated in circuit simulations to include the effect of interconnects when sizing transistors. The accuracy of the above procedure has been validated through comparison with real layouts of minimum-$E^iD^j$ designs. On average, the parasitics are underestimated by 10%–25% (for a detailed example, see Appendix A of [24]). The error in the delay and energy estimation is actually lower (5%–10%), because each node capacitance also includes the contribution of transistors.

The analysis of all topologies shows that layout parasitic capacitances are comparable to (or even greater than) the capacitances of all transistors connected to the same node. From the LE perspective, this means that the branching effort due to the presence of layout parasitics is typically 2 or more, thus confirming that interconnects have a huge impact on the FF speed and energy. Compared with the case where such parasitics are not included (as in previous works, [1]–[11]), the transistor sizes needed to maintain a given speed strongly increase in high-speed designs and the energy is nearly doubled.

To conclude, layout parasitics must be included in the transistor-level design loop to correctly characterize the $E$–$D$ tradeoff of an FF, as well as to fairly compare different topologies. Moreover, observe that this procedure can predict the size of the FF cell, in addition to the length of interconnects. Hence, it is also a useful tool to estimate the FF area, thereby enabling a thorough analysis of the energy-delay-area tradeoff for the first time.

III. ENERGY-DELAY TRADEOFF

A. Energy-Delay Space Analysis

In the last few years, digital VLSI circuits have operated in a power-limited regime [25]. Hence, a deep understanding of the $E$–$D$ tradeoff is crucial in both high-speed and low-energy designs [12]–[14]. This justifies the adoption of composite metrics, such as the products $ED$ or $ED^2$ [26], [27], which can be easily generalized to the class of $E^iD^j$ figures of merit (FOMs). By varying the exponents $i$ and $j$, a wide range of different tradeoffs can be explored.

FF topologies can be thoroughly and fairly compared by extracting the “energy-efficient curve” (EEC) in the $E$–$D$ space, i.e., the set of design points showing minimum energy (delay) for a given delay (energy). Given the circuit topology, load and supply voltage, the EEC has a hyperbolic shape, as shown in Fig. 2 [15]

(2)

whose parameters are the energy and delay asymptotes of the circuit, together with a further parameter that is determined by fitting experimental data. Actually, there is a minimum energy that is achievable with the minimum transistor sizes allowing correct operation. Therefore, the points between the energy asymptote and this minimum energy must be discarded in the EEC (see Fig. 2). The delay asymptote can only be approached asymptotically through transistor sizing, and measures the speed potential of a specific topology. It can be estimated through methods that are focused on speed optimization (e.g., the LE method [16]).

Fig. 2. Energy-efficient curve and minimum $E^iD^j$ design points.

The resulting EEC is made up of the points minimizing the $E^iD^j$ products, as shown in Fig. 2. This result was found in [15], [28] with reference to a slightly different metric, whose parameter is called “hardware intensity”.

A fair comparison among FFs requires an exhaustive analysis across the entire $E$–$D$ space, i.e., the extraction of the EEC under different loading, switching activity and logic depth conditions. This task was partially accomplished in [6] through extensive simulations, but with no reference to any design metric. Instead, in the following, the relation between the EEC and the $E^iD^j$ metrics is exploited to efficiently extract the EEC, as well as to give a clear physical meaning to each point of the EEC.

B. Optimization Under a Varying Input Capacitance

Typically, previous FF comparisons considered a fixed input size or, at most, a narrow set of $C_{in}$ values [1], [2], [4]–[11]. However, such a choice does not allow the FF potential to be extensively explored in terms of both energy and delay. Indeed, a considerable increment (reduction) of $C_{in}$ may bring a significant speed improvement (energy savings). The same concept is stated in [6], [12], [14] with reference to generic circuits, but in those papers $C_{in}$ is swept to a limited extent.

Here we suggest that, in a fair design strategy, $C_{in}$ (determined by the width of the transistors in the first stage of the data-to-output path) should be adopted as an additional design variable to be fully optimized, like the sizes of the transistors in the other stages within the FF. Hence, it is worth addressing the consequences of the $C_{in}$ optimization within a wide range of exploration.
• If $C_{in}$ is increased with respect to medium values, the FF is being sized to achieve a high speed while somewhat disregarding the energy consumption (which increases since we include the energy to charge/discharge $C_{in}$; see the Appendix). Even if the FF imposes a significant load on the preceding logic stage in the pipeline, in high-speed applications the speed penalty of the preceding logic stages could be exceeded by the speed advantage that the faster FF brings at the beginning of the cycle time.
• If $C_{in}$ is strongly reduced, the FF is being sized for low-energy operation. Granted that the above-mentioned tradeoff is still valid, low-energy applications typically have high logic depths, and hence a slower FF can be tolerated in favor of its smaller energy dissipation.
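The two cases above can be illustrated with a toy Logical Effort model. This is a minimal sketch under stated assumptions and is not taken from the paper: the stage count `N`, the path efforts `G` and `B`, the parasitic delay `P`, the load `C_LOAD`, and the geometric-taper energy proxy in `switched_capacitance` are all hypothetical values and helpers chosen only to show the trend of delay and energy as the input capacitance varies.

```python
# Toy Logical Effort (LE) model; every constant below is an assumption.
N = 3          # number of stages in the data-to-output path
G = 2.0        # path logical effort
B = 2.0        # path branching effort
P = 6.0        # path parasitic delay (in units of tau)
C_LOAD = 16.0  # output load, in minimum-inverter input capacitances


def path_delay(c_in: float) -> float:
    """Optimized LE path delay N*(G*B*H)**(1/N) + P, with H = C_LOAD/c_in."""
    h = C_LOAD / c_in
    return N * (G * B * h) ** (1.0 / N) + P


def switched_capacitance(c_in: float) -> float:
    """Crude energy proxy: stage capacitances tapering geometrically
    from c_in up to C_LOAD (c_in itself is included in the sum)."""
    ratio = (C_LOAD / c_in) ** (1.0 / N)
    return sum(c_in * ratio ** k for k in range(N + 1))


if __name__ == "__main__":
    for c_in in (1.0, 2.0, 4.0, 8.0):
        print(f"C_in={c_in:4.1f}  delay={path_delay(c_in):6.2f}  "
              f"switched C={switched_capacitance(c_in):6.2f}")
```

In this sketch a larger input capacitance lowers the delay and raises the switched capacitance, which is the qualitative tradeoff described in the two bullets; the numbers themselves carry no meaning for a real FF.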
According to the above considerations, the adopted comparison strategy considers the input capacitance as a degree of freedom. For a given load, from the LE point of view this means that the analysis is carried out by also optimizing the electrical effort according to the desired $E$–$D$ tradeoff [8], as opposed to the usual analysis with fixed (non-optimal) electrical effort.

IV. DESIGN AND OPTIMIZATION METHODOLOGIES

In this section, in order to provide a complete and in-depth view of the issues involved in FF design, a brief overview is given of the thorough design methodology proposed by the authors and described in detail in [24].

A. Transistor Sizing Strategy

Among all the transistor widths, only some of them can improve the speed, i.e., those of the transistors along the critical paths (i.e., the data-to-output paths, since speed is quantified through the minimum data-to-output delay). These transistor widths are hence design variables to be swept by the optimization algorithm that extracts the EEC. To further reduce the computational effort involved in the optimization of the transistor sizes, let us introduce some reasonable simplifications. First, series-connected transistors are equally sized and static gates are symmetrically sized, whereas pull-up and pull-down networks (lying in data-to-output paths) of non-static gates are associated with distinct design variables. In the following, the widths of the transistors along the data-to-output paths are referred to through their values normalized to the minimum width allowed by the technology.

The sizes of the remaining transistors are not independent variables, as they are either set to a constant value (i.e., the minimum value ensuring correct functionality), or are a function of the design variables (e.g., when their sizes set the transparency window of pulsed FFs).

All the possible cases for such dependent variables are listed in the following, and some are exemplified in Fig. 3 for the semi-dynamic FF (SDFF) [29]. Normalized width/length values (first/second number) are shown next to each transistor.

Fig. 3. Exemplification of transistors sizing (section of SDFF).

1) Clocked precharge transistors are sized to ensure a correct precharge in half a clock period.¹ In general, a minimum size is sufficient except when the clock frequency requirement is very pressing or when the precharge is counteracted by a non-gated keeper (in this case a larger width is sufficient).
2) Keepers inverting first stages (see Fig. 3) are minimum sized. Keepers level-restoring second stages (see Fig. 3) always have minimum widths. For non-gated keepers, the length (normalized to the minimum value) is set to 2 in order to weaken their driving strength.
3) Transistors involved in the clock gating logic and not belonging to data-to-output paths are minimum sized.
4) Transistors placed in feedback paths (typically coming from the output) that do not define the transparency window (as in [23]) are minimum sized.
5) When sizing the pulse generators (delay paths) in explicit (implicit) pulsed FFs [1], [2], the duration of the transparency window is set equal to the whole data-to-output delay. This choice guarantees the FF functionality and links the window duration to the speed of the FF for each specific sizing. A detailed discussion on how to size each block of the Pulse Generators (e.g., those shown in Fig. 3) can be found in [24] (see Section V and Appendix B).

The previous list covers all the cases encountered in this work within the analysis and design of the considered FFs. Further details and examples relative to the above sizing strategy can be found in [24].

¹A correct precharge means reaching 90% of the steady logic value in half a clock period. Hence, given: 1) the $T_{CK}/\mathrm{FO4}$ specification; 2) the capacitive load at the node to be precharged; and 3) the possible counteraction of non-gated keepers, the required size of the precharge transistors is estimated by resorting to the LE model. Indeed, during the exploration of the design space, the sizes of all FF transistors are known and LE can be applied (also including the layout parasitics).
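The split between independent design variables and dependent sizes can be sketched as a small expansion step that an optimizer might run before each simulation. This is only an illustration under assumptions: the `Transistor` record, the role names, the `expand_sizing` helper and the example netlist are hypothetical and do not come from the paper or from [24]. The pulse-generator rule (item 5) is left out because it needs a delay model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

W_MIN = 1.0               # minimum width (all sizes are normalized to it)
L_MIN = 1.0               # minimum channel length (normalized)
NON_GATED_KEEPER_L = 2.0  # item 2: longer channel weakens non-gated keepers


@dataclass(frozen=True)
class Transistor:
    name: str
    role: str                         # "critical", "precharge", "keeper",
                                      # "clock_gating" or "feedback"
    design_var: Optional[str] = None  # key of the independent variable
    gated: bool = True                # only meaningful for keepers


def expand_sizing(
    netlist: Tuple[Transistor, ...],
    design_vars: Dict[str, float],
    precharge_width: float = W_MIN,
) -> Dict[str, Tuple[float, float]]:
    """Map each transistor name to a normalized (width, length) pair."""
    sizes: Dict[str, Tuple[float, float]] = {}
    for t in netlist:
        if t.role == "critical":
            # Independent variable; series devices share one key,
            # so they end up with equal widths.
            sizes[t.name] = (design_vars[t.design_var], L_MIN)
        elif t.role == "precharge":
            # Item 1: width supplied by the caller (minimum by default).
            sizes[t.name] = (precharge_width, L_MIN)
        elif t.role == "keeper":
            # Item 2: minimum width; non-gated keepers get length 2.
            length = L_MIN if t.gated else NON_GATED_KEEPER_L
            sizes[t.name] = (W_MIN, length)
        elif t.role in ("clock_gating", "feedback"):
            # Items 3 and 4: minimum sized.
            sizes[t.name] = (W_MIN, L_MIN)
        else:
            raise ValueError(f"unknown role: {t.role}")
    return sizes


# Hypothetical fragment of a dynamic first stage (device names are made up).
NETLIST = (
    Transistor("MN1", "critical", design_var="w1"),
    Transistor("MN2", "critical", design_var="w1"),  # in series with MN1
    Transistor("MP1", "precharge"),
    Transistor("MK1", "keeper", gated=False),
    Transistor("MF1", "feedback"),
)

SIZES = expand_sizing(NETLIST, {"w1": 4.0})
```

With this arrangement the optimizer only sweeps the entries of `design_vars`, while every other size follows from the fixed rules, which keeps the search space small.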


B. Definition of Design Space Bounds


As regards the widths to be swept within the optimization
procedure, realistic bounds for their ranges of variation need to
be identified to keep the computational effort within reasonable
limits. To this aim, we use the LE method since the LE sizing
leads to extreme high-speed designs (i.e., maximum possible
transistors sizes), as discussed in the following.
From the LE method, the optimized delay of a path of
cascaded gates (like a FF) is given by [16]

(3)

where and are the Logical Effort, the electrical effort


, the branching effort and the parasitic delay of
the entire path, respectively. Equivalently, one has

(4)

(5)

where the term introduced in (4)–(5) is the relative delay increment with respect to the ideal and practically inaccessible minimum path delay, i.e., the path parasitic delay. The sensitivity of the optimized path delay to the input capacitance, from (3)–(5), is given by

(6)

From (6), we can evaluate the increment that leads to the minimum acceptable sensitivity (see footnote 2) of the delay on the input capacitance and, in turn, from (5) we can evaluate the corresponding maximum input capacitance that keeps the sensitivity above this minimum.

Obviously, the upper bound of the input capacitance corresponds to the upper bound of the size of the transistors in the first stage. The upper bounds of all the other transistor sizes (i.e., those of the stages after the first) can be found by considering that the LE design method allows one to evaluate the transistor sizes that minimize the delay under a given input capacitance and load. Hence, since the sizes found with the LE method increase when the input capacitance increases, their upper bounds are found by optimizing the circuit with LE for the maximum input capacitance.

Observe that, in practical cases, formulas (5)–(6) cannot be applied straightforwardly, because the LE parameters of the path are not available in closed form as a function of the input capacitance, which makes it impossible to evaluate its maximum value from (5). Indeed, these parameters can be found only by numerically solving a set of nonlinear equations when applying the LE method for a given input capacitance [16]. This issue is well known to arise in real circuits, where the LE parameters of each stage depend on the transistor sizes of the stage itself and of the other stages too, due to the presence of the following:
• keepers;
• logic gates with fixed sizes outside the path under analysis (e.g., precharge transistors or PGs);
• interconnect capacitances (which depend on the transistor sizes);
• multiple reconvergent paths (which obviously must be sized to exhibit the same delay).

Nevertheless, the above arguments leading to (6) can be exploited, given that, for high values of the input capacitance, the LE parameters converge to somewhat constant values (i.e., the derivation of (6) from (4)–(5) is consistent in the region of high input capacitance).

From a practical perspective, instead of directly applying (5)–(6), we apply an iterative procedure by increasing the input capacitance until the condition (6) is satisfied with the minimum acceptable sensitivity. For each value of the input capacitance, the LE method is employed, thus leading to the definition of the entire FF sizing (a set of nonlinear equations must be iteratively solved). A fitted curve of the delay versus the input capacitance, upon which the sensitivity evaluation is carried out, is re-extrapolated for each new value. When the sensitivity is equal to the minimum acceptable value, the cycle stops, a practical maximum input capacitance is identified and, correspondingly, the maximum transistor sizes are also found (thanks to the LE method).

As an example, in Fig. 4 we report the results of applying this procedure to SDFF under a given load. For instance, the input capacitance at which a low sensitivity is reached (i.e., at which we are significantly close to the FF parasitic delay) is shown in Fig. 4.

Fig. 4. LE-based procedure to identify a maximum input capacitance through the sensitivity of the optimized delay to the input capacitance (SDFF example).

All the analyses in the paper are carried out by determining the ranges of variation of the transistor sizes through a single minimum-sensitivity value, chosen so as to surely include the sizes that actually optimize the FOMs considered in Section IV-C.

Footnote 2: Points in the EEC with very low sensitivity in (6) (say, a few percentage points) are not of interest. Indeed, in this case the delay is almost insensitive to an increase in the input capacitance, and hence the FF becomes strongly energy inefficient.

C. Optimization in the Energy-Delay Space

Once the ranges of variation of the transistor sizes have been found, an optimization algorithm is applied to extract the optimum sizes for each point of the EEC. As discussed in Section III-A, the EEC is made up of the design points minimizing the FOMs, and hence the curve can be found by minimizing a discrete set of these FOMs and interpolating them to extract the intermediate points. We choose to consider the following ones, which cover a very wide range of applications:

(7)
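
As a rough illustration of how a discrete set of FOMs singles out points of the EEC, the following minimal sketch picks, among a list of candidate sizings, the one that minimizes each FOM. It assumes FOMs of the product form E^i D^j; the candidate (energy, delay) pairs, the tradeoff law used to generate them, the exponent pairs and the function names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

DesignPoint = Tuple[float, float]  # (energy, delay) in arbitrary consistent units


def fom(point: DesignPoint, i: int, j: int) -> float:
    """Assumed figure of merit E^i * D^j (the product form is an assumption)."""
    energy, delay = point
    return (energy ** i) * (delay ** j)


def eec_points(candidates: List[DesignPoint],
               exponents: List[Tuple[int, int]]) -> Dict[Tuple[int, int], DesignPoint]:
    """For each exponent pair (i, j), return the candidate minimizing E^i * D^j."""
    return {(i, j): min(candidates, key=lambda p: fom(p, i, j))
            for (i, j) in exponents}


if __name__ == "__main__":
    # Hypothetical sizings on a toy tradeoff curve E = 1 + 4 / D.
    candidates = [(1.0 + 4.0 / d, d) for d in (1.0, 2.0, 3.0, 4.0, 6.0, 8.0)]
    # Hypothetical exponent pairs, from speed-oriented to energy-oriented.
    exponents = [(1, 3), (1, 1), (2, 1), (3, 1)]
    for ij, point in eec_points(candidates, exponents).items():
        print(ij, point)
```

On this toy curve, the more weight a FOM gives to delay, the faster (and more energy-hungry) the selected design, which is the ordering that the narrowing of the search space relies on. At a minimum of E^i D^j along a smooth tradeoff curve, the relative sensitivity of energy to delay equals -j/i, which follows from setting i·dE/E + j·dD/D = 0.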


The identification of the optimum transistor sizes is carried out by resorting to a simple binary search algorithm, since the functionals to be minimized are presumably convex (see later for details).

In order to reduce the computational effort, we restrict the search space by exploiting another property of the functionals. Indeed, the optimum transistor sizes obtained when minimizing a more delay-weighted FOM are always greater than those minimizing a more energy-weighted one, since the former gives more emphasis on the speed. Therefore, starting from the extreme high-speed region, the ranges of the transistor sizes for each FOM are progressively limited by those minimizing the previous one (i.e., the search space for each point of the EEC is progressively narrower). Accordingly, each search relative to a specific metric has a number of iterations that depends on the maximum optimum transistor size found by minimizing the previous FOM, rounded up to the closest greater integer. The EEC is finally extracted through a hyperbolic fitting that interpolates the optimum design points in the energy-delay space, according to (2).

We verified the consistency of the previous assumptions (e.g., convexity of the functionals) by comparing the energy-to-delay sensitivity in the minimum points with the theoretical value, which is given by [15], [28]

(8)

The simulated and theoretical sensitivities are plotted in Fig. 5 for all the optimum designs found within this work under various conditions and for all the considered FFs (see detailed results in Part II). Detailed numerical values of the mean, standard deviation, maximum and minimum sensitivity for each point of the EEC are reported in Table I. From Fig. 5 and Table I, it is apparent that the dispersion of the simulated values is very small and that the resulting values agree very well with the theoretical sensitivity, thereby confirming the validity of the assumptions introduced above (including the convexity of the functionals).

Fig. 5. Sensitivity analyses for the optimum designs.

TABLE I
ANALYSIS OF THE SENSITIVITY S

V. ANALYZED FLIP-FLOP FAMILIES AND TOPOLOGIES

To cover the wide spectrum of adopted FF topologies, in our analysis we consider the four main FF classes: master-slave (MS), implicit and explicit pulsed (IP and EP), differential and DET topologies. Nineteen FFs among the most representative and best known ones are chosen for the four classes.

In particular, for the MS class we consider the following:
• Transmission Gate FF [10] (TGFF);
• Write-Port Master-Slave FF [30] (WPMS);
• Gated Master-Slave FF [10] (GMSL);
• Data Transition Look-Ahead FF [31] (DTLA).
The latter two are also clock-gated structures.

The seven analyzed pulsed topologies are as follows:
• Hybrid Latch FF [32] (HLFF);
• Semi-Dynamic FF [29] (SDFF);
• UltraSPARC Semi-Dynamic FF [21] (USDFF);
• Implicitly Push-Pull FF [33] (IPPFF);
• Conditional Precharge FF [23] (CPFF);
• Static Explicit Pulsed FF [34] (SEPFF);
• Transmission Gate Pulsed Latch [35] (TGPL).
The first five are IP, whereas the remaining two are EP.

The four differential FFs investigated are as follows:
• Modified Sense-Amplifier FF [36] (MSAFF);
• Skew-Tolerant FF [37] (STFF);
• Conditional Capture FF [22] (CCFF);
• Variable Sampling Window FF [38] (VSWFF).

Finally, the four DET topologies are as follows:
• Transmission Gate Latch-Mux [39] (DET-TGLM);
• Symmetric Pulse Generator FF [40] (DET-SPGFF);
• Static Pulsed Latch [11] (DET-SPL);
• Conditional Discharge FF [41] (DET-CDFF).
The first is an MS, the second is IP and the other ones are EP.

In Fig. 6(a)–(s) we report the schematics of each FF and the location of the variable widths that lie in the data-to-output paths and have to be optimized as explained in Section IV.

Note that, in the analysis of EP FFs, one PG for each latch is considered. Obviously, this is a somewhat conservative choice in the estimation of the energy consumption of such FFs, since, by sharing the PG among a few different latches [11], [17], [35], the fraction of PG dissipation imputable to each latch slightly diminishes. This element must be taken into account when comparing EP FFs with other topologies.

In order to give an idea of the layout complexity and the resulting impact of local wires, the layouts of the FFs are shown in Fig. 7(a)–(s) for the minimum-ED sizings under typical values of the simulation parameters.

VI. CONCLUSION

In Part I, exhaustive analysis and design methodologies for nanometer CMOS FFs have been presented. Such methodologies are based on the notion of the Energy Efficient Curve and


Fig. 6. Schematics of the analyzed FFs and variable widths (w) to be optimized: (a) TGFF; (b) WPMS; (c) GMSL; (d) DTLA; (e) HLFF; (f) SDFF; (g) USDFF; (h) IPPFF; (i) CPFF; (j) SEPFF; (k) TGPL; (l) MSAFF; (m) STFF; (n) CCFF; (o) VSWFF; (p) DET-TGLM; (q) DET-SPGFF; (r) DET-SPL; (s) DET-CDFF.

on the evaluation of its points that correspond to figures of merit in the energy-delay space, which have a clear physical meaning. Moreover, the FF input capacitance is considered as a further independent variable to be optimized and the impact of


Fig. 7. Layouts of the analyzed FFs (minimum ED sizing): (a) TGFF; (b) WPMS; (c) GMSL; (d) DTLA; (e) HLFF; (f) SDFF; (g) USDFF; (h) IPPFF; (i) CPFF;
(j) SEPFF; (k) TGPL; (l) MSAFF; (m) STFF; (n) CCFF; (o) VSWFF; (p) DET-TGLM; (q) DET-SPGFF; (r) DET-SPL; (s) DET-CDFF.

local wires parasitics is included in the transistor-level design loop thanks to a stick-diagram based methodology, which is also helpful to achieve good area estimations.

The FFs chosen for a thorough comparative analysis, whose results are reported in Part II [42], cover the wide spectrum of practically adopted solutions and have been presented here.

APPENDIX
ACCURATE EVALUATION OF THE TRANSIENT ENERGY CONSUMPTION

The proper evaluation of the energy dissipation related with the input, internal and output nodes transitions is a somewhat


complicated task. In [2], [5] some significant guidelines were outlined. For instance, the evaluation of the energy consumption must not include the energy dissipated in the charging/discharging of the external output load, since it is a value solely depending on the load dimension and not on the FF features.

However, one should not simply subtract the magnitude of the current flowing from and towards the load; otherwise, the effect of some undesired output transitions would not be taken into account. To be more specific, some topologies (e.g., some semi-dynamic FFs) can suffer from glitches both on the internal and output nodes. In this case, the energy dissipation due to output glitches must be included, since it is a shortcoming that worsens the FF features in a way that depends precisely on the load value.

The energy spent to charge the data and clock inputs has to be included in the computation [2], [5], because it is a feature dependent on the FF characteristics. The replicas of the data- and clock-driving buffers are inserted in the simulation setup (see Fig. 1), to subtract the energy due to the parasitic load of the same driving inverters [2], [5].

Summarizing, consider the currents drawn from the power supply by the FF, by the data- and clock-driving inverters close to the FF, and by their unloaded replicas. The generic contribution to the transient energy (for the moment including the energy on the load) is

(A1)

where the integration limits have to be properly defined.

In [2], the authors deal with the energy breakdown by referring to four energy contributions. They are evaluated by considering a single clock period during which a single event on data occurs. The authors state that it is possible to infer clocking, precharge and internal nodes energy contributions by simply combining the four terms according to transition probabilities and subtracting the energy spent on the load.

However, the simple approach shown in [2] does not allow one to accurately separate and localize the various sources of energy consumption (clock, precharge, etc.), because the energy dissipation related with the transition of one signal is influenced by the values of the other signals. For instance, according to the FF functionality, the transition of the data input can cause simply the charging of the input gate capacitance or the transition of internal nodes, according to the state of the clock (e.g., in MS circuits). If, after the transition, the data always remained stable waiting to be transferred through the FF, the argument made in [2] would be completely correct and exhaustive. But, actually, the data can change during the opaque phase of the FF and one needs to account for all the possible transition scenarios, in order to have the most general information about the transient energy.

Moreover, the integration of the supply current over the entire clock period includes the static energy due to leakage, whereas it should be separately evaluated and weighted according to the chosen logic depth.

For these reasons we suggest considering all the possible transitions that arise according to all possible input combinations. We adopt the following notation: the first subscript refers to the input signal (clock or data) that is varying and to the specific transition (rising or falling), the second subscript refers to the value (low or high) of the other, stable input signal, and the third subscript is related with the output behavior, which can remain stable (low or high) or can vary (rising or falling). In the following, with no loss of generality, we will refer to a non-inverting Positive SET FF. The first four contributions are those related with the clock transitions when the input and output are stable

(A2)
(A3)
(A4)
(A5)

The second four contributions are related with the clock transitions when the input and output are different. In the case of a Positive SET FF, this does (does not) lead to an output change for the rising (falling) clock transition (the situation is reversed for a Negative SET FF)

(A6)
(A7)
(A8)
(A9)

Finally come the data input transitions, which can occur during the high or low clock phase

(A10)
(A11)
(A12)
(A13)

These 12 contributions are evaluated by integrating the supply current according to (A1), taking the lower integration limit as the point of time where the input experiences a transition, and the upper integration limit as the point of time where the slowest node within the FF reaches 99% of its steady value (see footnote 3). In this way, the time window is sufficiently wide to fully capture the dynamic and short-circuit energy contributions, whereas it is sufficiently narrow to neglect the impact of leakage.

Footnote 3: Sometimes, the node voltages can take long times to reach 99% of the steady value. Anyhow, when not employing simple pass-transistors that cause a threshold drop and when all transistors are properly sized according to the architectural logic-depth (T/FO4) specification (as described in Section IV-A or [24]), the 99% value can be closely approached in practically acceptable times. Nevertheless, a good estimation of transient energy comes out also considering slightly smaller values than 99% (e.g., 90%), and hence it is simply a matter of convention when characterizing an FF.

To determine the average transient energy in a clock cycle, the switching activity needs to be used. If at most one data transition occurs for each clock period, the average transient energy can be written as


(A14)

The load-related term in (A14) is the energy required by the load capacitance, and is hence subtracted because it depends only on the adopted load (i.e., it is not a feature of the considered FF).

Some further arrangements are required when dealing with DET FFs. In this case (A6) and (A7) change into

(A15)
(A16)

because both the clock transitions enable the data transfer through the FF. To fairly compare SETs and DETs, the same throughput must be assumed, which translates into a halved clock frequency for DET with respect to SET. Therefore, in order to consistently readjust the average transient energy evaluation, one has to refer to the transitions occurring in a half-cycle and hence (A14) is changed into

(A17)

It is worth noting that the proposed methodology also allows for straightforwardly taking data glitches into account. This was not possible in [2], since the contributions in (A10)–(A13) were not explicitly evaluated.

REFERENCES

[1] V. Oklobdzija, “Clocking and clocked storage elements in a multi-GigaHertz environment,” IBM J. Res. Development, vol. 47, no. 5/6, pp. 567–583, Sep./Nov. 2003.
[2] V. Oklobdzija, V. Stojanovic, D. Markovic, and N. Nedovic, Digital System Clocking: High-Performance and Low-Power Aspects. New York: Wiley-IEEE Press, 2003.
[3] M. Alioto, E. Consoli, and G. Palumbo, “Flip-flop energy/performance versus clock slope and impact on the clock network design,” IEEE Trans. Circuits Syst. I, Reg. Papers, to be published.
[4] N. Nedovic and V. Oklobdzija, “Dual-edge triggered storage elements and clocking strategy for low-power systems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 5, pp. 577–590, May 2005.
[5] V. Stojanovic and V. Oklobdzija, “Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems,” IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 536–548, Apr. 1999.
[6] C. Giacomotto, N. Nedovic, and V. Oklobdzija, “The effect of the system specification on the optimal selection of clocked storage elements,” IEEE J. Solid-State Circuits, vol. 42, no. 6, pp. 1392–1404, Jun. 2007.
[7] H. Partovi, “Clocked storage elements,” in Design of High-Performance Microprocessor Circuits. Piscataway, NJ: IEEE Press, 2001, pp. 207–234.
[8] S. Heo and K. Asanovic, “Load-sensitive flip-flop characterization,” in Proc. IEEE Comput. Soc. Workshop VLSI, Apr. 2001, pp. 87–92.
[9] S. Heo, R. Krashinsky, and K. Asanovic, “Activity-sensitive flip-flop and latch selection for reduced energy,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 9, pp. 1060–1064, Sep. 2007.
[10] D. Markovic, B. Nikolic, and R. Brodersen, “Analysis and design of low-energy flip-flops,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 2001, pp. 52–55.
[11] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, “Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 2001, pp. 147–152.
[12] H. Dao, B. Zeydel, and V. Oklobdzija, “Energy optimization of pipelined digital systems using circuit sizing and supply scaling,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 2, pp. 122–134, Feb. 2006.
[13] D. Markovic, V. Stojanovic, B. Nikolic, M. Horowitz, and R. Brodersen, “Methods for true energy-performance optimization,” IEEE J. Solid-State Circuits, vol. 39, no. 8, pp. 1282–1293, Aug. 2004.
[14] V. Oklobdzija and R. Krishnamurthy, High-Performance Energy-Efficient Microprocessor Design. New York: Springer, 2006.
[15] V. Zyuban and P. Strenski, “Unified methodology for resolving power-performance tradeoffs at the microarchitectural and circuit levels,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 2002, pp. 166–171.
[16] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits. San Mateo, CA: Morgan Kaufmann, 1998.
[17] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and System Perspective (3rd Edition). Reading, MA: Addison Wesley, 2004.
[18] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective (2nd Edition). Englewood Cliffs, NJ: Prentice-Hall, 2003.
[19] S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies. New York: Springer, 2006.
[20] M. Hrishikesh, N. Jouppi, K. Farkas, D. Burger, S. Keckler, and P. Shivakumar, “The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays,” in Proc. Ann. Int. Symp. Comput. Arch., May 2002, pp. 14–24.
[21] R. Heald, K. Aingaran, C. Amir, M. Ang, M. Boland, P. Dixit, G. Gouldsberry, D. Greenley, J. Grinberg, J. Hart, T. Horel, W. Hsu, J. Kaku, C. Kim, S. Kim, F. Klass, H. Kwan, G. Lauterbach, R. Lo, H. McIntyre, A. Mehta, D. Murata, S. Nguyen, Y. Pai, S. Patel, K. Shin, K. Tam, S. Vishwanthaiah, J. Wu, G. Yee, and E. You, “A third generation SPARC V9 64-b microprocessor,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1526–1538, Nov. 2000.
[22] B. Kong, S. Kim, and Y. Jun, “Conditional-capture flip-flop for statistical power reduction,” IEEE J. Solid-State Circuits, vol. 36, no. 8, pp. 1263–1271, Aug. 2001.
[23] N. Nedovic, M. Aleksic, and V. Oklobdzija, “Conditional techniques for low power consumption flip-flops,” in Proc. IEEE Int. Conf. Electron., Circuits Syst., Feb./May 2001, vol. 2, pp. 803–806.
[24] M. Alioto, E. Consoli, and G. Palumbo, “General strategies to design nanometer flip-flops in the energy-delay space,” IEEE Trans. Circuits Syst. I, Reg. Papers, to be published.
[25] B. Nikolic, “Design in the power-limited scaling regime,” IEEE Trans. Electron Devices, vol. 55, no. 1, pp. 71–83, Jan. 2008.
[26] R. Gonzalez, B. Gordon, and M. Horowitz, “Supply and threshold voltage scaling for low power CMOS,” IEEE J. Solid-State Circuits, vol. 32, no. 8, pp. 1210–1216, Aug. 1997.
[27] A. Martin, “Towards an energy complexity of computation,” Information Process. Lett., vol. 77, no. 2, pp. 181–187, Feb. 2001.
[28] P. Penzes and A. Martin, “Energy-delay efficiency of VLSI computations,” in Proc. ACM Great Lake Symp. VLSI, New York, Apr. 2002, pp. 104–111.
[29] F. Klass, C. Amir, A. Das, K. Aingaran, C. Truong, R. Wang, A. Mehta, R. Heald, and G. Yee, “A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors,” IEEE J. Solid-State Circuits, vol. 34, no. 5, pp. 712–716, May 1999.
[30] D. Markovic, J. Tschanz, and V. De, “Transmission-gate based flip-flop,” U.S. Patent 6 642 765, Nov. 4, 2003.
[31] M. Nogawa and Y. Ohtomo, “A data-transition look-ahead DFF circuit for statistical reduction in power consumption,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 702–706, May 1998.
[32] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, “Flow-through latch and edge-triggered flip-flop hybrid elements,” in Proc. IEEE Int. Solid-State Circuit Conf., Feb. 1996, pp. 138–139.
[33] N. Nedovic, “Clocked storage elements for high-performance applications,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. California, Davis, 2003.
[34] P. Zhao, T. Darwish, and M. Bayoumi, “Low power and high speed explicit-pulsed flip-flops,” in Proc. IEEE Midw. Symp. Circuits Syst., Aug. 2002, pp. 477–480.


[35] S. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. Sullivan, and T. Grutkowski, “The implementation of the Itanium 2 microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1448–1460, Nov. 2002.
[36] B. Nikolic, V. Stojanovic, V. Oklobdzija, W. Jia, J. Chiu, and M. Leung, “Improved sense-amplifier-based flip-flop: Design and measurements,” IEEE J. Solid-State Circuits, vol. 35, no. 6, pp. 876–884, Jun. 2000.
[37] N. Nedovic, V. Oklobdzija, and W. Walker, “A clock skew absorbing flip-flop,” in Proc. IEEE Int. Solid-State Circuit Conf., Feb. 2003, pp. 342–344.
[38] S. Shin and B. Kong, “Variable sampling window flip-flops for low-power high-speed VLSI,” IEE Proc. IEE Circuits, Devices Syst., vol. 152, no. 3, pp. 266–271, Jun. 2005.
[39] R. Llopis and M. Sachdev, “Low power, testable dual edge triggered flip-flops,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 1996, pp. 341–345.
[40] N. Nedovic, W. Walker, V. Oklobdzija, and M. Aleksic, “A low power symmetrically pulsed dual edge-triggered flip-flop,” in Proc. IEEE Eur. Solid-State Circuits Conf., Sep. 2002, pp. 399–402.
[41] P. Zhao, T. Darwish, and M. Bayoumi, “High-performance and low-power conditional discharge flip-flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 477–484, May 2004.
[42] M. Alioto, E. Consoli, and G. Palumbo, “Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part II–Results and figures of merit,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., accepted for publication.

Massimo Alioto (M’01–SM’07) was born in Brescia, Italy, in 1972. He received the Laurea degree in electronics engineering and the Ph.D. degree in electrical engineering from the University of Catania, Catania, Italy, in 1997 and 2001, respectively.
In 2002, he joined the Dipartimento di Ingegneria dell’Informazione (DII), University of Siena, Siena, Italy, as a Research Associate and in the same year as an Assistant Professor. In 2005, he was appointed Associate Professor of Electronics, and was engaged in the same faculty in 2006. In the summer of 2007, he was a Visiting Professor at EPFL, Lausanne, Switzerland. In 2009-2010, he is a Visiting Professor with BWRC, University of California at Berkeley, investigating ultra-low power circuits and wireless sensor nodes. Since 2001, he has been teaching undergraduate and graduate courses on advanced VLSI digital design, microelectronics and basic electronics. He has authored or co-authored more than 140 publications in journals (over 50, mostly IEEE Transactions) and conference proceedings. Two of them are among the 25 most downloaded TVLSI papers in 2007 (respectively, 10th and 13th). He is co-author of the book Model and Design of Bipolar and MOS Current-Mode Logic: CML, ECL and SCL Digital Circuits (Springer, 2005). His primary research interests include the modeling and the optimized design of CMOS high-performance, low-power and ultra low-power digital circuits, arithmetic and cryptographic circuits, interconnect modeling, design/modeling for variability-tolerant and low-leakage VLSI circuits, and circuit techniques for emerging technologies. He is the director of the Electronics Lab at University of Siena (site of Arezzo).
Prof. Alioto is a member of the HiPEAC Network of Excellence. He is the Chair Elect of the VLSI Systems and Applications Technical Committee of the IEEE Circuits and Systems Society, for which he is also Distinguished Lecturer. He is regularly invited to give talks and tutorials to academic institutions, conferences, and companies throughout the world. He has served as a member of various conference technical program committees (ISCAS, PATMOS, ICM, ICCD, CSIE) and Track Chair (ICECS, ISCAS, ICM, ICCD). He serves as an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, as well as of the Microelectronics Journal, the Integration The VLSI Journal and the Journal of Circuits, Systems, and Computers. He is Guest Editor of the Special Issue Advances in Oscillator Analysis and Design of the Journal of Circuits, Systems, and Computers (2009).

Elio Consoli was born in Catania, Italy, in 1983. He received the Master’s degree in microelectronic engineering from the University of Catania, Catania, Italy, in 2008, where he is currently pursuing the Ph.D. degree in the Department of Electrical, Electronic, and Systems Engineering (DIEES).
His primary research interests include clocking strategies and energy-efficient design techniques for high-performance and low-power digital VLSI systems in nanometer CMOS technologies. He is the co-author of scientific papers in refereed international journals and conferences.

Gaetano Palumbo (F’07) was born in Catania, Italy, in 1964. He received the Laurea degree in electrical engineering and the Ph.D. degree from the University of Catania, Catania, Italy, in 1988 and 1993, respectively.
Since 1993, he has conducted courses on Electronic Devices, Electronics for Digital Systems, and Basic Electronics. In 1994, he joined the Dipartimento Elettrico Elettronico e Sistemistico (DEES), now Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi (DIEES), University of Catania, as a Researcher, subsequently becoming an Associate Professor in 1998. Since 2000, he has been a Full Professor with the same department. His primary research interest has been analog circuits with particular emphasis on feedback circuits, compensation techniques, current-mode approach, and low-voltage circuits. Then, his research has also embraced digital circuits with emphasis on bipolar and MOS current-mode digital circuits, adiabatic circuits, and high-performance building blocks focused on achieving optimum speed within the constraint of low power operation. In all these fields he is developing some of the research activities in collaboration with STMicroelectronics of Catania. He was the co-author of three books, CMOS Current Amplifiers (Kluwer, 1999), Feedback Amplifiers: Theory and Design (Kluwer, 2001), and Model and Design of Bipolar and MOS Current-Mode Logic (CML, ECL and SCL Digital Circuits) (Kluwer, 2005), and a textbook on electronic devices in 2005. He is the author of 350 scientific papers in refereed international journals (over 150) and in conferences. Moreover, he is co-author of several patents.
Prof. Palumbo was a recipient of the Darlington Award in 2003. From June 1999 to the end of 2001 and from 2004 to 2005, he served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–PART I: REGULAR PAPERS for the topics analog circuits and filters and digital circuits and systems, respectively. From 2006 to 2007, he served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–PART II: EXPRESS BRIEFS. Since 2008, he has been serving as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–PART I: REGULAR PAPERS. In 2005, he was one of the 12 panelists in the scientific-disciplinary area 09 (industrial and information engineering) of the Committee for Evaluation of Italian Research (CIVR), which has the aim of evaluating the Italian research in the above area for the period 2001-2003.
