
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.


Design of Low-Voltage High-Speed CML D-Latches in Nanometer CMOS Technologies
Giuseppe Scotti, Davide Bellizia, Alessandro Trifiletti, and Gaetano Palumbo, Fellow, IEEE

Abstract: This paper presents the design of a novel low-voltage high-speed D-latch circuit suitable for nanometer CMOS technologies. The proposed topology is compared against the low-voltage triple-tail D-latch, and its advantages are demonstrated both by simulations, under different performance/power consumption tradeoffs with a 40-nm CMOS technology, and theoretically, thanks to a simple model of the propagation delay derived for both low-voltage topologies. In order to further demonstrate the advantages of the proposed topology, it has also been used to design a D flip-flop (DFF) where, because it needs just one clock differential pair, a further speed improvement is achieved over the conventional triple-tail topology. Indeed, by comparing a two-stage frequency divider designed using both the triple-tail DFF and the proposed folded DFF, a 54% improvement in the maximum operating frequency is found when using the proposed folded DFF.

Index Terms: Current mode logic (CML) D-latch, D flip-flop (DFF), low voltage, nanometer CMOS.

Manuscript received January 13, 2017; revised May 10, 2017 and July 21, 2017; accepted August 23, 2017. (Corresponding author: Giuseppe Scotti.)
G. Scotti, D. Bellizia, and A. Trifiletti are with the Department of Information Engineering, Electronics and Telecommunications, University of Rome "La Sapienza," 00184 Rome, Italy.
G. Palumbo is with the Department of Electrical and Electronics Engineering, University of Catania, 95125 Catania, Italy.
Color versions of one or more of the figures in this paper are available online.
Digital Object Identifier 10.1109/TVLSI.2017.2750207
1063-8210 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

I. INTRODUCTION

DURING the last decade, the increasing interest in high-speed communications and chip-to-chip interconnect applications has raised the need for high-performance logic styles that can support both low-power and high data-rate applications. Fiber optics, wireline and backplane communications, as well as equalizers and millimeter-wave sampling ADCs, demand logic capable of tens of Gb/s, which cannot be supported using the standard CMOS logic style [1]–[6]. Basic building blocks used in these applications, such as phase detectors, multiplexers, decision circuits, frequency dividers, and prescalers, require high immunity to noise and must operate at very high clock rates [7]–[12].

The D-latch is one of the main building blocks in high-speed digital circuits. To satisfy stringent speed requirements, D-latches are usually designed in current-mode logic (CML) [13]–[15].

The idea behind CML is to use a MOS differential pair as the core block for logic and sequential circuits, since differential signaling offers good protection against switching noise and current-mode operation achieves fast switching at the cost of higher power consumption [13]–[15].

Nowadays, nanometer CMOS technologies offer fast devices, with transition frequencies even higher than 350 GHz for n-channel and 200 GHz for p-channel devices, but these technology nodes exhibit some critical drawbacks, which have raised new design issues.

The most important issue to cope with is the dramatic reduction of the supply voltage. In order to keep the electric field in the channel region below the critical value, the power supply has been reduced to very low values during the last decade, reaching two or three times the device's threshold voltage (below 1 V for sub-50-nm CMOS technology).

This very low-voltage constraint limits the possibility of having several stacked transistors in a CML gate, making most of the conventional CML topologies unsuitable from a practical point of view, as we will point out in the following.

Novel modeling and design approaches for CML gates have been recently presented in [16]–[19] referring to CMOS technologies, but the problem of stacking several levels of transistors (as in the CML D-latch topology) under a very low-voltage constraint has not been addressed in these works.

Another drawback of nanometer CMOS technologies is the degradation of important small-signal parameters, such as gm and rds, due to short-channel effects [20]: this limits the intrinsic gain of MOS transistors and worsens the analog performance and noise margins of conventional CML designs.

Latch topologies suitable for low-voltage operation have been proposed in [21] and [22]. The circuit proposed in [21] requires an inductive load to implement the high-pass feedforward concept, and this results in a quite large silicon area compared with a resistive or active load. Furthermore, the transistors defining the tail current of the differential pairs are directly driven by the clock signals, and this results in a tail current which is strongly dependent on the swing of the clock signals and on process, voltage, and temperature (PVT) variations. The CML D-latch circuit presented in [22] has higher speed than the conventional CML D-latch but, as stated in the same paper, it suffers from some drawbacks.
1) This circuit is vulnerable to the common-mode noise coming from the ground rail due to the operation of current source transistors in triode region.
2) The currents of transistors driven by the clock signals are sensitive to PVT variations. For example, those currents are affected by any change in the amplitude of the clock signals. Also, any change in the threshold
voltages increases the ON resistance, which results in speed reduction.

The conventional CML D-latch is much less sensitive to such variations, if the tail current source is kept constant [22].

Since the D-latch proposed in this paper does not suffer from these drawbacks, because the tail current is well defined as in the conventional CML D-latch, in the following we focus on topologies which exhibit this robustness and do not require additional circuitry to guarantee a stable output swing, propagation delay, and noise margin. Nevertheless, a comparison between the proposed D-latch and a D-latch circuit in which the current source transistors are directly driven by the clock signals, as in [21] and [22], is briefly discussed in Appendix I.

In this paper, a novel low-voltage D-latch topology is proposed and designed in a nanometer CMOS technology. In Section II, we show that the conventional CML D-latch topology is unsuitable for implementation in a 40-nm CMOS process under a low-voltage constraint (e.g., supply voltage below 1 V).

In Section III, we focus on two alternative low-voltage D-latch topologies: first, we consider the CMOS implementation of the triple-tail D-latch presented in [23] (referring to bipolar transistors), and then we introduce a novel D-latch topology which can be viewed as the folded version of the conventional D-latch circuit. In Section IV, a simple analytical model to compute the theoretical propagation delay of both the folded D-latch and the triple-tail D-latch is derived. Simulation results and comparisons are shown in Section V. The implementation of D flip-flops (DFFs) exploiting the advantages offered by the proposed folded D-latch is discussed in Section VI. Finally, conclusions are reported in Section VII.

II. CONVENTIONAL CML LATCH DESIGN IN 40-nm CMOS

The schematics of a CML inverter and of a conventional CML D-latch are depicted in Figs. 1 and 2, respectively.

Fig. 1. CML inverter schematic.

Fig. 2. Conventional MOS CML D-latch schematic.

In order to demonstrate the infeasibility of designing a conventional CML D-latch with a 40-nm CMOS technology under the 1-V supply voltage constraint, we start by analyzing the gain of the CML inverter reported in Fig. 1. The first thing to be pointed out is that, to get full switching of a CML inverter, the voltage gain AV of the inverter itself has to be greater than unity.

As shown in [16], AV can be expressed as follows:

$$A_V = g_m R_D = g_m \frac{V_{\mathrm{SWING}}}{2 I_{SS}} \tag{1}$$

where VSWING is the output voltage swing of the gate and gm is the small-signal transconductance of the transistors of the differential pair (M1, M2). From the above equation, it is evident that, due to the low gm available in nanometer CMOS technologies, VSWING has to be increased in order to enhance AV. In particular, once the bias current ISS of the gate (ISS can be set according to power consumption requirements in the low-power or in the power-efficient region [16]) and the overdrive voltage of the transistors of the CML inverter have been chosen, the gate width W of MOS devices M1 and M2 can be set and gm computed [16]. As an example, setting the bias current ISS = 250 μA (power-efficient region), L = 40 nm, and W = 4 μm, the voltage gain of the CML inverter as a function of the frequency is depicted in Fig. 3 for different values of VSWING ranging from 200 to 800 mV. As we can see from Fig. 3, only large output voltage swings lead to a voltage gain slightly greater than unity. For VSWING = 800 mV, the gain AV is approximately 4 dB, which is a reasonable value to guarantee the full switching of the CML inverter. (A margin is required to guarantee a reasonable input sensitivity under PVT variations.)

Fig. 3. Voltage gain of the CML inverter versus frequency for VSWING ranging from 200 to 800 mV (VDD = 1 V and ISS = 250 μA).

Referring to the conventional CML D-latch circuit in Fig. 2, it has to be noticed that this topology imposes stringent requirements on the choice of the common-mode voltage for data and clock signals, which are related to VSWING. By inspecting the circuit in Fig. 2, it can be easily understood that setting VSWING = 800 mV and the supply voltage
VDD = 1 V, and considering that the output common-mode voltage of the latch (Q signal) has to be equal to the input common-mode voltage (D signal) in order to allow cascading, we are not able to properly bias all the stacked transistors, leading the respective devices into the linear (triode) region.¹ Thus, a standard CML D-latch cannot be properly designed with a 40-nm CMOS technology node under the 1-V supply voltage constraint, and different low-voltage topologies are required to get a high-speed D-latch circuit working under these constraints.

¹ Note that this condition is not suited for high-speed D-latches, because transistors are usually slower in the linear region due to the reduced driving current.

III. LOW-VOLTAGE CMOS CML LATCH CIRCUITS

To overcome the critical issues related to the low supply voltage and the need for a sufficiently large VSWING to guarantee full current steering in the differential pairs, D-latch topologies which use fewer stacked transistors than the conventional D-latch are required. As stated in the introduction, we focus on low-voltage topologies in which the tail current of the data differential pairs is well defined and which do not require additional circuitry to guarantee a stable output swing, propagation delay, and noise margin.

A. Low-Voltage Triple-Tail MOS D-Latch

A suitable topology to design a low-voltage CML D-latch can be obtained by using the MOS implementation of the triple-tail D-latch circuit, as shown in Fig. 4, originally presented using bipolar technologies in [23].

Fig. 4. MOS triple-tail D-latch.

The triple-tail principle has been recently adopted to implement low-voltage CML exclusive-OR gates [25] and D-latches [26] using CMOS technology. An expression of the noise margin of the triple-tail D-latch can be found in [26].

Referring to the circuit in Fig. 4, the gates of transistors M3–M4 and M5–M6 are connected to the input and output data, respectively, whereas the additional transistors M1 and M2 are driven by the differential clock signal and select one of the differential couples M3–M4 or M5–M6. When CK is high and $\overline{CK}$ is low, the D-latch samples the input data, since the current of the output triple tail is entirely absorbed by M2, and M5–M6 are OFF. When CK is low and $\overline{CK}$ is high, the D-latch is in hold phase, since the current of the input triple tail is entirely absorbed by M1, and M3–M4 are OFF.

It has to be noticed that this circuit is very sensitive to the common-mode voltage and to the VSWING of the clock and data signals, as well as to the aspect ratios of M1 and M2. As stated in [24], transistors M1 and M2 are assumed to have an aspect ratio much greater than the other devices in order not to reduce the output swing too much and to avoid an excessive noise margin degradation. If we define the factor M as gm1/gm3, it can be easily shown that the output voltage swing is reduced according to the following:

$$V_{\mathrm{SWING}} = 2 R_D I_{SS} \frac{M}{1+M} \tag{2}$$

Practical values of M that do not reduce the output voltage swing with respect to the input one are in the range of 8–10. Unfortunately, using M = 8–10 results in a quite large input capacitance of M1 and M2, which strongly loads the clock distribution network: this issue represents one of the main drawbacks of the triple-tail D-latch topology.

Note that in [27] a MOS CML exclusive-OR gate topology exploiting multithreshold CMOS processes and the triple-tail concept has been introduced. Indeed, the usage of transistors with different threshold voltages allows a lower value of the factor M to be used. However, since the multithreshold option is not available in all CMOS processes, we consider this solution to be outside the scope of this paper.

Fig. 5. Proposed low-voltage folded CML D-latch.

B. Low-Voltage Folded D-Latch

A novel topology of low-voltage current-mode D-latch, which we call the folded CML D-latch, is introduced here. In the proposed D-latch circuit, depicted in Fig. 5, the pMOS differential pair M1–M2 steers the current ISS through the input of current mirrors M7–M9 and M8–M10, depending on the differential clock value. The input data differential pair made up of devices M3–M4 is active when CK is high, and the latch samples the differential input signal. During the sample phase, only the input pair is active, and ISS flows through it. When CK becomes low, the output data differential pair made up of devices M5–M6 holds the output value due to the presence of
the positive feedback as in the conventional D-latch topology. During the hold phase, only the output pair M5–M6 is active, and ISS flows through it.

It has to be noticed that folding the current steering of the clock differential pair saves one level of stacked transistors, thus reducing the minimum supply voltage of the circuit with respect to the conventional CML D-latch.

Another advantage obtained using the proposed topology is that the common-mode voltage of the clock signals can be set independently of the common-mode voltage of the data signals. In the conventional D-latch, the CK and D signals cannot have the same common-mode voltage. In fact, referring to Fig. 2, choosing the same common-mode voltage for CK and D results in VDS1,2 = 0. In the proposed folded D-latch, the CK and D common-mode voltages can be set independently (even equal to each other), allowing more degrees of freedom for circuit optimization under a low-voltage constraint.

Looking at Fig. 5, it can be easily shown that the minimum supply voltage of the proposed folded D-latch is equal to the minimum supply voltage of a CML inverter.

Since the data path of the proposed folded D-latch is identical to that of the conventional CML D-latch, the expression of its noise margin can be found in [16].

In this section, we evaluate propagation delays on linearized models of the D-latch topologies discussed in Section III. The propagation delay is estimated for the worst-case scenario by considering the CK to Q delay when the clock signal switches from the hold state to the sample state and the output has to be inverted with respect to the stored value. Due to the symmetry of the circuits, our analysis is carried out on the half-circuit representation. The logic threshold has been chosen as the reference for the linearization point. We want to point out that, even if the D-latch is a digital circuit, a simple linearized model can give reasonable accuracy due to the adoption of nanometer MOS transistors (whose I–V relation is close to linear) and a low supply voltage [15].

In the CML modeling literature [14]–[19], it is common to approximate CML gates as a first-order system with a pole time constant τ and a zero time constant τz. In this section, we propose a different approach which models the latch as a two-pole system, neglecting the effects of high-frequency zeroes on the propagation delay.

The bias current dependence of the small-signal parameters has been accounted for as in [16].

A. Propagation Delay Model of the Low-Voltage Triple-Tail D-Latch

The equivalent circuit model used to evaluate the propagation delay of the low-voltage triple-tail D-latch is shown in Fig. 6, where

$$G_M = g_m \tag{3}$$

as shown in Appendix II.

Fig. 6. Equivalent circuit models for triple-tail D-latch. Computation of (a) τA and (b) τB, respectively.

In the proposed model, we split the propagation delay into two components. The first component τA is the delay associated with the voltage-to-current transfer from the gate of M1 (vCK) to the drain of M3 (iA), which is considered in short-circuit condition.

The second component τB is the delay associated with the current-to-voltage transfer from the drain of M3 (iA) to the capacitive load at the output node Q (vOUT) of the latch. (Transistor M3 acts as a current source when computing this contribution.)

The time constant τA is given by the equivalent capacitance and resistance at the common source of M1,3,4 (node s in Fig. 6) and can be expressed as follows:

$$\tau_A = R_{eqA} C_{eqA} \tag{4}$$

where

$$R_{eqA} = \frac{1}{2 G_{m3,4} + G_{m1,2}} \tag{5}$$

$$C_{eqA} = C_{gs1,2} + 2 C_{gs3,4} + 2 C_{gb3,4} \tag{6}$$

The time constant τB is given by the equivalent capacitance and resistance at the output node Q of the latch and can be expressed as follows:

$$\tau_B = R_{eqB} C_{eqB} \tag{7}$$

$$R_{eqB} = \frac{R_D \cdot r_{ds3,4}\,(1 + G_{m3,4}/G_{m1})}{R_D + r_{ds3,4}\,(1 + G_{m3,4}/G_{m1})} \cong R_D \tag{8}$$

$$C_{eqB} = 2 C_{gd3,5} + C_{gs5,6} + 2 C_{db3,5} + C_{R_D} + C_L \tag{9}$$

According to this model, the transfer function from vCK to vOUT can be viewed as a second-order transfer function with two real poles. When one of the two poles can be considered as dominant (i.e., τB/τA > 10), the second-order transfer function can be approximated by a first-order one and the propagation delay can be computed by using the well-known formula

$$\tau_{pd} \cong 0.69 \cdot \tau_B \tag{10}$$

In the general case in which neither of the two poles is dominant, the second-order system has to be studied in order to compute the propagation delay.
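As a rough numerical illustration of (4)–(10), the sketch below evaluates the two time constants of the triple-tail model and applies the dominant-pole formula (10) when τB/τA > 10. All device values (transconductances, capacitances, and the load resistance) are hypothetical placeholders chosen only to exercise the formulas; they are not taken from the design tables of this paper, and the class, field, and function names are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class TripleTailDevice:
    """Small-signal values of the triple-tail half circuit (hypothetical numbers)."""
    gm_12: float   # G_m1,2 of the clock transistors, in siemens
    gm_34: float   # G_m3,4 of the input data pair, in siemens
    cgs_12: float  # capacitances in farads, named after the terms of (6) and (9)
    cgs_34: float
    cgb_34: float
    cgd_35: float
    cgs_56: float
    cdb_35: float
    c_rd: float    # parasitic capacitance of the load resistor
    c_load: float  # external load capacitance C_L
    r_d: float     # load resistance R_D, in ohms


def tau_a(d: TripleTailDevice) -> float:
    """Time constant at the common-source node, eqs. (4)-(6)."""
    r_eq = 1.0 / (2.0 * d.gm_34 + d.gm_12)
    c_eq = d.cgs_12 + 2.0 * d.cgs_34 + 2.0 * d.cgb_34
    return r_eq * c_eq


def tau_b(d: TripleTailDevice) -> float:
    """Time constant at the output node, eqs. (7)-(9), using R_eqB ~= R_D."""
    c_eq = 2.0 * d.cgd_35 + d.cgs_56 + 2.0 * d.cdb_35 + d.c_rd + d.c_load
    return d.r_d * c_eq


def dominant_pole_delay(t_a: float, t_b: float) -> float:
    """Eq. (10); only valid when tau_B / tau_A > 10."""
    if t_b / t_a <= 10.0:
        raise ValueError("no dominant pole: the second-order system must be studied")
    return 0.69 * t_b


# Hypothetical example values (not from the paper).
example = TripleTailDevice(
    gm_12=8e-3, gm_34=1e-3,
    cgs_12=16e-15, cgs_34=2e-15, cgb_34=0.5e-15,
    cgd_35=1e-15, cgs_56=2e-15, cdb_35=1e-15,
    c_rd=1e-15, c_load=8e-15, r_d=1600.0,
)

if __name__ == "__main__":
    t_a, t_b = tau_a(example), tau_b(example)
    print(f"tau_A = {t_a * 1e12:.2f} ps, tau_B = {t_b * 1e12:.2f} ps")
    print(f"tau_pd = {dominant_pole_delay(t_a, t_b) * 1e12:.2f} ps")
```

With these placeholder values the ratio τB/τA is slightly above 10, so the dominant-pole branch applies; with a smaller load capacitance the function raises an error instead of silently returning an inaccurate estimate.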
In [28], a good approximation of the propagation delay of the second-order system is reported, which after trivial manipulations can be rewritten as

$$\tau_{pd} = \left(1 + 0.35 \cdot \frac{\tau_A + \tau_B}{\sqrt{\tau_A \tau_B}}\right) \sqrt{\tau_A \tau_B} \tag{11}$$

Equation (11) is able to give an accurate approximation of the propagation delay if τA and τB are real poles and the following condition is satisfied:²

$$\frac{\tau_B}{\tau_A} < 10 \tag{12}$$

² Condition (12) states that equation (11) is accurate when we are not in the dominant-pole condition. If we consider the limit case τB/τA = 10, equation (11) gives τpd = 0.701τB, which is in reasonable agreement with equation (10). When τB/τA > 10, equation (10) is more accurate than equation (11).

B. Propagation Delay Model of the Low-Voltage Folded D-Latch

The equivalent circuit model used for the computation of the propagation delay of the low-voltage folded D-latch is shown in Fig. 7.

Fig. 7. Equivalent circuit models for the folded D-latch. Computation of (a) τA_FL, (b) τB_FL, and (c) τC_FL, respectively.

The propagation delay can be split into three components. Each one of these components can be considered as a single-pole system, and we will give a definition of the three time constants, as in the previous case. The first time constant τA_FL is related to the voltage-to-current transfer from the gate of M1,2 (vCK) to the drain of M9,10 (iD9) in short-circuit condition

$$\tau_{A\_FL} = R_{eqA\_FL} C_{eqA\_FL} \tag{13}$$

In this case, the equivalent resistance is the resistance offered at the drain of M7,8

$$R_{eqA\_FL} = \frac{1}{G_{m7,8}} \parallel r_{ds1,2} \approx \frac{1}{G_{m7,8}} \tag{14}$$

while the capacitive contribution at the node is equal to

$$C_{eqA\_FL} = 2 C_{gd1,2} + C_{db1,2} + C_{db7,8} + C_{gs7,8} + C_{gs9,10} + C_{gd9,10} \tag{15}$$

From the drain of M9,10, now acting as a current source, to the drain of M4,5 (which is now considered in short-circuit condition), a current-to-current transfer with time constant τB_FL can be identified

$$\tau_{B\_FL} = R_{eqB\_FL} C_{eqB\_FL} \tag{16}$$

$$R_{eqB\_FL} = \frac{1}{G_{m3} + G_{m4}} \tag{17}$$

$$C_{eqB\_FL} = C_{db9,10} + C_{gd9,10} + 2 C_{gs3,4} + 2 C_{gb3,4} \tag{18}$$

The third time constant τC_FL is due to the current-to-voltage transfer from the drain of M4,5 (which acts as a current source) to the capacitive load at the output node (Q)

$$\tau_{C\_FL} = R_{eqC\_FL} C_{eqC\_FL} \tag{19}$$

where

$$R_{eqC\_FL} \approx R_D \tag{20}$$

$$C_{eqC\_FL} = 2 C_{gd3,5} + C_{gs5,6} + 2 C_{db3,5} + C_{R_D} + C_L \tag{21}$$

By using this approach, the folded D-latch is modeled as a third-order system. In order to further simplify the analysis, we can notice that the time constant τB_FL is typically smaller than the other two. This can be shown numerically by comparing (18) against (15) and (21), and (17) against (14) and (20). From a circuit perspective, the current-to-current transfer related to the time constant τB_FL can be analyzed by referring to a common-gate transistor acting as a current buffer, which is, as is well known, faster than the other configurations. Hence, if we neglect the pole due to the time constant τB_FL,
we can compute the propagation delay of the folded latch assuming a second-order system, according to (11) with τA = τA_FL and τB = τC_FL.
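The delay evaluation just described can be sketched in a few lines of Python: the folded latch is treated as a two-pole system by dropping τB_FL, and (11) is applied with τA = τA_FL and τB = τC_FL, falling back on the dominant-pole formula (10) when τB/τA > 10. The time-constant values in the example are hypothetical, and the function names are assumptions of this sketch rather than part of the original work.

```python
import math


def second_order_delay(tau_a: float, tau_b: float) -> float:
    """Eq. (11): (1 + 0.35 * (tau_a + tau_b) / sqrt(tau_a * tau_b)) * sqrt(tau_a * tau_b)."""
    root = math.sqrt(tau_a * tau_b)
    return (1.0 + 0.35 * (tau_a + tau_b) / root) * root


def dominant_pole_delay(tau_b: float) -> float:
    """Eq. (10): first-order approximation when tau_b dominates."""
    return 0.69 * tau_b


def folded_latch_delay(tau_a_fl: float, tau_c_fl: float) -> float:
    """Two-pole estimate for the folded latch, with tau_B_FL neglected.

    Condition (12) selects eq. (11); otherwise eq. (10) is used.
    """
    if tau_c_fl / tau_a_fl > 10.0:
        return dominant_pole_delay(tau_c_fl)
    return second_order_delay(tau_a_fl, tau_c_fl)


if __name__ == "__main__":
    # Hypothetical time constants, in seconds (not taken from the paper).
    tau_a_fl, tau_c_fl = 4e-12, 20e-12
    print(f"tau_pd = {folded_latch_delay(tau_a_fl, tau_c_fl) * 1e12:.2f} ps")

    # Limit case of footnote 2: tau_B / tau_A = 10 gives about 0.701 * tau_B.
    print(f"limit ratio = {second_order_delay(1.0, 10.0) / 10.0:.3f}")
```

The limit-case check at the end reproduces the 0.701τB figure quoted for τB/τA = 10, which is a convenient sanity test for the reconstruction of (11).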


In order to validate the proposed propagation delay models and to compare the performance of the triple-tail and low-voltage folded D-latches, the two circuits have been designed and simulated in a 40-nm CMOS technology whose main process parameters, defined as in [16], are reported in Table I. All the simulations have been carried out using Cadence Virtuoso with BSIM4 models provided by the IC manufacturer.

The value of the bias current ISS_opt_PDP [16] which minimizes the power delay product (PDP) in the 40-nm technology has been found to be about 150 μA. ISS = 250 μA has been assumed as the reference bias current in our design as a reasonable compromise between speed and power consumption in the power-efficient region.

The propagation delay has been measured while sweeping the bias current ISS from 100 to 500 μA: this range includes values in the low-power, power-efficient, and high-speed design regions.

VSWING has to be designed to guarantee both a minimum gain (AV > 1.4 according to [25]) and the operation of transistors in the saturation region [16]. Due to the low-voltage constraint and the limitations of the 40-nm technology, a careful design of VSWING is needed to guarantee proper operation of the latches. In fact, a lower VSWING results in lower gain. (For VSWING = 0.6 V, the gain AV is less than 1.3, as shown in Fig. 3.) On the other hand, the maximum VSWING which ensures transistor operation in the saturation region is equal to 2VTH, which is about 0.84 V in the adopted 40-nm technology. In our design, VSWING has been set to 0.8 V and kept constant in all the simulations and comparisons.

To get a fair comparison, both the triple-tail and the folded D-latch topologies have been designed by following the design procedure in [16] to exhibit the same power consumption. The resulting design parameters and transistor dimensions for the reference bias current ISS are reported in Table II and scaled with the current as in [16].

Fig. 8. Testbenches used to evaluate propagation delay of the low-voltage D-latches. (a) TB1. (b) TB2. (c) TB3.

Simulations have been carried out on three different testbenches and two different fan-out conditions, as detailed in the following. (CML inverters have been used as load and driving cells.)
1) Testbench 1 (TB1): Ideal clock source feeds the D-latch under test [see Fig. 8(a)].
2) Testbench 2 (TB2): CML inverter, biased and sized at ISS, used as buffer cell for the clock signal [see Fig. 8(b)].
3) Testbench 3 (TB3): CML inverter, biased and sized at 4ISS, used as buffer cell for the clock signal [see Fig. 8(c)].

For each one of the three testbenches, simulations have been carried out for a fan-out of 1 (FO1) and a fan-out of 5 (FO5) CML inverters biased at ISS.

TB1 has been used to validate the propagation delay models presented in Section IV: a comparison of propagation delay
models and simulations is shown in Fig. 9, showing an agreement of model versus simulations within 15%.

Fig. 9. Comparison of propagation delay model versus bias current against simulations. (a) Triple-tail D-latch. (b) Folded D-latch (TB1).

Fig. 10. Propagation delay plot versus bias current for triple-tail D-latch referring to TB2. Case (a) FO = 1 and (b) FO = 5, respectively.

Simulation results shown in Fig. 9 confirm that, when both latch topologies are driven by an ideal clock source, they exhibit quite similar propagation delays. In fact, for ISS set to 250 μA and FO = 1, the propagation delay is 10.45 ps (10.93 ps) for the triple-tail (folded) D-latch. For ISS set to 250 μA and FO = 5, the propagation delay is 16.03 ps (15.88 ps) for the triple-tail (folded) D-latch.

Simulations referring to TB2 are shown in Fig. 10 (Fig. 11) for the triple-tail (folded) D-latch. In the case of the triple-tail D-latch (FO = 1), the propagation delay of the clock buffer is greater than the latch's delay. In fact, the clock buffer suffers from the high capacitive load due to M1,2. This is a critical drawback for the triple-tail D-latch.

Simulations referring to TB3 are shown in Fig. 12 (Fig. 13) for the triple-tail (folded) D-latch.

In simulations referring to TB3, the clock buffer is biased and sized for 4ISS and is able to adequately drive the high capacitive load due to M1,2 in the triple-tail D-latch at the expense of higher power consumption. Under this condition, the triple-tail and the proposed D-latches again exhibit similar propagation delays.

Analyzing the results of simulations referring to TB2 and TB3, it is evident that the proposed folded D-latch outperforms the triple-tail D-latch in all the cases in which the clock distribution network is based on CML inverters sized and biased for typical current values (i.e., ranging from fractions of the D-latch bias current ISS up to three or four times ISS). The delay improvement provided by the proposed topology over the triple-tail latch is more evident when the clock distribution network is made up of CML inverters biased with low-current values.

A. Comparison of the Results

A summary of propagation delays for ISS set to 250 μA is reported in Table III for the different testbenches and fan-out conditions.

Table III confirms a propagation delay improvement higher than 30% in the TB2 case, both for FO1 and FO5 conditions, for the proposed folded D-latch topology over the triple-tail one.

Propagation delays of the folded and triple-tail D-latches for ISS ranging from 100 to 500 μA are compared in Fig. 14 for FO1 (a) and FO5 (b) conditions.

Results in Fig. 14 confirm the propagation delay improvement in TB2 case both for FO1 and FO5 conditions for the
proposed folded D-latch topology over the triple-tail one for ISS ranging from 100 to 500 μA.

Fig. 11. Propagation delay plot versus bias current for the folded D-latch referring to TB2. Case (a) FO = 1 and (b) FO = 5, respectively.

Fig. 12. Propagation delay plot versus bias current for triple-tail D-latch referring to TB3. Case (a) FO = 1 and (b) FO = 5, respectively.

As a measure of the sensitivity with respect to the clock edge rate, a comparison in terms of clock phase margin (CPM) [29] of the folded and triple-tail D-latches for different clock frequencies is reported in Table IV.

The CPM specifies the maximum deviation of a clock strobe edge from the center of the input data eye which will not introduce errors when resampling the output data. A large CPM means that the data regenerator may tolerate phase misalignments in the clock recovery, and also some routing delay in the recovered clock used as its strobe.

In order to show the significant advantage of the proposed low-voltage CML D-latch, the triple-tail D-latch and the proposed folded D-latch have been used as basic elements for the design of master–slave DFFs based on the well-known architecture shown in Fig. 15.
Fig. 13. Propagation delay plot versus bias current for the folded D-latch referring to TB3. Case (a) FO = 1 and (b) FO = 5, respectively.

Fig. 14. Comparison of the folded versus triple-tail D-latches propagation delay for TB2. (a) FO1. (b) FO5.

Fig. 15. Master–slave DFF architecture.

Note that when designing a DFF using the triple-tail D-latch as a basic element, two full instances of the circuit in Fig. 4 have to be used.

On the other hand, when designing a DFF using the proposed folded D-latch as a basic element, it is possible to share the clock switching part of the latch (M1, M2, M7, M8, and the ISS current source) between the two latch instances. In fact, as shown in Fig. 16, transistors M9B and M10B can be used to implement additional outputs of the clock current steering differential pair through the current mirrors M7–M9 and M8–M10. [The inverted phase of the clock signal is implemented by properly connecting the additional outputs of the clock current steering part to the second D-latch (see Fig. 16).]

As a drawback of the proposed DFF topology in Fig. 16, it has to be noted that, due to the additional current mirror outputs, the capacitance at the gate nodes of M7 and M8 is increased, and the time constant τA_FL defined in Section IV is increased too.

The possibility of sharing the same clock steering pair between the two latches of the DFF reduces the load presented by the DFF to the clock distribution network. This introduces additional degrees of freedom in the design (i.e., the tail current of the clock differential pair and the aspect ratio of the transistors in the current mirrors) which can be exploited to reduce the power consumption or to optimize speed performance. As an example, in the overall current budget, the proposed DFF topology saves the current of one clock buffer: this saved current can be used to increase the tail current of the clock differential pair M1, M2, thus improving the frequency response of the current mirrors.

An improved version of the clock switching part of the DFF is shown in Fig. 17 [30]. Referring to the circuit in Fig. 17, transistors M7 and M8 are replaced by transistors M7A, M7B, M8A, and M8B, which are connected in a feedback arrangement to improve the accuracy of the current mirrors.


In fact, M7B and M8B can be sized to be equal to M3 and M4 (see Fig. 16), and the bias voltage VB can be set equal to the common-mode voltage of the D signals to equalize the drain–source voltage VDS of transistors M7A, M9, M9B, M8A, M10, and M10B. In this way, the channel length modulation effect is minimized and the accuracy of the current mirrors is strongly improved.

Fig. 16. Schematic of the proposed folded DFF.

Fig. 17. Detail of the improved clock switching part of the folded DFF.

In order to demonstrate the advantages provided by the proposed DFF topology, we have compared the DFF implemented using triple-tail latches against the proposed DFF shown in Fig. 16, using the clock switching stage shown in Fig. 17.

Fig. 18 shows the comparison of the “clock to Q” delay of the folded and triple-tail DFFs for different bias current settings. The comparison has been carried out considering the same power consumption for both DFFs, which has been fixed to 6·ISS including clock driving buffers. The performance improvement has been found to be 42% on average and 35% minimum over the whole considered ISS range.

Fig. 18. Comparison of propagation delay for the folded and triple-tail DFFs.

As a further comparison between the folded and triple-tail DFFs, we have considered the simulation testbench shown in Fig. 19, based on a clock frequency divider-by-4 circuit in which the maximum output frequency has been evaluated. Also in this case, the DFFs have been designed with the same power consumption, fixed to 6·ISS including clock driving buffers.

Fig. 19. Simulation testbench based on clock frequency divider-by-4 to evaluate the maximum toggle frequency.

The maximum output frequency of the clock frequency divider-by-4 circuit is reported in Fig. 20 as a function of ISS: the performance improvement in the maximum toggle frequency has been found to be 48% on average and 54% maximum, showing how the folded DFF-based circuit outperforms the triple-tail-based one in all ISS conditions.

Fig. 20. Maximum output frequency of the clock frequency divider-by-4 circuit for different ISS settings.
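As a purely behavioral illustration of the master–slave DFF architecture and of a divider-by-4 built from it, the following Python sketch may help in following the testbench logic. It is a zero-delay logic model only: it says nothing about CML voltage levels, bias currents, or the maximum toggle frequency discussed above. The class and function names (DLatch, MasterSlaveDFF, divide_by_4, rising_edges) are hypothetical, and the ripple cascade of two divide-by-2 stages is an assumption about the testbench structure, which is not detailed in the text.

```python
class DLatch:
    """Level-sensitive D-latch: transparent while enable is true, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Two latches driven by opposite clock phases (behavioral model only)."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    @property
    def q(self):
        return self.slave.q

    def step(self, d, clk):
        # Master is transparent while clk is low, slave while clk is high,
        # so Q changes only after a rising edge of clk.
        self.master.update(d, enable=(clk == 0))
        self.slave.update(self.master.q, enable=(clk == 1))
        return self.slave.q


def divide_by_4(clk_samples):
    """Assumed structure: two cascaded divide-by-2 stages (D tied to inverted Q)."""
    ff1, ff2 = MasterSlaveDFF(), MasterSlaveDFF()
    out = []
    for clk in clk_samples:
        q1 = ff1.step(1 - ff1.q, clk)   # first stage toggles on each clk rising edge
        q2 = ff2.step(1 - ff2.q, q1)    # second stage is clocked by the first stage
        out.append(q2)
    return out


def rising_edges(samples):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)


clk = [0, 1] * 32                 # 32 clock periods, two samples per period
out = divide_by_4(clk)
print(rising_edges(clk), rising_edges(out))
```

In this model the output shows one rising edge for every four rising edges of the input clock, which is the divide-by-4 behavior the testbench relies on.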



In this paper, we have analyzed the issue of implementing high-speed CML D-latches and DFFs using sub-50-nm CMOS technologies under a low-voltage constraint.

The conventional CML D-latch topology has been shown to be unsuitable for implementation with supply voltages lower than 1 V using a 40-nm CMOS process. A novel low-voltage CML D-latch topology has been introduced and compared against the low-voltage triple-tail D-latch. A simple model of the propagation delay has been derived for both low-voltage D-latch topologies and design guidelines have been provided. Simulations under different performance/power consumption tradeoffs have shown more than 30% speed improvement of the proposed folded D-latch topology against the triple-tail D-latch in a wide range of speed/power tradeoff conditions.

It is worth noting that the proposed folded D-latch has been compared with a D-latch circuit without current source as presented in [21] and [22]. Results show that the proposed solution maintains its robustness (unlike the D-latch without current source) while providing a propagation delay that is almost equal or even better (see Appendix I).

A low-voltage CML DFF topology exploiting the folded D-latch and using an improved clock switching circuit has also been proposed, showing up to 54% speed improvement over the triple-tail latch-based DFF in a clock frequency divider-by-4 circuit.

APPENDIX I

The proposed folded D-latch has been compared against a D-latch circuit in which current source transistors are directly driven by the clock signals as in [21] and [22] according to TB1, for the same reference current ISS = 250 μA and using the same data differential pairs. If driven by large swing clock signals (i.e., from 0 to 700 mV of clock swing), the circuit following the approach in [21] and [22] is slower than the folded D-latch. By carefully designing the clock signal swing (i.e., from 450 to 700 mV), the same circuit exhibits a propagation delay of 9.5 ps and is about 15% faster than the folded D-latch circuit. Similar results have been found for ISS ranging from 100 to 500 μA.

APPENDIX II

As shown in [16], the large-signal transconductance $G_M$ can be expressed as a function of $g_m$

$$G_M = \frac{i_D}{v_{GS}} \approx \frac{g_m}{0.6 + 0.4 \cdot \alpha} \tag{A.1}$$

where, referring to the well-known alpha-power law [16],

$$g_m = \alpha \cdot (K \cdot W)^{\frac{1}{\alpha}} \left(\frac{I_{SS}}{2}\right)^{1 - \frac{1}{\alpha}} \tag{A.2}$$

$$W = \frac{2^{2\alpha - 1}}{K} \cdot \left(\frac{A_V}{\alpha \cdot V_{SWING}}\right)^{\alpha} \cdot I_{SS}. \tag{A.3}$$

For scaled processes, since $\alpha$ is close to unity, (A.1) can be approximated as

$$G_M \approx g_m. \tag{A.4}$$

REFERENCES

[1] S. Han, T. Kim, J. Kim, and J. Kim, “A 10 Gbps SerDes for wireless chip-to-chip communication,” in Proc. Int. SoC Design Conf. (ISOCC), 2015, pp. 17–18.
[2] S. Hua, Q. Wang, H. Yan, D. Wang, and C. Hou, “A high speed low power interface for inter-die communication,” in Proc. Int. Conf. Solid-State Integr. Circuit Technol. (ICSICT), 2010, pp. 1916–1918.
[3] H.-J. Jeon, R. Kulkarni, Y.-C. Lo, J. Kim, and J. Silva-Martinez, “A bang-bang clock and data recovery using mixed mode adaptive loop gain strategy,” IEEE J. Solid-State Circuits, vol. 48, no. 6, pp. 1398–1415, Jun. 2013.
[4] S.-H. Chu et al., “A 22 to 26.5 Gb/s optical receiver with all-digital clock and data recovery in a 65 nm CMOS process,” IEEE J. Solid-State Circuits, vol. 50, no. 11, pp. 2603–2612, Nov. 2015.
[5] G. Shu et al., “A 4-to-10.5 Gb/s continuous-rate digital clock and data recovery with automatic frequency acquisition,” IEEE J. Solid-State Circuits, vol. 51, no. 2, pp. 428–439, Feb. 2016.
[6] T. Chalvatzis, E. Gagnon, M. Repeta, and S. P. Voinigescu, “A low-noise 40-GS/s continuous-time bandpass ΔΣ ADC centered at 2 GHz for direct sampling receivers,” IEEE J. Solid-State Circuits, vol. 42, no. 5, pp. 1065–1075, May 2007.
[7] A. Ghilioni, A. Mazzanti, and F. Svelto, “Analysis and design of mm-wave frequency dividers based on dynamic latches with load modulation,” IEEE J. Solid-State Circuits, vol. 48, no. 8, pp. 1842–1850, Aug. 2013.
[8] J. Kang, P. Qin, X. Li, and T. Mo, “13 GHz programmable frequency divider in 65 nm CMOS,” in Proc. Int. Conf. Solid-State Integr. Circuit Technol. (ICSICT), 2012, pp. 1–3.
[9] J. Yang, J. Shi, P. Ma, and S. Zhang, “A wideband and high-speed frequency divider,” in Proc. Int. Conf. Solid-State Integr. Circuit Technol. (ICSICT), 2014, pp. 1–3.
[10] M. Alioto, R. Mita, and G. Palumbo, “Design of high-speed power-efficient MOS current-mode logic frequency dividers,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 11, pp. 1165–1169, Nov. 2006.
[11] X. Zhang, Y. Wang, S. Jia, G. Zhang, and X. Zhang, “A novel CML latch for ultra high speed applications,” in Proc. Int. Conf. Electron Devices Solid-State Circuits (EDSSC), 2014, pp. 1–2.
[12] W.-Y. Tsai, C.-T. Chiu, J.-M. Wu, S.-H. Hsu, and Y.-S. Hsu, “A novel MUX-FF circuit for low power and high speed serial link interfaces,” in Proc. Int. Symp. Circuits Syst., 2010, pp. 4305–4308.
[13] U. Singh, L. Li, and M. M. Green, “A 34 Gb/s distributed 2:1 MUX and CMU using 0.18 μm CMOS,” IEEE J. Solid-State Circuits, vol. 41, no. 9, pp. 2067–2076, Sep. 2006.
[14] O. Schrape, M. Appel, F. Winkler, and M. Krstić, “Low-power design methodology for CML and ECL circuits,” in Proc. Int. Workshop Power Timing Modeling, Optim. Simulation (PATMOS), 2014, pp. 1–5.
[15] M. Alioto and G. Palumbo, Model and Design of Bipolar and MOS Current-Mode Logic: CML, ECL and SCL Digital Circuits. New York, NY, USA: Springer, 2015.
[16] M. Alioto and G. Palumbo, “Power-aware design techniques for nanometer MOS current-mode logic gates: A design framework,” IEEE Circuits Syst. Mag., vol. 6, no. 4, pp. 40–59, 4th Quart., 2006.
[17] A. Kapoor, Y. Hu, and R. Bashirullah, “A current-density centric logical effort delay and power model for high-speed CML gates,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 10, pp. 2618–2630, Oct. 2013.
[18] N. Singh and S. Deb, “Analysis and design guidelines for customized logic families in CMOS,” in Proc. Int. Symp. VLSI Design Test (VDAT), 2015, pp. 1–2.
[19] I. Jang, Y. Lee, S. Kim, and J. Kim, “Power-performance tradeoff analysis of CML-based high-speed transmitter designs using circuit-level optimization,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 4, pp. 540–550, Apr. 2016.
[20] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[21] B. Razavi, “The role of PLLs in future wireline transmitters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 8, pp. 1786–1793, Aug. 2009.
[22] P. Payandehnia, H. Maghami, S. Sheikhaei, A. Abbasfar, B. Forouzandeh, and K. Nanbakhsh, “High speed CML latch using active inductor in 0.18 μm CMOS technology,” in Proc. IEEE Iranian Conf. Elect. Eng. (ICEE), May 2011, pp. 1–4.
[23] B. Razavi, Y. Ota, and R. G. Swartz, “Design techniques for low-voltage high-speed digital bipolar circuits,” IEEE J. Solid-State Circuits, vol. 29, no. 3, pp. 332–339, Mar. 1994.


[24] M. Alioto, R. Mita, and G. Palumbo, “Performance evaluation of the low-voltage CML D-latch topology,” Integr., VLSI J., vol. 36, no. 4, pp. 191–209, 2003.
[25] K. Gupta, N. Pandey, and M. Gupta, “Analysis and design of MOS current mode logic exclusive-OR gate using triple-tail cells,” Microelectron. J., vol. 44, no. 6, pp. 561–567, 2013.
[26] K. Gupta, N. Pandey, and M. Gupta, “MCML D-latch using triple-tail cells: Analysis and design,” Active Passive Electron. Compon., vol. 2013, pp. 1–9, 2013, doi: 10.1155/2013/217674.
[27] N. Pandey, K. Gupta, G. Bhatia, and B. Choudhary, “MOS current mode logic exclusive-OR gate using multi-threshold triple-tail cells,” Microelectron. J., vol. 57, no. 11, pp. 13–20, Nov. 2016.
[28] B. Kuo, Automatic Control Systems, 3rd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1975.
[29] S. K. Enam and A. A. Abidi, “NMOS IC’s for clock and data regeneration in gigabit-per-second optical-fiber receivers,” IEEE J. Solid-State Circuits, vol. 27, no. 12, pp. 1763–1774, Dec. 1992.
[30] J. Ramirez-Angulo, R. G. Carvajal, and A. Torralba, “Low supply voltage high-performance CMOS current mirror with low input and output voltage requirements,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 3, pp. 124–129, Mar. 2004.

Giuseppe Scotti was born in Cagliari, Italy, in 1975. He received the M.S. and Ph.D. degrees in electronic engineering from the University of Rome “La Sapienza,” Rome, Italy, in 1999 and 2003, respectively.
In 2010, he joined the Department of Information Engineering, Electronics and Telecommunications, University of Rome “La Sapienza,” as a Researcher (Assistant Professor), where he was appointed as an Associate Professor in 2015. He teaches undergraduate and graduate courses on basic electronics and microelectronics. His research activity was mainly concerned with integrated circuits design and focused on design methodologies able to guarantee robustness with respect to parameter variations in both analog circuits and digital VLSI circuits. In the context of analog design, his research activity was concerned with circuit topologies for the realization of low-voltage analog building blocks using ultra-short channel CMOS technologies and with the development of current mode analog functions. He has also been involved in research and development activities held in collaboration between “La Sapienza” University and some industrial partners which led, between 2000 and 2015, to the implementation of 13 application-specific integrated circuits. He has co-authored more than 45 publications in international journals and 70 contributions in conference proceedings, and is a co-inventor of two international patents.

Davide Bellizia was born in 1989. He received the bachelor’s degree in electronic engineering and the master’s (summa cum laude) degree in electronic design from the University “La Sapienza,” Rome, Italy, in 2011 and 2014, respectively, where he is currently pursuing the Ph.D. degree with the Dipartimento di Ingegneria dell’Informazione, Elettronica e Telecomunicazioni.
His current research interests include the design of cryptographic ICs for counteracting power analysis attacks and VLSI design for DSP algorithm implementations.
Mr. Bellizia received the “Laureato Eccellente” Award for the best graduated student of the year in 2014.

Alessandro Trifiletti was born in Rome, Italy, in 1959. He received the Laurea degree in electronic engineering from the Università di Rome “La Sapienza,” Rome, Italy.
In 1991, he joined the Dipartimento di Ingegneria Elettronica, Università di Rome “La Sapienza,” as a Research Assistant, where he is currently an Associate Professor. He has authored over 70 international journal papers and 120 contributions in conference proceedings. His current research interests include high-speed circuit design techniques, III–V device modeling, DSP techniques to enhance analog circuit performance, techniques to improve resilience to security attacks in VLSI ICs, and robust design methodologies.

Gaetano Palumbo (F’07) was born in Catania, Italy, in 1964. He received the Laurea degree in electrical engineering and the Ph.D. degree from the University of Catania, Catania, in 1988 and 1993, respectively.
In 1994, he joined the University of Catania, where he is currently a Full Professor. He has co-authored four books (Kluwer Academic Publishers and Springer in 1999, 2001, 2005, and 2014), a textbook on electronic devices in 2005, and several patents. He has authored more than 400 scientific papers in refereed international journals (more than 170) and conferences. His current research interests include analog and digital circuits.
Mr. Palumbo served as a member of the Board of Governors of the IEEE CAS Society, from 2011 to 2013. In 2003, he received the Darlington Award. He served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS PART I during 1999–2001, 2004–2005, and 2008–2011, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS PART II during 2006–2007. In 2005, he was one of the 12 panelists in the scientific-disciplinary area 09 (industrial and information engineering) of the CIVR (Committee for Italian Research Assessment). In 2015, he was a panelist of the Group of Evaluation Experts in the scientific area 09 (industrial and information engineering) of the ANVUR for the Assessment of Italian Research Quality during 2011–2014.
