Sense-Amplifier-Based Flip-Flop With Transition Completion Detection For Low-Voltage Operation
Abstract: A novel high-speed and highly reliable sense-amplifier-based flip-flop with transition completion detection (SAFF-TCD) is proposed for low supply voltage (VDD) operation. The SAFF-TCD adopts an internally generated detection signal to indicate the completion of the sense-amplifier stage transition. The detection signal gates the pull-down path of the sense-amplifier stage and the slave latch, thus overcoming the operational yield degradation, current contention, and glitches of previous SAFFs. The operational yield, speed, hold time, energy consumption, and area of the proposed and previous FFs are quantitatively compared for a wide range of VDD with 22-nm FinFET technology. It is shown that the minimum VDD of the SAFF-TCD is 573 mV lower than that of previous SAFFs, which means the SAFF-TCD can operate even when VDD is in the near-threshold or subthreshold region. At 0.3–0.4 V, the SAFF-TCD operates twice as fast as the master–slave-based FF (MSFF) with a practical hold time. Even with these benefits, the energy consumption overhead is limited to less than 20% compared with that of MSFF, and the area is similar to that of previous SAFFs.
Index Terms: Flip-flop (FF), low-voltage circuit design, sense amplifier (SA).
I. INTRODUCTION
VDD decreases, leading to a large variation in gate delay. As a result, the setup time, tsetup, in master–slave-based edge-triggered FFs [3]–[6], which is determined by the worst case variation, is significantly increased [7]. In the pulse-triggered FFs proposed in [8]–[11], this problem is resolved. Input D of the pulse-triggered FFs starts to be sampled by the latch right after the clock rising edge, which results in near-zero or negative tsetup. However, these FFs suffer from conflicting requirements for the width of the sampling window. A very small width cannot guarantee that the input data value properly propagates into the latch, whereas a large width increases the hold time, thold. This so-called sizing problem becomes more severe as variation effects increase in low VDD regions, because the pulsewidth required to reliably propagate the input into the latch and thold are determined by the respective worst variation corners. There are also approaches to achieve low VDD operation of FFs by utilizing 28-nm fully depleted silicon on insulator (FD-SOI) with back biasing [12], [13]. With back biasing, circuit designers can control Vth dynamically, which widens the operating voltage range. Especially in [13], it is demonstrated that nonvolatile
TABLE I
TECHNOLOGY PARAMETERS FOR VDD = 0.8 V
TABLE II
DESIGN RULES FOR 22-nm FinFET
TABLE III
FIN NUMBERS FOR AN SA STAGE
at low VDD, ION becomes larger with higher temperature by Vth effect. For the previous SAFFs, whose VDD,min is critically determined by high VDD operation, the yield becomes worse at the cold temperature because of the higher ION of MN4. On the other hand, for the proposed SAFF-TCD, the effect of MN4 on the yield is negligible. Instead, the mismatch between the input transistors of the SA stage (MN2 and MN3 in Fig. 6) critically determines the yield. In the low VDD region, where VDD,min of SAFF-TCD is determined, the voltage difference between X and Y at the beginning of the sensing procedure, which is developed by the two input transistors, is smaller at the cold temperature because of the smaller ION. As a result, VDD,min of SAFF-TCD is determined at the cold temperature condition.
VDD,min of the previous SAFFs is 735 mV. In other words, the previous SAFFs cannot satisfy the target yield if VDD is less than 735 mV. However, VDD,min of SAFF-TCD is 162 mV, an improvement of 573 mV. This is attributed to the adaptive turn-ON of MN4 controlled by TC. Thus, when the Nfin values for the transistors in the SA stage are set as shown in Table III, SAFF-TCD can operate in the near-threshold or subthreshold region, whereas the previous SAFFs cannot. Because the variation effect increases rapidly as VDD is lowered in the sub-Vth region, the operational yield of SAFF-TCD drops steeply for VDD < 200 mV. Thus, it is highly unstable to operate SAFF-TCD near its VDD,min of 162 mV, and it is more appropriate to determine VDD,min with a safety margin, for example, VDD,min = 200 mV, at which the operational yield is much higher than the target yield. Nevertheless, this conservative VDD,min of SAFF-TCD is still far better than VDD,min of the previously proposed SAFFs.
Fig. 14 compares the waveforms of /R and /S in SAFF-TCD with those in the previous SAFFs when VDD is at the near-threshold voltage (400 mV) and the sampled D is high. It is observed that /R and /S are correctly developed in SAFF-TCD, unlike in the previous SAFFs.
Fig. 15. D to Q delay (tDQ) versus D to CLK delay (tDC) for various FFs at TT corner with VDD = 0.8 V.
Fig. 15 compares tDQ versus D-to-CLK delay (tDC) curves of various FFs at the TT corner with VDD = 0.8 V. According to [2], tsetup is derived as the tDC that minimizes tDQ. As explained in Section II, PowerPC FF has the largest tsetup, because D should be well copied into the master latch sufficiently long before the CLK rising edge, and TGPL has the smallest tsetup, because D can be captured for a positive time period, TON, after the CLK rising edge. The four previously proposed SAFFs (Montanaro’s SAFF, Nikolic’s SAFF, Kim’s SAFF, and Strollo’s SAFF) have the same tsetup, because they have identical SA stages. As also mentioned in Section II, tsetup of these four previously proposed SAFFs is close to zero or negative, since the sampling of D starts at the CLK rising edge and ends as soon as D is latched inside the SA stage. SAFF-TCD has a slightly smaller (better) tsetup than the previous SAFFs. This is because, even if D stabilizes later than in the previous SAFFs and the difference between D and /D at the sampling edge is smaller, the SA stage of SAFF-TCD operates more quickly and correctly because MN4 is turned OFF during the transition.
Fig. 16. Comparison of 3σ worst case values of tsetup in various FFs for different VDD values.
Fig. 16 compares 3σ worst case values of tsetup in various FFs for different VDD values. As stated in Section II, tsetup of PowerPC is greatly increased at low VDD, which results in a significant performance degradation. For the previous SAFFs, tsetup can only be obtained when VDD ≥ 800 mV, because they have VDD,min of 735 mV. For the previous SAFFs and SAFF-TCD, tsetup is negative or close to zero, as explained previously.
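As a rough illustration of the tsetup definition used for Fig. 15, the sketch below picks the D-to-CLK delay that minimizes the D-to-Q delay from a swept curve. The function name, the sign convention (positive tDC meaning D settles before the CLK rising edge), and the sample values are assumptions made for illustration only; they are not taken from the simulations reported here.

```python
def setup_time_from_sweep(t_dc, t_dq):
    """Return the D-to-CLK delay (tDC) at which the D-to-Q delay (tDQ) is smallest.

    t_dc and t_dq are equal-length sequences obtained from a delay sweep.
    Assumed convention: tDC > 0 means D settles before the CLK rising edge.
    """
    if len(t_dc) != len(t_dq) or len(t_dc) == 0:
        raise ValueError("t_dc and t_dq must be non-empty and of equal length")
    # Index of the sweep point with the minimum D-to-Q delay.
    best = min(range(len(t_dq)), key=lambda i: t_dq[i])
    return t_dc[best]


# Hypothetical sweep in picoseconds (illustrative values, not measured data).
t_dc_ps = [-20, -10, 0, 10, 20, 30, 40]
t_dq_ps = [95, 70, 58, 55, 60, 68, 77]

t_setup_ps = setup_time_from_sweep(t_dc_ps, t_dq_ps)
print(f"tsetup = {t_setup_ps} ps")
```

Under the assumed sign convention, a sweep whose minimum falls at a negative tDC would correspond to the negative tsetup described for the SAFFs.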
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 17. 3σ worst tsetup versus VDD at −40 °C and 120 °C for (a) PowerPC FF, (b) TGPL, (c) SAFF-TCD, and (d) previous SAFFs.
Fig. 21. 3σ worst thold of (a) SAFF-TCD and (b) TGPL at cold and hot
temperature corners.
Fig. 20. thold,3σ in various FFs and thold,3σ / Tclk .
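The ratio plotted in Fig. 20 can be checked against a practical limit with a few lines of code. The sketch below is a minimal example that assumes the limit of 10% of Tclk that the text attributes to [28]; the function names and the sample numbers are hypothetical and only mirror the order of magnitude of the ratios discussed in the text.

```python
def hold_ratio(t_hold_3sigma, t_clk):
    """Return thold,3sigma as a fraction of the clock period Tclk."""
    if t_clk <= 0:
        raise ValueError("t_clk must be positive")
    return t_hold_3sigma / t_clk


def within_practical_limit(t_hold_3sigma, t_clk, limit=0.10):
    """True if the hold-time fraction does not exceed the assumed limit (10% by default)."""
    return hold_ratio(t_hold_3sigma, t_clk) <= limit


# Hypothetical (thold_3sigma, Tclk) pairs in nanoseconds, for illustration only.
examples = {
    "wide_window_ff": (44.0, 100.0),   # hold time is 44% of the clock period
    "narrow_window_ff": (4.0, 100.0),  # hold time is 4% of the clock period
}

for name, (t_hold, t_clk) in examples.items():
    ratio = hold_ratio(t_hold, t_clk)
    print(f"{name}: {ratio:.0%} of Tclk, practical = {within_practical_limit(t_hold, t_clk)}")
```

Both quantities must be in the same time unit; the limit is kept as a parameter because the 10% figure is a guideline rather than a fixed property of any FF.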
over 44% of Tclk at 300 mV, as shown in Fig. 20. According to [28], thold should be limited to 10% of Tclk for the practical use of FFs. Thus, it can be concluded that TGPL is not applicable at low voltages. However, in SAFF-TCD, thold,3σ is kept below 10% of Tclk even at low VDD, as the sampling window in the SA stage is shut as soon as the data have been latched. The results for Montanaro’s SAFF are included only for VDD ≥ 800 mV, where the target yield can be achieved. In this region, SAFF-TCD has a larger thold,3σ than Montanaro’s SAFF, because the NAND2 that implements the TCD scheme increases the capacitance of /R and /S. However, because of the low variation effect in the high VDD region, SAFF-TCD still has a small thold,3σ (<4% of Tclk), which is sufficiently lower than the practical limit. In Fig. 21(a) and (b), thold versus VDD is derived at the cold and hot temperature corners for SAFF-TCD and TGPL, respectively. Similar to tsetup and tDQ, for VDD ≥ 700 mV, thold is larger at the hot temperature corner, because ION becomes smaller at hot temperature, which increases the time required to sufficiently develop the internal nodes of SAFF-TCD and TGPL after the clock edge. On the contrary, for VDD < 700 mV, thold is larger at the cold temperature corner, in which ION is smaller than at the hot temperature corner.
Fig. 22 shows the energy consumption of various FFs for different values of the activity factor, α; the energy was measured by the method demonstrated in [29]. It is assumed that the short pulse enabling TGPL is generated implicitly as in [8]. As mentioned in Section II, a large number or a large size of transistors should be used in a pulse generator to obtain a sufficiently large TON that can tolerate the large variation effect at low VDD, which causes large power consumption. Thus, TGPL consumes more energy than any other FF.
By comparing the energy consumption for different values of α, it can be observed that PowerPC FF is highly sensitive to α compared with the SAFFs. This is because, at the CLK rising edge, the internal nodes of PowerPC FF switch only when D has changed from the previous value, whereas the internal nodes of the SAFFs switch at every CLK rising edge whether or not D switches. TGPL is also less sensitive to α, compared with PowerPC, because the energy consumption is dominated by the pulse generation, which is performed every CLK edge.
At high VDD, even though most SAFFs exhibit comparable results, Kim’s SAFF consumes the most energy because of the current contention at the output nodes. At low VDD values, where only PowerPC FF and SAFF-TCD can operate, SAFF-TCD consumes 15%–20% more energy. This is because, even when D is the same as the previous Q, /R, /S, and TC must be switched every CLK cycle, whereas only CLK and /CLK switch in the PowerPC FF. However, this energy overhead seems tolerable, considering the twofold speed benefit of SAFF-TCD and the unique advantages of SAFFs, such as level-shifting applications and differential signal availability.
Compared with Strollo’s SAFF, the proposed SAFF-TCD does not show a significant improvement in speed and energy. However, the proposed SAFF-TCD has the advantage in terms of operating voltage, as previously mentioned. In other words, SAFF-TCD can operate over a much wider supply voltage range than Strollo’s SAFF.
Fig. 23 shows the layout of SAFF-TCD based on the design rules listed in Table II, and Table IV compares the layout area of various FFs, normalized to that of the PowerPC FF. With the exception of the conventional SAFF, the areas are comparable.
As stated in Section II, the previous SAFFs can also achieve target operational yields at low VDD if larger Nfin values are used in the SA stage, except for the always-turned-ON transistor MN4. Fig. 24 shows the ratio of the Nfin required to achieve the 3σ yield (Nfin,3σ) to the typical Nfin set (Nfin,typ) given in Table III for different VDD values. To compensate for the effect of variation and reduced voltage headroom at low VDD, this should be accompanied by an extremely large scaling-up using multiple fingers. This results in a large overhead in terms of area, performance, and energy. Stacking transistors to reduce the driving strength of MN4 may be another approach. However, the required stack number for the target yield is significant at low VDD (12 for 300 mV and 7 for 400 mV, according to simulations), which results in a large area overhead. More importantly, with large stack
numbers at low VDD, the noise immunity of the /R and /S nodes becomes degraded because of the weak pull-down path. Thus, a structural solution for enhancing the operational yield of the SA stage in SAFFs, such as that offered by the proposed SAFF-TCD, is essential for operating SAFFs at low VDD.
Fig. 22. Energy consumption of various FFs for activity factors, α. (a) α = 0. (b) α = 0.2. (c) α = 0.4. (d) α = 0.6. (e) α = 0.8. (f) α = 1.
Fig. 23. Layout of SAFF-TCD.
Fig. 24. Required scale-up for low voltage to achieve target operational yield.
TABLE IV
AREA COMPARISON OF FFS
REFERENCES
[1] H. Kaul, M. Anders, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, “Near-threshold voltage (NTV) design: Opportunities and challenges,” in Proc. 49th ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2012, pp. 1149–1154.
[2] D. Jeon, M. Seok, C. Chakrabarti, D. Blaauw, and D. Sylvester, “A super-pipelined energy efficient subthreshold 240 MS/s FFT core in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 23–34, Jan. 2012.
[3] Y. Suzuki, K. Odagawa, and T. Abe, “Clocked CMOS calculator circuitry,” IEEE J. Solid-State Circuits, vol. SSC-8, no. 6, pp. 462–469, Dec. 1973.
[4] G. Gerosa et al., “A 2.2 W, 80 MHz superscalar RISC microprocessor,” IEEE J. Solid-State Circuits, vol. 29, no. 12, pp. 1440–1454, Dec. 1994.
[5] D. Markovic, B. Nikolic, and R. W. Brodersen, “Analysis and design of low-energy flip-flops,” in Proc. Int. Symp. Low Power Electron. Design, 2001, pp. 52–55.
[6] C. K. Teh, T. Fujita, H. Hara, and M. Hamada, “A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2011, pp. 338–340.
[7] N. Lotze, M. Ortmanns, and Y. Manoli, “Variability of flip-flop timing at sub-threshold voltages,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2008, pp. 221–224.
[8] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, “Flow-through latch and edge-triggered flip-flop hybrid elements,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 1996, pp. 138–139.
[9] Z. Peiyi, T. Darwish, and M. Bayoumi, “Low power and high speed explicit-pulsed flip-flops,” in Proc. 45th Midwest Symp. Circuits Syst. (MWSCAS), vol. 2, Aug. 2002, pp. II-477–II-480.
[10] F. Klass et al., “A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors,” IEEE J. Solid-State Circuits, vol. 34, no. 5, pp. 712–716, May 1999.
[11] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Sullivan, and T. Grutkowski, “The implementation of the Itanium 2 microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1448–1460, Nov. 2002.
[12] S. Bernard, M. Belleville, J.-D. Legat, A. Valentian, and D. Bol, “Ultra-wide voltage range pulse-triggered flip-flops and register file with tunable energy-delay target in 28 nm UTBB-FDSOI,” Microelectron. J., vol. 57, pp. 76–86, Nov. 2016.
[13] H. Cai, Y. Wang, L. A. Naviner, W. Kang, and W. Zhao, “Energy efficient magnetic tunnel junction based hybrid LSI using multi-threshold UTBB-FD-SOI device,” in Proc. Great Lakes Symp. VLSI, 2017, pp. 23–28.
[14] J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
[15] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. M.-T. Leung, “Improved sense-amplifier-based flip-flop: Design and measurements,” IEEE J. Solid-State Circuits, vol. 35, no. 6, pp. 876–884, Jun. 2000.
[16] J.-C. Kim, Y.-C. Jang, and H.-J. Park, “CMOS sense amplifier-based flip-flop with two N-C²MOS output latches,” Electron. Lett., vol. 36, no. 6, pp. 498–500, Mar. 2000.
[17] A. G. M. Strollo, D. De Caro, E. Napoli, and N. Petra, “A novel high-speed sense-amplifier-based flip-flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 11, pp. 1266–1274, Nov. 2005.
[18] D. H. Saari and D. G. Nairn, “Analog integrated circuit design using fixed-length devices,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2016, pp. 1798–1801.
[19] D. Ingerly et al., “Low-k interconnect stack with metal-insulator-metal capacitors for 22 nm high volume manufacturing,” in Proc. IEEE Int. Interconnect Technol. Conf. (IITC), Jun. 2012, pp. 1–3.
[20] H.-T. Lin, Y.-L. Chuang, Z.-H. Yang, and T.-Y. Ho, “Pulsed-latch utilization for clock-tree power optimization,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 4, pp. 721–733, Apr. 2014.
[21] Y. S. Chauhan et al., “BSIM: Industry standard compact MOSFET models,” in Proc. (ESSCIRC), Sep. 2012, pp. 30–33.
[22] C. Auth et al., “A 22 nm high performance and low-power CMOS technology featuring fully-depleted tri-gate transistors, self-aligned contacts and high density MIM capacitors,” in Proc. Symp. VLSI Technol. (VLSIT), Jun. 2012, pp. 131–132.
[23] R. W. Mann et al., “Impact of circuit assist methods on margin and performance in 6 T SRAM,” Solid-State Electron., vol. 54, no. 11, pp. 1398–1407, 2010.
[24] M. Guillorn et al., “FinFET performance advantage at 22 nm: An AC perspective,” in Proc. Int. Symp. VLSI Technol., Jun. 2008, pp. 12–13.
[25] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, “Matching properties of MOS transistors,” IEEE J. Solid-State Circuits, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
[26] H. Kawasaki et al., “Challenges and solutions of FinFET integration in an SRAM cell and a logic circuit for 22-nm node and beyond,” in IEDM Tech. Dig., Dec. 2009, pp. 289–292.
[27] M. Alioto, E. Consoli, and G. Palumbo, “General strategies to design nanometer flip-flops in the energy-delay space,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 7, pp. 1583–1596, Jul. 2010.
[28] C. Chia-Hsiang, K. Bowman, C. Augustine, Z. Zhengya, and J. Tschanz, “Minimum supply voltage for sequential logic circuits in a 22 nm technology,” in Proc. IEEE Int. Symp. Low Power Electron. Design (ISLPED), Sep. 2013, pp. 181–186.
[29] M. Alioto, E. Consoli, and G. Palumbo, “Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I: Methodology and design strategies,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011.
[30] S. Lin, H. Yang, and R. Luo, “High speed soft-error-tolerant latch and flip-flop design for multiple VDD circuit,” in Proc. IEEE Comput. Soc. Annu. Symp. Very Large Scale Integr., Mar. 2007, pp. 273–278.
[31] W. Wang and H. Gong, “Sense amplifier based RADHARD flip flop design,” IEEE Trans. Nucl. Sci., vol. 51, no. 6, pp. 3811–3815, Dec. 2004.
Hanwool Jeong was born in Seoul, South Korea, in 1987. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2012, where he is currently working toward the Ph.D. degree in electrical and electronic engineering. His current research interests include near-threshold digital logic circuit design, low-voltage SRAM peripheral circuit design, and advanced device-based SRAM cell design.
Tae Woo Oh was born in Seoul, South Korea, in 1992. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, in 2015, where he is currently working toward the Ph.D. degree in electrical and electronic engineering. His current research interests include FinFET-based low-power and high-performance SRAM.
Seung Chul Song received the Ph.D. degree in solid-state electronics from The University of Texas at Austin, Austin, TX, USA, in 2000. Since 2000, he has been in engineering and management positions in various organizations, involved in advanced CMOS process/device technology development. He is currently with Qualcomm Inc., San Diego, CA, USA, where he leads the 28-nm HK/MG technology development with leading foundries. He has contributed several key papers to high-profile journals and conferences on various topics of CMOS technology, including SiON, HK/MG, and FinFET. He is the holder of six U.S. patents.
Seong-Ook Jung (M’00–SM’03) received the B.S. and M.S. degrees in electronic engineering from Yonsei University, Seoul, South Korea, in 1987 and 1989, respectively, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2002. From 1989 to 1998, he was with Samsung Electronics, where he was involved in the specialty memories, such as video RAM, graphic RAM, and window RAM, and merged memory logic. From 2001 to 2003, he was with T-RAM Inc., Milpitas, CA, USA, where he was the Leader of the Thyristor-Based Memory Design Team. From 2003 to 2006, he was with Qualcomm Inc., San Diego, CA, USA, where he was involved in high-performance low-power embedded memories, process variation tolerant circuit design, and low-power circuit techniques. Since 2006, he has been a Professor with Yonsei University. His current research interests include process variation tolerant circuit design, low-power circuit design, mixed-mode circuit design, and future generation memory. Dr. Jung is currently a Board Member of the IEEE SSCS Seoul Chapter.