Low-Power Design Techniques for High-Performance CMOS Adders
Uming Ko, Poras T. Balsara, and Wai Lee
Abstract-A high-performance adder is one of the most critical components of a processor which determines its throughput, as it is used in the ALU, the floating-point unit, and for address generation in case of cache or memory access. In this paper, low-power design techniques for various digital circuit families are studied for implementing high-performance adders, with the objective to optimize performance per watt or energy efficiency as well as silicon area efficiency. While the investigation is done using 100 MHz, 32 b carry lookahead (CLA) adders in a 0.6 μm CMOS technology, most techniques presented here can also be applied to other parallel adder algorithms such as carry-select adders (CSA) and other energy efficient CMOS circuits. Among the techniques presented here, the double pass-transistor logic (DPL) is found to be the most energy efficient while the single-rail domino and complementary pass-transistor logic (CPL) result in the best performance and the most area efficient adders, respectively. The impact of transistor threshold voltage scaling on energy efficiency is also examined when the supply voltage is scaled from 3.5 V down to 1.0 V.

Index Terms-Low power, digital CMOS, high performance, adder.

I. INTRODUCTION
With the advent of battery operated applications like portable computing and personal communication systems (PCS) [1], it has become imperative to develop integrated circuits and systems that use less energy without greatly sacrificing computational throughput. Furthermore, such energy efficient circuits are also needed in high-performance desktop, AC powered systems in which sinking large amounts of heat through packages is becoming a difficult problem. Thus, designing a low-power processor is becoming as important as designing a high-performance one. This trend will benefit desktop as well as portable systems, as it will allow greater integration at the silicon level with less expensive device packaging, which in turn will lead to a further reduction in system power dissipation.

A high-performance adder has been one of the most critical components in determining the throughput of a processor's execution unit, floating-point unit, and memory address generation unit. Recently, Nagendra et al. [2] presented power-delay characteristics of various adder architectures using the full static CMOS circuit style. They presented a comparison of ripple carry, block carry lookahead, and signed digit adders. A similar study was also reported by Callaway et al. [3], in which they compared the dynamic power dissipation of different adder architectures. Besides considering different adder architectures, another approach is to employ different CMOS circuit styles to design energy efficient, high-performance adder circuits for a given architecture. Conventional static CMOS has been the technique of choice in most processor designs. Alternatively, static pass-transistor circuits, in particular, have also been suggested for low-power applications [1]. Dynamic circuits, when clocked judiciously, can also be used in low-power microprocessors [4]. However, several other design techniques need to be applied and evaluated along with these circuit styles for low-power/voltage processor applications.
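
For readers less familiar with the carry lookahead (CLA) architecture used as the test vehicle here, the following is a minimal behavioral sketch in Python. It is not the authors' circuit: the 4-bit block size, the rippling of the carry between blocks, and the names `lookahead_block` and `cla_add` are assumptions made only for illustration. It shows how per-bit generate and propagate signals let each carry be written as a flat sum of products rather than waiting on the previous bit.

```python
def lookahead_block(g, p, c0):
    """Carries for one 4-bit block from generate (g) and propagate (p) bits.

    Returns ([carry into bit 0..3], carry out of the block). Every carry is
    expressed directly in terms of g, p and the block carry-in c0.
    """
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    c1 = g0 | (p0 & c0)
    c2 = g1 | (p1 & g0) | (p1 & p0 & c0)
    c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
    # Group generate/propagate summarize the whole block.
    group_g = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    group_p = p3 & p2 & p1 & p0
    c4 = group_g | (group_p & c0)
    return [c0, c1, c2, c3], c4


def cla_add(a, b, width=32, carry_in=0):
    """Add two unsigned integers; returns (sum mod 2**width, carry_out).

    Assumption: 4-bit lookahead blocks with the block carry passed from one
    block to the next (illustrative only, not the paper's implementation).
    """
    if width % 4 != 0:
        raise ValueError("width must be a multiple of 4")
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate: a AND b
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate: a XOR b
    total = 0
    carry = carry_in & 1
    for base in range(0, width, 4):
        carries, carry = lookahead_block(g[base:base + 4], p[base:base + 4], carry)
        for k in range(4):
            total |= (p[base + k] ^ carries[k]) << (base + k)  # sum = p XOR carry
    return total, carry
```

The XOR gates shown in the figures that follow correspond to the propagate and sum terms in this sketch, which is why the XOR is the natural cell for comparing circuit styles.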
Manuscript received April 22, 1994; revised November 3, 1994.
U. KO and W. Lee are with Texas Instruments, Inc., Dallas, TX 75265
USA.
P. T. Balsara is with the Department of Electrical Engineering, University
of Texas at Dallas, Richardson, TX 75083 USA.
IEEE Log Number 9410845.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on February 23,2021 at 07:54:38 UTC from IEEE Xplore. Restrictions apply.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 3, NO. 2, JUNE 1995
Fig. 1. XOR using high-performance static CMOS.
Fig. 3. High-performance CPL XOR with pMOSFET feedback.
Fig. 5. XOR using high-performance DPL.
Fig. 6. XOR using high-performance dual-rail domino.
Fig. 7. NOR using high-performance single-rail domino.
Parameters (Vdd = 3.3 V)    Static   CPL     DPL     Dual-Rail Domino   Single-Rail Domino   Units
Power @ 100 MHz             34.3     34.5    27.5    82.5               60.2                 mW
  percentage (DPL = 100)    125      125     100     300                219                  %
Delay of critical path      2.33     2.24    1.98    1.78               1.64                 ns
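
The relative figures quoted for these adders can be reproduced from the table with a few lines of arithmetic. The sketch below is illustrative only and rests on three assumptions: the DPL power is taken as 27.5 mW, the value implied by the percentage row (34.3 mW at 125%, 82.5 mW at 300%); the delays are in nanoseconds; and energy is taken as the power-delay product, which matches the 270% and 181% energy figures stated in the conclusion. The names `ADDERS` and `relative_to` are hypothetical.

```python
# (power in mW at 100 MHz, critical-path delay in ns), Vdd = 3.3 V.
# Assumption: DPL power of 27.5 mW is inferred from the percentage row.
ADDERS = {
    "static": (34.3, 2.33),
    "CPL": (34.5, 2.24),
    "DPL": (27.5, 1.98),
    "dual-rail domino": (82.5, 1.78),
    "single-rail domino": (60.2, 1.64),
}


def relative_to(reference="DPL"):
    """Return power, speed advantage and energy of each style relative to one."""
    ref_power, ref_delay = ADDERS[reference]
    ref_energy = ref_power * ref_delay  # mW * ns = pJ (power-delay product)
    result = {}
    for name, (power, delay) in ADDERS.items():
        result[name] = {
            "power_pct": 100.0 * power / ref_power,
            "speedup_pct": 100.0 * (ref_delay / delay - 1.0),
            "energy_pct": 100.0 * (power * delay) / ref_energy,
        }
    return result


if __name__ == "__main__":
    for name, row in relative_to("DPL").items():
        print(f"{name:20s} power {row['power_pct']:6.1f}%  "
              f"speedup {row['speedup_pct']:6.1f}%  energy {row['energy_pct']:6.1f}%")
```

Because every quantity is a ratio against DPL, the percentages do not depend on the unit assumed for the delay row.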
[Figure: 32 b adders average power versus supply voltage. Horizontal axis: supply voltage (volts).]
[Figure: 32 b adder delay versus supply voltage. Horizontal axis: supply voltage (volts).]
Fig. 11. 32 b adders energy versus supply voltage.
[Figure: Dependency of delay on Vt scaling. Horizontal axis: threshold voltage, 1/5 of supply voltage (volts).]
[Figure: Dependency of power on Vt scaling. Horizontal axis: threshold voltage, 1/5 of supply voltage (volts).]
[Figure: Dependency of energy on Vt scaling. Horizontal axis: threshold voltage, 1/5 of supply voltage (volts).]
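
As a rough guide to the trends these plots examine, the sketch below uses a textbook first-order model, not the simulation results behind the figures. The assumptions are: dynamic switching energy proportional to C·Vdd², gate delay following the alpha-power law Vdd / (Vdd − Vt)^α, and the threshold voltage held at one fifth of the supply as the axis labels indicate. The exponent `alpha = 1.3`, the unit capacitance, and all function names are illustrative choices.

```python
def dynamic_energy(vdd, c_eff=1.0):
    """Switching energy per operation, proportional to C * Vdd**2 (arbitrary units)."""
    return c_eff * vdd ** 2


def relative_delay(vdd, vt, alpha=1.3):
    """Alpha-power-law delay estimate, Vdd / (Vdd - Vt)**alpha (arbitrary units)."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (vdd - vt) ** alpha


def sweep(vdd_values, vt_ratio=0.2, alpha=1.3):
    """Return (Vdd, Vt, energy, delay) rows with Vt scaled as a fraction of Vdd."""
    rows = []
    for vdd in vdd_values:
        vt = vt_ratio * vdd
        rows.append((vdd, vt, dynamic_energy(vdd), relative_delay(vdd, vt, alpha)))
    return rows


if __name__ == "__main__":
    for vdd, vt, energy, delay in sweep([1.0, 1.5, 2.0, 2.5, 3.0, 3.5]):
        print(f"Vdd={vdd:.1f} V  Vt={vt:.2f} V  energy={energy:6.2f}  delay={delay:.3f}")
```

Under this model, scaling Vt together with Vdd keeps the delay penalty of a lower supply modest while the switching energy falls quadratically. The model ignores leakage and short-circuit currents, which matter increasingly as Vt is lowered.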
VI. CONCLUSION

Although CPL uses fewer transistors than the other four static and dynamic circuit styles to implement the same logic functions, the partial swing at the intermediate nodes results in more than 50% of the power being wasted. Reduction in the Vt of the nMOSFET pass transistors has been proposed to ease this problem [1], but it will reduce the noise margin to an unacceptable level at low supply voltages. An additional feedback pMOSFET device in the inverter stage, combined with both the low-power CPL and the high-performance version and the technique in [13], yields an improvement by a factor of 2.62 in energy efficiency and a 59% reduction in area, compared to the original CPL. Because of the presence of both nMOSFET and pMOSFET devices, all nodes in DPL have a full voltage swing and there is no static short-circuit current problem. Dual current paths in the DPL implementation also improve performance. With similar techniques to those applied in CPL (except for the feedback pMOSFET), the optimized DPL's energy efficiency is increased by a factor of 1.55 with a 58% area reduction when compared to the original DPL.

Precharged circuits like domino offer a performance advantage of 12%-21% over DPL (the fastest of the three static circuit families investigated), but at the expense of consuming 300% (dual-rail) and 219% (single-rail) of the power of DPL. Compared to DPL, dual-rail and single-rail domino consume 270% and 181% of the energy and require 76% and 44% more silicon area, respectively.

In summary, the improved CPL uses the least amount of silicon area, and single-rail domino offers the fastest performance. The improved DPL is the most energy efficient circuit and is suitable for applications in which power and performance are equally important.

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for their helpful comments in improving the presentation of this paper.