Adder


Microelectronics Journal 35 (2004) 939-944
www.elsevier.com/locate/mejo

An area-efficient static CMOS carry-select adder based on a compact carry look-ahead unit
G.A. Ruiz*, M. Granda
Depto. de Electrónica y Computadores, Facultad de Ciencias, Universidad de Cantabria, Avda. de Los Castros s/n, 39005 Santander, Spain
Received 13 January 2004; received in revised form 23 August 2004; accepted 2 September 2004

Abstract
This paper presents a highly area-efficient CMOS carry-select adder (CSA) with a regular and iterative-shared transistor structure very suitable for implementation in VLSI. This adder is based on both a static and compact multi-output carry look-ahead (CLA) circuit and a very simple select circuit. Comparisons with other representative 32-bit CSAs show that the proposed adder reduces the area by between 16 and 25%, the number of transistors by between 30 and 43%, and the dynamic power consumption by between 16 and 35%, while maintaining a high speed. © 2004 Elsevier Ltd. All rights reserved.
Keywords: Addition; Carry-select adder; Carry look-ahead; Computer arithmetic circuits

1. Introduction

Addition is one of the fundamental arithmetic operations. A number of fast adder architectures have been proposed in the long history of computer arithmetic [1-3] in pursuit of three basic characteristics: a regular structure, a fast logic evaluation and a compact circuit layout. Table 1 shows the asymptotic time and area requirements for the most important types of adders. The carry-ripple adder (CRA) is the simplest approach. However, the carry-lookahead adder (CLA) and its fast version, the parallel-prefix CLA, are the schemes of choice for time-critical applications, at a considerable cost in terms of silicon area and power dissipation. The CSA provides a compromise between a CRA and a CLA. Hybrid adders combine elements of different approaches to obtain adders with a higher performance, reduced area and low power consumption. The CLA/CSA hybrid adder is the most popular for high-speed applications. Implementations of fast CLA/CSA hybrid adders based on Manchester carry-lookahead chains of fixed [5] or variable [6] length, mux-based CLA circuits [7] and Ling's carry [8] have been recently reported.

* Corresponding author. Fax: +34 942 201402. E-mail address: ruizrg@unican.es (G.A. Ruiz).
0026-2692/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.mejo.2004.09.002

In the classical scheme, the CSA [4] is divided into k-bit blocks, each of which performs two additions in parallel (generally by means of two ripple-carry adders), one assuming a carry-in of 0 and the other a carry-in of 1, as shown in Fig. 1. When the carry-out of the preceding block ($C_{pblock}$) is finally known, the correct precomputed sum is simply selected. $C_{pblock}$ must drive many multiplexers, which is one of the factors that conditions the size and speed of the CSA blocks. One of the most typical applications of CSAs is the final adder of parallel multipliers [9].

Different implementations of the CSA have been proposed. Tyagi [10] shows that the classical CSA can be replaced by a variable parallel-prefix block and a carry selection circuit, reducing the area by about 23% and the delay by 7% in comparison with similar fast adders. The area-efficient CSA proposed in [11] is based on an add-one circuit that replaces one carry-ripple adder, resulting in 29.2% fewer transistors but with a speed loss of 5.9% for length n = 64. This CSA is improved in [12] with a substantial reduction in area. Other CSAs with a pipeline structure [13], for self-timed applications [14,15] and in FPGA technology [16,17] have been proposed. Recently, new methods to minimize the power-delay product in CSAs have been presented in [18].

This paper presents a highly area-efficient CSA based on a static and compact multi-output CLA. The select circuit is made up of NMOS pass-transistors with a simpler structure than that proposed in other CSAs. Hence, this adder has a regular and iterative-shared transistor structure very suitable for implementation in VLSI. Comparison with similar 32-bit CSAs shows a significant reduction in area and power, while maintaining the same speed.

Table 1
Asymptotic time and area requirements for different types of adders

Adder                  Time          Area
CRA                    O(n)          O(n)
CLA                    O(log n)      O(n log n)
Parallel-prefix CLA    O(2 log n)    O(2n log n)
CSA                    O(√n)         O(n)

2. Circuit selection for CSAs

Let $A_i$ and $B_i$ be the ith bits of the input data and $C_{i-1}$ the carry-in for stage i. The usual method for computing the carry-out $C_i$ and sum $S_i$ in an adder is

$$C_i = G_i + P_i C_{i-1}, \qquad S_i = P_i \oplus C_{i-1} \tag{1}$$

where

$$P_i = A_i \oplus B_i, \qquad G_i = A_i B_i \tag{2}$$

are the carry propagate signal and the carry generate signal, respectively. For one block of the classical CSA, the following equations can be defined

$$S_i = S_i^0\,\bar{C}_{pblock} + S_i^1\,C_{pblock}, \qquad C_i = C_i^0 + C_i^1\,C_{pblock} \tag{3}$$

where $C_{pblock}$ is the carry-out of the preceding block, $S_i^0$ and $C_i^0$ are the sum and carry outputs of the adder for carry-in 0, and $S_i^1$ and $C_i^1$ those for carry-in 1. For example, for i = 4, we get

$$C_4^0 = G_4 + P_4 G_3 + P_4 P_3 G_2 + P_4 P_3 P_2 G_1$$
$$C_4^1 = G_4 + P_4 G_3 + P_4 P_3 G_2 + P_4 P_3 P_2 G_1 + P_4 P_3 P_2 P_1 = C_4^0 + P_4 P_3 P_2 P_1 \tag{4}$$

From Eq. (4), it can be deduced that

$$C_i = C_i^0 + PP_i\,C_{pblock} = C_i^0 + C_i^1\,C_{pblock} \tag{5}$$

$C_i^0\,PP_i$ being 0 since $P_i G_i = 0$, and where

$$PP_i = \prod_{j=1}^{i} P_j \tag{6}$$

Different CSAs with more area-efficient structures or with an improved critical-path delay have been proposed. These CSAs can be roughly classified according to the type of select circuit: those based on carry selection [7,10], whose main objective is high speed, and those based on adder selection [11,12], whose main objective is to reduce area. The CSA proposed by Tyagi [10] reduces the area by about 23% and the delay by 7% in comparison with other competitive adders. For this purpose, it uses a variable parallel-prefix block to generate the $C_{i-1}^0$ terms and the bit-slice selection circuit of Fig. 2a. The sum is obtained by combining the following equations

$$C_{i-1} = C_{i-1}^0 + PP_{i-1}\,C_{pblock}, \qquad S_i = P_i \oplus C_{i-1} \tag{7}$$

In Ref. [7] it is demonstrated that $G_i + P_i = G_i \oplus P_i$, so that the following relation can be established: $S_i^1 = S_i^0 \oplus PP_{i-1}$. The $C_{i-1}^0$ are obtained from a mux-based carry look-ahead circuit, and the select circuit in Fig. 2b allows the sum $S_i^0 = P_i \oplus C_{i-1}^0$ to be generated and, from it, $S_i^1$. The final sum is defined as

$$\begin{cases} S_1 = P_1 \oplus C_{pblock} \\ S_i = S_i^0\,\bar{C}_{pblock} + (S_i^0 \oplus PP_{i-1})\,C_{pblock} & \text{for } i > 1 \end{cases} \tag{8}$$

The authors indicate that this proposed 1-b CSA has a 20% size advantage over a 1-b conventional CSA with the same critical path.
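The relations in Eqs. (3), (5) and (8) can be checked exhaustively on one block. The following Python sketch is an illustration and is not taken from the paper: the block width `K = 4` and the helper names `ripple` and `check_select_equations` are assumptions made here. The select equations are compared against a plain ripple evaluation of Eq. (1).

```python
from itertools import product

K = 4  # assumed block width; the paper's comparison uses 4-bit groups


def ripple(p, g, cin):
    """Carries C_1..C_K obtained from Eq. (1) for a given block carry-in."""
    chain, c = [], cin
    for pi, gi in zip(p, g):
        c = gi | (pi & c)
        chain.append(c)
    return chain


def check_select_equations():
    """Exhaustively check Eqs. (3), (5) and (8) on one K-bit block."""
    for bits in product((0, 1), repeat=2 * K):
        a, b = bits[:K], bits[K:]
        p = [x ^ y for x, y in zip(a, b)]            # Eq. (2): propagate
        g = [x & y for x, y in zip(a, b)]            # Eq. (2): generate
        c0, c1 = ripple(p, g, 0), ripple(p, g, 1)    # carry-in 0 and 1
        pp, acc = [], 1
        for pi in p:                                 # Eq. (6): AND of P_1..P_i
            acc &= pi
            pp.append(acc)
        for cpb in (0, 1):
            c_ref = ripple(p, g, cpb)                # reference carries
            for i in range(K):
                assert (c0[i] & pp[i]) == 0                  # C_i^0 PP_i = 0
                assert c_ref[i] == c0[i] | (c1[i] & cpb)     # Eq. (3)
                assert c_ref[i] == c0[i] | (pp[i] & cpb)     # Eq. (5)
                s_ref = p[i] ^ (c_ref[i - 1] if i else cpb)  # Eq. (1)
                s0 = p[i] ^ (c0[i - 1] if i else 0)
                if i == 0:
                    s_sel = p[0] ^ cpb                       # Eq. (8), i = 1
                else:
                    s1 = s0 ^ pp[i - 1]                      # S_i^1 from S_i^0
                    s_sel = (s0 & (1 - cpb)) | (s1 & cpb)    # Eq. (8), i > 1
                assert s_sel == s_ref
    return True
```

Calling `check_select_equations()` walks all 256 operand pairs and both values of the block carry-in. It returns `True` only if every assertion holds.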

Fig. 1. Classical carry select adder divided in k-bit blocks.
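As a behavioural illustration of the classical scheme in Fig. 1, the sketch below models each k-bit block at the integer level. It is a minimal model under stated assumptions: the function names `block_add` and `carry_select_add` are hypothetical, the word length `n` is taken to be a multiple of `k`, and the two per-block additions are evaluated with Python integer arithmetic rather than with gate-level ripple-carry adders.

```python
def block_add(a_blk, b_blk, cin, k):
    """Add two k-bit operands with a carry-in; return (k-bit sum, carry-out)."""
    total = a_blk + b_blk + cin
    return total & ((1 << k) - 1), total >> k


def carry_select_add(a, b, n=32, k=4, cin=0):
    """Behavioural model of a classical carry-select adder (n a multiple of k)."""
    mask = (1 << k) - 1
    result, carry = 0, cin
    for shift in range(0, n, k):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidate results are computed without waiting for the carry.
        s0, c0 = block_add(a_blk, b_blk, 0, k)
        s1, c1 = block_add(a_blk, b_blk, 1, k)
        # The carry-out of the preceding block only drives the selection.
        s_sel, carry = (s1, c1) if carry else (s0, c0)
        result |= s_sel << shift
    return result, carry
```

For instance, `carry_select_add(0xFFFFFFFF, 1)` propagates the carry through all eight blocks and yields a zero sum with a carry-out of 1.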


Fig. 2. Bit-slice selection circuits proposed in (a) [10], (b) [7], (c) [11] and (d) [12].

The CSA of [11] is based on a ripple-carry adder with carry-in 0 to obtain $S_i^0$ and additional logic so that

$$\begin{cases} S_1 = S_1^0 \oplus C_{pblock} \\ S_i = S_i^0\,\bar{C}_{pblock} + (S_i^0 \oplus SS_{i-1}^0)\,C_{pblock} & \text{for } i > 1 \end{cases} \tag{9}$$

where

$$SS_i^0 = \prod_{j=1}^{i} S_j^0 \tag{10}$$

This CSA is highly area-efficient, resulting in 29.2% fewer transistors but with a speed loss of 5.9% for length n = 64 in comparison with the classical CSA. Its select circuit (Fig. 2c) is improved in [12], resulting in the circuit shown in Fig. 2d, which implements the following expressions

$$\begin{cases} S_1 = S_1^0 \oplus C_{pblock} \\ S_i = S_i^0\,\overline{SS_{i-1}^0\,C_{pblock}} + \bar{S}_i^0\,SS_{i-1}^0\,C_{pblock} = S_i^0 \oplus (SS_{i-1}^0\,C_{pblock}) & \text{for } i > 1 \end{cases} \tag{11}$$

This CSA reduces the number of transistors by 29% in comparison with [11], with negligible speed loss. All of the above CSAs are functionally equivalent, fulfilling the following relation

$$SS_i^0 = PP_i \tag{12}$$
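The add-one selection of Eq. (11) and the equivalence stated in Eq. (12) can be illustrated with a short exhaustive check. This is a sketch under assumptions: the names `add_one_select` and `check_add_one` and the default block width of 4 bits are chosen here for illustration, and bit lists are ordered from stage 1 (least significant) upwards.

```python
def add_one_select(s0_bits, cpblock):
    """Eq. (11): S_i = S_i^0 xor (SS_{i-1}^0 and Cpblock), with SS_0^0 taken as 1."""
    out, ss = [], 1            # ss holds SS_{i-1}^0; the empty product is 1
    for s0 in s0_bits:
        out.append(s0 ^ (ss & cpblock))
        ss &= s0               # Eq. (10): running AND of S_1^0..S_i^0
    return out


def check_add_one(k=4):
    """Exhaustively check Eqs. (11) and (12) for one k-bit block."""
    mask = (1 << k) - 1
    for a in range(1 << k):
        for b in range(1 << k):
            s0 = (a + b) & mask                       # sum for carry-in 0
            s0_bits = [(s0 >> i) & 1 for i in range(k)]
            p_bits = [((a ^ b) >> i) & 1 for i in range(k)]
            ss = pp = 1
            for i in range(k):
                ss &= s0_bits[i]
                pp &= p_bits[i]
                assert ss == pp                       # Eq. (12): SS_i^0 = PP_i
            for cpb in (0, 1):
                want = (a + b + cpb) & mask           # reference block sum
                got = add_one_select(s0_bits, cpb)
                assert got == [(want >> i) & 1 for i in range(k)]
    return True
```

Starting the running product at 1 makes the first stage reduce to $S_1 = S_1^0 \oplus C_{pblock}$, so the i = 1 case needs no separate branch.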

3. Compact multi-output CLA

Of all the choices of fast binary adders available for implementation in VLSI, by far the most popular are adders based on CLA, mainly because they improve the carry delay by calculating the carries of each stage in parallel. The carry generation logic of these parallel adders uses fast and efficient structures developed from techniques, both in the domain of logic structure and that of basic circuit principles, whichever are the most suitable for a given technology.

In a carry-ripple adder of n bits, stage i uses three inputs to implement a 1-bit addition: two input data bits ($A_i$, $B_i$) and a carry input ($C_{i-1}$) from the previous stage. The speed of this adder depends to a large extent on the carry propagation time through its stages. Table 2 shows the truth table of the full adder carry-out and its complement. When $A_i = B_i = 0$ or $A_i = B_i = 1$, the carry-out is generated at the ith stage. The $P_i$ term indicates when the ith stage will pass the incoming carry $C_{i-1}$ ($\bar{C}_{i-1}$) to the next higher stage. The carry-out and its complement [19] can be expressed as follows

$$C_i = A_i B_i + P_i C_{i-1} = G_i + P_i C_{i-1}$$
$$\bar{C}_i = \bar{A}_i \bar{B}_i + P_i \bar{C}_{i-1} = N_i + P_i \bar{C}_{i-1} \tag{13}$$

where $G_i = A_i B_i$ and $N_i = \bar{A}_i \bar{B}_i$. Note that $G_i P_i = 0$, $N_i P_i = 0$ and $N_i G_i = 0$, that is, these signals are mutually exclusive.

Table 2
Full adder carry-out

A_i   B_i   C_i        C̄_i        P_i
0     0     0          1          0
0     1     C_{i-1}    C̄_{i-1}    1
1     0     C_{i-1}    C̄_{i-1}    1
1     1     1          0          0

Fig. 3a and b show an efficient implementation in static CMOS logic of the basic cells that enable the $C_i$ and $\bar{C}_i$ carry-outs to be obtained. The cell of Fig. 3b is more suitable since it is not necessary to complement the input data ($A_i$, $B_i$). Moreover, the noise margin problem presented by both cells can be eliminated by means of output restoring inverters. This problem is due to the fact that the high level is lower than the supply voltage level by the threshold voltage of the NMOS pass-transistors ($P_i$). These inverters also provide electrical isolation from the cell and increase its fan-out. Fig. 3c shows the structure of a regular, simple and multi-output 4-bit CLA unit made up of several cells from Fig. 3b connected in cascade; the optimum number of cells may be calculated for a given technology by simulation. To ensure full swing when a high level is transmitted via the $P_i$ pass-transistors, the level restoring inverter displayed in Fig. 3c (labelled with a big point) is required. This inverter has a weak feedback PMOS transistor, and its N- and P-transistors should be sized to balance the rising and falling output times.

Fig. 3. Static and compact implementation of carry-out: (a) cell for generation of $C_i$, (b) cell for generation of $\bar{C}_i$, and (c) 4-bit CLA based on cell (b).

4. Compact CSA based on CLA and comparisons

The area and power efficiency of the proposed CSA is based on the CLA of Fig. 3 and the following expressions, derived from Eqs. (7) and (13):

$$\bar{C}_i = \bar{C}_i^0 + PP_i\,\bar{C}_{pblock}, \qquad S_i = \overline{P_i \oplus \bar{C}_{i-1}} \tag{14}$$
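The complemented carry recurrence of Eq. (13) and the selection rule built on it can be checked in the same exhaustive way. The sketch below rests on an interpretation that the text does not spell out: $\bar{C}_i^0$ is read as the complemented carry produced by the chain when $\bar{C}_{pblock} = 0$. The function names `cbar_chain` and `check_complemented_select` and the default width of 4 bits are likewise assumptions made for illustration.

```python
from itertools import product


def cbar_chain(a_bits, b_bits, cbar_in):
    """Eq. (13): Cbar_i = N_i + P_i Cbar_{i-1}; returns [Cbar_1, ..., Cbar_k]."""
    chain, cbar = [], cbar_in
    for a, b in zip(a_bits, b_bits):
        n = (1 - a) & (1 - b)      # N_i: both inputs are 0
        p = a ^ b                  # P_i: propagate
        cbar = n | (p & cbar)
        chain.append(cbar)
    return chain


def check_complemented_select(k=4):
    """Check Cbar_i = Cbar_i^0 + PP_i Cbar_pblock on one k-bit block."""
    for bits in product((0, 1), repeat=2 * k):
        a_bits, b_bits = bits[:k], bits[k:]
        p = [a ^ b for a, b in zip(a_bits, b_bits)]
        cbar0 = cbar_chain(a_bits, b_bits, 0)   # assumed meaning of Cbar_i^0
        pp, acc = [], 1
        for pi in p:                            # PP_i: AND of P_1..P_i
            acc &= pi
            pp.append(acc)
        for cpb in (0, 1):
            cbar_in = 1 - cpb
            cbar = cbar_chain(a_bits, b_bits, cbar_in)
            c = cpb                             # true carry, from Eq. (1)
            for i in range(k):
                prev_c = c
                prev_cbar = cbar[i - 1] if i else cbar_in
                c = (a_bits[i] & b_bits[i]) | (p[i] & c)
                assert cbar[i] == 1 - c                       # Eq. (13) vs Eq. (1)
                assert (cbar0[i] & pp[i]) == 0                # mutually exclusive
                assert cbar[i] == cbar0[i] | (pp[i] & cbar_in)
                # The sum of Eq. (1) equals the complement of P_i xor Cbar_{i-1}.
                assert 1 - (p[i] ^ prev_cbar) == p[i] ^ prev_c
    return True
```

The last assertion only restates Eq. (1) in terms of the complemented carry. It makes no claim about how the sum gate is realised at transistor level.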

This CSA, as shown in Fig. 4, is basically made up of a 4-bit CLA circuit and a multi-output AND gate which generates the $PP_i$ signals. The select circuit is formed by NMOS pass-transistors and is far simpler than those shown in Fig. 2. In this circuit, when $PP_i = 1$, the carry-in is directly transmitted in parallel to all $\bar{C}_j$ ($j \le i$) nodes. Otherwise, if $PP_i = 0$, $\bar{C}_i$ is generated in the CLA circuit (note that $\bar{C}_i^0\,PP_i = 0$). This adder uses NMOS pass-transistors and thus the level restoring inverters are necessary.

Fig. 4. New static and compact 4-bit CSA. All dimensions of transistors are in μm. L = 0.35 μm, except for weak transistors, where L = 0.6 μm.

In order to compare the CSA shown in Fig. 4 with the most representative CSAs described in [7,10,12], four 32-b CSAs implemented in 4-bit groups were designed using a standard 0.6 μm CMOS two-metal p-well technology. The electrical circuit was extracted from the layout, including very precise extraction of parasitics, and simulated with HSPICE at VDD = 3.3 V and CL = 0.1 pF for each output. Table 3 lists the area, number of transistors, dynamic power consumption at 50 MHz and timing characteristics of these adders. The proposed adder occupies 6608 μm² and reduces the area by between 16 and 25% in comparison with the other CSAs. More significant is the decrease in the number of transistors, which may reach 43% with respect to that proposed in [7] and 30% with respect to the others. These reductions in size and in number of transistors are achieved through the efficient and compact CLA structure and through the ease with which the carry-in can be propagated in parallel using simple NMOS pass-transistors. As a result, the dynamic power consumption is reduced by 16% with respect to [10], 19% with respect to [7] and 35% with respect to [12].

Table 3
Simulation results for 32-bit CSAs

            Area (μm²)   No. trans.   Power at 50 MHz (mW)   tp (ns)   ti (ns)   tg (ns)   te (ns)
[10]        7690         169          5.05                   5.65      1.2       0.61      0.79
[7]         8277         210          5.25                   5.23      1.28      0.53      0.77
[12]        7746         173          6.57                   7.33      1.1       0.89      0.89
Proposed    6608         120          4.52                   5.6       1.77      0.5       0.83

In a CSA the critical path is defined by the carry chain of the first block and by the selection circuit of each block. In the final block the worst delay is in the selection circuit of the final sum bits. Fig. 5 shows the transient waveforms of the proposed 32-b CSA for the worst-case delay path. C4 to C28 are the carry-out signals of the different blocks and S32 is the output sum. Therefore, the delay of the critical path $t_p$ of a CSA made up of N identical blocks can be written as

$$t_p = t_i + (N - 2)\,t_g + t_e \tag{15}$$

where $t_i$ is the time to create the carry signals out of the first block, $t_g$ is the delay of the carry selection circuit, and $t_e$ is the delay of the sum selection circuit. These times, shown in Table 3 and highlighted in Fig. 5, demonstrate that the proposed adder presents a $t_p$ similar to that of [10] and is only slightly slower than [7], which is the fastest. Since it has the lowest $t_g$, the critical path of carry-out propagation through the blocks is minimal, which will lead to a low $t_p$ even for high N.

Fig. 5. Transients of 32-b CSA.
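The critical-path model of Eq. (15) can be cross-checked against the timing values reported in Table 3. The sketch below assumes N = 8 blocks, which follows from a 32-bit adder built from 4-bit groups. The dictionary `TABLE3` and the function names are introduced here for illustration only.

```python
# (t_i, t_g, t_e, reported t_p) in ns, copied from Table 3
TABLE3 = {
    "[10]": (1.2, 0.61, 0.79, 5.65),
    "[7]": (1.28, 0.53, 0.77, 5.23),
    "[12]": (1.1, 0.89, 0.89, 7.33),
    "Proposed": (1.77, 0.5, 0.83, 5.6),
}


def critical_path_delay(t_i, t_g, t_e, n_blocks):
    """Eq. (15): t_p = t_i + (N - 2) t_g + t_e."""
    return t_i + (n_blocks - 2) * t_g + t_e


def check_table3(n_blocks=8, tol=0.005):
    """Compare Eq. (15) with the reported t_p for every adder in Table 3."""
    return {
        name: abs(critical_path_delay(t_i, t_g, t_e, n_blocks) - t_p) < tol
        for name, (t_i, t_g, t_e, t_p) in TABLE3.items()
    }
```

With N = 8, the model reproduces all four reported $t_p$ values to within rounding. The same function can be used to extrapolate the delay to wider adders, under the assumption that $t_i$, $t_g$ and $t_e$ stay unchanged.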

5. Conclusions

The CSA presented in this paper is made up of a compact CLA and a very simple selection circuit in order to obtain a highly area-efficient CMOS circuit. The CLA has a compact, static, regular and multi-output structure with a low number of transistors. The select circuit is based on NMOS pass-transistors and is far simpler than those proposed in other CSAs (Fig. 2). Comparisons with other representative 32-bit CSAs show a high reduction in area, number of transistors and dynamic power, while maintaining a low delay of the critical path.

References

[1] K. Hwang, Computer Arithmetic: Principles, Architecture, and Design, Wiley, New York/Chichester/Brisbane/Toronto/Singapore, 1979.
[2] B. Parhami, Computer Arithmetic, Algorithms and Hardware, Oxford University Press, New York, Oxford, 2000.
[3] M.D. Ercegovac, T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, San Francisco, CA, 2004.
[4] N.H.E. Weste, K.E. Eshraghian, Principles of CMOS VLSI Design: a Systems Perspective, Addison Wesley, NY, 1992.
[5] T. Lynch, E.E. Swartzlander, A spanning tree carry lookahead adder, IEEE Transactions on Computers 41 (8) (1992) 931-939.
[6] V. Kantabruta, A recursive carry-lookahead/carry-select hybrid adder, IEEE Transactions on Computers 42 (12) (1993) 1495-1499.
[7] H. Morinaka, H. Makino, Y. Nakase, H. Suzuki, K. Mashiko, A 64 bit carry lookahead CMOS adder using Modified Carry Select, Proceedings of the IEEE Custom Integrated Circuits Conference, New York (USA), 1995 pp. 585-588.
[8] Y. Wang, C. Pai, X. Song, The design of hybrid carry-lookahead/carry-select adders, IEEE Transactions on Circuit and Systems-II: Analog and Digital Signal Processing 49 (1) (2000) 16-24.
[9] V.G. Oklobdzija, D. Villeger, S.S. Liu, A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach, IEEE Transactions on Computers 45 (3) (1996) 294-306.
[10] A. Tyagi, A reduced-area scheme for carry-select adders, IEEE Transactions on Computers 42 (10) (1993) 1163-1170.
[11] T.Y. Chang, M.J. Hsiao, Carry-select adder using single ripple-carry adder, Electronics Letters 34 (22) (1998) 2101-2103.
[12] Y. Kim, L.S. Kim, 64-bit carry-select adder with reduced area, Electronics Letters 37 (10) (2001) 614-615.
[13] Y. Kim, K.H. Sung, L.S. Kim, 1.67 GHz 32-bit pipelined carry-select adder using complementary scheme, IEEE International Symposium on Circuits and Systems, Piscataway (USA), 2002 pp. I-461-464.
[14] P. Corsonello, S. Perri, G. Cocorullo, Hybrid carry-select statistical carry look-ahead adder, Electronics Letters 35 (7) (1999) 549-551.
[15] A. de Gloria, M. Oliveri, Completion-detecting carry select addition, IEEE Proceedings on Computer and Digital Techniques 147 (2) (2000) 93-100.
[16] R. Hshermian, An algorithm and design procedure for high speed carry select adders using FPGA technology, Proceedings of 37th Midwest Symposium on Circuits and Systems, New York (USA), 1994 pp. 257-260.
[17] R. Hshermian, A new design for high speed and high-density carry select adders, Proceedings of 43rd IEEE Midwest Symposium on Circuits and Systems, Lansing, MI, 2000 pp. 1300-1303.
[18] A. Nève, H. Schettler, T. Ludwig, D. Flandre, Power-delay product minimization in high-performance 64-bit carry-select adders, IEEE Transactions on VLSI 12 (3) (2004) 235-243.
[19] G.A. Ruiz, Evaluation of three 32-bit CMOS adders in DCVS logic for self-timed circuits, IEEE Journal of Solid-State Circuits 33 (4) (1998) 604-613.
