07095289

2015 2nd International Conference on Signal Processing and Integrated Networks (SPIN)
CMOS Implementation of Efficient 16-Bit Square Root Carry-Select Adder
Shamim Akhter, Saurabh Chaturvedi, Kilari Pardhasardi
Department of Electronics and Communication Engineering

Jaypee Institute of Information Technology
NOIDA, India
shamim.akhter@jiit.ac.in, saurabh.chaturvedi@jiit.ac.in, pardhu.inst@gmail.com
Abstract- A 16-bit Square Root Carry-Select Adder (SQRT CSA) is implemented and analyzed in this paper. The SQRT CSA is an architecture level modification that reduces area and power dissipation compared to the conventional CSA. The RCA block with Cin=1 of the conventional CSA is replaced with a binary-to-excess-1 converter (BEC) in the modified SQRT CSA structure. The architecture of the 16-bit CSA is configured into five different stages with progressively increasing data size. In order to realize the 16-bit CSA, the basic building blocks, e.g., XOR gate, AND gate, 2:1 Mux, half adder (HA) and full adder (FA), are implemented using CMOS transmission gates (CMOS TG). The ripple carry adder (RCA) and binary-to-excess-1 converter (BEC) of different bit sizes are also implemented. Transistor level schematics are drawn using Mentor Graphics Design Architect and simulations are carried out using Eldo with TSMC 0.35 µm CMOS technology and a supply voltage of 3.3 V.

Keywords- Carry-select adder, Square root carry-select adder, Ripple carry adder, binary-to-excess-1 converter, Carry-propagation adder

I. INTRODUCTION

The adder is the most commonly used arithmetic block in applications like central processing units (CPU) and digital signal processing (DSP) [1]. Therefore, the design of area efficient, low power and high performance adder circuits is of utmost importance [2-5]. This paper presents the analysis of the 16-bit addition operation using SQRT CSA, and its performance is evaluated in terms of MOS transistor count, total power dissipation and propagation delay.

The conventional SQRT CSA architecture uses multiple sets of ripple carry adder blocks at each stage. To reduce area and power dissipation, a modified SQRT CSA architecture is realized by replacing one ripple carry adder block with a binary-to-excess-1 converter [2-3].

All the building blocks of the SQRT CSA are implemented with CMOS TG. Also, the 1-bit full adder needed for the CSA is designed using CMOS TG based XOR gates [6]. In the modified SQRT CSA, the BEC block is implemented using standard cells of CMOS inverter, TG based XOR and AND gates. To select the true sum and carry at each stage, a CMOS TG based 2:1 Mux is used.

The paper is organized as follows: The three-module based full adder is explained in Section II. In Section III, the working and gate level diagram of the 4-bit BEC circuit are presented. Sections IV and V compare the architectures of the conventional and modified SQRT CSA based on area, power and delay. Section VI provides simulation results, followed by the conclusion in Section VII.

II. FULL ADDER BUILDING BLOCK

The Boolean expressions for the outputs of a 1-bit full adder are given as:
Sum = (A ⊕ B) ⊕ Cin
Carry = A·B + Cin·(A ⊕ B)

There are various CMOS implementations of the full adder [1, 6-9]. Out of these, the TG based CMOS full adder has the advantages of lower transistor count and lower loading of inputs and intermediate nodes [7-9]. The Boolean expressions for the three-module based full adder cell are given below:
Sum = H ⊕ Cin
Carry = A·H’ + Cin·H
978-1-4799-5991-4/15/$31.00 ©2015 IEEE 891

where H and H’ represent A ⊕ B and (A ⊕ B)’ respectively. The block diagram is shown in Fig. 1.

Fig. 1: Three modules of a full adder [6]

The design of H and H’ is optimized to enhance the performance of the full adder cell. Module 1 is a CMOS TG based XOR gate, and it uses 8 MOS transistors to realize the signal H. H’ is derived from H by using a CMOS inverter. The CMOS design for H and H’ is given in Fig. 2.

Fig. 2: CMOS TG based XOR gate

The SQRT CSA circuit involves 2:1 Mux, HA and FA [2,3]. The circuit diagrams for the 2:1 Mux, HA and FA are shown in Fig. 3. In Fig. 3(c), the square box shows the generation of H and H’. The number of MOS transistors required for the CMOS TG based implementation of each block is given in Table I.

Fig. 3: CMOS TG based (a) 2:1 Mux, (b) HA, and (c) FA

Table I. Transistor count for building blocks of SQRT CSA
Block Name     No. of Transistors
2:1 Mux        6
XOR Gate       8
Half Adder     12
Full Adder     20
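As a quick sanity check on the full adder equations of this section, the following Python sketch compares the three-module form (H, H’, Sum, Carry) with the standard full adder expressions over all eight input combinations. The function names are illustrative only and do not come from the paper, and the code models logic values, not the transistor-level TG circuit.

```python
from itertools import product


def full_adder_reference(a, b, cin):
    """Standard form: Sum = (A xor B) xor Cin, Carry = A.B + Cin.(A xor B)."""
    s = (a ^ b) ^ cin
    carry = (a & b) | (cin & (a ^ b))
    return s, carry


def full_adder_three_module(a, b, cin):
    """Three-module form: H = A xor B, Sum = H xor Cin, Carry = A.H' + Cin.H."""
    h = a ^ b          # module 1: XOR gate producing H
    h_bar = h ^ 1      # H' obtained by inverting H
    s = h ^ cin        # module 2: sum
    carry = (a & h_bar) | (cin & h)  # module 3: carry selected by H
    return s, carry


def three_module_matches_reference():
    """Check both forms against each other and against integer addition."""
    for a, b, cin in product((0, 1), repeat=3):
        s, carry = full_adder_three_module(a, b, cin)
        if (s, carry) != full_adder_reference(a, b, cin):
            return False
        if a + b + cin != s + 2 * carry:
            return False
    return True


if __name__ == "__main__":
    print(three_module_matches_reference())
```

The carry expression behaves like a 2:1 selection: when H = 0 the inputs A and B are equal and the carry is A, and when H = 1 the carry is Cin.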
III. 4-BIT BINARY-TO-EXCESS-1 CONVERTER

The BEC is used in place of the RCA with Cin=1. The n-bit RCA is replaced by an (n+1)-bit BEC [2].

Table II. Lookup table of 4-bit BEC [2]
Input [B3:B0]    Output [X3:X0]
0000             0001
0001             0010
0010             0011
....             ....
1110             1111
1111             0000

The Boolean expressions for X0, X1, X2 and X3 are:
X0 = NOT(B0)
X1 = XOR(B0, B1)
X2 = XOR[B2, (B0 AND B1)]
X3 = XOR[B3, (B0 AND B1 AND B2)]

The 4-bit BEC circuit diagram is shown in Fig. 4. Each logic gate is designed using CMOS.

Fig. 4: 4-bit binary-to-excess-1 converter

IV. AREA EVALUATION OF 16-BIT CONVENTIONAL SQRT CSA

The architecture of the conventional 16-bit SQRT CSA is shown in Fig. 5 [2]. It has five different stages of RCA with progressively increasing input data size.

Fig. 5: Block diagram of conventional 16-bit SQRT CSA [2]

The first stage is simply two cascaded 1-bit full adder circuits. The remaining stages are CSA based circuits. The transistor count for stage-2 of the 16-bit SQRT CSA is discussed below.

It consists of three full adders, a half adder and a 6:3 Mux.
The 3:2 RCA block with Cin=0 is realized by one FA and one HA.
The 3:2 RCA block with Cin=1 is realized by two FAs.
The 6:3 Mux is a cascaded block of three 2:1 Mux which is controlled by the carry output of the 1st stage. Of the three Mux, two are used for the sum bits (S[3] and S[2]) and the remaining one is used for selecting the carry output for the next stage.

As mentioned in Table I, FA involves 20 transistors, HA requires 12 transistors and 2:1 Mux requires 6 transistors. So total transistor count = 90 (FA + HA + Mux), i.e.,
FA=60 (3×20), HA=12 (1×12), Mux=18 (3×6)

Similarly, the transistor count for the other stages can be determined. The transistor count, power dissipation and delay for each stage are listed in Table III.

Table III. Conventional SQRT CSA parameters
Stage No. (No. of Bits)   Transistor Count   Power Dissipation (µW)   Delay (ns)
Stage-1 (2-Bit)           40                 84.2                     1.619
Stage-2 (2-Bit)           90                 183.8                    1.986
Stage-3 (3-Bit)           136                201.9                    2.582
Stage-4 (4-Bit)           182                325.9                    3.01
Stage-5 (5-Bit)           228                411.8                    4.28
Total                     676                1207.6                   13.477
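The stage-2 tally for the conventional design (3 FA + 1 HA + 3 Mux = 90 transistors) can be reproduced with a few lines of Python using the per-block counts of Table I. The generalization to an n-bit stage used below (n-1 FA and one HA for the Cin=0 RCA, n FA for the Cin=1 RCA, and n+1 2:1 Mux) is an assumption extrapolated from the stage-2 description, not a formula stated in the paper; it happens to reproduce the transistor counts listed in Table III. All names are illustrative.

```python
# Per-block transistor counts taken from Table I.
TRANSISTORS = {"FA": 20, "HA": 12, "MUX2": 6}

# Stage widths of the 16-bit SQRT CSA (2 + 2 + 3 + 4 + 5 = 16 bits).
STAGE_BITS = [2, 2, 3, 4, 5]


def conventional_stage_count(n_bits, first_stage=False):
    """Transistor count of one conventional stage.

    Assumption (extrapolated from the stage-2 breakdown):
      - first stage: n_bits cascaded full adders, no selection logic
      - other stages: RCA with Cin=0 -> (n_bits - 1) FA + 1 HA
                      RCA with Cin=1 -> n_bits FA
                      selection      -> (n_bits + 1) 2:1 Mux
    """
    if first_stage:
        return n_bits * TRANSISTORS["FA"]
    fa = (n_bits - 1) + n_bits
    ha = 1
    mux = n_bits + 1
    return (fa * TRANSISTORS["FA"]
            + ha * TRANSISTORS["HA"]
            + mux * TRANSISTORS["MUX2"])


counts = [conventional_stage_count(n, first_stage=(i == 0))
          for i, n in enumerate(STAGE_BITS)]

if __name__ == "__main__":
    for i, c in enumerate(counts, start=1):
        print(f"Stage-{i}: {c} transistors")
    print("Total:", sum(counts))
```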
V. AREA EVALUATION OF MODIFIED 16-BIT SQRT CSA

The architecture of the modified 16-bit SQRT CSA is shown in Fig. 6 [2]. It is configured with five different stages. The RCA block with Cin=1 in the conventional SQRT CSA is replaced by a BEC block.

Fig. 6: Block diagram of modified 16-bit SQRT CSA [2]

The MOS transistor requirements for stage-2 in the modified 16-bit SQRT CSA are given below:
It consists of one FA and one HA for the 3:2 RCA with Cin=0, a 3-bit BEC and a 6:3 Mux.

Transistor count=74 (FA+HA+Mux 6:3+3-bit BEC)
FA=20 (1×20)
HA=12 (1×12)
3-bit BEC=24 (AND=6, NOT=2, XOR=16)
MUX=18 (3×6)

The circuit diagram given in Fig. 7 shows the stage-3 implementation using FA, HA and 4-bit BEC. The combined circuit diagram for stage-2 and stage-3 is illustrated in Fig. 8. The remaining circuit diagrams are given in Fig. 9 and Fig. 10. The implementation of the 7-bit modified SQRT CSA is shown in Fig. 11. The transistor count, power dissipation and delay for each stage of the modified SQRT CSA are listed in Table IV.

Fig. 8: Stage-2 and stage-3 of modified SQRT CSA

Table IV. Modified SQRT CSA parameters
Stage No. (No. of Bits)   Transistor Count   Power Dissipation (µW)   Delay (ns)
Stage-1 (2-Bit)           40                 84.2                     1.619
Stage-2 (2-Bit)           74                 108.5                    1.858
Stage-3 (3-Bit)           116                153.9                    2.302
Stage-4 (4-Bit)           156                206.1                    3.432
Stage-5 (5-Bit)           196                285.6                    4.823
Total                     582                808.4                    14.034
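To illustrate why an (n+1)-bit BEC can stand in for the n-bit RCA with Cin=1, the sketch below models the modified SQRT CSA at the behavioural level in Python. It is only a logic-level illustration under stated assumptions: integer addition stands in for the ripple carry adder with Cin=0, the generic `bec` helper extends the 4-bit Boolean expressions of Section III to any width (each bit is XORed with the AND of all lower bits), and every function name is hypothetical rather than taken from the paper.

```python
STAGE_BITS = [2, 2, 3, 4, 5]  # five stages, 16 bits in total


def bec4(b):
    """4-bit BEC written directly from the expressions for X0..X3."""
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    x0 = b0 ^ 1
    x1 = b0 ^ b1
    x2 = b2 ^ (b0 & b1)
    x3 = b3 ^ (b0 & b1 & b2)
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)


def bec(value, width):
    """Generic BEC (assumed generalization): bit i XOR (AND of lower bits)."""
    out = 0
    all_lower_ones = 1  # empty AND, so bit 0 is simply inverted
    for i in range(width):
        bit = (value >> i) & 1
        out |= (bit ^ all_lower_ones) << i
        all_lower_ones &= bit
    return out


def sqrt_csa_modified(a, b, cin=0):
    """Behavioural 16-bit modified SQRT CSA; returns (sum, carry_out)."""
    result, carry, shift = 0, cin, 0
    for i, n in enumerate(STAGE_BITS):
        mask = (1 << n) - 1
        a_s, b_s = (a >> shift) & mask, (b >> shift) & mask
        if i == 0:
            total = a_s + b_s + carry        # stage 1: plain ripple carry
        else:
            with_cin0 = a_s + b_s            # RCA with Cin=0 (sum + carry bit)
            with_cin1 = bec(with_cin0, n + 1)  # BEC replaces RCA with Cin=1
            total = with_cin1 if carry else with_cin0  # Mux on incoming carry
        result |= (total & mask) << shift
        carry = total >> n
        shift += n
    return result, carry


if __name__ == "__main__":
    # Worst-case carry propagation pattern used in the simulations.
    s, c = sqrt_csa_modified(0xFFFF, 0x0001)
    print(hex(s), c)
```

With A = FFFF and B = 0001, the carry generated in stage 1 selects the BEC output in every later stage, which is why this input pattern exercises the longest carry path.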
Fig. 7: Stage-3 partial sum and carry computation

VI. SIMULATION RESULTS

SPICE simulations are performed using TSMC 0.35 µm CMOS technology with minimum sized transistors and a supply voltage of 3.3 V. The worst-case delay occurs when all 16 bits of one input are kept logic-high while only the LSB of the 2nd input is kept high, i.e., A = FFFF, B = 0001. A comparative analysis of the conventional and modified SQRT CSA architectures based on propagation delay, power consumption, power-delay product (PDP) and transistor count is given in Table V.

Table V. Comparison of conventional and modified 16-bit SQRT CSA parameters
Stages    SQRT CSA       Delay (ns)   Power (µW)   PDP (10^-15 J)   Transistor Count
Stage-1   Conventional   1.619        84.2         136.31           40
          Modified       1.619        84.2         136.31           40
Stage-2   Conventional   1.986        183.8        365.02           90
          Modified       1.858        108.5        201.59           74
Stage-3   Conventional   2.582        201.9        521.3            136
          Modified       2.302        153.9        354.27           116
Stage-4   Conventional   3.01         325.9        -                182
          Modified       3.432        206.1        707.33           156
Stage-5   Conventional   4.28         411.8        -                228
          Modified       4.382        285.6        1251.49          196

It can be observed that the modified SQRT CSA requires fewer transistors and has a lower PDP. The functional validity of the proposed CMOS based design is verified with different sets of input data.

VII. CONCLUSION

In this paper, the SQRT CSA has been designed using the CMOS TG technique. The TG based full adder blocks and multiplexers have the advantage that there is no series critical path or longest path involved in the circuit, as there is in Manchester chains; therefore, the CMOS TG technique is especially useful for low-power applications. The SQRT CSA balances area, power consumption and speed well. Therefore, it is suitable for area efficient, low-power applications with relatively high-speed requirements. However, for extremely high-speed applications, the alternative dynamic logic technique should be used for realizing the XOR gate [8-9]. The adder circuit based on SQRT CSA can be used in fast multipliers based on Vedic arithmetic [10] and in distributed arithmetic based multiplication [11-12]. The above analysis and design can also be performed for higher order input data sizes and at different technology nodes.

Fig. 9: Schematic for stage-4 of modified SQRT CSA
Fig. 10: Schematic for stage-5 of modified SQRT CSA
Fig. 11: Schematic of 7-bit modified SQRT CSA

REFERENCES

[1] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, “Digital Integrated Circuits: A Design Perspective,” PHI Learning, 2nd Edition, 2001.
[2] B. Ramkumar, and H. M. Kittur, “Low-Power and Area-Efficient Carry Select Adder,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, pp. 371-375, Feb. 2012.
[3] Y. He, C.-H. Chang, and J. Gu, “An Area Efficient 64-Bit Square Root Carry-Select Adder for Low Power Applications,” Proc. of IEEE International Symposium on Circuits and Systems (ISCAS-2005), vol. 4, pp. 4082-4085, May 2005.
[4] Y. Kim, and L.-S. Kim, “64-Bit Carry-Select Adder with Reduced Area,” Electronics Letters, vol. 37, no. 10, pp. 614-615, May 2001.
[5] N. Zhuang, and H. Wu, “A New Design of the CMOS Full Adder,” IEEE Journal of Solid-State Circuits, vol. 27, no. 5, pp. 840-844, May 1992.
[6] J.-M. Wang, S.-C. Fang, and W.-S. Feng, “New Efficient Designs for XOR and XNOR Functions on the Transistor Level,” IEEE Journal of Solid-State Circuits, vol. 29, no. 7, pp. 780-786, July 1994.
[7] N. H. E. Weste, D. Harris, and A. Banerjee, “CMOS VLSI Design: A Circuits and Systems Perspective,” Pearson Education, 3rd Edition, 2005.
[8] S. Akhter, and S. Chaturvedi, “A Novel Method for Dual Output Dynamic Logic Using SCL Topology,” Proc. of IEEE International Conference on Signal Processing and Integrated Networks (SPIN-2014), pp. 481-485, Feb. 2014.
[9] S. Akhter, and S. Chaturvedi, “A high speed 14 Transistor Full Adder Cell Using Novel 4 Transistor XOR/XNOR Gates Based on Dynamic CMOS Logic,” International Journal of Applied Engineering Research, vol. 9, no. 11, pp. 1551-1564, 2014.
[10] S. Akhter, “VHDL Implementation of Fast NxN Multiplier Based on Vedic Mathematic,” Proc. of 18th European Conference on Circuit Theory and Design (ECCTD), pp. 472-475, Aug. 2007.
[11] S. A. White, “Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review,” IEEE ASSP Magazine, vol. 6, no. 3, pp. 4-19, July 1989.
[12] S. Akhter, V. Karwal, and R. C. Jain, “Implementation of Odd Discrete Cosine Transform (ODCT-II) using Distributed Arithmetic Approach,” Proc. of 3rd Nirma University International Conference on Engineering (NUiCONE), pp. 1-6, Dec. 2012.
