Final Report Book

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 39

PERFORMANCE ANALYSIS OF FULL

ADDER IN MULTISTAGE ADDERS USING


HYBRID TECHNOLOGY
A PROJECT REPORT

Submitted by

J.BLESSY (730417106005)

M.GOMATHI (730417106015)

V.HARINISHRI (730417106017)

S.JAMUNA (730417106024)

in partial fulfillment for the award of the

degree of

BACHELOR OF ENGINEERING

in

ELECTRONIC AND COMMUNICATION ENGINEERING

ERODE SENGUNTHAR ENGINEERING COLLEGE (AUTONOMOUS)


PERUNDURAI, ERODE-638 057

MARCH 2021

ERODE SENGUNTHAR ENGINEERING COLLEGE (AUTONOMOUS)


PERUNDURAI, ERODE-638 057

i
BONAFIDE CERTIFICATE

Certified that this project report “PERFORMANCE ANALYSIS OF FULL ADDER


IN MULTISTAGE ADDERS USING HYBRID TECHNOLOGY ” is the bonafide
work of “J.BLESSY (730417106005), M.GOMATHI (730417106015),
V.HARINISHRI (730417106017), S.JAMUNA (730417106024)” who carried
out the project work under my supervision. Certified further that to the best of
my knowledge the work reported herein does not form part of any other thesis or
dissertation on the basis of which a degree or award was conferred on an earlier
occasion on this or any other candidate.

SUPERVISED BY,
Mr.V.Thamizharasan
ASSISTANT PROFESSOR/ECE

Submitted for the Anna University project viva voice held on _______________

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT
It is our privilege and great pleasure to express our sincere and
heartful thanks to our President & Correspondent
Thiru.G.KAMALAMURUGAN for giving opportunity to study in this
prestigious institution.

We get immense pleasure to thank our Secretary


Thiru.S.N.THANGARAJU for their well-intentioned support throughout
our academic career.

ii
We are indebted to our Principal Dr.V.VENKATACHALAM for
his constant encouragement and valuable guidance throughout our career.

We express our sincere and heartfelt thanks to KARTHIKKUMAR


Head of the Department, Electronics and Communication Engineering for
her valuable suggestions and encouragement throughout the project.

We extend our sincere thanks to our beloved Project guide


Mr.V.Thamizharasan Assistant Professor and Coordinators
Dr.G.S.SATHEESH KUMAR & Mr.G.LINGESWARAN, Assistant
Professor, Department of Electronics and Communication Engineering for
his encouragement and valuable support for our project.

We offer our sincere thanks to all our esteemed teaching, nonteaching


staff members of Electronics and Communication
Engineering Department, my dear parents and friends who have contributed
their continuous support by their valuable suggestions and encouragement.

OBJECTIVE

➢Design of hybrid based full adder.

➢To minimize the power consumption of full adder.

➢To minimize the power consumption in multistage adder.

➢To analyze the timing behavior of multistage adder circuits.

iii
CONTENT

• Objective

• Existing Method

• Proposed Method

• Block Diagram

• Conclusion

• References

Existing Methods

➢ The most popular logic style in very large scale of integration circuit design
is conventional CMOS (C-CMOS) which has pull-up and pulldown transistor
networks.

➢ providing good drive capabilities and full output swing

➢ C-CMOS circuits cause more short circuit current and more dynamic current
at switching time, which causes more power consumption in comparison with
hybrid logic ones

➢ C-CMOS full adder has a large input capacitance because of the number of
connections to the pMOS and nMOS transistor which decreases the speed

➢ Hybrid logic style utilizes the characteristics of distinct logic styles for their
implementation to enhance the performance of the design CONVETIONAL
FULL ADDER TRUTH TABLE OF FULL

iv
v
Abstract—A novel design of a hybrid Full Adder (FA) using Pass Transistors
(PTs), Transmission Gates (TGs) and Conventional Complementary Metal Oxide
Semiconductor (CCMOS) logic is presented. Performance analysis of the circuit
has been conducted using Cadence toolset. For comparative analysis, the
performance parameters have been compared with twenty existing FA circuits. The
proposed FA has also been extended up to a word length of 64 bits in order to test
its scalability. Only the proposed FA and five of the existing designs have the
ability to operate without utilizing buffer in intermediate stages while extended to
64 bits. According to simulation results, the proposed design demonstrates notable
performance in power consumption and delay which accounted for low power delay
product. Based on the simulation results, it can be stated that the proposed hybrid
FA circuit is an attractive alternative in the data path design of modern high-speed
Central Processing Units. Index Terms— hybrid adder, full adder, transmission
gate, pass transistor. I. INTRODUCTION ITH the explosive evolution of transistor
scaling, research work related to low-power design of microelectronics circuits
have been greatly escalated.. As a result, demand for high performance design of
microelectronic circuits has skyrocketed [1]. Modern image and video processing
operations, digital signal processing (DSP) chips, microprocessors and many other
applications require large scale of arithmetic operations [2]. One of the most
elementary arithmetic operations is addition which is substantially and repetitively
used in modern computational applications. A 1-bit Full Adder (FA) is considered
as the nucleus of binary addition since it is the fundamental cell for building wide
word-length adders [3-4]. Hence, performance enhancement of FA is crucial to the
Mehedi Hasan and Md. Jobayer Hossein are with the ECE Department, North South
University, Dhaka 1229, Bangladesh. (e-mail: mehedi.hasan01@northsouth.edu,
jobayerhossein@gmail.com). Mainul Hossain was with the University of Central
Florida, Orlando, FL 32826 US. Currently, he is with the Department of Electrical
and Electronic Engineering, Dhaka University, Dhaka 1000, Bangladesh. (email:
mainul.eee@du.ac.bd). Hasan U. Zaman is a Senior Consultant of Intel
Corporation, Hudson, MA 1749, US and a Professor (on leave) of North South
University, Dhaka 1229, Bangladesh. (e-mail: hasan.zaman@northsouth.edu)
Sharnali Islam was with the University of Illinois at Urbana-Champaign,
Champaign, IL 61801 US. Currently she is with Dhaka University, Dhaka 1000,
Bangladesh. (email: sharnali.eee@du.ac.bd). performance improvement of the
Arithmetic and Logic Unit (ALU) in the microprocessors. In this research, a new
hybrid FA using PTs, TGs and static CMOS logic has been proposed. The FA was
implemented in 45 nm technology using Cadence tools. In order to judge the
reliability, a comparative inspection on performance parameters of the proposed
design has been conducted with twenty existing FA designs with supply voltage
varied from 0.4 V to 1.2 V. Moreover, the FAs have been extended to wide word
1
length adders to test their performance parameters in large scale circuits. The
proposed FA exhibited remarkable performance both as single cell and extended
form compared to the existing FAs. II. RELATED WORK Complementary Pass
Transistor Logic (CPL) based FA [5], 12 Transistor (12-T) FA [6] and
Conventional CMOS (CCMOS) logic based FA [7] utilize single logic style for FA
implementation. Due to having voltage degradation, CPL logic requires additional
buffers to bring back the signal to supply voltage level. The problem of voltage
degradation is not associated with C-CMOS FA. However, large input impedance is
a major problem in C-CMOS FA. Researchers nowadays tend to use hybrid design
approach which utilizes the favorable aspects of several logic styles within the same
FA cell. TG Adder (TGA) in [8], [9] and Transmission Function Adder (TFA) in
[10] use TGs in the design. While TGA and TFA FA do not suffer from voltage
degradation, poor capability is a major concern in these two adders. FA cells using
24 transistors (24-T) [11], 14 transistors (14- T) [12] and 10 transistors (10-T) [13]
also use multiple logic style for FA implementation. 24-T FA uses a 3 input XOR
gate for sum calculation rather than cascading two separate 2 input XOR gate. Input
bits in 14-T and 10-T FA are at first fed into a XOR gate. The XOR gate acts as an
intermediate node whose output is employed to produce the final output. Very low
transistor count in 14-T and 10-T FAs accounts for less surface area. However,
driving capability remains a major concern which limits their application in high
fan-out cases. Hybrid Pass Static CMOS (HPSC) FA in [14] uses Pass Transistor
Logic (PTL) to generate XOR and XNOR signals in intermediate nodes. The output
side consists of C-CMOS logic which provides full-swing outputs. However,
employment of CCMOS logic increases transistor count and capacitance. Design of
a Scalable Low-Power 1-bit Hybrid Full Adder for Fast Computation Mehedi
Hasan, Md. Jobayer Hossein, Mainul Hossain, Hasan U. Zaman, Senior Member,
IEEE and Sharnali Islam W 1549-7747 (c) 2019 IEEE. Personal use is permitted,
but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more
information. This article has been accepted for publication in a future issue of this
journal, but has not been fully edited. Content may change prior to final publication.
Citation information: DOI 10.1109/TCSII.2019.2940558, IEEE Transactions on
Circuits and Systems II: Express Briefs 2 Double Pass transistor logic (DPL) and
Swing Restored CPL (SRCPL) Hybrid FAs proposed in [15] generate ANDOR
signal to compute output carry and XOR-XNOR signal for the sum output. Input
carry is used as the select bit of TG based multiplexers (MUX) to produce outputs
from intermediate AND-OR and XOR-XNOR nodes. The hybrid FA in [16] uses
inverter in the output side of the adder and utilized CPL and TG logic in the input
side. The idea of using C-CMOS logic-based inverter in output side is to provide
good driving capacity. More Hybrid FAs are reported in [17- 21]. Based on Gate
2
Diffusion Input (GDI) technique, three FA designs (addressed as GDI D1, GDI D2
and GDI D2 in this paper) have been reported in [22]. Major problem associated
with GDI adders is its weak output signal. III. PROPOSED FULL ADDER
DESIGN APPROACH Design approach of the proposed FA is shown in Fig. 1. The
design of the proposed FA is divided into four major modules: two modules for
carry generation and the other two for sum. Schematic of the proposed design is
represented by Fig. 2. The design methods and detailed description of the modules
are provided in the following sub-sections. A. Carry Generation From the truth
table of a FA, it can be observed that If, 0݊݊ = ‫ ܥ‬then ‫ = ݐݑ݋ܥ‬AND function and
if, 1݊݊ = ‫ ܥ‬then ‫ = ݐݑ݋ܥ‬OR function Therefore, the proposed carry generation part
consists of a novel AND-OR module (module 1) based on TG and CPL logic.
Within the AND-OR module, implementation of AND and OR gates are quite
similar, except the fact that nMOS and pMOS transistors are interchanged. The
conditions required to design the proposed AND gate are as follows: If 0 = ‫ ܣ‬, then
0 = ‫ ݐݑ݋ݐݑ݋‬and (1) If 1=‫ܣ‬, then 2) ‫ )ݐݑ݋ݐݑ݋ = ܤ‬The first condition for AND gate
design in (1) is carried out by the pass transistor N2 and TG1, respectively. The
later condition (2) is carried out by TG1. In the same way, conditions for designing
OR gate are: If 1 = ‫ ܣ‬, then 1 = ‫ ݐݑ݋ݐݑ݋‬and (3) If 0=‫ܣ‬, then 4) ‫ )ݐݑ݋ݐݑ݋ = ܤ‬Here,
conditions (3) and (4) are carried out by pass transistor P2 and TG2, respectively. A
TG based 2:1 MUX (module 2: consists of TG3 and TG4) is used to select the
appropriate carry-out signal from AND-OR module depending on input carry Cin.
B. Sum Generation The sum output of the proposed FA is generated by cascading
two XOR modules (module 3 and 4). The XOR gates have exactly the same
structure and are implemented using TGs and PTs. The conditions for designing the
XOR gate for module 3 are: If 0 = ‫ ܣ‬, then 5) ‫ )ݐݑ݋ݐݑ݋ = ܤ‬If 1=‫ ܣ‬and 0=‫ܤ‬, then
6) ‫ )ݐݑ݋ݐݑ݋ = ܣ‬If 1=‫ ܣ‬and 1=‫ܤ‬,then 7) ‫ )ݐݑ݋ݐݑ݋ = ܣ‬In this case, condition (5)
has been implemented using TG5. The pass transistor P3 in module 3 (XOR gate)
implements condition (6) and the pass transistor N3 implements condition (7).
Employing exactly the same method, XOR gate for module 4 has been designed
using Cin and A⊕B (output of module 3) as inputs. For this case, the conditions
are: (8 (‫ ܤ ⊕ ܣ = ݐݑ݋ݐݑ݋‬then , 0݊݊ = ‫ ܥ‬If If 1݊݊ = ‫ ܥ‬and 0=‫ܣ⊕ܤ‬, then ‫ݐݑ݋ݐݑ݋‬
= ‫݋݋ )ܥ‬9) If 1݊݊ = ‫ ܥ‬and 1=‫ܣ⊕ܤ‬, then ‫݋݋ )ݐݑ݋ݐݑ݋ = ܥ‬10) In this case, the
conditions (8), (9) and (10) have been implemented by TG6, P4 and N4,
respectively. IV. SIMULATION RESULT AND DISCUSSION To analyze various
performance parameters, simulation has been conducted in GPDK 45 nm
technology node using Cadence simulation tools. Buffers are added in the input and
output side for simulation test bench. The output side also consisted a load
capacitance of 6 fF. In this way, the circuit experiences significant signal distortion

3
unlike realistic environment. Input frequency has been set to 100 MHz. For
optimized transistor sizing, particle swarm optimization algorithm has been utilized
for implementation of FAs [23]- [24]. For transistor sizing using swarm algorithm,
we initialize a vector ݊‫ ݋×݋‬ሬሬሬሬሬሬሬሬሬሬሬԦ where m is the transistor count
and n is the number of particles. Each particle has 3 sorts of movement tendency in
the search space: (a) tendency due to its own inertia (b) tendency towards particle’s
best value (c) tendency towards team’s best value. The circuit is simulated by
randomly changing transistor width within the range of minimum to maximum
allowable size. The objective is minimum Power Delay Product (PDP). If a
particle’s current PDP < best PDP value so far found by that particle, then the
current PDP is set as the particle’s best PDP. In the same way, if the best PDP for
current simulation run < the best PDP value so far found in the entire search space,
then the team’s best value is updated with the current value. In this way, after
completing the desired number of iterations, the team’s best value of PDP is chosen
as the optimal PDP and the corresponding transistor widths are chosen for circuit
implementation. A. Performance of FA cells for various supply voltage The post-
layout simulation results on delay, power and Power Delay Product (PDP) for the
supply voltage 0.4 V (minimum voltage in order to avoid sub-threshold operation),
0.8 V (mid-point voltage between 0.4 V and 1.2 V) and 1.2 V (normal operating
voltage) are represented using Fig. 3 and Table I. The results (Fig. 3) demonstrate
that proposed FA obtained superior performance in case of delay and PDP.
Although the proposed design does not obtain best performance in power (Fig. 3b),
still the value is low enough for practical utilization in modern processors. The least
technology node in which 10-T FA [13] could operate is 180 nm technology with a
least supply voltage of 1.8 V. Hence, to maintain uniform environment for
comparison, performance parameters (15.03 µW power and 129.48 ps delay) of 10-
T FA are not added in Table I. 1549-7747 (c) 2019 IEEE. Personal use is permitted,
but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more
information. This article has been accepted for publication in a future issue of this
journal, but has not been fully edited. Content may change prior to final publication.
Citation information: DOI 10.1109/TCSII.2019.2940558, IEEE Transactions on
Circuits and Systems II: Express Briefs 3 Fig. 1. Block diagram of proposed full
adder. Fig. 2 Schematic of the proposed FA. 12-T [6], 14-T [12], HPSC [14] and
Hybrid 2 [18] FAs could not operate flawlessly while simulated with 0.4 V supply
voltage. Hence, performance parameters of these FAs for 0.4 V could not be
obtained for presenting in Table I and Fig. 3. To figure out the least operating
voltages of these FAs, further simulation has been conducted using 0.45 V-0.75 V
supply voltage. The least supply voltage required for 12-T [6], 14-T [12], HPSC
[14] and Hybrid 2 [18] FAs are 0.75 V, 0.7 V, 0.7 V and 0.5 V respectively. It has
4
been also observed that delays of FA cells start increasing quite drastically while
operating below 0.8 V supply voltage. As technology scales down, interconnect
performance depends more on the resistance rather than the capacitance [24]. To
achieve the best performance, the interconnect resistance and capacitances need to
be re-optimized. For 5nm node, as demonstrated by Pan et al. [24], increasing the
interconnect width beyond half pitch without changing the interconnect pitch yields
55% improvement in energy-delay product for vertical field-effect transistor
(VFET) circuits. Therefore, for more aggressive technologies, the interconnect
width should be a target optimization parameter to achieve desired performance. B.
Performance of FAs operating in cascade To inspect scalability, FAs are extended
into 4, 8, 16, 32 and 64 bits in Ripple Carry Adder style as depicted in Fig. 4 [18].
No intermediate buffer has been added while cascading FAs. Simulation results on
performance parameters are listed in Table II. It was observed that only 5 out of 20
existing FAs and the proposed design can operate when extended to 64-bits. As per
Fig. 3, Hybrid 2 [18], Hybrid 6 [21] and GDI D1 [22] FAs showed promising
performance as single cell. However, they can be only extended to maximum 8-bits.
This limitation happens due to the decline of signal strength (voltage) while
propagating through several stages. Due to having CCMOS based logic style in the
output terminals, the output signals come from Vdd and Gnd for CCMOS [7] and
24-T [11] FAs. As a result, the voltage gets replenished in each stage for which
CCMOS [7] and 24-T [11] FAs do not face the problem of voltage deterioration in
long carry chains. ` Fig. 3. Performance of FA cells (a) average power (b)
propagation delay (c) PDP. 1549-7747 (c) 2019 IEEE. Personal use is permitted,
but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more
information. This article has been accepted for publication in a future issue of this
journal, but has not been fully edited. Content may change prior to final publication.
Citation information: DOI 10.1109/TCSII.2019.2940558, IEEE Transactions on
Circuits and Systems II: Express Briefs 4 TABLE I PERFORMANCE OF FULL
ADDER CELLS UNDER VARIOUS SUPPLY VOLTAGE TC: Transistor Count,
AP: Average Power, DP: Dynamic Power (Switching Power + Short-Circuit
Power), PDP: Power Delay Product, SP: Static Power, F: Failed to Operate, PR:
Performance Ratio (Proposed FA Value: Best Value Obtained by Other FAs) Full
Adder Ref. no T C Power (AP in µW, DP and SP in nW) Delay (ps) PDP (aJ) 0.4 V
0.8 V 1.2 V 0.4 V 0.8 V 1.2 V 0.4 V 0.8 V 1.2 AP DP SP AP DP SP AP DP SP V
CPL [5] 32 0.488 458.2 29.8 1.72 1612.8 107.3 3.89 3634.1 255.9 612.1 84.8 37.3
298.7 145.9 145.1 12-T [6] 12 F F F 1.2 1158.4 41.6 2.54 2423.4 116.6 F 447.4
67.6 F 536.9 171.7 CCMOS [7] 28 0.159 149.1 9.9 0.68 660.3 19.7 1.91 1841.6
68.4 809.3 125.8 39.7 145.7 85.5 75.8 TGA [8] 20 0.163 158.8 8.2 0.64 617.4 22.6
1.41 1365.8 44.2 893.4 139.7 58.3 151.9 89.4 82.2 TFA [10] 16 0.155 150.9 8.1
5
0.61 586.5 23.5 1.33 1288.7 41.7 957.5 145.6 66.8 155.1 88.8 88.8 24-T [11] 24
0.202 189.9 12.1 0.76 724.8 35.2 1.68 1628.8 51.2 908.3 128.4 65.9 183.5 97.6
110.7 14-T [12] 14 F F F 1.12 1059.7 60.3 2.34 2202.7 137.3 F 385.6 67.4 F 431.9
157.7 HPSC [14] 22 F F F 0.89 844.5 45.5 2.09 1979.5 110.5 F 89.7 38.6 F 78 80.7
DPL [15] 22 0.256 240.1 15.9 0.87 833.3 36.7 2.11 1986.3 123.7 835.6 91.9 45.6
213.9 79.9 96.2 SRCPL [15] 20 0.193 182.8 10.2 0.7 759.8 31.1 1.79 1726 64
869.6 132.3 50.4 167.8 104.5 90.2 New-HPSC [16] 24 0.217 205.3 11.7 0.77 739.8
30.2 2.04 1931.6 108.4 998.5 193.6 71.5 216.7 149.1 145.9 Hybrid 1 [17] 24 0.267
258.8 16.2 0.78 749.3 30.7 2.31 2176.4 133.6 932.4 189.1 60.2 248.9 145.6 139
Hybrid 2 [18] 16 F F F 0.35 339.7 10.3 0.94 912.8 27.2 F 231.4 69.6 F 81 65.4
Hybrid 3 [19] 22 0.169 162.6 6.4 0.68 661.1 18.9 1.53 1473.5 56.5 811.8 101.3
48.6 146.9 68.9 74.4 Hybrid 4 [20] 16 0.113 106.9 6.1 0.44 428.2 11.8 0.98 951.3
28.7 675.3 81.6 38.7 76.3 35.9 37.9 Hybrid 5 [21] 21 0.146 136.9 9.1 0.58 561.1
18.9 1.31 1270.4 39.6 697.5 96.8 43.9 108.1 56.1 57.5 Hybrid 6 [21] 23 0.133
125.3 7.7 0.54 522.5 17.5 1.35 1304.1 45.9 683.4 81.4 35.1 98.4 43.9 47.4 GDI D1
[22] 18 0.127 144.8 7.2 0.46 446.7 13.3 1.09 1054.7 35.3 599.8 98.8 31.8 76.17
43.5 34.7 GDI D2 [22] 22 0.165 155.6 9.4 0.63 604.6 25.4 1.49 1437.5 52.5 547.6
77.3 28.6 90.4 48.7 42.6 GDI D3 [22] 21 0.152 116.6 8.5 0.61 586.3 23.7 1.32
1277.6 42.4 708.3 90.53 39.7 114.7 55.2 52.4 Proposed ----- 22 0.129 122.2 6.8
0.48 466.6 13.4 1.17 1135.7 34.3 523.8 65.7 25.3 67.6 31.5 29.6 Performance Ratio
(PR) 1.14:1 1.14:1 1.11:1 1.37:1 1.37:1 1.3:1 1.24:1 1.24:1 1.26:1 1:1.04 1:1.17
1:1.13 1:1,13 1:1.14 1:1.17 TABLE II PERFORMANCE COMPARISON OF
FULL ADDER CELLS OPERATING IN CASCADE Supply voltage: 0.8 V, F:
Failed to Operate, PR: Performance Ratio (Proposed FA Value: Best Value
Obtained by Other FAs) Full Adder Ref. no Power (µW) Delay (ps) PDP (fJ) 4 bit 8
bit 16 bit 32 bit 64 bit 4 bit 8 bit 16 bit 32 bit 64 bit 4 bit 8 bit 16 bit 32 bit 64 bit
CPL [5] 6.47 13.46 F F F 1256.7 4484.9 F F F 8.13 60.36 F F F 12-T [6] F F F F F
F F F F F F F F F F CCMOS [7] 2.42 4.68 9.48 20.79 44.2 510.6 1042.5 2105.8
4233.2 8490.5 1.24 4.88 19.96 88.01 373.3 TGA [8] 2.45 5.33 F F F 709.1 2145.7 F
F F 1.74 11.44 F F F TFA [10] 2.5 4.45 F F F 478.7 2444.5 F F F 1.196 10.88 F F F
24-T [11] 2.83 5.56 10.95 22.06 46.31 521.2 1078.5 2183.9 4389.6 8867.1 1.47
5.996 23.91 96.83 410.6 14-T [12] F F F F F F F F F F F F F F F HPSC [14] 3.24
6.3 12.45 25.98 F 561.8 1214.3 2520.2 5131.7 F 1.82 7.65 31.38 133.3 F DPL [15]
3.62 7.92 F F F 1467.6 4814.4 F F F 5.31 38.13 F F F SRCPL [15] 3.59 F F F F
1934.3 F F F F 6.94 F F F F New HPSC [16] 2.84 5.57 11.11 22.75 F 945.5 2024.6
4203.5 8581.2 F 2.69 11.28 46.7 195.2 F Hybrid 1 [17] 2.81 5.47 10.77 22.69 F
767.8 1552.5 3485.4 7057.8 F 2.16 8.49 37.54 160.1 F Hybrid 2 [18] 1.67 3.65 F F
F 1683.9 6436.7 F F F 2.81 23.49 F F F Hybrid 3 [19] 2.53 5.41 11.69 25.45 55.94
413.3 881.6 1782.4 3572.5 7976.7 1.06 4.77 20.83 90.92 446.2 Hybrid 4 [20] 1.83
3.99 F F F 475.7 2487.6 F F F 0.87 9.93 F F F Hybrid 5 [21] 2.39 5.23 F F F 539.6
6
3094.6 F F F 1.29 16.18 F F F Hybrid 6 [21] 2.41 5.28 F F F 498.6 2986.3 F F F 1.2
15.77 F F F GDI D1 [22] 1.93 4.27 F F F 500.3 2493.6 F F F 0.97 10.65 F F F GDI
D2 [22] 2.08 4.38 9.4 20.16 43.23 345.9 743.6 1621.2 3550.6 7985.1 0.72 3.26
15.24 71.58 345.2 GDI D3 [22] 2.57 5.56 11.86 25.94 56.87 410.7 883.1 1924.2
4290.8 9826.4 1.05 4.91 22.82 111.3 558.8 Proposed ----- 1.95 4.19 9.01 19.11
40.19 285.6 613.8 1326.5 2946.7 6725.9 0.56 2.57 11.95 56.3 270.3 Performance
Ratio (PR) 1.17:1 1.15:1 1:1.04 1:1.05 1:1.08 1:1.21 1:1.21 1:1.22 1:1.2 1:1.19
1:1.28 1:1.27 1:1.28 1:1.27 1:1.28 Fig. 4. Implementation of n-bit adder using 1-bit
FA. HPSC [14] and New-HPSC [16] FAs despite having CCMOS logic in the
output terminals failed to operate in 64- bits. It occurred since their delays were
extremely high to comply with the input frequency (100 MHz). Hybrid 3 [19], GDI
D2 [22], GDI D3 [22] FAs and the proposed design do not incorporate CCMOS
logic in the output terminals. Still, they managed to operate up to 64-bits due to
their internal design structures. With careful inspection of Fig. 4, it can be observed
that in each stage, An and Bn terms are inserted as fresh inputs. The carry term
needs to propagate through several stages. In the proposed design (Fig. 2), since the
input carry term Cin is used as gate control of the TGs, the possible signals which
might appear at the output terminal (Cout terminal) are Vdd, Gnd or B. If we
consider only B (since Vdd and Gnd will be provided as fresh inputs in each stage),
for n=0, B0 will appear in the output terminal (C1 output terminal) whereas for
n=1, B1 will appear in output terminal (C2 output terminal). Hence, the same signal
does not propagate through n=0 to n=63 for which the design is not subjected to
voltage degradation when extended to wide adder. 1549-7747 (c) 2019 IEEE.
Personal use is permitted, but republication/redistribution requires IEEE
permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more
information. This article has been accepted for publication in a future issue of this
journal, but has not been fully edited. Content may change prior to final publication.
Citation information: DOI 10.1109/TCSII.2019.2940558, IEEE Transactions on
Circuits and Systems II: Express Briefs > REPLACE THIS LINE WITH YOUR
PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 5
Fig. 5. Performance of proposed design under various loads with 0.8 V supply
voltage. C. Driving capability test of the proposed design To test drive capability of
the proposed FA design, a wide range of output loads from fan-out of 4 unit-size
inverters (FO-4) to fan-out of 64 unit-size inverters (FO-64) have been applied. The
supply voltage for this case was set to 0.8 V. Simulation data are illustrated using
Fig. 5. With careful observation of Fig. 5, it can be noticed that delay, AP and PDP
are extremely high for FO-32 and FO-64. Hence, the most favorable fan-out load
condition for the proposed design would be up to FO-16. V. CONCLUSION In this
research, a novel hybrid FA design showing markedly improved performance has
7
been proposed. Characteristics of the proposed design have been compared with
twenty existing FAs. For performance analysis, simulation was conducted using
Cadence toolset. As per simulation results, the proposed FA demonstrates superior
performance in speed and PDP while operated as a single cell. Moreover, the FAs
have been extended up to a word length of 64-bits in order to test scalability. Only
the proposed FA and five of the existing designs have the ability to operate without
utilizing buffers in intermediate stages while extended to 64- bits. In the cascade
mode, the performance parameters point out that the proposed design is
comparatively more suitable for wider word-length adder implementation which is
the trend in modern computing systems where high-speed and efficient computation
while consuming low power is essential. REFERENCES [1] F. Frustaci, M.
Lanuzza, P. Zicari, S. Perri and P. Corsonello, "Designing High-Speed Adders in
Power-Constrained Environments," IEEE Transactions on Circuits and Systems II:
Express Briefs, vol. 56, no. 2, pp. 172-176, Feb. 2009. [2] A. Pal, Low Power VLSI
Circuits and Systems, New Delhi, India: Springer India, 2015. [3] B. K. Mohanty,
"Efficient Fixed-Width Adder-Tree Design," IEEE Transactions on Circuits and
Systems II: Express Briefs, vol. 66, no. 2, pp. 292-296, Feb. 2019.. [4] S. Purohit
and M. Margala, “Investigating the impact of logic and circuit implementation for
full adder performance,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 20, no. 7, pp. 1327-1331, 2012. [5] R. Zimmermann and W.
Fichtner, “Low-power logic styles: CMOS versus pass-transistor logic,” IEEE J.
Solid-State Circuits, vol. 32, no. 7, pp. 1079–1090, Jul. 1997. [6] Yingtao Jiang, A.
Al-Sheraidah, Yuke Wang, E. Sha and Jin-Gyun Chung, "A novel multiplexer-
based low-power full adder," IEEE Transactions on Circuits and Systems II:
Express Briefs, vol. 51, no. 7, pp. 345-348, July 2004. [7] N. H. E. Weste and D. M.
Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. Boston,
MA, USA: Addison-Wesley, 2010. [8] A. M. Shams, T.K. Darwish, and M. A.
Bayoumi, “Performance analysis of low-power 1-bit CMOS full adder cells,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 1, pp. 20–29, Feb. 2002.
[9] C. H. Chang, J. M. Gu, and M. Zhang, “A review of 0.18-μm full adder
performances for tree structured arithmetic circuits,” IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. 13, no. 6, pp. 686–695, Jun. 2005. [10] M. Alioto, G. Di
Cataldo, and G. Palumbo, “Mixed full adder topologies for high-performance low-
power arithmetic circuits,” Microelectron. J., vol. 38, no. 1, pp. 130–139, Jan. 2007.
[11] C.-K. Tung, Y.-C. Hung, S.-H. Shieh, and G.-S. Huang, “A low-power high-
speed hybrid CMOS full adder for embedded system,” in Proc. IEEE Conf. Design
Diagnostics Electron. Circuits Syst., vol. 13. Apr. 2007, pp. 1–4. [12] M.
Vesterbacka, “A 14-transistor CMOS full adder with full voltageswing nodes,” in
Proc. IEEE Workshop Signal Process. Syst. (SiPS), Taipei, Taiwan, Oct. 1999, pp.
713–722. [13] H. T. Bui, Y. Wang, and Y. Jiang, “Design and analysis of low-
8
power 10-transistor full adders using novel XOR-XNOR gates,” IEEE Transactions
on Circuits Systems II: Express Briefs, vol. 49, no. 1, pp. 25–30, Jan. 2002. [14] M.
Zhang, J. Gu, and C.-H. Chang, “A novel hybrid pass logic with static CMOS
output drive full-adder cell,” in Proc. Int. Symp. Circuits Syst., May 2003, pp. 317–
320. [15] M. Aguirre-Hernandez and M. Linares-Aranda, “CMOS full-adders for
energy-efficient arithmetic applications,” IEEE Trans. Very Large Scale Integr.
(VLSI) Syst., vol. 19, no. 4, pp. 718–721, Apr. 2011. [16] S. Goel, A. Kumar, and
M. A. Bayoumi, “Design of robust, energyefficient full adders for deep-
submicrometer design using hybrid-CMOS logic style,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 14, no. 12, pp. 1309–1321, Dec. 2006. [17] I.
Hassoune, D. Flandre, I. O'Connor and J. Legat, "ULPFA: A New Efficient Design
of a Power-Aware Full Adder," in IEEE Transactions on Circuits and Systems I:
Regular Papers, vol. 57, no. 8, pp. 2066- 2074, Aug. 2010. [18] P. Bhattacharyya,
B. Kundu, S. Ghosh, V. Kumar and A. Dandapat, "Performance Analysis of a Low-
Power High-Speed Hybrid 1-bit Full Adder Circuit," IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 23, no. 10, pp. 2001-2008, Oct. 2015.
[19] R. F. Mirzaee, M. H. Moaiyeri, H. Khorsand and K. Navi, “A New Robust and
Hybrid High-Performance Full Adder Cell,” Journal of Circuits, Systems and
Computers, vol. 20, no. 4, pp. 641-655, 2011. [20] M. C. Parameshwara and H. C.
Srinivasaiah, “Low-Power Hybrid 1-Bit Full Adder Circuit for Energy Efficient
Arithmetic Applications,” Journal of Circuits, Systems and Computers, vol. 26, no.
1, pp. 1-15, 2017. [21] P. Kumar and R. K. Sharma, “An Energy Efficient Logic
Approch to Implement CMOS Full Adder,” Journal of Circuits, Systems and
Computers, vol. 26, no. 5, pp. 1-20, 2017. [22] M. Shoba and R. Nakkeeran, “GDI
based full adders for energy efficient arithmetic applications,” Engineering Science
and Technology, an International Journal, vol. 19, no. 1, pp. 485-496, 2016. [23] J.
J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, “Comprehensive learning
particle swarm optimizer for global optimization of multimodal functions,” IEEE
Trans. Evol. Comput., vol. 10, no. 3, pp. 281–295, Jun. 2006. [24] C. Pan and A.
Naeemi, “A Paradigm Shift in Local Interconnect Technology in the Era of
Nanoscale Multigate and Gate-All-Around Devices,” vol. 36, no. 3, pp. 274-276,
201

Hybrid Logical Effort for Hybrid Logic Style Full Adders in Multistage
Structures Hareesh-Reddy Basireddy, Karthikeya Challa, and Tooraj Nikoubin
, Senior Member, IEEE Abstract— One of the critical issues in the advancement
of very large scale of integration circuit design is the estimation of timing
behavior of the arithmetic circuits. The concept of logical effort provides a
proficient approach to comprehend and assess the timing behavior of circuits
9
with conventional CMOS (C-CMOS) structure. However, this technique is not
working for circuits with a hybrid structure. On the other hand, numerous
circuits with the hybrid structure which are faster and consume less power than
C-CMOS one have been proposed for different applications such as portable and
IoT devices. In this regard, the necessity of having and use of a simple and
efficient timing behavior method like conventional logical effort for analysis of
the hybrid adder circuits is inevitable. This paper proposes an efficient analysis
and modeling technique that enables designers to assess the timing behavior of
hybrid full adder circuits at the block level and anticipate their performance in
multistage circuits. The gain and selection factor are introduced as a criterion
for accurate selection and optimization of the hybrid adder cells measurable on
the single test bench for management of energy efficiency and performance
tradeoff. The proposed method is investigated using 32-nm CMOS and FinFET
technologies. Index Terms— Drivability, hybrid CMOS, hybrid logical effort,
input capacitance, logical effort, timing behavior. I. INTRODUCTION FULL
adders as the core of arithmetic building blocks have a fundamental role in the
operation of any computer system, from the simplest controller to the most
complex processors [1]. From the past several years, the full adder is a focal
concentration area for arithmetic blocks, especially for different applications
such as portable devices and IoT [2], [3]. A full adder can be designed using
several logic styles, where each style has its advantages and disadvantages. The
most popular logic style in very large scale of integration circuit design is
conventional CMOS (C-CMOS) which has pull-up and pull-down transistor
networks. The pull-up network is based on series and parallel pMOS transistors,
and pull-down network is based on series and parallel nMOS transistors. The C-
CMOS style full adder is the general CMOS structure with conventional pull-
up and pull-down networks providing good drive capabilities and full output
swing. On the other hand, Manuscript received May 31, 2018; revised
November 2, 2018; accepted December 18, 2018. (Corresponding author:
Tooraj Nikoubin.) The authors are with the Department of Electrical and
Computer Engineering, Texas Tech University, Lubbock, TX 79409 USA (e-
mail: hareesh-reddy.basireddy@ttu.edu; karthikeya.challa@ttu.edu;
t.nikoubin@ gmail.com). Color versions of one or more of the figures in this
paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier
10.1109/TVLSI.2018.2889833 C-CMOS circuits cause more short circuit
current and more dynamic current at switching time, which causes more power
consumption in comparison with hybrid logic ones. In general, hybrid structure
circuits have less connection to the power supply and ground in comparison with
C-CMOS circuits. Also, C-CMOS full adder has a large input capacitance
10
because of the number of connections to the pMOS and nMOS transistor which
decreases the speed. Hybrid logic style utilizes the characteristics of distinct
logic styles for their implementation to enhance the performance of the design
[3]–[6]. It is notable that the most popular circuits reported in the hybrid
structure are full adders because the complexity of the circuit becomes more
dramatically with increasing the block size and the number of inputs and outputs
such as compressors, carry save adders, and so on. In the gate level, most
popular circuits with hybrid structure are two- and three-input XOR and XNOR
circuits which they are the core of full adders as well. Therefore, full adder
blocks with different input–output drive conditions have been considered in this
investigation. Despite many advantages which are reported for many hybrid full
adders like area–power–energy efficiency, noise tolerant, and high speed, the
main problem in their utilization is irregularity and complexity of their structure.
In this regard, the cell design methodology (CDM) and systematic CDM [6]–
[9] are proposed as a systematic approach for the design of the circuits with
hybrid structure. These methods are presented in such a way that they can keep
their advantages with focus on key points of the circuit design. These key points
are a minimum number of transistors on the critical path, splitting the circuit to
logic part and drive part, the powerless and groundless design of the logic part,
the use of different amend mechanisms for various characteristics, and so on.
To have the hybrid logic style, as a reliable logic style like C-CMOS, efficient
transistor sizing algorithm is another problem. This mentioned sizing method
must be flexible for covering complexity of the circuits with hybrid structure
because the conventional logical effort was not working for them. For example,
in the conventional logical effort, the parallel transistors can get the similar size
based on transistor unit size, and for the series transistors, the size of each
transistor depends on the number of transistors on the path drive and transistor
unit size. For the circuits with a hybrid structure, this technique is not working,
because the circuit structure is more complicated. Simple exact algorithm (SEA)
[10] has been proposed as a heuristic method for optimization and sizing of the
circuits with hybrid structure. This is efficient sizing algorithm, which 1063-
8210 © 2019 IEEE. Personal use is permitted, but republication/redistribution
requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for
more information. This article has been accepted for inclusion in a future issue
of this journal. Content is final as presented, with the exception of pagination. 2
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI)
SYSTEMS Fig. 1. (a) TGA with inputs drive path. (b) TFA. (c) New-14T. (d)
C-CMOS. (e) New-HPSC. (f) Cin measurement test bench. (g) Delay versus CL
11
for Cin calculation. is flexible for covering both of structure complexity and
desirable characteristics of the circuit in single-object and multiobject
optimizations, but this technique does not work for delay measurement or timing
behavioral analysis. The concept of logical effort was first presented by
Sutherland et al. [11]. The report clearly explains the methodology to model the
C-CMOS circuits for both circuit optimization with sizing and delay modeling
of single-stage and multistage structures. As recent work in this regard, Lin et
al. [12] proposed an improved logical effort model for the circuits operating in
multiple supply voltage regimes. All the previously explained works focused on
the logical effort of C-CMOS logic style circuits because of their fixed and
proper structure. However, for hybrid logic circuits, a new method is required
to analyze the circuit behavior because of their complicated structure [13].
Another advantage of this paper is to make clear ideas on how to select and use
adder blocks with hybrid structure. For example, some of the full adders are low
power or energy efficient when they are working on a single test bench with
specific input–output drive conditions. However, they are not working properly
in the multistage structures because of the lack of drivability and their big input
capacitance. With adding a buffer to compensate their lack of drive, they can
function properly, but they could not be an efficient cell because the whole
circuit has some overhead of area, power, and energy consumption with extra
buffers. This paper is useful for deeply understanding the adder circuits with a
hybrid structure to make sure they could be reliable blocks in multistage
structures at the same time they are the best for the goal parameters such as
power, energy, and so on. This paper has the potential for improving the
industrial tools for timing analysis as well. The big question in this paper is what
are the critical parameters that designer needs to consider for the circuits with
hybrid structure in the single test bench, for extraction of a complete timing
behavior of the adder cells? In this regard, a new method is proposed to dissect
the timing behavior of the hybrid full adders with the irregular structure, and
simplicity is considered as a principle in the proposed method. The transistor
function full adder (TFA), transmission gate full adder (TGA), New-HPSC,
New-14T (Fig. 1) as most popular hybrid full adders [10] with different input–
output drive conditions are selected in comparison with the C-CMOS full adder
for running this new method. Since propagation delay is, test bench dependent
and its characteristics depend on input and output drive conditions, it could not
be a sufficient parameter for representing timing This article has been accepted
for inclusion in a future issue of this journal. Content is final as presented, with
the exception of pagination. BASIREDDY et al.: HYBRID LOGICAL
EFFORT FOR HYBRID LOGIC STYLE FULL ADDERS IN MULTISTAGE
12
STRUCTURES 3 behavior of the cell. Therefore, in the proposed method, three
parameters have been considered to estimate the timing behavior of the full
adders. These mentioned parameters are switching speed, drive capability, and
input capacitance which help designer to predict the performance of the full
adder cells in multistage structures by some measurement at the single test bench
[Fig. 2(c)]. The ripple carry adder (RCA) and 6:2 compressor are used as the
test bench for doing the multistage analysis for carry and sum output signals,
respectively. The RCA has been selected because it has a multistage path drive,
suitable for timing behavior extraction of the carry signal [Fig. 3(b)]. For
extraction of the sum signal timing behavior, the 6:2 compressor has been
selected which they cascaded in five-stage structure [Fig. 3(a)]. This analysis is
done for all selected full adders. To study the timing behavior, initially, the full
adders have been sized properly to optimize the power delay product (PDP) by
using the SEA [10] as an efficient algorithm. On scaling down the conventional
MOSFET below 20 nm, electrical parameters start to degrade. The FinFET
technology is selected as a replacement of Bulk CMOS, which allows transistors
to be scaled down further with promising advantages over Bulk CMOS such as
high drain current, lower switching voltage, and significantly less static leakage
current [14]. FinFETs have proved to be better performing in terms of speed and
power because of their outstanding ability to produce more current at smaller
dimensions and lesser input voltages, leading to high drive capability compared
to the Bulk CMOS technology. Another question in this paper, which we are
answering is, how this timing analysis method which contains three mentioned
parameters is working on different technologies. Therefore, full adders with
FinFET technology are also analyzed, and their timing behavior is compared
with the Bulk CMOS full adders. All the simulations in this paper are performed
in HSPICE using 32-nm Bulk CMOS and 32-nm shorted gate FinFET models.
II. PROPOSED TIMING BEHAVIOR ANALYSIS This section presents more
insights into the proposed logical effort analysis of the hybrid structure circuits
which contains the following items. A. Switching Time The switching time of
the circuits is measured at a single test bench ([Fig. 2(c)], when output is
connected to the buffer) at two specific input–output conditions. These two
items are unit drivability condition of the previous stage and the unit output
capacitor. The switching time is propagation delay at the standard input–output
drive condition and necessary parameter but not sufficient parameter to
understand the timing behavior of the full adder in multistage structures. Hence,
the drive capability and input capacitance are considered as essential parameters
along with switching time which form an integral part of our timing behavior
analysis. In this way, the minimum size of the inverter (W/L = 2 for pMOS and
13
W/L = 1 for nMOS) with consideration of technology limits has been considered
as unit drivability Fig. 2. Single stage analysis. (a) Switching time. (b) Drive
capability. (c) Single test bench used in simulation. of the previous stage and
unit output load on the single test bench. Also, all drivability and input
capacitance values are considered based on this standardization. It is notable that
these standard and unit values are technology dependent, but they are under the
control of designer for any specific W/L aspect ratio as a unit value for
consideration. Using the minimum size of the technology as a unit value is one
of the best choices, because other values will be normalized with the minimum
value. B. Drive Capability Circuit drivability is the ability of the output of the
circuit to drive the certain load properly. Lesser time the circuit takes to charge
the output capacitor, more prominent is the drive capability of the circuit. For
this analysis, the output is This article has been accepted for inclusion in a future
issue of this journal. Content is final as presented, with the exception of
pagination. 4 IEEE TRANSACTIONS ON VERY LARGE SCALE
INTEGRATION (VLSI) SYSTEMS Fig. 3. Multistage test benches for sum and
carry signals. connected to a capacitor and delay is measured by sweeping the
load capacitor value from 0 to 2 fF with a specific step size of 0.4 fF [Fig. 2(c)],
when output is connected to the variable CL) and load capacitance versus delay
line is plotted. For example, Fig. 1(g) depicts the delay versus load capacitance
for the unit inverter. The slope of the delay versus load capacitance is calculated
[α in Fig. 1(g)]. The complement of the slope of the line is defined as the circuit
drive capability (Dro = 1/α, fF/ps) in this paper. Similarly, the same method is
followed to measure the drive capability for all mentioned full adders. In the
conventional logical effort, the gate delay of the C-CMOS circuit is modeled as
follows: Delay = g.h.b + p. (1) In this delay equation, the “g” is the logical effort.
The “g” is the ratio of the input capacitance of the template gate and the
reference inverter having a minimum size. The “h” parameter is the electrical
effort, which is the ratio of the output capacitance to the input capacitance. The
“b” parameter is the branching factor, which represents the number of fanouts
that occur within a logic network [15], [16], and the “p” parameter is the
parasitic delay, which is the delay of the gate driving without output load [8]. In
the proposed analysis delay of sum and carry outputs of a hybrid full adder can
be demonstrated as follows: TD = TDo + (α × C-Load). (2) In this delay
equation, TD is the delay time, TDo is the delay time at zero load capacitance
(intrinsic delay) which can be matched with “p” in the conventional logical
effort, but “h” and “b” are not used in the proposed method. The C-Load is the
load capacitance, and “α” is the slope of the delay line versus C-Load variation.
The drive capability plays a vital role in the estimation of the full adder
14
performance in multistage structure as it is the factor that decides how fast a
particular cell is going to charge the input capacitance of the next stage like a
current source and can model both RC of the cell plus C-Load. Fig. 4 shows the
timing behavior of all mentioned full adders in this paper for five different stages
based on (2). C. Input Capacitance Input capacitance is an essential parameter
for timing analysis and can be matched with “g” in the conventional logical
effort. The test bench used is the inverter driving each input of the full adder
with the varying load capacitors connected to the sum and carry terminals. The
delay offered by the input of the full adder to the driving inverter is calculated,
and the physical value of the capacitor that causes the same delay is confirmed
as the input capacitance (Cin) [Fig. 1(f) and (g)]. The largest value of the input
capacitances is considered as input capacitance of the full adder. III.
GRAPHICAL ANALYSIS The graphical analysis has been presented against
the complexity of the hybrid circuits and better understanding their structure.
With categorization of the circuits based on proposed logical effort, designers
have a more clear idea which circuit has potential to work faster in multistage
structure and which of them may have lack of the drive and they are not working
without extra buffers or inverters. In this paper, for better understanding drive
parameters between blocks on the multistage structure, the selected adder blocks
are the hybrid full adders with different input–output drive conditions.
Therefore, the main idea of this section is which kind of circuit parameters
should be considered by the designer to achieve an efficient prediction for the
speed of the circuit with a hybrid structure before simulation or physical
measurement. In this section, the proposed graphical analysis allows predicting
necessary information regarding the drive capability and input capacitance of
the circuit by looking at the circuit structure. For example, it can be anticipated
that both C-CMOS and New-HPSC full adders offer good and equal drive
capability because of the presence of the inherent inverter at the output. The
methodology to predict the input capacitance is drawing the drive path from the
inputs toward the outputs (Fig. 1). In some full adders, inputs are directly
connected to the gate terminal of the transistors or could end up being connected
to the gates of the transistors through source-drain connections which fix the
input capacitance or end up being connected to the load capacitor which makes
the input capacitance varies with the load capacitor. Fig. 1 depicts the full adders
with drive path drawn from the inputs toward the outputs. For TGA, in the drive
path from the inputs to the sum output, inputs A and B directly connected to the
gates of the transistors or end up connecting to the gate terminals through
source–drain connections. On the other hand, input C is connected to the output
load capacitor through source– drain connections in addition to the connection
15
to some gate terminals. Therefore, it can be predicted that input A and input B
will have fixed input capacitance with constant values and This article has been
accepted for inclusion in a future issue of this journal. Content is final as
presented, with the exception of pagination. BASIREDDY et al.: HYBRID
LOGICAL EFFORT FOR HYBRID LOGIC STYLE FULL ADDERS IN
MULTISTAGE STRUCTURES 5 Fig. 4. Multistage analysis of full adders (a,
b, c, d, e, f, g & h) CM1 type full adders, (i, j, k, l, m, n) CM2 type full adders,
(o, p) CM3 type full adders, (q, r, s) CM4 type full adders, (t, u, v) compressor
analysis after connecting buffer at the output. the input capacitance of input C
varies with the load which is not a constant value. Similarly, for the carry output,
it is predictable that input B will have fixed input capacitance with a constant
value and the input capacitance of inputs A and C will vary with load capacitor
which is not a constant value. Based on the drive paths from inputs to the
outputs, circuits are categorized into three categories. 1) CS1: Inputs are directly
connected to only gate terminals of the transistor. 2) CS2: Inputs are directly
connected to gate terminals or end up connecting to the gate terminals through
the source–drain connections. 3) CS3: Inputs are connected to the load capacitor
through source-drain connections and/or connection to the gate terminals. The
path drive for each of the inputs toward the output is shown with three different
colors on Fig. 1. Also, for better graphical analysis of the circuits, the number
of connections to the gate and drain–source of transistors and connections to the
C-Load is specified on Table I. The simulation results of Cin measurement for
each of the inputs is tabulated on Table II as well. For the CS1-type full adders
(C-CMOS), the input capacitance is fixed and can be determined using a simple
method like conventional logical effort. For CS2-type full adders TABLE I
INPUT CAPACITANCE INPUT CONNECTIONS TO THE GATE,
DRAIN/SOURCE, AND C-LOAD (New-HPSC), the input capacitance is
constant but can be determined by using the proposed technique. For CS3-type
full adders (TFA, TGA and New-14T), the input capacitance is not constant and
varies with the load capacitance. In this case, Cin is a function of output load
capacitor and has the following equation: Cin = Cin0 + β · CL. (3) In this
equation, Cin0 is Cin value when the output load is zero (CL = 0), and β is
transformer coefficient of the output C-load to the input capacitor (Cin). This
transformation coefficient has been measured with the same method as
mentioned before for measuring equivalent Cin with the corresponding delay
[Fig. 1(g)]. The β is measurable with the consideration This article has been
accepted for inclusion in a future issue of this journal. Content is final as
presented, with the exception of pagination. 6 IEEE TRANSACTIONS ON
VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS TABLE II
16
INPUT CAPACITANCE SIMULATION RESULTS of two points on the delay
curve. For CS3 category full adders, the timing behavior can be predicted by
making input capacitance constant which can be done by connecting to the
output buffer. In the multistage analysis, the performance of any number of
stages can be predicted by analyzing the simulation results of only 3 or 4 stages
[see graphs in Fig. 4]. Fig. 1(g) shows the delay versus CL, in this graph, “α” is
the slop of the delay versus CL line and TDo is the intrinsic delay. Also, Fig.
1(f1) and (f2) is the test bench for Cin measurement. Fig. 1(f1) shows the delay
measurement of the circuit under test and Fig. 1(f2) shows the delay
measurement with the CL variation. Based on the analysis of the delay for the
first four stages, circuits are categorized into four following categories. 1) CM1:
Constant α and constant TDo for all stages. 2) CM2: Constant α with predictable
variation in TDo from stage to stage. 3) CM3: Predictable variation in α and
TDo from stage to stage. 4) CM4: Unpredictable variation in α and TDo from
stage to stage. It is obvious that the best circuit is the one which has constant α
and constant TDo for different stages. This circuit can keep its drivability
constantly in multistage structure with the stable timing behavior. IV.
SIMULATION AND RESULTS ANALYSIS A. Experimental Setup In this
paper, HSPICE is used as circuit simulator for all simulations, 32-nm MOS
predictive technology model (PTM) model and 32-nm shorted gate FinFET
PTM model are considered as two different technologies for comparison. For
single-stage analysis, all full adders have been simulated on a single test bench
shown in Fig. 2(c). This single test bench includes input and output buffers. All
56 possible transitions are applied for the delay and power measurement. The
rise time and falls time of the input pulses have been considered 10 ps, and
power supply has been considered 0.9 V. B. Switching Time Results of the
switching time for different full adders is shown in Fig. 2(a), which depicts that
New-HPSC full adder designed with FinFET technology offer better switching
speed for sum output. Also, C-CMOS full adder designed with FinFET
technology offer better switching speed for carry output. Full adders designed
with FinFET technology provide more than 50% improvement in switching
speed compared to the full adders designed in Bulk CMOS technology because
of their higher ION/IOFF ratio. C. Drive Capability Outcomes from Fig. 2(b)
affirm that the full adders with inverters inherent in their structure like C-CMOS
and NewHPSC have a more noteworthy drive capability that is provided by the
inverter itself whereas hybrid full adders have lesser drive capability. Fig. 2(b)
affirms that FinFET full adders showed more than 40% improvement in drive
capability than Bulk CMOS full adders because of their great ability to produce
more current at smaller dimensions and less input voltage. D. Input Capacitance
17
The data prediction on Table I and the simulation results on Table II demonstrate
that the input capacitance in some cases did not fluctuate with load because they
are directly connected to the gate of the transistors or might end up connecting
to the gate of the transistors through the sources or drains. The input capacitance
of some of the inputs varies with the variation in the load capacitor because of
their direct connection to the load capacitor through drain–source terminals.
Results from Table I affirms that C-CMOS (CS1) and New-HPSC (CS2) full
adders have fixed input capacitance because of no connection from inputs to the
load capacitor. TGA, TFA and New-14T full adders (CS3) have variable input
capacitance for some of the inputs because of their connection to load capacitor
through some source–drain connections. The prediction analysis matched with
simulation results. CS3 category full adders are expected to show different
behavior in each stage in the multistage analysis. V. MULTISTAGE
PERFORMANCE PREDICTION PARAMETERS A. Gain Excellent drive
capability and low input capacitance in single test bench are the coveted
parameters to get excellent performance in multistage structure. Gain is the
multistage performance prediction parameter which is the ratio of output drive
capability over input capacitance. The better the Gain, the better is the circuit’s
performance in the multistage structure. It is notable that with increasing the size
of transistors, the drivability will increase, but this variation has a negative effect
on the input capacitance. Since there is no way to control the drivability without
impact on the input capacitance, Gain This article has been accepted for
inclusion in a future issue of this journal. Content is final as presented, with the
exception of pagination. BASIREDDY et al.: HYBRID LOGICAL EFFORT
FOR HYBRID LOGIC STYLE FULL ADDERS IN MULTISTAGE
STRUCTURES 7 Fig. 5. (a) Gain versus K for sum signal. (b) Gain versus K
for carry signal. (c) SF normalized with 1030. is the best parameter which can
cover the effect of the size on both of drivability and input capacitance
simultaneously Gain (G) = Dro Cin (4) where Dro is the drive capability of each
output terminal, and Cin is the input capacitance of each input. For gain
calculation, simulation results as shown in Fig. 5(a) and (b) for FinFET
technology, affirms that the performance prediction factor “G” increases with
increasing the size of the transistors, but the rate of variation is more for carry
signal in comparison with the sum signal in all full adders. Also, the graphs
show that the carry outputs are more sensitive to the variation of the transistor
sizes than sum outputs. In this method, coefficient ‘K’ is used for increasing the
size of all circuit transistors simultaneously when the technology limit is
considered as the base. Therefore, this parameter can be controlled by the
designer for managing the tradeoff between speed and power. Regardless of
18
other characteristics of the cell, like energy consumption, the Gain parameter is
the crucial parameter for the proper operation of the cell in the multistage
structure. For the better cell, the gain should have the minimum need for the
input drive and maximum output drivability. Fig. 5(a) and (b) shows that the
New-HPSC circuit is better for the Gain in both of the sum and carry cases. B.
Selection Factor The selection factor (SF) is introduced as a new parameter in
this paper for the wise selection of the cell, which is defined as follows: SF =
Gain PDP = Dro/Cin Power ∗ Td (5) where Gain is the multistage performance
prediction parameter and PDP is the energy consumption. The PDP is used for
transistor sizing, to optimize the circuit for both power and delay. The SF is
covering both of Gain and PDP parameters, which is a criterion to pick the best
circuit based on minimum energy consumption and maximum gain. The SF
parameter could be different for sum and carry outputs based on circuit structure.
The simulation results for SF of the sum and carry signals for both CMOS and
FinFET technologies are shown in Fig. 5(c). The Gain parameter, for some of
the full adders like TGA & TFA could be controlled by inverters at the input of
the cell. In some other cases like C-CMOS and New-HPSC, Gain could be
controlled by inverters at the output of the cell. Some other full adders like New-
14T have inverter neither at the input port nor the output port. The simulation
results show this circuit does not have a proper operation in multistage structure
because of lack of the gain. The buffer is used at the output port to improve the
performance of the New-14T full adder, and simulation results as shown in Fig.
5 affirms that New-14T full adder with maximum SF is the best circuit for both
of sum and carry outputs after using the output buffer. Equation (5) shows that
SF is a function of four different parameters and designer can control the effect
of each parameter for different applications. However, it is notable that Fig. 5(c)
shows the New-14T as the best circuit if our SF had been formed by (5) which
all four parameters have similar weight. It is notable that with increasing or
decreasing the weight of each parameter, SF can get different adjustment
depends on the application. C. Cell Characterization With SF In the process of
semicustom design with standard cell library, SF is a crucial parameter for cell
characterization and library generation. In this paper, SF is defined as a selection
or optimization parameter which is a function of four variables (5). These four
variables are controllable by cell transistor sizing, but logical effort as
conventional transistor sizing method is not working for hybrid structure adders
because of irregularity and complexity of their structure. Some other
optimization algorithms such as SEA algorithm [10] which are more flexible
should be used for transistor sizing. The flexibility should be for target metric
selection, devicecircuit cooptimization, different variables for different
19
technologies, multiobject optimization, and circuits with hybrid structures [6],
[8], [10]. Traditional standard cell libraries are characterized for five different
modes which are five options for the tradeoff between power and speed. These
cells are characterized as Typical, Fast-Fast, Slow-Slow, Fast-Slow and Slow-
Fast, controllable by the transistor sizing [17]. On the other side, all four
variables on the SF (5), (Dro, Cin, Power, and Td) have a similar weight for
regular optimization. At this situation, the result of optimization with This
article has been accepted for inclusion in a future issue of this journal. Content
is final as presented, with the exception of pagination. 8 IEEE
TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI)
SYSTEMS transistor sizing may cause a kind of typical cells for both power and
speed optimization and proper operation of the cell in multistage structure. The
optimization process based on a selection factor is flexible for the designer to
play with optimization parameters based on the weight of each SF parameter.
Selection factor could be redefined as (6) which has n1, n2, n3, and n4 as four
effect factor for four SF parameters. The ni (i = 1, 2, 3, and 4) as the effect factor
of each variable can get a value between “0” to maximum a reasonable number
like 3. For example, when ni = 1, the related variable has a regular impact on
the optimization process and if n > 1, like 1.2 or 1.4, and so on. The mentioned
variable has more effect on the optimization process and cell characteristics. The
minimum value for ni as effect factor is “0,” which removes the effect of a
related variable from SF and the optimization process completely. At this
situation, the optimization process could be based on other three parameters on
SF equation. For example, if the n3 value for power is considered as “0,” and
n4 = 1, the optimization process goes to Fast-Fast corner because the power is
not considered on the optimization process SF = Gain PDP = Drn1 o /Cn2 in
Powern3 ∗ Tdn4 . (6) In traditional standard cell libraries, in addition to the
typical and corners cell characterization, cells may have different categorize for
different fan-outs like 1, 2, 4, 8, and so on. However, in the proposed method
for cells with hybrid structure, Dro and Cin on the SF equation are the
parameters for characterization of the cell for switching behavior on the
multistage structure for fan-out consideration. More drivability and less input
capacitor can make the cell faster for switching on the multistage structure. The
n1 and n2 are the effect factor for controlling the cell drivability and input
capacitor on the transistor sizing process. For example, if n3 = 1, n4 = 2, and
Dro = Cin = 1, the optimization process is set for energy-delay product (EDP)
with the consideration of output drivability and input capacitance as typical. At
this situation, the cell may have best sizes for optimum EDP but this cell does
not have the best switching behavior on the multistage structure, this unreliable
20
operation would be worse with increasing the cell fan-out in multistage
structure. Increasing the size of transistors on the cell which cause more
drivability and more speed has some area overhead that leads to an increase in
the Cin and decreases the cell speed in another way. Therefore, transistor sizing
can control this phenomenon with controlling Cin on the SF, along with other
parameters. In the proposed method, transistor sizing based on SF can control
the Dro and Cin together for different input pulse transitions. In this case,
transition time as one of the cell library characteristics will be covered because
it has a direct connection to the charge and discharge of the input capacitors for
triggering. The smaller Cin and bigger Dro cause strong cell input trigger for
fast switching, and it is like using fast transition time for input pulses. Under
this condition, a cell can improve cell switching for high fan-out structures.
TABLE III DRIVABILITY AND SWITCHING TIME VARIATION VI.
MULTISTAGE ANALYSIS The multistage analysis is performed by using 6:2
compressor [Fig. 3(a)] and 4-bit RCA [Fig. 3(b)]. In 6:2 compressor sum signal,
and in 4-bit RCA, carry signals are on the critical path. To better understand the
timing behavior of the cells with the increasing number of stages, the transition
time for all input pulses have been fixed. The input buffer is used for all input
pulses, and the capacitor load has been swept from 0 to 2 fF for each output
stage. For single test bench measurements, such as Cin measurement, the unit
inverter has been used. For multistage analysis, which we need more drivability
at the first stage, the buffer with the size of W/L 5/3 and 12/5 [10] has been used,
and the delay of the first stage contains the buffer’s delay as well. A. Full Adders
in Bulk CMOS Technology The simulation results of the sum and carry outputs
are shown in Fig. 4 (a), (b), (e), (f), (i), (k), (l), (o) and (q) for the full adders. It
is obvious from Fig. 4 that C-CMOS and NewHPSC full adders have the same
slopes on their output delay lines. This constant slop is because of the output
inverters which makes specific drivability. From stage 2 to stage 4, all lines
coincide with each other because of having constant α and constant TDo for all
stages so that we can predict the same behavior for any number of stages. In
compressor analysis of TGA and TFA full adders, it is clear that all the four
stages have distinctive behavior, but there is a predictable variation in α and
constant TDo from stage to stage. In the RCA analysis of TGA full adder, there
is a predictable variation in α and constant TDo from stage to stage. In the RCA
analysis of TFA full adder, there is predictable variation in both α and TDo from
stage to stage. Therefore, the behavior of C-CMOS, New-HPSC, TFA, and TGA
full adders can be predicted for any number of stages. Because of the poor drive
and high input capacitance, the performance of the 14T full adder is extremely
poor and shows improper outputs in compressor analysis. In the RCA analysis
21
of New-14T full adder, it can be observed that there is an unpredictable variation
in α and TDo from stage to stage. Therefore, the behavior of New14T full adder
can be predicted by connecting output buffer, but buffer can make some area
and power overhead on the cell. Variation in slope and TDo from stage to stage
for all mentioned full adders is explained in Table III. From the Table III, it is
obvious that C-CMOS and New-HPSC are two circuits without variation on the
drivability and inherent delay This article has been accepted for inclusion in a
future issue of this journal. Content is final as presented, with the exception of
pagination. BASIREDDY et al.: HYBRID LOGICAL EFFORT FOR HYBRID
LOGIC STYLE FULL ADDERS IN MULTISTAGE STRUCTURES 9 which
can show they have a consistent timing behavior in multistage structure. They
can function well on the different structures, and they can keep their speed in
various stages. The addition of buffers not only improves the output drive
capability but also fixes the input capacitance of the full adders as well. Fig.
3(t)–(v) affirms that α and TDo can be made constant for all stages for any circuit
by connecting output buffer. B. Full Adders in FinFET Technology From both
compressor analysis and RCA analysis plots [Fig. 4(c), (d), (g), (h), (j), (m), (n),
(p), (r), and (s)], it can be observed that all the full adders built using FinFET
technology demonstrate the same behavior in comparison with Bulk CMOS
technology. For TGA, TFA, and New-14T full adders from stage to stage, the
delay decreases as the number of stages increases due to very excellent drive
capability and low input capacitance. The analysis in Table II demonstrates that
slope decreases predictably from stage to stage for TFA and TGA full adders.
C. Comparison of Bulk CMOS and FinFET technologies Compared results
show that the performance of FinFET full adders is superior to full adders
designed with Bulk CMOS technology. Full adders designed with FinFET
technology offer lesser switching time when compared to the Bulk CMOS full
adders because of their high ION/IOFF ratio. In comparison with Bulk CMOS
full adders, FinFET full adders offer greater drive capability with more than
40% improvement because of their ability to produce more current at smaller
dimensions with the lesser input voltage. Although the input capacitance is more
for FinFET full adders compared to Bulk CMOS full adders, both DRo/Cin and
Gain/PDP are better for FinFET full adders over Bulk CMOS full adders. VII.
CONCLUSION In this paper, a new logical effort analysis is proposed for the
hybrid structure adder circuits. The big question in this paper, which we are
answering is, if hybrid full adders are fast, energy–area efficient more than C-
CMOS one, what are the parameters that designer can consider and control to
make sure they are reliable blocks to use in the multistage structure in a very
simple method like conventional logical effort. This analysis is essential to
22
estimate the performance of the full adders in bigger structures such as big
adders, compressors, multipliers and so on. Based on the results of simulation
at the single test bench and comparison with multistage analysis for the
mentioned full adders, in this paper, their timing behavior can be modeled using:
switching time, input capacitance, and output drivability. All these three items
are measurable by use of the single test bench, and with these three items, it is
possible to predict timing behavior of the circuit for multistage structures. The
Gain is defined as a new parameter which is the ratio of output drivability and
input capacitance and could be controlled by the transistor sizing. The designer
can control the tradeoff between energy and performance of the circuit with the
control of the Gain parameter. In this regard, the selection factor is introduced
as a design and optimization parameter which is the ratio of the Gain over energy
consumption. Also, graphical classification of the hybrid cells is done for
bringing up which circuit has more potential to be high performance just with
looking at the circuit with any degree of the circuit complexity based on
proposed timing analysis method. The predictive analysis is confirmed by doing
HSPICE simulation for the selected adder cells and both CMOS and FinFET
technologies. REFERENCES [1] C.-H. Chang, J. Gu, and M. Zhang, “A review
of 0.18-μm full adder performances for tree structured arithmetic circuits,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 6, pp. 686–695, Jun.
2005. [2] C.-K. Tung, Y.-C. Hung, S.-H. Shieh, and G.-S. Huang, “A low-power
high-speed hybrid CMOS full adder for embedded system,” in Proc. IEEE
Design Diagnostics Electron. Circuits Syst. (DDECS), Krakow, Poland, Apr.
2007, pp. 1–4. [3] C.-K. Tung, S.-H. Shieh, and C.-H. Cheng, “Low-power high-
speed full adder for portable electronic applications,” Electron. Lett., vol. 49,
no. 17, pp. 1063–1064, Aug. 2013. [4] S. Goel, A. Kumar, and M. Bayoumi,
“Design of robust, energy-efficient full adders for deep-submicrometer design
using hybrid-CMOS logic style,” IEEE Trans. Very Large Scale Integr. (VLSI)
Syst., vol. 14, no. 12, pp. 1309–1321, Dec. 2006. [5] P. Bhattacharyya, B.
Kundu, S. Ghosh, V. Kumar, and A. Dandapat, “Performance analysis of a low-
power high-speed hybrid 1-bit full adder circuit,” IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. 23, no. 10, pp. 2001–2008, Oct. 2015. [6] K.
Haghshenas, M. Hashemi, and T. Nikoubin, “Fast and energyefficient CNFET
adders with CDM and sensitivity-based device-circuit co-optimization,” IEEE
Trans. Nanotechnol., vol. 17, no. 4, pp. 783–794, Jul. 2018. [7] Y. S. Mehrabani
and M. Eshghi, “Noise and process variation tolerant, low-power, high-speed,
and low-energy full adders in CNFET technology,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 24, no. 11, pp. 3268–3281, Nov. 2016. [8] T.
Nikoubin, M. Grailoo, and C. Li, “Energy and area efficient threeinput
23
XOR/XNORs with systematic cell design methodology,” IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 24, no. 1, pp. 398–402, Jan. 2016. [9] T.
Nikoubin, M. Grailoo, and S. H. Mozafari, “Cell design methodology based on
transmission gate for low-power high-speed balanced XORXNOR circuits in
hybrid-CMOS logic style,” J. Low Power Electron., vol. 6, no. 4, pp. 503–512,
2010. [10] T. Nikoubin, P. Bahrebar, S. Pouri, K. Navi, and V. Iravani, “Simple
exact algorithm for transistor sizing of low-power high-speed arithmetic
circuits,” in Proc. VLSI Design, Jan. 2010, p. 3. [11] I. Sutherland, B. Sproull,
and D. Harris, Logical Effort: Designing Fast CMOS Circuits. San Francisco,
CA, USA: Morgan Kaufmann, 1999. [12] X. Lin, Y. Wang, S. Nazarian, and M.
Pedram, “An improved logical effort model and framework applied to optimal
sizing of circuits operating in multiple supply voltage regimes,” in Proc. 15th
Int. Symp. Quality Electron. Design, Santa Clara, CA, USA, Mar. 2014, pp.
249–256. [13] R. M. Anacan and J. L. Bagay, “Logical effort analysis of various
VLSI design algorithms,” in Proc. IEEE Int. Conf. Control Syst., Comput. Eng.
(ICCSCE), George Town, Malaysi, Nov. 2015, pp. 19–23. [14] A. B. A. Tahrim
and M. L. P. Tan, “Design and implementation of a 1-bit FinFET full adder cell
for ALU in subthreshold region,” in Proc. IEEE Int. Conf. Semiconductor
Electron. (ICSE), Kuala Lumpur, Malaysia, Aug. 2014, pp. 44–47. [15] S.
Maheshwari, J. Patel, S. K. Nirmalkar, and A. Gupta, “Logical effort based
power-delay-product optimization,” in Proc. Int. Conf. Adv. Comput.,
Commun. Inform. (ICACCI), New Delhi, India, Aug. 2014, pp. 565–569. [16]
R. Uma and P. Dhavachelvan, “Performance evaluation of full adders in ASIC
using logical effort calculation,” in Proc. Int. Conf. Recent Trends Inf. Technol.
(ICRTIT), Chennai, India, Jul. 2013, pp. 612–618. [17] J. M. Rabaey, A. P.
Chandrakasan, and N. Borivoje, Digital Integrated Circuits: A Design
Perspective, 2nd ed. Upper Saddle River, NJ, USA: Pearson, 2003. This article
has been accepted for inclusion in a future issue of this journal. Content is final
as presented, with the exception of pagination. 10 IEEE TRANSACTIONS ON
VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS Hareesh-Reddy
Basireddy received the B.E. degree in electronics and communication
engineering from Jawaharlal Nehru Technology University, Anantapur, India,
in 2012, the M.S. degree in VLSI design from the Visvesvaraya National
Institute of Technology, Nagpur, India, in 2015, and the M.S. degree in electrical
engineering from Texas Tech University, Lubbock, Texas, in 2018,
respectively. His current research interests include low power and energy-
efficient digital very large scale of integration design, and optimization.
Karthikeya Challa received the B.E. degree in electronics and communication
engineering from Jawaharlal Nehru Technological University, Hyderabad,
24
India, in 2013 and the M.S degree in electrical engineering from Texas Tech
University, Lubbock, Texas, in 2015, respectively. He is currently an
Electronics Design and Validation Engineer at SL America Corporation,
Auburn Hills, MI, USA. His current research interests include low-power and
high-performance application-specified integrated circuit design Tooraj
Nikoubin (M’14–SM’15) received the B.Sc. and M.Sc. degrees in electronic
engineering from Electrical, Computer Engineering Department, K. N. Toosi
University of Technology, Tehran, Iran, in 1991 and 1994, respectively, and the
Ph.D. degree in computer engineering from Shahid Beheshti University, Tehran,
in 2009. In 2012, he joined the Electrical and Computer Engineering
Department, Texas Tech University, Lubbock, TX, USA. His current research
interests include nanoelectronics and energy-efficient computing, very large
scale of integration circuit and system design and optimization, computer
architecture, and wearable electronics for health and assistance.

Design of a Scalable Low-Power 1-bit Hybrid Full Adder for Fast Computation
Mehedi Hasan, Md. Jobayer Hossein, Mainul Hossain, Hasan U. Zaman, Senior
Member, IEEE and Sharnali Islam

Abstract—A novel design of a hybrid Full Adder (FA) using Pass Transistors
(PTs), Transmission Gates (TGs) and Conventional Complementary Metal
Oxide Semiconductor (CCMOS) logic is presented. Performance analysis of the
circuit has been conducted using Cadence toolset. For comparative analysis, the
performance parameters have been compared with twenty existing FA circuits.
The proposed FA has also been extended up to a word length of 64 bits in order
to test its scalability. Only the proposed FA and five of the existing designs have
the ability to operate without utilizing buffer in intermediate stages while
extended to 64 bits. According to simulation results, the proposed design
demonstrates notable performance in power consumption and delay which
accounted for low power delay product. Based on the simulation results, it can
be stated that the proposed hybrid FA circuit is an attractive alternative in the
data path design of modern high-speed Central Processing Units. Index Terms—
hybrid adder, full adder, transmission gate, pass transistor.

I. INTRODUCTION ITH the explosive evolution of transistor scaling, research


work related to low-power design of microelectronics circuits have been greatly
escalated.. As a result, demand for high performance design of microelectronic
25
circuits has skyrocketed [1]. Modern image and video processing operations,
digital signal processing (DSP) chips, microprocessors and many other
applications require large scale of arithmetic operations [2]. One of the most
elementary arithmetic operations is addition which is substantially and
repetitively used in modern computational applications. A 1-bit Full Adder (FA)
is considered as the nucleus of binary addition since it is the fundamental cell
for building wide word-length adders [3-4]. Hence, performance enhancement
of FA is crucial to the Mehedi Hasan and Md. Jobayer Hossein are with the ECE
Department, North South University, Dhaka 1229, Bangladesh. (e-mail:
mehedi.hasan01@northsouth.edu, jobayerhossein@gmail.com). Mainul
Hossain was with the University of Central Florida, Orlando, FL 32826 US.
Currently, he is with the Department of Electrical and Electronic Engineering,
Dhaka University, Dhaka 1000, Bangladesh. (email: mainul.eee@du.ac.bd).
Hasan U. Zaman is a Senior Consultant of Intel Corporation, Hudson, MA 1749,
US and a Professor (on leave) of North South University, Dhaka 1229,
Bangladesh. (e-mail: hasan.zaman@northsouth.edu) Sharnali Islam was with
the University of Illinois at Urbana-Champaign, Champaign, IL 61801 US.
Currently she is with Dhaka University, Dhaka 1000, Bangladesh. (email:
sharnali.eee@du.ac.bd). performance improvement of the Arithmetic and Logic
Unit (ALU) in the microprocessors. In this research, a new hybrid FA using PTs,
TGs and static CMOS logic has been proposed. The FA was implemented in 45
nm technology using Cadence tools. In order to judge the reliability, a
comparative inspection on performance parameters of the proposed design has
been conducted with twenty existing FA designs with supply voltage varied
from 0.4 V to 1.2 V. Moreover, the FAs have been extended to wide word length
adders to test their performance parameters in large scale circuits. The proposed
FA exhibited remarkable performance both as single cell and extended form
compared to the existing FAs. II. RELATED WORK Complementary Pass
Transistor Logic (CPL) based FA [5], 12 Transistor (12-T) FA [6] and
Conventional CMOS (CCMOS) logic based FA [7] utilize single logic style for
FA implementation. Due to having voltage degradation, CPL logic requires
additional buffers to bring back the signal to supply voltage level. The problem
of voltage degradation is not associated with C-CMOS FA. However, large
input impedance is a major problem in C-CMOS FA. Researchers nowadays
tend to use hybrid design approach which utilizes the favorable aspects of
several logic styles within the same FA cell. TG Adder (TGA) in [8], [9] and
Transmission Function Adder (TFA) in [10] use TGs in the design. While TGA
and TFA FA do not suffer from voltage degradation, poor capability is a major
concern in these two adders. FA cells using 24 transistors (24-T) [11], 14
26
transistors (14- T) [12] and 10 transistors (10-T) [13] also use multiple logic
style for FA implementation. 24-T FA uses a 3 input XOR gate for sum
calculation rather than cascading two separate 2 input XOR gate. Input bits in
14-T and 10-T FA are at first fed into a XOR gate. The XOR gate acts as an
intermediate node whose output is employed to produce the final output. Very
low transistor count in 14-T and 10-T FAs accounts for less surface area.
However, driving capability remains a major concern which limits their
application in high fan-out cases. Hybrid Pass Static CMOS (HPSC) FA in [14]
uses Pass Transistor Logic (PTL) to generate XOR and XNOR signals in
intermediate nodes. The output side consists of C-CMOS logic which provides
full-swing outputs. However, employment of CCMOS logic increases transistor
count and capacitance. Design of a Scalable Low-Power 1-bit Hybrid Full Adder
for Fast Computation Mehedi Hasan, Md. Jobayer Hossein, Mainul Hossain,
Hasan U. Zaman, Senior Member, IEEE and Sharnali Islam W 1549-7747 (c)
2019 IEEE. Personal use is permitted, but republication/redistribution requires
IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for
more information. This article has been accepted for publication in a future issue
of this journal, but has not been fully edited. Content may change prior to final
publication. Citation information: DOI 10.1109/TCSII.2019.2940558, IEEE
Transactions on Circuits and Systems II: Express Briefs 2 Double Pass transistor
logic (DPL) and Swing Restored CPL (SRCPL) Hybrid FAs proposed in [15]
generate ANDOR signal to compute output carry and XOR-XNOR signal for
the sum output. Input carry is used as the select bit of TG based multiplexers
(MUX) to produce outputs from intermediate AND-OR and XOR-XNOR
nodes. The hybrid FA in [16] uses inverter in the output side of the adder and
utilized CPL and TG logic in the input side. The idea of using C-CMOS logic-
based inverter in output side is to provide good driving capacity. More Hybrid
FAs are reported in [17- 21]. Based on Gate Diffusion Input (GDI) technique,
three FA designs (addressed as GDI D1, GDI D2 and GDI D2 in this paper)
have been reported in [22]. Major problem associated with GDI adders is its
weak output signal. III. PROPOSED FULL ADDER DESIGN APPROACH
Design approach of the proposed FA is shown in Fig. 1. The design of the
proposed FA is divided into four major modules: two modules for carry
generation and the other two for sum. Schematic of the proposed design is
represented by Fig. 2. The design methods and detailed description of the
modules are provided in the following sub-sections. A. Carry Generation From
the truth table of a FA, it can be observed that If, 0݊݊ = ‫ ܥ‬then ‫ = ݐݑ݋ܥ‬AND
function and if, 1݊݊ = ‫ ܥ‬then ‫ = ݐݑ݋ܥ‬OR function Therefore, the proposed
27
carry generation part consists of a novel AND-OR module (module 1) based on
TG and CPL logic. Within the AND-OR module, implementation of AND and
OR gates are quite similar, except the fact that nMOS and pMOS transistors are
interchanged. The conditions required to design the proposed AND gate are as
follows: If 0 = ‫ ܣ‬, then 0 = ‫ ݐݑ݋ݐݑ݋‬and (1) If 1=‫ܣ‬, then 2) ‫ )ݐݑ݋ݐݑ݋ = ܤ‬The
first condition for AND gate design in (1) is carried out by the pass transistor
N2 and TG1, respectively. The later condition (2) is carried out by TG1. In the
same way, conditions for designing OR gate are: If 1 = ‫ ܣ‬, then 1 = ‫ ݐݑ݋ݐݑ݋‬and
(3) If 0=‫ܣ‬, then 4) ‫ )ݐݑ݋ݐݑ݋ = ܤ‬Here, conditions (3) and (4) are carried out by
pass transistor P2 and TG2, respectively. A TG based 2:1 MUX (module 2:
consists of TG3 and TG4) is used to select the appropriate carry-out signal from
AND-OR module depending on input carry Cin. B. Sum Generation The sum
output of the proposed FA is generated by cascading two XOR modules (module
3 and 4). The XOR gates have exactly the same structure and are implemented
using TGs and PTs. The conditions for designing the XOR gate for module 3
are: If 0 = ‫ ܣ‬, then 5) ‫ )ݐݑ݋ݐݑ݋ = ܤ‬If 1=‫ ܣ‬and 0=‫ܤ‬, then 6) ‫ )ݐݑ݋ݐݑ݋ = ܣ‬If
1=‫ ܣ‬and 1=‫ܤ‬,then 7) ‫ )ݐݑ݋ݐݑ݋ = ܣ‬In this case, condition (5) has been
implemented using TG5. The pass transistor P3 in module 3 (XOR gate)
implements condition (6) and the pass transistor N3 implements condition (7).
Employing exactly the same method, XOR gate for module 4 has been designed
using Cin and A⊕B (output of module 3) as inputs. For this case, the conditions
are: (8 (‫ ܤ ⊕ ܣ = ݐݑ݋ݐݑ݋‬then , 0݊݊ = ‫ ܥ‬If If 1݊݊ = ‫ ܥ‬and 0=‫ܣ⊕ܤ‬, then
‫݋݋ )ݐݑ݋ݐݑ݋ = ܥ‬9) If 1݊݊ = ‫ ܥ‬and 1=‫ܣ⊕ܤ‬, then ‫݋݋ )ݐݑ݋ݐݑ݋ = ܥ‬10) In this
case, the conditions (8), (9) and (10) have been implemented by TG6, P4 and
N4, respectively. IV. SIMULATION RESULT AND DISCUSSION To analyze
various performance parameters, simulation has been conducted in GPDK 45
nm technology node using Cadence simulation tools. Buffers are added in the
input and output side for simulation test bench. The output side also consisted a
load capacitance of 6 fF. In this way, the circuit experiences significant signal
distortion unlike realistic environment. Input frequency has been set to 100
MHz. For optimized transistor sizing, particle swarm optimization algorithm has
been utilized for implementation of FAs [23]- [24]. For transistor sizing using
swarm algorithm, we initialize a vector ݊‫ ݋×݋‬ሬሬሬሬሬሬሬሬሬሬሬԦ where m is
the transistor count and n is the number of particles. Each particle has 3 sorts of
movement tendency in the search space: (a) tendency due to its own inertia (b)
tendency towards particle’s best value (c) tendency towards team’s best value.
The circuit is simulated by randomly changing transistor width within the range
of minimum to maximum allowable size. The objective is minimum Power
28
Delay Product (PDP). If a particle’s current PDP < best PDP value so far found
by that particle, then the current PDP is set as the particle’s best PDP. In the
same way, if the best PDP for current simulation run < the best PDP value so
far found in the entire search space, then the team’s best value is updated with
the current value. In this way, after completing the desired number of iterations,
the team’s best value of PDP is chosen as the optimal PDP and the
corresponding transistor widths are chosen for circuit implementation. A.
Performance of FA cells for various supply voltage The post-layout simulation
results on delay, power and Power Delay Product (PDP) for the supply voltage
0.4 V (minimum voltage in order to avoid sub-threshold operation), 0.8 V (mid-
point voltage between 0.4 V and 1.2 V) and 1.2 V (normal operating voltage)
are represented using Fig. 3 and Table I. The results (Fig. 3) demonstrate that
proposed FA obtained superior performance in case of delay and PDP. Although
the proposed design does not obtain best performance in power (Fig. 3b), still
the value is low enough for practical utilization in modern processors. The least
technology node in which 10-T FA [13] could operate is 180 nm technology
with a least supply voltage of 1.8 V. Hence, to maintain uniform environment
for comparison, performance parameters (15.03 µW power and 129.48 ps delay)
of 10-T FA are not added in Table I. 1549-7747 (c) 2019 IEEE. Personal use is
permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for
more information. This article has been accepted for publication in a future issue
of this journal, but has not been fully edited. Content may change prior to final
publication. Citation information: DOI 10.1109/TCSII.2019.2940558, IEEE
Transactions on Circuits and Systems II: Express Briefs 3 Fig. 1. Block diagram
of proposed full adder. Fig. 2 Schematic of the proposed FA. 12-T [6], 14-T
[12], HPSC [14] and Hybrid 2 [18] FAs could not operate flawlessly while
simulated with 0.4 V supply voltage. Hence, performance parameters of these
FAs for 0.4 V could not be obtained for presenting in Table I and Fig. 3. To
figure out the least operating voltages of these FAs, further simulation has been
conducted using 0.45 V-0.75 V supply voltage. The least supply voltage
required for 12-T [6], 14-T [12], HPSC [14] and Hybrid 2 [18] FAs are 0.75 V,
0.7 V, 0.7 V and 0.5 V respectively. It has been also observed that delays of FA
cells start increasing quite drastically while operating below 0.8 V supply
voltage. As technology scales down, interconnect performance depends more
on the resistance rather than the capacitance [24]. To achieve the best
performance, the interconnect resistance and capacitances need to be re-
optimized. For 5nm node, as demonstrated by Pan et al. [24], increasing the
interconnect width beyond half pitch without changing the interconnect pitch
29
yields 55% improvement in energy-delay product for vertical field-effect
transistor (VFET) circuits. Therefore, for more aggressive technologies, the
interconnect width should be a target optimization parameter to achieve desired
performance. B. Performance of FAs operating in cascade To inspect scalability,
FAs are extended into 4, 8, 16, 32 and 64 bits in Ripple Carry Adder style as
depicted in Fig. 4 [18]. No intermediate buffer has been added while cascading
FAs. Simulation results on performance parameters are listed in Table II. It was
observed that only 5 out of 20 existing FAs and the proposed design can operate
when extended to 64-bits. As per Fig. 3, Hybrid 2 [18], Hybrid 6 [21] and GDI
D1 [22] FAs showed promising performance as single cell. However, they can
be only extended to maximum 8-bits. This limitation happens due to the decline
of signal strength (voltage) while propagating through several stages. Due to
having CCMOS based logic style in the output terminals, the output signals
come from Vdd and Gnd for CCMOS [7] and 24-T [11] FAs. As a result, the
voltage gets replenished in each stage for which CCMOS [7] and 24-T [11] FAs
do not face the problem of voltage deterioration in long carry chains. ` Fig. 3.
Performance of FA cells (a) average power (b) propagation delay (c) PDP. 1549-
7747 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution
requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for
more information. This article has been accepted for publication in a future issue
of this journal, but has not been fully edited. Content may change prior to final
publication. Citation information: DOI 10.1109/TCSII.2019.2940558, IEEE
Transactions on Circuits and Systems II: Express Briefs 4 TABLE I
PERFORMANCE OF FULL ADDER CELLS UNDER VARIOUS SUPPLY
VOLTAGE TC: Transistor Count, AP: Average Power, DP: Dynamic Power
(Switching Power + Short-Circuit Power), PDP: Power Delay Product, SP:
Static Power, F: Failed to Operate, PR: Performance Ratio (Proposed FA Value:
Best Value Obtained by Other FAs) Full Adder Ref. no T C Power (AP in µW,
DP and SP in nW) Delay (ps) PDP (aJ) 0.4 V 0.8 V 1.2 V 0.4 V 0.8 V 1.2 V 0.4
V 0.8 V 1.2 AP DP SP AP DP SP AP DP SP V CPL [5] 32 0.488 458.2 29.8
1.72 1612.8 107.3 3.89 3634.1 255.9 612.1 84.8 37.3 298.7 145.9 145.1 12-T
[6] 12 F F F 1.2 1158.4 41.6 2.54 2423.4 116.6 F 447.4 67.6 F 536.9 171.7
CCMOS [7] 28 0.159 149.1 9.9 0.68 660.3 19.7 1.91 1841.6 68.4 809.3 125.8
39.7 145.7 85.5 75.8 TGA [8] 20 0.163 158.8 8.2 0.64 617.4 22.6 1.41 1365.8
44.2 893.4 139.7 58.3 151.9 89.4 82.2 TFA [10] 16 0.155 150.9 8.1 0.61 586.5
23.5 1.33 1288.7 41.7 957.5 145.6 66.8 155.1 88.8 88.8 24-T [11] 24 0.202
189.9 12.1 0.76 724.8 35.2 1.68 1628.8 51.2 908.3 128.4 65.9 183.5 97.6 110.7
14-T [12] 14 F F F 1.12 1059.7 60.3 2.34 2202.7 137.3 F 385.6 67.4 F 431.9
30
157.7 HPSC [14] 22 F F F 0.89 844.5 45.5 2.09 1979.5 110.5 F 89.7 38.6 F 78
80.7 DPL [15] 22 0.256 240.1 15.9 0.87 833.3 36.7 2.11 1986.3 123.7 835.6
91.9 45.6 213.9 79.9 96.2 SRCPL [15] 20 0.193 182.8 10.2 0.7 759.8 31.1 1.79
1726 64 869.6 132.3 50.4 167.8 104.5 90.2 New-HPSC [16] 24 0.217 205.3 11.7
0.77 739.8 30.2 2.04 1931.6 108.4 998.5 193.6 71.5 216.7 149.1 145.9 Hybrid
1 [17] 24 0.267 258.8 16.2 0.78 749.3 30.7 2.31 2176.4 133.6 932.4 189.1 60.2
248.9 145.6 139 Hybrid 2 [18] 16 F F F 0.35 339.7 10.3 0.94 912.8 27.2 F 231.4
69.6 F 81 65.4 Hybrid 3 [19] 22 0.169 162.6 6.4 0.68 661.1 18.9 1.53 1473.5
56.5 811.8 101.3 48.6 146.9 68.9 74.4 Hybrid 4 [20] 16 0.113 106.9 6.1 0.44
428.2 11.8 0.98 951.3 28.7 675.3 81.6 38.7 76.3 35.9 37.9 Hybrid 5 [21] 21
0.146 136.9 9.1 0.58 561.1 18.9 1.31 1270.4 39.6 697.5 96.8 43.9 108.1 56.1
57.5 Hybrid 6 [21] 23 0.133 125.3 7.7 0.54 522.5 17.5 1.35 1304.1 45.9 683.4
81.4 35.1 98.4 43.9 47.4 GDI D1 [22] 18 0.127 144.8 7.2 0.46 446.7 13.3 1.09
1054.7 35.3 599.8 98.8 31.8 76.17 43.5 34.7 GDI D2 [22] 22 0.165 155.6 9.4
0.63 604.6 25.4 1.49 1437.5 52.5 547.6 77.3 28.6 90.4 48.7 42.6 GDI D3 [22]
21 0.152 116.6 8.5 0.61 586.3 23.7 1.32 1277.6 42.4 708.3 90.53 39.7 114.7
55.2 52.4 Proposed ----- 22 0.129 122.2 6.8 0.48 466.6 13.4 1.17 1135.7 34.3
523.8 65.7 25.3 67.6 31.5 29.6 Performance Ratio (PR) 1.14:1 1.14:1 1.11:1
1.37:1 1.37:1 1.3:1 1.24:1 1.24:1 1.26:1 1:1.04 1:1.17 1:1.13 1:1,13 1:1.14
1:1.17 TABLE II PERFORMANCE COMPARISON OF FULL ADDER
CELLS OPERATING IN CASCADE Supply voltage: 0.8 V, F: Failed to
Operate, PR: Performance Ratio (Proposed FA Value: Best Value Obtained by
Other FAs) Full Adder Ref. no Power (µW) Delay (ps) PDP (fJ) 4 bit 8 bit 16
bit 32 bit 64 bit 4 bit 8 bit 16 bit 32 bit 64 bit 4 bit 8 bit 16 bit 32 bit 64 bit CPL
[5] 6.47 13.46 F F F 1256.7 4484.9 F F F 8.13 60.36 F F F 12-T [6] F F F F F F
F F F F F F F F F CCMOS [7] 2.42 4.68 9.48 20.79 44.2 510.6 1042.5 2105.8
4233.2 8490.5 1.24 4.88 19.96 88.01 373.3 TGA [8] 2.45 5.33 F F F 709.1
2145.7 F F F 1.74 11.44 F F F TFA [10] 2.5 4.45 F F F 478.7 2444.5 F F F 1.196
10.88 F F F 24-T [11] 2.83 5.56 10.95 22.06 46.31 521.2 1078.5 2183.9 4389.6
8867.1 1.47 5.996 23.91 96.83 410.6 14-T [12] F F F F F F F F F F F F F F F
HPSC [14] 3.24 6.3 12.45 25.98 F 561.8 1214.3 2520.2 5131.7 F 1.82 7.65 31.38
133.3 F DPL [15] 3.62 7.92 F F F 1467.6 4814.4 F F F 5.31 38.13 F F F SRCPL
[15] 3.59 F F F F 1934.3 F F F F 6.94 F F F F New HPSC [16] 2.84 5.57 11.11
22.75 F 945.5 2024.6 4203.5 8581.2 F 2.69 11.28 46.7 195.2 F Hybrid 1 [17]
2.81 5.47 10.77 22.69 F 767.8 1552.5 3485.4 7057.8 F 2.16 8.49 37.54 160.1 F
Hybrid 2 [18] 1.67 3.65 F F F 1683.9 6436.7 F F F 2.81 23.49 F F F Hybrid 3
[19] 2.53 5.41 11.69 25.45 55.94 413.3 881.6 1782.4 3572.5 7976.7 1.06 4.77
20.83 90.92 446.2 Hybrid 4 [20] 1.83 3.99 F F F 475.7 2487.6 F F F 0.87 9.93
F F F Hybrid 5 [21] 2.39 5.23 F F F 539.6 3094.6 F F F 1.29 16.18 F F F Hybrid
31
6 [21] 2.41 5.28 F F F 498.6 2986.3 F F F 1.2 15.77 F F F GDI D1 [22] 1.93
4.27 F F F 500.3 2493.6 F F F 0.97 10.65 F F F GDI D2 [22] 2.08 4.38 9.4 20.16
43.23 345.9 743.6 1621.2 3550.6 7985.1 0.72 3.26 15.24 71.58 345.2 GDI D3
[22] 2.57 5.56 11.86 25.94 56.87 410.7 883.1 1924.2 4290.8 9826.4 1.05 4.91
22.82 111.3 558.8 Proposed ----- 1.95 4.19 9.01 19.11 40.19 285.6 613.8 1326.5
2946.7 6725.9 0.56 2.57 11.95 56.3 270.3 Performance Ratio (PR) 1.17:1 1.15:1
1:1.04 1:1.05 1:1.08 1:1.21 1:1.21 1:1.22 1:1.2 1:1.19 1:1.28 1:1.27 1:1.28
1:1.27 1:1.28 Fig. 4. Implementation of n-bit adder using 1-bit FA. HPSC [14]
and New-HPSC [16] FAs despite having CCMOS logic in the output terminals
failed to operate in 64- bits. It occurred since their delays were extremely high
to comply with the input frequency (100 MHz). Hybrid 3 [19], GDI D2 [22],
GDI D3 [22] FAs and the proposed design do not incorporate CCMOS logic in
the output terminals. Still, they managed to operate up to 64-bits due to their
internal design structures. With careful inspection of Fig. 4, it can be observed
that in each stage, An and Bn terms are inserted as fresh inputs. The carry term
needs to propagate through several stages. In the proposed design (Fig. 2), since
the input carry term Cin is used as gate control of the TGs, the possible signals
which might appear at the output terminal (Cout terminal) are Vdd, Gnd or B.
If we consider only B (since Vdd and Gnd will be provided as fresh inputs in
each stage), for n=0, B0 will appear in the output terminal (C1 output terminal)
whereas for n=1, B1 will appear in output terminal (C2 output terminal). Hence,
the same signal does not propagate through n=0 to n=63 for which the design is
not subjected to voltage degradation when extended to wide adder. 1549-7747
(c) 2019 IEEE. Personal use is permitted, but republication/redistribution
requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for
more information. This article has been accepted for publication in a future issue
of this journal, but has not been fully edited. Content may change prior to final
publication. Citation information: DOI 10.1109/TCSII.2019.2940558, IEEE
Transactions on Circuits and Systems II: Express Briefs > REPLACE THIS
LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-
CLICK HERE TO EDIT) < 5 Fig. 5. Performance of proposed design under
various loads with 0.8 V supply voltage. C. Driving capability test of the
proposed design To test drive capability of the proposed FA design, a wide range
of output loads from fan-out of 4 unit-size inverters (FO-4) to fan-out of 64 unit-
size inverters (FO-64) have been applied. The supply voltage for this case was
set to 0.8 V. Simulation data are illustrated using Fig. 5. With careful
observation of Fig. 5, it can be noticed that delay, AP and PDP are extremely
high for FO-32 and FO-64. Hence, the most favorable fan-out load condition for
32
the proposed design would be up to FO-16. V. CONCLUSION In this research,
a novel hybrid FA design showing markedly improved performance has been
proposed. Characteristics of the proposed design have been compared with
twenty existing FAs. For performance analysis, simulation was conducted using
Cadence toolset. As per simulation results, the proposed FA demonstrates
superior performance in speed and PDP while operated as a single cell.
Moreover, the FAs have been extended up to a word length of 64-bits in order
to test scalability. Only the proposed FA and five of the existing designs have
the ability to operate without utilizing buffers in intermediate stages while
extended to 64- bits. In the cascade mode, the performance parameters point out
that the proposed design is comparatively more suitable for wider word-length
adder implementation which is the trend in modern computing systems where
high-speed and efficient computation while consuming low power is essential.
REFERENCES [1] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri and P.
Corsonello, "Designing High-Speed Adders in Power-Constrained
Environments," IEEE Transactions on Circuits and Systems II: Express Briefs,
vol. 56, no. 2, pp. 172-176, Feb. 2009. [2] A. Pal, Low Power VLSI Circuits and
Systems, New Delhi, India: Springer India, 2015. [3] B. K. Mohanty, "Efficient
Fixed-Width Adder-Tree Design," IEEE Transactions on Circuits and Systems
II: Express Briefs, vol. 66, no. 2, pp. 292-296, Feb. 2019.. [4] S. Purohit and M.
Margala, “Investigating the impact of logic and circuit implementation for full
adder performance,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 20, no. 7, pp. 1327-1331, 2012. [5] R. Zimmermann and W.
Fichtner, “Low-power logic styles: CMOS versus pass-transistor logic,” IEEE
J. Solid-State Circuits, vol. 32, no. 7, pp. 1079–1090, Jul. 1997. [6] Yingtao
Jiang, A. Al-Sheraidah, Yuke Wang, E. Sha and Jin-Gyun Chung, "A novel
multiplexer-based low-power full adder," IEEE Transactions on Circuits and
Systems II: Express Briefs, vol. 51, no. 7, pp. 345-348, July 2004. [7] N. H. E.
Weste and D. M. Harris, CMOS VLSI Design: A Circuits and Systems
Perspective, 4th ed. Boston, MA, USA: Addison-Wesley, 2010. [8] A. M.
Shams, T.K. Darwish, and M. A. Bayoumi, “Performance analysis of low-power
1-bit CMOS full adder cells,” IEEE Trans. Very Large Scale Integr. (VLSI)
Syst., vol. 10, no. 1, pp. 20–29, Feb. 2002. [9] C. H. Chang, J. M. Gu, and M.
Zhang, “A review of 0.18-μm full adder performances for tree structured
arithmetic circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13,
no. 6, pp. 686–695, Jun. 2005. [10] M. Alioto, G. Di Cataldo, and G. Palumbo,
“Mixed full adder topologies for high-performance low-power arithmetic
circuits,” Microelectron. J., vol. 38, no. 1, pp. 130–139, Jan. 2007. [11] C.-K.
Tung, Y.-C. Hung, S.-H. Shieh, and G.-S. Huang, “A low-power high-speed
33
hybrid CMOS full adder for embedded system,” in Proc. IEEE Conf. Design
Diagnostics Electron. Circuits Syst., vol. 13. Apr. 2007, pp. 1–4. [12] M.
Vesterbacka, “A 14-transistor CMOS full adder with full voltageswing nodes,”
in Proc. IEEE Workshop Signal Process. Syst. (SiPS), Taipei, Taiwan, Oct.
1999, pp. 713–722. [13] H. T. Bui, Y. Wang, and Y. Jiang, “Design and analysis
of low-power 10-transistor full adders using novel XOR-XNOR gates,” IEEE
Transactions on Circuits Systems II: Express Briefs, vol. 49, no. 1, pp. 25–30,
Jan. 2002. [14] M. Zhang, J. Gu, and C.-H. Chang, “A novel hybrid pass logic
with static CMOS output drive full-adder cell,” in Proc. Int. Symp. Circuits
Syst., May 2003, pp. 317–320. [15] M. Aguirre-Hernandez and M. Linares-
Aranda, “CMOS full-adders for energy-efficient arithmetic applications,” IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 4, pp. 718–721, Apr.
2011. [16] S. Goel, A. Kumar, and M. A. Bayoumi, “Design of robust,
energyefficient full adders for deep-submicrometer design using hybrid-CMOS
logic style,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 12,
pp. 1309–1321, Dec. 2006. [17] I. Hassoune, D. Flandre, I. O'Connor and J.
Legat, "ULPFA: A New Efficient Design of a Power-Aware Full Adder," in
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 8, pp.
2066- 2074, Aug. 2010. [18] P. Bhattacharyya, B. Kundu, S. Ghosh, V. Kumar
and A. Dandapat, "Performance Analysis of a Low-Power High-Speed Hybrid
1-bit Full Adder Circuit," IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 23, no. 10, pp. 2001-2008, Oct. 2015. [19] R. F. Mirzaee,
M. H. Moaiyeri, H. Khorsand and K. Navi, “A New Robust and Hybrid High-
Performance Full Adder Cell,” Journal of Circuits, Systems and Computers, vol.
20, no. 4, pp. 641-655, 2011. [20] M. C. Parameshwara and H. C. Srinivasaiah,
“Low-Power Hybrid 1-Bit Full Adder Circuit for Energy Efficient Arithmetic
Applications,” Journal of Circuits, Systems and Computers, vol. 26, no. 1, pp.
1-15, 2017. [21] P. Kumar and R. K. Sharma, “An Energy Efficient Logic
Approch to Implement CMOS Full Adder,” Journal of Circuits, Systems and
Computers, vol. 26, no. 5, pp. 1-20, 2017. [22] M. Shoba and R. Nakkeeran,
“GDI based full adders for energy efficient arithmetic applications,”
Engineering Science and Technology, an International Journal, vol. 19, no. 1,
pp. 485-496, 2016. [23] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar,
“Comprehensive learning particle swarm optimizer for global optimization of
multimodal functions,” IEEE Trans. Evol. Comput., vol. 10, no. 3, pp. 281–295,
Jun. 2006. [24] C. Pan and A. Naeemi, “A Paradigm Shift in Local Interconnect
Technology in the Era of Nanoscale Multigate and Gate-All-Around Devices,”
vol. 36, no. 3, pp. 274-276, 2015.

34

You might also like