Low-Power Hybrid 1-Bit Full-Adder Circuit for Energy Efficient Arithmetic Applications

Article  in  Journal of Circuits, Systems and Computers · August 2016


DOI: 10.1142/S0218126617500141

Journal of Circuits, Systems, and Computers
Vol. 26, No. 1 (2017) 1750014 (15 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218126617500141

Low-Power Hybrid 1-Bit Full-Adder Circuit for Energy Efficient Arithmetic Applications*

M. C. Parameshwara
J CIRCUIT SYST COMP 2017.26. Downloaded from www.worldscientific.com
by FLINDERS UNIVERSITY LIBRARY on 11/01/16. For personal use only.

Department of Electronics and Communication Engineering (ECE),


Vemana Institute of Technology, Koramangala,
No. 135/7, 14th A Cross, Vyalikaval,
Malleshwaram, Bengaluru, Karnataka 560 034, India
pmcvit@gmail.com

H. C. Srinivasaiah
Department of Telecommunication Engineering (TCE),
Dayananda Sagar College of Engineering (DSCE),
Kumaraswamy Layout, Bengaluru, Karnataka 560 078, India
hcsrinivas@dayanandasagar.edu; hcsrinivas@ieee.org

Received 25 April 2015


Accepted 8 July 2016
Published 26 August 2016

A novel "16 transistor" (16T) 1-bit full adder (FA) circuit based on CMOS transmission-gate
(TG) and pass-transistor logic (PTL) is presented. The circuit is derived using the
carry-dependent-sum implementation approach. Its design metrics (DMs), namely power, delay,
power-delay product (PDP), and transistor count (TC), are compared against eight other
standard and state-of-the-art 1-bit FA circuits reported to date. All comparisons are made at
the post-layout level. The proposed 1-bit FA dissipates an average power of 2.118 µW, with a
delay of 606 ps and an area of 33.1 µm², resulting in a PDP of 1.28 fJ. This power, and hence
the PDP, is the lowest reported to date. The comparative study uses a common test bench with a
supply voltage VDD = 1.2 V and an input signal frequency fin = 200 MHz. The 1-bit FA is
designed and implemented using Cadence's 90 nm "generic process design kit" (GPDK).

Keywords: Low power arithmetic; carry dependent sum; hybrid logic full adder; pass transistor
logic; transmission gate logic.

1. Introduction
The evolving generations of portable electronic gadgets are driven by higher integration of
diverse multimedia hardware, which ultimately consumes more power and demands the design of
energy-efficient and area-efficient digital integrated systems.
*This paper was recommended by Regional Editor Piero Malcovati.


Fig. 1. Power breakdown in high frequency DDFS.

Full adders (FAs) are critical elements, e.g., in the digital signal processors (DSPs) of
state-of-the-art communication systems, and need special attention to achieve the required
overall system performance [1,2]. Specifically, FAs are often used in critical paths such as
the final carry-propagate paths in multiplication and division circuits. Another specific use
of FAs is in building blocks of communication systems such as direct digital frequency
synthesizers (DDFS) [3,4]. Figure 1 shows the power breakdown among the functional blocks in a
typical high-frequency DDFS circuit required in Long-Term Evolution (LTE) communication
systems [5,6]. In this figure, the "phase accumulator and complementor" block, which involves
arithmetic circuits, is observed to be power hungry and needs greater attention. Optimization
of a 1-bit FA for a given application context is therefore an ongoing effort, which motivates
this research work.
Turning to FA architecture, FAs are broadly classified into two types, viz.,
"carry-independent-sum adders" (CISAs) and "carry-dependent-sum adders" (CDSAs). FAs are
further classified into static, dynamic, and hybrid FAs based on the logic style used. Hybrid
FAs use more than one logic style and are classified into XOR–XOR, XNOR–XNOR, and
XOR–XNOR-based FAs [1,2]. In the past, several FAs have been proposed based on the
carry-independent sum (CIS), and to the best of our knowledge very few FAs have been proposed
based on the CDSA. One such FA is the static CMOS adder and its variant, the static CMOS
mirror adder [7,8], which have full-swing outputs and are reliable even at low supply
voltages. On the other hand, these FAs suffer from speed degradation due to the large gate
capacitance at each input, since each input has to drive the gate capacitance of at least one
pMOS and one nMOS device. Further, the pMOS pull-up circuitry lowers the speed because pMOS
devices are slower than nMOS devices; the pMOS devices therefore need to be sized up to attain
the required performance, leading to increased area and hence higher power dissipation.
Finally, these static CMOS FAs use more transistors and hence more area due to the increased
TC. Apart from these FAs, several other FA circuits based on CISAs have been proposed in the
recent literature using PTL, CMOS TG, hybrid logic styles, etc., where the hybrid type uses
more than one logic style to meet requirements such as low power, high speed, and low TC.
The "transmission-gate adder" (TGA) [8], 14T [9], 8T [10], and "transmission-function adder"
(TFA) [11] are low-power adders due to their low TC. The 14T is derived from PTL and TG logic
and the 8T from PTL only, whereas the TFA and TGA are derived from TG logic. The main
disadvantage of these FAs is that they lack the driving capability required in long adder
chains. Other FAs are the "hybrid pass-transistor with static CMOS logic" adder (HPSC-1) [12]
and its improved version, herein called HPSC-2 [13]. These adders exploit the advantages of
different logic styles to trade off between different DMs and to achieve low TC, but they
suffer from layout complexity due to the different sizing of pMOS and nMOS devices. The hybrid
FA [1,2], herein called Hybrid-1, has the advantage of good driving capability and better
noise margin due to its combination of pass-transistor and static CMOS design styles, but it
is observed to have relatively higher power dissipation due to glitches associated with the
internal node capacitances, and increased silicon area due to the different sizes of pMOS and
nMOS transistors. The "double-pass-logic" (DPL) and "swing-restoring complementary
pass-transistor logic" (SRCPL) based adders [14,15] offer both low power and high speed, but
they suffer from large silicon area due to their higher TC. The recently reported 1-bit FA
[16], herein called Hybrid-2, has the advantages of low power, high speed, and low TC. This
Hybrid-2 adder is based on the CIS, and its required performance was achieved through
transistor sizing at the cost of extra area.
In this paper, a novel 1-bit FA based on the CDSA is presented. The proposed 1-bit FA is
designed using PTL and TG logic and uses only 16 transistors (16T). With optimized sizing of
its pMOS and nMOS transistors, the proposed FA achieves the lowest power, a small area, and
the lowest PDP compared to recently proposed FAs. The proposed 1-bit FA also works with proper
logic swings without the need for buffers in the proposed test bench.
The rest of this paper is organized as follows. Section 2 presents the proposed 1-bit FA
circuit, highlighting its salient features. Section 3 describes the simulation environment
used for extracting the DMs of the 1-bit FA. Section 4 discusses the performance comparison,
through simulation, of the proposed 1-bit FA with eight other 1-bit FAs reported to date.
Finally, conclusions are drawn in Sec. 5.

2. Proposed 1-Bit Hybrid Full Adder


In this section, the proposed novel 1-bit FA is presented, highlighting its low power and PDP
characteristics. These result from a subtle insight into the logic of operation and from local
signal routing considerations. The block diagram of the proposed 1-bit FA (Fig. 2) is derived
from the standard CDSA.

Table 1. Truth table of 1-bit full adder.

A  B  Cin  Sum  Cout
0  0  0    0    0
0  0  1    1    0
0  1  0    1    0
0  1  1    0    1
1  0  0    1    0
1  0  1    0    1
1  1  0    0    1
1  1  1    1    1

Consider the truth table of a 1-bit FA shown in Table 1. When A ≠ B, Cout = Cin, and when
A = B, Cout = A (or B). Thus Cout can be expressed as:

$$C_{out} = A\,H' + C_{in}\,H = B\,H' + C_{in}\,H, \tag{1}$$

where H' in Eq. (1) is A ⊙ B (i.e., XNOR), the output of the MUX-Select-Signal-Generator
(MSSG) of Fig. 2(a), and H = A ⊕ B is its complement. H' acts as the select signal that passes
either Cin or "A (or B)" to the Cout output of the MUX in Fig. 2(b). Again from Table 1, when
A ≠ B, Sum = C'out (or C'in), and when A = B, Sum = Cin. Thus the Sum output can be expressed
as:

$$\mathrm{Sum} = C_{in}\,H' + C'_{in}\,H = C_{in}\,H' + C'_{out}\,H, \tag{2}$$

where H' again selects Cin or C'out for the Sum output in the MUX of Fig. 2(c), in accordance
with Eq. (2).
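The two MUX expressions can be checked exhaustively against Table 1. The following is a minimal sketch, not the authors' verification flow: the function names are hypothetical, and the complement H = A XOR B of the select signal H' is taken from the definition of H' as XNOR.

```python
from itertools import product


def full_adder_reference(a: int, b: int, cin: int) -> tuple[int, int]:
    """Arithmetic reference: returns (Sum, Cout) for one-bit inputs."""
    total = a + b + cin
    return total & 1, total >> 1


def full_adder_cdsa(a: int, b: int, cin: int) -> tuple[int, int]:
    """Carry-dependent-sum form following Eqs. (1) and (2)."""
    h = a ^ b        # H  = A XOR B
    h_n = h ^ 1      # H' = A XNOR B, the MSSG select signal
    cout = (a & h_n) | (cin & h)             # Eq. (1)
    sum_ = (cin & h_n) | ((cout ^ 1) & h)    # Eq. (2), using the C'out form
    return sum_, cout


def check_all_vectors() -> bool:
    """True if the MUX form matches the reference on all 8 input vectors."""
    return all(
        full_adder_cdsa(a, b, cin) == full_adder_reference(a, b, cin)
        for a, b, cin in product((0, 1), repeat=3)
    )


print(check_all_vectors())
```

Because Sum is computed from Cout here, the sketch also illustrates why the structure is called carry dependent: the Sum MUX cannot settle before the carry MUX does.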
Normally the gate capacitance is an order of magnitude higher than the diffusion capacitance
[17]. It is therefore preferable to connect critical signals to a diffusion terminal rather
than to the gate of an inverter, particularly when the inverter is weak. It should be noted
that a buffered signal gets sharp rising (or falling) edges only when the inverter transistors
are sufficiently wide, and this widening constraint conflicts with the area constraint. This
argument supports the use of PTL with the signals driving the diffusion rather than the gate
of a weak inverter. Thus the local interconnection topology, together with careful
handcrafting of the layout, is beneficial for simultaneous optimization of area, delay, and
power without a tradeoff among them.

Fig. 2. Block diagram of the proposed 1-bit FA, in which the MUX-Select-Signal-Generator (MSSG)
output of (a) is used as the select input signal in (b) and (c), generating the final carry
Cout and the Sum.

This work considers the adder architecture beyond logical (Boolean) optimization and topology
considerations. A careful critical path analysis has provided insight into the choice of
variables in accordance with Eqs. (1) and (2). In Eq. (1), the variables A and B can be
interchanged during local routing, based on their arrival times. This signal arrival time
attribute can be exploited in applications of adders, particularly in multipliers and in the
phase accumulator of the DDFS application mentioned above.
From Eq. (2), a judicious choice of whether to route the C'out or the Cin signal to one of the
MUX inputs (Fig. 2(c)) makes a significant difference to the speed of the critical path, as it
involves an inversion operation. Therefore, the MUX is transmission-gate based, with the
inputs applied to the diffusion terminals and the select signals applied to the gate
terminals, as the transistors are of minimum size. The logical optimization and routing
insight, along with the device topography, provide additional control to optimize both the
speed and area metrics simultaneously, as shown in this work. Every fraction of a nanosecond
(ns) or picosecond (ps) decrease in delay translates into a bandwidth enhancement of a
significant fraction of a gigahertz (GHz) or terahertz (THz) at the system level. This
stringent effort to enhance the speed of the adders is justified by their application in DDFS.
The demand for simultaneous optimization of speed, area, and power is a strong driver toward
achieving various quality attributes (such as footprint, battery life, bandwidth, and cost of
the gadgets) of modern communication DSPs.

Fig. 3. (a) Schematic of proposed 1-bit FA, and (b) its layout.

The schematic and layout of the novel 1-bit FA are shown in Fig. 3. To minimize the number of
transistors, the MSSG (XNOR) is implemented using PTL, which requires only four transistors
[9]. The proposed 1-bit FA consists of 16 transistors (i.e., 16T) and occupies a total layout
area of 33.1 µm², with width = 7.875 µm and height = 4.2 µm. In the schematic of Fig. 3(a),
the W/L ratios of the nMOS/pMOS transistors are indicated next to each transistor. The
transistor sizing methodology suggested in earlier reports [15,18] has been adopted.

3. Simulation Methodology Used


To compare the performance of the proposed 1-bit FA with the other 1-bit FAs, the common test
bench [13] shown in Fig. 4 is used. This test bench is used to extract the worst-case delay
and average power of a 1-bit FA as the circuit under test (CUT). The test bench has a total of
12 identical FAs organized in several stages, and is useful for measuring both the performance
and the driving strength of the CUT. It has a total of five stages, namely Stage-1 (S1),
Stage-2 (S2), Stage-3 (S3), Stage-4 (S4), and Stage-5 (S5). S4 is the CUT (FA*) for which the
worst-case delay and power are to be extracted. The delay of the CUT is measured from the
inputs (A, B, and Cin) of S1 to the outputs of S4 (CUT). The power of the CUT is measured in
S1, because each FA in S1 is exercised with all possible input vector-to-vector transitions.
Thus the power of the CUT is the average power consumed by the five FAs in S1. The FAs in all
stages, S1 through S5, are instances (replicas) of the CUT under consideration.

Fig. 4. Test bench for validation of all the eight 1-bit FAs, along with the proposed one.

4. Comparative Study of Various 1-Bit Full Adders


This section compares the results extracted from the test bench shown in Fig. 4. The results
are discussed in the following subsections with respect to the average power, the worst-case
delay, and the PDP performance metrics. Cadence's Spectre and Virtuoso-XL, with the generic
90 nm GPDK, have been used for design, simulation, and layout of all the 1-bit FA circuits
under consideration. The transistor sizes in the layouts and schematics are retained as
originally reported in the literature. All the performance metrics have been extracted through
post-layout simulations. All these simulations were carried out using the BSIM3v3 (level 49)
models under nominal temperature conditions at supply VDD = 1.2 V with fin = 200 MHz.

4.1. Extraction of delay


To extract the propagation delay (tpd) of the proposed FA, the standard test input patterns
suggested in the literature [18,19] have been used. The tpd is calculated as the time from a
50% change in the input signal, transitioning either from 0 to 1 or from 1 to 0, to the
corresponding 50% change in the output signal, again for either a 0 to 1 or a 1 to 0
transition. For a 1-bit FA with 3 inputs, A, B, and Cin, there are 2^3 = 8 possible input
vectors. For an exhaustive delay analysis of Sum or Cout, one must consider all possible input
vector-to-vector transitions. There are 2^k × 2^k = 64 input vector transitions for k = 3
inputs, presented in the form of a 2^k × 2^k matrix, herein called the delay matrix, as in
Table 2. The extracted delay of the 1-bit FA at its output Cout is entered as an element of
this 64-cell matrix. The eight diagonal cells correspond to transitions within the same input
state, which have no physical significance, e.g., 000→000, 001→001, ..., 111→111; these are
indicated by a "*" in the matrix (the arrow indicates the direction of the state transition).
Further, 24 of the remaining 56 (= 64 − 8) transitions do not cause any change at the output
Cout and are labeled Not Applicable (NA). All 56 input vector transitions are defined as
standard input test patterns [18,19] to determine the worst-case delay in Cout, as shown in
the waveforms of Fig. 5(a). These input vector transitions have been generated automatically
using the piecewise-linear (PWL) function in Cadence's tool to generate a 200 MHz signal for
transient simulation. Collectively, the waveforms on the A, B, and Cin inputs contain all 56
input vector-to-vector transitions. The delay extracted at the output Cout constitutes the
critical path (in an adder chain). The 32 (= 56 − 24) delays at Cout, corresponding to the 32
significant transitions of the 3-tuple input vector {A, B, Cin}, have been simulated,
extracted, and tabulated in the respective cells of Table 2. In this table, the worst-case
delay is the maximum delay, which is highlighted along with the lowest of the 32 delays. For
the proposed 1-bit FA, the worst-case delay is found to be 606 ps for the 001→011 transition.

Table 2. Extracted delay values (ps) of proposed 1-bit full adder. Rows give the initial input
vector and columns the final input vector; the maximum and minimum values are marked with [ ].

ABCin  000   001   010   011    100   101   110   111
000    *     NA    NA    239    NA    98    130   [76]
001    NA    *     NA    [606]  NA    564   365   290
010    NA    NA    *     534    NA    538   428   343
011    209   433   398   *      401   NA    NA    NA
100    NA    NA    NA    540    *     534   430   344
101    211   436   404   NA     398   *     NA    NA
110    221   341   473   NA     480   NA    *     NA
111    369   209   400   NA     459   NA    NA    *

Fig. 5. Simulated waveforms of the proposed 1-bit FA to determine (a) the worst-case delay,
where the 56 input vector-to-vector transitions are implicit, and (b) the average power
measured over three frequencies, namely fH, fM, and fL, over the 3 inputs A (fA), B (fB), and
Cin (fCin) of all 1-bit FAs of Fig. 4.
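The bookkeeping of the delay matrix (8 diagonal cells, 24 NA cells, 32 measured delays) follows directly from the carry function and can be reproduced in a few lines. This is an illustrative sketch only: the delay values are copied from Table 2, while the helper names `cout`, `classify`, and `summarize` are hypothetical.

```python
VECTORS = [format(i, "03b") for i in range(8)]  # "000" .. "111", read as A, B, Cin

# Delays in ps from Table 2; rows are initial vectors, columns are final vectors.
TABLE2 = {
    "000": ["*", "NA", "NA", 239, "NA", 98, 130, 76],
    "001": ["NA", "*", "NA", 606, "NA", 564, 365, 290],
    "010": ["NA", "NA", "*", 534, "NA", 538, 428, 343],
    "011": [209, 433, 398, "*", 401, "NA", "NA", "NA"],
    "100": ["NA", "NA", "NA", 540, "*", 534, 430, 344],
    "101": [211, 436, 404, "NA", 398, "*", "NA", "NA"],
    "110": [221, 341, 473, "NA", 480, "NA", "*", "NA"],
    "111": [369, 209, 400, "NA", 459, "NA", "NA", "*"],
}


def cout(vector: str) -> int:
    """Carry out of a full adder: 1 when at least two inputs are 1."""
    return int(vector.count("1") >= 2)


def classify(src: str, dst: str) -> str:
    """Label a transition as diagonal, not applicable, or a measurable delay."""
    if src == dst:
        return "*"
    if cout(src) == cout(dst):
        return "NA"
    return "delay"


def summarize():
    """Count cell types and return the worst and best (delay, src, dst) tuples."""
    counts = {"*": 0, "NA": 0, "delay": 0}
    delays = []
    for src in VECTORS:
        for dst, cell in zip(VECTORS, TABLE2[src]):
            kind = classify(src, dst)
            counts[kind] += 1
            # Each tabulated cell must agree with the classification.
            assert cell == kind or (kind == "delay" and isinstance(cell, int))
            if kind == "delay":
                delays.append((cell, src, dst))
    return counts, max(delays), min(delays)


counts, worst, best = summarize()
print(counts, worst, best)
```

The 32 measurable cells arise because four input vectors give Cout = 0 and four give Cout = 1, so there are 4 × 4 transitions in each direction that toggle the carry.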
Further, a dynamic (speed) test is performed on all the 1-bit FAs. During this test, the
critical input of the 3-tuple (vector) is driven by a square wave (train of pulses) with
on-time Ton (ps) and 50% duty cycle. Ton is varied (maintaining the 50% duty cycle) until the
outputs at Sum or Cout are just on the verge of failing to transition between valid logic
levels; this Ton is Ton-min. The highest frequency up to which an adder operates successfully
with valid logic levels is designated fdmax = 1/(2 × Ton-min). The simulated fdmax values for
all the 1-bit FAs under consideration are tabulated in Table 4. While determining fdmax, the
noncritical inputs of the 3-tuple are held at appropriate fixed logic levels. The fdmax of the
proposed 1-bit FA is 1.66 GHz, and the TGA architecture has the highest fdmax among the nine
FAs. However, the TGA has about 2.5× higher power consumption than the proposed FA.
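The relation fdmax = 1/(2 × Ton-min) can be inverted to see what minimum on-time the reported frequencies imply. The sketch below is only a unit-conversion aid with hypothetical function names; the Ton-min values it yields are derived from the reported fdmax figures and are not stated in the paper.

```python
def fdmax_from_ton_min(ton_min_s: float) -> float:
    """Maximum operating frequency in Hz for a 50% duty-cycle pulse train."""
    return 1.0 / (2.0 * ton_min_s)


def ton_min_from_fdmax(fdmax_hz: float) -> float:
    """Minimum on-time in seconds implied by a given fdmax."""
    return 1.0 / (2.0 * fdmax_hz)


# Reported fdmax values for the proposed FA and the TGA.
ton_proposed_ps = ton_min_from_fdmax(1.66e9) * 1e12
ton_tga_ps = ton_min_from_fdmax(4.26e9) * 1e12
print(round(ton_proposed_ps, 1), round(ton_tga_ps, 1))
```

Under these assumptions, the proposed FA corresponds to an on-time of roughly 301 ps and the TGA to roughly 117 ps.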
The salient features of the proposed 1-bit FA are as follows: the logic style is hybrid pass
logic, the power dissipation is 2.118 µW, the worst-case delay (at the critical output Cout)
is 606 ps, the worst-case critical vector transition is 001→011, the layout area is 33.1 µm²,
the PDP is 1.28 fJ, the TC is 16, and fdmax is 1.66 GHz. The corresponding observations for
the remaining eight 1-bit FA architectures can be inferred from the comparison in Table 4.

4.2. Extraction of power


The worst-case power for all nine 1-bit FAs under consideration is determined as the average
power dissipated over 9 input frequency patterns (Table 3), applied at the 3-tuple inputs and
resulting in valid logic levels at the outputs Sum and Cout [18,19]. The frequencies for the
inputs A, B, and Cin are labeled fA, fB, and fCin. The average power, Pavg, is the sum of
three components:

$$P_{avg} = P_{switching} + P_{static} + P_{sc}, \tag{3}$$

where Pswitching is the average switching power loss, Pstatic is the average static power
loss, and Psc is the average short-circuit power loss in each of the nine FAs under
consideration. The static power is the power dissipated due to steady-state leakage, the
switching power is the power lost in switching all the node capacitances in the circuit, and
the short-circuit power is the power lost over the entire circuit due to simultaneous
conduction of the nMOS and pMOS subnets connected between the power rails.

Table 3. Power measurement input sub-patterns.

                   Input sub-pattern
Frequency pattern  fA   fB   fCin
1                  fH   fM   fL
2                  fH   fL   fM
3                  fM   fL   fH
4                  fM   fH   fL
5                  fL   fH   fM
6                  fL   fM   fH
7                  fM   fMD  fL
8                  fM   fMD  fM
9                  fM   fMD  fH

Table 4. Comparison of 1-bit FAs in 90 nm technology node.

FA circuit  Logic style used                               Average power (µW)  Cout delay (ps)  Vector transition (ABCin)  PDP (fJ)  TC  fdmax (GHz)  Reference
TFA         CMOS Transmission Gate Logic                   5.172               540.7            000→011                    2.79      16  3.45         11, 18
TGA         CMOS Transmission Gate Logic                   5.274               333.5            000→011                    1.76      20  4.26         8, 18
HPSC-1      Hybrid Pass Transistor Logic with Static CMOS  12.32               756.2            001→011                    9.31      22  3.84         12
HPSC-2      Hybrid Pass Transistor Logic with Static CMOS  6.606               451.4            000→011                    2.98      26  0.75         13
Hybrid-1    Hybrid Pass Logic                              7.164               500.6            101→000                    3.59      24  0.768        1, 2
Hybrid-2    Hybrid Pass Logic                              2.975               811.5            011→100                    2.41      16  1.876        16
DPL         Double Pass Logic                              3.928               1871             000→111                    7.35      28  1.6          14, 15
SRCPL       Complementary Pass Logic                       4.6                 1866             111→000                    8.58      26  1.06         14, 15
Proposed    Hybrid Pass Logic                              2.118               606.2            001→011                    1.28      16  1.66         Present work

Pavg in Eq. (3) is computed for all nine 1-bit FAs under consideration over the nine frequency
patterns shown in Table 3, which are assigned to the 3-tuple inputs of the test bench of
Fig. 4. The nine frequency patterns, corresponding to the respective rows of Table 3, are each
applied over an interval of one cycle of the lowest frequency among them (Fig. 5(b)). Six of
the nine frequency patterns are combinations of three frequencies, viz., fH = 200 MHz,
fM = fH/2 = 100 MHz, and fL = fH/4 = 50 MHz, taken in 3! (3 factorial) ways; these constitute
the first six rows of Table 3. This power analysis is done at supply VDD = 1.2 V. In the last
three rows of the table, the additional fMD assigned to B is fM phase-offset by 50% of its
pulse width. These three rows effectively simulate the worst-case power loss due to glitches,
and the resulting worst-case glitches are evident in the waveform of Fig. 5(b). This waveform
encompasses all nine frequency patterns applied at the 3-tuple inputs of the test bench, along
with the corresponding waveforms at the Sum and Cout outputs. The nine frequency patterns are
shown in nine time slots, 1 through 9, each slot being one cycle of the lowest frequency among
them. All these combinations of frequencies are generated automatically using the PWL function
in Cadence's tool. The power metrics derived with this technique for all the FAs under
consideration within the test bench are tabulated in Table 4. It is observed that the proposed
FA dissipates the lowest power (2.118 µW) among all those in the table. This low power can be
attributed to the low TC, leading to comparatively low layout area, and to the adopted
(hybrid) logic style.
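The structure of the nine stimulus patterns can be made explicit with a short script. This is a sketch under stated assumptions rather than the authors' PWL generation flow: the row contents are copied from Table 3, the helper names are hypothetical, and the 50% duty cycle used to turn the fMD phase offset into a time delay is an assumption carried over from the dynamic test description.

```python
from itertools import permutations

F_H = 200e6      # fH from the text, in Hz
F_M = F_H / 2    # fM = 100 MHz
F_L = F_H / 4    # fL = 50 MHz

# fMD runs at the same rate as fM but is shifted in phase.
FREQ_HZ = {"fH": F_H, "fM": F_M, "fL": F_L, "fMD": F_M}

# Rows of Table 3 as (fA, fB, fCin).
TABLE3 = [
    ("fH", "fM", "fL"),
    ("fH", "fL", "fM"),
    ("fM", "fL", "fH"),
    ("fM", "fH", "fL"),
    ("fL", "fH", "fM"),
    ("fL", "fM", "fH"),
    ("fM", "fMD", "fL"),
    ("fM", "fMD", "fM"),
    ("fM", "fMD", "fH"),
]


def first_six_are_all_permutations() -> bool:
    """Rows 1 to 6 should be exactly the 3! orderings of fH, fM, fL."""
    return set(TABLE3[:6]) == set(permutations(("fH", "fM", "fL")))


def fmd_delay_seconds(duty_cycle: float = 0.5) -> float:
    """Time shift of fMD relative to fM: 50% of the fM pulse width.

    The duty cycle is an assumption; it is not stated for the power test.
    """
    pulse_width = duty_cycle / F_M
    return 0.5 * pulse_width


print(first_six_are_all_permutations(), fmd_delay_seconds())
```

With a 50% duty cycle, the fM pulse width is 5 ns, so the assumed fMD offset comes out at 2.5 ns.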

4.3. Extraction of PDP


To determine the worst-case PDP, the product of the worst-case power and the worst-case delay
is formed, both extracted using the input patterns shown in Fig. 5. The extraction of the
worst-case delay and the worst-case power is discussed in the preceding sections.
Traditionally, the PDP is considered a suitable performance metric for simultaneous
optimization of power and delay [20]. In long-channel transistors, the power dissipation is
mainly due to switching of the node capacitances in the circuit, and the leakage power
dissipation is relatively insignificant. However, at sub-90 nm gate lengths, the total leakage
power dissipation due to all the leakage mechanisms, viz., subthreshold, junction, and carrier
tunneling through the oxide, becomes comparable with the dynamic power dissipation.
In this paper, the worst-case delay at the output Cout and the worst-case average power are
used to determine the PDP. The PDP provides a trade-off between the worst-case power and
worst-case delay performance metrics. A minimum PDP implies prolonged battery life, a
desirable feature for portable applications.
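As a cross-check of the definition, the PDP column of Table 4 can be recomputed from the power and delay columns. The sketch below uses the tabulated values as data; the function and variable names are hypothetical, and small differences from the reported figures are expected because the table lists PDP to two decimal places.

```python
# (average power in µW, worst-case Cout delay in ps, reported PDP in fJ) from Table 4.
TABLE4 = {
    "TFA": (5.172, 540.7, 2.79),
    "TGA": (5.274, 333.5, 1.76),
    "HPSC-1": (12.32, 756.2, 9.31),
    "HPSC-2": (6.606, 451.4, 2.98),
    "Hybrid-1": (7.164, 500.6, 3.59),
    "Hybrid-2": (2.975, 811.5, 2.41),
    "DPL": (3.928, 1871.0, 7.35),
    "SRCPL": (4.6, 1866.0, 8.58),
    "Proposed": (2.118, 606.2, 1.28),
}


def pdp_fj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in fJ; 1 µW x 1 ps = 1e-18 J = 1e-3 fJ."""
    return power_uw * delay_ps / 1000.0


def recomputed() -> dict[str, float]:
    """PDP in fJ for every row, computed from power and delay."""
    return {name: pdp_fj(p, d) for name, (p, d, _) in TABLE4.items()}


def max_deviation_fj() -> float:
    """Largest gap between recomputed and reported PDP values."""
    return max(abs(pdp_fj(p, d) - r) for p, d, r in TABLE4.values())


for name, value in recomputed().items():
    print(f"{name:9s} {value:.3f} fJ")
```

For the proposed FA, 2.118 µW × 606.2 ps gives about 1.284 fJ, in line with the reported 1.28 fJ.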

4.4. Performance comparison of proposed 1-bit FA with other reported 1-bit FAs

This section compares the performance of the proposed 1-bit FA with other 1-bit FAs in terms
of worst-case power, worst-case delay, PDP, TC, and fdmax. Initially, 10 FAs were considered,
viz., TFA, TGA, HPSC-1, HPSC-2, Hybrid-1, Hybrid-2, DPL, SRCPL, 14T, and 8T. These 10 FAs were
chosen because they are optimized FAs with proper transistor sizing as per their respective
reports in the literature. Of these, the 14T and 8T work standalone but fail to produce
correct logic levels at their outputs when instantiated in the test bench of Fig. 4 with
VDD = 1.2 V and fin = 200 MHz. The performance comparison results of the remaining eight FAs
and the proposed 1-bit FA are therefore tabulated in Table 4. From this table, the following
points can be noted:

• Considering the average power column of Table 4, the proposed FA has the lowest power
consumption (2.118 µW) of all the FAs under consideration. This is mainly attributed to the
underlying architecture and small transistor sizes. Further, all its intermediate nodes
drive smaller parasitic capacitances.
• Referring to the delay column of Table 4, the proposed 1-bit FA has a delay of 606 ps, which
is about 82% higher than that of the 20T TGA (equivalently, the TGA delay is about 45%
lower), and it ranks fifth among the nine FAs under consideration. This is because the
signal path from Cin to Cout has to drive more diffusion capacitances. The delay of the
proposed FA is calculated at Cout for the 3-tuple transition 001→011, within the test bench.
• From the PDP column of Table 4, the proposed FA has a PDP of 1.28 fJ, which is the lowest
among all the FAs considered for comparison. This can be mainly attributed to the low power
consumption of the circuit.
• Among the other FAs of Table 4, HPSC-1 has the highest power and hence the highest PDP; this
is mainly due to more glitches at the outputs and large intermediate node capacitances.
• The delays of the DPL and SRCPL FAs are the highest among the FAs; this is because the
signals at both Sum and Cout do not swing rail-to-rail, and also because of large
intermediate node capacitances.
• From the fdmax column of Table 4, the proposed FA reaches 1.66 GHz while the TGA FA reaches
4.26 GHz. For high-performance and low-power applications, a trade-off between the proposed
16T FA and the TGA type would be one choice, and a combination of both as a hybrid
architecture would be another option. Such a combination is advantageous in applications,
e.g., in DDFS, where speed as well as power are critical, given a careful trade-off and a
handcrafted layout.
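The relative figures quoted in this comparison can be reproduced from the Table 4 columns. The following is a sketch with hypothetical helper names; note that a relative delay can be stated either as the excess of the proposed FA over the TGA or as the reduction of the TGA relative to the proposed FA, and the two percentages differ.

```python
# Values copied from Table 4.
DELAY_PS = {"TFA": 540.7, "TGA": 333.5, "HPSC-1": 756.2, "HPSC-2": 451.4,
            "Hybrid-1": 500.6, "Hybrid-2": 811.5, "DPL": 1871.0,
            "SRCPL": 1866.0, "Proposed": 606.2}
POWER_UW = {"TGA": 5.274, "Hybrid-2": 2.975, "Proposed": 2.118}
PDP_FJ = {"TGA": 1.76, "Proposed": 1.28}


def rank(metric: dict[str, float], name: str) -> int:
    """1-based position of `name` when the metric is sorted ascending."""
    return sorted(metric, key=metric.get).index(name) + 1


def percent_higher(value: float, reference: float) -> float:
    """How much larger `value` is than `reference`, in percent of `reference`."""
    return 100.0 * (value - reference) / reference


def percent_saving(value: float, reference: float) -> float:
    """How much smaller `value` is than `reference`, in percent of `reference`."""
    return 100.0 * (reference - value) / reference


delay_rank = rank(DELAY_PS, "Proposed")
excess_over_tga = percent_higher(DELAY_PS["Proposed"], DELAY_PS["TGA"])
tga_reduction = percent_saving(DELAY_PS["TGA"], DELAY_PS["Proposed"])
power_saving = percent_saving(POWER_UW["Proposed"], POWER_UW["Hybrid-2"])
pdp_saving = percent_saving(PDP_FJ["Proposed"], PDP_FJ["TGA"])
tga_power_ratio = POWER_UW["TGA"] / POWER_UW["Proposed"]
print(delay_rank, round(excess_over_tga, 1), round(tga_reduction, 1),
      round(power_saving, 1), round(pdp_saving, 2), round(tga_power_ratio, 2))
```

Keeping the reference value explicit in each helper avoids mixing up the two ways of expressing the same delay gap.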

5. Conclusion
A novel 16T 1-bit FA based on the CDSA has been presented and compared with other
state-of-the-art 1-bit FAs in terms of the DMs under consideration. The proposed 1-bit FA has
been designed using both pass-transistor and transmission-gate logic and implemented in
Cadence Virtuoso-XL using the generic 90 nm GPDK. The extracted performance metrics, namely
delay, power, and PDP, of the proposed 1-bit FA are compared with those of eight other 1-bit
FAs reported in the literature to date. The comparison shows that the proposed 16T 1-bit FA
offers about 28.8% power saving with respect to the Hybrid-2 1-bit FA, which has been claimed
as the lowest-power design in recent reports [16]. It further offers a 27.27% saving in PDP in
comparison with the TGA 1-bit FA, which has the lowest PDP among the other FAs in this study.
Thus the proposed 1-bit FA circuit is a good choice for portable low-power applications. The
fdmax of the proposed FA is 1.66 GHz, while that of the TGA FA is 4.26 GHz, as observed in
this study within the test bench under consideration. For high-performance and low-power
applications, a trade-off between the proposed 16T FA and the TGA type would be one choice,
and a combination of both as a hybrid architecture would be an alternative. Such a combination
is advantageous in applications where both speed and power are critical, e.g., in DDFS. The
proportion in which the proposed 1-bit FA and the TGA 1-bit FA should be combined needs to be
estimated with a handcrafted layout style, with a view to achieving regularity.

Acknowledgments
The authors acknowledge the management of Dayananda Sagar Group of Institutions (DSI) for its
support and constant encouragement in carrying out this work in the Research Center,
Department of Telecommunication Engineering (TCE), Dayananda Sagar College of Engineering
(DSCE), Bengaluru 560 078, affiliated to Visvesvaraya Technological University (VTU),
Belagavi. The authors convey their special thanks to VTU, Belagavi, for its support of this
research work.

References
1. S. Goel, A. Kumar and M. A. Bayoumi, Design of robust, energy-efficient full adders for
deep-submicrometer design using hybrid-CMOS logic style, IEEE Trans. Very Large
Scale Integr. (VLSI) Syst. 14 (2006) 1309.
2. S. Goel, S. Gollamudi, A. Kumar and M. A. Bayoumi, On the design of low energy hybrid
CMOS 1-bit full adder cells, Proc. 47th IEEE Int. Conf. Midwest Symp. Circuits and
System, Hiroshima, Japan (2004), pp. 209–212.
3. A. M. Sodagar and G. R. Lahiji, Mapping from phase to sine-amplitude in
direct digital frequency synthesizer using parabolic approximation, IEEE Trans. Circuits
Syst.-II: Analog Digital Signal Process. 47 (2000) 1452.
4. L. K. Tan and H. Samueli, Quadrature digital synthesizer/mixer in 0.8 µm CMOS, IEEE
J. Solid-State Circuits 30 (1995) 193.
5. C.-Y. Yang, J.-H. Weng and H.-Y. Chang, A 5 GHz direct digital frequency synthesizer
using analog-sine-mapping technique in 0.35 µm SiGe BiCMOS, IEEE J. Solid-State
Circuits 46 (2011) 2064.
6. I. F. Akyildiz, D. M. Gutierrez-Estevez and E. C. Reyes, The evolution to 4G cellular
systems: LTE-advanced, Elsevier J. Phys. Commun. 3 (2010) 217.
7. R. Zimmermann and W. Fichtner, Low-power logic styles: CMOS versus pass-transistor
logic, IEEE J. Solid-State Circuits 32 (1997) 1.
8. N. H. E. Weste, D. Harris and A. Banerjee, CMOS VLSI Design: A Circuits and Systems
Perspective (Pearson, New Delhi, 2013).
9. D. Radhakrishnan, Low-voltage low-power CMOS full adder, Proc. IEE Circuits Devices
Sys. 148 (2001), pp. 19–24.
10. Y. Wei and J.-z. Shen, Design of novel low power 8-transistor 1-bit full adder cell, J.
Zhejiang Univ.-Sci. C 12 (2011) 604.
11. N. Zhuang and H. Wu, A new design of the CMOS full adder, IEEE J. Solid-State
Circuits 27 (1992) 840.
12. M. Zhang, J. Gu and C. H. Chang, A novel hybrid pass logic with static CMOS output
drive full-adder cell, Proc. IEEE Int. Symp. Circuits and Systems Bangkok, Thailand
(2003), pp. 317–320.
13. C. H. Chang, J. Gu and M. Zhang, A review of 0.18 µm full adder performances for tree
structured arithmetic circuits, IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
13 (2005) 686.
14. M. Aguirre-Hernandez and M. Linares-Aranda, CMOS full-adders for energy-efficient
arithmetic applications, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 19 (2011)
718.
15. M. Aguirre-Hernandez and M. Linares-Aranda, An alternative logic approach to imple-
ment high-speed low power full adder cells, Proc. SBCCI, Florianopolis, Brazil (2005),
pp. 166–171.
16. P. Bhattacharyya, B. Kundu, S. Ghosh, V. Kumar and A. Dandapat, Performance analysis of
a low-power high-speed hybrid 1-bit full adder circuit, IEEE Trans. Very Large Scale
Integr. (VLSI) Syst. 23 (2015) 2001.
17. H. C. Srinivasaiah, Statistical Modeling of Transistor Mismatch Effects in 100 nm CMOS
devices, Ph.D thesis, Indian Institute of Science (IISc), Bengaluru, Karnataka, India
(2004), http://etd.ncsi.iisc.ernet.in/handle/2005/1202.
18. A. M. Shams, T. K. Darwish and M. Bayoumi, Performance analysis of low-power 1-bit
CMOS full adder cells, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 10 (2002) 20.
19. A. M. Shams and M. Bayoumi, Performance evaluation of 1 bit CMOS adder cells, Proc.
IEEE Int. Symp. Circuits and Systems Orlando, Florida, USA (1999), pp. 27–30.
20. D. Sengupta and R. Saleh, Generalized power delay metric in deep submicron
CMOS design, IEEE Trans. CAD ICs Syst. 26 (2007) 183.
