
Energy Efficient Multiplier for Reconfigurable Digital Filter

Dr. S. Karthick
Department of Electronics and Communication Engineering
Koneru Lakshmaiah Education Foundation
Vaddeswaram, Andhra Pradesh, India
karthickapplied@gmail.com

Dr. C. Kamalanathan
Department of Electrical, Electronics and Communication Engineering
GITAM School of Technology, GITAM Deemed to be University
Bengaluru, India
kamalanadhan@gmail.com

Dr. Sunita Panda
Department of Electrical, Electronics and Communication Engineering
GITAM School of Technology, GITAM Deemed to be University
Bengaluru, India
sunita.nano@gmail.com

Abstract— The growth of personal computing devices such as portable desktops, personal digital assistants, multimedia and personal computers demands complex functionality with efficient computation. Digital systems built with present day technologies possess a very high computing capability, which makes powerful workstations possible. The demand for low power consumption is increasing prominently as the device feature size shrinks into submicron dimensions. Power dissipation can become the bottleneck for current designs because of the battery life and cooling requirements of portable equipment. Excessive power consumption degrades reliability and limits transistor integration on a chip. With the ever increasing quest for higher computing power on battery operated devices, the design emphasis is on optimizing area and power while keeping high performance. The total power dissipation of the proposed compressor is found to be 11.796 µW and 5.168 µW, with corresponding delays of 0.649 ns and 0.437 ns, for 180 nm and 90 nm technologies respectively. The power-delay product (PDP) of the proposed compressor is reduced by 5.50% to 15.83% for 180 nm technology and by 4.48% to 14.89% for 90 nm technology compared with other state-of-the-art designs. This suggests the suitability of the proposed compressor design for portable applications. An 8-bit Wallace tree multiplier is designed using the proposed compressor and other state-of-the-art designs. From the comparison, it is seen that the multiplier design using the proposed compressor architecture achieves better performance in terms of PDP compared to other state-of-the-art designs.

Keywords—Multiplier, Digital filter, Compressor

978-1-7281-6828-9/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: Middlesex University. Downloaded on November 03,2020 at 08:53:20 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION
Digital signal processing (DSP) includes filtering, averaging, correlating and modulating signals in digital form to estimate the characteristic parameters of a signal. Advancements in DSP have permitted many applications with unprecedented growth capabilities. Digital computers and special purpose digital hardware perform complex signal processing tasks. Even though VLSI technology has shown the rapid exponential growth characterized by Moore’s law, low power design has grown in importance in parallel with technology scaling. Complex sensor and monitoring systems in biomedical applications implemented in general purpose computing are highly sensitive to power consumption due to the scaling of technologies. Hence designers are forced towards power constrained designs for the development of low power biomedical monitoring systems.

Many digital signal processing applications use filtering as the underlying component. Adders and multipliers form the basic processing elements of digital filters. The partial product accumulation stage of the multiplier occupies a high fraction of the silicon area and consumes a significant amount of power. Therefore, increasing the speed of the partial product stage and reducing its power dissipation are crucial to improving the performance of the multiplier. Chang et al. (2004) proposed low power compressors capable of improving the performance of multiplier circuits. Aliparast et al. (2013) presented an ultra high speed 4-2 compressor for fast digital arithmetic circuits. Low power circuits can be obtained by minimizing the number of inverters (Shams & Bayoumi 1997). Systolic array architecture increases the throughput and maintains the regularity and modularity of the cells in VLSI system designs (Parhi 2001). Vinoth et al. (2011) use a Sklansky adder to implement 4:2 and 5:2 compressor architectures. Their architecture consumes 1.436 mW of power, a reduction of 4.57% compared to a conventional Wallace tree multiplier. Ghasemzadeh et al. (2014) designed a novel 4:2 compressor structure to reduce the glitches in the output waveform. The speed is enhanced by faster production of the intermediate carry. Jamshidi et al. (2015) proposed a magnetic tunnel junction and complementary metal oxide semiconductor based architecture for 4:2 compressor designs. Their architecture reduces the number of elements and interconnections in the compressor cell. Howard et al. (2005) compared standard CMOS, hybrid CMOS and pass transistor logic based compressors. The compressors are selected based on power, area and speed requirements. Kandasamy et al. (2018) present a high-speed adder circuit design based on 18 transistor (18T) full swing gate diffusion input based logic gates and transmission based techniques to compute the sum and carry bits. Nehru et al. (2017) incorporate different kinds of adders, and the performance was determined by the trade-offs between power, delay and area parameters. Huang et al. (2012) proposed a systolic array based digital filter for the QRS detector of ECG analysis to increase the throughput of the digital filter. Gavaber et al. (2019) proposed an architecture that uses a low-power three-input XOR gate to reduce area, delay and power consumption. An energy efficient compressor is proposed in this work. The performance of the proposed compressor is verified by implementing it in a systolic array based digital filter.

II. SYSTOLIC ARRAY BASED DIGITAL FILTER
Digital filters are used in the pre-processing stage of the QRS detection algorithm. Band pass filters are used to reduce the wave interference, noise and baseline wander in the QRS preprocessing stage. The low pass and high pass filters are cascaded to form the band pass filter. The amplitude response
of the band pass filter is designed to approximate the spectrum of the average QRS complex and effectively passes the characteristic frequencies by attenuating higher and lower frequencies.

Fig. 1. Digital Filter with Systolic Array Architecture

Systolic array architecture contains a set of interconnected cells called processing elements, each capable of performing simple operations. Systolic array architecture computes and passes the data using a network of Processing Elements (PEs). These processing elements maintain the regular flow of data in and out of the system. All the processing elements in a systolic array are uniform and pipelined. Replacing a single processing element with an array of processing elements increases the computational throughput without increasing the memory bandwidth. Figure 1 shows the systolic array architecture for digital filters. The systolic array architecture is used to build the digital filter to achieve the tradeoff between throughput and hardware required. The systolic array architecture shown in Figure 1 consists of delay elements, adders and multipliers. An adder and a multiplier form the basic processing element shown within the dotted area in Figure 1. The processing elements can be connected in an array structure to implement the digital filter (Huang et al. 2012). Multiplier efficiency determines the performance of the digital filter when power consumption and throughput are considered as the limiting factors.

Fig. 2. Z × Z Multiplier Processing Stages

A fast tree multiplier involves three processing stages as shown in Figure 2. The first processing stage generates the partial product terms. The second stage is the reduction of the partial product terms using an accumulator. Carry propagate adders are used at the last stage to obtain the final product terms. The second stage, partial product accumulation, is called the carry save adder tree, and it determines the area, throughput and power consumption of the multiplier. The reduction of these partial products involves the parallel application of compressors.

III. PROPOSED 4:2 COMPRESSOR
In the proposed 4:2 compressor, gate level optimization is done to improve the energy efficiency of the compressor. The 4:2 compressor consists of five inputs and three outputs. The arithmetic functionality of the 4:2 compressor is represented by equation (1).

Z1 + Z2 + Z3 + Z4 + Cin = 2 × (Cout + Cox) + Sout    (1)

The sum output (Sout) of the proposed compressor is expressed by equation (2). From equation (2), if Cin = 0 and an odd number of the inputs Z1 to Z4 are high, the sum output will be high. For Cin = 1, Sout will be high if an even number of these inputs are high.

Sout = (Z1 ⨁ Z2) ⨁ (Z3 ⨁ Z4) ⨁ Cin    (2)

The carry outputs Cox and Cout of the proposed 4:2 compressor are represented by equations (3) to (7).

Cox = (Z1 ⨁ Z2) Cin + Z1Z2    (3)
    = Z1Z2’Cin + Z1’Z2Cin + Z1Z2    (4)
    = Z1Z2’Cin + Z1Z2 + Z2Cin    (5)
    = Z1Cin + Z1Z2 + Z2Cin    (6)

Cout = (Z1 ⨁ Z2 ⨁ Cin) Z3 + (Z1 ⨁ Z2 ⨁ Cin) Z4 + Z3Z4    (7)

TABLE I. TRUTH TABLE OF PROPOSED 4:2 COMPRESSOR ARCHITECTURE

No. of high inputs (Z*)   Cin   Cox   Cout   Sout
0                         0     0     0      0
1                         0     0     0      1
2                         0     1     0      0
3                         0     1     0      1
4                         0     1     1      0
0                         1     0     0      1
1                         1     0     1      0
2                         1     1     0      1
3                         1     1     1      0
4                         1     1     1      1

The truth table of the proposed 4:2 compressor is illustrated in Table 1, where Z* is the number of logic high inputs. Figure 3 shows the proposed 4:2 compressor architecture. Based on And-Or-Invert (AOI) logic, it can be seen that 8 NOT gates + 12 AND gates + 6 OR gates make up the conventional compressor design using full adders. The proposed compressor design shown in Figure 3 is made of 7 NOT gates + 14 AND gates + 7 OR gates. The area count is evaluated by counting the total number of NAND gates that make up the circuit. The NAND equivalent of a NOT gate is taken

as 1 unit, an AND gate is taken as 2 units and an OR gate is taken as 3 units for calculating the area count of the proposed compressor and other state-of-the-art designs. The delay of the proposed compressor design shown in Figure 3 is 2*Δxor + 1*Δand + 2*Δor + 1*Δnot. Table 2 gives the gate count in NAND equivalent and the delay of the proposed compressor and other state-of-the-art designs. The proposed logic has the minimum number of inverters in the critical path. The proposed compressor reduces the number of gates required, which minimizes the interconnect delays and the associated glitches.

Fig. 3. Proposed 4:2 Compressor

TABLE II. NAND EQUIVALENT GATE COUNT AND DELAY OF THE PROPOSED COMPRESSOR AND OTHER STATE-OF-THE-ART DESIGNS

                                   NAND Equivalent
4:2 Compressor                     NOT   AND   OR    Delay
Conventional Compressor            8     24    18    4*Δxor
Logic Level Optimized Compressor   14    24    27    2*Δxor + 1*Δand + 1*Δnor + 1*Δnot
High Speed Compressor              18    36    15    1*Δnand + 2*Δxor + 1*Δxnor
Logical Decomposed Compressor      13    14    36    3*Δand + 3*Δnor + 1*Δnand + 1*Δnot
Ultra High Speed Compressor        18    30    18    1*Δxor + 2*Δxnor
Proposed Compressor                7     28    21    2*Δxor + 1*Δand + 2*Δor + 1*Δnot

The energy efficient compressor is described using structural Verilog-HDL to produce a gate level netlist and synthesized using Cadence RTL Compiler with respect to 180 nm and 90 nm technologies. The conventional compressor, the logic level optimized compressor (Chang et al. 2004), the high speed compressor (Baran et al. 2010), the logical decomposed compressor (Pishvaie et al. 2013) and the ultra high speed compressor (Aliparast et al. 2013) are used for comparison. The designs used for comparison are also described using structural Verilog-HDL. The area, power and PDP of the proposed and previous designs are shown in Table 3.

TABLE III. PERFORMANCE COMPARISONS OF PROPOSED AND EXISTING COMPRESSOR ARCHITECTURES

Technology   Compressor                                               Area (µm²)   Delay (ns)   Power (µW)   PDP (µW-ns)
180 nm       Conventional Compressor Using Full Adders                125.19       0.796        11.116       8.848
180 nm       Logic Level Optimized Compressor (Chang et al. 2004)     159.03       0.633        13.177       8.341
180 nm       High Speed Compressor (Baran et al. 2010)                168.56       0.623        14.601       9.096
180 nm       Logical Decomposed Compressor (Pishvaie et al. 2013)     154.51       0.642        12.836       8.241
180 nm       Ultra High Speed Compressor (Aliparast et al. 2013)      164.48       0.599        13.526       8.102
180 nm       Proposed Compressor                                      140.49       0.649        11.796       7.656
90 nm        Conventional Compressor Using Full Adders                65.41        0.534        4.833        2.581
90 nm        Logic Level Optimized Compressor (Chang et al. 2004)     81.89        0.425        5.729        2.435
90 nm        High Speed Compressor (Baran et al. 2010)                86.37        0.418        6.348        2.653
90 nm        Logical Decomposed Compressor (Pishvaie et al. 2013)     79.56        0.431        5.581        2.405
90 nm        Ultra High Speed Compressor (Aliparast et al. 2013)      84.22        0.402        5.881        2.364
90 nm        Proposed Compressor                                      72.42        0.437        5.168        2.258

The total power dissipation of the proposed compressor is reduced by 10.48%, 19.21%, 8.10% and 12.79% for 180 nm technology, and by 9.79%, 18.59%, 7.40% and 12.12% for 90 nm technology, compared to the logic level optimized compressor, high speed compressor, logical decomposed

compressor and ultra high speed compressor respectively. The delay of the proposed compressor is 18.47% and 18.16% less for 180 nm and 90 nm technologies compared to the conventional compressor architecture. The PDP of the proposed compressor is reduced by 13.47%, 8.21%, 15.83%, 7.10% and 5.50% for 180 nm technology, and by 12.51%, 7.27%, 14.89%, 6.11% and 4.48% for 90 nm technology, compared with other state-of-the-art designs.

IV. IMPLEMENTATION OF PROPOSED COMPRESSOR IN SYSTOLIC ARRAY BASED DIGITAL FILTER
To verify the performance of the proposed compressor, the processing element of the systolic array based digital filter shown in Figure 1 is implemented. The processing element is constructed using a 16-bit carry look ahead adder and an 8-bit Wallace tree multiplier with the proposed compressor and other state-of-the-art designs. Table 4 shows the performance estimates of the digital filter in terms of area, delay and power dissipation. From Table 4 it can be seen that the digital filter implemented with the proposed compressor exhibits a PDP reduction of at least 8.67% for 180 nm technology and 6.98% for 90 nm technology compared with other state-of-the-art designs. A plot of PDP for the digital filter with the proposed and other compressor designs for 180 nm and 90 nm technologies is shown in Figures 4 and 5.

TABLE IV. AREA, DELAY AND POWER COMPARISONS OF DIGITAL FILTER IMPLEMENTED WITH PROPOSED AND OTHER STATE-OF-THE-ART COMPRESSOR DESIGNS

Technology   Compressor                                                     Area (µm²)   Delay (ns)   Power (µW)   PDP (µW-ns)
180 nm       Using Conventional Compressor                                  13567        21.103       1.852        39.083
180 nm       Using Logic Level Optimized Compressor (Chang et al. 2004)     18736        14.924       2.422        36.146
180 nm       Using High Speed Compressor (Baran et al. 2010)                19679        14.691       2.702        39.695
180 nm       Using Logical Decomposed Compressor (Pishvaie et al. 2013)     18212        15.172       2.360        35.806
180 nm       Using Ultra High Speed Compressor (Aliparast et al. 2013)      19212        14.107       2.456        34.647
180 nm       Using Proposed Compressor                                      15530        15.573       2.032        31.644
90 nm        Using Conventional Compressor                                  6887         13.976       0.802        11.209
90 nm        Using Logic Level Optimized Compressor (Chang et al. 2004)     9511         9.884        1.049        10.368
90 nm        Using High Speed Compressor (Baran et al. 2010)                9989         9.729        1.170        11.383
90 nm        Using Logical Decomposed Compressor (Pishvaie et al. 2013)     9245         10.048       1.022        10.269
90 nm        Using Ultra High Speed Compressor (Aliparast et al. 2013)      9752         9.342        1.083        10.117
90 nm        Using Proposed Compressor                                      7934         10.446       0.901        9.412

Fig. 4. PDP of the Digital Filter Implemented with Proposed Compressor and State-of-the-Art Designs for 180nm Technology

Fig. 5. PDP of the Digital Filter Implemented with Proposed Compressor and State-of-the-Art Designs for 90nm Technology

V. CONCLUSION
An energy efficient compressor architecture for a systolic array based filter is proposed. The total power dissipation of the proposed compressor is found to be 11.796 µW and 5.168 µW, with corresponding delays of 0.649 ns and 0.437 ns, for 180 nm and 90 nm technologies respectively. The PDP of the proposed compressor is reduced by 5.50% to 15.83% for 180 nm technology and by 4.48% to 14.89% for 90 nm technology compared with other state-of-the-art designs. This suggests the suitability of the proposed compressor design for

portable applications. An 8-bit Wallace tree multiplier is designed using the proposed compressor and other state-of-the-art designs. From the comparison, it is seen that the multiplier design using the proposed compressor architecture achieves better performance in terms of PDP compared to other state-of-the-art designs. The proposed compressor is implemented in the processing element of the systolic array based digital filter to evaluate its performance.

REFERENCES
[1] Aliparast, P, Koozehkanani, ZD & Nazari, F 2013, ‘An Ultra High Speed Digital 4-2 Compressor in 65-nm CMOS’, International Journal of Computer Theory and Engineering, vol. 5, no. 4, pp. 593-597.
[2] Shams, AM & Bayoumi, MA 1997, ‘A structured approach for designing low power adders’, In IEEE Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, vol. 1, pp. 757-761.
[3] Huang, SC, Wang, HM & Chen, WY 2012, ‘A ±6 ms-Accuracy, 0.68 mm² and 2.21 µW QRS detection ASIC’, VLSI Design, vol. 2012, pp. 1-13.
[4] Vinoth, C, Bhaaskaran, VK, Brindha, B, Sakthikumaran, S, Kavinilavu, V, Bhaskar, B, Kanagasabapathy, M & Sharath, B 2011, ‘A novel low power and high speed Wallace tree multiplier for RISC processor’, In IEEE 3rd International Conference on Electronics Computer Technology (ICECT), vol. 1, pp. 330-334.
[5] Ghasemzadeh, M, Akbari, A, Hadidi, K & Khoei, A 2014, ‘A novel fast glitchless 4-2 compressor with a new structure’, In Proceedings of the 21st IEEE International Conference on Mixed Design of Integrated Circuits & Systems (MIXDES), pp. 127-130.
[6] Chang, CH, Gu, J & Zhang, M 2004, ‘Ultra Low-Voltage Low-Power CMOS 4-2 and 5-2 Compressors for Fast Arithmetic Circuits’, IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 51, no. 10, pp. 1985-1997.
[7] Jamshidi, V, Fazeli, M & Patooghy, A 2015, ‘A low power hybrid MTJ/CMOS (4-2) compressor for fast arithmetic circuits’, In 18th CSI IEEE International Symposium on Computer Architecture and Digital Systems (CADS), pp. 1-6.
[8] Howard, GM, Mokrian, P, Ahmadi, M & Miller, WC 2005, ‘Power and delay analysis of 4:2 compressor cells’, In IEEE International Symposium on Circuits and Systems (ISCAS 2005), pp. 3559-3562.
[9] Nehru, K & Linju, TT 2017, ‘Design of 16 Bit Vedic Multiplier Using Semi-Custom and Full Custom Approach’, Journal of Engineering Science & Technology Review, vol. 10, no. 2, pp. 220-232.
[10] Prasad, K & Parhi, KK 2001, ‘Low-power 4-2 and 5-2 compressors’, In IEEE Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 129-133.
[11] Kandasamy, N, Ahmad, F, Reddy, S, Telagam, N & Utlapalli, S 2018, ‘Performance evolution of 4-b bit MAC unit using hybrid GDI and transmission gate-based adder and multiplier circuits in 180 and 90 nm technology’, Microprocessors and Microsystems, vol. 59, pp. 15-28.
[12] Gavaber, MD, Poorhosseini, M & Pourmozafari, S 2019, ‘Novel architecture for low-power CNTFET-based compressors’, Journal of Circuits, Systems and Computers, vol. 28, no. 12, p. 1950207.
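The Boolean expressions of Section III can be checked against the arithmetic relation in equation (1) by enumerating all 32 input combinations. The Python sketch below is an illustrative check only, not the authors' structural Verilog-HDL description; the function names compressor_4_2, cox_from_eq3 and check_all_inputs are assumptions introduced here.

```python
from itertools import product


def compressor_4_2(z1, z2, z3, z4, cin):
    """Evaluate equations (2), (6) and (7) for one combination of 0/1 inputs."""
    sout = (z1 ^ z2) ^ (z3 ^ z4) ^ cin            # equation (2)
    cox = (z1 & cin) | (z1 & z2) | (z2 & cin)     # equation (6)
    s = z1 ^ z2 ^ cin                             # term shared by both products in (7)
    cout = (s & z3) | (s & z4) | (z3 & z4)        # equation (7)
    return cox, cout, sout


def cox_from_eq3(z1, z2, cin):
    """Unsimplified form of Cox, equation (3)."""
    return ((z1 ^ z2) & cin) | (z1 & z2)


def check_all_inputs():
    """Return every input combination that violates equation (1) or the (3)-(6) simplification."""
    failures = []
    for z1, z2, z3, z4, cin in product((0, 1), repeat=5):
        cox, cout, sout = compressor_4_2(z1, z2, z3, z4, cin)
        arithmetic_ok = z1 + z2 + z3 + z4 + cin == 2 * (cout + cox) + sout
        simplification_ok = cox == cox_from_eq3(z1, z2, cin)
        if not (arithmetic_ok and simplification_ok):
            failures.append((z1, z2, z3, z4, cin))
    return failures


if __name__ == "__main__":
    print("violations:", check_all_inputs())
```

An empty list from check_all_inputs indicates that the sum and carry expressions are consistent with equation (1). Note that Table I lists one representative output pattern per count of high inputs; for a given count, the split between Cox and Cout can depend on which of Z1 to Z4 are high.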

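The area count rule of Section III (NOT = 1 unit, AND = 2 units, OR = 3 units) and the power-delay product used in Tables III and IV are simple arithmetic. The sketch below reproduces them for the gate counts and the 180 nm figures quoted in the text; the helper names nand_equivalent, pdp and reduction_percent are assumptions made for illustration.

```python
# NAND-equivalent weights stated in Section III.
NAND_UNITS = {"NOT": 1, "AND": 2, "OR": 3}


def nand_equivalent(gate_counts):
    """Total NAND-equivalent area count for a dict such as {"NOT": 7, "AND": 14, "OR": 7}."""
    return sum(NAND_UNITS[gate] * count for gate, count in gate_counts.items())


def pdp(power_uw, delay_ns):
    """Power-delay product in µW-ns, the unit used in Tables III and IV."""
    return power_uw * delay_ns


def reduction_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


if __name__ == "__main__":
    proposed_gates = {"NOT": 7, "AND": 14, "OR": 7}
    conventional_gates = {"NOT": 8, "AND": 12, "OR": 6}
    print("proposed area count:", nand_equivalent(proposed_gates))
    print("conventional area count:", nand_equivalent(conventional_gates))
    # 180 nm proposed compressor: 11.796 µW and 0.649 ns (Table III).
    print("proposed PDP:", round(pdp(11.796, 0.649), 3))
    # PDP reduction against the conventional compressor, using the tabulated PDP values.
    print("PDP reduction (%):", round(reduction_percent(8.848, 7.656), 2))
```

Applied to the tabulated PDP values, reduction_percent gives figures that can be compared with the percentages quoted in Section III; small differences in the last digit are possible because the tables are rounded.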

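Section II describes a filter built from identical processing elements, each containing a multiplier, an adder and a delay element. The Python sketch below models one common arrangement of such elements, a transposed-form FIR filter, and compares it with a direct convolution. The exact structure of Fig. 1 is not recoverable from the text, so the topology, the function names systolic_fir and direct_fir, and the sample values are all assumptions for illustration.

```python
def systolic_fir(samples, coeffs):
    """Transposed-form FIR: PE k multiplies the input by coeffs[k], adds the
    registered partial sum of PE k+1, and stores the result in its own register."""
    n_pe = len(coeffs)
    regs = [0] * n_pe                      # one delay element (register) per PE
    outputs = []
    for x in samples:
        next_regs = [0] * n_pe
        for k in range(n_pe):
            incoming = regs[k + 1] if k + 1 < n_pe else 0   # last PE has no neighbour
            next_regs[k] = coeffs[k] * x + incoming          # multiply, then add
        regs = next_regs                   # all registers update on the same clock edge
        outputs.append(regs[0])            # PE 0 delivers the filter output
    return outputs


def direct_fir(samples, coeffs):
    """Reference implementation: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    h = [3, 2, 1]
    print(systolic_fir(x, h))
    print(direct_fir(x, h))
```

In this arrangement the multiplier sits on the critical path of every processing element, which is consistent with the paper's remark that multiplier efficiency determines the performance of the filter.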