Kar Thick 2020
Authorized licensed use limited to: Middlesex University. Downloaded on November 03,2020 at 08:53:20 UTC from IEEE Xplore. Restrictions apply.

Abstract— The growth of personal computing devices such as portable desktops, personal digital assistants, multimedia devices and personal computers demands complex functionality with efficient computation. Digital systems built with present-day technologies possess very high computing capability, which makes powerful workstations possible. The demand for low power consumption is increasing as device feature sizes shrink into submicron dimensions. Power dissipation can become the bottleneck for current designs because of the battery life and cooling requirements of portable equipment. Excessive power consumption degrades reliability and limits transistor integration on a chip. With the ever-increasing quest for higher computing power on battery-operated devices, design emphasis is on optimizing area and power while maintaining high performance. The total power dissipation of the proposed compressor is found to be 11.796 µW and 5.168 µW, with corresponding delays of 0.649 ns and 0.437 ns, for 180 nm and 90 nm technologies respectively. The proposed compressor achieves a power-delay product (PDP) reduction of 5.50% to 15.83% for 180 nm technology and 4.48% to 14.89% for 90 nm technology compared with other state-of-the-art designs. This suggests the suitability of the proposed compressor design for portable applications. An 8-bit Wallace tree multiplier is designed using the proposed compressor and other state-of-the-art designs. The comparison shows that the multiplier design using the proposed compressor architecture achieves better PDP than the other state-of-the-art designs.

Keywords—Multiplier, Digital filter, Compressor

I. INTRODUCTION
Digital signal processing (DSP) includes filtering, averaging, correlating and modulating signals in digital form to estimate the characteristic parameters of a signal. Advancements in DSP have permitted many applications with unprecedented growth capabilities. Digital computers and special purpose digital hardware perform complex signal processing tasks. Even though VLSI technology has shown rapid exponential growth characterized by Moore's law, there is a growing emphasis on low power design in parallel with technology scaling. Complex sensor and monitoring systems in biomedical applications implemented with general purpose computing are highly sensitive to power consumption due to the scaling of technologies. Hence designers are forced towards power constrained designs for the development of low power biomedical monitoring systems.

Many digital signal processing applications use filtering as the underlying component. Adders and multipliers form the basic processing elements of digital filters. The partial product accumulation stage of the multiplier occupies a large fraction of silicon area and consumes a significant amount of power. Therefore, increasing the speed of the partial product stage and reducing its power dissipation are crucial to improving the performance of the multiplier. Chang et al. (2004) proposed low power compressors capable of improving the performance of multiplier circuits. Aliparast et al. (2013) presented an ultra high speed 4-2 compressor for fast digital arithmetic circuits. Low power circuits can be obtained by minimizing the number of inverters (Shams & Bayoumi 1997). Systolic array architecture increases the throughput and maintains the regularity and modularity of cells in VLSI system designs (Parhi 2001). Vinoth et al. (2011) used a Sklansky adder to implement 4:2 and 5:2 compressor architectures; their architecture consumes 1.436 mW, a reduction of 4.57% compared to the conventional Wallace tree multiplier. Ghasemzadeh et al. (2014) designed a novel 4:2 compressor structure that reduces glitches in the output waveform and enhances speed through faster production of the intermediate carry. Jamshidi et al. (2015) proposed a magnetic tunnel junction and complementary metal oxide semiconductor based architecture for 4:2 compressor designs, which reduces the number of elements and interconnections in the compressor cell. Howard et al. (2005) compared standard CMOS, hybrid CMOS and pass transistor logic based compressors, which are selected based on power, area and speed requirements. Kandasamy et al. (2018) presented a high-speed adder circuit design based on 18-transistor (18T) full swing gate diffusion input logic gates and transmission gate based techniques to compute the sum and carry bits. Nehru et al. (2017) incorporated different kinds of adders, with performance determined by the trade-offs between power, delay and area. Huang et al. (2012) proposed a systolic array based digital filter for the QRS detector in ECG analysis to increase the throughput of the digital filter. Gavaber et al. (2019) proposed an architecture that uses a low-power three-input XOR gate to reduce area, delay and power consumption. An energy efficient compressor is proposed in this work. The performance of the proposed compressor is verified by implementing it in a systolic array based digital filter.

II. SYSTOLIC ARRAY BASED DIGITAL FILTER
Digital filters are used in the pre-processing stage of the QRS detection algorithm. Band pass filters are used to reduce wave interference, noise and baseline wander in the QRS preprocessing stage. The low pass and high pass filters are cascaded to form the band pass filter. The amplitude response
of the band pass filter is designed to approximate the spectrum of the average QRS complex, so the filter passes the characteristic frequencies while attenuating higher and lower frequencies.

Fig. 1. Digital Filter with Systolic Array Architecture

A systolic array architecture contains a set of interconnected cells called processing elements (PEs), each capable of performing simple operations. The array computes and passes data through this network of PEs, which maintain a regular flow of data in and out of the system. All the processing elements in a systolic array are uniform and pipelined. Replacing a single processing element with an array of processing elements increases the computational throughput without increasing the memory bandwidth. Figure 1 shows the systolic array architecture for digital filters. The systolic array architecture is used to build the digital filter to achieve a tradeoff between throughput and the hardware required. The architecture shown in Figure 1 consists of delay elements, adders and multipliers. An adder and a multiplier form the basic processing element, shown within the dotted area in Figure 1. The processing elements can be connected in an array structure to implement the digital filter (Huang et al. 2012). Multiplier efficiency determines the performance of the digital filter when power consumption and throughput are the limiting factors.

[...] the partial product terms. The second stage is the reduction of the partial product terms using an accumulator. Carry propagate adders are used at the last stage to obtain the final product terms. The second stage, partial product accumulation, is called the carry save adder tree and determines the area, throughput and power consumption of the multiplier. The reduction of these partial products involves the parallel application of compressors.

III. PROPOSED 4:2 COMPRESSOR
In the proposed 4:2 compressor, gate level optimization is done to improve the energy efficiency of the compressor. The 4:2 compressor has five inputs (Z1, Z2, Z3, Z4 and Cin) and three outputs (Sout, Cox and Cout). Its arithmetic functionality is represented by equation (1).

Z1 + Z2 + Z3 + Z4 + Cin = 2 × (Cout + Cox) + Sout    (1)

The sum output (Sout) of the proposed compressor is expressed by equation (2). From equation (2), if Cin = 0 the sum output is high when an odd number of the inputs Z1 to Z4 are high. For Cin = 1, Sout is high when an even number of the inputs Z1 to Z4 are high.

Sout = (Z1 ⨁ Z2) ⨁ (Z3 ⨁ Z4) ⨁ Cin    (2)

The carry outputs Cox and Cout of the proposed 4:2 compressor are represented by equations (3) to (7).

Cox = (Z1 ⨁ Z2) Cin + Z1Z2    (3)
    = Z1Z2'Cin + Z1'Z2Cin + Z1Z2    (4)
    = Z1Z2'Cin + Z1Z2 + Z2Cin    (5)
    = Z1Cin + Z1Z2 + Z2Cin    (6)

Cout = (Z1 ⨁ Z2 ⨁ Cin) Z3 + (Z1 ⨁ Z2 ⨁ Cin) Z4 + Z3Z4    (7)
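As a quick check of the logic in equations (1) to (7), the following Python sketch evaluates equations (2), (3), (6) and (7) for all 32 input combinations. It confirms that the outputs satisfy the arithmetic identity of equation (1) and that the simplified carry of equation (6) matches equation (3). The function and variable names are illustrative assumptions and are not part of the original design flow.

```python
from itertools import product


def compressor_4_2(z1, z2, z3, z4, cin):
    """Return (cox, cout, sout) for single-bit inputs, per equations (2), (6), (7)."""
    t = z1 ^ z2 ^ cin                          # shared term Z1 xor Z2 xor Cin
    sout = (z1 ^ z2) ^ (z3 ^ z4) ^ cin         # equation (2)
    cox = (z1 & cin) | (z1 & z2) | (z2 & cin)  # equation (6)
    cout = (t & z3) | (t & z4) | (z3 & z4)     # equation (7)
    return cox, cout, sout


def cox_unsimplified(z1, z2, cin):
    """Equation (3): Cox before Boolean simplification."""
    return ((z1 ^ z2) & cin) | (z1 & z2)


def verify_all_inputs():
    """Check equation (1) and the (3) == (6) simplification for all 32 patterns."""
    for z1, z2, z3, z4, cin in product((0, 1), repeat=5):
        cox, cout, sout = compressor_4_2(z1, z2, z3, z4, cin)
        # Equation (1): both carries have weight 2, the sum has weight 1.
        if z1 + z2 + z3 + z4 + cin != 2 * (cout + cox) + sout:
            return False
        if cox != cox_unsimplified(z1, z2, cin):
            return False
    return True


if __name__ == "__main__":
    print("equations (1) to (7) consistent:", verify_all_inputs())
```

Because the check is exhaustive, it covers every input pattern the compressor can see.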
TABLE I. TRUTH TABLE OF PROPOSED 4:2 COMPRESSOR ARCHITECTURE

No. of High Logic (Z*)   Cin   Cox   Cout   Sout
0                        0     0     0      0
1                        0     0     0      1
2                        0     1     0      0
3                        0     1     0      1
4                        0     1     1      0
0                        1     0     0      1
1                        1     0     1      0
2                        1     1     0      1
3                        1     1     1      0
4                        1     1     1      1
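Table I lists one output pattern per row, but for some rows the split between Cox and Cout depends on which of the Z inputs are high. The sketch below re-implements equations (2), (6) and (7) under names chosen here for illustration, groups all 32 input patterns by the number of high Z inputs and Cin, and checks that each row of Table I is one of the patterns the equations can produce.

```python
from collections import defaultdict
from itertools import product

# Rows of Table I: (number of high Z inputs, Cin) -> (Cox, Cout, Sout)
TABLE_I = {
    (0, 0): (0, 0, 0), (1, 0): (0, 0, 1), (2, 0): (1, 0, 0),
    (3, 0): (1, 0, 1), (4, 0): (1, 1, 0),
    (0, 1): (0, 0, 1), (1, 1): (0, 1, 0), (2, 1): (1, 0, 1),
    (3, 1): (1, 1, 0), (4, 1): (1, 1, 1),
}


def compressor_4_2(z1, z2, z3, z4, cin):
    """Return (cox, cout, sout) per equations (6), (7) and (2)."""
    t = z1 ^ z2 ^ cin
    sout = (z1 ^ z2) ^ (z3 ^ z4) ^ cin
    cox = (z1 & cin) | (z1 & z2) | (z2 & cin)
    cout = (t & z3) | (t & z4) | (z3 & z4)
    return cox, cout, sout


def outputs_by_row():
    """Map (number of high Z inputs, Cin) to the set of reachable output triples."""
    rows = defaultdict(set)
    for z1, z2, z3, z4, cin in product((0, 1), repeat=5):
        rows[(z1 + z2 + z3 + z4, cin)].add(compressor_4_2(z1, z2, z3, z4, cin))
    return dict(rows)


def table_rows_reachable():
    """True if every Table I row is produced by at least one input pattern."""
    rows = outputs_by_row()
    return all(TABLE_I[key] in rows[key] for key in TABLE_I)


if __name__ == "__main__":
    for key, triples in sorted(outputs_by_row().items()):
        print(key, sorted(triples))
    print("Table I rows reachable:", table_rows_reachable())
```

For example, two high Z inputs with Cin = 0 give (Cox, Cout, Sout) = (1, 0, 0) when Z1 and Z2 are the high pair and (0, 1, 0) for any other pair; both encode the value 2 in equation (1).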
[...] as 1 unit, AND gate is taken as 2 units and OR gate is taken as 3 units for calculating the area count of the proposed compressor and other state-of-the-art designs. The delay of the proposed compressor design shown in Figure 3 is 2*Δxor + 1*Δand + 2*Δor + 1*Δnot. Table II gives the gate count in NAND equivalents and the delay of the proposed compressor and other state-of-the-art designs. The proposed logic has the minimum number of inverters in the critical path. The proposed compressor reduces the number of gates required, which minimizes the interconnect delays and the associated glitches.

Fig. 3. Proposed 4:2 Compressor

TABLE II. NAND EQUIVALENT GATE COUNT AND DELAY OF THE PROPOSED COMPRESSOR AND OTHER STATE-OF-THE-ART DESIGNS

4:2 Compressor                     NAND Equivalent        Delay
                                   NOT    AND    OR
Conventional Compressor            8      24     18       4*Δxor
Logic Level Optimized Compressor   14     24     27       2*Δxor + 1*Δand + 1*Δnor + 1*Δnot
[...]

TABLE III. PERFORMANCE COMPARISONS OF PROPOSED AND EXISTING COMPRESSOR ARCHITECTURES

Technology   Compressor                                               Area (µm²)   Delay (ns)   Power (µW)   PDP (µW-ns)
180 nm       Conventional Compressor Using Full Adders                125.19       0.796        11.116       8.848
180 nm       Logic Level Optimized Compressor (Chang et al. 2004)     159.03       0.633        13.177       8.341
180 nm       High Speed Compressor (Baran et al. 2010)                168.56       0.623        14.601       9.096
180 nm       Logical Decomposed Compressor (Pishvaie et al. 2013)     154.51       0.642        12.836       8.241
180 nm       Ultra High Speed Compressor (Aliparast et al. 2010)      164.48       0.599        13.526       8.102
180 nm       Proposed Compressor                                      140.49       0.649        11.796       7.656
90 nm        Conventional Compressor Using Full Adders                65.41        0.534        4.833        2.581
90 nm        Logic Level Optimized Compressor (Chang et al. 2004)     81.89        0.425        5.729        2.435
90 nm        High Speed Compressor (Baran et al. 2010)                [...]        0.418        6.348        [...]
90 nm        Logical Decomposed Compressor (Pishvaie et al. 2013)     [...]        0.431        5.581        2.405
90 nm        Ultra High Speed Compressor (Aliparast et al. 2010)      [...]        0.402        5.881        2.364
90 nm        Proposed Compressor                                      [...]        0.437        5.168        2.258
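The PDP column of Table III is the product of the power and delay columns, and the reductions quoted for the proposed compressor follow from the tabulated PDP values. The short Python sketch below reproduces both calculations for the 180 nm rows. The values are copied from Table III; the dictionary layout and function names are assumptions made for illustration.

```python
# (power in µW, delay in ns, tabulated PDP in µW-ns) for the 180 nm rows of Table III
TABLE_III_180NM = {
    "Conventional": (11.116, 0.796, 8.848),
    "Logic level optimized (Chang et al. 2004)": (13.177, 0.633, 8.341),
    "High speed (Baran et al. 2010)": (14.601, 0.623, 9.096),
    "Logical decomposed (Pishvaie et al. 2013)": (12.836, 0.642, 8.241),
    "Ultra high speed (Aliparast et al.)": (13.526, 0.599, 8.102),
    "Proposed": (11.796, 0.649, 7.656),
}


def pdp(power_uw, delay_ns):
    """Power-delay product; µW multiplied by ns is numerically femtojoules."""
    return power_uw * delay_ns


def reduction_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def pdp_reductions(table, proposed_key="Proposed"):
    """PDP reduction of the proposed design against every other row (tabulated PDPs)."""
    proposed_pdp = table[proposed_key][2]
    return {
        name: round(reduction_percent(row[2], proposed_pdp), 2)
        for name, row in table.items()
        if name != proposed_key
    }


if __name__ == "__main__":
    for name, (power, delay, tabulated) in TABLE_III_180NM.items():
        print(f"{name}: computed PDP {pdp(power, delay):.3f}, tabulated {tabulated}")
    print(pdp_reductions(TABLE_III_180NM))
```

The reductions are computed from the rounded PDP values as tabulated, which is what reproduces the quoted range of 5.50% to 15.83% for 180 nm technology.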
[...] compressor and ultra high speed compressor respectively. The delay of the proposed compressor is 18.47% and 18.16% lower for 180 nm and 90 nm technologies compared to the conventional compressor architecture. The PDP of the proposed compressor is reduced by 13.47%, 8.21%, 15.83%, 7.10% and 5.50% for 180 nm technology and by 12.51%, 7.27%, 14.89%, 6.11% and 4.48% for 90 nm technology compared with the other state-of-the-art designs.

IV. SYSTOLIC ARRAY BASED DIGITAL FILTER
To verify the performance of the proposed compressor, the processing element of the systolic array based digital filter shown in Figure 1 is implemented. The processing element is constructed using a 16-bit carry look ahead adder and an 8-bit Wallace tree multiplier built with the proposed compressor and with other state-of-the-art designs. Table IV shows the performance estimates of the digital filter in terms of area, delay and power dissipation. From Table IV it can be seen that the digital filter implemented with the proposed compressor exhibits a PDP reduction of at least 8.67% for 180 nm technology and 6.98% for 90 nm technology compared with other state-of-the-art designs. Plots of the PDP of the digital filter with the proposed and other compressor designs for 180 nm and 90 nm technologies are shown in Figures 4 and 5.

TABLE IV. AREA, DELAY AND POWER COMPARISONS OF DIGITAL FILTER IMPLEMENTED WITH PROPOSED AND OTHER STATE-OF-THE-ART COMPRESSOR DESIGNS

Technology   Digital filter implementation                                Delay     Power    PDP       Area
180 nm       Using Conventional Compressor                                [...]     1.852    39.083    13567
180 nm       Using Logic Level Optimized Compressor (Chang et al. 2004)   14.924    2.422    36.146    18736
180 nm       Using High Speed Compressor (Baran et al. 2010)              14.691    2.702    39.695    19679
180 nm       Using Logical Decomposed Compressor (Pishvaie et al. 2013)   15.172    2.360    35.806    18212
180 nm       Using Ultra High Speed Compressor (Aliparast et al. 2013)    14.107    2.456    34.647    19212
180 nm       Using Proposed Compressor                                    15.573    2.032    31.644    15530
90 nm        Using Conventional Compressor                                [...]     [...]    [...]     [...]
90 nm        Using Logic Level Optimized Compressor (Chang et al. 2004)   [...]     [...]    [...]     [...]
90 nm        Using High Speed Compressor (Baran et al. 2010)              9.729     1.170    11.383    9989
90 nm        Using Logical Decomposed Compressor (Pishvaie et al. 2013)   10.048    1.022    10.269    9245
90 nm        Using Ultra High Speed Compressor (Aliparast et al. 2013)    9.342     1.083    10.117    9752
90 nm        Using Proposed Compressor                                    10.446    0.901    9.412     7934

For the 90 nm conventional and logic level optimized rows, only the values 13.976, 11.209 and 1.049 survive, without a recoverable column assignment.

Fig. 5. PDP of the Digital Filter Implemented with Proposed Compressor and State-of-the-Art Designs for 90 nm Technology

V. CONCLUSION
Energy efficient compressor architecture for systolic array [...] for 180 nm technology, 4.48% to 14.89% for 90 nm technology compared with other state-of-the-art designs. This suggests the suitability of the proposed compressor design for
portable applications. An 8-bit Wallace tree multiplier is designed using the proposed compressor and other state-of-the-art designs. From the comparison, it is seen that the multiplier design using the proposed compressor architecture achieves better performance in terms of PDP compared to other state-of-the-art designs. An implementation of the proposed compressor in the processing element of a systolic array based digital filter is done to evaluate its performance.

REFERENCES
[1] Aliparast, P, Koozehkanani, ZD & Nazari, F 2013, 'An Ultra High Speed Digital 4-2 Compressor in 65-nm CMOS', International Journal of Computer Theory and Engineering, vol. 5, no. 4, pp. 593-597.
[2] Shams, AM & Bayoumi, MA 1997, 'A structured approach for designing low power adders', In IEEE Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, vol. 1, pp. 757-761.
[3] Huang, SC, Wang, HM & Chen, WY 2012, 'A ±6 ms-Accuracy, 0.68 mm² and 2.21 µW QRS detection ASIC', VLSI Design, vol. 2012, pp. 1-13.
[4] Vinoth, C, Bhaaskaran, VK, Brindha, B, Sakthikumaran, S, Kavinilavu, V, Bhaskar, B, Kanagasabapathy, M & Sharath, B 2011, 'A novel low power and high speed Wallace tree multiplier for RISC processor', In IEEE 3rd International Conference on Electronics Computer Technology (ICECT), vol. 1, pp. 330-334.
[5] Ghasemzadeh, M, Akbari, A, Hadidi, K & Khoei, A 2014, 'A novel fast glitchless 4-2 compressor with a new structure', In Proceedings of the 21st IEEE International Conference on Mixed Design of Integrated Circuits & Systems (MIXDES), pp. 127-130.
[6] Chang, CH, Gu, J & Zhang, M 2004, 'Ultra Low-Voltage Low-Power CMOS 4-2 and 5-2 Compressors for Fast Arithmetic Circuits', IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 51, no. 10, pp. 1985-1997.
[7] Jamshidi, V, Fazeli, M & Patooghy, A 2015, 'A low power hybrid MTJ/CMOS (4-2) compressor for fast arithmetic circuits', In 18th CSI IEEE International Symposium on Computer Architecture and Digital Systems (CADS), pp. 1-6.
[8] Howard, GM, Mokrian, P, Ahmadi, M & Miller, WC 2005, 'Power and delay analysis of 4:2 compressor cells', In IEEE International Symposium on Circuits and Systems (ISCAS 2005), pp. 3559-3562.
[9] Nehru, K & Linju, TT 2017, 'Design of 16 Bit Vedic Multiplier Using Semi-Custom and Full Custom Approach', Journal of Engineering Science & Technology Review, vol. 10, no. 2, pp. 220-232.
[10] Prasad, K & Parhi, KK 2001, 'Low-power 4-2 and 5-2 compressors', In IEEE Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 129-133.
[11] Kandasamy, N, Ahmad, F, Reddy, S, Telagam, N & Utlapalli, S 2018, 'Performance evolution of 4-b bit MAC unit using hybrid GDI and transmission gate-based adder and multiplier circuits in 180 and 90 nm technology', Microprocessors and Microsystems, vol. 59, pp. 15-28.
[12] Gavaber, MD, Poorhosseini, M & Pourmozafari, S 2019, 'Novel architecture for low-power CNTFET-based compressors', Journal of Circuits, Systems and Computers, vol. 28, no. 12, p. 1950207.