Bhattacharjee 2011

Evaluation of Power Efficient Adder and Multiplier
Circuits for FPGA Based DSP Applications

Subhankar Bhattacharjee¹, Sanjib Sil², Biswajit Basak³ and Amlan Chakrabarti⁴
¹,² Dept. of ECE, Techno India College of Technology, Newtown, Rajarhat, Kolkata-700156, India
³ Dept. of ECE, Hooghly Engineering & Technology College, Pipulpati, Hooghly, India
⁴ A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, India
Email: ¹subhankarb42002@yahoo.co.in, ²sanjib_sil@hotmail.com, ³biswajitbsk@yahoo.co.in, ⁴acakcs@caluniv.ac.in

Abstract: This paper describes the design and implementation of low power arithmetic circuits for digital signal processing (DSP) applications using Xilinx XC5VLX30 (Virtex-5) field programmable gate array (FPGA) devices. DSP is a highly demanding application domain in which the demands for enhanced performance and reduced resource utilization have increased over the years. Recent advancements in FPGA design technology through the incorporation of DSP functional blocks, along with inherent FPGA features such as high flexibility through reconfiguration, reusability, moderate cost and feature extension, have resulted in FPGAs becoming the preferred platform for evaluating and implementing DSP. The arithmetic circuits for addition and multiplication are the core of any DSP hardware, as all operations in the DSP domain are a combination of these. In this work we have implemented various forms of adder and multiplier circuits on FPGA in order to find the most suitable arithmetic circuits for FPGA based DSP implementation. We present an analysis of our implementation results with respect to delay, power requirement and implementation cost of the different 8, 16, 32 and 64 bit circuits that can be realized for implementing the basic fixed-point arithmetic units in FPGA.

Keywords: Arithmetic circuits, adder circuits, low power design, FPGA, DSP, Xilinx Xpower

I. INTRODUCTION

Adders and multipliers are the most fundamental arithmetic circuits used to build DSP hardware [9]. Since these circuits perform key operations in present day VLSI, their speed and power optimization are crucial quality factors in high performance DSP applications. DSP applications typically require a tradeoff between power consumption and speed, hence there is an immense need for low power [8,11], high speed design of arithmetic circuits.

Nowadays, FPGA is a much favored platform for digital VLSI design. This is due to the high flexibility, reusability, low power, moderate cost, easy upgrading (due to the usage of hardware description languages (HDLs)) and feature extension (as long as the FPGA is not exhausted) facilities available in FPGAs. Field programmable gate arrays (FPGAs) provide a configurable structure through an array of adjustable logic modules interconnected by programmable routing resources and surrounded by programmable input/output blocks. In this research work we present the FPGA based design and implementation of low power, fast adder and multiplier circuits for DSP applications, comparable with peer research works [1,3,4,5,6,7], as well as their performance analysis based on resource usage, delay and power considerations.

This paper shows the results of our study based on the implementation of different types of adder and multiplier [12, 13] circuits (fixed point arithmetic circuits) with respect to resource utilization, delay and power. Though related work exists on the design of arithmetic circuits in the VLSI domain for DSP applications [6, 7, 8, 10, 11, 12], FPGA based design issues were not considered there. The works on FPGA based design of arithmetic circuits [9, 13] have focused more on custom designs and their realization, and do not give a detailed analysis of the different arithmetic circuits in terms of the FPGA design metrics of resource utilization, delay and power. In [1] the authors discuss adder and multiplier circuits for FPGA based design, but present no power analysis results. In our research we focus on the design and implementation issues of the common arithmetic circuits used in DSP applications, in terms of the FPGA design metrics. Our study covers five adder and six multiplier architectures at several word lengths and is intended as a reference analysis for FPGA based circuit design for DSP applications. The Virtex-5 FPGA [2] has been chosen for implementing our designs, since it possesses arithmetic-support features.

The paper is divided into six sections. After the introduction, Section II describes the design of the different adder circuits. Section III describes the different multiplier circuits. The simulation environment and results are presented in Section IV. The analysis of the results is presented in Section V, and concluding remarks are given in Section VI.

II. DESIGN OF ADDER CIRCUITS

We have designed and implemented the following adder circuits:
• Carry chain adder (CCA): Basic adder structure based on full adders connected into a chain. The carry-out of the ith full adder (FA) is connected to the carry-in of the (i+1)th FA, to generate the n-bit sum outputs and a carry out. This is the simplest circuit to implement and it serves as the baseline for comparison with the other adder circuits.
• Carry select adder (CSA): In this circuit two sets of adders are used, one with the carry-in signal driven to logic 0 and the second with the carry-in signal driven to logic 1. The result is selected by multiplexers controlled by the actual carry-in signal.
• Carry look ahead adder (CLA): The main idea behind carry look-ahead addition is to generate all incoming carries in parallel and avoid waiting until the correct carry propagates from the stage (FA) of the adder where it has been generated. The carry look ahead adder uses two special signals, G (generate) and P (propagate), to predict the propagation of a carry signal without using a carry chain.
• Carry skip adder (CKA): A carry-skip adder reduces the carry-propagation time by skipping over groups of consecutive adder stages. The carry-skip adder is usually comparable in speed to the carry look-ahead technique.
• Sign magnitude adder (SMA): This is a simple carry ripple adder with the most significant bit used as the sign bit. It takes more area as well as more power.

III. DESIGN OF MULTIPLIER CIRCUITS

We have designed and implemented the following multiplier circuits:
• Basic multiplier (BM): The most basic parallel mode multiplier circuit, which computes the product in a human-like way, i.e., going through the bits of one operand while accumulating the second operand and shifting the sum.
• Carry save multiplier (CSM): A parallel mode multiplier that uses a tree of carry save adders. Carry save multipliers have a shorter delay than a normal multiplier and also take fewer slices.
• Carry ripple multiplier (CRM, listed as RCM in Tables III and IV): A parallel mode multiplier that uses carry ripple adders. Its delay is greater than that of the carry save multiplier and it also takes more slices.
• Booth multiplier for signed operands (BSM): This is also a parallel mode multiplier, which uses Booth [10] recoding. This enables us to compute signed number multiplication. We have implemented Booth one, Booth two and Booth three signed multipliers of base two, which increase the speed by computing several bits of the result in one computational step.

IV. SIMULATION ENVIRONMENT & RESULTS

We have implemented all the fixed point arithmetic circuits, i.e. adders and multipliers, on a Virtex-5 XC5VLX30 device using the Xilinx ISE 11.1 design environment. For behavioral simulation we have used ModelSim, and for power estimation we have used the Xilinx Xpower tool.

A. Introduction to Xilinx Xpower tool

Xpower [2] estimates power based on the dynamic power consumption in CMOS circuits due to switching activity. Each element that can switch (LUT, FF, BRAM and routing segment) has a capacitance model associated with it. Clock signals and primary input signals are assigned specific frequencies by the user. Xpower estimates power as the sum of the power consumed by each element in the design. The power consumed by each switching element in the design is given by:

$P = C \cdot V^{2} \cdot E \cdot F \cdot 1000$

where P is the power in mW, C is the capacitance in farads, V is the voltage in volts, E is the switching activity and F is the frequency in Hz.

B. Results of fixed point addition

Table I shows the results of measurements carried out on the implemented adder structures of 8, 16, 32 and 64 bit width on the Virtex-5 FPGA. The results show that the best adder structure in terms of both area and time is the carry look ahead adder for 64 bit width.

TABLE I: RESULTS OF FIXED POINT ADDERS IMPLEMENTED ON VIRTEX-5, DEVICE-XC5VLX30, PACKAGE-FF324
(Area in slices, delay in ns)

         N=8Bit          N=16Bit         N=32Bit         N=64Bit
Adder    Area   Delay    Area   Delay    Area   Delay    Area   Delay
CCA      4      5.521    9      7.872    18     12.111   42     20.59
CSA      4      5.521    7      7.64     17     11.879   50     20.358
CKA      4      5.752    7      7.872    20     12.111   50     20.59
CLA      4      4.833    16     9.165    23     9.574    41     11.468
SMA      7      8.852    15     12.16    33     20.146   70     28.64

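The generate/propagate scheme described for the carry look ahead adder in Section II can be illustrated with a small software model. The sketch below is a behavioral illustration in Python only, not the HDL implemented in this work; the function names `ripple_carry_add` and `carry_lookahead_add` are hypothetical. It shows that every carry can be written in terms of G, P and the carry-in, and that the result matches the carry chain adder.

```python
def ripple_carry_add(a, b, n, carry_in=0):
    """Carry chain model: the carry moves through n full adders one by one."""
    carry, total = carry_in, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def carry_lookahead_add(a, b, n, carry_in=0):
    """Carry look-ahead model: carries are derived from G and P signals."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate: a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate: a_i XOR b_i
    carries = [carry_in]
    for i in range(n):
        # c[i+1] = g[i] OR (p[i] AND c[i]). Hardware expands this recurrence
        # so that each carry depends only on g, p and carry_in and all carries
        # are produced in parallel; the loop here is only a software stand-in.
        carries.append(g[i] | (p[i] & carries[i]))
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i  # sum bit: p_i XOR c_i
    return total, carries[n]


if __name__ == "__main__":
    print(carry_lookahead_add(200, 100, 8))  # 8-bit sum and carry out
```

Both functions return the n-bit sum and the carry out, so they can be compared directly for any operand pair. The software loop does not reproduce the timing advantage; that comes from the parallel carry logic in hardware.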
In Figure 1, we have compared the highest and lowest area and delay between the adders. Table II shows the results of power measurement using the Xpower tool for the implemented adder circuits of 8, 16, 32 and 64 bit width on the Virtex-5 FPGA. In Figure 2 and Figure 3 we have compared the highest and lowest total dynamic power and total quiescent power between the adders.

Figure 1. Comparison between the highest and the lowest area and delay of the 64bit adders
[Two bar charts, Area (slices) and Delay (ns), for the 64 bit carry look ahead adder and the 64 bit sign magnitude adder.]

TABLE II: RESULTS OF POWER CALCULATION OF FIXED POINT ADDERS IMPLEMENTED ON VIRTEX-5, DEVICE-XC5VLX30, PACKAGE-FF324
(Total dynamic power "Dyn." and total quiescent power "Qui.", both in watts)

         N=8Bit               N=16Bit              N=32Bit              N=64Bit
Adder    Dyn.      Qui.       Dyn.      Qui.       Dyn.      Qui.       Dyn.      Qui.
CCA      0.01907   0.29612    0.03608   0.29647    0.0701    0.29719    0.13917   0.29865
CSA      0.01907   0.29612    0.03604   0.29647    0.0702    0.29719    0.13932   0.29865
CKA      0.01907   0.29612    0.03604   0.29647    0.07019   0.29719    0.13932   0.29865
CLA      0.01904   0.29611    0.03625   0.29648    0.06824   0.29715    0.13532   0.29857
SMA      0.01284   0.29599    0.03631   0.29648    0.07076   0.29720    0.14033   0.29867

Figure 2. Comparison between the highest and the lowest total dynamic power

Figure 3. Comparison between the highest and the lowest total quiescent power

Table III shows the results of measurements carried out on the implemented multiplier circuits of 8, 16 and 32 bit width on the Virtex-5 FPGA. The results show that the best multiplier structure in terms of area is the Booth-two multiplier for signed operands (base 2) for 32 bit width.

TABLE III: RESULTS OF FIXED POINT MULTIPLIERS IMPLEMENTED ON VIRTEX-5, DEVICE-XC5VLX30, PACKAGE-FF324
(Area in slices, delay in ns)

             N=8Bit           N=16Bit          N=32Bit
Multiplier   Area   Delay     Area   Delay     Area   Delay
BM           30     15.668    124    28.543    445    54.262
CSM          21     8.904     93     14.260    392    24.846
RCM          23     14.075    107    30.023    434    60.223
BSM (I)      35     16.644    142    25.028    530    47.189
BSM (II)     23     11.561    77     20.178    284    37.511
BSM (III)    26     11.520    108    16.644    383    27.781

TABLE IV: RESULTS OF POWER CALCULATION OF FIXED POINT MULTIPLIERS IMPLEMENTED ON VIRTEX-5, DEVICE-XC5VLX30, PACKAGE-FF324
(Total dynamic power "Dyn." and total quiescent power "Qui.", both in watts)

             N=8Bit               N=16Bit              N=32Bit
Multiplier   Dyn.      Qui.       Dyn.      Qui.       Dyn.      Qui.
BM           0.03392   0.29643    0.06919   0.29717    0.14361   0.29874
CSM          0.01722   0.29608    0.03569   0.29646    0.07605   0.29731
RCM          0.03389   0.29643    0.06713   0.29713    0.14084   0.29869
BSM (I)      0.03402   0.29643    0.06974   0.29718    0.14598   0.29879
BSM (II)     0.03389   0.29643    0.06889   0.29716    0.14005   0.29867
BSM (III)    0.03394   0.29643    0.05652   0.29690    0.04007   0.29656
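The Booth recoding used by the BSM circuits can be illustrated in software. The paper does not spell out its Booth one, two and three variants, so the sketch below assumes that "Booth two" corresponds to the common radix-4 recoding, which examines the multiplier two bits per step. That mapping is an assumption, and the function names `booth_radix4_digits` and `booth_multiply` are hypothetical. It is a behavioral Python illustration, not the HDL implemented in this work.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's complement multiplier (n even) into digits in -2..2.

    Digit k is y[2k-1] + y[2k] - 2*y[2k+1], with y[-1] taken as 0.
    """
    y &= (1 << n) - 1          # view y as an n-bit two's complement pattern
    digits = []
    prev = 0                   # the implicit bit y[-1]
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n):
    """Signed product x*y, where y must fit in n bits (two's complement)."""
    # Each digit selects 0, +/-x or +/-2x; digit k has weight 4**k, so only
    # n/2 partial products are added instead of n.
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, n)))


if __name__ == "__main__":
    print(booth_radix4_digits(6, 8))
    print(booth_multiply(7, -3, 8))
```

Halving the number of partial products is what lets such a multiplier compute several result bits per step, at the cost of the extra recoding and selection logic.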

Figure 4. Comparison of cost between the best and the worst multiplier circuits: Comparison of Area and Delay
[Two bar charts, Area (slices) and Delay (ns), over the 32 bit basic, carry save, ripple carry, Booth_one, Booth_two and Booth_three multipliers.]

In Figure 4 we have compared the area and delay of the 32 bit multipliers. Table IV shows the results of power measurement using the Xpower tool for the implemented multiplier circuits of 8, 16 and 32 bit width on the Virtex-5 FPGA. In our implementation we observed that a 64 bit multiplier could not be implemented on the Virtex-5 FPGA due to hardware constraints. Figure 5 compares the highest and lowest total quiescent power, and Figure 6 the highest and lowest total dynamic power, of the 32 bit multipliers.

Figure 5. Comparison between the highest and the lowest total quiescent power of the 32bit multipliers
[Bar chart of total quiescent power (W) over the 32 bit basic, carry save, ripple carry, Booth_one, Booth_two and Booth_three multipliers.]

V. ANALYSIS OF THE RESULTS

From the comparison of the fixed-point adder circuits shown in Figure 2 and Figure 3, we can say that the minimum total dynamic and total quiescent power is consumed by the carry look ahead adder for 64-bit width, and the maximum of both types of power is consumed by the sign magnitude adder for 64-bit width. From Figure 1 we can also observe that the 64-bit carry look ahead adder uses the fewest slices and the 64-bit sign magnitude adder uses the most. This suggests that a good design objective is an adder circuit with fewer circuit components, i.e. slices, which in turn consumes less power.
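The adder observation can be checked directly against the 64 bit columns of Tables I and II. The short Python sketch below transcribes those values and ranks the adders by area, delay and total dynamic power. The helper names `rank_by` and `power_follows_area` are hypothetical, and the 64 bit CKA delay is taken as 20.59 ns.

```python
# 64-bit adder figures transcribed from Tables I and II.
# name: (area in slices, delay in ns, total dynamic power in W)
ADDERS_64 = {
    "CCA": (42, 20.590, 0.13917),
    "CSA": (50, 20.358, 0.13932),
    "CKA": (50, 20.590, 0.13932),
    "CLA": (41, 11.468, 0.13532),
    "SMA": (70, 28.640, 0.14033),
}


def rank_by(index):
    """Return adder names sorted ascending by the chosen column."""
    return sorted(ADDERS_64, key=lambda name: ADDERS_64[name][index])


def power_follows_area():
    """True if dynamic power never decreases as the slice count increases."""
    ordered = sorted(ADDERS_64.values(), key=lambda row: (row[0], row[2]))
    powers = [row[2] for row in ordered]
    return all(lo <= hi for lo, hi in zip(powers, powers[1:]))


if __name__ == "__main__":
    print("by area :", rank_by(0))
    print("by delay:", rank_by(1))
    print("by power:", rank_by(2))
    print("power follows area:", power_follows_area())
```

For these five 64 bit adders the ordering by slices and the ordering by dynamic power agree, which is consistent with the design objective stated above. It is a check on this data set only, not a general rule.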
Figure 6. Comparison between the highest and the lowest total dynamic power of the 32bit multipliers
[Bar chart of total dynamic power (W) over the 32 bit basic, carry save, ripple carry, Booth_one, Booth_two and Booth_three multipliers.]

From our comparison of the fixed-point multiplier circuits in Figure 5 and Figure 6 we can say that the minimum total quiescent power and total dynamic power are consumed by the Booth three multiplier for 32-bit width, and the maximum of both types of power is consumed by the Booth one multiplier for 32-bit width. From Figure 4 we can also observe that the 32-bit Booth one multiplier uses the maximum number of slices, while the 32-bit Booth three multiplier uses considerably fewer. It is tempting to conclude, as for the adders, that minimum slice utilization gives minimum power, but this is not entirely correct. In the case of multiplier design, minimum slices alone do not mean minimum power, because delay also affects power. From Figure 4 we can observe that the Booth two multiplier uses fewer slices than the Booth three multiplier, yet its power consumption is much higher, because the delay of the Booth two multiplier is greater than that of the Booth three multiplier.

VI. CONCLUSION

Designing a low power, high speed arithmetic circuit is always a challenge for DSP applications. Our study provides a comparison of fixed-point adder and multiplier circuits implemented in FPGA. Peer research work [1] compared only delay and area and did not consider the power consumption of the arithmetic circuits. In our research we have focused on both delay and power consumption of different types of arithmetic circuits on FPGA. The Xilinx Virtex-5 FPGA [2] family was chosen as the implementation platform because it provides special features to support arithmetic operations, a trend that seems to be common at present for FPGA chips. From our research we have observed that not only do designs with fewer circuit components, i.e. fewer slices, consume less power, but circuits with smaller delays also consume less power.

REFERENCES

[1] Bečvář M. and Štukjunger P., "Fixed-Point Arithmetic in FPGA", Acta Polytechnica, Vol. 45, No. 2, 2005.
[2] http://www.xilinx.com
[3] Wallace, C., "A Suggestion for a Fast Multiplier", IEEE Transactions on Electronic Computers, 13:14-17, 1964.
[4] Ercegovac, M. D. and Lang, T., Digital Arithmetic, Morgan Kaufmann Publishers, San Francisco, 2003.
[5] Hennessy, J. L., Patterson, D. A. and Goldberg, D., Computer Architecture: A Quantitative Approach: Appendix H Computer Arithmetic, Elsevier Science, 1995.
[6] Wang, Z. and Miller, W. C., "A new design technique for column compression multipliers", IEEE Transactions on Computers, vol. 44:962-970, 2005.
[7] Cheng, F. and Theobald, M., "Design of Synchronous variable-latency pipelined multipliers", IEEE Transactions on Computers, vol. 49:659-672, 2005.
[8] Shanthala S. and Kulkarni S. Y., "VLSI Design and Implementation of Low Power MAC Unit with Block Enabling Technique", European Journal of Scientific Research, ISSN 1450-216X, Vol. 30, No. 4, pp. 620-630, 2009.
[9] Lakshmi Narayanan G. and Venkataramani B., "Optimization Techniques for FPGA-Based Wave Pipelined DSP Blocks", IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 7, pp. 783-792, July 2005.
[10] Lee H., "A power-aware scalable pipelined Booth multiplier", in Proc. IEEE Int. SOC Conf., 2004, pp. 123-126.
[11] Chen K. H., Chen Y. M. and Chu Y. S., "A versatile multimedia functional unit design using the spurious power suppression technique", in Proc. IEEE Asian Solid-State Circuits Conf., 2006, pp. 111-114.
[12] Huang Z., "High level optimization techniques for low power multiplier design", Ph.D. Thesis, University of California, Los Angeles, 2003.
[13] Beuchat L. and Muller J. M., "Automatic Generation of Modular Multipliers for FPGA Applications", LIP Research Report W2007-1, 2007.