Bhattacharjee 2011
Abstract— This paper describes the design and implementation of low power arithmetic circuits for digital signal processing (DSP) applications, using Xilinx XC5VLX30 (Virtex-5) field programmable gate array (FPGA) devices. DSP is a highly demanding application domain in present day technology, in which the demands for enhanced performance and reduced resource utilization have increased over the years. Recent advancements in FPGA design technology through the incorporation of DSP functional blocks, together with inherent FPGA features such as high flexibility through reconfiguration, reusability, moderate cost and feature extension, have made FPGAs the preferred platform for evaluating and implementing DSP. The arithmetic circuits for addition and multiplication are the core of any DSP hardware, as all operations in the DSP domain are combinations of these. In this work we have implemented various forms of adder and multiplier circuits on FPGA in order to find the most suitable arithmetic circuits for FPGA based DSP implementation. We present an analysis of our implementation results with respect to delay, power requirement and implementation cost of the different 8, 16, 32 and 64 bit circuits that can be realized for implementing the basic fixed-point arithmetic units in FPGA.

Keywords: Arithmetic circuits, adder circuits, low power design, FPGA, DSP, Xilinx Xpower

I. INTRODUCTION

Adders and multipliers are the most fundamental arithmetic circuits used to build DSP hardware [9]. Since these circuits perform key operations in present day VLSI, their speed and power optimization are crucial quality factors in high performance DSP applications. DSP applications typically require a tradeoff between power consumption and speed; hence there is an immense need for low power [8, 11], high speed design of arithmetic circuits.

Nowadays, FPGA is a much favored platform for digital VLSI design. This is due to the high flexibility, reusability, low power, moderate cost, easy upgrading (due to the usage of hardware description languages (HDLs)) and feature extension (as long as the FPGA is not exhausted) facilities available in FPGAs. Field programmable gate arrays (FPGAs) provide a configurable structure through an array of adjustable logic modules interconnected by programmable routing resources and surrounded by programmable input/output blocks. In this research work we present the FPGA based design and implementation of low power, faster adder and multiplier circuits for DSP applications, comparable with peer research works [1, 3, 4, 5, 6, 7], as well as their performance analysis based on resource usage, delay and power considerations.

This paper shows the results of our study based on the implementation of different types of adder and multiplier circuits [12, 13] (fixed point arithmetic circuits) with respect to resource utilization, delay and power. Though related work exists on the design of arithmetic circuits in the VLSI domain for DSP applications [6, 7, 8, 10, 11, 12], FPGA based design issues were not considered in those works. The works on FPGA based design of arithmetic circuits [9, 13] have focused more on custom design and its realization, but a detailed analysis of the different arithmetic circuits in terms of the FPGA design metrics of resource utilization, delay and power was not given. In [1] the authors discussed adder and multiplier circuits for FPGA based design, but no power analysis results were presented. In our research we have focused on the design and implementation issues of the common arithmetic circuits used in DSP applications, in terms of the FPGA design metrics. This work therefore provides an extensive comparative analysis in the domain of FPGA based circuit design for DSP applications. The Virtex-5 FPGA [2] has been chosen for implementing our designs, since it possesses arithmetic-support features.

The paper is divided into six sections. After the introduction, Section II describes the design of the different adder circuits. Section III describes the different multiplier circuits. The simulation environment and results are presented in Section IV. The analysis of the results is presented in Section V, and concluding remarks are given in Section VI.
II. DESIGN OF ADDER CIRCUITS

We have designed and implemented the following adder circuits:

• Carry chain adder (CCA): Basic adder structure based on full adders connected into a chain. The carry-out of the ith full adder (FA) is connected to the carry-in of the (i+1)th FA, generating the n-bit sum output and a carry-out. This is the simplest circuit to implement, and it serves as the baseline for comparison with the other adder circuits.

• Carry select adder (CSA): In this circuit two sets of adders are used, one with the carry-in signal driven to logic 0 and the other with the carry-in signal driven to logic 1. The result is selected by multiplexers controlled by the actual carry-in signal.

• Carry look ahead adder (CLA): The main idea behind carry look-ahead addition is to generate all incoming carries in parallel, instead of waiting until the correct carry propagates from the stage (FA) of the adder where it was generated. The carry look-ahead adder uses two special signals, G (generate) and P (propagate), to predict the propagation of a carry signal without using a carry chain.

• Carry skip adder (CKA): A carry-skip adder reduces the carry-propagation time by skipping over groups of consecutive adder stages. The carry-skip adder is usually comparable in speed to the carry look-ahead technique.

• Sign magnitude adder (SMA): This is a simple carry ripple adder with the most significant bit used as the sign bit. It takes more area and consumes more power.

III. DESIGN OF MULTIPLIER CIRCUITS

We have designed and implemented the following multiplier circuits:

• Basic multiplier (BM): The most basic parallel mode multiplier circuit, which computes the product the way a human would: going through the bits of one operand while accumulating the second operand and shifting the sum.

• Carry save multiplier (CSM): A parallel mode multiplier that uses a tree of carry save adders. Carry save multipliers have a shorter delay than a normal multiplier and also take fewer slices.

• Carry ripple multiplier (CRM): A parallel mode multiplier that uses carry ripple adders. Its delay is greater than that of the carry save multiplier, and it also takes more slices.

• Booth multiplier for signed operands (BSM): This is also a parallel mode multiplier, which uses Booth [10] recoding. This enables us to compute signed number multiplication. We have implemented Booth one, Booth two and Booth three signed multipliers of base two, which increase the speed by computing several bits of the result in one computational step.

IV. SIMULATION ENVIRONMENT & RESULTS

We have implemented all the fixed point arithmetic circuits, i.e. adders and multipliers, on the Virtex-5 XC5VLX30 device with the Xilinx ISE 11.1 design environment. For behavioral simulation we have used ModelSim, and for optimized power estimation we have used the Xilinx Xpower tool.

A. Introduction to Xilinx Xpower tool

Xpower [2] estimates power from the dynamic power consumption of CMOS circuits due to switching activity. Each element (LUT, FF, BRAM and routing segment) that can switch has a capacitance model associated with it. Clock signals and primary input signals are assigned specific frequencies by the user. Xpower estimates power as the sum of the power consumed by each element in the design. The power consumed by each switching element in the design is given by:

$$P = C \cdot V^{2} \cdot E \cdot F \cdot 1000$$

where P is the power in mW, C the capacitance in farads, V the voltage in volts, E the switching activity and F the frequency in Hz.

B. Results of fixed point addition

Table I shows the results of measurements carried out on the implemented adder structures of 8, 16, 32 and 64 bit width on the Virtex-5 FPGA. The results show that the best adder structure in terms of both area and delay is the carry look-ahead adder for 64 bit width.

TABLE I: RESULTS OF FIXED POINT ADDERS IMPLEMENTED ON VIRTEX-5, XC5VLX30, PACKAGE-FF243

Adder   N=8 bit          N=16 bit         N=32 bit         N=64 bit
        Area   Delay     Area   Delay     Area   Delay     Area   Delay
CCA     4      5.521     9      7.872     18     12.111    42     20.59
CSA     4      5.521     7      7.64      17     11.879    50     20.358
CKA     4      5.752     7      7.872     20     12.111    50     20.59
CLA     4      4.833     16     9.165     23     9.574     41     11.468
SMA     7      8.852     15     12.16     33     20.146    70     28.64

(Area in slices; delay in ns.)
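The carry structures of Section II differ in how the carry reaches each stage, not in the sum they produce. The sketch below is a hypothetical bit-level Python model (not the HDL used in this work) of the carry chain, carry select, carry look-ahead and carry skip adders. The helper names and the 4-bit block size used for the select and skip variants are our assumptions. The model only checks functional equivalence; it says nothing about FPGA area, delay or power.

```python
"""Hypothetical bit-level models of the Section II adders (sketch only).

Bit vectors are lists of 0/1 ints, least significant bit first.
"""
import random


def to_bits(x, n):
    """Unsigned integer -> n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Bits, LSB first -> unsigned integer."""
    return sum(b << i for i, b in enumerate(bits))


def full_adder(a, b, c):
    """One FA stage: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def carry_chain_add(a, b, cin=0):
    """CCA: the carry-out of FA i drives the carry-in of FA i+1."""
    s, c = [], cin
    for ai, bi in zip(a, b):
        si, c = full_adder(ai, bi, c)
        s.append(si)
    return s, c


def carry_select_add(a, b, cin=0, block=4):
    """CSA: each block is added with carry-in 0 and 1; the real carry selects."""
    s, c = [], cin
    for i in range(0, len(a), block):
        s0, c0 = carry_chain_add(a[i:i + block], b[i:i + block], 0)
        s1, c1 = carry_chain_add(a[i:i + block], b[i:i + block], 1)
        s += s1 if c else s0      # sum multiplexer
        c = c1 if c else c0       # carry multiplexer
    return s, c


def carry_lookahead_add(a, b, cin=0):
    """CLA: each carry is computed from G, P and cin, not from a ripple chain."""
    g = [ai & bi for ai, bi in zip(a, b)]   # generate
    p = [ai ^ bi for ai, bi in zip(a, b)]   # propagate
    c = [cin]
    for i in range(len(a)):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]cin
        term, prod = 0, 1
        for j in range(i, -1, -1):
            term |= prod & g[j]
            prod &= p[j]
        c.append(term | (prod & cin))
    return [p[i] ^ c[i] for i in range(len(a))], c[-1]


def carry_skip_add(a, b, cin=0, block=4):
    """CKA: if every bit of a block propagates, its carry-in bypasses the block."""
    s, c = [], cin
    for i in range(0, len(a), block):
        ab, bb = a[i:i + block], b[i:i + block]
        sb, ripple_c = carry_chain_add(ab, bb, c)
        skip = all(x ^ y for x, y in zip(ab, bb))
        s += sb
        c = c if skip else ripple_c   # skip multiplexer
    return s, c


ADDERS = (carry_chain_add, carry_select_add, carry_lookahead_add, carry_skip_add)

if __name__ == "__main__":
    n = 16
    for _ in range(1000):
        x, y, cin = random.randrange(1 << n), random.randrange(1 << n), random.randrange(2)
        for adder in ADDERS:
            s, cout = adder(to_bits(x, n), to_bits(y, n), cin)
            assert from_bits(s) + (cout << n) == x + y + cin, adder.__name__
    print("all four adders agree on", n, "bits")
```

In the skip model the bypassed carry always equals the rippled one; the multiplexer matters only for timing, which is exactly why a carry skip adder is faster in hardware while remaining functionally identical to the carry chain.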
TABLE II: RESULTS OF POWER CALCULATION OF FIXED POINT ADDERS IMPLEMENTED ON VIRTEX-5, DEVICE-XC5VLX30, PACKAGE-FF324

[…] implemented adder circuits of 16, 32 and 64 bit width on the Virtex-5 FPGA. In Figure 2 and Figure 3 we have compared the highest and the lowest total quiescent power and total dynamic power among the adders.
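The per-element model of Section IV-A can be evaluated by hand to see how its terms combine into a design total. The sketch below applies that formula to each switching element and sums the results, as Xpower is described to do. The element list, capacitances, switching activities, the 1.0 V supply and the 100 MHz clock are illustrative assumptions only, not Virtex-5 characterisation data and not values reported by Xpower.

```python
"""Sketch of an Xpower-style dynamic power sum (illustrative numbers only)."""
from dataclasses import dataclass


@dataclass
class SwitchingElement:
    name: str             # hypothetical element label, e.g. "lut"
    capacitance_f: float  # C in farads (assumed value)
    activity: float       # E, switching activity (assumed value)


def element_power_mw(el, vdd_v, freq_hz):
    """P = C * V^2 * E * F * 1000, giving milliwatts."""
    return el.capacitance_f * vdd_v ** 2 * el.activity * freq_hz * 1000


def total_dynamic_power_mw(elements, vdd_v, freq_hz):
    """Design power = sum of the powers of all switching elements."""
    return sum(element_power_mw(el, vdd_v, freq_hz) for el in elements)


# Assumed values for illustration; real capacitances come from the tool's models.
design = [
    SwitchingElement("lut", 2e-12, 0.25),
    SwitchingElement("ff", 1e-12, 0.5),
    SwitchingElement("routing", 5e-12, 0.25),
]

if __name__ == "__main__":
    print(f"{total_dynamic_power_mw(design, vdd_v=1.0, freq_hz=100e6):.3f} mW")  # 0.225 mW
```

Because every term is linear in E and F, halving the switching activity or the clock frequency halves the estimate, while the supply voltage enters quadratically.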
[Charts: Area and Delay of the 64-bit carry look-ahead adder and the 64-bit sign magnitude adder]

Figure 2. Comparison between the highest and the lowest total dynamic power

Figure 3. Comparison between the highest and the lowest total quiescent power

TABLE III: RESULTS OF FIXED POINT MULTIPLIERS IMPLEMENTED ON VIRTEX-5, DEVICE-XC5VLX30, PACKAGE-FF324
[Charts, Multiplier 32_Bit: Delay (ns) and Power (watt) of the Basic, Carry save, Ripple carry, Booth_one, Booth_two and Booth_three multipliers]
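The multiplier structures of Section III can likewise be checked at the functional level. The sketch below is a hypothetical Python model, not the designs used in this work: a shift-and-add basic multiplier, a carry-save reduction of the partial products with 3:2 compressors applied level by level, and a Booth multiplier with radix-2^k recoding. Mapping k = 1, 2, 3 onto the Booth one, Booth two and Booth three variants is our assumption; the text only states that they compute several result bits per step.

```python
"""Hypothetical functional models of the Section III multipliers (sketch)."""


def shift_add_multiply(x, y, n):
    """Basic multiplier: scan the n bits of y, add x shifted by the bit index."""
    acc = 0
    for i in range(n):
        if (y >> i) & 1:
            acc += x << i
    return acc


def carry_save_multiply(x, y, n):
    """Reduce partial products with 3:2 carry-save adders, one level at a time,
    then finish with a single carry-propagate addition (unsigned operands)."""
    pps = [x << i if (y >> i) & 1 else 0 for i in range(n)]
    while len(pps) > 2:
        nxt = []
        for k in range(0, len(pps) - 2, 3):
            a, b, c = pps[k:k + 3]
            nxt.append(a ^ b ^ c)                            # sum bits, no carry ripple
            nxt.append(((a & b) | (a & c) | (b & c)) << 1)   # saved carries
        nxt += pps[len(pps) - len(pps) % 3:]                 # 0-2 leftovers pass through
        pps = nxt
    return sum(pps)


def booth_multiply(x, y, n, k=2):
    """Signed x * y with radix-2**k Booth recoding of the n-bit multiplier y.

    Each group of k bits of y (plus the overlap bit below it) becomes one
    signed digit, so one partial product covers k result bits.
    """
    m = -(-n // k) * k              # round n up to a multiple of k
    y_bits = y & ((1 << m) - 1)     # two's-complement pattern, sign-extended to m bits
    prev, acc = 0, 0                # prev is the overlap bit y[i-1], 0 for i = 0
    for i in range(0, m, k):
        group = (y_bits >> i) & ((1 << k) - 1)
        top = group >> (k - 1)
        digit = group - (top << k) + prev   # -2^(k-1)*top + lower bits + prev
        acc += (digit * x) << i
        prev = top
    return acc


if __name__ == "__main__":
    for x in range(-128, 128, 7):
        for y in range(-128, 128, 5):
            for k in (1, 2, 3):
                assert booth_multiply(x, y, 8, k) == x * y
    assert shift_add_multiply(200, 123, 8) == carry_save_multiply(200, 123, 8) == 24600
    print("multiplier models agree")
```

The Booth loop runs ceil(n/k) times, the software analogue of generating fewer partial products as k grows; the price in hardware is that larger digits, such as ±3x for k = 3, need an extra precomputed multiple.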
[Chart, Multiplier 32_Bit: Basic, Carry save, Ripple carry, Booth_one, Booth_two and Booth_three multipliers]

Figure 6. Comparison between the highest and the lowest total dynamic power of the 32-bit multipliers

From our comparison of the fixed-point multiplier circuits in Figure 5 and Figure 6, the minimum total quiescent power and total dynamic power are consumed by the Booth three multiplier for 32-bit width, and the maximum of both types of power is consumed by the Booth one multiplier for 32-bit width. From the comparison in Figure 4 we can also observe that the minimum number of slices is used by the 32-bit Booth three multiplier and the maximum number of slices by the 32-bit Booth one multiplier. Here too one might conclude that minimum slice utilization gives minimum power, but this is not entirely correct: for the multiplier designs, minimum slices alone do not imply minimum power, because delay also determines power. From Figure 4 we can observe that the Booth two multiplier consumes fewer slices than the Booth three multiplier, but its power consumption is much higher, because the Booth two multiplier has a greater delay than the Booth three multiplier.

VI. CONCLUSION

Designing a low power, high speed arithmetic circuit is always a challenge for DSP applications. Our study provides a comparison of fixed-point adder and multiplier circuits implemented in FPGA. The peer research work [1] compared only delay and area and did not consider the power consumption of the arithmetic circuits. In our […]

[…] to support arithmetic operations, a trend that seems to be common at the present time for FPGA chips. From our research we have observed that not only do designs with fewer circuits, i.e. fewer slices, consume less power, but circuits with smaller delays also consume less power.

REFERENCES

[1] Bečvář M. and Štukjunger P., "Fixed-Point Arithmetic in FPGA," Acta Polytechnica, vol. 45, no. 2, 2005.
[2] http://www.xilinx.com
[3] Wallace C., "A Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, vol. 13, pp. 14–17, 1964.
[4] Ercegovac M. D. and Lang T., Digital Arithmetic, Morgan Kaufmann Publishers, San Francisco, 2003.
[5] Hennessy J. L., Patterson D. A. and Goldberg D., Computer Architecture: A Quantitative Approach, Appendix H: Computer Arithmetic, Elsevier Science, 1995.
[6] Wang Z. and Miller W. C., "A new design technique for column compression multipliers," IEEE Transactions on Computers, vol. 44, pp. 962–970, 2005.
[7] Cheng F. and Theobald M., "Design of synchronous variable-latency pipelined multipliers," IEEE Transactions on Computers, vol. 49, pp. 659–672, 2005.
[8] Shanthala S. and Kulkarni S. Y., "VLSI Design and Implementation of Low Power MAC Unit with Block Enabling Technique," European Journal of Scientific Research, ISSN 1450-216X, vol. 30, no. 4, pp. 620–630, 2009.
[9] Lakshmi Narayanan G. and Venkataramani B., "Optimization Techniques for FPGA-Based Wave Pipelined DSP Blocks," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 7, pp. 783–792, July 2005.
[10] Lee H., "A power-aware scalable pipelined Booth multiplier," in Proc. IEEE Int. SOC Conf., 2004, pp. 123–126.
[11] Chen K. H., Chen Y. M. and Chu Y. S., "A versatile multimedia functional unit design using the spurious power suppression technique," in Proc. IEEE Asian Solid-State Circuits Conf., 2006, pp. 111–114.
[12] Huang Z., "High level optimization techniques for low power multiplier design," Ph.D. Thesis, University of California, Los Angeles, 2003.
[13] Beuchat L. and Muller J. M., "Automatic Generation of Modular Multipliers for FPGA Applications," LIP Research Report W2007-1, 2007.