
International conference on Communication and Signal Processing, April 3-5, 2013, India

Hardware Implementation of Truncated Multiplier Based on Multiplexer Using FPGA
Yogesh M. Motey, Tejaswini G. Panse

Abstract— This paper presents the implementation of a pseudo-carry compensation truncation (PCT) multiplier, built from multiplexers, on a Field Programmable Gate Array (FPGA). The multiplier is an important element of a system in terms of power consumption and occupied area. Truncated multiplication offers an efficient way to reduce power and area compared with full-width multipliers. Among the many truncation schemes for multipliers, the adaptive PCT scheme gives results with low error, and it is suited to array multipliers designed with the multiplexer-based technique. Comparing the two multipliers studied in this paper, the PCT multiplier occupies about 34% more area, with approximately 38% less power consumption and a low error probability. The designed PCT multiplier is power efficient and faster than the compared one in terms of propagation delay. As future scope, it can be used for image compression when implemented on an FPGA.

Index Terms— Digital multiplier, PCT truncated multiplier, computer arithmetic, FPGA, VLSI design.

I. INTRODUCTION

A high-performance multiplier has always been needed by the electronics industry for applications such as DSP and image processing. A truncated multiplier is a p × p multiplier with a p-bit output. In a truncated multiplier the p less significant bits of the full-width product are discarded; in addition, some of the partial products are removed and replaced by a suitable compensation function, trading accuracy against hardware cost. A system's performance generally depends on the performance of its multiplier, because multiplication is time consuming, which makes the multiplier the slowest element in the system. Moreover, it generally consumes a large area. Optimizing both the speed and the area of the multiplier is therefore a key design issue. However, area and speed are usually conflicting constraints, so improving one tends to degrade the other. Most algorithms use a shift-and-add technique in which the multiplicand is conditionally added to obtain the final result. Although there are many algorithms for accomplishing this, none reduces the height of the partial products that need to be summed to produce the final result. The higher area and power consumption that come with fast operation can be overcome by using truncation schemes for the multiplier. Several research papers show that truncated multiplication significantly reduces power compared with a standard parallel multiplier for different operand sizes.

Many truncation schemes have been proposed for array multipliers, among them Constant Correction Truncation (CCT) [12], Variable Correction Truncation (VCT) [9], and the pseudo-carry compensation truncation (PCT) scheme [1]. In CCT the correction constant is fixed for specific values of n and k regardless of the values of the multiplicand and multiplier. This fixed correction constant introduces a non-zero DC component in the resulting product; it is added, based on the specific values of n and k, to Columns (n-1) to (k-1) of the partial product (pp) matrix. To adapt the correction to the input values, a data-dependent variable correction truncation (VCT) scheme was proposed, in which the most significant pp bits from the (n-k-1)th column are stacked over the (n-k)th column and a constant bias C is added in Columns (n-1) to (n-k). The PCT technique takes into account both the correction to the input values and the carries generated in each stage. In this paper the PCT scheme is used; the multiplier architecture is designed in Verilog HDL using structural modeling, simulated with Xilinx tools, and finally implemented on a Spartan-3 FPGA kit.

II. TRUNCATED MULTIPLIERS

Most existing schemes target both array and tree multipliers. The design of high-speed, area-efficient multipliers is essential for VLSI implementations of digital signal processing systems. Truncation schemes significantly reduce design complexity. Truncation is best suited to cases where an exact result is not always required and a rounded product is used for further computation.
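As a rough illustration of what truncation trades away, the following Python sketch models an n × n unsigned multiplier that sums only the partial product bits in columns n-k and above and then keeps the n most significant product bits. It applies no correction at all, so it is not the CCT, VCT or PCT scheme; the function names `truncated_product` and `exact_high_half` are hypothetical and chosen only for this example.

```python
def truncated_product(x, y, n, k):
    """Sum only the pp bits in columns >= n-k, then keep the n upper product bits.

    No correction constant or compensation is applied (assumption of this sketch).
    """
    first_kept_column = n - k
    total = 0
    for i in range(n):          # bit index of x
        for j in range(n):      # bit index of y
            in_kept_column = (i + j) >= first_kept_column
            if in_kept_column and (x >> i) & 1 and (y >> j) & 1:
                total += 1 << (i + j)
    return total >> n


def exact_high_half(x, y, n):
    """Upper n bits of the full-width 2n-bit product."""
    return (x * y) >> n


if __name__ == "__main__":
    n, k = 8, 2
    x, y = 200, 123
    print("exact    :", exact_high_half(x, y, n))
    print("truncated:", truncated_product(x, y, n, k))
```

With k = n every column is kept and the model reproduces the exact upper half. With n = 8 and k = 2 the discarded columns 0 to 5 hold at most 321 units of weight, so the uncorrected result falls short of the exact upper half by at most 2; the correction schemes discussed in this paper aim to shrink that error.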

Yogesh M. Motey, Yashvantrao Chavan College of Engineering, Nagpur, INDIA (e-mail: yogesh.motey@gmail.com).
Tejaswini G. Panse, Yashvantrao Chavan College of Engineering, Nagpur, INDIA (e-mail: tejaswini.deshmukh@gmail.com).

978-1-4673-1622-4/12/$31.00 ©2013 IEEE


Fig. 1. RTL view of Truncated Multiplier.

A. Truncated Multiplier

The truncated multiplier is an array multiplier. The architecture consists of AND gates and two types of fixed-input multiplexers. One type has inputs 0, 0, 0, $C_i$, where $C_i$ is the carry-bit input of the multiplexer, which takes care of the carry generated in the previous column of the architecture or obtained from the inputs by ANDing them. The other type has inputs 0, $x_i$, $y_i$, $s_i$, where $x_i$ and $y_i$ are input bits of the 8-bit numbers and $s_i$ is the sum bit carrying the summation result from the previous column of multiplexers. Fig. 1 shows the RTL view of the truncated multiplexer-matrix multiplier for an 8×8-bit multiplier, with n = 8 most significant columns and k = 2 additional columns of multiplexers. This view, obtained in Xilinx 9.1, gives an idea of the basic building blocks in the design.

B. A Pseudo-Carry Compensation Truncation (PCT) Multiplier

A novel truncation scheme has been proposed for the multiplexer-based array multiplier of [7]; a brief antecedent of this work was presented in [13]. The proposed method achieves a lower average and spread of errors by means of adaptive pseudo-carry compensation and a simple deterministic constant bias that is independent of k [1]. By exploiting the symmetry of the multiplexer-based array multiplier, the pp bits generated by the multiplexers in the truncated multiplier can be accumulated in carry-save format to further reduce the area and improve the speed over other truncated array multipliers [1].

1) NEW TRUNCATED MULTIPLICATION SCHEME

The product of two n-bit positive integers $X = x_{n-1}, x_{n-2}, \ldots, x_1, x_0$ and $Y = y_{n-1}, y_{n-2}, \ldots, y_1, y_0$ is a 2n-bit product $P = XY$. The numbers are assumed to be fractional in the error analysis, and the inputs and output are scaled by factors of $2^{-n}$ and $2^{-2n}$, respectively [1]. Writing $X_i$ and $Y_i$ for the numbers formed by the i least significant bits of X and Y, the product can be expressed as

$$P = \{X_{n-1} + 2^{n-1} x_{n-1}\}\{Y_{n-1} + 2^{n-1} y_{n-1}\} \tag{1}$$

$$P = \sum_{i=0}^{n-1} (x_i \cdot y_i)\, 2^{2i} + \sum_{i=1}^{n-1} M_i\, 2^{i} \tag{2}$$

where $M_i = X_i \cdot y_i + x_i \cdot Y_i$. $M_i$ can be implemented as a multiplexer with $x_i$ and $y_i$ as select signals: $M_i$ = 0, $X_i$, $Y_i$ and $X_i + Y_i$ when $x_i y_i$ = "00", "01", "10" and "11", respectively.

Without loss of generality, the truncation scheme can be explained with the help of the truncated multiplexer matrix of Fig. 1 with n = 8 and k = 2, where k is the number of partial product (pp) columns kept beyond the width n of the truncated product. Using the VCT scheme, the multiplexers from Column 5 (i.e., n−k−1) would be stacked on Column 6 (i.e., n−k). However, the product bit in Column 6 depends on the carries generated from Column 5 into Column 6 rather than on the sum of its pp bits. Directly stacking the pp bits of Column 5 onto Column 6 can also create excessive error due to the carry propagated into Column 7 (i.e., n−k+1). From the definition of $M_i$, if both $x_i$ and $y_i$ are 1, the sum of $X_i$ and $Y_i$ is selected. If the select signals of all the multiplexers in Column 5 are 1's, then the sums of the multiplexer inputs, i.e., $s_0$, $s_1$ and $s_2$, are selected. In the worst case, the carry signals $c_0 = x_0 \cdot y_0$, $c_1 = x_1 \cdot y_1$ and $c_2 = x_2 \cdot y_2$ are obtained. The idea is to add the carry signals generated from the inputs to the multiplexers in the (n−k−1)th column to the (n−k)th column. The error is reduced because only the necessary carries from the (n−k−1)th column are added to the (n−k)th column. Some error remains, though, because of the assumption that all select signals of the multiplexers in the (n−k−1)th column are "1". This approach is, however, closer to the carry propagation in a full-width multiplier than VCT is.

To minimize the truncation error for unsigned integer multiplication, a new pseudo-carry compensated truncation (PCT) scheme consisting of an adaptive compensation circuit and a fixed bias is proposed. Finally, the n-bit truncated product, $P_t$, with its least significant bit weighted $2^{0}$, is given by

$$P_t = \sum_{i=r}^{n-1} M_i\, 2^{i-n} + \sum_{i=\left[\frac{r+1}{2}\right]}^{r-1} M_{ti}\, 2^{r-n} + \sum_{i=\left[\frac{r+1}{2}\right]}^{n-1} (x_i \cdot y_i)\, 2^{2i-n} + \sum_{i=0}^{r-1} (x_i \cdot y_i)\, 2^{r-n} + 2^{-1} \tag{3}$$

In this architecture the building blocks are again multiplexers, but they are formed by combining full adders and half adders with a 4-to-1 multiplexer, giving the FMUX and HMUX cells. HMUX cells are placed at the boundary of the array and FMUX cells in its interior. A fixed correction bias of 1 is applied as an input to the leftmost HMUX in the array for adaptive compensation of error. Ripple carry adders (RCA) are used at the bottom of the array to add the outputs of the FMUX cells and obtain the final product bits; for the MSB, a logical OR block is used instead of an RCA. Fig. 2 shows the RTL view of the PCT truncated multiplexer-matrix multiplier for an 8×8-bit multiplier; it is again an array multiplier.
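The multiplexer form of the product in equations (1) and (2) can be checked numerically. The Python sketch below is a behavioural model only, not the Verilog structural design used in this paper; the helper names `low_bits`, `mux_term` and `mux_product` are hypothetical. It assumes that X_i and Y_i denote the numbers formed by the i least significant bits of X and Y, which is what equation (1) requires.

```python
def low_bits(value, i):
    """Number formed by the i least significant bits of value (X_i or Y_i)."""
    return value & ((1 << i) - 1)


def mux_term(x, y, i):
    """M_i as a 4-to-1 multiplexer selected by the bit pair (x_i, y_i)."""
    x_i = (x >> i) & 1
    y_i = (y >> i) & 1
    X_i = low_bits(x, i)
    Y_i = low_bits(y, i)
    # Select "00" -> 0, "01" -> X_i, "10" -> Y_i, "11" -> X_i + Y_i
    inputs = (0, X_i, Y_i, X_i + Y_i)
    return inputs[(x_i << 1) | y_i]


def mux_product(x, y, n):
    """Product of two n-bit unsigned integers assembled as in equation (2)."""
    and_terms = sum(
        (((x >> i) & 1) & ((y >> i) & 1)) << (2 * i) for i in range(n)
    )
    mux_terms = sum(mux_term(x, y, i) << i for i in range(1, n))
    return and_terms + mux_terms


if __name__ == "__main__":
    print(mux_product(200, 123, 8), 200 * 123)
```

The multiplexer sum starts at i = 1 because X_0 and Y_0 contain no bits, so M_0 is always zero.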

Fig. 2. RTL view of PCT Truncated Multiplier

III. FIELD PROGRAMMABLE GATE ARRAY (FPGA)

The Spartan-3 FPGA belongs to the fifth-generation Xilinx family. It is specifically designed to meet the needs of high-volume, low-unit-cost electronic systems. The family consists of eight members offering densities ranging from 50,000 to five million system gates. The Spartan-3 FPGA consists of five fundamental programmable functional elements: CLBs, IOBs, Block RAMs, dedicated multipliers (18×18) and digital clock managers (DCMs). The Spartan-3 family includes the Spartan-3L, Spartan-3E, Spartan-3A, Spartan-3A DSP, Spartan-3AN and the extended Spartan-3A FPGAs [15]. In particular, the Spartan-3AN is used as the target technology in this paper. The Spartan-3AN combines all the features of the Spartan-3A FPGA family with in-system flash memory for configuration and nonvolatile data storage [15]. The design is written in Verilog HDL using structural modeling rather than behavioral or dataflow modeling, in order to obtain a known structure and size of design, which is not possible with the other two modeling styles. Each building block, i.e., the gates and the different multiplexers, is designed separately and instantiated through port mapping.

Fig. 3. FPGA KIT

IV. RESULTS

Simulations are performed in Xilinx 9.1 for the Spartan-3 FPGA kit. The number of slices occupied by the Truncated Multiplier is 38 and by the PCT Truncated Multiplier is 51, i.e., about 34% more area than the Truncated Multiplier, as shown in Table I, with approximately 38% less power consumption with respect to area consumed. The power is calculated at a constant ambient temperature of 25 °C and at constant values of DC load current, load capacitance and supply voltages, as shown in Table II.

                           Available    Truncated    PCT
No. of slices occupied     3584         38           51
% Area                     100%         1%           1%

Table I: Area Comparison

Conditions: Idc = 10 mA, Load cap. = 1000 fF, Vccint = 2.50 V, Vccaux = 2.50 V, Vcco = 3.30 V

                           Truncated    PCT
Power (mW)                 360          346

Table II: Power Consumption

According to the synthesis timing report, the propagation delay present in the truncated multiplier is absent in the PCT multiplier.

For Truncated Multiplier
1. Minimum input arrival time before clock: 3.578 ns
2. Maximum output required time after clock: 6.141 ns
3. Maximum combinational path delay: No path found

For PCT Multiplier
1. Minimum input arrival time before clock: No path found
2. Maximum output required time after clock: No path found
3. Maximum combinational path delay: 25.157 ns
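The area figures quoted from Table I can be reproduced with a few lines of arithmetic. The Python sketch below uses only the slice counts reported in Table I; the helper names `percent_of` and `relative_change` are hypothetical and introduced for this illustration.

```python
# Slice counts copied from Table I.
AVAILABLE_SLICES = 3584
SLICES = {"truncated": 38, "pct": 51}


def percent_of(part, whole):
    """Share of `whole` taken by `part`, in percent."""
    return 100.0 * part / whole


def relative_change(new, old):
    """Change from `old` to `new`, in percent of `old`."""
    return 100.0 * (new - old) / old


# Extra area of the PCT multiplier relative to the truncated multiplier.
area_overhead = relative_change(SLICES["pct"], SLICES["truncated"])

# Device utilisation of each design (the "% Area" row of Table I).
utilisation = {
    name: percent_of(count, AVAILABLE_SLICES) for name, count in SLICES.items()
}

if __name__ == "__main__":
    print(f"PCT area overhead: {area_overhead:.1f}%")
    for name, value in utilisation.items():
        print(f"{name}: {value:.2f}% of available slices")
```

The overhead evaluates to about 34.2%, matching the "about 34% more area" quoted above, and both designs occupy between 1% and 1.5% of the available slices, which Table I rounds to 1%.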

V. CONCLUSION
The results show that the PCT Truncated Multiplier occupies more area than the Truncated Multiplier, but its power consumption is lower and it has less propagation delay. Future work is to apply this multiplier to image compression, with a microcontroller handling control and the transfer of data or images to and from the FPGA.

REFERENCES
[1] Chip-Hong Chang and Ravi Kumar Satzoda, “A Low Error and High Performance Multiplexer-Based Truncated Multiplier,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 12, December 2010.
[2] Anitha R, Bagyaveereswaran V “Braun’s Multiplier Implementation
using FPGA with Bypassing Techniques” International Journal of VLSI
design & Communication Systems (VLSICS) Vol.2, No.3, September
2011 p.p 201-212 [DOI : 10.5121/vlsic.2011.2317]
[3] VALERIA GAROFALO, “Truncated Binary Multipliers with Minimum
Mean Square Error: Analytical Characterization, Circuit Implementation
and Applications.” DIPARTIMENTO DI INGEGNERIA BIOMEDICA,
ELETTRONICA E DELLE TELECOMUNICAZIONI A.A, 2008-09.
[4] Muhammad H. Rais “Efficient Hardware Realization of Truncated
Multipliers using FPGA” International Journal of Engineering and
Applied Sciences 5:2 2009 p.p 124-128.
[5] L.-D. Van and C.-C. Yang, “Generalized low-error area-efficient fixed-width multipliers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 1608–1619, Aug. 2005.
[6] J. E. Stine and O. M. Duverne, “Variations on truncated multiplication,”
in Euromicro Symposium on Digital System Design, pp. 112–119,
2003.
[7] K. Z. Pekmestzi, “Multiplexer-based array multipliers,” IEEE Trans.
Comput., vol. 48, no. 1, pp. 15–23, Jan. 1999.
[8] M. J. Schulte, J. E. Stine, and J. G. Jansen, “Reduced power dissipation
through truncated multiplication,” in Proc. IEEE Alessandro Volta
Memorial Workshop Low-Power Des., pp. 61–69, Mar. 1999.
[9] E. J. King and E. E. Swartzlander, Jr., “Data-dependent truncated
scheme for parallel multiplication,” in Proceedings of the Thirty First
Asilomar Conference on Signals, Circuits and Systems, pp. 1178–1182,
1998.
[10] S. Kidambi, F. El-Guibaly, and A. Antonious, “Area-efficient
multipliers for digital signal processing applications,” IEEE Transaction
on Circuits and Systems II: Analog and digital signal processing, vol.
43, no. 2, pp. 90–95, Feb. 1996.
[11] K. Bickerstaff, M. J. Schulte, and E. E. Swartzlander, Jr., “Parallel
Reduced Area Multipliers,” Journal of VLSI Signal Processing, vol. 9,
pp. 181–192, April 1995.
[12] M. J. Schulte and E. E. Swartzlander, Jr., “Truncated multiplication
with correction constant,” in VLSI Signal Processing VI, pp. 388–396,
October 1993.
[13] C. H. Chang, R. K. Satzoda, and S. Sekar, “A novel multiplexer based
truncated array multiplier,” in Proc. IEEE Int. Symp. Circuits Syst.
(ISCAS), Kobe, Japan, May 2005, pp. 85–88.
[14] C. Wallace, “A suggestion for a fast multiplier,” IEEE Transactions on Electronic Computers, vol. EC-13, no. 1, pp. 14–17, Feb. 1964.
[15] Xilinx, Spartan-3 FPGA family datasheet, 2008.
