Techniques On FPGA Implementation of 8-Bit Multipliers: Prabhat Shukla, Naveen Kr. Gahlan, Jasbir Kaur

ISSN : 0976-8491 (Online) | ISSN : 2229-4333 (Print) IJCST Vol.
3, Issue 2, April - June 2012
Techniques on FPGA Implementation of 8-bit Multipliers

1
Prabhat Shukla, 2Naveen Kr. Gahlan, 3Jasbir Kaur
1,2,3
Dept. of E & EC, PEC University of Technology, Chandigarh, UT, India
Abstract design of high speed processors [1] .

Multiplication is one of the basic arithmetic operations and In the past, multiplication was implemented using a sequence of
fundamental building block in all DSP task. The objective of good addition subtractions and shift operation. Multiplication can be
multiplier is to provide a physically compact, good speed and low considered as a series of repeated addition. Each step of addition
power consumption. To save significant power consumption in generates a partial products. This repeated addition method that is
VLSI design, it is good to reduce its dynamic power that is major suggested by the arithmetic definition is slow that is almost always
part of total power dissipation. For higher order multiplications, replace by algorithm that makes use of positional representation
a huge number of adders are to be used to perform the partial [14]. This paper addresses high-level optimization techniques
product addition. Reduction of adders by introducing special kind for 8-bit multipliers. High-level techniques refer to algorithm
of adders that are capable to add five/six/seven bits per decade. and architecture level techniques that consider multiplication’s
These adders are called compressors. These compressors make arithmetic features and input data characteristics. Some of the
the multipliers faster as compared to the conventional design. In important algorithm presents in paper for VLSI implementable
this paper we present a comparative study of Array Multiplier, multiplication is Booth Multiplier, Array Multiplier and
Wallace Tree Multiplier, Booth Multiplier for Area, Power, Speed Wallace Tree Multiplier using Xilinx and Cadence tool.
in VLSI design of 8 bit Multipliers.
II. Multiplier
Keywords A basic multiplier can consist of three parts:
Multiplication, Array Multiplier, Wallace Tree Multiplier, Booth • Partial product generation
Multiplier, Compressor, RTL Design • Partial product addition and
• Final addition [12].
I. Introduction A multiplier essentially consist of two operands ,a multiplicand
With the high demanding of electronic portable devices, the “Y” and a multiplier “X” and produces a product “P”. In a
requirement of low power device is getting more attention in conventional multiplier, a number of partial products are formed
recent years. The primary concern of electronic portable device is first by multiplying the multiplicand with each bit of multiplier.
to extend operating hours without changing the battery residing These partial product are then added together to generate the
in device. Although advanced technology enhances battery life Product “P”. In short, we can break down multiplication into
to operate for longer hours, the complicated operations in the two parts ,namely partial product generation and partial product
high- end portable devices are still power hungry and is critical accumulation. Speeding up multiplication therefore must aim
for the low power design. Power dissipation is recognized as a • Speeding up Partial Product Generation (PPG)
critical parameter in modern VLSI design field. Dynamic power • Reduce the number of partial products
dissipation which is a major part of total power dissipation is • Speeding up partial product summation or
due to the charging and discharging capacitance in the circuit. • A combination of one or more of the above [9].
The formula for calculation of dynamic power dissipation is
Pd = CLV2f [14]. Power reduction can be achieved by various II. Basic Multiplication Operation
methods. They are reduction of output capacitance CL ,reduction The most basic form of multiplication consists of forming the
of power supply voltage V, reduction of switching activity and product of two binary numbers m and n. (m×n) bit multiplication
clock frequency. can be viewed as forming n partial product of m bits each, and
With the constant growth of computer applications such as then summing appropriately shifted partial products to produce
computer graphics and signal processing ,fast arithmetic unit an (m+n) bit result P. Binary multiplication is equivalent to
especially multipliers are becoming increasingly important [11]. a logical AND operation. Let A and B be the operands with
Many multimedia and DSP applications are highly multiplication m and n bits respectively. Using shift and add type of approach
intensive so that the performance and power consumption of the product P of these two operands can be represented as shown
these systems are dominated by multipliers [8]. Reducing the in equation.
power consumption for multiplication can be tackled at different
level of design hierarchy [6]. With the capacity, performance and
cost efficiency increasing, the FPGA (Field Programmable
Gate Array) is widely used in communications, digital signal
processing, industrial control systems. Using FPGA to design
digital circuit system has become one of a major method
in these fields. High-performance multiplier is one of the most
important devices [15].
Multiplication is a fundamental operation in most signal
processing algorithm. Multipliers have large area, long latency
and consume considerable power. There has been extensive work
on low-power multiplier at technology, physical circuit and logic
levels. More ever power consumption is directly related to data
switching patterns. Fast multipliers are a key topic in the VLSI
w w w. i j c s t. c o m International Journal of Computer Science And Technology 455

IJCST Vol. 3, Issue 2, April - June 2012 ISSN : 0976-8491 (Online) | ISSN : 2229-4333 (Print)
A Binary Multiplier is an electronic hardware device used in digital and is favorable for layout due to this advantage [13].
electronics or a computer or other electronic device to perform In Array Multiplier, consider two binary numbers A and B, of
rapid multiplication of two numbers in binary representation. m and n bits. There are mn summands that are produced in
It is built using binary adders. The rules for binary multiplication parallel by a set of mn AND gates. n x n multiplier requires
can be stated as follows: n (n-2) full adders, n half-adders and n2 AND gates. Also, in Array
• If the multiplier digit is a 1, the multiplicand is simply copied Multiplier worst case delay would be (2n+1) td [1].
down and represents the product.
• If the multiplier digit is a 0 the product is also 0.
IV. Methods and Performances

There are number of techniques that can be used to perform
multiplication. In general, the choice is based upon factors such
as latency, throughput, area, and design complexity [7]. Array
Multiplier, Booth Multiplier and Wallace Tree Multipliers are
some of the standard approaches to have hardware implementation
of binary Multiplier which are suitable for VLSI implementation
at CMOS level.
A. Array Multiplie
Array Multiplier is an efficient method of a combinational
multiplier. Array Multiplier is common in multiplier design due to
its regular and compact structure. The structure of Array Multiplier
is organized by several stages of adder and AND-gates [2]. It
generates all the partial products after only one AND-gate delay.
Then, it sums up all partial products sequentially. The advantage Fig. 1: 4 x4 Basic Multiplication
of this structure is that the arrangement of its adders is very regular
Fig. 2: Ripple Carry Array Multiplier
456 International Journal of Computer Science And Technology w w w. i j c s t. c o m

ISSN : 0976-8491 (Online) | ISSN : 2229-4333 (Print) IJCST Vol. 3, Issue 2, April - June 2012
Fig. 3:(a), (b) Complete RTL Schematic of Array Multiplier
Array Multiplier gives more power consumption as well as

optimum number of components required, but delay for this
multiplier is larger. It also requires larger number of gates because
of which area is also increased; due to this Array Multiplier is
less economical [4].
B. Wallace Tree Multiplier

A fast process for multiplication of two numbers was developed
by Wallace [7]. Using this method, a three step process is used to
multiply two numbers; the bit products are formed, the bit product
matrix is reduced to a two row matrix where sum of the row equals
the sum of bit products, and the two resulting rows are summed
with a fast adder to produce a final product.
In the Wallace Tree method, three bit signals are passed to
a one bit full adder (“3W”) which is called a three input Wallace
Tree circuit, and the output signal (sum signal) is supplied to the Fig. 4: Logic used in Wallace Tree Multiplier
next stage full adder of the same bit, and the carry output signal
thereof is passed to the next stage full adder of the same no of bit,
and the carry output signal thereof is supplied to the next stage of
the full adder located at a one bit higher position [5].
Fig. 5: Wallace Tree Multiplier

Fig. 6(b): RTL Schematic of Wallace Tree Multiplier
In the Wallace Tree method, the circuit layout is not easy although
the speed of the operation is high since the circuit is quite irregular.
Delay and Power dissipation of Wallace Tree Multiplier is least
whereas Array Multiplier is a best for reduced area applications
but not speed [3].
Fig. 6(a): RTL Schematic of Wallace Tree Multiplier

Table. 1: Possible Arithmetic Actions

Bits Operation Performed
00 no arithmetic operation
01 add multiplicand to left half of product
subtract multiplicand from left half of
10
product
11 no arithmetic operation
To speed up the multiplication Booth encoding performs several

steps of multiplication at once. Booth’s algorithm takes advantage
of the fact that an adder subtractor is nearly as fast and small as
a simple adder.
Fig. 7: Booth Multiplier
From the basics of Booth Multiplication it can be proved that the

addition/subtraction operation can be skipped if the successive
bits in the multiplicand are same. If 2 consecutive bits are same
then addition/subtraction operation can be skipped. Thus in most
of the cases the delay associated with Booth Multiplication are
smaller than that with Array Multiplier. However the performance
of Booth Multiplier for delay is input data dependent. The high
performance of Booth Multiplier comes with the drawback of
power consumption. The reason for this is the large number of
adder cells that consume more power [5].
Fig. 6(c): RTL Schematic of Wallace Tree Multiplier
Fig.6(a)/(b)/(c): Complete RTL Schematic of Wallace Tree

Multiplier
C. Booth Multiplier
One improvement in the multiplier is by reducing the number
of partial products generated. Booth Algorithm based on the
fact that only fewer partial product needs to be generated for a
multiplier consisting of consecutive ones or zeros [10]. The Booth
Multiplier scans the two bits at a time to reduce the number of
partial products

Fig. 8(c): RTL Schematic of Booth Multiplier

Fig. 8(a)/(b)/(c): Complete RTL Schematic of Wallace Tree
Multiplier
D. Compressor
Fig. 8(a): RTL Schematic of Booth Multiplier Compressors are arithmetic components, similar in principle to
parallel counters, but with two distinct differences: (1) they have
explicit carry-in and carry-out bits; and (2) there may be some
redundancy among the ranks of the sum and carry-output bits.
E. 4:2 Compressor
The 4:2 compressor has 4 input bits and produces 2 sum output
bits (out0 and out1 ), it also has a carry-in (cin) and a carry-out
(cout) bit (thus, the total number of input/output bits are 5 and
3); All input bits, including cin, have rank 0; the two output bits
have ranks 0 and 1 respectively, while cout has rank 1 as well.
Thus, the output of the 4:2 compressor is a redundant number;
for example, out1 = 0 and cout = 1 is equivalent to out1 = 1 and
cout = 0 in all cases.
Fig. 9(a). 4:2 Compressor I/O diagram, (b). 4:2 Compressor

Architecture
Fig. 8(b): RTL Schematic of Booth Multiplier

G. 6:2 Compressor
bits (out0 and out1), it also has a carry-in (Cin0, Cin1) and
a carry-out (Cout0, Cout1) bit (thus, the total number of input/
output bits are 8 and 4); All input bits, including Cin0, have rank
0 and Cin1 has rank 1 ; the two output bits have ranks 0 and 1
respectively, while Cout0 has rank 1 and Cout1 has rank 2 as
shown in fig.
Fig. 8(c): RTL Schematic of 4:2 Compressor
F. 5:2 Compressor
bits (sum and cout3), it also has a carry-in (cin1, cin2) and a
carry-out (cout1, cout2,cout3) bit (thus, the total number of input/
output bits are 7 and 4); All input bits, including cin1, have rank
0 and cin2 has rank 2 ; the two output bits have ranks 0 and 1
respectively, while cout 2 has rank 1 and cout1 has rank 2 as
shown in fig. Fig. 11(a): 6:2 Compressor I/O Diagram
Fig. 10: (a) 5:2 Compressor I/O Diagram
Fig. 11(b): 6:2 Compressor Architecture
Fig. 10(b): RTL Schematic of 5:2 Compressor

Table 3: Simulation Result using Cadence

Power Time Dealy
Multiplier
(uW) (psec)
Leakage Dynamic

Power Power
Array
4.869 61.792 2.794
Multiplier
Wallace Tree
4.961 69.617 2.008
Multiplier
Booth
5.589 25.123 1.157
Multiplier
8*8 Multiplier
(4:2 & 5:2 5.178 74.642 2.128
Compressor)
8*8 Multiplier
(4:2 & 6:2 5.690 82.309 1.765
Compressor)
From these simulation results using Xilinx and Cadence tools,

it is evident that the Booth Multiplier requires lesser number of
4 input LUT and time delay than Array Multiplier and Wallace
Tree Multiplier.
IV. Conclusions
It can be concluded that Booth Multiplier is superior in respect like
speed, delay, area, complexity. However Array Multiplier requires
more number of components and large delay than Booth Multiplier.
Fig. 11(c): RTL Schematic of 6:2 Compressor But the advantage of Array structure is that the arrangement of
its adders is very regular and is favorable for layout. Wallace
H. Multiplier Performance and Comparison Tree Multiplier having less delay than Array Multiplier, but due
The performance comparison of Array Multiplier, Wallace Tree to its higher complexity, its layout is complex. Multiplier using
Multiplier, Booth Multiplier are listed in Tables 2 & Table 3 in Compressor having larger area and power but the time delay is
terms of number of occupied slices, number of 4 input LUT using less compare to conventional Array Multiplier and Wallace Tree
Xilinx and power, delay using Cadence tool. The simulation Multiplier. As the Compressor order is increased the time delay
results of number of occupied slices and number of 4 input LUT reduces respectively. Hence for small area requirement and for
are shown in Table 2. The simulation results of power and delay less delay requirement Booth’s Multiplier is suggested.
are shown in Table 3.
References
Table 2: Simulation Result using Xilinx [1] Sumit R. Vaidya, D. R. Dandekar,“Performance Comparison
Number of of Multipliers for Power-Speed Trade-off in VLSI",
Number of Design Recent Advances in Networking VLSI and Signal
Multiplier occupied
4 input LUTs Processing, pp. 262-265
Slices
[2] SriNivas Devadas, Sharad Malik,“A Survey of optimization
Array
126 66 Techniques targating low power VLSI Circuit”, IEEE, DAC
Multiplier
'95, Proceedings of the 32nd annual ACM/IEEE Design
Wallace Automation Conference, 1995.
139 75
Tree Multiplier [3] P.V. Rao, Cyril Prassana Raj P,S. Ravi,“VLSI Design
Booth and Analysis of Multipliers for Low Power”, IEEE Fifth
82 43
Multiplier International Conference On Intelligence Information Hiding
8*8 Multiplier and Multipedia Signal processing, pp. 1354-1357, 2009.
(4:2 & 5:2 139 75 [4] Morris Mano,“Computer System Architecture ”, pp. 346-347,
Compressor) 3rd edition, PHI 1993 Publication.
8*8 Multiplier [5] Sumit R. Vaidya, D.R. Dandekar,"Delay-power Performance
(4:2 & 6:2 148 78 Comparison of Multipliers in VLSI Circuit Design”, IJCNC,
Compressor) Vol. 2, No. 4, pp. 47-56, 2010.
[6] Rizwan Mudassir, Mohab Anis, Javed Jaffari,"Switching
Activity Reduction in Low Power Booth Multiplier”, IEEE
pp. 3306-3309, 2008.

[7] Prvinkumar G. Parate, Prafulla S. Patil, Dr. (Mrs) S.

Subbaraman,“ASIC Implementation of 4 Bit Multipliers”, Prabhat Shukla received his B.Tech
IEEE First International Conference on Emerging Trends in degree in Electronics & Communication
Engineering and Technology, pp. 408-413,2008 from R.K.G.I.T College, Ghaziabad,
[8] Shiann Rong Kuang, Jiun-Ping Wang,"Design of Power- India, in 2009, pursue M.E. degree in
Efficent Configurable Booth Multiplier” , IEEE Trans. On Electronics from PEC University of
Circuits And Systems, Vol. 57, pp. 568-580, Mar. 2010. Technology, Chandigarh, India, in 2010-
[9] Bijoy Josh, Damu Radhakrishnan,"Comparison of Various 2012. His research interests include
Multipliers Techniques”, IEEE, pp. 297-300, 2007. VLSI Design, electronic measurement
[10] Bijoy Josh, Damu Radhakrishnan,"Fast Redundant Binary techniques.
Partial Product Generators for BoothMultiplication”, IEEE,
pp. 297-300, 2007.
[11] Razaidi Hussin, Ali Yeon Md.Shakaff, Norina Idris, Zaliman
Sauli, Rizalafande Che Ismail, Afzan Kamarudin,"An Naveen Kr. Gahlan received his B.Tech
Efficient Modified Booth Multiplier Architecture”, IEEE degree in Electronics & Communication
International Conference on Electronics Design, Dec. from NIT KURUKSHETRA,
2008. Kurukshetra, India, in 2009, pursue
[12] N. Ravi, A.Satish, Dr. T. Jayachandra Prasad, Dr. T. Subba M.E. degree in Electronics from PEC
Rao,"A New Design for Array multiplier with Trade Off in University of Technology, Chandigarh,
power and Area”, IJCSI, Vol. 8,issue 3, 2011. India, in 2010-2012. His research
[13] Ko-Chi Kuo, Chi-WenChou,"Low power and high interests include VLSI Design, electronic
speed multiplier design with row bypassing and parallel measurement techniques.
architecture”, Microelectronics JOURNAL, 41, 2010.
[14] C.N.Marimuthu, P.Thangaraj,“Low Power High Performance
Multiplier”, ICGST-PDCS, Vol. 8, Issue 1, Dec. 2008.
[15] Qian Yi, HAN Jing,"An Improved Design Method for Multi- Jasbir Kaur received his B.Tech degree
bits Reused Booth Multiplier”, IEEE, 2009. in Electronics & Communication from
GNDU College, Amritsar, India, in
1992, the M.E. degree in Electronics
from PEC University of Technology,
Chandigarh, India, in 2009, pursue Phd
degree in VLSI from PEC University
of Technology, Chandigarh, India. Her
research interests include VLSI Design,
electronic measurement techniques,
wireless communication.

Techniques On FPGA Implementation of 8-Bit Multipliers: Prabhat Shukla, Naveen Kr. Gahlan, Jasbir Kaur

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Techniques On FPGA Implementation of 8-Bit Multipliers: Prabhat Shukla, Naveen Kr. Gahlan, Jasbir Kaur

Uploaded by

Copyright:

Available Formats

ISSN : 0976-8491 (Online) | ISSN : 2229-4333 (Print) IJCST Vol.

3, Issue 2, April - June 2012

Techniques on FPGA Implementation of 8-bit Multipliers

Abstract design of high speed processors [1] .

w w w. i j c s t. c o m International Journal of Computer Science And Technology 455

IV. Methods and Performances

Fig. 2: Ripple Carry Array Multiplier

456 International Journal of Computer Science And Technology w w w. i j c s t. c o m

Fig. 3:(a), (b) Complete RTL Schematic of Array Multiplier

Array Multiplier gives more power consumption as well as

B. Wallace Tree Multiplier

Fig. 5: Wallace Tree Multiplier

w w w. i j c s t. c o m International Journal of Computer Science And Technology 457

Fig. 6(b): RTL Schematic of Wallace Tree Multiplier

458 International Journal of Computer Science And Technology w w w. i j c s t. c o m

Table. 1: Possible Arithmetic Actions

To speed up the multiplication Booth encoding performs several

Fig. 7: Booth Multiplier

From the basics of Booth Multiplication it can be proved that the

Fig. 6(c): RTL Schematic of Wallace Tree Multiplier

Fig.6(a)/(b)/(c): Complete RTL Schematic of Wallace Tree

w w w. i j c s t. c o m International Journal of Computer Science And Technology 459

Fig. 8(c): RTL Schematic of Booth Multiplier

Fig. 9(a). 4:2 Compressor I/O diagram, (b). 4:2 Compressor

Fig. 8(b): RTL Schematic of Booth Multiplier

460 International Journal of Computer Science And Technology w w w. i j c s t. c o m

Fig. 8(c): RTL Schematic of 4:2 Compressor

Fig. 10: (a) 5:2 Compressor I/O Diagram

Fig. 11(b): 6:2 Compressor Architecture

Fig. 10(b): RTL Schematic of 5:2 Compressor

w w w. i j c s t. c o m International Journal of Computer Science And Technology 461

Table 3: Simulation Result using Cadence

From these simulation results using Xilinx and Cadence tools,

462 International Journal of Computer Science And Technology w w w. i j c s t. c o m

[7] Prvinkumar G. Parate, Prafulla S. Patil, Dr. (Mrs) S.

w w w. i j c s t. c o m International Journal of Computer Science And Technology 463

You might also like