Design and Implementation of 64-Bit Multiplier Using Vedic Algorithm (Final Report)


Digital Electronic Circuits Term Paper

DESIGN AND IMPLEMENTATION OF 64-BIT MULTIPLIER USING VEDIC ALGORITHM

Presented by:

Sohini Roy 18IE10026


Hardhik Mohanty 18EE10065
Abhinav Japesh 18EE30001
Sunit Kumar Singha 18EE10064
Chinmoyee Chakraborty 18EE30027
Sankalp Srivastava 18EE10069

INTRODUCTION
History

The first column-compression multiplier was introduced by Wallace in 1964. He reduced the partial product rows by grouping them into three-row and two-row sets, using (3,2) counters and (2,2) counters, respectively. In 1965, Dadda altered Wallace's approach by determining the exact placement of the (3,2) and (2,2) counters so as to minimize the critical path delay of the multiplier. The Dadda multiplier is slightly faster than the Wallace multiplier, and the hardware required for the Dadda multiplier is less than for the Wallace multiplier. High-speed multiplication is a primary requirement of high-performance digital systems.
General Approach

The multiplication process begins with the generation of all partial products in parallel using an array of AND gates. The next major steps in the design process are the partitioning of the partial products and their reduction.

The array multiplier is one of the fast multipliers. It has a regular structure and can be designed very easily. An array multiplier multiplies unsigned numbers using full adders and half adders. Each partial sum depends on the previous computation, so the delay to produce the final output is high. CMOS power-gating-based carry-look-ahead adders (CLAs) are used to maximize the speed of the multiplier and to improve the power dissipation with minimum delay. CMOS logic is based on the radix-2 (binary) number system.

The aim of a good multiplier is to provide a compact, high-speed, low-power unit. Most DSP algorithms perform addition and multiplication, and these operations dominate the execution time; that is why there is a need for a high-speed multiplier. The demand for high-speed processing has been increasing as a result of expanding computer and signal processing applications. Low power consumption is also an important issue in multiplier design. To reduce power consumption significantly, it is good to reduce the number of operations, thereby reducing dynamic power, which is a major part of total power consumption. Hence the need for high-speed, low-power multipliers has increased.

Introduction to the floating-point multiplier


The floating-point multiplier is one of the most useful modules in the electronics industry, with applications in digital signal processing, image processing, 3D technology, and the arithmetic unit of a microprocessor. Most of these applications need arithmetic operations done in very little time with great accuracy. The main idea is to minimize the delay, latency, area, and power consumption, and to maximize the accuracy and speed of the multiplication. There are many complex issues to resolve in the handling of floating-point numbers in computers to ensure that the results are consistent when the same program is run on different machines. The most compact representation of a floating-point number defined by IEEE 754 is the 32-bit "single-precision" floating-point number format. Following is the format of an IEEE 754 single-precision floating-point number:

where
S: Sign bit (bit 31) (0 for a positive and 1 for a negative number)
E: Exponent, which is 8 bits wide (bits 23 to 30)
F: Mantissa or fraction, which is 23 bits wide (bits 0 to 22)
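As an illustration (not part of the report's hardware design), the following minimal Python sketch unpacks a number into these three fields; the bit positions match the format above.

```python
import struct

def decode_ieee754_single(x: float):
    """Unpack a value into the IEEE 754 single-precision fields S, E, F."""
    # Reinterpret the 32-bit pattern of the float as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = (bits >> 31) & 0x1      # sign bit (bit 31)
    e = (bits >> 23) & 0xFF     # 8-bit biased exponent (bits 23..30)
    f = bits & 0x7FFFFF         # 23-bit fraction/mantissa (bits 0..22)
    return s, e, f

# Example: -6.5 = -1.101b * 2^2, so S = 1, E = 2 + 127 = 129, F = 0.625 * 2^23
print(decode_ieee754_single(-6.5))  # (1, 129, 5242880)
```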
In the Wallace method, the partial products are reduced as soon as possible, whereas Dadda's method does only the minimum reduction necessary at each level. Due to this, the adders have different delays and different areas. It can therefore be concluded that the speed of multiplication in the Dadda multiplier, with either ripple-carry adders or carry-look-ahead adders, is greater than that of the Wallace multiplier, and the complexity is also reduced in the Dadda multiplier.
The Dadda Algorithm:

● An algorithm invented by computer scientist Luigi Dadda in 1965.
● One of the fastest algorithms for multiplying two numbers.
● Similar to the Wallace multiplication algorithm; both have the same three steps.
● As in the Wallace algorithm, the bit products of the first step carry different weights, reflecting the positions of the original bits in the multiplication.
● Unlike the Wallace algorithm, which reduces as much as possible on each layer, the Dadda algorithm attempts to minimize the number of gates used, as well as the input/output delay.

Steps to multiply two floating-point numbers using the Dadda algorithm

● Form the partial products by multiplying (logical AND-ing) the two multiplicands bit by bit, resulting in a matrix known as the partial product matrix.
● Depending on the position of the multiplied bits, the products of the first stage carry different weights.
● Reduce the number of partial products using full and half adders until two rows are left.
● Group the products into two numbers and add them with a conventional adder.

Procedure to reduce the partial products to two rows

● Let d1 be the minimum reduced height, i.e. d1 = 2, and let d(j+1) = floor(1.5 · dj), where dj is the target height of the partial product matrix at the j-th stage.
● Repeat this calculation until the largest required height is reached.
● Find the largest j such that at least one column of the original partial product matrix has more than dj bits.
● In the j-th stage, starting from the LSB column, apply full and/or half adders to reduce the height of each column until no column has more than dj bits for that stage.
● Let j = j − 1 and repeat the above step until the height of each column is reduced to two (a software sketch of the stage heights follows this list).
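The sequence of stage heights can be computed directly. The following short Python sketch is an illustration, not the report's implementation; it generates the schedule for a 64-bit multiplier and reproduces the stage-count table given later in this report.

```python
def dadda_stage_heights(n_bits: int):
    """Target column heights d1, d2, ... used by the Dadda reduction.
    d1 = 2 and d(j+1) = floor(1.5 * dj); only heights strictly below the
    initial matrix height n_bits are ever used as reduction targets."""
    d, heights = 2, []
    while d < n_bits:
        heights.append(d)
        d = (3 * d) // 2            # floor(1.5 * d) for integer d
    return heights

# A 64 x 64 partial product matrix starts at height 64, so the reduction
# targets are 63, 42, 28, 19, 13, 9, 6, 4, 3, 2 -- ten stages in total.
print(dadda_stage_heights(64))      # [2, 3, 4, 6, 9, 13, 19, 28, 42, 63]
print(len(dadda_stage_heights(64))) # 10
```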
Advantages of the algorithm

● Faster than the Wallace algorithm.
● Requires fewer gates, hence less area.
● The reduction phase is less expensive.

Disadvantages of the algorithm

● The final numbers may be a few bits longer, thus requiring slightly bigger adders.
● More complex than the Wallace algorithm.

Dot Diagram
VEDIC MATHS SUTRA (URDHVA TIRYAGBHYAM)

INTRODUCTION:

● Urdhva-Tiryagbhyam is an ancient Vedic sutra (formula) for multiplication; its name means "vertically and crosswise" multiplication.
● With this algorithm, the number of stages required for multiplication can be decreased and, as a consequence, the speed of the multiplication is increased.
● The Urdhva-Tiryagbhyam algorithm is considered the best algorithm for binary multiplication in terms of area and delay.

ILLUSTRATION:

Multiplication of two 4-bit binary numbers by Urdhva-Tiryagbhyam sutra:

1. The 4-bit binary numbers to be multiplied are written on two consecutive sides of the square, as shown in the figure.

2. The square is divided into rows and columns, where each row/column corresponds to one of the digits of either the multiplier or the multiplicand.

3. Thus, each bit of the multiplier has a small box common with a digit of the multiplicand.

4. Each bit of the multiplier is then independently multiplied (logical AND) with every bit of the multiplicand, and the product is written in the common box.

5. All the bits lying on a crosswise dotted line are added to the previous carry.

6. The least significant bit of the obtained number acts as the result bit and the rest as the carry for the next step.

7. The carry for the first step (i.e., the dotted line on the extreme right side) is taken to be zero.

We can extend this method for higher-order binary numbers.

URDHVA-TIRYAGBHYAM MULTIPLIER MODEL:

With this algorithm, the number of stages required for multiplication is decreased and, as a consequence, the speed of the multiplication is increased.

Let us take an example of two 4-bit numbers, as shown below.

The two inputs are A3A2A1A0 and B3B2B1B0, the product bits are P7P6P5P4P3P2P1P0, and the intermediate partial products are t0 to t6.

The partial products obtained are given below:

Stage 1: t0 = A0B0
Stage 2: t1 = A1B0 + A0B1
Stage 3: t2 = A2B0 + A1B1 + A0B2
Stage 4: t3 = A3B0 + A2B1 + A1B2 + A0B3
Stage 5: t4 = A3B1 + A2B2 + A1B3
Stage 6: t5 = A3B2 + A2B3
Stage 7: t6 = A3B3
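As a software illustration of the sutra (not the report's hardware), the following Python sketch evaluates the column sums t0 to t6 stage by stage, taking the LSB of each stage as the result bit and rippling the rest into the next stage as the carry.

```python
def urdhva_4x4(a: int, b: int) -> int:
    """Multiply two 4-bit numbers with the Urdhva-Tiryagbhyam column sums
    t0..t6 listed above, propagating the carry between stages."""
    A = [(a >> i) & 1 for i in range(4)]   # A0..A3
    B = [(b >> i) & 1 for i in range(4)]   # B0..B3
    product, carry = 0, 0                  # carry into the first stage is zero
    for k in range(7):                     # stages 1..7 produce t0..t6
        # t_k is the sum of all cross products A_i * B_j with i + j = k.
        t = sum(A[i] * B[k - i] for i in range(4) if 0 <= k - i < 4)
        total = t + carry
        product |= (total & 1) << k        # LSB of the stage is the result bit
        carry = total >> 1                 # the rest carries into the next stage
    product |= carry << 7                  # the final carry forms the top bit
    return product

print(urdhva_4x4(0b1011, 0b1101))          # 11 * 13 = 143
```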
LINE DIAGRAM FOR URDHVA-TIRYAGBHYAM MULTIPLIER:

FAST DADDA MULTIPLIER USING URDHVA TIRYAGBHYAM:

The Dadda multiplier consists of three stages. In the first stage, the partial product matrix is formed. In the second stage, this partial product matrix is reduced to a height of two, and in the final stage, these two rows are combined using a ripple carry adder. Among these three stages, the partial product reduction and summation stages have the maximum delay. However, this delay can be reduced with the help of a Vedic sutra known as Urdhva Tiryagbhyam (meaning "vertically and crosswise"). The Urdhva Tiryagbhyam algorithm is considered the best algorithm for binary multiplication in terms of area and delay. With the help of this Vedic sutra, the second and third stages can be performed simultaneously, which leads to a significant reduction in delay. A ripple carry adder is used to combine the last two rows obtained after partial product reduction. A ripple carry adder is a logic circuit in which the carry-out of each full adder is the carry-in of the next most significant full adder. This adder cascades multiple full adders to add an N-bit number. It is called a ripple carry adder because each carry bit gets rippled into the next stage.
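For illustration, the following Python sketch (not the report's circuit) models this behavior at the bit level, cascading one full adder per bit position.

```python
def full_adder(a: int, b: int, cin: int):
    """One full adder: sum and carry-out from two operand bits and a carry-in."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x: int, y: int, n_bits: int) -> int:
    """Add two n-bit numbers by cascading full adders; each carry-out
    'ripples' into the carry-in of the next most significant stage."""
    result, carry = 0, 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << n_bits)       # final carry-out becomes the top bit

print(ripple_carry_add(0b1011, 0b0110, 4))  # 11 + 6 = 17
```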
Algorithm
Floating-point representation

Our proposed work is to design a fast floating-point multiplication unit. To improve the performance of the multiplier, we minimize delays in most stages of the arithmetic calculation. This is done by using the Dadda algorithm for mantissa multiplication and the Kogge-Stone adder for the addition of the partial products of the mantissa. Along with these, we use a shifting mechanism instead of AND gates for forming the partial products of the mantissa in the Dadda algorithm.

Normally the following steps are necessary to multiply two floating-point numbers:

1. Multiplying the significands, i.e. (1.M1 × 1.M2)

2. Placing the binary point in the result.

3. Adding the exponents and subtracting the bias, i.e. (E1 + E2 − Bias)

4. Obtaining the sign of the final result, i.e. (S1 XOR S2).

5. Normalizing the result, i.e. obtaining a 1 at the MSB of the significand multiplication's result.

6. Rounding the result to fit in the given floating-point number format.

7. Checking for underflow or overflow conditions.
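For illustration, here is a minimal Python sketch of steps 1 to 7 operating on raw (S, E, F) fields. It assumes normalized inputs, truncates instead of performing full IEEE rounding, and only flags overflow/underflow rather than handling them.

```python
def fp32_multiply(s1, e1, f1, s2, e2, f2):
    """Sketch of the multiplication steps on single-precision (S, E, F) fields."""
    BIAS = 127
    sign = s1 ^ s2                              # step 4: sign of the result
    m1, m2 = (1 << 23) | f1, (1 << 23) | f2     # restore the hidden leading 1
    prod = m1 * m2                              # step 1: 24 x 24 -> 48-bit product
    exp = e1 + e2 - BIAS                        # step 3: add exponents, subtract bias
    if prod & (1 << 47):                        # step 5: normalize (product in [2, 4))
        prod >>= 1
        exp += 1
    frac = (prod >> 23) & 0x7FFFFF              # step 6: truncate to 23 fraction bits
    if not (0 < exp < 255):                     # step 7: overflow/underflow check
        raise OverflowError("exponent out of range")
    return sign, exp, frac

# 1.5 * 2.5 = 3.75: (0,127,0x400000) * (0,128,0x200000) -> (0, 128, 0x700000)
print(fp32_multiply(0, 127, 0x400000, 0, 128, 0x200000))
```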

The Dadda Algorithm:

Dadda multipliers have 3 steps:

1. Multiply (that is, AND) each bit of one of the arguments by each bit of the other, yielding N² results.

2. Reduce the number of partial products to two rows using full and half adders. For this, the Dadda reduction scheme uses the following algorithm:

a) Let d1 = 2 and d(j+1) = floor(3 · dj / 2), where dj is the matrix height for the j-th stage from the end. Find the largest j such that at least one column of the matrix has more than dj bits.

b) Employ (3,2) and (2,2) counters to obtain a reduced matrix with no more than dj elements in any column.

c) Let j = j − 1 and repeat step (b) until a matrix with only two rows is generated.

3. Group the wires into two numbers and add them with a conventional adder.
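The reduction schedule can be illustrated in software by tracking only the column heights. The following Python sketch is an illustration under simplifying assumptions, not the report's circuit: it applies (3,2) and (2,2) counters column by column until every column is at most two bits tall.

```python
def dadda_reduce_heights(heights):
    """Track only the column heights of a partial product matrix and apply
    (3,2) counters (full adders) and (2,2) counters (half adders) until no
    column holds more than two bits. heights[k] is the number of dots in
    the column of weight 2^k; wiring details are ignored."""
    # Build the schedule d1 = 2, d(j+1) = floor(3*dj/2), stopping below the max height.
    targets, d = [], 2
    while d < max(heights):
        targets.append(d)
        d = (3 * d) // 2
    for target in reversed(targets):          # largest applicable dj first
        heights = heights + [0]               # room for carries past the last column
        for k in range(len(heights) - 1):
            while heights[k] > target:
                if heights[k] == target + 1:  # (2,2) counter: column loses 1 dot
                    heights[k] -= 1
                else:                         # (3,2) counter: column loses 2 dots
                    heights[k] -= 2
                heights[k + 1] += 1           # either way, 1 carry dot moves up
        while heights and heights[-1] == 0:   # drop unused overflow columns
            heights.pop()
    return heights

# Column heights of an 8 x 8 partial product matrix: 1, 2, ..., 8, ..., 2, 1.
h = list(range(1, 9)) + list(range(7, 0, -1))
print(dadda_reduce_heights(h))                # every entry is now at most 2
```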

● The closed block is a half adder.
● The open block is a full adder, open for adding the carry.

Bits in Multiplier (N)    Number of Stages
N = 3                     1
N = 4                     2
5 ≤ N ≤ 6                 3
7 ≤ N ≤ 9                 4
10 ≤ N ≤ 13               5
14 ≤ N ≤ 19               6
20 ≤ N ≤ 28               7
29 ≤ N ≤ 42               8
43 ≤ N ≤ 63               9
64 ≤ N ≤ 94               10
Flow chart of the 16 × 16 Dadda multiplier
16 × 16 Dadda adder
MAIN BLOCKS OF FLOATING POINT MULTIPLIER
IEEE 754 single-precision floating-point format:

A. SIGN CALCULATOR BLOCK:

The sign bit (bit 31) of a floating-point number indicates whether the number is positive (S = 0) or negative (S = 1). Inside this block, the MSB (sign bit) of the product is obtained by providing the inputs S1 and S2 to an XOR gate. Hence, when the multiplicand and multiplier are both positive or both negative, the sign bit of the product is 0 (0 XOR 0 = 0; 1 XOR 1 = 0), and when they have opposite signs, the sign bit of the product is 1 (0 XOR 1 = 1; 1 XOR 0 = 1).

B. EXPONENT CALCULATOR BLOCK:

The exponent (E) of a floating-point number is represented as an 8-bit field from bit 23 to bit 30. Inside this block, the exponents of the multiplicand (E1) and the multiplier (E2) are added together using an 8-bit adder circuit. Next, the bias is subtracted from this result by making use of the 2's complement representation. In floating-point arithmetic, a biased exponent is essential to make the range of the exponent non-negative. In this block, the bias is constant; in our case it is 127, whose 8-bit 2's complement representation is 10000001.

C. MANTISSA MULTIPLIER UNIT:

In this block, fixed-point multiplication is done with the help of the Dadda algorithm, one of the fastest ways of multiplying two numbers. Here, instead of AND gates, a shifting technique is used to generate the partial product matrix. The matrix bits are arranged so that bits having the same weight are in the same column, and the columns follow a fixed rule: from the extreme right, the first column holds bits of weight 1, the second column bits of weight 2, the third column bits of weight 4, and so on. This partial product matrix is then reduced to two rows with the help of half adders and/or full adders, according to the height of each column (i.e., how many bits are present in that column). Finally, the elements of the two rows are added to get the mantissa multiplication result, which is 46 bits.
D. NORMALIZATION UNIT:

A normalized number has a leading '1' immediately to the left of the binary point in the 46-bit mantissa multiplication result. The exponent is adjusted according to the position of the binary point. The leading '1' is skipped from the result, and the remaining bits after the point are truncated to 23 bits, which form the mantissa of the final product. If the binary point is shifted to the right, the exponent is increased, and if it is shifted to the left, the exponent is decreased; the change in the exponent value equals the number of positions by which the point is shifted. The truncation serves as the rounding of the result.
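The following Python sketch illustrates the idea; it assumes a 48-bit product of two 24-bit significands with its binary point two places below the MSB position (the report's datapath works on 46 bits, but the mechanism is the same).

```python
def normalize_and_truncate(product: int, exponent: int, prod_width: int = 48):
    """Sketch of the normalization unit for a significand product whose
    value lies in [1, 4): shift so a single leading 1 precedes the point,
    adjust the exponent, skip the hidden 1, and truncate to 23 bits."""
    point = prod_width - 2                   # bit position of the binary point
    if product >> (point + 1):               # leading 1 one place left of the point
        exponent += 1                        # point moves right -> exponent grows
        point += 1
    frac = product & ((1 << point) - 1)      # drop the leading (hidden) 1
    return exponent, frac >> (point - 23)    # truncate the fraction to 23 bits

# 1.5 * 2.5: significand product 1.875 * 2^46 with biased exponent 128
sig = int(1.875 * (1 << 46))
print(normalize_and_truncate(sig, 128))      # (128, 0x700000), i.e. 3.75
```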

E. SHIFTING MECHANISM:

First, we concatenate 22 zeros to the extreme left of the mantissa of the first input operand. The next step is to multiply (logical AND) all bits of operand 1 with the first bit of operand 2. Instead of AND gates, we use a simple shifting mechanism to get the partial product: if the first bit of operand 2 is logic high (1), operand 1 is copied down as it is; if the first bit of operand 2 is logic low (0), a row of all zeros (45 zeros) is copied down. This forms the first row of the partial product matrix. Next, consider the second bit of operand 2: if it is 1, operand 1 is shifted left by one position and then copied down; if it is 0, again a row of all zeros is copied down. This forms the second row of the partial product matrix. Similarly, a row is produced for each bit of operand 2, and in this way the partial product matrix is formed. Then row-wise addition is done with the help of half adders and full adders to reduce the partial product matrix to two rows, and the final two rows are added to get the mantissa result. This is nothing but a fixed-point multiplier. Due to the use of a shifting mechanism instead of AND gates, the area of the design is minimized to a great extent.
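The following Python sketch illustrates the mechanism (an illustration, not the report's hardware): each row is operand 1 shifted left by the bit position, or zero. The 22-zero padding is implicit in Python integers; with a 23-bit mantissa every row fits in 45 bits, as in the report.

```python
def shift_partial_products(m1: int, m2: int, n: int = 23):
    """For each bit of operand 2, copy operand 1 shifted left by the bit
    position (bit = 1) or a row of all zeros (bit = 0). With n = 23 the
    rows are at most 2n - 1 = 45 bits wide."""
    rows = []
    for i in range(n):
        if (m2 >> i) & 1:            # bit i of operand 2 is 1:
            rows.append(m1 << i)     # operand 1 shifted left by i, copied down
        else:
            rows.append(0)           # bit i is 0: a row of all zeros
    return rows

rows = shift_partial_products(0b101, 0b011, n=3)  # 5 * 3, rows 5 bits wide
print([format(r, "05b") for r in rows])           # ['00101', '01010', '00000']
print(sum(rows))                                  # 15
```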
DADDA MULTIPLIER (CIRCUIT IMPLEMENTATION)

SHIFT OPERATION (GENERATING FLOATING-POINT NUMBERS IN SCIENTIFIC NOTATION):

EXPONENT GENERATION (STANDARD-FORM MEMORY REPRESENTATION OF FLOATING-POINT NUMBERS)

Conclusion

This term paper presents a very fast floating-point multiplier unit that supports the IEEE 754 single-precision binary floating-point number format. The multiplier gives its output in only one clock cycle and achieves a maximum frequency of 851 MHz with a 433-slice area on a Virtex-6 FPGA. To reduce delays in the formation of the partial product matrix, a left-shifting technique is used instead of AND gates, resulting in a tremendous decrease in slice utilization. We used the Vedic maths sutra Urdhva Tiryagbhyam and the Dadda algorithm to increase the speed of the multiplier. The speed of the multiplier is increased, but the number of LUT-flip-flop pairs used also increases. We plan to extend our work by parameterizing the floating-point multiplier unit.

