
2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN)

An Area Efficient 16-bit Logarithmic Multiplier


M B K Chaitanya1, Y Sai Teja2, K Ram Teja3, Prof. G Ragunath4

School of Electronics Engineering

Vellore Institute of Technology, Vellore, India

maddulachaitu@gmail.com1, saitejareddy2201@gmail.com2, kothamasuramteja@gmail.com3, ragunath.g@vit.ac.in4.

Abstract – Digital signal processing applications rely heavily on arithmetic operations such as multiplication, which consume considerable power and time. Operations such as the Fast Fourier Transform, convolution and correlation depend on a large number of multiplications. Many techniques are available to perform multiplication; one of them is logarithmic multiplication, which is achieved by adding the binary logarithms of two numbers and taking the antilogarithm of the result. In this paper, an efficient algorithm for logarithmic multiplication is presented that uses adders, decoders, multiplexers and a few combinational circuits to reduce the power and area of the multiplier.

Keywords - Logarithmic number system, Digital Signal Processing, logarithmic multiplication, Verilog HDL, Multiplexer.

I. INTRODUCTION

Digital Signal Processing applications are required to perform a large number of arithmetic operations. Advancements in the field of integrated circuits have led to the integration of arithmetic operations into integrated circuits. This integration improved the speed of operation and increased reliability. It is found that digital arithmetic operations such as multiplication, addition and square root consume 86% of the total data processing time [10]. Of these, multiplication is the most area-consuming and time-consuming operation. Many DSP applications do not demand exact results. Signal processing deals with signals generated by non-ideal sensors that add noise to them, and quantization and amplification of such signals do not yield exact results either. So, the speed of operation, the power consumed and the area occupied take priority [11], [3].

Multiplication is one of the major operations in these applications, and the multiplication block plays a major role in the optimization of a DSP application. There are many ways in which two numbers can be multiplied. One such way is to convert the numbers into the Logarithmic Number System (LNS) [6]. The LNS converts multiplication and division into addition and subtraction respectively. In this way multiplication can be directly substituted with addition, which improves area, latency and power. However, logarithmic conversions of binary numbers are not exact [11], so the LNS introduces errors into the output.

In order to reduce the error generated by the Logarithmic Number System, iterative and non-iterative methods are used.

The Mitchell algorithm (MA) is one of the non-iterative multiplication methods proposed in [7]. In MA, log(1+m) is approximated as m to reduce the complexity of computing logarithms. Here m represents the mantissa of a number. However, MA has been shown to produce an error of nearly 11% in the product, as stated in [1].

To overcome such errors, an iterative algorithm similar to the Mitchell algorithm was proposed in [8]. In this method, the product is given by the sum of the approximate product and the error. The error here refers to the residues that are discarded in the process [2]. These residues are fed back into the algorithm and the resulting products are added to obtain a result with the least possible error. In our presented architecture, we optimized the performance of the algorithm by redesigning the barrel shifter and leading one detector [9] proposed by [1].

The paper is organized as follows. Section II discusses methods of multiplication in a logarithmic multiplication system. The Mitchell and iterative logarithmic multipliers are the two methods presented, with the appropriate mathematical equations. In Section III, the multiplication algorithm is presented. Section IV gives an analysis of the modified designs and verification of the modified design. Sections V and VI present the results and the conclusion respectively.

II. REVIEW

In this section, both the conventional Mitchell's algorithm based logarithmic multiplier and the iterative logarithmic multiplier are explained briefly.

A. Mitchell's Algorithm Based Multiplier

The logarithmic number system is used to simplify multiplication, especially in cases where accuracy is not a major concern. Mitchell's algorithm (MA) is one of the most significant multiplication methods in the logarithmic number system [5], [3].

The binary representation of number N can be written as:

$$N = 2^{x}\left(1 + \sum_{n=0}^{x-1} 2^{\,n-x} Z_n\right) = 2^{x}(1 + m) \tag{1}$$

978-1-5386-9353-7/19/$31.00 ©2019 IEEE

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY CALICUT. Downloaded on August 28,2020 at 13:55:42 UTC from IEEE Xplore. Restrictions apply.
where x is the position of the most significant bit with value '1', $Z_n$ is the value of the bit in the nth position, and m is the fraction or mantissa. The base-2 logarithm of N is then

$$\log_2 N = x + \log_2(1 + m) \tag{2}$$

The values of the logarithm and antilogarithm must be approximated; the approximation can be derived from the binary representation of the numbers.

The logarithm of the product is given as:

$$\log_2(N_1 N_2) = x_1 + x_2 + \log_2(1 + m_1) + \log_2(1 + m_2) \tag{3}$$

The expression $\log_2(1 + m)$ is approximated with m; therefore, logarithmic solutions are a trade-off between time consumption and accuracy. The logarithm of the product of the two numbers is expressed as the sum of their characteristic numbers and mantissas:

$$\log_2(N_1 N_2) \approx x_1 + x_2 + m_1 + m_2 \tag{4}$$

The characteristic numbers $x_1$ and $x_2$ represent the positions of the most significant bit with the value '1'. The fractions or mantissas $m_1$ and $m_2$ are in the range [0, 1).

The final approximate multiplication result $P_{MA} \approx N_1 N_2$ of MA depends on the carry bit from the sum of the mantissas and is given by

$$P_{MA} = \begin{cases} 2^{x_1+x_2}\,(1 + m_1 + m_2), & m_1 + m_2 < 1 \\ 2^{x_1+x_2+1}\,(m_1 + m_2), & m_1 + m_2 \ge 1 \end{cases} \tag{5}$$

The final approximation for the product in equation (5) requires the comparison of the sum of the mantissas with '1'. The error introduced here is always positive, since $\log_2(1+m)$ is always greater than or equal to m, and it ranges from 0 to 11%. Various methods have been proposed to reduce this error. Some of them are the Operand Decomposition method [1], the use of look-up tables, and segmentation and interpolation methods [11].

B. An Iterative Logarithmic Multiplier

This method is similar to MA but does not include the logarithmic approximation. With this iterative method, the error can be made as small as required, and an exact result can even be obtained.

The exact expression for the multiplication can be written using equation (1):

$$P_{true} = N_1 N_2 = 2^{x_1+x_2}(1 + m_1 + m_2) + 2^{x_1+x_2}\, m_1 m_2 \tag{6}$$

To avoid the approximation error, we consider the following equation derived from equation (1):

$$m \cdot 2^{x} = N - 2^{x} \tag{7}$$

By using equations (6) and (7):

$$P_{true} = 2^{x_1+x_2} + (N_1 - 2^{x_1})\,2^{x_2} + (N_2 - 2^{x_2})\,2^{x_1} + (N_1 - 2^{x_1})(N_2 - 2^{x_2}) \tag{8}$$

$$P_{true} = P_{approx} + E \tag{9}$$

where $P_{true}$ is the exact product, $P_{approx}$ is the approximate product formed by the first three terms of (8), and E is the error:

$$E = (N_1 - 2^{x_1})(N_2 - 2^{x_2}) \tag{10}$$

In equation (10), the error E is itself a product of two binary numbers, which are obtained simply by removing the leading '1' from the numbers $N_1$ and $N_2$. The multiplication procedure is then continued with these new multiplicands, $(N_1 - 2^{x_1}) \cdot (N_2 - 2^{x_2})$, in the same way as for $P_{approx}$, and repeated for a certain number of iterations to obtain the exact result [4].

We can now calculate the approximate value of E and add it to the approximate product $P_{approx}$ as a correction term K, which reduces the approximation error [1].

$$P^{(1)}_{approx} = P_{approx} + K^{(1)} \tag{11}$$

If we repeat this multiplication procedure with i correction terms, we can approximate the product as:

$$P^{(i)}_{approx} = P_{approx} + K^{(1)} + K^{(2)} + \dots + K^{(i)} \tag{12}$$

C. Block Diagram

[Block diagram: two LODs, two encoders, two left barrel shifters, three adders and a decoder]

Fig. 1. Basic Block Diagram

III. ALGORITHM

1. The inputs $N_1$ and $N_2$ are n-bit binary numbers to be multiplied; the output $P_{approx}$ is their 2n-bit product.

2. $N_1$ and $N_2$ are taken as inputs to the Leading One Detectors (LODs); the outputs of the LODs are $2^{x_1}$ and $2^{x_2}$, where $x_1$ and $x_2$ are the leading one positions of $N_1$ and $N_2$.
3. With $2^{x_1}$ and $2^{x_2}$ as inputs, the encoders calculate the values of $x_1$ and $x_2$.
4. $(N_1 - 2^{x_1})$ and $(N_2 - 2^{x_2})$ are the outputs of the two XOR banks, whose inputs are the operands and the outputs of the LODs.
5. Using the barrel shifters, $(N_1 - 2^{x_1})$ is left-shifted by $x_2$ bits and $(N_2 - 2^{x_2})$ is left-shifted by $x_1$ bits, giving $(N_1 - 2^{x_1})\,2^{x_2}$ and $(N_2 - 2^{x_2})\,2^{x_1}$.
6. These results are added using a 32-bit adder to obtain the sum $(N_1 - 2^{x_1})\,2^{x_2} + (N_2 - 2^{x_2})\,2^{x_1}$.
7. The values of $x_1$ and $x_2$ obtained in step 3 are added and the result is given as an input to the decoder, which gives the output $2^{x_1+x_2}$.
8. The results obtained in step 6 and step 7 are added to give the output $2^{x_1+x_2} + (N_1 - 2^{x_1})\,2^{x_2} + (N_2 - 2^{x_2})\,2^{x_1}$.
9. The outputs of the XOR banks are taken as the error operands and the above procedure is repeated; the exact product is reached after some number of iterations.

IV. MODIFIED ITERATIVE MULTIPLIER

The existing iterative logarithmic multiplier does not provide an optimised hardware architecture and speed. We propose an area-efficient, low-power VLSI architecture of the iterative logarithmic multiplier, as shown in Fig. 1, by redesigning the circuit with a smaller number of logic gates to obtain a further improvement in area and power.

The contribution of this work includes replacing the priority encoder block with an encoder, a modified LOD block, and a reduction in the number of stages and multiplexers in the barrel shifter, which results in reduced power and area compared to the existing work. This section presents a detailed circuit analysis of the approach; the improvements are discussed in the following subsections.

A. Leading One Detector

LODs are used to find the most significant one, or leading bit, in a number. Here the LOD units are used to extract the leading one from the operands. The leading one detector discussed in [9] uses a 2:1 multiplexer as the fundamental block. Here the same logic is implemented by replacing the 2:1 multiplexer with logic gates, which can reduce the delay and power [8].

The existing circuit of a 4-bit LOD is shown in Fig. 2; it is designed using 2-input fundamental gates such as AND, OR and NOT.

Fig. 2. 4-bit LOD

The architecture of the 16-bit LOD is shown in Fig. 3. It is implemented using five 4-bit LODs and sixteen 2-input AND gates. It has a total of 3 stages: the first stage has four 4-bit LODs, the fifth LOD forms the intermediate stage, and the final stage consists of an array of 2-input AND gates.

Fig. 3. 16-bit LOD

The modified LOD architecture shown in Fig. 4 is reduced to a 44-gate circuit with only 29 AND gates and 15 NOT gates. The proposed architecture considerably reduces the power and area.

Fig. 4. Modified LOD
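To make the data path concrete, the following Python sketch models a 16-bit LOD built from five 4-bit LODs and sixteen AND gates, and then uses it in the steps of Section III. It is an illustrative behavioural model under stated assumptions, not the authors' Verilog: the function names (`lod4`, `lod16`, `basic_block`, `iterative_multiply`) are hypothetical, and the per-nibble non-zero flags that feed the intermediate LOD are an assumed detail that the text does not spell out.

```python
def lod4(nibble):
    """4-bit LOD: one-hot output marking the most significant '1' (0 if none)."""
    b3, b2, b1, b0 = [(nibble >> k) & 1 for k in (3, 2, 1, 0)]
    o3 = b3
    o2 = (1 - b3) & b2
    o1 = (1 - b3) & (1 - b2) & b1
    o0 = (1 - b3) & (1 - b2) & (1 - b1) & b0
    return (o3 << 3) | (o2 << 2) | (o1 << 1) | o0


def lod16(n):
    """16-bit LOD in three stages: four lod4 units, one selecting lod4, AND gating."""
    nibbles = [(n >> (4 * i)) & 0xF for i in range(4)]
    first_stage = [lod4(x) for x in nibbles]
    # Assumption: each nibble contributes a "non-zero" flag to the intermediate LOD.
    flags = sum((1 if x else 0) << i for i, x in enumerate(nibbles))
    select = lod4(flags)                      # one-hot: highest non-zero nibble
    out = 0
    for i in range(4):
        gate = (select >> i) & 1              # shared input of four AND gates
        out |= (first_stage[i] * gate) << (4 * i)
    return out


def basic_block(n1, n2):
    """Steps 2-8 of Section III: approximate product and the two residues."""
    if n1 == 0 or n2 == 0:
        return 0, (0, 0)                      # nothing left to correct
    p1, p2 = lod16(n1), lod16(n2)             # step 2: 2**x1 and 2**x2
    x1, x2 = p1.bit_length() - 1, p2.bit_length() - 1   # step 3: encoders
    r1, r2 = n1 ^ p1, n2 ^ p2                 # step 4: XOR banks give N - 2**x
    shifted_sum = (r1 << x2) + (r2 << x1)     # steps 5-6: shifters and adder
    approx = (1 << (x1 + x2)) + shifted_sum   # steps 7-8: decoder and adder
    return approx, (r1, r2)


def iterative_multiply(n1, n2, corrections):
    """Step 9: add `corrections` correction terms computed from the residues."""
    product, (r1, r2) = basic_block(n1, n2)
    for _ in range(corrections):
        term, (r1, r2) = basic_block(r1, r2)
        product += term
    return product


if __name__ == "__main__":
    print(iterative_multiply(234, 198, 0), iterative_multiply(234, 198, 16), 234 * 198)
```

For N1 = 234 and N2 = 198 the basic block alone gives 38912 against the exact 46332; the missing 7420 is the product of the residues 106 and 70, which the correction iterations recover. Each iteration removes one leading '1' from each residue, so 16 corrections are always enough for 16-bit operands in this model.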

Table I. Results of LOD circuits

LOD circuit    Cell Area (µm²)    Power (nW)
Existing       154                2520.267
Proposed       104                2128.569

B. Encoder

In the existing iterative logarithmic multiplier, a priority encoder is used to calculate the position of the leading '1' bit in the operands N1 and N2. A priority encoder detects the position of the leading '1' bit even when the input has more than one bit set simultaneously. In the logarithmic multiplier, however, the input to the priority encoder is the output of the LOD, which contains only one bit set to '1'. Therefore, we can use a plain encoder, which detects the position of the single bit with value '1'.

Here we take the 4-bit priority encoder shown in Fig. 5 and the 4-bit encoder shown in Fig. 6 and compare them with respect to the number of logic gates required to implement them.

Fig. 5. Priority Encoder

Fig. 6. Encoder

From Fig. 5 and Fig. 6, we can conclude that a 16-bit encoder requires a smaller number of logic gates than a 16-bit priority encoder.

Table II. Results of Modified Encoder circuits

Circuit             Area (µm²)    Power (nW)
Priority Encoder    118           2757.707
Encoder             48            977.125

C. Barrel Shifter

A barrel shifter can be designed by combining 2:1 multiplexer circuits connected in a specific order to generate the desired number of shifts of the input. The number of selection lines defines the number of stages of multiplexer arrays that are needed for the shift [10]. The basic idea behind the barrel shifter is multiplication of a binary number by a power of two: when a binary number is shifted by n bits to the left, the output is the number multiplied by 2^n.

A 32-bit barrel shifter needs 5 selection lines. In previous works, a basic left barrel shifter is used. In our algorithm, the shifter receives only 4 selection lines, leaving the remaining selection line at zero. When a selection line is 0, the shifter stage assigned to that selection line does not perform any shifting operation but returns its input as output. This means that removing the 1st stage of the shifter does not affect the output generated. So the 1st stage, which can shift by 16 bits, is removed, resulting in a reduction of 32 2:1 multiplexers.

As an example, we take the 8-bit barrel shifter shown in Fig. 7 to explain our presented work. The logic used for the 32-bit barrel shifter is the same as that of the 8-bit barrel shifter. An 8-bit barrel shifter has 3 selection lines. In our case, the 8-bit barrel shifter takes as input a four-bit number padded with four 0's on the MSB side. If a 4-bit number is passed through the encoder, the output is at most 2 bits, which is used as the selection input for the barrel shifter. So for the 8-bit shifter, the selection input is always 2 bits, giving a maximum shift of 3 bits. This can be done with two stages. So, we can remove the third stage, whose selection line is always zero.

Fig. 7. Barrel shifter

When a selection line is zero, the output is the same as the input, so we can remove that stage to reduce the area. In this barrel shifter, the input is 8-bit, of which 4 bits are padded 0's on the MSB side. In our modified design, if the selection line of the first stage is high, the shift is 2 bits. The leftmost 2:1 multiplexers therefore always get '0' on both inputs, whether the selection line is high or low, so we can remove those two multiplexers and directly take zero as the output. The last 2 bits on the LSB side always get zero as one of the inputs to the multiplexer, so we can replace each of those multiplexers with an AND gate. Similarly, in the next stage, we can remove one multiplexer on the MSB side and replace one multiplexer with an AND gate on the LSB side. The modified design is shown in Fig. 8.
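The stage-removal argument can be checked with a small behavioural model. The Python sketch below treats each stage as a row of 2:1 multiplexers that either passes its input through or shifts it left by a fixed power of two, and compares a full shifter against one with the most significant stage removed. The function names and the sampling step are hypothetical choices for illustration, and only the stage removal is modelled, not the gate-level multiplexer removal or AND-gate substitution.

```python
def shift_stage(value, amount, select, width):
    """One stage: pass through when select is 0, shift left by `amount` when 1."""
    mask = (1 << width) - 1
    return ((value << amount) & mask) if select else value


def barrel_shift_left(value, shift, width, stage_amounts):
    """Chain the stages; the bit of `shift` weighted `amount` drives each stage."""
    for amount in stage_amounts:              # every amount is a power of two
        select = (shift // amount) & 1
        value = shift_stage(value, amount, select, width)
    return value


def same_output(width, operand_bits, full_stages, reduced_stages, step=1):
    """Compare both shifters over the shifts an encoder of `operand_bits` can emit."""
    for shift in range(operand_bits):         # leading-one position: 0 .. bits - 1
        for value in range(0, 1 << operand_bits, step):
            full = barrel_shift_left(value, shift, width, full_stages)
            reduced = barrel_shift_left(value, shift, width, reduced_stages)
            if full != reduced:
                return False
    return True


if __name__ == "__main__":
    # 8-bit shifter fed by 4-bit operands: the shift-by-4 stage is never selected.
    print(same_output(8, 4, (4, 2, 1), (2, 1)))
    # 32-bit shifter fed by 16-bit operands (sampled): the shift-by-16 stage is idle.
    print(same_output(32, 16, (16, 8, 4, 2, 1), (8, 4, 2, 1), step=251))
```

Because the encoder output never reaches the weight of the removed stage, that stage's selection bit is always 0 and both shifters agree for every operand in this model.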

Fig. 8. Presented 8-bit barrel shifter

Similarly, a 32-bit barrel shifter can be implemented. In the 1st stage, 32 multiplexers are removed. A total of 15 multiplexers are removed on the MSB side, and another 15 multiplexers are replaced by AND gates. The comparison of the barrel shifters is shown below.

Table III. Results of Barrel Shifters

Circuit      Cell Area (µm²)    Power (nW)
Existing     848                43461.618
Presented    656                27638.732

V. RESULT

The cell area and power of the existing and the modified designs are summarized in the three tables above. The existing blocks in Fig. 1 are replaced with the presented designs.

A. Simulation

The presented architecture is designed in Verilog HDL and tested in ModelSim. Two 16-bit numbers N1 and N2 are given as inputs and the output product generated is P. The simulation results are displayed in Fig. 9.

Fig. 9. Simulation Results

B. Area and Power analysis

The cell area and power of the complete architecture, for both the existing and the modified design, are given below. TSMC 90 nm CMOS technology is used for this analysis, which is performed in Cadence and evaluated for 16-bit data input.

Table IV. Results of area and power

Circuit        Area (µm²)    Power (nW)
Existing       3244          104174.686
Presented      3062          89560.999
Improvement    5.61%         14.02%

VI. CONCLUSION

The aim of this work was to implement a logarithmic multiplier architecture with efficient hardware by reducing the number of gates in the LOD, replacing the priority encoder with an encoder, and designing a new barrel shifter with a reduced number of stages and an improved internal design. The modified design consists of simple combinational circuits. An improvement in terms of timing, area occupied and power consumed has been observed. Synthesis results show that the presented logarithmic multiplier gives a 14.02% reduction in power and a 5.61% reduction in area for a 16-bit architecture.

VII. REFERENCES

[1] Babić, Z., Avramović, A. and Bulić, P. (2011). “An iterative logarithmic multiplier”, Microprocessors and Microsystems, 35(1), pp. 23-33.
[2] Weiqiang Liu, Danye Wang, Jiahua Xu, Chenghua Wang, Fabrizio Lombardi, and Paolo Montuschi. “Design and Evaluation of Approximate Logarithmic Multipliers for Low Power Error-Tolerant Applications”, IEEE Transactions on Circuits and Systems I: Regular Papers, 65(9), pp. 2856-2868.
[3] Aleksej Avramović, Zdenka Babić, Patricio Bulić, “A simple pipelined logarithmic multiplier,” 2010 IEEE International Conference on Computer Design, 2010.
[4] S. Ahmed and M. Srinivas, “An Improved Logarithmic Multiplier for Media Processing”, Journal of Signal Processing Systems, 2018. Available: 10.1007/s11265-018-1350-2.
[5] R. K. Agrawal and H. M. Kittur, “ASIC based logarithmic multiplier using iterative pipelined architecture,” 2013 IEEE Conference On Information And Communication Technologies, 2013.
[6] Hoefflinger, F. Warkowski, and B. M. Selzer, “Digital logarithmic CMOS multiplier for very-high-speed signal processing,” Proceedings of the IEEE 1991 Custom Integrated Circuits Conference.
[7] Ch. Achuth Reddy, Alen Anurag Pandit, Dr. Gautam Narayan, “Design and Simulation of 16×16 bit Iterative Logarithmic Multiplier for Accurate Results,” 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2018.
[8] Raymond E. Siferd, Khalid H. Abed, “VLSI Implementations of Low-Power Leading-One Detector Circuits,” Proceedings of the IEEE SoutheastCon 2006.
[9] G. Yemiscioglu and P. Lee, “16-Bit Clocked Adiabatic Logic (CAL) logarithmic signal processor,” 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS), 2012.
[10] G. V. Nikhil, B. P. Vaibhav, V. G. Naik, and B. S. Premananda, “Design of low power barrel shifter and vedic multiplier with kogge-stone adder using reversible logic gates,” 2017 International Conference on Communication and Signal Processing (ICCSP), 2017.
[11] V. Mahalingam and N. Ranganathan, “An efficient and accurate logarithmic multiplier based on operand decomposition,” 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design (VLSID06), 2006.
