Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

An Iterative Mitchell's Algorithm Based Multiplier

Faculty of Electrical Engineering University of Banja Luka Patre 5, Banja Luka, BiH

Zdenka Babic

zdenka@etfbl.net

Faculty of Electrical Engineering University of Banja Luka Patre 5, Banja Luka, BiH alexejgspinter.net

Aleksej Avramovic

Patricio Bulic Faculty of Computer and Information Science University of Ljubljana Trzaska c. 25, Ljubljana, Slovenia patricio.bulicgfri.uni-lj.si

plier, logarithmic number system.

percentage with only two parallel correction circuits. Keywords Computer arithmetic, digital signal processing, multi-

Abstract This paper presents a new multiplier with possibility to achieve an arbitrary accuracy. The multiplier is based upon the same idea of numbers representation as Mitchell s algorithm, but does not use logarithm approximation. The proposed iterative algorithm is simple and efficient, achieving an error percentage as small as required, until the exact result. Hardware solution involves adders and shifters, so it is not gate and power consuming. Parallel circuits are used for error correction. The error summary for operands ranging from 8-bits to 16-bits operands indicates very low error

to hardware area savings. This paper presents a new iterative solution for multiplication, where error correction is realized with parallel correction circuits. This paper is further organized as follows: Section II presents a basic Mitchell's algorithm Xe and its modifications, with their benefits and weaknesses, Section III describes the proposed iterative solution, in Section IV hardware implementations of the proposed algorithm are discussed, Section V gives an experimental results overview,

II. MA BASED MULTIPLIERS

I. INTRODUCTION Multiplication has always been hardware, time and power consuming arithmetic operation, especially for large value operands. This bottleneck is even more emphasized in digital signal processing applications that involve a huge number of multiplications. In many signal processing applications the results of arithmetic operations do not have to be exactly accurate. For example, in signal compression techniques, quantization is usually performed after cosine or some other transform. Therefore, calculation of true transform coefficients values is not necessary and rounded products after multiplication by signal transformation are acceptable. Also, many digital signal processing systems can deal with an extra noise introduced. In these signal processing applications, where speed of calculation is more important than accuracy, logarithm number system (LNS) for multiplication seems to be suitable method. The main advantage of this method is substitution of multiplication with addition, after conversion operands into logarithms. But this simple idea has a significant weakness: a necessity for approximation of logarithm and antilogarithm. Therefore, logarithmic based solutions are trade-off between time consumption and accuracy. In well known Mitchell's algorithm (MA) [1] for multiplication in LNS, high error is caused due to the piecewise straight line approximation of logarithm and antilogarithm curve. Later, ' methods for many MA error correction are introduced with more or less success [5], [2], [3], [4], [5], [6]. LNS multipliers can be divided into two categories, one based on methods that use a lookup tables and interpolations and the other based on Mitchell's algorithm, even there is a lookup table approach in some of the MA based methods. MA based solutions suppressed lookup tables due

Logarithmic number system is introduced to simplify multiplication, especially in cases when accuracy requirements
are not rigorous. One of the most significant multiplication methods in LNS is Mitchell's algorithm, witch approximates logarithm with a piecewise straight line function. MA multiplies two operands by finding their logarithms, adding them and looking for antilogarithm of the sum. Approximation of logarithm and antilogarithm is essential, and it is derived from binary representation of numbers:
N
=

2k(l +

k-1
i=j

2i-kZi) = 2k(I +x)

(1)

where k is a characteristic number or place of the most significant bit with the value of '1', Zi is bit value at i-th position, x is fraction or mantissa andj depends on number's precision. By logarithm of the product computation,

k1 + k2 + log2(1 + X1) + log2(1 + X2) (2) is approximated with xi and logarithm of the two log2 (1+ xi) numbers' product is expressed as a sum of their characteristic numbers and mantissas:
=

log2 (Ni N2)

. . Th fal MAapoximation fo m ipan dei carry ~~~~~on bit from sum of mantissas iS given by: r 2k1k2 (1 + x + '2),k2 < k11 + (Ni Nf2)MA= 2k21, +(2)4) 2 t21k+(l+x) 1+z The sum of characteristic numbers determines MSB of the product. The sum of mantissas is added to complete the final

log2(Nl N2) k1i+ k2 +- X +- 2 Antilogarithm uses the similar approximation.

(3)

978-1 -4244-3555-5/08/$25.00 2008 IEEE

303

result. MA produce a significant error percentage. Relative error increases with the number of bits with the value of '1' in the mantissas. The maximum possible relative error for MA multiplication is around II% and the average error is around 3.8%. Mitchell analyzed this error and proposed analytical expressions for error correction. Summarizing, the most significant advantage of MA is simplicity, efficiency, i.e. non power-consuming. The most significant disadvantage is high error percentage. Algorithm 1 Mitchell's Algorithm 1) N1, N2: n-bits binary multiplicands, Papprox = 0 2nbits approximate product 2) Calculate k1: leading one position of N1 3) Calculate k2: leading one position of N2 4) Calculate x1: shift N1 to the left by n - k1 digits 5) Calculate X2: shift N2 to the left by n - k2 digits 6) Calculate k12 = k1 + k2 7) Calculate 112 = 1 +12 8) Decode k12 and insert '1' in that position of Papprox 9) Append X12 immediately after this one in Papprox 10) N1 N2 = Papprox
A step-by-step example illustrating the Mitchell's algoritm based multiplication is shown in Example 1. Example 1:
N1 234
=

proposed method decreases the by 44.70 on the average.

error

percentage of the MA

III. PROPOSED SOLUTION It is already mentioned that basic disadvantage in Mitchell's algorithm and similar solutions comes from logarithm approximation. Therefore, proposed solution avoids logarithm approximation and introduces an iterative algorithm with various possibilities for achieving error as small as wanted and possibility for getting the exact result. The proposed solution is an iterative but not a recursive algorithm, so it can be realized with parallel circuits for error reduction. Looking at the binary representation of numbers in (1), we can derive a correct expression for multiplication:

Ptrue

=
=

N1 N2 =2k1 (1 + 1) 2k2 (1 + 12) 2k1+k2 (1 + X1 + X2) + 2k1+k2 (X1X2)

(5)

The similarity with MA is evident. The error in MA is caused by neglecting the second term in (5). To avoid the approximation error, we have to take into account next relation: 2k= N-2k (6)
The combination of (5) and (6) gives:
Ptrte

11101010

134 = 10000110 N2 31356 Ptrte ki = 0111,1 = 1101010

2(k1k2) + +(N1 - 2kl)2k2 + (N2 - 2k2 )2ki +(Nl 2k1) (N2 2k2)
(N1 N2)

(7)
(8)

Let

k1 + k2
1

k2 = 0111, X2

0000110

Po

2(kl+k2)

+ (N1 - 2kl)2k2 + (N2 - 2k2)2kl

1110
1110000
1110.1110000 111100000000000 = 30720

Papprox Er 2.03% Numerous attempts have been made for improving the MA accuracy. Hall [4] derived different equations for error correction in logarithm and antilogarithm approximation in four separate regions, depending on mantissa value, reducing average error to 20, but increasing complexity of realization, Abed and Siferd [5], [6] derived correction equations with coefficients that are power of two, reducing error and keeping the simplicity of solution. Among many methods that use look-up tables for error correction in MA algorithm, McLaren's method [2], which uses lookup table with 64 correction coefficients calculated in dependence of mantissas values, can be selected as one with satisfactory accuracy and complexity. The recent approach for MA error correction reducing the number of bits with the value of ' 1' in mantissas by operand decomposition was presented by Mahalingam and Rangantathan [3]. The

(1ogNjN2)approx

be the first approximation of the product. It is evident that + Ptr, = PPO (Ni1-2k1) * (N2 2-2) (9)

If we

approximate the product with

approx PO (10) then the proposed method is very similar to MA. Actually, Po is equal to the first case in MA approximation (4). Mitchell proposed an exact correction term as in (9), but logarithm approximation based multiplying of given residues where not sufficient to achieve significant error decreasing. Avoiding logarithm approximation enables parallel error corrections and therefore more accurate product. For this reason proposed an iterative calculation of correction terms as follows. An absolute error after the first approximation is

E(

P -p() rox = P-P = = (N1 2k1 ) .(N2 -2k2 )

(11)

Note that E() > 0. The two multiplicands in (11) are binary numbers that can be obtained simply by removing the leading

978-1 -4244-3555-5/08/$25.00

C2008 IEEE

304

Algorithm 2 Iterative MA Based Algorithm with i correction terms 1) N1, N2: n-bits binary multiplicands, Po = 0 : 2nbits first approximation, CM = 0 : 2n-bits i correction terms, Papprox = 0 : 2n-bits product 2) Calculate k1: leading one position of N1 3) Calculate k2: leading one position of N2 4) Calculate (Nl - 2kl )2k2: shift (N1 - 2ki ) to the left by k2 digits 5) Calculate (N2 - 2k2 )2ki: shift (N2 - 2k2) to the left by k1 digits 6) Calculate k12 = k1 + k2 7) Calculate 2(ki+k2) : decode k12 8) Calculate Po : add 2(kik2) (N1-2kl)2k2 and (N22k2 )2ki 9) Repeat i-times: a) Set: N1 = N1 - 2k1, N1 = N2 -2k2 b) Calculate k1: leading one position of N1 c) Calculate k2: leading one position of N2 d) Calculate (N1 - 2kl)2k2: shift (N1 - 2ki) to the left by k2 digits e) Calculate (N2 - 2k2 )2kl: shift (N2 - 2k2 ) to the left by k1 digits f) Calculate k12 = k1 + k2 g) Calculate 2 (kk2) : decode k12 h) Calculate Ci : add 2(ki+k2) (N1- 2kl)2k2 and

Then the final result is exact: Papprox = Ptre. The number of iterations required for exact result is equal to the number of bits with the value of '1' in operand with smaller number of bits with the value of '1'. The proposed iterative MA based multiplication is given in Algorithm 2. One of the advantages of the proposed solution is a possibility to achieve an arbitrary accuracy by selecting a number of iterations, i.e. number of parallel correction circuits. Two stepby-step examples illustrating the iterative MA based algoritm multiplication with two correction terms are shown in Example 2 and Example 2, respectively. In the Example 2 a correct result is achieved with two correction terms, while in Example 3 the relative error is decreased bellow 1%.
Example 2:

N134 Ptrue 31356


k1 + k2
(Ni - 2k)2 (N2 -2k2)2ki

N1 N2

= =

234 134

= =

11101010 10000110 0

ki k2

0111, N12k1 01101010 0111, N2 -2k2 = 00000110 1110


00001100000000 100000000000000

(N2 -2k2)2k 10) Pprox = Po + 3i C(0)

2(k1+k2)

'1' in the numbers N1 and N2 so we can repeat the proposed multiplication procedure with new multiplicands

111100000000000 E$0) = 2.03% N(1) = 01101010 1

Po

30720

E()

C0(1) +E(1)

(12)

where C(M) is the approximate value of E() and E(1) is an absolute error when approximating E(). The combination of (9) and (12) gives

0110 N1()-2 1 - 00101010 k _l 0010, N2(1) 2

(N1(1) - 2k ) )2k )2
(N2(1)
-

1000 10101000
10000000

+E(1) (13) PO + We can now add the approximate value of E() to the approximate product Papprox as an correction term by which we decrease the error of approximation p(l) Po + C(1) (14)
Ptrte
approx

2k12 )2k)

2(k()1k( )) C(1)

P(l)

If we repeat this multiplication procedure with i correction terms we can approximate the product as

Pcprox

Po + c(1) + C(2) + ... + C(i)


Pi+ZO(J) = jPo+ EC(i)

(15) (15)

k(2) +kA 2

N(2) - 00101010 N2 = 00000010 0101 N1(1) k2 - 0001, N2(12 - 0110


10100 ~ = 00000

_1) = _

552 = 111101000101000 31272

100000000 1000101000

0.27% . /

00000000

The procedure can be repeated achieving an error as small as necessary, or until at least one of mantissas becomes a zero.

(N1(1

(N2(1) -2k( ) )2k )

2k1 )2k

978-1 -4244-3555-5/08/$25.00 2008 IEEE

305

2 (1)+

100

C(2)

E(2)
Example 3:

p(2)

1010100 = 84 = 111101001111100 (l) (2) = Po + C(l) + C(2)


0.0%

31356
31356

Papprox

IV. HARDWARE IMPLEMENTATION In order to evaluate the device utilization and the performance of the proposed multiplier, we implemented different multipliers on Xilinx xc3s5OOe-4fg320 FPGA [7]. We have implemented 5 16-bits multipliers: a sequential implementation of the proposed multiplier, a multiplier with no correction terms, and three multipliers with two, three and four correction terms, respectively.

N1 N2 Ptrve

234 255

=
=

11101010 11111111

= 01101010 k2 = 0111, N2- 2k2 01111111 k1 + k2 1110 (Ni- 2k)2k2 11010100000000 11111110000000 (N2 - 2k2 )2kl (kl+k2 ) = 100000000000000 2 1011010010000000 46208 PO E(0) E() = 22.56% N( ) = 01101010 1 01111111 2
=

ki = 0111, Ni -

=k59670

A. Basic Block A basic block is the proposed multiplier with no correction terms. The task of the basic block is to calculate one approximate product according to Equation 8. The 16-bits basic block is presented in Figure 1. The 16-bits basic block consists of two leading one detector units (LOD), two encoders, two 32bit barrel shifters, a decoder unit and two 32-bit adders. Two input operands are given to LODs and the encoders. The LOD units are used to remove leading one from the operands, which are then passed to the barrel shifters. The barrel shifters are used to shift residues according to Equation 8. The decode unit decodes k1 + k2, i.e. puts the leading one in the product. The leading one and two shifted residues are then added to form the approximate product. The basic block is used in further implemenations to calculate Po and C(i). B. Sequential Implementation
A sequential implementation of the proposed iterative multiplier consists of a data-path unit and a control unit. Core od the data path unit is one basic block form Figure 1. The control unit executes Algorithm 2.

k(l)
k~1
1+

k2l) = 0110, N2(1)k-2k


(1)
1100

0110, N1

00101010 00111111

k(1) 1 (N2(l) 222- )2 k(1) 2 1)2

(Nl() -2 '1 )2k1 2

k2(1)
k

(N2k1) 2(k1
-

k2 (

C(1)

p(l) 0
r

101010000000 = 11111100000 111111000000 1000000000000 = 10101001000000= 10816 1101111011000000 57024

4.43
= 00101010 = 00111111

N(2) k1 k(2) k(2) +k(2) k1) (N1(l) - 2 1 )2 2


k2

N( )

= 0101, N2(1)-2k) =00011111

0101, NP1(

(2) -

0) 00001010 00001010

To decrease the time overhead required by sequential implementation, we implemented multipliers with parallel correction circuits. To implement the proposed multipliers, we used the cascade of basic blocks. Block diagram of the proposed logarithmic multiplier with one parallel error correction circuit is shown in Figure 2. The multiplier is composed of two basic blocks of which the first one calculates the first approximation of the product (PO) while the second one calculates the error correction term 0(1).
D. Device Utilization For design entry we used Xilinx ISE 10.1.02 - WebPACK and design with VHDL. The design was synthetised with Xilinx Xst Release 10.1.02 for Linux. Device utilization (number of slices, number of 4-input LUTs and number of input-output blocks) for all five implemented multipliers are given in Table 1. Maximum combinational path delays along with total power consumptions for the basic block and all three parallel implementations are given in Table 2. The estimated maximum frequency for the sequential implementation was 69.85 1 MHz.

C.

Parallel Implementation

0101000000 k 1111100000 (N2(1) - 2k" 2k' 9(k(1)+k2 ) = 10000000000 k(1)) 2(k1 0(2) = 101001100000 = 2656 o()= 1110100100100000 =59680 E()= 0.52%
10000000000

Papprox

P0 + 0(1) + 0(2) =59680

978-1 -4244-3555-5/08/$25.00 2008 IEEE

306

NI

1!N2

/ENCO DER

EC DER

kl

1b t
BCARREL SHYTR LEFT
|<

ks
r

16 b
0 SHYTER ECLEFT
BA|RREL|

<10 3 7. 9

20

7.

92 F9

7.

98.0

q~~~~~~~~~~~~~~~~) | (N2_-2 Fig 1. Blok diga of a bai blc of th prpsdieaiv utpir

Ero

C
< 1 0o[1 69.9

C
99.6

C
100

C
99.4

3C
100 L

2C

3C
100

k65.6

52

99.

BaicBlc

Basic 3f 1088 + Block.+


143

75

96

i calculated from1well-knowni

Basic Block

96ga fabscboko h pooe trtv utpir Basic Block + iCT 736e 3.E o pecombntgration ofnbtno-eaiv
353. 1 182c

ubrs

ropretg

S M c t power (W) A - Zt (7) Multiplier 523 Sequential 523I 265. 996ET =o 11 656994 100% appr5o699.4100%5 (16) 10 Basic Sequentia 353 1. 2165. 996 a order to of the proposed 24era818e multiplier. 182 96 e.b Error percentage rt [%]. Basic Block + CT736 381 is|the Verg ERRORpecNtAgevaueisusd nomwer-of mqultion Basic Block + 2CT 2. 2376 reutlztsQ.57hee6 Table Fovlain Sytesise Basic Block + 3CT example, fr 12t n e a t 541555 7 11 o n

96combinationsofnbitnon-negativ
6|I
Fr

1 (16)19110

|~~~~~~~ea 35 Basic Bloc ||

Table results. Bai BaiBlock + 3CT 32.21438 c 265 Max.combinatii T7119 |Multiplier| Sequential ll 523 96

l Er = trzep n de i170(16) c t iapprox, T fr Ei c | Basic Block + 1CT || 36 | 381 |errorcorrections paale circuinnegt,ithenumepralle rorrerctin,ag 37n 9Basic +22C|24. 108 I Block

2l,Synthesis

5om loith isaplid7ol ito40 average error percentage valte is used:


re oeaut,tepooed

The iterative approach improves the average error percentage and the error rate compared to the basic MA multiplicaN1-2k2 tion. With only three correction terms all multiplication results have less then 0.5%0 of error. N22k! |The parallel implementation of the iterative MA multiplier if lr 1 rwith only one correction circuit almost doubles the area r required when compared to the original MA multiplier, but the power consumptions increases only from 2% (one correction term) to 1600 (three correction terms). This is still significantly less than the area and power required for a standard multiplier. The maximum combinational delay increases by 30 450 0 with the each added correction circuit, but this can be significantly improved by pipelining the three main stages in / \+ the basic MA based multiplier and pipelining the correction circuits. N11N2 To evaluate properties of various number of parallel correction circuits the Gaussian filteroer is implemented for removing Fig. 2. Block diagram of the proposed multiplier with one parallel error correction circuit. The multiplier is composed of two basic Mitchell's 2 00 salt & pepper noise. Filter is implemented with proposed logarithmic multipliers of which the first one calculates an approxmate product multiplier, with various number of correction term circuits. Error evaluation is done with average error percentage (AEP) and while the second one calculates an error correction term. mean square error (MSE). Since we have integer multiplier, it can be used with integer value kernels and signals. Average Table 4. Average relative errors [%] for 0, 1, 2 and 3 correction terms. bits II Basic MA [ I C. term 2 C. terms ] 3 C. terms AEP and MSE in this example for ICT is 0.4 and 0.05 00, 8 8.9131 0.8337 0.0708 0.0048 respectively and zero for 2CT-s, so this multiplication method 0.0069 could be useful for implementation of integer transforms. 0.0845 9 9.1336 0.8980
Nl

N2

FNo.

10 11 12 13 14 15 16

9.2595 0.9369 9.3301 0.9597 9.3692 0.9726 9.3906 13 9.3906 0.9799 9.4023 0.9840 9.4086 0.9862 9.4124 0.9874

0.0936 0.0994 0.1029

0.0086 0.0098 0.0106

0.1049 0.1060 0.1067 0.1070

0.0111 0.0114 0.0116 0.0117

tions. Results from Table 3 present error percentage rate for various cases. Results from Table 4 give more precise view, how algorithm can be modified to wanted error percentage. These results are compared with results from [3], since it is the latest paper with complete overview of various solutions. Authors presented combinations of their own proposal with other solutions and gave a similar error table. Comparing 8bits and 16-bits average error percentages, we can notice that our solution with three iterations outperforms the best propose combination of operand decomposition and Mitchell's error correction term.
VI. CONCLUSIONS

with two parallel corrections and with three parallel correc-

[2] D.J. Mclaren, Improved Mitchell-based logarithmic multiplier for lowpower DSP applications, Proceedings of IEEE International SOC Conference 2003 pp. 53-56, 17-20 September 2003. [3] V. Mahalingam, N. Rangantathan, Improving Accuracy in Mitchells Logarithmic Multiplication Using Operand Decomposition, IEEE Transactions on Computers, Vol. 55, No. 2, pp. 1523-1535, December 2006. [4] E.L. Hall, D.D. Lynch, S. J. Dwyer III, Generation of Products and Quotients Using Approximate Binary Logarithms for Digital Filtering

[1] J.N. Mitchell, Computer multiplication and division using binary logarithms, IRE Transactions 1962. pp. 512-517, August on Electronic Computers Computers, vol. EC-

REFERENCES

~~~~~~~~~~~~~~11,

Applications, IEEE Transactions on Computers, Vol. C-19, No. 2, pp. 97-105. February 1970. [5] K.H. Abed, R.E. Sifred, CMOS VLSI Implementation of a Low-Power Logarithmic Converter, IEEE Transactions on Computers, Vol. 52, No. 11, pp. 1421-1433, November 2003. [6] K.H. Abed, R.E. Sifred, VLSI Implementation of a Low-Power Antilogarithmic Converter, IEEE Transactions on Computers, Vol. 52, No. 9, pp. 1221-1228, September 2003. http://ww-wxilinx. comrsupportldocumentationldata-sheets/ds312.pdf, April 18, 2008.

[7] Xilinx Inc. Spartan-3E FPGA Family: Complete Data Sheet, DS312.

In this paper, we have investgate and proposed a new approach to improve the accuracy in Mitchell's algorithm based multiplication. The proposed method is based on iteratively calculating the correction terms. We have shown that the calculation of correction terms can be performed parallel in hardware.

978-1 -4244-3555-5/08/$25.00 2008 IEEE

308

You might also like