
2009 Third International Symposium on Intelligent Information Technology Application Workshops

High-speed Parallel 32×32-b Multiplier Using a Radix-16 Booth Encoder

CHEN Ping-hua
Faculty of Computer
Guangdong University of Technology
Guangzhou, Guangdong 510006, China
phchen@gdut.edu.cn

ZHAO Juan
Faculty of Computer
Guangdong University of Technology
Guangzhou, Guangdong 510006, China
gdut-zj@163.com
Abstract- A 32×32-b high-speed parallel multiplier is proposed in this paper. The new multiplier uses a Radix-16 Booth encoder, a mixed compression tree constituted by 4-2 and 5-2 compressors, and an improved 64-bit carry-lookahead adder (CLA) that combines the characteristics of CSS and CLA. The test results show that, compared with an iterative multiplier, this multiplier achieves some improvement in performance.

Keywords- Multiplier; Radix-16 Booth Encoder; Compressor.

978-0-7695-3860-0/09 $26.00 © 2009 IEEE
DOI 10.1109/IITAW.2009.44

I. INTRODUCTION

High-speed multipliers are the core component of DSPs (Digital Signal Processors). Digital signal processing requires a large number of filtering, convolution, and related operations. For this reason, a DSP system must adopt many technologies and structures that differ from those of general-purpose microprocessors. A high-speed multiplier is an indispensable arithmetic unit for these computations, and its performance directly influences the overall performance of the DSP[1].

The key to designing a multiplier is the generation and the summation of partial products. Therefore, the speed of a multiplier can be improved in two ways: 1) by reducing the number of partial products; 2) by reducing the delay of summing the partial products. Many structures and algorithms have been proposed to improve multiplier performance, such as the iterative multiplier, the array multiplier, and the Booth multiplier. Based on a comparison and analysis of several multipliers, this paper designs a 32×32 parallel fixed-point multiplier that uses a Radix-16 Booth encoding algorithm. The design applies a mixed compression tree constituted by 4-2 and 5-2 compressors, and an improved 64-bit carry-lookahead adder (CLA) that combines the characteristics of CSS and CLA. The simulation results show that, compared with the iterative multiplier, the proposed multiplier achieves some improvement in performance.

II. SEVERAL STRUCTURES OF MULTIPLIER

A. Iterative multiplier

The iterative multiplier is the easiest multiplier to implement. It repeatedly shifts and accumulates, producing one partial product and updating the cumulative sum in each cycle.

For example, let A denote the multiplicand and B the multiplier, with $B = b_{n-1}b_{n-2}\ldots b_i b_{i-1}\ldots b_1 b_0$, $i = 0, 1, \ldots, n-1$. Then

$$A \cdot B = A\sum_{i=0}^{n-1} b_i 2^i = \sum_{i=0}^{n-1} A\, b_i 2^i \tag{1}$$

Let $PP_i$ denote the i-th partial product. It is the multiplicand shifted left by i bits when multiplier bit $b_i$ is 1, or 0 when $b_i$ is 0:

$$PP_i = A \cdot b_i \cdot 2^i, \qquad 0 \le i \le n-1 \tag{2}$$

Let $P_i$ denote the cumulative sum obtained after i cycles of calculation. Then

$$P_0 = 0, \qquad P_{i+1} = P_i + PP_i, \qquad i = 0, 1, \ldots, n-1 \tag{3}$$

Finally, $P_n = A \cdot B$. The structure of the iterative multiplier is shown in Figure 1.

Figure 1. The structure of iterative multiplier

The iterative multiplier is intuitive, and its algorithm and structure are simple. It needs few hardware resources and has low power consumption. However, its calculation speed is low because it must iterate n times to obtain the final result. The iterative multiplier is therefore suitable for applications that require small area and low power consumption.

B. Array Multiplier

The array multiplier achieves high performance at the cost of increased system complexity. To complete an N×N multiplication, the array multiplier needs N² AND gates and an array of N(N-1) full adders (FA). A typical 4×4-bit parallel array multiplier is shown in Figure 2.
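The following minimal Python sketch models the two structures behaviourally. It assumes unsigned n-bit operands, since the text does not fix the operand format here. `iterative_multiply` follows recurrence (3), handling one multiplier bit per step. The hypothetical helper `array_partial_products` forms all n rows at once with bitwise ANDs, as the N² AND gates of an array multiplier do. The sketch models values only, not gate delay or the full-adder array.

```python
def iterative_multiply(a: int, b: int, n: int) -> int:
    """Shift-and-add recurrence (3): P_0 = 0, P_{i+1} = P_i + PP_i."""
    p = 0
    for i in range(n):                  # n sequential steps, one per multiplier bit
        b_i = (b >> i) & 1
        pp_i = (a * b_i) << i           # eq. (2): PP_i = A * b_i * 2^i
        p = p + pp_i
    return p                            # P_n = A * B

def array_partial_products(a: int, b: int, n: int) -> list[int]:
    """All n partial products formed at once with AND gates (n*n of them)."""
    rows = []
    for i in range(n):
        b_i = (b >> i) & 1
        row = 0
        for j in range(n):
            row |= (((a >> j) & 1) & b_i) << j   # one AND gate per (a_j, b_i) pair
        rows.append(row << i)                    # row i has weight 2^i
    return rows

if __name__ == "__main__":
    a, b, n = 0b1011, 0b0110, 4         # 11 * 6 with 4-bit operands
    print(iterative_multiply(a, b, n))           # 66
    print(sum(array_partial_products(a, b, n)))  # 66
```

The iterative version needs n sequential steps but only one adder. The array version produces every partial product in one step, but it still has to add n rows, which is the summation cost the rest of the paper sets out to reduce.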

Figure 2. A typical 4×4-bit parallel array multiplier

In the array multiplier, performance improves significantly because the partial products are computed in parallel. However, this structure has an obvious disadvantage: its hardware usage increases sharply as the operand width grows. In addition, the array multiplier does not reduce the number of partial products. The array multiplier therefore still leaves room for improvement[1].

C. Booth Multiplier

To reduce the number of partial products, the multiplier can be re-encoded. Among such coding algorithms, Booth encoding is the most widely accepted.

In 1951, A. D. Booth proposed an encoding of the multiplier, named Booth encoding, to solve the sign-correction problem in the multiplication of signed numbers. In Booth encoding, a 0 is appended below the lowest bit of the multiplier. In each partial-product cycle, two adjacent multiplier bits (Radix-4) are examined. Based on the value of these two bits, the partial product is determined as a multiple of the multiplicand, such as its double or triple. See reference [2] for the details of the Booth encoding algorithm.

Booth-encoded multiplication can be carried out in three steps[3]:
1) Generate the partial products;
2) Accumulate the partial products with an adder array (i.e., compressors);
3) Use an adder to obtain the final result.
The schematic of the Booth-encoded multiplier is shown in Figure 3.

Figure 3. The diagram of Booth encoded multiplier

To further reduce the number of partial products and increase the speed of the multiplier, an improved Booth algorithm has been proposed. It scans three multiplier bits at a time with overlap (Radix-8), which further reduces the number of partial products.

III. DESIGN OF RADIX-16 BOOTH HIGH-SPEED PARALLEL 32×32 MULTIPLIER

Based on the structure of the Booth-encoded multiplier, this paper uses three methods to improve the performance of the parallel multiplier:
1) Use the Radix-16 Booth encoding algorithm to further reduce the number of partial products.
2) Use a mixed compression tree constituted by 4-2 and 5-2 compressors. This removes 1 level of compression and thus reduces the computing delay.
3) Use a fast adder that combines CLA and CSS to further reduce the time needed to obtain the final result.

A. Radix-16 Booth algorithm

To use the Radix-16 Booth algorithm in the 32×32-b multiplier, the two's complement multiplier must be re-encoded. A 0 is appended below the least significant bit of the multiplier, and the bits are then scanned in groups of 4. An n-bit two's complement number $B_c$ can be represented as:

$$\begin{aligned}
B_c &= -b_{n-1}2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i \\
&= 2^{n-4}(-8b_{n-1} + 4b_{n-2} + 2b_{n-3} + b_{n-4} + b_{n-5}) \\
&\quad + 2^{n-8}(-8b_{n-5} + 4b_{n-6} + 2b_{n-7} + b_{n-8} + b_{n-9}) + \cdots \\
&\quad + 2^{0}(-8b_3 + 4b_2 + 2b_1 + b_0 + b_{-1})
\end{aligned}$$

Here $b_i$ is bit i of the multiplier, and $b_{-1} = 0$ is the appended bit. Each group is re-encoded by the formula $-8b_{i+3} + 4b_{i+2} + 2b_{i+1} + b_i + b_{i-1}$. Because a 0 is appended to the least significant bit of the multiplier, the partial products are 35 bits wide. Depending on the encoding result, each partial product is chosen from the set {0, ±A, ±2A, ±3A, ±4A, ±5A, ±6A, ±7A, ±8A}, where A is the two's complement multiplicand. In this set, ±2A, ±4A and ±8A can be obtained from A by simple shifting, while ±3A, ±5A, ±6A and ±7A are complex multiples of A. Because 6A can be obtained by shifting 3A, only 3A, 5A and 7A need to be generated separately by a 4-bit short adder. To improve the computational efficiency of the multiplier and reduce the calculation delay, the computation of the complex multiples and the Booth re-encoding of the multiplier can be executed in parallel.

The Radix-16 Booth encoding algorithm scans 4 bits of the multiplier at a time and yields 9 partial products, which significantly reduces the number of partial products.

B. Design of partial product compressor circuit

To accelerate the whole adder array, the partial products need to be compressed. Compression does not add whole partial products one by one. Instead, it adds together the bits of the partial products that have the same weight.

The 4-2 compressor is the most frequently used component for compressing partial products. Its structure is shown in Figure 4[4].
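The behaviour of a 4-2 compressor can be checked at bit level. The sketch below assumes the common construction from two cascaded full adders, and the paper's Figure 4 may differ in gate-level detail. The parameters `in1`..`in4` and `cin` mirror the signals In1~In4 and Cin, and the function returns Sum, Carry and Cout. The hypothetical helper `compress_four_words` chains one compressor per bit column to show how bits of equal weight from four partial products are reduced to two words.

```python
from itertools import product

def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)

def compressor_4_2(in1: int, in2: int, in3: int, in4: int, cin: int) -> tuple[int, int, int]:
    """4-2 compressor as two cascaded full adders (assumed construction).

    Returns (sum, carry, cout). Cout depends only on in1..in3, never on cin.
    """
    s1, cout = full_adder(in1, in2, in3)
    s, carry = full_adder(s1, in4, cin)
    return s, carry, cout

def compress_four_words(words: list[int], width: int) -> tuple[int, int]:
    """Reduce four non-negative width-bit words to two words, one column at a time."""
    sum_word = carry_word = 0
    cin = 0                                    # column 0 has no incoming carry
    for j in range(width):
        col = [(w >> j) & 1 for w in words]    # the four bits of weight 2^j
        s, carry, cout = compressor_4_2(*col, cin)
        sum_word += s << j                     # pseudo-sum keeps weight 2^j
        carry_word += carry << (j + 1)         # Carry has weight 2^(j+1)
        cin = cout                             # Cout feeds the next column's Cin
    carry_word += cin << width                 # Cout of the last column
    return sum_word, carry_word

if __name__ == "__main__":
    # Exhaustive check: In1+In2+In3+In4+Cin == Sum + 2*(Carry + Cout)
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        assert sum(bits) == s + 2 * (carry + cout)
    words = [0b1011, 0b0110, 0b1111, 0b0001]
    s, c = compress_four_words(words, 4)
    print(s + c, sum(words))                   # 33 33
```

Cout comes only from the first full adder, so a column's Cout never waits for its own Cin. A carry therefore moves at most one column per compressor level instead of rippling along the row.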

Figure 4. The structure of 4-2 compressor

The 4-2 compressor has 4 partial-product input signals, In1~In4. Cin is the carry input from the adjacent compressor, Sum is the pseudo-sum, and Carry and Cout are carry outputs with the same weight. Because the output carry Cout is independent of the input carry Cin, the partial-product bits in all columns can be added at the same time.

With Radix-16 Booth encoding, the 32×32-b parallel multiplier produces 9 partial products. The compression tree built from 4-2 compressors is shown in Figure 5[5].

Figure 5. The compressed tree

Although the 4-2 compression tree has a regular structure, its 3 levels of compression delay limit its performance. A mixed compression tree constituted by 4-2 and 5-2 compressors removes 1 level of compression delay and improves performance. The structure of the 5-2 compressor is shown in Figure 6[6].

Figure 6. The structure of 5-2 compressor

The mixed compression tree constituted by 4-2 and 5-2 compressors is shown in Figure 7.

Figure 7. The structure of mixed compressed tree

C. Improved design of CLA

The last step in designing the Radix-16 Booth parallel multiplier is to design a 64-bit adder for the final product. The improved 64-bit CLA combines CSS and CLA as follows. The 64-bit adder is divided into a high 32-bit adder and a low 32-bit adder, and each 32-bit adder is further divided into a high 16-bit adder and a low 16-bit adder. The low 16-bit adder is designed as a 2-level CLA, and its carry output serves as the conditional selection signal for the high 16-bit adder. Details of the improved CLA design can be found in reference [7].
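The adder organisation described above can be sketched behaviourally in Python. The sketch makes two assumptions. First, each 16-bit block is a two-level CLA built from four 4-bit groups. Second, the same carry-select scheme used between the 16-bit halves is repeated between the 32-bit halves. Reference [7] gives the authors' actual design. `cla16`, `carry_select`, `add32` and `add64` are hypothetical names, and the loops evaluate the lookahead equations sequentially rather than as parallel logic.

```python
MASK64 = (1 << 64) - 1

def cla16(a: int, b: int, cin: int) -> tuple[int, int]:
    """16-bit two-level CLA (behavioural): 4-bit groups plus a group-level lookahead."""
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]     # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]   # bit propagate
    # Level 1: group generate G_k and group propagate P_k for each 4-bit group
    G, P = [], []
    for k in range(4):
        gk, pk = 0, 1
        for i in range(4 * k, 4 * k + 4):
            gk = g[i] | (p[i] & gk)
            pk &= p[i]
        G.append(gk)
        P.append(pk)
    # Level 2: carry into each group, C_{k+1} = G_k | (P_k & C_k)
    c = [cin]
    for k in range(4):
        c.append(G[k] | (P[k] & c[k]))
    # Sum bits s_i = p_i XOR c_i, with bit carries derived from each group's carry-in
    s = 0
    for k in range(4):
        carry = c[k]
        for i in range(4 * k, 4 * k + 4):
            s |= (p[i] ^ carry) << i
            carry = g[i] | (p[i] & carry)
    return s, c[4]

def carry_select(add_half, a: int, b: int, cin: int, half: int) -> tuple[int, int]:
    """Upper half computed for both carry-ins; the lower half's carry-out selects one."""
    mask = (1 << half) - 1
    lo, c_mid = add_half(a & mask, b & mask, cin)
    hi0, c0 = add_half(a >> half, b >> half, 0)   # result if carry-in is 0
    hi1, c1 = add_half(a >> half, b >> half, 1)   # result if carry-in is 1
    hi, cout = (hi1, c1) if c_mid else (hi0, c0)
    return (hi << half) | lo, cout

def add32(a: int, b: int, cin: int) -> tuple[int, int]:
    return carry_select(cla16, a, b, cin, 16)

def add64(a: int, b: int) -> tuple[int, int]:
    return carry_select(add32, a & MASK64, b & MASK64, 0, 32)

if __name__ == "__main__":
    print(add64(MASK64, 1))   # (0, 1): the sum wraps to 0 with carry-out 1
```

Duplicating the upper half trades area for speed. The upper half no longer waits for the lower half's carry to ripple through it, only for a final 2:1 selection.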

IV. SIMULATION TESTING

Using the Xilinx ISE 7.1 and ModelSim SE development environments, the iterative 32×32 multiplier and the Radix-16 Booth 32×32 multiplier were tested and verified by logic simulation. The tests targeted the Xilinx VirtexE-series FPGA chip XCV200E, with a 1.8 V supply voltage and 0.18 µm 6-layer metal technology. The simulation was run in ModelSim, and the results show that the design is correct. Table I compares the performance of the two multipliers.
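A software golden model is a common way to check results like these in simulation. The sketch below is not the authors' testbench. It assumes the 32-bit multiplier is extended by four bits: sign-extended for signed operands, zero-extended for unsigned ones. This is one way to obtain the nine Radix-16 groups mentioned in Section III.A. The model recodes B with $-8b_{i+3} + 4b_{i+2} + 2b_{i+1} + b_i + b_{i-1}$, selects each partial product from {0, ±A, ..., ±8A}, and compares their weighted sum with ordinary integer multiplication. All function names are hypothetical. Computing 7A as 8A - A is one possible choice for the short adder, not necessarily the paper's.

```python
import random

def to_signed(x: int, width: int) -> int:
    """Interpret the low `width` bits of x as a two's complement number."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x

def booth_radix16_digits(b: int, width: int = 32, signed: bool = True) -> list[int]:
    """Radix-16 Booth digits -8b[i+3] + 4b[i+2] + 2b[i+1] + b[i] + b[i-1], i = 0, 4, 8, ...

    Assumption: width is a multiple of 4 and b is extended by 4 bits,
    so a 32-bit multiplier gives 36 bits, i.e. 9 digits in [-8, 8].
    """
    b &= (1 << width) - 1
    if signed and (b >> (width - 1)) & 1:
        b |= 0xF << width                      # sign extension by four bits
    ext = width + 4
    bits = [(b >> i) & 1 for i in range(ext)]
    digits = []
    for i in range(0, ext, 4):
        b_prev = bits[i - 1] if i > 0 else 0   # appended 0 below the LSB (b_{-1})
        digits.append(-8 * bits[i + 3] + 4 * bits[i + 2] + 2 * bits[i + 1] + bits[i] + b_prev)
    return digits

def multiples_of(a: int) -> list[int]:
    """0A..8A: 2A, 4A, 8A by shifting; 3A, 5A, 7A by one add/subtract; 6A = 3A << 1."""
    three = a + (a << 1)
    return [0, a, a << 1, three, a << 2, a + (a << 2), three << 1, (a << 3) - a, a << 3]

def booth_radix16_multiply(a: int, b: int, width: int = 32, signed: bool = True) -> int:
    """Golden model: sum of partial products (+/- kA) * 16^j selected by the digits."""
    a_val = to_signed(a, width) if signed else a & ((1 << width) - 1)
    table = multiples_of(a_val)
    total = 0
    for j, d in enumerate(booth_radix16_digits(b, width, signed)):
        pp = table[d] if d >= 0 else -table[-d]   # select from {0, +/-A, ..., +/-8A}
        total += pp << (4 * j)                    # partial product j has weight 16^j
    return total

if __name__ == "__main__":
    rng = random.Random(2009)
    for _ in range(10_000):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        assert booth_radix16_multiply(a, b) == to_signed(a, 32) * to_signed(b, 32)
        assert booth_radix16_multiply(a, b, signed=False) == a * b
    print(len(booth_radix16_digits(0x12345678)), "partial products, all random checks passed")
```

For signed operands, the ninth digit is always 0 under this extension, because it equals $-b_{31} + b_{31}$. It only contributes to unsigned multiplication.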


TABLE I. THE PERFORMANCE OF TWO MULTIPLIERS

| Multiplier | Slices used | Microcells used | Clock frequency, MHz (finish time, ns) |
|---|---|---|---|
| Iterative multiplier | 41/2352 (1%) | 115/256 (45%) | 114.88 (8.05) |
| Radix-16 multiplier | 182/2352 (7%) | 100/256 (40%) | 129.02 (7.55) |


ACKNOWLEDGMENT

This work was supported by the Natural Science Foundation of Guangdong Province, China.

REFERENCES

[1] JIANG Yong, LUO Yuping, MA Yan, YE Xin. Design and Implementation of 32-bit Parallel Multiplier Using FPGA [J]. Computer Engineering, 2005, 31(23): 222-224.
[2] Burchard B, Romer R, Fox O. A Single Chip Phoneme Based HMM Speech Recognition System for Consumer Applications [J]. IEEE Transactions on Consumer Electronics, 2000, 46(3).
[3] LIU Qiang, WANG Rongsheng. High-speed Parallel 32×32-b Multiplier Design Using Radix-16 Booth Encoders [J]. Computer Engineering, 2005, 31(6): 200-202.
[4] Dony C, Purchase J, Winder R. Exception Handling in Object-oriented System [C]. Report on ECOOP'91 Workshop W4, 1991: 17-30.
[5] WANG Xingang, FAN Xiaoya, LI Ying, QI Bin. Design and Implementation of a Parallel Multiplier [J]. Application Research of Computers, 2004, (7): 135-137.
[6] Pallavi Devi Gopineedi, Himanshu Thapliyal, M. B. Srinivas, Hamid R. Arabnia. Novel and Efficient 4:2 and 5:2 Compressors with Minimum Number of Transistors Designed for Low-Power Operations [C]. ESA, 2006: 160-168.
[7] CHEN Ping-hua, ZHAO Juan, XIE Guo-bo, LI Yi-jun. An Improved 32-bit Carry-Lookahead Adder with Conditional Carry-Selection [C]. Proceedings of 2009 4th International Conference on Computer Science & Education (ICCSE 2009): 1911-1913.
