Professional Documents
Culture Documents
Design OfHigh Performance 64 Bit MAC Unit
Design OfHigh Performance 64 Bit MAC Unit
Abstract - A design of high performance 64 bit block. The design consists of 64 bit modified Wallace
Multiplier-and-Accumulator ( MAC) is implemented in multiplier, 128 bit carry save adder and a register.
this paper. MAC unit performs important operation in
many of the digital signal processing (DSP)
This paper is divided into six sections. In the first
applications. The multiplier is designed using modified
section the introduction about MAC unit is discussed.
Wallace multiplier and the adder is done with carry
save adder. The total design is coded with verilog-HDL In the second section discuss about the detailed
and the synthesis is done using Cadence RTL complier operation of MAC unit. The third and fourth section
using typical libraries of TSMC O.18um technology. deals with the operation of modified Wallace
The total MAC unit operates at 217 MHz. The total multiplier and carry save adder respectively. In the
power dissipation is 177.732 mW. fifth section, the obtained result for the 64 bit MAC
unit is discussed and finally the conclusion is made in
Keywords - Modified Wallace multiplier, Carry save the sixth section.
adder, multiplier and accumulator (MAC).
rj+ 1
=
2[ri/3]+rjmod3 (2 )
MULTIPLIER If rj mod3 0, then rj+ 1 = 2r/3
=
(3 )
ACCUMULATOR • • • • • • • • • • • • • • • • • • •
• • • • • • • • • • • • • • • • •
• • • • • • • • • • • • • • •
• • • • • • • • • • • • •
• • • • • • • • • • •
• • • • • • • • •
• • • • • • •
• • • • •
• • •
·�//////////////:.
Figure I [4]: Basic architecture of MAC unit
• · Z,777777? : ·
• • •
Hence, a modification to the Wallace reduction is
done in which the delay is the same as for the
• �/�////.
• • • • • •
• • •
conventional Wallace reduction. The modified • • • •
reduction method greatly reduces the number of half • • • • • • • • • •
adders with a very slight increase in the number of • • • • •
783
2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]
process where the carryout of one stage is fed directly The design is developed using Verilog-HDL and
to the carry in of the next stage. This process is synthesized in Encounter RTL compiler using typical
continued without adding any intermediate carry libraries of TSMC 180nm technology.
propagation.
As a previous work, 8 bit MAC unit is designed
Since the representation of 128 bit carry save using different multipliers and adders. The
adder is infeasible , hence a typical 8 bit carry save multipliers used for comparative study are: (i)
adder is shown in the figure 3[6].Here we are Modified Booth Aigorithm (ii) Dadda Multiplier (iii)
computing the sum of two 128 bit binary numbers, Wallace multiplier. The different adders used in the
then 128 half adders at the first stage instead of 128 study are: (i) Carry Look Ahead (ii) Carry Select
full adder. Therefore , carry save unit comprises of Adder (iii) Carry Save adder. The area, delay and
128 half adders, each of which computes single sum power dissipation comparative table are shown in the
and carry bit based only on the corresponding bits of table 1 and table 2 respectively. The graphs are
the two input numbers. plotted against the different type of 8 bit MAC unit.
� 2000
During the addition of two numbers using a half 1000
o
adder, two ripple carry adder is used. This is due the
u u u u u u u u
fact that ripple carry adder cannot compute a sum bit -« -« -« -«
u
-« -« -« -« -«
784
2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]
Table 1: Area and Delay Report of Different MAC unit is also less when compared other MAC unit which
are done either by modified booth or dadda
N MAC NAME Area Report Delay
multiplier. Hence we have designed a 64 bit MAC
0 Cells Cell (ps) unit using Wallace multiplier and carry save adder.
Area
Puwe:r Dissip a ti o n ' ( n \V)
( 11m2) S 2 000000
'";i
'-'
1 Multiplier Modified Booth 221 7677 4995
Algorithm
.! 1500000
s..
... 1000000
Adder Carry Look Ahead
(MCLMAC)
ä 500000
�
2 Multiplier Dadda Multiplier 202 7904 4213
� 0
Adder Carry Look Ahead u u u u u u u u u
.q: .q: .q: .q: .q: .q: .q: .q: .q:
:;§; :;§; :;§; :;§; :;§; :;§; :;§; :;§; :;§;
(DCLMAC) --' IW --' UJ --'
u u t3 u u t3 u t3 IW
u
3 Multiplier Wallace Multiplier 90 4398 3084 :;§; :;§; :;§; , 0 ,0 ,0
3 3 3
Adder Carry Look Ahead Figure 5: Graph Comparison of Total Power Dissipation in nW
(WCLMAC)
4 Multiplier Modified Booth 202 7418 4556 The simulation waveform result of 64 bit MAC
unit in shown in the figure 7. In the simulation result
Algorithm
it is observed that there is a delay by one clock cycle,
Adder Carry Select this is because it takes some time to compute since
(MCEMAC) the multiplier used here is 64 bit and the adder used
here is 128 bits.
5 Multiplier Dadda Multiplier 183 7637 4125
Adder Carry Select Table 2: Power Dissipation Report of Different MAC Unit
(DCEMAC)
SI.No MAC Power Report
6 Multiplier Wallace Multiplier 108 5013 1890
NAME
Adder Carry Select
Leakage Dynamic Total
(WCEMAC)
Power Power Power
7 Multiplier Modified Booth 185 6819 4545
(nW) (nW) (nW)
Algorithm
785
2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]
is 11916 and the cell area occupied by MAC unit is VI. CONCLUSION
542177 11m2. For 64 bit MAC unit, the delay is only Hence a design of high performance 64 bit
4900 ps. Multiplier-and-Accumulator (MAC) is implemented
De1ay(ils) in this paper. The total MAC unit operates at a
frequency of 217 MHz. The total power dissipated
6000 by 64 bit MAC unit is 177.732 mW. The total area
5000 occupied by it is 542177 11m2. Since the delay of 64
bit is less, this design can be used in the system
0.. 4000
which requires high performance in processors
.S 3000 involving large number of bits of the operation. The
i!'
;! 2000 MAC unit is designed using Verilog-HDL and
synthesized in Cadence 180nm RTL Complier.
1000
o REFERENCES
u u u u u u u u (....) [1].Young-Ho Seo and Dong-Wook Kim, "New VLSI
�-:c �-:( �c( �c( « �-:( .0:0-:( .0:.;1; �-:(
Architecture of Parallel Multiplier-Accumulator Based on Radix-2
2: 2: 2: 2: 2: 2: 2: 2:
."...
.ii!
--' w --' --' w Modified Booth Algorithm," IEEE Transactions on very large
u u tJ u
w
(J tJ u U tJ
."...
� 2: 2: Cl Cl Cl
:;: $ $ scale integration (vlsi) systems, vol. 18, no. 2,february 20 10
Figure 6: Graph Comparison of Delay in ps (2). Ron S. Waters and Earl E. Swartzlander, Jr., "A Reduced
Complexity Wallace Multiplier Reduction, " IEEE Transactions
On Computers, vol. 59, no. 8, Aug 20 10
From the figure 6, we can infer less delay caused
[3]. C. S. Wallace, "A suggestion for a fast multiplier," iEEE
by Wallace - Carry Save MAC unit. Hence the
Trans. ElectronComput., vol. EC-13, no. I, pp. 14-17, Feb. 1964
performance of 64 bit MAC unit is high.
[4]. Shanthala S, Cyril Prasanna Raj, Dr.S.Y.Kulkarni, "Design
and VLST Implementation of Pipelined Multiply Accumulate
Unit," IEEE International Conference on Emerging Trends in
Engineering and Technology, ICETET-09
Figure 7: Simulation Waveform of 64 bit MAC uni!. Adders and Multipliers", in "Design of High-Performance
Microprocessor Circuits", Book edited by A.Chandrakasan,IEEE
Press,2000
Table 3: Area and Delay Report of 64 Bit MAC Unit
[8]. Dadda, "Some Schemes for Parallel Multipliers," Alta
Frequenza, vol. 34, pp. 349-356, 1965
Instance Cells Cell Delay
area(J.1m2) (ps) [9]. C.S. Wallace "A Suggestion for a fast multipliers," IEEE
MAC unit 11916 542177 4900 Trans. Electronic Computers, vol. 13, no.l,pp 14-17, Feb. 1967
786