An Efficient Architecture For Signed Carry Save Multiplication
IEEE LETTERS OF THE COMPUTER SOCIETY, VOL. 3, NO. 1, JANUARY–JUNE 2020
Pramod Patali and Shahana Thottathikkulam Kassim

Abstract: The performance of a digital signal processing (DSP) system is greatly affected by the performance of its multiplication operations. Simultaneous improvement in performance metrics such as delay, power, area, and energy efficiency is difficult to achieve and is a challenge to be addressed. To this end, an efficient carry save multiplier (CSM) that employs a modified square root carry select adder (MSCA) for the vector-merging addition and an improved full adder (IFA) in place of the conventional full adder is proposed. Among 16 x 16 multipliers, the critical path delay (CPD), power, area, power delay product (PDP), and area delay product (ADP) of the proposed CSM are improved by 27.74, 19.4, 46.2, 41.4, and 60.87 percent respectively in comparison with the improved Booth multiplier and by 46.43, 31.46, 36.9, 63.05, and 65.96 percent respectively in comparison with the low PDP Booth multiplier. Cadence software with the gpdk 45 nm standard cell library is used for the design and implementation.

Index Terms: Computer arithmetic, low-power design, processors, VLSI systems

The authors are with the Division of Electronics, School of Engineering, Cochin University of Science and Technology, Kochi, Kerala 682022, India. E-mail: pramodp2006@gmail.com, shahanatk@cusat.ac.in.
Manuscript received 1 Nov. 2019; revised 1 Jan. 2020; accepted 25 Jan. 2020. Date of publication 3 Feb. 2020; date of current version 25 Feb. 2020.
(Corresponding author: Pramod Patali.)
Recommended for acceptance by I. Iliadis.
Digital Object Identifier no. 10.1109/LOCS.2020.2971443

1 INTRODUCTION
Real-time digital signal processing (DSP) architectures require a low-complexity, delay- and energy-efficient multiplier in order to support high-speed processing of input data [1]. Various multiplication schemes have been proposed over the years [2] to [5]. A radix-4 8 x 8 bit multiplier using an improved binary to two's complement converter (BTC) is introduced in [6]. The improvement in delay through the use of the improved BTC was negated by the serial processing of data through the stages. A conventional carry save multiplier (CSM) has a simple and regular structure. In a CSM, the carry bits are not immediately added, but saved to pass diagonally downwards. Though the speed is improved through the carry save operation, the delay performance is affected by the final vector-merging adder. A delay and energy efficient modular hybrid adder is discussed in [7]. An efficient CSM that uses the high speed and energy efficient MSCA [8] for vector-merging addition and an improved full adder in place of the conventional one is proposed here.
The rest of this paper is organized as follows. The conventional CSM is discussed in Section 2 and the proposed CSM is introduced in Section 3. The performance comparison of various multipliers is done in Section 4. The conclusion is given in Section 5.

where $A_i$ and $B_j$ ($i = 0$ to $3$, $j = 0$ to $3$) respectively represent the multiplicand and multiplier bits.

A gate level implementation of a low power and small-area approximate multiplier is discussed in [9]. A significance-driven logic compression based energy efficient approximate multiplier is presented in [10]. The modified Baugh-Wooley scheme for 8 x 8 signed multiplication is shown in Fig. 2. Here $i = 0$ to $7$, $j = 0$ to $7$ and $k = 0$ to $15$.
The CPD of the conventional 8 x 8 signed CSM using the modified Baugh-Wooley scheme may be given by

$$T_{S8} = T_{and2} + T_{has} + 5T_{fas} + T_{mer8} \tag{2}$$

$$T_{S8} = T_{and2} + T_{has} + 5T_{fas} + T_{fadc} + 6T_{fac} + T_{has} \tag{3}$$

where $T_{and2}$ is the delay incurred by a two-input AND gate, $T_{fas}$ is the delay incurred by a full adder for the generation of sum, $T_{fac}$ is the delay from carry input to carry output of a full adder, $T_{fadc}$ is the delay from data input to carry output of a full adder and $T_{has}$ is the delay incurred by a half adder for the generation of sum.
For a conventional full adder,

$$T_{fas} = 2T_{xor2} \tag{4}$$
$$T_{fadc} = T_{xor2} + T_{and2} + T_{or2} \tag{5}$$
$$T_{fac} = T_{and2} + T_{or2} \tag{6}$$
$$T_{has} = T_{xor2}. \tag{7}$$

From equations (3), (4), (5), (6) and (7), the CPD is obtained as

$$T_{S8} = 14T_{xor2} + 8T_{and2} + 7T_{or2}. \tag{8}$$

3 MODIFIED CARRY SAVE MULTIPLIER
The modified CSM is developed by incorporating the following strategies.
1. The conventional full adder (FA) structure is replaced by the improved full adder (IFA).
2. The conventional vector-merging adder (CVMA) is replaced by the delay and energy efficient MSCA.

3.1 Improved Full Adder
The sum ($S$) and the carry ($C_o$) outputs of a conventional full adder shown in Fig. 3a may be represented by

$$S = (A \oplus B) \oplus C_{in} \tag{9}$$

The carry propagation delays are improved as follows.

$$T_{ifadc} = T_{or2} + 2T_{nand2} \tag{13}$$
$$T_{ifac} = 2T_{nand2}. \tag{14}$$

where $T_{ifadc}$ represents the propagation delay from data inputs to carry output and $T_{ifac}$ represents the propagation delay from carry input to carry output of the IFA.
The performance comparison of conventional and improved full adders at 45 nm is shown in Table 1.
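The substitution that leads from (3) to (8) can be checked mechanically. The following Python sketch represents each delay as a count of primitive gate delays and expands the right-hand side of (3) using (4) to (7); the helper names `delay` and `repeat` are illustrative and do not come from the paper. Carried out on the terms as they are printed here, the expansion gives 13 T_xor2 + 8 T_and2 + 7 T_or2. The AND and OR counts agree with (8), while the XOR count is one less than the printed 14, so one term of (3) may differ in the original from what is reproduced here.

```python
from collections import Counter


def delay(**gates):
    """A path delay expressed as a count of primitive gate delays."""
    return Counter(gates)


def repeat(term, k):
    """Delay of k identical cells traversed in series."""
    return Counter({gate: count * k for gate, count in term.items()})


# Conventional cells, equations (4)-(7).
T_and2 = delay(and2=1)
T_fas = delay(xor2=2)                   # (4) sum output of a full adder
T_fadc = delay(xor2=1, and2=1, or2=1)   # (5) data input to carry output
T_fac = delay(and2=1, or2=1)            # (6) carry input to carry output
T_has = delay(xor2=1)                   # (7) sum output of a half adder

# Right-hand side of equation (3), term by term.
terms = [T_and2, T_has, repeat(T_fas, 5), T_fadc, repeat(T_fac, 6), T_has]
T_S8 = sum(terms, Counter())

for gate in ("xor2", "and2", "or2"):
    print(f"{T_S8[gate]:2d} x T_{gate}")
```

Replacing the unit counts with gate delays from a particular cell library would turn the same expansion into a numeric CPD estimate.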
2573-9689 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Saveetha Engineering College. Downloaded on February 04,2021 at 04:06:17 UTC from IEEE Xplore. Restrictions apply.
10 IEEE LETTERS OF THE COMPUTER SOCIETY, VOL. 3, NO. 1, JANUARY–JUNE 2020
TABLE 1
Comparison of Conventional Full Adder (FA) and Improved Full Adder (IFA) at 45 nm After Digital Synthesis Using RTL Compiler v11.10
where $T_{nand2}$ is the delay incurred by a two-input NAND gate, $T_{has1}$ is the delay incurred by the half adder (HA1) for the generation of sum, $T_{ifas}$ is the delay incurred by the improved full adder for the generation of sum, $T_{ifac}$ is the delay incurred by the improved full adder for the generation of carry, $T_{mcsa7}$ is the delay incurred by the 7-bit MSCA and $T_{xnor2}$ is a two-input XNOR gate delay.

$$T_{has1} = T_{xnor2} \tag{16}$$
$$T_{ifas} = T_{or2} + T_{nand2} + T_{xnor2} \tag{17}$$
$$T_{mcsa7} = T_{cpg0} + T_{mcg0} + T_{mcg1} + T_{cs2} + T_{sg2}, \tag{18}$$

where

Combining equations (13), (16), (17) and (24) with equation (15), the CPD of the proposed 8 x 8 CSM may be obtained as

1. CPG: The carry propagate and generate (CPG) block generates the propagate and generate functions.
2. NCC: The NAND carry chain (NCC) generates two carry rows, one for input carry $C_{in} = 0$ and the other for $C_{in} = 1$.
3. CS: The carry select (CS) block selects one of the two possible carries depending upon the input carry.
4. SG: The sum generation (SG) block generates the final sum.
5. MCG: The module carry generation (MCG) block generates the module end carry.

The MSB of the multiplication result, $P_{15}$, is obtained by adding bit '1' to the carry output $C_{14}$ of the previous (14th) bit addition and may be expressed by (26).
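The constant '1' that enters the MSB column originates in the modified Baugh-Wooley arrangement of the partial products. The following Python sketch is a behavioural model only. It assumes the standard formulation of that scheme: partial products containing exactly one sign bit are complemented, and a '1' is added in columns n and 2n-1. This formulation is taken here to correspond to Fig. 2, which is not reproduced. The function name `baugh_wooley_multiply` is illustrative.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 8) -> int:
    """Multiply two n-bit two's complement integers with the modified
    Baugh-Wooley partial product arrangement (behavioural model)."""
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask          # raw two's complement bit patterns
    total = 0
    for i in range(n):
        for j in range(n):
            bit = ((ua >> i) & 1) & ((ub >> j) & 1)
            # Partial products with exactly one sign bit are complemented.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    # Constant '1' bits injected in columns n and 2n-1.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1          # keep the 2n-bit product
    # Reinterpret the 2n-bit pattern as a signed value.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total


print(baugh_wooley_multiply(-7, 5))      # -35
```

For n = 8 the two constants land in columns 8 and 15, which is consistent with the '1' added to $C_{14}$ when forming $P_{15}$.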
Fig. 4. An 8 x 8 Modified carry save multiplier showing critical path. The logic elements along the critical path are highlighted.
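The carry-save principle underlying the array of Fig. 4 can be sketched behaviourally. The Python model below is a simplification under stated assumptions: it handles unsigned operands only, treats each adder row as a bitwise 3:2 compression, and uses ordinary integer addition in place of the MSCA for the vector-merging step. The function names are illustrative.

```python
from typing import Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """One row of full adders: compress three operands into a sum word
    and a carry word without propagating carries along the row."""
    sum_word = x ^ y ^ z
    # Each carry is saved and handed to the next more significant column.
    carry_word = ((x & y) | (y & z) | (x & z)) << 1
    return sum_word, carry_word


def carry_save_multiply(a: int, b: int, n: int = 8) -> int:
    """Unsigned n x n multiplication with carry-save accumulation."""
    sum_word, carry_word = 0, 0
    for j in range(n):
        partial = (a << j) if (b >> j) & 1 else 0
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    # Vector-merging addition: the only carry-propagating step.
    return sum_word + carry_word


print(carry_save_multiply(13, 11))       # 143
```

Because carries propagate only in the last line of `carry_save_multiply`, the speed of the vector-merging adder dominates the delay once the array itself is fast.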
TABLE 3
Results of Various Signed Multipliers in Terms of Critical Path Delay (CPD), Power, Area, PDP, and ADP at 45 nm After Digital Synthesis Using RTL Compiler v11.10

Multipliers   Size   CPD (ns)   Power (nW)   Area (sq. µm)   PDP (fJ)   ADP (sq. µm x ns)

TABLE 4
Comparison of Performance of the Proposed 20 x 20 Multiplier and SDCM at 45 nm

Multipliers   Delay (ns)   Power (nW)   Area (sq. µm)   PDP (fJ)   ADP (sq. µm x ns)
Prop. CSM     1.45         226334       3545            327        5123
SDCM [10]     2.83         169039       2944            479        8340
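PDP and ADP are the products of the delay with the power and with the area respectively, so the last two columns of Table 4 can be recomputed from the first three. The Python sketch below does this; the dictionary layout and the function name `recompute` are illustrative. The recomputed values agree with the tabulated ones to within about half a percent, and the residual presumably reflects rounding of the tabulated delay.

```python
# Table 4 entries: delay in ns, power in nW, area in the table's area unit.
table4 = {
    "Prop. CSM": {"delay": 1.45, "power": 226334, "area": 3545,
                  "pdp": 327, "adp": 5123},
    "SDCM [10]": {"delay": 2.83, "power": 169039, "area": 2944,
                  "pdp": 479, "adp": 8340},
}


def recompute(row):
    """Return (PDP in fJ, ADP in area unit x ns) from delay, power, area."""
    pdp_fj = row["delay"] * row["power"] * 1e-3   # ns x nW = aJ; 1000 aJ = 1 fJ
    adp = row["delay"] * row["area"]
    return pdp_fj, adp


for name, row in table4.items():
    pdp_fj, adp = recompute(row)
    print(f"{name}: PDP {pdp_fj:.1f} fJ (table {row['pdp']}), "
          f"ADP {adp:.1f} (table {row['adp']})")
```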
comparison with multipliers 2, 3, 4, 5, 6, 7, 8 and 9, whereas among 16 x 16 multipliers it is reduced by 60.87, 40.56, 65.96, 49.71, 77.36, 40.19, 74.91 and 59.96 percent respectively. The proposed 8 x 8 and 16 x 16 multipliers outperform all the accurate multipliers (2 to 9) in all five performance metrics. The number of bits in the partial product matrix of the significance-driven logic compression based approximate multiplier (SDCM) is reduced by performing lossy logic compression. Improvement in PDP and ADP is achieved at the cost of an increased percentage of inaccuracy. The CPDs of the proposed 8 x 8 and 16 x 16 multipliers are reduced by 3.75 and 15.36 percent respectively in comparison with SDCM. As the multiplier size increases, the CPD, PDP and ADP of the proposed multiplier improve further, as shown in Table 4, primarily due to the high-speed vector-merging addition.
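The trend stated for Table 4 can be quantified with the usual relative-reduction formula, (reference - proposed) / reference x 100. The Python sketch below applies it to the Table 4 entries with SDCM as the reference; a negative result means the proposed design is higher in that metric. These percentages are derived here from the tabulated values and are not quoted from the paper, and the function name `percent_reduction` is illustrative.

```python
def percent_reduction(reference: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return (reference - proposed) / reference * 100.0


# Table 4 values for the 20 x 20 multipliers.
sdcm = {"CPD": 2.83, "Power": 169039, "Area": 2944, "PDP": 479, "ADP": 8340}
prop = {"CPD": 1.45, "Power": 226334, "Area": 3545, "PDP": 327, "ADP": 5123}

reductions = {m: percent_reduction(sdcm[m], prop[m]) for m in sdcm}

for metric, value in reductions.items():
    print(f"{metric}: {value:+.2f} %")
```

With these inputs the CPD, PDP and ADP come out lower for the proposed multiplier, while its power and area come out higher than those of SDCM.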
5 CONCLUSION
A delay, power, area and energy efficient carry save multiplier is presented. A precise critical path analysis of the carry save multiplier was carried out, and the critical path delay was derived as a function of the number of full adders and logic gates. Markedly improved performance in the majority of the performance metrics is achieved through the use of the improved full adder and the modified square root CSLA. The structural simplicity and regularity of the full adder/half adder array of the proposed CSM is improved through the use of IFA. The carry generation delay, area and power of the IFA is