PH Alguna 2018
P. V. Ananda Mohan§
Department of R&D, Centre for Development of Advanced Computing,
Ministry of Electronics and Information Technology,
No. 1, Knowledge Park, Old Madras Road,
Byappanahalli, Bengaluru 560038, Karnataka, India
anandmohanpv@live.in
In this paper, mixed radix conversion (MRC)-based residue number system (RNS)-to-binary
converters for two new three-moduli sets $\{2^k - 3, 2^k - 2, 2^k - 1\}$ and $\{2^k + 1, 2^k + 2, 2^k + 3\}$,
which are derived from the moduli set $\{2m + 1, 2m, 2m - 1\}$, are presented. These have the
advantage of having one modulus of the form $2^k - 1$ or $2^k + 1$, simplifying computations in one
residue channel. The proposed reverse converters are evaluated and compared with
state-of-the-art reverse converters proposed in literature for other three-moduli sets regarding
hardware requirement and conversion time.
Keywords: Residue number systems; reverse converters; three-moduli sets; CRT; mixed radix
conversion.
1. Introduction
The advantages of residue number system (RNS) such as carry-free operation,
modularity and fault tolerance have made it attractive in applications like cryptography,
digital signal processing (DSP) and communication systems (Refs. 1–4). An RNS
P. S. Phalguna, D. V. Kamat & P. V. Ananda Mohan
RNS-to-Binary Converters for New Three-Moduli Sets
2. Background Material
The two popular approaches used for the reverse conversion process in RNS are CRT
and MRC. In CRT, we compute decoded binary number X as
$$X = \left( \sum_{i=1}^{j} x_i M_i \left| \frac{1}{M_i} \right|_{m_i} \right) \bmod M, \qquad (1)$$
where $M$ is the product of all $j$ moduli. Note that $M_i = M/m_i$ and $x_i$ are the given
residues defined such that $x_i = X \bmod m_i$. Note also that $|1/M_i|_{m_i}$ denotes the
multiplicative inverse of $M_i$ with respect to modulus $m_i$, satisfying the relationship
that $M_i |1/M_i|_{m_i}$ when divided by $m_i$ yields the remainder 1. The main advantage of CRT is
the parallel computation of various terms in Eq. (1) corresponding to the given
residues $x_i$ followed by the summation of various terms mod $M$. However, the sum in
Eq. (1) before modulo $M$ reduction is only guaranteed to be less than $jM$, thus needing the
time-consuming mod $M$ reduction. In special cases, where $M$ is of the form $(2^x - 1)$, CRT
will be attractive as has been demonstrated in case of the popular three-moduli set
$\{2^p - 1, 2^p, 2^p + 1\}$ (Refs. 7–12), the four-moduli set $\{2^p - 1, 2^p, 2^p + 1, 2^{2p} + 1\}$
(Ref. 22) and the five-moduli set for $p$ odd $\{2^p - 1, 2^p, 2^p + 1, 2^p - 2^{(p+1)/2} + 1,
2^p + 2^{(p+1)/2} + 1\}$ (Ref. 23).
In MRC, the decoded number X is obtained as
$$X = U_j \prod_{i=1}^{j-1} m_i + \cdots + U_2 m_1 + U_1, \qquad (2)$$
(3)
Note that MRC is a sequential process needing $(j - 1)$ steps. In each step, a single
mixed radix digit $U_i$ is determined. The next step is to compute Eq. (2). Note that the
cumbersome modulo $M$ reduction needed in the case of CRT in Eq. (1) is not needed
in MRC since $X < M$. In the present paper, we use the MRC technique for deriving
various reverse converters.
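The two decoding approaches can be compared with a minimal Python sketch. The function names `crt` and `mrc` and the ordering of the moduli are assumptions made for illustration; the sketch follows Eqs. (1) and (2) in software and says nothing about the hardware architectures discussed later. It requires Python 3.8 or later for `math.prod` and the three-argument `pow` with a negative exponent.

```python
from math import prod


def crt(residues, moduli):
    """Decode X with the CRT sum of Eq. (1), followed by one mod-M reduction."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        # pow(M_i, -1, m_i) is the multiplicative inverse of M_i modulo m_i.
        total += x_i * M_i * pow(M_i, -1, m_i)
    return total % M


def mrc(residues, moduli):
    """Return the mixed radix digits and X assembled as in Eq. (2)."""
    work = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        u = work[i]                      # next mixed radix digit
        digits.append(u)
        for n in range(i + 1, len(moduli)):
            m_n = moduli[n]
            # Subtract the digit, then multiply by the inverse of m_i mod m_n.
            work[n] = ((work[n] - u) * pow(m_i, -1, m_n)) % m_n
    x, weight = 0, 1
    for u, m_i in zip(digits, moduli):
        x += u * weight                  # no final mod-M reduction is needed
        weight *= m_i
    return digits, x


print(crt((10, 11, 5), (14, 13, 15)))    # 2390
print(mrc((10, 11, 5), (14, 13, 15)))    # ([10, 1, 13], 2390)
```

Both routines recover the same integer; the MRC version additionally exposes the mixed radix digits, at the price of the sequential dependence between steps.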
3.1. MRC-based converters for the moduli set M2 $\{2^k - 3, 2^k - 2, 2^k - 1\}$
We denote $m_{1A} = 2^k - 3$, $m_{2A} = 2^k - 2$ and $m_{3A} = 2^k - 1$, the corresponding
residues as $(x_{1A}, x_{2A}, x_{3A})$ and the binary number corresponding to this residue set as
$X$. The MRC technique for M2 is shown in Fig. 1(a). The various multiplicative
inverses needed in the computation are as follows:
$$p = \left| \frac{1}{2^k - 2} \right|_{2^k - 3} = 1, \qquad (4a)$$
$$q = \left| \frac{1}{2^k - 2} \right|_{2^k - 1} = -1, \qquad (4b)$$
$$r = \left| \frac{1}{2^k - 3} \right|_{2^k - 1} = -2^{k-1}. \qquad (4c)$$
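The inverses in Eqs. (4a)–(4c) can be checked numerically with a short Python sketch. The helper name `inverses_m2` and the tested range of $k$ are assumptions chosen for illustration; negative inverses are mapped to their non-negative representatives modulo $2^k - 1$.

```python
def inverses_m2(k):
    """Return the moduli of M2 and the inverses (p, q, r) of Eqs. (4a)-(4c)."""
    m1, m2, m3 = 2**k - 3, 2**k - 2, 2**k - 1
    p = 1                          # inverse of m2 modulo m1
    q = (-1) % m3                  # inverse of m2 modulo m3
    r = (-(2 ** (k - 1))) % m3     # inverse of m1 modulo m3
    return (m1, m2, m3), (p, q, r)


for k in range(3, 17):
    (m1, m2, m3), (p, q, r) = inverses_m2(k)
    assert (m2 * p) % m1 == 1
    assert (m2 * q) % m3 == 1
    assert (m1 * r) % m3 == 1

print(inverses_m2(4))   # ((13, 14, 15), (1, 14, 7))
```

For $k = 4$ the representatives are $q = 14$ and $r = 7$ modulo 15, i.e. $-1$ and $-8$.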
by UNIVERSITY OF NEW ENGLAND on 04/30/18. For personal use only.
J CIRCUIT SYST COMP Downloaded from www.worldscientific.com
Fig. 1. (a) MRC algorithm for the moduli set M2 and (b) architecture of converter D6CE.
Fig. 2. (a) Subtractor mod $(2^k - 3)$, (b) subtractor mod $(2^k - 1)$ and (c) high-speed (HS) mod $m_i$
subtractor.
where x2A , UA and UC are the mixed radix digits. We rewrite negative terms present
in Eq. (5) in terms of one's complement of similar word length and adding a
Fig. 3(a) are effectively packed along with the correction term in order to reduce the
number of words that need to be added to realize Eq. (6) to six as shown in Fig. 3(b).
The various words W1–W6 present in the bit matrix of Fig. 3(b) can be added using a
four-level 3k-bit carry save adder (CSA) tree (CSA1) followed by a 3k-bit carry
propagate adder (CPA1) as shown in BLOCK1 in Fig. 1(b).
The high-speed (HS) converter D6HS can be realized by replacing the
MODSUBA block and the two MODSUBB blocks of Fig. 1(b) with the MODSUBC
block shown in Fig. 2(c) to obtain intermediate results U A , UB and U C without
changing the other blocks. In the implementation of the architecture of Fig. 1(b), we
need three modulo subtractors. The CE version of a modulo subtractor (MODSUBA
block) for computing $(x_{1A} - x_{2A}) \bmod (2^k - 3)$ can be realized as shown in Fig. 2(a),
in which we compute $T_1 = (x_{1A} - x_{2A})$ followed by $T_2 = (T_1 + m_{1A})$ using two k-bit
adders CPA2 and CPA3 and, based on the sign of $T_1$, we select either $T_1$ or $T_2$ using a
2:1 multiplexer (2:1 MUX). Thus the hardware requirement is two k-bit CPAs
(CPA2 and CPA3) and one k-bit 2:1 MUX. The computation time needed is
$(2k)\Delta_{FA} + \Delta_{MUX}$, where $\Delta_{FA}$ and $\Delta_{MUX}$ are the delays of a full-adder (FA)
and a 2:1 MUX, respectively.
Fig. 3. Bit matrices for the last stage computation of D6CE for $k = 4$.
In the special case of $m_{3A} = 2^k - 1$, the modulo subtractor $(x_{2A} - x_{3A}) \bmod (2^k - 1)$
can be realized in a cost-effective way using a CPA4 with end-around carry
(EAC) as shown in the MODSUBB block in Fig. 2(b). Thus, the hardware requirement
is less (only k inverters and k full-adders) whereas the computation time
is $2k\Delta_{FA}$.
We can use the HS modulo subtractor of Fig. 2(c) in which we compute
$T_3 = (x_c - x_d)$ and $T_4 = (x_c - x_d + m_i)$ in two parallel adders and, based on the sign of $T_3$,
we select either $T_3$ or $T_4$ using a 2:1 MUX. Note that the one's complement of $x_d$ and a
carry input of one are added to realize the two's complement of $x_d$. Thus the hardware
requirement is two k-bit carry-propagate adders (CPA5 and CPA6), one k-bit
carry-save adder (CSA2) and one k-bit 2:1 MUX. The computation time is
$(k + 1)\Delta_{FA} + \Delta_{MUX}$. We denote this block as MODSUBC.
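The behaviour of the modulo subtractors described above can be modelled in a few lines of Python. This is a functional sketch only: the names `modsub_select` and `modsub_eac` are hypothetical, and the model ignores whether the two candidate results are produced sequentially (Fig. 2(a)) or in parallel (Fig. 2(c)), since both select between the same two values.

```python
def modsub_select(xc, xd, m):
    """(xc - xd) mod m by computing both candidates and selecting on the sign."""
    t3 = xc - xd           # plain difference
    t4 = t3 + m            # difference corrected by the modulus
    return t4 if t3 < 0 else t3


def modsub_eac(xc, xd, k):
    """(xc - xd) mod (2^k - 1) using one's complement and end-around carry."""
    mask = (1 << k) - 1
    s = xc + (~xd & mask)              # add the one's complement of xd
    s = (s & mask) + (s >> k)          # fold the carry-out back in (EAC)
    return 0 if s == mask else s       # the all-ones word also represents zero


print(modsub_select(3, 10, 13))   # 6
print(modsub_eac(3, 10, 4))       # 8
```

The final normalisation in `modsub_eac` is a choice made here so that the result can be compared with ordinary modular arithmetic; hardware based on end-around carry may leave the all-ones representation of zero in place.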
3.2. MRC-based converters for the moduli set M3 $\{2^k + 1, 2^k + 2, 2^k + 3\}$
The MRC for this moduli set follows the procedure presented in Fig. 4(a). We denote
the residues as $(x_{1B}, x_{2B}, x_{3B})$ corresponding to the moduli $m_{1B} = 2^k + 1$,
$m_{2B} = 2^k + 2$ and $m_{3B} = 2^k + 3$, respectively. Note that all the moduli are of length $(k + 1)$
bits. The various multiplicative inverses needed for this approach shown in Fig. 4(a)
are as follows:
$$e = \left| \frac{1}{2^k + 2} \right|_{2^k + 3} = -1, \qquad (7a)$$
$$f = \left| \frac{1}{2^k + 2} \right|_{2^k + 1} = 1, \qquad (7b)$$
$$g = \left| \frac{1}{2^k + 3} \right|_{2^k + 1} = 2^{k-1} + 1. \qquad (7c)$$
Note that Eqs. (7a)–(7c) can be easily verified noting that
$((2^k + 2)(-1)) \bmod (2^k + 3) = 1$, $(2^k + 2) \bmod (2^k + 1) = 1$ and
$((2^k + 3)(2^{k-1} + 1)) \bmod (2^k + 1) = 1$. The architecture of the converter D7 following Fig. 4(a) is shown in
Fig. 4(b). The mixed radix digit $P$ can be computed as $(x_{2B} - x_{3B}) + t(2^k + 3)$ since
$e = -1$ [see Eq. (7a)], where $t$ is 0 if $x_{2B} \ge x_{3B}$ and 1 otherwise. Note that
$(x_{2B} - x_{3B}) + t(2^k + 3)$ is computed using the MODSUBA block [see Fig. 4(b)]. The sign bit of
the result of $(x_{2B} - x_{3B})$ [sign bit of the output of CPA2 in the MODSUBA block in
Fig. 2(a)] is considered as $t$.
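The verification of Eqs. (7a)–(7c) and the computation of $P$ from the sign bit $t$ can be reproduced with a small Python sketch. The helper names `inverses_m3` and `digit_p` are assumptions for illustration, and the check against the generic form $((x_{3B} - x_{2B})\,e) \bmod m_{3B}$ assumes the digit is defined in the usual MRC way.

```python
def inverses_m3(k):
    """Return the moduli of M3 and the inverses (e, f, g) of Eqs. (7a)-(7c)."""
    m1, m2, m3 = 2**k + 1, 2**k + 2, 2**k + 3
    e = (-1) % m3                 # inverse of m2 modulo m3
    f = 1                         # inverse of m2 modulo m1
    g = 2 ** (k - 1) + 1          # inverse of m3 modulo m1
    return (m1, m2, m3), (e, f, g)


def digit_p(x2, x3, k):
    """Mixed radix digit P = (x2 - x3) + t*(2^k + 3), with t the sign bit."""
    m3 = 2**k + 3
    t = 0 if x2 >= x3 else 1
    return (x2 - x3) + t * m3, t


for k in range(2, 17):
    (m1, m2, m3), (e, f, g) = inverses_m3(k)
    assert (m2 * e) % m3 == 1 and (m2 * f) % m1 == 1 and (m3 * g) % m1 == 1

print(inverses_m3(4))     # ((17, 18, 19), (18, 1, 9))
print(digit_p(3, 10, 4))  # (12, 1)
```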
Next, we consider computation of the mixed radix digit $Q$. We compute
$(x_{1B} - x_{2B})$ but we defer modulo $m_{1B}$ reduction since the multiplicative inverse
$f$ with which we need to multiply mod $(2^k + 1)$ is unity [see Eq. (7b)]. Next, unlike
in conventional MRC, we subtract $((x_{2B} - x_{3B}) + t(2^k + 3))$ from $(x_{1B} - x_{2B})$ to
obtain the intermediate result
$$Q = (x_{3B} - 2x_{2B} + x_{1B} - t(2^k + 3)) \bmod (2^k + 1) = (x_{3B} - 2x_{2B} + x_{1B} - 2t) \bmod (2^k + 1).$$
Note that in the second equality, we have used the fact $(2^k + 3) \bmod (2^k + 1) = 2$.
The subtraction of $((x_{2B} - x_{3B}) + t(2^k + 3))$ instead of $P$ has the advantage that $t$ is
available before $P$ is available, saving one k-bit CPA delay.
Fig. 4. (a) MRC for three-moduli set $\{2^k + 1, 2^k + 2, 2^k + 3\}$ and (b) architecture of converter D7.
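As a software cross-check of the steps above, the following Python sketch decodes a residue triple for M3 using the sign bit $t$, the deferred reduction and the multiplier $g$. The function name `mrc_m3` is hypothetical, and the final weighted sum $x_{2B} + P\,m_{2B} + Q\,m_{2B} m_{3B}$ is the standard mixed radix form of Eq. (2) applied to this digit order, which is an assumption about how the digits are combined.

```python
def mrc_m3(x1, x2, x3, k):
    """Decode (x1, x2, x3) for moduli (2^k + 1, 2^k + 2, 2^k + 3)."""
    m1, m2, m3 = 2**k + 1, 2**k + 2, 2**k + 3
    g = 2 ** (k - 1) + 1                     # Eq. (7c)
    t = 0 if x2 >= x3 else 1                 # sign bit of (x2 - x3)
    P = (x2 - x3) + t * m3                   # digit for modulus m3 (e = -1)
    # Intermediate result: uses t, not P, and (2^k + 3) mod (2^k + 1) = 2.
    q_int = (x3 - 2 * x2 + x1 - 2 * t) % m1
    Q = (q_int * g) % m1                     # digit for modulus m1
    X = x2 + P * m2 + Q * m2 * m3            # assumed mixed radix weights
    return P, Q, X


print(mrc_m3(16, 5, 7, 4))   # (17, 14, 5099)

# Exhaustive check for k = 3, i.e. moduli (9, 10, 11).
for X in range(9 * 10 * 11):
    assert mrc_m3(X % 9, X % 10, X % 11, 3)[2] == X
```

The input (16, 5, 7) is an arbitrary value chosen here, not one taken from the paper.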
Next, we consider realizing the computation of $Q$ and multiplication with
$g = (2^{k-1} + 1)$. Note that $g$ can be obtained by deleting the least significant bit
where x2B , P and Q are the mixed radix digits. We rewrite Eq. (9) as
Fig. 5. Bit matrices for the final stage computation of converter D7 for $k = 4$.
As an illustration, we consider $k = 4$, for which case the bits that need to be summed
are shown in Fig. 5(a). These can be rearranged in order to reduce the number of words
that need to be added to six as shown in Fig. 5(b). Using a four-level $(3k + 1)$-bit CSA
tree followed by a $(3k + 1)$-bit CPA, we can obtain the decoded integer $X$.
The hardware requirements for the computation of the last stages of the converters
D6 and D7 can be estimated as $(8k - 1)$ FAs $+ (2k + 1)$ HAs and $(6k + 7)$ FAs $+ (k - 2)$ HAs,
respectively, where HA denotes half-adder; the conversion times required for the last
stages of D6 and D7 are estimated as $(3k + 4)\Delta_{FA}$ and $(3k + 5)\Delta_{FA}$, respectively.
Fig. 6. Computation of $Z_A$ from $U_A$ and $U_C$ (blocks CPA9, CSA6, CPA10 and a 2:1 MUX, with $m_{2A}$ as the correction input).
since $(2^k - 3) \bmod (2^k - 2) = -1$. Thus the overhead needed for scaling is the
additional hardware needed to compute $(-U_C + U_A) \bmod (2^k - 2)$ as shown in
Fig. 6. Note that $(U_A - U_C)$ and $(U_A - U_C + m_{2A})$ are computed in parallel using
CPA9, CSA6 and CPA10, respectively, and based on the sign of the carry output of
CPA9, the correct result $Z_A$ is selected using a 2:1 MUX. Since $U_C$ and $U_A$ are k-bit
wide, $Z_A$ before reduction lies between $-(2^k - 2)$ and $2^k - 4$. This needs in the worst case the addition of
$m_{2A}$ with $(-U_C + U_A)$. The scaled result in RNS is thus $\{Z_A, U_A, U_B\}$. Note that the
total hardware requirement needed in the architecture of the scaler for M2 is $7k$ FAs $+ 2k$
2:1 MUXs and the time for scaling is $(5k + 1)\Delta_{FA} + 2\Delta_{2:1\,MUX}$. The following
example illustrates the scaling by modulus 14 in the RNS $\{14, 13, 15\}$.
Example. We consider scaling of 2,390 whose RNS form is (10, 11, 5) corresponding to
the moduli set $\{14, 13, 15\}$. The complete MRC following Fig. 1(a) is illustrated next:
Note that in the first stage of MRC we got the residues $U_A = 1$ and $U_B = 5$ of
the quotient $Q_A$ of division of 2,390 by 14. We need to find the residue with respect to
the modulus 14. This step is called base extension. Here we find the quotient $Q_A$ as
$U_C m_{1A} + U_A = 13 \times 13 + 1$. We find $Q_A \bmod 14$ as $(-1) \times 13 + 1 = -12 \bmod 14 = 2$.
Hence the scaled result in RNS is $\{2, 1, 5\}$ corresponding to 170.
Note that scaling by the product of two moduli is also feasible. Note that $U_C$ is the
result of scaling (division) by $(m_{1A} m_{2A})$. The scaled result in RNS is $\{U_C', U_C'', U_C\}$,
where $U_C' = U_C \bmod m_{2A}$ and $U_C'' = U_C \bmod m_{1A}$.
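The numbers in this example can be reproduced with a brief Python sketch of scaling by $m_{2A}$. The function name `scale_m2` and its argument order (residues with respect to $2^k - 3$, $2^k - 2$, $2^k - 1$) are assumptions; the arithmetic uses the inverses of Eqs. (4a)–(4c) and the base extension step described in the text.

```python
def scale_m2(x1, x2, x3, k):
    """Scale X by m2A = 2^k - 2; return (Z_A, U_A, U_B, U_C)."""
    m1, m2, m3 = 2**k - 3, 2**k - 2, 2**k - 1
    r = (-(2 ** (k - 1))) % m3        # Eq. (4c)
    U_A = (x1 - x2) % m1              # p = 1: quotient residue mod m1A
    U_B = (x2 - x3) % m3              # q = -1: quotient residue mod m3A
    U_C = ((U_B - U_A) * r) % m3      # last mixed radix digit
    Z_A = (U_A - U_C) % m2            # base extension, since m1A = -1 mod m2A
    return Z_A, U_A, U_B, U_C


# 2,390 has residues (11, 10, 5) with respect to (13, 14, 15).
print(scale_m2(11, 10, 5, 4))   # (2, 1, 5, 13)
```

The first three outputs are the residues of 170 with respect to 14, 13 and 15, and the last output, 13, equals 2,390 divided by $13 \times 14$ with the remainder discarded.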
the architecture in Fig. 8(a). Evidently, scaling by $m_{2B}$ is achieved in two steps: first
by computing $(x_{3B} - x_{2B}) \bmod m_{3B}$ and $(x_{1B} - x_{2B}) \bmod m_{1B}$ and next by
multiplication with the respective multiplicative inverses. This yields the residues $U_D$ and
$U_E$ corresponding to the quotient $(X - x_{2B})/m_{2B}$. Next the MRC will be continued
to get the MRC digit $U_F$ as shown in Fig. 7. Note that $g = 2^{k-1} + 1$ as defined before
[see Eq. (7c)]. We denote $U_F^{*} = (U_E - U_D) \bmod (2^k + 1)$, and write $U_F^{*} = 2F_H + F_0$,
where $F_H$ is the k-bit MSB word in $U_F^{*}$ and $F_0$ is the LSB of $U_F^{*}$. Thus we compute
$(U_F^{*} g) \bmod (2^k + 1)$ as
Fig. 7. MRC technique for the moduli set M3 to illustrate scaling by modulus $(2^k + 2)$.
Fig. 8. (a) Architecture of scaling for moduli set M3 and (b) computation of $(U_F^{*} g) \bmod (2^k + 1)$.
Note that in the first stage of MRC we got the residues of the quotient $Q_B$ of
division by 18. We need to find the residue with respect to the modulus 18. This step
is called base extension. Here the quotient word $Q_B = 14 \times 19 + 17 = 283$. We
find $Q_B \bmod 18$ as $14 \times (1) + 17 = 31 \bmod 18 = 13$. Hence, the scaled result in
RNS is $\{13, 17, 11\}$, corresponding to 283. Scaling by the product of two moduli $(m_{2B} m_{3B})$ is
also feasible. Note that $U_F$ is the result of scaling by $(m_{2B} m_{3B})$. The
scaled result in RNS is $\{U_F', U_F'', U_F\}$, where $U_F' = U_F \bmod m_{2B}$ and
$U_F'' = U_F \bmod m_{3B}$.
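Scaling by $m_{2B}$ can be sketched in Python in the same way. The function name `scale_m3` is hypothetical, and the input 5,099 is an arbitrary value chosen here because its quotient by 18 is 283; it is not taken from the paper. The product $(U_F^{*} g) \bmod (2^k + 1)$ is evaluated as $F_H + F_0\,g$, a form derived here from $U_F^{*} = 2F_H + F_0$ and $2g \equiv 1 \pmod{2^k + 1}$ rather than transcribed from the source.

```python
def scale_m3(x1, x2, x3, k):
    """Scale X by m2B = 2^k + 2; return (Z_B, U_D, U_E, U_F)."""
    m1, m2, m3 = 2**k + 1, 2**k + 2, 2**k + 3
    g = 2 ** (k - 1) + 1                     # Eq. (7c)
    U_D = (x2 - x3) % m3                     # e = -1: quotient residue mod m3B
    U_E = (x1 - x2) % m1                     # f = 1: quotient residue mod m1B
    uf_star = (U_E - U_D) % m1
    F_H, F_0 = uf_star >> 1, uf_star & 1     # uf_star = 2*F_H + F_0
    U_F = F_H + F_0 * g                      # assumed form of (uf_star*g) mod m1
    assert U_F == (uf_star * g) % m1         # sanity check of the derived form
    Z_B = (U_F + U_D) % m2                   # base extension, m3B = 1 mod m2B
    return Z_B, U_D, U_E, U_F


# 5,099 has residues (16, 5, 7) with respect to (17, 18, 19).
print(scale_m3(16, 5, 7, 4))   # (13, 17, 11, 14)
```

The first three outputs match the scaled RNS triple $\{13, 17, 11\}$ of 283 quoted above, and the last output is the digit $U_F = 14$.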
Note that the computation of the mixed radix digit $U_D$ and the intermediate results
$U_E$ and $U_F^{*}$ is carried out using three $(k + 1)$-bit MODSUBC blocks of Fig. 2(c).
Next, the computation of the mixed radix digit $U_F$ from $U_F^{*}$ is carried out using the modulo
multiplier presented in Fig. 8(b). Note that $(U_F + U_D)$ and $(U_F + U_D - m_{2B})$ are
computed in parallel using CPA11, CSA7 and CPA12, respectively, and based on the
sign of the carry output of CPA11, the correct result $Z_B$ is selected using a 2:1 MUX
as shown in Fig. 8(a). Thus, the total hardware requirement needed in the architecture
of the scaler for M3 is $(13k + 3)$ FAs $+ (4k + 4)$ 2:1 MUXs $+ k$ AND gates and the
time for scaling is $(4k + 7)\Delta_{FA} + 3\Delta_{2:1\,MUX} + \Delta_{AND}$. Evidently scaling is faster than
RNS-to-binary conversion using MRC, since MRC needs the additional step of computing
the final decoded number following Eq. (2).
Table 1. Comparison of hardware requirements and conversion times of various reverse converters for
the three-moduli sets M1, M2 and M3.

EXORs + L1 + L2M

| Design | Moduli set | Hardware requirement | Conversion time |
|---|---|---|---|
| D8 (Ref. 31) | M1 $\{2m+1, 2m, 2m-1\}$ | $(15u+1)$ FAs + $2u$ HAs + $(u+1)$ ANDs + EXOR + $(4u+2)$ 2:1 MUXs + L1 + L3 | $(5u+4)\Delta_{FA} + \Delta_{L1} + 2\Delta_{2:1\,MUX}$ |
| D6CE | M2 $\{2^k-3, 2^k-2, 2^k-1\}$ | $(12k+3)$ FAs + $(2k+1)$ HAs + $k$ 2:1 MUXs | $(7k+8)\Delta_{FA}$ |
| D6HS | M2 $\{2^k-3, 2^k-2, 2^k-1\}$ | $(17k+8)$ FAs + $(2k+1)$ HAs + $3k$ 2:1 MUXs | $(5k+8)\Delta_{FA}$ |
| D7 | M3 $\{2^k+1, 2^k+2, 2^k+3\}$ | $(14k+25)$ FAs + $(k-2)$ HAs + $(k+1)$ 2:1 MUXs + $(k+1)$ 3:1 MUXs + $(2k+1)$ ANDs + $(k+1)$ EXORs | $(7k+15)\Delta_{FA} + \Delta_{3:1\,MUX} + 2\Delta_{AND}$ |
subtractor, one u-bit subtractor, one u-bit modulo $2m$ adder and one 3u-bit modulo
$(2m(4m^2 - 1))$ adder. This design is also considered for comparison (presented in
Table 1 as entry D8). We considered high-speed implementation of all modulo adders
in this evaluation. Note that all the converters D1–D5 and D8 need two multipliers
having quadratic dependence of area on $m$.
The converter D6CE for the moduli set M2 needs the lowest area due to its multiplier-free
architecture. Among the two converters D6CE and D6HS, there is a trade-off between
conversion time and area. The converter D7 using one modulus of the form
$2^k + 1$ does not need multipliers; however, it needs more hardware resources than the
converter D6CE. The conversion time of D7 is comparable to that of converter
D6CE. The converter D6HS needs the least conversion time among all the converters.
The proposed designs D6CE, D6HS and D7 as well as design D5 (Ref. 20) were implemented
using Cadence (version 14.20), Compiler: RC 14.25 and synthesized using the Cadence
Encounter tool using 180 nm technology. The post-place and route results of area,
conversion time and power dissipation for the designs D5, D6CE, D6HS and D7 are
presented in Table 3. The appropriate $m$ values for dynamic range (DR) ranging
from 8-bit to 64-bit are summarized in Table 2.

Table 3. ASIC implementation results of various reverse converters for the three-moduli sets M1, M2
and M3.

| Dynamic range | D5 (Ref. 20): Area (μm²) | Delay (ps) | Power (μW) | D6CE (Proposed): Area (μm²) | Delay (ps) | Power (μW) |
|---|---|---|---|---|---|---|
| 8-bit | 3,765 | 2,609 | 677 | 4,717 | 3,048 | 1,136 |
| 16-bit | 12,507 | 5,828 | 4,545 | 9,876 | 4,355 | 3,592 |
| 24-bit | 23,168 | 8,091 | 11,389 | 19,752 | 7,623 | 9,940 |
| 32-bit | 31,617 | 9,382 | 17,984 | 27,130 | 8,623 | 15,302 |
| 48-bit | 61,236 | 14,225 | 34,934 | 53,681 | 13,816 | 32,997 |
| 64-bit | 90,897 | 17,638 | 62,140 | 85,006 | 17,270 | 60,400 |

| Dynamic range | D6HS (Proposed): Area (μm²) | Delay (ps) | Power (μW) | D7 (Proposed): Area (μm²) | Delay (ps) | Power (μW) |
|---|---|---|---|---|---|---|
| 8-bit | 5,568 | 3,360 | 1,456 | 3,659 | 3,212 | 687 |
| 16-bit | 11,253 | 4,585 | 4,338 | 11,726 | 6,358 | 4,263 |
| 24-bit | 22,390 | 8,181 | 12,238 | 19,496 | 7,724 | 8,840 |
| 32-bit | 30,210 | 9,894 | 18,842 | 30,639 | 11,870 | 18,432 |
| 48-bit | 57,883 | 14,670 | 39,152 | 53,968 | 17,249 | 34,164 |
| 64-bit | 90,664 | 18,422 | 72,056 | 90,867 | 22,924 | 66,484 |

Note that in Table 2, the moduli set for given $m$ is $\{2m + 1, 2m, 2m - 1\}$. As an
illustration, for 16-bit DR the moduli sets M1, M2 and M3 are, respectively,
$\{41, 42, 43\}$ ($m = 21$), $\{61, 62, 63\}$ ($m = 31$) and $\{65, 66, 67\}$ ($m = 33$).
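The 16-bit illustration can be reproduced with a short Python sketch that searches for the smallest moduli set of each family covering a given dynamic range. The function name `smallest_sets` is hypothetical, and the criterion used (product of the moduli at least $2^n$ for an n-bit DR) is an assumption about how the DR is defined.

```python
def smallest_sets(bits):
    """Smallest M1, M2 and M3 moduli sets whose product is at least 2**bits."""
    target = 1 << bits

    m = 1
    while (2 * m - 1) * (2 * m) * (2 * m + 1) < target:
        m += 1

    k2 = 3   # k = 2 would give the degenerate modulus 2^k - 3 = 1
    while (2**k2 - 3) * (2**k2 - 2) * (2**k2 - 1) < target:
        k2 += 1

    k3 = 1
    while (2**k3 + 1) * (2**k3 + 2) * (2**k3 + 3) < target:
        k3 += 1

    return {
        "M1": (2 * m - 1, 2 * m, 2 * m + 1),
        "M2": (2**k2 - 3, 2**k2 - 2, 2**k2 - 1),
        "M3": (2**k3 + 1, 2**k3 + 2, 2**k3 + 3),
    }


print(smallest_sets(16))
# {'M1': (41, 42, 43), 'M2': (61, 62, 63), 'M3': (65, 66, 67)}
```

Under this criterion the search returns the three 16-bit sets quoted in the text.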
From Table 3, regarding area, for 8- and 24-bit DRs design D7 is preferable,
whereas for 16-bit DR, design D6CE is preferable. For 32-, 48- and 64-bit DRs,
designs D6CE, D6HS and D7 outperform design D5. Among the three new
converters, design D6CE is superior.
Regarding conversion time, for 8-bit DR design D5 is better and for 16-bit DR,
designs D6CE and D6HS are better than the other converters. For all other DRs,
converter D6CE is better than the other converters. Among the new converters, D6CE is
preferable regarding conversion time.
Regarding power dissipation, design D5 is superior for 8-bit DR, whereas for
16-bit DR, designs D6HS, D6CE and D7 outperform design D5; for the other DRs,
design D6CE is superior to the other converters, except at 24-bit DR, where Table 3
lists a lower power dissipation for D7.
6. Conclusions
In this paper, we have presented RNS-to-binary converters for the moduli sets
$\{2^k - 2, 2^k - 3, 2^k - 1\}$ and $\{2^k + 2, 2^k + 3, 2^k + 1\}$ using the MRC technique. All the
proposed converters were evaluated based on the hardware resource requirement as
well as conversion time and compared with all converters described in literature for
the moduli set $\{2m + 1, 2m, 2m - 1\}$. All the proposed converters are implemented
and compared with the area-efficient converter (Ref. 20) for M1 regarding area, conversion
time and power dissipation for different DRs. The proposed converters for M2 and
M3 were shown to be better than some of the other converters regarding area and
conversion time while having the advantage of availability of mixed radix digits to
facilitate easy comparison. We have also presented techniques for scaling by one
modulus for both the proposed moduli sets using mixed radix conversion. Efficient
binary-to-RNS converters and multipliers for the moduli $2^k - 3$ and $2^k + 3$
(Refs. 25–27) and multipliers (Refs. 28, 29) and binary-to-RNS converters for the moduli
$(2^k - 1)$ and $(2^k + 1)$ are available in literature (Ref. 2). The binary-to-RNS converter for the even
moduli $(2^k + 2)$ and $(2^k - 2)$ can employ the binary-to-RNS converter for moduli
$(2^{k-1} + 1)$ and $(2^{k-1} - 1)$ with suitable modification described in Ref. 30.
References
1. N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer
Technology (McGraw-Hill, New York, 1967).
2. P. V. Ananda Mohan, Residue Number Systems: Theory and Applications (Birkhauser,
Switzerland, 2016).
3. A. Omondi and A. B. Premkumar, Residue Number System Theory and Implementation
(Imperial College Press, 2007).
446–447.
10. M. Bhardwaj, A. B. Premkumar and T. Srikanthan, Breaking the 2n-bit carry propagation
barrier in residue to binary conversion for the $\{2^n - 1, 2^n, 2^n + 1\}$ moduli set,
IEEE Trans. Circuits Syst. I, Fundam. Theory Appl. 45 (1998) 998–1002.
11. Z. Wang, G. A. Jullien and W. C. Miller, An improved residue to binary converter, IEEE
Trans. Circuits Syst. I, Fundam. Theory Appl. 47 (2000) 1437–1440.
12. Y. Wang, X. Song, M. Aboulhamid and H. Shen, Adder based residue to binary number
converters for $\{2^n - 1, 2^n, 2^n + 1\}$, IEEE Trans. Signal Process. 50 (2002) 1772–1779.
13. A. A. Hiasat and H. S. Abdel-Aty-Zohdy, Residue to binary arithmetic converter for the
moduli set $\{2^k, 2^k - 1, 2^{k-1} - 1\}$, IEEE Trans. Circuits Syst. II, Analog Digit. Signal
Process. 45 (1998) 204–209.
14. W. Wang, M. N. S. Swamy, M. O. Ahmad and Y. Wang, A high-speed residue-to-binary
converter for three moduli $\{2^k, 2^k - 1, 2^{k-1} - 1\}$ RNS and a scheme for its VLSI
implementation, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process. 47 (2000)
1576–1581.
15. W. Wang, M. N. S. Swamy, M. O. Ahmad and Y. Wang, A note on "A high-speed
residue-to-binary converter for three moduli $\{2^k, 2^k - 1, 2^{k-1} - 1\}$ RNS and a scheme for its
VLSI implementation", IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.
49 (2002) 230.
16. P. V. Ananda Mohan, New residue to binary converters for the moduli set $\{2^k, 2^k - 1,
2^{k-1} - 1\}$, Proc. IEEE Region 10 Conf. (TENCON 2008) (2008), pp. 1–6.
17. A. B. Premkumar, M. Bhardwaj and T. Srikanthan, High-speed and low-cost reverse
converters for the $\{2n - 1, 2n, 2n + 1\}$ moduli set, IEEE Trans. Circuits Syst. II, Analog
Digit. Signal Process. 45 (1998) 903–908.
18. Y. Wang, M. N. S. Swamy and M. O. Ahmad, Residue-to-binary number converters
for three moduli sets, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.
46 (1999) 180–183.
19. K. A. Gbolagade and S. D. Cotofana, An efficient RNS to binary converter using
the moduli set $\{2n - 1, 2n, 2n + 1\}$, Proc. XXIII Conf. Design of Circuits and Integrated
Systems (DCIS) (2008).
20. K. A. Gbolagade, G. R. Voicu and S. D. Cotofana, An efficient FPGA design of
residue-to-binary converter for the moduli set $\{2n + 1, 2n, 2n - 1\}$, IEEE Trans. Very Large
Scale Integr. (VLSI) Syst. 19 (2011) 1500–1503.
21. Y. Wang, Residue to binary converters based on new Chinese remainder theorems,
IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process. 47 (2000) 197–205.
22. B. Cao, C. H. Chang and T. Srikanthan, An efficient reverse converter for the 4-moduli
set $\{2^n - 1, 2^n, 2^n + 1, 2^{2n} + 1\}$ based on the new Chinese remainder theorem, IEEE Trans.
Circuits Syst. I, Fundam. Theory Appl. 50 (2003) 1296–1303.
23. A. Hiasat, VLSI implementation of new arithmetic residue to binary decoders,
IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 13 (2005) 153–158.
24. E. E. Swartzlander, Jr., Merged arithmetic, IEEE Trans. Comput. 29 (1980) 946–950.
25. P. M. Matutino, H. Pettenghi, R. Chaves and L. Sousa, Multiplier based binary to RNS
converters modulo $(2^n \pm k)$, Proc. 26th Conf. Design of Circuits and Integrated Systems
(DCIS) (2011), pp. 125–130.
26. P. M. Matutino, R. Chaves and L. Sousa, Binary to RNS conversion units for moduli
$(2^n \pm 3)$, Proc. 14th IEEE Euromicro Conf. Digital System Design (DSD) (2011),
pp. 460–467.
27. H. Pettenghi, R. Chaves and L. Sousa, Method for designing modulo $\{2^n \pm k\}$ binary
to RNS converters, Proc. Int. Conf. Design of Circuits and Integrated Systems (DCIS)