Journal of Circuits, Systems, and Computers
Vol. 27, No. 14 (2018) 1850224 (20 pages)

© World Scientific Publishing Company
DOI: 10.1142/S0218126618502249
RNS-to-Binary Converters for New Three-Moduli Sets
$\{2^k-3, 2^k-2, 2^k-1\}$ and $\{2^k+1, 2^k+2, 2^k+3\}$*
P. S. Phalguna† and Dattaguru V. Kamat‡

Department of Electronics and Communication Engineering,

Manipal Institute of Technology,

Manipal Academy of Higher Education,
Manipal 576104, Karnataka, India
†phalguna.ps@learner.manipal.edu
‡dv.kamath@manipal.edu
P. V. Ananda Mohan§
Department of R&D, Centre for Development of Advanced Computing,
Ministry of Electronics and Information Technology,
No. 1, Knowledge Park, Old Madras Road,
Byappanahalli, Bengaluru 560038, Karnataka, India
anandmohanpv@live.in
Received 27 November 2017

Accepted 28 February 2018
Published 9 April 2018
In this paper, mixed radix conversion (MRC)-based residue number system (RNS)-to-binary
converters for two new three-moduli sets $\{2^k-3, 2^k-2, 2^k-1\}$ and $\{2^k+1, 2^k+2, 2^k+3\}$,
which are derived from the moduli set $\{2m+1, 2m, 2m-1\}$, are presented. These have the
advantage of having one modulus of the form $2^k-1$ or $2^k+1$, simplifying computations in one
residue channel. The proposed reverse converters are evaluated and compared with state-of-the-art
reverse converters proposed in the literature for other three-moduli sets regarding hardware
requirement and conversion time.
Keywords: Residue number systems; reverse converters; three-moduli sets; CRT; mixed radix
conversion.
1. Introduction
The advantages of the residue number system (RNS), such as carry-free operation,
modularity and fault tolerance, have made it attractive in applications like cryptography,
digital signal processing (DSP) and communication systems.¹⁻⁴ An RNS processor has a
front-end binary-to-residue converter, also called a forward converter. Several efficient
techniques for forward conversion are available in the literature. Next, the resulting residues
of the various moduli channels are processed by several moduli processors working in parallel,
which perform arithmetic operations such as modulo addition, modulo subtraction and modulo
multiplication. After the computation is completed, the resulting residues need to be converted
into conventional binary form using a reverse converter, also called a residue-to-binary
converter. Signal processing algorithms involve other operations such as sign detection,
magnitude comparison and scaling, which generally require reverse conversion to be performed
first. It is thus important that reverse conversion be performed efficiently. The moduli sets
thus need to be chosen so as to speed up the reverse conversion.
*This paper was recommended by Regional Editor Zoran Stamenkovic.
§ Corresponding author.
Several three-, four- or higher-moduli sets have been described in the literature.² These
prefer the use of powers-of-two-related moduli of the forms $2^n$, $2^n+1$, $2^n-1$, $2^n+3$
and $2^n-3$. In addition, other three-moduli sets which use consecutive numbers as moduli
have also been investigated, viz. M1 $\{2m+1, 2m, 2m-1\}$ (Ref. 5) and $\{2m, 2m+1, 2m+2\}$
(Ref. 6), the latter using two moduli $2m$ and $2m+2$ which have a common factor. The moduli
set $\{2^p-1, 2^p, 2^p+1\}$ (see Refs. 7–12) is a special case of the moduli set
$\{2m+1, 2m, 2m-1\}$. The moduli set $\{2^{p-1}-1, 2^p-1, 2^p\}$ (see Refs. 13–16) is obtained
from the moduli set $\{2m, 2m+1, 2m+2\}$ (Ref. 6) by removing the common factor from the moduli
$2m = (2^p-2)$ and $2m+2 = 2^p$ in the moduli set $\{2^p-2, 2^p-1, 2^p\}$ to make the moduli
relatively prime.
Premkumar suggested the three-moduli set M1 $\{2m+1, 2m, 2m-1\}$ in Ref. 5, and several
reverse converters for this moduli set have been reported in the literature.⁵,¹⁷⁻²⁰,³¹ The
first reverse converter for M1 is presented in Ref. 5 using the Chinese Remainder Theorem
(CRT). Later, two reverse converters were presented using a modification of CRT for reducing
the modulo reduction complexity.¹⁷ Reverse converters for this moduli set using new CRT II²¹
have also been investigated.¹⁸ More recently, improved reverse converters for this moduli set
using CRT have been presented.¹⁹,²⁰ However, these converters can be considered to be similar
to a mixed radix conversion (MRC)-type design. The intermediate digits derived, however, are
not amenable for facilitating comparison since one of the intermediate digits can be negative.
Another converter for M1 has been presented in Ref. 31 using the core function. In this paper,
we consider the MRC technique for reverse conversion for two special cases of this moduli set
which use one modulus of the type $2^k-1$ or $2^k+1$. These two moduli sets are
M2 $\{2^k-3, 2^k-2, 2^k-1\}$ and M3 $\{2^k+1, 2^k+2, 2^k+3\}$. These choices will be shown to
lead to some simplification in the reverse converter architectures compared with those for
the general case of moduli set $\{2m+1, 2m, 2m-1\}$. All the proposed architectures are
compared with the state-of-the-art reverse converters reported earlier in the literature for
the moduli set M1 $\{2m+1, 2m, 2m-1\}$ regarding hardware requirement and conversion time.
In Sec. 2, background material is given in brief. The proposed MRC-based reverse
converter architectures are presented in Sec. 3. We also consider the problem of
scaling in the proposed RNS by one modulus for both the proposed moduli sets in
Sec. 4. The evaluation and comparison of the proposed converters with converters for
the moduli set M1 reported earlier, together with implementation results, are provided in Sec. 5.
The concluding remarks are given in Sec. 6.
2. Background Material
The two popular approaches used for the reverse conversion process in RNS are CRT
and MRC. In CRT, we compute the decoded binary number $X$ as
$$X = \left(\sum_{i=1}^{j} x_i M_i \left(\frac{1}{M_i}\right)_{m_i}\right) \bmod M, \tag{1}$$
where $M$ is the product of all $j$ moduli. Note that $M_i = M/m_i$ and $x_i$ are the given
residues defined such that $x_i = X \bmod m_i$. Note also that $\left(\frac{1}{M_i}\right)_{m_i}$
denotes the multiplicative inverse of $M_i$ with respect to modulus $m_i$, satisfying the
relationship that $M_i \times \left(\frac{1}{M_i}\right)_{m_i}$ when divided by $m_i$ yields the
remainder 1. The main advantage of CRT is the parallel computation of the various terms in
Eq. (1) corresponding to the given residues $x_i$, followed by the summation of these terms
mod $M$. However, the sum in Eq. (1) before modulo $M$ reduction can be nearly as large as
$(j \times M)$, thus needing the time-consuming mod $M$ reduction. In special cases, where $M$
is of the form $(2^x-1)$, CRT will be attractive, as has been demonstrated in the case of the
popular three-moduli set $\{2^p-1, 2^p, 2^p+1\}$ (Refs. 7–12), the four-moduli set
$\{2^p-1, 2^p, 2^p+1, 2^{2p}+1\}$ (Ref. 22) and the five-moduli set for $p$ odd
$\{2^p-1, 2^p, 2^p+1, 2^p-2^{(p+1)/2}+1, 2^p+2^{(p+1)/2}+1\}$.²³
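In MRC, the decoded number $X$ is obtained as
$$X = U_j \prod_{i=1}^{j-1} m_i + \cdots + U_2 m_1 + U_1, \tag{2}$$
where $U_i$ are the mixed radix digits defined as
$$\begin{aligned}
U_1 &= x_1, \qquad U_2 = \left((x_2-U_1)\left(\frac{1}{m_1}\right)_{m_2}\right)_{m_2},\\
U_3 &= \left(\left(\left((x_3-U_1)\left(\frac{1}{m_1}\right)_{m_3}\right)_{m_3} - U_2\right)\left(\frac{1}{m_2}\right)_{m_3}\right)_{m_3},\\
&\;\;\vdots\\
U_j &= \left(\left(\cdots\left(\left((x_j-U_1)\left(\frac{1}{m_1}\right)_{m_j}\right)_{m_j} - U_2\right)\left(\frac{1}{m_2}\right)_{m_j} - \cdots - U_{j-1}\right)\left(\frac{1}{m_{j-1}}\right)_{m_j}\right)_{m_j}.
\end{aligned} \tag{3}$$
Note that MRC is a sequential process needing $(j-1)$ steps. In each step, a single
mixed radix digit $U_i$ is determined. The next step is to compute Eq. (2). Note that the
cumbersome modulo $M$ reduction needed in the case of CRT in Eq. (1) is not needed
in MRC since $X < M$. In the present paper, we use the MRC technique for deriving
various reverse converters.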
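The contrast between Eqs. (1)–(3) can be sketched in a few lines of Python. This is only an illustrative model, not the hardware of the paper: the function names `crt`, `mrc_digits` and `mrc` are hypothetical, and the built-in `pow(a, -1, m)` (Python 3.8 or later) is assumed for the multiplicative inverses. The moduli are taken in the order {14, 13, 15} used later in the scaling example.

```python
from math import prod


def crt(residues, moduli):
    """Eq. (1): independent terms, then a single reduction mod M."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        total += x_i * M_i * pow(M_i, -1, m_i)  # pow(.., -1, m) is the inverse mod m
    return total % M                            # the costly mod-M step


def mrc_digits(residues, moduli):
    """Eq. (3): mixed radix digits U_1..U_j, each depending on the earlier ones."""
    digits = []
    for j, (x_j, m_j) in enumerate(zip(residues, moduli)):
        u = x_j
        for i in range(j):
            u = ((u - digits[i]) * pow(moduli[i], -1, m_j)) % m_j
        digits.append(u)
    return digits


def mrc(residues, moduli):
    """Eq. (2): weighted sum of the digits; the result is already below M."""
    x, weight = 0, 1
    for u, m in zip(mrc_digits(residues, moduli), moduli):
        x += u * weight
        weight *= m
    return x


moduli = (14, 13, 15)
residues = tuple(2390 % m for m in moduli)      # (10, 11, 5)
print(mrc_digits(residues, moduli))             # [10, 1, 13]
print(crt(residues, moduli), mrc(residues, moduli))  # 2390 2390
```

The inner loop of `mrc_digits` is what makes MRC sequential, whereas every term of `crt` can be formed independently before the final reduction.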
3. Proposed RNS-to-Binary Converters

In this section we present new RNS-to-binary converters for the three-moduli sets
M2 $\{2^k-3, 2^k-2, 2^k-1\}$ and M3 $\{2^k+1, 2^k+2, 2^k+3\}$ using the MRC technique.
Note that the $m$ values for these moduli sets are $2^{k-1}-1$ and $2^{k-1}+1$, since
$2m = 2^k-2$ and $2m = 2^k+2$ for these two cases, respectively.
3.1. MRC-based converters for the moduli set M2 $\{2^k-3, 2^k-2, 2^k-1\}$
We denote $m_{1A} = 2^k-3$, $m_{2A} = 2^k-2$ and $m_{3A} = 2^k-1$, the corresponding
residues as $(x_{1A}, x_{2A}, x_{3A})$ and the binary number corresponding to this residue set as
$X$. The MRC technique for M2 is shown in Fig. 1(a). The various multiplicative
inverses needed in the computation are as follows:
$$p = \left(\frac{1}{2^k-2}\right)_{2^k-3} = 1, \tag{4a}$$
$$q = \left(\frac{1}{2^k-2}\right)_{2^k-1} = -1, \tag{4b}$$
$$r = \left(\frac{1}{2^k-3}\right)_{2^k-1} = -2^{k-1}. \tag{4c}$$
These can be easily verified to be true since $(2^k-2) \equiv 1 \bmod (2^k-3)$,
$-(2^k-2) \equiv 1 \bmod (2^k-1)$ and $(2^k-3) \times (-2^{k-1}) \equiv 1 \bmod (2^k-1)$. The
cost-effective (CE) implementation of the MRC algorithm of Fig. 1(a) follows the architecture
given in Fig. 1(b), indicated as D6CE. The computation of $U_A^* = (x_{1A}-x_{2A}) \bmod (2^k-3)$
can be realized using the MODSUBA block shown in Fig. 2(a). The mixed radix digit $U_A$
is thus already available as $U_A^*$ since $p$ is 1 [see Eq. (4a)]. The intermediate result $U_B$
can be computed as $(x_{2A}-x_{3A}) \bmod (2^k-1)$ since $q = -1$ [see Eq. (4b)].
The modulo subtractions $(x_{2A}-x_{3A}) \bmod (2^k-1)$ and $(U_B-U_A) \bmod (2^k-1)$
to obtain the intermediate results $U_B$ and $U_C^*$, respectively, can be carried out using the
MODSUBB block shown in Fig. 2(b). First we use MODSUBB1 to obtain $U_B$, followed
by MODSUBB2 to compute $U_C^*$ as shown in Fig. 1(b). Next, the multiplication
mod $(2^k-1)$ of $U_C^*$ with $r$ to obtain the mixed radix digit $U_C$ can be carried out by
performing a $(k-1)$-bit circular-left-shift (CLS) of $U_C^*$ followed by one's complement
of the result [see Eq. (4c)], since multiplication by $2^{k-1}$ mod $(2^k-1)$ is a circular shift
and negation mod $(2^k-1)$ is a one's complement.
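The digit computation described above can be modelled behaviourally as follows. This is a sketch under stated assumptions rather than the D6CE netlist: the helper names `cls`, `times_r` and `convert_m2` are hypothetical, ordinary integer arithmetic stands in for the MODSUBA and MODSUBB blocks, and the all-ones word produced by the one's complement is folded to 0 explicitly.

```python
def cls(word, shift, k):
    """Circular left shift of a k-bit word by `shift` positions."""
    mask = (1 << k) - 1
    shift %= k
    return ((word << shift) | (word >> (k - shift))) & mask


def times_r(u_c_star, k):
    """(u * r) mod (2^k - 1) for r = -2^(k-1): CLS by k-1, then one's complement."""
    mask = (1 << k) - 1
    word = ~cls(u_c_star, k - 1, k) & mask
    return 0 if word == mask else word   # the all-ones word is an alias of 0


def convert_m2(x1a, x2a, x3a, k):
    """Mixed radix digits U_A, U_C and the decoded X for M2."""
    m1a, m2a, m3a = 2**k - 3, 2**k - 2, 2**k - 1
    u_a = (x1a - x2a) % m1a              # p = 1, so U_A = U_A*
    u_b = (x2a - x3a) % m3a              # q = -1 swaps the operands
    u_c_star = (u_b - u_a) % m3a
    u_c = times_r(u_c_star, k)           # multiplication by r without a multiplier
    x = x2a + u_a * m2a + u_c * m1a * m2a
    return u_a, u_c, x


# k = 4 gives the moduli {13, 14, 15}; 2390 has residues (11, 10, 5).
print(convert_m2(11, 10, 5, 4))          # (1, 13, 2390)
```

For $k = 4$ the call reproduces the digits $U_A = 1$ and $U_C = 13$ that appear in the scaling example of Sec. 4.1.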
Fig. 1. (a) MRC algorithm for the moduli set M2 and (b) architecture of converter D6CE.
Fig. 2. (a) Subtractor mod $(2^k-3)$ (MODSUBA), (b) subtractor mod $(2^k-1)$ (MODSUBB) and
(c) high-speed (HS) mod $m_i$ subtractor (MODSUBC).
The decoded number $X$ is next computed as
$$X = x_{2A} + U_A(2^k-2) + U_C(2^k-3)(2^k-2), \tag{5}$$
where $x_{2A}$, $U_A$ and $U_C$ are the mixed radix digits. We rewrite the negative terms present
in Eq. (5) in terms of one's complements of similar word length and add a
correction term (CT) as
$$\begin{aligned}
X = {} & x_{2A} + U_A 2^k + (2^k-1-U_A)2 + U_C 2^{2k} + (2^k-1-U_C)2^{k+2}\\
& + (2^k-1-U_C)2^k + 4U_C + 2U_C + \mathrm{CT}.
\end{aligned} \tag{6}$$
Note that we have written $-U_A$ as $(2^k-1-U_A)$ and $-U_C$ as $(2^k-1-U_C)$.
The additional term added in this process is $(2^k-1) \times 2 + (2^k-1) \times 2^{k+2}
+ (2^k-1) \times 2^k = 5(2^{2k}) - 3(2^k) - 2$. Thus the CT to be added in Eq. (6) is
$\mathrm{CT} = -(5(2^{2k}) - 3(2^k) - 2)$. As an illustration we consider $k = 4$, and the first
eight terms of Eq. (6) can be represented as shown in the bit matrix of Fig. 3(a). Note that
dashes (primes) indicate the inverted bits present in the bit matrix. The terms represented in
Fig. 3(a) are effectively packed along with the correction term in order to reduce the
number of words that need to be added to realize Eq. (6) to six, as shown in Fig. 3(b).
The various words W1–W6 present in the bit matrix of Fig. 3(b) can be added using a
four-level 3k-bit carry save adder (CSA) tree (CSA1) followed by a 3k-bit carry
propagate adder (CPA1) as shown in BLOCK1 in Fig. 1(b).
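The equivalence of Eqs. (5) and (6) can be checked numerically with a short sketch. The function names are hypothetical, and one assumption goes beyond the text: CT is held as a 3k-bit two's-complement constant and carries beyond 3k bits are discarded, which is safe because $X < M < 2^{3k}$.

```python
def decode_eq5(x2a, u_a, u_c, k):
    """Eq. (5) as written, with explicit subtractions inside the weights."""
    return x2a + u_a * (2**k - 2) + u_c * (2**k - 3) * (2**k - 2)


def decode_eq6(x2a, u_a, u_c, k):
    """Eq. (6): additions only, with one's complements and a constant CT word."""
    ones = 2**k - 1                       # k-bit all-ones word
    not_u_a = ones - u_a                  # one's complement of U_A
    not_u_c = ones - u_c                  # one's complement of U_C
    ct = -(5 * 2**(2 * k) - 3 * 2**k - 2)
    ct_word = ct % 2**(3 * k)             # assumption: 3k-bit two's-complement constant
    total = (x2a + u_a * 2**k + not_u_a * 2 + u_c * 2**(2 * k)
             + not_u_c * 2**(k + 2) + not_u_c * 2**k
             + 4 * u_c + 2 * u_c + ct_word)
    return total % 2**(3 * k)             # carries beyond 3k bits are dropped


# Digits of 2390 in M2 for k = 4: x2A = 10, U_A = 1, U_C = 13.
print(decode_eq5(10, 1, 13, 4), decode_eq6(10, 1, 13, 4))   # 2390 2390
```

For $k = 4$ the constant evaluates to $-1230$, i.e. the 12-bit word 2866 under the assumption above.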
Fig. 3. Bit matrices for the last stage computation of D6CE for $k = 4$.
The high-speed (HS) converter D6HS can be realized by replacing the
MODSUBA block and the two MODSUBB blocks of Fig. 1(b) with the MODSUBC
block shown in Fig. 2(c) to obtain the intermediate results $U_A^*$, $U_B$ and $U_C^*$ without
changing the other blocks. In the implementation of the architecture of Fig. 1(b), we
need three modulo subtractors. The CE version of a modulo subtractor (MODSUBA
block) for computing $(x_{1A}-x_{2A}) \bmod (2^k-3)$ can be realized as shown in Fig. 2(a),
in which we compute $T_1 = (x_{1A}-x_{2A})$ followed by $T_2 = (T_1 + m_{1A})$ using two k-bit
adders CPA2 and CPA3 and, based on the sign of $T_1$, select either $T_1$ or $T_2$ using a
2:1 multiplexer (2:1 MUX). Thus the hardware requirement is two k-bit CPAs
(CPA2 and CPA3) and one k-bit 2:1 MUX. The computation time needed is
$(2k)\Delta_{FA} + \Delta_{MUX}$, where $\Delta_{FA}$ and $\Delta_{MUX}$ are the delays of a
full-adder (FA) and a 2:1 MUX, respectively.
In the special case of the modulus $m_{3A} = 2^k-1$, the modulo subtraction
$(x_{2A}-x_{3A}) \bmod (2^k-1)$ can be realized in a cost-effective way using CPA4 with
end-around carry (EAC), as shown in the MODSUBB block in Fig. 2(b). Thus, the hardware
requirement is less (only k inverters and k full-adders), whereas the computation time
is $2k\Delta_{FA}$.
We can use the HS modulo subtractor of Fig. 2(c), in which we compute
$T_3 = (x_c - x_d)$ and $T_4 = (x_c - x_d + m_i)$ in two parallel adders and, based on the sign
of $T_3$, select either $T_3$ or $T_4$ using a 2:1 MUX. Note that the one's complement of $x_d$
and a carry input of one are added to realize the two's complement of $x_d$. Thus the hardware
requirement is two k-bit carry-propagate adders (CPA5 and CPA6), one k-bit
carry-save adder (CSA2) and one k-bit 2:1 MUX. The computation time is
$(k+1)\Delta_{FA} + \Delta_{MUX}$. We denote this block as MODSUBC.
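The two subtractor styles can be contrasted with a small behavioural model. It is a sketch only: `modsub_select` and `modsub_eac` are hypothetical names, timing is not modelled (so MODSUBA and MODSUBC, which differ only in how $T$ and $T + m$ are scheduled, share one function), and the end-around-carry version is assumed to leave the all-ones word as a second representation of zero.

```python
def modsub_select(xc, xd, m):
    """MODSUBA/MODSUBC behaviour: choose T = xc - xd or T + m by the sign of T."""
    t = xc - xd
    return t if t >= 0 else t + m


def modsub_eac(xc, xd, k):
    """MODSUBB behaviour for modulus 2^k - 1: add the one's complement of xd
    and feed the carry-out back in (end-around carry)."""
    mask = (1 << k) - 1
    s = xc + (~xd & mask)             # k inverters produce the one's complement
    return (s & mask) + (s >> k)      # may yield the all-ones word, an alias of 0


print(modsub_select(11, 10, 13))      # 1, i.e. (x1A - x2A) mod 13
print(modsub_eac(10, 5, 4))           # 5, i.e. (x2A - x3A) mod 15
print(modsub_eac(3, 3, 4))            # 15, the all-ones alias of 0
```

The last call shows why a consumer of the EAC result has to treat the all-ones word as zero.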
3.2. MRC-based converters for the moduli set M3 $\{2^k+1, 2^k+2, 2^k+3\}$
The MRC for this moduli set follows the procedure presented in Fig. 4(a). We denote
the residues as $(x_{1B}, x_{2B}, x_{3B})$ corresponding to the moduli $m_{1B} = 2^k+1$,
$m_{2B} = 2^k+2$ and $m_{3B} = 2^k+3$, respectively. Note that all the moduli are of length
$(k+1)$ bits. The various multiplicative inverses needed for this approach shown in Fig. 4(a)
are as follows:
$$e = \left(\frac{1}{2^k+2}\right)_{2^k+3} = -1, \tag{7a}$$
$$f = \left(\frac{1}{2^k+2}\right)_{2^k+1} = 1, \tag{7b}$$
$$g = \left(\frac{1}{2^k+3}\right)_{2^k+1} = 2^{k-1}+1. \tag{7c}$$
Note that Eqs. (7a)–(7c) can be easily verified noting that
$((2^k+2) \times (-1)) \bmod (2^k+3) = 1$, $(2^k+2) \bmod (2^k+1) = 1$ and
$((2^k+3) \times (2^{k-1}+1)) \bmod (2^k+1) = 1$. The architecture of the converter D7
following Fig. 4(a) is shown in Fig. 4(b). The mixed radix digit $P$ can be computed as
$(x_{2B}-x_{3B}) + t(2^k+3)$ since $e = -1$ [see Eq. (7a)], where $t$ is 0 if
$x_{2B} \ge x_{3B}$ and 1 otherwise. Note that $(x_{2B}-x_{3B}) + t(2^k+3)$ is computed using
the MODSUBA block [see Fig. 4(b)]. The sign bit of the result of $(x_{2B}-x_{3B})$ [sign bit
of the output of CPA2 in the MODSUBA block in Fig. 2(a)] is considered as $t$.
Fig. 4. (a) MRC for three-moduli set $\{2^k+1, 2^k+2, 2^k+3\}$ and (b) architecture of converter D7.
Next, we consider computation of the mixed radix digit $Q$. We compute
$(x_{1B}-x_{2B})$ but we defer modulo $m_{1B}$ reduction since the multiplicative inverse
$f$ with which we need to multiply mod $(2^k+1)$ is unity [see Eq. (7b)]. Next, unlike
in conventional MRC, we subtract $((x_{2B}-x_{3B}) + t(2^k+3))$ from $(x_{1B}-x_{2B})$ to
obtain the intermediate result
$$\begin{aligned}
Q^* &= (x_{3B} - 2x_{2B} + x_{1B} - t(2^k+3)) \bmod (2^k+1)\\
&= (x_{3B} - 2x_{2B} + x_{1B} - 2t) \bmod (2^k+1).
\end{aligned} \tag{8}$$
Note that in the second equality, we have used the fact that $(2^k+3) \bmod (2^k+1) = 2$.
The subtraction of $((x_{2B}-x_{3B}) + t(2^k+3))$ instead of $P$ has the advantage that $t$ is
available before $P$ is available, saving one k-bit CPA delay.
Next, we consider realizing the computation of $Q^*$ and its multiplication with
$g = (2^{k-1}+1)$. Note that $g$ can be obtained by deleting the least significant bit
(LSB) of $m_{2B}$ since $m_{2B} = 2^k+2$ is even. We need to compute
$$\begin{aligned}
(g \times Q^*) \bmod (2^k+1) &= [g \times ((x_{3B} - 2x_{2B} + x_{1B} - 2t) \bmod (2^k+1))] \bmod (2^k+1)\\
&= (g(x_{3B}+x_{1B}) - x_{2B} - t) \bmod (2^k+1).
\end{aligned}$$
Note that $(g \times 2x_{2B}) \bmod (2^k+1) = (2 \times (2^{k-1}+1) \times x_{2B}) \bmod (2^k+1)
= x_{2B} \bmod (2^k+1)$ and $(g \times 2t) \bmod (2^k+1) = (2 \times (2^{k-1}+1) \times t) \bmod
(2^k+1) = t$, since $2g = 2^k+2 \equiv 1 \bmod (2^k+1)$. We consider
$x_{3B} = 2x_{3BH} + x_{3B0}$ and $x_{1B} = 2x_{1BH} + x_{1B0}$, where $x_{3BH}$ and $x_{1BH}$
are the words formed by the most significant k bits of $x_{3B}$ and $x_{1B}$, respectively. The
computation of $g \times (x_{3B}+x_{1B})$ can be realized by adding $g \times x_{3B0}$,
$g \times x_{1B0}$, $x_{3BH}$ and $x_{1BH}$ since $(2g \times x_{3BH}) \bmod (2^k+1) = x_{3BH}$
and $(2g \times x_{1BH}) \bmod (2^k+1) = x_{1BH}$. Note that $-x_{2B} - t$ is realized as
$(x_{2B})_{1C} + (1-t) = (x_{2B})_{1C} + t'$, where $(x_{2B})_{1C}$ is the one's complement of
$x_{2B}$ and $t'$ is the inverted bit of $t$. Thus we need to compute
$(g \times x_{3B0} + g \times x_{1B0} + x_{3BH} + x_{1BH} + (x_{2B})_{1C} + t') \bmod (2^k+1)$
to obtain $Q$. Note that when $x_{1B0}$ and $x_{3B0}$ are both 1, we need to add $2g$,
and when only one of them is 1, $g$ needs to be added. This is realized by two pairs of AND
gates enabling $g$ when $x_{1B0}$ and $x_{3B0}$ are 1. The resulting five operands can be added
using the three-level CSA tree (CSA3–CSA5) and CPA7 followed by a mod $(2^k+1)$
adder. Note that the maximum positive and minimum negative values of the result of
CPA7 are $(3 \times 2^{k-1} + 2)$ and $-(2^k+2)$, respectively. Hence at most two additions
or one subtraction of the modulus $(2^k+1)$ is sufficient to obtain $Q$. Hence a modulo
$(2^k+1)$ adder using an ADD/SUB unit formed by CPA8, k exclusive-OR
(EXOR) gates and a 2:1 MUX is used to compute $Q$. Next, we obtain the decoded
result $X$ as
$$X = x_{2B} + P(2^k+2) + Q(2^k+3)(2^k+2), \tag{9}$$
where $x_{2B}$, $P$ and $Q$ are the mixed radix digits. We rewrite Eq. (9) as
$$X = x_{2B} + P 2^k + 2P + Q 2^{2k} + Q 2^{k+2} + Q 2^k + 4Q + 2Q. \tag{10}$$
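The shortcut used for $Q$ can be checked with a behavioural sketch. The function name `convert_m3` is hypothetical, Python's `%` stands in for the final ADD/SUB correction stage, and signed integers replace the one's-complement handling of $-x_{2B} - t$; none of this reflects the gate-level structure of D7.

```python
def convert_m3(x1b, x2b, x3b, k):
    """Mixed radix digits P, Q and decoded X for M3 = {2^k+1, 2^k+2, 2^k+3}."""
    m1b, m2b, m3b = 2**k + 1, 2**k + 2, 2**k + 3
    g = m2b >> 1                          # g = 2^(k-1) + 1: m2B with its LSB deleted
    t = 0 if x2b >= x3b else 1
    p = (x2b - x3b) + t * m3b             # e = -1, Eq. (7a)
    x3bh, x3b0 = x3b >> 1, x3b & 1        # x3B = 2*x3BH + x3B0
    x1bh, x1b0 = x1b >> 1, x1b & 1        # x1B = 2*x1BH + x1B0
    # g*(x3B + x1B) - x2B - t, using 2g = 1 mod (2^k + 1): no multiplier needed
    s = g * x3b0 + g * x1b0 + x3bh + x1bh - x2b - t
    q = s % m1b                           # stands in for the ADD/SUB correction
    x = x2b + p * m2b + q * m3b * m2b     # Eq. (9)
    return p, q, x


# k = 4 gives the moduli {17, 18, 19}; 5110 has residues (10, 16, 18).
print(convert_m3(10, 16, 18, 4))          # (17, 14, 5110)
```

The digits $P = 17$ and $Q = 14$ coincide with $U_D$ and $U_F$ of the scaling example in Sec. 4.2, as expected, since both follow the same mixed radix ordering.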
Fig. 5. Bit matrices for the final stage computation of converter D7 for $k = 4$.
As an illustration, we consider $k = 4$, for which case the bits that need to be summed
are shown in Fig. 5(a). These can be rearranged in order to reduce the number of words
that need to be added to six, as shown in Fig. 5(b). Using a four-level $(3k+1)$-bit CSA
tree followed by a $(3k+1)$-bit CPA, we can obtain the decoded integer $X$.
The hardware requirements for the computation of the last stages of the converters
D6 and D7 can be estimated as $(8k-1)$ FAs + $(2k+1)$ HAs and $(6k+7)$ FAs + $(k-2)$ HAs,
respectively, where HA denotes a half-adder; the conversion times required for the last
stages of D6 and D7 are estimated as $(3k+4)\Delta_{FA}$ and $(3k+5)\Delta_{FA}$, respectively.
4. Techniques for Scaling by One Modulus

4.1. Scaling for moduli set M2
Note that scaling by one of the moduli is easily possible using the architecture of the
MRC-based reverse converter. As an illustration for the moduli set M2, referring to
Fig. 1(a), we observe that the intermediate residues $U_A$ and $U_B$ correspond to the
quotient $Q_A = (X - x_{2A})/m_{2A}$. Thus, we need to find the residue of the quotient $Q_A$
corresponding to modulus $m_{2A}$ by base extension. This needs continuing with MRC
as in Fig. 1(a). The MRC digits corresponding to $Q_A$, viz. $U_A$ and $U_C$, are evidently
such that $Q_A = U_C(2^k-3) + U_A$. We thus need to compute the residue $Z_A$
corresponding to modulus $m_{2A}$ as
$$Z_A = Q_A \bmod m_{2A} = (U_C(2^k-3) + U_A) \bmod (2^k-2) = (-U_C + U_A) \bmod (2^k-2) \tag{11}$$
since $(2^k-3) \bmod (2^k-2) = -1$. Thus the overhead needed for scaling is the
additional hardware needed to compute $(-U_C + U_A) \bmod (2^k-2)$ as shown in
Fig. 6. Note that $(U_A - U_C)$ and $(U_A - U_C + m_{2A})$ are computed in parallel using
CPA9, and CSA6 and CPA10, respectively, and based on the sign of the carry output of
CPA9, the correct result $Z_A$ is selected using a 2:1 MUX. Since $U_C$ and $U_A$ are k-bit
wide, the difference $(U_A - U_C)$ lies between $2^k-4$ and $-(2^k-2)$. This needs, in the
worst case, one addition of $m_{2A}$ to $(-U_C + U_A)$. The scaled result in RNS is thus
$\{Z_A, U_A, U_B\}$. Note that the total hardware requirement of the scaler for M2 is
7k FAs + 2k 2:1 MUXs and the time for scaling is $(5k+1)\Delta_{FA} + 2\Delta_{2:1\,MUX}$.
Fig. 6. Architecture of scaling by modulus $(2^k-2)$ for moduli set M2.
The following example illustrates the scaling by modulus 14 in the RNS {14, 13, 15}.
Example. We consider scaling of 2,390, whose RNS form is (10, 11, 5) corresponding to
the moduli set {14, 13, 15}. Following the MRC of Fig. 1(a), we obtain
$U_A = (11 - 10) \bmod 13 = 1$, $U_B = (10 - 5) \bmod 15 = 5$, $U_C^* = (5 - 1) \bmod 15 = 4$
and $U_C = (4 \times (-8)) \bmod 15 = 13$.
Note that in the first stage of MRC we got the residues $U_A = 1$ and $U_B = 5$ of
the quotient $Q_A$ of division of 2,390 by 14. We need to find the residue with respect to
the modulus 14. This step is called base extension. Here we find the quotient $Q_A$ as
$U_C m_{1A} + U_A = 13 \times 13 + 1 = 170$. We find $Q_A \bmod 14$ as
$((-1) \times 13 + 1) \bmod 14 = -12 \bmod 14 = 2$. Hence the scaled result in RNS is
{2, 1, 5}, corresponding to 170.
Note that scaling by the product of two moduli is also feasible. Note that $U_C$ is the
result of scaling (division) by $(m_{1A} \times m_{2A})$. The scaled result in RNS is
$\{U_C', U_C'', U_C\}$, where $U_C' = U_C \bmod m_{2A}$ and $U_C'' = U_C \bmod m_{1A}$.
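The scaling procedure of Eq. (11) can be sketched as below. The function name `scale_m2` is hypothetical and plain modular arithmetic replaces the subtractor blocks of Fig. 6; the sketch only checks the arithmetic, not the hardware cost.

```python
def scale_m2(x1a, x2a, x3a, k):
    """Scale X by m2A = 2^k - 2 in M2; returns the residues of the quotient
    Q_A = (X - x2A) / m2A with respect to (m2A, m1A, m3A)."""
    m1a, m2a, m3a = 2**k - 3, 2**k - 2, 2**k - 1
    u_a = (x1a - x2a) % m1a                      # Q_A mod m1A
    u_b = (x2a - x3a) % m3a                      # Q_A mod m3A
    u_c = ((u_b - u_a) * -2**(k - 1)) % m3a      # next mixed radix digit, r = -2^(k-1)
    z_a = (u_a - u_c) % m2a                      # base extension, Eq. (11)
    return z_a, u_a, u_b


# The worked example: 2390 in the RNS {14, 13, 15} is (10, 11, 5).
print(scale_m2(11, 10, 5, 4))                    # (2, 1, 5), the residues of 170
```

Note that the arguments follow the $(x_{1A}, x_{2A}, x_{3A})$ order of Sec. 3.1, so the residue for modulus 13 comes first.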
4.2. Scaling for moduli set M3

In the case of moduli set M3, we refer to Fig. 7 using MRC [note that the architecture
of Fig. 4(a) is designed to give the MRC digits fast and will not give the intermediate
residues corresponding to $(X - x_{2B})/m_{2B}$]. The hardware implementation follows
the architecture in Fig. 8(a). Evidently, scaling by $m_{2B}$ is achieved in two steps: first
by computing $(x_{3B}-x_{2B}) \bmod m_{3B}$ and $(x_{1B}-x_{2B}) \bmod m_{1B}$ and next by
multiplication with the respective multiplicative inverses. This yields the residues $U_D$ and
$U_E$ corresponding to the quotient $(X - x_{2B})/m_{2B}$. Next the MRC is continued
to get the MRC digit $U_F$ as shown in Fig. 7. Note that $g = 2^{k-1}+1$ as defined before
[see Eq. (7c)]. We denote $U_F^* = (U_E - U_D) \bmod (2^k+1)$, and write
$U_F^* = 2F_H + F_0$, where $F_H$ is the k-bit MSB word in $U_F^*$ and $F_0$ is the LSB of
$U_F^*$. Thus we compute $(U_F^* \times g) \bmod (2^k+1)$ as
$$\begin{aligned}
(U_F^* \times g) \bmod (2^k+1) &= ((2F_H + F_0) \times g) \bmod (2^k+1)\\
&= (F_H + F_0 \times g) \bmod (2^k+1).
\end{aligned}$$
Thus $F_H$ needs to be added with $g$ enabled by $F_0$ in a $(k+1)$-bit adder CPA13
as shown in Fig. 8(b). Note that $(F_H + F_0 \times g)$ is less than $(2^k+1)$, needing no
modulo $(2^k+1)$ reduction. Next, we need to find $Z_B = (U_F(2^k+3) + U_D) \bmod
(2^k+2) = (U_F + U_D) \bmod (2^k+2)$. Note that only one subtraction of $m_{2B}$
from $(U_F + U_D)$ is needed to obtain the residue $Z_B$. Thus the scaled result in RNS is
$\{Z_B, U_D, U_E\}$.
Fig. 7. MRC technique for the moduli set M3 to illustrate scaling by modulus $(2^k+2)$.
Fig. 8. (a) Architecture of scaling for moduli set M3 and (b) computation of $(U_F^* \times g) \bmod (2^k+1)$.
Example. We consider an example of scaling of 5,110 by 18 in the moduli set
(18, 19, 17). Let the given residues be 5,110 = {16, 18, 10}. Following Fig. 7, the first
stage of MRC gives $U_D = ((18 - 16) \times (-1)) \bmod 19 = 17$ and
$U_E = (10 - 16) \bmod 17 = 11$, and the next stage gives $U_F^* = (11 - 17) \bmod 17 = 11$
and $U_F = (11 \times 9) \bmod 17 = 14$.
Note that in the first stage of MRC we got the residues of the quotient $Q_B$ of
division by 18. We need to find the residue with respect to the modulus 18. This step
is called base extension. Here the quotient word is $Q_B = 14 \times 19 + 17 = 283$. We
find $Q_B \bmod 18$ as $(14 \times 1 + 17) \bmod 18 = 31 \bmod 18 = 13$. Hence, the scaled
result in RNS is {13, 17, 11} = 283. Scaling by the product of two moduli
$(m_{2B} \times m_{3B})$ is also feasible. Note that $U_F$ is the result of scaling by
$(m_{2B} \times m_{3B})$. The scaled result in RNS is $\{U_F', U_F'', U_F\}$, where
$U_F' = U_F \bmod m_{2B}$ and $U_F'' = U_F \bmod m_{3B}$.
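The M3 scaler can be modelled in the same spirit. The name `scale_m3` is hypothetical; the multiplication by $g$ is written as $F_H + F_0 \times g$ exactly as in the derivation above, and the remaining modular steps use plain integer arithmetic rather than the MODSUBC blocks.

```python
def scale_m3(x1b, x2b, x3b, k):
    """Scale X by m2B = 2^k + 2 in M3; returns the residues of the quotient
    Q_B = (X - x2B) / m2B with respect to (m2B, m3B, m1B)."""
    m1b, m2b, m3b = 2**k + 1, 2**k + 2, 2**k + 3
    g = 2**(k - 1) + 1
    u_d = (x2b - x3b) % m3b              # (x3B - x2B) * e with e = -1
    u_e = (x1b - x2b) % m1b              # f = 1
    u_f_star = (u_e - u_d) % m1b
    f_h, f_0 = u_f_star >> 1, u_f_star & 1
    u_f = f_h + f_0 * g                  # (U_F* x g) mod (2^k+1), no reduction needed
    z_b = (u_f + u_d) % m2b              # base extension to modulus m2B
    return z_b, u_d, u_e


# The worked example: 5110 in the RNS (18, 19, 17) is {16, 18, 10}.
print(scale_m3(10, 16, 18, 4))           # (13, 17, 11), the residues of 283
```

As in Sec. 3.2, the arguments are passed in the order $(x_{1B}, x_{2B}, x_{3B})$, i.e. the residues for moduli 17, 18 and 19.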
Note that the computation of the mixed radix digit $U_D$ and the intermediate results
$U_E$ and $U_F^*$ is carried out using three $(k+1)$-bit MODSUBC blocks of Fig. 2(c).
Next, the computation of the mixed radix digit $U_F$ from $U_F^*$ is carried out using the
modulo multiplier presented in Fig. 8(b). Note that $(U_F + U_D)$ and $(U_F + U_D - m_{2B})$
are computed in parallel using CPA11, and CSA7 and CPA12, respectively, and based on the
sign of the carry output of CPA11, the correct result $Z_B$ is selected using a 2:1 MUX
as shown in Fig. 8(a). Thus, the total hardware requirement of the scaler for M3 is
$(13k+3)$ FAs + $(4k+4)$ 2:1 MUXs + k AND gates and the time for scaling is
$(4k+7)\Delta_{FA} + 3\Delta_{2:1\,MUX} + \Delta_{AND}$. Evidently scaling is faster than
RNS-to-binary conversion using MRC, since MRC needs the additional step of computing
the final decoded number following Eq. (2).
5. Performance Evaluation and Comparison

The hardware requirements and conversion times for the various converters described
in Refs. 5, 17, 18, 20 and 31 for the moduli set M1, along with those of the proposed
converters, are presented in Table 1. Note that FA, HA, AND, EXOR and w:1
MUX stand for a full-adder, half-adder, two-input AND gate, two-input EXOR gate
and w:1 multiplexer, respectively. The notations L1, L2 and L3 are used to represent
$2u \times u$, $u \times u$ and $(u+1) \times (2u+1)$ multipliers, respectively, and LiM (for
$i = 1, 2, 3$) is used to represent a merged multiplier,²⁴ where $u = \lceil \log_2 m_i \rceil$.
In a merged multiplier, $(x \times y) + z$ is realized in one stage instead of a multiplier
followed by an adder in order to reduce computation time.
Table 1. Comparison of hardware requirements and conversion times of various reverse converters for
the three-moduli sets M1, M2 and M3.
| Design | Moduli set | Hardware requirement | Conversion time |
|---|---|---|---|
| D1 (Ref. 5) | M1 $\{2m+1, 2m, 2m-1\}$ | 15u FAs + 15u 2:1 MUXs + 3 L1s | $(6u + u/2)\Delta_{FA}$ |
| D2 (Ref. 17) | M1 $\{2m+1, 2m, 2m-1\}$ | 11u FAs + 6u 2:1 MUXs + L1 + L2 | $9u\Delta_{FA} + \Delta_{L1} + \Delta_{L2}$ |
| D3 (Ref. 17) | M1 $\{2m+1, 2m, 2m-1\}$ | 13u FAs + 5u 2:1 MUXs + L1 + L2 | $9u\Delta_{FA} + \Delta_{L1} + \Delta_{L2}$ |
| D4 (Ref. 18) | M1 $\{2m+1, 2m, 2m-1\}$ | (18u+18) FAs + 2u 3:1 MUXs + u ANDs + L1 + L2 | $(6u+7)\Delta_{FA} + \Delta_{3:1\,MUX} + \Delta_{L1} + \Delta_{L2}$ |
| D5 (Ref. 20) | M1 $\{2m+1, 2m, 2m-1\}$ | (8u+9) FAs + (u+2) 5:1 MUXs + (u+2) NANDs + (u+2) EXORs + L1 + L2M | $(6u+2)\Delta_{FA} + \Delta_{5:1\,MUX} + \Delta_{EXOR} + \Delta_{NAND} + \Delta_{L1}$ |
| D8 (Ref. 31) | M1 $\{2m+1, 2m, 2m-1\}$ | (15u+1) FAs + 2u HAs + (u+1) ANDs + EXOR + (4u+2) 2:1 MUXs + L1 + L3 | $(5u+4)\Delta_{FA} + \Delta_{L1} + 2\Delta_{2:1\,MUX}$ |
| D6CE | M2 $\{2^k-3, 2^k-2, 2^k-1\}$ | (12k+3) FAs + (2k+1) HAs + k 2:1 MUXs | $(7k+8)\Delta_{FA}$ |
| D6HS | M2 $\{2^k-3, 2^k-2, 2^k-1\}$ | (17k+8) FAs + (2k+1) HAs + 3k 2:1 MUXs | $(5k+8)\Delta_{FA}$ |
| D7 | M3 $\{2^k+1, 2^k+2, 2^k+3\}$ | (14k+25) FAs + (k−2) HAs + (k+1) 2:1 MUXs + (k+1) 3:1 MUXs + (2k+1) ANDs + (k+1) EXORs | $(7k+15)\Delta_{FA} + \Delta_{3:1\,MUX} + 2\Delta_{AND}$ |
The converter D1 of Premkumar⁵ for the moduli set $\{2m+1, 2m, 2m-1\}$ using the
CRT technique utilizes five two-input adders, three $2u \times u$ multipliers and five 3u-bit
2:1 MUXs. In order to overcome the complexity involved in D1, Premkumar et al.
suggested two converters D2 (architecture A) and D3 (architecture B)¹⁷ by simplifying
the conventional CRT, which replaces the modulo $M$ reduction with a simplified
modulo $(4m^2-1)$ reduction and utilizes one $2u \times u$ multiplier and another $u \times u$
multiplier. Architecture A (D2) needs seven two-input adders and six u-bit 2:1
MUXs. Architecture B (D3) needs nine two-input adders and five u-bit 2:1 MUXs.
Note that D3 is a high-speed version. The converter D4 for M1 was proposed by Wang
et al.¹⁸ using the new CRT II technique. This design D4 also needs one $2u \times u$ multiplier,
one $u \times u$ multiplier, a few adders and a few comparators. The converter D5 for M1
suggested by Gbolagade et al.²⁰ is based on a modification of CRT and realizes
modulo $(2m-1)$ reduction using several MUXs and comparators, and needs one
$2u \times u$ multiplier and one $u \times u$ multiplier. The hardware requirements and
conversion times for the five converters D1–D5 are presented as the first five entries in
Table 1. The converter D8 for M1 is presented in Ref. 31 using the core function and
needs one $2u \times u$ multiplier, one $(u+1) \times (2u+1)$ multiplier, one $(u+1)$-bit
subtractor, one u-bit subtractor, one u-bit modulo $2m$ adder and one 3u-bit modulo
$(2m(4m^2-1))$ adder. This design is also considered for comparison (presented in
Table 1 as entry D8). We considered high-speed implementation of all modulo adders
in this evaluation. Note that all the converters D1–D5 and D8 need at least two multipliers
having quadratic dependence of area on $m$.
The converter D6CE for the moduli set M2 needs the lowest area due to its multiplier-free
architecture. Among the two converters D6CE and D6HS, there is a trade-off between
conversion time and area. The converter D7, using one modulus of the form
$2^k+1$, does not need multipliers; however, it needs more hardware resources than the
converter D6CE. The conversion time of D7 is comparable to that of converter
D6CE. By the estimates of Table 1, the converter D6HS needs the least conversion time
among all the converters.
The proposed designs D6CE, D6HS, D7 as well as design D5²⁰ were implemented
using Cadence (version 14.20), Compiler: RC 14.25, and synthesized using the Cadence
Encounter tool in 180 nm technology. The post-place and route results of area,
conversion time and power dissipation for the designs D5, D6CE, D6HS and D7 are
presented in Table 3. The appropriate $m$ values for dynamic range (DR) ranging
from 8-bit to 64-bit are summarized in Table 2. Note that in Table 2, the moduli set
for a given $m$ is $\{2m+1, 2m, 2m-1\}$. As an illustration, for 16-bit DR the moduli
sets M1, M2 and M3 are, respectively, {41, 42, 43} ($m = 21$), {61, 62, 63} ($m = 31$)
and {65, 66, 67} ($m = 33$).
Table 2. Values of m to be considered for various DRs of M1, M2 and M3.
| Moduli set | 8-bit DR | 16-bit DR | 24-bit DR | 32-bit DR | 48-bit DR | 64-bit DR |
|---|---|---|---|---|---|---|
| M1 | 4 | 21 | 129 | 813 | 32,769 | 1,321,123 |
| M2 | 7 | 31 | 255 | 1,023 | 65,535 | 2,097,151 |
| M3 | 5 | 33 | 129 | 1,025 | 32,769 | 2,097,153 |
Table 3. ASIC implementation results of various reverse converters for the three-moduli sets M1, M2
and M3.
| Dynamic range | D5²⁰ Area (µm²) | D5²⁰ Delay (ps) | D5²⁰ Power (µW) | D6CE (Proposed) Area (µm²) | D6CE (Proposed) Delay (ps) | D6CE (Proposed) Power (µW) |
|---|---|---|---|---|---|---|
| 8-bit | 3,765 | 2,609 | 677 | 4,717 | 3,048 | 1,136 |
| 16-bit | 12,507 | 5,828 | 4,545 | 9,876 | 4,355 | 3,592 |
| 24-bit | 23,168 | 8,091 | 11,389 | 19,752 | 7,623 | 9,940 |
| 32-bit | 31,617 | 9,382 | 17,984 | 27,130 | 8,623 | 15,302 |
| 48-bit | 61,236 | 14,225 | 34,934 | 53,681 | 13,816 | 32,997 |
| 64-bit | 90,897 | 17,638 | 62,140 | 85,006 | 17,270 | 60,400 |

| Dynamic range | D6HS (Proposed) Area (µm²) | D6HS (Proposed) Delay (ps) | D6HS (Proposed) Power (µW) | D7 (Proposed) Area (µm²) | D7 (Proposed) Delay (ps) | D7 (Proposed) Power (µW) |
|---|---|---|---|---|---|---|
| 8-bit | 5,568 | 3,360 | 1,456 | 3,659 | 3,212 | 687 |
| 16-bit | 11,253 | 4,585 | 4,338 | 11,726 | 6,358 | 4,263 |
| 24-bit | 22,390 | 8,181 | 12,238 | 19,496 | 7,724 | 8,840 |
| 32-bit | 30,210 | 9,894 | 18,842 | 30,639 | 11,870 | 18,432 |
| 48-bit | 57,883 | 14,670 | 39,152 | 53,968 | 17,249 | 34,164 |
| 64-bit | 90,664 | 18,422 | 72,056 | 90,867 | 22,924 | 66,484 |
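The choice of $m$ for a target dynamic range can be checked with a short sketch. The function name `dynamic_range_bits` and the dictionary layout are hypothetical; the $m$ values are those of Table 2, with the 32-bit entry for M2 taken as $1{,}023 = 2^{10}-1$ so that it has the form $2^{k-1}-1$ required by the moduli set.

```python
def dynamic_range_bits(m):
    """Largest n such that every n-bit unsigned integer is below
    M = (2m+1)(2m)(2m-1), i.e. floor(log2(M))."""
    M = (2 * m + 1) * (2 * m) * (2 * m - 1)
    return M.bit_length() - 1


targets = (8, 16, 24, 32, 48, 64)
table2 = {
    "M1": (4, 21, 129, 813, 32769, 1321123),
    "M2": (7, 31, 255, 1023, 65535, 2097151),   # m = 2^(k-1) - 1
    "M3": (5, 33, 129, 1025, 32769, 2097153),   # m = 2^(k-1) + 1
}

for name, ms in table2.items():
    print(name, [dynamic_range_bits(m) for m in ms])
```

Each printed value is at least the corresponding target in `targets`; for M2 and M3 it is often larger, because $m$ can only take values one away from a power of two.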
From Table 3, regarding area, for 8- and 24-bit DRs design D7 is preferable,
whereas for 16-bit DR, design D6CE is preferable. For 32-, 48- and 64-bit DRs,
designs D6CE, D6HS and D7 outperform design D5. Among the three new
converters, design D6CE has the smallest area for all DRs except 8- and 24-bit.
Regarding conversion time, for 8-bit DR design D5 is better and for 16-bit DR,
designs D6CE and D6HS are better than the other converters. For all other DRs,
converter D6CE is better than the other converters. Among the new converters, D6CE is
preferable regarding conversion time.
Regarding power dissipation, design D5 is superior for 8-bit DR, whereas for
16-bit DR, designs D6HS, D6CE and D7 outperform design D5. For 24-bit DR, Table 3
shows design D7 dissipating the least power (8,840 µW against 9,940 µW for D6CE); for
32-, 48- and 64-bit DRs, design D6CE is superior to the other converters.
6. Conclusions
In this paper, we have presented RNS-to-binary converters for the moduli sets
$\{2^k-2, 2^k-3, 2^k-1\}$ and $\{2^k+2, 2^k+3, 2^k+1\}$ using MRC technique. All the
proposed converters were evaluated based on the hardware resource requirement as
well as conversion time and compared with all converters described in literature for
the moduli set $\{2m+1, 2m, 2m-1\}$. All the proposed converters are implemented
and compared with the area-efficient converter (Ref. 20) for M1 regarding area, conversion
time and power dissipation for different DRs. The proposed converters for M2 and
M3 were shown to be better than some of the other converters regarding area and
conversion time while having the advantage of availability of mixed radix digits to
facilitate easy comparison. We have also presented techniques for scaling by one
modulus for both the proposed moduli sets using mixed radix conversion. Efficient
binary-to-RNS converters and multipliers for the moduli $2^k-3$ and $2^k+3$
(Refs. 25–27) and multipliers (Refs. 28 and 29) and binary-to-RNS converters for the moduli
$(2^k-1)$ and $(2^k+1)$ are available in literature (Ref. 2). The binary-to-RNS converter for the even
moduli $(2^k+2)$ and $(2^k-2)$ can employ the binary-to-RNS converter for moduli
$(2^{k-1}+1)$ and $(2^{k-1}-1)$ with suitable modification described in Ref. 30.
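The arithmetic summarized above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware architecture of the proposed converters: it uses the generic MRC recurrence with modular inverses from Python's built-in `pow`, starts from the even modulus so that the decoded value has the form $X = x_2 + U_A m_2 + U_C m_1 m_2$, and obtains the scaled value $(X - x_2)/m_2 = U_A + U_C m_1$ directly from the mixed radix digits. The helper `even_residue` shows one plausible way of reusing a residue modulo $2^{k-1} \pm 1$ for the even modulus; it is not claimed to be the modification described in Ref. 30. All function names are hypothetical.

```python
def even_residue(x, half):
    """x mod (2*half), obtained from a reduction modulo `half` only.

    Uses x = 2*(x >> 1) + (x & 1), so x mod 2n = 2*((x >> 1) mod n) + LSB.
    """
    return 2 * ((x >> 1) % half) + (x & 1)


def mixed_radix_digits(x1, x2, x3, m1, m2, m3):
    """Return (UA, UC) such that X = x2 + UA*m2 + UC*m1*m2.

    m2 is the even modulus; m1 and m3 are the two odd moduli in the
    order chosen by the caller. Requires Python 3.8+ for pow(a, -1, n).
    """
    ua = ((x1 - x2) * pow(m2, -1, m1)) % m1
    t = ((x3 - x2) * pow(m2, -1, m3)) % m3
    uc = ((t - ua) * pow(m1, -1, m3)) % m3
    return ua, uc


def mrc_decode(x1, x2, x3, m1, m2, m3):
    """Reverse conversion from residues (x1, x2, x3) to X."""
    ua, uc = mixed_radix_digits(x1, x2, x3, m1, m2, m3)
    return x2 + ua * m2 + uc * m1 * m2


def scale_by_m2(x1, x2, x3, m1, m2, m3):
    """Scaled value (X - x2) / m2, read off the mixed radix digits."""
    ua, uc = mixed_radix_digits(x1, x2, x3, m1, m2, m3)
    return ua + uc * m1


# Scaling 5,110 by 18 in the moduli set {17, 18, 19} (k = 4).
m1, m2, m3 = 17, 18, 19
x = 5110
residues = (x % m1, even_residue(x, m2 // 2), x % m3)   # (10, 16, 18)
print(mrc_decode(*residues, m1, m2, m3))                # 5110
print(scale_by_m2(*residues, m1, m2, m3))               # 283
```

Because the scaled value is a by-product of the mixed radix digits, no separate division is needed once $U_A$ and $U_C$ are available.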
References
1. N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer
Technology (McGraw-Hill, New York, 1967).
2. P. V. Ananda Mohan, Residue Number Systems: Theory and Applications (Birkhauser,
Switzerland, 2016).
3. A. Omondi and A. B. Premkumar, Residue Number System Theory and Implementation
(Imperial College Press, 2007).
4. M. A. Soderstrand, G. A. Jullien, W. K. Jenkins and F. Taylor (eds.), Residue Number
System Arithmetic: Modern Applications in Digital Signal Processing (IEEE Press, 1986).
5. A. B. Premkumar, An RNS to binary converter in $\{2n+1, 2n, 2n-1\}$ moduli set,
IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process. 39 (1992) 480–482.
6. A. B. Premkumar, An RNS to binary converter in a three moduli set with
common factors, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process. 42 (1995)
298–301.
7. S. Andraros and H. Ahmad, A new efficient memory-less residue to binary converter,
IEEE Trans. Circuits Syst. 35 (1988) 1441–1444.
8. S. J. Piestrak, A high-speed realization of residue to binary system conversion,
9. A. Dhurkadas, Comments on "A high-speed realization of a residue to binary number
system converter", IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process. 45 (1998)
446–447.
10. M. Bhardwaj, A. B. Premkumar and T. Srikanthan, Breaking the 2n-bit carry propagation
barrier in residue to binary conversion for the $\{2^n-1, 2^n, 2^n+1\}$ moduli set,
IEEE Trans. Circuits Syst. I, Fundam. Theory Appl. 45 (1998) 998–1002.
11. Z. Wang, G. A. Jullien and W. C. Miller, An improved residue to binary converter, IEEE
Trans. Circuits Syst. I, Fundam. Theory Appl. 47 (2000) 1437–1440.
12. Y. Wang, X. Song, M. Aboulhamid and H. Shen, Adder based residue to binary number
converters for $\{2^n-1, 2^n, 2^n+1\}$, IEEE Trans. Signal Process. 50 (2002) 1772–1779.
13. A. A. Hiasat and H. S. Abdel-Aty-Zohdy, Residue to binary arithmetic converter for the
moduli set $\{2^k, 2^k-1, 2^{k-1}-1\}$, IEEE Trans. Circuits Syst. II, Analog Digit. Signal
Process. 45 (1998) 204–209.
14. W. Wang, M. N. S. Swamy, M. O. Ahmad and Y. Wang, A high-speed residue-to-binary
converter for three moduli $\{2^k, 2^k-1, 2^{k-1}-1\}$ RNS and a scheme for its VLSI
implementation, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process. 47 (2000)
1576–1581.
15. W. Wang, M. N. S. Swamy, M. O. Ahmad and Y. Wang, A note on "A high-speed
residue-to-binary converter for three moduli $\{2^k, 2^k-1, 2^{k-1}-1\}$ RNS and a scheme for its
VLSI implementation", IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.
49 (2002) 230.
16. P. V. Ananda Mohan, New residue to binary converters for the moduli set $\{2^k, 2^k-1,
2^{k-1}-1\}$, Proc. IEEE Region 10 Conf. (TENCON 2008) (2008), pp. 1–6.
17. A. B. Premkumar, M. Bhardwaj and T. Srikanthan, High-speed and low-cost reverse
converters for the $\{2n-1, 2n, 2n+1\}$ moduli set, IEEE Trans. Circuits Syst. II, Analog
Digit. Signal Process. 45 (1998) 903–908.
18. Y. Wang, M. N. S. Swamy and M. O. Ahmad, Residue-to-binary number converters
for three moduli sets, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.
46 (1999) 180–183.
19. K. A. Gbolagade and S. D. Cotofana, An efficient RNS to binary converter using
the moduli set $\{2n-1, 2n, 2n+1\}$, Proc. XXIII Conf. Design of Circuits and Integrated
Systems (DCIS) (2008).
20. K. A. Gbolagade, G. R. Voicu and S. D. Cotofana, An efficient FPGA design of
residue-to-binary converter for the moduli set $\{2n+1, 2n, 2n-1\}$, IEEE Trans. Very Large
Scale Integr. (VLSI) Syst. 19 (2011) 1500–1503.
21. Y. Wang, Residue to binary converters based on new Chinese remainder theorems,
22. B. Cao, C. H. Chang and T. Srikanthan, An efficient reverse converter for the 4-moduli
set $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$ based on the new Chinese remainder theorem, IEEE Trans.
Circuits Syst. I, Fundam. Theory Appl. 50 (2003) 1296–1303.
23. A. Hiasat, VLSI implementation of new arithmetic residue to binary decoders,
IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 13 (2005) 153–158.
24. E. E. Swartzlander, Jr., Merged arithmetic, IEEE Trans. Comput. 29 (1980) 946–950.
25. P. M. Matutino, H. Pettenghi, R. Chaves and L. Sousa, Multiplier based binary to RNS
converters modulo $(2^n \pm k)$, Proc. 26th Conf. Design of Circuits and Integrated Systems
(DCIS) (2011), pp. 125–130.
26. P. M. Matutino, R. Chaves and L. Sousa, Binary to RNS conversion units for moduli
$(2^n \pm 3)$, Proc. 14th IEEE Euromicro Conf. Digital System Design (DSD) (2011),
pp. 460–467.
27. H. Pettenghi, R. Chaves and L. Sousa, Method for designing modulo $\{2^n \pm k\}$ binary
to RNS converters, Proc. Int. Conf. Design of Circuits and Integrated Systems (DCIS)
(2013), pp. 1–6.
28. L. Li, L. Zhou and W. Zhou, High-speed modulo $(2^n+3)$ multipliers, IEICE Electron.
Express 10 (2013) 1–7.
29. H. Ahmadifar and G. Jaberipur, Improved modulo $(2^n \pm 3)$ multipliers, Proc. 17th CSI
Int. Symp. Computer Architecture and Digital Systems (CADS) (2013), pp. 31–35.
30. P. V. Ananda Mohan, Efficient design of binary to RNS converters, J. Circuits Syst.
Comput. 9 (1999) 145–154.
31. P. V. Ananda Mohan, Reverse conversion using core function, CRT and mixed radix
conversion, Circuits Syst. Signal Process. 36 (2017) 2847–2874.