Professional Documents
Culture Documents
Reliable Low-Power Multiplier Design Using Fixed-Width Replica Redundancy Block
Reliable Low-Power Multiplier Design Using Fixed-Width Replica Redundancy Block
1, JANUARY 2015
Fig. 4. Statistical curves of average truncation error in the LSP block and
the curves of compensation function with β − 1, β, and β + 1 in the 12-bit
fixed-width RPR-based ANT multiplier.
Fig. 3. 12 × 12 bit ANT multiplier is implemented with the six-bit fixed-
width replica redundancy block.
ε = P − Pt (5)
and β is found. It is noted that β is the summation of all
where P is the output of the complete multiplier in MDSP partial products of ICV. By statistically analyzing the truncated
and Pt is the output of the fixed-width multiplier in RPR. difference between MDSP and fixed-width RPR with uniform
Pt can be expressed as input distribution, we can find the relationship between f (EC)
n−1
n−1 and β. As shown in Fig. 4, the statistical results show that the
Pt = yj2j x i 2i average truncation error in the fixed-width RPR multiplier is
j = n2 +1 i= 3n
approximately distributed between β and β+1. More precisely,
2 −j
as β = 0, the average truncation error is close to β + 1. As
+ f x n−1 y n2 , x n−2 y n2 +1 , x n−3 y n2 +2 , . . . , x n2 y n2 +2 β > 0, the average truncation error is very close to β. If we can
select β as the compensation vector, the compensation vector
+ f x n−2 y n2 , x n−3 y n2 +1 , x n−4 y n2 +2 , . . . , x n2 yn−2 can directly inject into the fixed-width RPR as compensation,
n−1
n−1 which does not need extra compensation logic gates [17].
j
= yj2 x i 2i + f (ICV) + f (MICV) We go further to analyze the compensation precision by
j = n2 +1
2 −j
i= 3n selecting β as the compensation vector. We can find that
the absolute average error in β = 0 is much larger than
n−1
n−1
= yj2j x i 2i + f (EC) (6) that in other β cases, as shown in Fig. 5. Moreover, the
j = n2 +1
absolute average error in β = 0 is larger than 0.5∗ 2(3n/2),
i= 3n
2 −j
while the absolute average error in other β situations is
where f (EC) is the error compensation function, f (ICV) smaller than 0.5∗ 2(3n/2). Therefore, we can apply multiple
is the error compensation function contributed by the input input error compensation vectors to further enhance the error
correction vector ICV(β), and f (MICV) is the error compen- compensation precision. For the β > 0 case, we can still select
sation function contributed by minor input correction vector β as the compensation vector. For the β = 0 case, we select
MICV(α). β + 1 combining with MICV as the compensation vector.
The source of errors generated in the fixed-width RPR is Before directly injecting the compensation vector β into the
dominated by the bit products of ICV since they have the fixed-width RPR, we go further to double check the weight for
largest weight. In [8], it is reported that a low-cost EC circuit the partial product terms in ICV with the same partial product
can be designed easily if a simple relationship between f (EC) summation value β but with different locations. As shown in
WEY et al.: RELIABLE LOW-POWER MULTIPLIER DESIGN 81
TABLE I
R ELATION B ETWEEN THE PARTIAL P RODUCT T ERM ’ S L OCATION IN
ICV AND THE S TATISTICAL C OMPENSATION E RROR E AVG
⎧
⎪
⎪ 0, if βl = 0
⎪
⎪
⎪
⎪
⎨β, if βl = 1, or βl = 2 & (up − MICV = 0 & down − MICV = 0)
f (ICV) = β − 1, if βl = 2 & (up − MICV = 0 or down − MICV = 0), βl = 3, (7)
⎪
⎪
⎪
⎪ or βl = 4 & (up − MICV = 0 & down − MICV = 0)
⎪
⎪
⎩β − 2, if βl = 4 & (up − MICV = 0 or down − MICV = 0)
82 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015
Fig. 8. Comparison of absolute error between the proposed design, the fixed- IV. P ERFORMANCE C OMPARISONS
width RPR with compensation vector β only, and the full-width RPR in the
12-bit fixed-width RPR-based ANT multiplier. To evaluate and compare the performance of the proposed
fixed-width RPR based ANT design and the previous full-
width RPR-based ANT design, we implemented these two
ANT designs in a 12-bit by 12-bit multiplier. The main
performance indexes are the precision of RPR blocks, the
silicon area of RPR blocks, the critical computation delay of
RPR blocks, the error probability of RPR blocks under VOS,
and the lowest reliable operating supply voltage under VOS.
Through quantitative analysis of experimental data, we can
demonstrate that our proposed design can more effectively
restrain the soft noise interference resulting from postponed
computation delay under VOS when the circuit operates with
a very low-voltage supply. Moreover, hardware overhead and
Fig. 9. Proposed high-accuracy fixed-width RPR multiplier with compensa- power consumption can also be lowered in the proposed
tion constructed by the multiple truncation EC vectors combined ICV together fixed-width RPR-based ANT design. Finally, we implemented
with MICV.
our proposed 12-bit by 12-bit fixed-width RPR-based ANT
multiplier design in TSMC 90-nm CMOS process technology.
and one minor compensation vector MICV(α) to amend First, we compare the proposed fixed-width multiplier with
the insufficient error compensation cases. The compensation other literature designs [2], [17], respectively. All performance
vector ICV(β) is realized by directly injecting the partial terms comparisons are evaluated under 12-bit ANT-based multiplier
of X n−1 Yn/2 , X n−2 Y(n/2)+1 , X n−3 Y(n/2)+2 , . . . , X (n/2)+2 Yn−2 . designs. To evaluate the truncation error compensation per-
These directly injecting compensation terms are labeled as formance, we define the index of absolute mean error (εavg),
C1 , C2 , C3 , . . . , C(n/2)−1 in Fig. 9. The other compensation mean-square error (εms ), maximum truncation error (εmax ),
vector used to mend the insufficient error compensation case is and variance of error (εν ), respectively. The comparison basis
constructed by one conditional controlled OR gate. One input is the fixed-width RPR without any compensation vector
of OR gate is injected by X (n/2) Yn−1 , which is designed to and the definition of various error evaluation indexes are
realize the function of compensation vector β. The other input expressed as
is conditional controlled by the judgment formula used to εavg
judge whether β = 0 and βl = 0 as well. As shown in Fig. 8, εavg,% = × 100% (8)
εavg (Truncated)
the term Cm1 is used to judge whether β = 0 or not. The judg-
εms
ment function is realized by one NOR gate, while its inputs are εms,% = × 100% (9)
X n−1 Yn/2 , X n−2 Y(n/2)+1 , X n−3 Y(n/2)+2 , . . . , X (n/2)+2 Yn−2 . εms(Truncated)
εmax
The term Cm2 is used to judge whether βl = 0. The judgment εmax,% = × 100% (10)
function is realized by one OR gate, while its inputs are εmax(Truncated)
εν
X n−2 Yn/2 , X n−3 Y(n/2)+1 , X n−4 Y(n/2)+2 , . . . , X (n/2)+1 Yn−2 . If εν,% = × 100%. (11)
both of these two judgments are true, a compensation term εν(Truncated)
Cm is generated via a two-input AND gate. Then, Cm is The smaller value of εavg , εms , εmax , and εν represents the
injected together with X (n/2) Yn−1 into a two-input OR gate more accurate EC performance. The precision analysis results
to correct the insufficient error compensation. Accordingly, of various fixed-width RPR multipliers or full-width RPR mul-
in the case of β = 0 and βl = 0 as well, one additional tiplier are shown in Table II. The fixed-width RPR multipliers
carry-in signal C(n/2) is injected into the compensation are the six-bit multipliers while their LSPs are truncated and
vector to modify the compensation value as β + 1 instead various error compensation vectors are applied. The full-width
of β. Moreover, the carry-in signal C(n/2) is injected in the RPR multiplier is the six-bit by six-bit multiplier. As shown
bottom of error compensation vector, which is the farthest in Table II, the fixed-width RPR designs usually perform with
location away from the critical path. Therefore, not only the higher truncation errors than that of the full-width RPR design
WEY et al.: RELIABLE LOW-POWER MULTIPLIER DESIGN 83
TABLE II
C OMPARISONS OF THE A BSOLUTE M EAN E RROR , THE M EAN -S QUARE
E RROR , AND THE VARIANCE OF E RROR U NDER VARIOUS
RPR-BASED 12-B IT ANT M ULTIPLIER D ESIGNS
Fig. 11. Comparisons of delay profiles for the six-bit fixed-width RPR, the
six-bit full-width RPR, and the 12-bit full-length multiplier.
from 6.83 to 9.13 dB. The analysis results show that the
output signal quality of the conventional full-width RPR-based
ANT design is influenced by the truncation error. However,
the output SNR of the our proposed fixed-width RPR-based
ANT design can still be maintained high and is not severely
influenced by the truncation error because a precise error
compensation vector is applied.
To lowering supply voltage beyond critical supply voltage
without sacrificing the throughput and without leading to
Fig. 10. Comparisons of the output signal SNR for various RPR designs severe SNR degradation, the critical computation delay in
with different truncated bits. the RPR block must be as fast as possible. The shorter
computation delay in RPR can bring the benefits of lower
because more computation cells are truncated. However, with overscaling supply voltage adopted. The comparison of delay
appreciate error compensation vector or multiple truncation profiles under various 10 000 input random patterns for
error compensation vectors, the fixed-width RPR designs still the presented six-bit fixed-width RPR design, the previous
have the chance to perform with lower truncation errors. six-bit full-width RPR design, and the standard 12-bit full-
As shown in Table II, the absolute mean error, the mean length multiplier are shown in Fig. 11. Because the compu-
square error, the maximum error, and the variance of error tation word length in the RPR block is less than that in the
in our proposed fixed-width RPR multiplier can be lowered to main block computation unit, the critical computation delay
21.39%, 5.57%, 9.18%, and 9.00%, respectively, in the 12-bit and the average computation delay in the RPR block can all be
by 12-bit ANT multiplier design. All these truncation error shorter. In the presented fixed-width RPR design, the critical
evaluation indexes are the lowest ones as compared with the computation delay and the average computation delay can be
state-art-designs of [2] and [17] because multiple truncation further shortened since the fanout and wires loading of adder
EC vectors combined ICV together with MICV are applied to cells are lowered to be close to half. As shown in Fig. 11, the
lower the truncation errors based on probability and statistics critical computation delay in the proposed RPR block can be
analysis. shortened by 56.3% as compared with the main computation
To confirm the consistency of the compensation accuracy, block. As compared with the previous full-width RPR, the
we compare the output signal quality in terms of SNR for critical computation delay in our proposed RPR block can
various RPR designs with different truncated bits, which is also be shortened by 12.6%.
shown in Fig. 10. In the 12-bit ANT multiplier, the truncated To determine the optimized bit-length of fixed-RPR, com-
RPRs are analyzed from word length of five to ten bits. As parisons of the Kvos and RPR area with different bit length
shown in Fig. 10, the output SNR of our proposed six-bit fixed- of RPR block are shown in Fig. 12. Our goal is to select
width RPR-based ANT multiplier design can be maintained the bit length of fixed-RPR, which can achieve the lowest
to 33.15 dB. As compared with the directly truncated six-bit Kvos. The smaller RPR can perform with higher speed, lower
fixed-width RPR, the signal quality can be improved with power consumption, and lower area. However, RPR with lower
13.89-dB enhancement. At the same time, the SNR in the bit length would also lead to SNR degradation. Therefore, a
full-width RPR-based ANT design is 24.28 dB. With various tradeoff between hardware area and RPR bit length is needed.
truncation bits, the output SNR differences between our pro- In this paper, the optimized bit length of fixed-RPR is selected
posed fixed-width RPR-based ANT multiplier design and the as six based on the lowest Kvos analysis results shown in
conventional full-width RPR-based ANT design are ranged Fig. 12.
84 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015
Fig. 14. Comparisons of the output SNR of various RPR designs under
Fig. 12. Comparisons of the Kvos and RPR area with different bit length different Kvos.
of RPR block.
Fig. 15. Comparisons of the total logic synthesis cell area of various RPR
Fig. 13. Comparisons of the error probability of various RPR blocks under designs with different bit length.
different VOS scale.
Fig. 16. Chip layout of the proposed 12-bit fixed-width RPR-based ANT
multiplier design.
TABLE III
P ERFORMANCE S PECIFICATION S UMMARY OF THE P ROPOSED 12-B IT
F IXED -W IDTH RPR-BASED ANT M ULTIPLIER D ESIGN
Chien-Chang Peng was born in Taichung, Taiwan, Feng-Yu Liao was born in Taichung, Taiwan, in 1986. He received the
in 1984. He received the B.S. degree in electronics M.S. degree in electrical engineering from Chang Gang University, Taoyuan,
engineering from Chung Yuan Christian University, Taiwan, in 2010.
Zhongli, Taiwan, and the M.S. degree in electrical His current research interests include VLSI CMOS circuit design and
engineering from Chang Gang University, Taoyuan, ultralow-power circuits design.
Taiwan, in 2008 and 2010, respectively, where he
is currently pursuing the Ph.D. degree from the
Department of Electrical Engineering.
His current research interests include VLSI CMOS
circuits design, ultralow-power CMOS circuits
design, soft-error-tolerant CMOS circuits design,
and noise detection circuits design.