Professional Documents
Culture Documents
Probabilistic Error Modeling For Approximate Adders
Probabilistic Error Modeling For Approximate Adders
Probabilistic Error Modeling For Approximate Adders
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2605382, IEEE
IEEE TRANSACTIONS ON COMPUTERS Transactions on Computers 1
Abstract—Approximate adders are widely being advocated as a means to achieve performance gain in error resilient applications. In this
paper, a generic methodology for analytical modeling of probability of occurrence of error and the Probability Mass Function (PMF) of error
value in a selected class of approximate adders is presented, which can serve as performance metrics for the comparative analysis of
various adders and their configurations. The proposed model is applicable to approximate adders that comprise of sub-adder units of uniform
as well as non-uniform lengths. Using a systematic methodology, we derive closed form expressions for the probability of error for a number
of state-of-the-art high-performance approximate adders. The probabilistic analysis is carried out for arbitrary input distributions. It can be
used to study the dependence of error statistics in an adder’s output on its configuration and input distribution. Moreover, it is shown that by
building upon the proposed error model, we can estimate the probability of error in circuits with multiple approximate adders. We also
demonstrate that, using the proposed analysis, the comparative performance of different approximate adders can be correctly predicted in
practical applications of image processing.
Index Terms—Approximate computing, Adders, Probability of error, Probability Mass Function (PMF), Image smoothing, Modeling,
Arithmetic, Analysis
1 I NTRODUCTION
Fig. 2. Generic functional model for approximate adders. N -bit inputs are added using L sub-adders. By setting the values of S1 , S2 , ..., SL and R2 , ..RL ,
the adder can be configured as ACA, ACA-II, GeAr, ETA-II, ETA-IIM, GDA etc.
LSB MSB
Input A 1 1 1 1 0 1 1 0 1 1 G0 P
Input B 1 1 0 0 1 0 0 1 0 1
S1 R2 S2 Sum of indicated group of input bits are
Sub-Adder 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 Sub-Adder 2 G0 generating carry-out. In this example,
Inputs 1 1 0 0 1 0 0 1 0 0 1 0 0 1 0 1 Inputs A[1:0]+B[1:0] is generating carry-out.
Sub-Adder 1 Sum of indicated group of input bits are
Output 0 1 0 0 0 0 0 0 1 1 1 1 1 1 1 0 1 Sub-Adder Output
2
P propagating carry-in. In this example,
Approx. Adder Output 0 1 0 0 0 0 0 0 1 0 1 A[7:2]+B[7:2] is propagating carry-in.
If A[1:0]+B[1:0] are generating carry-out AND A[7:2]+B[7:2] are propagating carry-in, Sub-Adder 2 Output is erroneous
Fig. 3. An example of addition using an approximate adder with two sub-adders. Error in the approximate adder is due to error in Sub-Adder 2. This occurs
when all the prediction bit pairs of Sub-Adder 2 are propagating carry-in (the pairs of bits at positions indicated by R2 are unequal) and the less significant
bit pairs, that are not input to Sub-Adder 2, are generating carry-out. Therefore, the event E2 (error in Sub-Adder 2) is represented as simultaneous
carry-out generation (G0) and carry-in propagation (P ) events at the relevant bit locations.
case of ACA, which is not necessary as we show in this paper can lead to error in final output. This introduces error at the
that only the location of carry chain truncation is required to respective bit positions of Sappr and thus the corresponding
predict the error value. In case of error value analysis of ESA probability of error can be defined as the expected relative
and ETA-II, only the error in the most significant sub-adder frequency of incorrect adder outputs:
unit is considered at a time, while ignoring the less significant
ones, which may not be always negligible. Pr[Error] = Pr[Sappr 6= A + B] (1)
It is important to note that the mathematical analysis is where A and B are N -bit adder inputs, having probability
not only a tool for comparison of different adders, but it also distributions pA (.) and pB (.). An example is shown in Fig.
helps understand the adders’ behavior, which in turn helps 3. The output of Sub-Adder 2 is erroneous because there is
in designing more enhanced adder structures. In our work in a carry-out generated by the sum A[1 : 0] + B[1 : 0], which
[20], we demonstrated that the knowledge of the mathematical would have been propagated to A[9 : 8] + B[9 : 8] (as the sum
model of approximation error leads to more efficient design of A[7 : 2] + B[7 : 2] is propagating carry-in i.e., A[i] ⊕ B[i] = 1
accuracy configurable adders. for i = 2, .., 7), if the carry-chain had not been broken. In Fig.
3, ”G0” represents the event that input combination is such
that A[1 : 0] + B[1 : 0] generates a carry-out. Similarly,”P ”
3 A DDER M ODEL
represents the event that A[7 : 2] + B[7 : 2] propagates carry-
Most of the approximate adders, presented in the literature, in. Error in Sub-Adder 2 occurs when both of these events
are designed by exploiting the fact that the occurrence of occur simultaneously. The representation of events in Fig. 3
long carry propagations is very rare. This implies that errors will be used throughout this article. The rectangle labeled as
induced by breaking the carry chain are very infrequent. In G0 represents that there is a carry-out generated from the sum
many designs, carry chain is truncated by dividing the input of inputs at the indicated bit locations and P represents that the
bits among multiple disjoint or overlapping sub-adders, so that sums of bits at the indicated bit locations are all propagating
the longest carry propagation chain is now equal to the size of carry-in.
these sub-adders. Since the occurrence of error is due to the broken carry
Fig. 2 shows a general model of an N -bit approximate chain, a deficiency is created in the sum by an amount equal to
adder consisting of L precise sub-adders. The ith sub-adder the weight of that bit position due to the missed addition of
takes Si + Ri bits as input. The number of bits used to predict carry bit. Thus, we found that in case of an error, the adder
carry-in is Ri . The number of sum bits is Si . The output of first output value is always going to be less than the precise sum.
sub-adder is always precise. Hence, S1 is equal to the length of In case of simultaneous error in more than one sub-adder, the
first sub-adder and R1 = 0. error value is the sum of all the carries that are discarded
Our adder model can be configured to many approximate because of the broken carry chain. Due to this type of behavior,
adder designs, such as ACA-II [3], GeAr [15], GDA [5] and the error can have certain specific values. As a result, the
ETA-II [4], by specifying the values of L, R2 , R3 , ..., RL and metrics such as ACCinf , ACCamp , MSE, MED and NED may not
S1 , S2 , ..., SL . R and S can remain constant in all sub-adders, be adequately informative since they average different forms of
like ACA-II [3], GeAr [15] and ETA-II [4] or they can change magnitude of error (absolute error or hamming distance) over
from sub-adder to sub-adder as in the modified ETA-II in [4] all inputs, that mostly yield the correct output. For the example
and GDA [5]. given in Fig. 3, the value of error is 256. However, since the
Error in the ith sub-adder’s output occurs when its Ri output will be correct for most of the inputs, averaging the
prediction bits fail to correctly predict the carry-in for the error over all inputs will yield a small error value. Therefore,
higher Si sum bits. The error in any of the L sub-adders for this type of adders, it is more pertinent to use probability of
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2605382, IEEE
IEEE TRANSACTIONS ON COMPUTERS Transactions on Computers 4
Event: Error in Sappr Error in Sappr can be due to error in any one of the sub-adders, i.e.,
E2: E 3: E4: Pr[Error]=Pr[E2 V E3 V E4]
Error in Sub-Adder 2 Error in Sub-Adder 3 Error in Sub-Adder 4
Ei Pi AND G0i
S1
G02 P2 Error in the ith sub-adder implies that the Ri prediction bit pairs of
R2 S2 this sub-adder are propagating carry-in (event Pi) AND the sum of
G03 P3 previous less significant input bit pairs, that are not included in the
R3 S3 ith sub-adder input, are generating carry-out (event G0i).
G04 P4
R4 S4 Joint probability terms are simplified such that there
is only one condition for every bit location.
LSB MSB
Event E
Pr[Error] requires evaluation of
joint probabilities. Dependent Events
G02 P2
G04 P4
Equivalent Independent Events
G0 P G1 P
error and PMF of error value as performance metrics. Once the a Ri + Si -bit precise adder. The output of Sub-adder 1 is
PMF of error is found, other metrics like M ED, M SE , N ED always correct as there is no loss of accuracy due to carry-chain
etc. can also be calculated, if required. truncation. However, the outputs of Sub-adder 2 to Sub-adder
L can be erroneous since addition with the carry generated by
4 M ETHODOLOGY FOR A NALYTICAL M ODELING the previous less significant bits has been eliminated. Since the
In this section, we develop a generic methodology for error output sum is obtained by gathering the outputs of all the sub-
probability analysis of the class of adders represented by the adders, as shown in Fig. 2, error in any sub-adder’s output can
model in Fig. 2. The proposed methodology, depicted in Fig. 4, contribute to error in the final output.
primarily has the following components: Error in the ith sub-adder (for i ≥ 2) occurs when the Ri
1) The first step is to identify the errors in the intermediary prediction bits of the sub-adder’s input are all propagating
logical elements of the approximate adder that contribute carry-in and the previous less significant bits of adder’s input,
to error in the output and then mathematically relate the that are not input to this sub-adder, are generating carry-out.
occurrence of such errors with Pr[Error], defined in Eq. The error occurs because the generated carry-out is not prop-
(1). In this step, Pr[Error] will be expressed as the sum agated due to the broken carry chain between sub-adders. Let
of probabilities of joint carry-in propagation and carry-out E2 , E3 , ..., EL represent events associated with the occurrence
generation events for specified groups of input bits. This of error in Sub-adders 2, 3, ..., L respectively. The error events
is explained in Sub-section 4.1. are defined such that Ei = 1 if the ith sub-adder output
2) The next step is to find each joint probability term. For this, is erroneous and Ei = 0 otherwise. Since any one of these
we propose a method to transform these joint probabili- events can contribute to error in the final output Sappr , the
ties into products of probabilities of independent carry-in event associated with an error in Sappr is expressed as the
propagation and carry-out generation events. This method union of these events. Furthermore, since two or more of these
is given in Sub-section 4.2. events can occur simultaneously, these events are not mutually
3) The probabilities of carry-in propagation and carry-out exclusive. The probability of the union of such events can be
generation events are derived in generalized form for evaluated using the inclusion-exclusion principle [27],
arbitrary input distributions, which will be used to com- L
pute each product term. The derivations are given in Sub-
X
Pr[Error] = Pr [E2 ∨ E3 ∨ ... ∨ EL ] = Pr [Ei ]
section 4.3. i=2
4) By building upon the analysis, the analysis for the PMF of X X
− Pr [Ei ∧ Ej ] + Pr[Ei ∧ Ej ∧ Ek ]− (2)
error value is presented in Sub-section 4.4.
2≤i<j≤L 2≤i<j<k≤L
5) In Sub-section 4.5, the proposed analysis is used to esti-
L
" #
mate the error statistics in circuits with multiple approxi- L
^
... + (−1) Pr Ei
mate adders.
i=2
4.1 Identifying Events leading to Approximation Error Each term in Eq. (2) is going to be evaluated separately in
The N -bit approximate adder, given in Fig. 2, is constructed the next sub-sections. For larger number of sub-adders, as the
using L sub-adders. The ith sub-adder, for i = 1, 2, ..., L, is analysis becomes increasingly difficult due to large number of
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2605382, IEEE
IEEE TRANSACTIONS ON COMPUTERS Transactions on Computers 5
since every bit contributes to a single event, the probability This relationship is generalized as follows:
can be evaluated as follows: k
2N −1−k
X2 −1 2 X 1 −1
Pr[E2 ∧ E3 ∧ E4 ] = Pr[P ∧ G0 ∧ G1] pX (k) = pA (2k2 +1 i + 2k1 k + j) (9)
(6)
= Pr[P ] Pr[G0] Pr[G1] i=0 j=0
Algorithm 2 computes the probability of intersection events
for 0 ≤ k ≤ 2k2 −k1 +1 − 1. Similarly, pY (.) can be evaluated
employing the above discussed concept. First, all the propaga-
from pB (.). Assuming that the two adder inputs A and B ,
tion bits are collected and eliminated from carry-generation
and hence X and Y , are independent, the PMF of the sum
event sets (steps 2-6). Then the carry-generation events are
Z = X + Y is given by the convolution of pX and pY .
iteratively simplified into G0 and G1 events (steps 7-17). The
probabilities Pr[P ], Pr[G0] and Pr[G1] used in Algorithm 2, pZ (k) = pX (k) ∗ pY (k), for 0 ≤ k ≤ 2k2 −k1 +2 − 2 (10)
for specific groups of input bits, are derived in Sub-section 4.3.
Algorithm 2 also computes the error value V , which will be If A and B are uniformly distributed between 0 and 2N −1,
used to find the PMF in Sub-section 4.4 (steps 14, 18-23). then X and Y are uniformly distributed between 0 and 2n − 1,
i.e.,
Algorithm 2 Joint Probability Pr[∧i∈I Ei ]
1
, for 0 ≤ k ≤ 2n − 1
1: Input the set of sub-adder indices I in intersection operation, pX (k; n) = pY (k; n) = 2n (11)
S1 , S2 , ..., SL and R2 , R3 , ..., RL . 0, otherwise
2: For
Pi−1 ith sub-adder (2 ≤P i ≤ L), find the set of prediction bits: ri = {x :
i−1
k=1 Sk − Ri ≤ x ≤ k=1 Sk − 1} S The PMF of sum is given as follows:
3: Find union of all the sets of prediction bit positions RU = i∈I ri .
4: Set Print = Pr[P ; RU ]. pZ (k; n) = pX (k; n) ∗ pY (k; n) (12a)
5: Find the sets for carry-out generating bits gi = {0, 1, 2, ..., i−1
P
k=1 Sk }− k + 1
ri , for all i ∈ I . , 0 ≤ k ≤ 2n − 1
6: Update every gi = gi −T 22n
RU .
n+1
7: Find intersection GI = i∈I gi . pZ (k; n) = 2 −k−1 (12b)
8: Initialize the set of generation events SG = {GI }. , 2n − 1 < k ≤ 2n+1 − 2
22n
9: Update Print = Print × Pr[G0; GI ].
10: Find difference set GD = {gi − GI : i ∈ I ∧ gi − GI 6= Φ}. 0, otherwise
11: for every elementT of GD do
12: Update GI = gd ∈GD gd . Note that in case of uniformly distributed inputs, the dis-
13: Update Print = Print × Pr[G1; GI ]. tributions pX and pY only depend on the number of bits
14: Add GI to the set SG . n = k2 − k1 + 1, while in case of non-uniformly distributed
15: Update GD = {gd − GI : gd ∈ GD ∧ gd − GI 6= Φ} inputs, bit locations (k1 , k2 ) are also significant.
16: end for
17: Output Print .
P i 4.3.1 Probability of Event P
18: For error value, find si = Sk .
k=1 Carry-in is propagated by an n-bit addition when all n pairs of
19: Initialize error value V = 0.
20: for every element g of SG do
the input bits are propagating. This happens when the values
21: Update V = V + 2s , where s is found as follows: for di = si − of bits in every pair of addition arguments are unequal. When
max(g), s = si for which di has the smallest positive value. every pair of bits satisfy this condition, the carry-out will be
22: end for equal to carry-in and the sum will be equal to 2n − 1. Hence,
23: Output error value V .
the probability of carry-in propagation by adding X and Y is
given as follows:
4.3 Probabilities of Carry-in Propagation and Carry-out Pr[P ; k1 , k2 ] = pZ (2n − 1) (13)
Generation Events
We found in Sub-section 4.2 that all the terms in Eq. (2) can be where Z = X + Y and pZ (k) = pX (k) ∗ pY (k). Pr[P ; k1 , k2 ]
represented as product of probabilities of carry-in propagation denotes the probability of propagation event for given k1 , k2 ,
and carry-out generation by addition of specific groups of bits indicating that pX and pY depend upon the bit locations k1 , k2 .
of N -bit inputs A and B . In this section, we first derive the In case of uniformly distributed inputs, probability of carry-
PMFs of groups of bits of inputs A and B for given pA and pB , in propagation is evaluated by using k = 2n − 1 in Eq. (12).
which are then used to derive the probabilities of events P , G0 In this case, since all the bits are identically distributed, it only
and G1. Analysis for the special case of uniformly distributed depends upon the number of bits n. Hence,
inputs is also presented to demonstrate the application of the 1
proposed method. Pr[P ; n] = (14)
2n
Let X = A[k2 : k1 ] and Y = B[k2 : k1 ], for 0 ≤ k1 ≤
k2 ≤ N − 1, be the n-bit (n = k2 − k1 + 1) groups of bits of 4.3.2 Probabilities of Events G0 and G1
A and B , respectively. In order to find the probability of carry- G0 denotes the event for carry-out generation provided that
in propagation and carry-out generation by the addition of X carry-in for n-bit addition is zero, i.e.,
and Y , we first need the probability distributions of X and Y .
If pA (k) is the PMF of A, then pX (k) can be found by summing Pr[G0; k1 , k2 ] = Pr[Carry-out = 1|Carry-in = 0] (15)
pA (k) over all the values of A for which X = k . For example, Since carry-out is one when the sum Z cannot be accommo-
if A = [a6 , a5 , a4 , a3 , a2 , a1 , a0 ] and X = [a3 , a2 ], then pX (k) is dated in n bits, Pr[G0] is given as follows:
given as follows:
6 2n+1
X−2
a6 +25 a5 +24 a4
X
pX (k) = pA 2 +2 2
k+2a +a
(7) Pr[G0; k1 , k2 ] = Pr[Z > 2n − 1] = pZ (k) (16)
1 0
a6 ,a5 ,a4 , k=2n
a1 ,a0 ∈{0,1}
for k = 0, 1, 2, 3. The above equation can also be written as a For uniformly distributed inputs, Pr[G0; n] is found by using
double summation: pZ (.) from Eq. (12) in Eq. (16):
3 2
2X −1 2X −1 2n+1
X−2
4 2 2n+1 − k − 1 2n − 1
pX (k) = pA (2 i + 2 k + j) (8) Pr[G0; n] = = n+1 (17)
2 2n 2
i=0 j=0 k=2n
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2605382, IEEE
IEEE TRANSACTIONS ON COMPUTERS Transactions on Computers 7
still remain approximately uniformly distributed. This is be- Pr[Error; k] = Pr[E2 ] + Pr[E3 ] − Pr[E2 ∧ E3 ] (28)
cause sum from every single full adder can be 1 or 0 with equal
probability. They are not exactly uniformly distributed because Now the probability of error in the ith sub-adder of ACA-II
of the occurrence of error. However, since the occurrence of can be obtained by using Eq. (4) as follows:
error is infrequent in most adder configurations, we found Pr[Ei ] = Pr[Pi ; (i − 1)k, ik − 1] Pr[G0i ; 0, (i − 1)k − 1] (29)
that the distribution of output bits is going to be very close to
the uniform distribution. Thus the probability of error models where Pr[P ; k1 , k2 ] and Pr[G0; k1 , k2 ] are found using Eqs. (13)
remain approximately the same for every approximate adder and (16), respectively. Similarly, using Eq. (5), we get
and they can be applied by considering each adder as an
Pr[E2 ∧ E3 ] = Pr[P2 ∧ G02 ∧ P3 ∧ G03 ] (30)
independent component.
Consider a circuit with M identically constructed, N -bit In order to handle the joint probability term in Eq. (28),
approximate adders, connected together to perform a more we transform the dependent events into independent events
complex arithmetic operation involving the addition of M + 1 according to the proposed methodology, as shown in Fig. 8.
N -bit numbers. We further assume that all the M +1 inputs are Now the probability of the joint event is given as follows:
independent. If the probability of error in each adder is ρ, then
the probability of error in the final addition is approximated Pr[E2 ∧ E3 ] = Pr[P ; k, 3k − 1] Pr[G0; 0, k − 1] (31)
using the inclusion-exclusion principle as follows:
For uniformly distributed inputs, Pr[P ] and Pr[G0] only de-
M pend on the number of bits. Therefore, the probability of error
!
X M
Pr[Error] ≈ (−1)m+1 ρm (26) comes out to be as follows:
m=1
m
1 2k − 1 1 22k − 1 1 2k − 1
Since all the M + 1 inputs are independent, the PMF of Pr[Error; k] = + − (32)
2k 2k+1 2k 22k+1 22k 2k+1
cumulative error is estimated by the convolution of the PMFs
of error in individual adders [27]. Similarly, an equivalent expression for Pr[Error] can be de-
rived for the ACA-II adder with four sub-adders,
pVM ≈ pV ∗ pV ∗ ... ∗ pV (27)
| {z } Pr[Error] = Pr[E2 ] + Pr[E3 ] + Pr[E4 ]
M convolutions
− Pr[E2 ∧ E3 ] − Pr[E3 ∧ E4 ] (33)
Two examples of circuits with multiple approximate adders
− Pr[E2 ∧ E4 ] + Pr[E2 ∧ E3 ∧ E4 ]
are shown in Fig. 7. The exact configuration in which the
adders are connected may differ from application to appli- The transformation of dependent events into equivalent in-
cation. The configuration in Fig. 7(a) can be used in filter- dependent events for the intersection terms (E2 ∧ E4 ) and
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2605382, IEEE
IEEE TRANSACTIONS ON COMPUTERS Transactions on Computers 9
Pr[Error]
k k 0.3 ACA-II, L = 4
Proposed, L = 5
Inputs to Sub-adder 3 ACA-II, L = 5
0.2
Inputs to Sub-adder 1 Exhaustive
Increased simulation
LSB MSB
0.1 error in
ACA-II model
G0 P
with increasing L
G0 P 0
5 10 15 20 25 30 35 40
G0 P Number of Input Bits N
Pr[G0] Pr[P] Fig. 10. Comparison of proposed model with exhaustive simulations and
X approximate model for the ACA-II adder in [3].
Pr [E2 ᴧ E3]
five. Also with increasing number of input bits N , we found
Fig. 8. Derivation of Pr[E2 ∧ E3 ] for ACA-II with three sub-adders. that this difference decreases and the exact probability of error
approaches the approximation given in [3]. This is because of
Error in Sappr the fact that with increasing N , if the number of sub-adders
E2: Error in Sub- E3: Error in Sub- E4: Error in Sub- is kept constant, then the number of prediction bits increases
Adder 2 Adder 3 Adder 4 in each sub-adder that in turn increases the probability of
correct carry prediction and the outputs of different sub-adders
k k
become less inter-dependent.
k
k
Inputs to Sub-adder 4
Inputs to Sub-adder 3
Inputs to Sub-adder 2 5.2 Probability of Error in Generic Accuracy Configurable
Inputs to Sub-adder 1 (GeAr) Adder
LSB MSB
The generic adder, given in Fig. 3, can be configured to rep-
G0 P G0 P resent the GeAr adder [15] by considering R2 = R3 = ... =
G0 P G0 P RL = R, S2 = S3 = ... = SL = S and S1 = S + R. In GeAr
G0 P
model, R and S may or may not be equal, but they are same for
G0 P G1 P G0 P all sub-adders. For three sub-adders, the probability of error
can be represented using the inclusion-exclusion principle,
Pr[G0] Pr[P] Pr[G1] Pr[P] Pr[G0] Pr[P] given in Eq. (2), as follows:
(E2 ∧ E3 ∧ E4 ) is illustrated in Fig. 9. For uniformly distributed The event represented by the intersection term in Eq. (35) was
inputs, Pr[Error] comes out to be as follows: found to be different for GeAr configurations with R ≥ S and
R < S . After the transformation of events, according to the
1 2k − 1 1 22k − 1 1 23k − 1
Pr[Error; k] = + + proposed methodology, the general expressions for the GeAr
2k 2k+1 2k 22k+1 2k 23k+1 ! adder with three sub-adders are given as follows: For R ≥ S ,
k 2k
1 2 −1 1 2 −1 1 2 − 1 2k − 1
k
1
− 2k k+1 − 2k 2k+1 − 2k k+1 + k
2 2 2 2 2 2 2k+1 2 Pr[Error; S, R] = Pr[P ; S, S + R − 1] Pr[G0; 0, S − 1]
1 2 −1 k + Pr[P ; 2S, 2S + R − 1] Pr[G0; 0, 2S − 1] (37)
+
23k 2k+1 − Pr[P ; S, 2S + R − 1] Pr[G0; 0, S − 1]
(34)
For uniformly distributed inputs, Pr[Error] is a function of S
Fig. 10 shows a comparison of the proposed analysis with and R:
simulation results. It can be seen that the proposed analysis is
in perfect agreement with the results obtained via exhaustive 1 2S − 1 1 22S − 1
Pr[Error; S, R] = +
simulations. 2R 2S+1 2R 22S+1 (38)
1 2S − 1
5.1.1 Comparison with Approximate ACA-II Model − S+R S+1
2 2
Fig. 10 also shows comparison with the probability of error
model presented in [3]. This model assumes that the sub-adder Similarly for R < S ,
outputs are independent of one another. The assumption sim-
Pr[Error; S, R] = Pr[P ; S, S + R − 1] Pr[G0; 0, S − 1]
plifies the analysis but the model can compute only approxi-
mate probability of error. We found that the difference between + Pr[P ; 2S, 2S + R − 1] Pr[G0; 0, 2S − 1]
(39)
exact probability of error and that predicted by the model in − Pr[P ; S, S + R − 1] Pr[G0; 0, S − 1]
[3] increases as number of sub-adders increases from three to × Pr[P ; 2S, 2S + R − 1] Pr[G1; S + R, 2S − 1]
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2605382, IEEE
IEEE TRANSACTIONS ON COMPUTERS Transactions on Computers 10
0.6 TABLE 1
Exhaustive simulation, S = 2 Comparison of probability of error computed using the proposed analysis
Proposed analysis, S = 2 with Monte-Carlo simulation results.
Pr[Error]
0.4
Exhaustive simulations, S = 1
Proposed analysis, S = 1 𝐏𝐫[𝐄𝐫𝐫𝐨𝐫]
0.2
L = 3, N = 3S + R Adder Input Distribution Monte-
Analysis Carlo
0 ACA-II Uniform 0.0293 0.0293
0 2 4 6 8 10 12
Number of Prediction Bits R N=12, k=4, Gaussian 𝜇 = 2040, 𝜎 = 295.6 0.0293 0.0290
L=2 Geometric 𝑝 = 0.01 0.0164 0.0168
Fig. 11. Comparison of proposed model with exhaustive simulations for the ACA-II Uniform 0.0586 0.0586
GeAr adder in [15]. N=16, k=4, Gaussian 𝜇 = 32760, 𝜎 = 4729 0.0586 0.0587
L=3 Geometric 𝑝 = 0.0005 0.04910 0.0498
ACA-II Uniform 0.0870 0.0870
For uniformly distributed inputs, Pr[Error] is given as fol- N=20, k=4, Gaussian 𝜇 = 5 × 105 , 𝜎 = 7 × 104 0.0870 0.0853
lows: L=4 Geometric 𝑝 = 0.00005 0.0699 0.0703
1 2S − 1 1 22S − 1 GeAr N=16, Uniform 0.0068 0.0067
Pr[Error; S, R] = R S+1
+ R 2S+1 R=7, S=3, L=3 Geometric 𝑝 = 0.0005 0.0043 0.0043
2 2 2 2 (40) GeAr N=16, Uniform 0.4352 0.4354
1 2S − 1 2S−R − 1
1
− 2R S+1 + S−R R=1, S=5, L=3 Geometric 𝑝 = 0.0005 0.3918 0.3857
2 2 2S−R+1 2 GeAr N=16, Uniform 0.0814 0.0813
R=4, S=3, L=4 Geometric 𝑝 = 0.0005 0.0607 0.0603
Fig. 11 shows a comparison of analysis and simulation
GeAr N=16, Uniform 0.8501 0.8501
results for uniformly distributed inputs, which validates the
R=0, S=4, L=4 Geometric 𝑝 = 0.0005 0.7584 0.7591
proposed analysis. Table 1 shows results for various input
ETA-II-M, Uniform 0.0293 0.0293
distributions for ACA-II and GeAr adders. It can be seen that N=16, L=3,
the analysis and simulation results are in good agreement. Geometric 𝑝 = 0.0005 0.0292 0.0291
S1=8, S2=S3=4,
PMFs for GeAr adders are evaluated using the method de- Gaussian 𝜇 = 32760, 𝜎 = 4729 0.0292 0.0290
R2=4, R3=8
scribed in Sub-section 4.4. For L = 3, PMFs are given by Eqs.
(24) and (23). Results for inputs with uniform and geometric
distributions are shown in Fig. 12, which show that the analysis are identical
and simulation results are in good agreement. Adder and all inputs are independent
Evaluated ER and
MEDuniformly
MSE
distributed. It can be seen that the simulation results𝟑 are in 𝟔
from (× 𝟏𝟎 ) (× 𝟏𝟎 )
good agreement with the proposed analysis.
Gaussian Smoothing using 3x3 kernel. 𝑴 = 𝟖, 𝑵 =adders
Using Eq. (27), PMF of error in case of cascaded 𝟏𝟔, 𝑳 = 𝟑
5.3 Probability of Error in Modified Error Tolerant Adder-II
(ETAIIM) GeAr Image 0.9432 3.861
was approximated. Mean Error Distance (MED), which is one 0.0546
𝑅 = 1, 𝑆 = 5
of the commonly used Analysis
performance 0.988
metrics in4.0944 22.92
approximate
In some approximate adders, including ETAIIM and some
configurations of GDA, the number of prediction and sum bits GeAr is found using
computing, pV as follows:
Image 0.3668 1.247
differ from sub-adder to sub-adder. The lengths of individual 𝑅 = 4, 𝑆 = 4 Analysis ∞ 0.383 1.018 4.84
X
sub-adders may also vary. These cases are not covered by the GeAr M ED =
Image |v|p (v)
0.0391
V 0.197 (42)
GeAr model. The approximate adder model in Section 3 is a 𝑅 = 7, 𝑆 = 3 v=−∞
Analysis 0.0534 0.245 1.85
more general model, so the proposed methodology can also be The GeAr
computed MED is compared Image with
0.00476simulation results in
applied to compute the Pr[Error] in ETAIIM and GDA. Fig. 𝑅 = 10, 𝑆 = 2
13(b). Analysis 0.00584 0.0698 1.015
In this case study, we consider a structure similar to the
ETAIIM [4] with three sub-adders, i.e., L = 3. In this design, Gaussian Smoothing using 3x3 kernel. 𝑴 = 𝟖, 𝑵 = 𝟖
the number of prediction bits in the 3rd sub-adder is larger, 5.5 GeAr 𝐿=4
Observations and Image
Discussions0.9667
so the probability of error in more significant bits is smaller. 𝑅 = 0, 𝑆 = 2
5.5.1 Accuracy of Analysis Analysis 0.999 0.254 0.0727
Considering the case with S2 = S3 = S and R3 = 2R2 , GeAr 𝐿 = 3 Image 0.69
The expressions for probability of error derived for uniformly
the Pr[Error] for uniformly distributed inputs is evaluated 𝑅 = 2, 𝑆inputs
distributed = 2 usingAnalysis
the proposed0.81 methodology0.059are accu-0.006
as follows: in the𝐿 sense
rate GeAr = 4 that they Image 0.289
precisely tell the fraction of input
1 2S − 1 1 2N −S−R3 − 1 𝑅 = 4, 𝑆 =for
combinations 1 whichAnalysis
the adder output
0.3189will be inaccurate.
0.028 0.0034
Pr[Error; S, R] = + ThisGeAr
is confirmed by Figs. 10 and 11, in which the proposed
2R2 2S+1 2R3 2N −S−R3 +1 (41) 𝐿=3 Image 0.051
S
1 2 −1 analysis
𝑅 = 5,is 𝑆compared
=1 with the results of exhaustive simula-
− R3 S+1 Analysis 0.118 0.0117 0.0014
tions. The exhaustive simulations are performed by evaluating
2 2 Gaussian Smoothing using 5x5 kernel. 𝑴 = 𝟐𝟒, 𝑵 = 𝟏𝟔
all 22N input combinations. In case of non-uniformly dis-
Table 1 shows the results for ETA-IIM with various input GeAr 𝐿 = 3 Image 0.7804
tributed inputs, we introduced some approximations to keep
distributions. 𝑅 = 4, 𝑆 = 4 Analysis in Sub-section
0.7652 2.96Compar- 19.8
the analysis simple, as explained 4.3.
isonsGeAr 𝐿 = 13 show thatImage
in Table 0.1801
the analytical and simulation results
5.4 Probability of Error in Circuits with Multiple match𝑅= 7, 𝑆for
well = both
3 types Analysis 0.1518
of input distributions. 0.815 6.73
Approximate Adders GeAr 𝐿 = 4 Image 0.8381
Using the approximation in Eq. (26), the probability of error in 5.5.2𝑅 =Complexity
4, 𝑆 = 3 of Analysis
Analysis 0.8696 6.0724 79.52
case of multiple additions is evaluated for several approximate GeAr 𝐿 = that
We observed 4 in caseImage 0.0899of sub-adders, the
of large number
adders. In all the simulations, the approximate adders are 𝑅 = 8, of
derivation 𝑆= 2 probability
exact Analysis of error
0.1003 using the
0.845 proposed 12.3
connected together in the form of a cascade, similar to the methodology
GeAr 𝐿 = 5becomes significantly
Image complex. This is due to
0.4042
one shown in Fig. 7(b). The results of analysis are compared increase
𝑅 = in6, the
𝑆 =number
2 ofAnalysis
terms in Eq.0.4340
(2). The number
3.042of terms 48.5
with those of Monte-Carlo simulations in Fig. 13(a). For estima- L−1
P L−1
in Eq. (2) in case of L sub-adders is . Therefore, for
tion of probability using the Monte-Carlo simulations, 100000 i
i=1
randomly generated input combinations were used. In every large L, Algorithms 1 and 2 can be used to automate/program
considered case for the comparison, all the approximate adders the analysis, if required.
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2605382, IEEE
IEEE TRANSACTIONS ON COMPUTERS Transactions on Computers 11
0.2
S = 5, R = 1, S = 3, R = 7 0.03 S = 3, R = 4 S = 4, R = 0 0.15 Monte-Carlo
0.2 0.004
L=3 L = 4, N = 16 Analysis
pV (v)
pV (v)
pV (v)
pV (v)
L=3 L = 4, N = 16
pV (v)
pV (v)
pV (v)
L=3
pV (v)
pV (v)
0.02 0.1
0.1 N = 16 0.002 S = 4, R = 0
0.01 0.1
L = 5, N = 20
0 0 0 0 0
0 1000 2000 0 4000 8000 0 4000 8000 0 2000 4000 0 2 4 6 8
(f) (g) (h) (i) (j)
Error v Error v Error v Error v Error v x 10
4
Fig. 12. Comparison of analysis and simulation results for PMFs of various configurations of GeAr adders. (a)-(e) Results with uniformly distributed inputs.
(f)-(j) Results with geometrically distributed inputs.
0.8 GeAr(10,3,1)
Pr[Error]
2
0.6 10 Coinciding curves for GeAr(10,3,1)
MED
and GeAr(11,2,3)
0.4 GeAr(11,2,3)
Fig. 13. Comparison of analysis and simulation results for of circuits with M adders using GeAr(N,S,R) adders.
5.5.3 Effects of Input Distribution a certain constraint on accuracy. If all the sub-adders are
Table 1 shows the comparison of results with various input implemented as Ripple Carry Adders (RCAs), number of full
distributions. It can be seen that the results with Gaussian adders increase by (L − 1)R as compared to a precise adder.
distribution are very close to those with uniform distribution. The GeAr configurations were synthesized on a XILINX Virtex
Similar behavior was observed for PMFs as well. For most 6 XC6VLX75T FPGA and the area, in terms on FPGA look-
input distributions, we found that the Pr[Error] and PMF are up tables (LUTs), was found. The results are plotted in 14(b),
not significantly affected, except in case of steeply decreasing which shows that the required number of LUTs increases
geometric distributions. The results are shown in Table 1 and approximately linearly with number of prediction bits, for a
Fig 12. This is due to the fact that the probabilistic behavior of fixed N and L. The Pr[Error] decreases with increasing R
these adders depends upon the lower significance bits, which for fixed values of N and L. However, the curves for various
still remain approximately uniformly distributed. The input values of N are found to overlap which means that there is no
distribution largely affects the more significant bits, that are significant effect on Pr[Error] with changing N . This happens
not responsible for error in these approximate adders. The because an increase in N with constant L and R means that S
geometric distributions used in Table 1 and Fig 12 are so has been increased. It can be observed from Eqs. (17) and (21)
rapidly decaying that it significantly affects the distribution that probability of carry-out generation rapidly approaches 0.5
of propagation bits of last sub-adder. as the number of bits n increases. From Eqs. (38) and (38), we
also found that each of the Pr[P ] terms depends only upon
5.5.4 PMF of Approximation Error R and every Pr[G0] and Pr[G1] term approaches 0.5 with
The PMFs in Fig. 12 show that the approximation error in this increasing S . Therefore, for constant R and L, change in N
class of adders can have certain specific values, which depend or S has very little effect on Pr[Error].
on the locations of carry chain truncation. As explained in In Fig. 15, we have plotted Pr[Error] against the carry-
Sections 3 and 4, the error in the output is equal to the sum chain length R + S that yields insights into the trade-off between
of the weights of all the carry bits that are discarded due to speed and accuracy of the approximate adder. The delay of a
the carry chain truncation and are not correctly predicted by sub-adder is equal to the delay of a R + S -bit precise adder.
the propagating bits in the respective sub-adders. As a result, The delay only depends upon R + S and is not affected by
the PMFs of approximation error are found to consist of widely N or L. It can be observed that Pr[Error] decreases with
separated impulses. The location of these impulses indicate the increasing carry-chain length and to achieve same Pr[Error],
source of error, i.e., the sub-adder that yields the sum bits with required carry-chain length R + S increases with increasing N .
the same weight. This is because Pr[Error] depends pre-dominantly upon R, as
observed in Fig. 14(a). For constant R and L, if N increases,
5.5.5 Predicting Trade-offs among Accuracy, Speed and Area- then S , and hence R + S would increase. For constant N and
on-chip L, increase in carry-chain length is a consequence of increased
Figs. 14 and 15 show various trends of probability of error R that reduces the Pr[Error]. Moreover, by increasing R for
that can be observed using the derived models. In Fig. 14(a), same number of sub-adders, overlap between the inputs of
Pr[Error] has been plotted against number of prediction bits successive sub-adders increases, which in turn requires larger
R for various values of N . This can provide useful informa- area-on-chip.
tion regarding the additional area-on-chip required to achieve It is important to note that we have predicted the trends
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2605382, IEEE
IEEE TRANSACTIONS ON COMPUTERS Transactions on Computers 12
1 TABLE 2
Overlapping curves indicate that N = 10
Pr[Error]
Comparison of probability of error computed using the proposed analysis
Pr[Error] depends more upon R than N N = 12 with image processing results.
0.5 N = 16
N = 20
N = 32 Adder Evaluated from ER MED (× 𝟏𝟎𝟑)
0 Gaussian Smoothing using 3x3 kernel. 𝑴 = 𝟖, 𝑵 = 𝟏𝟔,
0 1 2 3 4 5 6 7 8 9 10
(a) Number of Prediction Bits R GeAr 𝐿 = 3 Image 0.9432 3.861
𝑅 = 1, 𝑆 = 5 Analysis 0.9880 4.0944
L = 3, N = 3S + R
Area (LUTs)
50
GeAr 𝐿 = 3 Image 0.3668 1.247
40
𝑅 = 4, 𝑆 = 4 Analysis 0.3831 1.018
30
GeAr 𝐿 = 3 Image 0.0391 0.197
20 𝑅 = 7, 𝑆 = 3 Analysis 0.0534 0.245
10 GeAr 𝐿 = 3
0 1 2 3 4 5 6 7 8 9 10 Image 0.00476 0.0498
(b) Number of Prediction Bits R 𝑅 = 10, 𝑆 = 2 Analysis 0.00584 0.0698
Gaussian Smoothing using 3x3 kernel. 𝑴 = 𝟖, 𝑵 = 𝟖
Fig. 14. (a) Probability of error in the GeAr adder, and (b) area on FPGA, GeAr 𝐿 = 4 Image 0.9667 0.321
with increasing number of prediction bits R. 𝑅 = 0, 𝑆 = 2 Analysis 0.999 0.254
GeAr 𝐿 = 3 Image 0.691 0.071
−9 𝑅 = 2, 𝑆 = 2 Analysis 0.810 0.059
x 10
0.5 1.6 GeAr 𝐿 = 4 Image 0.289 0.0311
𝑅 = 4, 𝑆 = 1 Analysis 0.3189 0.028
0.4 1.5 GeAr 𝐿 = 3 Image 0.051 0.0213
Increase in required
Path Delay (sec)
carry-chain length 𝑅 = 5, 𝑆 = 1 Analysis 0.118 0.0117
Pr[Error]
Fig. 16. Image smoothing using 3x3 kernel: (a) original image with noise.(b) Image filtered using precise arithmetic. Images filtered using 16-bit GeAr
adders with three sub-adders and difference images showing the erroneous pixels. (c) Filtered image and (d) difference image for R = 1, S = 5.(e)
Filtered image and (f) difference image for R = 4, S = 4. (g) Filtered image and (h) difference image for R = 7, S = 3. (i) Filtered image and (j) difference
image for R = 10, S = 2.
IEEE Transactions on Very Large Scale Integration Systems, vol. 18, no. 4, Muhammad Shafique (M11, SM16) is a full pro-
pp. 517–526, 2010. fessor at Vienna University of Technology (TU
[26] R. Venkatesan, A. Agarwal, K. Roy, and A. Raghunathan, “MACACO: Wien), Austria, where he is directing the Chair
Modeling and analysis of circuits for approximate computing,” in for Computer Architecture and Robust, Energy-
Proc. International Conference on Computer-Aided Design. IEEE Press, Efficient Technologies (CARE-Tech). He was a se-
2011, pp. 667–673. nior research group leader at Karlsruhe Institute of
[27] A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Technology (KIT), Germany for more than 5 years
Engineering. Pearson/Prentice Hall, 2008. after receiving his Ph.D. in Computer Science from
KIT in January 2011. Before, he was with Stream-
ing Networks Pvt. Ltd. where he was involved in
research and development of video coding algo-
rithms and optimizations for VLIW-based Processors for three years. His
Sana Mazahir received the B.E. degree (Presi- research interests are in power/energy-efficient, reliable, and adaptive
dent’s Gold Medal) in electrical engineering and computing systems covering various design abstractions of the hardware
the masters degree in electrical engineering (DSP and software stacks (like micro-architecture, architecture, run-time system,
and communications) from the College of Electrical and compiler), embedded systems, emerging technologies and computing
and Mechanical Engineering, National University paradigms, and their applications in Internet-of-things, Cyber-Physical Sys-
of Sciences and Technology (NUST), Pakistan, in tems, and ICT for Developing Countries (ICTD).
2013 and 2015, respectively. In 2015, she worked Dr. Shafique received 2015 ACM/SIGDA Outstanding New Faculty
as a Research Assistant with the System Analysis Award, six gold medals, and several best paper awards and nominations
and Verification Laboratory at NUST. In 2016, she at prestigious conferences like CODES+ISSS 2015, 2014, and 2011, DATE
was a Visiting Student at King Abdullah University 2008, DAC 2016, and ICCAD 2009, Best Master Thesis Award, DAC 2014
of Science and Technology, Saudi Arabia, for six Designer Track Best Poster Award and SS14 Best Lecturer Award. He is the
months. Her research interests include signal construction and process- TPC co-Chair of ESTIMedia 2015 and 2016, and has served on the TPC of
ing for wireless communications and stochastic modeling and analysis of several IEEE/ACM conferences like ICCAD, DATE, CASES, and ASPDAC.
systems and approximate computing. He is a senior member of the IEEE and IEEE Signal Processing Society
(SPS). He holds one US patent.
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.