Professional Documents
Culture Documents
10 1109@tcsii 2019 2901060
10 1109@tcsii 2019 2901060
10 1109@tcsii 2019 2901060
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSII.2019.2901060, IEEE
Transactions on Circuits and Systems II: Express Briefs
1
Abstract—In this paper, a low energy consumption block-based structure which is called BCSA adder. In this structure, the adder
carry speculative approximate adder is proposed. Its structure is is partitioned into some non-overlapped parallel blocks, which
based on partitioning the adder into some non-overlapped in the worst-case, the carry output of a block is dependent on the
summation blocks whose structures may be selected from both the carry output of the previous block. To reduce the critical path
carry propagate and parallel-prefix adders. Here, the carry output more, we suggest an approach to predict the carry output of a
of each block is speculated based on the input operands of the block based on its signals as well as of the next block. The
block itself and those of the next block. In this adder, the length of structure has a low hardware complexity leading a low delay (on
the carry chain is reduced to two blocks (worst case), where in average, about one block) and a rather high quality. To achieve
most cases only one block is employed to calculate the carry output a lower accuracy loss, an error detection and recovery
leading to a lower average delay. In addition, to increase the
mechanism, which significantly reduces the output error rate, is
accuracy and reduce the output error rate, an error detection and
proposed. The effectiveness of this adder is compared with some
recovery mechanism is proposed. The effectiveness of the
proposed approximate adder is compared with state-of-the-art
of the state-of-the-art approximate adders. Finally, the efficiency
approximate adders using a cost function based on the energy, of the adder is studied using two image processing applications.
delay, area, and output quality. The results indicate an average of The rest of the paper is organized as follows. Some related
50% reduction in terms of the cost function compared to other works are reviewed in Section II. Section III presents the details
approximate adders. of the proposed approximate adder. The efficacy of the proposed
adder compared to those of the previous ones and also its
Index Terms— Approximate Computing, Low Power, applicability in image processing applications are studied in
Speculative Adder, Energy-Efficient. Section IV. Finally, the paper is concluded in Section VI.
1549-7747 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSII.2019.2901060, IEEE
Transactions on Circuits and Systems II: Express Briefs
2
1549-7747 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSII.2019.2901060, IEEE
Transactions on Circuits and Systems II: Express Briefs
3
=0
Critical Path
Critical Path
Select =1 1 1 0 [ ]= 0 1 1 1
[ ]= 1 0 1 0 [ ]= 1 1 1 1
1 = =0 =0
1 =0
Sub 0 Sub
Sub Sub
0
= =1
[ ]= 1 0 0 0 error [ ]= 0 1 1 0
(a)
Fig. 2. The structure of the proposed adder with Error Recovery Unit (ERU). =0
=1 1 1 0 [ ]= 0 1 1 1
[ ]= 1 0 1 0 [ ]= 1 1 1 1
=0
probabilities have been obtained by assuming that the = =0 1 =0
distribution of the 1 and 0 in the input operands bits are Sub Sub
0
=
uniform. Therefore, the carry input of the (i+1)th block may = = =0
be propagated to its most significant bits. Hence, to reduce error detected =1 error not detected
the probability of the error propagation in the proposed
adder, in this case, the Select unit circuit, chooses the [ ]= 1 0 0 1 error recovered [ ]= 0 1 1 0
whose error probability is smaller than . (b)
• In the second case (𝐾 = 0 and 𝐺 = 1), because 𝐺 is
Fig. 3. An exmpale of functionality of the proposed approximate adder, (a)
1, the speculated carry ( ) is correct. Therefore, for this whitout ERU (BCSA W/O ERU), and (b) with ERU.
case, the is selected as the 𝑛 .
• In the third case (𝐾 = 1 and 𝐺 = 0 ), because 𝐾 These adders include GeAr [13], RAP-CLA [7], SARA [11],
is 1, independent from the accuracy of the carry input of HABA [10], and the HABA equipped with the ERU unit
the (i+1)th block, the carry input is not propagated. proposed in [10] (HABAERU). Note that the ERU circuit in
Therefore, for shortening the critical path, we suggest to HABAERU is different from the one we suggested for BSCAERU.
select as the carry output of the ith block. For this study, three error metrics have been considered
• In the fourth case (𝐾 =1 and 𝐺 = 1), similar to the including Error Rate (ER), Normalized Mean Error Distance
second case, since 𝐺 is 1, the predicated carry is the (NMED), and Mean Relative Error Distance (MRED)[7]. The
same as the exact carry output of the block. Therefore, in NMED and MRED for an n-bit adder are obtained by
the proposed approach, is selected as the 𝑛 . |N|
Among these cases, only in the first case, the carry is 1 |Si − Si ′|
𝑁𝑀𝐸𝐷 = n ∑ (12)
propagated in two blocks. Therefore, on average, the length of 2 2|N|
i=
the carry propagation is close to one block. |N|
In the third case, although the carry input of the block is 1 |Si − Si ′|
killed and is not propagated, the carry input is employed to 𝑀𝑅𝐸𝐷 = ∑ (13)
|N| Si
determine the first summation bit of the block. Therefore, if the i=
carry input in this case is wrong, it impacts on the output where |N| shows the number of the input data samples, and Si
accuracy of the summation. Hence, for improving the accuracy (Si ′) is the exact (approximate) output of the add operation. The
of the proposed adder, we suggest an error recovery unit which ER, NMED, and MRED of the considered adders under different
generates the first summation bit of the ith block ( ) by block sizes and input operand widths are reported in TABLE I.
= (𝐾 . )+(𝑃 ⊕ ) (11) While only the error metrics for the cases of block sizes of 2, 4
𝑛
and 8 are reported in this table, we have performed this study for
Note that 𝑃 ⊕ 𝑛 ( 𝑛 = ) is the approximate the block sizes from 2 to 16.
summation output in the first bit of the (i+1)th block denoted as These metrics have been extracted by applying 65,536 (10
𝐴 in Fig. 2. Since the ERU is not on the critical path of million) uniform random numbers in the case of 8-bit (16- and
adder, using the Error Recovery Unit (ERU) leads to improving 32-bit) adders. As the results show, in the case of 8-bit adder, the
the accuracy without increasing the delay of the proposed adder ER, NMED, and MRED of the BCSAERU are 0 meaning that
structure. Fig. 3 shows the functionality of the proposed BCSAERU is exact.
speculative approximate adder with and without the ERU. The Among the studied adders, BCSAERU and HABA [10] have
ERU imposes only about 3% and 2% power and area overheads, the lowest ER compared to the other ones. On average, the ER
respectively. In the rest of the paper, we call the proposed adder of the BCSA is about 80% larger than HABA [10]. On the other
enhanced by the ERU as BCSAERU. hand, the NMED and MRED of the BCSAERU (BCSA) are, on
average, about 87% (52%) and 86% (40%) smaller than those of
I. RESULTS AND DISCUSSION the other studies adders. In addition, for all the adders, by
increasing the block size, the accuracy increases. As an example,
A. Error Metrics Evaluation in the case of 32-bit BCSAERU (BCSA), the ER, NMED, and
As previously discussed, the Select and Carry Predictor units MRED reduced about 100% (93%), 100% (100%), and 100%
determine the accuracy of the approximate adders. In this (100%), respectively, when the block size was increased from 2
section, the accuracy of the proposed adder is evaluated to 16. TABLE II shows the percentages of the summation results
compared to the four latest state-of-the-art approximate adders. with Relative Error Distance (RED) smaller than a specified
1549-7747 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSII.2019.2901060, IEEE
Transactions on Circuits and Systems II: Express Briefs
4
TABLE I COMPARISON OF ACCURACY RESULTS FOR 8-, 16-, AND 32-BIT of the GeAr. In terms of the energy consumption, GeAr
ADDERS.
consumes lower energy except for the block sizes of 6 and 16
8-bit 16-bit 32-bit where BCSA has the lowest energy dissipation, while SARA
Adder Block ER NMED MRED ER NMED MRED ER NMED MRED
Type Size (%) (×10-4) (×10-4) (%) (×10-4) (×10-4) (%) (×10-4) (×10-4)
consumes the largest energy consumption among the considered
2 12.42 607 621 33.77 608 623 51.23 625 644 adders. As the results show, SARA, HABA, HABAERU,
HABA 4 1.37 137 136 4.29 155 157 9.18 156 158 BCSAERU, C and RAP-CLA consumes, on average, 81%, 60%,
8 0 0 0 0.10 10 10 0.29 10 10 87%, 5%, 4%, and 30%, respectively, higher energy compared
2 12.41 151 155 33.77 154 157 51.21 156 161
4 1.36 9 9 4.28 10 10 9.17 10 10 to that of GeAr.
HABAERU
8 0 0 0 0.1 0.04 0.04 0.29 0.04 0.04 To obtain a better grasp of relative performances of the
2 30.06 605 707 61.77 625 730 65.37 625 730 adders, the plots in Fig. 4(c)-(d) show the relations between the
GeAr 4 5.47 68 93 16.72 156 190 32.18 156 190
8 0 0 0 0.68 10 12 2.25 10 12 accuracies and the EDPs (energy-delay products) of the studied
RAP-
2 15.54 606 646 36.45 626 670 55.71 625 670 8- and 16- and 32-bit adders with the block sizes of 2, 4, 8, 12
CLA
4 2.34 137 147 8.54 129 154 17.17 130 154 and 16. the size of each circle indicates the area usage of the
8 0 0 0 0.34 10 11 1.12 10 11
2 14.43 379 484 35.05 417 531 65.44 563 589
adder. For the case of the 8-bit adder, while the EDP of the GeAr
SARA 4 5.46 68 93 16.75 83 112 35.88 83 112 in some block sizes are smaller than that of the BCSA and
8 0 0 0 6.09 5 7 17.45 5 7 BCSAERU, their accuracies are lower than that of the GeAr. On
2 18.73 167 185 47.86 172 190 74.66 172 189
4 0 0 0 5.89 20 26 16.66 20 26
the other hand, in the case of the 32-bit adder, the EDP and
BCSAERU
8 0 0 0 0 0 0 0.39 0.1 0.07 accuracy of the BCSA and BCSAERU are better than those of the
2 27.47 395 428 60.98 417 554 87.62 417 554 GeAr. Compared to the other adders, the EDP of the BCSA and
BCSA 4 5.46 34 52 21.15 56 81 45.94 55 81 BCSAERU are better for different block sizes and bit widths.
8 0 0 0 6.20 2 3 18.04 3 4
The study show that GeAr has the lowest area in the case of
the block size of 2, while for the block size of 4, the smallest
TABLE II THE ERRONEOUS OUTPUTS WITH RED SMALLER THAN A
SPECIFIC VALUE WHEN N = 8 AND L = 4. area is for the BCSA. For other studied block sizes, SARA has
the smallest area compared to those of other studies approximate
RED(%) ≤ 5% ≤ 10% ≤ 20% ≤ 50% ≤ 100% adders. The areas of the HABA, HABAERU, GeAr, BCSAERU,
HABA & HABAERU 98.6% 98.6% 98.6% 98.6% 100% BCSA, and RAP-CLA are, on average, 14%, 19%, 30%, 15%,
GeAr 95.3% 95.8% 97.2% 99.5% 100% 12%, and 56%, respectively higher than that of the SARA. Also,
RAP-CLA 97.7% 97.7% 97.9% 99.0% 100% as the results shown in Figure 4(c)-(e), the EDP and area usage
SARA 94.5% 94.5% 100% 100% 100% of the BCSA is smaller than those of the exact adders. In the best
BCSA W/O ERU 94.5% 94.5% 100% 100% 100%
BCSA W ERU 100% 100% 100% 100% 100%
(worst) case, the EDP of the BCSA is about 68% (13%) lower
than those of the exact CLA at the cost of 0.055 (1e-6) MRED.
Finally, to assess the efficacy of the proposed adder in image
value for the all the considered 8-bit adders with the block size processing applications, we employed it (16-bit width with
of 4. The RED is extracted by employing (13) without 1/|𝑁| block size of 4) in the sharpening and smoothing applications
term. As the figures of this table show, all the outputs of the with an input set of ten image benchmarks. To evaluate the
BCSA and BCSAERU have RED smaller than 10% while in the accuracy of the output images, we employed Peak Signal-to-
case of HABA (HABAERU), about 1.5% of the outputs have Noise Ratio (PSNR) and Mean Structural Similarity Index
RED values larger than 50%. Method (MSSIM) metrics used in evaluating the qualities of
B. Design Parameters Evaluation image processing applications [7]. The results of this study show
that using the BCSA in sharpening and smoothing image
The hardware of BCSA, BCSAERU, HABA, HABAERU,
processing applications leads to, on average, 33% (12%) and
GeAr, RAP-CLA and SARA were described by Verilog HDL
68% (70%), PSNR (MSSIM) reductions, respectively,
and synthesized by Synopsys Design Compiler. All the studies
compared to BCSAERU. Using the BCSA instead of CLA exact
in this work have been performed using the typical process of
adder led to, on average, 14.4% and 20% energy efficiency and
the 15nm FinFET NanGate technology [17] with the operating
performance gains in the studied image processing applications.
voltage level of 0.8V and at the temperature of 25°C. Based on
It should be pointed out that most portions of the energy and
these implementations, the results of the delay and energy of the
delay in these applications were due to the multiplications which
32-bit studied adders with the block sizes of 2, 4, 6, 8, 12 and 16
were performed using exact multipliers. By employing the ERU,
are presented in Fig. 4(a) and (b), respectively. The design
the PSNR (MSSIM) reductions in the cases of sharpening and
parameters of each adder under different block sizes have been
smoothing image processing applications improved by 14%
reported in these plots where the corresponding block sizes are
(4%) and 50% (67%), respectively compared to the BCSA
provided inside the circle. For extracting the power consumption
without the ERU. Four output images of these applications with
(to obtain the energy consumption), up to 10M random stimuli
the lowest PSNR and MSSIM in the case of using BCSAERU
were injected to the input of the netlist of the synthesized adders
adder are shown in Fig. 5. The reported PSNR and MSSIM
and the activity of the internal nodes of them were logged in the
values are with respect to the images produced by the exact
VCD format. Then, by employing Synopsys Primetime tool, the
adder.
power consumptions of the adders were extracted based on their
corresponding VCD file. Also, the reported delay for each adder
was the critical path delay of the adder determined using the II. CONCLUSION
timing analysis of the Synopsys Design Compiler. In this paper, we proposed a block-based carry speculative
The studies show that the delay of SARA, HABA, approximate adder (BCSA), which was based on dividing an
HABAERU, BCSAERU, BCSA and RAP-CLA are, on average, exact adder into some non-overlapped blocks operated in
88%, 71%,78%, 7%, 5% and 20%, respectively higher than that
1549-7747 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSII.2019.2901060, IEEE
Transactions on Circuits and Systems II: Express Briefs
5
(a) (b)
Fig. 4. The a) delay and b) energy consumption of the investigated 32-bit approximate adders and the relations between the accuracies and the EDPs of the
investigated c) 8- and d) 16- and e) 32-bit adders under different block sizes.
Smoothing Sharpening
1549-7747 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.