Approximate computing is a promising technique for energy-efficient Very Large Scale Integration (VLSI)
system design. It is best suited for error-resilient applications such as signal processing and multimedia.
Approximate computing trades some accuracy for faster results and lower power consumption, which
makes it attractive for arithmetic circuits. In this article, several novel approximate 4-2 and 5-2
compressor designs are proposed for reducing the partial product stages in multiplication. Three
approximate 8 × 8 Dadda multiplier designs using three novel approximate 4-2 compressors and two
approximate 8 × 8 Dadda multiplier designs using two novel approximate 5-2 compressors are also
proposed. The synthesis results show that the proposed designs achieve a significant accuracy
improvement together with power and delay reductions compared to the existing approximate designs.
CCS Concepts: • Computing methodologies; • Hardware → Very large scale integration design
Additional Key Words and Phrases: Approximate computing, 4-2 compressor, 5-2 compressor, Dadda
multiplier
ACM Reference Format:
Anusha Gorantla and Deepa P. 2017. Design of approximate compressors for multiplication. J. Emerg.
Technol. Comput. Syst. 13, 3, Article 44 (April 2017), 17 pages.
DOI: http://dx.doi.org/10.1145/3007649
1. INTRODUCTION
Many scientific and engineering problems are solved using deterministic and precise
algorithms. However, some applications, such as image and video processing, can
tolerate errors [Han et al. 2013]. Human perception has a limited ability to detect
small imprecisions in images or video. Hence, precise algorithms and models
are inefficient for these applications. Approximate computation improves the
performance of existing digital logic circuits or systems by decreasing the logic
complexity with a tradeoff in accuracy [Gupta et al. 2011; Han et al. 2013; Gupta et al.
2013; Swagath et al. 2013; Li et al. 2015; Nair et al. 2010]. Approximate computing is
an emerging approach to energy-efficient Very Large Scale Integration (VLSI) design
and can be applied at different levels of abstraction.
This article introduces approximate computing at the logic level by
introducing a minimal number of errors in the truth table and simplifying the logic using
a Karnaugh map (k-map).
Multiplication is an elementary arithmetic operation and crucial in applications
like digital signal processing. The implementation of multipliers includes generation
ACM Journal on Emerging Technologies in Computing Systems, Vol. 13, No. 3, Article 44, Publication date: April 2017.
of partial products, reduction of partial products using a Carry-Save Adder (CSA), and
carry propagation for computing the final result. Thus, speeding up the CSA circuit and
lowering its power dissipation are crucial for keeping the multiplier competitive.
Reducing the partial products requires multi-operand adders,
and hence a different design method is needed for multi-operand adders [Parhami
et al. 2010; Ercegovac et al. 2004; Koren et al. 1993]. A different structure, known as
the compressor, can be adopted for multi-operand addition. Wallace and Dadda were
the first to explain the use of compressors and counters, respectively, for
partial product reduction trees in multipliers [Wallace et al. 1964; Dadda et al. 1965].
Early designs of the CSA tree used Dadda's column compression technique with
3-2 counters or, equivalently, full adders to reduce the partial product stages.
To reduce the partial product stages further, 4-2 and 5-2 compressors are now
employed in high-speed multipliers.
An Error Tolerant Multiplier (ETM) divides the input operands into an accurate part
and an inaccurate part. In the accurate part, exact multiplication is performed on the
higher-order bits, and in the inaccurate part a non-multiplication scheme is used that
introduces a certain amount of error [Kyaw et al. 2010]. A novel 2×2-bit Under Designed Multiplier
(UDM) is proposed as a building block for larger multipliers [Kulkarni et al. 2011]. Mahdiani et al.
[2010] present a 6×6-bit Broken Array Multiplier that is faster than an accurate array
multiplier. The 4×4 Imprecise Counter-based Multiplier (ICM), which uses inaccurate 4:2
compressors to reduce the partial product stages of a Wallace tree multiplier, is
a power-efficient design for implementing multipliers of large sizes [Lin et al. 2013].
Four different approaches to the Approximate Wallace Tree Multiplier (AWTM) are
presented in Bhardwaj et al. [2014]. This design uses a carry-in prediction method,
resulting in lower power, smaller area, and decreased delay
compared to the Accurate Wallace Tree Multiplier. The fast multiplier in Liu et al. [2014]
is based on an approximate adder that processes data in parallel by cutting the carry
propagation chain. However, there is still a need to develop efficient adders
and multipliers for recent applications [Momeni et al. 2015].
Two approximate 4-2 compressor architectures and four 8×8 approximate Dadda
multipliers are proposed in Momeni et al. [2015]. Most approximate multipliers
settle for a tradeoff among accuracy, power, delay, and area. This article proposes
several approximate 4-2 compressors, 5-2 compressors, and approximate 8×8 Dadda
multipliers to improve performance and accuracy. The proposed
approximate 8×8 Dadda multipliers provide better results
than the approximate 8×8 Dadda multipliers proposed in Momeni et al. [2015].
This article is organized as follows.
—Section II reviews the exact 4-2 compressor, existing approximate 4-2 compres-
sors, exact 5-2 compressor, and the proposed approximate 4-2 compressors and 5-2
compressors.
—Section III presents the conventional 8×8 Dadda multiplier, proposed approximate 4-
2 compressor-based approximate 8×8 Dadda multipliers and proposed approximate
5-2 compressor-based approximate 8×8 Dadda multipliers.
—The synthesized results, error metrics for the approximate compressors, approxi-
mate 8×8 Dadda multipliers, and application in image processing using 8×8 Dadda
Multipliers are discussed in Section IV.
—Finally, Section V concludes the article.
2. EXISTING DESIGNS
Compressors are the key building blocks used for reducing the partial product
stages during multiplication. Therefore, improving the power efficiency of
these architectures can lead to significant savings in the power consumed by the entire
multiplier. In parallel multiplication, n-2 compressors (n inputs reduced to 2 outputs) are used.
2.1. Exact and Proposed Approximate 4-2 Compressors
An exact 4-2 compressor has five inputs and three outputs, as shown in Figure 1. It
produces a sum for the same order in the next stage and a carry for one order higher
in the next stage. In addition, a carry out (Cout) becomes the carry in (Cin) of the next
higher-order compressor. An accurate 4-2 compressor design uses three combined Exclusive OR
(EX-OR)/Exclusive NOR (EX-NOR) gates, one EX-OR gate, and two 2:1
multiplexers [Chang et al. 2004].
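Table I shows the truth table of the exact 4-2 compressor. The logic equations for its
outputs are as follows:
$$\mathrm{Sum} = X_1 \oplus X_2 \oplus X_3 \oplus X_4 \oplus C_{in} \tag{1}$$
$$C_{out} = (X_1 \oplus X_2)\,X_3 + \overline{(X_1 \oplus X_2)}\,X_1 \tag{2}$$
$$\mathrm{Carry} = (X_1 \oplus X_2 \oplus X_3 \oplus X_4)\,C_{in} + \overline{(X_1 \oplus X_2 \oplus X_3 \oplus X_4)}\,X_4 \tag{3}$$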
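As a sanity check on Equations (1) to (3), the following minimal Python sketch models the exact 4-2 compressor and confirms, over all 32 input combinations, that the five input bits add up to Sum + 2 (Carry + Cout). It assumes the multiplexer reading of Equations (2) and (3), in which the second product term is selected by the complement of the XOR expression. The function and variable names are illustrative and do not come from the article.

```python
from itertools import product


def exact_4_2(x1, x2, x3, x4, cin):
    """Exact 4-2 compressor: returns (sum, carry, cout) as 0/1 integers."""
    p3 = x1 ^ x2                  # select signal of the first multiplexer
    cout = x3 if p3 else x1       # Eq. (2), multiplexer form (assumed reading)
    p4 = x1 ^ x2 ^ x3 ^ x4        # select signal of the second multiplexer
    s = p4 ^ cin                  # Eq. (1)
    carry = cin if p4 else x4     # Eq. (3), multiplexer form (assumed reading)
    return s, carry, cout


def is_exact():
    """True if sum + 2*(carry + cout) equals the input bit count for all inputs."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = exact_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(is_exact())
```

Carry and Cout both carry weight 2, which is why they are added before doubling in the check.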
Two approximate 4-2 compressors, approximate compressor1 and approximate
compressor2, are presented in Momeni et al. [2015]. These compressors optimize both
logic and performance with a tradeoff in accuracy. The gate-level
implementation of approximate compressor1 has a critical path delay of
3, and the corresponding truth table is given in Momeni et al. [2015]. The difference
between the exact 4-2 compressor output and the approximate 4-2 compressor1
output measures the inaccuracy; the design produces 12 errors.
Various approximate 4-2 and 5-2 compressors are proposed and simplified
using a k-map to further reduce errors and to increase performance compared
to the exact 4-2 and 5-2 compressors. They are named approximate 4-2 compressor3,
approximate 4-2 compressor4, approximate 4-2 compressor5, approximate 5-2
compressor1, and approximate 5-2 compressor2. To simplify the design, Cin and Cout are
removed from the circuit in the proposed approximate 4-2 and 5-2 compressors.
Approximate 4-2 compressor2 was proposed in Momeni et al. [2015] to further reduce the
critical path delay and the number of errors and to increase performance compared to
approximate compressor1. It simplifies the circuit and gives better results in terms of
accuracy. The gate-level implementation of approximate compressor2 has a critical path
delay of 2, so the delay is decreased compared to approximate 4-2 compressor1. Four errors are
possible in approximate 4-2 compressor2.
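The error counts quoted for the two existing designs can be recomputed from the difference columns of Table I. The sketch below transcribes those two columns (compressor 2 is listed only for the 16 rows with Cin = 0) and counts the nonzero entries. The names `DIFF_COMPRESSOR1`, `DIFF_COMPRESSOR2`, and `error_stats` are illustrative, and the error rate is simply the count divided by the number of listed rows, which is an assumption about how one might normalize it.

```python
# Difference columns transcribed from Table I, in row order.
DIFF_COMPRESSOR1 = [
    1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1,   # rows with Cin = 0
    1, 0, 0, -1, 0, 1, 1, 0, 0, 1, 1, 0, -1, 0, 0, -1,   # rows with Cin = 1
]
DIFF_COMPRESSOR2 = [
    1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, -1,   # rows with Cin = 0 only
]


def error_stats(diffs):
    """Return (number of erroneous rows, fraction of erroneous rows)."""
    errors = sum(1 for d in diffs if d != 0)
    return errors, errors / len(diffs)


if __name__ == "__main__":
    print(error_stats(DIFF_COMPRESSOR1))  # (12, 0.375)
    print(error_stats(DIFF_COMPRESSOR2))  # (4, 0.25)
```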
2.1.1. Proposed Approximate 4-2 Compressor3. The logic equations given below are used
for designing the approximate 4-2 compressor3 and the corresponding truth table is
given in Table II. This proposed design produces three errors as specified by the term
difference in the truth table.
Table I. Truth Table for Exact and Existing Approximate 4-2 Compressors
Columns, left to right: inputs (Cin, X4, X3, X2, X1); exact 4-2 compressor outputs (Sum, Carry); existing approximate 4-2 compressor 1 outputs (Sum1, Carry1); existing approximate 4-2 compressor 2 outputs (Sum2, Carry2); difference (number of errors) for the exact compressor, compressor 1, and compressor 2.
Cin X4 X3 X2 X1 Sum Carry Sum1 Carry1 Sum2 Carry2 Diff_exact Diff_1 Diff_2
0 0 0 0 0 0 0 1 0 1 0 0 1 1
0 0 0 0 1 1 0 1 0 1 0 0 0 0
0 0 0 1 0 1 0 1 0 1 0 0 0 0
0 0 0 1 1 0 0 1 0 1 0 0 −1 −1
0 0 1 0 0 1 0 1 0 1 0 0 0 0
0 0 1 0 1 0 0 0 0 0 1 0 0 0
0 0 1 1 0 0 0 0 0 0 1 0 0 0
0 0 1 1 1 1 0 1 0 1 1 0 0 0
0 1 0 0 0 1 0 1 0 1 0 0 0 0
0 1 0 0 1 0 1 0 0 0 1 0 0 0
0 1 0 1 0 0 1 0 0 0 1 0 0 0
0 1 0 1 1 1 0 1 0 1 1 0 0 0
0 1 1 0 0 0 1 1 0 1 0 0 −1 −1
0 1 1 0 1 1 0 1 0 1 1 0 0 0
0 1 1 1 0 1 0 1 0 1 1 0 0 0
0 1 1 1 1 0 1 1 0 1 1 0 −1 −1
1 0 0 0 0 1 0 0 1 – – 0 1 –
1 0 0 0 1 0 1 0 1 – – 0 0 –
1 0 0 1 0 0 1 0 1 – – 0 0 –
1 0 0 1 1 1 0 0 1 – – 0 −1 –
1 0 1 0 0 0 1 0 1 – – 0 0 –
1 0 1 0 1 1 0 0 1 – – 0 1 –
1 0 1 1 0 1 0 0 1 – – 0 1 –
1 0 1 1 1 0 1 0 1 – – 0 0 –
1 1 0 0 0 0 1 0 1 – – 0 0 –
1 1 0 0 1 1 1 0 1 – – 0 1 –
1 1 0 1 0 1 1 0 1 – – 0 1 –
1 1 0 1 1 0 1 0 1 – – 0 0 –
1 1 1 0 0 1 1 0 1 – – 0 −1 –
1 1 1 0 1 0 1 0 1 – – 0 0 –
1 1 1 1 0 0 1 0 1 – – 0 0 –
1 1 1 1 1 1 1 0 1 – – 0 −1 –
The logic equations for the outputs of the proposed approximate 4-2 compressor3 are as follows:
Sum3 = X1X2 + X3X4 + X1 X2 ( X3 + X4) X1 X4 ( X1 + X2) (4)
2.1.3. Proposed Approximate 4-2 Compressor5. The logic equations (8) and (9) are used
for designing approximate 4-2 compressor5, and the corresponding truth table is given
in Table II. The proposed design produces one error, as specified by the term difference
in the truth table.
The logic equations for outputs of Approximate 4-2 compressor5 are as follows:
Sum5 = X1X3X4 + X2X3X4 + X1 X2 X3 X4 + X1 X2 X3X4 + X1 X2X3 X4 (8)
Carry5 = ( X1X2) + ( X3X4) (9)
The logic equations for outputs of exact 5-2 compressor are as follows:
Sum = X1 ⊕ X2 ⊕ X3 ⊕ X4 ⊕ X5 ⊕ Cin1 ⊕ Cin2 (10)
Cout1 = (X1 + X2)(X3 + X4) (11)
Cout2 = ((X1 ⊕ X2 ⊕ X3 ⊕ X4)(X1X2)) + ((X1 ⊕ X2 ⊕ X3 ⊕ X4)Cin1) (12)
Carry = ((X1 ⊕ X2 ⊕ X3 ⊕ X4 ⊕ Cin1)X5) + ((X1 ⊕ X2 ⊕ X3 ⊕ X4 ⊕ X5 ⊕ Cin1)Cin2) (13)
2.2.1. Proposed Approximate 5-2 Compressor1. The logic equations given below are used
for designing the approximate 5-2 compressor1, and the corresponding truth table is
given in Table III. The proposed design produces seven errors, as specified by the term
difference in the truth table.
The logic equations for outputs of Approximate 5-2 compressor1 are as follows:
Sum1 = ((X2X3)X5 ) + (X2X3 X5) + (X1 X2 X3X4 X5 ) + (X1 X2 X3X4X5)
+ (X1X2 X3 X4 X5 ) + (X1X2 X3 X4 X5) + (X1X2X3X4X5) (14)
Cout1 = Cout2 = (( X1 + X2) X3) + ( X1X2) + ( X4 + X5) + ( X4X5) (15)
2.2.2. Proposed Approximate 5-2 Compressor2. The logic equations given below are used
for designing the approximate 5-2 compressor2, and the corresponding truth table is
given in Table III. From Table III, it is observed that the proposed design produces five
errors as specified by the term difference in the truth table.
The logic equations for the outputs of the proposed approximate 5-2 compressor2 are
as follows:
Sum2 = X2X3 + X1 X2X3X4 + X3X4X5X1 + X1X2X4X5 + X1X2 X3X4
+ X1X3X4 X5 (16)
Carry2 = (X1 + X2)X3 + X1X2 + (X4 + X5) + X4X5 (17)
3. DADDA MULTIPLIERS
In a conventional parallel multiplier, partial products are generated by multiplying
the multiplicand with each bit of the multiplier. These partial products are then
added together to generate the final product. The multiplication process is thus divided into
two parts, namely partial product generation and partial product accumulation.
The number of partial products to be added plays an important role in determining the
performance of a parallel multiplier.
The key objective is to design and implement a multiplier with a focus on
decreasing power consumption and minimizing overall delay. These parameters are
inversely related, and improving one comes at the cost of the other.
The two well-known fast multipliers presented by Wallace and Dadda [Wallace et al.
1964; Dadda et al. 1965] consist of three stages. In the first stage, a partial product
matrix is formed. In the second stage, the partial product matrix is reduced to a height
of two. These two rows are combined using an adder in the final stage.
The dot diagram shown in Figure 3 represents the conventional Dadda algorithm
implemented for an 8×8-bit multiplier. Four reduction levels are required, with matrix
heights of 6, 4, 3, and 2. Two dots joined by a diagonal line indicate the outputs
of a 3-2 compressor. Similarly, two dots joined by a crossed diagonal
indicate the outputs of a 2-2 compressor. Sixty-four AND gates,
35 3:2 compressors, 7 2:2 compressors, and a 14-bit carry-propagating
adder are required to form the 16-bit product.
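The component counts above can be reproduced with a short column-height simulation of the conventional Dadda reduction. The Python sketch below is not the authors' tool flow; it is a minimal model that tracks only the number of dots per column, assuming the usual Dadda rule (reduce each column just enough to reach the stage's target height, using a 2:2 compressor when one dot too many remains and a 3:2 compressor otherwise). The function names are hypothetical.

```python
def dadda_targets(n):
    """Target heights per reduction stage for an n x n multiplier, e.g. [6, 4, 3, 2]."""
    seq = [2]
    while seq[-1] * 3 // 2 < n:          # Dadda sequence: 2, 3, 4, 6, 9, ...
        seq.append(seq[-1] * 3 // 2)
    return seq[::-1]


def dadda_counts(n):
    """Return (3:2 compressors, 2:2 compressors, final adder width) for n x n bits."""
    # Initial partial product matrix: column i holds min(i + 1, 2n - 1 - i) dots.
    heights = [min(i + 1, 2 * n - 1 - i) for i in range(2 * n - 1)]
    full = half = 0
    for target in dadda_targets(n):
        carry_in = 0
        for col in range(len(heights)):
            h = heights[col] + carry_in  # carries from the previous column count here
            carry_out = 0
            while h > target:
                if h == target + 1:      # one dot too many: 2:2 compressor
                    half += 1
                    h -= 1
                else:                    # otherwise: 3:2 compressor
                    full += 1
                    h -= 2
                carry_out += 1
            heights[col] = h
            carry_in = carry_out
    adder_width = sum(1 for h in heights if h == 2)
    return full, half, adder_width


if __name__ == "__main__":
    print(dadda_targets(8))   # [6, 4, 3, 2]
    print(dadda_counts(8))    # (35, 7, 14)
```

For the 8×8 case the simulation yields the four stage heights 6, 4, 3, and 2 together with 35 3:2 compressors, 7 2:2 compressors, and 14 columns of height two for the final adder, matching the figures quoted in the text.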
Table III. Truth Table for Exact and Proposed Approximate 5-2 Compressors
Columns, left to right: inputs (Cin1, Cin2, X5, X4, X3, X2, X1); exact 5-2 compressor outputs (Sum, Carry); proposed approximate 5-2 compressor 1 outputs (Sum1, Cout1); proposed approximate 5-2 compressor 2 outputs (Sum2, Carry2); difference (number of errors) for the exact compressor, compressor 1, and compressor 2.
Cin1 Cin2 X5 X4 X3 X2 X1 Sum Carry Sum1 Cout1 Sum2 Carry2 Diff_exact Diff_1 Diff_2
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 1 1 0 1 0 1 1 0 0 0
0 0 0 0 0 1 0 1 0 1 1 1 1 0 0 0
0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0
0 0 0 0 1 0 0 1 0 0 1 1 1 0 1 1
0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0
0 0 0 0 1 1 1 1 0 1 0 1 1 0 0 0
0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 1
0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0
0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0
0 0 0 1 0 1 1 1 0 1 1 1 1 0 0 0
0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 1 1 0 1 1 0 1 1 1 1 0 1 0
0 0 0 1 1 1 0 1 0 1 1 0 1 0 0 0
0 0 0 1 1 1 1 0 1 0 0 1 0 0 0 0
0 1 0 0 0 0 0 1 0 1 1 0 1 0 0 0
0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0
0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0
0 1 0 0 0 1 1 1 0 0 0 0 1 0 1 1
0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0
0 1 0 0 1 0 1 1 1 1 1 1 1 0 0 0
0 1 0 0 1 1 0 1 0 0 0 0 1 0 1 1
0 1 0 0 1 1 1 0 1 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0
0 1 0 1 0 0 1 1 1 1 0 1 1 0 0 0
0 1 0 1 0 1 0 1 0 1 1 0 1 0 1 1
0 1 0 1 0 1 1 0 1 0 0 1 0 0 0 0
0 1 0 1 1 0 0 1 1 1 1 1 1 0 0 0
0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0
0 1 0 1 1 1 0 0 1 0 1 0 0 0 0 0
0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0
1 0 0 0 0 0 0 1 1 – – – – 0 – –
1 0 0 0 0 0 1 0 1 – – – – 0 – –
1 0 0 0 0 1 0 0 0 – – – – 0 – –
1 0 0 0 0 1 1 1 1 – – – – 0 – –
1 0 0 0 1 0 0 0 1 – – – – 0 – –
1 0 0 0 1 0 1 1 1 – – – – 0 – –
1 0 0 0 1 1 0 1 0 – – – – 0 – –
1 0 0 0 1 1 1 0 1 – – – – 0 – –
1 0 0 1 0 0 0 0 0 – – – – 0 – –
1 0 0 1 0 0 1 1 1 – – – – 0 – –
1 0 0 1 0 1 0 1 1 – – – – 0 – –
1 0 0 1 0 1 1 0 1 – – – – 0 – –
1 0 0 1 1 0 0 1 1 – – – – 0 – –
1 0 0 1 1 0 1 0 0 – – – – 0 – –
1 0 0 1 1 1 0 0 0 – – – – 0 – –
1 0 0 1 1 1 1 1 1 – – – – 0 – –
1 1 0 0 0 0 0 0 0 – – – – 0 – –
1 1 0 0 0 0 1 1 1 – – – – 0 – –
1 1 0 0 0 1 0 1 1 – – – – 0 – –
1 1 0 0 0 1 1 0 0 – – – – 0 – –
1 1 0 0 1 0 0 1 1 – – – – 0 – –
1 1 0 0 1 0 1 0 1 – – – – 0 – –
1 1 0 0 1 1 0 0 0 – – – – 0 – –
1 1 0 0 1 1 1 1 1 – – – – 0 – –
1 1 0 1 0 0 0 1 1 – – – – 0 – –
1 1 0 1 0 0 1 0 1 – – – – 0 – –
1 1 0 1 0 1 0 0 0 – – – – 0 – –
1 1 0 1 0 1 1 1 1 – – – – 0 – –
1 1 0 1 1 0 0 0 0 – – – – 0 – –
1 1 0 1 1 0 1 1 1 – – – – 0 – –
1 1 0 1 1 1 0 1 1 – – – – 0 – –
1 1 0 1 1 1 1 0 0 – – – – 0 – –
Fig. 4. The reduction circuitry approach of proposed approximate 8×8 Dadda multiplier3, multiplier4, and
multiplier5.
Fig. 5. Reduction circuitry approach of proposed approximate 8×8 Dadda multiplier6 and multiplier7.
smaller devices that reduce the switching power. Smaller gate oxide thickness improves
gate control. Smaller technology nodes therefore offer considerable benefits for high-speed VLSI
circuits. Tables V, VI, VII, and VIII compare the design parameters of the existing and
proposed approximate designs at three technology nodes.
Table VII. Comparison of Several 4-2 Compressor-Based Approximate 8×8 Dadda Multipliers
Design Power (nW) Delay (ps) Area Number of Errors
@180 nm (Voltage = 0.9 V, Operating Frequency = 1 GHz)
2×2-bit UDM [Kulkarni et al. 2011] 52,000 710 52 1
4×4 ICM [Lin et al. 2013] 230,000 1,990 130 1
Conventional 8×8 Dadda multiplier based on exact 4-2 and 5-2 compressors 48,083 3,400 1,785 0
Approximate 8×8 Dadda multiplier1 [Momeni et al. 2015] 44,725 2,300 1,529 108
Approximate 8×8 Dadda multiplier2 [Momeni et al. 2015] 46,311 2,100 1,538 36
Proposed approximate 8×8 Dadda multiplier3 43,212 2,500 1,506 27
Proposed approximate 8×8 Dadda multiplier4 45,064 4,100 1,656 18
Proposed approximate 8×8 Dadda multiplier5 47,165 3,750 1,890 9
@90 nm (Voltage = 1 V, Operating Frequency = 1 GHz)
2×2-bit UDM [Kulkarni et al. 2011] 19,240 262 25 1
4×4 ICM [Lin et al. 2013] 85,100 990 65 1
Conventional 8×8 Dadda multiplier based on exact 4-2 and 5-2 compressors 17,791 1,688 862 0
Approximate 8×8 Dadda multiplier1 [Momeni et al. 2015] 16,548 1,035 820 108
Approximate 8×8 Dadda multiplier2 [Momeni et al. 2015] 17,135 1,419 812 36
Proposed approximate 8×8 Dadda multiplier3 15,988 1,432 726 27
Proposed approximate 8×8 Dadda multiplier4 16,674 1,516 749 18
Proposed approximate 8×8 Dadda multiplier5 18,309 1,464 767 9
@45 nm (Voltage = 1.1 V, Operating Frequency = 1 GHz)
2×2-bit UDM [Kulkarni et al. 2011] 3,078 183 14 1
4×4 ICM [Lin et al. 2013] 13,616 495 33 1
Conventional 8×8 Dadda multiplier based on exact 4-2 and 5-2 compressors 2,847 798 388 0
Approximate 8×8 Dadda multiplier1 [Momeni et al. 2015] 2,648 717 369 108
Approximate 8×8 Dadda multiplier2 [Momeni et al. 2015] 2,742 709 365 36
Proposed approximate 8×8 Dadda multiplier3 2,558 682 645 27
Proposed approximate 8×8 Dadda multiplier4 2,668 748 741 18
Proposed approximate 8×8 Dadda multiplier5 2,929 753 757 9
proposed approximate 8×8 Dadda multiplier designs, the proposed approximate 8×8
Dadda multiplier3 design achieves low power consumption and area, due to the usage
of proposed approximate 4-2 compressor3. In terms of accuracy, the proposed approxi-
mate 8×8 Dadda multiplier5 design has minimal error as compared to other proposed
approximate 8×8 Dadda multipliers and existing approximate 8×8 Dadda multipliers.
Table VIII. Comparison of Various 5-2 Compressor-Based Approximate 8×8 Dadda Multipliers
Design Power (nW) Delay (ps) Area Number of Errors
@180 nm (Voltage = 0.9 V, Operating Frequency = 1 GHz)
Conventional 8×8 Dadda multiplier based on exact 4-2 and 5-2 compressors 48,083 3,400 1,785 0
Proposed approximate 8×8 Dadda multiplier6 47,034 3,512 765 46
Proposed approximate 8×8 Dadda multiplier7 45,481 3,790 731 38
@90 nm (Voltage = 1 V, Operating Frequency = 1 GHz)
Conventional 8×8 Dadda multiplier based on exact 4-2 and 5-2 compressors 17,791 1,688 862 0
Proposed approximate 8×8 Dadda multiplier6 13,289 1,891 443 46
Proposed approximate 8×8 Dadda multiplier7 12,871 1,712 412 38
@45 nm (Voltage = 1.1 V, Operating Frequency = 1 GHz)
Conventional 8×8 Dadda multiplier based on exact 4-2 and 5-2 compressors 2,847 798 388 0
Proposed approximate 8×8 Dadda multiplier6 2,615 987 275 46
Proposed approximate 8×8 Dadda multiplier7 2,472 912 247 38
4.6. Application
In this section, the application of the proposed approximate 8×8 Dadda multipliers to
image processing is illustrated. An image sharpening approach [Burger et al. 2009] is
chosen to analyse the quality of the 8×8 Dadda multipliers. The peak signal-to-noise
ratio (PSNR), based on the mean squared error (MSE), is computed to assess the quality
of the output image. Figure 6 shows the performance of the exact, existing, and proposed
approximate 8×8 Dadda multipliers in terms of PSNR. The output image of each proposed
approximate 8×8 Dadda multiplier is compared with the output image of the conventional
8×8 Dadda multiplier. From Figure 6, it is observed that the proposed approximate 8×8 Dadda
multiplier5 has a higher PSNR than the other 8×8 Dadda multipliers. The average
normalized error distance (NED) values for image sharpening are given in Table X. From
Table X, it is observed that the proposed approximate 8×8 Dadda multiplier5 has a lower
average NED than the other 8×8 Dadda multipliers. As discussed previously, among the
proposed approximate 4-2 and 5-2 compressors, the approximate 4-2 compressor5 has
the smallest probability of error. Therefore, the image quality for the proposed approximate
8×8 Dadda multiplier5 is higher than for the other approximate 8×8 Dadda multipliers.
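The PSNR comparison described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: it assumes 8-bit grayscale images stored as nested lists, takes the conventional multiplier's output as the reference, and uses the standard definition PSNR = 10 log10(peak^2 / MSE) with peak = 255. The function names and the tiny test images are hypothetical.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized images (lists of rows)."""
    total = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    exact_out = [[10, 20], [30, 40]]    # stand-in for the conventional multiplier output
    approx_out = [[11, 19], [31, 39]]   # stand-in for an approximate multiplier output
    print(round(psnr(exact_out, approx_out), 2))  # 48.13
```

A higher PSNR means the approximate output is closer to the reference image, which is the sense in which the multipliers are ranked in the text.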
Fig. 6. Output image of size 512×512 (a) Original image. (b) Conventional 8×8 Dadda multiplier. (c)
Approximate 8×8 Dadda multiplier1. (d) Approximate 8×8 Dadda multiplier2. (e) Proposed approximate
8×8 Dadda multiplier3. (f) Proposed approximate 8×8 Dadda multiplier4. (g) Proposed approximate 8×8
Dadda multiplier5 (h) Proposed approximate 8×8 Dadda multiplier6. (i) Proposed approximate 8×8 Dadda
multiplier7.
For audio processing, where output image quality is not a concern [Han et al. 2013],
these designs are more suitable.
5. CONCLUSIONS
In this article, three novel approximate 4-2 compressors, three approximate 8×8 Dadda
multiplier designs based on them, two approximate 5-2 compressors, and two approximate
8×8 Dadda multiplier designs based on approximate 4-2 compressor3 and the approximate
5-2 compressors have been proposed. The proposed approximate 4-2 compressors give the
best results in power, delay, area, and accuracy compared to the approximate 4-2
compressors presented in Momeni et al. [2015]. In addition, the proposed approximate 5-2
compressors give the best results in power, delay, and area compared to an exact 5-2 compressor.
Among all approximate 4-2 and 5-2 compressors, the proposed approximate 4-2
compressor3 and approximate 5-2 compressor2 are well suited
for designing energy-efficient digital circuits. Thus, an approximate design approach in
compressors offers a significant advantage in terms of both circuit-level and error
metrics. Therefore, the proposed approximate 8×8 Dadda multiplier3 and approximate 8×8
Dadda multiplier7 are more suitable for energy-efficient VLSI architectures. Future
work should optimize logic for high-level designs, investigate approximate 7-2 compressors
to further improve the design metrics, and apply suitable approximate compressors to
DSP applications.
ACKNOWLEDGMENTS
The authors thank the ECE Department of Government College of Technology for providing necessary
support for project implementation.
REFERENCES
K. Bhardwaj, P. S. Mane, and J. Henkel. 2014. Power- and area-efficient approximate Wallace tree multiplier
for error-resilient systems. In Proceedings of the 15th International Symposium on Quality Electronic
Design (ISQED'14). 263–269. DOI:http://dx.doi.org/10.1109/ISQED.2014.6783335
Wilhelm Burger and Mark James Burge. 2009. Principles of Digital Image Processing: Fundamental
Techniques (1st ed.). Springer.
C. H. Chang, J. Gu, and M. Zhang. 2004. Ultra-low-voltage, low-power CMOS 4-2 and 5-2 com-
pressors for fast arithmetic circuits. IEEE Trans. Circ. Syst. 51, 10, 85–97. DOI:http://dx.doi.org/
10.1109/TCSI.2004.835683.
L. Dadda. 1965. Some schemes for parallel multipliers. Alta Freq. 34, 349–356.
Milos Ercegovac and Tomas Lang. 2004. Digital Arithmetic. Morgan Kaufmann.
V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy. 2011. IMPACT: Imprecise adders for
low-power approximate computing. In Proceedings of the IEEE/ACM International Symposium on Low
Power Electronics and Design. DOI:http://dx.doi.org/10.1109/ISLPED.2011.5993675.
Vaibhav Gupta, Debabrata Mohapatra, Anand Raghunathan, and Kaushik Roy. 2013. Low-power digital signal
processing using approximate adders. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 32, 1, 87–97.
DOI:http://dx.doi.org/10.1109/TCAD.2012.2217962.
J. Han and M. Orshansky. 2013. Approximate computing: An emerging paradigm for energy-efficient
design. In Proceedings of the IEEE European Test Symposium (ETS'13), 1–6.
DOI:http://dx.doi.org/10.1109/ETS.2013.6569370.
I. Koren. 1993. Computer Arithmetic Algorithms. Prentice Hall, Englewood Cliffs, NJ.
P. Kulkarni, P. Gupta, and M. Ercegovac. 2011. Trading accuracy for power with an underdesigned
multiplier architecture. In Proceedings of the 24th International Conference on VLSI Design. 346–351.
DOI:http://dx.doi.org/10.1109/VLSID.2011.51
K. Y. Kyaw, W. L. Goh, and K. S. Yeo. 2010. Low-power high-speed multiplier for error-tolerant
application. In Proceedings of the IEEE International Conference on Electron Devices and Solid-State Circuits
(EDSSC'10). 1–4. DOI:http://dx.doi.org/10.1109/EDSSC.2010.5713751.
Chaofan Li, Wei Luo, S. S. Sapatnekar, and Jiang Hu. 2015. Joint precision optimization and high-level
synthesis for approximate computing. In Proceedings of the 52nd Annual Design Automation
Conference (DAC'15), 1–6. DOI:http://dx.doi.org/10.1145/2744769.2744863
J. Liang, J. Han, and F. Lombardi. 2013. New metrics for the reliability of approximate and probabilistic
adders, IEEE Trans. Comput. 63, 9, 1760–1771. DOI:http://dx.doi.org/ 10.1109/TC.2012.146.
C. H. Lin and I. C. Lin. 2013. High accuracy approximate multiplier with error correction. IEEE 31st
International Conference on Computer Design (ICCD), 33–38. DOI:http://dx.doi.org/10.1109/ICCD.2013.
6657022
C. Liu, J. Han, and F. Lombardi. 2014. A low-power, high-performance approximate multiplier with
configurable partial error recovery. In Proceedings of DATE 2014, Dresden, Germany.
H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas. 2010. Bio-inspired imprecise computational blocks
for efficient VLSI implementation of soft-computing applications. IEEE Trans. Circ. Syst. 57, 850–862.
DOI:http://dx.doi.org/10.1109/TCSI.2009.2027626
A. Momeni, J. Han, P. Montuschi, and F. Lombardi. 2015. Design and analysis of approximate compressors
for multiplication. IEEE Trans. Comput. 984–994. DOI:http://dx.doi.org/10.1109/TC.2014.
Ravi Nair. 2010. Models for energy-efficient approximate computing. In Proceedings of the 16th
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'10), 359–360.
DOI:http://dx.doi.org/10.1145/1840845.1840921.
B. Parhami. 2010. Computer Arithmetic: Algorithms and Hardware Designs (2nd ed.). Oxford University
Press, New York, NY.
Swagath Venkataramani, Srimat T. Chakradhar, Kaushik Roy, and Anand Raghunathan. Approximate
computing and the quest for computing efficiency in DAC’13. In Proceedings of the 50th Annual Design
Automation Conference DOI:http://dx.doi.org/10.1145/2744769.2751163
C. S. Wallace. 1964. A suggestion for a fast multiplier. IEEE Trans. Electron. Comp. 13, 1, 14–17. DOI:http://dx.
doi.org/10.1109/PGEC.1964.263830