Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

Approximate Hybrid High Radix Encoding for


Energy-Efficient Inexact Multipliers
Vasileios Leon , Georgios Zervakis , Dimitrios Soudris, and Kiamal Pekmestzi

Abstract— Approximate computing forms a design alternative approximate circuits. The main targets are arithmetic units,
that exploits the intrinsic error resilience of various applications e.g., adders and multipliers, that are the core components in
and produces energy-efficient circuits with small accuracy loss. many embedded devices and hardware accelerators. Extensive
In this paper, we propose an approximate hybrid high radix
encoding for generating the partial products in signed multipli- research is reported in approximate adders [4]–[8], providing
cations that encodes the most significant bits with the accurate significant gains in terms of delay and power dissipation.
radix-4 encoding and the least significant bits with an approxi- However, research activities on the approximate multipliers
mate higher radix encoding. The approximations are performed [9]–[21] is less comprehensive compared with the respective
by rounding the high radix values to their nearest power of two. on approximate adders. In inexact multipliers, approximations
The proposed technique can be configured to achieve the desired
energy–accuracy tradeoffs. Compared with the accurate radix-4 can be applied on the partial product generation [16]–[18],
multiplier, the proposed multipliers deliver up to 56% energy as well as the partial product accumulation [9]–[11], [13].
and 55% area savings, when operating at the same frequency, Approximations on the partial product generation and approx-
while the imposed error is bounded by a Gaussian distribution imations on their accumulation are synergistic, and can be
with near-zero average. Moreover, the proposed multipliers are applied in collaboration in order to achieve higher power
compared with state-of-the-art inexact multipliers, outperforming
them by up to 40% in energy consumption, for similar error reduction [17], [18], [22]. Although significant research has
values. Finally, we demonstrate the scalability of our technique. been conducted in the partial product accumulation, research
activity on the approximation of the partial product generation
Index Terms— Approximate computing, error resiliency, low
power, radix encodings, signed multipliers. is still limited. Finally, another limitation of the existing
approximate multipliers is that the majority of them (see [9],
[10], [12], [19], [20]) does not examine signed multiplication.
I. I NTRODUCTION In this paper, targeting the design of inexact multipliers
by applying approximations on the partial product generation,
I N MODERN embedded systems and data centers, energy
efficiency is a mandatory design concern. Considering that
a large amount of application domains exhibits an intrinsic
we propose a novel approximate hybrid high radix encoding.
In the proposed technique, the most significant bits (MSBs)
error tolerance, e.g., digital signal processing (DSP), image of the multiplicand are encoded using the radix-4 encoding,
processing, data analytics, and data mining [1], [2], approx- whereas the k least significant bits (LSBs) are encoded using
imate computing appears as an effective solution to reduce a radix-2k (with k ≥ 4). To simplify the increased complexity
their power dissipation. In approximate computing, error has induced by the proposed hybrid encoding, the circuit for
been viewed as a commodity that can be traded for significant generating the partial products is approximated by altering
gains in cost (e.g., power, energy, and performance) [3], accordingly its truth table. Hence, the number of the partial
and as a result, it composes a promising design paradigm products decreases significantly and simpler tree architectures
targeting energy-efficient systems by extremely decreasing the are used for their accumulation, reducing the multiplier’s
power consumption of inherently error resilient applications. energy consumption, area, and critical path delay. The major
Specifically, approximate computing exploits the innate error contributions of this paper are summarized as follows.
tolerance of the respective applications and deliberately relaxes 1) We propose and enable the application of hybrid high
the correctness of some computations, in order to decrease radix encodings for the generation of energy-efficient
their power consumption and/or accelerate their execution. approximate multipliers, exceeding the increased hard-
Recently, targeting to take advantage of its benefits, mas- ware complexity of very high radix encodings.
sive research has been conducted in the field of hardware 2) The proposed technique can be applied to any multiplier
architecture and is reconfigurable, enabling the user to
Manuscript received June 22, 2017; revised September 12, 2017; accepted
October 10, 2017. This work was supported by the E.C. Program select the optimal per application energy–error tradeoff.
VINEYARD through H2020 Grant under Grant 687628. (Corresponding 3) An analytical error analysis is conducted, showing that
author: Vasileios Leon.) the output error of the proposed technique is bounded
The authors are with the School of Electrical and Computer
Engineering, National Technical University of Athens, Athens 15780, and predictable. Such a rigorous error analysis leads to
Greece (e-mail: vleon@microlab.ntua.gr; zervakis@microlab.ntua.gr; precise and a priori error estimation for any input distri-
dsoudris@microlab.ntua.gr; pekmes@microlab.ntua.gr). bution, without the need of time-consuming simulations.
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. 4) We show that the proposed technique outperforms
Digital Object Identifier 10.1109/TVLSI.2017.2767858 related state-of-the-art approximate signed multipliers in
1063-8210 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

2 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

terms of hardware and accuracy, achieving up to 40% counter that reduces the partial product stages, and then use it
less energy dissipation for comparable error values. to build larger multipliers. However, the delay is large due
More specifically, the proposed technique is applied to a to the array structure of the multiplier. Kyaw et al. [12]
16×16 bit multiplier and is evaluated using industrial strength propose a multiplier that divides the operands in two parts:
tools, i.e., Synopsys Design Compiler, PrimeTime, and Mentor an accurate multiplication-based part that includes the MSBs
Graphics ModelSim. Compared with the accurate multiplier, and an approximate nonmultiplication-based part for the LSBs
the proposed technique delivers up to 55% energy and area that does not generate partial products. This design delivers
reduction, for mean relative error up to 0.93%. Moreover, significant reductions in power and delay, but only for specific
compared with related state-of-the-art approximate computing input combinations. Moreover, Liu et al. [13] propose an
signed multipliers, i.e., [15], [17], [18], our technique out- approximate multiplier with configurable error recovery that
performs them significantly in terms of energy consumption uses inaccurate fast adders for the partial product additions.
and error. Finally, we examine the scalability of the proposed Although this multiplier delivers low power, the error imposed
technique and show that for the same error value, the delivered cannot be predicted, as it depends on the carry propagation.
energy savings increase as the multiplier size increases. Narayanamoorthy et al. [14] statically split the operands in
The rest of this paper is organized as follows. Section II three m-bit segments and perform the multiplication utilizing
overviews the related work in the field of approximate adders the segment that contains the most significant nonzero bit.
and multipliers. In Section III, the proposed approximate However, this approach exhibits small scalability, as m should
hybrid high radix encoding is introduced and its application be at least half the operand bit-width in order to keep the accu-
for generating inexact multipliers is described. In Section IV, racy in acceptable limits. Reference [15] extended this idea to
the error analysis of the proposed approximate multipliers is enable dynamic range multiplications. The dynamic partition
presented, while in Section V, we evaluate the proposed tech- technique requires extra components for the signed multi-
nique by comparing it with related state-of-the-art approximate plications, adding significant hardware overhead. Recently,
multipliers. Finally, Section VI concludes this paper. Zendegani et al. [21] proposed a multiplier that rounds the
input operands into the nearest exponent of two. Finally, [23]
replaces the floating-point operations with fixed-point ones,
II. R ELATED W ORK
and by applying the proposed stochastic rounding, achieves
In this section, prior works in the field of approximate good accuracy results in training deep neural networks while
multipliers, related to our proposed design, are discussed. delivering high energy savings by limiting the data precision
As far as the approximate adders are concerned, [4] produces representation.
approximate adders by simplifying the logic used in the The modified Booth encoding is commonly used in signed
adder. Gupta et al. [5] design imprecise full adder cells by multipliers [16]–[18], [24], [25]. Although these techniques
approximating their logic function and then use them to build perform fast multiplications, the number of the partial prod-
approximate adders. Nevertheless, it is not clear how these ucts is not reduced in most cases, in contrast with our
adders can be used in different tree architectures and how design. Zervakis et al. [16] introduce the partial product
the error scales in the case of multioperand accumulation. perforation technique, where they omit the generation of
Reference [6] proposes an approximate adder that consists of some partial products based on the modified Booth encoding.
an accurate and an inaccurate part. Reference [7] designs a Jiang et al. [17] propose an approximate radix-8 booth multi-
multiplier in which the LSBs of the additions are approximated plier that uses an approximate adder for producing ±3 A, and
by applying bitwise OR to the respective input bits. In [8], combine this idea with the truncation method. Recently, Liu
a fast approximate adder is produced by limiting the carry et al. [18] designed approximate modified Booth encoders by
propagation, based on a proof that the longest carry chain modifying its K-Map, and combined them with an approximate
in an n-bit adder is log n. However, despite the fact that compressor.
all these techniques demonstrate the benefit of approximate In this paper, an approximate hybrid high radix encod-
computing, their fixed functionality and low-level design limit ing for designing energy-error efficient inexact multipliers
further improvements in efficiency. is proposed. The difference with the existing related work
Regarding the approximations in multiplication schemes, is that it concerns approximations on the generation of the
Kulkarni et al. [9] propose an underdesigned 2 × 2 inaccurate partial products, and can be combined with any accumulation
multiplier block to produce the partial products, and then use it technique, approximate or not. Another significant aspect of
as a building block to design larger multipliers. This technique this paper is that the error imposed depends only on the
is characterized by high error and hardware overhead for the configuration parameter k, and as a result, it can be calculated
error detection and correction due to the small building block. without the need for exhaustive simulations. Consequently,
Momeni et al. [10] propose two approximate 4:2 compressors a precise estimation of the output quality can be extracted
to accumulate the partial products by modifying the respective for the application’s inputs, giving the flexibility to target the
accurate truth table, and then use them to build approximate maximum energy reduction for a specific error bound.
multipliers. The negative aspect of this technique is that there
is no parameter to adjust the accuracy of the multiplier. Lin III. A PPROXIMATE H YBRID H IGH R ADIX M ULTIPLIERS
and Lin [11] introduce a high accuracy approximate 4 × 4 High radix encodings offer partial products reduction, and
Wallace tree multiplier by employing a 4:2 approximate as a result, their accumulation requires smaller trees, leading
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

LEON et al.: APPROXIMATE HYBRID HIGH RADIX ENCODING FOR ENERGY-EFFICIENT INEXACT MULTIPLIERS 3

to energy, area, and/or delay savings. However, high radix TABLE I


encodings require complex encoding and partial product gener- A CCURATE R ADIX -4 E NCODING TABLE
ation circuits, negating thus the benefits of the partial products
reduction. In this section, the proposed hybrid high radix
encoding and the performed approximations for simplifying its
circuit complexity are presented. In the proposed technique,
the multiplicand B is encoded using the approximate high
radix encoding, generating B̃, and the approximate multiplica-
tion A· B̃ is performed. Finally, its adaptation on inexact 16-bit
hardware multipliers is described, and a qualitative analysis is
conducted, targeting to estimate the potential area gains.

A. Hybrid High Radix Encoding


In the proposed hybrid high radix encoding, B is divided in
TABLE II
two groups: the MSB part of n − k bits and the LSB part of k
A PPROXIMATE R ADIX -2k E NCODING TABLE
bits. The configuration parameter, k ≥ 4, is an even number,
namely, k = 2m: m ∈ Z, with m ≥ 2. The MSB part is
encoded using the radix-4 (modified Booth) encoding, while
the LSB part is encoded with the high radix-2k encoding

n−2 
n/2−1
k
B = −bn−1 2n−1 + bi 2i = j 4 + y0
y R4 j R2
(1)
i=0 j =k/2
k≥4

where

j = −2b2 j +1 + b2 j + b2 j −1
y R4 (2)
and
k
y0R2 = −2k−1 bk−1 + 2k−2 bk−2 + · · · + b0 . (3)
The radix-4 encoding includes (n − k)/2 digits y R4
j ∈
k
{0, ±1, ±2}, while y0R2 ∈ {0, ±1, ±2, ±3, . . . , ± (2k−1 − 1), Table I presents the accurate radix-4 encoding. The output
−2k−1 } corresponds to the radix-2k encoding. Overall, B is signals sign j , ×1 j , and ×2 j define the radix-4 digit y R4
j . Their
encoded with (n − k)/2 + 1 digits. logic equations are the following:
The above hybrid high radix encoding is characterized
sign j = b2 j +1 (7)
by increased logic complexity, due to the high radix values
k
of y0R2 that are not power of two, and thus, an approximate ×1 j = b2 j −1 ⊕ b2 j (8)
version is proposed. However, in order to retain high accuracy, ×2 j = (b2 j +1 ⊕ b2 j ) · (b2 j −1 ⊕ b2 j ). (9)
the radix-4 encoding of the MSB is performed accurately.
In particular, in the approximate encoding, all the values that Table II presents the approximate radix-2k encoding. The
are not power of two and the k − 4 smallest powers of two logic equations of the encoding signals that define the radix-
k
as well, are rounded to the nearest of the 4 largest powers of 2k digit ŷ0R2 are the following:
two or 0, so that the sum of all the values of the approximate
k sign = bk−1 (10)
digit ŷ0R2 is 0. We choose to keep only the four largest powers k−4
of two, so that the radix-2k encoding circuit requires only ×2 = (bk−2 · bk−3 · bk−4 + bk−2 · bk−3 · bk−4 )
about the double area in comparison with the accurate radix-4 · (bk−4 ⊕ bk−5 ) (11)
encoder. Therefore, B is approximately encoded as follows: ×2 k−3
= bk−1 · bk−2 · (bk−3 · bk−4 · bk−5 + bk−3 · bk−4 )

n/2−1
k
+ bk−1 · bk−2 · (bk−3 · bk−4 · bk−5 + bk−3 · bk−4 )
B̃ = j 4 + ŷ0
y R4 j R2
(4) (12)
j =k/2
k≥4 ×2 k−2
= bk−2 · bk−3 · (bk−1 + bk−4 )
where + bk−2 · bk−3 · (bk−1 + bk−4 ) (13)
×2 k−1
= bk−1 · bk−2 · bk−3 + bk−1 · bk−2 · bk−3 . (14)
j ∈ {0, ±1, ±2}
y R4 (5)
and The effectiveness of the approximate hybrid radix encoding
technique is explored with its application to 16-bit signed
k
ŷ0R2 ∈ {0, ±2k−4 , ±2k−3 , ±2k−2 , ±2k−1 }. (6) numbers, for k = 6, 8, 10, namely, the LSBs are encoded using
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

4 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Fig. 1. i-bit partial product generator based on (a) accurate radix-4 encoding and the approximate (b) radix-64, (c) radix-256, and (d) radix-1024 encoding.
ai : i-bit of operand A, ai = ai ⊕ sign.

Fig. 2. Partial product tree based on the hybrid encoding of accurate radix-4 and approximate (a) radix-64, (b) radix-256, and (c) radix-1024 encoding. :
partial product bits from the approximate high radix encoding. •: partial product bits from the accurate radix-4 encoding. and : inverted MSBs of the
partial products.  and ◦: sign factors.

the radix-64, radix-256, and radix-1024 encoding, respectively. TABLE III


In the radix-64 encoding, the bits of B are grouped as in PARTIAL P RODUCTS PER R ADIX E NCODING

y6R4 y4R4 y0R64


        
b15b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 . (15)
        
y7R4 y5R4 y3R4
The following values of the digit y0R64 are rounded to their
nearest power of two: ±1, ±3, ±5, ±6, ±7, ±9, . . . , ±15,
±17, . . . , ±31 are rounded to ±4, ±8, ±16, or ±32, while
the smallest powers of two, i.e., ±1 and ±2, are rounded In Fig. 1, four partial product generators are presented,
to 0 or ±4. i.e., the circuit of the accurate radix-4 encoding and the ones
In radix-1024 encoding, the bits of B are grouped as of the three approximate high radix encodings discussed in
follows: Section III-A. The partial products created from each encoding
are shown in Table III. In addition, the three hybrid high radix
y6R4 y0R1024 encodings create the partial product trees shown in Fig. 2.
     
b15 b14 b13 b12 b11b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 . (16) The trees also include the encoding’s correction term (constant
      terms and sign factors).
y7R4 y5R4 The implementation of the partial product accumulation can
Similarly, the nonpowers of two are rounded be chosen by the designer. In this paper, an accurate Wallace
to ±64, ±128, ±256, or ±512, and the tree [26] is used to implement the partial product’s sum,
smallest powers of two (±1, ±2, ±4, ±8, whereas the two outputs produced by the Wallace tree are
±16, ±32) are rounded to 0 or ±64. The encoder’s added using a prefix (fast) adder. Overall, the multiplication
inputs are the bits b9 , b8 , . . . , b0 , the approximate radix-1024 circuit consists of stages of operand hybrid radix encod-
digit is ŷ0R1024 ∈ {0, ±64, ±128, ±256, ±512}, and the output ing, partial product generation, partial product accumulation,
signals that define ŷ0R1024 are sign, ×64, ×128, ×256, ×512. and final addition. The proposed approximate multipliers are
named RAD2k , showing the selected approximate high radix
encoding, e.g., RAD64, RAD256, and RAD1024.
B. Partial Product Generation
In the proposed hybrid encoding, the n − k MSBs of B are
encoded with the accurate radix-4 encoding, while the k LSBs C. Unit Gate Model
are encoded with an approximate radix-2k encoding. The accu- The advantage of the approximate hybrid high radix mul-
rate radix-4 encoder produces the signals defined in (7)–(9), tipliers is their simple logic, resulting in fast operation and
whereas the approximate high radix encoder produces the low power performance. Although overhead is added because
signals of (10)–(14). Overall, there is a reduction of k/2 − 1 of the encoding circuits, it is insignificant because of the
partial products generated in the multiplication A · B̃. approximations made. Also, this offset is compensated with
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

LEON et al.: APPROXIMATE HYBRID HIGH RADIX ENCODING FOR ENERGY-EFFICIENT INEXACT MULTIPLIERS 5

TABLE IV approximate multipliers (RAD64, RAD256, and RAD1024).


U NIT G ATES A REA OF THE R ADIX E NCODERS AND The results concern the radix encoding, the partial product
PARTIAL P RODUCTS G ENERATORS
generation, the partial product accumulation, and the final
addition. The area reduction achieved in our designs is up
to 33.16%. However, the unit gate model is a simplified
model (e.g., does not take into account the interconnections
complexity), but gives a rough estimation for the area reduc-
tion. The exact hardware reductions achieved in our designs
are presented in Section V.

TABLE V IV. E RROR A NALYSIS


U NIT G ATES A REA S AVINGS OF RAD M ULTIPLIERS
A critical issue in approximate computing designs is the
error imposed due to the approximations and how it affects
the final results. In this section, an error evaluation analysis
of the proposed multipliers is presented. In [29], an error
evaluation metric is proposed, being called mean relative error
distance (MRED). RED is defined as the arithmetic difference
between the accurate product and the approximate product
divided by the accurate product, while MRED is the average
of REDs for a set of inputs (22n are all the possible input
combinations for a n × n bit multiplier). The possibility of
having RED smaller than 2% (PRED) is another important
error metric used in [17] and [18] for evaluating approximate
radix multipliers.
the partial product generators that deliver low area, and the Considering the multiplication of two n-bit numbers,
reduction of the number of the partial products. A and B, with B̃ being the approximate encoded operand,
In order to give a theoretical evaluation of the proposed RED is calculated by
multipliers, an area gate model is included. The area eval-
uation is performed by using the unit gate model of [27]: |AB − A B̃| |B − B̃|
RED AB = =
a XOR-2 gate counts as 2 unit gates, an AND-2 or an OR-2 |AB| |B|
 R2k k
gate is equal to 1 unit gate, and a NOT gate is equal to 0.5 unit y − ŷ0 
R2
gate. According to this model, Table IV presents the number =  0 . (17)
n/2−1 R4 k
 
 j =k/2 y j 4 + y0 
j R2
of unit gates of each circuit used in our multipliers.
Overall, the accurate radix-4 n-bit multiplier uses n/2 k≥4
radix-4 encoders and n 2 /2 radix-4 partial product generators to Hence, RED depends only on B and RED AB = RED B .
produce the n/2 n-bit partial products. Similarly, the proposed Assuming that p A and p B are the probability density func-
approximate radix-2k , with k ≥ 4, requires (n − k)/2 radix-4 tions (pdfs) of A and B, respectively, MRED is calculated
encoders, 1 radix-2k encoder, n(n − k)/2 radix-4 partial by
product generators, and n + k − 2 radix-2k partial product 
generators. MRED = p A (A) p B (B)RED AB
∀A,B
The partial product accumulation is implemented by   
a Wallace tree [26]. The multioperand accumulation = p A (A) p B (B)RED B = p B (B)RED B
of m n-bit numbers in carry-save form requires m − 2 n-bit ∀A ∀B ∀B
carry save adders, while the n-bit carry-save adder requires n  k
|y0R2 − ŷ0R2 |
k

full adders [28]. Also, a full adder is equal to 7 unit gates = p B (B)  . (18)
n/2−1 R4 
according to the used model. Therefore, since the accurate ∀B  y 4 j + y R2k 
 j =k/2 j 0 
radix-4 multiplier accumulates m = n/2 + 1 operands (includ- k≥4
ing the correction term), the partial product accumulation In the case of uniform distribution, MRED is obtained by
requires 7n(n − 2)/2 unit gates. Similarly, the partial product setting p B (B) = 1/2n .
tree of the proposed multipliers consists of m = n/2 − k/2 + 2 In Fig. 3, the pdfs of RED for the RAD64 multiplier and
operands, and thus, the accumulation requires 7n(n − k)/2 the distribution of RED according to the encoded operand B
unit gates. Regarding the final accurate 2n-bit addition, it is for the multipliers RAD2k , k = 6, 8, 10 are presented. The
performed by a fast adder that consists of 2n half adders of 3 pdf of RED is similar for the other two multipliers and shows
unit gates, n log2 2n propagate group circuits of 3 unit gates, that the probability of having small RED is high. The error
and 2n XOR-2 gates. distribution follows a Gaussian distribution with bounds for
Based on the previous analysis, Table V presents the maximum error, i.e., the error is very small for large values of
total number of area unit gates required in the accurate B. In particular, RAD64 exhibits RED more than 10% only
radix-4 16-bit multiplier (ACCR4) and the proposed if B ∈ [−100, 100]. The respective intervals for RAD256 and
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

6 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Fig. 3. (a) PDF of RED for RAD64 and RED values of (b) RAD64, (c) RAD256, and (d) RAD1024.

RAD1024 are [−400, 400] and [−1500, 1500]. However, these TABLE VI
intervals represent a very small fraction of all the possible A PPROXIMATE M ULTIPLIERS C OMPARISON
values of B in 16-bit arithmetic. Therefore, the possibility of
having RED more than 10% is 0.001%, 0.004%, and 0.02%
for RAD64, RAD256, and RAD1024, respectively.
Finally, the provided equations enable the calculation of
the error propagation in more complex hardware accelerators.
Consider two approximate products P1 and P2 , produced
by the proposed RAD multipliers, with errors ε p1 and ε p2 ,
respectively. Using [30], the propagated errors of the addition
y and the multiplication p are obtained by

y + ε y = P1 + P2 + ε p1 + ε p2 + ε+ (19)
p + ε p = P1 · P2 + P1 · ε p2 + P2 · ε p1 + ε p1 · ε p2 + ε p1 p2
(20)
where ε+ is the error introduced by the addition operation and
ε p1 , ε p2 , and ε p1 p2 are obtained from (17). All the examined multipliers are implemented in Verilog,
In conclusion, in terms of error, the proposed RAD multi- synthesized using Synopsys Design Compiler and the TSMC
pliers constitute very efficient approximate designs. Even the 65-nm standard cell library and simulated with Mentor Graph-
RAD1024, that induces the highest error, features MRED only ics ModelSim. The critical path delay and the area are reported
0.93% and PRED 93.26%. Moreover, although high errors by Synopsys Design Compiler, while the power consumption
may be produced, the probability of such an error is negligible. is measured with Synopsys PrimeTime-PX tool using all the
For example, for the RAD1024 multiplier, the probability of possible input combinations. All the designs are synthesized
RED being higher than 10% is 0.02%. and simulated at 1 V, i.e., the nominal supply voltage. All
the examined multipliers are evaluated and compared as both
V. E XPERIMENTAL R ESULTS stand-alone circuits and as components of real-life applica-
tions. The applications used in this evaluation consist of typical
This section includes the evaluation of the proposed design
examples where approximate computing is applied.
in terms of accuracy (error) and hardware (delay, area, power,
and energy). For comparisons, the accurate radix-4 and radix-8
multipliers are implemented, as well as the approximate multi- A. Evaluation at Circuit Level
pliers [15], [17], [18]. R8ABM1 and R8ABM2-15 [17] use the In this section, we experimentally evaluate the examined
radix-8 encoding and calculate 3 A with an adder multipliers as stand-alone circuits. Table VI presents the
that operates approximately for the 8 LSBs. Addi- comparison results of all the approximate multipliers in terms
tionally, R8ABM2-15 truncates 15 bits of the partial of hardware and accuracy. Power and area have been measured
products. R4ABM1-14, R4ABM1-16, R4ABM2-14, and at the critical path delay of each multiplier, while energy is
R4ABM2-16 [18] use the accurate radix-4 encoder for produc- defined as the product of power and delay. Regarding the
ing the MSBs of the partial product tree, and an approximate error values, both MRED and PRED have been obtained by
radix-4 encoder for its LSBs (14 and 16), with R4ABM2 per- performing exhaustive simulations over all the possible input
forming more approximations than R4ABM1. The radix mul- combinations. The error metrics of our multipliers have also
tipliers of [17] and [18] are implemented to accumulate been calculated and verified in software, by using (17) and (18)
accurately the partial products with a Wallace tree, just as presented in Section IV.
the proposed multipliers of this paper. Finally, DRUM6 [15] RAD64 gives the best MRED among all the designs,
selects a 6-bit segment starting from the leading nonzero bit as approximations are performed only in the 6 LSBs of B.
of the input operands and sets the LSB of the truncated values In addition, the approximated values have small absolute
to “1.” distance from the accurate. In R4ABM1-14, a very small error
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

LEON et al.: APPROXIMATE HYBRID HIGH RADIX ENCODING FOR ENERGY-EFFICIENT INEXACT MULTIPLIERS 7

Fig. 4. Delivered savings of the proposed approximate multipliers compared


with the accurate radix-4 multiplier when operating at the same frequency.

Fig. 5. MRED–energy tradeoff of the examined approximate multipliers.


is introduced, considering that only four entries of the radix-4
encoder’s K-Map are modified [18], while R8ABM1 has a
are attained by the RAD multipliers, and it is shown that the
small MRED, because the approximate adder used to cal-
gains become even more significant when using higher radix
culate 3 A involves both accurate and approximate parts.
encoding, since less partial products are generated. In these
However, the errors of R8ABM1 are of greater significance,
designs, the number of the partial products is reduced from 8
namely, there are large absolute distances from the exact
to 6, 5, and 4, and thus, the area and the depth of the Wallace
results, a factor that can damage the functionality of a real-
tree are reduced by up to 50%. Also, using only powers of two
life application. In R8ABM2-15, the error does not increase
for the encoding reduces significantly both area and energy,
significantly, although the 15 LSBs of the partial products are
with RAD1024 delivering reductions of about 56% in energy
truncated. This is due to the correction “1” added to the 17th
consumption and 55% in area.
bit that compensates the error generated by the truncated lower
Fig. 5 shows a comprehensive comparison of all the mul-
part [17]. Finally, DRUM6 delivers the worst MRED among
tipliers by considering both MRED and energy consumption
all the multipliers with 1.47%, although its error distribution is
in a Pareto diagram. The purpose of the Pareto diagram is
bounded. In conclusion, the proposed approximate high radix
to extract the most efficient designs in terms of energy–error.
multipliers deliver MRED smaller than 1% as stand-alone
As shown, the Pareto front is formed exclusively by RAD
circuits, meaning that they can be used in real-life applications
multipliers. Hence, the proposed RAD designs constitute the
that tolerate the mean errors of 5% or 10% in total [31].
most efficient approximate multiplier alternatives, compared
DRUM6 has the worst critical path delay, as it initially
with all the examined state-of-the-art approximate multipliers,
calculates the absolute value of the products [15], while our
as they exhibit the best energy–MRED tradeoff. RAD64 attains
approximate multipliers RAD64, RAD256, and RAD1024 are
the smallest MRED value and should be preferred when error
the fastest circuits, with the delays of 0.72, 0.69, and 0.65,
is of high importance, while RAD1024 delivers the highest
respectively. Regarding energy, R8ABM1 delivers the worst
energy reduction. Finally, RAD256 features significant energy
energy consumption among all the multipliers, as it may use
reduction for a very small error value.
an approximate adder to calculate 3 A, but it contains a
The presented analysis regards fixed point arithmetic, but
7-bit precise adder to ensure the accuracy of the results [17].
it can also be extended in floating point multiplication. In
The energy dissipation is improved in R8ABM2-15 due to
the latter case, the proposed RAD multipliers can replace
the truncated partial product bits. However, as our multipliers
the fixed point multiplication of the mantissas performed
feature small MRED, the truncation technique can be applied
in a floating point multiplier. As the induced error is very
to them as well, in order to give even better hardware
small, it affects only the mantissa’s accuracy and the exponent
reductions. The radix-4 multipliers of [18] exhibit similar
value (computed by an accurate adder) is calculated precisely.
results, with R4ABM2-16 delivering the highest area and
Therefore, similar error values are expected in case of floating
energy savings (among the multipliers of [18]). The energy
point multiplication. Finally, the energy savings depend on the
dissipation can be further reduced by using approximate partial
floating point representation and in the case of high precision,
product accumulation [18]. RAD1024 features the best energy
similar or even higher savings are expected.
consumption, with DRUM6 being second, delivering however
quite large MRED compared with the other multipliers.
Next, we evaluate the proposed designs by exploring the B. Evaluation at Application Level
energy and area savings compared with the accurate radix-4 In this section, we evaluate the proposed multipliers over
multiplier. The main target of approximate computing is to real-life applications, in order to demonstrate the benefits of
trade accuracy for energy gains, i.e., generate good enough replacing the accurate radix-4 multiplier with a RAD one,
results at comparable performance and lower energy con- and examine its impact on the application’s accuracy. The
sumption. Thus, in this evaluation, we leverage the delay examined benchmarks are the Sobel edge detector, a 32-tap
slack between the RAD multipliers and the accurate one, and finite impulse response (FIR) filter, and the matrix multiplica-
targeting energy efficiency, we synthesize and simulate the tion. These benchmarks belong to the image processing, DSP,
RAD designs at the critical path delay of the accurate radix-4. and machine learning domains, i.e., typical domains where
Hence, in Fig. 4, we examine the delivered savings of RAD approximate computing is applied.
multipliers compared with the accurate radix-4 when they Table VII shows the delivered reductions in critical path
operate at the same frequency. High energy and area reduction delay, energy consumption, and area when using the respective
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

8 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

TABLE VII
H ARDWARE G AINS AND A CCURACY R ESULTS OF R EAL -L IFE A PPLICATIONS W ITH A PPROXIMATE M ULTIPLIERS

approximate multiplier in the examined applications. The


presented percentages are calculated with respect to the delay,
energy consumption, and area of every benchmark when using
the accurate radix-4 multiplier. Additionally, the accuracy
results of each benchmark are reported. Next, we present the
benchmarks and evaluate the approximate multipliers accord-
ing to the attained savings and output error.
1) Sobel Edge Detector: A common filter for finding the
boundaries of the objects of an image is the discrete Sobel
operator [32]. The Sobel operator consists of two 3 × 3
kernels that are applied linearly to the image to compute an
approximation of the gradient of the image intensity function.
The first kernel computes the changes in brightness in the
horizontal orientation, and the second one in the vertical
orientation. The image is filtered in 3 × 3 regions. Let I be a
3 × 3 region, the new value of its central pixel is computed
by the application of the Sobel kernels S1 and S2

Ik (2, 2) = I (i, j )Sk (i, j ), k = 1, 2. (21)
∀i, j ∈[1,3]

Then, all the changes in brightness are computed as follows:


Fig. 6. (a) Input image. (b) Edge detection with accurate multiplier.
E(2, 2) = I1 (2, 2)2 + I2 (2, 2)2 . (22) (c) RAD64. (d) RAD256. (e) RAD1024. (f) R8ABM1. (g) R8ABM2-15.
(h) R4ABM1-14. (i) R4ABM1-16. (j) R4ABM2-14. (k) R4ABM2-16.
For evaluating the performance of edge detection with the (l) DRUM6.
16 × 16 bit approximate multipliers, an input image of 16-bit
pixels is used. The percentage of the edges detected using the
approximate multiplier over those detected using the accurate of the edges, respectively. In conclusion, the proposed RAD
radix-4 is used as quality metric. Fig. 6 presents the original multipliers are the most efficient in detecting the edges of an
input image, and the output images produced by the linear image while delivering high energy savings.
filtering using an accurate and the approximate multipliers. 2) FIR Filter: In signal processing, filtering is a process
As shown in Table VII, the RAD multipliers achieve the for removing unwanted features from a signal, e.g., specific
highest accuracy by detecting almost all the edges (more frequencies or frequency bands. We evaluate the efficacy of
than 99.87% edges are detected). Moreover, they deliver up the approximate multipliers when applied to an FIR filter.
to 54.84% energy, 46.02% area, and 8.38% delay reduc- Specifically, we designed a 32-tap low-pass FIR filter with
tion, respectively. DRUM6 attains the highest energy reduc- cutoff frequency equal to 20 kHz and 16-bit coefficients.
tion (55.26%) that is slightly larger than the one of RAD1024. The multiplications are performed with the approximate
However, DRUM6 induces a delay overhead of 7.78% and multipliers, while the additions are accurate. The quality
detects 1% less edges compared with RAD1024. Further- metric is MRED, where RED is defined as the absolute RED
more, it is worth to note that although R8ABM1 and between the output of the accurate FIR filter and that of the
R8ABM2-15 feature small MRED values, they perform poorly approximate one. The 200 000 random generated inputs are
in this benchmark, since they detect only 58.90% and 54.41% used in this evaluation. The output of an FIR filter equals to
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

LEON et al.: APPROXIMATE HYBRID HIGH RADIX ENCODING FOR ENERGY-EFFICIENT INEXACT MULTIPLIERS 9

More specifically, we study the impact of the proposed tech-


nique, in terms of energy savings and delay gains, over scaled
multiplier size, i.e., 16, 24, and 32 bit. We consider the accu-
rate radix-4 multiplier as the base architecture for comparisons,
and a constant MRED value of 0.28% as a quality constraint.
We select 0.28% as the error bound, since it is the MRED
value of the 16-bit RAD256. In Fig. 5, RAD256 attained
a very efficient energy-error tradeoff, i.e., significant error
Fig. 7. Energy and delay savings of the proposed RAD multipliers for scaled reduction for very small error. Fig. 7 shows how the reduc-
multiplier bit-width and constant error bound MRED = 0.28%. tions in delay and energy consumption scale with respect
to the multiplier’s size. For the 16-bit multiplier, the high
the convolution of the input signal and the filter’s coefficients, radix-28 encoding (RAD256) is considered, while the 24-bit
and is calculated by accumulating the multiplications of the and 32-bit multipliers are designed with the high radix-216 and
discrete input values with the coefficients. radix-224 encodings, respectively, in order to deliver the same
As presented in Table VII, the RAD designs deliver both mean error. The reductions in energy consumption and delay
the highest accuracy (smallest MRED) and highest energy scale up to 22% and 64%, respectively, for a small mean error
reduction. The proposed multipliers achieve up to 34.14% equal to 0.28%. Thus, the results show that as the multiplier’s
energy reduction for small MRED value 3.60%. As in the size increases, the proposed technique achieves higher gains
previous benchmark, R8ABM1 and R8ABM2-15 feature high in critical path delay and energy dissipation for the same error.
error value, i.e., 13.51% and 33.98%, respectively. More- The scaling behavior is theoretically confirmed, as the RAD
over, in this benchmark, DRUM6 achieves significant energy designs of Fig. 7 have MRED = 0.28% while generating
reduction (21.43%) for considerable error (9.18%). However, 37.50%, 58.33%, and 68.75% less partial products for 16, 24,
these values are worse compared with the proposed RAD1024, and 32 bit, respectively. More specifically, each one generates
i.e., 12.71% less energy reduction and 5.58% higher error. 5 partial products in total, i.e., 1 approximate and 4 accurate,
Finally, compared with the other two benchmarks, the induced in contrast to the 8, 12, and 16 partial products generated,
error by the approximate multipliers in the FIR filter is larger. respectively, by the accurate radix-4 multiplier.
This is explained by the fact that in FIR, every output depends
on the previous ones, and as a result, an error occurred in a VI. C ONCLUSION
previous computation is propagated to the next ones.
In this paper, we propose an approximate hybrid high
3) Matrix Multiplication: Matrix multiplication has signif-
radix encoding for generating the partial products of a signed
icant application in the fields of signal processing, graphics,
multiplier. The MSBs of the multiplicand are encoded with
and machine learning. It is an operation that is used either as a
the accurate radix-4 encoding, while its k LSBs are encoded
common stand-alone function or forms a kernel of more com-
with an approximate high radix-2k encoding, with k being
plex computations in larger designs. For evaluation purposes,
a configuration parameter that adjusts the tradeoff between
we created a circuit that performs the matrix multiplication
accuracy and energy consumption. The error of the pro-
3 × 3 tilling. The quality of this benchmark is evaluated again
posed technique follows a Gaussian distribution with near
with MRED, as it is defined in the FIR filter. Similarly, 200 000
zero average. Compared with state-of-the-art inexact multi-
random generated inputs are used in this evaluation.
pliers, the proposed ones constitute better approximate design
As shown in Table VII, all the multipliers perform very well
alternative, outperforming them in both energy consumption
in terms of accuracy. Compared with the examined inexact
and accuracy. Furthermore, we showed the efficiency of the
multipliers, the proposed RAD designs achieve higher energy
proposed multipliers when applied in real-life applications.
savings for similar error values. RAD1024 delivers 37.7%
Finally, the proposed technique is scalable, delivering higher
energy reduction for only 0.57% MRED value. R4AMB1-
energy savings for the same error, as the multiplier’s size
14 and RAD64 feature the smallest MRED (0.07%), but
increases.
RAD64 attains 2% higher energy reduction compared with
R4AMB1-14. Similarly, R4AMB2-14 and RAD256 exhibit
similar MRED, while RAD256 delivers 18.66% higher energy R EFERENCES
savings. [1] V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan, “Analysis
In conclusion, the utilization of the proposed RAD designs and characterization of inherent application resilience for approximate
computing,” in Proc. ACM/IEEE Design Autom. Conf., May 2013,
in real-life application delivers significant energy, area, and pp. 1–9.
delay reductions for a small and acceptable application [2] S. T. Chakradhar and A. Raghunathan, “Best-effort computing:
error [31]. Moreover, they outperform all the examined state- Re-thinking parallel software and hardware,” in Proc. ACM/IEEE Design
Autom. Conf., Jun. 2010, pp. 865–870.
of-the-art approximate multipliers in all the metrics examined.
[3] A. Lingamneni, C. Enz, K. Palem, and C. Piguet, “Highly energy-
efficient and quality-tunable inexact FFT accelerators,” in Proc. IEEE
Custom Integr. Circuits Conf., Sep. 2014, pp. 1–4.
C. Evaluation of RAD Scaling [4] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, “Low-power dig-
ital signal processing using approximate adders,” IEEE Trans. Comput.-
In this section, we examine the scalability of the pro- Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137,
posed technique in terms of increased multiplier’s bit-width. Jan. 2013.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

10 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

[5] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and [29] J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of
K. Roy, “IMPACT: IMPrecise adders for low-power approximate approximate and probabilistic adders,” IEEE Trans. Comput., vol. 62,
computing,” in Proc. 17th IEEE/ACM Int. Symp. Low-Power Electron. no. 9, pp. 1760–1771, Sep. 2013.
Design, Aug. 2011, pp. 409–414. [30] C. Li, W. Luo, S. S. Sapatnekar, and J. Hu, “Joint precision opti-
[6] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, “Design of mization and high level synthesis for approximate computing,” in Proc.
low-power high-speed truncation-error-tolerant adder and its application ACM/IEEE Design Autom. Conf., Jun. 2015, pp. 1–6.
in digital signal processing,” IEEE Trans. Very Large Scale Integr. (VLSI) [31] J. Park, E. Amaro, D. Mahajan, B. Thwaites, and H. Esmaeilzadeh,
Syst., vol. 18, no. 8, pp. 1225–1229, Aug. 2010. “AxGames: Towards crowdsourcing quality target determination in
[7] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-inspired approximate computing,” in Proc. ACM Int. Conf. Architectural Support
imprecise computational blocks for efficient VLSI implementation of Programm. Lang. Oper. Syst., Apr. 2016, pp. 623–636.
soft-computing applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, [32] I. Sobel and G. Feldman, “A 3×3 isotropic gradient operator for image
vol. 57, no. 4, pp. 850–862, Apr. 2010. processing,” in Stanford Artificial Intelligence Project. 1968.
[8] A. K. Verma, P. Brisk, and P. Ienne, “Variable latency speculative
addition: A new paradigm for arithmetic circuit design,” in Proc. Design,
Vasileios Leon received the Diploma degree from
Autom. Test Eur., Mar. 2008, pp. 1250–1255.
the Department of Computer Engineering and Infor-
[9] P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading accuracy for power
matics, University of Patras, Greece, in 2016. He
with an underdesigned multiplier architecture,” in Proc. 24th Int. Conf.
is currently working toward the Ph.D. degree at
VLSI Design, Jan. 2011, pp. 346–351.
the School of Electrical and Computer Engineering,
[10] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and
National Technical University of Athens, Greece.
analysis of approximate compressors for multiplication,” IEEE Trans.
His research interests include approximate com-
Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
puting, low-power VLSI designs, arithmetic circuits
[11] C.-H. Lin and I.-C. Lin, “High accuracy approximate multiplier with designs, FPGA designs, digital signal processing,
error correction,” in Proc. IEEE 31st Int. Conf. Comput. Design, and image processing.
Oct. 2013, pp. 33–38.
[12] K. Y. Kyaw, W. L. Goh, and K. S. Yeo, “Low-power high-speed mul-
tiplier for error-tolerant application,” in Proc. IEEE Int. Conf. Electron
Devices Solid-State Circuits, Dec. 2010, pp. 1–4. Georgios Zervakis received the Diploma from the
[13] C. Liu, J. Han, and F. Lombardi, “A low-power, high-performance Department of Electrical and Computer Engineering,
approximate multiplier with configurable partial error recovery,” in Proc. National Technical University of Athens, Greece,
Design, Autom. Test Eur., Mar. 2014, pp. 1–4. in 2012, where he is currently working toward the
[14] S. Narayanamoorthy, H. A. Moghaddam, Z. Liu, T. Park, and N. S. Kim, Ph.D. degree in the field of digital and microproces-
“Energy-efficient approximate multiplication for digital signal process- sor system design.
ing and classification applications,” IEEE Trans. Very Large Scale His research interests include approximate com-
Integr. (VLSI) Syst., vol. 23, no. 6, pp. 1180–1184, Jun. 2015. puting, hardware acceleration on cloud, arithmetic
[15] S. Hashemi, R. I. Bahar, and S. Reda, “DRUM: A dynamic range circuits, and low power design.
unbiased multiplier for approximate applications,” in Proc. IEEE/ACM
Int. Conf. Comput.-Aided Design, Nov. 2015, pp. 418–425.
[16] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris, and K. Pekmestzi, Dimitrios Soudris received the Diploma and the
“Design-efficient approximate multiplication circuits through partial Ph.D. degree in electrical engineering from the
product perforation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., University of Patras, Greece, in 1987 and 1992,
vol. 24, no. 10, pp. 3105–3117, Oct. 2016. respectively.
[17] H. Jiang, J. Han, F. Qiao, and F. Lombardi, “Approximate radix-8 booth He was working as a Professor at the Department
multipliers for low-power and high-performance operation,” IEEE Trans. of Electrical and Computer Engineering, Democritus
Comput., vol. 65, no. 8, pp. 2638–2644, Aug. 2016. University of Thrace for 13 years since 1995. He
[18] W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, and F. Lombardi, “Design is currently working as Associate Professor at the
of approximate radix-4 booth multipliers for error-tolerant computing,” School of Electrical and Computer Engineering,
IEEE Trans. Comput., vol. 66, no. 8, pp. 1435–1441, Aug. 2017. Department of Computer Science, National Techni-
[19] Z. Babić, A. Avramović, and P. Bulić, “An iterative logarithmic multi- cal University of Athens, Greece. His research inter-
plier,” Microprocess. Microsyst., vol. 35, no. 1, pp. 23–33, Feb. 2011. ests include embedded systems design, reconfigurable architectures, reliability
[20] J. N. Mitchell, “Computer multiplication and division using binary log- and low power VLSI design. He has published more than 340 papers in
arithms,” IRE Trans. Electron. Comput., vol. EC-11, no. 4, pp. 512–517, international journals and conferences. Also, he is coauthor/coeditor in seven
Aug. 1962. books of Kluwer and Springer. He is leader and principal investigator in
[21] R. Zendegani, M. Kamal, M. Bahadori, A. Afzali-Kusha, and numerous research projects funded from the Greek Government and Industry,
M. Pedram, “RoBa multiplier: A rounding-based approximate multiplier European Commission, ENIAC-JU and European Space Agency.
for high-speed yet energy-efficient digital signal processing,” IEEE Dr. Soudris has served as General Chair and Program Chair in several con-
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 2, pp. 393–401, ferences. Also, he received an award from INTEL and IBM for the EU project
Feb. 2017. LPGD 25256, awards in ASP-DAC 05 and VLSI 05 for EU AMDREL project
[22] G. Zervakis, S. Xydis, K. Tsoumanis, D. Soudris, and K. Pekmestzi, IST-2001-34379.
“Hybrid approximate multiplier architectures for improved power-
accuracy trade-offs,” in Proc. IEEE/ACM Int. Symp. Low Power Elec-
tron. Design, Jul. 2015, pp. 79–84. Kiamal Pekmestzi received the Diploma in elec-
[23] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep trical engineering from the National Technical Uni-
learning with limited numerical precision,” in Proc. Int. Conf. versity of Athens, Greece, in 1975 and the Ph.D.
Mach. Learn., Jul. 2015, pp. 1737–1746. degree in electrical engineering from the University
[24] K.-J. Cho, K.-C. Lee, J.-G. Chung, and K. K. Parhi, “Design of low- of Patras, Greece, in 1981.
error fixed-width modified booth multiplier,” IEEE Trans. Very Large From 1975 to 1981 he was a Research Fellow
Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522–531, May 2004. at the Electronics Department, Nuclear Research
[25] J. P. Wang, S. R. Kuang, and S. C. Liang, “High-accuracy fixed-width Center “Demokritos.” From 1983 to 1985, he was
modified booth multipliers for lossy applications,” IEEE Trans. Very a Professor at the Higher School of Electronics,
Large Scale Integr. (VLSI) Syst., vol. 19, no. 1, pp. 52–60, Jan. 2011. Athens, Greece. Since 1985, he has been with the
[26] C. S. Wallace, “A suggestion for a fast multiplier,” IEEE Trans. Electron. National Technical University of Athens, where he
Comput., vol. EC-13, no. 1, pp. 14–17, Feb. 1964. is currently a Professor at the Department of Electrical and Computer Engi-
[27] A. Tyagi, “A reduced-area scheme for carry-select adders,” IEEE Trans. neering. His research interests include efficient implementation of arithmetic
Comput., vol. 42, no. 10, pp. 1163–1170, Oct. 1993. operations, design of embedded and microprocessor-based systems, architec-
[28] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems tures for reconfigurable computing, VLSI implementation of cryptography,
Perspective, 4th ed. Reading, MA, USA: Addison-Wesley, 2010. and DSP algorithms.

You might also like