Professional Documents
Culture Documents
Approximate Hybrid High Radix Encoding For Energy-Efficient Inexact Multipliers
Approximate Hybrid High Radix Encoding For Energy-Efficient Inexact Multipliers
Abstract— Approximate computing forms a design alternative approximate circuits. The main targets are arithmetic units,
that exploits the intrinsic error resilience of various applications e.g., adders and multipliers, that are the core components in
and produces energy-efficient circuits with small accuracy loss. many embedded devices and hardware accelerators. Extensive
In this paper, we propose an approximate hybrid high radix
encoding for generating the partial products in signed multipli- research is reported in approximate adders [4]–[8], providing
cations that encodes the most significant bits with the accurate significant gains in terms of delay and power dissipation.
radix-4 encoding and the least significant bits with an approxi- However, research activities on the approximate multipliers
mate higher radix encoding. The approximations are performed [9]–[21] is less comprehensive compared with the respective
by rounding the high radix values to their nearest power of two. on approximate adders. In inexact multipliers, approximations
The proposed technique can be configured to achieve the desired
energy–accuracy tradeoffs. Compared with the accurate radix-4 can be applied on the partial product generation [16]–[18],
multiplier, the proposed multipliers deliver up to 56% energy as well as the partial product accumulation [9]–[11], [13].
and 55% area savings, when operating at the same frequency, Approximations on the partial product generation and approx-
while the imposed error is bounded by a Gaussian distribution imations on their accumulation are synergistic, and can be
with near-zero average. Moreover, the proposed multipliers are applied in collaboration in order to achieve higher power
compared with state-of-the-art inexact multipliers, outperforming
them by up to 40% in energy consumption, for similar error reduction [17], [18], [22]. Although significant research has
values. Finally, we demonstrate the scalability of our technique. been conducted in the partial product accumulation, research
activity on the approximation of the partial product generation
Index Terms— Approximate computing, error resiliency, low
power, radix encodings, signed multipliers. is still limited. Finally, another limitation of the existing
approximate multipliers is that the majority of them (see [9],
[10], [12], [19], [20]) does not examine signed multiplication.
I. I NTRODUCTION In this paper, targeting the design of inexact multipliers
by applying approximations on the partial product generation,
I N MODERN embedded systems and data centers, energy
efficiency is a mandatory design concern. Considering that
a large amount of application domains exhibits an intrinsic
we propose a novel approximate hybrid high radix encoding.
In the proposed technique, the most significant bits (MSBs)
error tolerance, e.g., digital signal processing (DSP), image of the multiplicand are encoded using the radix-4 encoding,
processing, data analytics, and data mining [1], [2], approx- whereas the k least significant bits (LSBs) are encoded using
imate computing appears as an effective solution to reduce a radix-2k (with k ≥ 4). To simplify the increased complexity
their power dissipation. In approximate computing, error has induced by the proposed hybrid encoding, the circuit for
been viewed as a commodity that can be traded for significant generating the partial products is approximated by altering
gains in cost (e.g., power, energy, and performance) [3], accordingly its truth table. Hence, the number of the partial
and as a result, it composes a promising design paradigm products decreases significantly and simpler tree architectures
targeting energy-efficient systems by extremely decreasing the are used for their accumulation, reducing the multiplier’s
power consumption of inherently error resilient applications. energy consumption, area, and critical path delay. The major
Specifically, approximate computing exploits the innate error contributions of this paper are summarized as follows.
tolerance of the respective applications and deliberately relaxes 1) We propose and enable the application of hybrid high
the correctness of some computations, in order to decrease radix encodings for the generation of energy-efficient
their power consumption and/or accelerate their execution. approximate multipliers, exceeding the increased hard-
Recently, targeting to take advantage of its benefits, mas- ware complexity of very high radix encodings.
sive research has been conducted in the field of hardware 2) The proposed technique can be applied to any multiplier
architecture and is reconfigurable, enabling the user to
Manuscript received June 22, 2017; revised September 12, 2017; accepted
October 10, 2017. This work was supported by the E.C. Program select the optimal per application energy–error tradeoff.
VINEYARD through H2020 Grant under Grant 687628. (Corresponding 3) An analytical error analysis is conducted, showing that
author: Vasileios Leon.) the output error of the proposed technique is bounded
The authors are with the School of Electrical and Computer
Engineering, National Technical University of Athens, Athens 15780, and predictable. Such a rigorous error analysis leads to
Greece (e-mail: vleon@microlab.ntua.gr; zervakis@microlab.ntua.gr; precise and a priori error estimation for any input distri-
dsoudris@microlab.ntua.gr; pekmes@microlab.ntua.gr). bution, without the need of time-consuming simulations.
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. 4) We show that the proposed technique outperforms
Digital Object Identifier 10.1109/TVLSI.2017.2767858 related state-of-the-art approximate signed multipliers in
1063-8210 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
terms of hardware and accuracy, achieving up to 40% counter that reduces the partial product stages, and then use it
less energy dissipation for comparable error values. to build larger multipliers. However, the delay is large due
More specifically, the proposed technique is applied to a to the array structure of the multiplier. Kyaw et al. [12]
16×16 bit multiplier and is evaluated using industrial strength propose a multiplier that divides the operands in two parts:
tools, i.e., Synopsys Design Compiler, PrimeTime, and Mentor an accurate multiplication-based part that includes the MSBs
Graphics ModelSim. Compared with the accurate multiplier, and an approximate nonmultiplication-based part for the LSBs
the proposed technique delivers up to 55% energy and area that does not generate partial products. This design delivers
reduction, for mean relative error up to 0.93%. Moreover, significant reductions in power and delay, but only for specific
compared with related state-of-the-art approximate computing input combinations. Moreover, Liu et al. [13] propose an
signed multipliers, i.e., [15], [17], [18], our technique out- approximate multiplier with configurable error recovery that
performs them significantly in terms of energy consumption uses inaccurate fast adders for the partial product additions.
and error. Finally, we examine the scalability of the proposed Although this multiplier delivers low power, the error imposed
technique and show that for the same error value, the delivered cannot be predicted, as it depends on the carry propagation.
energy savings increase as the multiplier size increases. Narayanamoorthy et al. [14] statically split the operands in
The rest of this paper is organized as follows. Section II three m-bit segments and perform the multiplication utilizing
overviews the related work in the field of approximate adders the segment that contains the most significant nonzero bit.
and multipliers. In Section III, the proposed approximate However, this approach exhibits small scalability, as m should
hybrid high radix encoding is introduced and its application be at least half the operand bit-width in order to keep the accu-
for generating inexact multipliers is described. In Section IV, racy in acceptable limits. Reference [15] extended this idea to
the error analysis of the proposed approximate multipliers is enable dynamic range multiplications. The dynamic partition
presented, while in Section V, we evaluate the proposed tech- technique requires extra components for the signed multi-
nique by comparing it with related state-of-the-art approximate plications, adding significant hardware overhead. Recently,
multipliers. Finally, Section VI concludes this paper. Zendegani et al. [21] proposed a multiplier that rounds the
input operands into the nearest exponent of two. Finally, [23]
replaces the floating-point operations with fixed-point ones,
II. R ELATED W ORK
and by applying the proposed stochastic rounding, achieves
In this section, prior works in the field of approximate good accuracy results in training deep neural networks while
multipliers, related to our proposed design, are discussed. delivering high energy savings by limiting the data precision
As far as the approximate adders are concerned, [4] produces representation.
approximate adders by simplifying the logic used in the The modified Booth encoding is commonly used in signed
adder. Gupta et al. [5] design imprecise full adder cells by multipliers [16]–[18], [24], [25]. Although these techniques
approximating their logic function and then use them to build perform fast multiplications, the number of the partial prod-
approximate adders. Nevertheless, it is not clear how these ucts is not reduced in most cases, in contrast with our
adders can be used in different tree architectures and how design. Zervakis et al. [16] introduce the partial product
the error scales in the case of multioperand accumulation. perforation technique, where they omit the generation of
Reference [6] proposes an approximate adder that consists of some partial products based on the modified Booth encoding.
an accurate and an inaccurate part. Reference [7] designs a Jiang et al. [17] propose an approximate radix-8 booth multi-
multiplier in which the LSBs of the additions are approximated plier that uses an approximate adder for producing ±3 A, and
by applying bitwise OR to the respective input bits. In [8], combine this idea with the truncation method. Recently, Liu
a fast approximate adder is produced by limiting the carry et al. [18] designed approximate modified Booth encoders by
propagation, based on a proof that the longest carry chain modifying its K-Map, and combined them with an approximate
in an n-bit adder is log n. However, despite the fact that compressor.
all these techniques demonstrate the benefit of approximate In this paper, an approximate hybrid high radix encod-
computing, their fixed functionality and low-level design limit ing for designing energy-error efficient inexact multipliers
further improvements in efficiency. is proposed. The difference with the existing related work
Regarding the approximations in multiplication schemes, is that it concerns approximations on the generation of the
Kulkarni et al. [9] propose an underdesigned 2 × 2 inaccurate partial products, and can be combined with any accumulation
multiplier block to produce the partial products, and then use it technique, approximate or not. Another significant aspect of
as a building block to design larger multipliers. This technique this paper is that the error imposed depends only on the
is characterized by high error and hardware overhead for the configuration parameter k, and as a result, it can be calculated
error detection and correction due to the small building block. without the need for exhaustive simulations. Consequently,
Momeni et al. [10] propose two approximate 4:2 compressors a precise estimation of the output quality can be extracted
to accumulate the partial products by modifying the respective for the application’s inputs, giving the flexibility to target the
accurate truth table, and then use them to build approximate maximum energy reduction for a specific error bound.
multipliers. The negative aspect of this technique is that there
is no parameter to adjust the accuracy of the multiplier. Lin III. A PPROXIMATE H YBRID H IGH R ADIX M ULTIPLIERS
and Lin [11] introduce a high accuracy approximate 4 × 4 High radix encodings offer partial products reduction, and
Wallace tree multiplier by employing a 4:2 approximate as a result, their accumulation requires smaller trees, leading
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
LEON et al.: APPROXIMATE HYBRID HIGH RADIX ENCODING FOR ENERGY-EFFICIENT INEXACT MULTIPLIERS 3
where
j = −2b2 j +1 + b2 j + b2 j −1
y R4 (2)
and
k
y0R2 = −2k−1 bk−1 + 2k−2 bk−2 + · · · + b0 . (3)
The radix-4 encoding includes (n − k)/2 digits y R4
j ∈
k
{0, ±1, ±2}, while y0R2 ∈ {0, ±1, ±2, ±3, . . . , ± (2k−1 − 1), Table I presents the accurate radix-4 encoding. The output
−2k−1 } corresponds to the radix-2k encoding. Overall, B is signals sign j , ×1 j , and ×2 j define the radix-4 digit y R4
j . Their
encoded with (n − k)/2 + 1 digits. logic equations are the following:
The above hybrid high radix encoding is characterized
sign j = b2 j +1 (7)
by increased logic complexity, due to the high radix values
k
of y0R2 that are not power of two, and thus, an approximate ×1 j = b2 j −1 ⊕ b2 j (8)
version is proposed. However, in order to retain high accuracy, ×2 j = (b2 j +1 ⊕ b2 j ) · (b2 j −1 ⊕ b2 j ). (9)
the radix-4 encoding of the MSB is performed accurately.
In particular, in the approximate encoding, all the values that Table II presents the approximate radix-2k encoding. The
are not power of two and the k − 4 smallest powers of two logic equations of the encoding signals that define the radix-
k
as well, are rounded to the nearest of the 4 largest powers of 2k digit ŷ0R2 are the following:
two or 0, so that the sum of all the values of the approximate
k sign = bk−1 (10)
digit ŷ0R2 is 0. We choose to keep only the four largest powers k−4
of two, so that the radix-2k encoding circuit requires only ×2 = (bk−2 · bk−3 · bk−4 + bk−2 · bk−3 · bk−4 )
about the double area in comparison with the accurate radix-4 · (bk−4 ⊕ bk−5 ) (11)
encoder. Therefore, B is approximately encoded as follows: ×2 k−3
= bk−1 · bk−2 · (bk−3 · bk−4 · bk−5 + bk−3 · bk−4 )
n/2−1
k
+ bk−1 · bk−2 · (bk−3 · bk−4 · bk−5 + bk−3 · bk−4 )
B̃ = j 4 + ŷ0
y R4 j R2
(4) (12)
j =k/2
k≥4 ×2 k−2
= bk−2 · bk−3 · (bk−1 + bk−4 )
where + bk−2 · bk−3 · (bk−1 + bk−4 ) (13)
×2 k−1
= bk−1 · bk−2 · bk−3 + bk−1 · bk−2 · bk−3 . (14)
j ∈ {0, ±1, ±2}
y R4 (5)
and The effectiveness of the approximate hybrid radix encoding
technique is explored with its application to 16-bit signed
k
ŷ0R2 ∈ {0, ±2k−4 , ±2k−3 , ±2k−2 , ±2k−1 }. (6) numbers, for k = 6, 8, 10, namely, the LSBs are encoded using
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 1. i-bit partial product generator based on (a) accurate radix-4 encoding and the approximate (b) radix-64, (c) radix-256, and (d) radix-1024 encoding.
ai : i-bit of operand A, ai = ai ⊕ sign.
Fig. 2. Partial product tree based on the hybrid encoding of accurate radix-4 and approximate (a) radix-64, (b) radix-256, and (c) radix-1024 encoding. :
partial product bits from the approximate high radix encoding. •: partial product bits from the accurate radix-4 encoding. and : inverted MSBs of the
partial products. and ◦: sign factors.
LEON et al.: APPROXIMATE HYBRID HIGH RADIX ENCODING FOR ENERGY-EFFICIENT INEXACT MULTIPLIERS 5
full adders [28]. Also, a full adder is equal to 7 unit gates = p B (B) . (18)
n/2−1 R4
according to the used model. Therefore, since the accurate ∀B y 4 j + y R2k
j =k/2 j 0
radix-4 multiplier accumulates m = n/2 + 1 operands (includ- k≥4
ing the correction term), the partial product accumulation In the case of uniform distribution, MRED is obtained by
requires 7n(n − 2)/2 unit gates. Similarly, the partial product setting p B (B) = 1/2n .
tree of the proposed multipliers consists of m = n/2 − k/2 + 2 In Fig. 3, the pdfs of RED for the RAD64 multiplier and
operands, and thus, the accumulation requires 7n(n − k)/2 the distribution of RED according to the encoded operand B
unit gates. Regarding the final accurate 2n-bit addition, it is for the multipliers RAD2k , k = 6, 8, 10 are presented. The
performed by a fast adder that consists of 2n half adders of 3 pdf of RED is similar for the other two multipliers and shows
unit gates, n log2 2n propagate group circuits of 3 unit gates, that the probability of having small RED is high. The error
and 2n XOR-2 gates. distribution follows a Gaussian distribution with bounds for
Based on the previous analysis, Table V presents the maximum error, i.e., the error is very small for large values of
total number of area unit gates required in the accurate B. In particular, RAD64 exhibits RED more than 10% only
radix-4 16-bit multiplier (ACCR4) and the proposed if B ∈ [−100, 100]. The respective intervals for RAD256 and
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 3. (a) PDF of RED for RAD64 and RED values of (b) RAD64, (c) RAD256, and (d) RAD1024.
RAD1024 are [−400, 400] and [−1500, 1500]. However, these TABLE VI
intervals represent a very small fraction of all the possible A PPROXIMATE M ULTIPLIERS C OMPARISON
values of B in 16-bit arithmetic. Therefore, the possibility of
having RED more than 10% is 0.001%, 0.004%, and 0.02%
for RAD64, RAD256, and RAD1024, respectively.
Finally, the provided equations enable the calculation of
the error propagation in more complex hardware accelerators.
Consider two approximate products P1 and P2 , produced
by the proposed RAD multipliers, with errors ε p1 and ε p2 ,
respectively. Using [30], the propagated errors of the addition
y and the multiplication p are obtained by
y + ε y = P1 + P2 + ε p1 + ε p2 + ε+ (19)
p + ε p = P1 · P2 + P1 · ε p2 + P2 · ε p1 + ε p1 · ε p2 + ε p1 p2
(20)
where ε+ is the error introduced by the addition operation and
ε p1 , ε p2 , and ε p1 p2 are obtained from (17). All the examined multipliers are implemented in Verilog,
In conclusion, in terms of error, the proposed RAD multi- synthesized using Synopsys Design Compiler and the TSMC
pliers constitute very efficient approximate designs. Even the 65-nm standard cell library and simulated with Mentor Graph-
RAD1024, that induces the highest error, features MRED only ics ModelSim. The critical path delay and the area are reported
0.93% and PRED 93.26%. Moreover, although high errors by Synopsys Design Compiler, while the power consumption
may be produced, the probability of such an error is negligible. is measured with Synopsys PrimeTime-PX tool using all the
For example, for the RAD1024 multiplier, the probability of possible input combinations. All the designs are synthesized
RED being higher than 10% is 0.02%. and simulated at 1 V, i.e., the nominal supply voltage. All
the examined multipliers are evaluated and compared as both
V. E XPERIMENTAL R ESULTS stand-alone circuits and as components of real-life applica-
tions. The applications used in this evaluation consist of typical
This section includes the evaluation of the proposed design
examples where approximate computing is applied.
in terms of accuracy (error) and hardware (delay, area, power,
and energy). For comparisons, the accurate radix-4 and radix-8
multipliers are implemented, as well as the approximate multi- A. Evaluation at Circuit Level
pliers [15], [17], [18]. R8ABM1 and R8ABM2-15 [17] use the In this section, we experimentally evaluate the examined
radix-8 encoding and calculate 3 A with an adder multipliers as stand-alone circuits. Table VI presents the
that operates approximately for the 8 LSBs. Addi- comparison results of all the approximate multipliers in terms
tionally, R8ABM2-15 truncates 15 bits of the partial of hardware and accuracy. Power and area have been measured
products. R4ABM1-14, R4ABM1-16, R4ABM2-14, and at the critical path delay of each multiplier, while energy is
R4ABM2-16 [18] use the accurate radix-4 encoder for produc- defined as the product of power and delay. Regarding the
ing the MSBs of the partial product tree, and an approximate error values, both MRED and PRED have been obtained by
radix-4 encoder for its LSBs (14 and 16), with R4ABM2 per- performing exhaustive simulations over all the possible input
forming more approximations than R4ABM1. The radix mul- combinations. The error metrics of our multipliers have also
tipliers of [17] and [18] are implemented to accumulate been calculated and verified in software, by using (17) and (18)
accurately the partial products with a Wallace tree, just as presented in Section IV.
the proposed multipliers of this paper. Finally, DRUM6 [15] RAD64 gives the best MRED among all the designs,
selects a 6-bit segment starting from the leading nonzero bit as approximations are performed only in the 6 LSBs of B.
of the input operands and sets the LSB of the truncated values In addition, the approximated values have small absolute
to “1.” distance from the accurate. In R4ABM1-14, a very small error
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
LEON et al.: APPROXIMATE HYBRID HIGH RADIX ENCODING FOR ENERGY-EFFICIENT INEXACT MULTIPLIERS 7
TABLE VII
H ARDWARE G AINS AND A CCURACY R ESULTS OF R EAL -L IFE A PPLICATIONS W ITH A PPROXIMATE M ULTIPLIERS
LEON et al.: APPROXIMATE HYBRID HIGH RADIX ENCODING FOR ENERGY-EFFICIENT INEXACT MULTIPLIERS 9
[5] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and [29] J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of
K. Roy, “IMPACT: IMPrecise adders for low-power approximate approximate and probabilistic adders,” IEEE Trans. Comput., vol. 62,
computing,” in Proc. 17th IEEE/ACM Int. Symp. Low-Power Electron. no. 9, pp. 1760–1771, Sep. 2013.
Design, Aug. 2011, pp. 409–414. [30] C. Li, W. Luo, S. S. Sapatnekar, and J. Hu, “Joint precision opti-
[6] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, “Design of mization and high level synthesis for approximate computing,” in Proc.
low-power high-speed truncation-error-tolerant adder and its application ACM/IEEE Design Autom. Conf., Jun. 2015, pp. 1–6.
in digital signal processing,” IEEE Trans. Very Large Scale Integr. (VLSI) [31] J. Park, E. Amaro, D. Mahajan, B. Thwaites, and H. Esmaeilzadeh,
Syst., vol. 18, no. 8, pp. 1225–1229, Aug. 2010. “AxGames: Towards crowdsourcing quality target determination in
[7] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-inspired approximate computing,” in Proc. ACM Int. Conf. Architectural Support
imprecise computational blocks for efficient VLSI implementation of Programm. Lang. Oper. Syst., Apr. 2016, pp. 623–636.
soft-computing applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, [32] I. Sobel and G. Feldman, “A 3×3 isotropic gradient operator for image
vol. 57, no. 4, pp. 850–862, Apr. 2010. processing,” in Stanford Artificial Intelligence Project. 1968.
[8] A. K. Verma, P. Brisk, and P. Ienne, “Variable latency speculative
addition: A new paradigm for arithmetic circuit design,” in Proc. Design,
Vasileios Leon received the Diploma degree from
Autom. Test Eur., Mar. 2008, pp. 1250–1255.
the Department of Computer Engineering and Infor-
[9] P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading accuracy for power
matics, University of Patras, Greece, in 2016. He
with an underdesigned multiplier architecture,” in Proc. 24th Int. Conf.
is currently working toward the Ph.D. degree at
VLSI Design, Jan. 2011, pp. 346–351.
the School of Electrical and Computer Engineering,
[10] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and
National Technical University of Athens, Greece.
analysis of approximate compressors for multiplication,” IEEE Trans.
His research interests include approximate com-
Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
puting, low-power VLSI designs, arithmetic circuits
[11] C.-H. Lin and I.-C. Lin, “High accuracy approximate multiplier with designs, FPGA designs, digital signal processing,
error correction,” in Proc. IEEE 31st Int. Conf. Comput. Design, and image processing.
Oct. 2013, pp. 33–38.
[12] K. Y. Kyaw, W. L. Goh, and K. S. Yeo, “Low-power high-speed mul-
tiplier for error-tolerant application,” in Proc. IEEE Int. Conf. Electron
Devices Solid-State Circuits, Dec. 2010, pp. 1–4. Georgios Zervakis received the Diploma from the
[13] C. Liu, J. Han, and F. Lombardi, “A low-power, high-performance Department of Electrical and Computer Engineering,
approximate multiplier with configurable partial error recovery,” in Proc. National Technical University of Athens, Greece,
Design, Autom. Test Eur., Mar. 2014, pp. 1–4. in 2012, where he is currently working toward the
[14] S. Narayanamoorthy, H. A. Moghaddam, Z. Liu, T. Park, and N. S. Kim, Ph.D. degree in the field of digital and microproces-
“Energy-efficient approximate multiplication for digital signal process- sor system design.
ing and classification applications,” IEEE Trans. Very Large Scale His research interests include approximate com-
Integr. (VLSI) Syst., vol. 23, no. 6, pp. 1180–1184, Jun. 2015. puting, hardware acceleration on cloud, arithmetic
[15] S. Hashemi, R. I. Bahar, and S. Reda, “DRUM: A dynamic range circuits, and low power design.
unbiased multiplier for approximate applications,” in Proc. IEEE/ACM
Int. Conf. Comput.-Aided Design, Nov. 2015, pp. 418–425.
[16] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris, and K. Pekmestzi, Dimitrios Soudris received the Diploma and the
“Design-efficient approximate multiplication circuits through partial Ph.D. degree in electrical engineering from the
product perforation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., University of Patras, Greece, in 1987 and 1992,
vol. 24, no. 10, pp. 3105–3117, Oct. 2016. respectively.
[17] H. Jiang, J. Han, F. Qiao, and F. Lombardi, “Approximate radix-8 booth He was working as a Professor at the Department
multipliers for low-power and high-performance operation,” IEEE Trans. of Electrical and Computer Engineering, Democritus
Comput., vol. 65, no. 8, pp. 2638–2644, Aug. 2016. University of Thrace for 13 years since 1995. He
[18] W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, and F. Lombardi, “Design is currently working as Associate Professor at the
of approximate radix-4 booth multipliers for error-tolerant computing,” School of Electrical and Computer Engineering,
IEEE Trans. Comput., vol. 66, no. 8, pp. 1435–1441, Aug. 2017. Department of Computer Science, National Techni-
[19] Z. Babić, A. Avramović, and P. Bulić, “An iterative logarithmic multi- cal University of Athens, Greece. His research inter-
plier,” Microprocess. Microsyst., vol. 35, no. 1, pp. 23–33, Feb. 2011. ests include embedded systems design, reconfigurable architectures, reliability
[20] J. N. Mitchell, “Computer multiplication and division using binary log- and low power VLSI design. He has published more than 340 papers in
arithms,” IRE Trans. Electron. Comput., vol. EC-11, no. 4, pp. 512–517, international journals and conferences. Also, he is coauthor/coeditor in seven
Aug. 1962. books of Kluwer and Springer. He is leader and principal investigator in
[21] R. Zendegani, M. Kamal, M. Bahadori, A. Afzali-Kusha, and numerous research projects funded from the Greek Government and Industry,
M. Pedram, “RoBa multiplier: A rounding-based approximate multiplier European Commission, ENIAC-JU and European Space Agency.
for high-speed yet energy-efficient digital signal processing,” IEEE Dr. Soudris has served as General Chair and Program Chair in several con-
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 2, pp. 393–401, ferences. Also, he received an award from INTEL and IBM for the EU project
Feb. 2017. LPGD 25256, awards in ASP-DAC 05 and VLSI 05 for EU AMDREL project
[22] G. Zervakis, S. Xydis, K. Tsoumanis, D. Soudris, and K. Pekmestzi, IST-2001-34379.
“Hybrid approximate multiplier architectures for improved power-
accuracy trade-offs,” in Proc. IEEE/ACM Int. Symp. Low Power Elec-
tron. Design, Jul. 2015, pp. 79–84. Kiamal Pekmestzi received the Diploma in elec-
[23] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep trical engineering from the National Technical Uni-
learning with limited numerical precision,” in Proc. Int. Conf. versity of Athens, Greece, in 1975 and the Ph.D.
Mach. Learn., Jul. 2015, pp. 1737–1746. degree in electrical engineering from the University
[24] K.-J. Cho, K.-C. Lee, J.-G. Chung, and K. K. Parhi, “Design of low- of Patras, Greece, in 1981.
error fixed-width modified booth multiplier,” IEEE Trans. Very Large From 1975 to 1981 he was a Research Fellow
Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522–531, May 2004. at the Electronics Department, Nuclear Research
[25] J. P. Wang, S. R. Kuang, and S. C. Liang, “High-accuracy fixed-width Center “Demokritos.” From 1983 to 1985, he was
modified booth multipliers for lossy applications,” IEEE Trans. Very a Professor at the Higher School of Electronics,
Large Scale Integr. (VLSI) Syst., vol. 19, no. 1, pp. 52–60, Jan. 2011. Athens, Greece. Since 1985, he has been with the
[26] C. S. Wallace, “A suggestion for a fast multiplier,” IEEE Trans. Electron. National Technical University of Athens, where he
Comput., vol. EC-13, no. 1, pp. 14–17, Feb. 1964. is currently a Professor at the Department of Electrical and Computer Engi-
[27] A. Tyagi, “A reduced-area scheme for carry-select adders,” IEEE Trans. neering. His research interests include efficient implementation of arithmetic
Comput., vol. 42, no. 10, pp. 1163–1170, Oct. 1993. operations, design of embedded and microprocessor-based systems, architec-
[28] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems tures for reconfigurable computing, VLSI implementation of cryptography,
Perspective, 4th ed. Reading, MA, USA: Addison-Wesley, 2010. and DSP algorithms.