ALS Survey
0018-9219 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Universita della Svizzera Italiana. Downloaded on September 24,2020 at 09:18:11 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
III. ALS: STRUCTURAL NETLIST TRANSFORMATIONS
Among methods that implement structural netlist transformation, we review five different works that we group according to their adopted strategy:
1) greedy heuristics for netlist pruning;
2) greedy heuristics for netlist manipulation;
3) stochastic netlist transformation;
4) exhaustive exploration for netlist pruning.
where desc(i) are all direct descendants of node i.
After nodes have been ranked according to the desired metric, the GLP framework iteratively removes a node from the original circuit, setting its output to a constant; it then
form of simplification that leads to minimal area. Each expression is given both a value, defined as the reduction in the area it realizes when simplified, and a weight, defined as the introduced error; then, a knapsack formulation is

\[
M = \begin{bmatrix}
0 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{bmatrix}.
\]

² The order of the rows in M corresponds to inputs 000, 001, ..., 111.

area where BLASYS is used to synthesize approximate versions for circuit x2 from the LGSynth91 benchmark set [63]. x2 has ten primary inputs and seven primary outputs; thus, one can vary the factorization degree d from d = 6 down to d = 1. By changing d, BLASYS enables a graceful tradeoff between QoR and design area. For example, for x2, we can reduce the design area by 36% at the expense of introducing 2.79% bit flips in the truth table.
Since the construction of the matrix that represents the truth table is limited by the number of primary inputs of the circuit, BLASYS incorporates a hypergraph partitioning method that breaks down a large circuit into a number
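The role of the factorization degree d can be illustrated with a small Python sketch. It assumes an OR-of-ANDs (Boolean semiring) product between the two factors and measures quality as the fraction of flipped truth-table bits. The matrices and the names `boolean_product` and `bit_flip_rate` are hypothetical and are not taken from BLASYS; the factors are written by hand rather than computed by a factorization algorithm.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(b: Matrix, c: Matrix) -> Matrix:
    """Boolean matrix product: entry (i, j) is OR_k (b[i][k] AND c[k][j])."""
    d = len(c)          # factorization degree (inner dimension)
    n_out = len(c[0])   # number of circuit outputs
    return [
        [int(any(row[k] and c[k][j] for k in range(d))) for j in range(n_out)]
        for row in b
    ]


def bit_flip_rate(exact: Matrix, approx: Matrix) -> float:
    """Fraction of truth-table entries that differ between the two matrices."""
    flips = sum(
        e != a
        for exact_row, approx_row in zip(exact, approx)
        for e, a in zip(exact_row, approx_row)
    )
    return flips / (len(exact) * len(exact[0]))


# Hypothetical truth table: 2 inputs (rows 00, 01, 10, 11), 3 outputs.
M = [[0, 0, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1]]

# Degree d = 2: hand-picked factors that reproduce this table exactly.
B2 = [[0, 0], [1, 0], [0, 1], [1, 1]]
C2 = [[1, 1, 0], [0, 1, 1]]

# Degree d = 1: a smaller factorization that flips some entries.
B1 = [[0], [1], [1], [1]]
C1 = [[1, 1, 1]]

rate_d2 = bit_flip_rate(M, boolean_product(B2, C2))
rate_d1 = bit_flip_rate(M, boolean_product(B1, C1))
print(f"d=2: {rate_d2:.1%} bit flips, d=1: {rate_d1:.1%} bit flips")
```

In this toy case, lowering d from 2 to 1 shrinks the intermediate signal count from two to one at the cost of two flipped entries out of twelve, which mirrors the area versus bit-flip tradeoff reported for x2.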
V. APPROXIMATE HIGH-LEVEL SYNTHESIS
All the aforementioned ALS techniques operate on either the gate-based netlist or on a Boolean representation of the circuit. In contrast, the goal of AHLS is to integrate inexact operators as building blocks, in order to efficiently
Fig. 17. ABACUS approximation flow. An input design is parsed and captured in an AST form. The AST is then manipulated to generate approximate ASTs that are written back to Verilog and synthesized into approximate circuits.
applied to the AST to create approximate variants of the AST, which are then written back to Verilog, simulated for error evaluation, and synthesized for area and power evaluation. These operators include the following transformations.
1) Bit-width simplifications: This transformation reduces the bit width of signals by truncating them and removing some of the least significant bits. This transformation automatically leads to a reduction in the underlying logic resources that operate on these signals.
2) Variable to constant substitutions: Given the simulation results of the original circuit using the test benches, ABACUS analyzes the relative change in each signal, and if a signal has a relatively small range of values, then it is replaced by a constant that is equal to the mean value of the range. This substitution enables the logic synthesis tool to simplify the underlying logic.
3) Approximate arithmetic transformations: This transformation replaces an exact arithmetic operator (such as ∗ or +) by an approximate operator using the available approximate designs, such as DRUM [15], [16] and EvoApproxLib [35].
4) Expression transformations: For this transformation, ABACUS analyzes the arithmetic expressions in the design and replaces them by approximate versions; for example, a Verilog assignment statement, such as assign z = a*b+c*d, is replaced by assign z = a*(b+d), where the value of signal c is approximated by the value of signal a, saving one multiplier in the process.
5) Loop transformations: In ABACUS, all Verilog loops are unrolled, and the statements in the unrolled loop are then approximated using the aforementioned transformations.
An attractive feature of ABACUS is that it applies the transformation operators in a random fashion and then selects and retains the approximate designs that have the optimal tradeoff between accuracy and power consumption. This selection and optimization are achieved using an evolutionary approach based on the Non-dominated Sorting Genetic Algorithm (NSGA-II) [9]. ABACUS also prioritizes approximations on the critical path to create positive slack that can be exploited to reduce the voltage while keeping the frequency intact [37]. The reduction in voltage leads to additional power savings beyond those provided by the approximate logic.
Focusing on approximate arithmetic transformations, a few works (e.g., [25], [34], and [68]) propose a more detailed analysis of this transformation in order to choose a Pareto-optimal approximate operator from within a large library of approximate arithmetic operators [35]. For example, consider a statement, such as assign z=a+b, where the exact adder is replaced by an approximate adder from a library of a large number of approximate adders offering optimal accuracy–energy tradeoff for addition. To choose the best approximate adder, different approximate adders are substituted in the circuit, the impact of their errors is propagated to the circuit's outputs, and the final QoR and power consumption are evaluated. If the QoR meets a global error bound and the approximate design offers a new Pareto point in terms of the power-QoR tradeoff, then the approximate design is accepted. To determine the best precision, Constantinides et al. [6] formulated an integer linear program (ILP) to identify the best precision approximation for each arithmetic operator, such that the total energy is minimized subject to an upper bound on the output error variance. Their approach only considers bit-width simplifications as an avenue toward approximation. It is generalized by Li et al. [25], which also considers inexact implementations of adders and multipliers. In [25], the solution of the ILP formulation determines the best approximation for each arithmetic operator.
Raising the level of abstraction even further, Lee et al. [24] proposed an AHLS method that is capable of creating approximate designs from C-based descriptions of hardware systems. The main idea is to focus on applying approximations to loops since they are critical to the overall latency of digital systems. Rather than taking the ABACUS approach, where loops are unrolled and the resulting statements are approximated individually, the loop structure is kept intact. Instead, the iterations of a loop are split into a number of classes based on the impact of each iteration on accuracy. For example, a loop with N iterations can be transformed into two: a high-accuracy loop with N1 iterations and a low-accuracy loop with N2 iterations, such that N = N1 + N2. More aggressive approximations are then applied to the low-accuracy loop. For example, the low-accuracy loop can use low precision by using smaller bit widths for the variables, whereas the high-accuracy loop can use the fixed-point precision of the original loop. To analyze the iterations of a loop, the input C code is first compiled to an intermediate representation (IR) using LLVM [23]. Using test benches, the IR is then profiled to evaluate the data statistics, mobility, latency, and energy of every operation in the design. The profiling considers approximate substitutions for each IR statement and propagates the error to evaluate the QoR. This QoR estimation is done on a per iteration basis in order to classify each iteration based on its impact on accuracy. The iterations are then grouped together into classes based on their accuracy. For low-accuracy loops, the degree of approximation to the variables or arithmetic operations is chosen to minimize the latency or energy subject to QoR constraints.
Roldao-Lopes et al. [41] also proposed a method for approximating loop nests, considering the case of linear equation solvers. They observe that, when using iterative methods, accuracy can be increased by either running more iterations or by performing each computation more precisely. They showcase that a combination of both
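Two of the ABACUS transformations described in this section, bit-width simplification and the expression transformation from a*b+c*d to a*(b+d), can be mimicked in software to see what error they introduce. The following is only a behavioral sketch in Python, not ABACUS code: it assumes non-negative integer signals, and the helper names and operand values are hypothetical.

```python
def truncate_lsbs(x: int, k: int) -> int:
    """Bit-width simplification: clear the k least significant bits of x >= 0."""
    return (x >> k) << k


def exact_expr(a: int, b: int, c: int, d: int) -> int:
    """Original statement: z = a*b + c*d (two multipliers)."""
    return a * b + c * d


def approx_expr(a: int, b: int, c: int, d: int) -> int:
    """Transformed statement: z = a*(b+d), i.e. c is approximated by a."""
    return a * (b + d)


def relative_error(exact: int, approx: int) -> float:
    """Relative error of approx with respect to a nonzero exact value."""
    return abs(exact - approx) / abs(exact)


# Hypothetical operands in which c is close to a.
a, b, c, d = 100, 7, 97, 5

z_exact = exact_expr(a, b, c, d)
z_expr = approx_expr(a, b, c, d)
z_trunc = truncate_lsbs(z_exact, 4)

print("expression transformation error:", relative_error(z_exact, z_expr))
print("4-bit truncation error:", relative_error(z_exact, z_trunc))
```

The expression transformation is exact whenever c equals a and degrades as the two signals diverge, which is why such a rewrite only pays off when simulation shows the signals to be close. Truncating k bits bounds the absolute error by 2^k - 1 regardless of the operand values.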
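The loop-splitting idea attributed to Lee et al. [24] in this section, where a loop with N iterations becomes a high-accuracy loop with N1 iterations and a low-accuracy loop with N2 iterations, can be sketched on a dot product. This Python fragment is an illustrative assumption rather than their method: it presumes the iterations are already ordered so that the first N1 have the largest impact on accuracy, it models "low precision" as clearing k least significant bits of non-negative operands, and all names and values are hypothetical.

```python
from typing import Sequence


def truncate_lsbs(x: int, k: int) -> int:
    """Model a narrower datapath by clearing the k least significant bits."""
    return (x >> k) << k


def dot_exact(xs: Sequence[int], ws: Sequence[int]) -> int:
    """Original loop: N full-precision multiply-accumulate iterations."""
    return sum(x * w for x, w in zip(xs, ws))


def dot_split(xs: Sequence[int], ws: Sequence[int], n1: int, k: int) -> int:
    """Split loop: N1 = n1 exact iterations, then N2 = N - n1 reduced-precision ones."""
    high = sum(x * w for x, w in zip(xs[:n1], ws[:n1]))
    low = sum(
        truncate_lsbs(x, k) * truncate_lsbs(w, k)
        for x, w in zip(xs[n1:], ws[n1:])
    )
    return high + low


# Hypothetical data: the first two iterations dominate the result.
xs = [200, 150, 3, 5]
ws = [100, 90, 7, 6]

exact = dot_exact(xs, ws)
approx = dot_split(xs, ws, n1=2, k=2)
print(exact, approx, (exact - approx) / exact)
```

Because the truncated iterations contribute little to the sum here, the relative error stays small even though half of the iterations run at reduced precision. Choosing n1 and k is the part that the profiling step described above would drive.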
VI. COMPARATIVE EVALUATION OF ALS METHODS
As described in Sections III and IV, ALS methods proposed in the literature have different characteristics. Most prominently, a number of works (described in Section III) introduce transformation strategies performed on graphs representing gate-level netlists of the circuit to be approximated, while others (covered in Section IV) perform the automatic rewriting of Boolean functions to be realized in hardware. In this section, we conduct an empirical comparison of a number of available tools in the public domain and provide a number of insightful results.
Fig. 18. Approximate design variants of an 8-bit unsigned multiplier. Approximate designs were created with BLASYS, EvoApprox, GLP, and manually by truncating the least significant bits.
of an inexact circuit is indeed an upper threshold on the actual one, while the other two provide a soft bound based on Monte Carlo simulations on a subset of the circuit inputs, which they employ to assess the QoR at different approximation steps. Fig. 20 reports the area ratio obtained by different constraints on such maximum error. All techniques provide good to excellent approximations, especially when considering the stricter error constraint. Their results are comparable for small error values, while, for larger errors, CC tends to dominate GLP, and BLASYS dominates both, except for the 8-bit absolute difference, which it was unable to process. For example, BLASYS halves the area with an error of 0.1% for the 32-bit adder (top row, central), while, for the 4-bit multiply–add, both CC and BLASYS reduce the original area to less than 30% with a 14% maximum error (bottom row, left). BLASYS again proves to be very effective in exploring the design space and scaling to large circuits, while CC suffers from its exponential worst-case complexity, though it generally provides better approximations than GLP. Indeed, on larger error values, it has to stop before terminating exploration, and this leads to worse approximations. However, we note that, in most real-world scenarios, too large error values would not be meaningful; thus, we can conclude that CC is a valid methodology when strict maximum error guarantees are required.
In our third comparison, we contrast approximate logic rewriting methods with approximate HLS methods. We consider a combinational version of the eight-point FFT benchmark written in Verilog RTL. We contrast ABACUS [38], which is an example of approximate HLS tools, against BLASYS [19], which relies on approximate logic rewriting.⁵ For BLASYS, the circuit is first compiled to Boolean representation using Yosys. Fig. 21 shows the error in mse (%) versus area ratio compared with the accurate design for designs from both techniques. The results from ABACUS show two clusters of points with a big distance in between them with respect to mse. For the first cluster with low mse, BLASYS is able to achieve better results with a smoother tradeoff between accuracy and area as it incrementally approximates each partition in the design. However, this fine granularity comes at an expense:
⁵ ABACUS is available at https://github.com/scale-lab/ABACUS
Fig. 20. Comparison of area reduction in approximate circuits for GLP [45], CC [44], and BLASYS [19]. The maximum absolute error (max error) is expressed as a percentage of the maximum circuit output.
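The maximum-error metric of Fig. 20, and the difference between a guaranteed bound and a soft bound obtained by Monte Carlo simulation on a subset of inputs, can be illustrated with a toy circuit. The Python sketch below assumes a hypothetical approximate adder that simply drops the k least significant bits of both operands; it is not one of the benchmark circuits or tools compared above, and the function names are illustrative.

```python
import random


def approx_add(a: int, b: int, k: int) -> int:
    """Hypothetical approximate adder: ignores the k LSBs of both operands."""
    mask = ~((1 << k) - 1)
    return (a & mask) + (b & mask)


def max_error_percent_exhaustive(bits: int, k: int) -> float:
    """Maximum absolute error over all inputs, as a % of the maximum output."""
    max_out = 2 * ((1 << bits) - 1)
    worst = max(
        abs((a + b) - approx_add(a, b, k))
        for a in range(1 << bits)
        for b in range(1 << bits)
    )
    return 100.0 * worst / max_out


def max_error_percent_monte_carlo(bits: int, k: int, samples: int, seed: int = 0) -> float:
    """Soft bound: largest error seen on a random subset of the inputs."""
    rng = random.Random(seed)
    max_out = 2 * ((1 << bits) - 1)
    worst = 0
    for _ in range(samples):
        a = rng.randrange(1 << bits)
        b = rng.randrange(1 << bits)
        worst = max(worst, abs((a + b) - approx_add(a, b, k)))
    return 100.0 * worst / max_out


exhaustive = max_error_percent_exhaustive(bits=8, k=2)
sampled = max_error_percent_monte_carlo(bits=8, k=2, samples=50)
print(f"exhaustive: {exhaustive:.3f}%  Monte Carlo (50 samples): {sampled:.3f}%")
```

The sampled value can never exceed the exhaustive one, so a Monte Carlo estimate may under-report the true maximum error. This is the sense in which a sampling-based bound is soft, whereas a bound that holds for every input is a strict guarantee.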
have been proposed, their design is still manually done. Automated approaches that are able to explore the benefits of accuracy configuration, in light of the overhead (energywise and areawise) implied by the logic governing the transitions between different approximate settings are, instead, currently lacking.
B. Approximate High-Level Synthesis
1) Arithmetic Library: The synthesis of approximate operators is clearly a key first requirement for the wide adoption of ALS techniques. Except for very few works on adders [28] and multipliers [66], most of the existing studies on approximate arithmetic have been focused on integer arithmetic, disregarding floating-point operators. Building robust AHLS tools requires extensive libraries of approximate floating- and fixed-point arithmetic components that need to be developed.
As covered in Section V, works in AHLS do study the composition of multiple approximate operators to realize approximate accelerators [25], [34], [38], [68]. Still, by picking the proper building blocks among the ones available in a library, such strategies cast accelerator design as a selection problem. Therefore, solutions are restricted only to integrating the available library elements. More flexible approaches have indeed been proposed, but they only focus on signal width optimizations [7], [8], without considering the opportunities for logic simplification offered by ALS. These are, instead, leveraged by the strategies in [2] and [20], which employ interval arithmetic to estimate the impact, as seen at the output of a hardware module, of approximating its constituent operators. This stance allows the assessment of the operations' approximability in advance of time-consuming synthesis and simulations. Nonetheless, these methodologies are limited to combinatorial designs (limited support for sequential ones is provided in [2]) and do not provide an explicit link between circuit-level QoR metrics (e.g., average and maximum error) and application-level ones (SNR and SSIM).
2) Cross-Layer Approximate Design: Another little-explored aspect related to AHLS is that simplifications can be driven by algorithmic considerations [50] as well as ALS-based ones. Interaction between the two levels (i.e., algorithmic and AHLS) has mainly been explored in an ad hoc fashion, such as the design of linear equation solvers in [41] or that of classification engines in [11], while more general cross-layer approaches are still in their infancy [38].
Key toward the maturity of AHLS will be the development of strategies to quickly link error and QoR metrics across abstraction levels. On the one hand, such effort encompasses the development of tools dedicated to the early evaluation, from a high-level viewpoint, of approximation choices. Available frameworks are currently hampered by a narrow application scope, such as approximate neural networks [55]. On the other hand, methodologies should be envisioned that estimate the QoR impact of a simplifying transformation, limiting or entirely avoiding post hoc simulations. Such a stance is far from trivial, since metrics of interest are usually related to a considered scope: application-wide QoR is best expressed with metrics such as the classification accuracy or SNR, while, at the operator level, ERs or average errors are usually more appropriate.
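The use of interval arithmetic to bound, at a module output, the effect of approximating individual operators can be illustrated with a small Python sketch. It is a generic textbook-style example and does not reproduce the algorithms of [2] or [20]. The datapath z = (a*b)*c + d, the error intervals of the inexact operators, and the `Interval` class are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with the usual interval-arithmetic rules."""
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        products = (
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        )
        return Interval(min(products), max(products))


def output_error(c_range: Interval, mul_err: Interval, add_err: Interval) -> Interval:
    """Error bounds for z = (a*b)*c + d with an inexact first multiplier and adder.

    If the first multiplier returns a*b + e1 and the adder adds e2, then
    z_approx - z_exact = e1*c + e2, which is bounded interval-wise below.
    """
    return mul_err * c_range + add_err


# Hypothetical operator characterizations and input range.
c_range = Interval(0, 15)   # 4-bit unsigned input c
mul_err = Interval(-4, 4)   # assumed error range of the approximate multiplier
add_err = Interval(0, 2)    # assumed error range of the approximate adder

bounds = output_error(c_range, mul_err, add_err)
print(f"output error lies in [{bounds.lo}, {bounds.hi}]")
```

The bound is obtained without simulating or synthesizing the datapath, which is the appeal of this kind of analysis. It is a worst-case range, however, and by itself says nothing about average error or application-level metrics such as SNR.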
REFERENCES
[1] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, “Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 4, pp. 1352–1361, Apr. 2017.
[2] G. Ansaloni, I. Scarabottolo, and L. Pozzi, “Judiciously spreading approximation among arithmetic components with top-down inexact hardware design,” in Applied Reconfigurable Computing Architectures. New York, NY, USA: Springer, Mar. 2020, pp. 14–29.
[3] B. Barrois, O. Sentieys, and D. Menard, “The hidden cost of functional approximation against careful data sizing—A case study,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2017, pp. 181–186.
[4] R. E. Bryant, “Graph-based algorithms for Boolean function manipulation,” IEEE Trans. Comput., vol. C-35, no. 8, pp. 677–691, Aug. 1986.
[5] A. Chandrasekharan, M. Soeken, D. Große, and R. Drechsler, “Approximation-aware rewriting of AIGs for error tolerant applications,” in Proc. 35th Int. Conf. Comput.-Aided Des., Nov. 2016, p. 83.
[6] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “Optimum wordlength allocation,” in Proc. 10th Annu. IEEE Symp. Field-Program. Custom Comput. Mach. (FCCM), 2002, pp. 219–228.
[7] G. Constantinides, A. Kinsman, and N. Nicolici, “Numerical data representations for FPGA-based scientific computing,” IEEE Des. Test. Comput., vol. 28, no. 4, pp. 8–17, Jul. 2011.
[8] G. A. Constantinides, P. Cheung, and W. Luk, “The multiple wordlength paradigm,” in Proc. IEEE Symp. FPGAs for Custom Comput. Mach., pp. 51–60, 2001.
[9] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II,” in Proc. Int. Conf. Parallel Problem Solving Nature, 2000, pp. 849–858.
[10] T. A. Drane, T. M. Rose, and G. A. Constantinides, “On the systematic creation of faithfully rounded truncated multipliers and arrays,” IEEE Trans. Comput., vol. 63, no. 10, pp. 2513–2525, Oct. 2014.
[11] L. Ferretti et al., “Tailoring SVM inference for resource-efficient ECG-based epilepsy monitors,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2019, pp. 1–4.
[12] P. Fišer and J. Schmidt, “A comprehensive set of logic synthesis and optimization examples,” in Proc. 12th Int. Workshop Boolean Problems (IWSBP), 2016, pp. 151–158.
[13] S. Froehlich, D. Grosse, and R. Drechsler, “Error bounded exact BDD minimization in approximate computing,” in Proc. Int. Symp. Multi-Level Logic, May 2017, pp. 254–259.
[14] J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient design,” in Proc. IEEE Eur. TEST Symp. (ETS), May 2013, pp. 1–6.
[15] S. Hashemi, R. I. Bahar, and S. Reda, “DRUM: A dynamic range unbiased multiplier for approximate applications,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2015, pp. 418–425.
[16] S. Hashemi, R. I. Bahar, and S. Reda, “A low-power dynamic divider for approximate applications,” in Proc. IEEE/ACM Design Autom. Conf., vol. 105, Jun. 2016, pp. 1–6.
[17] S. Hashemi and S. Reda, “Generalized matrix factorization techniques for approximate logic synthesis,” in Proc. ACM/IEEE Design Automat. Test Eur., Mar. 2019, pp. 1289–1292.
[18] S. Hashemi, H. Tann, F. Buttafuoco, and S. Reda, “Approximate computing for biometric security systems: A case study on iris scanning,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2018, pp. 319–324.
[19] S. Hashemi, H. Tann, and S. Reda, “BLASYS: Approximate logic synthesis using Boolean matrix factorization,” in Proc. 55th ACM/ESDA/IEEE Design Autom. Conf. (DAC), Jun. 2018, pp. 55:1–55:6.
[20] J. Huang, J. Lach, and G. Robins, “A methodology for energy-quality tradeoff using imprecise hardware,” in Proc. 49th Annu. Design Autom. Conf. (DAC), Jun. 2012, pp. 504–509.
[21] A. B. Kahng and S. Kang, “Accuracy-configurable adder for approximate arithmetic designs,” in Proc. DAC Design Autom. Conf., 2012, pp. 820–825.
[22] P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading accuracy for power with an underdesigned multiplier architecture,” in Proc. 24th Int. Conf. VLSI Design, Jan. 2011, pp. 346–351.
[23] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in Proc. ACM Int. Symp. Code Gener. Optim., 2004, pp. 75–84.
[24] S. Lee, L. K. John, and A. Gerstlauer, “High-level synthesis of approximate hardware under joint precision and voltage scaling,” in Proc. IEEE/ACM Design Automat. Conf., Mar. 2017, pp. 187–192.
[25] C. Li, W. Luo, S. S. Sapatnekar, and J. Hu, “Joint precision optimization and high level synthesis for approximate computing,” in Proc. IEEE/ACM Design Autom. Conf., vol. 104, Jun. 2015, pp. 1–6.
[26] A. Lingamneni, C. Enz, K. Palem, and C. Piguet, “Synthesizing parsimonious inexact circuits through probabilistic design techniques,” ACM Trans. Embedded Comput. Syst., vol. 12, no. 2s, pp. 1–26, May 2013.
[27] G. Liu and Z. Zhang, “Statistically certified approximate logic synthesis,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2017, pp. 344–351.
[28] W. Liu, L. Chen, C. Wang, M. O’Neill, and F. Lombardi, “Design and analysis of inexact floating-point adders,” IEEE Trans. Comput., vol. 65, no. 1, pp. 308–314, Jan. 2016.
[29] W. Liu, F. Lombardi, and M. Shulte, “A retrospective and prospective view of approximate computing [Point of view],” Proc. IEEE, vol. 108, no. 3, pp. 394–399, Mar. 2020.
[30] J. Ma, S. Hashemi, and S. Reda, “Approximate logic synthesis using BLASYS,” in Proc. Workshop Open-Source EDA Technol., no. 5, 2019, pp. 1–3, Art. no. 5.
[31] J. Miao, A. Gerstlauer, and M. Orshansky, “Approximate logic synthesis under general error magnitude and frequency constraints,” in Proc. Int. Conf. Comput. Aided Design, Nov. 2013, pp. 779–786.
[32] P. Miettinen and J. Vreeken, “Model order selection for Boolean matrix factorization,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 51–59.
[33] A. Mishchenko, S. Chatterjee, and R. Brayton, “DAG-aware AIG rewriting: A fresh look at combinational logic synthesis,” in Proc. IEEE/ACM Design Autom. Conf., Jul. 2006, pp. 532–535.
[34] V. Mrazek, M. A. Hanif, Z. Vasicek, L. Sekanina, and M. Shafique, “autoAx: An automatic design space exploration and circuit building methodology utilizing libraries of approximate components,” in Proc. IEEE/ACM Design Autom. Conf., vol. 123, Jun. 2019, pp. 1–6.
[35] V. Mrazek, Z. Vasicek, and L. Sekanina, “EvoApproxLib: Extended library of approximate arithmetic circuits,” in Proc. Workshop Open-Source EDA Technol., no. 10, 2019, pp. 1–4.
[36] K. E. Murray, A. Suardi, V. Betz, and G. Constantinides, “Calculated risks: Quantifying timing error probability with extended static timing analysis,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 38, no. 4, pp. 719–732, Apr. 2019.
[37] K. Nepal, S. Hashemi, H. Tann, R. I. Bahar, and S. Reda, “Automated high-level generation of low-power approximate computing circuits,” IEEE Trans. Emerg. Topics Comput., vol. 7, no. 1, pp. 18–30, Jan. 2019.
[38] K. Nepal, Y. Li, R. I. Bahar, and S. Reda, “ABACUS: A technique for automated behavioral synthesis of approximate computing circuits,” in Proc. IEEE/ACM Design, Autom. Test Eur., Mar. 2014, pp. 1–6.
[39] A. Ranjan, A. Raha, S. Venkataramani, K. Roy, and A. Raghunathan, “ASLAN: Synthesis of approximate sequential circuits,” in Proc. Design, Autom. Test Eur. Conf. Exhib., Mar. 2014, pp. 1–6.
[40] S. Rehman, W. El-Harouni, M. Shafique, A. Kumar, and J. Henkel, “Architectural-space exploration of approximate multipliers,” in Proc. Int. Conf. Comput. Aided Design, Nov. 2016, pp. 1–8.
[41] A. Roldao-Lopes, A. Shahzad, G. A. Constantinides, and E. C. Kerrigan, “More flops or more precision? Accuracy parameterizable linear equation solvers for model predictive control,” in Proc. 17th IEEE Symp. Field-Program. Custom Comput. Mach., Apr. 2009, pp. 209–216.
[42] H. Saadat, H. Javaid, and S. Parameswaran, “Approximate integer and floating-point dividers with near-zero error bias,” in Proc. 56th Annu. Design Autom. Conf., Jun. 2019, pp. 161:1–161:6.
[43] I. Scarabottolo, G. Ansaloni, G. Constantinides, and L. Pozzi, “Partition and propagate: An error derivation algorithm for the design of approximate circuits,” in Proc. 56th Design Autom. Conf., Jun. 2019, pp. 1–6.
[44] I. Scarabottolo, G. Ansaloni, and L. Pozzi, “Circuit carving: A methodology for the design of approximate hardware,” in Proc. Design, Autom. Test Eur. Conf. Exhib., Mar. 2018, pp. 545–550.
[45] J. Schlachter, V. Camus, K. V. Palem, and C. Enz, “Design and applications of approximate circuits by gate-level pruning,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, pp. 1694–1702, May 2017.
[46] E. Sentovich and K. Singh, “A system for sequential circuit synthesis,” EECS, UCB, Berkeley, CA, USA, Tech. Rep. UCB/ERL M92/41, 1992.
[47] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, “A low latency generic accuracy configurable adder,” in Proc. 52nd Design Autom. Conf., Jul. 2015, pp. 1–6.
[48] K. Shi, D. Boland, E. Stott, S. Bayliss, and G. A. Constantinides, “Datapath synthesis for overclocking: Online arithmetic for latency-accuracy trade-offs,” in Proc. Design Autom. Conf., 2014, pp. 1–6.
[49] D. Shin and S. K. Gupta, “A new circuit simplification method for error tolerant applications,” in Proc. Design, Autom. Test Eur. Conf. Exhib., Mar. 2011, pp. 1–6.
[50] S. Sidiroglou-Douskos, S. Misailovic, H. Hoffmann, and M. Rinard, “Managing performance vs. accuracy trade-offs with loop perforation,” in Proc. SIGSOFT Symp. Eur. Conf. Found. Softw. Eng., Sep. 2011, pp. 124–134.
[51] M. Soeken, D. Große, A. Chandrasekharan, and R. Drechsler, “BDD minimization for approximate computing,” in Proc. 21st Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2016, pp. 474–479.
[52] J. E. Stine et al., “FreePDK: An open-source variation-aware design kit,” in Proc. IEEE Int. Conf. Microelectron. Syst. Edu. (MSE), Jun. 2007, pp. 173–174.
[53] S. Su, Y. Wu, and W. Qian, “Efficient batch statistical error estimation for iterative multi-level approximate logic synthesis,” in Proc. 55th ACM/ESDA/IEEE Design Autom. Conf. (DAC), Jun. 2018, pp. 54:1–54:6.
[54] Z. Vasicek and L. Sekanina, “Evolutionary approach to approximate digital circuits design,” IEEE Trans. Evol. Comput., vol. 19, no. 3, pp. 432–444, Jun. 2015.
[55] F. Vaverka, V. Mrazek, Z. Vasicek, and L. Sekanina, “TFApprox: Towards a fast emulation of DNN approximate hardware accelerators on GPU,” in Proc. Design, Autom. Test Eur. Conf. Exhib., Mar. 2020, pp. 1–6.
[56] S. Venkataramani, K. Roy, and A. Raghunathan, “Substitute-and-simplify: A unified design paradigm for approximate and quality configurable circuits,” in Proc. Design, Autom. Test Eur. Conf. Exhib., Mar. 2013, pp. 1367–1372.
[57] S. Venkataramani, A. Sabne, V. Kozhikkottu, K. Roy, and A. Raghunathan, “SALSA: Systematic logic synthesis of approximate circuits,” in Proc. 49th Design Autom. Conf., Jun. 2012, pp. 796–801.
[58] R. Venkatesan, A. Agarwal, K. Roy, and A. Raghunathan, “MACACO: Modeling and analysis of circuits for approximate computing,” in Proc. Int. Conf. Comput. Aided Design, Nov. 2011, pp. 667–673.
[59] C. Wolf. Yosys Open SYnthesis Suite. Accessed: 2016. [Online]. Available: http://www.clifford.at/yosys/documentation.html
[60] Y. Wu and W. Qian, “An efficient method for multi-level approximate logic synthesis under error rate constraint,” in Proc. 53rd ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2016, pp. 1–6.
[61] Q. Xu, T. Mytkowicz, and N. Kim, “Approximate computing: A survey,” IEEE Des. Test, vol. 33, no. 1, pp. 8–22, Jan. 2016.
[62] C. Yang, M. Ciesielski, and V. Singhal, “BDS: A BDD-based logic optimization system,” in Proc. ACM/IEEE Design Autom. Conf., Jul. 2000, pp. 866–876.
[63] S. Yang, “Logic synthesis and optimization benchmarks user guide,” Microelectron. Center North Carolina, Research Triangle, NC, USA, 1991.
[64] Y. Yao, S. Huang, C. Wang, Y. Wu, and W. Qian, “Approximate disjoint bi-decomposition and its application to approximate logic synthesis,” in Proc. IEEE Int. Conf. Comput. Design (ICCD), Nov. 2017, pp. 517–524.
[65] R. Ye, T. Wang, F. Yuan, R. Kumar, and Q. Xu, “On reconfiguration-oriented approximate adder design and its application,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2013, pp. 48–54.
[66] P. Yin, C. Wang, W. Liu, and F. Lombardi, “Design and performance evaluation of approximate floating-point multipliers,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, Jul. 2016, pp. 296–301.
[67] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[68] G. Zervakis, S. Xydis, D. Soudris, and K. Pekmestzi, “Multi-level approximate accelerator synthesis under voltage island constraints,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 66, no. 4, pp. 607–611, Apr. 2019.
[69] Z. Zhang, Y. He, J. He, X. Yi, Q. Li, and B. Zhang, “Optimal slope ranking: An approximate computing approach for circuit pruning,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2018, pp. 1–4.
[70] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, “Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 8, pp. 1225–1229, Aug. 2010.