Application-Specific Arithmetic: Computing Just Right for the Reconfigurable Computer and the Dark Silicon Era, 2024 Edition, de Dinechin and Kumm
Florent de Dinechin
Martin Kumm
Application-Specific
Arithmetic
Computing Just Right
for the Reconfigurable Computer
and the Dark Silicon Era
Florent de Dinechin
CITI laboratory, INSA-Lyon
Villeurbanne, France

Martin Kumm
Fulda University of Applied Sciences
Fulda, Germany
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
1 ... with one huge exception: for lack of time we had to leave finite-field arithmetic and
cryptography out of this book, and we deeply regret it.
Preface vii
Acknowledgments
Many people have contributed to making this book possible. Some have
been our mentors, some have been our colleagues, some have been our stu-
dents. Some began as students, then became colleagues. Many have become
close friends. For these reasons and for many others, the authors would like
to thank:
Levent Aksoy, Andrea Bocco, David Boland, Sylvie Boldo, Marthe
Bonamy, Andreas Böttcher, Philip Brisk, Javier Bruguera, Nicolas Brunie,
Chip-Hong Chang, Peter Cheung, Ray Cheung, Sylvain Chevillard, Maxime
Christ, Caroline Collange, Marius Cornea, Octavian Creţ, Marc Daumas,
David Defour, Steven Derrien, Oregane Desrentes, Jérémie Detrey, Laurent-
Stéphane Didier, Benoît Dupont de Dinechin, Yves Durand, Miloš Ercego-
vac, Alexey Ershov, Diana Fanghänel, Julian Faraone, Mathias Faust, Fab-
rizio Ferrandi, Nicolai Fiege, Silviu Filip, Luc Forget, Giulio Gambardella,
Rémi Garcia, Mario Garrido, Alexandre Goldsztejn, Bernard Goossens, Jean-
Marie Gorce, Oscar Gustafsson, Tobias Habermann, Martin Hardieck,
Matei Iştoan, Paolo Ienne, Claude-Pierre Jeannerod, Håkan Johansson,
Mioara Joldeş, Petter Källström, Johannes Kappauf, Nachiket Kapre, Cris-
tian Klein, Marco Kleinlein, Harald Klingbeil, Andreas Koch, Dirk Koch,
Jonas Kühle, Akash Kumar, Martin Langhammer, Philippe Langlois,
Christoph Lauter, Vincent Lefèvre, Philip H. W. Leong, Nicolas Louvet,
Wayne Luk, David Lutz, Sergey Maidanov, Peter Markstein, Steve McK-
eever, Michael Mecik, Guillaume Melquiond, Uwe Meyer-Baese, Marc Mez-
zarobba, Konrad Möller, Lionel Morel, Duncan Moss, Jean-Michel Muller,
Ettore Napoli, Andrey Naraikin, Stuart Oberman, Julian Oppermann, Bog-
dan Pasca, Jakoba Petri-König, Thomas Preußer, Patrice Quinton, Sanjay Ra-
jopadhye, Melanie Reuter-Oppermann, Nathalie Revol, Guillaume Salagnac,
Shahab Sanjari, Tapio Saramäki, Kentaro Sano, Olivier Sentieys, Nabeel Shi-
razi, Patrick Sittel, Hayden So, Christine Solnon, Lukas Sommer, Antonio
Strollo, Ping Tak Peter Tang, David Thomas, Arnaud Tisserand, Stephen
Tridgell, Yohann Uguen, Álvaro Vázquez, Gilles Villard, Anastasia Volkova,
Lukas Weber, Markus Weinhardt, John Wickerson, Paul Zimmermann, Peter
Zipf, and many others.
It is a bit frightening to have fond recollections of drinking beer while
talking science with so many people.
The authors also want to express their gratitude to all the FloPoCo de-
velopers (with extra apologies to those whose work had to be left out of the
present book):
Hatam Abdoli, Sebastian Banescu, Louis Besème, Andreas Böttcher, Nico-
las Bonfante, Nicolas Brunie, Romain Bouarah, Victor Capelle, Jiajie Chen,
Maxime Christ, Caroline Collange, Quentin Corradi, Orégane Desrentes,
Jérémie Detrey, Antonin Dudermel, Fabrizio Ferrandi, Nicolai Fiege, Luc
Forget, Martin Hardieck, Valentin Huguet, Kinga Illyes, Matei Iştoan,
Finally, special thanks to all the people who have spent some of their time
reviewing part of this book:
Hatam Abdoli, Noah Bertholon, Andreas Böttcher, Romain Bouarah,
Maxime Christ, Quentin Corradi, Orégane Desrentes, Christophe de
Dinechin, Silviu Filip, Luc Forget, Robin Green, Tobias Habermann, Agathe
Herrou, Michael Mecik, Jean-Michel Muller, Raymond Nijssen, Pierre-Yves
Piriou, and Tanguy Risset.
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Computer Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Trends in Digital Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 The Dark Silicon Era . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 The Dawn of the Reconfigurable Computer . . . . . . . . . 4
1.3 Beyond Traditional Arithmetic Operators . . . . . . . . . . . . . . . . . . 7
1.4 Opportunities of Application-Specific Arithmetic . . . . . . . . . . . 8
1.4.1 Operator Specialization . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.2 Operator Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.3 Function Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.4 Resource Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.5 Target-Specific Optimizations . . . . . . . . . . . . . . . . . . . . . . 15
1.5 General Design Principles for Application-Specific
Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1 Parameterize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.2 Compute Just Right (Last-Bit Accuracy) . . . . . . . . . . . . 17
1.5.3 Expose the Design Space . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5.4 Do Not Write Operators, Write Generators . . . . . . . . . . 19
1.5.5 Generate the Test Bench Along with the Operator . . . 20
1.6 Organization of This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7 Support Software: The FloPoCo Project . . . . . . . . . . . . . . . . . . . . . 22
1.8 Other Books on Computer Arithmetic . . . . . . . . . . . . . . . . . . . . . . 22
1.8.1 General Computer Arithmetic . . . . . . . . . . . . . . . . . . . . . 23
1.8.2 Arithmetic for FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.3 Other Specialized Books . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.8.4 Approximate Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.9 Notations Used in This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2 Number Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1 Representing Integers in Binary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.1 Binary Representation of Positive Integers . . . . . . . . . . 34
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
CHAPTER 1
Introduction
later to vacuum tubes, and the idea remained relevant with transistor-based
switching, the core technology behind current computers.
During the electronic computer era, technological progress has often
driven evolutions in the computer arithmetic domain. One illustrative ex-
ample is the use of memory when evaluating elementary functions (expo-
nential and logarithm, trigonometric, etc.). In the late 1980s, as memory sizes
doubled every other year following Moore’s law, it became relevant to de-
sign algorithms that could replace long computations with large tables of
precalculated values [Tan89; Tan90; GB91; Tan92]. Such algorithms domi-
nated the state of the art for three decades. But what technology gives, it
can also take back. Even though memory size still increases, memory ac-
cess time has become the main performance bottleneck in current multicore
computers. For this reason, table-free algorithms are now considered more
efficient in many cases [CHT02]: it is often faster and more energy-efficient
to compute tens of additions and multiplications than to perform one mem-
ory lookup.
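To make the table-driven idea concrete, here is a toy Python model of it (an illustration only, not any of the cited algorithms; the names STEP, TABLE, and exp_table are ours): exponentials are precomputed on a coarse grid, and a single lookup plus a short polynomial handles the small remainder.

```python
import math

# Toy table-driven exponential: exp(x) = exp(h) * exp(r), where h = k*STEP
# comes from a precomputed table and |r| <= STEP/2 is small enough for a
# short polynomial correction.
STEP = 1.0 / 64
TABLE = {k: math.exp(k * STEP) for k in range(-4 * 64, 4 * 64 + 1)}

def exp_table(x: float) -> float:
    """Evaluate exp(x) for x in [-4, 4] with one lookup and a few operations."""
    k = round(x / STEP)                          # index of the nearest grid point
    r = x - k * STEP                             # small remainder, |r| <= 1/128
    p = 1.0 + r * (1.0 + r * (0.5 + r / 6.0))    # degree-3 Taylor series for exp(r)
    return TABLE[k] * p
```

The memory/computation trade-off discussed in the text is visible in the parameters: a finer STEP enlarges the table but shortens the polynomial, and vice versa.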
Digital technology is still evolving. How should computer arithmetic
adapt to current and upcoming changes?
One answer proposed in this book is to look beyond traditional computer
arithmetic, long focused on the four basic operations and their monolithic,
one-size-fits-all implementations in the arithmetic units at the core of micro-
processors. The object of this book is the systematic study of non-standard
arithmetic units, built and optimized for specific applications: what we call
application-specific arithmetic.
The remainder of this chapter first justifies this claim and then sketches
the scope and potential of application-specific arithmetic.
For five decades, the evolution of technology has been well summarized
by Moore’s law, which states that the number of transistors that can be inte-
grated on an economically viable chip doubles every other year. This law has
remained surprisingly accurate over this long period of time, partly because
it has become a self-fulfilling prophecy on which the economics of a ma-
jor part of the digital sector relies. The International Technology Roadmap
for Semiconductors (ITRS) has made sure that we get smaller transistors (or
equivalently more transistors per chip) as long as fundamental physical lim-
its are not reached.
However, the circuit-level performance brought by smaller transistors is
no longer what it used to be. Until the end of the twentieth century, each
technology generation brought smaller transistors that would switch faster
and consume less power (this is referred to as Dennard scaling). Thus, processors
were more and more powerful and clocked higher and higher. However,
there is a limit on the amount of heat per square centimeter that can be
dissipated off a chip, and this limit (the “power wall”) was reached around
2004. From there on, transistors could still be made smaller with each gen-
eration, but their practical operational frequency reached a plateau. This
prompted the switch to multicore processors as the most efficient means
to harness increased transistor densities.
Unfortunately, in the upcoming technology generations, the increase of
transistor density will entail a further increase in power density. Roughly
speaking, every two years, transistor density is multiplied by two, but the
power dissipation of each transistor is only reduced by a factor of 1.4 [Tay12].
The net effect is that we can no longer afford to operate 100% of a chip all
the time. A certain percentage must be kept switched off (or dark, hence
the term dark silicon [Mer09]). Even worse, this percentage is now increasing
exponentially with each technology generation.
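The arithmetic behind this exponential trend can be checked from the figures just quoted (a back-of-the-envelope model in Python, not [Tay12]'s detailed one; usable_fraction is a name of ours):

```python
# Per technology generation: 2x the transistors, each dissipating 1/1.4 of
# the power. At a fixed chip power budget, the fraction of the chip that can
# operate therefore shrinks by 1.4/2 = 0.7 per generation -- exponentially.
def usable_fraction(generations: int) -> float:
    return (1.4 / 2.0) ** generations
```

After four generations of this trend, less than a quarter of the chip can be lit at any given time.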
instruction. The argument was the following: it was possible to build very
efficient division in software thanks to a new instruction, the Fused Multiply-Add
(FMA) [Mar00]. Therefore, the silicon area occupied by a hardware
divider could be much better spent, for instance, on a second FMA, which is a
more generally useful operator than a divider.
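The FMA-based software division schemes alluded to here rest on Newton-Raphson refinement of a reciprocal. The following Python sketch shows the idea only; it uses ordinary floating-point operations where a real implementation such as [Mar00]'s relies on fused operations and a hardware seed table, and the names recip_seed and divide are ours:

```python
import math

def recip_seed(b: float) -> float:
    """Very crude reciprocal seed for b > 0: the nearest power of two.
    A real implementation would use a small lookup table instead."""
    return 2.0 ** -round(math.log2(b))

def divide(a: float, b: float) -> float:
    """Compute a/b by Newton-Raphson refinement of 1/b (b > 0)."""
    y = recip_seed(b)
    for _ in range(5):     # the relative error roughly squares at each step
        e = 1.0 - b * y    # residual (a real scheme computes this with one FMA)
        y = y + y * e      # refinement step (another FMA)
    return a * y
```

Each iteration costs two multiply-add operations, which is why a fast FMA made software division competitive.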
However, we are now seeing a comeback of hardware dividers in proces-
sors [Bru18; Bru22]. The reason for that is, again, power consumption (and
not performance as in 1997 [OF97a]). A nicely pipelined hardware divider
will be much more power-efficient at computing a division than software:
the latter needs to read and decode instructions, move data from and to the
register file, etc. Even if a hardware divider were no faster than the
equivalent software (and it is faster), it would still increase overall performance by
freeing some power budget for other computations. Therefore, it
seems a worthwhile use of dark silicon.
What function is next? Square root is very comparable to division in terms
of algorithms and complexity. However, still according to [OF97a], it accounts
for less than 0.33% of floating-point instructions. Next, we may see
renewed interest in implementations of floating-point elementary functions
such as exponential, logarithm, and trigonometrics. Some GPUs (graphics
processing units) already provide hardware acceleration for the most com-
mon of these functions [OS05].
Another way to put dark silicon to use, according to [Tay12], is to use
more area to buy energy. For instance, since power dissipation is proportional
to the square of the frequency, two copies of a functional block, each
clocked at half the frequency, will actually consume half the power. Here,
what is needed is not a new operator, but a performance variation on an
existing one. This book will also attempt to cover this performance space.
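Under the quadratic power model just stated, this trade can be checked with one line of arithmetic per quantity (a minimal Python sketch; power_ratio and throughput_ratio are names of ours):

```python
# Book's model: dynamic power of one block scales as the square of its clock
# frequency; throughput of one block scales linearly with its frequency.

def power_ratio(n_copies: int) -> float:
    """Total power of n copies at f/n, relative to one copy at f."""
    return n_copies * (1.0 / n_copies) ** 2

def throughput_ratio(n_copies: int) -> float:
    """Aggregate throughput of n copies at f/n, relative to one copy at f."""
    return n_copies * (1.0 / n_copies)
```

With n_copies = 2, the throughput ratio is 1.0 and the power ratio is 0.5: same work done, half the power, at the cost of twice the area.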
We will conclude this section about dark silicon with three quotes. The
first two come from Doug Burger's keynote talk at the HiPEAC 2013 conference:
“The end of Moore’s Law doesn’t mean the end of progress”, and “noth-
ing in our careers has been as fundamental as this transition.”
The third one is from the conclusion of [Tay12]: “Although silicon is get-
ting darker, for researchers the future is bright and exciting.”
The intent of this book is to bring some of this excitement into the computer
arithmetic domain.
[Figure: schematic of an FPGA, showing configurable logic blocks and configurable routing resources, together with a small dataflow graph of multiplication and subtraction operators on inputs a, b, c, and d.]