Florent de Dinechin
Martin Kumm

Application-Specific Arithmetic
Computing Just Right for the Reconfigurable Computer and the Dark Silicon Era
Florent de Dinechin
CITI laboratory, INSA-Lyon
Villeurbanne, France

Martin Kumm
Fulda University of Applied Sciences
Fulda, Germany

ISBN 978-3-031-42807-4
ISBN 978-3-031-42808-1 (eBook)


https://doi.org/10.1007/978-3-031-42808-1

© Springer Nature Switzerland AG 2024


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


Preface

Computer arithmetic is the art of representing and manipulating numbers in a computer. This book is about hardware arithmetic, the art of building those elementary computing operators that sit at the core of a computer, and give it its name. A programmer who uses the symbols + or / in a program probably knows that they will trigger hardware arithmetic operators, but many more operators are hidden from view. Talking through a mobile phone, paying with a credit card, driving a car, looking at a digital watch: all these activities rely on myriads of hardware arithmetic operators embedded in the artifacts of the digital age.

It is therefore not a surprise that computer arithmetic is among the first subjects taught in electrical engineering or computer science. Students learn very early how to build a binary adder, sometimes right after the basic gates (NOT, AND, OR, and XOR) have been introduced. Most students also quickly forget it, and only a select few will proceed to discover the more arcane aspects of hardware arithmetic, for instance the design of dividers, square root units, or trigonometric operators. This book is for them.
There are already many good books about computer arithmetic, and there are excellent courses in many universities. They perfectly cover the hardware implementation of +, −, ×, /, and √, the five basic operations that have made the microprocessor so universal and so versatile in the twentieth century. This is mainstream hardware arithmetic, and these operators are here to stay. However, they are increasingly being complemented by a wild proliferation of new operators, diverse in shape and size, but sharing one common characteristic: they no longer need to be versatile, they no longer need to be universal; they need to do one thing for one application, and do it well (faster, or cheaper, or more accurately, or with less energy than the mainstream arithmetic operators). Multipliers that only multiply by a constant such as sin(17π/1024) or log(2); fancy mathematical functions such as sigmoid, inverse cumulative distributions, or softmax; operators for low-precision data that would be catastrophic for general-purpose computing but seem perfectly suited to signal processing or machine learning: these are but a few examples of what researchers and application-specific circuit designers have been exploring in the past two decades. This book is about these application-specific operators.

As each application comes with its specific computing needs, there is an infinite number of application-specific operators. We therefore had to make a selection, and we apologize in advance to our readers if they do not find in these pages exactly the operator they need. However, the purpose of this book is not only to describe operators: it is also to describe methods, techniques that can be used to build all the operators that we do not describe. These methods were not developed for this book: they are the fruit of more than a decade of experiments and development by the authors in the FloPoCo software project. FloPoCo is an open-source program that generates application-specific operators. Its operators have, for the most part, been published in various articles. The core techniques have not, or only in a rudimentary way. This is the main reason why we wanted to write this textbook. It fills a gap between existing textbooks and the state of the art.

The twenty-first century has also seen the maturation of the field-programmable gate array (FPGA), and the emergence of reconfigurable computing, which complements traditional microprocessors with FPGA-based accelerators. Some companies specialize in this kind of acceleration, and some application domains are increasingly relying on it. When deploying an application on an FPGA, only the operators needed by this application need to be included: FPGA arithmetic, by definition, should be application-specific. This book is therefore especially relevant to FPGA-based computing, and it includes many FPGA-specific arithmetic techniques.

We hope that this book will be useful to students, to anybody interested in the design of computing circuits, to practitioners in VLSI design, and to engineers and researchers working with FPGAs. We hope it will save them some frustrating tinkering, and guide them on the path to the high-quality application-specific operators their projects deserve. We hope that they will find this book useful to their application domain, whatever it may be.1 And we look forward to their feedback.

Lyon, France Florent de Dinechin


Fulda, Germany Martin Kumm

1 ... with one huge exception: for lack of time we had to leave finite-field arithmetic and
cryptography out of this book, and we deeply regret it.

Acknowledgments

Many people have contributed to making this book possible. Some have been our mentors, some have been our colleagues, some have been our students. Some began as students, then became colleagues. Many have become close friends. For these reasons and for many others, the authors would like to thank:
Levent Aksoy, Andrea Bocco, David Boland, Sylvie Boldo, Marthe Bonamy, Andreas Böttcher, Philip Brisk, Javier Bruguera, Nicolas Brunie, Chip-Hong Chang, Peter Cheung, Ray Cheung, Sylvain Chevillard, Maxime Christ, Caroline Collange, Marius Cornea, Octavian Creţ, Marc Daumas, David Defour, Steven Derrien, Oregane Desrentes, Jérémie Detrey, Laurent-Stéphane Didier, Benoît Dupont de Dinechin, Yves Durand, Miloš Ercegovac, Alexey Ershov, Diana Fanghänel, Julian Faraone, Mathias Faust, Fabrizio Ferrandi, Nicolai Fiege, Silviu Filip, Luc Forget, Giulio Gambardella, Rémi Garcia, Mario Garrido, Alexandre Goldsztejn, Bernard Goossens, Jean-Marie Gorce, Oscar Gustafsson, Tobias Habermann, Martin Hardieck, Matei Iştoan, Paolo Ienne, Claude-Pierre Jeannerod, Håkan Johansson, Mioara Joldeş, Petter Källström, Johannes Kappauf, Nachiket Kapre, Cristian Klein, Marco Kleinlein, Harald Klingbeil, Andreas Koch, Dirk Koch, Jonas Kühle, Akash Kumar, Martin Langhammer, Philippe Langlois, Christoph Lauter, Vincent Lefèvre, Philip H. W. Leong, Nicolas Louvet, Wayne Luk, David Lutz, Sergey Maidanov, Peter Markstein, Steve McKeever, Michael Mecik, Guillaume Melquiond, Uwe Meyer-Baese, Marc Mezzarobba, Konrad Möller, Lionel Morel, Duncan Moss, Jean-Michel Muller, Ettore Napoli, Andrey Naraikin, Stuart Oberman, Julian Oppermann, Bogdan Pasca, Jakoba Petri-König, Thomas Preußer, Patrice Quinton, Sanjay Rajopadhye, Melanie Reuter-Oppermann, Nathalie Revol, Guillaume Salagnac, Shahab Sanjari, Tapio Saramäki, Kentaro Sano, Olivier Sentieys, Nabeel Shirazi, Patrick Sittel, Hayden So, Christine Solnon, Lukas Sommer, Antonio Strollo, Ping Tak Peter Tang, David Thomas, Arnaud Tisserand, Stephen Tridgell, Yohann Uguen, Álvaro Vázquez, Gilles Villard, Anastasia Volkova, Lukas Weber, Markus Weinhardt, John Wickerson, Paul Zimmermann, Peter Zipf, and many others.
It is a bit frightening to have fond recollections of drinking beer while talking science with so many people.

The authors also want to express their gratitude to all the FloPoCo developers (with extra apologies to those whose work had to be left out of the present book):

Hatam Abdoli, Sebastian Banescu, Louis Besème, Andreas Böttcher, Nicolas Bonfante, Nicolas Brunie, Romain Bouarah, Victor Capelle, Jiajie Chen, Maxime Christ, Caroline Collange, Quentin Corradi, Orégane Desrentes, Jérémie Detrey, Antonin Dudermel, Fabrizio Ferrandi, Nicolai Fiege, Luc Forget, Martin Hardieck, Valentin Huguet, Kinga Illyes, Matei Iştoan, Mioara Joldeş, Johannes Kappauf, Cristian Klein, Marco Kleinlein, Kilian Klug, Jonas Kühle, Keanu Kullmann, Louis Ledoux, Jean Marchal, Antoine Martinet, Konrad Möller, Raul Murillo, Annika Oeste, Bogdan Pasca, Bogdan Popa, Xavier Pujol, Guillaume Sergent, Viktor Schmidt, David Thomas, Radu Tudoran, Alvaro Vasquez, Anastasia Volkova.

Finally, special thanks to all the people who have spent some of their time to review part of this book:

Hatam Abdoli, Noah Bertholon, Andreas Böttcher, Romain Bouarah, Maxime Christ, Quentin Corradi, Orégane Desrentes, Christophe de Dinechin, Silviu Filip, Luc Forget, Robin Green, Tobias Habermann, Agathe Herrou, Michael Mecik, Jean-Michel Muller, Raymond Nijssen, Pierre-Yves Piriou, and Tanguy Risset.
Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Computer Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Trends in Digital Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 The Dark Silicon Era . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 The Dawn of the Reconfigurable Computer . . . . . . . . . 4
1.3 Beyond Traditional Arithmetic Operators . . . . . . . . . . . . . . . . . . 7
1.4 Opportunities of Application-Specific Arithmetic . . . . . . . . . . . 8
1.4.1 Operator Specialization . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.2 Operator Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.3 Function Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.4 Resource Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.5 Target-Specific Optimizations . . . . . . . . . . . . . . . . . . . . . . 15
1.5 General Design Principles for Application-Specific
Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1 Parameterize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.2 Compute Just Right (Last-Bit Accuracy) . . . . . . . . . . . . 17
1.5.3 Expose the Design Space . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5.4 Do Not Write Operators, Write Generators . . . . . . . . . . 19
1.5.5 Generate the Test Bench Along with the Operator . . . 20
1.6 Organization of This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7 Support Software: The FloPoCo Project . . . . . . . . . . . . . . . . . . . . . 22
1.8 Other Books on Computer Arithmetic . . . . . . . . . . . . . . . . . . . . . . 22
1.8.1 General Computer Arithmetic . . . . . . . . . . . . . . . . . . . . . 23
1.8.2 Arithmetic for FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.3 Other Specialized Books . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.8.4 Approximate Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.9 Notations Used in This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2 Number Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1 Representing Integers in Binary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.1 Binary Representation of Positive Integers . . . . . . . . . . 34


2.1.2 Signed Integers in Two’s Complement Representation . . . . . . . . . . . . . 35
2.1.3 Sign-Magnitude Representation . . . . . . . . . . . . . . . . . . . . 37
2.1.4 Conversion Between Sign-Magnitude and Two’s
Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2 Fixed-Point Binary Representations . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2.1 Unsigned Fixed-Point Binary Representation . . . . . . . . 38
2.2.2 Two’s Complement Fixed-Point Binary
Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2.3 Conversion to a Wider Format: Sign Extension . . . . . . 42
2.2.4 Conversion to a Narrower Format: Overflow and
Rounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.2.5 Alternative Notations for Fixed-Point Formats . . . . . . 43
2.3 High-Radix Binary Representation . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4 Redundant Positional Number Systems . . . . . . . . . . . . . . . . . . . . 46
2.5 Binary Floating-Point Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.5.1 The IEEE 754 Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.5.2 Emerging Non-standard Formats for Machine
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.5.3 A Simplified Floating-Point Format . . . . . . . . . . . . . . . . 54
2.5.4 Respective Motivations of IEEEfloat and Nfloat . . . . . 55
2.5.5 Non-binary Floating-Point Formats . . . . . . . . . . . . . . . . 56
2.6 Logarithmic Number Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.6.1 An Example LNS Format . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.6.2 LNS Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.6.3 LNS in Arbitrary Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3 Computing Just Right: Accuracy Specification and Error Analysis . . . . . . . . 65
3.1 Accuracy of an Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.1.1 Absolute and Relative Error . . . . . . . . . . . . . . . . . . . . . . . 67
3.1.2 Unit in the Last Place (ulp) . . . . . . . . . . . . . . . . . . . . . . . . 67
3.1.3 Rounding to the Nearest . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.1.4 Truncation Versus Rounding to the Nearest . . . . . . . . . 70
3.1.5 Rounding and Overflow Specification in
Fixed-Point Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.1.6 Half-Unit Biased (HUB) Format . . . . . . . . . . . . . . . . . . . 71
3.1.7 Rounding to the Nearest When an Approximation
Is Involved . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.1.8 Faithful Operators and Last-Bit Accuracy . . . . . . . . . . . 76
3.1.9 Last-Bit Accuracy Versus Rounding to the Nearest . . . 77
3.2 Using the Format to Specify the Accuracy . . . . . . . . . . . . . . . . . . . 77
3.3 Designing Accurate Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.3.1 Parametric Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

3.3.2 Error Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80


3.3.3 Error Budget and Design-Space Exploration . . . . . . . . 83
3.4 Now Go Divide and Conquer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4 Field Programmable Gate Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87


4.1 Basic Logic Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.1.1 Look-Up Tables and Flip-Flops . . . . . . . . . . . . . . . . . . . . 89
4.1.2 Basic Logic Elements of AMD FPGAs . . . . . . . . . . . . . . 91
4.1.3 Basic Logic Elements of Intel FPGAs . . . . . . . . . . . . . . . 93
4.1.4 Comparison of BLEs in Commercial FPGAs . . . . . . . . . 94
4.2 DSP Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.3 Memory Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3.1 Distributed Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3.2 Shift Register LUTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3.3 Block Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

Part I Revisiting Classic Arithmetic


5 Fixed-Point Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.1 Fixed-Point Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.1.1 Unsigned Integer Addition . . . . . . . . . . . . . . . . . . . . . . . . 104
5.1.2 Signed Integer Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.1.3 Fixed-Point Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2 Addition Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.2.1 The Full Adder and Half Adder Cells . . . . . . . . . . . . . . 106
5.2.2 Ripple-Carry Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2.3 Addition of Signed Numbers in Two’s
Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2.4 Basic Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2.5 Reconfigurable Adder/Subtractor . . . . . . . . . . . . . . . . . . 110
5.2.6 Carry-Save Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.3 Fast Adders for VLSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.3.1 Switched Ripple-Carry Adder . . . . . . . . . . . . . . . . . . . . . 113
5.3.2 Carry-Select Adder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.3.3 Recursive Carry-Select Adder (Conditional-Sum
Adder) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.3.4 Carry-Lookahead Adder . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.3.5 Recursive Carry-Lookahead Adder . . . . . . . . . . . . . . . . . 117
5.3.6 Parallel Prefix Adders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.3.7 Compound Fast Adder . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.3.8 Fast Absolute Difference Operator . . . . . . . . . . . . . . . . . 124
5.3.9 Fast Compound Triple Sum . . . . . . . . . . . . . . . . . . . . . . . . 124
5.4 Adders on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.4.1 Addition Support on AMD FPGAs . . . . . . . . . . . . . . . . . 126


5.4.2 Addition Support on Intel FPGAs . . . . . . . . . . . . . . . . . . 128
5.4.3 Merging Additional Functionality in an Adder . . . . . . 128
5.4.4 Ternary Adders on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.4.5 Reconfigurable Adder/Subtractor . . . . . . . . . . . . . . . . . . 133
5.5 Fast Adders on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.5.1 Pipelined Adders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.5.2 Fast Combinatorial Adders . . . . . . . . . . . . . . . . . . . . . . . . 137
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

6 Fixed-Point Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.1.1 Fixed-Point Considerations . . . . . . . . . . . . . . . . . . . . . . . . 146
6.1.2 Area and Delay Considerations . . . . . . . . . . . . . . . . . . . . 146
6.2 Basic Binary Tree Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.3 FPGA-Specific Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.3.1 Exploiting Fast Carry Logic . . . . . . . . . . . . . . . . . . . . . . . . 148
6.3.2 Two-Level Tree of Fast Carry Logic . . . . . . . . . . . . . . . . . 148
6.3.3 Comparing to a Constant . . . . . . . . . . . . . . . . . . . . . . . . . 149
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

7 Sums of Weighted Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151


7.1 Definitions and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.1.1 A Multiplier Using a Compressor Tree . . . . . . . . . . . . . . 153
7.1.2 From Bit Arrays to Bit Heaps . . . . . . . . . . . . . . . . . . . . . . 157
7.1.3 Bit Heaps for Portable Application-Specific
Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.2 Expressing a Computation as a Bit Heap . . . . . . . . . . . . . . . . . . . . 161
7.2.1 Managing Constant Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.2.2 Managing Signed Numbers . . . . . . . . . . . . . . . . . . . . . . . . 162
7.2.3 Algebraic Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.2.4 Truncating a Bit Heap for Last-Bit Accuracy . . . . . . . . . 167
7.3 Compressor Tree Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.3.1 Basic Compression Algorithms . . . . . . . . . . . . . . . . . . . . 176
7.3.2 A Bestiary of Compressors . . . . . . . . . . . . . . . . . . . . . . . . 180
7.3.3 Compressor Tree Synthesis Using Arbitrary
Compressors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.3.4 Some Remarks on the Final Adder . . . . . . . . . . . . . . . . . 193
7.4 Experimentations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

8 Fixed-Point Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203


8.1 A Functional Point of View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.1.1 Integer Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.1.2 Fixed-Point Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

8.1.3 Error Analysis Involving Multipliers . . . . . . . . . . . . . . . 209


8.1.4 Summary: Multipliers for Computing Just Right . . . . 211
8.2 Overview of Multiplier Construction . . . . . . . . . . . . . . . . . . . . . . . 213
8.3 Partial Product Generation and Sign Management . . . . . . . . . . . 214
8.3.1 Signed Multiplication in Radix-2 . . . . . . . . . . . . . . . . . . . 215
8.3.2 Radix-4 Multiplication Using Booth Encoding . . . . . . . 215
8.3.3 Higher Radix Multiplication . . . . . . . . . . . . . . . . . . . . . . . 221
8.4 Multiplier Construction for FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . 222
8.4.1 Using a Single DSP for Multiple Multiplications . . . . . 223
8.4.2 Multiplier Construction as a Tiling Problem . . . . . . . . . 226
8.4.3 Solving the Tiling Problem . . . . . . . . . . . . . . . . . . . . . . . . 233
8.5 Truncated Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8.5.1 Plain or Booth Truncated Multipliers . . . . . . . . . . . . . . . 237
8.5.2 Tiling for Optimal Truncated Multipliers
on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
8.6 The Karatsuba Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
8.6.1 Two-Part Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
8.6.2 Subtractive Karatsuba Formula . . . . . . . . . . . . . . . . . . . . 245
8.6.3 Tile Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
8.6.4 Square K-Part Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
8.6.5 Recursive Karatsuba . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.6.6 Generalized Karatsuba Formula . . . . . . . . . . . . . . . . . . . 249
8.6.7 Karatsuba with Rectangular Sub-multipliers . . . . . . . . 249
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

9 Fixed-Point Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259


9.1 Fixed-Point Division: Problem Formulation . . . . . . . . . . . . . . . . . 260
9.1.1 Format Considerations for Unsigned
Integer Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
9.1.2 Format Considerations for Signed
Integer Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
9.1.3 Format Considerations for Fixed-Point Division . . . . 262
9.1.4 Division of Normalized Floating-Point
Significands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
9.2 Digit-Recurrence Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
9.2.1 Schoolbook Integer Division in Binary (Restoring
Division) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
9.2.2 General Radix-β Formalization of
Integer Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
9.2.3 General Radix-β Formalization for the Division of
Normalized Significands . . . . . . . . . . . . . . . . . . . . . . . . . . 271
9.2.4 Using Redundant Digit Sets . . . . . . . . . . . . . . . . . . . . . . . 272
9.2.5 Initialization and Termination with Redundant
Digits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

9.2.6 Conversion of the Quotient from Redundant Form to Binary . . . . . . . . 280
9.2.7 High-Radix Integer Division . . . . . . . . . . . . . . . . . . . . . . . 282
9.2.8 Prescaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
9.2.9 Speeding Up the Partial Remainder Computation . . . 286
9.2.10 A Case Study for Low-Latency Double Precision
Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
9.3 Division Using Multiplication by the Reciprocal . . . . . . . . . . . . . 288
9.3.1 Error Analysis for a Faithfully Rounded Quotient
out of a Reciprocal Approximation . . . . . . . . . . . . . . . . . 289
9.3.2 Reciprocal Using a Generic Approximation
Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
9.3.3 Reciprocal Using Newton-Raphson Iteration . . . . . . . . 291
9.4 Division Using Quadratic Series Expansion . . . . . . . . . . . . . . . . . 294
9.5 Other Equivalent Multiplicative Methods . . . . . . . . . . . . . . . . . . . 299
9.5.1 Multiplicative Normalization (Goldschmidt’s
Algorithm) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
9.5.2 Higher-Order Householder Methods . . . . . . . . . . . . . 301
9.5.3 Ad hoc Evaluation of Series Expansion . . . . . . . . . . . . . 301
9.6 Square Root, the Little Sister of Division . . . . . . . . . . . . . . . . . . . . 302
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

10 Shifters and Leading Bit Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307


10.1 Shifters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
10.1.1 Avoiding Bidirectional Shifters . . . . . . . . . . . . . . . . . . . . . 308
10.1.2 Parameters of a Generic Shifter Generator . . . . . . . . . . 309
10.1.3 Architecture of an Exact Full Shifter . . . . . . . . . . . . . . . . 312
10.1.4 Barrel Shifter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
10.1.5 Barrel Shifter with Early Sticky Bit Computation . . . . 314
10.2 Variations on Leading Zero Counters . . . . . . . . . . . . . . . . . . . . . . . 316
10.2.1 Naive LZC Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
10.2.2 Logarithmic-Time LZC Architectures . . . . . . . . . . . . . . 320
10.3 Normalizer (Combined LZC + Shifter) . . . . . . . . . . . . . . . . . . . . . 323
10.4 To Read Further . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

11 Basic Floating-Point Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329


11.1 Floating-Point Addition and Subtraction . . . . . . . . . . . . . . . . . . . . 331
11.1.1 General Considerations and Terminology . . . . . . . . . . . 331
11.1.2 Baseline Addition Architecture . . . . . . . . . . . . . . . . . . . . 336
11.1.3 From Baseline Adder to IEEE 754
Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
11.1.4 Dual-Path Architectures and Other Speculative
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
11.2 Floating-Point Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343


11.2.1 Baseline Multiplication Architecture . . . . . . . . . . . . . . . . 343
11.2.2 From Baseline Multiplier to IEEE 754 Compliance . . . 345
11.2.3 Faithful Floating-Point Multiplier . . . . . . . . . . . . . . . . . . 346
11.2.4 Injection Rounding to Speed Up a Floating-Point
Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
11.3 Floating-Point Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
11.3.1 Baseline Division Architecture . . . . . . . . . . . . . . . . . . . . . 347
11.3.2 From Baseline Divider to IEEE 754 Compliance . . . . . 350
11.3.3 Faithful Floating-Point Division . . . . . . . . . . . . . . . . . . . . 351
11.3.4 Correct Rounding Out of a Faithfully Rounded
Quotient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
11.4 Floating-Point Square Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
11.4.1 Baseline Square Root Architecture . . . . . . . . . . . . . . . . . . 353
11.4.2 Faithful Floating-Point Square Root . . . . . . . . . . . . . . . . 355
11.4.3 From Baseline Square Root to IEEE 754
Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
11.4.4 Correct Rounding Out of Faithful Rounding for
Square Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
11.5 Floating-Point Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
11.5.1 Specification of Floating-Point Comparison . . . . . . . . . 357
11.5.2 Implementation of a Floating-Point Comparator . . . . . 359
11.5.3 Specializations of a Floating-Point Comparator . . . . . . 360
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360

Part II Operator Specialization


12 Multiplication by Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
12.1 Shift-and-Add Integer Constant Multiplication . . . . . . . . . . . . . 366
12.1.1 Signed Digit Representation . . . . . . . . . . . . . . . . . . . . . . . 368
12.1.2 Formalization of the Shift-and-Add SCM Problem . . . 369
12.1.3 Minimal-Adder Constant Multiplication by Graph
Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
12.1.4 Minimal-Adder Constant Multiplication by Using
ILP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
12.1.5 Minimal-Adder Constant Multiplication Using
Ternary Adders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
12.1.6 Minimizing Logic Resources Instead of Number of
Adders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
12.2 Integer Constant Multiplication Using Tables . . . . . . . . . . . . . . . 383
12.2.1 Tabulated Constant Multipliers . . . . . . . . . . . . . . . . . . . . 383
12.2.2 The Table-and-Addition KCM Algorithm . . . . . . . . . . 384
12.3 Multiplication of a Fixed-Point Number by a Real
Constant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
12.3.1 Tabulated Perfectly Rounded Constant
Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
12.3.2 Faithful Fix-by-Real Using Shift-and-Add . . . . . . . . . . . 387


12.3.3 Faithful Fix-by-Real KCM Algorithm . . . . . . . . . . . . . . . 390
12.4 Multiplication by a Rational Constant . . . . . . . . . . . . . . . . . . . . . . 395
12.4.1 On the Periodicity of the Binary Representation of
Rational Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
12.4.2 Periodical Shift-and-Add Graphs . . . . . . . . . . . . . . . . . . 397
12.4.3 Table-Based Multiplication by Rational Constants . . . 401
12.5 Multiplication by Multiple Constants . . . . . . . . . . . . . . . . . . . . . . . 401
12.5.1 Multiple Constant Multiplication Using
Shift-and-Add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
12.5.2 Table-Based Multiple Constant Multiplication . . . . . . . 407
12.5.3 Sum of Constant Products, Integer Case . . . . . . . . . . . . 407
12.5.4 Constant Matrix-Vector Multiplication . . . . . . . . . . . . . . 410
12.5.5 Table-Based Sum of Products of Fixed-Point Inputs
by Real Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
12.6 Other FPGA-Specific Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . 414
12.6.1 Constant Multiplication Using DSP Blocks on
FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
12.6.2 Reconfigurable Constant Multiplication . . . . . . . . . . . . 416
12.7 Conclusion: Choosing the Best Constant Multiplication
Technique in a Given Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418

13 Division by Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427


13.1 Multiplying by the Reciprocal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
13.2 Linear Table-Based Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
13.2.1 Radix-2^k Representation . . . . . . . . . . . . . . . . . . . . . . . 430
13.2.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
13.2.3 Iterative or Unrolled Implementation of the Basic
Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
13.2.4 Cost Evaluation of LinArch . . . . . . . . . . . . . . . . . . . . . . . 432
13.2.5 FPGA-Specific Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
13.3 Parallel Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
13.4 Remainder-Only or Quotient-Only . . . . . . . . . . . . . . . . . . . . . . . . . 435
13.4.1 Reciprocal Method Outputting Only the Quotient . . . 435
13.4.2 Remainder-Only Variant of LinArch and BTCD . . . . . 436
13.5 Composite Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
13.5.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
13.5.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
13.5.3 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440

14 Fixed-Point Squares, Cubes, and Other Integer Powers . . . . . . . . . . 443


14.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
14.2 Squarers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
14.2.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
14.2.2 High-Radix Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445


14.2.3 Booth Recoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
14.2.4 Truncated Squarers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
14.2.5 Squarer-Specific Tiling for FPGAs . . . . . . . . . . . . . . . . . . 447
14.3 Cubes and Higher Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
14.3.1 Direct Algebraic Expression . . . . . . . . . . . . . . . . . . . . . . . 449
14.3.2 Computing Powers by Squaring and Multiplying . . . 450
14.3.3 Approximate Fixed-Point Powers . . . . . . . . . . . . . . . . . . 451
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451

15 Specialization and Fusion of Floating-Point Operators . . . . . . . . . . 453


15.1 Floating-Point Constant Multiplication . . . . . . . . . . . . . . . . . . . . . 453
15.1.1 Floating-Point Multiplication by a Power of Two . . . . 454
15.1.2 Baseline Faithful Floating-Point Multiplier by a
Constant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
15.1.3 Correctly Rounded Multiplication by a
Floating-Point Constant . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
15.1.4 Correctly Rounded Multiplication by a Real
Constant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
15.1.5 Correct Rounding of the Floating-Point
Multiplication by a Rational Constant . . . . . . . . . . . . . . 461
15.1.6 Subnormal Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
15.2 Floating-Point Division by a Small Integer Constant . . . . . . . . . 464
15.2.1 Baseline Operator for Normalized Inputs . . . . . . . . . . . 464
15.2.2 Subnormal Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
15.3 Floating-Point Squares, Cubes, and X^p . . . . . . . . . . . . . . . . . 467
15.4 Floating-Point Addition of Positive Terms . . . . . . . . . . . . . . . . . . 468
15.5 Combined Floating-Point Sum and Difference . . . . . . . . . . . . . . 469
15.6 Fused Multiply and Add, Other Sums of Products . . . . . . . . . . 469
15.7 Floating-Point Optimizations in an HLS Context . . . . . . . . . . . . 470
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472

Part III Generic Methods for Fixed-Point Function Approximation


16 Generalities on Fixed-Point Function Approximation . . . . . . . . . . . 477
16.1 Defining Domain and Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
16.2 Discretization Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
16.2.1 Range Issues and Their Solutions . . . . . . . . . . . . . . . . . . 481
16.2.2 Monotonicity Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
16.3 Some Classes of Numerical Functions . . . . . . . . . . . . . . . . . . . . . . 486
16.3.1 Algebraic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
16.3.2 Elementary and Special Functions . . . . . . . . . . . . . . . . . . 487
16.4 A First Generic Approximator: Simple Tabulation . . . . . . . . . . . 488
16.4.1 Tabulating Precomputed Function
Values in a ROM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
16.4.2 Actual Cost of a ROM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
17 Function Evaluation Using Tables and Additions . . . . . . . . . . . . . . . 497


17.1 Lossless Differential Table Compression . . . . . . . . . . . . . . . . . . . . 497
17.1.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
17.1.2 The Parameter Space of LDTC . . . . . . . . . . . . . . . . . . . . . 499
17.1.3 The LDTC Optimization Algorithm . . . . . . . . . . . . . . . . 500
17.1.4 Cost Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
17.1.5 Evaluation and Observations . . . . . . . . . . . . . . . . . . . . . . 502
17.2 Multipartite Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
17.2.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
17.2.2 The Basic Bipartite Method . . . . . . . . . . . . . . . . . . . . . . . . 504
17.2.3 Exploiting Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
17.2.4 From Bipartite to Multipartite: Splitting the TO . . . . . . 508
17.2.5 Error Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
17.2.6 Computing the Sizes of the TO_i . . . . . . . . . . . . . . . . . . 516
17.2.7 Filling the Tables Using HUB Format . . . . . . . . . . . . . . . 517
17.2.8 Errorless Compression of the TIV . . . . . . . . . . . . . . . . . . 518
17.2.9 Putting It All Together: The Multipartite
Optimization Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 519
17.2.10 The Issue of Non-Monotonicities . . . . . . . . . . . . . . . . . 521
17.2.11 Toward ILP-Optimized Multipartite Architectures . . . 522
17.2.12 Conclusion: Multipartite Architectures Are Close
to Optimal Among Order-One Methods . . . . . . . . . . . . 523
17.3 Other Table-and-Addition Methods . . . . . . . . . . . . . . . . . . . . . . . . 523
17.3.1 Addition-Table-Addition Methods . . . . . . . . . . . . . . . . . 523
17.3.2 Partial Product Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527

18 Polynomial-Based Architectures for Function Evaluation . . . . . . . . 529


18.1 A Primer on Polynomial Approximation . . . . . . . . . . . . . . . . . . . . 530
18.1.1 Taylor Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
18.1.2 Remez Minimax Approximation . . . . . . . . . . . . . . . . . . . 533
18.1.3 Relationship Between Degree and Accuracy . . . . . . . . 535
18.1.4 Relationship Between Interval Size and Accuracy . . . 535
18.1.5 Machine-Efficient Polynomial Approximation . . . . . . . 536
18.1.6 Summing Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
18.1.7 Polynomials Are a Special Case of Rational
Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
18.2 Generic Range Reduction Techniques . . . . . . . . . . . . . . . . . . . . . . . 541
18.2.1 Range Reduction by Uniform Piecewise Splitting . . . . 541
18.2.2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
18.2.3 Logarithmic Piecewise Splitting . . . . . . . . . . . . . . . . . . . . 548
18.2.4 Hierarchical Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . 551
18.2.5 Arbitrary Dichotomy-Based Segmentation . . . . . . . . . . 552
18.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
18.3 Hardware Polynomial Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 554


18.3.1 Horner Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
18.3.2 Parallel Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
18.3.3 Other Polynomial Evaluation Techniques . . . . . . . . . . . 561
18.4 Putting It All Together: Generation of Polynomial
Approximators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562
18.4.1 Overall Error Analysis and Error Budget . . . . . . . . . . . 563
18.4.2 A Basic Polynomial Approximator Without Range
Reductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
18.4.3 A Polynomial Approximator with Uniform
Piecewise Range Reduction . . . . . . . . . . . . . . . . . . . . . . . 565
18.4.4 Combined Approximation and Rounding
Optimization Using Integer Linear Programming
(ILP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
18.5 Floating-Point Architectures Based on Fixed-Point
Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569

19 Digit Recurrence for Algebraic Functions . . . . . . . . . . . . . . . . . . . . . . . 573


19.1 Digit-Recurrence Square Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
19.1.1 Range Reduction for Floating-Point and
Fixed-Point Binary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
19.1.2 Generic Radix-β Formulation . . . . . . . . . . . . . . . . . . . . . . 575
19.1.3 Simple Binary Restoring Square Root . . . . . . . . . . . . . . . 576
19.1.4 Binary Non-restoring Square Root . . . . . . . . . . . . . . . . . . 578
19.1.5 Exploiting Redundant Number Systems in the
Square Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
19.2 Cube Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
19.3 2D and 3D Euclidean Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
19.4 The E-Method for Evaluating Rational Functions . . . . . . . . . . . 591
19.4.1 Digit Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591
19.4.2 Rational Approximation Compatible with the
E-Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594

Part IV Example Composite Operators


20 Fixed-Point Sine and Cosine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
20.1 Mathematical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
20.2 Fixed-Point Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
20.2.1 Binary Angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
20.2.2 Scaled Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
20.2.3 Quick Overview of the Algorithms Compared in
This Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
20.3 Argument Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
20.4 CORDIC Computing Just Right . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
20.4.1 Description of the Algorithm . . . . . . . . . . . . . . . . . . . . . . 605
20.4.2 Overall Error Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609


20.4.3 Approximation Error and Number of Iterations . . . . . 610
20.4.4 Rounding Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
20.4.5 Reduced Zi Datapath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
20.4.6 Replacing Half of the CORDIC Iterations with a
Small Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
20.5 An Architecture Based on Tables and Multipliers . . . . . . . . . . . . 614
20.5.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
20.5.2 Rounding Error Analysis and Implementation
Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
20.5.3 Architectural Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
20.5.4 Computing the Number of Guard Bits . . . . . . . . . . . . . . 619
20.5.5 Special Cases for Small Sizes . . . . . . . . . . . . . . . . . . . . . . . 620
20.6 Architecture Using a Generic Polynomial Evaluator . . . . . . . . . 620
20.7 Comparison and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621

21 Floating-Point Accumulation and Sum of Products . . . . . . . . . . . . . . 623


21.1 Motivation and Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
21.1.1 Exact Floating-Point Sum . . . . . . . . . . . . . . . . . . . . . . . . . . 625
21.1.2 Exact Floating-Point Sum of Products . . . . . . . . . . . . . . 626
21.1.3 Kulisch’s Universal Exact Accumulator . . . . . . . . . . . . . 627
21.1.4 Application-Specific Parameterization . . . . . . . . . . . . . . 628
21.2 Accumulator Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
21.2.1 Exact Floating-Point Multiplier . . . . . . . . . . . . . . . . . . . . 630
21.2.2 Simple Accumulator Architectures for Small
Precisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
21.2.3 High-Radix Carry-Save Architecture . . . . . . . . . . . . . . . 635
21.2.4 Conversion of the Accumulator Back to Floating
Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
21.3 Cost, Speed, and Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 638
21.4 To Probe Further . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639

22 Floating-Point Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641


22.1 Mathematical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
22.2 Number Formats and the Exponential Function . . . . . . . . . . . . . 642
22.2.1 Fixed Point Versus Floating Point . . . . . . . . . . . . . . . . . . 642
22.2.2 Fixed-Point-In, Floating-Point-Out Version . . . . . . . . . . 643
22.2.3 Input Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
22.2.4 A First Architecture: Direct Tabulation . . . . . . . . . . . . . . 645
22.3 Architecture for Floating-Point Range Reduction . . . . . . . . . . . . 646
22.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
22.3.2 Accuracy Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 647
22.3.3 Fixed-Point Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . 648


22.3.4 Computation of the Tentative Exponent . . . . . . . . . . . . 649
22.3.5 Computation of Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
22.3.6 Computation of e^Y . . . . . . . . . . . . . . . . . . . . . . . . . 651
22.3.7 Normalization and Rounding . . . . . . . . . . . . . . . . . . . . . . 651
22.3.8 Overall Error Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
22.4 Fixed-Point-In, Floating-Point-Out Exponential . . . . . . . . . . . . . 654
22.5 Fixed-Point Computation of e^Y . . . . . . . . . . . . . . . . . . . . . 654
22.5.1 Using Large Tables and Square Multipliers . . . . . . . . . . 655
22.5.2 First-Order Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
22.5.3 Polynomial Approximation for Large Precisions . . . . . 660
22.5.4 FPGA-Specific Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 661
22.5.5 Iterative Computation of e^Y Using Small Tables and
Rectangular Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
22.5.6 To Read Further . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664

Part V Application Domains


23 Arithmetic in the Design of Linear Time-Invariant Filters . . . . . . . 669
23.1 An Introduction to Discrete-Time Signals and Filters . . . . . . . . . 670
23.1.1 Discrete-Time Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
23.1.2 Elementary Operations on Discrete-Time Signals . . . . 670
23.1.3 Discrete-Time Systems or Filters . . . . . . . . . . . . . . . . . . . 671
23.1.4 Linear Time-Invariant Filters . . . . . . . . . . . . . . . . . . . . . . 672
23.1.5 Specifying LTI Filters by Constant-Coefficient
Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676
23.1.6 Abstract Filter Structures Versus Finite-Precision
Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
23.1.7 Filters Directly Defined by Their Coefficients . . . . . . . . 681
23.1.8 Filters Defined in the Frequency Domain . . . . . . . . . . . 682
23.1.9 Filter Design: From a Frequency Specification to a
Time-Domain Architecture . . . . . . . . . . . . . . . . . . . . . . . . 685
23.2 An Arithmetic Approach to the Implementation
of LTI Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
23.3 Hardware Digital Filter Faithful to an LTI Filter . . . . . . . . . . . . . 688
23.3.1 Worst-Case Peak Gain of an LTI Filter . . . . . . . . . . . . . . 689
23.3.2 Interface Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 690
23.3.3 Overall Error Analysis of the Implementation of a
Direct-Form LTI Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
23.3.4 Error Amplification in the Feedback Loop . . . . . . . . . . 692
23.3.5 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . 694
23.4 Hardware Digital Filter Faithful to a Frequency
Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
23.4.1 Formal Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
23.4.2 Hardware Filter Design as an Optimization Problem . 695
23.4.3 State of the Art in the Design of Filters Faithful to a Frequency Specification . . . . . . . . . . . . . . . . . . . . 697
23.4.4 Frequency Constraints for Linear-Phase FIR Filters
in ILP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702

24 Arithmetic for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707


24.1 Neural Network Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
24.1.1 Basic Neural Network Structure . . . . . . . . . . . . . . . . . . . 708
24.1.2 Activation Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
24.1.3 Topology and Hyperparameters . . . . . . . . . . . . . . . . . . . 713
24.1.4 Training and Using a Neural Network . . . . . . . . . . . . . . 713
24.1.5 Training Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
24.2 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
24.2.1 Convolutional Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 716
24.2.2 Pooling Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
24.2.3 Batch Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
24.2.4 Passthrough Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
24.2.5 Softmax Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
24.2.6 An Example CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
24.3 Number Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722
24.3.1 Number Formats for Inference . . . . . . . . . . . . . . . . . . . . . 724
24.3.2 Number Formats for Training . . . . . . . . . . . . . . . . . . . . . . 729
24.4 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
24.4.1 The Training Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
24.4.2 Training by Backpropagation . . . . . . . . . . . . . . . . . . . . . . 731
24.4.3 Training and Activation Functions . . . . . . . . . . . . . . . . . 733
24.4.4 Quantization-Aware Training (QAT) . . . . . . . . . . . . . . . 733
24.5 Implementation Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734
24.5.1 Architectures for Inference . . . . . . . . . . . . . . . . . . . . . . . . 734
24.5.2 Fast Convolution Algorithms . . . . . . . . . . . . . . . . . . . . . . 737
24.6 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
24.6.1 Google’s TPU Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
24.6.2 Binary CNN Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 743
24.6.3 AddNet: FPGA-Specific Optimization of the
Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
24.6.4 Unrolling a Ternary CNN . . . . . . . . . . . . . . . . . . . . . . . . . 746
24.6.5 LogicNets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751

A Custom Arithmetic Datapath Design with FloPoCo . . . . . . . . . . . . . . 761


A.1 History of the FloPoCo Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761
A.2 FloPoCo From a User Point of View . . . . . . . . . . . . . . . . . . . . . . . . 762
A.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 762
A.2.2 More on Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
A.2.3 Do Not Trust FloPoCo, the Test Bench Is Included . . . 764


A.2.4 Obscure Branches and Code Attics . . . . . . . . . . . . . . . . . 766
A.2.5 Automatic Pipelining, the User Point of View . . . . . . . 766
A.3 FloPoCo for Arithmetic Designers . . . . . . . . . . . . . . . . . . . . . . . . . . 769
A.3.1 Operators and Instances . . . . . . . . . . . . . . . . . . . . . . . . . . . 770
A.3.2 The Target Class Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . 771
A.3.3 Automatic Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772
A.3.4 The BitHeap Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 775
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 777

B Optimization Using Integer Linear Programming . . . . . . . . . . . . . . . . 781


B.1 Linear Programming Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782
B.1.1 A First Example Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 783
B.1.2 Integer, Binary, and Mixed-Integer Linear
Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783
B.2 A Tutorial on Using ILP Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784
B.2.1 Using Stand-Alone Tools . . . . . . . . . . . . . . . . . . . . . . . . . . 785
B.2.2 ILP Solvers Embedded Within Other Scripting
Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787
B.2.3 ScaLP, a Common C++ Interface to ILP Solvers . . . . . . 787
B.3 Practical Problem Modeling with ILP . . . . . . . . . . . . . . . . . . . . . . . 787
B.3.1 Using Boolean Variables to Model Decision
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 788
B.3.2 Translating Boolean Relations into Constraints . . . . . . 788
B.3.3 Indicator Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793
B.3.4 Splitting Integers into Their Binary
Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793
B.3.5 Counting Leading and Trailing Zeroes . . . . . . . . . . . . . . 794
B.4 Addressing Limitations of ILP Solvers . . . . . . . . . . . . . . . . . . . . . . 795
B.4.1 Run-Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795
B.4.2 Numerical Instabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
CHAPTER 1
Introduction

Mathematics is the queen of the sciences,


and arithmetic is the queen of mathematics.
Carl Friedrich Gauß

That arithmetic is a menial task is proved


by the fact that it is the only one that can be performed by a machine.
Arthur Schopenhauer

1.1 Computer Arithmetic

Computer arithmetic is the art of representing and processing numbers in a
machine. It has its roots in the abacus of ancient times and in the mechan-
ical calculator built by Schickard, Pascal, Leibniz, and others during the
Renaissance. It has also been at the core of electronic computers since the
dawn of this technology in the 1940s.
Computer arithmetic is deeply rooted in technology. For instance, when
technology was based on humans’ head and fingers, numbers were repre-
sented in radix 10 or 60. Western abacuses, the Pascaline, the Arithmometer,
and the Eniac were decimal calculators. In the mid-1930s, however, Konrad
Zuse understood that machines would be more efficient when computing
in binary [Zus84; Cer90; Roj97]. The key idea was that it was much easier
to work with two states, on and off. After two years of attempts to process
binary numbers mechanically, Zuse turned to relay-based switching, and

© Springer Nature Switzerland AG 2024
F. de Dinechin, M. Kumm, Application-Specific Arithmetic,
https://doi.org/10.1007/978-3-031-42808-1_1

later to vacuum tubes, and the idea remained relevant with transistor-based
switching, the core technology behind current computers.
During the electronic computer era, technological progress has often
driven evolutions in the computer arithmetic domain. One illustrative ex-
ample is the use of memory when evaluating elementary functions (expo-
nential and logarithm, trigonometric, etc.). In the late 1980s, as memory sizes
doubled every other year following Moore’s law, it became relevant to de-
sign algorithms that could replace long computations with large tables of
precalculated values [Tan89; Tan90; GB91; Tan92]. Such algorithms domi-
nated the state of the art for three decades. But what technology gives, it
can also take back. Even though memory size still increases, memory ac-
cess time has become the main performance bottleneck in current multicore
computers. For this reason, table-free algorithms are now considered more
efficient in many cases [CHT02]: it is often faster and more energy-efficient
to compute tens of additions and multiplications than to perform one mem-
ory lookup.
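To make the table-based approach concrete, here is a hedged Python sketch in the spirit of these algorithms (not the exact published schemes): the input is reduced so that a small precomputed table supplies most of the result and a short polynomial finishes the job. The 32-entry table and the truncated Taylor series are illustrative choices, not those of the cited papers.

```python
import math

# Table-based exponential: exp(x) = 2**k * T[j] * exp(r), where
# T[j] = 2**(j/32) is precomputed and |r| <= ln(2)/64 is small enough
# for a short truncated series. The table size is an illustrative choice.
TABLE = [2.0 ** (j / 32) for j in range(32)]
LN2_32 = math.log(2) / 32

def exp_table(x):
    n = round(x / LN2_32)                # x ~ n * ln(2)/32 + r
    k, j = divmod(n, 32)                 # split into exponent and table index
    r = x - n * LN2_32                   # reduced argument, |r| <= ln(2)/64
    poly = 1.0 + r + r*r/2 + r*r*r/6     # truncated series for exp(r)
    return math.ldexp(TABLE[j] * poly, k)
```

With the reduced argument this small, the degree-3 truncation already yields roughly 9 correct decimal digits; the table trades memory for polynomial degree, which is exactly the trade that memory growth once made attractive.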
Digital technology is still evolving. How should computer arithmetic
adapt to current and upcoming changes?
One answer proposed in this book is to look beyond traditional computer
arithmetic, long focused on the four basic operations and their monolithic,
one-size-fits-all implementations in the arithmetic units at the core of micro-
processors. The object of this book is the systematic study of non-standard
arithmetic units, built and optimized for specific applications: what we call
application-specific arithmetic.
The remainder of this chapter first justifies this claim and then sketches
the scope and potential of application-specific arithmetic.

1.2 Trends in Digital Technology

For five decades, the evolution of technology has been well summarized
by Moore’s law, which states that the number of transistors that can be inte-
grated on an economically viable chip doubles every other year. This law has
remained surprisingly accurate over this long period of time, partly because
it has become a self-fulfilling prophecy on which the economics of a ma-
jor part of the digital sector relies. The International Technology Roadmap
for Semiconductors (ITRS) has made sure that we get smaller transistors (or
equivalently more transistors per chip) as long as fundamental physical lim-
its are not reached.
However, the circuit-level performance brought by smaller transistors is
no longer what it used to be. Until the end of the twentieth century, each
technology generation brought smaller transistors that would switch faster
and consume less power (this is referred to as Dennard scaling). Thus, pro-
cessors were more and more powerful and clocked higher and higher. How-
ever, there is a limit on the amount of heat per square centimeter that can be
dissipated off a chip, and this limit (the “power wall”) was reached around
2004. From there on, transistors could still be made smaller with each gen-
eration, but their practical operational frequency reached a plateau. This
prompted the switch to multicore processors as the most efficient means
to harness increased transistor densities.
Unfortunately, in the upcoming technology generations, the increase of
transistor density will entail a further increase in power density. Roughly
speaking, every two years, transistor density is multiplied by two, but the
power dissipation of each transistor is only reduced by a factor 1.4 [Tay12].
The net effect is that we can no longer afford to operate 100% of a chip all
the time. A certain percentage must be kept switched off (or dark, hence
the term dark silicon [Mer09]). Even worse, this percentage is now increasing
exponentially with each technology generation.
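The arithmetic behind this trend can be sketched in a few lines. Assuming, per [Tay12], that each generation doubles transistor density while cutting per-transistor power only by a factor 1.4, a fixed dissipation budget forces the usable fraction of the chip to shrink by 2/1.4 ≈ 1.43× per generation. The numbers below are a back-of-the-envelope illustration, not process data:

```python
# Dark-silicon trend: per generation, transistor density doubles but
# per-transistor power only drops by 1.4, so under a fixed power budget
# the fraction of the chip that may be active shrinks geometrically.
density = 1.0               # transistors per chip, normalized
power_per_transistor = 1.0  # normalized
budget = 1.0                # fixed chip-level dissipation (the power wall)

fractions = []
for generation in range(5):
    full_chip_power = density * power_per_transistor
    fractions.append(min(1.0, budget / full_chip_power))
    density *= 2.0
    power_per_transistor /= 1.4

print([f"{f:.2f}" for f in fractions])  # ['1.00', '0.70', '0.49', '0.34', '0.24']
```

After only four generations under this model, three quarters of the chip must stay dark.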

1.2.1 The Dark Silicon Era

Taylor [Tay12] reviews architectural solutions to the dark silicon problem,
some of which are relevant to computer arithmetic. The main one is to
complement the usual, universal processor cores with specialized copro-
cessors that offer only one functionality, but with much higher efficiency.
Such specialized coprocessors may be arithmetic components or may be
dedicated to higher-level functionality such as cryptography, signal process-
ing for software-defined radio, face detection, fingerprint recognition, etc.
Each of these high-level coprocessors will itself require application-specific
arithmetic, for instance, a multiplier by log(2) in a coprocessor comput-
ing a floating-point exponential or a low-precision atan(x/y) in a signal-
processing pipeline.
Let us open a parenthesis about the recent history of the division opera-
tion in microprocessors, to illustrate that this solution is already at work. The
design of hardware dividers has been the subject of much research and is
very well covered by other books [EL94; EL04]. In 1997, a paper by Oberman
and Flynn [OF97a] investigated in detail the relationship between divider
latency and system performance. They observed that only 3% of floating-
point operations are divisions in the SPECFP92 benchmark suite, but that
these few divisions could account for up to 40% of the latency if a low-area,
long-latency divider is used. The conclusion was that low latency hardware
dividers were needed in order to achieve the best performance. This was in
line with the prevalent IA32 instruction set, which has included division in-
structions and division hardware since the original 8086 processor (integer
division) and 8087 coprocessor (floating-point division). Despite this, a few
years later, at the turn of the century, the IA64 instruction set was designed
by HP and Intel without division hardware and even without a division
instruction. The argument was the following: it was possible to build very
efficient division in software thanks to a new instruction, the Fused Multiply-
Add (FMA) [Mar00]. Therefore, the silicon area occupied by a hardware di-
vider could be much better spent, for instance, on a second FMA, which is a
more generally useful operator than a divider.
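The software scheme alluded to here is, in essence, Newton–Raphson refinement of a reciprocal, which Markstein [Mar00] shows how to implement with FMAs. The sketch below is a simplified plain-Python illustration (no real FMA, no correct rounding, positive divisor assumed), not the actual IA64 sequence:

```python
import math

def nr_divide(a, b, iterations=4):
    """Approximate a/b (b > 0) by Newton-Raphson refinement of 1/b.

    Each step computes err = 1 - b*y, then y = y + y*err; on FMA-capable
    hardware both lines map onto single fused multiply-add instructions.
    """
    m, e = math.frexp(b)                   # b = m * 2**e with 0.5 <= m < 1
    y = math.ldexp(48/17 - 32/17 * m, -e)  # classic linear seed for 1/b
    for _ in range(iterations):
        err = 1.0 - b * y                  # residual of the reciprocal
        y = y + y * err                    # error roughly squares each step
    return a * y
```

With the seed's relative error bounded by 1/17, four iterations drive the error below double-precision resolution; a correctly rounded quotient additionally needs a final FMA-based correction step, omitted here.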
However, we are now seeing a comeback of hardware dividers in proces-
sors [Bru18; Bru22]. The reason for that is, again, power consumption (and
not performance as in 1997 [OF97a]). A nicely pipelined hardware divider
will be much more power-efficient at computing a division than software:
the latter needs to read and decode instructions, move data from and to the
register file, etc. Even if a hardware divider were not faster than the
equivalent software (and it is), it would increase the overall performance by
freeing some power budget to be used by other computations. Therefore, it
seems a worthwhile use of dark silicon.
What function is next? Square root is very comparable to division in terms
of algorithms and complexity. However, still according to [OF97a], it ac-
counts for less than 0.33% of floating-point instructions. Then, we may see a
resurgence of interest in implementations of floating-point elementary functions
such as exponential, logarithm, and trigonometrics. Some GPUs (graphics
processing units) already provide hardware acceleration for the most com-
mon of these functions [OS05].
Another way to put dark silicon to use, according to [Tay12], is to use
more area to buy energy. For instance, since power dissipation is propor-
tional to the square of the frequency, two copies of a functional block, each
clocked to half the frequency, will actually consume half the power. Here,
what is needed is not a new operator, but a performance variation on an
existing one. This book will also attempt to cover this performance space.
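Under the text's simplifying assumption that dynamic power grows with the square of the clock frequency, the trade is easy to check numerically:

```python
# Two half-speed copies of a block deliver the throughput of one
# full-speed copy for half the power, under the simplified model
# power = k * frequency**2 used in the text.
def power(freq, k=1.0):
    return k * freq ** 2

single = power(1.0)          # one block at full frequency
duplicated = 2 * power(0.5)  # two blocks, each at half frequency
print(single, duplicated)    # 1.0 0.5
```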
We will conclude this section about dark silicon with three quotes. The
first two come from Doug Burger's keynote talk at the HiPEAC 2013 confer-
ence:
“The end of Moore’s Law doesn’t mean the end of progress”, and “noth-
ing in our careers has been as fundamental as this transition.”
The third one is from the conclusion of [Tay12]: “Although silicon is get-
ting darker, for researchers the future is bright and exciting.”
The intent of this book is to bring some of this excitement in the computer
arithmetic domain.

1.2.2 The Dawn of the Reconfigurable Computer

A field programmable gate array (FPGA) is an integrated circuit designed
to emulate arbitrary digital circuits. Technically, its structure is that of a grid
of configurable logic functions, embedded in a fabric of configurable rout-
ing channels (see Fig. 1.1). The word configuration describes the process of
assigning a function to the field programmable gate array (FPGA). After
configuration, the FPGA behaves as the circuit it has been configured for.
The configuration of an FPGA is held in static registers and can be changed
arbitrarily. More details will be given in Chap. 4.

Fig. 1.1 Simplified overview of FPGA structure: configurable logic blocks embedded in configurable routing resources.

This configurability has a cost: A circuit emulated in an FPGA typically
performs one order of magnitude slower than the same circuit cast directly
to silicon [KR07]. This FPGA performance gap is mostly due to the reconfig-
urable routing: wires are longer, and information has to pass through the
switches that ensure the reconfigurability. On a positive note, FPGAs have
not yet hit their power wall. This is due to their lower operating frequency,
combined with a vast area for the reconfigurable routing that dissipates
more heat than it generates.
FPGAs were initially designed for rapid prototyping (emulating a logic
design before committing it to silicon) and as a flexible replacement to pro-
grammable logic arrays. However, they evolved to universal computing ma-
chines. For instance, they can emulate a conventional computer, but also
other programming models, such as von Neumann’s cellular automata or
data-flow machines. They were therefore soon used as “reconfigurable com-
puters.” At the time of writing this book, several companies successfully sell
FPGA-based computing, and FPGA-accelerated cloud services are commer-
cially available.
How can an FPGA compete with a processor considering its performance
gap? The first answer is that a processor is also inherently quite inefficient
due to its sequential programming model. A typical processor cycle consists
of fetching an instruction, decoding it, reading the operands from registers,
executing the instruction, and finally storing the result in registers. Among
all these steps, only one step (execution) actually performs the computation.
All the other steps are there for program and data management. Hence, an
FPGA architecture that only implements the execution steps will typically
be more efficient. Figure 1.2 shows an example of such an architecture for a
simple computation. There is no program management necessary: the vari-
ous operators are laid out on the FPGA fabric. There are registers (typically
between each operator, and even inside operators in a pipelined design), but
these registers are statically assigned and therefore without the address de-
coding logic found in a processor’s register file.

Fig. 1.2 Dataflow computation of d = b² − 4ac: three multipliers feed a subtracter producing d.

A second way an FPGA can outperform a processor despite its perfor-
mance gap is a better use of parallelism. If a stream of data is available
at the input of a dataflow architecture such as that of Fig. 1.2, all its oper-
ators may function in parallel, in a pipeline fashion. Besides, the operators
implemented in this architecture exactly match the requirements of the ap-
plication. There is also a lot of parallelism in a processor: single instruction,
multiple data (SIMD) parallelism (as in the MMX/SSE/AVX instructions
of Pentium-like processors), superscalar parallelism, multicore parallelism,
and Symmetric Multi-Threading (SMT). However, the number of operators
is fixed. It has been carefully engineered to best match the average load of
a processor, but will almost always be suboptimal for a given application.
Consider, for instance, the computation of d = b² − 4ac in floating point.
Here, we have more multiplications than additions. In a typical processor,
the ratio of floating-point multipliers to floating-point adders is 1:1, so this
computation under-uses the addition hardware. Besides, many other hard-
ware operators are unused: the integer ones, the divider, etc.
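The behaviour of such a dataflow pipeline can be mimicked in software. The toy model below (an illustration only, unrelated to any real FPGA toolflow) latches values into explicit stage registers; every iteration of the loop plays the role of one clock cycle, in which the three multipliers and the subtracter of Fig. 1.2 all fire in parallel:

```python
# Toy software model of the pipelined dataflow of Fig. 1.2. Each
# variable stage1/stage2 stands for a pipeline register; on an FPGA the
# operators are physically distinct hardware that all work concurrently.
def pipeline_d(stream):
    """Compute d = b*b - 4*a*c for a stream of (a, b, c) triples."""
    stage1 = None        # register after the multipliers: (b*b, 4*a*c)
    stage2 = None        # register after the subtracter: d
    out = []
    for triple in list(stream) + [None, None]:   # flush with two bubbles
        if stage2 is not None:
            out.append(stage2)                   # result leaves the pipeline
        # stage 2: subtract the products latched on the previous cycle
        stage2 = stage1[0] - stage1[1] if stage1 is not None else None
        # stage 1: the three multiplications happen in the same cycle
        if triple is not None:
            a, b, c = triple
            stage1 = (b * b, 4 * a * c)
        else:
            stage1 = None
    return out
```

Once the two-cycle latency is filled, one result emerges per cycle, which is the pipelined throughput argument made above.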
A third, and often overlooked, way to bridge the FPGA performance gap
is to tailor, as tightly as possible, the arithmetic to the application. A maxi-
mally efficient implementation would, for each of its operations, toggle and
transmit just the number of bits required by the application at this point.
Conventional microprocessors have a fixed, limited choice of data widths
a severe attack of dysentery, and we made arrangements with the
old Manga to leave him behind with enough cloth for his keep for
some months. Manga’s son Koranja and some of the old men
signified their intention of accompanying us part of the way. It
appeared that for two days’ journey we should be among friendly
tribes. After that, the Wa’M’bu!
We started the following morning as soon as Koranja appeared.
The country was extraordinarily rich and fertile. The soil is bright red,
and produces, in conjunction with the constant moisture, a practically
unlimited food-supply. The ground was very hilly and well watered—
too well watered for our comfort. There were no large trees, but the
undergrowth was very rank and dense. We saw large quantities of
the castor-oil plant (Ricinus communis) growing wild. The natives
press the dark-coloured oil from the seeds and smear their bodies
with it.
Several times on that morning’s march we saw Koranja, who was
leading, dart hurriedly to one side, and, leaving the path, plunge into
the undergrowth, making a devious détour round something,
followed, of course, by the safari. We asked the reason of his
strange conduct, and the answer more than satisfied us. It was the
single word “ndui” (small-pox). We passed quite half a dozen villages
which were entirely depopulated by the scourge. Now and again we
saw a solitary emaciated figure, covered with small-pox pustules,
crouching on the side of the path, watching us with an uninterested
and vacant stare. On a shout from Koranja and a threatening motion
of his spear, it would slink mournfully away into the deeper recesses
of the jungle.
We reached a small clearing about midday, and camped. We were
unable to build a boma round the camp, owing to the absence of
thorn trees, or any reliable substitute; so that we were in a measure
defenceless against a sudden attack. Large numbers of armed
natives soon put in an appearance, and swaggered in and out with
great freedom, and even insolence. We cleared them out politely, but
firmly, and they then congregated outside and discussed us. They
talked peacefully enough, but it was more like the peaceful singing of
a kettle before it boils over. We ate our lunch, and retired to our
tents. George and I went to our own tent, and, taking off our boots,
lay down on our blankets for a quiet smoke. Our men seemed very
much upset by the stories they had heard in Maranga concerning the
warlike qualities of the Wa’M’bu, and their condition could only be
described as “jumpy.” To put it plainly, they were in a pitiable state of
fright, and needed careful handling, if we were to avoid trouble with
the natives through their indiscretion; as trouble would come quite
soon enough of its own accord without that.
GROUP OF A’KIKUYU.

To resume, George and I had lain down, perhaps, half an hour,
and were quite comfortable and half asleep, when a terrific
altercation caused us to jump up and rush outside. We were just in
time to assist El Hakim in forcibly disarming our men. Some of them
were placing cartridges in the breeches of their rifles; a few yards
away a vast crowd of natives were frantically brandishing their
spears and clubs and yelling like demons. If a shot had been fired,
we should have been in rather a tight place, for, as I have said, the
camp was quite open, and practically defenceless. If the A’kikuyu
had rushed us, then the chances are that another fatality would have
been added to Africa’s already long list. As it was, by much shouting
and punching, we induced our excited and frightened men to put
down their weapons in time, and so regained control over them.
Koranja, shaking visibly, went up to the Kikuyu chief and smoothed
matters down, after which mutual explanations ensued. It appeared
that an M’kikuyu warrior had indulged too freely in “tembo” (native
beer), and had run amuck through our camp. Our men, in their
already fidgety state, jumped to the conclusion that they were being
attacked, seized their rifles, and were about to use them, when our
timely appearance on the scene prevented a very pretty butchery.
The natives professed to be very sorry for what had occurred, and,
seizing their drunken companion, hurried him away, and peace, if not
harmony, was restored.
We did not trust them, however, as they seemed very sullen over
the whole business. Koranja was also very nervous, and showed it,
which did not tend to reassure our men. We ate our dinner at dusk,
to the accompaniment of howling and shouting from A’kikuyu
concealed in the surrounding bush. We doubled the guard at
sundown, just before we went to dinner, giving them the most
precise instructions in the event of an alarm. At the conclusion of the
meal we were startled by a volley from the sentries. The whole camp
was immediately alarmed, and symptoms of a panic manifested
themselves. We restored order with a little difficulty, and, on
investigation, found that the sentries had fired on some natives
skulking round in undue proximity to the camp.
We now made every preparation for attack, and made
arrangements for one or the other of us to be on guard all night. I
took the first watch from 8 p.m. to 10 p.m., and El Hakim the second
from 10 p.m. to 12 a.m.; but everything remained quiet, and El
Hakim did not think it necessary to call George at midnight, the rest
of the night proving uneventful, with the exception that our fox-terrier
gave birth to six puppies, of which she seemed very proud.
At daylight we struck camp, and were away before the sun was
fairly up. The country was much the same as on the day before,
though, if anything, the jungle was more dense. The shambas were
filled to overflowing with unripe muhindi and pumpkins, while sweet
potatoes and beans were growing in great profusion on every side.
Travelling in the early morning was decidedly unpleasant, as the dew
collected on the shrubbery was shaken down upon us in showers,
wetting us through to the skin. We crossed two or three small rivers,
and at midday reached and camped at a place called Materu.
The chief soon put in an appearance, and we purchased a further
supply of food, in the shape of potatoes, beans, muhindi, and a little
honey. We also obtained further information of the road through the
notorious M’bu country which, I must confess, did not seem to have
any better reputation the nearer we approached it.
Our Maranga friends, under Koranja, appeared very frightened at
their close proximity to the dreaded Wa’M’bu, and intimated their
intention of returning to Maranga. We answered that they might go
when we gave them permission, but for the present we required their
services; with which answer they had perforce to be content.
The next morning we again travelled through much the same
densely populated and cultivated country as that hitherto passed,
though it seemed to get more mountainous. We had not as yet got a
view of Mount Kenia, as the sky had been for days covered with a
thick curtain of grey clouds. Koranja informed us that two hours after
starting we should reach a river called “Shelangow,” which was the
boundary of M’bu. We said the sooner the better.
At midday, after some hours’ steady march, we appeared to be as
far from the “Shelangow” as ever, though we had been informed that
it was “huko mbeli kidogo” (only just in front) for over three hours. As
the men were very tired, El Hakim decided to camp, in spite of
Koranja’s energetic protests that the Shelangow was “karibu
kabissa” (very near). The country was very wet with the constant
drizzle and mist, which made the steep clayey paths exceedingly
slippery, while between the shambas the way led through thickets of
brambles and stinging nettles, which caused the porters endless
discomfort. On halting, we built a boma of shrubs; not that we
thought it would be of any use in case of an attack, but to give the
men confidence. We wrote letters and gave them to Koranja, on the
remote chance that they would get down to Nairobi, and thence to
England. (They did get down four months later, and were delivered in
England five months after they were written.)
In the evening Koranja and his friends bade us an
affectionate and relieved farewell. They remarked in parenthesis that
they would never see us again, as the Wa’M’bu would certainly kill
us all; a belief that probably explained why they helped themselves
to all our small private stock of sweet potatoes before they left; a
moral lapse that—luckily for them—we did not discover till next
morning. Our men sent a deputation to us during the evening,
pointing out the perils of the passage through M’bu, and saying that
we should of a certainty be killed, and most likely eaten. This
statement we received with polite incredulity, and dismissed the
deputation with a warning not to do it again.
Next morning I was very queer, a large lump having formed in my
groin. This is a very common complaint in East Africa and Uganda,
supposedly due to over-fatigue and walking, though I think climate
and diet have something to do with it. George had two very bad ones
on his way down from Uganda. It was my second experience of
them, and the oftener I suffered from them, the less I liked them, as
they are exceedingly painful. The only cure seems to be complete
rest, and hot fomentations applied to the swelling.
We did not travel that day in consequence, but occupied ourselves
in buying a little food and getting what further information we could
about the road ahead. There were not many natives or villages about
—a fact easily explained by the contiguity of the M’bu border. The
place where we were camped was a sort of neutral territory, or “no
man’s land.”
Next day, soon after daylight, we set out for the Shelangow, which
was reached after a couple of hours’ march over very steep country.
It proved to be merely a mountain torrent, which we easily crossed.
On the other side rose a very steep hill, to the top of which we
climbed, and found ourselves at last in the country of the dreaded
Wa’M’bu.
CHAPTER IV.
FROM M’BU, ACROSS EAST KENIA, TO ZURA.

First sight of Kenia—Hostile demonstrations by the M’bu people—We
impress two guides—Passage through M’bu—Demonstrations in
force by the inhabitants—Farewell to M’bu—The guides desert—
Arrival in Zuka—Friendly reception by the Wa’zuka—Passage
through Zuka—Muimbe—Igani—Moravi—Arrival at Zura—
Welcome by Dirito, the chief of Zura.
In order that there should be no misunderstanding on the part of the
Wa’M’bu as to our calibre, El Hakim determined to pursue an
aggressive policy, without, however, committing any overt act. We
accordingly pitched our camp in the middle of one of their shambas,
and helped ourselves freely to anything we fancied in the way of
muhindi, etc. Their natural line of reasoning would be that a safari
which had the effrontery to act in that way must be very powerful,
and should therefore be approached with caution.
The result entirely justified our action; which was only what we
expected, as with bullying natives, might is always right.
No natives came into our camp—a bad sign, though we saw many
skulking round in the bush. They seemed very morose and sulky, but
so far showed no signs of active hostility. We put on a double guard
for the night, and went to sleep in our clothes; but we were not
disturbed.
We did not travel the following morning, as we were without
guides; and as no natives came into camp we resolved to capture
one on the first available opportunity. At sunrise we got our first
glimpse of Mount Kenia, and a wonderful view it was. Kenia is called
“Kilimaro” by the Swahilis, and “Donyo Ebor” (Black Mountain) and
“Donyo Egere” (Spotted Mountain) by the Masai; so called because
of the large black patches on the main peak, where the sides are too
precipitous for the snow to lodge.
Thompson[2] describes his first impressions of Kenia thus:—
“As pious Moslems watch with strained eyes the appearance of
the new moon or the setting of the sun, to begin their orisons, so we
now waited for the uplifting of the fleecy veil, to render due homage
to the heaven-piercing Kenia. The sun set in the western heavens,
and sorrowfully we were about to turn away, when suddenly there
was a break in the clouds far up in the sky, and the next moment a
dazzling white pinnacle caught the last rays of the sun, and shone
with a beauty, marvellous, spirit-like, and divine; cut off, as it
apparently was, by immeasurable distance from all connection with
the gross earth. The sun’s rays went off, and then, with a softness
like the atmosphere of dreams, which befitted the gloaming, that
white peak remained as though some fair spirit with subdued and
chastened expression lingered at her evening devotions. Presently,
as the garish light of day melted into the soft hues and mild
effulgence of a moon-lit night, the ‘heaven-kissing’ mountain became
gradually disrobed; and then in its severe outlines and chaste beauty
it stood forth from top to bottom, entrancing, awe-inspiring—meet
reward for days of maddening worry and nights of sleepless anxiety.
At that moment I could almost feel that Kenia was to me what the
sacred stone of Mecca is to the Faithful, who have wandered from
distant lands, surmounting perils and hardships, that they might but
kiss or see the hallowed object, and then, if it were God’s will, die.”
While I am unable to rise to the dizzy heights of rhetorical
description, or revel in the boundless fields of metaphor so
successfully exploited by Mr. Thompson, I fully endorse his remarks.
The first sight of Kenia does produce a remarkable impression on
the traveller; an impression which does not—one is surprised to find
—wear off with time. Kenia, like a clever woman, is chary of
exhibiting her manifold charms too often to the vulgar gaze. One can
live at the base of the mountain for weeks, or even months, and
never get a glimpse of its magnificent peak.
We, however, could not stop to romance, as the enemy were even
now clamouring without our gates; and we were reluctantly
compelled to turn our wandering attention to a more serious
business. It appeared quite within the bounds of possibility that we
should “die” without even “kissing” the “hallowed object” so ably
eulogized by Mr. Thompson; as the irreverent Wa’M’bu were making
hostile demonstrations in the thick bush surrounding our camp,
regardless of our æsthetic yearnings. They were apparently trying
our temper by means of a demonstration in force, and such awful
howlings as they made I never previously heard.
Our men became very nervous, and fidgeted constantly with their
guns, looking with strained gaze into the bush without the camp. El
Hakim was, as usual, quite undisturbed, and George and I
succeeded in keeping up an appearance of impassive calm, and
condescended even to make jokes about the noise, an attitude
which went a long way towards reassuring our men, who watched us
constantly. Any sign of nervousness or anxiety on our part would
have been fatal, as the men would have instantly scattered and run
for the border, with a result easily foreseen.
The morning passed in this manner, the Wa’M’bu continuing their
howling, while we went through our ordinary camp routine with as
much nonchalance as we could command.
We had lately lived largely upon vegetables, and now determined
to give ourselves a treat, so we cooked our only ham, and made an
excellent lunch on ham and boiled muhindi cobs. During the meal
the war-cries of the Wa’M’bu increased in volume, and our men were
plainly very much disturbed. They kept looking in our direction as if
for orders; while we appeared as if utterly unaware that anything
untoward was happening.
Presently Jumbi came up with his rifle at the shoulder, and
saluting, stood a yard or so away from the table. El Hakim was busily
eating, and studiously ignored him for a moment or two. Presently he
looked up.
“Yes?” he said inquiringly.
Jumbi saluted again. “The ‘Washenzi,’ Bwana!” said he.
“Well?” interrogated El Hakim again.
“They are coming to attack us, Bwana, on this side and on that
side,” said Jumbi, indicating with a sweep of his arm the front and
rear of the camp.
“All right,” said El Hakim, “I will see about it after lunch; I am eating
now. You can go.”
And Jumbi, saluting once more, went off to where the men were
nervously waiting. His account of the interview, we could see,
reassured them greatly. They concluded the “Wasungu” must have
something good up their sleeve to be able to take matters so calmly.
At the conclusion of the meal we instructed our men to shout to
the enemy and ask them as insolently as possible if they wanted to
fight. There was a sudden silence on the part of the Wa’M’bu when
they realized the purport of the words; but in a little time a single
voice answered, “Kutire kimandaga” (We do not want to fight). We
then invited their chief to come into camp, an invitation he seemed
very slow to accept, but after long hesitation he mustered up
sufficient courage, and walked slowly into camp, accompanied by
one other old man.
He was a fine-looking, grey-haired old chap, and carried himself
with great dignity. Negotiations were opened with a few strings of
beads, which after a moment’s indecision he accepted. We then
talked to him gently, but firmly, and asked the reason of the
unseemly noise outside.
“Do you want to fight?” we asked aggressively.
He replied that the old men did not want to fight, but the young
men did.
“Very well,” we said, still more aggressively, “go away and tell the
young men to come on and fight us at once, and let us get it over.”
He then added that the young men did not want to fight either.
This was our opportunity, and, seizing it, we talked very severely
to him, intimating that we were much annoyed at the noise that had
been made. We did not consider it at all friendly, we said, and if there
were any more of it, we should not wait for the young men to come
to us, we should go to them and put a stop to their howling.