Multiplication
Parts and chapters:
I. Number Representation: 1. Numbers and Arithmetic; 2. Representing Signed Numbers; 3. Redundant Number Systems; 4. Residue Number Systems
Elementary Operations
II. Addition / Subtraction: 7. Variations in Fast Adders; 8. Multioperand Addition
Chapter Goals
Study shift/add or bit-at-a-time multipliers
and set the stage for faster methods and
variations to be covered in Chapters 10-12
Chapter Highlights
Multiplication = multioperand addition
Hardware, firmware, software algorithms
Multiplying 2’s-complement numbers
The special case of one constant operand
Fig. 9.1 Multiplication of two 4-bit unsigned binary numbers in dot notation. The multiplicand a and the multiplier x yield the partial products bit-matrix with rows $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, and $x_3 a 2^3$, which sum to the product p.
Apr. 2015 Computer Arithmetic, Multiplication Slide 7
Multiplication Recurrence
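Fig. 9.1 (repeated): partial products bit-matrix for multiplicand a and multiplier x, with product p.
Preferred (right-shift) recurrence:
$p^{(j+1)} = (p^{(j)} + x_j a 2^k)\, 2^{-1}$
The term $p^{(j)} + x_j a 2^k$ is the add step; the factor $2^{-1}$ is the shift-right step.
[Figure: multiplier x in a shift register, multiplicand a, a mux that selects 0 or a under control of $x_j$, and a k-bit adder with $c_{out}$.]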
[Worked example: operands (10)ten = 1010 and (11)ten = 1011, product (110)ten = 0110 1110.]
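The right-shift recurrence can be traced with a short sketch. The function name `shift_add_multiply` and the starting value p(0) = 0 are assumptions made for illustration; only the recurrence itself comes from the slide.

```python
def shift_add_multiply(a, x, k):
    """Multiply unsigned k-bit a and x using p(j+1) = (p(j) + x_j * a * 2^k) / 2."""
    p = 0  # assumed initial value p(0) = 0
    for j in range(k):
        xj = (x >> j) & 1              # multiplier bit examined in step j
        p = (p + xj * (a << k)) >> 1   # add x_j * a * 2^k, then shift right
    return p


if __name__ == "__main__":
    # The example on the slide: (10)ten times (11)ten
    print(shift_add_multiply(10, 11, 4))
```

The right shift never discards a 1, because after j steps the partial product is a multiple of 2^(k-j).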
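[Figure: hardware multiplier in which the multiplier x sits in a shift register and supplies $x_{k-j-1}$ as mux control; the mux selects 0 or the multiplicand a, and a 2k-bit adder with $c_{out}$ accumulates the 2k-bit partial product.]
[Figure: multiplicand selection logic with a mux (inputs 1 and 0, Enable and Select controls) feeding a (k+1)-bit path.]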
Justification
$2^j + 2^{j-1} + \cdots + 2^{i+1} + 2^i = 2^{j+1} - 2^i$
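The identity above says that a run of 1s from position i to position j can be replaced by +1 at position j+1 and -1 at position i. A minimal sketch of radix-2 recoding built on that idea follows; it assumes this is the Booth-style recoding the deck refers to elsewhere, and the name `booth_recode` and the unsigned-operand convention are illustrative choices.

```python
def booth_recode(x, k):
    """Return digits y_0..y_k, each in {-1, 0, 1}, with sum(y_i * 2^i) == x.

    x is treated as an unsigned k-bit number, so one extra digit position
    is produced to close a run of 1s that reaches the most significant bit.
    """
    digits = []
    prev = 0                      # x_{-1} is taken to be 0
    for i in range(k + 1):
        xi = (x >> i) & 1 if i < k else 0
        digits.append(prev - xi)  # y_i = x_{i-1} - x_i
        prev = xi
    return digits


def digits_value(digits):
    """Evaluate a signed-digit vector, least significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))
```

For x = 0111 (k = 4) the run of three 1s becomes the digit vector [-1, 0, 0, 1, 0], that is, 8 - 1.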
Software aspects:
Optimizing compilers replace multiplications by shifts/adds/subs
Produce efficient code using as few registers as possible
Find the best code by a time/space-efficient algorithm
Hardware aspects:
Synthesize special-purpose units such as filters
$y[t] = a_0 x[t] + a_1 x[t-1] + a_2 x[t-2] + b_1 y[t-1] + b_2 y[t-2]$
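Example: computing 119x with x in R1, where Rn holds nx (so R8 is R1 shifted left by 3):
R7 ← R8 – R1
R112 ← R7 shift-left 4
R119 ← R112 + R7
With a combined shift-and-add instruction, 65x takes one step:
R65 ← R1 shift-left 6 + R1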
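The register sequence for 119x can be checked directly. This is a sketch that mirrors the steps with Python integers; the function names and the first step (forming 8x by a left shift of 3) are assumptions based on the register naming.

```python
def times_119(x):
    """Compute 119 * x with two shifts, one subtraction, and one addition."""
    r1 = x
    r8 = r1 << 3        # assumed first step: 8x
    r7 = r8 - r1        # 7x
    r112 = r7 << 4      # 112x
    r119 = r112 + r7    # 119x
    return r119


def times_65(x):
    """Compute 65 * x with one shift-and-add step."""
    return (x << 6) + x
```

The saving comes from factoring 119 = 7 * 17 = (8 - 1) * (16 + 1), which needs fewer operations than adding one shifted copy of x per 1 bit of 119.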
Chapter Highlights
High radix gives rise to “difficult” multiples
Recoding (change of digit-set) as remedy
Carry-save addition reduces cycle time
Implementation and optimization methods
Fig. 10.6 The multiple generation part of a radix-4 multiplier based on Booth's recoding. The circuit produces $z_{i/2}\, a$ for the adder input together with an add/subtract control signal.
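A software sketch of radix-4 Booth recoding may help in reading the figure. It assumes the usual digit rule $z_{i/2} = -2x_{i+1} + x_i + x_{i-1}$ for a k-bit two's-complement multiplier with k even; the function names are illustrative and are not taken from the slides.

```python
def booth_radix4_digits(x, k):
    """Radix-4 Booth digits (each in -2..2) of a k-bit two's-complement x, k even."""
    def bit(i):
        # Python's >> sign-extends negative integers, so this reads
        # two's-complement bits; x_{-1} is taken to be 0.
        return 0 if i < 0 else (x >> i) & 1

    return [-2 * bit(i + 1) + bit(i) + bit(i - 1) for i in range(0, k, 2)]


def booth_radix4_multiply(a, x, k):
    """Sum the multiples z * a, each weighted by a power of 4."""
    return sum(z * a * 4 ** n for n, z in enumerate(booth_radix4_digits(x, k)))
```

Every digit selects one of the multiples 0, a, or 2a plus an add/subtract decision, so no multiple such as 3a ever has to be formed.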
10.3 Using Carry-Save Adders
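[Figure: radix-4 multiplication with a CSA. Two muxes select 0 or 2a (controlled by $x_{i+1}$) and 0 or a (controlled by $x_i$); a CSA combines them with the old cumulative partial product, and an adder produces the new cumulative partial product.]
Fig. 10.8 Radix-2 multiplication with the upper half of the cumulative partial product kept in stored-carry form: (a) multiplier block diagram, (b) operation in a typical cycle. The datapath consists of the multiplicand, a mux selecting 0 or the multiplicand, a k-bit CSA holding the upper half of the partial product as sum and carry words, a right shift into the lower half, and a k-bit adder.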
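The benefit of carry-save addition is that each cycle needs only a 3-to-2 reduction with no carry propagation. The sketch below keeps the cumulative partial product as a (sum, carry) pair and performs a single carry-propagate addition at the end. It uses left-shifted partial products for brevity, not the right-shift datapath of Fig. 10.8, and the function names are illustrative.

```python
def carry_save_add(a, b, c):
    """Reduce three nonnegative integers to a (sum, carry) pair with the same total."""
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # majority bits, moved one column left
    return s, carry


def multiply_carry_save(a, x, k):
    """Multiply unsigned k-bit a and x, keeping the partial product in carry-save form."""
    s, c = 0, 0
    for j in range(k):
        partial = (a << j) if (x >> j) & 1 else 0
        s, c = carry_save_add(s, c, partial)       # no carry propagation in the loop
    return s + c                                   # one final carry-propagate addition
```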
[Figure: radix-4 multiplier in which a Booth recoder and selector, driven by $x_{i+1}$, $x_i$, and $x_{i-1}$, supplies $z_{i/2}\, a$ to a CSA together with the old cumulative partial product; a 2-bit adder with a carry FF handles the new cumulative partial product, and an extra "dot" enters the adder.]
Fig. 10.11 Radix-4 multiplication, with the cumulative partial product, $x_i a$, and $2x_{i+1} a$ combined into two numbers by two CSAs.
Fig. 10.12 Radix-16 multiplication with the upper half of the cumulative partial product in carry-save form.
Other High-Radix Multipliers
[Figure: radix-16 multiplier in which four muxes select 0 or 8a, 0 or 4a, 0 or 2a, and 0 or a under control of $x_{i+3}$, $x_{i+2}$, $x_{i+1}$, and $x_i$, followed by CSAs, a 4-bit shift, and an adder.]
Remove the marked mux and CSA and replace the 4-bit shift (adder) with a 3-bit shift (adder) to get a radix-8 multiplier (the cycle time will remain the same, though).
Fig. 10.14 Twin-beat multiplier with radix-8 Booth's recoding.
Fig. 10.16 Conceptual view of a three-beat multiplier.
10.6 VLSI Complexity Issues
A radix-$2^b$ multiplier requires:
$bk$ two-input AND gates to form the partial products bit-matrix
$O(bk)$ area for the CSA tree
At least $\Theta(k)$ area for the final carry-propagate adder
Total area: $A = O(bk)$
Latency: $T = O((k/b) \log b + \log k)$
Any VLSI circuit computing the product of two k-bit integers must satisfy the following constraints:
$AT$ grows at least as fast as $k^{3/2}$
$AT^2$ is at least proportional to $k^2$
The preceding radix-$2^b$ implementations are suboptimal, because:
$AT = O(k^2 \log b + bk \log k)$
$AT^2 = O((k^3/b) \log^2 b)$
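The two products follow from the stated area and latency. The step below is a sketch of that arithmetic; the restriction to $b \le k$, under which the $(k/b)\log b$ term dominates $T$, is an assumption made here to explain the single-term form of $AT^2$.

```latex
\begin{align*}
AT   &= O(bk)\cdot O\!\left(\tfrac{k}{b}\log b + \log k\right)
      = O\!\left(k^2 \log b + bk \log k\right) \\
AT^2 &= O(bk)\cdot O\!\left(\left(\tfrac{k}{b}\log b\right)^{2}\right)
      = O\!\left(\tfrac{k^3}{b}\log^2 b\right)
      \qquad \text{(for } b \le k\text{)}
\end{align*}
```

Setting $b = O(1)$ gives $AT = O(k^2)$ and $AT^2 = O(k^3)$; setting $b = O(k)$ gives $AT = O(k^2 \log k)$ and $AT^2 = O(k^2 \log^2 k)$. Both exceed the lower bounds $k^{3/2}$ and $k^2$.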
Comparing High- and Low-Radix Multipliers
$AT = O(k^2 \log b + bk \log k)$, $AT^2 = O((k^3/b) \log^2 b)$

        Low-cost      High-speed          AT- or AT^2-
        b = O(1)      b = O(k)            optimal
AT      O(k^2)        O(k^2 log k)        O(k^(3/2))
AT^2    O(k^3)        O(k^2 log^2 k)      O(k^2)
Chapter Highlights
Tree multiplier = reduction tree
+ redundant-to-binary converter
Avoiding full sign extension in multiplying
signed numbers
Array multiplier = one-sided reduction tree
+ ripple-carry adder
Tree and Array Multipliers: Topics
Fig. 10.13 High-radix multipliers as intermediate between sequential radix-2 and full-tree multipliers. The figure ranges from basic binary through high-radix and partial-tree to full-tree designs (speed up in one direction, economize in the other).
[Figure: two variants, one with a large tree of carry-save adders and one with a small tree of carry-save adders, each followed by a log-depth adder that produces the product.]
Fig. 11.1 Components recovered from the figure:
2. Partial products reduction tree (multioperand addition tree), which produces a redundant result
3. Redundant-to-binary converter, which produces the higher-order product bits
Some lower-order product bits are generated directly.
Fig. 11.3 Possible CSA tree for a 7 × 7 tree multiplier. The tree uses 7-bit CSAs in its upper levels, followed by a 10-bit CSA and a 10-bit CPA. The index pair [i, j] means that bit positions from i up to j are involved.
CSA trees are quite irregular, causing some difficulties in VLSI realization. Thus, our motivation to examine alternate methods for partial products reduction.
Fig. 11.4 A slice of a balanced-delay tree for 11 inputs. The FAs are arranged in levels 1 through 5, with level-1 through level-4 carries entering from the neighboring slice.
$11 + \psi_1 = 2\psi_1 + 3$; therefore, $\psi_1 = 8$ carries are needed.
Binary Tree of 4-to-2 Reduction Modules
[Figure: 4-to-2 compressors: (a) Binary tree of (4; 2)-counters, (b) Realization with FAs, (c) A faster realization.]
Similarly, using Booth's recoding may not yield any advantage, because it introduces irregularity.
[Figure label: Redundant-to-binary converter.]
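A 4-to-2 reduction module can be modeled at the bit level with two full adders, which is presumably what panel (b) depicts. The sketch below is an assumption-level model, with illustrative function names and signal names (`cin`, `cout`) that do not come from the slides.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_to_2(x1, x2, x3, x4, cin):
    """Reduce four bits plus a horizontal carry-in to (s, c, cout).

    Invariant: x1 + x2 + x3 + x4 + cin == s + 2 * (c + cout).
    cout depends only on x1, x2, x3, so carries do not ripple across slices.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, c = full_adder(s1, x4, cin)
    return s, c, cout
```

Because `cout` is independent of `cin`, a row of such modules has a delay of two full-adder levels regardless of word width.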
Baugh-Wooley Methods
Fig. 11.8 c. Baugh-Wooley, using $-a_4 x_0 = a_4(1 - x_0) - a_4 = a_4 \bar{x}_0 - a_4$:
$$\begin{array}{cccccccccc}
 & & & & & a_4 & a_3 & a_2 & a_1 & a_0 \\
 & & & & & x_4 & x_3 & x_2 & x_1 & x_0 \\ \hline
 & & & & & a_4\bar{x}_0 & a_3x_0 & a_2x_0 & a_1x_0 & a_0x_0 \\
 & & & & a_4\bar{x}_1 & a_3x_1 & a_2x_1 & a_1x_1 & a_0x_1 & \\
 & & & a_4\bar{x}_2 & a_3x_2 & a_2x_2 & a_1x_2 & a_0x_2 & & \\
 & & a_4\bar{x}_3 & a_3x_3 & a_2x_3 & a_1x_3 & a_0x_3 & & & \\
 & a_4x_4 & \bar{a}_3x_4 & \bar{a}_2x_4 & \bar{a}_1x_4 & \bar{a}_0x_4 & & & & \\
 & \bar{a}_4 & & & & a_4 & & & & \\
1 & \bar{x}_4 & & & & x_4 & & & & \\ \hline
p_9 & p_8 & p_7 & p_6 & p_5 & p_4 & p_3 & p_2 & p_1 & p_0
\end{array}$$
Fig. 11.8 d. Modified B-W, using $-a_4 x_0 = (1 - a_4 x_0) - 1 = \overline{a_4 x_0} - 1$:
$$\begin{array}{cccccccccc}
 & & & & & a_4 & a_3 & a_2 & a_1 & a_0 \\
 & & & & & x_4 & x_3 & x_2 & x_1 & x_0 \\ \hline
 & & & & & \overline{a_4x_0} & a_3x_0 & a_2x_0 & a_1x_0 & a_0x_0 \\
 & & & & \overline{a_4x_1} & a_3x_1 & a_2x_1 & a_1x_1 & a_0x_1 & \\
 & & & \overline{a_4x_2} & a_3x_2 & a_2x_2 & a_1x_2 & a_0x_2 & & \\
 & & \overline{a_4x_3} & a_3x_3 & a_2x_3 & a_1x_3 & a_0x_3 & & & \\
 & a_4x_4 & \overline{a_3x_4} & \overline{a_2x_4} & \overline{a_1x_4} & \overline{a_0x_4} & & & & \\
1 & & & & 1 & & & & & \\ \hline
p_9 & p_8 & p_7 & p_6 & p_5 & p_4 & p_3 & p_2 & p_1 & p_0
\end{array}$$
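Both bit-matrices can be verified exhaustively for 5-bit operands. The sketch below assumes the standard placement of correction terms: for Baugh-Wooley, $a_4$ and $x_4$ in column 4, their complements in column 8, and a 1 in column 9; for the modified form, a 1 in columns 5 and 9. Function names are illustrative.

```python
K = 5  # operand width of the 5 x 5 example


def _bits(v, k):
    """Two's-complement bits of v, least significant first."""
    return [(v >> i) & 1 for i in range(k)]


def baugh_wooley(a, x, k=K):
    """Sum the Baugh-Wooley bit-matrix modulo 2^(2k)."""
    a_b, x_b, top = _bits(a, k), _bits(x, k), k - 1
    total = 0
    for i in range(top):
        for j in range(top):
            total += (a_b[j] & x_b[i]) << (i + j)          # ordinary terms
        total += (a_b[top] & (1 - x_b[i])) << (top + i)    # a_top * not(x_i)
        total += ((1 - a_b[i]) & x_b[top]) << (top + i)    # not(a_i) * x_top
    total += (a_b[top] & x_b[top]) << (2 * top)
    total += (a_b[top] + x_b[top]) << top                  # a_top and x_top
    total += ((1 - a_b[top]) + (1 - x_b[top])) << (2 * top)
    total += 1 << (2 * k - 1)                              # constant 1 in the top column
    return total % (1 << (2 * k))


def modified_baugh_wooley(a, x, k=K):
    """Sum the modified Baugh-Wooley bit-matrix modulo 2^(2k)."""
    a_b, x_b, top = _bits(a, k), _bits(x, k), k - 1
    total = 0
    for i in range(top):
        for j in range(top):
            total += (a_b[j] & x_b[i]) << (i + j)
        total += (1 - (a_b[top] & x_b[i])) << (top + i)    # complemented AND terms
        total += (1 - (a_b[i] & x_b[top])) << (top + i)
    total += (a_b[top] & x_b[top]) << (2 * top)
    total += (1 << k) + (1 << (2 * k - 1))                 # the two constant 1s
    return total % (1 << (2 * k))
```

Each function returns the 10-bit two's-complement encoding of the product, so it should equal `(a * x) % 1024` for every pair of operands in the range -16 to 15.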
Fig. 11.11 A basic array multiplier uses a one-sided CSA tree and a ripple-carry adder. The multiples $x_0 a$ through $x_4 a$ enter successive CSAs, and a ripple-carry adder forms the upper product bits.
Fig. 11.12 Details of a 5 × 5 array multiplier using FA blocks. The terms $a_j x_i$ enter the array row by row; the product bits $p_0$ through $p_4$ emerge along one edge and $p_5$ through $p_9$ from the ripple-carry adder.
Fig. 11.16 Carry-save addition, performed in level i, extends the ...
[Figure: blocks $B_i$ and $B_{i+1}$ with a mux, CSA, and latch per level; sum and carry outputs feed an h-bit adder with a carry FF that forms the lower part of the cumulative partial product.]
Fig. 11.18 Pipelined 5 × 5 array multiplier using latched FA blocks. The legend distinguishes a latched FA, an FA with AND gate, and a latch.
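The array organization can be simulated with full adders. The sketch below is a bit-level model of an unsigned array multiplier: each row performs carry-save addition of one multiple, the low product bit of each row is emitted, and a final ripple-carry row produces the upper half. It is an illustrative model with assumed names, not a transcription of the cell wiring in Fig. 11.12.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def array_multiply(a, x, k=5):
    """Multiply unsigned k-bit a and x with rows of FAs plus a ripple-carry row."""
    a_b = [(a >> j) & 1 for j in range(k)]
    s = [a_b[j] & (x & 1) for j in range(k)] + [0]   # row 0; s[k] is a zero pad
    c = [0] * k
    product = s[0]                                   # p_0
    for i in range(1, k):                            # carry-save rows
        xi = (x >> i) & 1
        new_s, new_c = [0] * (k + 1), [0] * k
        for j in range(k):
            new_s[j], new_c[j] = full_adder(a_b[j] & xi, s[j + 1], c[j])
        s, c = new_s, new_c
        product |= s[0] << i                         # p_i leaves the array
    carry = 0
    for j in range(k):                               # final ripple-carry adder
        bit, carry = full_adder(s[j + 1], c[j], carry)
        product |= bit << (k + j)
    return product
```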
Chapter Goals
Learn additional methods for synthesizing
fast multipliers as well as other types
of multipliers (bit-serial, modular, etc.)
Chapter Highlights
Building a multiplier from smaller units
Performing multiply-add as one operation
Bit-serial and (semi)systolic multipliers
Using a multiplier for squaring is wasteful
[Figure: a wide multiplier assembled from smaller multipliers and 4-bit adders; the partial results occupy bit ranges [0, 3], [4, 7], [8, 11], and [12, 15] and are combined by adders.]
[Figure: operand halves xH and xL feeding three multipliers, Mult 1, Mult 2, and Mult 3.]
Benefit is quite significant for extremely wide operands:
$(4/3)^5 = 4.2$, $(4/3)^{10} = 17.8$, $(4/3)^{20} = 315.3$, $(4/3)^{50} = 1{,}765{,}781$
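The 4/3 factor per level of recursion corresponds to replacing four half-width multiplications with three. The slides do not name the scheme; the sketch below assumes it is the well-known Karatsuba arrangement, and the function name, the `bits` parameter, and the base-case width are illustrative.

```python
def karatsuba(a, x, bits):
    """Multiply nonnegative integers of at most `bits` bits using three
    half-width multiplications per level (assumed Karatsuba scheme)."""
    if bits <= 4:
        return a * x                       # small base case: multiply directly
    half = bits // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    x_hi, x_lo = x >> half, x & mask
    hi = karatsuba(a_hi, x_hi, bits - half)                 # first multiplication
    lo = karatsuba(a_lo, x_lo, half)                        # second multiplication
    # third multiplication: (a_hi + a_lo)(x_hi + x_lo) needs one extra bit
    mid = karatsuba(a_hi + a_lo, x_hi + x_lo, bits - half + 1) - hi - lo
    return (hi << (2 * half)) + (mid << half) + lo
```

Five levels of recursion (a 32-fold increase in width) give the factor $(4/3)^5 \approx 4.2$ quoted above.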
12.2 Additive Multiply Modules
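[Figure: additive multiply module with multiplicative inputs a and x and additive inputs y and z, shown with the product ax.]
[Figure: an example built from such modules, in which 2-bit multiplier slices x[4, 5] and x[6, 7] are combined with multiplicand slice a[4, 7] to produce product slices p[0, 1], p[2, 3], p[4, 5], p[6, 7], p[8, 9], p[10, 11], and p[12, 15].]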
Bit-serial multiplier
(Must follow the k-bit inputs with k 0s; alternatively, view ...)
[Figure: multiplicand bits ... a2 a1 a0 and multiplier bits ... x2 x1 x0 enter a row of four FAs with sum and carry feedback; product bits ... p2 p1 p0 leave serially.]
Fig. 12.8 Example of retiming by delaying the inputs to CL and advancing the outputs from CL by d units. The original delays e and f become e + d and f + d; the original delays g and h become g – d and h – d.
Alternate Explanation of Systolic Retiming
[Figure: timing diagram with signal times t and t + d1.]
Fig. 12.7 Labels recovered from the figure: multiplicand a3 a2 a1 a0 (parallel in), multiplier x0 x1 x2 x3 (serial in, LSB-first), a row of four FAs with sum and carry feedback, product (serial out).
Fig. 12.10 Systolic circuit for 4 × 4 multiplication in 15 cycles.
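A cycle-by-cycle model shows why the k multiplier bits must be followed by k zeros. The sketch below simulates a bit-serial multiplier with parallel multiplicand, LSB-first serial multiplier, and stored sum and carry bits in each cell. It assumes the multiplier bit is broadcast to all cells in the same cycle (the arrangement before systolic retiming), and the function names are illustrative.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def bit_serial_multiply(a, x, k):
    """Simulate 2k cycles of a k-cell bit-serial multiplier; return the product."""
    a_b = [(a >> j) & 1 for j in range(k)]
    s = [0] * (k + 1)            # stored sum bits; s[k] stays 0
    c = [0] * k                  # stored carry bits
    product = 0
    for cycle in range(2 * k):
        xi = (x >> cycle) & 1 if cycle < k else 0    # k zeros follow the k input bits
        new_s, new_c = [0] * (k + 1), [0] * k
        for j in range(k):
            # cell j adds its partial-product bit, the sum bit shifted in from
            # cell j+1, and its own stored carry
            new_s[j], new_c[j] = full_adder(a_b[j] & xi, s[j + 1], c[j])
        s, c = new_s, new_c
        product |= s[0] << cycle                     # one product bit leaves per cycle
    return product
```

This broadcast version takes 2k cycles (8 for a 4 × 4 product). The retimed systolic circuit of Fig. 12.10 removes the broadcast at the cost of more cycles.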
A Direct Design for a Bit-Serial Multiplier
Fig. 12.11 Building block for a latency-free bit-serial multiplier. The cell is a (5; 3)-counter with signals $t_{in}$, $c_{in}$, $s_{in}$ and $t_{out}$, $c_{out}$, $s_{out}$, plus a mux that delivers the product bit.
(a) Structure of the bit-matrix: the new terms $a_i x_i$, $a_i x^{(i-1)}$, and $x_i a^{(i-1)}$ are added to $p^{(i-1)}$, which is already accumulated into three numbers; the lower-order bits of p are already output. The result is $2p^{(i)}$; shift right to obtain $p^{(i)}$, whose LSB is $p_i$.
[Figure: rows of FAs forming mod-15 CSAs that operate on 4-bit groups, with a divide-by-16 step between them.]
Fig. 12.16 One way to design a 4 × 4 modulo-13 multiplier.
Fig. 12.17 A method for modular multioperand addition. Blocks recovered from the figure: n inputs, table (address in, data out), CSA tree, and a 3-input modulo-m adder that produces the sum mod m.
12.5 The Special Case of Squaring
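Multiply x by x:
$$\begin{array}{cccccccccc}
 & & & & & x_4 & x_3 & x_2 & x_1 & x_0 \\
 & & & & & x_4 & x_3 & x_2 & x_1 & x_0 \\ \hline
 & & & & & x_4x_0 & x_3x_0 & x_2x_0 & x_1x_0 & x_0x_0 \\
 & & & & x_4x_1 & x_3x_1 & x_2x_1 & x_1x_1 & x_0x_1 & \\
 & & & x_4x_2 & x_3x_2 & x_2x_2 & x_1x_2 & x_0x_2 & & \\
 & & x_4x_3 & x_3x_3 & x_2x_3 & x_1x_3 & x_0x_3 & & & \\
 & x_4x_4 & x_3x_4 & x_2x_4 & x_1x_4 & x_0x_4 & & & & \\ \hline
p_9 & p_8 & p_7 & p_6 & p_5 & p_4 & p_3 & p_2 & p_1 & p_0
\end{array}$$
Simplify, using $x_i x_i = x_i$ and $x_i x_j + x_j x_i = 2 x_i x_j$ (one term in the next column):
$$\begin{array}{cccccccccc}
 & x_4x_3 & x_4x_2 & x_4x_1 & x_4x_0 & x_3x_0 & x_2x_0 & x_1x_0 & & x_0 \\
 & x_4 & & x_3x_2 & x_3x_1 & x_2x_1 & & x_1 & & \\
 & & & x_3 & & x_2 & & & & \\ \hline
p_9 & p_8 & p_7 & p_6 & p_5 & p_4 & p_3 & p_2 & 0 & x_0
\end{array}$$
The pair $x_1x_0$ and $x_1$ in column 2 can be replaced by $x_1\bar{x}_0$ in column 2 and $x_1x_0$ in column 3.
Fig. 12.18 Design of a 5-bit squarer.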
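The simplified bit-matrix for the 5-bit squarer can be checked exhaustively. The sketch below lists each remaining term with its column: the diagonal terms $x_i$ in column 2i, the cross terms $x_i x_j$ in column i + j + 1, and the column-2 pair replaced by $x_1\bar{x}_0$ in column 2 and $x_1x_0$ in column 3. The function name and data layout are illustrative.

```python
def square_5bit(x):
    """Square a 5-bit unsigned number by summing the simplified bit-matrix."""
    b = [(x >> i) & 1 for i in range(5)]
    terms = [
        # first row of the simplified matrix
        (b[4] & b[3], 8), (b[4] & b[2], 7), (b[4] & b[1], 6), (b[4] & b[0], 5),
        (b[3] & b[0], 4), (b[2] & b[0], 3), (b[1] & (1 - b[0]), 2), (b[0], 0),
        # second row
        (b[4], 8), (b[3] & b[2], 6), (b[3] & b[1], 5), (b[2] & b[1], 4),
        (b[1] & b[0], 3),   # carried over from the column-2 simplification
        # third row
        (b[3], 6), (b[2], 4),
    ]
    return sum(bit << col for bit, col in terms)
```

The matrix has 15 terms and no entry in column 1, compared with 25 terms for a general 5 × 5 multiplier, which is the sense in which using a multiplier for squaring is wasteful.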
Fig. 12.19 Dot-notation representations of various methods for performing a multiply-add operation in hardware. Panel (c): an additive input placed above the dot matrix for the 4 × 4 multiplication. Panel (d): a carry-save additive input placed above the dot matrix for the 4 × 4 multiplication.