Lecture8 PDF

Lecture 7 EE 486
Multiplication Add-and-Shift Algorithm

EE 486 lecture 7: Integer Multiplication
Multiplicand “X” 110 6
Multiplier “Y” x 101 x5
110
Partial Products “PP” 000
110
M. J. Flynn Result “S” 11110 30
slides prepared by Albert Liddicoat and Hossam Fahmy X Shift Y

0
ALU
Shift P
Computer Architecture & Arithmetic Group 1 Stanford University Computer Architecture & Arithmetic Group 2 Stanford University
Parallel Multiplication Generation of PP’s Using ROMs

1 Simultaneous generation of partial products
4b x 4b Multiply Using
2 Parallel reduction of partial products
256x8-bit ROM
3 Carry Propagate Addition (CPA) 8b x 8b Multiply Using
256x8-bit ROMs
Address
X
Data
R
Parallel generation of partial products Y
X7 X6 X5 X4 X3 X2 X1 X0
Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0
Using AND gate as 1x1-bit multiplier DOT representation x
.. ... .. .
X3 X2 X1 X0 * Y3 Y2 Y1 Y0
X2 X1 X0 X2 X1 X0
x Y2 Y1 Y0 x Y2 Y1 Y0 X7 X6 X5 X4* Y3 Y2 Y1 Y0
R5 R4 R3
X2Y0 X1Y0 X0Y0
X2Y1 X1Y1 X0Y1
X2Y2 X1Y2 X0Y2
R2 R1 R0
.
.. . . ..
X3 X2 X1 X0 * Y7 Y6 Y5 Y4
X7 X6 X5 X4 * Y7 Y6 Y5 Y4
Generation of PP’s Using ROMs Booth’s Algorithm

Observation: We can replace a string of 1’s in the multiplier by +1 and -1.
4, 8, and 16 bit Multiply
8 x 8 bit Multiply using using 256x8bit ROMs ...0111110...
Example:
256x8b ROMs 16x16 bit ...0111110...
+1
8x8 bit -1
a
b 4x4 bit ...1000000...
c -1
d
. . . 1 0 0 0 0-1 0 . . .
n h CSA Delay X*(0 1 1 1 1 1) X*(1 0 0 0 0-1)
Y Y
4 1 0 X 1 -X -1
b 8 3 1 X 1 0 0
d a 16 7 4
X 1 0 0
c X 1 0 0
32 15 6 X 1 0 0
64 31 8 0 0 X 1
M.J. Flynn 1
Lecture 7 EE 486
Booth’s Algorithm Modified Booth 2

+ Reduces the number of partial products which in turn reduces the
hardware and delay required to sum the partial products. • Booth 2 modified to produce at most n/2+1
- Adds Delay into the formation of the Partial Products. partial products.
• Works well for serial multiplication that can tolerate variable
latency operations by reducing the number of serial additions Algorithm: (for unsigned numbers)
required for the multiplication.
1) Pad the LSB with one zero.
• The number of serial additions depends on the data (multiplicand). 2) Pad the MSB with 2 zeros if n is even and 1 zero if n is odd.
• Worst case 8-bit multiplicand requires 8 additions 3) Divide the multiplier into overlapping groups of 3-bits.
01010101 <=> 1 -1 1 -1 1 -1 1 -1 4) Determine partial product scale factor from modified
• Parallel systems generally are designed for worst case hardware booth 2 encoding table.
and latency requirements. Standard booth 2 does not significantly 5) Compute the Multiplicand Multiples
reduce the worst case number of partial products. 6) Sum Partial Products
Modified Booth 2 Modified Booth 2

4) Determine action from
Example: (n=8-bits unsigned) table for each 3-bit group 5) Compute Multiplicand Scale Factor (~4 gate delays)
Modified Booth 2 Mux Control
Scale Multiplier
1) Pad LSB with 1 zero Yi+1 Yi Yi-1 Factor Direct Multiplier Logic
Xi Xi-1 0
0 0 0 +0 Xi Yj Scale
2) n is even so Pad MSB with 2 zeros Factor Action
0 0 1 +X
+0 Mux 0
0 1 0 +X
0 0 Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 0 +X Mux Xi
0 1 1 +2X
Yj+1 +2X Mux Xi-1
1 0 0 -2X Yj
1 0 1 -X -X Mux Xi’
Yj-1
3) Form 3-bit overlapping groups -2X Mux Xi-1’
1 1 0 -X PP ij
for n=8 we have 5 groups 1 1 1 -0 PP ij
Modified Booth 2 Modified Booth 2

6) Sum Partial Products Algorithm Extension: (for signed multiplier)
• Sign extend partial products to the full width of the final result
• Logic may replace the A9, B9, C9, D9, and E9 sign extention bits. 1) Pad the LSB with one zero.
• Yi+1 bit determines if the multiple needs to be complemented 2) If n is even don’t pad the MSB ( n/2 PP’s) and
Multiplicand if n is odd sign extend the MSB by 1 bit ( n+1/2 PP’s).
Partial Products 3) Divide the multiplier into overlapping groups of 3-bits.
Yi+1 Yi YI-1
4) Determine partial product factor from table.
A9 . . . A9 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0 (Y1 Y0 0) 5) Compute the Multiplicand Multiples
B9 . . . B8 B7 B6 B5 B4 B3 B2 B1 B0 Y1 (Y3 Y2 Y1) 6) Sum Partial Products n Even: n Odd:
C9 . . . C6 C5 C4 C3 C2 C1 C0 Y3 (Y5 Y4 Y3)
D9 . . . D4 D3 D2 D1 D0 Y5 (Y7 Y6 Y5) Xn-1 Xn-2 Xn-2 Ø Xn-1 Xn-2
E9 . . . E2 E1 E0 Y7 (0 0 Y7)
Positive: 0 x x 0 0 x
S15. . . S 10 S9 S8 S7 S6 S5 S4 S3 S2 S1 S0 Negative: 1 x x 1 1 x
M.J. Flynn 2
Lecture 7 EE 486
Modified Booth 2 Booth 3

• Reduces the partial products to ~n/3
Algorithm Extension: (for signed multiplicand) • Form overlapping groups of 4 bits.
Nothing the algorithm works fine ! 0 Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 0
• The multiplicand may be represented in 2’s complement code. • Booth 3 Encoding Scale Scale
Yi+2 Yi+1 Yi Yi-1 Factor Yi+2 Yi+1 Yi Yi-1 Factor
• The scale factors (0, +X, +2X, -X, and -2X) are handled correctly. 0 0 0 0 +0 1 0 0 0 -4X
• Shift left for 2 times weighting. 0 0 0 1 +X 1 0 0 1 -3X
Note: 3x is a hard 0 0 1 0 +X 1 0 1 0 -3X
•2’s complement the multiplicand for subtraction. multiple that must 0 0 1 1 +2X 1 0 1 1 -2X
0 1 0 0 +2X 1 1 0 0 -2X
be precomputed 0 1 0 1 +3X 1 1 0 1 -X
0 1 1 0 +3X 1 1 1 0 -X
0 1 1 1 +4X 1 1 1 1 -0
Partially Redundant Booth 3 Partially Redundant Booth 3
M.J. Flynn 3

Lecture8 PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture8 PDF

Uploaded by

Copyright:

Available Formats

Lecture 7 EE 486

Multiplication Add-and-Shift Algorithm

slides prepared by Albert Liddicoat and Hossam Fahmy X Shift Y

Parallel Multiplication Generation of PP’s Using ROMs

Generation of PP’s Using ROMs Booth’s Algorithm

Booth’s Algorithm Modified Booth 2

Modified Booth 2 Modified Booth 2

Modified Booth 2 Modified Booth 2

Modified Booth 2 Booth 3

Nothing the algorithm works fine ! 0 Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 0

Partially Redundant Booth 3 Partially Redundant Booth 3

You might also like