Booth's Multiplication Algorithm

Booth’s algorithm gives a procedure for multiplying binary integers in signed 2’s complement
representation in an efficient way, i.e., with fewer additions/subtractions. It relies on the
fact that strings of 0’s in the multiplier require no addition, just shifting, and that
a string of 1’s in the multiplier from bit weight 2^k down to weight 2^m can be treated as
2^(k+1) - 2^m.

As in all multiplication schemes, Booth’s algorithm requires examination of the multiplier
bits and shifting of the partial product. Prior to the shifting, the multiplicand may be added to
the partial product, subtracted from the partial product, or left unchanged according to the
following rules:
1. The multiplicand is subtracted from the partial product upon encountering the first (least
significant) 1 in a string of 1’s in the multiplier.
2. The multiplicand is added to the partial product upon encountering the first 0 (provided
that there was a previous ‘1’) in a string of 0’s in the multiplier.
3. The partial product does not change when the multiplier bit is identical to the previous
multiplier bit.
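
The three rules above amount to recoding each multiplier bit as -1, 0 or +1 by comparing it with the bit to its right. A minimal sketch of that recoding follows; the function name `booth_recode` and the least-significant-first digit order are illustrative assumptions, not part of the notes.

```python
def booth_recode(multiplier: int, n: int) -> list[int]:
    """Return the Booth digits (-1, 0, +1) of an n-bit multiplier, LSB first."""
    bits = multiplier & ((1 << n) - 1)    # n-bit 2's complement pattern
    digits = []
    prev = 0                              # implied 0 to the right of the LSB
    for i in range(n):
        cur = (bits >> i) & 1
        # cur,prev = 1,0 -> -1 (subtract); 0,1 -> +1 (add); equal bits -> 0
        digits.append(prev - cur)
        prev = cur
    return digits


digits = booth_recode(-7, 4)              # -7 is 1001 in 4 bits
print(digits)                             # [-1, 1, 0, -1]
print(sum(d << i for i, d in enumerate(digits)))  # -7
```

Summing the digits with their bit weights gives back the multiplier, which is why the recoded form can replace the original bits.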
Hardware Implementation of Booth’s Algorithm – The hardware implementation of Booth’s
algorithm requires the register configuration shown in the figure below.

Booth’s Algorithm Flowchart

SSIPMT RAIPUR | Computer System Architecture Notes

We name the registers A, B and Q as AC, BR and QR respectively. Qn designates the least
significant bit of the multiplier in the register QR. An extra flip-flop Qn+1 is appended to QR to
facilitate a double inspection of the multiplier. The flowchart for Booth’s algorithm is shown
in the accompanying figure.

AC and the appended bit Qn+1 are initially cleared to 0 and the sequence counter SC is set to a
number n equal to the number of bits in the multiplier. The two bits of the multiplier in Qn and
Qn+1 are inspected. If the two bits are equal to 10, it means that the first 1 in a string of 1’s
has been encountered. This requires subtraction of the multiplicand from the partial product in
AC. If the two bits are equal to 01, it means that the first 0 in a string of 0’s has been
encountered. This requires the addition of the multiplicand to the partial product in AC.
When the two bits are equal, the partial product does not change. An overflow cannot occur
because the addition and subtraction of the multiplicand follow each other. As a consequence,
the two numbers that are added always have opposite signs, a condition that excludes an
overflow. The next step is to shift right the partial product and the multiplier (including Qn+1).
This is an arithmetic shift right (ashr) operation, which shifts AC and QR to the right and leaves
the sign bit in AC unchanged. The sequence counter is decremented and the computational loop is
repeated n times.
Example – A numerical example of Booth’s algorithm is shown below for n = 4. It shows the
step-by-step multiplication of -5 and -7.
MD = -5 = 1011, MD' + 1 = 0101
MR = -7 = 1001
The explanation of the first step is as follows:
AC = 0000, MR = 1001, Qn+1 = 0, SC = 4
Qn Qn+1 = 10
So, we do AC + (MD)' + 1, which gives AC = 0101
On right shifting AC and MR, we get
AC = 0010, MR = 1100 and Qn+1 = 1
Operation        AC     MR     Qn+1   SC
Initial          0000   1001   0      4
AC + MD' + 1     0101   1001   0
ASHR             0010   1100   1      3
AC + MD          1101   1100   1
ASHR             1110   1110   0      2
ASHR             1111   0111   0      1
AC + MD' + 1     0100   0111   0
ASHR             0010   0011   1      0
Product is calculated as follows:
Product = AC MR
Product = 0010 0011 = 35
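
The register-level procedure described above can be sketched in Python as follows. This is a minimal illustration, not a hardware description: the function name `booth_multiply` and the variable names are assumptions, and the sketch assumes the multiplicand is not the most negative n-bit value (whose negation does not fit in n bits).

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two n-bit 2's complement integers using AC, BR, QR and Qn+1."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    br = multiplicand & mask       # BR holds the multiplicand
    ac = 0                         # AC holds the upper half of the partial product
    qr = multiplier & mask         # QR holds the multiplier
    q_extra = 0                    # the appended flip-flop Qn+1
    for _ in range(n):             # SC counts n iterations
        qn = qr & 1
        if (qn, q_extra) == (1, 0):
            ac = (ac - br) & mask  # first 1 of a string: AC + BR' + 1
        elif (qn, q_extra) == (0, 1):
            ac = (ac + br) & mask  # first 0 after a string of 1's: AC + BR
        # arithmetic shift right of AC, QR and Qn+1 as one unit
        q_extra = qn
        qr = (qr >> 1) | ((ac & 1) << (n - 1))
        ac = (ac >> 1) | (ac & sign)          # sign bit of AC is preserved
    product = (ac << n) | qr                  # 2n-bit result held in AC, QR
    if product & (1 << (2 * n - 1)):          # interpret as signed
        product -= 1 << (2 * n)
    return product


print(booth_multiply(-5, -7, 4))   # 35
```

Masking with `mask` after each addition or subtraction plays the role of discarding the carry out of the n-bit AC register.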

Example: Multiply the two numbers 7 and 3 by using Booth's multiplication algorithm.

Ans. Here we have two numbers, 7 and 3. First of all, we need to convert 7 and 3 into
binary numbers: 7 = (0111) and 3 = (0011). Now set 7 (in binary 0111) as the multiplicand
(M) and 3 (in binary 0011) as the multiplier (Q). SC (Sequence Count) represents the
number of bits, and here we have 4 bits, so set SC = 4. It also gives the number of
iteration cycles of Booth's algorithm; after each cycle, SC = SC - 1.

M = (0111), M' + 1 = (1001)
Qn Qn+1   Operation                 AC     Q      Qn+1   SC
          Initial                   0000   0011   0      4
1  0      Subtract M (add M' + 1)   1001   0011   0
          Arithmetic shift right    1100   1001   1      3
1  1      Arithmetic shift right    1110   0100   1      2
0  1      Add M (A + M)             0101   0100   1
          Arithmetic shift right    0010   1010   0      1
0  0      Arithmetic shift right    0001   0101   0      0

The numerical example of Booth's multiplication algorithm is 7 x 3 = 21, and the binary
representation of 21 is 10101. Here, we get the result in binary as 00010101. Now we
convert it into decimal: (00010101)2 = 2^4 + 2^2 + 2^0 = 16 + 4 + 1 = 21.

Example: Multiply the two numbers 23 and -9 by using Booth's multiplication algorithm.

Here, M = 23 = (010111) and Q = -9 = (110111)

M = 010111, M' + 1 = 101001
Qn Qn+1   Operation                 AC       Q        Qn+1   SC
          Initially                 000000   110111   0      6
1  0      Subtract M (add M' + 1)   101001   110111   0
          Arithmetic shift right    110100   111011   1      5
1  1      Arithmetic shift right    111010   011101   1      4
1  1      Arithmetic shift right    111101   001110   1      3
0  1      Add M (A + M)             010100   001110   1
          Arithmetic shift right    001010   000111   0      2
1  0      Subtract M (add M' + 1)   110011   000111   0
          Arithmetic shift right    111001   100011   1      1
1  1      Arithmetic shift right    111100   110001   1      0

The sign bit (most significant bit) of AC is 1, which means the product is negative.

Hence, 23 * -9 = 111100110001. Its 2's complement is 000011001111 = 207, so the product
is -207.

Bit pair recoding

Bit pair recoding halves the maximum number of summands. Group the Booth-recoded multiplier
bits in pairs and observe the following: the pair (+1 -1) is equivalent to the pair (0 +1). That is,
instead of adding -1 times the multiplicand M at shift position i to +1 x M at position i+1, the same
result is obtained by adding +1 x M at position i. E.g. for the multiplier 11010, the bit-pair
recoded value is 0 -1 -2.
Bit-pair recoding of the multiplier results in using at most one summand for each
pair of bits in the multiplier. It is derived directly from the Booth algorithm. Grouping the
Booth-recoded multiplier bits in pairs reduces the multiplication to at most n/2 summands.

Fast multiplication (bit-pair recoding of the multiplier) - This is derived from Booth’s algorithm. It pairs
the multiplier bits and gives one recoded multiplier digit per pair, thus reducing the number of summands
by half. This is shown below.

Multiplication requiring only n/2 summands

Example: Multiply each of the following pairs of signed 2’s-complement numbers using the Booth
algorithm. In each case, assume that A is the multiplicand and B is the multiplier.
(a) A = 010111 and B = 110110
(b) A = 110011 and B = 101100

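(a) Consider the following binary numbers: A = 010111 (23) and B = 110110 (-10).

Multiply the signed 2’s complement numbers using the bit-pair recoding of the multiplier.
The recoded multiplier digits are -1 +2 -2.

Thus, the resultant value is 111100011010 (-230).

(b) Consider the following binary numbers: A = 110011 (-13) and B = 101100 (-20).

Multiply the signed 2’s complement numbers using the bit-pair recoding of the multiplier.
The recoded multiplier digits are -1 -1 0.

Thus, the resultant value is 000100000100 (260).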
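
The two exercises above can be checked with a short sketch of bit-pair recoding. The digit rule used here (digit = -2 x left bit + right bit + the bit to the right of the pair) is the standard way of combining two Booth digits into one; the function names and the least-significant-pair-first order are illustrative assumptions.

```python
def bit_pair_recode(multiplier: int, n: int) -> list[int]:
    """Return digits in {-2, -1, 0, +1, +2}, least significant pair first."""
    if n % 2:
        raise ValueError("n must be even; sign-extend the multiplier by one bit")
    bits = multiplier & ((1 << n) - 1)    # n-bit 2's complement pattern
    digits = []
    right = 0                             # implied 0 to the right of the LSB
    for i in range(0, n, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(-2 * high + low + right)
        right = high
    return digits


def multiply_bit_pair(multiplicand: int, multiplier: int, n: int) -> int:
    """Add one summand (digit x multiplicand x 4^k) per pair of multiplier bits."""
    digits = bit_pair_recode(multiplier, n)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))


print(bit_pair_recode(0b110110, 6))            # [-2, 2, -1]
print(multiply_bit_pair(23, 0b110110, 6))      # -230
print(multiply_bit_pair(-13, 0b101100, 6))     # 260
```

Only n/2 summands are formed, one per digit, which is the saving that bit-pair recoding is meant to deliver.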

Restoring Division Algorithm for Unsigned Integer

A division algorithm provides a quotient and a remainder when we divide two numbers. Division
algorithms are generally of two types: slow algorithms and fast algorithms. Slow division
algorithms include restoring, non-restoring, non-performing restoring and SRT; fast division
algorithms include Newton–Raphson and Goldschmidt.
In this section, we perform the restoring algorithm for unsigned integers. The term restoring
comes from the fact that the value of register A is restored whenever a trial subtraction gives
a negative result.

Here, register Q contains the quotient and register A contains the remainder. The n-bit dividend
is loaded in Q and the divisor is loaded in M. The value of register A is initially kept 0, and this
is the register whose value is restored during iteration, which is why the method is named
restoring.
The steps involved are:
• Step-1: First the registers are initialized with corresponding values (Q = Dividend, M = Divisor,
A = 0, n = number of bits in dividend)
• Step-2: Then the contents of registers A and Q are shifted left as if they were a single unit
• Step-3: Then the content of register M is subtracted from A and the result is stored in A
• Step-4: Then the most significant bit of A is checked: if it is 0, the least significant bit of Q
is set to 1; otherwise, if it is 1, the least significant bit of Q is set to 0 and the value of
register A is restored, i.e. to the value of A before the subtraction of M
• Step-5: The value of counter n is decremented
• Step-6: If the value of n becomes zero we exit the loop, otherwise we repeat from Step-2
• Step-7: Finally, register Q contains the quotient and A contains the remainder

Perform Division Restoring Algorithm
Dividend = 11
Divisor = 3
n   M       A       Q      Operation
4   00011   00000   1011   initialize
    00011   00001   011_   shift left AQ
    00011   11110   011_   A = A - M
    00011   00001   0110   Q[0] = 0 and restore A
3   00011   00010   110_   shift left AQ
    00011   11111   110_   A = A - M
    00011   00010   1100   Q[0] = 0 and restore A
2   00011   00101   100_   shift left AQ
    00011   00010   100_   A = A - M
    00011   00010   1001   Q[0] = 1
1   00011   00101   001_   shift left AQ
    00011   00010   001_   A = A - M
    00011   00010   0011   Q[0] = 1

Remember to restore the value of A whenever the most significant bit of A is 1 after the
subtraction. At the end, register Q contains the quotient, i.e. 3, and register A contains the
remainder, i.e. 2.
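
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `restoring_divide` is hypothetical, and A is kept as an ordinary signed Python integer, so a negative A stands in for "most significant bit of A is 1".

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Divide two unsigned n-bit integers; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        # Step-2: shift A and Q left as a single unit
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # Step-3: trial subtraction
        a -= m
        # Step-4: set Q[0], restoring A if the result went negative
        if a < 0:
            a += m          # restore A, Q[0] stays 0
        else:
            q |= 1          # Q[0] = 1
    return q, a


print(restoring_divide(11, 3, 4))   # (3, 2)
```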

Non-Restoring Division Algorithm for Unsigned Integer

Non-restoring division is less complex than the restoring one because simpler operations are
involved, i.e. addition and subtraction, and no restoring step is performed. The method
relies on the sign bit of register A, which initially contains zero.
The flow chart is given below.

The steps involved are:
• Step-1: First the registers are initialized with corresponding values (Q = Dividend, M =
Divisor, A = 0, N = number of bits in dividend)
• Step-2: Check the sign bit of register A
• Step-3: If it is 1, shift left the content of AQ and perform A = A+M; otherwise shift left AQ and
perform A = A-M (i.e. add the 2’s complement of M to A and store the result in A)
• Step-4: Again check the sign bit of register A
• Step-5: If the sign bit is 1, Q[0] becomes 0; otherwise Q[0] becomes 1 (Q[0] means the least
significant bit of register Q)
• Step-6: Decrement the value of N by 1
• Step-7: If N is not equal to zero go to Step-2, otherwise go to the next step
• Step-8: If the sign bit of A is 1 then perform A = A+M
• Step-9: Register Q contains the quotient and A contains the remainder
Example: Perform Non-Restoring Division for Unsigned Integer
Dividend = 11
Divisor = 3
-M = 11101
N   M       A       Q      Action
4   00011   00000   1011   Start
            00001   011_   Left shift AQ
            11110   011_   A = A-M
3           11110   0110   Q[0] = 0
            11100   110_   Left shift AQ
            11111   110_   A = A+M
2           11111   1100   Q[0] = 0
            11111   100_   Left shift AQ
            00010   100_   A = A+M
1           00010   1001   Q[0] = 1
            00101   001_   Left shift AQ
            00010   001_   A = A-M
0           00010   0011   Q[0] = 1

Quotient = 3 (Q), Remainder = 2 (A)
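
A minimal Python sketch of the non-restoring steps follows. The function name `non_restoring_divide` is an assumption for illustration, and A is again held as a signed Python integer so that "sign bit of A is 1" becomes `a < 0`.

```python
def non_restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Divide two unsigned n-bit integers; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        was_negative = a < 0                 # Step-2: sign of A before the shift
        a = (a << 1) | ((q >> (n - 1)) & 1)  # Step-3: shift AQ left ...
        q = (q << 1) & mask
        a = a + m if was_negative else a - m  # ... then add or subtract M
        if a >= 0:                           # Step-5: Q[0] = 1 only if A >= 0
            q |= 1
    if a < 0:                                # Step-8: final correction
        a += m
    return q, a


print(non_restoring_divide(11, 3, 4))   # (3, 2)
```

Unlike the restoring version, a negative A is carried into the next iteration and compensated by an addition, so only one add or subtract is performed per step.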

IEEE Standard 754 Floating Point Numbers

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-
point computation which was established in 1985 by the Institute of Electrical and Electronics
Engineers (IEEE). The standard addressed many problems found in the diverse floating point
implementations that made them difficult to use reliably and reduced their portability. IEEE
Standard 754 floating point is the most common representation today for real numbers on
computers, including Intel-based PC’s, Macs, and most Unix platforms.
There are several ways to represent floating point numbers, but IEEE 754 is the one used in
most cases. IEEE 754 has 3 basic components:
1. The Sign of Mantissa –

This is as simple as the name. 0 represents a positive number while 1 represents a negative
number.

2. The Biased exponent –

The exponent field needs to represent both positive and negative exponents. A bias is added
to the actual exponent in order to get the stored exponent.

3. The Normalized Mantissa –

The mantissa is part of a number in scientific notation or a floating-point number, consisting
of its significant digits. Here we have only 2 digits, i.e. 0 and 1. So a normalized mantissa is
one with only one 1 to the left of the binary point.
IEEE 754 numbers are divided into two formats based on the above three components: single
precision and double precision.

Type               Sign         Biased exponent   Normalized mantissa   Bias
Single precision   1 (bit 31)   8 (bits 30-23)    23 (bits 22-0)        127
Double precision   1 (bit 63)   11 (bits 62-52)   52 (bits 51-0)        1023
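
The field positions in the table can be illustrated by unpacking a value into its sign, biased exponent and mantissa fields. This is a small sketch using Python's standard `struct` module; the helper names `float32_fields` and `float64_fields` are illustrative assumptions.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa field) of x as single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30-23
    mantissa = bits & 0x7FFFFF        # bits 22-0
    return sign, exponent, mantissa


def float64_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa field) of x as double precision."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                     # bit 63
    exponent = (bits >> 52) & 0x7FF       # bits 62-52
    mantissa = bits & ((1 << 52) - 1)     # bits 51-0
    return sign, exponent, mantissa


print(float32_fields(1.0))    # (0, 127, 0)
print(float64_fields(-2.0))   # (1, 1024, 0)
```

The value 1.0 has a true exponent of 0, so its stored exponent is exactly the bias: 127 in single precision and 1023 in double precision.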

Example –
85 = 1010101
0.125 = 0.001
85.125 = 1010101.001
= 1.010101001 x 2^6
sign = 0

1. Single precision:
biased exponent 127+6=133
133 = 10000101
Normalised mantissa = 010101001
we will add 0's to complete the 23 bits

The IEEE 754 Single precision is:
= 0 10000101 01010100100000000000000
This can be written in hexadecimal form 42AA4000

2. Double precision:
biased exponent 1023+6=1029
1029 = 10000000101
Normalised mantissa = 010101001
we will add 0's to complete the 52 bits

The IEEE 754 Double precision is:

= 0 10000000101 0101010010000000000000000000000000000000000000000000
This can be written in hexadecimal form 4055480000000000
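
The worked example can be cross-checked by assembling the fields by hand and comparing against the encoding produced by Python's `struct` module. This is a sketch only; the helper name `assemble` is an assumption for illustration.

```python
import struct


def assemble(sign: int, biased_exponent: int, mantissa_bits: str,
             exp_width: int, mant_width: int) -> str:
    """Concatenate the three fields and return the pattern as upper-case hex."""
    mantissa = mantissa_bits.ljust(mant_width, "0")   # pad with trailing 0's
    pattern = f"{sign}{biased_exponent:0{exp_width}b}{mantissa}"
    return f"{int(pattern, 2):0{len(pattern) // 4}X}"


single = assemble(0, 127 + 6, "010101001", 8, 23)
double = assemble(0, 1023 + 6, "010101001", 11, 52)
print(single)   # 42AA4000
print(double)   # 4055480000000000

# The same patterns as produced by the platform's IEEE 754 encoding
print(struct.pack(">f", 85.125).hex().upper())   # 42AA4000
print(struct.pack(">d", 85.125).hex().upper())   # 4055480000000000
```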




To understand floating point addition, first we see addition of real numbers in decimal, as the
same logic is applied in both cases.

For example, we have to add 1.1 * 10^3 and 50.

We cannot add these numbers directly. First, we need to align the exponents and then we can
add the significands.
After aligning exponents, we get 50 = 0.05 * 10^3
Now adding significands, 0.05 + 1.1 = 1.15
So, finally we get (1.1 * 10^3 + 50) = 1.15 * 10^3
Here, notice that we shifted 50 and made it 0.05 to add these numbers.

Now let us take an example of floating point number addition.

We follow these steps to add two numbers:

1. Align the significand
2. Add the significands
3. Normalize the result

Let the two numbers be

x = 9.75
y = 0.5625

Converting them into 32-bit floating point representation,

9.75’s representation in 32-bit format = 0 10000010 00111000000000000000000
0.5625’s representation in 32-bit format = 0 01111110 00100000000000000000000

Now we get the difference of exponents to know how much shifting is required.
(10000010 – 01111110)2 = (4)10

Now, we shift the mantissa of the smaller number to the right by 4 bit positions.

Mantissa of 0.5625 = 1.00100000000000000000000
(note that the 1 before the binary point is implied in the 32-bit representation)
Shifting right by 4 bit positions, we get 0.00010010000000000000000
Mantissa of 9.75 = 1.00111000000000000000000

Adding the mantissas of both

  0.00010010000000000000000
+ 1.00111000000000000000000
= 1.01001010000000000000000

In the final answer, we take the exponent of the bigger number.
So, the final answer consists of:
Sign bit = 0
Exponent of bigger number = 10000010
Mantissa = 01001010000000000000000
32 bit representation of answer = x + y = 0 10000010 01001010000000000000000 (= 10.3125)
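
The align, add and normalize steps can be sketched directly on the 32-bit patterns. This is a simplified illustration, not a full IEEE 754 adder: it assumes both inputs are positive normalized numbers, truncates instead of rounding, and ignores overflow. All function names are illustrative assumptions.

```python
import struct


def f32_bits(x: float) -> int:
    """32-bit single precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(bits: int) -> float:
    """Value of a 32-bit single precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def add_positive_f32(x_bits: int, y_bits: int) -> int:
    """Add two positive normalized single precision patterns."""
    ex, ey = (x_bits >> 23) & 0xFF, (y_bits >> 23) & 0xFF
    mx = (x_bits & 0x7FFFFF) | 0x800000   # put back the implied leading 1
    my = (y_bits & 0x7FFFFF) | 0x800000
    if ex < ey:                           # make x the number with the larger exponent
        ex, ey, mx, my = ey, ex, my, mx
    my >>= ex - ey                        # 1. align the smaller number
    total = mx + my                       # 2. add the significands
    if total & 0x1000000:                 # 3. normalize if the sum reached 2.0
        total >>= 1
        ex += 1
    return (ex << 23) | (total & 0x7FFFFF)


result = add_positive_f32(f32_bits(9.75), f32_bits(0.5625))
print(format(result, "032b"))   # 01000001001001010000000000000000
print(bits_f32(result))         # 10.3125
```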

Subtraction is similar to addition, with two differences: we subtract the mantissas instead of
adding them, and the sign bit of the result is the sign of the number with the greater magnitude.

Let the two numbers be

x = 9.75
y = – 0.5625

Converting them into 32-bit floating point representation

9.75’s representation in 32-bit format = 0 10000010 00111000000000000000000
– 0.5625’s representation in 32-bit format = 1 01111110 00100000000000000000000

Now, we find the difference of exponents to know how much shifting is required.
(10000010 – 01111110)2 = (4)10
Now, we shift the mantissa of the smaller number to the right by 4 bit positions.
Mantissa of – 0.5625 = 1.00100000000000000000000
(note that the 1 before the binary point is implied in the 32-bit representation)
Shifting right by 4 bit positions, 0.00010010000000000000000
Mantissa of 9.75 = 1.00111000000000000000000

Subtracting the smaller mantissa from the bigger one

  1.00111000000000000000000
– 0.00010010000000000000000
= 1.00100110000000000000000

Sign bit of bigger number = 0

So, finally the answer = x + y = 0 10000010 00100110000000000000000 (= 9.1875)
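
When the signs differ, the magnitudes are subtracted and the result may need to be shifted left to restore the leading 1. The sketch below extends the same field-level idea to signed operands. It is a simplified illustration under stated assumptions: inputs are normalized and non-zero, bits shifted out during alignment are truncated rather than rounded, and overflow and underflow are ignored. The function names are hypothetical.

```python
import struct


def f32_bits(x: float) -> int:
    """32-bit single precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(bits: int) -> float:
    """Value of a 32-bit single precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def add_f32(x_bits: int, y_bits: int) -> int:
    """Add two signed, normalized single precision patterns."""
    sx, ex, mx = x_bits >> 31, (x_bits >> 23) & 0xFF, (x_bits & 0x7FFFFF) | 0x800000
    sy, ey, my = y_bits >> 31, (y_bits >> 23) & 0xFF, (y_bits & 0x7FFFFF) | 0x800000
    if (ex, mx) < (ey, my):               # make x the larger magnitude
        sx, ex, mx, sy, ey, my = sy, ey, my, sx, ex, mx
    my >>= ex - ey                        # align the smaller number
    total = mx + my if sx == sy else mx - my
    if total == 0:
        return 0                          # exact cancellation gives +0
    if total & 0x1000000:                 # sum reached 2.0: shift right once
        total >>= 1
        ex += 1
    while not total & 0x800000:           # leading 1 lost: shift left
        total <<= 1
        ex -= 1
    return (sx << 31) | (ex << 23) | (total & 0x7FFFFF)


result = add_f32(f32_bits(9.75), f32_bits(-0.5625))
print(format(result, "032b"))   # 01000001000100110000000000000000
print(bits_f32(result))         # 9.1875
```

The sign of the result is taken from the operand with the larger magnitude, matching the rule stated for subtraction.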

Multiplication and Division
Multiplication and division are simple because the mantissas and exponents can be processed
independently. FP multiplication requires fixed-point multiplication of the mantissas and
fixed-point addition of the exponents. As discussed in chapter 3 (Data representation), the
exponents are stored in biased form. The bias is +127 for IEEE single precision and +1023 for
double precision. During multiplication, when the two biased exponents are added, the sum
contains the bias twice. Hence the bias is adjusted by subtracting 127 or 1023 from the
resulting exponent.

Floating point division requires fixed-point division of the mantissas and fixed-point
subtraction of the exponents. Subtracting the biased exponents cancels the bias, so the bias
adjustment is done by adding 127 (or 1023) to the resulting exponent. Normalization of the
result is necessary in both multiplication and division. Thus FP multiplication and division
are not very complicated to implement.
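
The exponent and mantissa handling for multiplication can be sketched on single precision patterns as follows. This is a simplified illustration: it assumes normalized, non-zero inputs, truncates instead of rounding, and ignores overflow and underflow. The names `mul_f32`, `f32_bits` and `bits_f32` are assumptions for illustration.

```python
import struct

BIAS = 127   # single precision bias


def f32_bits(x: float) -> int:
    """32-bit single precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(bits: int) -> float:
    """Value of a 32-bit single precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def mul_f32(x_bits: int, y_bits: int) -> int:
    """Multiply two normalized single precision patterns."""
    sign = (x_bits >> 31) ^ (y_bits >> 31)          # signs are XORed
    ex, ey = (x_bits >> 23) & 0xFF, (y_bits >> 23) & 0xFF
    mx = (x_bits & 0x7FFFFF) | 0x800000             # 24-bit significands
    my = (y_bits & 0x7FFFFF) | 0x800000
    exponent = ex + ey - BIAS                       # remove the doubled bias
    product = mx * my                               # 48-bit fixed-point product
    if product & (1 << 47):                         # product >= 2.0: normalize
        product >>= 1
        exponent += 1
    mantissa = (product >> 23) & 0x7FFFFF           # drop implied 1, truncate
    return (sign << 31) | (exponent << 23) | mantissa


print(bits_f32(mul_f32(f32_bits(9.75), f32_bits(0.5625))))   # 5.484375
print(bits_f32(mul_f32(f32_bits(3.0), f32_bits(-2.5))))      # -7.5
```

Division would follow the same pattern with the exponent computed as ex - ey + BIAS and a fixed-point division of the significands.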

All the examples are in base 10 (decimal) to aid understanding. Working in binary is similar.

Carry-Look ahead Adder

In the case of parallel adders, the binary addition of two numbers is initiated when all the bits
of the augend and the addend are available at the same time to perform the computation. In a
parallel adder circuit, the carry output of each full adder stage is connected to the carry input
of the next higher-order stage, hence it is also called a ripple carry adder.

In such adder circuits, it is not possible to produce the sum and carry outputs of any stage until
the input carry occurs. So there will be a considerable time delay in the addition process, which
is known as carry propagation delay. In any combinational circuit, a signal must propagate
through the gates before the correct output sum is available at the output terminals.

Consider the above figure, in which the sum S4 is produced by the corresponding full adder as
soon as the input signals are applied to it. But the carry input C4 is not available at its final
steady-state value until carry C3 is available at its steady-state value. Similarly, C3 depends
on C2 and C2 on C1. Therefore, the carry must propagate through all the stages before output S4
and carry C5 settle to their final steady-state values.

The propagation time is equal to the propagation delay of a typical gate times the number of
gate levels in the circuit. For example, if each full adder stage has a propagation delay of
20 ns, then S4 will reach its final correct value after 80 ns (20 × 4). If we extend the
number of stages to add more bits, this situation becomes much worse.

So the speed at which bits are added in the parallel adder depends on the carry propagation
time: signals must be given enough time to propagate through the gates to produce the correct
output.

The following are the methods to get high speed in the parallel adder when producing the binary
addition:

1. By employing faster gates with reduced delays, we can reduce the propagation delay. But
there is a capability limit for every physical logic gate.

2. Another way is to increase the circuit complexity in order to reduce the carry delay time.
There are several methods available for speeding up the parallel adder; one commonly
used method employs the principle of look-ahead carry addition, which eliminates the
inter-stage carry logic.

Carry-Lookahead Adder

A carry-lookahead adder is a fast parallel adder: it reduces the propagation delay by using more
complex hardware, hence it is costlier. In this design, the carry logic over fixed groups of bits
of the adder is reduced to two-level logic, which is nothing but a transformation of the ripple
carry design.

This method makes use of logic gates to look at the lower-order bits of the augend and
addend to see whether a higher-order carry is to be generated or not. Let us discuss it in detail.

Consider the full adder circuit shown above with its corresponding truth table. If we define two
variables as carry generate Gi and carry propagate Pi, then

Pi = Ai ⊕ Bi

Gi = Ai Bi

The sum output and carry output can be expressed as

Si = Pi ⊕ Ci

Ci+1 = Gi + Pi Ci

where Gi is a carry generate, which produces a carry when both Ai and Bi are one regardless of
the input carry. Pi is a carry propagate, and it is associated with the propagation of the carry
from Ci to Ci+1.

The carry output Boolean function of each stage in a 4-stage carry-lookahead adder can be
expressed as

C1 = G0 + P0 Cin

C2 = G1 + P1 C1

= G1 + P1 G0 + P1 P0 Cin

C3 = G2 + P2 C2

= G2 + P2 G1+ P2 P1 G0 + P2 P1 P0 Cin

C4 = G3 + P3 C3

= G3 + P3 G2+ P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 Cin

From the above Boolean equations we can observe that C4 does not have to wait for C3 and C2
to propagate; C4 is in fact produced at the same time as C3 and C2. Since the Boolean
expression for each carry output is a sum of products, these can be implemented with one
level of AND gates followed by an OR gate.
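
The expanded carry equations can be checked with a small bit-level sketch in which every carry is computed directly from the P and G signals and Cin, never from the previous carry. The function name `cla_add4` is an illustrative assumption; bit 0 is the least significant bit, as in the equations above.

```python
def cla_add4(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Add two 4-bit numbers with lookahead carries; return (sum, C4)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    P = [x ^ y for x, y in zip(A, B)]      # carry propagate Pi = Ai xor Bi
    G = [x & y for x, y in zip(A, B)]      # carry generate  Gi = Ai and Bi

    # Each carry is a two-level AND-OR function of G, P and Cin only
    c1 = G[0] | (P[0] & cin)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & cin)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & cin)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & cin))

    carries = [cin, c1, c2, c3]
    total = sum((P[i] ^ carries[i]) << i for i in range(4))   # Si = Pi xor Ci
    return total, c4


print(cla_add4(0b1011, 0b0110))   # (1, 1), i.e. 11 + 6 = 17 = 1 0001
```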

The implementation of the three Boolean functions for the carry outputs C2, C3 and C4 of a
carry-lookahead carry generator is shown in the figure below.

Therefore, a 4-bit parallel adder can be implemented with the carry-lookahead scheme to
increase the speed of binary addition, as shown in the figure below. In this, two Ex-OR gates are
required by each sum output. The first Ex-OR gate generates the Pi variable and the AND gate
generates the Gi variable.

Hence, all these P’s and G’s are generated in two gate levels. The carry-lookahead generator
allows all these P and G signals to propagate after they settle into their steady-state values and
produces the output carries at a delay of two levels of gates. Therefore, the sum outputs S2 to
S4 have equal propagation delay times.

It is also possible to construct 16-bit and 32-bit parallel adders by cascading a number of 4-bit
adders with carry logic. A 16-bit carry-lookahead adder is constructed by cascading four 4-bit
adders with two more gate delays, whereas the 32-bit carry-lookahead adder is formed by
cascading two 16-bit adders. In a 16-bit carry-lookahead adder, 5 and 8 gate delays are
required to get C16 and S15 respectively, which is less than the 9 and 10 gate delays
for C16 and S15 respectively in cascaded 4-bit carry-lookahead adder blocks. Similarly, in a
32-bit adder, 7 and 10 gate delays are required by C32 and S31, which is less than the 18 and
17 gate delays for the same outputs if the 32-bit adder is implemented by eight 4-bit adders.

