
Adders

Dinesh Sharma

EE Department
IIT Bombay, Mumbai

October 16, 2022

Dinesh Sharma (IIT B) Adders October 16, 2022 1 / 66


1 Half and Full Adders

2 Ripple Carry adder

3 Carry Look Ahead


Manchester Carry Chain

4 Carry Bypass Adder

5 Carry Select Adder


Stacking Carry Select Adders

6 Tree Adders
Brent Kung adder
Tutorial: 32 bit Brent Kung Logarithmic Adder

7 Serial Adders



Half and Full Adders

Half Adder

The truth table for addition of two bits is:


A  B  Sum  Carry
0  0   0    0
0  1   1    0
1  0   1    0
1  1   0    1

$sum = A \cdot \overline{B} + \overline{A} \cdot B$
$carry = A \cdot B$

What do we do with the carry?


Obviously, it must be added to more significant bits.
So we need an adder with three inputs.


Full Adder

Truth Table for the addition of three bits is:

A  B  Cin  Sum  Cout
0  0   0    0    0
0  1   0    1    0
1  0   0    1    0
1  1   0    0    1
0  0   1    1    0
0  1   1    0    1
1  0   1    0    1
1  1   1    1    1

Which leads to the following Karnaugh maps:

SUM
Cin \ AB   00  01  11  10
   0        0   1   0   1
   1        1   0   1   0

CARRY
Cin \ AB   00  01  11  10
   0        0   0   1   0
   1        0   1   1   1

$sum = \overline{A} \cdot \overline{B} \cdot C_{in} + \overline{A} \cdot B \cdot \overline{C_{in}} + A \cdot \overline{B} \cdot \overline{C_{in}} + A \cdot B \cdot C_{in}$


Cout = A · B + B · Cin + Cin · A = A · B + Cin · (A + B)
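
The two expressions can be checked against ordinary addition of three bits. The following is a minimal sketch; the function name `full_adder` is chosen here for illustration and is not part of the slides.

```python
def full_adder(a, b, cin):
    """Return (sum, cout) for three input bits, using the sum-of-products forms."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin          # complemented inputs
    s = (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin)
    cout = (a & b) | (cin & (a | b))
    return s, cout


# Compare every row of the truth table with plain integer addition.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
            print(a, b, cin, "->", s, cout)
```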


Ripple Carry adder

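[Figure: three cascaded full adders with inputs A2 B2, A1 B1 and A0 B0, sum outputs S2, S1 and S0, and the Cout of each bit connected to the next bit.]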

Carry out of one bit becomes Carry in of the next.


This architecture is therefore called ripple carry adder.
The critical delay path of the adder is the carry rippling from one bit to the
next.
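
A behavioural sketch of the ripple carry structure is given below. It assumes bit lists stored least significant bit first; the helper names `to_bits` and `from_bits` are illustrative only.

```python
def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first)."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        # Carry out of this bit becomes carry in of the next.
        carry = (a & b) | (carry & (a | b))
    return sums, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_carry_add(to_bits(5, 3), to_bits(6, 3))
print(from_bits(sums), cout)  # 5 + 6 = 11, i.e. sum bits 3 with carry out 1
```

The loop visits the bits strictly in order, which mirrors why the delay grows with the number of bits.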


Sum derived from carry

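Because carry is on the critical path, Carry-out must be generated as quickly as possible.
We need not optimize the delay of generating sum.
We can in fact generate sum from Carry out.

$\overline{C_{out}} = \overline{A \cdot B + C_{in} \cdot (A + B)}$
$= (\overline{A} + \overline{B}) \cdot (\overline{C_{in}} + \overline{A} \cdot \overline{B})$
$= \overline{A} \cdot \overline{C_{in}} + \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B}$
$\overline{C_{out}} \cdot (A + B + C_{in}) = A \cdot \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot B \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B} \cdot C_{in}$

$sum = A \cdot B \cdot C_{in} + A \cdot \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot B \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B} \cdot C_{in}$
$= \overline{C_{out}} \cdot (A + B + C_{in}) + A \cdot B \cdot C_{in}$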


CMOS Implementation

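[Figure: static CMOS implementation of the carry and sum gates between VDD and Gnd, with inputs A, B and Cin and outputs Cout and Sum.]

$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
$Sum = \overline{C_{out}} \cdot (A + B + C_{in}) + A \cdot B \cdot C_{in}$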


Complementation Property
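Both Sum and Carry show an interesting symmetry:

$sum = A \cdot B \cdot C_{in} + A \cdot \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B} \cdot C_{in} + \overline{A} \cdot B \cdot \overline{C_{in}}$

$\overline{sum} = (\overline{A} + \overline{B} + \overline{C_{in}}) \cdot (\overline{A} + B + C_{in}) \cdot (A + B + \overline{C_{in}}) \cdot (A + \overline{B} + C_{in})$
$= (\overline{A} + \overline{A} \cdot B + \overline{A} \cdot C_{in} + \overline{A} \cdot \overline{B} + \overline{B} \cdot C_{in} + \overline{C_{in}} \cdot \overline{A} + \overline{C_{in}} \cdot B) \cdot (A + A \cdot \overline{B} + A \cdot C_{in} + A \cdot B + B \cdot C_{in} + \overline{C_{in}} \cdot A + \overline{C_{in}} \cdot \overline{B})$
$= (\overline{A} + \overline{B} \cdot C_{in} + B \cdot \overline{C_{in}}) \cdot (A + B \cdot C_{in} + \overline{B} \cdot \overline{C_{in}})$
$= \overline{A} \cdot B \cdot C_{in} + \overline{A} \cdot \overline{B} \cdot \overline{C_{in}} + A \cdot \overline{B} \cdot C_{in} + A \cdot B \cdot \overline{C_{in}}$

Thus

$sum = A \cdot B \cdot C_{in} + A \cdot \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B} \cdot C_{in} + \overline{A} \cdot B \cdot \overline{C_{in}}$
$\overline{sum} = \overline{A} \cdot \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot B \cdot C_{in} + A \cdot B \cdot \overline{C_{in}} + A \cdot \overline{B} \cdot C_{in}$

This shows that the same hardware that produces sum from A, B and $C_{in}$ will produce $\overline{sum}$ if the inputs are changed to $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$.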


Complementation Property

Carry also has the same complementation property.

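$C_{out} = A \cdot B + C_{in} \cdot (A + B)$

Hence, $\overline{C_{out}} = \overline{A \cdot B + C_{in} \cdot (A + B)} = (\overline{A} + \overline{B}) \cdot (\overline{C_{in}} + \overline{A} \cdot \overline{B})$
$= \overline{A} \cdot \overline{C_{in}} + \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B}$

Thus $\overline{C_{out}} = \overline{A} \cdot \overline{B} + \overline{C_{in}} \cdot (\overline{A} + \overline{B})$
while $C_{out} = A \cdot B + C_{in} \cdot (A + B)$

So the same hardware which produces $C_{out}$ from A, B and $C_{in}$ will produce $\overline{C_{out}}$ from $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$.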
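
The complementation property of both functions can be confirmed by enumerating all eight input combinations. This is a small sketch; `is_self_dual` is a name introduced here, and the three-input AND is included only as a contrasting function that does not have the property.

```python
from itertools import product


def sum_bit(a, b, cin):
    return a ^ b ^ cin


def carry_bit(a, b, cin):
    return (a & b) | (cin & (a | b))


def and3(a, b, cin):
    return a & b & cin


def is_self_dual(f):
    """True if complementing every input complements the output."""
    return all(f(1 - a, 1 - b, 1 - c) == 1 - f(a, b, c)
               for a, b, c in product((0, 1), repeat=3))


print(is_self_dual(sum_bit), is_self_dual(carry_bit), is_self_dual(and3))
```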


Making use of the symmetry property

In CMOS implementation, we interchange series and parallel
configurations for the n and p channel transistors.
This is to ensure that the pull up and pull down circuits are
complementary.
However, for sum and carry functions, we see that these functions are
their own complements.
Therefore, for implementing sum and carry, we can use the same
configuration for n and p channel transistors.
We use this to reduce the number of series connected transistors in pull
up/pull down networks.


Mirror gates for Adders

By making use of the symmetry property of sum and carry, it is possible to
simplify the implementations.

[Figure: mirror gate implementations of the carry and sum gates between VDD and Gnd, with inputs A, B and Cin and outputs Cout and Sum.]

$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
$Sum = \overline{C_{out}} \cdot (A + B + C_{in}) + A \cdot B \cdot C_{in}$
These are called mirror gates because the n and p transistors have the same
series parallel combination.
This is highly unusual.


Speeding up the Ripple Carry Adder

The worst case delay of the ripple carry adder is linear in number of bits
to be added.
To reduce the delay per stage, we can eliminate the inverter from the
carry output.
All even bit adders accept A, B and $C_{in}$ as inputs. The mirror gate without
inverter gives $\overline{C_{out}}$ as the output.
All odd bit adders accept $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$ as inputs and thus produce $C_{out}$ as
output.
Outputs of all bits are now compatible with inputs of the next stage.


Speeding up the Ripple Carry Adder

Extra inverters are required to produce $\overline{A}$ and $\overline{B}$, and at the outputs to produce
the proper result. However, these are not on the critical path, and do not
add to the worst case delay.
Extreme care needs to be taken in layout to ensure that the loading on
the tree gate producing carry output is as small as possible.
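
The alternating polarity scheme can be modelled at the logic level. The sketch below assumes bit 0 receives the true carry input and uses illustrative function names; it only checks the logic, not the timing.

```python
def mirror_carry_no_inverter(x, y, z):
    """Carry mirror gate without its output inverter: complement of majority."""
    return 1 - ((x & y) | (z & (x | y)))


def carries_alternating_polarity(a_bits, b_bits, cin=0):
    """Return the true carry out of each bit (least significant bit first)."""
    carries = []
    wire = cin  # the wire entering bit 0 holds the true carry
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if i % 2 == 0:
            # Even bit: true inputs in, complemented carry out.
            wire = mirror_carry_no_inverter(a, b, wire)
            carries.append(1 - wire)
        else:
            # Odd bit: complemented inputs in, true carry out.
            wire = mirror_carry_no_inverter(1 - a, 1 - b, wire)
            carries.append(wire)
    return carries


def reference_carries(a_bits, b_bits, cin=0):
    """Carries of an ordinary ripple carry adder, for comparison."""
    carries, c = [], cin
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | (c & (a | b))
        carries.append(c)
    return carries
```

Each stage applies only the uninverted gate, yet the recovered carries match those of the ordinary chain.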



Carry Look Ahead

Terms Independent of Carry

Carry propagation is the critical path for a multi-bit adder.
To speed up the adder, we would like an architecture where logic terms
are classified as those which depend on carry and those which do not.
All terms which do not depend on carry can then be pre-computed.
Now when the carry arrives, we quickly compute the output carry and
pass it on to the next stage.


Carry Independent Terms

We would like to analyze what information can be pre-computed from $A_i$ and
$B_i$, which will help us in generating $C_{out}$ quickly from $C_{in}$.
When $A_i = 0$ and $B_i = 0$, $C_{out}$ is 0, independent of $C_{in}$. We define this
condition as ‘Kill’: $K = \overline{A} \cdot \overline{B}$.
Similarly, when $A_i = 1$ and $B_i = 1$, $C_{out}$ is 1, independent of $C_{in}$. We
define this condition as ‘Generate’: $G = A \cdot B$.
Only when $A_i = 0$ and $B_i = 1$ or when $A_i = 1$ and $B_i = 0$,
we need to wait for $C_{in}$ to compute $C_{out}$.
In both these cases, $C_{out} = C_{in}$.
We call this condition ‘Propagate’, and define $P = A \cdot \overline{B} + \overline{A} \cdot B$.


Using Carry Independent Terms

We define $K \equiv \overline{A} \cdot \overline{B}$, $G \equiv A \cdot B$ and $P \equiv A \oplus B$.


Exactly one of K, G or P is true at any time.
When K = 1, Cout is 0, independent of Cin .
When G = 1, Cout is 1, independent of Cin .
When P = 1, Cout = Cin .
P needs to be computed using an xor gate, which can be slow. However, the
only difference between xor and or logic is when both inputs are 1, i.e. G = 1.
If we can ensure that G forces Cout to 1 irrespective of P, we can use the
simpler ‘or’ logic to compute P.
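
Both claims, that exactly one of K, G and P is true and that ‘or’ may replace xor when computing the carry, can be checked by enumeration. The helper names below are illustrative.

```python
from itertools import product


def kgp(a, b):
    """Return (K, G, P) for one bit position."""
    k = (1 - a) & (1 - b)
    g = a & b
    p = a ^ b
    return k, g, p


def check_all():
    for a, b in product((0, 1), repeat=2):
        k, g, p = kgp(a, b)
        assert k + g + p == 1                      # exactly one is true
        for cin in (0, 1):
            cout_xor = g | (p & cin)               # P = A xor B
            cout_or = g | ((a | b) & cin)          # P = A or B
            assert cout_xor == cout_or == (a + b + cin) // 2
    return True


print(check_all())
```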


Carry Look Ahead

$C_{in}$ for bit i+1 is the $C_{out}$ of bit i.
So we can write $C_{i+1} = G_i + P_i \cdot C_i$
Notice that the Kill signal is not required.
If $G_i = 0$, then $A \oplus B = A + B$, because $G = A \cdot B = 0$.
If $G_i = 1$, $C_{i+1} = 1$, and the value of $P_i$ does not matter anyway.
So we can use $P = A + B$ instead of $P = A \oplus B$.
Now, we have the sequence:

$C_{i+1} = G_i + P_i \cdot C_i = G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot C_{i-1} = \cdots$

and so on, till we reach $C_0$.

Since all $G_i$, $P_i$ and $C_0$ can be computed in parallel on arrival of the inputs, we
can compute all sum and carry terms independently if we do not mind the
added complexity.
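
The fully expanded expression can be written out as a sketch in which every carry is computed from the G, P values and $C_0$ alone, never from a neighbouring carry. The function name and the list layout (least significant bit first) are assumptions made for illustration.

```python
def lookahead_carries(a_bits, b_bits, c0=0):
    """Return [c1, ..., cn], each computed directly from G, P and c0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a | b for a, b in zip(a_bits, b_bits)]   # 'or' form of propagate
    carries = []
    for i in range(len(g)):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_1 G_0 + P_i ... P_0 c0
        prefix = 1          # running product P_i ... P_{j+1}
        c = 0
        for j in range(i, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & c0
        carries.append(c)
    return carries


print(lookahead_carries([1, 1, 0, 1], [1, 0, 1, 0]))
```

The number of product terms for $C_{i+1}$ grows with i, which is the added complexity mentioned above.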


Carry Look Ahead

$C_{i+1} = G_i + P_i \cdot C_i = G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot C_{i-1} = \cdots$


Unfortunately, static implementation of these gates has almost as much delay
as the ripple carry implementation.
Therefore, the static implementation of computation of sum and carry terms
as a logic expression depending on all Ai , Bi and C0 is rarely used.
We can use these expressions for blocks of a small number of bits (say 4) and
then propagate carry over these blocks.



Carry Look Ahead Manchester Carry Chain

Manchester Carry Chain

Static implementation of look ahead carry is not really fast if we try to look
ahead by a large number of bits, because the logic becomes very complex.
A dynamic implementation is useful and is widely used. It is known as the
Manchester Carry Chain.

[Figure: one stage of the Manchester carry chain between VDD and Gnd, with clock Ck, control inputs P and G, and carry nodes Cin and Cout.]


Manchester Carry Chain

When the clock is low, the output is unconditionally charged by the pMOS.
When the clock goes high, the output will be pulled low if G = 1, or if P = 1
and the incoming carry node is low (the nodes of the chain carry the
complement of the carry).
In all other cases, the output will remain high. Thus this circuit implements
the required logic.

This circuit can be concatenated for all bits and since P and G are ready
before Cin arrives, the carry quickly ripples through from bit to bit.


Manchester Carry Chain as Carry Look Ahead

Notice that the nMOS logic can be interpreted as:
$P \cdot C_{in} + G$
where $C_{in}$ itself has been recursively generated by similar logic.

As in the static case, there is a limit to the number of bits which can be so
connected.
If P = 1 for many successive bits, the discharge path is through series
connected pass transistors of all these gates. The discharge time for this
critical path has an $n^2$ dependence.


Manchester Carry Chain as Carry Look Ahead


The circuit below shows a Manchester carry chain over 4 bits.
[Figure: four Manchester carry chain stages in series, with control inputs P0 to P3 and G0 to G3, clock Ck, and carry nodes Cin0, Cout0, Cout1, Cout2 and Cout3.]

If G = 1 for any bit, the output is brought to ‘0’. (Recall that $\overline{Carry}$
propagates, not Carry.)
The time of carry arrival for all subsequent bits is from the last bit where
P = 0.
The worst case for delay occurs when P = 1 for all bits. In this case, all
load capacitors are shorted, so load capacitance ∝ n.
The discharge of capacitors is through n series connected pass
transistors, so average R is ∝ n.
Thus in the worst case, the delay ∝ RC ∝ $n^2$.
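
The quadratic growth can be illustrated with a simple RC ladder estimate. The sketch below assumes an Elmore-style approximation with one identical resistance `r` and node capacitance `c` per stage; neither the approximation nor the unit values come from the slides.

```python
def chain_delay(n, r=1.0, c=1.0):
    """Elmore-style delay estimate for n pass transistors in series.

    Each stage is modelled as a resistance r followed by a node
    capacitance c; the charge on node k must flow through k resistors.
    """
    return sum(k * r * c for k in range(1, n + 1))


for n in (1, 2, 4, 8, 16):
    print(n, chain_delay(n))
```

Under these assumptions the estimate is $r c \, n(n+1)/2$, so doubling the chain length roughly quadruples the delay.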

Carry Bypass Adder

The worst case for addition occurs when P = 1 for all bits and carry has to
ripple through all bits.
In carry bypass adder, we form groups of bits and if P = 1 for all members
of a group, we pass on the carry input to this group directly to the input of
the next group, without having to ripple through each bit.
This improves the worst case delay of the adder.
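$bypass = P_0 \cdot P_1 \cdot P_2 \cdot P_3$

[Figure: 4 bit Manchester carry chain with control inputs P0 to P3 and G0 to G3, clock Ck, carry nodes Cin0 and Cout0 to Cout3, and a bypass path around the group.]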
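
A logic-level sketch of one bypass group follows. It assumes P is computed with xor and returns a flag showing whether the bypass path was taken; the function name is illustrative.

```python
def bypass_group_carry(a_bits, b_bits, cin):
    """Carry out of one group (least significant bit first), with a bypass path."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    if all(p):
        # bypass = P0.P1.P2.P3: the incoming carry skips the whole group.
        return cin, True
    carry = cin
    for gi, pi in zip(g, p):
        carry = gi | (pi & carry)      # ordinary ripple inside the group
    return carry, False


print(bypass_group_carry([1, 0, 1, 0], [0, 1, 0, 1], 1))  # all bits propagate
print(bypass_group_carry([1, 1, 0, 0], [1, 0, 0, 0], 0))  # ripple path is used
```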



Carry Select Adder

Single bit Carry Select Adder

One can make a fast adder at the cost of some added complexity, by
implementing two adders, one assuming that Cin = 0 and the other
assuming that Cin = 1.
When the actual carry input arrives at this bit, it chooses the correct one
using a multiplexer, depending on its value.
Since Cout = G + P · Cin , the two cases are:
For Cin = 0, Cout = G = A · B
For Cin = 1, Cout = G + P = A · B + A ⊕ B = A + B
Thus the two candidates for Cout are quite easy to generate, being just the
AND/OR of A and B.
This concept can be extended to multi-bit carry select adders.


Carry Select Adder

An m bit carry select adder can be constructed as follows:


We first compute the generate/propagate/kill signals for each bit (in
parallel) from the input bits. Assuming unit gate delay model, this takes
one unit of time.
We use two m bit carry bypass adders. One of the adders assumes the
carry input Cin to be 0, while the other assumes Cin to be 1. The two
adders work in parallel and each takes m units of time.
We now use a multiplexer controlled by the actual Cin to select the correct
Cout . This takes one unit of time.
The Cout of one such m bit adder will be used as the select input of the
multiplexer of the next.
The sum output of each bit is derived from the P and Cout signals for the
corresponding bit and appears one unit of time after Cout is available.
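
The construction can be sketched behaviourally as below. The sub-adders are modelled as simple ripple adders, which is an assumption made for brevity, and all names are illustrative.

```python
def ripple(a_bits, b_bits, cin):
    """Sub-adder used for both candidates (least significant bit first)."""
    sums, c = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ c)
        c = (a & b) | (c & (a | b))
    return sums, c


def carry_select_block(a_bits, b_bits, actual_cin):
    """m bit block: both candidates exist before actual_cin is consulted."""
    candidate0 = ripple(a_bits, b_bits, 0)   # assumes Cin = 0
    candidate1 = ripple(a_bits, b_bits, 1)   # assumes Cin = 1
    return candidate1 if actual_cin else candidate0   # the multiplexer


def carry_select_add(a_bits, b_bits, m=4, cin=0):
    """Chain m bit blocks; Cout of one block selects the result of the next."""
    sums, carry = [], cin
    for start in range(0, len(a_bits), m):
        block_sums, carry = carry_select_block(
            a_bits[start:start + m], b_bits[start:start + m], carry)
        sums.extend(block_sums)
    return sums, carry
```

In hardware the two candidates are evaluated in parallel; the sequential calls here only model the result.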


Multi-bit Carry Select Adders

The two m bit sub-adders assume the carry to be 0 or 1 respectively.
Times of availability of various signals are noted in parentheses in the
diagram (unit delay times).

[Figure: multi-bit carry select adder. Inputs a, b (0) feed a block that generates G, P, K (1); two m bit adders, one with Cin = 0 and one with Cin = 1, produce their outputs at (m+1); a Mux controlled by the actual Cin delivers Cout at (m+2).]

The two alternatives for the carry output are ready at (m+1) units of time.
If the actual Cin is available at n units of time, the output will be available
at (m+2) or (n+1), whichever is later.
In case of 4 bit adders, this is at 6 units of time or at Cin arrival + 1,
whichever is later.



Carry Select Adder Stacking Carry Select Adders

Stacking in Carry Select adders

The sub-adders in carry select adder can use any architecture.
They could be Manchester carry chains, carry bypass or ripple carry
adders.
Obviously, these sub adders should not be very long, otherwise, their
outputs will be ready after a long time and we shall lose the advantage of
carry select addition.
Then, how do we make long adders using carry select?
This is done by stacking several smaller carry select adders.


First stage of Carry Select adders

The first stage of stacked Carry Select adders is different from the rest.
In this case, we do not have to wait for Cin to arrive – it is already known.
Therefore we do not have to use redundant adders – a single m bit adder
will do.
Since no multiplexing is required, the output of the first stage is ready at
(m + 1) units of time, rather than at (m + 2).
This is convenient – because the two alternatives of the second stage are
also ready at (m + 1) units of time.


Linear Stacking

We could stack several identical carry select adders.


There is no need for carry select in the first stage, as Cin for this stage is
available simultaneously with Ai and Bi .
Every subsequent stage will have two sub-adders, one assuming Cin = 0,
the other assuming Cin = 1.
The correct output will be selected by the actual Cin when it arrives.
Thus, after the first stage, each group of m bit adders will add only one
unit of delay.
This is much faster. However, the delay is still linear in number of bits.


Linear stacking: Example

A 32-bit adder made by cascading eight 4-bit carry select adders.

[Figure: eight groups of 4 bits, a (0-3) b (0-3) through a (28-31) b (28-31). Inputs are available at (0) and G, P, K at (1). The first group uses a single 4 bit adder; every other group uses two 4 bit adders (Cin = ‘0’ and Cin = ‘1’) followed by a Mux. Adder outputs are ready at (5) and successive Cout values at (5), (6), ..., (11), (12).]

Bits    cy in   alt cy.s   cy out
0-3       0        -          5
4-7       5        5          6
8-11      6        5          7
12-15     7        5          8
16-19     8        5          9
20-23     9        5         10
24-27    10        5         11
28-31    11        5         12

The sum generation will take another unit of time, so the overall results will
be available in 13 units of time.


Square-root Stacking

Can we speed up the adder if we don’t use the same number of bits in every
stage?
In linear stacking, since all adders are identical, they are ready with their
alternative outputs at the same time.
But the carry arrives later and later at each successive group of carry
select adders.
We could have used this extra time to add up more bits in the later
stages, and still be ready with the alternative results before carry arrives!
Since the carry arrives one unit of time later at each successive group,
each successive group could be longer by one bit.


Square-root Stacking

We can do more bits of addition in the same time, if each successive
stage is 1 bit longer than the previous one.
Thus, the number of bits which can be added is given by

$n = m_0 + m_0 + (m_0 + 1) + (m_0 + 2) + \cdots = m_0 + \frac{s \, (m_0 + m_0 + s - 1)}{2}$

where s is the number of stages following the first one without carry
select.
The total delay will be $m_0 + 1$ for the first stage. Each subsequent stage
takes just 1 unit of time since the candidates for selection are available
just in time.
The time taken is just $m_0 + s + 1$ units. When $s \gg m_0$, we have $n \approx s^2/2$,
while the time taken is nearly s.

Thus the time taken to add n bits is $\approx \sqrt{2n}$.


Square-root Stacking: Example

For a 32 bit adder, we could use a distribution like: 4,4,5,6,7,6.

Bits    carry in   carry alternatives   carry out
0-3        0               -                5
4-7        5               5                6
8-12       6               6                7
13-18      7               7                8
19-25      8               8                9
26-31      9               7               10

Our sum will be ready at 11, which is faster. This gain will be much higher for
wider additions.
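
The carry-out times for a given distribution of group sizes can be computed with a short sketch of the unit delay model used here. The function name is illustrative; the model assumes one unit for G, P, K generation, one unit per bit inside a sub-adder, and one unit for each multiplexer.

```python
def carry_out_times(group_sizes):
    """Unit-delay carry-out time of each group in a stacked carry select adder."""
    first = group_sizes[0]
    times = [first + 1]                 # first group: no multiplexer needed
    for m in group_sizes[1:]:
        alternatives_ready = m + 1      # both candidate carries of this group
        carry_in = times[-1]            # carry out of the previous group
        times.append(max(alternatives_ready, carry_in) + 1)   # + 1 for the mux
    return times


linear = carry_out_times([4] * 8)
square_root = carry_out_times([4, 4, 5, 6, 7, 6])
print(linear, "sum ready at", linear[-1] + 1)            # final carry 12, sum at 13
print(square_root, "sum ready at", square_root[-1] + 1)  # final carry 10, sum at 11
```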


Tree Adders

Tree adders use the idea of carry look ahead addition.


However, these do not try to implement the complex logic expressions
which would result if we try to generate each carry directly from input
operands.
Instead, these build up the logic in a tree like structure, where each node
performs simple logic operations on the results of the previous node.
Because of the tree structure used in this, the delay is of the order of log n
for an n bit adder.


Carry Look Ahead

For carry look ahead, we had defined
$K = \overline{A} \cdot \overline{B}$, $G = A \cdot B$ and $P = A \oplus B$.
P, G and K can be computed without waiting for Cin.
When K = 1, Cout = 0 irrespective of Cin.
When G = 1, Cout = 1 irrespective of Cin.
When P = 1, Cout = Cin: this is the only case when we must wait for Cin in
order to compute Cout.
Exactly one of P, G and K will be true for any combination of A and B.
Therefore we do not have to compute all three. Most adders just use G and P.


Terminology

Let us first establish the terminology used for this section.


[Figure: a row of single bit cells indexed N-1, ..., i, ..., 1, 0. Cell i has inputs $a_i$, $b_i$ and carry $c_i$, internal signals $G_i$, $P_i$, and outputs $s_i$ and $c_{i+1}$.]

The least significant bit is indexed as 0 and the most significant bit as N − 1.
The input operands to the adder are $A = (a_{N-1} \cdots a_0)$ and
$B = (b_{N-1} \cdots b_0)$, with a possible input carry $c_0$. All these bits are
available at the start.
$c_i$ represents the input carry to the i’th bit.
The output carry from bit i is $c_{i+1}$, which is the input carry for bit (i+1).
Thus $c_0$ represents the overall input carry for the addition and $c_N$
represents the final output carry.
$s_i$ represents the sum output from the i’th bit.


P and G signals over blocks of multiple bits


The Generate and Propagate signals are derived exclusively from ai and bi
inputs and are independent of carry input. These can thus be generated in
constant time and in parallel for all the bits.
The output carry for i’th bit is generated from the incoming carry using the
relation: $c_{i+1} = G_i + P_i \cdot c_i$. Similarly, $c_i = G_{i-1} + P_{i-1} \cdot c_{i-1}$.
Substituting for $c_i$ in the relation for $c_{i+1}$, we get
$c_{i+1} = G_i + P_i \cdot (G_{i-1} + P_{i-1} \cdot c_{i-1}) = (G_i + P_i \cdot G_{i-1}) + (P_i \cdot P_{i-1}) \cdot c_{i-1}$
If we define $G_{i:i-1} \equiv G_i + P_i \cdot G_{i-1}$ and $P_{i:i-1} \equiv P_i \cdot P_{i-1}$, we get the
relation: $c_{i+1} = G_{i:i-1} + P_{i:i-1} \cdot c_{i-1}$
This is the same relation as the one used for single bit carry generation,
but permits us to compute $c_{i+1}$ directly from $c_{i-1}$.
Thus $G_{i:i-1}$ and $P_{i:i-1}$ are effectively the Generate and Propagate values
for a block of 2 bits (i and i − 1).
Like $G_i$ and $P_i$, $G_{i:i-1}$ and $P_{i:i-1}$ are independent of carry and can be
computed in constant time from A and B in parallel.
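
The substitution can be expressed as a small combining function and checked against two single bit steps. This is a sketch; `combine` and `carry_out` are names chosen here, and each block is represented as a (G, P) tuple.

```python
from itertools import product


def combine(upper, lower):
    """Merge (G, P) of an upper block with (G, P) of the adjacent lower block."""
    g_u, p_u = upper
    g_l, p_l = lower
    return g_u | (p_u & g_l), p_u & p_l


def carry_out(gp, cin):
    """Carry leaving a block with the given (G, P) when cin enters it."""
    g, p = gp
    return g | (p & cin)


def check_two_bit_blocks():
    for a1, b1, a0, b0, c in product((0, 1), repeat=5):
        gp_upper = (a1 & b1, a1 ^ b1)      # bit i
        gp_lower = (a0 & b0, a0 ^ b0)      # bit i - 1
        one_step = carry_out(combine(gp_upper, gp_lower), c)
        two_steps = carry_out(gp_upper, carry_out(gp_lower, c))
        assert one_step == two_steps
    return True


print(check_two_bit_blocks())
```

Because `combine` returns a value of the same shape as its arguments, it can be applied again to the results.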


Higher order P and G


Just as we combined single bit G and P values to get new G and P values
which operate over two bits, we can combine these 2 bit G and P values
to get G and P values which operate over 4 bits and so on.
In general, if we take two contiguous ranges u and l each of size $2^n$, we
can write for the combined range (u : l) of size $2^{n+1}$, the recursive
relation:
$G_{u:l} = G_u + P_u \cdot G_l$ and $P_{u:l} = P_u \cdot P_l$
This suggests a tree structure for computation of successive G and P
values which operate over bigger and bigger ranges of bits.
To distinguish G and P values operating over ranges of different sizes,
we’ll use a superscript which gives the “order” of computation of these.
Thus single bit G and P values will carry a superscript of 0, 2 bit values
will use a superscript of 1 and so on. Eventually, G and P values covering
a range of $2^m$ bits will carry a superscript of m.
As before, G and P values will carry a subscript which gives the range of
bit indices over which these operate.

Higher order P and G

Once the highest order P and G values have been generated, the final
carry can be computed in one step from the input carry.
The final result contains all the sum bits and the final carry. So it may
appear that we do not need the intermediate carries at each bit.
However, the sum bits depend on internal carries. The sum bits are given
by:
$S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i$
Thus we do need the internal bit-wise carries for sum generation.
The group size over which the carry can be computed directly multiplies
by two each time we use a higher order for G and P values.
On the other hand, the time to compute the required higher order G and
P values increments by one gate delay.
(time to compute A + B · C for G and A · B for P).
This results in the total time to generate all the P and G values
being logarithmic in the number of bits being added.


Logarithmic Adders

Using P and G values of different orders, we can compute the bit wise
carry and sum values.
Notice that in logarithmic adders, internal bit-wise sum and carry values
may be available after the final carry.
Thus the critical path is not the generation of the final carry, but that of
bit-wise sums.
Different architectures have been described in literature for the order of
computation of G, P, Cout and Sum bits.
All of these compute the final result in times which are logarithmic
functions of the number of bits.
For wide adders, these can be much faster than other architectures.



Tree Adders Brent Kung adder

Brent Kung adder

The Brent Kung tree adder is a logarithmic adder of low complexity.


Generate and Pass signals are successively computed over groups of 1
bit, 2 bits, 4 bits, . . . in a tree structure.
Since the number of bits covered in every step doubles, the total time
taken for this is a logarithmic function of the number of bits.
Values of multiple orders of G and P so computed are then used to
compute the internal carry values at each internal bit, from which sum
values for every bit are derived.
This step is called a back trace and also takes logarithmic time.


Brent Kung adder

The figure below shows the generation of P and G values for an 8 bit
adder.
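[Figure: tree for generating P and G values from the inputs a7 b7, a6 b6, ..., a0 b0.]
Order 0: $P_i^0 \, G_i^0$ for i = 7, 6, ..., 0
Order 1: $P_{7:6}^1 \, G_{7:6}^1$, $P_{5:4}^1 \, G_{5:4}^1$, $P_{3:2}^1 \, G_{3:2}^1$, $P_{1:0}^1 \, G_{1:0}^1$
Order 2: $P_{7:4}^2 \, G_{7:4}^2$, $P_{3:0}^2 \, G_{3:0}^2$
Order 3: $P_{7:0}^3 \, G_{7:0}^3$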


Brent Kung adder


We first calculate $P_i^1$, $G_i^1$, with $i = 0 \cdots 7$ (in these equations the
superscript is numbered from 1 for single bit values, one higher than the
order shown in the figure):
$G_i^1 = A_i \cdot B_i$, $P_i^1 = A_i \oplus B_i$
Next, using these values, we can generate $P_{2i+1,2i}^2$, $G_{2i+1,2i}^2$
with $i = 0 \cdots 3$.
$G_{2i+1,2i}^2 = G_{2i+1}^1 + P_{2i+1}^1 \cdot G_{2i}^1$, $P_{2i+1,2i}^2 = P_{2i+1}^1 \cdot P_{2i}^1$

Brent Kung adder


In the next step, we use second order P, G values to generate $P_{4i+3,4i}^3$, $G_{4i+3,4i}^3$
with i = 0, 1.
$G_{7,4}^3 = G_{7,6}^2 + P_{7,6}^2 \cdot G_{5,4}^2$, $P_{7,4}^3 = P_{7,6}^2 \cdot P_{5,4}^2$
$G_{3,0}^3 = G_{3,2}^2 + P_{3,2}^2 \cdot G_{1,0}^2$, $P_{3,0}^3 = P_{3,2}^2 \cdot P_{1,0}^2$


Brent Kung adder


Finally, using $G_{4i+3,4i}^3$ and $P_{4i+3,4i}^3$ (with i = 0, 1) we can compute $P_{7,0}^4$, $G_{7,0}^4$.

$G_{7,0}^4 = G_{7,4}^3 + P_{7,4}^3 \cdot G_{3,0}^3$
$P_{7,0}^4 = P_{7,4}^3 \cdot P_{3,0}^3$


Brent Kung adder

Once P and G terms of various orders are known, we can compute the values
of carry outputs which depend on these and the input carry C0 , which is
available at t = 0.

$C_1 = G_0^1 + P_0^1 \cdot C_0$, $C_2 = G_{1,0}^2 + P_{1,0}^2 \cdot C_0$
$C_4 = G_{3,0}^3 + P_{3,0}^3 \cdot C_0$, $C_8 = G_{7,0}^4 + P_{7,0}^4 \cdot C_0$
When these carry values are valid, the other carry values which depend on
these can be generated.


Brent Kung adder

Once $C_1$, $C_2$, $C_4$ and $C_8$ have been generated, we can produce internal
carries which depend on these.

$C_3 = G_2^1 + P_2^1 \cdot C_2$, $C_5 = G_4^1 + P_4^1 \cdot C_4$
$C_6 = G_{5,4}^2 + P_{5,4}^2 \cdot C_4$

Finally, $C_7$ can be generated from $C_6$.

$C_7 = G_6^1 + P_6^1 \cdot C_6$

With all carry values generated, the corresponding sum values can be
calculated using the relation $Sum_i = P_i^1 \oplus C_i$.
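
The whole 8 bit procedure can be traced in a behavioural sketch. It stores each (G, P) pair in a dictionary keyed by the bit range (hi, lo) instead of using superscripts; that representation and the function name are assumptions made for illustration.

```python
def brent_kung_add8(a, b, c0=0):
    """Add two 8 bit integers following the order of computation described above."""
    A = [(a >> i) & 1 for i in range(8)]
    B = [(b >> i) & 1 for i in range(8)]
    # Single bit values: G = A.B, P = A xor B, stored under the range (i, i).
    gp = {(i, i): (A[i] & B[i], A[i] ^ B[i]) for i in range(8)}

    def combine(hi, mid, lo):
        """Build range (hi, lo) from upper (hi, mid) and lower (mid - 1, lo)."""
        g_u, p_u = gp[(hi, mid)]
        g_l, p_l = gp[(mid - 1, lo)]
        gp[(hi, lo)] = (g_u | (p_u & g_l), p_u & p_l)

    for i in range(0, 8, 2):          # ranges of 2 bits
        combine(i + 1, i + 1, i)
    for i in range(0, 8, 4):          # ranges of 4 bits
        combine(i + 3, i + 2, i)
    combine(7, 4, 0)                  # range of 8 bits

    def carry(hi, lo, cin):
        g, p = gp[(hi, lo)]
        return g | (p & cin)

    c = [0] * 9
    c[0] = c0
    # Carries that depend only on the input carry.
    c[1] = carry(0, 0, c[0])
    c[2] = carry(1, 0, c[0])
    c[4] = carry(3, 0, c[0])
    c[8] = carry(7, 0, c[0])
    # Back trace: internal carries derived from the ones above.
    c[3] = carry(2, 2, c[2])
    c[5] = carry(4, 4, c[4])
    c[6] = carry(5, 4, c[4])
    c[7] = carry(6, 6, c[6])

    total = sum((gp[(i, i)][1] ^ c[i]) << i for i in range(8))
    return total, c[8]


print(brent_kung_add8(200, 100))  # 300 = 256 + 44, so sum bits 44 with carry out 1
```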



Tree Adders Tutorial: 32 bit Brent Kung Logarithmic Adder

Logarithmic Adders with a tree architecture


We illustrate the operation of a 32 bit Brent Kung adder with a numerical
example.
Recall that if we represent indices for the upper half of a range by u and the
lower half by l, we can write:

$G_{(u:l)} = G_u + P_u \cdot G_l$, whereas $C_{next} = G_{(u)} + P_{(u)} \cdot C_{prev}$

Notice that G values are computed by the same logic relation as carry
outputs.
The input carry C0 is known at the start itself.
Whenever the carry is already known, we can replace Gl by this carry.
The computed value of G(u:l) will then be the carry output, rather than the
G value. This value can be used for further G calculations and will directly
give the carry each time.
This can reduce the computation required to generate the carry and sum
values since some of the carry values are already available.


32 bit Brent Kung adder: Order 0

We use a unit time model in which we assume that logic functions AND,
XOR, A + B.C as well as A.B + C.(A+B) take the same amount of time,
which defines 1 slot of time for this tutorial.
The single bit G and P values (designated as order 0) are given by

$P_i^0 = a_i \oplus b_i$, $G_i^0 = a_i \cdot b_i$, except $G_0^0 = a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$

An exception is made for the least significant bit of G because for this bit,
the input carry is known at the start.
We make use of this and compute effectively the carry output from bit 0
($c_1$) and map the output carry as if it was due to a generate signal at this
position. Thus,
$G_0^0 = c_1 = a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$
All these functions can be computed in one unit of time directly from $a_i$, $b_i$
and input carry $c_0$. So these are all ready at the end of the first time slot.
Since $c_1 = G_0^0$, $c_1$ is also ready at the end of the first slot.


Brent Kung adder: higher orders

We can define G and P functions which operate over multiple bits. Higher
order G and P values are computed as

G = Gu + Pu · Gl , P = Pu · Pl

where u and l stand for upper half range and lower half range for a range
of bit indices.
These can be computed within one time slot from the next lower order G
and P values. Thus higher orders of G and P values, (successively
covering twice the range of indices for the previous order) will be
available in each time slot.
Internal carries are computed using functions like C = G + P · Cin .
Depending on the order of G and P values, we can compute carry values
whose indices are 1, 2, 4, 8 . . . bits higher than the input carry. This
computation also takes one time slot, but can be performed only after the
needed Cin , P and G values are available.


32 bit Brent Kung adder

G and P values for single bits are available at the end of first slot.
G and P values spanning groups of 2 bits are available at the end of
second slot. G and P values spanning groups of 4 bits are available at
the end of third slot. G and P values spanning groups of 8 bits are
available at the end of fourth slot. G and P values spanning groups of 16
bits are available at the end of fifth slot.
Finally, G and P values spanning the full word of 32 bits are available at
the end of sixth slot.
G and P values are available over spans of $2^n$ bits. The start bit for these
spans has a granularity of $2^n$ bits. For example, second order values
connect 0 → 4, 4 → 8 etc. We cannot connect using these from 1 → 5 in
a Brent Kung adder.
The lowest index G value for any order i is automatically the carry value
for bit index $2^i$.


32 bit Brent Kung adder

at time = 0, all $a_i$, $b_i$ and $c_0$ are available.
at time = 1, all $P_i^0$ and $G_i^0$ are available. $c_1 = G_0^0$ is also available.
at time = 2, all 2 bit P and G values ($P^1$ and $G^1$) are available. $c_2 = G_{(1:0)}^1$
has been computed.
at time = 3, all 4 bit P and G values ($P^2$ and $G^2$) are available. $c_4 = G_{(3:0)}^2$,
$c_3 \leftarrow c_2$ using $G_2^0$, $P_2^0$ and $c_2$ have also been computed.
at time = 4, all 8 bit P and G values ($P^3$ and $G^3$) are available.
$c_8 = G_{(7:0)}^3$ is also available.
$c_5 \leftarrow c_4$ using $G_4^0$, $P_4^0$ and $c_4$; as well as $c_6 \leftarrow c_4$ using $G_{(5:4)}^1$, $P_{(5:4)}^1$ and
$c_4$ have been computed.


32 bit Brent Kung adder

at time = 5, all 16 bit P and G values ($P^4$ and $G^4$) have been computed.
$c_{16} = G_{(15:0)}^4$ is also available.
$c_7 \leftarrow c_6$ using $G_6^0$, $P_6^0$ and $c_6$; $c_9 \leftarrow c_8$ using $G_8^0$, $P_8^0$ and $c_8$;
$c_{10} \leftarrow c_8$ using $G_{(9:8)}^1$, $P_{(9:8)}^1$ and $c_8$;
$c_{12} \leftarrow c_8$ using $G_{(11:8)}^2$, $P_{(11:8)}^2$ and $c_8$ are all available.
at time = 6, $G_{(31:0)}^5$ is generated. This is the value of $c_{32} = C_{out}$.
$P_{(31:0)}^5$ is not required.
$c_{11} \leftarrow c_{10}$ using $G_{10}^0$, $P_{10}^0$ and $c_{10}$; $c_{13} \leftarrow c_{12}$ using $G_{12}^0$, $P_{12}^0$ and $c_{12}$;
$c_{14} \leftarrow c_{12}$ using $G_{(13:12)}^1$, $P_{(13:12)}^1$ and $c_{12}$;
$c_{17} \leftarrow c_{16}$ using $G_{16}^0$, $P_{16}^0$ and $c_{16}$;
$c_{18} \leftarrow c_{16}$ using $G_{(17:16)}^1$, $P_{(17:16)}^1$ and $c_{16}$;
$c_{20} \leftarrow c_{16}$ using $G_{(19:16)}^2$, $P_{(19:16)}^2$ and $c_{16}$; and
$c_{24} \leftarrow c_{16}$ using $G_{(23:16)}^3$, $P_{(23:16)}^3$ and $c_{16}$ have all been computed.


32 bit Brent Kung adder


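At time = 7, all G and P values for groups of 1, 2, 4, 8 and 16 bits are already available. We have computed:
$c_{15} \leftarrow c_{14}$ using $G^0_{14}$, $P^0_{14}$ and $c_{14}$.
$c_{19} \leftarrow c_{18}$ using $G^0_{18}$, $P^0_{18}$ and $c_{18}$.
$c_{21} \leftarrow c_{20}$ using $G^0_{20}$, $P^0_{20}$ and $c_{20}$.
$c_{22} \leftarrow c_{20}$ using $G^1_{21:20}$, $P^1_{21:20}$ and $c_{20}$.
$c_{25} \leftarrow c_{24}$ using $G^0_{24}$, $P^0_{24}$ and $c_{24}$.
$c_{26} \leftarrow c_{24}$ using $G^1_{25:24}$, $P^1_{25:24}$ and $c_{24}$.
$c_{28} \leftarrow c_{24}$ using $G^2_{27:24}$, $P^2_{27:24}$ and $c_{24}$.
At time = 8, we have computed:
$c_{23} \leftarrow c_{22}$ using $G^0_{22}$, $P^0_{22}$ and $c_{22}$.
$c_{27} \leftarrow c_{26}$ using $G^0_{26}$, $P^0_{26}$ and $c_{26}$.
$c_{29} \leftarrow c_{28}$ using $G^0_{28}$, $P^0_{28}$ and $c_{28}$.
$c_{30} \leftarrow c_{28}$ using $G^1_{29:28}$, $P^1_{29:28}$ and $c_{28}$.
At time = 9, we have computed:
$c_{31} \leftarrow c_{30}$ using $G^0_{30}$, $P^0_{30}$ and $c_{30}$.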
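The schedule listed slot by slot above follows one rule, which can be sketched in Python. The function name `carry_ready_slots` is hypothetical. The rule it encodes is an interpretation of the listing rather than something the tutorial states in this form: $c_k$ is derived from $c_{k - 2^j}$ and the order-$j$ group just above it, where $2^j$ is the largest power of two dividing $k$.

```python
def carry_ready_slots(width=32):
    """Time slot at which each carry c_k becomes available (c_0 at slot 0)."""
    slot = {0: 0}
    for k in range(1, width + 1):
        # k is a multiple of 2**j but not of 2**(j + 1).
        j = (k & -k).bit_length() - 1
        if k == 1 << j:
            # c_k is the lowest-index G of order j, ready together with that order.
            slot[k] = j + 1
        else:
            # One more slot after both c_(k - 2**j) and the order-j G/P are ready.
            slot[k] = max(slot[k - (1 << j)], j + 1) + 1
    return slot


slots = carry_ready_slots()
for t in range(1, 10):
    print(t, [k for k in sorted(slots) if slots[k] == t])
```

Under this rule the carry out $c_{32}$ appears in slot 6, while the last internal carry, $c_{31}$, appears in slot 9, matching the listing.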


32 bit Brent Kung adder

We can show the sequence of generation of carry values by the following diagram:

[Figure: carry generation schedule. Horizontal axis: carry input to bit number, from 00 (Cin) to 32 (Cout). Vertical axis: time slot, 0 to 9, with $G^0, P^0$ marked at slot 1, $G^1, P^1$ at slot 2, $G^2, P^2$ at slot 3, $G^3, P^3$ at slot 4, $G^4, P^4$ at slot 5 and $G^5$ at slot 6.]


32 bit Brent Kung adder: Numerical Example

Taking the example of adding B7A56893H to 506A980CH with an input carry of ‘1’, let us list the P, G, carry and sum bits generated in each time slot.
In the first slot, we generate the single bit P and G values.

a     1011 0111 1010 0101 0110 1000 1001 0011
b     0101 0000 0110 1010 1001 1000 0000 1100
P^0   1110 0111 1100 1111 1111 0000 1001 1111
G^0   0001 0000 0010 0000 0000 1000 0000 0001†

$P^0_i = a_i \oplus b_i$, $G^0_i = a_i \cdot b_i$
† $G^0_0$ is generated as $a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$
$c_1 = G^0_0 = 1$
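The first-slot rows can be reproduced with word-wide bit operations. The Python sketch below is illustrative only: `first_slot` and `nibbles` are hypothetical helper names, and the input carry is folded into bit 0 of G exactly as the footnote (†) describes.

```python
def first_slot(a, b, cin):
    """Single-bit propagate and generate words for operands a and b."""
    p = a ^ b                                   # P_i = a_i XOR b_i
    g = a & b                                   # G_i = a_i AND b_i
    # Footnote rule: G of bit 0 also absorbs the input carry.
    g0 = (a & b & 1) | (cin & ((a | b) & 1))
    g = (g & ~1) | g0
    return p, g


def nibbles(x, width=32):
    """Format x as binary, most significant bit first, in groups of 4 bits."""
    s = format(x, "0{}b".format(width))
    return " ".join(s[i:i + 4] for i in range(0, width, 4))


p, g = first_slot(0xB7A56893, 0x506A980C, 1)
print("P^0", nibbles(p))
print("G^0", nibbles(g))
```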


32 bit Brent Kung adder: Numerical Example

In the second slot, we generate P and G values spanning two bits each.
From now on,
$P^{m+1}_{\mathrm{range}} = P^m_u \cdot P^m_l$, $G^{m+1}_{\mathrm{range}} = G^m_u + P^m_u \cdot G^m_l$,
where u represents the upper half range and l represents the lower half range.

P^0   1110 0111 1100 1111 1111 0000 1001 1111
G^0   0001 0000 0010 0000 0000 1000 0000 0001
P^1   10   01   10   11   11   00   00   11
G^1   01   00   01   00   00   10   00   01

$c_2 = G^1_{1:0} = 1$
$s_0 = P^0_0 \oplus c_0 = 1 \oplus 1 = 0$, $s_1 = P^0_1 \oplus c_1 = 1 \oplus 1 = 0$.
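The merge rule for P and G can be applied directly to the rows printed on this slide. The Python sketch below is an illustration with hypothetical helper names (`parse_row`, `merge_level`, `show`). It takes the $P^0$ and $G^0$ rows as text, so the footnoted $G^0_0$ already includes the input carry.

```python
def parse_row(text):
    """Slide row (MSB first, spaces ignored) -> list of bits with index 0 = LSB."""
    return [int(ch) for ch in text.replace(" ", "")][::-1]


def merge_level(p, g):
    """Combine aligned neighbours: upper half is entry 2k+1, lower half is 2k."""
    half = len(p) // 2
    p_next = [p[2 * k + 1] & p[2 * k] for k in range(half)]
    g_next = [g[2 * k + 1] | (p[2 * k + 1] & g[2 * k]) for k in range(half)]
    return p_next, g_next


def show(bits):
    """List of bits (index 0 = LSB) -> string with the most significant first."""
    return "".join(str(x) for x in reversed(bits))


p = parse_row("1110 0111 1100 1111 1111 0000 1001 1111")
g = parse_row("0001 0000 0010 0000 0000 1000 0000 0001")
order = 0
while len(p) > 1:
    p, g = merge_level(p, g)
    order += 1
    print("order", order, "P", show(p), "G", show(g))
```

Each pass halves the number of entries, so five passes take the 32 single-bit values down to the one order-5 pair.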


32 bit Brent Kung adder: Numerical Example

In the third slot, we calculate P and G values spanning 4 bits each.

P^1   10 01 10 11 11 00 00 11
G^1   01 00 01 00 00 10 00 01
P^2   0  0  0  1  1  0  0  1
G^2   1  0  1  0  0  1  0  1

$c_4 = G^2_{3:0} = 1$. We can also compute
$c_3 = G^0_2 + P^0_2 \cdot c_2 = 0 + 1 \cdot 1 = 1$,
$s_2 = P^0_2 \oplus c_2 = 1 \oplus 1 = 0$.


32 bit Brent Kung adder: Numerical Example

In the fourth slot, we calculate P and G values spanning 8 bits each.

P^2   0 0 0 1 1 0 0 1
G^2   1 0 1 0 0 1 0 1
P^3   0   0   0   0
G^3   1   1   1   0

$c_8 = G^3_{7:0} = 0$. We can also compute
$c_5 = G^0_4 + P^0_4 \cdot c_4 = 0 + 1 \cdot 1 = 1$, $c_6 = G^1_{5:4} + P^1_{5:4} \cdot c_4 = 0 + 0 \cdot 1 = 0$.
$s_3 = P^0_3 \oplus c_3 = 1 \oplus 1 = 0$, $s_4 = P^0_4 \oplus c_4 = 1 \oplus 1 = 0$.


32 bit Brent Kung adder: Numerical Example

In the fifth slot, we calculate P and G values spanning 16 bits each.

P^3   0 0 0 0
G^3   1 1 1 0
P^4   0   0
G^4   1   1

$c_{16} = G^4_{15:0} = 1$. We can also compute
$c_7 = G^0_6 + P^0_6 \cdot c_6 = 0 + 0 \cdot 0 = 0$, $c_9 = G^0_8 + P^0_8 \cdot c_8 = 0 + 0 \cdot 0 = 0$,
$c_{10} = G^1_{9:8} + P^1_{9:8} \cdot c_8 = 0 + 0 \cdot 0 = 0$,
$c_{12} = G^2_{11:8} + P^2_{11:8} \cdot c_8 = 1 + 0 \cdot 0 = 1$.
$s_5 = P^0_5 \oplus c_5 = 0 \oplus 1 = 1$, $s_6 = P^0_6 \oplus c_6 = 0 \oplus 0 = 0$,
$s_8 = P^0_8 \oplus c_8 = 0 \oplus 0 = 0$.


32 bit Brent Kung adder: Numerical Example

In the sixth slot, we compute $G^5_{31:0} = G^4_{31:16} + P^4_{31:16} \cdot G^4_{15:0}$.
$P^5_{31:0}$ is not required.
This gives $C_{out} = c_{32} = G^5_{31:0} = 1$. We can further compute:
$c_{11} = G^0_{10} + P^0_{10} \cdot c_{10} = 0 + 0 \cdot 0 = 0$,
$c_{13} = G^0_{12} + P^0_{12} \cdot c_{12} = 0 + 1 \cdot 1 = 1$,
$c_{14} = G^1_{13:12} + P^1_{13:12} \cdot c_{12} = 0 + 1 \cdot 1 = 1$,
$c_{17} = G^0_{16} + P^0_{16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{18} = G^1_{17:16} + P^1_{17:16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{20} = G^2_{19:16} + P^2_{19:16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{24} = G^3_{23:16} + P^3_{23:16} \cdot c_{16} = 1 + 0 \cdot 1 = 1$.
$s_7 = P^0_7 \oplus c_7 = 1 \oplus 0 = 1$, $s_9 = P^0_9 \oplus c_9 = 0 \oplus 0 = 0$,
$s_{10} = P^0_{10} \oplus c_{10} = 0 \oplus 0 = 0$, $s_{12} = P^0_{12} \oplus c_{12} = 1 \oplus 1 = 0$,
$s_{16} = P^0_{16} \oplus c_{16} = 1 \oplus 1 = 0$.


32 bit Brent Kung adder: Numerical Example

In the seventh slot, all the required values of P and G are already available.
We can compute:
$c_{15} = G^0_{14} + P^0_{14} \cdot c_{14} = 0 + 1 \cdot 1 = 1$,
$c_{19} = G^0_{18} + P^0_{18} \cdot c_{18} = 0 + 1 \cdot 1 = 1$,
$c_{21} = G^0_{20} + P^0_{20} \cdot c_{20} = 0 + 0 \cdot 1 = 0$,
$c_{22} = G^1_{21:20} + P^1_{21:20} \cdot c_{20} = 1 + 0 \cdot 1 = 1$,
$c_{25} = G^0_{24} + P^0_{24} \cdot c_{24} = 0 + 1 \cdot 1 = 1$,
$c_{26} = G^1_{25:24} + P^1_{25:24} \cdot c_{24} = 0 + 1 \cdot 1 = 1$,
$c_{28} = G^2_{27:24} + P^2_{27:24} \cdot c_{24} = 0 + 0 \cdot 1 = 0$.
$s_{11} = P^0_{11} \oplus c_{11} = 0 \oplus 0 = 0$, $s_{13} = P^0_{13} \oplus c_{13} = 1 \oplus 1 = 0$,
$s_{14} = P^0_{14} \oplus c_{14} = 1 \oplus 1 = 0$, $s_{17} = P^0_{17} \oplus c_{17} = 1 \oplus 1 = 0$,
$s_{18} = P^0_{18} \oplus c_{18} = 1 \oplus 1 = 0$, $s_{20} = P^0_{20} \oplus c_{20} = 0 \oplus 1 = 1$,
$s_{24} = P^0_{24} \oplus c_{24} = 1 \oplus 1 = 0$.


32 bit Brent Kung adder: Numerical Example

In the eighth slot, we can compute:
$c_{23} = G^0_{22} + P^0_{22} \cdot c_{22} = 0 + 1 \cdot 1 = 1$,
$c_{27} = G^0_{26} + P^0_{26} \cdot c_{26} = 0 + 1 \cdot 1 = 1$,
$c_{29} = G^0_{28} + P^0_{28} \cdot c_{28} = 1 + 0 \cdot 0 = 1$,
$c_{30} = G^1_{29:28} + P^1_{29:28} \cdot c_{28} = 1 + 0 \cdot 0 = 1$.
Sums corresponding to carries computed in the previous slot can also be evaluated as:
$s_{15} = P^0_{15} \oplus c_{15} = 1 \oplus 1 = 0$,
$s_{19} = P^0_{19} \oplus c_{19} = 1 \oplus 1 = 0$,
$s_{21} = P^0_{21} \oplus c_{21} = 0 \oplus 0 = 0$,
$s_{22} = P^0_{22} \oplus c_{22} = 1 \oplus 1 = 0$,
$s_{25} = P^0_{25} \oplus c_{25} = 1 \oplus 1 = 0$,
$s_{26} = P^0_{26} \oplus c_{26} = 1 \oplus 1 = 0$,
$s_{28} = P^0_{28} \oplus c_{28} = 0 \oplus 0 = 0$.


32 bit Brent Kung adder: Numerical Example

In the ninth slot, we can compute $c_{31} = G^0_{30} + P^0_{30} \cdot c_{30} = 0 + 1 \cdot 1 = 1$,
and the sum values
$s_{23} = P^0_{23} \oplus c_{23} = 1 \oplus 1 = 0$,
$s_{27} = P^0_{27} \oplus c_{27} = 0 \oplus 1 = 1$,
$s_{29} = P^0_{29} \oplus c_{29} = 1 \oplus 1 = 0$,
$s_{30} = P^0_{30} \oplus c_{30} = 1 \oplus 1 = 0$.
Finally, in the tenth slot, we can evaluate $s_{31}$ as $s_{31} = P^0_{31} \oplus c_{31} = 1 \oplus 1 = 0$.
Thus we have

Cin   1110 1111 1101 1111 1111 0000 0011 1111
a     1011 0111 1010 0101 0110 1000 1001 0011
b     0101 0000 0110 1010 1001 1000 0000 1100
sum   0000 1000 0001 0000 0000 0000 1010 0000

Final carry out is 1.
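The whole worked example can be cross-checked with a short Python model. This is a sketch under assumptions, not the tutorial's implementation: `brent_kung_add` is a hypothetical name, the width must be a power of two, and $c_0$ is kept as an explicit carry instead of being folded into $G^0_0$, which yields the same carries.

```python
def brent_kung_add(a, b, cin, width=32):
    """Return (sum, carries); carries[k] is the carry into bit k, carries[width] is Cout."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    # Group values keyed by (order, lowest bit of the span).
    P = {(0, i): p[i] for i in range(width)}
    G = {(0, i): g[i] for i in range(width)}
    order, size = 0, 1
    while size < width:
        for lo in range(0, width, 2 * size):
            hi = lo + size                       # start of the upper half
            P[order + 1, lo] = P[order, hi] & P[order, lo]
            G[order + 1, lo] = G[order, hi] | (P[order, hi] & G[order, lo])
        order, size = order + 1, 2 * size
    carries = [cin] + [0] * width
    for k in range(1, width + 1):
        j = (k & -k).bit_length() - 1            # largest aligned span ending below bit k
        lo = k - (1 << j)
        carries[k] = G[j, lo] | (P[j, lo] & carries[lo])
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, carries


total, carries = brent_kung_add(0xB7A56893, 0x506A980C, 1)
carry_word = sum(carries[i] << i for i in range(32))
print(format(total, "08X"), format(carry_word, "08X"), carries[32])
```

The carry word corresponds to the Cin row of the summary table, read with bit 31 on the left.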


Serial Adders

Up to now, we have been concerned with making fast adders, even at the cost
of increased complexity and power.
In many applications, speed is not as important as low power consumption
and low cost.
Serial adders are an attractive option in such cases.
A single full adder is used.
If numbers to be added are available in parallel form, these can be serialized
using shift registers.


Serial Adders

A single full adder adds the incoming bits. Bits to be added are fed to it
serially, LSB first.
The sum bit goes to the output while carry is stored in a flip-flop.
Carry then gets added to the more significant bits which arrive next.
Output can be converted to parallel form if needed, using another shift
register.
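[Figure: serial adder block diagram. Labelled blocks: shift registers for the A and B operands, full adder (Sum, Cout), carry latch (D, Q, Cprev), carry mux (Cy Mux, with Csel, Load and Cin inputs), and output shift register.]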
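The bit-serial operation described above can be modelled in Python with one loop iteration per clock cycle. This is a behavioural sketch only: `serial_add` is a hypothetical name, the shift registers are modelled by indexing bits of the operands, and it is assumed that the carry flip-flop is loaded with Cin before the first bit arrives.

```python
def serial_add(a, b, cin, width):
    """Add two width-bit numbers one bit per cycle, LSB first."""
    carry = cin                       # state of the carry flip-flop
    out = 0                           # output shift register contents
    for t in range(width):
        abit = (a >> t) & 1           # bit shifted out of the A register
        bbit = (b >> t) & 1           # bit shifted out of the B register
        s = abit ^ bbit ^ carry       # full adder sum
        carry = (abit & bbit) | (carry & (abit | bbit))   # stored for the next cycle
        out |= s << t
    return out, carry


print(serial_add(0xB7A56893, 0x506A980C, 1, 32))
```

The model takes `width` cycles for one addition, which is the price paid for using a single full adder.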
