05 Adders

Adders
Dinesh Sharma
EE Department
IIT Bombay, Mumbai
October 16, 2022

1 Half and Full Adders
2 Ripple Carry adder
3 Carry Look Ahead
    Manchester Carry Chain
4 Carry Bypass Adder
5 Carry Select Adder
    Stacking Carry Select Adders
6 Tree Adders
    Brent Kung adder
    Tutorial: 32 bit Brent Kung Logarithmic Adder
7 Serial Adders

Half and Full Adders
Half Adder
The truth table for addition of two bits is:

A  B  Sum  Carry
0  0   0     0
0  1   1     0
1  0   1     0
1  1   0     1

$\text{sum} = A \cdot \overline{B} + \overline{A} \cdot B$
$\text{carry} = A \cdot B$
What do we do with the carry?

Obviously, it must be added to more significant bits.
So we need an adder with three inputs.

Full Adder
The truth table for the addition of three bits is:

A  B  Cin  Sum  Cout
0  0   0    0    0
0  1   0    1    0
1  0   0    1    0
1  1   0    0    1
0  0   1    1    0
0  1   1    0    1
1  0   1    0    1
1  1   1    1    1

Which leads to the following Karnaugh maps:

SUM         AB
Cin    00  01  11  10
 0      0   1   0   1
 1      1   0   1   0

CARRY       AB
Cin    00  01  11  10
 0      0   0   1   0
 1      0   1   1   1

$\text{sum} = \overline{A} \cdot \overline{B} \cdot C_{in} + \overline{A} \cdot B \cdot \overline{C_{in}} + A \cdot \overline{B} \cdot \overline{C_{in}} + A \cdot B \cdot C_{in}$

$C_{out} = A \cdot B + B \cdot C_{in} + C_{in} \cdot A = A \cdot B + C_{in} \cdot (A + B)$
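The two expressions above can be checked against ordinary integer addition. The following is a minimal sketch; the function names `half_adder`, `full_adder` and `full_adder_table` are illustrative and not part of the slides.

```python
from itertools import product

def half_adder(a, b):
    """Add two bits; returns (sum, carry)."""
    not_a, not_b = 1 - a, 1 - b
    return (a & not_b) | (not_a & b), a & b

def full_adder(a, b, cin):
    """Add three bits using the sum-of-products forms; returns (sum, cout)."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    s = (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin)
    cout = (a & b) | (cin & (a | b))
    return s, cout

def full_adder_table():
    """Rows of (A, B, Cin, Sum, Cout) for all eight input combinations."""
    return [(a, b, c) + full_adder(a, b, c)
            for a, b, c in product((0, 1), repeat=3)]
```

Every row satisfies Sum + 2·Cout = A + B + Cin, which is what the truth table encodes.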

Ripple Carry adder
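[Figure: three full adders for bits 0, 1 and 2, with inputs A0 B0, A1 B1, A2 B2 and sum outputs S0, S1, S2; the Cout of each bit is wired to the Cin of the next more significant bit.]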
Carry out of one bit becomes Carry in of the next.

This architecture is therefore called ripple carry adder.
The critical delay path of the adder is the carry rippling from one bit to the
next.
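A behavioural sketch of the ripple carry structure follows. It assumes bit lists stored LSB first; the helper names `to_bits` and `from_bits` are illustrative only.

```python
def full_adder(a, b, cin):
    """Single-bit full adder; returns (sum, cout)."""
    return a ^ b ^ cin, (a & b) | (cin & (a | b))

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two LSB-first bit lists; returns (sum_bits, cout)."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # The carry out of this bit becomes the carry in of the next.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

def to_bits(x, n):
    """Integer to LSB-first list of n bits."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """LSB-first bit list back to an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

The loop body runs once per bit and each iteration needs the previous carry, which is the software analogue of the linear critical path.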

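Sum derived from carry
Because carry is on the critical path, Carry-out must be generated as quickly as possible.
We need not optimize the delay of generating sum.
We can in fact generate sum from Carry out.
$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
$\overline{C_{out}} = (\overline{A} + \overline{B}) \cdot (\overline{C_{in}} + \overline{A} \cdot \overline{B})$
$\quad = \overline{A} \cdot \overline{C_{in}} + \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B}$
$\overline{C_{out}} \cdot (A + B + C_{in}) = A \cdot \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot B \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B} \cdot C_{in}$
$\text{sum} = \overline{C_{out}} \cdot (A + B + C_{in}) + A \cdot B \cdot C_{in}$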
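The identity for sum in terms of the complemented carry can be confirmed by enumeration. A minimal sketch, with illustrative function names:

```python
from itertools import product

def carry_out(a, b, cin):
    """Cout = A.B + Cin.(A + B)."""
    return (a & b) | (cin & (a | b))

def sum_from_carry(a, b, cin):
    """Sum = NOT(Cout).(A + B + Cin) + A.B.Cin."""
    not_cout = 1 - carry_out(a, b, cin)
    return (not_cout & (a | b | cin)) | (a & b & cin)

def matches_xor_everywhere():
    """True if the derived sum equals A xor B xor Cin for all inputs."""
    return all(sum_from_carry(a, b, c) == a ^ b ^ c
               for a, b, c in product((0, 1), repeat=3))
```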

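CMOS Implementation
[Figure: CMOS implementation of the full adder; transistor-level schematic between VDD and Gnd with inputs A, B, Cin and outputs Cout and Sum.]
$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
$\text{Sum} = \overline{C_{out}} \cdot (A + B + C_{in}) + A \cdot B \cdot C_{in}$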

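Complementation Property
Both Sum and Carry show an interesting symmetry:

$\overline{\text{sum}} = (A + B + \overline{C_{in}}) \cdot (A + \overline{B} + C_{in}) \cdot (\overline{A} + B + C_{in}) \cdot (\overline{A} + \overline{B} + \overline{C_{in}})$
$\quad = (A + A \cdot \overline{B} + A \cdot C_{in} + A \cdot B + B \cdot C_{in} + \overline{C_{in}} \cdot A + \overline{C_{in}} \cdot \overline{B}) \cdot (\overline{A} + \overline{A} \cdot \overline{B} + \overline{A} \cdot \overline{C_{in}} + \overline{A} \cdot B + B \cdot \overline{C_{in}} + C_{in} \cdot \overline{A} + C_{in} \cdot \overline{B})$
$\quad = (A + B \cdot C_{in} + \overline{B} \cdot \overline{C_{in}}) \cdot (\overline{A} + B \cdot \overline{C_{in}} + \overline{B} \cdot C_{in})$
$\quad = A \cdot B \cdot \overline{C_{in}} + A \cdot \overline{B} \cdot C_{in} + \overline{A} \cdot B \cdot C_{in} + \overline{A} \cdot \overline{B} \cdot \overline{C_{in}}$
Thus $\overline{\text{sum}}(A, B, C_{in}) = \text{sum}(\overline{A}, \overline{B}, \overline{C_{in}})$.

This shows that the same hardware that produces sum from $A$, $B$ and $C_{in}$ will produce $\overline{\text{sum}}$ if the inputs are changed to $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$.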

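Complementation Property
Carry also has the same complementation property.
$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
Hence, $\overline{C_{out}} = \overline{A \cdot B + C_{in} \cdot (A + B)} = (\overline{A} + \overline{B}) \cdot (\overline{C_{in}} + \overline{A} \cdot \overline{B})$
$\quad = \overline{A} \cdot \overline{C_{in}} + \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B}$
Thus $\overline{C_{out}} = \overline{A} \cdot \overline{B} + \overline{C_{in}} \cdot (\overline{A} + \overline{B})$
while $C_{out} = A \cdot B + C_{in} \cdot (A + B)$
So the same hardware which produces $C_{out}$ from $A$, $B$ and $C_{in}$ will produce $\overline{C_{out}}$ from $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$.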
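The complementation property says that inverting every input inverts the output. A short sketch that tests this for sum and carry, and shows that it does not hold for an arbitrary gate such as a 3-input AND; the helper `is_self_dual` is an illustrative name:

```python
from itertools import product

def sum_fn(a, b, cin):
    return a ^ b ^ cin

def carry_fn(a, b, cin):
    return (a & b) | (cin & (a | b))

def and3(a, b, cin):
    return a & b & cin

def is_self_dual(f):
    """True if f(~a, ~b, ~c) == ~f(a, b, c) for every input combination."""
    return all(f(1 - a, 1 - b, 1 - c) == 1 - f(a, b, c)
               for a, b, c in product((0, 1), repeat=3))
```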

Making use of the symmetry property
In CMOS implementation, we interchange series and parallel configurations for the n and p channel transistors.
This is to ensure that the pull up and pull down circuits are complementary.
However, for sum and carry functions, we see that complementing all the inputs complements the output.
Therefore, for implementing sum and carry, we can use the same configuration for n and p channel transistors.
We use this to reduce the number of series connected transistors in pull up/pull down networks.

Mirror gates for Adders
By making use of symmetry property of sum and carry, it is possible to simplify the implementations.
[Figure: mirror gate implementation of the full adder; transistor-level schematics of the carry gate and the sum gate between VDD and Gnd.]
$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
$\text{Sum} = \overline{C_{out}} \cdot (A + B + C_{in}) + A \cdot B \cdot C_{in}$
These are called mirror gates because the n and p transistors have the same series parallel combination.
This is highly unusual.

Speeding up the Ripple Carry Adder
The worst case delay of the ripple carry adder is linear in number of bits to be added.
To reduce the delay per stage, we can eliminate the inverter from the carry output.
All even bit adders accept $A$, $B$ and $C_{in}$ as inputs. The mirror gate without inverter gives $\overline{C_{out}}$ as the output.
All odd bit adders accept $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$ as inputs and thus produce $C_{out}$ as output.
Outputs of all bits are now compatible with inputs of the next stage.
Extra inverters are required to produce $\overline{A}$ and $\overline{B}$ for the odd bits, and at the outputs to produce the proper result. However, these are not on the critical path, and do not add to the worst case delay.
Extreme care needs to be taken in layout to ensure that the loading on
the tree gate producing carry output is as small as possible.
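The alternating-polarity scheme can be sketched behaviourally. The model below assumes that the carry gate without its inverter computes the complement of the carry function of whatever three values are applied to it; the function names are illustrative.

```python
def inverting_carry_gate(x, y, z):
    """Mirror carry gate without the output inverter: NOT(x.y + z.(x + y))."""
    return 1 - ((x & y) | (z & (x | y)))

def inverter_free_carries(a_bits, b_bits, cin=0):
    """Return true-polarity carries [c0, c1, ..., cn] for LSB-first inputs."""
    carries = [cin]
    wire = cin  # value on the carry wire; true polarity entering bit 0
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if i % 2 == 0:
            # Even bit: true inputs in, complemented carry out.
            wire = inverting_carry_gate(a, b, wire)
            carries.append(1 - wire)
        else:
            # Odd bit: complemented inputs in, true carry out.
            wire = inverting_carry_gate(1 - a, 1 - b, wire)
            carries.append(wire)
    return carries
```

The only inversions in the model are on the operand bits of odd stages and on the recorded results, never on the carry wire itself.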

Carry Look Ahead
Terms Independent of Carry
Carry propagation is the critical path for a multi-bit adder.

To speed up the adder, we would like an architecture where logic terms
are classified as those dependent on carry and those which do not
depend on carry.
To speed up the adder, we would like to pre-compute all terms which do
not depend on carry.
Now when the carry arrives, we quickly compute the output carry and
pass it on to the next stage.

Carry Independent Terms
We would like to analyze what information can be pre-computed from $A_i$ and $B_i$, which will help us in generating $C_{out}$ quickly from $C_{in}$.
When $A_i = 0$ and $B_i = 0$, $C_{out}$ is 0, independent of $C_{in}$. We define this condition as ‘Kill’: $K = \overline{A} \cdot \overline{B}$.
Similarly, when $A_i = 1$ and $B_i = 1$, $C_{out}$ is 1, independent of $C_{in}$. We define this condition as ‘Generate’: $G = A \cdot B$.
Only when $A_i = 0$ and $B_i = 1$, or when $A_i = 1$ and $B_i = 0$, do we need to wait for $C_{in}$ to compute $C_{out}$.
In both these cases, $C_{out} = C_{in}$.
We call this condition ‘Propagate’, and define $P = A \cdot \overline{B} + \overline{A} \cdot B$.

Using Carry Independent Terms
We define $K \equiv \overline{A} \cdot \overline{B}$, $G \equiv A \cdot B$ and $P \equiv A \oplus B$.

Exactly one of K, G or P is true at any time.
When K = 1, Cout is 0, independent of Cin .
When G = 1, Cout is 1, independent of Cin .
When P = 1, Cout = Cin .
P needs to be computed using an xor gate, which can be slow. However, the
only difference between xor and or logic is when both inputs are 1, i.e. G = 1.
If we can ensure that G forces Cout to 1 irrespective of P, we can use the
simpler ‘or’ logic to compute P.
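Both claims on this slide, that exactly one of K, G, P is true and that OR may stand in for XOR when computing the carry, can be checked by enumeration. A minimal sketch with illustrative names:

```python
from itertools import product

def kill_generate_propagate(a, b):
    """Return (K, G, P) with P defined as A xor B."""
    return (1 - a) & (1 - b), a & b, a ^ b

def cout_with_xor_p(a, b, cin):
    g, p = a & b, a ^ b
    return g | (p & cin)

def cout_with_or_p(a, b, cin):
    g, p = a & b, a | b   # simpler OR gate used for P
    return g | (p & cin)

def or_p_is_equivalent():
    """True if both definitions of P give the same carry out everywhere."""
    return all(cout_with_xor_p(a, b, c) == cout_with_or_p(a, b, c)
               for a, b, c in product((0, 1), repeat=3))
```

Note that the sum bit still needs the XOR form of P; the OR form is only a shortcut for the carry.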

Carry Look Ahead
$C_{in}$ for bit i+1 is the $C_{out}$ of bit i.
So we can write $C_{i+1} = G_i + P_i \cdot C_i$
Notice that the Kill signal is not required.
If $G_i = 0$, then $A_i \cdot B_i = 0$, so $A_i \oplus B_i = A_i + B_i$ and $C_{i+1} = P_i \cdot C_i$ is the same for either form of $P_i$.
If $G_i = 1$, $C_{i+1} = 1$, and the value of $P_i$ does not matter anyway.
So we can use $P = A + B$ instead of $P = A \oplus B$.
Now, we have the sequence:
$C_{i+1} = G_i + P_i \cdot C_i = G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot C_{i-1} = \cdots$
and so on, till we reach $C_0$.

Since all $G_i$ and $P_i$ can be computed in parallel on arrival of the inputs, and $C_0$ is known at the start, we can compute all sum and carry terms independently if we do not mind the added complexity.

Carry Look Ahead
$C_{i+1} = G_i + P_i \cdot C_i = G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot C_{i-1} = \cdots$

Unfortunately, static implementation of these gates has almost as much delay
as the ripple carry implementation.
Therefore, the static implementation of computation of sum and carry terms
as a logic expression depending on all Ai , Bi and C0 is rarely used.
We can use these expressions for blocks of a small number of bits (say 4) and
then propagate carry over these blocks.
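For a small block, the fully expanded look ahead expression can be written out directly. The sketch below builds each carry as a flat OR of product terms from the G, P values and C0, without using any intermediate carry; `lookahead_carries` is an illustrative name.

```python
def lookahead_carries(a_bits, b_bits, c0=0):
    """Carries [c0, c1, ..., cn] of a block, each from G, P and c0 only."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a | b for a, b in zip(a_bits, b_bits)]   # OR form of propagate
    carries = [c0]
    for i in range(len(g)):
        # c[i+1] = G_i + P_i.G_{i-1} + ... + P_i...P_1.G_0 + P_i...P_0.c0
        chain = 1      # running product P_i . P_{i-1} . ...
        c = 0
        for j in range(i, -1, -1):
            c |= chain & g[j]
            chain &= p[j]
        c |= chain & c0
        carries.append(c)
    return carries
```

The number of product terms grows with the bit position, which is why the expansion is normally limited to blocks of about 4 bits.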

Carry Look Ahead: Manchester Carry Chain
Static implementation of look ahead carry is not really fast if we try to look ahead by a large number of bits, because the logic becomes very complex.
A dynamic implementation is useful and is widely used. It is known as the Manchester Carry Chain.
[Figure: one bit of the Manchester carry chain, drawn between VDD and Gnd with clock Ck, control signals P and G, and carry nodes Cin and Cout.]
When the clock is low, the output is unconditionally charged by the pMOS.
When the clock goes high, the output will be pulled low if G = 1 or if P = 1 and $\overline{C_{in}} = 0$.
In all other cases, the output will remain high. Thus this circuit implements the required logic, with the chain nodes carrying the complemented carry.
This circuit can be concatenated for all bits and since P and G are ready before Cin arrives, the carry quickly ripples through from bit to bit.
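A purely logical model of the chain is sketched below. It assumes that each node holds the complemented carry, starts precharged high, and is discharged during evaluation when G = 1 or when P = 1 and the preceding node is low. Timing and charge sharing are not modelled, and the function name is illustrative.

```python
def manchester_chain(a_bits, b_bits, cin=0):
    """Return true-polarity carries [c1, ..., cn] for LSB-first inputs."""
    prev_node = 1 - cin            # the chain carries the complemented carry
    carries = []
    for a, b in zip(a_bits, b_bits):
        g, p = a & b, a ^ b
        node = 1                   # precharged high while the clock is low
        if g or (p and prev_node == 0):
            node = 0               # discharged once the clock goes high
        carries.append(1 - node)   # invert to recover the true carry
        prev_node = node
    return carries
```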

Manchester Carry Chain as Carry Look Ahead
[Figure: the single-bit Manchester carry chain cell, with VDD, Gnd, clock Ck, control signals P and G, and carry nodes Cin and Cout.]
Notice that the nMOS logic can be interpreted as:
$P \cdot C_{in} + G$
where $C_{in}$ itself has been recursively generated by similar logic.
As in the static case, there is a limit to the number of bits which can be so connected.
If P = 1 for many successive bits, the discharge path is through series connected pass transistors of all these gates. The discharge time for this critical path has an $n^2$ dependence.

The circuit below shows a Manchester carry chain over 4 bits.
[Figure: four Manchester cells in series from Cin0 through Cout0, Cout1, Cout2 to Cout3, with pass transistors controlled by P0 to P3, pull-downs controlled by G0 to G3, and a common clock Ck.]
If G = 1 for any bit, the output is brought to ‘0’. (Recall that $\overline{\text{Carry}}$ propagates, not Carry.)
The time of carry arrival for all subsequent bits is measured from the last bit where P = 0.
The worst case for delay occurs when P = 1 for all bits. In this case, all load capacitors are shorted, so load capacitance ∝ n.
The discharge of capacitors is through n series connected pass transistors, so average R is ∝ n.
Thus in the worst case, the delay ∝ RC ∝ $n^2$.
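The quadratic dependence can be illustrated with a first-order RC ladder estimate. The sketch below assumes the Elmore delay model with the same on-resistance r for every pass transistor and the same capacitance c at every node, both normalised to 1; these are simplifying assumptions, not values from the slides.

```python
def elmore_delay(n, r=1.0, c=1.0):
    """Elmore delay to the far end of n series pass transistors.

    Node i (counting from the discharging end) sees the resistance of
    i transistors, so it contributes (i * r) * c to the total.
    """
    return sum((i * r) * c for i in range(1, n + 1))

def delay_ratio(n_long, n_short):
    """How much slower a chain of n_long bits is than one of n_short bits."""
    return elmore_delay(n_long) / elmore_delay(n_short)
```

Under these assumptions the total is r·c·n(n+1)/2, so doubling the chain length from 4 to 8 bits raises the estimate from 10 to 36 units rather than to 20.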
Carry Bypass Adder
The worst case for addition occurs when P = 1 for all bits and carry has to ripple through all bits.
In carry bypass adder, we form groups of bits and if P = 1 for all members of a group, we pass on the carry input to this group directly to the input of the next group, without having to ripple through each bit.
This improves the worst case delay of the adder.
bypass = P0·P1·P2·P3
[Figure: 4 bit Manchester carry chain (P0 to P3, G0 to G3, Cin0, Cout0 to Cout3, clock Ck) with an added bypass path controlled by the bypass signal.]
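A functional sketch of one bypass group follows. It uses the XOR form of P, because with the OR form "all P = 1" would no longer guarantee that the group carry out equals its carry in. The function name and the LSB-first list convention are illustrative.

```python
def bypass_group(a_bits, b_bits, cin):
    """One carry-bypass group; returns (cout, bypass)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # XOR propagate
    # Normal path: the carry ripples through every bit of the group.
    rippled = cin
    for gi, pi in zip(g, p):
        rippled = gi | (pi & rippled)
    # Bypass path: when every bit propagates, hand cin straight to the output.
    bypass = int(all(p))
    cout = cin if bypass else rippled
    return cout, bypass
```

The bypass does not change the logical result; in hardware it only shortens the path that the carry takes in the all-propagate case.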

Carry Select Adder
Single bit Carry Select Adder
One can make a fast adder at the cost of some added complexity, by
implementing two adders, one assuming that Cin = 0 and the other
assuming that Cin = 1.
When the actual carry input arrives at this bit, it chooses the correct one
using a multiplexer, depending on its value.
Since Cout = G + P · Cin , the two cases are:
For Cin = 0, Cout = G = A · B
For Cin = 1, Cout = G + P = A · B + A ⊕ B = A + B
Thus the two candidates for Cout are quite easy to generate, being just the
AND/OR of A and B.
This concept can be extended to multi-bit carry select adders.

Carry Select Adder
An m bit carry select adder can be constructed as follows:

We first compute the generate/propagate/kill signals for each bit (in
parallel) from the input bits. Assuming unit gate delay model, this takes
one unit of time.
We use two m bit carry bypass adders. One of the adders assumes the
carry input Cin to be 0, while the other assumes Cin to be 1. The two
adders work in parallel and each takes m units of time.
We now use a multiplexer controlled by the actual Cin to select the correct
Cout . This takes one unit of time.
The Cout of one such m bit adder will be used as the select input of the
multiplexer of the next.
The sum output of each bit is derived from P and Cout signals for the
corresponding bit and appears one unit of time after Cout is available.
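The construction can be sketched as follows. Ripple sub-adders are used here as the simplest choice of sub-adder, which is an assumption of this sketch; the function names, the default width of 16 and the group size m = 4 are illustrative.

```python
def ripple(a_bits, b_bits, cin):
    """m bit sub-adder on LSB-first bit lists; returns (sum_bits, cout)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a | b))
    return sums, carry

def carry_select_block(a_bits, b_bits, cin):
    """Compute both alternatives, then let the actual cin pick one."""
    result_if_0 = ripple(a_bits, b_bits, 0)   # assumes Cin = 0
    result_if_1 = ripple(a_bits, b_bits, 1)   # assumes Cin = 1
    return result_if_1 if cin else result_if_0   # the multiplexer

def carry_select_add(x, y, width=16, m=4, cin=0):
    """Chain m bit carry select blocks; returns (sum, cout)."""
    a = [(x >> i) & 1 for i in range(width)]
    b = [(y >> i) & 1 for i in range(width)]
    sums, carry = [], cin
    for lo in range(0, width, m):
        s, carry = carry_select_block(a[lo:lo + m], b[lo:lo + m], carry)
        sums.extend(s)
    return sum(bit << i for i, bit in enumerate(sums)), carry
```

In hardware the two alternatives of every block are evaluated concurrently; the sequential loop here only reproduces the selection order.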

Multi-bit Carry Select Adders
The two m bit sub-adders assume the carry to be 0 or 1 respectively.
Times of availability of various signals are noted in parentheses in the diagram (unit delay times).
[Figure: m bit carry select adder. Inputs a and b arrive at (0); the G, P, K generator output is ready at (1); two m bit adders, one with Cin = 0 and one with Cin = 1, deliver their carry outputs at (m+1); a mux controlled by the actual Cin delivers Cout at (m+2).]
The two alternatives for the carry output are ready at (m+1) units of time.
If the actual Cin is available at n units of time, the output will be available at (m+2) or (n+1), whichever is later.
In case of 4 bit adders, this is at 6 units of time or at Cin arrival + 1, whichever is later.

Carry Select Adder Stacking Carry Select Adders
Stacking in Carry Select adders
The sub-adders in carry select adder can use any architecture.

They could be Manchester carry chains, carry bypass or ripple carry
adders.
Obviously, these sub adders should not be very long, otherwise, their
outputs will be ready after a long time and we shall lose the advantage of
carry select addition.
Then, how do we make long adders using carry select?
This is done by stacking several smaller carry select adders.

First stage of Carry Select adders
The first stage of stacked Carry Select adders is different from the rest.
In this case, we do not have to wait for Cin to arrive – it is already known.
Therefore we do not have to use redundant adders – a single m bit adder
will do.
Since no multiplexing is required, the output of the first stage is ready at
(m + 1) units of time, rather than at (m + 2).
This is convenient – because the two alternatives of the second stage are
also ready at (m + 1) units of time.

Linear Stacking
We could stack several identical carry select adders.

There is no need for carry select in the first stage, as Cin for this stage is
available simultaneously with Ai and Bi .
Every subsequent stage will have two sub-adders, one assuming Cin = 0,
the other assuming Cin = 1.
The correct output will be selected by the actual Cin when it arrives.
Thus, after the first stage, each group of m bit adders will add only one
unit of delay.
This is much faster. However, the delay is still linear in number of bits.

Linear stacking: Example
A 32-bit adder made by cascading 8 4-bit carry select adders.
[Figure: eight 4 bit groups for bits 0-3, 4-7, ..., 28-31 (five middle groups not drawn). Inputs arrive at (0) and each group generates G, P, K at (1). The first group uses a single 4 bit adder with its carry out at (5). Every later group has two 4 bit adders, one with Cin = ‘0’ and one with Cin = ‘1’, whose alternatives are ready at (5), followed by a mux; the mux outputs appear at (6) for the second group and at (11) and (12) for the last two.]

Bits    carry in   carry alternatives   carry out
0-3        0              -                 5
4-7        5              5                 6
8-11       6              5                 7
12-15      7              5                 8
16-19      8              5                 9
20-23      9              5                10
24-27     10              5                11
28-31     11              5                12

The sum generation will take another unit of time, so the overall results will be available in 13 units of time.

Square-root Stacking
Can we speed up the adder if we don’t use the same no. of bits in every
stage?
In linear stacking, since all adders are identical, they are ready with their
alternative outputs at the same time.
But the carry arrives later and later at each successive group of carry
select adders.
We could have used this extra time to add up more bits in the later
stages, and still be ready with the alternative results before carry arrives!
Since the carry arrives one unit of time later at each successive group,
each successive group could be longer by one bit.

We can do more bits of addition in the same time, if each successive stage is 1 bit longer than the previous one.
Thus, the number of bits which can be added is given by
$n = m_0 + m_0 + (m_0 + 1) + (m_0 + 2) + \cdots = m_0 + \frac{s\,(m_0 + m_0 + s - 1)}{2}$
where s is the number of stages following the first one, which has no carry select.
The total delay will be $m_0 + 1$ for the first stage. Each subsequent stage takes just 1 unit of time since the candidates for selection are available just in time.
The time taken is just $m_0 + s + 1$ units. When $s \gg m_0$, we have $n \approx s^2/2$, while the time taken is nearly s.
Thus the time taken to add n bits is $\approx \sqrt{2n}$.
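The bit count and delay expressions can be evaluated numerically to see how quickly the square-root estimate becomes accurate. A small sketch with illustrative function names, where m0 is the size of the first stage and s the number of carry select stages that follow it:

```python
import math

def sqrt_stack_bits(m0, s):
    """Total width: m0 + (m0 + (m0+1) + ... + (m0+s-1))."""
    return m0 + s * (2 * m0 + s - 1) // 2

def sqrt_stack_delay(m0, s):
    """Time at which the final carry is available, in unit delays."""
    return m0 + s + 1

def sqrt_estimate(n):
    """Asymptotic estimate sqrt(2n) of the delay for n bits."""
    return math.sqrt(2 * n)
```

For m0 = 4 and s = 5 the stages are 4, 4, 5, 6, 7, 8 bits, 34 bits in all, with the final carry at 10 units. For large s the exact delay and the estimate differ by only a few units.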

Square-root Stacking: Example
For a 32 bit adder, we could use a distribution like: 4,4,5,6,7,6.
Bits carry in carry alternatives carry out

0-3 0 - 5
4-7 5 5 6
8-12 6 6 7
13-18 7 7 8
19-25 8 8 9
26-31 9 7 10
Our sum will be ready at 11 units of time, compared with 13 for linear stacking. This gain will be much higher for
wider additions.
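Both timing tables follow from one rule: the alternatives of a group of m bits are ready at m + 1, and the mux output appears one unit after the later of the alternatives and the incoming carry. A sketch under the same unit delay model; the function name and the tuple layout are illustrative.

```python
def carry_select_timing(group_sizes):
    """Return (rows, sum_time) for stacked carry select groups.

    Each row is (carry_in_time, alternatives_time, carry_out_time);
    the first group has no alternatives because its Cin is known at time 0.
    """
    first = group_sizes[0]
    t_carry = first + 1                 # 1 unit for G, P, K plus m units to add
    rows = [(0, None, t_carry)]
    for m in group_sizes[1:]:
        alternatives = m + 1            # both candidate results ready
        carry_in = t_carry
        t_carry = max(alternatives, carry_in) + 1   # one mux delay
        rows.append((carry_in, alternatives, t_carry))
    return rows, t_carry + 1            # sums follow one unit after the carry
```

Calling it with `[4] * 8` reproduces the linear stacking case and with `[4, 4, 5, 6, 7, 6]` the square-root distribution.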

Tree Adders
Tree adders use the idea of carry look ahead addition.

However, these do not try to implement the complex logic expressions
which would result if we try to generate each carry directly from input
operands.
Instead, these build up the logic in a tree like structure, where each node
performs simple logic operations on the results of the previous node.
Because of the tree structure used in this, the delay is of the order of log n
for an n bit adder.

Carry Look Ahead
For carry look ahead, we had defined
$K = \overline{A} \cdot \overline{B}$, $G = A \cdot B$ and $P = A \oplus B$.
P, G and K can be computed without waiting for $C_{in}$.
When K = 1, $C_{out} = 0$ irrespective of $C_{in}$.
When G = 1, $C_{out} = 1$ irrespective of $C_{in}$.
When P = 1, $C_{out} = C_{in}$. This is the only case when we must wait for $C_{in}$ in order to compute $C_{out}$.
Exactly one of P, G and K will be true for any combination of A and B.
Therefore we do not have to compute all three. Most adders just use G and P.

Terminology
Let us first establish the terminology used for this section.
[Figure: N single-bit cells indexed N-1, ..., i, ..., 1, 0. Cell i takes inputs $a_i$, $b_i$ and carry $c_i$, forms $G_i$, $P_i$, and produces sum $s_i$ and carry $c_{i+1}$; $c_0$ enters cell 0 and $c_N$ leaves cell N-1.]
The least significant bit is indexed as 0 and the most significant bit as N − 1.
The input operands to the adder are $A = (a_{N-1} \cdots a_0)$ and $B = (b_{N-1} \cdots b_0)$, with a possible input carry $c_0$. All these bits are available at the start.
$c_i$ represents the input carry to the i’th bit.
The output carry from bit i is $c_{i+1}$, which is the input carry for bit (i+1).
Thus $c_0$ represents the overall input carry for the addition and $c_N$ represents the final output carry.
$s_i$ represents the sum output from the i’th bit.

P and G signals over blocks of multiple bits

The Generate and Propagate signals are derived exclusively from $a_i$ and $b_i$ inputs and are independent of carry input. These can thus be generated in constant time and in parallel for all the bits.
The output carry for i’th bit is generated from the incoming carry using the relation: $c_{i+1} = G_i + P_i \cdot c_i$. Similarly, $c_i = G_{i-1} + P_{i-1} \cdot c_{i-1}$.
Substituting for $c_i$ in the relation for $c_{i+1}$, we get
$c_{i+1} = G_i + P_i \cdot (G_{i-1} + P_{i-1} \cdot c_{i-1}) = (G_i + P_i \cdot G_{i-1}) + (P_i \cdot P_{i-1}) \cdot c_{i-1}$
If we define $G_{i:i-1} \equiv G_i + P_i \cdot G_{i-1}$ and $P_{i:i-1} \equiv P_i \cdot P_{i-1}$, we get the relation: $c_{i+1} = G_{i:i-1} + P_{i:i-1} \cdot c_{i-1}$
This is the same relation as the one used for single bit carry generation, but permits us to compute $c_{i+1}$ directly from $c_{i-1}$.
Thus $G_{i:i-1}$ and $P_{i:i-1}$ are effectively the Generate and Propagate values for a block of 2 bits (i and i − 1).
Like $G_i$ and $P_i$, $G_{i:i-1}$ and $P_{i:i-1}$ are independent of carry and can be computed in constant time from A and B in parallel.

Higher order P and G

Just as we combined single bit G and P values to get new G and P values which operate over two bits, we can combine these 2 bit G and P values to get G and P values which operate over 4 bits and so on.
In general, if we take two contiguous ranges u and l each of size $2^n$, we can write for the combined range (u : l) of size $2^{n+1}$, the recursive relation:
$G_{u:l} = G_u + P_u \cdot G_l$ and $P_{u:l} = P_u \cdot P_l$
This suggests a tree structure for computation of successive G and P values which operate over bigger and bigger ranges of bits.
To distinguish G and P values operating over ranges of different sizes, we’ll use a superscript which gives the “order” of computation of these.
Thus single bit G and P values will carry a superscript of 0, 2 bit values will use a superscript of 1 and so on. Eventually, G and P values covering a range of $2^m$ bits will carry a superscript of m.
As before, G and P values will carry a subscript which gives the range of bit indices over which these operate.
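The recursive relation maps directly onto a pairwise reduction. In the sketch below each level of the returned list holds the (G, P) pairs of one order, LSB-side group first; the function names are illustrative, and the number of bits is assumed to be a power of two.

```python
def combine(upper, lower):
    """(G, P) of a range from the (G, P) pairs of its upper and lower halves."""
    g_u, p_u = upper
    g_l, p_l = lower
    return g_u | (p_u & g_l), p_u & p_l

def gp_levels(a_bits, b_bits):
    """Return [order 0, order 1, ...] lists of (G, P) pairs, LSB group first."""
    level = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]   # order 0
    levels = [level]
    while len(level) > 1:
        # Adjacent aligned groups merge into one group of twice the size.
        level = [combine(level[i + 1], level[i])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels
```

For a 4 bit input the result has three levels (orders 0, 1 and 2), and the single pair at the top gives the block carry out as G + P·c0.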
Once the highest order P and G values have been generated, the final
carry can be computed in one step from the input carry.
The final result contains all the sum bits and the final carry. So it may
appear that we do not need the intermediate carries at each bit.
However, the sum bits depend on internal carries. The sum bits are given
by:
Si = Ai ⊕ Bi ⊕ Ci = Pi ⊕ Ci
Thus we do need the internal bit-wise carries for sum generation.
The group size over which the carry can be computed directly multiplies
by two each time we use a higher order for G and P values.
On the other hand, the time to compute the required higher order G and P values increments by one gate delay (the time to compute A + B · C for G and A · B for P).
This results in the total time to generate all the P and G values being logarithmic in the number of bits being added.

Logarithmic Adders
Using P and G values of different orders, we can compute the bit wise
carry and sum values.
Notice that in logarithmic adders, internal bit-wise sum and carry values
may be available after the final carry.
Thus the critical path is not the generation of the final carry, but that of
bit-wise sums.
Different architectures have been described in literature for the order of
computation of G, P, Cout and Sum bits.
All of these compute the final result in times which are logarithmic
functions of the number of bits.
For wide adders, these can be much faster than other architectures.

Tree Adders Brent Kung adder
Brent Kung adder
The Brent Kung tree adder is a logarithmic adder of low complexity.

Generate and Propagate signals are successively computed over groups of 1
bit, 2 bits, 4 bits, . . . in a tree structure.
Since the number of bits covered in every step doubles, the total time
taken for this is a logarithmic function of the number of bits.
Values of multiple orders of G and P so computed are then used to
compute the internal carry values at each internal bit, from which sum
values for every bit are derived.
This step is called a back trace and also takes logarithmic time.

Brent Kung adder
The figure below shows the generation of P and G values for an 8 bit adder.
[Figure: Brent Kung tree for 8 bits.
 Inputs:  a7 b7, a6 b6, ..., a0 b0
 Order 0: ($P^0_7$, $G^0_7$), ($P^0_6$, $G^0_6$), ..., ($P^0_0$, $G^0_0$)
 Order 1: ($P^1_{7:6}$, $G^1_{7:6}$), ($P^1_{5:4}$, $G^1_{5:4}$), ($P^1_{3:2}$, $G^1_{3:2}$), ($P^1_{1:0}$, $G^1_{1:0}$)
 Order 2: ($P^2_{7:4}$, $G^2_{7:4}$), ($P^2_{3:0}$, $G^2_{3:0}$)
 Order 3: ($P^3_{7:0}$, $G^3_{7:0}$)]

We first calculate $P^0_i$, $G^0_i$, with $i = 0 \cdots 7$.
$G^0_i = A_i \cdot B_i$, $P^0_i = A_i \oplus B_i$

Next, using these values, we can generate $P^1_{2i+1:2i}$, $G^1_{2i+1:2i}$ with $i = 0 \cdots 3$.
$G^1_{2i+1:2i} = G^0_{2i+1} + P^0_{2i+1} \cdot G^0_{2i}$, $P^1_{2i+1:2i} = P^0_{2i+1} \cdot P^0_{2i}$

In the next step, we use first order P, G values to generate $P^2_{4i+3:4i}$, $G^2_{4i+3:4i}$ with $i = 0, 1$.
$G^2_{7:4} = G^1_{7:6} + P^1_{7:6} \cdot G^1_{5:4}$, $P^2_{7:4} = P^1_{7:6} \cdot P^1_{5:4}$
$G^2_{3:0} = G^1_{3:2} + P^1_{3:2} \cdot G^1_{1:0}$, $P^2_{3:0} = P^1_{3:2} \cdot P^1_{1:0}$

Finally, using $G^2_{4i+3:4i}$ and $P^2_{4i+3:4i}$ (with $i = 0, 1$) we can compute $P^3_{7:0}$, $G^3_{7:0}$.
$G^3_{7:0} = G^2_{7:4} + P^2_{7:4} \cdot G^2_{3:0}$
$P^3_{7:0} = P^2_{7:4} \cdot P^2_{3:0}$

Once P and G terms of various orders are known, we can compute the values of carry outputs which depend on these and the input carry $C_0$, which is available at t = 0.
$C_1 = G^0_0 + P^0_0 \cdot C_0$, $C_2 = G^1_{1:0} + P^1_{1:0} \cdot C_0$
$C_4 = G^2_{3:0} + P^2_{3:0} \cdot C_0$, $C_8 = G^3_{7:0} + P^3_{7:0} \cdot C_0$
When these carry values are valid, the other carry values which depend on these can be generated.

Once $C_1$, $C_2$, $C_4$ and $C_8$ have been generated, we can produce internal carries which depend on these.
$C_3 = G^0_2 + P^0_2 \cdot C_2$, $C_5 = G^0_4 + P^0_4 \cdot C_4$
$C_6 = G^1_{5:4} + P^1_{5:4} \cdot C_4$
Finally, $C_7$ can be generated from $C_6$.
$C_7 = G^0_6 + P^0_6 \cdot C_6$
With all carry values generated, the corresponding sum values can be calculated using the relation $\text{Sum}_i = P^0_i \oplus C_i$.
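The 8 bit procedure can be followed step by step in code. This sketch numbers the orders from 0 for single bits, 1 for pairs, 2 for groups of four and 3 for the full byte; the function name and variable names are illustrative.

```python
def brent_kung_8(x, y, c0=0):
    """8 bit Brent Kung addition; returns (sum, carry_out)."""
    a = [(x >> i) & 1 for i in range(8)]
    b = [(y >> i) & 1 for i in range(8)]
    g0 = [ai & bi for ai, bi in zip(a, b)]   # single-bit generate
    p0 = [ai ^ bi for ai, bi in zip(a, b)]   # single-bit propagate

    def merge(g, p):
        """Combine adjacent aligned groups into groups of twice the size."""
        gs = [g[i + 1] | (p[i + 1] & g[i]) for i in range(0, len(g), 2)]
        ps = [p[i + 1] & p[i] for i in range(0, len(p), 2)]
        return gs, ps

    g1, p1 = merge(g0, p0)   # groups (1:0), (3:2), (5:4), (7:6)
    g2, p2 = merge(g1, p1)   # groups (3:0), (7:4)
    g3, p3 = merge(g2, p2)   # group (7:0)

    c = [0] * 9
    c[0] = c0
    # Carries obtained directly from the input carry.
    c[1] = g0[0] | (p0[0] & c[0])
    c[2] = g1[0] | (p1[0] & c[0])
    c[4] = g2[0] | (p2[0] & c[0])
    c[8] = g3[0] | (p3[0] & c[0])
    # Back trace: remaining carries from the ones already known.
    c[3] = g0[2] | (p0[2] & c[2])
    c[5] = g0[4] | (p0[4] & c[4])
    c[6] = g1[2] | (p1[2] & c[4])
    c[7] = g0[6] | (p0[6] & c[6])

    total = sum((p0[i] ^ c[i]) << i for i in range(8))
    return total, c[8]
```

Only the group values on the tree are ever formed, for example (5:4) but never (4:3), which is what keeps the Brent Kung structure small.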

Tree Adders Tutorial: 32 bit Brent Kung Logarithmic Adder
Logarithmic Adders with a tree architecture

We illustrate the operation of a 32 bit Brent Kung adder with a numerical
example.
Recall that if we represent the indices for the upper half of a range by u and the
lower half by l, we can write:
$G_{(u:l)} = G_u + P_u \cdot G_l$, whereas $C_{next} = G_{(u)} + P_{(u)} \cdot C_{prev}$
Notice that G values are computed by the same logic relation as carry
outputs.
The input carry C0 is known at the start itself.
Whenever the carry is already known, we can replace Gl by this carry.
The computed value of G(u:l) will then be the carry output, rather than the
G value. This value can be used for further G calculations and will directly
give the carry each time.
This can reduce the computation required to generate the carry and sum
values since some of the carry values are already available.

32 bit Brent Kung adder: Order 0
We use a unit time model in which we assume that logic functions AND,
XOR, A + B.C as well as A.B + C.(A+B) take the same amount of time,
which defines 1 slot of time for this tutorial.
The single bit G and P values (designated as order 0) are given by
$P^0_i = a_i \oplus b_i$, $G^0_i = a_i \cdot b_i$, except $G^0_0 = a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$
An exception is made for the least significant bit of G because for this bit, the input carry is known at the start.
We make use of this and compute effectively the carry output from bit 0 ($c_1$) and map the output carry as if it was due to a generate signal at this position. Thus,
$G^0_0 = c_1 = a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$
All these functions can be computed in one unit of time directly from $a_i$, $b_i$ and input carry $c_0$. So these are all ready at the end of the first time slot.
Since $c_1 = G^0_0$, $c_1$ is also ready at the end of first slot.

Brent Kung adder: higher orders
We can define G and P functions which operate over multiple bits. Higher
order G and P values are computed as
G = Gu + Pu · Gl , P = Pu · Pl
where u and l stand for upper half range and lower half range for a range
of bit indices.
These can be computed within one time slot from the next lower order G
and P values. Thus higher orders of G and P values, (successively
covering twice the range of indices for the previous order) will be
available in each time slot.
Internal carries are computed using functions like C = G + P · Cin .
Depending on the order of G and P values, we can compute carry values
whose indices are 1, 2, 4, 8 . . . bits higher than the input carry. This
computation also takes one time slot, but can be performed only after the
needed Cin , P and G values are available.

32 bit Brent Kung adder
G and P values for single bits are available at the end of first slot.
G and P values spanning groups of 2 bits are available at the end of
second slot. G and P values spanning groups of 4 bits are available at
the end of third slot. G and P values spanning groups of 8 bits are
available at the end of fourth slot. G and P values spanning groups of 16
bits are available at the end of fifth slot.
Finally, G and P values spanning the full word of 32 bits are available at
the end of sixth slot.
G and P values of order n are available over spans of $2^n$ bits. The start bit for these spans has a granularity of $2^n$ bits. For example, second order values connect 0 → 4, 4 → 8 etc. We cannot connect using these from 1 → 5 in a Brent Kung adder.
The lowest index G value for any order i is automatically the carry value for bit index $2^i$.

At time = 0, all $a_i$, $b_i$ and $c_0$ are available.

At time = 1, all $P^0_i$ and $G^0_i$ are available. $c_1 = G^0_0$ is also available.

At time = 2, all 2 bit P and G values ($P^1$ and $G^1$) are available. $c_2 = G^1_{(1:0)}$ has been computed.

At time = 3, all 4 bit P and G values ($P^2$ and $G^2$) are available. $c_4 = G^2_{(3:0)}$, and
$c_3 \leftarrow c_2$ using $G^0_2$, $P^0_2$ and $c_2$ have also been computed.

At time = 4, all 8 bit P and G values ($P^3$ and $G^3$) are available. $c_8 = G^3_{(7:0)}$ is also available.
$c_5 \leftarrow c_4$ using $G^0_4$, $P^0_4$ and $c_4$; as well as
$c_6 \leftarrow c_4$ using $G^1_{(5:4)}$, $P^1_{(5:4)}$ and $c_4$ have been computed.

At time = 5, all 16 bit P and G values ($P^4$ and $G^4$) have been computed. $c_{16} = G^4_{(15:0)}$ is also available.
$c_7 \leftarrow c_6$ using $G^0_6$, $P^0_6$ and $c_6$;
$c_9 \leftarrow c_8$ using $G^0_8$, $P^0_8$ and $c_8$;
$c_{10} \leftarrow c_8$ using $G^1_{(9:8)}$, $P^1_{(9:8)}$ and $c_8$;
$c_{12} \leftarrow c_8$ using $G^2_{(11:8)}$, $P^2_{(11:8)}$ and $c_8$ are all available.

At time = 6, $G^5_{(31:0)}$ is generated. This is the value of $c_{32} = C_{out}$. $P^5_{(31:0)}$ is not required.
$c_{11} \leftarrow c_{10}$ using $G^0_{10}$, $P^0_{10}$ and $c_{10}$;
$c_{13} \leftarrow c_{12}$ using $G^0_{12}$, $P^0_{12}$ and $c_{12}$;
$c_{14} \leftarrow c_{12}$ using $G^1_{(13:12)}$, $P^1_{(13:12)}$ and $c_{12}$;
$c_{17} \leftarrow c_{16}$ using $G^0_{16}$, $P^0_{16}$ and $c_{16}$;
$c_{18} \leftarrow c_{16}$ using $G^1_{(17:16)}$, $P^1_{(17:16)}$ and $c_{16}$;
$c_{20} \leftarrow c_{16}$ using $G^2_{(19:16)}$, $P^2_{(19:16)}$ and $c_{16}$; and
$c_{24} \leftarrow c_{16}$ using $G^3_{(23:16)}$, $P^3_{(23:16)}$ and $c_{16}$ have all been computed.

At time = 7, all G and P values for groups of 1, 2, 4, 8 and 16 bits are available.
$c_{15} \leftarrow c_{14}$ using $G^0_{14}$, $P^0_{14}$ and $c_{14}$.
$c_{19} \leftarrow c_{18}$ using $G^0_{18}$, $P^0_{18}$ and $c_{18}$.
$c_{21} \leftarrow c_{20}$ using $G^0_{20}$, $P^0_{20}$ and $c_{20}$.
$c_{22} \leftarrow c_{20}$ using $G^1_{(21:20)}$, $P^1_{(21:20)}$ and $c_{20}$.
$c_{25} \leftarrow c_{24}$ using $G^0_{24}$, $P^0_{24}$ and $c_{24}$.
$c_{26} \leftarrow c_{24}$ using $G^1_{(25:24)}$, $P^1_{(25:24)}$ and $c_{24}$.
$c_{28} \leftarrow c_{24}$ using $G^2_{(27:24)}$, $P^2_{(27:24)}$ and $c_{24}$.

At time = 8, we have computed:
$c_{23} \leftarrow c_{22}$ using $G^0_{22}$, $P^0_{22}$ and $c_{22}$.
$c_{27} \leftarrow c_{26}$ using $G^0_{26}$, $P^0_{26}$ and $c_{26}$.
$c_{29} \leftarrow c_{28}$ using $G^0_{28}$, $P^0_{28}$ and $c_{28}$.
$c_{30} \leftarrow c_{28}$ using $G^1_{(29:28)}$, $P^1_{(29:28)}$ and $c_{28}$.

At time = 9, we have computed:
$c_{31} \leftarrow c_{30}$ using $G^0_{30}$, $P^0_{30}$ and $c_{30}$.

We can show the sequence of generation of carry values by the following diagram:
[Figure: chart of the time slot (0 to 9) in which the carry input to each bit number (00 = Cin up to 32 = Cout) becomes available; G0, P0 are ready in slot 1, G1, P1 in slot 2, G2, P2 in slot 3, G3, P3 in slot 4, G4, P4 in slot 5 and G5 in slot 6.]

32 bit Brent Kung adder: Numerical Example
Taking the example of adding B7A56893H to 506A980CH with an input carry of ‘1’, let us list the P, G, carry and sum bits generated in each time slot.
In the first slot, we generate the single bit P and G values.
a   1011 0111 1010 0101 0110 1000 1001 0011
b   0101 0000 0110 1010 1001 1000 0000 1100
P0  1110 0111 1100 1111 1111 0000 1001 1111
G0  0001 0000 0010 0000 0000 1000 0000 0001†
$P^0_i = a_i \oplus b_i$, $G^0_i = a_i \cdot b_i$
†$G^0_0$ is generated as $a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$
$c_1 = G^0_0 = 1$

In the second slot, we generate P and G values spanning two bits each.
From now on,
$P^{m+1}_{range} = P^m_u \cdot P^m_l$, $G^{m+1}_{range} = G^m_u + P^m_u \cdot G^m_l$,
where u represents the upper half range and l represents the lower half range.
P0  1110 0111 1100 1111 1111 0000 1001 1111
G0  0001 0000 0010 0000 0000 1000 0000 0001
P1  10 01 10 11 11 00 00 11
G1  01 00 01 00 00 10 00 01
$c_2 = G^1_{1-0} = 1$
$s_0 = P^0_0 \oplus c_0 = 1 \oplus 1 = 0$, $s_1 = P^0_1 \oplus c_1 = 1 \oplus 1 = 0$.

In the third slot, we calculate P and G values spanning 4 bits each.
P1  10 01 10 11 11 00 00 11
G1  01 00 01 00 00 10 00 01
P2  0 0 0 1 1 0 0 1
G2  1 0 1 0 0 1 0 1
$c_4 = G^2_{3-0} = 1$. We can also compute
$c_3 = G^0_2 + P^0_2 \cdot c_2 = 0 + 1 \cdot 1 = 1$,
$s_2 = P^0_2 \oplus c_2 = 1 \oplus 1 = 0$

In the fourth slot, we calculate P and G values spanning 8 bits each.
P2  0 0 0 1 1 0 0 1
G2  1 0 1 0 0 1 0 1
P3  0 0 0 0
G3  1 1 1 0
$c_8 = G^3_{7-0} = 0$.
$c_5 = G^0_4 + P^0_4 \cdot c_4 = 0 + 1 \cdot 1 = 1$, $c_6 = G^1_{5-4} + P^1_{5-4} \cdot c_4 = 0 + 0 \cdot 1 = 0$.
$s_3 = P^0_3 \oplus c_3 = 1 \oplus 1 = 0$, $s_4 = P^0_4 \oplus c_4 = 1 \oplus 1 = 0$.

In the fifth slot, we calculate P and G values spanning 16 bits each.
P3  0 0 0 0
G3  1 1 1 0
P4  0 0
G4  1 1
$c_{16} = G^4_{15-0} = 1$.
$c_7 = G^0_6 + P^0_6 \cdot c_6 = 0 + 0 \cdot 0 = 0$, $c_9 = G^0_8 + P^0_8 \cdot c_8 = 0 + 0 \cdot 0 = 0$,
$c_{10} = G^1_{9-8} + P^1_{9-8} \cdot c_8 = 0 + 0 \cdot 0 = 0$,
$c_{12} = G^2_{11-8} + P^2_{11-8} \cdot c_8 = 1 + 0 \cdot 0 = 1$.
$s_5 = P^0_5 \oplus c_5 = 0 \oplus 1 = 1$, $s_6 = P^0_6 \oplus c_6 = 0 \oplus 0 = 0$,
$s_8 = P^0_8 \oplus c_8 = 0 \oplus 0 = 0$.

In the sixth slot, we compute $G^5_{31-0} = G^4_{31-16} + P^4_{31-16} \cdot G^4_{15-0}$.
$P^5_{31-0}$ is not required.
This gives $C_{out} = c_{32} = G^5_{31-0} = 1$. We can further compute:
$c_{11} = G^0_{10} + P^0_{10} \cdot c_{10} = 0 + 0 \cdot 0 = 0$,
$c_{13} = G^0_{12} + P^0_{12} \cdot c_{12} = 0 + 1 \cdot 1 = 1$,
$c_{14} = G^1_{13-12} + P^1_{13-12} \cdot c_{12} = 0 + 1 \cdot 1 = 1$,
$c_{17} = G^0_{16} + P^0_{16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{18} = G^1_{17-16} + P^1_{17-16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{20} = G^2_{19-16} + P^2_{19-16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{24} = G^3_{23-16} + P^3_{23-16} \cdot c_{16} = 1 + 0 \cdot 1 = 1$
$s_7 = P^0_7 \oplus c_7 = 1 \oplus 0 = 1$, $s_9 = P^0_9 \oplus c_9 = 0 \oplus 0 = 0$,
$s_{10} = P^0_{10} \oplus c_{10} = 0 \oplus 0 = 0$, $s_{12} = P^0_{12} \oplus c_{12} = 1 \oplus 1 = 0$,
$s_{16} = P^0_{16} \oplus c_{16} = 1 \oplus 1 = 0$,

In the seventh slot, all the required values of P and G are already available.
We can compute:
$c_{15} = G^0_{14} + P^0_{14} \cdot c_{14} = 0 + 1 \cdot 1 = 1$, $c_{19} = G^0_{18} + P^0_{18} \cdot c_{18} = 0 + 1 \cdot 1 = 1$
$c_{21} = G^0_{20} + P^0_{20} \cdot c_{20} = 0 + 0 \cdot 1 = 0$, $c_{22} = G^1_{21-20} + P^1_{21-20} \cdot c_{20} = 1 + 0 \cdot 1 = 1$
$c_{25} = G^0_{24} + P^0_{24} \cdot c_{24} = 0 + 1 \cdot 1 = 1$, $c_{26} = G^1_{25-24} + P^1_{25-24} \cdot c_{24} = 0 + 1 \cdot 1 = 1$
$c_{28} = G^2_{27-24} + P^2_{27-24} \cdot c_{24} = 0 + 0 \cdot 1 = 0$
$s_{11} = P^0_{11} \oplus c_{11} = 0 \oplus 0 = 0$, $s_{13} = P^0_{13} \oplus c_{13} = 1 \oplus 1 = 0$,
$s_{14} = P^0_{14} \oplus c_{14} = 1 \oplus 1 = 0$, $s_{17} = P^0_{17} \oplus c_{17} = 1 \oplus 1 = 0$,
$s_{18} = P^0_{18} \oplus c_{18} = 1 \oplus 1 = 0$, $s_{20} = P^0_{20} \oplus c_{20} = 0 \oplus 1 = 1$,
$s_{24} = P^0_{24} \oplus c_{24} = 1 \oplus 1 = 0$,

In the eighth slot, we can compute:

$c_{23} = G^0_{22} + P^0_{22} \cdot c_{22} = 0 + 1 \cdot 1 = 1$,
$c_{27} = G^0_{26} + P^0_{26} \cdot c_{26} = 0 + 1 \cdot 1 = 1$,
$c_{29} = G^0_{28} + P^0_{28} \cdot c_{28} = 1 + 0 \cdot 0 = 1$,
$c_{30} = G^1_{29-28} + P^1_{29-28} \cdot c_{28} = 1 + 0 \cdot 0 = 1$.
Sums corresponding to carries computed in the previous slot can also be
evaluated as:
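$s_{15} = P^0_{15} \oplus c_{15} = 1 \oplus 1 = 0$,
$s_{19} = P^0_{19} \oplus c_{19} = 1 \oplus 1 = 0$,
$s_{21} = P^0_{21} \oplus c_{21} = 0 \oplus 0 = 0$,
$s_{22} = P^0_{22} \oplus c_{22} = 1 \oplus 1 = 0$,
$s_{25} = P^0_{25} \oplus c_{25} = 1 \oplus 1 = 0$,
$s_{26} = P^0_{26} \oplus c_{26} = 1 \oplus 1 = 0$,
$s_{28} = P^0_{28} \oplus c_{28} = 0 \oplus 0 = 0$.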

In the ninth slot, we can compute $c_{31} = G^0_{30} + P^0_{30} \cdot c_{30} = 0 + 1 \cdot 1 = 1$,
and the sum values
$s_{23} = P^0_{23} \oplus c_{23} = 1 \oplus 1 = 0$,
$s_{27} = P^0_{27} \oplus c_{27} = 0 \oplus 1 = 1$,
$s_{29} = P^0_{29} \oplus c_{29} = 1 \oplus 1 = 0$,
$s_{30} = P^0_{30} \oplus c_{30} = 1 \oplus 1 = 0$.
Finally, in the tenth slot, we can evaluate $s_{31} = P^0_{31} \oplus c_{31} = 1 \oplus 1 = 0$.
Thus we have
Cin  1110 1111 1101 1111 1111 0000 0011 1111
a    1011 0111 1010 0101 0110 1000 1001 0011
b    0101 0000 0110 1010 1001 1000 0000 1100
sum  0000 1000 0001 0000 0000 0000 1010 0000
Final carry out is 1.
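
The final table can be cross-checked with ordinary integer arithmetic. The short Python sketch below assumes an input carry c0 = 1, which is what the rightmost bit of the Cin row indicates. The helper name `nibbles` is illustrative. The carry into each bit is recovered from the identity s_i = a_i ⊕ b_i ⊕ c_i.

```python
A = 0b1011_0111_1010_0101_0110_1000_1001_0011
B = 0b0101_0000_0110_1010_1001_1000_0000_1100
C0 = 1  # assumption: input carry, read from the rightmost bit of the Cin row

MASK = 0xFFFF_FFFF
total = A + B + C0
sum_bits = total & MASK
carry_out = total >> 32

# Since s_i = a_i xor b_i xor c_i, the carry into every bit is a xor b xor s.
carries = (A ^ B ^ total) & MASK


def nibbles(x):
    """Format a 32-bit value as eight groups of four bits, MSB first."""
    s = format(x, "032b")
    return " ".join(s[i:i + 4] for i in range(0, 32, 4))


print("Cin ", nibbles(carries))
print("sum ", nibbles(sum_bits))
print("Cout", carry_out)
```

The printed carry and sum rows match the table, and the carry out is 1.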

Serial Adders
Serial Adders
Up to now, we have been concerned with making fast adders, even at the cost
of increased complexity and power.
In many applications, speed is not as important as low power consumption
and low cost.
Serial adders are an attractive option in such cases.
A single full adder is used.
If numbers to be added are available in parallel form, these can be serialized
using shift registers.

A single full adder adds the incoming bits. Bits to be added are fed to it
serially, LSB first.
The sum bit goes to the output while carry is stored in a flip-flop.
Carry then gets added to the more significant bits which arrive next.
Output can be converted to parallel form if needed, using another shift
register.
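[Figure: serial adder block diagram. Labels recoverable from the slide: A operand and B operand shift registers (with Load), Full Adder (inputs A, B, C; outputs Sum and Cout), a carry latch (D, Q) holding Cprev, a carry mux (Cy Mux) with inputs Cin and Cprev and select Csel, and an output shift register.]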
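
The behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch only, not a description of the circuit in the figure: the function names `serial_add`, `to_bits` and `from_bits` are hypothetical, one loop iteration stands for one clock, and the `carry` variable plays the role of the carry flip-flop, preloaded with Cin.

```python
def serial_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists, LSB first, one bit per clock.

    Returns (sum_bits, final_carry); sum_bits is also LSB first.
    """
    carry = c_in  # models the carry flip-flop, preloaded with Cin
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)             # full-adder sum bit
        carry = (a & b) | (carry & (a | b))   # full-adder Cout, stored for the next clock
    return out, carry


def to_bits(x, n):
    """Serialize an n-bit value, LSB first (what a shift register would emit)."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Reassemble an LSB-first bit list into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


s, c = serial_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c)  # 1 1, since 11 + 6 = 17 = 16 + 1
```

An n-bit addition takes n clocks in this model, which is the speed given up in exchange for using a single full adder.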
