05 Adders
Dinesh Sharma
EE Department
IIT Bombay, Mumbai
6 Tree Adders
Brent Kung adder
Tutorial: 32 bit Brent Kung Logarithmic Adder
7 Serial Adders
Half Adder
Full Adder
[Figure: chain of full adders with inputs A2 B2, A1 B1, A0 B0 and sum outputs S2, S1, S0]
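$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
$= (A + B) \cdot (C_{in} + A \cdot B)$
$= A \cdot C_{in} + B \cdot C_{in} + A \cdot B$
$\overline{C_{out}} \cdot (A + B + C_{in}) = A \cdot \overline{B} \cdot \overline{C_{in}} + \overline{A} \cdot B \cdot \overline{C_{in}} + \overline{A} \cdot \overline{B} \cdot C_{in}$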
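The carry identities above, and the way Sum is recovered from $\overline{C_{out}}$, can be checked exhaustively over all eight input combinations. This is a minimal Python sketch; the function name `full_adder` is a hypothetical helper, not something defined in the slides.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for single-bit inputs.

    Cout is computed first; Sum is then built from the complemented carry:
    Sum = not(Cout).(A + B + Cin) + A.B.Cin
    """
    cout = (a & b) | (cin & (a | b))
    s = ((1 - cout) & (a | b | cin)) | (a & b & cin)
    return s, cout


for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    # The three equivalent forms of Cout
    assert cout == (a | b) & (cin | (a & b))
    assert cout == (a & cin) | (b & cin) | (a & b)
    # Sum built from Cout agrees with the XOR definition and with arithmetic
    assert s == a ^ b ^ cin
    assert 2 * cout + s == a + b + cin
print("all 8 input combinations agree")
```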
CMOS Implementation
[Figure: static CMOS full adder between VDD and Gnd, with inputs A, B, Cin and outputs Cout and Sum]
Complementation Property
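Both Sum and Carry show an interesting symmetry. Since $Sum = A \oplus B \oplus C_{in}$, complementing all three inputs complements the result:
$\overline{Sum}(A, B, C_{in}) = Sum(\overline{A}, \overline{B}, \overline{C_{in}})$
Thus the same hardware that produces Sum from A, B and $C_{in}$
will produce $\overline{Sum}$ if the inputs are changed to $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$.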
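$\overline{C_{out}} = \overline{A} \cdot \overline{B} + \overline{C_{in}} \cdot (\overline{A} + \overline{B})$
So the same hardware which produces $C_{out}$ from A, B and $C_{in}$ will produce
$\overline{C_{out}}$ from $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$.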
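The complementation property can be tested directly: a three-input function has it if complementing every input always complements the output. This Python sketch (hypothetical helper names) confirms it for Carry and Sum and, for contrast, shows that an ordinary 3-input AND does not have it.

```python
from itertools import product


def carry(a: int, b: int, c: int) -> int:
    """Carry out: A.B + Cin.(A + B)."""
    return (a & b) | (c & (a | b))


def sum_bit(a: int, b: int, c: int) -> int:
    """Sum: A xor B xor Cin."""
    return a ^ b ^ c


def has_complementation_property(f) -> bool:
    """True if complementing all three inputs of f complements its output."""
    return all(
        f(1 - a, 1 - b, 1 - c) == 1 - f(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )


print(has_complementation_property(carry))    # True
print(has_complementation_property(sum_bit))  # True
# A 3-input AND, for contrast, lacks the property
print(has_complementation_property(lambda a, b, c: a & b & c))  # False
```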
[Figure: mirror adder transistor schematic (inputs A, B, Cin; carry output Cout; connected to Gnd)]
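$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
$Sum = \overline{C_{out}} \cdot (A + B + C_{in}) + A \cdot B \cdot C_{in}$
These are called mirror gates because the n and p transistor networks have the same
series-parallel combination, instead of being duals of each other.
This is highly unusual; it is possible here only because both functions have the complementation property.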
The worst case delay of the ripple carry adder is linear in the number of bits
to be added.
To reduce the delay per stage, we can eliminate the inverter from the
carry output.
All even bit adders accept A, B and $C_{in}$ as inputs. The mirror gate without
inverter gives $\overline{C_{out}}$ as the output.
All odd bit adders accept $\overline{A}$, $\overline{B}$ and $\overline{C_{in}}$ as inputs and thus, by the complementation property, produce $C_{out}$ as
output.
Outputs of all bits are now compatible with inputs of the next stage.
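The alternating-polarity scheme can be simulated at the logic level. In this Python sketch (hypothetical function names), every stage uses the same inverting carry gate, odd stages are fed complemented operands, and the carry wire is never re-inverted. As a simplifying assumption, the sum bits are computed directly from the recovered true carry, so the polarity of the sum outputs is not modelled.

```python
def inverting_carry(a: int, b: int, c: int) -> int:
    """Mirror carry gate with its output inverter removed: NOT(A.B + C.(A + B))."""
    return 1 - ((a & b) | (c & (a | b)))


def ripple_add_alternating(x: int, y: int, cin: int, n: int) -> int:
    """Add two n-bit numbers using only inverting carry gates.

    Even bits see true A, B and carry and emit the complemented carry.
    Odd bits see complemented A, B and carry and, by the complementation
    property, emit the true carry. No inverter sits on the carry path.
    """
    node = cin          # value on the carry wire entering bit 0 (true polarity)
    result = 0
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        if i % 2 == 0:
            c_true = node
            node = inverting_carry(a, b, node)          # wire now holds NOT c[i+1]
        else:
            c_true = 1 - node
            node = inverting_carry(1 - a, 1 - b, node)  # wire now holds c[i+1]
        # Sum taken directly from the true carry (sum polarity not modelled)
        result |= (a ^ b ^ c_true) << i
    # After an odd number of bits the wire holds the complemented carry
    cout = node if n % 2 == 0 else 1 - node
    return result | (cout << n)


# Exhaustive check against ordinary addition for 3-bit and 4-bit words
for n in (3, 4):
    for x in range(1 << n):
        for y in range(1 << n):
            for cin in (0, 1):
                assert ripple_add_alternating(x, y, cin, n) == x + y + cin
```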
[Figure: carry cell with propagate switch P between Cin and Cout]
Static implementation of look ahead carry is not
really fast if we try to look ahead by a large number of bits.
[Figure: dynamic (precharged) carry cell clocked by Ck, between VDD and Gnd]
In all other cases, the output will remain high. Thus
this circuit implements the required logic.
This circuit can be concatenated for all bits and since P and G are ready
before Cin arrives, the carry quickly ripples through from bit to bit.
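[Figure: Manchester carry cell with pass transistor P from Cin to Cout and pull-down transistor G]
Notice that the nMOS logic can be interpreted as follows: the output node is discharged when $G + P \cdot C_{in} = 1$,
so this node carries $\overline{C_{out}}$, with $C_{out} = G + P \cdot C_{in}$.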
As in the static case, there is a limit to the number of bits which can be so
connected.
If P = 1 for many successive bits, the discharge path is through series
connected pass transistors of all these gates. The discharge time for this
critical path has an $n^2$ dependence.
[Figure: 4-bit Manchester carry chain: pass transistors P0 to P3 from Cin0 through Cout0, Cout1, Cout2, Cout3; pull-downs G0 to G3; precharge clock Ck]
If G = 1 for any bit, the output is brought to ‘0’. (Recall that $\overline{Carry}$
propagates, not Carry.)
For all subsequent bits, the carry arrival time is measured from the last bit
where P = 0.
The worst case for delay occurs when P = 1 for all bits. In this case, all
load capacitors are connected together through the conducting pass transistors, so load capacitance ∝ n.
The discharge of capacitors is through n series connected pass
transistors, so average R is ∝ n.
Thus in the worst case, the delay $\propto RC \propto n^2$.
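The $n^2$ behaviour can be reproduced with the Elmore delay approximation, which the slides do not use explicitly; treat it as an assumed model. Each of the n conducting pass transistors is modelled as a resistance r and each chain node as a capacitance c, with the discharge path to ground at one end of the chain.

```python
def elmore_chain_delay(n: int, r: float = 1.0, c: float = 1.0) -> float:
    """Elmore delay at the far node of an RC ladder of n identical sections.

    Node k (k = 1..n, counted from the grounded end) shares k series
    resistances with the path to the far node, so it contributes k*r*c.
    Total = r*c*n*(n+1)/2, which grows as n**2.
    """
    return sum(k * r * c for k in range(1, n + 1))


for n in (4, 8, 16, 32):
    print(n, elmore_chain_delay(n))
```

Doubling the chain length roughly quadruples the delay (36 units for 8 stages against 136 for 16), which is why the number of bits connected this way has to be limited.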
Dinesh Sharma (IIT B) Adders October 16, 2022 22 / 66
Carry Bypass Adder
The worst case for addition occurs when P = 1 for all bits and carry has to
ripple through all bits.
In carry bypass adder, we form groups of bits and if P = 1 for all members
of a group, we pass on the carry input to this group directly to the input of
the next group, without having to ripple through each bit.
This improves the worst case delay of the adder.
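$bypass = P_0 \cdot P_1 \cdot P_2 \cdot P_3$
[Figure: 4-bit Manchester carry chain (P0 to P3, G0 to G3, clock Ck, carries Cin0 and Cout0 to Cout3) with a bypass path controlled by the bypass signal]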
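A functional sketch of the bypass idea, assuming 16-bit operands split into 4-bit groups (both are illustrative parameters). Within a group the carry still ripples, since the sum bits need it; the bypass only decides where the group's carry-out is taken from. The sketch counts how many groups were bypassed rather than modelling transistor-level timing.

```python
def carry_bypass_add(x: int, y: int, cin: int, n: int = 16, k: int = 4):
    """Add two n-bit numbers in k-bit groups with carry bypass.

    Returns (result, groups_bypassed). The numerical result equals ordinary
    addition; the bypass only changes where each group's carry-out is taken
    from, which is what shortens the worst-case path in hardware.
    """
    assert n % k == 0, "this sketch assumes n is a multiple of k"
    carry = cin
    result = 0
    bypassed = 0
    for base in range(0, n, k):
        c = carry
        props = []
        for i in range(base, base + k):
            a, b = (x >> i) & 1, (y >> i) & 1
            p, g = a ^ b, a & b
            props.append(p)
            result |= (p ^ c) << i        # sum bits still need the rippled carry
            c = g | (p & c)               # ripple inside the group
        bypass = all(props)               # bypass = P0.P1...P(k-1) for this group
        if bypass:
            bypassed += 1
        carry = carry if bypass else c    # mux: group carry-in, or rippled carry-out
    return result | (carry << n), bypassed


print(carry_bypass_add(0xFFFF, 0x0000, 1))   # (65536, 4): every group bypassed
```

The all-propagate input used in the usage line is exactly the worst case named above: without the bypass, the carry would ripple through all 16 bits.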
One can make a fast adder at the cost of some added complexity, by
implementing two adders, one assuming that Cin = 0 and the other
assuming that Cin = 1.
When the actual carry input arrives at this bit, it chooses the correct one
using a multiplexer, depending on its value.
Since Cout = G + P · Cin , the two cases are:
For Cin = 0, Cout = G = A · B
For Cin = 1, Cout = G + P = A · B + A ⊕ B = A + B
Thus the two candidates for Cout are quite easy to generate, being just the
AND/OR of A and B.
This concept can be extended to multi-bit carry select adders.
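[Figure: carry select block: G, P, K generated from a and b; two m-bit sub-adders with carry inputs (0) and (1); a multiplexer driven by the actual Cin selects Cout, ready at (m+2) unit delay times]
The two m-bit sub-adders assume the carry to be 0 or 1 respectively.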
The two alternatives for the carry output are ready at (m+1) units of time.
If the actual Cin is available at n units of time, the output will be available
at (m+2) or (n+1), whichever is later.
For 4-bit sub-adders, this is at 6 units of time or at Cin arrival + 1,
whichever is later.
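Under the unit-delay model used here, the timing rule reduces to a single expression. This small Python helper (a hypothetical name) makes it explicit:

```python
def select_output_time(m: int, cin_arrival: int) -> int:
    """Time (in unit delays) at which an m-bit carry select block's output is valid.

    Both candidate results are ready at m + 1; the multiplexer needs one more
    unit once both the candidates and the actual carry are present.
    """
    candidates_ready = m + 1
    return max(candidates_ready, cin_arrival) + 1   # = max(m + 2, cin_arrival + 1)


print(select_output_time(4, 0))   # carry already there: 6
print(select_output_time(4, 9))   # late carry dominates: 10
```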
The first stage of stacked Carry Select adders is different from the rest.
In this case, we do not have to wait for Cin to arrive – it is already known.
Therefore we do not have to use redundant adders – a single m bit adder
will do.
Since no multiplexing is required, the output of the first stage is ready at
(m + 1) units of time, rather than at (m + 2).
This is convenient – because the two alternatives of the second stage are
also ready at (m + 1) units of time.
Linear Stacking
Square-root Stacking
Can we speed up the adder if we don’t use the same no. of bits in every
stage?
In linear stacking, since all adders are identical, they are ready with their
alternative outputs at the same time.
But the carry arrives later and later at each successive group of carry
select adders.
We could have used this extra time to add up more bits in the later
stages, and still be ready with the alternative results before carry arrives!
Since the carry arrives one unit of time later at each successive group,
each successive group could be longer by one bit.
$n = m_0 + m_0 + (m_0 + 1) + (m_0 + 2) + \cdots + (m_0 + s - 1) = m_0 + \frac{s\,(m_0 + m_0 + s - 1)}{2}$
where s is the number of stages following the first one without carry
select.
The total delay will be m0 + 1 for the first stage. Each subsequent stage
takes just 1 unit of time since the candidates for selection are available
just in time.
The time taken is just $m_0 + s + 1$ units. When $s \gg m_0$, we have $n \approx s^2/2$,
while the time taken is nearly s.
Thus the time taken to add n bits is $\approx \sqrt{2n}$.
Our sum will be ready at 11 units of time, which is faster. This gain will be much higher for
wider additions.
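The two stacking schemes can be compared under the same unit-delay model: a plain first stage is ready at m + 1, and each later stage of m bits finishes one unit after both its candidates (ready at m + 1) and its incoming carry are available. The Python helpers below are hypothetical, and the 32-bit width and group sizes are illustrative choices, not the example from the slides.

```python
def stacked_delay(sizes: list[int]) -> int:
    """Completion time of a chain of stacked adder stages (unit-delay model).

    sizes[0] is the first stage, a plain adder ready at m + 1 with no mux.
    Every later stage of m bits has candidates ready at m + 1 and needs
    one mux delay after both the candidates and the incoming carry exist.
    """
    t = sizes[0] + 1
    for m in sizes[1:]:
        t = max(m + 1, t) + 1
    return t


def sqrt_sizes(n: int, m0: int) -> list[int]:
    """Stage sizes m0, m0, m0+1, m0+2, ..., trimming the last stage to fit n bits."""
    sizes = [m0]
    nxt = m0
    while sum(sizes) < n:
        sizes.append(min(nxt, n - sum(sizes)))
        nxt += 1
    return sizes


linear = [4] * 8                   # 32 bits in identical 4-bit groups
square_root = sqrt_sizes(32, 2)    # [2, 2, 3, 4, 5, 6, 7, 3]
print(stacked_delay(linear), stacked_delay(square_root))   # 12 10
```

In the square-root schedule every stage except the trimmed last one finishes exactly when its carry arrives, which is the "just in time" condition described above.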
Tree Adders
Terminology
Once the highest order P and G values have been generated, the final
carry can be computed in one step from the input carry.
The final result contains all the sum bits and the final carry. So it may
appear that we do not need the intermediate carries at each bit.
However, the sum bits depend on internal carries. The sum bits are given
by:
Si = Ai ⊕ Bi ⊕ Ci = Pi ⊕ Ci
Thus we do need the internal bit-wise carries for sum generation.
The group size over which the carry can be computed directly doubles
each time we use a higher order for G and P values.
On the other hand, the time to compute the required higher order G and
P values increments by one gate delay.
(time to compute A + B · C for G and A · B for P).
This results in the ultimate time to generate all the P and G values
being logarithmic in the number of bits being added.
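The merge rule behind these higher-order values is an associative operator on (G, P) pairs, which is what allows the values to be combined in a tree of logarithmic depth. The Python sketch below (hypothetical helper `combine`, with $P = A \oplus B$ as in the sum equation above) checks associativity and confirms that a 4-bit group merged in two tree levels gives the correct carry-out.

```python
from itertools import product


def combine(upper: tuple[int, int], lower: tuple[int, int]) -> tuple[int, int]:
    """Merge (G, P) of an upper bit range with (G, P) of the adjacent lower range:
    G = Gu + Pu.Gl,  P = Pu.Pl."""
    gu, pu = upper
    gl, pl = lower
    return gu | (pu & gl), pu & pl


# The operator is associative, so ranges can be merged in any tree shape.
pairs = list(product((0, 1), repeat=2))
for a, b, c in product(pairs, repeat=3):
    assert combine(combine(a, b), c) == combine(a, combine(b, c))

# Group (G, P) of a 4-bit range gives its carry-out for either carry-in.
for x, y, cin in product(range(16), range(16), (0, 1)):
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(4)]
    # Two tree levels: (3:2) with (1:0)
    g, p = combine(combine(gp[3], gp[2]), combine(gp[1], gp[0]))
    assert (g | (p & cin)) == ((x + y + cin) >> 4) & 1
```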
Logarithmic Adders
Using P and G values of different orders, we can compute the bit wise
carry and sum values.
Notice that in logarithmic adders, internal bit-wise sum and carry values
may be available after the final carry.
Thus the critical path is not the generation of the final carry, but that of
bit-wise sums.
Different architectures have been described in literature for the order of
computation of G, P, Cout and Sum bits.
All of these compute the final result in times which are logarithmic
functions of the number of bits.
For wide adders, these can be much faster than other architectures.
The figure below shows the generation of P and G values for an 8 bit
adder.
[Figure: tree generating P and G values of successively higher order from inputs a7 b7 down to a0 b0, ending in $P^4_{7,0}$, $G^4_{7,0}$]
In the next step, we use second order P, G values to generate $P^3_{4i+3,4i}$, $G^3_{4i+3,4i}$
with i = 0, 1.
$G^3_{7,4} = G^2_{7,6} + P^2_{7,6} \cdot G^2_{5,4}, \quad P^3_{7,4} = P^2_{7,6} \cdot P^2_{5,4}$
$G^3_{3,0} = G^2_{3,2} + P^2_{3,2} \cdot G^2_{1,0}, \quad P^3_{3,0} = P^2_{3,2} \cdot P^2_{1,0}$
Finally, using $G^3_{4i+3,4i}$ and $P^3_{4i+3,4i}$ (with i = 0, 1) we can compute $P^4_{7,0}$, $G^4_{7,0}$:
$G^4_{7,0} = G^3_{7,4} + P^3_{7,4} \cdot G^3_{3,0}$
$P^4_{7,0} = P^3_{7,4} \cdot P^3_{3,0}$
Once P and G terms of various orders are known, we can compute the values
of carry outputs which depend on these and the input carry C0 , which is
available at t = 0.
$C_1 = G^1_0 + P^1_0 \cdot C_0, \quad C_2 = G^2_{1,0} + P^2_{1,0} \cdot C_0$
$C_4 = G^3_{3,0} + P^3_{3,0} \cdot C_0, \quad C_8 = G^4_{7,0} + P^4_{7,0} \cdot C_0$
When these carry values are valid, the other carry values which depend on
these can be generated.
$C_7 = G^1_6 + P^1_6 \cdot C_6$
With all carry values generated, the corresponding sum values can be
calculated using the relation $Sum_i = P^1_i \oplus C_i$.
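The 8-bit procedure above (orders 1 to 4, then the carries computed directly from $C_0$, then the remaining carries and the sums) can be traced in software. This Python sketch uses the same order numbering as these slides (order 1 = single bit); the helper names are hypothetical, and the sampled check at the end compares against ordinary integer addition.

```python
def log_adder_8(x: int, y: int, c0: int) -> int:
    """8-bit adder using the slides' orders: 1 = 1 bit, 2 = 2 bits, 3 = 4 bits, 4 = 8 bits."""

    def combine(upper, lower):
        """G = Gu + Pu.Gl, P = Pu.Pl for two adjacent ranges."""
        (gu, pu), (gl, pl) = upper, lower
        return gu | (pu & gl), pu & pl

    def carry(gp, cin):
        """C = G + P.Cin for a range whose (G, P) is gp."""
        g, p = gp
        return g | (p & cin)

    # Order 1: single-bit generate and propagate
    o1 = [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(8)]
    # Orders 2, 3, 4, keyed by the lowest bit index of each range
    o2 = {i: combine(o1[i + 1], o1[i]) for i in (0, 2, 4, 6)}
    o3 = {i: combine(o2[i + 2], o2[i]) for i in (0, 4)}
    o4 = combine(o3[4], o3[0])

    c = [0] * 9
    c[0] = c0
    # Carries available directly from C0
    c[1] = carry(o1[0], c[0])
    c[2] = carry(o2[0], c[0])
    c[4] = carry(o3[0], c[0])
    c[8] = carry(o4, c[0])
    # Carries that depend on those above, e.g. C7 = G^1_6 + P^1_6.C6
    c[3] = carry(o1[2], c[2])
    c[5] = carry(o1[4], c[4])
    c[6] = carry(o2[4], c[4])
    c[7] = carry(o1[6], c[6])

    # Sum_i = P^1_i xor C_i
    total = sum((o1[i][1] ^ c[i]) << i for i in range(8))
    return total | (c[8] << 8)


for x in range(256):
    for y in range(0, 256, 5):
        for c0 in (0, 1):
            assert log_adder_8(x, y, c0) == x + y + c0
```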
Notice that G values are computed by the same logic relation as carry
outputs.
The input carry C0 is known at the start itself.
Whenever the carry is already known, we can replace $G_l$ by this carry.
The computed value of $G_{(u:l)}$ will then be the carry output, rather than the
G value. This value can be used for further G calculations and will directly
give the carry each time.
This can reduce the computation required to generate the carry and sum
values since some of the carry values are already available.
We use a unit time model in which we assume that logic functions AND,
XOR, A + B.C as well as A.B + C.(A+B) take the same amount of time,
which defines 1 slot of time for this tutorial.
The single bit G and P values (designated as order 0) are given by
$P^0_i = a_i \oplus b_i, \quad G^0_i = a_i \cdot b_i$
An exception is made for the least significant bit of G because for this bit,
the input carry is known at the start.
We make use of this and compute effectively the carry output from bit 0
(c1 ) and map the output carry as if it was due to a generate signal at this
position. Thus,
$G^0_0 = c_1 = a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$
All these functions can be computed in one unit of time directly from ai , bi
and input carry c0 . So these are all ready at the end of the first time slot.
Since $c_1 = G^0_0$, $c_1$ is also ready at the end of the first slot.
We can define G and P functions which operate over multiple bits. Higher
order G and P values are computed as
$G = G_u + P_u \cdot G_l, \quad P = P_u \cdot P_l$
where u and l stand for upper half range and lower half range for a range
of bit indices.
These can be computed within one time slot from the next lower order G
and P values. Thus higher orders of G and P values, (successively
covering twice the range of indices for the previous order) will be
available in each time slot.
Internal carries are computed using functions like C = G + P · Cin .
Depending on the order of G and P values, we can compute carry values
whose indices are 1, 2, 4, 8 . . . bits higher than the input carry. This
computation also takes one time slot, but can be performed only after the
needed Cin , P and G values are available.
G and P values for single bits are available at the end of first slot.
G and P values spanning groups of 2 bits are available at the end of
second slot. G and P values spanning groups of 4 bits are available at
the end of third slot. G and P values spanning groups of 8 bits are
available at the end of fourth slot. G and P values spanning groups of 16
bits are available at the end of fifth slot.
Finally, G and P values spanning the full word of 32 bits are available at
the end of sixth slot.
G and P values of order n are available over spans of $2^n$ bits. The start bit for these
spans has a granularity of $2^n$ bits. For example, second order values
connect 0 → 4, 4 → 8 etc. We cannot connect using these from 1 → 5 in
a Brent Kung adder.
The lowest index G value for any order i is automatically the carry value
for bit index $2^i$, that is, $G^i_{(2^i-1:0)} = c_{2^i}$.
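Both devices, folding $c_0$ into $G^0_0$ and reading $c_{2^i}$ off the lowest-index G of each order, can be checked with a short up-sweep sketch. This Python code (hypothetical names `up_sweep` and `true_carry`) builds only the order 0 to 5 values for 32 bits; it does not model the Brent Kung carry distribution or its timing.

```python
import random


def up_sweep(a: int, b: int, c0: int, width: int = 32):
    """Return levels[k] = list of (G, P) over aligned 2**k-bit spans, lowest span first.

    Order 0 uses P = a xor b and G = a.b, except G at bit 0, which is replaced
    by c1 = a0.b0 + c0.(a0 + b0) as in the tutorial.
    """
    level = []
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = ai & bi
        if i == 0:
            g = g | (c0 & (ai | bi))      # G^0_0 is really the carry c1
        level.append((g, ai ^ bi))
    levels = [level]
    while len(level) > 1:
        level = [
            (gu | (pu & gl), pu & pl)     # G = Gu + Pu.Gl, P = Pu.Pl
            for (gl, pl), (gu, pu) in zip(level[0::2], level[1::2])
        ]
        levels.append(level)
    return levels


def true_carry(a: int, b: int, c0: int, i: int) -> int:
    """Carry into bit i, read from ordinary integer addition."""
    mask = (1 << i) - 1
    return (((a & mask) + (b & mask) + c0) >> i) & 1


rng = random.Random(2022)
for _ in range(1000):
    a, b, c0 = rng.getrandbits(32), rng.getrandbits(32), rng.getrandbits(1)
    for k, level in enumerate(up_sweep(a, b, c0)):
        # Lowest-index G of order k is the carry into bit 2**k
        assert level[0][0] == true_carry(a, b, c0, 2 ** k)
```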
At time = 5, all 16 bit P and G values ($P^4$ and $G^4$) have been computed.
$c_{16} = G^4_{(15:0)}$ is also available.
$c_7 \leftarrow c_6$ using $G^0_6$, $P^0_6$ and $c_6$; $c_9 \leftarrow c_8$ using $G^0_8$, $P^0_8$ and $c_8$;
$c_{10} \leftarrow c_8$ using $G^1_{(9:8)}$, $P^1_{(9:8)}$ and $c_8$;
$c_{12} \leftarrow c_8$ using $G^2_{(11:8)}$, $P^2_{(11:8)}$ and $c_8$ are all available.
At time = 6, $G^5_{(31:0)}$ is generated. This is the value of $c_{32} = C_{out}$.
$P^5_{(31:0)}$ is not required.
$c_{11} \leftarrow c_{10}$ using $G^0_{10}$, $P^0_{10}$ and $c_{10}$; $c_{13} \leftarrow c_{12}$ using $G^0_{12}$, $P^0_{12}$ and $c_{12}$;
$c_{14} \leftarrow c_{12}$ using $G^1_{(13:12)}$, $P^1_{(13:12)}$ and $c_{12}$;
$c_{17} \leftarrow c_{16}$ using $G^0_{16}$, $P^0_{16}$ and $c_{16}$;
$c_{18} \leftarrow c_{16}$ using $G^1_{(17:16)}$, $P^1_{(17:16)}$ and $c_{16}$;
$c_{20} \leftarrow c_{16}$ using $G^2_{(19:16)}$, $P^2_{(19:16)}$ and $c_{16}$; and
$c_{24} \leftarrow c_{16}$ using $G^3_{(23:16)}$, $P^3_{(23:16)}$ and $c_{16}$ have all been computed.
[Figure: Brent Kung schedule for the 32-bit adder: for each carry input (bit numbers 0 to 31, with Cin at bit 0) the time slot (0 to 9) in which it is generated, alongside the G0/P0 to G5 levels computed in slots 1 to 6]
$P^0_i = a_i \oplus b_i, \quad G^0_i = a_i \cdot b_i$ †
† $G^0_0$ is generated as $a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$.
$c_1 = G^0_0 = 1$
In the second slot, we generate P and G values spanning two bits each.
From now on,
$P^{m+1}_{range} = P^m_u \cdot P^m_l, \quad G^{m+1}_{range} = G^m_u + P^m_u \cdot G^m_l,$
where u represents the upper half range and l represents the lower half range.
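Order 1 values (2-bit spans, listed from bits 31:30 down to 1:0) and order 2 values (4-bit spans, bits 31:28 down to 3:0):
P1  10 01 10 11 11 00 00 11
G1  01 00 01 00 00 10 00 01
P2  0 0 0 1 1 0 0 1
G2  1 0 1 0 0 1 0 1
$c_4 = G^2_{3-0} = 1$. We can also compute
$c_3 = G^0_2 + P^0_2 \cdot c_2 = 0 + 1 \cdot 1 = 1$,
$s_2 = P^0_2 \oplus c_2 = 1 \oplus 1 = 0$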
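Order 3 values (8-bit spans, bits 31:24 down to 7:0):
P3  0 0 0 0
G3  1 1 1 0
$c_8 = G^3_{7-0} = 0$. We can also compute
$c_5 = G^0_4 + P^0_4 \cdot c_4 = 0 + 1 \cdot 1 = 1$, $c_6 = G^1_{5-4} + P^1_{5-4} \cdot c_4 = 0 + 0 \cdot 1 = 0$.
$s_3 = P^0_3 \oplus c_3 = 1 \oplus 1 = 0$, $s_4 = P^0_4 \oplus c_4 = 1 \oplus 1 = 0$.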
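Order 4 values (16-bit spans, bits 31:16 and 15:0):
P4  0 0
G4  1 1
$c_{16} = G^4_{15-0} = 1$. We can also compute
$c_7 = G^0_6 + P^0_6 \cdot c_6 = 0 + 0 \cdot 0 = 0$, $c_9 = G^0_8 + P^0_8 \cdot c_8 = 0 + 0 \cdot 0 = 0$,
$c_{10} = G^1_{9-8} + P^1_{9-8} \cdot c_8 = 0 + 0 \cdot 0 = 0$,
$c_{12} = G^2_{11-8} + P^2_{11-8} \cdot c_8 = 1 + 0 \cdot 0 = 1$.
$s_5 = P^0_5 \oplus c_5 = 0 \oplus 1 = 1$, $s_6 = P^0_6 \oplus c_6 = 0 \oplus 0 = 0$.
$s_8 = P^0_8 \oplus c_8 = 0 \oplus 0 = 0$.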
In the sixth slot, we compute $G^5_{31-0} = G^4_{31-16} + P^4_{31-16} \cdot G^4_{15-0}$.
$P^5_{31-0}$ is not required.
This gives $C_{out} = c_{32} = G^5_{31-0} = 1$. We can further compute:
$c_{11} = G^0_{10} + P^0_{10} \cdot c_{10} = 0 + 0 \cdot 0 = 0$,
$c_{13} = G^0_{12} + P^0_{12} \cdot c_{12} = 0 + 1 \cdot 1 = 1$,
$c_{14} = G^1_{13-12} + P^1_{13-12} \cdot c_{12} = 0 + 1 \cdot 1 = 1$,
$c_{17} = G^0_{16} + P^0_{16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{18} = G^1_{17-16} + P^1_{17-16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{20} = G^2_{19-16} + P^2_{19-16} \cdot c_{16} = 0 + 1 \cdot 1 = 1$,
$c_{24} = G^3_{23-16} + P^3_{23-16} \cdot c_{16} = 1 + 0 \cdot 1 = 1$
$s_7 = P^0_7 \oplus c_7 = 1 \oplus 0 = 1$, $s_9 = P^0_9 \oplus c_9 = 0 \oplus 0 = 0$,
$s_{10} = P^0_{10} \oplus c_{10} = 0 \oplus 0 = 0$, $s_{12} = P^0_{12} \oplus c_{12} = 1 \oplus 1 = 0$,
$s_{16} = P^0_{16} \oplus c_{16} = 1 \oplus 1 = 0$.
In the seventh slot, all the required values of P and G are already available.
We can compute:
$c_{15} = G^0_{14} + P^0_{14} \cdot c_{14} = 0 + 1 \cdot 1 = 1$, $c_{19} = G^0_{18} + P^0_{18} \cdot c_{18} = 0 + 1 \cdot 1 = 1$,
$c_{21} = G^0_{20} + P^0_{20} \cdot c_{20} = 0 + 0 \cdot 1 = 0$, $c_{22} = G^1_{21-20} + P^1_{21-20} \cdot c_{20} = 1 + 0 \cdot 1 = 1$,
$c_{25} = G^0_{24} + P^0_{24} \cdot c_{24} = 0 + 1 \cdot 1 = 1$, $c_{26} = G^1_{25-24} + P^1_{25-24} \cdot c_{24} = 0 + 1 \cdot 1 = 1$,
$c_{28} = G^2_{27-24} + P^2_{27-24} \cdot c_{24} = 0 + 0 \cdot 1 = 0$
$s_{11} = P^0_{11} \oplus c_{11} = 0 \oplus 0 = 0$, $s_{13} = P^0_{13} \oplus c_{13} = 1 \oplus 1 = 0$,
$s_{14} = P^0_{14} \oplus c_{14} = 1 \oplus 1 = 0$, $s_{17} = P^0_{17} \oplus c_{17} = 1 \oplus 1 = 0$,
$s_{18} = P^0_{18} \oplus c_{18} = 1 \oplus 1 = 0$, $s_{20} = P^0_{20} \oplus c_{20} = 0 \oplus 1 = 1$,
$s_{24} = P^0_{24} \oplus c_{24} = 1 \oplus 1 = 0$.
In the ninth slot, we can compute $c_{31} = G^0_{30} + P^0_{30} \cdot c_{30} = 0 + 1 \cdot 1 = 1$,
and the sum values
$s_{23} = P^0_{23} \oplus c_{23} = 1 \oplus 1 = 0$,
$s_{27} = P^0_{27} \oplus c_{27} = 0 \oplus 1 = 1$,
$s_{29} = P^0_{29} \oplus c_{29} = 1 \oplus 1 = 0$,
$s_{30} = P^0_{30} \oplus c_{30} = 1 \oplus 1 = 0$.
Finally in the tenth slot, we can evaluate $s_{31} = P^0_{31} \oplus c_{31} = 1 \oplus 1 = 0$.
Thus we have (bits 31 down to 0; the Cin row is the carry into each bit):
Cin   1110 1111 1101 1111 1111 0000 0011 1111
a     1011 0111 1010 0101 0110 1000 1001 0011
b     0101 0000 0110 1010 1001 1000 0000 1100
sum   0000 1000 0001 0000 0000 0000 1010 0000
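The table can be cross-checked with plain integer arithmetic. Since $s_i = a_i \oplus b_i \oplus c_i$, the carry into each bit is recovered as $a_i \oplus b_i \oplus s_i$; the Python sketch below uses that identity rather than any adder structure.

```python
a = 0b1011_0111_1010_0101_0110_1000_1001_0011
b = 0b0101_0000_0110_1010_1001_1000_0000_1100
c0 = 1                                   # input carry (rightmost bit of the Cin row)

total = a + b + c0
sum_bits = total & 0xFFFF_FFFF           # s31 ... s0
cout = total >> 32                       # c32

# s_i = a_i xor b_i xor c_i, so c_i = a_i xor b_i xor s_i for every bit
carries = a ^ b ^ sum_bits

assert carries == 0b1110_1111_1101_1111_1111_0000_0011_1111
assert sum_bits == 0b0000_1000_0001_0000_0000_0000_1010_0000
assert cout == 1
print(f"{carries:032b}\n{sum_bits:032b}\n{cout}")
```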
Serial Adders
Up to now, we have been concerned with making fast adders, even at the cost
of increased complexity and power.
In many applications, speed is not as important as low power consumption
and low cost.
Serial adders are an attractive option in such cases.
A single full adder is used.
If numbers to be added are available in parallel form, these can be serialized
using shift registers.
A single full adder adds the incoming bits. Bits to be added are fed to it
serially, LSB first.
The sum bit goes to the output while carry is stored in a flip-flop.
Carry then gets added to the more significant bits which arrive next.
Output can be converted to parallel form if needed, using another shift
register.
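[Figure: serial adder: A and B operand shift registers feed a single full adder; Cout is held in a carry latch (Cprev) and returned through a multiplexer (Load, Csel, Cy) as Cin; the Sum output goes to an output shift register]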
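A cycle-by-cycle sketch of the serial adder, in Python. Each loop iteration stands for one clock cycle; the function name, the integer encoding of the shift registers and the way the initial carry is loaded are illustrative assumptions rather than details taken from the figure.

```python
def serial_add(x: int, y: int, n: int, cin: int = 0) -> int:
    """Bit-serial addition of two n-bit operands, LSB first.

    One loop iteration models one clock cycle: the operand shift registers
    present bit i, a single full adder combines it with the stored carry,
    the sum bit goes to the output register, and the carry flip-flop
    captures Cout for the next cycle.
    """
    carry_ff = cin                                  # carry flip-flop, loaded before the first cycle
    out_reg = 0                                     # output register, filled LSB first
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1           # bits leaving the operand registers
        s = a ^ b ^ carry_ff                        # the single full adder
        cout = (a & b) | (carry_ff & (a | b))
        out_reg |= s << i                           # sum bit placed into the output register
        carry_ff = cout                             # carry kept for the next, more significant bit
    return out_reg | (carry_ff << n)


print(serial_add(0b1011, 0b0110, 4))   # 11 + 6 = 17
```

An n-bit addition takes n clock cycles, in exchange for needing only one full adder and one flip-flop of carry storage.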