Professional Documents
Culture Documents
Parallel Prefix Adders Presentation
Parallel Prefix Adders Presentation
Parallel Prefix Adders Presentation
Overview of presentation
Parallel prefix operations Binary addition as a parallel prefix operation Prefix graphs Adder topologies Summary
Prefix: The outcome of the operation depends on the initial inputs. Parallel: Involves the execution of an operation in parallel. This is done by segmentation into smaller pieces that are computed in parallel. Operation: Any arbitrary primitive operator that is associative is parallelizable
With B = BnBn-1 B1
B1 = A1 B2 = A1 * A2 B3 = A1 * A1 * A3 and * here is the integer addition operation.
6 +
4 + +
2 +
2 +
B = (A + A +3A3 =6
B1
B6
B5
B4
B3
Binary Addition
c3
This is the pen and paper addition of two 4-bit binary numbers x and y. c represents the generated carries. s represents the produced sum bits. A stage of the addition is the set of x and y bits being used to produce the appropriate sum and carry bits. For example the highlighted bits x2, y2 constitute stage 2 which generates carry c2 and sum s2 .
x3 y3
c2
x2
c1
x1 y1 s1
c0
x0 y0 s0
y2 s2
s4 s3
Each stage i adds bits ai, bi, ci-1 and produces bits si, ci The following hold:
ai 0 0 1 1 bi 0 1 0 1 ci 0 ci-1 ci-1 1 Comment: The stage kills an incoming carry. The stage propagates an incoming carry The stage propagates an incoming carry The stage generates a carry out Generate bit: Formal definition: Kill bit: Propagate bit:
k i = xi + y i
p i = xi y i
g i = xi y i
Binary Addition
ai 0 0 1 1 bi 0 1 0 1 ci 0 ci-1 ci-1 1 Comment: The stage kills an incoming carry. The stage propagates an incoming carry The stage propagates an incoming carry The stage generates a carry out Generate bit: Formal definition: Kill bit: Propagate bit:
k i = xi + y i p i = xi y i g i = xi y i
ci = g i + pi ci 1 = xi yi + ( xi yi ) ci 1
ci = xi yi + ( xi + yi ) ci 1 = g i + ai ci 1
The ai term in the equation being the alive bit. The later form of the equation uses an OR gate instead of an XOR which is a more efficient gate when implemented in CMOS technology. Note that:
ai = k i
Where ki is the kill bit defined in the table above.
Final sum.
The pre-calculation stage is implemented using the equations for pi, gi shown at a previous slide:
x2y2 x1y1 x0y0
g2
p2
g1
p1
g0
p0
g2
a2
g1
a1
g0
a0
Note the symmetry when we use the propagate or the alive bit We can use them interchangeably in the equations!
The carry calculation stage is implemented using the equations produced when unfolding the recursive equation:
ci = g i + pi ci 1 = g i + ai ci 1
c0 = g 0 c2 = g 2 + p2 c1 = g 2 + p2 ( g1 + p1 g 0 ) = g 2 + p2 g1 + p2 p1 g 0 etc K c1 = g1 + p1 g 0
g2p2
g1p1
g0p0
c2
c1
c0
The final sum calculation stage is implemented using the carry and propagate bits ci,pi:
si = pi ci 1 , with pi = xi yi Note : si = g i + ai ci 1 , with ai = xi + yi
c2p3
c1p2
c0p1
cinp0
s3
s2
s1
s0
If the alive bit ai is used the final sum stage becomes more complex as implied by the equations above.
We define a new operator: Input is a vector of pairs of propagate and generate bits:
(g n , pn )(g n1 , pn1 )K (g 0 , p0 )
Properties of operator :
Easy to prove based on the fact that the logical AND, OR operations are associative.
(Gi , Pi ) = ( g i , pi ) o (Gi 1 , Pi 1 ) Where (G1 , P 1 ) = ( g1 , p1 )
Illustration on
b3 b2
Where : (G0 , P0 ) = ( g 0 , p0 )
The familiar carry bit generating equations for stage i in a CLA adder.
(G3 , P3 ) = ( g 3 , p3 ) o (G2 , P2 ) = ( g 3 + p3 ( g 2 + p2 g1 ), p3 p2 p1 )
The parallel prefix adder employs the 3-stage structure of the CLA adder. The improvement is in the carry generation stage which is the most intensive one:
Pre-calculation of Pi, Gi terms Calculation of the carries. This part is parallelizable to reduce time.
Straight forward as in the CLA adder
Prefix graphs can be used to describe the structure that performs this part.
Straight forward as in the CLA adder
(g
in1
, pin1
buffer component:
(g in , pin )
( g in2 , pin2 )
(g out , pout )
(g out , pout )
+ pin 1 g in2 , pin1 pin 2
(g out , pout )
(g out , pout )
(g out , pout ) = (g in
c8
c7
c6
c5
c4
c3
c2
c1
J. Sklansky conditional adder Kogge-Stone adder Ladner-Fisher adder Brent-Kung adder Han Carlson adder S. Knowles
c8
c7
c6
c5
c4
c3
c2
c1
c8
c7
c6
c5
c4
c3
c2
c1
Low depth High node count (implies more area). Minimal fan-out of 1 at each node (implies faster performance).
c8
c7
c6
c5
c4
c3
c2
c1
c8
c7
c6
c5
c4
c3
c2
c1
Maximum logic depth in PP adders (implies longer calculation time). Minimum number of nodes (implies minimum area).
The Han-Carlson adder combines the Brent-Kung and Kogge-Stone structures into a hybrid structure.
1999: S. Knowles
Depth, interconnect, area. These adders are bound by the Lander-Fischer (minimum depth) and Brent-Kung (minimum fanout) topologies.
An interesting taxonomy:
Harris[2003] presented an interesting 3-D taxonomy of the adders presented so far. Each axis represents a characteristic of the adders:
-Fanout -Logic depth -Wire connections
They are based on a different set of equations. The new set of equations introduces the following tradeoffs:
Precalculation of Pi, Gi terms is based on more complex equations
2001: Beaumont-Smith
(p8, g8) (p7, g7) (p6, g6) (p5, g5) (p4, g4) (p3, g3) (p2, g2) (p1, g1)
c8
c7
c6
c5
c4
c3
c2
c1
The Beaumont-Smith adders incorporate nodes that can accept more than a pair of inputs and produce the carry calculation. These higher valency nodes are optimized circuits for a specific technology (CMOS). The above topology is a Beaumont-Smith tree based on the Kogge-Stone architecture
Summary (1/3)
The parallel prefix formulation of binary addition is a very convenient way to formally describe an entire family of parallel binary adders.
Summary (2/3)
There exist various architectures for the carry calculation part. Trade-offs in these architectures involve the
area of the adder its depth the fan-out of the nodes the overall wiring network.
Summary (3/3)
Variations of parallel adders have been proposed. These variations are based on:
Modifying
the carry generation equations and reformulating the prefix definition (Ling) Restructuring the carry calculation trees based by optimizing for a specific technology (BeaumondSmith) Other optimizations.
References:
Beaumont-Smith, Cheng-Chew Lim, Parallel Prefix Adder Design, IEEE, 2001 Han, Carlson, Fast Area-Efficient VLSI Adders, IEEE, 1987 Dimitrakopoulos, Nikolos, High-Speed Parallel-Prefix VLSI Ling Adders, IEEE 2005 Kogge, Stone, A Parallel Algorithm for the Efficient solution of a General Class of Recurrence equations, IEEE, 1973 Simon Knowles, A Family of adders, IEEE, 2001 Ladner, Fischer, Parallel Prefix Computation, ACM, 1980 Brent, Kung, A regular Layout for Parallel Adders, IEEE, 1982 H. Ling, High-Speed Binary Adder, IBM J. Res. And Dev., 1980 J. Sklansky, Conditional-Sum Addition Logic, IRE transactions on computers, 1960 D. Harris, A Taxonomy of Parallel Prefix Networks, IEEE, 2003