Fast Convolution: Cook-Toom Algorithm


Fast Convolution

Dr. Arunachalam V
Associate Professor, SENSE
Fast Convolution
• Fast convolution algorithms use fewer multiplication operations.
• These algorithms belong to the class of algorithmic strength reduction.
• The number of strong operations such as multiplications is reduced, possibly at the
expense of an increase in the number of weaker operations such as additions.
• These algorithms are best suited for implementation using either programmable or dedicated
hardware.
Multiplication of two complex numbers in (x + jy) form
• Assume (a + jb)(c + jd) = (e + jf), where (a + jb) is a signal sample and
(c + jd) is a coefficient.
• This is expressed in matrix form:

\[
\begin{bmatrix} e \\ f \end{bmatrix} =
\begin{bmatrix} c & -d \\ d & c \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix}
\]

• This direct implementation requires 4 multiplications and 2 additions.
• The products can be rewritten so that only 3 distinct multiplications appear:

\[
e = ca - db = a(c - d) + d(a - b)
\]
\[
f = da + cb = b(c + d) + d(a - b)
\]
• The coefficient matrix can be decomposed as the product of three matrices as shown
below:

\[
\begin{bmatrix} c & -d \\ d & c \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} c - d & 0 & 0 \\ 0 & c + d & 0 \\ 0 & 0 & d \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix}
\]
What is the effect?
• (c − d) and (c + d) are assumed to be precomputed.
• Thus the algorithmic complexity has been reduced to 3 multiplications and 3 additions.
• One multiplication has been traded off for one addition.
• This leads to a reduction in hardware area.
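The three-multiplication rewrite above can be checked numerically. A minimal Python sketch (the function and variable names are illustrative, not from the slides):

```python
# Sketch of the 3-multiplication complex multiply; names are illustrative.
def complex_mult_3m(a, b, c, d):
    """Return (e, f) with e + jf = (a + jb)(c + jd), using 3 real
    multiplications instead of 4."""
    cd_minus = c - d          # precomputed once per coefficient
    cd_plus = c + d           # precomputed once per coefficient
    t = d * (a - b)           # shared product d(a - b)
    e = a * cd_minus + t      # e = a(c - d) + d(a - b) = ca - db
    f = b * cd_plus + t       # f = b(c + d) + d(a - b) = da + cb
    return e, f
```

With the coefficient (c + jd) fixed, `cd_minus` and `cd_plus` cost nothing at run time, so each sample needs 3 multiplications and 3 additions, matching the count above.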
Lagrange interpolation theorem
• The Cook-Toom algorithm is a linear convolution algorithm for polynomial multiplication.
• It is based on the Lagrange interpolation theorem.

Lagrange interpolation theorem


• Let β0, β1, β2, …, βn be a set of n + 1 distinct points, and let f(βi), for i = 0, 1, …, n, be given.
• There is exactly one polynomial f(p) of degree n or less that has value f(βi) when evaluated at βi for
i = 0, 1, …, n:

\[
f(p) = \sum_{i=0}^{n} f(\beta_i)\,
\frac{\prod_{j \ne i} (p - \beta_j)}{\prod_{j \ne i} (\beta_i - \beta_j)}
\]
Convolution of x(p) & h(p)
• Consider an N-point sequence h = {h0, h1, …, hN−1} and an L-point sequence
x = {x0, x1, …, xL−1}.
• The linear convolution of h and x can be expressed in terms of polynomial
multiplication as s(p) = h(p) x(p), where

\[
h(p) = h_{N-1} p^{N-1} + \cdots + h_1 p + h_0
\]
\[
x(p) = x_{L-1} p^{L-1} + \cdots + x_1 p + x_0
\]
\[
s(p) = s_{N+L-2} p^{N+L-2} + \cdots + s_1 p + s_0
\]

• The output polynomial s(p) has degree L + N − 2 and L + N − 1 different
coefficients.
• It can therefore be uniquely determined by its values at L + N − 1 different points.
• Let β0, β1, β2, …, βL+N−2 be a set of L + N − 1 different real numbers.
• If s(βi), for i = 0, 1, …, L + N − 2, are known, s(p) can be computed using the
Lagrange interpolation theorem as:

\[
s(p) = \sum_{i=0}^{L+N-2} s(\beta_i)\,
\frac{\prod_{j \ne i} (p - \beta_j)}{\prod_{j \ne i} (\beta_i - \beta_j)}
\]
Cook-Toom algorithm
1. Choose L + N − 1 different real numbers β0, β1, β2, …, βL+N−2.
2. Compute h(βi) and x(βi), for i = 0, 1, …, L + N − 2.
3. Compute s(βi) = h(βi) · x(βi), for i = 0, 1, …, L + N − 2.
4. Compute s(p) using

\[
s(p) = \sum_{i=0}^{L+N-2} s(\beta_i)\,
\frac{\prod_{j \ne i} (p - \beta_j)}{\prod_{j \ne i} (\beta_i - \beta_j)}
\]
The effect of CT algorithm
• The goal of the fast convolution algorithm is to reduce the multiplication
complexity.
• If βi, for i = 0, 1, 2, 3, …, L+N-2 are chosen properly, the computation in step
2 for evaluating h(βi) and x(βi) will involve some additions and multiplications
by small constants (such as positive and negative powers of 2).
• We can ignore these multiplication operations when βi’s are small numbers.
• But these operations may contribute to increased complexity for the larger
problem size.
• The number of multiplications is thus reduced from O(LN) to L + N − 1 at the
expense of an increase in the number of additions.
Example 1
Construct a 2×2 convolution algorithm using the Cook-Toom algorithm with β = 0, ±1.
Solution:
• Write the 2×2 convolution in polynomial multiplication form as s(p) = h(p) · x(p),
• where h(p) = h0 + h1 p, x(p) = x0 + x1 p and s(p) = s0 + s1 p + s2 p².
• A direct implementation can be expressed in matrix form as follows:

\[
\begin{bmatrix} s_0 \\ s_1 \\ s_2 \end{bmatrix} =
\begin{bmatrix} h_0 & 0 \\ h_1 & h_0 \\ 0 & h_1 \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \end{bmatrix}
\]

• This requires 4 multiplications and 1 addition.


Ex 1 – applying CT algorithm
• Compute h(βi), x(βi) and s(βi), for i = 0, 1, 2, since (L=2) + (N=2) − 2 = 2:

β0 = 0:  h(β0) = h0;       x(β0) = x0;       s(β0) = h(β0) x(β0)
β1 = 1:  h(β1) = h0 + h1;  x(β1) = x0 + x1;  s(β1) = h(β1) x(β1)
β2 = −1: h(β2) = h0 − h1;  x(β2) = x0 − x1;  s(β2) = h(β2) x(β2)

• Applying the Lagrange interpolation formula:

\[
s(p) = s(\beta_0)\frac{(p - \beta_1)(p - \beta_2)}{(\beta_0 - \beta_1)(\beta_0 - \beta_2)}
     + s(\beta_1)\frac{(p - \beta_0)(p - \beta_2)}{(\beta_1 - \beta_0)(\beta_1 - \beta_2)}
     + s(\beta_2)\frac{(p - \beta_0)(p - \beta_1)}{(\beta_2 - \beta_0)(\beta_2 - \beta_1)}
\]
\[
s(p) = s(\beta_0)\left(-p^2 + 1\right)
     + s(\beta_1)\frac{p^2 + p}{2}
     + s(\beta_2)\frac{p^2 - p}{2}
\]
\[
s(p) = s(\beta_0)
     + p\left(\frac{s(\beta_1)}{2} - \frac{s(\beta_2)}{2}\right)
     + p^2\left(-s(\beta_0) + \frac{s(\beta_1)}{2} + \frac{s(\beta_2)}{2}\right)
\]
\[
s(p) = s_0 + p\,s_1 + p^2 s_2
\]
Matrix – Vector form
s ( 0 ) = h( 0 )x( 0 )
 s0   1 0 0   s( 0 )
 s  =  0 1 − 1  s ( 1 )  s(1 ) = h(1 )x(1 )
 1   2 
 s2  − 1 1 1   s ( 2 2 )  s( 2 ) = h( 2 )x( 2)

 s0   1 0 0  h( 0 ) 0 0   x( 0 ) h( 0 ) = h0 ; x( 0 ) = x0 ;


 s  =  0 1 − 1  0 h ( 1 )
0   x( )
 1   2  1  h(1 ) = h0 + h1 ; x(1 ) = x0 + x1 ;
 s2  − 1 1 1   0 h( 2 )
  x( 2 )
2  h( 2 ) = h0 − h1 ; x( 2 ) = x0 − x1 ;
0

 s0   1 0 0  h0 0 0  1 0 
 s  =  0 1 − 1  0  x0 
 1  
h0 + h1
2 0  1 1 
  x 
 s2  − 1 1 1   0 h0 − h1
 1 − 1  1
0 2 
Computation steps
The computation is carried out as follows:

1. H0 = h0, H1 = (h0 + h1)/2, H2 = (h0 − h1)/2 (precomputed)
2. X0 = x0, X1 = x0 + x1, X2 = x0 − x1 (2 additions)
3. S0 = H0 X0, S1 = H1 X1, S2 = H2 X2 (3 multiplications)
4. s0 = S0, s1 = S1 − S2, s2 = −S0 + S1 + S2 (3 additions)

This algorithm requires only 3 multiplications and 5 additions.

Therefore, the number of multiplications has been reduced by 1 at the
expense of 4 extra addition operations.
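These four steps translate directly to code. A minimal illustrative sketch of the 2×2 case (names are ours):

```python
def conv2x2_ct(h0, h1, x0, x1):
    """2x2 Cook-Toom convolution (beta = 0, 1, -1):
    3 multiplications and 5 run-time additions."""
    H0, H1, H2 = h0, (h0 + h1) / 2, (h0 - h1) / 2   # step 1: precomputed
    X0, X1, X2 = x0, x0 + x1, x0 - x1               # step 2: 2 additions
    S0, S1, S2 = H0 * X0, H1 * X1, H2 * X2          # step 3: 3 multiplications
    return S0, S1 - S2, -S0 + S1 + S2               # step 4: 3 additions
```

For h = {1, 2} and x = {3, 4} this yields (3, 10, 8), the same result as the direct 4-multiplication form.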
Example 2
Construct a 2×3 linear convolution s(p) = h(p) · x(p),
where h(p) = h0 + h1 p and x(p) = x0 + x1 p + x2 p².
Use the Cook-Toom algorithm to construct an efficient implementation for the
given linear convolution.
Solution:
Compute h(βi), x(βi) and s(βi), for i = 0, 1, 2, 3, since (L=2) + (N=3) − 2 = 3:

β0 = 0:  h(β0) = h0;        x(β0) = x0;              s(β0) = h(β0) x(β0)
β1 = 1:  h(β1) = h0 + h1;   x(β1) = x0 + x1 + x2;    s(β1) = h(β1) x(β1)
β2 = −1: h(β2) = h0 − h1;   x(β2) = x0 − x1 + x2;    s(β2) = h(β2) x(β2)
β3 = 2:  h(β3) = h0 + 2h1;  x(β3) = x0 + 2x1 + 4x2;  s(β3) = h(β3) x(β3)
Applying Lagrange interpolation formula
s ( p ) = s ( 0 )
( p − 1 )( p −  2 )( p −  3 )
+ s (1 )
( p −  0 )( p −  2 )( p −  3 )
+
( 0 − 1 )( 0 −  2 )( 0 −  3 ) (1 −  0 )(1 −  2 )(1 −  3 )
s ( 2 )
( p −  0 )( p − 1 )( p −  3 ) + s( ) ( p −  0 )( p − 1 )( p −  2 )
( 2 −  0 )( 2 − 1 )( 2 −  3 ) 3
( 3 −  0 )( 3 − 1 )( 3 −  2 )

s ( p ) = s ( )
( p − 2 p − p + 2)
3 2
+ s ( )
( p − p − 2 p)
+
3 2

−2
0 1
2

s ( )
( p − 3 p + 2 p)
3 2
+ s ( )
( p − p) 3

−6
2 3
6
(
s ( p ) = s ( 0 ) + p − s ( 2 0 ) + s (1 ) − s ( 3 2 ) − s ( 6 3 ) + )
( )
p 2 − s ( 0 ) + s (21 ) + s ( 2 2 ) + p 3 ((s 0 )
2 − s (21 ) − s ( 6 2 ) + s ( 6 3 ) )
s( p ) = s0 + ps1 + p 2 s2 + p 3 s3
Matrix-vector form
 s0   2 0 0 0   s ( 2 0 ) 
 s   − 1 2 − 2 − 1  s ( 1 ) 
 1 =   2 
 s2   − 2 1 3 0   s ( 6 2 ) 
     s (3 ) 
 s3   1 − 1 − 1 1   6 

 s0   2 0 0 0   h20 0 0 0  1 0 0
 s   − 1 2 − 2 − 1    x0 
 1 =   0
h0 + h1
2 0 0  1 1 1  x 
 s2   − 2 1 3 0  0 0 h0 − h1
0  1 − 1 1  1
   
6
h0 + 2 h1     x2 
 3 
s 1 − 1 − 1 1   0 0 0 6   1 2 4
Computation steps
The computation is carried out as follows:

1. H0 = h0/2, H1 = (h0 + h1)/2, H2 = (h0 − h1)/6, H3 = (h0 + 2h1)/6 (precomputed)
2. X0 = x0, X1 = (x0 + x2) + x1, X2 = (x0 + x2) − x1, X3 = x0 + 2x1 + 4x2 (5 additions)
3. S0 = H0 X0, S1 = H1 X1, S2 = H2 X2, S3 = H3 X3 (4 multiplications)
4. s0 = 2S0, s1 = −(S0 + S3) + 2(S1 − S2), s2 = −2S0 + S1 + 3S2,
   s3 = (S0 + S3) − (S1 + S2) (7 additions)

This algorithm requires only 4 multiplications and 12 additions.

Therefore, the number of multiplications has been reduced by 2 at the
expense of 10 extra addition operations.
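As in Example 1, the four steps map one-to-one onto code. An illustrative sketch (names are ours; floating point stands in for the precomputed constants H0–H3):

```python
def conv2x3_ct(h0, h1, x0, x1, x2):
    """2x3 Cook-Toom convolution (beta = 0, 1, -1, 2):
    4 multiplications and 12 run-time additions."""
    H0, H1 = h0 / 2, (h0 + h1) / 2                         # step 1: precomputed
    H2, H3 = (h0 - h1) / 6, (h0 + 2 * h1) / 6
    t = x0 + x2                                            # step 2: 5 additions
    X0, X1, X2, X3 = x0, t + x1, t - x1, x0 + 2 * x1 + 4 * x2
    S0, S1, S2, S3 = H0 * X0, H1 * X1, H2 * X2, H3 * X3    # step 3: 4 multiplications
    s0 = 2 * S0                                            # step 4: 7 additions
    s1 = -(S0 + S3) + 2 * (S1 - S2)                        # (S0 + S3) is shared with s3
    s2 = -2 * S0 + S1 + 3 * S2
    s3 = (S0 + S3) - (S1 + S2)
    return s0, s1, s2, s3
```

For h = {1, 2} and x = {3, 4, 5} this reproduces the direct convolution {3, 10, 13, 10}, up to floating-point round-off in the /6 constants.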
Comments on CT algorithm
• s = Tx
• Convolution matrix T = CHD, where
– C is a post-addition matrix,
– D is a pre-addition matrix,
– H is a diagonal matrix with entries Hi, i = 0, 1, 2, …, L + N − 2.
• The CT algorithm provides a way to factorize T into C, H and D such that the total number of
general multiplications is determined solely by the non-zero elements of the H matrix.
• Although the number of multiplications is reduced (by one-third in Example 2), the number of
additions has increased.
Iterated Convolution
• Long convolutions are realized using short convolutions.
– A 4×4 convolution can be realized using two levels of 2×2 convolutions.
• This method will not achieve minimal multiplication complexity, but
achieves good balance between multiplication and addition
complexity.
• The order of short convolutions in the decomposition affects the
complexity of the derived long convolutions.
Iterated Convolution Algorithm
1. Decompose the long convolution into several levels of short
convolutions.
2. Construct fast convolution algorithms for the short convolutions.
3. Use the short convolution algorithms to iteratively (hierarchically)
implement the long convolution.
– A 4×4 convolution can be realized using two levels of nested 2×2 convolutions.
Example 1
Construct a 4×4 linear convolution algorithm using 2×2 short convolutions.
Solution:
• Let h(p) = h0 + h1 p + h2 p² + h3 p³, x(p) = x0 + x1 p + x2 p² + x3 p³ and
s(p) = h(p) · x(p).
• Define h0'(p) = h0 + h1 p, h1'(p) = h2 + h3 p, x0'(p) = x0 + x1 p, x1'(p) = x2 + x3 p
and q = p².
• h(p) = h0'(p) + h1'(p) p² = h0'(p) + h1'(p) q
• x(p) = x0'(p) + x1'(p) p² = x0'(p) + x1'(p) q
• s(p) = h(p) · x(p) = (h0'(p) + h1'(p) q)(x0'(p) + x1'(p) q)
• = s0'(p) + s1'(p) q + s2'(p) q²

Treating q as the variable, this is a 2×2 convolution of polynomials, and the 2×2 fast algorithm gives:

\[
\begin{bmatrix} s_0'(p) \\ s_1'(p) \\ s_2'(p) \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} h_0'(p) & 0 & 0 \\ 0 & h_0'(p) - h_1'(p) & 0 \\ 0 & 0 & h_1'(p) \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_0'(p) \\ x_1'(p) \end{bmatrix}
\]
Example 1 – Cont…
\[
\begin{bmatrix} s_0'(p) \\ s_1'(p) \\ s_2'(p) \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} h_0'(p) & 0 & 0 \\ 0 & h_0'(p) - h_1'(p) & 0 \\ 0 & 0 & h_1'(p) \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_0'(p) \\ x_1'(p) \end{bmatrix}
\]

• This uses 3 polynomial multiplications, 1 degree-1 polynomial
addition and 2 degree-2 polynomial additions.
• The 3 polynomial multiplications are themselves 2×2
convolutions, each of which requires 3 multiplications and 3 additions.
• Due to the overlap of terms between s0'(p) and s1'(p) q, and between s1'(p) q and
s2'(p) q², 2 more additions are required.
• Therefore, the total numbers of multiplications and additions required are 9
and 19, respectively.
• In the direct form, the total numbers of multiplications and additions are
16 and 12, respectively.
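The two-level construction can be sketched as follows (helper names are ours). The inner `mul2` is the three-multiplication 2×2 algorithm; the outer level applies the same identity to the half-polynomials, giving 9 scalar multiplications in total:

```python
def mul2(a, b):
    """Product of degree-1 polynomials [a0, a1] and [b0, b1] using
    3 multiplications: a0*b0, a1*b1 and (a0 - a1)*(b0 - b1)."""
    p0, p2 = a[0] * b[0], a[1] * b[1]
    p1 = (a[0] - a[1]) * (b[0] - b[1])
    return [p0, p0 + p2 - p1, p2]        # middle term is a0*b1 + a1*b0

def conv4x4_iterated(h, x):
    """4x4 linear convolution from two nested levels of the 2x2
    algorithm: 9 scalar multiplications instead of 16."""
    h0p, h1p = h[:2], h[2:]              # h(p) = h0'(p) + h1'(p) q, q = p^2
    x0p, x1p = x[:2], x[2:]
    s0p = mul2(h0p, x0p)                 # each of the three products is
    s2p = mul2(h1p, x1p)                 # itself a 2x2 fast convolution
    dp = mul2([h0p[0] - h1p[0], h0p[1] - h1p[1]],
              [x0p[0] - x1p[0], x0p[1] - x1p[1]])
    s1p = [s0p[i] + s2p[i] - dp[i] for i in range(3)]
    s = [0] * 7                          # s(p) = s0' + s1' p^2 + s2' p^4
    for i in range(3):                   # the overlapped terms at p^2 and
        s[i] += s0p[i]                   # p^4 cost the 2 extra additions
        s[i + 2] += s1p[i]
        s[i + 4] += s2p[i]
    return s
```

For h = {1, 2, 3, 4} and x = {5, 6, 7, 8} this agrees with the direct 16-multiplication convolution.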
Cyclic or Circular Convolution
• Let the filter coefficients be h = {h0, h1, …, hn−1} and the data sequence
x = {x0, x1, …, xn−1}; then the cyclic convolution is

\[
s(p) = h(p) \circledast x(p) = h(p)\,x(p) \bmod (p^n - 1)
\]

• The output samples are given by

\[
s_i = \sum_{k=0}^{n-1} h_{((i-k))}\, x_k, \qquad i = 0, 1, 2, \ldots, n - 1,
\]

where ((i − k)) denotes (i − k) mod n.
• The cyclic convolution can be computed as a linear convolution
reduced modulo pⁿ − 1.
• Notice that there are 2n − 1 different output samples for this linear
convolution.
• Alternatively, the cyclic convolution can be computed using the Chinese
remainder theorem (CRT) with m(p) = pⁿ − 1, which is much simpler.
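The reduction modulo pⁿ − 1 can be sketched directly (names are illustrative): compute the 2n − 1 linear-convolution samples, then wrap the upper n − 1 of them around:

```python
def cyclic_conv(h, x):
    """n-point cyclic convolution as a linear convolution reduced
    mod p^n - 1: the coefficient of p^(n+k) wraps onto p^k."""
    n = len(h)
    assert len(x) == n
    lin = [0] * (2 * n - 1)              # the 2n - 1 linear-convolution samples
    for i, hi in enumerate(h):
        for k, xk in enumerate(x):
            lin[i + k] += hi * xk
    # wrap-around: s_k = lin_k + lin_{k+n} (no upper term for k = n - 1)
    return [lin[k] + (lin[k + n] if k + n < 2 * n - 1 else 0) for k in range(n)]
```

For h = {1, 2, 3} and x = {4, 5, 6} the linear convolution {4, 13, 28, 27, 18} wraps to the cyclic result {31, 31, 28}.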
Convolutions - Fast convolution
• An efficient linear convolution algorithm can be used to obtain an
efficient cyclic convolution algorithm.
• Conversely, an efficient cyclic convolution algorithm can be used to
derive an efficient linear convolution algorithm.
• Not all efficient convolution algorithms can be generated by the Cook-Toom
or Winograd algorithms. Sometimes a clever factorization found by
inspection may yield a better algorithm.
• Fast convolution algorithms form the basis for the design of fast
parallel FIR filters.
Next Class
INTRODUCTION TO NUMERICAL STRENGTH REDUCTION
