Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

Chapter 9 Computation of the DFT

9.0 Introduction

The performance of an algorithm to compute DFT is generally judged by the numbers
of the required arithmetic multiplications and additions. Sometimes, other factors are
needed to be considered, such as the chip area of VLSI implementation, power
consumption, and so on.


9.1 Direct Computation of the Discrete Fourier Transform

The DFT and IDFT are defined by
1
0
[ ] [ ] 0,1, , 1
N
kn
N
n
X k x n W k N

=
= =

L (9.1)
1
0
1
[ ] [ ] 0,1, , 1
N
kn
N
k
x n X k W n N
N

=
= =

L (9.2)

Most approaches to improving the efficiency of computation of the DFT exploit the
symmetry and periodicity properties of
kn
N
W ; specifically,

( )
( )
k N n kn kn
N N N
W W W
-
= = (complex conjugate symmetry) (9.3a)
( ) ( ) kn k n N k N n
N N N
W W W
+ +
= = (periodicity in n and k) (9.3b)

9.1.1 Direct Evaluation of the Definition of the DFT

Since W
N
is a complex number, it is needed to use complex multiplication and
addition. Eq.(9.1) in terms of operations of real numbers are expressed by

1
0
[ ] [(Re{ [ ]}Re{ } Im{ [ ]}Im{ })
(Re{ [ ]}Im{ } Im{ [ ]}Re{ })],
0,1, , 1
N
kn kn
N N
n
kn kn
N N
X k x n W x n W
j x n W x n W
k N

=
=
+
=

L
(9.4)

Each complex multiplication needs 4 real multilpcations and 2 real additions. Each
complex addition needs 2 real additions. The direct computation of [ ] X k shown in
Eq.(9.4) requires 4N multiplications and 4N2 additions. So, in total it needs 4N
2

multiplications and N(4N2) additions to compute all X(k), i.e., the number of
computation is proportional to N
2
.

In above analysis, we do not consider the factor that some computations can be omitted.
If we want to save those computations, some additional cost must be paid, such as the
increase of the software complexity and the increase of hardware complexity on data
flow management.

As an illustration of how the properties of
kn
N
W can be exploited, using the symmetry
property shown in Eq.(9.3a), we can group the terms for n and (MN). For example, the
grouping
( )
Re{ [ ]}Re{ } Re{ [ ]}Re{ }
( Re{ [ ]} Re{ [ ]}) Re{ }
kn k N n
N N
kn
N
x n W x N n W
x n x N n W

+
= +

eliminates a real multiplication, as does the grouping
( )
Im{ [ ]}Im{ } Im{ [ ]}Im{ }
(Im{ [ ]} ImRe{ [ ]}) Im{ }
kn k N n
N N
kn
N
x n W x N n W
x n x N n W


=

The nuimber of real multiplications ican be reducted by approximately a factor of 2. But
the amount of computation is still porpotional to N
2
.

9.1.2 The Goertzel Algorithm (Skip)

9.1.3 Exploiting both Symmetry and Periodicity

Cooley and Tukey (1965) published an algorithm to result in discorvery of a number of
highly efficient computational algorithms. Collectively, the entire set of such algorithms
has come to be known as the fast Fourier transform, or the FFT.

These algorithms exploit both the symmetry and the periodicity of the complex
exponential
(2 / ) kn j N kn
N
W e
t
= . Two classes of these algorithms are discussed in the
following sections: decimation in time and decimation in frerquency.

9.2 Decimation-in-time FFT Algorithms

This class of algorithms decomposes the DFT computation into successively smaller
DFT computations. Algorithms in which the decomposition is based on decomposing the
sequence x[n] into successively smaller sequences are called decimation-in-time
algorithms. Fig.9.3 shows the decomposition.


Figure 9.3: Illustration of the basic principle of decimation-in-time.

To understand the significance of Fig.9.3, the frequency-domain equivalents of the
operations depicted in the block diagram are given below:

The operation Left Shift 1 corresponds to multiplying
( )
j
X e
e
by
j
e
e
. The
decimation of the time sequence by 2 has been discussed in Section 4.6.1. So, we
have
( ) ( )
( )
( ) ( )
2 / 2 / 2
1
2
j j j
G e X e X e
e t e e
= + (9.14a)
( ) ( )
( )
( )
( )
( )
2 / 2 2 / 2 / 2 / 2
1
2
j j j j j
H e X e e X e e
e t e t e e e
= + (9.14b)
The sequency-expansion-by-2 results in
( ) ( )
2 j j
e
G e G e
e e
= and
( ) ( )
2 j j
e
H e H e
e e
= . So,
we have
( ) ( ) ( ) ( ) ( )
2 2 j j j j j j j
e e e
X e G e e H e G e e H e
e e e e e e e
= + = + (9.15)
Substituting Eqs.(9.14a) and (9.14b) into Eq.(9.15) will verify that the DTFT
( )
j
X e
e

of the N-point sequence x[n] can be represented as in Eq.(9.15) in terms of the
DTFTs of the two N/2-point sequences [ ] [2 ] g n x n = and [ ] [2 1] h n x n = + .

The DFT of x[n] can likewise be represented in terms of DFTs of [ ] g n and [ ] h n .
Specifically, we have
| | ( )
( )
( )
( )
( )
2 / 2 2 / 2 2 / 2 / j k N j k N j k N j k N
X k X e G e e H e
t t t t
= = + (9.16)
From the definition of [ ] g n and
( )
j
e
G e
e
, we have
( )
( )
| |
( )
| |
( ) (
| |
/ 2 1
2 / 2 2 / 2
0
/ 2 1 / 2 1
2 / / 2
/ 2
0 0
2
2 2
N
j k N j k N n
n
N N
j k N n kn
N
n n
G e x n e
x n e x n W
t t
t

= =
=
= =


(9.17a)
Similarly, we have
( )
( )
| |
/ 2 1
2 / 2
/ 2
0
2 1
N
j k N kn
N
n
H e x n W
t

=
= +

(9.17b)
From Eqs.(9.17a), (9.17b) and (9.16), we obtain
| | | | | |
/ 2 1 / 2 1
/ 2 / 2
0 0
2 2 1 , 0,1, , 1
N N
kn k kn
N N N
n n
X k x n W W x n W k N

= =
= + + =

K (9.18)
From the definition, we have
| | | |
1
0
, 0,1, , 1
N
nk
N
n
X k x n W k N

=
= =

K (9.19)
| | | |
/2 1
/2
0
2 , 0,1, , / 2 1
N
nk
N
n
G k x n W k N

=
= =

K (9.20a)
| | | |
/2 1
/2
0
2 1 , 0,1, , / 2 1
N
nk
N
n
H k x n W k N

=
= + =

K (9.20b)
Substituting them into Eq.(9.18) results in
| | ( ) ( ) ( ) ( )
/ 2 / 2
0,1, , 1
k
N
N N
X k G k W H k k N
( (
= + =

K (9.21)

Fig.9.4 depicts this computation for N=8.

Figure 9.4: Flow graph of the decimation-in-time decomposition of an N-point DFT
computation into two (N/2)-point DFT computations (N = 8).

We can repeat the above procedure to obtain

| | | |
( )
| |
( )
| |
( )
( )
/ 2 1 / 4 1 / 4 1
2 1 2
/ 2 / 2 / 2
0 0 0
2 2 1
N N N
k rk k
N N N
r
G k g r W g W g W

+
= = =
= = + +

l l
l l
l l + (9.22)
or
| | | |
( )
| |
( ) / 4 1 / 4 1
/ 4 / 2 / 4
0 0
2 2 1
N N
k k k
N N N
G k g W W g W

= =
= + +

l l
l l
l l + (9.23)
Similarly,
| | | |
( )
| |
( ) / 4 1 / 4 1
/ 4 / 2 / 4
0 0
2 2 1
N N
k k k
N N N
H k h W W h W

= =
= + +

l l
l l
l l + (9.24)

Fig. 9.5 displays the flow graph of the above computations.

Figure 9.5: Flow graph of the decimation-in-time decomposition of an (N/2)-point
DFT computation into two (N/4)-point DFT computations (N = 8).

Integrate the above results to obtain the flow graph shown in Fig. 9.6.

Figure 9.6: Result of substituting the structure of Figure 9.5 into Figure 9.4.

The complete flow graph of an 8-point DIT-FFT is shown in Fig. 9.9. It repeatedly uses
the same basic butterfly computation structure shown in Fig. 9.8. Fig.9.7 is a special case
of Fig.9.8 for 0 r = .

Figure 9.7: Flow graph of a 2-point DFT.

Figure 9.8: Flow graph of basic butterfly computation in Figure 9.9.

Figure 9.9: Flow graph of complete decimation-in-time decomposition of an 8-point
DFT computation.
From Fig.9.9, we find that it consists of
2
log N v = stages. Each stage requires N
complex multiplications and additions. The total number of complex multiplications and
additions hence is equal to
2
log N N .

The computation in the flow graph of Fig.9.9 can be further reduced. Since
( ) 2 / / 2 / 2
1
j N N N j
N
W e e
t t
= = = (9.25)
The factor
/ 2 r N
N
W
+
can be written as
/ 2 / 2 r N N r r
N N N N
W W W W
+
= = (9.26)
The butterfly shown in Fig.9.8 can be simplified as

Figure 9.10: Flow graph of simplified butterfly computation requiring only one
complex multiplication.
Its multiplication is reduced by 1.
Fig.9.9 is accordingly changed to

Fig. 9.11: Flow graph of 8-point DFT using the butterfly computation of Figure 9.10.

From Fig.9.11, we find that it consists of
2
log N v = stages. Each stage requires N/2
2-point DFT computations (butterflies). Between the sets of 2-point transforms are
complex multipliers of the form
r
N
W . These complex multipliers have been called
twiddle factors. The total numbers of complex multiplications and additions are
equal to
2
( / 2) log N N and
2
log N N , respectively.

9.2.2 In-Place Computations

We can define the input set as (N=8)
0
0
0
[0] [0]
[ ] [ ]
[7] [7]
X x
X n x n
X x
=
=
=
M
M
(9.27)

And think of
1
[ ]
m
X l

as the input array and [ ]


m
X l as the output array for the mth stage
of computations. Using the notation, the butterfly changes to

Figure 9.12: Flow graph of Eqs. (9.28).
The equations of its computation are
1 1
[ ] [ ] [ ]
r
m m N m
X p X p W X q

= + (9.28a)
1 1
[ ] [ ] [ ]
r
m m N m
X q X p W X q

= (9.28b)

Generally, we need two N-dimensional complex arrays to realize the FFT. But, from
the structure of butterfly, we find that the two outputs of a butterfly can use the same
storage of its two inputs, i.e., after obtaining the outputs, we do not need to store the
inputs anymore. Hence, it requires only an N-dimensional complex data array. This
computation form is called in-place computation.

It can also be seen from Fig.9.11, the inputs are rearranged in a bit-reversed order.
Specifically, as writing the indices of signals in Eq.(9.27) in binary form, we obtain
0
0
0
0
[000] [000]
[001] [100]
[110] [011]
[111] [111]
X x
X x
X x
X x
=
=
=
=
M (9.29)
Figs.9.13 and 9.14 show the sorting of inputs to the normal order and to the
bit-reversed order, respectively (for N8).

Figure 9.13: Tree diagram depicting normal-order sorting.


Figure 9.14: Tree diagram depicting bit-reversed sorting.


b
2
b
1
b
0
b
2
' b
1
' b
0
' bit-reversed order
0 0 0 0 0 0 0 0
1 0 0 1 1 0 0 4
2 0 1 0 0 1 0 2
3 0 1 1 1 1 0 6
4 1 0 0 0 0 1 1
5 1 0 1 1 0 1 5
6 1 1 0 0 1 1 3
7 1 1 1 1 1 1 7

9.2.3 Alternative Forms

Actually, we can reform the computation structure of Fig.9.11 to produce other flow
graphs:

(1) Fig. 9.15 illustrates a flow graph with inputs in normal order and outputs in
bit-reversed order;
(2) Fig. 9.16 illustrates a flow graph with both inputs and outputs in normal;
(3) Fig. 9.17 illustrates a flow graph with all stages has the same geometry.


Figure 9.15: Rearrangement of Figure 9.11 with input in normal order and output in
bit-reversed order.


Fig. 9.16: Rearrangement of Figure 9.11 with both input and output in normal order.


Figure 9.17: Rearrangement of Figure 9.11 having the same geometry for each
stage, thereby simplifying data access.

Among these three new flow graphs, the last two can not use the in-place computation.
The flow graph in Fig. 9.16 neednt take care the orders of inputs and outputs. The flow
graph in Fig. 9.17 can repeatedly use the same software or hardware for the computations
in all stages.


9.3 Decimation-In-Frequency FFT Algorithms

Using the same concept of discussion for DIT FFT, we show the basic principle of
decimation-in-frequency in Fig.9.18


Figure 9.18: Illustration of the basic principle of decimation-in-frequency.

It shows that the N-point DFT [ ] X k can be formed by interleaving its even-indexed
and odd-indexed samples after expansion by a factor of 2.

The signals in Fig.9.18 can be analyzed as follows. First, the time-domain signal
represented by
0
[ ] [2 ] X k X k = is
| | | |
0
/ 2
m
x n x n mN n

=
= + < <

% (9.30)
Since | | x n has length N, only two of the shifted copies of | | x n overlap in the
interval 0 / 2 1 n N s s , so | |
0
x n can be represented by
| | | | | |
0
/ 2 0 / 2 1 x n x n x n N n N = + + s s (9.31a)
Similarly, the time-domain signal represented by
1
[ ] [2 1] X k X k = + is
| | | | | |
| | | | ( )
/2
1
/ 2
/ 2 0 / 2 1
n n N
N N
n
N
x n x n W x n N W
x n x n N W n N
+
= + +
= + s s
(9.31b)
Note that
/2
1
N
N
W = .
So, we have
| | | | | | ( )
/2 1
0 /2
0
/ 2
N
kn
N
n
X k x n x n N W

=
= + +

(9.32a)
| | | | | | ( )
/2 1
1 /2
0
/ 2
0,1, , / 2 1
N
n kn
N N
n
X k x n x n N W W
k N

=
(
= +

=

K
(9.32b)
The procedure suggested by Eqs.(9.32a) and (9.32b) is illustrated for N=8 in Fig.9.19.

Figure 9.19: Flow graph of decimation-in-frequency decomposition of an N-point DFT
computation into two (N/2)-point DFT computations (N = 8).

Following a similar procedure to process N/2-point DFT, we can use N/4-point DFTs
to realize it. Fig.9.20 depict the resulting procedure for N=8.

Figure 9.20: Flow graph of decimation-in-frequency decomposition of an 8-point DFT
into four 2-point DFT computations.

For 2-point DFT, we can use the computation shown in Fig.9.21.

Figure 9.21: Flow graph of a typical 2-point DFT as required in the last stage of
decimation-in-frequency decomposition.

The complete DIF decomposition of an 8-point DFT is shown in Fig.9.22.

Figure 9.22: Flow graph of complete decimation-in-frequency decomposition of an
8-point DFT computation.

DIF FFT can also derived as follows.
1
2
0
1
1
2
2 2
0
2
1 1
2 2
2( )
2
2
0 0
1
2
2
0
[2 ] [ ] 0,1, , 1
2
= [ ] [ ]
= [ ] [ ]
2
= ( [ ] ( ))
2
N
nr
N
n
N
N
nr nr
N N
N n n
N N
N
n r
nr
N N
n n
N
nr
N
n
N
X r x n r
W
x n x n
W W
N
x n x n
W W
N
x n x n
W


= =

= =

=
= =
+
+ +
+ +



Similarly, for odd-number O/P

1
(2 1)
0
/ 2 1 1
(2 1) (2 1)
0 / 2
/ 2 1 / 2 1
(2 1) ( / 2 )(2 1)
0 0
[2 1] [ ] 0,1, , / 2 1
[ ] [ ]

[ ] [ ]

2

N
n r
N
n
N N
n r n r
N N
n n N
N N
n r N n r
N N
n n
X r x n r N
W
x n x n
W W
N
x n x n
W W

+
=

+ +
= =

+ + +
= =
+ = =
= +
= + +

/ 2 1
/ 2 (2 1)
0
/ 2 1
/ 2
0
( [ ] [ / 2])
( [ ] [ / 2])
N
N n r
N N
n
N
n nr
N N
n
x n W x n N
W
x n x n N W
W

+
=

=
= + +
= +



These two equations are the same as Eqs.(9.32a) and (9.32b)

As shown in Fig.9.22, DIF-FFT is bit-reversed order for k. The computation requires
2
( / 2)log N N complex multiplications and
2
log N N complex additions. Thus, the total
number of computation is the same for the DIF and DIT algorithms.


9.3.1 In-Place Computation

As seen from Fig.9.22, the basic computation of DIF FFT has the form of a butterflow
computation. So, it can be interpreted as an in-place computation.


9.3.2 Alternative Forms

The basic butterfly computation can be expressed by
| | | | | |
1 1 m m m
X p X p X q

= + (9.33a)
| | | | | | ( )
1 1
r
m m m N
X q X p X q W

= (9.33b)
Fig.9.23 shows its flow graph.

Figure 9.23: Flow graph of a typical butterfly computation required in Figure 9.22.

The alternative forms of DIF-FFT is similar to those of DIT-FFT. We can restructure
Fig.9.22 to obtain Fig.9.24 in which the indices of I/P and O/P are in bit-reversed order
and in normal order, respectively. Note that it is the transpose of Fig.9.15.

Fig. 9.24: Flow graph of a decimation-in-frequency DFT algorithm obtained from Fig.9.22.
Input in bit-reversed order and output in normal order. (Transpose of Fig. 9.15.)

Similarly, applying the transpose operation to Fig.9.16, we can obtain a computation
structure in which both the indices of input and output are in normal order.

The transpose of Fig.9.17 is shown in Fig.9.25. Each stage of Fig.9.25 has the same
geometry.

Figure 9.25: Rearrangement of Figure 9.22 having the same geometry for each stage,
thereby simplifying data access. (Transpose of Figure 9.17.)


9.4 Practical Considerations

The efficiency of an algorithm will be affected by the needs of rearranging the
bit-reversed order of inputs, outputs and coefficients. For the use of in-place computation,
it is needed to let either the inputs or the outputs be arranged in the bit-reversed order.
The efficiency of using memory will be higher. The flow graphs shown in Figs.9.16 and
9.17 can not use the in-place computation.

Generally, the inverse DFT can use the same flow graph to compute as the FFT.

In using the DFT to perform convolution, autocorrelation, or cross-correlation, we first
calculate the DFTs of the two input signals. Then perform the product of the two DFTs.
Lastly, calculate the IDFT of the product to obtain the result. In this produce, we can
adopt two different flow graphs for DFT and IDFT so as to save the operation of
rearranging the order of data. For example, use DIT for DFT and then use DIF for IDFT,
or vice versa. On the other hand, for the consideration of arranging the coefficients, using
the same flow graph for both DFT and IDFT is more convenient.

To speed up the computation, all coefficients can be prepared in advance. The price
paid is the memory to save them. Alternatively, we can calculate them in runtime by the
following recursive formula:

( 1)

ql q q l
N N N
W W W

=


Many FFT structures for general N (other than 2
L
N = ) have been well-studied. When
N is a composite integer, such as NRQ or the product of several integers, it can be
efficiently calculated by the Cooley-Tukey type algorithm. If the integers are prime, the
prime-factor algorithm can be used.


9.5 More General FFT Algorithms (skip)


9.6 Implementation of the DFT using Convolution

9.6.1 Overview of the Winograd Fourier Transform algorithm (WFTA) (skip)

9.6.2 The Chirp Transform Algorithm



0
0, , 1
k
k k M e e e = + = V L (9.39)

DFTdefine
j
W e
e
=
V


0
1
0
1
0
( ) [ ] 0, , 1
[ ]
k k
N
j j n
n
N
j n nk
n
X e x n e k M
x n e
W
e e
e

=
= =
=

L
(9.40, 9.42)


2 2 2
1
( ( ) )
2
nk n k k n = + (9.43)

2 2 2
0
2 2
1
/2 /2 ( ) /2
0
1
/2 ( ) /2
0
( ) [ ]
( [ ] )
k
N
j j n n k k n
n
N
k k n
n
X e x n e W W W
W g n W
e e


=
=
=

(9.44, 9.46)

2
0
/2
[ ] [ ]
j n n
g n x n e W
e
= (9.45)

n k

2 2
1
/2 ( ) /2
0
( ) ( [ ] ) 0,1, , 1
n
N
j n n k
k
X e W g k W n M
e


=
= =

L (9.47)

Fig. 9.26 x[n] g[n]
2
/2 n
W convolution

2
/2 n
W ( )
n
j
X e
e



2
/2 n
W


2
n
j n
e
e

V
complex sinusoid with frequency
2
n e V

radar system chirp signal algorithm chirp transform




2
/2 n
W

infinite sequence O/P (N1)nM1


h[n] Fig. 9.29

h[n] delay N1 causal Fig. 9.30

2
( 1) /2
1
0,1, , 2
[ ]
0 otherwise
n N
W n M N
h n

= +

L



1
( ) [ 1] 0,1, , 1
n
j
X e y n N n M
e
= + = L


1
[ ] y n N1

algorithm DFT e
0
0 MN
2 / j N
N
W e W
t
= =
2
n
n
N
t
e = modify Fig. 9.30 delay Fig. 9.31

2
/2
2
1, 2, , 1
[ ]
0 otherwise
n
N
W n M N
h n

= +

L



2 /
2
( ) [ ] 0,1, , 1
j n N
X e y n N n M
t
= + = L

CTA generalize chirp-z transform (CZT)


9.7 Effects of Finite Register Length

filter finite length effect

Fig. 9.32 DIT-FFT Fig. 9.33 butterfly linear-noise
model noise source [ , ] m q c
(1) error uniformly distributed in
1
2
2
B

1
2
2
B
( (B1)-bit signed
fraction ) mean = 0variance
2
2
12
B

(2) error uncorrelated with each other
(3) error uncorrelated with signal


2
2
2 2
1
2
{ [ , ]} 4
2
12 3
B
B
B
E m q c o

= = =

1 complex 4 real

FFT
(1) O/P N1 butterflies
(2) node nodes transmission gain 1 (for magnitude)
(3) O/P k error
2
2 2
{ [ ] } ( 1)
B B
E F k N N o o = ~


double-length accumulator round-off error


stageO/P noise double mean-square error value

overflow butterfly

-1 -1
max( [ ] , [ ] ) max( [ ] , [ ] )
m m m m
X p X q X p X q s

-1 -1
max( [ ] , [ ] ) 2max( [ ] , [ ] )
m m m m
X p X q X p X q s

maximum value stage

I/P
1
[ ] 1 1 x n n N
N
< s s

[ ] 1 1 1 X k k N < s s

1 1
0 0
1
[ ] [ ] [ ] 1
N N
kn
N
n n
X k x n W x n N
N

= =
= s < =



O/P overflow



2
2
2
1
{ [ ] }
3
x
E x n
N
o = =

uniformly distributed in
1
2N

1
2N
+ for real and imaginary part


1
2 2 2
2
0
1
{ [ ] } { [ ] }
3
N
kn
x
n
E X k E x n N
W
N
o

=
= = =

white

noise-to-signal (N/S) ratio

2
2 2 2 2 2
2
{ } | [ ] |
3 2
{ } | [ ] |
B
B
E F k
N N N
E X k
o

= =


stage (i.e., N2N) 1 bit (i.e., BB1)
N/S ratio

approach maximum value stage
2 stage signal
1
2
I/P1 overflow

Fig. 9.36 butterfly
1
2
|x[n]|1 real
imaginary part
1
2

1
2
noise source noise
source attenuate stage
2
1
( )
2

(in energy)

(1)

1
2 2 2 2 2
0
{ } (0.5) | [ ] |
2
m m
B
m
E F k
v
v v
o


=
=



2 N
v
= , v stages

(2) (v-m)-stage
1
2
m v
butterflies O/P butterflies
noise sources O/P amplitude gain
1
(0.5)
m v

2 2 2
(0.5)
m v


2 2
{ } 4 (1 0.5 ) | [ ] |
B
E F k
v
o =

(3)
1
0.5
N
v
= N

2 2 2
4
{ } 4 2 | [ ] |
3
B
B
E F k o

~ =

(4)
2
2 2
2
{ } | [ ] |
12 4 2
{ } | [ ] |
B
B
E F k
N N N
E X k
o

= =
stage
1
2
bit N/S ratio

approach check overflow overflow
stage data
1
2
O/P N/S ratio

1 or 1
k
N
W = round-off error


signal white

You might also like