
Lecture 12

Discrete Fourier Transform (cont’d)

Thresholding as a method of denoising

The method of thresholding can also be used to “denoise” signals. (In fact, it has been a rather
standard method for wavelet-based denoising of signals and images.) First of all, we should qualify
what we mean by “noisy signals/images.”
The transmission, reproduction or recording of signals/images, as “pure” as they may be initially,
generally introduces distortions. Some of these distortions may be quite systematic in nature, e.g.,
the scratch on a lens of a digital camera. But signals/images may also be subject to distortions that
may be considered as random in nature, for example, the distortion of an audio signal that is sent
over a very poor communications line. There are various models for such degradations, according to
the application. In what follows, we employ one of the simplest and most standard models, namely
additive Gaussian noise. Our actual implementation of this model is also quite simple. (That
being said, the majority of research papers basically use the same type of simplified model.)

Let f0 = (f0 [0], f0 [1], · · · , f0 [N − 1]) denote a “pure” or “noiseless”, i.e., undegraded, signal. For
example, it could represent part of an audio track that was recorded in a “perfect studio”. (Of course,
no such studio exists.) We then assume that this perfect signal f0 is degraded according to the
following model,
f = f0 + n, (1)

where n ∈ RN denotes a random N -vector. The components n[i], 0 ≤ i ≤ N − 1 (we’ll use i as an index
instead of the usual n, to avoid the confusing notation “n[n]”) are independent random variables,
which are identically distributed according to the normal or Gaussian distribution N (0, σ), i.e.,
zero-mean, standard deviation σ > 0. The vector f then represents the noisy signal. Of course, what
we want is to find f0 , or at least a good approximation to it, from f .
As you know from probability/statistics, a proper interpretation of this model implies that we
must consider a large collection or ensemble of such noisy signals produced by this random process.
f0 remains the same, but we’ll have many different noisy signals f produced by the random N -vectors
n. And if we examined the values assumed by a particular entry in the n vectors, say n[5], we would
see that, very roughly, the mean of these values would be near zero, and the standard deviation near σ.

Actually, let’s stop here for a moment and mention that this represents one way of extracting
approximations to the noiseless signal f0 : By collecting a large number M of such distorted signals f
and taking the average of them. If M is large enough, then the average of the n vectors will be roughly
(0, 0, · · · , 0) ∈ RN . Therefore the average of all of these noisy signals will be a rough approximation
to f0 . This is one of the oldest methods of noise reduction.
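As a quick numerical illustration of this ensemble-averaging idea, here is a minimal NumPy sketch (the stand-in signal, the seed, and the sizes N, M, σ are assumptions chosen for demonstration, not values from these notes):

import numpy as np

rng = np.random.default_rng(0)
N, M, sigma = 256, 1000, 0.1
f0 = np.sin(np.linspace(0.0, 2.0 * np.pi, N))      # stand-in "noiseless" signal

# M noisy realizations f = f0 + n, each n drawn i.i.d. from N(0, sigma)
noisy = f0 + rng.normal(0.0, sigma, size=(M, N))

# Averaging the realizations averages the noise vectors towards (0, ..., 0)
f_avg = noisy.mean(axis=0)
print(np.linalg.norm(f_avg - f0))   # small; shrinks roughly like sigma*sqrt(N/M)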

Here, however, we assume that we do not have access to a large number of noisy signals, but only
one – produced, in essence, from a particular realization of the random vector n. In the numerical
experiment below, this particular realization is constructed by simply generating N random numbers
from a random number generator that is designed to generate them according to a normal N (0, σ)
distribution.

Here is the important point:

By “denoising” the noisy signal f , we mean finding approximations to the
noiseless signal f0 .

We can never find f0 exactly, since the elements of the random vector n are not known deterministically.
The best we can do is to find approximations to f0 .

For the noiseless signal, we shall once again employ the discrete signal of length N = 256,

f0 [n] = f (xn ), xn = 2πn/N, n = 0, 1, · · · , 255, (2)

obtained by sampling the function

f (x) = e^{−x^2/10} [sin(2x) + 2 cos(4x) + 0.4 sin(x) sin(10x)], 0 ≤ x ≤ 2π. (3)

The function is plotted once again at the top left in the figure below.
A particular vector n ∈ RN was also generated by means of a random number generator (in the
FORTRAN programming language) using standard deviation value σ = 0.1. The vector n is plotted
at the top right in the figure below.
Finally, the noisy signal f = f0 + n is constructed by adding the components of these two signals,
cf. Eq. (1). The result is plotted at the bottom of the figure.
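For readers who wish to reproduce this construction, here is a sketch in Python/NumPy (the notes used a FORTRAN generator; the seed below is arbitrary, so the particular realization of n will differ from the one in the figures):

import numpy as np

N, sigma = 256, 0.1
rng = np.random.default_rng(1)

x = 2.0 * np.pi * np.arange(N) / N                        # x_n = 2*pi*n/N, Eq. (2)
f0 = np.exp(-x**2 / 10) * (np.sin(2*x) + 2*np.cos(4*x)
                           + 0.4*np.sin(x)*np.sin(10*x))  # Eq. (3)
n = rng.normal(0.0, sigma, N)    # one realization of the noise vector
f = f0 + n                       # noisy signal, Eq. (1)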
We now show that some “denoising” of the signal f , i.e., finding approximations to f0 , may be
achieved by thresholding the discrete Fourier transform F of f . First of all, recall that the DFT is a

[Figure] Left: Noiseless signal f0 [n], sampled from f (x) in Eq. (3). Right: Noise vector n, σ = 0.1.
[Figure] Resulting noisy signal f = f0 + n.

linear operator. This means that

F = F(f ) = F(f0 + n) = F(f0 ) + F(n) = F0 + N. (4)

Here we run the risk of confusion since N , the DFT of n, also denotes the number of samples. We
hope that things will be clear by context.
This addition property of the DFTs is illustrated in the next figure. At the top left is the DFT
F0 of the noiseless signal f0 . At the top right is the DFT N of the pure-noise signal n. These two are
added to produce the DFT F of the noisy signal f .

[Figure] Left: Noiseless DFT F0 , obtained from the sampled signal f0 . Right: Noise DFT N , obtained from noise vector n.
[Figure] Resulting noisy DFT F = F0 + N .

We now step back and examine the DFT N of the pure-noise signal. Perhaps the most noteworthy
feature of this plot is that the DFT coefficients do not exhibit the decay characteristic of “normal”
signals. In fact, they do not appear to decay at all. It is a fact, which will not be proved here,

that the coefficients N [k] are also random – for all intents and purposes, we may view them as being
generated randomly from a normal distribution. (That being said, each of the coefficients N [k] is
related deterministically to the noise vector coefficients n[i].) In the next figure, we show how
the amplitude of the DFT coefficients is related to the amplitude of the noise vector n. It shows the
noise vector n used in this experiment, with σ = 0.1 (top), along with its DFT, and a noise vector
corresponding to σ = 0.5, so that the random entries can assume values of larger magnitude.

Noise and its DFT representation

[Figures] Top: Pure noise signal n, zero-mean, σ = 0.1, and corresponding DFT coefficient magnitudes |N [k]|.
Bottom: Pure noise signal n, zero-mean, σ = 0.5, and corresponding DFT coefficient magnitudes |N [k]|.

We now come to the main point behind this thresholding denoiser. The coefficients N [k] of
the pure noise vector are relatively small in magnitude for all frequencies k. They are seen to be
insignificant with respect to the low-frequency coefficients F0 [k] of the noiseless signal. They are not
insignificant with respect to the high-frequency coefficients F0 [k]. Therefore we shall assume that
most of the high-frequency content of the noisy DFT F comes from noise. Since these coefficients are
insignificant with respect to much of the low-frequency content, we conjecture that thresholding might
be able to “remove” much of the noise content in F , and thereby provide reasonable approximations

to F0 , hence f0 .
The results of thresholding for a number of ε values are shown in the next figures. For each ε
value are presented the resulting signal as well as the L2 error and relative L2 error of approximation to
the noiseless signal f0 .
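A minimal sketch of the thresholding denoiser used here, reusing the arrays f0 and f from the sketch above (NumPy’s FFT is an implementation choice for illustration, not the code that produced the figures):

import numpy as np

def threshold_denoise(f, eps):
    # Zero all DFT coefficients with |F[k]| < eps, then invert.
    F = np.fft.fft(f)
    F_thresh = np.where(np.abs(F) >= eps, F, 0.0)
    retained = np.mean(np.abs(F) >= eps)        # fraction of coefficients kept
    f_tilde = np.real(np.fft.ifft(F_thresh))    # imaginary part is roundoff
    return f_tilde, retained

for eps in [2.0, 3.0, 4.0, 5.0, 10.0]:
    f_tilde, kept = threshold_denoise(f, eps)   # f, f0 as constructed earlier
    print(f"eps={eps:5.1f}  kept={kept:6.1%}  L2 error={np.linalg.norm(f0 - f_tilde):.2f}")

Since the seed above produces a different noise realization, the printed errors will only roughly match the values quoted below.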
What is rather interesting, and potentially discouraging, is that the L2 error, 1.71, for the case
ε = 2.0 is greater than the error, 1.66, for the original noisy signal! As ε is increased, however, the
error decreases – it is 1.64 at ε = 3.0 – but then increases again.
The results are not very encouraging and, indeed, thresholding of DFTs is not a very good method.
But it’s not the thresholding that’s the problem – it’s the DFTs. They are too global: each DFT
coefficient contains information from the entire signal. We’ll see later that thresholding works quite
well with wavelet transforms, because of the locality of wavelet functions.

A closer look at the Convolution Theorem

In this section, we examine some particular examples, along with a very simple, yet interesting,
application to signal processing.

Recall the Convolution Theorem:

Let f and g be two N -periodic complex vectors. Define the (circular) convolution of these two
vectors as the vector h with components

h[n] = \sum_{j=0}^{N−1} f [j] g[n − j], n = 0, 1, · · · , N − 1. (5)

Then the DFT of h is related to the DFTs of f and g as follows,

H[k] = F [k]G[k]. (6)

We first rewrite the RHS of (5) slightly, via a change of variables:

h[n] = \sum_{j=0}^{N−1} f [n − j] g[j], n = 0, 1, · · · , N − 1. (7)

In this way, we can view f as a “signal” and g as a “mask”: The convolution then produces a new
signal h from f .
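The following sketch implements the circular convolution of Eq. (5) directly and then checks the Convolution Theorem, Eq. (6), numerically (the test vectors are arbitrary):

import numpy as np

def circular_convolve(f, g):
    # h[n] = sum_j f[j] g[(n - j) mod N], Eq. (5)
    N = len(f)
    return np.array([sum(f[j] * g[(n - j) % N] for j in range(N))
                     for n in range(N)])

rng = np.random.default_rng(2)
f, g = rng.normal(size=8), rng.normal(size=8)
h = circular_convolve(f, g)

# Convolution Theorem, Eq. (6): H[k] = F[k] G[k]
print(np.allclose(np.fft.fft(h), np.fft.fft(f) * np.fft.fft(g)))   # True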
In the figure below, we align the vector g with f in a manner appropriate to the convolution. The
terms that are joined by lines are multiplied and then added together to form the entry h[n]:

Simple denoising by thresholding

[Figures] In each case below, the left panel shows the signal vs. x and the right panel the magnitudes of its DFT coefficients vs. k.
Original noisy signal f = f0 + n, σ = 0.1, with magnitudes |F [k]|. L2 error ‖f0 − f ‖2 = 1.66. Relative L2 error 12.6%.
Threshold ε = 2.0. 57.4% of original coeffs retained. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 1.71. Relative L2 error 13.0%.
Threshold ε = 3.0. 37.1% of original coeffs retained. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 1.64. Relative L2 error 12.4%.
Threshold ε = 4.0. 25.3% of original coeffs retained. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 1.70. Relative L2 error 12.9%.
Threshold ε = 5.0. 16.8% of original coeffs retained. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 1.72. Relative L2 error 13.0%.
Threshold ε = 10.0. 9.0% of original coeffs retained. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 2.06. Relative L2 error 15.6%.

[Diagram] Terms in convolution of f with g contributing to h[n]: the entries . . . , g[1], g[0], g[−1] (= g[N − 1]), . . . are aligned in reverse order against . . . , f [n − 1], f [n], f [n + 1], . . . ; aligned terms are multiplied and summed.

The convolution operation may be viewed as a kind of “reversed scalar product”: To compute h[n],
we “flip” the order of the elements of g with respect to g[0], which is lined up with f [n], and then
perform the scalar product. We’ll come back to this idea in our study of wavelets.
Let us now examine a few special cases for g:

1. g[0] = 1 and g[n] = 0 otherwise: The only term that contributes to the sum in Eq. (5) is
f [n]g[0] = f [n]. Therefore
h[n] = f [n], n = 0, 1, · · · , N − 1, (8)

or simply h = f . This has the appearance of an identity operation, but it is more convenient to
view g as the discrete version of the Dirac delta function. This will become clearer in the next
example.

In this example, the DFTs of f and h are identical, i.e., H = F . From the Convolution Theorem,
H[k] = F [k]G[k], implying that G[k] = 1. But we could have also derived this result by directly
computing the DFT of g:
G[k] = \sum_{n=0}^{N−1} g[n] exp(−i2πkn/N )
= g[0] exp(0) (since only g[0] is nonzero)
= 1. (9)

2. g[1] = 1 and g[n] = 0 otherwise: The only term that contributes to the sum in Eq. (5) is
f [n − 1]g[1] = f [n − 1]. Therefore,

h[n] = f [n − 1], n = 0, 1, · · · , N − 1. (10)

Thus,
(h[0], h[1], · · · , h[N − 1]) = (f [N − 1], f [0], f [1], · · · , f [N − 2]). (11)

In other words, g corresponds to the right-shift operator.

We’ll leave it as an exercise for the reader to determine the DFT of g, i.e., G[k], in two different
ways.

3. Of course, we can generalize the above result: g[k0 ] = 1 and g[n] = 0 otherwise, where k0 ∈
{0, 1, · · · , N − 1}. Then g is a k0 -fold right-shift operator.

Once again, we’ll leave it as an exercise for the reader to determine the DFT of g, i.e., G[k], in
two different ways.

“Averaging” as a convolution

With reference to Eq. (7), we now consider the following “mask” g: For an α ∈ [0, 1],

g[0] = α, g[1] = (1 − α)/2, g[−1] = g[N − 1] = (1 − α)/2, g[n] = 0 otherwise. (12)

In other words, g has at most three non-zero elements. When α = 1, we have the Dirac delta mask.
Note that
g[0] + g[1] + g[−1] = 1. (13)

The convolution of a signal f with g then produces the signal

h[n] = g[1]f [n − 1] + g[0]f [n] + g[−1]f [n + 1]
= αf [n] + (1/2)(1 − α)(f [n − 1] + f [n + 1]). (14)

This may be viewed as a weighted averaging of f [n] with its immediate neighbours to produce a new
signal value h[n]. In the special case α = 1/3, the weighting is uniform:

h[n] = (1/3)(f [n − 1] + f [n] + f [n + 1]). (15)
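In NumPy, one application of this mask is a one-liner, assuming the periodic (circular) boundary handling that convolution with the N -periodic g implies (a sketch; the function name is ours):

import numpy as np

def local_average(f, alpha=1/3):
    # Eq. (14): h[n] = alpha*f[n] + (1-alpha)/2 * (f[n-1] + f[n+1]),
    # with periodic ends; np.roll(f, 1)[n] = f[n-1], np.roll(f, -1)[n] = f[n+1]
    return alpha * f + 0.5 * (1 - alpha) * (np.roll(f, 1) + np.roll(f, -1))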

Since only the immediate neighbours of f [n] are employed in this averaging procedure, it is
often referred to as “local averaging.” The effect of this procedure is to “smoothen out” a signal.
For example, if f [n] lies higher in value than its neighbours, as sketched below, then averaging will

[Diagram] Local averaging at f [n] to produce h[n]: when f [n] lies above its neighbours f [n − 1] and f [n + 1], the averaged value h[n] lies below f [n].

produce a lower value. And, of course, if f [n] lies lower in value, then averaging will produce a higher,
i.e., more positive, value.
This “smoothing” effect may also be viewed as “blurring”, especially if there are sharp discontinuities
in the signal, as sketched in the next figure. Signal f consists of two flat, i.e., constant, regions,
and an “edge,” or discontinuity, between 4 and 5. One application of the convolution/local averaging
will lower the value of f [4] one-third of the way down towards the value of f [5], to produce the value h[4],
and raise the value of f [5] one-third of the way up to produce the value h[5]. In summary, signal values
are changed at 4 and 5. The other signal values are unaffected since they lie in constant regions –
local averaging will not change their values. The result of this operation is a slightly “blurred” edge,
i.e., a more gradual change in values from the highest ones to the lowest ones.

Another application of the averaging operator will alter the values of the signal at 3,4,5 and 6.
The reader can see that the gradient of the signal has been further decreased in magnitude, i.e., the
graph has become less steep.
One final point: the reader may have already noticed how each application of the local averaging
operator increases the region of influence, i.e., the points affected by the averaging. Each application
affects an additional signal value, previously unaffected, on either side of the original edge.

[Diagram] The blurring of an “edge” or discontinuity of a signal by local averaging. Top: original signal f , with edge between 4 and 5. Middle: one application of local averaging, h = f ∗ g. Bottom: another application of local averaging, r = h ∗ g = f ∗ (g ∗ g).

Lecture 13

Discrete Fourier Transform (cont’d)

Local averaging viewed in the frequency domain

Let us now examine what is happening in the frequency domain, i.e., in “k-space,” with the DFTs.
Once again, the DFT H of the blurred signal will be related to F as follows,

H[k] = F [k]G[k], k = 0, 1, · · · , N − 1. (16)

Since we know g, we may compute G[k]: By definition,

G[k] = \sum_{n=0}^{N−1} g[n] exp(−i2πkn/N )
= g[−1] exp(i2πk/N ) + g[0] exp(0) + g[1] exp(−i2πk/N )
= α + (1/2)(1 − α) [exp(i2πk/N ) + exp(−i2πk/N )]
= α + (1 − α) cos(2πk/N ). (17)

Therefore,
H[k] = F [k] [α + (1 − α) cos(2πk/N )]. (18)
One immediate consequence of this relation is that

H[0] = F [0]. (19)

In other words, the zero-frequency component of F is unchanged. But what about the other
frequencies? We need to examine the graph of the function G[k] vs. k.
First, we identify some other important values:

G[N/2] = 2α − 1, G[N/4] = G[3N/4] = α. (20)

A qualitative sketch of the graph of G[k] for α < 1/2 is shown in the next figure.

Perhaps the most important feature of the graph is that

|G[k]| < 1, 1 ≤ k ≤ N − 1. (21)

[Figure] Graph of the DFT G[k] = α + (1 − α) cos(2πk/N ) of the local averaging convolution kernel g, for α < 1/2: G[0] = 1 and G[N/2] = 2α − 1, showing the dampening of magnitudes of high-frequency DFT coefficients.

Then from the fact that

|H[k]| = |F [k]| |G[k]|, (22)

we may conclude that

|H[k]| < |F [k]|, 1 ≤ k ≤ N − 1. (23)

In other words, the magnitudes of the DFT coefficients F [k] have been reduced to produce H[k]. For
the particular case α = 1/3, the degree of shrinking is greatest in the high-frequency region, i.e.,
N/4 ≤ k ≤ 3N/4.
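A quick numerical check of Eqs. (17) and (20) (a sketch; N = 256 and α = 1/3 are the values used in these notes):

import numpy as np

N, alpha = 256, 1/3
g = np.zeros(N)
g[0], g[1], g[-1] = alpha, (1 - alpha) / 2, (1 - alpha) / 2   # mask of Eq. (12)

G = np.fft.fft(g)   # imaginary parts are roundoff; G is real-valued in theory
k = np.arange(N)
print(np.allclose(G.real, alpha + (1 - alpha) * np.cos(2 * np.pi * k / N)))  # True
print(G.real[N // 2])   # 2*alpha - 1, cf. Eq. (20)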
Of course, this result is not surprising – we expected that the blurring or smearing of a signal means
that higher frequency components are being diminished in magnitude. But the main point is that
our analysis allows us to move from deblurring or denoising operations in the spatial or temporal
domain to equivalent operations in the frequency domain. We may choose to modify the DFT
F of a signal in order to denoise/deblur it, rather than working on the signal f itself. Of course, the
method of thresholding of DFT coefficients examined in the previous lecture is an example.

Repeated applications of the averaging operator

Let us change the name of the averaged signal h = f ∗ g to be h1 . If we now apply the averaging
operator to the averaged signal h1 , the result is a new signal – call it h2 :

h2 = h1 ∗ g = (f ∗ g) ∗ g. (24)

In the frequency domain, the DFT H2 of h2 will be H1 G. But H1 = F G. The net result is
that
H2 [k] = F [k] G[k]^2 . (25)

It is straightforward to show that if we apply the convolution/averaging operator n times, the result
is the signal hn with DFT transform,

Hn [k] = F [k] G[k]^n . (26)

Recall that G[0] = 1 and that |G[k]| < 1 for k ≠ 0. It follows that

Hn [0] = F [0], n = 1, 2, · · · , (27)

and
G[k]^n → 0 as n → ∞ for k ≠ 0. (28)

This implies that in the limit n → ∞, the DFT transform Hn [k] – recall that it is a complex N -vector
– will approach the limiting N -vector

H = (F [0], 0, 0, · · · , 0). (29)

The reader may already see that this corresponds to the DFT of a constant function, as expected:
If you keep taking averages, you eventually smooth the function out to a constant, i.e., h[n] = C. The
question is, “What is the value of the constant C?” We leave it for the reader to show that

C = (1/N ) \sum_{n=0}^{N−1} f [n], (30)

i.e., the average value of the signal f . This seems to make sense, from a conservation principle, since
the convolution coefficients were chosen to “conserve signals,” cf. Eq. (13).
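This limit is easy to verify numerically; a sketch (the test signal and the number of iterations are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
f = rng.normal(size=64)

h = f.copy()
for _ in range(20000):        # many applications of the alpha = 1/3 averaging mask
    h = (np.roll(h, 1) + h + np.roll(h, -1)) / 3

print(np.allclose(h, f.mean()))   # True: h[n] -> C = average of f, Eq. (30)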

Denoising by local averaging/convolution

The fact that local averaging/convolution smoothens a signal suggests that it may be able to perform
denoising, essentially by averaging out the fluctuations produced by the additive noise. This is also
supported by the fact that local averaging dampens the magnitudes of high-frequency coefficients.
As such, we consider applying the local averaging operator introduced in the previous section, with
α = 1/3, to the noisy signal f [n] examined in the previous lecture, and treated with the thresholding
algorithm. Recall that the noisy signal was given by f = f0 + n, where f0 is the noiseless signal
given by Eq. (3) and n is an N -vector composed of random numbers generated by a random number
generator from a normal distribution N (0, σ), zero-mean and standard deviation σ = 0.1.
The results are shown in the accompanying figures. We first present again the noiseless signal f0
along with its DFT, then the noisy signal f along with its DFT. In the following figures, we show the
results of applying one, two, three and four convolutions to the signal. Most noticeably, the L2 error
between the signal and the noiseless signal f0 has been reduced from 1.66 to 1.31 after one application
of averaging. (Recall that the first application of thresholding resulted in an increase in the error.)
The L2 error is further reduced to 1.19 after another convolution. However, the third convolution
increases the error, as does the fourth. This decrease in error, followed by an increase, indicates the
tradeoff between smoothing of the signal to remove noise and smoothing of the signal away from the
underlying noiseless signal f0 .
Note also the effects of the convolutions on the DFT spectra – one application of convolution
significantly diminishes the high-frequency coefficients in the region 75 ≤ k ≤ 175. After two applications,
this part of the spectrum is almost eliminated. Further applications continue to diminish other
parts of the spectrum, which may actually “oversmooth” the signal.

This experiment has shown that the local averaging/convolution method seems to work better
than the thresholding method. But it is only one experiment, and definite conclusions cannot be
made.
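A sketch of this experiment, reusing f0 and f from the earlier sketch (the exact numbers depend on the particular noise realization, so they will only roughly match the figures):

import numpy as np

h = f.copy()    # noisy signal constructed earlier
for m in range(1, 5):
    h = (np.roll(h, 1) + h + np.roll(h, -1)) / 3   # one local averaging pass
    print(f"{m} convolution(s): L2 error = {np.linalg.norm(f0 - h):.2f}")
# the error typically falls for the first pass or two, then rises again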

Before moving on, it may be useful to see how convolution is effective in denoising by simply
looking at its effects on “pure noise” itself. In other words, instead of looking at a signal corrupted
by noise, as was done above, we simply take “pure noise,” i.e., an N -vector, n = (n1 , n2 , · · · , nN ) with
elements that were generated randomly from a zero-mean Gaussian distribution with variance σ. We
are essentially removing the signal f0 from the experiments done earlier. I thank Y. Lu, a student in

Simple denoising by convolution, fc [n] = (1/3)(f [n − 1] + f [n] + f [n + 1])

[Figures] Left: Original noiseless signal, N = 256 samples, f0 [n], n = 0, 1, · · · , 255. Right: Magnitudes |F0 [k]| of its DFT coefficients.
Left: Noisy signal f = f0 + n, σ = 0.1, 0 ≤ x ≤ 2π. Right: Magnitudes |F [k]| of its DFT coefficients. L2 error ‖f0 − f ‖2 = 1.66. Relative L2 error 12.6%.
Application of one convolution. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 1.31. Relative L2 error 10.0%.

Simple denoising by convolution (cont’d), fc [n] = (1/3)(f [n − 1] + f [n] + f [n + 1])

[Figures] Application of two convolutions. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 1.19. Relative L2 error 9.0%.
Application of three convolutions. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 1.25. Relative L2 error 9.5%.
Application of four convolutions. Reconstructed signal f̃ [n] and magnitudes |F̃ [k]|. L2 error ‖f0 − f̃ ‖2 = 1.29. Relative L2 error 9.8%.

the Fall 2019 class, for this suggestion.
The results of this experiment are shown in the figures which appear after those associated with
our earlier experiment. The same standard deviation, σ = 0.1, was used to generate the noise. In the topmost
level are shown the “pure noise” signal, composed of N = 256 points, as well as the magnitudes, |N [k]|,
of its DFT spectrum coefficients. We may consider this pure noise signal to be the corruption of
the signal f0 = 0, i.e., the zero signal, in which case the error of approximation will be given by
‖f0 − n‖2 = ‖n‖2 . In this case, we find that ‖n‖2 ≃ 1.65692.
The next-to-top level shows the signal after one application of the convolution – the same convolution
as was used in the previous experiment. One can see that the “noisiness” of the signal has
been reduced – the L2 norm of the convolved signal n ∗ g is 0.95496.
But perhaps more dramatic is the clear reduction of high-frequency DFT components. (This is
not “magic” – it is predicted from the action of the convolution operator in frequency space, as derived
in the previous lecture.)
Another application of the convolution reduces the strength of the noisy signal further – its norm
is now 0.78009. And further reduction of high-frequency DFT coefficients has also been achieved.
The final two entries show the originally pure noise signal after 5 and 10 convolutions, respectively.
There has been further reduction in signal strength as well as high-frequency DFT content.

Local averaging/convolution in the frequency domain

The connection between local averaging/convolution and high-frequency damping also suggests that
we could perform denoising in the frequency domain, as was done for thresholding. Instead of merely
discarding DFT coefficients deemed insignificant, we could apply some kind of dampening factor to
the spectrum, with greater dampening being performed for high frequencies.
I thank one of the students in a previous offering of this course (E. Grant) for informing me that
this is essentially what is done by sophisticated audio processing software packages. For example,
I have been told that a software package will ask for a sample of the “background noise”. It then
performs a frequency analysis of this noise – analogous to the DFT – and then allows you to design a
“shaper” to modify these frequencies as desired.
If time permits, perhaps we can return to this topic near the end of this course. In the meantime,
the reader may wish to experiment with various “frequency-shaping” formulas in an effort to denoise
a given signal.

Convolution on pure noise, fc [n] = (1/3)(f [n − 1] + f [n] + f [n + 1])

[Figures] Left: Original pure noise signal n, σ = 0.1, N = 256 samples, ‖n‖2 = 1.65692. Right: Magnitudes |N [k]| of its DFT coefficients.
Application of one convolution. “Reduced noise” signal n ∗ g and magnitudes |F̃ [k]|. ‖n ∗ g‖2 = 0.95496.
Application of two convolutions. “Reduced noise” signal n ∗ g ∗ g and magnitudes |F̃ [k]|. ‖n ∗ g ∗ g‖2 = 0.78009.

Convolution on pure noise (cont’d), fc [n] = (1/3)(f [n − 1] + f [n] + f [n + 1])

[Figures] Application of five convolutions. “Reduced noise” signal and magnitudes |F̃ [k]|. L2 norm after five convolutions: 0.61383.
Application of ten convolutions. “Reduced noise” signal and magnitudes |F̃ [k]|. L2 norm after ten convolutions: 0.51114.

Signal/image enhancement and “Deconvolution”

In many practical applications, we are given a degraded signal, say h, and asked to find good
approximations to the original signal f that was degraded to produce h. The degradation could be done by
noise or by blurring, or both. In fact, in the signal processing literature the general model for degradation
is a composition of a blur along with a noise operator (often additive noise).
If we happen to know (or assume!) that the degradation was accomplished by convolution with a
kernel g, then the DFTs of the degraded signal h and the original signal f are related as follows,

H[k] = F [k]G[k]. (31)

Now suppose that we know the operator g, hence the DFT coefficients G[k]. One may well be tempted
to solve for F [k] by division, i.e.,

F [k] = H[k]/G[k], k = 0, 1, · · · , N − 1, (32)

and then performing an inverse DFT on F to obtain f , i.e., f = F^{−1} F .


Very nice in theory, but not often successful in practice! One reason is that some coefficients G[k]
may be zero or very close to zero in magnitude. As a result, this procedure is unstable. A more stable
procedure would be to find a DFT F that minimizes the squared distance

‖H − F G‖^2 . (33)

In L2 , this becomes a least-squares problem, which is generally more stable.


But there are other problems. Generally, such inverse problems, i.e., given an h, find f such
that h = f ∗ g, are said to be ill-posed because they lack unique solutions. There are often many, if
not an infinite number of, solutions that satisfy the relation, at least approximately. One must often
impose additional conditions on the solution during the process, which restricts the space of solutions
that we are exploring, but we still may be able to find useful approximations. The imposition of
additional conditions (do you recall the Lagrange multiplier technique in advanced calculus?) is
known as regularization.
The problem of “inverting” Eq. (31), i.e., find F given H, is known as “deconvolution,” for reasons
that should be clear: One obtains H from F by convolution, so obtaining F from H is the reverse
process, i.e., “undoing” the convolution.
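As a sketch of a stabilized frequency-domain deconvolution: the damping parameter lam below is an assumed regularization constant (a Tikhonov/Wiener-style choice, not a method prescribed in these notes); as lam → 0 the formula reduces to the naive division of Eq. (32):

import numpy as np

def deconvolve(h, g, lam=1e-3):
    # Estimate f from h = f * g (circular convolution). Naive division H/G
    # blows up where |G[k]| is small; the damping term lam keeps it stable.
    H, G = np.fft.fft(h), np.fft.fft(g)
    F_est = H * np.conj(G) / (np.abs(G)**2 + lam)   # -> H/G as lam -> 0
    return np.real(np.fft.ifft(F_est))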

Signal/image enhancement: When does one stop?

In the previous experiments on denoising using thresholding and convolution, we applied a particular
operation on the noisy signal several times and observed the errors between the enhanced (i.e.,
denoised) signal and the reference signal f0 . In both cases, as is found in general, the error will
decrease, but then increase. As such, there are optimal “cutoff times”. Of course, if we know the
original signal f0 , we know when to stop. But what if we don’t know f0 , the situation we face most
often? For example, we may retrieve a noisy or blurred image f from somewhere, perhaps our own
digital camera, and wish to enhance it, i.e., denoise or deblur it. Most often, this is done by trial
and error – we look at the result of the enhancement process and decide if we are satisfied with the
result. If not, we may wish to continue with the enhancement process, perhaps tweaking the control
parameters.
A big question in signal/image processing is, “How do we automate this process?” For example,
how can we program a computer to know when to stop applying an operation, say convolution, to a
noisy image, if we don’t know the noiseless image f0 ?

A simple illustration of DFT for audio signal processing: Handel’s “Hallelujah”
chorus

At this point in the lecture, some results of simple DFT-based denoising, now applied to a real audio
signal – Handel’s “Hallelujah” chorus – were presented. The results, along with the MATLAB file
used to generate them (originally written by D. Brunet, the TA for this course in Winter 2011 and
modified slightly by ERV in 2015), can be found in a file entitled hallelujah-appendix.pdf,
which is posted below this week’s (Week No. 5) set of lectures on UW LEARN.

Lecture 14

Discrete Cosine Transform (DCT)

Here we briefly examine the DCT, which is of great importance in signal and image processing. It
actually exists in several forms, one of which provides the basis of the standard “JPEG” compression
method.

Introduction

First recall the standard definition of the DFT which is employed in this course. Let f ∈ RN denote
a set of data points f [n], n = 0, 1, · · · , N − 1, with the understanding that f is part of an N -periodic
sequence, i.e., f [n + N ] = f [n], n ∈ Z. Then the Discrete Fourier Transform F = Ff is defined as
follows,

F [k] = \sum_{n=0}^{N−1} f [n] exp(−i2πkn/N ), k = 0, 1, · · · , N − 1. (34)

The inverse DFT is given by

f [n] = (1/N ) \sum_{k=0}^{N−1} F [k] exp(i2πkn/N ), n = 0, 1, · · · , N − 1. (35)

Recall from our discussion of the Fourier series representations of functions f (x) of a continuous
variable −π ≤ x ≤ π, that the series actually defines the 2π-periodic extension of f (x) for all x ∈ R. If
f (−π) ≠ f (π), then the 2π-periodic extension of f (x) is not continuous at these points. As we saw, such
a discontinuity introduces convergence problems at the endpoints, i.e., Gibbs phenomenon. One way
to overcome such problems was to consider the signal function f (x) as defined over the half-interval
[0, π]. It was also assumed to be an even function, i.e., f (−x) = f (x). The resulting 2π-periodic extension of
f (x) is now continuous at ±π.
This same complication arises with data sequences. The N -point DFT assumes an N -point
periodization of the data set f [n], n = 0, 1, · · · , N − 1, as mentioned earlier. This implies that f [0] =
f [N ], but it does not imply that f [N − 1] is close to f [N ], as sketched in the figure below. In such
situations, there will be convergence problems near the endpoints. (And, as we have already seen,
these problems manifest themselves over the entire data set, slowing down the overall convergence.)
So the idea is to perform an even periodic extension of the N -point data set f [n], n = 0, 1, · · · , N − 1.
Before we get to the actual construction of such a data set, let us examine the implications on the

[Diagram] N -point data set f [n] and the periodic extension f [n + N ] = f [n] implicit in the discrete Fourier transform.

form of the discrete transform that will be associated with this set. Since it will be even, we shall
not require the sine functions that comprise part of the complex exponential in the DFT. As such, we
might expect that the transform will assume the form,
F [k] = \sum_{n=0}^{N−1} f [n] cos(2πkn/N ), k = 0, 1, · · · , N − 1. (36)

This is almost correct – one slight modification must be made. This would be the transform for N
points over an interval centered at n = 0. But we really want this transform to correspond to a signal of
N points only on one side of n = 0. Remember that these N points would then be “flipped” onto
the other side to create the even data set. As such – modulo one other technical consideration to be
addressed in a moment – the above formula should have N changed to 2N . The resulting transform,
F [k] = \sum_{n=0}^{N−1} f [n] cos(πkn/N ), k = 0, 1, · · · , N − 1, (37)

is known as a discrete cosine transform (DCT). Note the use of the article “a” again! This is
actually one of several versions, and is commonly referred to as “DCT-I”, i.e., DCT Version 1.

Let’s now address the “technical consideration” referred to above. Consider the “flipping” of the
N -point data set f [n], n = 0, 1, · · · , N − 1 to produce an even data set, f [n] = f [−n], as sketched in
the figure below.
In this process, we have produced a data set of length 2N − 1 and not 2N since the data point
f [0] has not been repeated. Essentially, we have “lost” one data point. This might not seem like a
big deal, and it isn’t in some considerations. But studies have shown that this transform is not ideal
– that convergence is improved if we preserve the point f [0] to produce a 2N -point data set. The

[Diagram] Even extension of N -point data set f [n] obtained by inverting the set about f [0] so that f [−n] = f [n]. This is the basis of the “DCT-I” method.

question is, “How do we do it?” The answer is to “flip” the data set f [n], n = 0, 1, · · · , N − 1 with
respect to the line n = −1/2, as sketched in the figure below.
The resulting 2N -point data set is now even with respect to n = −1/2, and may be periodically
extended. Note that f [0] = f [−1], f [N − 1] = f [N ], etc. In other words, repetition of data values
has been introduced at points n = pN , p ∈ Z: f [pN ] = f [pN − 1].

[Diagram] Even extension of N -point data set f [n] obtained by inverting and copying the entire set f [n], including f [0], about the line n = −1/2. The result is a 2N -point data set. This is the basis of the “DCT-II” method.

The remaining question is, “What will be the proper form of the associated discrete cosine transform
for this data set?” First of all, because we are now working with a 2N -point data set – the set is
2N -periodic and not N -periodic – the N in the argument of the cosine function must be replaced by
2N . Second, because the data set is even with respect to n = −1/2, we must shift the n parameter in
Eq. (37) by one-half to the left, i.e., replace n with n + 1/2. The result,
F [k] = \sum_{n=0}^{N−1} f [n] cos(π(n + 1/2)k/N ), k = 0, 1, · · · , N − 1, (38)

will define what we shall call Version 1.1 of the discrete cosine transform, or “DCT-I.I”.

Before discussing this version of the DCT in more mathematical detail, we mention that a comparison
of the two figures above indicates why the DCT-I.I method may perform better in signal
processing than DCT-I. The copying of the f [0] data value in DCT-I.I produces a kind of “flattening”
of the resulting signal at n = −1 and n = 0, as opposed to a potential cusp produced by DCT-I.

And now for the mathematical details. First of all, we claim that the set of N -vectors uk ,
k = 0, 1, 2, · · · , N − 1, with components defined as follows,

uk [n] = cos(π(n + 1/2)k/N ), n = 0, 1, · · · , N − 1, (39)

form an orthogonal set in RN . The first basis vector, k = 0, is an easy one:

u0 [n] = cos(0) = 1, implying that u0 = (1, 1, · · · , 1). (40)

It follows immediately that

⟨u0 , u0 ⟩ = \sum_{n=0}^{N−1} 1 = N. (41)

For k ≠ 0, we have

⟨uk , uk ⟩ = \sum_{n=0}^{N−1} cos^2 (π(n + 1/2)k/N )
= \sum_{n=0}^{N−1} [1/2 + (1/2) cos((2n + 1)πk/N )]
= N/2. (42)

The fact that the sum of the discrete cosine functions is zero may be verified by expressing the cosine
in terms of complex exponentials. The sums over both exponentials are finite geometric series which
vanish, in the same way that they did for the discrete Fourier transform.

Finally, for k ≠ l, we simply state that

⟨uk , ul ⟩ = \sum_{n=0}^{N−1} cos(π(n + 1/2)k/N ) cos(π(n + 1/2)l/N )
= (1/2) \sum_{n=0}^{N−1} cos(π(n + 1/2)(k + l)/N ) + (1/2) \sum_{n=0}^{N−1} cos(π(n + 1/2)(k − l)/N )
= 0. (43)

Once again, the fact that each of the sums is zero may be shown by expressing the cosine functions in
terms of complex exponentials.

From the above results, it follows that the family of N -vectors ek , defined below, forms an
orthonormal basis of RN :

ek [n] = λk \sqrt{2/N} cos(π(n + 1/2)k/N ), n = 0, 1, · · · , N − 1, (44)

where

λk = 1/\sqrt{2} for k = 0, and λk = 1 for k ≠ 0. (45)
The special normalization required for the cos(0) function reminds us of the situation with Fourier
cosine series.
In the figure below are presented plots of the N = 8-point orthonormal functions ek [n], k =
0, 1, · · · , 7. These functions are rather special since they form the basis of the JPEG compression
standard.

Given any f ∈ RN , its expansion in the orthonormal basis ek will be given by

f = \sum_{k=0}^{N−1} ck ek , (46)

where the Fourier coefficients ck are given by

ck = ⟨f, ek ⟩ = \sum_{n=0}^{N−1} f [n] λk \sqrt{2/N} cos(π(n + 1/2)k/N ). (47)

As in the case of the discrete Fourier transform, we consider the ck to define the discrete cosine
transform (DCT) of f , i.e.,

F [k] = λk \sqrt{2/N} \sum_{n=0}^{N−1} f [n] cos(π(n + 1/2)k/N ). (48)

[Figure] The N = 8-point DCT-II orthonormal functions ek [n], k = 0, 1, · · · , 7, plotted as bargraphs because of their discrete nature.

The inverse discrete cosine transform may be found by using the orthonormality of the ek :

f [n] = \sqrt{2/N} \sum_{k=0}^{N−1} F [k] λk cos(π(n + 1/2)k/N ). (49)

These two formulas define the so-called “DCT-II” discrete cosine transform that is employed
in the JPEG compression standard. They also correspond to the dct
and idct functions in MATLAB, i.e.,

F = dct(f)
f = idct(F) .

In many signal processing books, however (see, for example, the book by Mallat), the practice is to
remove the normalization term from the forward DCT, and then modify the inverse DCT accordingly.
The result is as follows,

F [k] = λk \sum_{n=0}^{N−1} f [n] cos(π(n + 1/2)k/N ), (50)

with inverse

f [n] = (2/N ) \sum_{k=0}^{N−1} F [k] λk cos(π(n + 1/2)k/N ). (51)

The normalization factors λk must be kept in both formulas, to differentiate the k = 0 case from the
k ≠ 0 case.
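In Python, SciPy’s DCT-II with the orthonormal convention plays the role of MATLAB’s dct/idct; a sketch (assuming SciPy is available):

import numpy as np
from scipy.fft import dct, idct

f = np.array([1.0, 1.0, 1.0, 1.0])

F = dct(f, type=2, norm='ortho')     # orthonormal DCT-II, Eq. (48)
print(F)                             # [2. 0. 0. 0.]
print(np.allclose(idct(F, type=2, norm='ortho'), f))   # True, Eq. (49)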

Examples: N = 4: We use the 4-vectors examined in the section on DFT. In what follows, we use
the DCT-II transforms (which may be verified with MATLAB):

1. f = (1, 1, 1, 1). F = (2, 0, 0, 0).



2. g = (0, 1, 0, 1). G = (1, (1 − \sqrt{2}) cos(π/8), 0, − cos(π/8)) ≈ (1, −0.3827, 0, −0.9239).

3. h = (1, 2, 1, 2). H = (3, (1 − \sqrt{2}) cos(π/8), 0, − cos(π/8)).

Note that h = f + g, implying that H = F + G, because of the linearity of the DCT.



4. a = (1, 0, 1, 0). A = (1, (\sqrt{2} − 1) cos(π/8), 0, cos(π/8)) ≈ (1, 0.3827, 0, 0.9239).

Note that a represents a left (or right) shift of the vector g in 2. The transforms G and A appear
to be related.
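These transforms may be checked numerically; a sketch using SciPy’s orthonormal DCT-II (the printed values are what the formulas above predict):

import numpy as np
from scipy.fft import dct

for name, v in [("f", [1, 1, 1, 1]), ("g", [0, 1, 0, 1]),
                ("h", [1, 2, 1, 2]), ("a", [1, 0, 1, 0])]:
    print(name, np.round(dct(np.array(v, dtype=float), type=2, norm='ortho'), 4))
# f [ 2.      0.      0.      0.    ]
# g [ 1.     -0.3827  0.     -0.9239]
# h [ 3.     -0.3827  0.     -0.9239]
# a [ 1.      0.3827  0.      0.9239]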

Applications of DCT to signal processing

As mentioned several times above, the DCT is important in signal processing. Generally speaking,
DCTs demonstrate much greater “energy compaction” than discrete Fourier transforms. By this we
mean that more of the “energy” of the signal – the squared L2 norm, measured by the sums of the
squares of the coefficients – is contained in the lower frequency coefficients. (Another way of saying
this is that the DCT coefficients decay more rapidly.) This is important in compression – a signal
can generally be approximated to a prescribed degree of accuracy with fewer DCT coefficients than DFT
coefficients.

We illustrate with a simple example. Consider the function

f (x) = e^{−x^2/10} [sin(2x) + 2 cos(4x) + 0.4 sin(x) sin(10x)], 0 ≤ x ≤ 2π, (52)

that was used in previous lectures. Once again, we consider an N = 256 sampling of this function over
the interval [0, 2π]. The DFT and DCT of the signal f [n], n = 0, 1, · · · , 255 are shown in the figure
below. (The DFT appeared in an earlier lecture.)

[Figure] DFT and DCT spectra of the sampled signal f [n], n = 0, 1, · · · , N − 1, N = 256, obtained from the function of Eq. (52). Left: Magnitudes of DFT coefficients. Right: Magnitudes of DCT coefficients. The DCT demonstrates a much greater “energy compaction”: most of the “energy” (squared L2 norm) of the signal is contained in the first 23 DCT coefficients.

The first notable difference between the plots is that the DCT does not have high-magnitude
coefficients near N . This is because the DCT does not possess conjugate symmetry. As such, the
high-frequency range for the DCT is k → N , and not k near N/2, as was the case for the DFT.
The second feature is that for k > 30, the DCT coefficients are virtually negligible. This indicates
the energy compaction property mentioned earlier.
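A sketch that quantifies this energy compaction for the sampled signal of Eq. (52), using Parseval’s relation for the orthonormal DCT (the 99.99% energy cutoff is an arbitrary illustrative choice):

import numpy as np
from scipy.fft import dct

N = 256
x = 2.0 * np.pi * np.arange(N) / N
f = np.exp(-x**2 / 10) * (np.sin(2*x) + 2*np.cos(4*x) + 0.4*np.sin(x)*np.sin(10*x))

C = dct(f, type=2, norm='ortho')           # orthonormal: sum C^2 = sum f^2
energy_frac = np.cumsum(C**2) / np.sum(f**2)
print(np.argmax(energy_frac > 0.9999) + 1) # number of leading DCT coeffs needed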

A simple experiment to compare DFT and DCT methods of denoising by thresholding

Let f0 denote an N -point signal that is assumed to be “noiseless.” If the DCT coefficients demonstrate
greater “energy compaction”, then for some integer k0 > 0, we expect that the DCT coefficients F0 [k]
are negligible for k > k0 . Now suppose that Gaussian noise N (0, σ) (zero-mean, variance σ^2 ) is added
to f0 to produce a noisy signal f , i.e.,

f = f0 + n, (53)

where n is an N -vector whose entries are random numbers.


The DCT spectrum of pure noise – let’s call it Fn [k] – has the same characteristics as its DFT
spectrum, i.e., statistical fluctuations, but no overall decay with k. From the linearity property of the
DCT,
F [k] = F0 [k] + Fn [k], (54)

it follows that for k > k0 , F [k] ≈ Fn [k]. In other words, the DCT coefficients F [k] for k > k0

correspond to the noise, and carry no information from the signal f0 . As such, if we can remove these,
we are removing some of the noise and not affecting the original signal.
This is more difficult with the discrete Fourier transform since, as we saw earlier, the DFT coefficients
do decay in magnitude but do not become negligible.
To test this conjecture, we have applied the thresholding method for both DFT and DCT representations
of the noisy signal f obtained by adding Gaussian noise with zero-mean and standard
deviation σ = 0.1 to the N = 256 signal of Eq. (52). (This is the same noisy signal that was used in
an earlier lecture to illustrate the denoising method for DFTs.) Recall the basis of the thresholding
method: Given a threshold ε > 0, and a transform F – either a DFT or a DCT – we discard all
coefficients F [k] whose magnitudes lie below ε. This produces a modified transform F̃ε which, when
inverted, yields a modified signal f̃ε which represents an approximation to the noiseless signal f0 .
For both representations, we have computed the L2 errors ‖f0 − f̃ε ‖ for threshold parameters ε
ranging from 0 to 10.0. The results are shown in the plot below. For 0 ≤ ε ≤ 0.5, the two transforms
yield virtually identical results, with almost no improvement in the error. For ε > 1, however, the
respective errors begin to diverge dramatically, with the DCT method yielding lower errors and the
DFT yielding higher errors. Between ε = 3 and ε = 4, the DCT thresholding method yields the lowest
error – roughly one-half the error associated with the original noisy signal.
This simple experiment shows the advantage of working with the more compact DCT representation.
A final comment: The “unsymmetric” form of the DCT was employed in these computations,
where there is no factor in front of the forward DCT and a factor of 2/N in the inverse DCT, cf.
Eqs. (50) and (51). In this way, the magnitudes of the DFT and DCT coefficients are comparable,
so that it makes sense to compare the results for the two transforms at a common ε value.
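A sketch of this comparison (SciPy’s unnormalized DCT-II carries slightly different scaling conventions than those just described, so the curves will only roughly reproduce the figure below; the seed is arbitrary):

import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(4)
N, sigma = 256, 0.1
x = 2.0 * np.pi * np.arange(N) / N
f0 = np.exp(-x**2 / 10) * (np.sin(2*x) + 2*np.cos(4*x) + 0.4*np.sin(x)*np.sin(10*x))
f = f0 + rng.normal(0.0, sigma, N)

for eps in np.arange(0.0, 10.5, 0.5):
    F = np.fft.fft(f)                                          # DFT thresholding
    fd = np.real(np.fft.ifft(np.where(np.abs(F) >= eps, F, 0)))
    C = dct(f, type=2)                                         # DCT thresholding
    fc = idct(np.where(np.abs(C) >= eps, C, 0), type=2)
    print(f"eps={eps:4.1f}  DFT err={np.linalg.norm(f0 - fd):.2f}"
          f"  DCT err={np.linalg.norm(f0 - fc):.2f}")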

[Figure] Errors ‖f0 − f̃ε ‖ vs. ε for thresholding of DFT and DCT coefficients of the noisy signal f = f0 + n, σ = 0.1. The DFT error curve lies above the DCT error curve for ε > 1.

