
Lecture 6

Inner product spaces (cont’d)

Best approximation in Hilbert space: Some examples

We now consider some examples to illustrate the property that the best approximation of an element
x ∈ H in a subspace Y ⊂ H is the projection PY (x).

1. The finite dimensional case H = R3 , as an easy “starter”. Let x = (a, b, c) = ae1 + be2 + ce3 .
Geometrically, the ei may be visualized as the i, j and k unit vectors which form an orthonormal
set. Now let Y = span{e1 , e2 }. Then y = PY (x) = (a, b, 0), which lies in the e1 -e2 plane.
Moreover, the distance between x and y is

$$\|x - y\| = \left[ (a-a)^2 + (b-b)^2 + (c-0)^2 \right]^{1/2} = |c|, \qquad (1)$$

the distance from x to the e1 -e2 plane.
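
This projection is easy to verify numerically. A minimal sketch in Python/NumPy; the particular vector x is an arbitrary illustrative choice:

```python
import numpy as np

# Best approximation of x = (a, b, c) in Y = span{e1, e2} is the
# orthogonal projection P_Y(x) = <x,e1> e1 + <x,e2> e2.
x = np.array([2.0, -1.0, 3.0])        # a = 2, b = -1, c = 3
e1, e2, e3 = np.eye(3)                 # the standard orthonormal basis

y = x.dot(e1)*e1 + x.dot(e2)*e2        # projection onto the e1-e2 plane
print(y)                               # [ 2. -1.  0.]
print(np.linalg.norm(x - y))           # 3.0, which equals |c|
```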

2. Now let $H = L^2[-\pi,\pi]$, the space of square-integrable functions on [−π, π], and consider the following set of functions:
$$e_1(x) = \frac{1}{\sqrt{2\pi}}, \qquad e_2(x) = \frac{1}{\sqrt{\pi}}\cos x, \qquad e_3(x) = \frac{1}{\sqrt{\pi}}\sin x. \qquad (2)$$
These three functions form an orthonormal set in H, i.e.,
$$\langle e_i, e_j \rangle = \int_{-\pi}^{\pi} e_i(x)\, e_j(x)\, dx = \delta_{ij}. \qquad (3)$$

(The reader is invited to verify the above statement.) Let $Y_3 = \text{span}\{e_1, e_2, e_3\}$.

Now consider the function $f(x) = x^2$ in $L^2[-\pi,\pi]$. The best approximation to f in $Y_3$ will be given by the function
$$f_3 = P_{Y_3} f = \langle f, e_1 \rangle e_1 + \langle f, e_2 \rangle e_2 + \langle f, e_3 \rangle e_3. \qquad (4)$$

We now compute the Fourier coefficients:


$$\langle f, e_1 \rangle = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} x^2\, dx = \cdots = \frac{\sqrt{2}}{3}\,\pi^{5/2},$$
$$\langle f, e_2 \rangle = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} x^2 \cos x\, dx = \cdots = -4\sqrt{\pi},$$
$$\langle f, e_3 \rangle = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} x^2 \sin x\, dx = 0. \qquad (5)$$

The final result is


   
$$f_3(x) = \frac{\sqrt{2}}{3}\,\pi^{5/2}\left(\frac{1}{\sqrt{2\pi}}\right) - 4\sqrt{\pi}\left(\frac{1}{\sqrt{\pi}}\right)\cos x = \frac{\pi^2}{3} - 4\cos x. \qquad (6)$$

Note: This result is identical to that obtained by the traditional Fourier series method, where one works directly with the cos x and sin x functions and computes the expansion coefficients using the formulas from AMATH 231, cf. Lecture 2. Computationally, more work is involved in producing the above result, because one must carry factors and powers of π that eventually cancel. The advantage of the above approach is that it illustrates the "best approximation" idea in terms of projections onto spans of orthonormal sets.

Finally, we compute the error of the above approximation to be (via MAPLE)


"Z 
π 2 #1/2
2 π2
kf − f3 k2 = x − + 4 cos x dx ≈ 2.034. (7)
−π 3

The approximation is sketched in the first figure below.
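
The coefficients and the error above are easy to reproduce numerically. The following is a minimal sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.integrate import quad

# Projection of f(x) = x^2 onto Y3 = span{e1, e2, e3} in L2[-pi, pi],
# with the orthonormal set of Eq. (2).
f = lambda x: x**2
e = [lambda x: 1/np.sqrt(2*np.pi),
     lambda x: np.cos(x)/np.sqrt(np.pi),
     lambda x: np.sin(x)/np.sqrt(np.pi)]

# Fourier coefficients <f, e_k> by numerical quadrature.
c = [quad(lambda x, ek=ek: f(x)*ek(x), -np.pi, np.pi)[0] for ek in e]

# Best approximation f3 and the L2 error ||f - f3||_2.
f3 = lambda x: sum(ck*ek(x) for ck, ek in zip(c, e))
err = np.sqrt(quad(lambda x: (f(x) - f3(x))**2, -np.pi, np.pi)[0])
print(err)   # approximately 2.034, matching Eq. (7)
```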

3. Same space and function $f(x) = x^2$ as above, but we add two more elements to our orthonormal basis set,
$$e_4(x) = \frac{1}{\sqrt{\pi}}\cos 2x, \qquad e_5(x) = \frac{1}{\sqrt{\pi}}\sin 2x. \qquad (8)$$
Now define $Y_5 = \text{span}\{e_1, \cdots, e_5\}$. The approximation to f in this space, $f_5 = P_{Y_5} f$, is
$$f_5 = \sum_{k=1}^{5} \langle f, e_k \rangle e_k = f_3 + \langle f, e_4 \rangle e_4 + \langle f, e_5 \rangle e_5. \qquad (9)$$

As you already know from AMATH 231, the first three coefficients of the expansion do not have
to be recomputed. This is the advantage of working with an orthonormal basis set.
The final two Fourier coefficients are now computed,
$$\langle f, e_4 \rangle = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} x^2 \cos 2x\, dx = \sqrt{\pi},$$
$$\langle f, e_5 \rangle = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} x^2 \sin 2x\, dx = 0. \qquad (10)$$
The final result is
$$f_5(x) = \frac{\pi^2}{3} - 4\cos x + \cos 2x. \qquad (11)$$

This approximation is sketched in the second figure below. Finally, the error of this approximation is computed to be (via MAPLE)

$$\|f - f_5\|_2 \approx 1.694, \qquad (12)$$

which is lower than the approximation error yielded by f3 , as expected.


[Figure: Approximation $f_3(x) = (P_{Y_3} f)(x)$ to $f(x) = x^2$. Error $\|f - f_3\|_2 \approx 2.034$.]


[Figure: Approximation $f_5(x) = (P_{Y_5} f)(x)$ to $f(x) = x^2$. Error $\|f - f_5\|_2 \approx 1.694$.]

4. Now consider the space $H = L^2[-1,1]$ and the following set of functions:
$$e_1(x) = \frac{1}{\sqrt{2}}, \qquad e_2(x) = \sqrt{\frac{3}{2}}\, x, \qquad e_3(x) = \sqrt{\frac{5}{2}} \cdot \frac{1}{2}\,(3x^2 - 1). \qquad (13)$$
These three functions form an orthonormal set on [−1, 1]. Moreover, $Z_3 = \text{span}\{e_1, e_2, e_3\} = \text{span}\{1, x, x^2\}$. (Note that we've called the space $Z_3$ to differentiate it from the space $Y_3$ in Example 2. The $e_i$ are obtained from the application of the Gram-Schmidt orthogonalization procedure to the linearly independent set $\{1, x, x^2\}$.)

Let's go ahead and determine the best approximation of the function $f(x) = x^2$ in $L^2[-1,1]$. Note that this function is different from the function $f(x) = x^2$ in $L^2[-\pi,\pi]$ considered in Examples 2 and 3. We compute the Fourier coefficients,
$$\langle f, e_1 \rangle = \frac{1}{\sqrt{2}} \int_{-1}^{1} x^2\, dx = \cdots = \frac{\sqrt{2}}{3},$$
$$\langle f, e_2 \rangle = \sqrt{\frac{3}{2}} \int_{-1}^{1} x^3\, dx = 0,$$
$$\langle f, e_3 \rangle = \sqrt{\frac{5}{2}} \cdot \frac{1}{2} \int_{-1}^{1} (3x^4 - x^2)\, dx = \cdots = \frac{2}{3}\sqrt{\frac{2}{5}}. \qquad (14)$$
The resulting expansion of f (x) in Z3 is

$$f_3 = P_{Z_3} f = \langle f, e_1 \rangle e_1 + \langle f, e_2 \rangle e_2 + \langle f, e_3 \rangle e_3$$
$$= \frac{\sqrt{2}}{3} \cdot \frac{1}{\sqrt{2}} \;+\; 0 \cdot x \;+\; \frac{2}{3}\sqrt{\frac{2}{5}} \cdot \sqrt{\frac{5}{2}} \cdot \frac{1}{2}\,(3x^2 - 1) = \frac{1}{3} + x^2 - \frac{1}{3} = x^2. \qquad (15)$$

We have reconstructed the function $f(x) = x^2$ completely from the basis elements $e_1$, $e_2$ and $e_3$! In retrospect, this should not be surprising. Our function $f(x) = x^2$ actually "lives" in the space $Z_3$, which we acknowledged earlier to be the span of the basis functions 1, x and $x^2$.

Let's step back a bit, however, and look at the best approximation of $f(x) = x^2$ in the space $Z_1$, i.e.,

$$f \approx f_1 = P_{Z_1} f = \langle f, e_1 \rangle e_1 = \frac{1}{3}. \qquad (16)$$
This is the best constant approximation to f over the interval [−1, 1] which, from an earlier
discussion, should be the mean value of f over this interval, which we denote as $\bar{f}_{[-1,1]}$. Let's check this:

$$\bar{f}_{[-1,1]} = \frac{1}{2} \int_{-1}^{1} x^2\, dx = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}. \qquad (17)$$
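
The exact reconstruction of $x^2$ above can also be confirmed numerically. A minimal sketch, assuming NumPy/SciPy:

```python
import numpy as np
from scipy.integrate import quad

# The orthonormal (normalized Legendre) basis of Eq. (13) on [-1, 1].
e = [lambda x: 1/np.sqrt(2),
     lambda x: np.sqrt(3/2)*x,
     lambda x: np.sqrt(5/2)*0.5*(3*x**2 - 1)]

f = lambda x: x**2
c = [quad(lambda x, ek=ek: f(x)*ek(x), -1, 1)[0] for ek in e]
print(np.round(c, 6))    # [0.471405, 0.0, 0.421637] = [sqrt(2)/3, 0, (2/3)sqrt(2/5)]

# Since x^2 lies in Z3, the projection reproduces f exactly:
res = quad(lambda x: (f(x) - sum(ck*ek(x) for ck, ek in zip(c, e)))**2, -1, 1)[0]
print(res)               # ~0, up to quadrature round-off
```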

5. We now consider a slightly more intriguing example that will provide a preview to our study of
wavelet functions later in this course. Consider the function space L2 [0, 1] and the two elements
e1 and e2 given by
$$e_1(x) = 1, \qquad e_2(x) = \begin{cases} 1, & 0 \le x \le 1/2, \\ -1, & 1/2 < x \le 1. \end{cases} \qquad (18)$$

They are sketched in the figure below.


[Figure: graphs of $y = e_1(x)$ and $y = e_2(x)$ on [0, 1], taking the values ±1.]

It is not too hard to see that these two functions form an orthonormal set in L2 [0, 1], i.e.,

$$\langle e_1, e_1 \rangle = \langle e_2, e_2 \rangle = 1, \qquad \langle e_1, e_2 \rangle = 0. \qquad (19)$$

Now let f (x) = x2 as before. Let us first consider the subspace Y1 = span{e1 }. It is the
one-dimensional subspace of functions in L2 [0, 1] that are constant on the interval. The approx-
imation to f in this space is given by

$$f_1 = P_{Y_1} f = \langle f, e_1 \rangle e_1 = \langle f, e_1 \rangle, \qquad (20)$$

since e1 = 1. The Fourier coefficient is given by


$$\langle f, e_1 \rangle = \int_0^1 x^2\, dx = \frac{1}{3}. \qquad (21)$$
Therefore the function $f_1(x) = \frac{1}{3}$, sketched in the left subfigure below, is the best constant-function approximation to $f(x) = x^2$ on the interval. It is the mean value of f on [0, 1].

Now consider the space Y2 = span{e1 , e2 }. The best approximation to f in this space will be
given by
$$f_2 = \langle f, e_1 \rangle e_1 + \langle f, e_2 \rangle e_2. \qquad (22)$$

The first term, of course, has already been computed. The second Fourier coefficient is given by
$$\langle f, e_2 \rangle = \int_0^1 x^2 e_2(x)\, dx = \int_0^{1/2} x^2\, dx - \int_{1/2}^{1} x^2\, dx = \left[ \frac{1}{3}x^3 \right]_0^{1/2} - \left[ \frac{1}{3}x^3 \right]_{1/2}^{1} = -\frac{1}{4}. \qquad (23)$$

Therefore
$$f_2 = \frac{1}{3}\,e_1 - \frac{1}{4}\,e_2. \qquad (24)$$
In order to get the graph of f2 from that of f1 , we simply subtract 1/4 from the value of 1/3
over the interval [0, 1/2] and add 1/4 to the value of 1/3 over the interval (1/2, 1]. The result is

 1/12, 0 ≤ x ≤ 1/2,
f3 (x) = (25)
 7/12, 1/2 < x ≤ 1.

The graph of $f_2(x)$ is sketched in the right subfigure below. The values 1/12 and 7/12 correspond
to the mean values of f (x) = x2 over the intervals [0, 1/2] and (1/2, 1], respectively. (These
should, of course, agree with your calculations in Problem Set No. 1.)
[Figure: left, the constant approximation $f_1 = 1/3$ plotted with $y = x^2$ on [0, 1]; right, the piecewise-constant approximation $f_2$, taking the values 1/12 on [0, 1/2] and 7/12 on (1/2, 1].]

The space $Y_2$ is the vector space of functions in $L^2[0,1]$ that are piecewise constant over the half-intervals [0, 1/2] and (1/2, 1]. The function $f_2$ is the best approximation to $x^2$ from this space.
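
The two coefficients above are simple to verify numerically. A minimal sketch, assuming NumPy/SciPy:

```python
import numpy as np
from scipy.integrate import quad

# Projection of f(x) = x^2 onto Y2 = span{e1, e2} in L2[0, 1],
# with e1 = 1 and e2 the step function of Eq. (18).
f  = lambda x: x**2
e2 = lambda x: 1.0 if x <= 0.5 else -1.0

c1 = quad(f, 0, 1)[0]                        # <f, e1> = 1/3
c2 = quad(lambda x: f(x)*e2(x), 0, 1)[0]     # <f, e2> = -1/4

f2 = lambda x: c1 + c2*e2(x)
print(f2(0.25), f2(0.75))                    # 1/12 and 7/12: the two mean values
```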

A natural question to ask is, “What would be the next functions in this set of piecewise constant
orthonormal functions?” Two possible candidates are the functions sketched below.
[Figure: graphs of the Walsh functions $y = e_3(x)$ and $y = e_4(x)$ on [0, 1], taking the values ±1.]

These, in fact, are called "Walsh functions" and have been used in signal and image processing.
However, another set of functions which can be employed, and which will be quite relevant later
in the course, include the following:
[Figure: graphs of the next two Haar functions $y = e_3(x)$ and $y = e_4(x)$ on [0, 1], taking the values $\pm\sqrt{2}$.]

These are the next two "Haar wavelet functions". We claim that the space $Y_4 = \text{span}\{e_1, e_2, e_3, e_4\}$ is the set of all functions in $L^2[0,1]$ that are piecewise constant on the quarter-intervals [0, 1/4], (1/4, 1/2], (1/2, 3/4] and (3/4, 1].

A note on the Gram-Schmidt orthogonalization procedure

As you may recall from earlier courses in linear algebra, the Gram-Schmidt procedure allows the construction of an orthonormal set $\{e_k\}_{k=1}^n$ from a linearly independent set $\{v_k\}_{k=1}^n$, with $\text{span}\{e_k\}_{k=1}^n = \text{span}\{v_k\}_{k=1}^n$. Here we simply recall the procedure.

Start with an element, say v1 , and define

$$e_1 = \frac{v_1}{\|v_1\|}. \qquad (26)$$

Now take element v2 and remove the component of e1 from v2 by defining

$$z_2 = v_2 - \langle v_2, e_1 \rangle e_1. \qquad (27)$$

We check that $e_1 \perp z_2$:
$$\langle e_1, z_2 \rangle = \langle e_1, v_2 \rangle - \langle v_2, e_1 \rangle \langle e_1, e_1 \rangle = 0. \qquad (28)$$

Now define
$$e_2 = \frac{z_2}{\|z_2\|}. \qquad (29)$$
We continue the procedure, taking v3 and eliminating the components of e1 and e2 from it. Define,

$$z_3 = v_3 - \langle v_3, e_1 \rangle e_1 - \langle v_3, e_2 \rangle e_2. \qquad (30)$$

It is straightforward to show that $z_3 \perp e_1$ and $z_3 \perp e_2$. Then define

$$e_3 = \frac{z_3}{\|z_3\|}. \qquad (31)$$

In general, from a knowledge of $\{e_1, \cdots, e_{k-1}\}$, we can produce the next element $e_k$ as follows:
$$e_k = \frac{z_k}{\|z_k\|}, \qquad \text{where} \quad z_k = v_k - \sum_{i=1}^{k-1} \langle v_k, e_i \rangle e_i. \qquad (32)$$

Of course, if the inner product space in which we are working is finite dimensional, then the procedure terminates at $k = n = \dim(H)$. But when H is infinite-dimensional, we may, at least in principle, be able to continue the process indefinitely, producing a countably infinite orthonormal set of elements $\{e_k\}_{k=1}^{\infty}$. The next question is, "Is such an orthonormal set useful?" The answer is, "Yes."
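
The procedure translates directly into code. Here is a minimal sketch in Python/NumPy, written for a general inner product; the vectors chosen below are arbitrary illustrative choices:

```python
import numpy as np

def gram_schmidt(vs, inner):
    """Orthonormalize a linearly independent list vs under the inner product `inner`."""
    es = []
    for v in vs:
        z = v - sum(inner(v, e) * e for e in es)   # z_k = v_k - sum_i <v_k, e_i> e_i
        es.append(z / np.sqrt(inner(z, z)))        # e_k = z_k / ||z_k||
    return es

# Example: the Euclidean inner product on R^3.
vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(vs, np.dot)

# The Gram matrix <e_i, e_j> should be the identity.
print(np.round([[np.dot(ei, ej) for ej in es] for ei in es], 10))
```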

Complete orthonormal basis sets in an infinite-dimensional Hilbert space

Let H be an infinite-dimensional Hilbert space. Let us also suppose that we have an infinite sequence of orthonormal elements $\{e_k\} \subset H$, with $\langle e_i, e_j \rangle = \delta_{ij}$. We consider the finite orthonormal sets $E_n = \{e_1, e_2, \cdots, e_n\}$, $n = 1, 2, \cdots$, and define
$$V_n = \text{span}\{e_1, e_2, \cdots, e_n\}, \qquad n = 1, 2, \cdots. \qquad (33)$$

Clearly, each Vn is an n-dimensional subspace of H.
Recall that for an $x \in H$, the best approximation to x in $V_n$ is given by
$$y_n = P_{V_n}(x) = \sum_{k=1}^{n} \langle x, e_k \rangle e_k, \qquad (34)$$
with
$$\|y_n\|^2 = \sum_{k=1}^{n} |\langle x, e_k \rangle|^2. \qquad (35)$$
Let us denote the error associated with the approximation x ≈ yn as

$$\Delta_n = \|x - y_n\|. \qquad (36)$$

Note that this error is the distance between x and $y_n$ as defined by the norm on H which, in turn, is defined by the inner product $\langle \cdot, \cdot \rangle$ on H.
Now consider $V_{n+1} = \text{span}\{e_1, e_2, \cdots, e_n, e_{n+1}\}$, which we may write as
$$V_{n+1} = V_n \oplus \text{span}\{e_{n+1}\}. \qquad (37)$$

It follows that
$$V_n \subset V_{n+1}. \qquad (38)$$

This, in turn, implies that
$$\Delta_{n+1} \le \Delta_n. \qquad (39)$$

We can achieve the same error ∆n in Vn+1 by imposing the condition that the coefficient cn+1 of
en+1 is zero in the approximation. By allowing cn+1 to vary, it might be possible to obtain a better
approximation. In other words, since we are minimizing over a larger set, we can’t do any worse
than we did before.

If our Hilbert space H were finite dimensional, i.e., dim(H) = N > 0, then ∆N = 0 for all x ∈ H.
But in the case that H is infinite-dimensional, we would like that

$$\Delta_n \to 0 \;\text{ as } n \to \infty, \quad \text{for all } x \in H. \qquad (40)$$

In other words all approximation errors go to zero in the limit. (Of course, in the particular case that
x ∈ VN , then ∆N = 0. But we want to be able to say something about all x ∈ H.) Then we shall be
able to write the infinite-sum result,

$$x = \sum_{k=1}^{\infty} \langle x, e_k \rangle e_k. \qquad (41)$$

The property (40) will hold provided that the orthonormal set $\{e_k\}_{k=1}^{\infty}$ is a complete or maximal orthonormal set in H.

Definition: An orthonormal set $\{e_k\}_{k=1}^{\infty}$ is said to be complete or maximal if the following is true:
$$\text{If } \langle x, e_k \rangle = 0 \text{ for all } k \ge 1, \text{ then } x = 0. \qquad (42)$$

The idea is that the {ek } elements “detect everything” in the Hilbert space H. And if none of
them detect anything in an element x ∈ H, then x must be the zero element.

Now, how do we know if a complete orthonormal set can exist in a given Hilbert space? The
answer is that if the Hilbert space is separable, then such a complete, countably-infinite set exists.
(A separable space contains a dense countable subset.) OK, so this doesn’t help yet, because we
now have to know whether our Hilbert space of interest is separable. Let it suffice here to state that
most of the Hilbert spaces that we use in applications are separable. (See Note below.) Therefore,
complete orthonormal basis sets can exist. And the final “icing on the cake” is the fact that, for
separable Hilbert spaces, the Gram-Schmidt orthogonalization procedure can produce such a complete
orthonormal basis.

Note to above: In the case of L2 [a, b], we have the following results from advanced analysis:

1. The space of all polynomials P[a, b] defined on [a, b] is dense in $L^2[a,b]$. That means that given any function $u \in L^2[a,b]$ and an $\epsilon > 0$, we can find an element $p \in P[a,b]$ such that $\|u - p\|_2 < \epsilon$.

2. The set of polynomials with rational coefficients, call it $P_R[a,b]$, a subset of P[a, b], is dense in P[a, b]. (You may know that the set of rational numbers is dense in R.) And finally, the set $P_R[a,b]$ is countable. (The set of rational numbers in R is countable.)

3. Therefore the set $P_R[a,b]$ is a dense and countable subset of $L^2[a,b]$.

Complete orthonormal basis sets – “Generalized Fourier series”

We now conclude our discussion of complete orthonormal basis sets in a Hilbert space.
In what follows, we let H denote an infinite-dimensional Hilbert space (for example, the space of
square-integrable functions L2 [−π, π]). It may help to recall the following definition.

Definition: An orthonormal set $\{e_k\}_{k=1}^{\infty}$ is said to be complete or maximal if the following is true:
$$\text{If } \langle x, e_k \rangle = 0 \text{ for all } k \ge 1, \text{ then } x = 0. \qquad (43)$$

Here is the main result:

Theorem: Let $\{e_k\}_{k=1}^{\infty}$ denote an orthonormal set on a separable Hilbert space H. Then the following statements are equivalent:

1. The set $\{e_k\}_{k=1}^{\infty}$ is complete (or maximal). (In other words, it serves as a complete basis for H.)

2. For any $x \in H$,
$$x = \sum_{k=1}^{\infty} \langle x, e_k \rangle e_k. \qquad (44)$$
(In other words, x has a unique representation in the basis $\{e_k\}$.)

3. For any $x \in H$,
$$\|x\|^2 = \sum_{k=1}^{\infty} |\langle x, e_k \rangle|^2. \qquad (45)$$
This is called Parseval's equation.

Notes:

1. The expansion in Eq. (44) is also called a “Generalized Fourier Series”. Note that the basis
elements ek do not have to be sine or cosine functions – they can be polynomials in x: the term
“Generalized Fourier Series” may still be used.

2. The coefficients ck = hx, ek i in Eq. (44) are often called “Fourier coefficients” even if the ek are
not sine or cosine functions.

3. Most important is the fact that, from Eq. (45), the infinite sequence of Fourier coefficients $c = (c_1, c_2, \cdots)$ is square-summable. In other words, $c \in l^2$, i.e., c is an element of the sequence space $l^2$. Parseval's relation in Eq. (45) may be rewritten as follows,
$$\|x\|_{L^2} = \|c\|_{l^2}. \qquad (46)$$

An important consequence of the above theorem:


As before, let $V_n = \text{span}\{e_1, e_2, \cdots, e_n\}$. For a given $x \in H$, let $y_n \in V_n$ be the best approximation of x in $V_n$, so that
$$y_n = \sum_{k=1}^{n} c_k e_k, \qquad c_k = \langle x, e_k \rangle. \qquad (47)$$

Then the magnitude of the error of the approximation $x \approx y_n$ is given by
$$\Delta_n = \|x - y_n\| = \left\| \sum_{k=1}^{\infty} c_k e_k - \sum_{k=1}^{n} c_k e_k \right\| = \left\| \sum_{k=n+1}^{\infty} c_k e_k \right\| = \left[ \sum_{k=n+1}^{\infty} c_k^2 \right]^{1/2}. \qquad (48)$$

In other words, the magnitude of the error is the magnitude (in l2 norm) of the “tail” of the sequence of
Fourier coefficients c, i.e., the sequence of Fourier coefficients {cn+1 , cn+2 , · · ·} that has been “thrown
away” in order to produce the approximation x ≈ yn . This truncation occurs because the basis
functions en+1 , en+2 , · · · do not belong to Vn . We shall return to this idea on a number of occasions
later in this course.
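
The "tail" interpretation is easy to play with numerically. A minimal sketch, assuming a hypothetical coefficient decay $c_k = 1/k$ chosen purely for illustration:

```python
import numpy as np

# Error of the best approximation y_n as the l2 norm of the discarded tail
# of Fourier coefficients, per Eq. (48). The decay c_k = 1/k is illustrative.
c = 1.0 / np.arange(1, 100001)

def tail_error(c, n):
    """||x - y_n|| = l2 norm of the coefficients c_{n+1}, c_{n+2}, ..."""
    return np.sqrt(np.sum(c[n:]**2))

for n in [10, 100, 1000]:
    print(n, tail_error(c, n))   # the error shrinks as more terms are kept
```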

Lecture 7

Inner product spaces (cont’d)

Some alternate versions of Fourier sine/cosine series

1. We have already seen and used one important orthonormal basis set: the (normalized) cosine/sine basis functions for Fourier series on [−π, π]:
$$\{e_k\} = \left\{ \frac{1}{\sqrt{2\pi}},\; \frac{1}{\sqrt{\pi}}\cos x,\; \frac{1}{\sqrt{\pi}}\sin x,\; \frac{1}{\sqrt{\pi}}\cos 2x,\; \cdots \right\} \qquad (49)$$

These functions form a complete orthonormal basis in the space of real-valued square-integrable
functions L2 [−π, π].

This is a natural, but particular, case of the more general class of orthonormal sine/cosine functions on the interval [−a, a], where a > 0:
$$\{e_k\} = \left\{ \frac{1}{\sqrt{2a}},\; \frac{1}{\sqrt{a}}\cos\left(\frac{\pi x}{a}\right),\; \frac{1}{\sqrt{a}}\sin\left(\frac{\pi x}{a}\right),\; \frac{1}{\sqrt{a}}\cos\left(\frac{2\pi x}{a}\right),\; \cdots \right\} \qquad (50)$$

These functions form a complete orthonormal basis in the space of real-valued square-integrable
functions L2 [−a, a]. When a = π, we have the usual Fourier series functions cos kx and sin kx.
We shall return to this set in the next lecture.

2. Sometimes, it is convenient to employ the following set of complex-valued square-integrable functions on [−π, π]:
$$e_k = \frac{1}{\sqrt{2\pi}}\, \exp(ikx), \qquad k = \cdots, -2, -1, 0, 1, 2, \cdots. \qquad (51)$$

Note that the index k is infinite in both directions. These functions form a complete set in the
complex-valued space L2 [−π, π]. The orthonormality of this set with respect to the complex-
valued inner product on [−π, π] is left as an exercise. Because of Euler’s formula,

$$e^{ikx} = \cos kx + i \sin kx, \qquad (52)$$

expansions in this basis are related to Fourier series expansions. In a sense, this set of complex-valued functions combines the infinite families of cosine and sine functions from Eq. (49) into one doubly infinite sequence of functions. We shall return to this set in the near future.

By means of scaling of the above result, it is easy to show that the following set forms a complete orthonormal basis for the complex-valued space $L^2[-a,a]$:
$$e_k = \frac{1}{\sqrt{2a}}\, \exp\left(\frac{ik\pi x}{a}\right), \qquad k = \cdots, -2, -1, 0, 1, 2, \cdots. \qquad (53)$$

3. In many books and discussions, the interval of interest is [0, 1]. Using a change of variable in Eq. (49), one can show that the following sequence of functions,
$$\{e_k\} = \left\{ 1,\; \sqrt{2}\cos 2\pi x,\; \sqrt{2}\sin 2\pi x,\; \sqrt{2}\cos 4\pi x,\; \sqrt{2}\sin 4\pi x,\; \cdots \right\}, \qquad (54)$$
forms an orthonormal basis for the space $L^2[0,1]$.

Inner product spaces (cont’d)

Convergence of Fourier series expansions

Here we briefly discuss some convergence properties of Fourier series expansions, namely,

1. pointwise convergence: the convergence of a series at a point x,

2. uniform convergence: the convergence of a series on an interval [a, b] in the $\|\cdot\|_{\infty}$ norm/metric,

3. convergence in mean: the convergence of a series on an interval [a, b] in the $\|\cdot\|_{2}$ norm/metric.

You may have seen some, perhaps all, of these ideas in AMATH 231 (or equivalent). We shall cover
them briefly, and without proof. What will be of greater concern to us in this course is the rate of
convergence of a series and its importance in signal/image processing.

Recall that the Fourier series expansion of a function f(x) over the interval [−π, π] has the following form,
$$f(x) = a_0 + \sum_{k=1}^{\infty} \left[ a_k \cos kx + b_k \sin kx \right], \qquad x \in [-\pi, \pi]. \qquad (55)$$

The right-hand-side of Eq. (55) is clearly a 2π-periodic function of x. As such, one expects
that it will represent a 2π-periodic function. Indeed, as you probably saw in AMATH 231, this is
the case. If we consider values of x outside the interval (−π, π), then the series will represent the
so-called “2π-extension” of f (x) – one essentially takes the graph of f (x) on (−π, π) and copies it
on each interval $(-\pi + 2k\pi,\, \pi + 2k\pi)$, $k = \pm 1, \pm 2, \cdots$. There are some potential complications, however, at the connection points $(2k+1)\pi$, $k \in \mathbb{Z}$. It is for this reason that we used the open interval (−π, π) above.

Case 1: f is 2π-periodic. In this case, there are no problems with translating the graph of f (x) on
[−π, π], since f (−π) = f (π). Two contiguous graphs will intersect at the connection points. Without
loss of generality, let us simply assume that f (x) is continuous on [−π, π]. Then the graph of its
2π-extension is continuous at all x ∈ R. A sketch is given below.

Case 2: f is not 2π-periodic. In particular, $f(-\pi) \ne f(\pi)$. Then there is no way that two contiguous graphs of f(x) will intersect at the connection points – there will be discontinuities, as sketched below.

[Figure: Case 1: the 2π-extension of a 2π-periodic function f(x); copies of the graph join continuously at $\pm\pi, \pm 3\pi, \cdots$.]

[Figure: Case 2: the 2π-extension of a function f(x) that is not 2π-periodic; the copies produce jump discontinuities at the connection points.]

The existence of such discontinuities in the 2π-extension of f will have consequences regarding
the convergence of Fourier series to the function f (x), even on (−π, π). Of course, there may be other
discontinuities of f inside the interval (−π, π), which will also affect the convergence.

We now state some convergence results, starting with the “weakest result,” i.e., the result that has
minimal assumptions on f (x).

Convergence Result No. 1: f is square-integrable on [−π, π]. Mathematically, we require that $f \in L^2[-\pi,\pi]$, that is,
$$\int_{-\pi}^{\pi} |f(x)|^2\, dx < \infty. \qquad (56)$$

As we discussed in a previous lecture, the function f (x) doesn’t have to be continuous – it can be
piecewise continuous, or even worse! For example, it doesn’t even have to be bounded. In the appli-
cations examined in this course, however, we shall be dealing with bounded functions. In engineering
parlance, if f satisfies the condition in Eq. (56) then it is said to have “finite energy.”
In this case, the convergence result is as follows: The Fourier series in (55) converges to f in L2

norm/metric. This is also known as convergence of the Fourier series in mean to f. This type of convergence implies that the partial sums $S_n$ of the Fourier series converge to f as follows,
$$\|f - S_n\|_2 \to 0 \quad \text{as } n \to \infty. \qquad (57)$$

Note that this property does not imply that the partial sums $S_n$ converge pointwise, i.e., that $|f(x) - S_n(x)| \to 0$ as $n \to \infty$ at each $x \in [-\pi, \pi]$.

As for a proof of this convergence result: It follows from the Generalized Fourier Series Theorem
stated near the end of the previous lecture. We use the fact that the sine/cosine basis used in Fourier
series is complete in L2 [−π, π] which will allow us to express a function f as an infinite series in the
sine and cosine functions.

On the other side of the spectrum, we have the “strongest result,” i.e., the result that has quite
stringent demands on the behaviour of f .

Convergence Result No. 2: f is 2π-periodic and continuous on [−π, π]. Note that this also implies that $f(\pi) = f(-\pi)$. In this case, the Fourier series in (55) converges uniformly to f on [−π, π], i.e., it converges to f in the $\|\cdot\|_{\infty}$ norm/metric. From previous discussions, this implies that the partial sums $S_n$ converge to f as follows,
$$\|f - S_n\|_{\infty} \to 0 \quad \text{as } n \to \infty. \qquad (58)$$

This is a very strong result – it implies that the partial sums Sn converge pointwise to f : For all
x ∈ [−π, π],
$$|f(x) - S_n(x)| \to 0 \quad \text{as } n \to \infty. \qquad (59)$$

But the result is actually stronger than this, since the pointwise convergence is uniform over the interval [−π, π], in an "$\epsilon$-ribbonlike" fashion. This comes from the definition of the $\|\cdot\|_{\infty}$ norm.

A proof of this result can be found in the book by Boggess and Narcowich – see Theorem 1.30 and
its proof, pp. 72-75.

Example: Consider the function
$$f(x) = |x| = \begin{cases} -x, & -\pi < x \le 0, \\ x, & 0 < x \le \pi, \end{cases} \qquad (60)$$

which is continuous on [−π, π]. Moreover, its 2π-extension is also continuous on R since f (−π) =
f (π) = π. Because the function f (x) is even, the expansion is only in terms of the cosine functions.
The series has the form (Exercise)
$$f(x) = a_0 + \sum_{k=1}^{\infty} a_k \cos kx, \qquad a_0 = \frac{\pi}{2}, \qquad a_k = \begin{cases} -\dfrac{4}{\pi k^2}, & k \text{ odd}, \\[1mm] 0, & k \text{ even}, \end{cases} \quad k \ge 1. \qquad (61)$$

In the figure below is presented a plot of the partial sum S9 (x) which is comprised of only six nonzero
coefficients, a0 , a1 , a3 , a5 , a7 , a9 . Despite the fact that we use only six terms of the Fourier series, an
excellent approximation to f (x) is achieved over the interval [−π, π]. The use of 11 nonzero coefficients,
i.e., the partial sum S19 (x) produces an approximation that is virtually indistinguishable from the plot
of the function f (x) in the figure!
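
These partial sums are easy to generate. A minimal sketch, assuming the coefficients of Eq. (61):

```python
import numpy as np

# Partial sums S_n of the Fourier cosine series of f(x) = |x| on [-pi, pi],
# with a_0 = pi/2 and a_k = -4/(pi k^2) for odd k, per Eq. (61).
def S(n, x):
    s = np.full_like(x, np.pi/2)
    for k in range(1, n + 1, 2):            # only odd k contribute
        s += (-4.0/(np.pi * k**2)) * np.cos(k*x)
    return s

x = np.linspace(-np.pi, np.pi, 2001)
for n in [9, 19]:
    print(n, np.max(np.abs(np.abs(x) - S(n, x))))   # sup-norm error shrinks
```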
[Figure: Partial sum $S_9(x)$ of the Fourier cosine series expansion (61) of the 2π-periodic extension of $f(x) = |x|$. The function f(x) is also plotted.]

Convergence Result No. 2 is applicable in this case, so we may conclude that the Fourier series
converges uniformly to f (x) over the entire interval [−π, π]. That being said, we notice that the degree
of accuracy achieved at the points x = 0, x = ±π is not the same as at other points, in particular,
x = ±π/2. Even though uniform convergence is guaranteed, the rate of convergence is seen to be

a little slower at these “kinks”. These points actually represent singularities of the function – not
points of discontinuity of the function f (x) but of its derivative f ′ (x). Even such singularities can
affect the rate of convergence of a Fourier series expansion. We’ll say more about this later.

Uniform convergence implies convergence in mean

We expect that if the stronger convergence result, No. 2, applies to a function f , then the weaker
result, No. 1, will also apply to it, i.e.,

uniform convergence on [−π, π] implies L2 convergence on [−π, π].

A quick way to see this is that if f ∈ C[a, b], it is bounded on [a, b], implying that it must be in L2 [a, b].
But let’s go through the mathematical details, since they are revealing.
Suppose that $f \in C[a,b]$. Its $L^2$ norm is given by
$$\|f\|_2 = \left[ \int_a^b |f(x)|^2\, dx \right]^{1/2}. \qquad (63)$$

Since $f \in C[a,b]$, it is bounded on [a, b]. Let M denote the value of the infinity norm of f, i.e.,
$$M = \max_{a \le x \le b} |f(x)| = \|f\|_{\infty}. \qquad (64)$$

Now return to Eq. (63) and note that, from the basic properties of integrals,
$$\int_a^b |f(x)|^2\, dx \le \int_a^b M^2\, dx = M^2(b-a). \qquad (65)$$

Substituting this result into (63), we have
$$\|f\|_2 \le M\sqrt{b-a} = \sqrt{b-a}\, \|f\|_{\infty}. \qquad (66)$$

Now replace f with $f - S_n$:
$$\|f - S_n\|_2 \le \sqrt{b-a}\, \|f - S_n\|_{\infty}. \qquad (67)$$

Uniform convergence implies that the RHS goes to zero as n → ∞. This, in turn, implies that the
LHS goes to zero as n → ∞, which implies convergence in L2 , proving the desired result.

Convergence Results 1 and 2 appear to represent opposite sides of the spectrum in terms of the behaviour of f. Result 1 assumes that f is square-integrable over the interval, whereas Result 2 assumes a good deal more, namely continuity. The following result, where f is assumed to be piecewise continuous, is a kind of intermediate result which is quite applicable in signal and image processing.
Recall that f is said to be piecewise continuous on an interval I if it is continuous at all x ∈ I with
the exception of a finite number of points in I. In this way, it can have “jumps”.

Convergence Result No. 3: f is piecewise C 1 on [−π, π]. In this case:

1. The Fourier series converges uniformly to f on any closed interval [a, b] that does not contain a
point of discontinuity of f .

2. If p denotes a point of discontinuity of f, then at p the Fourier series converges to the value
$$\frac{f(p+0) + f(p-0)}{2}, \qquad (68)$$
where
$$f(p+0) = \lim_{h \to 0^+} f(p+h), \qquad f(p-0) = \lim_{h \to 0^+} f(p-h). \qquad (69)$$

Note: The “piecewise C 1 ” requirement, as opposed to “piecewise C” guarantees that the slopes f ′ (x)
of tangents to the curve remain finite as x approaches points of discontinuity both from the left and
from the right. A proof of this convergence result may be found in the book by Boggess and Narcowich,
cf. Theorem 1.22, p. 63 and Theorem 1.28, p. 70.

Example: Consider the function defined by
$$f(x) = \begin{cases} -1, & -\pi < x \le 0, \\ 1, & 0 < x \le \pi. \end{cases} \qquad (70)$$

Because the function f (x) is odd, the expansion is only in terms of the sine functions (as you found
in Problem Set No. 1). The series has the form


$$f(x) = \sum_{k=1}^{\infty} b_k \sin kx, \qquad b_k = \begin{cases} 4/(k\pi), & k \text{ odd}, \\ 0, & k \text{ even}. \end{cases} \qquad (71)$$

Clearly, f (x) is discontinuous at x = 0 because of the jump there. But its 2π-extension is also
discontinuous at x = ±π. In the figure below is presented a plot of the partial sum S50 (x) of the
Fourier series expansion to this function.

[Figure: Partial sum $S_{50}(x)$ of the Fourier sine series expansion (71) of the 2π-periodic piecewise constant function (70). The function f(x) is also plotted.]

Clearly, f (x) is continuous at all x ∈ [−π, π] except at x = 0 and x = ±π. In the vicinity of these
points, the convergence of the Fourier series appears to be slower – one would need a good number
of additional terms in the expansion in order to approximate f (x) near these points to the accuracy
demonstrated elsewhere, say near x = ±π/2. According to the first point of Convergence Result No.
3, the Fourier series converges uniformly on any closed interval [a, b] that does not contain the points
of discontinuity x = 0, x = −π, x = π. Even though the convergence on such a closed interval is
uniform, it may not necessarily be very rapid. Compare this result to that obtained by only six terms
of the Fourier series to the continuous function f (x) = |x| shown in the previous figure. We’ll return
to this point in the next lecture.
Intuitively, one may imagine that it takes a great deal of effort for the series to be approximating
the value f (x) = −1 for negative values of x near 0, and then having to jump up to approximate the
value f (x) = 1 for positive values of x near 0. As such, more terms of the expansion are required
because of the dramatic jump in the function. We shall return to this point in the next lecture as well.
At each of the three discontinuities in the plot, we see that the second point of Convergence Result No. 3, regarding the behaviour of the Fourier series at a discontinuity, is obeyed. For example, at x = 0, the series converges to zero, because all terms are zero: $\sin(k \cdot 0) = 0$ for all k. And zero is precisely the average value of the left and right limits $f(0-0) = -1$ and $f(0+0) = 1$. The same holds true at $x = \pi$ and $x = -\pi$.

The visible oscillatory behaviour of the partial sum function S50 (x) in the plot is called “Gibbs
ringing” or the “Gibbs artifact.” For lower partial sums, i.e., Sn (x) for n < 50, the oscillatory nature
is even more pronounced. Such “ringing” is a fact-of-life in image processing, since images generally
contain a good number of discontinuities, namely edges. Since image compression methods such as
JPEG rely on the truncation of Fourier series, they are generally plagued by ringing artifacts near
edges. This is yet another point that will be addressed later in this course.

The "moral of the story" regarding discontinuities: they affect the rate of convergence of Fourier series

As suggested by the previous example, discontinuities of a function f (x) create problems for its Fourier
series expansion by slowing down its rate of convergence. At a jump discontinuity, the convergence
may be quite slow, with the partial sums demonstrating Gibbs’ “ringing.”
Another way to look at this situation is as follows: Generally, a higher number of terms in the
Fourier series expansion – or “higher frequencies” – are needed in order to approximate a function
f (x) near points of discontinuity.
But, in fact, it doesn’t stop there – the existence of points of discontinuity actually affects the rate
convergence at other regions of the interval of expansion. To see this, let’s return to the two examples
studied above, i.e., the functions

$$f_1(x) = |x| = \begin{cases} -x, & -\pi < x \le 0, \\ x, & 0 < x \le \pi, \end{cases} \qquad (73)$$
and
$$f_2(x) = \begin{cases} -1, & -\pi < x \le 0, \\ 1, & 0 < x \le \pi. \end{cases} \qquad (74)$$

Note that we have subscripted them for convenience. Recall that the function f1 (x) is continuous on
[−π, π] and its 2π-extension is continuous for all x ∈ R. On the other hand f2 (x) has a discontinuity
at x = 0 and its 2π-extension has discontinuities at all points kπ.
We noticed how well a rather low number of terms (i.e., six nonzero coefficients) in the Fourier expansion of $f_1(x)$ approximated it over the interval [−π, π]. On the other hand, we saw how the discontinuities of $f_2(x)$ affected the performance of the Fourier expansion, even for a much larger number of terms (i.e., 50).

This is not so surprising when we examine the decay rates of the Fourier series coefficients for each
function:

1. For $f_1(x)$, the coefficients $a_k$ decay as $O(1/k^2)$ as $k \to \infty$.

2. For $f_2(x)$, the coefficients $b_k$ decay as $O(1/k)$ as $k \to \infty$.

The coefficients for $f_1$ are seen to decay more rapidly than those of $f_2$. As such, you don't have to go to such high k values (which multiply sine and cosine functions, of maximum absolute value 1) for the coefficients $a_k$ to become negligible to some prescribed accuracy $\epsilon$. (Of course, there is the infinite "tail" of the series to worry about, but the above reasoning is still valid.)
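
The effect of the two decay rates on the l2 tails can be sketched numerically; up to the $\sqrt{\pi}$ normalization factor, these tails govern the $L^2$ errors of the corresponding partial sums:

```python
import numpy as np

# Coefficient magnitudes for the two examples (odd k only):
# f1(x) = |x|:      a_k = 4/(pi k^2)  -> rapidly decaying tail
# f2(x) = sign(x):  b_k = 4/(pi k)    -> slowly decaying tail
ks = np.arange(1, 200001, 2)
a = 4.0/(np.pi*ks**2)
b = 4.0/(np.pi*ks)

for n in [10, 50, 200]:
    ta = np.sqrt(np.sum(a[ks > n]**2))   # l2 tail beyond k = n
    tb = np.sqrt(np.sum(b[ks > n]**2))
    print(n, ta, tb)                      # the 1/k tail is far larger
```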
The other important point is that the rate of decay of the coefficients affects the convergence over
the entire interval, not just around points of discontinuity. This has been viewed as a disadvantage
of Fourier series expansions: that a “bad point,” p, i.e. a point of discontinuity, even near or at the
end of an interval will affect the convergence of a Fourier series over the entire interval, even if the
function f (x) is “very nice” on the other side of the interval. We illustrate this situation in the sketch
on the left in the figure below.
Researchers in the signal/image processing community recognized this problem years ago and
came up with a clever solution: If the convergence of the Fourier series over the entire interval [a, b]
is being affected by such a bad point p, why not split the interval into two subintervals, say A = [a, c] and B = [c, b], and perform separate Fourier series expansions over each subinterval? Perhaps in this way, the number of coefficients saved by the "niceness" of f(x) over [a, c] might exceed the number of coefficients needed to accommodate the "bad" point p. The idea is illustrated in the sketch on the right in the figure below.
The above discussion is, of course, rather simplified, but it does describe the basic idea behind
block coding, i.e., partitioning a signal or image into subblocks and Fourier coding each subblock,
as opposed to coding the entire signal/image.
Block coding is the basis of the JPEG compression method for images as well as for the MPEG
method for video sequences. More on this later.

Greater degree of smoothness implies faster decay of Fourier series coefficients

The effect of discontinuities on the rate of convergence of Fourier series expansions does not end
with the discussion above. Recall that the Fourier series for the continuous function f1 (x) given

[Figure: left, a single Fourier series on [a, b], where a "bad" point of discontinuity p lies near the end of a "nice" region of smoothness of f(x); right, the interval split at c, with separate Fourier series on [a, c] and [c, b].]

above demonstrated quite rapid convergence. But it is possible that a series will demonstrate even more rapid convergence, due to the Fourier series coefficients $a_k$ and $b_k$ decaying even more rapidly than $1/k^2$. Recall that the function $f_1(x)$ is continuous, but that its derivative $f_1'(x)$ is only piecewise continuous, having discontinuities at x = 0 and x = ±π. Functions with greater degrees of smoothness, i.e., higher-order continuous derivatives, will have Fourier series with more rapid convergence. We simply state the following result without proof:

Theorem: Suppose that f(x) is 2π-periodic and $C^n[-\pi,\pi]$ for some n > 0 – that is, its nth derivative (and all lower-order derivatives) is continuous. Then the Fourier series coefficients $a_k$ and $b_k$ in Eq. (55) decay as
$$a_k,\, b_k = O\left( \frac{1}{k^{n+1}} \right), \quad \text{as } k \to \infty.$$

An idea of the proof is as follows. To avoid complications, suppose that f is piecewise continuous, corresponding to n = 0 above; then the coefficients must decay at least as quickly as 1/k, since they comprise a square-summable sequence in $l^2$. Now consider the function
$$g(x) = \int_0^x f(s)\, ds, \qquad (75)$$

which is a continuous function of x (Exercise). The Fourier series coefficients of g(x) may be obtained by termwise integration of the coefficients of f(x) (AMATH 231). This implies that the series coefficients of g(x) will decay at least as quickly as $1/k^2$. Integrate again, etc.

In other words, the more "regular" or "smooth" a function f(x) is, the faster the decay of its Fourier series coefficients, implying that you can generally approximate f(x) to a desired accuracy over the interval with fewer terms in the Fourier series expansion. Conversely, the more "irregular" a function f(x) is, the slower the decay of its FS coefficients, so that you'll need more terms in the FS expansion to approximate it to a desired accuracy. This feature of regularity/approximability is very well known and appreciated in the signal and image processing field. In fact, it is a very important, and still ongoing, field of research in analysis.

The above discussion may seem somewhat "handwavy" and imprecise. Let's look at the problem in a little more detail. And we'll consider the more general case in which a function f(x) is expressed in terms of a set of functions, $\{\phi_k(x)\}_{k=1}^{\infty}$, which form a complete and orthonormal basis on an interval [a, b], i.e.,
$$f(x) = \sum_{k=1}^{\infty} c_k \phi_k(x), \qquad c_k = \langle f, \phi_k \rangle. \qquad (76)$$
k=1

Here, the equation is understood in the $L^2$ sense, i.e., the sequence of partial sums, $S_n(x)$, defined as follows,
$$S_n(x) = \sum_{k=1}^{n} c_k \phi_k(x), \qquad (77)$$

converges to f in the $L^2$ norm/metric, i.e.,
$$\|f - S_n\|_2 \to 0 \quad \text{as } n \to \infty. \qquad (78)$$

The expression in the above equation is the magnitude of the error associated with the approximation $f(x) \approx S_n(x)$, which we shall simply refer to as the error in the approximation. This error may be expressed in terms of the Fourier coefficients $c_k$. First note that
$$f(x) - S_n(x) = \sum_{k=n+1}^{\infty} c_k \phi_k. \qquad (79)$$

Therefore the squared $L^2$ error is given by
$$\|f - S_n\|_2^2 = \langle f - S_n,\, f - S_n \rangle = \left\langle \sum_{k=n+1}^{\infty} c_k \phi_k,\; \sum_{l=n+1}^{\infty} c_l \phi_l \right\rangle = \sum_{k=n+1}^{\infty} |c_k|^2. \qquad (80)$$

Thus,
$$\|f - S_n\|_2 = \left[ \sum_{k=n+1}^{\infty} |c_k|^2 \right]^{1/2}. \qquad (81)$$

Recall that for the above sum of an infinite series to be finite, the coefficients ck must tend to
zero sufficiently rapidly. The above summation of coefficients starting at k = n + 1 may be viewed as
involving the “tail” of the infinite sequence of coefficients ck , as sketched schematically below.

[Figure: plot of $|c_k|^2$ vs. k, with the "tail" of the infinite sequence beginning at k = n + 1.]

For a fixed n > 0, the greater the rate of decay of the coefficients ck , the smaller the area under
the curve that connects the tops of these lines representing the coefficient magnitudes, i.e., the smaller
the magnitude of the term on the right of Eq. (81), hence the smaller the error in the approximation.
From a signal processing point of view, more of the signal is concentrated in the first n coefficients ck .
From the examples presented earlier, we see that singularities in the function/signal, e.g., discon-
tinuities of the function, will generally reduce the rate of decay of the Fourier coefficients. As such,
for a given n, the error of approximation by the partial sum Sn will be larger. This implies that in
order to achieve a certain accuracy in our approximation, we shall have to employ more coefficients
in our expansion. In the case of the Fourier series, this implies the use of functions sin kx and cos kx
with higher k, i.e., higher frequencies.
Unfortunately, such singularities cannot be avoided, especially in the case of images. Images are
defined by edges, i.e., sharp changes in greyscale values, which are precisely the points of discontinuity
in an image.
However, singularities are not the only reason that the rate of decay of Fourier coefficients may
be reduced, as we’ll see below.

Lecture 8

Inner product spaces (cont’d)

Higher variation means higher frequencies are needed

This material is presented as supplementary information. You will not be examined on it.

In the previous discussion, we saw how the irregularity or lack of smoothness of a function f (x)
– for example, points of discontinuity in f (x) or its derivatives – affects the convergence of its Fourier
series expansion. This phenomenon is very important in signal and image processing, particularly in
the field of signal/image compression, where we wish to store approximations to the signal f (x)
to a prescribed accuracy with as few coefficients as possible.
In addition to smoothness, however, the rate of change of f, as measured by the magnitude of its derivative, $|f'(x)|$, or gradient $\|\nabla f\|$, also affects the convergence. Contrast the two functions sketched below. The function on the left, g(x), has little variation over the interval [a, b], whereas the one on the right, h(x), has significant variation.

[Figure: graphs of g(x) (little variation) and h(x) (significant variation) over [a, b].]

In order to accommodate the more rapid change in h(x), i.e., in order to approximate such a function better, sine and cosine functions of higher frequencies, i.e., higher oscillation, are required. In other words, we expect that the Fourier series coefficients of g(x) will decay more rapidly than those of h(x).

Example 1: We can illustrate this point with the help of the following analytical example. Consider the normalized Gaussian function,
$$g_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad (82)$$
which you have probably encountered in a course on probability or statistics. The variance of this function is $\sigma^2$ and its standard deviation is σ. As σ decreases toward zero, the graph of $g_\sigma(x)$ becomes more peaked – higher and narrower – as shown in the figure below. In what follows, we'll consider the function $g_\sigma(x)$ as defined only over the interval [−π, π] so that we may examine its Fourier series.
[Figure: Gaussian functions $g_\sigma(t)$ for σ = 0.25, 0.5 and 1.]

Clearly, the magnitude of the derivative of gσ (x) is increasing near x = 0. Let us now observe
the effect of this increase on the Fourier coefficients of gσ (x). Since it is an even function, its Fourier
series will be composed only of cosine functions, i.e.,
$$g_\sigma(x) = a_0 \phi_0 + \sum_{k=1}^{\infty} a_k \phi_k, \qquad (83)$$

where we are using the orthonormal cosine basis set (see earlier notes),
$$\phi_0(x) = \frac{1}{\sqrt{2\pi}}, \qquad \phi_k(x) = \frac{1}{\sqrt{\pi}}\cos kx, \quad k \ge 1. \qquad (84)$$

Technically, the computation of the integrals of the Gaussian function is rather complicated, since we are integrating only over the finite interval [−π, π]. For sufficiently small σ, the "tail" of $g_\sigma(x)$ lying outside this interval is very small – in fact, it is exponentially small, therefore negligible. To a good approximation, therefore,
$$a_0 = \int_{-\pi}^{\pi} g_\sigma(x)\, \phi_0(x)\, dx \cong \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g_\sigma(x)\, dx = \frac{1}{\sqrt{2\pi}}, \qquad (85)$$
94
and
$$a_k = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} g_\sigma(x) \cos kx\, dx \cong \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} g_\sigma(x) \cos kx\, dx = \frac{1}{\sqrt{\pi}} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}} \cos kx\, dx = \frac{1}{\sqrt{\pi}}\, e^{-\frac{\sigma^2 k^2}{2}}. \qquad (86)$$
π

These results can be derived from the following formula, which can be found in integral tables,
$$\int_0^{\infty} e^{-a^2 x^2} \cos bx\, dx = \frac{\sqrt{\pi}}{2a}\, e^{-\frac{b^2}{4a^2}}. \qquad (87)$$
You let $a^2 = \frac{1}{2\sigma^2}$ and then do some algebra.
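
The closed form for $a_k$ can also be checked against direct numerical quadrature. A minimal sketch; σ = 0.5 is an arbitrary choice, small enough that the Gaussian tail outside [−π, π] is negligible:

```python
import numpy as np
from scipy.integrate import quad

sigma = 0.5
g = lambda x: np.exp(-x**2/(2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)

for k in range(1, 6):
    # a_k over [-pi, pi], versus the formula (1/sqrt(pi)) exp(-sigma^2 k^2 / 2).
    ak_num = quad(lambda x: g(x)*np.cos(k*x)/np.sqrt(np.pi), -np.pi, np.pi)[0]
    ak_formula = np.exp(-sigma**2 * k**2 / 2) / np.sqrt(np.pi)
    print(k, ak_num, ak_formula)   # the two agree to high accuracy
```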

Note that the distribution of $a_k$ values with respect to k > 0 – we don't even have to square them, since they are all positive – is a Gaussian distribution with variance $1/\sigma^2$. As we let $\sigma \to 0^+$, the distribution spreads out, in complete opposition to the function $g_\sigma(x)$ getting more concentrated at x = 0. (We'll return to this theme – the complementarity of space and frequency – later in this course.)

[Figure: profile of the $a_k$ coefficients vs. k: a Gaussian of width proportional to 1/σ.]

Example 2: This is a numerical version of the previous example. For $0 < a < \pi$, let $g_a(x)$ denote the function
$$g_a(x) = \begin{cases} \sqrt{\dfrac{3}{2a}} \left( 1 - \dfrac{x}{a} \right), & 0 \le x \le a, \\[2mm] \sqrt{\dfrac{3}{2a}} \left( 1 + \dfrac{x}{a} \right), & -a \le x \le 0, \\[2mm] 0, & a < |x| \le \pi. \end{cases} \qquad (88)$$

A sample graph of this function is sketched below.

[Figure: the triangular peak function $y = g_a(x)$ on [−π, π], with peak height $\sqrt{3/(2a)}$ at x = 0.]


The multiplicative factor $\sqrt{3/(2a)}$ was chosen so that
$$\|g_a\|_2 = 1, \qquad (89)$$
for all a > 0, a kind of normalization condition. Note that as a approaches zero, the peak becomes more pronounced, since the magnitudes of the slopes of the peak are given by $|g_a'(x)| = \sqrt{3/2}\; a^{-3/2}$.

Since the function $g_a(x)$ is even, it will admit a Fourier cosine series (i.e., the coefficients $b_k$ of all sine terms are zero). Here we consider the expansion of $g_a(x)$ in terms of the orthonormal cosine basis,
$$e_0(x) = \frac{1}{\sqrt{2\pi}}, \qquad e_k(x) = \frac{1}{\sqrt{\pi}}\cos kx, \quad k \ge 1. \qquad (90)$$
Then
$$g_a(x) = c_0 e_0 + \sum_{k=1}^{\infty} c_k e_k, \qquad (91)$$
where
$$c_k = \langle g_a, e_k \rangle. \qquad (92)$$

For example,
$$c_0 = \frac{1}{\sqrt{2\pi}} \cdot 2 \int_0^a g_a(x)\, dx = \frac{1}{2}\sqrt{\frac{3a}{\pi}}. \qquad (93)$$
Since $g_a \in L^2[-\pi,\pi]$, the sequence of Fourier coefficients $c = (c_0, c_1, c_2, \cdots)$ is square-summable, i.e., c belongs to the sequence space $l^2$. Moreover, from a previous lecture,
$$\|g_a\|_{L^2} = \|c\|_{l^2} = 1, \qquad (94)$$
implying that
$$\sum_{k=0}^{\infty} c_k^2 = 1. \qquad (95)$$

In the figure below are plotted the coefficients cn , 0 ≤ n ≤ 20, for a values 1.0, 0.5, 0.25, 0.1, 0.05.
(The coefficients were computed using MAPLE.) The plots clearly show that the rate of decay of the
coefficients decreases as a is decreased. For a = 1.0, the coefficients cn appear to be negligible for
n > 5, at least to the resolution of the plot. This would suggest that the partial sum function S5 (x),
composed of cosine terms with coefficients c0 to c5 would provide an excellent approximation to ga (x)
over the interval. On the other hand, for a = 0.5, it appears that we would have to use the partial
sum S10 (x), and so on.
[Figure: Coefficients $c_n$, $0 \le n \le 20$, of the Fourier cosine series expansion of the triangular peak function $g_a(x)$ defined in Eq. (88), for a = 1.0, 0.5, 0.25, 0.1, 0.05. As a decreases, the rate of decay of the Fourier coefficients $c_n$ is seen to decrease.]

In order to understand this more quantitatively, the partial sums $S_{20}(x)$ were computed for the a-values shown in the above figure. From these partial sums, the $L^2$ distances $\|g_a - S_{20}\|_2$ were computed (using MAPLE). These distances represent the $L^2$ error in approximating $g_a$ with $S_{20}$. The results are presented in the table below. Clearly, as a is decreased, the error in approximation by the partial sums $S_{20}$ increases. There appears to be a dramatic increase between a = 0.25 and a = 0.1.

Improvement by “block coding”. In light of the earlier discussion on “block coding,” let us see if
we can improve the approximation to the above triangular peak function by dividing up the interval
and coding the function separately over the subintervals. In the following experiment, the interval
I = [−π, π] was partitioned into the three subintervals,

$$I_1 = [-\pi, -\pi/3], \qquad I_2 = [-\pi/3, \pi/3], \qquad I_3 = [\pi/3, \pi]. \qquad (96)$$

a        $\|g_a - S_{20}\|_2$
1.0      0.012
0.5      0.026
0.25     0.056
0.1      0.460
0.05     0.733

Error in approximation to $g_a(x)$ afforded by partial sum functions $S_{20}(x)$, comprised of Fourier coefficients $c_0$ to $c_{20}$.

For a ≤ 1, the approximation of ga (x) over intervals I1 and I3 is trivial since ga (x) = 0. As such we
don’t even have to supply any Fourier coefficients but we should record the use of the first coefficient
c0 = 0. After all, the function ga (x) is constant on these intervals, and we should specify the value of
the constant. Since 21 coefficients were used in the previous experiment (S20 (x) uses ck , 0 ≤ k ≤ 20),
we shall use 19 coefficients to code the function ga (x) over interval I2 .
It remains to construct the Fourier series approximation to $g_a(x)$ over interval $I_2 = [-\pi/3, \pi/3]$. From Lecture 7, we must employ the basis set
$$\{e_k\} = \left\{ \frac{1}{\sqrt{2L}},\; \frac{1}{\sqrt{L}}\cos\left(\frac{\pi x}{L}\right),\; \frac{1}{\sqrt{L}}\sin\left(\frac{\pi x}{L}\right),\; \frac{1}{\sqrt{L}}\cos\left(\frac{2\pi x}{L}\right),\; \cdots \right\}, \qquad (97)$$
with half-width $L = \pi/3$ (we write L here to avoid confusion with the peak parameter a). Once again, the sine functions are discarded since $g_a(x)$ is an even function. This was
easily done in MAPLE: for each a value, the necessary integrals were computed (actually only the integrals over [0, L] were computed), followed by the $L^2$ distance between $g_a$ and the $S_{18}(x)$ partial sum functions. The results are presented in the table below. We can see an improvement for all a

a        $\|g_a - S_{18}\|_2$
1.0      0.002
0.5      0.007
0.25     0.021
0.1      0.054
0.05     0.294

Error in approximation to $g_a(x)$ afforded by partial sums $S_{18}$ of the Fourier cosine series over the interval [−π/3, π/3], employing Fourier coefficients $c_0$ to $c_{18}$, along with the trivial Fourier expansions $c_0 = 0$ on [−π, −π/3) and (π/3, π].

values – a roughly five-fold decrease in the error for a = 1 and about a three-fold decrease for a = 0.05. This very simple implementation of "block coding" has achieved the goal of decreasing the error with a given number of coefficients.
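
The experiment is straightforward to reproduce. A minimal sketch of it, assuming NumPy/SciPy; Parseval's relation (95) is used to compute the error from the coefficients, since $\|g_a\|_2 = 1$:

```python
import numpy as np
from scipy.integrate import quad

# Triangular peak function of Eq. (88); g_a is even, so we integrate over [0, L] only.
def g(a):
    return lambda x: np.sqrt(3/(2*a))*(1 - x/a) if x <= a else 0.0

def cosine_error(f, L, n):
    """L2 error of the n-term cosine partial sum of an even f on [-L, L]."""
    c = [np.sqrt(2/L)*quad(f, 0, L)[0]]          # coefficient of 1/sqrt(2L)
    for k in range(1, n + 1):
        c.append((2/np.sqrt(L))*quad(lambda x: f(x)*np.cos(k*np.pi*x/L), 0, L)[0])
    return np.sqrt(max(1 - np.sum(np.square(c)), 0.0))   # Parseval, ||g_a|| = 1

for a in [1.0, 0.5, 0.25, 0.1, 0.05]:
    print(a, cosine_error(g(a), np.pi, 20),      # 21 coefficients on [-pi, pi]
             cosine_error(g(a), np.pi/3, 18))    # 19 coefficients on [-pi/3, pi/3]
```

The printed errors should roughly reproduce the two tables above (the quadrature handles the kink at x = a only approximately).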

Question: The fact that the Fourier series over [−π/3, π/3] works better to approximate the function $g_a(x)$ might appear rather magical. Can you come up with a rather simple explanation for the improvement in accuracy?
That being said, the improvement is rather impressive in this case because we know the function essentially to infinite accuracy, i.e., we have its formula. If we had only a finite set of discrete data points representing sampled values of the function, the improvement would not be so dramatic. We'll return to this matter after looking at discrete Fourier transforms.

Fourier series on the interval [−a, a]. Even and odd extensions

In a previous lecture, it was mentioned that the following functions comprise an orthonormal set on the interval [−a, a], where a > 0:
$$e_0 = \frac{1}{\sqrt{2a}}, \quad e_1 = \frac{1}{\sqrt{a}}\cos\left(\frac{\pi x}{a}\right), \quad e_2 = \frac{1}{\sqrt{a}}\sin\left(\frac{\pi x}{a}\right), \quad e_3 = \frac{1}{\sqrt{a}}\cos\left(\frac{2\pi x}{a}\right), \; \cdots. \qquad (98)$$

Moreover, this set serves as a complete orthonormal basis for the space $L^2[-a,a]$ of square-integrable functions on [−a, a]. Thus, for an $f \in L^2[-a,a]$,
$$f = \sum_{k=0}^{\infty} \langle f, e_k \rangle e_k. \qquad (99)$$

This may be translated to the following standard (unnormalized) Fourier series expansion having the form
$$f(x) = a_0 + \sum_{k=1}^{\infty} \left[ a_k \cos\left(\frac{k\pi x}{a}\right) + b_k \sin\left(\frac{k\pi x}{a}\right) \right], \qquad (100)$$

where
$$a_0 = \frac{1}{2a} \int_{-a}^{a} f(x)\, dx, \qquad a_k = \frac{1}{a} \int_{-a}^{a} f(x) \cos\left(\frac{k\pi x}{a}\right) dx, \qquad b_k = \frac{1}{a} \int_{-a}^{a} f(x) \sin\left(\frac{k\pi x}{a}\right) dx. \qquad (101)$$

(We use the term "unnormalized" since the coefficients $a_k$, $b_k$ multiply the unnormalized functions $\cos(k\pi x/a)$ and $\sin(k\pi x/a)$. The normalization factors, which involve $\sqrt{a}$ factors that become a upon squaring, are swept into the $a_k$ and $b_k$ coefficients, which accounts for the factors appearing in front of the above integrals.) Once again, in the special case a = π, the above formulas become the standard formulas for Fourier series on [−π, π], cf. Eq. (1), Lecture 1 of these notes.

Fourier cosine series on [−a, a] and periodic extensions

In the case that f(x) is even, i.e., $f(x) = f(-x)$, all coefficients $b_k = 0$, so that the expansion in (100) becomes a Fourier cosine series expansion. Moreover, since f(x) is even, it need only be defined on the interval [0, a], and the expressions for the coefficients $a_k$ become
$$a_0 = \frac{1}{a} \int_0^a f(x)\, dx, \qquad a_k = \frac{2}{a} \int_0^a f(x) \cos\left(\frac{k\pi x}{a}\right) dx, \quad k \ge 1. \qquad (102)$$
Now suppose that we are given a function f(x) defined on the interval [0, a] as input data. From this data, we may construct the $a_k$ coefficients – these coefficients define a Fourier cosine series that converges to the even 2a-extension of f(x), constructed from f(x) by means of two steps, illustrated schematically in the figure below:

1. A “flipping” of the graph of f (x) with respect to the y-axis to produce an even function on
[−a, a].

2. Copying this graph on the intervals [a, 3a], [3a, 5a], etc. and [−3a, −a], [−5a, −3a], etc..

[Figure: the even 2a-extension of f(x), 0 ≤ x ≤ a: the original data on [0, a], its even extension to [−a, a], and 2a-periodic copies on the adjacent intervals.]

Note that the resulting 2a-extension is continuous at all “patch points,” i.e., x = (2k − 1)a, k ∈ Z.
For this reason, Fourier cosine series are usually employed in the coding of signals and images. The
JPEG/MPEG standards are based on versions of the discrete cosine transform.

Fourier sine series on [−a, a] and periodic extensions

In the case that f(x) is odd, i.e., $f(x) = -f(-x)$, all coefficients $a_k = 0$, so that the expansion in (100) becomes a Fourier sine series expansion. Moreover, since f(x) is odd, it need only be defined on the interval [0, a] as well. The expression for the coefficients $b_k$ becomes
$$b_k = \frac{2}{a} \int_0^a f(x) \sin\left(\frac{k\pi x}{a}\right) dx, \quad k \ge 1. \qquad (103)$$

Once again, suppose that we are given a function f(x) defined on the interval [0, a] as input data. From this data, we may construct the $b_k$ coefficients – these coefficients define a Fourier sine series that converges to the odd 2a-extension of f(x), constructed from f(x) by means of two steps, illustrated schematically in the figure below:

1. An inversion of the graph of f(x) with respect to the origin to produce an odd function on [−a, a]. (If $f(0) \ne 0$, then one of the points $(0, \pm f(0))$ will have to be deleted for f to be single-valued at x = 0.)

2. Copying this graph on the intervals [a, 3a], [3a, 5a], etc., and [−3a, −a], [−5a, −3a], etc. (Once again, some endpoints of the pieces of the graph will have to be deleted to make f single-valued.)

[Figure: the odd 2a-extension of f(x), 0 ≤ x ≤ a: the original data on [0, a], its odd extension to [−a, a], and 2a-periodic copies on the adjacent intervals.]

Note that the resulting 2a-extension need not be continuous at the "patch points," i.e., $x = (2k-1)a$, $k \in \mathbb{Z}$. Indeed, if $f(0) \ne 0$, then the odd extension of f(x) will not even be continuous at $0, \pm 2a, \pm 4a$, etc.

The example presented below clearly shows the advantage of working with the Fourier cosine series – producing an even extension of data on (0, π) – vs. the Fourier sine series – producing an odd extension of the data. Once again, this is why the (discrete) Fourier cosine expansion is used in JPEG compression (to be discussed very shortly).

Even and odd extensions of $f(x) = \frac{1}{2}x$ and approximations yielded by partial sums of corresponding Fourier series

[Figure: Even extension. Left: 5 nonzero terms of the cosine series. Right: 10 nonzero terms.]

[Figure: Odd extension. Left: 10 nonzero terms of the sine series. Right: 100 nonzero terms.]

The two-dimensional case: image functions

We now examine briefly the Fourier analysis of two-dimensional functions, which will be used pri-
marily to represent images. We shall consider an image function f (x, y) to be defined over a suitable
rectangular region $D \subset \mathbb{R}^2$. For the moment, let D be defined as the rectangular region $-a \le x \le a$, $-b \le y \le b$, centered at the origin. A suitable function space for the representation of images will be the space of square-integrable functions on D, i.e., $L^2(D)$:
$$L^2(D) = \left\{ f : D \to \mathbb{R} \;\Big|\; \int_D |f(x,y)|^2\, dA < \infty \right\}. \qquad (104)$$

Now let

1. $\{u_k(x)\}_{k=1}^{\infty}$ denote the orthonormal set of sine and cosine functions on the space $L^2[-a,a]$,

2. $\{v_k(y)\}_{k=1}^{\infty}$ denote the orthonormal set of sine and cosine functions on the space $L^2[-b,b]$.

Theorem: The set of all product functions $\{\phi_{kl}(x,y) = u_k(x)\, v_l(y)\}$, $k = 1, 2, \cdots$, $l = 1, 2, \cdots$, forms an orthonormal basis in $L^2(D)$.

For simplicity, we now assume that our images are defined on square regions, i.e., a = b, and further assume that a = b = 1. In this case the basis functions $u_k$ and $v_k$ have the same functional form:
$$\{e_k\}_{k=1}^{\infty} = \left\{ \frac{1}{\sqrt{2}},\; \cos(\pi x),\; \sin(\pi x),\; \cos(2\pi x),\; \sin(2\pi x),\; \cdots \right\} \qquad (105)$$
The set of all products $e_k(x)\, e_l(y)$ will lead to a complicated mixture of sine and cosine functions. It is convenient to assume that the image function f(x, y) is an even function with respect to both x and y, implying that we use only the cosine functions in our basis. In essence, this amounts to the assumption that the actual image being analyzed lies in the region [0, 1] × [0, 1]. Analogous to the one-dimensional case, the use of only cosine functions will perform an even 2-periodic extension of this image, in both the x and y directions. Let us examine this further.

1. Even w.r.t. x: f (x, y) = f (−x, y).

2. Even w.r.t. y: f (x, y) = f (x, −y).

3. From 1 and 2: f (−x, y) = f (x, −y), implying that f (x, y) = f (−x, −y), i.e., symmetry w.r.t.
inversion about (0, 0).

103
This means that the graph of f(x, y) in the first quadrant, i.e., $[0,1] \times [0,1]$, i.e., the input image, is "flipped" w.r.t. the y-axis, then "flipped" w.r.t. the x-axis, and finally "flipped" w.r.t. the point (0, 0). The result is an even 2-periodic extension of the function f(x, y). The process is illustrated below.
[Figure: Input image f(x, y), 0 ≤ x, y ≤ 1, and its even periodic extension in the x and y directions via the Fourier cosine expansion.]

The advantage of an even extension in both directions is that no discontinuities are introduced.
The function f (x, y) is continuous at all points on the x and y-axes. As such, no complications
regarding convergence of the Fourier series are introduced artificially.

The net result is that the input image function f(x, y) defined on the region [0, 1] × [0, 1] will admit a Fourier cosine series expansion of the form
$$f(x,y) = a_{00} + \sum_{l=1}^{\infty} a_{0l}\cos(l\pi y) + \sum_{k=1}^{\infty} a_{k0}\cos(k\pi x) + \sum_{k=1}^{\infty} \sum_{l=1}^{\infty} a_{kl} \cos(k\pi x)\cos(l\pi y). \qquad (106)$$
The series coefficients $a_{kl}$ could be obtained from the expansion for f in terms of the orthonormal basis functions, or by simply multiplying both sides of (106) by $\cos(m\pi x)\cos(n\pi y)$, integrating x and y over [0, 1], and exploiting the orthogonality of the cosine functions. The net result is
$$a_{00} = \int_0^1 \!\!\int_0^1 f(x,y)\, dx\, dy,$$
$$a_{0l} = 2 \int_0^1 \!\!\int_0^1 f(x,y) \cos(l\pi y)\, dx\, dy, \quad l \ge 1,$$
$$a_{k0} = 2 \int_0^1 \!\!\int_0^1 f(x,y) \cos(k\pi x)\, dx\, dy, \quad k \ge 1,$$
$$a_{kl} = 4 \int_0^1 \!\!\int_0^1 f(x,y) \cos(k\pi x) \cos(l\pi y)\, dx\, dy, \quad k, l \ge 1. \qquad (107)$$
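
A minimal sketch of computing these coefficients numerically; the image function chosen here is an arbitrary illustrative one:

```python
import numpy as np
from scipy.integrate import dblquad

f = lambda x, y: x*y          # sample "image" on [0,1] x [0,1]

def a(k, l):
    """Coefficient a_kl of Eq. (107); the weight reproduces the 1/2/4 factors."""
    w = (1 if k == 0 else 2) * (1 if l == 0 else 2)
    integrand = lambda y, x: f(x, y)*np.cos(k*np.pi*x)*np.cos(l*np.pi*y)
    return w * dblquad(integrand, 0, 1, 0, 1)[0]

print(a(0, 0))    # the mean value: 0.25 for f(x,y) = x*y
print(a(1, 1))    # the first mixed coefficient
```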

The Discrete Fourier Transform

We now turn to the analysis of discrete data, e.g., sets of measurements, $y_k$, $k = 0, 1, 2, \cdots$, as opposed to signals in continuous time, e.g., f(t). We also assume that the measurements are evenly spaced in time/space, i.e., there is a fixed time interval T > 0 between each measurement. This is necessary for the basic theory to be presented below. That being said, it is very often the procedure employed in scientific experiments, e.g., measuring the temperature at a particular location at hourly intervals. At this time, we shall simply assume that the measurements correspond to the values of a function f(t) at discrete times, $t_n = nT$. In the signal processing literature, the usual notation for such a sampling is as follows,
$$f[n] := f(nT), \qquad n \in \{0, 1, 2, \cdots\} \quad \text{or} \quad n \in \{\cdots, -1, 0, 1, \cdots\}. \qquad (108)$$

The square brackets are rather cumbersome – some authors employ the notation “fn ”, but we shall
reserve this notation for other purposes. The idea is sketched below.
[Figure: sampled values f[0], f[1], f[2], ..., f[n] of a function y = f(t) at the times 0, T, 2T, ..., nT.]

We now assume that we are working with a set of N such consecutive data points which will comprise an N-vector, indexed as follows,
$$\mathbf{f} = (f[0], f[1], \cdots, f[N-1]). \qquad (109)$$

These measurements could be complex-valued, so that $\mathbf{f} \in \mathbb{C}^N$. Furthermore, we assume that this set of measurements is then periodized, i.e., extended into the future and backwards into the past, so that
$$f[k+N] = f[k], \qquad k \in \mathbb{Z}. \qquad (110)$$

This represents a periodic extension of the data, a discrete analogue of the periodization of functions produced by Fourier series representations.

A quick way to derive the basis vectors for the Discrete Fourier Transform

In the same way that the 2π-periodic trigonometric functions sin kx and cos kx are used as basis functions for Fourier series expansions, we shall be looking for N-periodic vectors to serve as a basis for these discrete data sets, i.e., vectors $\mathbf{v} \in \mathbb{R}^N$. With some scaling followed by discretization, we can accomplish this goal.

1. First of all, let's scale the functions to be 1-periodic by multiplying their arguments by 2π:
$$\sin(2\pi kx), \quad \cos(2\pi kx), \qquad k = 0, 1, 2, \cdots.$$

2. Scale them again to be N-periodic in the continuous variable $x \in [0, N]$:
$$\sin\left(\frac{2\pi kx}{N}\right), \quad \cos\left(\frac{2\pi kx}{N}\right), \qquad k = 0, 1, 2, \cdots.$$

3. Now restrict the variable x to be integer-valued, i.e., $x = n \in \{0, 1, 2, \cdots, N\}$:
$$\sin\left(\frac{2\pi kn}{N}\right), \quad \cos\left(\frac{2\pi kn}{N}\right), \qquad k = 0, 1, 2, \cdots.$$

The result is a set of N-periodic vectors that can serve as a basis for $\mathbb{R}^N$ or $\mathbb{C}^N$:
$$\sin\left(\frac{2\pi k(n+N)}{N}\right) = \sin\left(\frac{2\pi kn}{N} + 2\pi k\right) = \sin\left(\frac{2\pi kn}{N}\right), \qquad k = 0, 1, 2, \cdots,$$
$$\cos\left(\frac{2\pi k(n+N)}{N}\right) = \cos\left(\frac{2\pi kn}{N} + 2\pi k\right) = \cos\left(\frac{2\pi kn}{N}\right), \qquad k = 0, 1, 2, \cdots.$$

Note that these N-vectors (in terms of n) are also N-periodic in the k variable:
$$\sin\left(\frac{2\pi (k+N)n}{N}\right) = \sin\left(\frac{2\pi kn}{N} + 2\pi n\right) = \sin\left(\frac{2\pi kn}{N}\right),$$
$$\cos\left(\frac{2\pi (k+N)n}{N}\right) = \cos\left(\frac{2\pi kn}{N} + 2\pi n\right) = \cos\left(\frac{2\pi kn}{N}\right).$$

As such, we need only consider N values of k, i.e., $k = 0, 1, \cdots, N-1$. The result is the following set of 2N functions: for $k = 0, 1, 2, \cdots, N-1$,
$$\sin\left(\frac{2\pi kn}{N}\right), \quad \cos\left(\frac{2\pi kn}{N}\right), \qquad n = 0, 1, 2, \cdots, N-1. \qquad (111)$$

We may combine these functions in the same way as for Fourier series, using Euler's formula to come up with a set of (unnormalized) complex-valued N-periodic vectors: for $k = 0, 1, 2, \cdots, N-1$,
$$u_k[n] = \exp\left(\frac{i 2\pi kn}{N}\right), \qquad 0 \le n \le N-1. \qquad (112)$$

The result is the usual complex-valued (orthogonal) basis used in the so-called Discrete Fourier
Transform.
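
A minimal sketch of constructing these basis vectors and checking their orthogonality; since each entry has modulus 1, the inner products satisfy $\langle u_k, u_l \rangle = N\delta_{kl}$:

```python
import numpy as np

# DFT basis vectors u_k[n] = exp(i 2 pi k n / N), per Eq. (112).
N = 8
n = np.arange(N)
U = np.exp(2j*np.pi*np.outer(np.arange(N), n)/N)   # row k is the vector u_k

# Gram matrix of complex inner products <u_k, u_l>; should be N times identity.
G = U @ U.conj().T
print(np.allclose(G, N*np.eye(N)))                 # True: orthogonal, each of norm sqrt(N)
```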
