Notes on Hilbert Spaces, Fourier Transform and Probability

Pierre Brémaud

November 12, 2004


Contents

1 Integration 5
1.1 The Lebesgue integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.1 Measurable functions and measures . . . . . . . . . . . . . . . . . . 6
1.1.2 Construction of the integral . . . . . . . . . . . . . . . . . . . . . . 11
1.2 The big results of integration theory . . . . . . . . . . . . . . . . . . . . 15
1.2.1 Dominated convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.2 Fubini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.3 Radon–Nikodym . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 The Riesz–Fischer theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.1 The Lp spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 Hölder’s and Minkowski’s Inequalities . . . . . . . . . . . . . . . . 22
1.3.3 Completeness of Lp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2 Hilbert spaces 29
2.1 Basic definitions and properties . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1.1 Inner product and Schwarz’s inequality . . . . . . . . . . . . . . . 29
2.1.2 Continuity of the inner product . . . . . . . . . . . . . . . . . . . . 31
2.1.3 The L2 spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Projections and isometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.1 The projection principle . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.2 Hilbert space isometries . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3 Orthonormal Expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.1 Orthonormal Systems and Bessel’s inequality . . . . . . . . . . . 36
2.3.2 Complete orthonormal systems . . . . . . . . . . . . . . . . . . . . . 37

3 Fourier analysis 41
3.1 Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.1.1 Fourier transform in L1 . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.1.2 Fourier Transform in L2 . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.1 Fourier series in L1_loc . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.2 The Poisson summation formula . . . . . . . . . . . . . . . . . . . . 54
3.2.3 Fourier Series in L2_loc . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2.4 The Sampling Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62


4 Probability 67
4.1 Expectation as integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.2 Distribution of a random element . . . . . . . . . . . . . . . . . . . 69
4.1.3 The Lebesgue theorems for expectation . . . . . . . . . . . . . . . 69
4.1.4 Uniform integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.1 From Fubini to independence . . . . . . . . . . . . . . . . . . . . . . 72
4.2.2 Conditional expectation . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3 The theory of characteristic functions and weak convergence . . . . 79
4.3.1 Paul Lévy’s inversion formula . . . . . . . . . . . . . . . . . . . . . 79
4.3.2 Bochner’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.3 The characteristic function criterion of convergence in distribution . . 83
4.3.4 Hilbert space of square integrable functions . . . . . . . . . . . 85
4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Chapter 1

Integration

Introduction

The reader is familiar with the Riemann integral. The latter has, however, a few weak points when
compared to the Lebesgue integral. For instance:

(1) The class of Riemann-integrable functions is not large enough. Indeed, some functions have an
“obvious” integral, and Riemann’s integration theory denies it, while Lebesgue’s theory recognizes it
(see Example 1.10).

(2) The stability properties of the Riemann integral under limit operations are too weak. Indeed,
it often happens that the limit of a sequence of Riemann-integrable functions fails to be Riemann integrable,
whereas a pointwise limit of nonnegative Lebesgue-measurable functions is always Lebesgue measurable, with a well-defined (possibly infinite) Lebesgue integral.

(3) The Riemann integral is defined with respect to the Lebesgue measure (the “volume” in ℝⁿ), whereas
the Lebesgue integral can be defined with respect to a general abstract measure, a probability for instance.

This last advantage should convince the student to invest a little of her time in order to understand the
essentials of the Lebesgue integral, because the return is considerable. Indeed, the Lebesgue integral of the
function f with respect to the measure µ (to be defined in the present chapter), modestly denoted by

∫_X f(x) µ(dx),

contains a surprising variety of mathematical objects, such as the usual Lebesgue integral on the line,

∫_ℝ f(x) dx,

and also the Lebesgue volume integral. An infinite sum

∑_{n∈ℤ} f(n)

can also be regarded (with profit) as a Lebesgue integral with respect to the counting measure on ℤ.
The Stieltjes–Lebesgue integral

∫_ℝ f(x) dF(x)


with respect to a function F of bounded variation, as well as the expectation

E[Z]

of a random variable Z, are particular cases of the Lebesgue integral. For the reader who is reluctant
to give up the expertise dearly acquired in the Riemann integral, it suffices to say that any Riemann-
integrable function is also Lebesgue-integrable and that both integrals then coincide.

Is Lebesgue’s theory hard to learn and understand? In fact most of the results thereof are very natural,
and do not disturb the intuition acquired by the practice of the Riemann integral. But the Lebesgue
integral is much easier to manipulate correctly than the Riemann integral. The tedious (although not
difficult) part is maybe the step by step construction of the Lebesgue integral. However, omitting the
details and just giving a summary of the main steps is usually not a cause of frustration for the reader
interested in the Lebesgue integral as a tool for applications. The really difficult part of measure theory
is the proof of existence of certain measures, but one usually does not mind admitting such results. For
instance, there is an existence theorem for the Lebesgue measure ℓ (the “length”) on ℝ. It says: there
exists a unique measure ℓ on ℝ that gives to each interval [a, b] the measure b − a. Of course, in order
to understand what all the fuss is about, and what kind of mathematical subtleties hide behind such a
harmless statement, we shall have to be more precise about the meaning of “measure”. But when this is
done, the statement is of the kind that one is ready to approve, although its proof is not immediate. Of
course, in this chapter, the proofs of such “obvious” results are systematically omitted, because the goal
is practical: to provide the reader with a powerful tool, and to give a few tips as to how to use it safely.
Someone with no previous knowledge of integration theory will therefore be in the situation of the new
recipient of a driving license. Experience is best acquired on the road, and the main text contains many
opportunities for the reader to apply the rules that we shall now briefly review.

1.1 The Lebesgue integral


1.1.1 Measurable functions and measures
We first recall the notation: ℕ, ℤ, ℚ, ℝ, ℂ are the sets of, respectively, integers, relative integers,
rationals, real numbers, complex numbers; ℝ̄ = ℝ ∪ {+∞, −∞}; ℕ₊ and ℝ₊ are the sets of positive
integers and nonnegative real numbers; ℝ̄₊ = ℝ₊ ∪ {+∞}. P(X) is the collection of all subsets of an
arbitrary set X; card(X) is the cardinal, that is the “number” of elements, of the set X.

Definition 1.1.1 A family X ⊆ P(X) of subsets of X is called a sigma-field on X if:

(α) X ∈ X ;
(β) A ∈ X =⇒ Ā ∈ X ;
(γ) An ∈ X for all n ∈ ℕ =⇒ ∪_{n=0}^∞ An ∈ X.

One then says that (X, X ) is a measurable space.

Observe that if X is a sigma-field on X, and if An ∈ X for all n ∈ ℕ, then ∩_{n=0}^∞ An ∈ X.

Example 1.1: Two extremes. Two extremal examples of sigma-fields on X are the gross sigma-field
X = {∅, X}, and the trivial sigma-field X = P(X).

Definition 1.1.2 The sigma-field generated by a nonempty collection of subsets C ⊆ P(X) is, by definition, the smallest sigma-field on X containing all the sets in C. It is denoted by σ(C).

Recall that a set O ⊆ ℝⁿ is called open if for any x ∈ O, one can find a nonempty ball centered at x
and entirely contained in O.

Definition 1.1.3 Let X be a topological space and let O be the collection of open sets defining the
topology. The sigma-field B(X) = σ(O) is called the Borel sigma-field on X associated with the given
topology. A set B ∈ B(X) is called a Borel set of X.

If X = ℝⁿ is endowed with the Euclidean topology, the Borel sigma-field B(ℝⁿ) is denoted Bⁿ. The
next result gives a more convenient way of defining the Borel sigma-field.

Theorem 1.1.1 Bⁿ is also generated by the collection C of all rectangles of the type ∏_{i=1}^n (−∞, ai],
where ai ∈ ℚ for all i ∈ {1, . . . , n}.

For n = 1 one writes B(ℝ) = B. For I = ∏_{j=1}^n Ij, where each Ij is a general interval of ℝ (I is then called a
generalized rectangle of ℝⁿ), the Borel sigma-field B(I) on I consists of all the Borel sets contained in I.

The central concept of Lebesgue’s integration theory is that of a measurable function.

Definition 1.1.4 Let (X, X ) and (E, E) be two measurable spaces. A function f : X → E is said to be
a measurable function with respect to X and E if f −1 (C) ∈ X for all C ∈ E.

This is denoted in various ways, for instance:

f : (X, X) → (E, E), or f ∈ E/X, or f ∈ X,

where the third notation will be used only when (E, E) = (I, B(I)), I being a generalized rectangle of ℝⁿ,
provided the context is clear enough as to the choice of I.
If f : (X, X) → (ℝᵏ, Bᵏ), one says that f is a Borel function from X to ℝᵏ.

Let B̄ be the sigma-field on ℝ̄ generated by the intervals of type (−∞, a], a ∈ ℝ. A function f :
(X, X) → (ℝ̄, B̄), where (X, X) is an arbitrary measurable space, is called an extended Borel function,
or simply a Borel function. As for functions f : (X, X) → (ℝ, B), they are called real Borel functions.
In general, in a sentence such as “f is a Borel function defined on X”, the sigma-field X is assumed to
be the obvious one in the given context.

It seems difficult to prove measurability since most sigma-fields are not defined explicitly (see the definition of Bⁿ for instance). However, the following result renders the task feasible.

Theorem 1.1.2 Let (X, X) and (E, E) be two measurable spaces, where E = σ(C) for some collection C
of subsets of E. Then f : (X, X) → (E, E) if and only if f⁻¹(C) ∈ X for all C ∈ C.
Two immediate applications of this result are:

Corollary 1.1.1 Let X and E be two topological spaces with respective Borel sigma-fields B(X) and
B(E). Any continuous function f : X → E is measurable with respect to B(X) and B(E).

Proof. See Exercise 4.1. 

Corollary 1.1.2 Let (X, X) be a measurable space and let n ≥ 1 be an integer. Then f = (f1, . . . , fn) :
(X, X) → (ℝⁿ, Bⁿ) if and only if, for all i, 1 ≤ i ≤ n, {fi ≤ ai} ∈ X for all ai ∈ ℚ.

Proof. See Exercise 4.2. 

Measurability is stable under composition:



Theorem 1.1.3 Let (X, X ), (Y, Y) and (E, E) be three measurable spaces, and let φ : (X, X ) → (Y, Y),
g : (Y, Y) → (E, E). Then g ◦ φ : (X, X ) → (E, E).

Proof. This follows immediately from the definition of measurability; See Exercise 4.3. 

The next result shows that the collection of Borel functions is stable under the “usual” operations.

Theorem 1.1.4 (i) Let f, g : (X, X) → (ℝ, B), and let λ ∈ ℝ. Then fg, f + g, λf, (f/g)1_{g≠0} are Borel
functions.
(ii) Let fn : (X, X) → (ℝ̄, B̄), n ∈ ℕ. Then lim inf_{n↑∞} fn and lim sup_{n↑∞} fn are Borel functions,
and the set

{lim sup_{n↑∞} fn = lim inf_{n↑∞} fn} = {∃ lim_{n↑∞} fn}

belongs to X. In particular, if {∃ lim_{n↑∞} fn} = X, the function lim_{n↑∞} fn is a Borel function.

(See Exercise 4.4 which gives an idea of how (ii) of Theorem 1.1.4 can be proven.)

The two following results are technical tools, known as monotone class theorems (MCT). They owe their
importance to the fact that sigma-fields are often defined in terms of their generators. The first monotone
class theorem is useful in proving that a given collection of sets S contains a given sigma-field (We shall
use it in a typical way for Theorem 1.1.8).
Theorem 1.1.5 Let S be a collection of subsets of X satisfying the three following properties:

(a) X ∈ S;

(b) A, B ∈ S and A ⊆ B =⇒ B − A ∈ S;

(c) {An}n≥1 is a non-decreasing sequence in S =⇒ ∪_{n=1}^∞ An ∈ S.

Then σ(C) ⊆ S for any collection C of subsets of X such that C ⊆ S.


The next result is called the functional MCT. It is useful in proving that a given class of functions contains
all the functions measurable with respect to a certain sigma-field.
Theorem 1.1.6 Let H be a vector space of real-valued functions defined on X and let C be a collection
of subsets of X, with the following properties:

(1) 1C ∈ H for all C ∈ C, and 1X ∈ H;

(2) If {fn }n≥1 is a nondecreasing sequence of nonnegative functions of H such that f = supn fn is finite
(resp., bounded) then f ∈ H.

Then H contains all finite (resp., bounded) real-valued functions on X which are σ(C)-measurable.

Definition 1.1.5 Let (X, X) be a measurable space and let µ : X → [0, ∞] be a set function such that
for any denumerable family {An}n≥1 of mutually disjoint sets in X,

µ(∪_{n=1}^∞ An) = ∑_{n=1}^∞ µ(An). (1.1)

The set function µ is called a measure on (X, X), and (X, X, µ) is called a measure space.

Property (1.1) is the sigma-additivity property. The next three properties are easy to check.

µ(∅) = 0, (1.2)

A ⊆ B and A, B ∈ X =⇒ µ(A) ≤ µ(B) (1.3)



An ∈ X for all n ∈ ℕ =⇒ µ(∪_{n=0}^∞ An) ≤ ∑_{n=0}^∞ µ(An). (1.4)

The following are simple examples of measures:

Example 1.2: The Dirac measure. Let a ∈ X. The measure δa defined by δa(C) = 1_C(a) is the
Dirac measure at a ∈ X. The set function µ : X → [0, ∞] defined by

µ(C) = ∑_{i=0}^∞ αi δ_{ai}(C),

where αi ∈ ℝ̄₊ for all i ∈ ℕ, is a measure, denoted µ = ∑_{i=0}^∞ αi δ_{ai}.

Example 1.3: Weighted counting measure. Let {αn}, n ∈ ℤ, be a family of non-negative numbers.
The set function µ : P(ℤ) → [0, ∞] defined by µ(C) = ∑_{n∈C} αn is a measure on (ℤ, P(ℤ)). If αn ≡ 1
we obtain the counting measure ν on ℤ (ν(C) = card(C)).
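The weighted counting measure of Example 1.3 can be sketched directly in code; this is a minimal illustration of mine (the function names are not from the text), restricted to finite subsets of ℤ so that µ(C) is a finite sum:

```python
# Sketch of Example 1.3: the weighted counting measure mu(C) = sum of alpha(n)
# for n in C, evaluated on finite subsets of Z.
def weighted_counting_measure(alpha):
    """Return the set function C -> sum_{n in C} alpha(n)."""
    def mu(C):
        return sum(alpha(n) for n in C)
    return mu

mu = weighted_counting_measure(lambda n: 1.0)   # alpha_n = 1: counting measure
assert mu({1, 2, 3}) == 3                       # nu(C) = card(C)

# sigma-additivity on disjoint sets: mu(A u B) = mu(A) + mu(B)
A, B = {0, 2}, {5, 7}
assert mu(A | B) == mu(A) + mu(B)
```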




Example 1.4: Lebesgue measure. There exists one and only one measure ℓ on (ℝ, B) such that

ℓ((a, b]) = b − a. (1.5)

This measure is called the Lebesgue measure on ℝ. (Note that the statement of this example is in fact
a theorem, which is part of a more general result, Theorem 1.1.8 below.)

Definition 1.1.6 Let µ be a measure on (X, X). If µ(X) < ∞, the measure µ is called a finite measure.
If µ(X) = 1, the measure µ is called a probability measure. If there exists a sequence {Kn}n≥1 in X such
that µ(Kn) < ∞ for all n ≥ 1, and ∪_{n=1}^∞ Kn = X, the measure µ is called a sigma-finite measure. A
measure µ on (ℝⁿ, Bⁿ) such that µ(C) < ∞ for all bounded Borel sets C is called a Radon (or locally
bounded) measure.

Example 1.5: The Dirac measure δa is a probability measure. The counting measure ν on ℤ is a
sigma-finite measure. Any Radon measure on (ℝⁿ, Bⁿ) is sigma-finite. The Lebesgue measure is a Radon
measure.

Theorem 1.1.7 Let (X, X, µ) be a measure space. Let {An}n≥1 be a non-decreasing (that is, An ⊆ An+1
for all n ≥ 1) sequence in X. Then

µ(∪_{n=1}^∞ An) = lim_{n↑∞} ↑ µ(An). (1.6)

Let {Bn}n≥1 be a non-increasing (that is, Bn+1 ⊆ Bn for all n ≥ 1) sequence in X such that µ(Bn0) < ∞
for some n0 ∈ ℕ₊. Then

µ(∩_{n=1}^∞ Bn) = lim_{n↓∞} ↓ µ(Bn). (1.7)

Proof. We shall prove (1.6). This equality follows directly from sigma-additivity since

µ(An) = µ(A1) + ∑_{i=1}^{n−1} µ(Ai+1 − Ai)

and

µ(∪_{n=1}^∞ An) = µ(A1) + ∑_{i=1}^∞ µ(Ai+1 − Ai).

The proof of (1.7) is left to the reader. 

The necessity of the condition µ(Bn0) < ∞ for some n0 is illustrated by the following counter-example.
Let ν be the counting measure on ℤ, and for all n ≥ 1 define Bn = {i ∈ ℤ : |i| ≥ n}. Then ν(Bn) = +∞
for all n ≥ 1, and

ν(∩_{n=1}^∞ Bn) = ν(∅) = 0.

Definition 1.1.7 A function F : ℝ → ℝ is called a cumulative distribution function (c.d.f.) if the
following properties are satisfied:
1. F is nondecreasing;
2. F is right-continuous;
3. F admits a left-hand limit, denoted F(x−), at all x ∈ ℝ.

Example 1.6: Let µ be a Radon measure on (ℝ, B) and define

Fµ(t) = +µ((0, t]) if t ≥ 0,
Fµ(t) = −µ((t, 0]) if t < 0. (1.8)

This is clearly a cumulative distribution function, and moreover,

Fµ (b) − Fµ (a) = µ((a, b]),


Fµ (a) − Fµ (a−) = µ({a}).

The function Fµ is called the c.d.f. of µ.

Theorem 1.1.8 Let F : ℝ → ℝ be a c.d.f. There exists a unique measure µ on (ℝ, B) such that
Fµ = F.

This result is easily stated, but it is not trivial, even in the case of the Lebesgue measure (Example 1.4).
It is typical of the existence results which answer the following type of question: Let C be a collection of
subsets of X with C ⊆ X , where X is a sigma-field on X. Given a set function u : C → [0, ∞], does there
exist a measure µ on (X, X) such that µ(C) = u(C) for all C ∈ C, and is it unique? Note, however, that
uniqueness is proven thanks to the MCT (Theorem 1.1.5). Indeed, suppose that there are two such
measures, µ and ν. Let S be the collection of sets in B that have the same µ- and ν-measures. S satisfies
conditions (a), (b), (c) of Theorem 1.1.5, and S contains the class C of intervals (a, b] ⊆ ℝ. Therefore it
contains the sigma-field generated by C, that is, B.

Definition 1.1.8 Let (X, X , µ) be a measure space. A µ-negligible set is a set contained in a measurable
set N ∈ X such that µ(N ) = 0. One says that some property P relative to the elements x ∈ X holds
µ-almost everywhere (µ-a.e.) if the set {x ∈ X : x does not satisfy P} is a µ-negligible set.

For instance, if f and g are two Borel functions defined on X, the expression

f ≤ g µ-a.e.

means that
µ({x : f (x) > g(x)}) = 0.

Theorem 1.1.9 A countable union of µ-negligible sets is a µ-negligible set.

Proof. Let An, n ≥ 1, be a sequence of µ-negligible sets, and let Nn, n ≥ 1, be a sequence of measurable
sets such that µ(Nn) = 0 and An ⊆ Nn. Then N = ∪_{n≥1} Nn is a measurable set containing ∪_{n≥1} An,
and N is of µ-measure 0, by the sub-sigma-additivity property (1.4). 

Example 1.7: The rationals are Lebesgue-negligible. Any singleton {a}, a ∈ ℝ, is a Borel
set of Lebesgue measure 0. The set of rationals ℚ is a Borel set of Lebesgue measure 0. Proof: The
Borel sigma-field B is generated by the intervals Ia = (−∞, a], a ∈ ℝ (Theorem 1.1.1), and therefore
{a} = ∩_{n≥1} (Ia − I_{a−1/n}) is also in B. Denoting by ℓ the Lebesgue measure, ℓ(Ia − I_{a−1/n}) = 1/n, and
therefore ℓ({a}) = lim_{n↑∞} ℓ(Ia − I_{a−1/n}) = 0. ℚ is a countable union of sets in B (singletons) and is
therefore in B. It has Lebesgue measure 0 as a countable union of sets of Lebesgue measure 0.

Theorem 1.1.10 If two continuous functions f, g : ℝ → ℝ are ℓ-a.e. equal, they are everywhere equal.

Proof. Let t ∈ ℝ be such that f(t) ≠ g(t). For any c > 0, there exists s ∈ [t − c, t + c] such that
f(s) = g(s) (otherwise, the set {t : f(t) ≠ g(t)} would contain the whole interval [t − c, t + c], and
therefore could not be of null Lebesgue measure). Therefore, one can construct a sequence {tn}n≥1
converging to t and such that f(tn) = g(tn) for all n ≥ 1. Letting n tend to ∞ yields f(t) = g(t), a
contradiction. 

1.1.2 Construction of the integral


We are now in a position to define the Lebesgue integral of a measurable function f : (X, X) → (ℝ̄, B̄)
with respect to µ, denoted

∫_X f dµ, or ∫_X f(x) µ(dx), or µ(f).

The important tool for this is the theorem of approximation of non-negative measurable functions by
the so-called simple Borel functions.

Definition 1.1.9 A Borel function f : (X, X) → (ℝ, B) of the type

f(x) = ∑_{i=1}^k ai 1_{Ai}(x), (1.9)

where k ∈ ℕ₊, a1, . . . , ak ∈ ℝ, A1, . . . , Ak ∈ X, is called a simple Borel function (defined on X).

The following result is the key to the construction of the Lebesgue integral:

Theorem 1.1.11 Let f : (X, X) → (ℝ̄, B̄) be a non-negative Borel function. There exists a nondecreasing sequence {fn}n≥1 of nonnegative simple Borel functions that converges pointwise to f.

Proof. Take

fn(x) = ∑_{k=0}^{n2^n − 1} k2^{−n} 1_{A_{k,n}}(x),

where
A_{k,n} = {x ∈ X : k2^{−n} < f(x) ≤ (k + 1)2^{−n}}.
We leave to the reader to check that this sequence of functions has the announced properties. 
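The announced properties can be checked numerically; the following sketch is mine (function names are assumptions, not the text's), covering the finite values of f. On A_{k,n}, f_n takes the value k2⁻ⁿ, and f_n(x) = 0 when f(x) > n, since no A_{k,n} with k ≤ n2ⁿ − 1 covers that range:

```python
# Sketch of the approximating sequence of Theorem 1.1.11:
# f_n = k * 2**-n on A_{k,n} = {k 2**-n < f <= (k+1) 2**-n}, 0 <= k <= n 2**n - 1.
import math

def f_n(f, n, x):
    y = f(x)
    if y <= 0 or y > n:          # x lies in no A_{k,n}
        return 0.0
    k = math.ceil(y * 2 ** n) - 1  # the unique k with k 2**-n < y <= (k+1) 2**-n
    return k * 2.0 ** -n

f = lambda x: x * x              # a non-negative Borel function
x = 0.7
values = [f_n(f, n, x) for n in range(1, 30)]
assert all(a <= b for a, b in zip(values, values[1:]))   # nondecreasing in n
assert abs(values[-1] - f(x)) < 1e-6                     # converges to f(x)
```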

We define the integral in 3 steps:

STEP 1. For any non-negative simple Borel function f : (X, X) → (ℝ, B) of the form (1.9), one defines
the integral of f with respect to µ by

∫_X f dµ = ∑_{i=1}^k ai µ(Ai). (1.10)

STEP 2. If f : (X, X) → (ℝ̄, B̄) is non-negative, the integral is defined by

∫_X f dµ = lim_{n↑∞} ↑ ∫_X fn dµ, (1.11)

where {fn}n≥1 is a nondecreasing sequence of nonnegative simple Borel functions fn : (X, X) → (ℝ, B)
such that lim_{n↑∞} ↑ fn = f. This definition can be shown to be consistent, in that the integral so defined
is independent of the choice of the approximating sequence. Note that the quantity (1.11) is non-negative
and can be infinite.

STEP 3. One checks that if f ≤ g, where f, g : (X, X) → (ℝ̄, B̄) are non-negative, then

∫_X f dµ ≤ ∫_X g dµ.

Denoting
f⁺ = max(f, 0) and f⁻ = max(−f, 0)
(and in particular f = f⁺ − f⁻ and f± ≤ |f|), we therefore have

∫_X f± dµ ≤ ∫_X |f| dµ.

Thus, if

∫_X |f| dµ < ∞, (1.12)

the right-hand side of

∫_X f dµ := ∫_X f⁺ dµ − ∫_X f⁻ dµ (1.13)

is meaningful and defines the integral of the left-hand side. Moreover, the integral of f with respect to
µ defined in this way is finite.

Definition 1.1.10 A measurable function f : (X, X) → (ℝ̄, B̄) satisfying (1.12) is called a µ-integrable
function.

STEP 3 (cont’d). The integral can be defined for some non-integrable functions. For example, it is defined
for all non-negative functions. More generally, if f : (X, X) → (ℝ̄, B̄) is such that at least one of the
integrals ∫_X f⁺ dµ or ∫_X f⁻ dµ is finite, one defines

∫_X f dµ = ∫_X f⁺ dµ − ∫_X f⁻ dµ. (1.14)

This leads to one of the forms “finite minus finite”, “finite minus infinite”, and “infinite minus finite”.
The case which is rigorously excluded is that in which µ(f + ) = µ(f − ) = + ∞.

Example 1.8: Integral with respect to the weighted counting measure. It is not hard to
check that any function f : ℤ → ℝ is measurable with respect to P(ℤ) and B. With the measure µ
defined in Example 1.3, and with f ≥ 0 for instance,

µ(f) = ∑_{n∈ℤ} αn f(n).

The proof is fairly simple: it suffices to consider the approximating sequence of simple functions

fn(k) = ∑_{j=−n}^{+n} f(j) 1_{{j}}(k),

whose integral is

µ(fn) = ∑_{j=−n}^{+n} f(j) µ({j}) = ∑_{j=−n}^{+n} f(j) αj,

and to let n tend to ∞.

When αn ≡ 1, the integral reduces to the sum of a series:

ν(f) = ∑_{n∈ℤ} f(n).

In this case, integrability means that the series is absolutely convergent.
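The truncation argument of Example 1.8 can be watched converge numerically; this sketch is mine (the weights and the test function are assumptions chosen for illustration):

```python
# Sketch of Example 1.8: for f >= 0 on Z, mu(f) = sum_{n in Z} alpha_n f(n),
# approximated by the truncations f_N supported on {-N, ..., N}.
alpha = lambda n: 1.0                 # alpha_n = 1: the counting measure nu
f = lambda n: 2.0 ** (-abs(n))        # nonnegative and nu-integrable

def mu_fn(N):
    """Integral of the truncation f_N: sum_{|j| <= N} alpha(j) f(j)."""
    return sum(alpha(j) * f(j) for j in range(-N, N + 1))

exact = 3.0                           # sum_{n in Z} 2**-|n| = 1 + 2 * 1 = 3
assert mu_fn(20) < mu_fn(40)          # mu(f_N) is nondecreasing in N
assert abs(mu_fn(80) - exact) < 1e-9  # and converges to mu(f)
```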

Example 1.9: Integral with respect to the Dirac measure. Let δa be the Dirac measure at the
point a ∈ X. Then any f : (X, X) → (ℝ̄, B̄) is δa-integrable, and

δa(f) = f(a).

For a simple function f as in (1.9), we have

δa(f) = ∑_{i=1}^k ai δa(Ai) = ∑_{i=1}^k ai 1_{Ai}(a) = f(a).

For a non-negative function f, and any non-decreasing sequence of simple non-negative Borel functions
{fn}n≥1 converging to f, we have

δa(f) = lim_{n↑∞} δa(fn) = lim_{n↑∞} fn(a) = f(a).

Finally, for any f : (X, X) → (ℝ̄, B̄),

δa(f) = δa(f⁺) − δa(f⁻) = f⁺(a) − f⁻(a) = f(a)

is a well-defined quantity.

We give the elementary properties of the integral. First, recall that for all A ∈ X,

∫_X 1_A dµ = µ(A), (1.15)

and that the notation ∫_A f dµ means ∫_X 1_A f dµ.

Theorem 1.1.12 Let f, g : (X, X) → (ℝ̄, B̄) be µ-integrable functions, and let a, b ∈ ℝ. Then

(a) af + bg is µ-integrable and µ(af + bg) = aµ(f ) + bµ(g);

(b) If f = 0 µ-a.e., then µ(f ) = 0; If f = g µ-a.e., then µ(f ) = µ(g);

(c) If f ≤ g µ-a.e., then µ(f ) ≤ µ(g);

(d) |µ(f )| ≤ µ(|f |);

(e) If f ≥ 0 µ-a.e. and µ(f ) = 0, then f = 0 µ-a.e.;

(f ) If µ(1A f ) = 0 for all A ∈ X , then f = 0 µ-a.e..

(g) If f is µ-integrable, then |f | < ∞ µ-a.e..

For a complex Borel function f : X → ℂ (i.e., f = f1 + if2, where f1, f2 : (X, X) → (ℝ, B)) such that
µ(|f|) < ∞, one defines

∫_X f dµ = ∫_X f1 dµ + i ∫_X f2 dµ. (1.16)

The extension to complex Borel functions of the properties (a), (b), (d) and (f) in Theorem 1.1.12 is
immediate.

The following result tells us that all the time spent in learning about the Riemann integral has not been
in vain.
Theorem 1.1.13 Let f : (ℝ, B) → (ℝ, B) be Riemann-integrable. Then it is Lebesgue-integrable with
respect to ℓ, and the Lebesgue integral is equal to the Riemann integral.

Example 1.10: Integrable for Lebesgue and not for Riemann. The converse is not true: the
function f defined by f(x) = 1 if x ∈ ℚ and f(x) = 0 if x ∉ ℚ is a Borel function, and it is Lebesgue
integrable with its integral equal to zero, because {f ≠ 0}, that is ℚ, has ℓ-measure zero. However, f is
not Riemann integrable.

Example 1.11: The function f : (ℝ, B) → (ℝ, B) defined by

f(x) = x / (1 + x²)

does not have a Lebesgue integral, because

f⁺(x) = (x / (1 + x²)) 1_{(0,∞)}(x) and f⁻(x) = −(x / (1 + x²)) 1_{(−∞,0)}(x)

have infinite Lebesgue integrals. However, it has a generalized Riemann integral

lim_{A↑∞} ∫_{−A}^{+A} x / (1 + x²) dx = 0.

1.2 The big results of integration theory


1.2.1 Dominated convergence

We begin with the results giving conditions under which one may interchange limit and integration,
that is,

∫_X (lim_{n↑∞} fn) dµ = lim_{n↑∞} ∫_X fn dµ. (1.17)

Example 2.1: The classical counterexample. For (X, X, µ) = (ℝ, B, ℓ), define

fn(x) = 0 if |x| > 1/n,
fn(x) = n²x + n if −1/n ≤ x ≤ 0,
fn(x) = −n²x + n if 0 ≤ x ≤ 1/n.

One has

lim_{n↑∞} fn(x) = 0 if x ≠ 0,

that is, lim_{n↑∞} fn = 0 µ-a.e. Therefore µ(lim_{n↑∞} fn) = 0. However, µ(fn) = 1 for all n ≥ 1.
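The triangular spikes of Example 2.1 can be checked numerically; this is a sketch of mine (the trapezoidal quadrature is an illustration device, not part of the text):

```python
# Numeric sketch of Example 2.1: each f_n has integral 1 (a triangle of base
# 2/n and height n), yet f_n(x) -> 0 for every x != 0.
def f(n, x):
    if abs(x) > 1.0 / n:
        return 0.0
    return n * n * x + n if x <= 0 else -n * n * x + n

def integral(n, steps=200_000):
    # trapezoidal rule on [-1/n, 1/n], where f_n is supported
    a, b = -1.0 / n, 1.0 / n
    h = (b - a) / steps
    s = 0.5 * (f(n, a) + f(n, b)) + sum(f(n, a + i * h) for i in range(1, steps))
    return s * h

for n in (1, 10, 100):
    assert abs(integral(n) - 1.0) < 1e-6   # mu(f_n) = 1 for all n

assert f(1000, 0.05) == 0.0                # pointwise convergence to 0 off x = 0
```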

The monotone convergence theorem below is sometimes called the Beppo Levi theorem.

Theorem 1.2.1 Let fn : (X, X) → (ℝ, B), n ≥ 1, be such that

(i) fn ≥ 0 µ-a.e.;
(ii) fn+1 ≥ fn µ-a.e.

Then there exists a nonnegative function f : (X, X) → (ℝ̄, B̄) such that

lim_{n↑∞} ↑ fn = f µ-a.e.,

and (1.17) holds.


The dominated convergence theorem below is often referred to as the Lebesgue theorem.

Theorem 1.2.2 Let fn : (X, X) → (ℝ, B), n ≥ 1, be such that, for some function f : (X, X) → (ℝ, B)
and some µ-integrable function g : (X, X) → (ℝ, B):

(i) lim_{n↑∞} fn = f, µ-a.e.;
(ii) |fn| ≤ |g| µ-a.e. for all n ≥ 1.

Then (1.17) holds.


The next result is a useful technical tool, called Fatou’s lemma.

Theorem 1.2.3 Let fn : (X, X) → (ℝ, B), n ≥ 1, be such that fn ≥ 0 µ-a.e. for all n ≥ 1. Then

∫_X (lim inf_{n↑∞} fn) dµ ≤ lim inf_{n↑∞} (∫_X fn dµ). (1.18)

Example 2.2: Dominated convergence theorem for series. When the measure is the counting
measure on ℕ, the dominated convergence theorem takes the following form:

Let {ank}n≥1,k≥1 be an array of real numbers such that, for some sequence {bk}k≥1 of nonnegative
numbers satisfying

∑_{k=1}^∞ bk < ∞,

it holds that for all n ≥ 1, k ≥ 1,

|ank| ≤ bk.

If moreover for all k ≥ 1,

lim_{n↑∞} ank = ak,

then

lim_{n↑∞} ∑_{k=1}^∞ ank = ∑_{k=1}^∞ ak.

We give the proof for this case.

Proof. Let ε > 0 be fixed. Since ∑_{k=1}^∞ bk is a convergent series, one can find M = M(ε) such that
∑_{k=M+1}^∞ bk < ε/3. Since |ank| ≤ bk and therefore |ak| ≤ bk, we have

∑_{k=M+1}^∞ |ank| + ∑_{k=M+1}^∞ |ak| ≤ 2ε/3.

Now, for sufficiently large n,

∑_{k=1}^M |ank − ak| ≤ ε/3.

Therefore, for sufficiently large n,

|∑_{k=1}^∞ ank − ∑_{k=1}^∞ ak| ≤ ∑_{k=1}^M |ank − ak| + ∑_{k=M+1}^∞ |ank| + ∑_{k=M+1}^∞ |ak| ≤ ε/3 + 2ε/3 = ε.

A very useful application of the dominated convergence theorem is the theorem of differentiation under
the integral sign. Let (X, X, µ) be a measure space and let (a, b) ⊆ ℝ. Let f : (a, b) × X → ℝ and, for
all t ∈ (a, b), define ft : X → ℝ by ft(x) = f(t, x). Assume that for all t ∈ (a, b), ft is measurable with
respect to X, and define, if possible, the function I : (a, b) → ℝ by the formula

I(t) = ∫_X f(t, x) µ(dx). (1.19)

Theorem 1.2.4 Assume that for µ-almost all x the function t 7→ f(t, x) is continuous at t0 ∈ (a, b) and
that there exists a µ-integrable function g : (X, X) → (ℝ, B) such that |f(t, x)| ≤ |g(x)| µ-a.e. for all t in
a neighbourhood V of t0. Then I is well-defined and is continuous at t0. If we furthermore assume that

(α) t 7→ f(t, x) is continuously differentiable on V for µ-almost all x; and
(β) for some µ-integrable function h : (X, X) → (ℝ, B) and all t ∈ V,

|(df/dt)(t, x)| ≤ |h(x)| µ-a.e.,

then I is differentiable at t0 and

I′(t0) = ∫_X (df/dt)(t0, x) µ(dx). (1.20)

Proof. Let {tn}n≥1 be a sequence in V \ {t0} such that lim_{n↑∞} tn = t0, and define fn(x) = f(tn, x),
f(x) = f(t0, x). By the dominated convergence theorem,

lim_{n↑∞} I(tn) = lim_{n↑∞} µ(fn) = µ(f) = I(t0).

Also

(I(tn) − I(t0)) / (tn − t0) = ∫_X (f(tn, x) − f(t0, x)) / (tn − t0) µ(dx),

and for some θ ∈ (0, 1), possibly depending upon n,

|(f(tn, x) − f(t0, x)) / (tn − t0)| ≤ |(df/dt)(t0 + θ(tn − t0), x)|.

The latter quantity is bounded by |h(x)|. By the dominated convergence theorem,

lim_{n↑∞} (I(tn) − I(t0)) / (tn − t0) = ∫_X lim_{n↑∞} (f(tn, x) − f(t0, x)) / (tn − t0) µ(dx)
= ∫_X (df/dt)(t0, x) µ(dx).

The following result (called the image measure theorem) is especially important in Probability.

Definition 1.2.1 Let (X, X) and (E, E) be two measurable spaces, let h : (X, X) → (E, E) be a measurable function, and let µ be a measure on (X, X). Define the set function µ ◦ h⁻¹ : E → [0, ∞]
by

(µ ◦ h⁻¹)(C) = µ(h⁻¹(C)), C ∈ E. (1.21)

Then, one easily checks that µ ◦ h−1 is a measure on (E, E) called the image of µ by h.

Theorem 1.2.5 For an arbitrary non-negative f : (X, X) → (ℝ̄, B̄),

∫_X (f ◦ h)(x) µ(dx) = ∫_E f(y) (µ ◦ h⁻¹)(dy). (1.22)

For functions f : (X, X) → (ℝ, B) of arbitrary sign, either one of the conditions

(a) f ◦ h is µ-integrable,

(b) f is µ ◦ h⁻¹-integrable,

implies the other, and equality (1.22) then holds.


Proof. The equality (1.22) is readily verified when f is a nonnegative simple Borel function. In the
general case one approximates f by a nondecreasing sequence of nonnegative simple Borel functions
{fn }n≥1 and (1.22) then follows from the same equality written with f = fn , by letting n ↑ ∞ and using
the monotone convergence theorem. For the case of functions of arbitrary sign, apply (1.22) with f +
and f − . 
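For a finite measure space the image measure and formula (1.22) reduce to bookkeeping that can be checked directly; the concrete µ and h below are my own illustration:

```python
# Sketch of (1.21)-(1.22) for a finite measure: mu lives on X = {0, 1, 2, 3},
# and h(x) = x % 2 maps it to E = {0, 1}.
mu = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}
h = lambda x: x % 2

def image_measure(mu, h):
    out = {}
    for x, m in mu.items():
        out[h(x)] = out.get(h(x), 0.0) + m   # (mu o h^-1)({y}) = mu(h^-1({y}))
    return out

nu = image_measure(mu, h)                    # nu = mu o h^-1
assert abs(nu[0] - 0.3) < 1e-12 and abs(nu[1] - 0.7) < 1e-12

f = lambda y: 10.0 if y == 0 else 1.0
lhs = sum(f(h(x)) * m for x, m in mu.items())   # integral of (f o h) d mu
rhs = sum(f(y) * m for y, m in nu.items())      # integral of f d(mu o h^-1)
assert abs(lhs - rhs) < 1e-12                   # equality (1.22)
```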

1.2.2 Fubini
Let (X1 , X1 , µ1 ) and (X2 , X2 , µ2 ) be two measure spaces where µ1 and µ2 are sigma-finite. Define the
product set X = X1 × X2 and the product sigma-field X = X1 × X2 , where by definition the latter is the
smallest sigma-field on X containing all sets of the form A1 × A2 , where A1 ∈ X1 , A2 ∈ X2 .

Theorem 1.2.6 There exists an unique measure µ on (X1 × X2 , X1 × X2 ) such that

µ(A1 × A2 ) = µ1 (A1 )µ2 (A2 ) (1.23)

for all A1 ∈ X1 , A2 ∈ X2 .

The measure µ is the product measure of µ1 and µ2 , and is denoted µ1 × µ2 .

The above result extends in an obvious manner to a finite number of sigma-finite measures.

Example 2.3: Lebesgue measure on ℝⁿ. The typical example of a product measure is the Lebesgue measure on the space (ℝⁿ, Bⁿ): it is the unique measure ℓⁿ on that space such that

ℓⁿ(∏_{i=1}^{n} Ai) = ∏_{i=1}^{n} ℓ(Ai) for all A1, . . . , An ∈ B.

Going back to the situation with two measure spaces (the case of a finite number of measure spaces is
similar) we have the following result:
Theorem 1.2.7 Let (X1, X1, µ1) and (X2, X2, µ2) be two measure spaces in which µ1 and µ2 are sigma-finite. Let (X, X, µ) = (X1 × X2, X1 × X2, µ1 × µ2).

(A) Tonelli. If f is non-negative, then, for µ1-almost all x1, the function x2 → f(x1, x2) is measurable with respect to X2, and

x1 → ∫_{X2} f(x1, x2) µ2(dx2)

is a measurable function with respect to X1. Furthermore,

∫_X f dµ = ∫_{X1} [∫_{X2} f(x1, x2) µ2(dx2)] µ1(dx1). (1.25)

(B) Fubini. If f is µ-integrable, then, for µ1-almost all x1, the function x2 → f(x1, x2) is µ2-integrable, the function x1 → ∫_{X2} f(x1, x2) µ2(dx2) is µ1-integrable, and (1.25) holds.

We shall refer to the global result as the Fubini–Tonelli theorem.

Part (A) says that one can integrate a non-negative Borel function in any order of its variables. Part (B) says that the same is true of an arbitrary Borel function provided that function is µ-integrable. In general, in order to apply Part (B), one must first use Part (A) with |f| to ascertain whether or not ∫ |f| dµ < ∞.

Example 2.4: When Fubini cannot be applied. Consider the function f defined on X1 × X2 = (1, ∞) × (0, 1) by the formula

f(x1, x2) = e^{−x1 x2} − 2 e^{−2 x1 x2}.

We have

∫_{(1,∞)} f(x1, x2) dx1 = (e^{−x2} − e^{−2x2}) / x2 = h(x2) ≥ 0,

∫_{(0,1)} f(x1, x2) dx2 = −(e^{−x1} − e^{−2x1}) / x1 = −h(x1).

However,

∫_0^1 h(x2) dx2 ≠ ∫_1^∞ (−h(x1)) dx1,

since h > 0 on (0, ∞), so that the left-hand side is positive while the right-hand side is negative. We therefore see that successive integrations yield different results according to the order in which they are performed. As a matter of fact, f(x1, x2) is not integrable on (1, ∞) × (0, 1).
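The disagreement between the two orders of integration can be confirmed numerically, using the closed-form inner integrals computed above. The midpoint rule and the truncation of (1, ∞) at 50 are ad hoc numerical choices, not part of the example itself.

```python
import math

def h(t):
    # h(t) = (e^{-t} - e^{-2t}) / t, the inner integral in closed form
    return (math.exp(-t) - math.exp(-2.0 * t)) / t

def midpoint(g, a, b, n=100000):
    # crude midpoint-rule quadrature of g over (a, b)
    step = (b - a) / n
    return step * sum(g(a + (i + 0.5) * step) for i in range(n))

# dx1 first, then dx2: ∫_0^1 h(x2) dx2  (strictly positive)
order1 = midpoint(h, 0.0, 1.0)
# dx2 first, then dx1: ∫_1^∞ (-h(x1)) dx1, truncated at 50 (negligible tail)
order2 = midpoint(lambda t: -h(t), 1.0, 50.0)

assert order1 > 0.0 and order2 < 0.0   # the two orders of integration disagree
```

Incidentally, the gap order1 − order2 approximates ∫_0^∞ h(t) dt = ln 2 (a Frullani integral), which quantifies just how far the two iterated integrals are from each other.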

Example 2.5: Fun Fubini. This example is for pure fun. Consider any bounded rectangle of ℝ². We say that it has Property (A) if at least one of its sides “is an integer” (meaning: its length is an integer). Now you are given a bounded rectangle ∆ that is the union of a finite number of disjoint rectangles ∆1, . . . , ∆K, each with Property (A). Show that ∆ itself must have Property (A).

Proof. Let I be a finite interval of ℝ. Observe that ∫_I e^{2iπx} dx = 0 if and only if the length of I is an integer. Let now I × J be a finite rectangle. It has Property (A) if and only if

∫∫_{I×J} e^{2iπ(x+y)} dx dy = ∫_I e^{2iπx} dx × ∫_J e^{2iπy} dy = 0.

(This is where we use Fubini.) Now

∫∫_∆ e^{2iπ(x+y)} dx dy = ∫∫_{∪_{n=1}^{K} ∆n} e^{2iπ(x+y)} dx dy = Σ_{n=1}^{K} ∫∫_{∆n} e^{2iπ(x+y)} dx dy = 0,

since the ∆n's form a partition of ∆ and all have Property (A). □

The integration by parts formula is a corollary of the Fubini theorem.

Theorem 1.2.8 Let µ1 and µ2 be two sigma-finite measures on (ℝ, B). For any interval (a, b] ⊆ ℝ,

µ1((a, b]) µ2((a, b]) = ∫_{(a,b]} µ1((a, t]) µ2(dt) + ∫_{(a,b]} µ2((a, t)) µ1(dt). (1.26)

Observe that in the first integral we have (a, t] (closed on the right), whereas in the second integral we have (a, t) (open on the right).

Proof. The proof consists in computing the µ-measure of the square (a, b] × (a, b], where µ = µ1 × µ2, in two ways. The first one is obvious and gives the left-hand side of (1.26). The second one consists in observing that µ((a, b] × (a, b]) = µ(D1) + µ(D2), where D1 = {(x, y) : a < y ≤ b, a < x ≤ y} and D2 = (a, b] × (a, b] \ D1. Then µ(D1) and µ(D2) are computed using Tonelli's theorem. For instance,

µ(D1) = ∫ (∫ 1_{D1}(x, y) µ1(dx)) µ2(dy),

where

∫ 1_{D1}(x, y) µ1(dx) = ∫ 1_{a<x≤y} µ1(dx) = µ1((a, y]).  □

Let µ be a Radon measure on (ℝ, B) and let Fµ be its c.d.f. The notation

∫ g(x) Fµ(dx)

stands for ∫ g(x) µ(dx). When this integral is used it is usually called the Lebesgue–Stieltjes integral of g with respect to Fµ. With this notation, (1.26) becomes

F1(b) F2(b) − F1(a) F2(a) = ∫_{(a,b]} F1(x) dF2(x) + ∫_{(a,b]} F2(x−) dF1(x), (1.27)

where Fi := Fµi (i = 1, 2). This is the Lebesgue–Stieltjes version of the integration by parts formula of Calculus.
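For purely atomic measures, both sides of (1.26) reduce to finite sums, which makes the closed/open interval asymmetry easy to verify by machine. The atoms and weights below are made up for the sake of the check.

```python
# Toy check of (1.26) for two purely atomic measures on (0, 3]:
# µ1((0,3]) µ2((0,3]) = Σ_t µ1((0,t]) µ2({t}) + Σ_t µ2((0,t)) µ1({t}).
mu1 = {0.5: 1.0, 1.0: 2.0, 2.5: 0.5}   # atoms and weights (made-up)
mu2 = {1.0: 3.0, 2.0: 1.5, 3.0: 2.0}

def closed(mu, t):   # µ((0, t])
    return sum(w for s, w in mu.items() if 0 < s <= t)

def open_(mu, t):    # µ((0, t))  -- strictly open on the right
    return sum(w for s, w in mu.items() if 0 < s < t)

lhs = closed(mu1, 3.0) * closed(mu2, 3.0)
rhs = sum(closed(mu1, t) * w for t, w in mu2.items()) \
    + sum(open_(mu2, t) * w for t, w in mu1.items())
assert abs(lhs - rhs) < 1e-12
```

Note that replacing `open_` by `closed` in the second sum would double-count the pairs of atoms sitting at the same point, which is exactly why (1.26) needs one interval closed and the other open.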

We end this section with simple examples involving counting measures on ℕ.

Example 2.6: Fubini for the counting measure. Applied to the product of two counting measures on ℕ, the Fubini theorem deals with the problem of interchanging the order of summations. It says (observing that almost-everywhere relative to the counting measure in fact means everywhere, since for such a measure the only set of measure 0 is the empty set): Let {ak,n}k,n∈ℕ be a doubly indexed sequence of real numbers. If this sequence is absolutely summable, that is, if

Σ_{k,n∈ℕ} |ak,n| < ∞, (1.28)

then the sum Σ_{k,n∈ℕ} ak,n is well-defined, for each n ∈ ℕ,

Σ_{k∈ℕ} |ak,n| < ∞,

and

Σ_{k,n∈ℕ} ak,n = Σ_{n∈ℕ} (Σ_{k∈ℕ} ak,n).

If the terms of the doubly indexed sequence are non-negative, the latter equality holds without condition (1.28).
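A quick finite check of the interchange of summations (the array below is truncated to 30 × 30 terms, so it is trivially absolutely summable; the particular array is arbitrary):

```python
# Summing a (truncated) absolutely summable double array in either order.
K = 30
a = {(k, n): ((-1) ** (k + n)) / 2.0 ** (k + n) for k in range(K) for n in range(K)}
by_n_inside = sum(sum(a[(k, n)] for n in range(K)) for k in range(K))
by_k_inside = sum(sum(a[(k, n)] for k in range(K)) for n in range(K))
assert abs(by_n_inside - by_k_inside) < 1e-12
# The sum factorizes: (Σ_k (-1/2)^k)(Σ_n (-1/2)^n) ≈ (2/3)² = 4/9
assert abs(by_n_inside - 4.0 / 9.0) < 1e-8
```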

Example 2.7: Let {fn}n∈ℕ be a sequence of measurable functions fn : ℝ → ℝ. Applying Fubini's theorem with the product of the Lebesgue measure on ℝ by the counting measure on ℕ yields: Under the condition

∫_ℝ (Σ_{n∈ℕ} |fn(t)|) dt < ∞, (1.29)

for almost all t ∈ ℝ (with respect to the Lebesgue measure),

Σ_{n∈ℕ} |fn(t)| < ∞,

and

∫_ℝ (Σ_{n∈ℕ} fn(t)) dt = Σ_{n∈ℕ} (∫_ℝ fn(t) dt).

If the fn's are non-negative, the latter equality holds without condition (1.29).

1.2.3 Radon–Nikodym

Product of a measure by a function

Definition 1.2.2 Let (X, X, µ) be a measure space and let h : (X, X) → (ℝ, B) be non-negative. Define the set function ν : X → [0, ∞] by

ν(C) = ∫_C h(x) µ(dx). (1.30)

Then one easily checks that ν is a measure on (X, X), called the product of µ with the function h.

Theorem 1.2.9 For arbitrary non-negative f : (X, X) → (ℝ, B),

∫_X f(x) ν(dx) = ∫_X f(x) h(x) µ(dx). (1.31)

If f : (X, X) → (ℝ, B) has arbitrary sign, then either one of the conditions

(a) f is ν-integrable,

(b) f h is µ-integrable,

implies the other, and the equality (1.31) then holds.

Proof. One verifies (1.31) for elementary non-negative functions, approximates f by a non-decreasing sequence of such functions, and then uses the monotone convergence theorem, as in the proof of (1.22). For the case of functions of arbitrary sign, apply (1.31) with f = f⁺ and f = f⁻. □

Equality (1.30) is also written

ν(dx) = h(x) µ(dx).

Observe that for all C ∈ X,

µ(C) = 0 ⟹ ν(C) = 0. (1.32)

Definition 1.2.3 Let µ and ν be two measures on (X, X) such that (1.32) holds for all C ∈ X. Then ν is said to be absolutely continuous with respect to µ. The measures µ and ν on (X, X) are said to be mutually singular if there exist two sets A, B ∈ X such that X = A ∪ B and ν(A) = µ(B) = 0.

Absolute continuity is denoted ν ≪ µ. Mutual singularity is denoted µ ⊥ ν.

We shall admit without proof the following two fundamental results:

Theorem 1.2.10 Let µ and ν be two measures on (X, X), with µ sigma-finite, such that ν ≪ µ. Then there exists a non-negative function h : (X, X) → (ℝ, B) such that

ν(dx) = h(x) µ(dx). (1.33)

This function h is called the Radon–Nikodym derivative of ν with respect to µ and is denoted dν/dµ.

The function h is easily seen to be µ-essentially unique, in the sense that if h′ is another such function then h = h′ µ-a.e.

From (1.33) we therefore have

∫_X f(x) ν(dx) = ∫_X f(x) h(x) µ(dx)

for all non-negative f : (X, X) → (ℝ, B).
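On a countable space, the Radon–Nikodym derivative is explicit: if µ charges every point, then h(x) = ν({x})/µ({x}). A toy check (all weights made up):

```python
# Discrete Radon–Nikodym derivative and the identity ∫ f dν = ∫ f (dν/dµ) dµ.
mu = {0: 1.0, 1: 0.5, 2: 2.0}           # toy measure on X = {0,1,2}
nu = {0: 2.0, 1: 0.0, 2: 3.0}           # ν ≪ µ since µ charges every point
h = {x: nu[x] / mu[x] for x in mu}      # h = dν/dµ, computed pointwise
f = {0: -1.0, 1: 4.0, 2: 3.0}
lhs = sum(f[x] * nu[x] for x in mu)     # ∫ f dν
rhs = sum(f[x] * h[x] * mu[x] for x in mu)  # ∫ f h dµ
assert abs(lhs - rhs) < 1e-12
```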

We quote without proof Lebesgue's decomposition theorem:

Theorem 1.2.11 Let µ and ν be two sigma-finite measures on (X, X). There exists a unique decomposition

ν = ν1 + ν2, (1.34)

where ν2 is a measure on (X, X) that is singular with respect to µ, and where ν1 ≪ µ.

1.3 The Riesz–Fischer theorem


1.3.1 The Lp spaces

For a given p ≥ 1, Lp(µ) is, roughly speaking (see the details below), the collection of complex-valued Borel functions f defined on X such that ∫_X |f|^p dµ < ∞. We shall see that it is a complete normed vector space over ℂ, that is, a Banach space. Of special interest to Fourier analysis is the case p = 2, since L²(µ) has additional structure that makes it a Hilbert space.

Let (X, X, µ) be a measure space and let f, g : X → ℂ be two Borel functions defined on X. The relation R defined by

(f R g) if and only if (f = g µ-a.e.)

is an equivalence relation, and we shall denote the equivalence class of f by {f}. Note that for any p > 0 (using property (b) of Theorem 1.1.12),

(f R g) ⟹ ∫_X |f|^p dµ = ∫_X |g|^p dµ.

The operations +, ×, *, and multiplication by a scalar α ∈ ℂ are defined on the equivalence classes by

{f} + {g} = {f + g}, {f}{g} = {f g}, {f}* = {f*}, α{f} = {αf}.

The first equality means that {f} + {g} is, by definition, the equivalence class consisting of the functions f + g, where f and g are arbitrary members of {f} and {g}, respectively. A similar interpretation holds for the other equalities.

By definition, for a given p ≥ 1, Lp(µ) is the collection of equivalence classes {f} such that ∫_X |f|^p dµ < ∞. Clearly it is a vector space over ℂ (for the proof, recall that

((|f| + |g|)/2)^p ≤ (1/2)|f|^p + (1/2)|g|^p,

since t → t^p is a convex function when p ≥ 1).

In order to avoid cumbersome notation, in this section, and in general whenever we consider Lp-spaces, we shall write f for {f}. This abuse of notation is harmless since two members of the same equivalence class have the same integral whenever that integral is defined. Therefore, using loose notation,

Lp(µ) = { f : ∫_X |f|^p dµ < ∞ }. (1.35)

The following is a simple and often used observation.

Theorem 1.3.1 Let p and q be positive real numbers such that p > q. If the measure µ on (X, X) is finite, then Lp(µ) ⊆ Lq(µ). In particular, L²(µ) ⊆ L¹(µ).

Proof. From the inequality |a|^q ≤ 1 + |a|^p, true for all a ∈ ℂ, it follows that µ(|f|^q) ≤ µ(1) + µ(|f|^p). Since µ(1) = µ(X) < ∞, µ(|f|^q) < ∞ whenever µ(|f|^p) < ∞. □

1.3.2 Hölder’s and Minkowski’s Inequalities


Theorem 1.3.2 Let p and q be positive real numbers different from 1 such that

1/p + 1/q = 1

(p and q are then said to be conjugate), and let f, g : (X, X) → (ℝ, B) be non-negative. Then we have Hölder's inequality

∫_X f g dµ ≤ [∫_X f^p dµ]^{1/p} [∫_X g^q dµ]^{1/q}. (1.36)

In particular, if f, g ∈ L²(µ), then f g ∈ L¹(µ).

Proof. Let

A = (∫_X f^p dµ)^{1/p},  B = (∫_X g^q dµ)^{1/q}.

We may assume that 0 < A < ∞ and 0 < B < ∞, because otherwise Hölder's inequality is trivially satisfied. Define F = f/A, G = g/B, so that

∫_X F^p dµ = ∫_X G^q dµ = 1.

The inequality

F(x)G(x) ≤ (1/p) F(x)^p + (1/q) G(x)^q (1.37)

is trivially satisfied if x is such that F(x) = 0 or G(x) = 0. If F(x) > 0 and G(x) > 0, define

s(x) = p ln(F(x)), t(x) = q ln(G(x)).

From the convexity of the exponential function and the assumption that 1/p + 1/q = 1,

e^{s(x)/p + t(x)/q} ≤ (1/p) e^{s(x)} + (1/q) e^{t(x)},

and this is precisely the inequality (1.37). Integrating this inequality yields

∫_X F G dµ ≤ 1/p + 1/q = 1,

and this is just (1.36). □
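Hölder's inequality (1.36) can be spot-checked with the counting measure on a finite set, where the integrals are plain sums. The exponent p = 3 and the random non-negative data below are arbitrary choices for the sake of the check.

```python
# Randomized check of Hölder's inequality for the counting measure:
# Σ f g ≤ (Σ f^p)^{1/p} (Σ g^q)^{1/q} with 1/p + 1/q = 1.
import random
random.seed(0)
p = 3.0
q = p / (p - 1.0)                # conjugate exponent
f = [random.random() for _ in range(100)]
g = [random.random() for _ in range(100)]
lhs = sum(x * y for x, y in zip(f, g))
rhs = sum(x ** p for x in f) ** (1 / p) * sum(y ** q for y in g) ** (1 / q)
assert lhs <= rhs + 1e-12
```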

Theorem 1.3.3 Let p ≥ 1 and let f, g : (X, X) → (ℝ, B) be non-negative and such that

∫_X f^p dµ < ∞, ∫_X g^p dµ < ∞.

Then we have Minkowski's inequality

[∫_X (f + g)^p dµ]^{1/p} ≤ [∫_X f^p dµ]^{1/p} + [∫_X g^p dµ]^{1/p}. (1.38)

Proof. For p = 1 the inequality (in fact an equality) is obvious. Therefore, assume p > 1. From Hölder's inequality,

∫_X f (f + g)^{p−1} dµ ≤ [∫_X f^p dµ]^{1/p} [∫_X (f + g)^{(p−1)q} dµ]^{1/q}

and

∫_X g (f + g)^{p−1} dµ ≤ [∫_X g^p dµ]^{1/p} [∫_X (f + g)^{(p−1)q} dµ]^{1/q}.

Adding together the above two inequalities and observing that (p − 1)q = p, we obtain

∫_X (f + g)^p dµ ≤ [∫_X (f + g)^p dµ]^{1/q} ([∫_X f^p dµ]^{1/p} + [∫_X g^p dµ]^{1/p}).

One may assume that the right-hand side of (1.38) is finite and that the left-hand side is positive (otherwise the inequality is trivial). Therefore ∫_X (f + g)^p dµ ∈ (0, ∞), and we may divide both sides of the last display by [∫_X (f + g)^p dµ]^{1/q}. Observing that 1 − 1/q = 1/p yields the desired inequality (1.38).

(As for the last assertion of Theorem 1.3.2, that f, g ∈ L²(µ) implies f g ∈ L¹(µ): it is the case p = q = 2 of Hölder's inequality.) □
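Minkowski's inequality (1.38), the triangle inequality for ν_p, can likewise be spot-checked for the counting measure on a finite set, here with random data and several exponents p ≥ 1:

```python
# Randomized check of Minkowski's inequality (triangle inequality for ν_p).
import random
random.seed(1)

def norm_p(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

for p in (1.0, 1.5, 2.0, 4.0):
    f = [random.uniform(-1, 1) for _ in range(50)]
    g = [random.uniform(-1, 1) for _ in range(50)]
    fg = [x + y for x, y in zip(f, g)]
    assert norm_p(fg, p) <= norm_p(f, p) + norm_p(g, p) + 1e-12
```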

Theorem 1.3.4 Let p ≥ 1. The mapping νp : Lp(µ) → [0, ∞) defined by

νp(f) = (∫_X |f|^p dµ)^{1/p} (1.39)

defines a norm on Lp(µ).

Proof. Clearly, νp(αf) = |α| νp(f) for all α ∈ ℂ, f ∈ Lp(µ). Also, νp(f) = 0 if and only if (∫_X |f|^p dµ)^{1/p} = 0, which is in turn equivalent to f = 0 µ-a.e. Finally, νp(f + g) ≤ νp(f) + νp(g) for all f, g ∈ Lp(µ), by Minkowski's inequality. Therefore νp is a norm. □

1.3.3 Completeness of Lp

We shall denote νp(f) by ‖f‖p. Thus Lp(µ) is a normed vector space over ℂ, with the norm ‖·‖p and the induced distance

dp(f, g) = ‖f − g‖p.

Theorem 1.3.5 Let p ≥ 1. The distance dp makes Lp(µ) a complete normed vector space.

In other words, Lp(µ) is a Banach space for the norm ‖·‖p.


Proof. To show completeness one must prove that for any sequence {fn }n≥1 of Lp (µ) that is a
Cauchy sequence (that is such that limm,n↑∞ dp (fn , fm ) = 0), there exists f ∈ Lp (µ) such that
limn↑∞ dp (fn , f ) = 0.
Since {fn}n≥1 is a Cauchy sequence, one can select a subsequence {fni}i≥1 such that

dp(fni+1, fni) ≤ 2^{−i}. (1.40)

Let

gk = Σ_{i=1}^{k} |fni+1 − fni|,  g = Σ_{i=1}^{∞} |fni+1 − fni|.

By (1.40) and Minkowski's inequality we have ‖gk‖p ≤ 1. Fatou's lemma applied to the sequence {gk^p}k≥1 gives ‖g‖p ≤ 1. In particular, any member of the equivalence class of g is finite µ-almost everywhere, and therefore

fn1(x) + Σ_{i=1}^{∞} (fni+1(x) − fni(x))

converges absolutely for µ-almost all x. Call the corresponding limit f(x) (set f(x) = 0 when this limit does not exist). Since

fn1 + Σ_{i=1}^{k−1} (fni+1 − fni) = fnk,

we see that

f = lim_{k↑∞} fnk µ-a.e.
p
One must show that f is the limit in Lp(µ) of {fnk}k≥1. Let ε > 0. There exists an integer N(ε) such that ‖fn − fm‖p ≤ ε whenever m, n ≥ N(ε). For all m ≥ N(ε), by Fatou's lemma we have

∫_X |f − fm|^p dµ ≤ lim inf_{i→∞} ∫_X |fni − fm|^p dµ ≤ ε^p.

Therefore f − fm ∈ Lp(µ), and consequently f ∈ Lp(µ). It also follows from the last inequality that

lim_{m→∞} ‖f − fm‖p = 0.  □

In the proof of Theorem 1.3.5 we have obtained the following result.


Theorem 1.3.6 Let {fn}n≥1 be a convergent sequence in Lp(µ), where p ≥ 1, and let f be the limit. Then there exists a subsequence {fni}i≥1 such that

lim_{i↑∞} fni = f µ-a.e. (1.41)

Note that the statement in (1.41) is about functions, not about equivalence classes: the functions in question are arbitrary members of the corresponding equivalence classes. In particular, since a sequence of functions that converges µ-a.e. to two functions forces those two functions to be equal µ-a.e., we obtain:

Theorem 1.3.7 If {fn}n≥1 converges both to f in Lp(µ) and to g µ-a.e., then f = g µ-a.e.

Example 3.1: The space ℓp(ℤ). When the measure µ is the counting measure on ℤ, we use the notation ℓp(ℤ) (or, if the context permits, ℓp) for Lp(µ). Therefore

ℓp(ℤ) := { a ∈ ℂ^ℤ ; Σ_{n∈ℤ} |an|^p < ∞ }.

1.4 Exercises
Exercise 4.1.
Prove Corollary 1.1.1

Exercise 4.2.
Prove Corollary 1.1.2

Exercise 4.3.
Prove Theorem 1.1.3

Exercise 4.4.
Let (X, X) be some measurable space, and let fn : (X, X) → (ℝ, B), n ≥ 1, be a sequence of functions that is pointwise nondecreasing, that is, for all x ∈ X, the sequence of real numbers {fn(x)}n≥1 is nondecreasing; in particular, it admits a (possibly infinite) limit f(x). Show that the function f : (X, X) → (ℝ, B) is measurable.

Exercise 4.5.
Let ψ be a function in L²_ℝ(ℝ) with the Fourier transform ψ̂ = (1/2π) 1_I, where I = [−2π, −π] ∪ [π, 2π]. Show that {ψj,n}j∈ℤ,n∈ℤ is a Hilbert basis of L²_ℝ(ℝ), where ψj,n(x) = 2^{j/2} ψ(2^j x − n).

Exercise 4.6.
Let {gj}j≥0 be a Hilbert basis of L²((0, 1]). Show that {gj(· − n) 1_{(n,n+1]}(·)}j≥0,n∈ℤ is a Hilbert basis of L²(ℝ). (Here, L²(I) denotes the Hilbert space of (equivalence classes of) measurable complex-valued functions defined on I, with the Hermitian product ⟨f, g⟩ = ∫_I f(t) g(t)* dt.)

1.5 Solutions

Solution (Exercise 4.1).


By definition of continuity, the inverse image of an open set of E is an open set of X and is therefore in
B(X). Since the open sets of E generate B(E), the function f is measurable with respect to B(X) and
B(E), by Theorem 1.1.2.

Solution (Exercise 4.2).


By Theorem 1.1.2 and Theorem 1.1.1, it suffices to show that for all a = (a1, . . . , an) ∈ ℚⁿ, {f ≤ a} ∈ X. This is indeed the case since

{f ≤ a} = ∩_{i=1}^{n} {fi ≤ ai},

and is therefore in X , being the intersection of sets in X .

Solution (Exercise 4.3).


Let f = g ◦ φ. For all C ∈ E,

f −1 (C) = g −1 (φ−1 (C)) = g −1 (D) ∈ X .

Indeed, D = φ−1 (C) is a set in Y since φ ∈ E/Y, and therefore g −1 (D) ∈ X since g ∈ Y/X .

Solution (Exercise 4.4).


By Theorem 1.1.2 it suffices to show that for all a ∈ ℝ, {f ≤ a} ∈ X. But since the sequence {fn}n≥1 is nondecreasing, {f ≤ a} = ∩_{n≥1} {fn ≤ a}, which is indeed in X, being a countable intersection of sets in X.
Chapter 2

Hilbert spaces

2.1 Basic definitions and properties


2.1.1 Inner product and Schwarz’s inequality
Let H be a vector space with scalar field K = ℝ or ℂ, endowed with a mapping (x, y) ∈ H × H → ⟨x, y⟩ ∈ K such that for all x, y, z ∈ H and all λ ∈ K,

1. ⟨y, x⟩ = ⟨x, y⟩*
2. ⟨λy, x⟩ = λ⟨y, x⟩
3. ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩
4. ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.

Then H is called a pre-Hilbert space over K.

The quantity ⟨x, y⟩ is called the inner product of x and y. For any x ∈ H, denote

‖x‖² = ⟨x, x⟩.

Elementary computations yield

‖x + y‖² = ‖x‖² + ‖y‖² + 2 Re{⟨x, y⟩}

for any x, y ∈ H. The parallelogram identity

‖x‖² + ‖y‖² = (1/2)(‖x + y‖² + ‖x − y‖²) (2.1)

is obtained by expanding the right-hand side of (2.1) using the previous equality.
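Both the expansion of ‖x + y‖² and the parallelogram identity (2.1) are easy to verify numerically in the pre-Hilbert space ℂ³ with the usual inner product; the random vectors below are arbitrary.

```python
# Check of ‖x+y‖² = ‖x‖² + ‖y‖² + 2Re⟨x,y⟩ and the parallelogram
# identity (2.1) in ℂ³ with ⟨x,y⟩ = Σ x_k conj(y_k).
import random
random.seed(2)

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def nrm2(x):                     # squared norm ‖x‖² = ⟨x,x⟩ (real)
    return inner(x, x).real

x = [complex(random.random(), random.random()) for _ in range(3)]
y = [complex(random.random(), random.random()) for _ in range(3)]
s = [a + b for a, b in zip(x, y)]
d = [a - b for a, b in zip(x, y)]
assert abs(nrm2(s) - (nrm2(x) + nrm2(y) + 2 * inner(x, y).real)) < 1e-12
assert abs(nrm2(x) + nrm2(y) - 0.5 * (nrm2(s) + nrm2(d))) < 1e-12
```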

The next theorem gives the ubiquitous Schwarz inequality.

Theorem 2.1.1 For all x, y ∈ H,

|⟨x, y⟩| ≤ ‖x‖ ‖y‖. (2.2)

Equality occurs if and only if x and y are colinear.


Proof. We do the proof for the case K = ℂ. We may assume that ⟨x, y⟩ ≠ 0, otherwise the result is trivial. For all λ ∈ ℝ,

‖x‖² + 2λ|⟨x, y⟩|² + λ²|⟨x, y⟩|² ‖y‖² = ‖x + λ⟨x, y⟩ y‖² ≥ 0.

This second-degree polynomial in λ ∈ ℝ therefore cannot have two distinct real roots, and this implies a non-positive discriminant:

|⟨x, y⟩|⁴ − ‖x‖² |⟨x, y⟩|² ‖y‖² ≤ 0,

and therefore the inequality (2.2) holds. Equality corresponds to a null discriminant, which in turn implies a double root λ of the polynomial. For such a root, ‖x + λ⟨x, y⟩ y‖² = 0, which implies x + λ⟨x, y⟩ y = 0. □
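A numerical illustration of Schwarz's inequality in ℂ⁵, including the equality case for colinear vectors (the random data and the factor 2 − i are arbitrary):

```python
# Schwarz's inequality |⟨x,y⟩| ≤ ‖x‖‖y‖, with equality for colinear vectors.
import math, random
random.seed(3)

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

x = [complex(random.random(), random.random()) for _ in range(5)]
y = [complex(random.random(), random.random()) for _ in range(5)]
nx = math.sqrt(inner(x, x).real)
ny = math.sqrt(inner(y, y).real)
assert abs(inner(x, y)) <= nx * ny + 1e-12      # strict inequality here

z = [(2 - 1j) * a for a in x]                   # z is colinear with x
nz = math.sqrt(inner(z, z).real)
assert abs(abs(inner(x, z)) - nx * nz) < 1e-9   # equality case
```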

Theorem 2.1.2 The mapping x → ‖x‖ is a norm on H, that is to say, for all x, y ∈ H and all α ∈ ℂ,

(a) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0;

(b) ‖αx‖ = |α| ‖x‖;

(c) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).

Proof. The proof of (a) and (b) is immediate. For (c), write

‖x + y‖² = ‖x‖² + ‖y‖² + ⟨x, y⟩ + ⟨y, x⟩

and

(‖x‖ + ‖y‖)² = ‖x‖² + ‖y‖² + 2‖x‖ ‖y‖.

It therefore suffices to prove

⟨x, y⟩ + ⟨y, x⟩ ≤ 2 ‖x‖ ‖y‖,

and this follows from Schwarz's inequality. □

The norm ‖·‖ induces a distance d(·, ·) on H by

d(x, y) = ‖x − y‖.

Recall that a mapping d : E × E → ℝ₊ is called a distance on E if, for all x, y, z ∈ E,

(a′) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;

(b′) d(x, y) = d(y, x);

(c′) d(x, y) ≤ d(x, z) + d(z, y).

The above properties are immediate consequences of (a), (b), and (c) of Theorem 2.1.2. When endowed with a distance, a space H is called a metric space.

Definition 2.1.1 The pre-Hilbert space H is called a Hilbert space if the distance d makes it a complete metric space.

By complete, the following is meant: If {xn }n≥1 is a Cauchy sequence in H, that is, if

lim d(xm , xn ) = 0,
m,n↑∞

then there exists x ∈ H such that


lim d(xn , x) = 0.
n↑∞

2.1.2 Continuity of the inner product

Theorem 2.1.3 Let {xn}n≥1 and {yn}n≥1 be sequences in a Hilbert space H that converge to x and y, respectively. Then

lim_{m,n↑∞} ⟨xn, ym⟩ = ⟨x, y⟩.

Proof. We have

|⟨x + h1, y + h2⟩ − ⟨x, y⟩| = |⟨x, h2⟩ + ⟨h1, y⟩ + ⟨h1, h2⟩|.

By Schwarz's inequality, |⟨x, h2⟩| ≤ ‖x‖ ‖h2‖, |⟨h1, y⟩| ≤ ‖y‖ ‖h1‖, and |⟨h1, h2⟩| ≤ ‖h1‖ ‖h2‖. Therefore

lim_{‖h1‖,‖h2‖↓0} |⟨x + h1, y + h2⟩ − ⟨x, y⟩| = 0.  □

In other words, the inner product of a Hilbert space is bicontinuous. In particular, the norm x → ‖x‖ is a continuous function from H to ℝ₊.

2.1.3 The L² spaces

Let µ be a sigma-finite measure on a measurable space (X, X). Of special interest is the space L²(µ) of measurable functions f : X → ℂ such that

∫_X |f(x)|² µ(dx) < ∞,

where two functions f and f′ such that f(x) = f′(x) µ-a.e. are not distinguished. We have, by the Riesz–Fischer theorem:

Theorem 2.1.4 L²(µ) is a vector space with scalar field ℂ, and when endowed with the inner product

⟨f, g⟩ = ∫_X f(x) g(x)* µ(dx) (2.3)

it is a Hilbert space.

The norm of a function f ∈ L²(µ) is

‖f‖ = (∫_X |f(x)|² µ(dx))^{1/2}

and the distance between two functions f and g in L²(µ) is

d(f, g) = (∫_X |f(x) − g(x)|² µ(dx))^{1/2}.

The completeness property of L²(µ) reads, in this case, as follows. If {fn}n≥1 is a sequence of functions in L²(µ) such that

lim_{m,n↑∞} ∫_X |fn(x) − fm(x)|² µ(dx) = 0,

then there exists a function f ∈ L²(µ) such that

lim_{n↑∞} ∫_X |fn(x) − f(x)|² µ(dx) = 0.

In L²(µ), Schwarz's inequality reads:

|∫_X f(x) g(x)* µ(dx)| ≤ (∫_X |f(x)|² µ(dx))^{1/2} (∫_X |g(x)|² µ(dx))^{1/2}.

The following are particular cases.

Example 1.1: The Hilbert space ℓ²(ℤ). The space ℓ²(ℤ) of complex sequences a = {an}n∈ℤ such that

Σ_{n∈ℤ} |an|² < ∞,

with the inner product

⟨a, b⟩ = Σ_{n∈ℤ} an bn*,

is a Hilbert space. This is indeed a particular case of a Hilbert space L²(µ), with X = ℤ and µ the counting measure. In this example, Schwarz's inequality reads

|Σ_{n∈ℤ} an bn*| ≤ (Σ_{n∈ℤ} |an|²)^{1/2} × (Σ_{n∈ℤ} |bn|²)^{1/2}.

2.2 Projections and isometries


2.2.1 The projection principle
A subset G is said to be closed in H if every convergent sequence of G has a limit in G.

Theorem 2.2.1 Let G ⊂ H be a vector subspace of the Hilbert space H. Endow G with the Hermitian
product which is the restriction to G of the Hermitian product on H. Then, G is a Hilbert space if and
only if G is closed in H. (G is then called a Hilbert subspace of H.)

Proof. Assume that G is closed. Let {xn }n∈N be a Cauchy sequence in G. It is a fortiori a Cauchy
sequence in H, and therefore it converges in H to some x, and this x must be in G, because it is a limit
of elements of G and G is closed.
Assume that G is a Hilbert space with the hermitian product induced by the hermitian product of
H. In particular every convergent sequence {xn }n∈N of elements of G converges to some element of G.
Therefore G is closed. 

Definition 2.2.1 Two elements x, y ∈ H are said to be orthogonal if ⟨x, y⟩ = 0. Let G be a Hilbert subspace of the Hilbert space H. The orthogonal complement of G in H, denoted G⊥, is defined by

G⊥ = {z ∈ H : ⟨z, x⟩ = 0 for all x ∈ G}. (2.4)

Let x1, . . . , xn ∈ H be pairwise orthogonal. We have Pythagoras' theorem:

‖Σ_{i=1}^{n} xi‖² = Σ_{i=1}^{n} ‖xi‖². (2.5)

Clearly, G⊥ is a vector space over ℂ. Moreover, it is closed in H since if {zn}n≥1 is a sequence of elements of G⊥ converging to z ∈ H then, by continuity of the Hermitian product,

0 = lim_{n↑∞} ⟨zn, x⟩ = ⟨z, x⟩ for all x ∈ G.

Therefore G⊥ is a Hilbert subspace of H.

Observe that a decomposition x = y + z, where y ∈ G and z ∈ G⊥, is necessarily unique. Indeed, let x = y′ + z′ be another such decomposition. Then, letting a = y − y′ and b = z − z′, we have that 0 = a + b, where a ∈ G and b ∈ G⊥. Therefore, in particular, 0 = ⟨a, a⟩ + ⟨a, b⟩ = ⟨a, a⟩, which implies that a = 0. Similarly, b = 0.

We can now state and prove the projection theorem.

Theorem 2.2.2 Let x ∈ H. There exists a unique element y ∈ G such that x − y ∈ G⊥. Moreover,

‖y − x‖ = inf_{u∈G} ‖u − x‖. (2.6)

Proof. Let d(x, G) = inf_{z∈G} d(x, z) and let {yn}n≥1 be a sequence in G such that

d(x, G)² ≤ d(x, yn)² ≤ d(x, G)² + 1/n. (2.7)

The parallelogram identity gives, for all m, n ≥ 1,

‖yn − ym‖² = 2(‖x − yn‖² + ‖x − ym‖²) − 4‖x − (1/2)(ym + yn)‖².

Since (1/2)(yn + ym) ∈ G,

‖x − (1/2)(ym + yn)‖² ≥ d(x, G)²,

and therefore

‖yn − ym‖² ≤ 2(1/n + 1/m).

The sequence {yn}n≥1 is therefore a Cauchy sequence in G, and consequently it converges to some y ∈ G since G is closed. Passing to the limit in (2.7) gives (2.6).

Uniqueness of y satisfying (2.6): Let y′ ∈ G be another such element. Then

‖x − y′‖ = ‖x − y‖ = d(x, G),

and from the parallelogram identity,

‖y − y′‖² = 2‖y − x‖² + 2‖y′ − x‖² − 4‖x − (1/2)(y + y′)‖²
          = 4 d(x, G)² − 4‖x − (1/2)(y + y′)‖².

Since (1/2)(y + y′) ∈ G,

‖x − (1/2)(y + y′)‖² ≥ d(x, G)²,

and therefore

‖y − y′‖² ≤ 0,

which implies ‖y − y′‖² = 0, and therefore y = y′.

It now remains to show that x − y is orthogonal to G, that is,

⟨x − y, z⟩ = 0 for all z ∈ G.

This is trivially true if z = 0, and we shall therefore assume z ≠ 0. Because y + λz ∈ G for all λ ∈ ℝ,

‖x − (y + λz)‖² ≥ d(x, G)²,

that is,

‖x − y‖² − 2λ Re{⟨x − y, z⟩} + λ² ‖z‖² ≥ d(x, G)².

Since

‖x − y‖² = d(x, G)²,

we have

−2λ Re{⟨x − y, z⟩} + λ² ‖z‖² ≥ 0 for all λ ∈ ℝ,

which implies Re{⟨x − y, z⟩} = 0. The same type of calculation with λ ∈ iℝ (pure imaginary) leads to

Im{⟨x − y, z⟩} = 0.

Therefore

⟨x − y, z⟩ = 0.

That y is the unique element of G such that y − x ∈ G⊥ follows from the observation made just before the statement of Theorem 2.2.2. □

Definition 2.2.2 The element y in Theorem 2.2.2 is called the orthogonal projection of x on G and is
denoted PG (x).

The projection theorem states, in particular, that for any x ∈ H there is a unique decomposition

x = y + z, y ∈ G, z ∈ G⊥, (2.8)

and that y = PG(x), the (unique) element of G closest to x. Therefore:

Theorem 2.2.3 The orthogonal projection y = PG(x) is characterized by the two following properties:

(1) y ∈ G;

(2) ⟨y − x, z⟩ = 0 for all z ∈ G.

This characterization is called the projection principle and is useful in determining projections.
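In finite dimensions the projection principle gives a concrete recipe: to project x onto G = sp{u, v}, impose ⟨x − (αu + βv), u⟩ = ⟨x − (αu + βv), v⟩ = 0 and solve the resulting normal equations. A sketch in ℝ³ with made-up vectors:

```python
# Projection onto G = span{u, v} in ℝ³ via the projection principle.
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

u, v = [1.0, 0.0, 1.0], [0.0, 2.0, 1.0]
x = [3.0, 1.0, -2.0]

# Normal equations: [⟨u,u⟩ ⟨v,u⟩; ⟨u,v⟩ ⟨v,v⟩] (α,β)ᵀ = (⟨x,u⟩, ⟨x,v⟩)ᵀ
a11, a12, a21, a22 = dot(u, u), dot(v, u), dot(u, v), dot(v, v)
b1, b2 = dot(x, u), dot(x, v)
det = a11 * a22 - a12 * a21
alpha = (b1 * a22 - a12 * b2) / det      # Cramer's rule for the 2×2 system
beta = (a11 * b2 - b1 * a21) / det
y = [alpha * p + beta * q for p, q in zip(u, v)]   # y = P_G(x)

# Projection principle: the residual x − y must be orthogonal to u and v.
residual = [p - q for p, q in zip(x, y)]
assert abs(dot(residual, u)) < 1e-12 and abs(dot(residual, v)) < 1e-12
```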

Let C be a collection of vectors in the Hilbert space H. The linear span of C, denoted sp(C), is, by definition, the set of all finite linear combinations of vectors of C. This is a vector space. The closure of this vector space is called the Hilbert subspace generated by C. By definition, x belongs to this subspace if and only if there exists a sequence of vectors {xn}n≥1 such that

(i) for all n ≥ 1, xn is a finite linear combination of vectors of C;

(ii) limn↑∞ xn = x.

Theorem 2.2.4 Let x̂ ∈ H. It is the projection of x onto G, the Hilbert subspace generated by C, if and only if

(α) x̂ ∈ G;

(β) ⟨x − x̂, z⟩ = 0 for all z ∈ C.

Note that requirement (β) has to be satisfied not for all z ∈ G, but only for all z ∈ C.

Proof. We have to show that ⟨x − x̂, z⟩ = 0 for all z ∈ G. But z = limn↑∞ zn, where {zn}n≥1 is a sequence of vectors of sp(C). By (β) and linearity, ⟨x − x̂, zn⟩ = 0 for all n ≥ 1. Therefore, by continuity of the inner product,

⟨x − x̂, z⟩ = limn↑∞ ⟨x − x̂, zn⟩ = 0.  □


2.2.2 Hilbert space isometries

Definition 2.2.3 Let H and K be two Hilbert spaces with Hermitian products denoted ⟨·, ·⟩H and ⟨·, ·⟩K, respectively, and let φ : H → K be a linear mapping such that for all x, y ∈ H,

⟨φ(x), φ(y)⟩K = ⟨x, y⟩H. (2.9)

Then φ is called a linear isometry from H into K. If, moreover, φ is from H onto K, then H and K are said to be isomorphic.

Note that a linear isometry is necessarily injective, since φ(x) = φ(y) implies φ(x − y) = 0, and therefore

0 = ‖φ(x − y)‖K = ‖x − y‖H,

and this implies x = y. In particular, if the linear isometry is onto, it is necessarily bijective.

Recall that a subset A ⊆ E, where (E, d) is a metric space, is said to be dense in E if for all x ∈ E there exists a sequence {xn}n≥1 in A converging to x. The following result is a useful tool called the isometry extension theorem.

Theorem 2.2.5 Let H and K be two Hilbert spaces with Hermitian products ⟨·, ·⟩H and ⟨·, ·⟩K, respectively. Let V be a vector subspace of H that is dense in H, and let φ : V → K be a linear isometry from V to K (φ is linear and (2.9) holds for all x, y ∈ V). Then there exists a unique linear isometry φ̃ : H → K whose restriction to V is φ.

Proof. We shall first define φ̃(x) for x ∈ H. Since V is dense in H, there exists a sequence {xn}n≥1 in V converging to x. Since φ is isometric,

‖φ(xn) − φ(xm)‖K = ‖xn − xm‖H for all m, n ≥ 1.

In particular, {φ(xn)}n≥1 is a Cauchy sequence in K, and therefore it converges to some element of K, which we denote φ̃(x).

The definition of φ̃(x) is independent of the sequence {xn}n≥1 converging to x. Indeed, for another such sequence {yn}n≥1,

lim_{n↑∞} ‖φ(xn) − φ(yn)‖K = lim_{n↑∞} ‖xn − yn‖H = 0.

The mapping φ̃ : H → K so constructed is clearly an extension of φ (for x ∈ V one can take as the approximating sequence of x the sequence {xn}n≥1 with xn ≡ x).
The mapping φ̃ is linear. Indeed, let x, y ∈ H, α, β ∈ ℂ, and let {xn}n≥1 and {yn}n≥1 be two sequences in V converging to x and y, respectively. Then {αxn + βyn}n≥1 converges to αx + βy. Therefore

lim_{n↑∞} φ(αxn + βyn) = φ̃(αx + βy).

But

φ(αxn + βyn) = αφ(xn) + βφ(yn) → αφ̃(x) + βφ̃(y).

Therefore φ̃(αx + βy) = αφ̃(x) + βφ̃(y).

The mapping φ̃ is isometric since, in view of the bicontinuity of the Hermitian product and of the isometricity of φ,

⟨φ̃(x), φ̃(y)⟩K = lim_{n↑∞} ⟨φ(xn), φ(yn)⟩K = lim_{n↑∞} ⟨xn, yn⟩H = ⟨x, y⟩H,

where {xn}n≥1 and {yn}n≥1 are two sequences in V converging to x and y, respectively. □

2.3 Orthonormal Expansions


2.3.1 Orthonormal Systems and Bessel’s inequality
Definition 2.3.1 The sequence {en}n≥0 in a Hilbert space H is called an orthonormal system of H if it satisfies the two following conditions:

(α) ⟨en, ek⟩ = 0 for all n ≠ k; and

(β) ‖en‖ = 1 for all n ≥ 0.

An orthonormal system {en}n≥0 is free, in the sense that an arbitrary finite subset of it is linearly independent. Indeed, taking (e1, . . . , ek) for example, the relation

Σ_{i=1}^{k} αi ei = 0

implies that

αℓ = ⟨Σ_{i=1}^{k} αi ei, eℓ⟩ = 0, 1 ≤ ℓ ≤ k.

The following theorem gives the preliminary results that we shall need for the proof of the Hilbert basis theorem.

Theorem 2.3.1 Let {en}n≥0 be an orthonormal system of H and let G be the Hilbert subspace of H generated by {en}n≥0. Then:

(a) For an arbitrary sequence {αn}n≥0 of complex numbers, the series Σ_{n≥0} αn en is convergent in H if and only if {αn}n≥0 ∈ ℓ², in which case

‖Σ_{n≥0} αn en‖² = Σ_{n≥0} |αn|². (2.10)

(b) For all x ∈ H, Bessel's inequality holds:

Σ_{n≥0} |⟨x, en⟩|² ≤ ‖x‖². (2.11)

(c) For all x ∈ H, the series Σ_{n≥0} ⟨x, en⟩ en converges, and

Σ_{n≥0} ⟨x, en⟩ en = PG(x), (2.12)

where PG is the projection on G.

(d) For all x, y ∈ H, the series Σ_{n≥0} ⟨x, en⟩⟨y, en⟩* is absolutely convergent, and

Σ_{n≥0} ⟨x, en⟩⟨y, en⟩* = ⟨PG(x), PG(y)⟩. (2.13)

Proof. (a) From Pythagoras' theorem we have

‖Σ_{j=m+1}^{n} αj ej‖² = Σ_{j=m+1}^{n} |αj|²,

and therefore {Σ_{j=0}^{n} αj ej}n≥0 is a Cauchy sequence in H if and only if {Σ_{j=0}^{n} |αj|²}n≥0 is a Cauchy sequence in ℝ. In other words, Σ_{n≥0} αn en converges if and only if Σ_{n≥0} |αn|² < ∞. In this case equality (2.10) follows from the continuity of the norm, by letting n tend to ∞ in the last display.

(b) By the projection theorem, x − PGn(x) is orthogonal to PGn(x), so that ‖x‖² = ‖PGn(x)‖² + ‖x − PGn(x)‖² ≥ ‖PGn(x)‖², where Gn is the Hilbert subspace spanned by {e0, . . . , en}. But

PGn(x) = Σ_{i=0}^{n} ⟨x, ei⟩ ei,

and by Pythagoras' theorem,

‖PGn(x)‖² = Σ_{i=0}^{n} |⟨x, ei⟩|².

Therefore

‖x‖² ≥ Σ_{i=0}^{n} |⟨x, ei⟩|²,

from which Bessel's inequality follows on letting n → ∞.

(c) From (2.11) and the result (a), it follows that the series Σ_{n≥0} < x, e_n > e_n converges. For any m ≥ 0 and for all N ≥ m,

    < x − Σ_{n=0}^{N} < x, e_n > e_n , e_m > = 0,

and therefore, by continuity of the Hermitian product,

    < x − Σ_{n≥0} < x, e_n > e_n , e_m > = 0   for all m ≥ 0.

This implies that x − Σ_{n≥0} < x, e_n > e_n is orthogonal to G. Also Σ_{n≥0} < x, e_n > e_n ∈ G. Therefore, by the projection principle,

    P_G (x) = Σ_{n≥0} < x, e_n > e_n .

(d) By Schwarz's inequality in ℓ², for all N ≥ 0,

    ( Σ_{n=0}^{N} | < x, e_n >< y, e_n >* | )² ≤ ( Σ_{n=0}^{N} | < x, e_n > |² ) ( Σ_{n=0}^{N} | < y, e_n > |² ) ≤ ‖x‖² ‖y‖² .

Therefore the series Σ_{n=0}^{∞} < x, e_n >< y, e_n >* is absolutely convergent. Also, by an elementary computation,

    < Σ_{n=0}^{N} < x, e_n > e_n , Σ_{n=0}^{N} < y, e_n > e_n > = Σ_{n=0}^{N} < x, e_n >< y, e_n >* .

Letting N → ∞ we obtain (2.13) (using (2.12) and the continuity of the Hermitian product). □
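As a quick numerical sanity check (not part of the original argument), one can verify (2.11)–(2.13) in the finite-dimensional Hilbert space ℂ⁴, taking the first two standard basis vectors as the orthonormal system; the NumPy sketch below assumes the Hermitian product < u, v > = Σ_k u_k v_k*.

```python
import numpy as np

# Sketch: check Bessel's inequality (2.11), the projection formula (2.12),
# and identity (2.13) in H = C^4 with e_0, e_1 the first two basis vectors.
rng = np.random.default_rng(0)
e = np.eye(4)[:2]                       # rows e_0, e_1: an orthonormal system
x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = rng.normal(size=4) + 1j * rng.normal(size=4)

coef_x = e.conj() @ x                   # <x, e_n> (e_n real: conj is a no-op)
coef_y = e.conj() @ y

# Bessel's inequality (2.11)
assert np.sum(np.abs(coef_x) ** 2) <= np.linalg.norm(x) ** 2 + 1e-12

# Projection (2.12): P_G(x) = sum <x,e_n> e_n; x - P_G(x) is orthogonal to G
PGx = coef_x @ e
PGy = coef_y @ e
assert np.allclose(e.conj() @ (x - PGx), 0)

# Identity (2.13): sum <x,e_n><y,e_n>* = <P_G(x), P_G(y)>
lhs = np.sum(coef_x * coef_y.conj())
rhs = np.sum(PGx * PGy.conj())
assert np.allclose(lhs, rhs)
```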

2.3.2 Complete orthonormal systems


Definition 2.3.2 The sequence {w_n}_{n≥0} of vectors of H is said to be total in H if it generates H.

In other words, the finite linear combinations of the elements of {w_n}_{n≥0} form a dense subset of H. We are now ready for the fundamental result, the Hilbert basis theorem.

Theorem 2.3.2 Let {e_n}_{n≥0} be an orthonormal system of H. The following properties are equivalent:

(a) {e_n}_{n≥0} is total in H;

(b) For all x ∈ H, the Plancherel–Parseval identity holds true:

    ‖x‖² = Σ_{n≥0} | < x, e_n > |² ;    (2.14)

(c) For all x ∈ H,

    x = Σ_{n≥0} < x, e_n > e_n .    (2.15)

Proof. (a)⇒(c) According to (c) of Theorem 2.3.1,

    Σ_{n≥0} < x, e_n > e_n = P_G (x),

where G is the Hilbert subspace generated by {e_n}_{n≥0}. Since {e_n}_{n≥0} is total, it follows by (??) that G⊥ = {0}, and therefore P_G (x) = x.

(c)⇒(b) This follows from (a) of Theorem 2.3.1.

(b)⇒(a) From (2.10) and (2.12),

    Σ_{n≥0} | < x, e_n > |² = ‖P_G (x)‖² ,

and therefore (2.14) implies

    ‖x‖² = ‖P_G (x)‖² .

From Pythagoras' theorem,

    ‖x‖² = ‖P_G (x) + x − P_G (x)‖² = ‖P_G (x)‖² + ‖x − P_G (x)‖² = ‖x‖² + ‖x − P_G (x)‖² ,

and therefore ‖x − P_G (x)‖² = 0, which implies x = P_G (x). Since this is true for all x ∈ H we must have G = H, i.e. {e_n}_{n≥0} is total in H. □
A sequence {e_n}_{n≥0} satisfying one (and then all) of the conditions of Theorem 2.3.2 is called a (denumerable) Hilbert basis of H.

Definition 2.3.3 Two sequences {e_n}_{n≥0} and {d_n}_{n≥0} of a Hilbert space H form a biorthonormal system if:

(α) < e_n , d_k > = 0 for all n ≠ k;

(β) < e_n , d_n > = 1 for all n ≥ 0.

This system is called complete if, in addition, each of the sequences {e_n}_{n≥0} and {d_n}_{n≥0} forms a total subset of H.

Then we have the biorthonormal expansions

    x = Σ_{n≥0} < x, e_n > d_n ,    x = Σ_{n≥0} < x, d_n > e_n ,

whenever these series converge. Indeed, with the first series for example, calling its sum y, we have for any integer m ≥ 0

    < y, e_m > = < Σ_{n≥0} < x, e_n > d_n , e_m > = Σ_{n≥0} < x, e_n >< d_n , e_m > = < x, e_m > .

Therefore

    < x − y, e_m > = 0   for all m ≥ 0.

Since {e_n}_{n≥0} is total in H this implies x − y = 0.

An interesting theoretical question is: for what type of Hilbert spaces is there a denumerable Hilbert
basis? Here is a first (theoretical) answer.

Definition 2.3.4 A Hilbert space H is called a separable Hilbert space if it contains a sequence {f_n}_{n≥0} that is dense in H.

Theorem 2.3.3 A separable Hilbert space admits at least one denumerable Hilbert basis.

Proof. We recall the Gram–Schmidt orthonormalization procedure:

Let {f_n}_{n≥0} be a sequence of vectors of a Hilbert space H. Construct {e_n}_{n≥0} as follows:

• Set p(0) = 0 and e_0 = f_0 /‖f_0‖ (assuming f_0 ≠ 0 without loss of generality);

• e_0 , . . . , e_n and p(n) being defined, let p(n + 1) be the first index p > p(n) such that f_p is independent of e_0 , . . . , e_n , and define, with p = p(n + 1),

    e_{n+1} = ( f_p − Σ_{i=0}^{n} < f_p , e_i > e_i ) / ‖ f_p − Σ_{i=0}^{n} < f_p , e_i > e_i ‖ .

Then (exercise) {e_n}_{n≥0} is an orthonormal system.

Let now {f_n}_{n≥0} be a sequence as in Definition 2.3.4. Construct from it the orthonormal sequence {e_n}_{n≥0} by the Gram–Schmidt orthonormalization procedure. It is a Hilbert basis because (a) of Theorem 2.3.2 is satisfied. Indeed, for any z ∈ H,

    ( < e_n , z > = 0 for all n ≥ 0 )  ⇒  ( < f_p , z > = 0 for all p ≥ 0 ).

In particular, < y, z > = 0 for any finite linear combination y of the {f_p}_{p≥0}. Because {f_p}_{p≥0} is dense, < y, z > = 0 for all y ∈ H. In particular < z, z > = 0, that is to say, z = 0. □
Chapter 3

Fourier analysis

3.1 Fourier transform


3.1.1 Fourier transform in L¹

Definition 3.1.1 Let the function s : ℝ → ℂ be in L¹(ℝ) (that is, integrable). The Fourier transform (FT) of this function is the function ŝ : ℝ → ℂ defined by:

    ŝ(ν) = ∫_ℝ s(t) e^{−2iπνt} dt .    (3.1)

(Note that the argument of the exponential in the integrand is −2iπνt.) The mapping from the function to its Fourier transform will be denoted by

    s(t) → ŝ(ν),   or   F : s(t) → ŝ(ν).

Theorem 3.1.1 The FT of an integrable function is bounded and continuous.

Proof. We have

    |ŝ(ν)| ≤ ∫_ℝ |s(t) e^{−2iπνt}| dt = ∫_ℝ |s(t)| dt < ∞ ,

and, by dominated convergence,

    lim_{h↓0} | ∫_ℝ s(t) ( e^{−2iπ(ν+h)t} − e^{−2iπνt} ) dt | ≤ lim_{h↓0} ∫_ℝ |s(t)| |e^{−2iπht} − 1| dt = 0 . □

In the following, s, s₁ and s₂ are functions in L¹(ℝ), λ₁ , λ₂ ∈ ℂ, and a ∈ ℝ, a ≠ 0. We have (exercise) the following basic rules:

Delay: s(t − t₀) → e^{−2iπνt₀} ŝ(ν)

Modulation: e^{2iπν₀t} s(t) → ŝ(ν − ν₀)

Doppler: s(at) → (1/|a|) ŝ(ν/a)

Linearity: λ₁ s₁(t) + λ₂ s₂(t) → λ₁ ŝ₁(ν) + λ₂ ŝ₂(ν)

Conjugation: s*(t) → ŝ(−ν)*

Modulation: s(t) cos(2πν₀t) → ½ (ŝ(ν − ν₀) + ŝ(ν + ν₀)).

For the rectangular pulse rec_T (t) = 1_{[−T/2,+T/2]}(t), a straightforward computation gives:

    rec_T (t) → T sinc(νT) ,    (3.2)

where

    sinc(x) = sin(πx) / (πx) .

A classical computation using contour integration shows that the Gaussian pulse is its own FT, that is,

    e^{−πt²} → e^{−πν²} .    (3.3)

We deduce from (3.3) that for all α > 0,

    e^{−αt²} → √(π/α) e^{−π²ν²/α} .

A direct computation gives the FT of the one-sided exponential: For a > 0,

    s(t) = e^{−at} 1_{ℝ₊}(t) → ŝ(ν) = 1 / (a + 2iπν) .    (3.4)

We deduce from (3.4) that

    s(t) = e^{−a|t|} → ŝ(ν) = 2a / (a² + 4π²ν²) .
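These closed forms are easy to check numerically; the sketch below (not part of the original text) approximates the defining integral (3.1) by a Riemann sum on a fine grid for the two-sided exponential and compares it with 2a/(a² + 4π²ν²).

```python
import numpy as np

# Sketch: approximate s_hat(nu) = \int s(t) e^{-2 i pi nu t} dt for
# s(t) = e^{-a|t|} on a truncated grid; the tail beyond |t| = 20 is negligible.
a, dt = 1.0, 1e-3
t = np.arange(-20.0, 20.0, dt)
s = np.exp(-a * np.abs(t))

def ft(nu):
    return np.sum(s * np.exp(-2j * np.pi * nu * t)) * dt

for nu in (0.0, 0.3, 1.0):
    exact = 2 * a / (a**2 + 4 * np.pi**2 * nu**2)
    assert abs(ft(nu) - exact) < 1e-4
```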

Theorem 3.1.2 Let h, x ∈ L¹(ℝ). Then the right-hand side of

    y(t) = ∫_ℝ h(t − s)x(s) ds    (3.5)

is defined for almost all t, and therefore defines almost-everywhere a function y ∈ L¹(ℝ) whose FT is given by ŷ(ν) = ĥ(ν)x̂(ν).

Proof. By Tonelli's theorem and the integrability assumptions,

    ∫_ℝ ∫_ℝ |h(t − s)| |x(s)| dt ds = ( ∫_ℝ |h(t)| dt ) ( ∫_ℝ |x(t)| dt ) < ∞.

This implies that, for almost all t,

    ∫_ℝ |h(t − s)x(s)| ds < ∞.

The integral ∫_ℝ h(t − s)x(s) ds is therefore well-defined for almost all t. Also

    ∫_ℝ |y(t)| dt = ∫_ℝ | ∫_ℝ h(t − s)x(s) ds | dt ≤ ∫_ℝ ∫_ℝ |h(t − s)x(s)| dt ds < ∞.

Therefore y is integrable. By Fubini's theorem,

    ∫_ℝ ( ∫_ℝ h(t − s)x(s) ds ) e^{−2iπνt} dt = ∫_ℝ ∫_ℝ h(t − s)e^{−2iπν(t−s)} x(s)e^{−2iπνs} ds dt
        = ∫_ℝ x(s)e^{−2iπνs} ( ∫_ℝ h(t − s)e^{−2iπν(t−s)} dt ) ds = ĥ(ν)x̂(ν). □

The function y, the convolution of h with x, is denoted by

    y = h ∗ x.

We therefore have the convolution–multiplication rule,

    (h ∗ x)(t) → ĥ(ν)x̂(ν).    (3.6)

Example 1.1: The convolution of the rectangular pulse rec_T with itself is the triangular pulse of base [−T, +T] and height T,

    Tri_T (t) = (T − |t|) 1_{[−T,+T]}(t).

By the convolution–multiplication rule,

    Tri_T (t) → (T sinc(νT))² .    (3.7)
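The example can be reproduced on a grid; the sketch below (an illustration, not part of the original text) computes rec_T ∗ rec_T by discrete convolution, compares it with Tri_T, and checks (3.7) at one test frequency. NumPy's `np.sinc(x)` is exactly the normalized sinc sin(πx)/(πx) used here.

```python
import numpy as np

# Sketch: rec_T * rec_T = Tri_T, and the FT of Tri_T is (T sinc(nu T))^2.
T, dt = 1.0, 1e-3
t = np.arange(-2.0, 2.0, dt)
rec = np.where(np.abs(t) <= T / 2, 1.0, 0.0)

conv = np.convolve(rec, rec, mode="same") * dt      # (rec_T * rec_T)(t)
tri = np.maximum(T - np.abs(t), 0.0)                # Tri_T(t)
assert np.max(np.abs(conv - tri)) < 1e-2            # grid-level agreement

nu = 0.7
ft_tri = np.sum(tri * np.exp(-2j * np.pi * nu * t)) * dt
assert abs(ft_tri - (T * np.sinc(nu * T))**2) < 1e-3
```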

We now state the Riemann–Lebesgue Lemma.

Theorem 3.1.3 The FT of a function s ∈ L¹(ℝ) satisfies

    lim_{|ν|→∞} |ŝ(ν)| = 0.    (3.8)

Proof. The FT of a rectangular pulse s satisfies |ŝ(ν)| ≤ K/|ν| (see Eqn. (??)). Hence every function s that is a finite linear combination of indicator functions of intervals satisfies the same property. Such finite combinations are dense in L¹(ℝ), and therefore there exists a sequence {s_n} of such functions such that

    lim_{n→∞} ∫_ℝ |s_n(t) − s(t)| dt = 0,

and

    |ŝ_n(ν)| ≤ K_n / |ν| ,

for finite numbers K_n . From the inequality

    |ŝ(ν) − ŝ_n(ν)| ≤ ∫_ℝ |s(t) − s_n(t)| dt

we deduce that

    |ŝ(ν)| ≤ |ŝ_n(ν)| + ∫_ℝ |s(t) − s_n(t)| dt ≤ K_n/|ν| + ∫_ℝ |s(t) − s_n(t)| dt,

from which the conclusion follows easily. □

In spite of the fact that the FT of an integrable function is uniformly bounded and uniformly continuous,
it is not necessarily integrable. For instance, the FT of the rectangular pulse is the cardinal sine, a non-
integrable function. When its FT is integrable, a function admits a Fourier decomposition:
Theorem 3.1.4 Let s : ℝ → ℂ be an integrable function with Fourier transform ŝ. Under the additional condition

    ∫_ℝ |ŝ(ν)| dν < ∞,    (3.9)

the inversion formula

    s(t) = ∫_ℝ ŝ(ν) e^{+2iπνt} dν    (3.10)

holds for almost all t. If s is, in addition to the above assumptions, continuous, equality in (3.10) holds for all t.

(Note that the exponent of the exponential in the integrand is +2iπνt.)

Proof. We first check (exercise) that the above result is true for the function

    e_{α,a}(t) = e^{−αt² + at}   (α ∈ ℝ, α > 0, a ∈ ℂ).

Let now s be an integrable function and consider the Gaussian density function

    h_σ(t) = (1/(σ√(2π))) e^{−t²/(2σ²)}

with the FT

    ĥ_σ(ν) = e^{−2π²σ²ν²} .

We first show that the inversion formula is true for the convolution (s ∗ h_σ). Indeed,

    (s ∗ h_σ)(t) = ∫_ℝ s(u) h_σ(u) e_{1/(2σ²), u/σ²}(t) du,    (3.11)

and the FT of this function is, by the convolution–multiplication formula, ŝ × ĥ_σ. Computing this FT directly from the right-hand side of (3.11), we obtain

    ŝ(ν)ĥ_σ(ν) = ∫_ℝ s(u)h_σ(u) ( ∫_ℝ e_{1/(2σ²), u/σ²}(t) e^{−2iπνt} dt ) du = ∫_ℝ s(u)h_σ(u) ê_{1/(2σ²), u/σ²}(ν) du.

Therefore, using the result of Exercise ??,

    ∫_ℝ ŝ(ν)ĥ_σ(ν)e^{2iπνt} dν = ∫_ℝ ( ∫_ℝ s(u)h_σ(u) ê_{1/(2σ²), u/σ²}(ν) du ) e^{2iπνt} dν
        = ∫_ℝ s(u)h_σ(u) e_{1/(2σ²), u/σ²}(t) du = (s ∗ h_σ)(t).

Therefore, we have

    (s ∗ h_σ)(t) = ∫_ℝ ŝ(ν)ĥ_σ(ν)e^{2iπνt} dν,    (3.12)

and this is the inversion formula for (s ∗ h_σ).

Since for all ν ∈ ℝ, lim_{σ↓0} ĥ_σ(ν) = 1, it follows from Lebesgue's dominated convergence theorem that when σ ↓ 0 the right-hand side of (3.12) tends to

    ∫_ℝ ŝ(ν)e^{2iπνt} dν

for all t ∈ ℝ. If we can prove that when σ ↓ 0, the function on the left-hand side of (3.12) converges in L¹(ℝ) to the function s, then, for almost all t ∈ ℝ, we have the announced equality (Theorem ??).

To prove convergence in L¹(ℝ), we observe that

    ∫_ℝ |(s ∗ h_σ)(t) − s(t)| dt = ∫_ℝ | ∫_ℝ (s(t − u) − s(t))h_σ(u) du | dt    (3.13)

(using the fact that ∫_ℝ h_σ(u) du = 1), and therefore, defining f(u) = ∫_ℝ |s(t − u) − s(t)| dt,

    ∫_ℝ |(s ∗ h_σ)(t) − s(t)| dt ≤ ∫_ℝ f(u)h_σ(u) du.

Now, |f(u)| is bounded (by 2 ∫_ℝ |s(t)| dt). Therefore if lim_{u↓0} f(u) = 0, then, by dominated convergence,

    lim_{σ↓0} ∫_ℝ f(u)h_σ(u) du = lim_{σ↓0} ∫_ℝ f(σu)h₁(u) du = 0.    (3.14)

To prove that lim_{u↓0} f(u) = 0 we begin with the case where s : ℝ → ℂ is continuous with compact support. In particular, it is uniformly bounded. Since we are interested in a limit as u tends to 0, we may suppose that u is in a bounded interval around 0, and in particular, the function t → |s(t − u) − s(t)| is bounded uniformly in u by an integrable function. It follows from the dominated convergence theorem that lim_{u↓0} f(u) = 0.

Let now s : ℝ → ℂ be only integrable. Let {s_n(·)}_{n≥1} be a sequence of continuous functions with compact support that converges in L¹(ℝ) to s(·). Writing

    f(u) ≤ d(s(· − u), s_n(· − u)) + ∫_ℝ |s_n(t − u) − s_n(t)| dt + d(s(·), s_n(·)),

where

    d(s(· − u), s_n(· − u)) = ∫_ℝ |s(t − u) − s_n(t − u)| dt,

the result easily follows.

Suppose that, in addition, s is continuous. The right-hand side of (3.10) defines a continuous function
because ŝ(ν) is integrable. The everywhere equality in (3.10) follows from the fact that two continuous
functions that are almost-everywhere equal are necessarily everywhere equal. 

The Fourier transform characterizes an integrable function:

Corollary 3.1.1 If two integrable functions s1 and s2 have the same Fourier transform, then they are
equal almost everywhere.

Proof. The function s(t) = s1 (t) − s2 (t) has the FT ŝ(ν) = 0, which is integrable, and thus by (3.10),
s(t) = 0 for almost all t. 
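The inversion formula is easy to test on a function whose FT is integrable, for instance the two-sided exponential from the previous section; the sketch below (not part of the original text) evaluates the right-hand side of (3.10) by a truncated Riemann sum.

```python
import numpy as np

# Sketch: s(t) = e^{-a|t|} has the integrable FT 2a/(a^2 + 4 pi^2 nu^2),
# so (3.10) applies; we reconstruct s on a frequency grid.
a, dnu = 1.0, 1e-3
nu = np.arange(-200.0, 200.0, dnu)
s_hat = 2 * a / (a**2 + 4 * np.pi**2 * nu**2)

def inverse_ft(t):
    return np.sum(s_hat * np.exp(2j * np.pi * nu * t)) * dnu

for tt in (0.0, 0.5, 2.0):
    # truncation of the integral beyond |nu| = 200 costs about 5e-4
    assert abs(inverse_ft(tt) - np.exp(-a * abs(tt))) < 2e-3
```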

Exercise 3.9 is very important. It shows that for functions that cannot be called pathological, the version
of the Fourier inversion theorem that we have in this chapter is not applicable.

In the course of the proof of Theorem 3.1.4, we have used a special case of the regularization lemma below, which is very useful in many circumstances.

Definition 3.1.2 A regularizing function is a nonnegative function h_σ : ℝ → ℝ depending on a parameter σ > 0, and such that

    ∫_ℝ h_σ(u) du = 1,   for all σ > 0,

    lim_{σ↓0} ∫_{−a}^{+a} h_σ(u) du = 1,   for all a > 0,

    lim_{σ↓0} h_σ(u) = 0,   for all u ≠ 0.

Lemma 3.1.1 Let h_σ : ℝ → ℝ be a regularizing function. Let s : ℝ → ℂ be in L¹(ℝ). Then

    lim_{σ↓0} ∫_ℝ |(s ∗ h_σ)(t) − s(t)| dt = 0.

Proof. We can use the proof of Theorem 3.1.4, starting from (3.13). The only place where the specific form of h_σ (a Gaussian density) is used is (3.14). We must therefore prove that

    lim_{σ↓0} ∫_ℝ f(u)h_σ(u) du = 0

independently. Fix ε > 0. Since lim_{u↓0} f(u) = 0, there exists a = a(ε) such that f(u) ≤ ε/2 for |u| ≤ a, and therefore

    ∫_{−a}^{+a} f(u)h_σ(u) du ≤ (ε/2) ∫_{−a}^{+a} h_σ(u) du ≤ ε/2 .

Since f(u) is bounded (say, by M),

    ∫_{ℝ∖[−a,+a]} f(u)h_σ(u) du ≤ M ∫_{ℝ∖[−a,+a]} h_σ(u) du.

The last integral is, for sufficiently small σ, less than ε/(2M). Therefore, for sufficiently small σ,

    ∫_ℝ f(u)h_σ(u) du ≤ ε/2 + ε/2 = ε. □

The function h_σ is an approximation of Dirac's generalized function δ, in that for all ϕ ∈ C_c⁰,

    lim_{σ↓0} ∫_ℝ h_σ(t) ϕ(t) dt = ϕ(0) = ∫_ℝ δ(t) ϕ(t) dt.

The last equality is symbolic, and defines Dirac's generalized function. The first equality is obtained as in the proof of the above lemma, letting this time f(u) = ϕ(u) − ϕ(0).

We shall now see how differentiation in the time-domain is expressed in the frequency-domain.

Theorem 3.1.5 (a) If the integrable function s is such that t^k s(t) ∈ L¹(ℝ) for all 1 ≤ k ≤ n, then its FT is in C^n, and

    (−2iπt)^k s(t) → ŝ^(k)(ν)   for all 1 ≤ k ≤ n.

(b) If the function s is in C^n and if it is, together with its n first derivatives, integrable, then

    s^(k)(t) → (2iπν)^k ŝ(ν)   for all 1 ≤ k ≤ n.

Proof. (a) In the right-hand side of the expression

    ŝ(ν) = ∫_ℝ e^{−2iπνt} s(t) dt,

we can differentiate k times under the integral sign (see Theorem ?? of the appendix and the hypothesis t^k s(t) ∈ L¹(ℝ)), and obtain

    ŝ^(k)(ν) = ∫_ℝ (−2iπt)^k e^{−2iπνt} s(t) dt.

(b) It suffices to prove this for n = 1, and iterate the result. We first observe that lim_{|a|↑∞} s(a) = 0. Indeed, with a > 0 for instance,

    s(a) = s(0) + ∫_0^a s′(t) dt,

and therefore, since s′ ∈ L¹(ℝ), the limit exists and is finite. This limit must be 0 because s is integrable. Now, the FT of s′ is

    ∫_ℝ e^{−2iπνt} s′(t) dt = lim_{a↑∞} ∫_{−a}^{+a} e^{−2iπνt} s′(t) dt.

Integration by parts yields

    ∫_{−a}^{+a} e^{−2iπνt} s′(t) dt = [ e^{−2iπνt} s(t) ]_{−a}^{+a} + ∫_{−a}^{+a} (2iπν) e^{−2iπνt} s(t) dt.

It then suffices to let a tend to ∞ to obtain the announced result. □
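Rule (b) can be checked numerically on the Gaussian pulse, whose FT is known exactly from (3.3); the sketch below (an illustration, not part of the original text) compares the Riemann-sum FT of s′ with (2iπν)ŝ(ν).

```python
import numpy as np

# Sketch: for s(t) = e^{-pi t^2}, s'(t) = -2 pi t e^{-pi t^2}, and by (b)
# of Theorem 3.1.5 the FT of s' should be (2 i pi nu) e^{-pi nu^2}.
dt = 1e-3
t = np.arange(-10.0, 10.0, dt)
s_prime = -2 * np.pi * t * np.exp(-np.pi * t**2)

def ft_deriv(nu):
    return np.sum(s_prime * np.exp(-2j * np.pi * nu * t)) * dt

for nu in (0.25, 1.0):
    exact = 2j * np.pi * nu * np.exp(-np.pi * nu**2)
    assert abs(ft_deriv(nu) - exact) < 1e-6
```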

The spatial case

The results of the present section, and more generally, of this chapter, extend to the spatial case.

We are now dealing with measurable functions s : ℝⁿ → ℂ, that is, t = (t₁ , . . . , tₙ) ∈ ℝⁿ → s(t) ∈ ℂ. The Fourier transform of this function, when defined, is a function ŝ : ℝⁿ → ℂ, that is, ν = (ν₁ , . . . , νₙ) ∈ ℝⁿ → ŝ(ν) ∈ ℂ. The scalar product of t = (t₁ , . . . , tₙ) ∈ ℝⁿ and ν = (ν₁ , . . . , νₙ) ∈ ℝⁿ is denoted by

    < t, ν > := Σ_{k=1}^{n} t_k ν_k .

We shall occasionally only quote an important result, but we shall omit the proofs since these are the same, mutatis mutandis, as in the univariate case. For instance:

If s is in L¹(ℝⁿ), then the Fourier transform ŝ is well defined by

    ŝ(ν) = ∫_{ℝⁿ} s(t) e^{−2iπ<t,ν>} dt.

The Fourier transform is then uniformly continuous and bounded, and if moreover ŝ is integrable, then the inversion formula

    s(t) = ∫_{ℝⁿ} ŝ(ν) e^{2iπ<t,ν>} dν

holds almost-everywhere (with respect to the Lebesgue measure), and everywhere if s is continuous.

The proof is exactly the same as that of Theorem 3.1.4, with the obvious adaptations. For instance, the multivariate extension of the function h_σ thereof is

    h_σ(t) = (1/(2πσ²)^{n/2}) e^{−‖t‖²/(2σ²)} ,

where ‖t‖² = Σ_{k=1}^{n} t_k². We also need to observe that if s₁ , . . . , sₙ are functions in L¹(ℝ), then s : ℝⁿ → ℂ defined by

    s(t₁ , . . . , tₙ) = s₁(t₁) × · · · × sₙ(tₙ)

is in L¹(ℝⁿ) and its Fourier transform is

    ŝ(ν₁ , . . . , νₙ) = ŝ₁(ν₁) × · · · × ŝₙ(νₙ).

3.1.2 Fourier Transform in L²

An integrable function as simple as the rectangular pulse has a Fourier transform that is not integrable, and therefore one cannot use the Fourier inversion theorem for integrable functions as it is. However, there is a version of this inversion formula which applies to all finite energy functions (for instance, the rectangular pulse). The analysis becomes slightly more involved, and we will have to use the framework of Hilbert spaces. This is largely compensated by the formal beauty of the results, due to the fact that a square-integrable function and its FT play symmetrical roles.

We start with a technical result. We use f(·) to denote the function f : ℝ → ℂ; in particular, f(a + ·) is the function f_a : ℝ → ℂ defined by f_a(t) = f(a + t).

Theorem 3.1.6 Let s ∈ L²(ℝ). The mapping from ℝ into L²(ℝ) defined by

    t → s(t + ·)

is uniformly continuous.

Proof. We have to prove that the quantity

    ∫_ℝ |s(t + h + u) − s(t + u)|² du = ∫_ℝ |s(h + u) − s(u)|² du

tends to 0 when h → 0 (the uniformity in t of the convergence then follows, since this quantity is independent of t). When s is continuous and compactly supported the result follows by dominated convergence. The general case s ∈ L²(ℝ) is obtained by approximating s by continuous compactly supported functions (see the proof of Theorem 3.1.4). □

From Schwarz's inequality, we deduce that

    t → < s(t + ·), s(·) >_{L²(ℝ)}

is uniformly continuous on ℝ, and bounded by the energy of the function.

The above function is

    t → ∫_ℝ s(t + x)s*(x) dx    (3.15)

and is called the autocorrelation function of the finite energy function s. Note that it is the convolution s ∗ s̃, where s̃(t) = s(−t)*.
3.1. FOURIER TRANSFORM 49

Theorem 3.1.7 If the function s : ℝ → ℂ is in L¹(ℝ) ∩ L²(ℝ), then its FT ŝ belongs to L²(ℝ) and

    ∫_ℝ |s(t)|² dt = ∫_ℝ |ŝ(ν)|² dν.    (3.16)

Proof. The function s̃ admits the FT ŝ*, and therefore by the convolution–multiplication rule

    (s ∗ s̃)(t) → |ŝ(ν)|².    (3.17)

Consider the Gaussian density function

    h_σ(t) = (1/(σ√(2π))) e^{−t²/(2σ²)} .

Applying the result in (3.12), with (s ∗ s̃) instead of s, and observing that h_σ is an even function, we obtain

    ∫_ℝ |ŝ(ν)|² ĥ_σ(ν) dν = ∫_ℝ (s ∗ s̃)(x)h_σ(x) dx.    (3.18)

Since ĥ_σ(ν) = e^{−2π²σ²ν²} ↑ 1 when σ ↓ 0, the left-hand side of (3.18) tends to ∫_ℝ |ŝ(ν)|² dν, by monotone convergence.

On the other hand, since the autocorrelation function (s ∗ s̃) is continuous and bounded, the quantity

    ∫_ℝ (s ∗ s̃)(x)h_σ(x) dx = ∫_ℝ (s ∗ s̃)(σy)h₁(y) dy

tends, when σ ↓ 0, towards

    ∫_ℝ (s ∗ s̃)(0)h₁(y) dy = (s ∗ s̃)(0) = ∫_ℝ |s(t)|² dt,

by dominated convergence. □
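The energy identity (3.16) can be illustrated on the rectangular pulse rec_T, whose FT T sinc(νT) is square-integrable but not integrable; the sketch below (not part of the original text) compares the time-domain energy T with a truncated Riemann sum of the frequency-domain energy.

```python
import numpy as np

# Sketch: for s = rec_T, the time-domain energy is \int |rec_T|^2 dt = T,
# and (3.16) says \int |T sinc(nu T)|^2 d nu is the same number.
T = 1.0
time_energy = T

dnu = 2e-4
nu = np.arange(-400.0, 400.0, dnu)
freq_energy = np.sum(np.abs(T * np.sinc(nu * T))**2) * dnu
# the tail beyond |nu| = 400 contributes only about 2.5e-4
assert abs(freq_energy - time_energy) < 1e-3
```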

From the result of the last section, we have that the mapping ϕ : s(t) → ŝ(ν) from L¹(ℝ) ∩ L²(ℝ) into L²(ℝ) thus defined is isometric and linear. Since L¹ ∩ L² is dense in L², this linear isometry can be uniquely extended into a linear isometry from L²(ℝ) into itself (Theorem 2.2.5). We will continue to denote by ŝ(ν) the image of s under this isometry, and to call it the FT of s.

The above isometry is expressed by the Plancherel–Parseval identity:

Theorem 3.1.8 If s₁ : ℝ → ℂ and s₂ : ℝ → ℂ are finite energy functions then

    ∫_ℝ s₁(t)s₂(t)* dt = ∫_ℝ ŝ₁(ν)ŝ₂(ν)* dν.    (3.19)

Theorem 3.1.9 If h ∈ L¹(ℝ) and x ∈ L²(ℝ), then

    y(t) = ∫_ℝ h(t − s)x(s) ds    (3.20)

is almost-everywhere well defined and is in L²(ℝ). Furthermore, its FT is

    ŷ(ν) = ĥ(ν)x̂(ν).    (3.21)

Proof. Let us first show that ∫_ℝ h(t − s)x(s) ds is well defined. For this we observe that on the one hand

    ∫_ℝ |h(t − s)| |x(s)| ds ≤ ∫_ℝ |h(t − s)|(1 + |x(s)|²) ds = ∫_ℝ |h(t)| dt + ∫_ℝ |h(t − s)| |x(s)|² ds,

and on the other, for almost all t,

    ∫_ℝ |h(t − s)| |x(s)|² ds < ∞,

since |h(t)| and |x(t)|² are in L¹(ℝ). Therefore, for almost all t,

    ∫_ℝ |h(t − s)| |x(s)| ds < ∞,

and y : ℝ → ℂ is almost-everywhere well defined. Let us now show that y ∈ L²(ℝ). Using Fubini's theorem and Schwarz's inequality we have:

    ∫_ℝ | ∫_ℝ h(t − s)x(s) ds |² dt = ∫_ℝ | ∫_ℝ h(u)x(t − u) du |² dt
        = ∫_ℝ ∫_ℝ { ∫_ℝ x(t − u)x(t − v)* dt } h(u)h(v)* du dv
        ≤ ( ∫_ℝ |x(s)|² ds ) ( ∫_ℝ |h(u)| du )² < ∞.

For future reference, we rewrite this as

    ‖h ∗ x‖_{L²(ℝ)} ≤ ‖h‖_{L¹(ℝ)} ‖x‖_{L²(ℝ)} .    (3.22)

The function (3.20) is thus in L²(ℝ) when h ∈ L¹(ℝ) and x ∈ L²(ℝ). If, furthermore, x ∈ L¹(ℝ), then y ∈ L¹(ℝ). Therefore

    x ∈ L¹(ℝ) ∩ L²(ℝ) → y ∈ L¹(ℝ) ∩ L²(ℝ).    (3.23)

In this case we have (3.21), by the convolution–multiplication formula in L¹.

We now suppose that x ∈ L²(ℝ) (but not necessarily in L¹(ℝ)). The function

    x_A(t) = x(t)1_{[−A,+A]}(t)

is in L¹(ℝ) ∩ L²(ℝ) and lim x_A = x in L²(ℝ). In particular, lim x̂_A(ν) = x̂(ν) in L²(ℝ). Introducing

    y_A(t) = ∫_ℝ h(t − s)x_A(s) ds,

we have ŷ_A(ν) = ĥ(ν)x̂_A(ν). Also, lim y_A = y in L²(ℝ) (use (3.22)), and thus lim ŷ_A(ν) = ŷ(ν) in L²(ℝ). Now, since lim x̂_A(ν) = x̂(ν) in L²(ℝ) and ĥ(ν) is bounded, lim ĥ(ν)x̂_A(ν) = ĥ(ν)x̂(ν) in L²(ℝ). Therefore, we have (3.21). □

So far, we know that the mapping ϕ : L²(ℝ) → L²(ℝ) defined above is linear, isometric, and into. We shall now show that it is onto, and therefore bijective.

Theorem 3.1.10 Let ŝ(ν) be the FT of s(t) ∈ L²(ℝ). Then

    ϕ : ŝ(−ν) → s(t),    (3.24)

that is to say,

    s(t) = lim_{A↑∞} ∫_{−A}^{+A} ŝ(ν)e^{2iπνt} dν,    (3.25)

where the limit is in L²(ℝ), and the equality is almost everywhere.

We shall prepare the way for the proof with the following result.

Lemma 3.1.2 Let u : ℝ → ℂ and v : ℝ → ℂ be two finite energy functions. Then

    ∫_ℝ u(x)v̂(x) dx = ∫_ℝ û(x)v(x) dx.    (3.26)

Proof. If (3.26) is true for u, v ∈ L¹(ℝ) ∩ L²(ℝ), then it also holds for u, v ∈ L²(ℝ). Indeed, denoting u_A(t) = u(t)1_{[−A,+A]}(t), and similarly for v_A , we have

    ∫_ℝ u_A(x)v̂_A(x) dx = ∫_ℝ û_A(x)v_A(x) dx.

Now u_A , v_A , û_A and v̂_A tend in L²(ℝ) to u, v, û and v̂, respectively, as A ↑ ∞, and therefore, by the continuity of the Hermitian product, (3.26) follows.

The proof of (3.26) for integrable functions is accomplished by Fubini's theorem:

    ∫_ℝ u(x)v̂(x) dx = ∫_ℝ u(x) { ∫_ℝ v(y)e^{−2iπxy} dy } dx
        = ∫_ℝ v(y) { ∫_ℝ u(x)e^{−2iπxy} dx } dy = ∫_ℝ v(y)û(y) dy. □
Proof. (of (3.24)) Let g : ℝ → ℝ be a real function in L²(ℝ) and define f = (g⁻)^, where g⁻(t) = g(−t). We have f̂(ν) = ĝ(ν)*. Therefore, by (3.26):

    ∫_ℝ g(x)f̂(x) dx = ∫_ℝ ĝ(x)f(x) dx = ∫_ℝ ĝ(x)ĝ(x)* dx.

Therefore

    ‖g − f̂‖² = ‖g‖² − 2 Re < g, f̂ > + ‖f̂‖² = ‖g‖² − 2‖ĝ‖² + ‖f̂‖² .

But ‖g‖² = ‖ĝ‖² and ‖f̂‖² = ‖f‖² = ‖ĝ‖². Therefore ‖g − f̂‖² = 0, that is to say,

    g(t) = f̂(t).    (3.27)

In other words, every real (and therefore, every complex) function g ∈ L²(ℝ) is the Fourier transform of some function of L²(ℝ). Hence the mapping ϕ is onto. □

3.2 Fourier Series

3.2.1 Fourier series in L¹_loc

A periodic function is neither integrable nor of finite energy unless it is almost everywhere null, and therefore the theory of the preceding sections is not applicable. The relevant notion is that of Fourier series. (Note that Fourier series were introduced before Fourier transforms, in contrast with the order of appearance chosen in this text.) The elementary theory of Fourier series of this section is parallel to the elementary theory of Fourier transforms of the previous sections. The connection between Fourier transforms and Fourier series is made by the Poisson summation formula.

A function s : ℝ → ℂ is called periodic with period T > 0 (or T-periodic) if for all t ∈ ℝ,

    s(t + T) = s(t).

Such a function is called locally integrable if, in addition, s ∈ L¹([0, T]), that is,

    ∫_0^T |s(t)| dt < ∞.

A T-periodic function s is called locally square-integrable if s ∈ L²([0, T]), that is,

    ∫_0^T |s(t)|² dt < ∞.

In a signal processing context, one says in this case that the function s has finite power, since

    lim_{A→∞} (1/A) ∫_0^A |s(t)|² dt = (1/T) ∫_0^T |s(t)|² dt < ∞.

As the Lebesgue measure of [0, T] is finite, L²([0, T]) ⊂ L¹([0, T]) (see Theorem 1.3.1). In particular, a finite power periodic function is also locally integrable.

Definition 3.2.1 The Fourier transform {ŝ_n}, n ∈ ℤ, of the locally integrable T-periodic function s : ℝ → ℂ is defined by the formula

    ŝ_n = (1/T) ∫_0^T s(t)e^{−2iπ(n/T)t} dt,    (3.28)

and ŝ_n is called the n-th Fourier coefficient of the function s.

One often represents the sequence {ŝ_n}_{n∈ℤ} of the Fourier coefficients of a T-periodic function by 'spectral lines' separated by 1/T from each other along the frequency axis. The spectral line at frequency n/T has the complex amplitude ŝ_n . This is sometimes interpreted by saying that the FT of s is

    ŝ(ν) = Σ_{n∈ℤ} ŝ_n δ(ν − n/T),

where δ is the so-called Dirac generalized function.
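Formula (3.28) is straightforward to approximate on a uniform grid; the sketch below (an illustration, not part of the original text) computes the coefficients of s(t) = cos(2πt/T), for which ŝ₁ = ŝ₋₁ = 1/2 and all other coefficients vanish.

```python
import numpy as np

# Sketch: Riemann-sum approximation of the Fourier coefficients (3.28)
# for the T-periodic function s(t) = cos(2 pi t / T).
T, N = 2.0, 10000
t = np.arange(N) * (T / N)
s = np.cos(2 * np.pi * t / T)

def coeff(n):
    return np.sum(s * np.exp(-2j * np.pi * n * t / T)) * (T / N) / T

assert abs(coeff(1) - 0.5) < 1e-8
assert abs(coeff(-1) - 0.5) < 1e-8
assert abs(coeff(0)) < 1e-8 and abs(coeff(2)) < 1e-8
```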

The Poisson kernel will play in the proof of the Fourier series inversion formula a role similar to that of the Gaussian pulse in the proof of the Fourier transform inversion formula of the previous section.

The Poisson kernel is the family of functions P_r : ℝ → ℝ, 0 < r < 1, defined by

    P_r(t) = Σ_{n∈ℤ} r^{|n|} e^{2iπ(n/T)t} .    (3.29)

For fixed r, P_r is T-periodic, and elementary computations reveal that

    P_r(t) = Σ_{n≥0} r^n e^{2iπ(n/T)t} + Σ_{n≥0} r^n e^{−2iπ(n/T)t} − 1 = (1 − r²) / |1 − re^{2iπt/T}|² ,

and therefore

    P_r(t) ≥ 0.    (3.30)

Also

    (1/T) ∫_{−T/2}^{+T/2} P_r(t) dt = 1.    (3.31)

In view of the above expression of the Poisson kernel, we have the bound

    (1/T) ∫_{[−T/2,+T/2]∖[−ε,+ε]} P_r(t) dt ≤ (1 − r²) / |1 − e^{2iπε/T}|² ,

and therefore, for all ε > 0,

    lim_{r↑1} (1/T) ∫_{[−T/2,+T/2]∖[−ε,+ε]} P_r(t) dt = 0.    (3.32)

Properties (3.30), (3.31), and (3.32) make of the Poisson kernel a regularizing kernel, and in particular

    lim_{r↑1} (1/T) ∫_{−T/2}^{+T/2} ϕ(t)P_r(t) dt = ϕ(0),

for all bounded continuous ϕ : ℝ → ℂ (same proof as in Lemma 3.1.1).

The following result is similar to the Fourier inversion formula for integrable functions (Theorem 3.1.4).

Theorem 3.2.1 Let s : ℝ → ℂ be a T-periodic locally integrable function with Fourier coefficients {ŝ_n}, n ∈ ℤ. If

    Σ_{n∈ℤ} |ŝ_n| < ∞,    (3.33)

then, for almost all t ∈ ℝ,

    s(t) = Σ_{n∈ℤ} ŝ_n e^{+2iπ(n/T)t} .    (3.34)

If we add to the above hypotheses the assumption that s is a continuous function, then the inversion formula (3.34) holds for all t.

Proof. The proof is similar to that of Theorem 3.1.4. We have

    Σ_{n∈ℤ} ŝ_n r^{|n|} e^{2iπ(n/T)t} = (1/T) ∫_{−T/2}^{+T/2} s(u)P_r(t − u) du,    (3.35)

and

    lim_{r↑1} ∫_0^T | (1/T) ∫_0^T s(u)P_r(t − u) du − s(t) | dt = 0,

that is to say: the right-hand side of (3.35) tends to s in L¹([0, T]) when r ↑ 1. Since Σ_{n∈ℤ} |ŝ_n| < ∞, the function of t on the left-hand side of (3.35) tends towards the function Σ_{n∈ℤ} ŝ_n e^{+2iπ(n/T)t}, pointwise and in L¹([0, T]). The result then follows from Theorem 1.3.7.

The statement in the case where s is continuous is proved exactly as the corresponding statement in Theorem 3.1.4. □

As in the case of integrable functions, we deduce from the inversion formula the uniqueness theorem:

Corollary 3.2.1 Two locally integrable periodic functions with the same period T that have the same Fourier coefficients are equal almost everywhere.

3.2.2 The Poisson summation formula

The Poisson summation formula takes many forms. The strong version is

    T Σ_{n∈ℤ} s(nT) = Σ_{n∈ℤ} ŝ(n/T).    (3.36)

This aesthetic formula has a number of applications in signal processing.

The forthcoming result establishes the connection between Fourier transform and Fourier series, and is central to sampling theory. It is a weak form of the Poisson summation formula (see the discussion after the statement of the theorem).

Theorem 3.2.2 Let s : ℝ → ℂ be an integrable function and let 0 < T < ∞ be fixed. The series Σ_{n∈ℤ} s(t + nT) converges absolutely almost-everywhere to a T-periodic locally integrable function Φ : ℝ → ℂ, the n-th Fourier coefficient of which is (1/T) ŝ(n/T).

We paraphrase this result as follows: Under the above conditions, the function

    Φ(t) := Σ_{n∈ℤ} s(t + nT)    (3.37)

is T-periodic and locally integrable, and its formal Fourier series is

    S_f(t) = (1/T) Σ_{n∈ℤ} ŝ(n/T) e^{2iπ(n/T)t} .    (3.38)

(We speak of a "formal" Fourier series, because nothing is said about its convergence.) Therefore, whenever we are able to show that the Fourier series represents the function at t = 0, that is, if Φ(0) = S_f(0), then we obtain the Poisson summation formula (3.36).

For the time being, we say nothing about the convergence of the Fourier series. This is why we call the formula the weak Poisson formula. The strong Poisson formula corresponds to the case where one can prove the equality everywhere (and in particular at t = 0) of Φ and of its Fourier series.
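For the Gaussian pulse, which is its own FT by (3.3), the strong formula (3.36) holds and can be verified to machine precision; the short sketch below (not part of the original text) does so for one value of T.

```python
import numpy as np

# Sketch: strong Poisson summation formula (3.36) for s(t) = e^{-pi t^2},
# with s_hat = s: T * sum_n s(nT) = sum_n s_hat(n/T).
T = 0.7
n = np.arange(-50, 51)
lhs = T * np.sum(np.exp(-np.pi * (n * T)**2))
rhs = np.sum(np.exp(-np.pi * (n / T)**2))
assert abs(lhs - rhs) < 1e-12   # both series converge extremely fast
```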

Proof. We first show that Φ is well-defined:

    ∫_0^T Σ_{n∈ℤ} |s(t + nT)| dt = Σ_{n∈ℤ} ∫_0^T |s(t + nT)| dt = Σ_{n∈ℤ} ∫_{nT}^{(n+1)T} |s(t)| dt = ∫_ℝ |s(t)| dt < ∞.

In particular,

    Σ_{n∈ℤ} |s(t + nT)| < ∞   a.e.

Therefore the series Σ_{n∈ℤ} s(t + nT) converges absolutely for almost all t. In particular, Φ is well-defined (define it arbitrarily when the series does not converge). This function is clearly T-periodic. We have

    ∫_0^T |Φ(t)| dt = ∫_0^T | Σ_{n∈ℤ} s(t + nT) | dt ≤ ∫_0^T Σ_{n∈ℤ} |s(t + nT)| dt = ∫_ℝ |s(t)| dt < ∞.

Therefore, Φ is locally integrable. Its n-th Fourier coefficient is

    c_n(Φ) = (1/T) ∫_0^T Φ(t)e^{−2iπ(n/T)t} dt
           = (1/T) ∫_0^T { Σ_{k∈ℤ} s(t + kT) } e^{−2iπ(n/T)t} dt
           = (1/T) ∫_0^T { Σ_{k∈ℤ} s(t + kT)e^{−2iπ(n/T)(t+kT)} } dt
           = (1/T) ∫_ℝ s(t)e^{−2iπ(n/T)t} dt = (1/T) ŝ(n/T). □

We have a function and we have its formal Fourier series. When both are equal everywhere, we obtain the strong Poisson summation formula. Exercise 3.23 gives a sufficient condition for this.

3.2.3 Fourier Series in L²_loc

Let us consider the Hilbert space ℓ²(ℤ) of complex sequences a = {a_n}, n ∈ ℤ, such that Σ_{n∈ℤ} |a_n|² < ∞, with the Hermitian product

    < a, b >_{ℓ²(ℤ)} = Σ_{n∈ℤ} a_n b_n* ,    (3.39)

and the Hilbert space L²([0, T], dt/T) of complex functions x = {x(t)}, t ∈ [0, T], such that ∫_0^T |x(t)|² dt < ∞, with the Hermitian product

    < x, y >_{L²([0,T],dt/T)} = ∫_0^T x(t)y(t)* dt/T .    (3.40)

Theorem 3.2.3 The formula

    ŝ_n = (1/T) ∫_0^T s(t)e^{−2iπ(n/T)t} dt    (3.41)

defines a linear isometry s(·) → {ŝ_n} from L²([0, T], dt/T) onto ℓ²(ℤ), the inverse of which is given by

    s(t) = Σ_{n∈ℤ} ŝ_n e^{2iπ(n/T)t} ,    (3.42)

where the series on the right-hand side converges in L²([0, T], dt/T), and the equality is almost-everywhere. This isometry is summarized by the Plancherel–Parseval identity:

    Σ_{n∈ℤ} x̂_n ŷ_n* = (1/T) ∫_0^T x(t)y(t)* dt .    (3.43)
Proof. The result follows from general results on orthonormal bases of Hilbert spaces, since the sequence

    {e_n(·)} := { e^{2iπ(n/T)·} , n ∈ ℤ }

is a complete orthonormal sequence of L²([0, T], dt/T) (Theorem 3.2.6). □

Let ℓ¹(ℤ) be the space of sequences {f_n}, n ∈ ℤ, such that Σ_{n∈ℤ} |f_n| < ∞ (integrable discrete-time functions).

Theorem 3.2.4 ℓ¹(ℤ) ⊂ ℓ²(ℤ); that is, a discrete-time integrable function has finite energy.

Proof. Let x ∈ ℓ¹(ℤ) and let A = {n : |x_n| ≥ 1}. Since Σ_{n∈ℤ} |x_n| < ∞, necessarily card(A) < ∞. On the other hand, if |x_n| ≤ 1, then |x_n|² ≤ |x_n|. Whence

    Σ_{n∈ℤ} |x_n|² ≤ Σ_{n∈A} |x_n|² + Σ_{n∉A} |x_n| < ∞. □

The situation for discrete-time functions is in contrast with that of continuous-time functions, for which
there exist integrable functions with infinite energy, and finite energy functions which are not integrable.
R +π
Let L²(2π) be the Hilbert space of functions f̃ : [−π, +π] → ℂ such that ∫_{−π}^{+π} |f̃(ω)|² dω < ∞, provided with the Hermitian product

(f̃, g̃)_{L²(2π)} = (1/2π) ∫_{−π}^{+π} f̃(ω) g̃(ω)* dω .

Theorem 3.2.5 There exists a linear isomorphism between L²(2π) and ℓ²(ℤ) defined by

f_n = (1/2π) ∫_{−π}^{+π} f̃(ω) e^{inω} dω ,    f̃(ω) = ∑_{n∈ℤ} f_n e^{−inω} .    (3.44)

In particular, we have the Plancherel–Parseval identity

∑_{n∈ℤ} f_n g_n* = (1/2π) ∫_{−π}^{+π} f̃(ω) g̃(ω)* dω .    (3.45)


Proof. This is a rephrasing of Theorem 3.2.3 (set ω = 2πt/T). □

Theorem 3.2.6 The sequence

{e_n(·)} := { (1/√T) e^{2iπn·/T} , n ∈ ℤ }

is a Hilbert basis of L²([0,T]).

Proof. One first observes that {e_n(·), n ∈ ℤ} is an orthonormal system in L²([0,T]). It remains to show that the linear space it generates is dense in L²([0,T]) (Theorem 2.3.2).

For this, let f ∈ L²([0,T]) and let f_N be its projection on the Hilbert subspace generated by {e_n(·), −N ≤ n ≤ N}. The coefficient of e_n in this projection is c_n(f) = ⟨f, e_n⟩_{L²([0,T])}, and we have

∑_{n=−N}^{+N} |c_n(f)|² + ∫_0^T |f(t) − f_N(t)|² dt = ∫_0^T |f(t)|² dt .    (3.46)
3.2. FOURIER SERIES 57

(This is Pythagoras' theorem for projections: ‖P_G(x)‖² + ‖x − P_G(x)‖² = ‖x‖².) In particular, ∑_{n∈ℤ} |c_n(f)|² < ∞. It remains to show ((b) of Theorem 2.3.2) that

lim_{N↑∞} ∫_0^T |f(t) − f_N(t)|² dt = 0 .

We assume in a first step that f is continuous. For such a function, the formula

φ(x) = ∫_0^T f̃(x + t) f̃(t)* dt ,

where

f̃(t) = ∑_{n∈ℤ} f(t + nT) 1_{(0,T]}(t + nT) ,

defines a T-periodic and continuous function φ. Its n-th Fourier coefficient is


c_n(φ) = (1/T) ∫_0^T ( ∫_0^T f̃(x + t) f̃(t)* dt ) e^{−2iπnx/T} dx
       = (1/T) ∫_0^T f̃(t)* { ∫_0^T f̃(x + t) e^{−2iπnx/T} dx } dt
       = (1/T) ∫_0^T f̃(t)* { ∫_t^{t+T} f̃(s) e^{−2iπns/T} ds } e^{2iπnt/T} dt
       = |c_n(f)|² .
Since ∑_{n∈ℤ} |c_n(f)|² < ∞ and φ(x) is continuous, it follows from the Fourier inversion theorem for locally integrable periodic functions that for all x ∈ ℝ,

φ(x) = ∑_{n∈ℤ} |c_n(f)|² e^{2iπnx/T} .

In particular, for x = 0,

φ(0) = ∫_0^T |f(t)|² dt = ∑_{n∈ℤ} |c_n(f)|² ,

and therefore, in view of (3.46),

lim_{N↑∞} ∫_0^T |f(t) − f_N(t)|² dt = 0 .

It remains to pass from continuous functions to square-integrable functions. Since the space C([0,T]) of continuous functions from [0,T] into ℂ is dense in L²([0,T]), with any ε > 0 one can associate φ ∈ C([0,T]) such that ‖f − φ‖ ≤ ε/3. By Bessel's inequality, ‖f_N − φ_N‖² = ‖(f − φ)_N‖² ≤ ‖f − φ‖², and therefore

‖f − f_N‖ ≤ ‖f − φ‖ + ‖φ − φ_N‖ + ‖φ_N − f_N‖ ≤ ‖φ − φ_N‖ + 2‖f − φ‖ ≤ ‖φ − φ_N‖ + 2ε/3 .

For N sufficiently large, ‖φ − φ_N‖ ≤ ε/3. Therefore, for N sufficiently large, ‖f − f_N‖ ≤ ε. □



3.2.4 The Sampling Theorem


In a digital communication system, an analog function {s(t)}_{t∈ℝ} must be transformed into a sequence of binary symbols, 0 and 1. This binary sequence is generated by first sampling the analog function, that is, extracting a sequence of samples {s(nΔ)}_{n∈ℤ}, and then quantizing, that is, converting each sample into a block of 0s and 1s.

The first question that arises is: to what extent does the sample sequence represent the original function? Exact representation is impossible without further assumptions, since obviously infinitely many functions fit a given sequence of samples.

The second question is: How to reconstruct efficiently the function from its samples?

We begin with a general result.


Theorem 3.2.7 Let s : ℝ → ℂ be an integrable and continuous function with Fourier transform ŝ ∈ L¹(ℝ), and assume in addition that for some 0 < B < ∞,

∑_{n∈ℤ} |s(n/2B)| < ∞ .    (3.47)

Then,

∑_{j∈ℤ} ŝ(ν + j2B) = (1/2B) ∑_{n∈ℤ} s(n/2B) e^{−2iπνn/2B} ,  a.e.    (3.48)

Let h : ℝ → ℂ be a function of the form

h(t) = ∫_ℝ T(ν) e^{2iπνt} dν ,    (3.49)

where T(ν) ∈ L¹(ℝ). The function

s̃(t) = (1/2B) ∑_{n∈ℤ} s(n/2B) h(t − n/2B)    (3.50)

then admits the representation

s̃(t) = ∫_ℝ { ∑_{j∈ℤ} ŝ(ν + j2B) } T(ν) e^{2iπνt} dν .    (3.51)

Proof. By Theorem 3.2.2, the 2B-periodic function Φ(ν) = ∑_{j∈ℤ} ŝ(ν + j2B) is locally integrable, and its n-th Fourier coefficient is

(1/2B) ∫_ℝ ŝ(ν) e^{−2iπnν/2B} dν ,

that is, since the Fourier inversion formula for s holds (ŝ is integrable) and it holds everywhere (s is continuous), the n-th Fourier coefficient of Φ(ν) is in fact equal to (1/2B) s(−n/2B). The formal Fourier series of Φ(ν) is therefore

(1/2B) ∑_{n∈ℤ} s(n/2B) e^{−2iπnν/2B} .

In view of condition (3.47), the Fourier inversion formula holds a.e. (Theorem 3.2.1); that is, Φ(ν) is almost everywhere equal to its Fourier series. This proves (3.48).

Since the frequency response T(ν) ∈ L¹(ℝ), the impulse response h given by (3.49) is bounded and uniformly continuous, and therefore s̃ is bounded and continuous (the right-hand side of (3.50) is a normally convergent series—by (3.47)—of bounded and continuous functions). Also, on substituting (3.49) in (3.50) we obtain

s̃(t) = (1/2B) ∑_{n∈ℤ} s(n/2B) ∫_ℝ T(ν) e^{2iπν(t − n/2B)} dν
     = ∫_ℝ { (1/2B) ∑_{n∈ℤ} s(n/2B) e^{−2iπνn/2B} } T(ν) e^{2iπνt} dν .

(The interchange of integration and summation is justified by Fubini's theorem because

∫_ℝ ( ∑_{n∈ℤ} |s(n/2B)| ) |T(ν)| dν = ( ∑_{n∈ℤ} |s(n/2B)| ) ( ∫_ℝ |T(ν)| dν ) < ∞ .)

Therefore,

s̃(t) = ∫_ℝ g(ν) e^{2iπνt} dν ,

where

g(ν) = { (1/2B) ∑_{n∈ℤ} s(n/2B) e^{−2iπνn/2B} } T(ν) .

The result (3.51) then follows from (3.48). □

We now state the Shannon–Nyquist sampling theorem.

Theorem 3.2.8 Let s : ℝ → ℂ be an integrable and continuous function whose FT ŝ vanishes outside [−B, +B], and assume condition (3.47) is satisfied. We can then recover s from its samples s(n/2B), n ∈ ℤ, by the formula

s(t) = ∑_{n∈ℤ} s(n/2B) sinc(2Bt − n) ,  a.e.    (3.52)

Proof. This is a direct consequence of the previous theorem, with T(ν) being the frequency response of the low-pass (B). Indeed,

{ ∑_{j∈ℤ} ŝ(ν + j2B) } T(ν) = ŝ(ν) 1_{[−B,+B]}(ν) = ŝ(ν) ,

and therefore, by (3.51),

s̃(t) = ∫_ℝ ŝ(ν) e^{2iπνt} dν = s(t) .

The second equality is an almost-everywhere equality; it holds everywhere when s is a continuous function (see Corollary A1.2). □

If we interpret s(n/2B) h(t − n/2B) as the response of the low-pass (B) when a Dirac impulse of height s(n/2B) is applied at time n/2B, the right-hand side of equation (3.52) is the response of the low-pass (B) to Dirac's comb

s_i(t) = (1/2B) ∑_{n∈ℤ} s(n/2B) δ(t − n/2B) .    (3.53)
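Formula (3.52) can be illustrated numerically (Python/NumPy sketch; the choice s(t) = sinc²(t), B = 1, and the truncation order are illustrative — note that np.sinc(x) = sin(πx)/(πx)):

```python
import numpy as np

# s(t) = sinc(t)^2 has FT the triangle supported by [-1, +1], hence is
# base-band (B) with B = 1; its samples s(n/2B) decay like 1/n^2, so (3.47)
# holds.  We reconstruct s on [-2, 2] by the truncated series (3.52).
B = 1.0
N = 500
n = np.arange(-N, N + 1)
samples = np.sinc(n / (2 * B)) ** 2            # s(n/2B)

t = np.linspace(-2.0, 2.0, 81)
recon = (samples[None, :] * np.sinc(2 * B * t[:, None] - n[None, :])).sum(axis=1)
err = float(np.max(np.abs(recon - np.sinc(t) ** 2)))   # truncation error only
```

The residual error comes only from truncating the sample series; it shrinks as N grows, in line with the absolute summability of the samples.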

The spatial version of Theorem 3.2.8 is the following.



Theorem 3.2.9 Let s(t_1, …, t_n) be an integrable and continuous function whose FT ŝ(ν_1, …, ν_n) vanishes outside [−B_1, +B_1] × ··· × [−B_n, +B_n], and assume that

∑_{k_1∈ℤ} ··· ∑_{k_n∈ℤ} | s(k_1/2B_1, …, k_n/2B_n) | < ∞ .    (3.54)

We can then recover s from its samples s(k_1/2B_1, …, k_n/2B_n), k_1 ∈ ℤ, …, k_n ∈ ℤ, by the formula

s(t_1, …, t_n) = ∑_{k_1∈ℤ} ··· ∑_{k_n∈ℤ} s(k_1/2B_1, …, k_n/2B_n) sinc(2B_1 t_1 − k_1) × ··· × sinc(2B_n t_n − k_n) ,  a.e.    (3.55)

What happens in the Shannon–Nyquist sampling theorem if one supposes that the function is base-band (B), although in reality it is not?

Suppose that an integrable function s is sampled at the frequency 2B and that the resulting impulse train is applied to the low-pass (B) with impulse response h(t) = 2B sinc(2Bt), to obtain, after division by 2B, the function

s̃(t) = ∑_{n∈ℤ} s(n/2B) sinc(2Bt − n) .

What is the FT of this function? The answer is given by the theorem below, which is a direct consequence of Theorem 3.2.7.

Theorem 3.2.10 Let s : ℝ → ℂ be an integrable and continuous function such that condition (3.47) is satisfied. The function

s̃(t) = ∑_{n∈ℤ} s(n/2B) sinc(2Bt − n)    (3.56)

admits the representation

s̃(t) = ∫_ℝ \hat{\tilde s}(ν) e^{2iπνt} dν ,

where

\hat{\tilde s}(ν) = { ∑_{j∈ℤ} ŝ(ν + j2B) } 1_{[−B,+B]}(ν) .    (3.57)

If s̃ is integrable, then \hat{\tilde s} is its FT, by the Fourier inversion theorem. This FT is obtained by superposing, in the frequency band [−B, +B], the translates by multiples of 2B of the initial spectrum ŝ(ν). This superposition constitutes the phenomenon of spectrum folding, and the distortion that it creates is called aliasing.
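The folding formula (3.57) has a closed form in at least one illustrative case (Python/NumPy sketch, not from the text): sampling s(t) = sinc²(t), which is base-band (1), at the too-slow rate 2B = 1 folds the triangular spectrum into the constant 1 on [−1/2, +1/2], so the reconstruction is sinc(t) rather than s itself:

```python
import numpy as np

# Undersampling s(t) = sinc(t)^2 at rate 2B = 1: the samples are 1_{n=0},
# and the low-pass reconstruction (3.56) returns sinc(t) -- the inverse FT
# of the folded spectrum (3.57) -- which differs markedly from s.
N = 400
n = np.arange(-N, N + 1)
samples = np.sinc(n) ** 2                     # s(n/2B) with 2B = 1

t = np.linspace(-3.0, 3.0, 121)
recon = (samples[None, :] * np.sinc(t[:, None] - n[None, :])).sum(axis=1)

alias_err = float(np.max(np.abs(recon - np.sinc(t))))      # matches folded spectrum
true_err = float(np.max(np.abs(recon - np.sinc(t) ** 2)))  # but not the original s
```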

The L¹ version of the Shannon–Nyquist theorem contains a condition bearing on the samples themselves, namely:

∑_{n∈ℤ} |s(n/2B)| < ∞ .    (3.58)

The simplest way of removing this unaesthetic condition is given by the L² version of the Shannon–Nyquist theorem.

Theorem 3.2.11 Let s : ℝ → ℂ be a base-band (B) function of finite energy. Then

lim_{N↑∞} ∫_ℝ | s(t) − ∑_{n=−N}^{+N} b_n sinc(2Bt − n) |² dt = 0 ,    (3.59)

where

b_n = ∫_{−B}^{+B} ŝ(ν) e^{2iπνn/2B} dν .

Proof. Denote by L²(ℝ; B) the Hilbert subspace of L²(ℝ) consisting of the finite-energy complex functions with a Fourier transform having its support contained in [−B, +B]. The sequence

{ (1/√(2B)) h(· − n/2B) }_{n∈ℤ} ,    (3.60)

where h(t) ≡ 2B sinc(2Bt), is an orthonormal basis of L²(ℝ; B). Indeed, the functions of this system are in L²(ℝ; B), and they form an orthonormal system since, by the Plancherel–Parseval formula,

∫_ℝ h(t − n/2B) h(t − k/2B)* dt = ∫_ℝ ĥ(ν) e^{−2iπνn/2B} ( ĥ(ν) e^{−2iπνk/2B} )* dν
                                = ∫_{−B}^{+B} e^{2iπν(k−n)/2B} dν = 2B × 1_{n=k} .

It remains to prove the totality of the orthonormal system (3.60) (see Theorem 2.3.2). We must show that if g(t) ∈ L²(ℝ; B) and

∫_ℝ g(t) h(t − n/2B) dt = 0  for all n ∈ ℤ,    (3.61)

then g(t) ≡ 0 as a function of L²(ℝ; B) (or, equivalently, g(t) = 0 almost everywhere).

Condition (3.61) is equivalent (by the Plancherel–Parseval identity) to

∫_{−B}^{+B} ĝ(ν) e^{2iπνn/2B} dν = 0  for all n ∈ ℤ.    (3.62)

But we have proven in the previous section that the system {e^{2iπνn/2B}}_{n∈ℤ} is total in L²([−B, +B]); therefore (3.62) implies ĝ(ν) = 0 almost everywhere, and consequently g(t) = 0 almost everywhere.

Expanding s(t) ∈ L²(ℝ; B) in the Hilbert basis (3.60) yields

s(t) = lim_{N↑∞} ∑_{n=−N}^{+N} c_n (1/√(2B)) h(t − n/2B) ,    (3.63)

where the limit and the equality in (3.63) are taken in the L² sense (as in (3.59)), and

c_n = ∫_ℝ s(t) (1/√(2B)) h(t − n/2B) dt .

By the Plancherel–Parseval identity,

c_n = ∫_{−B}^{+B} ŝ(ν) (1/√(2B)) e^{2iπνn/2B} dν .


3.3 Exercises
Exercise 3.1.
Show that the FT of a real function is Hermitian even, that is:

ĥ(−ν) = ĥ(ν)* .

Show that the FT of an odd (resp., even; resp., real and even) function is odd (resp., even; resp., real and even).

Exercise 3.2.
Let f : (ℝ, B) → (ℝ, B) be integrable with respect to the Lebesgue measure. Show that its Fourier transform f̂ is continuous and bounded.

Exercise 3.3.
Let x : ℝ → ℂ be an integrable function. Show that its autocorrelation function

c(t) = ∫_ℝ x(s + t) x*(s) ds

is well-defined, integrable, and that its FT is |x̂|².

Exercise 3.4.
Show that the n-th convolution power of f(t) = e^{−at} 1_{t≥0}(t), where a > 0, is

f^{*n}(t) = (t^{n−1}/(n−1)!) e^{−at} 1_{t≥0}(t) .

(f^{*3} = f * f * f, etc.) Deduce from this the FT of s(t) = t^n e^{−at} 1_{t≥0}(t).

Exercise 3.5.
Show that for s(t) ∈ L²(ℝ),

lim_{T↑∞} ∫_ℝ | ŝ(ν) − ∫_{−T}^{+T} s(t) e^{−2iπνt} dt |² dν = 0 .    (3.64)


Exercise 3.6.
Use the Plancherel–Parseval identity to show that

∫_ℝ ( sin(πν)/(πν) )² dν = 1 .

Exercise 3.7.
Give the FT of s(t) = 1/(a² + t²). Deduce from this the value of the integral

I(t) = ∫_ℝ du/(t² + u²) ,  t > 0 .

Exercise 3.8.
Deduce from the Fourier inversion formula that

∫_ℝ ( sin(t)/t )² dt = π .
3.3. EXERCISES 63

Exercise 3.9.
Let s : ℝ → ℂ be an integrable right-continuous function, with a limit from the left at all times. Show that if s is discontinuous at some time t₀, its FT cannot be integrable.

Exercise 3.10.
Let s : ℝ → ℂ be an integrable function with a Fourier transform with compact support. Show that s ∈ C^∞, that all its derivatives are integrable, and that the k-th derivative has the FT (2iπν)^k ŝ(ν).

Exercise 3.11.
Give a differential equation satisfied by the Gaussian pulse, and use it to deduce its Fourier transform.
Could you do the same to prove (3.4)?

Exercise 3.12.
Use the Plancherel–Parseval identity to prove that

∫_ℝ dt/((t² + a²)(t² + b²)) = π/(ab(a + b)) .

Exercise 3.13.
Show that if an integrable function is base-band (that is, if its FT has compact support), then it also
has a finite energy.

Exercise 3.14.
Show that if an integrable function is discontinuous at a point t = a, its FT is not integrable. (This
shows that the L1 Fourier inversion theorem is limited in scope, since it does not take much for an
integrable function not to have an integrable FT.)

Exercise 3.15.
Find a constant c and polynomials P(z) and Q(z) as in Theorem ??, such that

(5 − 2 cos(ω))/(3 − cos(ω)) = c | Q(e^{−iω}) / P(e^{−iω}) |² .

Exercise 3.16.
Compute the Fourier coefficients of the T -periodic function s : →  such that on [0, T ), s(t) = t.

Exercise 3.17.
Let s : ℝ → ℂ be a locally integrable T-periodic function. Defining

s_T(t) = s(t) 1_{[0,T]}(t) ,

show that the n-th Fourier coefficient ŝ_n of s and the FT ŝ_T of s_T are linked by

ŝ_n = (1/T) ŝ_T(n/T) .    (3.65)

Exercise 3.18.
Compute the Fourier coefficients of the T-periodic function s(t) such that on [−T/2, +T/2), s(t) = 1_{[−αT/2, +αT/2]}(t), where α ∈ (0, 1).

Exercise 3.19.
Let s(t) be a T-periodic locally integrable function with n-th Fourier coefficient ŝ_n. Show that lim_{|n|↑∞} ŝ_n = 0.

Exercise 3.20.
Let x(t) be a T-periodic locally integrable function and let h : ℝ → ℂ be an integrable function. Show that the function

y(t) = ∫_ℝ h(t − s) x(s) ds    (3.66)

is almost-everywhere well-defined, T-periodic, and locally integrable, and that its n-th Fourier coefficient is

ŷ_n = ĥ(n/T) x̂_n ,    (3.67)

where ĥ is the FT of h.

Exercise 3.21.
Compute

∑_{n≥1} 1/n²

using the expression of the Fourier coefficients of the 2-periodic function s : ℝ → ℝ such that

s(t) = Tri₁(t) for t ∈ [−1, +1].

Exercise 3.22.
Let x : ℝ → ℂ be a T-periodic locally integrable function with n-th Fourier coefficient x̂_n such that

∑_{n∈ℤ} |n|^p |x̂_n| < ∞ .

Show that x is p times differentiable, and that if the p-th derivative is locally integrable, its n-th Fourier coefficient is (2iπn/T)^p x̂_n.

Exercise 3.23.
Let s : ℝ → ℂ be an integrable function with FT ŝ(ν), and suppose that

(a) ∑_{n∈ℤ} s(t + nT) is a continuous function, and

(b) ∑_{n∈ℤ} |ŝ(n/T)| < ∞.

Show that for all t ∈ ℝ,

∑_{n∈ℤ} s(t + nT) = (1/T) ∑_{n∈ℤ} ŝ(n/T) e^{2iπnt/T} .

Exercise 3.24.
Show that if the function

s(t) = ( sin(2πBt)/(πt) )²

is sampled at rate 1/2B and if the resulting train of impulses is filtered by a low-pass (B) and divided by 2B, the result is the function

sin(2πBt)/(πt) .

Exercise 3.25.
Let ν₀ and B be such that

0 < 2B < ν₀ ,

and let s : ℝ → ℝ be an integrable and continuous base-band (B) function such that ∑_{k∈ℤ} |s(k/ν₀)| < ∞. Consider the train of impulses

s_i(t) = (1/ν₀) ∑_{n∈ℤ} s(n/ν₀) δ(t − n/ν₀) .

Passing this train through a low-pass (ν₀ + B) one obtains a function a. Passing this train through a low-pass (B) one obtains a function b. Show that

a(t) − b(t) = 2 s(t) cos(2πν₀ t) .

(We have therefore effected the frequency transposition of the original function.)
Chapter 4

Probability

Introduction. From the formal (and limited) point of view, probability theory is a particular case of
measure and integration theory. However, at least the terminologies of the two theories are different
and we shall proceed to the “translation” of the theory of measure and integration into the theory of
probability and expectation.

4.1 Expectation as integral


4.1.1 Random variables

Random variables as measurable functions

Definition 4.1.1 A probability space is a triple (Ω, F, P ), where P , the probability, is a measure on the
measurable space (Ω, F) with total mass 1.

Definition 4.1.2 A random element is a measurable function X from (Ω, F) to a measurable space
(E, E).

If (E, E) = (ℝ, B) (resp., (ℝ̄, B̄)), X is called a real (resp., extended) random variable. If (E, E) = (ℝⁿ, Bⁿ), X = (X₁, …, Xₙ) is called a random vector. A complex random variable is a function X : Ω → ℂ of the form X = X_R + iX_I, where X_R and X_I are real random variables.


A random variable X is a measurable function, and therefore we can define under general circumstances its integral with respect to the probability measure P, called the expectation of X and denoted E[X]:

E[X] = ∫_Ω X(ω) P(dω) .

From the general theory of integration summarized in the previous chapter we collect the following
results. First, if A ∈ F,
E[1A ] = P (A). (4.1)

68 CHAPTER 4. PROBABILITY

More generally, if X is a simple random variable, that is,

X(ω) = ∑_{i=1}^N α_i 1_{A_i}(ω) ,

where α_i ∈ ℝ, A_i ∈ F, then

E[X] = ∑_{i=1}^N α_i P(A_i) .
For a nonnegative random variable X, the expectation is always defined by

E[X] = lim_{n↑∞} E[X_n] ,

where {X_n}_{n≥1} is any nondecreasing sequence of nonnegative simple random variables that converge to X. This definition is consistent, that is, it does not depend on the approximating sequence of nonnegative simple random variables, as long as it is nondecreasing and has X for limit. In particular, with a special choice of the approximating sequence, we have for any nonnegative random variable X the "horizontal slice formula":

E[X] = lim_{n↑∞} ∑_{k=0}^{n2ⁿ−1} (k/2ⁿ) P(k·2⁻ⁿ ≤ X < (k+1)·2⁻ⁿ) .    (4.2)
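Formula (4.2) can be evaluated exactly for a distribution with a known c.d.f. (Python sketch; the exponential choice is illustrative, not from the text):

```python
import math

# Horizontal slice formula (4.2) for X ~ Exp(1): F(x) = 1 - exp(-x), so
# P(k 2^-n <= X < (k+1) 2^-n) = exp(-k 2^-n) - exp(-(k+1) 2^-n), and the
# dyadic sums increase to E[X] = 1 as n grows.
def slice_sum(n):
    h = 2.0 ** (-n)
    total = 0.0
    for k in range(n * 2 ** n):
        total += k * h * (math.exp(-k * h) - math.exp(-(k + 1) * h))
    return total

approx = [slice_sum(n) for n in (2, 6, 12)]   # nondecreasing, below E[X] = 1
```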

If X is of arbitrary sign, the expectation is defined by E[X] = E[X⁺] − E[X⁻] provided not both E[X⁺] and E[X⁻] are infinite. If both are infinite, the expectation is not defined. If E[|X|] < ∞, X is said to be integrable, and then E[X] is a finite number.

For a complex valued random variable X = X1 + iX2 , where X1 and X2 are real valued integrable
random variables, E[X] = E[X1 ] + iE[X2 ] defines the expectation of X.

The basic properties of the expectation are of course the same as for the more general Lebesgue integral. In particular, linearity and monotonicity:

If X₁ and X₂ are random variables with expectations, then for all λ₁, λ₂ ∈ ℝ,

E[λ₁X₁ + λ₂X₂] = λ₁E[X₁] + λ₂E[X₂]    (4.3)

whenever the right-hand side has meaning (i.e., is not an ∞ − ∞ form). Also, if X₁ ≤ X₂, P-a.s., then

E[X₁] ≤ E[X₂] .    (4.4)

It follows from this that if E[X] is well-defined, then

|E[X]| ≤ E[|X|] .    (4.5)

If X is integrable, the mean

m_X = E[X]

is well-defined and finite. If X is square-integrable (that is, E[|X|²] < ∞), the variance

σ²_X = E[(X − m_X)²] = E[X²] − m_X²

is well-defined and finite. Finally, if X is real, the characteristic function

φ_X(u) = E[e^{iuX}]

is well-defined.

As an example of application of Tonelli’s theorem, we shall prove the telescope formula:


4.1. EXPECTATION AS INTEGRAL 69

Theorem 4.1.1 For any nonnegative random variable X,

E[X] = ∫_0^∞ [1 − F(x)] dx .    (4.6)

Proof. The proof is a simple application of the Fubini–Tonelli theorem. The product measure here is the product of the Lebesgue measure on ℝ₊ by the probability P. We have

E[X] = E[ ∫_0^∞ 1_{0≤x<X} dx ] = ∫_0^∞ E[1_{0≤x<X}] dx = ∫_0^∞ P(X > x) dx = ∫_0^∞ [1 − F(x)] dx . □
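For an integer-valued X, 1 − F is constant on each interval [k, k+1), so (4.6) reduces to ∑_{k≥0} P(X > k); the following sketch (Python; the geometric distribution is an illustrative choice) compares both sides:

```python
# Telescope formula for X geometric on {1, 2, ...}: P(X = k) = p (1-p)^(k-1),
# hence P(X > k) = (1-p)^k and E[X] = 1/p.  Both sides are computed with a
# numerically negligible truncation of the series.
p = 0.3
q = 1.0 - p
K = 200                                                    # q**K ~ 1e-31

tail_sum = sum(q ** k for k in range(K))                   # sum of P(X > k)
mean = sum(k * p * q ** (k - 1) for k in range(1, K + 1))  # E[X] directly
```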

4.1.2 Distribution of a random element


Definition 4.1.3 Let X be a random element with values in (E, E). Its distribution is, by definition, the probability measure Q_X on (E, E), image of the probability measure P by the application X : (Ω, F) → (E, E); that is, for all C ∈ E,

Q_X(C) = P(X ∈ C) .

If X is a random element with values in (E, E) and if g is a measurable function from (E, E) to (ℝ, B), then g(X) is, by the composition theorem for measurable functions (Theorem 1.1.3), a random variable. By Theorem 1.2.5, we have

Theorem 4.1.2 For g : (E, E) → (ℝ, B), the identity

E[g(X)] = ∫_E g(x) Q_X(dx)

holds whenever one of the sides of the equality is well-defined, in which case the other is also well-defined.

In the particular case where (E, E) = (ℝ, B), taking C = (−∞, x], we have

Q_X((−∞, x]) = P(X ≤ x) = F_X(x) ,

where F_X is the cumulative distribution function (c.d.f.) of X, and

E[g(X)] = ∫_ℝ g(x) dF_X(x) ,

by definition of the Stieltjes–Lebesgue integral.

In the particular case where (E, E) = (ℝⁿ, Bⁿ), and where the random vector X admits a probability density f_X, that is, if Q_X is the product of the Lebesgue measure on (ℝⁿ, Bⁿ) by the function f_X, Theorem 1.2.9 tells us that

E[g(X)] = ∫_ℝⁿ g(x) f_X(x) dx .


4.1.3 The Lebesgue theorems for expectation


We now state in terms of expectation the Lebesgue theorems, which give general conditions under which the limit and expectation symbols can be interchanged, that is,

E[ lim_{n↑∞} X_n ] = lim_{n↑∞} E[X_n] .    (4.7)

The results below have already been stated in Chapter ??; we give them again for the sake of self-containedness of the present chapter. They are just a rephrasing of the Lebesgue theorems for integrals with respect to an arbitrary measure. First we have the monotone convergence theorem (Theorem 1.2.1) in the context of expectations.

Theorem 4.1.3 Let {Xn }n≥1 be a sequence of random variables such that for all n ≥ 1,

0 ≤ Xn ≤ Xn+1 , P-a.s.

Then (4.7) holds.

Next, we have the dominated convergence theorem (Theorem 1.2.2).

Theorem 4.1.4 Let {Xn }n≥1 be a sequence of random variables such that for all ω outside a set N of
null probability there exists limn↑∞ Xn (ω), and such that for all n ≥ 1

|Xn | ≤ Y, P-a.s.,

where Y is some integrable random variable. Then (4.7) holds.


Example 1.1: We recall that (4.7) is not always true even when lim_{n↑∞} X_n exists. Indeed, take the following probabilistic model: Ω = [0, 1], and P is the Lebesgue measure on [0, 1] (the probability of [a, b] ⊂ [0, 1] is the length b − a of this interval). Thus ω is a real number in [0, 1], and a random variable is a real function defined on [0, 1]. Take for X_n the function whose graph is a triangle with base [0, 2/n] and height n. Clearly, lim_{n↑∞} X_n(ω) = 0 and E[X_n] = ∫_0^1 X_n(x) dx = 1, so that E[lim_{n↑∞} X_n] = 0 ≠ lim_{n↑∞} E[X_n] = 1.
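The triangle example can be checked numerically (Python/NumPy sketch with illustrative grid sizes):

```python
import numpy as np

# Example 1.1: X_n is the triangle of base [0, 2/n] and height n on
# Omega = [0, 1].  E[X_n] = 1 for every n, while X_n(omega) -> 0 for each
# fixed omega > 0, so limit and expectation cannot be interchanged.
def X(n, w):
    w = np.asarray(w, dtype=float)
    up = np.where(w <= 1.0 / n, n * n * w, 0.0)
    down = np.where((w > 1.0 / n) & (w <= 2.0 / n), n * n * (2.0 / n - w), 0.0)
    return up + down

def trapezoid(y, x):                       # simple trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

w = np.linspace(0.0, 1.0, 2_000_001)
expectations = [trapezoid(X(n, w), w) for n in (5, 50, 500)]
pointwise = [float(X(n, 0.3)) for n in (5, 50, 500)]     # 0 once 2/n < 0.3
```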

The two examples below are left as exercises of application of the monotone convergence theorem
and of the dominated convergence theorem.

Example 1.2: Let {S_n}_{n≥1} be a sequence of nonnegative random variables. Then

E[ ∑_{n=1}^∞ S_n ] = ∑_{n=1}^∞ E[S_n] .    (4.8)

Example 1.3: Let {S_n}_{n≥1} be a sequence of real random variables such that ∑_{n≥1} E[|S_n|] < ∞. Then (4.8) holds.

4.1.4 Uniform integrability


Definition 4.1.4 The sequence {X_n}_{n≥1} of integrable random variables is called uniformly integrable if

lim_{c↑∞} ∫_{{|X_n|>c}} |X_n| dP = 0  uniformly in n.

Example 1.4: If for some integrable random variable X, P(|X_n| ≤ X) = 1 for all n, then {X_n}_{n≥1} is uniformly integrable. Indeed, in this case,

∫_{{|X_n|>c}} |X_n| dP ≤ ∫_{{X>c}} X dP ,

and the right-hand side of the above inequality tends to 0 as c ↑ ∞, by monotone convergence.

Theorem 4.1.5 The sequence {X_n}_{n≥1} of integrable random variables is uniformly integrable if and only if

(a) sup_n E[|X_n|] < ∞, and

(b) for every ε > 0, there exists δ(ε) > 0 such that

sup_n ∫_A |X_n| dP ≤ ε  whenever P(A) ≤ δ(ε) .

Proof. Let X be a nonnegative random variable. For any c ≥ 0 and any A ∈ F, we have

∫_A X dP = ∫_{A∩{X≤c}} X dP + ∫_{A∩{X>c}} X dP ≤ cP(A) + ∫_{{X>c}} X dP ,

and therefore

sup_n ∫_A |X_n| dP ≤ cP(A) + sup_n ∫_{{|X_n|>c}} |X_n| dP .

The necessity of (a) follows by taking A = Ω, and that of (b) by letting P(A) → 0 and c ↑ ∞. Conversely, by Markov's inequality and (a),

sup_n P(|X_n| ≥ c) ≤ (1/c) sup_n E[|X_n|] ↓ 0

as c ↑ ∞. Choose c < ∞ so that sup_n P(|X_n| ≥ c) ≤ δ(ε). Then by (b), we have ∫_{{|X_n|>c}} |X_n| dP ≤ ε for all n. □

Theorem 4.1.6 A sufficient condition for the sequence {X_n}_{n≥1} of integrable random variables to be uniformly integrable is the existence of a nonnegative nondecreasing function G : ℝ₊ → ℝ₊ such that

lim_{t↑∞} G(t)/t = +∞

and

sup_n E[G(|X_n|)] < ∞ .

Proof. Take ε > 0, let M = sup_n E[G(|X_n|)] and a = M/ε. Take c large enough so that G(t)/t ≥ a for t ≥ c. Then

∫_{{|X_n|>c}} |X_n| dP ≤ (1/a) E[ G(|X_n|) 1_{{|X_n|>c}} ] ≤ M/a = ε . □
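With G(t) = t², the criterion separates two families exactly (pure-Python sketch; the two-point distributions are illustrative choices):

```python
# X_n = n with probability 1/n^2 (else 0): E[G(|X_n|)] = 1 for all n, and the
# tails E[|X_n| 1_{|X_n|>c}] equal 1/n for n > c (0 otherwise), hence vanish
# uniformly in n as c grows: the family is uniformly integrable.
# Y_n = n with probability 1/n (else 0): E[|Y_n|] = 1 but E[G(|Y_n|)] = n is
# unbounded, and the tail sup stays at 1: the family is not uniformly integrable.
def tail_X(n, c):                  # E[|X_n| 1_{|X_n| > c}]
    return n * (1.0 / n ** 2) if n > c else 0.0

def tail_Y(n, c):                  # E[|Y_n| 1_{|Y_n| > c}]
    return n * (1.0 / n) if n > c else 0.0

ns = range(1, 10_001)
sup_tail_X = {c: max(tail_X(n, c) for n in ns) for c in (10, 100, 1000)}
sup_tail_Y = {c: max(tail_Y(n, c) for n in ns) for c in (10, 100, 1000)}
```

Note that the uniform bound sup_n tail ≤ M/c = 1/c of the proof is visible in the computed values.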

Theorem 4.1.7 Let {X_n}_{n≥1} be a sequence of integrable random variables, and let X be a random variable. The following are equivalent:

(a) {X_n}_{n≥1} is uniformly integrable and X_n → X in probability as n → ∞.

(b) X is integrable and X_n → X in L¹ as n → ∞.

Proof. We first prove (b) ⇒ (a). Since X_n → X in L¹, the sequence {X_n}_{n≥1} is a Cauchy sequence of L¹, that is, lim_{m,n↑∞} E[|X_m − X_n|] = 0. Therefore, given ε > 0 there is an integer N(ε) such that E[|X_m − X_n|] ≤ ε when m, n ≥ N(ε). From this and the inequality ∫_A |X_n| dP ≤ ∫_A |X_m| dP + ∫ |X_m − X_n| dP we have that, for all A ∈ F,

sup_n ∫_A |X_n| dP ≤ sup_{m≤N(ε)} ∫_A |X_m| dP + ε .

Since the finite family {X_n}_{n≤N(ε)} is uniformly integrable, sup_n ∫ |X_n| dP < ∞ and sup_n ∫_A |X_n| dP ≤ 2ε for sufficiently small P(A). This implies uniform integrability of {X_n}_{n≥1} (Theorem 4.1.5). It remains to prove that X_n → X in probability. This follows from Markov's inequality:

P(|X_n − X| ≥ ε) ≤ (1/ε) E[|X_n − X|] → 0  as n → ∞ .

We now prove (a) ⇒ (b). First we show that X_n → X in probability and uniform integrability of {X_n}_{n≥1} imply that X is integrable. Taking a subsequence if necessary, we can assume that X_n → X a.s., and therefore, by Fatou's lemma, E[|X|] ≤ lim inf E[|X_n|] ≤ sup E[|X_n|] < ∞. Now

∫ |X_n − X| dP = ∫_{{|X_n−X|≤ε}} |X_n − X| dP + ∫_{{|X_n−X|>ε}} |X_n − X| dP
             ≤ ε + ∫_{{|X_n−X|>ε}} |X_n| dP + ∫_{{|X_n−X|>ε}} |X| dP ,

and therefore, since P(|X_n − X| > ε) → 0, the uniform integrability of {X_n}_{n≥1} and the integrability of X give

lim sup_n ∫ |X_n − X| dP ≤ ε .

Since ε > 0 is otherwise arbitrary, we have that

lim_n ∫ |X_n − X| dP = 0 . □

4.2 Independence
4.2.1 From Fubini to independence

Definition 4.2.1 Two events A and B are said to be independent events if

P(A ∩ B) = P(A)P(B) .    (4.9)

More generally, a family {A_i}_{i∈I} of events, where I is an arbitrary index set, is called an independent family of events if for every finite subset J ⊂ I,

P( ∩_{j∈J} A_j ) = ∏_{j∈J} P(A_j) .

Two random variables X : (Ω, F) → (E, E) and Y : (Ω, F) → (G, G) are called independent random variables if for all C ∈ E, D ∈ G,

P({X ∈ C} ∩ {Y ∈ D}) = P(X ∈ C) P(Y ∈ D) .    (4.10)

More generally, a family {X_i}_{i∈I}, where I is an arbitrary index set, of random variables X_i : (Ω, F) → (E_i, E_i), i ∈ I, is called an independent family of random variables if for every finite subset J ⊂ I,

P( ∩_{j∈J} {X_j ∈ C_j} ) = ∏_{j∈J} P(X_j ∈ C_j)

for all C_j ∈ E_j (j ∈ J).


4.2. INDEPENDENCE 73

The following is an immediate consequence of the definition of independent random variables.

Theorem 4.2.1 If the random variables X and Y, taking their values in (E, E) and (G, G) respectively, are independent, then so are φ(X) and ψ(Y), where φ : (E, E) → (E′, E′), ψ : (G, G) → (G′, G′).

Proof. For all C′ ∈ E′, D′ ∈ G′, the sets C = φ⁻¹(C′) and D = ψ⁻¹(D′) are in E and G respectively, since φ and ψ are measurable. We have

P(φ(X) ∈ C′, ψ(Y) ∈ D′) = P(X ∈ C, Y ∈ D) = P(X ∈ C) P(Y ∈ D) = P(φ(X) ∈ C′) P(ψ(Y) ∈ D′) . □

The above result is stated for two random variables for simplicity, and it extends in the obvious way to
a finite number of independent random variables.

The independence of two random variables X and Y is equivalent to the factorisation of their joint
distribution:
Q(X,Y ) = QX × QY ,
where Q(X,Y ) , QX , and QY are the distributions of (X, Y ), X, and Y , respectively. Indeed, for all sets
of the form C × D, where C ∈ E, D ∈ G,

Q(X,Y ) (C × D) = P ((X, Y ) ∈ C × D)
= P (X ∈ C, Y ∈ D)
= P (X ∈ C)P (Y ∈ D)
= QX (C)QY (D),

and this implies that Q(X,Y ) is the product measure of QX and QY .

In particular, the Fubini–Tonelli theorem gives immediately a result that we have already seen several
times in particular cases: the product formula for expectations (Formula (4.11) below).

Theorem 4.2.2 Let the random variables X and Y, taking their values in (E, E) and (G, G) respectively, be independent, and let g : (E, E) → (ℝ, B), h : (G, G) → (ℝ, B) be such that either one of the following two conditions is satisfied:

(i) E[|g(X)|] < ∞ and E[|h(Y)|] < ∞; or

(ii) g ≥ 0 and h ≥ 0.

Then

E[g(X)h(Y)] = E[g(X)] E[h(Y)] .    (4.11)
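On a finite space the product formula (4.11) is an exact identity of finite sums (Python/NumPy sketch; the distributions and test functions are illustrative choices):

```python
import numpy as np

# Independent discrete X and Y: the joint distribution is the outer product
# of the marginals, so E[g(X) h(Y)] factorizes into E[g(X)] E[h(Y)].
x_vals, px = np.array([0.0, 1.0, 2.0]), np.array([0.2, 0.5, 0.3])
y_vals, py = np.array([-1.0, 1.0]), np.array([0.4, 0.6])

g = lambda x: x ** 2 + 1.0
h = lambda y: np.exp(y)

joint = np.outer(px, py)                  # Q_{(X,Y)} = Q_X x Q_Y
lhs = float(np.sum(joint * np.outer(g(x_vals), h(y_vals))))   # E[g(X) h(Y)]
rhs = float(np.sum(px * g(x_vals)) * np.sum(py * h(y_vals)))  # E[g(X)] E[h(Y)]
```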

4.2.2 Conditional expectation


Definition 4.2.2 Let Y and X = (X₁, …, X_N) be, respectively, an integrable random variable and an arbitrary random vector. A version of the conditional expectation of Y given X is any integrable random variable of the form g(X), where g : ℝᴺ → ℝ is measurable and such that

E[Y φ(X)] = E[g(X) φ(X)]    (4.12)

for all bounded measurable functions φ : ℝᴺ → ℝ.

Theorem 4.2.3 Let Y and X be as above. There exists at least one version of the conditional expectation
of Y given X, g(X), and it is essentially unique, that is, if g 0 (X) is another version of the conditional
expectation of Y given X, then

g(X) = g 0 (X), P -a.s.

Proof. We omit the proof of existence, without qualms, since in all applications we need to compute the conditional expectation, and in the process we prove existence. To prove uniqueness, we first observe that in view of (4.12),

0 = E[g(X)φ(X)] − E[g′(X)φ(X)] = E[(g(X) − g′(X))φ(X)]

for all bounded measurable φ : ℝᴺ → ℝ. In particular, with φ(x) = 1_{{g(x)>g′(x)}},

E[(g(X) − g′(X)) 1_{{g(X)>g′(X)}}] = 0 .

Since the random variable in the expectation is nonnegative, it can have a null expectation only if it is P-a.s. null, that is, if P-a.s., g(X) ≤ g′(X). By symmetry, P-a.s., g(X) ≥ g′(X), and therefore, as announced, g(X) = g′(X), P-a.s. □

The symbols E[Y |X], or E X [Y ], represent any version of the conditional expectation g(X) of Y given
X. There is no problem in representing two versions of this conditional expectation by the same symbol,
since, as we just saw, they are P -almost surely equal. From now on we say: E X [Y ] (or E[Y |X]) is the
conditional expectation of Y given X. The defining equality (4.12) reads

E[Y ϕ(X)] = E[E[Y |X]ϕ(X)].

Although we have not yet proven the existence of the conditional expectation in the general case, in
many practical situations, we can directly exhibit a version of it. In the following theorems we retrieve
results of Section ??.

Theorem 4.2.4 Let X be a positive integer-valued random variable. Then, for any integrable random variable Y,

E^X[Y] = ∑_{n=1}^∞ ( E[Y 1_{{X=n}}] / P(X = n) ) 1_{{X=n}} ,    (4.13)

where, by convention, E[Y 1_{{X=n}}]/P(X = n) = 0 when P(X = n) = 0 (in other terms, the sum in (4.13) is over all n such that P(X = n) > 0).

Proof. We must verify (4.12) for all bounded measurable φ : ℝ → ℝ. The right-hand side is equal to

E[ ( ∑_{n≥1} (E[Y 1_{{X=n}}]/P(X=n)) 1_{{X=n}} ) ( ∑_{k≥1} φ(k) 1_{{X=k}} ) ]
= E[ ∑_{n≥1} (E[Y 1_{{X=n}}]/P(X=n)) φ(n) 1_{{X=n}} ]
= ∑_{n≥1} (E[Y 1_{{X=n}}]/P(X=n)) φ(n) E[1_{{X=n}}]
= ∑_{n≥1} (E[Y 1_{{X=n}}]/P(X=n)) φ(n) P(X=n)
= ∑_{n≥1} E[Y 1_{{X=n}}] φ(n)
= ∑_{n≥1} E[Y 1_{{X=n}} φ(n)]
= E[ Y ( ∑_{n≥1} φ(n) 1_{{X=n}} ) ] = E[Y φ(X)] . □
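Formula (4.13) and the defining relation (4.12) can be verified exactly on a finite probability space (Python/NumPy sketch; the fair-die example is an illustrative choice):

```python
import numpy as np

# Fair die: Omega = {1,...,6}, P uniform, Y = face value, X = 1 on {1,2,3}
# and X = 2 on {4,5,6}.  Formula (4.13) gives g(1) = 2 and g(2) = 5, and
# g(X) satisfies E[Y phi(X)] = E[g(X) phi(X)] for any bounded phi.
omega = np.arange(1, 7)
P = np.full(6, 1.0 / 6.0)
Y = omega.astype(float)
X = np.where(omega <= 3, 1, 2)

g = {n: float(np.sum(Y * (X == n) * P) / np.sum((X == n) * P)) for n in (1, 2)}
gX = np.array([g[n] for n in X])

phi = lambda x: x ** 2                   # one bounded test function
lhs = float(np.sum(Y * phi(X) * P))      # E[Y phi(X)]
rhs = float(np.sum(gX * phi(X) * P))     # E[g(X) phi(X)]
```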

Theorem 4.2.5 Let Z and X be random vectors of dimensions p and n respectively, with joint probability density f_{Z,X}(z, x). Recall the definition of the conditional probability density f_{Z|X=x}(z):

f_{Z|X=x}(z) = f_{Z,X}(z, x) / f_X(x) ,

with the convention f_{Z|X=x}(z) = 0 when f_X(x) = 0. Let h : ℝ^{p+n} → ℝ be a measurable function, and suppose that the random variable Y = h(Z, X) is integrable. A version of the conditional expectation of Y = h(Z, X) given X is the random variable g(X), where

g(x) = ∫_{ℝ^p} h(z, x) f_{Z|X=x}(z) dz .

Proof. We first check that g(X) is integrable. We have

|g(x)| ≤ ∫_{ℝ^p} |h(z, x)| f_{Z|X=x}(z) dz ,

and therefore

E[|g(X)|] = ∫_{ℝⁿ} |g(x)| f_X(x) dx ≤ ∫_{ℝⁿ} ( ∫_{ℝ^p} |h(z, x)| f_{Z|X=x}(z) dz ) f_X(x) dx
         = ∫_{ℝ^p} ∫_{ℝⁿ} |h(z, x)| f_{Z|X=x}(z) f_X(x) dz dx
         = ∫_{ℝ^p} ∫_{ℝⁿ} |h(z, x)| f_{Z,X}(z, x) dz dx
         = E[|h(Z, X)|] = E[|Y|] < ∞ .



We must now check that (4.12) is true, where φ is bounded. The right-hand side is

E[g(X)φ(X)] = ∫_{ℝⁿ} g(x) φ(x) f_X(x) dx
           = ∫_{ℝⁿ} ( ∫_{ℝ^p} h(z, x) f_{Z|X=x}(z) dz ) φ(x) f_X(x) dx
           = ∫_{ℝ^p} ∫_{ℝⁿ} h(z, x) φ(x) f_{Z|X=x}(z) f_X(x) dz dx
           = ∫_{ℝ^p} ∫_{ℝⁿ} h(z, x) φ(x) f_{Z,X}(z, x) dz dx
           = E[h(Z, X)φ(X)] = E[Y φ(X)] . □

Going back to the case where the conditioning variable X is discrete, we shall mention the situation, often encountered in practice, where Z is a random vector of dimension p, and where X takes its values in ℕ₊, with

P(X = k) = π(k)

for all k ≥ 1, and

P(Z ∈ A | X = k) = ∫_A f_k(z) dz

for all k ≥ 1, all A ∈ B^p. Then, for any measurable function h : ℝ^p × ℕ₊ → ℝ such that E[|h(Z, X)|] < ∞, we have

E^X[h(Z, X)] = g(X) ,

where

g(k) = ∫_{ℝ^p} h(z, k) f_k(z) dz .

The proof is left to the reader, and is similar to the proof when (Z, X) has a joint probability density.

Example 2.1: Let $X$ be as above and let $Z$ be of the form
$$Z = X + \xi,$$
where $\xi$ is a random variable independent of $X$ with probability density $f_\xi$. Let $h : \mathbb{R} \to \mathbb{R}$ be such that $E[|h(Z)|] < \infty$. We shall compute $E^X[h(Z)] = g(X)$.

We have
$$g(k) = \int_{\mathbb{R}} h(z)\,f_k(z)\,dz,$$
where $f_k$ is defined by
$$\int_A f_k(z)\,dz = P(Z \in A \mid X = k),$$
that is,
$$\int_A f_k(z)\,dz = \frac{P(Z \in A, X = k)}{P(X = k)}
= \frac{P(k + \xi \in A, X = k)}{P(X = k)}
= \frac{P(k + \xi \in A)\,P(X = k)}{P(X = k)}$$
$$= P(k + \xi \in A) = P(\xi \in A - k) = \int_A f_\xi(z - k)\,dz.$$
Therefore
$$f_k(z) = f_\xi(z - k)$$
4.2. INDEPENDENCE 77

and
$$g(k) = \int_{\mathbb{R}} h(z)\,f_\xi(z - k)\,dz = \int_{\mathbb{R}} h(z + k)\,f_\xi(z)\,dz,$$
that is,
$$g(k) = E[h(\xi + k)].$$
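A quick Monte Carlo sanity check of this conclusion (hypothetical choices: $\xi$ standard normal, $h(z) = z^2$), for which $g(k) = E[(\xi + k)^2] = 1 + k^2$:

```python
import numpy as np

# Hypothetical numerical check of g(k) = E[h(xi + k)]: take xi ~ N(0,1) and
# h(z) = z^2, so g(k) = E[(xi + k)^2] = 1 + k^2; compare with the empirical
# mean of h(Z) on the event {X = k}.
rng = np.random.default_rng(1)
N = 300_000
X = rng.integers(1, 4, size=N)            # X uniform on {1, 2, 3}
xi = rng.standard_normal(N)               # xi independent of X
Z = X + xi

for k in range(1, 4):
    emp = (Z[X == k] ** 2).mean()         # empirical E[h(Z) | X = k]
    assert abs(emp - (1 + k * k)) < 0.15
```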

We now treat the second mixed case, where the conditioning variable $X$ is a random vector of dimension $n$ and $Z$ is an $\mathbb{N}_+$-valued random variable, with the joint distribution of $(Z,X)$ given by
$$P(Z = k) = \pi(k)$$
for all $k \ge 1$, and
$$P(X \in A \mid Z = k) = \int_A f_k(x)\,dx$$
for all $k \ge 1$ and all $A \in \mathcal{B}^n$. For all $k \ge 1$, $x \in \mathbb{R}^n$, we define
$$\pi_{Z|X}(k|x) = \frac{\pi(k)\,f_k(x)}{f_X(x)}$$
if $f_X(x) = \sum_{k\ge1}\pi(k)f_k(x) > 0$, and $\pi_{Z|X}(k|x) = 0$ otherwise. We let the reader verify that for all $h : \mathbb{N}_+ \times \mathbb{R}^n \to \mathbb{R}$ such that $E[|h(Z,X)|] < \infty$,
$$E^X[h(Z,X)] = g(X),$$
where
$$g(x) = \sum_{k\ge1} h(k,x)\,\pi_{Z|X}(k|x).$$

Exercise 4.4 is recommended since it features a result that cannot be obtained as an application of
Theorems 4.2.4 or 4.2.5. It requires coming back to the definition of conditional expectation.

Properties of conditional expectations

We shall give the main rules that are useful in computing conditional expectations.

Theorem 4.2.6 Let $X$ be a random vector of dimension $n$, let $Y, Y_1, Y_2$ be integrable random variables, and let $\lambda_1, \lambda_2 \in \mathbb{R}$.

Rule 1. (linearity)
$$E^X[\lambda_1 Y_1 + \lambda_2 Y_2] = \lambda_1 E^X[Y_1] + \lambda_2 E^X[Y_2]$$

Rule 2. If $Y$ is independent of $X$, then
$$E^X[Y] = E[Y]$$

Rule 3. If $h : \mathbb{R}^n \to \mathbb{R}$ is a measurable function such that $h(X)$ is integrable,
$$E^X[h(X)] = h(X)$$

Proof.

Rule 1: Let $g_1(X) = E^X[Y_1]$, $g_2(X) = E^X[Y_2]$. We must show that, for all bounded measurable $\varphi : \mathbb{R}^n \to \mathbb{R}$,
$$E[(\lambda_1 g_1(X) + \lambda_2 g_2(X))\varphi(X)] = E[(\lambda_1 Y_1 + \lambda_2 Y_2)\varphi(X)].$$

This follows immediately from the definition of $g_i(X)$, which says that $E[g_i(X)\varphi(X)] = E[Y_i\varphi(X)]$, $i = 1, 2$.

Rule 2: We have to check that $E[g(X)\varphi(X)] = E[Y\varphi(X)]$ with $g(X) = E[Y]$. In this case $E[g(X)\varphi(X)] = E[E[Y]\varphi(X)] = E[Y]E[\varphi(X)]$, and since $Y$ and $\varphi(X)$ are independent, $E[Y]E[\varphi(X)] = E[Y\varphi(X)]$.

Rule 3: We must check that $E[Y\varphi(X)] = E[h(X)\varphi(X)]$, which is a tautology since $Y = h(X)$. $\square$

Theorem 4.2.7 Let $X$ be a random vector and let $Y_1$ and $Y_2$ be two integrable random variables such that $Y_1 \le Y_2$, $P$-a.s. Then
$$E^X[Y_1] \le E^X[Y_2], \quad P\text{-a.s.} \qquad (4.14)$$

Proof. For any bounded non-negative measurable $\varphi : \mathbb{R}^n \to \mathbb{R}$,
$$E[E^X[Y_1]\varphi(X)] = E[Y_1\varphi(X)] \le E[Y_2\varphi(X)] = E[E^X[Y_2]\varphi(X)].$$
Therefore
$$E[(E^X[Y_2] - E^X[Y_1])\varphi(X)] \ge 0.$$
Taking $\varphi(X) = 1_{\{E^X[Y_2] < E^X[Y_1]\}}$, we see that this implies (4.14). $\square$

In particular, if $Y$ is a non-negative integrable random variable,
$$E^X[Y] \ge 0, \quad P\text{-a.s.}$$
Theorem 4.2.8 Let $X$ be a random vector of dimension $n$ and let $Y$ be an integrable random variable of the form $Y = v(X)Z$, where $v : \mathbb{R}^n \to \mathbb{R}$ is a measurable bounded function and $Z$ is an integrable random variable. Then
$$E^X[v(X)Z] = v(X)E^X[Z].$$

Proof. We must show that the right-hand side is a version of $E^X[v(X)Z]$, that is, we must prove that for all bounded measurable $\varphi : \mathbb{R}^n \to \mathbb{R}$,
$$E[v(X)Z\varphi(X)] = E[v(X)E^X[Z]\varphi(X)].$$
But since $v(x)\varphi(x)$ is bounded, by definition of $E^X[Z]$,
$$E[v(X)E^X[Z]\varphi(X)] = E[v(X)Z\varphi(X)]. \qquad\square$$

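In the discrete case the rule reduces to $E[v(X)Z \mid X = k] = v(k)\,E[Z \mid X = k]$, which the following sketch (a hypothetical example, not from the text) checks empirically:

```python
import numpy as np

# Hypothetical Monte Carlo check of the "pulling out what is known" rule:
# with discrete X, compare E[v(X) Z | X = k] against v(k) E[Z | X = k].
rng = np.random.default_rng(4)
N = 200_000
X = rng.integers(0, 3, size=N)             # X uniform on {0, 1, 2}
Z = np.sin(X) + rng.standard_normal(N)     # Z integrable
v = lambda x: x ** 2                       # bounded on the range of X

for k in range(3):
    lhs = (v(X) * Z)[X == k].mean()        # E[v(X) Z | X = k]
    rhs = v(k) * Z[X == k].mean()          # v(k) E[Z | X = k]
    assert abs(lhs - rhs) < 1e-9
```

On the event $\{X = k\}$ the factor $v(X)$ is the constant $v(k)$, so the two sides agree up to floating-point rounding.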

Theorem 4.2.9 Let $X$ be a random vector, and let $\{Y_n\}_{n\ge1}$ be a sequence of non-negative integrable random variables that is $P$-a.s. non-decreasing and converges $P$-a.s. to the integrable random variable $Y$. Then $\{E^X[Y_n]\}_{n\ge1}$ is a $P$-a.s. non-decreasing sequence of random variables that converges $P$-a.s. to $E^X[Y]$.

Proof. Let $g_n(X)$ be a version of $E^X[Y_n]$. By the monotonicity of conditional expectation, $\{g_n(X)\}_{n\ge1}$ is a $P$-a.s. non-decreasing sequence. In particular, there exists a $P$-a.s. limit $g(X)$ of this sequence, and by monotone convergence, for any bounded non-negative measurable $\varphi : \mathbb{R}^n \to \mathbb{R}$,
$$\lim_{n\uparrow\infty} E[Y_n\varphi(X)] = E[Y\varphi(X)], \qquad
\lim_{n\uparrow\infty} E[g_n(X)\varphi(X)] = E[g(X)\varphi(X)],$$
and therefore, since $E[Y_n\varphi(X)] = E[g_n(X)\varphi(X)]$ for all $n \ge 1$,
$$E[Y\varphi(X)] = E[g(X)\varphi(X)].$$
In particular, with $\varphi \equiv 1$ we see that $g(X)$ is integrable. The equality holds for any bounded measurable $\varphi : \mathbb{R}^n \to \mathbb{R}$ (since it holds for $\varphi^+$ and $\varphi^-$). Therefore $g(X)$ is a version of $E^X[Y]$, and we have shown that $\lim_{n\uparrow\infty} E^X[Y_n] = E^X[Y]$. $\square$

4.3 The theory of characteristic functions and weak convergence
4.3.1 Paul Lévy’s inversion formula
Theorem 4.3.1 Let $X$ be a real random variable with cumulative distribution function $F$ and characteristic function $\varphi$. Then for any pair of points $a, b$ ($a < b$) at which $F$ is continuous, we have (this is Paul Lévy's inversion formula)
$$F(b) - F(a) = \lim_{c\uparrow+\infty} \frac{1}{2\pi}\int_{-c}^{+c} \frac{e^{-iua} - e^{-iub}}{iu}\,\varphi(u)\,du. \qquad (4.15)$$

Proof. We have
$$\Phi_c := \frac{1}{2\pi}\int_{-c}^{+c} \frac{e^{-iua} - e^{-iub}}{iu}\,\varphi(u)\,du
= \frac{1}{2\pi}\int_{-c}^{+c} \frac{e^{-iua} - e^{-iub}}{iu}\Big(\int_{-\infty}^{+\infty} e^{iux}\,dF(x)\Big)du$$
$$= \int_{-\infty}^{+\infty}\Big(\frac{1}{2\pi}\int_{-c}^{+c} \frac{e^{-iua} - e^{-iub}}{iu}\,e^{iux}\,du\Big)dF(x)
= \int_{-\infty}^{+\infty} \Psi_c(x)\,dF(x),$$
where
$$\Psi_c(x) := \frac{1}{2\pi}\int_{-c}^{+c} \frac{e^{-iua} - e^{-iub}}{iu}\,e^{iux}\,du.$$
In the above computations we have applied Fubini's theorem, which is allowed since, observing that
$$\Big|\frac{e^{-iua} - e^{-iub}}{iu}\Big| = \Big|\int_a^b e^{-iux}\,dx\Big| \le b - a,$$
we have
$$\int_{-c}^{+c}\int_{-\infty}^{+\infty}\Big|\frac{e^{-iua} - e^{-iub}}{iu}\,e^{iux}\Big|\,dF(x)\,du
= \int_{-c}^{+c}\int_{-\infty}^{+\infty}\Big|\frac{e^{-iua} - e^{-iub}}{iu}\Big|\,dF(x)\,du$$
$$\le \int_{-c}^{+c}\int_{-\infty}^{+\infty}(b - a)\,dF(x)\,du = 2c(b - a) < \infty.$$

Observe that
$$\Psi_c(x) = \frac{1}{2\pi}\int_{-c}^{+c} \frac{\sin u(x - a) - \sin u(x - b)}{u}\,du
= \frac{1}{2\pi}\int_{-c(x-a)}^{+c(x-a)} \frac{\sin u}{u}\,du - \frac{1}{2\pi}\int_{-c(x-b)}^{+c(x-b)} \frac{\sin u}{u}\,du.$$
The function $(A, B) \mapsto \int_A^B \frac{\sin u}{u}\,du$ is uniformly continuous in $(A, B)$ and tends to $\int_{-\infty}^{+\infty} \frac{\sin u}{u}\,du = \pi$ as $B \uparrow +\infty$ and $A \downarrow -\infty$. Therefore the function $\Psi_c$ is uniformly bounded. Moreover, by the above expression for $\Psi_c$,
$$\lim_{c\uparrow\infty} \Psi_c(x) = \Psi(x),$$

where
$$\Psi(x) = \begin{cases} 0 & \text{if } x < a \text{ or } x > b, \\ \tfrac{1}{2} & \text{if } x = a \text{ or } x = b, \\ 1 & \text{if } a < x < b. \end{cases}$$
Therefore, by dominated convergence, denoting by $Q_X$ the distribution of $X$,
$$\lim_{c\uparrow\infty} \Phi_c = \lim_{c\uparrow\infty}\int_{-\infty}^{+\infty} \Psi_c(x)\,dF(x)
= \int_{-\infty}^{+\infty} \Psi(x)\,dF(x)
= \int_{-\infty}^{+\infty} \Psi(x)\,Q_X(dx)$$
$$= Q_X((a, b)) + \tfrac{1}{2}Q_X(\{a\}) + \tfrac{1}{2}Q_X(\{b\})$$
$$= \big(F(b-) - F(a)\big) + \tfrac{1}{2}\big(F(a) - F(a-)\big) + \tfrac{1}{2}\big(F(b) - F(b-)\big)
= \frac{F(b) + F(b-)}{2} - \frac{F(a) + F(a-)}{2},$$
which equals $F(b) - F(a)$ since $F$ is continuous at $a$ and $b$. $\square$

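The formula lends itself to a numerical sanity check. The sketch below (hypothetical parameters) takes $X$ standard normal, with $\varphi(u) = e^{-u^2/2}$, $a = -1$, $b = 1$, and compares the truncated integral with $F(1) - F(-1)$:

```python
import numpy as np

# Numerical sketch of (4.15) (hypothetical check): for X ~ N(0,1) the
# characteristic function is phi(u) = exp(-u^2/2); with a = -1, b = 1 the
# truncated inversion integral should approach F(1) - F(-1) ~ 0.682689.
a, b, c = -1.0, 1.0, 40.0
u = np.linspace(-c, c, 200_001)
du = u[1] - u[0]
mid = u.size // 2                          # index of the grid point u = 0

with np.errstate(divide="ignore", invalid="ignore"):
    kernel = (np.exp(-1j * u * a) - np.exp(-1j * u * b)) / (1j * u)
kernel[mid] = b - a                        # continuous extension at u = 0

approx = (kernel * np.exp(-u ** 2 / 2)).sum().real * du / (2 * np.pi)
assert abs(approx - 0.6826894921) < 1e-4
```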
The Fourier inversion formula can be obtained as a corollary of Lévy's theorem:

Theorem 4.3.2 If $X$ as in Theorem 4.3.1 admits a probability density $f$, and if moreover the characteristic function is integrable,
$$\int_{-\infty}^{+\infty} |\varphi(u)|\,du < \infty,$$
then
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \varphi(u)\,e^{-iux}\,du. \qquad (4.16)$$

Proof. With $f$ defined as in (4.16), we have, by Fubini,
$$\int_a^b f(x)\,dx = \int_a^b \frac{1}{2\pi}\int_{-\infty}^{+\infty} \varphi(u)\,e^{-iux}\,du\,dx
= \frac{1}{2\pi}\int_{-\infty}^{+\infty} \varphi(u)\Big(\int_a^b e^{-iux}\,dx\Big)du$$
$$= \lim_{c\uparrow\infty} \frac{1}{2\pi}\int_{-c}^{+c} \varphi(u)\Big(\int_a^b e^{-iux}\,dx\Big)du
= \lim_{c\uparrow\infty} \frac{1}{2\pi}\int_{-c}^{+c} \frac{e^{-iua} - e^{-iub}}{iu}\,\varphi(u)\,du
= F(b) - F(a),$$
by Theorem 4.3.1. $\square$
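As a numerical illustration (hypothetical parameters, not from the text), the inversion integral with $\varphi(u) = e^{-u^2/2}$ should recover the standard normal density:

```python
import numpy as np

# Sketch of (4.16) (hypothetical check): phi(u) = exp(-u^2/2) is integrable,
# and the inversion integral should return exp(-x^2/2)/sqrt(2 pi).
u = np.linspace(-30.0, 30.0, 120_001)
du = u[1] - u[0]
phi = np.exp(-u ** 2 / 2)

for x in (0.0, 0.5, 1.0):
    f_x = (phi * np.exp(-1j * u * x)).sum().real * du / (2 * np.pi)
    assert abs(f_x - np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)) < 1e-6
```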

4.3.2 Bochner's theorem

The characteristic function $\varphi$ of a (real) random variable $X$ has the following two properties:

A. It is continuous at $0$, and

B. It is non-negative definite, in the sense that for all integers $n$, all $u_1, \ldots, u_n \in \mathbb{R}$, and all $z_1, \ldots, z_n \in \mathbb{C}$,
$$\sum_{j=1}^n\sum_{k=1}^n \varphi(u_j - u_k)\,z_j z_k^* \ge 0. \qquad (4.17)$$

For the proof of (4.17), just observe that the left-hand side is equal to
$$E\Big[\Big|\sum_{j=1}^n z_j\,e^{iu_j X}\Big|^2\Big].$$

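Property B can be examined numerically. The sketch below (a hypothetical check, not from the text) builds the matrix $[\varphi(u_j - u_k)]$ for the normal characteristic function and verifies that it is positive semidefinite:

```python
import numpy as np

# Hypothetical numerical look at Property B: for phi(u) = exp(-u^2/2) the
# matrix M[j, k] = phi(u_j - u_k) is symmetric positive semidefinite, and
# the quadratic form in (4.17) is non-negative for any complex z.
rng = np.random.default_rng(2)
u = rng.uniform(-5.0, 5.0, size=8)
M = np.exp(-((u[:, None] - u[None, :]) ** 2) / 2)    # M[j, k] = phi(u_j - u_k)

assert np.linalg.eigvalsh(M).min() > -1e-10          # all eigenvalues >= 0

z = rng.standard_normal(8) + 1j * rng.standard_normal(8)
quad = (np.conj(z) @ M @ z).real                     # the double sum in (4.17)
assert quad >= -1e-10
```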
It turns out that Properties A and B characterize characteristic functions. Before we state the corresponding result (Bochner's theorem), we give some consequences of Property B that we shall need later.

Lemma 4.3.1 Let $\varphi : \mathbb{R} \to \mathbb{C}$ be a function satisfying Property B. Then for all $u \in \mathbb{R}$,
$$\varphi(-u) = \varphi(u)^* \quad\text{and}\quad |\varphi(u)| \le \varphi(0). \qquad (4.18)$$
If moreover $\varphi$ is continuous at $0$, it is uniformly continuous on $\mathbb{R}$, and for any continuous function $z : \mathbb{R} \to \mathbb{C}$ and any $A \ge 0$,
$$\int_0^A\int_0^A \varphi(u - v)\,z(u)z^*(v)\,du\,dv \ge 0. \qquad (4.19)$$

Proof. In (4.17) let $n = 1$, $u_1 = 0$, $z_1 = 1$, to obtain $\varphi(0) \ge 0$. Let now $n = 2$, $u_1 = 0$, $u_2 = u$, $z_1 = z_2 = 1$, to obtain
$$2\varphi(0) + \varphi(u) + \varphi(-u) \ge 0,$$
and let $n = 2$, $u_1 = 0$, $u_2 = u$, $z_1 = 1$, $z_2 = i$, to obtain
$$2\varphi(0) + i\varphi(u) - i\varphi(-u) \ge 0.$$
We deduce from the above two inequalities that $\varphi(u) + \varphi(-u)$ is real and $\varphi(u) - \varphi(-u)$ is purely imaginary, and therefore that $\varphi(u)^* = \varphi(-u)$.

Now with $n = 2$ and $u_1 = u$, $u_2 = 0$, in order that (4.17) hold for all $z_1, z_2 \in \mathbb{C}$, it is necessary that the determinant of
$$\begin{pmatrix} \varphi(0) & \varphi(u) \\ \varphi(-u) & \varphi(0) \end{pmatrix}$$
be non-negative, that is, taking the results $\varphi(0) \ge 0$ and $\varphi(-u) = \varphi(u)^*$ into account, $|\varphi(u)| \le \varphi(0)$. In particular, if $\varphi(0) = 0$, then $\varphi$ is identically null, so that we can discard this trivial case, for which the theorem is obviously true, and assume that $\varphi(0) > 0$.

With $n = 3$ and $u_1 = 0$, $u_2 = u$, $u_3 = u + v$, in order that (4.17) hold for all $z_1, z_2, z_3 \in \mathbb{C}$, it is necessary that the determinant of
$$\begin{pmatrix} \varphi(0) & \varphi(-u) & \varphi(-u-v) \\ \varphi(u) & \varphi(0) & \varphi(-v) \\ \varphi(u+v) & \varphi(v) & \varphi(0) \end{pmatrix}$$
be non-negative. This leads to (letting $\varphi(0) = 1$, without loss of generality)
$$1 + \varphi(u)^*\varphi(v)^*\varphi(u+v) + \varphi(u+v)^*\varphi(u)\varphi(v) - |\varphi(u)|^2 - |\varphi(v)|^2 - |\varphi(u+v)|^2 \ge 0,$$

that is,
$$1 + 2\,\mathrm{Re}\{\varphi(u)\varphi(v)\varphi(u+v)^*\} - |\varphi(u)|^2 \ge |\varphi(v)|^2 + |\varphi(u+v)|^2,$$
and, subtracting $2\,\mathrm{Re}\{\varphi(u)\varphi(u+v)^*\}$ from both sides and rearranging,
$$1 + 2\,\mathrm{Re}\{\varphi(u)\varphi(u+v)^*(\varphi(v) - 1)\} - |\varphi(v)|^2 \ge |\varphi(u+v) - \varphi(u)|^2.$$
Therefore, since $|\varphi(u)\varphi(u+v)^*| \le 1$ and $1 - |\varphi(v)|^2 \le 2|1 - \varphi(v)|$,
$$|\varphi(u+v) - \varphi(u)|^2 \le 1 - |\varphi(v)|^2 + 2\,|1 - \varphi(v)| \le 4\,|1 - \varphi(v)|,$$
from which, by continuity at $0$, follows the uniform continuity of $\varphi$.

As the integrand in (4.19) is continuous, the integral is the limit as $n \uparrow \infty$ of
$$\frac{A^2}{4^n}\sum_{j=1}^{2^n}\sum_{k=1}^{2^n} \varphi\Big(\frac{A(j-k)}{2^n}\Big)\,z\Big(\frac{Aj}{2^n}\Big)z\Big(\frac{Ak}{2^n}\Big)^*,$$
a non-negative quantity. $\square$

Theorem 4.3.3 Let $\varphi : \mathbb{R} \to \mathbb{C}$ be a function satisfying Properties A and B. Then there exist a non-negative finite constant $K$ and a real random variable $X$ such that for all $u \in \mathbb{R}$,
$$\varphi(u) = K\,E\big[e^{iuX}\big]. \qquad (4.20)$$

Proof. We may assume $\varphi(0) > 0$ (otherwise $\varphi \equiv 0$ and the result is trivial). From (4.19) we have that
$$g(x, A) := \frac{1}{2\pi A}\int_0^A\int_0^A \varphi(u - v)\,e^{-ix(u-v)}\,du\,dv \ge 0.$$

Changing variables, we obtain for $g(x, A)$ the alternative expression
$$g(x, A) = \frac{1}{2\pi}\int_{-A}^{+A}\Big(1 - \frac{|u|}{A}\Big)\varphi(u)\,e^{-iux}\,du
= \frac{1}{2\pi}\int_{-\infty}^{+\infty} h\Big(\frac{u}{A}\Big)\varphi(u)\,e^{-iux}\,du, \qquad (4.21)$$
where
$$h(u) = \begin{cases} 1 - |u| & \text{if } |u| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Let $M > 0$. We have
$$\int_{-\infty}^{+\infty} h\Big(\frac{x}{2M}\Big)g(x, A)\,dx
= \frac{1}{2\pi}\int_{-\infty}^{+\infty} h\Big(\frac{u}{A}\Big)\varphi(u)\Big(\int_{-\infty}^{+\infty} h\Big(\frac{x}{2M}\Big)e^{-iux}\,dx\Big)du$$
$$= \frac{M}{\pi}\int_{-\infty}^{+\infty} h\Big(\frac{u}{A}\Big)\varphi(u)\Big(\frac{\sin Mu}{Mu}\Big)^2 du
\le \varphi(0)\,\frac{1}{\pi}\int_{-\infty}^{+\infty}\Big(\frac{\sin u}{u}\Big)^2 du = \varphi(0).$$
Since, by monotone convergence,
$$\lim_{M\uparrow\infty}\int_{-\infty}^{+\infty} h\Big(\frac{x}{2M}\Big)g(x, A)\,dx = \int_{-\infty}^{+\infty} g(x, A)\,dx,$$
we have that
$$\int_{-\infty}^{+\infty} g(x, A)\,dx \le \varphi(0).$$

The function $g(\cdot, A)$ is therefore integrable, and by (4.21) it is obtained from the integrable function $h(u/A)\varphi(u)$ by the transformation (4.16). Therefore we have the Fourier inversion formula
$$h\Big(\frac{u}{A}\Big)\varphi(u) = \int_{-\infty}^{+\infty} g(x, A)\,e^{iux}\,dx.$$
In particular, with $u = 0$, $\int_{-\infty}^{+\infty} g(x, A)\,dx = \varphi(0)$. Therefore $f(x, A) := \frac{g(x, A)}{\varphi(0)}$ is the probability density of some real random variable with characteristic function $\frac{h(u/A)}{\varphi(0)}\varphi(u)$. But
$$\lim_{A\uparrow\infty} \frac{h(u/A)\,\varphi(u)}{\varphi(0)} = \frac{\varphi(u)}{\varphi(0)}.$$
From the fundamental criterion of convergence in distribution, we deduce that, since the limit is continuous at $0$, $\varphi(u)/\varphi(0)$ is a characteristic function, say $\varphi(u)/\varphi(0) = E[e^{iuX}]$, and (4.20) holds with $K = \varphi(0)$. $\square$
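The construction in the proof can be visualized numerically. The sketch below (hypothetical parameters: $\varphi(u) = e^{-u^2/2}$ and $A = 10$) computes $g(x, A)$ on a grid and checks that it is non-negative and integrates approximately to $\varphi(0) = 1$:

```python
import numpy as np

# Numerical sketch of the proof's construction (hypothetical parameters):
# g(x, A) = (1/2pi) * integral of h(u/A) phi(u) e^{-iux} du should be a
# non-negative function whose total integral is phi(0) = 1.
A = 10.0
u = np.linspace(-A, A, 2001)
du = u[1] - u[0]
weight = (1 - np.abs(u) / A) * np.exp(-u ** 2 / 2)   # h(u/A) * phi(u)

x = np.linspace(-40.0, 40.0, 2001)
dx = x[1] - x[0]
g = (np.exp(-1j * np.outer(x, u)) * weight).sum(axis=1).real * du / (2 * np.pi)

assert g.min() > -1e-8                 # g(., A) >= 0, as (4.19) predicts
assert abs(g.sum() * dx - 1.0) < 5e-3  # integral close to phi(0) = 1
```

The small tolerance on the integral accounts for the truncation of the $x$-range and the discretization of both integrals.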

4.3.3 The characteristic function criterion of convergence in distribution
Let $E$ be a metric space with distance $d$, and let $\mathcal{E}$ be its Borel sigma-field (generated by the open sets). For any set $A \subseteq E$, $\partial A$ denotes its boundary (equal to its closure $\mathrm{clos}\,A$ minus its interior $\mathrm{int}\,A$). For any set $A \subseteq E$ and $x \in E$,
$$d(x, A) := \inf\{d(x, y)\,;\, y \in A\}$$
is the distance from $x$ to $A$. Let $\varepsilon > 0$ and $A \in \mathcal{E}$. We shall make use of the function
$$f_\varepsilon(x) = g\Big(\frac{1}{\varepsilon}\,d(x, A)\Big),$$
where
$$g(v) = \begin{cases} 1 & \text{if } v \le 0, \\ 1 - v & \text{if } 0 \le v \le 1, \\ 0 & \text{if } v \ge 1. \end{cases}$$
Also, define
$$A^\varepsilon = \{x \in E\,;\, d(x, A) < \varepsilon\},$$
and observe that $A^\varepsilon \downarrow \mathrm{clos}\,A$ as $\varepsilon \downarrow 0$.

Let $P$ and $\{P_n\}_{n\ge1}$ be probability measures on $(E, \mathcal{E})$.

Definition 4.3.1 The sequence $\{P_n\}_{n\ge1}$ is said to converge weakly to $P$ ($P_n \stackrel{w}{\to} P$) if for all bounded continuous functions $f : E \to \mathbb{R}$,
$$\int_E f(x)\,P_n(dx) \to \int_E f(x)\,P(dx). \qquad (4.22)$$

The following technical theorem is the basis for the theory of weak convergence.

Theorem 4.3.4 The following statements are equivalent:

(i) $P_n \stackrel{w}{\to} P$;

(ii) for any closed set $A \subseteq E$, $\limsup_n P_n(A) \le P(A)$;

(iii) for any open set $A \subseteq E$, $\liminf_n P_n(A) \ge P(A)$;

(iv) for any continuity set $A \subseteq E$ (that is, such that $P(\partial A) = 0$), $\lim_n P_n(A) = P(A)$.

Proof. (i) ⇒ (ii): Let $A$ be closed. We have
$$P_n(A) = \int_E 1_A(x)\,P_n(dx) \le \int_E f_\varepsilon(x)\,P_n(dx),$$
and therefore, since $f_\varepsilon$ is bounded and continuous,
$$\limsup_{n\uparrow\infty} P_n(A) \le \limsup_{n\uparrow\infty} \int_E f_\varepsilon(x)\,P_n(dx)
= \int_E f_\varepsilon(x)\,P(dx) \le P(A^\varepsilon).$$
Letting $\varepsilon \downarrow 0$ and noting that $A^\varepsilon \downarrow A$ ($A$ is closed), we obtain (ii).

(ii) ⇒ (iii) and (iii) ⇒ (ii): take the complements of the sets thereof.

(iii) ⇒ (iv): Since $P(\partial A) = 0$, we have, by (ii) and (iii) (which are equivalent),
$$\limsup_{n\uparrow\infty} P_n(A) \le \limsup_{n\uparrow\infty} P_n(\mathrm{clos}\,A) \le P(\mathrm{clos}\,A) = P(A)$$
and
$$\liminf_{n\uparrow\infty} P_n(A) \ge \liminf_{n\uparrow\infty} P_n(\mathrm{int}\,A) \ge P(\mathrm{int}\,A) = P(A),$$
thereby proving (iv).

(iii) ⇒ (i): Let $f$ be bounded and continuous. It suffices to prove that
$$\limsup_n \int_E f(x)\,P_n(dx) \le \int_E f(x)\,P(dx), \qquad (4.23)$$
since applying (4.23) to $-f$ gives $\liminf_n \int_E f(x)\,P_n(dx) \ge \int_E f(x)\,P(dx)$, and therefore (i) is true. By a linear transformation of $f$ we can assume that $0 < f < 1$. For a given integer $k$, define the closed sets $F_i := \{x \in E\,;\, f(x) \ge i/k\}$ ($0 \le i \le k$), so that $F_0 = E$, $F_k = \emptyset$ and $F_{i-1}\setminus F_i = \{x\,;\,(i-1)/k \le f(x) < i/k\}$. Since $0 < f < 1$,
$$\sum_{i=1}^k \frac{i-1}{k}\big(P(F_{i-1}) - P(F_i)\big) \le \int_E f(x)\,P(dx) \le \sum_{i=1}^k \frac{i}{k}\big(P(F_{i-1}) - P(F_i)\big),$$
or, after rearrangement,
$$\frac{1}{k}\sum_{i=1}^k P(F_i) \le \int_E f(x)\,P(dx) \le \frac{1}{k} + \frac{1}{k}\sum_{i=1}^k P(F_i).$$
The same holds for $P_n$, so that, applying (ii) (which is equivalent to (iii)) to the closed sets $F_i$,
$$\limsup_n \int_E f(x)\,P_n(dx) \le \frac{1}{k} + \frac{1}{k}\sum_{i=1}^k \limsup_n P_n(F_i)
\le \frac{1}{k} + \frac{1}{k}\sum_{i=1}^k P(F_i)
\le \frac{1}{k} + \int_E f(x)\,P(dx).$$

Letting $k \uparrow \infty$, we obtain (4.23).

(iv) ⇒ (ii): Let $F$ be a closed set. Since
$$\partial\{x \in E\,;\, d(x, F) \le \delta\} \subseteq \{x \in E\,;\, d(x, F) = \delta\},$$
the boundaries of the sets on the left-hand side are disjoint for distinct values of $\delta$, and therefore at most a countable number of them have positive probability. Therefore, for some sequence $\delta_k \downarrow 0$, the sets $F_k := \{x \in E\,;\, d(x, F) \le \delta_k\}$ are continuity sets, so that for each $k$,
$$\limsup_n P_n(F) \le \lim_n P_n(F_k) = P(F_k).$$
Since $F$ is closed, $F_k \downarrow F$ and therefore, letting $k \uparrow \infty$,
$$\limsup_n P_n(F) \le \lim_k P(F_k) = P(F). \qquad\square$$
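A concrete instance of weak convergence (a hypothetical illustration on $E = \mathbb{R}$, not from the text): $\mathrm{Bin}(n, \lambda/n)$ converges weakly to $\mathrm{Poisson}(\lambda)$, so integrals of a bounded continuous $f$ against the two laws approach each other:

```python
import math

# Hypothetical illustration of Definition 4.3.1: Binomial(n, 2/n) converges
# weakly to Poisson(2), so the integral of the bounded continuous function
# f(x) = cos x against Bin(n, 2/n) approaches its Poisson(2) value.
# (Both sums are truncated at k = 60; the neglected tails are negligible.)
lam = 2.0

def binom_int(n):
    p = lam / n
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) * math.cos(k)
               for k in range(min(n, 60) + 1))

poisson_int = sum(math.exp(-lam) * lam**k / math.factorial(k) * math.cos(k)
                  for k in range(61))

assert abs(binom_int(2000) - poisson_int) < 0.01
assert abs(binom_int(4000) - poisson_int) < abs(binom_int(100) - poisson_int)
```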

4.3.4 Hilbert space of square integrable functions


In this subsection, we translate into a probabilistic setting the results on the Hilbert space $L^2$.

The space $L^2(P)$ is the space of complex random variables $X$ such that
$$E\big[|X|^2\big] < \infty,$$
where two variables $X$ and $X'$ such that $P(X = X') = 1$ are not distinguished, with the inner product
$$\langle X, Y\rangle = E[XY^*].$$
The norm of a random variable $X$ is
$$\|X\| = E\big[|X|^2\big]^{1/2},$$
and the distance between two variables $X$ and $Y$ is
$$d(X, Y) = E\big[|X - Y|^2\big]^{1/2}.$$

In $L^2(P)$, Schwarz's inequality reads:
$$|E[XY^*]| \le E\big[|X|^2\big]^{1/2}\,E\big[|Y|^2\big]^{1/2}.$$

The completeness property of $L^2(P)$ reads, in this case, as follows. If $\{X_n\}_{n\ge1}$ is a sequence of random variables in $L^2(P)$ such that
$$\lim_{m,n\uparrow\infty} E\big[|X_m - X_n|^2\big] = 0,$$
then there exists a random variable $X \in L^2(P)$ such that
$$\lim_{n\uparrow\infty} E\big[|X - X_n|^2\big] = 0.$$
In probability theory, one then says that the sequence $\{X_n\}_{n\ge1}$ converges in quadratic mean to $X$.

In the Hilbert space $L^2(P)$, Theorem 2.1.3 reads: let $\{X_n\}_{n\ge1}$ and $\{Y_n\}_{n\ge1}$ be sequences of random variables of $L^2(P)$ that converge in quadratic mean to the random variables $X$ and $Y$ respectively. Then
$$\lim_{m,n\uparrow\infty} E[X_n Y_m^*] = E[XY^*].$$
In particular (with $Y_n \equiv X_n$),
$$\lim_{n\uparrow\infty} E\big[|X_n|^2\big] = E\big[|X|^2\big],$$
and (with $Y_n \equiv 1$),
$$\lim_{n\uparrow\infty} E[X_n] = E[X].$$
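A standard example of convergence in quadratic mean (hypothetical numbers, not from the text): the sample mean of iid variables converges in quadratic mean to the common mean, with $E[|S_n/n - m|^2] = \sigma^2/n \to 0$:

```python
import numpy as np

# Hypothetical example of quadratic-mean convergence: for iid X_i with mean m
# and variance s^2, the sample mean satisfies E[|S_n/n - m|^2] = s^2/n.
rng = np.random.default_rng(3)
m, s, reps = 1.5, 2.0, 20_000

for n in (10, 100):
    samples = m + s * rng.standard_normal((reps, n))
    msq = ((samples.mean(axis=1) - m) ** 2).mean()   # estimates E[|S_n/n - m|^2]
    assert abs(msq - s * s / n) < 0.1 * s * s / n
```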

4.4 Exercises

Exercise 4.1.
♥ Let $g : \mathbb{R}_+ \to \mathbb{R}$ and let $G : \mathbb{R}_+ \to \mathbb{R}$ be its primitive function, that is, for all $x \ge 0$,
$$G(x) = G(0) + \int_0^x g(u)\,du.$$

(a) Let $X$ be a non-negative random variable with finite mean $\mu$ and such that $E[G(X)] < \infty$. Show that
$$E[G(X)] = G(0) + \int_0^\infty g(x)\,P(X \ge x)\,dx.$$

(b) Let $X$ be as in (a), and let $\bar{X}$ be a non-negative random variable with the probability density
$$\mu^{-1}\,P(X \ge x).$$
Show that
$$E\big[e^{iu\bar{X}}\big] = \frac{E\big[e^{iuX}\big] - 1}{i\mu u}.$$

Exercise 4.2.
Let $\{X_n\}_{n\ge1}$ be a sequence of identically distributed integrable real random variables with common mean $m$. Let
$$S_n := \sum_{i=1}^n X_i.$$
Suppose that for some constant $K < \infty$,
$$E\Big[\Big(\sum_{i=1}^n (X_i - m)\Big)^4\Big] \le K n^2.$$
(This is the condition prevailing in the proof of Borel's SLLN in Section ??.) Show that
$$E\Big[\sum_{n=1}^\infty \Big(\frac{S_n}{n} - m\Big)^4\Big] < \infty.$$
Deduce from this the SLLN for the sequence $\{X_n\}_{n\ge1}$.

Exercise 4.3.
Let $X$ be a non-negative random variable. Prove that
$$\lim_{\theta\uparrow\infty} E\big[e^{-\theta X}\big] = P(X = 0).$$

Exercise 4.4.
♥ Let $X$ be a real random variable with probability density $f_X(x)$. Let $h : \mathbb{R} \to \mathbb{R}$ be a measurable function such that $h(X)$ is integrable. Prove that
$$E[h(X)\,|\,X^2] = h\big(\sqrt{X^2}\big)\,\frac{f_X(\sqrt{X^2})}{f_X(\sqrt{X^2}) + f_X(-\sqrt{X^2})} + h\big(-\sqrt{X^2}\big)\,\frac{f_X(-\sqrt{X^2})}{f_X(\sqrt{X^2}) + f_X(-\sqrt{X^2})}.$$

Exercise 4.5.
Let $X$ be an integrable real random variable. Define $X^+ = \max(X, 0)$, $X^- = \max(-X, 0)$. Prove that
$$E[X\,|\,X^+] = X^+ - \frac{E[X^-]}{P(X^+ = 0)}\,1_{\{X^+ = 0\}}.$$

Exercise 4.6.
Let $X$ be a real random variable whose characteristic function $\psi$ is such that $|\psi(t_0)| = 1$ for some $t_0 \ne 0$. Show that there exists some $a \in \mathbb{R}$ such that
$$\sum_{n\in\mathbb{Z}} P\Big(X = a + n\,\frac{2\pi}{t_0}\Big) = 1.$$
