Notes: Hilbert Spaces, Fourier Transform
Pierre Brémaud
1 Integration 5
1.1 The Lebesgue integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.1 Measurable functions and measures . . . . . . . . . . . . . . . . . . 6
1.1.2 Construction of the integral . . . . . . . . . . . . . . . . . . . . . . 11
1.2 The big results of integration theory . . . . . . . . . . . . . . . . . . . . 15
1.2.1 Dominated convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.2 Fubini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.3 Radon–Nikodym . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 The Riesz–Fischer theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.1 The Lp spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 Hölder’s and Minkowski’s Inequalities . . . . . . . . . . . . . . . . 22
1.3.3 Completeness of Lp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 Hilbert spaces 29
2.1 Basic definitions and properties . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1.1 Inner product and Schwarz’s inequality . . . . . . . . . . . . . . . 29
2.1.2 Continuity of the inner product . . . . . . . . . . . . . . . . . . . . 31
2.1.3 The L2 spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Projections and isometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.1 The projection principle . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.2 Hilbert space isometries . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3 Orthonormal Expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.1 Orthonormal Systems and Bessel’s inequality . . . . . . . . . . . 36
2.3.2 Complete orthonormal systems . . . . . . . . . . . . . . . . . . . . . 37
3 Fourier analysis 41
3.1 Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.1.1 Fourier transform in L1 . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.1.2 Fourier Transform in L2 . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.1 Fourier series in L1loc . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.2 The Poisson summation formula . . . . . . . . . . . . . . . . . . . . 54
3.2.3 Fourier Series in L2loc . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2.4 The Sampling Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4 Probability 67
4.1 Expectation as integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.2 Distribution of a random element . . . . . . . . . . . . . . . . . . . 69
4.1.3 The Lebesgue theorems for expectation . . . . . . . . . . . . . . . 69
4.1.4 Uniform integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.1 From Fubini to independence . . . . . . . . . . . . . . . . . . . . . . 72
4.2.2 Conditional expectation . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3 The theory of characteristic functions and weak convergence . . . . 79
4.3.1 Paul Lévy’s inversion formula . . . . . . . . . . . . . . . . . . . . . 79
4.3.2 Bochner's theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.3 The characteristic function criterion of convergence in distribution . . 83
4.3.4 Hilbert space of square integrable functions . . . . . . . . . . . 85
4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Chapter 1
Integration
Introduction
The reader is familiar with the Riemann integral. The latter, however, has a few weak points when
compared to the Lebesgue integral. For instance:
(1) The class of Riemann-integrable functions is not large enough. Indeed, some functions have an
“obvious” integral, and Riemann’s integration theory denies it, while Lebesgue’s theory recognizes it
(see Example 1.10).
(2) The stability properties of the Riemann integral under the limit operation are too weak. Indeed,
it often happens that the pointwise limit of nonnegative Riemann-integrable functions is not Riemann integrable,
whereas the pointwise limit of nonnegative Lebesgue-measurable functions is always Lebesgue-measurable, so that its integral is always defined (possibly infinite).
(3) The Riemann integral is defined with respect to the Lebesgue measure (the “volume” in ℝⁿ), whereas
the Lebesgue integral can be defined with respect to a general abstract measure, a probability for instance.
This last advantage should convince the student to invest a little of her time in order to understand the
essentials of the Lebesgue integral, because the return is considerable. Indeed, the Lebesgue integral of the
function f with respect to the measure µ (to be defined in the present chapter), modestly denoted by

∫_X f (x) µ(dx),

contains a surprising variety of mathematical objects: the usual Lebesgue integral on the line,

∫_ℝ f (x) dx,

the sum of a series, which can also be regarded (with profit) as a Lebesgue integral with respect to the counting measure on ℤ, the Stieltjes–Lebesgue integral

∫_ℝ f (x) dF (x),

and the expectation

E[Z]

of a random variable Z: all are particular cases of the Lebesgue integral. For the reader who is reluctant
to give up the expertise dearly acquired in the Riemann integral, it suffices to say that any Riemann-integrable function is also Lebesgue-integrable and that both integrals then coincide.
Is Lebesgue’s theory hard to learn and understand? In fact most of the results thereof are very natural,
and do not disturb the intuition acquired by the practice of the Riemann integral. But the Lebesgue
integral is much easier to manipulate correctly than the Riemann integral. The tedious (although not
difficult) part is maybe the step by step construction of the Lebesgue integral. However, omitting the
details and just giving a summary of the main steps is usually not a cause of frustration for the reader
interested in the Lebesgue integral as a tool for applications. The really difficult part of measure theory
is the proof of existence of certain measures, but one usually does not mind admitting such results. For
instance there is an existence theorem for the Lebesgue measure ℓ (the “length”) on ℝ. It says: There
exists a unique measure ℓ on ℝ that gives to the intervals [a, b] the measure b − a. Of course, in order
to understand what all the fuss is about, and what kind of mathematical subtleties hide behind such a
harmless statement, we shall have to be more precise about the meaning of “measure”. But when this is
done, the statement is of the kind that one is ready to approve, although its proof is not immediate. Of
course, in this chapter, the proofs of such “obvious” results are systematically omitted, because the goal
is practical: to provide the reader with a powerful tool, and to give a few tips as to how to use it safely.
Someone with no previous knowledge of integration theory will therefore be in the situation of the new
recipient of a driving license. Experience is best acquired on the road, and the main text contains many
opportunities for the reader to apply the rules that we shall now briefly review.
Definition 1.1.1 A sigma-field on X is a collection X of subsets of X such that:
(α) X ∈ X ;
(β) A ∈ X =⇒ Ā ∈ X (where Ā is the complement of A);
(γ) An ∈ X for all n ∈ ℕ =⇒ ∪_{n=0}^∞ An ∈ X .
Example 1.1: Two extremes. Two extremal examples of sigma-fields on X are the gross sigma-field
X = {∅, X}, and the trivial sigma-field X = P(X).
Definition 1.1.2 The sigma-field generated by a nonempty collection of subsets C ⊆ P(X) is, by defi-
nition, the smallest sigma-field on X containing all the sets in C. It is denoted by σ(C).
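On a finite set X, σ(C) can be computed by brute force, since closure under complement and pairwise union already yields a sigma-field there (countable unions reduce to finite ones); a minimal sketch, with illustrative names not taken from the text:

```python
from itertools import combinations

def generated_sigma_field(X, C):
    """Smallest collection of subsets of the finite set X that contains
    every set in C and is closed under complement and union."""
    X = frozenset(X)
    sigma = {frozenset(), X} | {frozenset(c) for c in C}
    changed = True
    while changed:                      # iterate the closure until stable
        changed = False
        for A in list(sigma):
            comp = X - A                # closure under complement
            if comp not in sigma:
                sigma.add(comp)
                changed = True
        for A, B in combinations(list(sigma), 2):
            u = A | B                   # closure under (finite) union
            if u not in sigma:
                sigma.add(u)
                changed = True
    return sigma

# sigma({{a}}) on X = {a, b, c} consists of the 4 sets ∅, {a}, {b, c}, X
sf = generated_sigma_field({'a', 'b', 'c'}, [{'a'}])
```

The loop terminates because the power set of a finite X is finite; on an infinite X no such enumeration exists, which is why sigma-fields are usually handled through their generators.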
Recall that a set O ⊆ ℝⁿ is called open if for any x ∈ O, one can find a nonempty open ball centered at x
and entirely contained in O.
1.1. THE LEBESGUE INTEGRAL 7
Definition 1.1.3 Let X be a topological space and let O be the collection of open sets defining the
topology. The sigma-field B(X) = σ(O) is called the Borel sigma-field on X associated with the given
topology. A set B ∈ B(X) is called a Borel set of X.
Theorem 1.1.1 B(ℝⁿ) is also generated by the collection C of all rectangles of the type ∏_{i=1}^n (−∞, ai ],
where ai ∈ Q for all i ∈ {1, . . . , n}.
For n = 1 one writes B(ℝ) = B. For I = ∏_{j=1}^n Ij , where Ij is a general interval of ℝ (I is then called a
generalized rectangle of ℝⁿ), the Borel sigma-field B(I) on I consists of all the Borel sets contained in I.
Definition 1.1.4 Let (X, X ) and (E, E) be two measurable spaces. A function f : X → E is said to be
a measurable function with respect to X and E if f −1 (C) ∈ X for all C ∈ E.
It seems difficult to prove measurability since most sigma-fields are not defined explicitly (see the definition of B(ℝⁿ) for instance). However, the following result renders the task feasible.
Theorem 1.1.2 Let (X, X ) and (E, E) be two measurable spaces, where E = σ(C) for some collection C
of subsets of E. Then f : (X, X ) → (E, E) is measurable if and only if f −1 (C) ∈ X for all C ∈ C.
Two immediate applications of this result are:
Corollary 1.1.1 Let X and E be two topological spaces with respective Borel sigma-fields B(X) and
B(E). Any continuous function f : X → E is measurable with respect to B(X) and B(E).
Corollary 1.1.2 Let (X, X ) be a measurable space and let n ≥ 1 be an integer. Then f = (f1 , . . . , fn ) :
(X, X ) → (ℝⁿ, Bⁿ) if and only if for all i, 1 ≤ i ≤ n, {fi ≤ ai } ∈ X for all ai ∈ Q.
Theorem 1.1.3 Let (X, X ), (Y, Y) and (E, E) be three measurable spaces, and let φ : (X, X ) → (Y, Y),
g : (Y, Y) → (E, E). Then g ◦ φ : (X, X ) → (E, E).
Proof. This follows immediately from the definition of measurability; see Exercise 4.3.
The next result shows that the collection of Borel functions is stable under the “usual” operations.
Theorem 1.1.4 (i) Let f, g : (X, X ) → (ℝ, B), and let λ ∈ ℝ. Then f g, f + g, λf and (f /g)1_{g≠0} are Borel
functions.
(ii) Let fn : (X, X ) → (ℝ, B), n ∈ ℕ. Then lim inf_{n↑∞} fn and lim sup_{n↑∞} fn are Borel functions,
and the set

{lim sup_{n↑∞} fn = lim inf_{n↑∞} fn } = {∃ lim_{n↑∞} fn }

is in X .
(See Exercise 4.4 which gives an idea of how (ii) of Theorem 1.1.4 can be proven.)
The two following results are technical tools, known as monotone class theorems (MCT). They owe their
importance to the fact that sigma-fields are often defined in terms of their generators. The first monotone
class theorem is useful in proving that a given collection of sets S contains a given sigma-field (We shall
use it in a typical way for Theorem 1.1.8).
Theorem 1.1.5 Let S be a collection of subsets of X satisfying the three following properties:
(a) X ∈ S;
(b) A, B ∈ S and A ⊆ B =⇒ B − A ∈ S;
(c) {An }n≥1 nondecreasing, An ∈ S for all n ≥ 1 =⇒ ∪_{n=1}^∞ An ∈ S.
If S contains a collection C of sets that is stable under finite intersections, then S contains σ(C).
The second monotone class theorem concerns a collection H of finite (resp., bounded) real-valued functions on X and a collection C of subsets of X stable under finite intersections, such that:
(1) H is a vector space containing the constant functions and the indicators 1C for all C ∈ C;
(2) If {fn }n≥1 is a nondecreasing sequence of nonnegative functions of H such that f = supn fn is finite
(resp., bounded) then f ∈ H.
Then H contains all finite (resp., bounded) real-valued functions on X which are σ(C)-measurable.
Definition 1.1.5 Let (X, X ) be a measurable space and let µ : X → [0, ∞] be a set function such that
for any denumerable family {An }n≥1 of mutually disjoint sets in X ,

µ(∪_{n=1}^∞ An ) = Σ_{n=1}^∞ µ(An ).    (1.1)

The set function µ is called a measure on (X, X ), and (X, X , µ) is called a measure space.
Property (1.1) is the sigma-additivity property. Several further properties are easy to check; for instance,

µ(∅) = 0.    (1.2)
Example 1.2: The Dirac measure. Let a ∈ X. The measure δa defined by δa (C) = 1C (a) is the
Dirac measure at a ∈ X. The set function µ : X → [0, ∞] defined by

µ(C) = Σ_{i=0}^∞ αi 1C (ai ),

where αi ∈ ℝ+ for all i ∈ ℕ, is a measure, denoted µ = Σ_{i=0}^∞ αi δ_{ai}.
Example 1.3: Weighted counting measure. Let {αn }n≥1 be a sequence of non-negative numbers.
The set function µ : P(ℕ) → [0, ∞] defined by µ(C) = Σ_{n∈C} αn is a measure on (ℕ, P(ℕ)). If αn ≡ 1, µ is the counting measure ν on ℕ.
Example 1.4: Lebesgue measure. There exists one and only one measure ℓ on (ℝ, B) such that ℓ([a, b]) = b − a for all intervals [a, b]. This measure is called the Lebesgue measure on ℝ. (Note that the statement of this example is in fact
a theorem, which is part of a more general result, Theorem 1.1.8 below.)
Definition 1.1.6 Let µ be a measure on (X, X ). If µ(X) < ∞ the measure µ is called a finite measure.
If µ(X) = 1 the measure µ is called a probability measure. If there exists a sequence {Kn }n≥1 of X such
that µ(Kn ) < ∞ for all n ≥ 1, and ∪_{n=1}^∞ Kn = X, the measure µ is called a sigma-finite measure. A
measure µ on (ℝⁿ, Bⁿ) such that µ(C) < ∞ for all bounded Borel sets C is called a Radon (or locally
bounded) measure.
Example 1.5: The Dirac measure δa is a probability measure. The counting measure ν on ℤ is a sigma-finite measure that is not finite.
Theorem 1.1.7 Let (X, X , µ) be a measure space. Let {An }n≥1 be a non-decreasing (that is, An ⊆ An+1
for all n ≥ 1) sequence of X . Then

µ(∪_{n=1}^∞ An ) = lim_{n↑∞} ↑ µ(An ).    (1.6)

Let {Bn }n≥1 be a non-increasing (that is, Bn+1 ⊆ Bn for all n ≥ 1) sequence of X such that µ(Bn0 ) < ∞
for some n0 ∈ ℕ. Then

µ(∩_{n=1}^∞ Bn ) = lim_{n↓∞} ↓ µ(Bn ).    (1.7)
Proof. We shall prove (1.6). This equality follows directly from sigma-additivity since

µ(An ) = µ(A1 ) + Σ_{i=1}^{n−1} µ(Ai+1 − Ai )

and

µ(∪_{n=1}^∞ An ) = µ(A1 ) + Σ_{i=1}^∞ µ(Ai+1 − Ai ).
The necessity of the condition µ(Bn0 ) < ∞ for some n0 is illustrated by the following counter-example.
Let ν be the counting measure on ℤ and for all n ≥ 1 define Bn = {i ∈ ℤ : |i| ≥ n}. Then ν(Bn ) = +∞ for all n ≥ 1, whereas ∩_{n=1}^∞ Bn = ∅ and therefore ν(∩_{n=1}^∞ Bn ) = 0 ≠ lim_{n↓∞} ν(Bn ).
Theorem 1.1.8 Let F : ℝ → ℝ be a c.d.f. There exists a unique measure µ on (ℝ, B) such that
Fµ = F .
This result is easily stated, but it is not trivial, even in the case of the Lebesgue measure (Example 1.4).
It is typical of the existence results which answer the following type of question: Let C be a collection of
subsets of X with C ⊆ X , where X is a sigma-field on X. Given a set function u : C → [0, ∞], does there
exist a measure µ on (X, X ) such that µ(C) = u(C) for all C ∈ C, and is it unique? Note, however, that
uniqueness is proven thanks to the MCT (Theorem 1.1.5). Indeed, suppose that there are two such
measures, µ and ν. Let S be the collection of sets in B that have the same µ- and ν-measures. S satisfies
the conditions (a,b,c) of Theorem 1.1.5, and S contains the class C of intervals (a, b] ⊆ ℝ. Therefore it
contains the sigma-field generated by C, that is B.
Definition 1.1.8 Let (X, X , µ) be a measure space. A µ-negligible set is a set contained in a measurable
set N ∈ X such that µ(N ) = 0. One says that some property P relative to the elements x ∈ X holds
µ-almost everywhere (µ-a.e.) if the set {x ∈ X : x does not satisfy P} is a µ-negligible set.
For instance, if f and g are two Borel functions defined on X, the expression
f ≤ g µ-a.e.
means that
µ({x : f (x) > g(x)}) = 0.
Example 1.7: The rationals are Lebesgue-negligible. Any singleton {a}, a ∈ ℝ, is a Borel
set of Lebesgue measure 0. The set of rationals Q is a Borel set of Lebesgue measure 0. Proof: The
Borel sigma-field B is generated by the intervals Ia = (−∞, a], a ∈ ℝ (Theorem 1.1.1), and therefore
{a} = ∩_{n≥1} (Ia − Ia−1/n ) is also in B. Denoting ℓ the Lebesgue measure, ℓ(Ia − Ia−1/n ) = 1/n, and
therefore ℓ({a}) = lim_{n↑∞} ℓ(Ia − Ia−1/n ) = 0. Q is a countable union of sets in B (singletons) and is
therefore in B. It has Lebesgue measure 0 as a countable union of sets of Lebesgue measure 0.
Theorem 1.1.10 If two continuous functions f, g : ℝ → ℝ are ℓ-a.e. equal, they are everywhere equal.
Proof. Let t ∈ ℝ be such that f (t) ≠ g(t). For any c > 0, there exists s ∈ [t − c, t + c] such that
f (s) = g(s) (otherwise, the set {t : f (t) ≠ g(t)} would contain the whole interval [t − c, t + c], and
therefore could not be of null Lebesgue measure). Therefore, one can construct a sequence {tn }n≥1
converging to t and such that f (tn ) = g(tn ) for all n ≥ 1. Letting n tend to ∞ and using continuity yields f (t) = g(t), a
contradiction.
The important tool for the construction of the integral is the approximation of non-negative measurable functions by
so-called simple Borel functions. The following result is the key to this construction:
Theorem 1.1.11 Let f : (X, X ) → (ℝ, B) be a non-negative Borel function. There exists a nondecreasing sequence {fn }n≥1 of nonnegative simple Borel functions that converges pointwise to f .
Proof. Take

fn (x) = Σ_{k=0}^{n2^n − 1} k2^{−n} 1_{Ak,n} (x),

where

Ak,n = {x ∈ X : k2^{−n} < f (x) ≤ (k + 1)2^{−n} }.
We leave to the reader to check that this sequence of functions has the announced properties.
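The announced properties can also be checked numerically; a small sketch of the displayed formula (the test function f is an arbitrary illustrative choice):

```python
def f(x):
    return x * x          # an arbitrary nonnegative Borel function, for testing

def f_n(n, x):
    """The dyadic approximation of the proof: f_n(x) = k*2^-n on A_{k,n},
    i.e. where k*2^-n < f(x) <= (k+1)*2^-n, for k = 0, ..., n*2^n - 1,
    and f_n(x) = 0 elsewhere (in particular where f(x) > n)."""
    v = f(x)
    for k in range(n * 2 ** n):
        if k * 2 ** -n < v <= (k + 1) * 2 ** -n:
            return k * 2 ** -n
    return 0.0

# at x = 1.3, f(x) = 1.69: the approximations increase to f(x) from below
values = [f_n(n, 1.3) for n in (1, 4, 8, 12)]
```

Once n exceeds f(x), each approximation sits within 2^−n below f(x), which is the pointwise convergence asserted by the theorem.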
STEP 1. For any non-negative simple Borel function f : (X, X ) → (ℝ, B) of the form (1.9), one defines
the integral of f with respect to µ by

∫_X f dµ = Σ_{i=1}^k ai µ(Ai ).    (1.10)

STEP 2. For a non-negative Borel function f : (X, X ) → (ℝ, B), one defines

∫_X f dµ = lim_{n↑∞} ↑ ∫_X fn dµ,    (1.11)

where {fn }n≥1 is a nondecreasing sequence of nonnegative simple Borel functions fn : (X, X ) → (ℝ, B)
such that lim_{n↑∞} ↑ fn = f . This definition can be shown to be consistent, in that the integral so defined
is independent of the choice of the approximating sequence. Note that the quantity (1.11) is non-negative
and can be infinite.
STEP 3. A Borel function f : (X, X ) → (ℝ, B) of arbitrary sign is decomposed as follows. Denoting

f+ = max(f, 0) and f− = max(−f, 0)

(and in particular f = f+ − f− and f± ≤ |f |), we therefore have

∫_X f± dµ ≤ ∫_X |f | dµ.

Thus, if

∫_X |f | dµ < ∞,    (1.12)

the quantity

∫_X f dµ = ∫_X f+ dµ − ∫_X f− dµ    (1.13)

is meaningful and defines the integral of f . Moreover, the integral of f with respect to
µ defined in this way is finite; a function satisfying (1.12) is called µ-integrable.
STEP 3 (ct’d). The integral can be defined for some non-integrable functions. For example, it is defined
for all non-negative functions. More generally, if f : (X, X ) → (ℝ, B) is such that at least one of the
integrals ∫_X f+ dµ or ∫_X f− dµ is finite, one defines

∫_X f dµ = ∫_X f+ dµ − ∫_X f− dµ.    (1.14)

This leads to one of the forms “finite minus finite”, “finite minus infinite”, and “infinite minus finite”.
The case which is rigorously excluded is that in which µ(f+ ) = µ(f− ) = +∞.
Example 1.8: Integral with respect to the weighted counting measure. It is not hard to
check that any function f : ℤ → ℝ is measurable with respect to P(ℤ) and B. With the measure
µ = Σ_{j∈ℤ} αj δj (αj ≥ 0), a non-negative f satisfies

∫_ℤ f dµ = Σ_{j∈ℤ} f (j)αj .

The proof is fairly simple: It suffices to consider the approximating sequence of simple functions

fn (k) = Σ_{j=−n}^{+n} f (j)1_{j} (k),

whose integral is

µ(fn ) = Σ_{j=−n}^{+n} f (j)µ({j}) = Σ_{j=−n}^{+n} f (j)αj .
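A truncated numerical sketch of this example (the geometric weights below are an illustrative choice, not from the text):

```python
# mu = sum_j alpha_j * delta_j on the integers, truncated to |j| <= N so the
# sums are finite; with alpha_j = 2^-|j| the total mass mu(Z) is close to 3.
N = 200
alpha = {j: 2.0 ** -abs(j) for j in range(-N, N + 1)}

def mu(f):
    """Integral of f with respect to the weighted counting measure."""
    return sum(f(j) * alpha[j] for j in alpha)

total_mass = mu(lambda j: 1.0)   # ~ 1 + 2 * (1/2 + 1/4 + ...) = 3
odd_part = mu(lambda j: j)       # ~ 0, by symmetry of the weights
```

The truncation at |j| ≤ N only matters up to a tail of order 2^−N; for genuinely infinite sums one needs the monotone convergence argument of the text.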
Example 1.9: Integral with respect to the Dirac measure. Let δa be the Dirac measure at
point a ∈ X. Then any f : (X, X ) → (ℝ, B) is δa -integrable, and

δa (f ) = f (a).

Indeed, for a non-negative function f and any non-decreasing sequence of simple non-negative Borel functions
{fn }n≥1 converging to f , δa (fn ) = fn (a) converges to f (a).
We give the elementary properties of the integral. First, recall that for all A ∈ X ,

∫_X 1A dµ = µ(A),    (1.15)

and that the notation ∫_A f dµ means ∫_X 1A f dµ.
For a complex Borel function f : X → ℂ (i.e. f = f1 + if2 , where f1 , f2 : (X, X ) → (ℝ, B)) such that
∫_X |f | dµ < ∞, one defines ∫_X f dµ = ∫_X f1 dµ + i ∫_X f2 dµ.
The extension to complex Borel functions of the properties (a), (b), (d) and (f) in Theorem 1.1.12 is
immediate.
The following result tells us that all the time spent in learning about the Riemann integral has not been
in vain.
Theorem 1.1.13 Let f : (ℝ, B) → (ℝ, B) be Riemann-integrable. Then it is Lebesgue-integrable with
respect to ℓ, and the Lebesgue integral is equal to the Riemann integral.
Example 1.10: Integrable for Lebesgue and not for Riemann. The converse is not true: The
function f defined by f (x) = 1 if x ∈ Q and f (x) = 0 if x ∉ Q is a Borel function, and it is Lebesgue
integrable with its integral equal to zero because {f ≠ 0}, that is Q, has ℓ-measure zero. However, f is
not Riemann integrable.
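A numerical caricature of this example (a sketch: exact rational sample points are modelled by the `Fraction` type, irrational ones by floats):

```python
from fractions import Fraction

def dirichlet(x):
    """1 on the rationals, 0 elsewhere; in this finite illustration the
    argument type (Fraction vs. float) stands in for (ir)rationality."""
    return 1.0 if isinstance(x, Fraction) else 0.0

n = 1000
# Riemann sums over [0, 1] depend on the choice of sample points:
riemann_rational = sum(dirichlet(Fraction(k, n)) for k in range(n)) / n
riemann_irrational = sum(dirichlet(k / n + 2 ** 0.5 / (3 * n)) for k in range(n)) / n
# the two tagged sums give 1 and 0 respectively, so the Riemann sums have no
# common limit, while the Lebesgue integral is 0 since Q has measure zero.
```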
We begin with the results giving conditions allowing one to interchange the order of limit and integration,
that is,

∫_X (lim_{n↑∞} fn ) dµ = lim_{n↑∞} ∫_X fn dµ.    (1.17)
The monotone convergence theorem below is sometimes called the Beppo Levi theorem.
Theorem 1.2.1 Let fn : (X, X ) → (ℝ, B), n ≥ 1, be non-negative and non-decreasing µ-a.e., and let f : (X, X ) → (ℝ, B) be such that

lim_{n↑∞} ↑ fn = f µ-a.e.

Then lim_{n↑∞} ↑ ∫_X fn dµ = ∫_X f dµ.
Theorem 1.2.2 Let fn : (X, X ) → (ℝ, B), n ≥ 1, be such that, for some function f : (X, X ) → (ℝ, B)
and some µ-integrable function g : (X, X ) → (ℝ, B), lim_{n↑∞} fn = f µ-a.e. and |fn | ≤ g µ-a.e. for all n ≥ 1. Then lim_{n↑∞} ∫_X fn dµ = ∫_X f dµ.
Theorem 1.2.3 (Fatou’s lemma) Let fn : (X, X ) → (ℝ, B), n ≥ 1, be such that fn ≥ 0 µ-a.e. for all n ≥ 1. Then

∫_X (lim inf_{n↑∞} fn ) dµ ≤ lim inf_{n↑∞} ∫_X fn dµ.    (1.18)
Example 2.2: Dominated convergence theorem for series. When the measure is the counting
measure on ℕ, the dominated convergence theorem takes the following form: Let {ank }n≥1,k≥1 be an array of real numbers such that, for some sequence {bk }k≥1 of nonnegative
numbers satisfying

Σ_{k=1}^∞ bk < ∞,

|ank | ≤ bk for all n, k, and suppose that lim_{n↑∞} ank = ak for all k ≥ 1. Then

lim_{n↑∞} Σ_{k=1}^∞ ank = Σ_{k=1}^∞ ak .
Proof. Let ε > 0 be fixed. Since Σ_{k=1}^∞ bk is a convergent series, one can find M = M (ε) such that
Σ_{k=M+1}^∞ bk < ε/3. Since |ank | ≤ bk and therefore |ak | ≤ bk , we have

Σ_{k=M+1}^∞ |ank | + Σ_{k=M+1}^∞ |ak | ≤ 2ε/3.

On the other hand, since lim_{n↑∞} ank = ak for each of the finitely many indices k ≤ M , for n large enough Σ_{k=1}^M |ank − ak | ≤ ε/3, and therefore |Σ_{k=1}^∞ ank − Σ_{k=1}^∞ ak | ≤ ε.
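A concrete instance of this example (the particular array is illustrative): ank = (1 − 1/n)^k 2^−k is dominated by the summable bk = 2^−k and converges to ak = 2^−k as n → ∞, so the row sums converge to Σk 2^−k = 2:

```python
def a(n, k):
    return (1.0 - 1.0 / n) ** k * 2.0 ** -k   # |a(n,k)| <= b_k = 2^-k, summable

def row_sum(n, K=200):
    """Sum over k of a(n, k); the tail beyond K = 200 is numerically negligible."""
    return sum(a(n, k) for k in range(K))

limit_sum = sum(2.0 ** -k for k in range(200))  # sum of the pointwise limits = 2
# row_sum(n) -> limit_sum as n grows, as dominated convergence predicts
```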
A very useful application of the dominated convergence theorem is the theorem of differentiation under
the integral sign. Let (X, X , µ) be a measure space and let (a, b) ⊆ ℝ. Let f : (a, b) × X → ℝ and, for
all t ∈ (a, b), define ft : X → ℝ by ft (x) = f (t, x). Assume that for all t ∈ (a, b), ft is measurable with
respect to X , and define, if possible, the function I : (a, b) → ℝ by the formula

I(t) = ∫_X f (t, x) µ(dx).    (1.19)
Theorem 1.2.4 Assume that for µ-almost all x the function t ↦ f (t, x) is continuous at t0 ∈ (a, b) and
that there exists a µ-integrable function g : (X, X ) → (ℝ, B) such that |f (t, x)| ≤ |g(x)| µ-a.e. for all t in
a neighbourhood V of t0 . Then I is well-defined and is continuous at t0 . If we furthermore assume that
(α) t ↦ f (t, x) is continuously differentiable on V for µ-almost all x; and
(β) for some µ-integrable function h : (X, X ) → (ℝ, B) and all t ∈ V , |(df /dt)(t, x)| ≤ |h(x)| µ-a.e.,
then I is differentiable at t0 , with derivative

I′(t0 ) = ∫_X (df /dt)(t0 , x) µ(dx).
Proof. Let {tn }n≥1 be a sequence in V \ {t0 } such that lim_{n↑∞} tn = t0 , and define fn (x) = f (tn , x),
f (x) = f (t0 , x). By the dominated convergence theorem, lim_{n↑∞} I(tn ) = I(t0 ), and this for any such sequence: I is therefore continuous at t0 . Also,

(I(tn ) − I(t0 ))/(tn − t0 ) = ∫_X ((f (tn , x) − f (t0 , x))/(tn − t0 )) µ(dx),

and for some θ ∈ (0, 1), possibly depending upon n,

|(f (tn , x) − f (t0 , x))/(tn − t0 )| ≤ |(df /dt) (t0 + θ(tn − t0 ), x)| ≤ |h(x)|,

so that the dominated convergence theorem applies again and yields the announced formula for I′(t0 ).
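A numerical sketch with I(t) = ∫₀^∞ e^−x cos(tx) dx = 1/(1 + t²): the dominating function x e^−x is integrable, so differentiation under the integral sign is legitimate and matches the closed-form derivative (the quadrature grid and the cutoff at 50 are illustrative choices):

```python
import math

# I(t) = ∫_0^∞ e^(-x) cos(tx) dx = 1/(1 + t^2); the partial t-derivative of
# the integrand, -x e^(-x) sin(tx), is dominated by the integrable x e^(-x).

def integrate(g, a=0.0, b=50.0, n=200_000):
    """Midpoint rule on [a, b]; the tail past b = 50 is of order e^-50."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

t0 = 1.0
I_prime = integrate(lambda x: -x * math.exp(-x) * math.sin(t0 * x))
closed_form = -2 * t0 / (1 + t0 ** 2) ** 2   # derivative of 1/(1+t^2) at t0
# I_prime agrees with closed_form (= -0.5 at t0 = 1) to quadrature accuracy
```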
The following result (called the image measure theorem) is especially important in Probability.
Definition 1.2.1 Let (X, X ) and (E, E) be two measurable spaces, let h : (X, X ) → (E, E) be a measurable function, and let µ be a measure on (X, X ). Define the set function µ ◦ h−1 : E → [0, ∞]
by

(µ ◦ h−1 )(C) = µ(h−1 (C)).

Then, one easily checks that µ ◦ h−1 is a measure on (E, E), called the image of µ by h.
Theorem 1.2.5 Let f : (E, E) → (ℝ, B). The following are equivalent:
(a) f ◦ h is µ-integrable;
(b) f is (µ ◦ h−1 )-integrable;
in which case

∫_E f (y) (µ ◦ h−1 )(dy) = ∫_X f (h(x)) µ(dx).
1.2.2 Fubini
Let (X1 , X1 , µ1 ) and (X2 , X2 , µ2 ) be two measure spaces where µ1 and µ2 are sigma-finite. Define the
product set X = X1 × X2 and the product sigma-field X = X1 × X2 , where by definition the latter is the
smallest sigma-field on X containing all sets of the form A1 × A2 , where A1 ∈ X1 , A2 ∈ X2 .
Theorem 1.2.6 There exists a unique measure µ on (X, X ), called the product measure of µ1 and µ2 and denoted µ1 ⊗ µ2 , such that

µ(A1 × A2 ) = µ1 (A1 )µ2 (A2 )

for all A1 ∈ X1 , A2 ∈ X2 .
The above result extends in an obvious manner to a finite number of sigma-finite measures.
Example 2.3: Lebesgue measure on ℝⁿ. The typical example of a product measure is the Lebesgue
measure on the space (ℝⁿ, Bⁿ): It is the unique measure ℓⁿ on that space such that
ℓⁿ(∏_{i=1}^n Ai ) = ∏_{i=1}^n ℓ(Ai ) for all A1 , . . . , An ∈ B.
Going back to the situation with two measure spaces (the case of a finite number of measure spaces is
similar) we have the following result:
Theorem 1.2.7 Let (X1 , X1 , µ1 ) and (X2 , X2 , µ2 ) be two measure spaces in which µ1 and µ2 are sigma-finite. Let (X, X , µ) = (X1 × X2 , X1 × X2 , µ1 ⊗ µ2 ), and let f : (X, X ) → (ℝ, B).
(A) Tonelli. If f is non-negative, then, for µ1 -almost all x1 , the function x2 → f (x1 , x2 ) is measurable
with respect to X2 , the function

x1 → ∫_{X2} f (x1 , x2 ) µ2 (dx2 )

is measurable with respect to X1 , and

∫_X f dµ = ∫_{X1} ( ∫_{X2} f (x1 , x2 ) µ2 (dx2 ) ) µ1 (dx1 ).    (1.25)

(B) Fubini. If f is µ-integrable, then, for µ1 -almost all x1 , the function x2 → f (x1 , x2 ) is µ2 -integrable,
the function x1 → ∫_{X2} f (x1 , x2 ) µ2 (dx2 ) is µ1 -integrable, and (1.25) is true.
Example 2.4: When Fubini cannot be applied. Consider the function f defined on X1 × X2 =
(1, ∞) × (0, 1) by the formula

f (x1 , x2 ) = e^{−x1 x2} − 2e^{−2x1 x2} .

We have

∫_{(1,∞)} f (x1 , x2 ) dx1 = (e^{−x2} − e^{−2x2})/x2 = h(x2 ) ≥ 0,

and

∫_{(0,1)} f (x1 , x2 ) dx2 = −(e^{−x1} − e^{−2x1})/x1 = −h(x1 ).
However,

∫_0^1 h(x2 ) dx2 ≠ ∫_1^∞ (−h(x1 )) dx1 ,
1.2. THE BIG RESULTS OF INTEGRATION THEORY 19
since h > 0 ℓ-a.e. on (0, ∞), so that the left-hand side is positive while the right-hand side is negative. We therefore see that successive integrations yield different results
according to the order in which they are performed. As a matter of fact, f (x1 , x2 ) is not integrable on
(1, ∞) × (0, 1).
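The disagreement between the two iterated integrals can be seen numerically (a sketch; the quadrature grids and the x1-cutoffs are illustrative choices):

```python
import math

def f(x1, x2):
    return math.exp(-x1 * x2) - 2.0 * math.exp(-2.0 * x1 * x2)

def midpoint(g, a, b, n):
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

def inner_x1(x2):   # ∫_(1,∞) f(x1, x2) dx1, tail cut where e^(-x1*x2) ~ e^-40
    return midpoint(lambda x1: f(x1, x2), 1.0, 1.0 + 40.0 / x2, 2000)

def inner_x2(x1):   # ∫_(0,1) f(x1, x2) dx2
    return midpoint(lambda x2: f(x1, x2), 0.0, 1.0, 2000)

iter_12 = midpoint(inner_x1, 0.0, 1.0, 200)   # ∫ h(x2) dx2 > 0
iter_21 = midpoint(inner_x2, 1.0, 60.0, 200)  # -∫ h(x1) dx1 < 0
# the two orders of integration give values of opposite sign; their difference
# approximates ∫_0^∞ h = log 2, consistent with the closed forms above.
```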
Example 2.5: Fun Fubini. This example is for pure fun. Consider any bounded rectangle of ℝ². We
say that it has Property (A) if at least one of its sides “is an integer” (meaning: its length is an integer).
Now you are given a bounded rectangle ∆ that is the union of a finite number of disjoint rectangles
∆1 , . . . , ∆K with Property (A). Show that ∆ itself must have Property (A).
Proof. Let I be a finite interval of ℝ. Observe that ∫_I e^{2iπx} dx = 0 if and only if the length of I is an
integer. Let now I × J be a bounded rectangle. It has Property (A) if and only if

∫∫_{I×J} e^{2iπ(x+y)} dx dy = ∫_I e^{2iπx} dx × ∫_J e^{2iπy} dy = 0.

(This is where we use Fubini.) Now

∫∫_∆ e^{2iπ(x+y)} dx dy = ∫∫_{∪_{n=1}^K ∆n} e^{2iπ(x+y)} dx dy = Σ_{n=1}^K ∫∫_{∆n} e^{2iπ(x+y)} dx dy = 0,

since each ∆n has Property (A). Therefore ∆ itself has Property (A).
Observe that in the first integral we have (a, t] (closed on the right), whereas in the second integral we
have (a, t) (open at the right).
Proof. The proof consists in computing the µ-measure of the square (a, b] × (a, b] in two ways. The
first one is obvious and gives the left-hand side of (1.26). The second one consists in observing that
µ((a, b] × (a, b]) = µ(D1 ) + µ(D2 ), where D1 = {(x, y) : a < y ≤ b, a < x ≤ y} and D2 = (a, b] × (a, b] \ D1 .
Then µ(D1 ) and µ(D2 ) are computed using Tonelli’s theorem. For instance,

µ(D1 ) = ∫ ( ∫ 1_{D1} (x, y) µ1 (dx) ) µ2 (dy),

and

∫ 1_{D1} (x, y) µ1 (dx) = ∫ 1_{a<x≤y} µ1 (dx) = µ1 ((a, y]).
The notation ∫ g(x) dF (x) stands for ∫ g(x) µ(dx), where µ is the measure with c.d.f. F ; when this notation is used, the integral is usually called the Lebesgue–Stieltjes integral of g with respect to F . In the formula just proved, Fi := Fµi (i = 1, 2); it is the Lebesgue–Stieltjes version of the integration by parts formula of
Calculus.
Example 2.6: Fubini for the counting measure. Applied to the product of two counting measures
on ℕ, the Fubini theorem deals with the problem of interchanging the order of summations. It says
(observing that almost-everywhere relatively to the counting measure in fact means everywhere, since for
such a measure the only set of measure 0 is the void set): Let {ak,n }k,n∈ℕ be a doubly indexed sequence
of real numbers such that

Σ_{k,n∈ℕ} |ak,n | < ∞.    (1.28)

Then the sum Σ_{k,n∈ℕ} ak,n is well-defined, for each n ∈ ℕ,

Σ_{k∈ℕ} |ak,n | < ∞,

and

Σ_{k,n∈ℕ} ak,n = Σ_{n∈ℕ} ( Σ_{k∈ℕ} ak,n ).

If the terms of the doubly indexed sequence are non-negative, the latter equality holds without condition
(1.28).
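A numerical sketch of the absolutely summable case (the particular array is an illustrative choice):

```python
# An absolutely summable array: sum_{k,n} |a_{k,n}| < ∞, so the two orders of
# summation must agree (truncated to 30 x 30; the tail is of order 2^-29).
K = 30
a = {(k, n): (-1) ** k / 2 ** (k + n) for k in range(K) for n in range(K)}

by_rows = sum(sum(a[k, n] for n in range(K)) for k in range(K))
by_cols = sum(sum(a[k, n] for k in range(K)) for n in range(K))
# both orders give (sum_k (-1)^k 2^-k) * (sum_n 2^-n) = (2/3) * 2 = 4/3
```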
Example 2.7: Let {fn }n∈ℕ be a sequence of measurable functions fn : ℝ → ℝ. Applying Fubini’s
theorem with the product of the Lebesgue measure on ℝ by the counting measure on ℕ yields: Under
the condition

∫_ℝ ( Σ_{n∈ℕ} |fn (t)| ) dt < ∞,    (1.29)

the sum Σ_{n∈ℕ} fn (t) is well-defined for almost all t, and

∫_ℝ ( Σ_{n∈ℕ} fn (t) ) dt = Σ_{n∈ℕ} ( ∫_ℝ fn (t) dt ).

If the fn ’s are non-negative, the latter equality holds without condition (1.29).
1.2.3 Radon–Nikodym
Definition 1.2.2 Let (X, X , µ) be a measure space and let h : (X, X ) → (ℝ, B) be non-negative. Define
the set function ν : X → [0, ∞] by

ν(C) = ∫_C h(x) µ(dx).    (1.30)

Then, one easily checks that ν is a measure on (X, X ), called the product of µ with the function h.
Theorem 1.2.9 Let f : (X, X ) → (ℝ, B). The following are equivalent:
(a) f is ν-integrable;
(b) f h is µ-integrable;
in which case

∫_X f dν = ∫_X f h dµ.    (1.31)

Proof. Verify (1.31) for elementary non-negative functions; approximating a general non-negative f by a non-decreasing
sequence of such functions, the monotone convergence theorem is then used, as in the proof of (1.22).
For the case of functions of arbitrary sign, apply (1.31) with f = f+ and f = f− .
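In the discrete case the identity ∫ f dν = ∫ f h dµ is a finite sum and can be checked directly (a sketch; the density and test function are illustrative choices):

```python
# mu = counting measure on a finite set, h >= 0 a density, nu(C) = sum_C h.
X = range(10)
h = [0.5 * x for x in X]                 # a nonnegative density

def nu(C):
    """The measure nu, product of the counting measure with h."""
    return sum(h[x] for x in C)

def integral_nu(f):                      # ∫ f dnu, built from nu on singletons
    return sum(f(x) * nu([x]) for x in X)

def integral_mu(g):                      # ∫ g dmu for the counting measure
    return sum(g(x) for x in X)

def f(x):
    return x + 1

lhs = integral_nu(f)                     # ∫ f dnu
rhs = integral_mu(lambda x: f(x) * h[x]) # ∫ f h dmu
# lhs == rhs == 165.0, illustrating (1.31)
```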
Definition 1.2.3 Let µ and ν be two measures on (X, X ) such that for all C ∈ X ,

µ(C) = 0 =⇒ ν(C) = 0.    (1.32)

Then ν is said to be absolutely continuous with respect to µ, which is denoted ν ≪ µ. The measures µ and ν on (X, X ) are said to be
mutually singular if there exist two sets A, B ∈ X such that X = A ∪ B and ν(A) = µ(B) = 0.
Theorem 1.2.10 (Radon–Nikodym) Let µ and ν be two measures on (X, X ), with µ sigma-finite, such that ν ≪ µ. Then
there exists a non-negative function h : (X, X ) → (ℝ, B) such that

ν(C) = ∫_C h(x) µ(dx) for all C ∈ X .    (1.33)

This function h is called the Radon–Nikodym derivative of ν with respect to µ and is denoted dν/dµ.
The function h is easily seen to be µ-essentially unique, in the sense that if h′ is another such function
then h = h′ µ-a.e.
Theorem 1.2.11 Let µ and ν be two sigma-finite measures on (X, X ). There exists a unique decomposition

ν = ν1 + ν2 ,    (1.34)

where ν2 is a measure on (X, X ) that is singular with respect to µ, and where ν1 ≪ µ.
Let (X, X , µ) be a measure space and let f, g : X → ℝ be two Borel functions defined on X. The
relation R defined by

(f Rg) if and only if (f = g µ-a.e.)

is an equivalence relation, and we shall denote the equivalence class of f by {f }. Note that for any p > 0
(using property (b) of Theorem 1.1.12),

(f Rg) =⇒ ∫_X |f |^p dµ = ∫_X |g|^p dµ.
The operations +, ×, ∗ (complex conjugation, for complex-valued functions), and multiplication by a scalar α are defined on the equivalence classes by

{f } + {g} = {f + g}, {f } × {g} = {f g}, {f }∗ = {f ∗ }, α{f } = {αf }.

The first equality means that {f } + {g} is, by definition, the equivalence class consisting of the functions
f + g, where f and g are arbitrary members of {f } and {g}, respectively. A similar interpretation holds
for the other equalities.
By definition, for a given p ≥ 1, Lp(µ) is the collection of equivalence classes {f } such that ∫_X |f |^p dµ <
∞. Clearly it is a vector space over ℝ (for the proof, recall that

((|f | + |g|)/2)^p ≤ (1/2)|f |^p + (1/2)|g|^p

by convexity of the function x ↦ x^p on ℝ+ ).
In order to avoid cumbersome notation, in this Section and in general whenever we consider Lp -spaces,
we shall write f for {f }. This abuse of notation is harmless since two members of the same equivalence
class have the same integral if that integral is defined. Therefore, using loose notation,
Lp(µ) = { f : ∫_X |f |^p dµ < ∞ }.    (1.35)
Theorem 1.3.1 Let p and q be positive real numbers such that p > q. If the measure µ on (X, X ) is
finite, then Lp(µ) ⊆ Lq(µ). In particular, L2(µ) ⊆ L1(µ).
Proof. From the inequality |a|^q ≤ 1 + |a|^p , true for all a ∈ ℝ, it follows that µ(|f |^q ) ≤ µ(1) + µ(|f |^p ) < ∞, since µ(1) = µ(X) < ∞.
Theorem 1.3.2 Let p, q > 1 be such that

1/p + 1/q = 1

(p and q are then said to be conjugate), and let f, g : (X, X ) → (ℝ, B) be non-negative. Then we have
Hölder’s inequality

∫_X f g dµ ≤ [ ∫_X f^p dµ ]^{1/p} [ ∫_X g^q dµ ]^{1/q} .    (1.36)

In particular, if f, g ∈ L2(µ), then f g ∈ L1(µ).
Proof. Let

A = ( ∫_X f^p dµ )^{1/p} , B = ( ∫_X g^q dµ )^{1/q} .

We may assume that 0 < A < ∞, 0 < B < ∞, because otherwise Hölder’s inequality is trivially satisfied.
Define F = f /A, G = g/B, so that

∫_X F^p dµ = ∫_X G^q dµ = 1.
The inequality

F (x)G(x) ≤ (1/p)F (x)^p + (1/q)G(x)^q    (1.37)

is trivially satisfied if x is such that F (x) = 0 or G(x) = 0. If F (x) > 0 and G(x) > 0, define s(x) and t(x) by F (x) = e^{s(x)/p} , G(x) = e^{t(x)/q} .
From the convexity of the exponential function and the assumption that 1/p + 1/q = 1,

e^{s(x)/p + t(x)/q} ≤ (1/p)e^{s(x)} + (1/q)e^{t(x)} ,
and this is precisely the inequality (1.37). Integrating the latter yields

∫_X F G dµ ≤ 1/p + 1/q = 1,

and this is just (1.36).
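Hölder's inequality for the counting measure on a finite set is easy to test numerically (a sketch; the exponents and random data are illustrative choices):

```python
import random

random.seed(0)
p, q = 3.0, 1.5                      # conjugate exponents: 1/p + 1/q = 1
n = 1000
f = [random.random() for _ in range(n)]
g = [random.random() for _ in range(n)]

# Hölder for the counting measure on {1, ..., n}: lhs <= rhs
lhs = sum(fi * gi for fi, gi in zip(f, g))
rhs = sum(fi ** p for fi in f) ** (1 / p) * sum(gi ** q for gi in g) ** (1 / q)
```

Equality would require fi^p and gi^q to be proportional, which random data essentially never satisfies, so the inequality here is strict.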
Theorem 1.3.3 Let p ≥ 1 and let f, g : (X, X ) → (ℝ, B) be non-negative and such that

∫_X f^p dµ < ∞, ∫_X g^p dµ < ∞.

Then we have Minkowski’s inequality

[ ∫_X (f + g)^p dµ ]^{1/p} ≤ [ ∫_X f^p dµ ]^{1/p} + [ ∫_X g^p dµ ]^{1/p} .    (1.38)
Proof. For p = 1 the inequality (in fact an equality) is obvious. Therefore, assume p > 1 and let q be the conjugate exponent. From Hölder’s
inequality,

∫_X f (f + g)^{p−1} dµ ≤ [ ∫_X f^p dµ ]^{1/p} [ ∫_X (f + g)^{(p−1)q} dµ ]^{1/q}

and

∫_X g(f + g)^{p−1} dµ ≤ [ ∫_X g^p dµ ]^{1/p} [ ∫_X (f + g)^{(p−1)q} dµ ]^{1/q} .

Adding together the above two inequalities and observing that (p − 1)q = p, we obtain

∫_X (f + g)^p dµ ≤ ( [ ∫_X f^p dµ ]^{1/p} + [ ∫_X g^p dµ ]^{1/p} ) [ ∫_X (f + g)^p dµ ]^{1/q} .
One may assume that the right-hand side of (1.38) is finite and that the left-hand side is positive
(otherwise the inequality is trivial). Therefore ∫_X (f + g)^p dµ ∈ (0, ∞). We may therefore divide both
sides of the last display by [ ∫_X (f + g)^p dµ ]^{1/q} . Observing that 1 − 1/q = 1/p yields the desired inequality
(1.38).
For the last assertion of the theorem, take p = q = 2.
Theorem 1.3.4 Let p ≥ 1. The mapping ν_p(f) := (∫_X |f|^p dµ)^{1/p} is a norm on L^p(µ).

Proof. Clearly, ν_p(αf) = |α| ν_p(f) for all α ∈ ℝ, f ∈ L^p(µ). Also, ν_p(f) = 0 if and only if (∫_X |f|^p dµ)^{1/p} = 0, which is in turn equivalent to f = 0, µ-a.e. Finally, ν_p(f+g) ≤ ν_p(f) + ν_p(g) for all f, g ∈ L^p(µ), by Minkowski's inequality. Therefore ν_p is a norm. □
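For the counting measure on a finite set, ν_p is the usual p-norm of a tuple, and the three norm properties just verified can be checked directly. A small Python sketch (the vectors chosen are arbitrary):

```python
def nu_p(a, p):
    # nu_p(a) = ( sum |a_n|^p )^(1/p), i.e. the L^p norm w.r.t. counting measure
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

a = [1.0, -2.0, 3.0]
b = [0.5, 4.0, -1.0]
for p in (1.0, 1.5, 2.0, 3.0):
    # homogeneity: nu_p(alpha * f) = |alpha| * nu_p(f)
    assert abs(nu_p([-2.0 * x for x in a], p) - 2.0 * nu_p(a, p)) < 1e-9
    # Minkowski / triangle inequality
    assert nu_p([x + y for x, y in zip(a, b)], p) <= nu_p(a, p) + nu_p(b, p) + 1e-9
# nu_p(f) = 0 iff f = 0 (here: the zero tuple)
assert nu_p([0.0, 0.0], 2.0) == 0.0
```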
1.3.3 Completeness of Lp
We shall denote ν_p(f) by ‖f‖_p. Thus L^p(µ) is a normed vector space over ℝ, with the norm ‖·‖_p and
the induced distance
dp (f, g) = kf − gkp .
Theorem 1.3.5 Let p ≥ 1. The distance d_p makes L^p(µ) a complete normed vector space.
Proof. Let {f_n}_{n≥1} be a Cauchy sequence in L^p(µ), and extract a subsequence {f_{n_i}}_{i≥1} such that
$$\|f_{n_{i+1}} - f_{n_i}\|_p \le 2^{-i}. \quad (1.40)$$
Let
$$g_k = \sum_{i=1}^{k} |f_{n_{i+1}} - f_{n_i}|, \qquad g = \sum_{i=1}^{\infty} |f_{n_{i+1}} - f_{n_i}|.$$
By (1.40) and Minkowski's inequality we have ‖g_k‖_p ≤ 1. Fatou's lemma applied to the sequence {g_k^p}_{k≥1} gives ‖g‖_p ≤ 1. In particular, any member of the equivalence class of g is finite µ-almost everywhere, and therefore the series
$$f_{n_1}(x) + \sum_{i=1}^{\infty} \left( f_{n_{i+1}}(x) - f_{n_i}(x) \right)$$
converges absolutely for µ-almost all x. Call the corresponding limit f(x) (set f(x) = 0 when this limit does not exist). Since
$$f_{n_1} + \sum_{i=1}^{k-1} \left( f_{n_{i+1}} - f_{n_i} \right) = f_{n_k},$$
we see that
$$f = \lim_{k\uparrow\infty} f_{n_k} \quad \mu\text{-a.e.} \quad (1.41)$$
One must show that f is the limit in L^p(µ) of {f_{n_k}}_{k≥1}. Let ε > 0. There exists an integer N = N(ε) such that ‖f_n − f_m‖_p ≤ ε whenever m, n ≥ N. For all m > N, by Fatou's lemma we have
$$\int_X |f - f_m|^p \, d\mu \le \liminf_{i\to\infty} \int_X |f_{n_i} - f_m|^p \, d\mu \le \varepsilon^p.$$
Therefore f − f_m ∈ L^p(µ), and consequently f ∈ L^p(µ). It also follows from the last inequality that
$$\lim_{m\to\infty} \|f - f_m\|_p = 0. \qquad \square$$
Note that the statement in (1.41) is about functions, not about equivalence classes: the functions in question are arbitrary members of the corresponding equivalence classes. In particular, since a sequence of functions that converges µ-a.e. to two functions forces those two functions to be equal µ-a.e., we have:

Theorem 1.3.7 If {f_n}_{n≥1} converges both to f in L^p(µ) and to g µ-a.e., then f = g µ-a.e.
Example 3.1: The space ℓ^p(ℤ). When the measure µ is the counting measure on ℤ, we use the notation ℓ^p(ℤ) (or, if the context permits, ℓ^p) for L^p(µ). Therefore
$$\ell^p(\mathbb{Z}) := \left\{ a \in \mathbb{C}^{\mathbb{Z}} : \sum_{n\in\mathbb{Z}} |a_n|^p < \infty \right\}.$$
1.4 Exercises
Exercise 4.1.
Prove Corollary 1.1.1
Exercise 4.2.
Prove Corollary 1.1.2
Exercise 4.3.
Prove Theorem 1.1.3
Exercise 4.4.
Let (X, X) be a measurable space, and let f_n : (X, X) → (ℝ, B), n ≥ 1, be a sequence of functions that is pointwise nondecreasing, that is, for all x ∈ X, the sequence of real numbers {f_n(x)}_{n≥1} is nondecreasing and, in particular, admits a (possibly infinite) limit f(x). Show that the function f : (X, X) → (ℝ, B) is measurable.
Exercise 4.5.
Let ψ be a function in L²_ℝ(ℝ) with the FT ψ̂ = (1/2π) 1_I, where I = [−2π, −π] ∪ [+π, +2π]. Show that {ψ_{j,n}}_{j∈ℤ, n∈ℤ} is a Hilbert basis of L²_ℝ(ℝ), where ψ_{j,n}(x) = 2^{j/2} ψ(2^j x − n).
Exercise 4.6.
Let {g_j}_{j≥0} be a Hilbert basis of L²((0, 1]). Show that {g_j(· − n) 1_{(n,n+1]}(·)}_{j≥0, n∈ℤ} is a Hilbert basis of L²(ℝ).
1.5 Solutions
Indeed, D = φ−1 (C) is a set in Y since φ ∈ E/Y, and therefore g −1 (D) ∈ X since g ∈ Y/X .
Chapter 2

Hilbert spaces

The quantity < x, y > is called the inner product of x and y. For any x ∈ E, denote ‖x‖ := < x, x >^{1/2}.
Proof. We do the proof for the case K = ℂ. We may assume that < x, y > ≠ 0, otherwise the result is trivial. For all λ ∈ ℝ,
$$\|x\|^2 + 2\lambda\, |\langle x,y\rangle|^2 + \lambda^2\, |\langle x,y\rangle|^2 \|y\|^2 = \|x + \lambda \langle x,y\rangle\, y\|^2 \ge 0.$$
This second-degree polynomial in λ ∈ ℝ therefore cannot have two distinct real roots, and this implies a non-positive discriminant:
$$|\langle x,y\rangle|^4 \le |\langle x,y\rangle|^2\, \|x\|^2 \|y\|^2,$$
and therefore the inequality (2.2) holds. Equality corresponds to a null discriminant, which in turn implies a double root λ of the polynomial. For such a root, ‖x + λ < x, y > y‖² = 0, which implies x + λ < x, y > y = 0. □
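For the finite-dimensional Hilbert space ℂⁿ, Schwarz's inequality and its equality case for colinear vectors can be checked directly. A Python sketch (illustrative vectors, Hermitian inner product linear in the first argument as in (2.3)):

```python
def inner(x, y):
    # Hermitian product < x, y > = sum_k x_k * conj(y_k)
    return sum(u * v.conjugate() for u, v in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

x = [1 + 2j, -0.5j, 3 + 0j]
y = [2 - 1j, 1 + 0j, 0.25 + 0.25j]
# Schwarz: |<x, y>| <= ||x|| ||y||
assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
# equality when the vectors are colinear
z = [(2 + 1j) * u for u in x]
assert abs(abs(inner(x, z)) - norm(x) * norm(z)) < 1e-9
```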
Theorem 2.1.2 The mapping x → ‖x‖ is a norm on E; that is to say, for all x, y ∈ E and all α ∈ ℂ:
(a) ‖αx‖ = |α| ‖x‖;
(b) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0;
(c) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. The proof of (a) and (b) is immediate. For (c), write
$$\|x+y\|^2 = \|x\|^2 + \|y\|^2 + \langle x,y\rangle + \langle y,x\rangle$$
and
$$(\|x\| + \|y\|)^2 = \|x\|^2 + \|y\|^2 + 2\|x\|\|y\|.$$
It therefore suffices to prove
$$\langle x,y\rangle + \langle y,x\rangle \le 2\,\|x\|\|y\|,$$
and this follows from Schwarz's inequality. □
A pre-Hilbert space is endowed with the distance
$$d(x, y) = \|x - y\|.$$
That this is indeed a distance is an immediate consequence of (a), (b), and (c) of Theorem 2.1.2. When endowed with a distance, a space H is called a metric space.
Definition 2.1.1 The pre-Hilbert space H is called a Hilbert space if the distance d makes of it a complete
metric space.
By complete, the following is meant: if {x_n}_{n≥1} is a Cauchy sequence in H, that is, if
$$\lim_{m,n\uparrow\infty} d(x_m, x_n) = 0,$$
then there exists an x ∈ H such that lim_{n↑∞} d(x_n, x) = 0.
Theorem 2.1.3 As ‖h₁‖, ‖h₂‖ ↓ 0, < x + h₁, y + h₂ > → < x, y >.

Proof. We have
$$|\langle x+h_1, y+h_2\rangle - \langle x,y\rangle| = |\langle x, h_2\rangle + \langle h_1, y\rangle + \langle h_1, h_2\rangle|.$$
By Schwarz's inequality, |< x, h₂ >| ≤ ‖x‖‖h₂‖, |< h₁, y >| ≤ ‖y‖‖h₁‖, and |< h₁, h₂ >| ≤ ‖h₁‖‖h₂‖. Therefore
$$\lim_{\|h_1\|,\|h_2\|\downarrow 0} |\langle x+h_1, y+h_2\rangle - \langle x,y\rangle| = 0. \qquad \square$$

In other words, the inner product of a Hilbert space is bicontinuous. In particular, the norm x → ‖x‖ is a continuous function from H to ℝ₊.
Let L²(µ) be the set of measurable functions f : (X, X) → (ℂ, B(ℂ)) such that
$$\int_X |f(x)|^2\, \mu(dx) < \infty,$$
where two functions f and f′ such that f(x) = f′(x), µ-a.e., are not distinguished. We have, by the Riesz–Fischer theorem:
Theorem 2.1.4 L²(µ) is a vector space with scalar field ℂ, and when endowed with the inner product
$$\langle f, g\rangle = \int_X f(x)\, g(x)^*\, \mu(dx) \quad (2.3)$$
it is a Hilbert space.
The completeness property of L²(µ) reads, in this case, as follows. If {f_n}_{n≥1} is a sequence of functions in L²(µ) such that
$$\lim_{m,n\uparrow\infty} \int_X |f_n(x) - f_m(x)|^2\, \mu(dx) = 0,$$
then there exists a function f ∈ L²(µ) such that
$$\lim_{n\uparrow\infty} \int_X |f_n(x) - f(x)|^2\, \mu(dx) = 0.$$
Example 1.1: The Hilbert space ℓ²(ℤ). The space ℓ²(ℤ) of complex sequences a = {a_n}_{n∈ℤ} such that
$$\sum_{n\in\mathbb{Z}} |a_n|^2 < \infty$$
is a Hilbert space. It is indeed a particular case of a Hilbert space L²(µ), with X = ℤ and µ the counting measure. In this example, Schwarz's inequality reads
$$\left| \sum_{n\in\mathbb{Z}} a_n b_n^* \right| \le \left( \sum_{n\in\mathbb{Z}} |a_n|^2 \right)^{1/2} \left( \sum_{n\in\mathbb{Z}} |b_n|^2 \right)^{1/2}.$$
Theorem 2.2.1 Let G ⊂ H be a vector subspace of the Hilbert space H. Endow G with the Hermitian
product which is the restriction to G of the Hermitian product on H. Then, G is a Hilbert space if and
only if G is closed in H. (G is then called a Hilbert subspace of H.)
Proof. Assume that G is closed. Let {xn }n∈N be a Cauchy sequence in G. It is a fortiori a Cauchy
sequence in H, and therefore it converges in H to some x, and this x must be in G, because it is a limit
of elements of G and G is closed.
Assume that G is a Hilbert space with the hermitian product induced by the hermitian product of
H. In particular every convergent sequence {xn }n∈N of elements of G converges to some element of G.
Therefore G is closed.
Definition 2.2.1 Two elements x, y ∈ E are said to be orthogonal if < x, y >= 0. Let G be a Hilbert
subspace of the Hilbert space H. The orthogonal complement of G in H, denoted G ⊥ , is defined by
G⊥ = {z ∈ H :< z, x >= 0 for all x ∈ G} . (2.4)
Let x₁, ..., x_n ∈ E be pairwise orthogonal. We have Pythagoras' theorem:
$$\left\| \sum_{i=1}^{n} x_i \right\|^2 = \sum_{i=1}^{n} \|x_i\|^2. \quad (2.5)$$
Clearly, G⊥ is a vector space over ℂ. Moreover, it is closed in H, since if {z_n}_{n≥1} is a sequence of elements of G⊥ converging to z ∈ H, then by the continuity of the inner product, < z, x > = lim_{n↑∞} < z_n, x > = 0 for all x ∈ G, and therefore z ∈ G⊥.

Theorem 2.2.2 Let x ∈ H. There exists a unique element y ∈ G such that x − y ∈ G⊥. Moreover, y is the unique element of G such that
$$\|x - y\| = d(x, G) := \inf_{z\in G} \|x - z\|. \quad (2.6)$$
Proof. Let d(x, G) = inf_{z∈G} d(x, z) and let {y_n}_{n≥1} be a sequence in G such that
$$d(x, G)^2 \le d(x, y_n)^2 \le d(x, G)^2 + \frac{1}{n}. \quad (2.7)$$
The parallelogram identity gives, for all m, n ≥ 1,
$$\|y_n - y_m\|^2 = 2\left(\|x - y_n\|^2 + \|x - y_m\|^2\right) - 4\,\left\|x - \tfrac{1}{2}(y_m + y_n)\right\|^2.$$
Since ½(y_n + y_m) ∈ G,
$$\left\|x - \tfrac{1}{2}(y_m + y_n)\right\|^2 \ge d(x, G)^2,$$
and therefore
$$\|y_n - y_m\|^2 \le 2\left(\frac{1}{n} + \frac{1}{m}\right).$$
The sequence {y_n}_{n≥1} is therefore a Cauchy sequence in G, and consequently it converges to some y ∈ G since G is closed. Passing to the limit in (2.7) gives (2.6).
For uniqueness of the minimizer, suppose that y′ ∈ G also satisfies
$$\|x - y'\| = \|x - y\| = d(x, G).$$
Then, by the parallelogram identity applied as above, ‖y − y′‖² ≤ 2(‖x − y‖² + ‖x − y′‖²) − 4 d(x, G)² = 0, and therefore y′ = y.

It remains to show that < x − y, z > = 0 for all z ∈ G. This is trivially true if z = 0, and we shall therefore assume z ≠ 0. Because y + λz ∈ G for all λ ∈ ℝ,
$$\|x - y - \lambda z\|^2 \ge d(x, G)^2,$$
that is,
$$\|x - y\|^2 - 2\lambda\,\mathrm{Re}\{\langle x-y, z\rangle\} + \lambda^2 \|z\|^2 \ge d(x, G)^2.$$
Since ‖x − y‖² = d(x, G)², we have
$$-2\lambda\,\mathrm{Re}\{\langle x-y, z\rangle\} + \lambda^2 \|z\|^2 \ge 0 \quad \text{for all } \lambda \in \mathbb{R},$$
which implies Re{< x − y, z >} = 0. The same type of calculation with λ ∈ iℝ (purely imaginary) leads to Im{< x − y, z >} = 0. Therefore < x − y, z > = 0.
That y is the unique element of G such that y − x ∈ G⊥ follows from the observation made just
before the statement of Theorem 2.2.2.
Definition 2.2.2 The element y in Theorem 2.2.2 is called the orthogonal projection of x on G and is
denoted PG (x).
The projection theorem states, in particular, that for any x ∈ H there is a unique decomposition
$$x = y + z, \qquad y \in G,\; z \in G^\perp. \quad (2.8)$$
Theorem 2.2.3 The orthogonal projection y = PG (x) is characterized by the two following properties:
(1) y ∈ G;
(2) < y − x, z >= 0 for all z ∈ G.
This characterization is called the projection principle and is useful in determining projections.
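In finite dimensions, the projection principle reduces the computation of P_G(x) to solving the normal equations < x − y, c_j > = 0 for the coefficients of y in a set of generators of G. A Python sketch for a two-dimensional subspace of ℝ³ (the vectors are only illustrative):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project2(x, c1, c2):
    # Solve the normal equations < x - y, c_j > = 0 with y = a1*c1 + a2*c2,
    # i.e. the 2x2 Gram system G a = b, by Cramer's rule.
    g11, g12, g22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    b1, b2 = dot(x, c1), dot(x, c2)
    det = g11 * g22 - g12 * g12
    a1 = (b1 * g22 - b2 * g12) / det
    a2 = (g11 * b2 - g12 * b1) / det
    return [a1 * u + a2 * v for u, v in zip(c1, c2)]

x = [1.0, 2.0, 4.0]
c1 = [1.0, 0.0, 1.0]
c2 = [0.0, 1.0, 1.0]
y = project2(x, c1, c2)
r = [a - b for a, b in zip(x, y)]
# projection principle: the residual x - y is orthogonal to the generators
assert abs(dot(r, c1)) < 1e-12 and abs(dot(r, c2)) < 1e-12
```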
Let C be a collection of vectors in the Hilbert space H. The linear span of C, denoted sp(C), is, by definition, the set of all finite linear combinations of vectors of C. This is a vector space. The closure of this vector space, s̄p(C), is called the Hilbert subspace generated by C. By definition, x belongs to this subspace if and only if there exists a sequence of vectors {x_n}_{n≥1} such that
(i) x_n ∈ sp(C) for all n ≥ 1;
(ii) lim_{n↑∞} x_n = x.
Theorem 2.2.4 Let G be the Hilbert subspace generated by a collection C of vectors of H, let x ∈ H, and suppose x̂ ∈ H satisfies:
(α) x̂ ∈ G;
(β) < x − x̂, z > = 0 for all z ∈ C.
Then x̂ = P_G(x).

Note that requirement (β) has to be satisfied not for all z ∈ G, but only for all z ∈ C.

Proof. We have to show that < x − x̂, z > = 0 for all z ∈ G. But z = lim_{n↑∞} z_n, where {z_n}_{n≥1} is a sequence of vectors of sp(C). By hypothesis (β) and linearity, < x − x̂, z_n > = 0 for all n ≥ 1. Therefore, by continuity of the inner product,
$$\langle x - \hat{x}, z\rangle = \lim_{n\uparrow\infty} \langle x - \hat{x}, z_n\rangle = 0. \qquad \square$$
Definition 2.2.3 Let H and K be two Hilbert spaces with Hermitian products denoted < ·, · >_H and < ·, · >_K, respectively, and let φ : H → K be a linear mapping such that, for all x, y ∈ H,
$$\langle \varphi(x), \varphi(y)\rangle_K = \langle x, y\rangle_H. \quad (2.9)$$
Then φ is called a linear isometry from H into K. If, moreover, φ is from H onto K, then H and K are said to be isomorphic.
Note that a linear isometry is necessarily injective, since φ(x) = φ(y) implies φ(x − y) = 0, and
therefore
0 = kφ(x − y)kK = kx − ykH ,
and this implies x = y. In particular, if the linear isometry is onto, it is necessarily bijective.
Recall that a subset A of a metric space (E, d) is said to be dense in E if, for all x ∈ E, there exists a sequence {x_n}_{n≥1} in A converging to x. The following result is a useful tool called the isometry extension theorem.
Theorem 2.2.5 Let H and K be two Hilbert spaces with Hermitian products < ·, · >_H and < ·, · >_K, respectively. Let V be a vector subspace of H that is dense in H, and let φ : V → K be a linear isometry from V to K (φ is linear and (2.9) holds for all x, y ∈ V). Then there exists a unique linear isometry φ̃ : H → K whose restriction to V is φ.
Proof. We shall first define φ̃(x) for x ∈ H. Since V is dense in H, there exists a sequence {xn }n≥1 in
V converging to x. Since φ is isometric,
kφ(xn ) − φ(xm )kK = kxn − xm kH for all m, n ≥ 1.
In particular {φ(xn )}n≥1 is a Cauchy sequence in K and therefore it converges to some element of K
which we denote φ̃(x).
The definition of φ̃(x) is independent of the sequence {xn }n≥1 converging to x. Indeed, for another
such sequence {yn }n≥1
lim kφ(xn ) − φ(yn )kK = lim kxn − yn kH = 0.
n↑∞ n↑∞
The mapping φ̃ : H 7→ K so constructed is clearly an extension of φ (for x ∈ V one can take as the
approximating sequence of x the sequence {xn }n≥1 such that xn ≡ x).
The mapping φ̃ is linear. Indeed, let x, y ∈ H, α, β ∈ ℂ, and let {x_n}_{n≥1} and {y_n}_{n≥1} be two sequences in V converging to x and y, respectively. Then {αx_n + βy_n}_{n≥1} converges to αx + βy.
Therefore
lim φ(αxn + βyn ) = φ̃(αx + βy).
n↑∞
But
φ(αxn + βyn ) = αφ(xn ) + βφ(yn ) → αφ̃(x) + β φ̃(y).
Therefore φ̃(αx + βy) = αφ̃(x) + β φ̃(y).
The mapping φ̃ is isometric since, in view of the bicontinuity of the Hermitian product and the isometricity of φ,
$$\langle \tilde\varphi(x), \tilde\varphi(y)\rangle_K = \lim_{n\uparrow\infty} \langle \varphi(x_n), \varphi(y_n)\rangle_K = \lim_{n\uparrow\infty} \langle x_n, y_n\rangle_H = \langle x, y\rangle_H,$$
where {x_n}_{n≥1} and {y_n}_{n≥1} are two sequences in V converging to x and y, respectively. □
An orthonormal system {e_n}_{n≥0} is free, in the sense that an arbitrary finite subset of it is linearly independent. Indeed, taking (e₁, ..., e_k) for example, the relation
$$\sum_{i=1}^{k} \alpha_i e_i = 0$$
implies that
$$\alpha_\ell = \left\langle \sum_{i=1}^{k} \alpha_i e_i,\; e_\ell \right\rangle = 0, \qquad 1 \le \ell \le k.$$
The following theorem gives the preliminary results that we shall need for the proof of the Hilbert basis
theorem.
Theorem 2.3.1 Let {e_n}_{n≥0} be an orthonormal system of H and let G be the Hilbert subspace of H generated by {e_n}_{n≥0}. Then:
(a) For an arbitrary sequence {α_n}_{n≥0} of complex numbers, the series Σ_{n≥0} α_n e_n converges in H if and only if {α_n}_{n≥0} ∈ ℓ², in which case
$$\left\| \sum_{n\ge0} \alpha_n e_n \right\|^2 = \sum_{n\ge0} |\alpha_n|^2. \quad (2.10)$$
(b) For all x ∈ H, Bessel's inequality holds:
$$\sum_{n\ge0} |\langle x, e_n\rangle|^2 \le \|x\|^2. \quad (2.11)$$
(c) The orthogonal projection of x ∈ H on G is
$$P_G(x) = \sum_{n\ge0} \langle x, e_n\rangle\, e_n. \quad (2.12)$$
(d) For all x, y ∈ H,
$$\sum_{n\ge0} \langle x, e_n\rangle \langle y, e_n\rangle^* = \langle P_G(x), P_G(y)\rangle. \quad (2.13)$$

Proof. (a) By Pythagoras' theorem, for all m < n,
$$\left\| \sum_{k=m+1}^{n} \alpha_k e_k \right\|^2 = \sum_{k=m+1}^{n} |\alpha_k|^2,$$
so the partial sums of Σ_{n≥0} α_n e_n form a Cauchy sequence in H if and only if Σ_{n≥0} |α_n|² < ∞. In this case, equality (2.10) follows from the continuity of the norm, by letting n tend to ∞ in the last display.
(b) According to (α) of Theorem ??, ‖x‖ ≥ ‖P_{G_n}(x)‖, where G_n is the Hilbert subspace spanned by {e₀, ..., e_n}. But
$$P_{G_n}(x) = \sum_{i=0}^{n} \langle x, e_i\rangle\, e_i,$$
and therefore
$$\|x\|^2 \ge \sum_{i=0}^{n} |\langle x, e_i\rangle|^2.$$
Letting n ↑ ∞ gives Bessel's inequality (2.11).
(c) From (2.11) and the result in (a), it follows that the series Σ_{n≥0} < x, e_n > e_n converges. For any m ≥ 0 and all N ≥ m,
$$\left\langle x - \sum_{n=0}^{N} \langle x, e_n\rangle\, e_n,\; e_m \right\rangle = 0.$$
Letting N ↑ ∞, this implies that x − Σ_{n≥0} < x, e_n > e_n is orthogonal to each e_m. Also Σ_{n≥0} < x, e_n > e_n ∈ G. Therefore, by the projection principle,
$$P_G(x) = \sum_{n\ge0} \langle x, e_n\rangle\, e_n.$$
(d) By Schwarz's inequality in ℓ², for all N,
$$\left( \sum_{n=0}^{N} \left| \langle x, e_n\rangle \langle y, e_n\rangle^* \right| \right)^2 \le \left( \sum_{n=0}^{N} |\langle x, e_n\rangle|^2 \right) \left( \sum_{n=0}^{N} |\langle y, e_n\rangle|^2 \right) \le \|x\|^2 \|y\|^2.$$
Therefore the series Σ_{n=0}^∞ < x, e_n >< y, e_n >* is absolutely convergent. Also, by an elementary computation,
$$\left\langle \sum_{n=0}^{N} \langle x, e_n\rangle\, e_n,\; \sum_{n=0}^{N} \langle y, e_n\rangle\, e_n \right\rangle = \sum_{n=0}^{N} \langle x, e_n\rangle \langle y, e_n\rangle^*.$$
Letting N → ∞, we obtain (2.13) (using (2.12) and the continuity of the Hermitian product). □
In other words, the finite linear combinations of the elements of {w_n}_{n≥0} form a dense subset of H. We are now ready for the fundamental result: the Hilbert basis theorem.
Theorem 2.3.2 Let {e_n}_{n≥0} be an orthonormal system of H. The following properties are equivalent:
(a) {e_n}_{n≥0} is total in H;
(b) for all x ∈ H, ‖x‖² = Σ_{n≥0} |< x, e_n >|²;
(c) for all x ∈ H,
$$x = \sum_{n\ge0} \langle x, e_n\rangle\, e_n. \quad (2.15)$$
Proof. (a)⇒(c): By Theorem 2.3.1, P_G(x) = Σ_{n≥0} < x, e_n > e_n, where G is the Hilbert subspace generated by {e_n}_{n≥0}. Since {e_n}_{n≥0} is total, it follows by (??) that G⊥ = {0}, and therefore P_G(x) = x.
(c)⇒(b) This follows from (a) of Theorem 2.3.1.
(b)⇒(a): From (2.10) and (2.12),
$$\sum_{n\ge0} |\langle x, e_n\rangle|^2 = \|P_G(x)\|^2,$$
and therefore
$$\|x - P_G(x)\|^2 = \|x\|^2 - \|P_G(x)\|^2 = 0,$$
which implies x = P_G(x). Since this is true for all x ∈ H, we must have G = H; that is, {e_n}_{n≥0} is total in H. □
A sequence {e_n}_{n≥0} satisfying one (and then all) of the conditions of Theorem 2.3.2 is called a (denumerable) Hilbert basis of H.
Definition 2.3.3 Two sequences {e_n}_{n≥0} and {d_n}_{n≥0} of a Hilbert space H form a biorthonormal system if
$$\langle d_n, e_m\rangle = \delta_{nm} \quad \text{for all } n, m \ge 0.$$
This system is called complete if, in addition, each of the sequences {e_n}_{n≥0} and {d_n}_{n≥0} forms a total subset of H.
For a complete biorthonormal system, one has the expansion
$$x = \sum_{n\ge0} \langle x, e_n\rangle\, d_n$$
whenever this series converges (and symmetrically with the roles of {e_n} and {d_n} exchanged). Indeed, with the first series for example, calling its sum y, we have for any integer m ≥ 0
$$\langle y, e_m\rangle = \left\langle \sum_{n\ge0} \langle x, e_n\rangle\, d_n,\; e_m \right\rangle = \sum_{n\ge0} \langle x, e_n\rangle \langle d_n, e_m\rangle = \langle x, e_m\rangle.$$
Therefore
< x − y, em >= 0 for all m ≥ 0.
Since {en }n≥0 is total in H this implies x − y = 0.
An interesting theoretical question is: for what type of Hilbert spaces is there a denumerable Hilbert
basis? Here is a first (theoretical) answer.
Definition 2.3.4 A Hilbert space H is called a separable Hilbert space if it contains a sequence {f n }n≥0
that is dense in H.
Theorem 2.3.3 A separable Hilbert space admits at least one denumerable Hilbert basis.
Let {f_n}_{n≥0} be a sequence of vectors of a Hilbert space H. Construct {e_n}_{n≥0} as follows:
• Set p(0) = 0 and e₀ = f₀/‖f₀‖ (assuming f₀ ≠ 0, without loss of generality);
• e₀, ..., e_n and p(n) being defined, let p(n+1) be the first index p > p(n) such that f_p is independent of e₀, ..., e_n, and define, with p = p(n+1),
$$e_{n+1} = \frac{f_p - \sum_{i=0}^{n} \langle f_p, e_i\rangle\, e_i}{\left\| f_p - \sum_{i=0}^{n} \langle f_p, e_i\rangle\, e_i \right\|}.$$
Let now {f_n}_{n≥0} be a dense sequence as in Definition 2.3.4, and construct from it the orthonormal sequence {e_n}_{n≥0} by the Gram–Schmidt orthonormalization procedure. It is a Hilbert basis because (a) of Theorem 2.3.2 is satisfied. Indeed, suppose z ∈ H is orthogonal to every e_n. By construction, each f_p is a finite linear combination of the e_n, and therefore < y, z > = 0 for any finite linear combination y of {f_p}_{p≥0}. Because {f_p}_{p≥0} is dense, < y, z > = 0 for all y ∈ H. In particular < z, z > = 0, that is to say, z = 0. □
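The Gram–Schmidt procedure above translates directly into code. This Python sketch works in ℝⁿ and, as in the construction, skips a vector whenever it is (numerically) dependent on the previously produced ones; the tolerance is an implementation choice, not part of the text.

```python
def gram_schmidt(vectors, tol=1e-12):
    # Orthonormalize a finite list of vectors, skipping dependent ones.
    basis = []
    for f in vectors:
        r = f[:]
        for e in basis:
            c = sum(a * b for a, b in zip(r, e))   # < r, e >
            r = [a - c * b for a, b in zip(r, e)]  # subtract the projection
        n = sum(a * a for a in r) ** 0.5
        if n > tol:                                 # independent: keep it
            basis.append([a / n for a in r])
    return basis

es = gram_schmidt([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
assert len(es) == 2          # the dependent second vector was skipped
dot = lambda u, v: sum(a * b for a, b in zip(u, v))
assert abs(dot(es[0], es[1])) < 1e-10 and abs(dot(es[0], es[0]) - 1.0) < 1e-10
```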
Chapter 3
Fourier analysis
Let s ∈ L¹(ℝ). Its Fourier transform (FT) is the function
$$\hat{s}(\nu) = \int_{\mathbb{R}} s(t)\, e^{-2i\pi\nu t}\, dt. \quad (3.1)$$
(Note that the argument of the exponential in the integrand is −2iπνt.) The mapping from the function to its Fourier transform will be denoted by s(t) → ŝ(ν).
Proof. We have
$$|\hat{s}(\nu)| \le \int_{\mathbb{R}} |s(t)\, e^{-2i\pi\nu t}|\, dt = \int_{\mathbb{R}} |s(t)|\, dt < \infty,$$
and, uniformly in ν,
$$\lim_{h\downarrow 0} |\hat{s}(\nu+h) - \hat{s}(\nu)| \le \lim_{h\downarrow 0} \int_{\mathbb{R}} |s(t)|\, \left| e^{-2i\pi h t} - 1 \right| dt = 0,$$
by dominated convergence. □
For the rectangular pulse rec_T(t) = 1_{[-T/2,+T/2]}(t), a straightforward computation gives
$$\mathrm{rec}_T(t) \to T\,\mathrm{sinc}(\nu T),$$
where
$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.$$
A classical computation using contour integration shows that the Gaussian pulse is its own FT, that is,
$$e^{-\pi t^2} \to e^{-\pi \nu^2}. \quad (3.3)$$
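The self-reproducing property (3.3) is easy to verify numerically: since the Gaussian decays fast, truncating the integral in (3.1) to [−10, 10] and using a midpoint Riemann sum recovers the FT to high accuracy. A Python sketch (the truncation and grid size are arbitrary choices):

```python
import cmath
import math

def ft(s, nu, T=10.0, n=4000):
    # Midpoint Riemann sum for  s^(nu) = integral of s(t) exp(-2 i pi nu t) dt
    dt = 2.0 * T / n
    total = 0.0 + 0.0j
    for k in range(n):
        t = -T + (k + 0.5) * dt
        total += s(t) * cmath.exp(-2j * math.pi * nu * t)
    return total * dt

gauss = lambda t: math.exp(-math.pi * t * t)
for nu in (0.0, 0.5, 1.0):
    # compare with (3.3): the FT of exp(-pi t^2) is exp(-pi nu^2)
    assert abs(ft(gauss, nu) - math.exp(-math.pi * nu * nu)) < 1e-6
```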
Another standard example is the one-sided exponential s(t) = e^{-at} 1_{ℝ₊}(t) (a > 0), whose FT is ŝ(ν) = 1/(a + 2iπν).
Let h, x ∈ L¹(ℝ). Then the integral
$$y(t) = \int_{\mathbb{R}} h(t-s)\, x(s)\, ds$$
is defined for almost all t, and therefore defines almost everywhere a function y ∈ L¹(ℝ) whose FT is given by ŷ(ν) = ĥ(ν)x̂(ν).

Proof. By Fubini's theorem,
$$\hat{y}(\nu) = \int_{\mathbb{R}} \int_{\mathbb{R}} h(t-s)\, e^{-2i\pi\nu(t-s)}\, x(s)\, e^{-2i\pi\nu s}\, ds\, dt = \int_{\mathbb{R}} x(s)\, e^{-2i\pi\nu s} \left( \int_{\mathbb{R}} h(t-s)\, e^{-2i\pi\nu(t-s)}\, dt \right) ds = \hat{h}(\nu)\,\hat{x}(\nu). \qquad \square$$

One writes y = h ∗ x.
Example 1.1: The convolution of the rectangular pulse rec_T with itself is the triangular pulse of base [−T, +T] and height T:
$$\mathrm{Tri}_T(t) = (T - |t|)\, 1_{[-T,+T]}(t).$$
By the convolution–multiplication rule, the FT of Tri_T is T² sinc²(νT).
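The identity rec_T ∗ rec_T = Tri_T is easy to confirm numerically. This Python sketch approximates the convolution integral by a Riemann sum (the integration window and grid are arbitrary):

```python
def rec(t, T):
    # rectangular pulse rec_T
    return 1.0 if -T / 2.0 <= t <= T / 2.0 else 0.0

def conv(f, g, t, a=-5.0, b=5.0, n=4000):
    # Riemann-sum approximation of  (f*g)(t) = integral of f(t-s) g(s) ds
    ds = (b - a) / n
    return sum(f(t - (a + (k + 0.5) * ds)) * g(a + (k + 0.5) * ds)
               for k in range(n)) * ds

T = 2.0
for t in (-1.5, 0.0, 0.7, 1.9):
    tri = max(T - abs(t), 0.0)   # Tri_T(t) = (T - |t|) on [-T, T], 0 elsewhere
    assert abs(conv(lambda u: rec(u, T), lambda u: rec(u, T), t) - tri) < 1e-2
```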
Proof. The FT of a rectangular pulse s satisfies |ŝ(ν)| ≤ K/|ν| (see (??)). Hence every function s that is a finite linear combination of indicator functions of intervals satisfies a bound of the same form. Such finite combinations are dense in L¹(ℝ), and therefore there exists a sequence {s_n}_{n≥1} of integrable functions such that
$$\lim_{n\to\infty} \int_{\mathbb{R}} |s_n(t) - s(t)|\, dt = 0
\quad\text{and}\quad
|\hat{s}_n(\nu)| \le \frac{K_n}{|\nu|}$$
for finite numbers K_n. From the inequality
$$|\hat{s}(\nu) - \hat{s}_n(\nu)| \le \int_{\mathbb{R}} |s(t) - s_n(t)|\, dt,$$
we deduce that
$$|\hat{s}(\nu)| \le |\hat{s}_n(\nu)| + \int_{\mathbb{R}} |s(t) - s_n(t)|\, dt \le \frac{K_n}{|\nu|} + \int_{\mathbb{R}} |s(t) - s_n(t)|\, dt,$$
and therefore lim_{|ν|↑∞} ŝ(ν) = 0. □
In spite of the fact that the FT of an integrable function is uniformly bounded and uniformly continuous,
it is not necessarily integrable. For instance, the FT of the rectangular pulse is the cardinal sine, a non-
integrable function. When its FT is integrable, a function admits a Fourier decomposition:
Theorem 3.1.4 Let s : ℝ → ℂ be an integrable function with Fourier transform ŝ. Under the additional condition
$$\int_{\mathbb{R}} |\hat{s}(\nu)|\, d\nu < \infty, \quad (3.9)$$
the inversion formula
$$s(t) = \int_{\mathbb{R}} \hat{s}(\nu)\, e^{2i\pi\nu t}\, d\nu \quad (3.10)$$
holds for almost all t. If s is, in addition to the above assumptions, continuous, equality in (3.10) holds for all t.
(Note that the exponent of the exponential in the integrand is +2iπνt.)
Proof. We first check (exercise) that the above result is true for the functions
$$e_{\alpha,a}(t) = e^{-\alpha t^2 + at} \qquad (\alpha \in \mathbb{R},\ \alpha > 0,\ a \in \mathbb{C}).$$
Let now s be an integrable function and consider the Gaussian density function
$$h_\sigma(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-t^2/2\sigma^2},$$
with the FT
$$\hat{h}_\sigma(\nu) = e^{-2\pi^2\sigma^2\nu^2}.$$
We first show that the inversion formula is true for the convolution s ∗ h_σ. Indeed,
$$(s * h_\sigma)(t) = \int_{\mathbb{R}} s(u)\, h_\sigma(u)\, e_{\frac{1}{2\sigma^2},\, \frac{u}{\sigma^2}}(t)\, du, \quad (3.11)$$
and the FT of this function is, by the convolution–multiplication formula, ŝ × ĥ_σ. Computing this FT directly from the right-hand side of (3.11), we obtain
$$\hat{s}(\nu)\,\hat{h}_\sigma(\nu) = \int_{\mathbb{R}} s(u)\, h_\sigma(u) \left( \int_{\mathbb{R}} e_{\frac{1}{2\sigma^2},\, \frac{u}{\sigma^2}}(t)\, e^{-2i\pi\nu t}\, dt \right) du = \int_{\mathbb{R}} s(u)\, h_\sigma(u)\, \hat{e}_{\frac{1}{2\sigma^2},\, \frac{u}{\sigma^2}}(\nu)\, du.$$
Since the inversion formula holds for the functions e_{α,a},
$$\int_{\mathbb{R}} \hat{s}(\nu)\,\hat{h}_\sigma(\nu)\, e^{2i\pi\nu t}\, d\nu = \int_{\mathbb{R}} s(u)\, h_\sigma(u)\, e_{\frac{1}{2\sigma^2},\, \frac{u}{\sigma^2}}(t)\, du = (s * h_\sigma)(t).$$
Therefore, we have
$$(s * h_\sigma)(t) = \int_{\mathbb{R}} \hat{s}(\nu)\,\hat{h}_\sigma(\nu)\, e^{2i\pi\nu t}\, d\nu. \quad (3.12)$$
Since for all ν ∈ ℝ, lim_{σ↓0} ĥ_σ(ν) = 1, it follows from Lebesgue's dominated convergence theorem that when σ ↓ 0 the right-hand side of (3.12) tends to
$$\int_{\mathbb{R}} \hat{s}(\nu)\, e^{2i\pi\nu t}\, d\nu$$
for all t ∈ ℝ. If we can prove that, when σ ↓ 0, the function on the left-hand side of (3.12) converges in L¹(ℝ) to the function s, then, for almost all t ∈ ℝ, we have the announced equality (Theorem 1.3.7).
Using the fact that ∫_ℝ h_σ(u) du = 1 and Fubini's theorem,
$$\int_{\mathbb{R}} |(s * h_\sigma)(t) - s(t)|\, dt \le \int_{\mathbb{R}} \left( \int_{\mathbb{R}} |s(t-u) - s(t)|\, dt \right) h_\sigma(u)\, du = \int_{\mathbb{R}} f(u)\, h_\sigma(u)\, du,$$
where f(u) := ∫_ℝ |s(t−u) − s(t)| dt. Now, f(u) is bounded (by 2 ∫_ℝ |s(t)| dt), and therefore it suffices to show that lim_{u→0} f(u) = 0 in order to conclude that lim_{σ↓0} ∫_ℝ f(u) h_σ(u) du = 0.

To prove that lim_{u→0} f(u) = 0, we begin with the case where s is continuous with compact support; the result then follows by dominated convergence. Let now s : ℝ → ℂ be only integrable, and let {s_n(·)}_{n≥1} be a sequence of continuous functions with compact support converging to s in L¹(ℝ). Then
$$f(u) \le d(s(\cdot - u), s_n(\cdot - u)) + f_n(u) + d(s_n, s),$$
where f_n(u) := ∫_ℝ |s_n(t−u) − s_n(t)| dt and
$$d(s(\cdot - u), s_n(\cdot - u)) = \int_{\mathbb{R}} |s(t-u) - s_n(t-u)|\, dt = d(s, s_n).$$
Since d(s, s_n) → 0 and lim_{u→0} f_n(u) = 0 for each n, we obtain lim_{u→0} f(u) = 0.
Suppose that, in addition, s is continuous. The right-hand side of (3.10) defines a continuous function
because ŝ(ν) is integrable. The everywhere equality in (3.10) follows from the fact that two continuous
functions that are almost-everywhere equal are necessarily everywhere equal.
Corollary 3.1.1 If two integrable functions s1 and s2 have the same Fourier transform, then they are
equal almost everywhere.
Proof. The function s(t) = s1 (t) − s2 (t) has the FT ŝ(ν) = 0, which is integrable, and thus by (3.10),
s(t) = 0 for almost all t.
Exercise 3.9 is very important. It shows that for functions that cannot be called pathological, the version
of the Fourier inversion theorem that we have in this chapter is not applicable.
In the course of the proof of Theorem 3.1.4, we have used a special case of the regularization lemma below, which is very useful in many circumstances.

Lemma Let {h_σ}_{σ>0} be a family of non-negative functions in L¹(ℝ) such that ∫_ℝ h_σ(u) du = 1 and
$$\lim_{\sigma\downarrow 0} \int_{-a}^{+a} h_\sigma(u)\, du = 1 \quad \text{for all } a > 0.$$
Then, for every s ∈ L¹(ℝ), lim_{σ↓0} ‖s ∗ h_σ − s‖_{L¹(ℝ)} = 0.
Proof. We can use the proof of Theorem 3.1.4, starting from (3.13). The only place where the specific form of h_σ (a Gaussian density) is used is (3.14). We must therefore prove that
$$\lim_{\sigma\downarrow 0} \int_{\mathbb{R}} f(u)\, h_\sigma(u)\, du = 0$$
independently. Fix ε > 0. Since lim_{u→0} f(u) = 0, there exists a = a(ε) such that |f(u)| ≤ ε/2 for |u| ≤ a, and therefore
$$\int_{-a}^{+a} f(u)\, h_\sigma(u)\, du \le \frac{\varepsilon}{2} \int_{-a}^{+a} h_\sigma(u)\, du \le \frac{\varepsilon}{2}.$$
On the other hand, with M a bound for |f|,
$$\int_{|u|>a} f(u)\, h_\sigma(u)\, du \le M \int_{|u|>a} h_\sigma(u)\, du.$$
The last integral is, for sufficiently small σ, less than ε/2M. Therefore, for sufficiently small σ,
$$\int_{\mathbb{R}} f(u)\, h_\sigma(u)\, du \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \qquad \square$$
The function h_σ is an approximation of Dirac's generalized function δ, in that for all φ ∈ C⁰_c,
$$\lim_{\sigma\downarrow 0} \int_{\mathbb{R}} h_\sigma(t)\, \varphi(t)\, dt = \varphi(0) = \int_{\mathbb{R}} \delta(t)\, \varphi(t)\, dt.$$
The last equality is symbolic and defines Dirac's generalized function. The first equality is obtained as in the proof of the above lemma, letting this time f(u) = φ(u) − φ(0).
Theorem 3.1.5 (a) If the integrable function s is such that t^k s(t) ∈ L¹(ℝ) for all 1 ≤ k ≤ n, then its FT is in C^n, and
$$(-2i\pi t)^k s(t) \to \hat{s}^{(k)}(\nu) \quad \text{for all } 1 \le k \le n.$$
(b) If the function s is in C^n and if it is, together with its first n derivatives, integrable, then
$$s^{(k)}(t) \to (2i\pi\nu)^k\, \hat{s}(\nu) \quad \text{for all } 1 \le k \le n.$$

Proof. (a) By the hypothesis t^k s(t) ∈ L¹(ℝ), we can differentiate k times under the integral sign (see Theorem ?? of the appendix) and obtain
$$\hat{s}^{(k)}(\nu) = \int_{\mathbb{R}} (-2i\pi t)^k\, e^{-2i\pi\nu t}\, s(t)\, dt.$$
(b) It suffices to prove this for n = 1 and iterate the result. We first observe that lim_{|a|↑∞} s(a) = 0. Indeed, with a > 0 for instance,
$$s(a) = s(0) + \int_0^a s'(t)\, dt,$$
and therefore, since s′ ∈ L¹(ℝ), the limit exists and is finite. This limit must be 0 because s is integrable. Now, the FT of s′ is
$$\int_{\mathbb{R}} e^{-2i\pi\nu t}\, s'(t)\, dt = \lim_{a\uparrow\infty} \int_{-a}^{+a} e^{-2i\pi\nu t}\, s'(t)\, dt,$$
and an integration by parts gives, in the limit a ↑ ∞, (2iπν) ŝ(ν). □
The results of the present section, and more generally of this chapter, extend to the spatial (multivariate) case. The Fourier transform of a function s : ℝⁿ → ℂ, when defined, is a function ŝ : ℝⁿ → ℂ, that is, ν = (ν₁, ..., ν_n) ∈ ℝⁿ → ŝ(ν) ∈ ℂ. The scalar product of t = (t₁, ..., t_n) ∈ ℝⁿ and ν = (ν₁, ..., ν_n) ∈ ℝⁿ is denoted by
$$\langle t, \nu\rangle := \sum_{k=1}^{n} t_k \nu_k.$$
We shall occasionally only quote an important result, but we shall omit the proofs since these are the
same, mutatis mutandis, as in the univariate case. For instance:
If s is in L¹(ℝⁿ), then the Fourier transform ŝ is well defined by
$$\hat{s}(\nu) = \int_{\mathbb{R}^n} s(t)\, e^{-2i\pi\langle t,\nu\rangle}\, dt.$$
The Fourier transform is then uniformly continuous and bounded, and if moreover ŝ is integrable, then the inversion formula
$$s(t) = \int_{\mathbb{R}^n} \hat{s}(\nu)\, e^{2i\pi\langle t,\nu\rangle}\, d\nu$$
holds for almost all t.
The proof is exactly the same as that of Theorem 3.1.4, with the obvious adaptations. For instance, the multivariate extension of the function h_σ thereof is
$$h_\sigma(t) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\|t\|^2/2\sigma^2},$$
where ‖t‖² = Σ_{k=1}^n t_k². We also need to observe that if s₁, ..., s_n are functions in L¹(ℝ), then s : ℝⁿ → ℂ defined by s(t) = ∏_{k=1}^n s_k(t_k) is in L¹(ℝⁿ), with ŝ(ν) = ∏_{k=1}^n ŝ_k(ν_k).
We start with a technical result. We use f(·) to denote the function f : ℝ → ℂ; in particular, f(a + ·) is the function f_a : ℝ → ℂ defined by f_a(t) = f(a + t).

Theorem If s ∈ L²(ℝ), the mapping
$$t \to s(t + \cdot)$$
from ℝ into L²(ℝ) is uniformly continuous.

Proof. The quantity ‖s(t + h + ·) − s(t + ·)‖_{L²(ℝ)} = ‖s(h + ·) − s(·)‖_{L²(ℝ)} tends to 0 when h → 0 (the uniformity in t of the convergence then follows, since this quantity is independent of t). When s is continuous and compactly supported, the result follows by dominated convergence. The general case s ∈ L²(ℝ) is obtained by approximating s by continuous compactly supported functions (see the proof of Theorem 3.1.4). □
For s ∈ L²(ℝ), the function
$$\tau \to \int_{\mathbb{R}} s(t+\tau)\, s(t)^*\, dt$$
is well defined and is called the autocorrelation function of the finite-energy function s. Note that it is the convolution s ∗ s̃, where s̃(t) = s(−t)*.
Proof. The function s̃ admits the FT ŝ(ν)*, and therefore, by the convolution–multiplication rule, s ∗ s̃ admits the FT |ŝ(ν)|². Let again
$$h_\sigma(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-t^2/2\sigma^2}.$$
Applying the result in (3.12), with s ∗ s̃ instead of s, and observing that h_σ is an even function, we obtain
$$\int_{\mathbb{R}} |\hat{s}(\nu)|^2\, \hat{h}_\sigma(\nu)\, d\nu = \int_{\mathbb{R}} (s * \tilde{s})(x)\, h_\sigma(x)\, dx. \quad (3.18)$$
Since ĥ_σ(ν) = e^{-2π²σ²ν²} ↑ 1 when σ ↓ 0, the left-hand side of (3.18) tends to ∫_ℝ |ŝ(ν)|² dν. On the other hand, since the autocorrelation function s ∗ s̃ is continuous and bounded, the quantity
$$\int_{\mathbb{R}} (s * \tilde{s})(x)\, h_\sigma(x)\, dx = \int_{\mathbb{R}} (s * \tilde{s})(\sigma y)\, h_1(y)\, dy$$
tends to (s ∗ s̃)(0) = ∫_ℝ |s(t)|² dt by dominated convergence. □
From the result of the last section, we have that the mapping φ : s(t) → ŝ(ν) from L¹(ℝ) ∩ L²(ℝ) into L²(ℝ) thus defined is isometric and linear. Since L¹(ℝ) ∩ L²(ℝ) is dense in L²(ℝ), this linear isometry can be uniquely extended into a linear isometry from L²(ℝ) into itself (Theorem 2.2.5). We will continue to denote by ŝ(ν) the image of s under this isometry, and to call it the FT of s.
In particular, for all s₁, s₂ ∈ L²(ℝ) the Plancherel–Parseval identity holds:
$$\int_{\mathbb{R}} s_1(t)\, s_2(t)^*\, dt = \int_{\mathbb{R}} \hat{s}_1(\nu)\, \hat{s}_2(\nu)^*\, d\nu. \quad (3.19)$$
Proof. Let us first show that ∫_ℝ h(t−s) x(s) ds is well defined. For this, we observe that on the one hand
$$\int_{\mathbb{R}} |h(t-s)|\, |x(s)|\, ds \le \int_{\mathbb{R}} |h(t-s)|\, (1 + |x(s)|^2)\, ds = \int_{\mathbb{R}} |h(t)|\, dt + \int_{\mathbb{R}} |h(t-s)|\, |x(s)|^2\, ds,$$
and y : ℝ → ℂ is almost-everywhere well defined. Let us now show that y ∈ L²(ℝ). Using Fubini's theorem,
$$\int_{\mathbb{R}} |y(t)|^2\, dt = \int_{\mathbb{R}}\int_{\mathbb{R}} \left\{ \int_{\mathbb{R}} x(t-u)\, x(t-v)^*\, dt \right\} h(u)\, h(v)^*\, du\, dv \le \left( \int_{\mathbb{R}} |x(s)|^2\, ds \right) \left( \int_{\mathbb{R}} |h(u)|\, du \right)^2 < \infty,$$
that is,
$$\|h * x\|_{L^2(\mathbb{R})} \le \|h\|_{L^1(\mathbb{R})}\, \|x\|_{L^2(\mathbb{R})}. \quad (3.22)$$
The function (3.20) is thus in L²(ℝ) when h ∈ L¹(ℝ) and x ∈ L²(ℝ). If, furthermore, x ∈ L¹(ℝ), then y ∈ L¹(ℝ). Therefore, with
$$x_A(t) = x(t)\, 1_{[-A,+A]}(t)$$
and y_A = h ∗ x_A, we have ŷ_A(ν) = ĥ(ν)x̂_A(ν). Also, lim y_A = y in L²(ℝ) (use (3.22)), and thus lim ŷ_A(ν) = ŷ(ν) in L²(ℝ). Now, since lim x̂_A(ν) = x̂(ν) in L²(ℝ) and ĥ(ν) is bounded, lim ĥ(ν)x̂_A(ν) = ĥ(ν)x̂(ν) in L²(ℝ). Therefore, we have (3.21).
So far, we know that the mapping ϕ : L2 ( ) 7→ L2 ( ) defined above is linear, isometric, and into. We
shall now show that it is onto, and therefore bijective.
that is to say,
Z +A
s(t) = lim ŝ(ν)e2iπνt dν. (3.25)
A↑∞ −A
We shall prepare the way for the proof with the following result: for u, v ∈ L²(ℝ),
$$\int_{\mathbb{R}} u(x)\, \hat{v}(x)\, dx = \int_{\mathbb{R}} \hat{u}(x)\, v(x)\, dx. \quad (3.26)$$
Proof. If (3.26) is true for u, v ∈ L¹(ℝ) ∩ L²(ℝ), then it also holds for u, v ∈ L²(ℝ). Indeed, denoting x_A(t) = x(t) 1_{[-A,+A]}(t), we have
$$\int_{\mathbb{R}} u_A(x)\, \widehat{v_A}(x)\, dx = \int_{\mathbb{R}} \widehat{u_A}(x)\, v_A(x)\, dx.$$
Now u_A, v_A, û_A and v̂_A tend in L²(ℝ) to u, v, û and v̂, respectively, as A ↑ ∞, and therefore, by the continuity of the Hermitian product, ∫_ℝ u v̂ dx = ∫_ℝ û v dx. For u, v ∈ L¹(ℝ) ∩ L²(ℝ), Fubini's theorem gives
$$\int_{\mathbb{R}} u(x)\, \hat{v}(x)\, dx = \int_{\mathbb{R}} v(y) \left\{ \int_{\mathbb{R}} u(x)\, e^{-2i\pi xy}\, dx \right\} dy = \int_{\mathbb{R}} v(y)\, \hat{u}(y)\, dy. \qquad \square$$
Proof (of (3.24)). Let g : ℝ → ℝ be a real function in L²(ℝ) and define f = \widehat{g^-}, where g⁻(t) := g(−t). Since applying the FT twice to a function of L²(ℝ) gives back the reversed function, f̂ = (g⁻)⁻ = g. In other words, every real (and therefore every complex) function g ∈ L²(ℝ) is the Fourier transform of some function of L²(ℝ). Hence the mapping φ is onto. □
A function s : → is called periodic with period T > 0 (or T -periodic) if for all t ∈ ,
s(t + T ) = s(t).
Such a function is called locally integrable if, in addition, s ∈ L¹([0, T]), that is,
$$\int_0^T |s(t)|\, dt < \infty.$$
A T-periodic function s is called locally square-integrable if s ∈ L²([0, T]), that is,
$$\int_0^T |s(t)|^2\, dt < \infty.$$
In a signal processing context, one says in this case that the function s has finite power, since
$$\lim_{A\to\infty} \frac{1}{A} \int_0^A |s(t)|^2\, dt = \frac{1}{T} \int_0^T |s(t)|^2\, dt < \infty.$$
As the Lebesgue measure of [0, T ] is finite, L2 ([0,T ]) ⊂ L1 ([0,T ]) (See Theorem 1.3.1). In particular, a
finite power periodic function is also locally integrable.
Definition 3.2.1 The Fourier transform {ŝ_n}, n ∈ ℤ, of the locally integrable T-periodic function s is defined by
$$\hat{s}_n = \frac{1}{T} \int_0^T s(t)\, e^{-2i\pi \frac{n}{T} t}\, dt. \quad (3.28)$$
One often represents the sequence {ŝ_n}_{n∈ℤ} of the Fourier coefficients of a T-periodic function by 'spectral lines' separated by 1/T from each other along the frequency axis; the spectral line at frequency n/T has the complex amplitude ŝ_n. This is sometimes interpreted by saying that the FT of s is
$$\hat{s}(\nu) = \sum_{n\in\mathbb{Z}} \hat{s}_n\, \delta\!\left(\nu - \frac{n}{T}\right),$$
where δ is Dirac's generalized function.
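The coefficients (3.28) can be computed numerically. For a trigonometric polynomial such as s(t) = 1 + cos(2πt/T), the coefficients ŝ₀ = 1 and ŝ_{±1} = 1/2 (all others 0) are recovered up to rounding. A Python sketch using a midpoint Riemann sum (grid size arbitrary):

```python
import cmath
import math

def fourier_coeff(s, n, T, m=1000):
    # s^_n = (1/T) * integral over [0, T] of s(t) exp(-2 i pi n t / T) dt
    dt = T / m
    total = 0.0 + 0.0j
    for k in range(m):
        t = (k + 0.5) * dt
        total += s(t) * cmath.exp(-2j * math.pi * n * t / T)
    return total * dt / T

T = 3.0
s = lambda t: 1.0 + math.cos(2.0 * math.pi * t / T)
assert abs(fourier_coeff(s, 0, T) - 1.0) < 1e-9
assert abs(fourier_coeff(s, 1, T) - 0.5) < 1e-9
assert abs(fourier_coeff(s, 2, T)) < 1e-9
```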
The Poisson kernel will play, in the proof of the Fourier series inversion formula, a role similar to that of the Gaussian pulse in the proof of the Fourier transform inversion formula of the previous section. It is defined, for 0 ≤ r < 1, by
$$P_r(t) = \sum_{n\in\mathbb{Z}} r^{|n|}\, e^{2i\pi\frac{n}{T} t}. \quad (3.29)$$
Summing the two geometric series (over n ≥ 0 and n < 0) gives
$$P_r(t) = \frac{1 - r^2}{\left| 1 - r\, e^{2i\pi t/T} \right|^2},$$
and therefore
$$P_r(t) \ge 0. \quad (3.30)$$
Also,
$$\frac{1}{T} \int_{-T/2}^{+T/2} P_r(t)\, dt = 1. \quad (3.31)$$
In view of the above expression for the Poisson kernel, we have the bound
$$\frac{1}{T} \int_{[-\frac{T}{2},+\frac{T}{2}] \setminus [-\varepsilon,+\varepsilon]} P_r(t)\, dt \le \frac{1 - r^2}{\left| 1 - e^{2i\pi \varepsilon/T} \right|^2}. \quad (3.32)$$
Properties (3.30), (3.31), and (3.32) make the Poisson kernel a regularizing kernel; in particular,
$$\lim_{r\uparrow 1} \frac{1}{T} \int_{-T/2}^{+T/2} \varphi(t)\, P_r(t)\, dt = \varphi(0)$$
for every continuous T-periodic function φ.
The following result is similar to the Fourier inversion formula for integrable functions (Theorem 3.1.4).

Theorem 3.2.1 Let s : ℝ → ℂ be a T-periodic locally integrable function with Fourier coefficients {ŝ_n}, n ∈ ℤ. If
$$\sum_{n\in\mathbb{Z}} |\hat{s}_n| < \infty, \quad (3.33)$$
then
$$s(t) = \sum_{n\in\mathbb{Z}} \hat{s}_n\, e^{2i\pi\frac{n}{T}t} \quad (3.34)$$
for almost all t. If we add to the above hypotheses the assumption that s is a continuous function, then the inversion formula (3.34) holds for all t.
Proof. We have
$$\sum_{n\in\mathbb{Z}} \hat{s}_n\, r^{|n|}\, e^{2i\pi\frac{n}{T}t} = \frac{1}{T}\int_{-T/2}^{+T/2} s(u)\, P_r(t-u)\, du \quad (3.35)$$
and
$$\lim_{r\uparrow 1} \int_0^T \left| \frac{1}{T}\int_0^T s(u)\, P_r(t-u)\, du - s(t) \right| dt = 0,$$
that is to say: the right-hand side of (3.35) tends to s in L¹([0, T]) when r ↑ 1. Since Σ_{n∈ℤ} |ŝ_n| < ∞, the function of t on the left-hand side of (3.35) tends to the function Σ_{n∈ℤ} ŝ_n e^{+2iπ(n/T)t}, pointwise and in L¹([0, T]). The result then follows from Theorem 1.3.7.
The statement in the case where s is continuous is proved exactly as the corresponding statement in
Theorem 3.1.4.
As in the case of integrable functions, we deduce from the inversion formula the uniqueness theorem:
Corollary 3.2.1 Two locally integrable periodic functions with the same period T that have the same
Fourier coefficients are equal almost everywhere.
The forthcoming result establishes the connection between Fourier transform and Fourier series, and is
central to sampling theory. It is a weak form of the Poisson summation formula (see the discussion after
the statement of the theorem).
Theorem 3.2.2 Let s : ℝ → ℂ be an integrable function and let 0 < T < ∞ be fixed. The series Σ_{n∈ℤ} s(t + nT) converges absolutely for almost all t and defines a T-periodic, locally integrable function whose n-th Fourier coefficient is (1/T) ŝ(n/T).

We paraphrase this result as follows: under the above conditions, the function
$$\Phi(t) := \sum_{n\in\mathbb{Z}} s(t + nT) \quad (3.37)$$
is T-periodic and locally integrable, with formal Fourier series
$$S_f(t) = \frac{1}{T} \sum_{n\in\mathbb{Z}} \hat{s}\!\left(\frac{n}{T}\right) e^{2i\pi \frac{n}{T} t}.$$
(We speak of a "formal" Fourier series, because nothing is said about its convergence.) Therefore, whenever we are able to show that the Fourier series represents the function at t = 0, that is, if Φ(0) = S_f(0), then we obtain the Poisson summation formula
$$\sum_{n\in\mathbb{Z}} s(nT) = \frac{1}{T} \sum_{n\in\mathbb{Z}} \hat{s}\!\left(\frac{n}{T}\right). \quad (3.36)$$
For the time being, we say nothing about the convergence of the Fourier series. This is why we call the
formula the weak Poisson formula. The strong Poisson formula corresponds to the case where one can
prove the equality everywhere (and in particular at t = 0) of Φ and of its Fourier series.
Proof. By Tonelli's theorem,
$$\int_0^T \sum_{n\in\mathbb{Z}} |s(t + nT)|\, dt = \sum_{n\in\mathbb{Z}} \int_{nT}^{(n+1)T} |s(t)|\, dt = \int_{\mathbb{R}} |s(t)|\, dt < \infty.$$
In particular,
$$\sum_{n\in\mathbb{Z}} |s(t + nT)| < \infty \quad \text{a.e.}$$
P
Therefore the series n∈ s(t + nT ) converges absolutely for almost all t. In particular, Φ is well-defined
(define it arbitrarily when the series does not converge). This function is clearly T -periodic. We have
Z T Z T ˛˛ X ˛
˛
˛ ˛
|Φ(t)| dt = ˛ s(t + nT )˛ dt
0 0 ˛n∈ ˛
Z T X Z
≤ |s(t + nT )| dt = |s(t)| dt < ∞.
0
n∈
Its n-th Fourier coefficient is

    (1/T) ∫_0^T ∑_{k∈ℤ} s(t + kT) e^{−2iπ(n/T)(t+kT)} dt = (1/T) ∫_ℝ s(t) e^{−2iπ(n/T)t} dt = (1/T) ŝ(n/T).
We have a function and we have its formal Fourier series. When both are equal everywhere, we obtain
the strong Poisson summation formula. Exercise 3.23 gives a sufficient condition for this.
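In a case where the strong formula certainly applies, both sides of (3.36) can be compared numerically. The sketch below is an illustration, not part of the text; it uses the Gaussian pulse s(t) = e^{−πt²}, whose FT is ŝ(ν) = e^{−πν²}:

```python
import numpy as np

# Poisson summation for the Gaussian pulse s(t) = exp(-pi t^2),
# whose FT is s^(nu) = exp(-pi nu^2): sum_n s(nT) = (1/T) sum_n s^(n/T).
T = 0.7
n = np.arange(-50, 51)
lhs = np.sum(np.exp(-np.pi * (n * T) ** 2))
rhs = np.sum(np.exp(-np.pi * (n / T) ** 2)) / T
assert abs(lhs - rhs) < 1e-12
```

Both sums converge extremely fast here, which is why truncation at |n| ≤ 50 is more than enough.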
and the Hilbert space L²([0, T], dt/T) of complex functions x = {x(t)}, t ∈ ℝ, such that ∫_0^T |x(t)|² dt < ∞, with the Hermitian product

    ⟨x, y⟩_{L²([0,T], dt/T)} = (1/T) ∫_0^T x(t) y(t)* dt.    (3.40)

The mapping s(·) → {ŝ_n} defines a linear isometry from L²([0, T], dt/T) onto ℓ²(ℤ), the inverse of which is given by

    s(t) = ∑_{n∈ℤ} ŝ_n e^{2iπ(n/T)t},    (3.42)

where the series on the right-hand side converges in L²([0, T], dt/T), and the equality is almost-everywhere. This isometry is summarized by the Plancherel–Parseval identity:

    ∑_{n∈ℤ} x̂_n ŷ_n* = (1/T) ∫_0^T x(t) y(t)* dt.    (3.43)
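The identity (3.43) with x = y is easy to check numerically; the sketch below (illustrative, with an arbitrarily chosen smooth T-periodic function) compares the energy computed in the time domain and in the Fourier coefficients:

```python
import numpy as np

# Plancherel-Parseval (3.43) with x = y: sum_n |x^_n|^2 = (1/T) int_0^T |x|^2 dt.
T = 1.0
M = 4096
t = np.arange(M) * (T / M)                      # uniform grid over one period
x = np.exp(np.cos(2 * np.pi * t / T))           # a smooth T-periodic function
coeffs = [np.mean(x * np.exp(-2j * np.pi * n * t / T)) for n in range(-30, 31)]
lhs = sum(abs(c) ** 2 for c in coeffs)
rhs = np.mean(np.abs(x) ** 2)                   # = (1/T) int_0^T |x(t)|^2 dt
assert abs(lhs - rhs) < 1e-8
```

The rectangle rule is used on purpose: for smooth periodic integrands over a full period it is spectrally accurate.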
Proof. The result follows from general results on orthonormal bases of Hilbert spaces, since the sequence

    {e_n(·)} := {(1/√T) e^{2iπ(n/T)·}},  n ∈ ℤ,

is a complete orthonormal sequence of L²([0, T], dt/T) (Theorem 3.2.6).
Let ℓ¹(ℤ) be the space of sequences f_n, n ∈ ℤ, such that ∑_{n∈ℤ} |f_n| < ∞ (integrable discrete-time functions).
Theorem 3.2.4 ℓ¹(ℤ) ⊂ ℓ²(ℤ); that is, a discrete-time integrable function has finite energy.
Proof. Let A = {n : |x_n| ≥ 1}. Since ∑_{n∈ℤ} |x_n| < ∞, necessarily card(A) < ∞. On the other hand, for n ∉ A we have |x_n|² ≤ |x_n|, so that ∑_{n∉A} |x_n|² ≤ ∑_{n∈ℤ} |x_n| < ∞; adding the finitely many terms indexed by A gives ∑_{n∈ℤ} |x_n|² < ∞.
The situation for discrete-time functions is in contrast with that of continuous-time functions, for which
there exist integrable functions with infinite energy, and finite energy functions which are not integrable.
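The discrete inclusion can also be seen from the elementary bound ∑_n |x_n|² ≤ (∑_n |x_n|)², which holds for any sequence since the squared sum contains all the non-negative cross terms. A quick numerical illustration (the test sequence is an arbitrary choice):

```python
import numpy as np

# l^1 implies l^2: sum |x_n|^2 <= (sum |x_n|)^2 for any sequence.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000) / np.arange(1, 1001) ** 2   # an l^1 sequence
l1 = np.sum(np.abs(x))
energy = np.sum(np.abs(x) ** 2)
assert energy <= l1 ** 2
```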
Let L²(2π) be the Hilbert space of functions f̃ : [−π, +π] → ℂ such that ∫_{−π}^{+π} |f̃(ω)|² dω < ∞, provided with the Hermitian product ⟨f̃, g̃⟩ = ∫_{−π}^{+π} f̃(ω) g̃(ω)* dω/2π.

Theorem 3.2.5 There exists a linear isomorphism between L²(2π) and ℓ²(ℤ) defined by

    f_n = ∫_{−π}^{+π} f̃(ω) e^{inω} dω/2π,    f̃(ω) = ∑_{n∈ℤ} f_n e^{−inω}.    (3.44)
Proof. One first observes that {e_n(·), n ∈ ℤ} is an orthonormal system in L²([0, T]). It remains to show that the linear space it generates is dense in L²([0, T]) (Theorem 2.3.2).
For this, let f ∈ L²([0, T]) and let f_N be its projection on the Hilbert subspace generated by {e_n(·), −N ≤ n ≤ N}. The coefficient of e_n in this projection is c_n(f) = ⟨f, e_n⟩_{L²([0,T])}, and we have

    ∑_{n=−N}^{+N} |c_n(f)|² + ∫_0^T |f(t) − f_N(t)|² dt = ∫_0^T |f(t)|² dt.    (3.46)
(This is Pythagoras’ theorem for projections: ‖P_G(x)‖² + ‖x − P_G(x)‖² = ‖x‖².) In particular, ∑_{n∈ℤ} |c_n(f)|² < ∞. It remains to show ((b) of Theorem 2.3.2) that

    lim_{N↑∞} ∫_0^T |f(t) − f_N(t)|² dt = 0.
We assume in a first step that f is continuous. For such a function, the formula

    φ(x) = ∫_0^T f̃(x + t) f̃(t)* dt,

where

    f̃(t) = ∑_{n∈ℤ} f(t + nT) 1_{(0,T]}(t + nT),

defines a continuous T-periodic function with

    ∫_0^T φ(x) e^{−2iπ(n/T)x} dx = T |c_n(f)|².

Since ∑_{n∈ℤ} |c_n(f)|² < ∞ and φ(x) is continuous, it follows from the Fourier inversion theorem for Fourier series that φ is everywhere equal to its Fourier series. In particular, for x = 0,

    φ(0) = ∫_0^T |f(t)|² dt = ∑_{n∈ℤ} |c_n(f)|²,

which, in view of (3.46), shows that lim_{N↑∞} ∫_0^T |f(t) − f_N(t)|² dt = 0 for continuous f.
It remains to pass from the continuous functions to the square-integrable functions. Since the space C([0, T]) of continuous functions from [0, T] into ℂ is dense in L²([0, T]), with any ε > 0 one can associate ϕ ∈ C([0, T]) such that ‖f − ϕ‖ ≤ ε/3. By Bessel’s inequality, ‖f_N − ϕ_N‖² = ‖(f − ϕ)_N‖² ≤ ‖f − ϕ‖², and therefore

    ‖f − f_N‖ ≤ ‖f − ϕ‖ + ‖ϕ − ϕ_N‖ + ‖f_N − ϕ_N‖ ≤ ‖ϕ − ϕ_N‖ + 2‖f − ϕ‖ ≤ ‖ϕ − ϕ_N‖ + 2ε/3.
binary symbols, 0 and 1. This binary sequence is generated by first sampling the analog function, that
is, extracting a sequence of samples {s(n∆)}_{n∈ℤ}, and then quantizing, that is, converting each sample
The first question that arises is: To what extent does the sample sequence represent the original function?
This cannot be true without further assumptions since obviously an infinity of functions fit a given
sequence of samples.
The second question is: How to reconstruct efficiently the function from its samples?
Then,

    ∑_{j∈ℤ} ŝ(ν + 2jB) = (1/2B) ∑_{n∈ℤ} s(n/2B) e^{−2iπν n/(2B)},  a.e.    (3.48)
Proof. By Theorem 3.2.2, the 2B-periodic function Φ(ν) = ∑_{j∈ℤ} ŝ(ν + 2jB) is locally integrable, and — since the Fourier inversion formula for s holds (ŝ is integrable) and it holds everywhere (s is continuous) — the n-th Fourier coefficient of Φ(ν) is in fact equal to

    (1/2B) s(−n/2B).

The formal Fourier series of Φ(ν) is therefore

    (1/2B) ∑_{n∈ℤ} s(n/2B) e^{−2iπν n/(2B)}.
In view of condition (3.47), the Fourier inversion formula holds a.e. (Theorem 3.2.1), that is Φ(ν) is
almost everywhere equal to its Fourier series. This proves (3.48).
Since the frequency response T(ν) ∈ L¹(ℝ), the impulse response h given by (3.49) is bounded and uniformly continuous, and therefore s̃ is bounded and continuous (the right-hand side of (3.50) is a normally convergent series—by (3.47)—of bounded and continuous functions). Also, on substituting (3.49) in (3.50) we obtain

    s̃(t) = (1/2B) ∑_{n∈ℤ} s(n/2B) ∫_ℝ T(ν) e^{2iπν(t − n/(2B))} dν
         = ∫_ℝ { (1/2B) ∑_{n∈ℤ} s(n/2B) e^{−2iπν n/(2B)} } T(ν) e^{2iπνt} dν.

Therefore,

    s̃(t) = ∫_ℝ g(ν) e^{2iπνt} dν,

where

    g(ν) = { (1/2B) ∑_{n∈ℤ} s(n/2B) e^{−2iπν n/(2B)} } T(ν).
Theorem 3.2.8 Let s : ℝ → ℂ be an integrable and continuous function whose FT ŝ vanishes outside [−B, +B], and assume condition (3.47) is satisfied. We can then recover s from its samples s(n/2B), n ∈ ℤ, by the formula

    s(t) = ∑_{n∈ℤ} s(n/2B) sinc(2Bt − n),  a.e.    (3.52)
Proof. This is a direct consequence of the previous theorem with T(ν) the frequency response of the low-pass (B). Indeed,

    { ∑_{j∈ℤ} ŝ(ν + 2jB) } T(ν) = ŝ(ν) 1_{[−B,+B]}(ν) = ŝ(ν).
The second equality is an almost-everywhere equality; it holds everywhere when s is a continuous function
(see Corollary A1.2).
If we interpret s(n/2B) h(t − n/2B) as the response of the low-pass (B) when a Dirac impulse of height s(n/2B) is applied at time n/2B, the right-hand side of equation (3.52) is the response of the low-pass (B) to Dirac’s comb

    s_i(t) = (1/2B) ∑_{n∈ℤ} s(n/2B) δ(t − n/2B).    (3.53)
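Formula (3.52) can be tried out numerically on a concrete base-band function. The sketch below is illustrative (the series must be truncated); it uses s(t) = sinc(Bt)², whose FT is the triangle function supported on [−B, +B]:

```python
import numpy as np

# Shannon-Nyquist reconstruction (3.52) for s(t) = sinc(Bt)^2,
# whose FT is the triangle function supported on [-B, +B].
B = 1.0
s = lambda t: np.sinc(B * t) ** 2           # np.sinc(x) = sin(pi x)/(pi x)
n = np.arange(-4000, 4001)                  # truncation of the series
t_test = np.array([0.0, 0.3, 1.7, -2.45])
recon = np.array([np.sum(s(n / (2 * B)) * np.sinc(2 * B * t - n))
                  for t in t_test])
assert np.max(np.abs(recon - s(t_test))) < 1e-3
```

The samples s(n/2B) decay like 1/n² here, so condition (3.47) holds and the truncation error is easily controlled.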
Theorem 3.2.9 Let s(t₁, …, t_n) be an integrable and continuous function whose FT ŝ(ν₁, …, ν_n) vanishes outside [−B₁, +B₁] × ··· × [−B_n, +B_n], and assume that

    ∑_{k₁∈ℤ} ··· ∑_{k_n∈ℤ} | s(k₁/2B₁, …, k_n/2B_n) | < ∞.    (3.54)

Then we can recover s from its samples s(k₁/2B₁, …, k_n/2B_n), k₁ ∈ ℤ, …, k_n ∈ ℤ, by the formula

    s(t₁, …, t_n) = ∑_{k₁∈ℤ} ··· ∑_{k_n∈ℤ} s(k₁/2B₁, …, k_n/2B_n) sinc(2B₁t₁ − k₁) × ··· × sinc(2B_nt_n − k_n),  a.e.    (3.55)
What happens in the Shannon–Nyquist sampling theorem if one supposes that the function is base-band
(B), although it is not the case in reality?
Suppose that an integrable function s is sampled at the frequency 2B and that the resulting impulse train is applied to the low-pass (B) with impulse response h(t) = 2B sinc(2Bt), to obtain, after division by 2B, the function

    s̃(t) = ∑_{n∈ℤ} s(n/2B) sinc(2Bt − n).
What is the FT of this function? The answer is given by the theorem below which is a direct consequence
of Theorem 3.2.7.
Theorem 3.2.10 Let s : ℝ → ℂ be an integrable and continuous function such that condition (3.47) is satisfied. The function

    s̃(t) = ∑_{n∈ℤ} s(n/2B) sinc(2Bt − n)    (3.56)

can be written s̃(t) = ∫_ℝ ŝ̃(ν) e^{2iπνt} dν, where

    ŝ̃(ν) = { ∑_{j∈ℤ} ŝ(ν + 2jB) } 1_{[−B,+B]}(ν).    (3.57)
If s̃ is integrable, then ŝ̃ is its FT, by the Fourier inversion theorem. This FT is obtained by superposing, in the frequency band [−B, +B], the translates by multiples of 2B of the initial spectrum ŝ(ν). This superposition constitutes the phenomenon of spectrum folding, and the distortion that it creates is called aliasing.
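The effect of aliasing, and the identity between (3.56) and (3.57), can be checked numerically on an undersampled Gaussian pulse (illustrative sketch; the Gaussian is not base-band, which is the whole point):

```python
import numpy as np

# Undersampled Gaussian: s(t) = exp(-pi t^2), s^(nu) = exp(-pi nu^2), 2B = 1.
B, t = 0.5, 0.3
n = np.arange(-30, 31)
s_tilde = np.sum(np.exp(-np.pi * (n / (2 * B)) ** 2) * np.sinc(2 * B * t - n))
# Fourier inversion of the folded spectrum (3.57) over [-B, +B] (midpoint rule).
h = 2 * B / 20000
nu = -B + (np.arange(20000) + 0.5) * h
folded = sum(np.exp(-np.pi * (nu + 2 * k * B) ** 2) for k in range(-8, 9))
inv = np.sum(folded * np.exp(2j * np.pi * nu * t)) * h
assert abs(s_tilde - inv.real) < 1e-6                   # (3.56) equals (3.57)
assert abs(s_tilde - np.exp(-np.pi * t ** 2)) > 1e-3    # aliasing distortion
```

The first assertion checks Theorem 3.2.10; the second shows that s̃ genuinely differs from s away from the sample points.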
The L¹ version of the Shannon–Nyquist theorem contains a condition bearing on the samples themselves, namely:

    ∑_{n∈ℤ} | s(n/2B) | < ∞.    (3.58)
The simplest way of removing this unaesthetic condition is given by the L2 version of the Shannon–
Nyquist theorem.
where

    b_n = ∫_{−B}^{+B} ŝ(ν) e^{2iπν n/(2B)} dν.
Proof. Denote by L²(ℝ; B) the Hilbert subspace of L²(ℝ) consisting of the finite-energy complex functions with a Fourier transform having its support contained in [−B, +B]. The sequence

    { (1/√(2B)) h(· − n/2B) }_{n∈ℤ},    (3.60)

where h(t) ≡ 2B sinc(2Bt), is an orthonormal basis of L²(ℝ; B). Indeed, the functions of this system are in L²(ℝ; B), and they form an orthonormal system since, by the Plancherel–Parseval formula (note that ĥ(ν) = 1_{[−B,+B]}(ν)),

    ∫_ℝ h(t − n/2B) h(t − k/2B)* dt = ∫_ℝ ĥ(ν) e^{−2iπν n/(2B)} ( ĥ(ν) e^{−2iπν k/(2B)} )* dν
        = ∫_{−B}^{+B} e^{2iπν(k−n)/(2B)} dν = 2B × 1_{n=k}.
It remains to prove the totality of the orthonormal system (3.60) (see Theorem 2.3.2). We must show that if g ∈ L²(ℝ; B) and

    ∫_ℝ g(t) h(t − n/2B) dt = 0  for all n ∈ ℤ,    (3.61)

then g = 0 almost everywhere. By the Plancherel–Parseval formula, (3.61) reads

    ∫_{−B}^{+B} ĝ(ν) e^{2iπν n/(2B)} dν = 0  for all n ∈ ℤ.    (3.62)

But we have proven in the previous section that the system {e^{2iπν n/(2B)}}_{n∈ℤ} is total in L²([−B, +B]); therefore (3.62) implies ĝ(ν) = 0 almost everywhere, and consequently g(t) = 0 almost everywhere.
where the limit and the equality in (3.63) are taken in the L² sense (as in (3.59)), and

    c_n = ∫_ℝ s(t) (1/√(2B)) h(t − n/2B) dt.

By the Plancherel–Parseval identity,

    c_n = ∫_{−B}^{+B} ŝ(ν) (1/√(2B)) e^{2iπν n/(2B)} dν.
3.3 Exercises
Exercise 3.1.
Show that the FT of a real function is Hermitian even, that is:

    ĥ(−ν) = ĥ(ν)*.
Show that the FT of an odd (resp., even; resp., real and even) function is odd (resp., even; resp., real
and even).
Exercise 3.2.
Let f : ℝ → ℂ be integrable with respect to the Lebesgue measure. Show that its Fourier
transform fˆ is continuous and bounded.
Exercise 3.3.
Let x : ℝ → ℂ be an integrable function. Show that its autocorrelation function

    c(t) = ∫_ℝ x(s + t) x*(s) ds
Exercise 3.4.
Show that the n-th convolution power of f(t) = e^{−at} 1_{t≥0}(t), where a > 0, is

    f^{*n}(t) = (t^{n−1}/(n − 1)!) e^{−at} 1_{t≥0}(t).
Exercise 3.5.
Show that for s(t) ∈ L²(ℝ),

    lim_{T↑∞} ∫_ℝ | ŝ(ν) − ∫_{−T}^{+T} s(t) e^{−2iπνt} dt |² dν = 0.    (3.64)
Exercise 3.6.
Use the Plancherel-Parseval identity to show that
    ∫_ℝ ( sin(πν)/(πν) )² dν = 1.
Exercise 3.7.
Give the FT of s(t) = 1/(a² + t²). Deduce from this the value of the integral

    I(t) = ∫_ℝ du/(t² + u²),  t > 0.
Exercise 3.8.
Deduce from the Fourier inversion formula that
    ∫_ℝ ( sin(t)/t )² dt = π.
3.3. EXERCISES 63
Exercise 3.9.
Let s : → be an integrable right-continuous function, with a limit from the left at all times. Show
Exercise 3.10.
Let s : ℝ → ℂ be an integrable function with a Fourier transform with compact support. Show that s ∈ C^∞, that all its derivatives are integrable, and that the k-th derivative has the FT (2iπν)^k ŝ(ν).
Exercise 3.11.
Give a differential equation satisfied by the Gaussian pulse, and use it to deduce its Fourier transform.
Could you do the same to prove (3.4)?
Exercise 3.12.
Use the Plancherel–Parseval identity to prove that

    ∫_ℝ dt / ((t² + a²)(t² + b²)) = π / (ab(a + b)).
Exercise 3.13.
Show that if an integrable function is base-band (that is, if its FT has compact support), then it also
has a finite energy.
Exercise 3.14.
Show that if an integrable function is discontinuous at a point t = a, its FT is not integrable. (This
shows that the L1 Fourier inversion theorem is limited in scope, since it does not take much for an
integrable function not to have an integrable FT.)
Exercise 3.15.
Find a constant c and polynomials P(z) and Q(z) as in Theorem ??, such that

    (5 − 2 cos(ω)) / (3 − cos(ω)) = c | Q(e^{−iω}) / P(e^{−iω}) |².
Exercise 3.16.
Compute the Fourier coefficients of the T-periodic function s : ℝ → ℝ such that on [0, T), s(t) = t.
Exercise 3.17.
Let s : ℝ → ℂ be a locally integrable T-periodic function. Defining s_T(t) = s(t) 1_{[0,T]}(t), show that the n-th Fourier coefficient ŝ_n of s and the FT ŝ_T of s_T are linked by

    ŝ_n = (1/T) ŝ_T(n/T).    (3.65)
Exercise 3.18.
Compute the Fourier coefficients of the T-periodic function s(t) such that on [−T/2, +T/2), s(t) = 1_{[−αT/2, +αT/2]}(t), where α ∈ (0, 1).
Exercise 3.19.
Let s(t) be a T-periodic locally integrable function with n-th Fourier coefficient ŝ_n. Show that lim_{|n|↑∞} ŝ_n = 0.
Exercise 3.20.
Let x(t) be a T-periodic locally integrable function and let h : ℝ → ℂ be an integrable function. Show that the function

    y(t) = ∫_ℝ h(t − s) x(s) ds    (3.66)

is almost-everywhere well-defined, T-periodic, and locally integrable, and that its n-th Fourier coefficient is

    ŷ_n = ĥ(n/T) x̂_n,    (3.67)

where ĥ is the FT of h.
Exercise 3.21.
Compute

    ∑_{n≥1} 1/n²

using the expression of the Fourier coefficients of the 2-periodic function s : ℝ → ℝ such that
Exercise 3.22.
Let x : ℝ → ℂ be a T-periodic locally integrable function with n-th Fourier coefficient x̂_n such that

    ∑_{n∈ℤ} |n|^p |x̂_n| < ∞.
Exercise 3.23.
Let s : ℝ → ℂ be an integrable function with FT ŝ(ν), and suppose that

(a) ∑_{n∈ℤ} s(t + nT) is a continuous function, and

(b) ∑_{n∈ℤ} |ŝ(n/T)| < ∞.
Exercise 3.24.
Show that if the function

    s(t) = ( sin(2πBt)/(πt) )²

is sampled at the times n/2B, n ∈ ℤ, and if the resulting train of impulses is filtered by a low-pass (B) and divided by 2B, the result is the function

    sin(2πBt)/(πt).
Exercise 3.25.
Let ν₀ and B be such that

    0 < 2B < ν₀,

and let s : ℝ → ℝ be an integrable and continuous base-band (B) function such that ∑_{k∈ℤ} |s(k/ν₀)| < ∞. Consider the train of impulses

    s_i(t) = (1/ν₀) ∑_{n∈ℤ} s(n/ν₀) δ(t − n/ν₀).

Passing this train through a low-pass (ν₀ + B) one obtains a function a. Passing this train through a low-pass (B) one obtains a function b. Show that

    a(t) − b(t) = 2 s(t) cos(2πν₀ t).

(We have therefore effected the frequency transposition of the original function.)
Chapter 4
Probability
Introduction. From the formal (and limited) point of view, probability theory is a particular case of
measure and integration theory. However, at least the terminologies of the two theories are different
and we shall proceed to the “translation” of the theory of measure and integration into the theory of
probability and expectation.
Definition 4.1.1 A probability space is a triple (Ω, F, P ), where P , the probability, is a measure on the
measurable space (Ω, F) with total mass 1.
Definition 4.1.2 A random element is a measurable function X from (Ω, F) to a measurable space
(E, E).
If (E, E) = (ℝ, B) (resp., (ℝ̄, B̄)), X is called a real (resp., extended) random variable. If (E, E) = (ℝⁿ, Bⁿ), X = (X₁, …, X_n) is called a random vector. A complex random variable is a function X : Ω → ℂ of the form X = X_R + iX_I, where X_R and X_I are real random variables.
A random variable X is a measurable function, and therefore we can define under general circumstances
its integral with respect to the probability measure P , called the expectation of X and denoted E [X]:
    E[X] = ∫_Ω X(ω) P(dω).
From the general theory of integration summarized in the previous chapter we collect the following
results. First, if A ∈ F,
E[1A ] = P (A). (4.1)
68 CHAPTER 4. PROBABILITY
Second, if X is a simple random variable, X = ∑_{i=1}^N α_i 1_{A_i}, where α_i ∈ ℝ, A_i ∈ F, then

    E[X] = ∑_{i=1}^N α_i P(A_i).
For a nonnegative random variable X, the expectation is always defined by E[X] = lim_{n↑∞} E[X_n], where {X_n}_{n≥1} is any nondecreasing sequence of nonnegative simple random variables that converges to X. This definition is consistent, that is, it does not depend on the approximating sequence of nonnegative simple random variables, as long as it is non-decreasing and has X for limit. In particular, with a special choice of the approximating sequence, we have for any nonnegative random variable X the “horizontal slice formula”:

    E[X] = lim_{n↑∞} ∑_{k=0}^{n2ⁿ−1} (k/2ⁿ) P(k·2^{−n} ≤ X < (k + 1)·2^{−n}).    (4.2)
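For X uniform on [0, 1), the inner sum in (4.2) can be evaluated exactly for finite n, which makes the dyadic approximation visible (illustrative sketch):

```python
import numpy as np

# Horizontal slice formula (4.2) for X uniform on [0,1):
# P(k 2^-n <= X < (k+1) 2^-n) = 2^-n for 0 <= k < 2^n, and 0 beyond.
n = 12
k = np.arange(0, 2 ** n)                  # terms with k >= 2^n vanish
approx = np.sum((k / 2 ** n) * (1.0 / 2 ** n))
assert abs(approx - 0.5) < 1e-3           # E[X] = 1/2; error is 2^-(n+1)
```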
If X is of arbitrary sign, the expectation is defined by E[X] = E[X + ] − E[X − ] if not both E[X + ] and
E[X − ] are infinite. If E[X + ] and E[X − ] are infinite, the expectation is not defined. If E[|X|] < ∞, X
is said to be integrable, and then E[X] is a finite number.
For a complex valued random variable X = X1 + iX2 , where X1 and X2 are real valued integrable
random variables, E[X] = E[X1 ] + iE[X2 ] defines the expectation of X.
The basic properties of the expectation are of course the same as for the more general Lebesgue integral.
In particular, linearity and monotonicity:
whenever the right-hand side has meaning (i.e., is not an ∞ − ∞ form). Also, if X₁ ≤ X₂, P-a.s., then E[X₁] ≤ E[X₂]. Since |e^{iuX}| = 1, for any real random variable X the characteristic function

    φ_X(u) = E[e^{iuX}]

is well-defined.
Proof. The proof is a simple application of Fubini’s theorem. The product measure here is the product of the Lebesgue measure on ℝ by the probability P. We have

    E[X] = E[ ∫_ℝ 1_{0≤x<X} dx ] = ∫_ℝ E[1_{0≤x<X}] dx = ∫_0^∞ P(X > x) dx.
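The tail formula E[X] = ∫₀^∞ P(X > x) dx just derived can be checked on an exponential variable, for which both sides are explicit (illustrative sketch using a midpoint quadrature):

```python
import numpy as np

# E[X] = int_0^inf P(X > x) dx for X ~ Exp(lam): both sides equal 1/lam.
lam = 2.0
h = 50.0 / 200000
x = (np.arange(200000) + 0.5) * h          # midpoint grid on [0, 50]
integral = np.sum(np.exp(-lam * x)) * h    # int_0^50 P(X > x) dx
assert abs(integral - 1.0 / lam) < 1e-4
```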
If X is a random element with values in (E, E) and if g is a measurable function from (E, E) to (ℝ, B), then g(X) is, by the composition theorem for measurable functions (Theorem 1.1.3), a random variable. By Theorem 1.2.5, we have E[g(X)] = ∫_E g(x) Q_X(dx), an equality that holds whenever one of the sides is well-defined, in which case the other is also well-defined. In the particular case where (E, E) = (ℝ, B), taking C = (−∞, x], we have Q_X((−∞, x]) = P(X ≤ x), the cumulative distribution function of X.
In the particular case where (E, E) = (ℝⁿ, Bⁿ), and where the random vector X admits a probability density f_X, that is, if Q_X is the product of the Lebesgue measure on (ℝⁿ, Bⁿ) by the function f_X, Theorem 1.2.9 tells us that

    E[g(X)] = ∫_ℝⁿ g(x) f_X(x) dx.
The results below have already been stated in Chapter ??; we give them again for the sake of self-containedness of the present chapter. They are just a rephrasing of the Lebesgue theorems for integrals with respect to an arbitrary measure. First we have the monotone convergence theorem (Theorem 1.2.1) in the context of expectations.
Theorem 4.1.3 Let {X_n}_{n≥1} be a sequence of random variables such that for all n ≥ 1,

    0 ≤ X_n ≤ X_{n+1},  P-a.s.

Then lim_{n↑∞} E[X_n] = E[lim_{n↑∞} X_n].
Theorem 4.1.4 Let {X_n}_{n≥1} be a sequence of random variables such that for all ω outside a set N of null probability lim_{n↑∞} X_n(ω) exists, and such that for all n ≥ 1,

    |X_n| ≤ Y,  P-a.s.,

for some integrable random variable Y. Then lim_{n↑∞} E[X_n] = E[lim_{n↑∞} X_n].
The two examples below are left as exercises of application of the monotone convergence theorem
and of the dominated convergence theorem.
Example 1.2: Let {S_n}_{n≥1} be a sequence of nonnegative random variables. Then

    E[ ∑_{n=1}^∞ S_n ] = ∑_{n=1}^∞ E[S_n].    (4.8)
Example 1.3: Let {S_n}_{n≥1} be a sequence of real random variables such that ∑_{n≥1} E[|S_n|] < ∞. Then (4.8) holds.
Example 1.4: If for some integrable random variable X, P(|X_n| ≤ X) = 1 for all n, then {X_n}_{n≥1} is uniformly integrable. Indeed, in this case,

    ∫_{{|X_n|>c}} |X_n| dP ≤ ∫_{{X>c}} X dP,

and the right-hand side of the above inequality tends to 0 as c ↑ ∞, by monotone convergence.
4.1. EXPECTATION AS INTEGRAL 71
Theorem 4.1.5 The sequence {X_n}_{n≥1} of integrable random variables is uniformly integrable if and only if

(a) sup_n E[|X_n|] < ∞, and

(b) for every ε > 0, there exists δ(ε) > 0 such that

    sup_n ∫_A |X_n| dP ≤ ε  whenever P(A) ≤ δ(ε).
Proof. Let X be a non-negative random variable. For any c ≥ 0 and any A ∈ F, we have

    ∫_A X dP = ∫_{A∩{X≤c}} X dP + ∫_{A∩{X>c}} X dP ≤ cP(A) + ∫_{{X>c}} X dP,

and therefore, applying this with X = |X_n|,

    sup_n ∫_A |X_n| dP ≤ cP(A) + sup_n ∫_{{|X_n|>c}} |X_n| dP.

If {X_n}_{n≥1} is uniformly integrable, the necessity of (a) follows by taking A = Ω, and that of (b) by letting P(A) → 0 and c ↑ ∞. Conversely, by Markov’s inequality and (a),

    sup_n P(|X_n| ≥ c) ≤ (1/c) sup_n E[|X_n|] ↓ 0

as c ↑ ∞. Given ε > 0, choose c < ∞ so that sup_n P(|X_n| ≥ c) ≤ δ(ε). Then by (b), we have ∫_{{|X_n|>c}} |X_n| dP ≤ ε for all n.
Theorem 4.1.6 A sufficient condition for the sequence {X_n}_{n≥1} of integrable random variables to be uniformly integrable is the existence of a non-negative non-decreasing function G : ℝ₊ → ℝ₊ such that

    lim_{t↑∞} G(t)/t = +∞

and

    sup_n E[G(|X_n|)] < ∞.
Theorem 4.1.7 Let {X_n}_{n≥1} be a sequence of integrable random variables, and let X be a random variable. The following are equivalent:

(a) {X_n}_{n≥1} is uniformly integrable and X_n → X in probability as n → ∞.

(b) X is integrable and X_n → X in L¹ as n → ∞.
Proof. We first prove (b) ⇒ (a). Since X_n → X in L¹, the sequence {X_n}_{n≥1} is a Cauchy sequence of L¹, that is, lim_{m,n↑∞} E[|X_m − X_n|] = 0. Therefore, given ε > 0 there is an integer N(ε) such that E[|X_m − X_n|] ≤
ε when m, n ≥ N(ε). From this and the inequality ∫_A |X_n| dP ≤ ∫_A |X_m| dP + ∫ |X_m − X_n| dP we have that, for all A ∈ F,

    sup_n ∫_A |X_n| dP ≤ sup_{m≤N(ε)} ∫_A |X_m| dP + ε.

Since the finite family {X_n}_{n≤N(ε)} is uniformly integrable, sup_n ∫ |X_n| dP < ∞ and sup_n ∫_A |X_n| dP ≤ 2ε for sufficiently small P(A). This implies uniform integrability of {X_n}_{n≥1} (Theorem 4.1.5). It remains to prove that X_n → X in probability. This follows from the Cauchy criterion of convergence in probability, since, by Markov’s inequality,

    P(|X_m − X_n| ≥ ε) ≤ (1/ε) E[|X_m − X_n|] → 0  as m, n → ∞.
We now prove (a) ⇒ (b). First we show that X_n → X in probability and uniform integrability of {X_n}_{n≥1} imply that X is integrable. Taking a subsequence if necessary, we can assume that X_n → X almost surely, and therefore, by Fatou’s lemma, E[|X|] ≤ lim inf E[|X_n|] ≤ sup E[|X_n|] < ∞. Now

    ∫ |X_n − X| dP = ∫_{{|X_n−X|≤ε}} |X_n − X| dP + ∫_{{|X_n−X|>ε}} |X_n − X| dP
        ≤ ε + ∫_{{|X_n−X|>ε}} |X_n| dP + ∫_{{|X_n−X|>ε}} |X| dP,

and therefore, since P(|X_n − X| ≥ ε) → 0 as n ↑ ∞ and since |X_n| and |X| are uniformly integrable (Theorem 4.1.5(b)),

    lim sup_n ∫ |X_n − X| dP ≤ ε.

Since ε > 0 is otherwise arbitrary, we have that

    lim_n ∫ |X_n − X| dP = 0.
4.2 Independence
4.2.1 From Fubini to independence
Definition 4.2.1 Two events A and B are said to be independent events if
P (A ∩ B) = P (A)P (B). (4.9)
More generally, a family {A_i}_{i∈I} of events, where I is an arbitrary index set, is called an independent family of events if for every finite subset J ⊆ I,

    P( ∩_{j∈J} A_j ) = ∏_{j∈J} P(A_j).
Two random variables X : (Ω, F) → (E, E) and Y : (Ω, F) → (G, G) are called independent random
variables if for all C ∈ E, D ∈ G
P ({X ∈ C} ∩ {Y ∈ D}) = P (X ∈ C)P (Y ∈ D). (4.10)
More generally, a family {X_i}_{i∈I}, where I is an arbitrary index set, of random variables X_i : (Ω, F) → (E_i, E_i), i ∈ I, is called an independent family of random variables if for every finite subset J ⊆ I,

    P( ∩_{j∈J} {X_j ∈ C_j} ) = ∏_{j∈J} P(X_j ∈ C_j)

for all C_j ∈ E_j, j ∈ J.
Theorem 4.2.1 If the random variables X and Y, taking their values in (E, E) and (G, G) respectively, are independent, then so are ϕ(X) and ψ(Y), where ϕ : (E, E) → (E′, E′), ψ : (G, G) → (G′, G′).
Proof. For all C 0 ∈ E 0 , D0 ∈ G 0 , the sets C = ϕ−1 (C 0 ) and D = ψ −1 (D0 ) are in E and G respectively,
since ϕ and ψ are measurable. We have
    P(ϕ(X) ∈ C′, ψ(Y) ∈ D′) = P(X ∈ C, Y ∈ D) = P(X ∈ C) P(Y ∈ D) = P(ϕ(X) ∈ C′) P(ψ(Y) ∈ D′).
The above result is stated for two random variables for simplicity, and it extends in the obvious way to
a finite number of independent random variables.
The independence of two random variables X and Y is equivalent to the factorisation of their joint
distribution:
Q(X,Y ) = QX × QY ,
where Q(X,Y ) , QX , and QY are the distributions of (X, Y ), X, and Y , respectively. Indeed, for all sets
of the form C × D, where C ∈ E, D ∈ G,
    Q_{(X,Y)}(C × D) = P((X, Y) ∈ C × D) = P(X ∈ C, Y ∈ D) = P(X ∈ C) P(Y ∈ D) = Q_X(C) Q_Y(D).
In particular, the Fubini–Tonelli theorem gives immediately a result that we have already seen several
times in particular cases: the product formula for expectations (Formula (4.11) below).
Theorem 4.2.2 Let the random variables X and Y, taking their values in (E, E) and (G, G) respectively, be independent, and let g : (E, E) → (ℝ, B), h : (G, G) → (ℝ, B) be such that either one of the following two conditions is satisfied:

(i) g(X) and h(Y) are integrable, or

(ii) g ≥ 0 and h ≥ 0.

Then

    E[g(X)h(Y)] = E[g(X)] E[h(Y)].    (4.11)
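The product formula (4.11) is easy to test by Monte Carlo simulation with independent variables (illustrative sketch; the functions g and h are arbitrary choices):

```python
import numpy as np

# E[g(X)h(Y)] = E[g(X)] E[h(Y)] for independent X, Y (Monte Carlo check).
rng = np.random.default_rng(0)
x = rng.standard_normal(10 ** 6)
y = rng.standard_normal(10 ** 6)
g, h = np.cos(x), y ** 2
assert abs(np.mean(g * h) - np.mean(g) * np.mean(h)) < 0.01
```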
Theorem 4.2.3 Let Y and X be as above. There exists at least one version g(X) of the conditional expectation of Y given X, and it is essentially unique; that is, if g′(X) is another version of the conditional expectation of Y given X, then g(X) = g′(X), P-a.s.
Proof. We omit the proof of existence, without qualms, since in all applications we need to compute the conditional expectation, and in the process we prove existence. To prove uniqueness, we first observe that, in view of (4.12),

    E[g(X)ϕ(X)] = E[g′(X)ϕ(X)]

for all bounded measurable ϕ : ℝⁿ → ℝ. In particular, with ϕ(x) = 1_{g(x)>g′(x)},

    E[(g(X) − g′(X)) 1_{g(X)>g′(X)}] = 0.

Since the random variable in the expectation is non-negative, it can have a null expectation only if it is P-a.s. null, that is, if P-a.s. g(X) ≤ g′(X). By symmetry, P-a.s. g(X) ≥ g′(X), and therefore, as announced, g(X) = g′(X), P-a.s.
The symbols E[Y|X], or E^X[Y], represent any version of the conditional expectation g(X) of Y given X. There is no problem in representing two versions of this conditional expectation by the same symbol, since, as we just saw, they are P-almost surely equal. From now on we say: E^X[Y] (or E[Y|X]) is the conditional expectation of Y given X. The defining equality (4.12) reads

    E[E^X[Y] ϕ(X)] = E[Y ϕ(X)]

for all bounded measurable ϕ.
Although we have not yet proven the existence of the conditional expectation in the general case, in
many practical situations, we can directly exhibit a version of it. In the following theorems we retrieve
results of Section ??.
Theorem 4.2.4 Let X be a positive integer-valued random variable. Then, for any integrable random variable Y,

    E^X[Y] = ∑_{n=1}^∞ ( E[Y 1_{X=n}] / P(X = n) ) 1_{X=n},    (4.13)

where, by convention, E[Y 1_{X=n}]/P(X = n) = 0 when P(X = n) = 0 (in other terms, the sum in (4.13) is over all n such that P(X = n) > 0).
4.2. INDEPENDENCE 75
Proof. We must verify (4.12) for all bounded measurable ϕ : ℝ → ℝ. The right-hand side is equal to

    E[ ( ∑_{n≥1} (E[Y 1_{X=n}]/P(X = n)) 1_{X=n} ) ( ∑_{k≥1} ϕ(k) 1_{X=k} ) ]
        = E[ ∑_{n≥1} (E[Y 1_{X=n}]/P(X = n)) ϕ(n) 1_{X=n} ]
        = ∑_{n≥1} (E[Y 1_{X=n}]/P(X = n)) ϕ(n) E[1_{X=n}]
        = ∑_{n≥1} (E[Y 1_{X=n}]/P(X = n)) ϕ(n) P(X = n)
        = ∑_{n≥1} E[Y 1_{X=n}] ϕ(n)
        = ∑_{n≥1} E[Y 1_{X=n} ϕ(n)]
        = E[ Y ( ∑_{n≥1} ϕ(n) 1_{X=n} ) ] = E[Y ϕ(X)].
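Formula (4.13) says that, given X = n, the best predictor of Y is the average of Y over the event {X = n}. The defining property (4.12) can then be observed by simulation (illustrative sketch; all distributions are arbitrary choices):

```python
import numpy as np

# Check (4.12): E[g(X) phi(X)] = E[Y phi(X)] with g(X) given by (4.13).
rng = np.random.default_rng(1)
N = 10 ** 6
X = rng.integers(1, 4, N)                        # X uniform on {1, 2, 3}
Y = X ** 2 + rng.standard_normal(N)              # Y depends on X
means = np.array([Y[X == n].mean() for n in (1, 2, 3)])  # E[Y 1_{X=n}]/P(X=n)
gX = means[X - 1]                                # the version (4.13) of E^X[Y]
phi = np.cos(X)                                  # an arbitrary bounded phi
assert abs(np.mean(gX * phi) - np.mean(Y * phi)) < 0.01
```

Within each event {X = n} the residual Y − gX averages to zero by construction, which is exactly why the assertion holds for every function of X.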
Theorem 4.2.5 Let Z and X be random vectors of dimensions p and n respectively, with joint probability density f_{Z,X}(z, x). Recall the definition of the conditional probability density f_{Z|X=x}(z):

    f_{Z|X=x}(z) = f_{Z,X}(z, x) / f_X(x),

with the convention f_{Z|X=x}(z) = 0 when f_X(x) = 0. Let h : ℝ^{p+n} → ℝ be a measurable function, and suppose that the random variable Y = h(Z, X) is integrable. A version of the conditional expectation of Y = h(Z, X) given X is the random variable g(X), where

    g(x) = ∫_{ℝ^p} h(z, x) f_{Z|X=x}(z) dz.
and therefore

    E[|g(X)|] = ∫_{ℝⁿ} |g(x)| f_X(x) dx ≤ ∫_{ℝⁿ} ( ∫_{ℝ^p} |h(z, x)| f_{Z|X=x}(z) dz ) f_X(x) dx
        = ∫∫_{ℝ^{p+n}} |h(z, x)| f_{Z,X}(z, x) dz dx < ∞.
We must now check that (4.12) is true, where ϕ is bounded. We have

    E[g(X)ϕ(X)] = ∫_{ℝⁿ} g(x) ϕ(x) f_X(x) dx
        = ∫_{ℝⁿ} ( ∫_{ℝ^p} h(z, x) f_{Z|X=x}(z) dz ) ϕ(x) f_X(x) dx
        = ∫∫_{ℝ^{p+n}} h(z, x) ϕ(x) f_{Z,X}(z, x) dz dx = E[h(Z, X)ϕ(X)] = E[Y ϕ(X)].
Going back to the case where the conditioning variable X is discrete, we shall mention the situation, often encountered in practice, where Z is a random vector of dimension p, and where X takes its values in ℕ₊, with

    P(X = k) = π(k)

for all k ≥ 1, and

    P(Z ∈ A | X = k) = ∫_A f_k(z) dz.
The proof is left to the reader, and is similar to the proof when (Z, X) has a joint probability distribution.
    Z = X + ξ,

where ξ is a random variable independent of X with probability density f_ξ. Let h : ℝ → ℝ be such that E[|h(Z)|] < ∞. We shall compute E^X[h(Z)] = g(X). We have

    g(k) = ∫_ℝ h(z) f_k(z) dz,

and f_k is defined by

    ∫_A f_k(z) dz = P(Z ∈ A | X = k),

that is,

    ∫_A f_k(z) dz = P(Z ∈ A, X = k)/P(X = k) = P(k + ξ ∈ A, X = k)/P(X = k)
        = P(k + ξ ∈ A) P(X = k)/P(X = k) = P(k + ξ ∈ A) = P(ξ ∈ A − k) = ∫_A f_ξ(z − k) dz.

Therefore,

    f_k(z) = f_ξ(z − k),
and

    g(k) = ∫_ℝ h(z) f_ξ(z − k) dz = ∫_ℝ h(z + k) f_ξ(z) dz,

that is,

    g(k) = E[h(ξ + k)].
We now treat the second mixed case: the conditioning variable X is a random vector of dimension n, Z is an ℕ₊-valued random variable, and the joint distribution of (Z, X) is given by

    P(Z = k) = π(k)

and, given Z = k, X admits the probability density f_k(x). Then

    π_{Z|X}(k|x) = π(k) f_k(x) / f_X(x)

if f_X(x) = ∑_{k≥1} π(k) f_k(x) > 0, and π_{Z|X}(k|x) = 0 otherwise. We let the reader verify that for all h : ℕ × ℝⁿ → ℝ such that E[|h(Z, X)|] < ∞, E^X[h(Z, X)] = g(X), where

    g(x) = ∑_{k≥1} h(k, x) π_{Z|X}(k|x).
Exercise 4.4 is recommended since it features a result that cannot be obtained as an application of
Theorems 4.2.4 or 4.2.5. It requires coming back to the definition of conditional expectation.
We shall give the main rules that are useful in computing conditional expectations.
Rule 1. (linearity)

    E^X[λ₁Y₁ + λ₂Y₂] = λ₁E^X[Y₁] + λ₂E^X[Y₂].

Rule 2. If Y and X are independent, then

    E^X[Y] = E[Y].

Rule 3. If h : ℝⁿ → ℝ is a measurable function such that h(X) is integrable, then

    E^X[h(X)] = h(X).
Proof. Rule 1: Let g₁(X) = E^X[Y₁], g₂(X) = E^X[Y₂]. We must show that, for all bounded measurable ϕ : ℝⁿ → ℝ,

    E[(λ₁g₁(X) + λ₂g₂(X))ϕ(X)] = E[(λ₁Y₁ + λ₂Y₂)ϕ(X)].
This follows immediately from the definition of the g_i(X), which says that E[g_i(X)ϕ(X)] = E[Y_i ϕ(X)], i = 1, 2.
Rule 2: We have to check that E[g(X)ϕ(X)] = E[Y ϕ(X)] with g(X) = E[Y ]. In this case E[g(X)ϕ(X)] =
E[E[Y ]ϕ(X)] = E[Y ]E[ϕ(X)], and since Y and ϕ(X) are independent E[Y ]E[ϕ(X)] = E[Y ϕ(X)].
Rule 3: We must check that E[Y ϕ(X)] = E[h(X)ϕ(X)], which is a tautology since Y = h(X).
Theorem 4.2.7 Let X be a random vector and let Y₁ and Y₂ be two integrable random variables such that Y₁ ≤ Y₂, P-a.s. Then

    E^X[Y₁] ≤ E^X[Y₂],  P-a.s.    (4.14)
Proof. For any bounded measurable non-negative ϕ : ℝⁿ → ℝ,

    E[E^X[Y₁]ϕ(X)] = E[Y₁ϕ(X)] ≤ E[Y₂ϕ(X)] = E[E^X[Y₂]ϕ(X)].

Therefore,

    E[(E^X[Y₂] − E^X[Y₁])ϕ(X)] ≥ 0.

Taking ϕ(X) = 1_{{E^X[Y₂] < E^X[Y₁]}}, we see that this implies (4.14).
In particular, if Y is a non-negative integrable random variable,

    E^X[Y] ≥ 0,  P-a.s.
Theorem 4.2.8 Let X be a random vector and let Y be an integrable random variable of the form Y = v(X)Z, where v : ℝⁿ → ℝ is a measurable bounded function and Z is an integrable random variable. Then

    E^X[v(X)Z] = v(X)E^X[Z].
Proof. We must show that the right-hand side is a version of E^X[v(X)Z]; that is, we must prove that for all bounded measurable ϕ : ℝⁿ → ℝ,

    E[v(X)Zϕ(X)] = E[v(X)E^X[Z]ϕ(X)].

But, since v(x)ϕ(x) is bounded, by definition of E^X[Z],

    E[v(X)E^X[Z]ϕ(X)] = E[v(X)Zϕ(X)].
Theorem 4.2.9 Let X be a random vector, and let {Y_n}_{n≥1} be a sequence of non-negative integrable random variables that is P-a.s. non-decreasing and that converges P-a.s. to the integrable random variable Y. Then {E^X[Y_n]}_{n≥1} is a P-a.s. non-decreasing sequence of random variables that converges P-a.s. to E^X[Y].

Proof. Let g_n(X) be a version of E^X[Y_n]. By the monotonicity of conditional expectation, {g_n(X)}_{n≥1} is a P-a.s. non-decreasing sequence. In particular, there exists a P-a.s. limit g(X) of this sequence, and by monotone convergence, for any bounded non-negative measurable ϕ : ℝⁿ → ℝ,

    lim_{n↑∞} E[Y_n ϕ(X)] = E[Y ϕ(X)].
Proof. We have

    Φ_c := (1/2π) ∫_{−c}^{+c} (e^{−iua} − e^{−iub})/(iu) ϕ(u) du
        = (1/2π) ∫_{−c}^{+c} (e^{−iua} − e^{−iub})/(iu) ( ∫_{−∞}^{+∞} e^{iux} dF(x) ) du
        = ∫_{−∞}^{+∞} ( (1/2π) ∫_{−c}^{+c} (e^{−iua} − e^{−iub})/(iu) e^{iux} du ) dF(x)
        = ∫_{−∞}^{+∞} Ψ_c(x) dF(x),

where

    Ψ_c(x) := (1/2π) ∫_{−c}^{+c} (e^{−iua} − e^{−iub})/(iu) e^{iux} du.
In the above computations we have applied Fubini’s theorem, which is allowed since, observing that

    | (e^{−iua} − e^{−iub})/(iu) | = | ∫_a^b e^{−iux} dx | ≤ (b − a),

we have

    ∫_{−c}^{+c} ∫_{−∞}^{+∞} | (e^{−iua} − e^{−iub})/(iu) e^{iux} | dF(x) du
        = ∫_{−c}^{+c} ∫_{−∞}^{+∞} | (e^{−iua} − e^{−iub})/(iu) | dF(x) du
        ≤ ∫_{−c}^{+c} ∫_{−∞}^{+∞} (b − a) dF(x) du
        = (b − a) × 2c = 2c(b − a) < ∞.
Observe that

    Ψ_c(x) = (1/2π) ∫_{−c}^{+c} [sin(u(x − a)) − sin(u(x − b))]/u du
        = (1/2π) ∫_{−c(x−a)}^{+c(x−a)} (sin u)/u du − (1/2π) ∫_{−c(x−b)}^{+c(x−b)} (sin u)/u du.

The function (A, B) → ∫_A^B (sin u)/u du is uniformly continuous in A, B and tends to ∫_{−∞}^{+∞} (sin u)/u du = π as B ↑ +∞ and A ↓ −∞. Therefore the function Ψ_c is uniformly bounded. Moreover, by the above expression for Ψ_c,

    lim_{c↑∞} Ψ_c(x) = Ψ(x),
where
Theorem 4.3.2 If X as in Theorem 4.3.1 admits a probability density f and if moreover the characteristic function is integrable:

    ∫_{−∞}^{+∞} |ϕ(u)| du < ∞,

then

    f(x) = (1/2π) ∫_{−∞}^{+∞} ϕ(u) e^{−iux} du.    (4.16)
by Theorem 4.3.1.
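The inversion formula (4.16) can be tested on the standard Gaussian, whose characteristic function ϕ(u) = e^{−u²/2} is integrable (illustrative sketch, midpoint quadrature):

```python
import numpy as np

# (4.16) for X ~ N(0,1): phi(u) = exp(-u^2/2) recovers the Gaussian density.
h = 80.0 / 80000
u = -40.0 + (np.arange(80000) + 0.5) * h   # midpoint grid on [-40, 40]
phi = np.exp(-u ** 2 / 2)
for x in (0.0, 0.7, -1.3):
    f = (np.sum(phi * np.exp(-1j * u * x)) * h).real / (2 * np.pi)
    assert abs(f - np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)) < 1e-6
```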
4.3. THE THEORY OF CHARACTERISTIC FUNCTIONS AND WEAK CONVERGENCE 81
A. It is continuous at 0, and

B. It is non-negative definite, in the sense that for all integers n, all u₁, …, u_n ∈ ℝ, and all z₁, …, z_n ∈ ℂ,

    ∑_{j=1}^n ∑_{k=1}^n ϕ(u_j − u_k) z_j z_k* ≥ 0.    (4.17)

For the proof of (4.17), just observe that the left-hand side is equal to

    E[ | ∑_{j=1}^n z_j e^{iu_j X} |² ],

a non-negative quantity.
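Property B states that every matrix (ϕ(u_j − u_k))_{j,k} built from a characteristic function is positive semi-definite. This can be observed directly on the Gaussian characteristic function (illustrative sketch; the grid of points u_j is an arbitrary choice):

```python
import numpy as np

# Non-negative definiteness (4.17): the matrix phi(u_j - u_k) is PSD.
u = np.linspace(-3.0, 3.0, 25)
phi = lambda v: np.exp(-v ** 2 / 2)        # char. function of N(0,1)
M = phi(u[:, None] - u[None, :])
eigenvalues = np.linalg.eigvalsh(M)
assert eigenvalues.min() > -1e-10          # no genuinely negative eigenvalue
```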
(It turns out that Properties A and B characterize characteristic functions. Before we state the corresponding result (Bochner’s theorem), we give some consequences of Property B that we shall need later.) In particular, Property B extends to integrals: for (say) continuous z : [0, A] → ℂ,

    ∫_0^A ∫_0^A ϕ(u − v) z(u) z(v)* du dv ≥ 0.    (4.19)
Taking n = 2, u₁ = u, u₂ = 0 in (4.17), the choices (z₁, z₂) = (1, 1) and (z₁, z₂) = (i, 1) give, respectively,

    2ϕ(0) + ϕ(u) + ϕ(−u) ≥ 0  and  2ϕ(0) + iϕ(u) − iϕ(−u) ≥ 0.
We deduce from the above two inequalities that ϕ(u)+ϕ(−u) is real and ϕ(u)−ϕ(−u) is pure imaginary,
and therefore that ϕ(u)∗ = ϕ(−u).
Now with n = 2, u₁ = u, u₂ = 0, in order that (4.17) hold for all z₁, z₂ ∈ ℂ, it is necessary that the determinant of
$$
\begin{pmatrix} \varphi(0) & \varphi(u) \\ \varphi(-u) & \varphi(0) \end{pmatrix}
$$
be non-negative, that is, taking the results ϕ(0) ≥ 0 and ϕ(−u) = ϕ(u)* into account, |ϕ(u)| ≤ ϕ(0). In particular, if ϕ(0) = 0, then ϕ is identically null, so that we can discard this trivial case, for which the theorem is obviously true, and assume that ϕ(0) > 0.
With n = 3, and u₁ = 0, u₂ = u, u₃ = u + v, in order that (4.17) hold for all z₁, z₂, z₃ ∈ ℂ, it is necessary that the determinant of
$$
\begin{pmatrix}
\varphi(0) & \varphi(-u) & \varphi(-u-v) \\
\varphi(u) & \varphi(0) & \varphi(-v) \\
\varphi(u+v) & \varphi(v) & \varphi(0)
\end{pmatrix}
$$
be non-negative. Assuming without loss of generality (replace ϕ by ϕ/ϕ(0)) that ϕ(0) = 1, this reads
$$
1 + \varphi(u)^*\varphi(v)^*\varphi(u+v) + \varphi(u+v)^*\varphi(u)\varphi(v)
- |\varphi(u)|^2 - |\varphi(v)|^2 - |\varphi(u+v)|^2 \ge 0\,,
$$
that is,
$$
1 + 2\,\mathrm{Re}\{\varphi(u)\varphi(v)\varphi(u+v)^*\}
\ge |\varphi(u)|^2 + |\varphi(v)|^2 + |\varphi(u+v)|^2\,.
$$
Subtracting $2\,\mathrm{Re}\{\varphi(u)\varphi(u+v)^*\}$ from both sides,
$$
1 - |\varphi(v)|^2 + 2\,\mathrm{Re}\{\varphi(u)(\varphi(v)-1)\varphi(u+v)^*\}
\ge |\varphi(u)|^2 + |\varphi(u+v)|^2 - 2\,\mathrm{Re}\{\varphi(u)\varphi(u+v)^*\}
= |\varphi(u+v) - \varphi(u)|^2\,.
$$
Therefore, since |ϕ| ≤ 1,
$$
|\varphi(u+v) - \varphi(u)|^2 \le 1 - |\varphi(v)|^2 + 2\,|1 - \varphi(v)| \le 4\,|1 - \varphi(v)|\,,
$$
from which follows the uniform continuity of ϕ: by Property A, |1 − ϕ(v)| → 0 as v → 0, and the bound does not depend on u.
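The final bound can be spot-checked on a concrete characteristic function with $\varphi(0) = 1$; here $\varphi(u) = 1/(1 - iu)$, the characteristic function of the exponential distribution with mean 1. The sampling ranges below are arbitrary.

```python
# Spot check of |phi(u+v) - phi(u)|^2 <= 4|1 - phi(v)| for a concrete
# characteristic function with phi(0) = 1: phi(u) = 1/(1 - iu), the CF
# of the exponential distribution with mean 1.
import random

def phi(u):
    """Characteristic function of Exp(1)."""
    return 1.0 / (1.0 - 1j * u)

random.seed(1)
for _ in range(1000):
    u = random.uniform(-10.0, 10.0)
    v = random.uniform(-10.0, 10.0)
    lhs = abs(phi(u + v) - phi(u)) ** 2
    rhs = 4.0 * abs(1.0 - phi(v))
    assert lhs <= rhs + 1e-12
print("bound verified on all samples")
```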
Theorem 4.3.3 Let ϕ : ℝ → ℂ be a function satisfying Properties A and B. Then there exists a non-negative finite constant K and a real random variable X such that for all u ∈ ℝ,
$$
\varphi(u) = K\,E\left[e^{iuX}\right]. \qquad (4.20)
$$
we have that
$$
\int_{-\infty}^{+\infty} g(x, A)\,dx \le \varphi(0)\,.
$$
The function g(x, A) is therefore integrable and it is the Fourier transform of the integrable function h(u/A)ϕ(u) (see (4.21)). Therefore we have the Fourier inversion formula
$$
h\left(\frac{u}{A}\right)\varphi(u) = \int_{-\infty}^{+\infty} g(x, A)\,e^{-iux}\,dx\,.
$$
In particular, with u = 0 (recall that h(0) = 1),
$$
\int_{-\infty}^{+\infty} g(x, A)\,dx = \varphi(0)\,.
$$
Therefore, f(x, A) := g(x, A)/ϕ(0) is the probability density of some real random variable with characteristic function h(u/A)ϕ(u)/ϕ(0). But
$$
\lim_{A\uparrow\infty} \frac{h\left(\frac{u}{A}\right)\varphi(u)}{\varphi(0)} = \frac{\varphi(u)}{\varphi(0)}\,.
$$
From the fundamental criterion of convergence in distribution, we deduce that, since the limit is continuous at 0, this limit is a characteristic function.
Define the function g : ℝ → [0, 1] by
g(v) = 1 if v ≤ 0,
g(v) = 1 − v if 0 ≤ v ≤ 1,
g(v) = 0 if v ≥ 1.
Also, define
Aε = {x ∈ E ; d(x, A) < ε} ,
and observe that Aε ↓ clos A as ε ↓ 0.
The following technical theorem is the basis for the theory of weak convergence.
(iv) For any continuity set A ⊆ E (that is, such that P(∂A) = 0), lim_{n↑∞} Pₙ(A) = P(A).
(ii) ⇒ (iii) and (iii) ⇒ (ii): take the complements of the sets involved. For a continuity set A, P(clos A) = P(int A) = P(A), and therefore
$$
\limsup_{n} P_n(A) \le \limsup_{n} P_n(\mathrm{clos}\,A) \le P(\mathrm{clos}\,A) = P(A)\,,
$$
and
$$
\liminf_{n} P_n(A) \ge \liminf_{n} P_n(\mathrm{int}\,A) \ge P(\mathrm{int}\,A) = P(A)\,.
$$
$$
\sum_{i=1}^{k}\frac{i-1}{k}\left(P(F_{i-1}) - P(F_i)\right)
\le \int_E f(x)\,P(dx)
\le \sum_{i=1}^{k}\frac{i}{k}\left(P(F_{i-1}) - P(F_i)\right)
$$
$$
\le \frac{1}{k} + \frac{1}{k}\sum_{i=1}^{k} P(F_i)
\le \frac{1}{k} + \int_E f(x)\,P(dx)\,.
$$
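The chain of inequalities above can be verified exactly in a simple case. One assumption, since the definition is not restated in this excerpt: we take $F_i := \{x : f(x) \ge i/k\}$ for a measurable $f$ with $0 \le f \le 1$, as is standard in this argument.

```python
# Exact verification of the chain of inequalities in a simple case.
# Assumption (definition not restated in this excerpt):
# F_i := {x : f(x) >= i/k} for measurable f with 0 <= f <= 1. Here P is
# uniform on [0, 1] and f(x) = x, so P(F_i) = 1 - i/k and the integral
# of f against P is 1/2.
k = 10
P_F = [1.0 - i / k for i in range(k + 1)]  # P(F_0), ..., P(F_k)
integral = 0.5                             # integral of f dP

lower = sum(((i - 1) / k) * (P_F[i - 1] - P_F[i]) for i in range(1, k + 1))
upper = sum((i / k) * (P_F[i - 1] - P_F[i]) for i in range(1, k + 1))
middle = 1 / k + (1 / k) * sum(P_F[i] for i in range(1, k + 1))

assert lower <= integral + 1e-12
assert integral <= upper + 1e-12
assert upper <= middle + 1e-12
assert middle <= 1 / k + integral + 1e-12
print(lower, upper, middle)
```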
The completeness property of L²(P) reads, in this case, as follows. If {Xₙ}_{n≥1} is a sequence of random variables in L²(P) such that
$$
\lim_{m,n\uparrow\infty} E\left[|X_m - X_n|^2\right] = 0\,,
$$
then there exists a random variable X ∈ L²(P) such that $\lim_{n\uparrow\infty} E\left[|X_n - X|^2\right] = 0$. In Probability theory, one then says that the sequence {Xₙ}_{n≥1} converges in quadratic mean to X.
In the Hilbert space L²(P), Theorem 2.1.3 reads: Let {Xₙ}_{n≥1} and {Yₙ}_{n≥1} be sequences of random variables of L²(P) that converge in quadratic mean to the random variables X and Y respectively. Then
$$
\lim_{m,n\uparrow\infty} E\left[X_n Y_m^*\right] = E\left[XY^*\right].
$$
In particular (with Yₙ ≡ Xₙ),
$$
\lim_{n\uparrow\infty} E\left[|X_n|^2\right] = E\left[|X|^2\right].
$$
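A minimal exact illustration of this consequence, on a four-point probability space (the particular X, Z, and weights are arbitrary choices): with $X_n := X + Z/n$ one has $E[|X_n - X|^2] \to 0$, and correspondingly $E[|X_n|^2] \to E[|X|^2]$.

```python
# Exact computation on a four-point probability space. With X_n := X + Z/n,
# E[|X_n - X|^2] = E[Z^2]/n^2 -> 0, and E[|X_n|^2] -> E[|X|^2] accordingly.
probs = [0.1, 0.2, 0.3, 0.4]
X = [1.0, -2.0, 0.5, 3.0]
Z = [2.0, 1.0, -1.0, 0.5]

def expect(values):
    """Expectation on the four-point probability space."""
    return sum(p * v for p, v in zip(probs, values))

EX2 = expect([x * x for x in X])
for n in (1, 10, 100, 1000):
    Xn = [x + z / n for x, z in zip(X, Z)]
    mean_sq_err = expect([(xn - x) ** 2 for xn, x in zip(Xn, X)])
    gap = abs(expect([v * v for v in Xn]) - EX2)
    print(n, mean_sq_err, gap)
```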
4.4 Exercises
Exercise 4.1.
♥ Let X be a non-negative random variable and let G : ℝ₊ → ℝ be the primitive function of g : ℝ₊ → ℝ, that is, for all x ≥ 0,
$$
G(x) = G(0) + \int_0^x g(u)\,du\,.
$$
(a) Let X be a non-negative random variable with finite mean μ and such that E[G(X)] < ∞. Show that
$$
E[G(X)] = G(0) + \int_0^{\infty} g(x)\,P(X \ge x)\,dx\,.
$$
(b) Let X be as in (a), and let X̄ be a non-negative random variable with the probability density μ⁻¹P(X ≥ x). Show that
$$
E\left[e^{iu\bar{X}}\right] = \frac{E\left[e^{iuX}\right] - 1}{i\mu u}\,.
$$
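Both identities can be sanity-checked for $X \sim \mathrm{Exp}(1)$, for which $\mu = 1$, $P(X \ge x) = e^{-x}$ and $E[e^{iuX}] = 1/(1 - iu)$; with $G(x) = x^2$ (so $g(x) = 2x$), $E[G(X)] = 2$. This is only a numerical check, not a solution to the exercise.

```python
# Numerical sanity check for X ~ Exp(1): mean mu = 1, P(X >= x) = e^{-x},
# E[e^{iuX}] = 1/(1 - iu). Not a solution, only a check.
import math

# (a) with G(x) = x^2 (g(x) = 2x): E[G(X)] = E[X^2] = 2, and the identity
# says this equals G(0) + integral_0^inf 2x e^{-x} dx; approximate it.
h, upper = 1e-4, 50.0
integral = sum(2.0 * (k * h) * math.exp(-k * h) * h
               for k in range(1, int(upper / h)))
print(abs(integral - 2.0))

# (b) here Xbar has density (1/mu) P(X >= x) = e^{-x}, i.e. Xbar ~ Exp(1)
# again, so both sides of the identity should equal 1/(1 - iu).
u = 0.7
lhs = 1.0 / (1.0 - 1j * u)                     # E[exp(iu Xbar)]
rhs = (1.0 / (1.0 - 1j * u) - 1.0) / (1j * u)  # (E[exp(iuX)] - 1)/(i mu u)
print(abs(lhs - rhs))
```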
Exercise 4.2.
Let {Xₙ}_{n≥1} be a sequence of identically distributed integrable real random variables with common mean m. Let
$$
S_n := \sum_{i=1}^{n} X_i\,.
$$
(This is the condition prevailing in the proof of Borel's slln in Section ??.) Show that
$$
E\left[\sum_{n=1}^{\infty}\left(\frac{S_n}{n} - m\right)^4\right] < \infty\,.
$$
Deduce from this the slln for the sequence {Xₙ}_{n≥1}.
Exercise 4.3.
Let X be a non-negative random variable. Prove that
$$
\lim_{\theta\uparrow+\infty} E\left[e^{-\theta X}\right] = P(X = 0)\,.
$$
Exercise 4.4.
♥ Let X be a real random variable with probability density f_X(x). Let h : ℝ → ℝ be a measurable function such that h(X) is integrable. Prove that
$$
E[h(X) \mid X^2]
= h\!\left(\sqrt{X^2}\right)\frac{f_X\!\left(\sqrt{X^2}\right)}{f_X\!\left(\sqrt{X^2}\right) + f_X\!\left(-\sqrt{X^2}\right)}
+ h\!\left(-\sqrt{X^2}\right)\frac{f_X\!\left(-\sqrt{X^2}\right)}{f_X\!\left(\sqrt{X^2}\right) + f_X\!\left(-\sqrt{X^2}\right)}\,.
$$
Exercise 4.5.
4.4. EXERCISES 87
Let X be an integrable real random variable. Define X⁺ = max(X, 0), X⁻ = max(−X, 0). Prove that
$$
E[X \mid X^+] = X^+ - \frac{E\left[X^-\right]}{P(X^+ = 0)}\,1_{\{X^+ = 0\}}\,.
$$
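The claimed formula can be checked exactly against the definition of conditional expectation on a small discrete distribution (the atoms and weights below are arbitrary): on $\{X^+ = 0\} = \{X \le 0\}$ the formula should give $-E[X^-]/P(X^+ = 0)$, while on $\{X^+ > 0\}$ it gives X itself.

```python
# Exact check of the conditional expectation formula for a small discrete
# distribution. On {X+ = 0} = {X <= 0} the formula gives
# -E[X-]/P(X+ = 0); on {X+ > 0} it gives X itself.
pmf = {-2.0: 0.2, -1.0: 0.1, 0.0: 0.25, 1.0: 0.15, 3.0: 0.3}

E_Xminus = sum(p * max(-x, 0.0) for x, p in pmf.items())
P_Xplus0 = sum(p for x, p in pmf.items() if x <= 0)

# conditional expectation given {X+ = 0}, straight from the definition
direct = sum(x * p for x, p in pmf.items() if x <= 0) / P_Xplus0
# value of the claimed formula on {X+ = 0} (there X+ = 0 and 1_{X+=0} = 1)
formula = 0.0 - E_Xminus / P_Xplus0
print(direct, formula)
```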
Exercise 4.6.
Let X be a real random variable whose characteristic function ψ is such that |ψ(t₀)| = 1 for some t₀ ≠ 0. Show that there exists some a ∈ ℝ such that
$$
\sum_{n\in\mathbb{Z}} P\!\left(X = a + n\,\frac{2\pi}{t_0}\right) = 1\,.
$$
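The easy converse direction can be checked numerically: if X is supported on the lattice $\{a + n\,2\pi/t_0 : n \in \mathbb{Z}\}$, then $\psi(t_0) = e^{iat_0}\sum_n p_n$, which has modulus 1. The values of a, t₀ and the (uniform) weights below are arbitrary choices.

```python
# Check of the (easy) converse: if X is supported on {a + n*2pi/t0 : n in Z},
# then psi(t0) = e^{i a t0}, which has modulus 1.
import cmath
import math

t0, a = 1.7, 0.3
support = [a + n * 2.0 * math.pi / t0 for n in range(-3, 4)]
probs = [1.0 / 7.0] * 7

psi_t0 = sum(p * cmath.exp(1j * t0 * x) for p, x in zip(probs, support))
print(abs(psi_t0))
```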