Professional Documents
Culture Documents
Pitt H.R. Measure Theory and Probability (Tata Institute, 1958) (O) (126s) MV
Pitt H.R. Measure Theory and Probability (Tata Institute, 1958) (O) (126s) MV
by
H.R. Pitt
by
H.R. Pitt
Notes by
Raghavan Narasimhan
1 Measure Theory 1
1. Sets and operations on sets . . . . . . . . . . . . . . . . 1
2. Sequence of sets . . . . . . . . . . . . . . . . . . . . . . 3
3. Additive system of sets . . . . . . . . . . . . . . . . . . 4
4. Set Functions . . . . . . . . . . . . . . . . . . . . . . . 5
5. Continuity of set functions . . . . . . . . . . . . . . . . 6
6. Extensions and contractions of... . . . . . . . . . . . . . 10
7. Outer Measure . . . . . . . . . . . . . . . . . . . . . . 11
8. Classical Lebesgue and Stieltjes measures . . . . . . . . 16
9. Borel sets and Borel measure . . . . . . . . . . . . . . . 17
10. Measurable functions . . . . . . . . . . . . . . . . . . . 20
11. The Lebesgue integral . . . . . . . . . . . . . . . . . . . 23
12. Absolute Continuity . . . . . . . . . . . . . . . . . . . . 27
13. Convergence theorems . . . . . . . . . . . . . . . . . . 31
14. The Riemann Integral . . . . . . . . . . . . . . . . . . . 34
15. Stieltjes Integrals . . . . . . . . . . . . . . . . . . . . . 37
16. L-Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 39
17. Mappings of measures . . . . . . . . . . . . . . . . . . 46
18. Differentiation . . . . . . . . . . . . . . . . . . . . . . . 47
19. Product Measures and Multiple Integrals . . . . . . . . . 57
2 Probability 61
1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 61
2. Function of a random variable . . . . . . . . . . . . . . 62
3. Parameters of random variables . . . . . . . . . . . . . . 63
iii
iv Contents
1
2 1. Measure Theory
∞
\
(or X.Y) and for a sequence {Xn }, Xn . Two sets are disjoint if
n=1
their intersection is 0, a system of sets is disjoint if every pair of
2 sets of the system is. For disjoint system we write X + Y for X ∪ Y
P
and Xn for ∪Xn , this notation implying that the sets are disjoint.
More generally
(∪X)′ = ∩X ′ , (∩X)′ = ∪X ′ .
0 and X
∪ and ∩
∩ and ∪
⊂ and ⊃
2. Sequence of sets
3 A sequence of sets X1 , X2 , . . . is increasing if
X1 ⊂ X2 ⊂ X3 ⊂ . . .
decreasing If
X1 ⊃ X2 ⊃ X3 ⊃ . . .
The upper limit, lim sup Xn of a sequence {Xn } of sets is the set
of points which belong to Xn for infinitely many n. The lower limit,
lim inf Xn is the set of points which belong to Xn for all but a finite num-
ber of n. It follows that lim inf Xn ⊂ lim sup Xn and if lim sup Xn =
lim inf Xn = X, X is called the limit of the sequence, which then cover-
age to X.
It is easy to show that
∞ \
[ ∞
lim inf Xn = Xm
n=1 m=n
and that
∞ [
\ ∞
lim sup Xn = Xm .
n=1 m=n
Then if Xn ↓,
∞
\ ∞
\ ∞
\
Xm = Xm . lim inf Xn = Xm ,
m=n m=1 m=1
[∞ ∞
\
Xm = Xn , lim sup Xn = Xn ,
m=n n=1
∞
\
lim Xn = Xn ,
n=1
and similarly if Xn ↑,
∞
[
lim Xn = Xn .
n=1
4 1. Measure Theory
4. Set Functions
Functions con be defined on a system of sets to take values in any given
space. If the space is an abelian group with the group operation called
addition, one can define the additivity of the set function.
Thus, if µ is defined on an additive system of sets, µ is additive if
X X
µ Xn = µ(Xn )
1. X is an infinite set,
2. X is the interval 0 ≤ x < 1 and µ(X) is the sum of the lengths of fi-
nite sums of open or closed intervals with closure in X. These sets
6 1. Measure Theory
X = X1 + (X2 − X1 ) + (X3 − X2 ) + · · · ,
µ(X) = −µ(X1 ) + µ(X2 − X1 ) + · · ·
N
X
= µ(X1 ) + lim µ(Xn − Xn−1 )
N→∞
n=2
= lim µ(XN ).
N→∞
5.. Continuity of set functions 7
Y = Y1 + Y2 + Y3 + · · ·
we write
N
X
Y = lim Yn ,
N→∞
n=1
N N
X X
µ(Y) = lim µ Yn , since
Yn ↑ Y
N→∞
n=1 n=1
N
X
= lim µ(Yn )
N→∞
n=1
∞
P
by finite additivity, and therefore µ(Y) = µ(Yn ).
n=1
On the other hand, if µ is finite and continuous at 0, and X =
∞
P
Xn , we write
n=1
N ∞
X X
µ(X) = µ Xn + µ
Xn
n=1 n=N+1
N
∞
X X
= µ(Xn ) + µ Xn , by finite additivity,
n=1 n=N+1
∞
P
since Xn ↓ 0 and has finite µ.
N+1
8 1. Measure Theory
we have
n
\
A = Ak + (A − Ak ), A = [Ak + (A − Ak )]
k=1
5.. Continuity of set functions 9
n
This can be expanded as the union of 2n sets of the form A∗k ,
T
k=1
A∗k = Ak or A − Ak , and we write Bn for the sum of those for which
µ < 0. (If there is no such set, Bn = 0). Then, since An consists of
disjoint sets which either belong to Bn or have µ ≥ 0, we get 10
µ(An ) ≥ (Bn )
and similarly
µ(Bn ) ≥ µ(Bn ∪ Bn+1 ∪ . . . ∪ Bn′ )
for any n′ > n. By continuity from below, we can let n′ → ∞,
∞
[
µ(An ) ≥ µ(Bn ) ≥ µ Bk
k=n
∞
Let X− = limn→∞
S
Bk . Then
k=n
and then
µ(I) ≤ µ[∪∞ ′
n=1 In ] = µ(I1 ) + µ(I2 · I1 ) + · · ·
≤ µ(I1 ) + µ(I2 ) + · · ·
7. Outer Measure
We define the out or measure of a set X with respect to a completely ad- 13
P
ditive non-negative µ(I) defined on a additive system S 0 to be inf µ(In )
S∞
for all sequences {In } of sets of S 0 which cover X (that is, X ⊂ ).
n=1
Since any I of S 0 covers itself, its outer measure does not exceed
µ(I). On the other hand it follows from Theorem 4(c) that
∞
X
µ(I) ≤ µ(In )
n=1
for every sequence (In ) covering I, and the inequality remains true if
the right hand side is replaced by its lower bound, which is the outer
12 1. Measure Theory
∞
X
µ(X) ≤ µ(Xn )
n=1
∞
P
Proof. Let ǫ > 0, ǫn ≤ ǫ. Then we can choose Inν from S 0 so that
n=1
∞
[ ∞
X
Xn ⊂ Inν , µ(Inν ) ≤ µ(Xn ) + ǫn ,
ν=1 ν=1
X ∞
∞ X ∞
X
µ(X) ≤ µ(Inν ) ≤ (µ(Xn ) + ǫn )
n=1 ν=1 n=1
∞
X
≤ µ(Xn ) + ǫ,
n=1
Proof. If P is any set with µ(P) < ∞, and ǫ > 0, we can define In in
S 0 so that
[ ∞ ∞
X
P⊂ In , µ(In ) ≤ µ(P) + ǫ
n=1 n=1
Then
∞
[ ∞
[
PI ⊂ I · In , p − PI ⊂ (In − IIn )
n=1 n=1
and
∞
X
µ(PI) + µ(P − PI) ≤ (µ(IIn ) + µ(In − IIn ))
n=1
∞
X
= µ(In ) ≤ µ(P) + ǫ
n=1
as required.
We can now prove the fundamental theorem.
PX ′ = P − PX, P − PX ′ = PX,
µ(PX ′ ) + µ(P − PX) = µ(PX) + µ(P − PX)
Then, since
and so X1 X2 is measurable.
It follows at once now that the sum and difference of two measurable
sets are measurable and if we take P = X1 + X2 in the formula defining
measurablility of X1 , it follows that
N
P
since Xn is measurable,
n=1
N N
X X
≥ µ P Xn + µ(P − PX) = µ(PXn ) + µ(P − PX)
n=1 n=1
∞
X
µ(P) ≥ µ(PXn ) + µ(P − PX)
n=1
≥ µ(PX) + µ(P − PX),
and so X is measurable.
The measure defined in Theorem 7 is not generally the minimal mea-
sure generated by µ, and the minimal measure is generally not complete.
However, any measure can be completed by adding to the system of
measurable sets (X) the sets X ∪ N where N is a subset of a set of mea-
sure zero and defining µ(X ∪ N) = µ(X). This is consistent with the
original definition and gives us a measure since countable unions of sets
X ∪ N are sets of the same form, (X ∪ N)′ = X ′ ∩ N ′ = X ′ ∩ (Y ′ ∪ N · Y ′ )
(where N ⊂ Y, Y being measurable and of 0 measure) = X1 ∪N1 is of the
same form and µ is clearly completely additive on this extended system.
16 1. Measure Theory
ai ≤ xi < bi (i = 1, 2, . . . , k)
we see that
and so
µ(B − A) = 0,
while A, B are obviously Borel sets.
Conversely, if µ(P) < ∞ and
F ⊂ X ⊂ G,
so that X is measurable.
In the second case, X is the sum of A and a subset of B contained
in a Borel set of measure zero and is therefore Lebesgue measurable by
the completeness of Lebesgue measure.
It is possible to defined measures on the Borel sets in Rk in which the
measure of a rectangle is not equal to its volume. All that is necessary
is that they should be completely additive on figures. Measures of this
kind are usually called positive Stiltjes measures in Rk and Theorems 11
and 12 remain valid for them but Theorem 10 does not. For example, a
single point may have positive Stieltjes measure.
A particularly important case is k = 1, when a Stieltjes measure can
be defined on the real line by any monotonic increasing function Ψ(X).
The figures I are finite sums of intervals ai ≤ x < bi and µ(I) is defined
by X
µ(I) = {Ψ(bi − 0) − Ψ(ai − 0)}.
i
20 1. Measure Theory
Proof. Since
∞ " #
\ 1
ε[ f (x) ≥ k] = ε f (x) > k − ,
n=1
n
10.. Measurable functions 21
the union being over all rationals r. This is a countable union of mea-
surable sets so that f + g is measurable. Similarly f − g is measurable.
Finally
√ √
ε[ f (x))2 > k] = ε[ f (x) > k] + ε[ f (x) < − k] for k ≥ 0
1 1
f (x)g(x) = ( f (x) + g(x))2 − ( f (x) − g(x))2
4 4
f · g is measurable.
∞ \
[ ∞
= ǫ[ fn (x) < k]
N=1 n=N
Proof. We may plainly neglect the set of zero measure in which fn (X)
dose not converge to a finite limit. Let
∞
\
and let X◦ = Xν
ν=1
Then
∞
X
µ(X − X◦ ) ≤ µ(X − Xν ) < δ
ν=1
and | f (x) − fn (x) |< 1/ν for n ≥ Nν
if x is in xν and therefore if X is in X◦ . This proves the theorem.
for a finite V. One of the µ(Eν′ ) can be infinite only if F(x) = ∞ and
then there is nothing to prove. Otherwise, µ(Eν′ ) < ∞ and we let {yν }
be a subdivision with δ = sup(yν+1 − yν ) and denote by S ′′ the sum
defined for {yν } and by S the sum defined for the subdivision consisting
of points yν and y′ν . Since S ′ is not decreased by insertion of extra points
of sub-division,
V
X
S ′′ ≥ S ′ ≥ y′ν µ(Eν′ ) > A,
ν=1
V
X
′′
while S −S ≤δ µ(Eν′ )
1
We define
X X X
when both the integrals on the right are finite, so that f (x) is integrable
if and only if | f (x) | is integrable.
In general, we use the integral sign only when the integrand is in-
tegrable in this absolute sense.
R The only exception to this rule is that
we may sometimes write f (x)dµ = ∞ when f (x) ≥ −r(x) and r(x) is
X
integrable.
∞
X ∞
X ∞
X
S = yν µ(Eν ) = yν µ(Eν Xn )
ν=1 ν=1 n=1
26 1. Measure Theory
∞ X
X ∞
= yν µ (Eν Xn )
n=1 ν=1
∞
X
= Sn
n=1
30 where S n is the sum for f (x) over Xn . Since S and S n (which are ≥ 0)
tend to F(X) and F(XN ) respectively as the maximum interval of subdi-
vision tends to 0, we get
Z ∞
X
F(X) = f (x)dµ = F(Xn ).
X n=1
Proof. We may again suppose that f (x) ≥ 0 and that a > 0. If we use
the subdivision {yν } for f (x) and {ayν } for af (x), the sets Eν are the same
in each case, and the proof is trivial.
has
h positive measure,
i and hence, so has at least one subset En =∈
1 1
n+1 ≤ f (x) < n Then
Z Z
µ(En )
f (x)d ≥ f (x)dµ ≥ >0
n+1
x En
which is impossible.
R
Corollary 1. If f (x)dµ = 0 for all X ⊂ X, f (x) not necessarily of the
K
same sign, then f (x) = 0 p.p.
we have merely to apply Theorem 24 to X1 =∈ [ f (x) ≥ 0] and to
X2 =∈ [ f (x) < 0].
R R
Corollary 2. If f (x)dµ = g(x)dµ for all X ⊂ X , then f (x) = g(x)
X x
p.p. If f (x) = g(x) p.p. we say that f and g are equivalent.
is absolutely continuous.
we have Eν ⊂ X − E A forν ≤ V.
XV Z XV
Now F(X − E A ) ≥ f (x)dµ ≥ yν µ(Eν )
ν=1 EV ν=1
> F(X)− ∈ /2
and therefore, F(E A ) <∈ /2.
33
If X is any measurable set,
12.. Absolute Continuity 29
F(Xn ) → F(X).
Proof. If µ(X) < ∞ this follows from Theorem 25 and the continuity
of µ in the sense of Theorem 2. If µ(X) = ∞, ∈> o we can choose a
subdivision { yν } and corresponding subsets Eν of X so that
∞
X
yν µ(Eν ) > F(X)− ∈
ν=1
∞
X ∞
X
lim F(Xn ) = F(Eν ) ≥ yν µ(Eν ) > F(X)− ∈
n→∞
ν=1 ν=1
Theorem 28. If f (x) and g(x) are integrable on X, so is f (x) + g(x) and
Z Z Z
[ f (x) + g(x)]dµ = f (x)dµ + g(x)dµ.
X X X
Proof. Since | f (x) + g(x) |≤| f (x) | + | g(x) |≤ 2 sup(| f (x) |, | g(x) |)
we have
Z Z
| f (x) + g(x) | dµ ≤ 2 sup(| f (x) |, | g(x) |)dµ
X X
Z Z
= 2 | f (x) | dµ + | g(x) | dµ ≤ 2
| f |≥|g| | f |<|g|
Z Z
| f (x) | dµ + 2 | g(x) | dµ
X X
we get
Z ∞ Z
X ∞
X
[ f (x) + g(x)]dµ ≤ f (x)dµ + yν+1 µ(Eν )
X ν=0.. E ν=0
ν
Z
≤ f (x)dµ + (1+ ∈)S + y1 µ(Eo )
X
Z Z
≤ f (x)dµ + (1+ ∈) g(x)dµ+ ∈ µ(X)
X X
In particular, the conclusion holds if µ(X) < ∞ and the fn (x) are
uniformly bounded.
n
X
provided that | uν (x) |≤ γ(x) for all N and x, γ(x) ∈ L(X).
ν=1
The equation is true if un (x) ≥ 0, in the sense that if either side is
finite, then so is the other and equality holds, while if either side is ∞ so
is the other.
Zb Zb
d ∂f
f (x, y)dx = dx
dy◦ ∂y◦
a a
provided that
f (x, y◦ + h) − f (x, y◦ ) ≤ γ(x)εL(a, b)
h
Rb
ϕn (x) ≤ f (x) ≤ ψn (x), (ψn (x) − ϕn (x))dx → 0 as n → ∞ since lim
a
ϕn (x) = limΨn (x) = f (x) p.p., it is clear that f (x) is L-integrable and
that its L-integral satisfies
Zb Zb Zb
f (x)dx = lim ϕn (x)dx = lim ψn (x)dx,
n→∞ n→∞
a a a
and the common value of these is the R-integral. The following is the
main theorem.
µ(E0 ) > b − a− ∈,
| f (x + h) − f (x)| ≤∈ for x ∈ E0 , x + h ∈ (a, b), |h| < δ.
≤ ǫ/2 + ǫ/2 = ǫ
∞
Let E ∗ =
T
En , so that
n=1
X
µ(E ∗ ) ≥ b − a − ∈n > b − a − ǫ
S − s ≤ 2 ∈ (b − a) + 2 ∈ M
when both integrals on the right are finite, and since µ+ and µ− are mea-
sure, all our theorems apply to the integrals separately and therefore to
their sum with the exception of Theorems 22, 23, 24, 30, 32 in which
the sign of µ obviously plays a part. The basic inequality which takes
the place of Theorem 22 is
Proof.
Z Z
+ −
f (x)dµ =
f (x)dµ − X f (x)d(−µ )
√ x
x
Z Z
+ −
≤ f (x)dµ +
f (x)d(−µ )
√ √
x x
Z Z
≤ | f (x)dµ+ + | f (x)|d(−µ− )
√ √
x x
Z Z
= | f (x)|dµ− = | f (x)| |dµ|.
√ √
x x
will be obvious when theorems do not depend on the sign of µ and these
can be extended immediately to the general case. When we deal with
inequalities, it is generally essential to restrict µ to the positive case (or
replace it by µ).
Integrals with µ taking positive and negative values are usually called
Stieltjes integrals. If they are integrals of functions f (x) of a real vari-
able. x with respect to µ defined by a function ψ(x) of bounded variation,
we write Z Z
f (x)dψ(x) for f (x)dµ,
X X
and if X is an interval (a, b) with ψ(x) continuous at a and at b, we write
it as
Zb
f (x)d ψ(x).
a
In particular, if ψ(x) = x, we get the classical Lebesgue integral,
which can always be written in this from.
If ψ(x) is not continuous at a or at b, the integral will generally
depend on whether the interval of integration is open or closed at each
end, and we have to specify the integral in one of the four forms.
Zb±0
f (x)dψ (x)
a±o
16. L-Spaces
A set L of elements f , g, . . . is a linear space over the field R of real
numbers (and similarly over any field) if
(3) (α + β) f = α f + β f
(4) α( f + g) = α f + αg
(6) 1. f = f.
It is a Banach space if
kα f k =| α | k f k, k f + gk ≤ k f k + kgk
and
Lq (X) ⊂ L p (X).
|| f + g|| ≤ || f || + ||g||
|| fn − fm || → 0 as m, n → ∞,
and in particular
|| fnk+1 − fnk || ≤ ǫk .
Then 47
Z Z
∈kp ≥ | fnk+1 (x) − fnk (x)| p dµ ≥ | fnk+1 (x) − fnk (x)| p dµ
X X−Ek
p
≥ Ak µ(X − Ek ),
" #
∞
(X− ∈k ) → 0 as K → ∞ since (∈k /Ak ) p < ∞.
S P
so that µ
K
∞
T
Since fnk (x) tends to a limit at every point of each set Ek (because
P K
Ak < ∞) , it follows that fnk (x) tends to a limit f (x) p.p.
Also, it follows from Fatou’s lemma that, since || fnk || is bounded,
f (x) ∈ L p (X) and that
|| f nk − f || ≥∈k , || f nk − f || → 0 as k → ∞
Theorem 42. The set of step functions (and even the sets step function
taking relation values) is dense in L p for 1 ≤ p < ∞. If the Borel system
of measurable sets in X is generated by a countable, finitely additive sys-
tem S 0 ,then the set of steps functions with rational values is countable
and L p is separable.
0 ≤ Q(X) = Q(X.X s )
for all measurable X. Then we can find a sequence {θn (x)} in Θ for which
Z Z
θn (x)dµ −→ sup θ(x)dµ ≤ H(X) < ∞
X θǫΘ x
n
[
X= X.ǫ[θn′ (x) = θk (x)]
k=1
we see that belongs to Θ. Since θn′ (x) increases with n for each x,
θn′ (x)
it has a limit f (x) ≥ 0, which also belongs to Θ ,and we can write
Z
F(X) = f (x)dµ ≤ H(X), Q(X) = H(X) − F(X) ≥ 0
X
while R R
(1) X
f (x)dµ = supθǫΘ X θ(x)dµ < ∞.
Now let
µ(X)
Qn (X) = Q(X) −
n
+ −
and let Xn , Xn be the sets defined by Theorem 3 for which
Then,
Z
µ(X) 1
H(X) ≥ F(X) + = ( f (x) + )dµ if X ⊂ X+n
n X n
and if
1
f (x) = f (x) + for x ∈ X+n
n
f (x) = f (x) for x ∈ X+n ,
where F1 (X), F2 (X) are integrals and Q1 (X), Q2 (X) vanish on all sets
disjoint from two sets of measure zero, whose union X s also has measure
zero. Then
Then, it follows from Theorem 27 that we may suppose that φ(X) <
∞. If E > 0, we consider subdivisions {yν } for which
y1 ≤ ǫ, yν+1 ≤ (1 + ǫ)yν (ν ≥ 1)
so that
∞
X Z
s= yν φ (Eν ) ≤ f (x)dφ
ν=1 x
∞
X
≤ yν+1 Φ(E)
v=0
≤ (1+ ∈)s+ ∈ Φ(X)
But 52
Z ∞ Z
X
f (x)φ(x)dµ = f (x)φ(x)dµ
X ν=o Eν
46 1. Measure Theory
In particular
Z b Z B
f (x)dx = f (α(t))dα(t)
a A
18. Differentiation
It has been shown in Theorem 44 that any completely additive and abso-
lutely continuous finite set function can be expressed as the the integral
of an integrable function defined uniquely upto a set of measure zero
called its Radon derivative. This derivative does not depend upon any
topological properties of the space X. On the other hand the derivative 54
of a function of a real variable is defined, classically, as a limit in the
topology of R. An obvious problem is to determine the relationship be-
tween Radon derivatives and those defined by other means. We consider
here only the case X = R where the theory is familiar (but not easy). We
need some preliminary results about derivatives of a function F(x) in the
classical sense.
Definition. The upper and lower, right and left derivatives of F(x) at x
are defined respectively, by
F(x + h) − F(x)
D+ F = lim sup
h→+0 h
F(x + h) − F(x)
D+ F = lim inf
h→+0 h
F(x + h) − F(x)
D− F = lim sup
h→−0 h
48 1. Measure Theory
F(x + h) − F(x)
D− F = lim inf
h→−0 h
Plainly D+ F ≤ D+ F, D− F ≤ D− F. If D+ F = D+ F or D− F = D− F
we say that F(x) is differentiable on the or on the left, respectively, and
the common values are called the right or left derivatives, F+′ , F−′ . If
all four derivatives are equal, we say that F(x) is differentiable with
derivative F ′ (x) equal to the common value of these derivatives.
Theorem 48. The set of points at which F+′ and F−′ both are exist but
different is countable.
55 Proof. It is enough to prove that the set E of points x in which F ′ (x) <
F+′ (x)is countable. Let r1 , r2 . . . be the sequence of all rational numbers
arranged in some definite order. If x ∈ E let k = k(x) be the smallest
integer for which
F ′ (x) < rk < F+′ (x)
Now let m, n be the smallest integers for which
F(ζ) − F(x)
rm < x, < rk for rm < ζ < x,
ζ−x
F(ζ) − F(x)
rn > x, > rk for x < ζ < rn
ζ−x
Every x defines the triple (k, m, n) uniquely, and two numbers x1 <
x2 cannot have the same triple (k, m, n) associated with them. For if they
did, we should have
rm < x1 < x2 < rn
and therefore
F(x1 ) − F(x2 )
< rk from the inequality
x1 − x2
F(x1 ) − F(x2 )
while < rk from the second
x1 − x2
and these are contradictory. Since the number of triples (k, m, n) is
countable, so is E.
18.. Differentiation 49
N
X N
X
µ(In )− ∈≤ µ(E) ≤ µ(E In )+ ∈ .
n=1 n=1
∞
P
since µ(In ) ≤ µ(G) ≤ µ(E)+ ∈< ∞ and µ(A) > 0. It follows that
n=1
∞
[
µ A − A
Jn > 0
n=N+1
∞
S
and that A − A Jn contains at least one point ξ. Moreover, since
n=N+1
N
P
ξ does not belong to the closed set In , we can choose from V an
n=1
interval I containing ξ and such that I. In = 0 for n = 1, 2 . . . , N. On
the other hand, I.In cannot be empty for all n ≥ N + 1 for, if it were, we
should have
0 < µ(I) ≥ 1n < 2λn+1
P∞
for all n ≤ N + 1 and this is impossible since λn → 0 (for λn =
1
∞
P
µ(In ) ≤ µ(G) < ∞). We can therefore define n◦ ≥ N + 1 to be the
1
smallest integer for which I.In◦ , 0. But
I.In = 0 f orn ≤ n◦ − 1
Hence I, and therefore ξ, is contained in Jn◦ since Jn◦ has five times the
length of In◦ and I.In◦ , 0.
∞
S
58 This is impossible since ξ belongs to A − A Jn and n◦ ≥ N + 1.
n=N+1
Hence we must have
µ(A) = 0
∞
P
and µ(ES ) = µ(E), µ(In ) ≤ µ(G) ≤ µ(E)+ ∈ .
n=1
We can therefore choose N as large that
N
X N
X
µ(In )− ∈≤ µ(E) ≤ µ(E In )+ ∈
n=1 n=1
18.. Differentiation 51
F(K ′ ) ≥ r1 µ(K ′ ).
F(x+h)−F(x)
Proof. Since h ≥ 0 for h , o it follows that
F(x + h) − F(x)
F ′ (x) = lim ≥ 0 p.p.
h→0 h
Z b−δ Z b−δ
′ F(x + h) − F(x)
F (x)dx ≤ lim inf dx
a h→0 a h
1 Z b+h−δ 1 b
Z
= lim inf F(x)dx − F(x)dx
h→0 h a+h h a
1 Z b+h−δ 1 a+h
Z
= lim inf F(x)dx − F(x)dx
h→0 h b−δ h a
≤ lim [F(b + h − δ) − F(a)]
h→0
≤ F(b) − F(a)
Proof. Our hypothesis implies that F(X) = 0 for all open or closed
intervals X and therefore, since F(X) is completely additive F(X) = 0
for all open sets X every open set being the sum of a countable number
of disjoint open intervals. But every measurable set is t he sum of a set
of zero measure and the limit of a decreasing sequence of open sets by
Theorem 12, and therefore F(X) = 0 for every measurable set X. The
conclusion then follows from corollary 1 to Theorem 24.
Proof. We may suppose that F(x), G(x) increase on the interval and are
non - negative and define
Z Z Z
∆(I) = F(x)dG(x) + G(x)dF(x) − [F(x)Gx]
I I I
where F(I), G(I) are the interval functions defined by F(x),G(x), and 64
similarly
∆(I) ≥ −F(I)G(I)so that | ∆(I) |≥ F(I)G(I)
Now, any interval is the sum of an open interval and one or two end
points and it follows from the additivity of ∆(I), that
| ∆(I) |≤ F(I)G(I).
56 1. Measure Theory
for all intervals. Let ∈> 0. Then apart from a finite number of points at
which F(x + 0) − F(x − 0) >∈, and on which ∆ = 0, we can divide I into
a finite number of disjoint intervals In on each of which F(In ) ≤∈ . Then
X X X
| ∆(I) |=| ∆ In | = | ∆(In ) |≤ F(In )G(In )
X
≤∈ G(In ) =∈ G(I).
The conclusion follows on letting ∈→ 0.
Theorem 55 (Second Mean Value Theorem). (i) If f (x) ∈ L(a, b)
and ϕ(x) is monotonic,
Z b Z ξ Z b
f (x)φ(x)dx = ϕ(a + 0) f (x)dx + φ(b − 0) f (x)dx
a a ξ
65 for some ξ in a ≤ ξ ≤ b.
(ii) If ϕ(x) ≥ 0 and φ(x) decreases in a ≤ x ≤ b,
Z b Z ξ
f (x)ϕ(x)dx = ϕ(a + 0) f (x)dx
a a
for some ξ, a ≤ ξ ≤ b.
Proof.
Rx Suppose that φ(x) decreases in (i), so that, if we put F(x) =
a
f (t)dt, we have
Z b Z b−0
b−0
f (x)φ(x)dx = a+0 [F(x)φ(x)] − F(x)d(x)
a a+0
Z b
= φ(b − 0) f (x)dx + [φ(a + 0) − φ(b − 0)]F(ξ)
a
by Theorem 22 and the fact that F(x) is continuous and attains every
value between its bounds at some point ξin a ≤ ξ ≤ b. This establishes
(i) and we obtain (ii) by defining ϕ(b + 0) = 0 and writing
Z b Z b+0 Z b+0
f (x)φ(x)dx = [F(x)φ(x)] − F(x)dϕ(x)
a a+0 a+0
= ϕ(a + 0)F(ξ) with a ≤ ξ ≤ b.
A refinement enables us to assert that a < ξ < b in (i) and that
a < ξ ≤ b in (ii).
19.. Product Measures and Multiple Integrals 57
where the sets on the left are disjoint. If x is any point of X0 , let Jn′ (x)
be the set of points x′ of X0′ for which (x, x′ ) belongs to Xn xXn′ . Then
Jn′ (x) is measurable in X′ for each x, it has measure µ′ (Xn′ ) when x is in 67
′ ′
Xn and 0 otherwise. This measure µ (Jn (x)) is plainly measurable as a
function of x and
Z
′ ′
µ′ (Jn′ (x))dµ = µ(Xn )µ (Xn )
Xo
58 1. Measure Theory
N
Jn′ (X) is the set of points x of Xo for which (x, x′ )
P
Moreover,
n=1
PN
belongs to Xn xXn . It is measurable and
n=1
Z N N
X X ′ ′
′
µ Jn (x) dµ = µ(Xn )µ (Xn ).
Xo n=1 n=1
′ ∞
P ′
But since X0 xX0 Xn xXn , it follows that
n=1
N
X
lim Jn′ (x) = X0′ for every x of X0 ,
N→∞
n=1
N
X
and therefore lim µ′ Jn (x) = µ(X0′ ) for every x of X0 .
N→∞
n=1
68 Theorem 57. Let X, X′′ be two measure spaces with measures µ, µ′ re-
spectively such that X(X′ ) is the limit of a sequence | Xn ( Xn′ ) of mea-
surable sets of finite measure µ(Xn )(µ′ (Xn′′ )). Let Y be a set in XxX′
measurable with respect to the product measure m defined by µ, µ′ . Let
Y ′ (x) be the set of points x′ ∈ X′ for which (x, x′ ) ∈ Y. Then Y ′ is mea-
surable in X′ for almost all x ∈ X, its measure µ′ (Y ′ (x)) is a measurable
function of x and Z
µ′ (Y ′ (x))dµ = m(Y).
X
19.. Product Measures and Multiple Integrals 59
Finally, 69
Z Z
′ ′ ′
µ (Y (x))dµ = µ (Q(x))dµ = m(Q) = m(Y).
X X
′
Theorem 58 (Fubini’s theorem). Suppose X, X satisfy the hypotheses
′
of Theorem 57. If f (x, x ) is measurable with respect to the product
′ ′ ′
measure defined by µ, µ it is measurable in x for almost all x and in x
for almost all x. The existence of any one of the integrals
Z Z Z Z Z
′ ′ ′
| f (x, x )|dm, dµ | f (x, x)|dµ , dµ | f (x, x)|dµ
XxX′ X X′ X′ X
implies that of the other two and the existence and equality of the inte-
grals
Z Z Z Z Z
′ ′
f (x, x)dm, d f (x, x )dµ , dµ f (x, x′ )dµ
XxX′ X X′ X X′
60 1. Measure Theory
′
Proof. We may obviously suppose that f (x, x ) ≥ 0. Let yν be a sub-
′
division with Eν =∈ [yν ≤ f (x, x ) < yν+1 ]. The theorem holds for the
function equal to yν in Eν for ν = 0, 1, 2, . . . , N (N arbitrary) and zero
elsewhere, by Theorem 57, and the general result follows easily from
the definition of the integral.
Chapter 2
Probability
1. Definitions
A measure µ defined in a space X of points x is called a probability 70
measure or a probability distribution if µ(X) = 1. The measurable sets
X are called events and the probability of the event X is real number
µ(X). Two events X1 , X2 are mutually exclusive if X1 · X2 = 0.
The statement: x is a random variable in X with probability distri-
bution µ means
(ii) that the expression “the probability that x belongs to X”, where
X is a given event, will be taken to mean µ(X) · µ(X) sometimes
written P(xεX).
61
62 2. Probability
Example 1. Tossing a coin The space X has only two points H, T with
four subsets, with probabilities given by
1
µ(0) = 0, µ(H) = µ(T ) = , µ(X) = µ(H + T ) = 1.
2
If we make H, T correspond respectively with the real numbers 0,1,
we get the random real variable with distribution function
Example 2. Throwing a die- The space contains six points, each with
probability 1/6 (unloaded die). The natural correspondence with R gives
rise to F(x) with equal jumps 1/6 at 1, 2, 3, 4, 5, 6.
If
√ √
a ≥ 0, P(y ≤ a) = P(x2 ≤ a) = P(0 ≤ x ≤ a) + P(− a ≤ x < 0),
√ √
G(a + 0) = F( a + 0) − F(− a − 0).
If a ≥ 0.
!
1
G(a + 0) = P(y ≤ a) = P ≤ a = P(x ≥ 1/a) = 1 − F(1/a − 0).
x
Z ZX Z
2 2
= x dµ − 2x xdµ + x dµ
X X X
2 2
2
= E(x ) − 2x + x = E(x ) − (E(x))2 2
r = sup a, R = inf a
F(a)=0 F(a)=1
R
(iv) The mean error is X
|x − x|dµ = E(|x − x|)
F(A − 0) + F(A + 0) ≤ 1.
74
Then
n
X
x = E(ν) = νpν = np,
ν=0
n
X
2 2 2
σ = E(ν ) − x = ν2 pν − n2 p2 = npq
ν=0
F(x) = 0 for x ≤ a
x−a
= for a ≤ x ≤ b
b−a
66 2. Probability
= 1 for b ≤ x.
We now prove
Theorem 1 (Tehebycheff’s inequality). If ∝ (x) is a nonnegative func-
tion of a random variable x and k > 0 then
E(α(x))
P(α(x) ≥ k) ≤
k
Proof.
Z
E(α(x)) = α(x)dµ
ZX Z
= α(x)dµ + α(x)dµ
Zα(x)≥k α(x)<k
Z
≥ α(x)dµ ≥ k dµ = KF(α(x) ≥ k).
α(x)≥k α(x)≥k
4.. Joint probabilities and independence 67
76
Corollary. If k > 0 and x, σ are respectively the mean and the standard
deviation of a random real x, then
p(α(x) ≥ kσ− ) ≤ 1/k2 .
We merely replace k by k2 σ2 , α(x) by (x − x)2 in the Theorem 1.
i=1
α x (x1 , x2 ) = 1 when x1 + x2 ≤ x
= 0 when x1 + x2 > x
Z∞ Z∞
= F1 (x − x2 )dF2 (x2 ) = F1 (x − u)dF2 (u),
−∞ −∞
F1∗ . . . ∗ Fn
Proof. We write
Z∞ Z∞ Zx−u
F(x) = F1 (x − u)dF2 (u) = dF2 (u) f1 (t)dt
−∞ −∞ −∞
4.. Joint probabilities and independence 71
and so
Z∞
′
f (x) = F (x) = f1 (x − u)dF2 (u)p.p
−∞
function of y and
Z∞ Z∞ Z∞
dF1 (y) α(x + y)dF2 (x) = α(x)dF(x)
−∞ −∞ −∞
where
F(x) = F1 ∗ F2 (x)
Proof. We may suppose that α(x) ≥ 0. If we consider first the case in
which α(x) = 1 f ora ≤ x ≤ b and α(x) = 0 elsewhere,
Z∞
α(x + y)dF2 (x) = F2 (b − y − 0) − F2 (a − y − 0),
−∞
72 2. Probability
Z∞ Z∞ Z∞
dF1 (y) α(x + y)dF2 (x) = (F2 (b − y − 0)) − F2 (a − y − 0)dF1 (y)
−∞ −∞ −∞
= F(b − 0) − F(a − 0)
Z ∞
= α(x)dF(x),
−∞
5. Characteristic Functions
A basic tool in modern probability theory is the notion of the character-
istic function ϕ(t) of a distribution function F(x).
It is defined by Z ∞
ϕ(t) = eitx dF(x)
−∞
Z∞ Z∞
-itx
ϕ(−t) = e dF(x) = eitx dF(x) = ϕ(t)
−∞ −∞
If h , 0,
Z∞
ϕ(t + h) − ϕ(t) = eitx (eixh − 1)dF(x),
−∞
Z∞
| ϕ(t + h) − ϕ(t) | ≤ | eixh − 1 | dF(x) = o(1) as h → 0
−∞
Proof.
Z∞ Z∞
ith
ϕ1 (t).ϕ2 (t) = e dF1 (y). eitx dF2 (x)
−∞ −∞
Z∞ Z∞
= dF1 (y) eit(x+y) dF2 (x)
−∞ −∞
Z∞ Z∞
= dF1 (y) eitx dF2 (x − y)
−∞ −∞
Z∞
= eitx dF(x)
−∞
then
1
RA sin ht −iat
(i) F(a+h)−F(a−h) = lim t e ϕ(t)dt if F(x) is continuous
A→∞ π
−A
at a ±h.
5.. Characteristic Functions 75
a+H
R Ra R∞
1 1−cosHt −iat
(ii) F(x)dx − F(x)dx = π t2
e ϕ(t)dt.
a a−H −∞
85
Proof.
ZA ZA Z∞
1 sin ht −iat 1 sin ht −iat
· e ϕ(t)dt = e dt eitx dF(x)
π t π t
−A −A −∞
Z∞ ZA
1 sin ht it(X−a)
= dF(x) e dt
π t
−∞ −A
Z ∞ ZA
2 sin ht cos((x − a)t)
= dF(x) dt
π −∞ t
o
But
Z∞ Z∞
2 sin ht cos(x − a)t 1 sin(x − a + h)t
= dt = dt
π t π t
0 0
ZA
1 sin(x − a − h)t
− dt
π t
0
A(x−a+h)
Z A(x−a+h)
Z A(x−a+h)
Z
1 sin t 1 sin t 1 sin t
= dt = dt − dt
π t π t π t
0 0 A(x−ah)
and 86
Z T Z 0
1 sin t 1 1 sin t 1
dt → , dt → as T → ∞.
π 0 t 2 π −T t 2
76 2. Probability
It follows that
Z A(x−a+h)
0 if x > a + h
1 sin t
A lim dt = 1 if a − h < x < a + h
→∞ π A(x−a−h) t
0 if x < a − h
and since this integral is bounded, it follows from the Lebesgue conver-
gence theorem that
1 A sin ht −iat
Z
A lim e ϕ(t)dt
→∞ π −A t
Z a+h
= dF(x) = F(a + h) − F(a − h)
a−h
This is possible since the first inequality is clearly satisfied for large
X and
Z −X Z ∞ !
+ dF(X) = Fn (−X − 0) + 1 − Fn (X + 0).
−∞ X
6.. Sequences and limits of... 77
Fn (x) → F(x)
0 ≤ F(x) ≤ 1.
Proof. Let rm be the set of rational numbers arranged in a sequence.
Then the numbers Fn (r1 ) are bounded and we can select a sequence
n11 , n12 , . . . so that Fn1ν (r1 ) tends to a limit as γ → ∞ which we denote
by F(r1 ). The sequence (n1ν ) then contains a subsequence (n2ν ) so that
Fn2ν (r2 ) → F(r2 ) and we define by induction sequences (nkν ), (nk+1,ν )
being a subsequence of (nkν ) so that
1−cos Ht
and if we let k → ∞ and note tat t2
εL(−∞, ∞) and that Fnk ,
ϕnk (t)are bounded, we get
Z H Z ◦ Z∞
1 1 1 1 − cos Ht
F(x)dx − F(x)dx = ϕ(t)dt
H ◦ H −H πH t2
−∞
Z∞
1 1 − cos t t
= ϕ( )dt.
π t2 H
−∞
F(∞) − F(−∞) = 1,
ϕ(t) = (q + peit )n
ν
(ii) The Poisson distribution pν = e−c cν! , ν = 0, 1, 2, . . . has distribu-
tion function X
F(x) = pν
ν≤x
1
(iii) The rectangular distribution F ′ (x) = b−a for a < x < b, 0 for
x < a, x > b, 0 for x < a, x > b, has the characteristic function
eitb − eita
ϕ(t) =
(b − a)it
2/2 σ2
91 (iv) The normal distribution F ′ (x) = √1 e−x has characteristic
σ 2π
function
2 2/2
ϕ(t) = e−t σ
| ϕ(t) |2 = ϕ(t)ϕ(−t).
The converse of Theorem 18, (ii) and (iii) is deeper. We sate it with-
out proof. For the proof reference may be made to “Probability Theory”
by M. Loose, pages 213-14 and 272-274.
8. Conditional probabilities
If x is a random variable in X and C is a subset of X with positive mea-
sure, we define
P(X/C) = µ(XC)/µ(C)
to be the conditional probability that x lies in X, subject to the condition
that x belongs to C. It is clear that P(X/C) is probability measure over
all measurable X.
J
P
Theorem 21 (Bayes’ Theorem). Suppose that X = c j , µ(c j ) > 0.
j=1
Then
P(X/C j )µ(C j )
P(cJ /X) =
J
P
P(X/Ci )µ(Ci )
j=1
94
The set X1 , may reduce to a single point x1 , and the definition re-
mains valid provided that µ1 (x1 ) > 0. But usually µ1 (x1 ) = 0, but
the conditional probability with respect to a single point is not difficult
to define. It follows from the Randon-Nikodym theorem that for fixed
X2 , µ(X1 xX2 ) has a Radon derivative which we can write R(x1 , X2 ) with
the property that
Z
µ(X1 x X2 ) = R(x1 , X2 )dµ1
x1
for all measurable X1 . For each X2 , R(x1 , X2 ) is defined for almost all
x1 and plainly R(x1 , X2 ) = 1 p.p. But unfortunately,since the number
of measurable sets X2 is not generally countable, the union of all the
8.. Conditional probabilities 83
exceptional sets may not be a set of measure zero. This means that we
cannot assume that, for almost all x1 , R(x1 , X2 ) is a measure defined on
all measurable sets X2 . If how ever, it is , we write it P(X1 /x1 ) and call
it that conditional probability that x2 EX2 subject to the condition that x1
has a specified value.
Suppose now that (x1 , x2 ) is a random variable s in the plane with
probability density f (x1 , x2 )(i.e f (x1 , x2 ) ≥ 0 and f (x1 , x2 )dx1 dx2 =
1). Then we can define conditional probability densities as follows:
Z∞
f (x1 , x2 )
P(x2 /x1 ) = f1 (x1 ) = f (x1 , x2 )dx2
f1 (x1 )
−∞
R∞
Z∞ x2 f (x1 , x2 )dx2
−∞
m(x1 ) = x2 P(x2 /x1 )dx2 =
R∞
−∞ f (x1 , x2 )dx2
−∞
Z∞
2
σ (x1 ) = (x2 − m(x1 ))2 P(x2 /x1 )dx2
−∞
R∞
(x2 − m(x1 ))2 f (x1 , x2 )dx2
−∞
R∞
f (x1 , x2 )dx2
−∞
for all possible curves x2 = g(x1 ). If the curves x2 = g(x1 ) are restricted
to specified families, the function which minimizes E in that family.
For example, the linear regression is the line x2 = Ax1 + B for which 96
E((x2 − Ax1 − B)2 ) is least, the nth degree polynomial regression is the
polynomial curve of degree n for which the corresponding E is least.
for every ǫ > 0, P being the joint probability in the product space of xn
and x. In particular, if C is a constant, xn −→ C in prob, if
lim E(| xn − x |α ) = 0.
n→∞
97
The most important case is that in which α = 2. The following result
is got almost immediately from the definition.
Theorem 22. If Fn (X) is the distribution function of xn the necessary
and sufficient condition that xn → 0 in prob. is that
Fn (x) → D(x)
9.. Sequences of Random Variables 85
Z∞
we have φ(t) − 1 − mit = eitx − 1 − itx dF(x).
−∞
eitx −1−itx
Now
t
is majorised by a multiple of | x | and → 0 as t → 0
for each x.
Hence, by Lebesgue’s theorem,
Thus
" !#n
n mit 1
(φ(t/n)) = 1 + +0 → emit asn → ∞,
n n
ϕn (t) = (q + peit )n
[1 + c(eit − 1)]n
=
n
c(eit −1)
→e
where
Zx "
sin y 1 + y2
#
T (x) = 1− dG(y),
y y2
−∞
Zx
y2
G(x) = dT (x)
(1 − siny/y)(1 + y2 )
−∞
sin y 1+y2
h i
since 1 − y y2
and its reciprocal are bounded.
Definition. We say that the random variables xn1 , xn2 , . . . are uniformly
asymptotically negligible (u.a.n.) if, for every ∈> 0,
Z
sup dFn (x) → 0 as n → ∞.
ν
|x|≥∈
The condition that the variables are u.a.n implies that the variables 103
are “centered” in the sense that their values are concentrated near 0.
In the general case of u.a.n variables by considering the new variables
xnν − Cnν . Thus, we need only consider the u.a.n case, since theorems
for this case can be extended to the u.a.n. case by trivial changes. We
prove an elementary result about u.a.n. variables first.
Theorem 31. The conditions
Proof. The equivalence of (i) and (ii) follows from the inequalities
Z∞
x2
Z
dFnν (x + anν ) ≤ (ǫ+ | anν |2 ) + dFnν (x)
1 + x2
−∞ |x|≥ǫ
ǫ2 x2
Z Z
1+
dFnν (x) ≤ dFnν (x).
ǫ2 1 + x2
|x|≥ǫ |x|≥ǫ
104 Theorem 32 (The Central Limit Theorem). Suppose that xnν are inde-
pendent u.a.n. variables and that Fn → F. Then.
(ii) If ψ(t) has representation (a, G) and the real numbers an satisfy
Z∞
x
dFnν (x + anν ) = 0,
1 + x2
−∞
Zx
y2
Gn (x) = dFnν (y + anν ) → G(x),
1 + y2
−∞
X
an = anν → a.
ν
and
α √nν (t) = −R(γ √nν (t)) ≥ 0 (1)
It follow easily from the u.a.n. condition and the definition of anν
that a √nν → 0 uniformly in ν as n → ∞ and from Theorem 31 that 105
γnν (t) → 0 uniformly in ν for | t |≤ H where H > 0 is fixed. Hence
uniformly in | t |≤ H.
Now let
Z H
1
A √nν = ∝n (t)dt
2h −H ν
Z∞ " #
sin H x
= 1− dFnν (x + anν )
Hx
−∞
for | t |≤ H, we have
| γnν (t) |≤ CAnν ,
92 2. Probability
and therefore , taking real parts in (2) and using the fact that sup |
ν
γnν (t) |→ 0 uniformly in | t |≤ H,
X X
αnν (t) ≤ − log | ϕn (t) | + ◦ Anν
ν ν
uniformly for |t| ≤ H, and the before, since H is at our disposal, for each
real t.
The first part of the conclusion follows from Theorem 30.
For the converse, our hypothesis implies that Gn (∞) → G(∞) and if
we use the inequality
2
eitx − 1 − itx ≤ C(t) x ,
1 + x2 1 + x2
it follows from (1) that
X
|γnν (t)| ≤ C(t)Gn (∞) = 0(1)
ν
(a) The first part of the theorem shows that the admissible limit func-
tions for sums of u.a.n variables are those for which log ϕ(t) is a
K − L function.
(b) The numbers anν defined in stating the theorem always exist when 107
R∞ x
n is large since 1+x2
dFnν (x + ξ) is continuous in ξ and takes
−∞
positive and negative values at ξ = ∓1 when n is large. They can
be regarded as extra correcting terms required to complete the
centralization of the variables. The u.a n. condition centralizes
each of them separately, but this is not quite enough.
(c) The definition of anν is not the only possible one. It is easy to see
that the proof goes through with trivial changes provided that the
anν are defined so that anν → 0 and
Z∞ Z∞
x x2
dF nν (x + anν ) = 0( dFnν (x + anν ))
1 + x2 1 + x2
−∞ −∞
108 Theorem 34. In order that Fn should tend to the Poisson distribution
with parameter c, it is necessary and sufficient that
x
X 1 XZ y2 c
anν → c and 2
d Fnν (y + an ) → D(x − 1)
ν
2 ν
1 + y 2
−∞
109 where φn (t) is a characteristic function. This means that it is the dis-
tribution of the sum of n independent random variables with the same
distribution.
Theorem 37. A distribution is i.d. if and only if log φ(t) is a K − L
function.
This follows at once from Theorem 32. That the condition that a dis-
tribution be i.d. is equivalent to a lighter one is shown by the following
11.. Cumulative sums 95
Now if λλn+1
n
= θn , we can choose a subsequence (θnk ) which either
tends to ∞ or to some limit θ ≥ 0. In the first case Fnk (xθnk ) → F(±∞) ,
F(x) for some x. In the other case
Proof. First suppose that ϕ(t) is s.d. if it has a positive real zero, it has
a smallest, 2a, since it is continuous, and so
where 113
98 2. Probability
n+m
Y
χn,m (t) = βγ (t/λn+m )
v=n+1
Using theorem 39, we can those n, m(n) → ∞ so that λn /λn+m →
c(0 < c < 1) and then ϕn+m (t) → ϕ(t). Since ϕn (t) → ϕ(t) uniformly in
any finite t-interval ϕn (λn t/λn+m ) → ϕ(ct). It follows that χn,m (t) has a
limit φc (t) which is continuous at t = 0 and is therefore a characteristic
function by theorem 16. Moreover the form of χn,m (t) shows that ϕc (t)
is i.d.
The theorem characterizes L by a property of ϕ(t). It is also possible
to characterize it by a property of F(x).
Theorem 41 (P.Lévy). The function ϕ(t) of L are those for which log ϕ(t)
2 ′
has a K−L representation (a, G) in which x x+1 G (x) exists and decreases
outside a countable set of points.
Proof. If we suppose that ϕ(t) is of class L and 0 < c < 1 and ignore
terms of the form iat we have
Z∞
itx 1 + x2
log ϕc (t) = eitx − 1 − dG(x)
1 + x2 x2
−∞
Z∞ "
itc2 x x2 + c2
#
itx
− e −1− 2 dG(x/c)
c + x2 x2
−∞
114 and the fact that ϕc (t) is i.d. by Theorem 40 implies that Q(x) − Q(bx)
decreases, where b = 1/c > 1, x > 0 and
Z∞
y2 + 1
Q(x) = dG(y)
y2
x
This implies, of course, that ϕ (t) is s.d. and that ϕc has the form
′
eia t ϕ(c′ t).
we write n = n1 + n2 ,
ϕn (t) = (ϕ(t/λn ))n1 (ϕ(t/λn ))n2 = ϕn1 (tλn1 /λn )ϕn2 (tλn2 /λn ).
100 2. Probability
Since q(x) is real, the only solutions of this difference equation, apart
from the special case G(x) = AD(x) are given by
A1 x1−α
q(x) = A1 e−αx , Q(x) = A1 x−α , G′ (x) = ,x>0
1 + x2
′
and α satisfies 1 = e−αd + e−αd . We can use a similar argument for x <0
and we have also
A2 | x |1−α
q(x) = A2 e−α|x| , Q(x) = A2 | x |−α , G′ (x) = ,x<0
1 + x2
where d, d′ , α are the same. Moreover, since G(x) is bounded, we must
have 0 < α ≤ 2, and since the case α = 2 arises when G(x) = AD(x) and
the distribution is normal, we can suppose that 0 < α < 2. Hence
12.. Random Functions 101
Z ∞
itx dx
log ϕ(t) = iat + A1 eitx − 1 −
0 1 + x2 xα+1
Z 0
itx dx
+ A2 eitx − 1 −
−∞ 1 + x2 | x |α+1
The conclusion follows, if α , 1 form the formula 117
Z∞
dx
(eitx − 1) =| t |α e−απi/2 p(−α) if 0 < α < 1,
xα+1
0
Z∞
itx dx
eitx − 1 − =| t |α e−απi/2 p(−α) if 1 < α < 2
1 + x2 xα+1
0
Z∞
itx dx π
eitx − 1 − 2 2
= − | t | −it log | t | +ia1 t,
1+x x 2
0
ai ≤ x(ti ) ≤ bi
ain ≤ yi ≤ bin (i = 1, 2, . . . , in )
12.. Random Functions 103
For each i there is at least one point yi which is contained in all the
intervals [ain , bin ] , and any function x(t) for which x(ti ) = yi belongs to
all In . This impossible since In ↓ 0, and therefore we have µ(In ) → 0.
As an important special case we have the following theorem on ran-
dom sequences.
Theorem 45. Suppose that for every N we have joint distribution func-
tions F N (ξ1 , . . . , ξN ) in RN which are consistent in the sense of Theorem
44. Then a probability measure can be defined in the space of real se-
quences (x1 , x2 , . . .) in such a way that
P(xi ≤ ξi i = 1, 2, . . . , N)
= F N (ξ1 , . . . , ξN )
Corollary. If Fn (x) is a sequence of distribution functions, a probabil- 120
ity measure can be defined in the space of real sequence so that if In are
any open or closed intervals,
N
Y
P(xn ∈ In , n = 1, 2, . . . , N) = Fn (In ).
n=1
countable set. We may also use the notation w for a sequence and xn (w)
for its nth term.
Theorem 46 (The 0 or 1 principle: Borel, Kolmogoroff). The proba-
bility that a random sequence of independent variables have a property
(e.g. convergence) which is not affected by changes in the values of any
finite number of its terms is equal to 0 to 1.
Proof. Let E be the set of sequences having the given property, so that
our hypothesis is that, for every N ≥ 1,
E = X1 x X2 x . . . x Xn x E N
Proof. Let
N
X
E =∈ [T N ≥∈] = Ek
k=1
where Ek =∈ [| sk |≥∈, T k−1 <∈]
N R
σ2ν = s2N dµ, since the
P
It is plain that the Ek are disjoint . Moreover
ν=1 Ω
xν are independent,
Z N Z
X
≥ s2N dµ = s2N dµ
E k=1 E
K
N Z
X
= (sk + xk+1 + ... + xN )2 dµ
k=1 E
K
N Z
X N
X N
X
= s2k dµ + µ(Ek ) σ2i
k=1 E k=1 i=k+1
K
as we require.
∞
X
Theorem 49. If xν are independent with means mν and σ2ν < ∞ then
P∞ ν=1
1 (xν − mν ) converges p.p.
and therefore
! ∞
1 X 2
P sup | sm+n − sm |>∈ ≤ σ
n≥1 ǫ 2 ν=m+1 ν
and this is enough to show that
lim sup | sm+n − sm |= 0p.p.
m→∞ n≥1
Therefore
∞
Y
1 = P(lim sup | xν |< c) = lim (1 − pν )
N→∞
ν=N
∞
Q ∞
P
by the independence of the xν . Hence (1 − pν ) so pν converge.
ν=1 ν=1
The convergence of the other two series follows from Theorem 50.
127 Conversely, suppose that the three series converge, so that, by Theo-
∞ ∞
rem 50, x′ν converges p.p. But it follows from the convergence of pν
P P
1 1
that xν = X′ν for sufficiently large ν and almost all series, and therefore
∞
P
xν also converges p.p.
1
13.. Random Sequences and Convergence Properties 109
n
P
Theorem 52. If x are independent, sn = xν converges if and only if
Q∞ ν=1
ν=1 ϕν (t) converges to a characteristic function.
We do not give the proof. For the proof see j.L.Doob, Stochastic
processes, pages 115, 116. If would seem natural to ask whether there
is a direct proof of Theorem 52 involving some relationship between T N
in Theorem 48 and the functions ϕν (t). This might simplify the whole
theory.
Stated differently, Theorem 52 reads as follows:
Theorem 53. If xν are independent and the distribution functions of sn
converges to a distribution function, then sn converges p.p.
This is a converse of Theorem 47.
Theorem 54 (The strong law of large numbers). If xν are independent,
with zero means and standard deviations σν , and if
n
X σ2 ν
<∞
ν=1
ν2
n
1X
then lim xν = 0 p.p.
n→∞ n
ν=1
xν
Proof. Let yν = , so that yν has standard deviation σγ /ν. It follows 128
ν P P
then from Theorem 49 that yν = (xν /ν) converges p.p.
Pν
If we write xν = x j / j,
j=1
xν = νXν − νXν=1 ,
n n
1 X 1X
xν = (νXν − νXν=1 )
2 ν=1
n ν=1
n
1X
= Xn − Xν−1
n ν−1
= 0(1)
∞
P
If the series xν does not converge p.p. it is possible to get results
1
n
P
about the order of magnitude of the partial sums sn = xν . The basic
1
result is the famous law of the iterated logarithm of Khintchine.
15. L2 -Processes
Theorem 57. If r(s, t) is the auto correlation function of an L2 function,
r(s, t) = r(t, s)
The first part is trivial and the second part follows from the identity
X
r(ti , t j )zi z j = E( |(x(ti ) − m(ti ))zi | 2 ) ≥ 0.
i, j
132
The uniqueness follows from the fact that a Gaussian process is de-
termined by its first and second order moments given r(s, t). Hence, if
we are concerned only with properties depending on r(s, t) and m(t) we
may suppose that all our processes are Gaussian.
Theorem 59. In order that a centred L2 -process should be orthogonal,
it is necessary and sufficient that
We write
dF = E(|dx|2 )
and for stationary functions,
σ2 dt = E(|dx|2 ).
We say that an L2 - function is continuous at t if
lim E(|x(t + h) − x(t)|2 ) = 0,
h−→0
and that it is continuous if it is continuous for all t. Note that this does
not imply that the individual x(t) are continuous at t.
P
We say that x(t) is R-integrable in a ≤ t ≤ b if i x(ti )δi tends to
135 a limit in L2 for any sequence of sub-divisions of (a, b) into intervals
of lengths δi containing points ti respectively. The limit is denoted by
Rb
a
x(t)dt.
Z∞
x(t) = eiλt dZ(λ)
−∞
E(| dZ |2 ) = dS
E(| dZ |)2 = dS ,
can be defined and it is easy to show that y(t) has spectral representation
Z∞ Z∞
y(t) = eiλt K(λ)dZ(λ) = eiλt dz1 (λ)
−∞ −∞
Z∞
where K(λ) = eiλs dk(s), E( | dZ1 (λ) |2 ) = (K(λ))2 dS (λ).
−∞
138
If k(s) = 0 for s < τ, τ > 0 we have
Z ∞
y(t + τ) = x(t − s)dk(s − τ)
0
which depends only on the ”part” of the function x(t) ”before time t”.
The linear prediction problem (Wiener) is to determine k(s) so as to
minimise (in some sense) the difference between y(t) and x(t). In so
far as this difference can be made small, we can regard y(t + τ) as a
prediction of the value of x(s) at time t + τ based on our knowledge of
its behaviour before t.
140 exists for almost all ω. In this case, however, the memorability condition
on f (T λ ω) may be dispensed with.
Z Zλ 2
1 f (T λ
ω)dλ − f ∗
(ω) dω → 0 as ∧ → ∞.
Λ
Ω
0
For proofs see Doob page 465, 515 or P.R. Halmos, Lectures on
Ergodic Theory, The math.Soc. of Japan, pages 16,18. The simplest
proof is due to F.Riesz (Comm. Math.Helv. 17 (1945)221-239).
Theorems 68 is much than Theorem 69.
Corollary. If a is real
ZΛ
1
lim x(ω, t)eiat dt = x∗ (ω, a)
Λ→∞ Λ
0
for almost all ω. In other words almost all functions have limiting “time
averages” equal to the mean of the values of the function at any fixed
time.
x(t3 ) − x(t1 ) is the convolution of those of x(t3 ) − x(t1 ) and x(t3 ) − x(t2 ).
We are generally interested only in the increments, and it is convenient
to consider the behaviour of the function from some base point, say 141
0, modify the function by subtracting a random variable so as to make
x(0) = 0, x(t) = x(t) − x(0). Then if Ft1 ,t2 is the distribution function for
the increment x(t2 ) − x(t1 ) we have,
We get the stationary case if Ft1 ,t2 (x) depends only on t2 − t1 . (This
by itself is not enough, but together with independence, the condition is
sufficient for stationary) If we put
so that ϕt (u) = etψ(u) for some ψ(u), which must have the K − L form
which is seen by putting t = 1 and using Theorem 37.
Conversely, we have also the
Theorem 72. Any function ϕt (u) of this form is the characteristic func-
tion of a stationary random function with independent increments.
Proof. We observe that the conditions on Ft (x) gives us a system of joint
distributions over finite sets of points ti which is consistent in the sense
of Theorem 44 and the random function defined by the Kolmogoroff
measure in Theorem 44 has the required properties.
Theorem 73. Almost all functions x(t) defined by the Kolmogoroff mea-
sure defined by the Wiener function, or any extension of it, are every-
where non-differentiable. In fact, almost all functions fail to satisfy a
Lipschitz condition of order α(x(t − h) − x(t) = 0(| h |α )) if α > 21 and
are not of bounded variation.
Theorem 74. The Kolmogoroff measure defined by the Wiener function
can be extended so that almost all functions x(t) Satisfy a Lipschitz con-
dition of any order α < 21 at every point, and are therefore continuous
at every point.
For proofs, see Doob, pages 392-396 and for the notion of extension,
pages 50-71.
Theorem 75. The K-measure defined by the Poisson function can be
extended so that almost functions x(t) are step functions with a finite
number of positive integral value in any finite interval.
The probability that x(t) will be constant in an interval of length t is
e−ct .
122 2. Probability