Probability Theory I: CAM 384K Concepts


Rhys Ulerich, CAM 384K concepts (updated February 12, 2009)

28 Aug
F ⊂ 2^Ω is a σ-algebra iff
(i) F ≠ ∅
(ii) A ∈ F =⇒ A′ = Ω − A ∈ F
(iii) Ai ∈ F ∀i ∈ ℕ =⇒ ∪_{i=1}^∞ Ai ∈ F
Consequently ∅, Ω ∈ F, and F is closed under countable intersection.
Ω ≠ ∅ is the sample space. F ⊂ 2^Ω is a σ-algebra of events. A ∈ F is an event.
(Ω, F) is a measurable space.
µ : F → [0, ∞] is a measure iff
(i) µ(A) ≥ µ(∅) = 0
(ii) Ai ∈ F countable, Ai disjoint =⇒ µ(∪_i Ai) = Σ_i µ(Ai)
If  is a measure where (Ω) = 1 then  : F → [0, 1] is a probability measure. It obeys


(monotonicity) A ⊂ B =⇒ (B) − (A) = (B − A) ≥ 0
[ X
(subadditivity) Ai ∈ F, A ⊂ Ai =⇒ (A) ≤ (Ai )
(continuity from below) Ai % A =⇒ (Ai ) % (A)
(continuity from above) Ai & A =⇒ (Ai ) & (A)
(Ω, F, ) is a probability space.
2 Sept
If A ⊂ 2^Ω then A generates σ(A) ≔ ∩ {F : F a σ-algebra, A ⊂ F},
using that Fι ⊂ 2^Ω, ι ∈ I σ-algebras =⇒ ∩_{ι∈I} Fι is a σ-algebra.
Given (Ω1, F1, ℙ1), . . . , (Ωn, Fn, ℙn) define the product probability space (Ω, F, ℙ) where
Ω ≔ Ω1 × · · · × Ωn = {(ω1, . . . , ωn) : ωi ∈ Ωi}
F ≔ F1 × · · · × Fn = σ({A1 × · · · × An : Ai ∈ Fi})
ℙ ≔ ℙ1 × · · · × ℙn where ℙ(A1 × · · · × An) = ℙ1(A1) · · · ℙn(An)
ℙ exists by Carathéodory's extension theorem. ℙ is unique by the π-λ theorem.
P ⊂ 2^Ω is a π-system iff
(i) P ≠ ∅
(ii) A, B ∈ P =⇒ A ∩ B ∈ P

L ⊂ 2^Ω is a λ-system iff
(i) Ω ∈ L
(ii) A, B ∈ L, A ⊂ B =⇒ B − A ∈ L
(iii) Ai ∈ L, Ai ↑ A =⇒ A ∈ L
Generally, λ-systems are not σ-algebras.
A λ-system which is additionally closed under intersection is a σ-algebra.
If A ⊂ 2^Ω then A generates ℓ(A) ≔ ∩ {L : L a λ-system, A ⊂ L},
using that Lι ⊂ 2^Ω, ι ∈ I λ-systems =⇒ ∩_{ι∈I} Lι is a λ-system.

(π-λ theorem) P ⊂ L =⇒ σ(P) ⊂ ℓ(P) ⊂ L.


The Borel sets on ℝ are ℛ ≔ σ(open sets in ℝ) = σ({(a, b] : a < b}) = σ({(a, b) : a < b}).

The Borel sets on ℝⁿ are ℛⁿ ≔ σ(open sets in ℝⁿ) = ℛ × · · · × ℛ.


For a measure µ : ℛ → [0, ∞], assume ∃F : ℝ → ℝ such that µ((a, b]) = F(b) − F(a).
Such an F must be nondecreasing and right continuous.
(Lebesgue-Stieltjes)
F : ℝ → ℝ nondecreasing, right-continuous =⇒ ∃!µ : ℛ → [0, ∞] where µ((a, b]) = F(b) − F(a).
(Lebesgue measure) The unique measure λ : ℛ → [0, ∞] where λ((a, b]) = b − a.
4 Sept
{X ∈ B} ≔ X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B}.
A random variable is a function X : (Ω, F) → ℝ such that {X ∈ B} ∈ F ∀B ∈ ℛ.
Define the probability measure µX : ℛ → [0, 1], called the distribution of X, by µX(B) = ℙ(X ∈ B).
(ℝ, ℛ, µX) is a probability space.
The distribution function of X is FX : ℝ → [0, 1] where FX(x) ≔ ℙ(X ≤ x) = µX((−∞, x]).
Every FX satisfies
(i) x ≤ y =⇒ F(x) ≤ F(y)
(ii) xn ↓ x =⇒ F(x) = lim_n F(xn) = F(x+)
(iii) ℙ(X = x) = F(x) − F(x−)
(iv) ℙ(X < x) = F(x−)
(v) F(−∞) ≔ lim_{x→−∞} F(x) = 0
(vi) F(+∞) ≔ lim_{x→+∞} F(x) = 1
Properties (i), (ii), (v), and (vi) characterize a distribution.
Given F : ℝ → ℝ obeying (i), (ii), (v), and (vi), there exists a random variable X such that FX = F.
F⁻¹left(y) ≔ sup {x ∈ ℝ : F(x) < y} is the left continuous inverse of a distribution F.
F⁻¹right(y) ≔ sup {x ∈ ℝ : F(x) ≤ y} is the right continuous inverse of a distribution F.
Generally, F⁻¹(y) ≔ F⁻¹left(y) is used.
FX continuous =⇒ FX(X) is uniformly distributed on (0, 1).
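A quick numerical sketch (my own; NumPy assumed): sampling via the generalized inverse F⁻¹(U) for U uniform on (0, 1), then checking the probability integral transform FX(X) ~ Uniform(0, 1), with Exp(1) as the example.

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.uniform(size=100_000)

    # Exp(1): F(x) = 1 - exp(-x), so F^{-1}(u) = -log(1 - u)
    x = -np.log(1.0 - u)            # X = F^{-1}(U) has distribution function F
    print(x.mean(), x.var())        # both near 1, as expected for Exp(1)

    # F continuous  =>  F(X) uniform on (0, 1)
    v = 1.0 - np.exp(-x)
    hist, _ = np.histogram(v, bins=10, range=(0.0, 1.0))
    print(hist / v.size)            # every bin near 0.10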
9 Sept
Random variables X and Y are equal in distribution, denoted X =d Y, when FX = FY.


Define the indicator function 1A(x) = 1 if x ∈ A, and 1A(x) = 0 if x ∉ A.
X : (Ω, F) → (S, 𝒮) is measurable if {X ∈ B} ∈ F ∀B ∈ 𝒮.
X measurable wrt F is also written X ∈ F.
For X : (Ω, F) → (S, 𝒮 = σ(A)), if {X ∈ A} ∈ F ∀A ∈ A then X is measurable.
For X : (Ω, F) → (S, 𝒮) and f : (S, 𝒮) → (T, 𝒯), if X, f measurable then f(X) is measurable.
X1, X2, . . . , Xn random variables, f : (ℝⁿ, ℛⁿ) → (ℝ, ℛ) measurable =⇒ f(X1, . . . , Xn) is a random variable.
For example, X1 + · · · + Xn is a random variable.

X1 , . . . , Xn , . . . random variables =⇒ inf Xn , sup Xn , lim inf Xn , and lim sup Xn are all random variables.
 
The extended real numbers are ’ B ’ ∪ {−∞, +∞}. The extended Borel sets are R B σ ’ .
An extended random variable is a function X : (Ω, F) → ’ such that {X ∈ B} ∈ F ∀B ∈ R.
All random variables are automatically extended random variables.

Property p(ω) is satisfied almost everywhere (a. e.) when µ({ω ∈ Ω : ¬p(ω)}) = 0.
Property p(ω) is satisfied almost surely (a. s.) when ℙ({ω ∈ Ω : ¬p(ω)}) = 0.

X = Y almost surely when ℙ(X ≠ Y) = ℙ({X − Y ≠ 0}) = 0.


For X : (Ω, F) → (S, 𝒮) measurable, σ(X) ≔ {X⁻¹(B) : B ∈ 𝒮} is the σ-algebra generated by X.
A generates 𝒮 =⇒ X⁻¹(A) generates σ(X).

Y : Ω → ℝ, Y measurable wrt σ(X) =⇒ ∃f : (ℝ, ℛ) → (ℝ, ℛ) measurable such that Y = f(X).


11 Sept
(Ω, F, µ) is σ-finite if ∃An ∈ F, An ↑ Ω : µ(An) < ∞.
16 Sept

f : Ω → ℝ measurable is a simple function if f = Σ_i ai 1Ai where ai ∈ ℝ, Ai ∈ F disjoint, µ(Ai) < ∞.
There are four incremental stages in the development of the Lebesgue integral:
(1. f simple) ∫ f dµ ≔ Σ_i ai µ(Ai)
(2. f bounded, finite support) ∫ f dµ ≔ sup {∫ ϕ dµ : ϕ ≤ f, ϕ simple} = inf {∫ ϕ dµ : ϕ ≥ f, ϕ simple}
(3. f ≥ 0) ∫ f dµ ≔ sup {∫ h dµ : 0 ≤ h ≤ f, h bounded with finite support}
(4. f = f⁺ − f⁻) ∫ f dµ ≔ ∫ f⁺ dµ − ∫ f⁻ dµ
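A small numeric sketch (my own, not from the notes; NumPy assumed): stage (3) made concrete by approximating ∫₀^∞ e⁻ˣ dλ = 1 from below with the truncated dyadic simple functions ϕm = 2⁻ᵐ⌊2ᵐ(f ∧ m)⌋, which increase to f.

    import numpy as np

    f = lambda x: np.exp(-x)                 # target: integral over [0, ∞) equals 1
    grid = np.linspace(0.0, 40.0, 400_001)   # fine grid standing in for Lebesgue measure
    dx = grid[1] - grid[0]
    fx = f(grid)

    for m in (1, 2, 4, 8):
        phi = np.floor(np.minimum(fx, m) * 2**m) / 2**m  # simple function below f
        print(m, phi.sum() * dx)             # increases toward 1 as m grows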

For f ≥ 0 we get the continuity result ∫ (f ∧ n) 1An dµ ↑ ∫ f dµ using the An underlying the σ-finite space.
For f = f⁺ − f⁻, we say f is integrable only when ∫ f⁺ dµ, ∫ f⁻ dµ < ∞.
Note |f| = f⁺ + f⁻ and ∫ |f| dµ < ∞ ⇐⇒ ∫ f⁺ dµ, ∫ f⁻ dµ < ∞.

The Lebesgue integral has the following properties:


(i) f ≥ 0 a. e. =⇒ ∫ f dµ ≥ 0
(ii) ∫ af dµ = a ∫ f dµ
(iii) ∫ f + g dµ = ∫ f dµ + ∫ g dµ
(iv) f ≤ g a. e. =⇒ ∫ f dµ ≤ ∫ g dµ
(v) f = g a. e. =⇒ ∫ f dµ = ∫ g dµ
(vi) |∫ f dµ| ≤ ∫ |f| dµ


Properties (i)–(iii) must be verified at each stage of the integral's development.


Properties (iv)–(vi) follow directly from (i)–(iii).
In integration, “∞ · 0 ≔ 0”, i.e. µ({f ≠ 0}) = 0 =⇒ ∫ f dµ = 0.

ℒ0(Ω, F, µ) ≔ {f : Ω → ℝ measurable}. L0(Ω, F, µ) ≔ ℒ0(Ω, F, µ)/∼ where f ∼ g ⇐⇒ f = g a. e.
ℒ1(Ω, F, µ) ≔ {f : Ω → ℝ integrable}. L1(Ω, F, µ) ≔ ℒ1(Ω, F, µ)/∼.
ℒp(Ω, F, µ) ≔ {f : ∫ |f|^p dµ < ∞}. Lp(Ω, F, µ) ≔ ℒp(Ω, F, µ)/∼.
ℒ∞(Ω, F, µ) ≔ {f : ||f||∞ < ∞} where ||f||∞ ≔ ess sup f ≔ inf {M ∈ ℝ : µ({f > M}) = 0}.
L∞(Ω, F, µ) ≔ ℒ∞(Ω, F, µ)/∼.
Always L∞, L1 ⊂ L0.
µ(Ω) < ∞ =⇒ L∞ ⊂ · · · ⊂ L2 ⊂ L1. This inclusion always holds for a probability measure.

(Hölder's inequality) p, q ∈ [1, ∞] such that 1/p + 1/q = 1 =⇒ ∫ |fg| dµ ≤ ||f||p ||g||q.

(Minkowski's inequality) p ∈ [1, ∞] =⇒ ||f + g||p ≤ ||f||p + ||g||p.


Xn converges almost surely (everywhere) to X, denoted Xn →a.s. X, when µ({ω ∈ Ω : Xn(ω) ↛ X(ω)}) = 0.
Xn converges in measure (probability) to X, denoted Xn →µ X, when µ(|Xn − X| ≥ ε) → 0 ∀ε > 0.
Xn →a.s. X =⇒ Xn →µ X whenever µ(Ω) < ∞.
18 Sept
There are four major theorems regarding interchanging limits and integration:
(bounded convergence) |fn|, |f| ≤ M ∈ ℝ, fn →µ f, µ(Ω) < ∞ =⇒ ∫ fn dµ −→ ∫ f dµ
(monotone convergence) (Ω, F, µ) σ-finite, fn ≥ 0, fn ↑ f a. e. =⇒ ∫ fn dµ ↑ ∫ f dµ
(Fatou's Lemma) fn ≥ 0 measurable =⇒ ∫ lim inf fn dµ ≤ lim inf ∫ fn dµ; if additionally fn ≤ g ∈ L1 then lim sup ∫ fn dµ ≤ ∫ lim sup fn dµ
(dominated convergence) fn measurable, fn →a.s. f, |fn| ≤ g for g ∈ L1 =⇒ ∫ fn dµ −→ ∫ f dµ, f ∈ L1

If ∃f : ℝ → [0, ∞) such that ℙ(X ∈ B) = ∫_B f(x) dx then f is the density function of X, denoted fX.
By definition FX(x) = ∫_{−∞}^x fX(y) dy and ∫_ℝ fX(y) dy = 1.
The expected value of X on (Ω, F, ℙ) is 𝔼[X] ≔ ∫ X dℙ.
𝔼[X] is also called the mean and denoted µX or µ.
On a discrete space, 𝔼[X] = Σ_i ωi ℙ(X = ωi).
ℙ(X ∈ A) = 𝔼[1A].
(transport formula) X : (Ω, F, ℙ) → (S, 𝒮), g : (S, 𝒮) → (ℝ, ℛ) measurable, g ≥ 0 or g bounded =⇒
𝔼[g(X)] = ∫_Ω g(X) dℙ = ∫_ℝ g(x) dµX(x) = ∫_ℝ g(x) dFX(x) = ∫_ℝ g(x) fX(x) dx.
The transport formula implies g(X) ∈ L1(Ω, F, ℙ) ⇐⇒ g ∈ L1(ℝ, ℛ, µX).
𝔼[X^k] for k ∈ ℕ is called the kth moment of X.
𝔼[(X − 𝔼[X])^k] = 𝔼[(X − µ)^k] for k ∈ ℕ is called the kth centered moment of X.

The variance of X, denoted var X, is the second centered moment.
Using the definition, var X = 𝔼[X²] − µ². Always var X ≤ 𝔼[X²] and var(aX + b) = a² var X.
σX ≔ √(var X) is the standard deviation of X.
 
(Markov's inequality) ϕ : ℝ → [0, ∞), A ∈ ℛ =⇒ (inf_{y∈A} ϕ(y)) ℙ(X ∈ A) ≤ 𝔼[ϕ(X)1{X∈A}] ≤ 𝔼[ϕ(X)].
(Chebyshev's inequality) Markov with A = {x ∈ ℝ : |x| ≥ a} =⇒ ℙ(|X| ≥ a) ≤ 𝔼[ϕ(|X|)1{|X|≥a}]/ϕ(a) ≤ 𝔼[ϕ(|X|)]/ϕ(a).
Moments are often used for ϕ, e.g. ℙ(|X| ≥ a) ≤ 𝔼[X²]/a² and ℙ(|X − µ| ≥ a) ≤ var X / a².
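A Monte Carlo sanity check (my own sketch; NumPy assumed) of Chebyshev's bound ℙ(|X − µ| ≥ a) ≤ var X / a² for a standard normal X:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(1_000_000)      # µ = 0, var X = 1
    for a in (1.0, 2.0, 3.0):
        print(a, np.mean(np.abs(x) >= a), "<=", 1.0 / a**2)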
22 Sept
ϕ : ℝ → ℝ is convex if ϕ(λx + (1 − λ)y) ≤ λϕ(x) + (1 − λ)ϕ(y) whenever λ ∈ [0, 1].
In words, “ϕ of a weighted average ≤ weighted average of ϕ.”

ϕ ∈ C¹([a, b]) convex ⇐⇒ ϕ(x) + ϕ′(x)(y − x) ≤ ϕ(y) ∀x, y ∈ [a, b].
ϕ ∈ C²([a, b]) convex ⇐⇒ ϕ″(x) ≥ 0 ∀x ∈ [a, b].

(Jensen's inequality) ϕ convex and 𝔼[|X|], 𝔼[|ϕ(X)|] < ∞ =⇒ ϕ(𝔼[X]) ≤ 𝔼[ϕ(X)].
Moments are often used for ϕ, e.g. |𝔼[X]| ≤ 𝔼[|X|] and 𝔼[X]² ≤ 𝔼[X²].

The conditional probability that event A occurs given event B is ℙ(A|B) = ℙ(A ∩ B)/ℙ(B) provided ℙ(B) ≠ 0.

Independence of two objects, denoted using infix ⫫, is defined as follows:

(Events A, B) A ⫫ B ⇐⇒ ℙ(A ∩ B) = ℙ(A) ℙ(B)
(Random variables X, Y) X ⫫ Y ⇐⇒ ℙ(X ∈ C, Y ∈ D) = ℙ(X ∈ C) ℙ(Y ∈ D) ∀C, D ∈ ℛ
(σ-algebras G1, G2) G1 ⫫ G2 ⇐⇒ A ⫫ B ∀A ∈ G1 ∀B ∈ G2

A ⫫ B =⇒ 𝔼[1A 1B] = 𝔼[1A] 𝔼[1B].
A ⫫ B =⇒ ℙ(A|B) = ℙ(A).
A ⫫ B =⇒ A′ ⫫ B, A ⫫ B′, A′ ⫫ B′.
A ⫫ B ⇐⇒ 1A ⫫ 1B.
X ⫫ Y ⇐⇒ σ(X) ⫫ σ(Y).

Independence of a finite collection requires these logical extensions of the above to hold:
(σ-algebras G1, . . . , Gn) ℙ(∩_{i=1}^n Ai) = Π_{i=1}^n ℙ(Ai) for Ai ∈ Gi
(Random variables X1, . . . , Xn) ℙ(∩_{i=1}^n {Xi ∈ Ci}) = Π_{i=1}^n ℙ(Xi ∈ Ci) for Ci ∈ ℛ
(Events A1, . . . , An) ℙ(∩_{i∈I} Ai) = Π_{i∈I} ℙ(Ai) whenever I ⊂ {1, . . . , n}
(Classes of events A1, . . . , An) ℙ(∩_{i∈I} Ai) = Π_{i∈I} ℙ(Ai) for Ai ∈ Ai whenever I ⊂ {1, . . . , n}
Independence of an infinite collection requires that every finite subcollection be independent.
Pairwise independence of a collection's elements does not imply the collection is independent; the sketch below gives the classic counterexample.
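A simulation sketch (mine; NumPy assumed) of the classic counterexample: X, Y fair coin flips and Z = X XOR Y are pairwise independent, yet (X, Y, Z) is not independent.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000_000
    x = rng.integers(0, 2, size=n)
    y = rng.integers(0, 2, size=n)
    z = x ^ y

    # Pairwise: P(X=1, Z=1) ≈ P(X=1) P(Z=1) = 1/4
    print(np.mean((x == 1) & (z == 1)), np.mean(x == 1) * np.mean(z == 1))
    # Jointly: P(X=1, Y=1, Z=1) = 0, not 1/8
    print(np.mean((x == 1) & (y == 1) & (z == 1)))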

A1 , . . . , An independent π-systems =⇒ σ(A1 ) , . . . , σ(An ) independent.

ℙ(X1 ≤ x1, . . . , Xn ≤ xn) = Π_{i=1}^n ℙ(Xi ≤ xi) ∀xi =⇒ X1, . . . , Xn independent.
25 Sept

Independence of objects derived from independent triangular arrays:
(i) independent σ-algebras Fi,j (rows i = 1, . . . , n, entries j = 1, . . . , m(i))
    =⇒ Gi ≔ σ(∪_{j=1}^{m(i)} Fi,j), i = 1, . . . , n are independent
(ii) independent random variables Xi,j arranged the same way
    =⇒ fi(Xi,1, . . . , Xi,m(i)), i = 1, . . . , n are independent

Given two random variables

X : (Ω, F, ℙ) → (S1, 𝒮1) with distribution µX : 𝒮1 → [0, 1]
Y : (Ω, F, ℙ) → (S2, 𝒮2) with distribution µY : 𝒮2 → [0, 1]
we have the joint random variable
(X, Y) : (Ω, F, ℙ) → (S1 × S2, 𝒮1 × 𝒮2) with µX,Y : 𝒮1 × 𝒮2 → [0, 1]
where the joint distribution µX,Y is determined by its values µX,Y(A × B) = ℙ(X ∈ A, Y ∈ B) on rectangles A ∈ 𝒮1, B ∈ 𝒮2.

If X1, . . . , Xn are random variables on (Ω, F, ℙ) then X1, . . . , Xn independent ⇐⇒ µX1,...,Xn = Π_{i=1}^n µXi.

(Fubini's theorem) σ-finite (S1, 𝒮1, µ1), (S2, 𝒮2, µ2) and either f ≥ 0 or ∫ |f| d(µ1 × µ2) < ∞ =⇒
∫_{S1} [∫_{S2} f(u1, u2) dµ2] dµ1 = ∫_{S1×S2} f(u1, u2) d(µ1 × µ2) = ∫_{S2} [∫_{S1} f(u1, u2) dµ1] dµ2.

X1, . . . , Xn with densities f1, . . . , fn are independent ⇐⇒ (X1, . . . , Xn) has density f(x1, . . . , xn) = f1(x1) · · · fn(xn).

X ⫫ Y, h : ℝ² → ℝ measurable =⇒ 𝔼[h(X, Y)] = ∫_ℝ ∫_ℝ h(x, y) dµX dµY using Fubini.
X ⫫ Y =⇒ 𝔼[f(X)g(Y)] = 𝔼[f(X)] 𝔼[g(Y)].
X1, . . . , Xn independent =⇒ 𝔼[X1 · · · Xn] = 𝔼[X1] · · · 𝔼[Xn].
27 Sept
For X ⫫ Y, the distribution of the sum Z = X + Y is
FZ(z) = ℙ(X + Y ≤ z) = 𝔼[1{X+Y≤z}] = 𝔼[1{X≤z−Y}] = ∫_ℝ FX(z − y) dFY(y) = (FX ∗ FY)(z), using Fubini.
If X has density fX then fZ(z) = ∫_ℝ fX(z − y) dFY(y) since
FZ(z) = 𝔼[1{X≤z−Y}] = ∫_ℝ (∫_{−∞}^{z−y} fX(x) dx) dFY(y) = ∫_{−∞}^z (∫_ℝ fX(x − y) dFY(y)) dx.
Additionally, if Y has density fY then fZ(z) = (fX ∗ fY)(z).
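A numeric sketch (mine; NumPy assumed) of fZ = fX ∗ fY for independent X, Y ~ Uniform(0, 1): the convolution is the triangular density on [0, 2], matching a histogram of X + Y.

    import numpy as np

    rng = np.random.default_rng(3)
    z = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)

    f = lambda t: ((t >= 0.0) & (t <= 1.0)).astype(float)   # Uniform(0,1) density
    ds = 0.01
    s = np.arange(0.0, 2.0 + ds, ds)
    conv = np.array([np.sum(f(si - s) * f(s)) * ds for si in s])  # (f ∗ f)(si)

    hist, edges = np.histogram(z, bins=40, range=(0.0, 2.0), density=True)
    mids = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - np.interp(mids, s, conv))))  # small discretization error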

cov(X, Y) ≔ 𝔼[(X − µX)(Y − µY)] = 𝔼[XY] − 𝔼[X] 𝔼[Y] is the covariance of X, Y ∈ L2.
ρ(X, Y) ≔ cov(X, Y)/(√(var X) √(var Y)) = cov(X, Y)/(σX σY) is the correlation coefficient of X, Y.

X, Y are uncorrelated if cov(X, Y) = ρ(X, Y) = 0.


X ⫫ Y =⇒ X, Y uncorrelated, but X, Y uncorrelated =⇏ X ⫫ Y.

uncorrelated X1 , . . . , Xn ∈ L2 =⇒ var (X1 + · · · + Xn ) = var X1 + · · · + var Xn .


For 1 ≤ p < ∞, Xn converges in Lp to X, denoted Xn →Lp X, when ||Xn − X||Lp^p = 𝔼[|Xn − X|^p] −→ 0.
For p = ∞, Xn converges in L∞, denoted Xn →L∞ X, when ||Xn − X||L∞ = ess sup |Xn − X| −→ 0.
Xn →Lq X =⇒ Xn →Lp X using that µ(Ω) < ∞, 1 ≤ p ≤ q ≤ ∞ =⇒ ||·||Lp ≤ ||·||Lq.
Xn →Lp X =⇒ Xn →ℙ X using Chebyshev's inequality.
(L2-WLLN) uncorrelated Xi ∈ L2, 𝔼[Xi] = µ, var Xi ≤ C < ∞, and Sn ≔ X1 + · · · + Xn =⇒ Sn/n →L2 µ.
2 Oct
(popular WLLN) Xi ∈ L2 independent and identically distributed (i. i. d.) with mean µ =⇒ Sn/n →L2 µ.

(WLLN for triangular arrays) Sn = Xn,1 + · · · + Xn,n and var Sn / bn² −→ 0 for some bn =⇒ (Sn − 𝔼[Sn])/bn →L2 0.
7 Oct
A random variable X with large tails can be truncated outside a threshold M, i.e. X̄ ≔ X 1{|X|≤M}.

(WLLN for triangular arrays with independent rows) Construct bn > 0, bn ↑ ∞ such that both
Σ_{k=1}^n ℙ(|Xn,k| ≥ bn) −→ 0 and Σ_{k=1}^n 𝔼[Xn,k² 1{|Xn,k|≤bn}] / bn² −→ 0.
Define an = Σ_{k=1}^n 𝔼[Xn,k 1{|Xn,k|≤bn}] and Sn = Xn,1 + · · · + Xn,n. Under these conditions (Sn − an)/bn →ℙ 0.
X ≥ 0, f : [0, ∞] → [0, ∞] increasing, f ∈ C¹, and f(0) = 0 =⇒ 𝔼[f(X)] = ∫_0^∞ f′(x) ℙ(X ≥ x) dx.
In particular, 𝔼[X^p] = ∫_0^∞ p x^{p−1} ℙ(X > x) dx allows estimating moments using tails.
For discrete N : Ω → ℕ ∪ {∞}, 𝔼[N] = Σ_{n=1}^∞ ℙ(N ≥ n).
Use p = 1 − ε to show x ℙ(|X| > x) −→ 0 =⇒ 𝔼[|X|^{1−ε}] < ∞ for ε > 0.

(General WLLN) X1, . . . , Xn, . . . i. i. d., x ℙ(|X| > x) −→ 0 =⇒ Sn/n − µn →ℙ 0 where µn = 𝔼[X 1{|X|≤n}].

X ∈ L1 =⇒ x ℙ(|X| > x) −→ 0 since x ℙ(|X| > x) = 𝔼[x 1{|X|>x}], x 1{|X|>x} ≤ |X|, and x 1{|X|>x} → 0 as x → ∞.

(L1-WLLN) Xi ∈ L1 i. i. d. and 𝔼[Xi] = µ =⇒ Sn/n →ℙ µ.
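A simulation sketch (mine; NumPy assumed) of the weak law: sample means Sn/n of i. i. d. Exp(1) variables concentrate at µ = 1 as n grows.

    import numpy as np

    rng = np.random.default_rng(4)
    for n in (10, 100, 10_000):
        sn = rng.exponential(scale=1.0, size=(1_000, n)).mean(axis=1)  # 1000 copies of Sn/n
        print(n, np.mean(np.abs(sn - 1.0) >= 0.1))   # P(|Sn/n - µ| >= 0.1) → 0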
9 Oct
Define lim inf An ⊂ lim sup An for sequences of subsets of Ω:
lim inf An ≔ ∪_n ∩_{l≥n} Al = lim_{n→∞} ∩_{l≥n} Al = {ω that are in all but finitely many An's}
lim sup An ≔ ∩_n ∪_{l≥n} Al = lim_{n→∞} ∪_{l≥n} Al = {ω that are in infinitely many An's}

lim sup An is read “An infinitely often” (i. o.), i.e. ℙ(An i. o.) ≔ ℙ(lim sup An).
Xn →a.s. X ⇐⇒ ∀ε > 0 ℙ({|Xn − X| ≥ ε} i. o.) = 0, using that {Xn ↛ X} = ∪_{ε>0} ∩_n ∪_{l≥n} {|Xl − X| ≥ ε}.

(Borel-Cantelli 1) Σ_{n=1}^∞ ℙ(An) < ∞ =⇒ ℙ(An i. o.) = 0.

Xn converges fast to X, denoted Xn →fast X, if ∀ε > 0 Σ_{n=1}^∞ ℙ({|Xn − X| ≥ ε}) < ∞.
Xn →fast X =⇒ Xn →a.s. X by Borel-Cantelli.

(convergence of random variables) Using that for a topological space, yn −→ y iff every subsequence ynm has a further subsequence ynmk −→ y:
(i) Xn →fast X =⇒ (BC1) Xn →a.s. X =⇒ Xn →ℙ X
(ii) Xn →ℙ X =⇒ ∃ subsequence Xnk : Xnk →fast X
(iii) Xn →ℙ X ⇐⇒ every subsequence Xnm has a further subsequence Xnmk →a.s. X

There exist sequences that converge in probability but not almost surely.
Convergence in probability comes from a metric, but almost sure convergence does not come from any topology.
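The standard "typewriter" counterexample as a sketch (mine; NumPy assumed): on Ω = [0, 1) with Lebesgue measure, indicator blocks of shrinking width sweep across [0, 1), so Xn → 0 in probability, yet Xn(ω) = 1 infinitely often for every ω, so Xn does not converge almost surely.

    import numpy as np

    def X(n, omega):
        k = int(np.floor(np.log2(n)))      # write n = 2^k + j with 0 <= j < 2^k
        j = n - 2**k
        return 1.0 if j / 2**k <= omega < (j + 1) / 2**k else 0.0

    omega = 0.3
    hits = [n for n in range(1, 2**12) if X(n, omega) == 1.0]
    print(len(hits))   # 12: exactly one hit per block, hence infinitely many overall
    print([2.0 ** -int(np.floor(np.log2(n))) for n in (10, 100, 1000)])  # P(Xn = 1) → 0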
(L4-SLLN) Xi ∈ L4 i. i. d. with mean µ =⇒ Sn/n →a.s. µ.
14 Oct
(SLLN) Xi ∈ L1 i. i. d. with mean µ =⇒ Sn/n →a.s. µ.

(Borel-Cantelli 2) An independent and Σ_{n=1}^∞ ℙ(An) = ∞ =⇒ ℙ(An i. o.) = 1.

For independent An, the Borel-Cantelli lemmas impose a zero-one law forcing ℙ(An i. o.) to be either 0 or 1.
(Borel-Cantelli 2 extension) An independent and Σ_{n=1}^∞ ℙ(An) = ∞ =⇒ (1A1 + · · · + 1An)/(ℙ(A1) + · · · + ℙ(An)) →a.s. 1 as n → ∞.

Weak convergence or convergence in distribution, written with an infix ⇒, is defined as follows:

(Distribution functions) Fn ⇒ F ⇐⇒ Fn(x) −→ F(x) at each x where F is continuous.
(Probability measures) ℙn ⇒ ℙ ⇐⇒ the corresponding distribution functions Fn ⇒ F
(Random variables) Xn ⇒ X ⇐⇒ the distribution functions FXn ⇒ FX
Practically, Xn ⇒ X means ℙ(Xn ≤ x) −→ ℙ(X ≤ x) whenever ℙ(X = x) = 0.

Weak convergence is metrizable, that is Fn ⇒ F ⇐⇒ ρ(Fn, F) −→ 0
where ρ(F, G) ≔ inf {ε : F(x − ε) − ε ≤ G(x) ≤ F(x + ε) + ε ∀x} is the Lévy metric.
16 Oct
Fn ⇒ F as n → ∞ =⇒ ∃Xn, X such that FXn = Fn, FX = F, and Xn →a.s. X as n → ∞.

When taking expectations, results stated for almost sure convergence can instead use weak convergence via the representation above.

(characterization of weak convergence) The following are equivalent:

(i) Xn ⇒ X
(ii) 𝔼[g(Xn)] −→ 𝔼[g(X)] ∀g : ℝ → ℝ continuous, bounded
(iii) ℙ(X ∈ G) ≤ lim inf ℙ(Xn ∈ G) ∀G open
(iv) ℙ(X ∈ F) ≥ lim sup ℙ(Xn ∈ F) ∀F closed
(v) ℙ(Xn ∈ A) −→ ℙ(X ∈ A) if ℙ(X ∈ ∂A) = 0
Results (iii) and (iv) are lower and upper semicontinuity, respectively.

(continuous mapping theorem) Xn ⇒ X, ℙ(X ∈ {x ∈ ℝ : g is discontinuous at x}) = 0 =⇒ g(Xn) ⇒ g(X).

n ,  ∈ C ∗ (’), the dual of the space of continuous, bounded functions on ’.


n ⇒  is weak-∗ sequential convergence under the assumption (’) = 1.

(Helly's selection/compactness theorem)
Fn a sequence of distribution functions =⇒ ∃ subsequence Fnk and F right continuous, nondecreasing such that Fnk ⇒ F.
F is not necessarily a distribution function because mass may escape at ±∞.
21 Oct
P, a set of probability measures on ℝ, is tight if ∀ε > 0 ∃Mε ∈ ℝ+ such that µ([−Mε, Mε]) ≥ 1 − ε ∀µ ∈ P.
Equivalently P, a set of distribution functions, is tight if 1 − F(Mε) + F(−Mε) ≤ ε ∀F ∈ P.
Equivalently Pn, a countable set of distribution functions, is tight if lim sup_{n→∞} [1 − Fn(Mε) + Fn(−Mε)] ≤ ε.

Fn a tight sequence of distribution functions =⇒ ∃ subsequence Fnk and a distribution function F such that Fnk ⇒ F.

(Prokhorov's theorem) For P, a set of probability measures on ℝ, P is tight ⇐⇒ every sequence ℙn in P has a subsequence ℙnk ⇒ ℙ∞ for some probability measure ℙ∞.

For ϕ : ℝ → [0, ∞) such that lim_{|x|→∞} ϕ(x) = +∞, if ∫_ℝ ϕ(x) dF(x) ≤ C < ∞ ∀F ∈ P then P is tight.

For an integer-valued X, let an ≔ ℙ(X = n) where Σ_{n=0}^∞ an = 1.
Define the generating function g(x) = Σ_{n=0}^∞ an x^n = Σ_{n=0}^∞ x^n ℙ(X = n) = 𝔼[x^X].
Knowing g(x) is equivalent to knowing ℙ(X = n).


Every random variable X has a characteristic function ϕX(t) ≔ 𝔼[e^{itX}] = ∫_ℝ e^{itx} µX(dx):
(i) ϕ(0) = 𝔼[e^{i0X}] = 1
(ii) ϕ(−t) = 𝔼[cos(−tX)] + i𝔼[sin(−tX)] = 𝔼[cos(tX)] − i𝔼[sin(tX)] = ϕ(t)* (the complex conjugate)
(iii) |ϕ(t)| ≤ 𝔼[|e^{itX}|] = 𝔼[1] = 1
(iv) ϕaX+b(t) = 𝔼[e^{it(aX+b)}] = e^{itb} 𝔼[e^{itaX}] = e^{itb} ϕX(at)
(v) X ⫫ Y =⇒ ϕX+Y(t) = ϕX(t) ϕY(t)

FX = Σ_{i=1}^n λi FXi where Σ_{i=1}^n λi = 1 =⇒ ϕX(t) = Σ_{i=1}^n λi ϕXi(t).

Characteristic functions are uniformly continuous since dominated convergence implies
|ϕ(t + h) − ϕ(t)| ≤ 𝔼[|e^{i(t+h)X} − e^{itX}|] = 𝔼[|e^{itX}| |e^{ihX} − 1|] = 𝔼[|e^{ihX} − 1|] → 0 as h → 0.
28 Oct
|e^{ix} − e^{iy}| ≤ |x − y| for x, y ∈ ℝ since |e^{ix} − e^{iy}| = |∫_x^y (d/dt) e^{it} dt| = |∫_x^y i e^{it} dt| ≤ |y − x|.
𝔼[|X|^n] < ∞ =⇒ ϕ′, . . . , ϕ⁽ⁿ⁾ exist everywhere, are continuous, and (d^k/dt^k) ϕ(t) = 𝔼[(iX)^k e^{itX}] = ∫_{−∞}^∞ (ix)^k e^{itx} µX(dx).
𝔼[|X|^n] < ∞ =⇒ (d^k/dt^k) ϕ(0) = i^k 𝔼[X^k] for k = 0, . . . , n.

itb −eita
Uniform U(a, b) with density f(x) = 1/(b − a) has characteristic function ϕ(t) = (e^{itb} − e^{ita})/(it(b − a)).
Normal N(0, 1) with density f(x) = (1/√(2π)) exp(−x²/2) has characteristic function ϕ(t) = exp(−t²/2).
Normal N(µ, σ²) with density f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)) has characteristic function ϕ(t) = exp(itµ − σ²t²/2).
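A quick numeric sketch (mine; NumPy assumed) comparing the empirical characteristic function (1/n) Σ_k e^{itXk} with the exact ϕ(t) = exp(−t²/2) for X ~ N(0, 1):

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.standard_normal(200_000)
    for t in (0.5, 1.0, 2.0):
        print(t, np.mean(np.exp(1j * t * x)), np.exp(-t**2 / 2))  # imaginary part ≈ 0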

The inversion formula recovers a distribution from a characteristic function:
µ((a, b)) + ½ µ({a, b}) = (1/2π) ∫_{−∞}^∞ ((e^{−ita} − e^{−itb})/it) ϕ(t) dt
µ({a}) = lim_{T→∞} (1/2T) ∫_{−T}^T e^{−ita} ϕ(t) dt
In particular, the limits defining the improper integrals above always exist.

ϕ = ϕ′ =⇒ µ = µ′, i.e. the characteristic function determines the distribution.
∫_{−∞}^∞ |ϕX(t)| dt < ∞ =⇒ X has a continuous, bounded density function fX(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} ϕ(t) dt.
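A numerical inversion sketch (mine; NumPy assumed): ∫|ϕ| dt < ∞ holds for N(0, 1), so the density formula above can be evaluated directly with a Riemann sum.

    import numpy as np

    t = np.linspace(-40.0, 40.0, 200_001)
    dt = t[1] - t[0]
    phi = np.exp(-t**2 / 2)                       # chf of N(0, 1)
    for x in (0.0, 1.0, 2.0):
        f = (np.exp(-1j * t * x) * phi).sum().real * dt / (2 * np.pi)
        print(x, f, np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))  # recovers the N(0,1) density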
30 Oct
(continuity theorem) For a sequence of probability distributions µn and their characteristic functions ϕn (t),
(i) µn ⇒ µ∞ =⇒ ϕn (t) −→ ϕ∞ (t) ∀t
(ii) ϕn (t) → ϕ∞ (t) ∀t, ϕ∞ continuous at 0 =⇒ µn ⇒ µ∞ , µn tight
where µ∞ is another probability distribution and ϕ∞ is its characteristic function.
(CLT) Xi ∈ L2 i. i. d., 𝔼[Xi] = µ, 0 < var Xi = σ², Sn = X1 + · · · + Xn =⇒ (Sn − nµ)/(σ√n) ⇒ N(0, 1).
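A simulation sketch (mine; NumPy assumed): standardized sums of i. i. d. Uniform(0, 1) variables (µ = 1/2, σ² = 1/12) are approximately N(0, 1); compare ℙ(Zn ≤ 1) with Φ(1) ≈ 0.8413.

    import numpy as np

    rng = np.random.default_rng(9)
    n = 48
    x = rng.uniform(size=(100_000, n))
    zn = (x.sum(axis=1) - n * 0.5) / np.sqrt(n / 12.0)  # (Sn - nµ)/(σ√n)
    print(np.mean(zn <= 1.0))                           # ≈ 0.8413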
    
Using |e^{ix} − Σ_{k=0}^n (ix)^k/k!| ≤ min(|x|^{n+1}/(n+1)!, 2|x|^n/n!) with n = 2 gives the estimate |ϕ(t) − (1 − σ²t²/2)| ≤ 𝔼[min(|tX|³/6, t²X²)].

cn → c ∈ ℂ =⇒ (1 + cn/n)^n → e^c

(self-normalized sums) Xi i. i. d., 𝔼[Xi] = 0, var Xi = σ² ∈ (0, ∞) =⇒ (Σ_{i=1}^n Xi)/√(Σ_{i=1}^n Xi²) ⇒ N(0, 1).

A triangular array satisfies the Lindeberg conditions if

(i) σ²n,1 + · · · + σ²n,n → σ² > 0
(ii) ∀ε > 0 Σ_{k=1}^n 𝔼[Xn,k² 1{|Xn,k| ≥ ε}] → 0 as n → ∞
The second condition requires that no single array element contribute disproportionately to the sum.
h i
(Lindeberg-Feller CLT) If a triangular array with independent rows satisfies 𝔼[Xn,i] = 0, 𝔼[Xn,i²] = σ²n,i < ∞,
and the Lindeberg conditions then (Xn,1 + · · · + Xn,n)/√n ⇒ N(0, σ²).
4 Nov

|w1|, . . . , |wn|, |z1|, . . . , |zn| ≤ θ =⇒ |Π_{i=1}^n wi − Π_{i=1}^n zi| ≤ θ^{n−1} Σ_{i=1}^n |wi − zi|.

(B(n, p) − np)/√(npq) ⇒ N(0, 1), so ℙ(B(n, p)/n ∈ (p + a√(pq)/√n, p + b√(pq)/√n)) −→ ∫_a^b (1/√(2π)) e^{−x²/2} dx by the CLT.

For record values with ℙ(An) = 1/n, where (1A1 + · · · + 1An)/log n →a.s. 1, we get (1A1 + · · · + 1An − log n)/√(log n) ⇒ N(0, 1) by the CLT.
6 Nov
     
For Ω finite with partition Ωi=1,k ,  ω j Ωi B  ω j /(Ωi ) if ω j ∈ Ωi .  ω j Ωi B 0 if ω j < Ωi .
h i P   P
For X : Ω → ’, let yi = … X Ωi B Nj=1 X(ω j ) ω j Ωi = ω∈Ωi X(ω)(ω)
(Ωi ) .
Define Y : Ω → ’ by Y|Ωi = yi for i = 1, . . . , k. Then …[X1B ] = …[Y1B ] whenever B = ∪Ωi .
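A concrete sketch (mine; NumPy assumed) of this finite-partition conditional expectation: Y is constant on each cell, equal to the cell's ℙ-weighted average of X, and agrees with X in expectation over unions of cells.

    import numpy as np

    p = np.array([0.1, 0.2, 0.3, 0.4])     # P on Ω = {ω0, ω1, ω2, ω3}
    X = np.array([5.0, 1.0, 2.0, 10.0])
    cells = [[0, 1], [2, 3]]               # partition Ω1, Ω2

    Y = np.empty_like(X)
    for cell in cells:
        Y[cell] = np.dot(X[cell], p[cell]) / p[cell].sum()   # y_i = E[X | Ω_i]

    B = [0, 1]                             # B a union of cells, here B = Ω1
    print(np.dot(X[B], p[B]), np.dot(Y[B], p[B]))            # equal: E[X 1_B] = E[Y 1_B]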

(Radon-Nikodym) If (Ω, F) is a measurable space with σ-finite measures µ, ν where µ(A) = 0 =⇒ ν(A) = 0 ∀A ∈ F
(i.e. ν ≪ µ, absolutely continuous) then ∃f : Ω → [0, ∞) measurable such that ν(A) = ∫_A f(x) µ(dx) ∀A ∈ F.
f is usually denoted dν/dµ and called the Radon-Nikodym derivative.
h i
For X ∈ L1 and a σ-algebra G ⊂ F, a random variable Y = 𝔼[X | G] is a conditional expectation of X wrt G iff
(i) Y is measurable wrt G,
(ii) 𝔼[X1B] = 𝔼[Y1B] ∀B ∈ G.
Y exists by Radon-Nikodym and is unique up to versions, i.e. Y = Y′ a. s. ⇐⇒ Y′ is a version of Y.
𝔼[X | Y] ≔ 𝔼[X | σ(Y)] = h(Y) where h : (ℝ, ℛ) → (ℝ, ℛ) is the conditional expectation of X wrt Y.
Since every B ∈ σ(Y) has the form B = {Y ∈ C} with C ∈ ℛ,
𝔼[X1B] = 𝔼[X1{Y∈C}] = 𝔼[h(Y)1B] = 𝔼[h(Y)1{Y∈C}] = ∫_ℝ h(y)1{y∈C} ℙY(dy) = ∫_C h(y) ℙY(dy).

Conditional expectation has the following properties:

(i) X ∈ L1 =⇒ 𝔼[X | G] ∈ L1 and ||𝔼[X | G]||L1 ≤ ||X||L1.
(ii) 𝔼[aX + bY | G] = a𝔼[X | G] + b𝔼[Y | G]
(iii) X ≤ Y a. s. =⇒ 𝔼[X | G] ≤ 𝔼[Y | G]
(iv) 𝔼[𝔼[X | G]] = 𝔼[X].
(v) 𝔼[X1A] = 𝔼[𝔼[X | G] 1A] for A ∈ G
(vi) X ⫫ G =⇒ 𝔼[X | G] = 𝔼[X]
(vii) Xn →L1 X =⇒ 𝔼[Xn | F] →L1 𝔼[X | F]
(viii) 𝔼[1A | Fn] = ℙ(A | Fn)
For X ∈ L2(Ω, F, ℙ), G ⊂ F, 𝔼[X | G] is the unique random variable attaining min_{Y∈L2(Ω,G,ℙ)} 𝔼[(X − Y)²].
That is, the conditional expectation is the orthogonal projection wrt the L2-inner product.
var(X | G) ≔ 𝔼[X² | G] − 𝔼[X | G]² is the conditional variance of X wrt G.
var X = 𝔼[var(X | G)] + var(𝔼[X | G]).
 
(Chebyshev's inequality) a > 0 =⇒ ℙ(|X| ≥ a | F) ≤ 𝔼[X² | F]/a².
(Jensen's inequality) ϕ convex and 𝔼[|X|], 𝔼[|ϕ(X)|] < ∞ =⇒ ϕ(𝔼[X | F]) ≤ 𝔼[ϕ(X) | F].

11 Nov
(tower property) F1 ⊂ F2 =⇒ 𝔼[𝔼[X | F2] | F1] = 𝔼[𝔼[X | F1] | F2] = 𝔼[X | F1], i.e. “the smaller σ-algebra wins”
(taking out what is known) X ∈ G and XY, Y ∈ L1 =⇒ 𝔼[XY | G] = X 𝔼[Y | G].
(monotone convergence) X ∈ L1 and Xn ↑ X =⇒ 𝔼[Xn | G] ↑ 𝔼[X | G].

A filtration {Fn} is a sequence of σ-algebras where Fn ⊂ Fn+1.

A discrete time stochastic process is a sequence of random variables {Xn } on (Ω, F, ) indexed by time.

{Xn } (ω) = {X0 (ω), X1 (ω), . . . } for fixed ω ∈ Ω is called a path of a stochastic process.

A stochastic process can be viewed as one of
(1) a sequence of random variables,
(2) an infinite dimensional random variable X : Ω → paths, or
(3) a two dimensional function X(·)(·) : ℕ × Ω → ℝ.

An adapted stochastic process {Xn } satisfies Xn ∈ Fn ∀n, i.e. Xn is measurable wrt Fn .


The filtration {Fn }n = {σ(X0 , . . . , Xn )}n is called the natural filtration of a stochastic process.
{Xn } is adapted to {Fn } ⇐⇒ σ(X0 , . . . , Xn ) ⊂ Fn ∀n.
All processes are adapted to their natural filtration.
 
(Ω, F, {Fn}n=0,1,..., ℙ) is called a filtered probability space.
 
On (Ω, F, {Fn}n=0,1,..., ℙ), a process {Mn} is a {submartingale, martingale, supermartingale}
iff ∀n all of Mn ∈ L1, Mn ∈ Fn (adapted), and 𝔼[Mn+1 | Fn] {≥, =, ≤} Mn hold (respectively).

Over time, a {sub-, true, super-}martingale {increases, stays the same, decreases} in conditional expectation:
Mn submartingale ⇐⇒ 𝔼[Mn+1 − Mn | Fn] ≥ 0
Mn martingale ⇐⇒ 𝔼[Mn+1 − Mn | Fn] = 0
Mn supermartingale ⇐⇒ 𝔼[Mn+1 − Mn | Fn] ≤ 0

All true martingales are both submartingales and supermartingales.

{Mn, Fn}n a martingale =⇒ {Mn, FnM}n a martingale by the tower property, where FnM = σ(M0, . . . , Mn) is the natural filtration.




(Doob's decomposition) Any submartingale Xn can be decomposed uniquely as Xn = Mn + An where A0 ≔ 0,
An+1 − An ≔ 𝔼[Xn+1 | Fn] − Xn is an increasing, predictable sequence and Mn ≔ Xn − An is a martingale.
13 Nov
(multistep) {Mn, Fn}n a {sub-, true, super-}martingale =⇒ 𝔼[Mk | Fn] {≥, =, ≤} Mn ∀n ≤ k (respectively).

Assume Mn ∈ Fn, Mn ∈ L1 and consider ϕ : ℝ → ℝ such that ϕ(Mn) ∈ L1:

(i) Mn martingale, ϕ convex =⇒ ϕ(Mn) submartingale
(ii) Mn martingale, ϕ concave =⇒ ϕ(Mn) supermartingale
(iii) Mn submartingale, ϕ convex, ϕ increasing =⇒ ϕ(Mn) submartingale
These follow from conditional Jensen, e.g. for (i) we have ϕ(Mn) = ϕ(𝔼[Mn+1 | Fn]) ≤ 𝔼[ϕ(Mn+1) | Fn].
Useful example functions to combine with this fact include |Mn|, (Mn)+, (Mn − a)+, (Mn)−, and (Mn − a)−.

H is predictable if Hn ∈ Fn−1.
Predictable means Hn is known (“deterministic”) given the information available at time n − 1.
Predictable H may be thought of as a betting strategy, i.e. Hn(Mn − Mn−1) is the payoff at time n.

(H · M)n ≔ Σ_{k=1}^n Hk(Mk − Mk−1) is the discrete time stochastic integral of H onto M.
(H · M)n is also called a martingale transform.
(H · M)n is an adapted process.

{Mn, Fn}n supermartingale, H predictable with 0 ≤ Hn bounded =⇒ {(H · M)n, Fn}n is a supermartingale.
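A simulation sketch (mine; NumPy assumed) of the martingale transform: for a fair-coin random walk M and the bounded predictable strategy “bet 1 after an up-move”, (H · M)n stays a martingale, so its mean is 0 — no betting strategy beats a fair game.

    import numpy as np

    rng = np.random.default_rng(6)
    steps = rng.choice([-1.0, 1.0], size=(100_000, 50))    # increments M_k - M_{k-1}

    H = np.zeros_like(steps)
    H[:, 0] = 1.0
    H[:, 1:] = (steps[:, :-1] > 0).astype(float)           # H_k ∈ F_{k-1}: uses only the past

    HM = np.cumsum(H * steps, axis=1)                      # (H·M)_n = Σ H_k (M_k - M_{k-1})
    print(HM[:, -1].mean())                                # ≈ 0 across 100000 paths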


18 Nov
A random time T : Ω → ℕ ∪ {∞} is a stopping time wrt filtration Fn if {T = n} ∈ Fn ∀n.

N = inf {n : Xn ∈ A} for A ∈ ℛ is called the hitting time of A.
N is a stopping time because {N = n} = ({X0 ∉ A} ∩ · · · ∩ {Xn−1 ∉ A} ∩ {Xn ∈ A}) ∈ Fn.

{T = n} ∈ Fn =⇒ {T ≤ n} = ({T = 0} ∪ · · · ∪ {T = n − 1} ∪ {T = n}) ∈ Fn .
{T = n} ∈ Fn =⇒ {T < n} = ({T = 0} ∪ · · · ∪ {T = n − 1}) ∈ Fn .

XnT ≔ XT∧n = XT(ω)∧n(ω) is a stopped process: one which “runs until stopping time T occurs.”

T stopping time =⇒ 1{n≤T} predictable because {T ≥ n} = {T < n}′ = {T ≤ n − 1}′ ∈ Fn−1.
T stopping time =⇒ 1{n>T} predictable because 1{n>T} = 1 − 1{n≤T}.
(1{n≤T} · X)n = XT∧n − X0


T stopping time, {Mn, Fn}n martingale =⇒ {MT∧n, Fn}n martingale because MT∧n = M0 + (1{n≤T} · M)n.
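A quick sketch (mine; NumPy assumed) of a stopped martingale: a fair random walk stopped on first hitting ±5 remains a martingale, so the stopped value has mean 0.

    import numpy as np

    rng = np.random.default_rng(7)
    paths, n = 50_000, 200
    M = np.cumsum(rng.choice([-1, 1], size=(paths, n)), axis=1)

    hit = np.abs(M) >= 5
    T = np.where(hit.any(axis=1), hit.argmax(axis=1), n - 1)  # hitting index, or cap at n-1
    MT = M[np.arange(paths), T]                               # M_{T∧n} per path
    print(MT.mean())                                          # ≈ 0, the initial mean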
Using Xn submartingale and a < b, let T2k+1 = min {n ≥ T2k : Xn ≤ a} and T2k+2 = min {n ≥ T2k+1 : Xn ≥ b} with T0 ≔ 0.
Call (T2k+1, T2k+2] an interval of upcrossing and write Un(a, b) for the number of upcrossings by time n.
The trading strategy Hn = Σ_{k=0}^∞ 1{T2k+1 < n ≤ T2k+2} is predictable. It represents buying below a and selling above b.
Define Yn = Xn ∨ a = a + (Xn − a)+ so that (H · Y)n represents gains over upcrossings plus a possible last gain.
Then (b − a) Un(a, b) ≤ (H · Y)n. Also 1 − H ≥ 0 =⇒ ((1 − H) · Y) submartingale =⇒ 𝔼[((1 − H) · Y)n] ≥ 0.
(submartingale upcrossing inequality) ∴ (b − a) 𝔼[Un(a, b)] ≤ 𝔼[(Xn − a)+] − 𝔼[(X0 − a)+]

(martingale convergence)
{Mn, Fn}n submartingale, sup_n 𝔼[Mn+] < ∞ =⇒ Mn →a.s. M, M ∈ L1.
{Mn, Fn}n supermartingale, Mn ≥ 0 =⇒ Mn →a.s. M, 𝔼[M] ≤ 𝔼[M0].
20 Nov
 
(Xi)i∈I is uniformly integrable (u. i.) if sup_{i∈I} 𝔼[|Xi| 1{|Xi|≥M}] → 0 as M → ∞.

(Xi)i∈I u. i. ⇐⇒ Xi are L1-bounded and ∀ε > 0 ∃δ > 0 : ∀A ∈ F ℙ(A) ≤ δ =⇒ 𝔼[|Xi| 1A] ≤ ε ∀i ∈ I.

(de la Vallée Poussin criterion)
(Xi)i∈I u. i. ⇐⇒ ∃ψ : [0, ∞) → [0, ∞) increasing such that ψ(x)/x → ∞ as x → ∞ and sup_{i∈I} 𝔼[ψ(|Xi|)] < ∞.

|Xi| ≤ |X| for X ∈ L1 =⇒ (Xi)i u. i.
(Xi)i u. i. =⇏ |Xi| ≤ |X| for some X ∈ L1.
X ∈ L1(Ω, F, ℙ) =⇒ (𝔼[X | G])_{G⊂F} is uniformly integrable.
(Ω, F, {Fn}n=0,1,..., ℙ), M ∈ L1(Ω, F, ℙ) =⇒ Mn = 𝔼[M | Fn] is a u. i. Lévy martingale.

 
Xn →ℙ X =⇒ [ (Xn)n u. i. ⇐⇒ Xn →L1 X ⇐⇒ 𝔼[|Xn|] → 𝔼[|X|] < ∞ ]
Mn martingale =⇒ [ (Mn)n u. i. ⇐⇒ Mn →a.s.,L1 M ⇐⇒ Mn →L1 M ⇐⇒ ∃M ∈ L1 : Mn = 𝔼[M | Fn] ]
Mn submartingale =⇒ [ (Mn)n u. i. ⇐⇒ Mn →a.s.,L1 M ⇐⇒ Mn →L1 M ]

(multistep at n = ∞) {Mn}n u. i. submartingale =⇒ 𝔼[M∞ | Fk] ≥ Mk since 𝔼[Mn | Fk] →L1 𝔼[M∞ | Fk].
h i
M ∈ L1 is called a last element of an adapted {sub-, true, super-}martingale Mn if 𝔼[M | Fk] {≥, =, ≤} Mk ∀k.
A {sub-, true, super-}martingale has a last element ⇐⇒ (Mn+)n, (Mn)n, (Mn−)n u. i. (respectively).
25 Nov
(Lévy's theorem) Given (Fn)n, X ∈ L1(Ω, F, ℙ) =⇒ 𝔼[X | Fn] →a.s.,L1 𝔼[X | F∞] where F∞ = σ(∪_{n=0}^∞ Fn).
(Lévy's corollary) A ∈ F∞ =⇒ ℙ(A | Fn) →a.s.,L1 1A.
(Lévy's 0-1 law) Fn ↑ F∞, A ∈ F∞ =⇒ 𝔼[1A | Fn] →a.s. 1A.

A last element is not generally unique. However, if we require it to be F∞-measurable then it is unique.


There is a bijection between L1(Ω, F∞, ℙ) and the space of u. i. martingales.
We can identify a u. i. martingale by its last element.

For stopping time N, FN ≔ {A ∈ F : A ∩ {N = n} ∈ Fn ∀n}.


XN is the random variable giving the value of the process Xn at the stopping time N.
Always XN ∈ FN .

(Optional sampling for bounded stopping times)
{Xn, Fn}n submartingale, M, N stopping times with 0 ≤ M ≤ N ≤ k for k ∈ ℕ
=⇒ 𝔼[X0] ≤ 𝔼[XM] ≤ 𝔼[XN] ≤ 𝔼[Xk] and XM ≤ 𝔼[XN | FM].

(Optional sampling theorem) For M ≤ N stopping times, possibly unbounded:

(i) Xn submartingale, (Xn∧N+) u. i. =⇒ 𝔼[XN | FM] ≥ XM, 𝔼[XN] ≥ 𝔼[XM]
(ii) Xn martingale, (Xn∧N) u. i. =⇒ 𝔼[XN | FM] = XM, 𝔼[XN] = 𝔼[XM]
(iii) Xn supermartingale, (Xn∧N−) u. i. =⇒ 𝔼[XN | FM] ≤ XM, 𝔼[XN] ≤ 𝔼[XM]
2 Dec
“Optional sampling holds iff the submartingale has a last element after N”
follows from Xn submartingale =⇒ Xn∧N submartingale and (Xn∧N+) u. i. ⇐⇒ Xn∧N has a last element.
h i 
(independence lemma) X ∈ G, Y ⫫ G, and some integrability conditions =⇒ 𝔼[g(X, Y) | G] = 𝔼[g(x, Y)]|x=X


(martingale inequality) Xn submartingale, X̄n ≔ max_{k=0,...,n} Xk+ =⇒ λ ℙ(X̄n ≥ λ) ≤ 𝔼[1{X̄n≥λ} Xn+] ≤ 𝔼[Xn+].
(Doob's Lp maximal inequality) Xn submartingale, X̄n ≔ max_{k=0,...,n} Xk+, 1 < p < ∞ =⇒ 𝔼[(X̄n)^p] ≤ (p/(p − 1))^p 𝔼[(Xn+)^p].

(Wald I) Xi ∈ L1 i. i. d., Sn = X1 + · · · + Xn, N stopping time where 𝔼[N] < ∞ =⇒ 𝔼[SN] = 𝔼[N] 𝔼[X].
(Wald II) Above assumptions plus 𝔼[X] = 0, 𝔼[X²] = σ² < ∞ =⇒ 𝔼[SN²] = σ² 𝔼[N].
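A simulation sketch (mine; NumPy assumed) of Wald I: stop an i. i. d. sum at the first n with Sn ≥ 10 (a stopping time with 𝔼[N] < ∞) and compare 𝔼[SN] against 𝔼[N] 𝔼[X].

    import numpy as np

    rng = np.random.default_rng(8)
    EX, level = 0.5, 10.0
    SN, N = [], []
    for _ in range(20_000):
        s, n = 0.0, 0
        while s < level:                    # N = inf {n : S_n >= level}
            s += rng.exponential(scale=EX)  # X_i i.i.d. exponential with mean 0.5
            n += 1
        SN.append(s); N.append(n)
    print(np.mean(SN), np.mean(N) * EX)     # ≈ equal, as Wald I predicts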
Xn martingale, 1 < p < ∞, sup_n 𝔼[|Xn|^p] < ∞ =⇒ Xn →a.s.,Lp X∞, X∞ ∈ F∞, Xn = 𝔼[X∞ | Fn].
For 1 < p < ∞ we identify Lp(Ω, F∞, ℙ) with Hp ≔ {Xn martingale : sup_n 𝔼[|Xn|^p] < ∞}.
4 Dec
Within (Ω, F), a sequence of σ-algebras {Gn }n is called a backward filtration if F ⊃ Gn ⊃ Gn+1 ∀n.
{Mn ∈ L1, Gn}n is a backward {submartingale, martingale, supermartingale} if 𝔼[Mn | Gn+1] {≥, =, ≤} Mn+1.

(backward martingale convergence)
{Mn, Gn} backward martingale =⇒ Mn →a.s.,L1 M∞ = 𝔼[M1 | G∞] where G∞ ≔ ∩_{n=1}^∞ Gn.
H ⫫ σ(X, G) =⇒ 𝔼[X | σ(G, H)] = 𝔼[X | G].

(Kolmogorov 0-1 law) Xn independent, A ∈ ∩_{n=1}^∞ σ(Xn, Xn+1, . . .) ⊂ G∞ =⇒ ℙ(A) ∈ {0, 1}.
(SLLN) Xi ∈ L1 i. i. d. =⇒ Sn/n →a.s.,L1 𝔼[Xi] is proven by showing
(1) Gn = σ(Sn, Sn+1, . . .) is a backward filtration,
(2) {Sn/n, Gn}n is a backward martingale, and
(3) Sn/n →a.s.,L1 𝔼[S1 | G∞] = 𝔼[Xi | G∞] = 𝔼[Xi].
