Probability Theory I: CAM 384K Concepts
Rhys Ulerich (updated February 12, 2009)
28 Aug
F ⊂ 2^Ω is a σ-algebra iff
(i) F ≠ ∅
(ii) A ∈ F =⇒ A′ = Ω − A ∈ F
(iii) Ai ∈ F ∀i ∈ ℕ =⇒ ⋃_{i=1}^∞ Ai ∈ F
These conditions imply ∅, Ω ∈ F and that F is closed under countable intersection.
Ω ≠ ∅ is the sample space. F ⊂ 2^Ω is a σ-algebra of events. A ∈ F is an event.
(Ω, F) is a measurable space.
µ : F → [0, ∞] is a measure iff
(i) µ(A) ≥ µ(∅) = 0
(ii) Ai ∈ F countable, Ai disjoint =⇒ µ(⋃i Ai) = Σi µ(Ai)
Given (Ω1, F1, ℙ1), . . . , (Ωn, Fn, ℙn) define the product probability space (Ω, F, ℙ) where
Ω ≔ Ω1 × · · · × Ωn = {(ω1, . . . , ωn) : ωi ∈ Ωi}
F ≔ F1 × · · · × Fn = σ({A1 × · · · × An : Ai ∈ Fi})
ℙ ≔ ℙ1 × · · · × ℙn where ℙ(A1 × · · · × An) = ℙ1(A1) · · · ℙn(An)
ℙ exists by Carathéodory’s extension theorem. ℙ is unique by the π-λ theorem.
P ⊂ 2^Ω is a π-system iff
(i) P ≠ ∅
(ii) A, B ∈ P =⇒ A ∩ B ∈ P
L ⊂ 2^Ω is a λ-system iff
(i) Ω∈L
(ii) A, B ∈ L, A ⊂ B =⇒ B − A ∈ L
(iii) Ai ∈ L, Ai ↑ A =⇒ A ∈ L
Generally, λ-systems are not σ-algebras.
A λ-system which is additionally closed under intersection is a σ-algebra.
If A ⊂ 2^Ω then A generates ℓ(A) ≔ ⋂{L : L a λ-system, A ⊂ L},
using that if Lι ⊂ 2^Ω, ι ∈ I, are λ-systems then ⋂_{ι∈I} Lι is a λ-system.
For a measure µ : R → [0, ∞], assume ∃F : ℝ → ℝ such that µ((a, b]) = F(b) − F(a).
Such an F must be nondecreasing and right continuous.
(Lebesgue-Stieltjes) F : ℝ → ℝ nondecreasing, right-continuous =⇒ ∃!µ : R → [0, ∞] where µ((a, b]) = F(b) − F(a).
(Lebesgue measure) The unique measure λ : R → [0, ∞] where λ ((a, b]) = b − a.
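A numerical aside (my illustration, not from the lecture): taking F to be the standard normal distribution function, the induced Lebesgue-Stieltjes set function assigns F(b) − F(a) to (a, b], and its additivity over disjoint half-open intervals can be checked directly.

    # Sketch: the Lebesgue-Stieltjes measure induced by a nondecreasing,
    # right-continuous F assigns mass F(b) - F(a) to (a, b].  F here is the
    # standard normal distribution function, chosen only for illustration.
    from math import erf, sqrt

    def F(x):
        """Standard normal distribution function."""
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    def mu(a, b):
        """Measure of the half-open interval (a, b] induced by F."""
        return F(b) - F(a)

    # Additivity over a finite partition of (-1, 1] into eight pieces:
    parts = sum(mu(-1 + k * 0.25, -1 + (k + 1) * 0.25) for k in range(8))
    assert abs(parts - mu(-1.0, 1.0)) < 1e-12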
4 Sept
{X ∈ B} ≔ X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B}.
A random variable is a function X : (Ω, F) → ℝ such that {X ∈ B} ∈ F ∀B ∈ R.
Define the probability measure µX : R → [0, 1], called the distribution, such that µX(B) = ℙ(X ∈ B).
(ℝ, R, µX) is a probability space.
The distribution function of X is FX : ℝ → [0, 1] such that FX(x) ≔ ℙ(X ≤ x) = µX((−∞, x]).
Every FX satisfies
(i) x ≤ y =⇒ F(x) ≤ F(y)
(ii) xn ↓ x =⇒ F(x) = lim_n F(xn) = F(x+)
(iii) ℙ(X = x) = F(x) − F(x−)
(v) ℙ(X < x) = F(x−)
(vi) F(−∞) ≔ lim_{x→−∞} F(x) = 0
(vii) F(+∞) ≔ lim_{x→+∞} F(x) = 1
Properties (i), (ii), (vi), and (vii) characterize a distribution.
Given F : ℝ → ℝ obeying (i), (ii), (vi), and (vii), there exists a random variable X such that FX = F.
Fleft⁻¹(y) ≔ sup{x ∈ ℝ : F(x) < y} is the left continuous inverse of a distribution F.
Fright⁻¹(y) ≔ sup{x ∈ ℝ : F(x) ≤ y} is the right continuous inverse of a distribution F.
Generally, F⁻¹(y) ≔ Fleft⁻¹(y) is used.
FX continuous =⇒ FX(X) is uniformly distributed on (0, 1).
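The two facts above invert each other, which is worth seeing numerically. A simulation sketch (my example, with F the Exp(1) distribution function, which has a closed-form inverse): F⁻¹(U) realizes a random variable with distribution F, and FX(X) lands back on the uniform.

    # Inverse-transform sampling: X = F^{-1}(U) has distribution F, and
    # F_X(X) is Uniform(0,1) when F_X is continuous.  F here is Exp(1).
    import random
    from math import exp, log

    F = lambda x: 1.0 - exp(-x) if x > 0.0 else 0.0
    F_inv = lambda y: -log(1.0 - y)          # closed-form inverse for Exp(1)

    random.seed(0)
    xs = [F_inv(random.random()) for _ in range(100_000)]
    print(sum(xs) / len(xs))                 # ~ 1, the Exp(1) mean
    us = [F(x) for x in xs]                  # should look Uniform(0,1)
    print(sum(u <= 0.3 for u in us) / len(us))   # ~ 0.3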
9 Sept
Random variables X and Y are equal in distribution, denoted X =ᵈ Y, when FX = FY.
Define the indicator function 1A(x) ≔ 1 if x ∈ A, 0 if x ∉ A.
X : (Ω, F) → (S , S) is measurable if ∀B ∈ S {X ∈ B} ∈ F.
X measurable wrt F is also written X ∈ F.
For X : (Ω, F) → (S, S = σ(A)), if {X ∈ A} ∈ F ∀A ∈ A then X is measurable.
For X : (Ω, F) → (S, S) and f : (S, S) → (T, T), if X and f are measurable then f(X) is measurable.
X1, X2, . . . , Xn random variables, f : (ℝⁿ, Rⁿ) → (ℝ, R) measurable =⇒ f(X1, . . . , Xn) a random variable.
For example, X1 + · · · + Xn is a random variable.
X1 , . . . , Xn , . . . random variables =⇒ inf Xn , sup Xn , lim inf Xn , and lim sup Xn are all random variables.
The extended real numbers are ℝ̄ ≔ ℝ ∪ {−∞, +∞}. The extended Borel sets are R̄ ≔ σ(R ∪ {{−∞}, {+∞}}).
An extended random variable is a function X : (Ω, F) → ℝ̄ such that {X ∈ B} ∈ F ∀B ∈ R̄.
All random variables are automatically extended random variables.
Property p (ω) is satisfied almost everywhere (a. e.) when µ({ω ∈ Ω : ¬p (ω)}) = 0.
Property p(ω) is satisfied almost surely (a. s.) when ℙ({ω ∈ Ω : ¬p(ω)}) = 0, i.e. a. e. with respect to a probability measure ℙ.
There are four incremental stages in the development of the Lebesgue integral:
(1. f = Σi ai 1_{Ai} simple) ∫ f dµ ≔ Σi ai µ(Ai)
(2. f bounded, finite support) ∫ f dµ ≔ sup{∫ ϕ dµ : ϕ ≤ f, ϕ simple} = inf{∫ ϕ dµ : ϕ ≥ f, ϕ simple}
(3. f ≥ 0) ∫ f dµ ≔ sup{∫ h dµ : 0 ≤ h ≤ f, h bounded with finite support}
(4. f = f⁺ − f⁻) ∫ f dµ ≔ ∫ f⁺ dµ − ∫ f⁻ dµ
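Stage (3) can be watched converging numerically. A rough sketch (my construction, against Lebesgue measure): approximate ∫ f dλ by simple-function sums of the bounded truncations min(f, M) supported on [−M, M].

    # Stage (3) sketch: approximate the integral of f >= 0 from below by
    # bounded, finitely supported h = min(f, M) on [-M, M], evaluated as a
    # simple-function sum over a grid of small cells.
    def integral(f, M, n):
        dx = 2.0 * M / n
        return sum(min(f(-M + (k + 0.5) * dx), M) * dx for k in range(n))

    f = lambda x: x * x if 0.0 <= x <= 1.0 else 0.0    # supported on [0, 1]
    for M, n in [(1.0, 10), (2.0, 100), (4.0, 10_000)]:
        print(integral(f, M, n))      # approaches the true value 1/3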
If ∃f : ℝ → [0, ∞) such that ℙ(X ∈ B) = ∫_B f(x) dx then f is the density function of X, denoted fX.
By definition FX(x) = ∫_{−∞}^x fX(y) dy and ∫_ℝ fX(y) dy = 1.
The expected value of X on (Ω, F, ℙ) is 𝔼[X] ≔ ∫ X dℙ.
𝔼[X] is also called the mean and denoted µX or µ.
On a discrete space, 𝔼[X] = Σi ωi ℙ(X = ωi).
ℙ(X ∈ A) = 𝔼[1A].
(transport formula) For X : (Ω, F, ℙ) → (S, S) and g : (S, S) → (ℝ, R) with g measurable, g ≥ 0 or g bounded,
𝔼[g(X)] = ∫ g(X) dℙ = ∫ g(x) dµX(x) = ∫ g(x) dFX(x) = ∫ g(x) fX(x) dx,
the last two equalities when S = ℝ and, for the density form, X has a density.
The transport formula implies g(X) ∈ L¹(Ω, F, ℙ) ⇐⇒ g ∈ L¹(ℝ, R, µX).
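As a sanity check of the transport formula (my example, not the notes’): computing 𝔼[g(X)] on Ω by Monte Carlo and on ℝ against the density should agree.

    # Transport-formula check with X ~ N(0,1) and g(x) = x²: the Ω-side
    # Monte Carlo average and the ℝ-side integral ∫ g f_X dx are both ~ 1.
    import random
    from math import exp, pi, sqrt

    g = lambda x: x * x
    f_X = lambda x: exp(-x * x / 2.0) / sqrt(2.0 * pi)

    random.seed(1)
    lhs = sum(g(random.gauss(0.0, 1.0)) for _ in range(200_000)) / 200_000

    dx = 1e-3
    rhs = sum(g(k * dx - 8.0) * f_X(k * dx - 8.0) * dx
              for k in range(int(16.0 / dx)))
    print(lhs, rhs)                   # both near E[X²] = 1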
𝔼[Xᵏ] for k ∈ ℕ is called the kth moment of X.
𝔼[(X − 𝔼[X])ᵏ] = 𝔼[(X − µ)ᵏ] for k ∈ ℕ is called the kth centered moment of X.
The conditional probability that event A occurs given event B is ℙ(A|B) = ℙ(A ∩ B)/ℙ(B) provided ℙ(B) ≠ 0.
Independence of a finite collection requires these logical extensions of the above to hold:
(σ-algebras G1, . . . , Gn) ℙ(⋂i Ai) = ∏i ℙ(Ai) for Ai ∈ Gi
(Random variables X1, . . . , Xn) ℙ(⋂i {Xi ∈ Ci}) = ∏i ℙ(Xi ∈ Ci) for Ci ∈ R
(Events A1, . . . , An) ℙ(⋂_{i∈I} Ai) = ∏_{i∈I} ℙ(Ai) whenever I ⊂ {1, . . . , n}
(Classes of events 𝒜1, . . . , 𝒜n) ℙ(⋂_{i∈I} Ai) = ∏_{i∈I} ℙ(Ai) for Ai ∈ 𝒜i whenever I ⊂ {1, . . . , n}
Independence of an infinite collection requires that every finite subcollection be independent.
Pairwise independence of a collection’s elements does not imply the collection is independent.
ℙ(X1 ≤ x1, . . . , Xn ≤ xn) = ∏_{i=1}^n ℙ(Xi ≤ xi) ∀xi =⇒ X1, . . . , Xn independent.
25 Sept
If X1, . . . , Xn are random variables on (Ω, F, ℙ) then X1, . . . , Xn independent ⇐⇒ µ_{X1,...,Xn} = ∏_{i=1}^n µXi.
(WLLN for triangular arrays) Sn = X_{n,1} + · · · + X_{n,n} and (var Sn)/bn² −→ 0 for some bn =⇒ (Sn − 𝔼[Sn])/bn −→ 0 in L².
7 Oct
A random variable X with large tails can be truncated outside a threshold M, i.e. X̄ ≔ X 1_{|X|≤M}.
(WLLN for triangular arrays with independent rows) Construct bn > 0, bn ↑ ∞ such that both
Σ_{k=1}^n ℙ(|X_{n,k}| ≥ bn) −→ 0 and (1/bn²) Σ_{k=1}^n 𝔼[X_{n,k}² 1_{|X_{n,k}|≤bn}] −→ 0.
Define an = Σ_{k=1}^n 𝔼[X_{n,k} 1_{|X_{n,k}|≤bn}] and Sn = X_{n,1} + · · · + X_{n,n}. Under these conditions (Sn − an)/bn −→ 0 in probability.
X ≥ 0, f : [0, ∞] → [0, ∞] increasing, f ∈ C¹, and f(0) = 0 =⇒ 𝔼[f(X)] = ∫_0^∞ f′(x) ℙ(X ≥ x) dx.
In particular, 𝔼[Xᵖ] = ∫_0^∞ p xᵖ⁻¹ ℙ(X > x) dx allows estimating moments using tails.
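A quick numerical check of the tail formula (my example): for X ~ Exp(1), ℙ(X > x) = e^{−x} and 𝔼[X²] = 2 exactly.

    # Verify E[X^p] = ∫_0^∞ p x^{p-1} P(X > x) dx for X ~ Exp(1), p = 2.
    from math import exp

    p, dx = 2, 1e-4
    integral = sum(p * (k * dx) ** (p - 1) * exp(-k * dx) * dx
                   for k in range(1, int(50 / dx)))
    print(integral)                   # ~ 2.0 = E[X²]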
For discrete N : Ω → ℕ ∪ {∞}, 𝔼[N] = Σ_{n=1}^∞ ℙ(N ≥ n).
Use p = 1 − ε to show x ℙ(|X| > x) −→ 0 =⇒ 𝔼[|X|^{1−ε}] < ∞ for ε > 0.
(General WLLN) X1, . . . , Xn, . . . i. i. d., x ℙ(|X| > x) −→ 0 =⇒ Sn/n − µn −→ 0 in probability where µn = 𝔼[X 1_{|X|≤n}].
X ∈ L¹ =⇒ x ℙ(|X| > x) −→ 0 since x ℙ(|X| > x) = 𝔼[x 1_{|X|>x}], x 1_{|X|>x} ≤ |X|, and x 1_{|X|>x} → 0 as x → ∞.
(L¹-WLLN) Xi ∈ L¹ i. i. d. and 𝔼[Xi] = µ =⇒ Sn/n −→ µ in probability.
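A simulation sketch of the L¹ weak law (illustrative): sample means of i. i. d. Exp(1) draws concentrate near µ = 1 as n grows.

    # Sample means of i.i.d. Exp(1) variables settle near µ = 1.
    import random

    random.seed(2)
    for n in (10, 1_000, 100_000):
        s_n = sum(random.expovariate(1.0) for _ in range(n))
        print(n, s_n / n)             # S_n / n −→ 1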
9 Oct
Define lim inf An ⊂ lim sup An for sequences of subsets of Ω:
lim inf An ≔ ∪n ∩l≥n Al = lim_{n→∞} ∩l≥n Al = {ω that are in all but finitely many An’s}
lim sup An ≔ ∩n ∪l≥n Al = lim_{n→∞} ∪l≥n Al = {ω that are in infinitely many An’s}
lim sup An is read An infinitely often (i. o.), i.e. {An i. o.} ≔ lim sup An.
Xn −→ X a.s. ⇐⇒ ∀ε > 0 ℙ({|Xn − X| ≥ ε} i. o.) = 0, using that {Xn −→ X}ᶜ = ∪ε>0 ∩n ∪l≥n {|Xl − X| ≥ ε}.
Xn converges fast to X, denoted Xn −→ X fast, if ∀ε > 0 Σ_{n=1}^∞ ℙ({|Xn − X| ≥ ε}) < ∞.
Xn −→ X fast =⇒ Xn −→ X a.s. by Borel-Cantelli.
(convergence of random variables) Using that, in a topological space, yn −→ y iff every subsequence y_{nm} has a further subsequence y_{nm_k} −→ y:
(i) Xn −→ X fast =⇒ Xn −→ X a.s. (by BC1) =⇒ Xn −→ X in probability
(ii) Xn −→ X in probability =⇒ ∃ subsequence X_{nk} : X_{nk} −→ X fast
(iii) Xn −→ X in probability ⇐⇒ every subsequence X_{nm} has a further subsequence X_{nm_k} −→ X a.s.
There exist sequences that converge in probability but not almost surely.
Convergence in probability comes from a metric, but almost sure convergence does not come from any topology.
(L⁴-SLLN) Xi ∈ L⁴ i. i. d. =⇒ Sn/n −→ µ a.s.
14 Oct
(SLLN) Xi ∈ L¹ i. i. d. =⇒ Sn/n −→ µ a.s.
(Borel-Cantelli 2) An independent and Σ_{n=1}^∞ ℙ(An) = ∞ =⇒ ℙ(An i. o.) = 1.
For independent An, the Borel-Cantelli lemmas impose a zero-one law forcing ℙ(An i. o.) to be either 0 or 1.
(Borel-Cantelli 2 extension) An independent and Σ_{n=1}^∞ ℙ(An) = ∞ =⇒ (1_{A1} + · · · + 1_{An})/(ℙ(A1) + · · · + ℙ(An)) −→ 1 a.s. as n → ∞.
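A simulation sketch of the extension (my construction): independent events An with ℙ(An) = 1/n have a divergent probability sum, and the hit count tracks the accumulated mass.

    # Independent A_n with P(A_n) = 1/n: Σ P(A_n) = ∞, so the number of A_n
    # that occur, divided by Σ_{k≤n} 1/k, should drift toward 1.
    import random

    random.seed(3)
    hits, mass = 0, 0.0
    for n in range(1, 1_000_001):
        hits += random.random() < 1.0 / n     # indicator of A_n
        mass += 1.0 / n                       # P(A_n)
        if n in (10, 10_000, 1_000_000):
            print(n, hits / mass)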
When taking expectations, everything using almost sure convergence can instead use weak convergence.
Fn a tight sequence of distribution functions =⇒ ∃ subsequence F_{nk} and a distribution function F such that F_{nk} ⇒ F.
(Prokhorov’s theorem) For P, a set of probability measures on ℝ, P is tight ⇐⇒ ∀Pn ∃P_{nk}, P∞ : P_{nk} ⇒ P∞.
For ϕ : ℝ → [0, ∞) such that lim_{|x|→∞} ϕ(x) = +∞, if ∫ ϕ(x) dF(x) ≤ C < ∞ ∀F ∈ P then P is tight.
Uniform U(a, b) with density f(x) = 1/(b − a) has characteristic function ϕ(t) = (e^{itb} − e^{ita})/(it(b − a)).
Normal N(0, 1) with density f(x) = (1/√(2π)) exp(−x²/2) has characteristic function ϕ(t) = exp(−t²/2).
Normal N(µ, σ²) with density f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)) has characteristic function ϕ(t) = exp(itµ − σ²t²/2).
ϕ = ϕ′ =⇒ µ = µ′ (characteristic functions determine distributions).
∫_{−∞}^∞ |ϕX(t)| dt < ∞ =⇒ X has a continuous, bounded density function fX(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} ϕ(t) dt.
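A numerical sketch of this inversion formula (my example): recover the N(0, 1) density at x = 0 from its integrable characteristic function.

    # Fourier inversion: f_X(x) = (1/2π) ∫ e^{-itx} ϕ(t) dt, truncated to
    # [-T, T] and discretized.  ϕ(t) = exp(-t²/2) is the N(0,1) case.
    from cmath import exp as cexp
    from math import exp, pi, sqrt

    phi = lambda t: exp(-t * t / 2.0)

    def f_X(x, T=12.0, n=4000):
        dt = 2.0 * T / n
        s = sum(cexp(-1j * (-T + k * dt) * x) * phi(-T + k * dt)
                for k in range(n))
        return (s * dt / (2.0 * pi)).real

    print(f_X(0.0), 1.0 / sqrt(2.0 * pi))    # both ~ 0.3989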
30 Oct
(continuity theorem) For a sequence of probability distributions µn and their characteristic functions ϕn (t),
(i) µn ⇒ µ∞ =⇒ ϕn (t) −→ ϕ∞ (t) ∀t
(ii) ϕn (t) → ϕ∞ (t) ∀t, ϕ∞ continuous at 0 =⇒ µn ⇒ µ∞ , µn tight
where µ∞ is another probability distribution and ϕ∞ is its characteristic function.
(CLT) Xi ∈ L² i. i. d., 𝔼[Xi] = µ, 0 < var Xi = σ², Sn = X1 + · · · + Xn =⇒ (Sn − nµ)/(σ√n) ⇒ N(0, 1).
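A simulation sketch (illustrative): standardized sums of i. i. d. Uniform(0, 1) draws, with µ = 1/2 and σ² = 1/12, behave like N(0, 1); e.g. ℙ(Z ≤ 1) ≈ 0.8413.

    # Standardize S_n for Uniform(0,1) summands and check a normal quantile.
    import random
    from math import sqrt

    random.seed(4)
    n, trials, mu, sigma = 1_000, 20_000, 0.5, sqrt(1.0 / 12.0)
    zs = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * sqrt(n))
          for _ in range(trials)]
    print(sum(z <= 1.0 for z in zs) / trials)    # ~ Φ(1) ≈ 0.8413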
Using |e^{ix} − Σ_{k=0}^n (ix)ᵏ/k!| ≤ min(|x|^{n+1}/(n + 1)!, 2|x|ⁿ/n!) with n = 2 gives the estimate |ϕ(t) − (1 − σ²t²/2)| ≤ 𝔼[min(|tX|³/6, t²X²)].
cn → c ∈ ℂ =⇒ (1 + cn/n)ⁿ → e^c
(self-normalized sums) Xi i. i. d., 𝔼[Xi] = 0, var Xi = σ² ∈ (0, ∞) =⇒ (Σ_{i=1}^n Xi)/√(Σ_{i=1}^n Xi²) ⇒ N(0, 1).
(B(n, p) − np)/√(pqn) ⇒ N(0, 1), so ℙ(B(n, p)/n ∈ (p + a√(pq)/√n, p + b√(pq)/√n)) −→ ∫_a^b (1/√(2π)) e^{−x²/2} dx by the CLT.
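A numeric sketch of this normal approximation (my parameters): compare the exact binomial probability of the standardized window (a, b] with Φ(b) − Φ(a).

    # Exact P(a < (B(n,p) - np)/√(npq) ≤ b) versus the normal limit.
    from math import comb, erf, sqrt

    n, p, a, b = 400, 0.3, -1.0, 1.0
    q = 1.0 - p
    lo, hi = n * p + a * sqrt(n * p * q), n * p + b * sqrt(n * p * q)
    exact = sum(comb(n, k) * p**k * q**(n - k)
                for k in range(n + 1) if lo < k <= hi)
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    print(exact, Phi(b) - Phi(a))            # both ~ 0.68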
11 Nov
(tower property) F1 ⊂ F2 =⇒ 𝔼[𝔼[X|F2]|F1] = 𝔼[𝔼[X|F1]|F2] = 𝔼[X|F1], i.e. “smaller σ-algebra wins”
(taking out what is known) X ∈ G and XY, Y ∈ L¹ =⇒ 𝔼[XY|G] = X 𝔼[Y|G].
(monotone convergence) X ∈ L¹ and Xn ↑ X =⇒ 𝔼[Xn|G] ↑ 𝔼[X|G].
A discrete time stochastic process is a sequence of random variables {Xn} on (Ω, F, ℙ) indexed by time.
{Xn } (ω) = {X0 (ω), X1 (ω), . . . } for fixed ω ∈ Ω is called a path of a stochastic process.
Over time, a {sub-, true, super-}martingale {increases, stays the same, decreases} in conditional expectation:
Mn submartingale ⇐⇒ 𝔼[Mn+1 − Mn|Fn] ≥ 0
Mn martingale ⇐⇒ 𝔼[Mn+1 − Mn|Fn] = 0
Mn supermartingale ⇐⇒ 𝔼[Mn+1 − Mn|Fn] ≤ 0
H is predictable if Hn ∈ Fn−1 .
Predictable means “deterministic” given the filtration’s past: Hn is known by time n − 1.
Predictable H may be thought of as a betting strategy, i.e. Hn (Mn − Mn−1 ) is the payoff at time n.
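A simulation sketch of this reading (my construction): let Mn be a symmetric random walk and Hn a bet decided from the path through time n − 1; the cumulative winnings (H · M)n = Σ_{k≤n} Hk(Mk − Mk−1) then average to zero, as a martingale transform must.

    # Betting on a fair walk with a predictable strategy yields zero expected
    # winnings: (H·M) is again a martingale.
    import random

    def winnings(n):
        M, total = 0, 0.0
        for k in range(1, n + 1):
            H = 1.0 if M <= 0 else 0.5        # predictable: uses M_{k-1} only
            step = random.choice((-1, 1))
            total += H * step                 # H_k (M_k - M_{k-1})
            M += step
        return total

    random.seed(5)
    print(sum(winnings(200) for _ in range(20_000)) / 20_000)   # ~ 0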
{T = n} ∈ Fn =⇒ {T ≤ n} = ({T = 0} ∪ · · · ∪ {T = n − 1} ∪ {T = n}) ∈ Fn .
{T = n} ∈ Fn =⇒ {T < n} = ({T = 0} ∪ · · · ∪ {T = n − 1}) ∈ Fn .
X^T_n ≔ X_{T∧n} = X_{T(ω)∧n}(ω) is a stopped process: one which “runs until stopping time T occurs.”
(H · X)n = X_{T∧n} − X0 for the predictable strategy Hn = 1_{n≤T}.
Using Xn submartingale and a < b, let T_{2k+1} = min{n ≥ T_{2k} : Xn ≤ a} and T_{2k+2} = min{n ≥ T_{2k+1} : Xn ≥ b}.
Call (T_{2k+1}, T_{2k+2}) an interval of upcrossing and denote Un(a, b) the number of upcrossings by time n.
The trading strategy Hn = Σ_{k=0}^∞ 1_{T_{2k+1} < n ≤ T_{2k+2}} is predictable. It represents buy below a and sell above b.
Define Yn = Xn ∨ a = a + (Xn − a)⁺ so that (H · Y)n represents gains over upcrossings plus a possible last gain.
Then (b − a) Un(a, b) ≤ (H · Y)n. Also 1 − H ≥ 0 =⇒ (1 − H) · Y submartingale =⇒ 𝔼[((1 − H) · Y)n] ≥ 0.
(submartingale upcrossing inequality) ∴ (b − a) 𝔼[Un(a, b)] ≤ 𝔼[(Xn − a)⁺] − 𝔼[(X0 − a)⁺]
(martingale convergence) {Mn, Fn}n submartingale, supn 𝔼[Mn⁺] < ∞ =⇒ Mn −→ M a.s., M ∈ L¹.
{Mn, Fn}n supermartingale, Mn ≥ 0 =⇒ Mn −→ M a.s., 𝔼[M] ≤ 𝔼[M0].
20 Nov
(Xi)_{i∈I} is uniformly integrable (u. i.) if sup_{i∈I} 𝔼[|Xi| 1_{|Xi|≥M}] → 0 as M → ∞.
Xn −→ X in probability =⇒ ( (Xn)n u. i. ⇐⇒ Xn −→ X in L¹ ⇐⇒ 𝔼[|Xn|] → 𝔼[|X|] < ∞ ).
Mn martingale =⇒ ( (Mn)n u. i. ⇐⇒ Mn −→ M a.s. and in L¹ ⇐⇒ Mn −→ M in L¹ ⇐⇒ ∃M ∈ L¹ : Mn = 𝔼[M|Fn] ).
Mn submartingale =⇒ ( (Mn)n u. i. ⇐⇒ Mn −→ M a.s. and in L¹ ⇐⇒ Mn −→ M in L¹ ).
(multistep at n = ∞) {Mn}n submartingale =⇒ 𝔼[M∞|Fk] ≥ Mk since 𝔼[Mn|Fk] −→ 𝔼[M∞|Fk] in L¹.
M ∈ L¹ is called a last element of an adapted {sub-, true, super-}martingale Mn if 𝔼[M|Fk] {≥, =, ≤} Mk ∀k.
A {sub-, true, super-}martingale has a last element ⇐⇒ {(Mn⁺)n, (Mn)n, (Mn⁻)n} u. i.
25 Nov
(Lévy’s theorem) Given (Fn)n and X ∈ L¹(Ω, F, ℙ), 𝔼[X|Fn] −→ 𝔼[X|F∞] a.s. where F∞ = σ(∪_{n=0}^∞ Fn).
(Lévy’s corollary) A ∈ F∞ =⇒ ℙ(A|Fn) −→ 1A a.s. and in L¹.
(Lévy’s 0-1 law) Fn ↑ F∞, A ∈ F∞ =⇒ 𝔼[1A|Fn] −→ 1A a.s. and in L¹.
(martingale inequality) Xn submartingale, X̄n ≔ max_{k=0,...,n} Xk⁺ =⇒ λ ℙ(X̄n ≥ λ) ≤ 𝔼[1_{X̄n≥λ} Xn⁺] ≤ 𝔼[Xn⁺].
(Doob’s Lᵖ maximal inequality) Xn submartingale, X̄n ≔ max_{k=0,...,n} Xk⁺, 1 < p < ∞ =⇒ 𝔼[X̄nᵖ] ≤ (p/(p − 1))ᵖ 𝔼[(Xn⁺)ᵖ].
(Wald I) Xi ∈ L¹ i. i. d., Sn = X1 + · · · + Xn, N stopping time where 𝔼[N] < ∞ =⇒ 𝔼[SN] = 𝔼[N] 𝔼[X].
(Wald II) Above assumptions plus 𝔼[X] = 0, 𝔼[X²] = σ² < ∞ =⇒ 𝔼[SN²] = σ² 𝔼[N].
Xn martingale, 1 < p < ∞, supn ‖Xn‖p < ∞ =⇒ Xn −→ X∞ a.s. and in Lᵖ, X∞ ∈ F∞, Xn = 𝔼[X∞|Fn].
For 1 < p < ∞ we identify Lᵖ(Ω, F∞, ℙ) ≡ Hᵖ ≔ {Xn martingale : supn ‖Xn‖p < ∞}.
4 Dec
Within (Ω, F), a sequence of σ-algebras {Gn }n is called a backward filtration if F ⊃ Gn ⊃ Gn+1 ∀n.
{Mn ∈ L¹, Gn} is a backward {sub-, true, super-}martingale if 𝔼[Mn|Gn+1] {≥, =, ≤} Mn+1.
For Xi ∈ L¹ i. i. d. with Sn = X1 + · · · + Xn:
(1) Gn = σ(Sn, Sn+1, . . .) is a backward filtration,
(2) {Sn/n, Gn} is a backward martingale, and
(3) Sn/n −→ 𝔼[S1|G∞] = 𝔼[X1|G∞] = 𝔼[Xi] a.s. and in L¹.