Probability Theory I: CAM 384K Concepts
Rhys Ulerich (updated February 12, 2009)
28 Aug
F ⊂ 2^Ω is a σ-algebra iff
(i) F ≠ ∅
(ii) A ∈ F =⇒ A′ = Ω − A ∈ F
(iii) Ai ∈ F ∀i ∈ ℕ =⇒ ⋃_{i=1}^∞ Ai ∈ F
These conditions imply ∅, Ω ∈ F and that F is closed under countable intersection.
Ω ≠ ∅ is the sample space. F ⊂ 2^Ω is a σ-algebra of events. A ∈ F is an event.
(Ω, F) is a measurable space.
µ : F → [0, ∞] is a measure iff
(i) µ(A) ≥ µ(∅) = 0
(ii) Ai ∈ F countable, Ai disjoint =⇒ µ(⋃i Ai) = Σi µ(Ai)
Given (Ω1, F1, ℙ1), . . . , (Ωn, Fn, ℙn) define the product probability space (Ω, F, ℙ) where
Ω ≔ Ω1 × · · · × Ωn = {(ω1, . . . , ωn) : ωi ∈ Ωi}
F ≔ F1 × · · · × Fn = σ({A1 × · · · × An : Ai ∈ Fi})
ℙ ≔ ℙ1 × · · · × ℙn where ℙ(A1 × · · · × An) = ℙ1(A1) · · · ℙn(An)
ℙ exists by Carathéodory’s extension theorem. ℙ is unique by the π-λ theorem.
P ⊂ 2^Ω is a π-system iff
(i) P ≠ ∅
(ii) A, B ∈ P =⇒ A ∩ B ∈ P
L ⊂ 2^Ω is a λ-system iff
(i) Ω∈L
(ii) A, B ∈ L, A ⊂ B =⇒ B − A ∈ L
(iii) Ai ∈ L, Ai ↑ A =⇒ A ∈ L
Generally, λ-systems are not σ-algebras.
A λ-system which is additionally closed under intersection is a σ-algebra.
If A ⊂ 2^Ω then A generates ℓ(A) ≔ ⋂{L : L a λ-system, A ⊂ L},
using that if Lι ⊂ 2^Ω, ι ∈ I, are λ-systems then ⋂_{ι∈I} Lι is a λ-system.
For a measure µ : R → [0, ∞], assume ∃F : ℝ → ℝ such that µ((a, b]) = F(b) − F(a).
Such an F must be nondecreasing and right continuous.
(Lebesgue-Stieltjes) F : ℝ → ℝ nondecreasing, right-continuous =⇒ ∃!µ : R → [0, ∞] where µ((a, b]) = F(b) − F(a).
(Lebesgue measure) The unique measure λ : R → [0, ∞] where λ ((a, b]) = b − a.
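A numerical aside (my illustration, not from the lecture): taking F to be the standard normal distribution function, the induced Lebesgue-Stieltjes set function assigns F(b) − F(a) to (a, b], and its additivity over disjoint half-open intervals can be checked directly.

    # Sketch: the Lebesgue-Stieltjes measure induced by a nondecreasing,
    # right-continuous F assigns mass F(b) - F(a) to (a, b].  F here is the
    # standard normal distribution function, chosen only for illustration.
    from math import erf, sqrt

    def F(x):
        """Standard normal distribution function."""
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    def mu(a, b):
        """Measure of the half-open interval (a, b] induced by F."""
        return F(b) - F(a)

    # Additivity over a finite partition of (-1, 1] into eight pieces:
    parts = sum(mu(-1 + k * 0.25, -1 + (k + 1) * 0.25) for k in range(8))
    assert abs(parts - mu(-1.0, 1.0)) < 1e-12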
4 Sept
{X ∈ B} ≔ X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B}.
A random variable is a function X : (Ω, F) → ℝ such that {X ∈ B} ∈ F ∀B ∈ R.
Define the probability measure µX : R → [0, 1], called the distribution, such that µX(B) = ℙ(X ∈ B).
(ℝ, R, µX) is a probability space.
The distribution function of X is FX : ℝ → [0, 1] such that FX(x) ≔ ℙ(X ≤ x) = µX((−∞, x]).
Every FX satisfies
(i) x ≤ y =⇒ F(x) ≤ F(y)
(ii) xn ↓ x =⇒ F(x) = lim_n F(xn) = F(x+)
(iii) ℙ(X = x) = F(x) − F(x−)
(v) ℙ(X < x) = F(x−)
(vi) F(−∞) ≔ lim_{x→−∞} F(x) = 0
(vii) F(+∞) ≔ lim_{x→+∞} F(x) = 1
Properties (i), (ii), (vi), and (vii) characterize a distribution.
Given F : ℝ → ℝ obeying (i), (ii), (vi), and (vii), there exists a random variable X such that FX = F.
Fleft⁻¹(y) ≔ sup{x ∈ ℝ : F(x) < y} is the left continuous inverse of a distribution F.
Fright⁻¹(y) ≔ sup{x ∈ ℝ : F(x) ≤ y} is the right continuous inverse of a distribution F.
Generally, F⁻¹(y) ≔ Fleft⁻¹(y) is used.
FX continuous =⇒ FX(X) is uniformly distributed on (0, 1).
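The two facts above invert each other, which is worth seeing numerically. A simulation sketch (my example, with F the Exp(1) distribution function, which has a closed-form inverse): F⁻¹(U) realizes a random variable with distribution F, and FX(X) lands back on the uniform.

    # Inverse-transform sampling: X = F^{-1}(U) has distribution F, and
    # F_X(X) is Uniform(0,1) when F_X is continuous.  F here is Exp(1).
    import random
    from math import exp, log

    F = lambda x: 1.0 - exp(-x) if x > 0.0 else 0.0
    F_inv = lambda y: -log(1.0 - y)          # closed-form inverse for Exp(1)

    random.seed(0)
    xs = [F_inv(random.random()) for _ in range(100_000)]
    print(sum(xs) / len(xs))                 # ~ 1, the Exp(1) mean
    us = [F(x) for x in xs]                  # should look Uniform(0,1)
    print(sum(u <= 0.3 for u in us) / len(us))   # ~ 0.3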
9 Sept
Random variables X and Y are equal in distribution, denoted X =ᵈ Y, when FX = FY.
Define the indicator function 1A(x) ≔ 1 if x ∈ A, 0 if x ∉ A.
X : (Ω, F) → (S , S) is measurable if ∀B ∈ S {X ∈ B} ∈ F.
X measurable wrt F is also written X ∈ F.
For X : (Ω, F) → (S, S = σ(A)), if {X ∈ A} ∈ F ∀A ∈ A then X is measurable.
For X : (Ω, F) → (S, S) and f : (S, S) → (T, T), if X and f are measurable then f(X) is measurable.
X1, X2, . . . , Xn random variables, f : (ℝⁿ, Rⁿ) → (ℝ, R) measurable =⇒ f(X1, . . . , Xn) a random variable.
For example, X1 + · · · + Xn is a random variable.
X1 , . . . , Xn , . . . random variables =⇒ inf Xn , sup Xn , lim inf Xn , and lim sup Xn are all random variables.
The extended real numbers are ℝ̄ ≔ ℝ ∪ {−∞, +∞}. The extended Borel sets are R̄ ≔ σ(R ∪ {{−∞}, {+∞}}).
An extended random variable is a function X : (Ω, F) → ℝ̄ such that {X ∈ B} ∈ F ∀B ∈ R̄.
All random variables are automatically extended random variables.
Property p (ω) is satisfied almost everywhere (a. e.) when µ({ω ∈ Ω : ¬p (ω)}) = 0.
Property p(ω) is satisfied almost surely (a. s.) when ℙ({ω ∈ Ω : ¬p(ω)}) = 0, i.e. a. e. with respect to a probability measure ℙ.
There are four incremental stages in the development of the Lebesgue integral:
(1. f = Σi ai 1_{Ai} simple) ∫ f dµ ≔ Σi ai µ(Ai)
(2. f bounded, finite support) ∫ f dµ ≔ sup{∫ ϕ dµ : ϕ ≤ f, ϕ simple} = inf{∫ ϕ dµ : ϕ ≥ f, ϕ simple}
(3. f ≥ 0) ∫ f dµ ≔ sup{∫ h dµ : 0 ≤ h ≤ f, h bounded with finite support}
(4. f = f⁺ − f⁻) ∫ f dµ ≔ ∫ f⁺ dµ − ∫ f⁻ dµ
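Stage (3) can be watched converging numerically. A rough sketch (my construction, against Lebesgue measure): approximate ∫ f dλ by simple-function sums of the bounded truncations min(f, M) supported on [−M, M].

    # Stage (3) sketch: approximate the integral of f >= 0 from below by
    # bounded, finitely supported h = min(f, M) on [-M, M], evaluated as a
    # simple-function sum over a grid of small cells.
    def integral(f, M, n):
        dx = 2.0 * M / n
        return sum(min(f(-M + (k + 0.5) * dx), M) * dx for k in range(n))

    f = lambda x: x * x if 0.0 <= x <= 1.0 else 0.0    # supported on [0, 1]
    for M, n in [(1.0, 10), (2.0, 100), (4.0, 10_000)]:
        print(integral(f, M, n))      # approaches the true value 1/3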
If ∃f : ℝ → [0, ∞) such that ℙ(X ∈ B) = ∫_B f(x) dx then f is the density function of X, denoted fX.
By definition FX(x) = ∫_{−∞}^x fX(y) dy and ∫_ℝ fX(y) dy = 1.
The expected value of X on (Ω, F, ℙ) is 𝔼[X] ≔ ∫ X dℙ.
𝔼[X] is also called the mean and denoted µX or µ.
On a discrete space, 𝔼[X] = Σi ωi ℙ(X = ωi).
ℙ(X ∈ A) = 𝔼[1A].
(transport formula) For X : (Ω, F, ℙ) → (S, S) and g : (S, S) → (ℝ, R) with g measurable, g ≥ 0 or g bounded,
𝔼[g(X)] = ∫ g(X) dℙ = ∫ g(x) dµX(x) = ∫ g(x) dFX(x) = ∫ g(x) fX(x) dx,
the last two equalities when S = ℝ and, for the density form, X has a density.
The transport formula implies g(X) ∈ L¹(Ω, F, ℙ) ⇐⇒ g ∈ L¹(ℝ, R, µX).
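As a sanity check of the transport formula (my example, not the notes’): computing 𝔼[g(X)] on Ω by Monte Carlo and on ℝ against the density should agree.

    # Transport-formula check with X ~ N(0,1) and g(x) = x²: the Ω-side
    # Monte Carlo average and the ℝ-side integral ∫ g f_X dx are both ~ 1.
    import random
    from math import exp, pi, sqrt

    g = lambda x: x * x
    f_X = lambda x: exp(-x * x / 2.0) / sqrt(2.0 * pi)

    random.seed(1)
    lhs = sum(g(random.gauss(0.0, 1.0)) for _ in range(200_000)) / 200_000

    dx = 1e-3
    rhs = sum(g(k * dx - 8.0) * f_X(k * dx - 8.0) * dx
              for k in range(int(16.0 / dx)))
    print(lhs, rhs)                   # both near E[X²] = 1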
𝔼[Xᵏ] for k ∈ ℕ is called the kth moment of X.
𝔼[(X − 𝔼[X])ᵏ] = 𝔼[(X − µ)ᵏ] for k ∈ ℕ is called the kth centered moment of X.
The conditional probability that event A occurs given event B is ℙ(A|B) = ℙ(A ∩ B)/ℙ(B) provided ℙ(B) ≠ 0.
Independence of a finite collection requires these logical extensions of the above to hold:
(σ-algebras G1, . . . , Gn) ℙ(⋂i Ai) = ∏i ℙ(Ai) for Ai ∈ Gi
(Random variables X1, . . . , Xn) ℙ(⋂i {Xi ∈ Ci}) = ∏i ℙ(Xi ∈ Ci) for Ci ∈ R
(Events A1, . . . , An) ℙ(⋂_{i∈I} Ai) = ∏_{i∈I} ℙ(Ai) whenever I ⊂ {1, . . . , n}
(Classes of events 𝒜1, . . . , 𝒜n) ℙ(⋂_{i∈I} Ai) = ∏_{i∈I} ℙ(Ai) for Ai ∈ 𝒜i whenever I ⊂ {1, . . . , n}
Independence of an infinite collection requires that every finite subcollection be independent.
Pairwise independence of a collection’s elements does not imply the collection is independent.
ℙ(X1 ≤ x1, . . . , Xn ≤ xn) = ∏_{i=1}^n ℙ(Xi ≤ xi) ∀xi =⇒ X1, . . . , Xn independent.
25 Sept
If X1, . . . , Xn are random variables on (Ω, F, ℙ) then X1, . . . , Xn independent ⇐⇒ µ_{X1,...,Xn} = ∏_{i=1}^n µXi.
(WLLN for triangular arrays) Sn = X_{n,1} + · · · + X_{n,n} and (var Sn)/bn² −→ 0 for some bn =⇒ (Sn − 𝔼[Sn])/bn −→ 0 in L².
7 Oct
A random variable X with large tails can be truncated outside a threshold M, i.e. X̄ ≔ X 1_{|X|≤M}.
(WLLN for triangular arrays with independent rows) Construct bn > 0, bn ↑ ∞ such that both
Σ_{k=1}^n ℙ(|X_{n,k}| ≥ bn) −→ 0 and (1/bn²) Σ_{k=1}^n 𝔼[X_{n,k}² 1_{|X_{n,k}|≤bn}] −→ 0.
Define an = Σ_{k=1}^n 𝔼[X_{n,k} 1_{|X_{n,k}|≤bn}] and Sn = X_{n,1} + · · · + X_{n,n}. Under these conditions (Sn − an)/bn −→ 0 in probability.
X ≥ 0, f : [0, ∞] → [0, ∞] increasing, f ∈ C¹, and f(0) = 0 =⇒ 𝔼[f(X)] = ∫_0^∞ f′(x) ℙ(X ≥ x) dx.
In particular, 𝔼[Xᵖ] = ∫_0^∞ p xᵖ⁻¹ ℙ(X > x) dx allows estimating moments using tails.
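A quick numerical check of the tail formula (my example): for X ~ Exp(1), ℙ(X > x) = e^{−x} and 𝔼[X²] = 2 exactly.

    # Verify E[X^p] = ∫_0^∞ p x^{p-1} P(X > x) dx for X ~ Exp(1), p = 2.
    from math import exp

    p, dx = 2, 1e-4
    integral = sum(p * (k * dx) ** (p - 1) * exp(-k * dx) * dx
                   for k in range(1, int(50 / dx)))
    print(integral)                   # ~ 2.0 = E[X²]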
For discrete N : Ω → ℕ ∪ {∞}, 𝔼[N] = Σ_{n=1}^∞ ℙ(N ≥ n).
Use p = 1 − ε to show x ℙ(|X| > x) −→ 0 =⇒ 𝔼[|X|^{1−ε}] < ∞ for ε > 0.
(General WLLN) X1, . . . , Xn, . . . i. i. d., x ℙ(|X| > x) −→ 0 =⇒ Sn/n − µn −→ 0 in probability where µn = 𝔼[X 1_{|X|≤n}].
X ∈ L¹ =⇒ x ℙ(|X| > x) −→ 0 since x ℙ(|X| > x) = 𝔼[x 1_{|X|>x}], x 1_{|X|>x} ≤ |X|, and x 1_{|X|>x} → 0 as x → ∞.
(L¹-WLLN) Xi ∈ L¹ i. i. d. and 𝔼[Xi] = µ =⇒ Sn/n −→ µ in probability.
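A simulation sketch of the L¹ weak law (illustrative): sample means of i. i. d. Exp(1) draws concentrate near µ = 1 as n grows.

    # Sample means of i.i.d. Exp(1) variables settle near µ = 1.
    import random

    random.seed(2)
    for n in (10, 1_000, 100_000):
        s_n = sum(random.expovariate(1.0) for _ in range(n))
        print(n, s_n / n)             # S_n / n −→ 1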
9 Oct
Define lim inf An ⊂ lim sup An for sequences of subsets of Ω:
lim inf An ≔ ∪n ∩l≥n Al = lim_{n→∞} ∩l≥n Al = {ω that are in all but finitely many An’s}
lim sup An ≔ ∩n ∪l≥n Al = lim_{n→∞} ∪l≥n Al = {ω that are in infinitely many An’s}
lim sup An is read An infinitely often (i. o.), i.e. {An i. o.} ≔ lim sup An.
Xn −→ X a.s. ⇐⇒ ∀ε > 0 ℙ({|Xn − X| ≥ ε} i. o.) = 0, using that {Xn −→ X}ᶜ = ∪ε>0 ∩n ∪l≥n {|Xl − X| ≥ ε}.
Xn converges fast to X, denoted Xn −→ X fast, if ∀ε > 0 Σ_{n=1}^∞ ℙ({|Xn − X| ≥ ε}) < ∞.
Xn −→ X fast =⇒ Xn −→ X a.s. by Borel-Cantelli.
(convergence of random variables) Using that, in a topological space, yn −→ y iff every subsequence y_{nm} has a further subsequence y_{nm_k} −→ y:
(i) Xn −→ X fast =⇒ Xn −→ X a.s. (by BC1) =⇒ Xn −→ X in probability
(ii) Xn −→ X in probability =⇒ ∃ subsequence X_{nk} : X_{nk} −→ X fast
(iii) Xn −→ X in probability ⇐⇒ every subsequence X_{nm} has a further subsequence X_{nm_k} −→ X a.s.
There exist sequences that converge in probability but not almost surely.
Convergence in probability comes from a metric, but almost sure convergence does not come from any topology.
(L⁴-SLLN) Xi ∈ L⁴ i. i. d. =⇒ Sn/n −→ µ a.s.
14 Oct
(SLLN) Xi ∈ L¹ i. i. d. =⇒ Sn/n −→ µ a.s.
(Borel-Cantelli 2) An independent and Σ_{n=1}^∞ ℙ(An) = ∞ =⇒ ℙ(An i. o.) = 1.
For independent An, the Borel-Cantelli lemmas impose a zero-one law forcing ℙ(An i. o.) to be either 0 or 1.
(Borel-Cantelli 2 extension) An independent and Σ_{n=1}^∞ ℙ(An) = ∞ =⇒ (1_{A1} + · · · + 1_{An})/(ℙ(A1) + · · · + ℙ(An)) −→ 1 a.s. as n → ∞.
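A simulation sketch of the extension (my construction): independent events An with ℙ(An) = 1/n have a divergent probability sum, and the hit count tracks the accumulated mass.

    # Independent A_n with P(A_n) = 1/n: Σ P(A_n) = ∞, so the number of A_n
    # that occur, divided by Σ_{k≤n} 1/k, should drift toward 1.
    import random

    random.seed(3)
    hits, mass = 0, 0.0
    for n in range(1, 1_000_001):
        hits += random.random() < 1.0 / n     # indicator of A_n
        mass += 1.0 / n                       # P(A_n)
        if n in (10, 10_000, 1_000_000):
            print(n, hits / mass)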
When taking expectations, everything using almost sure convergence can instead use weak convergence.
Fn a tight sequence of distribution functions =⇒ ∃ subsequence F_{nk} and a distribution function F such that F_{nk} ⇒ F.
(Prokhorov’s theorem) For P, a set of probability measures on ℝ, P is tight ⇐⇒ ∀Pn ∃P_{nk}, P∞ : P_{nk} ⇒ P∞.
For ϕ : ℝ → [0, ∞) such that lim_{|x|→∞} ϕ(x) = +∞, if ∫ ϕ(x) dF(x) ≤ C < ∞ ∀F ∈ P then P is tight.
Uniform U(a, b) with density f(x) = 1/(b − a) has characteristic function ϕ(t) = (e^{itb} − e^{ita})/(it(b − a)).
Normal N(0, 1) with density f(x) = (1/√(2π)) exp(−x²/2) has characteristic function ϕ(t) = exp(−t²/2).
Normal N(µ, σ²) with density f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)) has characteristic function ϕ(t) = exp(itµ − σ²t²/2).
ϕ = ϕ′ =⇒ µ = µ′ (characteristic functions determine distributions).
∫_{−∞}^∞ |ϕX(t)| dt < ∞ =⇒ X has a continuous, bounded density function fX(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} ϕ(t) dt.
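A numerical sketch of this inversion formula (my example): recover the N(0, 1) density at x = 0 from its integrable characteristic function.

    # Fourier inversion: f_X(x) = (1/2π) ∫ e^{-itx} ϕ(t) dt, truncated to
    # [-T, T] and discretized.  ϕ(t) = exp(-t²/2) is the N(0,1) case.
    from cmath import exp as cexp
    from math import exp, pi, sqrt

    phi = lambda t: exp(-t * t / 2.0)

    def f_X(x, T=12.0, n=4000):
        dt = 2.0 * T / n
        s = sum(cexp(-1j * (-T + k * dt) * x) * phi(-T + k * dt)
                for k in range(n))
        return (s * dt / (2.0 * pi)).real

    print(f_X(0.0), 1.0 / sqrt(2.0 * pi))    # both ~ 0.3989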
30 Oct
(continuity theorem) For a sequence of probability distributions µn and their characteristic functions ϕn (t),
(i) µn ⇒ µ∞ =⇒ ϕn (t) −→ ϕ∞ (t) ∀t
(ii) ϕn (t) → ϕ∞ (t) ∀t, ϕ∞ continuous at 0 =⇒ µn ⇒ µ∞ , µn tight
where µ∞ is another probability distribution and ϕ∞ is its characteristic function.
(CLT) Xi ∈ L² i. i. d., 𝔼[Xi] = µ, 0 < var Xi = σ², Sn = X1 + · · · + Xn =⇒ (Sn − nµ)/(σ√n) ⇒ N(0, 1).
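A simulation sketch (illustrative): standardized sums of i. i. d. Uniform(0, 1) draws, with µ = 1/2 and σ² = 1/12, behave like N(0, 1); e.g. ℙ(Z ≤ 1) ≈ 0.8413.

    # Standardize S_n for Uniform(0,1) summands and check a normal quantile.
    import random
    from math import sqrt

    random.seed(4)
    n, trials, mu, sigma = 1_000, 20_000, 0.5, sqrt(1.0 / 12.0)
    zs = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * sqrt(n))
          for _ in range(trials)]
    print(sum(z <= 1.0 for z in zs) / trials)    # ~ Φ(1) ≈ 0.8413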
Using |e^{ix} − Σ_{k=0}^n (ix)ᵏ/k!| ≤ min(|x|^{n+1}/(n + 1)!, 2|x|ⁿ/n!) with n = 2 gives the estimate |ϕ(t) − (1 − σ²t²/2)| ≤ 𝔼[min(|tX|³/6, t²X²)].
cn → c ∈ ℂ =⇒ (1 + cn/n)ⁿ → e^c
(self-normalized sums) Xi i. i. d., 𝔼[Xi] = 0, var Xi = σ² ∈ (0, ∞) =⇒ (Σ_{i=1}^n Xi)/√(Σ_{i=1}^n Xi²) ⇒ N(0, 1).
(B(n, p) − np)/√(pqn) ⇒ N(0, 1), so ℙ(B(n, p)/n ∈ (p + a√(pq)/√n, p + b√(pq)/√n)) −→ ∫_a^b (1/√(2π)) e^{−x²/2} dx by the CLT.
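A numeric sketch of this normal approximation (my parameters): compare the exact binomial probability of the standardized window (a, b] with Φ(b) − Φ(a).

    # Exact P(a < (B(n,p) - np)/√(npq) ≤ b) versus the normal limit.
    from math import comb, erf, sqrt

    n, p, a, b = 400, 0.3, -1.0, 1.0
    q = 1.0 - p
    lo, hi = n * p + a * sqrt(n * p * q), n * p + b * sqrt(n * p * q)
    exact = sum(comb(n, k) * p**k * q**(n - k)
                for k in range(n + 1) if lo < k <= hi)
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    print(exact, Phi(b) - Phi(a))            # both ~ 0.68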
11 Nov
(tower property) F1 ⊂ F2 =⇒ 𝔼[𝔼[X|F2]|F1] = 𝔼[𝔼[X|F1]|F2] = 𝔼[X|F1], i.e. “smaller σ-algebra wins”
(taking out what is known) X ∈ G and XY, Y ∈ L¹ =⇒ 𝔼[XY|G] = X 𝔼[Y|G].
(monotone convergence) X ∈ L¹ and Xn ↑ X =⇒ 𝔼[Xn|G] ↑ 𝔼[X|G].
A discrete time stochastic process is a sequence of random variables {Xn} on (Ω, F, ℙ) indexed by time.
{Xn } (ω) = {X0 (ω), X1 (ω), . . . } for fixed ω ∈ Ω is called a path of a stochastic process.
Over time, a {sub-, true, super-}martingale {increases, stays the same, decreases} in conditional expectation:
Mn submartingale ⇐⇒ 𝔼[Mn+1 − Mn|Fn] ≥ 0
Mn martingale ⇐⇒ 𝔼[Mn+1 − Mn|Fn] = 0
Mn supermartingale ⇐⇒ 𝔼[Mn+1 − Mn|Fn] ≤ 0
H is predictable if Hn ∈ Fn−1 .
Predictable means “deterministic” given the filtration’s past: Hn is known by time n − 1.
Predictable H may be thought of as a betting strategy, i.e. Hn (Mn − Mn−1 ) is the payoff at time n.
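A simulation sketch of this reading (my construction): let Mn be a symmetric random walk and Hn a bet decided from the path through time n − 1; the cumulative winnings (H · M)n = Σ_{k≤n} Hk(Mk − Mk−1) then average to zero, as a martingale transform must.

    # Betting on a fair walk with a predictable strategy yields zero expected
    # winnings: (H·M) is again a martingale.
    import random

    def winnings(n):
        M, total = 0, 0.0
        for k in range(1, n + 1):
            H = 1.0 if M <= 0 else 0.5        # predictable: uses M_{k-1} only
            step = random.choice((-1, 1))
            total += H * step                 # H_k (M_k - M_{k-1})
            M += step
        return total

    random.seed(5)
    print(sum(winnings(200) for _ in range(20_000)) / 20_000)   # ~ 0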
{T = n} ∈ Fn =⇒ {T ≤ n} = ({T = 0} ∪ · · · ∪ {T = n − 1} ∪ {T = n}) ∈ Fn .
{T = n} ∈ Fn =⇒ {T < n} = ({T = 0} ∪ · · · ∪ {T = n − 1}) ∈ Fn .
X^T_n ≔ X_{T∧n} = X_{T(ω)∧n}(ω) is a stopped process: one which “runs until stopping time T occurs.”
(H · X)n = X_{T∧n} − X0 for the predictable strategy Hn = 1_{n≤T}.
Using Xn submartingale and a < b, let T_{2k+1} = min{n ≥ T_{2k} : Xn ≤ a} and T_{2k+2} = min{n ≥ T_{2k+1} : Xn ≥ b}.
Call (T_{2k+1}, T_{2k+2}) an interval of upcrossing and denote Un(a, b) the number of upcrossings by time n.
The trading strategy Hn = Σ_{k=0}^∞ 1_{T_{2k+1} < n ≤ T_{2k+2}} is predictable. It represents buy below a and sell above b.
Define Yn = Xn ∨ a = a + (Xn − a)⁺ so that (H · Y)n represents gains over upcrossings plus a possible last gain.
Then (b − a) Un(a, b) ≤ (H · Y)n. Also 1 − H ≥ 0 =⇒ (1 − H) · Y submartingale =⇒ 𝔼[((1 − H) · Y)n] ≥ 0.
(submartingale upcrossing inequality) ∴ (b − a) 𝔼[Un(a, b)] ≤ 𝔼[(Xn − a)⁺] − 𝔼[(X0 − a)⁺]
(martingale convergence) {Mn, Fn}n submartingale, supn 𝔼[Mn⁺] < ∞ =⇒ Mn −→ M a.s., M ∈ L¹.
{Mn, Fn}n supermartingale, Mn ≥ 0 =⇒ Mn −→ M a.s., 𝔼[M] ≤ 𝔼[M0].
20 Nov
(Xi)_{i∈I} is uniformly integrable (u. i.) if sup_{i∈I} 𝔼[|Xi| 1_{|Xi|≥M}] → 0 as M → ∞.
Xn −→ X in probability =⇒ ( (Xn)n u. i. ⇐⇒ Xn −→ X in L¹ ⇐⇒ 𝔼[|Xn|] → 𝔼[|X|] < ∞ ).
Mn martingale =⇒ ( (Mn)n u. i. ⇐⇒ Mn −→ M a.s. and in L¹ ⇐⇒ Mn −→ M in L¹ ⇐⇒ ∃M ∈ L¹ : Mn = 𝔼[M|Fn] ).
Mn submartingale =⇒ ( (Mn)n u. i. ⇐⇒ Mn −→ M a.s. and in L¹ ⇐⇒ Mn −→ M in L¹ ).
(multistep at n = ∞) {Mn}n submartingale =⇒ 𝔼[M∞|Fk] ≥ Mk since 𝔼[Mn|Fk] −→ 𝔼[M∞|Fk] in L¹.
M ∈ L¹ is called a last element of an adapted {sub-, true, super-}martingale Mn if 𝔼[M|Fk] {≥, =, ≤} Mk ∀k.
A {sub-, true, super-}martingale has a last element ⇐⇒ {(Mn⁺)n, (Mn)n, (Mn⁻)n} u. i.
25 Nov
(Lévy’s theorem) Given (Fn)n and X ∈ L¹(Ω, F, ℙ), 𝔼[X|Fn] −→ 𝔼[X|F∞] a.s. where F∞ = σ(∪_{n=0}^∞ Fn).
(Lévy’s corollary) A ∈ F∞ =⇒ ℙ(A|Fn) −→ 1A a.s. and in L¹.
(Lévy’s 0-1 law) Fn ↑ F∞, A ∈ F∞ =⇒ 𝔼[1A|Fn] −→ 1A a.s. and in L¹.
(martingale inequality) Xn submartingale, X̄n ≔ max_{k=0,...,n} Xk⁺ =⇒ λ ℙ(X̄n ≥ λ) ≤ 𝔼[1_{X̄n≥λ} Xn⁺] ≤ 𝔼[Xn⁺].
(Doob’s Lᵖ maximal inequality) Xn submartingale, X̄n ≔ max_{k=0,...,n} Xk⁺, 1 < p < ∞ =⇒ 𝔼[X̄nᵖ] ≤ (p/(p − 1))ᵖ 𝔼[(Xn⁺)ᵖ].
(Wald I) Xi ∈ L¹ i. i. d., Sn = X1 + · · · + Xn, N stopping time where 𝔼[N] < ∞ =⇒ 𝔼[SN] = 𝔼[N] 𝔼[X].
(Wald II) Above assumptions plus 𝔼[X] = 0, 𝔼[X²] = σ² < ∞ =⇒ 𝔼[SN²] = σ² 𝔼[N].
Xn martingale, 1 < p < ∞, supn ‖Xn‖p < ∞ =⇒ Xn −→ X∞ a.s. and in Lᵖ, X∞ ∈ F∞, Xn = 𝔼[X∞|Fn].
For 1 < p < ∞ we identify Lᵖ(Ω, F∞, ℙ) ≡ Hᵖ ≔ {Xn martingale : supn ‖Xn‖p < ∞}.
4 Dec
Within (Ω, F), a sequence of σ-algebras {Gn }n is called a backward filtration if F ⊃ Gn ⊃ Gn+1 ∀n.
{Mn ∈ L¹, Gn} is a backward {sub-, true, super-}martingale if 𝔼[Mn|Gn+1] {≥, =, ≤} Mn+1.
For Xi ∈ L¹ i. i. d. with Sn = X1 + · · · + Xn:
(1) Gn = σ(Sn, Sn+1, . . .) is a backward filtration,
(2) {Sn/n, Gn} is a backward martingale, and
(3) Sn/n −→ 𝔼[S1|G∞] = 𝔼[X1|G∞] = 𝔼[Xi] a.s. and in L¹.