Download as pdf or txt
Download as pdf or txt
You are on page 1of 155

Martingale Limit Theory

and
Stochastic Regression Theory

Ching-Zong Wei
Contents

1 Martingale Limit Theory 2


1.1 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Martingale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3 Basic Inequalities (maximum inequalities) . . . . . . . . . . . . . . . 25
1.4 Square function inequality . . . . . . . . . . . . . . . . . . . . . . . . 29
1.5 Series Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2 Stochastic Regression Theory 109


2.1 Introduction: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

1
Chapter 1

Martingale Limit Theory

Some examples of Martingale:

Example 1.1 Let yi = ayi−1 + εi , where εi i.i.d. with E(εi ) = 0, Var(εi ) = σ 2 , and
if we estimate a by least squares estimation
Pn
yi−1 yi
â = Pi=1
n 2
i=1 yi−1
Pn
yi−1 εi
â − a = Pi=1n 2
,
i=1 yi−1

then Sn = ni=1 yi−1 εi is a martingale.


P

Example 1.2 Likelihood Ratio:


Given Θ, and

Ln (θ) = fθ (X1 , . . . , Xn )
= fθ (Xn |X1 , . . . , Xn−1 ) · fθ (X1 , . . . , Xn−1 )
Yn
= fθ (Xi |X1 , . . . , Xi−1 ) · fθ (X1 ),
i=2

Ln (θ)
then Rn (θ) = Ln (θ0 )
, Rn (θ) is a martingale.

2
For example, if Xi = θui + εi , where ui is constant, {εi } is i.i.d. N (0.1), then
n (x −θu )2
1
P
i=1 i i
fθ (x1 , . . . , xn ) = ( √ )n e− 2

Pn
(x −θui )2
Pn
fθ (x1 , . . . , xn ) (x −θ u )2
− i=1 2i + i=1 i2 0 i
= e
fθ0 (x1 , . . . , xn )
Pn (θ 2 −θ0
2) Pn
ui xi − u2i
= e(θ−θ0 ) i=1 2 i=1 .

Example 1.3 Likelihood: L0 = 1, d logdθLn (θ) is a martingale.

log Ln (θ) = log fθ (Xn |X1 , . . . , Xn−1 ) + log Ln−1 (θ)


d log fθ (Xn |X1 , . . . , Xn−1 ) d[log Ln (θ) − log Ln−1 (θ)]
ui (θ) = =
dθ dθ
Xn
In (θ) = Eθ (u2i (θ)|X1 , . . . , Xn−1 ).
i=1

Let
dui (θ) d2 log fθ (Xn |X1 , . . . , Xn−1 )
Vi (θ) = = ,
dθ dθ2
since
Eθ (u2i (θ)|X1 , . . . , Xi−1 ) = −Eθ (Vi (θ)|X1 , . . . , Xi−1 )
and n
X
Jn (θ) = Vi (θ),
i=1

Then Jn (θ) + In (θ) is a martingale.

Example 1.4
PZnBranching Process with Immigration :
Let Zn+1 = i=1 Yn+1,i +In+1 , where {Yj,i } is i.i.d. with mean E(Yj,i ) = m, Var(Yj,i ) =

3
σ 2 , and {In } is i.i.d. with mean E(In ) = b, Var(In ) = λ, then

E(Zn+1 |Fn ) = mZn + b


Zn+1 = E(Zn+1 |Fn ) + δn+1
δn+1 = Zn+1 − E(Zn+1 |Fn )
2
E(δn+1 |Fn ) = σ 2 Zn + λ
Zn
X
Zn+1 = mZn + b + { (Yn+1,i − m) + (In+1 − b)}
i=1
p
= mZn + b + σ 2 Zn + λ εn+1 ,

where
δn+1
εn+1 = √ 2
.
σ Zn + λ

4
Consider (Ω, F, P), where

Ω: Sample space

F: σ–algebra ⊂ 2Ω

P: probability

{X = ai } = Ei , i = 1, . . . , n
FX = minimum σ–algebra ⊃ {E1 , . . . , En }
FX1 ,X2 = σ–algebra ⊃ {X1 = ai , X2 = bj } i = · · · , j = · · ·
Note that FX1 ,X2 ⊃ FX1 .

{Xn } is said to be {Fn }–adaptive if Xn is Fn –measurable (i.e. FXn ⊂ Fn .)

1.1 Conditional Expectation


Main purpose: Given X1 = a1 , . . . , Xn = an to find the expectation of Y , i.e. to find
E(Y |X1 = a1 , . . . , Xn = an ).
(Ω, F, P) is a probability space.
Given an event B with P (B) > 0, the conditional probability given B is defined
to be
P (A ∩ B)
P (A|B) = ∀A ∈ F,
P (B)
then (Ω, F, P(·|B)) is a probability space.
Given X, we can define
Z
E(X|B) = XdP (·|B).
Pn Pn
Example 1.5 Let X = i=1 ai IAi where Ai = {X = ai }, then E(X|B) = i=1 ai P (Ai |B).

Ω = ∪∞ i=1 Bi , where Bi ∩ Bj = ∅ if i 6= j.
F = σ(Bi ), P 1≤i<∞
E(X|F) = ∞ i=1 E(X|Bi )IBi

Observe that if X = ni=1 ai IAi , Ω = ∪li=1 Bi , Bi ∩ Bj = ∅ if i 6= j, then


P

(i) E(X|F) is F–measurable and E(X|F) ∈ L1 ,


R R
(ii) ∀ G ∈ F, G E(X|F)dP = G XdP .

5
Sol :
(i) E(X|F) = li=1 E(X|Bi )IBi ,
P

|E(X|F)| ≤ li=1 |E(X|Bi )| < ∞ ⇒ E(X|F) ∈ L1


P

(ii) ∀ G ∈ F
Z l
Z X
E(X|F)dP = E(X|Bi )IBi dP
G G i=1
l
X
= E(X|Bi )P (Bi ∩ G)
i=1
Xl Xn
= aj P (Aj |Bi )P (Bi ∩ G)
i=1 j=1
n
X Xl
= aj ( P (Aj |Bi )P (Bi ∩ G))
j=1 i=1
Xn
= aj P (Aj ∩ G).
j=1

Since by hypothesis G ∈ F,
∃ an index set I s.t. G = ∪i∈I Bi
l
X X X
P (Aj |Bi )P (Bi ∩ G) = P (Aj |Bi )P (Bi ) = P (Aj ∩ Bi )
i=1 i∈I i∈I
= P (Aj ∩ (∪i∈I Bi )) = P (Aj ∩ G).

Definition 1.1 (Ω, G, P) is a probability space. Let F ⊂ G, X ∈ L1 . Define the


conditional expectation of X given F to be a random variable that satisfies (i) and
(ii).

Existence and Uniqueness:


Uniqueness: Assume Z, W both satisfies (i) and (ii).
Let G = Z > W .
By

6
(i) G is F–measurable,
R R R
(ii) G (Z − W )dP = G XdP − G XdP = 0
⇒ P (G) = 0.

Recall that Z ≥ 0 a.s. and E(Z) = 0 ⇒ P (Z > 0) = 0.


Similarly, P (W > Z) = 0.

Existence: X ≥ 0, X = li=1 ai IAi


P

Define ν(G) = G XdP = li=1 ai P (Ai ∩ G) ∀ G ∈ F.


R P
Then ν is a (σ–finite) measure on F.

ν  P|F = P̃ (P̃ (G) = 0 ⇒ ν(G) = 0)

By Radon-Nikodym theorem ∃ F–measurable function f


Z Z
s.t. f dP = f dP̃ = ν(G)
G G

so f = E(X|F) a.s.

• derivative : 4f /4t

• density : contents/unit vol

• ratio

Radon-Nikodym Theorem : Assume that ν and µ are σ–finite measure on F s.t.


ν  µ. Then ∃ F–measurable function f s.t.
Z

f dµ = ν(A) ∀A ∈ F (f = ).
A dµ
1. transformation of X −→ new measure

2. FA 6= FB ⇒ E(X|FA ) 6= E(X|FB )

Example 1.6

7
1. Discrete : F = σ(Bi , 1 ≤ i < ∞) X ∈ L1

R
X Bi
XdP
E(X|F) = IBi
i=1
P (Bi )

2. Continuous : Let R f (x, y1 , . . . , yn ) be the joint density of (X, Y1 , . . . , Yn ) and


g(y1 , . . . , yn ) = f (x, y1 , . . . , yn )dx,
f (x,y1 ,...,yn )
Set f (x|y1 , . . . , yn ) = g(Ỹ )
I[g(Ỹ )6=0] , Ỹ = (y1 , . . . , yn ).
Then E(ϕ(X)|Y1 , . . . , Yn ) = h(Y1 , . . . , Yn ) a.s.,
R
where h(y1 , . . . , yn ) = ϕ(x)f (x|y1 , . . . , yn )dx.
We only have to show for any Borel set B ⊂ Rn ,
Z
E(h(Ỹ )IB ) = h(Ỹ )g(Ỹ )dỸ , Ỹ = (Y1 , . . . , Yn )
B
Z Z
= [ ϕ(x)f (x|Ỹ )dx]g(Ỹ )dỸ
B
Z Z
= ϕ(x)f (x, Ỹ )dxdỸ
ZBZ
= ϕ(x)IB f (x, Ỹ )dxdỸ
= E(ϕ(X)IB )
= E(E(ϕ(X)IB |Ỹ ))

⇒ ϕ(X) = h(Ỹ )

Proposition 1.1 Let X, Y ∈ L1 ,


1. E[E(X|F)]
R = E X. R
Proof : Ω E(X|F)dP = Ω XdP .

2. E(X|{∅, Ω}) = E X.

3. If X is F–measurable then
R E(X|F) = XR a.s..
Proof : Since ∀G ∈ F G E(X|F)dP = G XdP .

4. If X = cR ,a constant,
R a.s. then E(X|F) = c a.s..
Proof : G XdP = G cdP, Y ≡ c is F–measurable.

8
5. ∀ constantsR a, b E(aXR + bY |F) = aE(X|F) + bE(Y |F).
Proof : G
(rhd) = G (lhs).
6. X ≤ Y a.s. ⇒ E(X|F) ≤ E(Y |F).
Proof : Use (5), we only show that
X − Y = Z ≥ 0 a.s. ⇒ E(Z|F) ≥ 0 a.s..
Let A = {E(Z|F) < 0}, then
Z Z
0≤ ZdP = E(Z|F)dP ⇒ P (A) = 0.
A A

7. |E(X|F)| ≤ E(|X||F) a.s..


8. |Xn | ≤ Y a.s., Y ∈ L1 . If limn→∞ Xn = X a.s., then
lim E(Xn |F) = E(X|F) a.s..
n→∞

Proof :
Set Zn = supk≥n |Xk −X|, then Zn ≤ 2Y . So Zn ∈ L1 , and Zn ↓ ⇒ E(Zn |F) ↓ .
So ∃Z s.t. limn→∞ E(Zn |F) = Z a.s.. We only have to show that Z = 0 a.s..
Since |E(Xn |F) − E(X|F)| ≤ E(|Xn − X||F) ≤ E(Zn |F).
Note that Z ≥ 0 a.s.. We only have to prove E Z = 0.
Since E(Zn |F) ↓ Z, hence
E Z ≤ lim E(E(Zn |F)) = lim E(Zn ) = E( lim Zn ) = 0
n→∞ n→∞ n→∞

⇒ E Z = 0.

Theorem 1.1 If X is F–measurable and Y, XY ∈ L1 , then E(XY |F) = XE(Y |F).
Proof :
1. X = IG where G ∈ F
∀B∈F
Z Z Z Z
E(XY |F)dP = XY dP = IG Y dP = Y dP
B B B B∩G
Z
= E(Y |F)dP (Since B ∩ G ∈ F)
ZB∩G Z
= IG E(Y |F)dP = XE(Y |F)dP.
B B

So E(XY |F) = XE(Y |F).

9
P 2
2. Find Xn s.t. Xn = nk=0 nk I[ k ≤x< k+1 ] − nk I[− k+1 <x≤− k ] ,
n n n n
then |Xn | ≤ |X|, and Xn → X a.s..
From (1), we obtain that E(Xn Y |F) = Xn E(Y |F).
Now Xn Y → XY a.s.
|Xn Y | = |Xn ||Y | ≤ |XY |
byD.C.T.
limn→∞ E(Xn Y |F) = E(limn→∞ Xn Y |F) = E(XY |F).
But limn→∞ Xn E(Y |F) = XE(Y |F) a.s..
So E(XY |F) = XE(Y |F).

Theorem 1.2 (Towering)


If X ∈ L1 and F1 ⊂ F2 , then E[E(X|F2 )|F1 ] = E(X|F1 ).
Proof : ∀ B ∈ F1 then B ∈ F2 and
Z Z
E[E(X|F2 )|F1 ]dP = E(X|F2 )dP (Since B ∈ F1 )
B B
Z
= XdP (Since B ∈ F2 ).
B

So E[E(X|F2 )|F1 ] = E(X|F1 ) a.s..

Remark 1.1 E[E(X|F1 )|F2 ] = E(X|F1 )E[1|F2 ] = E(X|F1 ), since E(X|F1 ) is F2 –


measurable.

Jensen’s Inequality : If ϕ is a convex function on R and X, ϕ(X) ∈ L1 then


ϕ(E(X|F)) ≤ E(ϕ(X)|F) a.s..
Proof :
1. Let X = ki=1 ai IAi , where ∪ki=1 Ai = Ω, and Ai ∩ Aj = ∅ if i 6= j, then
P

k
X
E(X|F) = ai E(IAi |F).
i=1

Since
k
X Xk
E(IAi |F) = E( IAi |F) = E(1|F) = 1 a.s.,
i=1 i=1

10
so
k
X
ϕ(E(X|F)) ≤ E(IAi |F)ϕ(ai )
i=1
Xk
= E( ϕ(ai )IAi |F) = E(ϕ(X)|F)
i=1

P
2. Find Xn as before (i.e., Xn is of the form ai IAi , |Xn | ≤ |X|, and Xn → Xa.s..)
Then ϕ(E(Xn |F)) ≤ E(ϕ(Xn )|F).
First observe that E(Xn |F) → E(X|F)a.s.. By continuity of ϕ,

lim ϕ(E(Xn |F)) = ϕ( lim E(Xn |F)) = ϕ(E(X|F))


n→∞ n→∞

Fix m,we can find a convex function ϕm such that ϕm (x) = ϕ(x), ∀|x| ≤ m,
and |ϕm (x)| ≤ Cm (|x| + 1), ∀x, and ϕ(x) ≥ ϕm (x), ∀x.
Fix m, ∀n,
|ϕm (xn )| ≤ Cm (|xn | + 1) ≤ Cm (|x| + 1),
so
lim E[ϕm (xn )|F] = E[ lim ϕm (xn )|F] = E[ϕm (x)|F],
n→∞ n→∞

E[ϕ(x)|F] ≥ sup E[ϕm (x)|F] = sup lim E[ϕm (xn )|F]


m m n→∞
≥ sup lim ϕm (E(Xn |F)) = sup ϕm [ lim E(Xn |F)]
m n→∞ m n→∞

= sup ϕm [E(X|F)] = ϕ[E(X|F)] a.s.


m

Some properties of convex function ϕ :

• If λi ≥ 0, ni=1 λi = 1 then ϕ( ni=1 λi xi ) ≤ ni=1 λi ϕ(xi )


P P P

• The geometry property

• ϕ is continuous(since right-derivative and left-derivative exist)

11
Corollary 1.1 If X ∈ Lp , p ≥ 1 then E(X|F) ∈ Lp .
Proof : Since ϕ(x) = |x|p is convex if p ≥ 1, then

|E(X|F)|p ≤ E(|X|p |F) a.s.

and
E|E(X|F)|p ≤ EE(|X|p |F) = E|X|p < ∞.

Homework :
1 1
1. If p > 1 and p
+ q
= 1,X ∈ Lp , Y ∈ Lq , then
1 1
E(|XY ||F) ≤ E(|X|p |F) p E(|Y |q |F) q a.s..

2. If X ∈ L2 and Y ∈ L2 (F) = {U : U ∈ L2 and U is F–measurable}, then

E(X − Y )2 = E(X − E(X|F))2 + E(E(X|F) − Y )2 .

Therefore
inf2 E(X − Y )2 = E(X − E(X|F))2 .
Y ∈L (F )

Proof :

E(X − Y )2 = E(X − E(X|F) + E(X|F) − Y )2


= E(X − E(X|F))2 + E(E(X|F) − Y )2
+2E[(X − E(X|F))(E(X|F) − Y )].

Lemma 1.1 E(X − E(X|F))U = 0 if U ∈ L2 (F).


proof:
E[E((X − E(X|F))U |F)] = EU [E((X − E(X|F))|F]
= EU [E(X|F) − E(X|F)] = EU · 0 = 0.

Application : Bayes Estimate (X1 , · · · , Xn ) ∼ f (~x|θ) , θ ∈ L2 , Xi ∈ L2 . Use


X1 , · · · , Xn to estimate θ.
Method : find θ̂(X1 , · · · , Xn ) ∈ L2 such that E(θ − θ̂)2 is minimum.

12
Remark 1.2 Let Fn = σ(X1 , · · · , Xn ). Then θ̂ is Fn –measurable
⇔ ∃ measurable function h such that θ̂ = h(X1 , · · · , Xn ) a.s.
So θ̂n = E(θ|Fn ) is the solution.

Question : In what sense θ̂n −→ θ ?

1.2 Martingale
(Ω, F, P)
Fn ⊂ F, Fn ⊂ Fn+1 : history(filtration)

Definition 1.2

(i) Xn is Fn –adaptive ( or adapted to Fn ) if Xn is Fn –measurable ∀n.

(ii) Yn is Fn –predictive ( predictive w.r.t. Fn ) if Yn is Fn−1 –measurable ∀n.

(iii) The σ–fields Fn = σ(X1 , · · · , Xn ) is said to be the natural history of {Xn }.( It
is obvious Fn ↑. )

(iv) {Xn , n ≥ 1} is said to be a martingale w.r.t. {Fn , n ≥ 1} ,if

(1) Xn is Fn –adaptive.
(2) E(Xn |Fn−1 ) = Xn−1 , ∀n ≥ 2.
(3) {εn , n ≥ 1} is said to be a martingale difference sequence w.r.t. {Fn , n ≥ 0}
if E(εn |Fn−1 ) = 0 a.s., ∀n ≥ 1.

Remark 1.3 If {Xn , n ≥ 1} is a martingale w.r.t. {Fn , n ≥ 1} and E(X1 ) = 0,


then ε1 = X1 , εn = Xn − Xn−1 for n ≥ 2 is a martingale difference sequence w.r.t.
{Fn , n ≥ 0}, where F0 = {∅, Ω}, E(ε1 |F0 ) = E(X1 |F0 ) = E(X1 ) = 0.
If {εn , n ≥ 1} is a martingale difference w.r.t. {FnP
, n ≥ 0},{Yn , n ≥ 1} is {Fn , n ≥
0}–predictive, and εn ∈ L , Yn εn ∈ L , then Sn = ni=1 Yi εi is a martingale w.r.t.
1 1

13
{Fn , n ≥ 0}.
Proof :

E(Sn |Fn−1 ) = E(Yn εn + Sn−1 |Fn−1 )


= E(Yn εn |Fn−1 ) + Sn−1 = Yn E(εn |Fn−1 ) + Sn−1
= Yn · 0 + Sn−1 = Sn−1 a.s..

Example 1.7

(a) IfP{εi } are independent r.v.0 s with E(εi ) = 0, and V ar(εi ) = 1, ∀i. Let Sn =
n
i=1 εi , and Fn = σ(ε1 , · · · , εn ), then E(εn |Fn−1 ) = E(εn ) = 0.

(b) Let Xn = ρXn−1 + εn , |ρ| < 1, where εn are i.i.d. with E(εn ) = 0, E(ε2n ) < ∞ and
n
X0 ∈ L2 is independent of {εi , i ≥ 1}, then i=1 Xi−1 εi is a martingale w.r.t.
P
{Fn , n ≥ 0}, where Fn = σ(X0 , ε1 , · · · , εn ), ∀n ≥ 0.
proof :

Xn = ρ2 Xn−2 + ρεn−1 + εn
= · · · = ρn X0 + ρn−1 ε1 + · · · + εn .

(c) Bayes estimate : θ̂n = E(θ|Fn ) where Fn ↑,

E(θ̂n+1 |Fn ) = E(E(θ|Fn+1 )|Fn ) = E(θ|Fn ) = θ̂n .

(d) Likelihood Ratio : Pθ , dPθ = fθ (X1 , · · · , Xn )dµ

fθ (X1 , · · · , Xn ) dPθ /dµ dPθ


Yn (θ, θ0 , X1 , · · · , Xn ) = = =
fθ0 (X1 , · · · , Xn ) dPθ0 /dµ dPθ0

Fn = σ(X1 , · · · , Xn )

Ln (θ, X1 , · · · , Xn ) = fθ (Xn |X1 , · · · , Xn−1 )Ln−1 (θ, X1 , · · · , Xn−1 ).

14
Fix θ0 , θ, then{Yn (θ), Fn , n ≥ 1} is a martingale
Ln (θ)
Eθ0 (Yn (θ)|Fn−1 ) = Eθ0 ( |Fn−1 )
Ln (θ0 )
fθ (Xn |X1 , · · · , Xn−1 ) Ln−1 (θ)
= Eθ0 ( · |Fn−1 )
fθ0 (Xn |X1 , · · · , Xn−1 ) Ln−1 (θ0 )
Ln−1 (θ) fθ (Xn |X1 , · · · , Xn−1 )
= Eθ0 ( |Fn−1 )
Ln−1 (θ0 ) fθ0 (Xn |X1 , · · · , Xn−1 )
fθ (xn |X1 , · · · , Xn−1 )
Z
= Yn−1 (θ) · fθ0 (xn |X1 , · · · , Xn−1 )dxn .
fθ0 (xn |X1 , · · · , Xn−1 )
Z
i.e., E(ϕ(X)|X1 , · · · , Xn ) = ϕ(x)f (x|X1 , · · · , Xn )dx.

(e) { d logdθLn (θ) , Fn = σ(X1 , · · · , Xn )} is a martingale if


∂fθ (xn |X1 , · · · , Xn−1 )
Z Z

dxn = fθ (xn |X1 , · · · , Xn−1 )dxn = 0.
∂θ ∂θ

d log Ln (θ)
Eθ ( |Fn−1 )

d log fθ (Xn |X1 , · · · , Xn−1 ) d log Ln−1 (θ)
= Eθ ( + |Fn−1 )
dθ dθ
∂fθ (Xn |X1 ,··· ,Xn−1 )
∂θ d log Ln−1 (θ)
= Eθ [ |Fn−1 ] +
fθ (Xn |X1 , · · · , Xn−1 ) dθ
Z ∂fθ (xn |X1 ,··· ,Xn−1 )
∂θ d log Ln−1 (θ)
= · fθ (xn |X1 , · · · , Xn−1 )dxn +
fθ (xn |X1 , · · · , Xn−1 ) dθ
d log Ln−1 (θ)
= .

Lemma : If Xn is Fn –adaptive and Xn ∈ L1 , then S1 = X1 , Sn = X1 +
P n
i=2 (Xi − E(Xi |Fi−1 )) is a martingale w.r.t. {Fn , n ≥ 1}.
proof : n ≥ 2,
n
X
∵ E(Sn |Fn−1 ) = X1 + Xi − E(Xi |Fi−1 ) + E[(Xn − E(Xn |Fn−1 ))|Fn−1 ],
i=2
∴ E[(Xn − E(Xn |Fn−1 ))|Fn−1 ] = E(Xn |Fn−1 ) − E(Xn |Fn−1 ) = 0.

15
(f ) Let

d log fθ (Xn |X1 , . . . , Xn−1 )


un (θ) = ,

n
d log Ln (θ) X
= ui (θ),
dθ i=1
n
X
I(θ) = E[u2i (θ)|Fi−1 ],
i=1
dun (θ)
= vn (θ),

Xn
J(θ) = vn (θ),
i=1
Pm
then J(θ)+I(θ) is a martingale, and J(θ)− i=1 E(vi (θ)|Fi−1 ) is a martingale.
We only have to show that

E[vi (θ)|Fi−1 ] = −E[u2i (θ)|Fi−1 ] a.s..

Example : Xn = θXn−1 + εn , n = 1, 2, . . ., and X0 ∼ N (0, c2 ) is independent


of i.i.d. sequence εn ∼ N (0, σ 2 ). Assume that σ 2 and c2 are known, then

Ln (θ, X0 , . . . , Xn ) = fθ (X0 )fθ (X1 |X0 ) · · · fθ (Xn |X0 , . . . , Xn−1 )


1 − x202 1 (xn −θxn−1 )2
= √ e 2c · · · √ e− 2σ 2
2πc 2πσ
1 1 1 x20 1 Pn 2
= ( √ )n+1 n e−[ 2c2 + 2σ2 i=1 (xi −θxi−1 ) ] .
2π cσ
Hence
n
n+1 x20 1 X
log Ln (θ) = log(2π) − log c − n log σ − [ 2 + 2 (xi − θxi−1 )2 ],
2 2c 2σ i=1

therefore
n n
d log Ln (θ) 1 X 1 X
= 2 xi−1 (xi − θxi−1 ) = 2 xi−1 εi .
dθ σ i=1 σ i=1

16
1 1 2
i.e., ui (θ) = 2
Xi−1 (Xi − θXi−1 ) ⇒ u2i (θ) = 4 Xi−1 (Xi − θXi−1 )2 .
σ σ
Then
1 2
E[u2i (θ)|Fi−1 ] = X E[(Xi − θXi−1 )2 |Fi−1 ]
σ 4 i−1
2
1 2 2 Xi−1
= X σ = 2 ,
σ 4 i−1 σ
so
n
1 X 2
I(θ) = X ,
σ 2 i=1 i−1
2
dui (θ) Xi−1
vi (θ) = =− 2 ,
dθ σ
n n
X 1 X 2
J(θ) = vi (θ) = − 2 X .
i=1
σ i=1 i−1

⇒ I(θ) + J(θ) = 0.
Pn Pn
And i=1 u2i (θ) + i=1 E[vi (θ)|Fi−1 ] is also a martingale, since
n n n
1 X 2 2 1 X 2 1 X 2 2
X [Xi − θXi−1 ] − 2 X = X [ε − σ 2 ],
σ 4 i=1 i−1 σ i=1 i−1 σ 4 i=1 i−1 i

E[ε2 − σ 2 |Fi−1 ] = E(ε2 − σ 2 ) = σ 2 − σ 2 = 0.

Definition 1.3 An {Fn , n ≥ 1}– adaptive seq. {Xn } is defined to be a sub–martingale


(super–martingale) if E(Xn |Fn−1 ) ≥ (≤)Xn−1 for n = 2, . . ..

(1)Intuitive : martingale — constant


submartingale — increasing
supermartingale — decreasing
(2)Game : martingale — fair game
submartingale — favorable game
suppermartingale — infarovable game

17
Theorem 1.3
(i) Assume that {Xn , Fn } is a martingale. If ϕ is convex and ϕ(Xn ) ∈ L1 , then
{ϕ(Xn )Fn } is a submartingale.
(ii) Assume that {Xn , Fn } is a submartingale. If ϕ is convex, increasing and E[ϕ(Xn )] ∈
L1 , then {ϕ(Xn ), Fn } is a submartingale.
Proof : By Jensen inequality,

E[ϕ(Xn )|Fn−1 ] ≥ ϕ(E[Xn |Fn−1 ]) = ϕ(Xn−1 ).

For examples, ϕ(x) = |x|p , p ≥ 1 or ϕ(x) = (x − a)+ .


Corollary 1.2 If {Xn , Fn } is a martingale, and Xn ∈ Lp with p ≥ 1, then h(n) =
E|Xn |p is an increasing function.
Proof : Since {|Xn |p , Fn } is a submartingale,

E{E(|Xn+1 |p |Fn )} ≥ E{|Xn |p }.


Pn
Prove that Xn = i=1 εi , where ε0i s are i.i.d. r.v.0 s with E(εi ) = 0, and E|εi |3 < ∞,
then
E|Xn |3 ≤ E|Xn+1 |3 ≤ . . . .
(iii) [Gilat,D.(1977) Ann. Prob. 5,pp.475-481]
For a nonnegative submartingale {Xn , σ(X1 , . . . , Xn )}, there is a martingale
D
{Yn , σ(Y1 , . . . , Yn )} s.t. {Xn } = {|Yn |}.

(iv) Assume that {Xn , σ(X1 , . . . , Xn )} is a nonnegative submartingale. If ϕ is con-


vex and ϕ(Xn ) ∈ L1 , then there is a submartingale {Zn , σ(Z1 , . . . , Zn )} s.t.
D
{ϕ(Xn )} = {Zn }.
Proof : Let ψ(X) = ϕ(|X|). Then ψ(X) is a convex function. By Gilat’s
D
theorem, ∃ martingale {Yn } s.t. {Xn } = {|Yn |}, so
D
{ϕ(Xn )} = {ϕ(|Yn |)} = {ψ(Yn )} = {Zn },

which is a submartingale by (i).

18


Homework : Assume that {Xn , Fn } is a submartingale. If ∃ m > 1 s.t. E(Xm ) =


E(X1 ), then {Xi , Fi , 1 ≤ i ≤ m} is a martingale.

Definition 1.4 Let N∞ = {1, 2, . . . , ∞}, and T : Ω → N∞ . Then T is said to be a


Fn –stopping time if {T = n} ∈ Fn , n = 1, 2, . . . .

Remark 1.4 Let F∞ = ∨n Fn . Since {T = ∞} = {T < ∞}c and {T < ∞} =


∪n {T = n} ∈ F∞ so {T = ∞} ∈ F∞ .

We said that a stopping time T is finite if P {T = ∞} = 0.

Remark 1.5 Since {T ≥ n} = {T < n}c ∈ Fn−1 , then

{T ≤ n} ∈ Fn , ∀ n ⇔ {T = n} ∈ Fn , ∀ n.

Definition 1.5 Let T be an Fn –stopping time. The pre–T σ–field FT is defined to


be {Λ ∈ F : Λ ∩ {T = n} ∈ Fn , ∀ n ∈ N∞ }.

If Λ ∈ FT , then Λ = ∪n∈N∞ (Λ ∩ {T = n}) ∈ F∞ , so FT ⊂ F∞ .

Example 1.8 Let Xn be Fn –adaptive ∀ Borel set Γ, we define T = inf{n : Xn ∈ Γ}.


Then T is an Fn –stopping time. (inf Ø = ∞).
Proof : {T = k} = {X1 6∈ Γ, . . . , Xk−1 6∈ Γ, Xk ∈ Γ} ∈ Fk .

Theorem 1.4 Assume that T1 and T2 are Fn –stopping times.

(i) Then so are T1 ∧ T2 and T1 ∨ T2 .

19
(ii) If T1 ≤ T2 then FT1 ⊂ FT2 .

Proof :

(i) {T1 ∧ T2 ≤ n} = {T1 ≤ n} ∪ {T2 ≤ n} ∈ Fn


{T1 ∨ T2 ≤ n} = {T1 ≤ n} ∩ {T2 ≤ n} ∈ Fn

(ii) Let Λ ∈ FT1 , then Λ ∩ {T1 ≤ n} ∈ Fn . Since {T2 ≤ n} ∈ Fn , we have Λ ∩ {T1 ≤


n} ∩ {T2 ≤ n} ∈ Fn and Λ ∩ {T1 ≤ n} ∩ {T2 ≤ n} = Λ ∩ {T2 ≤ n} ∈ FT2 , so
Λ ∈ FT2 .

Theorem 1.5 (Optional Sampling Theorem)


Let α and β be two Fn –stopping times s.t. α ≤ β ≤ K where K is a positive
integers. Then for any (sub or super) martingale {Xn , Fn },{Xα , Fα ; Xβ , Fβ } is a
(sub or super) martingale.
Proof : We only have to consider the case when Xn is a submartingale.
Lemma : Assume that β is an Fn –stopping time s.t. β ≤ K. If {Xn , Fn } is a
submartingale then

E[Xβ |Fn ] ≥ Xn a.s. on {β ≥ n}


E[Xβ |Fn ]I[β≥n] ≥ Xn I[β≥n] a.s.

Proof of Lemma : It is sufficient to show that


Z Z
∀ A ∈ Fn Xβ I[β≥n] dp ≥ Xn I[β≥n] dp
A A

Let A = {Un > E(Z|Fn )}, E(Z|Fn ) ≥ Un ∈ Fn


Z Z
⇔ ∀ A ∈ Fn , Zdp ≥ U dp
A A
Z
⇔ ∀ A ∈ Fn , (Z − U )dp ≥ 0
A
Z Z
⇔ E(Z|Fn )dp ≥ U dp
ZA A

⇔ [E(Z|Fn ) − U ]dp = 0.
A

20
Z Z Z Z
Xn I[β≥n] dp = Xn dp = Xn dp + Xn dp
A A∩[β≥n] A∩[β=n] A∩[β≥n+1]
Z Z
≤ Xβ dp + Xn+1 dp.
A∩[β=n] A∩[β≥n+1]

Since B ∈ Fn , Z Z Z
E[Xn+1 |Fn ]dp = Xn+1 dp ≥ Xn dp.
B B B
We have that
Z Z Z Z
Xn I[β≥n] dp ≤ Xβ dp + . . . + Xβ dp + XK+1 dp
A A∩[β=n] A∩[β=K] A∩[β≥K+1]
Z Z
= Xβ dp = Xβ dp.
A∩[n≤β≤K] A∩[n≤β]

Continuation of the proof of the theorem : R R


It is sufficient to show that ∀ Λ ∈ Fα , Λ Xβ dp ≥ Λ Xα dp. Given Λ ∈ Fα , A =
∪kn=1 (Λ ∩ {α = n}). It is sufficient to show ∀ 1 ≤ n ≤ K,
Z Z Z
Xβ dp ≥ Xα dp = Xn dp.
Λ∩[α=n] Λ∩[α=n] Λ∩[α=n]
R R
However, Λ∩[α=n] Xβ dp = Λ∩[α=n] E(Xβ |Fn )dp. Since {α = n} ⊂ {β ≥ n} (since
R R
β ≥ α = n), we have Λ∩[α=n] E(Xβ |Fn )dp ≥ Λ∩[α=n] Xn dp,

∀ n, {Xα ≤ x} ∪ {α = n} = {Xn ≤ x} ∩ {α = n} ∈ Fn

So {Xα ≤ x} ∈ Fα .

Remark 1.6 If α = 1, ∀ β ≤ K, we have EXβ = EX1 , then {Xn , Fn } is a martin-


gale.

How to prove the convergence of a sequence:

1. Find the limit X, try to show |Xn − X| → 0.

21
2. Without knowing the limit:

(i) Cauchy sequence supm>n |Xn − Xm | → 0 as n → ∞


(ii) limit set ,[lim inf Xn , lim sup Xn ] = A
(a) lim inf Xn = lim sup Xn
(b) ∀ a ∈ A, ψ(a) = 0 and ψ has a unique root.
Consider

{lim inf Xn < lim sup Xn } = ∪ a<b {lim inf Xn < a < b < lim sup Xn }
rationals

α1 = inf{m : Xm ≤ a}
β1 = inf{m > α1 : Xm ≥ b}
..
.
αk = inf{m > βk−1 : Xm ≤ a}
βk = inf{m > αk : Xm ≥ b},

and define upcrossing number Un = Un [a, b] = sup{j : βj ≤ n, j < ∞}. Note that if
αi0 = αi ∧ n, βi0 = βi ∧ n then αn0 = βn0 = n.
Then define τ0 = 1, τ1 = α10 , . . . , τ2n−1 = αn0 , and τ2n = βn0 . Clearly, τn = n.
If {Xn , Fn } is a submartingale, then {Xτk , Fτk , 1 ≤ k ≤ n} is a submartingale by
optional sampling theorem. ( Since τk ≤ n ∀ 1 ≤ k ≤ n. )
Theorem 1.6 (Upcrossing Inequality)
If {Xn , Fn } is a submartingale, then (b − a)EUn ≤ E(Xn − a)+ − E(X1 − a)+ .
Proof : Observe that the upcrossing number Un [0, b − a] of (Xn − a)+ is the same as
Un [a, b] of Xn . Furthermore,{(Xn −a)+ , Fn } is also a martingale. ϕ(x) = (x−a)+ is a
convex function. Hence we only have to show the case Xn ≥ 0 a.s. and Un = Un [0, c].
Now consider
n−1
X X X
Xn − X1 = Xτn − Xτn−1 + . . . + Xτ1 − Xτ0 = (Xτi+1 − Xτi ) = + ,
i=0 i:even i:odd

X
∵ (xτi+1 − Xτi ) ≥ Un C,
i:odd
X X
∴ EXn − EX1 ≥ CEUn + E( ) ≥ CEUn + (EXτi+1 − EXτi ) ≥ CEUn .
i:even i:even

22


Theorem 1.7 (Global convergence theorem)


Assume that {Xn , Fn } is a submartingale s.t. supn E(Xn+ ) < ∞. Then Xn con-
verges a.s. to a limit X∞ and E|X∞ | < ∞.
Proof : We only have to show that

P [lim inf Xn < a < b < lim sup Xn ] = 0. (∗)

Let U∞ [a, b] be the upcrossing number of {Xn }. Then {lim inf Xn < a < b <
lim sup Xn } ⊂ {∪∞ [a, b] = ∞} and Un [a, b] ↑ U∞ [a, b],

EU∞ [a, b] = lim E(Un [a, b])


n→∞
≤ sup(E(Xn − a)+ − E(X1 − a)+ )/(b − a) < ∞,
n

so U∞ [a, b] < ∞ a.s., and P [U∞ [a, b] = ∞] = 0.


This implies (∗). Now

E|Xn | = EXn+ + EXn− = 2EXn+ − (EXn+ − EXn− )


= 2EXn+ − EXn ≤ 2EXn+ − EX1 ,

so supn E|Xn | ≤ 2 supn EXn+ − EX1 < ∞.


By Fatou’s Lemma,

E|X∞ | = E( lim |Xn |) ≤ lim inf E|Xn | ≤ sup E|Xn | < ∞.


n→∞ n

Remark 1.7 Xn ↑, supn EXn+ < ∞ : upper bound.


a.s.
Corollary 1.3 If {Xn } is a nonnegative supermartingale then ∃ X ∈ L0 s.t. Xn →
X.
Proof : Since −Xn is a nonpositive submartingale and E(−Xn )+ = 0, ∀ n.

Example 1.9

23
1. Likelihood Ratio
Ln (θ)
Yn (θ) = ≥ 0.
Ln (θ0 )
So Yn (θ) → Y (θ) a.s. (Pθ0 ), (Y (θ) = 0 if θ1 , θ0 are distinctable.)

2. Baye’s est.

θ̂n = E[θ|X1 , . . . , Xn ], E(θ2 ) < ∞


E|θ̂n | ≤ E{E(|θn ||X1 , . . . , Xn )} = E|θn | < ∞.
a.s.
So supn E|θ̂n | < ∞, and θ̂n → θ∞ .

Definition 1.6 {Xn } is said to be uniformly integrable(u.i.) if ∀ ε > 0, ∃ A s.t.


Z Z
sup |Xn |dp ≤ ε or lim sup |Xn |dp → 0.
n {|Xn |>A} A→∞ n {|Xn |>A}

Theorem 1.8 {Xn } is u.i. ⇐⇒

(i) supn E|Xn | < ∞, and


R
(ii) ∀ ε > 0, ∃ δ > 0 s.t. ∀ E ∈ F, P (E) < δ ⇒ supn E
|Xn |dP < ε.

How to prove {Xn } is u.i. ?

1. If Z = supn |Xn | ∈ L0 then {Xn } is u.i..


Proof :

(i) obvious,since E|Xn | ≤ E(Z) < ∞


(ii)
Z Z Z Z
|Xn |dP ≤ ZdP ≤ ZI[Z≤c] dP + ZI[Z>c] dP
E E E E
Z
≤ cP (E) + ZdP
{Z>c}

24
2. If ∃ Borel–measurable function f : [0, ∞) 7→ [0, ∞) s.t. supn Ef (|Xn |) < ∞ and
limt→∞ f (t)
t
= ∞, then {Xn } is u.i..
p
Theorem 1.9 Assume that Xn → X , then the following statements are equivalent.
(i) {|Xn |p } is u.i.
Lp n→∞
(ii) Xn → X, (i.e.E|Xn − X|p −→ 0)
n→∞
(iii) E|Xn |p −→ E|X|p

D n→∞
Remark 1.8 If Xn → X and {|Xn |p } is u.i., then E|Xn |p −→ E|X|p .
Proof : We can reconstruct the probability space and r.v.’s Xn0 , X 0 ,
D D a.s.
s.t. Xn0 = Xn , X 0 = X and Xn0 → X 0 .

D n→∞
Ex. Let Xn → N (0, σ 2 ) and Xn2 is u.i., then E(Xn2 ) −→ σ 2 . How to know
max1≤i≤n |Xn |p ∈ L1 ?


1.3 Basic Inequalities (maximum inequalities)


Theorem 1.10 (Fundamental Inequality)
If {Xi , Fi , i ≤ i ≤ n} is a submartingale, then ∀ λ
λP [ max Xi > λ] ≤ E(Xn I[max1≤i≤n Xi >λ] ).
1≤i≤n

Proof : Define τ = inf{i : Xi > λ}, (recall : inf Ø = ∞), then {max1≤i≤n Xi > λ} =
{τ ≤ n}. On the set τ = k ≤ n, Xτ > λ, then
Z Z Z
λP [τ = k] ≤ Xτ dP = Xk dP ≤ Xn dP
[τ =k] [τ =k] [τ =k]

Since
τ = k ⇔ X1 ≤ λ, . . . , Xk−1 ≤ λ, Xk > λ,
then
n
X Z Z
λP [ max Xi > λ] = λ P [τ = k] ≤ Xn dP = Xn dP.
1≤i≤n [τ ≤n] [max1≤i≤n Xi >λ]
k=1

25


Theorem 1.11 (Doob’s Inequality)


If {Xi , Fi , 1 ≤ i ≤ n} is a martingale, then ∀ p > 1
kXn kp ≤ k max |Xi |kp ≤ qkXn kp ,
1≤i≤n

1
where kXkp = (E|X|p ) p and p1 + 1q = 1.
Proof : Since {|Xn |, Fn } is a submartingale, by the theorem. Let Z = max1≤i≤n |Xi |,
then
Z ∞
p
E(Z ) = p xp−1 P [Z > x]dx
Z0 ∞ Z ∞
p−2
≤ p x E(|Xn |I[Z>x] )dx = pE[|Xn | I[Z > x]xp−2 dx]
0 0
Z Z p−1
Z
≤ pE[|Xn | xp−2 dx] = pE[|Xn | ]
0 p−1
p p 1
≤ kXn kp kZ p−1 kq = kXn kp [E(Z p )] q .
p−1 p−1
Hence
kZ p−1 kq = {E(Z p−1 )q }1/q = [E(Z p )]1/q ,
1
kZkp = [E(Z p )]1/p = [E(Z p )]1− q ≤ qkXn kp .
Note that
k max |Xi |kp = ∞ ⇒ qkXn kp = ∞.
1≤i≤n

Corollary 1.4 If {Xn , Fn , n ≥ 1} is a martingale s.t. supn E|Xn |p < ∞ for some
p > 1 then {|Xn |p } is u.i. and Xn converges in Lp .
Proof : p > 1 ⇒ supn E|Xn | < ∞ so Xn converges a.s. to a r.v. X. By Doob’s
inequality:
k max |Xi |kp ≤ qkXn kp ≤ q sup kXn kp < ∞
1≤i≤n n
By the Monotone convergence theorem:
E sup |Xi |p = lim E sup |Xi |p ≤ q sup E|Xn |p < ∞
1≤i≤∞ n→∞ 1≤i≤n n

Lp
So sup1≤i≤∞ |Xi |p ∈ L1 , {|Xn |p } is u.i. and Xn −→ X.

26


Homework : Show without using martingale convergence theorem that if {Xn , Fn }


is a martingale and supn E|Xn |p < ∞ for some p > 1 then Xn converges a.s.


a.s.
Ex.( Bayes Est. ) θ̂n = E[θ|X1 , . . . , Xn ]. If θ ∈ L2 then θ̂n → θ∞ and E[θ̂n −θ∞ ]2 → 0.
pf: E θ̂n2 ≤ Eθ2 < ∞(p = 2).

What is θ∞ ? Is θ∞ equal to E[θ|Xi , i ≥ 1]?

Theorem 1.12 If X ∈ L1 , Xn = E(X|Fn ) and X∞ = limn→∞ Xn then (i) {Xn } is


u.i., and (ii) X∞ = E(X|F∞ ) where F∞ = ∨∞ n=1 Fn .
pf: Fix n, {Xn , Fn , X, F} is a martingale. Therefore, {|Xn |, Fn , |X|, F} is a sub-
n|
martingale. So {|Xn |>λ} |Xn |dP ≤ {|Xn |>λ} |X|dP . Now P {|Xn | > λ} ≤ E|X
R R
λ

E|X|
λ
→ 0. R R
{|Xn |>λ}
|X|dP ≤ cP {|X n | > λ} + {|X|>c}
|X|dP
E|X| R
≤ c λ + {|X|>c} |X|dP
Z
E|X|
⇒ sup E|Xn |I[|Xn |>λ] ≤ c + |X|dP
n λ {|X|>c}
Z
lim sup E|Xn |I[|Xn |>λ] ≤ |X|dP ∀ c
λ→∞ n {|X|>c}

L1 R n→∞ R R
RTherefore, XRn → X∞ . So ∀ Λ ∈ F, Λ Xn dP → Λ X∞ dP . Since | Λ Xn dP −
Λ
X∞ dP | ≤ Λ |Xn − X∞ |dP ≤ E|Xn − X∞ | → 0. Fix n, Λ ∈ Fn , ∀ m ≥ n
Z Z Z Z
XdP = Xn dP = Xm dP = X∞ dP
Λ Λ Λ Λ

Let G = {Λ : Λ XdP = Λ X∞ dP }. Then G is a σ–field s.t. G ⊃ ∪∞


R R
n=1 Fn . So

G ⊃ ∨n=1 Fn = F∞ . Observe that X∞ is F∞ –measurable. Hence E(X|F) = X∞ .

Corollary 1.5 Assume that θ ∈ L2 , θ̂n = E(θ|X1 , . . . , Xn ) and θ∞ = E(θ|Xi , i ≥ 1).


p
If ∃ θ̃n = θ̃n (X1 , . . . , Xn ) s.t. θ̃n → θ then θ∞ = θ a.s.
p a.s.
pf: Since θ̃n → θ. Let Fn = σ(X1 , . . . , Xn ). So ∃ nj s.t. θ̃nj → θ as nj → ∞.
Hence θ is F∞ = σ(Xi , i ≥ 1) measurable. By the theorem stated above, we get
θ∞ = E[θ|F∞ ] = θ a.s.

27
Example: yi = θxi + εi ,Xi : constant,θ ∈ L2 with known density f (θ),εi i.i.d.
N (0, σ 2 ), σ 2 known, and {εi } is independent of θ.
Pn
µ i=1 Xi Yi
c 2 + 2
θ̂n = E(θ|Y1 , . . . , Yn ) = Pnσ 2
1 i=1 Xi
c 2 + σ 2

Assume that f (θ) ∼ N (µ, c2 ), µ, c2 known.


Pn 2
1 − (θ−µ) 2
1 n − i=1 (yi −θxi )
g(θ, y1 , . . . , yn ) = √ e 2c2 ( √ ) e 2σ 2
2πc 2πσ

g(θ|y1 , . . . , yn ) = R g(θ,y1 ,...,yn )


g(θ,y1 ,...,yn )dθ
Pn 2 Pn
2
1 i=1 Xi )θ 2 +( µ + i=1 xi yi )θ
∝ K(y1 , . . . , yn )e−( 2c2 + 2σ 2 c2 σ2

P∞
When i=1 Xi2 < ∞
P∞ 2
P∞
µ i=1 Xi i=1 Xi εi
n→∞ c2 + σ2
θ+ σ2
θ̂n −→ P∞ 2 = θ∞
1 i=1 Xi
c2
+ σ2
P∞ 2
P∞
i=1 Xi ( Xi2 )σ 2
σ 2 c2 + i=1
σ4
D
∼ N (µ, P∞ 2 ) 6= θ
i=1 Xi
( c12 + σ 2 )2
P∞
When i=1 Xi2 → ∞
Pn Pn
xi yi xi εi a.s.
θ̂n ∼ Pn 2 = θ + Pi=1
i=1
n 2
→θ
i=1 xi i=1 xi
Pn
xi y i Pn
In general, let θ̂n = Pi=1
n 2 . When i=1 x2i → ∞,
i=1 xi

Pn
2 x i εi 2 σ2
E(θ̃n − θ) = E{ Pi=1
n 2
} = Pn 2
→0
i=1 Xi i=1 xi

p
So θn → θ. By our theorem, θˆn → θ a.s. or L2 .
Pn
How to calculate the upper and lower bound of E|Xn |p and E| i=1 Xi εi |p ?

28
1.4 Square function inequality
Let {Xn , Fn } be a martingale and d1 = X1 ,di = Xi − Xi−1 for i ≥ 2.

Theorem 1.13 (Burkholder’s inequality)


∀ 1 < p < ∞, ∃ C1 and C2 depending only on p such that
n
X n
X
C1 E| d2i |p/2 p
≤ E|Xn | ≤ C2 E| d2i |p/2
i=1 i=1

Cor. For p > 1, ∃C20 depending only on p s.t.


n
X n
X
C1 E| d2i |p/2 ≤ E(Xn∗ )p ≤ C20 E| d2i |p/2
i=1 i=1

where Xn∗ = max1≤i≤n |Xi | and C1 is defined by the theorem.


proof:
Since E(Xn∗ )p ≥ E|Xn |p =⇒ lower half is obtained.
By Doob’s inequality: kXn∗ kp ≤ qkXn kp P
So E(Xn∗ )p = kXn∗ kpp ≤ q p E|Xn |p ≤ q p C2 E| ni=1 d2i |p/2
Remark: When di are independent, it is called Marcinkiewz-Zygmund inequality.
Note that for p ≥ 2
p/2
E| ni=1 d2i |p/2 = k ni=1 d2i kp/2
P P

≤ ( ni=1 kd2i kp/2 )p/2 = { ni=1 (E|di |p )2/p }p/2


P P

D D
N (0, σ 2 ) then Y ∼ N (0, ( ∞ 2 2
P
If εi ∼ P −∞ ai )σ ).
C2 = ( ∞ 2
−∞ ai )σ
2


p Y p p p p p p/2
X
E|Y | = E| | C = (E|N (0, 1)| )C = {E|N (0, 1)| }σ ( a2i )p/2
C −∞
P∞ P∞ 2
Example: Let Y = −∞ ai εi ,where −∞ ai < ∞ and εi are i.i.d. random Pvaribles
2 p n
with E(εi ) = 0 and V ar(εi ) = σ < ∞. Assume E|εi )| < ∞,Yn = −n ai εi ,
(a−n ε−n , a−n ε−n + a−n+1 ε−n+1 , · · · , Yn ) is a martingale.

E|Yn |p ≤ C2 {Pn−n (E|ai εi |p )2/p }p/2


P
= C2 { n−n (|ai |p E|εi |pP )2/p }p/2
= C2 {(E|ε1 |p )2/p }p/2 { n−n a2i }p/2

29
By Fatou’s lemma, E|Y |p ≤ C2 (E|ε1 |p ){ ∞ 2 p/2
P
−∞ ai } , ∃ C1 , C2 depending only on p
p
and E|εi | s.t.
X∞ X∞
C1 ( a2i )p/2 ≤ E|Y |p ≤ C2 ( a2i )p/2
−∞ −∞

Example: Consider yi = α + βxi + εi where εi are P i.i.d. mean 0 and E|εi |p < ∞
for some p ≥ 2. Assume that xi are constant and s2n = ni=1 (xi − x̄n )2 → ∞. If p > 2
then the least square estimator β̂ is strongly consistent.
Pn
(xi − x̄n )εi σ2
β̂n − β = Pi=1
n 2
(V ar(β̂n ) = )
i=1 (xi − x̄n ) s2n
x̄n = n1 ni=1 xi ,let
P

Pn
Sn = i=1 (xi − x̄n )εi , n ≥ 2
= S2 + (S3 − S2 ) + · · · + (Sn − Sn−1 )
When n > m,
Sn − Sm = Pni=1 (xi − x̄n )εi − Pm
P P
i=1 (xi − x̄m )εi
m n
= (x̄
i=1 m − x̄ )ε
n i + i=m+1 (xi − x̄n )εi
Pm
E(Sn − Sn−1 )Sm = x̄m )(x̄m − x̄n )σ 2
i=1 (xi − P
= (x̄m − x̄n )[ m i=1 (xi − x̄m )]σ
2

So s2n = ( ni=2 Ci2 ) where C22 = E(S22 )/σ 2 and Cn2 = E(Sn − Sn−1 )2 /σ 2 . We want to
P
show Ss2n → 0 a.s.
n

Moricz:E| ni=m Zi |p ≤ Cp ( ni=m CPi2 )p/2 ∀ n, m


P P
n
Z
If ni=1 Ci2 → ∞ and P > 2 then Pni=1 C 2i → 0 a.s.
P
i=1 i
Zi = Si − Si−1 ,Sn = ni=1 (xi − x̄n )εi Note that ni=m Zi = ni=1 ai (n, m)εi where
P P P
ai (n, m) may depend on n and m.
So E| ni=m Zi |p ≤ Cp ( ni=1 a2 (n, m))p/2
P P
P i
V ar( n Zi ) p/2
≤ Cp [ P σi=m 2 ]
n
i=m V ar(Zi ) p/2
= Cp [ σ2
]
Cp Pn 2 p/2
= σp ( i=m Ci )
If ai is Fi−1 –measurable,recall:
Pn
} = { ni=1 (E|ai εi |p )2/p }p/2
p 2/p p/2
P
{ P i=1 (E|di | )
= { ni=1 (E|ai |p {E(|εi |p |Fi−1 )})2/p }p/2

30
Theorem 1.14 (Burkholder-Davis-Gundy)
∀ ρ > 0, ∃ C depending only on p s.t.
n
X
E(Xn∗ )p ≤ C{E[ E(d2i |Fi−1 )]p/2 + E( max |di |p )}
1≤i≤n
i=1

Theorem 1.15 (Rosenthal’s inequality)


∀ 2 ≤ p < ∞, ∃ C1 , C2 depending only on p s.t.
C1 {E[ ni=1 2 p/2
Pn
E|di |p } ≤ E|Xn |p
P
E(di |Fi−1 )] + i=1
≤ C2 {E[ i=1 E(d2i |Fi−1 )]p/2 + ni=1 E|di |p }
Pn P

Cor.(Wei,1987,Ann.Stat. 1667-1682)
Assume that {εi , Fi } is a martingale differences s.t. supn E{|εn |p |Fn−1 } ≤ C for
some p ≥ 2 and constant C.
Assume that un is Fn−1 –measurable. Let Xn = ni=1 ui εi and Xn∗ = sup1≤i≤n |Xi |.
P

Then ∃ K depending only on C and p s.t. E(Xn∗ )p ≤ KE( ni=1 u2i )p/2 .
P
Proof: By B–D–G inequality:
n
X
E(Xn∗ )p ≤ Cp {E[ E(u2i ε2i |Fi−1 )]p/2 + E max |ui εi |p }
1≤i≤n
i=1
Pn Pn 2
i=1E(u2i ε2i |Fi−1 ) ≤ p
i=1 ui [E(|εi | |Fi−1 )
2/p
]
2 P
n
≤ C p ( i=1 u2i )
CE( ni=1 u2i )p/2
P
f irst term ≤ CpP
n p p
second term ≤ E Pn i=1 |ui | p|εi | p
= Pi=1 E(|ui | |εi | )
n p p
= i=1 E{E(|ui | |εi | |Fi−1 )}
P n p
≤ C P i=1 E|ui |
n
= CE(P i=1 |ui |p )
≤ CE ni=1 u2i (max1≤j≤n |uj |p−2 )
p−2
≤ CE(Pni=1 u2i )( ni=1 u2i ) 2
P P
= CE( ni=1 u2i )p/2
Let K = Cp C + C.
Pn
|ai |p ≤ ( ni=1 a2i )p/2 .
P
ai constant,p ≥ 2 : i=1

The comparison of Local convergence theorems and Global convergence theorems:


Conditional Borel-Cantelli Lemma:
Classical results: Ai events,

31
P
1. If P (Ai ) < ∞ then P (Ai i.o.) = 0.
P
2. If Ai are independent and P (Ai i.o.) = 0 then P (Ai ) < ∞.
P∞
Define X = i=1 IAi then {Ai i.o.} = {X = ∞}.
X X X
P (Ai ) = E(IAi ) = E( IAi ) = E(X)

The classical result connects the finiteness of X and E(X).


1. X > 0, E(X) < ∞ ⇒ X < ∞ a.s.

2. ?
P∞ P∞
i=1 E(IAi |Fi−1 ) < ∞ a.s. if i=1 EIAi < ∞,Fn = σ(A1 , · · · , An )

Mi = E(IAi |Fi−1 ) = P (Ai )


X∞ ∞
X
P( IAi < ∞) > 0 ⇒ P (Ai ) < ∞
i=1 i=1

Theorem: Let {Xn } be a sequence of nonnegative random variables and {Fn , n ≥ 0}


be a sequence of increasing σ–fields. Let Mn = E(Xn |Fn−1 ). Then
P∞ P∞
1. i=1 Xi < ∞ a.s. on { i=1 Mi < ∞}, and

1
2. if
PY = supn Xn /(1 + XP1 + · · · + Xn−1 ) ∈ L and Xn is Fn –measurable then
∞ ∞
i=1 Mi < ∞ a.s. on { i=1 Xi < ∞}.

Remark: If Xi are uniformly bdd by C then Y ≤ C a.s. and Y ∈ L1 . In this


case,with the assumption Xn is Fn –measurable.
X∞ ∞
X ∞
X ∞
X
P [({ Xi < ∞} 4 { Mi < ∞}) ∪ ({ Xi = ∞} 4 { Mi = ∞})] = 0
i=1 i=1 i=1 i=1

proof: ( Due to Louis,H.Y.Chen,Ann.Prob 1978)

Theorem 1.16 Let {Xn } be a sequence of nonnegative random variables and {Fn }
be a sequence of increasing σ–fields. Let Mn = E(Xn |Fn−1 ) for n ≥ 1.
P∞ P∞
1. i=1 Xi < ∞ a.s. on { i=1 Mi < ∞}.
Xn 1
P∞
2. If Xn is Fn –measurable and Y = supn 1+X1 +···+Xn−1
∈ L then i=1 < ∞ a.s.
P∞
on { i=1 Mi < ∞}.

32
Classical results : Ai events
P ∞
i=1 P (Ai ) < ∞ ⇒ P (An i.o.) = 0
If Ai are independent then P (An i.o.) = 0 or P ( ∞
P P∞
i=1 IAi < ∞) = 1, ⇒ i=1 P (Ai ) <
∞.

xi = IAi , Fn = σ(A1 , · · · , An )
P∞
P (Ai ) = ∞
P P∞
i=1P i=1 E(I Ai
) = E( i=1 IAi )
∞ P∞
= E( i=1 Xi ) = E{ i=1 E(Xi |Fi−1 )} < ∞
P∞
|Fi−1 ) < ∞ a.s. ⇒P ∞
P
⇒ i=1 E(XiP i=1 Xi < ∞ a.s.
{An i.o.} = { ∞ I Ai = ∞} = { ∞
i=1 Xi = ∞}
P∞ P∞ i=1 indep. P∞ P∞
i=1
P∞ Mi = i=1 E(IAi |Fi−1 P)∞ = i=1 E(IAi ) = i=1 P (Ai )
P { i=1 IAi < ∞} > 0 ⇒ i=1 P (Ai ) < ∞

proof of theorem:
(i) Let M0 = 1. Consider
Pn Mi
i=1 (M0 +···+Mi−1 )(M0 +···+Mi )
P n 1 1
= i=1 { M0 +···+M i−1
− M0 +···+M i
}
1 1 1
= M0 − M0 +···+Mn = 1 − 1+M0 +···+Mn
Let Sn = M0 + · · · + Mn then Sn is Fn−1 –measurable.
Since 1 ≥ E ∞
P Mi
P∞ Mi
i=1 Si−1 Si = i=1 E( Si−1 Si )
P∞
= i=1 E( E(X i |Fi−1 )
)= ∞ Xi
P
S i−1 S i i=1 E{E( Si−1 Si |Fi−1 )}
= ∞
P Xi
P∞ Xi
i=1 E( Si−1 Si ) = E( i=1 Si−1 Si )
P∞ Xi
So i=1 Si−1 Si < ∞ a.s.

On the set {S∞ < ∞},


∞ ∞ ∞ ∞
X Xi X Xi 1 X X
≥ 2
= 2 Xi ⇒ Xi < ∞
i=1
Si−1 Si i=1
S∞ S∞ i=1 i=1

(ii) Let X0 = 1 and Un = ni=0 Xi is Fn –measurable.


P

E( ∞ Mi
)= ∞ Mi
)= ∞ EE( UM2 i |Fi−1 )
P P P
2
i=1 Ui−1 i=1 E( Ui−1
2
P∞ Xi P∞ Xi Ui=1 i−1
= E( i=1 U 2 ) = E( i=1 Ui−1 Ui Ui−1 i
)
i−1
≤ E[( ∞ Xi Ui Ui
P
i=1 Ui−1 Ui )(supi Ui−1 )] ≤ E supi Ui−1
= E(supi (1 + UXi−1
i
)) = E(1 + Y ) < ∞

33
P∞ Mi
So 2
i=1 Ui−1 < ∞ a.s.
2
On the set {U∞ < ∞}
∞ P∞ ∞
Mi i=1 Mi
X X
2
≥ 2
⇒ Mi < ∞
i=1
Ui−1 U ∞ i=1

Remark: Under condition(ii)

P [{P∞
P P∞
i=1 Mi < ∞} 4 {Pi=1 Xi < ∞}] = 0, and
P [{ ∞i=1 Mi = ∞} 4 {

i=1 Xi = ∞}] = 0.

1.5 Series Convergence


Recall (Global) convergence theorem :
{Xn , Fn } is a martingale and supn E|Xn | < ∞ ⇒ Xn converges a.s.
Let ε1 = X1 and εn = Xn − Xn−1 ,n ≥ 2
n
X n
X
sup E| εi | < ∞ ⇒ εi converges a.s.
n
i=1 i=1
Pn
Theorem P1.17 (Doob) Let {Xn = i=1 εi , Fn } be a martingale. Then Xn converges

a.s. on { i=1 E(ε2i |Fi−1 ) < ∞}.
proof: Fix K > 0. Define τ = inf{n : n+1 2
P
i=1 E(εi |Fi−1 ) > K}. Then {Xn∧τ , Fn∧τ } is
a martingale.
2
Pn∧τ 2 Pn 2
E(X
Pnn∧τ ) = E( i=1 ε i ) = E( i=1 εi I[τ ≥i] )
2
P n 2
= i=1PnE(I[τ ≥ i]εi2) = i=1 EE(I ≥i] εi |Fi−1 )
P[τn∧τ
= E{ i=1 I[τ ≥i] E(εi |Fi−1 )} = E{ i=1 E(ε2i |Fi−1 )}
≤ E(K) = K
2
) < ∞ so Xn∧τ converges a.s. But on the event AK = { ∞ 2
P
Since supn E(Xn∧τ i=1 E(εi |Fi−1 ) ≤
K} : τ = ∞ and Xn∧τ P∞= Xn .2 So Xn converges a.s. on AK . Hence it also converges

a.s. on ∪K=1 AK = { i=1 E(εi |Fi−1 ) < ∞}.

Theorem 1.18 Pn(Three series Theorem)


Let Xn = i=1 εi be Fn –adaptive and C a positive constant. Then Xn converges
a.s. on the event where
P∞
(i) i=1 P [|εi | > C|Fi−1 ] < ∞,

34
Pn
(ii) i=1 E(εi I[|εi |≤C] |Fi−1 ) converges, and
P∞ 2
(iii) i=1 {E(εi I[|εi |≤C] |Fi−1 ) − E 2 (εi I[|εi |≤C] |Fi−1 )} < ∞

Remark: When εi are independent,(i),(ii) and (iii) are also necessary for Xn to be an
a.s. convergent series.
proof:
Xn = Pni=1 εi
P
n Pn
= i=1 ε i I[|εi |>C] + i=1 {εi I|εi |≤C] − E(εi I[|εi |≤C] |Fi−1 )}
+ ni=1 E(εi I[|εi |≤C] |Fi−1 )
P
= I1n + I2n + I3n
Let Ω0 = {(i),(ii) and (iii) hold}. By (i) and the conditional Borel—Cantelli lemma,

X
I[|εi |>C] < ∞ a.s. on Ω0
i=1

Hence I[|εi |>C] = 0 eventually on Ω0 . So I1n converges a.s. on Ω0 . The conver-


gence of I2n on Ω0 follows from (iii) and Doob’s theorem. Let Zi = εi I|εi |≤C] −
E(εi I[|εi |≤C] |Fi−1 ).

E(Zi2 |Fi−1 ) = E(ε2i I[|εi |≤C] |Fi−1 ) − E 2 (εi I[|εi |≤C] |Fi−1 ).
I3n follows from (ii).

Counterexample: Let Xn be a sequence of independent random variables s.t.


1 −1 1
P [Xn = √ ] = P [Xn = √ ] = .
n n 2
a.s.
Let Fn = σ(X1 , · · · , Xn ),ε1 = X1 and εn = Xn − Xn−1 for n ≥ 2. Claim (i) Xn → 0
since |Xn | = √1n a.s. (ii) Let C = 2. Then I[|εi |≤2] = 1, since |εn | ≤ 2.
Pn Pn
i=1 E(ε i |Fi−1 ) = Pi=1 {E(Xi ) − Xi−1 Pn}
n
P∞ = Pi=2 −X i−1 = −
P∞ i=21 Xi−1
∞ 2
i=2 Var(Xi−1 ) = i=2 EXi−1 = i=2 i−1 = ∞
P
⇒ Xi diverges a.s.

Theorem 1.19P(Chow)
n
P{X
Let n = i=1 εi , Fn } be a martingale and 1 ≤ p ≤ 2. Then Xn converges a.s.
on { ∞ i=1 E(|ε |p
i |Fi−1 ) < ∞}.

35
proof: Let C > 0.
(i) P [|εi | > C|Fi−1 ] ≤ E(|εi |p |Fi−1 )/C p .

(ii)

X ∞
X
|E(εi I[|εi |≤C] |Fi−1 )| = |E(εi I[|εi |>C] |Fi−1 )|
i=2 i=2

X ∞
X
≤ E(|εi |I[|εi |>C] |Fi−1 ) ≤ E(|εi |p |Fi−1 )/C p−1
i=2 i=2

(iii)
E{ε2i I[|εi |≤C] |Fi−1 } ≤ E{|εi |p C 2−p |Fi−1 }
≤ C 2−p E{|εi |p |Fi−1 }.
Pn+1
New proof: τ = inf{n : E(|εi |p |Fi−1 ) > K}, 1 < p ≤ 2.
i=1
Pn
E|Xτ ∧n |p = E| P i=1 I[τ ≥i] εi |
p
n
≤ Cp E( Pi=1 I[τ ≥i] ε2i )p/2
≤ Cp E{Pni=1 I[τ ≥i] |εi |p }
= Cp E{ n∧τ p
i=1 E(|εi | |Fi−1 )}
≤ KCp

When p = 1,
E|Xτ ∧n | ≤ E Pni=1 I[τ ≥i] |εi |
P
= E n∧τ
i=1 E(|εi ||Fi−1 ) ≤ K.

Colloary. Let {εn , Fn } be a sequence P and 1 ≤


of martingale differences p ≤ 2. Let Xn
be Fn−1 –measurable. Then ni=1 Xi εi converges a.s. on { ∞ p p
P
i=1 |Xi | E(|ε i | |Fi−1 ) <
∞}.
Remark: We does not assume that Xi is P integrable.
Proof: We can find constants ai so that ∞ i=1 P [|Xi | > ai ] < ∞. For any Z and
α > 0, we can find n so that P [|Z| > n] ≤ α.
1
an ↔ αn =
n2
P [|Xn | > an i.o.] = 0, so we can replace Xi by X̃i = Xi I[|Xi |≤ai ] . In this case, ni=1 Xi εi
P
is a martingale and E(|Xi εi |p |Fi−1 ) = |Xi |p E(|εi |p |Fi−1 ). The collary follows Chow’s
result.
Remark: If supn E(|εi |p |Fi−1 ) < ∞ then ni=1 Xi εi converges a.s. on { ∞ p
P P
i=1 |Xi | <
∞}

36
yi = βxi + εi
εi i.i.d., Eεi = 0 and V ar(εi ) = σ 2
xi is Fi−1 P
= σ(ε1 , · · · , εi−1 )–measurable. P
n ∞ P∞ 2
xi εi xi εi
β̂n = β + Pi=1 2 converges a.s. to β + 2 on { i=1 xi < ∞}
n Pi=1

x
i=1 i x
i=1 i
Chow’s Theorem

n
(∞ )
X X
εi converges a.s. on E(| εi |P | Fi−1 ) < ∞ , where 1 ≤ p ≤ 2.
i=1 1

Special case:

sup E(| εi |2 | Fi−1 ) < ∞


i
n
(∞ )
X X
⇒ xi εi converges a.s. on x2i < ∞
1 1

Corollary : If un is Fn−1 measurable


then
n
( ∞
)
X X
εi = 0(un ) a.s. on the set un ↑ ∞, | ui |−p E{| εi |p | Fi−1 } < ∞
1 1

pf : Take xi = u1i

X 1
Then εi converges a.s. by previous corollary. In view of Kronecker’s Lemma.
1
ui

X
εi
1
−→ 0 when un ↑ ∞
un
R∞
Pn : Let2 f : [0, ∞) → (0, ∞) be an increasing fun. s.t.
Corollary 0
f −2 (t)dt < ∞ . Let
2
sn = i=1 E(εi | Fi−1 )Fn−1 measurable. Then
n n
X εi X
2
converges a.s., εi = 0(f (s2n )) a.s.
i=1
f (si ) i=1
on {s2n → ∞} where lim f (t) = ∞.
t→∞

37
pf:

" 2 # ∞
X εi X E(ε2i | Fi−1 )
E | Fi−1 =
i=1
f (s2i ) i=1
f 2 (s2i )
∞ ∞ Z s2i
X s2i − s2i−1 X 1
= ≤ dt
f 2 (s2i ) s2i−1 f 2 (t)
Zi=1∞ i=1
1
≤ dt < ∞
so f 2 (t)
Remark:
 1+δ
t1/2 (log t) 2 , δ > 0, t ≥ 2
f (t) =
f (2), o.w.
or f (t) = t
For this, we have that

X
s2∞ = x2i E(ε2i | Fi=1 )
i=1
n n
!
X X
x i εi = 0 x2i E(ε2i | Fi=1 )
"i=1∞ i=1
#
X
on x2i E(ε2i | Fi=1 ) = ∞
i=1

If we assume that
sup E(ε2i | Fi−1 ) < ∞
i

n n
! (∞ )
X X X
x i εi = 0 x2i on x2i = ∞
i=1 i=1 i=1

In summary, under the assumption


sup E(ε2i | Fi−1 ) < ∞
i

n  P∞ 2
X O(1) on { 1 xi < ∞}
x i εi =
o( 1 xi ) on { ∞
Pn 2 P 2
i=1 1 xi = ∞}

38
Example : yi = βxi + εi
where {εi , Fi } is a martingale difference seq. s.t.

sup E(ε2n | Fn−1 ) < ∞ a.s. and xi is Fi−1 measurable.


n
Then
n
X n
X
xi yi x i εi
1 1
β̂n = n =β+ n
X X
x2i x2i
1 1


X
converges a.s. and the limit is β on { x2i = ∞}
1
pf:

X n
X
On { x2i < ∞}, xi εi converges
1 1


X
x i εi
i=1
So that β̂n → β + ∞
X
x2i
i=1

X
On { x2i = ∞}
1
n
X
x i εi
i=1
n −→ 0, as n → ∞
X
x2i
i=1

So that β̂n −→ β

Application (control)
yi = βxi + εi , (β 6= 0) where εi i.i.d. with E(εi ) = 0, V ar(εi ) = σ 2
Goal: Design xi which depends on previous observations so that y ' y ∗ 6= 0

39
Strategy : choose x1 arbitrary , set
y∗
xn+1 =
β̂n
Question:
y∗
xn → a.s. ?
β
or β̂n → β a.s. ?

By previous result, β̂n always converges.

(y ∗ )2
Then x2n+1 = is bounded away from zero
β̂n2

X
and x2n+1 = ∞ a.s.. Therefore, β̂n → β a.s.
1

Open Question:
Is there a corresponding result for

yi = α + βxi + εi
or yi = αyi−1 + βxi + εi

Open Questions:

X
Assume that | xi |p < ∞ a.s. and
1
p
sup E(| εn | | Fn−1 ) < ∞ a.s. for some 1 ≤ p ≤ 2
n


X
What are the distribution properties of S = x i εi ?
1


xi are constants 

xi 6= 0 i.o.

⇒ S has a continuous distribution
p=2 

lim inf n→∞ E(| εn || Fn−2 ) > 0 a.s.

40
Almost Supermartingale
Theorem (Robbins and Siegmund)
Let {Fn } be a sequence of increasing fields and xn , βn , yn , zn are nonnegative Fn -
measurable random variables

s.t. E(xn+1 | Fn ) ≤ xn (1 + βn ) + yn − zn a.s.

Then on
(∞ ∞
)
X X
βi < ∞, yi < ∞
i=1 i=1


X
xn converges and zi < ∞ a.s.
1

pf:1o Reduction to the case βn = 0, ∀ n


n−1
Y
set x0n = xn (1 + βi )−1
i=1
n
Y
yn = yn (1 + βi )−1
0

i=1
n
Y
0
zn = zn (1 + βi )−1
i=1

n
0
Y
Then E(xn+1 | Fn ) = E(xn+1 | Fn ) (1 + βi )−1
i=1
n
Y
≤ [xn (1 + βn ) + yn − zn ] (1 + βi )−1
i=1
= x0n + yn0 − zn0

(∞ ) n
X Y
on βi < ∞ , (1 + βi )−1 converges to a nonzero limit.
i=1 i=1

41
Therefore,
X X
(i) yi < ∞ ⇐⇒ yi0 < ∞
(ii) xn converges ⇐⇒ x0n converges
X X
(iii) zi < ∞ ⇐⇒ zi0 < ∞

2o Assume that βn = 0, ∀ n
E(xn+1 | Fn ) ≤ xn + yn − zn
n−1
X n−1
X n−1
X
Let un = xn − (yi − zi ) = xn + zi − yi
1 1 1

n
X
Then E(un+1 | Fn ) = E(xn+1 | Fn ) − (yi − zi )
1
n
X
≤ xn + yn − zn − (yi − zi )
1
n−1
X
= xn − (yi − zi ) = un
1

Given a > 0 , define


n
X
τ = inf{n : yi > a}
1

X
Observe that [τ = ∞] = [ yi ≤ a]
1

and uτ Λn is also a supermartingale


∧n−1
τX
uτ ∧n ≥ − yi ≥ −a, ∀ n
1
So that uτ ∧n converges a.s.
X∞
Consequently un = uτ ∧n converges on [τ = ∞] = { yi ≤ a} . Since a is arbi-
1
X∞
trary, un converges a.s. on { yi < ∞}
1

42

X ∞
X
So that x + zi converges a.s. on { yi < ∞}
1 1
n
X
So that zi converges and so does xn .
1
Example : Find the quantile

Assume y = α + βx + ε where β > 0


Given y ∗ , want to find x∗
Method : choose x1 arbitrary
xn+1 = xn + an (y ∗ − yn ) , an > 0
↑ ↑
control step control direction
=⇒ Stochastic Approximation
?
Question : xn → x∗
(xn+1 − x∗ ) = (xn − x∗ ) + an (α + βx∗ − α − βxn − εn )
= (xn − x∗ )(1 − an β) − an εn
xn+1 is Fn -measurable
where Fn = σ(xo , ε1 , · · · , εn )
E((xn+1 − x∗ )2 | Fn−1 ) = (xn − x∗ )2 (1 − an β)2 + a2n σ 2
where we assume εi are i.i.d., Eεi = 0, var(i ) = σ 2
Xn = (xn+1 − x∗ )2
Zn−1 = 2an β(xn − x∗ )2 = 2βan Xn−1
Yn−1 = a2n σ 2
Bn−1 = a2n β 2
E(Xn | Fn−1 ) ≤ Xn−1 (1 + βn−1 ) + Yn−1 − Zn−1

43
P 2
Condition (1) an < ∞ P
Then Xn converges
P a.s. and Zi < ∞ a.s.
Condition (2) an = ∞
X
Pn converges
P to X
Zi = 2β ai+1 Xi < ∞
⇒ X = 0 a.s. P
Remark: Assume ai < ∞

(xn+1 − x∗ ) = (xn − x∗ ) + an (α + βx∗ − α − βxn − εn )


= (xn − x∗ )(1 − an β) − an εn
Yn Xn Yn

= (1 − aj β)(x1 − x ) − (1 − a` β)aj εj
j=1 i=1 `=j+1
n n
" # n " j #−1
Y Y X Y
= (1 − aj β)(x1 − x∗ ) − (1 − a` β) (1 − a`β ) aj ε j
j=1 `=1 j=1 `=1

n
Y
P
when aj < ∞, Cn = (1 − aj β) converges to C > 0.
j=1
" ∞
#
X
So that xn − x∗ → C (x1 − x∗ ) − Cj−1 aj εj
j=1


X
Note that (Cj−1 aj )2 < ∞, Cj−1 aj > 0 ∀j
j=1

X
So that Cj−1 aj εj has a continuous distribution
j=1

This implies that where xi is a const.


" ∞
#
X
P (x1 − x∗ ) − Cj−1 aj εj = 0 = 0
j=1

Central Limit Theorems (CLT)


Reference: I.S. Helland (1982)
Central Limit Theorems for martingales with discrete or continuous time. Scand J.
Statist. 9, 79∼ 94.

44
Classical CLT:
Assume that ∀ n
Xn,i , 1 ≤ i ≤ kn are indep. with EXn,i = 0.
Xkn
2 2
Let sn = Xn,i
i=1

kn
X 1  2 
Thm. If ∀ ε > 0, 2
E Xn,i I[|Xn,i |>sn ε] → 0
s
i=1 n
kn
X Xn,i D
then → N (0, 1)
i=1
sn
Xn,i
* Reformulation: X
en,i =
sn
kn
X
(i) e2 ) = 1
E(Xn,i
i=1
Xkn h i
2
(ii) E Xen,i I[|Xen,i |>ε] → 0 ∀ ε
i=1
0
(ii) is Lindeberg s condition
* uniform negligibility (How to use mathematics to formulate?)
( D
max | Xn,i |→ 0
1≤i≤kn
2
controlXn,i
* condition of varience
To recall Burkholder0 s inequality: ∀ 1 < p < ∞
n
!p/2 n
!p/2
X X
Cp0 E d2i ≤ E | Sn |p ≤ Cp E d2i
i=1 i=1

EZ p d2i )1/2 EZ p
P
Z=(
kn kn
f ormalize
X X
2 2
Xn,i → (Xn,i | Fn,i−1 )
i=1 i=1
1 j
X X
2 2
Xn,i E(Xn,i | Fn,i−1 )
i=1 i=1
↑ ↑
optional quadratic variance predictable quadratic variance

45
( Thm. j∀ n ≥ 1, {Fn,j ; 1 ≤
) j ≤ kn < ∞} is a sequence of increasing σ-fields. Let
X
Sn,j = Xn,i , 1 ≤ j ≤ kn be {Fn,j }-adaptive.
i=1
Define
Xn∗ = max | Xn,i |,
1≤i≤kn
j
X
2 2
Un,j = Xn,i , 1 ≤ j ≤ kn
i=1

Assume that
kn
D
X
(i) Un2 = 2
Un,k n
= 2
Xn,i → Co , where Co > 0 is a constant.
i=1
D
(ii) Xn∗ → 0
(iii) sup E(Xn∗ )2 < ∞
n≥1
kn kn
D D
X X
(iv) E{Xn,j | Fn,j−1 } → 0 and E 2 {Xn,j | Fn,j−1 } → 0
j=1 j=1

Then
kn
D
X
Sn = Xn,i → N (0, Co )
i=1
= Sn,kn
Remark:{Xn,j , 1 ≤ j ≤ kn } can be defined on different probability space for different
n.
Step 1. Reduce the problem to the case
where {Sn,j , Fn,j , 1 ≤ j ≤ kn } is a martingale. Set
en,j = Xn,j − E(Xn,j | Fn,j−1 )
X 1 ≤ j ≤ kn , Fn,o : trivial field
kn
X
2 e2
Un =
e X n,j
j=1
en∗
X = max | X
en,j |
1≤j≤kn
kn
X
Sen = X
en,j
j=1

46
kn
D
X
(a)Sn − Sen = E(Xn,j | Fn,j−1 ) → 0 by(iv)
j=1

1/2
en∗ ≤ | E(Xn,j | Fn,j−1 ) |2

(b) X max | Xn,j | + max
1≤j≤kn 1≤j≤kn
(k )1/2
X n

= Xn∗ + E 2 (Xn,j | Fn,j−1 )


j=1

e∗ → D
So that X n 0 by (ii) and (iv)
 
∗ 2 ∗ 2 2
(X ) ≤ 2(Xn ) + 2 max E (Xn,j | Fn,j−1 )
e
1≤j≤kn
 
∗ 2 2 ∗
≤ 2(Xn ) + 2 max E (Xn | Fn,j−1 )
1≤j≤kn

| E(Xnj | Fn,j−1 ) |≤ E(| Xn,j || Fn,j−1 ) ≤ E(Xn∗ | Fn,j−1 )


Vj = E(Xn∗ | Fn,j ) is a martingale 1 ≤ j ≤ kn
 
E sup Vj ≤ 4E(Xn∗ )2
2
1≤j≤kn

by Doob0 s ineq. ∞ > p > 1 ,


k sup | Xj | kp ≤ q k Xn kp .
1≤j≤n

e ∗ )2 ≤ 2E(X ∗ )2 + 2 × 4E(X ∗ )2
So that E(Xn n n
∗ 2
= 10E(Xn ) < ∞

kn kn
D
X X
e2 − U 2 =
U 2
E (Xn,j | Fn,j−1 ) − 2 Xn,j E(Xn,j | Fn,j−1 ) → 0
n n
j=1 j=1
kn
D
X
E 2 (Xn,j | Fn,j−1 ) → 0 By(iv)
j=1
kn
D
X
Xn,j E(Xn,j | Fn,j−1 ) → 0
j=1

47
kn kn
!1/2 kn
!1/2
D
X X X
2
Because | Xn,j E(Xn,j | Fn,j−1 ) |≤ Xn,j E 2 (Xnj | Fn,j−1 ) →0
i=1 j=1 i=1

kn
!1/2
D
X
2
Xn,j = (Un2 )1/2 → Co1/2
j=1

kn
!1/2
D
X
E 2 (Xn,j | Fn,j−1 ) →0
i=1

D
e2 →
So that Un Co

Thm. ∀n ≥ 1, {Fn,j , 1 ≤ j ≤ kn < ∞} is a sequence of increasing σ-fields. Let


j
X
{Sn,j = Xn,i , 1 ≤ j ≤ kn } be {Fn,j } -martingale. Define Xn∗ = max | Xn,i |
1≤i≤kn
i=1
j
X
2 2
, Un,j = Xn,i , 1 ≤ j ≤ kn
i=1
Assume that
kn
D
X
(i) Un2 = 2
Un,k n
= 2
Xn,i → Co , where Co > 0 is a constant
i=1
D
(ii) Xn∗ → 0
(iii) sup E(Xn∗ )2 < ∞
n≥1

Then
kn
D
X
Sn = Xn,i → N (0, Co )
i=1

Step 2. Further Reduction. Define


 2
inf{i : 1 ≤ i ≤ kn , Un,i > C} , when Un2 > C
τ=
kn , when Un2 ≤ C

48
where C > Co

Define X̂n,j = Xn,j I[τ ≥j]


kn
X kn
X τ
X
Ŝn = X̂n,j = Xn,j I[τ ≥j] = Xn,i
j=1 i=1 j=1
j
X
2 2
Ûn,j = X̂n,i ,
i=1

X̂ = max | X̂n,j |
1≤i≤kn
τ
X
2 2
Ûn = Ûn,k n
= Xn,j
j=1

P (Sn 6= Ŝn ) ≤ P (Un2 > C) → 0

⇒ It is sufficient to show that


D
Ŝn → N (0, Co )

If C ≥ Un2 then Ûn2 = Un2


If C < Un2 then τ ≤ kn and
τ −1
X
2
C < Ûn = 2
Xn,j 2
+ Xn,τ ≤ C + (Xn∗ )2
i=1

So that Un2 ∧ C ≤ Ûn2 ≤ (Un2 ∧ C) +(Xn∗ )2


↓ ↓ ↓D
Co ∧ C = Co Co ∧ C = Co 0

D
⇒ Ûn2 → Co
Clearly, X̂n∗ ≤ Xn∗
D
Therefore, X̂n∗ → 0 by (ii) and
sup E(X̂n∗ )2 ≤ sup E(Xn∗ )2 < ∞
n≥1 n≥1

Step 3. E eiŜn → e−co /2


D
Claim: This is sufficient to show Sn → N (0, Co )

49
Reason : Step 3 ⇒ E eiSn → e−Co /2
2
Now replace Sn by t Sn . Using step 3 again, we obtain EeitSn → e−t Co /2
(a) Expansion
2
eix = (1 + ix)e(−x /2)+r(x) , where | r(x) |≤| x |3 for | x |< 1

Because | x |< 1
⇒ ix = [log(1 + ix)] − x2 /2 + r(x)
x2
⇒ r(x) = + ix − log(1 + ix)
2 "∞ #
x2 X (ix)j
= + ix − (−1)j+1
2 j=1
j
∞ j
X
j (ix) (ix)3 (ix)4
= (−1) =− + − ···
j=3
j 3 4
= x a(x) + x3 b(x)i
4

1 x2 x4 1
where a(x) = − + − ··· <
4 6 8 4
2 4
1 x x 1
b(x) = − + ··· <
3 5 7 3

p
| r(x) | = x8 a2 (x) + x6 b2 (x)
r r
x8 x6 3 1 1
≤ + ≤| x | + ≤| x |3
16 9 16 9

kn
Y
iŜn
e = eiXn,j
j=1
kn
X Xkn
2
"k
n
# − X̂n,j /2 + r(X̂n,j )
Y
= (1 + iX̂n,j ) e j=1 j=1

j=1
def 2
= Tn e−Ûn /2+Rn
h 2
i 2
= (Tn − 1)e−Co /2 + (Tn − 1) e−Ûn /2+Rn − e−Co /2 + e−Ûn /2+Rn
= In + IIn + IIIn

50
Note that on {X̂n∗ < 1}
kn
X kn
X
| Rn | ≤ | r(X̂n,j ) |≤ | X̂n,j |3
j=1 j=1
kn
X
≤ X̂n∗ 2
X̂n,j = X̂n∗ Ûn2
j=1

So that | Rn |≤| Rn | I[X̂n∗ ≥1] + X̂n∗ Ûm


2

↓D ↓D ↓D
0 0 Co

D
⇒ Rn → 0
D
So that IIIn → e−Co /2

(k )
X n

Now E | Tn |2 = E 2
(1 + X̂n,j )
j=1
Y
2 2
= E(1 + X̂n,τ ) (1 + X̂n,j )
j<τ
τ −1
X
2
X̂n,j
≤ E(1 + X̂n∗2 )e j=1
≤ ec E(1 + X̂n∗2 ) < ∞

So that {Tn } is u.i. ⇒ conv. in dist.


⇒ conv. in expectation

D
| IIn |= | Tn − 1 | | IIIn − e−co /2 |→ 0
k ↓D
Op (1) 0

51
E(In ) = e−co /2 [E(Tn ) − 1] = 0
(k )
Y n

E(Tn ) = E (1 + iX̂n,j )
j=1
(k )
Y n

= E (1 + iX̂n,j ) · E(1 + iX̂n,kn | Fn,kn −1 )


j=1
(k −1 )
n
Y
= E (1 + iX̂n,j ) = · · · = E{1 + iX̂n,1 } = 1
j=1

So that eiŜn = In + IIn + IIIn


D D
E(In ) = 0, IIn → 0, IIIn → e−co /2
In = (Tn − 1)e−co /2 is u.i.

D
eiŜn − In = IIn + IIIn → e−co /2

But eiŜn − In is u.i.

Therefore E(eiŜn ) = E(eiŜn − In )


u.i.
→ E(e−co /2 ) = e−co /2 , as n → ∞

Note:
j
( )
X
∀n Sn,j = Xn,i , Fn,j is a martingale
i=1

kn
D
X
(i) Un2 = 2
Xn,i →C>0
i=1
D
(ii) sup | Xn,i |→ 0
1≤i≤kn

(iii) sup E sup | Xn,i |2 < ∞


n 1≤i≤kn
kn
D
X
⇒ Sn = Xn,i → N (0, C)
i=1

52
Lemma 1. Assume that Fo ⊂ F1 ⊂ · · · ⊂ Fn
Then ∀ ε > 0 ,
n
! ( n )
[ X
P Ai ≤ ε + P P (Aj | Fj−1 ) > ε
i=1 j=1

k
X
pf: Let µk = P (Aj | Fj−1 )
j=1
Then µk is Fk−1 -measurable
n
! n
[ X
So that P Ai [µn ≤ ε] ≤ P (Ai [µn ≤ ε])
i=1 i=1
Xn
≤ P (Ai [µi ≤ ε])
i=1
n
X
= E E(IAi I[µi ≤ε] | Fi−1 )
i=1
n
X
= E E(IAi | Fi−1 )I[µi ≤ε]
i=1
≤ ε

j
X
Lemma : Zj ≥ 0, µj = E(Zi | Fi−1 )
i=1

n
X n
X
Then E Zi I[µi ≤ε] = E E(Zi | Fi−1 )I[µi ≤ε] ≤ ε
i=1 i=1

pf: Set τ = max{j : 1 ≤ j ≤ n, µj ≤ ε}


Then, since µ1 ≤ µ2 ≤ · · · ≤ µτ
n
X
E(Zi | Fi−1 )I[µi ≤ε]
i=1
τ
X
= E(Zi | Fi−1 ) = µτ ≤ ε.
i=1

53
Corollary. Assume that Ynj ≥ 0 a.s. and Fn,1 ⊂ · · · ⊂ Fn,kn
kn
D
X
Then P (Yn,j > ε | Fn,j−1 ) → 0, ∀ ε
j=1
D
⇒ max Yn,j → 0
a≤j≤kn

kn
D
X
Remark : E[Yn,j I[Yn ,j>ε] | Fn,j−1 ] → 0 is sufficient
j=1
pf : Let Yn∗ = max Yn,j
1≤j≤kn
"k #
[ n

P (Yn∗ > ε) = P (Yn,j > ε)


j=1
" kn
#
X
≤η+P P ([Yn,j > ε] | Fn,j−1 ) > η ∀η > 0 By Lemma 1
j=1
lim sup P [Yn∗ > ε] ≤ η
n→∞
Set η → 0

Lemma 2. ∀n {Yn,j } is {Fn,j }-adaptive


Assume that Yn,j ≥ 0 a.s. and E(Yn,j ) < ∞
j j
X X
Let Un,j = Yn,i and Vn,j = E(Yn,j | Fn,j−1 )
i=1 i=1
Un = Un,kn , Vn = Vn,kn
k n
D
X
If E{Yn,j I[Yn,j >ε] | Fn,j } → 0
i=1
and {Vn } is tight (i.e. lim sup P (Vn > λ) = 0)
λ→∞ n

D
then max | Un,j − Vn,j |→ 0
1≤j≤kn

D
pf: By previous corollary Yn∗ → 0
0
Let Yn,j = Yn,j I[Yn,j ≤δ, Vn,j ≤λ]

54
0 0
Define Un,j , Vn,j , Un0 , Vn0 similarly
 
Then P max | Un,j − Vn,j |> 3γ
1≤j≤kn
 
0
≤ P max | Un,j − Un,j |> γ
1≤j≤kn
 
0 0
+P max | Un,j − Vn,j |> γ
1≤j≤kn
 
0 def
+P max | Vn,j − Vn,j |> γ ≡In + IIn + IIIn
1≤j≤kn

(1) In ≤ P [∃j 3 Yn,j > δ or Vnj > λ]


≤ P [Yn∗ > δ] + P [Vn > λ]
 
1 0 0 2
(2) IIn ≤ 2 E max (Un,j − Vn,j )
r 1≤j≤kn
1
≤ 2 4E(Un0 − Vn0 )2
r
kn
4 X 0 2 0
= 2 [E(Yn,j ) − E(E 2 (Yn,j | Fn,j−1 ))]
r i=1
kn
4 X 0 2
≤ E(Yn,j )
r2 j=1
k kn
!
n
4 X 0 4 X
0
≤ 2δ E(Yn,j ) = 4 δE E(Yn,j | Fn,j−1 )
γ j=1 γ j=1
kn
!
4δ X 
≤ 2E E Yn,j I[Vnj ≤λ] | Fn,j−1
r j=1
kn
!
4δ X 4δλ
= 2E E[Yn,j | Fn,j−1 ]I[Vn,j ≤λ] ≤
r j=1
γ2

55
0
(3) Note that max | Vn,j − Vn,j |
1≤j≤kn
j
X
0
≤ max | (E(Yn,i | Fn,i−1 ) − E(Yn,i | Fn,i−1 )) |
1≤j≤kn
i=1
kn
X
0
≤ E(| Yn,i − Yn,i || Fn,i−1 )
i=1
kn
X
≤ E(Yn,j I[Yn,j >δ or Vn,j >λ] | Fn,j−1 )
j=1
kn
X kn
X
≤ E(Yn,j I[Yn,j >δ] | Fn,j−1 ) + E(Yn,j I[Vn,j >λ] | Fn,j−1 )
j=1 j=1
kn
X kn
X
≤ E(Yn,j I[Yn,j >δ] | Fn,j−1 ) + E(Yn,j | Fn,j−1 )I[Vn,j >λ]
j=1 j=1
kn
X kn
X
≤ E(Yn,j I[Yn,j >δ] | Fn,j−1 ) + E(Yn,j | Fn,j−1 )I[Vn >λ]
j=1 j=1
kn
X
≤ E(Yn,j I[Yn,j >δ] | Fn,j−1 ) + Vn I[Vn >λ]
j=1
" kn
#
X γ
IIIn ≤ P E(Yn,j I[Yn,j >δ] | Fn,j−1 ) >
j=1
2
h γi
+P Vn I[Vn >λ] >
"k 2 #
n
X γ
≤ P E(Yn,j I[Yn,j >δ] | Fn,j−1 ) > + P [Vn > λ]
j=1
2

 
4δλ
So that lim sup P max | Un,j − Vn,j |> 3γ ≤ 2 sup P [Vn > λ] +
n→∞ 1≤j≤kn n γ2
1
Let λ → ∞, δ = λ2
. The proof is completed.

56
j
X
Thm. ∀ n {Sn,j = Xn,i , Fn,j } is a martingale
i=1

kn
D
X
If (i) Vn2 = 2
E(Xn,i | Fn,i−1 ) → C > 0
i=1
kn
D
X
2 0
and (ii) E(Xn,i 2 >ε] | Fn,i−1 ) → 0 Conditional Lindeberg s condition
I[Xn,i
i=1
kn
D
X
then Sn = Xn,i → N (0, C)
i=1

2
pf: Set Yn,j = Xn,j
2 D
By (ii) and lemma 1, Yn∗ = max Xn,j →0
1≤j≤kn
D
or max | Xn,j |→ 0
1≤j≤kn
By (i), {Vn2 } is tight.
Therefore by (ii) and lemma 2.
D D
Vn2 − Un2 → 0, So that Un2 → C by (i).
0
Now define Xn,j = Xn,j I j

X
2
E(Xn,j 2 >ε] | Fn,j−1 ) ≤ 1
I[Xn,j
 

 
i=1
" kn
#
X
2
Since P [Sn 6= Sn0 ] ≤ P E(Xn,j 2 >ε] | Fn,j−1 ) > 1
I[Xn,j →0
j=1
D
So that it is sufficient to show that Sn0 → N (0, C)

0 D
(a) max | Xn,j |≤ Xn∗ → 0
1≤j≤kn
"k #
n
02
X
2 2
(b) P [Un 6= Un ] ≤ P 2 >ε] | Fn,j−1 ) > 1
E(Xn,j I[Xn,j →0
j=1
02 D
So that Un → C

57
 
0 0 0
(c) E max (Xn,j )2 ≤ E max (Xn,j )2 I[(Xn,j
0 )2 ≤ε] + E max (X
2
n,j ) I[(Xn,j
0 )2 >ε]
1≤j≤kn 1≤j≤kn 1≤j≤kn
kn
X
0
≤ ε+E (Xn,j )2 I[(Xn,j
0 )2 >ε]

j=1
kn
X
2
= ε+E Xn,j 2 >ε] I
I[Xn,j j

X
j=1 2
E(Xn,i 2 >ε] | Fn,i−1 ) ≤ 1
I[Xn,i
 

 
i=1
≤ ε + 1 < ∞.

( i
)
X
Thm. Let Sn,i = Xn,j , Fn,i 1 ≤ j ≤ kn be a martingale, s.t.
j=1
kn
D
X
2
(i) E(Xn,i | Fn,i−1 ) → C > 0
i=1
and
kn
D
X
2
(ii) An = E(Xn,i 2 >ε] | Fn,i−1 ) → 0 ∀ ε
I[Xn,i
i=1
kn
D
X
Then Sn = Xn,i → N (0, C)
i=1

Conditional Lyapounov condition


kn
D
X
Bn = E(| Xn,i |2+δ | Fn,i−1 ) → 0 for some δ > 0
i=1

Lyapounov0 s condition ⇒ Lindeberg0 s condition


kn kn
| xn,i |2+δ
   
D
X X
2
E Xn,i I[Xn,i
2 >ε] | Fn,i−1 ≤ E √ | Fn,i−1 →0
i=1 i=1
( ε)δ
kn
X
2
E(An ) = E(Xn,i 2 >ε] ) → 0
I[Xn,i
i=1
Xkn
E(Bn ) = E | Xn,i |2+δ → 0
i=1

58
Both are sufficient since An ≥ 0 and Bn ≥ 0
Example: yi = βxi + εi , i = 1, 2, · · ·
n
X n
X
xi yi x i εi
i=1 i=1
β̂n = n =β+ n
!
X X
x2i x2i
i=1 i=1

Assumptions:
n
an X
(1) ∃ an > 0 s.t. an ↑ ∞, → 1 and x2i /an → 1 a.s.
an+1 i=1
(2) εi i.i.d. E(εi ) = 0, V ar(εi ) = σ 2
(3) xi is Fi = σ(xo , ε1 , · · · , εi−1 ) measurable
(a) If E | ε1 |2+δ < ∞ then
√ D
an (β̂n − β) → N (0, σ 2 )
(b) If (xi , εi ) are identically distributed with
E(Xi2 ) < ∞, and an = n, then
√ D
n(β̂n − β) → N (0, σ 2 )
n
X
x i εi
i=1 xi εi
Consider Sn = √
an
, i.e. Xn,i = √
an
, kn = n

kn n
X
2
X x2 i
(1) E(Xn,i | Fn,i−1 ) = E(ε2i )
i=1 i=1
an
Xn
x2i
i=1 a.s.
= σ2 → σ2
an

59
n n !
X X Xi 2+δ
(a) E(| Xn,i |2+δ | Fn,i−1 ) = √
an (E | ε1 |2+δ )
i=1 i=1
 n
X


max | Xi |  | Xi |2 
δ
1≤i≤n a.s.
  i=1
 
≤ √  E | ε1 |2+δ →0
an 
 a n

n
X n−1
X
x2i x2i
x2n i=1 an−1 i=1 a.s.
= − · →0
an an an an−1
max (x2i )
1≤i≤n a.s.
⇒ →0
an

n
!
X Xi2 ε2i 
(b) E I Xi2 ε2i 
i=1
n n

n
!
1X
= E X12 ε21 I X12 ε21 
n i=1 n

n→∞
= E(X12 ε21 I[X12 ε21 >nδ] ) −→ 0

Note that

E(X12 ε21 ) = E(X12 E(ε21 | Fo )) = σ 2 E(X12 ) < ∞

Lemma. If Z ≥ 0 and E(Z) < ∞

then lim E(ZI[Z>Cn ] ) = 0 when Cn → ∞


n→∞
0 ≤ Zn = ZI[Z>Cn ] ≤ Z
Zn → 0a.s. by Lebesgue Dominated Convergence Theorem

Theorem 1. (Unconditional form)

60
( i
)
X
Let Sn,i = Xn,j , Fn,i , 1 ≤ i ≤ kn be a martingale s.t.
j=1

kn
D
X
2
(1) Xn,j →C>0
j=1
D
(2) Xn∗ = max | Xni |→ 0
1≤i≤kn

(3) sup E(Xn∗ )2 < ∞


n
kn
D
X
Then Sn = Xn,i → N (0, C)
i=1

Theorem 3.
(1) +
E(Xn∗ ) → 0 is sufficient
!
Note that (3) ⇒ {Xn∗ } is u.i.
(2) + u.i. ⇒ lim E(Xn∗ ) = 0
n→∞

Theorem 30 .
(1)+(2)+
kn
D
X
E(Xn,j I[|X
nj |>1]
|Fn,j−1 )| → 0 is sufficient
j=1

Lemma. Assume that Yn,j ≥ 0 is Fnj -adaptive


 

If E(Yn ) = E max Ynj = 0(1)
1≤j≤kn
kn
D
X
then E(Yn,j I[Yn,j >ε] | Fn,j−1 ) → 0 ∀ ε > 0
j=1

S kn
> ε] = [Yn∗ > ε]

inf {1 ≤ j ≤ kn : Yn,j > ε} on j=1 [Yn,j
pf : Define τn =
kn Otherwise

61
∀δ(> 0 )
kn
X
P E(Yn,j I[Yn,j >ε] | Fn,j−1 ) > δ Fn,j−1 −measurable
j=1
(τ )
Xn

≤ P {τn < kn } + P E(Yn,j I[Yn,j >ε] | Fn,j−1 ) > δ


j=1
(k )
X n

≤ P {Yn∗ > ε} + P I[τn ≥j] E(Yn,j I[Yn,j >ε] | Fn,j−1 ) > δ


j=1
(k )
X n

≤ P {Yn∗ > ε} + P E(Yn,j I[τn ≥j,Yn,j >ε] | Fn,j−1 ) > δ


j=1
kn
!
X
≤ ε−1 E(Yn∗ ) + δ −1 E Yn,j I[τn ≥j,Yn,j >ε]
j=1
kn
!
X
≤ ε−1 E(Yn∗ ) + δ −1 E Yn∗ I[τn ≥j,Yn,j >ε]
j=1
−1
≤ε E(Yn∗ ) +δ −1
E(Yn∗ ) → 0.

Corollary 1. Yn,j ≥ 0 is Fn,j -adaptive


kn
D D
X
If Yn∗ → 0 then P [Yn,j > ε | Fn,j−1 ] → 0, ∀ ε > 0
j=1

pf: Fix ε > 0

Let znj = I[Yn,j >ε] ≥ 0


zn∗ = max I[Yn,j >ε] = I[Yn∗ >ε]
1≤j≤kn
E(zn∗ ) = P [Yn∗ > ε] = 0(1)

62
kn
D
X
Therefore E(zn,j I[zn,j > 1 ] | Fn,j−1 ) → 0
2
j=1
kn
X
= E(I[Yn,j >ε] I[zn,j =1] | Fn,j−1 )
j=1
kn
X
= E(I[Yn,j >ε] | Fn,j−1 )
j=1
kn
X
= P (Yn,j > ε | Fn,j−1 ).
j=1

Corollary 2. Thm 3. is a corollary of Thm 30 .


pf: Let Yn,j =| Xn,j |
Then E(Yn∗ ) = E(Xn∗ ) → 0
kn
D
X
So that E(| Xn,j | I[|Xn,j |>ε] | Fn,j−1 ) → 0.
j=1
Corollary 3.
If (1) Yn,j ≥ 0 is Fn,j -adaptive
(2) | Yn,j |≤ C ∀ n, j
D
(3) Yn∗ → 0
kn
D
X
2
then E(Yn,j 2 >ε] | Fn,j−1 ) → 0
I[Yn,j
j=1
k
X n
2
pf: E(Yn,j 2 >ε] | Fn,j−1 )
I[Yn,j
j=1
√ D
≤ C 2 kj=1
Pn
P [Yn,j > ε | Fn,j−1 ] → 0 by (3) and Corollary 1.
kn
D
X
E(Yn,j I[Yn,j >ε] | Fn,j−1 ) → 0
j=1
kn
D
Pkn X
Vn = j=1 E(Yn,j | Fn,j−1 ) is tight ⇒| Yn,j − Vn |→ 0
j=1

63
pf. of Theorem 30
kn
X
Sn = Xn,i
i=1
Xkn kn
X
= Xn,i I[|Xn,i |≤1] + Xn,i I[|Xn,i |>1]
i=1 i=1

Let X
en,i = Xn,i I[X |≤1]
n,i

Note that
P [Xn,j 6= X
en,j , for some 1 ≤ j ≤ kn ]
≤ P [Xn∗ > 1] → 0 by (2)
D
So that Sn − Sen → 0
kn
D
X
2
and (1) gives Xen,j →C
j=1

en,j − E(X
X̄n,j = X en,j | Fn,j−1 )
kn
X
Sn − S̄n =
e E(Xn,j I[|Xn,j |≤1] | Fn,j−1 )
j=1
kn
X
= − E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) By martingale properties.
j=1
kn
D
X
So that | Sen − S̄n | ≤ | E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) |→ 0
j=1

Observe that

|Xen,j |≤ 1 ⇒| X̄n,j |≤ 2
So that sup E(X̄n∗ ) ≤ 2 [(3) is satisfied]
n

64

Xn = max | X
en,j − E(X
en,j | Fn,j−1 ) |
1≤j≤n

≤ max | X
en,j | + max | E(Xn,j I[|X |>1] | Fn,j−1 ) |
n,j
1≤j≤n 1≤j≤n
kn
X
≤ max | Xnj | + | E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) |
1≤j≤n
j=1

X kn kn
2 X
2
X n,j − Xn,j
e


j=1 j=1

X kn X kn
= −2 Xen,j E(X en,j | Fn,j−1 ) + E 2 (X
en,j | Fn,j−1 )


j=1 j=1
k kn
X n X
≤ 2 Xn,j E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) + E 2 (Xn,j I[|Xn,j |>1] | Fn,j−1 )
e

j=1 j=1

kn
! 1/2 kn
!1/2
X X
2
≤2 Xen,j E 2 (Xn,j I[|X |>1] | Fn,j−1 )
n,j
j=1 j=1
kn
X
+ E 2 (Xn,j I[|Xn,j |>1] | Fn,j−1 )
j=1

It is sufficient to show
kn
D
X
| E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) |2 → 0 (By the assumption ∀ 0 < δ < 1)
j=1
kn
(k )2
n
D
X X
| E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) |≤ | E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) | →0
j=1 j=1

65
Homework: Assume that Xn,j is Fnj -measurable
kn
D
X
2
(1) E(Xn,j 2 >ε] | Fn,j−1 ) → 0
I[Xn,j
j=1
kn
D
X
(2) E(Xn,j | Fn,j−1 ) → 0
j=1
kn
D
X
2
(3) {E(Xn,j | Fn,j−1 ) − E 2 (Xn,j | Fn,j−1 )} → C > 0
j=1
kn
D
X
Then Sn = Xn,j → N (0, C)
j=1

Exponential Inequality:
Theorem 1 (Bennett0 inequality):
Assume that {Xn } is a martingale difference with respect to {Fn } and τ is an {Fn }-
stopping time (with possible value ∞). Let σn2 = E(Xn2 | Fn−1 ) for n ≥ 1. Assume

that ∃ positive constants U and V such that Xn ≤ U a.s. for n ≥ 1 and i=1 σi2 ≤ V
a.s., Then ∀ λ > 0
( τ )  
X 1 2 −1 −1
P Xi ≥ λ ≤ exp − λ V ψ(4λV )
i=1
2

where ψ(λ) = (2/λ2 )[(1 + λ)log(1 + λ) − λ], ψ(0) = 1.


Note:

n ∞

Z
X 1 x2 1 1 − λ2
(i) Xi / n =⇒ √ e− 2 dx ∼ √ e 2.
i=1
2π λ 2π λ

(ii) Prokhorov0 s “arcsinh” inequality:


Its upper bound is
 
1 −1 −1
h = exp − λ(2υ) arcsinh(υλ(2V )
2
where υλV ≈ 0, arcsinh[υλ(2V )−1 ] ∼
−1
= υλ(2V )−1
λ2
   
∼ 1 −1 −1
h = exp − λ(2υ) υλ(2V ) = exp −
2 8V

66
Reference: (i) Annals probability (1985).
Johson, Schechtman, and Zin.
(ii) Journal of theoretical probalility (1989) (Levental).
Corollary:(Bernsteins in equality).
τ  
X 1 2 1
P( Xi ≥ λ) ≤ exp − λ /(V + υλ)
i=1
2 3

proof:
λ
By ψ(λ) ≥ (1 + )−1 , ∀ λ > 0.
3
idea:(i) Note that on (τ = ∞)

X τ
X
since E(Xi2 | Fi−1 ) = σi2 ≤ V a.s.
i=1 i=1
τ
X
Xi coverges a.s. on(τ = ∞).
i=1

(By Chow0 s Theorem).


(ii) We can replace
( τ ) ( τ )
X X
P Xi ≥ λ by P Xi > λ
i=1 i=1
since λ > 0, δ > 0.
( τ )  
X 1 2 −1 −1
P Xi > λ + δ ≤ exp − (λ + δ) V ψ(υ(λ + δ)V
i=1
2

( τ )
X
Let δ ↓ 0. Left = P Xi ≥ λ
i=1
 
1 2 −1 −1
right = exp − λ V ψ(υλV )
2

67
(iii)
τ
X ∞
X n
X
Xi = Xi I[τ ≥i] = lim Xi I[τ ≥i] a.s. (By (i))
n→∞
i=1 i=1 i=1
 
τ
!
X  
P Xi > λ = E I X
 
τ 
Xi > λ]
i=1
 
[
i=1
 
 
≤ E lim inf I  (Fatou0 s Lemma)
 
n
n→∞  
X 


Xi I[τ ≥i] > λ
 
i=1
 
 
≤ lim inf E I X
 
n 
n→∞
Xi I[τ ≥i] > λ}
 
{
i=1

Therefore, it is sufficient to show that


n
!  
X 1 2 −1 −1
P Xi I[τ ≥i] > λ ≤ exp − λ V ψ(υλV ) , ∀ n
i=1
2

(iv) {Xi I[τ ≥i] , Fi } is a martingale difference sequence.


X
since [τ ≥ i] = Ω\ (τ = j)Fi−1 − measurable.
j<i
=⇒ E(Xi I[τ ≥i] | Fi−1 ) = I[τ ≥i] E(Xi | Fi−1 ) = 0

So that,
n
X n
X
Xi2 I[τ ≥i] I[τ ≥i] E(Xi2 | Fi−1 )

E | Fi−1 =
i=1 i=1
τ
X
≤ σi2 ≤ V.
i=1

68
and Xi I[τ ≥i] ≤ υ a.s.
Proof: Let Yi = Xi I[τ ≥i] .

E(etYi | Fi−1 ), t > 0 (etYi ≤ etυ )


∞ j j
!
X t Yi
= E 1 + tYi + | Fi−1
j=2
j!
∞ j
X t E[Yi2 | Fi−1 ] j−2
≤ 1+ υ , Yij = Yi2 Yij−2 ≤ Yi2 υ j−2
j=2
j!
∞ j
X t I[τ ≥i]
= 1+ σi2 υ j−2
j=2
j!

!
X tj υ j
= 1+ I[τ ≥i] σi2
j=2
j!υ 2
2
= 1 + g(t)I[τ ≥i] σi2 ≤ eg(t)I[τ ≥i] σi ,

where

g(t) = (etυ − 1 − tυ)/υ 2 , and


∞ j j ∞
X tυ X (υt)j
= − 1 − tυ = etυ − 1 − tυ
j=2
j! j=0
j!

Claim:
 
j j
X
I[τ ≥i] σi2 
X  
t Yi 
 g(t)
i=1
e i=1 /e

is a supermartingale.

69
proof:
 n
X n
X

 t Yi − g(t) I[τ ≥i] σi2 
E e i=1 i=1 | Fn−1 
 
 

n−1
X n
X
t Yi − g(t) I[τ ≥i] σi2
E etYn | Fn−1
 
=e i=1 i=1

n−1
X n−1
X
t Yi − g(t) I[τ ≥i] σi2
≤e i=1 i=1

n
X n
X
 n
X

t Yi t Yi g(t)V −

I[τ ≥i] σi2 
Ee i=1 ≤ Ee i=1 · e i=1
 n n 
X X
 t Yi − g(t) I[τ ≥i] σi2 
 g(t)V
= E e e
 i=1 i=1
 

n
!
X
≤ eg(t)V since V − I[τ ≥i] σi2 > 0
i=1

( n )
X  Pn 
−λt t Yi
P Yi > λ ≤ e E e i=1

i=1

≤ e−λt · eg(t)V = e−λt+g(t)V , ∀ t > 0,


( n )
X inf (−λt + g(t)V )
P Yi > λ ≤ et>0
i=1

Differentiate h(t) = −λt + g(t)V


we obtain the minmizer to = υ −1 log(1 + υλV −1 )

70
Therefore
( n )
X
P Yi > λ ≤ eh(to)
i=1
 2 
λ −1 −1
= exp − V ψ(υλV ) .
2

Note:
Pn   Pn 
Eet i=1 Yi
= E E et i=1 Yi | Fn−1

Remark:

(i) ψ(0+ ) = 1
(ii) ψ(λ) ∼= 2λ−1 logλ, as λ → ∞.
λ
(iii) ψ(λ) ≥ (1 + )−1 , ∀λ > 0.
3
Reference: Appendix of shorack and wellner (1986, p.852).
∀ λ>0
( τ τ
)  2 
X X
2 λ −1 −1
P Xi > λ, σi ≤ V ≤ exp − V ψ(υλV )
i=1 i=1
2

also holds.
Example:

X
V = σi2 < ∞
( ni=1 ) ( τ )
X X
P Xi > λ, for some n ≤ P Xi > λ
i=1 i=1
( n
)
X
Let τ = inf n: Xi > λ .
i=1

Theorem 2 (Hoeffding0 s inequality):


Let {Xn , Fn } be an adaptive sequence such that ai ≤ Xi ≤ bi , a.s.
and µi = E[Xi | Fi−1 ]

71
Then ∀ λ > 0,
( n n
)
2λ2
X X  
P Xi − µi ≥ λ ≤ exp − Pn 2
i=1 i−1 i=1 (bi − ai )

2n2 λ2
 

or P X̄n − µ̄n ≥ λ ≤ exp − Pn 2
i=1 (bi − ai )

proof: By convexity of etx , (t > 0)

bi − Xi tai Xi − ai tbi
etXi ≤ e + e
b i − ai b i − ai

bi − µi t(ai −µi ) µi − ai t(bi −µi )


E et(Xi −µi ) | Fi−1

≤ e + e
b i − ai b i − ai
= eL(hi )

where L(hi ) = −hi Pi + `n(1 − Pi + Pi ehi )


µ i − ai
hi = t(bi − ai ), Pi =
b i − ai
t(ai −µi )
+ Pi et(bi −µi )
 
L(hi ) = `n 1 − Pi )e
= `n et(ai −µi ) (1 − Pi ) + Pi et(bi −ai )
 

L0 (hi ) = −Pi + Pi / (1 − Pi )e−hi + Pi


 

Pi (1 − Pi )e−hi
L00 (hi ) = = ui (1 − ui )
[(1 − Pi )e−hi + Pi ]2

where 0 ≤ ui = Pi /[(1 − Pi )e−hi + Pi ] ≤ 1


1
L(hi ) = L(0) + L0 (0)hi + L00 (h∗i )h2i
2
1 0 1
≤ L(0) + L (0)hi + h2i
2 8
L(hi ) ≤ h2i /8 ≤ t2 (bi − ai )2 /8

t2 (bi − ai )2
 
t(Xi −µi )
So that E(e ) ≤ exp
8

72
n
X
t (Xi − µi )
Ee i=1

≤ E{E(· · · | Fn−1 )}
n−1
X
t (Xi − µi )
1 2
t (bi −ai )2
≤ e 8 Ee i=1
n
X
1 2
8
t (bi − ai )2
≤ e i=1

( n ) " n
#
X 1 X
So that P (Xi − µi ) > λ ≤ exp −λt + t2 (bi − ai )2
i=1
8 i=1

n
1 X
Leth(t) = −λt + t2 (bi − ai )2
8 i=1

n
X
minimizer t0 = 4λ (bi − ai )2
i=1

 2
  n
4λ 1 4λ  X
h(t0 ) = −λ n +  n
  (bi − ai )2
X 8 X 
(bi − ai )2 (bi − ai )2
  i=1
i=1 i=1
n
X
= −2λ2 (bi − ai )2
i=1

( n ) " n
X #
X
So that P (Xi − µi ) > λ ≤ exp −2λ2 (bi − ai )2
i=1 i=1

Application: yn = βXn + εn , where


Xn is Fn -measurable r.v.s.
εn i.i.d. with common distriburtion F.

73
εn is independent of Fn−1 ⊃ σ(ε1 , · · · , εn )
Eεn = 0, 0 < V ar(εn ) = σ 2 < ∞ .
Question : Test F = Fo (Ho )
Example : AR(1) process
yn = βyn−1 + εn , yo Fo − measurable
n
1X
F̂n (u) = I ,
n i=1 [yi −β̂n xi ≤u]

where β̂n an estimator of β based on {(y1 , x1 ), · · · , (yn , xn )}.


n
1X
idea : F̂n (u) ∼
= Fn (u) = I[ε ≤u] , if β̂n xi ∼
= βxi
n i=1 i
P
sup | Fn (u) − F0 (u) |→ 0
u
√ D o
n sup | Fn (u) − F0 (u) |→ sup |ω (t) |, (Under Ho )
u 0≤t≤1
o o
where ω (t) is the Brownian Bridge which is defined by ω (t) = w(t) − tw(1) and
w(t) is the Brownian Motion
(i) {w(ti ) − w(si )} are independent,
∀ 0 = s0 ≤ t0 ≤ s1 ≤ t1 ≤ · · · ≤ sn ≤ tn
(ii) w(t) − w(s) = N (0, t − s)
(iii) w(0) = 0
If the εn are independent and have a cemmon distribution function F (t). Then for
large n,
Fn (t, w) → F (t).
Glivenko-Cantelli theoren:
sup | Fn (t) − F (t) |→ 0 a.s.
0≤t≤1
n
1X
Fn (t) = I[ε ≤t]
n i=1 i
Basic Theorem:
If εi are i.i.d. U (0, 1).
Then
n
!
1 X  
αn (t) = √ I[εi ≤t] − F (t)
n i=1
D o
→ω (t) in D − space.

74
√ P
Wish : n sup | F̂n (u) − Fn (u) |→ 0 (In general, it is wrong)
u
√ D o
n sup | F̂n (u) − Fn (u) |→ sup |ω (t) |
u 0≤t≤1

Reject if n sup | F̂n (u) − Fn (u) |> Cα
u
Compare:
n
√ 1X P
(i) n sup | F̂n (u) − F (u + (β̂n − β)xi ) − Fn (u) + F (u) |→ 0 (right)
u n i=1
√ P
(ii) n sup | [F̂n (u) − F (u)] − [Fn (u) − F (u)] |→ 0 (It is wrong, in general)
u
n
1X
F̂n (u) = I
n i=1 [yi −β̂n xi ≤u]
n
1X
= I
n i=1 [εi ≤u+(β̂n −β)xi ]
F (c xi + u)
= E(I[εi ≤c xi +u] | Fi−1 )
(If C is constant, we can use the exponential bound).

n(F̂n (u) − F (u))
n
√ 1X
= n(F̂n (u) − F (·) − Fn (u) + F (u)) · · · (1)
n i=1
n
!
√ 1X
+ n F (·) − F (u) · · · (2)
n i=1

+ n(Fn (u) − F (u)) · · · (3)
In fact, tell us:
n
1 X
√ [F (u + (β̂n − β)xi ) − F (u)]
n i=1
n
∼ 1 X 0
=√ F (u)(β̂n − β)xi
n i=1
n
!
1 X
= F 0 (u) √ xi (β̂n − β) does not converge to zero.
n i=1

75
Example:

yi = βxi + εi , xi = 1, β̂n − β = ε̄n


n
!
1 X √ D
√ xi (β̂n − β) = n(ε̄n ) → N (0, 1)
n i=1

wish:(1) → 0p (1)
(2) → 0, and
D
known (3) → Wo (t), 0 ≤ t ≤ 1
Classical result: υ(0, 1) = F
Define: √
αn (t) = n(Fn (t) − t)
Oscillation modulus:

Wn (δ) = sup | αn (t) − αn (u) |


|t−u|≤δ

Lemma:∀ ε > 0, ∀ η > 0, ∃ δ and N 3 n ≥ N, P {Wn (δ) ≥ ε} ≤ η.


Reference:
Billingsley, (1968)
Convergence of probability measures. (Book).
Papers:
(i) W. Stute (1982, 1984). Ann. Prob. p.86-107, p.361-379.
(ii) The Oscillation behavior of empirical process.: The Multivariate case.
Key idea:
• If (β̂n − β) ∼
= C and u fixed. Then
n
1 X 
√ I(εi ≤Cxi +u) − F (Cxi + u) − I[εi ≤u] + F (u)
n i=1

n
X
By Yi , (Yi | Fi−1 ) ∼ b(1, Pi ) and exponential bound. Pi ∈ Fi−1 -measurable.
i=1
0
• Lemma: If k F∞ k, Then
n
√ X
n sup | I[εi ≤u+δni ] − F (u + δni ) − I[εi ≤u] + F (u) |
u
i=1
P 1
→ 0, if δn = op ( √ )
n

76
•(β̂n − β) = op (an ) PP
∃ c ∈ Cn lattice points and ∀x ∈ ( : square set) .
1
3 (c − x) sup | xi |= 0( √ )
1≤i≤n n
# (Cn ) ≤ nk .
wish:
n
√ 1X P
n sup | F̂n (u) − F (u + (β̂n − β)xi ) − Fn (u) + F (u) |→ 0
u n i=1

By
n
√ 1X
n sup sup | F̂n (u) − F (u + cxi ) − Fn (u) + F (u) |
u c∈Cn n i=1

∀ ε >0

X 
P n sup | F̂n (u) − · · · |> ε
u
c∈Cn
XX n√ o
≤ P n | F̂n (u) · · · |> ε
u∈Un c∈Cn
nε2
0
  0
≤ nk+k e− 2
t
. if #(Un ) ≤ nk

Question:
n
1 X 
√ I[εi ≤(β̂n −β)xi +u] − F ((β̂n − β)xi + u) − I[εi ≤u] + F (u)
n i=1

•(β̂n − β) = Op (an )
Yi = βXi + εi , εi i.i.d. with distribution ft. F
Xi ∈ Fi−1 -measurable, εi independent of Fi−1 .
n
1 X 
√ I[εi ≤δXi +u] − F (δXi + u) − I[εi ≤u] + F (u)
n i=1
n
1 X
=√ Yi
n i=1

77
Z Z
(a) E[Yi | Fi−1 ] = dF (ε) − F (δXi + u) − dF (ε) + F (u)
[ε≤δXi +u] [ε≤u]
= 0

(b) − 1 ≤ yi ≤ 1

Hoeffding0 s inequality: (not good)


{Yi , Fi } is a martingale difference.

−1 = ai ≤ Yi ≤ bi = 1
 

2n2 t2
 
 
P {Ȳn ≥ t} ≤ exp − n
 .

X
(bi − ai )2
 

( n i=1 ) 2
1 X 2n2 λn
So that P √ Yi ≥ λ ≤ 2e− 2n = 2exp[−λ2 ]
n i=1

— It can0 t reflect the true variance.


Bennett0 s inequality: (better)
τ
X
Yi ≤ υ, E(Yi2 | Fi−1 ) ≤ V
i=1
( τ )  2 
X t −1
P Yi ≥ t ≤ exp − ψ(U tV )
i=1
2V

E(Yi2 | Fi−1 ) = | F (δXi + u) − F (u) || 1 − (· · · ) |


≤ | F (δXi + u) − F (u) |
≤ k F 0 k∞ | δ || Xi |

78
n
X n
X
0
E[Yi2 | Fi−1 ] ≤k F k∞ | δ | | xi |
i=1 i−1
n
X n
X
x i εi x i εi
i=1 i=1 1
β̂n − β = n = ! 12 ! 12
X n n
x2i
X X
x2i x2i
i=1
i=1 i=1
n
x2i ∼
X
= a2n cn
i=1
n
X
n
| xi | n
! 12
1
X i=1
X
(β̂n − β) | xi |≈ Op (1) ! 12 ≤ n 2 x2i
n
i=1 X i=1
x2i
i=1


take V = nc, τ = n, υ = 1
( n
)
1 X
P |√ Yi |> λ
n i=1
 √
( nλ)2 √ √

≤ exp − √ ψ( nλ/ nc)
2 nc
 √ 2  
nλ λ
= exp − ψ
2c c
Law of the iteratived logarithm:
classical: Xn i.i.d., EXn = 0, 0 < V ar(Xn ) = σ 2 < ∞
Sn
lim sup √ = σ a.s.
n→∞ 2nloglogn



D
(a) Zn = Sn nσ ∼ N (0, 1)

p
Sn = Zn 2loglogn

79
(b) if m and n very closeness.
If Zm and Zn are very closeness.
n
!2
X
E Xi r
i=1 n 1 n
E(Zm Zn ) = 2
√ = 2√ == 2 .
σ mn σ mn σ m
n
(c) m
= 1c , c large enough.

n1 = c, n2 = c2 , · · · , nk = ck
Zn1 , Zn2 , · · · , Znk ' i.i.d.N (0, 1).

(d) if Yi is i.i.d. N(0,1)



p
lim sup Yn 2logn = 1 a.s.
n→∞

proof: ∀ ε > 0
p
P {Yn ≥ (1 + ε) 2logn i.o.} = 0
p
P {Yn ≥ (1 − ε) 2logn i.o.} = 1

By Borel-Contelli lemma, we only have to check



X p
P {Yn ≥ (1 + δ) 2logn} < ∞
n=1

X 1 2(1+δ)2 logn
∼ √ e− 2

n=1
(1 + δ) 2logn

X 1 1
= √ (1+δ) 2 < ∞ if δ > 0 .
n=1
(1 + δ) 2logn n

Zn,k
(e) lim sup √ = 1 a.s.
n→∞ 2logk
nk = ck , loglognk = logk + loglogc.

S ck
(f) lim sup p = 1 a.s.
k→∞ ck · 2 · loglogck

80
Sn
(g) lim sup √ = 1 a.s.
n→∞ σ 2nloglogn
Theorem A: Let {Xi , Fi } be a martingale difference such that | Xi |≤ υ a.s. and
n
X
s2n = E(Xi2 | Fi−1 ) → ∞ a.s.
i=1

Then
Sn
lim sup 1 ≤ 1 a.s.
n→∞ sn (2loglogs2n ) 2
where
n
X
Sn = Xi
i=1

Corollary:
Sn
lim inf 1 ≥ −1
n→∞ sn (2loglogs2n ) 2
| Sn |
and lim sup 1 ≤ 1 a.s.
n→∞ sn (2loglogs2n ) 2
proof: (theorem A)
c>1
∀ k, let Tk = inf {n : s2n+1 ≥ c2k }
So that Tk is a stopping time
Tk < ∞ a.s. since s2n → ∞ a.s.
Consider STk

a.s.
ST2k ≤c ,2k
ST2k c2k → 1.

Want to show:
n p o
k
(∗) P STk > (1 + ε)c 2logk, i.o. = 0
  
2 12
⇒ lim sup (STk STk (2loglogsTk ) )] ≤ 1 + ε a.s.
k→∞

81
By Bennett0 s inequality, let
p
λ = (1 + ε)ck 2logk, V = c2k , υ = υ

λ2
  
X υλ
(∗) ≤ exp − ψ
k=1
2V V
  k
√  
X∞ (1 + ε)2 c2k ψ υ(1+ε)cc2k 2logk 2logk
= exp − 2k

k=1
2c

X
≤ c0 exp[−(1 + ε0 )2 logk]
k=1

X 1
= c0 <∞
k=1
k (1+ε0 )2

1
Because (1 + ε)2 logk · ≥ (1 + ε0 )2 logk


υ(1+ε)ck 2logk
1+ c2k
∀ n, ∃ Tk , Tk+1 , s.t. Tk ≤ n ≤ Tk+1
Sn = STk + Sn − STk
S ST Sn − STk
p n ≤ p k + p
sn 2loglogsn2 sn 2loglogsn sn 2loglogs2n
2

Given ε > 0 , choose c > 1


So that ε2 /(c2 − 1) > 1
( n )
X
Xi I[Tk <i≤Tk+1 ] , Fn is a martingale
i=1
n
!
X
sup (Sn − STk ) ≤ sup Xi I[Tk <i≤Tk+1 ]
Tk <n≤Tk+1 1≤n<∞
i=1
Tk+1
X
Since E(Xi2 | Fi−1 ) = ST2k+1 − ST2k +1 ≤ c2(k+1) − c2k = c2k (c2 − 1).
i=Tk +1

Want to prove:
p
P{ sup (Sn − STk ) > εck 2logk, i.o.} = 0
Tk <n≤Tk+1
j
X p
pf : Def τ = inf {j : Xi I[Tk <i≤Tk+1 ] > εck 2logk}
i=1

82

( )
X p
k
P sup (Sn − STk ) > εc 2logk
Tk <n≤Tk+1
k=1

( τ )
X X p
= P Xi I[Tk <i≤Tk+1 ] > εck 2logk
k=1 i=1
∞  k√
ε2 c2k 2logk
 
X υc 2logk
≤ exp − 2 2k
ψ
k=1
2(c − 1)c (c2 − 1)c2k
∞  2  √
υ 2logkck

X ε logk
≤ exp − 2 ψ
k=1
c −1 (c2 − 1)c2k

when k is large, [ε2 /(c2 − 1)]ψ(·) ≥ 1 + δ, for some δ > 0.



X
0
≤ C exp[−(1 + δ) log k]
k=1

X
= C0 k −(1+δ) < ∞.
k=1

Reference:

1. W. Stout: A martingale analysis of kolmogorov0 s law of the iteratived logarithm.


Z.W. Verw. Geb. 15, 279∼290, (1970).

2. D.A. Freedman, Ann. Prob. (1975), 3, 100-118. On Tail Probability For


Martingale.

Exponential Centering:
X ∼ F, ∃ ϕ(t) = EetX

P {X > µ}
etx dF (x)
Z Z
= dF (x) = ϕ(t)e−tx
[x>µ] [x>µ] ϕ(t)
Z
= ϕ(t) e−tx dG(x)
[x>µ]

Under G, X have the mean=ψ 0 (t) and Variance=ψ 00 (t).


tx dF (x)
where ψ(t) = log ϕ(t), G(x) = e ϕ(t)

83
d
R
xetx etx dF ϕ0 (t)
Z Z
• xdG(x) = dF (x) = dt
= = [log ϕ(t)]0
ϕ(t) ϕ(t) ϕ(t)
= ψ 0 (t)
R
Similarly, for x2 dG(x).

So, P {x > u}
Z
= ϕ(t) e−tx dG(x)
[x>u]
Z
0 0
= ϕ(t)e−tψ (t)   e−t(x−ψ (t)) dG(x)
x−ψ 0 (t) u−ψ 0−1 (t)
√ > √
ψ 00 (t) ψ 00 (t)

0
Z √ 00
= eψ(t)−tψ (t)   e−t ψ (t)z dH(z)
u−ψ 0 (t)
z> √ 00
ψ (t)
p
where H(z) = G( ψ 00 (t)z + ψ 0 (t)).

Example: X ∼ N (0, 1)
t2
ϕ(t) = e− 2
ψ(t) = t2 /2, ψ 0 (t) = t, ψ 00 (t) = 1
Z
t2
−t2
P {X > u} = e 2 e−tz dH(z)
[z>u−t]
H(z) ∼ N (0, 1)

Simulation: t = u
2
Z
− u2
P {X > u} = e e−uz dH(z)
[z>0]

Exponential bound : t = u(1 + ε)


2
Z
− u2 (1+ε)2
P {X > u} = e e−u(1+ε)z dH(z)
[z>−εu]
2
Z
− u2 (1+ε)2
≥ e e−u(1+ε)z dH(z)
[0≥z>−εu]

Ref: R.B. Bahadur : Some limit theorems in statisties. SIAM.


Lemma 1: If E[X | F] = 0, E[X 2 | F] ≥ c > 0 and E[X 4 | F] ≤ d < ∞

84
Then P {X > 0 | F} ∧ P {X < 0 | F} ≥ c2 /4d.
proof:

E[X | F] = 0 ⇔ E[X + | F] = E[X − | F]


c
E[X 2 | F] ≥ c ⇔ E[X +2 | F] ≥ or
2
−2 c
E[X | F] ≥
2
2 c
Assume that:E(X + | F) ≥ 2

c 2 4 1
≤ E(X +2 | F) = E[(X + ) 3 · (X + ) 3 | F ] (Hölder inequality)
2
2 1
≤ E 3 (X + | F)(E(X + )4 ) 3

c
So that ( )3 ≤ E 2 (X + | F)E(X 4 | F)
2
c 3
( ) /d ≤ E 2 (X + | F)
2
c 3 1
( ) 2 /d 2 ≤ E(X + | F) = E(X + I[X>0] | F)
2
1 3
≤ E 4 (X 4 | F)E 4 (I[X>0] | F) (Hölder inequality)
1 3
≤ d 4 P 4 {X > 0 | F}
c
( )6 /d2 ≤ dP 3 {X > 0 | F}, implies
2
c2
P {X > 0 | F} ≥
4d
c 32
( )
Similarly, E(X − | F) ≥ 2 1 , and
d2
P {X < 0 | F} ≥ c2 /4d.

Lemma 2: !Assume that {εn , Fn } is a martingale difference sequence such that


Xn
E ε2i | Fo ≥ c2 > 0.
i=1

n
X
E(ε2i | Fi−1 ) ≤ c1 and sup | εi |≤ M a.s.
1≤i≤n
i=1

85
Then there is a universal constant B s.t.
( n ) ( n )
X X
p εi < 0 | Fo ∧ P εi > 0 | Fo
i=1 i=1
≥ Bc22 /(c21 +M )4

proof: (i) Burkholder-Gundy-Davis

p>0
" i
#
X
E ( sup | εj |P | Fo
1≤i≤n
j=1

n
! P2 
X
≤ kE  E(ε2j | Fj−1 ) | Fo 
j=1
  
P
+kE max | εi | | Fo
1≤j≤n

use: If E(XIA ) ≥ E(Y IA ), ∀AF, X ≥ 0, Y ≥ 0


Then E(X | F) ≥ E(Y | F) a.s.

p.f. : Let A = {E(X | F) < E(Y | F)}


E(XIA − Y IA ) = E{[E(X | F) − E(Y | F)]IA }
= E(XIA ) − E(Y IA )

⇒ P (A) = 0.
(ii) By a conditional version of B-G-D inequality take p=4.
 4 
X n
E  εj | Fo 


j=1
( )2 
X n
≤ kE  (ε2i | Fi−1 ) | Fo 
i=1
 
4
+kE max | εi | | Fo
1≤i≤n

≤ kc21 + kM 4 = k(c21 + M 4 )

86
By Lemma 1,
( n ) ( n )
X X
P εi > 0 | Fo ∧P εi < 0 | Fo
i=1 i=1
≥ c22 /4k(c21 + M 4 ) = Bc22 /(c21 + M 4 )
1
where B = 4k
n
!
X
use (i) E εi | Fo =0
i=1
 !2  !
n
X n
X
(ii) E εi | Fo  = E ε2i | Fo ≥ c2 .
i=1 i=1

Similarly,
( n ) ( n )
X X
P εi (−λ, 0) | Fo ∧P εi (0, λ) | Fo
i=1 i=1
≥ Bc22 /(c21 4
+ M ) − c1 /λ 2

(By Markov-inequality)
n
X
Let Sn = Xi
i=1
Assumptions:
(i) {Xi , Fi } is a martingale difference sequence.
(ii) P {| Xi |≤ d} = 1, ∀ 1 ≤ i ≤ n
Notations:
i
X
σi2 = E(Xi2 | Fi−1 ), s2i = σj2
j=1
−1 −2
g1 (x) = x (e − 1), g(x) = x (ex − 1 − x)
x

Conditional Exponential Centering:


idea : P {A | Fo } = E(E(· · · E(E(IA | Fn−1 ) | Fn−2 ) · · · | Fo ))
ϕi (t) = E[etXi | Fi−1 ], ψi (t) = logϕi (t)
Definition
(t)
Fi (x) = E[I[Xi ≤x] etXi | Fi−1 ]/ψi (t)

87
So that P {Sn > λ | Fo }
n
X
Z Z "Y
n
# −t xi
(t)
= ··· [ϕi (t)] e i=1 dFn(t) · · · dF1
[Sn >λ] i=1
n
X n
X
Z Z [ψi (t)] −t xi
(t)
= ··· e i=1 e i=1 dFn(t) · · · dF1
[Sn >λ]
n
X n
X
Z Z [ψi (t) − tψi0 (t)] −t (xi − ψi0 (t))
(t)
= ··· e i=1 e i=1 dFn(t) · · · dF1
[Sn >λ]

Under new measure,

E[Xi | Fi−1 ] = ψi0 (t).


V ar(Xi | Fi−1 ) = ψi00 (t).

Goal : Compute P {Sn > λ | Fo }


= E(I[Sn >λ] | Fo )
= E(E · · · E(I[Sn >λ] | Fn−1 ) | Fn−2 ) · · · | Fo )
(t) etXi
by dFi = dP [Xi ≤ x | Fi−1 ].
ϕi (t)

88
Now, if s2n ≤ M, g(−td) − t2 d2 g 2 (−td) − g1 (td) ≤ 0,
then P {Sn > λ | Fo }
Xn
Z Z Y" n
# −t xi
(t)
= ··· ϕi (t) e i=1 dFn(t) · · · dF1 , ∀t > 0
[Sn >λ] i=1
n
X n
X
Z Z ψi (t) − t xi
(t)
= ··· e i=1 i=1 dFn(t) · · · dF1
[Sn >λ]
n
X n
X
Z Z (ψi (t) − tψi0 (t)) −t (xi − ψi0 (t))
(t)
(∗∗) = ··· e i=1 e i=1 dFn(t) · · · dF1
[Sn >λ]
(t) (t)
under dFn , · · · dF1 ,

Z
(t)
E[Yi | Fi−1 ] = ydFi (y) = E[Xi etXi | Fi−1 ]
= [logϕi (t)]0 = ψi0 (t), and
V ar(Yi | Fi−1 ) = ψi00 (t).

•ψi0 (t) = E(Xi etXi | Fi−1 )


= E(Xi (etXi − 1) | Fi−1 )
etXi − 1
= E[tXi2 | Fi−1 ]
tXi
= tE[Xi2 g1 (tXi ) | Fi−1 ], where g1 (x) ↑ as x ↑ .

≤ tE[Xi2 g1 (td) | Fi−1 ] ≤ tσi2 g1 (td).
≥ tE[Xi2 g1 (−td) | Fi−1 ] ≥ tσi2 g1 (−td).
 x 
e −1
Since g1 (x) > 0, ≥ 0, ∀x
x
ϕ0i (t) ≥ 0, There ϕi (t) ≥ ϕi (0) = 1

89
• • ϕi (t) = E[etXi | Fi−1 ]
= E[1 + tXi + t2 Xi2 g(tXi ) | Fi−1 ]

≤ 1 + t2 σi2 g(td)
≥ 1 + t2 σi2 g(−td)

0
ϕi (t)
• • •ψi0 (t) =
ϕ (t)
(i
≤ tg1 (td)σi2
tg (−td)σi2
≥ 1+t1 2 σ2 g(td)
i

So, ψi (t) − tψi0 (t) ≥ logϕi (t) − t2 σi2 g1 (td).


≥ log[1 + t2 σi2 g(−td)] − t2 σi2 g1 (td)
2 2
(1 + u ≥ eu−u , u ≥ 0; eu (1 + u) ≥ eu )
≥ t2 σi2 g(−td) − t4 σi4 g 2 (−td) − t2 g1 (td)σi2
≥ t2 σi2 {g(−td) − t2 d2 g 2 (−td) − g1 (td)}
Because σi2 = E(Xi2 | Fi−1 ) ≤ d2

, and
n
X
(ψi (t) − tψi0 (t))
i=1
≥ t2 s2n {g(−td) − t2 d2 g 2 (−td) − g1 (td)}
Xn
Because σi2 = s2n .
i=1

90
Thus,
2 M [g(−td)−t2 d2 g 2 (−td)−g (td)]
(∗∗) ≥ et 1

n
X
Z Z −t (xi − ψi0 (t))
(t)
· ··· e dFn(t) · · · dF1 .
i=1
[Sn >λ]
n n
!
X X
[Sn > λ] = [Sn − ψi0 (t) > λ − ψi0 (t)].
i=1 i=1
 n 
X
Z Z −t

(xi − ψi0 (t))
≥ ··· h i e i=1 dFn(t) · · · dFn(t)
tmg(−td)
Sn − n 0
P
i=1 ψi (t)≥λ− 1+t2 d2 g(td)
2 2 2 2
·et M [g(−td)−t d g (−td)−g1 (td)]
2 2 d2 g 2 (−td)−g (td)]
≥Z et M [g(−td)−t
Z
1

(t)
· ··· h i 1dFn(t) · · · dF1 .
tmg(−td)
0≥Sn − n 0
P
i=1 ψi (t)≥λ− 1+t2 d2 g(td)

• • • • ϕ00i (t) = E(Xi2 etXi | Fi−1 )


≤ E(Xi2 etd | Fi−1 ) = etd σi2
≥ e−td σi2
ϕ00i (t)
ψi00 (t) = − (ψi0 (t))2
ϕi (t)
≤ ϕ00i (t)/ϕi (t) ≤ ϕ00i (t) ≤ etd σi2
e−td σi2
≥ − t2 g12 (td)σi4
1 + t2 σi2 g(td)
2 σ 2 g(td)
≥ σi2 e−td e−t i − t2 g12 (td)σi4
2 d2 g(td)
≥ σi2 [e−td−t − t2 d2 g12 (td)].

n
X
So, ψi00 (t)
i=1
≤ s2n etd

= 2 2
≥ s2n {e−td−t d g(td) − t2 d2 g12 (td)}

91
√ √
Replace t by t/ M and λ by (1 − r) M t.

(∗ ∗ ∗) P {Sn > (1 − r) M t | Fo }
√ √
t2 {g(−td/ M )−(td/ M )2 g 2 (− √td )−g1 ( √td )}
≥ eZ M M
Z √ √
(t/ M )
· ··· " Pn    √ m √t

g(−td/ M )
# dFn(t/ M)
· · · dF1
0 √t M
0≥ i=1 xi −ψi ≥(1−r) M t− 2 √
M 1+ t d2 g 2 (td/ M )
M
√ 2d
Let εi = Xi − ψi0 (t/ M ), | εi |≤ √
M
n 2 √ √
X s
(1) E(ε2i | Fi−1 ) ≤ n e(td/ M ) ≤ etd/ M = c1
i=1
M
" n # " n #
X X √
(2) E ε2i | Fo = E (Xi − ψi0 (t/ M ))2 | Fo
i=1 i=1
t2 d2 2 td
 
M1 td td m t td
≥ − 2 √ g1 √ + √ g1 (− √ )/(1 + g ( √ ))
M M M M M M M M
Thus,
2
√ √ 2 2
√ √
(∗ ∗ ∗) ≥ et [g(td/ M )−(td/ M ) g (−td/ M )−g1 (td/ M )]
h √ √ √
 M M
1
− (2td/ M )g 1 (td/ M ) + m √t
M M
g(−td/ M) · B
· √ √
 e2td/ M + (2d/ M )4
√ )
etd/ M
− m
√ 2 2

t2 [(1 − r) − M tg(−td/ M )/(1 + tMd g 2 (td/ M ))]2

Let t → ∞, and td/ M → 0
n
X
Assume that E(Xi2 | Fo ) ≥ M1 > 0, and let M1 /M → 1, m/M → 1,
i=1
n
X
m≤ E(Xi2 | Fi−1 ) ≤ M , and 1 − (m/M ) < r
i=1
Then √
P {Sn > (1 − r) M t | Fo }
t2
≥ e− 2 (1+0(1)) · B(1 + 0(1))
In summary:

92
For each n, {Xn,i , Fu,i , i = 1, 2, · · · , n} is a martingale difference such that

(1) sup | Xn,i |≤ dn , dn increasing.


n
n
X n
X
2 2
(2) mn ≤ E(Xn,i | Fn,(i−1) ) ≤ Mn , E(Xn,i | Fn,o ) ≥ Mn,1 ,
i=1 i=1
where mn /Mn → 1, Mn,1 /Mn → 1

If tn → ∞, and tn dn / Mn → 0
then
( n )
X p
P Xn,i > (1 − r) Mn tn | Fno
i=1
t2
n
≥ e− 2 (1+0(1)) · C(1 + 0(1))
( n
)
X
Theorem: Assume that Sn = Xi , Fn is a martingale
i=1
such that sup | Xn |≤ d < ∞ a.s.
1≤n<∞
n
X
Let σi2 = E(Xi2 | Fi−1 ) and s2n = σi2
i=1
If s2n → ∞ a.s., then

lim sup Sn /(2s2n loglogs2n )1/2 = 1 a.s.


n→∞

proof: (i) “≤ ” is already shown.


(ii) To show “≥ 1”.
we only have to show that ∀ ε > 0, ∃ nk 3

P {Snk > (1 − ε)(2s2nk loglogs2nk )1/2 i.o.} = 1


Given c > 1, let τk = {n : s2n+1 ≥ ck }

τk is a stopping time, since


n+1
X
s2n+1 = E(Xi2 | Fi−1 ) is Fn measurable.
i=1

Note that

s2τk < ck , s2τk+1 ≥ ck

93
(1) s2τk+1 − s2τk ≤ ck+1 − s2τk +1 − στ2k +1
 

≤ ck+1 − ck + d2
 
(2) s2τk+1 − s2τk ≥ s2τk+1 +1 − στ2k+1 +1 − ck
≥ ck+1 − d2 − ck

By in summary,
τk+1 ∞
X X
Sτk+1 − Sτk = Xi = Xi I[τk <i≤τk+1 ]
i=τk +1 i=1

P {Sτk+1 − Sτk > (1 − δ)(2s2τk+1 loglogs2τk+1 )1/2 | Fτk }


≥ P {Sτk+1 − Sτk > (1 − δ)(2ck+1 loglogck+1 )1/2 | Fτk }
1−δ
(∗) = P {Sτk+1 − Sτk > (1 − r)( )(2ck+1 loglogck+1 )1/2 | Fτk }
1−r
let r = δ/2 and choose c so that
r
1−δ √
r
1−δ 1 d2
< 1 − , implies ≤ 1 − c−1 ≤ 1 − c−1 + k+1
1 − δ/2 c 1−r c
k+1 k 2 k+1 2 k
Mk = c − c + d , mk = c −d −c
k+1 1/2
(2loglogc ) 1−δ
tk = (
d2 1/2 1 − r
)
−1
(1 − c + ck+1 )
< α(2loglogck+1 )1/2 , 0 < α < 1.

p
(∗) = P {Sτk+1 − Sτk > (1 − r) Mk tk | Fτk }
t2
k
≥ e− 2 (1+0(1)) B(1 + 0(1))
2 loglogck+1 (1+0(1))
≥ B(1 + 0(1))e−α
2
≥ B(1 + 0(1))((k + 1)α (1+0(1)) )−1

X p
So that, P {Sτk+1 − Sτk > (1 − r) Mk tk | Fτk } = ∞ a.s.
k=1

So that, P {Sτk+1 − Sτk > (1 − δ)(2s2τk+1 loglogs2τk+1 )1/2 i.o. | Fτk }


=1

94
But

Sτk Sτk (s2τk loglog × s2τk )1/2


 1/2 = 1/2  1/2
2s2τk+1 loglogs2τk+1 2s2τk loglogs2τk sτk+1 loglog × s2τk+1
(s2τk loglogs2τk )1/2 (ck loglogck )1/2
1/2 ≤
((ck+1 − d2 )loglogck+1 )1/2

s2τk+1 loglogs2τk+1
≤ (1/(c − d2 /ck ))1/2 → 0, as c → ∞
≤ δ (choose c so that)

So that, with c choosen,

lim sup Sτk+1 /(2s2τk+1 loglogs2τk+1 )


k→∞
Sτk+1 − Sτk
≥ lim sup 2
k→∞ (2sτk+1 loglog s2τk+1 )1/2
+ lim sup Sτk /(2s2τk+1 loglogs2τk+1 )1/2
k→∞
≥ (1 − δ) + (−1)δ = 1 − 2δ
By lim sup(an + bn )
n→∞
≥ lim sup an + lim inf bn .
n→∞ n→∞

History of L.I.L.:
Step 1:

{Xi } i.i.d. P {Xi = 1} = P {Xi = −1} = 1/2


X n
Sn = Xi
i=1
s2n = n
1
(1913) Hausdorff: Sn = O(n 2 +ε ) a.s.
(By moment and chebyshev0 s inequality).
(1914) Hardy-Littlewood:

Sn = O((n log n)1/2 )


x2
(By e− 2 or e−x/2 )

95
(1922) Steinhauss:

lim sup Sn /(2nlogn)1/2 ≤ 1 a.s.


n→∞

(1923) Khinchine:

Sn = O((n loglogn)1/2 )

(1924) Khinchine:

lim sup Sn /(2n loglog n)1/2 = 1 a.s.


n→∞

step 2:
(1929) Kolmogorov:
n
X
0
Xi indep. r.v .s EXi = 0, s2n = EXi2
i=1

sn
(i) sup | Xk |≤ kn
1≤k≤n (loglogs2n )1/2
(ii) kn → 0, s2n → ∞.

Then

lim sup Sn /(2s2n loglogs2n )1/2 = 1 a.s.


n→∞

(1937) Marcinkewicz and Zygmund:


Given an example:
n
X 1
Sn = ci εi , P (εi = −1) = P (εi = 1) = ,
i=1
2
{εn } i.i.d.
kn sn
{cn } is choosen, so that kn → k > 0, | cn |≤ .
(2 loglogs2n )1/2
They showed that

lim sup Sn /(2s2n loglogs2n )1/2 < 1 a.s.


n→∞

(1941) Hartman and Witter


Xi i.i.d. EXi = 0, V ar(Xi ) = σ 2 .

96
Step 3:
(196?) Strassen:
Xi i.i.d, EXi = 0, V ar(Xi ) = 1.
limit of Sn /(2loglogn) is {-1, 1}.
Wn is a Brownian Motion
1 1
| Sn − Wn |= 0 (n 2 (loglogn) 2 ).
Construct a Brownian Motion W (t) and stopping time τ1 , τ2 , · · · so that
( n
)
D
X
Sn = W ( τi ), n = 1, 2, · · ·
i=1
n
!
X
| Sn − Wn |=| W τi − Wn |
i=1

(1965) Strassen:
Xi independent case and special martingale.
(1970) W.F. Stout:
Martingale
( Version)of Kolmogorov0 s Law of Iterated Logarithm. Z.W.V.G. 15, 279∼290.
Xn
Xn = Yi , Fn is a martingale.
i=1
n
X
s2n = E[Yi2 | Fi−1 ]
i=1
1
If s2n → ∞ a.s. and | Yn |≤ kn sn /(2log2 s2n ) 2 a.s.
where kn is Fn−1 -measurable and lim kn = 0
n→∞

Then lim sup Xn /(sn un ) = 1 a.s.


n→∞
1 1
un = (2log2 s2n ) 2 ≡ (2loglogs2n ) 2 .
(1979) H. Teicher:
Z.W.V.G. 48, p.293-307.
Indepent Xi , P {| Xn |≤ dn } = 1.
1
dn (log2 s2n ) 2
lim =a≥0
n→∞ sn
(a=0, Kolmogorov0 s condition)
n
2 12
√ o
P lim Sn /sn (2log2 sn ) = c/ 2 = 1
n→∞

97
1
where 0.3533/a ≤ c ≤ min[ + bg(a, b)]
b>0 b
(1986) E. Fisher:
Sankhyea, Series A, 48, p.267∼ 272.
Martingale Version:

lim sup kn < k. a.s.


n→∞

n
1
X
implies lim sup Yi /sn (2log2 s2n ) 2 ≤ 1 + ε(k).
n→∞
i=1

k/4, if 0 < k ≤ 1
where ε(k) =
(3 + 2k 2 )/4k − 1, if k > 1.

This bound is not as good as Teicher0 s bounds.


Problems:

1. Do we have a martingale version of Teicher0 s result?



2. M-Z. implies c/ 2 < 1.
Teicher0 s result does not imply this.
How to interprate M-Z phenomenon?

3. Can we extend martingale0 s result to the double arrays of martingale differences


Sn ?

X
Sn = ani εi .
−∞
Lai and Wei (1982). Annals prob. 10, 320∼ 335.

Papers:
D. Freedman (1973). Annals probability, 1, 910∼925.
Basic assumptions:

(i) Fo ⊂ F1 ⊂ · · · ⊂ Fn · · · (σ −fields)
(ii) Xn is Fn −measurable, n ≥ 1.
(iii) 0 ≤ Xn ≤ 1 a.s.
Xn n
X
Sn = Xi , Mi = E[Xi | Fi−1 ], Tn = Mi .
i=1 i=1

98
Theorem: Let τ be a stopping time

(i) If 0 ≤ a ≤ b, then
( τ τ
)
X X
P Xi ≤ a and Mi ≥ b
i=1
i=1
(a − b)2

a a−b
≤ (b/a) e ≤ exp −
2c
, where c = a ∨ b = max{a, b}.
(ii) If 0 ≤ b ≤ a, then
( τ τ
)
X X
P Xi ≥ a and Mn ≤ b
i=1
i=1
(b − a)2

≤ (b/a)a ea−b ≤ exp − , where c = aV b.
2c

Lemma:
P 0 ≤ X ≤ 1 is a r.v. on (Ω , F, P).
Let be a sub-σ-field
P of F.
Let M = E{X | } and h be a real number.
Then
X
E{exp(hX) | } ≤ exp[M (eh − 1)]

proof: f (x) = exp(hx), f 00 (x) = h2 ehx ≥ 0


So f (x) is convex.

ehX = f (x) ≤ f (0)(1 − x) + f (1)x.


= (1 − x) + eh x

X X
E[ehX | ] ≤ E[(1 − X) + eh X | ]
= (1 − M ) + eh M
h −1)M
= 1 + (eh − 1)M ≤ e(e .

(Because 1 − x ≤ ex , ∀x).
Corollary : For each h, define Rn (m, x) = exp[hx − (eh − 1)m].
Then
Rh (Tn , Sn ) is a super-martingale.

99
proof:

Rh (Tn , Sn ) = Rh (Tn−1 , Sn−1 ) exp[hXn − (eh − 1)Mn ]


So E[Rh (Tn , Sn ) | Fn−1 ]
≤ Rh (Tn−1 , Sn−1 )E[exp hXn | Fn−1 ] exp[−(eh − 1)Mn ]
≤ Rh (Tn−1 , Sn−1 ) (By lemma).

In the following, we use exp(∞) = ∞, exp(−∞) = 0, then


Rh (m, x) is a continuous function on [0, ∞]2 − (∞, ∞).
Lemma: Let τ be a stopping time.

G = {Tτ < ∞ or Sτ < ∞}


Z
Then Rh (Tτ , Sτ )dP ≤ 1.
G

proof: By the super-martingale property,


∀ n, ERh (Tτ ∧n , Sτ ∧n ) ≤ 1.

So that, 1 ≥ lim inf E[Rh (Tτ ∧n , Sτ ∧n )]


n→∞
h i
≥ E lim inf Rh (Tτ ∧n , Sτ ∧n )
n→∞
(Fautou0 s Lemma).
Z
≥ lim inf Rh (Tτ ∧n , Sτ ∧n )
G n→∞
Z
= Rh (Tτ , Sτ )dP.
G

proof of the theorem:



1, if m ≥ b and x ≤ a, ∀ (m, x)[0, ∞]2
Let u(m, x) =
0, o.w.

Qh (m, x) = exp[ha − (1 − e−h )b] R−h (m, x),


∀ (m, x)[0, ∞]2 − (∞, ∞), ∀ h ≥ 0

100
Then
P {S ≤ a and Tτ ≥ b}
Z τ
= u(Tτ , Sτ )dP
Z
= (Tτ , Sτ )dP, G = {Tτ < ∞ or Sτ < ∞}
ZG

≤ Qh (Tτ , Sτ )dP ( Qh ≥ u)
G
(Qh (m, x) = exp[−h(x − a) + (1 − e−h )(m − b)] ≥ 1, if m ≥ b and x < a)
Z
= Qh (0, 0) R−h (Tτ , Sτ )dP
G
≤ Qh (0, 0)

So P {Sτ ≤ a and Tτ ≥ b} ≤ inf Qh (0, 0)


h≥0

= inf exp[ha − (1 − e−h )b]


 
−h
= exp inf [ha − (1 − e )b .
h≥0
−h −h
d/dh[ha − (1 − e )b] = a − e b
minimum point ho satisfies eho = b/a
So that, min Qh (0, 0) = exp(ho a) · exp[be−ho − b]
h≥0

= (eho )a exp[(eho )−1 b − b]


a
= (b/a)a exp[ · b − b]
b

So, P {Sτ ≤ a and Tτ ≥ b}


≤ (b/a)a e(a−b) .
Another one: Let

1, if m ≤ b and x ≥ a
u(m, x) =
0, o.w.

h ≥ 0, Qh (m, x) = exp[−ha + (eh − 1)b]Rh (m, x)


G = {Tτ < ∞ and Sτ < ∞}, a > 0.

101
Lemma1 : a ≥ 0, b ≥ 0, c = a ∨ b
Then (b/a)a ea−b ≤ exp[−(a − b)2 /2c]

0 1 1−ε −ε
Lemma1 : 0 < ε < 1, f (ε) = ( ) e , g(ε) = (1 − ε)eε .
1−ε
we have
f (ε) < exp[−ε2 /2] < 1 and
g(ε) < exp[−ε2 /2] < 1.

proof : log f (ε) = −(1 − ε) log(1 − ε) − ε


x2 x3
(Because − log(1 − x) = x + + + · · · , 0 < x < 1).
2 3
ε2 ε3
= (1 − ε)[ε + + + · · · ] − ε
2 3
ε2 ε3 ε2
= [ε + + + · · · ] − ε2 − − · · · − ε
2 3 2
2
≤ −ε /2
log g(ε) = log(1 − ε) + ε
ε2 ε3
= −(ε + + + · · · ) + ε
2 3
2
≤ −ε /2

proof of Lemma 1:
(i) a=b (trivial).
(ii) case 1: 0 < a < b, let ε = (b − a)/b = 1 − a/b.

(b/a)a ea−b = [(1 − ε)−1 ](1−ε)b eb (−ε)


= [(1/(1 − ε))1−ε e−ε ]b
ε2
= f b (ε) ≤ exp[−b ]
2
(b − a)2

= exp −b = exp[−(b − a)2 /2b]
2b2

102
case 2: 0 < b < a.
ε = (a − b)/a = 1 − b/a
(b/a)a ea−b = (1 − ε)a eaε
ε2
 
a
= g (ε) ≤ exp −a
2
(a − b)2
  
= exp −a ·
2a2
(a − b)2
 
= exp −
2a
If 0 ≤ a ≤ b then
( τ τ
)
X X
P Xi ≤ a and Mi ≥ b ≤ exp[−(a − b)2 /2(a ∨ b)]
i=1 i=1

Application:
Let Xn = ρXn−1 + εn , n = 1, 2, · · · , | ρ |< 1.
{εn , Fn } is a martingale difference sequence such that E[ε2n | Fn−1 ] = σ 2 , and
sup E[(ε2n )p | Fn−1 ] ≤ c < ∞
n

where p > 1 , c is a constant.


we know that
n
X
2
(i) 1/n Xi−1 → c2 a.s.
i=1
n
! 12
D
X
2
(ii) Xi−1 (ρ̂n − ρ) → N (0, σ 2 )
i=1

where ρ̂n is the L.S.E. of ρ.


Question: when Xi is random variable
n
!
n→∞
X
E Xi−1 (ρ̂n − ρ)2 −→ σ 2 ?
2

i=1
n
!−1 n
!
X X
ρ̂n − ρ = Xi2 Xi−1 εi

n
!i=1 i=1
2
( ni=1 Xi−1 εi )
X P
2 2
Xi−1 (ρ̂n − ρ) = Pn 2
i=1 i=1 Xi−1

103
n
X
2
difficult: Xi−1 is a random variable.
i=1
This problem how to calculate.
The corresponding χ2 -statistic is
n n
!2  n
X X X
2 2 2
Qn = Xi−1 (ρ̂n − ρ) = Xi−1 εi Xi−1 (Cauchy − Schwarz inequality)
i=1 i=1 i=1

n
!1/2 n
!1/2 2
X X
2
 Xi−1 ε2i 
n
i=1 i=1 X
≤ n = ε2i
X
2 i=1
Xi−1
i=1
?
E(Qpn ) → σ 2p E | N (0, 1) |2p .

A sufficient condition is to show {Qpn } is uniformly integrable. It is sufficient to


show that
0
∃ p0 > p 3 sup E[Qpn ] < ∞
n

Assume that ∃ q > p 3 E | Qn |2q < ∞.

104
Ideas:

(i) ε2i = (Xi − ρXi−1 )2 ≤ 2(Xi2 + ρ2 Xi−1


2
)
n n n
!
X X X
ε2i ≤ 2 Xi2 + ρ2 2
Xi−1
i=1 i=1 i=1
n+1
!
X
≤ 2(1 + ρ2 ) 2
Xi−1
i=1
n−1 n
!
X X
So that ε2i ≤ 2(1 + ρ2 ) 2
Xi−1
i=1 i=1
n
X
(ii) Qn ≤ (Xi − ρXi−1 )2
i=1
n
X
= ε2i
i=1
n
X
implies Qn ≤ ε2i
i=1
n
X n
X
Since ε2i = (Xi − ρ̂n Xi−1 )2 + Qn
i=1 i=1
Pn 2
2(1 + ρ2 ) ( ni=1 Xi−1 εi )
P
i=1 Xi−1 εi
implies Pn 2
≤ Pn−1 2
i=1 Xi−1 i=1 εi
n
!
2
2(1 + ρ2 ) ( ni=1 Xi−1 εi )
X P
2
(iii) Qn ≤ εi IAn + Pn−1 2 IAn c
i=1 i=1 εi
↑ ↑
(By (ii)) (By (i))

(iv) Let 0 < τ < σ 2 , choose k so that


  h i
E ε2i I[ε2i ≤k] = σ 2 − E ε2i I[ε2i >k]
E | εi |2q E | εi |2q
≥ σ2 − , let α = σ 2

k 2q−2 k 2q−2
> τ.

105
( n )
X
Then P ε2i ≤ nτ
i=1
( n )
X
≤P ε2i I[ε2i ≤k] ≤ nτ
( i=1
n
)
X
≤P (ε2i /k)I[ε2i ≤k] ≤ n τ /k
("i=1n # " n
#)
ε2i
 
X X n
=P (ε2i /k)I[ε2i /k≤1] ≤ n τ /k , E I 2 | Fi−1 ≥ α
i=1 i=1
k [εi /k<1] k
 h n i n 
nE ε2i /k
I[εi /k≤1] ≥ α > τ
2
k  k
((n/k) α − nk τ )2

≤ exp −
2( nk α)
= exp[−n(α − τ )2 /2kα]
n
(α − τ )2
 
= exp − = r−n
2kα
(α − τ )2
 
r = exp > 1.
2kα

106
" n−1 #
X
(v) Let An = ε2i ≤ (n − 1)τ , and q > p0 > p ≥ 1.
i=1
 !p0 1/p0
 n
X 
E ε2i IAn
 
i=1
n  1/p0
0
X
≤ E(ε2i )p IAn
i=1
n
X p0 1 1 1 1
≤ ((E[ε2i ]q ) q (EIAn
s
) s ) p0 , + = 1.
i=1
q s
(Hölder inequality)
1 1
≤ E(ε2i )q q · n{p(An)} sp0

1
−n sp
≤c·n·r → 0.
0

Pn 2p0
0 0E | Xi−1 ε i |
(vi) EQpn IAc n ≤c i=1
(n − 1)p0
Recall : (1987) Wei, Ann. Stat. 1667∼ 1687.
X n
Xn = ui εi , ui − Fi−1 measurable.
i=1
{εi , Fi } is a martingale difference sequence.
p≥2
sup E{| εn |p | Fn−1 } ≤ c a.s.
n
  n
! p2
X
Then E sup | Xi |p ≤kE u2i
1≤i≤n
i=1
, k depends only on p, c.
2p0 !p0
Xn n
X
2
So, E Xi−1 εi ≤ k E Xi−1


i=1 i=1
n
X 0
≤kk 2
Xi−1 kpp0
i=1
n
!p0
X
≤k k Xi kp0
i=1

107
Now, Xn = ρXn−1 + εn = εn + ρεn−1 + · · · + ρn−1 ε1 + ρn Xo
= Yn + ρn Xo .

0
E | Yn + ρn Xo |2p
0 0 0
≤ 22p [E | Yn |2p +(| ρ |n | Xo |)2p ]
0
It is sufficient to show that sup E | Yn |2p < ∞
n
Since this implies
2p
Xn
0
E Xi−1 εi = O(np ) and


i=1
0
E[Qpn IAcn ] = O(1)

By the same inequality again,


0 0
E | Yn |2p = E | εn + ρεn−1 + · · · + ρn−1 ε1 |2p
0
≤ k E(12 + ρ2 + · · · + ρ2n−2 )p
p0
1 − ρ2n

= k
1 − ρ2
0
≤ k[1/(1 − ρ2 )p ] < ∞

108
Chapter 2

Stochastic Regression Theory

2.1 Introduction:
Model yn = β1 xn,1 + · · · + βn xn,p + εn
where {εn , Fn } is a martingale difference sequence and ~x = (xn,1 , · · · , xn,p ) is Fn−1
-measurable.
~
Issue: Based on the observations {~x1 , y1 , · · · , ~xn , yn }, make inference on β.
Examples:
(i) Classical Regression Model
(Fixed Design, i.e. ~x0i s are constant vectors).
(ii) Time series: AR(p) model
yn = β1 yn−1 + β2 yn−2 + · · · βp yn−p + εn
where εn are i.i.d. N (0, σ 2 ).
~xn = (yn−1 , · · · , yn−p )0 .
(iii) Input-Output Dynamic System.
(1) System Identification (Economic of Control)

yn = α1 yn−1 + · · · + αp yn−p + β1 un−1 + · · · + βq un−q + εn


~xn = (yn−1 , · · · , yn−p , un−1 , · · · , un−q )0
~un = (un−1 , · · · , un−q )0 ∼ exogeneous variable

(2) Control:
~u Fn−1 -measurable.
Example:
yn = αyn−1 + βun−1 + εn
Goal: yn ≡ T, T fixed constant.
If α, β are known.

109
After observing {u1 , y1 , · · · , un−1 , yn−1 }
Define un−1 so that
T − αyn−1
T = αyn−1 + βun−1 , i.e. un−1 = , (β 6= 0)
β
 Fn−1 −measurable.

If α, β unknown:
Based on {u1 , y1 , · · · , un−1 , yn−1 }
Let α and β (say by α̂n−1 , β̂n−1 ).
Define un−1 = T −α̂β̂n−1 yn−1
n−1
Question:
Is the system under control?
Xm
Is m1
(yn − εn − T )2 small?
n=1
(iv) Transformed Model:
Xn
X
Branching Pocess with Immigration: Xn+1 = Yn+1,i + In+1
i=1
Xn : the population size of n-th generation.
Yn+1,i : the size of the decends of i-th number in n-th generation.
In+1,i : the size of the immigration in (n+1)th generation.
Assumptions:

(i) {Yn,i , 1 ≤ n < ∞, 1 ≤ i < ∞} are i.i.d. random variables.


with m = EYn,i , σ 2 = EYn,i2

(ii) {In } i.i.d. r.v. with b = EIn , V ar(In ) = σI2


(iii) {In } is independent of {Yn,i }

110
Xn
X
E(Xn+1 | Fn ) = E[Yn+1,i | Fn ] + E[In+1 | Fn ]
i=1
= mXn + b
Xn
X
V ar(Xn+1 | Fn ) = (E((Yn+1,i − m)2 | Fn ))
i=1
+E((In+1 − b)2 | Fn )
= Xn σ 2 + σI2
Xn
X
(Yn+1,i − m) + (In+1 − b)
i=1
Let εn+1 = p
σ 2 Xn + σI2

Then {εn , Fn } is a martingale difference sequence with E[ε2n | Fn−1 ] = 1.


The model becomes
q
Xn+1 = mXn + b + ( σ 2 Xn + σI2 )εn+1

If σ 2 and σI2 are known,


1 Xn 1
Yn+1 = Xn+1 /(σ 2 Xn + σI2 ) 2 = m p + bp + εn+1
σ 2 Xn + σI2 σ 2 Xi + σI2

In general we may use


1 Xn 1
Yn+1 = Xn+1 /(1 + Xn ) 2 = m √ + b√ + ε0n+1
1 + Xn 1 + Xn
s
0 σ 2 Xn + σI2
where εn+1 = εn+1 ,
1 + Xn
σ 2 Xn + σI2
V ar(ε0n | Fn−1 ) = ≤ c.
1 + Xn
In both cases, the inference on m and b can be handed by the Stochastic Regres-
sion Theory.
Reference:
Least Squares Estimation Stochastic Regression Models with Applications to Identi-
fication and Control of Dynamic Systems.

111
T.L. Lai and C.Z. Wei (1982).
Ann. Stat., 10, 154 ∼ 166.
Model: yi = β~ 0~xi + εi
{εi , Fi } is a sequence of martingale difference and ~xi is Fi−1 -measurable.
Bassic Issue : Make inference on β~ , based on observations {~x1 , y1 , · · · , ~xn , yn }
Estimation:
(a) εi ∼ i.i.d. N (0, σ 2 )
~x1 fixed, ~xi σ(y1 , · · · , yi−1 ), i = 2, 3, · · ·
MLE of β~ :
~ = L(β,
L(β) ~ y 1 , · · · , yn )
~ y1 , · · · , yn−1 )L(β,
= L(β, ~ yn | y1 , · · · , yn−1 )
~ y1 , · · · , yn−1 ) √ 1 e−(yn −β~ 0 ~xn )2 /2σ2
= L(β,
2πσ
..
.
X n

√ − (yi − β~ 0~xi )2 /2σ 2


= (1/ 2πσ)n e i=1 .

n
!−1 n
ˆ X X
So, M.L.E. β~n = ~xi~x0i ~xi yi
i=1 i=1
n
~ˆxi )2
X
σ̂n2 = 1/n (yi − β~
i=1

(b) Least squares:


n
X
~ =
minimum h(β) (yi − β~ 0~xi )2 over β.
~
i=1
n
X
~ β~ =
∂h(β)/∂ (yi − β~ 0~xi )~xi
i=1
n
! n
!
X X
= yi~xi − ~xi~x0i β~
i=1 i=1

ˆ
Solve the equation, we obtain β~n .
Computation Aspect:

112
• Recursive Formula
ˆ ˆ ˆ
β~n+1 = β~n + {(yn+1 − β~n0 ~x0n+1 )/(1 + ~x0n+1 Vn~xn+1 )}Vn ~xn+1
Vn+1 = Vn − Vn~xn+1~x0n+1 Vn /(1 + ~xn+1 Vn~xn+1 )
n
!−1
X
Vn = ~xi~x0i
i=1

Kalman filter type estimator:


! ! !
ˆ
~ ˆ
βn+1 =f β~n , ~xn+1 , n + 1
Vn+1 Vn

f : hardware or program.
!
ˆ
~
βn : stored in the memory.
Vn
~xn+1 : new data

Real Time Calculation:


• automatic
• large data set.
what is filter?

yi = β~ 0~xi + εi (state process.)


Oi = yi + δi (Observation process)

Filter Theory : Estimation state.


Predict state.
State History : F Y
Observation History : F O
Global History : F = F Y ∪ F O .
h is F-measurable
ĥ = E[h | F O ].
Author:
P. Brémaud : Point Process and Queues : Martingale Dynamic., Spring-Verlag, Ch.
IV : Filtering.
Matrix Lemma:

113
(1) If A, m× m matrix, is nonsingular υ, V  <m
Then
0 −1 −1 (A−1 υ)(V 0 A−1 )
(A + υV ) = A −
1 + V 0 A−1 υ

(A−1 υ)(V 0 A−1 )


 
−1
proof : A − 0 −1
[A + υV 0 ]
1+V A υ
(A υ)(V 0 A−1 )
−1
−1 0 (A−1 υ)(V 0 A−1 )υV 0
= I− A + A υV −
1 + V 0 A−1 υ 1 + V 0 A−1 υ
−1 0
A υV −1 0 (A υ)(V 0 A−1 υ)V 0
−1
= I− + A υV −
1 + V 0 A−1 υ 1 + V 0 A−1 υ
1
= I− 0 −1
{A−1 υV 0 − A−1 υV 0
1+V A υ
−V 0 A−1 υA−1 υV 0 + (V 0 Aυ)AυV 0 }υV 0 } = I
Corollary:
n+1
!−1
X
−1
Pn+1 = ~xi~x0i
i=1
n
!−1
X
= ~xi~x0i + ~xn+1~x0n+1
i=1
(Pn−1~xn+1 )(~x0n+1 Pn−1 )
= Pn−1 −
1 + ~x0n+1 Pn−1~xn+1
n+1
!−1 n+1
ˆ X X
β~n+1 = ~xi~x0i ~xi yi
i=1 i=1
n+1
!−1 n
X X
= ~xi~x0i −1
~xi yi + Pn+1 ~xn+1 yn+1 .
i=1 i=1
n
(Pn−1~xn+1 )(~x0n+1 Pn−1 )
 X
= Pn−1 − −1
~xi yi + Pn+1 ~xn+1 yn+1
1 + ~x0n+1 Pn−1~xn+1 i=1
Pn−1~xn+1~x0n+1 ~ˆ (Pn−1~xn+1 )(x0n+1 Pn−1~xn+1 )
 
ˆ
~ −1
= βn − βn + Pn ~xn+1 − yn+1
1 + ~x0n+1 Pn−1~xn+1 1 + ~x0n+1 Pn−1~xn+1
ˆ Pn−1~xn+1 ~ˆ0 ~xn+1 ) + Pn−1~xn+1
= β~n − (β n yn+1
1 + ~x0n+1 Pn−1~xn+1 1 + ~x0n+1 Pn−1~xn+1
ˆ ˆ
= β~n + (yn+1 − β~n~xn+1 )Pn−1~xn+1 )/(1 + ~x0n+1 Pn−1~xn+1 )

114
Po
!−1
X ˆ
If we set VPn = ~xi~x0i , β~Po =Least square estimator. Then Vn+1 =
i=1
n+1
!−1
X
~xi~x0n and
i=1
ˆ
β~n are least square estimator of β. ~
Engineer : Set initial value
Vo = CI, C is very small.
ˆ
β~o : guess.
(2) If A = B + w ~w~ 0 is nonsingular
|A|−|B|
Then w ~ 0 Aw
~ = |A|
Notice:
N
an−1
X an − an−1
as an ↑ ∞, an → 1, ∼ log aN
i=1
an
n+1 n
!
X X
Special Case : x2i = x2i + x2n+1 .
i=1 i=1

A w
~ 0
proof : | B |=| A − w
~w~ |= 0
(∗)
w
~ 1
Lemma : If A is nonsingular,

A C −1
Then B D =| A || D − BA C |

  
I O A C
proof : det
−BA−1 I B D
 
A C
= det
0 −BA−1 C + D
~ 0 A−1 w
So, (∗) = | A || 1 − w ~|
2. Strong Consistency:
Conditional Fisher0 s information matrix:
L(β,~ yi | y1 , · · · , yi−1 )
n
Y
= ~ yi | y1 , · · · , yi−1 ), implies
L(β,
i=1
n
X
~ y 1 , y2 , · · · , yn ) =
log L(β, ~ yi , | y1 , · · · , yi−1 )
log L(β,
i=1

115
Definition:
( )
~ yi | y1 , · · · , yi−1 ) [∂ log L(β,
∂ log L(β, ~ yi | y1 , · · · , yi−1 )]0
Ji = E y1 , · · · , yi−1
∂ β~ ∂ β~
Conditional Fisher0 s information matrix is
Xn
In = Ji
i=1

Model : yn = β~ 0~xn + εn
εn i.i.d. ∼ N (0, σ 2 )
~xn  σ{y1 , · · · , yn−1 } = Fn−1

~0 ~
 
x )2
~ yi | y1 , · · · , yi−1 ) = log √ 1 e− i 2σ2 i
(y −β
log L(β,
2πσ
√ (yi − β~ ~xi )
0 2
= − log 2πσ −
( 2σ 2
(yi − β~ 0~xi ) 0 (yi − β~ 0~xi )
Ji = E ~xi~xi |Fi−1 }
σ2 σ2
= E{ε2i ~xi~x0i | Fi−1 }/σ 4 = ~xi~x0i E{ε2i | Fi−1 }/σ 4
= ~xi~x0i /σ 2 ,
Xn
In = ~xi~x0i /σ 2
i=1

Recall that when ~xi are constant vectors,


 !−1 n 
n
ˆ X X
cov(β~n ) = cov  ~xi~x0i ~xi εi 
i=1 i=1

n
!−1
X
= ~xi~x0i σ 2 = In−1
i=1

Therefore, for any unit vector ~e ,


n
!−1
ˆ X
V ar(~e0 β~n ) = ~e0 ~xi~x0i ~e σ 2
i=1
= ~e0 In−1~e

116
ˆ
Let δn (~e∗ ) be the minimum eigenvalue (eigenvector) of In . Then Var(~e0∗ β~n ) =
~e0∗ In−1~e∗ = 1/δn ≥ ~e0 In−1~e, ∀ ~e.

So, the data set {~x1 , y1 , ~x2 , y2 , · · · , ~xn , yn } provides least information for estimat-
ing β~ along the direction ~e∗ , we can interpretate the maximum-eigenvaluce similarly.
ˆ
When the L.S.E. β~n is (strongly) consistent? Heuristically, if the most difficult direc-
tion has “infinite” information, we should be able to estimate β~ consistently. More
precisely, if
ˆ
λmin (In ) → ∞, we expect β~n → β~ a.s.

Weak consistently is trivial when ~xi are constants, since


ˆ 1
cov(β~n ) = In−1 and k In−1 k= → 0.
λmin (In )
For strong consistency, this is shown by Lai, Robbins and Wei(1979), Journal
Multivariate Analysis, 9, 340 ∼ 361. !
Xn
Theorem : In the fixed design case if lim λmin ~xi~x0i → ∞
n→∞
i=1
ˆ
Then β~n → β~ a.s. if {εi } is a convergence system.
Definition : {εn } is a convergence system if
n
X n
X
ci εi converges a.s. for all c2i < ∞.
i=1 i=1

Example:
εi ∼i.i.d. Eεi = 0, V ar(εi ) < ∞.
More general, {εn , Fn } is a martingale difference sequence such that

sup E[ε2i | Fi−1 ] < ∞ and


i
sup E[ε2i ] < ∞.
i

Stochastie Case:
< 1 > First Attempt : (Reduce to 1-dimension case).
n
!−1 n
ˆ X X
β~n − β~ = ~xi~x0 ~xi εi
i
i=1 i=1

117
Recall that : {εi , Fi } martingale difference sequence ui Fi−1 .
n
( P∞ 2
X converges a.s. on { i=1 ui < ∞}
ui εi 1+δ
P
1/2
0 ( ni=1 u2i ) [log ( ni=1 u2i )] 2
P
a.s. ∀ δ > 0
i=1

p = dim(β)~ = 1.
ˆ
Conclusion: β~n converges a.s.
n
X
The limit is β~ on the set {In = x2i → ∞}. In fact on this set
i=1

n
! 1+δ
2 n
!1/2 
ˆ X X
β~n − β~ = 0  log x2i / x2i  a.s. ∀ δ > 0.
i=1 i=1
n
!
X
Let Pn = ~xi~x0i , Vn = Pn−1 , Dn = diag(Pn ).
i=1
n
ˆ X
β~n − β~ = (Pn−1 Dn )(Dn−1 ~xi εi )
i=1
 Xn Xn 
 xi1 , εi / x2i1 
= Pn−1 Dn  i=1 i=1
 
.. 

Pn . P 
n 2
i=1 ip i /
x ε i=1 xip

So
n
! 1+δ
2
X
log x2ij
ˆ
k β~n − β~ k ≤ k Pn−1 kk Dn k max Pn
i=1
1/2
1≤j≤P
i=1 x2ij
1+δ
!
(log λ∗n ) 2
= O 1/λn · λ∗n · 1/2
, λ∗n : max. eigen.
λn

118
since
0
 
 0 
 
 0 
 .. 
.
 
 
(0, · · · , 0, 1, 0, · · · , 0)Pn  0  ≥ λn
 
1
 
 
0
 
 
 .. 
 . 
0
1+δ 3
= O(λ∗n (log λ∗n ) 2 /λn ). (∗)
2

ˆ
Conclusion: β~n → β~ a.s. on the set
n 3 o
lim λ∗n (log λ∗n )(1+δ)/2 /λn2 = 0, for some δ > 0 = C
n→∞
n 3 o
Remark: C ⊂ lim λ∗n /λn2 =0
n→∞

If λn ∼ n, then the order of λ∗n should be smaller than n3/2 .

λ∗n λn
 
det Pn
λn /2 ≤ = ∗ ≤ λn
tr(Pn ) λn + λn

119
Example 1 : yi = β1 + β2 i + εi
i = 1, 2, 3, · · · , n.
 
1
~xi =
i
 n
X 
n
X  n i 
Pn = ~xi~x0i = X i=1
 
n Xn 
 2 
i=1 i i
i=1 i=1
n
X
implies tr(Pn ) = n + i2 ∼ n 3 .
i=1
n n
!2
X X
det(Pn ) = n i2 − i
i=1 i=1
2
n2 n4 n4 n4

3
∼ n n /3 − = − = .
2 3 4 12

implies λ∗n ∼ n3
λn ∼ n

implies (∗) is not satisfy.


Example 2 : AR (2)
zn = β1 zn−1 + β2 zn−2 + εn
Characteristic polynomial
P (λ) = λ2 − β1 λ − β2
The roots of P (λ) determine the behavior of zn , assume that

P (λ) = (λ − ρ1 )(λ − ρ2 )
= λ2 − (ρ1 + ρ2 )λ + ρ1 ρ2
β1 = ρ1 + ρ2 , β2 = −ρ1 ρ2
 
zn−1
yn = zn , ~xn =
zn−2

Depcomposition:
      
vn 1 −ρ1 zn zn − ρ1 zn−1
= =
wn 1 −ρ2 zn−1 zn − ρ2 zn−1

120
Claim : vn = ρ2 vn−1 = εn
wn = ρ1 wn−1 = εn

vn − ρ2 Vn−1 = (zn − ρ1 zn−1 ) − ρ2 (zn−1 − ρ1 zn−2 )


= zn − (ρ1 + ρ2 )zn−1 + ρ1 ρ2 zn−2
= zn − β1 zn−1 − β2 zn−2 = εn

ρ2 = 1, ρ1 = 0, then vn − vn−1 = εn
Xn
= εi + v o
i=1
and wn = εn

n  
X zi−1
Pn = (zi−1 , zi−2 )
zi−2
i=1
   
1 − ρ1 1 − ρ1
Pn
1 − ρ2 1 − ρ2
n 
X vi−1 
= (vi−1 , wi−1 )
wi−1
i=1
 X n Xn 
2
 vi vi wi 
 i=1 i=1
=  X

n Xn 
 2 
vi wi wi
i=1 i=1

n
X
vo = 0 implies vn = εi , w i = εi
i=1
εi i.i.d. Eεi = 0, and V ar(εi ) < ∞.

121
n
! n
!
X X
tr(Pn ) on order vi2 + ε2i
i=1 i=1
n
! n
! n
!2
X X X
det(Pn ) = vi2 ε2i − v i εi
i=1 i=1 i=1
n
! n
! n n
!2
X X X X
= vi2 ε2i − ε2i + vi−1 εi
i=1 i=1 i=1 i=1
Because vi = vi−1 + εi .

n
X
lim sup vi2 /n(2n log log n) < ∞ a.s. (Donsker Theorem)
n→∞
i=1
n
X
inf(log log n) vi2
i=1
lim > 0 a.s.
n→∞ n2
n
X
implies tr(Pn ) ∼ vi2
i=1
n

n
! " n
#! 1+δ
2

X X X
2 2
Because vi−1 εi = 0  vi−1 log vi−1 .
i=1 i=1 i=1


n
! n
! 1+δ
2

X X
det(Pn ) = −O n2 + 2
vi−1 log 2
vi−1 
i=1 i=1
n
! n
!
X X
+ vi2 ε2i .
i=1 i=1
 
n
! n
!1+δ 
X X
 n2 + 2 2
n
! n
!
  vi−1 log vi−1 

X X i=1 i=1
= vi2 ε2i 1 − O 
  ! ! 
n n

  X X 
i=1 i=1 2
  vi=1 ε2i 
i=1 i=1

122
n
 X ! n
!
X n
n2 2
vi−1 ε2i ∼ n
!
i=1 i=1
X
2
vi−1
i=1
 
2 log log n
= O(n/(n / log log n)) = O
n
" n
!#1+δ  n
(log n)1+δ
X X  
2
log vi−1 ε2i = O
i=1 i=1
n
= o(1)

implies
n
X
tr(Pn ) ∼ vi2
i=1
n
!
X
det(Pn ) ∼ vi2 ·n
i=1

Not application I
< 2 > Second Approach
Energy function, ε-Liapounov0 s function.
dε(x(t))/dt < 0
Roughly speaking, construct a constant function.

V : <P → <
V (~x) > 0, if ~x 6= ~0
V (~0) = 0
inf V (~x) > 0
|~
x|>M

~ n is a sequence of vectors in <n


If w
s.t.

~ n+1 ) ≤ V (w
V (w ~ n ) and lim V (~ωn ) = 0
n→∞

then ~ n = ~0.
lim w
n→

123
Two essential ideas:
(1) decreasing
(2) never ending unless it reaches zero.
What are the probability analogous ?
Decreasing → supermartingale .
→ almost supermartingle.
Recall the following theorem (Robbins and Siegmund) 1971, Optimization Methods
in stat. ed. by Rustgi, 233∼.
Lemma : (Important Theorem )
Let an , bn , cn , dn , be Fn -measurable nonnegative
( ∞ random varaibles ) E[an+1 | Fn ] ≤
s.t.
X ∞
X
an (1 + bn ) + cn − dn . Then on the event bi < ∞, ci < ∞
i=1 i=1
n
X
lim an exists and finite a.s. and di < ∞ a.s.
n→∞
i=1
What is the supermartingale in above ?
Ans: bn = 0, cn = 0, dn = 0.
We start with the residual sum of squares.
n n
X ˆ X
(yi − β~n0 ~xi )2 = ε2i − Qn
i=1 i=1

n
X ˆ
where Qn = (β~n~xi − β~ 0~xi )2
i=1
n
!
ˆ X ˆ
= (β~n − β)
~ 0 ~xi~x0i (β~n − β)
~
i=1

Heuristic : If the least squares functions is good, one would expect


n n
(yi − β~i0~xi )2 ∼
X X
= ε2i .
i=1 i=1

n
X
That is, relative to ε2i , Qn should be smaller. Therefore, Qn /a∗n may be a
i=1
right consideration for the “energying function ”. Another aspect of Qn is that it is
ˆ ~ which reaches zero only when β~ˆn = β.
a quadratic function of (β~n − β), ~

124
How to choose a∗n ?
ˆ
Qn ≥k β~n − β~ k2 ·λn
ˆ
or Qn /λn ≥k β~n − β~ k, choose : a∗n = λn .
Theorem : In the stochastic regression model.
yn = β~ 0~xi + εi
if sup E[ε2n | Fn−1 ] < ∞ a.s.
n

then on the event


 !−1 
X ∞ n
X 
~x0 ~xi~x0i ~xn /λn < ∞, lim λn = ∞
 n=p n n→∞ 
i=1

proof : an = Qn /λn , bn = 0.
n
!0 n
!−1 n
!
X X X
0
Qn = ~xi εi ~xi~xi ~xi εi
i=1 i=1 i=1

n−1
!0 n−1
!
X X
E[an | Fn−1 ] = ~xi εi Vn ~xi εi /λn
i=1 i=1
n−1
!
X
+2E[~x0n εn Vn ~xi εi | Fn−1 ]/λn
i=1
+E(~x0n Vn εn | Fn−1 )/λn .
n
!0 n
!
X X
= ~xi εi Vn ~xi εi /λn
i=1 i=1
+~x0n Vn~xn E[ε2n | Fn−1 ]/λn
n−1
!0 n−1
!
X X
≤ ~xi εi Vn−1 ~xi εi /λn + cn−1
i=1 i=1
= Qn−1 /λn + cn−1
 
1 1
= Qn−1 /λn−1 − Qn−1 − + cn−1
λn−1 λn
= an−1 − an−1 (1 − λn−1 /λn ) + cn−1

125
By the almost supermartingale theorem.
X  
λn − λn−1
lim an < ∞ and an−1 <∞
n→∞ λn

X 0 
X ~xn Vn~xn 2
a.s. on { cn−1 < ∞} = E[εn | Fn−1 ] < ∞
λn
X 0 
~xn Vn~xn
⊃ <∞
λn
If lim an = a > 0
n→∞
Then ∃N s.t. an ≥ a/2, ∀ n > N
∞ ∞
!
X λi − λi−1 a X λi − λi−1
So ai−1 ≥
i=1
λi 2 i=N
λi
∞ Z λi
a X dx
≥ · λn /λn−1
2 i=N λi−1 x
 Z ∞
a 1
≥ inf λn /λn−1 dx = ∞
2 n≥N λn−1 x

Note 1: If λn−1 /λn has limit point λ < 1 then there exists
λnj − λnj−1
nj 3 lim λnj−1 /λnj = λ, lim = 1 − λ.
j→∞ j→∞ λnj

This contradicts.
X λi − λi−1
Note 2 : If <∞
i
λi
λn −λn−1
Then λn
→0
λn−1 /λn → 1.
Therefore, on the event
X 
~xn Vn~xn
< ∞, λn → ∞ ,
λ
an → 0 a.s.
ˆ
since an ≥k β~n − β~ k2
ˆ
β~n → β~ a.s. on the same event.

126
Corollary : On the event

{λn → ∞, (log λ∗n )1+δ = O(λn ) for some δ > 0}


ˆ
Then lim β~n = β~ a.s.
n→∞
∞ n
!−1
X X
proof : ~x0n ~xi~x0i ~xn /λn < ∞
n=p i=1

∞ ∞
X ~x0 Vn~xn X | Pn | − | Pn−1 |
= n
≤ (By Pn = Pn−1 + ~xn~x0n )
n=p
λn n=p
| P n | λ n


!
X | Pn | − | Pn−1 |
= O
n=p
| Pn | (log λ∗n )1+δ

!
X | Pn | − | Pn−1 |
= O
n=p
| Pn | (log | Pn |)1+δ
= O(1)

Since | Pn |= λ∗n · · · λn → ∞.
implies log | Pn |≤ p log(λ∗n ).
•• Knopp : Sequence and Series.

as an ↑
X an − an−1
implies <∞
an (log an )1+δ
Z ∞
1
dx < ∞
2 x(log x)1+δ
Because ~x0n Vn = ~x0n Vn−1 /(1 + ~x0n Vn−1~xn )
~x0 Vn−1~xn~x0n Vn−1
~x0n Vn = ~x0n Vn−1 − n
1 + ~x0n Vn−1~xn

127
< 3 > Third Approach:
k
!0 k
!
X X
Qk = ~xi εi Vk ~xi εi
i=1 i=1
k−1
!0 k−1
!
X X
= ~xi εi Vk ~xi εi
i=1 i=1
k−1
X
+~x0k Vk ~xk ε2k + 2(~x0k ~xk ~xi εi )εk
i=1
k−1
X
= Qk−1 − (~x0k Vk−1 ~xi εi )2 /(1 + ~x0k Vk−1~xk )
i=1
k−1
X
+~x0k Vk ~xk ε2k + 2(~x0k Vk ~xi εi )εk .
i=1

n
X
Qn − QN = (Qj − Qj−1 )
j=N +1
n k−1
!2 
X X
= − ~x0k Vk−1 (~xi εi ) 2
(1 + ~x0k Vk−1~xk )
k=N +1 i=1
n n k−1
!
X X X
+ ~x0k Vk ~xk ε2k + 2 ~x0k Vk ~xi εi εk
k=N +1 k=N +1 i=1

128
n k−1
!2
X X
implies Qn − QN + ~x0k Vk−1 ~xi εi /(1 + ~x0k Vk−1~xk )
k=N +1 i=1
k−1
!
X
n n
~x0k Vk−1 ~xi εi
X X k=1
(1) = ~x0k Vk ~xk ε2k +2 εk
k=N +1 k=N +1
1 + ~x0k Vk−1~xk
n k−1
!2
X X
(2) = ~x0k Vk−1 ~xi εi /(1 + ~x0k Vk−1~xk )
k=N +1 i=1
k−1
!
X
n
~x0k Vk−1 ~xi εi
X i=1
= εk
i=N +1
1 + ~x0k Vk−1~xk

(1) finite if and only if (2) finite.


Theorem: If sup E[ε2n | Fn−1 ] < ∞ a.s.
n
Then
k−1
!2
X
n
~x0k Vk−1 ~xi εi
X i=1
−QN + Qn +
k=N +1
1 + ~x0k Vk−1~xk
n
X
∼ ~x0k Vk ~xk ε2k a.s.
k=N +1

on the set where one of it approaches ∞.


proof: Let
k−1
X
~x0k Vk−1 ~xi εi
i=1
Uk =
1 + ~x0k Vk−1~xk

Then Uk is Fk−1 -measurable.

129
Therefore
 n
! " n
#
X X
Uk2 on Uk2 < ∞

 O


n
X 
k=N +1 " k=N +1
Uk εk = n
! ∞
#
 X X
k=N +1 2 2
 o Uk on Uk = ∞



k=N +1 k=N +1

n
X n
X
But Uk2 ≤ Uk2 (1 + ~x0k Vk−1~xk )
N +1 N +1
n k−1
!2
X X
= ~x0k Vk−1 ~xi εi /(1 + ~x0k Vk−1~xk )
N +1 i=1

Special case ~xi = 1, Pn = n.


 2
k−1
X
n
!2  n
 εi 
  
X X 
 i=1  1
εi n+ 1+
k − 1 k−1
 
i=1 k=N +1  

n
X ε2k

k=N +1
k
 k−1 2
X
 n εi     n   !
 X 1 1 X k − 1
(εk−1 )2

  1+ =

k=N +1 k − 1 
 k − 1 k=N +1
k

130
(εk )2 ∼ (log n)σ 2 .
P
Because
n
X k−1
X
Qn + (~x0k Vk+1 ~xi εi )2 /(1 + ~x0k Vk−1~xk )
k=N +1 i=1
Xn
∼ ~x0k Vk ~xk ε2k , if one of it → ∞, where
k=N +1
n
!
ˆ X ˆ
Qn = (β~n − β)
~ ~xi~x0i (β~n − β)
~
i=1
n
!0 n
!
X X
= ~xi εi Vn ~xi εi .
i=1 i=1

Lemma : Assume that {εk , Fk } is a martingale difference sequence and Vk is Fk−1 -


measurable all k.
(i) Assume that sup E[ε2n | Fn−1 ] < ∞ a.s.
n


(∞ )
X X
Then | uk | ε2k < ∞ a.s. on | uk |< ∞
k=1 k=1
 !1+δ 
∞ n
! n
X X X
and | uk | ε2k = o  | uk | log | uk | .
k=1 k=1 k=1
(∞ )
X
on the set | uk |= ∞ , for all δ > 0.
k=1

(ii) Assume that sup E[| εn |α | Fn−1 ] < ∞, for some α > 2. Then
n

n
X n
X
| uk | ε2k − | uk | E[ε2k | Fk−1 ]
k=1 k=1
n
! (∞ )
X X
=o | uk | a.s. on | uk |= ∞, sup | un |< ∞ .
n
k=1 k=1

131
Therefore, if lim E[ε2k | Fk−1 ] = σ 2 a.s.
k→∞
n
X n
X
Then lim | uk | ε2k / | uk |= σ 2 a.s.
n→∞
( ∞ k=1 k=1
)
X
on | uk |= ∞, sup | un |< ∞
n
k=1

Note:
Basic idea is to ask : zi ≥ 0, the relation of
n
X n
X
zi and E[zi | Fi−1 ]
i=1 i=1
Xn n
X
Because E(| uk | ε2k | Fk−1 ) = | uk | E(ε2k | Fk−1 )
k=1 k=1

(Freedman. D. (1973). Ann. Prob. 1, 910∼925.).


proof: (i) Take an large enough so that

X
P [| uk |> ak ] < ∞
k=1
Let u∗k = uk I[|uk |≤ak ]
Then P {uk = u∗k eventually }=1.
If we can show our results for {u∗k } then the results also hold for {uk }.
Therefore, we can assume that each uk is a bounded random variables.
∀ M > 0, define
vk = uk I[E(ε2k |Fk−1 )≤M ] I k

X
| ui |≤ M 
 


i=1

then vk is Fk−1 -measurable.



! ∞
!
X X
Then E | vi | ε2i = E(| vi | ε2i | Fi−1 )
i=1 i=1

!
X
= E | vi | E[ε2i | Fi−1 ]
i=1

132
 

∞
X 

≤E
 | u i | I
i
X
 · M

 i=1 
| u |≤ M
 

 j 

j=1

≤ M2 < ∞

X
So | vi | ε2i < ∞ a.s.
i=1
( ∞
)
X
Observe that vk = uk , ∀ k on sup E[ε2n | Fn−1 ] ≤ M, | un |≤ M = ΩM .
n
n=1

X
So | ui | ε2i < ∞ a.s. on ΩM , ∀ M .
i=1


( ∞
)
[ X
But ΩM = sup E[ε2n | Fn−1 ] < ∞, | un |< ∞ .
n
M =1 n=1
(∞ )
X
= | un |< ∞
n=1

The proof is first part.


n
X
Let sn = | ui |
i=1
n
X | uk | ε2k
consider
k=1
sk (log sk )1+δ

X | un |
Since < ∞ a.s.
s (log sn )1+δ
n=1 n
∞ Z sn
X dx

n=1 sn−1
x(log x)1+δ

X | uk |
implies ε2 < ∞ a.s.
k=1
sk (log sk )1+δ k

133
( n
)
X
By Kronecker0 s Lemma, on sn = | ui |→ ∞
i=1
n
X
| uk | ε2k
k=1
lim = 0 a.s.
n→∞ sn (log sn )1+δ
(ii) (Chow (1965), local convergence theorem).
For a martingale difference sequence {δk , Fk }
X n
εk converges a.s. on
k=1
(∞ )
X
E(| δk |r | Fk−1 ) < ∞ .
k=1
where 1 ≤ r ≤ 2.
Set δk = u2k [ε2k − E(ε2k | Fk−1 )]
Then {δk , Fk } is a martingale difference sequence without loss of generality,
1 1
we can assume that 2 < α ≤ 4. If α ≥ 4, then E 4 (ε4i | Fi−1 ) ≤ E α (| εi |α | Fi−1 ).
Set r = α/2.
Let tn = ni=1 | ui |2r .
P

E[| δk |r | Fk−1 ]
=| uk |2r E{| ε2k − E[ε2k | Fk−1 ] |r | Fk−1 }
≤ | uk |2r E{[max(| ε2k |, E[ε2k | Fk−1 ]r | Fk−1 }
k
≤ | uk | E{| εk |2r +E r [ε2k | Fk−1 ] | Fk−1 }
2r

= | uk |2r {E[| εk |2r | Fk−1 ] + E r [ε2k | Fk−1 ]}


≤ 2 | uk |2r E[| εk |2r | Fk−1 ]
Xn
So E(| δk /tk |r | Fk−1 )
k=1
n
!
X | uk |2r
≤ 2 sup E[| εn |α | Fn−1 ] < ∞ a.s.
k=1
trk n

n
X
So δk = o(tn ) a.s. on {tn → ∞}
k=1
n
(∞ )
X X
But δk converges a.s. on | ui |2r = lim tn < ∞ .
n→∞
k=1 i=1

134
n
X
0
by Chow s Theorem on δi .
i=1
Observe that on {supn | un |< ∞}.
n
! 
X
2r−1
tn ≤ | ui | sup | un |
n
i=1

Combining all those results


n n
! (∞ )
X X X
δi = o | ui | a.s. on | ui |= ∞, sup | un |< ∞ .
n
i=1 i=1 i=1

It is not difficult to see that


n n
!  
X X
| uk | ε2k =O | uk | a.s. on sup | un |< ∞
n
k=1 k=1

This is because
( n 
X
(a) On | ui | < ∞, sup | un |< ∞ ,
n
i=1
n n
!
X X
| uk | ε2k = O(1) = O | uk | (by (i))
k=1 k=1
( n
X
(b) On | uk | = ∞, sup | un |< ∞} ,
n
k=1
n n n
!
X X X
| uk | ε2k = | uk | E(ε2k | Fk−1 ) + o | uk |
k=1 k=1 k=1
n
! n
!
X X
≤ | ui | sup E(ε2n | Fn−1 ) + o | uk |
n
i=1 k=1
n
! 
X
= | ui | sup E(ε2n | Fn−1 ) + o(1)
n
i=1
n
!
X
= O | ui | .
i=1

135
Now, if lim E[ε2n | Fn−1 ] = σ 2 .
n→∞
n n
(∞ )
X X X
Then | uk | E[ε2k | Fk−1 ]/ | uk |→ σ 2 a.s. on | uk |= ∞ .
k=1 k=1 k=1
n
X
By an ≥ 0, bn ≥ 0, bn → b, ai → ∞
i=1
n
X n
X
Then ai b i / ai → b.
i=1 i=1

n
X n
X
So | uk | ε2k / | uk |
k=1 k=1
n
X
| uk | E[ε2k | Fk−1 ]
k=1
= n + o(1)
X
| uk |
k=1
( ∞
)
X
→ σ 2 , a.s. on sup | un |< ∞, | uk |= ∞ .
n
k=1

Lemma 2: Let {wn } be a p × 1 vectors and


Xn
An = w ~ i0 . Assume that AN is nonsingular for some N . Let λ∗n and | An | denote
~ iw
i=1
the maximum evgenvalue and determinant of An .
Then (i) λ∗n ↑ .

X
(ii) lim λ∗n < ∞ implies ~ i0 Ai w
w ~ i < ∞.
n→∞
i=N
n
X
(iii) lim λ∗n = ∞, implies ~ i0 A−1
w i w~ i = O(log λ∗n ).
n→∞
i=N
0 −1
(iv) lim λ∗n = ∞, w ~i →
~ i Ai w 0, implies
n→∞
Xn
~ i0 A−1
w i w~ i ∼ log | An | .
i=N

136
proof : (i) trivial.
| An | − | An−1 |
~ n0 A−1
(ii) w n w~n =
| An |
(λn ) ≥| An | and | An |≥ λ∗n λp−1
∗ p
n

Where λn is the minimum eigenvalue of An .

If λ∗n < ∞, then lim | An |< ∞.


n→∞
∞ ∞
X X | Ai | − | Ai−1 |
So ~ i0 A−1
w i w~i =
i=N i=N
| Ai |
X∞
(| Ai | − | Ai−1 |)
lim | An | − | AN −1 |
i=N n→∞
≤ = < ∞.
| Ai | | AN |

n n
X X | Ai | − | Ai−1 |
(iii) Note that ~ i0 A−1
w i w~i =
i=N i=N
| Ai |
n Z |Ai |
X 1
≤ dx + 1
i=N +1 |A i−1 | x
= 1 + log | An | − log | AN |
= O(log | An |) = O(log λ∗ ).

(iv) Note that λ∗n → ∞, | An |→ ∞.

| An | − | An−1 |
Now → 0 implies
| An |
n
X | Ai | − | Ai−1 |
∼ log | An |
i=N
| A i |

137
Corollaryl : (1) If sup E[ε2n | Fn−1 ] < ∞ a.s.
n
n
X
Then ~x0k Vk ε2k
k=N +1

= O((log λ∗n )1+δ ) a.s.


(2) If sup E[| εn |2+δ | Fn−1 ] < ∞, for some δ > 0.
n
Then
n
X
(i) ~x0k Vk ~xk ε2k = O(log λ∗n ) a.s.
k=N +1

(ii) lim E[ε2n | Fn−1 ] = σ 2 . Then


n→∞
n n
!!
X X
~x0k Vk ~xk ε2k ∼ log det ~xk ~x0k
k=N +1 k=1
n o
on lim ~x0n Vn~xn = 0, λ∗n → ∞ .
n→∞

| Pk | − | Pk−1 |
proof : 0 ≤ uk = ~x0k Vk ~xk = ≤1
| Pk |

(1) If lim λ∗n < ∞ then ∞


P
n→∞ k=1 uk < ∞ (lemma 2 - (ii)).

X
Therefore uk ε2k < ∞ (by lemma 1-(i)).
k=1
n
X
So uk ε2k = O((log λ∗n )1+δ ) on (λ∗n → ∞)
k=1

n
X
If λ∗n → ∞, ui = O(log λ∗n ).
i=1
n n
! n
!
X X X
and ui ε2i = O( ui [log ui ]1+δ )
i=1 i=1 i=1
∗ ∗ 1+δ
= O(log λn (log log λn ) )
= O((log λ∗n )1+δ ).

138
(2) Note that 0 ≤ ui ≤ 1.
un → 0 on Ωo , Ωo = {limn→∞~x0n Vn~xn = 0, λ∗n → ∞} .
Xn
ui → ∞ on Ωo
i=1

By lemma 1 - (ii),
n
X n
X
ui ε2i / ui → σ 2 a.s.
i=1 i=1
n
X
on ui ε2i ∼ (log | Pn |)σ 2
i=1
Remark:
n k−1
!2
X X
1o Rn = Qn + ~x0k Vk ~xi εi /(1 + ~x0k Vk−1~xk )
k=N +1 i=1
n
X
∼ ~x0k Vk ~xk ε2k if one of it → ∞.
k=N +1

2o (i) Assume that sup E[ε2n | Fn−1 ] < ∞ a.s.


n
Then Rn = O((log λ∗n )1+δ ) a.s. for δ > 0
(ii) If sup E[| εn |2+δ | Fn−1 ] < ∞ a.s. for some α > 2,
n
Then Rn = O(log λ∗n )

3o If sup E[| εn |α | Fn−1 ] < ∞ a.s. and lim E[ε2n | Fn−1 ] < ∞ a.s.
n n→∞

then on {~x0n Vn~xn → 0, λ∗n → ∞}


n
!
X
Rn ∼ [log det ~xi~x0i ]σ 2 a.s.
i=1

Corollary 1: (i) If $\sup_n E[\varepsilon_n^2\mid\mathcal F_{n-1}]<\infty$ a.s., then
$$Q_n=\Big\|\Big(\sum_{i=1}^n\vec x_i\vec x_i'\Big)^{-1/2}\sum_{i=1}^n\vec x_i\varepsilon_i\Big\|^2\qquad(*)$$
$$=O\big((\log\lambda_n^*)^{1+\delta}\big)\quad\text{a.s.}\qquad(**)$$
and $\|\vec b_n-\vec\beta\|^2=O\big((\log\lambda_n^*)^{1+\delta}/\lambda_n\big)$ a.s., for all $\delta>0$.

(ii) If $\sup_n E[|\varepsilon_n|^\alpha\mid\mathcal F_{n-1}]<\infty$ a.s. for some $\alpha>2$, then $(*)$ and $(**)$ hold with $\delta=0$.

proof: $Q_n\le R_n$, so $(**)$ follows from Remark 2°. Moreover
$$Q_n=\Big(\sum_{i=1}^n\vec x_i\varepsilon_i\Big)'\Big(\sum_{i=1}^n\vec x_i\vec x_i'\Big)^{-1}\Big(\sum_{i=1}^n\vec x_i\varepsilon_i\Big)
=(\vec b_n-\vec\beta)'\Big(\sum_{i=1}^n\vec x_i\vec x_i'\Big)(\vec b_n-\vec\beta)
\ge\lambda_n\|\vec b_n-\vec\beta\|^2,$$
so the bound on $\|\vec b_n-\vec\beta\|^2$ follows from $(**)$.
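A Monte Carlo sketch (ours, not from the notes) of Corollary 1: for i.i.d. regressors and noise, $\|\vec b_n-\vec\beta\|^2$ shrinks at the rate $\log\lambda_n^*/\lambda_n$. The design and parameter values are illustrative.

```python
import numpy as np

# Compare ||b_n - beta||^2 with log(lambda*_n)/lambda_n along a growing sample.
rng = np.random.default_rng(2)
p, n = 2, 20_000
beta = np.array([1.0, -0.5])
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

for m in (100, 1_000, 10_000, 20_000):
    Pm = X[:m].T @ X[:m]                       # design matrix sum x x'
    bm = np.linalg.solve(Pm, X[:m].T @ y[:m])  # least squares estimate
    eig = np.linalg.eigvalsh(Pm)               # ascending eigenvalues
    print(m, np.sum((bm - beta)**2), np.log(eig[-1]) / eig[0])
```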


Corollary 2 (Adaptive prediction): If $\lim_{n\to\infty}E[\varepsilon_n^2\mid\mathcal F_{n-1}]=\sigma^2$ a.s. and $\sup_n E[|\varepsilon_n|^\alpha\mid\mathcal F_{n-1}]<\infty$ for some $\alpha>2$, then on the set $\{\vec x_n'V_n\vec x_n\to0,\ \lambda_n^*\to\infty\}$ we have
$$Q_n+\sum_{k=N+1}^n\{(\vec b_{k-1}-\vec\beta)'\vec x_k\}^2\sim\sigma^2\log\Big[\det\Big(\sum_{i=1}^n\vec x_i\vec x_i'\Big)\Big]\quad\text{a.s.}$$
Therefore, if $Q_n=o(\log\lambda_n^*)$, then
$$\sum_{k=N+1}^n(y_k-\vec b_{k-1}'\vec x_k-\varepsilon_k)^2\sim\sigma^2\log\Big[\det\Big(\sum_{i=1}^n\vec x_i\vec x_i'\Big)\Big]\quad\text{a.s.}$$

proof: By Remark 3°,
$$Q_n+\sum_{k=N+1}^n\Big[\vec x_k'V_{k-1}\Big(\sum_{i=1}^{k-1}\vec x_i\varepsilon_i\Big)\Big]^2\Big/(1+\vec x_k'V_{k-1}\vec x_k)
=Q_n+\sum_{k=N+1}^n[\vec x_k'(\vec b_{k-1}-\vec\beta)]^2/(1+\vec x_k'V_{k-1}\vec x_k)
\sim\sigma^2\log\Big[\det\Big(\sum_{i=1}^n\vec x_i\vec x_i'\Big)\Big]\quad\text{a.s.},$$
using $\vec b_{k-1}-\vec\beta=V_{k-1}\sum_{i=1}^{k-1}\vec x_i\varepsilon_i$. Moreover
$$\sum_{k=N+1}^n[\vec x_k'(\vec b_{k-1}-\vec\beta)]^2/(1+\vec x_k'V_{k-1}\vec x_k)
\sim\sum_{k=N+1}^n[\vec x_k'(\vec b_{k-1}-\vec\beta)]^2$$
if the latter $\to\infty$ and $\vec x_k'V_{k-1}\vec x_k\to0$, since
$$1+\vec x_k'V_{k-1}\vec x_k=\frac{1}{1-\vec x_k'V_k\vec x_k}\to1,$$
and $\sum_{i=1}^n a_ib_i\sim\sum_{i=1}^n a_i$ (for $a_i,b_i>0$) whenever $b_i\to1$ and $\sum_{i=1}^n a_i\to\infty$. (Because $y_k=\vec\beta'\vec x_k+\varepsilon_k$, we have $y_k-\vec b_{k-1}'\vec x_k-\varepsilon_k=-(\vec b_{k-1}-\vec\beta)'\vec x_k$.)
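A minimal scalar sketch (ours, not from the notes) of Corollary 2, with one regressor, i.i.d. standard normal data, and $Q_n$ negligible: the accumulated term $\sum_k\{(\vec b_{k-1}-\vec\beta)'\vec x_k\}^2$ grows like $\sigma^2\log\det(\sum\vec x_i\vec x_i')$. All parameter values are illustrative.

```python
import numpy as np

# Scalar regression y_k = beta*x_k + eps_k; b_k is the running LSE.
rng = np.random.default_rng(3)
n, beta, sigma = 50_000, 0.7, 1.0
x = rng.standard_normal(n)
y = beta * x + sigma * rng.standard_normal(n)

sx2 = np.cumsum(x * x)            # running sum of x_i^2 (here det(P_k) = P_k)
sxy = np.cumsum(x * y)
b = sxy / sx2                     # running least squares estimates b_k
terms = ((b[:-1] - beta) * x[1:])**2     # {(b_{k-1} - beta) x_k}^2
print(terms[10:].sum(), sigma**2 * np.log(sx2[-1]))   # same order of magnitude
```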

Predict:
At stage $n$, we have already observed $\{y_1,\vec x_1,\cdots,y_n,\vec x_n\}$. Since we cannot foresee the future, we have to use the observed data to predict $y_{n+1}$; i.e., the predictor $\hat y_{n+1}$ is $\mathcal F_n$-measurable.

If we are only interested in a single-period prediction, we may use $(y_{n+1}-\hat y_{n+1})^2$ as a measure of performance. In the adaptive prediction case, it may be more appropriate to use the accumulated prediction errors
$$L_n=\sum_{k=1}^n(y_{k+1}-\hat y_{k+1})^2.$$

In the stochastic regression model,
$$L_n=\sum_{k=1}^n(\vec\beta'\vec x_{k+1}-\hat y_{k+1})^2
+2\sum_{k=1}^n(\vec\beta'\vec x_{k+1}-\hat y_{k+1})\varepsilon_{k+1}
+\sum_{k=1}^n\varepsilon_{k+1}^2.$$
By Chow's local convergence theorem,
$$L_n\sim\sum_{k=1}^n(\vec\beta'\vec x_{k+1}-\hat y_{k+1})^2+\sum_{k=2}^{n+1}\varepsilon_k^2\quad\text{a.s., if either side}\to\infty.$$
Therefore, to compare different predictors, it is sufficient to compare
$$C_n=\sum_{k=1}^n(\vec\beta'\vec x_{k+1}-\hat y_{k+1})^2.$$
The least squares predictor is $\hat y_{k+1}=\vec b_k'\vec x_{k+1}$.

Note:
$$\sum_{i=p+1}^n(y_i-\vec b_{i-1}'\vec x_i)^2(1-\vec x_i'V_i\vec x_i)
=\sum_{i=1}^n\hat\varepsilon_i^2(n)-\sum_{i=1}^p\hat\varepsilon_i^2(p),
\quad\text{where }\hat\varepsilon_i(m)=y_i-\vec b_m'\vec x_i.$$
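The residual-sum identity in the Note can be checked directly. The following sketch (ours, not from the notes) compares the left side with $\mathrm{RSS}_n-\mathrm{RSS}_p$, where $\mathrm{RSS}_m=\sum_{i\le m}(y_i-\vec b_m'\vec x_i)^2$; the data-generating choices are illustrative.

```python
import numpy as np

# Check: sum_{i=p+1}^n (y_i - b_{i-1}'x_i)^2 (1 - x_i' V_i x_i) = RSS_n - RSS_p,
# with V_i = (sum_{j<=i} x_j x_j')^{-1} and b_m the LSE from the first m points.
rng = np.random.default_rng(4)
p, n = 2, 200
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

def rss(m):
    b, *_ = np.linalg.lstsq(X[:m], y[:m], rcond=None)
    r = y[:m] - X[:m] @ b
    return r @ r

lhs = 0.0
for i in range(p + 1, n + 1):
    b_prev, *_ = np.linalg.lstsq(X[:i-1], y[:i-1], rcond=None)
    Vi = np.linalg.inv(X[:i].T @ X[:i])
    xi = X[i-1]
    lhs += (y[i-1] - b_prev @ xi)**2 * (1 - xi @ Vi @ xi)
print(lhs, rss(n) - rss(p))   # agree up to rounding
```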
i=1 i=1

Example: AR(1).

$x_k=\rho x_{k-1}+\varepsilon_k$, where the $\varepsilon_k$ are i.i.d. with $E[\varepsilon_i]=0$, $\mathrm{Var}(\varepsilon_i)=\sigma^2$ and $E|\varepsilon_i|^3<\infty$.

(i) If $|\rho|<1$: $\displaystyle\sum_{i=1}^n x_i^2/n\to\sigma^2/(1-\rho^2)$ a.s.

(ii) If $|\rho|=1$: $\displaystyle\sum_{i=1}^n x_i^2=O(n^2\log\log n)$ and
$$\liminf_{n\to\infty}\frac{(\log\log n)\sum_{i=1}^n x_i^2}{n^2}>0\quad\text{a.s.}$$

Hence $\lambda_n^*=O(n^3)$ a.s. for $|\rho|\le1$, and $\liminf_{n\to\infty}\lambda_n/n>0$, so that (by Corollary 1)
$$\hat\rho_n-\rho=O\Big(\Big(\frac{\log n}{n}\Big)^{1/2}\Big)\quad\text{a.s.}$$

We also check that $x_n^2\big/\sum_{i=1}^n x_i^2\to0$:

(i) $|\rho|<1$:
$$x_n^2\Big/\sum_{i=1}^n x_i^2
=\frac{\sum_{i=1}^n x_i^2/n-\sum_{i=1}^{n-1}x_i^2/n}{\sum_{i=1}^n x_i^2/n}
\to\frac{0}{\sigma^2/(1-\rho^2)}=0.$$

(ii) $|\rho|=1$: by the law of the iterated logarithm, $x_n=\sum_{i=1}^n\varepsilon_i$ satisfies
$$x_n^2\Big/\sum_{i=1}^n x_i^2
=O\Big(\Big(\sum_{i=1}^n\varepsilon_i\Big)^2\Big(\frac{n^2}{\log\log n}\Big)^{-1}\Big)
=O\Big(\frac{\log\log n}{n^2}\big(\sqrt{2n\log\log n}\,\big)^2\Big)
=O\Big(\frac{(\log\log n)^2}{n}\Big)=o(1).$$

Next,
$$Q_n=\Big(\sum_{i=1}^n x_i\varepsilon_i\Big)^2\Big/\sum_{i=1}^n x_i^2
=O\!\left(\frac{1}{\sum_{i=1}^n x_i^2}\left[\Big(\sum_{i=1}^n x_i^2\Big)^{1/2}\Big(\log\sum_{i=1}^n x_i^2\Big)^{1/3}\right]^2\right)
=O\Big(\Big(\log\sum_{i=1}^n x_i^2\Big)^{2/3}\Big)
=O\big((\log n)^{2/3}\big)=o(\log\lambda_n^*).$$

By Corollary 2 (here $\hat\rho_{i-1}$ is the least squares estimate from the first $i-1$ observations, so $\hat\rho_{i-1}x_{i-1}$ is the adaptive predictor of $x_i$),
$$\sum_{i=2}^n(\hat\rho_{i-1}-\rho)^2x_{i-1}^2\sim\sigma^2\log\Big(\sum_{i=1}^n x_i^2\Big)\quad\text{a.s.}
\sim\begin{cases}\sigma^2\log n,&\text{a.s. if }|\rho|<1,\\[2pt] 2\sigma^2\log n,&\text{a.s. if }|\rho|=1,\end{cases}$$
since $\log[n^2\log\log n]=2\log n+\log(\log\log n)$ and $\log[n^2/\log\log n]=2\log n-\log(\log\log n)$.
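A quick simulation sketch (ours, not from the notes) of the two growth regimes of $\sum x_i^2$ and of the rate of $\hat\rho_n-\rho$; sample size and parameter values are illustrative.

```python
import numpy as np

# AR(1): sum x_i^2 grows like c*n when |rho| < 1, but like n^2 (up to
# loglog factors) when rho = 1; rho_hat converges in both regimes.
rng = np.random.default_rng(5)
n = 200_000
eps = rng.standard_normal(n)
for rho in (0.5, 1.0):
    x = np.zeros(n)
    for k in range(1, n):
        x[k] = rho * x[k-1] + eps[k]
    s = np.sum(x**2)
    rho_hat = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1]**2)
    print(rho, s / n, s / n**2, rho_hat - rho)
```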

To find the eigenvalues (maximum and minimum): for a nonnegative definite matrix $B_n$, the minimum eigenvalue is $\inf_{\|\vec x\|=1}\vec x'B_n\vec x$.

1° The source of difficulty is that, in general,
$$\liminf_{n\to\infty}\Big(\inf_{\|\vec x\|=1}\vec x'B_n\vec x\Big)\ne\inf_{\|\vec x\|=1}\Big(\lim_{n\to\infty}\vec x'B_n\vec x\Big).$$

2° Lemma: Assume that $\{\mathcal F_n\}$ is a sequence of increasing $\sigma$-fields and $\vec y_n=\vec x_n+\vec\varepsilon_n$, where $\vec x_n$ is $\mathcal F_{n-\ell}$-measurable, $\vec\varepsilon_n=\sum_{j=1}^\ell\vec\varepsilon_n(j)$ and $E\{\vec\varepsilon_n(j)\mid\mathcal F_{n-j-1}\}=0$.

Assume moreover that $\sup_n E[\|\vec\varepsilon_n(j)\|^\alpha\mid\mathcal F_{n-j-1}]<\infty$ a.s. for some $\alpha>2$, that
$$\lambda_n=\lambda_{\min}\Big(\sum_{i=1}^n\vec x_i\vec x_i'+\sum_{i=1}^n\vec\varepsilon_i\vec\varepsilon_i'\Big)\to\infty\quad\text{a.s.},$$
and that $\log\lambda^*\big(\sum_{i=1}^n\vec x_i\vec x_i'\big)=o(\lambda_n)$ a.s., where $\lambda_{\min}(\cdot)$ and $\lambda^*(\cdot)$ denote the minimum and maximum eigenvalues. Then
$$\lim_{n\to\infty}\lambda_{\min}\Big(\sum_{i=1}^n\vec y_i\vec y_i'\Big)\Big/\lambda_n=1\quad\text{a.s.}$$

proof: Let $R_n=\sum_{i=1}^n\vec x_i\vec x_i'$ and $G_n=\sum_{i=1}^n\vec\varepsilon_i\vec\varepsilon_i'$. Then
$$\sum_{i=1}^n\vec y_i\vec y_i'=R_n+\sum_{i=1}^n\vec x_i\vec\varepsilon_i'+\sum_{i=1}^n\vec\varepsilon_i\vec x_i'+G_n.$$
We can assume that $R_n$ is nonsingular; otherwise, add the dummy observations
$$\vec y_0=\vec x_0=(1,0,\cdots,0)',\quad\vec y_{-1}=\vec x_{-1}=(0,1,0,\cdots,0)',\ \cdots,\ \vec y_{1-p}=\vec x_{1-p}=(0,\cdots,0,1)',$$
with $\vec\varepsilon_0=\vec\varepsilon_{-1}=\cdots=\vec\varepsilon_{-p+1}=\vec0$. By Corollary 1,
$$\Big\|R_n^{-1/2}\sum_{i=1}^n\vec x_i\vec\varepsilon_i'(j)\Big\|^2=O(\log\lambda_n^*)=o(\lambda_n).$$

Therefore $\big\|R_n^{-1/2}\sum_{i=1}^n\vec x_i\vec\varepsilon_i'\big\|^2=O(\log\lambda_n^*)$. Given any unit vector $\vec u$,
$$\Big|\vec u'\Big(\sum_{i=1}^n\vec x_i\vec\varepsilon_i'\Big)\vec u\Big|
=\Big|\vec u'R_n^{1/2}\,R_n^{-1/2}\Big(\sum_{i=1}^n\vec x_i\vec\varepsilon_i'\Big)\vec u\Big|
\le\|\vec u'R_n^{1/2}\|\,\Big\|R_n^{-1/2}\sum_{i=1}^n\vec x_i\vec\varepsilon_i'\Big\|
=(\vec u'R_n\vec u)^{1/2}\,O\big((\log\lambda_n^*)^{1/2}\big)$$
$$\le\big(\vec u'(R_n+G_n)\vec u\big)^{1/2}\,O\big((\log\lambda_n^*)^{1/2}\big)
\le\frac{\vec u'(R_n+G_n)\vec u}{\lambda_n^{1/2}}\,O\big((\log\lambda_n^*)^{1/2}\big)
\qquad\big(\text{because }1\le\vec u'(R_n+G_n)\vec u/\lambda_n\big)$$
$$=\vec u'(R_n+G_n)\vec u\;O\big((\log\lambda_n^*/\lambda_n)^{1/2}\big)
=\big(\vec u'(R_n+G_n)\vec u\big)\,o(1).$$
So $\vec u'\big(\sum_{i=1}^n\vec y_i\vec y_i'\big)\vec u=\vec u'(R_n+G_n)\vec u\,(1+o(1))$. Since the $o(1)$ does not depend on $\vec u$, this completes the proof.
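A numerical sketch (ours, not from the notes) of the lemma, taking $\vec x_n$ deterministic (hence trivially $\mathcal F_{n-\ell}$-measurable) and $\vec\varepsilon_n$ i.i.d.: the minimum eigenvalue of $\sum\vec y_i\vec y_i'$ tracks that of $R_n+G_n$. The regressor choice is illustrative.

```python
import numpy as np

# lambda_min(sum y y') / lambda_min(R_n + G_n) should be close to 1 when the
# cross terms sum x eps' are negligible relative to lambda_n.
rng = np.random.default_rng(6)
p, n = 2, 5_000
t = np.arange(1, n + 1)
x = np.stack([np.ones(n), np.sin(0.01 * t)], axis=1)  # slowly varying regressors
e = rng.standard_normal((n, p))                       # i.i.d. noise vectors

R = x.T @ x
G = e.T @ e
Y = (x + e).T @ (x + e)
print(np.linalg.eigvalsh(Y)[0] / np.linalg.eigvalsh(R + G)[0])  # approx 1
```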


Example: AR(p).

$$y_i=\beta_1y_{i-1}+\cdots+\beta_py_{i-p}+\varepsilon_i,\qquad
\psi(z)=z^p-\beta_1z^{p-1}-\cdots-\beta_p.$$
All the roots of $\psi$ have magnitudes less than or equal to 1. Let
$$\vec y_n=(y_n,y_{n-1},\cdots,y_{n-p+1})'.$$

Then the L.S.E. satisfies
$$\vec b_n=\Big(\sum_{i=1}^n\vec y_{i-1}\vec y_{i-1}'\Big)^{-1}\Big(\sum_{i=1}^n\vec y_{i-1}\varepsilon_i\Big)+\vec\beta.$$
Assume that the $\varepsilon_i$ are i.i.d. with $E[\varepsilon_i]=0$, $E|\varepsilon_i|^{2+\delta}<\infty$ for some $\delta>0$, and $E\varepsilon_i^2=\sigma^2>0$.


 
Let
$$B=\begin{pmatrix}\beta_1&\beta_2&\cdots&\beta_{p-1}&\beta_p\\1&0&\cdots&0&0\\0&1&\cdots&0&0\\\vdots&&\ddots&&\vdots\\0&0&\cdots&1&0\end{pmatrix}
=\begin{pmatrix}\beta_1&\cdots&\beta_p\\I_{p-1}&&O\end{pmatrix}.$$
Then
$$\vec y_n=\begin{pmatrix}y_n\\y_{n-1}\\\vdots\\y_{n-p+1}\end{pmatrix}
=B\begin{pmatrix}y_{n-1}\\y_{n-2}\\\vdots\\y_{n-p}\end{pmatrix}
+\begin{pmatrix}\varepsilon_n\\0\\\vdots\\0\end{pmatrix},$$
i.e. $\vec y_n=B\vec y_{n-1}+\vec e\,\varepsilon_n$, where $\vec e=(1,0,\cdots,0)'$. Iterating,
$$\vec y_n=B^n\vec y_0+B^{n-1}\vec e\,\varepsilon_1+\cdots+B^0\vec e\,\varepsilon_n.$$
$B$ can be written in Jordan form as $B=C^{-1}DC$, where $D=\mathrm{diag}[D_1,\cdots,D_q]$ and
$$D_j=\begin{pmatrix}\lambda_j&1&0&\cdots&0\\0&\lambda_j&1&\cdots&0\\\vdots&&\ddots&\ddots&\vdots\\0&0&\cdots&&\lambda_j\end{pmatrix}$$
is an $m_j\times m_j$ matrix, $m_j$ being the multiplicity of $\lambda_j$.

Here $\sum_{j=1}^q m_j=p$, the $\lambda_j$ are the roots of $\psi$, and $C$ is a nonsingular matrix. Moreover
$$D_j^k=\begin{pmatrix}\lambda_j^k&\binom{k}{1}\lambda_j^{k-1}&\binom{k}{2}\lambda_j^{k-2}&\cdots&\binom{k}{m_j-1}\lambda_j^{k-m_j+1}\\0&\lambda_j^k&&\cdots&\vdots\\\vdots&&\ddots&&\\0&0&\cdots&&\lambda_j^k\end{pmatrix},$$
$$B^n=C^{-1}D^nC=C^{-1}\mathrm{diag}[D_1^n,\cdots,D_q^n]\,C,$$
and since $|\lambda_j|\le1$,
$$\|B^n\|\le\|C^{-1}\|\|C\|\max\{\|D_1^n\|,\cdots,\|D_q^n\|\}\le k\,n^p
\qquad\Big(\text{by }\binom{n}{p}=\frac{n!}{p!(n-p)!}\sim\frac{n^p}{p!}\Big).$$
Hence
$$\|\vec y_n\|\le\|B^n\|\|\vec y_0\|+\cdots+\|B^0\vec e\|\,|\varepsilon_n|
\le k\,n^p\{\|\vec y_0\|+|\varepsilon_1|+\cdots+|\varepsilon_n|\}
=O(n^{p+1})\quad\text{a.s.}$$
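The polynomial growth of $\|B^n\|$ for roots on the unit circle can be seen directly. A sketch (ours, not from the notes) with $\psi(z)=(z-1)^2$, i.e. a double unit root, where $\|B^n\|$ grows like the binomial coefficient $\binom{n}{1}$:

```python
import numpy as np

# Companion matrix of psi(z) = z^2 - 2z + 1, i.e. beta = (2, -1): double root at 1.
# The Jordan block of size 2 makes ||B^n|| grow linearly in n.
B = np.array([[2.0, -1.0],
              [1.0,  0.0]])
for n in (10, 100, 1000):
    print(n, np.linalg.norm(np.linalg.matrix_power(B, n), 2))
```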

Therefore
$$\lambda_{\max}\Big(\sum_{i=1}^n\vec y_{i-1}\vec y_{i-1}'\Big)
\le\Big\|\sum_{i=1}^n\vec y_{i-1}\vec y_{i-1}'\Big\|
\le\sum_{i=1}^n\|\vec y_{i-1}\vec y_{i-1}'\|
=\sum_{i=1}^n\|\vec y_{i-1}\|^2
=O\Big(\sum_{i=1}^n i^{2p+2}\Big)
=O(n^{2p+3})\quad\text{a.s.},$$
which implies $\lambda_{\max}=O(n^{2p+3})$ a.s.; in particular $\log\lambda_{\max}=O(\log n)$.

Next,
$$\vec y_n=B^2\vec y_{n-2}+B\vec e\,\varepsilon_{n-1}+\vec e\,\varepsilon_n
=\cdots
=B^p\vec y_{n-p}+B^{p-1}\vec e\,\varepsilon_{n-p+1}+\cdots+\vec e\,\varepsilon_n
=\vec x_n+\vec\varepsilon_n,$$
where $\vec x_n=B^p\vec y_{n-p}$, $\vec\varepsilon_n=B^{p-1}\vec e\,\varepsilon_{n-p+1}+\cdots+\vec e\,\varepsilon_n$, and $\ell=p$.

Claim:
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n\vec\varepsilon_i\vec\varepsilon_i'
=\sigma^2\sum_{j=0}^{p-1}B^j\vec e\,\vec e'(B')^j\equiv\Gamma\quad\text{a.s.},$$
where $\Gamma$ is positive definite.


Therefore $\lambda_{\min}\big(\sum_{i=1}^n\vec\varepsilon_i\vec\varepsilon_i'/n\big)\to\lambda_{\min}(\Gamma)>0$ a.s.

To prove the claim, write
$$\vec\varepsilon_i\vec\varepsilon_i'
=\sum_{j=0}^{p-1}B^j\vec e\,\vec e'(B')^j\varepsilon_{i-j}^2
+\sum_{j\ne\ell}B^j\vec e\,\vec e'(B')^\ell\varepsilon_{i-j}\varepsilon_{i-\ell}.$$
Using the properties
$$\frac1n\sum_{i=1}^n\varepsilon_{i-j}^2\to\sigma^2
\qquad\text{and}\qquad
\frac1n\sum_{i=1}^n\varepsilon_{i-\ell}\varepsilon_{i-j}\to0\ \text{a.s. }\forall\,\ell\ne j$$
(from the martingale form and Chow's theorem), we have
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n\vec\varepsilon_i\vec\varepsilon_i'=\Gamma\quad\text{a.s.}$$

Observe that
$$\Gamma=\sigma^2\,(\vec e,B\vec e,\cdots,B^{p-1}\vec e)
\begin{pmatrix}\vec e'\\\vec e'B'\\\vdots\\\vec e'(B')^{p-1}\end{pmatrix}.$$
To show $\Gamma$ is nonsingular, it is sufficient to show that $(\vec e,B\vec e,\cdots,B^{p-1}\vec e)$ is nonsingular. But
$$(\vec e,B\vec e,\cdots,B^{p-1}\vec e)=
\begin{pmatrix}1&\beta_1&*&*&\cdots&*\\0&1&\beta_1&*&\cdots&*\\0&0&1&&&\vdots\\\vdots&&&\ddots&&\\0&0&0&\cdots&\cdots&1\end{pmatrix}$$
is upper triangular with unit diagonal, hence nonsingular.
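A Monte Carlo sketch (ours, not from the notes) of the claim $\frac1n\sum\vec\varepsilon_i\vec\varepsilon_i'\to\Gamma=\sigma^2MM'$ with $M=(\vec e,B\vec e,\cdots,B^{p-1}\vec e)$; the coefficients $\beta=(0.5,0.2)$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, sigma = 2, 200_000, 1.0
B = np.array([[0.5, 0.2],
              [1.0, 0.0]])          # companion matrix for psi(z) = z^2 - 0.5z - 0.2
e1 = np.array([1.0, 0.0])
cols = [np.linalg.matrix_power(B, j) @ e1 for j in range(p)]  # e, Be, ..., B^{p-1}e
M = np.stack(cols, axis=1)          # upper triangular with unit diagonal
Gamma = sigma**2 * M @ M.T

eps = sigma * rng.standard_normal(n + p)
# eps_vec_i = sum_{j=0}^{p-1} B^j e * eps_{i-j}
vecs = sum(np.outer(eps[p - j:n + p - j], cols[j]) for j in range(p))
print(vecs.T @ vecs / n)            # converges to Gamma
print(Gamma)
```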

Since $\vec x_n=B^p\vec y_{n-p}$,
$$\lambda^*\Big(\sum_{i=p}^n\vec x_i\vec x_i'\Big)
\le\|B^p\|^2\sum_{i=p}^n\|\vec y_{i-p}\|^2=O(n^{2p+3})\quad\text{a.s.}$$
But $\lambda_n\ge\lambda_{\min}\big(\sum_{i=1}^n\vec\varepsilon_i\vec\varepsilon_i'\big)\sim n\,\lambda_{\min}(\Gamma)$, so
$$\log\lambda^*\Big(\sum_{i=1}^n\vec x_i\vec x_i'\Big)=O(\log n)=o(\lambda_n)\quad\text{a.s.}$$

By the previous lemma,
$$\lim_{n\to\infty}\frac{\lambda_{\min}\big(\sum_{i=1}^n\vec y_{i-1}\vec y_{i-1}'\big)}{\lambda_n}=1\quad\text{a.s.}$$
Therefore $\liminf_{n\to\infty}\lambda_{\min}\big(\sum_{i=1}^n\vec y_{i-1}\vec y_{i-1}'\big)/n>0$ a.s., so
$$\log\lambda^*\Big(\sum_{i=1}^n\vec y_{i-1}\vec y_{i-1}'\Big)
=O(\log n)
=o\Big(\lambda_{\min}\Big(\sum_{i=1}^n\vec y_{i-1}\vec y_{i-1}'\Big)\Big)$$
and $\lim_{n\to\infty}\vec b_n=\vec\beta$ a.s.

3. Limiting Distribution:

$$y_{n,i}=\vec\beta'\vec x_{n,i}+\varepsilon_{n,i},\qquad i=1,2,\cdots,n.$$

Assume that for every $n$ there exist increasing $\sigma$-fields $\{\mathcal F_{n,j};\ j=0,1,2,\cdots,n\}$ such that for each $n$, $\{\varepsilon_{n,j},\mathcal F_{n,j}\}$ is a martingale difference sequence and $\vec x_{n,j}$ is $\mathcal F_{n,j-1}$-measurable. Assume that:

(i) $E[\varepsilon_{n,j}^2\mid\mathcal F_{n,j-1}]=\sigma^2$ a.s. $\forall\,n,j$.

(ii) $\sup_{1\le j\le n}E[|\varepsilon_{n,j}|^\alpha\mid\mathcal F_{n,j-1}]=O_P(1)$ for some $\alpha>2$.

(iii) $\exists$ nonsingular matrices $A_n$ s.t. $A_n\big(\sum_{i=1}^n\vec x_{n,i}\vec x_{n,i}'\big)A_n'\xrightarrow{P}\Gamma$, where $\Gamma$ is p.d.

(iv) $\sup_{1\le i\le n}\|A_n\vec x_{n,i}\|\xrightarrow{P}0$.

Then if $\vec b_n=\big(\sum_{i=1}^n\vec x_{n,i}\vec x_{n,i}'\big)^{-1}\sum_{i=1}^n\vec x_{n,i}y_{n,i}$, we have
$$(A_n')^{-1}(\vec b_n-\vec\beta)\xrightarrow{D}N(0,\sigma^2\Gamma^{-1}).$$
(The same statement holds with $i=1,2,\cdots,k_n$ in place of $i=1,\ldots,n$.)

Note: If $\{X_{n,j},\mathcal F_{n,j},\,1\le j\le k_n\}$ is a martingale difference array such that

(i) $\displaystyle\sum_{j=1}^{k_n}E[X_{n,j}^2\mid\mathcal F_{n,j-1}]\xrightarrow{P}C$, a constant, and

(ii) $\displaystyle\sum_{j=1}^{k_n}E\big[X_{n,j}^2\,I_{\{X_{n,j}^2>\varepsilon\}}\mid\mathcal F_{n,j-1}\big]\xrightarrow{P}0$ for every $\varepsilon>0$,

then $\displaystyle\sum_{j=1}^{k_n}X_{n,j}\xrightarrow{D}N(0,C)$.
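An illustrative simulation (ours, not from the notes) of this martingale CLT, with $X_{n,j}=z_j\,\mathrm{sign}(z_{j-1})/\sqrt n$ for i.i.d. standard normal $z_j$: the multiplier is $\mathcal F_{n,j-1}$-measurable, the conditional variances sum to $C=1$, and the row sums are approximately $N(0,1)$.

```python
import numpy as np

# Martingale difference array X_{n,j} = z_j * s_j / sqrt(n), s_j = sign(z_{j-1}).
# The s_j are past-measurable, each conditional variance is 1/n, so C = 1.
rng = np.random.default_rng(8)
n, reps = 1_000, 5_000
z = rng.standard_normal((reps, n))
s = np.ones((reps, n))
s[:, 1:] = np.sign(z[:, :-1])
sums = (z * s).sum(axis=1) / np.sqrt(n)
print(sums.mean(), sums.var())   # approx 0 and 1
```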

proof: W.l.o.g. we can assume that $\vec x_{n,i}$ is bounded for all $n,i$. Since
$$(A_n')^{-1}(\vec b_n-\vec\beta)
=\Big(A_n\sum_{i=1}^{k_n}\vec x_{n,i}\vec x_{n,i}'A_n'\Big)^{-1}\Big(A_n\sum_{i=1}^{k_n}\vec x_{n,i}\varepsilon_{n,i}\Big),$$
it is sufficient to show that
$$A_n\sum_{i=1}^{k_n}\vec x_{n,i}\varepsilon_{n,i}\xrightarrow{D}N(\vec 0,\sigma^2\Gamma).$$
By the Cramér–Wold device, it is sufficient to show that for all $\vec t\ne\vec0$,
$$\vec t'A_n\sum_{i=1}^{k_n}\vec x_{n,i}\varepsilon_{n,i}\xrightarrow{D}N(0,\sigma^2\vec t'\Gamma\vec t).$$
Let $u_{n,i}=\vec t'A_n\vec x_{n,i}\varepsilon_{n,i}$. Then $\{u_{n,i},\mathcal F_{n,i}\}$ is a martingale difference array such that
$$\sum_{i=1}^{k_n}E(u_{n,i}^2\mid\mathcal F_{n,i-1})
=\sum_{i=1}^{k_n}(\vec t'A_n\vec x_{n,i})^2E[\varepsilon_{n,i}^2\mid\mathcal F_{n,i-1}]
=\sigma^2\,\vec t'A_n\Big(\sum_{i=1}^{k_n}\vec x_{n,i}\vec x_{n,i}'\Big)A_n'\vec t
\xrightarrow{P}\sigma^2\vec t'\Gamma\vec t=C,\ \text{say},$$

and
$$\sum_{i=1}^{k_n}E\big[u_{n,i}^2\,I_{\{u_{n,i}^2>\varepsilon\}}\mid\mathcal F_{n,i-1}\big]
\le\varepsilon^{-\frac{\alpha-2}{2}}\sum_{i=1}^{k_n}E[|u_{n,i}|^\alpha\mid\mathcal F_{n,i-1}]
=\varepsilon^{-\frac{\alpha-2}{2}}\sum_{i=1}^{k_n}|\vec t'A_n\vec x_{n,i}|^\alpha E[|\varepsilon_{n,i}|^\alpha\mid\mathcal F_{n,i-1}]$$
$$\le\varepsilon^{-\frac{\alpha-2}{2}}\Big(\sup_{1\le i\le k_n}E[|\varepsilon_{n,i}|^\alpha\mid\mathcal F_{n,i-1}]\Big)
\Big(\sum_{i=1}^{k_n}|\vec t'A_n\vec x_{n,i}|^2\Big)\,\sup_{1\le i\le k_n}|\vec t'A_n\vec x_{n,i}|^{\alpha-2}$$
$$\le\varepsilon^{-\frac{\alpha-2}{2}}\Big(\sup_{1\le i\le k_n}E[|\varepsilon_{n,i}|^\alpha\mid\mathcal F_{n,i-1}]\Big)
\Big(\vec t'A_n\Big(\sum_{i=1}^{k_n}\vec x_{n,i}\vec x_{n,i}'\Big)A_n'\vec t\Big)
\,\|\vec t\|^{\alpha-2}\sup_{1\le i\le k_n}\|A_n\vec x_{n,i}\|^{\alpha-2}
\xrightarrow{P}0.$$

Example: $y_0=0$,
$$y_n=\alpha+\beta y_{n-1}+\varepsilon_n,$$
where $|\beta|<1$ and the $\varepsilon_n$ are i.i.d. with $E[\varepsilon_n]=0$, $\mathrm{Var}[\varepsilon_n]=\sigma^2$ and $E|\varepsilon_n|^\alpha<\infty$ for some $\alpha>2$. Iterating,
$$y_n=\alpha+\beta(\alpha+\beta y_{n-2}+\varepsilon_{n-1})+\varepsilon_n
=\alpha+\beta\alpha+\beta^2y_{n-2}+\beta\varepsilon_{n-1}+\varepsilon_n
=\alpha(1+\beta+\cdots+\beta^{n-1})+\beta^{n-1}\varepsilon_1+\cdots+\varepsilon_n,$$
and
$$\alpha(1+\beta+\beta^2+\cdots+\beta^{n-1}+\cdots)=\frac{\alpha}{1-\beta}.$$

This implies
$$\frac1n\sum_{i=1}^ny_i^2\to\Big(\frac{\alpha}{1-\beta}\Big)^2+\frac{\sigma^2}{1-\beta^2}
\qquad\text{and}\qquad
\frac1n\sum_{i=1}^ny_i\to\frac{\alpha}{1-\beta}\quad\text{a.s.}$$
Writing
$$y_n=(\alpha,\beta)\begin{pmatrix}1\\y_{n-1}\end{pmatrix}=\vec\beta'\vec x_n,$$
we get
$$\frac1n\sum_{i=1}^n\begin{pmatrix}1\\y_{i-1}\end{pmatrix}(1,y_{i-1})
=\frac1n\begin{pmatrix}n&\sum_{i=1}^ny_{i-1}\\[2pt]\sum_{i=1}^ny_{i-1}&\sum_{i=1}^ny_{i-1}^2\end{pmatrix}
\to\begin{pmatrix}1&\alpha/(1-\beta)\\[2pt]\alpha/(1-\beta)&\big(\frac{\alpha}{1-\beta}\big)^2+\frac{\sigma^2}{1-\beta^2}\end{pmatrix}\equiv\Gamma.$$


Now take $A_n=(\sqrt n)^{-1}I$ and $k_n=n$. Then
$$\sup_{1\le i\le n}\Big\|\frac{1}{\sqrt n}\begin{pmatrix}1\\y_{i-1}\end{pmatrix}\Big\|
\le\frac{1}{\sqrt n}+\frac{1}{\sqrt n}\sup_{1\le i\le n}|y_{i-1}|,$$
so it is sufficient to show that $y_{n-1}/\sqrt n\to0$ a.s. But
$$y_{n-1}^2/n=\sum_{i=1}^{n-1}y_i^2/n-\sum_{i=1}^{n-2}y_i^2/n\to0\quad\text{a.s.}$$
Also $A_n\big(\sum_{i=1}^{k_n}\vec x_{n,i}\vec x_{n,i}'\big)A_n'\to\Gamma$ a.s. and $A_nA_{n-1}^{-1}\to I$ a.s., which together imply $\sup_{1\le i\le k_n}\|A_n\vec x_{n,i}\|\to0$. Hence the conditions of the theorem hold, and
$$\sqrt n\,(\vec b_n-\vec\beta)\xrightarrow{D}N(0,\sigma^2\Gamma^{-1}).$$
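A Monte Carlo sketch (ours, not from the notes) of this example: the empirical covariance of $\sqrt n(\vec b_n-\vec\beta)$ is compared with $\sigma^2\Gamma^{-1}$. Sample sizes and coefficients are illustrative.

```python
import numpy as np

# y_n = a + b*y_{n-1} + eps_n with |b| < 1:
# sqrt(n)*(b_n - (a,b)') is approximately N(0, sigma^2 * Gamma^{-1}).
rng = np.random.default_rng(9)
a, b, sigma, n, reps = 1.0, 0.5, 1.0, 2_000, 500
mu = a / (1 - b)
Gamma = np.array([[1.0, mu], [mu, mu**2 + sigma**2 / (1 - b**2)]])

est = np.empty((reps, 2))
for r in range(reps):
    eps = sigma * rng.standard_normal(n)
    y = np.zeros(n + 1)
    for k in range(1, n + 1):
        y[k] = a + b * y[k - 1] + eps[k - 1]
    X = np.stack([np.ones(n), y[:-1]], axis=1)     # regressors (1, y_{k-1})
    est[r] = np.linalg.lstsq(X, y[1:], rcond=None)[0]
dev = np.sqrt(n) * (est - np.array([a, b]))
print(np.cov(dev, rowvar=False))                   # approx sigma^2 * Gamma^{-1}
print(sigma**2 * np.linalg.inv(Gamma))
```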
