
Week 2 Practice: Solutions

Problem 1. Consider a sequence $X_n \stackrel{d}{=} B(n, p_n)$ ($n \geq 1$) of binomial random variables. Suppose that $\lim_{n\to\infty} np_n = \lambda$ for some constant $\lambda$.

(1) For each $k \geq 0$, compute $\lim_{n\to\infty} P(X_n = k)$.
[Hint: Let $b_n(k) \triangleq P(X_n = k)$. First compute $\lim_{n\to\infty} \frac{b_n(k)}{b_n(k-1)}$. Then use
$$b_n(k) = b_n(0)\cdot\frac{b_n(1)}{b_n(0)}\cdot\cdots\cdot\frac{b_n(k)}{b_n(k-1)}.$$]

(2) If $\lambda > 0$, show that $X_n$ converges weakly to the Poisson distribution with parameter $\lambda$.

(3) If $\lambda = 0$, show that $X_n$ converges weakly to the zero random variable.
Solution. (1) For each $k \geq 1$, we have
$$\frac{b_n(k)}{b_n(k-1)} = \frac{\binom{n}{k}p_n^k(1-p_n)^{n-k}}{\binom{n}{k-1}p_n^{k-1}(1-p_n)^{n-k+1}} = \frac{n-k+1}{k}\cdot\frac{p_n}{1-p_n} = \frac{np_n}{k(1-p_n)} - \frac{(k-1)p_n}{k(1-p_n)}.$$
Since $np_n \to \lambda$, we also have $p_n \to 0$. Therefore,
$$\frac{b_n(k)}{b_n(k-1)} \to \frac{\lambda}{k} \quad \text{as } n\to\infty.$$
In addition, we also have
$$b_n(0) = (1-p_n)^n = \left((1-p_n)^{\frac{1}{p_n}}\right)^{np_n} \to e^{-\lambda}.$$
Therefore,
$$b_n(k) = b_n(0)\cdot\frac{b_n(1)}{b_n(0)}\cdot\cdots\cdot\frac{b_n(k)}{b_n(k-1)} \to e^{-\lambda}\cdot\frac{\lambda}{1}\cdot\cdots\cdot\frac{\lambda}{k} = e^{-\lambda}\cdot\frac{\lambda^k}{k!}.$$
(2) For any $x \geq 0$, we have
$$P(X_n \leq x) = \sum_{k=0}^{[x]} b_n(k) \to \sum_{k=0}^{[x]} e^{-\lambda}\cdot\frac{\lambda^k}{k!} = F(x),$$
where $[x]$ denotes the integer part of $x$, and $F(x)$ is the cumulative distribution function of the Poisson distribution with parameter $\lambda$. The above convergence also holds trivially if $x < 0$. Therefore, $X_n$ converges weakly to $\mathrm{Poi}(\lambda)$.
(3) If $\lambda = 0$, according to Part (1) we have $b_n(0) \to 1$ and $b_n(k) \to 0$ for each $k \geq 1$. Therefore,
$$P(X_n \leq x) = \sum_{k=0}^{[x]} b_n(k) \to 1$$
if $x \geq 0$, and $P(X_n \leq x) = 0$ if $x < 0$. Since
$$F(x) = \begin{cases} 1, & x \geq 0; \\ 0, & x < 0, \end{cases}$$
is the cumulative distribution function of the zero random variable, we conclude that $X_n$ converges weakly to zero in this case.
Remark. In this particular problem, we actually have $P(X_n \leq x) \to F(x)$ for all $x \in \mathbb{R}$, where $F(x)$ is the relevant limiting distribution. Note that in general this may not be true, and one can only expect convergence at continuity points of $F$, as stipulated in the definition of weak convergence.
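The convergence in Part (1) is easy to observe numerically. Below is a small illustrative sketch (not part of the original solution; it assumes only the Python standard library) comparing $b_n(k)$ with the Poisson limit:

```python
# Compare the binomial pmf b_n(k) with the Poisson limit e^{-lam} lam^k / k!
# when p_n is chosen so that n * p_n = lam is held fixed. Illustration only.
import math

lam = 3.0
for n in [10, 100, 10_000]:
    p = lam / n
    for k in range(5):
        b_nk = math.comb(n, k) * p**k * (1 - p)**(n - k)
        poi_k = math.exp(-lam) * lam**k / math.factorial(k)
        print(f"n={n:6d} k={k} b_n(k)={b_nk:.6f} Poisson={poi_k:.6f}")
```

As $n$ grows, the two columns agree to more and more digits, in line with Part (1).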
Problem 2. Let $X_n$ ($n \geq 1$) be a geometric random variable with parameter $1/n$.
(1) Show that $X_n/n$ converges weakly to $Y \stackrel{d}{=} \exp(1)$.
(2) What is the probability that $X_n$ is an even integer?
(3) Does $P(X_n/n \in A)$ converge to $P(Y \in A)$ for every Borel measurable subset $A \subseteq \mathbb{R}$?
Solution. (1) If $x < 0$, we have
$$P\left(\frac{X_n}{n} \leq x\right) = 0 \to 0 = P(Y \leq x)$$
trivially. If $x \geq 0$, we have
$$P\left(\frac{X_n}{n} \leq x\right) = P(X_n \leq nx) = \sum_{k=0}^{[nx]}\left(1-\frac{1}{n}\right)^k\cdot\frac{1}{n} = 1-\left(1-\frac{1}{n}\right)^{[nx]+1} = 1-\left(1-\frac{1}{n}\right)^{n\cdot\frac{[nx]+1}{n}},$$
where $[nx]$ denotes the integer part of $nx$. Note that $\frac{[nx]+1}{n} \to x$. Therefore,
$$P\left(\frac{X_n}{n} \leq x\right) \to 1-e^{-x} = P(Y \leq x).$$
This implies that $X_n/n$ converges weakly to $Y$.
(2) Since $X_n$ is a geometric random variable with parameter $1/n$, we have
$$P(X_n \text{ is an even number}) = \sum_{k=0}^{\infty}\left(1-\frac{1}{n}\right)^{2k}\cdot\frac{1}{n} = \frac{n}{2n-1}.$$
(3) This is not true. Consider $A = \mathbb{Q}$ (the set of rational numbers). Since $Y$ is a continuous random variable, we have
$$P(Y \in \mathbb{Q}) = 0.$$
On the other hand, since
$$\{X_n \text{ is an even number}\} \subseteq \left\{\frac{X_n}{n} \in \mathbb{Q}\right\},$$
we know that
$$P\left(\frac{X_n}{n} \in \mathbb{Q}\right) \geq P(X_n \text{ is an even number}) = \frac{n}{2n-1} > \frac{1}{2}.$$
This shows that
$$P\left(\frac{X_n}{n} \in \mathbb{Q}\right) \nrightarrow P(Y \in \mathbb{Q}).$$
Remark. There is a much simpler solution to Part (3) due to Jaymond: $P(\frac{X_n}{n} \in \mathbb{Q}) = 1$!
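As a quick numerical sanity check of Part (1), one can simulate. The sketch below assumes $X_n$ counts failures before the first success, matching the pmf used above; NumPy's generator counts trials, hence the shift by one:

```python
# Empirical cdf of X_n / n at one point versus the exp(1) cdf 1 - e^{-x}.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
# numpy's geometric has support {1, 2, ...}; subtract 1 to start at 0.
samples = rng.geometric(1 / n, size=200_000) - 1
x = 1.0
print(np.mean(samples / n <= x), 1 - np.exp(-x))  # both roughly 0.632
```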
Week 3 Practice: Solutions

Problem 1. Given $n \geq 1$, let $\{X_1, X_2, \cdots\}$ be a sequence of independent random variables, each following the discrete uniform distribution over $\{1, 2, \cdots, n\}$. Define
$$T_n \triangleq \inf\{k \geq 1 : X_l = X_k \text{ for some } l < k\}.$$
(i) When $n = 365$, we interpret $X_i$ ($i \geq 1$) as the birthday of the $i$-th person. What is the meaning of the event $\{T_n > r\}$?
(ii) For each $r \geq 1$, show that
$$P(T_n > r) = \left(1-\frac{1}{n}\right)\cdot\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{r-1}{n}\right).$$
(iii) Show that $n^{-1/2}T_n$ converges weakly (as $n\to\infty$) to the distribution $F$ given by
$$F(x) \triangleq \begin{cases} 1-e^{-x^2/2}, & x > 0; \\ 0, & x \leq 0. \end{cases}$$
[Hint: To be fully rigorous you may use the elementary inequality
$$-x-x^2 \leq \log(1-x) \leq -x \quad \text{for } x\in[0,1/2].$$
But for understanding the idea, it is enough to apply $\log(1-x) \approx -x$.]

Solution. (i) $T_n > r$ means no two members among the first $r$ terms $X_1, \cdots, X_r$ are equal. In the context of the birthday problem when $n = 365$, this is the event that no two members in a group of $r$ people have the same birthday.
(ii) Note that
$$\{T_n > r\} = \{X_1, \cdots, X_r \text{ all distinct}\}.$$
To produce this event, there is no restriction on $X_1$; then there are $n-1$ choices for $X_2$ to be different from $X_1$, then $n-2$ choices for $X_3$ to be different from $X_1$ and $X_2$, etc. Therefore,
$$P(T_n > r) = 1\times\frac{n-1}{n}\times\frac{n-2}{n}\times\cdots\times\frac{n-r+1}{n} = \left(1-\frac{1}{n}\right)\cdot\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{r-1}{n}\right).$$
Note that $T_n \leq n+1$, and thus $P(T_n > r) = 0$ if $r \geq n+1$. The above formula covers this situation as well and is thus true for all $r \in \mathbb{N}$.
(iii) If $x \leq 0$ we have $P(n^{-1/2}T_n \leq x) = 0$. If $x > 0$, we have
$$P(n^{-1/2}T_n > x) = P(T_n > \sqrt{n}x) = \prod_{k=1}^{[\sqrt{n}x]-1}\left(1-\frac{k}{n}\right) = \exp\Big(\sum_{k=1}^{[\sqrt{n}x]-1}\log\left(1-\frac{k}{n}\right)\Big).$$
To understand the limit as $n\to\infty$, we apply the inequality
$$-x-x^2 \leq \log(1-x) \leq -x \quad \text{for } x\in[0,1/2]$$
with $x = \frac{k}{n}$ (note that when $n$ is large we have $\frac{k}{n} \in [0,1/2]$). Firstly, we have
$$\sum_{k=1}^{[\sqrt{n}x]-1}\frac{k}{n} = \frac{[\sqrt{n}x]\cdot([\sqrt{n}x]-1)}{2n} \to \frac{x^2}{2}.$$
In addition, we have
$$\sum_{k=1}^{[\sqrt{n}x]-1}\frac{k^2}{n^2} \leq \frac{[\sqrt{n}x]}{n^2}\cdot\sum_{k=1}^{[\sqrt{n}x]-1}k = \frac{[\sqrt{n}x]}{n}\cdot\frac{[\sqrt{n}x]\cdot([\sqrt{n}x]-1)}{2n} \to 0.$$
Therefore, by the "sandwich principle" we see that
$$\sum_{k=1}^{[\sqrt{n}x]-1}\log\left(1-\frac{k}{n}\right) \to -\frac{x^2}{2}.$$
In particular,
$$P(n^{-1/2}T_n \leq x) = 1 - P(n^{-1/2}T_n > x) = 1 - \exp\Big(\sum_{k=1}^{[\sqrt{n}x]-1}\log\left(1-\frac{k}{n}\right)\Big) \to 1 - e^{-x^2/2}.$$
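A Monte Carlo sketch of Part (iii) (illustration only, with $n = 365$ as in the birthday interpretation; the helper sample_T below is ours, not part of the problem):

```python
# Sample T_n by drawing uniform values until the first repetition, then
# compare the empirical cdf of T_n / sqrt(n) with F(x) = 1 - exp(-x^2/2).
import numpy as np

rng = np.random.default_rng(0)

def sample_T(n):
    seen = set()
    while True:
        x = int(rng.integers(1, n + 1))
        if x in seen:
            return len(seen) + 1   # index of the first repeated value
        seen.add(x)

n, trials, x = 365, 20_000, 1.2
ts = np.array([sample_T(n) for _ in range(trials)])
print(np.mean(ts / np.sqrt(n) <= x), 1 - np.exp(-x**2 / 2))
```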
Problem 2. Prove the sufficiency part of Proposition 1.1 in the lecture notes.
Namely, let Fn , F be cumulative distribution functions and let µn , µ be the corre-
sponding probability measures. If

µn ((a, b]) → µ((a, b])

for any continuity points a < b of µ, then Fn converges weakly to F .

Solution. Let $x$ be a continuity point of $F$. We want to show that $F_n(x) \to F(x)$.
First of all, since $\mu_n$ converges weakly to $\mu$, we know that the sequence $\{\mu_n\}$ is tight. In particular, given $\varepsilon > 0$, there exists $M > 0$ such that
$$\mu_n((-M, M]) \geq 1-\varepsilon.$$
By further enlarging $M$, we may assume that $-M$ is a continuity point of $\mu$ and the above inequality also holds for $\mu$. It follows that
$$|F_n(x)-F(x)| = \left|\mu_n((-\infty,-M]) + \mu_n((-M,x]) - \mu((-\infty,-M]) - \mu((-M,x])\right|$$
$$\leq \mu_n((-\infty,-M]) + \mu((-\infty,-M]) + \left|\mu_n((-M,x]) - \mu((-M,x])\right| \leq 2\varepsilon + \left|\mu_n((-M,x]) - \mu((-M,x])\right|.$$
Since $-M$ and $x$ are both continuity points of $\mu$, letting $n\to\infty$ the second term vanishes, and we arrive at
$$\limsup_{n\to\infty}|F_n(x)-F(x)| \leq 2\varepsilon.$$
The result follows as $\varepsilon$ is arbitrary.

Problem 3. Let $X_n$ and $X$ be random variables taking values in $\mathbb{Z}$ (the set of integers). Show that the following three statements are equivalent.
(i) $X_n$ converges weakly to $X$.
(ii) For any $k \in \mathbb{Z}$, we have
$$\lim_{n\to\infty} P(X_n = k) = P(X = k).$$
(iii) For any $A \subseteq \mathbb{Z}$, we have
$$\lim_{n\to\infty} P(X_n \in A) = P(X \in A).$$

Solution. (i) $\Rightarrow$ (ii). For each $k \in \mathbb{Z}$, $k \pm 1/2$ are both continuity points of the law of $X$. By the definition of weak convergence, we have
$$P(X_n = k) = P\left(X_n \in (k-\tfrac{1}{2}, k+\tfrac{1}{2}]\right) \to P\left(X \in (k-\tfrac{1}{2}, k+\tfrac{1}{2}]\right) = P(X = k).$$
(ii) $\Rightarrow$ (i). Given any finite interval $(a,b]$, it contains at most finitely many integer points. Therefore,
$$P(X_n \in (a,b]) = \sum_{k\in(a,b]} P(X_n = k) \to \sum_{k\in(a,b]} P(X = k) = P(X \in (a,b]).$$
This implies the weak convergence.


(iii) =⇒ (ii) is trivial.
(i)+(ii) $\Rightarrow$ (iii). Let $A \subseteq \mathbb{Z}$ be a given subset. Since the sequence $\{X_n\}$ is weakly convergent, we know that it is tight. In particular, given $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that
$$P(|X_n| > N) < \varepsilon \quad \text{for all } n. \qquad (0.1)$$
By enlarging $N$ if necessary, we may assume that (0.1) holds for $X$ as well. Observe that
$$P(X_n \in A) = P(X_n \in A\cap[-N,N]) + P(X_n \in A\cap[-N,N]^c),$$
and
$$P(X \in A) = P(X \in A\cap[-N,N]) + P(X \in A\cap[-N,N]^c).$$
It follows that
$$|P(X_n \in A) - P(X \in A)| \leq |P(X_n \in A\cap[-N,N]) - P(X \in A\cap[-N,N])| + P(|X_n| > N) + P(|X| > N)$$
$$\leq 2\varepsilon + |P(X_n \in A\cap[-N,N]) - P(X \in A\cap[-N,N])|.$$
Note that there are finitely many integer points in $[-N,N]$. By (ii) we have
$$\lim_{n\to\infty} P(X_n \in A\cap[-N,N]) = P(X \in A\cap[-N,N]).$$
Therefore,
$$\limsup_{n\to\infty}|P(X_n \in A) - P(X \in A)| \leq 2\varepsilon.$$
This gives the desired result as $\varepsilon$ is arbitrary.

Week 4 Practice: Solutions

Problem 1. Suppose that $X_n$ converges weakly to a deterministic constant $c$. Show that $X_n \to c$ in probability.
Solution. Let
$$F(x) \triangleq \begin{cases} 0, & x < c; \\ 1, & x \geq c, \end{cases}$$
be the cumulative distribution function of the constant $c$. By assumption, we know that
$$P(X_n \leq x) \to F(x)$$
for any $x \neq c$ (any $x \neq c$ is a continuity point of $F$). Therefore, given $\varepsilon > 0$,
$$P(|X_n - c| > \varepsilon) = P(X_n > c+\varepsilon) + P(X_n < c-\varepsilon) \leq 1 - P(X_n \leq c+\varepsilon) + P(X_n \leq c-\varepsilon) \to 1 - F(c+\varepsilon) + F(c-\varepsilon) = 1-1+0 = 0.$$
This shows that $X_n \to c$ in probability.


Problem 2. (1) Suppose that
$$X_n \to X, \quad Y_n \to Y \quad \text{in probability}.$$
Show that $X_n + Y_n \to X + Y$ in probability.
(2) We say that $X_n$ converges to $X$ in $L^p$ if
$$E[|X_n - X|^p] \to 0.$$
Show that convergence in $L^p$ implies convergence in probability.
(3) Suppose that $X_n$ converges to $X$ in probability. Show that there exists a subsequence $X_{n_k}$ converging to $X$ almost surely.
Solution. (1) Let $\varepsilon > 0$. Note that
$$\left\{|(X_n+Y_n)-(X+Y)| > \varepsilon\right\} \subseteq \left\{|X_n-X| > \tfrac{\varepsilon}{2}\right\} \cup \left\{|Y_n-Y| > \tfrac{\varepsilon}{2}\right\}.$$
Therefore,
$$P\left(|(X_n+Y_n)-(X+Y)| > \varepsilon\right) \leq P\left(|X_n-X| > \tfrac{\varepsilon}{2}\right) + P\left(|Y_n-Y| > \tfrac{\varepsilon}{2}\right).$$
The right hand side tends to zero by assumption. Therefore, $X_n + Y_n$ converges to $X + Y$ in probability.
(2) Suppose that $X_n$ converges to $X$ in $L^p$. Given $\varepsilon > 0$, by Chebyshev's inequality, we have
$$P(|X_n-X| > \varepsilon) \leq \frac{1}{\varepsilon^p} E[|X_n-X|^p] \to 0.$$
Therefore, $X_n \to X$ in probability.
(3) By the definition of convergence in probability (with $\varepsilon = 1$), there exists $n_1$ such that
$$P(|X_{n_1}-X| > 1) < 1.$$
We can use the definition again (with $\varepsilon = 1/2$) to find $n_2 > n_1$ such that
$$P\left(|X_{n_2}-X| > \frac{1}{2}\right) < \frac{1}{2^2}.$$
Inductively, at the $k$-th step (using $\varepsilon = 1/k$ in the definition of convergence in probability), we find $n_k > n_{k-1}$ such that
$$P\left(|X_{n_k}-X| > \frac{1}{k}\right) < \frac{1}{k^2}.$$
We now end up with a subsequence $\{X_{n_k} : k \geq 1\}$, and we claim that $X_{n_k} \to X$ a.s. To see this, observe from the construction that
$$\sum_{k=1}^{\infty} P\left(|X_{n_k}-X| > \frac{1}{k}\right) < \infty.$$
According to the first Borel-Cantelli lemma, we conclude that
$$P\left(|X_{n_k}-X| > \frac{1}{k} \text{ for infinitely many } k\right) = 0.$$
Equivalently, with probability one we have
$$|X_{n_k}-X| \leq \frac{1}{k} \quad \text{when $k$ is large enough}.$$
But this property implies that $X_{n_k} \to X$. Therefore, $X_{n_k}$ converges to $X$ almost surely.

Problem 3. Let $X_n$ ($n \geq 1$) and $X$ be random variables. Suppose that for any $\varepsilon > 0$,
$$\sum_{n=1}^{\infty} P(|X_n-X| > \varepsilon) < \infty.$$
Show that $X_n \to X$ almost surely.

Solution. For each $k \geq 1$, according to the first Borel-Cantelli lemma, we know that
$$P\left(|X_n-X| > \frac{1}{k} \text{ for infinitely many } n\right) = 0.$$
Equivalently, with
$$\Omega_k \triangleq \left\{|X_n-X| \leq \frac{1}{k} \text{ when $n$ is sufficiently large}\right\}$$
we have $P(\Omega_k) = 1$. Let $\Omega \triangleq \cap_{k\geq 1}\Omega_k$. Then $P(\Omega) = 1$. By the definition of convergence, we have
$$\Omega \subseteq \left\{\lim_{n\to\infty} X_n = X\right\}.$$
Therefore, $X_n$ converges to $X$ almost surely.

Problem 4. Let $\{X_n : n \geq 1\}$ be a sequence of independent and identically distributed random variables such that $E[|X_1|] < \infty$.
(i) Show that $\frac{X_n}{n} \to 0$ almost surely.
(ii) If $\mathrm{Var}[X_n] = \infty$, show that with probability one,
$$|X_n| > \sqrt{n} \text{ for infinitely many } n.$$

Solution. (i) It suffices to show that, for any $\varepsilon > 0$,
$$\sum_{n=1}^{\infty} P\left(\left|\frac{X_n}{n}\right| > \varepsilon\right) < \infty$$
(by Problem 3, this implies $X_n/n \to 0$ almost surely). To this end, we first write
$$\sum_{n=1}^{\infty} P\left(\left|\frac{X_n}{n}\right| > \varepsilon\right) = \sum_{n=1}^{\infty} P(|X_n| > n\varepsilon) = \sum_{n=1}^{\infty} P(|X_1| > n\varepsilon),$$
where the last equality follows from the fact that the $X_n$'s are identically distributed. In addition, we have
$$E[|X_1|] = \int_0^{\infty} P(|X_1| > x)\,dx = \sum_{n=1}^{\infty}\int_{(n-1)\varepsilon}^{n\varepsilon} P(|X_1| > x)\,dx \geq \sum_{n=1}^{\infty}\int_{(n-1)\varepsilon}^{n\varepsilon} P(|X_1| > n\varepsilon)\,dx = \varepsilon\cdot\sum_{n=1}^{\infty} P(|X_1| > n\varepsilon).$$
Therefore, if $E[|X_1|] < \infty$, we then know that
$$\sum_{n=1}^{\infty} P(|X_1| > n\varepsilon) < \infty,$$
which gives the desired claim.


(ii) First of all, the assumption implies that $E[X_1^2] = \infty$. In addition, we have
$$E[X_1^2] = \int_0^{\infty} P(X_1^2 > x)\,dx = \sum_{n=0}^{\infty}\int_n^{n+1} P(X_1^2 > x)\,dx \leq \sum_{n=0}^{\infty} P(X_1^2 > n) \leq 1 + \sum_{n=1}^{\infty} P(X_n^2 > n).$$
It follows that
$$\sum_{n=1}^{\infty} P(X_n^2 > n) = \infty.$$
According to the second Borel-Cantelli lemma (which applies since the $X_n$'s are independent), we conclude that
$$P\left(X_n^2 > n \text{ infinitely often}\right) = 1,$$
or equivalently, with probability one $|X_n| > \sqrt{n}$ for infinitely many $n$.

Week 5 Practice: Solutions

Problem 1. Let $f : [0,1] \to \mathbb{R}$ be a continuous function. Show that
$$\lim_{n\to\infty}\int_0^1\cdots\int_0^1 f\left(\frac{x_1+\cdots+x_n}{n}\right)dx_1\cdots dx_n = f\left(\frac{1}{2}\right)$$
and
$$\lim_{n\to\infty}\int_0^1\cdots\int_0^1 f\left((x_1\cdots x_n)^{1/n}\right)dx_1\cdots dx_n = f\left(\frac{1}{e}\right).$$
Solution. We let $\{X_n : n \geq 1\}$ be a sequence of independent and identically distributed random variables, each following the uniform distribution over $[0,1]$.
(1) We have
$$\int_0^1\cdots\int_0^1 f\left(\frac{x_1+\cdots+x_n}{n}\right)dx_1\cdots dx_n = E\left[f\left(\frac{X_1+\cdots+X_n}{n}\right)\right].$$
By the strong law of large numbers, we have
$$\frac{X_1+\cdots+X_n}{n} \to E[X_1] = \frac{1}{2} \quad \text{a.s.}$$
Therefore, since $f$ is bounded and continuous on $[0,1]$, dominated convergence gives
$$\lim_{n\to\infty}\int_0^1\cdots\int_0^1 f\left(\frac{x_1+\cdots+x_n}{n}\right)dx_1\cdots dx_n = \lim_{n\to\infty} E\left[f\left(\frac{X_1+\cdots+X_n}{n}\right)\right] = f\left(\frac{1}{2}\right).$$
(2) To prove the second assertion, by using
$$(x_1\cdots x_n)^{1/n} = \exp\left(\frac{1}{n}(\log x_1+\cdots+\log x_n)\right),$$
we can write
$$\int_0^1\cdots\int_0^1 f\left((x_1\cdots x_n)^{1/n}\right)dx_1\cdots dx_n = E\left[f\left(\exp\left(\frac{1}{n}(\log X_1+\cdots+\log X_n)\right)\right)\right].$$
By applying the strong law of large numbers to the sequence $\{\log X_n\}$, we have
$$\frac{\log X_1+\cdots+\log X_n}{n} \to E[\log X_1] = \int_0^1 \log x\,dx = -1 \quad \text{a.s.}$$
Therefore, again by dominated convergence,
$$\int_0^1\cdots\int_0^1 f\left((x_1\cdots x_n)^{1/n}\right)dx_1\cdots dx_n \to f(\exp(-1)).$$
Problem 2. (i) Let $\{X_n : n \geq 1\}$ be an i.i.d. sequence of Bernoulli random variables with parameter $1/2$. If $a_n$ are positive real numbers such that $\sum_{n=1}^{\infty} a_n < \infty$, show that $\sum_{n=1}^{\infty} a_n X_n$ is convergent almost surely.
(ii) Consider an independent sequence $\{X_n : n \geq 1\}$ where
$$P(X_n = 0) = 1 - \frac{2}{n^2}, \quad P(X_n = n) = P(X_n = -n) = \frac{1}{n^2}.$$
Show that $\sum_{n=1}^{\infty} X_n$ is convergent almost surely. This indicates that the converse of Kolmogorov's two-series theorem may not be true in general.

Solution. (i) First of all, we have
$$\sum_{n=1}^{\infty} E[a_nX_n] = \frac{1}{2}\sum_{n=1}^{\infty} a_n < \infty.$$
In addition, since $\sum_{n=1}^{\infty} a_n < \infty$, we know that $a_n \to 0$ and thus the sequence $\{a_n\}$ is bounded, say with some $M > 0$ we have $a_n \leq M$ for all $n$. It follows that
$$\sum_{n=1}^{\infty} a_n^2 \leq M\sum_{n=1}^{\infty} a_n < \infty.$$
This clearly implies that
$$\sum_{n=1}^{\infty} \mathrm{Var}[a_nX_n] < \infty.$$
By Kolmogorov's two-series theorem, we conclude that $\sum_{n=1}^{\infty} a_nX_n$ is convergent a.s.
(ii) The key observation is that
$$\sum_{n=1}^{\infty} P(X_n \neq 0) = \sum_{n=1}^{\infty}\frac{2}{n^2} < \infty.$$
According to the first Borel-Cantelli lemma, we have
$$P(X_n \neq 0 \text{ for infinitely many } n) = 0.$$
Equivalently, with probability one, $X_n = 0$ for all sufficiently large $n$. This implies that, with probability one, $\sum_{n=1}^{\infty} X_n$ is convergent. Note that
$$\sum_{n=1}^{\infty} E[X_n^2] = \sum_{n=1}^{\infty} n^2\times\frac{2}{n^2} = \infty,$$
and thus the condition in Kolmogorov's two-series theorem fails in this example.
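A path-level sketch of Part (ii) (illustration only; indices start at $n = 2$ so that the probabilities are well defined):

```python
# Each sample path of (X_n) has only finitely many nonzero terms, so the
# partial sums are eventually constant.
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(2, 100_002)
u = rng.random(n.size)
x = np.where(u < 1 / n**2, n, np.where(u < 2 / n**2, -n, 0))
print(np.count_nonzero(x))   # typically only a handful of nonzero terms
print(np.cumsum(x)[-5:])     # the tail of the partial-sum path is constant
```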

Problem 3. Follow the steps below to show that the second Borel-Cantelli lemma holds true when total independence is replaced by pairwise independence. Let $\{A_n : n \geq 1\}$ be a sequence of pairwise independent events (i.e. $P(A_n\cap A_m) = P(A_n)\cdot P(A_m)$ for any $n \neq m$). Suppose that $\sum_{n=1}^{\infty} P(A_n) = \infty$. We wish to show that $P(A_n \text{ infinitely often}) = 1$.
(i) Consider the indicator random variable $X_n \triangleq \mathbf{1}_{A_n}$ and define $S_n \triangleq X_1+\cdots+X_n$. Observe that the condition is equivalent to "$E[S_n] \to \infty$", and the claim is equivalent to "$S_n \to \infty$ a.s."
(ii) Let $A > 0$ be a fixed number. Use Chebyshev's inequality to show that
$$P\left(|S_n - E[S_n]| \leq A\sigma(S_n)\right) \geq 1 - \frac{1}{A^2} \quad \text{for all } n \geq 1, \qquad (E1)$$
where $\sigma(S_n) \triangleq \sqrt{\mathrm{Var}[S_n]}$.
(iii) Show that
$$\mathrm{Var}[S_n] = \sum_{k=1}^{n}\mathrm{Var}[X_k],$$
and use this to deduce that
$$\frac{\sigma(S_n)}{E[S_n]} \to 0 \quad \text{as } n\to\infty. \qquad (E2)$$
(iv) Use (E1) and (E2) to deduce that, for given $A > 0$, there exists $N \geq 1$ depending on $A$, such that
$$P\left(S_n \geq \frac{1}{2}E[S_n]\right) \geq 1 - \frac{1}{A^2}$$
for all $n \geq N$. Conclude the proof from this point.

Solution. (i) Since $E[X_n] = P(A_n)$, we have
$$E[S_n] = \sum_{k=1}^{n} P(A_k),$$
and thus
$$\sum_{n=1}^{\infty} P(A_n) = \infty \iff E[S_n] \to \infty.$$
In addition, since $X_n = 1$ if and only if $A_n$ happens, we see that $A_n$ happens for infinitely many $n$ if and only if there are infinitely many 1's among the $X_n$'s, i.e. $\sum_{n=1}^{\infty} X_n = \infty$, or equivalently $S_n \to \infty$.
(ii) According to Chebyshev's inequality,
$$P\left(|S_n - E[S_n]| > A\sigma(S_n)\right) \leq \frac{E\left[|S_n - E[S_n]|^2\right]}{A^2\,\mathrm{Var}[S_n]} = \frac{1}{A^2}.$$
The claim follows by taking complements.
(iii) Define $\bar{X}_n \triangleq X_n - E[X_n]$. Then we have
$$\mathrm{Var}[S_n] = E\Big[\Big(\sum_{k=1}^n \bar{X}_k\Big)^2\Big] = \sum_{k=1}^n E[\bar{X}_k^2] + \sum_{j\neq k} E[\bar{X}_j\bar{X}_k].$$
By pairwise independence, $E[\bar{X}_j\bar{X}_k] = E[\bar{X}_j]\cdot E[\bar{X}_k] = 0$ for $j \neq k$, and therefore
$$\mathrm{Var}[S_n] = \sum_{k=1}^n \mathrm{Var}[X_k].$$
Since $X_k$ is the indicator random variable of $A_k$, we see that
$$\mathrm{Var}[X_k] = E[X_k^2] - (E[X_k])^2 = P(A_k) - P(A_k)^2 \leq P(A_k).$$
Therefore,
$$\mathrm{Var}[S_n] \leq \sum_{k=1}^n P(A_k) = E[S_n].$$
By the assumption that $E[S_n] \to \infty$, we have
$$\frac{\sigma(S_n)}{E[S_n]} \leq \frac{\sqrt{E[S_n]}}{E[S_n]} = \frac{1}{\sqrt{E[S_n]}} \to 0.$$
(iv) By the triangle inequality, we have
$$\left\{|S_n - E[S_n]| \leq A\sigma(S_n)\right\} \subseteq \left\{S_n \geq E[S_n] - A\sigma(S_n)\right\}.$$
According to (E2), we can find $N$ depending on $A$, such that for any $n \geq N$,
$$E[S_n] - A\sigma(S_n) \geq \frac{1}{2}E[S_n].$$
It follows from (E1) that
$$P\left(S_n \geq \frac{1}{2}E[S_n]\right) \geq 1 - \frac{1}{A^2} \qquad (E3)$$
when $n \geq N$. To finish the proof, let us write
$$S \triangleq \lim_{n\to\infty} S_n = \sum_{n=1}^{\infty} X_n.$$
Since $S \geq S_n$ for all $n$, we see from (E3) that
$$P\left(S \geq \frac{1}{2}E[S_n]\right) \geq 1 - \frac{1}{A^2} \quad \text{for all } n \geq N.$$
In particular, by letting $n\to\infty$ we arrive at
$$P(S = \infty) \geq 1 - \frac{1}{A^2}.$$
The desired result follows since $A$ is arbitrary.

Week 6 Practice: Solutions

Problem 1. (i) Compute the characteristic function of a Gamma distribution $\gamma(n, \alpha)$ where $n \in \mathbb{N}$ and $\alpha > 0$.
(ii) Let $X, Y$ be independent standard normal random variables. Compute the characteristic function of $X^2 + Y^2$. What is the corresponding distribution?
[Hint: To avoid change of variables over $\mathbb{C}$ and complex integration, a general strategy for computing a characteristic function is to first compute the moment generating function, and then to apply the substitution $t \mapsto it$.]

Solution. To avoid change of variables over $\mathbb{C}$ and complex integration, our general strategy for computing the characteristic function is to first compute the moment generating function, and then to apply the substitution $t \mapsto it$. This is easily seen from the following relation:
$$\text{m.g.f. } E[e^{tX}] \ \xrightarrow{\ t\mapsto it\ }\ \text{ch.f. } E[e^{itX}].$$
(i) We first consider the exponential distribution (i.e. $n = 1$). The moment generating function of $\exp(\alpha)$ is given by
$$M(t) = \int_0^{\infty} e^{tx}\cdot\alpha e^{-\alpha x}\,dx = \frac{\alpha}{\alpha - t}.$$
$M(t)$ is well-defined when $t < \alpha$. Changing $t$ to $it$, we obtain the characteristic function
$$f(t) = \frac{\alpha}{\alpha - it}.$$
Since $\gamma(n,\alpha)$ is the sum of $n$ independent copies of $\exp(\alpha)$, its characteristic function is given by
$$g(t) = f(t)^n = \left(\frac{\alpha}{\alpha - it}\right)^n.$$

(ii) The moment generating function of $X^2$ is given by
$$M_{X^2}(t) = E[e^{tX^2}] = \int_{-\infty}^{\infty} e^{tx^2}\cdot\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{x^2}{2(1-2t)^{-1}}}\,dx.$$
The above integral is convergent if and only if $t < 1/2$. In this case,
$$M_{X^2}(t) = \frac{(1-2t)^{-1/2}}{\sqrt{2\pi(1-2t)^{-1}}}\int_{-\infty}^{\infty} e^{-\frac{x^2}{2(1-2t)^{-1}}}\,dx = (1-2t)^{-1/2}, \quad t < 1/2.$$
Therefore, by changing $t$ to $it$, the characteristic function of $X^2$ is given by
$$f_{X^2}(t) = (1-2it)^{-1/2}.$$
It follows that
$$f_{X^2+Y^2}(t) = (1-2it)^{-1}.$$
This is the characteristic function of the exponential distribution with parameter $1/2$.
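A quick simulation check of Part (ii) (illustrative):

```python
# X^2 + Y^2 for independent standard normals versus the exp(1/2) cdf.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = rng.standard_normal(200_000)
s = x**2 + y**2
t = 3.0
print(np.mean(s <= t), 1 - np.exp(-t / 2))  # empirical cdf vs 1 - e^{-t/2}
```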
Problem 2. Let $f(t)$, $g(t)$ be characteristic functions of some random variables. Show that each of the following functions is again a characteristic function (of some random variable/probability measure).
(i) $h(t) = |f(t)|^2$;
(ii) $h(t) = \lambda f(t) + (1-\lambda)g(t)$ where $\lambda \in (0,1)$;
(iii) $h(t) = \int_0^1 f(ut)\,du$.
Solution. (i) Let $X, Y$ be two independent random variables with the same characteristic function $f$. Then the characteristic function of $X - Y$ is given by
$$f_{X-Y}(t) = E[e^{it(X-Y)}] = E[e^{itX}]\cdot E[e^{-itY}] = f(t)f(-t) = f(t)\overline{f(t)} = |f(t)|^2.$$
(ii) Let $f(t)$ and $g(t)$ be the characteristic functions of probability measures $\mu$ and $\nu$ respectively. In other words,
$$f(t) = \int_{\mathbb{R}} e^{itx}\,\mu(dx), \quad g(t) = \int_{\mathbb{R}} e^{itx}\,\nu(dx).$$
Define a new measure $\tau$ by
$$\tau(A) \triangleq \lambda\mu(A) + (1-\lambda)\nu(A), \quad A \in \mathcal{B}(\mathbb{R}).$$
Then $\tau$ is a probability measure, and the characteristic function of $\tau$ is given by
$$f_{\tau}(t) = \int_{\mathbb{R}} e^{itx}\,d\tau = \int_{\mathbb{R}} e^{itx}\left(\lambda\,d\mu + (1-\lambda)\,d\nu\right) = \lambda f(t) + (1-\lambda)g(t).$$
More generally, one can show in the same way that, if $f_1, \cdots, f_n$ are characteristic functions and $\lambda_1, \cdots, \lambda_n$ are positive numbers such that $\lambda_1+\cdots+\lambda_n = 1$, then
$$\lambda_1 f_1 + \cdots + \lambda_n f_n$$
is also a characteristic function.
(iii) Once we have understood Part (ii) properly, this problem is just a "continuous version" of that part. The crucial point is that, for each fixed $u$, the function $t \mapsto f(ut)$ is a characteristic function, and $\int_0^1 du = 1$.
To be more precise, let $X$ be a random variable whose characteristic function is $f(t)$. Then for each $u$, the function $f_u(t) \triangleq f(ut)$ is the characteristic function of the random variable $uX$. If we denote $\mu_u$ as the law of $uX$, then the measure
$$\mu \triangleq \int_0^1 \mu_u\,du$$
is a probability measure on $\mathbb{R}$. The characteristic function of $\mu$ is given by
$$f_{\mu}(t) = \int_{\mathbb{R}} e^{itx}\,\mu(dx) = \int_0^1\int_{\mathbb{R}} e^{itx}\,\mu_u(dx)\,du = \int_0^1 f_u(t)\,du = \int_0^1 f(ut)\,du.$$

Problem 3. Consider an i.i.d. sequence $\{X_n : n \geq 1\}$ in which the common distribution is given by
$$P(X_1 = k) = P(X_1 = -k) = \frac{c}{k^2\log k}, \quad k = 2, 3, \cdots.$$
Here $c > 0$ is the normalising constant so that the above expression defines a probability mass function. Let $f_n(t)$ be the characteristic function of $\frac{S_n}{n}$ where $S_n \triangleq X_1+\cdots+X_n$. Show that
$$\lim_{n\to\infty} f_n(t) = 1$$
for every $t \in \mathbb{R}$.
[Hint: Write the characteristic function as
$$f_n(t) = \exp(n\log(1-a_n))$$
and show that $a_n = o(1/n)$. Note that $a_n$ is given by a sum $\sum_{k=2}^{\infty}$. Split it into $\sum_{k=2}^{n}$ and $\sum_{k=n}^{\infty}$.]

Remark. Note that $f(t) = 1$ is the characteristic function of $X = 0$. As we will see in future lectures, this implies that $\frac{S_n}{n} \to 0$ weakly (equivalently, in probability since the limiting random variable is deterministic). In particular, $\{X_n\}$ satisfies a weak law of large numbers. On the other hand, since $E[|X_1|] = \infty$ in this example, from the lecture we have
$$\limsup_{n\to\infty}\frac{|S_n|}{n} = \infty \quad \text{a.s.} \qquad (E)$$
Therefore, we do not have a strong law $\frac{S_n}{n} \to 0$ a.s. For this example, the "limsup" in (E) cannot be replaced by the "lim", since we know that there is a subsequence $\frac{S_{n_k}}{n_k} \to 0$ almost surely (a consequence of convergence in probability; see Week 4 Practice, Problem 2 (3)).
Solution. The characteristic function of $X_1$ is given by
$$f_1(t) = \sum_{k\in\mathbb{Z},\,|k|\geq 2} e^{ikt}\cdot\frac{c}{k^2\log|k|} = \sum_{k=2}^{\infty}\frac{2c\cos kt}{k^2\log k}.$$
Since the $X_n$'s are i.i.d., the characteristic function of $\frac{S_n}{n}$ is given by
$$f_n(t) = \Big(\sum_{k=2}^{\infty}\frac{2c\cos\frac{kt}{n}}{k^2\log k}\Big)^n = \exp\Big(n\log\sum_{k=2}^{\infty}\frac{2c\cos\frac{kt}{n}}{k^2\log k}\Big) = \exp\Big(n\log\Big(1-\sum_{k=2}^{\infty}\frac{2c\big(1-\cos\frac{kt}{n}\big)}{k^2\log k}\Big)\Big),$$
where we have used the relation
$$\sum_{k=2}^{\infty}\frac{2c}{k^2\log k} = 1.$$
Let us denote
$$a_n \triangleq \sum_{k=2}^{\infty}\frac{1-\cos\frac{kt}{n}}{k^2\log k} = \sum_{k=2}^{\infty}\frac{2\sin^2\frac{kt}{2n}}{k^2\log k}.$$
If we can show that $a_n = o(1/n)$, i.e. $na_n \to 0$ (for each fixed $t$), then the desired result follows. In fact, this will imply
$$n\log(1-2ca_n) \approx -2cna_n \to 0$$
and thus $f_n(t) \to 1$.
To show that $a_n = o(1/n)$, we write
$$a_n = \sum_{k=2}^{\infty}\frac{2\sin^2\frac{kt}{2n}}{k^2\log k} = \sum_{k=2}^{n}\frac{2\sin^2\frac{kt}{2n}}{k^2\log k} + \sum_{k=n}^{\infty}\frac{2\sin^2\frac{kt}{2n}}{k^2\log k}. \qquad (E1)$$
The first term is estimated as follows:
$$\sum_{k=2}^{n}\frac{2\sin^2\frac{kt}{2n}}{k^2\log k} \leq \sum_{k=2}^{n} 2\cdot\Big(\frac{kt}{2n}\Big)^2\cdot\frac{1}{k^2\log k} \quad (|\sin x| \leq |x|) \quad = \frac{t^2}{2}\cdot\frac{1}{n^2}\sum_{k=2}^{n}\frac{1}{\log k}. \qquad (E2)$$
Now recall from real analysis that
$$c_n \to 0 \implies \frac{1}{n}\sum_{k=1}^{n} c_k \to 0.$$
In particular, we have
$$\frac{1}{n}\sum_{k=2}^{n}\frac{1}{\log k} \to 0,$$
and thus the right hand side of (E2) is of $o(1/n)$. For the second term in (E1), we have
$$\sum_{k=n}^{\infty}\frac{2\sin^2\frac{kt}{2n}}{k^2\log k} \leq \frac{2}{\log n}\sum_{k=n}^{\infty}\frac{1}{k^2}.$$
Note that
$$\sum_{k=n}^{\infty}\frac{1}{k^2} \leq \sum_{k=n}^{\infty}\frac{1}{k(k-1)} = \sum_{k=n}^{\infty}\Big(\frac{1}{k-1}-\frac{1}{k}\Big) = \frac{1}{n-1}.$$
It follows that
$$\sum_{k=n}^{\infty}\frac{2\sin^2\frac{kt}{2n}}{k^2\log k} \leq \frac{2}{(n-1)\log n} = o\Big(\frac{1}{n}\Big).$$
Therefore, we conclude that $a_n = o(1/n)$.

Week 7 Practice: Solutions

Problem 1. In this problem, you can look up the explicit formulae for characteristic functions without deriving them.
(i) Let $X_n$ be a binomial random variable with parameters $n$ and $p_n$. Suppose that $np_n \to \lambda > 0$ as $n\to\infty$. Use the method of characteristic functions to show that $X_n$ converges weakly to the Poisson distribution with parameter $\lambda$.
[Hint: You may use the approximation $\log(1+z) = z + o(|z|)$ when $|z|$ is small, even in the case when $z$ is a complex variable.]
(ii) Let $X_\lambda$ be a Poisson random variable with parameter $\lambda > 0$. Use the method of characteristic functions to show that $(X_\lambda-\lambda)/\sqrt{\lambda}$ converges weakly to the standard normal distribution as $\lambda\to\infty$. How would you interpret this property heuristically?

Solution. (1) The characteristic function of $X_n$ is given by
$$f_n(t) = (1-p_n+p_ne^{it})^n = \left(1-p_n(1-e^{it})\right)^n = \exp\left(n\log(1-p_n(1-e^{it}))\right).$$
Note that
$$\log(1-p_n(1-e^{it})) = -p_n(1-e^{it}) + o(p_n).$$
Therefore,
$$n\log(1-p_n(1-e^{it})) = -np_n(1-e^{it}) + n\,o(p_n) = -np_n(1-e^{it}) + np_n\cdot\frac{o(p_n)}{p_n} \to -\lambda(1-e^{it}).$$
It follows that
$$f_n(t) \to \exp(-\lambda(1-e^{it})) \quad \text{as } n\to\infty,$$
where the right hand side is exactly the characteristic function of the Poisson distribution with parameter $\lambda$.
(2) By the linearity property, the characteristic function of $(X_\lambda-\lambda)/\sqrt{\lambda}$ is given by
$$f_\lambda(t) = \exp\left(\lambda\left(e^{it/\sqrt{\lambda}}-1\right) - i\sqrt{\lambda}\,t\right) = \exp\left(\lambda\left(e^{it/\sqrt{\lambda}} - 1 - \frac{it}{\sqrt{\lambda}}\right)\right).$$
Note that
$$e^{it/\sqrt{\lambda}} - 1 - \frac{it}{\sqrt{\lambda}} = -\frac{t^2}{2\lambda} + o\left(\frac{1}{\lambda}\right) \quad \text{as } \lambda\to\infty.$$
Therefore,
$$f_\lambda(t) \to e^{-t^2/2} \quad \text{as } \lambda\to\infty,$$
where the right hand side is the characteristic function of the standard normal distribution.
One way to interpret this result is the following. We assume that $\lambda$ is a positive integer. Recall that the sum of independent Poisson random variables is again a Poisson random variable whose parameter is the sum of the individual parameters. Therefore, we may think of
$$X_\lambda = X_1 + X_2 + \cdots + X_\lambda,$$
where $\{X_n : n \geq 1\}$ is an i.i.d. sequence of Poisson(1) random variables. Since
$$E[X_\lambda] = \lambda, \quad \mathrm{Var}[X_\lambda] = \lambda,$$
the result
$$\frac{X_\lambda-\lambda}{\sqrt{\lambda}} \to N(0,1) \quad \text{weakly}$$
is precisely a central limit theorem for the fluctuation of the partial sum.
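Part (ii) is easy to visualise numerically (an illustrative sketch; one evaluation point of the cdf is enough for the comparison):

```python
# Standardised Poisson(lambda) samples versus the standard normal cdf.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
lam = 400.0
z = (rng.poisson(lam, size=200_000) - lam) / np.sqrt(lam)
x = 1.0
phi = 0.5 * (1 + erf(x / sqrt(2)))   # standard normal cdf at x, ~0.8413
print(np.mean(z <= x), phi)
```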
Problem 2. Let $f(t)$ be a characteristic function of some random variable, and let $\lambda > 0$.
(i) For each integer $n > \lambda$, explain why
$$1 + \frac{\lambda(f(t)-1)}{n}$$
is a characteristic function.
(ii) By using Part (i) and the Lévy-Cramér theorem, show that $e^{\lambda(f(t)-1)}$ is a characteristic function.
(iii) Let $\{X_n : n \geq 1\}$ be an i.i.d. sequence with characteristic function $f(t)$, and let $N$ be a Poisson($\lambda$)-distributed random variable that is independent of $\{X_n : n \geq 1\}$. Can you use them to construct a random variable whose characteristic function is $e^{\lambda(f(t)-1)}$?
Solution. (i) We can write
$$1 + \frac{\lambda(f(t)-1)}{n} = \left(1-\frac{\lambda}{n}\right)\cdot 1 + \frac{\lambda}{n}\cdot f(t).$$
Note that the constant function $1$ is a characteristic function (of $X = 0$). When $n > \lambda$, the above expression is a convex combination of $1$ and $f(t)$. According to Week 6 Practice Problem 2 (ii), it is again a characteristic function.
(ii) Using Part (i), we also know that $\left(1+\frac{\lambda(f(t)-1)}{n}\right)^n$ is a characteristic function. In addition, we know that
$$e^{\lambda(f(t)-1)} = \lim_{n\to\infty}\left(1+\frac{\lambda(f(t)-1)}{n}\right)^n.$$
The limiting function $e^{\lambda(f(t)-1)}$ is apparently continuous at $t = 0$. According to the Lévy-Cramér theorem, we know that it must be a characteristic function.
(iii) Consider the random variable
$$\omega \mapsto S(\omega) \triangleq \sum_{k=1}^{N(\omega)} X_k(\omega),$$
where we define $S(\omega) \triangleq 0$ if $N(\omega) = 0$. Then the characteristic function of $S$ is given by
$$g(t) = E[e^{itS}] = E\left[e^{it\sum_{k=1}^N X_k}\right] = \sum_{n=0}^{\infty} E\left[e^{it\sum_{k=1}^n X_k}\,\Big|\,N=n\right]P(N=n).$$
Since $N$ and $\{X_n\}$ are independent, this equals
$$\sum_{n=0}^{\infty} E\left[e^{it\sum_{k=1}^n X_k}\right]P(N=n) = \sum_{n=0}^{\infty} f(t)^n\cdot\frac{e^{-\lambda}\lambda^n}{n!} = e^{-\lambda}\cdot e^{\lambda f(t)} = e^{\lambda(f(t)-1)}.$$
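The construction in Part (iii), a compound Poisson sum, can be checked against the formula $e^{\lambda(f(t)-1)}$ by simulation. In the sketch below the $X_k$ are taken to be standard normal, so $f(t) = e^{-t^2/2}$; this particular choice is ours, purely for illustration:

```python
# Empirical characteristic function of S = X_1 + ... + X_N, N ~ Poisson(lam),
# compared with exp(lam * (f(t) - 1)) for f(t) = exp(-t^2/2).
import numpy as np

rng = np.random.default_rng(0)
lam, t, trials = 2.0, 0.7, 50_000
counts = rng.poisson(lam, size=trials)
s = np.array([rng.standard_normal(k).sum() for k in counts])
print(np.mean(np.exp(1j * t * s)))            # empirical E[e^{itS}]
print(np.exp(lam * (np.exp(-t**2 / 2) - 1)))  # e^{lam (f(t) - 1)}
```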
Problem 3. Recall that the standard Cauchy distribution has density
$$\rho(x) = \frac{1}{\pi(1+x^2)}, \quad x\in\mathbb{R}.$$
How would you show that its characteristic function is given by $f(t) = e^{-|t|}$ without using any complex integration?

Solution. We first reverse the problem by regarding $e^{-|t|}$ as a density and computing its characteristic function. To be more precise, the function
$$g(t) \triangleq \frac{1}{2}e^{-|t|}, \quad t\in\mathbb{R},$$
is a probability density function. Its characteristic function is given by
$$\psi(x) = \int_{\mathbb{R}} e^{ixt}g(t)\,dt = \frac{1}{2}\int_{\mathbb{R}}(\cos xt + i\sin xt)e^{-|t|}\,dt = \int_0^{\infty} e^{-t}\cos xt\,dt,$$
where the last equality follows from the fact that $\sin$ is odd and $\cos$ is even. Using integration by parts twice, we have
$$\psi(x) = -\int_0^{\infty}\cos xt\,d(e^{-t}) = -\Big[e^{-t}\cos xt\Big]_0^{\infty} - x\int_0^{\infty} e^{-t}\sin xt\,dt = 1 - x\int_0^{\infty} e^{-t}\sin xt\,dt$$
$$= 1 + x\int_0^{\infty}\sin xt\,d(e^{-t}) = 1 + x\Big(\Big[e^{-t}\sin xt\Big]_0^{\infty} - x\int_0^{\infty} e^{-t}\cos xt\,dt\Big) = 1 - x^2\int_0^{\infty} e^{-t}\cos xt\,dt = 1 - x^2\psi(x).$$
Therefore,
$$\psi(x) = \frac{1}{1+x^2}$$
is the characteristic function of $g$. Note that $\psi$ is integrable on $\mathbb{R}$. Therefore, by the inversion formula, we have
$$\frac{1}{2}e^{-|t|} = g(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\psi(x)\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\cdot\frac{1}{1+x^2}\,dx.$$
Changing $t$ to $-t$ in the above expression, we obtain
$$e^{-|t|} = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{itx}}{1+x^2}\,dx.$$
The right hand side is the characteristic function of the Cauchy distribution by definition.

Remark. There is a very clever observation by a student which simplifies the calculation a lot. The p.d.f. $\frac{1}{2}e^{-|t|}$ corresponds to the distribution of $X - Y$ where $X, Y$ are independent $\exp(1)$-distributed random variables. Therefore, its characteristic function is given by
$$\frac{1}{1-it}\cdot\frac{1}{1+it} = \frac{1}{1+t^2}.$$

Week 8 Practice: Solutions

Problem 1. (1) Suppose that $X$ and $Y$ are independent standard Cauchy random variables. What is the probability density function of $X + Y$?
(2) Find an example to show that $f_{X+Y}(t) = f_X(t)f_Y(t)$ (characteristic functions) does not imply that $X, Y$ are independent.
[Hint: Consider the Cauchy distribution.]
Solution. (1) Recall that the characteristic function of $X$ is given by $f_X(t) = e^{-|t|}$. Therefore,
$$f_{X+Y}(t) = f_X(t)\cdot f_Y(t) = e^{-2|t|}.$$
This is the same as the distribution of $2X$, whose probability density function is easily seen to be
$$\rho(x) = \frac{1}{\pi}\cdot\frac{2}{4+x^2}, \quad x\in\mathbb{R}.$$
(2) Let $X$ be a standard Cauchy random variable. Then
$$f_{2X}(t) = e^{-2|t|} = f_X(t)f_X(t).$$
However, a random variable cannot be independent of itself unless it is a deterministic constant (why?).
Problem 2. Let $X_n \stackrel{d}{=} N(a_n, \sigma_n^2)$ with $a_n\in\mathbb{R}$ and $\sigma_n > 0$. Suppose that $X_n$ converges weakly to some random variable $X$. Show that
$$a_n \to a, \quad \sigma_n \to \sigma$$
for some $a\in\mathbb{R}$ and $\sigma \geq 0$. In addition, $X \stackrel{d}{=} N(a, \sigma^2)$. This result tells us that normal distributions are stable under weak convergence.
Solution. Since $X_n$ is weakly convergent, we know that $\{X_n\}$ is tight. Therefore, the real sequences $\{a_n\}$ and $\{\sigma_n\}$ are both bounded (from Assignment 1, Problem 1). By the Bolzano-Weierstrass theorem in real analysis, we can find a subsequence
$$a_{n_k} \to a, \quad \sigma_{n_k} \to \sigma$$
with some limit points $a$ and $\sigma$. Since the characteristic function of $X_{n_k}$ is given by
$$f_{n_k}(t) = e^{ita_{n_k} - \frac{1}{2}\sigma_{n_k}^2t^2},$$
we have
$$f_{n_k}(t) \to e^{ita - \frac{1}{2}\sigma^2t^2}.$$
On the other hand, by assumption we have $X_n \to X$ weakly (so the same is true for $X_{n_k}$). Therefore, the characteristic function of $X$ must be equal to $e^{ita-\frac{1}{2}\sigma^2t^2}$, or equivalently, $X \stackrel{d}{=} N(a, \sigma^2)$. The above argument also shows that there cannot be other limit points for the sequences $a_n$ and $\sigma_n$ apart from $a$ and $\sigma$. Indeed, if $a'$ and $\sigma'^2$ are different limit points, by the same reason as before, we have $X \stackrel{d}{=} N(a', \sigma'^2)$, and thus $a' = a$, $\sigma' = \sigma$. As a consequence, $a_n \to a$ and $\sigma_n \to \sigma$.
Problem 3. Let $X$ be a random variable with characteristic function $f(t)$.
(1) Suppose that $f(2\pi) = 1$. Show that with probability one, $X$ takes values in the set of integers.
[Hint: Look at the real part of the equation $f(2\pi) = 1$.]
(2) Suppose that $X$ takes integer values only. Show that
$$P(X = k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-ikt}\,dt$$
for each $k\in\mathbb{Z}$.
(3) By using the previous two parts, show that $|\cos t|$ cannot be a characteristic function. Note that $\cos t$ is a characteristic function. This problem shows that the modulus of a characteristic function need not be a characteristic function.
Solution. (1) Recall that
$$f(t) = E[e^{itX}] = E[\cos tX] + iE[\sin tX].$$
The assumption $f(2\pi) = 1$ gives
$$1 = E[\cos 2\pi X] + iE[\sin 2\pi X].$$
By looking at the real part, we obtain
$$E[1-\cos 2\pi X] = 0.$$
Since $1-\cos 2\pi X \geq 0$, we conclude that with probability one, $1-\cos 2\pi X = 0$, or equivalently, $X\in\mathbb{Z}$ ($\cos u = 1$ if and only if $u\in 2\pi\mathbb{Z}$).

(2) Since $X$ is integer valued, we have
$$f(t) = E[e^{itX}] = \sum_{n\in\mathbb{Z}} e^{int}P(X=n).$$
For given $k\in\mathbb{Z}$, let us multiply by $e^{-ikt}$ on both sides and then integrate over $[-\pi,\pi]$. Note that
$$\int_{-\pi}^{\pi} e^{int}\cdot e^{-ikt}\,dt = \int_{-\pi}^{\pi} e^{i(n-k)t}\,dt = \begin{cases} 0, & \text{if } k\neq n; \\ 2\pi, & \text{if } k=n. \end{cases}$$
Therefore,
$$\int_{-\pi}^{\pi} f(t)e^{-ikt}\,dt = \sum_{n\in\mathbb{Z}} P(X=n)\cdot\int_{-\pi}^{\pi} e^{int}\cdot e^{-ikt}\,dt = 2\pi P(X=k).$$
(3) Suppose on the contrary that $|\cos t|$ is a characteristic function of some random variable $X$. Since $|\cos 2\pi| = 1$, we know that $X$ must be integer valued. Therefore, we can use the formula in Part (2). In particular, we have
$$2\pi P(X=k) = \int_{-\pi}^{\pi}|\cos t|\cdot e^{-ikt}\,dt = 2\int_0^{\pi}|\cos t|\cdot\cos kt\,dt = 2\Big(\int_0^{\pi/2}\cos t\cdot\cos kt\,dt - \int_{\pi/2}^{\pi}\cos t\cdot\cos kt\,dt\Big)$$
$$= \int_0^{\pi/2}\big(\cos(k+1)t + \cos(k-1)t\big)dt - \int_{\pi/2}^{\pi}\big(\cos(k+1)t + \cos(k-1)t\big)dt = \frac{2\sin\frac{(k+1)\pi}{2}}{k+1} + \frac{2\sin\frac{(k-1)\pi}{2}}{k-1}.$$
For $k = 4$, the right hand side equals
$$\frac{2}{5} + \frac{(-2)}{3} = -\frac{4}{15} < 0,$$
which is absurd. Therefore, $|\cos t|$ cannot be a characteristic function.
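The negative value at $k = 4$ can also be confirmed numerically (a one-line Riemann-sum sketch, for illustration):

```python
# Approximate the integral of |cos t| * cos(4t) over [-pi, pi]; the sine
# part vanishes by symmetry. The result should be about -4/15.
import numpy as np

t = np.linspace(-np.pi, np.pi, 2_000_001)
val = np.mean(np.abs(np.cos(t)) * np.cos(4 * t)) * 2 * np.pi
print(val, -4 / 15)   # both approximately -0.2667 < 0
```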
Week 9 Practice: Solutions

The purpose of this practice class is to work out a proof of Lyapunov's central limit theorem by using the method of characteristic functions. This is a bit more sophisticated than the i.i.d. case, but after working through these details we will be more comfortable with complex numbers and characteristic functions.

Theorem 1 (Lyapunov's CLT). Let $\{X_n : n \geq 1\}$ be a sequence of independent random variables that have zero mean and finite third moments. Define $S_n \triangleq X_1+\cdots+X_n$, $\Sigma_n \triangleq \sqrt{\mathrm{Var}[S_n]}$ and $\Gamma_n \triangleq \sum_{m=1}^{n} E[|X_m|^3]$. If
$$\frac{\Gamma_n}{\Sigma_n^3} \to 0 \quad \text{as } n\to\infty,$$
then $\frac{S_n}{\Sigma_n} \to N(0,1)$ weakly.
Problem 2. Let $\{\theta_{n,m} : 1 \leq m \leq n,\ n \geq 1\}$ be a double array of complex numbers. Suppose that the following three properties hold:
(i) $\max_{1\leq m\leq n}|\theta_{n,m}| \to 0$ as $n\to\infty$;
(ii) $\sum_{m=1}^{n}|\theta_{n,m}| \leq M$ for all $n$, where $M$ is some positive number that is independent of $n$;
(iii) $\sum_{m=1}^{n}\theta_{n,m} \to \theta$ for some complex number $\theta$ as $n\to\infty$.
Show that
$$\prod_{m=1}^{n}(1+\theta_{n,m}) \to e^{\theta}.$$
[Hint: It is enough to show that
$$\sum_{m=1}^{n}\log(1+\theta_{n,m}) \to \theta \quad \text{as } n\to\infty.$$
To see this, we use the fact that
$$e^{\log w_1 + \log w_2} = w_1w_2$$
for all complex numbers whenever $\log w_1, \log w_2$ are defined (regardless of the analytic branch we choose). For those who have not seen the complex logarithm, I would like you to take the following facts for granted. The function $\log(1+z)$ is a well defined continuous function in the open ball $|z| < 1$ which satisfies $\log 1 = 0$. It has the following estimate:
$$|\log(1+z) - z| \leq |z|^2$$
for all $z\in\mathbb{C}$ such that $|z| \leq \frac{1}{2}$. This estimate can be obtained easily by using the Taylor expansion of $\log(1+z)$ around $z = 0$.]

Solution. By Assumption (i), we may first assume that $n$ is large enough so that $|\theta_{n,m}| \leq \frac{1}{2}$ for any $1 \leq m \leq n$. This allows us to make sure that $\log(1+\theta_{n,m})$ is well defined and to use the inequality
$$|\log(1+\theta_{n,m}) - \theta_{n,m}| \leq |\theta_{n,m}|^2.$$
It follows that
$$\Big|\sum_{m=1}^{n}\log(1+\theta_{n,m}) - \theta\Big| \leq \Big|\sum_{m=1}^{n}\theta_{n,m} - \theta\Big| + \sum_{m=1}^{n}\big|\log(1+\theta_{n,m}) - \theta_{n,m}\big| \leq \Big|\sum_{m=1}^{n}\theta_{n,m} - \theta\Big| + \sum_{m=1}^{n}|\theta_{n,m}|^2.$$
By Assumption (iii), the first term tends to zero. As for the second term, we have
$$\sum_{m=1}^{n}|\theta_{n,m}|^2 \leq \Big(\max_{1\leq m\leq n}|\theta_{n,m}|\Big)\cdot\sum_{m=1}^{n}|\theta_{n,m}| \leq M\cdot\max_{1\leq m\leq n}|\theta_{n,m}| \to 0,$$
where the second inequality follows from Assumption (ii) and the last convergence follows from Assumption (i). Therefore, we conclude that
$$\lim_{n\to\infty}\sum_{m=1}^{n}\log(1+\theta_{n,m}) = \theta,$$
and thus
$$\prod_{m=1}^{n}(1+\theta_{n,m}) = \exp\Big(\sum_{m=1}^{n}\log(1+\theta_{n,m})\Big) \to e^{\theta}.$$
Problem 3. Follow the steps below to prove Lyapunov's central limit theorem.
(i) Let $f_m$ be the characteristic function of $X_m$ and let $t\in\mathbb{R}$ be fixed. By using Taylor's theorem we can write
$$f_m(t) = f_m(0) + f_m'(0)t + \frac{f_m''(0)}{2}t^2 + \frac{f_m^{(3)}(\zeta_{m,t})}{6}t^3,$$
where $\zeta_{m,t}$ is a number between $0$ and $t$. What is the explicit form of the right hand side in terms of $X_m$?
(ii) Using the expansion in Part (i), write the characteristic function of $\frac{S_n}{\Sigma_n}$ (evaluated at a fixed $t$) in the form
$$f_{\frac{S_n}{\Sigma_n}}(t) = \prod_{m=1}^{n}(1+\theta_{n,m}).$$
(iii) Show that the double array $\{\theta_{n,m} : 1 \leq m \leq n,\ n \geq 1\}$ of complex numbers satisfies all the assumptions in Problem 2 with $\theta = -t^2/2$ (recall that $t$ has always been fixed).
[Hint: Assumption (i) is harder to check, and you may need to use Hölder's inequality, which says $E[|X|^p]^{1/p} \leq E[|X|^q]^{1/q}$ for all $1 \leq p \leq q < \infty$.]
Solution. Let $f_m$ be the characteristic function of $X_m$. The characteristic function of $\frac{S_n}{\Sigma_n}$ is given by $\prod_{m=1}^{n}f_m(\frac{t}{\Sigma_n})$. We wish to show that
$$\prod_{m=1}^{n}f_m\left(\frac{t}{\Sigma_n}\right) \to e^{-t^2/2} \quad \text{as } n\to\infty,$$
for each fixed $t\in\mathbb{R}$. The main idea is to use Problem 2, which requires writing $f_m(\frac{t}{\Sigma_n}) = 1+\theta_{n,m}$ with the complex numbers $\{\theta_{n,m}\}$ satisfying the assumptions in Problem 2. To obtain such an expansion we need to rely on Taylor's approximation.
Since $X_m$ has a finite third moment, we know that $f_m$ is continuously differentiable up to order three. Let $t$ be a fixed real number. According to Taylor's theorem, we have
$$f_m(t) = f_m(0) + f_m'(0)t + \frac{f_m''(0)}{2}t^2 + \frac{f_m^{(3)}(\zeta_{m,t})}{6}t^3,$$
where $\zeta_{m,t}$ is a number between $0$ and $t$. By writing out the derivatives in terms of moments explicitly, we have
$$f_m(t) = 1 - \frac{1}{2}\sigma_m^2t^2 - \frac{i}{6}E[X_m^3e^{i\zeta_{m,t}X_m}]t^3.$$
Therefore,
$$f_m\left(\frac{t}{\Sigma_n}\right) = 1 - \frac{1}{2}\cdot\frac{\sigma_m^2}{\Sigma_n^2}t^2 - \frac{i}{6}\cdot\frac{E[X_m^3e^{i\zeta_{m,t}X_m}]}{\Sigma_n^3}t^3.$$
We denote
$$\theta_{n,m} \triangleq -\frac{1}{2}\cdot\frac{\sigma_m^2}{\Sigma_n^2}t^2 - \frac{i}{6}\cdot\frac{E[X_m^3e^{i\zeta_{m,t}X_m}]}{\Sigma_n^3}t^3,$$
and check the three assumptions in Problem 2.
Assumption (i). We first look at the second term in $\theta_{n,m}$. For each $1 \leq m \leq n$, we have
$$\max_{1\leq m\leq n}\left|\frac{i}{6}\cdot\frac{E[X_m^3e^{i\zeta_{m,t}X_m}]}{\Sigma_n^3}t^3\right| \leq \frac{|t|^3}{6}\cdot\frac{\max_{1\leq m\leq n}E[|X_m|^3]}{\Sigma_n^3} \leq \frac{|t|^3}{6}\cdot\frac{\Gamma_n}{\Sigma_n^3},$$
which converges to zero by the assumption. As for the first term in $\theta_{n,m}$, let us denote $\hat{X}_m \triangleq \frac{X_m}{\Sigma_n}$. According to Hölder's inequality, we have
$$E[\hat{X}_m^2] \leq E[|\hat{X}_m|^3]^{2/3} = \left(\frac{E[|X_m|^3]}{\Sigma_n^3}\right)^{2/3} \leq \left(\frac{\Gamma_n}{\Sigma_n^3}\right)^{2/3}.$$
Therefore,
$$\frac{t^2}{2}\cdot\max_{1\leq m\leq n}\frac{\sigma_m^2}{\Sigma_n^2} = \frac{t^2}{2}\max_{1\leq m\leq n}E[\hat{X}_m^2] \leq \frac{t^2}{2}\cdot\left(\frac{\Gamma_n}{\Sigma_n^3}\right)^{2/3} \to 0$$
as $n\to\infty$. We have thus concluded that
$$\max_{1\leq m\leq n}|\theta_{n,m}| \to 0 \quad \text{as } n\to\infty.$$
Assumption (ii). Note that
$$\sum_{m=1}^{n}\frac{\sigma_m^2}{\Sigma_n^2} = 1.$$
Therefore,
$$\sum_{m=1}^{n}|\theta_{n,m}| \leq \frac{t^2}{2} + \frac{|t|^3}{6}\cdot\sum_{m=1}^{n}\frac{E[|X_m|^3]}{\Sigma_n^3} = \frac{t^2}{2} + \frac{|t|^3}{6}\cdot\frac{\Gamma_n}{\Sigma_n^3}.$$
The right hand side is uniformly bounded in $n$ since we know that $\frac{\Gamma_n}{\Sigma_n^3} \to 0$ by the assumption.
Assumption (iii) with $\theta = -t^2/2$. We have
$$\sum_{m=1}^{n}\theta_{n,m} = -\frac{t^2}{2} - \frac{it^3}{6}\cdot\sum_{m=1}^{n}\frac{E[X_m^3e^{i\zeta_{m,t}X_m}]}{\Sigma_n^3}.$$
We have seen previously that the modulus of the second term is bounded by $\frac{|t|^3}{6}\cdot\frac{\Gamma_n}{\Sigma_n^3} \to 0$. Therefore, we conclude that
$$\sum_{m=1}^{n}\theta_{n,m} \to -\frac{t^2}{2}.$$
According to the result of Problem 2, we arrive at
$$f_{\frac{S_n}{\Sigma_n}}(t) = \prod_{m=1}^{n}f_m\left(\frac{t}{\Sigma_n}\right) = \prod_{m=1}^{n}(1+\theta_{n,m}) \to e^{-t^2/2}$$
as $n\to\infty$. By the Lévy-Cramér theorem, we have $\frac{S_n}{\Sigma_n} \to N(0,1)$ weakly. This completes the proof of Lyapunov's theorem.

Week 10 Practice: Solutions

Problem 1. Let $\{X_n : n \geq 1\}$ be an i.i.d. sequence with zero mean and unit variance. Show that $\frac{X_1+\cdots+X_n}{\sqrt{n}}$ will never converge in probability. This tells us that we cannot strengthen the central limit theorem to convergence in probability.
Solution. We already know from the central limit theorem that $\frac{S_n}{\sqrt{n}} \to N(0,1)$ weakly. Suppose on the contrary that there is a random variable $Z$, such that $\frac{S_n}{\sqrt{n}} \to Z$ in probability. Since convergence in probability implies weak convergence, we must have $Z \stackrel{d}{=} N(0,1)$. On the other hand, as a subsequence we have
$$\frac{S_{2n}}{\sqrt{2n}} = \frac{S_n + (X_{n+1}+\cdots+X_{2n})}{\sqrt{2n}} \to Z \quad \text{in probability}.$$
Therefore,
$$\frac{X_{n+1}+\cdots+X_{2n}}{\sqrt{n}} = \sqrt{2}\times\frac{S_{2n}}{\sqrt{2n}} - \frac{S_n}{\sqrt{n}} \to (\sqrt{2}-1)Z \quad \text{in probability}.$$
But since
$$\frac{X_{n+1}+\cdots+X_{2n}}{\sqrt{n}} \stackrel{d}{=} \frac{S_n}{\sqrt{n}},$$
we also have
$$\frac{X_{n+1}+\cdots+X_{2n}}{\sqrt{n}} \to Z \quad \text{weakly}.$$
Therefore, we conclude that $Z$ has the same distribution as $(\sqrt{2}-1)Z$, which is absurd (the two normal distributions have different variances). In other words, $\frac{S_n}{\sqrt{n}}$ can never converge in probability.

Problem 2. Let $\{X_n : n \geq 1\}$ be an i.i.d. sequence with zero mean and unit variance.
(i) Show that
$$\frac{\sum_{1\leq i<j\leq n}X_iX_j}{n^2} \to 0$$
almost surely as $n\to\infty$.
(ii) Show that
$$\frac{\sum_{1\leq i<j\leq n}X_iX_j}{n} \to \frac{Z^2-1}{2} \quad \text{weakly}$$
as $n\to\infty$, where $Z \stackrel{d}{=} N(0,1)$.

Solution. (i) We can write
$$\frac{\sum_{1\leq i<j\leq n}X_iX_j}{n^2} = \frac{(X_1+\cdots+X_n)^2 - (X_1^2+\cdots+X_n^2)}{2n^2} = \frac{1}{2}\times\left(\frac{X_1+\cdots+X_n}{n}\right)^2 - \frac{1}{2n}\times\frac{X_1^2+\cdots+X_n^2}{n}.$$
According to the strong law of large numbers, we have
$$\frac{X_1+\cdots+X_n}{n} \to 0, \quad \frac{X_1^2+\cdots+X_n^2}{n} \to 1 \quad \text{a.s.}$$
Therefore,
$$\frac{\sum_{1\leq i<j\leq n}X_iX_j}{n^2} \to 0 \quad \text{a.s.}$$
(ii) Using the same expression as in Part (i), we have
$$\frac{\sum_{1\leq i<j\leq n}X_iX_j}{n} = \frac{1}{2}\times\left(\frac{X_1+\cdots+X_n}{\sqrt{n}}\right)^2 - \frac{X_1^2+\cdots+X_n^2}{2n}.$$
Since
$$\frac{X_1+\cdots+X_n}{\sqrt{n}} \to Z \quad \text{weakly}$$
and
$$\frac{X_1^2+\cdots+X_n^2}{n} \to 1 \quad \text{a.s.},$$
we conclude that
$$\frac{\sum_{1\leq i<j\leq n}X_iX_j}{n} \to \frac{Z^2-1}{2} \quad \text{weakly}.$$
The above argument uses the following facts about weak convergence.

Lemma. (i) If $X_n \to X$ weakly, then $X_n^2 \to X^2$ weakly.
(ii) If $X_n \to X$ weakly and $Y_n \to c$ weakly where $c$ is a deterministic constant, then $X_n + Y_n \to X + c$ weakly.
Proof. (i) Let $f$ be a bounded continuous function. We need to show that
$$E[f(X_n^2)] \to E[f(X^2)].$$
But this follows from the assumption and the fact that the function $x \mapsto f(x^2)$ is bounded and continuous.
(ii) We verify the second characterisation in the Portmanteau theorem. Let $f$ be a bounded and uniformly continuous function on $\mathbb{R}$. We first claim that
$$\lim_{n\to\infty}\left(E[f(X_n+Y_n)] - E[f(X_n+c)]\right) = 0.$$
Indeed, given $\varepsilon > 0$, by uniform continuity there exists $\delta > 0$ such that
$$|x-y| \leq \delta \implies |f(x)-f(y)| \leq \varepsilon.$$
It follows that
$$\left|E[f(X_n+Y_n)] - E[f(X_n+c)]\right| \leq E\big[|f(X_n+Y_n)-f(X_n+c)|;\ |Y_n-c|\leq\delta\big] + E\big[|f(X_n+Y_n)-f(X_n+c)|;\ |Y_n-c|>\delta\big]$$
$$\leq \varepsilon + 2\|f\|_{\infty}\cdot P(|Y_n-c|>\delta),$$
where $\|f\|_{\infty} \triangleq \sup_{x\in\mathbb{R}}|f(x)|$. Since $c$ is deterministic, we know that
$$Y_n \to c \text{ weakly} \implies Y_n \to c \text{ in probability}.$$
Therefore, by taking $n\to\infty$ we have
$$\limsup_{n\to\infty}\left|E[f(X_n+Y_n)] - E[f(X_n+c)]\right| \leq \varepsilon.$$
The claim follows since $\varepsilon$ is arbitrary.
Now it remains to see that $E[f(X_n+c)] \to E[f(X+c)]$. But this follows from the assumption that $X_n \to X$ weakly along with the test function $g(x) \triangleq f(x+c)$.
Problem 3. Let $\{X_n : n \geq 1\}$ be an i.i.d. sequence whose probability density function is given by
$$p(x) = \begin{cases} \frac{1}{|x|^3}, & |x| > 1; \\ 0, & \text{otherwise}. \end{cases}$$
Establish a suitable version of the central limit theorem for the partial sum sequence $S_n \triangleq X_1+\cdots+X_n$.
Solution. We use the method of characteristic functions. Let $f(t)$ be the characteristic function of $X_1$. For each $t \neq 0$ we have
$$1-f(t) = \int_{\mathbb{R}}(1-e^{itx})p(x)\,dx = 2\int_1^{\infty}\frac{1-\cos tx}{x^3}\,dx = 2t^2\int_{|t|}^{\infty}\frac{1-\cos u}{u^3}\,du. \qquad (E1)$$
We wish to understand the behaviour of $f(t)$ as $t\to 0$. The issue with the above integral is that, when $u$ is small, we have
$$\frac{1-\cos u}{u^3} \approx \frac{1}{2u},$$
which is not integrable near the origin.
We now use L'Hôpital's rule to figure out the precise explosion rate of this integral. Let
$$\varphi(x) \triangleq \int_x^{\infty}\frac{1-\cos u}{u^3}\,du, \quad x > 0.$$
Then
$$\varphi'(x) = -\frac{1-\cos x}{x^3} \approx -\frac{1}{2x} \quad \text{as } x\to 0.$$
Therefore, the explosion rate of $\varphi(x)$ should coincide with the explosion rate of
$$\int\left(-\frac{1}{2x}\right)dx = \frac{1}{2}\ln x^{-1}$$
as $x\to 0$. Using L'Hôpital's rule, one checks that
$$\lim_{x\to 0}\frac{\varphi(x)}{\frac{1}{2}\ln x^{-1}} = 1.$$
As a consequence, we know that
$$\frac{\varphi(x)}{\frac{1}{2}\ln x^{-1}} = 1 + g(x),$$
where $g(x) = o(1)$, i.e. a function such that $g(x)\to 0$ as $x\to 0$. In particular, we can write
$$\varphi(x) = \frac{1}{2}\ln x^{-1} + \frac{1}{2}g(x)\ln x^{-1}.$$
We substitute the above expression into equation (E1) to obtain
$$1-f(t) = 2t^2\times\left(\frac{1}{2}\ln|t|^{-1} + \frac{1}{2}g(|t|)\ln|t|^{-1}\right) = t^2\ln|t|^{-1} + t^2g(|t|)\ln|t|^{-1},$$
or equivalently,
$$f(t) = 1 - t^2\ln|t|^{-1} - t^2g(|t|)\ln|t|^{-1}.$$
This expression gives the precise behaviour of $f(t)$ near the origin, which can be used to obtain a central limit theorem for $S_n$.
To be more precise, our aim is to find a suitable normalising constant $a_n > 0$, so that $\frac{S_n}{a_n}$ has a meaningful weak limit as $n\to\infty$. To figure out what $a_n$ should be, we look at the characteristic function $f_n(t)$ of $\frac{S_n}{a_n}$:
$$f_n(t) = f\left(\frac{t}{a_n}\right)^n = \left(1 - \frac{t^2}{a_n^2}\ln\frac{a_n}{|t|} - \frac{t^2}{a_n^2}g\Big(\frac{|t|}{a_n}\Big)\ln\frac{a_n}{|t|}\right)^n = \exp\left(n\times\ln\left(1 - \frac{t^2}{a_n^2}\ln\frac{a_n}{|t|} - \frac{t^2}{a_n^2}g\Big(\frac{|t|}{a_n}\Big)\ln\frac{a_n}{|t|}\right)\right).$$
Note that as $n\to\infty$, the last term $\frac{t^2}{a_n^2}g(\frac{|t|}{a_n})\ln\frac{a_n}{|t|}$ is smaller than the term $\frac{t^2}{a_n^2}\ln\frac{a_n}{|t|}$ due to the presence of the function $g$. By using the approximation $\ln(1-x) \approx -x$ as $x\to 0$, in order to expect that $f_n(t)$ converges to a meaningful limit, we must require that
$$n\times\frac{t^2}{a_n^2}\ln\frac{a_n}{|t|}$$
converges to a meaningful limit. A moment's thought reveals that this is suggesting the choice $a_n = \sqrt{n\ln n}$. Indeed, for this choice of $a_n$ we have
$$n\times\frac{t^2}{a_n^2}\ln\frac{a_n}{|t|} = n\times\frac{t^2}{n\ln n}\ln\frac{(n\ln n)^{1/2}}{|t|} = \frac{t^2}{\ln n}\times\left(\frac{1}{2}(\ln n+\ln\ln n) - \ln|t|\right) \to \frac{t^2}{2}.$$
Therefore, with the choice $a_n = \sqrt{n\ln n}$, we conclude that
$$f_n(t) \to e^{-t^2/2}.$$
In other words, we have
$$\frac{X_1+\cdots+X_n}{\sqrt{n\ln n}} \to N(0,1) \quad \text{weakly}.$$
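The nonstandard normalisation $\sqrt{n\ln n}$ can be glimpsed in simulation, although the agreement at moderate $n$ is only rough because the normalisation is accurate only up to slowly-vanishing logarithmic corrections. The sketch below (ours, for illustration) samples $X$ by inverse transform: $|X| = U^{-1/2}$ with a random sign, since $P(|X| > x) = 1/x^2$ for $x \geq 1$:

```python
# Empirical cdf of S_n / sqrt(n ln n) at one point, versus Phi(1) ~ 0.8413.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, trials = 5_000, 2_000
u = rng.random((trials, n))
x = rng.choice([-1.0, 1.0], size=(trials, n)) / np.sqrt(u)  # density 1/|x|^3
z = x.sum(axis=1) / np.sqrt(n * np.log(n))
phi = 0.5 * (1 + erf(1.0 / sqrt(2)))
print(np.mean(z <= 1.0), phi)
```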

Week 11 Practice: Solutions

Problem. Let $\xi$ be a uniformly distributed random variable on $(0,1)$ with binary expansion
$$\xi = 0.\xi_1\xi_2\xi_3\cdots.$$
Recall that $\{\xi_n : n \geq 1\}$ is an i.i.d. sequence of standard Bernoulli random variables. For each $n \geq 1$, let $N_n$ be the total number of occurrences of the string "11" among the first $n+1$ digits of $\xi$. To be more precise, let
$$\eta_k \triangleq \begin{cases} 1, & \xi_k = \xi_{k+1} = 1, \\ 0, & \text{otherwise}. \end{cases}$$
Then $N_n = \eta_1+\cdots+\eta_n$.
(i) Show that $N_n$ satisfies the strong law of large numbers:
$$\frac{N_n}{n} \to \frac{1}{4} \quad \text{a.s. as } n\to\infty.$$
(ii) Let $\hat{S}_n \triangleq \frac{N_n - E[N_n]}{\sqrt{\mathrm{Var}[N_n]}}$. By developing Step 3 of Stein's method, show that there exists a constant $C > 0$, such that
$$|E[\varphi(\hat{S}_n)] - E[\varphi(Z)]| \leq \frac{C}{\sqrt{n}}\|\varphi'\|_{\infty}$$
for every $n$ and every $\varphi : \mathbb{R}\to\mathbb{R}$ that is continuously differentiable with bounded derivative, where $Z \stackrel{d}{=} N(0,1)$. In particular, the central limit theorem
$$\hat{S}_n \to Z \quad \text{weakly}$$
holds and we have the $L^1$-Berry-Esseen estimate
$$\|F_n - \Phi\|_{L^1} \leq \frac{C}{\sqrt{n}}, \quad n \geq 1,$$
where $F_n$ and $\Phi$ are the distribution functions of $\hat{S}_n$ and $Z$ respectively.

Solution. (i) We can write
$$\frac{N_n}{n} = \frac{\sum_{k=1}^n \eta_k}{n} = \frac{1}{n}\Big(\sum_{\substack{1\leq k\leq n\\ k \text{ odd}}}\eta_k + \sum_{\substack{1\leq k\leq n\\ k \text{ even}}}\eta_k\Big).$$
Note that $\eta_1, \eta_3, \eta_5, \cdots$ is an i.i.d. sequence of Bernoulli random variables with parameter $\frac{1}{4}$. Therefore, the strong law of large numbers holds for this sequence:
$$\frac{\eta_1+\eta_3+\cdots+\eta_{2m-1}}{m} \to \frac{1}{4} \quad \text{a.s.}$$
Similarly,
$$\frac{\eta_2+\eta_4+\cdots+\eta_{2m}}{m} \to \frac{1}{4} \quad \text{a.s.}$$
It follows that
$$\frac{N_n}{n} = \frac{1}{2}\times\frac{1}{n/2}\sum_{\substack{1\leq k\leq n\\ k \text{ odd}}}\eta_k + \frac{1}{2}\times\frac{1}{n/2}\sum_{\substack{1\leq k\leq n\\ k \text{ even}}}\eta_k \to \frac{1}{2}\times\frac{1}{4} + \frac{1}{2}\times\frac{1}{4} = \frac{1}{4} \quad \text{a.s.}$$
(ii) Let $\varphi$ be a given continuously differentiable function with bounded derivative. Let $f$ be the unique bounded solution to Stein's equation associated with $\varphi$. We wish to estimate the quantity
$$E[f'(\hat{S}_n)] - E[\hat{S}_nf(\hat{S}_n)].$$
As usual, we set $\hat{X}_m \triangleq \frac{X_m}{\Sigma_n}$, where $X_m \triangleq \eta_m - E[\eta_m]$ and $\Sigma_n \triangleq \sqrt{\mathrm{Var}[N_n]}$. In the same way as in the independent case, we write
$$E[\hat{S}_nf(\hat{S}_n)] = \sum_{m=1}^n E[\hat{X}_mf(\hat{S}_n)].$$
The first issue arises if we copy the proof from the independent case, namely when switching from $\hat{S}_n$ to $\hat{S}_n - \hat{X}_m$. The problem is that we no longer have
$$E[\hat{X}_mf(\hat{S}_n-\hat{X}_m)] = 0,$$
since $\hat{X}_m$ is not independent of $\hat{S}_n - \hat{X}_m$. To overcome this issue, we need to subtract a few more terms from $\hat{S}_n$ so that the remaining part is independent of $\hat{X}_m$. It is apparent that $\hat{X}_m$ is independent of
$$\hat{S}_n - (\hat{X}_{m-1}+\hat{X}_m+\hat{X}_{m+1}) = \hat{X}_1+\cdots+\hat{X}_{m-2}+\hat{X}_{m+2}+\cdots+\hat{X}_n.$$
As a consequence, if we set
$$\bar{X}_m \triangleq \hat{X}_{m-1}+\hat{X}_m+\hat{X}_{m+1},$$
then we have
$$E[\hat{X}_mf(\hat{S}_n-\bar{X}_m)] = E[\hat{X}_m]\cdot E[f(\hat{S}_n-\bar{X}_m)] = 0,$$
and thus
$$E[\hat{X}_mf(\hat{S}_n)] = E\big[\hat{X}_m\big(f(\hat{S}_n)-f(\hat{S}_n-\bar{X}_m)\big)\big].$$
The next step can be copied from the independent case. Namely, we write
$$f(\hat{S}_n)-f(\hat{S}_n-\bar{X}_m) = \int_0^1 f'(T_{n,m}(t))\cdot\bar{X}_m\,dt,$$
where
$$T_{n,m}(t) \triangleq (1-t)(\hat{S}_n-\bar{X}_m) + t\hat{S}_n.$$
It follows that
$$E[\hat{X}_mf(\hat{S}_n)] = E\Big[\hat{X}_m\bar{X}_m\cdot\int_0^1 f'(T_{n,m}(t))\,dt\Big] = E\Big[\hat{X}_m\bar{X}_m\cdot\int_0^1\big(f'(T_{n,m}(t))-f'(T_{n,m}(0))\big)dt\Big] + E\big[\hat{X}_m\bar{X}_m\cdot f'(T_{n,m}(0))\big].$$
Therefore, we have
$$E[f'(\hat{S}_n)] - E[\hat{S}_nf(\hat{S}_n)] = E[f'(\hat{S}_n)] - \sum_{m=1}^n E\big[\hat{X}_m\bar{X}_m\cdot f'(T_{n,m}(0))\big] - \sum_{m=1}^n E\Big[\hat{X}_m\bar{X}_m\cdot\int_0^1\big(f'(T_{n,m}(t))-f'(T_{n,m}(0))\big)dt\Big]. \qquad (E1)$$

The last sum on the right hand side can be estimated easily. Indeed, we have
$$|f'(T_{n,m}(t))-f'(T_{n,m}(0))| \leq \|f''\|_{\infty}\cdot|T_{n,m}(t)-T_{n,m}(0)| \leq \|f''\|_{\infty}|\bar{X}_m|,$$
and thus
$$\Big|\sum_{m=1}^n E\Big[\hat{X}_m\bar{X}_m\cdot\int_0^1\big(f'(T_{n,m}(t))-f'(T_{n,m}(0))\big)dt\Big]\Big| \leq \|f''\|_{\infty}\sum_{m=1}^n E[|\hat{X}_m|\cdot\bar{X}_m^2] = C_1\|f''\|_{\infty}\times\frac{n}{\Sigma_n^3},$$
where
$$C_1 \triangleq E[|X_m|\cdot(X_{m-1}+X_m+X_{m+1})^2]$$
is a constant independent of $m$.
It remains to estimate the first term on the right hand side of equation (E1). For this purpose, we write
$$E[f'(\hat{S}_n)] - \sum_{m=1}^n E\big[\hat{X}_m\bar{X}_m\cdot f'(T_{n,m}(0))\big] = E[f'(\hat{S}_n)] - \sum_{m=1}^n E\big[\hat{X}_m\bar{X}_m\cdot f'(\hat{S}_n-\bar{X}_m)\big]$$
$$= E\big[\mathrm{Var}[\hat{S}_n]\times f'(\hat{S}_n)\big] - \sum_{m=1}^n E\big[\hat{X}_m\bar{X}_m\cdot f'(\hat{S}_n-\bar{X}_m)\big].$$
To mimic the argument in the independent case, first observe that
$$\mathrm{Var}[\hat{S}_n] = E[\hat{S}_n^2] = \sum_{m=1}^n E[\hat{X}_m\cdot\hat{S}_n] = \sum_{m=1}^n E[\hat{X}_m\cdot\bar{X}_m].$$
Let us denote $\zeta_m \triangleq E[\hat{X}_m\cdot\bar{X}_m]$. It follows that
$$E[f'(\hat{S}_n)] - \sum_{m=1}^n E\big[\hat{X}_m\bar{X}_m\cdot f'(T_{n,m}(0))\big] = \sum_{m=1}^n E\big[\zeta_m\big(f'(\hat{S}_n)-f'(\hat{S}_n-\bar{X}_m)\big)\big] + \sum_{m=1}^n E\big[\big(\zeta_m-\hat{X}_m\bar{X}_m\big)f'(\hat{S}_n-\bar{X}_m)\big]. \qquad (E2)$$
The first sum on the right hand side is also easily estimated:
$$\Big|\sum_{m=1}^n E\big[\zeta_m\big(f'(\hat{S}_n)-f'(\hat{S}_n-\bar{X}_m)\big)\big]\Big| \leq \|f''\|_{\infty}\sum_{m=1}^n|\zeta_m|\cdot E[|\bar{X}_m|] \leq C_2\|f''\|_{\infty}\times\frac{n}{\Sigma_n^3},$$
where $C_2$ is a constant independent of $m$. As for the second term on the right hand side of (E2), the key observation is that, if we subtract two more terms from $\hat{S}_n - \bar{X}_m$ (namely $\hat{X}_{m-2}$ and $\hat{X}_{m+2}$), the remaining quantity is independent of $\zeta_m - \hat{X}_m\bar{X}_m$. To be precise, let us denote
$$\bar{\bar{X}}_m \triangleq \bar{X}_m + \hat{X}_{m-2} + \hat{X}_{m+2}.$$
Then
$$E\big[\big(\zeta_m-\hat{X}_m\bar{X}_m\big)f'(\hat{S}_n-\bar{\bar{X}}_m)\big] = E\big[\zeta_m-\hat{X}_m\bar{X}_m\big]\cdot E[f'(\hat{S}_n-\bar{\bar{X}}_m)] = 0.$$
It follows that the second term on the right hand side of (E2) is equal to
$$\sum_{m=1}^n E\big[\big(\zeta_m-\hat{X}_m\bar{X}_m\big)f'(\hat{S}_n-\bar{X}_m)\big] = \sum_{m=1}^n E\big[\big(\zeta_m-\hat{X}_m\bar{X}_m\big)\cdot\big(f'(\hat{S}_n-\bar{X}_m)-f'(\hat{S}_n-\bar{\bar{X}}_m)\big)\big],$$
which is bounded above by
$$\sum_{m=1}^n\Big|E\big[\big(\zeta_m-\hat{X}_m\bar{X}_m\big)\cdot\big(f'(\hat{S}_n-\bar{X}_m)-f'(\hat{S}_n-\bar{\bar{X}}_m)\big)\big]\Big| \leq \|f''\|_{\infty}\sum_{m=1}^n E\big[\big(|\zeta_m|+|\hat{X}_m\bar{X}_m|\big)\cdot\big|\hat{X}_{m-2}+\hat{X}_{m+2}\big|\big] \leq C_3\|f''\|_{\infty}\times\frac{n}{\Sigma_n^3}.$$

Putting everything together, we have obtained
$$|E[f'(\hat{S}_n)] - E[\hat{S}_nf(\hat{S}_n)]| \leq C_4\|f''\|_{\infty}\times\frac{n}{\Sigma_n^3},$$
where $C_4 \triangleq C_1 + C_2 + C_3$. On the other hand, it is elementary to calculate that
$$\Sigma_n^2 = \mathrm{Var}[N_n] = \frac{5n-2}{16}.$$
Therefore, we arrive at
$$|E[f'(\hat{S}_n)] - E[\hat{S}_nf(\hat{S}_n)]| \leq \frac{C\|f''\|_{\infty}}{\sqrt{n}}.$$
From this point on, the central limit theorem and the $L^1$-Berry-Esseen estimate follow from exactly the same argument as in the independent case.
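Both the SLLN limit $1/4$ and the variance formula $\mathrm{Var}[N_n] = (5n-2)/16$ used above are easy to confirm by simulation (an illustrative sketch):

```python
# Count occurrences of the pattern "11" among n+1 random bits, many times.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 20_000
bits = rng.integers(0, 2, size=(trials, n + 1), dtype=np.int8)
eta = bits[:, :-1] * bits[:, 1:]    # eta_k = 1 iff xi_k = xi_{k+1} = 1
N = eta.sum(axis=1)
print(N.mean() / n, 0.25)           # SLLN: N_n / n -> 1/4
print(N.var(), (5 * n - 2) / 16)    # matches Var[N_n] = (5n - 2)/16
```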
