Problem 1. Consider a sequence $X_n \overset{d}{=} B(n, p_n)$ ($n \geq 1$) of binomial random variables. Suppose that $\lim_{n\to\infty} np_n = \lambda$ for some constant $\lambda$.
(1) For each $k \geq 0$, compute $\lim_{n\to\infty} P(X_n = k)$.
[Hint: Let $b_n(k) \triangleq P(X_n = k)$. First compute $\lim_{n\to\infty} \frac{b_n(k)}{b_n(k-1)}$. Then use $b_n(k) = b_n(0) \cdot \frac{b_n(1)}{b_n(0)} \cdots \frac{b_n(k)}{b_n(k-1)}$.]
(2) If λ > 0, show that Xn converges weakly to the Poisson distribution with
parameter λ.
(3) If λ = 0, show that Xn converges weakly to the zero random variable.
Solution. (1) For each $k \geq 1$, we have
$$\frac{b_n(k)}{b_n(k-1)} = \frac{\binom{n}{k} p_n^k (1-p_n)^{n-k}}{\binom{n}{k-1} p_n^{k-1} (1-p_n)^{n-k+1}} = \frac{n-k+1}{k} \cdot \frac{p_n}{1-p_n} = \frac{np_n}{k(1-p_n)} - \frac{(k-1)p_n}{k(1-p_n)}.$$
Since $np_n \to \lambda$, we also have $p_n \to 0$. Therefore,
$$\frac{b_n(k)}{b_n(k-1)} \to \frac{\lambda}{k} \quad \text{as } n \to \infty.$$
In addition, we also have
$$b_n(0) = (1-p_n)^n = \left((1-p_n)^{\frac{1}{p_n}}\right)^{np_n} \to e^{-\lambda}.$$
Therefore,
$$b_n(k) = b_n(0) \cdot \frac{b_n(1)}{b_n(0)} \cdots \frac{b_n(k)}{b_n(k-1)} \to e^{-\lambda} \cdot \frac{\lambda}{1} \cdots \frac{\lambda}{k} = e^{-\lambda} \cdot \frac{\lambda^k}{k!}.$$
(2) For any $x \geq 0$, we have
$$P(X_n \leq x) = \sum_{k=0}^{[x]} b_n(k) \to \sum_{k=0}^{[x]} e^{-\lambda} \cdot \frac{\lambda^k}{k!} = F(x),$$
where $[x]$ denotes the integer part of $x$, and $F(x)$ is the cumulative distribution function of the Poisson distribution with parameter $\lambda$. The above convergence also holds trivially if $x < 0$. Therefore, $X_n$ converges weakly to $\mathrm{Poi}(\lambda)$.
(3) If $\lambda = 0$, according to Part (1) we have $b_n(0) \to 1$ and $b_n(k) \to 0$ for each $k \geq 1$. Therefore,
$$P(X_n \leq x) = \sum_{k=0}^{[x]} b_n(k) \to 1$$
if $x \geq 0$, and $P(X_n \leq x) = 0$ if $x < 0$. Since
$$F(x) = \begin{cases} 1, & x \geq 0; \\ 0, & x < 0, \end{cases}$$
is the cumulative distribution function of the zero random variable, we conclude that $X_n$ converges weakly to zero in this case.
Remark. In this particular problem, we actually have $P(X_n \leq x) \to F(x)$ for all $x \in \mathbb{R}$, where $F(x)$ is the relevant limiting distribution. Note that in general this need not be true: one can only expect convergence at continuity points of $F$, as reflected in the definition of weak convergence.
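The pointwise convergence of Part (1) is easy to check numerically. Below is a minimal sketch (an addition, not part of the original solution) comparing the binomial and Poisson probability mass functions under the choice $p_n = \lambda/n$:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    # b_n(k) = P(X_n = k) for X_n ~ B(n, p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    # e^{-lambda} lambda^k / k!
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0
for n in (10, 100, 10_000):
    p = lam / n  # so that n * p_n = lambda exactly
    err = max(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k)) for k in range(8))
    print(n, err)  # the discrepancy shrinks as n grows
```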
Problem 2. Let $X_n$ ($n \geq 1$) be a geometric random variable with parameter $1/n$.
(1) Show that $X_n/n$ converges weakly to $Y \overset{d}{=} \exp(1)$.
(2) What is the probability that $X_n$ is an even integer?
(3) Does $P(X_n/n \in A)$ converge to $P(Y \in A)$ for any Borel measurable subset $A \subseteq \mathbb{R}$?
Solution. (1) If $x < 0$, we have
$$P\left(\frac{X_n}{n} \leq x\right) = 0 \to 0 = P(Y \leq x)$$
trivially. If $x \geq 0$, we have
$$P\left(\frac{X_n}{n} \leq x\right) = P(X_n \leq nx) = \sum_{k=0}^{[nx]} \left(1 - \frac{1}{n}\right)^k \cdot \frac{1}{n} = 1 - \left(1 - \frac{1}{n}\right)^{[nx]+1} = 1 - \left(\left(1 - \frac{1}{n}\right)^n\right)^{\frac{[nx]+1}{n}},$$
where $[nx]$ denotes the integer part of $nx$. Note that $\frac{[nx]+1}{n} \to x$. Therefore,
$$P\left(\frac{X_n}{n} \leq x\right) \to 1 - e^{-x} = P(Y \leq x).$$
This implies that $X_n/n$ converges weakly to $Y$.
(2) Since $X_n$ is a geometric random variable with parameter $1/n$, we have
$$P(X_n \text{ is an even number}) = \sum_{k=0}^{\infty} \left(1 - \frac{1}{n}\right)^{2k} \cdot \frac{1}{n} = \frac{n}{2n-1}.$$
(3) This is not true. Consider $A = \mathbb{Q}$ (the set of rational numbers). Since $Y$ is a continuous random variable, we have
$$P(Y \in \mathbb{Q}) = 0.$$
On the other hand, $X_n/n$ takes only rational values, so $P(X_n/n \in \mathbb{Q}) = 1$ for every $n$, which does not converge to $P(Y \in \mathbb{Q})$.
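A quick numerical check of Part (1) (a sketch added here, using the convention that $X_n$ takes values $0, 1, 2, \dots$ with $P(X_n = k) = (1-\frac1n)^k \frac1n$):

```python
from math import exp, floor

def geom_over_n_cdf(n, x):
    # P(X_n / n <= x) with P(X_n = k) = (1 - 1/n)^k * (1/n), k = 0, 1, 2, ...
    if x < 0:
        return 0.0
    m = floor(n * x)
    return 1.0 - (1.0 - 1.0 / n) ** (m + 1)

n = 100_000
err = max(abs(geom_over_n_cdf(n, x) - (1.0 - exp(-x))) for x in (0.1, 0.5, 1.0, 2.0, 5.0))
print(err)  # already close to the exp(1) cdf for moderate n
```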
Week 3 Practice: Solutions
(i) When $n = 365$, we interpret $X_i$ ($i \geq 1$) as the birthday of the $i$-th person. What is the meaning of the event $\{T_n > r\}$?
(ii) For each $r \geq 1$, show that
$$P(T_n > r) = \left(1 - \frac{1}{n}\right) \cdot \left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{r-1}{n}\right).$$
(iii) Show that $n^{-\frac{1}{2}} T_n$ converges weakly (as $n \to \infty$) to the distribution $F$ given by
$$F(x) \triangleq \begin{cases} 1 - e^{-x^2/2}, & x > 0; \\ 0, & x \leq 0. \end{cases}$$
[Hint: To be fully rigorous you may use the elementary inequality $|\log(1-u) + u| \leq u^2$ for $0 \leq u \leq \frac{1}{2}$.]
Solution. (i) $T_n > r$ means that no two of the first $r$ terms $X_1, \cdots, X_r$ are equal. In the context of the birthday problem when $n = 365$, this is the event that no two people in a group of $r$ people share the same birthday.
(ii) Note that
$$\{T_n > r\} = \{X_1, \cdots, X_r \text{ all distinct}\}.$$
To produce this event, there is no restriction on $X_1$; then there are $n-1$ choices for $X_2$ to be different from $X_1$, then $n-2$ choices for $X_3$ to be different from $X_1$ and $X_2$, etc. Therefore,
$$P(T_n > r) = 1 \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-r+1}{n} = \left(1 - \frac{1}{n}\right) \cdot \left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{r-1}{n}\right).$$
Note that $T_n \leq n+1$, and thus $P(T_n > r) = 0$ if $r \geq n+1$. The above formula covers this situation as well and is thus true for all $r \in \mathbb{N}$.
(iii) If $x \leq 0$ we have $P(n^{-1/2} T_n \leq x) = 0$. If $x > 0$, we have
$$P(n^{-1/2} T_n > x) = P(T_n > \sqrt{n}x) = \prod_{k=1}^{[\sqrt{n}x]-1} \left(1 - \frac{k}{n}\right) = \exp\left(\sum_{k=1}^{[\sqrt{n}x]-1} \log\left(1 - \frac{k}{n}\right)\right).$$
In addition, we have
$$\left|\sum_{k=1}^{[\sqrt{n}x]-1} \left(\log\left(1 - \frac{k}{n}\right) + \frac{k}{n}\right)\right| \leq \sum_{k=1}^{[\sqrt{n}x]-1} \frac{k^2}{n^2} \leq \frac{[\sqrt{n}x]}{n^2} \sum_{k=1}^{[\sqrt{n}x]-1} k = \frac{[\sqrt{n}x]}{n} \cdot \frac{[\sqrt{n}x]([\sqrt{n}x]-1)}{2n} \to 0.$$
In particular, since
$$\sum_{k=1}^{[\sqrt{n}x]-1} \frac{k}{n} = \frac{[\sqrt{n}x]([\sqrt{n}x]-1)}{2n} \to \frac{x^2}{2},$$
we conclude that $P(n^{-1/2} T_n > x) \to e^{-x^2/2}$, i.e. $P(n^{-1/2} T_n \leq x) \to 1 - e^{-x^2/2} = F(x)$.
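The product formula and the limiting distribution can be compared numerically. The following sketch (an addition, not part of the solutions) evaluates $P(T_n > \sqrt{n}x)$ exactly via Part (ii):

```python
from math import exp, sqrt

def tail(n, r):
    # P(T_n > r) = (1 - 1/n)(1 - 2/n) ... (1 - (r-1)/n), from Part (ii)
    p = 1.0
    for k in range(1, r):
        p *= 1.0 - k / n
    return p

n = 10**6
for x in (0.5, 1.0, 2.0):
    r = int(sqrt(n) * x)
    print(x, tail(n, r), exp(-x * x / 2))  # the two columns agree closely
```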
Problem 2. Prove the sufficiency part of Proposition 1.1 in the lecture notes. Namely, let $F_n, F$ be cumulative distribution functions and let $\mu_n, \mu$ be the corresponding probability measures. If
$$\mu_n((-M, M]) \geq 1 - \varepsilon.$$
Since $-M$ and $x$ are both continuity points of $\mu$, letting $n \to \infty$ the second term vanishes, and we arrive at
$$\varlimsup_{n \to \infty} |F_n(x) - F(x)| \leq 2\varepsilon.$$
(i) $X_n$ converges weakly to $X$.
(ii) For any $k \in \mathbb{Z}$, we have $P(X_n = k) \to P(X = k)$.
Solution. (i) $\Longrightarrow$ (ii). For each $k \in \mathbb{Z}$, $k \pm 1/2$ are both continuity points of the law of $X$. By the definition of weak convergence, we have
$$P(X_n = k) = P\left(X_n \in \left(k - \tfrac{1}{2}, k + \tfrac{1}{2}\right]\right) \to P\left(X \in \left(k - \tfrac{1}{2}, k + \tfrac{1}{2}\right]\right) = P(X = k).$$
(ii) $\Longrightarrow$ (i). Given any finite interval $(a, b]$, it contains at most finitely many integer points. Therefore,
$$P(X_n \in (a, b]) = \sum_{k \in (a, b]} P(X_n = k) \to \sum_{k \in (a, b]} P(X = k) = P(X \in (a, b]).$$
By enlarging $N$ if necessary, we may assume that (0.1) holds for $X$ as well. Observe that
$$P(X_n \in A) = P(X_n \in A \cap [-N, N]) + P(X_n \in A \cap [-N, N]^c),$$
and
$$P(X \in A) = P(X \in A \cap [-N, N]) + P(X \in A \cap [-N, N]^c).$$
It follows that
$$|P(X_n \in A) - P(X \in A)| \leq |P(X_n \in A \cap [-N, N]) - P(X \in A \cap [-N, N])| + P(|X_n| > N) + P(|X| > N) \leq 2\varepsilon + |P(X_n \in A \cap [-N, N]) - P(X \in A \cap [-N, N])|.$$
Note that there are finitely many integer points in $[-N, N]$. By (ii) we have $P(X_n \in A \cap [-N, N]) \to P(X \in A \cap [-N, N])$. Therefore,
$$\varlimsup_{n \to \infty} |P(X_n \in A) - P(X \in A)| \leq 2\varepsilon.$$
Week 4 Practice: Solutions
$X_n \to X$, $Y_n \to Y$ in prob.
$E[|X_n - X|^p] \to 0$.
Solution. (1) Let $\varepsilon > 0$. Note that
$$\{|(X_n + Y_n) - (X + Y)| > \varepsilon\} \subseteq \left\{|X_n - X| > \frac{\varepsilon}{2}\right\} \cup \left\{|Y_n - Y| > \frac{\varepsilon}{2}\right\}.$$
Therefore,
$$P(|(X_n + Y_n) - (X + Y)| > \varepsilon) \leq P\left(|X_n - X| > \frac{\varepsilon}{2}\right) + P\left(|Y_n - Y| > \frac{\varepsilon}{2}\right).$$
The right-hand side tends to zero by assumption. Therefore, $X_n + Y_n$ converges to $X + Y$ in probability.
(2) Suppose that $X_n$ converges to $X$ in $L^p$. Given $\varepsilon > 0$, by Chebyshev's inequality, we have
$$P(|X_n - X| > \varepsilon) \leq \frac{1}{\varepsilon^p} E[|X_n - X|^p] \to 0.$$
Therefore, $X_n \to X$ in probability.
(3) By the definition of convergence in probability (with $\varepsilon = 1$), there exists $n_1$ such that
$$P(|X_{n_1} - X| > 1) < 1.$$
We can use the definition again (with $\varepsilon = 1/2$) to find $n_2 > n_1$ such that
$$P\left(|X_{n_2} - X| > \frac{1}{2}\right) < \frac{1}{2^2}.$$
Inductively, at the $k$-th step (using $\varepsilon = 1/k$ in the definition of convergence in probability), we find $n_k > n_{k-1}$ such that
$$P\left(|X_{n_k} - X| > \frac{1}{k}\right) < \frac{1}{k^2}.$$
We now end up with a subsequence $\{X_{n_k} : k \geq 1\}$, and we claim that $X_{n_k} \to X$ a.s. To see this, observe from the construction that
$$\sum_{k=1}^{\infty} P\left(|X_{n_k} - X| > \frac{1}{k}\right) < \infty.$$
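The subsequence trick matters because convergence in probability alone does not force almost sure convergence of the full sequence. A standard illustration (added here as a sketch) is the "typewriter" sequence of shrinking indicator intervals on $[0,1]$: it converges to $0$ in probability, every point is hit infinitely often by the full sequence, yet an explicit subsequence converges everywhere:

```python
import math

def typewriter(n, omega):
    # X_n = indicator of [j/2^k, (j+1)/2^k), where n = 2^k + j with 0 <= j < 2^k
    k = int(math.log2(n))
    j = n - 2**k
    lo, hi = j / 2**k, (j + 1) / 2**k
    return 1.0 if lo <= omega < hi else 0.0

# P(X_n != 0) = 2^{-k} -> 0: convergence to 0 in probability
probs = [2.0 ** (-int(math.log2(n))) for n in (10, 100, 1000, 10000)]
print(probs)

# yet at the fixed point omega = 0.3 the full sequence equals 1 infinitely often...
hits = [n for n in range(1, 1024) if typewriter(n, 0.3) == 1.0]
# ...while along the subsequence n_k = 2^k (leftmost interval), X_{n_k}(0.3) = 0 for k >= 2
sub = [typewriter(2**k, 0.3) for k in range(2, 10)]
print(len(hits), sub)
```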
Problem 3. Let $X_n$ ($n \geq 1$) and $X$ be random variables. Suppose that for any $\varepsilon > 0$,
$$\sum_{n=1}^{\infty} P(|X_n - X| > \varepsilon) < \infty.$$
Show that $X_n \to X$ a.s.
Solution. For each $k \geq 1$, according to the first Borel–Cantelli lemma, we know that
$$P\left(|X_n - X| > \frac{1}{k} \text{ for infinitely many } n\right) = 0.$$
Equivalently, with
$$\Omega_k \triangleq \left\{|X_n - X| \leq \frac{1}{k} \text{ when } n \text{ is sufficiently large}\right\}$$
we have $P(\Omega_k) = 1$. Let $\Omega \triangleq \cap_{k \geq 1} \Omega_k$. Then $P(\Omega) = 1$. By the definition of convergence, we have
$$\Omega \subseteq \left\{\lim_{n \to \infty} X_n = X\right\}.$$
To this end, we first write
$$\sum_{n=1}^{\infty} P\left(\left|\frac{X_n}{n}\right| > \varepsilon\right) = \sum_{n=1}^{\infty} P(|X_n| > n\varepsilon) = \sum_{n=1}^{\infty} P(|X_1| > n\varepsilon),$$
where the last equality follows from the fact that the $X_n$'s are identically distributed. In addition, we have
$$E[|X_1|] = \int_0^{\infty} P(|X_1| > x)\,dx = \sum_{n=1}^{\infty} \int_{(n-1)\varepsilon}^{n\varepsilon} P(|X_1| > x)\,dx \geq \sum_{n=1}^{\infty} \int_{(n-1)\varepsilon}^{n\varepsilon} P(|X_1| > n\varepsilon)\,dx = \varepsilon \cdot \sum_{n=1}^{\infty} P(|X_1| > n\varepsilon).$$
Since $E[X_1^2] = \infty$ and the $X_n$'s are identically distributed, we have
$$\infty = E[X_1^2] \leq \sum_{n=0}^{\infty} P(X_1^2 > n) \leq 1 + \sum_{n=1}^{\infty} P(X_n^2 > n).$$
It follows that
$$\sum_{n=1}^{\infty} P(X_n^2 > n) = \infty.$$
According to the second Borel–Cantelli lemma, we conclude that
$$P(X_n^2 > n \text{ for infinitely many } n) = 1,$$
or equivalently, with probability one $|X_n| > \sqrt{n}$ for infinitely many $n$.
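The conclusion is visible in simulation. Below is a sketch (an addition, with the illustrative choice of heavy-tailed law $P(|X| > x) = 1/x$ for $x \geq 1$, so that $E[X_1^2] = \infty$):

```python
import math
import random

random.seed(0)
N = 10_000
count = 0
for n in range(1, N + 1):
    u = 1.0 - random.random()      # uniform on (0, 1]
    x = 1.0 / u                    # |X_n| with tail P(|X_n| > x) = 1/x, x >= 1
    if x > math.sqrt(n):           # the event {|X_n| > sqrt(n)}
        count += 1
print(count)  # many such n; heuristically about 2*sqrt(N) of them
```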
Week 5 Practice: Solutions
Therefore,
$$\int_0^1 \cdots \int_0^1 f\left((x_1 \cdots x_n)^{1/n}\right) dx_1 \cdots dx_n \to f(\exp(-1)).$$
In addition, since $\sum_{n=1}^{\infty} a_n < \infty$, we know that $a_n \to 0$ and thus the sequence $\{a_n\}$ is bounded, say with some $M > 0$ we have $a_n \leq M$ for all $n$. It follows that
$$\sum_{n=1}^{\infty} a_n^2 \leq M \sum_{n=1}^{\infty} a_n < \infty.$$
This clearly implies that
$$\sum_{n=1}^{\infty} \mathrm{Var}[a_n X_n] < \infty.$$
By Kolmogorov's two-series theorem, we conclude that $\sum_{n=1}^{\infty} a_n X_n$ is convergent a.s.
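For a concrete sanity check of Part (i), here is a sketch (an addition) with the illustrative choice $a_n = 1/n^2$ and $X_n = \pm 1$ fair coin flips; the deterministic bound $|\sum_{n > N} a_n X_n| \leq \sum_{n > N} a_n$ already forces the partial sums to settle:

```python
import random

random.seed(1)
partial = 0.0
checkpoints = {}
for n in range(1, 200_001):
    a_n = 1.0 / n**2                    # sum a_n < infinity
    x_n = random.choice((-1.0, 1.0))    # X_n = +-1, mean zero, Var = 1
    partial += a_n * x_n
    if n in (1_000, 10_000, 100_000, 200_000):
        checkpoints[n] = partial
print(checkpoints)
print(abs(checkpoints[200_000] - checkpoints[10_000]))  # bounded by sum_{n > 10^4} 1/n^2 < 1e-4
```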
(ii) The key observation is that
$$\sum_{n=1}^{\infty} P(X_n \neq 0) = \sum_{n=1}^{\infty} \frac{2}{n^2} < \infty.$$
On the other hand,
$$\sum_{n=1}^{\infty} E[X_n^2] = \sum_{n=1}^{\infty} n^2 \times \frac{2}{n^2} = \infty,$$
and thus the condition in Kolmogorov's two-series theorem fails in this example.
Problem 3. Follow the steps below to show that the second Borel–Cantelli lemma holds true when total independence is replaced by pairwise independence. Let $\{A_n : n \geq 1\}$ be a sequence of pairwise independent events (i.e. $P(A_n \cap A_m) = P(A_n) \cdot P(A_m)$ for any $n \neq m$). Suppose that $\sum_{n=1}^{\infty} P(A_n) = \infty$. We wish to show that $P(A_n \text{ infinitely often}) = 1$.
(i) Consider the indicator random variable $X_n \triangleq 1_{A_n}$ and define $S_n \triangleq X_1 + \cdots + X_n$. Observe that the condition is equivalent to "$E[S_n] \to \infty$", and the claim is equivalent to "$S_n \to \infty$ a.s."
(ii) Let $A > 0$ be a fixed number. Use Chebyshev's inequality to show that
$$P\left(|S_n - E[S_n]| \leq A\sigma(S_n)\right) \geq 1 - \frac{1}{A^2}, \quad \text{for all } n \geq 1, \tag{E1}$$
where $\sigma(S_n) \triangleq \sqrt{\mathrm{Var}[S_n]}$.
(iii) Show that
$$\mathrm{Var}[S_n] = \sum_{k=1}^{n} \mathrm{Var}[X_k],$$
and use this to deduce that
$$\frac{\sigma(S_n)}{E[S_n]} \to 0 \quad \text{as } n \to \infty. \tag{E2}$$
(iv) Use (E1) and (E2) to deduce that, for given $A > 0$, there exists $N \geq 1$ depending on $A$, such that
$$P\left(S_n \geq \frac{1}{2} E[S_n]\right) \geq 1 - \frac{1}{A^2}$$
for all $n \geq N$. Conclude the proof from this point.
and thus
$$\sum_{n=1}^{\infty} P(A_n) = \infty \iff E[S_n] \to \infty.$$
Since $X_k$ is the indicator random variable for $A_k$, we see that $\mathrm{Var}[X_k] \leq E[X_k^2] = P(A_k)$. Therefore,
$$\mathrm{Var}[S_n] \leq \sum_{k=1}^{n} P(A_k) = E[S_n].$$
According to (E2), we can find $N$ depending on $A$, such that for any $n \geq N$,
$$E[S_n] - A\sigma(S_n) \geq \frac{1}{2} E[S_n].$$
It follows from (E1) that
$$P\left(S_n \geq \frac{1}{2} E[S_n]\right) \geq 1 - \frac{1}{A^2} \tag{E3}$$
when $n \geq N$. To finish the proof, let us write
$$S \triangleq \lim_{n \to \infty} S_n = \sum_{n=1}^{\infty} X_n.$$
Week 6 Practice: Solutions
Solution. To avoid change of variables over $\mathbb{C}$ and complex integration, our general strategy for computing the characteristic function is to first compute the moment generating function, and then to apply the substitution $t \mapsto it$. This is easily seen from the following relation:
$$\text{m.g.f. } E[e^{tX}] \xrightarrow{\ t \mapsto it\ } \text{ch.f. } E[e^{itX}].$$
(i) We first consider the exponential distribution (i.e. $n = 1$). The moment generating function of $\exp(\alpha)$ is given by
$$M(t) = \int_0^{\infty} e^{tx} \cdot \alpha e^{-\alpha x}\,dx = \frac{\alpha}{\alpha - t}.$$
$M(t)$ is well-defined when $t < \alpha$. Changing $t$ to $it$, we obtain the characteristic function
$$f(t) = \frac{\alpha}{\alpha - it}.$$
Since $\gamma(n, \alpha)$ is the sum of $n$ independent copies of $\exp(\alpha)$, its characteristic function is given by
$$g(t) = f(t)^n = \left(\frac{\alpha}{\alpha - it}\right)^n.$$
(ii) The moment generating function of $X^2$ is given by
$$M_{X^2}(t) = E[e^{tX^2}] = \int_{-\infty}^{\infty} e^{tx^2} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2(1-2t)^{-1}}}\,dx.$$
The above integral is convergent if and only if $t < 1/2$. In this case,
$$M_{X^2}(t) = (1-2t)^{-1/2} \cdot \frac{1}{\sqrt{2\pi(1-2t)^{-1}}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2(1-2t)^{-1}}}\,dx = (1-2t)^{-1/2}, \quad t < 1/2.$$
Changing $t$ to $it$, we obtain
$$f_{X^2}(t) = (1 - 2it)^{-1/2}.$$
It follows that
$$f_{X^2 + Y^2}(t) = (1 - 2it)^{-1}.$$
This is the characteristic function of the exponential distribution with parameter $1/2$.
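The closed form $(1-2it)^{-1}$ can be cross-checked by numerically integrating the density of the exponential distribution with parameter $1/2$ (a sketch added here; the truncation point and grid size are ad hoc choices):

```python
import numpy as np

def exp_half_cf(t, upper=200.0, num=2_000_001):
    # trapezoidal approximation of E[e^{itX}] for X ~ exp(1/2), density (1/2) e^{-x/2} on x > 0
    x = np.linspace(0.0, upper, num)
    dx = x[1] - x[0]
    vals = np.exp(1j * t * x) * 0.5 * np.exp(-0.5 * x)
    return dx * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

for t in (0.0, 0.5, 1.0, 3.0):
    print(t, abs(exp_half_cf(t) - 1.0 / (1.0 - 2j * t)))  # all tiny
```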
Problem 2. Let f (t), g(t) be characteristic functions of some random variables.
Show that each of the following functions is again a characteristic function (of
some random variable/probability measure).
(i) $h(t) = |f(t)|^2$;
(ii) $h(t) = \lambda f(t) + (1 - \lambda) g(t)$ where $\lambda \in (0, 1)$;
(iii) $h(t) = \int_0^1 f(ut)\,du$.
Solution. (i) Let $X, Y$ be two independent random variables with the same characteristic function $f$. Then the characteristic function of $X - Y$ is given by
$$E[e^{it(X-Y)}] = f(t) \cdot \overline{f(t)} = |f(t)|^2.$$
(ii) Let $f(t)$ and $g(t)$ be the characteristic functions of probability measures $\mu$ and $\nu$ respectively. In other words,
$$f(t) = \int_{\mathbb{R}} e^{itx} \mu(dx), \quad g(t) = \int_{\mathbb{R}} e^{itx} \nu(dx).$$
Define a new measure $\tau$ by $\tau \triangleq \lambda\mu + (1-\lambda)\nu$. Then $h(t) = \lambda f(t) + (1-\lambda) g(t)$ is the characteristic function of the probability measure $\tau$. More generally, one can show in the same way that, if $f_1, \cdots, f_n$ are characteristic functions and $\lambda_1, \cdots, \lambda_n$ are positive numbers such that $\lambda_1 + \cdots + \lambda_n = 1$, then $\lambda_1 f_1 + \cdots + \lambda_n f_n$ is again a characteristic function.
(iii) With $\mu_u$ denoting the law with characteristic function $f_u(t) \triangleq f(ut)$, the characteristic function of the measure $\tau(A) \triangleq \int_0^1 \mu_u(A)\,du$ is
$$\int_{\mathbb{R}} e^{itx}\tau(dx) = \int_0^1 du \int_{\mathbb{R}} e^{itx} \mu_u(dx) = \int_0^1 f_u(t)\,du = \int_0^1 f(ut)\,du.$$
Here $c > 0$ is the normalising constant so that the above expression defines a probability mass function. Let $f_n(t)$ be the characteristic function of $\frac{S_n}{n}$ where $S_n \triangleq X_1 + \cdots + X_n$. Show that
$$\lim_{n \to \infty} f_n(t) = 1$$
for every $t \in \mathbb{R}$.
[Hint: write the characteristic function as
$$f_n(t) = \exp(n \log(1 - a_n))$$
and show that $a_n = o(1/n)$. Note that $a_n$ is given by a sum $\sum_{k=2}^{\infty}$. Split it into $\sum_{k=2}^{n}$ and $\sum_{k=n}^{\infty}$.]
Therefore, we do not have a strong law $\frac{S_n}{n} \to 0$ a.s. For this example, the "limsup" in (E) cannot be replaced by the "lim", since we know that there is a subsequence $\frac{S_{n_k}}{n_k} \to 0$ almost surely (a consequence of convergence in probability; see Week 4 Practice, Problem 2 (3)).
Solution. The characteristic function of $X_1$ is given by
$$f_1(t) = \sum_{k \in \mathbb{Z}\setminus\{0, \pm 1\}} e^{ikt} \cdot \frac{c}{k^2 \log |k|} = \sum_{k=2}^{\infty} \frac{2c \cos kt}{k^2 \log k}.$$
Since the $X_n$'s are i.i.d., the characteristic function of $\frac{S_n}{n}$ is given by
$$f_n(t) = \left(\sum_{k=2}^{\infty} \frac{2c \cos \frac{kt}{n}}{k^2 \log k}\right)^n = \exp\left(n \log \sum_{k=2}^{\infty} \frac{2c \cos \frac{kt}{n}}{k^2 \log k}\right) = \exp\left(n \log\left(1 - \sum_{k=2}^{\infty} \frac{2c\left(1 - \cos \frac{kt}{n}\right)}{k^2 \log k}\right)\right),$$
where we have used the relation
$$\sum_{k=2}^{\infty} \frac{2c}{k^2 \log k} = 1.$$
If we can show that $a_n = o(1/n)$, i.e. $na_n \to 0$ (for each fixed $t$), then the desired result follows.
In particular, we have
$$\frac{1}{n} \sum_{k=2}^{n} \frac{1}{\log k} \to 0,$$
and thus the right-hand side of (E2) is $o(1/n)$. For the second term in (E1), we have
$$\sum_{k=n}^{\infty} \frac{2 \sin^2 \frac{kt}{2n}}{k^2 \log k} \leq \frac{2}{\log n} \sum_{k=n}^{\infty} \frac{1}{k^2}.$$
Note that
$$\sum_{k=n}^{\infty} \frac{1}{k^2} \leq \sum_{k=n}^{\infty} \frac{1}{k(k-1)} = \sum_{k=n}^{\infty} \left(\frac{1}{k-1} - \frac{1}{k}\right) = \frac{1}{n-1}.$$
It follows that
$$\sum_{k=n}^{\infty} \frac{2 \sin^2 \frac{kt}{2n}}{k^2 \log k} \leq \frac{2}{(n-1)\log n} = o\left(\frac{1}{n}\right).$$
Therefore, we conclude that $a_n = o(1/n)$.
Week 7 Practice: Solutions
Problem 1. In this problem, you can look up the explicit formulae for characteristic functions without deriving them.
(i) Let Xn be a binomial random variable with parameters n and pn . Suppose that
npn → λ > 0 as n → ∞. Use the method of characteristic functions to show that
Xn converges weakly to the Poisson distribution with parameter λ.
[Hint: You may use the approximation log(1 + z) = z + o(|z|) when |z| is small
even in the case when z is a complex variable.]
(ii) Let $X_\lambda$ be a Poisson random variable with parameter $\lambda > 0$. Use the method of characteristic functions to show that $(X_\lambda - \lambda)/\sqrt{\lambda}$ converges weakly to the standard normal distribution as $\lambda \to \infty$. How would you interpret this property heuristically?
Note that
$$\log(1 - p_n(1 - e^{it})) = -p_n(1 - e^{it}) + o(p_n).$$
Therefore,
$$n \log(1 - p_n(1 - e^{it})) = -np_n(1 - e^{it}) + n \cdot o(p_n) = -np_n(1 - e^{it}) + np_n \cdot \frac{o(p_n)}{p_n} \to -\lambda(1 - e^{it}).$$
It follows that
$$f_n(t) \to \exp(-\lambda(1 - e^{it})), \quad \text{as } n \to \infty,$$
where the right-hand side is exactly the characteristic function of the Poisson distribution with parameter $\lambda$.
(2) By the linearity property, the characteristic function of $(X_\lambda - \lambda)/\sqrt{\lambda}$ is given by
$$f_\lambda(t) = \exp\left(\lambda\left(e^{it/\sqrt{\lambda}} - 1\right) - i\sqrt{\lambda}t\right) = \exp\left(\lambda\left(e^{it/\sqrt{\lambda}} - 1 - \frac{it}{\sqrt{\lambda}}\right)\right).$$
Note that
$$e^{it/\sqrt{\lambda}} - 1 - \frac{it}{\sqrt{\lambda}} = -\frac{t^2}{2\lambda} + o\left(\frac{1}{\lambda}\right), \quad \text{as } \lambda \to \infty.$$
Therefore,
$$f_\lambda(t) \to e^{-t^2/2}, \quad \text{as } \lambda \to \infty,$$
where the right-hand side is the characteristic function of the standard normal distribution.
One way to interpret this result is the following. We assume that $\lambda$ is a positive integer. Recall that the sum of independent Poisson random variables is again a Poisson random variable whose parameter is the sum of the individual parameters. Therefore, we may think of
$$X_\lambda = X_1 + X_2 + \cdots + X_\lambda,$$
where $\{X_n : n \geq 1\}$ is an i.i.d. sequence of Poisson(1) random variables. Since
$$E[X_\lambda] = \lambda, \quad \mathrm{Var}[X_\lambda] = \lambda,$$
the result
$$\frac{X_\lambda - \lambda}{\sqrt{\lambda}} \to N(0, 1) \quad \text{weakly}$$
is precisely a central limit theorem for the fluctuation of the partial sum.
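The convergence of characteristic functions in Part (2) is explicit enough to tabulate (a sketch added for illustration):

```python
import cmath
from math import exp, sqrt

def cf_normalized_poisson(lam, t):
    # ch.f. of (X_lam - lam)/sqrt(lam): exp(lam (e^{it/sqrt(lam)} - 1) - i sqrt(lam) t)
    s = t / sqrt(lam)
    return cmath.exp(lam * (cmath.exp(1j * s) - 1.0) - 1j * sqrt(lam) * t)

t = 1.5
for lam in (10, 1_000, 100_000):
    print(lam, abs(cf_normalized_poisson(lam, t) - exp(-t * t / 2)))  # shrinks as lam grows
```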
Problem 2. Let f (t) be a characteristic function of some random variable, and
let λ > 0.
(i) For each integer $n > \lambda$, explain why
$$1 + \frac{\lambda(f(t) - 1)}{n}$$
is a characteristic function.
(ii) By using Part (i) and the Lévy–Cramér theorem, show that $e^{\lambda(f(t)-1)}$ is a characteristic function.
(iii) Let $\{X_n : n \geq 1\}$ be an i.i.d. sequence with characteristic function $f(t)$, and let $N$ be a Poisson($\lambda$)-distributed random variable that is independent of $\{X_n : n \geq 1\}$. Can you use them to construct a random variable whose characteristic function is $e^{\lambda(f(t)-1)}$?
Solution. (i) We can write
$$1 + \frac{\lambda(f(t) - 1)}{n} = \left(1 - \frac{\lambda}{n}\right) \cdot 1 + \frac{\lambda}{n} \cdot f(t).$$
Note that the constant function $1$ is a characteristic function (of $X = 0$). When $n > \lambda$, the above expression is a convex combination of $1$ and $f(t)$. According to Week 6 Practice Problem 2 (ii), it is again a characteristic function.
(ii) Using Part (i), we also know that $\left(1 + \frac{\lambda(f(t)-1)}{n}\right)^n$ is a characteristic function. In addition, we know that
$$e^{\lambda(f(t)-1)} = \lim_{n \to \infty} \left(1 + \frac{\lambda(f(t) - 1)}{n}\right)^n.$$
The limiting function $e^{\lambda(f(t)-1)}$ is clearly continuous at $t = 0$. According to the Lévy–Cramér theorem, we know that it must be a characteristic function.
(iii) Consider the random variable
$$\omega \mapsto S(\omega) \triangleq \sum_{k=1}^{N(\omega)} X_k(\omega).$$
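Part (iii) is the compound Poisson construction, and it can be checked by simulation. A sketch (an addition; the choice $X_k = \pm 1$, so that $f(t) = \cos t$, and the binomial shortcut for summing $N$ coin flips are implementation conveniences):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n_samples = 2.0, 200_000
N = rng.poisson(lam, size=n_samples)     # N ~ Poisson(lambda)
B = rng.binomial(N, 0.5)                 # number of +1 steps among the N flips
S = 2 * B - N                            # S = X_1 + ... + X_N with X_k = +-1, f(t) = cos t
for t in (0.5, 1.0, 2.0):
    empirical = np.exp(1j * t * S).mean()
    predicted = np.exp(lam * (np.cos(t) - 1.0))   # e^{lambda (f(t) - 1)}
    print(t, abs(empirical - predicted))          # within Monte Carlo error
```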
Problem 3. Recall that the standard Cauchy distribution has density
$$\rho(x) = \frac{1}{\pi(1 + x^2)}, \quad x \in \mathbb{R}.$$
How would you show that its characteristic function is given by $f(t) = e^{-|t|}$ without using any complex integration?
Solution. We first reverse the problem by regarding $e^{-|t|}$ as a density and computing its characteristic function. To be more precise, the function
$$g(t) \triangleq \frac{1}{2} e^{-|t|}, \quad t \in \mathbb{R},$$
is a probability density function. Its characteristic function is given by
$$\psi(x) = \frac{1}{2} \int_{\mathbb{R}} e^{ixt} e^{-|t|}\,dt = \frac{1}{2} \int_{\mathbb{R}} (\cos xt + i \sin xt) e^{-|t|}\,dt = \int_0^{\infty} e^{-t} \cos xt\,dt,$$
where the last equality follows from the fact that $\sin$ is odd and $\cos$ is even.
Using integration by parts twice, we have
$$\begin{aligned} \psi(x) &= -\int_0^{\infty} \cos xt\,d(e^{-t}) = -e^{-t}\cos xt \Big|_0^{\infty} - x\int_0^{\infty} e^{-t}\sin xt\,dt \\ &= 1 - x\int_0^{\infty} e^{-t}\sin xt\,dt = 1 + x\int_0^{\infty} \sin xt\,d(e^{-t}) \\ &= 1 + x\left(e^{-t}\sin xt\Big|_0^{\infty} - x\int_0^{\infty} e^{-t}\cos xt\,dt\right) \\ &= 1 - x^2\int_0^{\infty} e^{-t}\cos xt\,dt = 1 - x^2\psi(x). \end{aligned}$$
Therefore,
$$\psi(x) = \frac{1}{1 + x^2}$$
is the characteristic function of $g$. Note that $\psi$ is integrable on $\mathbb{R}$. Therefore, by the inversion formula, we have
$$\frac{1}{2} e^{-|t|} = g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \psi(x)\,dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \cdot \frac{1}{1 + x^2}\,dx.$$
In other words,
$$e^{-|t|} = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{itx}}{1 + x^2}\,dx.$$
The right-hand side is the characteristic function of the Cauchy distribution by definition.
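The inversion argument can also be confirmed by brute-force numerical integration of $\frac{1}{\pi}\int_{-\infty}^{\infty} e^{itx}/(1+x^2)\,dx$ (a sketch added here; by symmetry the integrand reduces to $2\cos(tx)/(\pi(1+x^2))$ on $[0,\infty)$, and the truncation and grid are ad hoc):

```python
import numpy as np

def cauchy_cf(t, upper=1000.0, num=1_000_001):
    # trapezoidal approximation of 2 \int_0^upper cos(tx) / (pi (1 + x^2)) dx
    x = np.linspace(0.0, upper, num)
    dx = x[1] - x[0]
    vals = 2.0 * np.cos(t * x) / (np.pi * (1.0 + x * x))
    return dx * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

for t in (0.5, 1.0, 2.0):
    print(t, cauchy_cf(t), np.exp(-t))  # the two columns agree closely
```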
Week 8 Practice: Solutions
Problem 1. (1) Suppose that $X$ and $Y$ are independent standard Cauchy random variables. What is the probability density function of $X + Y$?
(2) Find an example to show that $f_{X+Y}(t) = f_X(t) f_Y(t)$ (characteristic functions) does not imply that $X, Y$ are independent.
[Hint: Consider the Cauchy distribution.]
Solution. (1) Recall that the characteristic function of $X$ is given by $f_X(t) = e^{-|t|}$. Therefore,
$$f_{X+Y}(t) = f_X(t) \cdot f_Y(t) = e^{-2|t|}.$$
This is the same as the distribution of $2X$, whose probability density function is easily seen to be
$$\rho(x) = \frac{1}{\pi} \cdot \frac{2}{4 + x^2}, \quad x \in \mathbb{R}.$$
(2) Let $X$ be a standard Cauchy random variable. Then
$$f_{2X}(t) = e^{-2|t|} = f_X(t) f_X(t).$$
However, a random variable cannot be independent of itself unless it is a deterministic constant (why?).
Problem 2. Let $X_n \overset{d}{=} N(a_n, \sigma_n^2)$ with $a_n \in \mathbb{R}$ and $\sigma_n > 0$. Suppose that $X_n$ converges weakly to some random variable $X$. Show that
$$a_n \to a, \quad \sigma_n \to \sigma$$
for some $a \in \mathbb{R}$ and $\sigma \geq 0$. In addition, $X \overset{d}{=} N(a, \sigma^2)$. This result tells us that normal distributions are stable under weak convergence.
Solution. Since $X_n$ is weakly convergent, we know that $\{X_n\}$ is tight. Therefore, the real sequences $\{a_n\}$ and $\{\sigma_n\}$ are both bounded (from Assignment 1, Problem 1). By the Bolzano–Weierstrass theorem in real analysis, we can find a subsequence
$$a_{n_k} \to a, \quad \sigma_{n_k} \to \sigma$$
with some limit points $a$ and $\sigma$. Since the characteristic function of $X_{n_k}$ is given by
$$f_{n_k}(t) = e^{ita_{n_k} - \frac{1}{2}\sigma_{n_k}^2 t^2},$$
we have
$$f_{n_k}(t) \to e^{ita - \frac{1}{2}\sigma^2 t^2}.$$
On the other hand, by assumption we have $X_n \to X$ weakly (so the same is true for $X_{n_k}$). Therefore, the characteristic function of $X$ must be equal to $e^{ita - \frac{1}{2}\sigma^2 t^2}$, or equivalently, $X \overset{d}{=} N(a, \sigma^2)$. The above argument also shows that there cannot be other limit points of the sequences $a_n$ and $\sigma_n$ apart from $a$ and $\sigma$. Indeed, if $a'$ and $\sigma'^2$ are different limit points, by the same reasoning as before, we have $X \overset{d}{=} N(a', \sigma'^2)$, and thus $a' = a$, $\sigma' = \sigma$. As a consequence, $a_n \to a$ and $\sigma_n \to \sigma$.
Problem 3. Let X be a random variable with characteristic function f (t).
(1) Suppose that f (2π) = 1. Show that with probability one, X takes values in
the set of integers.
[Hint: Look at the real part of the equation f (2π) = 1.]
(2) Suppose that $X$ takes integer values only. Show that
$$P(X = k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-ikt}\,dt$$
for each $k \in \mathbb{Z}$.
(3) By using the previous two parts, show that $|\cos t|$ cannot be a characteristic function. Note that $\cos t$ is a characteristic function. This problem shows that the modulus of a characteristic function need not be a characteristic function.
Solution. (1) Recall that $f(2\pi) = E[\cos 2\pi X] + i E[\sin 2\pi X]$. Taking the real part of the equation $f(2\pi) = 1$ gives $E[1 - \cos 2\pi X] = 0$. Since $1 - \cos 2\pi X \geq 0$, we conclude that with probability one, $1 - \cos 2\pi X = 0$, or equivalently, $X \in \mathbb{Z}$ ($\cos u = 1$ if and only if $u \in 2\pi\mathbb{Z}$).
(2) Since $X$ is integer valued, we have
$$f(t) = E[e^{itX}] = \sum_{n \in \mathbb{Z}} e^{int} P(X = n).$$
For given $k \in \mathbb{Z}$, let us multiply both sides by $e^{-ikt}$ and then integrate over $[-\pi, \pi]$. Note that
$$\int_{-\pi}^{\pi} e^{int} \cdot e^{-ikt}\,dt = \int_{-\pi}^{\pi} e^{i(n-k)t}\,dt = \begin{cases} 0, & \text{if } k \neq n; \\ 2\pi, & \text{if } k = n. \end{cases}$$
Therefore,
$$\int_{-\pi}^{\pi} f(t) e^{-ikt}\,dt = \sum_{n \in \mathbb{Z}} P(X = n) \cdot \int_{-\pi}^{\pi} e^{int} \cdot e^{-ikt}\,dt = 2\pi P(X = k).$$
(3) Suppose on the contrary that $|\cos t|$ is the characteristic function of some random variable $X$. Since $|\cos 2\pi| = 1$, we know that $X$ must be integer valued. Therefore, we can use the formula in Part (2). In particular, we have
$$\begin{aligned} 2\pi P(X = k) &= \int_{-\pi}^{\pi} |\cos t| \cdot e^{-ikt}\,dt = 2\int_0^{\pi} |\cos t| \cdot \cos kt\,dt \\ &= 2\left(\int_0^{\pi/2} \cos t \cdot \cos kt\,dt - \int_{\pi/2}^{\pi} \cos t \cdot \cos kt\,dt\right) \\ &= \int_0^{\pi/2} \left(\cos(k+1)t + \cos(k-1)t\right)dt - \int_{\pi/2}^{\pi} \left(\cos(k+1)t + \cos(k-1)t\right)dt \\ &= \frac{2\sin\frac{(k+1)\pi}{2}}{k+1} + \frac{2\sin\frac{(k-1)\pi}{2}}{k-1}. \end{aligned}$$
For $k = 4$, the right-hand side equals
$$\frac{2}{5} + \frac{-2}{3} = -\frac{4}{15} < 0,$$
which is absurd. Therefore, $|\cos t|$ cannot be a characteristic function.
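The formula in Part (2) is easy to verify numerically for a concrete integer-valued law; a sketch (added) using the Binomial(5, 0.3) characteristic function $f(t) = (0.7 + 0.3 e^{it})^5$:

```python
import numpy as np
from math import comb

n, p = 5, 0.3

def pmf_from_cf(k, num=200_001):
    # P(X = k) = (1/2pi) \int_{-pi}^{pi} f(t) e^{-ikt} dt for integer-valued X
    t = np.linspace(-np.pi, np.pi, num)
    dt = t[1] - t[0]
    f = (1 - p + p * np.exp(1j * t)) ** n      # ch.f. of Binomial(n, p)
    vals = f * np.exp(-1j * k * t)
    return (dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))).real / (2 * np.pi)

for k in range(n + 1):
    print(k, pmf_from_cf(k), comb(n, k) * p**k * (1 - p) ** (n - k))
```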
Week 9 Practice: Solutions
The purpose of this practice class is to work out a proof of Lyapunov's central limit theorem by using the method of characteristic functions. This is a bit more sophisticated than the i.i.d. case, but after working through these details we will be more comfortable with complex numbers and characteristic functions.
Theorem 1 (Lyapunov's CLT). Let $\{X_n : n \geq 1\}$ be a sequence of independent random variables that have zero mean and finite third moments. Define $\Sigma_n \triangleq \sqrt{\mathrm{Var}[S_n]}$ and $\Gamma_n \triangleq \sum_{m=1}^{n} E[|X_m|^3]$. If
$$\frac{\Gamma_n}{\Sigma_n^3} \to 0 \quad \text{as } n \to \infty,$$
then $\frac{S_n}{\Sigma_n} \to N(0, 1)$ weakly.
Problem 2. Let $\{\theta_{n,m} : 1 \leq m \leq n, n \geq 1\}$ be a double array of complex numbers. Suppose that the following three properties hold:
(i) $\max_{1 \leq m \leq n} |\theta_{n,m}| \to 0$ as $n \to \infty$;
(ii) $\sum_{m=1}^{n} |\theta_{n,m}| \leq M$ for all $n$, where $M$ is some positive number that is independent of $n$;
(iii) $\sum_{m=1}^{n} \theta_{n,m} \to \theta$ for some complex number $\theta$ as $n \to \infty$.
Show that
$$\prod_{m=1}^{n} (1 + \theta_{n,m}) \to e^{\theta}.$$
for all complex numbers whenever $\log w_1, \log w_2$ are defined (regardless of the analytic branch we choose). For those who have not seen the complex logarithm, I would like you to take the following facts as granted. The function $\log(1+z)$ is a well-defined continuous function in the open ball $|z| < 1$ which satisfies $\log 1 = 0$. It has the following estimate:
$$|\log(1 + z) - z| \leq |z|^2$$
for all $z \in \mathbb{C}$ such that $|z| \leq \frac{1}{2}$. This estimate can be obtained easily by using the Taylor expansion of $\log(1+z)$ around $z = 0$.
Solution. By Assumption (i), we may assume that $n$ is large enough so that $|\theta_{n,m}| \leq \frac{1}{2}$ for all $1 \leq m \leq n$. This ensures that $\log(1 + \theta_{n,m})$ is well defined and allows us to use the inequality above. It follows that
$$\left|\sum_{m=1}^{n} \log(1 + \theta_{n,m}) - \theta\right| \leq \left|\sum_{m=1}^{n} \theta_{n,m} - \theta\right| + \sum_{m=1}^{n} \left|\log(1 + \theta_{n,m}) - \theta_{n,m}\right| \leq \left|\sum_{m=1}^{n} \theta_{n,m} - \theta\right| + \sum_{m=1}^{n} |\theta_{n,m}|^2.$$
By Assumption (iii), the first term tends to zero. As for the second term, we have
$$\sum_{m=1}^{n} |\theta_{n,m}|^2 \leq \max_{1 \leq m \leq n} |\theta_{n,m}| \cdot \sum_{m=1}^{n} |\theta_{n,m}| \leq M \cdot \max_{1 \leq m \leq n} |\theta_{n,m}| \to 0,$$
where the second inequality follows from Assumption (ii) and the convergence follows from Assumption (i). Therefore, we conclude that
$$\lim_{n \to \infty} \sum_{m=1}^{n} \log(1 + \theta_{n,m}) = \theta,$$
and thus
$$\prod_{m=1}^{n} (1 + \theta_{n,m}) = \exp\left(\sum_{m=1}^{n} \log(1 + \theta_{n,m})\right) \to e^{\theta}.$$
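The simplest instance of Problem 2 is $\theta_{n,m} = z/n$ for a fixed complex $z$ (then (i)–(iii) hold with $\theta = z$ and $M = |z|$), and the conclusion reduces to $(1 + z/n)^n \to e^z$. A sketch (added):

```python
import cmath

def product_limit(z, n):
    # prod_{m=1}^{n} (1 + theta_{n,m}) with theta_{n,m} = z / n
    prod = 1.0 + 0.0j
    for _ in range(n):
        prod *= 1.0 + z / n
    return prod

z = -0.5 + 0.3j
for n in (10, 1_000, 100_000):
    print(n, abs(product_limit(z, n) - cmath.exp(z)))  # shrinks as n grows
```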
Problem 3. Follow the steps below to prove Lyapunov's central limit theorem.
(i) Let $f_m$ be the characteristic function of $X_m$ and let $t \in \mathbb{R}$ be fixed. By using Taylor's theorem we can write
$$f_m(t) = f_m(0) + f_m'(0) t + \frac{f_m''(0)}{2} t^2 + \frac{f_m^{(3)}(\zeta_{m,t})}{6} t^3,$$
where $\zeta_{m,t}$ is a number between $0$ and $t$. What is the explicit form of the right-hand side in terms of $X_m$?
(ii) Using the expansion in Part (i), write the characteristic function of $\frac{S_n}{\Sigma_n}$ (evaluated at a fixed $t$) in the form
$$f_{\frac{S_n}{\Sigma_n}}(t) = \prod_{m=1}^{n} (1 + \theta_{n,m}).$$
(iii) Show that the double array $\{\theta_{n,m} : 1 \leq m \leq n, n \geq 1\}$ of complex numbers satisfies all the assumptions in Problem 2 with $\theta = -t^2/2$ (recall that $t$ has always been fixed).
[Hint: Assumption (i) is harder to check, and you may need to use Hölder's inequality, which says $E[|X|^p]^{1/p} \leq E[|X|^q]^{1/q}$ for all $1 \leq p \leq q < \infty$.]
Solution. Let $f_m$ be the characteristic function of $X_m$. The characteristic function of $\frac{S_n}{\Sigma_n}$ is given by $\prod_{m=1}^{n} f_m(\frac{t}{\Sigma_n})$. We wish to show that
$$\prod_{m=1}^{n} f_m\left(\frac{t}{\Sigma_n}\right) \to e^{-t^2/2} \quad \text{as } n \to \infty,$$
for each fixed $t \in \mathbb{R}$. The main idea is to use Problem 2, which requires writing $f_m(\frac{t}{\Sigma_n}) = 1 + \theta_{n,m}$ with the complex numbers $\{\theta_{n,m}\}$ satisfying the assumptions in Problem 2. To obtain such an expansion we rely on Taylor's approximation.
Since $X_m$ has a finite third moment, we know that $f_m$ is continuously differentiable up to order three. Let $t$ be a fixed real number. According to Taylor's theorem, we have
$$f_m(t) = f_m(0) + f_m'(0) t + \frac{f_m''(0)}{2} t^2 + \frac{f_m^{(3)}(\zeta_{m,t})}{6} t^3,$$
where $\zeta_{m,t}$ is a number between $0$ and $t$. By writing out the derivatives in terms of moments explicitly, we have
$$f_m(t) = 1 - \frac{1}{2}\sigma_m^2 t^2 - \frac{i}{6} E[X_m^3 e^{i\zeta_{m,t} X_m}] t^3.$$
Therefore,
$$f_m\left(\frac{t}{\Sigma_n}\right) = 1 - \frac{1}{2} \cdot \frac{\sigma_m^2}{\Sigma_n^2} t^2 - \frac{i}{6} \cdot \frac{E[X_m^3 e^{i\zeta_{m,t} X_m}]}{\Sigma_n^3} t^3.$$
We denote
$$\theta_{n,m} \triangleq -\frac{1}{2} \cdot \frac{\sigma_m^2}{\Sigma_n^2} t^2 - \frac{i}{6} \cdot \frac{E[X_m^3 e^{i\zeta_{m,t} X_m}]}{\Sigma_n^3} t^3,$$
and check the three assumptions in Problem 2.
Assumption (i). We first look at the second term in $\theta_{n,m}$. For each $1 \leq m \leq n$, we have
$$\max_{1 \leq m \leq n} \left|\frac{i}{6} \cdot \frac{E[X_m^3 e^{i\zeta_{m,t} X_m}]}{\Sigma_n^3} t^3\right| \leq \frac{|t|^3}{6} \cdot \frac{\max_{1 \leq m \leq n} E[|X_m|^3]}{\Sigma_n^3} \leq \frac{|t|^3}{6} \cdot \frac{\Gamma_n}{\Sigma_n^3},$$
which converges to zero by the assumption. As for the first term in $\theta_{n,m}$, let us denote $\hat{X}_m \triangleq \frac{X_m}{\Sigma_n}$. According to Hölder's inequality, we have $E[\hat{X}_m^2] \leq E[|\hat{X}_m|^3]^{2/3}$. Therefore,
$$\frac{t^2}{2} \cdot \max_{1 \leq m \leq n} \frac{\sigma_m^2}{\Sigma_n^2} = \frac{t^2}{2} \max_{1 \leq m \leq n} E[\hat{X}_m^2] \leq \frac{t^2}{2} \cdot \left(\frac{\Gamma_n}{\Sigma_n^3}\right)^{2/3} \to 0$$
as $n \to \infty$. We have thus concluded that
$$\max_{1 \leq m \leq n} |\theta_{n,m}| \to 0 \quad \text{as } n \to \infty.$$
Assumption (ii). We have
$$\sum_{m=1}^{n} |\theta_{n,m}| \leq \frac{t^2}{2} + \frac{|t|^3}{6} \cdot \sum_{m=1}^{n} \frac{E[|X_m|^3]}{\Sigma_n^3} = \frac{t^2}{2} + \frac{|t|^3}{6} \cdot \frac{\Gamma_n}{\Sigma_n^3}.$$
The right-hand side is uniformly bounded in $n$ since we know that $\frac{\Gamma_n}{\Sigma_n^3} \to 0$ by the assumption.
Assumption (iii) with $\theta = -t^2/2$. We have
$$\sum_{m=1}^{n} \theta_{n,m} = -\frac{t^2}{2} - \frac{it^3}{6} \cdot \sum_{m=1}^{n} \frac{E[X_m^3 e^{i\zeta_{m,t} X_m}]}{\Sigma_n^3}.$$
We have seen previously that the modulus of the second term is bounded by $\frac{|t|^3}{6} \cdot \frac{\Gamma_n}{\Sigma_n^3} \to 0$. Therefore, we conclude that
$$\sum_{m=1}^{n} \theta_{n,m} \to -\frac{t^2}{2}$$
as $n \to \infty$. By the Lévy–Cramér theorem, we have $\frac{S_n}{\Sigma_n} \to N(0, 1)$ weakly. This completes the proof of Lyapunov's theorem.
Week 10 Practice: Solutions
Problem 1. Let $\{X_n : n \geq 1\}$ be an i.i.d. sequence with zero mean and unit variance. Show that $\frac{X_1 + \cdots + X_n}{\sqrt{n}}$ can never converge in probability. This tells us that we cannot strengthen the central limit theorem to convergence in probability.
Solution. We already know from the central limit theorem that $\frac{S_n}{\sqrt{n}} \to N(0, 1)$ weakly. Suppose on the contrary that there is a random variable $Z$ such that $\frac{S_n}{\sqrt{n}} \to Z$ in probability. Since convergence in probability implies weak convergence, we must have $Z \overset{d}{=} N(0, 1)$. On the other hand, as a subsequence we have
$$\frac{S_{2n}}{\sqrt{2n}} = \frac{S_n + (X_{n+1} + \cdots + X_{2n})}{\sqrt{2n}} \to Z \quad \text{in probability}.$$
Therefore,
$$\frac{X_{n+1} + \cdots + X_{2n}}{\sqrt{n}} = \sqrt{2} \times \frac{S_{2n}}{\sqrt{2n}} - \frac{S_n}{\sqrt{n}} \to (\sqrt{2} - 1)Z \quad \text{in probability}.$$
But since
$$\frac{X_{n+1} + \cdots + X_{2n}}{\sqrt{n}} \overset{d}{=} \frac{S_n}{\sqrt{n}},$$
we also have
$$\frac{X_{n+1} + \cdots + X_{2n}}{\sqrt{n}} \to Z \quad \text{weakly}.$$
Therefore, we conclude that $Z$ has the same distribution as $(\sqrt{2} - 1)Z$, which is absurd. In other words, $\frac{S_n}{\sqrt{n}}$ can never converge in probability.
Problem 2. Let $\{X_n : n \geq 1\}$ be an i.i.d. sequence with zero mean and unit variance.
(i) Show that
$$\frac{\sum_{1 \leq i < j \leq n} X_i X_j}{n^2} \to 0$$
almost surely as $n \to \infty$.
(ii) Show that
$$\frac{\sum_{1 \leq i < j \leq n} X_i X_j}{n} \to \frac{Z^2 - 1}{2} \quad \text{weakly}$$
as $n \to \infty$, where $Z \overset{d}{=} N(0, 1)$.
Solution. (i) Note that
$$\sum_{1 \leq i < j \leq n} X_i X_j = \frac{1}{2}\left((X_1 + \cdots + X_n)^2 - (X_1^2 + \cdots + X_n^2)\right).$$
By the strong law of large numbers,
$$\frac{X_1 + \cdots + X_n}{n} \to 0, \quad \frac{X_1^2 + \cdots + X_n^2}{n} \to 1 \quad \text{a.s.}$$
Therefore,
$$\frac{\sum_{1 \leq i < j \leq n} X_i X_j}{n^2} \to 0 \quad \text{a.s.}$$
(ii) Using the same expression as in Part (i), we have
$$\frac{\sum_{1 \leq i < j \leq n} X_i X_j}{n} = \frac{1}{2}\left(\frac{X_1 + \cdots + X_n}{\sqrt{n}}\right)^2 - \frac{X_1^2 + \cdots + X_n^2}{2n}.$$
Since
$$\frac{X_1 + \cdots + X_n}{\sqrt{n}} \to Z \quad \text{weakly}$$
and
$$\frac{X_1^2 + \cdots + X_n^2}{n} \to 1 \quad \text{a.s.},$$
we conclude that
$$\frac{\sum_{1 \leq i < j \leq n} X_i X_j}{n} \to \frac{Z^2 - 1}{2} \quad \text{weakly}.$$
The above argument uses the following facts about weak convergence.
Proof. (i) Let $f$ be a bounded continuous function. We need to show that $E[f(X_n^2)] \to E[f(X^2)]$. But this follows from the assumption and the fact that the function $x \mapsto f(x^2)$ is bounded and continuous.
(ii) We verify the second characterisation in the Portmanteau theorem. Let $f$ be a bounded and uniformly continuous function on $\mathbb{R}$. We first claim that
$$\lim_{n \to \infty} \left(E[f(X_n + Y_n)] - E[f(X_n + c)]\right) = 0.$$
Indeed, given $\varepsilon > 0$, by uniform continuity there exists $\delta > 0$ such that
$$|x - y| \leq \delta \implies |f(x) - f(y)| \leq \varepsilon.$$
It follows that
$$\begin{aligned} \left|E[f(X_n + Y_n)] - E[f(X_n + c)]\right| &\leq E\left[|f(X_n + Y_n) - f(X_n + c)|; |Y_n - c| \leq \delta\right] + E\left[|f(X_n + Y_n) - f(X_n + c)|; |Y_n - c| > \delta\right] \\ &\leq \varepsilon + 2\|f\|_{\infty} \cdot P(|Y_n - c| > \delta). \end{aligned}$$
Finally, recall that
$$Y_n \to c \text{ weakly} \implies Y_n \to c \text{ in probability},$$
so the last term tends to zero.
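Part (ii) of Problem 2 can be seen in simulation; a sketch (added), using $X_i = \pm 1$ coin flips so that $\sum_{i<j} X_i X_j = (S_n^2 - n)/2$, with a binomial shortcut for $S_n$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_samples = 10_000, 100_000
Sn = 2.0 * rng.binomial(n, 0.5, size=n_samples) - n   # S_n for n fair +-1 flips
W = (Sn**2 / n - 1.0) / 2.0    # (1/n) sum_{i<j} X_i X_j, since X_i^2 = 1
# limit (Z^2 - 1)/2 with Z ~ N(0,1): mean 0, and P((Z^2 - 1)/2 <= 0) = P(|Z| <= 1) ~ 0.68
print(W.mean(), (W <= 0).mean())
```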
Solution. We use the method of characteristic functions. Let $f(t)$ be the characteristic function of $X_1$. For each $t \neq 0$ we have
$$1 - f(t) = \int_{\mathbb{R}} (1 - e^{itx}) p(x)\,dx = 2\int_1^{\infty} \frac{1 - \cos tx}{x^3}\,dx = 2t^2 \int_{|t|}^{\infty} \frac{1 - \cos u}{u^3}\,du. \tag{E1}$$
We wish to understand the behaviour of $f(t)$ as $t \to 0$. The issue with the above integral is that, when $u$ is small, we have
$$\frac{1 - \cos u}{u^3} \approx \frac{1}{2u},$$
which is not integrable near the origin.
We now use L'Hôpital's rule to figure out the precise explosion rate of this integral. Let
$$\varphi(x) \triangleq \int_x^{\infty} \frac{1 - \cos u}{u^3}\,du, \quad x > 0.$$
Then
$$\varphi'(x) = -\frac{1 - \cos x}{x^3} \approx -\frac{1}{2x} \quad \text{as } x \to 0.$$
Therefore, the explosion rate of $\varphi(x)$ should coincide with the explosion rate of
$$\int \left(-\frac{1}{2x}\right) dx = \frac{1}{2}\ln x^{-1}$$
as $x \to 0$. Using L'Hôpital's rule, one checks that
$$\lim_{x \to 0} \frac{\varphi(x)}{\frac{1}{2}\ln x^{-1}} = 1.$$
In other words,
$$\frac{\varphi(x)}{\frac{1}{2}\ln x^{-1}} = 1 + g(x),$$
where $g(x) = o(1)$, i.e. a function such that $g(x) \to 0$ as $x \to 0$. In particular, we can write
$$\varphi(x) = \frac{1}{2}\ln x^{-1} + \frac{1}{2} g(x)\ln x^{-1}.$$
We substitute the above expression into equation (E1) to obtain
$$1 - f(t) = 2t^2 \times \left(\frac{1}{2}\ln|t|^{-1} + \frac{1}{2}g(|t|)\ln|t|^{-1}\right) = t^2\ln|t|^{-1} + t^2 g(|t|)\ln|t|^{-1},$$
or equivalently,
$$f(t) = 1 - t^2\ln|t|^{-1} - t^2 g(|t|)\ln|t|^{-1}.$$
This expression gives the precise behaviour of $f(t)$ near the origin, which can be used to obtain a central limit theorem for $S_n$.
To be more precise, our aim is to find a suitable normalising constant $a_n > 0$ so that $\frac{S_n}{a_n}$ has a meaningful weak limit as $n \to \infty$. To figure out what $a_n$ should be, we look at the characteristic function $f_n(t)$ of $\frac{S_n}{a_n}$:
$$f_n(t) = f\left(\frac{t}{a_n}\right)^n = \left(1 - \frac{t^2}{a_n^2}\ln\frac{a_n}{|t|} - \frac{t^2}{a_n^2} g\left(\frac{|t|}{a_n}\right)\ln\frac{a_n}{|t|}\right)^n = \exp\left(n \times \ln\left(1 - \frac{t^2}{a_n^2}\ln\frac{a_n}{|t|} - \frac{t^2}{a_n^2} g\left(\frac{|t|}{a_n}\right)\ln\frac{a_n}{|t|}\right)\right).$$
Note that as $n \to \infty$, the term $\frac{t^2}{a_n^2} g(\frac{|t|}{a_n})\ln\frac{a_n}{|t|}$ is smaller than the term $\frac{t^2}{a_n^2}\ln\frac{a_n}{|t|}$ due to the presence of the function $g$. By using the approximation $\ln(1-x) \approx -x$ as $x \to 0$, in order for $f_n(t)$ to converge to a meaningful limit, we must require that
$$n \times \frac{t^2}{a_n^2}\ln\frac{a_n}{|t|}$$
converges to a meaningful limit. A moment's thought reveals that this suggests the choice $a_n = \sqrt{n\ln n}$. Indeed, for this choice of $a_n$ we have
$$n \times \frac{t^2}{a_n^2}\ln\frac{a_n}{|t|} = n \times \frac{t^2}{n\ln n}\ln\frac{(n\ln n)^{1/2}}{|t|} = \frac{t^2}{\ln n}\times\left(\frac{1}{2}(\ln n + \ln\ln n) - \ln|t|\right) \to \frac{t^2}{2}.$$
Therefore, with the choice $a_n = \sqrt{n\ln n}$, we conclude that
$$f_n(t) \to e^{-t^2/2}.$$
In other words, we have
$$\frac{X_1 + \cdots + X_n}{\sqrt{n\ln n}} \to N(0, 1) \quad \text{weakly}.$$
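The $\sqrt{n\ln n}$ normalisation can be probed by simulation, although the convergence is slow (logarithmic), so only loose agreement with $N(0,1)$ should be expected at moderate $n$. A sketch (added; the law with density $1/|x|^3$ for $|x| \geq 1$ is sampled by inverse transform, $|X| = U^{-1/2}$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 10_000, 500
U = 1.0 - rng.random((m, n))                     # uniform on (0, 1]
signs = 2 * rng.integers(0, 2, size=(m, n)) - 1  # random +-1 signs
X = signs / np.sqrt(U)                           # density 1/|x|^3 for |x| >= 1
W = X.sum(axis=1) / np.sqrt(n * np.log(n))       # S_n / sqrt(n ln n)
# symmetric around 0, and roughly standard normal in the bulk
print((W < 0).mean(), (np.abs(W) <= 1.0).mean())
```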
Week 11 Practice: Solutions
Solution. (i) We can write
$$\frac{N_n}{n} = \frac{\sum_{k=1}^{n}\eta_k}{n} = \frac{1}{n}\left(\sum_{\substack{1 \leq k \leq n \\ k \text{ odd}}} \eta_k + \sum_{\substack{1 \leq k \leq n \\ k \text{ even}}} \eta_k\right).$$
By the strong law of large numbers (applied separately along the odd and even subsequences, each of which is i.i.d.),
$$\frac{\eta_1 + \eta_3 + \cdots + \eta_{2m-1}}{m} \to \frac{1}{4} \quad \text{a.s.}$$
Similarly,
$$\frac{\eta_2 + \eta_4 + \cdots + \eta_{2m}}{m} \to \frac{1}{4} \quad \text{a.s.}$$
It follows that
$$\frac{N_n}{n} = \frac{1}{2} \times \frac{1}{n/2}\sum_{\substack{1 \leq k \leq n \\ k \text{ odd}}} \eta_k + \frac{1}{2} \times \frac{1}{n/2}\sum_{\substack{1 \leq k \leq n \\ k \text{ even}}} \eta_k \to \frac{1}{2}\times\frac{1}{4} + \frac{1}{2}\times\frac{1}{4} = \frac{1}{4} \quad \text{a.s.}$$
(ii) Let $\varphi$ be a given continuously differentiable function with bounded derivative. Let $f$ be the unique bounded solution to Stein's equation associated with $\varphi$. We wish to estimate the quantity $E[f'(\hat{S}_n)] - E[\hat{S}_n f(\hat{S}_n)]$.
The first issue arises if we copy the proof from the independent case, namely when switching from $\hat{S}_n$ to $\hat{S}_n - \hat{X}_m$. The problem is that we no longer have
$$E[\hat{X}_m f(\hat{S}_n - \hat{X}_m)] = 0,$$
since $\hat{X}_m$ is not independent of $\hat{S}_n - \hat{X}_m$. To overcome this issue, we need to subtract a few more terms from $\hat{S}_n$ so that the remaining part is independent of $\hat{X}_m$. It is apparent that $\hat{X}_m$ is independent of $\{X_k : |k - m| \geq 2\}$. As a consequence, if we set
$$\bar{X}_m \triangleq \hat{X}_{m-1} + \hat{X}_m + \hat{X}_{m+1},$$
then $\hat{S}_n - \bar{X}_m$ is independent of $\hat{X}_m$, so that $E[\hat{X}_m f(\hat{S}_n - \bar{X}_m)] = 0$, and thus
$$E[\hat{X}_m f(\hat{S}_n)] = E\left[\hat{X}_m \left(f(\hat{S}_n) - f(\hat{S}_n - \bar{X}_m)\right)\right].$$
The next step can be copied from the independent case. Namely, we write
$$f(\hat{S}_n) - f(\hat{S}_n - \bar{X}_m) = \int_0^1 f'(T_{n,m}(t)) \cdot \bar{X}_m\,dt,$$
where
$$T_{n,m}(t) \triangleq (1-t)(\hat{S}_n - \bar{X}_m) + t\hat{S}_n.$$
It follows that
$$E[\hat{X}_m f(\hat{S}_n)] = E\left[\hat{X}_m\bar{X}_m \cdot \int_0^1 f'(T_{n,m}(t))\,dt\right] = E\left[\hat{X}_m\bar{X}_m \cdot \int_0^1 \left(f'(T_{n,m}(t)) - f'(T_{n,m}(0))\right)dt\right] + E[\hat{X}_m\bar{X}_m \cdot f'(T_{n,m}(0))].$$
Therefore, we have
$$E[f'(\hat{S}_n)] - E[\hat{S}_n f(\hat{S}_n)] = \left(E[f'(\hat{S}_n)] - \sum_{m=1}^{n} E[\hat{X}_m\bar{X}_m \cdot f'(T_{n,m}(0))]\right) - \sum_{m=1}^{n} E\left[\hat{X}_m\bar{X}_m \cdot \int_0^1 \left(f'(T_{n,m}(t)) - f'(T_{n,m}(0))\right)dt\right]. \tag{E1}$$
The last sum on the right-hand side can be estimated easily. Indeed, we have $|f'(T_{n,m}(t)) - f'(T_{n,m}(0))| \leq \|f''\|_{\infty} \cdot t|\bar{X}_m| \leq \|f''\|_{\infty} |\bar{X}_m|$, and thus
$$\left|\sum_{m=1}^{n} E\left[\hat{X}_m\bar{X}_m \cdot \int_0^1 \left(f'(T_{n,m}(t)) - f'(T_{n,m}(0))\right)dt\right]\right| \leq \|f''\|_{\infty}\sum_{m=1}^{n} E[|\hat{X}_m| \cdot \bar{X}_m^2] = C_1\|f''\|_{\infty} \times \frac{n}{\Sigma_n^3},$$
where
$$C_1 \triangleq E[|X_m| \cdot (X_{m-1} + X_m + X_{m+1})^2]$$
is a constant independent of $m$.
It remains to estimate the first term on the right-hand side of equation (E1). For this purpose, we write
$$E[f'(\hat{S}_n)] - \sum_{m=1}^{n} E[\hat{X}_m\bar{X}_m \cdot f'(T_{n,m}(0))] = E[f'(\hat{S}_n)] - \sum_{m=1}^{n} E[\hat{X}_m\bar{X}_m \cdot f'(\hat{S}_n - \bar{X}_m)] = E[\mathrm{Var}[\hat{S}_n] \times f'(\hat{S}_n)] - \sum_{m=1}^{n} E[\hat{X}_m\bar{X}_m \cdot f'(\hat{S}_n - \bar{X}_m)].$$
Let us denote $\zeta_m \triangleq E[\hat{X}_m \cdot \bar{X}_m]$. It follows that
$$\begin{aligned} E[f'(\hat{S}_n)] - \sum_{m=1}^{n} E[\hat{X}_m\bar{X}_m \cdot f'(T_{n,m}(0))] &= \sum_{m=1}^{n} \left(E[\zeta_m f'(\hat{S}_n)] - E[\hat{X}_m\bar{X}_m \cdot f'(\hat{S}_n - \bar{X}_m)]\right) \\ &= \sum_{m=1}^{n} E\left[\zeta_m \cdot \left(f'(\hat{S}_n) - f'(\hat{S}_n - \bar{X}_m)\right)\right] + \sum_{m=1}^{n} E\left[\left(\zeta_m - \hat{X}_m\bar{X}_m\right) f'(\hat{S}_n - \bar{X}_m)\right]. \tag{E2} \end{aligned}$$
The first sum on the right-hand side is also easily estimated:
$$\left|\sum_{m=1}^{n} E\left[\zeta_m \cdot \left(f'(\hat{S}_n) - f'(\hat{S}_n - \bar{X}_m)\right)\right]\right| \leq \|f''\|_{\infty}\sum_{m=1}^{n} |\zeta_m| \cdot E[|\bar{X}_m|] \leq C_2\|f''\|_{\infty} \times \frac{n}{\Sigma_n^3},$$
where $C_2$ is a constant independent of $m$. As for the second term on the right-hand side of (E2), the key observation is that, if we subtract two more terms from $\hat{S}_n - \bar{X}_m$ (namely $\hat{X}_{m-2}$ and $\hat{X}_{m+2}$), the remaining quantity is independent of $\zeta_m - \hat{X}_m\bar{X}_m$. To be precise, let us denote
$$\bar{\bar{X}}_m \triangleq \bar{X}_m + \hat{X}_{m-2} + \hat{X}_{m+2}.$$
Then
$$E\left[\left(\zeta_m - \hat{X}_m\bar{X}_m\right) f'(\hat{S}_n - \bar{\bar{X}}_m)\right] = E\left[\zeta_m - \hat{X}_m\bar{X}_m\right] \cdot E[f'(\hat{S}_n - \bar{\bar{X}}_m)] = 0.$$
It follows that the second term on the right-hand side of (E2) is equal to
$$\sum_{m=1}^{n} E\left[\left(\zeta_m - \hat{X}_m\bar{X}_m\right) f'(\hat{S}_n - \bar{X}_m)\right] = \sum_{m=1}^{n} E\left[\left(\zeta_m - \hat{X}_m\bar{X}_m\right)\cdot\left(f'(\hat{S}_n - \bar{X}_m) - f'(\hat{S}_n - \bar{\bar{X}}_m)\right)\right],$$
which is bounded above by
$$\sum_{m=1}^{n} \left|E\left[\left(\zeta_m - \hat{X}_m\bar{X}_m\right)\cdot\left(f'(\hat{S}_n - \bar{X}_m) - f'(\hat{S}_n - \bar{\bar{X}}_m)\right)\right]\right| \leq \|f''\|_{\infty}\sum_{m=1}^{n} E\left[\left(|\zeta_m| + |\hat{X}_m\bar{X}_m|\right)\cdot\left|\hat{X}_{m-2} + \hat{X}_{m+2}\right|\right] \leq C_3\|f''\|_{\infty} \times \frac{n}{\Sigma_n^3}.$$
Putting the three estimates together, we conclude that
$$\left|E[f'(\hat{S}_n)] - E[\hat{S}_n f(\hat{S}_n)]\right| \leq \frac{C\|f''\|_{\infty}}{\sqrt{n}}.$$
From this point on, the central limit theorem and the $L^1$ Berry–Esseen estimate follow from exactly the same argument as in the independent case.