
Stochastic Processes

Dmitry Ioffe

June 13, 2018


Contents

1 Basic Probability Theory
1.1 Probability Spaces
1.2 Random Variables
1.3 Modes of convergence
1.4 Properties with probability 1
1.5 Convergence of random series
1.6 Law of large numbers
1.7 Further Exercises
2 Renewal theory in discrete time
2.1 The setup
2.2 Renewal equation and renewal theorem
2.3 Renewal-reward theorem
2.4 Size bias, Excess life and Stationarity
2.5 Coupling and the key renewal theorem
3 Conditional expectations
3.1 Definition and basic properties
3.2 Example: Branching process
3.3 Example: Erdos-Renyi random graph
4 Markov Chains
4.1 The setup
4.2 Examples
4.3 Linear algebra
4.4 Strong Markov property
4.5 Class properties. Transience and recurrence
4.6 Ergodic theorem for Markov chains
4.7 Coupling from the past and perfect sampling
5 Martingales
5.1 Examples
5.2 Convergence theorems for expectations
5.3 Optional stopping
5.4 Maximal inequality and Martingale LLN
5.5 Martingale convergence theorem
5.6 Transience and recurrence of Markov chains
6 Reversible Random Walks and Electric Networks
6.1 The setup: Probabilistic vs Electrostatic Interpretation
6.2 Necessary and sufficient criterion for transience
6.3 Probabilistic interpretation of unit currents
6.4 Variational description of effective conductances and effective resistances
6.5 Rayleigh's principle and Nash-Williams criterion
6.6 Simple random walk on Zd and Polya's Theorem
6.7 Simple random walk on trees
7 Renewal theory in continuous time
7.1 Poisson Process
7.2 The Setup and the Elementary Renewal Theorem
7.3 Renewal-reward theorem and applications
7.4 Excess life distribution and stationarity
8 Continuous Time Markov Chains
8.1 Finite state space
8.2 Ergodic theorem for CTMC on a finite state space
8.3 Countable state space
8.4 Explosions
8.5 Ergodic theorem for CTMC on countable state spaces
8.6 Biased sampling and PASTA
8.7 Reversibility
List of Notations

i.i.d. Independent and identically distributed

1A Indicator of A

σ(A0) Minimal σ-algebra which contains A0

AB Shorthand notation for A ∩ B

A∆B Symmetric difference

N0 The set of non-negative integers {0, 1, 2, . . . }

1 Basic Probability Theory.
1.1 Probability Spaces.
A Probability Space is a triple (Ω, A, P), where:
• The sample space Ω is the set of outcomes; points of Ω are denoted ω ∈ Ω.
• A is the collection (σ-algebra) of events, that is, of subsets A ⊆ Ω.
• P is a probability measure, which assigns to each A ∈ A its probability P(A) ∈ [0, 1].

A is a σ-algebra if:

1. Ω ∈ A.

2. For any A ∈ A its complement Ac = Ω \ A ∈ A. In particular the empty set ∅ ∈ A.

3. If A1 , A2 , · · · ∈ A then ∪n An ∈ A.

Exercise 1.1.1. Check that if A is a σ-algebra, then ∩n An ∈ A for any finite or countable collection A1, A2, · · · ∈ A. Check that (∪An)ᶜ = ∩Anᶜ. Also check that if A, B ∈ A, then A \ B = ABᶜ ∈ A and A∆B = (A \ B) ∪ (B \ A) ∈ A.
For instance,

ω ∈ (∪An)ᶜ ⇔ ∀n ω ∉ An ⇔ ∀n ω ∈ Anᶜ ⇔ ω ∈ ∩Anᶜ.
Remark 1.1.1. There are plenty of different σ-algebras which may be defined on the same Ω. The two simplest ones are:

A∅ = {∅, Ω} and AΩ = 2^Ω. (1.1.1)

That is, A∅ contains only two sets and AΩ contains all the subsets of Ω. In many cases A∅ is too small and AΩ too big. Usually we shall work with σ-algebras A which are constructed as follows: if A0 is some family of subsets of Ω, then A is the minimal σ-algebra which contains all the sets from A0. We shall denote such minimal σ-algebras by σ(A0).

P is a probability measure if:

(A1) For any event A ∈ A the probability 0 ≤ P(A) ≤ 1.

(A2) P(Ω) = 1.

(A3) For any finite or countable family of pairwise disjoint events A1, A2, · · · ∈ A,

P(∪n An) = Σn P(An).
Exercise 1.1.2. Check that (A3) above is equivalent to additivity of P (meaning that P(A ∪ B) = P(A) + P(B) for any pair of disjoint events A, B ∈ A) together with any one of the following:

(A3.1) For any B1 ⊆ B2 ⊆ B3 ⊆ . . . set B = ∪n Bn. Then

lim_{n→∞} P(Bn) = P(B).

(A3.2) For any B1 ⊇ B2 ⊇ B3 ⊇ . . . set B = ∩n Bn. Then

lim_{n→∞} P(Bn) = P(B).

(A3.3) For any B1 ⊇ B2 ⊇ B3 ⊇ . . . with ∩n Bn = ∅,

lim_{n→∞} P(Bn) = 0.

For instance, in order to check that (A3.3) ⇒ (A3), set A = ∪n An and define Bn = ∪_{ℓ≥n+1} Aℓ. Then by additivity

P(A) = Σ_{j=1}^n P(Aj) + P(Bn).

But lim_{n→∞} P(Bn) = 0 by (A3.3). Hence (A3).


Remark 1.1.2. As we shall briefly indicate below, in general it is not possible to
define P(A) for all subsets A ⊆ Ω. This is similar to not being able to measure
the area of any subset A ⊂ R2 . This, however, does not happen if Ω is finite or
countable.

Finite or Countable Probability Spaces.


If Ω is finite or countable, then it is possible and natural to take A as the power set (the collection of all subsets) of Ω, A = 2^Ω. Probabilities are determined by the collection of numbers {P(ω)}_{ω∈Ω}. That is, for any A ∈ A or, equivalently, for any A ⊆ Ω,

P(A) = Σ_{ω∈A} P(ω). (1.1.2)

Example 1.1.1. Bernoulli trials. Let us flip a coin n times. The probability space is (Ωn, Fn, Pn). The sample space is Ωn = {0, 1}^n. That is, ω ∈ Ωn is an n-dimensional vector with coordinates ωi ∈ {0, 1}. We say that ωi = 1 indicates success (e.g. Heads) in the i-th trial. Fn = 2^{Ωn}. There is only one parameter, the probability of success p ∈ [0, 1], which specifies Pn. Alternatively, q = 1 − p is the probability of failure (in a single trial). The probabilities related to Bernoulli trials are given by

Pn(ω) = p^{Σ_{i=1}^n ωi} (1 − p)^{Σ_{i=1}^n (1−ωi)} = p^{Σ_{i=1}^n ωi} q^{Σ_{i=1}^n (1−ωi)}. (1.1.3)

Of course, Σ_{i=1}^n ωi is just the total number of successes in the outcome ω (of n trials). Important events related to Bernoulli trials are

Ak = {ω : Σ_{i=1}^n ωi = k}.

Clearly,

P(Ak) = \binom{n}{k} p^k q^{n−k}. (1.1.4)

Note that the minimal σ-algebra σ(A0, A1, . . . , An) contains all possible unions of A0, . . . , An and it is strictly smaller than 2^{Ωn}.
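The following minimal Python sketch (not part of the original notes; the parameters n, p and the sample size are arbitrary illustrative choices) compares empirical frequencies of the events Ak with the binomial probabilities (1.1.4):

    import math
    import random

    n, p, samples = 10, 0.3, 200_000
    counts = [0] * (n + 1)
    for _ in range(samples):
        # one outcome ω of n Bernoulli trials; k = number of successes
        k = sum(random.random() < p for _ in range(n))
        counts[k] += 1

    for k in range(n + 1):
        exact = math.comb(n, k) * p**k * (1 - p)**(n - k)  # formula (1.1.4)
        print(k, counts[k] / samples, round(exact, 5))

The empirical and exact columns should agree to two or three decimal places.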

Example 1.1.2. Random permutations. Ωn = Sn (the so called symmetric group) is the set of all n! permutations π = (π1, . . . , πn) of {1, . . . , n}. We assume that all are equally probable, that is, Pn(π) = 1/n! for any π ∈ Sn. Define Ak = {π : πk = k}. Then P(∪_{k=1}^n Ak) is the probability that the random permutation has at least one fixed point.

Here are two classical problems related to random permutations:


Careless secretary. A secretary randomly distributes n letters to n envelopes with printed addresses. Let Ai be the event that the i-th letter is placed into the correct envelope. Then ∪_{i=1}^n Ai is the event that at least one letter is placed into the correct envelope. How to compute P(∪_{i=1}^n Ai)?
Prisoner's amnesty. There are 100 prisoners awaiting the death penalty. They all sit in Room 1. In Room 2 there are 100 boxes numbered 1 to 100. Each box contains a slip with one of the prisoners' names. Assume that all 100 names are different, and each of the 100 names is contained in one of the boxes. The prisoners enter Room 2 one by one. Each one can open up to fifty boxes. Either he/she finds his/her name or not. After that the prisoner proceeds to Room 3, all boxes are closed, and the next prisoner enters Room 2. There is no feedback, and the only thing the prisoners can do is to agree beforehand on a strategy for how the boxes should be opened. If at least one of the 100 prisoners fails to find his/her name, all will be executed. Using the model of random permutations of Example 1.1.2 it is possible to devise a strategy which ensures that with probability exceeding 0.3 the prisoners will survive.
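The notes do not spell the strategy out; the standard solution is the cycle-following strategy, sketched below in Python under that assumption (boxes and names are labeled 0, . . . , 99; all identifiers are illustrative). Prisoner i first opens box i and then repeatedly opens the box labeled by the name just found; he/she succeeds iff the cycle of the random permutation containing i has length at most 50. Hence everybody survives iff the permutation has no cycle longer than 50, an event of probability 1 − Σ_{k=51}^{100} 1/k ≈ 1 − ln 2 ≈ 0.31.

    import random

    def survive(n=100, opens=50):
        # box i contains the name perm[i]; perm is a uniform random permutation
        perm = list(range(n))
        random.shuffle(perm)
        for start in range(n):        # prisoner `start` follows his/her cycle
            box, steps = start, 0
            while steps < opens:
                steps += 1
                if perm[box] == start:   # found own name
                    break
                box = perm[box]
            else:
                return False             # some cycle is longer than `opens`
        return True

    trials = 10_000
    print(sum(survive() for _ in range(trials)) / trials)  # about 0.31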
Exercise 1.1.3. Let (Ω, A, P) be a probability space. If A1, . . . , An are any events, then

P(∪Aℓ) ≤ Σℓ P(Aℓ), P(∩Aℓ) ≥ 1 − Σℓ P(Aℓᶜ), but P(∪Aℓ) ≥ Σℓ P(Aℓ) − Σ_{j<ℓ} P(Aj Aℓ). (1.1.5)

For the exact equality check the inclusion-exclusion principle:

P(∪Aℓ) = Σℓ P(Aℓ) − Σ_{j<ℓ} P(Aj Aℓ) + Σ_{j<k<ℓ} P(Aj Ak Aℓ) − · · · + (−1)^{n−1} P(∩Aℓ). (1.1.6)
The first two inequalities in (1.1.5) are called Boole's inequalities; the last one is called Bonferroni's inequality.
Then (1.1.6) and elementary combinatorics lead to the following solution of the Careless secretary problem:

Pn(∪Aℓ) = 1 − 1/2! + 1/3! − 1/4! + · · · + (−1)^{n−1}/n! (1.1.7)
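A quick numerical check of (1.1.7) by simulation; a sketch, not from the notes (n and the sample size are arbitrary). Note also that as n → ∞ the right hand side of (1.1.7) converges to 1 − e^{−1} ≈ 0.632:

    import math
    import random

    n, samples = 8, 200_000
    hits = 0
    for _ in range(samples):
        perm = list(range(n))
        random.shuffle(perm)
        hits += any(perm[i] == i for i in range(n))  # at least one fixed point

    series = sum((-1) ** (k - 1) / math.factorial(k) for k in range(1, n + 1))  # (1.1.7)
    print(hits / samples, series, 1 - math.exp(-1))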

Uncountable Sample spaces.


In many interesting and natural situations one has to consider uncountable sample spaces. Two important examples are infinite sequences of Bernoulli trials and R.

Example 1.1.3. Ω = {ω = (ω1, ω2, . . . ) with ωi = 0, 1} = {0, 1}^N. This is the space of outcomes of an infinite sequence of Bernoulli trials. The family A0 is what one can observe, namely it contains all the elementary outcomes of a finite number of trials. That is, A0 consists of all the finite unions of the so called cylindrical events

[a1, . . . , an] = {ω ∈ Ω : ω1 = a1, . . . , ωn = an} (1.1.8)

for n = 1, 2, 3, . . . and a1, . . . , an = 0, 1.

Remark 1.1.3. Note that Ω in Example 1.1.3 is uncountable, which means that it is impossible to number all elements of Ω as ω^{(1)}, ω^{(2)}, . . . . Indeed, assume that we manage to number them, with ω^{(n)} = (a_1^{(n)}, a_2^{(n)}, . . . , a_n^{(n)}, . . . ). Then consider

ω* = (1 − a_1^{(1)}, 1 − a_2^{(2)}, . . . , 1 − a_n^{(n)}, . . . ).

Then ω* ∈ Ω, but ω* ≠ ω^{(n)} for every n.


Also note that the family A0 in Example 1.1.3 is an algebra, but not a σ-algebra. Many "natural" events belong to σ(A0), but not to A0 itself. For instance, consider

An = {ω ∈ Ω : ωn = 1} = {Success in the n-th trial}.

Then

{Infinite number of successes} = ∩_{m∈N} ∪_{n≥m} An and {Finite number of failures} = ∪_{m∈N} ∩_{n≥m} An

are such events. Furthermore, for ω ∈ Ω and n ∈ N define

pn(ω) = (1/n) Σ_{i=1}^n ωi. (1.1.9)

In other words, pn(ω) is the proportion of successes among the first n trials of ω. Given a number p ∈ [0, 1], consider the event

{ω : lim_{n→∞} pn(ω) = p} = ∩_{k=1}^∞ ∪_{N=0}^∞ ∩_{n=N}^∞ {ω : |pn(ω) − p| < 2^{−k}}. (1.1.10)

Then {lim_{n→∞} pn(ω) = p} ∈ σ(A0), but it does not belong to A0.

Example 1.1.4. Ω = [0, 1] and B0 = {[a, b] : 0 ≤ a ≤ b ≤ 1}, that is, B0 is the family of all closed sub-intervals of [0, 1]. Then B[0,1] = σ(B0) is the σ-algebra of Borel subsets of [0, 1]. It is possible to prove that B[0,1] does not contain all the subsets of [0, 1], that is, B[0,1] is a strict subset of 2^{[0,1]}.

In many instances the construction of probability measures is based on the following fundamental theorem, which we shall not prove here.
Let us say that P is σ-additive on an algebra A0 if it is additive and (A3.3) holds, that is, for any collection A1 ⊃ A2 ⊃ A3 ⊃ . . . of events from A0 satisfying ∩An = ∅, the limit lim_{n→∞} P(An) = 0.
Caratheodory's Extension Theorem. If P is a σ-additive probability measure on an algebra A0, then it has a unique σ-additive extension to A = σ(A0).

Construction of the probability measure for Bernoulli trials. Consider Example 1.1.3. Once we fix the probability of success p ∈ [0, 1], then for any cylindrical event A = [a1, . . . , an] (recall (1.1.8)),

P(A) = p^{Σ ai} (1 − p)^{n − Σ ai}. (1.1.11)

Recall that A0 is the algebra of events which depend only on a finite number of trials. Using (1.1.11) we extend P to A0 by additivity. Actually, if A ∈ A0 depends only on the first n trials, then the computation of P(A) reduces to a computation on the finite probability space (Ωn, An, Pn) discussed in Example 1.1.1 above. For instance, the event {Σ_{i=1}^n ωi = k} belongs to A0 and its probability equals

\binom{n}{k} p^k (1 − p)^{n−k}.

Exercise 1.1.4. Show that if A1 ⊇ A2 ⊇ . . . is a non-increasing sequence of events from A0 which satisfies ∩An = ∅, then there exists N < ∞ such that An = ∅ for all n ≥ N. In particular, lim_{n→∞} P(An) = 0, and hence the conditions of Caratheodory's theorem are satisfied.

Solution: We shall check that under the conditions of Exercise 1.1.4 there exists a finite n0 < ∞ such that An = ∅ for any n ≥ n0. Equivalently, since An is a non-increasing sequence of events, there exists n such that An = ∅. This follows by contradiction. Let

Fn = σ([a1, . . . , an]; a1, . . . , an = 0, 1).

There is no loss of generality to assume that An ∈ Fn. Note that for any n the collection {[a1, . . . , an]}_{a1,...,an=0,1} of 2^n events is a partition of Ω.
Assume now that An ≠ ∅ for every n. Since the An-s form a non-increasing sequence of events, there exists a1 ∈ {0, 1} such that An ∩ [a1] ≠ ∅ for all n as well. Proceeding along these lines of reasoning we conclude that, under the assumption An ≠ ∅, there exists an infinite sequence a1, a2, . . . such that An ∩ [a1, . . . , ak] ≠ ∅ for any n and k. But An ∈ Fn. Therefore An ∩ [a1, . . . , an] ≠ ∅ simply means that [a1, . . . , an] ⊆ An for any n. Consider the point a = (a1, a2, a3, . . . ) ∈ Ω. Clearly a ∈ [a1, . . . , an] for any n. By the above, a ∈ An for every n. Hence a ∈ ∩An ≠ ∅. A contradiction.
Construction of the Uniform Distribution (Lebesgue measure) on [0, 1]. Consider now Example 1.1.4. Let B̃0 be the smallest algebra which contains B0. In other words, B̃0 contains finite unions of intervals (a, b), (a, b], [a, b), [a, b]. For 0 ≤ a ≤ b ≤ 1 define

P((a, b)) = · · · = P([a, b]) = b − a,

and extend it by additivity to B̃0.

Remark 1.1.4. One can use a Heine-Borel argument / compactness considerations in order to check that P above satisfies the conditions of Caratheodory's theorem.

An example of an additive measure which does not have a σ-additive extension: Consider Ω = Q ∩ (0, 1), that is, Ω is the set of all rational numbers which belong to the interval (0, 1). Let B0 be the family of finite unions of the sets (a, b) ∩ Q where 0 < a < b < 1. Define

P((a, b) ∩ Q) = b − a,

and extend it by additivity to B0. Then there does not exist a σ-additive extension to σ(B0). Indeed, let us number all rational numbers in (0, 1) as q1, q2, q3, . . . . Consider the sets

Ai = (qi − 4^{−i}, qi + 4^{−i}) ∩ Q ∩ (0, 1).

Clearly, P(Ai) ≤ 2 · 4^{−i}. On the other hand, Ω = ∪Ai. Hence, should there be a σ-additive extension of P, we would obtain

1 = P(Ω) ≤ Σi P(Ai) ≤ 2/3,

a contradiction.
Alternatively, should there be a σ-additive extension of P, then for any rational q ∈ (0, 1) it should hold that P(q) = lim_{ε↓0} P((q − ε, q + ε) ∩ Q) = 0, and hence P(Ω) = 0, again a contradiction.
One may object that B0 above is not an algebra. Here is a classical example of an additive but not σ-additive function defined on a σ-algebra of subsets: let F = 2^N be the set of all subsets of N. Using the convention ∞ · 0 = 0, define

µ(A) = ∞ · 1_{A is infinite}. (1.1.12)

Then µ is additive, but not σ-additive.

1.2 Random Variables


Let (Ω, A, P) be a probability space. A random variable X is any function from Ω to R which satisfies the following measurability property:
CASE 1: If the range of X (which we shall denote RX) is finite or countable, in which case X is called discrete, then the event

{X = a} = {ω : X(ω) = a} ∈ A (1.2.1)

for any a ∈ RX.
CASE 2: In general, for any a ∈ R the set

{X ≤ a} = {ω : X(ω) ≤ a} ∈ A. (1.2.2)

That is, {X ≤ a} is an event for any a ∈ R, and hence the probabilities P(X ≤ a) = FX(a) are well defined.
Each event A ∈ A has its representative X = 1A, called the indicator of A, in the world of random variables:

1A(ω) = 1 if ω ∈ A, and 1A(ω) = 0 if ω ∉ A. (1.2.3)

Remark 1.2.1. Each random variable X generates a sub-σ-algebra AX, which is defined as the smallest σ-algebra containing all the events in (1.2.2). For instance, the σ-algebra generated by an indicator 1A is {Ω, ∅, A, Aᶜ}.

Distribution function of a random variable. The distribution function FX of a random variable X is defined via

FX(x) = P(X ≤ x). (1.2.4)

Let us say that x belongs to the support supp(X) of the distribution of X if FX(x + ε) > FX(x − ε) for any ε > 0. Equivalently, x ∈ supp(X) if P(X ∈ (x − ε, x + ε)) > 0 for any ε > 0.
Let us say that X has a discrete distribution with probability function pX if there is an at most countable S ⊂ R such that P(X ∈ S) = 1 and P(X = s) = pX(s) for every s ∈ S.
Let us say that X has a continuous distribution with density function fX if FX(x) = ∫_{−∞}^x fX(t) dt for every x ∈ R.

Exercise 1.2.1. (a) Check that FX is right-continuous:

lim_{y↓x} FX(y) = FX(x),

and that it has left limits, lim_{y↑x} FX(y) = P(X < x). Conclude that P(X = x) equals the size of the jump of FX at x.
(b) Find an example of a discrete random variable X which has a continuous support, e.g. supp(X) = R.
(c) Let X be a continuous random variable. Check that Y = FX(X) has a uniform distribution on [0, 1], that is, P(Y ≤ y) = y for every y ∈ [0, 1].
(d) Give an example of a random variable which is neither discrete nor continuous.

For instance, here is a solution to (b): Number all rational numbers, Q = {q1, q2, q3, . . .} (this is possible since Q is countable). Define the probability distribution of X via

P(X = qi) = 2^{−i}.

Then X is discrete, but supp(X) = R.
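Part (c) is easy to see numerically. A minimal Python sketch, assuming X ∼ Exp(1) so that FX(x) = 1 − e^{−x} (the choice of distribution is purely illustrative):

    import math
    import random

    # Y = F_X(X) with X ~ Exp(1); Y should be uniform on [0, 1]
    ys = [1 - math.exp(-random.expovariate(1.0)) for _ in range(100_000)]
    for y in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(y, sum(v <= y for v in ys) / len(ys))  # empirical P(Y <= y) is close to y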



Independence. We say that σ-algebras A1, . . . , An are independent if P(A1 · · · An) = P(A1) · · · P(An) for any A1 ∈ A1, . . . , An ∈ An. Random variables X1, . . . , Xn are independent if the σ-algebras AX1, . . . , AXn they generate are independent. In particular, events A1, . . . , An are independent if their indicators 1A1, . . . , 1An are independent. Finally, the random variables in an infinite sequence X1, X2, . . . are independent if X1, . . . , Xn are independent for every n ∈ N.

Examples of Random Variables and their distributions.

• A Bernoulli random variable X ∼ B(p) has only two possible values, 0 and 1, with p = P(X = 1). For any event A the indicator X = 1A is a Bernoulli random variable. If (Ω, A, Pp) is the probability space of a finite or infinite sequence of independent Bernoulli trials as described above, then Xi(ω) = ωi is a finite or infinite sequence of independent Bernoulli random variables.

• The Binomial distribution S ∼ BN(n, p) describes the number of successes in n independent Bernoulli trials with probability of success p each. So S is a sum of n independent B(p) random variables,

S = Σ_{i=1}^n Xi = Σ_{i=1}^n 1_{Success in i-th trial} and pS(k) = \binom{n}{k} p^k q^{n−k}.

• A Geometric random variable Y ∼ Geo(p) describes the number of Bernoulli trials until the first success; its version Y0 ∼ Geo0(p) describes the number of failures until the first success. Their probability functions are given by

pY(k) = p q^{k−1} for k = 1, 2, . . . and, respectively, pY0(k) = p q^k for k = 0, 1, . . . .

• The Negative binomial distribution Z ∼ NB(r, p) describes the number of Bernoulli trials until the r-th success:

pZ(k) = P(Z = k) = \binom{k−1}{r−1} p^r q^{k−r}.

• A Poisson random variable with intensity λ, N ∼ Poi(λ), has range N0 and its probability function is given by

pN(k) = (λ^k/k!) e^{−λ}.

• A Uniform random variable on a finite set A, U ∼ Uni(A), satisfies pU(u) = |A|^{−1} for any u ∈ A.

• A Uniform random variable on a continuous interval [a, b] has density function fU(u) = (b − a)^{−1} 1_{[a,b]}(u).

• An Exponential random variable T ∼ Exp(λ) of intensity λ has density function

fT(t) = λ e^{−λt} 1_{[0,∞)}(t).

• For r ∈ N, a random variable S ∼ Erlang(r, λ) is a sum of r independent Exp(λ) random variables, S = T1 + · · · + Tr. Its density function is given by

fS(s) = (λ^r s^{r−1}/Γ(r)) e^{−λs} 1_{[0,∞)}(s), (1.2.5)

where the normalizing constant

Γ(r) = ∫_0^∞ λ^r s^{r−1} e^{−λs} ds. (1.2.6)

Γ in (1.2.6) is called the Gamma function. For r ∈ N it can be computed exactly: Γ(r) = (r − 1)!. Moreover, since the integral on the right hand side of (1.2.6) is still convergent for any r > 0, (1.2.5) still defines a probability density, which is called the density of the Γ(r, λ) distribution.

• X ∼ β(α, β), the so called Beta distribution, if the density function of X is given by

fX(u) = (u^{α−1} (1 − u)^{β−1}/B(α, β)) 1_{[0,1]}(u), where B(α, β) = ∫_0^1 u^{α−1} (1 − u)^{β−1} du. (1.2.7)

• The Normal or Gaussian distribution X ∼ N(µ, σ²) has density

fX(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}. (1.2.8)

Expectation. For an indicator random variable X = 1A its expectation E(X) is defined via E(X) = P(A). For a finite linear combination of indicators, the expectation is defined via

E(Σ_{i=1}^n ai 1Ai) = Σ_{i=1}^n ai P(Ai). (1.2.9)

More care is needed for defining the expectation of an infinite linear combination of indicators. Let X(ω) = Σ_{i=1}^∞ ai 1Ai(ω) (we assume that the sum is defined with probability 1, that is, we assume that P(Σ_{i=1}^∞ ai 1Ai(ω) is absolutely convergent) = 1). Then E(Σ_{i=1}^∞ ai 1Ai):

= Σ_{i=1}^∞ ai P(Ai) if the series is absolutely convergent;
= ∞ if Σi ai 1_{ai>0} P(Ai) = ∞, but Σi |ai| 1_{ai<0} P(Ai) < ∞;
= −∞ if Σi ai 1_{ai>0} P(Ai) < ∞, but Σi |ai| 1_{ai<0} P(Ai) = ∞;
is not defined if both Σi ai 1_{ai>0} P(Ai) = Σi |ai| 1_{ai<0} P(Ai) = ∞. (1.2.10)

In order to define the expectation of a general random variable X, note that if X is discrete, then it is already of the above form (a finite or infinite combination of indicators). Indeed, if S is finite or countable and P(X ∈ S) = 1, then

X(ω) = Σ_{s∈S} s 1_{X=s}.

A general procedure for constructing expectations of not necessarily discrete random variables is described in Exercise 1.7.4, and it boils down to the construction of the so called Lebesgue integral, which crucially differs from the usual Riemann integral in the following sense: the Lebesgue integral uses partition and approximation of the range rather than of the domain. If X is a continuous random variable with density function fX, then E(X):

= ∫_{−∞}^∞ x fX(x) dx if the integral is absolutely convergent;
= ∞ if ∫_0^∞ x fX(x) dx = ∞, but ∫_{−∞}^0 |x| fX(x) dx < ∞;
= −∞ if ∫_0^∞ x fX(x) dx < ∞, but ∫_{−∞}^0 |x| fX(x) dx = ∞;
is not defined if both ∫_0^∞ x fX(x) dx = ∫_{−∞}^0 |x| fX(x) dx = ∞. (1.2.11)

Exercise 1.2.2. Let X be a random variable, and let F and G be two bounded non-decreasing functions. Then

Cov(F(X), G(X)) = E(F(X)G(X)) − E(F(X))E(G(X)) ≥ 0. (1.2.12)

The property (1.2.12) is called positive association of probability measures on R.

Hint. Consider a random variable Y which is independent of X, but has exactly the same distribution as the latter. Notice that in view of the assumed monotonicity, (F(X) − F(Y))(G(X) − G(Y)) ≥ 0 with probability 1, and in particular E((F(X) − F(Y))(G(X) − G(Y))) ≥ 0.

Tail formula for expectations. Let X be a non-negative random variable, and let F be its distribution function. Then

E(X) = ∫_0^∞ P(X > x) dx = ∫_0^∞ (1 − F(x)) dx. (1.2.13)

More generally, let X be any random variable, and let ϕ be a smooth non-decreasing function with ϕ(−∞) = lim_{x→−∞} ϕ(x) = 0. Then

E(ϕ(X)) = ∫_{−∞}^∞ ϕ'(x) P(X > x) dx. (1.2.14)

Sketch of the proof of (1.2.14). Write ϕ(X) = ∫_{−∞}^X ϕ'(x) dx = ∫_{−∞}^∞ ϕ'(x) 1_{X>x} dx. Then, interchanging the expectation and the integral (which is justified by the so called Tonelli Theorem),

E(ϕ(X)) = E(∫_{−∞}^∞ ϕ'(x) 1_{X>x} dx) = ∫_{−∞}^∞ ϕ'(x) E(1_{X>x}) dx = ∫_{−∞}^∞ ϕ'(x) P(X > x) dx.
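A numerical illustration of (1.2.13), assuming X ∼ Exp(λ) with λ = 2, so that E(X) = 1/λ (the grid and sample size below are arbitrary):

    import bisect
    import random

    lam, n = 2.0, 100_000
    xs = sorted(random.expovariate(lam) for _ in range(n))
    mean = sum(xs) / n

    # Riemann sum for the tail integral in (1.2.13): integral of P(X > x) dx
    dx = 0.01
    grid = [k * dx for k in range(int(10 / dx))]   # the tail beyond 10 is negligible here
    tail = sum(dx * (n - bisect.bisect_right(xs, x)) / n for x in grid)
    print(mean, tail, 1 / lam)  # all three should be close to 0.5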

Exercise 1.2.3. Average waiting time for the best offer. Let X1, X2, . . . be continuous i.i.d. random variables, say offers by different clients for the car you wish to sell. Define

N = inf{n > 1 : Xn > X1}.

That is, N describes the number of clients which will show up before there is an offer better than the one made by the very first client. Check that E(N) = ∞, but E(√N) < ∞.
Hint. Use the tail formula (1.2.13).
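Indeed, {N > n} is the event that X1 is the largest among X1, . . . , Xn, so P(N > n) = 1/n, and by (1.2.13) E(N) = Σn 1/n = ∞ while E(√N) < ∞. A simulation sketch, not from the notes (the cap is an artificial truncation just to keep the run finite):

    import math
    import random

    def waiting_time(cap=10**6):
        first = random.random()
        n = 1
        while n < cap:                   # cap only keeps the loop finite
            n += 1
            if random.random() > first:  # a better offer arrived
                break
        return n

    samples = [waiting_time() for _ in range(50_000)]
    print(sum(samples) / len(samples))                        # unstable / very large: E(N) = ∞
    print(sum(math.sqrt(s) for s in samples) / len(samples))  # stabilizes: E(√N) < ∞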

Inequalities for Expectations. There are two groups of inequalities which we shall use:

Cauchy-Schwarz. E(|XY|) ≤ √(E(X²) E(Y²)).

Hölder. For any p, q ≥ 1 satisfying 1/p + 1/q = 1, E(|XY|) ≤ (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.

Jensen. If ϕ is convex, then ϕ(EX) ≤ Eϕ(X).

A proof of the Hölder inequality is based on the so called Young inequality: for any two positive numbers a, b and any Hölder-conjugate powers p and q,

ab ≤ a^p/p + b^q/q. (1.2.15)

In its turn, (1.2.15) is just a manifestation of the concavity of log: for any t ∈ [0, 1],

log(t a^p + (1 − t) b^q) ≥ t log(a^p) + (1 − t) log(b^q).

It remains to take t = 1/p (and hence 1 − t = 1/q).

Markov. If X is a non-negative random variable, then for any a > 0,

P(X ≥ a) ≤ E(X)/a. (1.2.16)

In particular, for any random variable X and for any non-negative and strictly increasing (on supp(X)) function ϕ,

P(X ≥ a) ≤ Eϕ(X)/ϕ(a). (1.2.17)

Chebychev. This is a particular instance of (1.2.17): if E(X) is defined and finite, then

P(|X − E(X)| ≥ a) ≤ Var(X)/a². (1.2.18)

1.3 Modes of convergence.


The first of the two main examples (of convergence as n → ∞) we shall keep in
mind at this stage is
Example 1.3.1.

Xn = (1/n) Σ_{i=1}^n ξi, (LLN)

where ξ1, ξ2, . . . are i.i.d. (independent and identically distributed) random variables. Results about convergence of such Xn-s are called Laws of Large Numbers (LLN).
The second main example is

Example 1.3.2.

Xn = Σ_{i=1}^n ξi, (RS)

where the ξi-s are still independent, but not necessarily identically distributed. Here we talk about convergence of random series.

Here are a few more, perhaps familiar, examples which do not fall into the same category as above:

Example 1.3.3.

Xn = (1/(σ√n)) Σ_{i=1}^n ξi, (CLT)

where ξ1, ξ2, . . . are i.i.d. mean zero random variables with finite variance σ².

Example 1.3.4. Fix λ > 0, and let Xn be the number of successes in n Bernoulli trials with probability of success pn = λ/n.

Example 1.3.5. Fix r > 0 and consider a distribution of m = ⌊rn⌋ particles into n urns (or energy levels) labeled 1, . . . , n. Let Xn be the number of particles which land in urn number 1. Note that the random variables Xn have different properties depending on whether we consider the particles to be distinguishable (Maxwell-Boltzmann statistics) or identical (Bose-Einstein statistics).

Let us define several modes of convergence of a (general) sequence of random


variables {Xn } to some random variable X.

Almost-sure convergence. We say that limn→∞ Xn = X P-a.s. if


 
P lim Xn = X = 1.
n→∞

Convergence in probability. Let us say that p − lim_{n→∞} Xn = X if for any ε > 0,

lim_{n→∞} P(|Xn − X| ≤ ε) = 1.

Convergence in the mean. We say that limn→∞ Xn = X in mean, if

lim E (|Xn − X|) = 0.


n→∞

More generally, let r ≥ 1. We say that limn→∞ Xn = X in Lr (Ω, A, P) if

lim E (|Xn − X|r ) = 0.


n→∞

Convergence in distribution or weak convergence. We say that a sequence of random variables Xn with distribution functions Fn converges weakly to a random variable X with distribution function F, and write w − lim_{n→∞} Xn = X, if lim_{n→∞} Fn(x) = F(x) at every continuity point of F.

Remark 1.3.1. Note that, unlike all the other types of convergence defined above, convergence in distribution does not require that the random variables X1, X2, . . . , X be defined on the same probability space.

Remark 1.3.2. The following implications hold:


(a) limn→∞ Xn = X P-a.s. ⇒ p − limn→∞ Xn = X.
(b) limn→∞ Xn = X in Lr ⇒ p − limn→∞ Xn = X.
(c) If q ≥ r, then limn→∞ Xn = X in Lq ⇒ limn→∞ Xn = X in Lr .
(d) p − limn→∞ Xn = X ⇒ w − limn→∞ Xn = X.
(e) w − limn→∞ Xn = X ⇔ For any bounded and continuous function f

lim E (f (Xn )) = E (f (X)) .


n→∞

For instance, in order to check (a), pick a sequence εm ↓ 0. Then (recall (1.4.1)) lim_{n→∞} Xn = X P-a.s. is recorded as

P(∩m ∪N ∩_{n≥N} {|Xn − X| ≤ εm}) = 1.

Hence,

P(∪N ∩_{n≥N} {|Xn − X| ≤ εm}) = 1

for any m. Since εm decreases to zero, this implies that

P(∪N ∩_{n≥N} {|Xn − X| ≤ ε}) = 1

for any fixed ε > 0. But by σ-additivity,

P(∪N ∩_{n≥N} {|Xn − X| ≤ ε}) = lim_{N→∞} P(∩_{n≥N} {|Xn − X| ≤ ε}).

Since P(∩_{n≥N} {|Xn − X| ≤ ε}) ≤ P(|XN − X| ≤ ε), it follows that for any ε > 0,

lim_{N→∞} P(|XN − X| ≤ ε) = 1,

which precisely means that p − lim_{n→∞} Xn = X.

Remark 1.3.3. Note that (a)-(d) of Remark 1.3.2 imply that weak convergence is the weakest one, and convergence in
probability is the next weakest one. Exercise 1.7.5 below implies that in general arrows in (a),(b) and (d) cannot be inverted.
On the other hand the following fact (which we shall not prove at this stage), called Skorohod convergence Theorem, holds:
If w − limn→∞ Xn = X, then one can construct a probability space and a sequence of random variables Y1 , Y2 , . . . , Y
on it, such that for any n random variable Yn has the same distribution as Xn , Y has the same distribution as X, and
limn→∞ Yn = Y P-a.s.

In Example 1.3.4 there is convergence in distribution, w − lim_{n→∞} Xn = X, where X is a Poisson random variable, X ∼ Poi(λ). Indeed, the set RX of continuity points of FX is precisely R \ N0. If x ∈ RX is negative, then trivially

0 = FX(x) = lim_{n→∞} FXn(x).

If x ∈ RX is positive, then

FX(x) = Σ_{k<x} P(X = k) = Σ_{k=0}^{⌊x⌋} e^{−λ} λ^k/k!.

The above sum is finite. Since FXn(x) = Σ_{k≤x} P(Xn = k) for every n, it remains to check that

lim_{n→∞} P(Xn = k) = lim_{n→∞} \binom{n}{k} (λ/n)^k (1 − λ/n)^{n−k} = e^{−λ} λ^k/k!,

for every k = 0, 1, . . . . This, however, is straightforward.
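The "straightforward" limit can also be eyeballed numerically; a sketch comparing the Binomial(n, λ/n) probability function with its Poisson limit (the parameters are arbitrary):

    import math

    lam, k_max = 3.0, 10
    for n in (10, 100, 1000):
        p = lam / n
        row = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max)]
        print(n, [round(x, 4) for x in row])
    poisson = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_max)]
    print("Poi", [round(x, 4) for x in poisson])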

Exercise 1.3.1. Check that in Example 1.3.5 there is convergence in distribution, w − lim_{n→∞} Xn = X, where
(a) X ∼ Poi(r) in the case of Maxwell-Boltzmann statistics;
(b) X ∼ Geo0(1/(1 + r)) in the case of Bose-Einstein statistics.

The Central limit theorem concerns weak convergence of the normalized sums Xn in (CLT). Although these random variables Xn are defined on the same probability space, lim_{n→∞} Xn does not exist P-a.s. or in probability already in the simplest situations, when the ξi-s are 1/2-Bernoulli or even N(0, 1). Indeed, in the latter case

X_{4n} − Xn = (1/(2√n)) (− Σ_{i=1}^n ξi + Σ_{j=n+1}^{4n} ξj) ∼ N(0, 1),

which clearly does not tend to zero.

In fact, the following theorem, known as the law of the iterated logarithm (proved by Khinchin in the Bernoulli case, and by Kolmogorov in more generality), holds:

P-a.s. lim sup_{n→∞} Xn/√(log log n) = √2. (1.3.1)

Proving (1.3.1) is beyond the scope of this course, although we shall come rather close to it from the point of view of developing relevant techniques in Subsections 1.4-1.6.

1.4 Properties with probability 1


There are three basic properties with probability-1 which we shall repeatedly use:
two Borel-Cantelli lemmas and Kolmogorov’s 0 − 1 law.
A convenient way to record the event {lim_{n→∞} Xn = X} is

{lim_{n→∞} Xn = X} = ∩m ∪N ∩_{n≥N} {|Xn − X| ≤ εm}, (1.4.1)

where {εm} is any sequence satisfying εm ↓ 0, for instance εm = 2^{−m}.
In general, given a sequence of events {An}, let us define

lim sup_{n→∞} An = {An i.o.} = ∩N ∪_{n≥N} An and lim inf_{n→∞} An = {An a.b.f.} = ∪N ∩_{n≥N} An. (1.4.2)

Above, "i.o." stands for infinitely often and "a.b.f." stands for all but finitely many.
In the above notation, (1.4.1) reads as

{lim_{n→∞} Xn = X} = ∩m {|Xn − X| ≤ εm a.b.f.},

which means that we are talking about the set of those ω ∈ Ω such that for any ε > 0, |Xn(ω) − X(ω)| ≤ ε as soon as n is sufficiently large.
Exercise 1.4.1. (a) Check that for any collection of events {An} it always holds that lim inf_{n→∞} An ⊆ lim sup_{n→∞} An. Construct an example where they are equal, and one where they are different.
(b) Check that

{An i.o.}ᶜ = {Anᶜ a.b.f.}.

(c) Check that for any sequence of events An the following holds:

P(lim sup_{n→∞} An) ≥ lim sup_{n→∞} P(An) and P(lim inf_{n→∞} An) ≤ lim inf_{n→∞} P(An). (1.4.3)

(d) Assume that A_{2n} = A and A_{2n+1} = B. Check that lim sup_{n→∞} An = A ∪ B, whereas lim inf_{n→∞} An = A ∩ B.

First Borel-Cantelli Lemma.

Lemma 1.4.1. Let {An} be a sequence of events (defined on the same probability space). If Σn P(An) < ∞, then

P(An i.o.) = 0. (1.4.4)

Proof. Note that by definition (1.4.2),

P(An i.o.) = P(∩N ∪_{n≥N} An) ≤ P(∪_{n≥N} An)

for any N ∈ N. Therefore,

P(An i.o.) ≤ inf_N P(∪_{n≥N} An).

But by Exercise 1.1.3, P(∪_{n≥N} An) ≤ Σ_{n≥N} P(An). Hence, if the series Σn P(An) converges, the infimum above equals zero.

Note that in general a converse to the First Borel-Cantelli Lemma (that is, Σn P(An) = ∞ ⇒ P(An i.o.) = 1) cannot be true. For instance, if we take An ≡ A for some event A, then {An i.o.} = A and Σn P(An) = ∞ whenever P(A) > 0.

Example 1.4.1. The simple random walk on Z^d is defined as follows: let ξ1, ξ2, . . . be i.i.d. Z^d-valued random variables (steps), such that

P(ξi = ±eℓ) = 1/(2d) for ℓ = 1, . . . , d.

Above, eℓ is the unit vector in the ℓ-th coordinate direction. The position Sn of the random walk at time n is defined via S0 = 0 and

Sn = ξ1 + ξ2 + · · · + ξn.

It is possible to check that, along even times n,

P(Sn = 0) ∼ 1/n^{d/2}. (1.4.5)

Since Σn n^{−d/2} < ∞ for d ≥ 3, the first Borel-Cantelli Lemma implies that

P(Sn = 0 i.o.) = 0 (1.4.6)

in any dimension d ≥ 3, which means that in any dimension d ≥ 3 the simple random walk is transient. Namely, with probability one it returns to the origin at most a finite number of times.
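A finite-horizon simulation cannot prove (1.4.6), but it makes the dichotomy visible; a sketch (horizon and repetition counts are arbitrary) counting returns to the origin in dimensions d = 1, 2, 3:

    import random

    def returns_to_origin(d, steps=20_000):
        pos, count = [0] * d, 0
        for _ in range(steps):
            i = random.randrange(d)           # pick a coordinate direction
            pos[i] += random.choice((-1, 1))  # step +/- e_i
            if all(x == 0 for x in pos):
                count += 1
        return count

    for d in (1, 2, 3):
        avg = sum(returns_to_origin(d) for _ in range(20)) / 20
        print(d, avg)  # grows with the horizon for d = 1, 2; stays bounded for d = 3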

Second Borel-Cantelli Lemma.

Lemma 1.4.2. Let {An} be a sequence of independent events (defined on the same probability space). If Σn P(An) ≜ Σn pn = ∞, then

P(An i.o.) = 1. (1.4.7)

Remark 1.4.1. Note that the two Borel-Cantelli lemmas, Lemma 1.4.1 and Lemma 1.4.2, imply: let {An} be a sequence of independent events. Then P(An i.o.) = 1 (respectively 0) ⇔ Σn P(An) = ∞ (respectively < ∞).

Proof of the Second Borel-Cantelli Lemma. Recall (Exercise 1.4.1) that {An i.o.}ᶜ = {Anᶜ a.b.f.}. But, by Exercise 1.1.2,

P(Anᶜ a.b.f.) = lim_{N→∞} P(∩_{n≥N} Anᶜ) = lim_{N→∞} lim_{M→∞} P(∩_{n=N}^M Anᶜ).

Since the {An} are independent,

P(∩_{n=N}^M Anᶜ) = Π_{n=N}^M (1 − pn) = exp{Σ_{n=N}^M ln(1 − pn)}.

However, for any x ∈ [0, 1),

ln(1 − x) = −∫_0^x du/(1 − u) ≤ −x.

We conclude that for any N ≤ M,

P(∩_{n≥N} Anᶜ) ≤ P(∩_{n=N}^M Anᶜ) ≤ exp{−Σ_{n=N}^M pn}.

Since by assumption Σ_{n≥N} pn = ∞ for every N, it follows that

P(Anᶜ a.b.f.) = 0.

Exercise 1.4.2. Let An be a sequence of independent events such that

0 < P(An) < 1

for every n. Check that

P(∪_{n=1}^∞ An) = 1 ⇔ P(∪_{n=N}^∞ An) = 1 ∀N ⇔ P(An i.o.) = 1 ⇔ Σn P(An) = ∞. (1.4.8)

Solution to Exercise 1.4.2. Σn P(An) = ∞ ⇒ P(An i.o.) = 1 is the second BC Lemma. P(An i.o.) = 1 ⇒ P(∪_{n=N}^∞ An) = 1 ∀N because {An i.o.} ⊆ ∪_{n=N}^∞ An for any N. The implication P(∪_{n=N}^∞ An) = 1 ∀N ⇒ P(∪_{n=1}^∞ An) = 1 is trivial. So it remains to show that P(∪_{n=1}^∞ An) = 1 ⇒ Σn P(An) = ∞. To this end one needs to check the following: if pn ∈ (0, 1) is a sequence of probabilities, then

Π_{n=1}^∞ (1 − pn) = 0 ⇔ Σn pn = ∞. (1.4.9)

Since 1 − pn ≤ e^{−pn}, the implication Σn pn = ∞ ⇒ Π_{n=1}^∞ (1 − pn) = 0 follows as in the proof of the second BC lemma. Assume now that Σn pn < ∞. Then there exists n0 such that pn ≤ 1/2 for all n ≥ n0. But if pn ≤ 1/2, then

log(1 − pn) = −∫_0^{pn} dt/(1 − t) ≥ −2pn.

Hence Π_{n≥n0} (1 − pn) ≥ exp{−2 Σ_{n≥n0} pn} > 0. Therefore

P(∪An) = 1 − P(∩Anᶜ) ≤ 1 − Π_{n=1}^{n0−1} (1 − P(An)) · exp{−2 Σ_{n≥n0} pn} < 1

if Σn pn < ∞.
Exercise 1.4.3. Let {Xn} be a sequence of independent random variables.
(a) Check that

P(lim sup_{n→∞} Xn = ∞) = 1 ⇔ Σn P(Xn > K) = ∞ ∀K.

(b) Check that

P(lim_{n→∞} Xn = ∞) = 0 ⇔ ∃K such that Σn P(Xn ≤ K) = ∞.

(c) Let {Xn} be a sequence of i.i.d. non-negative random variables. Check that if E(Xi) < ∞, then lim_{n→∞} Xn/n = 0 P-a.s. On the other hand, check that if E(Xi) = ∞, then p − lim_{n→∞} Xn/n = 0, but

P(lim sup_{n→∞} Xn/n = ∞) = 1.

Hint. Use (1.2.13).
(d) Let ξ1, ξ2, . . . be a sequence of i.i.d. exponential random variables, that is, there exists λ > 0 such that P(ξ > t) = e^{−λt} for any t ≥ 0. Check that

P(lim sup_{n→∞} ξn/log n = 1/λ) = 1.

(e) Prove that for any random variable X and any sequence of positive numbers cn satisfying lim_{n→∞} cn = ∞, the following holds:

P(lim_{n→∞} X/cn = 0) = 1.

(f) With the ξi-s as in (d), define Xn = max_{1≤ℓ≤n} ξℓ. Show that

P(lim_{n→∞} Xn/log n = 1/λ) = 1.

Hint. Check that for any ε > 0,

{lim sup_{n→∞} Xn/log n > (1 + ε)/λ} ⊆ {ξn/log n > (1 + ε)/λ i.o.} ∪ (∪m {lim_{n→∞} Xm/log n > 0}),

and then rely on (d) and (e).
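Part (f) is easy to visualize; a sketch with λ = 1 (the checkpoints are arbitrary) tracking Xn/log n = max_{ℓ≤n} ξℓ / log n:

    import math
    import random

    lam, n = 1.0, 200_000
    running_max, checkpoints = 0.0, []
    for i in range(1, n + 1):
        running_max = max(running_max, random.expovariate(lam))
        if i in (10**3, 10**4, 10**5, 2 * 10**5):
            checkpoints.append((i, running_max / math.log(i)))
    print(checkpoints)  # the ratios approach 1/lam = 1.0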


Let ξ1 , ξ2 , . . . be a sequence of random variables defined on the same probability space (Ω, A, P).

Tail σ-algebra and Kolmogorov's 0 − 1 Law. Define Tn = σ(ξn, ξn+1, . . .). In other words, Tn is the σ-algebra generated by the random variables ξn, ξn+1, . . . . Then define the tail σ-algebra

T∞ = ∩n Tn. (1.4.10)

Lemma 1.4.3. If {ξn } are independent, then T∞ is trivial in the following sense: For any A ∈ T∞

Either P(A) = 1 or P(A) = 0. (1.4.11)


Proof. The logic of Kolmogorov's 0 − 1 law is that any event A ∈ T∞ should be independent of itself, which is equivalent to saying that P(A) ∈ {0, 1}. Assume, to the contrary, that 0 < P(A) < 1. Then the conditional probability

P(B|A) ≜ P(AB)/P(A)

is well defined. Set An = σ(ξ1, . . . , ξn). By assumption, A is independent of An for any n ∈ N. Hence

P(B|A) = P(B) for any B ∈ ∪n An.

Hence P(·|A) is σ-additive on the algebra ∪n An, and by Caratheodory's theorem it should coincide with P on A∞ = σ(ξ1, ξ2, . . .) ⊆ A. But T∞ ⊆ A∞, which means that P(A|A) = P(A). This is a contradiction, since P(A|A) = 1 while P(A) ∈ (0, 1).

Exercise 1.4.4. Let {ξn} be a sequence of i.i.d. random variables. Consider the normalized sums Xn defined in (LLN). Check that for any number a ∈ R ∪ {∞} the probability P(limn Xn = a) is either 0 or 1. The same holds for the probabilities P(lim supn Xn ≥ a) and P(lim supn Xn < a).

1.5 Convergence of random series


In this subsection we study convergence of random series in (RS).
Let us start with general considerations. Let {Xn} be any sequence of random variables defined on the same probability space (Ω, A, P). The random variables

lim inf_{n→∞} Xn ≜ lim_{N→∞} inf_{n≥N} Xn and lim sup_{n→∞} Xn ≜ lim_{N→∞} sup_{n≥N} Xn (1.5.1)

are always defined, albeit they may take values ±∞. Here is a necessary and sufficient condition for the P-a.s. existence of a finite limit X = lim_{n→∞} Xn:

Proposition 1.5.1. A sequence {Xn} is P-a.s. convergent to a finite (and in general random) limit iff

P(lim sup_{n,m→∞} |Xn − Xm| > ε) = 0 for every ε > 0. (1.5.2)

In other words, {Xn} is P-a.s. convergent iff it is P-a.s. fundamental.

Proof. Assume that X = lim_{n→∞} Xn P-a.s. Recall Exercise 1.4.1, which states that this happens iff

P(|Xn − X| < ε a.b.f.) = 1 (1.5.3)

for any ε > 0. Therefore,

P(lim sup_{n→∞} |Xn − X| < ε) = 1.

But |Xn − Xm| ≤ |Xn − X| + |Xm − X|. Hence,

P(lim sup_{n,m→∞} |Xn − Xm| < 2ε) = 1

for any ε > 0. This is precisely (1.5.2).
Assume now that (1.5.2) holds. Set X = lim sup_{n→∞} Xn. Because of (1.5.2), the random variable X is P-a.s. finite. But

lim sup_{n→∞} |Xn − X| ≤ lim sup_{n,m→∞} |Xn − Xm|.

Hence (1.5.3).

At this point let us go back to random series (RS).


Kolmogorov-Khinchin theorem.

Theorem 1.5.1. Let {ξn} be a sequence of zero mean (Eξn = 0) independent random variables with Σn Var(ξn) < ∞. Then the random series Xn in (RS) converges P-a.s.

Proof. The proof relies on the following inequality:

Kolmogorov's Maximal Inequality. Let ξ1, ξ2, . . . be zero mean independent random variables. Set Xn = Σ_{i=1}^n ξi. Then for any ε > 0,

P(max_{1≤k≤n} |Xk| ≥ ε) ≤ E(Xn²)/ε² = Var(Xn)/ε². (1.5.4)

Let us proceed assuming (1.5.4). By Proposition 1.5.1 we need to check (1.5.2). Now,

P(lim sup_{ℓ,m→∞} |Xℓ − Xm| > ε) = lim_{n→∞} P(sup_{ℓ,m≥n} |Xℓ − Xm| > ε) ≤ lim_{n→∞} P(sup_{ℓ≥n} |Xℓ − Xn| > ε/2).

In the last inequality we just used that |Xℓ − Xm| ≤ |Xℓ − Xn| + |Xm − Xn|. On the other hand (writing ℓ = n + k for ℓ > n),

P(sup_{ℓ≥n} |Xℓ − Xn| > ε/2) = lim_{K→∞} P(max_{1≤k≤K} |X_{n+k} − Xn| > ε/2).

At this point we infer by (1.5.4):

P(max_{1≤k≤K} |X_{n+k} − Xn| > ε/2) ≤ 4 Σ_{i=n+1}^{n+K} E(ξi²)/ε².

Hence,

P(sup_{ℓ≥n} |Xℓ − Xn| > ε/2) ≤ 4 Σ_{i=n+1}^∞ E(ξi²)/ε².

Since, by assumption, Σi E(ξi²) < ∞, the right hand side above converges to zero as n → ∞, and we are home.
It remains to prove (1.5.4). Define A = {max_{1≤k≤n} |Xk| ≥ ε}, and for each k = 1, 2, . . . , n define

Ak = {|X1| < ε, . . . , |X_{k−1}| < ε; |Xk| ≥ ε}.

In other words, the event Ak occurs if |Xℓ| ≥ ε for the first time at index ℓ = k. Clearly, the Ak-s are disjoint, and A = ∪_{k=1}^n Ak, which means that 1A = Σ_{k=1}^n 1Ak. Now,

E(Xn²) ≥ E(Xn² 1A) = Σ_{k=1}^n E(Xn² 1Ak).

For each k = 1, . . . , n:

E(Xn² 1Ak) = E((Xk + Σ_{i=k+1}^n ξi)² 1Ak)
= E(Xk² 1Ak) + 2 E(Xk 1Ak Σ_{i=k+1}^n ξi) + E(1Ak (Σ_{i=k+1}^n ξi)²).

The last term above is non-negative. Since Σ_{i=k+1}^n ξi and Xk 1Ak are independent, the second term above is zero. Finally, since by construction |Xk| ≥ ε on Ak, the first term above is bounded below by ε² E(1Ak) = ε² P(Ak). We therefore conclude:

E(Xn²) ≥ ε² Σ_{k=1}^n P(Ak) = ε² P(A).

This is (1.5.4).
Exercise 1.5.1. (a) Let {ξn} be a sequence of independent random variables such that E(ξn) and Var(ξn) are defined and finite for every n. Check that if both Σn E(ξn) and Σn Var(ξn) converge, then the random series Xn in (RS) converges P-a.s.
(b) Find an example where X in (RS) is P-a.s. convergent, but Σn Var(ξn) = ∞.
(c) Wiener process on [0, 1]: Let ξ0, ξ1, . . . be i.i.d. standard normal N(0, 1) random variables. For each t ∈ [0, 1] set

Wt = ξ0 t + √2 Σ_{n=1}^∞ ξn sin(nπt)/(nπ). (1.5.5)

Check that the random variables Wt are well defined for each t ∈ [0, 1].
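For a fixed t, the series (1.5.5) is exactly of the form covered by Theorem 1.5.1, since Σn 2 sin²(nπt)/(nπ)² < ∞. A truncated simulation sketch (truncation level and repetition count are arbitrary) checking that Var(Wt) ≈ t:

    import math
    import random

    def W(t, n_terms=1_000):
        # partial sum of the series (1.5.5)
        s = random.gauss(0.0, 1.0) * t
        for n in range(1, n_terms + 1):
            s += math.sqrt(2) * random.gauss(0.0, 1.0) * math.sin(n * math.pi * t) / (n * math.pi)
        return s

    t, reps = 0.5, 2_000
    vals = [W(t) for _ in range(reps)]
    print(sum(v * v for v in vals) / reps, t)  # the empirical variance should be close to t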
The following most general theorem about convergence of random series is given without proof.

Kolmogorov's three-series theorem.

Theorem 1.5.2. Let {ξn} be a sequence of independent random variables. A necessary and sufficient condition for the convergence of the random series Xn in (RS) is the convergence of each of the three (non-random) series below:

Σn P(|ξn| > r), Σn E(ξn 1_{|ξn|≤r}) and Σn Var(ξn 1_{|ξn|≤r}) (1.5.6)

for some (equivalently, for all) r > 0.


1.6 Law of large numbers
Let ξ1, ξ2, . . . be i.i.d. random variables. Consider Xn as defined in (LLN).

Strong Law of Large Numbers.

Theorem 1.6.1. Assume that E(ξ) = µ ∈ [−∞, ∞] is defined. Then

P(lim_{n→∞} Xn = µ) = 1. (1.6.1)

Proof. There are several cases to be considered:
CASE 1: E(ξ⁴) < ∞ (in particular E|ξ| < ∞ and hence µ ∈ R).
There is no loss of generality to assume that µ = 0. Indeed,

lim_{n→∞} (1/n) Σ_{i=1}^n ξi = µ ⇔ lim_{n→∞} (1/n) Σ_{i=1}^n (ξi − µ) = 0,

and the variables ξ − µ have zero mean.

Exercise 1.6.1. Check (using the Cauchy-Schwarz inequality) that if an i.i.d. sequence {ξi} is such that E(ξ) = 0 and E(ξi⁴) < ∞, then there exists a finite constant c < ∞ such that

E(((1/n) Σ_{i=1}^n ξi)⁴) ≤ c/n², (1.6.2)

for any n ∈ N.

By the generalized Markov inequality (1.2.17),

P(|(1/n) Σ_{i=1}^n ξi| ≥ ε) ≤ c/(n² ε⁴)

for any n and any ε > 0. By the First Borel-Cantelli Lemma,

P(|(1/n) Σ_{i=1}^n ξi| ≥ ε i.o.) = 0

for any ε > 0. Hence P(lim_{n→∞} (1/n) Σ_{i=1}^n ξi = 0) = 1.
CASE 2: E(ξ²) < ∞ (in particular E|ξ| < ∞ and hence µ ∈ R).
We proceed to assume that E(ξi) = 0. As in (1.6.2) one may check that

E(((1/n) Σ_{i=1}^n ξi)²) ≤ c/n.

Since, however, Σ_{n=1}^∞ 1/n = ∞, we cannot directly rely on the first Borel-Cantelli Lemma. In the sequel, Xn = (1/n) Σ_{i=1}^n ξi.

Exercise 1.6.2. (a) Fix a number α > 1, and check that P(lim_{n→∞} X_{⌊n^α⌋} = 0) = 1.
(b) Check that for any ε > 0,

P(max_{⌊n^α⌋ ≤ k < ⌊(n+1)^α⌋} |Xk − X_{⌊n^α⌋}| ≥ ε i.o.) = 0.

(c) Deduce from (a) and (b) above the statement of the LLN in CASE 2.

CASE 3: E|ξ| < ∞ and hence µ ∈ R.
We shall work out this case later (as an application of the martingale convergence theorem).
CASE 4: µ = ±∞.
We shall consider µ = ∞; the treatment of µ = −∞ is the same. Write the decomposition into positive and negative parts as

ξi = ξi 1_{ξi≥0} + ξi 1_{ξi<0} = ξi⁺ − ξi⁻.

The variables ξ± are non-negative, and E(ξ) = ∞ means that E(ξ⁻) < ∞, whereas E(ξ⁺) = ∞. Thus Xn⁻ = (1/n) Σ_{i=1}^n ξi⁻ falls within the framework of CASE 3 above. We just need to show that

P(lim_{n→∞} (1/n) Σ_{i=1}^n ξi⁺ = ∞) = 1. (1.6.3)
Exercise 1.6.3. (a) Use the tail formula (1.2.13) to show the following: if Y is a non-negative random variable, then

Σ_{n=0}^∞ P(Y > n) ≥ E(Y) ≥ Σ_{n=1}^∞ P(Y > n). (1.6.4)

(b) Show that if the ξi⁺ are i.i.d. non-negative random variables with E(ξi⁺) = ∞, then for any C > 0 the probability P(ξn⁺ ≥ Cn i.o.) = 1. Conclude that

P(lim sup_{n→∞} (1/n) Σ_{i=1}^n ξi⁺ = ∞) = 1.

(c) In the conditions of (b) above let M > 0. Consider the (i.i.d.) random variables min{ξi⁺, M} = ξi⁺ ∧ M ∈ [0, M]. Use the LLN to check that for any M,

P(lim inf_{n→∞} (1/n) Σ_{i=1}^n ξi⁺ ≥ E(ξ⁺ ∧ M)) = 1.

Use the lower bound in (1.6.4) to check that lim_{M→∞} E(ξ⁺ ∧ M) = ∞, and complete the proof of the LLN in CASE 4.

Several classical examples related to laws of large numbers are listed below:
Exercise 1.6.4. Bernstein polynomials. Let f be a continuous function on [0, 1]. For any p ∈ [0, 1] consider ξ1, ξ2, . . . i.i.d. Bernoulli(p) random variables. Set Sn = Σ_{i=1}^n ξi. Let Pp be the corresponding probability. Consider

Ep f(Sn/n) = Σ_{k=0}^n \binom{n}{k} f(k/n) p^k (1 − p)^{n−k}. (1.6.5)

The expression in (1.6.5) is a polynomial of order n (in the real variable p ∈ [0, 1]), the so called Bernstein polynomial. Use the Chebychev bound employed in the proof of the LLN to check that

lim_{n→∞} max_{p∈[0,1]} |f(p) − Ep f(Sn/n)| = 0. (1.6.6)

Hint. Rely on the following property of continuous functions: if f is continuous on [0, 1], then there exists a (continuous) function δf with δf(0) = 0, called the modulus of continuity of f, such that for any t, s ∈ [0, 1],

|f(t) − f(s)| ≤ δf(|t − s|).
Exercise 1.6.5. Monte-Carlo integration. Let f be a continuous function on [0, 1] and let U1, U2, . . . be i.i.d. Uni[0, 1] random variables. Then, P-a.s.,

lim_{n→∞} (1/n) Σ_{i=1}^n f(Ui) = ∫_0^1 f(t) dt. (1.6.7)
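A sketch of (1.6.7) in action, with the illustrative choice f = sin:

    import math
    import random

    f = math.sin
    for n in (100, 10_000, 1_000_000):
        est = sum(f(random.random()) for _ in range(n)) / n
        print(n, est)
    print("exact", 1 - math.cos(1))  # the integral of sin(t) over [0, 1]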

Exercise 1.6.6. Let U1, U2, . . . be i.i.d. Uni[0, 2] random variables. Define Wn = Π_{i=1}^n Ui. Check that lim_{n→∞} Wn = 0 with probability one. Compute lim_{n→∞} (Wn)^{1/n}.
Exercise 1.6.7. Long term optimal investment problem (following Durrett). Assume that each $1 you invest in bonds in the beginning of any month yields a fixed sum $a at the end of this month. On the other hand, each $1 invested in stocks in the beginning of months 1, 2, 3, . . . yields $V1, $V2, $V3, . . . at the end of the corresponding months, where the Vn-s are i.i.d. positive random variables.
The problem is to choose the optimal proportion p ∈ [0, 1] of money to be invested into stocks. Once such p is chosen, each dollar invested in the beginning of the first month yields Wn = Π_{i=1}^n (a(1 − p) + pVi) at the end of month n.
Assume that there exists ε > 0 such that P(ε ≤ Vi ≤ ε^{−1}) = 1.
(a) Show that for any fixed p ∈ [0, 1] the limit

lim_{n→∞} (1/n) log Wn ≜ φ(p)

exists and is finite P-a.s.
(b) Check that if E(V) > a and E(V^{−1}) > a^{−1}, then there is a non-trivial optimal investment p* ∈ (0, 1).
Hint. Check that φ is a concave function: φ''(p) ≤ 0.
(c) Use the Jensen inequality to check that E(V^{−1}) ≤ a^{−1} implies that E(V) ≥ a.
(d) What is the optimal strategy if either E(V) ≤ a or E(V^{−1}) ≤ a^{−1}?
1.7 Further Exercises
Exercise 1.7.1. Let {Aα} be a family (not necessarily countable) of σ-algebras of subsets of Ω. Check that

A = {A ⊆ Ω : A ∈ Aα for every α} = ∩α Aα

is a σ-algebra. Deduce that, given a collection A0, the minimal σ-algebra A ⊇ A0 is well defined via

A ≜ σ(A0) ≜ ∩_{B⊇A0} B,

where the intersection is over all σ-algebras B which contain A0.


Exercise 1.7.2. Show that the Borel σ-algebra contains all open intervals (a, b) ⊆ [0, 1]. Define BR as the σ-algebra of all Borel subsets of R, and check that

B[0,1] = {B ∩ [0, 1] : B ∈ BR}.

Exercise 1.7.3. Check the following statements:
(a) X is a random variable on a probability space (Ω, A, P) if and only if {X ∈ B} = {ω : X(ω) ∈ B} ∈ A for any Borel set B ∈ BR.
(b) If X, Y are random variables (on the same probability space, of course), then X + Y and XY are also random variables.
(c) If X1, X2, . . . are random variables, then lim sup Xn and lim inf Xn are random variables.
(d) If X is a random variable and f is a continuous function, then Y = f(X) is a random variable.
(e) If X is a random variable, then for each n

X̲n = Σ_{m∈Z} (m/2^n) 1_{m/2^n ≤ X < (m+1)/2^n} and X̄n = Σ_{m∈Z} ((m+1)/2^n) 1_{m/2^n < X ≤ (m+1)/2^n} (1.7.1)

are also random variables, and

P(X̲n ≤ X ≤ X̄n ≤ X̲n + 2^{−n}) = 1. (1.7.2)

(f) If X is a discrete random variable, then (1.2.1) implies (1.2.2).
For instance, (a) follows from

{X < a} = ∪n {X ≤ a − 1/n},

which means that {ω : X(ω) ∈ [a, b]} belongs to A for any interval [a, b] ⊂ R. The latter intervals generate BR. Since the family

{B ⊆ R : {ω ∈ Ω : X(ω) ∈ B} ∈ A}

is a σ-algebra, it must contain BR.
Exercise 1.7.4. Consider the random variables X̲n and X̄n constructed in (1.7.1).
(a) Check that for each n, E(X̲n) and E(X̄n) are either both defined or both undefined.
(b) Check that if E(X̲n) (or E(X̄n)) is defined for some n, then it is defined for all n, and then

lim_{n→∞} E(X̲n) = lim_{n→∞} E(X̄n) ≜ E(X) (1.7.3)

is defined.
(c) Check that if P(X ≤ Y) = 1 and the expectations E(X) and E(Y) are defined, then E(X) ≤ E(Y).
(d) Using the above definition (1.7.3), check that if X is a random variable with continuous density function f, then the expectation E(X) is indeed given by (1.2.11).
(e) Check that if X has the Cauchy distribution, that is f(x) = 1/(π(1 + x²)), then E(X) is not defined.

Hint for (c): Check that if both X and Y are discrete, then one can find a partition {Ci} of Ω such that

X(ω) = Σi xi 1Ci(ω) and Y(ω) = Σi yi 1Ci(ω),

with, of course, xi ≤ yi for any i.

Exercise 1.7.5. Find examples for:
(a) p − lim_{n→∞} Xn = 0, but lim_{n→∞} Xn does not exist P-a.s.
(b) lim_{n→∞} Xn = 0 P-a.s., but E(|Xn|) = 1 for any n (and hence Xn does not converge to zero in the mean).
(c) lim_{n→∞} Xn = 0 in Lr, but lim Xn does not exist P-a.s.
(d) Random variables Xn are defined on the same probability space, lim Xn exists in distribution, but Xn does not converge in probability.
(e) A random sequence {Xn} of continuous random variables which weakly converges to a discrete random variable X, such that lim_{n→∞} FXn(x) ≠ FX(x) at any point of discontinuity x of FX.

Exercise 1.7.6. Use the first Borel-Cantelli lemma to prove the following statement: if lim_{n→∞} Xn = X in probability, then there exists a sub-sequence nk such that lim_{k→∞} Xnk = X P-a.s.

Exercise 1.7.7. Check that a sequence of random variables {Xn} converges in probability iff for any ε > 0,

lim_{m,n→∞} P(|Xn − Xm| > ε) = 0. (1.7.4)

In other words, {Xn} converges in probability iff it is fundamental in probability.
Hint: Prove that if {Xn} is fundamental in probability, then there exists a subsequence {Xnk} which is P-a.s. fundamental.

Exercise 1.7.8. Let {ξi} be a sequence of i.i.d. ±1-valued random variables with P(ξn = 1) = P(ξn = −1) = 1/2. Let a1, a2, a3, . . . be a sequence of (non-random) numbers. Define

Yn = Σ_{ℓ=1}^n aℓ ξℓ.

Use Theorem 1.5.2 to show that the random series Yn converges iff Σn an² < ∞.

Exercise 1.7.9. Prove the following general version of Exercise 1.5.1, which gives a general construction of Brownian motion on [0, 1]: Let ψ0, ψ1, . . . be a complete orthonormal basis of L2(0, 1), that is, any f ∈ L2(0, 1) can be represented as

f = Σ_{k=0}^∞ ⟨f, ψk⟩ ψk, where for g, h ∈ L2(0, 1), ⟨g, h⟩ = ∫_0^1 g(t) h(t) dt.

Let ξ0, ξ1, . . . be i.i.d. standard normal N(0, 1) random variables. Then the series

Bt = Σ_{k=0}^∞ ξk ∫_0^t ψk(s) ds (1.7.5)

converges P-a.s. for each t ∈ [0, 1]. Moreover, for any t, s ∈ [0, 1],

Cov(Bt, Bs) = min{s, t} = s ∧ t.

Hint. Define It(s) = 1_{s≤t} and note that ∫_0^t ψk(s) ds = ⟨It, ψk⟩. Since {ψk} is a complete orthonormal basis, ∫_0^1 f(s)² ds = Σ_{k=0}^∞ ⟨f, ψk⟩² for any f ∈ L2(0, 1), in particular for f = It.

Exercise 1.7.10. Consider the complete graph KN on N vertices. Complete means that any two vertices i, j ∈ KN are connected by an edge, so there are \binom{N}{2} edges. A 2-coloring of KN is an assignment of one of two given colors, say red and blue, to each edge. So there are 2^{\binom{N}{2}} different 2-colorings of KN. A clique C of size M is a complete sub-graph on M vertices of KN; so any clique of size M is, as a graph, isomorphic to KM. Given a coloring of KN, let us say that a clique C of size M is monochromatic if all of its \binom{M}{2} edges are painted in the same color. One of Ramsey's questions is: given M < N, does there exist a 2-coloring of KN such that there are no monochromatic M-cliques?

Clearly, the smaller M is, the more difficult it is to color KN without creating monochromatic M-cliques. For small M-s and large N-s this might even be impossible. Prove the following probabilistic lower bound: if

M > 2(log2 N + 1),

then there exists a coloring of KN without monochromatic M-cliques.

Hint. Color all the edges of KN independently and estimate the probability that there exists a monochromatic M-clique. Think about what it means if this probability happens to be less than one.
2 Renewal theory in discrete time.
2.1 The setup.
Let T1, T2, . . . be independent random variables. We assume that T1 is N0-valued, and that T2, T3, . . . are identically distributed N ∪ {∞}-valued random variables. In the sequel we use the following notation for the probability functions f of T1 and p of the Ti, i ≥ 2:

fℓ = P(T1 = ℓ) and pℓ = P(Ti = ℓ) for ℓ = 0, 1, 2, 3, . . . , ∞. (2.1.1)

By our assumptions f∞ = 0 and p0 = 0. Next, we set

Sk = Σ_{i=1}^k Ti ∈ N0, V(n) = Σ_{k=1}^∞ 1_{Sk=n} and v(n) = E V(n) = P(∃ k : Sk = n). (2.1.2)

Interpretation. The above quantities have the following interpretation: we think in terms of an arrival/renewal process in discrete time. S1, S2, S3, . . . are arrival times. V(n) is the indicator that there was an arrival at time n, and v(n) is the corresponding probability.
T1 is the time until the first arrival. If P(T1 = 0) = 1, then we are talking about a zero-delayed renewal process, or simply a renewal process. If T1 is non-trivial, P(T1 > 0) > 0, then we are talking about a renewal process delayed by T1.
The number of arrivals until time n and its expected value are denoted by

M(n) = Σ_{k=0}^n V(k) = Σ_{k=1}^∞ 1_{Sk≤n} and m(n) = E M(n) = Σ_{k=0}^n v(k).

If p∞ = 0 (respectively p∞ > 0), then we are talking about a proper (respectively defective) renewal process.

Example 2.1.1. Bernoulli process of arrivals. Let {V(n)} be i.i.d. Bernoulli random variables with probability of success p. This corresponds to a delayed renewal process with T1 ∼ Geo0(p) and Ti ∼ Geo(p).

2.2 Renewal equation and renewal theorem.


Renewal equation is a recursion which is based on independence assumptions we
made.
36 CONTENTS
Theorem 2.2.1. The sequences v(n) and m(n) satisfy the following recursions:

$$v(n) = \sum_{\ell=0}^{n} f_\ell\, v^0(n-\ell) = f_n + \sum_{\ell=0}^{n-1} v(\ell)\, p_{n-\ell}$$
and
$$m(n) = \sum_{\ell=0}^{n} f_\ell\, m^0(n-\ell) = \sum_{\ell=0}^{n} f_\ell + \sum_{\ell=1}^{n} m(n-\ell)\, p_\ell, \qquad (2.2.1)$$

where v^0(n) is the probability of an arrival and, respectively, m^0(n) is the expected number of arrivals for the un-delayed (T_1 = 0) renewal sequence.
In the defective case set

$$\xi = \max\{S_k : S_k < \infty\} \quad\text{and}\quad \rho(n) = P(\xi \le n). \qquad (2.2.2)$$

In other words, ξ is the time of the last renewal. Then,

$$\rho(n) = \sum_{k=0}^{n} f_k\, \rho^0(n-k), \qquad (2.2.3)$$

where ρ^0 is the corresponding probability for the un-delayed renewal sequence.


Proof of Theorem 2.2.1. Write

$$V(n) = \sum_{k=1}^{\infty} \mathbb{1}_{\{S_k = n\}} = \mathbb{1}_{\{T_1 = n\}} + \sum_{k=2}^{\infty} \mathbb{1}_{\{T_1 < n\}} \mathbb{1}_{\{S_k = n\}} = \mathbb{1}_{\{T_1 = n\}} + \sum_{k=2}^{\infty} \mathbb{1}_{\{S_{k-1} < n\}} \mathbb{1}_{\{S_k = n\}}. \qquad (2.2.4)$$

Now, for any k ≥ 2,

$$\mathbb{1}_{\{T_1 < n\}} \mathbb{1}_{\{S_k = n\}} = \sum_{\ell=0}^{n-1} \mathbb{1}_{\{T_1 = \ell\}} \mathbb{1}_{\{S_k - T_1 = n-\ell\}}.$$

However, S^0_k = S_k − T_1 is exactly the k-th arrival time of the zero-delayed renewal sequence. Therefore,

$$\mathbb{E}\Big(\sum_{k} \sum_{\ell=0}^{n-1} \mathbb{1}_{\{T_1 = \ell\}} \mathbb{1}_{\{S_k - T_1 = n-\ell\}}\Big) = \sum_{\ell=0}^{n-1} f_\ell\, v^0(n-\ell).$$

In a similar fashion,

$$\mathbb{1}_{\{S_{k-1} < n\}} \mathbb{1}_{\{S_k = n\}} = \sum_{\ell=0}^{n-1} \mathbb{1}_{\{S_{k-1} = \ell\}} \mathbb{1}_{\{T_k = n-\ell\}}.$$

Hence

$$\mathbb{E}\Big(\sum_{k>1} \sum_{\ell=0}^{n-1} \mathbb{1}_{\{S_{k-1} = \ell\}} \mathbb{1}_{\{T_k = n-\ell\}}\Big) = \sum_{\ell=0}^{n-1} v(\ell)\, p_{n-\ell},$$
and the first of (2.2.1) follows. The second of (2.2.1) is an immediate consequence, since m(n) = Σ_{ℓ≤n} v(ℓ).
The proof of (2.2.3) goes along similar lines: Write

$$\mathbb{1}_{\{\xi \le n\}} = \mathbb{1}_{\{T_1 \le n\}} \mathbb{1}_{\{\xi \le n\}} = \sum_{\ell=0}^{n} \mathbb{1}_{\{T_1 = \ell\}} \mathbb{1}_{\{\xi^0 \le n-\ell\}},$$

where ξ^0 is the last renewal time of the zero-delayed renewal sequence S^0_k = S_k − T_1, that is, S^0_1 = 0, S^0_2 = T_2, S^0_3 = T_2 + T_3, .... Taking expectations one gets (2.2.3).
Existence of solutions to (2.2.1) follows by recursion. Note that since v(0) = m(0) = P(T_1 = 0) is unambiguously defined, equations (2.2.1) always have unique solutions. The renewal theorem describes the large-n behaviour of these solutions.
Theorem 2.2.2. Define µ = E(T) ∈ (0, ∞]. Then,

$$P\text{-a.s.}\quad \lim_{n\to\infty} \frac{M(n)}{n} = \lim_{n\to\infty} \frac{m(n)}{n} = \lim_{n\to\infty} v(n) = \frac{1}{\mu}. \qquad (2.2.5)$$
For now we shall prove only the first two limits in (2.2.5), which are usually referred to as the elementary renewal theorem. The full renewal theorem, lim_{n→∞} v(n) = 1/µ, will be explained later as an application of coupling methods.

Remark 2.2.1. Recall that, in general, the almost sure convergence lim_{n→∞} X_n = X P-a.s. does not imply that lim_{n→∞} E(X_n) = E(X), even if all the expectations are defined and finite. This means that the first limit in (2.2.5) does not automatically imply the second limit therein, and an additional argument is needed. Such an argument will be based on Wald's formula.
Note that the following events coincide:

{M (n) ≥ k} = {Sk ≤ n} and {M (n) ≤ k} = {Sk+1 > n} . (2.2.6)

The defective case is easy:

Exercise 2.2.1. Show that if p_∞ > 0, then P-a.s. lim_{n→∞} M(n) exists and is finite. Deduce from this that P-a.s. lim_{n→∞} M(n)/n = 0.
So, as far as the first limit in (2.2.5) is concerned, we may restrict attention to proper renewals, that is to those with P(T_i < ∞) = 1, and then split the proof of the first limit in (2.2.5) into three (short) steps.
STEP 1. We claim that
P − a.s. lim M (n) = ∞ (2.2.7)
n→∞

Indeed, since M(n) is non-decreasing in n, the limit exists. Now, for any m ∈ N,

$$P\Big(\lim_{n\to\infty} M(n) \le m\Big) = \lim_{n\to\infty} P\big(M(n) \le m\big).$$
However, by (2.2.6),

$$P\big(M(n) \le m\big) = P(S_{m+1} > n) = \sum_{\ell = n+1}^{\infty} P(S_{m+1} = \ell) = 1 - F_{S_{m+1}}(n),$$

where F_{S_n} is the distribution function of S_n.


Exercise 2.2.2. Check that since P(T_i < ∞) = 1,

$$\lim_{t\to\infty} F_{S_n}(t) = P(S_n < \infty) = 1.$$

Hence (2.2.7).
STEP 2. By STEP 1 and the SLLN for S_n/n,

$$\lim_{n\to\infty} \frac{S_{M(n)}}{M(n)} = \mu \quad P\text{-a.s.} \quad\Longrightarrow\quad \lim_{n\to\infty} \frac{M(n)}{S_{M(n)}} = \frac{1}{\mu} \quad P\text{-a.s.} \qquad (2.2.8)$$
STEP 3. By definition S_{M(n)} ≤ n < S_{M(n)+1}. Hence,

$$\frac{M(n)}{n} \in \Big[\frac{M(n)}{S_{M(n)+1}},\, \frac{M(n)}{S_{M(n)}}\Big].$$
By (2.2.8) the right end-points of the above intervals converge to 1/µ. As for the left end-points:

$$\lim_{n\to\infty} \frac{M(n)}{S_{M(n)+1}} = \lim_{n\to\infty} \frac{M(n)+1}{S_{M(n)+1}} \cdot \frac{M(n)}{M(n)+1}.$$

By STEP 1 we conclude that lim_{n→∞} M(n)/(M(n)+1) = 1 P-a.s. (simply because lim M(n) = ∞). On the other hand, lim_{n→∞} (M(n)+1)/S_{M(n)+1} = 1/µ by (2.2.8).
In order to prove the second limit in (2.2.5) we need to develop an additional tool: Wald's formula for stopping times.

Stopping times and Wald's formula. Let A_1 ⊆ A_2 ⊆ · · · ⊆ A_n ⊆ ... be a non-decreasing sequence of σ-algebras, which is called a filtration.
An integer-valued random variable M is called a stopping time with respect to the filtration {A_k} if

$$\{M \le k\} \in \mathcal{A}_k \quad\text{for any } k \in \mathbb{N}. \qquad (2.2.9)$$

Exercise 2.2.3. (a) Check that M is a stopping time with respect to {A_k} iff {M = k} ∈ A_k for any k iff {M ≥ k} ∈ A_{k−1} for any k.
(b) Show that if M_1 and M_2 are two stopping times with respect to a filtration {A_n}, then N_1 = min(M_1, M_2) and N_2 = max(M_1, M_2) are both stopping times (with respect to the same filtration). In addition, show that a constant M = c is a stopping time (with respect to any filtration).
Consider now the following filtration associated to non-negative i.i.d. random inter-arrival times {T_i}: A_k = σ(T_1, ..., T_k). Note that for any n ∈ N the random variable M(n) + 1 is a stopping time with respect to {A_k}; however, M′ = M(n) is not, in general, a stopping time. Indeed, by (2.2.6), {M(n) ≤ k} = {S_{k+1} > n}, whereas

$$\{M(n) + 1 \le k\} = \{S_k > n\} \in \mathcal{A}_k.$$

Wald's formula. If M is a stopping time, then (recall the notation µ = E(T_i))

$$\mathbb{E}\Big(\sum_{i=1}^{M} T_i\Big) = \mu\, \mathbb{E}(M). \qquad (2.2.10)$$

Proof of Wald's formula. Note that

$$\sum_{i=1}^{M} T_i = \sum_{i} T_i \mathbb{1}_{\{M \ge i\}}.$$

However, {M ≥ i}^c = {M < i} = ∪_{j<i} {M = j} ∈ A_{i−1}, which means that 1_{\{M≥i\}} and T_i are independent. Hence:

$$\mathbb{E}\Big(\sum_{i=1}^{M} T_i\Big) = \mathbb{E}\Big(\sum_{i} T_i \mathbb{1}_{\{M \ge i\}}\Big) = \sum_{i} \mathbb{E}\big(T_i \mathbb{1}_{\{M \ge i\}}\big) = \mu \sum_{i} P(M \ge i) = \mu\, \mathbb{E}(M). \qquad (2.2.11)$$
Remark 2.2.2. Note that there is a flaw in the proof: in the second equality in (2.2.11) we have exchanged the expectation with an infinite sum. This should be justified, and we shall discuss the issue later when we talk about Convergence Theorems. In particular, what we did would follow from the so-called MON = Monotone Convergence Theorem, which relies on the fact that the T_i-s are non-negative. Wald's formula (2.2.10) holds for general (not necessarily non-negative) i.i.d. random variables T_i under the additional assumption that both E(|T_i|) < ∞ and E(M) < ∞.
Exercise 2.2.4. Let X_1, X_2, ... be i.i.d. Uni(0, 1) random variables. Set S_n = Σ_{i=1}^n X_i and M = inf{n : S_n ≥ 1}. Check that M is a stopping time and prove that E(M) = e and that E(S_M) = e/2.
Hint. Check by direct computation that P(M > n) = 1/n!.
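A Monte Carlo sketch of this exercise, which also illustrates Wald's formula: since E(X_i) = 1/2, one should find E(S_M) = E(M)/2 = e/2.

```python
# Monte Carlo check of Exercise 2.2.4: M = inf{n : S_n >= 1} for i.i.d.
# Uni(0,1) steps. Wald's formula gives E(S_M) = E(X) E(M) = e/2, and
# P(M > n) = 1/n! gives E(M) = e.
import numpy as np

rng = np.random.default_rng(2)
Ms, SMs = [], []
for _ in range(100000):
    s, m = 0.0, 0
    while s < 1.0:
        s += rng.random()
        m += 1
    Ms.append(m); SMs.append(s)
print(np.mean(Ms), np.e)        # ~ 2.718
print(np.mean(SMs), np.e / 2)   # ~ 1.359
```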
Exercise 2.2.5. (Following Bremaud) At a crosswalk, cars pass on a single lane according to a renewal sequence with i.i.d. inter-arrival times T_i ∼ Uni{1, 2, ..., 10}. A pedestrian arrives at the crosswalk at time 0 and will cross the road at time S, as soon as she/he sees a time interval of at least 7 between two consecutive cars. That is, if M = min{k : T_k ≥ 7}, then S = 1_{\{M>1\}} Σ_{k=1}^{M−1} T_k. Check that M − 1 is not a stopping time, but still compute E(S).
Proof of the second limit in (2.2.5). Again we shall split the proof of the second limit in (2.2.5) into three short steps.
STEP 1. Let us assume first that µ < ∞. Since M(n) + 1 is a stopping time, by Wald's formula,

$$\mathbb{E}\,S_{M(n)+1} = \mu\, \mathbb{E}\big(M(n) + 1\big) = \mu\big(m(n) + 1\big).$$

However, n < S_{M(n)+1}. Therefore,

$$1 \le \frac{\mathbb{E}(S_{M(n)+1})}{n} = \mu\Big(\frac{m(n)}{n} + \frac{1}{n}\Big) \quad\Longrightarrow\quad \frac{1}{\mu} \le \liminf_{n\to\infty} \frac{m(n)}{n}. \qquad (2.2.12)$$

Note that (2.2.12) is trivial if µ = ∞.
STEP 2. For A ∈ N define T_i^A = T_i ∧ A. Then {T_i^A} is an i.i.d. sequence of bounded random variables. By the Monotone Convergence Theorem (still to be described),

$$\lim_{A\to\infty} \mathbb{E}(T_i^A) = \lim_{A\to\infty} \mu^A = \mu.$$

Now, since T_i^A ≤ T_i, we obviously have that M^A(n) ≥ M(n), and hence m^A(n) ≥ m(n). Therefore it is enough to check that for any A,

$$\limsup_{n\to\infty} \frac{m^A(n)}{n} \le \frac{1}{\mu^A}. \qquad (2.2.13)$$
STEP 3. Since the T_i^A-s are bounded above by A,

$$1 \ge \frac{S^A_{M^A(n)}}{n} \ge \frac{S^A_{M^A(n)+1} - A}{n}.$$

Therefore, taking expectations and using Wald's formula again,

$$1 \ge \frac{\mu^A\big(m^A(n) + 1\big) - A}{n},$$

for any A > 0. (2.2.13) follows.

2.3 Renewal-reward theorem.

Consider a collection of i.i.d. couples (T_i, R_i), such that:
(a) We think of the T_i-s as inter-arrival times of a delayed renewal process.
(b) We think of the R_i-s as rewards collected during a time interval of length T_i. It makes sense to consider a general situation: e.g., we do not require the R_i-s to be non-negative, so that the setup also covers fines instead of rewards. In any case, however, we shall assume that E(R_i) is defined and finite,

$$r = \mathbb{E}(R_i) \in (-\infty, \infty). \qquad (2.3.1)$$

(c) Note that in general T_i and R_i are dependent. What we require is independence of the couples (T_i, R_i) for different i-s.
We continue to employ the notation M(n) for the number of arrivals by time n and S_k = Σ_{i=1}^k T_i for the time of the k-th arrival. Recall that S_{M(n)} ≤ n ≤ S_{M(n)+1}.
There are several ways to define the reward collected by time n:

(a) Initial reward (collected at the beginning of an epoch): C_I(n) = Σ_{i=1}^{M(n)+1} R_i.

(b) Terminal reward (collected at the end of an epoch): C_T(n) = Σ_{i=1}^{M(n)} R_i.

(c) Partial reward: C_P(n) = C_T(n) + R(n, S_{M(n)}, S_{M(n)+1}). For convenience we shall always assume that the partial reward is in between the terminal and the initial one, that is:

$$\min\big\{0, R_{M(n)+1}\big\} \le R\big(n, S_{M(n)}, S_{M(n)+1}\big) \le \max\big\{0, R_{M(n)+1}\big\}. \qquad (2.3.2)$$

Here are two important examples:

Example 2.3.1. The T_i-s are inter-arrival times, the R_i-s are service times, that is, R_i is the time needed to serve the i-th customer. Then C_T(n) is the total time needed to serve all the customers who arrived before n.

Example 2.3.2. Let the T_i-s be inter-arrival times. Set

$$R_k = \frac{T_k(T_k - 1)}{2} \quad\text{and}\quad R\big(n, S_{M(n)}, S_{M(n)+1}\big) = \sum_{k=S_{M(n)}}^{n} \big(k - S_{M(n)}\big). \qquad (2.3.3)$$

Renewal-reward Theorem. Assume (2.3.1) and (2.3.2). Then,

$$\lim_{n\to\infty} \frac{C_*(n)}{n} = \frac{r}{\mu} \quad P\text{-a.s.}, \qquad (2.3.4)$$

for ∗ = I, T, P. Moreover,

$$\lim_{n\to\infty} \frac{\mathbb{E}(C_*(n))}{n} = \frac{r}{\mu}. \qquad (2.3.5)$$

Proof. By (2.3.2) it is enough to consider ∗ = I, T. For terminal rewards,

$$\frac{C_T(n)}{n} = \frac{1}{n}\sum_{k=1}^{M(n)} R_k = \frac{M(n)}{n} \cdot \frac{1}{M(n)}\sum_{k=1}^{M(n)} R_k.$$

Recall that we have already checked that a.s.-lim_{n→∞} M(n) = ∞ and that a.s.-lim_{n→∞} M(n)/n = 1/µ, the latter being just the elementary renewal theorem. On the other hand,

$$\text{a.s.-}\lim_{M\to\infty} \frac{1}{M}\sum_{k=1}^{M} R_k = r,$$
by the strong LLN. The same logic applies to initial rewards:

$$\frac{C_I(n)}{n} = \frac{M(n)+1}{n} \cdot \frac{1}{M(n)+1}\sum_{k=1}^{M(n)+1} R_k.$$

Remark 2.3.1. One can incorporate the case P(T_i = 0) = p_0 > 0 as follows: Consider N-valued (and hence positive) i.i.d. random variables T̃_i with P(T̃_i = ℓ) = p_ℓ/(1 − p_0) for ℓ = 1, 2, ..., and an independent i.i.d. sequence N_k such that N_k ∼ Geo(1 − p_0). In this way the k-th (single) arrival for the {T̃_i}-process corresponds to a simultaneous arrival of N_k customers in the original process. Precisely, set S̃_k = Σ_{ℓ=1}^k T̃_ℓ. Then,

$$M(n) = \sum_{k} N_k \mathbb{1}_{\{\tilde{S}_k \le n\}}. \qquad (2.3.6)$$

Exercise 2.3.1. Let T_1, T_2, T_3, ... be N_0 = N ∪ {0}-valued i.i.d. inter-arrival times. Assume that p_0 = P(T_i = 0) < 1. Define M(n) as in (2.3.6) and check that the first limit in (2.2.5) still holds (with µ = E(T_i)): lim_{n→∞} M(n)/n = 1/µ.

M(n) in (2.3.6) makes sense even if the i.i.d. random variables {N_ℓ} have a distribution different from geometric. In this case the picture falls within the general framework of the renewal-reward theorem.

Exercise 2.3.2. Buses arrive at an archeological site according to a discrete renewal process with i.i.d. inter-arrival times T_1, T_2, T_3, ... which are distributed Geo(p). Assume that the k-th bus carries a random number N_k of tourists, that all N_k-s are independent, and that, for each n = 1, 2, ..., the conditional distribution of N_k given T_{k+1} = n is Uni{1, ..., n + 2}. According to the archeological site regulations, tourists enter the site immediately upon arrival and leave it as soon as the next bus arrives. Compute the average time (in the long run) that a tourist spends at the site.

2.4 Size bias, Excess life and Stationarity.

We proceed with considering N-valued inter-arrival times, that is, P(T_i = 0) = 0 for i > 1.

Size bias. Size bias (or biased sampling) is the content of the following statement:

Theorem 2.4.1. Recall our notation p_ℓ = P(T_i = ℓ) and µ = E T_i. Then, for any ℓ ∈ N,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} \mathbb{1}_{\{S_{M(k)+1} - S_{M(k)} = \ell\}} = \frac{\ell\, p_\ell}{\mu}. \qquad (2.4.1)$$

Formula (2.4.1) has the following interpretation: We choose a random time uniformly from {1, ..., n} and ask for the probability that this random time falls into an inter-renewal interval of duration ℓ. Then (2.4.1) describes the asymptotics of this probability as n → ∞. Since the right-hand side of (2.4.1) is different from p_ℓ, there is a (size) bias for sampled interval lengths as compared to the unbiased distribution {p_ℓ}.

Proof of Theorem 2.4.1. We can rewrite the left-hand side of (2.4.1) in terms of renewals with partial reward as follows: Set R_k = ℓ 1_{\{T_k = ℓ\}} and set

$$R\big(n, S_{M(n)}, S_{M(n)+1}\big) = \big(n - S_{M(n)}\big)\,\mathbb{1}_{\{T_{M(n)+1} = \ell\}}.$$

In this notation (2.4.1) is just the renewal-reward statement (2.3.4).
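The size-bias effect is easy to observe numerically. The sketch below uses an arbitrary three-point inter-arrival law and compares the empirical frequencies with ℓp_ℓ/µ from (2.4.1).

```python
# Simulation sketch of size-biased sampling (2.4.1): the fraction of times
# k in {1,...,n} landing in an inter-renewal interval of length l tends to
# l*p_l/mu, not to p_l. The three-point law is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(4)
n = 10**6
T = rng.choice([1, 2, 3], p=[0.5, 0.3, 0.2], size=2 * n)
S = np.cumsum(T)

# the interval containing time k is [S_{M(k)}, S_{M(k)+1}), length T_{M(k)+1}
idx = np.searchsorted(S, np.arange(1, n + 1), side="right")  # idx = M(k)
lengths = T[idx]

mu = 0.5 * 1 + 0.3 * 2 + 0.2 * 3
for l, p_l in [(1, 0.5), (2, 0.3), (3, 0.2)]:
    print(l, (lengths == l).mean(), l * p_l / mu)  # empirical vs l*p_l/mu
```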

The exercises below are mostly borrowed from the books by Durrett and Grimmett-Stirzaker.

Exercise 2.4.1. Suppose the lifetime of a car is a random variable with probability function p. B always buys a new car on January 1, as soon as the old one either broke down during the preceding year or has reached age N years. Suppose a new car costs a NIS and that an additional cost of b NIS is incurred if the car breaks down before N. What is the long-run cost per unit time of B's car policy?
Calculate the cost (as a function of N) when the lifetime is uniformly distributed on {1, ..., 10} (meaning that the probability that the car breaks down during the i-th year is 1/10 for i = 1, ..., 10), a = 10, and b = 3.

Hint. Take U_1, U_2, U_3, ... to be i.i.d. Uni{1, ..., 10} random variables and set inter-renewal times T_i = min{U_i, N} and rewards R_i = a + b 1_{\{T_i < N\}}.

Exercise 2.4.2. Let T_1, T_2, ... be i.i.d. inter-arrival times with T_1 ∼ Poi(λ), λ > 0. Calculate the probability that, as n → ∞, a point uniformly chosen from {0, ..., n} will fall into an interval of length at least u > 0.

Exercise 2.4.3. A machine is working for a geometric time with parameter p1 > 0
and is being repaired for a geometric time with parameter p2 > 0. At what rate does
the machine break down? Calculate the probability that the machine is working at a
time point uniformly chosen from {0, . . . , n} as n → ∞.
Exercise 2.4.4. Bears arrive in a village at the instants of a renewal process with E T_1 = µ. They are captured and locked in a safe place, which costs c NIS per unit time per bear. When N bears have been captured, an expedition costing d NIS is organized to remove and release them far away. What is the long-run average cost of this policy?

Exercise 2.4.5. The weather in a certain locale consists of alternating wet and dry spells. Suppose that the number of days in each rainy spell has a Poisson distribution with mean 2, and that the duration of a dry spell follows a geometric distribution with mean 7. Assume that the successive durations of rainy and dry spells are independent. Calculate the probability that it rains at a point uniformly chosen from {0, ..., n} as n → ∞.

Bernoullian formulation of biased sampling. Let T_1, T_2, ... be i.i.d. inter-arrival times. As usual, S_k = Σ_{i=1}^k T_i. Let now A(n) be a Bernoulli process of intensity (probability of success) p. Assume that A is independent of the T_i-s. For instance, one can think of the T_i-s as random times between successive arrivals of buses at a certain bus station, whereas A describes arrivals of passengers at the same station. Let R_k be the number of passengers who arrived (according to the Bernoulli process A) at the station during the time interval {S_{k−1}, ..., S_k − 1}. Given u > 0, consider the reward

$$R_k^u = R_k \mathbb{1}_{\{T_k \le u\}}.$$

Then the terminal reward

$$C_T^u(n) = \sum_{k=1}^{M(n)} R_k^u$$

describes the total number of passengers who departed before time n and whose arrival "fell" into an inter-arrival (buses) interval of length less than or equal to u. Note that A(S_{M(n)}) describes the total number of passengers who departed from the station by time n. By the renewal-reward theorem,

$$\lim_{n\to\infty} \frac{C_T^u(n)}{A\big(S_{M(n)}\big)} = \lim_{n\to\infty} \frac{S_{M(n)}}{A\big(S_{M(n)}\big)} \cdot \frac{n}{S_{M(n)}} \cdot \frac{C_T^u(n)}{n} = \frac{1}{p} \cdot \frac{\mathbb{E}(R^u)}{\mathbb{E}(T)}. \qquad (2.4.2)$$

Now,

$$\mathbb{E}(R^u) = \mathbb{E}\big(\mathbb{E}(R^u \mid T)\big) = \mathbb{E}\big(\mathbb{1}_{\{T \le u\}}\, \mathbb{E}(A(T) \mid T)\big) = p\,\mathbb{E}\big(\mathbb{1}_{\{T \le u\}} T\big).$$

Therefore, the limiting ratio of passengers who fall into inter-arrival (buses) times of length ≤ u is still given by the size-bias law (2.4.1).
Excess life distribution and stationarity. Define the excess life at time n:

$$E^e(n) = \min\{S_k : S_k \ge n\} - n. \qquad (2.4.3)$$

Note that it might happen that P(E^e(n) = 0) > 0, despite our assumption T_i ∈ N.

Exercise 2.4.6. Check that for any ℓ ∈ N_0 the following limit exists P-a.s.:

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} \mathbb{1}_{\{E^e(k) = \ell\}} = \frac{\sum_{j=\ell+1}^{\infty} p_j}{\sum_{j=1}^{\infty} j p_j} = \frac{P(T > \ell)}{\mu} \stackrel{\Delta}{=} p^e(\ell). \qquad (2.4.4)$$

The distribution in (2.4.4) is called the excess-life distribution.
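The excess-life distribution can likewise be checked by simulation; the four-point inter-arrival law below is an arbitrary illustrative choice.

```python
# Simulation sketch of (2.4.4): the empirical distribution of the excess life
# E^e(k) over k = 1..n converges to p^e(l) = P(T > l)/mu.
import numpy as np

rng = np.random.default_rng(5)
n = 10**6
support, probs = [1, 2, 3, 4], [0.4, 0.3, 0.2, 0.1]
T = rng.choice(support, p=probs, size=n)
S = np.cumsum(T)

k = np.arange(1, n + 1)
excess = S[np.searchsorted(S, k, side="left")] - k  # min{S_j : S_j >= k} - k

mu = np.dot(support, probs)
for l in range(4):
    tail = sum(p for t, p in zip(support, probs) if t > l)  # P(T > l)
    print(l, (excess == l).mean(), tail / mu)
```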

Exercise 2.4.7. Compute discrete excess life distribution {pe` } for Ti -s being dis-
tributed according to the following laws:
(a) Poisson(λ).
(b) Geo(p).
(c) Bin(n, p).

Stationarity. We shall use P^e, S^e_k, M^e, etc. for the delayed renewal with the delay T_1^e distributed according to the excess life distribution.

Exercise 2.4.8. For the delay T_1 = T_1^e distributed according to the limiting excess life distribution {p^e_ℓ} in (2.4.4), check that v(n) ≡ 1/µ is the unique solution to (2.2.1), and that m^e(n) = (n + 1)/µ for any n ∈ N_0.

By Exercise 2.4.8 the P^e-probability v(n) of having a renewal does not change with time. This is an expression of stationarity. Let us elaborate on this notion.

Note the following property of the process {E^e(n)}: For k, ℓ ∈ N_0 set

$$P(k, \ell) = \begin{cases} P(T = \ell + 1), & \text{if } k = 0,\\ \mathbb{1}_{\{\ell = k-1\}}, & \text{if } k > 0. \end{cases} \qquad (2.4.5)$$

Then, the conditional probability

$$P\big(E^e(n+1) = \ell \,\big|\, E^e(n), \ldots, E^e(0)\big) = P\big(E^e(n), \ell\big). \qquad (2.4.6)$$

Furthermore, the limiting excess life distribution {p^e_ℓ} satisfies:

$$p^e(\ell) = \sum_{k=0}^{\infty} p^e(k)\, P(k, \ell) \qquad \forall\, \ell \in \mathbb{N}_0. \qquad (2.4.7)$$

In the language of the next section, {E e (n)} is a Markov chain on N0 with matrix of
transition probabilities P and stationary distribution pe .
By the above, under Pe the sequence of excess life times is stationary in the
following sense:
Pe (E e (n) = `) = pe (`), (2.4.8)
for any n, ` ∈ N0 . In other words, under Pe the excess life time E e (n) has the same
distribution pe for any time n ∈ N0 . In fact (2.4.8) implies that the renewal process
M e itself is stationary, that is for any k ∈ N the distribution of M e (n + k) − M e (n)
does not depend on n. Indeed, first of all,
$$P^e\big(M^e(n+k) - M^e(n) = 0\big) = P^e\big(E^e(n) > k\big) = \sum_{j>k} p^e(j).$$

In order to treat P^e(M^e(n+k) − M^e(n) = ℓ) for ℓ > 0, set

$$\varphi_\ell(n) = P^0\big(M^0(n) = \ell\big),$$

where, as before, P^0, M^0 correspond to the un-delayed renewal. We claim that for any ℓ ≥ 1,

$$P^e\big(M^e(n+k) - M^e(n) = \ell\big) = \mathbb{E}^e\Big(\mathbb{1}_{\{E^e(n) \le k\}}\, \varphi_\ell\big(k - E^e(n)\big)\Big). \qquad (2.4.9)$$
Indeed, if ℓ ≥ 1, then

$$P^e\big(M^e(n+k) - M^e(n) = \ell\big) = \sum_{m=0}^{k-\ell+1} P^e\big(M^e(n+k) - M^e(n) = \ell,\; E^e(n) = m\big). \qquad (2.4.10)$$

It remains to notice that

$$P^e\big(M^e(n+k) - M^e(n) = \ell,\; E^e(n) = m\big) = \mathbb{E}^e\big(\mathbb{1}_{\{E^e(n) = m\}}\, \varphi_\ell(k - m)\big),$$

and therefore, in view of the stationarity of E^e, the expression in (2.4.10) does not change with n.
Remark 2.4.1. A different way to formulate stationarity under Pe is as follows:
The random sequence V = (V (0), V (1), V (2), . . .) is an element of {0, 1}N0 . Let
A ⊆ {0, 1}N0 be a measurable subset, as briefly discussed in the end of Section 1.1.
Stationarity means that for any n ∈ N,

Pe (V ∈ A) = Pe (θn V ∈ A) , (2.4.11)

where θn V = (V (n), V (n + 1), . . .) is a shifted sequence.

2.5 Coupling and the key renewal theorem.

Consider a delayed renewal with delay T_1 and i.i.d. integer-valued inter-arrival times T_2, T_3, .... We assume that the distribution of inter-arrival times is proper, P(T ∈ N) = 1, and, furthermore, that it has finite expectation, E(T) < ∞. In other words, we assume that the probabilities

$$p_\ell \stackrel{\Delta}{=} P(T = \ell) \text{ for } \ell = 1, 2, 3, \ldots \quad\text{satisfy}\quad \sum_{\ell=1}^{\infty} p_\ell = 1 \quad\text{and}\quad \sum_{\ell=1}^{\infty} \ell\, p_\ell = \mu < \infty. \qquad (2.5.1)$$

In this case the limiting excess life distribution {p^e_ℓ} is defined, see (2.4.4), and the corresponding renewal process M^e(n) is stationary. As before, we use P^e and P^0 for the laws of the stationary and the un-delayed renewals, and reserve the notation P for the generic delayed one.

Exercise 2.5.1. Check that any delayed renewal process M satisfies the following property: Define

$$A_k = \{\text{Renewal at time } k\}.$$

Then, conditionally on A_k, the process M̃(n) = M(k + n) − M(k) is distributed like (meaning has the same finite-dimensional distributions as) the un-delayed renewal process. In other words, for any n_1 < n_2 < · · · < n_j and any ℓ_1 ≤ ℓ_2 ≤ · · · ≤ ℓ_j,

$$P\big(M(k+n_1) - M(k) = \ell_1, \ldots, M(k+n_j) - M(k) = \ell_j \,\big|\, A_k\big) = P^0\big(M(n_1) = \ell_1, \ldots, M(n_j) = \ell_j\big). \qquad (2.5.2)$$

We would like to argue that under the above assumptions on inter-arrival times (and under the tacit assumption P(T_1 < ∞) = 1), the delayed renewal M converges to M^e. In order to make such a statement precise let us relabel M(·) as follows: Set

$$V_n = \mathbb{1}_{\{\text{Renewal at time } n\}} = \mathbb{1}_{\{\exists\, k :\, S_k = n\}}.$$

Thus {V_n} is just a random sequence of 0-s and 1-s. Evidently, one can recover M(·) from {V_n} and vice versa. We reserve the notation {V_n^e} for the stationary renewal with T_1 distributed according to the limiting excess life distribution (2.4.4).
Definition. Let us say that M(·) converges to M^e(·) as n → ∞ if

$$\lim_{n\to\infty} \sup_{A} \Big| P\big((V_n, V_{n+1}, \ldots) \in A\big) - P^e\big((V_n^e, V_{n+1}^e, \ldots) \in A\big) \Big| = 0, \qquad (2.5.3)$$

where the supremum is over all cylindrical (that is, depending only on finitely many coordinates) subsets A ⊆ {0, 1}^{N_0}.

In fact, it is enough to consider only the particular cases of delayed renewal P^a with P(T_1 = a) = 1. Indeed, in general,

$$P(\cdot) = \sum_{a=0}^{\infty} P(T_1 = a)\, P^a(\cdot).$$

Since P^e(·) = Σ_{a=0}^∞ p^e_a P^a(·), we conclude that (2.5.3) is equivalent to the following statement: For n ∈ N_0 and a sequence v = (v_0, v_1, ...) ∈ {0, 1}^{N_0} define the shift θ_n v = (v_n, v_{n+1}, ...). Then

$$\lim_{n\to\infty} \sup_{A} \big| P^a(\theta_n V \in A) - P^0(\theta_n V \in A) \big| = 0, \qquad (2.5.4)$$
for any a ∈ N_0. A look at (2.5.4) reveals that we may expect it only if the distribution of inter-arrival times is aperiodic. That is:
Aperiodicity assumption. We assume that

$$\mathrm{g.c.d.}\{\ell : p_\ell > 0\} = 1, \qquad (2.5.5)$$

where g.c.d. stands for the greatest common divisor. It is known that (2.5.5) implies that there exists n_0 < ∞ such that

$$v^0(n) = \mathbb{E}^0(V_n) = P^0(V_n = 1) > 0 \qquad (2.5.6)$$

for every n ≥ n_0.

We proceed to work under Assumption (2.5.6). The proof of (2.5.4) is split into several steps.
STEP 1. (Basic coupling construction). Let {V_n} and {Ṽ_n} be two renewal sequences which correspond to different (but both P-a.s. finite) delays, for instance to the 0-delay and to the a-delay as in (2.5.4), though at this point it makes sense to proceed in general. Since we want to make a statement about the marginal laws P and P̃, there is no loss of generality in assuming that {V_n} and {Ṽ_n} are independent. We use Q for the corresponding product law.

Exercise 2.5.2. (a) Check that if {V_n} and {Ṽ_n} are independent, then

$$U_n = V_n \cdot \tilde{V}_n \qquad (2.5.7)$$

is a delayed renewal sequence.

(b) Set τ = inf{n > 0 : U_n = 1}, that is, τ is the delay time associated to {U_n}. Check that for any 0 < k ≤ n the conditional, on {τ = k}, distributions of both θ_n V and θ_n Ṽ are given by:

$$Q\big(\theta_n V \in A \,\big|\, \tau = k\big) = Q\big(\theta_n \tilde{V} \in A \,\big|\, \tau = k\big) = P^0\big(\theta_{n-k} V \in A\big). \qquad (2.5.8)$$

Formula (2.5.8) implies that

$$\big| Q(\theta_n V \in A) - Q\big(\theta_n \tilde{V} \in A\big) \big| \le 2\, Q(\tau > n), \qquad (2.5.9)$$

for any n and for any cylindrical A ⊆ {0, 1}^{N_0}. In particular, if Q^a is the product measure for independent copies of the 0-delayed and the a-delayed renewals, and if τ^a = inf{n > 0 : V_n^0 · V_n^a = 1}, then (2.5.4) would follow if we show that

$$Q^a(\tau^a < \infty) = 1. \qquad (2.5.10)$$

STEP 2. At this stage let us consider two independent copies of the stationary renewal process, {V_n^e} and {Ṽ_n^e}. Let Q^e be the corresponding product measure. As we know from Exercise 2.5.2(a), the process U_n^e = V_n^e · Ṽ_n^e is a delayed renewal sequence under Q^e. Let J_1^e and J_1, J_2, ... be the corresponding independent delay and the i.i.d. inter-arrival times. By stationarity and independence of the two copies,

$$Q^e(U_n^e = 1) = P^e(V_n^e = 1)^2 = P^e\big(E^e(n) = 0\big)^2 = \frac{1}{\mu^2} > 0. \qquad (2.5.11)$$

On the other hand, by the delayed renewal theorem,

$$\lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}^{Q^e}\Big(\sum_{k=1}^{n} U_k^e\Big) = \frac{Q^e(J_1^e < \infty)}{\mathbb{E}^{Q^e}(J_1)}. \qquad (2.5.12)$$

Check that (2.5.11) implies that

$$\frac{Q^e(J_1^e < \infty)}{\mathbb{E}^{Q^e}(J_1)} = \frac{1}{\mu^2}. \qquad (2.5.13)$$

Together, (2.5.11) and (2.5.12) imply that E^{Q^e}(J_1) < ∞, and in particular that Q^e(J_1 < ∞) = 1.
STEP 3. Notice now that J_1 is distributed as the coupling time τ^0 of two zero-delayed renewals, J_1 =^d τ^0. Therefore

$$Q^0(U_n = 1 \text{ i.o.}) = 1.$$
Hence, for any m ≥ n_0 and a ∈ N,

$$\begin{aligned} 1 = Q^0(U_n = 1 \text{ i.o.}) &= Q^0\big(U_n = 1 \text{ i.o.} \,\big|\, V_m \tilde{V}_{m+a} = 1\big)\, Q^0\big(V_m \tilde{V}_{m+a} = 1\big) \\ &\quad + Q^0\big(U_n = 1 \text{ i.o.};\; V_m \tilde{V}_{m+a} = 0\big). \end{aligned} \qquad (2.5.14)$$

We could write the conditional probability since, by (2.5.6), Q^0(V_m · Ṽ_{m+a} = 1) = v^0(m) v^0(m+a) > 0 as soon as m ≥ n_0. But since we have 1 on the left-hand side of (2.5.14), it immediately follows that

$$Q^0\big(U_n = 1 \text{ i.o.} \,\big|\, V_m \cdot \tilde{V}_{m+a} = 1\big) = 1.$$

It remains to notice that under Q^0(· | V_m · Ṽ_{m+a} = 1) the shifted sequences W_k = V_{m+k} and W̃_k = 1_{\{k ≥ a\}} Ṽ_{m+k} are independent and distributed according to the product measure Q^a. Hence,

$$Q^a\big(W_k = \tilde{W}_k = 1 \text{ i.o.}\big) = 1 \quad\Longrightarrow\quad (2.5.10).$$

Key renewal theorem in the discrete case. Assume (2.5.1) and (2.5.5). Then the last limit in (2.2.5) holds, that is,

$$\lim_{n\to\infty} v(n) = \frac{1}{\mu}. \qquad (2.5.15)$$

Moreover, let τ^e be the coupling time between independent zero-delayed and stationary renewal sequences under the product measure Q^e. Then,

$$\Big| v(n) - \frac{1}{\mu} \Big| \le \frac{2\, \mathbb{E}^{Q^e}\big(\varphi(\tau^e)\big)}{\varphi(n)},$$

for any non-negative increasing function φ.
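The conclusion (2.5.15) can be observed numerically by iterating the renewal equation (2.2.1) for the zero-delayed sequence; the aperiodic three-point law below is an arbitrary choice.

```python
# Numerical sketch of the key renewal theorem: iterate the un-delayed renewal
# equation v(n) = f_n + sum_{l<n} v(l) p_{n-l} (zero delay: f_n = delta_0(n))
# and watch v(n) -> 1/mu for an aperiodic inter-arrival law p.
import numpy as np

p = {2: 0.5, 3: 0.3, 5: 0.2}          # aperiodic: gcd{2, 3, 5} = 1
mu = sum(l * q for l, q in p.items()) # mu = 2.9

N = 200
v = np.zeros(N + 1)
v[0] = 1.0                            # zero-delayed: renewal at time 0
for n in range(1, N + 1):
    v[n] = sum(p.get(n - l, 0.0) * v[l] for l in range(n))

print(v[N - 3:], 1.0 / mu)            # tail of v(n) hovers around 1/mu
```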
3 Conditional expectations.

Let (Ω, A, P) be a probability space, and let B ⊆ A be a sub-σ-algebra. A random variable Y is said to be B-measurable, Y ∼ B, if

$$\{\omega : \mathrm{Y}(\omega) \le y\} \in \mathcal{B}$$

for any y ∈ R. A random variable Y is said to be bounded if there exists K < ∞ such that P(|Y| ≤ K) = 1.

3.1 Definition and basic properties.


Let now X be a random variable with E(|X|) < ∞. We do not assume that X is B-measurable.

Definition. Let us say that a B-measurable random variable Z is a conditional expectation, Z(ω) = E(X | B)(ω), if

$$\mathbb{E}(\mathrm{YZ}) = \mathbb{E}(\mathrm{YX}), \qquad (3.1.1)$$

for any bounded B-measurable random variable Y.

Remark 3.1.1. In fact, it is enough to check (3.1.1) only for Y-s which are indicators of
events from B, that is for Y = 1B for some B ∈ B.

Note that, in particular,

$$\mathbb{E}\big(\mathbb{E}(\mathrm{X} \mid \mathcal{B})\big) = \mathbb{E}(\mathrm{X}). \qquad (3.1.2)$$

Remark 3.1.2. If B is an atomic σ-algebra, that is, if B = σ(B_1, B_2, ...), where B_1, B_2, ... is a (disjoint) partition of Ω, and if P(B_ℓ) > 0 ∀ℓ, then

$$\mathbb{E}(\mathrm{X} \mid \mathcal{B})(\omega) = \sum_{\ell} \mathbb{E}(\mathrm{X} \mid B_\ell)\, \mathbb{1}_{B_\ell}(\omega). \qquad (3.1.3)$$

Remark 3.1.3. If (X, Y) is a continuous random vector with joint probability density f_{XY}, and if B = σ(Y), then

$$\mathbb{E}(\mathrm{X} \mid \mathcal{B})(\omega) \stackrel{\Delta}{=} \mathbb{E}(\mathrm{X} \mid \mathrm{Y})(\omega) = \int x\, f_{X|Y}\big(x \mid \mathrm{Y}(\omega)\big)\, dx, \qquad (3.1.4)$$

where $f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} \mathbb{1}_{\{f_Y(y) > 0\}}$ is the conditional density.
Exercise 3.1.1. Let U ∼ Uni(0, 1) and n ∈ N. Define the conditional distribution of S_n given U = u to be binomial Bin(n, u). That is,

$$P(S_n = k) = \int_0^1 P(S_n = k \mid U = u)\, du = \int_0^1 \binom{n}{k} u^k (1-u)^{n-k}\, du.$$

Find E(U | σ(S_n)).

Hint. Recognize the β-distribution in the above formula.
Exercise 3.1.2. Given a random variable X with finite variance Var(X) < ∞ and a σ-algebra F, define the conditional variance Var(X | F) = E(X² | F) − (E(X | F))². Prove that

$$\mathrm{Var}(X) = \mathbb{E}\big(\mathrm{Var}(X \mid \mathcal{F})\big) + \mathrm{Var}\big(\mathbb{E}(X \mid \mathcal{F})\big).$$
Exercise 3.1.3. Let (X, Y) be a random vector with joint density function f_{XY}(x, y) = x e^{−x(1+y)} 1_{\{x,y>0\}}. Compute E(X | Y) and E(Y | X).

Properties of conditional expectation. The following set of properties holds P-a.s. (with P-probability one):
P1. For any B-measurable random variable Y such that E(|YX|) < ∞,

$$\mathbb{E}(\mathrm{YX} \mid \mathcal{B})(\omega) = \mathrm{Y}(\omega)\, \mathbb{E}(\mathrm{X} \mid \mathcal{B})(\omega). \qquad (3.1.5)$$

P2. If X and B are independent, then

$$\mathbb{E}(\mathrm{X} \mid \mathcal{B})(\omega) = \mathbb{E}(\mathrm{X}). \qquad (3.1.6)$$

P3. (Tower property) If C ⊆ B, then

$$\mathbb{E}(\mathrm{X} \mid \mathcal{C})(\omega) = \mathbb{E}\big(\mathbb{E}(\mathrm{X} \mid \mathcal{B}) \,\big|\, \mathcal{C}\big)(\omega). \qquad (3.1.7)$$

P4. (Projection/least squares property). If X has finite variance, E(X²) < ∞, then the conditional expectation Z = E(X | B) solves the following minimization problem:

$$\mathbb{E}\big((\mathrm{X} - \mathrm{Z})^2\big) = \min_{\mathrm{Y} \sim \mathcal{B}} \mathbb{E}\big((\mathrm{X} - \mathrm{Y})^2\big). \qquad (3.1.8)$$

P5. (Inequalities). The Cauchy-Schwarz, Hölder and Jensen inequalities hold for conditional expectations. For instance, if φ is convex and E(|φ(X)|) < ∞, then

$$\mathbb{E}\big(\varphi(\mathrm{X}) \mid \mathcal{B}\big)(\omega) \ge \varphi\big(\mathbb{E}(\mathrm{X} \mid \mathcal{B})(\omega)\big) \quad P\text{-a.s.} \qquad (3.1.9)$$

P6. (Positivity). If X is non-negative, that is, if P(X ≥ 0) = 1, then the conditional expectation E(X | B) is also non-negative.
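Property P4 is easy to visualize numerically: in the sketch below X = Y + noise with Y standard normal, so E(X | Y) = Y, and the mean square error is minimized at g(Y) = Y among a few illustrative σ(Y)-measurable competitors.

```python
# Sketch of the projection property P4: among sigma(Y)-measurable candidates
# g(Y), the conditional expectation E(X | Y) minimizes E(X - g(Y))^2.
# Here X = Y + independent N(0,1) noise, so E(X | Y) = Y by construction.
import numpy as np

rng = np.random.default_rng(6)
N = 10**6
Y = rng.standard_normal(N)
X = Y + rng.standard_normal(N)

candidates = {
    "E(X|Y) = Y": lambda y: y,
    "0.9 Y     ": lambda y: 0.9 * y,
    "Y + 0.1   ": lambda y: y + 0.1,
    "Y^2       ": lambda y: y ** 2,
}
for name, g in candidates.items():
    print(name, np.mean((X - g(Y)) ** 2))   # minimal for g(Y) = Y
```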

Exercise 3.1.4. Let Z = E(X | B). Show that if E(X²) = E(Z²) < ∞, then P(X = Z) = 1.
Hint. Check that if Z = E(X | B), then E(X²) − E(Z²) = E((X − Z)²).
 

Exercise 3.1.5. Let B be an atomic σ-algebra. Check that E(X|B) defined in (3.1.3)
indeed satisfies P1-P3 and P5-P6.

3.2 Example: Branching process.

The Galton-Watson branching process is a model for the evolution of population sizes. The main input is the distribution p = {p_ℓ} of a random variable η ∈ N_0, to which in the sequel we shall refer as the branching mechanism, and which describes the random number of offspring of an individual member of the population. Specifically, let {η_j^i} be i.i.d. random variables with the same probability function p as η. Let Z_n be the size of the population at generation n. In general Z_0 is a random variable which is assumed to be independent of the i.i.d. reproduction array {η_j^i}. Let us for simplicity take Z_0 = 1. Then the population sizes Z_n are recursively defined as

$$Z_{n+1} = \sum_{j=1}^{Z_n} \eta_j^n. \qquad (3.2.1)$$

In other words, each of the Z_n members of the population in generation n gives birth to a random number of offspring. Since, in general, p_0 = P(η = 0) > 0, the population may die out, and for Z_n = 0 one should read (3.2.1) as Z_{n+1} = 0.
Probability of extinction. Set

$$\alpha_\infty = \lim_{n\to\infty} P(Z_n = 0). \qquad (3.2.2)$$

The above limit exists since {Z_n = 0} is a non-decreasing sequence of events.

Theorem 3.2.1. Define µ = E(η) and σ² = Var(η) > 0. Then,

$$\mathbb{E}(Z_n) = \mu^n, \qquad \mathrm{Var}(Z_n) = \begin{cases} n\sigma^2, & \text{if } \mu = 1,\\[1mm] \dfrac{\sigma^2(\mu^n - 1)\mu^{n-1}}{\mu - 1}, & \text{if } \mu \ne 1. \end{cases} \qquad (3.2.3)$$

Furthermore (recall that we assume genuine randomness, σ > 0), α_∞ = 1 ⇔ µ ≤ 1. If µ > 1, then the probability of extinction α_∞ equals the smallest non-negative root of the equation

$$s = G_\eta(s) = \mathbb{E}(s^\eta). \qquad (3.2.4)$$
Proof. The function G in (3.2.4) is called the probability generating function. It works particularly well with non-negative integer-valued random variables. In the latter case it is well defined on [0, 1], and it is possible to read off various quantities from G and its derivatives. For instance, if G_Z is the probability generating function of a non-negative integer-valued random variable Z, then

$$P(Z = k) = \frac{1}{k!} \frac{d^k}{ds^k} G_Z(0), \qquad \mathbb{E}(Z) = \frac{d}{ds} G_Z(1), \qquad \mathbb{E}\big(Z(Z-1)\big) = \frac{d^2}{ds^2} G_Z(1). \qquad (3.2.5)$$
Above, \frac{d}{ds} G_Z(1) is understood either in the limiting sense

$$\frac{d}{ds} G_Z(1) = \lim_{s\uparrow 1} \frac{d}{ds} G_Z(s)$$

or as the left derivative at s = 1. The same applies to \frac{d^2}{ds^2} G_Z(1). In general G_Z is not defined on (1, ∞).


In the sequel let G be the probability generating function of η, and let Gn be
the probability generating function of Zn . Since we are assuming that Z0 = 1, the
size of the first generation Z1 is distributed like η and hence G1 = G.
Let us develop a recursion for G_n:

$$G_{n+1}(s) = \mathbb{E}\big(s^{Z_{n+1}}\big) = \mathbb{E}\Big(\mathbb{E}\big(s^{Z_{n+1}} \,\big|\, Z_n\big)\Big) = \mathbb{E}\big(G(s)^{Z_n}\big) = G_n\big(G(s)\big). \qquad (3.2.6)$$

Consequently,

$$\frac{d}{ds} G_{n+1}(s) = G_n'\big(G(s)\big)\, G'(s).$$

Since G(1) = 1 and G'(1) = µ, we infer:

$$\mathbb{E}(Z_{n+1}) = \mu\, \mathbb{E}(Z_n). \qquad (3.2.7)$$
Similarly,

$$G_{n+1}''(s) = G_n''\big(G(s)\big)\, \big(G'(s)\big)^2 + G_n'\big(G(s)\big)\, G''(s).$$

Substituting s = 1 and using the last of (3.2.5) yields, by induction, the second of (3.2.3).
Now, let α_n = P(Z_n = 0) and α_∞ = lim_{n→∞} α_n. By the first of (3.2.5),

$$\alpha_{n+1} = G_{n+1}(0) \stackrel{(3.2.6)}{=} G_n\big(G(0)\big) = G\big(G_n(0)\big) = G(\alpha_n). \qquad (3.2.8)$$

By continuity of G this implies that the extinction probability is indeed a root of the equation (3.2.4). Note that the point α = 1 is always a root of (3.2.4). Since G''(s) = E(η(η−1)s^{η−2}), the function G is (apart from the trivial case of Bernoulli η ∈ {0, 1}, which we do not consider) strictly convex on [0, 1], and there are two alternatives: either µ ≤ 1, and then α = 1 is the only root of (3.2.4) on [0, 1]; or µ > 1, and then there is an additional root α ∈ [0, 1). Since α_0 = 0 and since G is monotone, this additional root is precisely the limit α_∞ of the sequence α_n.
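The fixed-point iteration (3.2.8) and the probabilistic meaning of α_∞ can be checked against each other numerically. The Poi(1.5) offspring law and the population cap used to truncate surviving runs are illustrative assumptions, not part of the theorem.

```python
# Sketch for Theorem 3.2.1 with a supercritical offspring law, assumed here to
# be eta ~ Poi(1.5), so mu = 1.5 > 1 and G(s) = exp(mu (s - 1)). Compare the
# fixed-point iteration (3.2.8) with the empirical extinction frequency.
import numpy as np

rng = np.random.default_rng(7)
mu = 1.5

alpha = 0.0                      # alpha_0 = 0, alpha_{n+1} = G(alpha_n)
for _ in range(200):
    alpha = np.exp(mu * (alpha - 1.0))

def dies_out(max_gen=200, cap=10**4):
    z = 1
    for _ in range(max_gen):
        if z == 0:
            return True
        if z > cap:              # surviving populations blow up: call it survival
            return False
        z = rng.poisson(mu, size=z).sum()
    return z == 0

runs = 4000
print(alpha, sum(dies_out() for _ in range(runs)) / runs)   # both ~ 0.417
```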
Exercise 3.2.1. (Following Grimmett-Stirzaker) Consider geometric branching rates; that is, let p be the probability of success and q = 1 − p the probability of failure, and assume that the i.i.d. random variables η_i^j are distributed according to Geo_0(p). Assume that Z_0 = 1.
a. Check that

$$G_n(s) = \begin{cases} \dfrac{n - (n-1)s}{n + 1 - ns}, & \text{if } p = \tfrac{1}{2},\\[2mm] \dfrac{q\big(p^n - q^n - ps(p^{n-1} - q^{n-1})\big)}{p^{n+1} - q^{n+1} - ps(p^n - q^n)}, & \text{if } p \ne q. \end{cases}$$

b. Compute P(Z_n = 0) = G_n(0). Define T = inf{n : Z_n = 0}. Compute P(T = n) and E(T).
Exercise 3.2.2. Prove that in the general case of a random progenitor Z_0 (which is assumed to be independent of the i.i.d. reproduction array {η_j^i}), the extinction probability is still 1 if µ ≤ 1, but equals

$$G_{Z_0}(\alpha_\infty) < 1,$$

if µ > 1 (provided P(Z_0 ≥ 1) > 0).

3.3 Example: Erdos-Renyi random graph

Let K_n be the complete graph on n vertices. Let us open each of its $\binom{n}{2}$ edges with probability p_n = a/n, where a ∈ (0, ∞) is a fixed constant. Alternatively, we associate with each edge e = (ij) a Bernoulli random variable η_e = η_{(ij)} such that

$$P_n(\eta_e = 1) = \frac{a}{n}. \qquad (3.3.1)$$

We shall say that i is connected to j, i ↔ j, if there exists a sequence of vertices v_0 = i, v_1, ..., v_k = j such that η_{(v_ℓ v_{ℓ+1})} = 1 for all ℓ = 0, ..., k − 1. The connected cluster of a vertex i is

$$C_i = \{j : j \leftrightarrow i\}.$$

All the connected clusters C_i have the same distribution. In the sequel C := C_1.

Theorem 3.3.1. If a < 1, then the size |C| is of order 1. That is,

$$\lim_{K\to\infty} \limsup_{n\to\infty} P_n(|C| > K) = 0. \qquad (3.3.2)$$

If a > 1, then, with positive probability, the size |C| is of order n. Precisely, there exists δ > 0 such that

$$\liminf_{n\to\infty} P_n(|C| > \delta n) > 0. \qquad (3.3.3)$$

Sketch of the proof. We can think of C as being constructed as follows: Set I_0 = {1}. Then iterate:

$$I_{t+1} = I_t \cup \big\{ j \in K_n \setminus I_t : \exists\, i \in I_t \text{ with } \eta_{(ij)} = 1 \big\}. \qquad (3.3.4)$$

Of course, I_{t+1} = I_t implies that all of C has been constructed, C = I_t.


Using the above, the sizes Z_t = |I_t \ I_{t−1}| can be compared with branching processes as follows:

Upper bound. Consider the branching process Z_t^u with branching mechanism Bin(n, a/n). Then one can construct (couple) Z_t and Z_t^u on the same probability space in such a way that Z_t ≤ Z_t^u for all t = 1, ..., n.
Lower bound. For ε ∈ (0, 1) consider the branching process Z_t^l with branching mechanism Bin(n(1 − ε), a/n). Then one can construct (couple) Z_t and Z_t^l on the same probability space in such a way that

$$Z_t \mathbb{1}_{\{|I_{t-1}| \le \varepsilon n\}} \ge Z_t^l \mathbb{1}_{\{|I_{t-1}| \le \varepsilon n\}} \qquad (3.3.5)$$

for all t = 1, ..., n.
Given the above upper and lower bounds, we are left with the following task: Let Z_n be a branching process with branching mechanism Bin(n, b/n). Then, if b < 1,

$$\lim_{K\to\infty} \limsup_{n\to\infty} P_n\Big(\sum_{k=1}^{\infty} Z_k > K\Big) = 0. \qquad (3.3.6)$$

On the other hand, if b > 1, then there exists δ > 0 such that

$$\liminf_{n\to\infty} P_n\Big(\sum_{k=1}^{\infty} Z_k > \delta n\Big) > 0. \qquad (3.3.7)$$

In fact, in view of the Poisson approximation, one is tempted to replace the branching mechanism Bin(n, b/n) by Poi(b). One way or another, the proofs of (3.3.6) and (3.3.7) are based on the following random walk representation of the total (all generations) population size Σ_{n=0}^∞ Z_n: Let Z_n be the branching process with branching mechanism η. Let η_1, η_2, ... be i.i.d. copies of η. Define

$$S_0 = 1 \quad\text{and}\quad S_k = S_0 + \sum_{j=1}^{k} (\eta_j - 1).$$

Then Σ_{n=0}^∞ Z_n is distributed as the first hitting time T of the random walk {S_k},

$$T = \inf\{k \ge 1 : S_k = 0\}. \qquad (3.3.8)$$

In order to derive (3.3.6) and (3.3.7) from (3.3.8) one needs to develop an elementary large deviation theory, and we shall return to these issues in the sequel.
4 Markov Chains.
4.1 The setup.
Let S be a finite or countable set. In the sequel we shall call it state space. Let
(Ω, F, P) be a probability space. Consider a sequence X = (X0 , X1 , . . .) of S-valued
random variables. Define the following filtration of sigma-algebras:
Fn = σ (X0 , . . . , Xn ) ; n = 0, 1, 2, . . . .

Initial distribution: π_0 is the distribution of X_0; π_0(x) = P(X_0 = x) ∀x ∈ S. Sometimes it helps to think of π_0 as an |S|-dimensional vector with entries π_0(x), x ∈ S. This vector is probabilistic in the sense that

$$0 \le \pi_0(x) \le 1 \quad\text{and}\quad \sum_{x \in S} \pi_0(x) = 1.$$

Similarly regarding π_n, the distribution of X_n.

The notations P_µ and P_x are used to stress that π_0 = µ or π_0 = δ_x, respectively.
The matrix of transition probabilities P_n is an |S| × |S|-dimensional matrix with entries P_n(x, y) = P(X_{n+1} = y | X_n = x). In the sequel we shall assume that P_n ≡ P, and call these stationary transition probabilities. Note that P is stochastic:

$$\sum_{y \in S} P(x, y) = 1 \qquad \forall x \in S.$$

Markov operator. For any bounded function f : S → R define

$$Pf(x) = \sum_{y} P(x, y) f(y). \qquad (4.1.1)$$

Markov property. Random sequence X is called Markov chain if for any n and any
bounded function f on S

E (f (Xn+1 ) | Fn ) = Pf (Xn ) P − a.s. (4.1.2)

Alternative definitions. The usual definition is: For any n and any x_0, ..., x_n, x_{n+1} ∈ S,

$$P\big(X_{n+1} = x_{n+1} \,\big|\, X_n = x_n, \ldots, X_0 = x_0\big) = P(x_n, x_{n+1}). \qquad (4.1.3)$$

The relation (4.1.3) implies that the finite-dimensional distributions of X under P_µ are given by

$$P_\mu(X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n) = \mu(x_0)\, P(x_0, x_1)\, P(x_1, x_2) \cdots P(x_{n-1}, x_n). \qquad (4.1.4)$$

By Caratheodory's extension theorem, (4.1.4) has a unique extension to (S^{N_0}, A_∞), where A_∞ is the σ-algebra generated by all cylindrical subsets of S^{N_0}; in other words, (4.1.4) unambiguously defines the distribution of the infinite random sequence (process) X.
To formulate a more upscaled version of (4.1.2), let F be a bounded measurable¹ function on S^{N_0}. For instance, let F be a local function, which means that there exists k < ∞ such that F depends only on the first k + 1 coordinates of x = (x_0, x_1, ...): F(x) = F(x_0, ..., x_k).
Example 4.1.1. Let y ∈ S, k ∈ N, and consider

$$F(x_0, x_1, x_2, \ldots) = \sum_{i=1}^{k} \delta_y(x_i).$$

That is, F(X) is the number of visits to y by the Markov chain X during the first k steps.
Define

$$P[F](x) = \mathbb{E}\big(F(X) \,\big|\, X_0 = x\big) = \mathbb{E}_x F(X).$$
Recall our notation for shifts: θ_n(x_0, x_1, ...) = (x_n, x_{n+1}, ...). In Example 4.1.1, F(θ_n X) is the number of visits to y by X during the time interval {n + 1, n + 2, ..., n + k}.
The upscaled version of the Markov property states that for any bounded measurable function F and for any n,

$$\mathbb{E}\big(F(\theta_n X) \,\big|\, \mathcal{F}_n\big) = P[F](X_n) \quad P\text{-a.s.} \qquad (4.1.5)$$

We shall take (4.1.5) for granted (see though Exercise 4.1.1, which is a first step
In order to see how (4.1.2) implies (4.1.3) consider the function f (x) = δxn+1 (x)
and the event B = {X0 = x0 , . . . , Xn = xn }. Of course B ∈ Fn . Note that for any
x ∈ S and for any function g on S,
Pf (x) = P(x, xn+1 ) and g(Xn )1B = g(xn )1B . (4.1.6)
Now,
P (Xn+1 = xn+1 , Xn = xn , . . . , X0 = x0 ) = E (f (Xn+1 )1B ) = E (1B E (f (Xn+1 ) | Fn ))
(4.1.6)
= E (1B Pf (Xn )) = E (1B P(xn , xn+1 )) = P(B)P(xn , xn+1 ).
This is (4.1.3).
Exercise 4.1.1. (a) Check that (4.1.3) implies (4.1.2).
(b) Check that (4.1.2) (and hence (4.1.3)) implies the following: for any 0 ≤ n_1 < n_2 < · · · < n_k < n_{k+1} < · · · < n_{k+ℓ} and for any x_1, x_2, ..., x_{k+ℓ} ∈ S,

$$P\big(X_{n_{k+\ell}} = x_{k+\ell}, \ldots, X_{n_{k+1}} = x_{k+1} \,\big|\, X_{n_k} = x_k, \ldots, X_{n_1} = x_1\big) = P\big(X_{n_{k+\ell}} = x_{k+\ell}, \ldots, X_{n_{k+1}} = x_{k+1} \,\big|\, X_{n_k} = x_k\big). \qquad (4.1.7)$$
¹ Measurable here means that F(X) is a random variable.
4.2 Examples.

Random walk on Z^d. Let ξ_1, ξ_2, ... be a collection of i.i.d. Z^d-valued random variables. We think of the ξ_ℓ-s as the steps of the random walk X_n. In its turn, the position of the walker at time n is given by: Fix a starting point x ∈ Z^d. Then,

$$X_0 = x \quad\text{and, for } n \ge 1,\quad X_n = x + \sum_{\ell=1}^{n} \xi_\ell. \qquad (4.2.1)$$

In this example S = Z^d and P(x, y) = P(ξ = y − x).

Branching process. Let {ξ_k^n} be an i.i.d. array of N_0-valued random variables. We think of ξ_k^n as the number of offspring of member #k of generation n. Then the branching process which starts at time zero with x members is given by:

$$X_0 = x \quad\text{and, for } n > 0,\quad X_{n+1} = \sum_{k=1}^{X_n} \xi_k^n \mathbb{1}_{\{X_n > 0\}} = \sum_{k=1}^{\infty} \xi_k^n \mathbb{1}_{\{X_n \ge k\}}. \qquad (4.2.2)$$

Here S = N_0, and the matrix of transition probabilities is

$$P(0, k) = \delta_0(k) \quad\text{and, for } \ell > 0,\quad P(\ell, k) = P(\xi_1 + \cdots + \xi_\ell = k), \qquad (4.2.3)$$

where ξ_1, ..., ξ_ℓ are i.i.d. with the same offspring distribution.

Repair shop. Let ξ_1, ξ_2, ... be the i.i.d. numbers of machines brought for repair to the repair shop on the mornings of days 1, 2, .... Assume that the shop is capable of repairing exactly one machine per day. Let X_0 be a (random) number of machines awaiting repair at the end of day 0, which we assume to be independent of the ξ_ℓ-s. Define X_n as the number of machines awaiting repair at the end of day n. Then,

$$\text{for any } n \ge 0, \qquad X_{n+1} = \max\{X_n + \xi_{n+1} - 1,\; 0\}. \qquad (4.2.4)$$

If we set p_ℓ = P(ξ = ℓ), then the matrix of transition probabilities is given by:

$$P(0, \ell) = \begin{cases} p_0 + p_1, & \text{if } \ell = 0,\\ p_{\ell+1}, & \text{if } \ell > 0, \end{cases} \qquad\text{and, for } k > 0,\quad P(k, \ell) = p_0 \mathbb{1}_{\{\ell = k-1\}} + p_{\ell-k+1} \mathbb{1}_{\{\ell \ge k\}}. \qquad (4.2.5)$$
The next two models are not explicitly based on a sequence of i.i.d. random variables.

Fisher-Wright model (of a fixed-size haploid population). Let the population size N be fixed. Each member of a certain generation is either of type A or of type B. We assume that each member of the (n + 1)-st generation picks its type at random from one of the members of generation n, independently of the other members of the (n + 1)-st generation. X_n is the random number of members of type A in generation n. The state space is finite, S = {0, ..., N}, and the transition probabilities are:

$$P(k, \ell) = \delta_0(k)\delta_0(\ell) + \delta_N(k)\delta_N(\ell) + \mathbb{1}_{\{0 < k < N\}} \binom{N}{\ell} \Big(\frac{k}{N}\Big)^{\ell} \Big(1 - \frac{k}{N}\Big)^{N-\ell}. \qquad (4.2.6)$$
Ehrenfest chain (particle exchange dynamics between two containers). There are N particles in states A or B. At each step one chooses a particle at random and changes its state. Let X_n be the number of particles in state A at time n.
As before, the state space is finite, S = {0, ..., N}. However, the matrix of transition probabilities is given by:

$$P(\ell, \ell+1) = \frac{N - \ell}{N}\, \mathbb{1}_{\{\ell < N\}} \quad\text{and}\quad P(\ell, \ell-1) = \frac{\ell}{N}\, \mathbb{1}_{\{\ell > 0\}}. \qquad (4.2.7)$$

Birth and death processes in discrete time. The state space is S ⊆ N0 . The
jump probabilities P(`, k) are

• For ` = 0, 1, 2, . . . probabilities of births p` = P(`, ` + 1).

• For ` = 0, 1, 2, . . . probabilities of deaths q` = P(`, ` − 1). We set q0 = 0,

• For ` = 0, 1, 2, . . . probabilities r` = P(`, `) = 1 − p` − q` .

The model incorporates finite state spaces {m, ..., n} if we set q_m = p_n = 0.
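As a small numerical illustration of the Ehrenfest chain (4.2.7): the long-run fraction of time spent in state ℓ approaches the Bin(N, 1/2) weights, which form its invariant distribution (a standard fact, quoted here only as a sanity check; invariant distributions are treated in Section 4.6).

```python
# Simulation sketch of the Ehrenfest chain (4.2.7): occupation frequencies
# versus the Bin(N, 1/2) weights (its invariant distribution).
import numpy as np
from math import comb

rng = np.random.default_rng(8)
N, steps = 10, 200_000
x = 0
counts = np.zeros(N + 1)
for _ in range(steps):
    counts[x] += 1
    # move up with probability (N - x)/N, otherwise down
    x = x + 1 if rng.random() < (N - x) / N else x - 1

for l in range(N + 1):
    print(l, counts[l] / steps, comb(N, l) / 2**N)
```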

4.3 Linear algebra.

The theory of Markov chains may be viewed as a chapter of linear algebra. By itself such a point of view does not help to understand many issues of interest. Still:

Exercise 4.3.1. (a) Prove that, in general, the n-step transition probabilities of a Markov chain are given by

$$P\big(X_n = y \,\big|\, X_0 = x\big) = P^n(x, y), \qquad (4.3.1)$$

where P^n(x, y) are the entries of the n-th power of the transition matrix P.
(b) Prove that the distribution π_n (vector) of X_n can be recovered from the initial distribution π_0 via:

$$\pi_n = \pi_0 P^n. \qquad (4.3.2)$$

(c) Consider the two-state MC with transition matrix

$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}.$$

Compute P^n and lim_{n→∞} P^n.
(d) Consider the Fisher-Wright model with N = 2. Compute P^n and lim_{n→∞} P^n.
(e) Consider the Ehrenfest chain with N = 3. Compute lim_{n→∞} P^n.
Hint: Consider first the reduced chain with two states: a = {0, 3} and b = {1, 2}.
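For part (c) one can at least check the answer numerically; the values α = 0.3, β = 0.1 are arbitrary.

```python
# Numerical sketch for Exercise 4.3.1(c): powers of the two-state transition
# matrix converge to a rank-one matrix whose rows are the invariant
# distribution (beta, alpha)/(alpha + beta).
import numpy as np

alpha, beta = 0.3, 0.1
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

print(np.linalg.matrix_power(P, 50))                 # rows nearly identical
print(np.array([beta, alpha]) / (alpha + beta))      # the limiting row
```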
4.4 Strong Markov property.
Let X be a Markov chain. As before define the filtration Fn = σ (X0 , . . . , Xn ). Recall
that a random variable T is a stopping time with respect to {Fn } if
{T = n} ∈ Fn ,
for every n ∈ N0 .
Basic Example. For x ∈ S define the first hitting time H_x and its return-time version T_x:

$$H_x = \inf\{n \ge 0 : X_n = x\} \quad\text{and}\quad T_x = \inf\{n > 0 : X_n = x\}. \qquad (4.4.1)$$

Since

$$\{H_x = n\} = \{X_0 \ne x,\, X_1 \ne x, \ldots, X_{n-1} \ne x,\, X_n = x\} \in \mathcal{F}_n,$$

the random variable H_x is a stopping time. The same applies to T_x. Note that H_x ≠ T_x only if the chain starts at x.
Exercise 4.4.1. Consider a simple random walk Xn on Z with jump probabilities
P (ξ = 1) = p = 1 − P (ξ = −1). Assume that p > 1/2 and that the walk starts at
zero.
(a) Check that P (limn→∞ Xn = ∞) = 1.
(b) For x > 0 consider the last visit time,
Lx = sup {n : Xn = x} .
Check that P (Lx < ∞) = 1 (that is sup is actually P-a.s. max). Check that Lx is
not a stopping time.
Let T be a stopping time.
Definition.
FT = {B ∈ F : B ∩ {T = n} ∈ Fn ∀n ∈ N} . (4.4.2)

Roughly speaking FT contains events which are determined by the behaviour of


the chain up to time T.
Example 4.4.1. Let, as before, T_x be the first hitting/return time of a MC X to x. First of all, the event {T_x < ∞} belongs to F_{T_x}. Next, for any other y ∈ S consider

$$N_y^x = \sum_{k=1}^{T_x} \delta_y(X_k). \qquad (4.4.3)$$

N_y^x is the random number of visits to y before the first hitting/return to x. Consider, for instance, B = {N_y^x = 5}. Then, for any n ∈ N (since by construction T_x ≥ 1, there is no point considering n = 0),

$$B \cap \{T_x = n\} = \Big\{\sum_{k=1}^{n} \delta_y(X_k) = 5\Big\} \cap \{T_x = n\} \in \mathcal{F}_n.$$

Strong Markov Property. Let X be a Markov chain (on a finite or countable state space S) and let T be a stopping time. Given a sequence x = (x_0, x_1, ...) ∈ S^{N_0}, define the random shift θ_T x = (x_T, x_{T+1}, ...). Then for any bounded measurable function F on S^{N_0} (see the footnote after (4.1.3)),

$$\mathbb{E}\big(\mathbb{1}_{\{T < \infty\}} F(\theta_T X) \,\big|\, \mathcal{F}_T\big) = \mathbb{1}_{\{T < \infty\}}\, P[F](X_T), \qquad (4.4.4)$$

P-a.s. In particular,

$$\mathbb{E}\big(F(\theta_T X) \,\big|\, T < \infty\big) = \mathbb{E}\big(P[F](X_T) \,\big|\, T < \infty\big). \qquad (4.4.5)$$

Informally, the Strong Markov Property means that the chain starts afresh (forgets its past) at a stopping time.

Proof of (4.4.5).

$$\mathbb{E}\big(F(\theta_T X)\, \mathbb{1}_{\{T < \infty\}}\big) = \sum_{n=0}^{\infty} \mathbb{E}\big(F(\theta_n X)\, \mathbb{1}_{\{T = n\}}\big) = \sum_{n=0}^{\infty} \mathbb{E}\Big(\mathbb{1}_{\{T = n\}}\, \mathbb{E}\big(F(\theta_n X) \,\big|\, \mathcal{F}_n\big)\Big)$$
$$\stackrel{(4.1.5)}{=} \sum_{n=0}^{\infty} \mathbb{E}\big(\mathbb{1}_{\{T = n\}}\, P[F](X_n)\big) = \mathbb{E}\big(\mathbb{1}_{\{T < \infty\}}\, P[F](X_T)\big).$$

Regenerative structure of Markov chains. Let x ∈ S. Consider the hitting time H_x defined in (4.4.1). We know that H_x is a stopping time. By definition, X_{H_x} = x on the event {H_x < ∞}. Let B ∈ F_{H_x}; for instance, consider B from Example 4.4.1. Let x_1, ..., x_n ∈ S. By the Strong Markov Property,

$$P\big(B;\; X_{H_x+1} = x_1, \ldots, X_{H_x+n} = x_n \,\big|\, H_x < \infty\big) = P\big(B \,\big|\, H_x < \infty\big)\, P_x\big(X_1 = x_1, \ldots, X_n = x_n\big). \qquad (4.4.6)$$

Relation (4.4.6) has the following implication: Consider a MC X which starts at some probability distribution µ, and fix x ∈ S. Let T_1 = H_x = inf{n ≥ 0 : X_n = x} be the first hitting time of x. By definition, T_1 = H_x ∈ {0, 1, ..., ∞}. On the event {T_1 = S_1 < ∞} define

$$S_2 = \min\{n > S_1 : X_n = x\} \in \mathbb{N} \cup \{\infty\} \quad\text{and}\quad T_2 = S_2 - S_1.$$

By (4.4.6) the conditional (on {T_1 < ∞}) distribution of T_2 is the distribution of the return time T_x under P_x. Proceeding inductively we construct:
Cycle decomposition of X with respect to a point x ∈ S. Consider a MC X which starts at an arbitrary initial distribution µ, that is, for any y ∈ S, P(X_0 = y) = P_µ(X_0 = y) = µ(y). It can be represented as follows: Let T_1 be distributed as H_x under P_µ, and let T_2, T_3, T_4, ... be i.i.d. random variables, which are independent of T_1 and distributed as T_x under P_x.
Furthermore, let X^{(1)} be a realization of X on the time interval {0, ..., H_x} under P_µ, and, for k = 2, 3, ..., let X^{(k)} be independent realizations of X on the time interval {0, ..., T_x} under P_x. Set S_k = Σ_{i=1}^k T_i, exactly as we did in the construction of the delayed renewal process. Then

$$X_n = X_n^{(1)} \mathbb{1}_{\{n < T_1\}} + \sum_{k=2}^{\infty} X^{(k)}_{n - S_{k-1}} \mathbb{1}_{\{S_{k-1} \le n < S_k\}} \qquad (4.4.7)$$

is distributed as the MC X under P_µ.

The decomposition (4.4.7) is always true, but it is not always very informative.
For instance it might happen that the chain which starts at µ will never go to x, that
is it might happen that Pµ (Hx = ∞) > 0. Or it might happen that Ex (Tx ) = ∞.
Moreover, if we want to study long term behaviour, for instance convergence in
distribution of Xn , periodicity issues should matter. All these cases should be sorted
out before attempts to apply renewal theory.

4.5 Class properties. Transience and recurrence.


The following definitions are for general Markov Chains on a finite or countable
state space S.
Definition 1. (Equivalence) Two states x, y are equivalent, x ↔ y, if there exist n, m ≥ 0 such that

$$P^n(x, y) > 0 \quad\text{and}\quad P^m(y, x) > 0. \qquad (4.5.1)$$
This is an equivalence relation on S.
Definition 2.(Irreducibility) A chain is said to be irreducible if x ↔ y for any x, y ∈ S.
Definition 3.(Transience and Recurrence) A state x ∈ S is called recurrent if
Px (Tx < ∞) = 1. Otherwise, if Px (Tx < ∞) < 1, it is called transient.
Definition 4.(Positive and Null Recurrence) A recurrent state x is called positive recur-
rent if Ex (Tx ) < ∞. Otherwise it is called null recurrent.
Definition 5.(Period) For a given state x ∈ S, the period dx of x is defined as

dx = g.c.d. {n ≥ 1 : Pn (x, x) > 0} . (4.5.2)

For x, y ∈ S set:

fx = Px (Tx < ∞) and fxy = Px (Ty < ∞) . (4.5.3)



Exercise 4.5.1. Define M_x to be the total (possibly infinite) number of visits to x: M_x = Σ_{n=1}^∞ δ_x(X_n).
(a) Use the strong Markov Property to check that under P_x, M_x is distributed Geo_0(1 − f_x).
(b) If x ↔ y, then x is recurrent iff y is recurrent.
(c) If x ↔ y, then f_x = 1 implies that f_{yx} = 1. Construct an example of two equivalent x and y such that f_x < 1, but f_{xy} = 1.
(d) If x ↔ y and E_x(T_x) < ∞, then E_y(T_y), E_x(T_y), E_y(T_x) < ∞.
(e) Compute f_x and f_{xy} for the simple random walk on Z with jump probabilities p to the right and q = 1 − p to the left.
Exercise 4.5.2. Let p + q = 1. Consider MC on N0 with transition probabilities

P(0, k) = pq k−1 and P(k, k − 1) = 1 for k = 1, 2, . . . .

(a) Show that all the states are positively recurrent; Ek (T` ) < ∞ for any k, ` ∈ N0 .
(b) Find a probability distribution µ on N0 such that Eµ (Tk ) = ∞ for any k ∈ N0 .
Exercise 4.5.3. Let X be a MC on S. Recall the definition of the period

dx = g.c.d. {n ≥ 1 : Pn (x, x) > 0} = g.c.d. {n1 , n2 , . . .} .

(a) Set n0 = 0 and check that dx = mink≥1 (nk − nk−1 ).


(b) Check that if x ↔ y, then dx = dy .

Definition 6. (Class property) A property P is a class property if: P holds for x and x ↔ y implies that P holds for y. As we have seen in the above exercises, recurrence, transience, positive and null recurrence and periodicity are all class properties.

4.6 Ergodic theorem for Markov chains.


Throughout this subsection we shall assume that X is irreducible.

Positively recurrent chains.


Theorem 4.6.1. Assume that X is positively recurrent. Then for any bounded function f on S and for any initial distribution π,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f(X_k) = \sum_{y \in S} \frac{f(y)}{\mathbb{E}_y(T_y)}, \qquad (4.6.1)$$

P_π-a.s. Moreover, if the chain is aperiodic (d_x = 1 for some, hence, by irreducibility, any x ∈ S), then

$$\lim_{n\to\infty} P_\pi(X_n = x) = \frac{1}{\mathbb{E}_x(T_x)}, \qquad (4.6.2)$$

for any x ∈ S.
Proof. Fix x ∈ S and consider the induced regenerative structure as in (4.4.7). By the renewal-reward theorem,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n} f(X_k) = \frac{1}{\mathbb{E}_x(T_x)} \sum_{y} f(y)\, \mathbb{E}_x\Big(\sum_{k=1}^{T_x} \delta_y(X_k)\Big) = \sum_{y} f(y)\, \frac{\mathbb{E}_x\big(\sum_{k=1}^{T_x} \delta_y(X_k)\big)}{\mathbb{E}_x(T_x)},$$

P_π-a.s. The term

$$\frac{\mathbb{E}_x\big(\sum_{k=1}^{T_x} \delta_y(X_k)\big)}{\mathbb{E}_x(T_x)}$$

should be the same no matter which x we choose. Choosing x = y we infer that it equals 1/E_y(T_y).
Formula (4.6.2) is just the key renewal theorem (2.5.15) in the discrete case.
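The statement is easy to test numerically: the sketch below runs a small (arbitrarily chosen) irreducible aperiodic chain and compares the empirical occupation frequency of a state with the invariant weight computed by linear algebra.

```python
# Simulation sketch of the ergodic theorem: the occupation frequency of a
# state approaches pi_x = 1/E_x(T_x), i.e. the invariant weight of x.
import numpy as np

rng = np.random.default_rng(9)
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])     # an arbitrary irreducible aperiodic chain

n, x, visits = 200_000, 0, 0
for _ in range(n):
    x = rng.choice(3, p=P[x])       # one Markov step
    visits += x == 0                # count visits to state 0

# invariant distribution via linear algebra: pi P = pi, sum(pi) = 1
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0, 0, 0, 1.0]), rcond=None)[0]
print(visits / n, pi[0])            # empirical frequency vs pi_0
```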

Exercise 4.6.1. (a) Check that any finite-state irreducible MC is positively recurrent, and, furthermore, that the distributions of the first hitting/return times have exponential tails: ∃ α > 0 such that for any n ∈ N and any x, y ∈ S,

$$P_x(T_y > n) \le e^{-\alpha n}.$$

(b) Prove that if the chain is positively recurrent, then for any initial distribution µ, for any k ∈ N and for any bounded function f on S^k,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} f(X_j, X_{j+1}, \ldots, X_{j+k-1}) = \sum_{x_1, \ldots, x_k \in S} \frac{1}{\mathbb{E}_{x_1}(T_{x_1})}\, P(x_1, x_2) \cdots P(x_{k-1}, x_k)\, f(x_1, \ldots, x_k), \qquad (4.6.3)$$

P_µ-a.s.

Hint. Consider the auxiliary Markov chain Y_i = (X_i, X_{i+1}, ..., X_{i+k−1}). Check that it is irreducible and positive recurrent on the state space

$$\tilde{S}_k := \big\{(x_1, \ldots, x_k) \in S^k : P(x_1, x_2) \cdots P(x_{k-1}, x_k) > 0\big\}.$$

Using the ergodic theorem and the bounded convergence theorem (BON), observe that for (x_1, ..., x_k) ∈ S̃_k the following limits exist and coincide P-a.s.:

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \delta_{(x_1, \ldots, x_k)}(X_i, \ldots, X_{i+k-1}) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} P(X_i = x_1, \ldots, X_{i+k-1} = x_k),$$

and finish the proof using P(X_i = x_1, ..., X_{i+k−1} = x_k) = P(X_i = x_1) P(x_1, x_2) ⋯ P(x_{k−1}, x_k).
Invariant measures and invariant distributions. We continue to consider
irreducible chains on a countable S.
Definition. (Invariant measure and invariant distribution) A non-negative non-trivial function µ on S is called an invariant measure if

$$\mu_y = \sum_{x \in S} \mu_x P(x, y) \qquad (4.6.4)$$

for any y ∈ S.
Iterating (4.6.4) we check that ∀ y ∈ S and n ∈ N,

$$\mu_y = \sum_{x \in S} \mu_x P^n(x, y) \quad\overset{\text{Irreducibility}}{\Longrightarrow}\quad 0 < \mu_y < \infty \quad \forall\, y \in S. \qquad (4.6.5)$$

An invariant measure µ satisfying Σ_y µ_y = 1 is called an invariant distribution. Note that an invariant distribution exists iff there exists an invariant measure with Σ_y µ_y < ∞.

Theorem 4.6.2. An irreducible chain which has an invariant distribution is necessarily recurrent.
Sketch of the proof. Let µ be an invariant distribution. Then, iterating (4.6.4), we infer that

$$\mu_y = \sum_{x \in S} \mu_x P^n(x, y) \qquad (4.6.6)$$

for any y and n. However, for transient chains,

$$\lim_{n\to\infty} P^n(x, y) = 0$$

for any two states x and y. Taking n → ∞ limits in the right-hand side of (4.6.6) and using BON (the bounded convergence theorem, still to be formulated), we conclude that µ_y = 0 for any y ∈ S. A contradiction.
Here is the connection between invariance and the cycle decomposition (4.4.7). Fix x ∈ S and define

$$\mu_y = \mathbb{E}_x\Big(\sum_{n=1}^{T_x} \delta_y(X_n)\Big). \qquad (4.6.7)$$

Exercise 4.6.2. Prove that if y ↔ x, then µy ∈ (0, ∞). In particular, formula


(4.6.7) defines a finite positive measure for any irreducible chain X.
Hint. For proving µy < ∞ check that under Py , the number of returns to y before
visiting x has a Geo0 -distribution with positive probability of success (see also the
proof of Theorem 4.6.3 below).
For the rest of this Section we shall restrict attention to irreducible recurrent
chains.
Theorem 4.6.3. (a) The measure µ in (4.6.7) is invariant.
(b) Furthermore, µ is the unique invariant measure up to multiplication by a constant; namely, if π is another invariant measure, then

$$\frac{\mu_x}{\pi_x} = \frac{\mu_y}{\pi_y} \qquad (4.6.8)$$

for any x, y ∈ S.
(c) The chain is positively recurrent iff there exists an invariant distribution, that is, a probabilistic solution to (4.6.4).

Proof. Since we are assuming recurrence, µ_x = P_x(T_x < ∞) = 1. Let us check that µ_y > 0 for any other y ≠ x. Indeed, µ_y ≥ P_x(T_y < T_x), and by irreducibility there exists n with 0 < P^n(x, y). However,

$$0 < P^n(x, y) = \sum_{k=0}^{n-1} P_x(X_k = x)\, P_x(X_{n-k} = y;\, T_y < T_x) \le P_x(T_y < T_x) \sum_{k=0}^{n-1} P_x(X_k = x) \le n\, P_x(T_y < T_x),$$

as follows from the last exit (from x) decomposition and, of course, the Markov property. Hence, µ_y > 0 as claimed.
In order to prove invariance, write (recall that µ_x = 1)

$$\mu_y = \sum_{n=1}^{\infty} P_x(X_n = y;\, n \le T_x) = p(x, y) + \sum_{n=2}^{\infty} P_x(X_n = y;\, n \le T_x) = \mu_x p(x, y) + \sum_{n=2}^{\infty} \sum_{z \ne x} P_x(X_{n-1} = z;\, X_n = y;\, n \le T_x).$$

However, for any z ≠ x,

$$P_x(X_{n-1} = z;\, X_n = y;\, n \le T_x) = P_x(X_{n-1} = z;\, n-1 \le T_x)\, P(z, y),$$

and, furthermore,

$$\sum_{n=2}^{\infty} P_x(X_{n-1} = z;\, n-1 \le T_x) = \mu_z.$$

Hence the conclusion (a) of the Theorem.


Let now π be another invariant measure. Then π_y ∈ (0, ∞) for any y ∈ S, as we already know from (4.6.5).

Definition 4.6.1. (Transition probabilities for the chain reversed with respect to π.) Set

$$\hat{P}(y, z) = \pi_z P(z, y)\, \frac{1}{\pi_y}. \qquad (4.6.9)$$

Let Y be the chain with transition probabilities P̂ (see Remark 4.6.1 below). It is called the reversed chain for the following reason: for any n and any sequence x_0, ..., x_n ∈ S,

$$\pi_{x_0} P(x_0, x_1) \cdots P(x_{n-1}, x_n) = \pi_{x_n} \hat{P}(x_n, x_{n-1}) \cdots \hat{P}(x_1, x_0). \qquad (4.6.10)$$

If P = P̂, that is, if for any x, y ∈ S,

$$\pi_x P(x, y) = \pi_y P(y, x), \qquad (4.6.11)$$

then the corresponding chain is said to be reversible with respect to π. Note that if (4.6.11) holds, then π is automatically invariant.

Remark 4.6.1. By the invariance of π it is straightforward that P̂ is a matrix of transition probabilities:

$$\sum_{z} \hat{P}(y, z) = \frac{1}{\pi_y} \sum_{z} \pi_z P(z, y) = \frac{\pi_y}{\pi_y} = 1.$$

Furthermore,

$$\hat{P}^n(y, z) = \pi_z P^n(z, y)\, \frac{1}{\pi_y}. \qquad (4.6.12)$$

By definition this holds for n = 1. Indeed, assume that (4.6.12) holds for n ∈ N. Then,

$$\hat{P}^{n+1}(y, z) = \sum_{u} \hat{P}(y, u)\, \hat{P}^n(u, z) = \sum_{u} \pi_z P^n(z, u)\, \frac{1}{\pi_u} \cdot \pi_u P(u, y)\, \frac{1}{\pi_y} = P^{n+1}(z, y)\, \pi_z\, \frac{1}{\pi_y},$$

and one can apply induction. Furthermore, π is clearly an invariant measure for Y:

$$\sum_{y} \pi_y \hat{P}(y, z) = \sum_{y} \pi_y\, \pi_z P(z, y)\, \frac{1}{\pi_y} = \pi_z. \qquad (4.6.13)$$

Exercise 4.6.3. Check that:
(i) The original chain X with transition probability matrix P is irreducible and recurrent iff the reversed chain Y with transition probabilities (4.6.9) is irreducible and recurrent.
(ii) For any x ∈ S there is an equality between the expected return times for the direct and the reversed chains:

$$\mathbb{E}_x(T_x) = \hat{\mathbb{E}}_x(T_x). \qquad (4.6.14)$$

Conclude that X is ergodic iff Y is ergodic.
The path reversal relation (4.6.10) implies the following identity: For any n, x and y,

$$\pi_x P_x(X_n = y;\, T_x \ge n) = \pi_y \hat{P}_y(T_x = n). \qquad (4.6.15)$$

Let us fix x as in the definition (4.6.7) and sum both sides of (4.6.15) with respect to n. Since, by recurrence,

$$\sum_{n} \hat{P}_y(T_x = n) = 1 = \mu_x, \quad\text{and, by (4.6.7)},\quad \sum_{n} P_x(X_n = y;\, T_x \ge n) = \mu_y,$$

it follows that π_x µ_y = π_y µ_x. Hence (4.6.8).


The last statement (c) of the Theorem is now straightforward: by (b) either Σ_y π_y = ∞ for all solutions to (4.6.4), or Σ_y π_y < ∞ for all solutions to (4.6.4). But for our particular solution µ defined in (4.6.7) the value of the sum is Σ_y µ_y = Ex(Tx). Hence a probabilistic solution exists iff the chain is positively recurrent.

Let X be an irreducible and positively recurrent MC, and let µ be the corresponding invariant distribution

µ_x = 1/Ex(Tx).
Exercise 4.6.4. Show that for any x ≠ y,

Px(Ty < Tx) = 1/(µ_x (Ex(Ty) + Ey(Tx))) = Ex(Tx)/(Ex(Ty) + Ey(Tx)).    (4.6.16)

Hint. Define T to be the first return time to y after a visit to x.
(a) Use the renewal-reward and ergodic theorems to check that

Ey(Σ_{n=1}^T δ_x(Xn)) = µ_x Ey(T).

(b) Show that Ey(T) = Ex(Ty) + Ey(Tx).
(c) Show that

Ey(Σ_{n=1}^T δ_x(Xn)) = Ex(Σ_{n=0}^{Ty} δ_x(Xn)),

and check that under Px the random variable Σ_{n=0}^{Ty} δ_x(Xn) has a geometric distribution. Specifically, under Px it is distributed Geo(Px(Ty < Tx)), and hence

Ex(Σ_{n=0}^{Ty} δ_x(Xn)) = 1/Px(Ty < Tx).
4.7 Coupling from the past and perfect sampling.
Let X be an irreducible Markov chain on a finite state space S. For convenience we shall label the points of S as S = {1, 2, . . . , n}. We know that X is ergodic and that there exists a unique invariant distribution µ. The question is: how does one sample from µ? The ergodic theorem tells us: no matter where one starts the chain, just run it for a long time and, for any x, the probability P(Xn = x) will converge to µ_x. But how fast, and is there a way to sample from µ exactly?
The following algorithm by Propp and Wilson, called Perfect Sampling based on Coupling from the Past, gives an answer to this question. Let us define a random map Fω from S to itself in the following way:

• For each i ∈ S choose a random image fω(i) ∈ S according to the transition probabilities P(i, ·), independently for different i ≠ j.

• Fω(S) = {fω(i)}_{i∈S}.

Consider now independent iterations of Fω and, with some abuse of notation, let Fω^ℓ be the composition of ℓ such iterations. Let us say that Fω^ℓ is constant if the set Fω^ℓ(S) is a singleton. Of course, if Fω^ℓ is constant, then so is Fω^k for any k > ℓ. In light of this observation define the random number of iterations

M = inf{ℓ : Fω^ℓ is constant}  and set  Z = Fω^M(S).    (4.7.1)

Theorem 4.7.1. The random variable Z ∈ S constructed in (4.7.1) is distributed according to the invariant distribution of the Markov chain X.

Proof. Let us first check that M < ∞ P-a.s. The fact that the state space S was assumed to be finite plays a role here. Indeed, since X is irreducible and S is finite, there exists L < ∞ such that P^L(i, j) > 0 for any i, j ∈ S. This means that P(M > L) < 1. However, by independence (of the iterates Fω), P(M > kL) ≤ P(M > L)^k. Hence (e.g. by Borel-Cantelli), P(M < ∞) = 1.
The rest of the proof clarifies why one talks about coupling from the past here. Let us think of Fω^M as a map from S located at the negative random time −M, and let X be the stationary Markov chain run from −∞. We may couple X with independent random maps Fω^t from S located at time t to S located at the subsequent time t + 1, t ∈ Z. In this way M < ∞ simply implies that X0 = Z with probability one. Hence the conclusion.
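As an illustration, here is a minimal Python sketch of the Propp-Wilson procedure (the two-state matrix is our own toy example). Note that each fresh random map is composed from the past, i.e. it is applied before the maps already drawn:

    import numpy as np

    def cftp_sample(P, rng):
        # composite = F_{-1} o F_{-2} o ... o F_{-M}, extended into the past
        n = P.shape[0]
        composite = np.arange(n)
        while len(set(composite)) > 1:         # not yet constant on S
            F = np.array([rng.choice(n, p=P[i]) for i in range(n)])
            composite = composite[F]           # pre-compose with the new map
        return composite[0]

    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1],
                  [0.3, 0.7]])                 # invariant distribution (3/4, 1/4)
    samples = [cftp_sample(P, rng) for _ in range(10_000)]
    print(np.bincount(samples) / len(samples)) # ~ [0.75, 0.25]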
5 Martingales.
Throughout this section {Fn} is a filtration of σ-algebras.
Definition. A random sequence M = (M0, M1, . . .) is called a martingale, respectively sub-martingale or super-martingale, if

E(|Mn|) < ∞ ∀n,    (5.0.2)

and, for any n = 0, 1, . . . ,

E(M_{n+1} | Fn) = Mn, respectively ≥ Mn or ≤ Mn, P-a.s.    (5.0.3)

5.1 Examples.
The first example explains the prefixes sub and super.

Example 5.1.1.
Let P be a transition matrix of a Markov chain X = (X0, X1, . . .). Given a function f on S, the expression (or, in the language of the previous Section, the action of the Markov operator P)

Pf(x) = Σ_{y∈S} P(x, y) f(y)

is sometimes called the harmonic average of f at x. Note that Pf is always defined if f is bounded.
A function h on the state space S is called harmonic, respectively sub-harmonic or super-harmonic, if for any x ∈ S it compares with its harmonic average as follows:

h(x) = Ph(x), respectively h(x) ≤ Ph(x) or h(x) ≥ Ph(x).    (5.1.1)

Then, for any bounded harmonic (sub-, super-) function h, the sequence Mn = h(Xn) is a martingale (sub-, super-) with respect to the filtration Fn = σ(X0, . . . , Xn).

Proof. By the Markov property (4.1.2),

E(M_{n+1} | Fn) = E(h(X_{n+1}) | Fn) = Ph(Xn)  P-a.s.



Exercise 5.1.1. Let X be a Markov chain, and let f be a bounded function on S. Check that

Mn = f(Xn) − f(X0) + Σ_{k=0}^{n−1} (f(Xk) − Pf(Xk))    (5.1.2)

is a martingale.
Hint. Rewrite Mn as Mn = Σ_{k=1}^n (f(Xk) − Pf(X_{k−1})) = Σ_{k=1}^n η_k, and check that η_k is a martingale difference sequence, that is, E(η_{k+1} | Fk) = 0.

Example 5.1.2.
Let Mn = x + Σ_{i=1}^n ξ_i be a random walk starting at x, say on Z, with i.i.d. steps ξ_i satisfying E(|ξ_i|) < ∞. Then M is a martingale, respectively sub-martingale or super-martingale, with respect to Fn = σ(ξ1, . . . , ξn), if E(ξ) = 0, respectively E(ξ) ≥ 0 or E(ξ) ≤ 0.

Exercise 5.1.2. Fix any x ∈ Z and consider a non-trivial simple random walk X on Z starting at x, that is, with 0 < P(ξ = 1) = p = 1 − q = 1 − P(ξ = −1) < 1. Prove that

Mn = (q/p)^{Xn}

is a martingale.

Example 5.1.3.
Think about a sequence of games with i.i.d. outcomes ξ1, ξ2, . . . . The outcomes are ±1 (win or loss). One can describe a previsible strategy as follows: you decide to bet Ci(ξ1, . . . , ξ_{i−1}) ≥ 0 on the i-th game. For instance, if you bet $1 until the first win (and then leave the casino), then Ci = 1{T>i−1}, where T = inf{n > 0 : ξn = 1}. The net gain/loss after n games is

Mn = Σ_{i=1}^n Ci ξ_i.    (5.1.3)

Then, regardless of the strategy one chooses, Mn is a martingale, respectively sub-martingale or super-martingale, whenever E(ξ) = 0, respectively E(ξ) ≥ 0 or E(ξ) ≤ 0. That is: one cannot beat the system!

Example 5.1.4.
Consider the branching process Xn in (4.2.2). Let µ = E(ξ) and Fn = σ(X0, . . . , Xn). Then

E(X_{n+1} | Fn) = µ Xn,

which means that Xn is a martingale/sub-martingale/super-martingale if µ = 1 / µ ≥ 1 / µ ≤ 1. Furthermore,

Zn := Xn/µ^n    (5.1.4)

is always a martingale.
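A quick simulation illustrates that E(Zn) = 1 for every n. The sketch below is our own illustration; the Poi(µ) offspring law is an arbitrary choice:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, generations, paths = 1.5, 20, 2_000

    def Z_path(mu, generations, rng):
        # one Galton-Watson path with Poi(mu) offspring; returns Z_n = X_n/mu**n
        X, Z = 1, [1.0]
        for n in range(1, generations + 1):
            X = rng.poisson(mu, size=X).sum() if X > 0 else 0
            Z.append(X / mu**n)
        return Z

    Z = np.array([Z_path(mu, generations, rng) for _ in range(paths)])
    print(Z.mean(axis=0))                      # every column ~ 1: E(Z_n) = 1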

Example 5.1.5.
In Polya's urn scheme one starts with w white and b black balls in the urn. At each step a ball is sampled at random and then returned to the urn together with c additional balls of the same colour. Define Mn as the proportion of white balls in the urn after the n-th stage. Of course, M0 = w/(w + b), but M1, M2, . . . are already random.

Lemma 5.1.1. The sequence {Mn} is a martingale with respect to the natural filtration Fn = σ(M0, . . . , Mn).

Proof. Let m be in the range of Mn, and let us compute E(M_{n+1} | Mn = m). Note that after the n-th stage there are w + b + nc balls in the urn and, given {Mn = m}, exactly m(w + b + nc) of them are white. Note also that the conditional probability, given {Mn = m}, of sampling a white ball at the (n + 1)-st stage is m. Therefore,

E(M_{n+1} | Mn = m) = m · (m(w + b + nc) + c)/(w + b + (n + 1)c) + (1 − m) · m(w + b + nc)/(w + b + (n + 1)c)
                    = (m(m(w + b + nc) + c) + (1 − m) m(w + b + nc))/(w + b + (n + 1)c) = m.

Hence E(M_{n+1} | Mn) = Mn.
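The martingale property implies E(Mn) = M0 = w/(w + b) for all n, which is easy to observe numerically. A minimal Python sketch with arbitrarily chosen parameters:

    import numpy as np

    rng = np.random.default_rng(2)
    w, b, c_balls, steps = 2, 3, 1, 200

    def proportions(rng):
        # one trajectory of the proportion M_n of white balls in the urn
        white, total, M = w, w + b, [w / (w + b)]
        for _ in range(steps):
            if rng.random() < white / total:   # a white ball is sampled
                white += c_balls
            total += c_balls
            M.append(white / total)
        return M

    runs = np.array([proportions(rng) for _ in range(5_000)])
    print(runs.mean(axis=0)[[0, 50, 200]])     # all ~ w/(w+b) = 0.4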

Exercise 5.1.3. Define the event Bn = {a black ball is sampled at the n-th stage}. Using Lemma 5.1.1 give a short proof that P(Bn) ≡ b/(w + b).

Example 5.1.6.
Consider the following model of stock price evolution: let X0 be the initial price, and let ξ1, ξ2, . . . be i.i.d. non-negative (and better, positive) random variables which are also independent of X0. Define:

Xn = X0 Π_{i=1}^n ξ_i.    (5.1.5)

For instance, in a discrete version of the Black-Scholes model the random change factors are ξ_i = e^{η_i} with η_i ∼ N(µ, σ²).
Exercise 5.1.4. Find conditions under which {Xn} is a martingale/sub-martingale/super-martingale sequence. Quantify this in terms of µ and σ for the Black-Scholes model.
Remark 5.1.1. A typical question about Xn would be whether it hits level a before hitting level b. This is a random walk question: consider

Yn = ln Xn = ln X0 + Σ_{i=1}^n ln ξ_i.

Yn is a random walk, and the question is whether it visits [ln a, ∞) before visiting (−∞, ln b].
Example 5.1.7.
Let ξ1, ξ2, . . . be i.i.d. with E(ξ) = 0 and finite variance Var(ξ) = σ² < ∞. Define Fn = σ(ξ1, . . . , ξn) and set

Mn = Σ_{i=1}^n ξ_i  and  Yn = Mn² − nσ².

We know that Mn is a martingale. An additional important observation is that Yn is a martingale as well. Indeed,

E(Y_{n+1} | Fn) = E(M_{n+1}² | Fn) − (n + 1)σ² = E((Mn + ξ_{n+1})² | Fn) − (n + 1)σ²
               = Mn² + E(ξ_{n+1}²) − (n + 1)σ² = Yn.

Variance Martingales.

Exercise 5.1.5. Let Mn be a martingale such that E(Mn²) < ∞ for all n = 0, 1, 2, . . . . For n = 1, 2, . . . define

An = Σ_{k=1}^n E((Mk − M_{k−1})² | F_{k−1}).    (5.1.6)

Check that Yn = Mn² − An, n = 1, 2, . . . , is a martingale.

Example 5.1.8.
Consider a certain service system in discrete time. The state of the system is described by a random variable Yk. Assume that sup_k E(|Yk|²) < ∞. Customers arrive at the system according to a Bernoulli process ξ = (ξ1, ξ2, ξ3, . . .); that is, the ξ_i-s are independent, and P(ξ_i = 1) = p = 1 − q = P(ξ_i = 0). Assume, furthermore, that for any k the random indicator ξ_k is independent of {Y_j}_{j<k}. Define Fn = σ(ξ1, . . . , ξn; Yk, k < n). Then,

Mn = Σ_{k=1}^n Y_{k−1}(ξ_k − p)    (5.1.7)
is an Fn-martingale.
Indeed, by our assumptions ξ_{n+1} is independent of Fn, and hence

E(M_{n+1} | Fn) = Mn + Yn E(ξ_{n+1} − p | Fn) = Mn.

BASTA (Bernoulli arrivals see time averages). As we shall see below, under our conditions the following version of the martingale convergence theorem applies:

lim_{n→∞} (1/n) Mn = 0  P-a.s.    (5.1.8)

Assume that

Ȳ = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} Yk    (5.1.9)

also exists P-a.s. Then Ȳ is interpreted as an objective long range average of Y. Define Sn = Σ_{i=1}^n ξ_i, the number of customers which arrived at the system by time n. By (5.1.8) and (5.1.9),

pȲ = lim_{n→∞} (1/n) Σ_{k=1}^n Y_{k−1} ξ_k = p lim_{n→∞} (1/Sn) Σ_{k=1}^n Y_{k−1} ξ_k.    (5.1.10)

However, lim_{n→∞} (1/Sn) Σ_{k=1}^n Y_{k−1} ξ_k is exactly the subjective average computed according to what customers see when they arrive.
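A toy simulation of BASTA (our own illustration): here Yk is the length of a discrete-time single-server queue with Bernoulli(p) arrivals and Bernoulli(q) service completions, so that ξ_{k+1} is indeed independent of {Y_j}_{j≤k}:

    import numpy as np

    rng = np.random.default_rng(3)
    p, q, n = 0.3, 0.5, 200_000
    xi = rng.random(n) < p                    # Bernoulli(p) arrival in slot k
    Y = np.zeros(n + 1, dtype=int)            # Y[k] = queue length after slot k
    for k in range(n):
        served = Y[k] > 0 and rng.random() < q
        Y[k + 1] = Y[k] + int(xi[k]) - int(served)

    objective = Y[:n].mean()                  # time average of Y
    subjective = Y[:n][xi].mean()             # average seen by arriving customers
    print(objective, subjective)              # BASTA: the two agree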

5.2 Convergence theorems for expectations.

Here are the long-promised results, which we give without proofs.
Assume that a sequence of random variables {Xn} is P-a.s. convergent to a random variable X. Then

E(X) = lim_{n→∞} E(Xn),    (5.2.1)

if one of the following happens:

MON (Monotone convergence theorem). If P(0 ≤ X1 ≤ X2 ≤ . . .) = 1.
BON (Bounded convergence theorem). If there exists K < ∞ such that P(|Xℓ| ≤ K) = 1 for all ℓ.
DOM (Dominated convergence theorem). If there exists a (non-negative) random variable Y such that E(Y) < ∞ and P(|Xℓ| ≤ Y) = 1 for any ℓ = 1, 2, . . . .
A closely related result is:
Fatou Lemma. If {Xn} is a sequence of non-negative random variables, then

E(lim inf_{n→∞} Xn) ≤ lim inf_{n→∞} E(Xn).    (5.2.2)
5.3 Optional stopping.
Recall our gambling strategy Example 5.1.3. Usually E(ξ_i) ≤ 0, so we are talking about super-martingales. Thus,

Xn = Σ_{k=1}^n ξ_k

is a super-martingale.
If T is a stopping time, then {T > n − 1} ∈ F_{n−1}, and

Mn = Σ_{k=1}^n 1{T≥k} ξ_k = X_{n∧T}    (5.3.1)

is also a super-martingale. As we have seen, E(Mn) = E(X_{n∧T}) ≤ 0 for any n. But does this imply that E(XT) ≤ 0 as well?
A counter-example. Assume that p = 1/2, that is, Xn is just a simple symmetric random walk. Define T = inf{n : Xn = 1}. As we shall see, and as we have already mentioned, Xn is recurrent, and hence P(T < ∞) = 1. On the other hand, XT ≡ 1, and hence

1 = E(XT) > 0 = E(X0).

Theorem 5.3.1. Let X be a super-martingale and T a stopping time, both with respect to the same filtration {Fn}. Then

E(XT) ≤ E(X0),    (5.3.2)

if one of the following happens:
(a) Xn is P-a.s. non-negative for any n, and T is P-a.s. finite.
(b) T is bounded, that is, there exists N such that P(T ≤ N) = 1.
(c) E(T) < ∞, and Xn has bounded increments in the following sense: ∃ R such that

P(|X_{n+1} − Xn| 1{T>n} ≤ R) = 1.    (5.3.3)

Proof of Theorem 5.3.1. All three claims (a)-(c) follow from the fact that Mn in (5.3.1) is a super-martingale. In particular, E(Mn) ≤ E(M0) = E(X0).
(a) and (b) are easy: indeed, in order to see (b) just take n = N. Then MN = X_{T∧N} = XT and, consequently, E(XT) = E(MN) ≤ E(M0) = E(X0).
If T is P-a.s. finite, then XT = lim_{n→∞} X_{T∧n}, and (a) follows from the Fatou Lemma.
Let us turn to (c). Note that

XT − X_{T∧n} = (XT − X_{T∧n}) 1{T>n} = Σ_{m=n}^∞ (X_{m+1} − Xm) 1{T>m}.
Consequently,

E|XT − Mn| = E|XT − X_{T∧n}| ≤ Σ_{m=n}^∞ E(|X_{m+1} − Xm| 1{T>m}) ≤ R Σ_{m=n}^∞ P(T > m),

where the first inequality follows by Fatou, and the second one by assumption (5.3.3). Since E(T) < ∞, it follows by the tail formula that lim_{n→∞} Σ_{m=n}^∞ P(T > m) = 0. Since E(Mn) ≤ E(X0) and E(XT) ≤ E(Mn) + E|XT − Mn|, we are home.

Application to computing exit probabilities for SRW. Consider the random walk on Z with jump probabilities

p(k, k + 1) = p,  p(k, k − 1) = q,  p(k, k) = r;  p + q + r = 1.    (5.3.4)

Assume that Xn starts at x: P(X0 = x) = 1. Then,

Mn = (q/p)^{Xn}

is a martingale. In particular, it is a non-negative super-martingale. Hence it satisfies (a) of Theorem 5.3.1. Hence,

Ex((q/p)^{XT}) ≤ (q/p)^x

for any stopping time T. Consider now a < x < b, and let Ta, Tb be the first hitting times of a and b. Then T = Ta ∧ Tb is the exit time from {a + 1, . . . , b − 1}.
Exercise 5.3.1. Check that T and Mn also satisfy (c) of Theorem 5.3.1.
Hence,

(q/p)^x = E(MT) = (q/p)^a Px(Ta < Tb) + (q/p)^b Px(Tb < Ta).

The above equation is not very useful if p = q. However, if p ≠ q, then

Px(Tb < Ta) = ((q/p)^x − (q/p)^a) / ((q/p)^b − (q/p)^a).    (5.3.5)
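Formula (5.3.5) is easy to confirm by simulation. A minimal Python sketch, with arbitrarily chosen parameters p, q, a, x, b:

    import numpy as np

    rng = np.random.default_rng(4)

    def hits_b_first(x, a, b, p, q, rng):
        # run the walk (5.3.4) from x until it hits a or b; lazy prob. r = 1-p-q
        while a < x < b:
            u = rng.random()
            x += (u < p) - (p <= u < p + q)
        return x == b

    p, q, a, x, b = 0.4, 0.5, 0, 3, 10
    est = np.mean([hits_b_first(x, a, b, p, q, rng) for _ in range(20_000)])
    rho = q / p
    print(est, (rho**x - rho**a) / (rho**b - rho**a))   # agree up to MC error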

Exercise 5.3.2. Consider the random walk Xn with transition probabilities (5.3.4). As before, for a < x < b set T = Ta ∧ Tb.
(a) For p ≠ q consider

Yn = Xn − (p − q)n.

Use Theorem 5.3.1 and (5.3.5) to compute Ex(T).
(b) For q = p compute Px(Ta < Tb). Consider the variance martingale (Exercise 5.1.5) Mn = Xn² − An. Compute what An is, and use Theorem 5.3.1 to compute Ex(T).
Proof of Theorem 5.3.1. First of all, if Xn is a super-martingale and T is a stopping time, then Yn = X_{n∧T} is also a super-martingale; in particular, for every n

E(X_{n∧T}) ≤ E(X0).    (5.3.6)

This is just a generalisation of Example 5.1.3. Indeed,

Yn = X0 + Σ_{ℓ=1}^n 1{T≥ℓ}(Xℓ − X_{ℓ−1}),    (5.3.7)

and hence

E(Y_{n+1} − Yn | Fn) = 1{T≥n+1}(E(X_{n+1} | Fn) − Xn) ≤ 0.

The rest of the proof is based on integral convergence theorems and on the following fact: if T < ∞ P-a.s., then lim_{n→∞} X_{n∧T} = XT P-a.s.
(a) If Xn is non-negative, then by the Fatou Lemma,

E(XT) = E(lim_{n→∞} X_{n∧T}) ≤ lim inf_{n→∞} E(X_{n∧T}) ≤ E(X0).

(b) If P(T ≤ N) = 1, then T ∧ n = T for every n ≥ N. Hence in this case (5.3.2) is actually just (5.3.6) applied for large values of n.
(c) Note that by (5.3.7) and then by (5.3.3),

|X_{n∧T} − X0| = |Σ_{ℓ=1}^n 1{T≥ℓ}(Xℓ − X_{ℓ−1})| ≤ Σ_{ℓ=1}^n 1{T≥ℓ}|Xℓ − X_{ℓ−1}| ≤ RT.

Since by assumption E(T) < ∞, the dominated convergence theorem and then (5.3.6) imply that

E(XT − X0) = lim_{n→∞} E(X_{T∧n} − X0) ≤ 0.

Exercise 5.3.3. Prove that if Xn is a P-a.s. bounded super-martingale, that is, if there exists K < ∞ such that P(|Xn| ≤ K) = 1 for every n, then the optional stopping inequality (5.3.2) holds for any P-a.s. finite stopping time T.
Exercise 5.3.4. Ten Englishmen are trying to leave a pub in rainy weather. They do it in the following way: initially they store all 10 umbrellas in a basket next to the exit from the pub. They enter and drink a pint each. Then they return to the basket and each one picks an umbrella at random (a random permutation). Those who picked their own umbrellas leave upset, while those who picked a wrong umbrella put it back and return to the pub for another pint of ale. After that they return to the basket and try once again. And so on. Let T be the number of rounds needed for all the Englishmen to leave, and let N be the total number of ales consumed during the procedure.
(a) Compute E(T).
(b) Compute E(N).
Hint: For n = 0, 1, 2, . . . set Xn to be the number of Englishmen in the pub after the n-th round, and consider Mn = X_{n∧T} + n ∧ T. To solve (b) think about variance martingales.
Exercise 5.3.5. Let Yi, i ≥ 1, be a sequence of i.i.d. random variables with P(Y1 = −1) = P(Y1 = 1) = 1/2. If a gambler bets an amount of ak NIS at time k, then he/she wins ak·Yk NIS. Consider the following strategy: a1 = A > 0 and, as soon as the gambler wins, he/she stops gambling; otherwise the gambler doubles the bet for the next round. Denote by Xn the amount of money the gambler has by time n (X0 = 0), and by T the first time the gambler wins, T = min{i : Yi = 1}.
(a) Show that for n ≥ 1,

Xn = A·1{T≤n} − A(2^n − 1)·1{T>n}.

Determine the distribution of Xn for each n ≥ 1 and check that Xn has zero expectation.
(b) Show that Xn is a martingale with respect to Fn, where Fn is the σ-algebra generated by Y1, . . . , Yn (F0 = {Ω, ∅}).
(c) Is T a stopping time with respect to the filtration {Fn}? Determine the distribution of T and check that T < ∞ P-a.s. Does E(XT) = E(X0) hold? Why?
(d) Does Xn converge almost surely as n → ∞? Does Xn converge in L¹?
(d) Does Xn converge almost surely as n → ∞? Does Xn converge in L1 ?
Exercise 5.3.6. Let Xn, n ≥ 1, be i.i.d. random variables with strictly positive probability density function g : R → (0, ∞). Let f : R → (0, ∞) be another strictly positive probability density function. Suppose that

∫_0^∞ ln(f(x)/g(x)) g(x) dx < 0  and  ∫_0^∞ (ln(f(x)/g(x)))² g(x) dx < ∞.

Set

Mn = Π_{i=1}^n f(Xi)/g(Xi).
(a) Prove that Mn is a martingale.
(b) Does Mn converge almost surely as n → ∞?
(c) Does Mn converge in L1 ?

5.4 Maximal inequality and Martingale LLN.

Let {Mn} be a non-negative sub-martingale with respect to some filtration Fn. Then, for any r > 0 and for any n ∈ N,

P(max_{1≤ℓ≤n} Mℓ ≥ r) ≤ E(Mn)/r.    (5.4.1)
Proof. Inequality (5.4.1) is called Doob's maximal inequality, and it is a generalization of Kolmogorov's maximal inequality (1.5.4). Define

T = inf{n : Mn ≥ r}.

Clearly, T is a stopping time. Indeed,

{T = n} = {M0 < r, M1 < r, . . . , M_{n−1} < r, Mn ≥ r} ∈ Fn.

Next,

P(max_{1≤ℓ≤n} Mℓ ≥ r) = P(T ≤ n) ≤ Σ_{ℓ=0}^n E(1{T=ℓ} Mℓ/r) = (1/r) Σ_{ℓ=0}^n E(1{T=ℓ} Mℓ).    (5.4.2)

Since M is a sub-martingale, Mℓ ≤ E(Mn | Fℓ) for any ℓ ≤ n. Therefore,

E(1{T=ℓ} Mℓ) ≤ E(1{T=ℓ} E(Mn | Fℓ)) = E(E(1{T=ℓ} Mn | Fℓ)) = E(1{T=ℓ} Mn).

Substituting into (5.4.2) we conclude:

P(max_{1≤ℓ≤n} Mℓ ≥ r) ≤ (1/r) Σ_{ℓ=0}^n E(1{T=ℓ} Mn) = E(1{T≤n} Mn)/r.

Note that up to now we have not relied on the non-negativity of M. Now we do: if Mn is non-negative, then E(1{T≤n} Mn) ≤ E(Mn), and (5.4.1) follows.
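A quick numerical sanity check of (5.4.1) (our own illustration): take the non-negative sub-martingale Mn = |Sn|, where Sn is a simple symmetric random walk:

    import numpy as np

    rng = np.random.default_rng(5)
    n, trials, r = 100, 50_000, 15
    steps = rng.choice([-1, 1], size=(trials, n))
    M = np.abs(steps.cumsum(axis=1))          # M_l = |S_l|, a sub-martingale
    lhs = (M.max(axis=1) >= r).mean()         # P(max_{1<=l<=n} M_l >= r)
    rhs = M[:, -1].mean() / r                 # E(M_n)/r
    print(lhs, "<=", rhs)                     # (5.4.1) holds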
The following Corollary is a version of the martingale LLN:
Corollary 5.4.1. Let M be a martingale such that E(Mn²) < ∞ for any n ∈ N0. Then,

P(max_{1≤ℓ≤n} |Mℓ| ≥ r) ≤ E(Mn²)/r².    (5.4.3)

Furthermore, let M be a martingale. Assume that there exists K < ∞ such that

E((M_{n+ℓ} − Mℓ)²) ≤ Kn,    (5.4.4)

for any ℓ, n ∈ N. Then, P-a.s.,

lim_{n→∞} (1/n) Mn = 0.    (5.4.5)
BASTA: Example 5.1.8 and formula (5.1.10). Before proving Corollary 5.4.1, let us discuss how it implies BASTA. Consider the martingale Mn which was defined in (5.1.7). We assume that

R = sup_k E(Yk²) < ∞.
Now,

E((M_{n+ℓ} − Mℓ)²) = E((Σ_{k=ℓ+1}^{ℓ+n} Y_{k−1}(ξ_k − p))²)
    = Σ_{k=ℓ+1}^{ℓ+n} E(Y_{k−1}²(ξ_k − p)²) + 2 Σ_{ℓ+1≤j<k≤ℓ+n} E(Y_{j−1}(ξ_j − p) Y_{k−1}(ξ_k − p)).

By the assumed independence of ξ_k from F_{k−1},

E(Y_{k−1}²(ξ_k − p)²) = p(1 − p) E(Y_{k−1}²)  and, for k > j,  E(Y_{j−1}(ξ_j − p) Y_{k−1}(ξ_k − p)) = 0.

Hence (5.4.4) is satisfied with K = p(1 − p) sup_k E(Yk²) = p(1 − p)R.
Proof of Corollary 5.4.1. The proof is split into several exercises.
Exercise 5.4.1. Use Jensen's inequality for conditional expectations: if X is an integrable random variable, F a σ-algebra and ϕ a convex function, then P-a.s.

E(ϕ(X) | F) ≥ ϕ(E(X | F)).    (5.4.6)

Prove that if M is a martingale, ϕ is convex and E(|ϕ(Mn)|) < ∞ for any n, then Xn = ϕ(Mn) is a sub-martingale. Then deduce (5.4.3).
The following three exercises are similar to Exercise 1.6.2, where the LLN was established under a second moment condition.
Exercise 5.4.2. Assume (5.4.4). Check, using the Borel-Cantelli lemma, that for any α > 1,

lim_{n→∞} (1/n^α) M_{⌊n^α⌋} = 0,    (5.4.7)

P-a.s.
Exercise 5.4.3. Check, using (5.4.3) and the Borel-Cantelli lemma, that for any α > 1, P-a.s.

lim_{n→∞} max_{⌊n^α⌋ ≤ ℓ < (n+1)^α} |Mℓ/ℓ − M_{⌊n^α⌋}/n^α| = 0.    (5.4.8)

Exercise 5.4.4. Using (5.4.7) and (5.4.8) finish the proof of (5.4.5).

5.5 Martingale convergence theorem.

First of all, two definitions:
A (super/sub) martingale M is bounded if

sup_n E(|Mn|) < ∞.    (5.5.1)

A (super/sub) martingale M is uniformly integrable (UI) if

lim_{K→∞} sup_n E(1{|Mn|≥K} |Mn|) = 0.    (5.5.2)
The same definitions apply to bounded and UI super/sub-martingales.
Martingale Convergence Theorem (without proof). If M is a bounded (super/sub) martingale, then there exists an integrable random variable M∞ such that P-a.s.

lim_{n→∞} Mn = M∞.    (5.5.3)

Furthermore, if M is a UI martingale, then P-a.s. for any n,

Mn = E(M∞ | Fn),  and for any σ-algebra F,  lim_{n→∞} E(Mn | F) = E(M∞ | F).    (5.5.4)

Remark 5.5.1. If M is a non-negative super-martingale, respectively a non-positive sub-martingale, then it is automatically bounded, and, by the Fatou lemma,

E(M∞) ≤ lim_{n→∞} E(Mn).    (5.5.5)

It might happen, however, for a non-negative martingale M which is not UI, that the inequality in (5.5.5) is strict, that is, E(M∞) < E(M0) ≡ E(Mn).

Branching Process. Recall the branching process of Example 5.1.4 and the related non-negative martingale Zn defined in (5.1.4). As before, let ξ be the offspring variable, µ = E(ξ).

Exercise 5.5.1.
(a) Check that if µ < 1, then lim_{n→∞} Xn = lim_{n→∞} Zn = 0 P-a.s. Conclude that {Zn} is not UI, and explain your conclusion.
(b) Check that if µ = 1 and P(ξ = 0) > 0, then lim_{n→∞} Xn = 0 P-a.s., and explain why this implies that {Xn} is not UI.
(c) Assume that µ > 1 and that σ² = Var(ξ) < ∞. Use the conditional variance formula: for any two random variables U and W, if Var(W) < ∞, then

Var(W) = E(Var(W | U)) + Var(E(W | U)),

in order to check that {Var(Zn)} is a bounded sequence. Conclude that {Zn} is uniformly integrable, and that E(Z∞) = 1.
(d) Assume that µ > 1 and that P(ξ = 0) > 0. Prove that there exists a unique x* ∈ (0, 1) such that

E(x*^ξ) = x*.

Check that, with x* as above, Mn = x*^{Xn} is a martingale. Prove that M∞ = lim_{n→∞} Mn exists, and conclude that P-a.s.

lim_{n→∞} Xn exists and belongs to {0, ∞}.
Hint: Note that Xn ∈ N and that under the above assumption on ξ, P(lim_{n→∞} Xn = k) = 0 for every k ∈ N.
(e) Under the assumptions of (d) conclude, using (5.5.4), that x* is the probability of extinction.

Polya’s Urn Scheme with c = 1. Recall Example 5.1.5. In this case exactly one
ball is added to the urn at each stage. By Lemma 5.1.1 the random proportion Mn
of white balls in the urn after n stages is a martingale. It is non-negative, and since
Mn ∈ [0, 1], it is automatically UI. Hence M∞ = limn→∞ Mn exists P-a.s.
Below we give a complete probabilistic characterization of M∞ for c = 1, that is
when exactly one ball is added to the urn at each stage.
Recall that a random variable Z has a β-distribution β(α, β) with parameters
α, β > 0 if it density function fα,β is given by
(
0, if p 6∈ [0, 1]
fα,β (p) = pα−1 (1−p)β−1 . (5.5.6)
B(α,β)
, if p ∈ [0, 1]

Above the β-function


Z 1
Γ(α)Γ(β)
B(α, β) = = pα−1 (1 − p)β−1 dp. (5.5.7)
Γ(α + β) 0

Recall that for integer α ∈ N, Γ(α) = (α − 1)!.

Lemma 5.5.1. Fix any w, b ∈ N. If c = 1, then M∞ is distributed β(w, b).

First of all, for any n and any k = 0, . . . , n,

P(Mn = (w + k)/(w + b + n)) = (n choose k) · Γ(w + k)/Γ(w) · Γ(b + (n − k))/Γ(b) · Γ(w + b)/Γ(w + b + n).    (5.5.8)

Indeed, the event

{Mn = (w + k)/(w + b + n)}

is a disjoint union of (n choose k) events S[a] labeled by different a ∈ {0, 1}^n with Σ_{i=1}^n a_i = k; a_i = 1 in S[a] means that a white ball was sampled at the i-th stage. Then, by direct computation,

P(S[a]) = w(w + 1) · · · (w + k − 1) · b(b + 1) · · · (b + n − k − 1) / ((w + b)(w + b + 1) · · · (w + b + n − 1)),

for any particular choice of a. Hence (5.5.8). The rest is an exercise.

Exercise 5.5.2.
With w, b and c = 1 as above:
(a) Let P_{n,p} be the probability function of the binomial distribution Bin(n, p). Using (5.5.6) and (5.5.7) check that one can rewrite (5.5.8) as follows:

P(Mn = (w + k)/(w + b + n)) = ∫_0^1 f_{w,b}(p) P_{n,p}(k) dp.    (5.5.9)

(b) Deduce from (5.5.9) that for any x ∈ [0, 1],

lim_{n→∞} P(Mn ≤ x) = ∫_0^x f_{w,b}(p) dp.    (5.5.10)

(c) Explain, using Proposition 1.3.2, why this implies that M∞ ∼ β(w, b).
Kolmogorov's 0-1 law. Let ξ1, ξ2, ξ3, . . . be independent random variables. Define Fn = σ(ξ1, . . . , ξn) and Tn = σ(ξn, ξ_{n+1}, . . .). The tail σ-algebra is T∞ = ∩Tn. Kolmogorov's 0-1 law claims that T∞ is trivial: if A ∈ T∞, then either P(A) = 1 or P(A) = 0. The proof is immediate if one uses the martingale convergence theorem: define Mn = P(A | Fn) = E(1A | Fn). Then M is a uniformly integrable martingale, and lim_{n→∞} Mn exists and equals 1A. On the other hand, since for every n, A and Fn are independent, E(1A | Fn) = P(A). Consequently, P(A) almost surely equals 1A, which is the zero-one law.
Backward martingales and LLN. A process M = {Mn; n ∈ Z−} is called a backward martingale with respect to a filtration F_{−1} ⊃ F_{−2} ⊃ . . . , if E(Mn | Fℓ) = Mℓ for any ℓ < n ≤ −1. Alternatively, Mℓ = E(M_{−1} | Fℓ) for any ℓ ∈ Z−. Note that backward martingales are always uniformly integrable.
Let ξ1, ξ2, . . . be i.i.d. random variables, and assume that µ = E(ξ_i) is well defined and finite. Set

Sn = (1/n) Σ_{i=1}^n ξ_i  and  F_{−n} = σ(Sn, S_{n+1}, . . .).

By symmetry,

E(ξ1 | F_{−n}) = Sn.

So S is a backward martingale, and the limit S∞ = lim_{n→∞} Sn exists P-a.s. But S∞ is F_{−∞} = ∩F_{−ℓ} measurable. So by Kolmogorov's zero-one law S∞ is P-a.s. a constant. Since, by uniform integrability, lim E(Sn) = E(S∞), the LLN follows.

5.6 Transience and recurrence of Markov chains.

In this section X is an irreducible Markov chain with transition matrix P.
Criterion for transience. X is transient if and only if there exists a non-trivial, non-negative P-superharmonic function.
Indeed, non-trivial means that there exist x and y such that h(x) ≠ h(y). Now Mn = h(Xn) is a non-negative supermartingale, hence lim_{n→∞} h(Xn) exists, which means that

Pπ(Xn = x i.o. and Xn = y i.o.) = 0    (5.6.1)
for any initial distribution π.

Exercise 5.6.1. Check that (5.6.1) implies that the chain is transient.

In the other direction, if X is transient, then fix any x ∈ S and consider

h(y) = Py(Tx < ∞).    (5.6.2)

Let us check that h in (5.6.2) is (if the chain is transient, as we assume) a non-trivial super-harmonic function. By conditioning on the first step,

h(y) = Py(Tx < ∞; X1 = x) + Py(Tx < ∞; X1 ≠ x)
     = P(y, x) + Σ_{z≠x} P(y, z)h(z) ≥ P(y, x)h(x) + Σ_{z≠x} P(y, z)h(z);

the last inequality holds since for transient chains 0 ≤ h(x) < 1. So h is indeed super-harmonic.
On the other hand, since X is transient, Σ_n P^n(x, x) < ∞. Consequently,

lim_{m→∞} Px(θ_m Tx < ∞) ≤ lim_{m→∞} Σ_n P^{m+n}(x, x) = 0.

Since Px(θ_m Tx < ∞) = Σ_z P^m(x, z)h(z), it follows that inf_z h(z) = 0 and, since by irreducibility h is positive, it follows that h is non-trivial.
Example 5.6.1. Consider the random walk Xn = Σ_{ℓ=1}^n ξℓ such that
(i) E(e^{tξ}) < ∞ for any t ∈ R.
(ii) µ = E(ξ) ≠ 0.
(iii) P(ξ < 0) and P(ξ > 0) are both positive.

By the LLN, (ii) implies that X is transient. Let us see how this conclusion follows from the criterion above, namely from the existence of a positive non-trivial super-harmonic function.

Lemma 5.6.1. Under conditions (i)-(iii) above there exists t* ≠ 0 such that h(x) = e^{t*x} is harmonic.

Proof of Lemma 5.6.1. Consider the function

ϕ(t) = log E(e^{tξ}).    (5.6.3)

Exercise 5.6.2. Prove that ϕ is finite and differentiable by (i), convex either by direct computation or by Hölder, strictly convex by (iii) and, again by (iii), satisfies lim_{t→±∞} ϕ(t) = ∞.
Furthermore, ϕ(0) = 0 and ϕ′(0) = µ ≠ 0. Since lim_{t→±∞} ϕ(t) = ∞, there exists a second root t* ≠ 0 of ϕ(t) = 0. That is, there exists t* ≠ 0 such that E(e^{t*ξ}) = 1. But then,

Σ_y P(x, y) e^{t*y} = E(e^{t*(x+ξ)}) = e^{t*x},

as claimed.
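For a concrete step distribution the root t* can be located numerically. A minimal sketch, assuming SciPy is available; with P(ξ = 1) = p and P(ξ = −1) = 1 − p the second root is known in closed form, t* = log((1 − p)/p), which checks the computation:

    import numpy as np
    from scipy.optimize import brentq

    p = 0.7                                    # P(xi = 1) = p, P(xi = -1) = 1 - p
    phi = lambda t: np.log(p * np.exp(t) + (1 - p) * np.exp(-t))
    # mu = 2p - 1 > 0, so the second root of phi is negative
    t_star = brentq(phi, -10.0, -1e-9)
    print(t_star, np.log((1 - p) / p))         # the two agree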

A sufficient criterion for recurrence. Suppose there exist a finite set F ⊂ S and a non-negative function h on S such that:
(i) ∀x ∉ F, h(x) ≥ Σ_y P(x, y)h(y).
(ii) For any number r > 0 the set {y : h(y) ≤ r} is finite.
Then X is recurrent.
Proof. Consider TF = inf{n ≥ 1 : Xn ∈ F}.
Exercise 5.6.3. Check that TF is a stopping time, and that Yn = h(X_{n∧TF}) is a super-martingale under Px for any x ∉ F.
Hence lim_{n→∞} Yn exists and is finite. In view of (ii) this means that X_{n∧TF} can visit only a finite number of states. Therefore,

either Px(TF < ∞) = 1, or Px(Xn visits a finite number of states | TF = ∞) = 1.    (5.6.4)

Exercise 5.6.4. Check that the first alternative in (5.6.4) implies recurrence, whereas if Px(TF = ∞) > 0, then the state space S is infinite, and the second alternative in (5.6.4) is impossible.

A sufficient criterion for positive recurrence (Foster's Theorem). Assume that X is irreducible and that there exist a finite set F ⊂ S, an ε > 0 and a non-negative function h on S such that:
(i)′ ∀x ∉ F, h(x) − ε ≥ Σ_y P(x, y)h(y), and ∀x ∈ F, Σ_y P(x, y)h(y) < ∞.
Then X is positively recurrent.

Indeed, as before consider Yn = h(X_{n∧TF}), and consider Px for x ∉ F. By assumption,

Ex(Y_{n+1} | Fn) ≤ Yn − ε·1{n<TF},

which, by iteration, means that for any n

Ex(Y_{n+1}) ≤ Ex(Y0) − ε Σ_{k=0}^n Px(TF > k).

As a result, since Y is non-negative, we conclude that under our assumptions,

Ex(TF) ≤ h(x)/ε,    (5.6.5)
whenever x ∉ F. Note that by the first step analysis the second of the conditions in (i)′ now implies that Ex(TF) < ∞ also for any x ∈ F. The conclusion of Foster's theorem then follows from

Lemma 5.6.2. If X is irreducible, and if for some finite F ⊂ S the expectation Ex(TF) < ∞ for any x ∈ F, then the chain is positively recurrent.

Proof of Lemma 5.6.2. Let us start the chain from some x ∈ F, and let us define the following sequence of stopping times, which record the successive returns of X to F:

T0 = 0, T1 = TF, T2 = T1 + θ_{T1} TF, . . . , T_{n+1} = Tn + θ_{Tn} TF, . . .    (5.6.6)

For n ∈ N0 define Yn = X_{Tn}. By the strong Markov property of X, the chain Y is Markov on the finite state space F. Evidently it inherits irreducibility from X.
Let Hx be the first return time of Y to x. Since F is finite, Ex(Hx) < ∞.
Hx and Tx are related as follows: consider (5.6.6) and, for n ∈ N0, define S_{n+1} = θ_{Tn} TF. Then,

Tx = Σ_{n=1}^{Hx} Sn  ⇒  Ex(Tx) = Σ_{n=1}^∞ Ex(Sn 1{Hx > n−1}).

However, by the strong Markov property,

Ex(Sn | Hx > n − 1) ≤ max_{y∈F} Ey(TF) < ∞.

Hence Ex(Tx) < ∞, as claimed.

Exercise 5.6.5. Consider the following generalization of the Markovian model of a repair shop (4.2.4): let k ∈ N and let ξ1, ξ2, . . . be i.i.d. N0-valued random variables. For a chain starting at X0 define recursively

X_{n+1} = max{Xn + ξ_{n+1} − k, 0}.

We assume that Var(ξ_i) < ∞ and that the chain is irreducible. Set µ = E(ξ). Find (with proofs, of course) conditions in terms of µ under which the chain is transient, recurrent and positively recurrent.

Hint. If µ > k, then use the LLN. If µ ≤ k, then consider F = {0, . . . , k} and h(x) = x.
Discrete birth and death process. Consider the Markov chain Xn on N0 with transition probabilities

P(0, 1) = p0 = 1 and, for k ≠ 0, pk = P(k, k + 1) = 1 − P(k, k − 1) = 1 − qk.

Assume that pk, qk > 0 for any k > 0. Define

f(0) = 1 and, for k > 0, f(k) = (q1 · · · qk)/(p1 · · · pk).    (5.6.7)

Finally, define

h(n) = 0 if n = 0;  h(n) = 1 if n = 1;  h(n) = 1 + Σ_{k=1}^{n−1} f(k) if n > 1.    (5.6.8)

Exercise 5.6.6. (a) Check that for any n > 0 the function h satisfies:

h(n) = Σ_ℓ P(n, ℓ)h(ℓ).

(b) Prove that if lim_{n→∞} h(n) < ∞, then the chain X is transient.
(c) Prove that if lim_{n→∞} h(n) = ∞, then the chain X is recurrent.
(d) Prove that if qk ≥ pk for all k large enough, then the chain is recurrent.
(e) Prove that if there exists ε > 0 such that qk ≥ pk + ε for all large enough k, then the chain is positively recurrent.
Hint for (e): Use Foster's theorem with h(x) = x.
(f) Find an example where lim_{k→∞} (qk − pk) = 0, but the chain is still positively recurrent.
(g) Find necessary and sufficient conditions for X to be positively recurrent, and write down an expression for the invariant distribution πk, k = 0, 1, 2, . . . .
Hint for (f) and (g): Check that

µ0 = 1 and µn = (Π_{k=0}^{n−1} pk)/(Π_{k=1}^n qk) for n > 0

is an invariant measure for X.
(h) Define T0 = inf{n ≥ 1 : Xn = 0}. For a positively recurrent discrete birth and death process find a formula for Ek(T0), that is, the expected value of T0 for a chain starting at point k, for k = 0, 1, . . . .
Hint. Recall that in the positively recurrent (ergodic) case E0(T0) = π0^{−1}, and that by first step conditioning E0(T0) = 1 + E1(T0). This gives E1(T0) and, by similar reasoning, E_{k+1}(Tk) for any k ∈ N. Finally, note that by recurrence and the strong Markov property,

E_{k+1}(T0) = E_{k+1}(Tk) + Ek(T0).
6 Reversible Random Walks and Electric Networks
The exposition in this section is inspired by the books by Doyle-Snell and Lyons-Peres.

6.1 The setup: Probabilistic vs Electrostatic Interpretation.
Let X be an irreducible Markov chain on a finite or infinite state space V. Assume that X is reversible with respect to some (not necessarily finite) positive measure π, that is,

c(x, y) = πx P(x, y) = πy P(y, x) = c(y, x).    (6.1.1)

We shall consider V as the set of vertices of a graph G = (V, E), where the set of (un-oriented) edges E is

E = {(x, y) : c(x, y) = c(y, x) > 0}.

In this way we can interpret X as a random walk on G which makes random jumps across the edges of E, and the corresponding jump probabilities are given by

P(x, y) = c(x, y)/Σ_z c(x, z) = c(x, y)/πx.

Remark 6.1.1. We shall always assume that G is connected and locally finite; the latter means that only a finite number of edges are incident to any particular vertex v ∈ V.

Example 6.1.1. Simple random walk on a graph G, for instance on Zd, corresponds to πx = deg_G(x) and c(x, y) = 1, where deg_G(x) is the number of edges incident to x, or the number of neighbours of x in G. In the case of Zd, deg_G(x) = 2d for any x.

In the electrostatic interpretation of G as an electric network, V is the set of nodes, E is the set of wires between various nodes and, for e = (x, y) ∈ E, the quantity c(e) = c(x, y) is the conductance. The reciprocal r(e) = c(e)^{−1} is the resistance of the edge/wire e.
In electrostatics there are two disjoint subsets A, B ⊂ V which are wired to two different poles of a battery, so that the voltage at A is positive and the voltage at B is 0.

Remark 6.1.2. In the sequel, unless mentioned otherwise, we shall assume that
V \ (A ∪ B) is finite and connected.
In this case there is an unambiguously defined (see Theorem 6.1.1 below) equilibrium voltage distribution v on V, and there is an equilibrium current i from the source A to the sink B across the edges of E. In the sequel we shall use the symbol v(x) for the voltage at x and i(x, y) for the current across e = (x, y) from x to y. Note that i(x, y) may be positive or negative and that i(x, y) = −i(y, x). In the sequel we shall use

~E = the set of oriented edges of G.

If ~e = (x, y), then −~e = (y, x). For ~e = (x, y) ∈ ~E we use x = s−(~e) and y = s+(~e) to stress the orientation of the edge. In this way the outflow from A and the inflow into B are given by

Out(A) = Σ_{~e : s−(~e)∈A} i(~e)  and  In(B) = Σ_{~e : s+(~e)∈B} i(~e).    (6.1.2)

Electrostatics is governed by two laws which regulate voltages and currents (and which, in particular, imply that Out(A) = In(B)).
Ohm's Law. For ~e = (x, y) the voltages v(x) and v(y) and the current i(~e) = i(x, y) satisfy:

i(x, y) = c(x, y)(v(x) − v(y)).    (6.1.3)

Kirchhoff's Law. For any x ∈ V \ (A ∪ B),

Σ_{~e : s−(~e)=x} i(~e) = 0.    (6.1.4)

Theorem 6.1.1. There exists a unique equilibrium voltage distribution which is 1 on A and 0 on B, and which satisfies Ohm's and Kirchhoff's laws. In the sequel we shall call it the unit equilibrium voltage and denote it vAB.

Here is the relation between the random walk X and the unit equilibrium voltage distribution vAB:

Theorem 6.1.2. Let TA and TB be the first hitting times by X of A and B. Then,

vAB(x) = Px(TA < TB) for any x ∉ A ∪ B.    (6.1.5)

Furthermore, the total current i(A, B) from A to B equals

Out(A) = Σ_{a∈A} πa Pa(TB < TA) = In(B) = Σ_{b∈B} πb Pb(TA < TB).    (6.1.6)

Proof of Theorem 6.1.1 and Theorem 6.1.2. By the combination of Ohm's and Kirchhoff's laws, the unit equilibrium voltage is a harmonic function on V \ (A ∪ B) which equals 1 on A and 0 on B. This means that vAB satisfies the following equation for any x ∉ A ∪ B:

Σ_{y∼x} c(x, y)(v(x) − v(y)) = 0.    (6.1.7)

We accept without proof that if V \ (A ∪ B) is finite, then there is a unique such solution to (6.1.7). Hence the conclusion of Theorem 6.1.1.
The claim of Theorem 6.1.2 will then follow once we check that the function

h(x) = 1 for x ∈ A;  h(x) = 0 for x ∈ B;  h(x) = Px(TA < TB) otherwise    (6.1.8)

satisfies (6.1.7). But this is straightforward by the first step decomposition: for x ∉ A ∪ B,

h(x) = Ex h(X1)  ⇒  0 = Σ_y P(x, y)(h(x) − h(y)) = (1/πx) Σ_y c(x, y)(h(x) − h(y)),

where the last equality uses (6.1.1). Consequently, the outgoing current from any a ∈ A equals

Σ_x c(a, x)(1 − Px(TA < TB)) = Σ_x c(a, x) Px(TB < TA) = πa Pa(TB < TA),

and (6.1.6) follows.

Since we imposed a unit voltage drop between A and B, the following definition is natural in light of Ohm's law:

Definition 6.1.1. The effective conductance ceff(A, B) between A and B is defined via

ceff(A, B) = Σ_{a∈A} πa Pa(TB < TA).    (6.1.9)

The effective resistance reff(A, B) is the reciprocal of ceff(A, B).
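Finding vAB and ceff(A, B) on a concrete finite network amounts to solving the linear system (6.1.7). A minimal Python sketch (the 4-node diamond with a diagonal and unit conductances is our own toy example):

    import numpy as np

    # Node order: a, x, y, b; A = {a}, B = {b}.  Unit conductances on the
    # five edges a-x, a-y, x-b, y-b, x-y of our toy network.
    c = np.zeros((4, 4))
    for i, j in [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]:
        c[i, j] = c[j, i] = 1.0

    pi = c.sum(axis=1)
    L = np.diag(pi) - c                       # weighted graph Laplacian
    v = np.array([1.0, 0.0, 0.0, 0.0])        # boundary data: v = 1 on A, 0 on B
    interior = [1, 2]
    # harmonicity (6.1.7) on the interior: solve L_II v_I = -L_IB v_B
    rhs = -L[np.ix_(interior, [0, 3])] @ v[[0, 3]]
    v[interior] = np.linalg.solve(L[np.ix_(interior, interior)], rhs)

    c_eff = sum(c[0, j] * (v[0] - v[j]) for j in range(4))  # current out of a
    print(v, c_eff)                           # v(x) = v(y) = 1/2 and c_eff = 1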

6.2 Necessary and sufficient criterion for transience.

Let G be an infinite graph and X a reversible random walk on G. We continue to employ the notation (6.1.1), which relates X to the electric network with conductances c(x, y). We would like to formulate an electrostatic criterion which distinguishes between transience and recurrence.
By assumption the graph G is connected, so X is irreducible. We have to explore the escape probability to ∞ from any a ∈ V. Consider the singleton A = {a}. Then,

πa Pa(TB < Ta) = i(a, B) = ceff(a, B)    (6.2.1)


relates escape probabilities to effective conductances whenever G \ B is finite.
Let now Bn be a non-increasing sequence of subsets of G which goes to ∞ in the following sense:

for all n, V \ Bn is connected, and ∩n Bn = ∅.    (6.2.2)

For instance, in the case of Zd one can take

Bn = {x = (x1, . . . , xd) ∈ Zd : ||x||₁ = Σ_{i=1}^d |x_i| > n}.    (6.2.3)

Evidently, Pa(T_{Bn} < Ta) is non-increasing in n. Hence,

ceff(a, ∞) = lim_{n→∞} ceff(a, Bn)    (6.2.4)

is well defined. Since, by construction,

Pa(Ta = ∞) = ceff(a, ∞)/πa,    (6.2.5)

we have proved:
Theorem 6.2.1. An irreducible reversible Markov chain X on a countable state space V is transient if and only if for some (and hence for any) a ∈ V the effective conductance ceff(a, ∞) > 0.

6.3 Probabilistic interpretation of unit currents.

If a unit voltage difference is applied to a and B, then the total current i from a to B equals the effective conductance ceff(a, B). Hence, a voltage difference of ceff(a, B)^{−1} = reff(a, B) leads to the unit current

iaB(~e) := i(~e)/ceff(a, B) = c(x, y)(vaB(x) − vaB(y))/ceff(a, B),    (6.3.1)

where vaB is the equilibrium unit voltage described in Theorem 6.1.2.
Theorem 6.3.1. Given x, y ∈ V, let Jxy be the number of jumps from x to y for the random walk X which starts at a and which is stopped upon arrival at B. Then,

Ea(Jxy − Jyx) = iaB(x, y).    (6.3.2)

Proof. We shall prove (6.3.2) for x, y ∉ B. The remaining cases follow by a straightforward adaptation.

Ea(Jxy) = Σ_{n=0}^∞ Pa(X(n) = x; X(n + 1) = y; TB > n)
        = P(x, y) Σ_{n=0}^∞ Pa(X(n) = x; TB > n) = P(x, y) GB(a, x).    (6.3.3)
The function GB is called the Green's function with zero (Dirichlet) boundary conditions on B. Since by (6.1.1) P(x, y) = πx^{−1} c(x, y), in order to prove (6.3.2) we need to check that

GB(a, x)/πx − GB(a, y)/πy = (vaB(x) − vaB(y))/ceff(a, B).    (6.3.4)

By reversibility,

πa Pa(X(n) = x; TB > n) = πx Px(X(n) = a; TB > n),

which means that

GB(a, x)/πx = GB(x, a)/πa.

On the other hand,

GB(x, a) = Px(Ta < TB) GB(a, a) = vaB(x) GB(a, a).

However, by the strong Markov property, the number of visits Σ_{n=0}^∞ 1{X(n)=a; TB>n} has, under Pa, a geometric distribution with probability of success p = Pa(TB < Ta). Hence,

GB(a, x)/πx = vaB(x)/(πa Pa(TB < Ta)) = vaB(x)/ceff(a, B).

Hence (6.3.4).

6.4 Variational description of effective conductances and effective resistances.

Given a function g on V, let us define its energy E(g) as

E(g) = Σ_{(x,y)∈E} c(x, y)(g(x) − g(y))².    (6.4.1)

Above, each un-oriented edge (x, y) ∈ E is encountered exactly once. There is no inconsistency, since c(x, y) = c(y, x).
Definition. Let us say that θ : ~E → R is a flow from a to B if it is
(a) Antisymmetric: θ(x, y) = −θ(y, x).
(b) Kirchhoff's law is satisfied for any x ∈ V \ ({a} ∪ B):

Σ_{~e : s−(~e)=x} θ(~e) = 0.

(c) It is non-negative:

0 ≤ Σ_{~e : s−(~e)=a} θ(~e) = Σ_{b∈B} Σ_{~e : s+(~e)=b} θ(~e).    (6.4.2)
Note that the equality in (6.4.2) follows from (a) and (b).
The flow θ is called a unit flow if the quantities on both sides of (6.4.2) equal one. It is called positive if the inequality in (6.4.2) is strict. Otherwise, if both sides equal zero, it is called sourceless.
Recall (6.2.1). An example of a unit flow from a to B over oriented edges ~e = (x, y) is given by the unit current iaB in (6.3.1).
In view of (6.3.1) it makes sense to define the energy of a flow θ as

E(θ) = (1/2) Σ_~e θ(~e)²/c(~e) = (1/2) Σ_~e r(~e) θ(~e)².    (6.4.3)

In this way the energies of the equilibrium voltage vaB and of the unit flow iaB are related as follows:

E(vaB) · E(iaB) = 1, or E(iaB) = 1/ceff(a, B) = reff(a, B).    (6.4.4)
In particular, one can reformulate Theorem 6.2.1 in terms of resistances as follows:
Theorem 6.4.1. An irreducible reversible Markov chain X on a countable state space V is transient if and only if for some (and hence for any) a ∈ V the effective resistance reff(a, ∞) < ∞.
In order to use either of the two formulations for sorting out the recurrence/transience issue in particular examples, one relies on the following variational principle:
Theorem 6.4.2. The equilibrium voltage vaB is the unique solution of the minimization problem

min_{g(a)=1, g|B=0} E(g).    (6.4.5)

Consequently, the minimum in (6.4.5) equals ceff(a, B).
Similarly, the unit current iaB is the unique solution of the minimization problem

min_{θ unit flow from a to B} E(θ).    (6.4.6)

Consequently, the minimum in (6.4.6) equals reff(a, B).


Sketch of the proof. If g is a minimizer in (6.4.5), then in view of the necessary
condition for the minimum, the induced flow i(x, y) = c(x, y)(g(x) − g(y)0 should
satisfy the Kirchoff’s law at any vertex from V \ ({a} ∪ B). Hence, g = vaB by
Theorem 6.1.1.
The second claim of Theorem 6.4.2 is the instance of the so called Thomson’s
principle. Let us sketch the proof of the latter and refer to the book by Lyons-Peres
for complete detail:
Any unit flow θ from a to B could be recorded as

θ = iaB + (θ − iaB ) = iaB + η.
By construction η is sourceless, that is, Σ_{~e : s−(~e)=a} η(~e) = 0. Prime examples of sourceless flows are loops or vortices: let ~e1, . . . , ~ek be a sequence of adjacent pairwise distinct edges such that s−(~e1) = s+(~ek), and set η(~e) = Σ_{ℓ=1}^k 1{~e=~eℓ}.
If η is a vortex, then it is easy to see that

Σ_~e r(~e) iaB(~e) η(~e) = (1/ceff(a, B)) Σ_{ℓ=1}^k (vaB(s+(~eℓ)) − vaB(s−(~eℓ))) = 0.

Hence

E(iaB + η) = E(iaB) + E(η)    (6.4.7)

for any vortex η. It happens that vortices span sourceless flows in such a way that the orthogonality relation (6.4.7) holds for any sourceless flow η. Hence, the addition of a sourceless flow can only increase the energy, and the minimum in (6.4.6) is indeed attained at iaB, as claimed.

6.5 Rayleigh's principle and Nash-Williams criterion.

One can associate different sets of conductances {c(e)} with the edges e ∈ E of a given graph G = (V, E). Rayleigh's principle compares the corresponding effective conductances.
Theorem 6.5.1. If two assignments of conductances {c(e)} and {c′(e)} satisfy

c(e) ≤ c′(e) for any e ∈ E,    (6.5.1)

then ceff(A, B) ≤ c′eff(A, B) for any B such that V \ B is finite.
Proof. Since the associated energies satisfy E(g) ≤ E′(g) for any function g contributing to the minimum on the left-hand side of (6.4.5), the claim follows directly from the variational description of effective conductances.
Remark 6.5.1. In view of (6.2.4), (6.5.1) implies that

ceff(a, ∞) ≤ c′eff(a, ∞).

Hence the random walk X′ associated with {c′(e)} is transient if X is. Alternatively, X is recurrent if X′ is.
The Nash-Williams criterion gives a lower bound on the effective resistance reff(a, ∞).

Definition. A set of edges Λ is called a cutset (for a ∈ V) if any infinite path from a to infinity includes at least one edge from Λ.
For instance, if V = Zd and a = 0, then

Λn = {e = (x, y) : ||x||∞ = n, ||y||∞ = n + 1}    (6.5.2)

is a cutset for every n ∈ N. Above, ||x||∞ = max{|x1|, . . . , |xd|}.
Theorem 6.5.2. Let a ∈ V and let {Λn} be a countable family of disjoint finite cutsets for a. Then,

reff(a, ∞) ≥ Σ_n (Σ_{e∈Λn} c(e))^{−1}.    (6.5.3)

Proof. Let θ be a unit flow from a to ∞. Recall the definition of the energy E(θ) in (6.4.3). Since θ(−~e) = −θ(~e), the absolute value |θ(e)| is unambiguously defined for any (un-oriented) edge e. If we show that

Σ_{e∈E} r(e) θ²(e) ≥ Σ_n (Σ_{e∈Λn} c(e))^{−1},    (6.5.4)

then the claim follows by Thomson's principle (6.4.6). Since θ is a unit flow and since, for any n, Λn is a cutset, it is intuitively clear and actually not difficult to check that

Σ_{e∈Λn} |θ(e)| ≥ 1.

However, by Cauchy-Schwarz,

Σ_{e∈Λn} |θ(e)| = Σ_{e∈Λn} |θ(e)| √(c(e) r(e)) ≤ √(Σ_{e∈Λn} θ²(e) r(e)) · √(Σ_{e∈Λn} c(e)).

Therefore,

Σ_{e∈Λn} θ²(e) r(e) ≥ (Σ_{e∈Λn} c(e))^{−1},

and, since the Λn-s are pairwise disjoint, (6.5.4) follows.

6.6 Simple random walk on Zd and Polya's Theorem.

Simple random walk on a graph G = (V, E) corresponds to unit conductances c(e) ≡ 1 across the edges e ∈ E. Polya's theorem states that simple random walk on Zd is recurrent for d = 1, 2 and transient for d ≥ 3.
CASE 1. Low dimensions d = 1, 2. Consider the cutsets (6.5.2). Then,

(Σ_{e∈Λn} c(e))^{−1} = 1/2 if d = 1, and 1/(4(2n + 1)) if d = 2.    (6.6.1)

In both instances the series in (6.5.4) diverges.


CASE 2. High dimensions d ≥ 3. First of all, by Rayleigh's principle it is enough to establish transience for d = 3. By Thomson's principle we need to construct a finite-energy positive flow θ from the origin to ∞. This may be done in the following way. Let P be a distribution on random semi-infinite edge self-avoiding paths γ^ω = (γ0^ω, γ1^ω, γ2^ω, . . .) from the origin (γ0^ω ≡ 0) to ∞ (meaning lim_{n→∞} ||γn^ω||₁ = ∞ P-a.s.). We can talk about the events

{v ∈ γ^ω} = {∃n : v = γn^ω}  and  {~e ∈ γ^ω} = {∃n : ~e = (γn^ω, γ_{n+1}^ω)}.

Note that

θ(~e) = P(~e ∈ γ^ω) − P(−~e ∈ γ^ω)    (6.6.2)

is a unit flow from the origin to ∞.

Exercise 6.6.1. Construct a distribution P on the set of semi-infinite edge self-avoiding paths in Zd from the origin to ∞ which complies with the following uniform upper bound: there exists a constant C < ∞ such that

P(v ∈ γ^ω) ≤ C/(1 + ||v||∞)^{d−1}

for every v ∈ Zd. Check that for such a distribution P the flow θ in (6.6.2) has finite energy.

6.7 Simple random walk on trees.

Trees are graphs without loops. The homogeneous tree with forward branching ratio k, denoted Tk, is an infinite tree such that each vertex v ∈ Tk has exactly (k + 1) neighbours. Of course, T1 = Z. In the sequel let us fix a vertex 0 ∈ Tk and call it the root. Then for any v ∈ Tk the distance |v| from v to the root 0 is well defined.

Exercise 6.7.1. Consider the simple random walk X on Tk. Give two different proofs that X is transient for any k > 1:
(a) Consider Yn = |Xn| and describe it as a random walk on N0.
(b) Argue directly from Thomson's principle, that is, by constructing a finite-energy unit flow.
7 Renewal theory in continuous time
7.1 Poisson Process.
Recall that N is a Poisson random variable with parameter λ > 0, N ∼ Poi(λ), if

P(N = k) = (λ^k/k!) e^{−λ}, k = 0, 1, 2, . . . .    (7.1.1)

A Poisson process of arrivals with intensity λ is a collection of random variables {N(t)}_{t∈[0,∞)}, where N(t) describes the number of arrivals by time t. In this way N(s, t] = N(t) − N(s) is the number of arrivals during the time interval (s, t]. The inter-arrival times are denoted T1, T2, . . . . The time of the k-th arrival is denoted

Sk = Σ_{i=1}^k Ti.

The Poisson process of arrivals has (and is characterized by) the following set of properties:

• For each t > 0, N(t) ∼ Poi(λt). More generally, for each s < t, N(s, t] = N[s, t] ∼ Poi(λ(t − s)).

• For any k and any 0 < s1 < t1 ≤ s2 < t2 ≤ · · · ≤ sk < tk, the random variables N[s1, t1], . . . , N[sk, tk] are independent.

• The inter-arrival times T1, T2, . . . are independent and distributed Exp(λ).

• For any k ≥ 1, the time Sk of the k-th arrival is distributed Γ(k, λ); that is, the density function f_{Sk}(s) is zero for s < 0, and

f_{Sk}(s) = λ^k s^{k−1} e^{−λs}/(k − 1)! for s ≥ 0.    (7.1.2)
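The characterization via Exp(λ) inter-arrival times gives an immediate way to simulate N(t). A minimal Python sketch checking that the mean and variance of N(t) both equal λt (parameters are ours):

    import numpy as np

    rng = np.random.default_rng(6)
    lam, t, trials = 2.0, 5.0, 20_000

    def N_of_t(lam, t, rng):
        # count Exp(lam) inter-arrival times until the sum exceeds t
        total, n = 0.0, 0
        while True:
            total += rng.exponential(1.0 / lam)
            if total > t:
                return n
            n += 1

    counts = np.array([N_of_t(lam, t, rng) for _ in range(trials)])
    print(counts.mean(), counts.var())        # both ~ lam * t = 10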

7.2 The Setup and the Elementary Renewal Theorem.

We start directly with the delayed renewal setup, which enables us to consider the natural situation when the distribution of the time until the first arrival differs from the distribution of the successive inter-arrival times. Namely, let T1, T2, T3, . . . be independent non-negative random variables such that T2, T3, . . . are identically distributed with µ = E(Ti). In the sequel F′ is the distribution function of the Ti-s for i ≥ 2, and F is the distribution function of the initial delay T1.
We shall assume that the distribution F′ is proper and non-trivial, meaning that P(Ti < ∞) = 1 for i ≥ 2 and that F′(0) < 1. Necessary adjustments for the defective case P(Ti < ∞) < 1 will be explicitly mentioned whenever appropriate.
As in the discrete case we shall think of the Ti-s as inter-arrival times. Define the arrival times

Sn = Σ_{i=1}^n Ti, n = 1, 2, . . . .    (7.2.1)

Two basic quantities of interest are:

• The number of arrivals by time t ∈ [0, ∞): N(t) = #{k : Sk ≤ t}.

• The expected number of arrivals: m(t) = E N(t).

Let S′k, N′(t) and m′(t) be the corresponding quantities for zero delay, that is, when T1 is distributed as the rest of the Ti-s (or, depending on the context, when T1 is set to be zero).

Remark 7.2.1. (a) In the case of the exponential distribution Ti ∼ Exp(λ), N(t) is the familiar Poisson process of arrivals with intensity λ. However, our objective here is to explore a much more general situation.
(b) It might happen that the Ti-s take only integer values: P(Ti ∈ N0 = N ∪ {0}) = 1. Yet, even in such cases, we shall consider continuous time here; in particular, both N(t) and m(t) will be functions of the continuous variable t ∈ [0, ∞).

Note that the following events coincide:

{N(t) ≥ k} = {Sk ≤ t}  and  {N(t) ≤ k} = {S_{k+1} > t}.    (7.2.2)

Exercise 7.2.1. In the following two cases write explicit expressions for P(N′(t) = n) (for any t ∈ [0, ∞)):
(a) Ti ∼ Γ(2, λ).
(b) Ti ∼ Poi(λ).

Remark 7.2.2. In general it is impossible to compute quantities like P(N(t) = n) easily, or even exactly, and the purpose of renewal theory is to develop the analytic tools needed for the study of N on large time intervals [0, t].
Elementary Renewal Theorem for Delayed Renewal. Under our assumptions:

lim_{t→∞} N(t)/t = (1/µ) 1{T1<∞}  P-a.s.    (B.1)

Furthermore,

lim_{t→∞} m(t)/t = (1/µ) P(T1 < ∞).    (B.2)

Finally, for any t ≥ 0,

m(t) = F(t) + ∫_0^t m′(t − s) dF(s) = F(t) + ∫_0^t m(t − s) dF′(s).    (B.3)

Above, F and F′ are the distribution functions of T1 and of T2, T3, . . . , respectively.
The integrals in (B.3) are defined as:

∫_0^t m′(t − s) dF(s) := E(m′(t − T1) 1{T1≤t})  and  ∫_0^t m(t − s) dF′(s) := E(m(t − T) 1{T≤t}),

where T is distributed as T2, T3, . . . .
Remark 7.2.3. As in the discrete case, the almost sure convergence lim_{n→∞} Xn = X P-a.s. does not imply that lim_{n→∞} E(Xn) = E(X), even if all the expectations are defined and finite. This means that (B.1) does not automatically imply (B.2), and additional arguments, based on integral convergence theorems, are needed.
The proof is identical to the proof in the discrete case.
Proof. We shall split the proof into several steps.
STEP 1. Let us start with the zero-delayed process N′. We claim that

lim_{t→∞} N′(t) = ∞  P-a.s.    (7.2.3)

Indeed, since N′(t) is non-decreasing in t, the limit exists. Now, for any n ∈ N,

P(lim_{t→∞} N′(t) ≤ n) = lim_{t→∞} P(N′(t) ≤ n).

However, by (7.2.2),

P(N′(t) ≤ n) = P(S′_{n+1} > t) = 1 − F_{S′_{n+1}}(t),

where F_{S′n} is the distribution function of S′n. Since P(Ti < ∞) = 1,

lim_{t→∞} F_{S′n}(t) = P(S′n < ∞) = 1.

Hence (7.2.3).
STEP 2. By STEP 1 and the SLLN for (1/n) S′n,

lim_{t→∞} S′_{N′(t)}/N′(t) = µ  P-a.s.  ⇒  lim_{t→∞} N′(t)/S′_{N′(t)} = 1/µ  P-a.s.    (7.2.4)
STEP 3. By definition S′_{N′(t)} ≤ t < S′_{N′(t)+1}. Hence,

N′(t)/t ∈ [N′(t)/S′_{N′(t)+1}, N′(t)/S′_{N′(t)}].

By (7.2.4), the right end-points of the above intervals converge to 1/µ. As for the left end-points:

lim_{t→∞} N′(t)/S′_{N′(t)+1} = lim_{t→∞} (N′(t) + 1)/S′_{N′(t)+1} · N′(t)/(N′(t) + 1).

By STEP 1 we conclude that lim_{t→∞} N′(t)/(N′(t) + 1) = 1 P-a.s. (simply because lim N′(t) = ∞). On the other hand, lim_{t→∞} (N′(t) + 1)/S′_{N′(t)+1} = 1/µ by (7.2.4).
In order to prove (B.2) we shall, exactly as was done in the discrete case, rely on the Wald formula (2.2.10) for stopping times.
STEP 4. Let us assume first that µ < ∞. Since N′(t) + 1 is a stopping time, by Wald's formula

E S′_{N′(t)+1} = µ E(N′(t) + 1) = µ(m′(t) + 1).

However t < S′_{N′(t)+1}. Therefore,

1 ≤ E(S′_{N′(t)+1})/t = µ(m′(t)/t + 1/t)  ⇒  1/µ ≤ lim inf_{t→∞} m′(t)/t.    (7.2.5)

Note that (7.2.5) is trivial if µ = ∞.
STEP 5. For A > 0 define T_i^A = Ti ∧ A. Then {T_i^A} is an i.i.d. sequence of bounded random variables. By the Monotone Convergence Theorem,

lim_{A→∞} E(T_i^A) = lim_{A→∞} µ^A = µ.

Now, since T_i^A ≤ Ti we obviously have N^A(t) ≥ N′(t), and hence m^A(t) ≥ m′(t). Therefore it is enough to check that for any A,

lim sup_{t→∞} m^A(t)/t ≤ 1/µ^A.    (7.2.6)

STEP 6. Since the T_i^A-s are bounded above by A,

1 ≥ S^A_{N^A(t)}/t ≥ (S^A_{N^A(t)+1} − A)/t.

Therefore, taking expectations and using Wald's formula again,

1 ≥ (µ^A(m^A(t) + 1) − A)/t

for any A > 0. (7.2.6) follows.


STEP 7. Let us turn to the general delayed renewal. Delayed renewal means that our zero-delayed renewal process, which we continue to call N′, starts not at time zero but at the random time T1. Thus,

N(t) = 1{T1≤t} + 1{T1≤t} N′(t − T1).    (7.2.7)

By the renewal theorem for the zero-delayed process,

lim_{t→∞} N′(t − T1)/t = 1/µ

P-a.s. on the event {T1 < ∞}. This yields (B.1). Next, by the independence of T2, T3, . . . from T1,

E(1{T1≤t} N′(t − T1)) ≤ E(1{T1≤t} N′(t)) = P(T1 ≤ t) m′(t).


 
Therefore, by (B.2) for the zero-delayed process,

lim sup_{t→∞} m(t)/t ≤ (1/µ) P(T1 < ∞).

On the other hand, for every fixed A, 1{T1≤t} N′(t − T1) ≥ 1{T1≤A} N′(t − A) whenever t ≥ A. This means that

lim inf_{t→∞} m(t)/t ≥ (1/µ) P(T1 ≤ A).

Since A is arbitrary (and since P(T1 < ∞) = lim_{A→∞} P(T1 ≤ A)), (B.2) follows as well.
In order to prove (B.3), let us introduce, for k ≥ 1,

ϕk(t) = P(Sk ≤ t)  and  ϕ′k(t) = P(S′k ≤ t),    (7.2.8)

where, as usual, S′k = T1 + · · · + Tk with T1 distributed according to F′, as the rest of the Ti-s. In this notation,

m(t) = Σ_{k=1}^∞ ϕk(t)  and  m′(t) = Σ_{k=1}^∞ ϕ′k(t).    (7.2.9)

Of course, ϕ1(t) = P(T1 ≤ t) = F(t). Let us now consider k = ℓ + 1 for ℓ ≥ 1. In this case

ϕ_{ℓ+1}(t) = P(T1 + T2 + · · · + T_{ℓ+1} ≤ t) = E(1{T1≤t} P(S′ℓ ≤ t − T1 | T1)) = E(1{T1≤t} ϕ′ℓ(t − T1)).    (7.2.10)

In view of (7.2.10) and (7.2.7),

m(t) = F(t) + Σ_{ℓ=1}^∞ E(1{T1≤t} ϕ′ℓ(t − T1)) = F(t) + E(1{T1≤t} m′(t − T1)).

Similarly, if k = ℓ + 1, then

ϕ_{ℓ+1}(t) = P(T1 + T2 + · · · + T_{ℓ+1} ≤ t) = E(1{T_{ℓ+1}≤t} P(Sℓ ≤ t − T_{ℓ+1} | T_{ℓ+1})) = E(1{T_{ℓ+1}≤t} ϕℓ(t − T_{ℓ+1})).

This follows by the independence of T1, T2, . . . , Tℓ from T_{ℓ+1}. As a result, since the Ti-s are identically distributed for i ≥ 2,

m(t) = F(t) + Σ_{ℓ=1}^∞ E(1{T≤t} ϕℓ(t − T)) = F(t) + E(1{T≤t} m(t − T)),

where T is distributed according to F′. Both equalities in (B.3) follow.
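The statement (B.1) is easy to watch numerically. A minimal Python sketch with Uni(0, 1) inter-arrival times (µ = 1/2), an arbitrary choice on our part:

    import numpy as np

    rng = np.random.default_rng(7)

    def N_of_t(t, rng):
        # zero-delayed renewal process with Uni(0,1) inter-arrival times
        s, n = 0.0, 0
        while True:
            s += rng.random()
            if s > t:
                return n
            n += 1

    t = 200.0
    samples = np.array([N_of_t(t, rng) for _ in range(2_000)])
    print(samples.mean() / t)                 # ~ 1/mu = 2; checks (B.1)-(B.2)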

Exercise 7.2.2. Solve Exercise 2.2.4 using (B.3) for inter-arrival times Ti ∼ Uni(0, 1).
Hint: Note that M = N(1) + 1. Also note that for t ≤ 1 the expectation m(t) = E(N(t)) satisfies the following ordinary differential equation: m(0) = 0 and

m(t) = t + ∫_0^t m(t − s) ds  ⇒  (d/dt) m(t) = 1 + m(t).
102 CONTENTS
7.3 Renewal-reward theorem and applications.

The setup and the exposition is rather similar to the discrete case. In particular,
what happens during the initial delay T1 does not play a role, and we shall conve-
niently ignore it, as if T1 is distributed as the rest of Ti -s.
Consider a collection of i.i.d. couples of (Ti , Ri ), such that:

(a) We think of Ti -s in terms of inter-arrival times as in the case of usual renewals.


(b) We think of Ri -s in terms of awards collected during time interval of length Ti . It
makes sense to consider a general situation, e.g. we do not require Ri -s to be non-negative,
for instance the setup is adjusted to considering fines instead of rewards. In any case,
however, we shall assume that ERi is defined and finite,

r = E(Ri ) ∈ (−∞, ∞). (7.3.1)

(c) Note that in general Ti and Ri are dependent. What we require is independence of
couples (Ti , Ri ) for different i-s.

We continue
Pk to employ notation N (t) for the number of arrivals by time t and
Sk = 1 Ti for the time of k-th arrival. Recall that SN (t) ≤ t ≤ SN (t)+1 .
There are several ways to define reward collected by time t:

(a) Initial reward (collected at the beginning of an epoch) CI (t) = N


P (t)+1
Ri .
PN (t)1
(b) Terminal reward (collected in the end of an epoch) CT (t) = 1 Ri .
(c) Partial reward CP (t) = CT (t) + R(t, SN (t) ). For convenience we shall always assume
that partial reward is in between the terminal and the initial one, that is:
 
min 0, RN (t)+1 ≤ R(t, SN (t) ) ≤ max 0, RN (t)+1 . (7.3.2)

Here are two important examples:

Example 7.3.1. Ti -s are inter arrival times, Ri -s are service times, that is Ri is time
needed to serve i-th customer. Then CT (t) is the total time needed to serve all the cus-
tomers who arrived before t.

Example 7.3.2. Let Ti -s be inter-arrival times. Set


Z t
1 2  
Rk = Tk and R t, SN (t) = s − SN (t) ds. (7.3.3)
2 SN (t)
7. RENEWAL THEOREY IN CONTINUOUS TIME 103

Renewal-reward Theorem. . Assume (7.3.1) and (7.3.2). Then,

C∗ (t) r
lim = P − a.s., (7.3.4)
t→∞ t µ
for ∗ = I, T, P. Moreover,
E(C∗ (t)) r
lim = . (7.3.5)
t→∞ t µ

Proof. By (7.3.2) it would be enough to consider ∗ = I, T. For terminal rewards,


N (t) N (t)
CT 1X N (t) 1 X
= Rk = · Rk .
t t 1 t N (t) 1

Recall that we have already checked that a.s. − limt→∞ N (t) = ∞, and that a.s. −
limt→∞ Nt(t) = µ1 , the latter is just (A.1) of the elementary renewal theorem. On
the other hand,
N
1 X
a.s − lim Rk = r,
N →∞ N
1
by the strong LLN. The same logic applies for initial rewards:
N (t)+1
CI N (t) + 1 1 X
= · Rk .
t t N (t) + 1 1

Here is the first application:


Example 7.3.3. In the G|G|1-queue there is one server. Customers arrive with
i.i.d inter-arrival times Ti and their required service times Ri -s are also i.i.d. and
independent of Ti -s. It is assumed that an arriving customer immediately enters the
service if the server is free, otherwise he waits until the service of all the previous
customers is finished, and enters the service as soon as the server becomes free.
Assume also that the server was idle before time zero and that a customer
enters the service at time 0 and that the service time R0 is independent
and identical to Ri -s. Let M (t) be the total time that the server was busy during
[0, t]. Then Mt(t) describes the load of the server by time t. By renewal reward,
 
M (t) r
lim sup ≤ min 1, , (7.3.6)
t→∞ t µ
Indeed, (7.3.6) follows as soon as we notice that
N (t)
X
M (t) ≤ R0 + Rk = R0 + CT (t).
1
104 CONTENTS
r
It is tempting to conclude that there is an equality in (7.3.6) whenever µ
< 1. This
will be discussed in the framework of Little’s laws below.

Biased sampling/Size-biased distributions. Consider rewards as in the Ex-


ample 7.3.2. The expression
1 t
Z
1 
CP (t) = s − SN (s) ds
t t 0
has the following interpretation: We choose a random point uniformly from [0, t]
(and independently from the renewal process) and ask for the mean time since the
previous arrival. Then, the Renewal-Reward theorem implies:
1 t 1 E(T 2 )
Z

lim s − SN (s) ds = · . (7.3.7)
t→∞ t 0 2 E(T )
More generally, for u ≥ 0 consider the (normalized) partial reward

1 t
Z
1
CP (t) = 1 ds. (7.3.8)
t t 0 {SN (s)+1 −SN (s) ≤u}
This corresponds to the probability that a randomly sampled (uniformly from [0, t])
point falls into the inter-arrival interval of duration at most u. By the Renewal-
Reward theorem

1 t E T 1{T ≤u}
Z
lim 1{SN (s)+1 −SN (s) ≤u} ds = . (7.3.9)
t→∞ t 0 E(T )
Exercise 7.3.1. Check the following forms of the so called size-biased distribution:
(a) If T is a continuous random variable with density function f , then the right
hand side of (7.3.9) describes the law of continuous random variable T̃ with density
function f˜(t) = E(T
tf (t)
)
.
(b) If T is a discrete random variable with probability function p, then the right
hand side of (7.3.9) describes the law of discrete random variable T̃ with probability
tp(t)
function p̃(t) = E(T )
. What happens if P(T = 0) > 0?

Further exercises (mostly borrowed from Durrett and Grimmet-Stirzaker, see the
difference with their analogs in the discrete case):

Exercise 7.3.2. Suppose the lifetime of a car is a random variable with density
function h. B buys a new car as soon as the old one breaks down or reaches T years.
Suppose a new car costs a NIS and that an additional cost of b NIS is accumulate
if the car breaks down before T . What is the long-run cost per unit time of B’s car
policy?
Calculate the cost (as a function of T ) when the lifetime is uniformly distributed on
[0, 10], a = 10, and b = 3.
7. RENEWAL THEOREY IN CONTINUOUS TIME 105
Exercise 7.3.3. Let T1 , T2 , . . . be i.i.d. inter-arrival times with T1 ∼ Poi(λ), λ > 0.
Calculate the probability that, as t → ∞, a point uniformly chosen from (0, t) will
fall on an interval of length at least u > 0.

Exercise 7.3.4. A machine is working for an exponential time with parameter λ1 >
0 and is being repaired for an exponential time with parameter λ2 > 0. At what rate
does the machine break down? Calculate the probability that the machine is working
at a time point uniformly chosen from (0, t) as t → ∞.

Exercise 7.3.5. The weather in a certain locale consists of alternating wet and dry
spells. Suppose that the number of days in each rainy spell is a Poisson distribution
with mean 2, and that a dry spell follows a Geometric distribution with mean 7. As-
sume that the successive durations of rainy and dry spells are independent. Calculate
the probability that it rains at a point uniformly chosen from (0, t) as t → ∞.

Poissonian formulation ofPbiased sampling. Let T1 , T2 , . . . be i.i.d. inter-


arrival times. As usual Sk = k1 Ti . Let now A(t) be a Poisson process of intensity
λ which is independent of Ti -s. For instance, one can think about Ti -s being random
times between successive arrivals of buses to a certain bus station, whereas A(t)
describes arrivals of passengers to the same station. Let Rk be the number of
passengers who arrived to the station during the time interval (Sk−1 , Sk ]. Given
u > 0 consider the reward
Rku = Rk 1{Tk ≤u} .
Then the terminal award
N (t)
X
CTu (t) = Rku
1

describes the total number of passengers who departed before time t and whose
arrival “fell” into
 the inter-arrival (buses) interval of length less or equal to u. Note
that A SN (t) describes the total number of passengers who departed from the
station by time t. By the renewal-reward theorem.

1 SN (t) S C u (t) 1 E(Ru )


lim  CTu (t) = lim  · N (t) · T = · . (7.3.10)
t→∞ A SN (t) t→∞ A SN (t) t t λ E(T )

Now,

E(Ru ) = E(E(Ru | T )) = E 1{T ≤u} E(A(T ) | T ) = λE 1{T ≤u} T .


 

Therefore, the limiting ratio of passengers who fall into inter-arrival (buses) times
of length ≤ u is still given by size biased law (7.3.9).
106 CONTENTS
Little’s laws. Let us return to the Example 7.3.3. Recall that we are assuming
that the server was idle before time zero and that a customer enters the service
at time 0 and that the service time R0 is independent and identical to Ri -s For
simplicity we shall assume that P (Ti = 0) = 0, that is customers arrive one at
a time. In particular, this means that the customer which arrived at time zero
immediately entered the service.
Recall that we could not compute the limiting server load limt→∞ Mt(t) because
in general M (t) ≤ N
P (t)
0 Rk , that is because in general not all the customers who
arrived before time t complete their service before t. Let X(t) be the total number
of customers in the G|G|1 system (queue and server) at time t. For definitness we
take X to be right continuous. If X(t) = 0, then evidently M (t) = N
P (t)
0 Rk . Define
σ̂1 = inf {t > 0 : X(t) = 0} . (7.3.11)
That is σ̂1 is the first time when the server becomes idle.
Exercise 7.3.6. Consider the G/G/1 queue described in Example 7.3.3. Let µ be
the mean inter arrival time and r = E (R1 ) to be the mean service time. Denote
ρ = r/µ. Prove that if ρ < 1 then the queue will empty almost surely, or equivalently
P (σ̂1 < ∞) = 1 and if ρ > 1, and if there is a customer in the queue at time zero,
then there is a positive probability that it will never become empty.
Hint. Recall that we are assuming that the 0-th customer
Pk enters the system exatly
at time 0. We continue to use notation Sk = 1 Ti is the arrival time of k-th
customer. Note that σ̂1 > Sk if and anly if all the following happen:
T1 < R0
T1 + T2 < R0 + R1
(7.3.12)
...
T1 + · · · + Tk < R0 + · · · + Rk−1

Consider now Y` = `j=1 (Tj − Rj−1 ) := `j=1 ξ` . In this way, {Y` }`∈N is a random
P P
walk on R with i.i.d. steps ξ` . Note that E(ξ` ) < 0 if ρ < 1 and E(ξ` ) > 0 if ρ > 1.
Hence by the strong LLN, P-a.s.,
lim Y` = −∞ if ρ < 1 and lim Y` = ∞ if ρ > 1. (7.3.13)
`→∞ `→∞

Use (7.3.13) to complete the solution to Exercise 7.3.6.


Let us go back to (7.3.11). Define
τ̂1 = inf {t > σ̂1 : X(t) > 0} . (7.3.14)
That is τ̂1 is the first arrival time after σ̂1 , and the server resumes his work after
being idle during [σ̂1 , τ̂1 ). If
P (τ̂1 < ∞) = 1, (7.3.15)
then we may define, τ̂2 , τ̂3 , . . . recursively.
7. RENEWAL THEOREY IN CONTINUOUS TIME 107
Exercise 7.3.7. Check that under (7.3.15) the random variables τ̂1 , τ̂2 , . . . are i.i.d.

Therefore,
Pk assuming (7.3.15), τ̂i -s givePrise to a new renewal process, and we
set Ŝk = 1 τ̂i and, accordingly, N̂ (t) = k 1{Ŝk ≤t} . By construction (recall that
CT (t) = N
P (t)
1 Rk ),   X
M Ŝk = Rj .
j:Sj <Ŝk

Hence    
R0 + CT ŜN̂ (t)−1 ≤ M (t) ≤ R0 + CT ŜN̂ (t)+1 (7.3.16)

By the Renewal-reward theorem we know that limS→∞ CTS(S) = µr . In addition we


know that (7.3.15) implies (see the first step in the proof of the renewal theorem)
that limt→∞ ŜN̂ (t) = ∞, P-a.s. Writing
   
R0 ŜN̂ (t)−1 CT ŜN̂ (t)−1 M (t) R0 ŜN̂ (t)+1 CT ŜN̂ (t)+1
+ · ≤ ≤ + · ,
t t ŜN̂ (t)−1 t t t ŜN̂ (t)+1

we conclude:
Ŝk+1 M (t) r
If lim = 1, then lim = . (7.3.17)
k→∞ Ŝk t→∞ t µ

Exercise 7.3.8. (a) Check (7.3.17).


(b) Prove that if
E(τ̂i ) < ∞, (7.3.18)

then the assumption limk→∞ ŜŜk+1 = 1 in (7.3.17) holds.



k

Hint. For (a) remember that ŜN̂ (t) ≤ t ≤ ŜN̂ (t)+1 , and deduce

Ŝk+1 t
lim = 1 ⇒ lim = 1.
k→∞ Ŝk t→∞ ŜN̂ (t)±1

Then use Renewal-Reward. For (b) use LLN.


We shall discuss Condition 7.3.18 in the next section in a more general framework
of large deviation bounds. For the moment, let’s consider the following interpretation
of the right hand side of (7.3.17): Think about the system ”server” . It has two
states: 1, if he is busy, and 0, if he is idle. In this way L = limt→∞ Mt(t) describes
the limiting average size of this system. On the other hand W = r is an average
time that a customer spends at the system ”server”. Finally, λ = µ1 describes the
intensity with which customers enter/leave the system ”server”. Therefore, (7.3.17)
reads as
L = λW. (7.3.19)
108 CONTENTS
The relation (7.3.19) happens to be an instance of a very general rule, called Little’s
law. Let us describe and discuss a more general framework when this rule applies:
Customers arrive to the system at (in general random) times A1 < A2 < A3 < . . . ,
and leave at (again in general random) times D1 , D2 , D3 , . . . . Note that we do not
require that Di ≤ Dj if i < j. There are natural arrival and departure processes

A(t) = # {k : Ak ≤ t} and D(t) = # {k : Dk ≤ t} .

Naturally, D(t) ≤ A(t). Finally let Wk be the time which k-th customer spends in
the system, and let M (t) be the total number of customers in the system at time t.
Little’s laws describe situations when the limits
Z t N
1 A(t) D(t) 1 X
L = lim M (s)ds, λ = lim = limand W = lim Wk
t→∞ t 0 t→∞ t t→∞ t N →∞ N
1
(7.3.20)
exist, and, moreover, when the relation (7.3.19) holds. We shall justify Little’s laws
under the following set of assumptions on the system:
Assumption (Regenerative structure). There exist an infinite sequence of times
0 = S0 < S1 < S2 < . . . such that:
A1. The system is empty just before time Si , that is M (Si −) = 0.
A2. Define Ti = Si − Si−1 , Ni -number of customers who entered (and by A1. leaved) the
system during the time interval [Si−1 , Si ), and M [Si−1 , Si ) the trajectory of the process
M during the time interval [Si−1 , Si ). Then {Ti , Ni , M [Si−1 , Si )} is an i.i.d. collection of
random objects. RT
A3. The expectations E(T1 ), E(N1 ) and 0 1 M (t)dt are all finite.

Theorem (Little’s law). Under assumptions A1.-A3. the limits in (7.3.20) are defined
(and in particular two limits for A(t) and D(t) coinside), and, moreover, (7.3.19) holds.

Proof (After Grimmet-Stirzaker). Let W1 , W2 , . . . be the times spent by customers


number 1, 2, . . . in the system. Note that Wi = Di − Ai , and that in general
W1 , W2 , W3 , . . . are dependent. The proof relies on the following starightforward
geometric identities:
Z S1 N1 Z Sk N1 +···+Nk−1 +Nk
X X
M (t)dt = Wj or, more generally, M (t)dt = Wj
0 1 Sk−1 N1 +...Nk−1 +1
(7.3.21)
Indeed, by construction all j = 1, . . . , N1 arrivals during the interval [0, S1 ) satisfy
0 ≤ Aj < Dj < S1 . The total number of customer at time t ∈ [0, S1 ) is
N1
X
M (t) = 1{t∈[Aj ,Dj ]} .
j=1
7. RENEWAL THEOREY IN CONTINUOUS TIME 109
R S1
On the other hand, 0 1{t∈[Aj ,Dj ]} dt = Dj − Aj = Wj . Hence the first of (7.3.21).
The case of general k is completely similar.
In order to finish the proof we shall apply the Renewal-Reward theorem three
times
First application of Renewal-Reward. Consider inter-arrival times T1 , T2 , . . . and
rewards Z Sk
Rk = M (t)dt.
Sk−1

By Assumption A.3, we know that R1 , R2 , . . . are i.i.d. and that E(Rk ) < ∞.
Hence, by the renewal reward theorem
Z t
1 E(R) ∆
lim M (s)ds = = L. (7.3.22)
t→∞ t 0 E(T )

Second application of Renewal-Reward. Consider rewards Nk and let C∗ (t) be the


corresponding reward processes. Clearly

CT (t) ≤ D(t) ≤ A(t) ≤ CI (t).

Hence,
D(t) A(t) E(N ) ∆
lim = lim = = λ. (7.3.23)
t→∞ t t→∞ t E(T )
Third application of Renewal-Reward. Consider renewal process with (discrete) inter
arrival times Nk , and consider rewards,
N1 +···+Nk−1 +Nk Z Sk
X
Gk = Wj = M (t)dt = Rk .
N1 +...Nk−1 +1 Sk−1

The last inequality is precisely (7.3.21). Then the following limit exists:

1+N2X
+···+Nk
1 E(R) ∆
lim Wj = = W. (7.3.24)
k→∞ N1 + N2 + · · · + Nk E(N )
1

Thus, (7.3.19) follows by substitution of (7.3.22) - (7.3.24).

Exercise 7.3.9. Consider G|G|1 queue with mean inter-arrival time µ and average
service time r such that µ > r. Set ρ = µr . Let LQ be the average length of the
queue and L average number of the customers in the system. Similarly, let WQ be
the average time a customer spends waiting in the queue and W the average time
the customer spends in the system. Use an obvious relation between W and = WQ
and Little’s theorem to check that LQ = L − ρ.
110 CONTENTS
The case of G|G|1 queue. We shall prove A.3 for G|G|1 queue under additional assumption that
i.i.d. service times W0 , W1 , W2 , . . . have finite exponential moments:
 
E eaWk < ∞ for some a > 0. (7.3.25)

Of course, since Wk -s are non-negative, (7.3.25) always holds for any a ≤ 0. Also for simplicity we shall assume
that customers arrive one at a time, that is P(Tk = 0) = 0 and that E(T ) = µ < ∞.
Recall the definitions of σ̂1 and τ̂1 in (7.3.14). Also let η̂1 be the total number of customers who arrived to the
system during [0, τ̂1 ). That is η̂1 = N (σ̂1 ). Note that

Z τ̂1
M (t)dt ≤ σ̂1 η̂1 .
0

Thus A.3 will follow if we check that all the three expectations E (τ̂1 ) , E (η̂1 ) and E (σ̂1 η̂1 ) are finite. By the tail
formula we need to show that all the three integrals
Z ∞ Z ∞ Z ∞
P (τ̂1 > t) dt, P (η̂1 > t) dt and P (σ̂1 η̂1 > t) dt (7.3.26)
0 0 0

are finite.
STEP 1 (Bound on E(σ1 )). First of all:

Exercise 7.3.10. Check the following inclusion (between events):


 
 N (t) 
X
{σ̂1 > t) ⊆ W0 + Wk > t
 
1

Therefore, for any k,

k−1 k−1
! !
X X
P (σ̂1 > t) ≤ P (N (t) > k − 1) + P Wj > t = P (Sk ≤ t) + P Wj > t . (7.3.27)
0 0

Recall that we assume that r < µ. Fix a number ρ ∈ (r, µ), and let us consider t = kρ. Then one can rewrite the
right-hand side above as (µ = E(T ) and r = E(W )),

k k
! !
X X
P Ti ≤ kE(T ) − k(µ − ρ) +P Wk ≥ kE(W ) + k(ρ − r) . (7.3.28)
1 1

Lemma 7.3.1. Under our assumptions there exists a constant c = c(ρ) > 0, such that the expression in (7.3.28)
is bounded above by e−c(ρ)k for all k ∈ N.

Lemma 7.3.1 follows from the (Cramer’s) Large Deviation Upper bound, which will be formulated and explained
below. It implies that
Z ∞ ∞
X
P (σ̂1 > t) dt ≤ ρ P (σ̂1 > kρ) < ∞.
0 k=0

Generalized tail formula. Let X be a non-negative random variable and ϕ a non-decreasing and non-negative differen-
tiable function on [0, ∞) with ϕ(0) = 0. Then
Z ∞
E (ϕ(X)) = ϕ0 (t)P (X > t) dt. (7.3.29)
0

Formula (7.3.29) follows from the identity Z ∞


ϕ(X) = ϕ0 (t)1{X>t} dt
0

and from MON, which we still have to discuss.


7. RENEWAL THEOREY IN CONTINUOUS TIME 111
Exercise 7.3.11. Check, using (7.3.29) that under our assumptions on Ti -s and Wi -s (r < µ < ∞ and (7.3.25))
the expectation E σ̂1α is finite for any α > 0. Moreover, there exists δ > 0 such that even the exponential moment


E eδσ̂1 < ∞.


STEP 2 (Bound on E (η̂1 )). Note that (as in Exercise 7.3.10)

(k−1 k
)
X X
{η̂1 > k} = {σ̂1 > Sk } ⊆ Wj > Tj .
0 1

However, the probability of the right hand side above is bounded by (7.3.28) and hence

k−1 k
!
X X
P Wj ≥ Tj ≤ e−c(ρ)k (7.3.30)
0 1

for all k ∈ N. By Exercise 7.3.11 not only the expectation, but all the moments of η̂1 are bounded.
STEP 3 (Bound on E (σ̂1 η̂1 )). By Cauchy-Schwartz,

q
E σ̂12 E η̂12 .
 
E (σ̂1 η̂1 ) ≤

Since all the moments (second moment in particular) of both σ̂1 and η̂1 are bounded, the expectation on the left
hand side above is bounded as well.
STEP 4 It remains to show that E (τ̂1 ) is bounded as well. Now,

P (τ̂1 > 2t) ≤ P (σ̂1 > t) + P (σ̂1 ≤ t; τ̂1 > 2t) . (7.3.31)

It is tempting to conclude that P (σ̂1 ≤ t; τ̂1 > 2t) ≤ P(T > T ), but we are dealing with dependent events and,
therefore, should proceed with some care.
Clearly,
X
P (σ̂1 ≤ t; τ̂1 > 2t) = P (N (σ̂1 ) = k; σ̂1 ≤ t; τ̂1 > 2t) (7.3.32)
k

However,
k k−1
( )
X X
{N (σ̂1 ) = k} ⊆ Ti ≤ Wj .
1 0

Consequently,
k k−1
( )
X X
{N (σ̂1 ) = k; σ̂1 ≤ t; τ̂1 > 2t} ⊆ Ti ≤ Wj ∩ {Tk+1 ≥ t} . (7.3.33)
1 0

In the right hand side of (7.3.33) there is already an intersection of two independent events. A substitution to
(7.3.32) yields:
k k−1
!
X X X (7.3.30) P (T > t)
P (σ̂1 ≤ t; τ̂1 > 2t) ≤ P (T > t) P Ti ≤ Wj ≤ . (7.3.34)
k 1 0
1 − e−c(ρ)

Going back to (7.3.31) we conclude:

Z ∞ 
E(T )

E (τ̂1 ) = 2 P (τ̂1 > 2t) dt ≤ 2 E (σ̂1 ) + .
0 1 − e−c(ρ)

The proof is finished.


112 CONTENTS
Large Deviations. Let ξ1 , ξ2 , . . . be i.i.d random variables with E (ξi ) = 0. Define:

 
h(a) = log E eaξ and I(x) = sup {ax − h(x)} . (7.3.35)
a

The following upper bound is called exponential Chebychev inequality: For any x > 0 and a > 0,
 Pk 
k E e a 1 ξi
!
X  Pk 
P ξi ≥ kx = P ea 1 ξi
≥ eakx ≤ = exp {−k (ax − h(a))} (7.3.36)
1
eakx

Similarly,
 Pk 
k E e−a 1 ξi
!
X  Pk 
−a 1 ξi akx
P ξi ≤ −kx =P e ≥e ≤ = exp {−k (ax − h(−a))} (7.3.37)
1
eakx

The function h in (7.3.35) is called log-moment generating function. By Hölder inequality it is convex: If λ ∈ (0, 1)
and a < b, then
      
  λ 1−λ
h(λa + (1 − λ)b) = log E e(λa+(1−λ)b)ξ ≤ log E eaξ E ebξ .

Since E(ξ) = 0, Jensen inequality implies that h ≥ 0. Clearly, h(0) = 0. In general it might happen that h = ∞ on
an open or closed semi-line not containing zero. It might even happen that h(a) = ∞ ∀a 6= 0.
The function I in (7.3.35) is called the Legendre-Fenchel transform of h. Clearly, I ≥ 0 and, since h is
non-negative, h(0) = 0. Furthermore, for any x > 0,

I(x) = sup {ax − h(a)} if x > 0, and I(−x) = sup {ax − h(−a)}
a≥0 a≥0

Consequently, (7.3.36) and (7.3.37) imply:

Cramer’s Large Deviation upper bound. For any k ∈ N and for any x > 0,

k k
! !
X X
P ξi ≥ kx ≤ e−kI(x) and P ξi ≤ −kx ≤ e−kI(−x) . (7.3.38)
1 1

The bound (7.3.38) is non-trivial if I(x) > 0, respectively if I(−x) > 0. Here is a necessary and sufficient
condition:

Lemma 7.3.2. I(x) > 0 for any x > 0 iff there exists a > 0 such that h(a) < ∞. Similarly, I(−x) > 0 iff there exists
a > 0 such that h(−a) < ∞.

The proof of Lemma 7.3.2 is easy. Clearly I(x) = 0 if h(a) = ∞ for any a > 0. On the other hand, if h(a) < ∞
for some a > 0 (and hence by convexity for all b ∈ [0, a]) , then h is infinitely many times differentiable on (0, a),
and for b ∈ (0, a),
E ξebξ

0
h (b) =  (7.3.39)
E ebξ

Moreover, limb↓0 h0 (b) = E(ξ) = 0. The latter statement and (7.3.39) follow from the dominated convergence
theorem (DOM), which, as well as MON, will be formulated later. As a result, if x > 0, then there exists b ∈ (0, a]
such that h0 (b) ≤ x/2. But then,

Z b xb
I(x) ≥ bx − h(b) = (x − h0 (t))dt ≥ > 0.
0 2

The same regarding I(−x).


7. RENEWAL THEOREY IN CONTINUOUS TIME 113
Of course, condition E(ξi ) = 0 is for notational convenience only. If E(ξi ) = µ, then for any x > µ

k k
! !
X X
P ξi ≥ kx =P (ξi − µ) ≥ k(x − µ) ≤ e−kJ(x−µ) ,
1 1

where n   o n   o



J(x − µ) = sup a(x − µ) − log E ea(ξ−µ) = sup ax − log E eaξ = I(x). (7.3.40)
a a

The same for x < µ. I in (7.3.40) is called large deviation rate function.

Exercise 7.3.12. Compute rate function I for ξ ∼ Bernoulli(p), N (µ, σ 2 ), Poisson(λ) and Exp(µ).

7.4 Excess life distribution and stationarity.


Pk
Excess life P
distribution. Let T1 , T2 , . . . be i.i.d. inter-arrival times, Sk = 1 Ti
and N (t) = ∞ 1 1{Sk ≤t} . The excess life at time t is defined via

E(t) = SN (t)+1 − t. (7.4.1)

In words, E(t) is the time from t to the next arrival. Given u ≥ 0 let us compute
1 t
Z
e
F (u) = lim 1{E(s)≤u} ds. (7.4.2)
t→∞ t 0

F e above should be considered as a limiting distribution function of the excess life -


recall our interpretation of the normalized integral on the right hand side of (7.4.2)
in terms of the averaged excess life for a point uniformly (and independently fro the
renewal process) chosen from [0, t].
Here is our basic result on the form of excess life distribution:

Lemma 7.4.1. Let F be the distribution function of Ti -s. Then,


1 u
Z
e
F (u) = (1 − F (τ )) dτ. (7.4.3)
µ 0

Proof. The statement is a version of biased sampling and it follows by the renewal-
reward argument. Indeed, we are dealing with rewards
Z Tk
u
Rk = 1{Tk −s≤u} ds = Tk 1{Tk ≤u} + u1{Tk >u} .
0

Hence,

E(Rku ) = E T 1{T ≤u} + uP (T > u) = E (Y u ) + uP (T > u) .


The random variable Y u = T 1{T ≤u} is non-negative, and


(
0, if, τ ≥ u
P (Y u > τ ) = .
P (T > τ ) − P (T > u) if, τ < u.
114 CONTENTS
By the tail formula,
Z ∞ Z u Z u
u u
E (Y ) = P (Y > τ ) dτ = P (T > τ ) dτ −uP (T > u) ⇒ E (Rku ) = (1 − F (τ )) dτ,
0 0 0

and (7.4.3) follows.


Exercise 7.4.1. Find limiting excess life distribution F e if inter-arrival times are
distributed:
(a) Uni(0, 1)
(b) Exp(λ)
(c) Like sum of two independent variables X + Y with X ∼ Exp(1) and Y ∼ Exp(2).
(d) Geo(p).
(e) Uni(1, 2, . . . , n).

Stationarity. Let us say that a delayed renewal process T1 , T2 , T3 , . . . is stationary


if the distribution of the number of arrivals on interval (t, t + s]

N (t, t + s] = N (t + s) − N (t)

depends only on the length of the interval s (but not on the starting point t).
In the sequel we shall consider delayed renewals satisfying

P (0 < T1 < ∞) = 1 and µ = E (T ) < ∞. (7.4.4)

Note that if N (t) is stationary, then m(t) is linear, and hence, m(t) = µt .

Theorem 7.4.1. The renewal process is stationary iff one of the following happens:
(a) The distribution of excess life time E(t) does not depend on t.
(b) The distribution of T1 is given by (7.4.3).

Proof. If N is stationary, then for any t and s

P (E(t) > u) = P (N (t, t + u] = 0) = P (N (0, u] = 0) = P (E(0) > u) = P (T1 > u) .

Therefore,  Z t 
1
P (T1 ≤ u) = lim E 1{E(τ )≤u} dτ .
t→∞ t 0
Exercise 7.4.2. Prove that under our Assumption (7.4.4),

1 t
Z
lim 1{E(τ )≤u} dτ = F e (u),
t→∞ t 0

where F e is the limiting excess life distribution defined in (7.4.3).


7. RENEWAL THEOREY IN CONTINUOUS TIME 115
Bounded convergence theorem (BON) implies that the limit could be exchanged
with the expectation, and hence that F ∗ = F e .
To summarize: We have checked that stationarity of N implies stationarity of E
and F = F e . To finish the proof it would be enough to show that
E is stationary ⇒ N is stationary and F = F e ⇒ E is stationary. (7.4.5)
To check the first of implications (7.4.5) assume that the distribution of E(t) does not depend on t. We shall proceed
as in the proof of (B.3). First of all,
P (N (t, t + u] ≥ 1) = P (E(t) ≤ u) ,
which by assumption depends only on the value of u. Next, for k = ` + 1 and ` ≥ 1,
P (N (t, t + u] ≥ k) = E 1{E(t)≤u} 1{N (u−E(t)≥`} = E 1{E(t)≤u} ϕ0` (u − E(t)) ,
 

see (7.2.8) where ϕ0` was defined. The expression on the right hand side above depend only on the law of E ∗ (t)
which, by assumption, is the same for all t.
Let us turn to the second implication in (7.4.5)
t
Exercise 7.4.3. Check that m(t) = µ
solves
Z t 
t−T

m(t) = F e (t) + m(t − s)dF 0 (s) = F e (t) + E 1{T ≤t} . (7.4.6)
0 µ

t t
We shall assume that m(t) = µ
is the only solution to (7.4.6), which by (B.3) means that me (t) = µ
.

Let N e be the delayed renewal with T1 = T e distributed according to F e . We keep notation E e (t) for the
excess life time for this process. Evidently,
P (E e (t) > u) = P (T e > t + u) + P (N e (t) ≥ 1 ; N e (t, t + u] = 0)
(7.4.7)
X
= P (T e > t + u) + P (N e (t) = k ; N e (t, t + u] = 0) .
k≥1

Next, for every k ≥ 1,


 
P (N e (t) = k ; N e (t, t + u] = 0) = E 1{S e ≤t} (1 − F 0 (t + u − Ske ) . (7.4.8)
k

1
Exercise 7.4.4. Check that since T e = S1e has a continuous distribution (with density f1e (t) = µ
(1 − F (t)), then
Ske is a continuous random variable for any k ≥ 1.

Using Exercise 7.4.4 let fke be the density function of Ske . Then the right hand side of (7.4.8) reads as:
Z t
fke (τ ) 1 − F 0 (t + u − τ ) dτ.

0

On the other hand (by (MON)),



Z vX ∞ Z v ∞
X X v
fke (τ )dτ = fke (τ )dτ = P (Ske ≤ v) = me (v) = .
0 1 1 0 1
µ

for every v > 0. Which (omitting technical details) means that



X 1
fke (τ ) ≡ . (7.4.9)
1
µ

A substitution into (7.4.7) yields:


X 1
Z t
P (N e (t) = k ; N e (t, t + u] = 0) = 1 − F 0 (t + u − τ ) dτ.


k≥1
µ 0

All together,
1
Z ∞ Z t 
1
Z ∞
P (E e (t) > u) = 1 − F 0 − (τ ) dτ + 1 − F 0 (t + u − τ ) dτ 1 − F 0 (τ ) dτ,
  
=
µ t+u 0 µ u

and, by (7.4.3), we are home.


116 CONTENTS
Key renewal (Blackwell’s ) theorem in the non-lattice case. (Without
proof). Let us say that a random variable T has a non-lattice distribution if 6 ∃ a, h
such that
P (T ∈ a + hZ) = 1.
Assume that i.i.d inter-arrival times have non-lattice distribution. Then, for any
s ∈ [0, ∞),
s
lim (m(t + s) − m(t)) = = me (s). (7.4.10)
t→∞ µ
The reason behind (7.4.10) should be clear: as t → ∞, the distribution of the shifted
process s 7→ N (t + s) − N (t) should converge to the distribution of the stationary
process with the delay distributed according to the limiting excess life distribution
1 u
Z
e
F (u) = (1 − F (τ )) dτ.
µ 1

As we already know the stationary renewal function satisfies me (t + s) − me (t) ≡ µs ,


which is the right hand side of (7.4.10). Proving convergence to stationary renewal
involves a modification of coupling construction (indeed in the case of continuous
time it is unreasonable to expect that renewals in independent copies happen simul-
taneously), which goes beyond the scope of the course. A good reference is Lindvall’s
Lectures on the Coupling Method.
8. CONTINUOUS TIME MARKOV CHAINS. 117
8 Continuous Time Markov Chains.
8.1 Finite state space.
Let S be a finite state space.

Definition 8.1.1. (Q-matrices and associated semi-groups). An S × S-matrix Q is called


Q-matrix if: (
qxy , if x 6= y,
Q(x, y) = , (8.1.1)
−qx , if x = y
and, in addition, X
∀ y 6= x, qxy ≥ 0 and ∀ x, qxy = qx .
y6=x

Another name for Q as above is infinitesimal generator. A family of matrices {Pt }t∈R+ ;

tQ ∆
X (tQ)n
Pt = e = . (8.1.2)
n=0
n!

is called Markov semi-group generated by Q.

Exercise 8.1.1.
Let S be a finite set, Q be an infinitesimal generator, and let Pt be the associated
semi-group.
(a) Check that
 n
Pt+s = Pt Ps ∀ t, s ∈ R+ , in particular, Pt = P nt for any t ∈ R+ and n ∈ N
(8.1.3)
(b) Check that for any t the matrix Pt is stochastic, that is:
X
For any x, y, Pt (x, y) ≥ 0 and, for any x, Pt (x, y) = 1. (8.1.4)
y

Hint: For proving the first of (8.1.4), notice that Pδ has non-negative entries for δ
sufficiently small,
P and then use (8.1.3). For proving the second of (8.1.4) use (8.1.2)
and note that y Q(x, y) = 0 for any x.
(c) Check that the matrix-valued function t 7→ Pt is the unique solution to both
Kolmogorov’s forward (8.1.5) and backward (8.1.6) differential equations below:
d
Pt = Pt Q and P0 = I. (8.1.5)
dt
and
d
Pt = QPt and P0 = I. (8.1.6)
dt
118 CONTENTS
Equivalently, for any function f on S, for any x and for any t ≥ 0,
Z t Z t
Pt f (x) − f (x) = QPs f (x)ds = Ps Qf (x)ds. (8.1.7)
0 0

Definition 8.1.2. (Cadlag stochastic process). A stochastic process {Xt } which takes
values on a finite or countable state space S is called cadlag if it is P-a.s is right continuous
and has left limits.

For finite state spaces this implies the following:


(a) {Xt } can be recovered from its values at rational points,

P-a.s for any t ∈ R+ , Xt = lim Xq . (8.1.8)


q↓t

(b) For all t ∈ R+ and ω ∈ Ω there exists  > 0 such that X has at most one jump on
(t − , t + ).
As we shall see below, on countable state spaces it is in principle possible that the process
escapes to ∞ in finite time, so (a) and (b) above will be refined to accommodate such a
possibility.

Let X be a cadlag process. Define the following filtration F = {Ft } of σ-algebras:


Ft = σ (Xs , s ∈ [0, t]) = ∩>0 σ (Xq , q ∈ Q ∩ [0, t + ]) . (8.1.9)
The second equality above follows from right continuity. It will play a role for
establishing the strong Markov property.
There are three alternative definitions/constructions of continuous time Markov
chains (CTMC) on finite state spaces S:

Definition 8.1.3. (Martingale problem). A cadlag process X on a finite state space S is


called CTMC with infinitesimal generator Q, and respectively with semi-group of transition
probabilities Pt = etQ if for any function f on S, the process
Z t
f
Mt = f (Xt ) − Qf (Xs )ds (8.1.10)
0

is a martingale (with respect to the filtration F).

Definition 8.1.4. (Analytic). A cadlag process X on a finite state space S is called CTMC
with infinitesimal generator Q, and respectively with semi-group of transition probabilities
Pt = etQ if for any function f on S and for any t, s ≥ 0,

E f (Xt+s ) Ft = Ps f (Xt ) P − a.s. (8.1.11)
8. CONTINUOUS TIME MARKOV CHAINS. 119
Exactly as in the case of discrete time Markov chains (8.1.11) implies the follow-
ing conventional definition of Markov property: For any 0 ≤ t1 < t2 < · · · < tn <
tn+1 and for any x1 , x2 , . . . , xn .xn+1 ∈ S,
 
P Xtn+1 = xn+1 | Xtn = xn , . . . , Xt1 = x1 = P Xtn+1 = xn+1 | Xtn = xn . (8.1.12)
Furthermore, for any s, t ≥ 0 and for any x, y ∈ S,
P (Xt+s = y | Xt = x) = Ps (x, y), (8.1.13)
and finite dimensional distributions of X are given by:
X
P (Xtn = xn , . . . , Xt1 = x1 ) = P(X0 = y)Pt1 (y, x1 ) . . . Ptn −tn−1 (xn−1 , xn ).
y
(8.1.14)
We do not have time for a complete workout of above equivalences and conclusions.
For instance in order to see how (8.1.11) implies (8.1.13) just write:
P (Xt+s = y; Xt = x) = E {δy (Xt+s )δx (Xt )} = E {δx (Xt )E (δy (Xt+s ) | Ft )}
(8.1.11)
= E {δx (Xt )Ps δy (Xt )} = E {δx (Xt )Ps δy (x)} = Ps (x, y)P(Xt = x).
Let us sketch how the martingale characterization (8.1.10) implies (8.1.11): Let f be a function on S. Since Mtf is
a martingale,
Z s  Z s

E (f (Xt+s )|Ft ) − f (Xt ) = E Qf (Xt+τ )dτ Ft = QE f (Xt+τ ) Ft dτ. (8.1.15)
0 0
∆ 
which means that P̃s f (Xt ) = E f (Xt+τ ) Ft satisfies for any f
(a) By right continuity lims→0 P̃s f (Xt ) = f (Xt ).
d
(b) For any s > 0, ds P̃s f (Xt ) = QP̃s f (Xt ).
(a) and (b) above give a randomized form of the backward equation (8.1.6). By the unicity of solutions to the
latter (which is just a fact about systems of ODE-s when we are on a finite state space),

E f (Xt+s ) Ft = P̃s f (Xt ) = Ps f (Xt ),

which is (8.1.11).

Graphical Representations. The main (third) equivalent way to think about


CTMC is given by graphical representations. They are based on the Strong Markov
Property, which we formulate without proof:
Strong Markov Property for CTMC on finite state space. Let X be a CTMC and
let T be stopping time with respect to {Ft }. Define the associated σ-algebra FT exactly
as in the discrete time:

FT = {B : B ∩ {T ≤ t} ∈ Ft for any t ∈ R+ } .

Then for any n, any function F on Sn and any 0 ≤ t1 < t2 < · · · < tn ,

E 1{T<∞} F (XT+t1 , . . . , XT+tn ) FT = 1{T<∞} EXT (F (Xt1 , . . . , Xtn )) P − a.s. (8.1.16)

In particular, given {T < ∞; XT = x}, the chains X[0, T] and X[T, ∞] are independent,
and the chain X[T, ∞] is distributed like the the usual chain which starts from x. We use
Px for the later.
120 CONTENTS

Now a CTMC which starts at some x ∈ X behaves as follows: Define


τx = inf {t : Xt 6= x} . (8.1.17)
Exercise 8.1.2. Check that τx is a stopping time.
We shall call τx holding time at x. By the strong Markov property the chain
starts afresh at time τx at (in general random) state Xτx . Thus in order to reconstruct
the chain, the only thing wee need is to describe joint distribution of (τx , Xτx ) for
any x ∈ S.
Theorem (Joint distribution of (τx , Xτx )). Let X be a finite state Markov chain with
generator Q in (8.1.1). Then for any x, y ∈ S:
(a) τx ∼ exp (qx )
(b) If qx > 0, then Px (Xτx = y) = qqxyx .
(c) Under Px (and assuming that qx > 0) random variables τx and Xτx are independent.

Sketch of the proof. It is easy to understand claim (a): By the usual Markov prop-
erty,
Px (τx > t + s |τ > t) = Px (τx > s) . (8.1.18)
To make this rigorous, however, requires some work. Indeed, if what we call usual
Markov property is (8.1.12), then it involves only finite number of times, whereas
events {τx > t + s} involve a continuity of times. One should go to a limit, and
right-continuity will play a role.
Assuming (8.1.18), we conclude that τx is memoryless. Hence it is exp (λ) with
some λ ≥ 0. The fact that λ = qx follows from intuitively obvious (and indeed easily
justifiable in the finite state space case) fact: For t small,
Px (τ > t) = Px (Xt = x) + o(t) = Pt (x, x) + o(t).
But then λ = qx follows from (8.1.2).
An alternative clean and very short proof of (a) follows by the martingale char-
acterization (8.1.10) and by the following fact which we inherit without a proof from
the discrete time case:
If Mt is a martingale and T is a stopping time, then Mt∧T is also a martingale.
(8.1.19)
Fix x and define f (y) = 1{y6=x} . Using (8.1.10) and (8.1.19) with T = τx , we infer
that Z τx ∧t 
Ex (f (Xτx ∧t )) = E Qf (Xs )ds . (8.1.20)
0
for any t ≥ 0. Note that for s < τx the quantity Qf (Xs ) = qx . On the other hand,
by right continuity, f (Xτx ∧t ) = 1{τx ≤t} . Therefore (8.1.20) reads as: For any t ≥ 0
Z t
Px (τx ≤ t) = qx Ex (τx ∧ t) = qx Px (τx > s) ds. (8.1.21)
0
8. CONTINUOUS TIME MARKOV CHAINS. 121
(8.1.21) is an integral equation for Px (τx ≥ t). It has the unique solution Px (τx ≥ t) =
e−qx t , which is precisely the tail of Exp (qx ).
Claim (b) of the theorem follows easily if we are permitted to rely on optional
stopping in the continuous case. The relevant input (without proof) follows:
Optional stopping for continuous time martingales. Let Mt be a bounded martin-
gale in the following sense: There exists R < ∞, such that |Ms+t − Ms |1T ≤s+t ≤ tR, P-a.s
for all t and s. Let T be a stopping time with finite expectation E (T) < ∞. Then,

E (MT ) = E (M0 ) . (8.1.22)

Turning back to the proof of (b) pick y 6= x. Then, by (8.1.10), the process
Z t
Mt = δy (Xt ) − Qδy (Xs ) ds
0

is a martingale. In view of (a), Ex (τx ) = q1x < ∞, and conditions of the optional
stopping Theorem above are satisfied. Therefore, noting that Qδy (x) = qxy ,
qxy
Ex (Xτx = y) = E (δy (Xτx )) = qxy Ex (τx ) = .
qx
A short proof of (c) relies on the following generalization of martingale property
(8.1.10):
General Martingales related to CTMC. Let X be a CTMC on a finite state space S,

and let g(t, x) be a function on R+ × S which is bounded and whose derivative ∂t g( t, x) is
also bounded. Then
Z t 
g ∂
Mt = g (t, Xt ) − g (s, Xs ) + Qg (s, Xs ) ds (8.1.23)
0 ∂t

is a martingale.
Let us turn to the proof of (c). Pick y 6= x and λ > 0, and consider g(t, z) =
e−λt δy (z). Again, if qx > 0, optional stopping applies for Mtg and τx under Px . Since
under Px , M0g = g(0, x) = 0,
Z τx 
g −λτx −λs
 
0 = Ex Mτx = Ex e 1{Xτx =y} − qxy Ex e ds .
0

Hence, for any y 6= x and any λ > 0,


 
−λτx qxy  qxy qx qxy qx
1 − Ex e−λτx =

Ex e 1{Xτx =y} = 1− = .
λ λ qx + λ q x qx + λ
We conclude: For any y 6= x and for any λ > 0,
Ex e−λτx 1{Xτx =y} = Ex e−λτx Px (Xτx = y) .
 
(8.1.24)
122 CONTENTS
Which, by known results about moment generating functions and Laplace trans-
forms, means that τx and Xτx are independent.

In view of the above Theorem CTMC with generator Q could be constructed


as a continuous time randomization of a discrete time Markov Chain, the so called
Jump Chain. Let us first describe the latter:
Jump Chain. The Jump Chain {Yn } is a discrete time Markov chain on S with transition
probabilities
qxy
R(x, y) = Px (Xτx = y) = . (8.1.25)
qx

Graphical Construction 1. CTMC starting from initial distribution π; P (X0 = x) =


π(x) is constructed as follows:
(a) Construct the jump chain Yn starting from π.
(b) Let ξ0 , ξ1 , . . . be i.i.d exponential Exp(1) random variables, which are also independent
from {Yn }. Define holding times τn = qξYn .
n
(c) Set (
Y0 , if t < τ0
Xt = . (8.1.26)
if 0n−1 τ` ≤ t < n0 τ`
P P
Yn ,

Here is a useful version of the above construction:


Graphical Construction 2. For each pair of states (x, y) define Poisson process Nxy
of intensity qxy . We think about arrivals of Nxy in terms of arrivals of arrows from x to
y. Processes Nxy are independent for different pairs (xy). Then, the trajectory of Xt is
constructed as follows:
(a) Sample realizations of arrows Nxy for all pairs of sites.
(b) Start X at time zero according to a distribution π (and independently from all the
arrows).
(c) Go up the time direction until the first arrow leading from the state the CTMC is
currently in. Follow the arrow and continue going up the time direction until the next
arrow. And so on.

Martingales related to Graphical Construction 2. If Nt is Poisson process


of intensity λ, then
Mt = Nt − λt

is a martingale. This has the following far reaching generalization which we proceed
to discuss:
Let Xn be a CTMC on a finite state space S with Q-matrix Q. Given a Poisson
process N and a cadlag random function ψ which is adapted to filtration F (meaning
8. CONTINUOUS TIME MARKOV CHAINS. 123
that for any t random variable ψ(t) ∼ Ft ), let define the integral
Z t Nt
X
ψ(s−)dN (s) = ψ(Sk −),
0 k=1

where S1 , S2 , . . . are arrival times of N .

Exercise 8.1.3.

(a) Check (at least on a heuristic level) that for any x 6= y with qxy > 0 and for any
function g on S the process
Z t Z t
Mtg = g (Xs− ) dNxy (s) − qxy g (Xs ) ds, (8.1.27)
0 0

is a martingale. Above Xs− is the left limit Rof X at s. and, given a function ψ on
t
[0, t] and a Poisson process N , the integral 0 ψ(s)dN (s) is simply defined as
where S1 , S2 , . . . are arrival times of N .
(b) Use (8.1.27) to derive the following: Let Πt (x) be the total time spent by X at x
during [0, t], and let Jxy (t) be the number of jumps from x to y during [0, t]. Then,

Mtxy = Jxy (t) − qxy Πt (x)

is a martingale.
Hint: Consider g = δx .
(c) Again, at least on a heuristic level, check that
xy
E (Mt+s − Mtxy )2 ≤ qxy s. (8.1.28)

Assuming that LLN (5.4.5) holds for continuous time martingales (which it does),
we infer from (8.1.28) that

1
lim (Jxy (t) − qxy Πt (x)) = 0.
t→∞ t

Πt (x)
In particular, if limt→∞ t
= π(x) exists, then

Jxy (t)
lim = π(x)qxy .
t→∞ t

The latter conclusion is both an instance of PASTA and of the Ergodic Theorem for
CTMC on finite state space, which we proceed to discuss.
124 CONTENTS
8.2 Ergodic theorem for CTMC on a finite state space.
Let us say that a CTMC X on finite state space S is irreducible if its Jump chain
Y defined in (8.1.25) is. Using Graphical Construction 1, we can easily transfer all
the conclusions from discrete MC (on finite state spaces) to CTMC. For instance let
Ty be the first time when Xt arrives/returns to y (that is after making at least one
jump). Let Ny be the number of steps needed for Yn to reach/return to y. Then for
x 6= y
Ny −1
!
X ξ`
Ex (Ty ) = Ex .
0
qY `

Since, for irreducible chains on finite state space q̄ = minz qz > 0, we conclude that
1
max Ex (Ty ) ≤ max Ex (Ny ) < ∞.
x6=y q̄ x6=y
In particular irreducible CTMC on finite state spaces are always positively recurrent.
Next, cycle decomposition for Xt is inherited from the cycle decomposition of
Yn : Fix x ∈ S. Let Ñ1 , N1 , N2 , . . . be cycle lengths (integer inter-renewal times)
for Yn . Recall that Ni is distributed as the first return time Nx under Px . Define
independent T̃1 , T1 , T2 , . . . via
Ñ 1 −1 N i −1
X ξ` Px
X ξ`
T̃1 = and Ti ∼ . (8.2.1)
0
qY` 0
qY`

By positive recurrence, E (Ti ) < ∞. Hence renewal-reward applies. For a given


y ∈ S, the reward
N −1
y
X ξ`
R = δy (Y` )
0
qy
corresponds to the total time spend in y during a renewal cycle. Which means that
1 t Ex (Ry ) ∆
Z
lim δy (Xs ) ds = = π(y) (8.2.2)
t→∞ t 0 Ex (Tx )
It is instructive to compare π in (8.2.2) with the invariant measure µ of the Jump
chain:
Exercise 8.2.1. (a) Check that
µ(y)/qy
π(y) = P . (8.2.3)
z∈S µ(z)/qz

(b) Check that π is the unique (up to multiplication by a constant) non-negative


solution of X
π(z)Q(z, y) = 0 ∀ y ∈ S. (8.2.4)
z∈S
Hint: Use already known facts about µ.
8. CONTINUOUS TIME MARKOV CHAINS. 125
π is called invariant distribution or steady state, and (8.2.4) is the steady state
equation. Many quantities related to long-term behaviour of Xt could be character-
ized in terms of π:
Exercise 8.2.2. Let Xt be an irreducible CTMC on a finite state space S with
generator Q and invariant distribution π. Solve using renewal theory and common
sense:
(a) Use (8.2.2) with x = y in order to check that for any y ∈ S,
1
Ey (Ty ) = . (8.2.5)
qy π(y)
(b) For any function f on S ,
Z t
1 X
lim f (Xs ) ds = f (x)π(x). (8.2.6)
t→∞ t 0 x∈S

(c) Prove that for any y ∈ S


π(y)
E (Time spent at y between two successive visits to x) = .
qx π(x)
(d) Prove that for any x and y with qxy > 0 ,
1
E (Time between two successive jumps x → y) = .
π(x)qxy
(e) For x, y with qxy > 0 and z ∈ S find and prove expression for
E (Time at z between two successive jumps x → y) .

8.3 Countable state space.


We continue to work with Q-matrices as defined in (8.1.1). In the countable case
we shall request in addition that
X
qx = qxy < ∞ ∀ x. (8.3.1)
y6=x

Informally there is no sites which the chain would instantaneously leave.

Example 8.3.1.

A Pure Birth Chain has generator Q with the following jump rates:
0 q0 1 q1 2 q2 3 q2
• −→ • −→ • −→ • −→ . . . (8.3.2)

Recall that there were three ways to think about CTMC on finite state spaces: Graphical
representations, Analytic and via Martingale problems. Let us check whether and how
these approaches go through in the case of countable state spaces:
126 CONTENTS
Graphical representation. Consider (8.1.26). Because of (8.3.1) everything is
well defined and the construction makes sense. However, it might be ambiguous.
Indeed define
X n X∞
Jn = τ` and J∞ = lim Jn = τ` . (8.3.3)
n→∞
0 0

Then, (8.1.26) tells how to construct Xt only for t < J∞ . But what happens if
Definition (Explosion)
P (J∞ < ∞) > 0. (8.3.4)

Exercise 8.3.1. 1. Check that if x ∼ y, then

Px (J∞ < ∞) > 0 ⇔ Py (J∞ < ∞) > 0.

In particular, in the case of irreducible CTMC there is no ambiguity in saying that


the chain is explosive if P (J∞ < ∞) > 0.
2. Consider the pure birthP∞ process of Example 8.3.1. Check that
1
(a) P (J∞ = ∞) = 1 ⇔ 0 q` = ∞.
P∞ 1
(b) P (J∞ < ∞) = 1 ⇔ 0 q` < ∞.
P∞ 1
(c) If 0 q` < ∞, then for any time t > 0,

P (J∞ < t) > 0. (8.3.5)

To summarize: The graphical construction gives probabilities

Pt (x, y) = Px (Xt = y; t < J∞ ) . (8.3.6)

It does not tell what happens with the process on [J∞ , ∞]. There is a freedom to
postulate this. The simplest thing to do is to declare that the process is killed at J∞
and either add a cemetery state ∂ where the process is absorbed at J∞ , or to think
in terms of sub-probability distributions. P −1
Or, for instance, in the pure birth process with q` < ∞ it is possible to
declare XJ∞ = 0 or, more generally, to fix p ∈ [0, 1] and to sample XJ∞ from any
probability distribution on N0 with probability p, or to send it to the cemetery state
∂ with probability q = 1 − p.
Analytic approach. In the countable case it is not immediately clear how to make
sense out of (8.1.2). However, (8.1.5) and (8.1.6) could be viewed as an infinite
systems of ODE-s.
Theorem (Without proof). Pt defined in (8.3.6) is always a solution, and the minimal
one, to both (8.1.5) and (8.1.6). It is unique iff the process is non-explosive, that is if
P (J∞ = ∞) = 1.
8. CONTINUOUS TIME MARKOV CHAINS. 127
Note that unicity in the non-explosive case follows from minimality. Indeed, if

Pt another solution, then
is P by minimilaity P̃t ≥ Pt . But in the non-explosive case
y Pt (x, y) ≡ 1. Since y P̃t (x, y) ≤ 1, the equality follows.
Martingale problem. Both Mtf in (8.1.10) and Mtg in (8.1.23) are ambiguous if the process is explosive. However,
if P (J∞ = ∞) = 1, then Mtg is a martingale. Indeed, let Sn be an increasing sequence of finite subsets of S, such
that S = ∪Sn . Set
Rn = inf {t ≥ 0 : Xt 6∈ Sn } .
Since we assume that the chain is non-explosive,

lim Rn = ∞ P − a.s. (8.3.7)


n→∞

On the other hand Xt∧Rn could be viewed as a CTMC on the finite set Sn ∪ ∂, where we kill X at time Rn , XRn = ∂.
For any bounded (and with bounded derivatives) function g consider
(
g(t, x), if x ∈ Sn
gn (t, x) =
0, if x = ∂.

Then,
Z t∧Rn 
∂g(s, Xs )

Mtg,n = gn (t ∧ Rn , Xt∧Rn ) − gn (0, X0 ) − + Qg(s, Xs ) ds
0 ∂s
is a martingale for any n ∈ N.
By (8.3.7) the limit
Z t 
∂g(s, Xs )

Mtg = lim Mtg,n = g (t, Xt ) − g (0, X0 ) − + Qg(s, Xs ) ds
n→∞ 0 ∂s

exists. Using (BON) for conditional expectations, one can conclude that Mtg,n is a martingale as well.

8.4 Explosions.
We shall discuss irreducible chains here. Let Q be a Q-matrix. Pick λ > 0 and
consider the following in general infinite system of linear equations: For any x ∈ S,
X
(λ + qx ) g(x) = qxy g(y) or, in vector notation, λg = Qg. (8.4.1)
y6=x

Theorem (Reuter’s criterion.) An irreducible CTMC is explosive if and only if for some
λ > 0 there exists a bounded and positive solution gλ to (8.4.1). If such gλ exists for some
λ > 0, then it exists for all λ > 0.

Proof. Assume that Xt is explosive. Pick λ > 0. By 1. of Exercise 8.3.1,

gλ (x) = Ex e−λJ∞

(8.4.2)

is positive (and of course bounded) for any x ∈ S. We claim that gλ solves (8.4.1).
Indeed, by Graphical construction τx ∼ Exp(qx ) under Px and it is independent of
Xτx . By the same authority,
qxy
Px (Xτx = y) = .
qx
128 CONTENTS
Hence
 X qxy qx X qxy
gλ (x) = Ex e−λJ∞ = Ex e−λτx Ey e−λJ∞ =
 
gλ (y),
y6=x
q x λ + q x
y6=x
q x

which is (8.4.1).
Assume now that gλ is a positive and bounded solution of (8.4.1) for some λ > 0.
There is no loss of generality to assume that supy gλ (y) ≤ 1. Pick any x ∈ X. Then,
 X qxy (8.4.1)
Ex e−λJ1 gλ (XJ1 ) = Ex e−λJ1

gλ (y) = gλ (x).
u6=x
qx

Iterating (and using gλ ≤ 1) we conclude that for any n ∈ N,


gλ (x) = Ex e−λJn gλ (XJn ) ≤ Ex e−λJn .
 
(8.4.3)
Taking n → ∞ we infer from (BON) that,
0 < gλ (x) ≤ E e−λJ∞ .

(8.4.4)
Hence Px (J∞ < ∞) > 0.
Exercise 8.4.1.
Check that if Xt is irreducible, and if there exist λ > 0 and a bounded non-
negative and non-trivial function g on S which satisfies
X
(λ + qx ) g(x) ≤ qxy g(y) or, in vector notation, λg ≤ Qg, (8.4.5)
y6=x

for any x ∈ S, then Xt is explosive.


Exercise 8.4.2. Birth and Death Process. The state space of Xt is S = N0 .
The jump rates are, for any k ∈ N0 , given by
Q(k, k + 1) = λk and Q(k + 1, k) = µk+1 . (8.4.6)
Check that Xt is explosive iff
∞  
X 1 µn µn µn−1 . . . µ1
+ + ··· + <∞ (8.4.7)
n=1
λ n λn λ n−1 λ n λ n−1 . . . λ0

Exercise 8.4.3.
Check that in general an irreducible CTMC Xt on a countable state space S is
non-explosive if one of the following conditions happens:
(a) If supx qx < ∞.
(b) Xt is recurrent.
Exercise 8.4.4.
Find an example of a CTMC such that P (J∞ < ∞) ∈ (0, 1).
Hint: Think of two different pure birth processes or, more generally, of two
different transient birth and death processes.
8. CONTINUOUS TIME MARKOV CHAINS. 129
8.5 Ergodic theorem for CTMC on countable state spaces.
In the sequel we assume that a CTMC Xt is recurrent (and hence in particluar
non-explosive).
Steady state eqaution. Let us say that a non-trivial and non-negative {πx } is an
invariant measure if X
πx q x = πy qyx , (SE)
y6=x

for any x ∈ S. P
An invariant measure is called invariant distribution if x πx = 1.

Following Exercise 8.2.1 we know that if π satisfies (SE), then {µx = qx πx } is


an invariant measure for the jump chain Yn . Hence, Theorem 4.6.3 implies that
{πx } is strictly positive and finite, and that it is the unique solution to (SE) up to
a multiplication by a constant. In particular, invariant distribution is unique, if it
exists. And it exists iff X
πx < ∞. (8.5.1)
x
Next fix x ∈ S and consider the cycle decomposition of Yn . As in the finite state
space case it induces the cycle decomposition of Xt , see (8.2.1). Since
N
Xx −1

µy = Ex 1{Yn =y}
0

is an invariant measure for the jump chain Yn ,


Nx −1  Z Tx 
1 X
πy = Ex 1{Yn =y} = Ex 1{Xt =y} dt , (8.5.2)
qy 0 0

is an invariant measure for Xt . Hence, in view of (8.5.1) the invariant distribution


for Xt exists iff , X
πy = Ex (Tx ) < ∞, (8.5.3)
y

that is iff Xt is positively recurrent.


Exercise 8.5.1. Find examples when there is an invariant distribution for the jump
chain Yn , but not for the CTMC Xt , and vice versa, when Xt is positively recurrent,
whereas Yn is not.
If Xt is positively recurrent, then the renewal-reward result applies.
Exercise 8.5.2. Ergodic theorem for CTMC. Assume that Xt is irreducible and
positively recurrent.
(a) Check that
1
πx = (8.5.4)
qx Ex (Tx )
130 CONTENTS
is the invariant distribution. P
(b) Check that for any bounded f on S, or even for any f satisfying x |f (x)|πx <
∞, the limit
1 t
Z X
lim f (Xs )ds = f (x)πx P − a.s. (8.5.5)
t→∞ t 0
x

(c) Let Jxy (t) be the number of jums of Xt up to time t. Check that
Jxy
lim = πx qxy . (8.5.6)
t→∞ t

Hint: Think about Jxy (t) in renewal-reward terms.


(d) More generally given x1 , . . . , xn ∈ S define Jx1 x2 ...xn (t) to be the number of times
up to t such that X makes successive jumps x1 7→ x2 7→ x3 7→ . . . 7→ xn . Then
Jx1 x2 ...xn (t)
lim = πx1 qx1 R(x1 , x2 )R(x2 , x3 ) . . . R(xn−1 , xn ). (8.5.7)
t→∞ t
where R(x, y) are transition probabilities of the jump chain Yn .
Hint: Consider (new) continuous time Markov chain on on Sn with jump rates qxn ,y
from (x1 , x2 , . . . , xn ) 7→ (x2 , . . . , xn , y).

8.6 Biased sampling and PASTA.


The Poissonian form of biased sampling (7.3.10) was formulated in a general context.
In particular it applies when the renewal structure is given by a cycle decomposition
of an ergodic (irreducible and positively recurrent) CTMC.
PASTA is based on the martingale LLN. A basic continuous time martingale
related to Poisson process N (t) is
Mt = N (t) − λt. (8.6.1)
Let S1 , S2 , . . . be the arrival times of N . Then we can define
Z t X
G(s)dN (s) := G (Sk ) . (8.6.2)
0 Sk ≤t

Let Xt be an ergodic CTMC on some state space S and let N (t) be a Poisson process
of intensity λ, such that X[0, t) and N [t, ∞) are independent. Let g be a function
on S Then, since by definition Xt is cadlag and, in particular, has left limits, the
expression Z t
g (Xs− ) dN (s)
0
is well defined by means of (8.6.2). Consider
Z t Z t
g
Mt = g (Xs− ) dN (s) − λ g (Xs ) ds
0 0
8. CONTINUOUS TIME MARKOV CHAINS. 131
A generalization of (8.6.1) implies that Mtg is a martingale.
If, in addition supt E g 2 (Xt )2 < ∞, then Mtg satisfies the LLN, that is

1 g
lim Mt = 0 P − a.s.. (8.6.3)
t→∞ t

However, by (8.5.5) ,
Z t X
lim g (Xs− ) ds = πx g(x).
t→∞ 0 x
P
The quantity x πx g(x) is the objective long-range time average of g(Xt ).
Since limt→∞ Nt(t) = λ, using S1 , S2 , . . . to describe arrival times of N , we con-
clude:
N (t) N
X 1X 1 X
λ πx g(x) = lim g (XSk − ) = λ lim g (XSk − ) .
t→∞ t N →∞ N
x 1 1

Thus, Poisson Arrivals See Time Averages.

Exercise 8.6.1. Assume that buses arrive to a station according to Poisson process
of intensity µ and passengers arrive to this station according to Poisson process of
intensity λ, and independently of buses. Assume that all the passengers board the
bus when it arrives, and the whole procedure takes essentially zero time.
Define Nk to be the number of passengers already waiting at the station which
passenger number k sees upon his/her arrival. Compute
n
1X
lim Nk .
n→∞ n
1

8.7 Reversibility.
Let us start by setting up appropriate notions in the context of irreducible discrete
time Markov chains Y = (Yn ) on finite or countable state spaces S.

Definition 8.7.1. An positive measure µ on S is said to satisfy the detailed balance


condition with respect to matrix of transition probabilities P if

µx P(x, y) = µy P(y, x) ∀ x, y ∈ S. (8.7.1)

In the latter case we shall say the Markov chain X is reversible with respect to µ.

Remark 8.7.1. Note that we do Pnot require µ to be a probability distribution, or in


other words, we do not require y µy < ∞.
132 CONTENTS
Note that if µ satisfies (8.7.1), then it is invariant. Indeed,
X (8.7.1) X
µy P(y, x) = µx P(x, y) = µx .
y y

In particular Q = P, where Q is the transition matrix of the reversed chain defined


in (4.6.1). Which means that for any x1 , . . . , xn ,

µx1 P(x1 , x2 )P(x2 , x3 ) . . . P(xn−1 , xn ) = µxn P(xn , xn−1 )P(xn−1 , xn−2 ) . . . P(x2 , x1 ).
(8.7.2)

Kolmogorov’s criterion for reversibility. A Markov chain X is reversible (meaning


that there exists a measure µ which satisfies the detailed balance condition (8.7.1) with
respect to the transition matrix P of X) iff for any cycle x1 , . . . , xn , x1 ,

P(x1 , x2 )P(x2 , x3 ) . . . P(xn , x1 ) = P(x1 , xn )P(xn , xn−1 ) . . . P(x2 , x1 ). (8.7.3)

Proof. If X is reversible with respect to µ, then by (8.7.2) for any x1 ,

µx1 P(x1 , x2 )P(x2 , x3 ) . . . P(xn , x1 ) = µx1 P(x1 , xn )P(xn , xn−1 ) . . . P(x2 , x1 ).

Since µ (by definition) is positive, this implies (8.7.3) .


Other way around, let us assume (8.7.3). Fix x0 ∈ S and set µx0 = 1. Then for
any y ∈ S take x1 , . . . , xn such that the probability of the path P( x0 , x1 ) . . . P(xn , y)
is positive (such paths exist by irreducibility ), and set

P( x0 , x1 ) . . . P(xn , y)
µy = . (8.7.4)
P( y, xn ) . . . P(x1 , x0 )

Exercise 8.7.1. Check that under (8.7.3) the measure µy in (8.7.4) is well defined
(that is the expression in the right hand side of (8.7.4) does not depend on the path
chosen), and that µ indeed satisfies the detailed balance condition with respect to P.

Let us turn the case of continuous time Markov chains. Consider an irreducible
CTMC X with Q-matrix Q as in (8.1.1). Assume that π is a (positive) invariant
measure
P πQ = 0. Note that we do not assume that π is a distribution, or even, that
x πx < ∞

Definition 8.7.2. (Generator of reversed chain). For x 6= y,


1
q̂xy = πy qyx or, in other words, πx q̂xy = πy qyx (8.7.5)
πx
8. CONTINUOUS TIME MARKOV CHAINS. 133
Note that, because of the invariance πQ = 0 of π, the holding rates for the direct and reversed chains are equal:

q̂_x = ∑_{y≠x} q̂_{xy} = (1/π_x) ∑_{y≠x} π_y q_{yx} = (1/π_x) π_x q_x = q_x. (8.7.6)

We shall denote the reversed chain by X̂. Note also that if Ŷ is the jump chain of X̂, then its matrix of transition probabilities R̂ satisfies

R̂(x, y) = q̂_{xy}/q_x = π_y q_{yx}/(π_x q_x) = (1/µ_x) µ_y R(y, x), (8.7.7)

where the middle step uses (8.7.5) and (8.7.6), and µ_x = π_x q_x is the invariant measure of the jump chain Y of the original CTMC X. In other words, Ŷ is the reversal of Y. In particular, (8.7.2) holds for the pair (Y, Ŷ) of discrete time Markov chains.
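As a quick numerical illustration (our own example, not from the notes), one can build Q̂ via (8.7.5) for an arbitrary irreducible generator and confirm the equality (8.7.6) of holding rates:

    import numpy as np

    Q = np.array([[-2.0,  1.0,  1.0],
                  [ 3.0, -4.0,  1.0],
                  [ 1.0,  1.0, -2.0]])        # an arbitrary 3-state generator

    # invariant measure: solve pi Q = 0 together with sum(pi) = 1
    A = np.vstack([Q.T, np.ones(3)])
    pi = np.linalg.lstsq(A, np.array([0., 0., 0., 1.]), rcond=None)[0]

    Qhat = (pi[None, :] * Q.T) / pi[:, None]  # q_hat_xy = pi_y q_yx / pi_x
    np.fill_diagonal(Qhat, 0.0)               # (8.7.5) defines x != y only
    np.fill_diagonal(Qhat, -Qhat.sum(axis=1))

    print(np.allclose(np.diag(Qhat), np.diag(Q)))  # holding rates agree: (8.7.6)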
The following claim (stated without proof) is a generalization of (8.7.2) to CTMC. The direct and reversed chains satisfy: for any 0 < t_1 < t_2 < · · · < t_n and any x_0, x_1, . . . , x_n,

π_{x_0} P_{x_0}(X_{t_1} = x_1, . . . , X_{t_n} = x_n) = π_{x_n} P̂_{x_n}(X_{t_n−t_{n−1}} = x_{n−1}, . . . , X_{t_n−t_1} = x_1, X_{t_n} = x_0). (8.7.8)

A proof of (8.7.8) is based on (8.7.2) for the associated jump chain Y, on the relation µ_x = q_x π_x between the invariant measure π of X and the invariant measure µ of Y, and on the following

Exercise 8.7.2. Let T_1 ∼ Exp(q_1), T_2 ∼ Exp(q_2) be two independent exponential random variables. Then,

(1/q_1) P(T_1 < t; T_1 + T_2 ≥ t) = (1/q_2) P(T_2 < t; T_2 + T_1 ≥ t). (8.7.9)

More generally, if T_i ∼ Exp(q_i); i = 1, . . . , n, are independent exponential random variables, then

(1/q_1) P(∑_{i=1}^{n−1} T_i < t; ∑_{i=1}^{n} T_i ≥ t) = (1/q_n) P(∑_{i=1}^{n−1} T_{n+1−i} < t; ∑_{i=1}^{n} T_{n+1−i} ≥ t). (8.7.10)
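A quick Monte Carlo sanity check of (8.7.9) (the values q1 = 1, q2 = 2.5, t = 1 are arbitrary illustrative choices):

    import random

    q1, q2, t, n = 1.0, 2.5, 1.0, 10**6
    lhs = rhs = 0
    for _ in range(n):
        t1, t2 = random.expovariate(q1), random.expovariate(q2)
        if t1 < t <= t1 + t2:   # T1 < t and T1 + T2 >= t
            lhs += 1
        if t2 < t <= t2 + t1:   # T2 < t and T2 + T1 >= t
            rhs += 1

    print(lhs / (q1 * n), rhs / (q2 * n))   # the two estimates agree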

Definition 8.7.3. A positive measure π on S is said to satisfy the detailed balance condition with respect to the Q-matrix Q if

π_x q_{xy} = π_y q_{yx} ∀ x ≠ y. (8.7.11)

In the latter case we shall say the CTMC X is reversible with respect to π.
Note that if π satisfies (8.7.11), then it is invariant. Indeed,

∑_{y≠x} π_y q_{yx} = ∑_{y≠x} π_x q_{xy} = π_x q_x,

where the first equality is (8.7.11).

In particular, Q̂ = Q, where Q̂ is the Q-matrix of the reversed chain defined above.

Theorem 8.7.1. Assume that an irreducible CTMC X is reversible with respect to π. Then for any 0 < t_1 < t_2 < · · · < t_n and any x_0, x_1, . . . , x_n,

π_{x_0} P_{x_0}(X_{t_1} = x_1, . . . , X_{t_n} = x_n) = π_{x_n} P_{x_n}(X_{t_n−t_{n−1}} = x_{n−1}, . . . , X_{t_n−t_1} = x_1, X_{t_n} = x_0). (8.7.12)

If π is a probability distribution, then the right hand side of (8.7.12) defines the finite dimensional distributions of the CTMC X = {X(t)}_{t∈(−∞,∞)} in equilibrium. In this case (8.7.12) implies that the time reversal X̂(t) = X(−t) of X has the same distribution as X.

Theorem 8.7.1 has dramatic implications for birth and death processes and re-
lated queuing systems and networks.
Example 8.7.1. Consider an M/M/1 queue with customer arrival rate λ and service rate µ. It is described by the CTMC X on N_0 with jump rates

q_{km} = λ 1_{m=k+1} + µ 1_{m=k−1}.

Set ρ = λ/µ. Then X is reversible with respect to π_k = ρ^k. If ρ < 1, X is ergodic, and the invariant distribution is given by π_k = (1 − ρ)ρ^k, which is just Geo(1 − ρ).
In the equilibrium X and its time reversal X̂ have the same distribution. But arrivals of X̂ are departures of X. Let D(t) be the process of departures of X. Theorem 8.7.1 implies:

D is a Poisson process with intensity λ. Moreover, D(−∞, t] and X(t) are independent. (8.7.13)

The statement (8.7.13) above is called Burke’s Theorem.
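Burke’s theorem is easy to probe by simulation. The sketch below (with illustrative λ = 1, µ = 2 of our own choosing, and a crude burn-in to approximate equilibrium) checks that inter-departure times look Exp(λ), i.e. have mean and standard deviation close to 1/λ:

    import random
    import statistics

    lam, mu = 1.0, 2.0
    t, x, departures = 0.0, 0, []
    while len(departures) < 100_000:
        rate = lam + (mu if x > 0 else 0.0)
        t += random.expovariate(rate)
        if random.random() < lam / rate:
            x += 1                      # arrival
        else:
            x -= 1                      # departure
            departures.append(t)

    # discard the first 1000 departures as burn-in
    gaps = [b - a for a, b in zip(departures[1000:], departures[1001:])]
    print(statistics.mean(gaps), statistics.pstdev(gaps))  # both ~ 1/lam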
Consider now the general birth and death processes X defined in Exercise 8.4.2.

Exercise 8.7.3. Recall that the state space of X_t is S = N_0. The jump rates are, for any k ∈ N_0, given by

Q(k, k + 1) = λ_k and Q(k + 1, k) = µ_{k+1}. (8.7.14)

(i) Check that X is reversible with respect to

π_0 = 1 and, for k > 0, π_k = (λ_0 · · · λ_{k−1}) / (µ_1 · · · µ_k). (8.7.15)

Find a necessary and sufficient condition (in terms of the λ_i-s and µ_j-s) for the ergodicity of X.
(ii) If λ_i ≡ λ (constant birth rates) and X is ergodic, check that the conclusion (8.7.13) of Burke’s theorem still holds.
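For part (i) of the exercise above, here is a minimal numeric sanity check (with rates of our own choosing) that the measure (8.7.15) indeed satisfies πQ = 0 on a truncated birth and death chain:

    import numpy as np

    K = 6
    lam = np.array([2.0, 1.5, 1.2, 1.0, 0.8, 0.5])   # lambda_0, ..., lambda_5
    mu  = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0])   # mu_1, ..., mu_6

    pi = np.ones(K + 1)
    for k in range(K):
        pi[k + 1] = pi[k] * lam[k] / mu[k]           # formula (8.7.15)

    Q = np.zeros((K + 1, K + 1))
    for k in range(K):
        Q[k, k + 1], Q[k + 1, k] = lam[k], mu[k]
    np.fill_diagonal(Q, -Q.sum(axis=1))

    print(np.max(np.abs(pi @ Q)))    # ~ 0: pi is invariant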
Queuing Networks
Consider a tandem of two Markovian queues (birth and death processes) Q1 and Q2 ,
for instance Q1 = M/M/1 (gas station) and Q2 = M/M/N/N (rest area parking
lot next to the gas station). Assume that:
a. Customers arrive to Q1 with rate λ_1. The service rate at Q1 is µ_1.
b. Customers bypass Q1 and arrive directly to Q2 with rate λ_2. The service rate at Q2 is µ_2.
c. Each client departing from Q1 goes to Q2 with probability p, and with probability 1 − p leaves the system, independently of all other clients.
a.-c. above define an a-cyclic service network, which can be schematically depicted as follows:

Figure 1: A tandem queue (schematically: arrivals at rate λ_1 feed the M/M/1 station; each departure joins the M/M/N/N station with probability p or leaves the system with probability 1 − p; the M/M/N/N station also receives direct arrivals at rate λ_2).

Set ρ_1 = λ_1/µ_1. If ρ_1 < 1, then the M/M/1 queue in Figure 1 is ergodic. Let X_1 = {X_1(t)}_{t∈R} be its state (number of customers) in equilibrium. Then X_1(t) ∼ π_1 = Geo(1 − ρ_1) for any t ∈ R. Furthermore, by Burke’s theorem the process of departures D_1 from Q1 is Poisson with intensity λ_1. Arrivals to Q2 come from two independent (Poissonian) sources: direct arrivals with intensity λ_2, and a thinning of the departures from Q1 with intensity pλ_1. Hence the effective rate r_2 of arrivals to Q2 equals r_2 = λ_2 + pλ_1. The process X_2 lives on the finite state space {0, 1, . . . , N}, and it is, therefore, ergodic. In the equilibrium X_2 is distributed according to π_2, which is given by

π_2(k) = (1/c) ρ_2^k / k!, (8.7.16)

where

ρ_2 = r_2/µ_2 = (λ_2 + pλ_1)/µ_2.

By Burke’s theorem, in the equilibrium the process of departures D_1(−∞, t] from Q1 is independent of X_1(t). Hence, in the equilibrium, X_1(t) and X_2(t) are independent for each t ∈ R.
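For concreteness, (8.7.16) is easy to evaluate numerically (the parameters below are illustrative choices of ours, not data from the notes):

    import math

    lam1, lam2, p, mu2, N = 1.0, 0.5, 0.4, 1.5, 10
    rho2 = (lam2 + p * lam1) / mu2          # effective load r2 / mu2

    c = sum(rho2**k / math.factorial(k) for k in range(N + 1))
    pi2 = [rho2**k / (math.factorial(k) * c) for k in range(N + 1)]
    print(sum(pi2))          # 1.0: pi2 is a distribution on {0, ..., N}
    print(pi2[N])            # probability that all N spots are occupied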
Exercise 8.7.4. Give an argument (even a heuristic one) showing that for t ≠ s the random variables X_1(t) and X_2(s) are, in general, dependent.
Another example of a more complicated a-cyclic queuing network is depicted in Figure 2.

Figure 2: An a-cyclic queuing network (schematically: external arrivals λ_1 → M/M/2, λ_2 → M/M/4, λ_3 → M/M/∞; routing probabilities p_{12}, p_{13}, p_{14}, p_{1J} out of the M/M/2 station, p_{23}, p_{24} out of the M/M/4 station, and p_{34} from M/M/∞ to M/M/N).

Exercise 8.7.5. Consider the network of four Markovian queues, as in Figure 2:

M/M/2, M/M/4, M/M/∞, M/M/N

arrival rates: λ_1 = 6, λ_2 = 5, λ_3 = 3
service rates (per server): µ_1 = 4, µ_2 = 2, µ_3 = 1, µ_4 = 3
transitions: p_{12} = 1/3, p_{13} = 1/6, p_{14} = 1/6, p_{1J} = 1/3
p_{23} = 2/7, p_{24} = 5/7
p_{34} = 1.

For instance, a customer, after receiving service at Station 1, either leaves the system with probability p_{1J} = 1/3, or goes to Station 2 with probability p_{12} = 1/3, or goes to Station 3 with probability p_{13} = 1/6, or goes to Station 4 with probability p_{14} = 1/6, and so on.
(i) For which values of N does the network have an invariant distribution?
(ii) Calculate the invariant distribution for N = ∞.
A solution to Exercise 8.7.5 should be based on the following principle for a-cyclic
networks with constant arrival rates.

Definition 8.7.4. An a-cyclic network with n nodes is a graph with vertices {1, . . . , n} and with oriented edges e_{ij}; i < j. By construction it does not contain loops. There is a service station at each node i, and for each node/station i we specify:
a. The maximal service rate µ_i.
b. The rate λ_i of customers who arrive directly to station i.
Furthermore, for each pair of stations i < j we specify
c. The probability p_{ij} that a customer, after leaving station i, goes directly to station j.
It is assumed that the decisions of different customers, and the decisions of the same customer at different stations, are independent.

If the network is ergodic, then Burke’s theorem implies that in the equilibrium customers arrive to the stations i = 1, . . . , n according to Poisson processes with effective rates r_i. The effective rates satisfy the following system of equations:

r_j = λ_j + ∑_{i<j} r_i p_{ij}. (8.7.17)

The other way around, (8.7.17) gives a criterion for stability/ergodicity of the a-cyclic network. It is easy to see that the solution to (8.7.17) always exists and is always unique:

r_1 = λ_1, r_2 = λ_2 + r_1 p_{12}, . . . .

Then the a-cyclic network is ergodic iff

r_i/µ_i < 1 for i = 1, . . . , n. (8.7.18)
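As a sketch, the triangular system (8.7.17) can be solved mechanically for the data of Exercise 8.7.5 (the code below just transcribes those rates; recall that the ergodicity check (8.7.18) at a multi-server station uses its total service capacity):

    from fractions import Fraction as F

    lam = {1: F(6), 2: F(5), 3: F(3), 4: F(0)}
    p = {(1, 2): F(1, 3), (1, 3): F(1, 6), (1, 4): F(1, 6),
         (2, 3): F(2, 7), (2, 4): F(5, 7), (3, 4): F(1)}

    r = {}
    for j in (1, 2, 3, 4):      # solve in increasing order of j
        r[j] = lam[j] + sum(r[i] * p.get((i, j), F(0)) for i in range(1, j))

    print(r)   # effective arrival rates r_1, ..., r_4
    # Station 4 is M/M/N with per-server rate mu4 = 3; condition
    # (8.7.18) then reads N * 3 > r[4].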
Index

Q-matrices, 117
Bernoulli trials
  Finite number, 7
  Infinite number, 9
Beta distribution, 82
Biased sampling, 104
Borel σ-field, 10
Borel-Cantelli Lemma
  First, 21
  Second, 22
Branching Process, 52
Brownian motion
  Series expansion, 27, 33
Cadlag process, 118
Caratheodory’s Extension Theorem, 10
Conditional Expectation, 50
Convergence of random series
  Kolmogorov-Khinchin, 26
  Three series theorem, 27
  Two series theorem, 27
Electrical networks
  Effective conductance, 90
  Rayleigh’s principle, 94
  Thomson’s principle, 93
Erdos-Renyi random graph, 54
Ergodic Theorem
  Markov Chains, 63
Exclusion-inclusion principle, 8
Expectation
  Tail formula, 110
Independence, 13
Inequalities
  Bonferroni, 9
  Boole, 9
  Expectations, 16
  Kolmogorov’s Maximal, 26
  Positive association, 15
Law of iterated logarithm, 20
Little’s laws, 106
Markov chains
  Explosion, 126
  Invariant measure, 65
  Perfect sampling, 69
  Time reversal, 67
Markov semi-group, 117
  Forward and backward equations, 117
Martingales
  Doob’s maximal inequality, 79
Poisson Process, 97
Polya’s urn, 72
  Limiting distribution, 82
Probability Space, 6
Problems
  Careless secretary, 8
  Prisoner’s amnesty, 8
  Ramsey’s coloring, 33
Queue
  G/G/1, 103
Random permutations, 8
Random variables
  Distribution, 12
  Support, 12
  Examples, 13
  Expectation, 14
  Tail formula, 16
  Modes of convergence, 18
Random walk
  Exit probabilities, 76
Renewal
  Defective, 35
  Delayed, 35
  Excess life distribution, 45, 113
Renewal Theorem
  Delayed, 99
  Elementary, 37
  Renewal-reward, 41, 103
Reversibility
  Detailed Balance CTMC, 133
  Detailed Balance MC, 131
  Kolmogorov’s criterion MC, 132
Sigma algebra, 6
  Filtration, 38
Statistics
  Bose-Einstein, 18
  Maxwell-Boltzmann, 18
Stopping time, 38
Tail σ-algebra, 24
  Kolmogorov’s 0 − 1 Law, 24
Theorem
  Burke’s, 134
Wald’s formula, 39
