Markov Chains
• A discrete time process {Xn , n = 0, 1, 2, . . .}
with discrete state space Xn ∈ {0, 1, 2, . . .} is a
Markov chain if it has the Markov property:
P[Xn+1 = j|Xn = i, Xn−1 = in−1 , . . . , X0 = i0 ]
= P[Xn+1 = j|Xn = i]
• In words, “the past is conditionally independent
of the future given the present state of the
process” or “given the present state, the past
contains no additional information on the future
evolution of the system.”
• The Markov property is common in probability
models because, by assumption, one supposes
that the important variables for the system being
modeled are all included in the state space.
• We consider homogeneous Markov chains for
which P[Xn+1 = j | Xn = i] = P[X1 = j | X0 = i].
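To make the definition concrete, here is a minimal Python sketch (the two-state transition matrix is an illustrative assumption): each step draws the next state using only the current state, which is the Markov property, and the same matrix is used at every step, so the chain is homogeneous.

import numpy as np

def simulate_chain(P, x0, n_steps, rng=None):
    # Simulate a homogeneous Markov chain: the next state depends only on the current state.
    rng = np.random.default_rng() if rng is None else rng
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])          # illustrative two-state transition matrix (an assumption)
print(simulate_chain(P, x0=0, n_steps=20))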
Example: physical systems. If the state space
contains the masses, velocities and accelerations of
particles subject to Newton’s laws of mechanics,
the system is Markovian (but not random!).
Example: speech recognition. Context can be
important for identifying words. Context can be
modeled as a probability distribution for the next
word given the most recent k words. This can be
written as a Markov chain whose state is a vector
of k consecutive words.
Example: epidemics. Suppose each infected
individual has some chance of contacting each
susceptible individual in each time interval, before
becoming removed (recovered or hospitalized).
Then, the number of infected and susceptible
individuals may be modeled as a Markov chain.
• Define Pij = P[Xn+1 = j | Xn = i]. Let P = [Pij] denote the (possibly infinite) transition matrix of the one-step transition probabilities.
• Write $P^2_{ij} = \sum_{k=0}^{\infty} P_{ik} P_{kj}$, corresponding to standard matrix multiplication. Then
$$P^2_{ij} = \sum_k P[X_{n+1} = k \mid X_n = i]\; P[X_{n+2} = j \mid X_{n+1} = k]$$
$$= \sum_k P[X_{n+2} = j,\, X_{n+1} = k \mid X_n = i] \quad \text{(via the Markov property. Why?)}$$
$$= P\Big[\bigcup_k \{X_{n+2} = j,\, X_{n+1} = k\} \,\Big|\, X_n = i\Big]$$
$$= P[X_{n+2} = j \mid X_n = i]$$
• The matrix multiplication identity $P^{n+m} = P^n P^m$ corresponds to the Chapman-Kolmogorov equation
$$P^{n+m}_{ij} = \sum_{k=0}^{\infty} P^n_{ik} P^m_{kj}.$$
• Let ν(n) be the (possibly infinite) row vector of probabilities at time n, so $\nu^{(n)}_i = P[X_n = i]$. Then $\nu^{(n)} = \nu^{(0)} P^n$, using standard multiplication of a vector and a matrix. Why?
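A quick numerical check of both identities, as a minimal Python sketch (the matrix and initial distribution are illustrative assumptions):

import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])                      # illustrative transition matrix (an assumption)
n, m = 3, 4
Pn = np.linalg.matrix_power(P, n)
Pm = np.linalg.matrix_power(P, m)
# Chapman-Kolmogorov: the (n+m)-step matrix is the product of the n- and m-step matrices.
print(np.allclose(np.linalg.matrix_power(P, n + m), Pn @ Pm))   # True
# nu^(n) = nu^(0) P^n: propagate an initial distribution forward n steps.
nu0 = np.array([1.0, 0.0])
print(nu0 @ np.linalg.matrix_power(P, 10))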
Example. Set X0 = 0, and let Xn evolve as a Markov chain with transition matrix
$$P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$
[Transition diagram: state 0 loops back to itself w.p. 1/2 and moves to state 1 w.p. 1/2; state 1 moves to state 2 w.p. 1; state 2 returns to state 0 w.p. 1.]
Find $\nu^{(n)}_0 = P[X_n = 0]$ by
(i) using a probabilistic argument
(ii) using linear algebra.
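For a numerical answer to set alongside the exercise, a short Python sketch computing $\nu^{(n)}_0 = P[X_n = 0]$ by iterating $\nu^{(n)} = \nu^{(n-1)} P$ for the matrix above:

import numpy as np

P = np.array([[0.5, 0.5, 0.0],     # transition matrix from the example
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
nu = np.array([1.0, 0.0, 0.0])     # X0 = 0
for n in range(1, 11):
    nu = nu @ P                    # nu^(n) = nu^(n-1) P
    print(n, nu[0])                # nu_0^(n) = P[X_n = 0]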
Classification of States
• State j is accessible from i if $P^k_{ij} > 0$ for some k ≥ 0.
• The transition matrix can be represented as a directed graph with arrows corresponding to positive one-step transition probabilities; j is accessible from i if there is a path from i to j. For example:
[Figure: states 0 → 1 → 2 → 3 → 4 → . . . in a line, each with a self-loop.]
• States i and j communicate, written i ↔ j, if each is accessible from the other. Communication is an equivalence relation, and an equivalence relation divides a set (here, the state space) into disjoint classes of equivalent states (here, called communication classes).
• A Markov chain is irreducible if all the states
communicate with each other, i.e., if there is
only one communication class.
• The communication class containing i is absorbing if Pjk = 0 whenever i ↔ j but i ↮ k (i.e., when i communicates with j but not with k). An absorbing class can never be left. A partial converse is . . .
• State i has period d if $P^n_{ii} = 0$ when n is not a multiple of d and if d is the greatest integer with this property. If d = 1 then i is aperiodic.
• All states in a communication class have the same period ("periodicity is contagious"); one can see this by considering a generic directed graph. [Figure: a cycle of states i → i+1 → · · · → i+d−1 → i.]
• State i is recurrent if P[re-enter i | X0 = i] = 1, where {re-enter i} is the event $\bigcup_{n=1}^{\infty} \{X_n = i,\ X_k \neq i \text{ for } k = 1, \dots, n-1\}$. If i is not recurrent, it is transient.
• Let S0 = 0 and S1 , S2 , . . . be times of successive
returns to i, with Ni (t) = max {n : Sn ≤ t} being
the corresponding counting process.
• If i is recurrent, then Ni(t) is a renewal process, since the Markov property gives independence of the interarrival times $X^A_n = S_n - S_{n-1}$. Letting $\mu_{ii} = E[X^A_1]$, the expected return time for i, we then have the following from renewal theory:
⋄ $P\big[\lim_{t\to\infty} N_i(t)/t = 1/\mu_{ii} \,\big|\, X_0 = i\big] = 1$
⋄ If i ↔ j, $\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} P^k_{ij} = 1/\mu_{jj}$
⋄ If i is aperiodic, $\lim_{n\to\infty} P^n_{ij} = 1/\mu_{jj}$ for j ↔ i
⋄ If i has period d, $\lim_{n\to\infty} P^{nd}_{ii} = d/\mu_{ii}$
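As a numerical illustration of the first limit, a Python sketch comparing the long-run fraction of time in a state with 1/µii estimated from simulated return times; the three-state matrix is the example chain from earlier, and the horizon is an arbitrary assumption:

import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
T = 200_000
x, visits, last_visit, returns = 0, 0, 0, []
for t in range(1, T + 1):
    x = rng.choice(3, p=P[x])
    if x == 0:
        visits += 1
        returns.append(t - last_visit)    # interarrival time X^A_n of visits to state 0
        last_visit = t

print("N_0(t)/t              :", visits / T)
print("1 / (mean return time):", 1.0 / np.mean(returns))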
• If i is transient, then Ni(t) is a defective renewal process. This is a generalization of renewal processes where $X^A_1, X^A_2, \dots$ are still iid, but we allow $P[X^A_1 = \infty] > 0$.
Example: Show that if i is recurrent and i ↔ j
then j is recurrent.
• If i is transient, then µii = ∞. If i is recurrent
and µii < ∞ then i is said to be positive
recurrent. Otherwise, if µii = ∞, i is null
recurrent.
Random Walks
• The simple random walk is a Markov chain on the integers, Z = {. . . , −1, 0, 1, 2, . . .}, with X0 = 0 and P[Xn+1 = Xn + 1] = p, P[Xn+1 = Xn − 1] = 1 − p.
Example: A biological signaling molecule
becomes separated from its receptor. It starts
diffusing, due to thermal noise. Suppose the
diffusion is well modeled by a random walk. Will
the molecule return to the receptor? If the
molecule is constrained to a one-dimensional line?
A two-dimensional surface? Three-dimensional
space?
Proposition: The symmetric random walk is null recurrent when d = 1 and d = 2, but transient for d ≥ 3.
Proof: The method is to employ Stirling's formula, $n! \sim n^{n+1/2} e^{-n} \sqrt{2\pi}$, where $a_n \sim b_n$ means $\lim_{n\to\infty}(a_n/b_n) = 1$, to approximate $P\big[X^{(d)}_n = X^{(d)}_0\big]$.
Note: This is in HW 5. You are expected to solve
the simpler case (i), though you can solve (ii) if
you want a bigger challenge.
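The analytic argument is left to the homework; purely as a numerical illustration (not a proof), a Python sketch estimating how often a symmetric walk returns to the origin within a fixed horizon for d = 1, 2, 3 (the horizon and sample size are arbitrary assumptions):

import numpy as np

def returns_to_origin(d, n_steps, rng):
    # One symmetric walk on Z^d: each step moves +/-1 along a uniformly chosen coordinate.
    x = np.zeros(d, dtype=int)
    for _ in range(n_steps):
        x[rng.integers(d)] += rng.choice([-1, 1])
        if not x.any():
            return True
    return False

rng = np.random.default_rng(1)
for d in (1, 2, 3):
    est = np.mean([returns_to_origin(d, 5_000, rng) for _ in range(400)])
    print(f"d={d}: estimated P[return within 5000 steps] = {est:.2f}")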
Stationary Distributions
• π = {πi, i = 0, 1, . . .} is a stationary distribution for P = [Pij] if $\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}$, with πi ≥ 0 and $\sum_{i=0}^{\infty} \pi_i = 1$.
• In matrix notation, $\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}$ is π = πP, where π is a row vector.
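For a finite state space, π can be found by linear algebra; a minimal Python sketch (the matrix is the three-state example used earlier) solving π(P − I) = 0 together with Σi πi = 1:

import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
n = P.shape[0]
# Stack the stationarity equations (P - I)^T pi^T = 0 with the normalization sum(pi) = 1.
A = np.vstack([(P - np.eye(n)).T, np.ones((1, n))])
b = np.concatenate([np.zeros(n), [1.0]])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi)                        # stationary distribution
print(np.allclose(pi @ P, pi))   # check pi = pi P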
Proof
Proof continued
• Irreducible chains which are transient or null
recurrent have no stationary distribution. Why?
Example: Markov Chain Monte Carlo
• Suppose we wish to evaluate E[h(X)] where X has distribution π (i.e., P[X = i] = πi). The Monte Carlo approach is to generate X1, X2, . . . , Xn ∼ π and estimate
$$E[h(X)] \approx \frac{1}{n} \sum_{i=1}^{n} h(X_i).$$
• If it is hard to generate an iid sample from π,
we may look to generate a sequence from a
Markov chain with limiting distribution π.
• This idea, called Markov chain Monte Carlo (MCMC), goes back to Metropolis et al. (1953) and Hastings (1970). It has become a fundamental computational method for the physical and biological sciences. It is also commonly used for Bayesian statistical inference.
Metropolis-Hastings Algorithm
(i) Choose a transition matrix Q = [qij ]
(ii) Set X0 = 0
(iii) for n = 1, 2, . . .
⋄ generate Yn with P[Yn = j | Xn−1 = i] = qij.
⋄ If Xn−1 = i and Yn = j, set
$$X_n = \begin{cases} j & \text{with probability } \min(1,\; \pi_j q_{ji}/\pi_i q_{ij}) \\ i & \text{otherwise.} \end{cases}$$
Proposition: Set
$$P_{ij} = \begin{cases} q_{ij}\,\min(1,\; \pi_j q_{ji}/\pi_i q_{ij}) & j \neq i \\[4pt] q_{ii} + \sum_{k \neq i} q_{ik}\,\big\{1 - \min(1,\; \pi_k q_{ki}/\pi_i q_{ik})\big\} & j = i. \end{cases}$$
Then {Xn} generated by the Metropolis-Hastings algorithm is a Markov chain with transition matrix P = [Pij], and π is a stationary distribution for P.
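A minimal Python sketch of the algorithm for a finite target distribution (the target π and the proposal matrix Q below are illustrative assumptions):

import numpy as np

def metropolis_hastings(pi, Q, n_steps, x0=0, rng=None):
    # Sample a chain whose limiting distribution is pi, using proposal matrix Q.
    rng = np.random.default_rng() if rng is None else rng
    x, path = x0, [x0]
    for _ in range(n_steps):
        y = rng.choice(len(pi), p=Q[x])                   # propose Y_n with probabilities q_{x.}
        accept = min(1.0, (pi[y] * Q[y, x]) / (pi[x] * Q[x, y]))
        if rng.random() < accept:                         # accept with prob min(1, pi_j q_ji / pi_i q_ij)
            x = y
        path.append(x)
    return np.array(path)

# Illustrative target on {0,1,2,3} and a nearest-neighbour proposal (assumed for demonstration).
pi = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
path = metropolis_hastings(pi, Q, 100_000, rng=np.random.default_rng(0))
print(np.bincount(path, minlength=4) / len(path))         # empirical frequencies, close to pi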
Example: (spatial models on a lattice)
Example: Sampling conditional distributions.
• If X and Θ have joint density fXΘ (x, θ) and we
observe X = x, then one wants to sample from
$$f_{\Theta|X}(\theta \mid x) = \frac{f_{X\Theta}(x, \theta)}{f_X(x)} \propto f_{\Theta}(\theta)\, f_{X|\Theta}(x \mid \theta).$$
Galton-Watson Branching Process
• Let Xn be the size of a population at time n. Suppose each individual has an iid offspring distribution, so $X_{n+1} = \sum_{k=1}^{X_n} Z^{(k)}_{n+1}$ where $Z^{(k)}_{n+1} \sim Z$.
• We can think of $Z^{(k)}_{n+1} = 0$ as representing the death of the parent, or think of Xn as the size of the nth generation.
• Suppose X0 = 1. Notice that Xn = 0 is an
absorbing state, termed extinction. Supposing
P[Z = 0] > 0 and P[Z > 1] > 0, we can show
that 1, 2, 3, . . . are transient states. Why?
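As a numerical companion, a Python sketch that simulates the branching process and estimates the extinction probability, using the offspring distribution from the example a few slides below:

import numpy as np

def goes_extinct(p_offspring, max_gen, rng):
    # Simulate one Galton-Watson lineage started from X_0 = 1.
    x = 1
    for _ in range(max_gen):
        if x == 0:
            return True
        if x > 10_000:              # a population this large is effectively safe from extinction
            return False
        # Each of the x individuals has an iid number of offspring with distribution p_offspring.
        x = rng.choice(len(p_offspring), size=x, p=p_offspring).sum()
    return x == 0

rng = np.random.default_rng(0)
p_offspring = [0.25, 0.25, 0.5]     # P[Z=0], P[Z=1], P[Z=2], as in the example below
est = np.mean([goes_extinct(p_offspring, 200, rng) for _ in range(2000)])
print("estimated extinction probability:", est)    # the fixed-point calculation gives 1/2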
• Set φ(s) = E[s^Z], the probability generating function for Z. Set $\phi_n(s) = E[s^{X_n}]$. Show that $\phi_n(s) = \phi^{(n)}(s) = \phi(\phi(\dots \phi(s) \dots))$, meaning φ applied n times.
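A hedged sketch of the key step, conditioning on Xn and using the iid offspring assumption:
$$\phi_{n+1}(s) = E\big[s^{X_{n+1}}\big] = E\Big[E\big[s^{X_{n+1}} \mid X_n\big]\Big] = E\big[\phi(s)^{X_n}\big] = \phi_n(\phi(s)),$$
so iterating from $\phi_1 = \phi$ gives $\phi_n = \phi^{(n)}$.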
• If we can find a solution to φ(s) = s, we will
have φn (s) = s for all n. This suggests plotting
φ(s) vs s, noting that
(iii) φ(s) is increasing, i.e., $d\phi/ds > 0$. Why?
(iv) φ(s) is convex, i.e., $d^2\phi/ds^2 > 0$. Why?
• Now, notice that, for 0 < s < 1,
$$P[\text{extinction}] = \lim_{n\to\infty} P[X_n = 0] = \lim_{n\to\infty} \phi_n(s).$$
• Conclude that in CASE 1 (with $d\phi/ds\,\big|_{s=1} > 1$, i.e. E[Z] > 1) there is some possibility of infinite growth. In CASE 2 (with $d\phi/ds\,\big|_{s=1} \leq 1$, i.e. E[Z] ≤ 1) extinction is assured.
• Example: take
$$Z = \begin{cases} 0 & \text{w.p. } 1/4 \\ 1 & \text{w.p. } 1/4 \\ 2 & \text{w.p. } 1/2. \end{cases}$$
Find the chance of extinction.
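One way to carry out this example (a sketch using the fixed-point idea above):
$$\phi(s) = \tfrac14 + \tfrac14 s + \tfrac12 s^2, \qquad \phi(s) = s \iff 2s^2 - 3s + 1 = 0 \iff s \in \{\tfrac12,\, 1\}.$$
Since $E[Z] = \tfrac14 + 2 \cdot \tfrac12 = \tfrac54 > 1$ (CASE 1), the extinction probability is the smaller fixed point, 1/2.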
Time Reversal
• Thinking backwards in time can be a powerful
strategy. For any Markov chain
{Xk , k = 0, 1, . . . , n}, we can set Yk = Xn−k .
Then Yk is a Markov chain. Suppose Xk has
transition matrix Pij , then Yk has
inhomogeneous transition matrix $P^{*[k]}_{ij}$, where
$$P^{*[k]}_{ij} = P[Y_{k+1} = j \mid Y_k = i] = P[X_{n-k-1} = j \mid X_{n-k} = i]$$
$$= \frac{P[X_{n-k} = i \mid X_{n-k-1} = j]\; P[X_{n-k-1} = j]}{P[X_{n-k} = i]} = P_{ji}\, \frac{P[X_{n-k-1} = j]}{P[X_{n-k} = i]}.$$
Theorem. (detailed balance equations)
If π is a probability distribution with
πi Pij = πj Pji , then π is a stationary distribution
for P and the chain started at π is time-reversible.
Proof
Example: Birth-Death Process.
Let $P[X_{n+1} = X_n + 1 \mid X_n = i] = p_i$ and $P[X_{n+1} = X_n - 1 \mid X_n = i] = q_i = 1 - p_i$, with $p_0 = 1$.
[Transition diagram: states 0, 1, 2, . . . in a line, with up-step probabilities p0 = 1, p1, p2, . . . and down-step probabilities q1, q2, q3, . . . .]
• Find the stationary distribution.
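A hedged sketch via the detailed balance equations above (assuming 0 < pi < 1 for i ≥ 1 and that the normalizing sum converges):
$$\pi_i p_i = \pi_{i+1} q_{i+1} \;\Rightarrow\; \pi_i = \pi_0 \prod_{k=1}^{i} \frac{p_{k-1}}{q_k}, \qquad \pi_0 = \Big(\sum_{i=0}^{\infty} \prod_{k=1}^{i} \frac{p_{k-1}}{q_k}\Big)^{-1}.$$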
Random Walk on a Graph
• Consider an undirected, connected graph with a positive weight wij on each edge {i, j}. Set wij = 0 if i and j are not connected by an edge.
• Set $P_{ij} = w_{ij} \big/ \sum_k w_{ik}$.
[Figure: an example undirected graph on six vertices (0-5) with edge weights 1, 2, and 3.]
• Show that the random walk on a graph is reversible, and has stationary distribution
$$\pi_i = \sum_j w_{ij} \Big/ \sum_j \sum_k w_{jk}.$$
Note: the double sum counts each weight twice.
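A hedged check of both claims, writing $W = \sum_j \sum_k w_{jk}$ for the doubly counted total weight:
$$\pi_i P_{ij} = \frac{\sum_k w_{ik}}{W} \cdot \frac{w_{ij}}{\sum_k w_{ik}} = \frac{w_{ij}}{W} = \frac{w_{ji}}{W} = \pi_j P_{ji},$$
so the detailed balance equations hold, π is stationary, and the chain is reversible.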
Ergodicity
• A stochastic process {Xn} is ergodic if limiting time averages equal limiting probabilities, i.e.,
$$\lim_{n\to\infty} \frac{\text{time in state } i \text{ up to time } n}{n} = \lim_{n\to\infty} P[X_n = i].$$
Semi-Markov Processes
• A Semi-Markov process is a discrete state,
continuous time process {Z(t), t ≥ 0} for which
the sequence of states visited is a Markov chain
{Xn , n = 0, 1, . . .} and, conditional on {Xn }, the
times between transitions are independent.
Specifically, define transition times S0 , S1 , . . . by
S0 = 0 and Sn − Sn−1 ∼ Fij conditional on
Xn−1 = i and Xn = j. Then, Z(t) = XN (t) where
N (t) = sup {n : Sn ≤ t}.
• {Xn } is the embedded Markov chain.
• The special case where Fij ∼ Exponential(νi) is called a continuous time Markov chain.
• The special case where
$$F_{ij}(t) = \begin{cases} 0, & t < 1 \\ 1, & t \geq 1 \end{cases}$$
gives back the discrete time Markov chain {Xn}, with one transition per unit time.
Example: A taxi serves the airport, the hotel
district and the theater district. Let Z(t) = 0, 1, 2
if the taxi’s current destination is each of these
locations respectively. The time to travel from i
to j has distribution Fij . Upon arrival at i, the
taxi immediately picks up a new customer whose
destination has distribution given by Pij . Then,
Z(t) is a semi-Markov process.
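A small Python sketch of this example (the routing matrix P and the travel-time distributions Fij, taken here to be exponential with assumed means, are illustrative assumptions): it simulates Z(t) and reports the long-run fraction of time spent heading to each destination.

import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.0, 0.6, 0.4],            # P_ij: destination chosen on arrival at i (assumed)
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])
mean_travel = np.array([[0.0, 2.0, 3.0],  # mean of F_ij (assumed exponential travel times)
                        [2.0, 0.0, 1.0],
                        [3.0, 1.0, 0.0]])

T, t, loc = 10_000.0, 0.0, 0
time_in_state = np.zeros(3)
while t < T:
    dest = rng.choice(3, p=P[loc])                 # embedded Markov chain: next destination
    dt = rng.exponential(mean_travel[loc, dest])   # S_n - S_{n-1} ~ F_{loc, dest}
    time_in_state[dest] += min(dt, T - t)          # Z(t) = dest while the taxi drives there
    t += dt
    loc = dest

print(time_in_state / T)     # long-run fraction of time with each destination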
Proposition: If {Z(t)} is an irreducible, non-lattice semi-Markov process with µii < ∞, then
$$\lim_{t\to\infty} P[Z(t) = i \mid Z(0) = j] = \frac{\mu_i}{\mu_{ii}} = \lim_{t\to\infty} \frac{\text{time in } i \text{ before } t}{t} \quad \text{w.p. 1},$$
where µi is the expected time spent in i on each visit and µii is the expected time between successive visits to i.
• By counting up the time spent in i differently, we get another identity:
$$\lim_{t\to\infty} \frac{\text{time in } i \text{ during } [0, t]}{t} = \frac{\pi_i \mu_i}{\sum_j \pi_j \mu_j},$$
where π is the stationary distribution of the embedded Markov chain.
Proof
• Calculations with semi-Markov models typically
involve identifying a suitable renewal, alternating
renewal, regenerative or reward process.
Proof (continued)