Markov chains_lectures_DCM_2024


Markov Chains

• Innocent Ndoh Mbue, PhD (Professor in Ecoinformatics)
• E-mail: dndoh2009@gmail.com
• Tel: 653754070/677540384
Markov Chain:
a process with a finite number of states (or outcomes, or events) in
which the probability of being in a particular state at step n + 1
depends only on the state occupied at step n.

Prof. Andrei A. Markov (1856-1922) published his result in 1906.

2
A sequence of trials of an experiment is a Markov chain if:
1. the outcome of each experiment is one of a set of discrete states;
2. the outcome of an experiment depends only on the present state, and not on any
past states.

Categories

3
…Categories
 If the time parameter is discrete {t1, t2, t3, ...}, it is called a Discrete-Time Markov
Chain (DTMC).
 If the time parameter is continuous (t ≥ 0), it is called a Continuous-Time Markov
Chain (CTMC).

If Xn = j, we say that the process is in state j. The numbers P(Xm+1 = j | Xm = i) are called
the transition probabilities. We assume that the transition probabilities do not depend
on time; that is, pij = P(Xm+1 = j | Xm = i) does not depend on m.
4
State Transition Matrix and Diagram
We often list the transition probabilities in a matrix. The matrix is
called the state transition matrix or transition probability matrix and is
usually shown by P. Assuming the states are 1, 2, ⋯, r, then the state
transition matrix is given by

5
Example: two states, labelled 0 and 1
P00 = P(Xt+1 = 0 | Xt = 0) = 1/4
P01 = P(Xt+1 = 1 | Xt = 0) = 3/4
P10 = P(Xt+1 = 0 | Xt = 1) = 1/2
P11 = P(Xt+1 = 1 | Xt = 1) = 1/2

6
7
Transition matrix features:

 It is square, since all possible states must be used both as rows and as columns.
 All entries are between 0 and 1, because all entries represent probabilities.
 The sum of the entries in any row must be 1, since the numbers in the row give the
probability of changing from the state at the left to one of the states indicated across
the top.

8
Matrix Representation

The transition probability matrix M = (a_st) over the states A, B, C, D:

        A     B     C     D
  A    0.95  0     0.05  0
  B    0.2   0.5   0     0.3
  C    0     0.2   0     0.8
  D    0     0     1     0

M is a stochastic matrix: Σ_t a_st = 1 for every state s.

The initial distribution vector (u1 … um) defines the distribution of X1 (p(X1 = si) = ui).

Then after one move, the distribution is changed to X2 = X1 M.

9
Matrix Representation

        A     B     C     D
  A    0.95  0     0.05  0
  B    0.2   0.5   0     0.3
  C    0     0.2   0     0.8
  D    0     0     1     0

Example: if X1 = (0, 1, 0, 0) then X2 = (0.2, 0.5, 0, 0.3),
and if X1 = (0, 0, 0.5, 0.5) then X2 = (0, 0.1, 0.5, 0.4).

The i-th distribution is Xi = X1 M^(i-1).
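A small NumPy check of these two computations (again an illustrative sketch, not part of the slides):

```python
import numpy as np

# Transition matrix over states A, B, C, D (rows sum to 1)
M = np.array([
    [0.95, 0.0, 0.05, 0.0],
    [0.20, 0.5, 0.00, 0.3],
    [0.00, 0.2, 0.00, 0.8],
    [0.00, 0.0, 1.00, 0.0],
])

X1 = np.array([0.0, 1.0, 0.0, 0.0])      # start in state B
print(X1 @ M)                             # -> [0.2 0.5 0.  0.3]

X1 = np.array([0.0, 0.0, 0.5, 0.5])      # half C, half D
print(X1 @ M)                             # -> [0.  0.1 0.5 0.4]

# i-th distribution: Xi = X1 M^(i-1)
i = 4
print(X1 @ np.linalg.matrix_power(M, i - 1))
```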

10
Representation of a Markov Chain as a Digraph

State Transition Diagram

        A     B     C     D
  A    0.95  0     0.05  0
  B    0.2   0.5   0     0.3
  C    0     0.2   0     0.8
  D    0     0     1     0

A state transition diagram: each state is a node, and each directed edge A → B is
associated with the positive transition probability from A to B.

11
Example: Consider the Markov chain shown in Figure below:

12
Solution

13
Exercise: Write each of the transition diagrams (a), (b), (c) as a transition matrix.

14
State Probability Distributions:

Consider a Markov chain {Xn, n = 0, 1, 2, ...}, where Xn ∈ S = {1, 2, ⋯, r}.

Suppose that we know the probability distribution of X0. More specifically, define the
row vector π(0) as:

π(0) = [P(X0 = 1)P(X0 = 2)⋯P(X0 = r)].

How can we obtain the probability distribution of X1, X2, ⋯?

We can use the law of total probability. More specifically, for any j∈S,
we can write:

15
If we generally define

we can rewrite the above result in the form of matrix multiplication

where P is the state transition matrix. Similarly, we can write:

More generally, we can write:
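(The displayed equations did not survive the text extraction; reconstructed in standard notation, consistent with the definitions of π(0) and P above, the chain of results referred to here is:)

$$\pi^{(n)} = \big[\,P(X_n = 1)\ \ P(X_n = 2)\ \cdots\ P(X_n = r)\,\big],$$
$$\pi^{(1)} = \pi^{(0)} P, \qquad \pi^{(2)} = \pi^{(1)} P = \pi^{(0)} P^{2}, \qquad \pi^{(n)} = \pi^{(0)} P^{n}.$$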

16
Our work thus far is summarized below:

Example

17
Solution

18
Solved example
Voting Trends At the end of June in a presidential election year, 40% of the voters
were registered as liberal, 45% as conservative, and 15% as independent. Over a one-
month period, the liberals retained 80% of their constituency, while 15% switched to
conservative and 5% to independent. The conservatives retained 70% and lost 20% to
the liberals. The independents retained 60% and lost 20% each to the conservatives
and liberals. Assume that these trends continue.
a) Write a transition matrix using this information.
b) Write a probability vector for the initial distribution.
Find the percent of each type of voter at the end of each of the following months:
c) July
d) August
e) September
f) October

Solution
a) Transition matrix (rows = current affiliation, columns = next month's affiliation):

                   L      C      I
  Liberal         0.8    0.15   0.05
  Conservative    0.2    0.7    0.1
  Independent     0.2    0.2    0.6

b) Initial distribution
40% of the voters were registered as liberal, 45% as conservative, and 15% as
independent, so the probability vector for the initial distribution is:
X0 = π(0) = [0.4 0.45 0.15]

c) Percent of each type of voter at the end of July:
44% L; 40.5% C; 15.5% I
d) Percent of each type of voter at the end of August:
46.4% L; 38.05% C; 15.55% I
e) Percent of each type of voter at the end of September (apply X0 · P^n with n = 3):
47.84% L; 36.705% C; 15.455% I
f) Percent of each type of voter at the end of October:
48.704% L; 35.9605% C; 15.3355% I
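These month-by-month figures can be reproduced with a few lines of NumPy (an illustrative sketch, not part of the original slides):

```python
import numpy as np

# Transition matrix: rows/columns ordered Liberal, Conservative, Independent
P = np.array([
    [0.8, 0.15, 0.05],
    [0.2, 0.70, 0.10],
    [0.2, 0.20, 0.60],
])
x = np.array([0.40, 0.45, 0.15])   # end-of-June distribution

for month in ["July", "August", "September", "October"]:
    x = x @ P                       # one month of transitions
    print(month, np.round(x, 6))
# July      [0.44     0.405    0.155   ]
# August    [0.464    0.3805   0.1555  ]
# September [0.4784   0.36705  0.15455 ]
# October   [0.48704  0.359605 0.153355]
```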

20
Chapman-Kolmogorov equations
Multiple step transition probabilities

21
Chapman-Kolmogorov equation
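(The equation itself did not survive the text extraction; reconstructed in standard notation, consistent with the transition probabilities defined earlier, it reads:)

$$p^{(m+n)}_{ij} = \sum_{k \in S} p^{(m)}_{ik}\, p^{(n)}_{kj}, \qquad \text{equivalently} \qquad P^{(m+n)} = P^{(m)} P^{(n)}.$$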

22
2-step transition probabilities

23
n-step, m-step and (m + n)-step

24
Interpretation

Chapman-Kolmogorov is intuitive. Recall:

25
Matrix form

n-step transition probabilities

26
n-Step Transition Probabilities:

Consider a Markov chain {Xn, n = 0, 1, 2, ...}, where Xn ∈ S. If X0 = i, then X1 = j with
probability pij. That is, pij gives us the probability of going from state i to state j in
one step. Now suppose that we are interested in finding the probability of going from
state i to state j in two steps, i.e.,

p_ij^(2) = P(X2 = j | X0 = i).

We can find this probability by applying the law of total probability. In particular, we
argue that X1 can take one of the possible values in S. Thus, we can write:

p_ij^(2) = Σ_{k ∈ S} P(X2 = j | X1 = k, X0 = i) P(X1 = k | X0 = i) = Σ_{k ∈ S} p_ik p_kj.

We conclude that the two-step transition matrix is P^(2) = P · P = P².
27
Example: Happy-Sad

28
29
Regular Transition Matrices

One of the many applications of Markov chains is in finding long-range predictions.
It is not possible to make long-range predictions with all transition matrices, but for a
large set of transition matrices, long-range predictions are possible. Such predictions
are always possible with regular transition matrices.

A transition matrix is regular if some power of the matrix contains all positive
entries. A Markov chain is a regular Markov chain if its transition matrix is regular.

30
NOTE
If a transition matrix P has some zero entries, and P2 does as well, you may wonder
how far you must compute Pk to be certain that the matrix is not regular.

The answer is that if zeros occur in the identical places in both Pk and Pk+1 for any k,
they will appear in those places for all higher powers of P, so P is not regular.

Suppose that v is any probability vector. It can be shown that for a regular Markov
chain with a transition matrix P, there exists a single vector V that does not depend on
v, such that v·Pⁿ gets closer and closer to V as n gets larger and larger.
PROPERTIES OF REGULAR MARKOV CHAINS

Suppose a regular Markov chain has a transition matrix P.

1) As n gets larger and larger, the product v·Pⁿ approaches a unique vector V for any
initial probability vector v. Vector V is called the equilibrium/steady-state vector or
fixed vector.

2) Vector V has the property that V·P = V.

3) To find V, solve the system of equations obtained from the matrix equation
V·P = V together with the fact that the sum of the entries of V is 1.

4) The powers Pⁿ come closer and closer to a matrix whose rows are all made up of
the entries of the equilibrium vector V.

32
EQUILIBRIUM VECTOR OF A MARKOV CHAIN
If a Markov chain with transition matrix P is regular, then there is a unique vector V
such that, for any probability vector v and for large values of n

Vector V is called the equilibrium vector or the fixed vector of the Markov chain.

If a Markov chain with transition matrix P is regular, then there exists a probability
vector V such that

This vector V gives the long-range trend of the Markov chain. Vector V is found by
solving a system of linear equations, as shown in the next example.
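A small sketch of step 3) above in NumPy (not from the slides): solve V·P = V together with the constraint that the entries of V sum to 1. The voting-trends matrix from the earlier example is reused purely as an illustration.

```python
import numpy as np

# Regular transition matrix (voting-trends example: L, C, I)
P = np.array([
    [0.8, 0.15, 0.05],
    [0.2, 0.70, 0.10],
    [0.2, 0.20, 0.60],
])
n = P.shape[0]

# V P = V  <=>  V (P - I) = 0; append the constraint sum(V) = 1
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
V, *_ = np.linalg.lstsq(A, b, rcond=None)
print(V)               # equilibrium vector
print(V @ P)           # equals V again (steady state)
```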

Example:

33
34
35
36
37
38
39
Markov Chain (cont.)

X1 → X2 → ⋯ → Xn-1 → Xn

• For each integer n, a Markov chain assigns probability to sequences (x1 … xn) over D
(i.e., xi ∈ D) as follows:

p((x1, x2, ..., xn)) = p(X1 = x1) ∏_{i=2}^{n} p(Xi = xi | Xi-1 = xi-1)
                     = p(x1) ∏_{i=2}^{n} a_{x_{i-1} x_i}
Similarly, (X1,…, Xi ,…) is a sequence of probability
distributions over D. There is a rich theory which studies the
properties of these sequences. A bit of it is presented next.
40
Markov Chain (cont.)

X1 → X2 → ⋯ → Xn-1 → Xn

Similarly, each Xi is a probability distribution over D, which is determined by the
initial distribution (p1, ..., pn) and the transition matrix M.
There is a rich theory which studies the properties of such "Markov sequences"
(X1, …, Xi, …). A bit of this theory is presented next.

41
Random walk with borders (gambling)

42
43
44
45
46
47
48
Classification of States

Performance questions to be answered:
 How often is a certain state visited?
 How much time will the system spend in a state?
 What is the average length of the intervals between visits?

Other properties:
Irreducible
Recurrent
Mean recurrence time
Aperiodic
Homogeneous

49
Classification of States
The first definition concerns the accessibility of states from each other: If it is
possible to go from state i to state j, we say that state j is accessible from state i. In
particular, we can provide the following definitions.

50
We say that two states i and j communicate, written i ↔ j.

51
Example.
Consider a Markov chain with state space S = {0, 1, 2, 3} and transition matrix

We want to determine which states communicate. States 0 and 1 communicate with
each other; states 2 and 3 are not accessible from states 0 and 1, and state 3 is not
accessible from state 2. The chain is therefore not irreducible; it has three classes:
{0, 1}, {2}, {3}.

Now consider a Markov chain with state space S = {0, 1, 2} and transition matrix

Conclusion?

All the states communicate, so the chain is irreducible.
Example 2:

Find the communicating classes associated with the transition diagram shown.

Solution:
{1, 2, 3}, {4, 5}.
State 2 leads to state 4, but state 4 does not lead back to state 2, so they are in
different communicating classes.

Definition: A communicating class of states is closed if it is not possible to
leave that class.

Example: In the transition diagram above:
• Class {1, 2, 3} is not closed: it is possible to escape to class {4, 5}.
• Class {4, 5} is closed: it is not possible to escape.

53
Example:
Consider the Markov chain shown in the figure that follows. It is assumed that when
there is an arrow from state i to state j, then pij > 0. Find the equivalence classes for
this Markov chain.
54
There are four communicating classes in this Markov chain. Looking at the figure below,
we notice that states 1 and 2 communicate with each other, but they do not
communicate with any other nodes in the graph. Similarly, nodes 3 and 4
communicate with each other, but they do not communicate with any other nodes in
the graph. State 5 does not communicate with any other states, so it by itself is a
class. Finally, states 6, 7, and 8 form another class.

Thus, the classes are:
Class 1 = {state 1, state 2},
Class 2 = {state 3, state 4},
Class 3 = {state 5},
Class 4 = {state 6, state 7, state 8}.
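A small sketch (Python, not from the slides) of how communicating classes can be computed mechanically: i and j are in the same class exactly when each is reachable from the other, so it suffices to compute reachability from every state. The adjacency list below is a hypothetical stand-in for the chain in the figure, chosen only so that it produces the four classes described above.

```python
from collections import defaultdict

def reachable(adj, start):
    """Set of states reachable from `start` (including start) by DFS."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def communicating_classes(adj):
    states = list(adj)
    reach = {s: reachable(adj, s) for s in states}
    classes, assigned = [], set()
    for i in states:
        if i in assigned:
            continue
        # i and j communicate iff each is reachable from the other
        cls = {j for j in states if j in reach[i] and i in reach[j]}
        classes.append(cls)
        assigned |= cls
    return classes

# Hypothetical one-step transitions consistent with the four classes above
adj = defaultdict(list, {1: [2, 3], 2: [1], 3: [4], 4: [3, 6],
                         5: [3, 7], 6: [7], 7: [8], 8: [6]})
print(communicating_classes(adj))   # -> [{1, 2}, {3, 4}, {5}, {6, 7, 8}]
```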
55
A Markov chain is said to be irreducible if all states communicate with each other.
Looking at Figure below:

Notice that there are two kinds of classes. In particular, if at any time the Markov
chain enters Class 4, it will always stay in that class.

On the other hand, for other classes this is not true. For example, if X0=1, then the
Markov chain might stay in Class 1 for a while, but at some point, it will leave that
class and it will never return to that class again. The states in Class 4 are called
recurrent states, while the other states in this chain are called transient.
56
Recurrent states
In general, a state is said to be recurrent if, any time that we leave that state, we will
return to that state in the future with probability one. On the other hand, if the
probability of returning is less than one, the state is called transient. Here, we provide
a formal definition:

Consider a Markov chain and assume X0=i. If i is a recurrent state, then the
chain will return to state i any time it leaves that state. Therefore, the chain will
visit state i an infinite number of times. On the other hand, if i is a transient
state, the chain will return to state i with probability fii<1. Thus, in that case, the
total number of visits to state i will be a Geometric random variable with
parameter 1−fii.
57
Recurrent – Transient State

58
59
60
61
Example 1
Show that in a finite Markov chain, there is at least one recurrent class.

Solution

Consider a finite Markov chain with r states, S={1,2,⋯,r}.


Suppose by contradiction that all states are transient.

Then, starting from time 0, the chain might visit state 1 several times, but at some
point the chain will leave state 1 and will never return to it. That is, there exists an
integer M1>0 such that Xn≠1, for all n ≥ M1. Similarly, there exists an integer M2>0
such that Xn≠2, for all n≥M2, and so on. Now, if you choose

n ≥ max{M1,M2,⋯,Mr},

then Xn cannot be equal to any of the states 1,2,⋯,r.

This is a contradiction, so we conclude that there must be at least one recurrent state,
which means that there must be at least one recurrent class.

62
Ex2
On the set S = {0, 1, . . . , n}, consider the Markov chain whose transition matrix P is
given for 0 ≤ x ≤ n − 1 by:

with the state n absorbing (i.e. P(n, x) = 1 if x = n and 0 otherwise), and with 0 < p < 1.

a) Draw the graph associated with this Markov chain. Which states of this chain are
recurrent and which are transient?

b) With S = {1, . . . , 6}, complete the following matrix so that it is the transition matrix
of a Markov chain

and determine which of its states are transient and which are recurrent.


Solution
a) There are two equivalence classes (two irreducible sub-chains) in this chain. The
first contains the states {0, 1, . . . , n − 1} and the other contains the state n.

The state n is recurrent, since P{Tn < ∞ | X0 = n} = 1.

The other states are either all recurrent or all transient
(since they belong to the same strongly connected component of the graph).

Let us study the state n − 1. We have
P{Tn−1 < ∞ | X0 = n − 1} ≤ P{X1 = 0 | X0 = n − 1}
= 1 − p
< 1
The states 0 to n − 1 are therefore transient.
64
… Solution
b) Draw the graph of the Markov chain.

We see that there are two closed strongly connected components (with no outgoing
edges): C1 = {1, 2} and C2 = {3, 5}. The set {4, 6} is also strongly connected, but it is
not closed.

If we group these connected components together and draw the associated tree, the
set {4, 6} is the root of the tree, and the nodes C1 and C2 are its children and are the
leaves of the tree.

Since the graph is finite, we know that the states in the leaves are recurrent and the
others are transient. Let us prove this in the present case.

65
… Solution

One shows in the same way that the states of C2 are recurrent.

Let us now show that state 4 is not recurrent (since it communicates with 6, this will
also show that 6 is transient). Since 6 is the only state reachable in one step from 4
that can then allow a return to 4, we have

P[T4 < ∞ | X0 = 4] ≤ P[X1 = 6 | X0 = 4] < 1.

So 4 and 6 are transient.


66
Periodicity
Consider the Markov chain shown in Figure below:

There is a periodic pattern in this chain. Starting from state 0, we only return to 0 at
times n = 3, 6, ⋯. In other words, p_00^(n) = 0 if n is not divisible by 3. Such a state is
called a periodic state with period d(0) = 3.

67
Periodic/aperiodic

A class is said to be periodic if its states are periodic. Similarly, a class is said to be
aperiodic if its states are aperiodic. Finally, a Markov chain is said to be aperiodic if
all of its states are aperiodic.
If i↔j, then d(i)=d(j).

68
Example 2
Consider the Markov chain from before.

Is Class 1 = {state 1, state 2} aperiodic?
Is Class 2 = {state 3, state 4} aperiodic?
Is Class 4 = {state 6, state 7, state 8} aperiodic?

Solution

(gcd: greatest common divisor)


Irreducible

A Markov chain is said to be irreducible if it has only one communicating class, that
is, if all states communicate with each other.
70
71
gcd : greatest common divisor

72
73
Ergodic Markov Chains

A Markov chain is ergodic if:
1. the corresponding graph is strongly connected, and
2. it is not periodic.

Ergodic Markov chains are important since they guarantee that the corresponding
Markovian process converges to a unique distribution, in which all states have
strictly positive probability.

74
Ex2
Consider the following 7 × 7 matrix Q:

75
Ex4

76
Ex5
Consider a gambler who plays the following game at a casino:

77
“Good” Markov chains
A Markov chain is good if the distributions Xi, as i → ∞:

(1) converge to a unique distribution, independent of the initial distribution;

(2) in that unique distribution, each state has a positive probability.

The Fundamental Theorem of Finite Markov Chains:
A Markov chain is good ⇔ the corresponding graph is ergodic.

We will prove one direction, by showing that non-ergodic Markov chains are not good.
78
Examples of “Bad” Markov Chains

A Markov chain is not “good” if either:
1. it does not converge to a unique distribution, or
2. it does converge to a unique distribution, but some states in this distribution have
zero probability.

79
Bad case 1: Mutual Unreachability

Consider two initial distributions:
a) p(X1 = A) = 1 (p(X1 = x) = 0 if x ≠ A).
b) p(X1 = C) = 1.

In case a), the sequence will stay at A forever.
In case b), it will stay in {C, D} forever.

Fact 1: If G has two states which are unreachable from each other, then {Xi} cannot
converge to a distribution which is independent of the initial distribution.
80
Bad case 2: Transient States

Once the process moves from B to D, it will never come back.

81
Bad case 2: Transient States

Fact 2: For each initial distribution, with probability 1 a transient state will be visited
only a finite number of times.

Proof: Let A be a transient state, and let X be the set of states from which A is
unreachable. It is enough to show that, starting from any state, with probability 1 a
state in X is reached after a finite number of steps. (Exercise: complete the proof.)
82
Corollary: A good Markov
Chain is irreducible

83
Bad case 3: Periodic Markov Chains

Recall: A Markov chain is periodic if all the states in it have a period k > 1. The chain
above has period 2.
In the chain above, consider the initial distribution p(B) = 1.
Then states {B, C} are visited (with positive probability) only in odd steps, and states
{A, D, E} are visited only in even steps.

84
Bad case 3: Periodic States

Fact 3: In a periodic Markov chain (of period k > 1) there are initial distributions
under which the states are visited in a periodic manner. Under such initial
distributions Xi does not converge as i → ∞.

Corollary: A good Markov chain is not periodic.
85
The Fundamental Theorem of Finite Markov Chains:
We have proved that non-ergodic Markov chains are not good.
A proof of the other part (based on Perron-Frobenius theory) is beyond the scope of
this course:

If a Markov chain is ergodic, then
1. it has a unique stationary distribution vector V > 0, which is an eigenvector of the
transition matrix;
2. for any initial distribution, the distributions Xi converge to V as i → ∞.

86
Hidden Markov Model

Hidden states:  S1 → S2 → ⋯ → SL-1 → SL   (transitions M)
Observations:   x1   x2       xL-1    xL   (emissions T)

A Markov chain (s1, …, sL): p(s1, …, sL) = ∏_{i=1}^{L} p(si | si-1),
and for each state s and a symbol x we have p(Xi = x | Si = s).

Application in communication: the message sent is (s1, …, sm) but we receive
(x1, …, xm). Compute the most likely message sent.
Application in speech recognition: the word said is (s1, …, sm) but we recorded
(x1, …, xm). Compute the most likely word said.

87
Hidden Markov Model

Notations:
Markov chain transition probabilities: p(Si+1 = t | Si = s) = a_st
Emission probabilities: p(Xi = b | Si = s) = e_s(b)

For Markov chains we know: p(s) = p(s1, …, sL) = ∏_{i=1}^{L} p(si | si-1)

What is p(s, x) = p(s1, …, sL; x1, …, xL)?

88
Hidden Markov Model

p(Xi = b | Si = s) = e_s(b) means that the probability of xi depends only on the
probability of si.
Formally, this is equivalent to the conditional independence assumption:
p(Xi = xi | x1, .., xi-1, xi+1, .., xL, s1, .., si, .., sL) = e_si(xi)

Thus p(s, x) = p(s1, …, sL; x1, …, xL) = ∏_{i=1}^{L} p(si | si-1) e_si(xi)

89
Absorbing Markov Chains
Not all Markov chains are regular. In fact, some of the most important life science
applications of Markov chains do not involve transition matrices that are regular. One
type of Markov chain that is widely used in the life sciences is called an absorbing
Markov chain.

When we use the ideas of Markov chains to model living organisms, a common state
is death. Once the organism enters that state, it is not possible to leave. In this
situation, the organism has entered an absorbing state.

For example, suppose a Markov chain has the following transition matrix.

The matrix shows that P12, the probability of going from state 1 to state 2, is 0.6, and
that P22, the probability of staying in state 2, is 1.

Thus, once state 2 is entered, it is impossible to leave. For this reason, state 2 is called
an absorbing state.

90
The resulting transition diagram shows that it is not possible to leave
state 2.

Generalizing from this example leads to the following definition

91
Absorbing Markov Chains
Identify all absorbing states in the Markov chains having the following matrices.
Decide whether the Markov chain is absorbing

92
Solution

a) Since P11 = 1 and P33 = 1, both state 1 and state 3 are absorbing states. (Once
these states are reached, they cannot be left.) The only non-absorbing state is state 2.

There is a 0.3 probability of going from state 2 to the absorbing state 1, and a 0.2
probability of going from state 2 to state 3, so it is possible to go from the
non-absorbing state to an absorbing state. This Markov chain is absorbing. The
transition diagram is shown in the figure.

93
b) States 2 and 4 are absorbing, and states 1 and 3 are non-absorbing.
From state 1, it is possible to go only to states 1 or 3; from state 3 it is possible to go
only to states 1 or 3.

As the transition diagram in the figure below shows, neither non-absorbing state leads
to an absorbing state, so this Markov chain is non-absorbing.

94
Let P be the transition matrix for an absorbing Markov chain.
Rearrange the rows and columns of P so that the absorbing states come first.
Matrix P will have the form:

where Im is an identity matrix, with m equal to the number of absorbing states, and O
is a matrix of all zeros.

 The element in row i, column j of the fundamental matrix gives the number of
visits to state j that are expected to occur before absorption, given that the
current state is state i.
 The product FR gives the matrix of probabilities that a particular initial
non-absorbing state will lead to a particular absorbing state.
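The rearranged form referred to above is the standard block form P = [[Iₘ, O], [R, Q]], and the fundamental matrix (whose defining formula is in the part of the slide not reproduced here) is the standard F = (I − Q)⁻¹, where Q is the block of transitions among non-absorbing states. A hedged NumPy sketch, using a small made-up absorbing chain:

```python
import numpy as np

# Hypothetical absorbing chain in standard form: the two absorbing states
# come first, so P = [[I, O], [R, Q]].
R = np.array([[0.3, 0.2],      # non-absorbing -> absorbing
              [0.1, 0.1]])
Q = np.array([[0.4, 0.1],      # non-absorbing -> non-absorbing
              [0.3, 0.5]])

# Fundamental matrix: expected visits to each transient state before absorption
F = np.linalg.inv(np.eye(Q.shape[0]) - Q)
print(F)

# FR: probability that each starting transient state ends in each absorbing state
print(F @ R)
print((F @ R).sum(axis=1))     # each row sums to 1 (absorption is certain)
```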
Using the Law of Total Probability with Recursion

In this section, we will use this technique to find:

 absorption probabilities,

 mean hitting times, and

 mean return times.

96
Consider the Markov chain shown in the state transition diagram:

The state transition matrix of this Markov chain is given by the following matrix

How many classes are there? For each class, mention if it is recurrent or transient.

There are three classes: Class 1 consists of one state, state 0, which is a recurrent state.
Class 2 consists of two states, states 1 and 2, both of which are transient. Finally,
Class 3 consists of one state, state 3, which is a recurrent state.

97
Note that states 0 and 3 have the following property:
 once you enter those states, you never leave them.

For this reason, we call them absorbing states. For our example here, there are two
absorbing states. The process will eventually get absorbed in one of them. The first
question that we would like to address deals with finding absorption probabilities.

Absorption Probabilities
Consider the Markov chain in the figure below once more:

Let's define ai as the absorption probability in state 0 if we start from state i. More
specifically,

98
By the above definition, we have a0=1 and a3= 0. To find the values of a1 and a2, we
apply the law of total probability with recursion. The main idea is the following:

 if Xn=i, then the next state will be Xn+1=k with probability pik.

Thus, we can write:

Solving the above equations will give us the values of a1 and a2. More specifically,
using Equation above, we obtain
We also know a0=1 and a3=0. Solving for
a1 and a2, we obtain

99
Exercise:
Consider the above Markov chain once more.

Let's define bi as the absorption probability in state 3 if we start from state i. Use the
above procedure to obtain bi for i = 0,1,2,3.

100
From the definition of bi and the Markov chain graph above, we have b0 = 0 and b3 = 1.
Now, using the same recursion as before, we have:

101
Example 2

102
In sum,

 In general, a finite Markov chain might have several transient as well as several
recurrent classes.
 As n increases, the chain will get absorbed in one of the recurrent classes and it
will stay there forever.
 We can use the above procedure to find the probability that the chain will get
absorbed in each of the recurrent classes.
 In particular, we can replace each recurrent class with one absorbing state. Then,
the resulting chain consists of only transient and absorbing states.
 We can then follow the above procedure to find absorption probabilities.
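A sketch of this procedure in Python (the four-state chain below is hypothetical, since the slide's figure is not reproduced): set a_i = 1 on the target absorbing state, a_i = 0 on the other absorbing states, and solve the linear equations a_i = Σ_k p_ik a_k for the transient states.

```python
import numpy as np

# Hypothetical chain on states {0, 1, 2, 3}; states 0 and 3 are absorbing
P = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.3, 0.2, 0.3, 0.2],
    [0.1, 0.4, 0.3, 0.2],
    [0.0, 0.0, 0.0, 1.0],
])
transient = [1, 2]
target = 0                      # absorption probability into state 0

# a_i = sum_k p_ik a_k for transient i, with a_target = 1 and a = 0 on the
# other absorbing state; this gives (I - Q) a_transient = r
Q = P[np.ix_(transient, transient)]
r = P[transient, target]        # one-step probability of hitting the target
a_transient = np.linalg.solve(np.eye(len(transient)) - Q, r)
print(dict(zip(transient, a_transient)))
```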
103
Mean Hitting Times
Definition: the expected time until the process hits a certain set of states for the first
time.

Again, consider the Markov chain below:

Let's define ti as the number of steps needed until the chain hits state 0 or state 3,
given that X0=i. In other words, ti is the expected time (number of steps) until the
chain is absorbed in 0 or 3, given that X0=i. By this definition, we have t0=t3= 0

To find t1 and t2, we use the law of total probability with recursion as before. For
example, if X0=1, then after one step, we have X1=0 or X1=2. Thus, we can write

Similarly, we can write:

104
In sum,

105
Vector of hitting probabilities

Let A be some subset of the state space S. (A need not be a communicating class: it
can be any subset required, including a subset consisting of a single state, e.g. A = {4}.)

The hitting probability from state i to set A is the probability of ever reaching the set
A, starting from initial state i.

We write this probability as h_iA. Thus

h_iA = P(Xt ∈ A for some t ≥ 0 | X0 = i).

106
Example:

Let set A = {1, 3} as shown

The hitting probability for set A is:


 1 starting from states 1 or 3
(We are starting in set A, so we hit it immediately);
 0 starting from states 4 or 5
(The set {4, 5} is a closed class, so we can never escape out to set
A);
 0.3 starting from state 2
(We could hit A at the first step (probability 0.3), but otherwise we
move to state 4 and get stuck in the closed class {4, 5} (probability 0.7).)

107
We can summarize all the information from the example above in a vector of hitting
probabilities:
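Collecting the values just computed (states 1 through 5), the vector is

$$h^{A} = \big(h_{1A},\, h_{2A},\, h_{3A},\, h_{4A},\, h_{5A}\big) = (1,\ 0.3,\ 1,\ 0,\ 0).$$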

108
Example

109
Mean Return Times

110
Calculating the mean hitting times
The vector of expected hitting times mA = (miA : i ∈ S) is the
minimal non-negative solution to the following equations:

111
112
113
Example:Consider the Markov chain shown in Figure.
Let tk be the expected number of steps until the
chain hits state 1 for the first time, given that X0=k.
Clearly, t1=0. Also, let r1 be the mean return time to
state 1.

1) Find t2 and t3
2) Find r1

Solution

114
115
Stationary and Limiting Distributions
Here, we would like to discuss long-term behavior of Markov chains. In particular,
we would like to know the fraction of times that the Markov chain spends in each
state as n becomes large. More specifically, we would like to study the distributions

as n → ∞. To better understand the subject, we will first look at an example and then
provide a general analysis.

116
Example
Consider a Markov chain with two possible states, S={0,1}. In particular, suppose
that the transition matrix is given by

where a and b are two real numbers in the interval [0,1] such that 0<a+b<2. Suppose
that the system is in state 0 at time n=0 with probability α, i.e.,

117
118
119
120
Example 2:
Consider a Markov chain with two possible states, S={0,1}. In particular, suppose
that the transition matrix is given by:

where a and b are two real numbers in the interval [0,1] such that 0<a+b<2. Find the
mean return times, r0 and r1, for this Markov chain.

121
122
Finite Markov Chains:
Here, we consider Markov chains with a finite number of states. In general, a finite
Markov chain can consist of several transient as well as recurrent states.

As n becomes large the chain will enter a recurrent class and it will stay there forever.
Therefore, when studying long-run behaviors we focus only on the recurrent classes.

If a finite Markov chain has more than one recurrent class, then the chain will get
absorbed in one of the recurrent classes.
It turns out that in this case the Markov chain has a well-defined limiting behavior if it
is aperiodic (states have period 1). How do we find the limiting distribution? The trick
is to find a stationary distribution.

Here is the idea: If π = [π1, π2, ⋯] is a limiting distribution for a Markov chain, then we
have:

123
Similarly, we can write:

We can explain the equation π = πP intuitively:

Suppose that Xn has distribution π. As we saw before, πP gives the probability
distribution of Xn+1. If we have π = πP, we conclude that Xn and Xn+1 have the same
distribution. In other words, the chain has reached its steady-state (limiting)
distribution. We can equivalently write π = πP as:

πj = Σ_{k ∈ S} πk pkj, for all j ∈ S.

The right-hand side gives the probability of going to state j in the next step. When we
equate both sides, we are implying that the probability of being in state j in the next
step is the same as the probability of being in state j now.
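As a computational aside (a Python sketch, not part of the slides): π is a left eigenvector of P with eigenvalue 1, so it can also be obtained from an eigendecomposition and then normalized to sum to 1.

```python
import numpy as np

P = np.array([[0.8, 0.15, 0.05],    # any regular transition matrix works;
              [0.2, 0.70, 0.10],    # the voting-trends matrix is reused here
              [0.2, 0.20, 0.60]])

# Left eigenvectors of P are right eigenvectors of P.T
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))   # eigenvalue closest to 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                  # normalize so the entries sum to 1
print(pi)
print(pi @ P)                       # equals pi: stationary
```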
124
We now summarize the discussion in the following theorem.

125
126
127
128
Example
Consider the Markov chain shown below:

a) Is this chain irreducible?
b) Is this chain aperiodic?
c) Find the stationary distribution for this chain.
d) Is the stationary distribution a limiting distribution for the chain?

129
Solution
a) The chain is irreducible since we can go from any state to any other state in a finite
number of steps.
b) Since there is a self-transition, i.e., p11 > 0, we conclude that the chain is aperiodic.
c) To find the stationary distribution, we need to solve:

d) Since the chain is irreducible and aperiodic, we conclude that the above stationary
distribution is a limiting distribution.

130
Countably Infinite Markov Chains

Theorem

131
where rj is the mean return time to state j.
How do we use the above theorem? Consider an infinite Markov chain {Xn,n
=0,1,2,...}, where Xn∈S={0,1,2,⋯}. Assume that the chain is irreducible and
aperiodic. We first try to find a stationary distribution π by solving the equations
132
Example
Consider the Markov chain shown in Figure below.

Assume that 0 < p < 1/2. Does this chain have a limiting distribution?


133
Solution
This chain is irreducible since all states communicate with each other. It is also
aperiodic since it includes a self-transition, P00>0. Let's write the equations for a
stationary distribution.

For state 0, we can write:

134
135
Consider the Markov chain shown in Figure below:

Assume that 1/2 < p < 1. Does this chain have a limiting distribution? For all
i, j ∈ {0, 1, 2, ⋯}, find
lim_{n→∞} P(Xn = j | X0 = i).
Solution
The chain is irreducible since all states communicate with each other. It is also
aperiodic since it includes a self-transition, P00>0. Let's write the equations for a
stationary distribution.
For state 0, we can write :

136
137
