Markov Chains
by
Howard G. Tucker
Finally, for every $j \ge 1$ there exists a value of $k \ge 1$ such that $q_{kj} > 0$, and for this $k$ there exists an $i \ge 1$ such that $p_{ik} > 0$. Thus, for every $j \ge 1$ there exists an $i \ge 1$ such that $r_{ij} \ge p_{ik} q_{kj} > 0$. Q.E.D.

Now let $n$ be any positive integer for which the theorem is true. Then $P([X_n = k]) > 0$ for all $k \in X$, and, as before, there exists an $i \in X$ such that $P([X_1 = j] \mid [X_0 = i]) > 0$. By requirement (ii) in the definition of a Markov chain, $P([X_1 = j] \mid [X_0 = i]) = P([X_{n+1} = j] \mid [X_n = i]) > 0$. Thus
$$P([X_{n+1} = j]) = \sum_{k \in X} P([X_{n+1} = j] \mid [X_n = k])\, P([X_n = k]) \ge P([X_{n+1} = j] \mid [X_n = i])\, P([X_n = i]) > 0. \quad \text{Q.E.D.}$$
and a stochastic matrix $P = (p_{ij})$ indexed by $X$, there exist a probability space $(\Omega, \mathcal{A}, P)$ and $X$-valued random variables $X_0, X_1, X_2, \ldots$ defined over $\Omega$ which form a Markov chain for which (i) $P([X_0 = i]) = \pi(i)$ for all $i \in X$ and (ii) $P([X_{n+1} = j] \mid [X_n = i]) = p_{ij}$ for all $n \ge 0$, all $j \in X$ and all $i \in X$.

Proof: Let us order the elements of $X$ in the order type $1, 2, \ldots, n$ or $1, 2, \ldots, n, \ldots$, depending on whether $X$ is finite or denumerable, so that for all intents and purposes the elements of $X$ are consecutive positive integers. Then denote $\Omega = \prod_{r=0}^{\infty} X$. For every $j_0 \in X$, define
$$A(0, j_0) = \{(x_0, x_1, x_2, \ldots) \in \Omega : x_0 = j_0\}.$$
For arbitrary $j_0, j_1$ in $X$, define
$$A(0, j_0; 1, j_1) = \{(x_0, x_1, x_2, \ldots) \in \Omega : x_0 = j_0, x_1 = j_1\} = \bigcap_{k=0}^{1} \{(x_0, x_1, x_2, \ldots) \in \Omega : x_k = j_k\},$$
and, in general, for arbitrary $j_0, j_1, \ldots, j_n$ in $X$, define
$$A(0, j_0; \ldots; n, j_n) = \{(x_0, x_1, x_2, \ldots) \in \Omega : x_0 = j_0, \ldots, x_n = j_n\} = \bigcap_{k=0}^{n} \{(x_0, x_1, x_2, \ldots) \in \Omega : x_k = j_k\}.$$
Over all subsets of the kind just defined, define a function $P$ by $P(A(0, j_0)) = \pi(j_0)$, $P(A(0, j_0; 1, j_1)) = \pi(j_0)\, p_{j_0 j_1}$, and, in general,
$$P(A(0, j_0; \ldots; n, j_n)) = \pi(j_0) \prod_{k=1}^{n} p_{j_{k-1} j_k}.$$
Without ambiguity we may define, for integers $0 \le k_0 < \cdots < k_r$ and arbitrary elements $j_{k_0}, \ldots, j_{k_r}$ of $X$, the set $A(k_0, j_{k_0}; \ldots; k_r, j_{k_r})$ by taking, for any $n \ge k_r$, the disjoint union of all sets of the form
$$A(0, m_0; 1, m_1; 2, m_2; \ldots; n, m_n),$$
where $m_{k_0} = j_{k_0}, m_{k_1} = j_{k_1}, \ldots, m_{k_r} = j_{k_r}$; then $P(A(k_0, j_{k_0}; \ldots; k_r, j_{k_r}))$ is defined as the sum of $P(A(0, m_0; \ldots; n, m_n))$ over these sets. The manner in which we defined $P(A(k_0, j_{k_0}; \ldots; k_r, j_{k_r}))$ indicates that we can apply the Kolmogorov-Daniell theorem to extend the function $P$ uniquely from the algebra of subsets generated by all sets of the form $A(k_0, j_{k_0}; \ldots; k_r, j_{k_r})$ to the sigma-algebra generated by this algebra. Then define, for $k \ge 0$, the function $X_k$ by $X_k(j_0, j_1, j_2, \ldots) = j_k$. We now verify that $\{X_0, X_1, \ldots\}$ is the Markov chain as claimed. Since $[X_0 = j_0] = A(0, j_0)$, it follows that $P([X_0 = j_0]) = \pi(j_0) > 0$. Also, since
$$A(0, j_0; \ldots; n, j_n) = \bigcap_{k=0}^{n} [X_k = j_k]$$
for $n \ge 0$, it follows that
$$P\Big(\bigcap_{k=0}^{n} [X_k = j_k]\Big) = \pi(j_0) \prod_{k=1}^{n} p_{j_{k-1} j_k}.$$
Now suppose that $P\big(\bigcap_{k=0}^{n} [X_k = j_k]\big) > 0$. Then by the above,
$$P\Big([X_{n+1} = j_{n+1}] \,\Big|\, \bigcap_{k=0}^{n} [X_k = j_k]\Big) = \frac{P\big(\bigcap_{k=0}^{n+1} [X_k = j_k]\big)}{P\big(\bigcap_{k=0}^{n} [X_k = j_k]\big)} = \frac{\pi(j_0) \prod_{k=1}^{n+1} p_{j_{k-1} j_k}}{\pi(j_0) \prod_{k=1}^{n} p_{j_{k-1} j_k}} = p_{j_n j_{n+1}}.$$
Also,
$$P([X_{n+1} = j_{n+1}] \mid [X_n = j_n]) = \frac{P([X_{n+1} = j_{n+1}][X_n = j_n])}{P([X_n = j_n])} = \frac{\sum P\big([X_{n+1} = j_{n+1}][X_n = j_n] \bigcap_{k=0}^{n-1} [X_k = j_k]\big)}{\sum P\big([X_n = j_n] \bigcap_{k=0}^{n-1} [X_k = j_k]\big)} = p_{j_n j_{n+1}},$$
where the two sums are taken over all $n$-tuples of states $j_0, j_1, \ldots, j_{n-1}$. From this we also obtain
$$P\Big([X_{n+1} = j_{n+1}] \,\Big|\, \bigcap_{k=0}^{n} [X_k = j_k]\Big) = P([X_{n+1} = j_{n+1}] \mid [X_n = j_n]),$$
and thus $\{X_0, X_1, \ldots\}$ satisfies the definition of a Markov chain with state space $X$, stochastic matrix $P$ and initial distribution $\pi(\cdot)$.
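The construction in the proof also says how to simulate such a chain in practice: draw $X_0$ from the initial distribution and each $X_{n+1}$ from the row of $P$ indexed by $X_n$. A minimal sketch in Python (the two-state distribution and matrix below are made-up illustrations, not from the text):

```python
import random

def simulate_chain(pi, P, n_steps, seed=0):
    """Sample X_0 from pi, then X_{k+1} from row X_k of P."""
    rng = random.Random(seed)
    x = rng.choices(range(len(pi)), weights=pi)[0]
    path = [x]
    for _ in range(n_steps):
        x = rng.choices(range(len(P)), weights=P[x])[0]
        path.append(x)
    return path

# Hypothetical two-state example: pi = (0.5, 0.5), p_01 = 0.2, p_10 = 0.7.
pi = [0.5, 0.5]
P = [[0.8, 0.2],
     [0.7, 0.3]]
path = simulate_chain(pi, P, 1000)
print(path[:10], sum(path) / len(path))
```

The long-run fraction of time spent in state 1 should settle near the stationary probability of state 1 for this matrix.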
7. Corollary. If $\{X_n\}$ is a Markov chain determined by initial distribution $\{\pi(i) : i \in X\}$ and stochastic matrix $P = (p_{ij})$, if $n \ge 1$, and if $i_0, i_1, \ldots, i_n$ is any sequence of states in $X$, then
$$P\Big(\bigcap_{j=0}^{n} [X_j = i_j]\Big) = \pi(i_0) \prod_{j=1}^{n} p_{i_{j-1} i_j},$$
and in general
$$p^{n+1}_{ij} = \sum_{k \in X} p^n_{ik}\, p_{kj}.$$
9. Proposition. If $\{X_n\}$ is a Markov chain determined by initial distribution $\{\pi(i) : i \in X\}$ and stochastic matrix $P = (p_{ij})$, and if $n \ge 1$, then for every positive $n$ and $k$,
But
$$P([X_k = i_1] \mid [X_0 = i_0]) = \sum P\Big([X_k = i_1] \bigcap_{r=1}^{k-1} [X_r = u_r]\, [X_0 = i_0]\Big) \Big/ P([X_0 = i_0]);$$
$$P([X_{k_0} = i_0][X_{k_1} = i_1])$$
is equal to
$$\sum P\Big(\bigcap_{q=0}^{k_0 - 1} [X_q = u_q]\, [X_{k_0} = i_0] \bigcap_{r=k_0+1}^{k_1 - 1} [X_r = v_r]\, [X_{k_1} = i_1]\Big),$$
which is the product of
$$\sum \pi(u_0) \Big(\prod_{m=1}^{k_0 - 1} p_{u_{m-1} u_m}\Big) p_{u_{k_0 - 1}\, i_0},$$
where the sum is taken over all states $u_0, u_1, \ldots, u_{k_0 - 1}$ in $X$, and
$$\sum p_{i_0 v_{k_0 + 1}} \Big(\prod_{m=k_0+1}^{k_1 - 2} p_{v_m v_{m+1}}\Big) p_{v_{k_1 - 1}\, i_1},$$
where the sum is taken over all states $v_{k_0 + 1}, \ldots, v_{k_1 - 1}$ in $X$. The first sum is equal to $P([X_{k_0} = i_0])$, and the second sum is equal to $p^{\,k_1 - k_0}_{i_0 i_1}$. The proof for general $n$ involves the product of $n + 1$ sums of products. Q.E.D.
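The factorization in this proof can be checked numerically: summing $\pi(j_0)\prod p$ over all paths consistent with $[X_{k_0} = i_0][X_{k_1} = i_1]$ should agree with $P([X_{k_0} = i_0])\, p^{\,k_1 - k_0}_{i_0 i_1}$ computed from matrix powers. A sketch with a made-up two-state chain:

```python
# Checking Proposition 9: P([X_{k0}=i0][X_{k1}=i1]) equals
# P([X_{k0}=i0]) * p^{k1-k0}_{i0 i1}.  The chain is an illustration.
import numpy as np
from itertools import product

pi0 = np.array([0.3, 0.7])            # initial distribution
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
k0, k1, i0, i1 = 2, 5, 0, 1

lhs = (pi0 @ np.linalg.matrix_power(P, k0))[i0] \
      * np.linalg.matrix_power(P, k1 - k0)[i0, i1]

# Brute-force sum over every path (x_0, ..., x_{k1}) consistent with
# the event [X_{k0} = i0][X_{k1} = i1].
rhs = 0.0
for path in product(range(2), repeat=k1 + 1):
    if path[k0] == i0 and path[k1] == i1:
        pr = pi0[path[0]]
        for a, b in zip(path, path[1:]):
            pr *= P[a, b]
        rhs += pr

print(lhs, rhs)   # the two computations agree
```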
11. Theorem. If $\{X_n\}$ is a Markov chain determined by the state space $X$, initial distribution $\{\pi(i) : i \in X\}$ and stochastic matrix $P = (p_{ij})$, if $0 \le k_0 < k_1 < \cdots < k_m < r_0 < r_1 < \cdots < r_n$, and if
$$P\Big(\bigcap_{j=0}^{m} [X_{k_j} = u_j]\Big) > 0,$$
for all $j$ in $X$.
Proof: The left side of the above may be expressed as
$$\frac{P([X_{n+2} = j][X_{n+1} = i][(X_0, X_1, \ldots, X_n) \in S])}{P([X_{n+1} = i][(X_0, X_1, \ldots, X_n) \in S])}.$$
The numerator is equal to
$$\sum P\Big([X_{n+2} = j][X_{n+1} = i] \bigcap_{k=0}^{n} [X_k = u_k]\Big),$$
particular case of this. Prove that if $\{X_n\}$ is a Markov chain determined by state space $X$, initial distribution $\{\pi(i) : i \in X\}$ and stochastic matrix $P = (p_{ij})$, then
$$P([X_{n-2} = i][X_{n-1} = j][X_{n+1} = k][X_{n+2} = l] \mid [X_n = r])$$
is equal to
$$P([X_{n-2} = i][X_{n-1} = j] \mid [X_n = r])\, P([X_{n+1} = k][X_{n+2} = l] \mid [X_n = r])$$
for $n \ge 2$ and for arbitrary elements $i, j, k, l, r$ in $X$.
3. Sometimes that part of the definition of Markov chain that states
$$P\Big([X_{n+1} = j_{n+1}] \,\Big|\, \bigcap_{k=0}^{n} [X_k = j_k]\Big) = P([X_{n+1} = j_{n+1}] \mid [X_n = j_n])$$
for all states $j_0, j_1, \ldots, j_{n+1}$ in $X$ which satisfy $P\big(\bigcap_{k=0}^{n} [X_k = j_k]\big) > 0$ is replaced by the requirement that $P\big([X_{n+1} = j_{n+1}] \mid \bigcap_{k=0}^{n} [X_k = j_k]\big)$ does not depend on $j_0, j_1, \ldots, j_{n-1}$ for all states $j_0, j_1, \ldots, j_{n+1}$ in $X$ which satisfy $P\big(\bigcap_{k=0}^{n} [X_k = j_k]\big) > 0$. Prove that these two statements are equivalent.
4. Consider the set of all $2 \times 2$ tables of nonnegative integers for which the numbers in the first row add up to 6, the numbers in the first column add up to 5, and the sum of all four numbers is 16. An example of one such table is
$$s_2 = \begin{pmatrix} 2 & 4 \\ 3 & 7 \end{pmatrix}.$$
(i) List all six tables that satisfy the requirements.
(ii) These six tables constitute a state space for a Markov chain which we are about to construct. Let $s_i$ denote the table with the number $i$ in the first row and the first column. A table like this one arises from time to time in statistical practice with a probability measure defined over it by
$$\pi(s_i) = \frac{\binom{5}{i}\binom{11}{6-i}}{\binom{16}{6}} \quad \text{for } 0 \le i \le 5.$$
The mechanism for generating a Markov chain here is to select a first state at random according to the distribution $\pi(\cdot)$. Whatever state is obtained, the next state is selected according to the following procedure. First, select one of the following tables,
$$\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix},$$
with probabilities $\frac{1}{2}, \frac{1}{2}$. Whichever table is obtained, add it termwise to the original table to get your next state, unless adding it termwise puts a negative number in one of the entries, in which case the next state is the same as the present state. Display the $6 \times 6$ stochastic matrix for this Markov chain. (For example, $p_{s_0 s_0} = \frac{1}{2}$, $p_{s_3 s_2} = \frac{1}{2}$, $p_{s_2 s_4} = 0$, etc.)
(iii) Compute the density of $X_0$ from the formula given above, then compute the density of $X_1$, and finally compute the joint density of $X_0, X_1$.
(iv) If $P$ denotes the stochastic matrix from part (ii), display these matrices: $P^{10}$, $P^{100}$ and $P^{1000}$.
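A sketch of parts (ii) and (iv), assuming the reconstruction above of the two $\pm 1$ perturbation tables: the move sends $s_i$ to $s_{i \pm 1}$ with probability $\frac12$ each, and is rejected (the chain holds) when it would leave the state space:

```python
# Parts (ii) and (iv): build the 6x6 stochastic matrix and its powers.
import numpy as np
from math import comb

# pi(s_i) = C(5,i) C(11,6-i) / C(16,6), 0 <= i <= 5
pi = np.array([comb(5, i) * comb(11, 6 - i) for i in range(6)], dtype=float)
pi /= comb(16, 6)

P = np.zeros((6, 6))
for i in range(6):
    for move in (+1, -1):
        j = i + move
        if 0 <= j <= 5:
            P[i, j] += 0.5      # the perturbed table is a valid state
        else:
            P[i, i] += 0.5      # a negative entry would appear: stay put

assert np.allclose(P.sum(axis=1), 1.0)             # stochastic
print(np.round(np.linalg.matrix_power(P, 10), 4))
print(np.round(np.linalg.matrix_power(P, 1000), 4))
```

Note the matrix agrees with the examples in the text ($p_{s_0 s_0} = \frac12$, $p_{s_3 s_2} = \frac12$, $p_{s_2 s_4} = 0$); since this walk is doubly stochastic, the rows of $P^{1000}$ all approach the uniform row $(\frac16, \ldots, \frac16)$.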
$r_0 \le (a, b)$. But note that by $b = q_0 a + r_0$, it follows that $(a, b) \mid r_0$. Therefore, by Proposition 2 above, $(a, b) \le r_0$, so the two inequalities imply $(a, b) = r_0$. But if $r_1 > 0$, then as before we may write $r_0 = q_2 r_1 + r_2$, where $0 \le r_2 < r_1$. Again, if $r_2 = 0$, then in the very same way one can show that $r_1 \mid a$ and $r_1 \mid b$, so that $r_1 \le (a, b)$, while at the same time showing that $(a, b) \mid r_1$. Thus we have, as before, $(a, b) = r_1$. Proceed until the last nonzero remainder is obtained.
8. Theorem. If $0 < a < b$ are integers, then there exist integers $u$ and $v$ such that $(a, b) = ua + vb$.
Proof: In the Euclidean algorithm above, let us take the case where $(a, b) = r_1$. Then since $r_1 = a - q_1 r_0$ and since $r_0 = b - q_0 a$, we have $r_1 = a - q_1(b - q_0 a) = (1 + q_0 q_1)a - q_1 b$, which proves the theorem in the case when $(a, b) = r_1$ in the application of the Euclidean algorithm. From here on, the general proof is just a matter of notation.
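The proof of Theorem 8 is constructive: running the Euclidean algorithm backwards produces $u$ and $v$. A minimal sketch (the helper name is ours, not the text's):

```python
# Euclidean algorithm with the Bezout coefficients of Theorem 8:
# (a, b) = u*a + v*b.
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, u, v) with g = gcd(a, b) = u*a + v*b."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # g = u*b + v*(a % b) = v*a + (u - (a // b)*v)*b
    return g, v, u - (a // b) * v

g, u, v = extended_gcd(12, 42)
print(g, u, v)   # g divides both 12 and 42, and g == 12*u + 42*v
```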
9. Theorem. If $a$ and $b$ are positive integers, if $d \mid a$ and if $d \mid b$, then $d \mid (a, b)$.
Proof: This follows directly from Theorem 8.
10. Theorem. If $0 < k_1 < k_2 < \cdots < k_r$ are integers, where $r \ge 2$, then
$$(k_1, k_2, \ldots, k_r) = ((k_1, k_2, \ldots, k_{r-1}), k_r).$$
Proof: We shall prove this by induction on $n$ for the integers $0 < b < a_0 < a_1 < \cdots < a_n$. We first prove it is true for $n = 1$ by proving $(b, a_0, a_1) = ((b, a_0), a_1)$ and then by applying the theorem. Let $d = (b, a_0, a_1)$. Then $d \mid b$ and $d \mid a_0$ and $d \mid a_1$, so by theorem 9, $d \mid (b, a_0)$; but also $d \mid a_1$, so again by theorem 9, $d \mid ((b, a_0), a_1)$, from which it follows by Proposition 2 above that $(b, a_0, a_1) \le ((b, a_0), a_1)$. On the other hand, if $d = ((b, a_0), a_1)$, then $d \mid (b, a_0)$ and $d \mid a_1$, which in turn implies $d \mid b$ and $d \mid a_0$ and $d \mid a_1$, so $d \le (b, a_0, a_1)$. Thus the first conclusion of the theorem,
$$(b, a_0, a_1) = ((b, a_0), a_1),$$
$$(b, a_0, a_1, \ldots, a_n) = ((b, a_0, a_1, \ldots, a_{n-1}), a_n) \quad \text{and}$$
$$(b, a_0, a_1, \ldots, a_n) = jb + \sum_{k=0}^{n} j_k a_k$$
$$1 = c_1 a_1 + \cdots + c_n a_n.$$
Without loss of generality we may assume that $c_1 > 0$. Now let us define
$$N + m = N + c a_1 + r.$$
As noted above, $1 = c_1 a_1 + \cdots + c_n a_n$, so
$$N + m = a_1 a_2 |c_2| + a_1 a_3 |c_3| + \cdots + a_1 a_n |c_n| + m \sum_{i=1}^{n} c_i a_i = m c_1 a_1 + (a_1 |c_2| + m c_2) a_2 + \cdots + (a_1 |c_n| + m c_n) a_n,$$
which proves the theorem.
12. Definition. If $\{X_n\}$ is a Markov chain determined by state space $X$, initial distribution $\{\pi(i) : i \in X\}$ and stochastic matrix $P = (p_{ij})$, then state $j$ is said to be accessible from state $i$ if there exists a nonnegative integer $n$ such that $p^n_{ij} > 0$. (Note: for all $i \in X$, $i$ is accessible from $i$ since $p^0_{ii} = 1$.)
13. Proposition. If state $j$ is accessible from state $i$, and if state $k$ is accessible from state $j$, then state $k$ is accessible from state $i$.
Proof: By hypothesis, there exist positive integers $m$ and $n$ such that $p^m_{ij} > 0$ and $p^n_{jk} > 0$. Thus, by the Chapman-Kolmogorov equations,
$$p^{m+n}_{ik} = \sum_{t \in X} p^m_{it}\, p^n_{tk} \ge p^m_{ij}\, p^n_{jk} > 0,$$
18. Definition. If $\{X_n\}$ is a Markov chain determined by initial distribution $\{\pi(i) : i \in X\}$ and stochastic matrix $P = (p_{ij})$, then the state $i \in X$ is called aperiodic if $d_i = 1$.
19. Definition. If $\{X_n\}$ is a Markov chain determined by state space $X$, initial distribution $\{\pi(i) : i \in X\}$ and stochastic matrix $P = (p_{ij})$, then it is said to satisfy the uniform irreducibility condition if there exists a positive integer $N$ such that, for all $n \ge N$ and all $i, j$ in $X$, $p^n_{ij} > 0$.
20. Theorem. If $\{X_n\}$ is a Markov chain determined by state space $X$, initial distribution $\{\pi(i) : i \in X\}$ and stochastic matrix $P = (p_{ij})$, and if two distinct states $i$ and $j$ communicate with each other, then the periods of the two states are equal.
Proof: Let $n$ and $m$ be positive integers that satisfy $p^n_{ij} > 0$ and $p^m_{ji} > 0$. Let $k > 0$ satisfy $p^k_{ii} > 0$. Then, by the Chapman-Kolmogorov equations, it follows that
$$p^{m+n+k}_{jj} = \sum_{u, v \in X} p^m_{ju}\, p^k_{uv}\, p^n_{vj} \ge p^m_{ji}\, p^k_{ii}\, p^n_{ij} = c\, p^k_{ii},$$
where $c = p^m_{ji}\, p^n_{ij} > 0$ does not depend on $k$. Since $p^{m+n}_{jj} > 0$, it follows that $m + n = k_1 d(j)$ for some positive integer $k_1$, and $m + n + k = k_2 d(j)$. For our particular value of $k$ such that $p^k_{ii} > 0$, we have $k = (k_2 - k_1) d(j)$. Thus $d(j)$ divides all values of $k$ for which $p^k_{ii} > 0$, which implies that $d(j) \mid d(i)$. By the arbitrariness of $i$ and $j$, one can also prove that $d(i) \mid d(j)$. Thus, by Proposition 2, $d(i) = d(j)$.
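Theorem 20 can be observed numerically. The helper below is an illustration (not from the text): it computes $d(i)$ as the gcd of the return times $n \le n_{\max}$ with $p^n_{ii} > 0$:

```python
# Period of a state as the gcd of its possible return times.
import numpy as np
from math import gcd
from functools import reduce

def period(P: np.ndarray, i: int, n_max: int = 50) -> int:
    returns, Q = [], np.eye(len(P))
    for n in range(1, n_max + 1):
        Q = Q @ P                     # Q is now P^n
        if Q[i, i] > 0:
            returns.append(n)
    return reduce(gcd, returns, 0)

# Deterministic 3-cycle: every state has period 3, as the theorem requires
# for communicating states.
C = np.array([[0., 1, 0], [0, 0, 1], [1, 0, 0]])
print(period(C, 0))   # 3
```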
ordered pairs of states $i, j$. For any state $j$, we then have, by the Chapman-Kolmogorov equations,
$$p^{m+1}_{jj} = \sum_{k \in X} p^m_{jk}\, p_{kj}.$$
By the definition of Markov chain, there exists a state $k_0$ such that $p_{k_0 j} > 0$. From this last equation, we observe that
$$p^{m+1}_{jj} = \sum_{k \in X} p^m_{jk}\, p_{kj} \ge p^m_{jk_0}\, p_{k_0 j} > 0.$$
Since $(m, m+1) = 1$, it follows that the chain is aperiodic. Since $p^m_{ij} > 0$ for all ordered pairs of states $i, j$, it easily follows that the Markov chain is irreducible. Conversely, suppose that the chain is irreducible and aperiodic. Let $i \in X$, and let $D_i = \{n : p^n_{ii} > 0\}$. Since
$$p^{m+n}_{ii} = \sum_{k \in X} p^m_{ik}\, p^n_{ki} \ge p^m_{ii}\, p^n_{ii},$$
$$m_j(n) = \min\{p^n_{ij} : i \in X\}$$
and
$$M_j(n) = \max\{p^n_{ij} : i \in X\}$$
for all $j \in X$ and for all $n \ge 1$. Applying the Chapman-Kolmogorov equations, we obtain, for $n \ge 2$,
$$p^n_{ij} = \sum_{k \in X} p_{ik}\, p^{n-1}_{kj} \ge \sum_{k \in X} p_{ik}\, m_j(n-1) = m_j(n-1)$$
and
$$p^n_{ij} = \sum_{k \in X} p_{ik}\, p^{n-1}_{kj} \le \sum_{k \in X} p_{ik}\, M_j(n-1) = M_j(n-1)$$
for all $i, j \in X$ and for all $n \ge 2$. Taking the $\min_{i \in X}$ of both sides of the first inequality, we get
$$m_j(n) \ge m_j(n-1),$$
and, taking the $\max_{i \in X}$ of the second inequality, we obtain
$$M_j(n) \le M_j(n-1)$$
for all $n \ge 2$ and all $j \in X$. Thus for each $j \in X$, $m_j$ and $M_j$ satisfy
$$m_j = \lim_{n \to \infty} m_j(n) \le \lim_{n \to \infty} M_j(n) = M_j.$$
Note that for every $j$ and for all ordered pairs of states $i, j \in X$,
$$1 = \sum_{j \in X} p^n_{ij} \ge \sum_{j \in X} m_j(n-1)$$
and
$$1 = \sum_{j \in X} p^n_{ij} \le \sum_{j \in X} M_j(n-1).$$
Hence, if we can prove that $m_j = M_j = (\text{some})\ \pi_j$ for all $j \in X$, then $\sum_{j \in X} \pi_j = 1$, and all of the rows are the same. So it is sufficient to prove $m_j = M_j$ for all $j \in X$. Let $i_0 \in X$ be such that
$$m_j(n) = \min_{i \in X} p^n_{ij} = p^n_{i_0 j},$$
let $i_1 \in X$ be such that $M_j(n-1) = p^{n-1}_{i_1 j}$, and let $\varepsilon = \min\{p_{rs} : r, s \in X\} > 0$. Then
$$m_j(n) = p^n_{i_0 j} = \sum_{k \in X} p_{i_0 k}\, p^{n-1}_{kj} = \sum_{k \ne i_1} p_{i_0 k}\, p^{n-1}_{kj} + p_{i_0 i_1}\, p^{n-1}_{i_1 j} = \varepsilon\, p^{n-1}_{i_1 j} + (p_{i_0 i_1} - \varepsilon)\, p^{n-1}_{i_1 j} + \sum_{k \ne i_1} p_{i_0 k}\, p^{n-1}_{kj} \ge \varepsilon M_j(n-1) + \Big\{p_{i_0 i_1} - \varepsilon + \sum_{k \ne i_1} p_{i_0 k}\Big\} m_j(n-1),$$
or
$$m_j(n) \ge \varepsilon M_j(n-1) + (1 - \varepsilon)\, m_j(n-1).$$
Taking the limit of both sides as $n \to \infty$, we obtain $m_j \ge \varepsilon M_j + (1 - \varepsilon) m_j$, or $M_j \le m_j$. Since we established above that $m_j \le M_j$, we may conclude that $m_j = M_j = \pi_j$ for all $j \in X$. Hence $\sum_{j \in X} \pi_j = 1$, and all rows of the limit matrix are identical. This proves the theorem for $m = 1$. We now prove the theorem for $m > 1$. Since $P^m$ is a stochastic matrix with all positive entries, we may replace $P$ in the proof of the case above where $m = 1$ by $P^m$ and thereby conclude that
$$P^{mn} \to (\text{some})\ \Pi \quad \text{as } n \to \infty,$$
where $\Pi$ is a stochastic matrix of which all rows are identical. Then for $1 \le k \le m - 1$,
$$\lim_{n \to \infty} P^{mn+k} = \Pi P^k.$$
Since all rows of $\Pi$ are the same, and since $P^k$ is a stochastic matrix, then $\Pi P^k = \Pi$. So $P^r \to \Pi$ as $r \to \infty$.
$$\Pi P = P \Pi = \Pi.$$
25. Corollary. Under the conditions of the theorem and corollary above, if the initial distribution of $X_0$ is the same as the common row of $\Pi$, then the random variables $\{X_n\}$ are identically distributed, and the sequence is stationary.
Proof: This follows from the conclusion in Corollary 24 that states $\Pi P = \Pi$.
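The conclusion can be watched numerically: for a stochastic matrix with all entries positive, the powers $P^n$ settle to a matrix $\Pi$ whose rows are identical, and the common row is invariant under $P$. A sketch with a made-up $3 \times 3$ matrix:

```python
# P^n converges to a rank-one stochastic matrix with identical rows.
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
Pn = np.linalg.matrix_power(P, 50)
print(np.round(Pn, 6))                 # all three rows (nearly) identical
row = Pn[0]
assert np.allclose(Pn, np.tile(row, (3, 1)))   # identical rows
assert np.allclose(row @ P, row)               # the common row is stationary
```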
Part III. Recurrence
Proof: In the proof of Proposition 2, we noted that
$$[T_i(1) < \infty] = \bigcup_{n \ge 1} \Big([X_n = i] \bigcap_{r=1}^{n-1} [X_r \ne i]\Big),$$
and hence
$$[T_i(1) < \infty] = \bigcup_{n=1}^{\infty} [X_n = i],$$
from which the proposition follows.
5. Notation. For every pair of elements $i, j$ in the state space $X$ of a Markov chain $\{X_n\}$, we shall denote
$$f^n_{ij} = P\Big(\bigcap_{r=1}^{n-1} [X_r \ne j] \cap [X_n = j] \,\Big|\, [X_0 = i]\Big)$$
and
$$f_{ij} = \sum_{n=1}^{\infty} f^n_{ij} = P\Big(\bigcup_{n=1}^{\infty} [X_n = j] \,\Big|\, [X_0 = i]\Big),$$
with $f^0_{ii} = 0$ for all $i \in X$. Here $f^0_{ij} = 0$ for all pairs $i, j$ of states where $i \ne j$; also, $p^0_{ii} = 1$ for all $i \in X$, and $p^0_{ij} = 0$ for $i \ne j$.
6. Remark. If $\{X_n\}$ is a Markov chain, then
$$F_{ij}(s) = \sum_{n=1}^{\infty} f^n_{ij} s^n \quad \text{and} \quad P_{ij}(s) = \sum_{n=0}^{\infty} p^n_{ij} s^n,$$
both of which converge for $|s| < 1$, and
$$P_{ii}(s) = \frac{1}{1 - F_{ii}(s)} \quad \text{for all } s \in (0, 1).$$
Proof: We first observe that
$$[X_n = i] = \bigcup_{k=1}^{n} [T_i(1) = k][X_n = i],$$
so that
$$p^n_{ii} = P([X_n = i] \mid [X_0 = i]) = P\Big(\bigcup_{k=1}^{n} [T_i(1) = k][X_n = i] \,\Big|\, [X_0 = i]\Big),$$
and since the union of intersections on the right side is a disjoint union, then, letting
$$W = P\Big([X_k = i] \bigcap_{j=1}^{k-1} [X_j \ne i] \,\Big|\, [X_0 = i]\Big),$$
the above is
$$= \sum_{k=1}^{n} P\Big(\bigcap_{j=1}^{k-1} [X_j \ne i]\, [X_k = i][X_n = i] \,\Big|\, [X_0 = i]\Big) = \sum_{k=1}^{n} P\Big([X_n = i] \,\Big|\, [X_k = i] \bigcap_{j=1}^{k-1} [X_j \ne i]\, [X_0 = i]\Big)\, W = \sum_{k=1}^{n} P([X_n = i] \mid [X_k = i])\, P\Big([X_k = i] \bigcap_{j=1}^{k-1} [X_j \ne i] \,\Big|\, [X_0 = i]\Big) = \sum_{k=1}^{n} f^k_{ii}\, p^{n-k}_{ii},$$
and
$$P_{ij}(s) = F_{ij}(s)\, P_{jj}(s)$$
for all $s \in (-1, 1)$.
Proof: This proof is the same as the proof of theorem 8, except in this case one must use the agreement that $f^0_{ij} = p^0_{ij} = 0$.
it follows that
$$P([T_i(1) < \infty] \mid [X_0 = i]) = P\Big(\bigcup_{n \ge 1} \Big([X_n = i] \bigcap_{r=1}^{n-1} [X_r \ne i]\Big) \,\Big|\, [X_0 = i]\Big) = f_{ii},$$
which proves that (i) holds if and only if (ii) is true. We now prove that (ii) is equivalent to (iii). We first prove the
Claim: $\displaystyle \lim_{s \uparrow 1} \sum_{n=1}^{\infty} f^n_{ii} s^n = \sum_{n=1}^{\infty} f^n_{ii}$ and $\displaystyle \lim_{s \uparrow 1} \sum_{n=0}^{\infty} p^n_{ii} s^n = \sum_{n=0}^{\infty} p^n_{ii}$.
with each other, so there exist positive integers $m$ and $n$ such that $p^m_{ij} > 0$ and $p^n_{ji} > 0$. Since $P^{m+k+n} = P^n P^k P^m$, it follows that
$$p^{m+k+n}_{jj} = \sum_{u \in X,\, v \in X} p^n_{ju}\, p^k_{uv}\, p^m_{vj} \ge p^n_{ji}\, p^k_{ii}\, p^m_{ij} = c\, p^k_{ii},$$
where $c = p^n_{ji}\, p^m_{ij} > 0$. Since state $i$ is recurrent, then by theorem 10, $\sum_{k=1}^{\infty} p^k_{ii} = \infty$, which combined with the above display implies
$$\sum_{k=1}^{\infty} p^{m+k+n}_{jj} = \infty.$$
Now sum both sides of this equation with respect to u, and we obtain
thus proving the two random variables to be conditionally identically distributed. Substituting this last quantity into the earlier one, we get
then $\{Z_n\}$ are independent and identically distributed with respect to $P(\cdot \mid [X_0 = i])$.
Proof: This is proved in much the same way as Theorem 12. Again we prove it for $Z_1$ and $Z_2$; the general proof has only more notation. Let $a$ and $b$ be any real numbers. Then
Now let $a \uparrow \infty$ on both sides, from which we get
1. Refer to problem 2 in the exercises for Part II. Suppose that in the outcome of 16 plays of the game you observed this table:
$$\begin{array}{c|cc} & B & B^c \\ \hline A & 4 & 2 \\ A^c & 1 & 9 \end{array}$$
In much more complicated cases than this, we might feel that as many as 4 trials out of 16 in which both $A$ and $B$ occur is too large, relative to the other cell counts, for the hypothesis of independence. So what we might wish to do is compute the conditional probability $P([X_{11} \ge 4] \mid [X_{1\cdot} = 6][X_{\cdot 1} = 5])$ to see if this "is possible". From your answer last time, this conditional probability is $0.0357$. You might simulate this game until you obtain 10,000 games in which the first row sums are equal to 6 and the first column sums are equal to 5. As these occur in your program, find the relative frequency with which the event $[X_{11} \ge 4]$ occurs. To do so, you might first take $\alpha = \frac{6}{16}$ and $\beta = \frac{5}{16}$. Then you might obtain empirical evidence that this conditional probability does not depend on $\alpha$ and $\beta$ by taking $\alpha = 0.5$ and $\beta = 0.5$.
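The suggested simulation can be sketched as follows, assuming the reconstruction above (each of the 16 plays yields $A$ with probability $\alpha$ and, independently, $B$ with probability $\beta$); the exact conditional probability is $286/8008 \approx 0.0357$:

```python
# Monte Carlo estimate of P([X11 >= 4] | row 1 sum = 6, column 1 sum = 5).
import random

def estimate(alpha: float, beta: float, wanted: int = 10_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = kept = 0
    while kept < wanted:
        a = [rng.random() < alpha for _ in range(16)]
        b = [rng.random() < beta for _ in range(16)]
        x11 = sum(ai and bi for ai, bi in zip(a, b))
        if sum(a) == 6 and sum(b) == 5:      # condition on the margins
            kept += 1
            hits += (x11 >= 4)
    return hits / kept

# 2,000 conditioned games keep the runtime modest for this sketch.
print(round(estimate(6 / 16, 5 / 16, 2000), 4))
```

Rerunning with $\alpha = \beta = 0.5$ gives essentially the same estimate, which is the empirical evidence the exercise asks for.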
2. One can also create a Markov chain that we shall use later on. The state space is
$$X = \{s(i) : 0 \le i \le 5\},$$
where
$$s(i) = \begin{array}{c|cc} & B & B^c \\ \hline A & i & 6-i \\ A^c & 5-i & 5+i \end{array}.$$
The mechanism for this Markov chain is as follows. Select the state $s(i)$ with probability
$$\pi = (\pi_1, \pi_2, \ldots)^t$$
is a stationary distribution for $\{X_n\}$, then the Markov chain is strictly stationary.
Proof: We observe that
$$P([X_1 = i]) = \sum_{j \in X} P([X_1 = i] \mid [X_0 = j])\, P([X_0 = j]) = \sum_{j \in X} \pi_j\, p_{ji} = \pi_i = P([X_0 = i]).$$
Continuing inductively, one can prove $P([X_n = i]) = \pi_i$ for all $n \ge 1$ and all $i \in X$. From this we obtain, for all positive integers $m$ and $n$,
$$P\Big(\bigcap_{k=m}^{m+n} [X_k = i_k]\Big) = P([X_m = i_m]) \prod_{t=m+1}^{m+n} P\Big([X_t = i_t] \,\Big|\, \bigcap_{q=m}^{t-1} [X_q = i_q]\Big) = \pi_{i_m} \prod_{t=m+1}^{m+n} p_{i_{t-1} i_t} = P([X_0 = i_m]) \prod_{t=m+1}^{m+n} P\Big([X_{t-m} = i_t] \,\Big|\, \bigcap_{q=m}^{t-1} [X_{q-m} = i_q]\Big) = P\Big(\bigcap_{k=m}^{m+n} [X_{k-m} = i_k]\Big). \quad \text{Q.E.D.}$$
From the finiteness of $\mu_i$ and from $p^m_{ji} > 0$, it follows that $\mu_j < \infty$.
then
(1) $\mu_i = 1$,
(2) $\mu$ is a finite invariant measure, and
(3) $\mu_j = \sum_{n=0}^{\infty} P([X_n = j][T_i(1) > n] \mid [X_0 = i]) < \infty$ for all $j \in X$.
Proof: Note that in this theorem and its proof, the measure $\mu$ is a function of $i$. We first prove that (3) is true. By the monotone convergence theorem,
$$E\Big(\sum_{n=0}^{T_i(1)-1} I_{[X_n = j]} \,\Big|\, [X_0 = i]\Big) = E\Big(\sum_{k=1}^{\infty} \sum_{n=0}^{k-1} I_{[X_n = j][T_i(1) = k]} \,\Big|\, [X_0 = i]\Big) = \sum_{k=1}^{\infty} \sum_{n=0}^{k-1} P([X_n = j][T_i(1) = k] \mid [X_0 = i]) = \sum_{n=0}^{\infty} \sum_{k=n+1}^{\infty} P([X_n = j][T_i(1) = k] \mid [X_0 = i]) = \sum_{n=0}^{\infty} P\Big(\bigcup_{k=n+1}^{\infty} [X_n = j][T_i(1) = k] \,\Big|\, [X_0 = i]\Big) = \sum_{n=0}^{\infty} P([X_n = j][T_i(1) > n] \mid [X_0 = i]).$$
Next,
$$\mu_i = E\Big(\sum_{n=0}^{T_i(1)-1} I_{[X_n = i]} \,\Big|\, [X_0 = i]\Big) = \frac{1}{P([X_0 = i])}\, E\Big(\sum_{n=0}^{T_i(1)-1} I_{[X_n = i][X_0 = i]}\Big) = \frac{1}{P([X_0 = i])}\, E\Big(\sum_{m=1}^{\infty} I_{\bigcap_{t=1}^{m-1} [X_t \ne i][X_m = i]} \sum_{n=0}^{m-1} I_{[X_n = i][X_0 = i]}\Big) = \frac{1}{P([X_0 = i])}\, E(I_{[X_0 = i]}) = 1,$$
Thus, for $j \ne i$, we may write
$$\mu_j = E\Big(\sum_{k=1}^{\infty} I_{[T_i(1) = k]} \sum_{n=0}^{T_i(1)} I_{[X_n = j]} \,\Big|\, [X_0 = i]\Big) = E\Big(\sum_{k=1}^{\infty} \sum_{n=0}^{k} I_{[T_i(1) = k][X_n = j]} \,\Big|\, [X_0 = i]\Big) = E\Big(\sum_{n=1}^{\infty} \sum_{k=n}^{\infty} I_{[T_i(1) = k][X_n = j]} \,\Big|\, [X_0 = i]\Big) = P([X_1 = j][T_i(1) \ge 1] \mid [X_0 = i]) + \sum_{n=2}^{\infty} P([X_n = j][T_i(1) \ge n] \mid [X_0 = i]).$$
So, since $j \ne i$, then $[X_n = j][T_i(1) \ge n] = [X_n = j][T_i(1) \ge n+1]$, and
$$\mu_j = p_{ij} + \sum_{n=2}^{\infty} P([X_n = j][T_i(1) \ge n+1] \mid [X_0 = i]) = p_{ij} + \sum_{k \in X,\, k \ne i} \sum_{n=2}^{\infty} P([X_n = j][T_i(1) \ge n+1][X_{n-1} = k] \mid [X_0 = i]) = p_{ij} + \sum_{k \in X,\, k \ne i} \sum_{n=2}^{\infty} P([X_n = j] \mid [T_i(1) \ge n+1][X_{n-1} = k][X_0 = i])\, Q(k, n),$$
where $Q(k, n) = P([T_i(1) \ge n+1][X_{n-1} = k] \mid [X_0 = i])$. But for $n \ge 2$ and $k \ne i$,
The inner sum vanishes for $m \ge T_i(1)$; also, since $k \ne i$, we may start the inner sum at $m = 0$. Thus
$$\mu_j = \mu_i\, p_{ij} + \sum_{k \in X,\, k \ne i} p_{kj}\, E\Big(\sum_{m=0}^{T_i(1)-1} I_{[X_m = k]} \,\Big|\, [X_0 = i]\Big) = \mu_i\, p_{ij} + \sum_{k \in X,\, k \ne i} p_{kj}\, \mu_k = \sum_{k \in X} \mu_k\, p_{kj},$$
which establishes (2) for $j \ne i$. Last, we wish to prove (2) for $j = i$. But first we establish a claim.
Claim. If $j \ne i$, if $n \ge 1$ and if $P([X_n = j][T_i(1) > n][X_0 = i]) > 0$, then
Proof: By the technical theorems we obtained in Part I, and since $j \ne i$,
which proves the claim. Continuing our proof of (2) for $j = i$: since in (1) we established that $\mu_i = 1$, it is therefore sufficient to prove $\sum_{k \in X} \mu_k\, p_{ki} = 1$. Applying the above claim, the hypothesis of recurrence and (iii), we have, letting $W = P([X_{n+1} = i] \mid [X_n = k][T_i(1) > n][X_0 = i])$,
$$\sum_{k \in X} \mu_k\, p_{ki} = \sum_{k \in X} \sum_{n=0}^{\infty} P([X_n = k][T_i(1) > n] \mid [X_0 = i])\, p_{ki} = \sum_{k \in X,\, k \ne i} \sum_{n=0}^{\infty} P([X_n = k][T_i(1) > n] \mid [X_0 = i])\, p_{ki} = \sum_{k \in X,\, k \ne i} \sum_{n=0}^{\infty} P([X_n = k][T_i(1) > n] \mid [X_0 = i])\, P([X_{n+1} = i] \mid [X_n = k]) = \sum_{k \in X,\, k \ne i} \sum_{n=0}^{\infty} P([X_n = k][T_i(1) > n] \mid [X_0 = i])\, W = \sum_{n=0}^{\infty} \sum_{k \in X,\, k \ne i} P([X_n = k][T_i(1) > n][X_{n+1} = i] \mid [X_0 = i]) = \sum_{n=0}^{\infty} P([X_{n+1} = i][T_i(1) = n+1] \mid [X_0 = i]) = 1.$$
is a stationary distribution.
Proof: By theorem 6, we need only prove that $\sum_{j \in X} \pi_j = 1$. We shall make use of this well-known fact: if $U$ is a nonnegative integer-valued random variable with finite expectation, then
$$E(U) = \sum_{u \ge 0} P([U > u]).$$
Making use of the Lebesgue monotone convergence theorem, we obtain
$$\sum_{j \in X} \mu_j = \sum_{j \in X} E\Big(\sum_{n=0}^{T_i(1)-1} I_{[X_n = j]} \,\Big|\, [X_0 = i]\Big) = E\Big(\sum_{j \in X} \sum_{k=1}^{\infty} \sum_{n=0}^{k-1} I_{[X_n = j][T_i(1) = k]} \,\Big|\, [X_0 = i]\Big) = E\Big(\sum_{n=0}^{\infty} \sum_{k=n+1}^{\infty} \sum_{j \in X} I_{[X_n = j][T_i(1) = k]} \,\Big|\, [X_0 = i]\Big) = E\Big(\sum_{n=0}^{\infty} \sum_{k=n+1}^{\infty} I_{[T_i(1) = k]} \,\Big|\, [X_0 = i]\Big) = \sum_{n=0}^{\infty} P([T_i(1) > n] \mid [X_0 = i]) = E(T_i(1) \mid [X_0 = i]),$$
from which the conclusion follows.
for all positive integers $m$. By the hypothesis of irreducibility, there exists a positive integer $m$ such that $p^m_{ji} > 0$. Thus
$$\nu_i = \sum_{k \in X} \nu_k\, p^m_{ki} \ge \nu_j\, p^m_{ji} > 0,$$
which proves the claim. This claim and proposition 5 imply $0 < \nu_k < \infty$ for all $k \in X$. Now fix the state $i$ that defines the measure $\mu$. By theorem 6, $\mu_i = 1$. Since the product of any positive constant with $\nu$ is still an invariant measure, we may assume without loss of generality that $\nu_i = 1$.
Claim 2. $\mu_j \le \nu_j$ for all $j \in X$. Indeed, let ${}^i P = ({}^i p_{rs})$ denote $P$ with the column determined by state $i$ replaced by zeros. We first verify that
$$\nu_j = \delta_{ij} + \sum_{k \in X} \nu_k\, {}^i p_{kj}.$$
Then by lemma 9, ${}^i p^n_{ij} = P([X_n = j][T_i(1) > n] \mid [X_0 = i])$. So $\mu_i = 1$ and, for $j \ne i$, $\mu_j = \sum_{n=0}^{\infty} {}^i p^n_{ij}$. So denote $\delta_i$ as the vertical vector with all zeros except for a 1 in the $i$th place. Then we may write
$$\nu^t = (\nu_1, \nu_2, \ldots) = \Big(\delta_{i1} + \sum_{k \in X} \nu_k\, {}^i p_{k1},\ \delta_{i2} + \sum_{k \in X} \nu_k\, {}^i p_{k2},\ \ldots\Big) = \delta_i^t + \nu^t\, {}^i P.$$
Since we established above that $\mu_i = 1$ and $\mu_j = \sum_{n=0}^{\infty} {}^i p^n_{ij}$ for $j \ne i$, we obtain from what has been established up to this point that $\nu_j \ge \mu_j$ for all $j \ne i$; and in addition $\nu_i = \mu_i = 1$. This verifies claim 2. But from claim 2, $\nu - \mu$ is nonnegative and is also an invariant measure. If, for any $j \in X$, $\nu_j - \mu_j > 0$, then by claim 1, $\nu_k - \mu_k > 0$ for all states $k$. But this contradicts the fact that $\nu_i - \mu_i = 0$. Hence the two invariant measures are identical. Q.E.D.
the state space is finite, so $\sum_{j \in X} \mu_j < \infty$. Thus
$$\infty > \sum_{j \in X} E\Big(\sum_{n=0}^{T_i(1)-1} I_{[X_n = j]} \,\Big|\, [X_0 = i]\Big) = E\Big(\sum_{n=0}^{T_i(1)-1} \sum_{j \in X} I_{[X_n = j]} \,\Big|\, [X_0 = i]\Big) = E\Big(\sum_{n=0}^{T_i(1)-1} 1 \,\Big|\, [X_0 = i]\Big) = E(T_i(1) \mid [X_0 = i]).$$
Then $\pi_j$ defined by
$$\pi_j = \frac{\mu_j}{E(T_i(1) \mid [X_0 = i])} = \frac{E\Big(\sum_{n=0}^{T_i(1)-1} I_{[X_n = j]} \,\Big|\, [X_0 = i]\Big)}{E(T_i(1) \mid [X_0 = i])}$$
$$a(s(i), s(j)) = \min\Big\{1, \frac{\pi(s(j))\, p_{s(j)s(i)}}{\pi(s(i))\, p_{s(i)s(j)}}\Big\}.$$
If $[U < a(s(i), s(j))]$ occurs, then decide that $[X_1 = s(j)]$ has occurred; otherwise, decide that $[X_1 = s(i)]$ has occurred. Repeat all of the above for $X_2, X_3, \ldots$.
(i) Display the stochastic matrix for this altered Markov chain.
(ii) Compute $P([X_0 = s(i)])$ for $0 \le i \le 5$, and determine whether $P([X_0 = s(i)]) = P([X_1 = s(i)])$ for $0 \le i \le 5$.
(iii) Verify that this Markov chain is irreducible, is aperiodic, has a finite state space and is positive recurrent.
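Parts (i)-(iii) can be sketched in Python, assuming the acceptance rule $a(s(i), s(j))$ displayed above; the checks at the end confirm reversibility with respect to $\pi$, and hence that $\pi$ is stationary:

```python
# The altered chain: propose from the +/-1 walk of the earlier exercise
# and accept with a(s(i), s(j)) = min(1, pi_j p_ji / (pi_i p_ij)).
import numpy as np
from math import comb

pi = np.array([comb(5, i) * comb(11, 6 - i) / comb(16, 6) for i in range(6)])

K = np.zeros((6, 6))                 # unadjusted proposal chain
for i in range(6):
    for j in (i - 1, i + 1):
        if 0 <= j <= 5:
            K[i, j] = 0.5
        else:
            K[i, i] += 0.5

M = np.zeros((6, 6))                 # the altered (accept/reject) chain
for i in range(6):
    for j in range(6):
        if i != j and K[i, j] > 0:
            M[i, j] = K[i, j] * min(1.0, pi[j] * K[j, i] / (pi[i] * K[i, j]))
    M[i, i] = 1.0 - M[i].sum()       # holding probability

print(np.round(M, 4))                                      # part (i)
assert np.allclose(pi[:, None] * M, (pi[:, None] * M).T)   # reversible
assert np.allclose(pi @ M, pi)                             # pi stationary
```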
Proof: Suppose the conclusion is not true. Then there exists an $\varepsilon > 0$ such that
$$P([X_n \ge \varepsilon \text{ infinitely often}]) > 0.$$
Let $\omega \in [X_n \ge \varepsilon \text{ infinitely often}]$; then $\sum_{n=0}^{\infty} X_n(\omega) = \infty$ over this set of positive probability, which contradicts the hypothesis. Q.E.D.
$$\{k : x_k \le n\} \subset \{k : x_k \le n+1\}.$$
Thus it is sufficient to prove that $\{r_n\}$ is unbounded. If, to the contrary, $\{r_n\}$ is bounded, then there is a minimum value of $n$, call it $n_0$, and a maximum value of $k$, call it $k_0$, such that $r_n = r_{n_0}$ for all $n > n_0$ and such that $x_k \le x_{n_0}$ for all $k > k_0$. This contradicts the hypothesis that $x_n \to \infty$ as $n \to \infty$.
3. Theorem (Strong Law of Large Numbers). If $\{X_n\}$ is a sequence of independent, identically distributed random variables, and if $\bar{X}_n = (X_1 + \cdots + X_n)/n$, then $P([\bar{X}_n \text{ converges}]) = 1$ if and only if $E|X_1| < \infty$, in which case $P([\bar{X}_n \to E(X_1) \text{ as } n \to \infty]) = 1$.
Theorem 3, due to A. N. Kolmogorov, is a classical theorem in measure-
theoretic probability.
At this point we should note that in this last sum, $f(X_0) = f(i)$, since everything takes place over the event $[X_0 = i]$, and when we take $n = k$, since we are then over the event $[T_i(1) = k]$, then also $f(X_n) = f(i)$, so we can start the innermost sum at $n = 1$ and end it at $k$. Thus
$$\sum_{j \in X} f(j)\, \pi_j = \frac{E\Big(\sum_{k=1}^{\infty} I_{[T_i(1) = k]} \sum_{n=1}^{k} f(X_n) \,\Big|\, [X_0 = i]\Big)}{E(T_i(1) \mid [X_0 = i])} = \frac{E\Big(\sum_{n=1}^{T_i(1)} f(X_n) \,\Big|\, [X_0 = i]\Big)}{E(T_i(1) \mid [X_0 = i])},$$
$$P\Big(\Big[\lim_N \frac{1}{N} \sum_{n=0}^{N} f(X_n) \text{ exists and } = \sum_{i \in X} f(i)\, \pi_i\Big] \,\Big|\, [X_0 = i]\Big) = 1$$
for every $i \in X$, and
$$P\Big(\Big[\lim_N \frac{1}{N} \sum_{n=0}^{N} f(X_n) \text{ exists and } = \sum_{i \in X} f(i)\, \pi_i\Big]\Big) = 1.$$
$$B_i(N) = \max\{k \ge 0 : T_i(k) \le N\}.$$
So now we may apply the classical strong law of large numbers of Kolmogorov to obtain
$$\lim_{m \to \infty} \frac{1}{m} \sum_{k=1}^{m} \sum_{n=T_i(k)+1}^{T_i(k+1)} f(X_n) = E\Big(\sum_{n=1}^{T_i(1)} f(X_n) \,\Big|\, [X_0 = i]\Big) \quad \text{a.e. } [P(\cdot \mid [X_0 = i])].$$
Notice that the random variable $B_i(N)$ defined above is the largest value of $k$ such that $T_i(k) \le N$; then $T_i(B_i(N)) \le N$, and for any random variable strictly greater than $B_i(N)$, $T_i$ evaluated at that random variable is $> N$. This means that
Let us look at the sum at the left of this last string of inequalities; this term can be written as
$$\sum_{n=0}^{T_i(B_i(N))} f(X_n) = \sum_{n=0}^{T_i(1)} f(X_n) + \sum_{k=1}^{B_i(N)-1} Y_k = J + \sum_{k=1}^{B_i(N)-1} Y_k,$$
where $Y_k = \sum_{n=T_i(k)+1}^{T_i(k+1)} f(X_n)$.
or
$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N} f(X_n) \ge \sum_{j \in X} f(j)\, \pi_j \quad \text{a.e. } [P(\cdot \mid [X_0 = i])],$$
to obtain
$$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N} f(X_n) \le \sum_{j \in X} f(j)\, \pi_j \quad \text{a.e. } [P(\cdot \mid [X_0 = i])].$$
Now let
$$A = \Big[\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N} f(X_n) \text{ exists and } = \sum_{j \in X} f(j)\, \pi_j\Big].$$
3. Prove that since $B_i(N) \uparrow \infty$ with $P(\cdot \mid [X_0 = i])$-probability 1 as $N \to \infty$, then
$$\frac{T_i(B_i(N))}{B_i(N)} \to E(T_i(1) \mid [X_0 = i]) \quad \text{a.e. } [P(\cdot \mid [X_0 = i])] \text{ as } N \to \infty,$$
and
$$P\Big(\bigcap_{r=0}^{k} [X_{n+r} = i_r]\Big) = P([X_n = i_0]) \prod_{r=1}^{k} p_{i_{r-1} i_r},$$
from which the conclusion follows.
$$MK(i, j) = \begin{cases} K(i, j)\, \alpha(i, j) & \text{if } i \ne j, \\ 1 - \sum_{k \in X \setminus \{i\}} K(i, k)\, \alpha(i, k) & \text{if } i = j, \end{cases}$$
where
$$\alpha(i, j) = \min\Big\{1, \frac{\pi(j) K(j, i)}{\pi(i) K(i, j)}\Big\},$$
then the Markov chain determined by $\pi$, $MK$ and $X$ is a reversible (and thus stationary), irreducible, aperiodic Markov chain.
Proof: It is sufficient to prove, for every pair of distinct states $i$ and $j$,
$$\pi(i)\, MK(i, j) = \pi(j)\, MK(j, i).$$
Indeed, if $\pi(j) K(j, i) = \pi(i) K(i, j)$ for all pairs $i, j$, there is no change needed to obtain the conclusion. But if there exist states $i \ne j$ such that $\pi(j) K(j, i) < \pi(i) K(i, j)$, then $\alpha(i, j) = \frac{\pi(j) K(j, i)}{\pi(i) K(i, j)}$ and $\alpha(j, i) = 1$, so that
$$\pi(i)\, MK(i, j) = \pi(i) K(i, j)\, \frac{\pi(j) K(j, i)}{\pi(i) K(i, j)} = \pi(j) K(j, i) = \pi(j)\, MK(j, i),$$
which establishes reversibility. Since $0 < \alpha(i, j) \le 1$, then $K(i, j) > 0$ if and only if $MK(i, j) > 0$, so irreducibility and aperiodicity are preserved. Q.E.D.
then $\{Y_n\}$ is a Markov chain that is irreducible, reversible and aperiodic.
Proof: Since $\{X_n\}$ is reversible, we know that
$$\pi(i)\, p_{ij} = \pi(j)\, p_{ji}.$$
is, say, $X_1 = x_1$, one may simulate an observation on $X$ according to the probability distribution $K(x_1, \cdot)$, and call the outcome $X_2$. In this way one can generate an observable Markov chain $\{X_n\}$ that is reversible, irreducible and aperiodic. After a very large number $n$ of observations, the distribution of our simulated observation on $X$ will be, if our hypothesis is correct, essentially the distribution $\pi$; this follows from our first big limit theorem. So at some time after taking a large number $N$ of observations on the Markov chain, one may begin observing $I_{[\varphi(X_n) \ge \varphi(x)]}$ for $n \ge N$. Keeping track of the number of 1's for the next $M$ simulations yields the quantity
$$S_M = \sum_{n=N}^{N+M} I_{[\varphi(X_n) \ge \varphi(x)]}.$$
By the second big limit theorem, known as the ergodic theorem, the ratio $\frac{1}{M} S_M$ converges with probability one as $M \to \infty$ to $\pi(\{z \in X : \varphi(z) \ge \varphi(x)\})$.
Next suppose that the only stochastic matrix that one can come up with that is indexed by $X$ and that has a relatively small number of positive entries in each row is such that the Markov chain determined by $\{\pi, K\}$ is irreducible and reversible (and therefore stationary) but not aperiodic. In this case, for every pair of states $i \ne j$, define a new Markov chain $\{\pi, AK\}$ by $AK(i, j) = 0.9\, K(i, j)$, and for $j = i$, let
$$AK(i, i) = 1 - \sum \{AK(i, j) : j \ne i\}.$$
Then by theorem 4, the Markov chain determined by $\{\pi, MK\}$ is reversible and therefore stationary, it is still irreducible, and by proposition 7, it is aperiodic. The procedure in this case and in the previous case is an application of what is referred to in the literature as the Hastings-Metropolis procedure.
When the Hastings-Metropolis procedure is applied, the stochastic matrix $MK$ enjoys a certain optimality property, which we now develop.
Let $X$ denote a finite state space, and let $\pi$ denote a probability measure over $X$ such that $\pi(x) > 0$ for all $x \in X$. Let $S(X)$ denote the set of all stochastic matrices indexed by $X$ such that for each $K \in S(X)$, the Markov chain determined by $\{\pi, K\}$ is irreducible and aperiodic. Let $R(\pi) \subset S(X)$ be defined as the set of all $K \in S(X)$ such that the Markov chain determined by $\{\pi, K\}$ is reversible. Finally, over $S(X)$ we create a metric $d(K, K')$ defined by
$$d(K, K') = \sum \{\pi(x)|K(x, y) - K'(x, y)| : x \in X,\, y \in X,\, x \ne y\} = \sum_{x \in X} \pi(x) \sum_{\{y : y \ne x\}} |K(x, y) - K'(x, y)|.$$
It should be noted that $d(\cdot, \cdot)$ satisfies the properties of a metric over $S(X)$. Next we define the mapping $M : S(X) \to R(\pi)$ by
$$MK(x, y) = \begin{cases} K(x, y) \wedge \dfrac{\pi(y)}{\pi(x)}\, K(y, x) & \text{if } x \ne y, \\ 1 - \sum_{\{z : z \ne x\}} MK(x, z) & \text{if } x = y. \end{cases}$$
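A sketch of the mapping $M$ and the metric $d$ in code (the matrices below are made-up illustrations): $M$ sends a proposal matrix to a $\pi$-reversible one, already-reversible matrices are fixed points, and $d(K, MK)$ can be computed directly:

```python
# The mapping M and the metric d over stochastic matrices.
import numpy as np

pi = np.array([0.2, 0.3, 0.5])

def metric_d(pi, K1, K2):
    off = ~np.eye(len(pi), dtype=bool)       # only off-diagonal terms count
    return np.sum(pi[:, None] * np.abs(K1 - K2) * off)

def M(pi, K):
    # off-diagonal: MK(x,y) = K(x,y) ^ (pi(y)/pi(x)) K(y,x)
    MK = np.minimum(K, (pi[None, :] / pi[:, None]) * K.T)
    np.fill_diagonal(MK, 0.0)
    np.fill_diagonal(MK, 1.0 - MK.sum(axis=1))   # rows sum to one
    return MK

K = np.array([[0.1, 0.6, 0.3],
              [0.3, 0.4, 0.3],
              [0.5, 0.2, 0.3]])
MK = M(pi, K)
assert np.allclose(MK.sum(axis=1), 1.0)
# reversibility: pi(x) MK(x,y) == pi(y) MK(y,x)
assert np.allclose(pi[:, None] * MK, (pi[:, None] * MK).T)
# reversible matrices are fixed points of M
assert np.allclose(M(pi, MK), MK)
print(metric_d(pi, K, MK))
```

Claim 3 below says this distance $d(K, MK)$ is no larger than $d(K, N)$ for any reversible $N$.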
We shall break up the proof into a sequence of claims and subclaims.
Claim 1: If $K \in S(X)$, then $K \in H^+_{xy}$ if and only if $K \in H^-_{yx}$. This claim follows directly from the definitions.
Now let $N \in R(\pi)$ be arbitrary. Then by the definition of $R(\pi)$,
$$\pi(x) N(x, y) = \pi(y) N(y, x)$$
for all $x, y$ in $X$.
Claim 2: For all $N \in R(\pi)$ and all $K \in S(X) \setminus R(\pi)$,
$$d(K, N) \ge \sum \{\pi(x)|K(x, y) - N(x, y)| + \pi(y)|K(y, x) - N(y, x)|\},$$
where the sum is taken over only those pairs of elements $x, y$ in $X$ that satisfy $x \ne y$ and $K \in H^+_{xy}$. To prove this we break up $d(K, N)$ into a sum of three sums:
$$d(K, N) = \sum_{x \in X} \pi(x) \sum_{\{y : y \ne x,\, K \in H^+_{xy}\}} |K(x, y) - N(x, y)| + \sum_{x \in X} \pi(x) \sum_{\{y : y \ne x,\, K \in H^-_{xy}\}} |K(x, y) - N(x, y)| + \sum_{x \in X} \pi(x) \sum_{\{y : y \ne x,\, K \in H^0_{xy}\}} |K(x, y) - N(x, y)|$$
$$\ge \sum_{x \in X} \pi(x) \sum_{\{y : y \ne x,\, K \in H^+_{xy}\}} |K(x, y) - N(x, y)| + \sum_{x \in X} \pi(x) \sum_{\{y : y \ne x,\, K \in H^-_{xy}\}} |K(x, y) - N(x, y)|.$$
By claim 1, this last sum above is
$$\sum_{x \in X} \pi(x) \sum_{\{y : y \ne x,\, K \in H^+_{yx}\}} |K(x, y) - N(x, y)|,$$
and now replace in it each $x$ by a $y$ and each $y$ by an $x$ to obtain
$$\sum_{y \in X} \pi(y) \sum_{\{x : x \ne y,\, K \in H^+_{xy}\}} |K(y, x) - N(y, x)|,$$
which proves the claim.
Claim 3: $d(K, MK) \le d(K, N)$. To prove this, we first note that since $K(x, y) \ge MK(x, y)$ for $x \ne y$, we obtain
$$d(K, MK) = \sum_{x \ne y} \{\pi(x)|K(x, y) - MK(x, y)|\} = \sum_{x \ne y} \{\pi(x)K(x, y) - \pi(x)MK(x, y)\} = \sum_{x \ne y} \Big\{\pi(x)K(x, y) - \pi(x)\Big(K(x, y) \wedge \frac{\pi(y)}{\pi(x)} K(y, x)\Big)\Big\}.$$
Then, by the definitions of $H^+_{xy}$, $H^-_{xy}$ and $H^0_{xy}$ (and for this fixed $K$), we continue the above to obtain
$$d(K, MK) = \sum_{\{x \ne y : K \in H^+_{xy}\}} \{\pi(x)K(x, y) - \pi(y)K(y, x)\} + \sum_{\{x \ne y : K \in H^-_{xy}\}} \{\pi(x)K(x, y) - \pi(x)K(x, y)\} + \sum_{\{x \ne y : K \in H^0_{xy}\}} \{\pi(x)K(x, y) - \pi(x)K(x, y)\} = \sum_{\{x \ne y : K \in H^+_{xy}\}} \{\pi(x)K(x, y) - \pi(y)K(y, x)\}.$$
Since $N$ satisfies $\pi(x)N(x, y) = \pi(y)N(y, x)$, the last sum above is equal to
$$\sum_{x \ne y,\, K \in H^+_{xy}} \{\pi(x)(K(x, y) - N(x, y)) + \pi(y)(N(y, x) - K(y, x))\},$$
which is
$$\le \sum_{x \ne y,\, K \in H^+_{xy}} \{\pi(x)|K(x, y) - N(x, y)| + \pi(y)|N(y, x) - K(y, x)|\}$$
Now for those $x \ne y$ for which $K \in H^+_{xy}$, the expression $\pi(y)K(y, x) - \pi(x)K(x, y)$ is negative, while $\pi(x)\,\Delta(x, y)$, with $\Delta(x, y)$ as in claim 4, is nonnegative, with at least one term strictly positive by claim 4. These two observations imply
$$d(K, N) > \sum_{\{x \ne y : K \in H^+_{xy}\}} \pi(x)|\Delta(x, y)| + \sum_{\{x \ne y : K \in H^+_{xy}\}} |\pi(y)K(y, x) - \pi(x)K(x, y)| \ge \sum_{\{x \ne y : K \in H^+_{xy}\}} (\pi(x)K(x, y) - \pi(y)K(y, x)).$$
Consider a random vector (U, V)ᵀ which takes as its values the six pairs

(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3).

Suppose we desire to test the null hypothesis that U and V are independent,
that is, for some unknown positive probabilities p1, p2, q1, q2, q3, the joint
density of U, V satisfies

P([U = i][V = j]) = pi qj, i = 1, 2; j = 1, 2, 3.
So we take 10 independent observations on (U, V)ᵀ, and obtain results that
can be arranged in a table with two rows and three columns, namely,

X11 X12 X13
X21 X22 X23,

where Xij = the number of observations which take U-values in row i and
V-values in column j. Thus

Σ_{i=1}^{2} Σ_{j=1}^{3} Xij = Σ_{j=1}^{3} Σ_{i=1}^{2} Xij = 10.
1. Verify that the total number of tables produced in this way is 6¹⁰.
2. Now suppose that when we take these ten observations, we obtain the
following table:

2 2 1
1 1 3.

The row totals are 5 and 5, and the column totals are 3, 3, 4 respectively.
3. Let us denote the row totals for any of the 6¹⁰ outcomes by X1·, X2·
and the column totals by X·1, X·2, X·3. Compute

P([X11 = i][X12 = j] | [X1· = 5][X·1 = 3][X·2 = 3])

for all (i, j)'s that do not violate the conditioning event.
4. Note that in problem 3, this conditional joint distribution of X11 and
X12 does not depend on the unknown parameters p1, q1, q2. So now find the
conditional density of χ² given [X1· = 5][X·1 = 3][X·2 = 3], where

χ² = Σ_{i=1}^{2} Σ_{j=1}^{3} (Xij − Xi·X·j/10)² / (Xi·X·j/10).
If this probability is too small, we would reject the null hypothesis that the
selections of the rows and the columns are independent. In this problem, this
computation could be easily carried out. However, for more rows and/or
columns and for larger sample sizes, this is an impossible task. But this
probability can be simulated, and this is by means of the Hastings-Metropolis
algorithm, which you are now asked to carry out in a sequence of steps
outlined in problems 7, 8, 9 and 10.
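For the observed table of problem 2, the value χ²₀ of this statistic can be computed directly. A short sketch (Python; it assumes, per the formula in problem 4, that the expected count for cell (i, j) is Xi·X·j/10):

```python
# Observed table from problem 2; row sums (5, 5), column sums (3, 3, 4).
X = [[2, 2, 1],
     [1, 1, 3]]
n = 10
row = [sum(r) for r in X]                    # row totals: 5, 5
col = [X[0][j] + X[1][j] for j in range(3)]  # column totals: 3, 3, 4

chi2 = sum((X[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in range(2) for j in range(3))
print(chi2)  # 5/3, approximately 1.667
```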
7. First note that the conditional density computed in problem 3 has
14 points in its domain. Thus we have a state space of 14 states. In brief,
we would like to have at our disposal a Markov chain with this 14-state
state space and a transition matrix for which this conditional density is the
invariant measure. Having these, we could then simulate the Markov chain,
and for a large number of observations we could find the relative frequency
with which the event [χ² ≥ χ²₀] occurs. By the ergodic theorem for Markov
chains, this relative frequency converges to P([χ² ≥ χ²₀]) with probability
1. So we need a stochastic matrix, which we construct and simulate as
follows. Whichever state the Markov chain is at, select two columns in the
table at random without replacement. These two columns and the two rows
determine a 2 × 2 table or matrix. Then select one of the two matrices

 1 −1        −1  1
−1  1   or    1 −1

at random with probabilities 1/2, 1/2. If you can add whatever you get to the
two columns selected at random without replacement and not obtain any
negative integers, then do so, and call the resulting 2 × 3 table the new state
of the chain. On the other hand, if you do get a negative entry, then the next
state is the same as the present one. Thus a Markov chain is created whose
state space is the set of all 2 × 3 tables of nonnegative integers whose row
sums are 5 and 5 and whose column sums are 3, 3 and 4. Display the
transition matrix, P = (p(i, j)), indexing the rows and columns in the same
manner as the states are indexed for the invariant probability measure, i.e.,
the conditional density computed in problem 3.
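The state space and the moves described in problem 7 can be sketched as follows (Python; the enumeration is ours, and it confirms that the margins (5, 5) and (3, 3, 4) admit exactly 14 such tables):

```python
import random
from itertools import product

# All 2x3 tables of nonnegative integers with row sums (5, 5) and
# column sums (3, 3, 4): the first row (a, b, c) determines the second.
states = [((a, b, 5 - a - b), (3 - a, 3 - b, a + b - 1))
          for a, b in product(range(4), range(4)) if 1 <= a + b <= 5]
print(len(states))  # 14

def step(table):
    """One transition of the problem-7 chain."""
    t = [list(table[0]), list(table[1])]
    j, k = random.sample(range(3), 2)   # two columns, without replacement
    s = random.choice([1, -1])          # which of the two +/-1 patterns
    t[0][j] += s; t[0][k] -= s
    t[1][j] -= s; t[1][k] += s
    if min(t[0] + t[1]) < 0:            # a negative entry would appear:
        return table                    # next state = present state
    return (tuple(t[0]), tuple(t[1]))

x = states[0]
for _ in range(2000):                   # the moves preserve all margins,
    x = step(x)                         # so the chain stays in `states`
    assert x in states
```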
8. With respect to the Markov chain created in problem 7, verify that this
chain is irreducible and aperiodic.
9. If you let π denote the desired invariant probability measure (as a
row vector), and if the rows of P are indexed in the same order as the
coordinates of π, you should be able to verify that the equation π = πP
(matrix multiplication) is not satisfied. Thus the stochastic matrix P is not
the one we need for a stationary Markov chain.
10. We now know how to alter P to get a new stochastic matrix Q = (q(i, j))
which, together with the initial distribution

π = (π(1), …, π(14)),

will determine a stationary Markov chain. So define

q(i, j) = p(i, j)α(i, j)                      if i ≠ j
        = 1 − Σ{p(i, k)α(i, k) : k ≠ i}       if j = i,

where

α(i, j) = min{1, π(j)p(j, i)/(π(i)p(i, j))}
for all those ordered pairs of states (i, j) for which p(i, j) > 0. Now simulate
this chain, (π, Q), say, 25,000 times, starting from any state you choose, and
then at each simulation for the next 25,000 observations compute the relative
frequency with which the event [χ² ≥ χ²₀] occurs. This relative frequency
should be very close to the conditional probability that was obtained in
problem 3. (Epilogue: Why did we simulate the chain 25,000 times before we
started simulating P([χ² ≥ χ²₀])? Because one wants the portion of the run
in which an estimate of P([χ² ≥ χ²₀]) is computed to take place "according
to the invariant measure π," and the first 25,000 steps give the chain time to
get close to that invariant measure.)
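Putting problems 7 through 10 together, here is one possible end-to-end sketch (Python). Two ingredients in it are ours rather than the text's: we use the closed form 5!5!3!3!4!/(10! ∏ Xij!) for the conditional density of a table given its margins (this is what problem 3 produces, and it is free of the pi and qj), and, since the problem-7 proposal is symmetric (p(i, j) = p(j, i) for i ≠ j), the acceptance probability α(i, j) reduces to min{1, π(j)/π(i)}.

```python
import random
from itertools import product
from math import factorial

# State space: 2x3 tables with row sums (5, 5) and column sums (3, 3, 4).
states = [((a, b, 5 - a - b), (3 - a, 3 - b, a + b - 1))
          for a, b in product(range(4), range(4)) if 1 <= a + b <= 5]

def target(t):
    """Conditional density of a table given its margins (problem 3):
    prod(row sums!) * prod(col sums!) / (10! * prod(cell entries!))."""
    num = factorial(5) ** 2 * factorial(3) ** 2 * factorial(4)
    den = factorial(10)
    for x in t[0] + t[1]:
        den *= factorial(x)
    return num / den

def chi2(t):
    """The statistic of problem 4, with expected counts row*col/10."""
    row = [sum(r) for r in t]
    col = [t[0][j] + t[1][j] for j in range(3)]
    return sum((t[i][j] - row[i] * col[j] / 10) ** 2 / (row[i] * col[j] / 10)
               for i in range(2) for j in range(3))

def mh_step(t):
    """Propose a problem-7 move; accept with prob min(1, pi(new)/pi(old))."""
    m = [list(t[0]), list(t[1])]
    j, k = random.sample(range(3), 2)
    s = random.choice([1, -1])
    m[0][j] += s; m[0][k] -= s
    m[1][j] -= s; m[1][k] += s
    if min(m[0] + m[1]) < 0:
        return t                       # off the state space: stay put
    u = (tuple(m[0]), tuple(m[1]))
    if random.random() < min(1.0, target(u) / target(t)):
        return u
    return t

random.seed(1)
x = ((2, 2, 1), (1, 1, 3))             # the observed table; start here
c0 = chi2(x)                           # the observed value of chi-squared
for _ in range(25_000):                # burn-in, as in problem 10
    x = mh_step(x)
hits = 0
for _ in range(25_000):
    x = mh_step(x)
    hits += (chi2(x) >= c0)
estimate = hits / 25_000
exact = sum(target(t) for t in states if chi2(t) >= c0)
print(estimate, exact)                 # the two should be close
```

Because the state space here is small, the exact conditional probability can be obtained by summing the conditional density over the tables with χ² ≥ χ²₀, which gives a direct check on the simulated relative frequency.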