Lectures on Stochastic Stability

Sergey FOSS
Heriot-Watt University
This mini-course presents an overview of stochastic stability methods, mostly motivated by (but not limited to) stochastic network applications. We work with stochastic recursive sequences, and, in particular, Markov chains, in a general Polish state space. We discuss and compare methods based on (i) Lyapunov functions and fluid limits, (ii) explicit coupling (renovating events and Harris chains), (iii) monotonicity, and some others. We also discuss instability methods and perfect simulation methods.

Lectures are based on handouts of my lecture notes (Colorado State University, 1996; Novosibirsk State University, 1997-2000; Kazakh National University, 2007), on the joint overview paper with Takis Konstantopoulos (2004), on notes written by us for a Short LMS/EPSRC Course for PhD students (September 2006), and on some (more-or-less) recent publications.
Table of Topics
1. Introduction.
2. Lyapunov techniques. Criteria for Positive Recurrence and for Instability.
3. Fluid Approximation Approach.
4. Coupling and Harris Chains.
5. Monotonicity and Saturation Rule.
6. Renovation Theory, Perfect Simulation.
7. Some intriguing open problems.
1 Lecture 1. Basic Tools.
1.1 Notation, Acronyms, and Basic Concepts
R.v. = random variable
i.i.d. = independent identically distributed
$X, Y, Z, \xi, \eta, \zeta, \ldots$ are used for r.v.'s
$F, G$ denote distribution functions, $f$ a density function
$\mathbf{P}$ denotes probability (and a probability measure), $\mathbf{E}$ expectation, $\mathbf{D}$ variance
$\xi \sim F$ means $\mathbf{P}(\xi \le x) = F(x)$ for all $x$
$\xi \sim P$ means $\mathbf{P}(\xi \in B) = P(B)$ for all $B \in \mathcal{B}$.
$I(A)$, or $1(A)$, is the indicator function of the event $A$: $I(A) = 1$ if $A$ occurs, and $I(A) = 0$ otherwise.
Here are standard families of distributions: $U[a,b]$ (uniform), $G(p)$ (geometric), $E(\lambda)$ (exponential), $B(m,p)$ (binomial), $N(a, \sigma^2)$ (normal), $\Pi(\lambda)$ (Poisson).
Convergence:
$\xi_n \stackrel{a.s.}{\to} \xi$ means $\mathbf{P}(\lim_n \xi_n = \xi) = 1$, or, equivalently: $\forall \varepsilon > 0$, $\mathbf{P}(\sup_{m \ge n} |\xi_m - \xi| > \varepsilon) \to 0$ as $n \to \infty$.
$\xi_n \stackrel{p}{\to} \xi$ means: $\forall \varepsilon > 0$, $\mathbf{P}(|\xi_n - \xi| > \varepsilon) \to 0$ as $n \to \infty$.
The same for random vectors.
Key Properties of Convergence. Let $\to$ mean either $\stackrel{a.s.}{\to}$ or $\stackrel{p}{\to}$.
(1) If $\xi_n \to \xi$ and $\eta_n \to \eta$, then $(\xi_n, \eta_n) \to (\xi, \eta)$.
(2) If $\xi_n \to \xi$ and if $g$ is a continuous function, then $g(\xi_n) \to g(\xi)$.
(3) More generally, assume that $g$ is not continuous everywhere and denote by $D_g$ the set of its discontinuity points. If $\xi_n \to \xi$ and if $\mathbf{P}(\xi \in D_g) = 0$, then $g(\xi_n) \to g(\xi)$.
Weak convergence of distribution functions: $F_n \Rightarrow F$ if, for each $x$ such that $F$ is continuous at $x$,
$$F_n(x) \to F(x).$$
Equivalent form: $F_n \Rightarrow F$ if, for any bounded and continuous function $g$,
$$\int g(x) \, dF_n(x) \to \int g(x) \, dF(x).$$
Comment on terminology: "weak convergence" is the most common term. Other terms are "convergence of/in distribution(s)" and "convergence in law".
Weak convergence of random variables: $\xi_n \Rightarrow \xi$. It means: $\xi_n \sim F_n$, $\xi \sim F$, and $F_n \Rightarrow F$.
Note that $\xi_n \Rightarrow \xi$ is just a convenient notation! There need be no actual convergence of the random variables on sample paths.
Relations between convergence types:
$$\xi_n \stackrel{a.s.}{\to} \xi \ \text{ implies } \ \xi_n \stackrel{p}{\to} \xi, \quad \text{and} \quad \xi_n \stackrel{p}{\to} \xi \ \text{ implies } \ \xi_n \Rightarrow \xi.$$
Both converse statements are incorrect. Here are two examples:
Example 1. Weak convergence does not imply convergence in probability. Let $\mathbf{P}(\xi_1 = 1) = \mathbf{P}(\xi_1 = -1) = 1/2$ and $\xi_{n+1} = -\xi_n$, $n = 1, 2, \ldots$.
Example 2. Convergence in probability does not imply a.s. convergence. Let $(\Omega, \mathcal{F}, \mathbf{P}) = ((0,1], \mathcal{B}_{(0,1]}, \lambda)$ where $\lambda$ is the Lebesgue measure. Let $\xi_0 \equiv 1$. Let, for $m = 1, 2, \ldots$, for $n$ such that $1 + 2 + \ldots + 2^{m-1} < n \le 1 + 2 + \ldots + 2^{m-1} + 2^m$, and for $i = n - (1 + 2 + \ldots + 2^{m-1})$,
$$\xi_n(\omega) = 1 \ \text{ if } \ \omega \in \big( (i-1)/2^m, \, i/2^m \big], \quad \text{and } \ \xi_n(\omega) = 0 \ \text{ otherwise}.$$
Then $\mathbf{P}(\xi_n \ne 0) = 2^{-m} \to 0$, so $\xi_n \stackrel{p}{\to} 0$; but, for every $\omega$, $\xi_n(\omega) = 1$ for infinitely many $n$, so $\xi_n$ does not converge a.s.
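A short numerical sketch of Example 2 (my own illustration; the bookkeeping uses the convention that block $m \ge 0$ occupies $2^m - 1 < n \le 2^{m+1} - 1$, since $1 + 2 + \ldots + 2^{m-1} = 2^m - 1$):

```python
import numpy as np

def xi(n, omega):
    """The 'typewriter' sequence of Example 2, evaluated at omega in (0, 1]."""
    # Find block m with 2^m - 1 < n <= 2^(m+1) - 1.
    m = 0
    while 2 ** (m + 1) - 1 < n:
        m += 1
    i = n - (2 ** m - 1)  # position inside block m, i = 1, ..., 2^m
    return 1.0 if (i - 1) / 2 ** m < omega <= i / 2 ** m else 0.0

omegas = np.random.uniform(0, 1, size=10_000)
for n in [10, 100, 1000]:
    # P(xi_n != 0) = 2^{-m} -> 0: convergence in probability to 0 ...
    print(n, np.mean([xi(n, w) for w in omegas]))
# ... yet for each fixed omega, xi_n(omega) = 1 once per block, i.e. infinitely often:
hits = [n for n in range(1, 2000) if xi(n, 0.3) == 1.0]
print("indices n with xi_n(0.3) = 1:", hits[:8], "...")
```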
Laws of Large Numbers.
If $\xi, \xi_1, \xi_2, \ldots$ are i.i.d. random variables with a finite mean, say $a = \mathbf{E}\xi$, and $S_n = \xi_1 + \ldots + \xi_n$, then the Weak Law of Large Numbers (WLLN) says:
$$S_n / n \stackrel{p}{\to} a \quad \text{as } n \to \infty,$$
and the Strong Law of Large Numbers (SLLN) says:
$$S_n / n \stackrel{a.s.}{\to} a \quad \text{as } n \to \infty.$$
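A minimal numerical illustration of the SLLN along one sample path (a sketch; the exponential step distribution with mean $a = 2$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.exponential(scale=2.0, size=1_000_000)  # i.i.d. with mean a = 2
partial_means = np.cumsum(xs) / np.arange(1, len(xs) + 1)  # S_n / n
for n in [10, 1000, 100_000, 1_000_000]:
    print(n, partial_means[n - 1])  # approaches a = 2 along the path
```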
Lebesgue and Beppo Levi Theorems.
Theorem (Beppo Levi). If $\{\xi_n\}$ is an a.s. non-negative and non-decreasing sequence of random variables, then
$$\mathbf{E} \lim_n \xi_n = \lim_n \mathbf{E} \xi_n,$$
where both sides are either finite or infinite simultaneously.
Coupling.
$\xi'$ is a copy of $\xi$ $\iff$ they have the same distribution $\iff$ $\xi' \stackrel{D}{=} \xi$. In general, $\xi'$ and $\xi$ may be defined on different probability spaces.
(a) Coupling of distribution functions (d.f.) or of probability measures.
For two d.f.'s $F_1$ and $F_2$, their coupling is a construction of a two-variate distribution function $F(x_1, x_2)$ such that $F(x_1, \infty) = F_1(x_1)$ and $F(\infty, x_2) = F_2(x_2)$.
Similarly, for two probability measures $P_1$ and $P_2$ on the real line, their coupling is a probability measure $P(\cdot)$ on the plane such that its projections are $P_1$ and $P_2$.
The same definitions of coupling may be introduced for any number of distributions (distribution functions, probability measures).
Such a coupling may also be viewed as follows: we define a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and two random variables $\xi_1$ and $\xi_2$ on this space such that $\xi_1 \sim F_1$ and $\xi_2 \sim F_2$ (or, in other notation, $\xi_1 \sim P_1$ and $\xi_2 \sim P_2$). Then their joint distribution, say $F$, has marginals $F_1$ and $F_2$ (or, equivalently, the probability measure $P(B) = \mathbf{P}((\xi_1, \xi_2) \in B)$ has marginals $P_1$ and $P_2$).
(b) Coupling of two random variables.
Let $\xi_1$ be defined on $(\Omega_1, \mathcal{F}_1, \mathbf{P}_1)$ and $\xi_2$ be defined on $(\Omega_2, \mathcal{F}_2, \mathbf{P}_2)$.
A coupling of these two r.v.'s is defined by, first, an introduction of a new probability space, say $(\Omega, \mathcal{F}, \mathbf{P})$, and, then, by defining a pair of r.v.'s $\xi'_1, \xi'_2$ on this space such that
$$\xi'_1 \stackrel{D}{=} \xi_1, \qquad \xi'_2 \stackrel{D}{=} \xi_2.$$
Examples:
(1) $F_1 = U(0,1)$, $F_2 = U(0,1)$;
(2) $F_1 = U(0,1)$, $F_2 = E(1)$;
(3) $F_1 = U(0,1)$, $F_2 = \Pi(1)$;
(4) $F_1 = B(n,p)$, $F_2 = \Pi(np)$;
(5) $F_1$ has a density $2x \, I(x \in (0,1))$ and $F_2$ a density $2(1-x) \, I(x \in (0,1))$.
In each example, there are many couplings!
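To make example (2) concrete: one coupling draws the two variables independently; another drives both by a single uniform r.v. through the inverse d.f. A small sketch (my own illustration, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(0, 1, size=100_000)

# Coupling of F1 = U(0,1) and F2 = E(1) driven by a common uniform:
# xi1 = u itself, xi2 = F2^{-1}(u) = -log(1 - u).  The marginals are U(0,1)
# and E(1), but the pair is maximally dependent (comonotone).
xi1 = u
xi2 = -np.log(1 - u)
print("means:", xi1.mean(), xi2.mean())             # approx 0.5 and 1.0
print("correlation:", np.corrcoef(xi1, xi2)[0, 1])  # close to its maximal value

# An independent coupling has the same marginals but correlation approx 0:
xi2_indep = -np.log(1 - rng.uniform(0, 1, size=100_000))
print("correlation:", np.corrcoef(xi1, xi2_indep)[0, 1])
```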
1.2 Weak and strong convergence
Lemma 0. If $F_n \Rightarrow F$ (all $F_n$ and $F$ are d.f.'s), then there exists a coupling of $\{F_n\}$ and $F$ such that
$$\eta_n \stackrel{a.s.}{\to} \eta.$$
Proof. For a d.f. $F$, define its inverse $F^{-1}$ by
$$F^{-1}(z) = \inf\{x : F(x) \ge z\}, \quad z \in (0,1).$$
Let $\Omega = (0,1)$, $\mathcal{F}$ be the $\sigma$-algebra of Borel subsets of $(0,1)$, and $\mathbf{P}$ the Lebesgue measure on $(0,1)$.
Set $\nu(\omega) = \omega$, $\forall \omega$. Then $\nu \sim U(0,1)$.
Let $\eta_n = F_n^{-1}(\nu)$, $\eta = F^{-1}(\nu)$, and show $\eta_n \stackrel{a.s.}{\to} \eta$. Note that $\eta_n \sim F_n$, $\eta \sim F$.
In order to avoid some technicalities, assume, for simplicity, that all d.f.'s are continuous.
Let
$$\underline{\eta}_n = \inf_{m \ge n} \eta_m, \quad \overline{\eta}_n = \sup_{m \ge n} \eta_m, \quad \overline{F}_n = \sup_{m \ge n} F_m, \quad \underline{F}_n = \inf_{m \ge n} F_m.$$
Then $\underline{\eta}_n \sim \overline{F}_n$ and $\overline{\eta}_n \sim \underline{F}_n$. Indeed,
$$\mathbf{P}(\underline{\eta}_n \le x) = \mathbf{P}(\underline{\eta}_n < x) = \mathbf{P}(\exists\, m \ge n : \eta_m < x) = \mathbf{P}(\exists\, m \ge n : F_m^{-1}(\nu) < x)$$
$$= \mathbf{P}(\exists\, m \ge n : \nu < F_m(x)) = \mathbf{P}\big(\nu < \sup_{m \ge n} F_m(x)\big) = \overline{F}_n(x).$$
Similarly, $\mathbf{P}(\overline{\eta}_n > x) = \ldots = 1 - \underline{F}_n(x)$.
Since $\overline{F}_n \downarrow F$ and $\underline{F}_n \uparrow F$ (by definition), it is sufficient to show that, for instance, $\underline{\eta}_n \stackrel{a.s.}{\to} \eta$.
But both $\overline{F}_n$ and $\underline{\eta}_n$ are monotone as functions of $n$! Then $\underline{\eta}_n \uparrow$ a.s. and, therefore, there exists $\widetilde{\eta}$ such that $\underline{\eta}_n \uparrow \widetilde{\eta}$ a.s., with $\widetilde{\eta} \le \eta$ a.s.
If $\mathbf{P}(\widetilde{\eta} \ne \eta) > 0$, then there exists $x$ such that
$$\mathbf{P}(\widetilde{\eta} \le x) > \mathbf{P}(\eta \le x).$$
But $\mathbf{P}(\eta \le x) = F(x) = \lim \overline{F}_n(x) \ge \mathbf{P}(\widetilde{\eta} \le x)$!
Thus, we got a contradiction, and $\underline{\eta}_n \to \eta$ a.s. By similar arguments, $\overline{\eta}_n \to \eta$ a.s. Therefore,
$$\eta_n \stackrel{a.s.}{\to} \eta. \qquad \square$$
Problem No 1. Prove this lemma without the additional assumption that all d.f.'s are continuous.
Exercises: What is $F^{-1}$ for the following distribution functions: $U(0,1)$, $E(\lambda)$, $N(0,1)$, $B(1,p)$, $B(n,p)$, $\Pi(\lambda)$, ...?
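A numerical sketch of the quantile coupling used in the proof of Lemma 0; as an assumed concrete case, take $F_n$ to be the d.f. of $N(1/n, 1)$, which converges weakly to $F = N(0,1)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
nu = rng.uniform(0, 1, size=5)  # one common U(0,1) sample drives every eta_n

eta = norm.ppf(nu)  # eta = F^{-1}(nu) with F = N(0,1)
for n in [1, 10, 100, 1000]:
    eta_n = norm.ppf(nu, loc=1.0 / n)  # eta_n = F_n^{-1}(nu) with F_n = N(1/n, 1)
    print(n, np.max(np.abs(eta_n - eta)))  # pathwise distance -> 0: a.s. convergence
```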
1.3 Uniform integrability
Let $\{\xi_n\}_{n \ge 1}$ be a sequence of real-valued r.v.'s.
Definition 1. $\{\xi_n\}$ are uniformly integrable (UI) if $\mathbf{E}|\xi_n| < \infty$ $\forall n$ and, moreover,
$$\sup_n \mathbf{E}|\xi_n| \, I(|\xi_n| \ge x) \equiv h(x) \to 0 \quad \text{as } x \to \infty.$$
Comments:
Actually, we can put $>$ instead of $\ge$ in the definition above. But I prefer to keep $\ge$ since I want the upper bound $h(x)$ to be monotone non-increasing and right-continuous.
Clearly, if $\{\xi_n\}$ are UI, then $\sup_n \mathbf{E}|\xi_n|$ is finite.
Examples:
(1) $\xi_n \sim E(\lambda_n)$, $n = 1, 2, \ldots$, are UI if and only if $\inf_n \lambda_n > 0$.
(2) Let
$$\mathbf{P}(\xi_n = 2n) = \mathbf{P}(\xi_n = -2n) = \frac{1}{2n}, \qquad \mathbf{P}(\xi_n = 0) = 1 - \frac{1}{n}.$$
Then $\mathbf{E}|\xi_n| = 2$, $\mathbf{E}\xi_n = 0$, and $\xi_n \stackrel{p}{\to} 0$, but $\{\xi_n\}$ are not UI!
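Example (2) can be checked by evaluating the definition directly (a small sketch):

```python
# h(x) = sup_n E|xi_n| I(|xi_n| >= x) for the family in example (2):
# |xi_n| equals 2n with probability 1/n, so E|xi_n| I(|xi_n| >= x) = 2 whenever 2n >= x.
def tail_expectation(n, x):
    return 2.0 if 2 * n >= x else 0.0

for x in [10.0, 100.0, 1000.0]:
    h = max(tail_expectation(n, x) for n in range(1, 10_000))
    print(x, h)  # stays at 2: the sup never decays, so the family is not UI
```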
Lemma 1. The following are equivalent:
(i) $\{\xi_n\}$ are UI;
(ii) there exists a function $g : [0, \infty) \to [0, \infty)$ such that:
(a) $g(0) > 0$; $g$ is non-decreasing; $\lim_{x \to \infty} g(x) = \infty$;
(b) $\sup_n \mathbf{E}|\xi_n| \, g(|\xi_n|) < \infty$.
Note: $g(0) > 0$ is not essential!
Proof.
(ii) $\Rightarrow$ (i). For each $n$,
$$\mathbf{E}|\xi_n| \, I(|\xi_n| \ge x) \le \mathbf{E}|\xi_n| \, \frac{g(|\xi_n|)}{g(x)} \, I(|\xi_n| \ge x) \le \frac{1}{g(x)} \sup_n \mathbf{E}|\xi_n| \, g(|\xi_n|) \to 0 \quad \text{as } x \to \infty.$$
(i) $\Rightarrow$ (ii). Assume that $h(x) > 0$ $\forall x$ (otherwise the statement is trivial).
For $m \in \mathbb{Z}$, let
$$A_m = \Big\{ x : \Big(\tfrac{1}{2}\Big)^{2(m+1)} < h(x) \le \Big(\tfrac{1}{2}\Big)^{2m} \Big\}$$
and, for $x \in A_m$, let $g(x) = 2^m$. From $h(0) < \infty$, we get $g(0) > 0$.
Note that $A_m$ is an interval which is closed from the left and open from the right. Denote by $z_m$ its left boundary point, $z_m \in A_m$. Then
$$\mathbf{E}|\xi_n| \, g(|\xi_n|) = \sum_m \mathbf{E}|\xi_n| \, g(|\xi_n|) \, I(|\xi_n| \in A_m) = \sum_m 2^m \, \mathbf{E}|\xi_n| \, I(|\xi_n| \in A_m)$$
$$\le \sum_m 2^m \, \mathbf{E}|\xi_n| \, I(|\xi_n| \ge z_m) \le \sum_m 2^m \, h(z_m) \le \sum_m 2^m \Big(\tfrac{1}{2}\Big)^{2m} < \infty$$
(the sums run over those $m$ with $A_m \ne \emptyset$; since $h(x) \le h(0) < \infty$, such $m$ are bounded from below). $\square$
Note that, in the proof of Lemma 1, we have proposed a particular choice of a function $g$ which is a step function. But we can make $g$ very smooth if we like.
Remark 1. As a corollary, one can get the following: if $\mathbf{E}|\xi| < \infty$, then there exists $g$ from Lemma 1 such that $\mathbf{E}|\xi| \, g(|\xi|) < \infty$. This may be reworded as
"the first moment $\Rightarrow$ something more".
Lemma 2. Assume $\xi_n \Rightarrow \xi$. Then:
(1) $\{\xi_n\}$ are UI $\Rightarrow$ $\mathbf{E}|\xi| < \infty$ and $\mathbf{E}|\xi_n| \to \mathbf{E}|\xi|$ (and, therefore, $\mathbf{E}\xi_n \to \mathbf{E}\xi$);
(2) $\mathbf{P}(\xi_n \ge 0) = 1$ $\forall n$; $\mathbf{E}\xi_n < \infty$ $\forall n$; $\mathbf{E}\xi_n \to \mathbf{E}\xi < \infty$ $\Rightarrow$ $\{\xi_n\}$ are UI.
Remark 2. In (2), the condition $\mathbf{P}(\xi_n \ge 0) = 1$ may be weakened in a natural way.
Problem No 2. How? But it cannot be eliminated.
Proof of Lemma 2. First, note that both statements (1) and (2) are "marginal", i.e. only marginal distributions are involved. So, by Lemma 0, we can construct a coupling: $\xi_n \stackrel{a.s.}{\to} \xi$.
Prove (1).
(a) Assume first that (distributions of) the r.v.'s are uniformly bounded, i.e. $\exists N$: $\mathbf{P}(|\xi_n| \le N) = 1$ $\forall n$ (this is a special case of UI).
Then $\mathbf{P}(|\xi| \le N) = 1$ and, $\forall \varepsilon > 0$,
$$0 \le \big|\mathbf{E}|\xi_n| - \mathbf{E}|\xi|\big| \le \mathbf{E}\big||\xi_n| - |\xi|\big| = \mathbf{E}\big||\xi_n| - |\xi|\big| \, I\big(\big||\xi_n| - |\xi|\big| \le \varepsilon\big) + \mathbf{E}\big||\xi_n| - |\xi|\big| \, I\big(\big||\xi_n| - |\xi|\big| > \varepsilon\big)$$
$$\le \varepsilon + 2N \, \mathbf{P}\big(\big||\xi_n| - |\xi|\big| > \varepsilon\big) \to \varepsilon \quad \text{as } n \to \infty.$$
Since $\varepsilon > 0$ is arbitrary, $\mathbf{E}|\xi_n| \to \mathbf{E}|\xi|$.
(b) Assume now that at least one of the distributions has an unbounded support, that is, $\mathbf{P}(|\xi_n| \le N) < 1$ $\forall N$ and for some $n$.
(b1) Take any $x > 0$ such that $\mathbf{P}(|\xi| = x) = 0$. Since $\xi_n \stackrel{a.s.}{\to} \xi$,
$$\widetilde{\xi}_n \equiv \xi_n \, I(|\xi_n| < x) \stackrel{a.s.}{\to} \widetilde{\xi} \equiv \xi \, I(|\xi| < x).$$
Then
$$\forall n, \ \mathbf{P}(|\widetilde{\xi}_n| \le x) = \mathbf{P}(|\widetilde{\xi}| \le x) = 1 \ \Rightarrow \ \mathbf{E}\widetilde{\xi}_n \to \mathbf{E}\widetilde{\xi} \quad \text{(see (a))};$$
and
$$|\widetilde{\xi}_n| \le |\xi_n| \ \text{a.s.} \ \Rightarrow \ \mathbf{E}|\widetilde{\xi}_n| \le \mathbf{E}|\xi_n| \le \sup_n \mathbf{E}|\xi_n| \equiv K \ \forall n \ \Rightarrow \ \mathbf{E}|\widetilde{\xi}| \le K.$$
(b2) Show first that $\mathbf{E}|\xi| < \infty$. Indeed,
$$\mathbf{E}|\xi| = \lim_{x \to \infty} \mathbf{E}|\xi| \, I(|\xi| \le x) \le K < \infty.$$
(b3) Now, $\forall \varepsilon > 0$, choose $x$ such that $\mathbf{P}(|\xi| = x) = 0$, $h(x) \le \varepsilon$, and $\mathbf{E}|\xi| \, I(|\xi| \ge x) \le \varepsilon$. Let
$$\delta_n = \mathbf{E}|\xi_n| \, I(|\xi_n| \ge x) \quad \text{and} \quad \delta = \mathbf{E}|\xi| \, I(|\xi| \ge x).$$
Then
$$\mathbf{E}|\xi_n| = \mathbf{E}|\xi_n| \, I(|\xi_n| < x) + \delta_n, \qquad \mathbf{E}|\xi| = \mathbf{E}|\xi| \, I(|\xi| < x) + \delta.$$
Since $\delta_n \le \varepsilon$ $\forall n$ and $\delta \le \varepsilon$, then
$$\limsup \big(\mathbf{E}|\xi_n| - \mathbf{E}|\xi|\big) \le 2\varepsilon \quad \text{and} \quad \liminf \big(\mathbf{E}|\xi_n| - \mathbf{E}|\xi|\big) \ge -2\varepsilon \quad \text{for any } \varepsilon.$$
Letting $\varepsilon$ to 0, we obtain the first statement of the lemma.
Prove now the second statement. First, from $\mathbf{E}\xi < \infty$, we may take an arbitrary $\varepsilon > 0$ and then choose $x_0 = x_0(\varepsilon)$ such that $\mathbf{P}(\xi = x_0) = 0$ and
$$\mathbf{E}\xi \, I(\xi \ge x_0) \le \varepsilon/2.$$
Then we may use part (b1) from the proof of (1): for a given $x_0$,
$$\mathbf{E}\xi_n \to \mathbf{E}\xi \ \Rightarrow \ \mathbf{E}\xi_n \, I(\xi_n \ge x_0) = \mathbf{E}(\xi_n - \widetilde{\xi}_n) = \mathbf{E}\xi_n - \mathbf{E}\widetilde{\xi}_n \to \mathbf{E}\xi - \mathbf{E}\widetilde{\xi} = \mathbf{E}\xi \, I(\xi \ge x_0) \le \varepsilon/2.$$
Therefore, $\exists n(\varepsilon)$ such that
$$\mathbf{E}\xi_n \, I(\xi_n \ge x_0) \le \varepsilon \quad \forall n > n(\varepsilon).$$
Now, $\forall n = 1, 2, \ldots, n(\varepsilon)$,
$$\mathbf{E}\xi_n < \infty \ \Rightarrow \ \exists x_n : \ \mathbf{E}\xi_n \, I(\xi_n \ge x_n) \le \varepsilon.$$
Let $x = \max(x_1, \ldots, x_{n(\varepsilon)}, x_0)$. Then
$$\mathbf{E}\xi_n \, I(\xi_n \ge x) \le \varepsilon \quad \forall n.$$
Thus,
$$\sup_n \mathbf{E}\xi_n \, I(\xi_n \ge x) \to 0 \quad \text{as } x \to \infty. \qquad \square$$
1.4 Some useful properties of UI
Property 1. If $\{\xi_n\}$ are UI and if $\{\eta_n\}$ are such that $|\eta_n| \le |\xi_n|$ a.s., then $\{\eta_n\}$ are UI.
Indeed, let $h(x)$ be from Definition 1. Then, $\forall x > 0$,
$$\mathbf{E}|\eta_n| \, I(|\eta_n| > x) \le \mathbf{E}|\xi_n| \, I(|\eta_n| > x) \le \mathbf{E}|\xi_n| \, I(|\xi_n| > x) \le h(x). \qquad \square$$
Property 2. If $\{\xi_n\}$ is an i.i.d. sequence with finite mean, $\mathbf{E}|\xi_1| < \infty$, and if $|\eta_n| \le |\xi_n|$ a.s., then the sequence
$$\zeta_n = \frac{\eta_1 + \ldots + \eta_n}{n}, \quad n = 1, 2, \ldots,$$
is UI.
Indeed,
$$|\zeta_n| \le \frac{|\xi_1| + \ldots + |\xi_n|}{n} \equiv \chi_n,$$
where
(i) $\mathbf{E}\chi_n = \mathbf{E}|\xi_1|$ $\forall n$ and,
(ii) by the SLLN, $\chi_n \stackrel{a.s.}{\to} \mathbf{E}|\xi_1|$.
$\Rightarrow$ From Lemma 2, (2), $\{\chi_n\}$ are UI.
$\Rightarrow$ From Property 1, $\{\zeta_n\}$ are UI. $\square$
Property 3. Since the UI property is a property of marginal distributions only, one can replace the a.s. inequality in Property 1, $|\eta_n| \le |\xi_n|$, by the weaker one, $|\eta_n| \le_{st} |\xi_n|$ (this means: $\mathbf{P}(|\eta_n| > x) \le \mathbf{P}(|\xi_n| > x)$ $\forall x$).
In particular, if the r.v.'s $\xi_n$ admit a stochastic integrable majorant $\eta$,
$$|\xi_n| \le_{st} |\eta| \quad \forall n,$$
and if $\mathbf{E}|\eta| < \infty$, then $\{\xi_n\}$ are UI.
Remark 3. Consider, instead of a sequence $\{\xi_n\}_{n \ge 1}$, a family of r.v.'s $\{\xi_t\}_{t \in T}$ indexed by an arbitrary set $T$. Then one can introduce the following
Definition 2. (compare with Definition 1) $\{\xi_t\}_{t \in T}$ are UI if $\mathbf{E}|\xi_t| < \infty$ $\forall t \in T$ and, moreover,
$$\sup_{t \in T} \mathbf{E}|\xi_t| \, I(|\xi_t| \ge x) \equiv h(x) \to 0 \quad \text{as } x \to \infty.$$
Then
(a) The statement and the proof of Lemma 1 stay the same if we replace $n = 1, 2, \ldots$ by $t \in T$.
(b) Similarly, the statement and the proof of Lemma 2 stay unchanged if we replace $n = 1, 2, \ldots$ by $t \in T = [0, \infty)$.
(c) Properties 1 and 3 still hold if we replace $n = 1, 2, \ldots$ by $t \in T$.
1.5 Coupling inequality. Maximal coupling. Dobrushin's theorem.
In this section, we assume that random variables are not necessarily real-valued and may take values in a general measurable space $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ which is assumed to be a complete separable metric space.
The Coupling Inequality
Let $\xi_1, \xi_2 : (\Omega, \mathcal{F}, \mathbf{P}) \to (\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ be two $\mathcal{X}$-valued r.v.'s. Let
$$P_1(B) = \mathbf{P}(\xi_1 \in B), \quad P_2(B) = \mathbf{P}(\xi_2 \in B), \quad B \in \mathcal{B}_{\mathcal{X}}.$$
Then, for $B \in \mathcal{B}_{\mathcal{X}}$,
$$P_1(B) - P_2(B) = \mathbf{P}(\xi_1 \in B, \xi_1 = \xi_2) + \mathbf{P}(\xi_1 \in B, \xi_1 \ne \xi_2) - \mathbf{P}(\xi_2 \in B, \xi_1 = \xi_2) - \mathbf{P}(\xi_2 \in B, \xi_1 \ne \xi_2)$$
$$= \mathbf{P}(\xi_1 \in B, \xi_1 \ne \xi_2) - \mathbf{P}(\xi_2 \in B, \xi_1 \ne \xi_2) \le \mathbf{P}(\xi_1 \ne \xi_2),$$
and, by symmetry, $P_2(B) - P_1(B) \le \mathbf{P}(\xi_1 \ne \xi_2)$. Therefore, for any $B \in \mathcal{B}_{\mathcal{X}}$, $|P_1(B) - P_2(B)| \le \mathbf{P}(\xi_1 \ne \xi_2)$, that is,
$$(*) \qquad \sup_{B \in \mathcal{B}_{\mathcal{X}}} |P_1(B) - P_2(B)| \le \mathbf{P}(\xi_1 \ne \xi_2).$$
The Maximal Coupling
Now we reformulate the result obtained. Note that the LHS of inequality (*) depends on the marginal distributions $P_1$ and $P_2$ only and does not depend on the joint distribution of $\xi_1$ and $\xi_2$. Therefore, we get the following: for any coupling of the marginal distributions $P_1$ and $P_2$, inequality (*) holds. Equivalently,
$$(**) \qquad \sup_{B \in \mathcal{B}_{\mathcal{X}}} |P_1(B) - P_2(B)| \le \inf_{\text{all couplings}} \mathbf{P}(\xi_1 \ne \xi_2).$$
The following questions seem to be natural:
(?) Is there equality in (**)?
(??) If the answer is "yes", then does there exist a coupling such that
$$\sup_{B \in \mathcal{B}_{\mathcal{X}}} |P_1(B) - P_2(B)| = \mathbf{P}(\xi_1 \ne \xi_2)?$$
The answers to both questions are positive! And this is the content of Dobrushin's theorem.
Theorem 1. Let $P_1$ and $P_2$ be two probability measures on a complete separable metric space $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$. There exists a coupling of these probability measures such that, for $\xi_i \sim P_i$, $i = 1, 2$,
$$\sup_{B \in \mathcal{B}_{\mathcal{X}}} |P_1(B) - P_2(B)| = \mathbf{P}(\xi_1 \ne \xi_2).$$
Proof. $\nu(B) = P_1(B) - P_2(B)$ is a signed measure. Then the Banach theorem states that there exists a subset $C \subseteq \mathcal{X}$ such that
(a) $\nu(B) \ge 0$ $\forall B \subseteq C$;
(b) $\nu(B) \le 0$ $\forall B \subseteq \mathcal{X} \setminus C \equiv \overline{C}$.
Note:
1) if $\nu(C) = 0$, then $P_1 = P_2$ and the coupling is obvious;
2) $\nu(C) = -\nu(\overline{C})$.
Assume $\nu(C) > 0$. Introduce 4 distributions (probability measures):
$Q_{1,1}$ is defined by
$$Q_{1,1}(B) = \frac{P_1(\overline{C} \cap B)}{P_1(\overline{C})}, \quad B \in \mathcal{B}_{\mathcal{X}}$$
(if $P_1(\overline{C}) = 0$, let $Q_{1,1}$ be an arbitrary fixed distribution), and
$Q_{2,1}$ is defined by
$$Q_{2,1}(B) = \frac{P_1(C \cap B) - P_2(C \cap B)}{\nu(C)}, \quad B \in \mathcal{B}_{\mathcal{X}}.$$
Similarly,
$Q_{2,2}$ is defined by
$$Q_{2,2}(B) = \frac{P_2(C \cap B)}{P_2(C)}, \quad B \in \mathcal{B}_{\mathcal{X}}$$
(if $P_2(C) = 0$, let $Q_{2,2}$ be an arbitrary fixed distribution), and
$Q_{1,2}$ is defined by
$$Q_{1,2}(B) = \frac{P_2(\overline{C} \cap B) - P_1(\overline{C} \cap B)}{\nu(C)}, \quad B \in \mathcal{B}_{\mathcal{X}}.$$
Then introduce 5 mutually independent r.v.'s:
$$\eta_{1,1} \sim Q_{1,1}, \quad \eta_{1,2} \sim Q_{1,2}, \quad \eta_{2,1} \sim Q_{2,1}, \quad \eta_{2,2} \sim Q_{2,2},$$
and $\gamma$ with distribution
$$\mathbf{P}(\gamma = 1) = P_1(\overline{C}), \quad \mathbf{P}(\gamma = 2) = P_2(C), \quad \mathbf{P}(\gamma = 0) = \nu(C).$$
Now we can define $\xi_1$ and $\xi_2$ as follows:
$$\xi_1 = \eta_{1,1} \, I(\gamma = 1) + \eta_{2,2} \, I(\gamma = 2) + \eta_{2,1} \, I(\gamma = 0),$$
$$\xi_2 = \eta_{1,1} \, I(\gamma = 1) + \eta_{2,2} \, I(\gamma = 2) + \eta_{1,2} \, I(\gamma = 0).$$
Simple calculations show that $\xi_i \sim P_i$, $i = 1, 2$. This is Problem No 3 for you.
Then, since on $\{\gamma = 0\}$ the r.v.'s $\eta_{2,1}$ and $\eta_{1,2}$ take values in the disjoint sets $C$ and $\overline{C}$,
$$\mathbf{P}(\xi_1 \ne \xi_2) = \mathbf{P}(\gamma = 0) = \nu(C) \le \sup_{B \in \mathcal{B}_{\mathcal{X}}} |P_1(B) - P_2(B)|.$$
So, together with (*),
$$\mathbf{P}(\xi_1 \ne \xi_2) = \sup_{B \in \mathcal{B}_{\mathcal{X}}} |P_1(B) - P_2(B)|. \qquad \square$$
Comment. The Banach theorem and the Radon-Nikodym theorem are two equivalent statements formulated in slightly different ways.
There is (formally!) another proof (see, e.g., T. Lindvall's book on the coupling method) based on the Radon-Nikodym theorem:
Consider a new probability measure $P(\cdot) = (P_1(\cdot) + P_2(\cdot))/2$. Let $f_i = \frac{dP_i}{dP}$ be the corresponding densities. Then
$$\sup_{B \in \mathcal{B}_{\mathcal{X}}} |P_1(B) - P_2(B)| = 1 - \int \min(f_1(x), f_2(x)) \, P(dx),$$
and we may repeat the previous construction using densities.
What is the maximal coupling in the following examples:
(1) Two discrete two-point distributions.
(2) Two absolutely continuous distributions on $(0,1)$ with densities $f_1$ and $f_2$.
(3) Bernoulli and Poisson distributions.
(4) Normal and exponential distributions.
(This is another exercise for you.)
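For discrete distributions, the construction above becomes explicit: couple the two laws on the common part $\min(p_1, p_2)$ and split the remainders, which live on disjoint supports. A sketch of exercise (1) in this spirit (my own illustration, with two arbitrary three-point laws):

```python
import numpy as np

def maximal_coupling_sampler(p1, p2, rng):
    """Sample (xi1, xi2) with marginals p1, p2 and P(xi1 != xi2) = TV distance."""
    p1, p2 = np.asarray(p1), np.asarray(p2)
    common = np.minimum(p1, p2)          # overlap mass
    tv = 1.0 - common.sum()              # total variation distance
    if rng.uniform() < 1.0 - tv:
        x = rng.choice(len(p1), p=common / common.sum())
        return x, x                      # with probability 1 - TV: equal values
    r1 = (p1 - common) / tv              # residual laws: disjoint supports
    r2 = (p2 - common) / tv
    return rng.choice(len(p1), p=r1), rng.choice(len(p2), p=r2)

rng = np.random.default_rng(3)
p1, p2 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
samples = [maximal_coupling_sampler(p1, p2, rng) for _ in range(100_000)]
print("P(xi1 != xi2):", np.mean([a != b for a, b in samples]))  # approx TV = 0.3
```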
1.6 Probabilistic Metrics
Dobrushin's theorem provides a positive solution to one of the important problems in the theory of Probabilistic Metrics. We will briefly discuss basic concepts of this theory.
Again, consider a complete separable metric space $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ and introduce the following notation:
1) $\mathcal{X}^2 = \mathcal{X} \times \mathcal{X}$;
2) $\mathcal{B}^2_{\mathcal{X}} = \mathcal{B}_{\mathcal{X}} \otimes \mathcal{B}_{\mathcal{X}}$ is the $\sigma$-algebra in $\mathcal{X}^2$ generated by all sets $B_1 \times B_2$, $B_1, B_2 \in \mathcal{B}_{\mathcal{X}}$;
3) $\mathrm{diag}(\mathcal{X}^2) = \{(x, x), \, x \in \mathcal{X}\}$.
Problem No 4. Prove that $\mathrm{diag}(\mathcal{X}^2) \in \mathcal{B}^2_{\mathcal{X}}$. (Actually, there is no need to assume that the state space is a complete separable metric space, and the minimal requirement for $\mathrm{diag}(\mathcal{X}^2) \in \mathcal{B}^2_{\mathcal{X}}$ to hold is that the $\sigma$-algebra $\mathcal{B}_{\mathcal{X}}$ is countably generated.)
Let $P$ be any probability distribution on $(\mathcal{X}^2, \mathcal{B}^2_{\mathcal{X}})$. Denote by $P_i$, $i = 1, 2$, its marginal distributions:
$$P_1(B) = P(B \times \mathcal{X}), \qquad P_2(B) = P(\mathcal{X} \times B), \quad B \in \mathcal{B}_{\mathcal{X}}.$$
Let $\mathcal{P}$ be the set of all probability distributions (measures) on $(\mathcal{X}^2, \mathcal{B}^2_{\mathcal{X}})$.
Definition 3. A function $d : \mathcal{P} \to [0, \infty)$ is called a probabilistic metric if it satisfies the following conditions:
(1) $P(\mathrm{diag}(\mathcal{X}^2)) = 1 \ \Rightarrow \ d(P) = 0$;
(2) $d(P) = 0 \ \Rightarrow \ P_1 = P_2$;
(3) the triangle inequality: if
$P^{(1)}$ has marginals $P_1$ and $P_2$,
$P^{(2)}$ has marginals $P_1$ and $P_3$,
$P^{(3)}$ has marginals $P_3$ and $P_2$,
then $d(P^{(1)}) \le d(P^{(2)}) + d(P^{(3)})$.
Definition 4. A probabilistic metric $d$ is simple if it depends on the marginal distributions only (i.e. if $P^{(1)}$ and $P^{(2)}$ have the same marginals, then $d(P^{(1)}) = d(P^{(2)})$), and complex otherwise.
For a simple metric, it is reasonable to write $d(P_1, P_2)$ instead of $d(P)$, so $d$ has the meaning of a distance between $P_1$ and $P_2$.
For a complex metric, we may also write $d(\xi_1, \xi_2)$ instead of $d(P)$, where $(\xi_1, \xi_2)$ is a coupling of two r.v.'s with a joint distribution $P$,
$$P(B) = \mathbf{P}((\xi_1, \xi_2) \in B), \quad B \in \mathcal{B}^2_{\mathcal{X}}.$$
So, $d(\xi_1, \xi_2)$ may be considered as a distance between r.v.'s.
We can also write $d(\xi_1, \xi_2)$ for simple metrics. In this case,
$$d(\xi_1, \xi_2) = d(F_1, F_2) = d(P_1, P_2).$$
Examples.
Simple:
1) $\sup_{B \in \mathcal{B}} |P_1(B) - P_2(B)|$ (Total variation norm (T.V.N.))
Complex:
2) $\mathbf{P}(\xi_1 \ne \xi_2) = P(\mathcal{X}^2 \setminus \mathrm{diag}(\mathcal{X}^2))$ (Indicator metric (I.M.))
For real-valued r.v.'s:
Simple:
3) $\sup_x |F_1(x) - F_2(x)|$ (Uniform metric (U.M.))
4) $\inf\{\varepsilon > 0 : F_1(x - \varepsilon) - \varepsilon \le F_2(x) \le F_1(x + \varepsilon) + \varepsilon \ \forall x\}$ (Levy metric (L.M.))
Complex:
5) $\inf\{\varepsilon > 0 : \mathbf{P}(|\xi_1 - \xi_2| > \varepsilon) < \varepsilon\}$ (Ki Fan metric (K.F.M.))
One of the key problems in the theory of probabilistic metrics is to find answers to the following questions:
Assume a simple metric $d(P_1, P_2)$ is given. Does there exist a complex metric $\widetilde{d}$ such that
(a) the following coupling inequality holds:
$$d(\xi_1, \xi_2) \le \inf_{\text{all couplings}} \widetilde{d}(\xi_1, \xi_2)? \quad \text{(compare with (**))}$$
(b) If yes, then is it possible to replace $\le$ by $=$ in (a)?
(c) Does there exist a coupling such that $d(\xi_1, \xi_2) = \widetilde{d}(\xi_1, \xi_2)$?
The following result holds:
Theorem 2. The answers to the above questions are positive for the metrics:
(1) $d$ = T.V.N., $\widetilde{d}$ = I.M.;
(2) $d$ = L.M., $\widetilde{d}$ = K.F.M.
Comment. Statement (1) is Dobrushin's theorem. Statement (2) is Strassen's theorem (its proof is omitted).
1.7 Stopping times
Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a probability space and $\{\xi_n\}_{n \ge 1}$ a sequence of r.v.'s, $\xi_n : \Omega \to \mathbb{R}$.
Denote by $\mathcal{F}_n$ the $\sigma$-algebra generated by $\xi_n$:
$$\mathcal{F}_n \subseteq \mathcal{F}; \qquad \mathcal{F}_n = \{\xi_n^{-1}(B), \, B \in \mathcal{B}\},$$
where $\mathcal{B}$ is the $\sigma$-algebra of Borel sets in $\mathbb{R}$.
Then, for $1 \le k \le n$, $\mathcal{F}_{[k,n]}$ is the $\sigma$-algebra generated by $\xi_k, \ldots, \xi_n$; i.e.
$\mathcal{F} \supseteq \mathcal{F}_{[k,n]}$ is the minimal $\sigma$-algebra such that $\mathcal{F}_{[k,n]} \supseteq \mathcal{F}_l$ for all $l = k, \ldots, n$.
Another way to describe $\mathcal{F}_{[k,n]}$ is: let $\zeta_{k,n} := (\xi_k, \ldots, \xi_n)$ be a random vector, $\zeta_{k,n} : \Omega \to \mathbb{R}^{n-k+1}$. Then
$$\mathcal{F}_{[k,n]} = \{\zeta_{k,n}^{-1}(B), \, B \in \mathcal{B}^{n-k+1}\},$$
where $\mathcal{B}^{n-k+1}$ is the $\sigma$-algebra of Borel sets in $\mathbb{R}^{n-k+1}$.
Finally, $\mathcal{F}_{[1,\infty)}$ is the $\sigma$-algebra generated by the whole sequence $\{\xi_n\}_{n \ge 1}$.
Good Property: $\forall A \in \mathcal{F}_{[1,\infty)}$, there exists a sequence of events $\{A_n\}_{n \ge 1}$, $A_n \in \mathcal{F}_{[1,n]}$, such that
$$\mathbf{P}(A \setminus A_n) + \mathbf{P}(A_n \setminus A) \to 0 \quad \text{as } n \to \infty.$$
Let now $\tau : \Omega \to \{1, 2, \ldots, n, \ldots\}$ be an integer-valued r.v. (we say it is a counting r.v.).
Definition 5. $\tau$ is a stopping time (ST) with respect to $\{\xi_n\}$ if, $\forall n \ge 1$,
$$\{\tau = n\} \in \mathcal{F}_{[1,n]} \quad (\text{or, equivalently, } \{\tau \le n\} \in \mathcal{F}_{[1,n]}).$$
Another variant of a definition of a stopping time is:
Definition 6. $\tau$ is an ST if there exists a family of functions $h_n : \mathbb{R}^n \to \{0, 1\}$ such that
$$\forall n \ge 1, \quad I(\tau = n) = h_n(\xi_1, \ldots, \xi_n) \ \text{a.s.}$$
(or, equivalently, $I(\tau \le n) = \widehat{h}_n(\xi_1, \ldots, \xi_n)$ a.s. for some such family $\{\widehat{h}_n\}$).
Examples of STs (see also the sketch below):
(1) $\tau = \min\{n \ge 1 : \xi_n > x\}$;
(2) $\tau = \min\{n \ge 1 : \sum_{1}^{n} \xi_i > x\}$;
(3) More examples...
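A small sketch of example (2), the first-passage time of a random walk over a level $x$ (illustrative; the exponential step distribution is an arbitrary choice):

```python
import numpy as np

def first_passage_time(xs, x):
    """tau = min{n >= 1 : xi_1 + ... + xi_n > x}; depends only on xi_1, ..., xi_tau."""
    s = 0.0
    for n, xi in enumerate(xs, start=1):
        s += xi
        if s > x:
            return n
    raise ValueError("level not crossed on this finite sample path")

rng = np.random.default_rng(4)
path = rng.exponential(1.0, size=10_000)
print(first_passage_time(path, x=50.0))
```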
Assume now that $\{\xi_n\}$ is an i.i.d. sequence and $\tau$ is an ST with $\mathbf{P}(\tau < \infty) = 1$. Let
$$\widetilde{\xi}_1 = \xi_{\tau+1}, \quad \widetilde{\xi}_2 = \xi_{\tau+2}, \quad \ldots, \quad \widetilde{\xi}_i = \xi_{\tau+i}, \quad \ldots$$
Lemma 3. The following statements hold:
1) $\{\widetilde{\xi}_i\}$ is an i.i.d. sequence;
2) $\widetilde{\xi}_i \stackrel{D}{=} \xi_1$;
3) $\{\widetilde{\xi}_i\}_{i \ge 1}$ and the random vector $(\tau, \xi_1, \ldots, \xi_\tau)$ are mutually independent.
Corollary 1. $\{\widetilde{\xi}_i\}_{i \ge 1}$ and $S_\tau \equiv \xi_1 + \ldots + \xi_\tau$ are mutually independent.
Proof of Lemma 3. It is sufficient to show that, $\forall k \ge 1$, $\forall m \ge 1$, and for all Borel sets $B_1, \ldots, B_k$ and $C_1, \ldots, C_m$,
$$(***) \qquad \mathbf{P}(\tau = k; \, \xi_1 \in B_1, \ldots, \xi_k \in B_k; \, \widetilde{\xi}_1 \in C_1, \ldots, \widetilde{\xi}_m \in C_m)$$
$$= \mathbf{P}(\tau = k; \, \xi_1 \in B_1, \ldots, \xi_k \in B_k) \cdot \mathbf{P}(\xi_1 \in C_1, \ldots, \xi_m \in C_m).$$
Indeed, $(***) \Rightarrow$ 1), 2), and 3).
First, take $B_1 = \ldots = B_k = \mathbb{R}$. Then, $\forall m$, by the total probability formula and $(***)$,
$$\mathbf{P}(\widetilde{\xi}_1 \in C_1, \ldots, \widetilde{\xi}_m \in C_m) = \sum_{k=1}^{\infty} \mathbf{P}(\tau = k; \, \widetilde{\xi}_1 \in C_1, \ldots, \widetilde{\xi}_m \in C_m)$$
$$= \sum_{k=1}^{\infty} \mathbf{P}(\tau = k) \prod_{i=1}^{m} \mathbf{P}(\xi_1 \in C_i) = \prod_{i=1}^{m} \mathbf{P}(\xi_1 \in C_i).$$
In particular, $\forall j \ge 1$ and $\forall C_j$, we can take $m \ge j$ and $C_i = \mathbb{R}$ for $i \ne j$. Then the LHS above equals $\mathbf{P}(\widetilde{\xi}_j \in C_j)$ and the RHS equals $\mathbf{P}(\xi_1 \in C_j)$; this gives 2).
Now, take any $C_1, \ldots, C_m$ and replace $\prod_{i=1}^m \mathbf{P}(\xi_1 \in C_i)$ by $\prod_{i=1}^m \mathbf{P}(\widetilde{\xi}_i \in C_i)$ (they are equal, by 2)); this gives 1).
Finally, take any $B_1, \ldots, B_k$ and $C_1, \ldots, C_m$ and make the same replacement in $(***)$; this gives 3).
So, we will prove $(***)$ now:
$$\mathbf{P}(\tau = k; \, \xi_1 \in B_1, \ldots, \xi_k \in B_k; \, \widetilde{\xi}_1 \in C_1, \ldots, \widetilde{\xi}_m \in C_m)$$
$$= \mathbf{P}\Big( \underbrace{\{h_k(\xi_1, \ldots, \xi_k) = 1; \, \xi_1 \in B_1, \ldots, \xi_k \in B_k\}}_{\in \, \mathcal{F}_{[1,k]}} \cap \underbrace{\{\xi_{k+1} \in C_1, \ldots, \xi_{k+m} \in C_m\}}_{\in \, \mathcal{F}_{[k+1,k+m]}} \Big)$$
$$= \mathbf{P}(\ldots) \cdot \mathbf{P}(\ldots) = \mathbf{P}(\ldots) \prod_{i=1}^{m} \mathbf{P}(\xi_{k+i} \in C_i) = \mathbf{P}(\ldots) \prod_{i=1}^{m} \mathbf{P}(\xi_1 \in C_i). \qquad \square$$
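A quick simulation of Lemma 3 (a sketch; assumptions: standard normal steps and $\tau$ the first time the walk $S_n$ exceeds 1):

```python
import numpy as np

rng = np.random.default_rng(5)

def one_run():
    """Walk until tau = min{n : S_n > 1}; return (xi_tau, xi_{tau+1})."""
    s = 0.0
    while True:
        xi = rng.normal()
        s += xi
        if s > 1.0:
            return xi, rng.normal()

runs = np.array([one_run() for _ in range(100_000)])
# xi_tau is the step that causes the crossing, so it is strongly biased upwards,
# while xi_{tau+1} =D= xi_1 ~ N(0,1) by Lemma 3:
print("E xi_tau     ~", runs[:, 0].mean())   # clearly positive
print("E xi_{tau+1} ~", runs[:, 1].mean())   # approx 0
```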
Lemma 4 (Wald's identity). Assume that $\mathbf{E}|\xi_1| < \infty$ and $\mathbf{E}\tau < \infty$. Then
$$\mathbf{E}S_\tau = \mathbf{E}\xi_1 \cdot \mathbf{E}\tau.$$
Proof. (a) Show that $\mathbf{E}|S_\tau| < \infty$:
$$|S_\tau| \le \sum_{n=1}^{\tau} |\xi_n| = \sum_{n=1}^{\infty} |\xi_n| \, I(\tau \ge n).$$
Note that $I(\tau \ge n) = 1 - I(\tau \le n-1)$, and $\{\tau \le n-1\} \in \mathcal{F}_{[1,n-1]}$
$\Rightarrow$ $\xi_n$ and $I(\tau \ge n)$ are independent $\Rightarrow$ $|\xi_n|$ and $I(\tau \ge n)$ are independent
$$\Rightarrow \ \mathbf{E}|S_\tau| \le \mathbf{E}\sum_{n=1}^{\infty} |\xi_n| \, I(\tau \ge n) = \sum_{n=1}^{\infty} \mathbf{E}|\xi_n| \, \mathbf{P}(\tau \ge n) = \mathbf{E}|\xi_1| \sum_{n=1}^{\infty} \mathbf{P}(\tau \ge n) = \mathbf{E}|\xi_1| \cdot \mathbf{E}\tau < \infty.$$
(b) Therefore,
$$\mathbf{E}S_\tau = \mathbf{E}\sum_{n=1}^{\infty} \xi_n \, I(\tau \ge n) = \ldots = \mathbf{E}\xi_1 \cdot \mathbf{E}\tau. \qquad \square$$
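A Monte Carlo check of Wald's identity, using the first-passage time from example (2) with $E(1)$ steps (a sketch):

```python
import numpy as np

rng = np.random.default_rng(6)

def run(x=10.0):
    """First-passage time tau = min{n : S_n > x} for E(1) steps; returns (tau, S_tau)."""
    s, n = 0.0, 0
    while s <= x:
        s += rng.exponential(1.0)
        n += 1
    return n, s

taus, sums = zip(*[run() for _ in range(50_000)])
# Wald: E S_tau = E xi_1 * E tau, and E xi_1 = 1 here.
print("E tau   ~", np.mean(taus))   # approx 11 (= x + mean overshoot, by memorylessness)
print("E S_tau ~", np.mean(sums))   # approx 11 as well, matching Wald's identity
```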
Lemma 5. Let $\{\xi_n\}_{n \ge 1}$ be an i.i.d. sequence; let $\tau$ be an ST w.r.t. $\{\xi_n\}_{n \ge 1}$ with $\mathbf{P}(\tau < \infty) = 1$; let $\{\widetilde{\xi}_i\}_{i \ge 1}$ be as defined above; and let $\sigma$ be an ST w.r.t. $\{\widetilde{\xi}_i\}_{i \ge 1}$ with $\mathbf{P}(\sigma < \infty) = 1$.
Then $\tau + \sigma$ is an ST w.r.t. $\{\xi_n\}_{n \ge 1}$.
Proof.
$$\{\tau + \sigma = k\} = \bigcup_{l=1}^{k-1} \{\tau = l\} \cap \{\sigma = k - l\}$$
$$= \bigcup_{l=1}^{k-1} \{h_l(\xi_1, \ldots, \xi_l) = 1\} \cap \{\widetilde{h}_{k-l}(\widetilde{\xi}_1, \ldots, \widetilde{\xi}_{k-l}) = 1\}$$
$$= \bigcup_{l=1}^{k-1} \underbrace{\{h_l(\xi_1, \ldots, \xi_l) = 1\}}_{\in \, \mathcal{F}_{[1,l]}} \cap \underbrace{\{\widetilde{h}_{k-l}(\xi_{l+1}, \ldots, \xi_k) = 1\}}_{\in \, \mathcal{F}_{[l+1,k]}}$$
$$\Rightarrow \ \{\tau + \sigma = k\} \in \mathcal{F}_{[1,k]} \quad \forall k. \qquad \square$$
Now let us write $\xi^{(1)}_i$ instead of $\xi_i$ and $\tau^{(1)}$ instead of $\tau$; then set $\xi^{(2)}_i \equiv \widetilde{\xi}^{(1)}_i = \xi^{(1)}_{\tau^{(1)}+i}$, with $\tau^{(2)}$ an ST w.r.t. $\{\xi^{(2)}_i\}_{i \ge 1}$; and so on.
Lemma 6. If $\tau^{(j)}$ is an ST w.r.t. $\{\xi^{(j)}_i\}_{i \ge 1}$, $j = 1, \ldots, J$, and if $\xi^{(j+1)}_i = \xi^{(j)}_{\tau^{(j)}+i}$, then $\tau^{(1)} + \ldots + \tau^{(J)}$ is an ST w.r.t. $\{\xi^{(1)}_i\}_{i \ge 1}$.
Problem No 5. Prove Lemma 6.
1.8 Two-dimensional stopping times
Let $\{\xi_{n,1}\}_{n \ge 1}$ and $\{\xi_{n,2}\}_{n \ge 1}$ be two sequences of r.v.'s, and let $\mathcal{F}_{[k_1,n_1] \times [k_2,n_2]}$ be the $\sigma$-algebra generated by
$$\xi_{k_1,1}, \xi_{k_1+1,1}, \ldots, \xi_{n_1,1}; \qquad \xi_{k_2,2}, \xi_{k_2+1,2}, \ldots, \xi_{n_2,2}.$$
Definition 7. A pair of r.v.'s $(\tau_1, \tau_2) : \Omega \to \{1, 2, \ldots\}^2$ is an ST w.r.t. $\{\xi_{n,1}\}$ and $\{\xi_{n,2}\}$ if, $\forall n_1 \ge 1, n_2 \ge 1$,
$$\{\tau_1 = n_1, \, \tau_2 = n_2\} \in \mathcal{F}_{[1,n_1] \times [1,n_2]}.$$
Lemma 7. If $\{\xi_{n,1}\}_{n \ge 1}$ and $\{\xi_{n,2}\}_{n \ge 1}$ are two mutually independent i.i.d. sequences and if $(\tau_1, \tau_2)$ is an ST, then
1) each of the sequences $\{\widetilde{\xi}_{i,1} \equiv \xi_{\tau_1+i,1}\}$ and $\{\widetilde{\xi}_{i,2} \equiv \xi_{\tau_2+i,2}\}$ is i.i.d., and these sequences are mutually independent;
2) $\widetilde{\xi}_{i,1} \stackrel{D}{=} \xi_{1,1}$; $\widetilde{\xi}_{i,2} \stackrel{D}{=} \xi_{1,2}$;
3) $\{\widetilde{\xi}_{i,1}\}_{i \ge 1}$, $\{\widetilde{\xi}_{i,2}\}_{i \ge 1}$ and the random vector
$$(\tau_1, \tau_2; \, \xi_{1,1}, \ldots, \xi_{\tau_1,1}; \, \xi_{1,2}, \ldots, \xi_{\tau_2,2})$$
are mutually independent.
Proof is omitted.
Lemma 8. In the conditions of Lemma 7, assume, in addition, that $\xi_{1,1} \stackrel{D}{=} \xi_{1,2}$. Then the sequence $\{\eta_n\}_{n \ge 1}$,
$$\eta_n = \begin{cases} \xi_{n,1}, & \text{if } n \le \tau_1, \\ \xi_{n - \tau_1 + \tau_2, \, 2}, & \text{if } n > \tau_1, \end{cases}$$
is i.i.d., and $\eta_n \stackrel{D}{=} \xi_{1,1}$.
Proof. We have to show that, $\forall n = 1, 2, \ldots$ and $\forall B_1, \ldots, B_n$,
$$\mathbf{P}(\eta_1 \in B_1, \ldots, \eta_n \in B_n) = \prod_{i=1}^{n} \mathbf{P}(\xi_{1,1} \in B_i).$$
1) $\forall n$, $\forall B$:
$$\mathbf{P}(\eta_n \in B) = \mathbf{P}(\xi_{n,1} \in B; \, n \le \tau_1) + \mathbf{P}(\xi_{n-\tau_1+\tau_2,2} \in B; \, n > \tau_1).$$
Here
$$\mathbf{P}(\xi_{n,1} \in B; \, n \le \tau_1) = \mathbf{P}(\xi_{n,1} \in B) - \mathbf{P}(\xi_{n,1} \in B; \, n > \tau_1) = \mathbf{P}(\xi_{1,1} \in B) - \mathbf{P}(\xi_{1,1} \in B) \, \mathbf{P}(n > \tau_1)$$
$$= \mathbf{P}(\xi_{1,1} \in B) \, \mathbf{P}(n \le \tau_1)$$
and
$$\mathbf{P}(\xi_{n-\tau_1+\tau_2,2} \in B; \, n > \tau_1) = \sum_{l=1}^{n-1} \mathbf{P}(\xi_{\tau_2+n-l,2} \in B; \, \tau_1 = l) = \sum_{l=1}^{n-1} \mathbf{P}(\widetilde{\xi}_{n-l,2} \in B; \, \tau_1 = l)$$
$$= \ldots = \mathbf{P}(\xi_{1,2} \in B) \, \mathbf{P}(\tau_1 < n).$$
Summing up and using $\xi_{1,1} \stackrel{D}{=} \xi_{1,2}$, we get $\mathbf{P}(\eta_n \in B) = \mathbf{P}(\xi_{1,1} \in B)$.
2) Problem No 6. Prove the statement for the joint distributions. Use induction arguments. $\square$
Here is another variant of a two-dimensional analogue of Lemma 3.
Lemma 9. Assume that
(i) $\zeta_n = (\xi_{n,1}, \xi_{n,2})$ is a sequence ($n = 1, 2, \ldots$) of independent random vectors;
(ii) each of $\{\xi_{n,1}\}_{n \ge 1}$ and $\{\xi_{n,2}\}_{n \ge 1}$ is an i.i.d. sequence;
(iii) $\xi_{1,1} \stackrel{D}{=} \xi_{1,2}$;
(iv) $(\tau_1, \tau_2)$ is an ST and $\tau_1 \equiv \tau_2 \equiv \tau$.
Then
$$\eta_n = \begin{cases} \xi_{n,1}, & \text{if } n \le \tau, \\ \xi_{n,2}, & \text{if } n > \tau, \end{cases}$$
is an i.i.d. sequence, and $\eta_n \stackrel{D}{=} \xi_{1,1}$.
Proof is very similar to that of Lemma 8 (omitted).
Finally, here is a further generalization of Lemma 9.
Lemma 10. In the statement of Lemma 9, replace (i) by
(i') there exist $m_1 \ge 1$, $m_2 \ge 1$ such that
$$\zeta_n = (\xi_{(n-1)m_1+1,1}, \ldots, \xi_{nm_1,1}; \ \xi_{(n-1)m_2+1,2}, \ldots, \xi_{nm_2,2})$$
is an i.i.d. sequence;
and (iv) by
(iv') $(\tau_1, \tau_2)$ is an ST,
$$\mathbf{P}(\tau_1 \in \{m_1, 2m_1, \ldots\}) = \mathbf{P}(\tau_2 \in \{m_2, 2m_2, \ldots\}) = 1,$$
and $\dfrac{\tau_1}{m_1} \equiv \dfrac{\tau_2}{m_2}$.
Then
$$\eta_n = \begin{cases} \xi_{n,1}, & \text{if } n \le \tau_1, \\ \xi_{n-\tau_1+\tau_2,2}, & \text{if } n > \tau_1, \end{cases}$$
is an i.i.d. sequence, and $\eta_n \stackrel{D}{=} \xi_{1,1}$.
Problem No 7. Prove Lemma 10.
1.9 Stationary Sequences and Processes
Discrete Time
Definition 8. (a) Let $\{\xi_n\}_{n \ge 0}$ be a sequence of r.v.'s. It is stationary if, $\forall l = 1, 2, \ldots$, $\forall\, 0 \le i_1 < i_2 < \ldots < i_l$, $\forall B_1, \ldots, B_l \in \mathcal{B}$, $\forall m = 1, 2, \ldots$,
$$\mathbf{P}(\xi_{i_1} \in B_1, \ldots, \xi_{i_l} \in B_l) = \mathbf{P}(\xi_{i_1+m} \in B_1, \ldots, \xi_{i_l+m} \in B_l). \tag{1}$$
(b) Similarly, a double-infinite sequence $\{\xi_n\}_{n=-\infty}^{\infty}$ is stationary if (1) holds $\forall m \in \mathbb{Z}$ and $\forall B_1, \ldots, B_l \in \mathcal{B}$.
Continuous Time
Definition 8'. (a) Let $\{\xi_t\}_{t \ge 0}$ be a family of r.v.'s. It is stationary if, $\forall l = 1, 2, \ldots$, $\forall\, 0 \le t_1 < t_2 < \ldots < t_l$, $\forall B_1, \ldots, B_l \in \mathcal{B}$, $\forall u \ge 0$,
$$\mathbf{P}(\xi_{t_1} \in B_1, \ldots, \xi_{t_l} \in B_l) = \mathbf{P}(\xi_{t_1+u} \in B_1, \ldots, \xi_{t_l+u} \in B_l).$$
(b) Similarly, $\{\xi_t\}_{t=-\infty}^{\infty}$ is stationary if the above equality holds $\forall u \in \mathbb{R}$ and $\forall B_1, \ldots, B_l \in \mathcal{B}$.
Definition 9. A sequence of events $\{A_n\}_{n=-\infty}^{\infty}$ is stationary if the sequence of random variables $\{I(A_n)\}_{n=-\infty}^{\infty}$ is stationary.
Assume that $\{A_n\}_{n=-\infty}^{\infty}$ is a stationary sequence of events and that $\mathbf{P}(A_0) > 0$ and $\mathbf{P}(\bigcup_{n=0}^{\infty} A_n) = 1$.
Introduce the following r.v.'s:
$$\tau_+ = \min\{n \ge 1 : I(A_n) = 1\} \equiv \min\{n \ge 1 : A_n \text{ occurs}\},$$
$$\tau_- = \min\{n \ge 1 : I(A_{-n}) = 1\},$$
$$\widehat{\tau}_+ : \quad \mathbf{P}(\widehat{\tau}_+ > n) = \mathbf{P}(\overline{A}_1 \cap \ldots \cap \overline{A}_n \mid A_0),$$
$$\widehat{\tau}_- : \quad \mathbf{P}(\widehat{\tau}_- > n) = \mathbf{P}(\overline{A}_{-1} \cap \ldots \cap \overline{A}_{-n} \mid A_0).$$
Lemma 11. (a) $\tau_+ \stackrel{D}{=} \tau_-$;
(b) $\widehat{\tau}_+ \stackrel{D}{=} \widehat{\tau}_-$;
(c) $\mathbf{P}(\tau_+ = n) = \mathbf{P}(A_0) \cdot \mathbf{P}(\widehat{\tau}_+ \ge n)$, $\forall n = 1, 2, \ldots$.
Remark 4. The statement of the lemma is not obvious, in general.
Examples: Let $\{\xi_n\}$ be an i.i.d. sequence with $\mathbf{P}(\xi_n > 0) > 0$. Then we can take a) $A_n = \{\xi_n > 0\}$; b) $A_n = \{\xi_n + \xi_{n-1} > 0\}$.
Proof of Lemma 11. Write for brevity $\tau \equiv \tau_+$ and $\widehat{\tau} \equiv \widehat{\tau}_+$.
(a) Using stationarity (first with an arbitrary shift $m$, then with $m = -n - 1$),
$$\mathbf{P}(\tau_+ > n) = \mathbf{P}(\overline{A}_1 \cap \ldots \cap \overline{A}_n) = \mathbf{P}(\overline{A}_{1+m} \cap \ldots \cap \overline{A}_{n+m}) = \mathbf{P}(\overline{A}_{-n} \cap \ldots \cap \overline{A}_{-1}) = \mathbf{P}(\tau_- > n).$$
(b)
$$\mathbf{P}(\widehat{\tau}_+ = n) = \frac{\mathbf{P}(A_0 \cap \overline{A}_1 \cap \ldots \cap \overline{A}_{n-1} \cap A_n)}{\mathbf{P}(A_0)} = \frac{\mathbf{P}(A_{-n} \cap \overline{A}_{-n+1} \cap \ldots \cap \overline{A}_{-1} \cap A_0)}{\mathbf{P}(A_0)} = \mathbf{P}(\widehat{\tau}_- = n).$$
(c)
$$\mathbf{P}(\tau \ge n) = \mathbf{P}(\overline{A}_1 \cap \ldots \cap \overline{A}_{n-1}) = \mathbf{P}(A_0 \cap \overline{A}_1 \cap \ldots \cap \overline{A}_{n-1}) + \mathbf{P}(\overline{A}_0 \cap \overline{A}_1 \cap \ldots \cap \overline{A}_{n-1})$$
$$= \mathbf{P}(A_0) \, \mathbf{P}(\overline{A}_1 \cap \ldots \cap \overline{A}_{n-1} \mid A_0) + \mathbf{P}(\overline{A}_1 \cap \ldots \cap \overline{A}_n) = \mathbf{P}(A_0) \, \mathbf{P}(\widehat{\tau} \ge n) + \mathbf{P}(\tau \ge n+1)$$
$$\Rightarrow \ \mathbf{P}(\tau = n) = \mathbf{P}(\tau \ge n) - \mathbf{P}(\tau \ge n+1) = \mathbf{P}(A_0) \, \mathbf{P}(\widehat{\tau} \ge n). \qquad \square$$
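A small simulation around Lemma 11(c) and its consequence $\mathbf{E}\widehat{\tau}_+ = 1/\mathbf{P}(A_0)$ (sum (c) over $n$), for example b) above with standard normal $\xi_n$, so $\mathbf{P}(A_0) = 1/2$ (a sketch; the empirical gaps between occurrences along a long path play the role of copies of $\widehat{\tau}_+$):

```python
import numpy as np

rng = np.random.default_rng(7)
xi = rng.normal(size=1_000_001)
A = (xi[1:] + xi[:-1]) > 0          # A_n = {xi_n + xi_{n-1} > 0}, P(A_0) = 1/2
occ = np.flatnonzero(A)             # indices n at which A_n occurs
gaps = np.diff(occ)                 # occurrence-to-occurrence return times
print("P(A_0)      ~", A.mean())    # approx 0.5
print("E hat-tau_+ ~", gaps.mean()) # approx 1 / P(A_0) = 2
# Lemma 11(c) at n = 1 is immediate: P(tau_+ = 1) = P(A_1) = P(A_0),
# and P(hat-tau_+ >= 1) = 1.
```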
Corollary 2. $\forall k > 0$: $\mathbf{E}\tau_+^k < \infty \iff \mathbf{E}\widehat{\tau}_+^{\,k+1} < \infty$.
Proof. Note that
$$\sum_{n=1}^{l} n^k \ge \int_0^l x^k \, dx = \frac{l^{k+1}}{k+1}$$
and
$$\sum_{n=1}^{l} n^k \le \int_1^{l+1} x^k \, dx \le \frac{(l+1)^{k+1}}{k+1} \le \frac{2^{k+1} \, l^{k+1}}{k+1}.$$
$$\Rightarrow \ \mathbf{E}\tau_+^k = \sum_{n=1}^{\infty} n^k \, \mathbf{P}(\tau_+ = n) = \mathbf{P}(A_0) \sum_{n=1}^{\infty} n^k \, \mathbf{P}(\widehat{\tau}_+ \ge n)$$
$$= \mathbf{P}(A_0) \sum_{n=1}^{\infty} n^k \sum_{l=n}^{\infty} \mathbf{P}(\widehat{\tau}_+ = l) = \mathbf{P}(A_0) \sum_{l=1}^{\infty} \mathbf{P}(\widehat{\tau}_+ = l) \sum_{n=1}^{l} n^k$$
$$\le \frac{\mathbf{P}(A_0)}{k+1} \, 2^{k+1} \sum_{l=1}^{\infty} \mathbf{P}(\widehat{\tau}_+ = l) \, l^{k+1} = \frac{\mathbf{P}(A_0)}{k+1} \, 2^{k+1} \, \mathbf{E}\widehat{\tau}_+^{\,k+1},$$
and, using similar arguments with the lower bound,
$$\mathbf{E}\tau_+^k \ge \frac{\mathbf{P}(A_0)}{k+1} \, \mathbf{E}\widehat{\tau}_+^{\,k+1}.$$
$\Rightarrow$ $\mathbf{E}\tau_+^k$ and $\mathbf{E}\widehat{\tau}_+^{\,k+1}$ are either finite or infinite simultaneously. $\square$
1.10 On $\sigma$-algebras generated by a sequence of r.v.'s
(1). Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a probability space and $\xi_n : \Omega \to \mathbb{R}$, $n = 1, 2, \ldots$, a sequence of r.v.'s. Let $\mathcal{F}_{[k,n]} = \sigma(\xi_k, \ldots, \xi_n)$ and $\mathcal{F}_{[k,\infty)} = \sigma(\xi_k, \xi_{k+1}, \ldots)$.
For $A, B \in \mathcal{F}$, introduce the distance
$$d(A, B) = \mathbf{P}(A \setminus B) + \mathbf{P}(B \setminus A).$$
(A) Recall basic properties of $\sigma$-algebras.
1) If $\mathcal{F}^{(1)}, \mathcal{F}^{(2)}$ are $\sigma$-algebras on $\Omega$ $\Rightarrow$ $\mathcal{F}^{(1)} \cap \mathcal{F}^{(2)}$ is a $\sigma$-algebra, too, but $\mathcal{F}^{(1)} \cup \mathcal{F}^{(2)}$ may be not, in general.
2) More generally, let $T$ be any parameter set and $\mathcal{F}^{(t)}$, $t \in T$, $\sigma$-algebras on $\Omega$ $\Rightarrow$ $\bigcap_{t \in T} \mathcal{F}^{(t)}$ is a $\sigma$-algebra, too.
By definition, $\mathcal{F}_{[1,\infty)}$ is the minimal $\sigma$-algebra which contains all the $\sigma$-algebras $\mathcal{F}_{[1,n]}$, $n = 1, 2, \ldots$; it is the intersection of all $\sigma$-algebras containing every $\mathcal{F}_{[1,n]}$, $n = 1, 2, \ldots$.
Since $\mathcal{F} \supseteq \mathcal{F}_{[1,n]}$ $\forall n$ $\Rightarrow$ $\mathcal{F}_{[1,\infty)} \subseteq \mathcal{F}$.
(B) Now we study properties of the distance $d$:
(1) Clearly, $d(A, B) = d(B, A) \ge 0$;
(2) $d(A, C) \le d(A, B) + d(B, C)$ (the triangle inequality);
Indeed, $A \setminus C \subseteq (A \setminus B) \cup (B \setminus C)$
$\Rightarrow$ $\mathbf{P}(A \setminus C) \le \mathbf{P}(A \setminus B) + \mathbf{P}(B \setminus C)$.
Similarly, $\mathbf{P}(C \setminus A) \le \mathbf{P}(B \setminus A) + \mathbf{P}(C \setminus B)$.
(3) $d(\overline{A}, \overline{B}) = d(A, B)$ (since $\overline{A} \setminus \overline{B} = B \setminus A$);
(4) $|\mathbf{P}(A) - \mathbf{P}(B)| = |\mathbf{P}(A \setminus B) + \mathbf{P}(A \cap B) - \mathbf{P}(A \cap B) - \mathbf{P}(B \setminus A)| \le d(A, B)$;
(5) $d(A_1 \cup A_2, B_1 \cup B_2) \le d(A_1, B_1) + d(A_2, B_2)$;
Indeed, $(A_1 \cup A_2) \setminus (B_1 \cup B_2) = (A_1 \setminus (B_1 \cup B_2)) \cup (A_2 \setminus (B_1 \cup B_2)) \subseteq (A_1 \setminus B_1) \cup (A_2 \setminus B_2)$
$\Rightarrow$ $\mathbf{P}((A_1 \cup A_2) \setminus (B_1 \cup B_2)) \le \mathbf{P}(A_1 \setminus B_1) + \mathbf{P}(A_2 \setminus B_2)$.
Lemma 12. $\forall A \in \mathcal{F}_{[1,\infty)}$, $\exists \{A_n\}_{n \ge 1}$, $A_n \in \mathcal{F}_{[1,n]}$: $d(A, A_n) \to 0$.
Proof. Let $\mathcal{U}$ be the set of events $A \in \mathcal{F}$ such that $\exists \{A_n\}_{n \ge 1}$, $A_n \in \mathcal{F}_{[1,n]}$: $d(A, A_n) \to 0$.
1) One can easily see that $\mathcal{U} \supseteq \mathcal{F}_{[1,m]}$ $\forall m = 1, 2, \ldots$. Indeed, $\forall m$, $\forall A \in \mathcal{F}_{[1,m]}$, let
$$A_n = \begin{cases} \emptyset, & \text{if } n < m; \\ A, & \text{if } n \ge m. \end{cases}$$
Therefore, $A \in \mathcal{U}$.
2) Thus, it is sufficient to show that $\mathcal{U}$ is a $\sigma$-algebra. Then, necessarily, $\mathcal{U} \supseteq \mathcal{F}_{[1,\infty)}$, which completes the proof.
2.1) First we prove that $\mathcal{U}$ is an algebra, i.e.
(i) $\Omega \in \mathcal{U}$;
(ii) $A \in \mathcal{U}$ $\Rightarrow$ $\overline{A} \in \mathcal{U}$;
(iii) $\forall k$, $A^{(1)}, \ldots, A^{(k)} \in \mathcal{U}$ $\Rightarrow$ $A^{(1)} \cup \ldots \cup A^{(k)} \in \mathcal{U}$.
(i) is obvious, (ii) follows from property (3), and (iii) follows from (5):
$$d(A^{(1)} \cup \ldots \cup A^{(k)}, \, A^{(1)}_n \cup \ldots \cup A^{(k)}_n) \le \sum_{j=1}^{k} d(A^{(j)}, A^{(j)}_n) \to 0.$$
2.2) Now we prove that $\mathcal{U}$ is a $\sigma$-algebra:
(iii') $A^{(1)}, A^{(2)}, \ldots \in \mathcal{U}$ $\Rightarrow$ $A \equiv \bigcup_{j=1}^{\infty} A^{(j)} \in \mathcal{U}$.
Let $B^{(k)} = \bigcup_{j=1}^{k} A^{(j)}$. Then $B^{(k)} \uparrow A$ and $\mathbf{P}(B^{(k)}) \to \mathbf{P}(A)$.
$\Rightarrow$ (by 2.1) $\exists B^{(k)}_n$: $B^{(k)}_n \in \mathcal{F}_{[1,n]}$, $d(B^{(k)}, B^{(k)}_n) \to 0$ as $n \to \infty$.
Choose
$$n(1) = \min\{n \ge 1 : d(B^{(1)}, B^{(1)}_l) \le 1/2 \ \forall l \ge n\}$$
and, for $k \ge 1$,
$$n(k+1) = \min\{n \ge n(k) : d(B^{(k+1)}, B^{(k+1)}_l) \le 1/2^{k+1} \ \forall l \ge n\}.$$
Then let
$$A_n = \begin{cases} \emptyset, & \text{if } n < n(1); \\ B^{(k)}_{n(k)}, & \text{if } n(k) \le n < n(k+1). \end{cases}$$
Clearly, $A_n \in \mathcal{F}_{[1,n]}$. Then $d(A, A_n) \le d(A, B^{(k)}) + 1/2^k$ for $n(k) \le n < n(k+1)$. Since $k \to \infty$ as $n \to \infty$, $d(A, A_n) \to 0$. $\square$
Lemma 13. Let $\{\xi_n\}_{n=-\infty}^{\infty}$ be a double-infinite sequence of r.v.'s, and let
$$\mathcal{F}_{(-\infty,\infty)} = \sigma(\ldots, \xi_{-2}, \xi_{-1}, \xi_0, \xi_1, \xi_2, \ldots).$$
Then $\forall A \in \mathcal{F}_{(-\infty,\infty)}$, $\exists \{A_n\}$, $A_n \in \mathcal{F}_{[-n,n]}$: $d(A, A_n) \to 0$.
Problem No 8. Prove Lemma 13.
(2). Sigma-algebras generated by sequences of independent r.v.'s.
Definition 10. For a sequence $\{\xi_n\}_{n \ge 1}$ of r.v.'s, its tail $\sigma$-algebra is
$$\mathcal{F}_{\infty} \equiv \bigcap_{k=1}^{\infty} \mathcal{F}_{[k,\infty)}.$$
Note: since $\mathcal{F}_{[k+1,\infty)} \subseteq \mathcal{F}_{[k,\infty)}$ $\Rightarrow$ $\mathcal{F}_{\infty} = \bigcap_{k=l}^{\infty} \mathcal{F}_{[k,\infty)}$ $\forall l$.
Definition 11. For a sequence $\{\xi_n\}_{n=-\infty}^{\infty}$,
$$\mathcal{F}^{+}_{\infty} \equiv \bigcap_{k=1}^{\infty} \mathcal{F}_{[k,\infty)} = \bigcap_{k=l}^{\infty} \mathcal{F}_{[k,\infty)}, \quad -\infty < l < \infty,$$
is its right tail $\sigma$-algebra, and
$$\mathcal{F}^{-}_{\infty} \equiv \bigcap_{k=0}^{\infty} \mathcal{F}_{(-\infty,-k]} = \bigcap_{k=l}^{\infty} \mathcal{F}_{(-\infty,-k]}, \quad -\infty < l < \infty,$$
is its left tail $\sigma$-algebra.
Examples...
Lemma 14. If $\{\xi_n\}_{n \ge 1}$ is a sequence of independent r.v.'s, then $\mathcal{F}_{\infty}$ is trivial, i.e.
$$\forall A \in \mathcal{F}_{\infty}, \quad \mathbf{P}(A) \in \{0, 1\}.$$
Proof.
1) $A$ is independent of $\mathcal{F}_{[1,n]}$ $\forall n$ (since $A \in \mathcal{F}_{[n+1,\infty)}$);
2) Since $\mathcal{F}_{\infty} \subseteq \mathcal{F}_{[1,\infty)}$, by Lemma 12 $\exists A_n \in \mathcal{F}_{[1,n]}$: $d(A_n, A) \to 0$.
Therefore,
$$\mathbf{P}(A) = \mathbf{P}(A \cap A_n) + \mathbf{P}(A \setminus A_n) = \mathbf{P}(A) \, \mathbf{P}(A_n) + \mathbf{P}(A \setminus A_n);$$
$$0 \le \mathbf{P}(A)\,[1 - \mathbf{P}(A_n)] = \mathbf{P}(A \setminus A_n) \le d(A_n, A) \to 0.$$
Since also $\mathbf{P}(A_n) \to \mathbf{P}(A)$ (by property (4)), we get $\mathbf{P}(A)\,[1 - \mathbf{P}(A)] = 0$. $\square$
Lemma 15. If $\{\xi_n\}_{n=-\infty}^{\infty}$ is a sequence of independent r.v.'s, then both $\mathcal{F}^{+}_{\infty}$ and $\mathcal{F}^{-}_{\infty}$ are trivial.
Problem No 9. Prove Lemma 15.
(3). A stationary sequence of r.v.'s.
Definition 12. A sequence $\{\xi_n\}_{n \ge 1}$ (or $\{\xi_n\}_{n=-\infty}^{\infty}$) is stationary if,
$\forall l \ge 1$, $\forall\, 1 \le n_1 < n_2 < \ldots < n_l$ (for the double-infinite case, without the restriction "$1 \le$"),
$\forall k \ge 1$ (or $-\infty < k < \infty$),
$\forall B_1, \ldots, B_l$,
$$\mathbf{P}(\xi_{n_1} \in B_1, \ldots, \xi_{n_l} \in B_l) = \mathbf{P}(\xi_{n_1+k} \in B_1, \ldots, \xi_{n_l+k} \in B_l).$$
In particular, all $\xi_n$ are identically distributed, and all finite-dimensional vectors $\zeta_n = (\xi_n, \xi_{n+1}, \ldots, \xi_{n+l})$ are identically distributed (for a fixed $l$).
Examples: 1) $\{\xi_n\}$ i.i.d.;
2) $\xi_n \equiv \xi_1$;
3) $\xi_{n+1} = -\xi_n$, where
$$\xi_1 = \begin{cases} 1, & \text{w.pr. } 1/2, \\ -1, & \text{w.pr. } 1/2. \end{cases}$$
Introduce the shift transformation $\theta$ on the set of $\mathcal{F}_{[1,\infty)}$-measurable (or $\mathcal{F}_{(-\infty,\infty)}$-measurable) r.v.'s:
1) $\theta \xi_n = \xi_{n+1}$ $\forall n$;
2) if $\eta = h(\xi_n, \xi_{n+1}, \ldots, \xi_{n+l})$, then $\theta \eta = h(\xi_{n+1}, \xi_{n+2}, \ldots, \xi_{n+l+1})$;
3) if $\eta = h(\ldots, \xi_n, \xi_{n+1}, \ldots)$, then $\theta \eta = h(\ldots, \xi_{n+1}, \xi_{n+2}, \ldots)$.
Note: if the underlying sequence is stationary, $\theta$ is measure-preserving, i.e. $\theta \eta \stackrel{D}{=} \eta$.
Introduce also a shift transformation on events from $\mathcal{F}_{[1,\infty)}$ (or from $\mathcal{F}_{(-\infty,\infty)}$):
$$A \in \mathcal{F}_{[1,\infty)} \iff I(A) \text{ is } \mathcal{F}_{[1,\infty)}\text{-measurable} \iff \exists h : \ I(A) = h(\xi_1, \xi_2, \ldots),$$
where $h$ is $\{0,1\}$-valued. Then
$$\theta A = \{h(\xi_2, \xi_3, \ldots) = 1\}, \quad \text{i.e.} \quad I(\theta A) = \theta I(A) = h(\xi_2, \xi_3, \ldots).$$
For any $m$, introduce $\theta^m = \underbrace{\theta \circ \ldots \circ \theta}_{m}$. In the case of $\mathcal{F}_{(-\infty,\infty)}$, we can introduce $\theta^{-m}$, too. Finally, $\theta^0$ is the identity transformation.
Definition 13. An $\mathcal{F}_{[1,\infty)}$-measurable (or $\mathcal{F}_{(-\infty,\infty)}$-measurable) r.v. $\eta$ is invariant (w.r.to $\theta$) if
$$\theta \eta = \eta \ \text{a.s.} \quad (\text{i.e. } \mathbf{P}(\theta \eta = \eta) = 1).$$
An event $A \in \mathcal{F}_{[1,\infty)}$ (or $A \in \mathcal{F}_{(-\infty,\infty)}$) is invariant (w.r.to $\theta$) if
$$\mathbf{P}(A \,\triangle\, \theta A) = 0 \quad (\text{equivalently, } d(A, \theta A) = 0).$$
Note that $\eta = \theta \eta$ a.s. $\iff$ $\forall x$, the event $\{\eta \le x\}$ is invariant.
Comments, examples...
Definition 14. A stationary sequence $\{\xi_n\}$ is ergodic (w.r.to $\theta$) if, $\forall A \in \mathcal{F}_{[1,\infty)}$ (or $A \in \mathcal{F}_{(-\infty,\infty)}$),
$$A \text{ is invariant} \ \Rightarrow \ \mathbf{P}(A) \in \{0, 1\}$$
(or: $\eta$ is invariant $\Rightarrow$ $\eta = \mathrm{const}$ a.s.).
Remark 5. All invariant events (sets) form a $\sigma$-algebra $\mathcal{F}^{(inv)}$ (the invariant $\sigma$-algebra).
Lemma 16. Let $\{\xi_n\}$ be stationary.
(1) $\forall A \in \mathcal{F}_{[1,\infty)}$ (or $A \in \mathcal{F}_{(-\infty,\infty)}$), the sequence of events $\{\theta^n A, \, n \ge 0\}$ (or $\{\theta^n A, \, n \in \mathbb{Z}\}$) is stationary;
(2) If $\{\xi_n\}$ is stationary ergodic, then $\forall A \in \mathcal{F}_{[1,\infty)}$ (or $A \in \mathcal{F}_{(-\infty,\infty)}$) with $\mathbf{P}(A) > 0$,
$$\mathbf{P}\Big(\bigcup_{n=l}^{\infty} \theta^n A\Big) = 1 \ \ \forall l \qquad \Big(\text{and } \mathbf{P}\Big(\bigcup_{n=l}^{\infty} \theta^{-n} A\Big) = 1 \ \ \forall l\Big).$$
Proof. (1) follows from the definitions.
(2) Let $B = \bigcup_{n=l}^{\infty} \theta^n A$. Then
$$\theta B = \bigcup_{n=l}^{\infty} \theta^{n+1} A = \bigcup_{n=l+1}^{\infty} \theta^n A \subseteq B$$
$\Rightarrow$ $\mathbf{P}(B \setminus \theta B) = \mathbf{P}(B) - \mathbf{P}(\theta B) = 0$ $\Rightarrow$ $B$ is invariant
$\Rightarrow$ $\mathbf{P}(B) \in \{0, 1\}$.
But $\mathbf{P}(B) \ge \mathbf{P}(\theta^l A) = \mathbf{P}(A) > 0$ $\Rightarrow$ $\mathbf{P}(B) = 1$. $\square$
Lemma 17. If $A$ is invariant, then $\exists B \in \mathcal{F}_{\infty}$ such that $d(A, B) = 0$.
Proof. There are two cases: (a) $A \in \mathcal{F}_{[1,\infty)}$; (b) $A \in \mathcal{F}_{(-\infty,\infty)}$. Here we give a proof in the first case.
Problem No 10. Prove the lemma in the case (b).
1) Let $B_{0,m} = A \cap \theta A \cap \theta^2 A \cap \ldots \cap \theta^m A$ and $B_0 = \bigcap_{n=0}^{\infty} \theta^n A$. Then
$$A = B_{0,0} \supseteq B_{0,1} \supseteq \ldots \supseteq B_{0,m} \supseteq B_{0,m+1} \supseteq \ldots \supseteq B_0$$
and $\mathbf{P}(B_{0,m}) \downarrow \mathbf{P}(B_0)$. But $\mathbf{P}(B_{0,m}) = \mathbf{P}(A)$ $\forall m$ (since $d(A, \theta^n A) = 0$ $\forall n$, by invariance) $\Rightarrow$ $\mathbf{P}(B_0) = \mathbf{P}(A)$ and $d(B_0, A) = 0$.
2) For $k \ge 1$, put $B_k = \theta^k B_0 = \bigcap_{n=k}^{\infty} \theta^n A$.
Note that $B_{k+1} \supseteq B_k$ and $B_k \in \mathcal{F}_{[k,\infty)}$,
$$\mathbf{P}(B_k) = \mathbf{P}(B_0) = \mathbf{P}(A) \quad \text{and} \quad d(B_k, A) = 0.$$
Let
$$B = \lim_{k} B_k \equiv \bigcup_{k} B_k \ \Rightarrow \ \mathbf{P}(B) = \mathbf{P}(A) \ \text{ and } \ d(B, A) = 0.$$
Since $B = \bigcup_{j \ge k} B_j \in \mathcal{F}_{[k,\infty)}$ $\forall k$ $\Rightarrow$ $B \in \mathcal{F}_{\infty}$. $\square$
Remark 6. In the case $\mathcal{F}_{(-\infty,\infty)}$, the symmetric statement is true, too: if $A$ is invariant, then $\exists B \in \mathcal{F}^{-}_{\infty}$ such that $d(A, B) = 0$.
Corollary 3. Any i.i.d. sequence is stationary ergodic.
Indeed, $\mathcal{F}_{\infty}$ is trivial $\Rightarrow$ if $A$ is invariant, then $\exists B \in \mathcal{F}_{\infty}$ with $\mathbf{P}(B) \in \{0, 1\}$ and $d(A, B) = 0$ $\Rightarrow$ $\mathbf{P}(A) \in \{0, 1\}$.
Remark 7. There exists a number of weaker conditions that imply the triviality of the tail $\sigma$-algebra $\mathcal{F}_{\infty}$ and, as a corollary, the ergodicity of a stationary sequence.
For instance, we can introduce the following mixing coefficients:
$$d_k = \sup_{B \in \mathcal{F}_{[k,\infty)}, \ A \in \mathcal{F}_{(-\infty,0]}} |\mathbf{P}(A \cap B) - \mathbf{P}(A) \, \mathbf{P}(B)|,$$
and then show that if $d_k \to 0$ as $k \to \infty$, then $\mathcal{F}_{\infty}$ is trivial.
In general, there are examples where $\mathcal{F}_{\infty}$ is not trivial, but $\mathcal{F}_{inv}$ is (i.e. the sequence is ergodic).
Example: $\xi_{n+1} = -\xi_n$ $\forall n$, where
$$\xi_1 = \begin{cases} 1, & \text{w.pr. } 1/2, \\ -1, & \text{w.pr. } 1/2. \end{cases}$$
Then
$$\mathcal{F}_{\infty} = \sigma(\xi_1), \qquad \mathcal{F}_{inv} = \{\emptyset, \Omega\}.$$