Section 2: Martingales
Contents
2.1 Introduction
2.2 Definitions and Basic Properties
2.3 State-dependent Martingales
2.4 The Optional Sampling Theorem
2.5 The Martingale Convergence Theorem
2.6 Martingales for Markov Chains
2.7 Exponential Martingales
2.8 Nonnegative Martingales and Change-of-measure
2.9 The Doob-Meyer Decomposition for Supermartingales
2.10 Martingales in Continuous Time
2.1 Introduction
A fundamental tool in the analysis of Markov processes is the notion of a martingale. Martingales also form the basis of the modern theory of stochastic integration. The general theory is phrased in terms of integration against arbitrary square-integrable martingales (rather than integration against standard Brownian motion, as will be the emphasis in this course).
Definition 2.2.1 We say that the sequence X = (Xn : n ≥ 0) is adapted to the sequence Z =
(Zn : n ≥ 0) if, for each n ≥ 0, there exists a (deterministic) function gn (·) such that
Xn = gn (Z0 , Z1 , . . . , Zn ).
Similarly, we say that the process X = (X(t) : t ≥ 0) is adapted to the process Z = (Z(t) : t ≥ 0) if, for each t ≥ 0, there exists a (deterministic) function gt (·) such that
X(t) = gt (Z(u) : 0 ≤ u ≤ t).
Remark 2.2.2 The notion of adaptedness, as we have introduced it above, is not quite rigorous
from a measure-theoretic probability viewpoint. In the language of measure-theoretic probability,
the precise statement that (Xn : n ≥ 0) is adapted to (Zn : n ≥ 0) comes down to requiring that
for each n ≥ 0, Xn is Fn -measurable, where Fn is the sigma-algebra generated by Z0 , . . . , Zn
(typically denoted as σ(Z0 , . . . , Zn )). However, the assertion that Xn is Fn -measurable is known
to be equivalent to asserting that Xn can be expressed as a measurable function of Z0 , . . . , Zn . In
other words, the assumption of Fn -measurability is essentially identical to what we are requiring in
our definition. We use our version of the definition because we are not requiring measure-theoretic
probability as a prerequisite for this class, and because no serious mathematical or computational
errors are likely to arise in typical applications of the material covered in this class as a result of
our more intuitive (but not completely rigorous) definition.
It should be further noted that a common mathematical terminology for adaptedness is to state
that (Xn : n ≥ 0) is adapted to the filtration (Zn : n ≥ 0). Similarly, adaptedness of (X(t) : t ≥ 0)
to (Z(t) : t ≥ 0) is typically phrased, in measure-theoretic probability texts, as a requirement that
(X(t) : t ≥ 0) is adapted to the filtration (Ft : t ≥ 0), where Ft = σ(Z(s) : 0 ≤ s ≤ t).
Definition 2.2.2 We say that the sequence (Mn : n ≥ 0) is a martingale (adapted to (Zn : n ≥ 0)) if:
1. (Mn : n ≥ 0) is adapted to (Zn : n ≥ 0);
2. E |Mn | < ∞ for n ≥ 0;
3. E [Mn+1 |Z0 , . . . , Zn ] = Mn for n ≥ 0.
Exercise 2.2.1 Prove that a mean-zero random walk is a martingale (adapted to (Sn : n ≥ 0)).
Exercise 2.2.2 Prove that if (Mn : n ≥ 0) is a martingale adapted to (Zn : n ≥ 0), then
(Mn : n ≥ 0) is a martingale adapted to (Mn : n ≥ 0).
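The defining property E [Sn+1 |Z0 , . . . , Zn ] = Sn for the mean-zero ±1 random walk can be checked by exhaustive enumeration. A minimal sketch (the fair-coin increments and the small horizon n ≤ 3 are illustrative choices, not part of the exercise):

```python
from itertools import product

def conditional_mean_next(prefix):
    """E[S_{n+1} | Z_1, ..., Z_n] for the +/-1 walk with p = 1/2,
    given the observed increments `prefix`."""
    s = sum(prefix)                      # current position S_n
    return 0.5 * (s + 1) + 0.5 * (s - 1)

# The conditional mean of the next position equals the current position
# on every path: the defining martingale identity, checked exhaustively.
for n in range(4):
    for prefix in product([-1, 1], repeat=n):
        assert conditional_mean_next(prefix) == sum(prefix)
```
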
Not surprisingly, it is also useful to model the notion of a favorable and unfavorable game. We say that (Mn : n ≥ 0) is a supermartingale (adapted to (Zn : n ≥ 0)) if conditions 1 and 2 hold and E [Mn+1 |Z0 , . . . , Zn ] ≤ Mn for n ≥ 0 (an "unfavorable" game), and a submartingale if instead E [Mn+1 |Z0 , . . . , Zn ] ≥ Mn for n ≥ 0 (a "favorable" game).
Remark 2.2.3 From a terminology standpoint, it perhaps comes as a surprise that the "unfavorable" supermartingale carries the prefix "super", whereas the "favorable" submartingale carries the prefix "sub". This terminology has to do with the deep connection between martingale theory and the classical area of mathematics known as "potential theory"; see Section 3.2 for additional details.
Remark 2.2.4 In the setting of measure-theoretic probability, it is common to use the more compact notation E [X|Ft ] in place of E [X|Z(u) : 0 ≤ u ≤ t], where Ft = σ(Z(u) : 0 ≤ u ≤ t). In view of this notation, condition 3 above would be written as the requirement that
E [M (t + s)|Ft ] = M (t)
for s, t ≥ 0.
Let
Dj = Mj − Mj−1
be the j'th martingale difference (for j ≥ 1), and note that we can write a discrete-time martingale as
Mn = M0 + D1 + D2 + · · · + Dn . (2.2.1)
Proposition 2.2.1 Let (Mn : n ≥ 0) be a martingale adapted to (Zn : n ≥ 0). Then:
1. E [Dj |Z0 , . . . , Zj−1 ] = 0 for j ≥ 1;
2. E Dj = 0 for j ≥ 1 and E Mn = E M0 .
If, in addition, Mn is square-integrable for n ≥ 0, then:
3. E [Di Dj ] = 0 for i ≠ j.
E [Dj |Z0 , . . . , Zj−1 ] = E [Mj − Mj−1 |Z0 , . . . , Zj−1 ] = E [Mj |Z0 , . . . , Zj−1 ] − Mj−1 = 0;
2. follows immediately from 1. For 3., suppose, without loss of generality, that i < j. Then
E [Di Dj ] = E [Di E [Dj |Z0 , . . . , Zj−1 ]] = 0.
We conclude this section with a result that hints at the connections between martingales and stochastic integration.
Proposition 2.2.2 Let M = (Mn : n ≥ 0) be a martingale adapted to (Zn : n ≥ 0), and let
(φn : n ≥ 0) be a sequence of bounded rv’s that is adapted to (Zn : n ≥ 0). Then
M̃n = Σ_{j=1}^{n} φ_{j−1} (Mj − Mj−1 )
is a martingale adapted to (Zn : n ≥ 0).
Proof: Note that
E [M̃n+1 − M̃n |Z0 , . . . , Zn ] = φn E [Mn+1 − Mn |Z0 , . . . , Zn ] = 0
for n ≥ 0, verifying the fact that (M̃n : n ≥ 0) is a martingale adapted to (Zn : n ≥ 0). The rest of the proof is left to the reader.
Remark 2.2.5 Note that M̃n is a discrete-time analogue to a continuous-time stochastic integral
of the form
∫_0^t φ(s) dM (s).
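Proposition 2.2.2 and Remark 2.2.5 can be illustrated numerically: the discrete stochastic integral of a bounded adapted sequence against a martingale again has constant (here zero) mean. A sketch, with the fair ±1 walk and a sign-betting strategy as illustrative choices:

```python
from itertools import product

def transform_mean(n):
    """Exact E[M~_n], where M~_n = sum_{j=1}^n phi_{j-1} (M_j - M_{j-1}),
    M is the fair +/-1 random walk, and phi_j = +/-1 is the (bounded,
    adapted) strategy 'bet on the sign of the current position'."""
    total = 0.0
    for steps in product([-1, 1], repeat=n):
        m, mt = 0, 0.0
        for z in steps:
            phi = 1.0 if m >= 0 else -1.0   # depends only on the past
            mt += phi * z                   # phi_{j-1} * (M_j - M_{j-1})
            m += z
        total += mt
    return total / 2 ** n

# E[M~_n] = E[M~_0] = 0 for every n, as Proposition 2.2.2 predicts.
assert all(abs(transform_mean(n)) < 1e-12 for n in range(1, 8))
```

No betting strategy that looks only at the past can tilt the mean of a fair game, which is exactly the content of the proposition.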
2.3 State-dependent Martingales
Martingales play a key role in the analysis of Markov chains and Markov processes. We start
by considering the class of “state-dependent” processes of the form Mn = f (Xn ), where X = (Xn :
n ≥ 0) is an S-valued Markov chain with stationary transition probabilities.
Proposition 2.3.1 Suppose that f is such that Ex |f (Xn )| < ∞ for n ≥ 0 and x ∈ S. Then:
1. (f (Xn ) : n ≥ 0) is a martingale adapted to (Xn : n ≥ 0) if P f = f ;
2. (f (Xn ) : n ≥ 0) is a supermartingale adapted to (Xn : n ≥ 0) if P f ≤ f ;
3. (f (Xn ) : n ≥ 0) is a submartingale adapted to (Xn : n ≥ 0) if P f ≥ f .
so it follows that Ex f (Xn ) ≤ f (x) for n ≥ 0. Hence, the integrability follows immediately when f
is non-negative and satisfies P f ≤ f .
Exercise 2.3.1 Let X = (X(t) : t ≥ 0) be a Markov jump process living on finite state space S and possessing rate matrix Q. Prove that:
1. (f (X(t)) : t ≥ 0) is a martingale adapted to (X(t) : t ≥ 0) if Qf = 0;
2. (f (X(t)) : t ≥ 0) is a supermartingale adapted to (X(t) : t ≥ 0) if Qf ≤ 0;
3. (f (X(t)) : t ≥ 0) is a submartingale adapted to (X(t) : t ≥ 0) if Qf ≥ 0.
We, of course, expect to see analogous results in the setting of SDE's. In particular, suppose that X = (X(t) : t ≥ 0) satisfies the SDE
dX(t) = µ(X(t)) dt + σ(X(t)) dB(t).
The heuristic argument of Section 1 suggests that if f is bounded and twice continuously differen-
tiable, then
Ex [f (X(t))] = f (x) + t(L f )(x) + o(t)
as t → 0, where L is the second-order linear differential operator given by
L = µ(x) d/dx + (σ^2 (x)/2) d^2/dx^2 .
This, in turn, indicates that (f (X(t)) : t ≥ 0) will be a martingale adapted to (X(t) : t ≥ 0) if
L f = 0, and a supermartingale (submartingale) if L f ≤ 0 (L f ≥ 0).
In the special case that µ ≡ 0 and σ = I, then X = B and we are led to the conclusion that
(f (B(t)) : t ≥ 0) is a martingale (adapted to (B(t) : t ≥ 0)) if
∆f = 0 (2.3.1)
where
∆ = Σ_{i=1}^{d} ∂^2/∂x_i^2
is the d-dimensional Laplacian.
2.4 The Optional Sampling Theorem
If (Mn : n ≥ 0) is a martingale, then
E Mn = E M0 (2.4.1)
for n ≥ 0. It is natural to wonder about the degree to which (2.4.1) can be extended to a random
time T , namely to an identity of the form
E MT = E M0 . (2.4.2)
The theory of “optional sampling” provides conditions under which (2.4.2) is valid.
Before describing the relevant theory, let us illustrate how (2.4.2) can be used to compute expectations and probabilities of interest.
Example 2.4.1 Let S = (Sn : n ≥ 0) be the symmetric nearest-neighbor random walk on Z with S0 = 0, and let T = inf{n ≥ 0 : Sn = −a or Sn = b} for integers a, b > 0. Since (Sn : n ≥ 0) is a martingale, (2.4.2) suggests that
E ST = 0. (2.4.3)
But
Example 2.4.2 Consider once again the nearest-neighbor random walk of Example 2.4.1, but with P {Zi = 1} = p = 1 − P {Zi = −1}. Suppose S0 = x ∈ Z+ and T = inf{n ≥ 0 : Sn = 0}. Our goal is to compute E T . (This computation was also considered in Section 2.2, where "first transition" analysis was used to derive a linear system of equations from which E T could be calculated.) Note that (Sn − n(2p − 1) : n ≥ 0) is a martingale adapted to (Sn : n ≥ 0). Assuming that (2.4.2) is valid,
we find that
E [ST − T (2p − 1)] = x.
But E ST = 0, so
E T (1 − 2p) = x. (2.4.4)
The equation (2.4.4) clearly makes no sense if p ≥ 1/2. On the other hand, if p < 1/2, it suggests that
E T = x/(1 − 2p).
Note that if p > 1/2, the system has a tendency to drift to infinity, so we expect that T = ∞ with
positive probability in that case. So the restriction to the case p < 1/2 is a reasonable one.
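The identity E T = x/(1 − 2p) can be checked by simulation; the parameters below (x = 3, p = 0.3, and the sample size) are illustrative choices:

```python
import random

def mean_hitting_time(x, p, paths=20000, seed=7):
    """Monte Carlo estimate of E[T], where T is the first time the
    nearest-neighbor walk started at x hits 0 and P{Z = +1} = p.
    Only meaningful for p < 1/2 (otherwise T = infinity with
    positive probability)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(paths):
        s, t = x, 0
        while s > 0:
            s += 1 if rng.random() < p else -1
            t += 1
        total += t
    return total / paths

est = mean_hitting_time(x=3, p=0.3)
exact = 3 / (1 - 2 * 0.3)        # = 7.5, from (2.4.4)
assert abs(est - exact) < 0.5
```
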
The above constraint on p suggests that optional sampling is not universally valid. In Example 2.4.2, the difficulty is that when p > 1/2, T is not finite-valued a.s., so that MT makes no sense. But even when T is finite-valued, optional sampling need not be valid.
Example 2.4.3 Consider the symmetric (p = 1/2) nearest-neighbor random walk with S0 = 0, and put T = inf{n ≥ 0 : Sn = 1}. Here, T is finite-valued a.s. and ST = 1, so that if (2.4.2) were valid, we would arrive at the contradiction
1 = E ST = E S0 = 0,
and hence optional sampling must fail for this stopping time.
Example 2.4.4 Let S = (Sn : n ≥ 0) be a random walk on Z for which P {Zi = 0} = r and
P {Zi = −1} = P {Zi = 1} = (1 − r)/2. Put
Mn = Sn2 − nσ 2 ,
where σ 2 = var Zi = (1 − r). We claim that (Mn : n ≥ 0) is a martingale adapted to (Sn : n ≥ 0).
The key is to note that
Put T1 = inf{n ≥ 0 : Sn = 1} and T2 = inf{n ≥ 0 : |Sn | = 1}; both T1 and T2 are a.s. finite-valued.
If S0 = 0, (2.4.2) implies that
E [S_{Ti}^2 − Ti (1 − r)] = 0,
i.e. (since S_{Ti}^2 = 1) E Ti = (1 − r)^{−1} for i = 1, 2.
However, T2 ≤ T1 and P {T2 < T1 } > 0, so it is impossible that both E T1 and E T2 equal (1 − r)^{−1} . Hence, optional sampling cannot be applied to this martingale at both of the times T1 and T2 . This raises the question of which identity E Ti = (1 − r)^{−1} (if any) is correct.
Recalling the fact that a martingale is a mathematical idealization of a gambler’s fair game, it
seems reasonable that the optional sampling theorem should require that the random time T be
non-clairvoyant.
Definition 2.4.1 We say that a random time T ∈ Z+ is a stopping time that is adapted to
(Zn : n ≥ 0) if the sequence of rv’s (I {T ≤ n} : n ≥ 0) is adapted to (Zn : n ≥ 0). A random time
T ∈ R+ is a stopping time that is adapted to (Z(t) : t ≥ 0) if the process (I {T ≤ t} : t ≥ 0) is
adapted to (Z(t) : t ≥ 0).
Remark 2.4.1 An equivalent way of asserting that T ∈ Z+ is a stopping time adapted to (Zn :
n ≥ 0) is to require that (I {T = n} : n ≥ 0) is adapted to (Zn : n ≥ 0).
For a, b ∈ R, let a ∨ b = max{a, b} and a ∧ b = min{a, b}.
2. Prove that if T = inf{n ≥ 0 : Zn ∈ A} (i.e. T is the “first hitting time” of A), then T is a
stopping time adapted to (Zn : n ≥ 0).
3. Prove that if T1 and T2 are stopping times adapted to (Zn : n ≥ 0), then T1 ∨ T2 , T1 ∧ T2 and T1 + T2 are stopping times adapted to (Zn : n ≥ 0).
We are now ready to state and prove the basic optional sampling theorem.
We illustrate the use of optional sampling on the examples discussed earlier in this section.
Hence, if E |Z1 | < ∞ and E T < ∞, (2.4.6) follows from (2.4.7). Returning to the specific problem context of Example 2.4.2, it is a standard fact that E T < ∞ when p < 1/2, proving (2.4.4) under this restriction on p.
E [S_{T2∧n}^2 ] → E [S_{T2}^2 ]
as n → ∞. Hence
1 = E [S_{T2}^2 ] = (1 − r) E T2 ,
proving that E T2 = (1 − r)^{−1} . The optional sampling theorem clearly must therefore fail for T1 .
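The conclusion E T2 = (1 − r)^{−1} can also be seen directly: starting from 0, T2 is simply the first time the lazy walk takes a nonzero step, so T2 is Geometric(1 − r). A quick numerical check (r = 0.5 is an illustrative choice):

```python
def expected_T2(r, nmax=10000):
    """Exact E[T_2] via the geometric law P{T_2 = n} = r**(n-1) * (1-r):
    from S_0 = 0, T_2 is the first time the lazy walk moves at all."""
    return sum(n * r ** (n - 1) * (1 - r) for n in range(1, nmax))

r = 0.5   # illustrative laziness parameter
assert abs(expected_T2(r) - 1 / (1 - r)) < 1e-9
```
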
2.5 The Martingale Convergence Theorem
Perhaps surprisingly, the definition of a martingale imposes a great deal of structure on the "path regularity" of the process. As an example, we start with the maximal inequality (which controls the maximal behavior of the path).
Theorem 2.5.1 Let (Mn : n ≥ 0) be a submartingale adapted to (Zn : n ≥ 0). Then, for x > 0,
P { max_{0≤k≤n} Mk > x } ≤ E M_n^+ / x.
Proof: Let τ = inf{j ≥ 0 : Mj > x}, and note that P (max_{0≤k≤n} Mk > x) = P (τ ≤ n). Since Mτ /x > 1 on {τ ≤ n},
P (τ ≤ n) ≤ E [Mτ I {τ ≤ n}] / x = Σ_{j=0}^{n} E [Mj I {τ = j}] / x.
But
E [Mn I {τ = j} |Z0 , . . . , Zj ] ≥ I {τ = j} Mj ,
so it follows that
P (τ ≤ n) ≤ Σ_{j=0}^{n} E [Mn I {τ = j}] / x = E [Mn I {τ ≤ n}] / x ≤ E M_n^+ / x.
The result is then an immediate consequence of the fact that (Mn : n ≥ 0) is a submartingale.
Corollary 2.5.1 Let (Mn : n ≥ 0) be a martingale adapted to (Zn : n ≥ 0) for which E |Mn |^p < ∞ for p ≥ 1. Then
P { max_{0≤k≤n} |Mk | > x } ≤ E |Mn |^p / x^p . (2.5.1)
A further example of "path regularity" is the "upcrossing inequality" for submartingales. For a < b, let Un [a, b] be the number of times that (Mj : 0 ≤ j ≤ n) crosses from below a to above b. Then
E Un [a, b] ≤ (E M_n^+ + |a|) / (b − a).
See Billingsley for a proof.
A similar downcrossing inequality also holds. These upcrossing/downcrossing inequalities establish that submartingales typically cannot exhibit "oscillatory discontinuities" (since an infinite number of oscillations implies that there must exist a < b for which infinitely many upcrossings occur). Hence, submartingales can fail to converge almost surely only if they diverge to infinity. This divergence can be prevented through an appropriate assumption on the submartingale/supermartingale.
The Martingale Convergence Theorem asserts that if (Mn : n ≥ 0) is a submartingale for which sup_{n≥0} E M_n^+ < ∞, then there exists a finite-valued rv M∞ such that
Mn → M∞ a.s.
as n → ∞. In particular, if (Mn : n ≥ 0) is a nonnegative supermartingale, then
Mn → M∞ a.s.
as n → ∞ for some finite-valued rv M∞ .
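The contrast between a.s. convergence and convergence of expectations can be illustrated with a product martingale; the increment law below ({0.5, 1.5} with equal probability, so E Y = 1) is an illustrative choice:

```python
import random

def product_martingale_path(n, seed=0):
    """One path of M_n = Y_1 * ... * Y_n with iid Y_i uniform on
    {0.5, 1.5}, so that E[Y_i] = 1 and (M_n) is a nonnegative
    martingale with E[M_n] = 1 for every n."""
    rng = random.Random(seed)
    m = 1.0
    for _ in range(n):
        m *= 0.5 if rng.random() < 0.5 else 1.5
    return m

# Each path converges a.s. (here: to 0, since E[log Y] = 0.5*log(0.75) < 0),
# even though E[M_n] = 1 for all n -- so M_n -> 0 a.s. but not in L^1.
assert all(product_martingale_path(5000, seed=s) < 1e-40 for s in range(5))
```
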
The following variant of the Martingale Convergence Theorem follows from an elementary argument. Recall that for a rv W , ‖W ‖_p = (E |W |^p )^{1/p} for p ≥ 1.
Suppose that (Mn : n ≥ 0) is a square-integrable martingale for which sup_{n≥0} E M_n^2 < ∞. Then, there exists a rv M∞ such that
‖Mn − M∞ ‖_2 → 0
as n → ∞.
Proof: For m > n, the orthogonality of the martingale differences yields
‖Mm − Mn ‖_2^2 = Σ_{j=n+1}^{m} E D_j^2 ,
which can be made arbitrarily small by choosing n large enough, on account of the fact that
sup_{n≥0} E M_n^2 = E M_0^2 + Σ_{j=1}^{∞} E D_j^2 < ∞.
The Martingale Convergence Theorem implies, for example, that if X = (Xn : n ≥ 0) is a recurrent Markov chain and f is a bounded function satisfying P f = f , then f must be constant.
Proof: Note that (f (Xn ) : n ≥ 0) is a martingale that is adapted to (Xn : n ≥ 0). Since f is bounded, the Martingale Convergence Theorem applies. So, there exists a finite-valued rv M∞ such that
f (Xn ) → M∞ a.s. (2.5.2)
as n → ∞. But if f is non-constant, there exist x, y ∈ S such that f (x) < f (y). Since (Xn : n ≥ 0) is recurrent, evidently
lim inf_{n→∞} f (Xn ) ≤ f (x)
and
lim sup_{n→∞} f (Xn ) ≥ f (y)
a.s., contradicting (2.5.2). Hence f must be constant.
2.6 Martingales for Markov Chains
Proposition 2.6.1 Let Y be an integrable rv (i.e. E |Y | < ∞). Then, for any sequence (Zn : n ≥ 0),
Mn = E [Y |Z0 , . . . , Zn ]
is a martingale adapted to (Zn : n ≥ 0).
Definition 2.6.1 A sequence of rv's (Wn : n ≥ 0) is said to be uniformly integrable if for each ε > 0, there exists x = x(ε) such that
sup_{n≥0} E [|Wn |I {|Wn | > x}] < ε.
We are now ready to discuss the connection between martingales and linear systems in the
Markov setting through the following illustrations of the general principle.
Finite-horizon expectations of the form Ex [f (Xn )]: Fix n and put
Y = f (Xn ),
and apply Proposition 2.6.1. It follows that Mj = Ex [f (Xn )|X0 , . . . , Xj ] is a martingale adapted to (Xj : 0 ≤ j ≤ n) for 0 ≤ j ≤ n. But
So, if one can guess a good candidate function u, Problem (2.6.3) provides bounds on the corresponding expectation.
Infinite-horizon discounted expectations of the form Ex [Σ_{j=0}^{∞} e^{−αj} f (Xj )]: For α > 0 and f bounded, let u∗ (x) = Ex [Σ_{j=0}^{∞} e^{−αj} f (Xj )]. Put
Y = Σ_{j=0}^{∞} e^{−αj} f (Xj ).
So,
Mn = Σ_{j=0}^{n−1} e^{−αj} f (Xj ) + e^{−αn} u∗ (Xn )
is a martingale adapted to (Xn : n ≥ 0). The u∗ (x)’s satisfy a linear system described in Chapter 2.
Expected cumulative reward of the form Ex [Σ_{j=0}^{T−1} f (Xj )]: For a suitable random time T , let u∗ (x) = Ex [Σ_{j=0}^{T−1} f (Xj )] and
Y = Σ_{j=0}^{T−1} f (Xj ).
If (u∗ (x) : x ∈ S) is finite-valued, Proposition 2.6.1 implies that Mn = E [Y |X0 , . . . , Xn ] is a martingale adapted to (Xn : n ≥ 0). Here,
E [Y |X0 , . . . , Xn ] = Σ_{j=0}^{(T∧n)−1} f (Xj ) + u∗ (X_{T∧n} ),
so
Mn = Σ_{j=0}^{(T∧n)−1} f (Xj ) + u∗ (X_{T∧n} ) (2.6.1)
is a martingale adapted to (Xn : n ≥ 0). Again, u∗ = (u∗ (x) : x ∈ S) satisfies a linear system as
described earlier.
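The linear system satisfied by u∗ can be solved directly for a small example. The sketch below uses symmetric gambler's ruin on {0, . . . , 4} with f ≡ 1 (illustrative choices), for which u∗ (x) = Ex T = x(4 − x) is known in closed form:

```python
# u*(x) = E_x[ sum_{j=0}^{T-1} f(X_j) ] solves (I - P)u = f on the
# transient states, with u = 0 on the absorbing states {0, 4}.

def solve_linear(A, b):
    """Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [a - k * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

p = 0.5
transient = [1, 2, 3]
A = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for i, x in enumerate(transient):
    for j, y in enumerate(transient):
        if y == x + 1:
            A[i][j] -= p          # step up with probability p
        if y == x - 1:
            A[i][j] -= 1 - p      # step down with probability 1 - p
u = solve_linear(A, [1.0, 1.0, 1.0])   # f = 1 on the transient states

assert all(abs(u[i] - x * (4 - x)) < 1e-9 for i, x in enumerate(transient))
```
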
1. Put
u∗ (x) = Ex [∫_0^T f (X(s)) ds].
Note that the process defined by (2.6.1) is exactly the process (M̃_{T∧n} : n ≥ 0), where
M̃n = Σ_{j=0}^{n−1} f (Xj ) + u∗ (Xn ).
In view of the Optional Sampling Theorem, this suggests that the process (M̃n : n ≥ 0) is a
martingale. This is justified by our next result.
Proposition 2.6.2 Let X = (Xn : n ≥ 0) be a Markov chain on discrete state space S with transition matrix P . Suppose that Ex |g(Xn )| < ∞ for n ≥ 0. Then, there exists a function f (namely, f = (P − I)g) such that
Mn = g(Xn ) − Σ_{k=0}^{n−1} f (Xk ) (2.6.2)
is a martingale adapted to (Xn : n ≥ 0).
Proof: Note that Ex |(P g)(Xn )| ≤ Ex [(P |g|)(Xn )] = Ex |g(Xn+1 )| < ∞, so that the f (Xn )'s are integrable rv's with respect to the distribution Px . Observe that
Ex [Mn+1 |X0 , . . . , Xn ] = (P g)(Xn ) − Σ_{k=0}^{n} f (Xk ) = g(Xn ) − Σ_{k=0}^{n−1} f (Xk ) = Mn ,
The fact that processes of the form (2.6.2) are martingales adapted to (Xn : n ≥ 0) actually
forces X to be Markov.
Proposition 2.6.3 Let X = (Xn : n ≥ 0) be a sequence of S-valued rv’s. Suppose that for each
bounded function g : S → R, there exists a function f (denoted Ag) such that
n−1
X
Mn = g(Xn ) − f (Xk )
k=0
is a martingale adapted to (Xn : n ≥ 0). Then, X is Markov with stationary transition probabilities.
Proof: Because f (Xn ) = g(Xn+1 ) − g(Xn ) + Mn − Mn+1 , the f (Xn )'s are integrable rv's. If Dn = Mn − Mn−1 , recall that E [Dn+1 |X0 , . . . , Xn ] = 0 (since Dn+1 is a martingale difference). So
In other words,
E [g(Xn+1 )|X0 , . . . , Xn ] = g(Xn ) + f (Xn ).
Specializing to the case where g(y) = δz (y), we conclude that
P {Xn+1 = z|X0 , . . . , Xn } = δz (Xn ) + (Aδz )(Xn ),
which is a deterministic function of Xn alone, proving that X is Markov with stationary transition probabilities.
Remark 2.6.2 Proposition 2.6.3 establishes that X is Markov with stationary transition probabilities. So, if S is discrete, X possesses a one-step transition matrix P . One choice for f = Ag is
f = (P − I)g.
To apply the Dynkin martingale to the analysis of the rv Σ_{k=0}^{n−1} f (Xk ) and obtain the identity (2.6.3), we therefore need to find a function g̃ for which
(P − I)g̃ = f. (2.6.4)
Remark 2.6.3 When |S| < ∞, it is evident that (P − I) is a singular matrix (since 1 is an eigenvalue of a stochastic matrix). So (2.6.4) cannot be solvable for all functions f . Suppose, in particular, that X is irreducible. Then, there exists a unique stationary distribution π satisfying π = πP . Pre-multiplying through (2.6.4) by π establishes that πf = 0. So, (2.6.4) is then solvable only when πf = 0.
The next result establishes a probabilistic representation for the solution to (2.6.4). Let fc (x) =
f (x) − πf for x ∈ S.
Proposition 2.6.4 Let X = (Xn : n ≥ 0) be a finite-state irreducible Markov chain with transition matrix P . If there exists a solution g to
(P − I)g = −fc , (2.6.5)
then
Mn = g(Xn ) + Σ_{k=0}^{n−1} fc (Xk )
is a martingale adapted to (Xn : n ≥ 0).
Proof: Let τ (x) = inf{n ≥ 1 : Xn = x}. The Optional Sampling Theorem yields the identity Ex M_{τ(x)∧n} = Ex M0 = g(x). So,
g(x) = Ex [g(X_{τ(x)∧n} )] + Ex [Σ_{k=0}^{(τ(x)∧n)−1} fc (Xk )]. (2.6.6)
Since X is positive recurrent, τ (x) < ∞ a.s. and Ex τ (x) < ∞. Since fc and g are bounded, the Dominated Convergence Theorem guarantees that
Ex [g(X_{τ(x)∧n} )] → Ex [g(X_{τ(x)} )] = g(x)
and
Ex [Σ_{k=0}^{(τ(x)∧n)−1} fc (Xk )] → Ex [Σ_{k=0}^{τ(x)−1} fc (Xk )]
where c = g(0).
Remark 2.6.4 Note that if g solves (2.6.5), then so does g − c, for any c ∈ R. Hence, the
proposition above establishes that if there exists a solution g to (2.6.5), one choice of solution is
gz (x) = Ex [Σ_{k=0}^{τ(z)−1} fc (Xk )].
The solution gz (·) is the solution that vanishes at z. (Recall that because πfc = 0, Ez [Σ_{k=0}^{τ(z)−1} fc (Xk )] = 0.)
Our final result of this section establishes a converse to the previous proposition.
Proposition 2.6.5 Let X = (Xn : n ≥ 0) be a finite-state irreducible Markov chain with transition matrix P , and fix z ∈ S. Then
g∗ (x) = Ex [Σ_{k=0}^{τ(z)−1} fc (Xk )] (2.6.7)
is a solution of (P − I)g = −fc (so Poisson's equation always has a solution for such finite-state chains).
Proof: Because X is positive recurrent and fc is bounded, g∗ as defined through (2.6.7) is finite-valued. By conditioning on X1 , we find that for x ∈ S,
g∗ (x) = fc (x) + Σ_y P (x, y)g∗ (y),
where we have used the fact that g∗ (z) = 0.
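The regenerative representation can be checked numerically on a two-state chain. The sketch below fixes the reference state z = 0 and verifies that g solves Poisson's equation in the form (P − I)g = −fc (sign conventions for Poisson's equation vary; compare (2.6.5)); the chain parameters and f are illustrative choices:

```python
# Two-state chain: P(0->1) = a, P(1->0) = b (illustrative values), and
#   g(x) = E_x[ sum_{k=0}^{tau(0)-1} fc(X_k) ],  tau(0) = hitting time of
# the reference state z = 0.  We verify (P - I)g = -fc.
a, b = 0.3, 0.6
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]        # stationary distribution
f = [1.0, 4.0]
pif = pi[0] * f[0] + pi[1] * f[1]
fc = [f[0] - pif, f[1] - pif]          # centered version of f

# Starting from 1, the number of visits to 1 before hitting 0 is
# Geometric(b), with mean 1/b; starting from 0 the sum is empty.
g = [0.0, fc[1] / b]

for x in range(2):
    Pg_minus_g = sum(P[x][y] * g[y] for y in range(2)) - g[x]
    assert abs(Pg_minus_g + fc[x]) < 1e-12
```
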
Problem 2.6.6 Let X = (X(t) : t ≥ 0) be a finite-state irreducible Markov jump process with rate matrix Q.
1. Prove that
M (t) = g(X(t)) − ∫_0^t (Qg)(X(s)) ds
is a martingale adapted to (X(t) : t ≥ 0).
2. Assuming that the Optional Sampling Theorem generalizes to continuous time, prove that Qg = −fc always has a solution given by
g(x) = Ex [∫_0^{τ(z)} fc (X(s)) ds],
where z ∈ S is fixed and τ (z) is the hitting time of z.
Note that the state-dependent martingales discussed previously are all special cases of the
Dynkin martingales developed in this section.
2.7 Exponential Martingales
Let S = (Sn : n ≥ 0) be a random walk with iid increments Z1 , Z2 , . . .. We have seen that:
• Sn − n E Z1 is a martingale adapted to (Sn : n ≥ 0), provided that E |S0 | < ∞ and E |Z1 | < ∞;
• (Sn − n E Z1 )^2 − n var(Z1 ) is a martingale adapted to (Sn : n ≥ 0), provided that E S0^2 and E Z1^2 are finite.
(Actually, we proved the second assertion in Example 2.4.4 only in that special case, but the
proof easily extends to the more general case as stated above.) It turns out that the two above
martingales, involving Sn and Sn2 , are the first two martingales in an infinite sequence of such
random walk martingales, in which the k’th martingale involves Snk .
The source of this infinite family of martingales is the so-called “exponential martingale” for
random walk. Assume that the moment generating function of the Zi ’s converges in a neighborhood
of the origin, so that
E eθZ1 < ∞ (2.7.1)
for θ in a neighborhood of the origin. The domain D of θ-values for which (2.7.1) holds then takes
the form [−a, b], (−a, b], [−a, b) or (−a, b) for some a, b > 0. Note that this covers most commonly
encountered increment distributions, but not all. In particular, if Z1 has “power tail” type decay
of the form P {Z1 > x} ∼ cx−α as x → ∞ for some c > 0, then (2.7.1) fails. So, (2.7.1) goes
hand-in-hand with the requirement that the rv Z1 is “light-tailed” (in the sense that its tails decay
at least exponentially fast).
Let
ψ(θ) = log E eθZ1
be the logarithmic moment generating function of Z1 (also known as the cumulant generating
function of Z1 ).
Let D0 = (−a, b) be the interior of D, and for θ ∈ D put
Mn (θ) = e^{θ(Sn −S0 )−nψ(θ)} ,
the exponential martingale associated with the random walk. It is well known that ψ is infinitely differentiable on D0 . Also, for each θ ∈ D0 , θ + h ∈ D0 and θ − h ∈ D0 for h sufficiently small, so that (Mn (θ) : n ≥ 0), (Mn (θ + h) : n ≥ 0) and (Mn (θ − h) : n ≥ 0) are all martingales adapted to (Sn : n ≥ 0). Since any linear combination of two identically adapted martingales is a martingale, it follows that
( (Mn (θ + h) − Mn (θ))/h : n ≥ 0 )
is a martingale adapted to (Sn : n ≥ 0) for h sufficiently small. Letting h → 0 (the interchange of limit and conditional expectation can be justified), we conclude that (Mn′ (θ) : n ≥ 0) is a martingale adapted to (Sn : n ≥ 0). Since Mn (·) is infinitely differentiable on D0 , a similar argument to that just used for the first derivative proves that for each θ ∈ D0 and k ≥ 1, (Mn^{(k)} (θ) : n ≥ 0) is a martingale adapted to (Sn : n ≥ 0), where Mn^{(k)} (θ) is the k'th derivative (with respect to θ) of Mn (·).
Note that
ψ^{(1)} (θ) = (d/dθ) E e^{θZ1} / E e^{θZ1} = E [Z1 e^{θZ1} ] / E e^{θZ1} ,
so that ψ^{(1)} (0) = E Z1 . A similar calculation shows that ψ^{(2)} (0) = var(Z1 ). It then easily follows that
Sn − n E Z1 = Mn^{(1)} (0)
and
(Sn − n E Z1 )^2 − n var(Z1 ) = Mn^{(2)} (0),
so that we recover the two random walk martingales discussed earlier in this section via the first two derivatives of the exponential martingale (evaluated at 0). A martingale involving Sn^k is then obtained from the exponential martingale by differentiating k times (and setting θ = 0); this martingale will involve the first k moments of Z1 .
In addition to generating this useful family of martingales, the exponential martingale is itself
valuable for many computations. As an illustration, consider the special case where the Zi ’s ∈
{−1, 1}, so that P {Zi = 1} = p = 1 − P {Zi = −1} (and hence S is a nearest-neighbor random walk). Suppose that S0 = 0, and put T = inf{n ≥ 0 : Sn = −a or Sn = b} for integers a, b > 0.
We will use the optional sampling theorem to compute P {ST = −a} and P {ST = b}. Since we are
interested in the location ST (and not the exit time T ), it seems intuitively convenient to try to
choose a value θ that will eliminate the contribution of T ψ(θ) to the exponent of (2.7.3) at the exit
time T . Since the choice of θ is at our disposal in (2.7.3), this suggests choosing θ so that ψ(θ) = 0
or, equivalently,
E eθZ1 = 1. (2.7.4)
One possible choice is θ = 0. But if we choose θ = 0 in (2.7.3), we get the trivial martingale
Mn (0) = 1 from which we can learn nothing. Furthermore, there is a second, non-zero, value of θ,
call it θ∗ , satisfying (2.7.4) when p ≠ 1/2, namely
θ∗ = log((1 − p)/p).
With this θ∗ in hand, the optional sampling theorem yields the identity
E e^{θ∗ S_{T∧n}} = 1.
But T < ∞ a.s. and exp(θ∗ S_{T∧n} ) ≤ exp(|θ∗ |(a ∨ b)). So, the Bounded Convergence Theorem proves that
E e^{θ∗ ST } = 1,
from which it follows that
P {ST = −a} = (e^{θ∗ b} − 1)/(e^{θ∗ b} − e^{−θ∗ a} ),
P {ST = b} = (1 − e^{−θ∗ a} )/(e^{θ∗ b} − e^{−θ∗ a} ).
Hence, the exponential martingale is a very useful tool in explicitly computing exit probabilities for this example. For more general increment distributions, explicit computation will typically be impossible (in closed form), but (tight) bounds can often be computed.
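The exit-probability formulas can be checked by simulation; p, a, b, and the sample size below are illustrative choices:

```python
import math
import random

def exit_prob_b_mc(p, a, b, paths=20000, seed=11):
    """Monte Carlo estimate of P{S_T = b}: run the nearest-neighbor walk
    from 0 until it exits (-a, b), and count exits at b."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(paths):
        s = 0
        while -a < s < b:
            s += 1 if rng.random() < p else -1
        hits += (s == b)
    return hits / paths

p, a, b = 0.4, 3, 3                      # illustrative parameters
theta = math.log((1 - p) / p)            # the nonzero root of psi(theta) = 0
exact = (1 - math.exp(-theta * a)) / (math.exp(theta * b) - math.exp(-theta * a))
est = exit_prob_b_mc(p, a, b)
assert abs(est - exact) < 0.02
```
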
We have seen that if (Sn : n ≥ 0) is a random walk with iid increments Z1 , Z2 , . . . having a
finite moment generating function in a neighborhood of the origin, then Sn − n E Z1 is a martingale
adapted to (Sn : n ≥ 0) and is one in an infinite family of such martingales. We now show that
the key exponential martingale for random walks generalizes in a suitable way to Markov random
walks (in which the increments are Markov-dependent).
Specifically, let X = (Xn : n ≥ 0) be a finite state irreducible Markov chain and suppose
f : S → R is a real-valued function. Set
Sn = f (X0 ) + · · · + f (Xn−1 ).
As in the setting of an ordinary random walk, the starting point is to consider the moment generating function
φn (θ, x) = Ex e^{θSn} .
Note that
Ex [e^{θSn} I {Xn = y}] = Σ_{x1 ,...,xn−1} e^{θf(x)+θ Σ_{j=1}^{n−1} f(xj )} P (x, x1 ) · · · P (xn−1 , y)
= Σ_{x1 ,...,xn−1} G(θ, x, x1 )G(θ, x1 , x2 ) · · · G(θ, xn−1 , y)
= G^n (θ, x, y),
where G(θ) = (G(θ, x, y) : x, y ∈ S) is the matrix having (x, y)'th entry given by G(θ, x, y) = e^{θf(x)} P (x, y). Hence
φn (θ, x) = (G^n (θ)e)(x),
where e = (1, 1, . . . , 1)^T is the column vector in which all the entries are equal to 1. It follows that if we write
φn (θ, x) ≈ e^{nψ(θ)} ,
we expect that e^{ψ(θ)} will be the largest eigenvalue of G(θ). To proceed further, we need the following result due to Perron and Frobenius:
Theorem 2.7.1 Let G = (G(x, y) : x, y ∈ S) be an irreducible (i.e. for each x, y ∈ S, there exists
n ≥ 1 such that Gn (x, y) > 0) matrix with non-negative entries and |S| < ∞. Then, the largest
eigenvalue λ (in absolute value) is positive, and its corresponding row and column eigenvectors have strictly positive entries.
Let λ(θ) be the “Perron-Frobenius eigenvalue” of G(θ) and let h(θ) = (h(θ, x) : x ∈ S) be its
associated column eigenvector. By analogy with ordinary random walk, we expect eθSn −nψ(θ) to be
(approximately) a martingale, where ψ(θ) = log(λ(θ)).
We now verify that
Mn (θ) = e^{θSn −nψ(θ)} h(θ, Xn )/h(θ, X0 )
is indeed a martingale adapted to (Xn : n ≥ 0).
Proof:
Ex [Mn+1 (θ)|X0 , . . . , Xn ] = (e^{θSn −nψ(θ)} /h(θ, X0 )) Ex [e^{θf(Xn )−ψ(θ)} h(θ, Xn+1 )|X0 , . . . , Xn ]
= (e^{θSn −nψ(θ)} /h(θ, X0 )) e^{−ψ(θ)} Σ_y e^{θf(Xn )} P (Xn , y)h(θ, y)
= (e^{θSn −nψ(θ)} /h(θ, X0 )) e^{−ψ(θ)} (G(θ)h(θ))(Xn )
= e^{θSn −nψ(θ)} h(θ, Xn )/h(θ, X0 ) = Mn (θ),
since G(θ)h(θ) = e^{ψ(θ)} h(θ).
Thus, the exponential martingale for Markov random walk takes the same form as in the ordinary random walk setting, except that the martingale must incorporate a state-dependent component given by h(θ, Xn ).
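For a two-state chain, the Perron-Frobenius quantities are available in closed form, and the one-step identity behind the martingale property can be verified directly. The chain, f , and θ below are illustrative choices:

```python
import math

# Build G(theta, x, y) = exp(theta * f(x)) * P(x, y) for a 2-state chain,
# extract its Perron-Frobenius eigenvalue lambda(theta) and positive right
# eigenvector h(theta) in closed form, and check the one-step identity
#   sum_y exp(theta*f(x) - psi(theta)) P(x, y) h(theta, y) = h(theta, x),
# which is exactly what makes M_n(theta) a martingale.
P = [[0.7, 0.3], [0.4, 0.6]]
f = [1.0, -2.0]
theta = 0.5

G = [[math.exp(theta * f[x]) * P[x][y] for y in range(2)] for x in range(2)]
tr = G[0][0] + G[1][1]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
lam = (tr + math.sqrt(tr * tr - 4 * det)) / 2   # Perron eigenvalue
h = [G[0][1], lam - G[0][0]]                    # positive right eigenvector
psi = math.log(lam)

for x in range(2):
    one_step = sum(math.exp(theta * f[x] - psi) * P[x][y] * h[y]
                   for y in range(2))
    assert abs(one_step - h[x]) < 1e-12
```
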
As for ordinary random walks, it is of interest to consider the martingales that are obtained
through successive differentiation of Mn (θ) (with respect to θ). The irreducibility of P guarantees
that the eigenvalue λ(θ) has multiplicity one for each θ. It follows that λ(θ) and h(θ) are infinitely
differentiable in θ. Differentiating the equation G(θ)h(θ) = eψ(θ) h(θ) with respect to θ, we get
Σ_y e^{θf(x)} P (x, y)(f (x)h(θ, y) + h′ (θ, y)) = e^{ψ(θ)} (ψ′ (θ)h(θ, x) + h′ (θ, x)). (2.7.5)
Setting θ = 0 (and noting that ψ(0) = 0 and h(0, ·) ≡ 1), (2.7.5) becomes
f (x) + (P h′ (0))(x) = ψ′ (0) + h′ (0, x). (2.7.6)
Pre-multiplying through (2.7.6) by π, we find that ψ′ (0) = πf . Hence, h′ (0) solves Poisson's equation
(P − I)g = −fc ,
where fc (x) = f (x) − πf .
Note that
Mn′ (θ) = Mn (θ) [ Sn − nψ′ (θ) + h′ (θ, Xn )/h(θ, Xn ) − h′ (θ, X0 )/h(θ, X0 ) ],
so that, taking θ = 0, writing g = h′ (0), and using Mn (0) = 1,
Mn′ (0) = Sn − nψ′ (0) + g(Xn ) − g(X0 ) = Σ_{j=0}^{n−1} fc (Xj ) + g(Xn ) − g(X0 ),
which (up to the additive constant g(X0 ))
is precisely the Dynkin martingale obtained earlier in the chapter. So, the first derivative of the exponential martingale for Markov random walk recovers the Dynkin martingale. Additional (useful) martingales can be obtained through successive differentiation of this exponential martingale.
The exponential martingale for Markov random walk can be applied, as in the ordinary random
walk setting, to compute bounds on exit probabilities of the form P {ST ≤ −a} and P {ST ≥ b},
where T = inf{n ≥ 0 : Sn ≤ −a or Sn ≥ b}.
Problem 2.7.2 Let X = (X(t) : t ≥ 0) be an irreducible finite-state Markov jump process with
rate matrix Q. Put
S(t) = ∫_0^t f (X(s)) ds.
1. Use the Perron-Frobenius theorem to prove that there exists a positive function h(θ) = (h(θ, x) : x ∈ S) and a constant ψ(θ) such that
M (θ, t) = e^{θS(t)−tψ(θ)} h(θ, X(t))/h(θ, X(0))
is a martingale adapted to (X(t) : t ≥ 0).
2. Suppose πf > 0 and {x : f (x) < 0} is non-empty. For r > 0, let Tr = inf{t ≥ 0 : S(t) ≤ −ar or S(t) ≥ br}. Use the above exponential martingale to prove that there exists γ such that
(1/r) log P {S(Tr ) ≤ −ar} → γ
as r → ∞.
2.8 Nonnegative Martingales and Change-of-measure
Remarkably, something special happens in the setting of a martingale. Specifically, suppose that (Mn : n ≥ 0) is a non-negative martingale that is adapted to (Zn : n ≥ 0) for which E M0 = 1. Set
Pn {·} = E [I {·} Mn ] .
Prove that if (Mn : n ≥ 0) is adapted to (Zn : n ≥ 0), then (Mn : n ≥ 0) is a martingale adapted
to (Zn : n ≥ 0).
In view of (2.8.1), the Kolmogorov extension theorem guarantees the existence of a single
probability P̃ on Ω (not depending on n) such that
P̃{(Z0 , . . . , Zn ) ∈ ·} = Pn {(Z0 , . . . , Zn ) ∈ ·}
for n ≥ 0, so that Ẽ[fn (Z0 , . . . , Zn )] = E [fn (Z0 , . . . , Zn )Mn ], where Ẽ[·] is the expectation operator corresponding to P̃. Furthermore, if T is a stopping time adapted to (Zn : n ≥ 0), then
Ẽ[fn (Z0 , . . . , Zn )I {T = n}] = E [fn (Z0 , . . . , Zn )Mn I {T = n}] (2.8.2)
for any non-negative function fn . Summing over n in (2.8.2), we arrive at the following important identity:
ẼfT (Z0 , . . . , ZT )I {T < ∞} = E fT (Z0 , . . . , ZT )MT I {T < ∞} . (2.8.3)
Hence, the expectation of any functional of the path of Z up to a stopping time T can be
computed in terms of an expectation defined in terms of P̃.
Returning to the random walk setting, recall that
Mn (θ) = e^{θ(Sn −S0 )−nψ(θ)}
is a unit mean positive martingale that is adapted to (Sn : n ≥ 0). If Pθ is the probability on Ω induced by the martingale (Mn (θ) : n ≥ 0), then the increments have distribution
Pθ {Zi ∈ dz} = P {Z1 ∈ dz} e^{θz−ψ(θ)} . (2.8.5)
In other words, Z1 , Z2 , . . . are iid under Pθ with common distribution given by (2.8.5), so that
(Sn : n ≥ 0) continues to be an (ordinary) random walk under Pθ (but with a modified increment
distribution).
Observe that if θ > 0, the distribution (2.8.5) favors positive values of Z relative to negative
values of Z (as compared to P {Z ∈ ·}). This implies that if Eθ [·] is the expectation operator
corresponding to Pθ , then Eθ Z > E Z when θ > 0 (and Eθ Z < E Z when θ < 0). In fact, ψ′ (θ) = Eθ Z and ψ″ (θ) = varθ Z for θ ∈ D0 , as the following computation shows.
Proof: Note that ψ′ (θ) = φ′ (θ)/φ(θ). An easy application of the Dominated Convergence Theorem shows that when θ ∈ D0 ,
(d^k/dθ^k) E e^{θZ} = E [(d^k/dθ^k) e^{θZ} ] = E [Z^k e^{θZ} ],
and hence
ψ′ (θ) = E [Z e^{θZ} ]/φ(θ) = E [Z e^{θZ−ψ(θ)} ] = Eθ Z.
Similarly,
ψ″ (θ) = φ″ (θ)/φ(θ) − (φ′ (θ)/φ(θ))^2 = E [Z^2 e^{θZ} ]/φ(θ) − (Eθ Z)^2 = Eθ Z^2 − (Eθ Z)^2 = varθ Z.
Note that if Z is not deterministic, then varθ Z > 0, so ψ is strictly convex on D0 . It follows
that ψ 0 (·) is strictly increasing, so that Eθ Z is strictly increasing in θ on D0 .
Example 2.8.1 Suppose that the Zi 's are iid Norm(µ, σ^2 ) rv's under P. Then, ψ(θ) = θµ + θ^2 σ^2 /2.
It follows from (2.8.3) and (2.8.4) that if T is a stopping time adapted to (Zn : n ≥ 0), then
Pθ {(Z1 , . . . , ZT ) ∈ ·, T < ∞} = E [I {(Z1 , . . . , ZT ) ∈ ·, T < ∞} e^{θ(ST −S0 )−T (θµ+θ^2 σ^2 /2)} ]
and
P {(Z1 , . . . , ZT ) ∈ ·, T < ∞} = Eθ [I {(Z1 , . . . , ZT ) ∈ ·, T < ∞} e^{−θ(ST −S0 )+T (θµ+θ^2 σ^2 /2)} ].
Furthermore, the Zi 's are iid under Pθ with common increment distribution given by
Pθ {Zi ∈ dz} = P {Z1 ∈ dz} e^{θz−(θµ+θ^2 σ^2 /2)} = (1/√(2πσ^2 )) e^{−(z−µ)^2 /(2σ^2 )+θz−θµ−θ^2 σ^2 /2} dz = (1/√(2πσ^2 )) e^{−(z−µ−θσ^2 )^2 /(2σ^2 )} dz,
so the Zi 's are iid Norm(µ + θσ^2 , σ^2 ) rv's under Pθ . Hence, the change-of-measure Pθ has added θσ^2 into the mean of the Zi 's as computed under P. So,
P {(Z1 + θσ^2 , Z2 + θσ^2 , . . . , Zn + θσ^2 ) ∈ ·} = E [I {(Z1 , . . . , Zn ) ∈ ·} e^{θ(Sn −S0 )−n(θµ+θ^2 σ^2 /2)} ]. (2.8.6)
In particular, if µ = 0, (2.8.6) shows that one can (in principle) reduce the computation of a probability or expectation involving Gaussian random walks with drift to a corresponding expectation computation involving a driftless (i.e. µ = 0) Gaussian random walk.
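The change-of-measure identity (2.8.6) is the basis of importance sampling for Gaussian random walks: sampling under Pθ and reweighting by e^{−θ(Sn −S0 )+nψ(θ)} gives an unbiased estimator of rare-event probabilities under P. A sketch for the driftless case (n, c, the tilt θ = c/n, and the sample size are illustrative choices):

```python
import math
import random

def tail_prob_is(n, c, paths=20000, seed=3):
    """Estimate P{S_n > c} for a driftless Gaussian walk (Z_i iid N(0,1))
    by sampling Z_i iid N(theta, 1) under P_theta and reweighting by
    exp(-theta*S_n + n*psi(theta)), with psi(theta) = theta^2/2."""
    rng = random.Random(seed)
    theta = c / n          # tilt so that the rare event becomes typical
    total = 0.0
    for _ in range(paths):
        s = sum(rng.gauss(theta, 1.0) for _ in range(n))
        if s > c:
            total += math.exp(-theta * s + n * theta * theta / 2)
    return total / paths

n, c = 10, 6.0                                   # illustrative parameters
exact = 0.5 * math.erfc(c / math.sqrt(2 * n))    # S_n ~ N(0, n)
est = tail_prob_is(n, c)
assert abs(est - exact) / exact < 0.15
```
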
For ε > 0,
(1/n) log P {Sn > n(µ + ε)} → −ε^2 /(2σ^2 )
as n → ∞. (This is a classical "large deviations" result. One way to prove this is by using the change-of-measure Pθ under which the Zi 's have mean µ + ε.)
The above exponential change-of-measure makes clear that T = ∞ cannot generally be substituted in (2.8.3) or (2.8.4). To see this, note that if we set A = {n^{−1} Sn → E Z1 as n → ∞}, then the strong law of large numbers (SLLN) guarantees that P {A} = 1. On the other hand, the SLLN under Pθ ensures that Pθ {n^{−1} Sn → ψ′ (θ) as n → ∞} = 1, so Pθ {A^c } = 1 for θ ≠ 0. The fact that P{A} = 1 but Pθ {A} = 0 is inconsistent with (2.8.3) and (2.8.4) holding for T = ∞. One can
further verify that
(1/n) log Mn (θ) → θ E Z1 − ψ(θ) < 0 a.s.
under P as n → ∞ (for θ ≠ 0), so M∞ (θ) = 0 a.s. under P for this class of martingales.
Remark 2.8.1 In the language of measure-theoretic probability, Pθ and P are singular on the σ-algebra generated by (Zn : n ≥ 0) but mutually absolutely continuous on the σ-algebra generated by (Zn : 0 ≤ n ≤ T), where T is a finite-valued stopping time adapted to the Zn 's.
Turning next to the exponential martingales associated with Markov random walks, we recall that

$$M_n(\theta) = e^{\theta S_n - n\psi(\theta)}\, \frac{h(\theta, X_n)}{h(\theta, X_0)}$$

is a unit mean positive martingale adapted to (Xn : n ≥ 0). Let Pθ{·} and Eθ(·) be the probability and expectation operators on Ω induced by the martingale (Mn(θ) : n ≥ 0). Note that

$$P_\theta\{X_{n+1} = y \mid X_0, \ldots, X_n\} = K(\theta, X_n, y),$$
where K(θ) = (K(θ, x, y) : x, y ∈ S) is the stochastic matrix with (x, y)'th entry given by

$$K(\theta, x, y) = P(x, y)\, e^{\theta f(x) - \psi(\theta)}\, \frac{h(\theta, y)}{h(\theta, x)}.$$

In other words, X continues to be a Markov chain under Pθ (with stationary transition probabilities), but with modified transition matrix given by K(θ).
Problem 2.8.3 Prove that if X is irreducible with finite state space, then

$$\psi'(\theta) = \sum_x \pi(\theta, x) f(x),$$

where π(θ) = (π(θ, x) : x ∈ S) is the stationary distribution of K(θ).
Problem 2.8.4 Consider the exponential martingales constructed in Problem 2.7.1 for Markov
jump processes. If Pθ is the change-of-measure induced by (M (θ, t) : t ≥ 0), prove that X
continues to be a Markov jump process under Pθ , and compute its modified rate matrix.
We conclude this section with a brief discussion of a further useful class of nonnegative martingales that induce associated change-of-measures. Let X = (Xn : n ≥ 0) be a discrete state space Markov chain and set T = inf{n ≥ 0 : Xn ∈ A ∪ B}, where A and B are disjoint subsets of S. Suppose that Px{T < ∞} = 1 for each x ∈ S, and set

$$u^*(x) = P_x\{X_T \in A\}.$$

If u is a bounded solution of (2.8.7) subject to the boundary conditions u ≡ 1 on A and u ≡ 0 on B, then Mn = u(X_{T∧n})/u(X_0) is a unit mean martingale under Px, so that

$$E_x M_n = E_x\left[\frac{u(X_{T\wedge n})}{u(x)}\right] = 1$$

and hence

$$P_x\{X_T \in A,\ T \le n\} + E_x\left[u(X_n) I\{T > n\}\right] = u(x).$$

Sending n → ∞ and applying the Bounded Convergence Theorem yields the conclusion that u = u∗. In other words, any bounded solution to (2.8.7) (subject to the boundary conditions on A and B) is the probabilistically meaningful solution.
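A minimal numerical sketch of this (a hypothetical gambler's-ruin example, not from the notes): for the simple symmetric walk on {0, ..., N} with A = {N} and B = {0}, the harmonic equation reads u(x) = (u(x−1) + u(x+1))/2 on the interior, and the bounded solution is u*(x) = Px{XT = N} = x/N:

```python
def hitting_prob(N, sweeps=20000):
    """For a simple symmetric random walk on {0, ..., N} absorbed at 0 and N,
    solve the harmonic equation u(x) = (u(x-1) + u(x+1)) / 2 on the interior,
    with boundary conditions u(0) = 0 (the set B) and u(N) = 1 (the set A),
    by Gauss-Seidel sweeps."""
    u = [0.0] * (N + 1)
    u[N] = 1.0
    for _ in range(sweeps):
        for x in range(1, N):
            u[x] = 0.5 * (u[x - 1] + u[x + 1])
    return u

u = hitting_prob(10)
# compare with the exact bounded solution u*(x) = x / N
err = max(abs(u[x] - x / 10) for x in range(11))
```

The iteration converges because the harmonic system with these boundary conditions has the unique bounded solution that the argument above identifies with the hitting probability.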
But (Mn : n ≥ 0) also induces a change-of-measure, call it P̃. In particular, for fT (·) nonnegative,

$$\tilde{E}_x\, f_T(X_0, \ldots, X_T) = E_x\!\left[f_T(X_0, \ldots, X_T)\, \frac{u(X_T)}{u(X_0)}\right] = \frac{E_x\!\left[f_T(X_0, \ldots, X_T)\, I\{X_T \in A\}\right]}{P_x\{X_T \in A\}} = E_x\!\left[f_T(X_0, \ldots, X_T) \mid X_T \in A\right],$$

where Ẽx[·] is the expectation operator associated with P̃{·|X0 = x}. In other words, the change-of-measure induced by (Mn : n ≥ 0) is precisely the conditional distribution of X, given that XT ∈ A.
Furthermore, on {n < T},

$$\tilde{P}\{X_{n+1} = y \mid X_0, \ldots, X_n\} = P(X_n, y)\, \frac{u(y)}{u(X_n)},$$

so that (Xn : 0 ≤ n < T) is, conditional on XT ∈ A, a Markov chain with modified transition matrix having (x, y)'th entry given by P(x, y)u(y)/u(x) for x ∈ Ac ∩ Bc.
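Continuing the hypothetical gambler's-ruin illustration: with u(x) = x/N for the simple symmetric walk conditioned to hit N before 0, the conditioned transition probabilities P(x, y)u(y)/u(x) can be written down explicitly, and one can check directly that they form a stochastic matrix and that the conditioned walk acquires an upward drift:

```python
def conditioned_transition(N):
    """Transition probabilities of the simple symmetric walk on {0, ..., N}
    conditioned to hit N before 0 (the transform with u(x) = x/N):
    P~(x, y) = P(x, y) * u(y) / u(x) for interior states x."""
    Pt = {}
    for x in range(1, N):
        Pt[x] = {x - 1: 0.5 * (x - 1) / x, x + 1: 0.5 * (x + 1) / x}
    return Pt

Pt = conditioned_transition(10)
rows_ok = all(abs(sum(Pt[x].values()) - 1.0) < 1e-12 for x in Pt)
up_drift = all(Pt[x][x + 1] > 0.5 for x in Pt)  # conditioned walk drifts upward
```

Note in particular that P̃(1, 0) = 0: the conditioned chain can never step into B, as it must, since the conditioning event {XT ∈ A} forbids it.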
2.9 The Doob-Meyer Decomposition for Supermartingales

This suggests the possibility of writing M as the sum of a martingale (that models a fair game) and a decreasing process. Such a decomposition is due to Doob and Meyer (and is a non-trivial fact in continuous time; see Section 3.11).
Definition 2.9.1 We say that the sequence V = (Vn : n ≥ 0) is predictable (with respect to (Zn : n ≥ 0)) if (Vn+1 : n ≥ 0) is adapted to (Zn : n ≥ 0) (i.e. for each n ≥ 1, there exists a (deterministic) function gn such that Vn = gn(Z0, . . . , Zn−1)).
Remark 2.9.1 Note that the sequence (Γn : n ≥ 0) constructed in the proof of Theorem 2.9.1 is
a predictable sequence.
Problem 2.9.1 Suppose that (Mn : n ≥ 0) is a supermartingale adapted to (Zn : n ≥ 0). Suppose that

$$M_n = \tilde{M}_n + \tilde{\Gamma}_n,$$

where (M̃n : n ≥ 0) is a martingale adapted to (Zn : n ≥ 0) with M̃0 = M0 and (Γ̃n : n ≥ 1) is a predictable sequence (with respect to (Zn : n ≥ 0)). Show that

$$\tilde{M}_n = M_0 + \sum_{i=1}^n \left(M_i - E[M_i \mid Z_0, \ldots, Z_{i-1}]\right),$$

$$\tilde{\Gamma}_n = \sum_{i=1}^n \left(E[M_i \mid Z_0, \ldots, Z_{i-1}] - M_{i-1}\right).$$

(Hence, the decomposition can be viewed as being unique in the above sense.)
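A quick pathwise check of these formulas on a hypothetical example: take Mn = Sn for a Gaussian random walk with negative drift μ, so that E[Mi | Z0, ..., Zi−1] = Mi−1 + μ, and the formulas of Problem 2.9.1 give M̃n = Mn − nμ and Γ̃n = nμ (predictable and decreasing, since μ < 0):

```python
import random

def doob_decomposition(path, mu):
    """Pathwise Doob decomposition M_n = M~_n + G_n for the supermartingale
    M_n = S_n, a random walk whose iid increments have known mean mu < 0.
    Since E[M_i | Z_0, ..., Z_{i-1}] = M_{i-1} + mu, the martingale part
    accumulates the centered increments and the predictable part is n*mu."""
    mart, comp = [path[0]], [0.0]
    for i in range(1, len(path)):
        mart.append(mart[-1] + (path[i] - path[i - 1] - mu))  # martingale part
        comp.append(comp[-1] + mu)                            # predictable, decreasing
    return mart, comp

rng = random.Random(1)
mu = -0.25
path = [0.0]
for _ in range(50):
    path.append(path[-1] + rng.gauss(mu, 1.0))

mart, comp = doob_decomposition(path, mu)
recon_err = max(abs(path[n] - (mart[n] + comp[n])) for n in range(51))
```

The reconstruction error is zero up to floating-point rounding, and the increments of the martingale part have mean zero given the past, which is exactly the content of the decomposition.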
Definition 2.9.2 The sequence (Γn : n ≥ 1) is called the predictable quadratic variation of M and is denoted by (⟨M⟩(n) : n ≥ 1).
2.10 Martingales in Continuous Time

We discuss here the generalizations of our key results established for discrete-time martingales, supermartingales, and submartingales to the continuous-time setting. Because of results like the upcrossing and downcrossing inequalities, it should come as no surprise that if (M(t) : t ≥ 0) is a submartingale adapted to (Z(t) : t ≥ 0), then one can establish (in great generality) that one may presume that M(·) has right-continuous paths (provided that E M(·) is right-continuous on [0, ∞)).
Perhaps the most important result that we will need is the optional sampling theorem in contin-
uous time. This can be obtained from the discrete-time result by an approximation argument that is
widely used in proving that discrete-time martingale results can be generalized to continuous time.
Note that if M is a submartingale that is adapted to Z = (Z(t) : t ≥ 0), then (M(j2⁻ⁿ) : j ≥ 0) is a discrete-time submartingale adapted to (M(0), (Z(((j−1)+t)2⁻ⁿ) : 0 ≤ t ≤ 1) : j ≥ 1) for each n ≥ 1. For a given stopping time T that is adapted to (Z(t) : t ≥ 0), put

$$T_n = 2^{-n} \lceil 2^n T \rceil,$$

where ⌈x⌉ is the least integer greater than or equal to x. Note that

$$E\, M(T \wedge t) \ge E\, M(0)$$

for t ≥ 0.
This is easily derived from the corresponding discrete-time result, namely Theorem 2.5.1, by
approximating the supremum over s ∈ [0, t] by the supremum over {j2−n : 0 ≤ j2−n ≤ t}.
The Martingale Convergence Theorem takes the following form in continuous time.
Definition 2.10.1 We say that (V(t) : t ≥ 0) is predictable (with respect to (Z(t) : t ≥ 0)) if (V(t) : t ≥ 0) is adapted to (Z(t−) : t ≥ 0) (i.e. for each t ≥ 0, there exists a (deterministic) function gt such that V(t) = gt(Z(u) : 0 ≤ u < t)).
Problem 2.10.1 Let X = (X(t) : t ≥ 0) be a finite state Markov jump process with rate matrix Q. For g : S → R, consider the martingale

$$M(t) = g(X(t)) - \int_0^t (Qg)(X(s))\, ds.$$
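This martingale is easy to simulate; the sketch below (a hypothetical two-state example, not from the notes) evaluates M(t) along exact paths of the jump process, so that the unit property Ex M(t) = g(x) can be checked by Monte Carlo:

```python
import random

def simulate_M(Q, g, x0, t_end, rng):
    """Simulate one path of a finite-state Markov jump process with rate
    matrix Q and evaluate the martingale
    M(t) = g(X(t)) - integral_0^t (Qg)(X(s)) ds at t = t_end.
    The integral is computed exactly because X is piecewise constant."""
    n = len(Q)
    Qg = [sum(Q[x][y] * g[y] for y in range(n)) for x in range(n)]
    x, t, integral = x0, 0.0, 0.0
    while True:
        rate = -Q[x][x]
        hold = rng.expovariate(rate) if rate > 0 else float("inf")
        if t + hold >= t_end:
            integral += Qg[x] * (t_end - t)
            return g[x] - integral
        integral += Qg[x] * hold
        t += hold
        # jump to y != x with probability Q[x][y] / rate
        u, acc = rng.random() * rate, 0.0
        for y in range(n):
            if y != x:
                acc += Q[x][y]
                if u <= acc:
                    x = y
                    break

# Hypothetical two-state chain with rates 1 (out of state 0) and 2 (out of 1).
Q = [[-1.0, 1.0], [2.0, -2.0]]
g = [0.0, 1.0]
rng = random.Random(7)
samples = [simulate_M(Q, g, x0=0, t_end=2.0, rng=rng) for _ in range(20000)]
mean_M = sum(samples) / len(samples)   # should be close to g(x0) = 0
```

The sample mean of M(2) concentrates around g(x0) = 0, consistent with the martingale property E M(t) = E M(0).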