Section 2: Martingales
Contents
2.1 Introduction
2.2 Definitions and Basic Properties
2.3 State-dependent Martingales
2.4 The Optional Sampling Theorem
2.5 The Martingale Convergence Theorem
2.6 Martingales for Markov Chains
2.7 Exponential Martingales
2.8 Nonnegative Martingales and Change-of-measure
2.9 The Doob-Meyer Decomposition for Supermartingales
2.10 Martingales in Continuous Time
2.1 Introduction
A fundamental tool in the analysis of Markov processes is the notion of a martingale. Martingales also form the basis of the modern theory of stochastic integration. The general theory is phrased in terms of integration against arbitrary square-integrable martingales (rather than integration against standard Brownian motion, as will be the emphasis in this course).
Definition 2.2.1 We say that the sequence X = (Xn : n ≥ 0) is adapted to the sequence Z =
(Zn : n ≥ 0) if, for each n ≥ 0, there exists a (deterministic) function gn (·) such that
Xn = gn (Z0 , Z1 , . . . , Zn ).
Similarly, we say that the process X = (X(t) : t ≥ 0) is adapted to the process Z = (Z(t) : t ≥ 0) if, for each t ≥ 0, there exists a (deterministic) function gt (·) such that
X(t) = gt (Z(u) : 0 ≤ u ≤ t).
Remark 2.2.2 The notion of adaptedness, as we have introduced it above, is not quite rigorous
from a measure-theoretic probability viewpoint. In the language of measure-theoretic probability,
the precise statement that (Xn : n ≥ 0) is adapted to (Zn : n ≥ 0) comes down to requiring that
for each n ≥ 0, Xn is Fn -measurable, where Fn is the sigma-algebra generated by Z0 , . . . , Zn
(typically denoted as σ(Z0 , . . . , Zn )). However, the assertion that Xn is Fn -measurable is known
to be equivalent to asserting that Xn can be expressed as a measurable function of Z0 , . . . , Zn . In
other words, the assumption of Fn -measurability is essentially identical to what we are requiring in
our definition. We use our version of the definition because we are not requiring measure-theoretic
probability as a prerequisite for this class, and because no serious mathematical or computational
errors are likely to arise in typical applications of the material covered in this class as a result of
our more intuitive (but not completely rigorous) definition.
It should be further noted that a common mathematical terminology for adaptedness is to state
that (Xn : n ≥ 0) is adapted to the filtration (Zn : n ≥ 0). Similarly, adaptedness of (X(t) : t ≥ 0)
to (Z(t) : t ≥ 0) is typically phrased, in measure-theoretic probability texts, as a requirement that
(X(t) : t ≥ 0) is adapted to the filtration (Ft : t ≥ 0), where Ft = σ(Z(s) : 0 ≤ s ≤ t).
Definition 2.2.2 We say that the sequence (Mn : n ≥ 0) is a martingale (adapted to (Zn : n ≥ 0)) if:
1. (Mn : n ≥ 0) is adapted to (Zn : n ≥ 0);
2. E |Mn | < ∞ for n ≥ 0;
3. E [Mn+1 |Z0 , . . . , Zn ] = Mn for n ≥ 0.
Exercise 2.2.1 Prove that a mean-zero random walk is a martingale (adapted to (Sn : n ≥ 0)).
Exercise 2.2.2 Prove that if (Mn : n ≥ 0) is a martingale adapted to (Zn : n ≥ 0), then
(Mn : n ≥ 0) is a martingale adapted to (Mn : n ≥ 0).
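The defining property E [Sn+1 |Z0 , . . . , Zn ] = Sn for the mean-zero ±1 random walk can be checked by exhaustive enumeration. A minimal sketch (the fair-coin increments and the small horizon n ≤ 3 are illustrative choices, not part of the exercise):

```python
from itertools import product

def conditional_mean_next(prefix):
    """E[S_{n+1} | Z_1, ..., Z_n] for the +/-1 walk with p = 1/2,
    given the observed increments `prefix`."""
    s = sum(prefix)                      # current position S_n
    return 0.5 * (s + 1) + 0.5 * (s - 1)

# The conditional mean of the next position equals the current position
# on every path: the defining martingale identity, checked exhaustively.
for n in range(4):
    for prefix in product([-1, 1], repeat=n):
        assert conditional_mean_next(prefix) == sum(prefix)
```
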
Not surprisingly, it is also useful to model the notion of a favorable and unfavorable game. We say that (Mn : n ≥ 0) is a supermartingale (adapted to (Zn : n ≥ 0)) if conditions 1 and 2 hold and E [Mn+1 |Z0 , . . . , Zn ] ≤ Mn for n ≥ 0 (an "unfavorable" game), and a submartingale if instead E [Mn+1 |Z0 , . . . , Zn ] ≥ Mn for n ≥ 0 (a "favorable" game).
Remark 2.2.3 From a terminology standpoint, it perhaps comes as a surprise that the "unfavorable" supermartingale carries the prefix "super", whereas the "favorable" submartingale carries the prefix "sub". This terminology has to do with the deep connection between martingale theory and the classical area of mathematics known as "potential theory"; see Section 3.2 for additional details.
Remark 2.2.4 In the setting of measure-theoretic probability, it is common to use the more compact notation E [X|Ft ] in place of E [X|Z(u) : 0 ≤ u ≤ t], where Ft = σ(Z(u) : 0 ≤ u ≤ t). In view of this notation, condition 3 above would be written as the requirement that
E [M (t + s)|Ft ] = M (t)
for s, t ≥ 0.
Let
Dj = Mj − Mj−1
be the j'th martingale difference (for j ≥ 1), and note that we can write a discrete-time martingale as
Mn = M0 + D1 + D2 + · · · + Dn . (2.2.1)
Proposition 2.2.1 Let (Mn : n ≥ 0) be a martingale adapted to (Zn : n ≥ 0). Then:
1. E [Dj |Z0 , . . . , Zj−1 ] = 0 for j ≥ 1;
2. E Dj = 0 for j ≥ 1 and E Mn = E M0 .
If, in addition, Mn is square-integrable for n ≥ 0, then:
3. E [Di Dj ] = 0 for i ≠ j.
E [Dj |Z0 , . . . , Zj−1 ] = E [Mj − Mj−1 |Z0 , . . . , Zj−1 ] = E [Mj |Z0 , . . . , Zj−1 ] − Mj−1 = 0;
2. follows immediately from 1. For 3., suppose, without loss of generality, that i < j. Then
E [Di Dj ] = E [Di E [Dj |Z0 , . . . , Zj−1 ]] = 0.
We conclude this section with a result that hints at the connections between martingales and stochastic integration.
Proposition 2.2.2 Let M = (Mn : n ≥ 0) be a martingale adapted to (Zn : n ≥ 0), and let
(φn : n ≥ 0) be a sequence of bounded rv’s that is adapted to (Zn : n ≥ 0). Then
M̃n = Σ_{j=1}^{n} φ_{j−1} (Mj − Mj−1 )
is a martingale adapted to (Zn : n ≥ 0).
Proof: Note that
E [M̃n+1 − M̃n |Z0 , . . . , Zn ] = φn E [Mn+1 − Mn |Z0 , . . . , Zn ] = 0
for n ≥ 0, verifying the fact that (M̃n : n ≥ 0) is a martingale adapted to (Zn : n ≥ 0). The rest of the proof is left to the reader.
Remark 2.2.5 Note that M̃n is a discrete-time analogue to a continuous-time stochastic integral
of the form
∫_0^t φ(s) dM (s).
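Proposition 2.2.2 and Remark 2.2.5 can be illustrated numerically: the discrete stochastic integral of a bounded adapted sequence against a martingale again has constant (here zero) mean. A sketch, with the fair ±1 walk and a sign-betting strategy as illustrative choices:

```python
from itertools import product

def transform_mean(n):
    """Exact E[M~_n], where M~_n = sum_{j=1}^n phi_{j-1} (M_j - M_{j-1}),
    M is the fair +/-1 random walk, and phi_j = +/-1 is the (bounded,
    adapted) strategy 'bet on the sign of the current position'."""
    total = 0.0
    for steps in product([-1, 1], repeat=n):
        m, mt = 0, 0.0
        for z in steps:
            phi = 1.0 if m >= 0 else -1.0   # depends only on the past
            mt += phi * z                   # phi_{j-1} * (M_j - M_{j-1})
            m += z
        total += mt
    return total / 2 ** n

# E[M~_n] = E[M~_0] = 0 for every n, as Proposition 2.2.2 predicts.
assert all(abs(transform_mean(n)) < 1e-12 for n in range(1, 8))
```

No betting strategy that looks only at the past can tilt the mean of a fair game, which is exactly the content of the proposition.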
2.3 State-dependent Martingales
Martingales play a key role in the analysis of Markov chains and Markov processes. We start
by considering the class of “state-dependent” processes of the form Mn = f (Xn ), where X = (Xn :
n ≥ 0) is an S-valued Markov chain with stationary transition probabilities.
Proposition 2.3.1 Suppose that f is such that Ex |f (Xn )| < ∞ for n ≥ 0 and x ∈ S. Then:
1. (f (Xn ) : n ≥ 0) is a martingale adapted to (Xn : n ≥ 0) if P f = f ;
2. (f (Xn ) : n ≥ 0) is a supermartingale adapted to (Xn : n ≥ 0) if P f ≤ f ;
3. (f (Xn ) : n ≥ 0) is a submartingale adapted to (Xn : n ≥ 0) if P f ≥ f .
so it follows that Ex f (Xn ) ≤ f (x) for n ≥ 0. Hence, the integrability follows immediately when f
is non-negative and satisfies P f ≤ f .
Exercise 2.3.1 Let X = (X(t) : t ≥ 0) be a Markov jump process living on finite state space S and possessing rate matrix Q. Prove that:
1. (f (X(t)) : t ≥ 0) is a martingale adapted to (X(t) : t ≥ 0) if Qf = 0;
2. (f (X(t)) : t ≥ 0) is a supermartingale adapted to (X(t) : t ≥ 0) if Qf ≤ 0;
3. (f (X(t)) : t ≥ 0) is a submartingale adapted to (X(t) : t ≥ 0) if Qf ≥ 0.
We, of course, expect to see analogous results in the setting of SDE's. In particular, suppose that X = (X(t) : t ≥ 0) satisfies the SDE
dX(t) = µ(X(t)) dt + σ(X(t)) dB(t).
The heuristic argument of Section 1 suggests that if f is bounded and twice continuously differen-
tiable, then
Ex [f (X(t))] = f (x) + t(L f )(x) + o(t)
as t → 0, where L is the second-order linear differential operator given by
L = µ(x) d/dx + (σ^2 (x)/2) d^2/dx^2 .
This, in turn, indicates that (f (X(t)) : t ≥ 0) will be a martingale adapted to (X(t) : t ≥ 0) if
L f = 0, and a supermartingale (submartingale) if L f ≤ 0 (L f ≥ 0).
In the special case that µ ≡ 0 and σ = I, then X = B and we are led to the conclusion that
(f (B(t)) : t ≥ 0) is a martingale (adapted to (B(t) : t ≥ 0)) if
∆f = 0 (2.3.1)
where
∆ = Σ_{i=1}^{d} ∂^2/∂x_i^2
is the d-dimensional Laplacian.
2.4 The Optional Sampling Theorem
If (Mn : n ≥ 0) is a martingale, then
E Mn = E M0 (2.4.1)
for n ≥ 0. It is natural to wonder about the degree to which (2.4.1) can be extended to a random
time T , namely to an identity of the form
E MT = E M0 . (2.4.2)
The theory of “optional sampling” provides conditions under which (2.4.2) is valid.
Before describing the relevant theory, let us illustrate how (2.4.2) can be used to compute expectations and probabilities of interest.
Example 2.4.1 Let S = (Sn : n ≥ 0) be the symmetric nearest-neighbor random walk on Z with S0 = 0, and let T = inf{n ≥ 0 : Sn = −a or Sn = b} for integers a, b > 0. Since (Sn : n ≥ 0) is a martingale, (2.4.2) suggests that
E ST = 0. (2.4.3)
But
Example 2.4.2 Consider once again the nearest-neighbor random walk of Example 2.4.1, but with P {Zi = 1} = p = 1 − P {Zi = −1}. Suppose S0 = x ∈ Z+ and T = inf{n ≥ 0 : Sn = 0}. Our goal is to compute E T . (This computation was also considered in Section 2.2, where "first transition" analysis was used to derive a linear system of equations from which E T could be calculated.) Note that (Sn − n(2p − 1) : n ≥ 0) is a martingale adapted to (Sn : n ≥ 0). Assuming that (2.4.2) is valid,
we find that
E [ST − T (2p − 1)] = x.
But E ST = 0, so
E T (1 − 2p) = x. (2.4.4)
The equation (2.4.4) clearly makes no sense if p ≥ 1/2. On the other hand, if p < 1/2, it suggests that
E T = x/(1 − 2p).
Note that if p > 1/2, the system has a tendency to drift to infinity, so we expect that T = ∞ with
positive probability in that case. So the restriction to the case p < 1/2 is a reasonable one.
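The identity E T = x/(1 − 2p) can be checked by simulation; the parameters below (x = 3, p = 0.3, and the sample size) are illustrative choices:

```python
import random

def mean_hitting_time(x, p, paths=20000, seed=7):
    """Monte Carlo estimate of E[T], where T is the first time the
    nearest-neighbor walk started at x hits 0 and P{Z = +1} = p.
    Only meaningful for p < 1/2 (otherwise T = infinity with
    positive probability)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(paths):
        s, t = x, 0
        while s > 0:
            s += 1 if rng.random() < p else -1
            t += 1
        total += t
    return total / paths

est = mean_hitting_time(x=3, p=0.3)
exact = 3 / (1 - 2 * 0.3)        # = 7.5, from (2.4.4)
assert abs(est - exact) < 0.5
```
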
The above constraint on p suggests that optional sampling is not universally valid. In Example 2.4.2, the difficulty is that when p > 1/2, T is not finite-valued a.s., so that MT makes no sense. But even when T is finite-valued, optional sampling need not be valid.
Example 2.4.3 Consider the symmetric (p = 1/2) nearest-neighbor random walk with S0 = 0, and put T = inf{n ≥ 0 : Sn = 1}. Here, T is finite-valued a.s. and ST = 1, so that if (2.4.2) were valid, we would arrive at the contradiction
1 = E ST = E S0 = 0,
and hence optional sampling must fail for this stopping time.
Example 2.4.4 Let S = (Sn : n ≥ 0) be a random walk on Z for which P {Zi = 0} = r and
P {Zi = −1} = P {Zi = 1} = (1 − r)/2. Put
Mn = Sn2 − nσ 2 ,
where σ 2 = var Zi = (1 − r). We claim that (Mn : n ≥ 0) is a martingale adapted to (Sn : n ≥ 0).
The key is to note that
Put T1 = inf{n ≥ 0 : Sn = 1} and T2 = inf{n ≥ 0 : |Sn | = 1}; both T1 and T2 are a.s. finite-valued.
If S0 = 0, (2.4.2) implies that
E [S_{Ti}^2 − Ti (1 − r)] = 0,
i.e. (since S_{Ti}^2 = 1) E Ti = (1 − r)^{−1} for i = 1, 2.
However, T2 ≤ T1 and P {T2 < T1 } > 0, so it is impossible that both E T1 and E T2 equal (1 − r)^{−1} . Hence, optional sampling cannot be applied to this martingale at both of the times T1 and T2 . This raises the question of which identity E Ti = (1 − r)^{−1} (if any) is correct.
Recalling the fact that a martingale is a mathematical idealization of a gambler’s fair game, it
seems reasonable that the optional sampling theorem should require that the random time T be
non-clairvoyant.
Definition 2.4.1 We say that a random time T ∈ Z+ is a stopping time that is adapted to
(Zn : n ≥ 0) if the sequence of rv’s (I {T ≤ n} : n ≥ 0) is adapted to (Zn : n ≥ 0). A random time
T ∈ R+ is a stopping time that is adapted to (Z(t) : t ≥ 0) if the process (I {T ≤ t} : t ≥ 0) is
adapted to (Z(t) : t ≥ 0).
Remark 2.4.1 An equivalent way of asserting that T ∈ Z+ is a stopping time adapted to (Zn :
n ≥ 0) is to require that (I {T = n} : n ≥ 0) is adapted to (Zn : n ≥ 0).
For a, b ∈ R, let a ∨ b = max{a, b} and a ∧ b = min{a, b}.
2. Prove that if T = inf{n ≥ 0 : Zn ∈ A} (i.e. T is the “first hitting time” of A), then T is a
stopping time adapted to (Zn : n ≥ 0).
3. Prove that if T1 and T2 are stopping times adapted to (Zn : n ≥ 0), then T1 ∨ T2 , T1 ∧ T2 and T1 + T2 are stopping times adapted to (Zn : n ≥ 0).
We are now ready to state and prove the basic optional sampling theorem.
We illustrate the use of optional sampling on the examples discussed earlier in this section.
Hence, if E |Z1 | < ∞ and E T < ∞, (2.4.6) follows from (2.4.7). Returning to the specific problem context of Example 2.4.2, it is a standard fact that E T < ∞ when p < 1/2, proving (2.4.4) under this restriction on p.
E [S_{T2∧n}^2 ] → E [S_{T2}^2 ]
as n → ∞. Hence
1 = E [S_{T2}^2 ] = (1 − r) E T2 ,
proving that E T2 = (1 − r)^{−1} . The optional sampling theorem clearly must therefore fail for T1 .
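The conclusion E T2 = (1 − r)^{−1} can also be seen directly: starting from 0, T2 is simply the first time the lazy walk takes a nonzero step, so T2 is Geometric(1 − r). A quick numerical check (r = 0.5 is an illustrative choice):

```python
def expected_T2(r, nmax=10000):
    """Exact E[T_2] via the geometric law P{T_2 = n} = r**(n-1) * (1-r):
    from S_0 = 0, T_2 is the first time the lazy walk moves at all."""
    return sum(n * r ** (n - 1) * (1 - r) for n in range(1, nmax))

r = 0.5   # illustrative laziness parameter
assert abs(expected_T2(r) - 1 / (1 - r)) < 1e-9
```
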
2.5 The Martingale Convergence Theorem
Perhaps surprisingly, the definition of a martingale imposes a great deal of structure on the "path regularity" of the process. As an example, we start with the maximal inequality (which controls the maximal behavior of the path).
Theorem 2.5.1 Let (Mn : n ≥ 0) be a submartingale adapted to (Zn : n ≥ 0). Then, for x > 0,
P { max_{0≤k≤n} Mk > x } ≤ E M_n^+ / x.
Proof: Let τ = inf{j ≥ 0 : Mj > x}, and note that P (max_{0≤k≤n} Mk > x) = P (τ ≤ n). Since Mτ /x > 1 on {τ ≤ n},
P (τ ≤ n) ≤ E [Mτ I {τ ≤ n}] / x = Σ_{j=0}^{n} E [Mj I {τ = j}] / x.
But
E [Mn I {τ = j} |Z0 , . . . , Zj ] ≥ I {τ = j} Mj ,
so it follows that
P (τ ≤ n) ≤ Σ_{j=0}^{n} E [Mn I {τ = j}] / x = E [Mn I {τ ≤ n}] / x ≤ E M_n^+ / x.
The result is then an immediate consequence of the fact that (Mn : n ≥ 0) is a submartingale.
Corollary 2.5.1 Let (Mn : n ≥ 0) be a martingale adapted to (Zn : n ≥ 0) for which E |Mn |^p < ∞ for p ≥ 1. Then
P { max_{0≤k≤n} |Mk | > x } ≤ E |Mn |^p / x^p . (2.5.1)
A further example of "path regularity" is the "upcrossing inequality" for submartingales. For a < b, let Un [a, b] be the number of times that (Mj : 0 ≤ j ≤ n) crosses from below a to above b. Then
E Un [a, b] ≤ (E M_n^+ + |a|) / (b − a).
See Billingsley for a proof.
A similar downcrossing inequality also holds. These upcrossing/downcrossing inequalities establish that submartingales typically cannot exhibit "oscillatory discontinuities" (since an infinite number of oscillations implies that there must exist a < b for which infinitely many upcrossings occur). Hence, submartingales can fail to converge almost surely only if they diverge to infinity. This divergence can be prevented through an appropriate assumption on the submartingale/supermartingale.
The Martingale Convergence Theorem asserts that if (Mn : n ≥ 0) is a submartingale for which sup_{n≥0} E M_n^+ < ∞, then there exists a finite-valued rv M∞ such that
Mn → M∞ a.s.
as n → ∞. In particular, if (Mn : n ≥ 0) is a nonnegative supermartingale, then
Mn → M∞ a.s.
as n → ∞ for some finite-valued rv M∞ .
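The contrast between a.s. convergence and convergence of expectations can be illustrated with a product martingale; the increment law below ({0.5, 1.5} with equal probability, so E Y = 1) is an illustrative choice:

```python
import random

def product_martingale_path(n, seed=0):
    """One path of M_n = Y_1 * ... * Y_n with iid Y_i uniform on
    {0.5, 1.5}, so that E[Y_i] = 1 and (M_n) is a nonnegative
    martingale with E[M_n] = 1 for every n."""
    rng = random.Random(seed)
    m = 1.0
    for _ in range(n):
        m *= 0.5 if rng.random() < 0.5 else 1.5
    return m

# Each path converges a.s. (here: to 0, since E[log Y] = 0.5*log(0.75) < 0),
# even though E[M_n] = 1 for all n -- so M_n -> 0 a.s. but not in L^1.
assert all(product_martingale_path(5000, seed=s) < 1e-40 for s in range(5))
```
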
The following variant of the Martingale Convergence Theorem follows from an elementary argument. Recall that for a rv W , ‖W ‖_p = (E |W |^p )^{1/p} for p ≥ 1.
Suppose that (Mn : n ≥ 0) is a square-integrable martingale for which sup_{n≥0} E M_n^2 < ∞. Then, there exists a rv M∞ such that
‖Mn − M∞ ‖_2 → 0
as n → ∞.
Proof: For m > n, the orthogonality of the martingale differences yields
‖Mm − Mn ‖_2^2 = Σ_{j=n+1}^{m} E D_j^2 ,
which can be made arbitrarily small by choosing n large enough, on account of the fact that
sup_{n≥0} E M_n^2 = E M_0^2 + Σ_{j=1}^{∞} E D_j^2 < ∞.
The Martingale Convergence Theorem implies, for example, that if X = (Xn : n ≥ 0) is a recurrent Markov chain and f is a bounded function satisfying P f = f , then f must be constant.
Proof: Note that (f (Xn ) : n ≥ 0) is a martingale that is adapted to (Xn : n ≥ 0). Since f is bounded, the Martingale Convergence Theorem applies. So, there exists a finite-valued rv M∞ such that
f (Xn ) → M∞ a.s. (2.5.2)
as n → ∞. But if f is non-constant, there exist x, y ∈ S such that f (x) < f (y). Since (Xn : n ≥ 0) is recurrent, evidently
lim inf_{n→∞} f (Xn ) ≤ f (x)
and
lim sup_{n→∞} f (Xn ) ≥ f (y)
a.s., contradicting (2.5.2). Hence f must be constant.
2.6 Martingales for Markov Chains
Proposition 2.6.1 Let Y be an integrable rv (i.e. E |Y | < ∞). Then, for any sequence (Zn : n ≥ 0),
Mn = E [Y |Z0 , . . . , Zn ]
is a martingale adapted to (Zn : n ≥ 0).
Definition 2.6.1 A sequence of rv's (Wn : n ≥ 0) is said to be uniformly integrable if for each ε > 0, there exists x = x(ε) such that
sup_{n≥0} E [|Wn |I {|Wn | > x}] < ε.
We are now ready to discuss the connection between martingales and linear systems in the
Markov setting through the following illustrations of the general principle.
Finite-horizon expectations of the form Ex [f (Xn )]: Fix n and put
Y = f (Xn ),
and apply Proposition 2.6.1. It follows that Mj = Ex [f (Xn )|X0 , . . . , Xj ] is a martingale adapted to (Xj : 0 ≤ j ≤ n) for 0 ≤ j ≤ n. But
So, if one can guess a good candidate function u, Problem (2.6.3) provides bounds on the corresponding expectation.
Infinite-horizon discounted expectations of the form Ex [Σ_{j=0}^{∞} e^{−αj} f (Xj )]: For α > 0 and f bounded, let u∗ (x) = Ex [Σ_{j=0}^{∞} e^{−αj} f (Xj )]. Put
Y = Σ_{j=0}^{∞} e^{−αj} f (Xj ).
So,
Mn = Σ_{j=0}^{n−1} e^{−αj} f (Xj ) + e^{−αn} u∗ (Xn )
is a martingale adapted to (Xn : n ≥ 0). The u∗ (x)’s satisfy a linear system described in Chapter 2.
Expected cumulative reward of the form Ex [Σ_{j=0}^{T−1} f (Xj )]: For a suitable random time T , let u∗ (x) = Ex [Σ_{j=0}^{T−1} f (Xj )] and
Y = Σ_{j=0}^{T−1} f (Xj ).
If (u∗ (x) : x ∈ S) is finite-valued, Proposition 2.6.1 implies that Mn = E [Y |X0 , . . . , Xn ] is a martingale adapted to (Xn : n ≥ 0). Here,
E [Y |X0 , . . . , Xn ] = Σ_{j=0}^{(T∧n)−1} f (Xj ) + u∗ (X_{T∧n} ),
so
Mn = Σ_{j=0}^{(T∧n)−1} f (Xj ) + u∗ (X_{T∧n} ) (2.6.1)
is a martingale adapted to (Xn : n ≥ 0). Again, u∗ = (u∗ (x) : x ∈ S) satisfies a linear system as
described earlier.
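The linear system satisfied by u∗ can be solved directly for a small example. The sketch below uses symmetric gambler's ruin on {0, . . . , 4} with f ≡ 1 (illustrative choices), for which u∗ (x) = Ex T = x(4 − x) is known in closed form:

```python
# u*(x) = E_x[ sum_{j=0}^{T-1} f(X_j) ] solves (I - P)u = f on the
# transient states, with u = 0 on the absorbing states {0, 4}.

def solve_linear(A, b):
    """Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [a - k * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

p = 0.5
transient = [1, 2, 3]
A = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for i, x in enumerate(transient):
    for j, y in enumerate(transient):
        if y == x + 1:
            A[i][j] -= p          # step up with probability p
        if y == x - 1:
            A[i][j] -= 1 - p      # step down with probability 1 - p
u = solve_linear(A, [1.0, 1.0, 1.0])   # f = 1 on the transient states

assert all(abs(u[i] - x * (4 - x)) < 1e-9 for i, x in enumerate(transient))
```
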
1. Put
u∗ (x) = Ex [∫_0^T f (X(s)) ds].
Note that the process defined by (2.6.1) is exactly the process (M̃_{T∧n} : n ≥ 0), where
M̃n = Σ_{j=0}^{n−1} f (Xj ) + u∗ (Xn ).
In view of the Optional Sampling Theorem, this suggests that the process (M̃n : n ≥ 0) is a
martingale. This is justified by our next result.
Proposition 2.6.2 Let X = (Xn : n ≥ 0) be a Markov chain on discrete state space S with transition matrix P . Suppose that Ex |g(Xn )| < ∞ for n ≥ 0. Then, there exists a function f (namely, f = (P − I)g) such that
Mn = g(Xn ) − Σ_{k=0}^{n−1} f (Xk ) (2.6.2)
is a martingale adapted to (Xn : n ≥ 0).
Proof: Note that Ex |(P g)(Xn )| ≤ Ex [(P |g|)(Xn )] = Ex |g(Xn+1 )| < ∞, so that the f (Xn )'s are integrable rv's with respect to the distribution Px . Observe that
Ex [Mn+1 |X0 , . . . , Xn ] = (P g)(Xn ) − Σ_{k=0}^{n} f (Xk ) = g(Xn ) − Σ_{k=0}^{n−1} f (Xk ) = Mn ,
The fact that processes of the form (2.6.2) are martingales adapted to (Xn : n ≥ 0) actually
forces X to be Markov.
Proposition 2.6.3 Let X = (Xn : n ≥ 0) be a sequence of S-valued rv’s. Suppose that for each
bounded function g : S → R, there exists a function f (denoted Ag) such that
n−1
X
Mn = g(Xn ) − f (Xk )
k=0
is a martingale adapted to (Xn : n ≥ 0). Then, X is Markov with stationary transition probabilities.
Proof: Because f (Xn ) = g(Xn+1 ) − g(Xn ) + Mn − Mn+1 , the f (Xn )'s are integrable rv's. If Dn = Mn − Mn−1 , recall that E [Dn+1 |X0 , . . . , Xn ] = 0 (since Dn+1 is a martingale difference). So
In other words,
E [g(Xn+1 )|X0 , . . . , Xn ] = g(Xn ) + f (Xn ).
Specializing to the case where g(y) = δz (y), we conclude that
P {Xn+1 = z|X0 , . . . , Xn } = δz (Xn ) + (Aδz )(Xn ),
which is a deterministic function of Xn alone, proving that X is Markov with stationary transition probabilities.
Remark 2.6.2 Proposition 2.6.3 establishes that X is Markov with stationary transition probabilities. So, if S is discrete, X possesses a one-step transition matrix P . One choice for f = Ag is
f = (P − I)g.
To apply the Dynkin martingale to the analysis of the rv Σ_{k=0}^{n−1} f (Xk ) and obtain the identity (2.6.3), we therefore need to find a function g̃ for which
(P − I)g̃ = f. (2.6.4)
Remark 2.6.3 When |S| < ∞, it is evident that (P − I) is a singular matrix (since 1 is an eigenvalue of a stochastic matrix). So (2.6.4) cannot be solvable for all functions f . Suppose, in particular, that X is irreducible. Then, there exists a unique stationary distribution π satisfying π = πP . Pre-multiplying through (2.6.4) by π establishes that πf = 0. So, (2.6.4) is then solvable only when πf = 0.
The next result establishes a probabilistic representation for the solution to (2.6.4). Let fc (x) =
f (x) − πf for x ∈ S.
Proposition 2.6.4 Let X = (Xn : n ≥ 0) be a finite-state irreducible Markov chain with transition matrix P . If there exists a solution g to
(P − I)g = −fc , (2.6.5)
then
Mn = g(Xn ) + Σ_{k=0}^{n−1} fc (Xk )
is a martingale adapted to (Xn : n ≥ 0).
Proof: Let τ (x) = inf{n ≥ 1 : Xn = x}. The Optional Sampling Theorem yields the identity Ex M_{τ(x)∧n} = Ex M0 = g(x). So,
g(x) = Ex [g(X_{τ(x)∧n} )] + Ex [Σ_{k=0}^{(τ(x)∧n)−1} fc (Xk )]. (2.6.6)
Since X is positive recurrent, τ (x) < ∞ a.s. and Ex τ (x) < ∞. Since fc and g are bounded, the Dominated Convergence Theorem guarantees that
Ex [g(X_{τ(x)∧n} )] → Ex [g(X_{τ(x)} )] = g(x)
and
Ex [Σ_{k=0}^{(τ(x)∧n)−1} fc (Xk )] → Ex [Σ_{k=0}^{τ(x)−1} fc (Xk )]
where c = g(0).
Remark 2.6.4 Note that if g solves (2.6.5), then so does g − c, for any c ∈ R. Hence, the
proposition above establishes that if there exists a solution g to (2.6.5), one choice of solution is
gz (x) = Ex [Σ_{k=0}^{τ(z)−1} fc (Xk )].
The solution gz (·) is the solution that vanishes at z. (Recall that because πfc = 0, Ez [Σ_{k=0}^{τ(z)−1} fc (Xk )] = 0.)
Our final result of this section establishes a converse to the previous proposition.
Proposition 2.6.5 Let X = (Xn : n ≥ 0) be a finite-state irreducible Markov chain with transition matrix P , and fix z ∈ S. Then
g∗ (x) = Ex [Σ_{k=0}^{τ(z)−1} fc (Xk )] (2.6.7)
is a solution of (P − I)g = −fc (so Poisson's equation always has a solution for such finite-state chains).
Proof: Because X is positive recurrent and fc is bounded, g∗ as defined through (2.6.7) is finite-valued. By conditioning on X1 , we find that for x ∈ S,
g∗ (x) = fc (x) + Σ_y P (x, y)g∗ (y),
where we have used the fact that g∗ (z) = 0.
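The regenerative representation can be checked numerically on a two-state chain. The sketch below fixes the reference state z = 0 and verifies that g solves Poisson's equation in the form (P − I)g = −fc (sign conventions for Poisson's equation vary; compare (2.6.5)); the chain parameters and f are illustrative choices:

```python
# Two-state chain: P(0->1) = a, P(1->0) = b (illustrative values), and
#   g(x) = E_x[ sum_{k=0}^{tau(0)-1} fc(X_k) ],  tau(0) = hitting time of
# the reference state z = 0.  We verify (P - I)g = -fc.
a, b = 0.3, 0.6
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]        # stationary distribution
f = [1.0, 4.0]
pif = pi[0] * f[0] + pi[1] * f[1]
fc = [f[0] - pif, f[1] - pif]          # centered version of f

# Starting from 1, the number of visits to 1 before hitting 0 is
# Geometric(b), with mean 1/b; starting from 0 the sum is empty.
g = [0.0, fc[1] / b]

for x in range(2):
    Pg_minus_g = sum(P[x][y] * g[y] for y in range(2)) - g[x]
    assert abs(Pg_minus_g + fc[x]) < 1e-12
```
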
Problem 2.6.6 Let X = (X(t) : t ≥ 0) be a finite-state irreducible Markov jump process with rate matrix Q.
1. Prove that
M (t) = g(X(t)) − ∫_0^t (Qg)(X(s)) ds
is a martingale adapted to (X(t) : t ≥ 0).
2. Assuming that the Optional Sampling Theorem generalizes to continuous time, prove that Qg = −fc always has a solution given by
g(x) = Ex [∫_0^{τ(z)} fc (X(s)) ds],
where z ∈ S is fixed and τ (z) is the hitting time of z.
Note that the state-dependent martingales discussed previously are all special cases of the
Dynkin martingales developed in this section.
2.7 Exponential Martingales
Let S = (Sn : n ≥ 0) be a random walk with iid increments Z1 , Z2 , . . .. We have seen that:
• Sn − n E Z1 is a martingale adapted to (Sn : n ≥ 0), provided that E |S0 | < ∞ and E |Z1 | < ∞;
• (Sn − n E Z1 )^2 − n var(Z1 ) is a martingale adapted to (Sn : n ≥ 0), provided that E S0^2 and E Z1^2 are finite.
(Actually, we proved the second assertion in Example 2.4.4 only in that special case, but the
proof easily extends to the more general case as stated above.) It turns out that the two above
martingales, involving Sn and Sn2 , are the first two martingales in an infinite sequence of such
random walk martingales, in which the k’th martingale involves Snk .
The source of this infinite family of martingales is the so-called “exponential martingale” for
random walk. Assume that the moment generating function of the Zi ’s converges in a neighborhood
of the origin, so that
E eθZ1 < ∞ (2.7.1)
for θ in a neighborhood of the origin. The domain D of θ-values for which (2.7.1) holds then takes
the form [−a, b], (−a, b], [−a, b) or (−a, b) for some a, b > 0. Note that this covers most commonly
encountered increment distributions, but not all. In particular, if Z1 has “power tail” type decay
of the form P {Z1 > x} ∼ cx−α as x → ∞ for some c > 0, then (2.7.1) fails. So, (2.7.1) goes
hand-in-hand with the requirement that the rv Z1 is “light-tailed” (in the sense that its tails decay
at least exponentially fast).
Let
ψ(θ) = log E eθZ1
be the logarithmic moment generating function of Z1 (also known as the cumulant generating
function of Z1 ).
Let D0 = (−a, b) be the interior of D, and for θ ∈ D put
Mn (θ) = e^{θ(Sn −S0 )−nψ(θ)} ,
the exponential martingale associated with the random walk. It is well known that ψ is infinitely differentiable on D0 . Also, for each θ ∈ D0 , θ + h ∈ D0 and θ − h ∈ D0 for h sufficiently small, so that (Mn (θ) : n ≥ 0), (Mn (θ + h) : n ≥ 0) and (Mn (θ − h) : n ≥ 0) are all martingales adapted to (Sn : n ≥ 0). Since any linear combination of two identically adapted martingales is a martingale, it follows that
( (Mn (θ + h) − Mn (θ))/h : n ≥ 0 )
is a martingale adapted to (Sn : n ≥ 0) for h sufficiently small. Letting h → 0 (the interchange of limit and conditional expectation can be justified), we conclude that (Mn′ (θ) : n ≥ 0) is a martingale adapted to (Sn : n ≥ 0). Since Mn (·) is infinitely differentiable on D0 , a similar argument to that just used for the first derivative proves that for each θ ∈ D0 and k ≥ 1, (Mn^{(k)} (θ) : n ≥ 0) is a martingale adapted to (Sn : n ≥ 0), where Mn^{(k)} (θ) is the k'th derivative (with respect to θ) of Mn (·).
Note that
ψ^{(1)} (θ) = (d/dθ) E e^{θZ1} / E e^{θZ1} = E [Z1 e^{θZ1} ] / E e^{θZ1} ,
so that ψ^{(1)} (0) = E Z1 . A similar calculation shows that ψ^{(2)} (0) = var(Z1 ). It then easily follows that
Sn − n E Z1 = Mn^{(1)} (0)
and
(Sn − n E Z1 )^2 − n var(Z1 ) = Mn^{(2)} (0),
so that we recover the two random walk martingales discussed earlier in this section via the first two derivatives of the exponential martingale (evaluated at 0). A martingale involving Sn^k is then obtained from the exponential martingale by differentiating k times (and setting θ = 0); this martingale will involve the first k moments of Z1 .
In addition to generating this useful family of martingales, the exponential martingale is itself
valuable for many computations. As an illustration, consider the special case where the Zi ’s ∈
{−1, 1}, so that P {Zi = 1} = p = 1 − P {Zi = −1} (and hence S is a nearest-neighbor random walk). Suppose that S0 = 0, and put T = inf{n ≥ 0 : Sn = −a or Sn = b} for integers a, b > 0.
We will use the optional sampling theorem to compute P {ST = −a} and P {ST = b}. Since we are
interested in the location ST (and not the exit time T ), it seems intuitively convenient to try to
choose a value θ that will eliminate the contribution of T ψ(θ) to the exponent of (2.7.3) at the exit
time T . Since the choice of θ is at our disposal in (2.7.3), this suggests choosing θ so that ψ(θ) = 0
or, equivalently,
E eθZ1 = 1. (2.7.4)
One possible choice is θ = 0. But if we choose θ = 0 in (2.7.3), we get the trivial martingale
Mn (0) = 1 from which we can learn nothing. Furthermore, there is a second, non-zero, value of θ,
call it θ∗ , satisfying (2.7.4) when p ≠ 1/2, namely
θ∗ = log((1 − p)/p).
With this θ∗ in hand, the optional sampling theorem yields the identity
E e^{θ∗ S_{T∧n}} = 1.
But T < ∞ a.s. and exp(θ∗ S_{T∧n} ) ≤ exp(|θ∗ |(a ∨ b)). So, the Bounded Convergence Theorem proves that
E e^{θ∗ ST } = 1,
from which it follows that
P {ST = −a} = (e^{θ∗ b} − 1)/(e^{θ∗ b} − e^{−θ∗ a} ),
P {ST = b} = (1 − e^{−θ∗ a} )/(e^{θ∗ b} − e^{−θ∗ a} ).
Hence, the exponential martingale is a very useful tool in explicitly computing exit probabilities for this example. For more general increment distributions, explicit computation will typically be impossible (in closed form), but (tight) bounds can often be computed.
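The exit-probability formulas can be checked by simulation; p, a, b, and the sample size below are illustrative choices:

```python
import math
import random

def exit_prob_b_mc(p, a, b, paths=20000, seed=11):
    """Monte Carlo estimate of P{S_T = b}: run the nearest-neighbor walk
    from 0 until it exits (-a, b), and count exits at b."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(paths):
        s = 0
        while -a < s < b:
            s += 1 if rng.random() < p else -1
        hits += (s == b)
    return hits / paths

p, a, b = 0.4, 3, 3                      # illustrative parameters
theta = math.log((1 - p) / p)            # the nonzero root of psi(theta) = 0
exact = (1 - math.exp(-theta * a)) / (math.exp(theta * b) - math.exp(-theta * a))
est = exit_prob_b_mc(p, a, b)
assert abs(est - exact) < 0.02
```
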
We have seen that if (Sn : n ≥ 0) is a random walk with iid increments Z1 , Z2 , . . . having a
finite moment generating function in a neighborhood of the origin, then Sn − n E Z1 is a martingale
adapted to (Sn : n ≥ 0) and is one in an infinite family of such martingales. We now show that
the key exponential martingale for random walks generalizes in a suitable way to Markov random
walks (in which the increments are Markov-dependent).
Specifically, let X = (Xn : n ≥ 0) be a finite state irreducible Markov chain and suppose
f : S → R is a real-valued function. Set
Sn = f (X0 ) + · · · + f (Xn−1 ).
As in the setting of an ordinary random walk, the starting point is to consider the moment generating function
φn (θ, x) = Ex e^{θSn} .
Note that
Ex [e^{θSn} I {Xn = y}] = Σ_{x1 ,...,xn−1} e^{θf(x)+θ Σ_{j=1}^{n−1} f(xj )} P (x, x1 ) · · · P (xn−1 , y)
= Σ_{x1 ,...,xn−1} G(θ, x, x1 )G(θ, x1 , x2 ) · · · G(θ, xn−1 , y)
= G^n (θ, x, y),
where G(θ) = (G(θ, x, y) : x, y ∈ S) is the matrix having (x, y)'th entry given by G(θ, x, y) = e^{θf(x)} P (x, y). Hence
φn (θ, x) = (G^n (θ)e)(x),
where e = (1, 1, . . . , 1)^T is the column vector in which all the entries are equal to 1. It follows that if we write
φn (θ, x) ≈ e^{nψ(θ)} ,
we expect that e^{ψ(θ)} will be the largest eigenvalue of G(θ). To proceed further, we need the following result due to Perron and Frobenius:
Theorem 2.7.1 Let G = (G(x, y) : x, y ∈ S) be an irreducible (i.e. for each x, y ∈ S, there exists
n ≥ 1 such that Gn (x, y) > 0) matrix with non-negative entries and |S| < ∞. Then, the largest
eigenvalue λ (in absolute value) is positive, and its corresponding row and column eigenvectors have strictly positive entries.
Let λ(θ) be the “Perron-Frobenius eigenvalue” of G(θ) and let h(θ) = (h(θ, x) : x ∈ S) be its
associated column eigenvector. By analogy with ordinary random walk, we expect eθSn −nψ(θ) to be
(approximately) a martingale, where ψ(θ) = log(λ(θ)).
We now verify that
Mn (θ) = e^{θSn −nψ(θ)} h(θ, Xn )/h(θ, X0 )
is indeed a martingale adapted to (Xn : n ≥ 0).
Proof:
Ex [Mn+1 (θ)|X0 , . . . , Xn ] = (e^{θSn −nψ(θ)} /h(θ, X0 )) Ex [e^{θf(Xn )−ψ(θ)} h(θ, Xn+1 )|X0 , . . . , Xn ]
= (e^{θSn −nψ(θ)} /h(θ, X0 )) e^{−ψ(θ)} Σ_y e^{θf(Xn )} P (Xn , y)h(θ, y)
= (e^{θSn −nψ(θ)} /h(θ, X0 )) e^{−ψ(θ)} (G(θ)h(θ))(Xn )
= e^{θSn −nψ(θ)} h(θ, Xn )/h(θ, X0 ) = Mn (θ),
since G(θ)h(θ) = e^{ψ(θ)} h(θ).
Thus, the exponential martingale for Markov random walk takes the same form as in the ordinary random walk setting, except that the martingale must incorporate a state-dependent component given by h(θ, Xn ).
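For a two-state chain, the Perron-Frobenius quantities are available in closed form, and the one-step identity behind the martingale property can be verified directly. The chain, f , and θ below are illustrative choices:

```python
import math

# Build G(theta, x, y) = exp(theta * f(x)) * P(x, y) for a 2-state chain,
# extract its Perron-Frobenius eigenvalue lambda(theta) and positive right
# eigenvector h(theta) in closed form, and check the one-step identity
#   sum_y exp(theta*f(x) - psi(theta)) P(x, y) h(theta, y) = h(theta, x),
# which is exactly what makes M_n(theta) a martingale.
P = [[0.7, 0.3], [0.4, 0.6]]
f = [1.0, -2.0]
theta = 0.5

G = [[math.exp(theta * f[x]) * P[x][y] for y in range(2)] for x in range(2)]
tr = G[0][0] + G[1][1]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
lam = (tr + math.sqrt(tr * tr - 4 * det)) / 2   # Perron eigenvalue
h = [G[0][1], lam - G[0][0]]                    # positive right eigenvector
psi = math.log(lam)

for x in range(2):
    one_step = sum(math.exp(theta * f[x] - psi) * P[x][y] * h[y]
                   for y in range(2))
    assert abs(one_step - h[x]) < 1e-12
```
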
As for ordinary random walks, it is of interest to consider the martingales that are obtained
through successive differentiation of Mn (θ) (with respect to θ). The irreducibility of P guarantees
that the eigenvalue λ(θ) has multiplicity one for each θ. It follows that λ(θ) and h(θ) are infinitely
differentiable in θ. Differentiating the equation G(θ)h(θ) = eψ(θ) h(θ) with respect to θ, we get
Σ_y e^{θf(x)} P (x, y)(f (x)h(θ, y) + h′ (θ, y)) = e^{ψ(θ)} (ψ′ (θ)h(θ, x) + h′ (θ, x)). (2.7.5)
Setting θ = 0 (and noting that ψ(0) = 0 and h(0, ·) ≡ 1), (2.7.5) becomes
f (x) + (P h′ (0))(x) = ψ′ (0) + h′ (0, x). (2.7.6)
Pre-multiplying through (2.7.6) by π, we find that ψ′ (0) = πf . Hence, h′ (0) solves Poisson's equation
(P − I)g = −fc ,
where fc (x) = f (x) − πf .
Note that
Mn′ (θ) = Mn (θ) [ Sn − nψ′ (θ) + h′ (θ, Xn )/h(θ, Xn ) − h′ (θ, X0 )/h(θ, X0 ) ],
so that, taking θ = 0, writing g = h′ (0), and using Mn (0) = 1,
Mn′ (0) = Sn − nψ′ (0) + g(Xn ) − g(X0 ) = Σ_{j=0}^{n−1} fc (Xj ) + g(Xn ) − g(X0 ),
which (up to the additive constant g(X0 ))
is precisely the Dynkin martingale obtained earlier in the chapter. So, the first derivative of the exponential martingale for Markov random walk recovers the Dynkin martingale. Additional (useful) martingales can be obtained through successive differentiation of this exponential martingale.
The exponential martingale for Markov random walk can be applied, as in the ordinary random
walk setting, to compute bounds on exit probabilities of the form P {ST ≤ −a} and P {ST ≥ b},
where T = inf{n ≥ 0 : Sn ≤ −a or Sn ≥ b}.
Problem 2.7.2 Let X = (X(t) : t ≥ 0) be an irreducible finite-state Markov jump process with
rate matrix Q. Put
S(t) = ∫_0^t f (X(s)) ds.
1. Use the Perron-Frobenius theorem to prove that there exists a positive function h(θ) = (h(θ, x) : x ∈ S) and a constant ψ(θ) such that
M (θ, t) = e^{θS(t)−tψ(θ)} h(θ, X(t))/h(θ, X(0))
is a martingale adapted to (X(t) : t ≥ 0).
2. Suppose πf > 0 and {x : f (x) < 0} is non-empty. For r > 0, let Tr = inf{t ≥ 0 : S(t) ≤ −ar or S(t) ≥ br}. Use the above exponential martingale to prove that there exists γ such that
(1/r) log P {S(Tr ) ≤ −ar} → γ
as r → ∞.
2.8 Nonnegative Martingales and Change-of-measure
Remarkably, something special happens in the setting of a martingale. Specifically, suppose that (Mn : n ≥ 0) is a non-negative martingale that is adapted to (Zn : n ≥ 0) for which E M0 = 1. Set
Pn {·} = E [I {·} Mn ] .
Prove that if (Mn : n ≥ 0) is adapted to (Zn : n ≥ 0), then (Mn : n ≥ 0) is a martingale adapted
to (Zn : n ≥ 0).
In view of (2.8.1), the Kolmogorov extension theorem guarantees the existence of a single
probability P̃ on Ω (not depending on n) such that
P̃{(Z0 , . . . , Zn ) ∈ ·} = Pn {(Z0 , . . . , Zn ) ∈ ·}
for n ≥ 0, so that Ẽ[fn (Z0 , . . . , Zn )] = E [fn (Z0 , . . . , Zn )Mn ], where Ẽ[·] is the expectation operator corresponding to P̃. Furthermore, if T is a stopping time adapted to (Zn : n ≥ 0), then
Ẽ[fn (Z0 , . . . , Zn )I {T = n}] = E [fn (Z0 , . . . , Zn )Mn I {T = n}] (2.8.2)
for any non-negative function fn . Summing over n in (2.8.2), we arrive at the following important identity:
ẼfT (Z0 , . . . , ZT )I {T < ∞} = E fT (Z0 , . . . , ZT )MT I {T < ∞} . (2.8.3)
Hence, the expectation of any functional of the path of Z up to a stopping time T can be
computed in terms of an expectation defined in terms of P̃.
Returning to the random walk setting, recall that
Mn (θ) = e^{θ(Sn −S0 )−nψ(θ)}
is a unit mean positive martingale that is adapted to (Sn : n ≥ 0). If Pθ is the probability on Ω induced by the martingale (Mn (θ) : n ≥ 0), then the increments have distribution
Pθ {Zi ∈ dz} = P {Z1 ∈ dz} e^{θz−ψ(θ)} . (2.8.5)
In other words, Z1 , Z2 , . . . are iid under Pθ with common distribution given by (2.8.5), so that
(Sn : n ≥ 0) continues to be an (ordinary) random walk under Pθ (but with a modified increment
distribution).
Observe that if θ > 0, the distribution (2.8.5) favors positive values of Z relative to negative
values of Z (as compared to P {Z ∈ ·}). This implies that if Eθ [·] is the expectation operator
corresponding to Pθ , then Eθ Z > E Z when θ > 0 (and Eθ Z < E Z when θ < 0). In fact, ψ′ (θ) = Eθ Z and ψ″ (θ) = varθ Z for θ ∈ D0 , as the following computation shows.
Proof: Note that ψ′ (θ) = φ′ (θ)/φ(θ). An easy application of the Dominated Convergence Theorem shows that when θ ∈ D0 ,
(d^k/dθ^k) E e^{θZ} = E [(d^k/dθ^k) e^{θZ} ] = E [Z^k e^{θZ} ],
and hence
ψ′ (θ) = E [Z e^{θZ} ]/φ(θ) = E [Z e^{θZ−ψ(θ)} ] = Eθ Z.
Similarly,
ψ″ (θ) = φ″ (θ)/φ(θ) − (φ′ (θ)/φ(θ))^2 = E [Z^2 e^{θZ} ]/φ(θ) − (Eθ Z)^2 = Eθ Z^2 − (Eθ Z)^2 = varθ Z.
Note that if Z is not deterministic, then varθ Z > 0, so ψ is strictly convex on D0 . It follows
that ψ 0 (·) is strictly increasing, so that Eθ Z is strictly increasing in θ on D0 .
Example 2.8.1 Suppose that the Zi 's are iid Norm(µ, σ^2 ) rv's under P. Then, ψ(θ) = θµ + θ^2 σ^2 /2.
It follows from (2.8.3) and (2.8.4) that if T is a stopping time adapted to (Zn : n ≥ 0), then
Pθ {(Z1 , . . . , ZT ) ∈ ·, T < ∞} = E [I {(Z1 , . . . , ZT ) ∈ ·, T < ∞} e^{θ(ST −S0 )−T (θµ+θ^2 σ^2 /2)} ]
and
P {(Z1 , . . . , ZT ) ∈ ·, T < ∞} = Eθ [I {(Z1 , . . . , ZT ) ∈ ·, T < ∞} e^{−θ(ST −S0 )+T (θµ+θ^2 σ^2 /2)} ].
Furthermore, the Zi 's are iid under Pθ with common increment distribution given by
Pθ {Zi ∈ dz} = P {Z1 ∈ dz} e^{θz−(θµ+θ^2 σ^2 /2)} = (1/√(2πσ^2 )) e^{−(z−µ)^2 /(2σ^2 )+θz−θµ−θ^2 σ^2 /2} dz = (1/√(2πσ^2 )) e^{−(z−µ−θσ^2 )^2 /(2σ^2 )} dz,
so the Zi 's are iid Norm(µ + θσ^2 , σ^2 ) rv's under Pθ . Hence, the change-of-measure Pθ has added θσ^2 into the mean of the Zi 's as computed under P. So,
P {(Z1 + θσ^2 , Z2 + θσ^2 , . . . , Zn + θσ^2 ) ∈ ·} = E [I {(Z1 , . . . , Zn ) ∈ ·} e^{θ(Sn −S0 )−n(θµ+θ^2 σ^2 /2)} ]. (2.8.6)
In particular, if µ = 0, (2.8.6) shows that one can (in principle) reduce the computation of a probability or expectation involving Gaussian random walks with drift to a corresponding expectation computation involving a driftless (i.e. µ = 0) Gaussian random walk.
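The change-of-measure identity (2.8.6) is the basis of importance sampling for Gaussian random walks: sampling under Pθ and reweighting by e^{−θ(Sn −S0 )+nψ(θ)} gives an unbiased estimator of rare-event probabilities under P. A sketch for the driftless case (n, c, the tilt θ = c/n, and the sample size are illustrative choices):

```python
import math
import random

def tail_prob_is(n, c, paths=20000, seed=3):
    """Estimate P{S_n > c} for a driftless Gaussian walk (Z_i iid N(0,1))
    by sampling Z_i iid N(theta, 1) under P_theta and reweighting by
    exp(-theta*S_n + n*psi(theta)), with psi(theta) = theta^2/2."""
    rng = random.Random(seed)
    theta = c / n          # tilt so that the rare event becomes typical
    total = 0.0
    for _ in range(paths):
        s = sum(rng.gauss(theta, 1.0) for _ in range(n))
        if s > c:
            total += math.exp(-theta * s + n * theta * theta / 2)
    return total / paths

n, c = 10, 6.0                                   # illustrative parameters
exact = 0.5 * math.erfc(c / math.sqrt(2 * n))    # S_n ~ N(0, n)
est = tail_prob_is(n, c)
assert abs(est - exact) / exact < 0.15
```
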
For ε > 0,
(1/n) log P {Sn > n(µ + ε)} → −ε^2 /(2σ^2 )
as n → ∞. (This is a classical "large deviations" result. One way to prove this is by using the change-of-measure Pθ under which the Zi 's have mean µ + ε.)
The above exponential change-of-measure makes clear that T = ∞ cannot generally be substituted in (2.8.3) or (2.8.4). To see this, note that if we set A = {n^{−1} Sn → E Z1 as n → ∞}, then the strong law of large numbers (SLLN) guarantees that P {A} = 1. On the other hand, the SLLN under Pθ ensures that Pθ {n^{−1} Sn → ψ′ (θ) as n → ∞} = 1, so Pθ {A^c } = 1 for θ ≠ 0. The fact that P{A} = 1 but Pθ {A} = 0 is inconsistent with (2.8.3) and (2.8.4) holding for T = ∞. One can
further verify that
(1/n) log Mn (θ) → θ E Z1 − ψ(θ) < 0 a.s.
under P as n → ∞ (for θ ≠ 0), so M∞ (θ) = 0 a.s. under P for this class of martingales.
Remark 2.8.1 In the language of measure-theoretic probability, Pθ and P are singular on the σ-algebra generated by (Zn : n ≥ 0) but mutually absolutely continuous on the σ-algebra generated by (Zn : 0 ≤ n ≤ T), where T is a finite-valued stopping time adapted to the Zn 's.
Turning next to the exponential martingales associated with Markov random walks, we recall that

$$M_n(\theta) = e^{\theta S_n - n\psi(\theta)}\, \frac{h(\theta, X_n)}{h(\theta, X_0)}$$

is a unit mean positive martingale adapted to (Xn : n ≥ 0). Let Pθ{·} and Eθ(·) be the probability and expectation operators on Ω induced by the martingale (Mn(θ) : n ≥ 0). Note that

$$P_\theta\{X_{n+1} = y \mid X_0, \ldots, X_n\} = K(\theta, X_n, y),$$
where K(θ) = (K(θ, x, y) : x, y ∈ S) is the stochastic matrix with (x, y)'th entry given by

$$K(\theta, x, y) = P(x, y)\, e^{\theta f(x) - \psi(\theta)}\, \frac{h(\theta, y)}{h(\theta, x)}.$$

In other words, X continues to be a Markov chain under Pθ (with stationary transition probabilities), but with modified transition matrix given by K(θ).
Problem 2.8.3 Prove that if X is irreducible with finite state space, then

$$\psi'(\theta) = \sum_x \pi(\theta, x) f(x),$$

where π(θ) = (π(θ, x) : x ∈ S) is the stationary distribution of K(θ).
Problem 2.8.4 Consider the exponential martingales constructed in Problem 2.7.1 for Markov
jump processes. If Pθ is the change-of-measure induced by (M (θ, t) : t ≥ 0), prove that X
continues to be a Markov jump process under Pθ , and compute its modified rate matrix.
We conclude this section with a brief discussion of a further useful class of nonnegative martingales that induce associated change-of-measures. Let X = (Xn : n ≥ 0) be a discrete state space Markov chain and set T = inf{n ≥ 0 : Xn ∈ A ∪ B}, where A and B are disjoint subsets of S. Suppose that Px{T < ∞} = 1 for each x ∈ S, and set

$$u^*(x) = P_x\{X_T \in A\}.$$

If u is a bounded solution of (2.8.7) subject to the boundary conditions u ≡ 1 on A and u ≡ 0 on B, then Mn = u(X_{T∧n})/u(X_0) is a unit mean martingale under Px, so that

$$E_x M_n = E_x\left[\frac{u(X_{T\wedge n})}{u(x)}\right] = 1$$

and hence

$$P_x\{X_T \in A,\ T \le n\} + E_x\left[u(X_n) I\{T > n\}\right] = u(x).$$

Sending n → ∞ and applying the Bounded Convergence Theorem yields the conclusion that u = u∗. In other words, any bounded solution to (2.8.7) (subject to the boundary conditions on A and B) is the probabilistically meaningful solution.
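A minimal numerical sketch of this (a hypothetical gambler's-ruin example, not from the notes): for the simple symmetric walk on {0, ..., N} with A = {N} and B = {0}, the harmonic equation reads u(x) = (u(x−1) + u(x+1))/2 on the interior, and the bounded solution is u*(x) = Px{XT = N} = x/N:

```python
def hitting_prob(N, sweeps=20000):
    """For a simple symmetric random walk on {0, ..., N} absorbed at 0 and N,
    solve the harmonic equation u(x) = (u(x-1) + u(x+1)) / 2 on the interior,
    with boundary conditions u(0) = 0 (the set B) and u(N) = 1 (the set A),
    by Gauss-Seidel sweeps."""
    u = [0.0] * (N + 1)
    u[N] = 1.0
    for _ in range(sweeps):
        for x in range(1, N):
            u[x] = 0.5 * (u[x - 1] + u[x + 1])
    return u

u = hitting_prob(10)
# compare with the exact bounded solution u*(x) = x / N
err = max(abs(u[x] - x / 10) for x in range(11))
```

The iteration converges because the harmonic system with these boundary conditions has the unique bounded solution that the argument above identifies with the hitting probability.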
But (Mn : n ≥ 0) also induces a change-of-measure, call it P̃. In particular, for fT (·) nonnegative,

$$\tilde{E}_x\, f_T(X_0, \ldots, X_T) = E_x\!\left[f_T(X_0, \ldots, X_T)\, \frac{u(X_T)}{u(X_0)}\right] = \frac{E_x\!\left[f_T(X_0, \ldots, X_T)\, I\{X_T \in A\}\right]}{P_x\{X_T \in A\}} = E_x\!\left[f_T(X_0, \ldots, X_T) \mid X_T \in A\right],$$

where Ẽx[·] is the expectation operator associated with P̃{·|X0 = x}. In other words, the change-of-measure induced by (Mn : n ≥ 0) is precisely the conditional distribution of X, given that XT ∈ A.
Furthermore, on {n < T},

$$\tilde{P}\{X_{n+1} = y \mid X_0, \ldots, X_n\} = P(X_n, y)\, \frac{u(y)}{u(X_n)},$$

so that (Xn : 0 ≤ n < T) is, conditional on XT ∈ A, a Markov chain with modified transition matrix having (x, y)'th entry given by P(x, y)u(y)/u(x) for x ∈ Ac ∩ Bc.
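Continuing the hypothetical gambler's-ruin illustration: with u(x) = x/N for the simple symmetric walk conditioned to hit N before 0, the conditioned transition probabilities P(x, y)u(y)/u(x) can be written down explicitly, and one can check directly that they form a stochastic matrix and that the conditioned walk acquires an upward drift:

```python
def conditioned_transition(N):
    """Transition probabilities of the simple symmetric walk on {0, ..., N}
    conditioned to hit N before 0 (the transform with u(x) = x/N):
    P~(x, y) = P(x, y) * u(y) / u(x) for interior states x."""
    Pt = {}
    for x in range(1, N):
        Pt[x] = {x - 1: 0.5 * (x - 1) / x, x + 1: 0.5 * (x + 1) / x}
    return Pt

Pt = conditioned_transition(10)
rows_ok = all(abs(sum(Pt[x].values()) - 1.0) < 1e-12 for x in Pt)
up_drift = all(Pt[x][x + 1] > 0.5 for x in Pt)  # conditioned walk drifts upward
```

Note in particular that P̃(1, 0) = 0: the conditioned chain can never step into B, as it must, since the conditioning event {XT ∈ A} forbids it.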
2.9 The Doob-Meyer Decomposition for Supermartingales

This suggests the possibility of writing M as the sum of a martingale (that models a fair game) and a decreasing process. Such a decomposition is due to Doob and Meyer (and is a non-trivial fact in continuous time; see Section 3.11).
Definition 2.9.1 We say that the sequence V = (Vn : n ≥ 0) is predictable (with respect to (Zn : n ≥ 0)) if (Vn+1 : n ≥ 0) is adapted to (Zn : n ≥ 0) (i.e. for each n ≥ 1, there exists a (deterministic) function gn such that Vn = gn(Z0, . . . , Zn−1)).
Remark 2.9.1 Note that the sequence (Γn : n ≥ 0) constructed in the proof of Theorem 2.9.1 is
a predictable sequence.
Problem 2.9.1 Suppose that (Mn : n ≥ 0) is a supermartingale adapted to (Zn : n ≥ 0). Suppose that

$$M_n = \tilde{M}_n + \tilde{\Gamma}_n,$$

where (M̃n : n ≥ 0) is a martingale adapted to (Zn : n ≥ 0) with M̃0 = M0 and (Γ̃n : n ≥ 1) is a predictable sequence (with respect to (Zn : n ≥ 0)). Show that

$$\tilde{M}_n = M_0 + \sum_{i=1}^n \left(M_i - E[M_i \mid Z_0, \ldots, Z_{i-1}]\right),$$

$$\tilde{\Gamma}_n = \sum_{i=1}^n \left(E[M_i \mid Z_0, \ldots, Z_{i-1}] - M_{i-1}\right).$$

(Hence, the decomposition can be viewed as being unique in the above sense.)
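A quick pathwise check of these formulas on a hypothetical example: take Mn = Sn for a Gaussian random walk with negative drift μ, so that E[Mi | Z0, ..., Zi−1] = Mi−1 + μ, and the formulas of Problem 2.9.1 give M̃n = Mn − nμ and Γ̃n = nμ (predictable and decreasing, since μ < 0):

```python
import random

def doob_decomposition(path, mu):
    """Pathwise Doob decomposition M_n = M~_n + G_n for the supermartingale
    M_n = S_n, a random walk whose iid increments have known mean mu < 0.
    Since E[M_i | Z_0, ..., Z_{i-1}] = M_{i-1} + mu, the martingale part
    accumulates the centered increments and the predictable part is n*mu."""
    mart, comp = [path[0]], [0.0]
    for i in range(1, len(path)):
        mart.append(mart[-1] + (path[i] - path[i - 1] - mu))  # martingale part
        comp.append(comp[-1] + mu)                            # predictable, decreasing
    return mart, comp

rng = random.Random(1)
mu = -0.25
path = [0.0]
for _ in range(50):
    path.append(path[-1] + rng.gauss(mu, 1.0))

mart, comp = doob_decomposition(path, mu)
recon_err = max(abs(path[n] - (mart[n] + comp[n])) for n in range(51))
```

The reconstruction error is zero up to floating-point rounding, and the increments of the martingale part have mean zero given the past, which is exactly the content of the decomposition.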
Definition 2.9.2 The sequence (Γn : n ≥ 1) is called the predictable quadratic variation of M and is denoted by (⟨M⟩(n) : n ≥ 1).
2.10 Martingales in Continuous Time

We discuss here the generalizations of our key results established for discrete-time martingales, supermartingales, and submartingales to the continuous-time setting. Because of results like the upcrossing and downcrossing inequalities, it should come as no surprise that if (M(t) : t ≥ 0) is a submartingale adapted to (Z(t) : t ≥ 0), then one can establish (in great generality) that one may presume that M(·) has right-continuous paths (provided that E M(·) is right-continuous on [0, ∞)).
Perhaps the most important result that we will need is the optional sampling theorem in contin-
uous time. This can be obtained from the discrete-time result by an approximation argument that is
widely used in proving that discrete-time martingale results can be generalized to continuous time.
Note that if M is a submartingale that is adapted to Z = (Z(t) : t ≥ 0), then (M(j2⁻ⁿ) : j ≥ 0) is a discrete-time submartingale adapted to (M(0), (Z(((j−1)+t)2⁻ⁿ) : 0 ≤ t ≤ 1) : j ≥ 1) for each n ≥ 1. For a given stopping time T that is adapted to (Z(t) : t ≥ 0), put

$$T_n = 2^{-n} \lceil 2^n T \rceil,$$

where ⌈x⌉ is the least integer greater than or equal to x. Note that

$$E\, M(T \wedge t) \ge E\, M(0)$$

for t ≥ 0.
This is easily derived from the corresponding discrete-time result, namely Theorem 2.5.1, by
approximating the supremum over s ∈ [0, t] by the supremum over {j2−n : 0 ≤ j2−n ≤ t}.
The Martingale Convergence Theorem takes the following form in continuous time.
Definition 2.10.1 We say that (V(t) : t ≥ 0) is predictable (with respect to (Z(t) : t ≥ 0)) if (V(t) : t ≥ 0) is adapted to (Z(t−) : t ≥ 0) (i.e. for each t ≥ 0, there exists a (deterministic) function gt such that V(t) = gt(Z(u) : 0 ≤ u < t)).
Problem 2.10.1 Let X = (X(t) : t ≥ 0) be a finite state Markov jump process with rate matrix Q. For g : S → R, consider the martingale

$$M(t) = g(X(t)) - \int_0^t (Qg)(X(s))\, ds.$$
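This martingale is easy to simulate; the sketch below (a hypothetical two-state example, not from the notes) evaluates M(t) along exact paths of the jump process, so that the unit property Ex M(t) = g(x) can be checked by Monte Carlo:

```python
import random

def simulate_M(Q, g, x0, t_end, rng):
    """Simulate one path of a finite-state Markov jump process with rate
    matrix Q and evaluate the martingale
    M(t) = g(X(t)) - integral_0^t (Qg)(X(s)) ds at t = t_end.
    The integral is computed exactly because X is piecewise constant."""
    n = len(Q)
    Qg = [sum(Q[x][y] * g[y] for y in range(n)) for x in range(n)]
    x, t, integral = x0, 0.0, 0.0
    while True:
        rate = -Q[x][x]
        hold = rng.expovariate(rate) if rate > 0 else float("inf")
        if t + hold >= t_end:
            integral += Qg[x] * (t_end - t)
            return g[x] - integral
        integral += Qg[x] * hold
        t += hold
        # jump to y != x with probability Q[x][y] / rate
        u, acc = rng.random() * rate, 0.0
        for y in range(n):
            if y != x:
                acc += Q[x][y]
                if u <= acc:
                    x = y
                    break

# Hypothetical two-state chain with rates 1 (out of state 0) and 2 (out of 1).
Q = [[-1.0, 1.0], [2.0, -2.0]]
g = [0.0, 1.0]
rng = random.Random(7)
samples = [simulate_M(Q, g, x0=0, t_end=2.0, rng=rng) for _ in range(20000)]
mean_M = sum(samples) / len(samples)   # should be close to g(x0) = 0
```

The sample mean of M(2) concentrates around g(x0) = 0, consistent with the martingale property E M(t) = E M(0).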