
MS&E 322                                                      Winter 2023
Stochastic Calculus and Control                           January 7, 2023
Prof. Peter W. Glynn

Section 1: What This Course Is About

Contents

1.1 Introduction
1.2 Discrete-time Deterministic Dynamical Systems
1.3 Continuous-time Deterministic Dynamical Systems
1.4 Discrete-time Markov Chains
1.5 Stochastic Differential Equations
1.6 Markov Structure of SDE's
1.7 Connection Between SDE's and PDE's
1.8 Advantages of Model Formulation in Continuous Time

1.1 Introduction

Our goal here is to give a brief account of some of the main themes of the course, in particular the
role of stochastic differential equations (SDE’s) in modeling continuous-time dynamical systems that
are subject to uncertainty and identifying some of the key properties that make this class of models
tractable. We start with a discussion of dynamical systems in discrete time and then introduce
the class of models that are solutions of SDE’s. This is followed by a heuristic derivation of the
beautiful connections between SDE’s and (deterministic) partial differential equations (PDE’s). We
conclude this section with a discussion of some of the advantages of modeling stochastic systems
via continuous-time SDE’s.

1.2 Discrete-time Deterministic Dynamical Systems

Consider a discrete-time time-dependent Rd -valued sequence (xn : n ≥ 0). The most common
means of modeling such dynamical systems is by specifying their dynamics recursively, via an
equation of the form
xn+1 = fn+1 (x0 , x1 , . . . , xn ) (1.2.1)

subject to the initial condition x0 = x.


Of course, it has long been known that the tractability of (1.2.1) is greatly enhanced by assuming
that the function fn+1 (·) depends on the history (x0 , x1 , . . . , xn ) only through xn , so that

xn+1 = fn+1 (xn ) (1.2.2)

subject to x0 = x. When the recursion (1.2.1) takes the form of (1.2.2), xn is typically called the
“state” of the system at time n.


Note that by setting yn = n, we can write

    (xn+1 , yn+1 )T = (f_{yn+1}(xn ), yn + 1)T

subject to (x0 , y0 )T = (x, 0)T . It follows that by passing to the "space-time" quantity
x̃n = (xn , yn )T , the recursion (1.2.2) can be written as

    x̃n+1 = f (x̃n ),                                                  (1.2.3)

where

    f (x̃) = f ((x, y)T ) = (f_{y+1}(x), y + 1)T .
Hence, (1.2.2) can be reduced to the special case in which fn is independent of n. Of course, this
comes at the cost of increasing the dimension of the state space from d to d + 1. As a result, while
this reduction is mathematically useful, it is rarely exploited at a computational level.

When the dynamics are described through (1.2.3), the equilibrium behavior of the system can
be readily computed. In particular, assuming that xn → x∞ as n → ∞, the equilibrium (or
steady-state) x∞ should satisfy the fixed point equation

x∞ = f (x∞ ). (1.2.4)

Finally, it should be evident that the most tractable dynamics arise when the function f ap-
pearing in (1.2.4) is affine, so that
f (x) = M x + c
for some d × d matrix M and d × 1 (column) vector c. In this “linear” setting, a great deal can be
computed in closed form, and much is known about the qualitative behavior of (xn : n ≥ 0).
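
To make this concrete, here is a minimal numerical sketch of the affine case; the matrix M and vector c are arbitrary illustrative choices (with the spectral radius of M below one, so that the iterates converge), and the fixed point solving (1.2.4) is x∞ = (I − M)^{−1} c:

```python
import numpy as np

# A minimal sketch of the affine recursion x_{n+1} = M x_n + c. The matrix M
# (spectral radius < 1, so the iterates converge) and the vector c are
# arbitrary illustrative choices.
M = np.array([[0.5, 0.1],
              [0.2, 0.3]])
c = np.array([1.0, 2.0])

x = np.zeros(2)              # initial condition x_0
for n in range(200):
    x = M @ x + c            # the recursion (1.2.2) in the affine case

# The equilibrium solves (1.2.4): x_inf = M x_inf + c, i.e. (I - M) x_inf = c.
x_inf = np.linalg.solve(np.eye(2) - M, c)
print(x, x_inf)              # the two agree to numerical precision
```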

1.3 Continuous-time Deterministic Dynamical Systems

Let (x(t) : t ≥ 0) be an Rd -valued continuous-time time-dependent process. The most common


means of modeling the evolution of such a system is through a differential equation of the form
    (d/dt) x(t) = gt (x(s) : 0 ≤ s ≤ t)                               (1.3.1)

subject to the initial condition x(0) = x. As in the discrete-time setting, tractability typically
demands that one restrict one's attention to equations taking the form

    (d/dt) x(t) = gt (x(t))                                           (1.3.2)
subject to x(0) = x. As in discrete time, x(t) is called the “state” of the system at time t. By
passing to the “space-time” process x̃(t) = (x(t), t)T , one can (without loss of generality) assume
that the function gt appearing in (1.3.2) is independent of time (at the cost of increasing the
dimension of the state variable from d to d + 1).

Suppose then that

    (d/dt) x(t) = g(x(t))

subject to x(0) = x. Assuming that x(t) → x(∞) as t → ∞, one expects that

    (d/dt) x(t) → 0

as t → ∞, and hence the equilibrium x(∞) should satisfy

    g(x(∞)) = 0.

This equation is, of course, the continuous-time analog to (1.2.4).


Finally, the most tractable variant of the above class of continuous-time models arises when g
is affine, so that the dynamical system is described by
    (d/dt) x(t) = M x(t) + c,
subject to x(0) = x, where M is a d × d matrix and c is a d × 1 column vector.
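
In the same spirit as the discrete-time sketch above, a small numerical experiment (again with illustrative M and c, here with the eigenvalues of M having negative real parts so the trajectory converges) integrates the affine ODE by Euler's method and compares the long-run state with the equilibrium solving M x(∞) + c = 0:

```python
import numpy as np

# A minimal sketch: Euler integration of dx/dt = M x + c. M is chosen
# (arbitrarily, for illustration) to have eigenvalues with negative real
# parts, so x(t) converges to the equilibrium x(inf) solving M x + c = 0.
M = np.array([[-1.0, 0.2],
              [0.1, -0.5]])
c = np.array([1.0, 2.0])

x = np.zeros(2)              # initial condition x(0)
dt = 0.01
for _ in range(10_000):      # integrate out to t = 100
    x = x + dt * (M @ x + c)

x_inf = np.linalg.solve(M, -c)   # equilibrium: g(x(inf)) = 0
print(x, x_inf)
```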

1.4 Discrete-time Markov Chains

The above ideas carry over in a direct way to the setting of dynamical systems that evolve
stochastically. We start first with a discussion of discrete-time Rd -valued stochastic sequences
(Xn : n ≥ 0). A very general equation for the dynamical behavior of such a system is to postulate
that
Xn+1 = fn+1 (X0 , X1 , . . . , Xn , ξn+1 ), (1.4.1)
subject to X0 = x, where (ξn : n ≥ 1) is a sequence of independent and identically distributed (iid)
“noise disturbances”. Such an iid sequence of random variables (rv’s) is often called a “white noise
process” in discrete time. Note that regardless of the distribution of the ξj ’s, the recursion (1.4.1)
always leads to a well-defined sequence of Xn ’s (so that the model above always makes sense at a
mathematical level).

For reasons identical to those that arise in the deterministic setting, the systems described by
(1.4.1) become much more tractable if the functions fn+1 (·) depend on the history (X0 , X1 , . . . , Xn )
only through Xn so that
Xn+1 = fn+1 (Xn , ξn+1 ) (1.4.2)
subject to X0 = x. In this special case, the independence of the ξn ’s guarantees that the solution
to (1.4.2) enjoys the Markov property, namely

P {Xn+1 ∈ ·|X0 , . . . , Xn } = P {Xn+1 ∈ ·|Xn } . (1.4.3)

In general, the one-step transition probability P {Xn+1 ∈ ·|Xn = x} takes the form

P {Xn+1 ∈ A|Xn = x} = Pn+1 (x, A), (1.4.4)

so that the transition probabilities typically depend explicitly on time and hence are “non-stationary”
transition probabilities. A discrete-time stochastic sequence (Xn : n ≥ 0) exhibiting the Markov

property (1.4.3) is called a discrete-time Markov chain (DTMC). Here the state space is S = Rd .

Not surprisingly, by passing to the “space-time” process X̃n = (Xn , n)T , we obtain a Markov
chain (X̃n : n ≥ 0) satisfying a recursion of the form

X̃n+1 = f (X̃n , ξn+1 ); (1.4.5)

the corresponding transition probabilities are now stationary. As in the deterministic setting, this
reduction from the non-stationary setting to the stationary setting comes at the cost of increasing
the dimension of the state variable from d to d + 1. As a consequence, this reduction is computa-
tionally not useful (although theoretically convenient).

Suppose now that the Markov chain (Xn : n ≥ 0) obeys (1.4.2) with fn independent of n (as in
(1.4.5)). Assuming that (Xn : n ≥ 0) exhibits equilibrium behavior in the sense that there exists
a limit random variable (rv) X∞ such that

Xn ⇒ X∞

as n → ∞, one expects that the equilibrium limiting quantity X∞ should satisfy

    X∞ =D f (X∞ , ξ),                                                 (1.4.6)

where =D denotes "equality in distribution" and ξ is a rv independent of X∞ and with the same
distribution as the ξj 's. It follows from (1.4.6) that the equilibrium distribution π(·) = P {X∞ ∈ ·}
should satisfy the equation

    π(·) = ∫_{Rd} π(dx) P {f (x, ξ) ∈ ·} .

Finally, the most tractable class of Rd -valued Markov chains arises when f (·) is affine, in which
case the Xj ’s satisfy a recursion of the form

    Xn+1 = M Xn + N ξn+1 + c
         = M Xn + ξ′n+1 ,

where ξ′n+1 = N ξn+1 + c, for some d × d (deterministic) matrix M . This class of stochastic models
is often called a "state space" model, and is widely used throughout engineering and economics.
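
As a quick illustration, the sketch below simulates a scalar instance of such a state space model (the parameter values are arbitrary choices made for this example) and compares the empirical long-run mean and variance with the equilibrium suggested by (1.4.6); for Xn+1 = m Xn + ξ′n+1 with ξ′ distributed N(c, s²), the equilibrium X∞ is N(c/(1 − m), s²/(1 − m²)):

```python
import numpy as np

rng = np.random.default_rng(0)

# A scalar instance of the state space model: X_{n+1} = m X_n + xi'_{n+1}
# with xi' ~ N(c, s^2). The parameter values m, c, s are illustrative.
m, c, s = 0.8, 1.0, 0.5
n_steps = 100_000
X = 0.0
samples = np.empty(n_steps)
for n in range(n_steps):
    X = m * X + c + s * rng.standard_normal()
    samples[n] = X

# Equilibrium suggested by (1.4.6): X_inf is N(c/(1 - m), s^2/(1 - m^2)).
print(samples[1_000:].mean(), c / (1 - m))          # ~ 5.0
print(samples[1_000:].var(), s**2 / (1 - m**2))     # ~ 0.694
```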

We conclude this section by quickly reviewing the equations that arise when computing various
probabilities and expectations in the discrete-time Markov chain context. We assume that (Xn :
n ≥ 0) satisfies the stochastic recursion

Xn+1 = f (Xn , ξn+1 )

subject to X0 = x. When (Xn : n ≥ 0) has stationary transition probabilities, it is standard to
adopt the notation

    Px {·} = P {·|X0 = x}

and

    Ex [·] = E [·|X0 = x] .

Example 1.4.1 (Computing Expected Pay-off / Reward at Time n) Our goal here is to
compute Ex [r(Xn )] for some “pay-off/reward” function r : Rd → R. Put
    ũ∗j (x) = E [r(Xn )|Xj = x]

for 0 ≤ j ≤ n. Note that for 0 ≤ j < n,

    ũ∗j (x) = E [r(Xn )|Xj = x]
            = E [E [r(Xn )|Xj , Xj+1 ] |Xj = x]
            = E [ũ∗j+1 (Xj+1 )|Xj = x]
            = ∫_{Rd} ũ∗j+1 (y) Px {X1 ∈ dy} .

Since ũ∗n = r, we can solve for ũ∗n−1 , ũ∗n−2 , . . . , ũ∗0 by backwards recursion. The function ũ∗0 is the
desired expectation.

We can re-index time to get an equation that is more closely analogous to the equations that
arise in continuous-time stochastic modeling. Put u∗j = ũ∗n−j . Then
    u∗n−j (x) = ∫_{Rd} u∗n−j−1 (y) Px {X1 ∈ dy}

and hence

    u∗n−j (x) − u∗n−j−1 (x) = ∫_{Rd} u∗n−j−1 (y) (Px {X1 ∈ dy} − δx (dy)) ,

where δx (A) = 1 if x ∈ A and 0 otherwise. In other words,

    ∆u∗n−j (x) = (Au∗n−j−1 )(x),                                      (1.4.7)

where A is the linear (integral) operator defined by

    (Ag)(x) = ∫_{Rd} g(y) (Px {X1 ∈ dy} − δx (dy)) ,

subject to u∗0 = r. This is the so-called backwards equation for the Markov chain (Xn : n ≥ 0).
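
For a finite-state chain, where Px {X1 ∈ dy} is just a transition matrix P, the backwards recursion amounts to repeated matrix-vector products. A minimal sketch (the transition matrix P and reward r are arbitrary illustrative choices):

```python
import numpy as np

# Backwards recursion for a finite-state DTMC, where Px{X1 in dy} is a
# transition matrix P: u_j = P u_{j+1} with u_n = r, so u_0(x) = E_x[r(X_n)].
# The matrix P and reward r are arbitrary illustrative choices.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])      # one-step transition matrix
r = np.array([1.0, 0.0, -1.0])      # reward function on the 3 states
n = 10                              # horizon

u = r.copy()                        # u_n = r
for _ in range(n):
    u = P @ u                       # u_j(x) = sum_y P(x, y) u_{j+1}(y)
print(u)                            # u[x] = E_x[r(X_n)]
```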

Example 1.4.2 (Computing Expected Cumulative Reward to Hitting a Set) Given a set
C c ⊆ Rd , let

    u∗ (x) = Ex [ Σ_{j=0}^{T−1} r(Xj ) ],

where T = inf{n ≥ 1 : Xn ∈ C c } is the first "hitting time" of C c . For x ∈ C,

    u∗ (x) = r(x) + Ex [ Σ_{j=1}^{T−1} r(Xj ) I{T > 1} ]
           = r(x) + Ex [u∗ (X1 ) I{X1 ∈ C}]
           = r(x) + ∫_C u∗ (y) Px {X1 ∈ dy} .

Of course, u∗ satisfies the boundary condition u∗ (x) = 0 for x ∈ C c . Consequently,

    (Au∗ )(x) = −r(x)                                                 (1.4.8)

for x ∈ C, subject to u∗ = 0 on C c .
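
In the finite-state case, (1.4.8) is simply a linear system: restricting the transition matrix to C and solving (I − P_C) u = r gives the expected cumulative reward up to the exit time. The sketch below works a small random-walk example (all parameters illustrative); with r ≡ 1, u(x) is the expected time to exit C itself:

```python
import numpy as np

# Finite-state version of (1.4.8): on C, (I - P_C) u = r with u = 0 on C^c,
# where P_C is the transition matrix restricted to C. Illustrative example:
# a simple symmetric random walk on {0,...,5}, C = {1,2,3,4}, r = 1, so that
# u(x) is the expected time to exit C.
states_C = [1, 2, 3, 4]
P_C = np.zeros((4, 4))
for i, x in enumerate(states_C):
    for j, y in enumerate(states_C):
        if abs(x - y) == 1:
            P_C[i, j] = 0.5        # step to x-1 or x+1 with probability 1/2

r = np.ones(4)
u = np.linalg.solve(np.eye(4) - P_C, r)
print(u)                           # closed form u(x) = x(5 - x): [4, 6, 6, 4]
```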

Example 1.4.3 (Computing Expected Infinite Horizon Discounted Reward) For α > 0,
let

    u∗ (x) = Ex [ Σ_{j=0}^{∞} e^{−αj} r(Xj ) ].

Then

    u∗ (x) = r(x) + e^{−α} Ex [ Ex [ Σ_{j=0}^{∞} e^{−αj} r(Xj+1 ) | X1 ] ]
           = r(x) + e^{−α} Ex [u∗ (X1 )]
           = r(x) + e^{−α} ∫_{Rd} u∗ (y) Px {X1 ∈ dy} .

This can, if we wish, be re-written in the form

    (Au∗ )(x) − (e^α − 1) u∗ (x) = −e^α r(x)                          (1.4.9)

for x ∈ Rd .
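
In the finite-state case this, too, is a linear system: u = r + e^{−α} P u, i.e. (I − e^{−α} P) u = r. A minimal sketch (with the same style of illustrative P and r as before):

```python
import numpy as np

# Finite-state version of (1.4.9): u = r + e^{-alpha} P u, which is the
# linear system (I - e^{-alpha} P) u = r. P, r, and alpha are illustrative.
alpha = 0.1
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
r = np.array([1.0, 0.0, -1.0])

u = np.linalg.solve(np.eye(3) - np.exp(-alpha) * P, r)
print(u)    # u[x] = E_x[ sum_j e^{-alpha j} r(X_j) ]
```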

1.5 Stochastic Differential Equations

We now arrive at the class of models that will form the core of this course, namely the class
of continuous-time stochastic processes that arise as solutions of stochastic differential equations.
Such equations are commonly used to describe time-dependent phenomena arising in engineering
contexts, physical science applications, and economic and financial settings.

Given our discussion thus far, the most obvious approach to modeling a stochastic system in
continuous time is to postulate that the Rd -valued process (X(t) : t ≥ 0) satisfies a differential
equation of the form
    (d/dt) X(t) = gt ((X(s) : 0 ≤ s ≤ t), ξ(t)),                      (1.5.1)
subject to
X(0) = x, (1.5.2)
where (ξ(t) : t ≥ 0) is a continuous-time Rm -valued "white noise" process (i.e. ξ(t) is identically
distributed for t ≥ 0 and ξ(t) is independent of (ξ(u) : u ≠ t) for all t ≥ 0). Of course, in order that
the equation (1.5.1) make sense as a model, one must establish mathematically that a process X
satisfying (1.5.1) and (1.5.2) exists. In fact, we typically would also like to know that the solution
X is (in some suitable sense) unique, for otherwise there are multiple different models (each po-
tentially exhibiting different behavior) that are consistent with our modeling postulates. Thus, the
use of (1.5.1) and (1.5.2) as a model description demands that a mathematical existence-uniqueness
theory for such equations be created.

Such a theory does not exist, in large part because the right-hand side of (1.5.1) depends explic-
itly on continuous-time white noise. The process (ξ(t) : t ≥ 0) is incredibly badly behaved from a
mathematical viewpoint, because ξ(s) has nothing to do with the value ξ(t), regardless of how close
s is to t. As a consequence, the realizations of (ξ(t) : t ≥ 0) exhibit no mathematical “regularity”

whatsoever, so that it is impossible to build a general existence-uniqueness theory for such equa-
tions. Of course, it is known that integration of a function smooths the function and makes it more
regular. This suggests that the right starting point for a general class of continuous-time stochastic
models should take as its primitive a continuous-time version of “integrated white noise”, rather
than white noise itself.

To get a sense of the correct definition for integrated white noise in continuous-time, consider
discrete-time white noise (ξn : n ≥ 1). Discrete-time integrated white noise is the sequence (Sn :
n ≥ 0), where
Sn = S0 + ξ1 + · · · + ξn .
(Such a sequence is often called a random walk process.) The sequence (Sn : n ≥ 0) has the
following important properties:

1. For n1 < n2 < · · · < nℓ , the increments Sn1 − S0 , Sn2 − Sn1 , . . . , Snℓ − Snℓ−1 are independent
rv's. (This is what is known as the independent increments property.)

2. The increment Sn+ℓ − Sℓ has a distribution identical to that of Sn − S0 . (This is what is
known as the stationary increments property.)

In addition, if E [ξj ] = 0 and var (‖ξj ‖) < ∞, then

    n^{−1/2} Σ_{j=1}^{n} ξj ⇒ Σ^{1/2} N (0, I)

as n → ∞, where Σ is the covariance matrix of ξ and N (0, I) is an m-vector of independent unit
variance normal rv's with mean 0. This implies that

    Sn − S0 ≈D √n Σ^{1/2} N (0, I)                                    (1.5.3)

for n large (where ≈D means "has approximately the same distribution as" and is not intended to
be interpreted as a rigorous mathematical statement). Since

    √n Σ^{1/2} N (0, I) =D N (0, nΣ) ,

we may re-write (1.5.3) as

    Sn − S0 ≈D N (0, nΣ)

for n large.
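
A quick simulation illustrates (1.5.3): even for increments far from Gaussian (here, illustrative ±1 coin flips with variance one), the normalized increment n^{−1/2}(Sn − S0) is already close to N(0, 1) for moderate n:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustration of (1.5.3): even for increments far from Gaussian (here,
# +/-1 coin flips with variance 1), n^{-1/2}(S_n - S_0) looks like N(0, 1).
n, n_paths = 500, 20_000
xi = rng.choice([-1.0, 1.0], size=(n_paths, n))    # discrete white noise
Z = xi.sum(axis=1) / np.sqrt(n)                    # normalized increments

print(Z.mean(), Z.var())       # ~ 0.0 and ~ 1.0
print(np.mean(Z <= 1.0))       # ~ Phi(1) = 0.8413
```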

The above discrete-time discussion suggests the following continuous-time formulation for inte-
grated white noise. We say that (Z(t) : t ≥ 0) is a continuous-time integrated white noise process
if:

1. (Z(t) : t ≥ 0) has independent increments (i.e. for 0 < t1 < t2 < · · · < tn , Z(t1 ) −
Z(0), Z(t2 ) − Z(t1 ), . . . , Z(tn ) − Z(tn−1 ) are independent rv’s);

2. (Z(t) : t ≥ 0) has stationary increments (i.e. for s, t ≥ 0, Z(s + t) − Z(t) has the same
distribution as Z(s) − Z(0));

3. Z(t) − Z(0) =D N (0, tΣ) for some symmetric non-negative definite matrix Σ.

The class of stochastic processes Z = (Z(t) : t ≥ 0) that we have described through 1., 2., and
3. above is what is called Brownian motion (more precisely, Brownian motion with zero drift and
covariance matrix Σ).

Definition 1.5.1 An Rm -valued process B = (B(t) : t ≥ 0) is called m-dimensional standard
Brownian motion if it has stationary independent increments and satisfies

    B(t) − B(0) =D N (0, tI) .

Note that a Brownian motion (Z(t) : t ≥ 0) with zero drift and covariance matrix Σ can be
expressed in terms of standard Brownian motion (just as a N (0, σ²) rv can be expressed as
σ N (0, 1)):

    Z(·) =D Σ^{1/2} B(·).                                             (1.5.4)
As a consequence, the fundamental continuous-time integrated white noise process is standard
Brownian motion B. Just as in discrete time where

∆Sn = ξn ,

we can symbolically write


dB(t) = ξ(t)dt.
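
Because the increments are stationary, independent, and Gaussian, Brownian motion is straightforward to simulate on a time grid: one simply cumulates N(0, h) increments. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal sketch: simulate standard one-dimensional Brownian motion on a
# grid of mesh h by cumulating independent N(0, h) increments, which is all
# that properties 1-3 above require at the grid points.
T, n = 1.0, 1_000
h = T / n
dB = np.sqrt(h) * rng.standard_normal(n)          # increments over each step
B = np.concatenate([[0.0], np.cumsum(dB)])        # B(0) = 0, then B on grid

print(B[-1])    # a single draw of B(T), distributed N(0, T)
```
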
Returning now to (1.5.1), we must pay a price in order to exploit integrated white noise in continuous
time as a means of making sense of the equation mathematically. The price we shall pay is that
a useful theory can be developed only in the special case in which the noise appears additively in
(1.5.1), namely an equation of the form
    (d/dt) X(t) = µt (X(s) : 0 ≤ s ≤ t) + σt (X(s) : 0 ≤ s ≤ t) ξ(t)  (1.5.5)
where µt (·) is Rd -valued and σt (·) is Rd×m -valued.

Formally integrating both sides of (1.5.5) we obtain

    X(t) − X(0) = ∫_0^t µs (X(u) : 0 ≤ u ≤ s) ds + ∫_0^t σs (X(u) : 0 ≤ u ≤ s) dB(s).    (1.5.6)

The process X = (X(t) : t ≥ 0) is said to satisfy a stochastic differential equation with drift
µs (X(u) : 0 ≤ u ≤ s) and volatility σs (X(u) : 0 ≤ u ≤ s) if it obeys (1.5.6). In other words, the
rigorous mathematical meaning of (1.5.5) comes from its integrated version, namely (1.5.6).

But, at this point in our development, we are still far from a rigorous mathematical understand-
ing of equation (1.5.6). In particular, note that (1.5.6) involves the stochastic integral
    ∫_0^t σs (X(u) : 0 ≤ u ≤ s) dB(s).                                (1.5.7)

The key mathematical challenge is the presence of the stochastic integrator dB(s). The obvious
temptation is to attempt to define (1.5.7) as a Stieltjes integral. In order that (1.5.7) make sense
as a Stieltjes integral, one starts by partitioning the interval [0, t] into the n sub-intervals [0, t1 ],
[t1 , t2 ], ... , [tn−1 , tn ]. Let si be any point chosen from the i’th sub-interval, [ti−1 , ti ] and put

σ̃(si ) = σsi (X(u) : 0 ≤ u ≤ si ).

The integral (1.5.7) is said to exist in the (Riemann-) Stieltjes sense if for every sequence of partitions
satisfying max1≤i≤n (ti − ti−1 ) → 0 and choice of representative points (si ), the limit
    Σ_{i=1}^{n} σ̃(si ) (B(ti ) − B(ti−1 ))                            (1.5.8)

exists and is independent of the partition chosen and representative points that are used (in which
case the Riemann-Stieltjes integral is the common limit). Existence of the Riemann-Stieltjes inte-
gral typically demands that the integrator (in this case, B(·)) be a function that is of “bounded
variation” over finite intervals. Bounded variation in turn implies that the function must be differ-
entiable almost everywhere.

A consequence of the fact that B is an “integrated white noise” is that it is non-differentiable.


To get a sense of the difficulty, note that if B is differentiable at t (with derivative B′(t)), then

    (B(t + h) − B(t))/h ⇒ B′(t)                                       (1.5.9)

as h ↓ 0. But

    (B(t + h) − B(t))/h =D (B(h) − B(0))/h =D h^{−1/2} N (0, I) ,
and hence the left-hand side of (1.5.9) does not converge as h ↓ 0. It turns out that a much stronger
statement holds: with probability one, the path B(·) is non-differentiable at every t ≥ 0 (see Steele,
p.64). As a consequence, one cannot define (1.5.7) as a Riemann-Stieltjes integral. We must find
some alternative mathematical approach to attach meaning to (1.5.7).

For the purpose of this course, we will choose to follow the Itô definition of the stochastic
integral (1.5.7). Given a partition [0, t1 ], [t1 , t2 ], . . . , [tn−1 , tn ] as defined earlier, the Itô integral is
defined as the limit of the sequence of rv’s
    Σ_{i=1}^{n} σ̃(ti−1 ) (B(ti ) − B(ti−1 ))                          (1.5.10)

as max1≤i≤n (ti − ti−1 ) → 0 (so that n → ∞). The key feature to note in the Itô definition is that the
representative point si ∈ [ti−1 , ti ] has been chosen to be at the left endpoint of the sub-interval,
namely ti−1 . Surprisingly, the choice of representative point makes a big difference in the setting of
(1.5.7). In particular, if one chooses si = αti−1 + (1 − α)ti , (0 ≤ α ≤ 1), one gets entirely different
limits (as a function of α) for (1.5.8) as n → ∞. This unexpected feature of the stochastic integral
(1.5.7) is a consequence of the non-differentiability of B.
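
This α-dependence is easy to see numerically. Taking σ̃(s) = B(s) (i.e. the integral ∫_0^1 B dB), the sketch below evaluates the Riemann sums with left-endpoint and midpoint representative points on a single fine path: the former approximates the Itô value (B(1)² − 1)/2, while the latter approximates B(1)²/2, so the two limits differ by 1/2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Riemann sums for int_0^1 B dB with different representative points, on one
# fine Brownian path. The left-endpoint (Ito) sum approximates
# (B(1)^2 - 1)/2; the midpoint sum approximates B(1)^2 / 2.
T, n = 1.0, 100_000
h = T / (2 * n)                      # fine grid of 2n steps
B = np.concatenate([[0.0],
                    np.cumsum(np.sqrt(h) * rng.standard_normal(2 * n))])

left = B[0:2 * n:2]                  # B(t_{i-1}) on the n coarse intervals
mid = B[1:2 * n:2]                   # B at the interval midpoints
incr = B[2::2] - B[0:2 * n:2]        # B(t_i) - B(t_{i-1})

print(np.sum(left * incr), (B[-1] ** 2 - T) / 2)   # Ito: these agree
print(np.sum(mid * incr), B[-1] ** 2 / 2)          # midpoint: differs by T/2
```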

In the next sections, we will get a quick sense of why the Itô definition is a particularly natural
choice of definition for the stochastic integral (1.5.7).

1.6 Markov Structure of SDE’s

Given the previous discussion, it will come as no surprise that an SDE at the level of generality
of (1.5.5) is typically highly intractable. A much higher level of tractability ensues when the drift
and volatility functions depend on the history (X(s) : 0 ≤ s ≤ t) only through X(t), so that the
SDE takes the form
dX(t) = µ(t, X(t))dt + σ(t, X(t))dB(t) (1.6.1)

subject to
X(0) = x. (1.6.2)

In the setting of (1.6.1), we expect X = (X(t) : t ≥ 0) to enjoy the Markov property, namely

P {X(t + s) ∈ ·|X(u) : 0 ≤ u ≤ t} = P {X(t + s) ∈ ·|X(t)} .

Because the functions µ and σ depend explicitly on time, the transition probabilities for X are
non-stationary. In particular,

P {X(t + s) ∈ ·|X(t)} = Pt (s, X(t), ·)

depends explicitly on t. Because X has the Markov property, it is evident that X is a Markov
process with the state space Rd . As with discrete-time Markov chains, passing to the “space-time”
process X̃(t) = (X(t), t)T leads to a Markov process with stationary transition probabilities satis-
fying an SDE of the form (1.6.1) (with associated drift and volatility functions that are independent
of time), at the cost of increasing the dimension of the state space from d to d + 1.

Assume now that X satisfies the SDE

dX(t) = µ(X(t))dt + σ(X(t))dB(t)

subject to X(0) = x. If X exhibits equilibrium behavior in the sense that there exists a limiting
equilibrium rv X(∞) such that
X(t) ⇒ X(∞)
as t → ∞, we expect that when X(0) =D X(∞), then (X(t) : t ≥ 0) ought to be a stationary
stochastic process (i.e. (X(u + t) : u ≥ 0) =D (X(u) : u ≥ 0) for t ≥ 0); this is just the
continuous-time version of (1.4.6). We will see later in the course how one can obtain an equation for

π(·) = P {X(∞) ∈ ·} from this stationarity relationship.

To complete our analogy with the previous sections, we note that the most tractable SDE's are
the ones that possess linear dynamics. In the SDE context, these "linear SDE's" are the
processes X that satisfy

    dX(t) = M X(t) dt + Σ^{1/2} dB(t)                                 (1.6.3)

subject to X(0) = x. A process X satisfying (1.6.3) is called a (vector-valued) Ornstein-Uhlenbeck
process.
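
SDE's of this form are also easy to simulate approximately: replacing dt by a small step h and dB(t) by a N(0, hI) increment gives the Euler-Maruyama scheme. A sketch for a scalar Ornstein-Uhlenbeck process (the parameter values are illustrative), whose equilibrium is N(0, σ²/(2θ)) when M = −θ < 0:

```python
import numpy as np

rng = np.random.default_rng(0)

# Euler-Maruyama sketch for a scalar Ornstein-Uhlenbeck process,
#   dX(t) = -theta X(t) dt + sigma dB(t),
# an instance of (1.6.3) with M = -theta. Replace dt by a small step h and
# dB(t) by a N(0, h) increment. theta and sigma are illustrative values.
theta, sigma = 1.0, 1.0
h, n_steps = 0.01, 200_000
X = 0.0
samples = np.empty(n_steps)
for i in range(n_steps):
    X += -theta * X * h + sigma * np.sqrt(h) * rng.standard_normal()
    samples[i] = X

# The equilibrium X(inf) is N(0, sigma^2 / (2 theta)), variance 0.5 here.
print(samples[10_000:].mean(), samples[10_000:].var())
```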

1.7 Connection Between SDE’s and PDE’s

Perhaps the single most important property of SDE’s is the intimate connection between SDE’s
and PDE's. Deterministic PDE's arise in computing probabilities and expectations for solutions to
SDE's, just as linear systems of equations arise naturally in computing probabilities and
expectations for finite state Markov chains.

As in our discussion of discrete-time Markov chains, we start with a derivation of the equation
satisfied by Ex [r(X(t))] where r : Rd → R is a given pay-off / reward function.

Example 1.7.1 (Computing Expected Pay-off / Reward at Time t) Put

    u∗ (t, x) = Ex [r(X(t))],

where X = (X(t) : t ≥ 0) is presumed to satisfy

    dX(t) = µ(X(t))dt + σ(X(t))dB(t).

Then u∗ (0, x) = r(x) and, conditioning on the path up to a small time h,

    u∗ (t, x) = Ex [Ex [r(X(t))|X(u) : 0 ≤ u ≤ h]]
              = Ex [u∗ (t − h, X(h))]

for 0 < h < t.

Suppose that we know that u∗ is smooth in (t, x). In this case, we can expand u∗ (t − h, X(h))
in a Taylor expansion about (t, X(0)) = (t, x):

    u∗ (t − h, X(h)) = u∗ (t, x) − (∂u∗ (t, x)/∂t) h + Σ_{i=1}^{d} (∂u∗ (t, x)/∂xi ) (Xi (h) − xi )
                       + (1/2) Σ_{i,j=1}^{d} (∂²u∗ (t, x)/∂xi ∂xj ) (Xi (h) − xi )(Xj (h) − xj )
                       − Σ_{i=1}^{d} (∂²u∗ (t, x)/∂xi ∂t) h (Xi (h) − xi ) + (1/2) (∂²u∗ (t, x)/∂t²) h² + · · ·

Taking expectations term-by-term, we find that

    Ex [u∗ (t − h, X(h))] = u∗ (t, x) − (∂u∗ (t, x)/∂t) h + Σ_{i=1}^{d} (∂u∗ (t, x)/∂xi ) Ex [Xi (h) − xi ]
                            + (1/2) Σ_{i,j=1}^{d} (∂²u∗ (t, x)/∂xi ∂xj ) Ex [(Xi (h) − xi )(Xj (h) − xj )]
                            − Σ_{i=1}^{d} (∂²u∗ (t, x)/∂t∂xi ) h Ex [Xi (h) − xi ] + (1/2) (∂²u∗ (t, x)/∂t²) h² + · · ·    (1.7.1)


To proceed further, we need to heuristically compute the mixed moments

    Ex [ Π_{i=1}^{d} (Xi (h) − xi )^k ].

This can be done when h is small. We start by recalling that

    B(t + h) − B(t) =D h^{1/2} N (0, I) .

For h small, evidently

    X(t + h) − X(t) = ∫_t^{t+h} µ(X(s)) ds + ∫_t^{t+h} σ(X(s)) dB(s)
                    ≈ µ(X(t)) h + σ(X(t)) (B(t + h) − B(t))
                    =D µ(X(t)) h + h^{1/2} σ(X(t)) N (0, I) ,
where N (0, I) is independent of X(t). It follows that this heuristic argument suggests that

    Ex [X(h) − x] ≈ µ(x) h                                            (1.7.2)

and

    Ex [(X(h) − x)(X(h) − x)T ] ≈ h σ(x)σ(x)T .                       (1.7.3)

Furthermore,

    Ex [‖X(h) − x‖^k ] = O(h^{k/2})                                   (1.7.4)

as h ↓ 0. If we plug (1.7.2), (1.7.3) and (1.7.4) into (1.7.1) and subtract u∗ (t, x) from both sides of
(1.7.1), we find that

    0 = −(∂u∗ (t, x)/∂t) h + Σ_{i=1}^{d} (∂u∗ (t, x)/∂xi ) µi (x) h + (1/2) Σ_{i,j=1}^{d} (∂²u∗ (t, x)/∂xi ∂xj ) bij (x) h + o(h)    (1.7.5)

as h ↓ 0, where

    (bij (x) : 1 ≤ i, j ≤ d) = σ(x)σ(x)T .

Dividing through (in (1.7.5)) by h and sending h to 0, we get the PDE

    ∂u∗ /∂t = Lu∗                                                     (1.7.6)

subject to u∗ (0, x) = r(x), where L is the second order linear differential operator

    L = Σ_{i=1}^{d} µi (x) ∂/∂xi + (1/2) Σ_{i,j=1}^{d} bij (x) ∂²/∂xi ∂xj .    (1.7.7)

Note the similarity of (1.7.6) with (1.4.7).

The linear PDE (1.7.6) is a PDE of parabolic type, and generalizes the so-called “heat equation”
of mathematical physics given by
    ∂u∗ /∂t = Lu∗ ,

where

    L = Σ_{i=1}^{d} ∂²/∂xi ².
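
The correspondence can be checked numerically in one dimension. For X = B standard Brownian motion (so µ = 0 and b = 1 in (1.7.7)), u∗(t, x) = Ex[r(B(t))] should solve ∂u∗/∂t = (1/2)∂²u∗/∂x². The sketch below (the reward r(x) = cos x and the grid parameters are illustrative choices) solves this heat equation by explicit finite differences and compares with a Monte Carlo estimate of E0[cos(B(1))] = e^{−1/2}:

```python
import numpy as np

rng = np.random.default_rng(0)

# One-dimensional check of (1.7.6) for X = B (mu = 0, b = 1): solve
# u_t = (1/2) u_xx from u(0, x) = r(x) by explicit finite differences, then
# compare u(1, 0) with a Monte Carlo estimate of E_0[r(B(1))].
r = lambda x: np.cos(x)
dx, t_end = 0.05, 1.0
x = np.arange(-10.0, 10.0 + dx, dx)       # wide grid; boundary effects decay
dt = 0.4 * dx ** 2                        # small enough for stability
u = r(x)
for _ in range(int(t_end / dt)):
    u[1:-1] += 0.5 * dt * (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2

mc = r(np.sqrt(t_end) * rng.standard_normal(200_000)).mean()
print(u[len(x) // 2], mc)                 # both ~ e^{-1/2} = 0.6065
```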

Example 1.7.2 (Computing Expected Cumulative Reward to Hitting a Set) We consider
here the expectation

    u∗ (x) = Ex [ ∫_0^T r(X(s)) ds ],

where T = inf{t ≥ 0 : X(t) ∈ C c } is the first hitting time of C c . For x ∈ C,

    u∗ (x) = Ex [ ∫_0^{h∧T} r(X(s)) ds + u∗ (X(h)) I{T > h} ]

(here a ∧ b = min{a, b}). Fix x ∈ C. As h ↓ 0,

    Px {T ≤ h} ≈ P { x + h^{1/2} σ(x) N (0, I) ∈ C c } = O(h^q )

for each q ≥ 0, so the likelihood of X escaping from x ∈ C to C c in the interval [0, h] appears to
decay faster than any power of h (for h small). It follows that if we Taylor expand u∗ (X(h)) about
X(0) and take expectations, we get

    u∗ (x) = u∗ (x) + r(x) h + Σ_{i=1}^{d} (∂u∗ (x)/∂xi ) µi (x) h + (1/2) Σ_{i,j=1}^{d} (∂²u∗ (x)/∂xi ∂xj ) bij (x) h + o(h).    (1.7.8)

Subtracting u∗ (x) from each side of (1.7.8), dividing by h, and sending h to zero, we get the PDE

    Lu∗ = −r,                                                         (1.7.9)

subject to u∗ = 0 on C c , where r is defined as above. Again, note the similarity to the discrete-time
equation (1.4.8). The equation (1.7.9) is a linear PDE of elliptic type.

Example 1.7.3 (Computing Expected Infinite Horizon Discounted Reward) Set, for α > 0,

    u∗ (x) = Ex [ ∫_0^∞ e^{−αt} r(X(t)) dt ].

Then

    u∗ (x) = Ex [ ∫_0^h e^{−αt} r(X(t)) dt ] + e^{−αh} Ex [ ∫_0^∞ e^{−αt} r(X(h + t)) dt ]
           = Ex [ ∫_0^h e^{−αt} r(X(t)) dt ] + e^{−αh} Ex [u∗ (X(h))].

Assume u∗ is smooth. Then, we can expand u∗ (X(h)) about X(0) = x and take expectations,
thereby yielding

    u∗ (x) = r(x) h + (1 − αh + o(h)) ( u∗ (x) + Σ_{i=1}^{d} (∂u∗ (x)/∂xi ) µi (x) h + (1/2) Σ_{i,j=1}^{d} (∂²u∗ (x)/∂xi ∂xj ) bij (x) h + o(h) ).    (1.7.10)

As in the previous examples, we now subtract u∗ (x) from both sides, divide by h, and send h to
zero. This leads to the linear PDE

    Lu∗ − αu∗ = −r.                                                   (1.7.11)

This is another PDE of so-called elliptic type.

Many other expectations and probabilities can be computed as solutions to linear PDE’s of
parabolic and elliptic type. These equations can typically be derived by following the same style
of heuristic argument that has been successfully utilized in the three examples above. The fact
that the above heuristic argument leads to the correct PDE is connected to our adoption of the
Itô definition in interpreting the stochastic integrals that arise in defining our SDE models. If we
had utilized a different definition of the stochastic integral, we would obtain a process X having
a different distribution, and the PDE's we have derived would no longer correctly describe the
corresponding expectations and probabilities. Hence, a key reason the Itô definition is used is
that the heuristic derivations it suggests can generally be rigorously established to be correct.

1.8 Advantages of Model Formulation in Continuous Time

In modeling the dynamics of physical systems, the use of continuous-time models (and, in
particular, SDE’s) seems particularly natural. However, it is less clear in various economic and
financial modeling contexts that continuous-time formulations are necessarily better descriptions of
the underlying system dynamics than are discrete-time models. Given the additional mathematical
challenges posed by working with continuous-time SDE’s (e.g. the necessity of developing a theory
of stochastic integration), this raises the question of whether the additional effort is balanced by
any mathematical and computational advantages to working in continuous time.

To get a sense of the potential advantages, we contrast discrete-time versus continuous-time


modeling in the setting of a stochastic model that is intended to describe the time-evolution of
the price of an asset. We start with the discrete-time model. Let Pn be the price at time n. A
natural starting point for such a model is to postulate that the percentage change in price over
each period is iid, so that the ratios (Pn /Pn+1 : n ≥ 1) are iid. This is equivalent to asserting that
the log-price process is a random walk with iid increments. In other words, if Sn = log Pn , then
Sn = S0 + ξ1 + · · · + ξn , where the ξi 's are iid.

The corresponding continuous-time analog is to postulate that log P (t) has stationary indepen-
dent increments. Furthermore, there is a natural continuous-time approximation to the discrete-
time process (log Pn : n ≥ 0). In particular, if E [ξi ] = 0 and var (ξi ) < ∞, the central limit theorem
(CLT) suggests that

    log Pn = Sn ≈D σ B(n),

which suggests defining the continuous-time model by log P (t) = σ B(t), where (B(t) : t ≥ 0) is a
one-dimensional standard Brownian motion with B(0) = S0 (thereby connecting log P (t) to its
discrete-time counterpart log Pn ).

Suppose that we wish to use these two models (one in discrete time, the other in continuous
time) to answer the question: what is the probability that the price will cross the barrier at level
P0 e^b sometime in the interval [0, t]? This type of probability is relevant to the analysis of certain
“barrier options” that arise in financial mathematics.
For the discrete-time model, this calculation involves computing P {T̃b ≤ t}, where T̃b =
inf{n ≥ 1 : ξ1 + · · · + ξn ≥ b}, whereas the corresponding continuous-time computation involves
P {Tb ≤ t}, where Tb = inf{t ≥ 0 : σB(t) ≥ b}. Even in the presence of Gaussian increments
ξ1 , ξ2 , . . ., computing P {T̃b ≤ t} in closed form turns out to be impossible; only asymptotics and
bounds are available. On the other hand, in continuous time, we can proceed as follows:

    P {Tb ≤ t} = P {Tb ≤ t, σB(t) ≥ b} + P {Tb ≤ t, σB(t) ≤ b}
               = P {σB(t) ≥ b|Tb ≤ t} P {Tb ≤ t} + P {σB(t) ≤ b|Tb ≤ t} P {Tb ≤ t} .

Conditional on {Tb ≤ t}, we intuitively expect that the increment B(t) − B(Tb ) will be independent
of (B(u) : 0 ≤ u ≤ Tb ) and have a N (0, t − Tb ) distribution. But B(t) = B(Tb ) + (B(t) − B(Tb )).
It seems reasonable to expect that B has continuous paths, in which case σB(Tb ) = b. Hence,
conditional on {Tb ≤ t}, the symmetry of the N (0, 1) distribution suggests that P {σB(t) ≥ b|Tb ≤ t} =
P {σB(t) ≤ b|Tb ≤ t}. As a consequence,

    P {Tb ≤ t} = 2 P {σB(t) ≥ b|Tb ≤ t} P {Tb ≤ t}
               = 2 P {σB(t) ≥ b, Tb ≤ t}
               = 2 P {σB(t) ≥ b}
               = 2 P {N (0, 1) ≥ t^{−1/2} b/σ},

yielding a closed form for P {Tb ≤ t}. The additional tractability present in continuous time
derives from the fact that path continuity implies that σB(Tb ) = b (whereas, in discrete time,
S_{T̃b} ≥ b), as well as from the symmetry of Brownian motion. There are many other calculations
that are possible for Brownian motion that are generally impossible to carry out for its discrete-time
analogs. In sum, Brownian motion is the most tractable and beautiful stochastic process in the world
of probability.
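
This closed form is easy to check against simulation. The sketch below (the values of b, σ, and t are illustrative) compares the formula 2 P {N(0, 1) ≥ t^{−1/2} b/σ} with a Monte Carlo estimate of the barrier-crossing probability along paths monitored on a discrete grid:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

# Barrier-crossing probability for sigma * B: the reflection-principle
# formula P{T_b <= t} = 2 P{N(0,1) >= b/(sigma sqrt(t))}, which equals
# erfc(b / (sigma sqrt(2 t))), versus Monte Carlo paths monitored on a
# discrete grid (which slightly underestimates, since crossings between
# grid points are missed). The values of b, sigma, and t are illustrative.
b, sigma, t = 1.0, 1.0, 1.0
n_paths, n_steps = 10_000, 500
h = t / n_steps

dB = sqrt(h) * rng.standard_normal((n_paths, n_steps))
running_max = dB.cumsum(axis=1).max(axis=1)       # max of B over the grid
mc = np.mean(sigma * running_max >= b)

exact = erfc(b / (sigma * sqrt(2.0 * t)))
print(mc, exact)      # exact ~ 0.3173; the grid estimate is slightly lower
```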

The use of an SDE model also carries with it certain numerical advantages. Consider,
for example, the linear systems of equations that arise when the elliptic PDE’s of the previous
section are discretized. These linear systems (in one dimension) are tri-diagonal linear systems;
the corresponding linear systems that approximate the linear (integral) equation associated with
discrete-time models are rarely tri-diagonal. This typically means that a discretized SDE model
can be numerically solved faster than the associated discrete-time model.
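
To make the tri-diagonal observation concrete: discretizing the one-dimensional elliptic equation (1.7.9) by central differences on a grid of mesh ∆ gives, at each interior grid point, an equation coupling only that point and its two neighbours. A sketch (the coefficients µ, b, and r are illustrative choices) building and solving such a system:

```python
import numpy as np

# Central-difference discretization of the 1-d elliptic equation (1.7.9),
#   mu(x) u'(x) + (1/2) b(x) u''(x) = -r(x) on C = (0, 1), u = 0 on C^c.
# Each interior equation couples only u_{i-1}, u_i, u_{i+1}, so the system
# is tri-diagonal. The coefficients mu, b, r are illustrative; with r = 1,
# u(x) is the expected time for X to exit (0, 1) starting from x.
mu = lambda x: -x                 # drift
b = lambda x: np.ones_like(x)     # squared volatility sigma(x)^2
r = lambda x: np.ones_like(x)     # running reward

n = 200
x = np.linspace(0.0, 1.0, n + 1)
D = x[1] - x[0]                   # mesh size Delta
xi = x[1:-1]                      # interior grid points

lower = 0.5 * b(xi) / D**2 - mu(xi) / (2 * D)    # coefficient of u_{i-1}
diag = -b(xi) / D**2                             # coefficient of u_i
upper = 0.5 * b(xi) / D**2 + mu(xi) / (2 * D)    # coefficient of u_{i+1}

A = np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
u = np.linalg.solve(A, -r(xi))    # discrete version of (L u)(x) = -r(x)
print(u.max())                    # largest expected exit time over the grid
```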
