Stochastic Processes II

Continuous time
FS 2010
Ilya Molchanov
Contents
1 Random Elements in Functional Spaces
1.1 Probability space
1.2 Topological and metric spaces
1.3 Borel σ-algebra
1.4 Random elements
1.5 Regular and tight measures
1.6 Finite-dimensional distributions
1.7 Moments and Gaussian processes
1.8 Equivalent processes and separability
1.9 Sample path continuity
2 Brownian motion
2.1 Definition
2.2 Key properties of the Wiener process
2.3 Brownian bridge and Brownian motion in higher dimensions
2.4 Sample path properties of Brownian motion
2.5 Supremum of the Wiener process
2.6 Arcsine law
3 Integration with respect to Brownian motion
3.1 Itô's stochastic integral
3.2 Continuous-time martingales and properties of the stochastic integral
3.3 Itô processes
3.4 Functions of Brownian motion
3.5 Multidimensional integration
3.6 Examples of applying Itô's formula
3.7 Stochastic differential equations
4 Lévy processes
4.1 Some basic ideas
4.2 Infinite divisibility
4.3 Definition of Lévy processes
4.4 The characteristic exponent of Lévy processes
4.5 The Markov property
4.6 Basic ideas of potential theory for Lévy processes
4.7 Subordinators
Literature:
Chapters 2 and 3 are based on the lecture notes by Enkeleid Hashorva, Stochastic Processes, Bern, 2002.
J. Bertoin (1996) Lévy Processes. Cambridge University Press.
P. Billingsley (1968) Weak Convergence of Probability Measures. Wiley, New York.
E. Hashorva (2002) Stochastic Processes. Lecture course.
O. Kallenberg (1997) Foundations of Modern Probability. Springer, New York.
N.V. Krylov (2002) Introduction to the Theory of Random Processes. Amer. Math. Society.
K.I. Sato (1999) Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press.
A. Shiryaev, Probability. Springer.
1. Random Elements in Functional Spaces

1.1. Probability space

The probability space (Ω, F, P) is a set Ω with a σ-algebra F and a probability measure P. The σ-algebra F is a collection of subsets of Ω such that

- ∅ ∈ F;
- if A ∈ F, then A^c ∈ F;
- if A_n ∈ F, n ≥ 1, then ∪_n A_n ∈ F.

The elements of F are called events. The trivial σ-algebra is {∅, Ω}; the richest one is 2^Ω (the collection of all subsets of Ω). The smallest σ-algebra that contains a family of sets A is denoted by σ(A).

The probability measure P is a function from F into [0, 1] which is

- normalised, i.e. P(Ω) = 1;
- σ-additive, i.e. if A_n, n ≥ 1, are disjoint sets from F, then

  P(∪_n A_n) = Σ_n P(A_n).   (1.1)

For a general measure μ (instead of P) we drop the requirement that μ(Ω) = 1 and even allow μ to take infinite values. A measure μ is called finite if μ(Ω) < ∞. A measure is σ-finite if Ω can be covered by an at most countable family of sets F_1, F_2, ..., each of them with finite measure. A signed measure may take negative values.

Proposition 1.1. The σ-additivity property of a finitely additive and finite set-function μ is equivalent to either of the following two properties:

- if A_n ↑ A for A_n, A ∈ F, then μ(A_n) → μ(A);
- if A_n ↓ ∅ for A_n ∈ F, then μ(A_n) → 0.

The following theorem is crucial for constructing measures. Recall that an algebra is a family of sets which is closed under taking complements and finite unions and contains the empty set.

Theorem 1.2 (Carathéodory extension theorem). Let Ω be a non-empty set and A an algebra on Ω. If μ is σ-additive and finite on A, then there exists an extension of μ onto σ(A) that agrees with μ on A.

A measure-determining class C is a family of subsets of Ω such that a σ-finite measure on F is uniquely determined by its values on C.

If μ is a measure on X, denote the corresponding Lebesgue integral by ∫ f(x) μ(dx) or ∫ f dμ if the integral is taken over the whole of X; otherwise write ∫_A f dμ if A ⊂ X.
Definition 1.3. Measure P is absolutely continuous with respect to Q (notation P ≪ Q) if Q(F) = 0 implies P(F) = 0 for all F ∈ F. If both P ≪ Q and Q ≪ P, then P and Q are said to be equivalent.

Theorem 1.4 (Radon–Nikodym). Let μ be a σ-finite measure on (X, A). If ν ≪ μ, then there exists an integrable function f : X → [0, ∞) such that

  ν(A) = ∫_A f(x) μ(dx),   A ∈ A.

The function f is uniquely defined up to its values on a set of μ-measure zero.

If for all B ∈ B(R^d)

  P(B) = ∫_B f(x) dx,   Q(B) = ∫_B g(x) dx,

then writing

  P(B) = ∫_B (f(x)/g(x)) g(x) dx

shows that dP/dQ = f/g.
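The identity dP/dQ = f/g can be checked numerically by importance weighting: sampling from Q and weighting by f/g reproduces P-probabilities. The following is a minimal sketch (not part of the original notes), assuming for illustration P = N(0, 1) and Q = N(1, 2); sketch (Python):

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):  # density of P = N(0, 1)
        return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

    def g(x):  # density of Q = N(1, 2), i.e. variance 2 (assumed example)
        return np.exp(-(x - 1)**2 / 4) / np.sqrt(4 * np.pi)

    x = rng.normal(1.0, np.sqrt(2.0), size=10**6)  # sample from Q
    weights = f(x) / g(x)                          # Radon-Nikodym derivative dP/dQ at x
    B = (x > 0)                                    # the event B = (0, infinity)
    print(np.mean(weights * B))                    # approx P(B) = 1/2 under P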
Definition 1.5. A random variable is a function ξ : Ω → R such that {ω : ξ(ω) ≤ t} ∈ F for all t ∈ R.

Later on we write shortly {ξ ≤ t} instead of {ω : ξ(ω) ≤ t}. The cumulative distribution function (cdf) of ξ is

  F(t) = P{ω : ξ(ω) ≤ t} = P{ξ ≤ t}.

Further we consider more general random elements, in particular those which take values in functional spaces. Let X be a space where our random elements take their values. Consider a σ-algebra A on X. The pair (X, A) is called a measurable space.

Definition 1.6 (Random element). A function ξ : Ω → X is called a random element if {ω : ξ(ω) ∈ A} ∈ F for all A ∈ A. In this case ξ is also called a measurable map from Ω to X.

The distribution of ξ is the measure on (X, A) defined by

  P{ξ ∈ A} = P{ω : ξ(ω) ∈ A}.

It is the image of P under the map ξ.

Note the following trivial theorem.

Theorem 1.7. Let μ be a probability measure on (X, A). Then there exists a probability space (Ω, F, P) and a random element ξ with values in X such that μ is the distribution of ξ.

Proof. Identify (Ω, F, P) with (X, A, μ) and take ξ to be the identity map.
1.2. Topological and metric spaces

In order to better understand properties of random elements in general spaces we need to work out some tools that enable us to deal with measures and σ-algebras on general spaces.

A topological space is a pair (X, T), where T is the topology, i.e. the family of open sets in X such that

- ∅ and X are open;
- the union of any family of open sets is open and the intersection of any finite family of open sets is open.

Any open set that contains x ∈ X is called a neighbourhood of x. A sequence {x_n} is said to converge to x if every neighbourhood of x contains x_n for all sufficiently large n.

The complements of open sets in X are closed sets. A set F is closed if and only if F contains all its limit points, i.e. if x_n ∈ F for all n ≥ 1 and x_n → x as n → ∞, then x ∈ F. If M ⊂ X, then the closure of M is the smallest closed set that contains M, i.e. the intersection of all closed sets F such that M ⊂ F. Note that X and ∅ are open and closed at the same time.

A topological space X is called separable if X possesses a countable dense set, i.e. if it is possible to find a countable set Q ⊂ X such that X is the closure of Q. In other words, it is possible to find a sequence {x_n} such that each x ∈ X is the limit of a certain sub-sequence {x_n(k)}. The Euclidean space is separable. An example of a non-separable space is provided by the set of all real-valued functions on [0, 1] with the uniform metric.

A metric space is a pair (X, ρ) such that ρ is a non-negative real function on X × X with the following properties:

- ρ(x, y) = 0 if and only if x = y;
- ρ(x, y) = ρ(y, x) (symmetry);
- ρ(x, y) ≤ ρ(x, z) + ρ(z, y) (triangle inequality).

The function ρ is called a metric. We often denote the metric by d instead of ρ. It is assumed that the topology on X is compatible with the metric (in other words, the topology is metrised by ρ), i.e. x_n → x if and only if ρ(x_n, x) → 0 as n → ∞.

A metric space is called complete if each fundamental sequence converges in X. Recall that {x_n} is called fundamental (a Cauchy sequence) if for each ε > 0 there is a natural number N such that ρ(x_n, x_m) < ε for all m, n ≥ N. Examples of complete spaces:

- the Euclidean space R^d;
- the space C[0, 1] of continuous functions on [0, 1] with the uniform metric;
- the space L²[0, 1] of square integrable functions with the L²-distance.

A Polish space is a complete separable metric space (more exactly, a space which is homeomorphic to a complete separable metric space). In the following we mostly consider random elements and probability measures on Polish spaces.
1.3. Borel σ-algebra

Definition 1.8. The Borel σ-algebra B(X) on a topological space X is the smallest σ-algebra that includes all open sets in X.

One says that the Borel σ-algebra is generated by open sets. In order to construct the members of B(X) one has to take open sets, then their countable intersections, etc. If X is a separable metric space, then the topology of X has a countable base, so that B(X) is generated by a countable family of open sets. In this case B(X) has the cardinality c of the continuum (recall that [0, 1] has the cardinality of the continuum). If X is the real line, then the family of all subsets of X has cardinality greater than the continuum, which implies that there are non-Borel sets.

A probability measure P can be defined on a space X with its Borel σ-algebra, i.e. by setting Ω = X and F = B(X).

A function f : X → Y between two topological spaces is called Borel if the inverse image of every Borel set in Y belongs to the Borel σ-algebra on X. The inverse image of a set U ⊂ Y is denoted by f⁻¹(U), so that f⁻¹(U) = {x ∈ X : f(x) ∈ U}.

A function f : X → Y between two topological spaces X and Y is called continuous if the inverse image of any open set in Y is open in X.

Theorem 1.9. Let X be a metric space. Then B(X) is the smallest σ-algebra such that all continuous real-valued functions f : X → R are measurable, where R is endowed with its Borel σ-algebra.

Proof. Let B' be the smallest σ-algebra that makes all continuous functions on X measurable. Then B' ⊂ B(X). It suffices to show that each closed subset F of X belongs to B'.

Let F = ∩_n G_n for open sets G_n, n ≥ 1. Then F and X \ G_n are disjoint for all n. Define

  f_n(x) = ρ(x, G_n^c) / (ρ(x, F) + ρ(x, G_n^c)) ≤ 1.

It is easy to see that f_n is continuous and bounded by 1, so that the sum of the convergent series

  f(x) = Σ_n 2^{-n} f_n(x)

is also a continuous function. Now note that F = {x : f(x) = 1}.

Consider a family Φ of real-valued functions X → R. The smallest σ-algebra C(X, Φ), with respect to which all f ∈ Φ become measurable, is generated by the sets of the type {x ∈ X : f(x) ≤ t} for f ∈ Φ and t ∈ R. By taking intersections and doing countable set-operations with these sets, one can say that this smallest σ-algebra C(X, Φ) is generated by the sets of the type

  {x ∈ X : (f_1(x), ..., f_n(x)) ∈ B},   f_1, ..., f_n ∈ Φ, B ∈ B(R^n), n ≥ 1.   (1.2)

These sets are called cylinders.

Example 1.10. Let X be the family of real-valued functions x(t), t ∈ [0, 1]. Consider the family Φ of functions f_s from X to the real line defined as f_s(x) = x(s). Then

  {x ∈ X : x(t_1) ∈ [a_1, b_1], ..., x(t_n) ∈ [a_n, b_n]}

is a cylinder in X. These cylinders are important in order to define distributions of stochastic processes. The cylindrical σ-algebra is the Borel σ-algebra on X if X is endowed with the topology of pointwise convergence.
Recall that Theorem 1.9 says that all continuous functions generate the Borel σ-algebra on X. Now we look at a general family of functions in order to find out when this family generates the Borel σ-algebra.

We say that a family Φ of functions separates points of X if for each x ≠ y from X there is a function f ∈ Φ such that f(x) ≠ f(y). A family of sets separates points of X if the corresponding indicator functions do.

Proposition 1.11. Let X be a Polish space. Then the σ-algebra generated by any countable family of Borel sets that separates all points of X is the Borel σ-algebra on X.

Proof. Let B_n, n ≥ 1, be the countable family of Borel sets that separate all points of X. Consider the smallest σ-algebra A that contains all these B_n's. Let B ∈ B(X). The function

  f(x) = Σ_n 3^{-n} 1_{B_n}(x)

is A-measurable and so is Borel, since A ⊂ B(X); see also Theorem 1.15. Then f(B) is a Borel set. This follows from a general result that images of Borel sets under injective Borel functions are Borel sets. To prove this one should consider the set {(x, y) : x ∈ B, y = f(x)} in the product space and take its projection on the second coordinate. The fact that the result is a Borel set relies on the fact that X is Polish. Now f⁻¹(f(B)) ∈ A and B = f⁻¹(f(B)), since f is an injective map.

Corollary 1.12 (X. Fernique). Let X be Polish. If Φ is a countable family of real-valued Borel functions that separates all points of X, then the generated σ-algebra C(X, Φ) coincides with B(X).

Proof. The real line R has a countable base of the topology (take rational intervals). Let {U_n} be this countable base. Then the sets B_{f,n} = f⁻¹(U_n) for f ∈ Φ and n ≥ 1 build a countable family of Borel sets that separates the points of X.

Theorem 1.13. Let X be Polish. If Φ is any family of continuous real-valued functions that separates the points of X, then C(X, Φ) coincides with the Borel σ-algebra B(X).

Proof. It suffices to show that a countable sub-family Φ_0 of functions separates the points of X. Define

  U_f = {(x, y) ∈ X × X : f(x) ≠ f(y)},   f ∈ Φ.

Then ∪_{f∈Φ} U_f = ∪_{f∈Φ_0} U_f, since X has a countable base. Indeed, let G be the countable base of the topology in X × X. Let G_0 be the family of all G ∈ G such that G ⊂ U_f for some f. Associate with every such G a function f_G such that G ⊂ U_{f_G}. It is possible to choose as Φ_0 the family of all f_G for G ∈ G_0. Indeed, let x ∈ ∪_{f∈Φ} U_f. Then x ∈ G ⊂ U_f for some G ∈ G. Then G ∈ G_0 and f_G ∈ Φ_0, i.e. x ∈ ∪_{f∈Φ_0} U_f.

Proposition 1.14. Let X be any set and let Φ be a family of real functions defined on X. Then every set A ∈ C(X, Φ) can be represented as A = g⁻¹(B), where g = (g_n) is a sequence of functions from Φ and B ∈ B(R^N).

Proof. Note that the sets A so defined build a σ-algebra.

If f : S → T is a measurable function and μ a measure on S, then the image of μ is the measure μ ∘ f⁻¹ on (T, T) defined as

  (μ ∘ f⁻¹)(B) = μ(f⁻¹(B)) = μ({s ∈ S : f(s) ∈ B})

for all measurable B ⊂ T.
1.4. Random elements

A Borel function ξ : Ω → X is said to be a random element in a topological space X. If X is the real line with its Borel σ-algebra, we obtain the usual definition of random variables.

Theorem 1.15. Assume that X is a metric space. If ξ_n(ω) → ξ(ω) as n → ∞ for all ω ∈ Ω and a sequence of random elements {ξ_n}, then the limit ξ is measurable, i.e. ξ is a random element.

Proof. It suffices to show that ξ⁻¹(U) ∈ F for each open U. Define

  U_k = {x ∈ X : ρ(x, U^c) > k⁻¹},   k ≥ 1.

Then ∪_k U_k = U and

  ξ⁻¹(U) = ∪_{k≥1} liminf_n ξ_n⁻¹(U_k)

belongs to F. Here the set-limit liminf_n A_n = ∪_m ∩_{n≥m} A_n is the set of points which belong to A_n for all sufficiently large n. Hence it is a measurable set if all A_n are measurable.

Two random elements ξ and η are said to have the same distribution (or are called identically distributed) if P{ξ ∈ A} = P{η ∈ A} for all measurable sets A. We then write ξ =_d η or ξ ~ η.

Definition 1.16. A σ-algebra F is said to be complete if A ∈ F and P(A) = 0 imply that F contains each subset of A.

In other words, a complete σ-algebra contains all subsets of events with probability zero. If F is complete, then Theorem 1.15 holds assuming that ξ_n converges to ξ almost surely.

Definition 1.17. A sequence of random elements ξ_n converges to ξ almost surely if ξ_n(ω) → ξ(ω) as n → ∞ for all ω from a set of probability one.

Example 1.18. Theorem 1.15 does not hold in general if X is not assumed to be a metric space. Consider Ω = [0, 1] with its Borel σ-algebra and let X be the space of all functions x : [0, 1] → [0, 1] with the topology of pointwise convergence (this topology is not metrisable!). Define

  ξ_n(ω)(t) = max(1 − n|ω − t|, 0),   n ≥ 1.

Note that ξ_n(ω)(t) is continuous with respect to ω, i.e. ξ_n(ω_k) converges pointwise to ξ_n(ω) if ω_k → ω. Thus ξ_n, n ≥ 1, are measurable. The pointwise limit of ξ_n(ω) is the function ξ(ω)(t) such that ξ(ω)(t) = 1 if and only if ω = t, and is equal to zero otherwise. The set U_ω = {x ∈ X : x(ω) > 0} is open in X. Define U = ∪_{ω∈A} U_ω, where A is a non-Borel subset of [0, 1]. Then ξ⁻¹(U) = A, which is not a Borel set.

In many instances (e.g. when defining the expectation of random variables) one tries to approximate a random element with random elements that possess an at most countable set of values.

This situation resembles the study of random variables, where one starts with the indicator function 1_A for a set A ⊂ Ω. The indicator function is measurable if and only if A is an event. Simple (or step) functions are all those that can be written as linear combinations of indicator functions. Quite often, in order to prove measurability of a complicated function, we may try to approximate it by simple functions and pass to the limit.
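As a concrete illustration (not from the notes), the classical dyadic quantisation ξ_n = min(⌊2^n ξ⌋/2^n, n) turns a non-negative random variable into a simple one, and ξ_n ↑ ξ pointwise; sketch (Python):

    import numpy as np

    rng = np.random.default_rng(1)
    xi = rng.exponential(size=5)          # sample values of a non-negative random variable

    def simple_approx(x, n):
        # simple (finitely-valued) approximation: floor(2^n x)/2^n, capped at n
        return np.minimum(np.floor(2**n * x) / 2**n, n)

    for n in (1, 4, 8):
        print(n, simple_approx(xi, n))    # approaches xi as n grows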
A metric space is called totally bounded if it possesses a finite ε-net for each ε > 0, i.e. if X can be covered by a finite number of balls of radius ε. For a complete metric space, total boundedness is equivalent to compactness. A set K is compact if each of its open covers admits a finite sub-cover, i.e. if K is covered by any family of open sets, then it is covered by a finite sub-family. For a metric space, this is equivalent to the fact that each sequence has a convergent sub-sequence.

Lemma 1.19. Let X be a separable metric space.

(i) There exists a family of maps f_n : X → X, n ≥ 1, such that f_n(X) (the image of X under f_n) is at most countable for all n ≥ 1, and

  sup_{x∈X} ρ(f_n(x), x) → 0 as n → ∞.   (1.3)

If X is totally bounded, then one can choose f_n such that f_n(X) is finite for all n.

(ii) There exists a sequence of Borel functions f_n : X → X such that f_n(X) is finite for all n and

  f_n(x) → x as n → ∞.   (1.4)

Proof. Fix a dense sequence {x_n} in X. Its existence follows from the fact that X is separable.

(i) Define

  B_{n,1} = {x : ρ(x, x_1) < 1/n},
  ...
  B_{n,k} = {x : ρ(x, x_1) ≥ 1/n, ..., ρ(x, x_{k−1}) ≥ 1/n, ρ(x, x_k) < 1/n},
  ...

These sets are disjoint and ∪_k B_{n,k} = X. Let f_n(x) = x_k if x ∈ B_{n,k}. Check that this sequence satisfies (1.3). If X is totally bounded, use the 1/n-net in order to construct the functions f_n.

(ii) By replacing the metric ρ with ρ/(1 + ρ) we can assume that ρ takes values in [0, 1]. Note that

  d(x, y) = Σ_n 2^{-n} |ρ(x, x_n) − ρ(y, x_n)| / (1 + |ρ(x, x_n) − ρ(y, x_n)|)

defines a metric on X such that convergence in it is equivalent to convergence in ρ. The map x ↦ (ρ(x, x_n), n ≥ 1), which associates with every x ∈ X a sequence of real numbers, is an isometry between (X, d) and a subset of the compact space [0, 1]^N, the latter equipped with the metric

  d((t_n), (s_n)) = Σ_n 2^{-n} |t_n − s_n| / (1 + |t_n − s_n|).

Thus (X, d) is totally bounded, i.e. for each ε > 0 it possesses a finite ε-net. By (i), it is possible to find functions f_n with finite sets of values such that d(x, f_n(x)) → 0, whence f_n(x) → x.

Note the difference between (1.3) and (1.4). The first establishes uniform convergence, while the second only pointwise convergence. However, the second presumes that f_n takes only a finite (!) number of values.
Definition 1.20. A random element is called simple if it takes a finite number of values, and elementary if it takes at most countably many values.

A set A ⊂ X is said to be separable in X if A lies inside the closure of a countable subset Q ⊂ X.

Theorem 1.21. Let ξ : Ω → X for a metric space X. Then the following statements are equivalent.

(a) ξ is measurable (i.e. ξ is a random element) and ξ(Ω) is separable in X.

(b) There exists a sequence of elementary random elements ξ_n such that ξ_n(ω) → ξ(ω) uniformly in ω. If X is totally bounded, these random elements can be chosen to be simple.

(c) There exists a sequence ξ_n of simple random elements such that ξ_n(ω) → ξ(ω) for all ω.

Proof. (a)⇒(b) If Y = ξ(Ω) is separable, use Lemma 1.19 to construct f_n : Y → Y and set ξ_n = f_n(ξ).

(b)⇒(a) A pointwise limit of random elements is a random element (see Theorem 1.15). Furthermore, ξ(Ω) is a subset of the closure of ∪_n ξ_n(Ω), so it is separable. The further equivalences are proved similarly, using Lemma 1.19(ii).

A random element in X = R^d with its Borel σ-algebra is a random vector. Random sequences (or discrete time stochastic processes) are defined by taking X = R^∞, whereas stochastic processes are defined for X being some function space, e.g. the space of all continuous functions on [0, 1], denoted by C([0, 1]). If X consists of measures, then we deal with random measures.
1.5. Regular and tight measures

Consider a measure μ on (X, B(X)).

Definition 1.22. Measure μ is said to be regular if for every Borel A one has

  μ(A) = sup{μ(F) : F ⊂ A, F closed} = inf{μ(G) : G ⊃ A, G open}.

This property can be formulated as follows: for every Borel A and ε > 0 there exist a closed set F and an open set G such that F ⊂ A ⊂ G and μ(G \ F) ≤ ε.

Theorem 1.23. Let X be a Polish space. Then any finite measure μ on (X, B(X)) is regular.

Proof. Let Σ be the family of all Borel sets B which are regular, i.e. for every ε > 0 there exist a closed set F and an open set G such that F ⊂ B ⊂ G and μ(G \ F) ≤ ε.

First show that each closed ball B_r(x) belongs to Σ. Indeed,

  F = B_r(x) ⊂ G_n = {y : ρ(x, y) < r + 1/n}.

Since ∩_n (G_n \ F) = ∅, we have μ(G_n \ F) → 0.

It remains to show that Σ is a σ-algebra. First, X ∈ Σ, since X is open and closed at the same time. Furthermore, if B ∈ Σ, then B^c ∈ Σ (use the fact that μ is finite). If B_n ∈ Σ, n ≥ 1, choose F_n ⊂ B_n ⊂ G_n with μ(G_n \ F_n) ≤ ε 2^{-n}. Define

  B = ∪_{n≥1} B_n,   G = ∪_{n≥1} G_n,   D_n = ∪_{i=1}^n F_i.

Then

  lim_n μ(G \ D_n) = μ(G \ ∪_{n≥1} F_n) ≤ Σ_n μ(G_n \ F_n) ≤ ε.

Note that the conclusion of Theorem 1.23 holds also for any metric space X.
Corollary 1.24. If μ_1 and μ_2 are two finite measures on a Polish space X and μ_1(F) = μ_2(F) for all closed F, then μ_1 and μ_2 coincide.

Proof. First μ_1(X) = μ_2(X); then the equality holds for all open sets, and by regularity for all Borel sets.

Further, 1_A(x) is the indicator of the set A, so that ∫ 1_A dμ = μ(A).
Theorem 1.25. Let μ_1 and μ_2 be two finite measures on a Polish space X with its Borel σ-algebra. If ∫ f dμ_1 = ∫ f dμ_2 for all bounded continuous functions f, then μ_1 = μ_2.

Proof. Let F be any closed set. The distance ρ(x, F) is a continuous function. Thus,

  1 ≥ f_n(x) = (1 + n ρ(x, F))⁻¹ ↓ 1_F(x)

for a sequence of continuous functions f_n. By the condition, the integrals of f_n with respect to μ_1 and μ_2 coincide, so that the dominated convergence theorem implies that the limits coincide. Note that ∫ f_n dμ_i → μ_i(F), i = 1, 2.
Definition 1.26. A measure μ on a metric space X is said to be tight if for each ε > 0 there exists a compact set K such that μ(K^c) < ε.

Theorem 1.27 (Ulam). Let μ be a finite measure on a Polish space X. Then μ is tight.

Proof. Let {x_n} be a dense subset of X. Then ∪_i B_{1/n}(x_i) = X for every n ≥ 1. Thus, there exists an i_n such that

  μ(∪_{i≤i_n} B_{1/n}(x_i)) ≥ μ(X) − ε 2^{-n}.

Define

  K = ∩_{n≥1} ∪_{i≤i_n} B_{1/n}(x_i)

with closed balls B_{1/n}(x_i). The set K is totally bounded, since a finite ε-net is given by x_1, ..., x_{i_n} for n ≥ 1/ε. Furthermore, K is closed, and so is compact. Finally, notice that

  μ(K^c) ≤ Σ_n μ((∪_{i≤i_n} B_{1/n}(x_i))^c) ≤ Σ_n ε 2^{-n} = ε.

Definition 1.28. A finite Borel measure μ is called a Radon measure if, for each Borel set B, its measure μ(B) equals the supremum of μ(K) over all compact sets K ⊂ B.

Clearly, each Radon measure is regular and tight. Each finite Borel measure on a Polish space is Radon.
1.6. Finite-dimensional distributions

A stochastic process is a random element in a space of functions, i.e. X is a space of functions and ξ(ω) is a function for each ω. Since it is a function, it also has an argument, which is usually denoted by t. The set T of argument values is called the time domain. From now on we mostly consider stochastic processes with continuous time. The time domain T is often chosen to be [0, 1] or [0, ∞) or R or R^d.

For a stochastic process we write ξ(t, ω) or ξ(t) or ξ_t or ξ_t(ω). For each fixed ω the function ξ(t, ω), t ∈ T, is called a realisation of the stochastic process (or trajectory, or sample path). The values of ξ are assumed to belong to a separable metric space S. Then ξ is a random element in the space X = S^T of S-valued functions with argument from T. From now on we concentrate mainly on R-valued (real-valued) stochastic processes and sometimes also consider the R^d-valued case.

Example 1.29 (A rather trivial stochastic process). Let ξ(t) = tη, t ∈ [0, 1], with η a random variable. For any ω the realisation ξ(t, ω) = tη(ω) is a (linear) function of t.

In order to generate a σ-algebra on S^T we may use the evaluation maps π_t : S^T → S defined by π_t(x) = x(t), t ∈ T; the corresponding cylindrical σ-algebra

  C = σ{π_t, t ∈ T}

is the minimal σ-algebra which makes all evaluation maps π_t measurable. By the definition and the fact that π_t ∘ ξ = ξ_t is a function of two variables (ω, t), the random S^T-valued element ξ is measurable if and only if the S-valued functions ξ_t(·) : Ω → S are measurable for all t ∈ T. This corresponds to the naive interpretation of a stochastic process as a collection of random variables ξ_t indexed by time t.

The associated finite-dimensional distributions are defined for every finite subset {t_1, ..., t_n} ⊂ T by

  μ_{t_1,...,t_n} = P ∘ (ξ_{t_1}, ..., ξ_{t_n})⁻¹.

The finite-dimensional distribution μ_{t_1,...,t_n} is the distribution of the random element (ξ(t_1), ..., ξ(t_n)), which takes values in the space S^n = S × ... × S with the corresponding Borel σ-algebra B(S)^n = B(S^n).

The cylinders form an algebra in S^T. Since the values of a probability measure on an algebra determine uniquely its values on the σ-algebra B_T generated by all cylinders, we obtain the following result.
Proposition 1.30. Let ξ and η be two stochastic processes. Then ξ =_d η if and only if the joint distributions of (ξ_{t_1}, ..., ξ_{t_n}) and (η_{t_1}, ..., η_{t_n}) coincide for all t_1, ..., t_n ∈ T and all n ≥ 1.

If ξ is real-valued, the finite-dimensional distribution μ_{t_1,...,t_n} is defined by means of the corresponding multivariate cumulative distribution function

  F_{t_1,...,t_n}(x_1, ..., x_n) = P{ξ_{t_1} ≤ x_1, ..., ξ_{t_n} ≤ x_n},   x_1, ..., x_n ∈ R, n ≥ 1.

The following consistency conditions (assumed to hold for all n ≥ 1, t_1, ..., t_n ∈ T and x_1, ..., x_n ∈ R) are necessary for the existence of a stochastic process. For any permutation (i_1, ..., i_n) of (1, ..., n)

  F_{t_{i_1},...,t_{i_n}}(x_{i_1}, ..., x_{i_n}) = F_{t_1,...,t_n}(x_1, ..., x_n),

and

  F_{t_1,...,t_n}(x_1, x_2, ..., x_n) = F_{t_1,...,t_n,t_{n+1}}(x_1, x_2, ..., x_n, ∞).
Such a family of distribution functions is called a consistent family. The following theorem is called the Kolmogorov theorem on finite-dimensional distributions.

Theorem 1.31. Assume that F_{t_1,...,t_n} for t_1, ..., t_n ∈ T and n ≥ 1 is a family of distribution functions. Then there exists a stochastic process with these finite-dimensional distributions if and only if F_{t_1,...,t_n} is a consistent family.

Proof. Necessity is evident, also for stochastic processes with values in an arbitrary space S; then one works with a consistent family μ_{t_1,...,t_n} of measures on S^n for t_1, ..., t_n ∈ T and n ≥ 1.

Sufficiency. Define from F_{t_1,...,t_n} a consistent family of probability measures μ_{t_1,...,t_n} on R^n for n ≥ 1. A cylinder C might have various representations. The first step is to show that the measure assigned to C remains the same whatever the representation of C is. This implies that we can consistently define a probability measure P on the algebra of cylinders.

Note that the so defined P is finitely additive. In order to show this, it suffices to embed the cylinders C_1, ..., C_m into a space R^N with, perhaps, large N by merging the corresponding sets of t-values that come from the constructions of C_1, ..., C_m.

In order to be able to extend P onto the cylindrical σ-algebra we need to show that P is σ-additive on cylinders, which is equivalent to the continuity of P at the empty set. Consider a decreasing sequence C_n, n ≥ 1, of cylinders such that C_n ↓ ∅. We aim to show that P(C_n) → 0 as n → ∞. Assume that this is not the case, i.e. P(C_n) > ε for some ε > 0 and all sufficiently large n.

Without loss of generality we have

  C_n = {x ∈ S^T : (x_{t_1}, ..., x_{t_{m_n}}) ∈ A_n},   A_n ∈ B(R^{m_n}).

It is also possible to assume that the time moments used to define C_n form a growing sequence of finite sets; in particular, m_n is non-decreasing. Without loss of generality assume that m_n = n (this can be achieved by either omitting some of the C_n's or inserting some identical elements).
Each probability measure on R^n is tight, so that there exists a compact (closed bounded) set K_n ⊂ A_n such that

  μ_{t_1,...,t_n}(K_n) > μ_{t_1,...,t_n}(A_n) − ε 2^{-n},

where ε > 0 is as above. Define cylinders

  D_n = {x ∈ S^T : (x_{t_1}, ..., x_{t_n}) ∈ K_n}

and further build D'_n = D_1 ∩ ... ∩ D_n in order to obtain a decreasing sequence. Then

  D'_n = {x ∈ S^T : (x_{t_1}, ..., x_{t_n}) ∈ K'_n}

for compact sets K'_n, n ≥ 1. The construction of the sets K'_n implies that if (z_1, ..., z_n, ..., z_{n+l}) ∈ K'_{n+l}, then (z_1, ..., z_n) ∈ K'_n. This, in a sense, means that {K'_n} is decreasing, although these sets lie in different spaces. Furthermore,

  P(D'_n) = P(C_n) − P(∪_{i=1}^n (C_n \ D_i)) ≥ P(C_n) − Σ_{i=1}^n P(C_n \ D_i) ≥ P(C_n) − Σ_{i=1}^n P(C_i \ D_i) > ε − Σ_{i=1}^n ε 2^{-i} > 0.

Thus, K'_n is non-empty for any n. By a diagonal method one can find a sequence (z_1, z_2, ...) such that for any n its restriction (z_1, ..., z_n) ∈ K'_n. Thus, ∩_n D'_n contains a function x ∈ S^T such that x_{t_n} = z_n for all n, and so is non-empty. This contradicts the assumption, since

  ∩_{n≥1} C_n ⊃ ∩_{n≥1} D_n ⊃ ∩_{n≥1} D'_n ≠ ∅.

The proof of sufficiency holds for all stochastic processes with values in Polish spaces.
1.7. Moments and Gaussian processes

The first moment (if it exists) of a stochastic process ξ_t, t ∈ T, is called the mean function, sometimes referred to as the trend or drift. It is defined by m_t = Eξ_t, t ∈ T. In general, m_t can be infinite or may not exist at all.

The variance function is defined by

  Var ξ_t = E[ξ_t²] − [Eξ_t]²,   t ∈ T.

The dependence between the values of the process at different time moments s, t is captured by the covariance function

  Γ_{ts} = Cov(ξ_s, ξ_t) = E[ξ_t ξ_s] − Eξ_s Eξ_t,   (s, t) ∈ T × T.

Obviously, Γ_{tt} gives the variance function. The covariance function is symmetric, i.e. Γ_{ts} = Γ_{st}, and positive definite, i.e.

  Σ_{j,k=1}^n c_j c̄_k Γ_{t_j t_k} ≥ 0   (1.5)

for all n ≥ 1, t_1, ..., t_n ∈ T and all complex-valued c_1, ..., c_n, where c̄ is the complex conjugate.

The correlation function is defined by

  ρ_{st} = Cov(ξ_s, ξ_t) / √(Var ξ_t Var ξ_s),   (s, t) ∈ T × T.

Clearly, ρ_{st} ∈ [−1, 1].
The process ξ_t, t ∈ R, is said to be stationary in the wide sense if m_t does not depend on t and Γ_{ts} = Γ(t − s) depends only on the difference t − s. The process ξ_t is said to be stationary (or strictly stationary) if the finite-dimensional distributions do not change after a time shift, i.e. (ξ_{t_1}, ..., ξ_{t_k}) and (ξ_{t_1+s}, ..., ξ_{t_k+s}) share the same distribution for all t_1, ..., t_k, s ∈ R and all k ≥ 1.

The most important distribution in probability theory, the Gaussian distribution, is determined by its mean and variance. Take any real-valued function m_t, t ∈ T, and a real-valued symmetric positive definite function Γ_{ts} defined on T × T. For each t_1, ..., t_n ∈ T and n ≥ 1 the matrix (Γ_{t_j t_k})_{j,k=1}^n is positive definite and can be used as a covariance matrix for the normal law. Assume that (ξ_{t_1}, ..., ξ_{t_n}) has the Gaussian distribution with mean vector (m_{t_1}, ..., m_{t_n}) and covariance matrix (Γ_{t_j t_k})_{j,k=1}^n. This family of finite-dimensional distributions is consistent and defines a probability measure on R^T. The corresponding stochastic process is said to be Gaussian.

Since the Gaussian distribution is determined by its first two moments, stationarity in the wide sense and strict stationarity are the same concept for Gaussian processes.
Example 1.32 (Finite-dimensional distributions of the Wiener process). Take m_t identically equal to zero for all t ∈ [0, ∞) and set Γ_{ts} = t ∧ s = min(t, s). In order to show that this function is positive definite we construct a Gaussian random vector such that (Γ_{t_j t_k})_{j,k=1}^n becomes its covariance matrix. Let 0 = t_0 ≤ t_1 ≤ t_2 ≤ ... ≤ t_n be points from [0, ∞). Consider i.i.d. standard Gaussian random variables η_1, ..., η_n and define ξ_1 = √(t_1 − t_0) η_1 with t_0 = 0, then ξ_2 = ξ_1 + √(t_2 − t_1) η_2, ξ_3 = ξ_2 + √(t_3 − t_2) η_3, etc.

For j ≤ k,

  E(ξ_j ξ_k) = E[ (Σ_{i=1}^j √(t_i − t_{i−1}) η_i) (Σ_{l=1}^k √(t_l − t_{l−1}) η_l) ] = Σ_{i=1}^j (t_i − t_{i−1}) Eη_i² = t_j = min(t_j, t_k).

The stochastic process W_t with these finite-dimensional distributions is called the Wiener process.
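The construction in Example 1.32 is easy to check numerically. The following sketch (not part of the original notes; the time points are arbitrary choices) builds ξ_j as cumulative sums of independent √(t_i − t_{i−1}) η_i and compares the empirical covariance with min(t_j, t_k); sketch (Python):

    import numpy as np

    rng = np.random.default_rng(2)
    t = np.array([0.5, 1.0, 1.7, 2.3])            # assumed example time points
    dt = np.diff(np.concatenate(([0.0], t)))      # increments t_i - t_{i-1}

    n_samples = 200_000
    eta = rng.standard_normal((n_samples, t.size))
    xi = np.cumsum(np.sqrt(dt) * eta, axis=1)     # xi_j = sum_{i<=j} sqrt(dt_i) eta_i

    emp_cov = xi.T @ xi / n_samples               # empirical covariance (mean is 0)
    print(np.round(emp_cov, 2))                   # approx min(t_j, t_k)
    print(np.minimum.outer(t, t))                 # the target covariance matrix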
For a stochastic process ξ(t) = (ξ_1(t), ..., ξ_d(t)) with values in R^d, d ≥ 2, we also sometimes consider the vector-valued mean Eξ(t) and the matrix-valued cross-correlation function

  Γ^{ij}_{st} = Cov(ξ_i(s), ξ_j(t)),   1 ≤ i, j ≤ d.
Now we state a general version of Fubini's theorem.

Theorem 1.33. Let ξ_t, t ∈ [0, T], be a stochastic process defined on the probability space (Ω, F, P) with regular sample paths (i.e. ξ_t(ω) has at all t ∈ [0, T] a.s. left and right limits). Then

  ∫_0^T E|ξ_t| dt = E ∫_0^T |ξ_t| dt.

Furthermore, if the above quantity is finite, then

  ∫_0^T Eξ_t dt = E ∫_0^T ξ_t dt < ∞.

Proof. The result follows from Fubini's theorem if one shows that the properties of the trajectories of ξ imply that ξ(t, ω) is jointly measurable with respect to the product σ-algebra B([0, T]) ⊗ F.

In a similar way (and under the necessary integrability conditions) one obtains the expression for the second moment:

  E(∫_0^T ξ_t dt)² = E[∫_0^T ξ_t dt ∫_0^T ξ_s ds] = ∫_0^T ∫_0^T E[ξ_t ξ_s] dt ds.
1.8. Equivalent processes and separability

A stochastic process is said to be stochastically continuous if ξ_t converges to ξ_s in probability as t → s, for all s ∈ T. This is a rather weak assumption, which does not imply continuity of the sample paths.

The cylindrical σ-algebra C coincides with the Borel σ-algebra B(X) if X is equipped with the topology of pointwise convergence. This means that all events from C are determined by the values of ξ_t for a countable family of time moments t. However, this does not suffice to handle many important events like the event {ξ is a continuous function of t} or {ξ is bounded for all t ∈ [0, 1]}. These events are not measurable with respect to the cylindrical σ-algebra. Similarly, functionals like sup_{t∈T} ξ_t or limsup_{t→t_0} ξ_t are not measurable and so do not automatically become random variables.

It is possible to settle these issues by requiring that all values ξ(t) can be explored using only a countable dense set of time moments.

Definition 1.34. Two stochastic processes ξ(t), t ∈ T, and η(t), t ∈ T, defined on a common probability space (Ω, F, P) and taking values in the state space (S, S) are said to be (stochastically) equivalent if P{ξ(t) = η(t)} = 1 for all t ∈ T. Then η is said to be a version of ξ.

It should be noted that two processes may be versions of each other and yet not have the same sample paths.

Example 1.35. Consider two stochastic processes on the probability space ([0, 1], B([0, 1]), L), with L the Lebesgue measure, defined as ξ(t) = ξ(t, ω) = 0 and η(t) = η(t, ω) = 1(t = ω) for each ω ∈ [0, 1]. Clearly, for fixed t ∈ [0, 1] the set of ω's such that ξ is not equal to η is {t}, and it has Lebesgue measure 0. So the set where the processes are equal has probability 1; in other words, P{ξ(t) = η(t)} = 1 for all t ∈ T. But the processes clearly have different sample paths!
In many cases we aim to establish the existence of a version of ξ that satisfies some property, e.g. is continuous or monotone. The general method suitable to prove such results relies on the following construction:

- Choose a countable dense set D ⊂ T such that ξ is stochastically continuous at all points from T \ D.
- Check that ξ_t for t ∈ D has the desired property.
- Set ξ̃_t = ξ_t for all t ∈ D.
- Define ξ̃_t for t ∈ T \ D as a limit of ξ_{t_n} for t_n ∈ D as t_n → t. Take care that this operation yields a result for all ω.
- Prove that ξ̃_t is a random variable.
- By stochastic continuity conclude that ξ̃ is a version of ξ.

Definition 1.36. Two stochastic processes ξ(t), t ∈ T, and η(t), t ∈ T, which are versions of each other are said to be indistinguishable if

  P{ξ(t) = η(t) for all t ∈ T} = 1.

(Note that the above event does not belong to the cylindrical σ-algebra, so that its probability is understood as the outer probability, defined at the beginning of the next section.)

Events which are measurable with respect to the cylindrical σ-algebra are determined by the values of the stochastic process at an at most countable family of time moments. The following definition aims to explore the whole path of a stochastic process using only a countable family of time moments.
Definition 1.37. A stochastic process ξ(t, ω) is said to be separable if there exist a countable dense set D ⊂ T and an event Ω_0 of probability 0, such that for each open G ⊂ T and closed F ⊂ R,

  {ω : ξ(t, ω) ∈ F for all t ∈ D ∩ G} \ {ω : ξ(t, ω) ∈ F for all t ∈ G}

is a subset of Ω_0. The set D is called a separant of ξ.

In other words, ξ is separable if with probability 1, for all t ∈ T \ D the value ξ_t is an accumulation point (i.e. a partial limit) of ξ_s as s → t with s ∈ D. Alternatively, the graph of ξ_t, t ∈ T, is contained in the closure of the graph of ξ_s, s ∈ D.

Proposition 1.38. A stochastic process ξ(t), t ∈ T, is separable (for the topology of the real line) if and only if there exist a countable dense separant D ⊂ T and an event Ω_0 ∈ F with probability 0 such that for every ω ∉ Ω_0 and t ∈ T there exists a sequence of time points t_1, t_2, ... ∈ D such that t_n → t and ξ(t_n, ω) → ξ(t, ω).

Proof. Clearly, if ξ(t_n, ω) ∈ F for a closed set F, then ξ(t_n, ω) → ξ(t, ω) implies that ξ(t, ω) ∈ F, hence the sufficiency follows. Now let us show the necessity. Assume that ξ is separable with a separant D and negligible event Ω_0. Let t ∈ T and ω ∉ Ω_0. Let O_n be a nested sequence of open intervals containing t such that ∩_n O_n = {t}. Define

  F_n = {y : |y − ξ(t, ω)| ≥ 1/n},   n ≥ 1,

which is clearly a closed set. Since ω ∉ Ω_0, by the separability assumption we cannot have ξ(s, ω) ∈ F_n for all s ∈ O_n ∩ D, hence there exists t_n ∈ O_n ∩ D with |ξ(t_n, ω) − ξ(t, ω)| < 1/n. Clearly, t_n → t as n → ∞, hence ξ(t_n, ω) → ξ(t, ω) as n → ∞.
The above proposition implies that

  sup_{t∈O} ξ(t, ω) = sup_{t∈O∩D} ξ(t, ω)

for all ω ∉ Ω_0. The right-hand side above is a measurable function, since the supremum is taken over a countable set. If T_n ↑ and ∪_n T_n = D, then by separability,

  sup_{t∈T_n∩O} ξ(t) ↑ sup_{t∈O} ξ(t, ω)

as n → ∞. So we may approximate the distribution of the supremum by finite-dimensional distributions.
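This grid approximation of the supremum is easy to visualise numerically. The sketch below (not part of the original notes; grid sizes are arbitrary choices) simulates one fine-grained Brownian-type path and takes suprema over nested finite grids T_n, which increase to the supremum over the finest grid; sketch (Python):

    import numpy as np

    rng = np.random.default_rng(3)
    N = 2**16                                   # assumed fine grid resolution
    dW = rng.standard_normal(N) * np.sqrt(1.0 / N)
    W = np.concatenate(([0.0], np.cumsum(dW)))  # a path sampled on the grid k/N

    for n in (2**4, 2**8, 2**12, 2**16):
        idx = np.arange(0, N + 1, N // n)       # nested sub-grid T_n of n+1 points
        print(n, W[idx].max())                  # increases to W.max() as n grows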
The separability definition can be extended to stochastic processes with values in a general metric space X. If X is separable and locally compact and T is any separable metric space, then each stochastic process has a separable modification which takes values in a compactification of X.

The following theorem, given without proof, assures that every stochastic process has an equivalent separable version.

Theorem 1.39. Let ξ(t), t ∈ T, be a stochastic process.

a) If ξ takes values from a bounded subset of R, then there exists a separable version of ξ.

b) If in addition (T, ρ) is a topological space with countable base, then there exists a separable version ξ̃ of ξ that takes values in the extended real line [−∞, ∞] (without assuming that ξ is bounded). If ξ is stochastically continuous, then each countable dense set D ⊂ T can be chosen as the separant of ξ̃.

There is some concern about ξ̃ possibly taking infinite values. Nevertheless, |ξ̃(t)| is a.s. finite for each t ∈ T. For most processes of interest, which are smooth enough, there exists a separable version ξ̃ that does not take infinite values at all.
Example 1.40. Let T = [0, 1] and (Ω, F, P) with Ω = T, F = B([0, 1]) and the Lebesgue measure P. Consider a stochastic process ξ(t), t ∈ T, defined by ξ(t, ω) = 1 if t = ω and 0 otherwise. Then ξ(t) = 0 on the set {ω : ω ≠ t}. This process is not separable, since P{ξ(t) = 0, t ∈ O} = 1 − P(O) whereas P{ξ(t) = 0, t ∈ O ∩ D} = 1, where D is the set of all rational numbers in [0, 1].

Example 1.41. Let T be an uncountable compact subset of R and denote by C(T) the set of all continuous functions on T. Clearly C(T) ⊂ R^T. Further, let ξ(t) be a stochastic process on (Ω, F, P). Then

  {ω : ξ(t, ω) ∈ C(T)} = ∩_{n=1}^∞ ∪_{k=1}^∞ ∩_{|t−s|<1/k} {ω : |ξ(t, ω) − ξ(s, ω)| ≤ 1/n} ∉ B_T.

If the stochastic process ξ is separable with separant D, then

  {ω : ξ(t, ω) ∈ C(T)} = ∩_{n=1}^∞ ∪_{k=1}^∞ ∩_{|t−s|<1/k, s,t∈D} {ω : |ξ(t, ω) − ξ(s, ω)| ≤ 1/n} ∈ B_T.
1.9. Sample path continuity

Although some properties of sample paths (e.g. continuity or monotonicity) do not form measurable events, it is possible to determine their outer probabilities. Even if C is a non-measurable set, its outer probability P*(C) is defined as the infimum of Σ_i P(A_i), where C ⊂ ∪_i A_i for measurable events A_1, A_2, ....

Theorem 1.42. If F ⊂ R^T, then there exists a stochastic process with distribution P whose realisations almost surely belong to F if and only if P*(F) = 1.

Proof. The necessity is trivial. For sufficiency, assume that P*(F) = 1. Choose F to be the space of elementary events and define the σ-algebra to be the family of all subsets of F that can be represented as A = F ∩ B for B from the cylindrical σ-algebra, and set Q(A) = P(B) in this case. Define the stochastic process as ξ_t(x) = x_t for x ∈ F.

To check the correctness of this definition, assume that A = F ∩ B_1 = F ∩ B_2. Then F ∩ (B_1 △ B_2) = ∅, i.e. F ⊂ R^T \ (B_1 △ B_2). Since the outer probability of F is one, P(B_1 △ B_2) = 0, so the definition of Q is unambiguous.

It is clear that Q is a probability measure and the finite-dimensional distributions of the newly defined stochastic process coincide with those of the original process:

  Q{x ∈ F : (ξ_{t_1}(x), ..., ξ_{t_n}(x)) ∈ C} = P{x ∈ R^T : (x_{t_1}, ..., x_{t_n}) ∈ C}.
Let C(T) denote the family of continuous functions on T. We aim to specify conditions that guarantee the existence of a version of ξ that is continuous, i.e. belongs to C(T). If ξ is itself separable, this immediately would mean that ξ is almost surely continuous.

Example 1.43. Let (Ω, F, P) be a probability space with Ω = [0, 1], F = B([0, 1]) and the Lebesgue measure P. Consider the stochastic process ξ(t), t ∈ [0, 1], defined as

  ξ(t, ω) = 0 for t < ω,  and  ξ(t, ω) = 1 for t ≥ ω.

Let us show that ξ is a separable stochastic process. Put D = Q ∩ [0, 1], the set of all rational numbers in [0, 1]. For fixed 0 ≤ t < ω ≤ 1 we have ξ(t, ω) = 0. Since D is dense in [0, 1], we can find t_n ↓ t with t_n ∈ D, so that for all large n we have t_n < ω, implying ξ(t_n, ω) = 0, hence ξ(t_n, ω) → ξ(t, ω) as n → ∞. Similarly we can show the convergence for all t ≥ ω. The set S_t = {ω : ω = t} has probability 0, implying the separability of ξ. Observe now that on S_t the process is discontinuous; however, we know that this is a negligible set. Since P(∪_{t∈[0,1]} S_t) = 1, the process does not have continuous sample paths.
Theorem 1.44 (Kolmogorov continuity criterion). If ξ_t, t ≥ 0, is a separable stochastic process such that, for some h, α, β > 0 and C < ∞, we have

  E|ξ_t − ξ_s|^α ≤ C|t − s|^{1+β},   t, s ≥ 0, |t − s| ≤ h,   (1.6)

then ξ is continuous.

Proof. First of all, note that ξ is stochastically continuous. We can split the whole half-line into a countable collection of (possibly overlapping) intervals of length h and prove the continuity on each of those intervals. Because of this, it suffices to consider ξ_t, t ∈ [0, 1], and assume that (1.6) holds for all t, s ∈ [0, 1].

We aim to show that the process is uniformly continuous on the set T_0 of dyadic rationals k/2^n in [0, 1]. By the generalised Markov inequality,

  P{|ξ_{(k+1)2^{-n}} − ξ_{k2^{-n}}| ≥ q^n} ≤ C 2^{-n(1+β)} q^{-nα}.

For q = 2^{-β/(2α)} < 1 we get

  P{|ξ_{(k+1)2^{-n}} − ξ_{k2^{-n}}| ≥ q^n} ≤ C 2^{-n} r^n,

where r = 2^{-β/2} < 1. Denote

  Δ_n = max_{0 ≤ k2^{-n} < (k+1)2^{-n} ≤ 1} |ξ_{(k+1)2^{-n}} − ξ_{k2^{-n}}|.

Therefore, for all n ≥ 1,

  P_n = P{Δ_n ≥ q^n} = P{ ∪_{0 ≤ k2^{-n} < (k+1)2^{-n} ≤ 1} { |ξ_{(k+1)2^{-n}} − ξ_{k2^{-n}}| ≥ q^n } }
      ≤ Σ_{0 ≤ k2^{-n} < (k+1)2^{-n} ≤ 1} P{|ξ_{(k+1)2^{-n}} − ξ_{k2^{-n}}| ≥ q^n}
      ≤ 2^n · C 2^{-n} r^n = C r^n.

The sum of these probabilities over n ≥ 1 converges, so the Borel–Cantelli lemma yields that Δ_n < q^n almost surely for all sufficiently large n.

If t = i2^{-m} and s = k2^{-n} with n > m and |t − s| < 2^{-m}, then

  |ξ_t − ξ_s| ≤ Δ_{m+1} + ... + Δ_n.

Thus, for each t, s ∈ T_0 with 2^{-m} > |t − s| ≥ 2^{-m-1},

  |ξ_t − ξ_s| ≤ 2 Σ_{n=m+1}^∞ Δ_n.

Since the latter series converges, ξ is uniformly continuous on T_0.
If the time argument t belongs to R^d, a similar condition is applicable with C‖t − s‖^{d+β} on the right-hand side of (1.6). However, if T is a general metric space, the situation is far more complicated. In the following we mostly consider T = [0, 1].

We conclude our discussion with another criterion for sample path continuity of a stochastic process.

Theorem 1.45. Let ξ(t), t ∈ [0, 1], be a stochastic process. If there are non-decreasing functions g, h such that

  Σ_{n=1}^∞ g(2^{-n}) < ∞,   Σ_{n=1}^∞ 2^n h(2^{-n}) < ∞,

and for all t < t + Δ with t, t + Δ ∈ [0, 1]

  P{|ξ(t + Δ) − ξ(t)| ≥ g(Δ)} ≤ h(Δ),

then there exists a version of ξ with continuous sample paths.

If ξ is continuous or has a continuous version, it is convenient to consider it as an element in the space C(T) of continuous functions on T with the uniform metric. The cylinder sets restricted to C(T) generate the cylindrical σ-algebra on C(T). Since C(T) is Polish, we can use Theorem 1.13 to show that the cylindrical σ-algebra coincides with the Borel σ-algebra generated by the topology of uniform convergence on C(T). The following result shows that continuous stochastic processes are, in fact, random elements in C(T). It opens a way to use the previous results for random elements in Polish spaces in order to handle stochastic processes.

Proposition 1.46. A random element in C(T) is a continuous stochastic process. Conversely, each continuous stochastic process is a random element in C(T).

Proof. If ξ is a random element in C(T), consider the continuous (hence measurable) evaluation map x ↦ x(t); then ξ_t is a random variable.

If ξ is a continuous stochastic process, then use the fact that the cylinders in C(T) generate the Borel σ-algebra.
One particularly important probability measure on C[0, 1] is the Wiener measure, whose existence is established in the following theorem.

Theorem 1.47 (Wiener measure). There exists a unique probability measure μ on C[0, 1] (or the corresponding random element W) such that the following properties hold.

1. W_0 = 0 a.s. (i.e. μ({x : x_0 = 0}) = 1).

2. If 0 ≤ t_1 < t_2 < ... < t_n ≤ 1, then the random variables W_{t_1}, W_{t_2} − W_{t_1}, ..., W_{t_n} − W_{t_{n−1}} are (jointly) independent (i.e. W has independent increments).

3. If 0 ≤ s < t ≤ 1, the random variable W_t − W_s is normally distributed with mean 0 and variance t − s.

Proof. The finite-dimensional distributions are multivariate normal (Exercise: describe the corresponding covariance matrices and show that the distributions form a consistent family). Thus, there exists a stochastic process that satisfies the required conditions. It remains to show that the process is continuous. Recalling the formula for the fourth moment of a Gaussian random variable, we obtain

  E|W_t − W_s|⁴ = 3|t − s|²,

so that Theorem 1.44 is applicable (with α = 4 and β = 1).

The stochastic process from Theorem 1.47 is called the (standard) Wiener process. It will be shown later on that the Wiener process appears as a limit for random walks in a scheme similar to the classical central limit theorem. Namely, if S_n = η_1 + ... + η_n for i.i.d. random variables η_1, ..., η_n with mean zero and variance 1, then

  ξ_t^{(n)} = n^{-1/2} S_{[nt]}

converges in distribution to the Wiener process. This result is called the invariance principle or the Donsker theorem. For this, however, one needs to work out a number of tools suitable to handle weak convergence of measures on metric spaces.
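The Donsker scaling is easy to see in simulation. The following sketch (not part of the original notes; step distribution and sizes are arbitrary choices) uses i.i.d. ±1 steps, so the scaled endpoint ξ^{(n)}_1 is approximately N(0, 1) and a single scaled path, read on a time grid, resembles a Brownian path; sketch (Python):

    import numpy as np

    rng = np.random.default_rng(4)
    n, n_walks = 2000, 50_000
    steps = rng.choice([-1.0, 1.0], size=(n_walks, n))   # eta_i: mean 0, variance 1
    endpoints = steps.sum(axis=1) / np.sqrt(n)           # xi^{(n)}_1 for many walks
    print(endpoints.mean(), endpoints.var())             # approx 0 and 1

    S = np.concatenate(([0.0], np.cumsum(steps[0])))     # one random walk S_0,...,S_n
    t = np.linspace(0.0, 1.0, 11)
    print(S[(n * t).astype(int)] / np.sqrt(n))           # one scaled path on a grid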
2. Brownian motion

2.1. Definition

Brownian motion was discovered by the botanist Robert Brown in 1827. He found that small particles suspended in distilled water were in continuous movement. His discovery did not receive much attention for a long time, until shortly before the turn of the 20th century. The first quantitative work on Brownian motion is due to Bachelier in 1900, who was interested in stock price fluctuations. In 1905 Einstein made a historical breakthrough by discovering the underlying mathematical laws governing the movements of particles. A rigorous mathematical treatment of Brownian motion began with Wiener, who provided the first existence proof in 1923.

Definition 2.1. The standard Brownian motion (the Wiener process) is a real-valued stochastic process W_t, t ≥ 0, such that

a) W_0 = 0 a.s.;
b) the trajectories W_t, t ≥ 0, are a.s. continuous;
c) W_t, t ≥ 0, has independent increments;
d) for any 0 ≤ s < t the random variable W_t − W_s is Gaussian with mean 0 and variance t − s.

Note that W_t − W_s for t ≥ s is called an increment. The stochastic process is said to have independent increments if the random variables

  W_{t_2} − W_{t_1}, ..., W_{t_n} − W_{t_{n−1}}

are jointly independent whatever n ≥ 2 and points 0 ≤ t_1 ≤ t_2 ≤ ... ≤ t_n are.
The existence of a sample-continuous stochastic process with such finite-dimensional distributions has been shown in the previous chapter. Note that we write W_t or W(t) and sometimes B_t or B(t) to designate Brownian motion. The general Brownian motion satisfies the above properties, but W_t − W_s has variance σ²(t − s) for a certain parameter σ. The process W_t + m(t) + x, with m(t) a continuous function satisfying m(0) = 0 and x a real number, is called a Brownian motion with drift function m(t) started at x.

Recalling the definition above, we easily see that W_t is normally distributed with

  EW_t = 0,   Var W_t = t,   t ≥ 0.

For time moments 0 = t_0 < t_1 < ... < t_n < ∞, n ≥ 1, the random vector (W_{t_1}, ..., W_{t_n}) has the probability density function

  Π_{i=1}^n p(t_i − t_{i−1}, x_{i−1}, x_i),   x = (x_1, ..., x_n) ∈ R^n (with x_0 = 0),

where

  p(t, x, y) = exp(−(y − x)²/(2t)) / √(2πt)
is the probability density function of the normal distribution with mean x and variance t. This can be shown by noticing that (W_{t_1}, ..., W_{t_n}) is a linear transformation of the vector (W_{t_1}, W_{t_2} − W_{t_1}, ..., W_{t_n} − W_{t_{n−1}}). The function p(t, x, y) can be interpreted as the transition density of the Wiener process, i.e.

  ∫_A p(t, x, y) dy = P{W_{s+t} ∈ A | W_s = x}.

In other words, p(t, x, y) is a continuous analogue of the transition probability: it yields the density of W_{s+t} given that W_s = x and so corresponds to the transition from x to y in time t.
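A quick numerical check of the transition density (not part of the original notes; the values of s, t, x, y are arbitrary choices): since W_{s+t} − W_s ~ N(0, t) is independent of W_s, the conditional probability P{W_{s+t} ≤ y | W_s = x} equals the Gaussian cdf Φ((y − x)/√t); sketch (Python):

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(5)
    s, t, x, y = 0.5, 2.0, 1.0, 2.0                # assumed example values
    incr = rng.standard_normal(10**6) * sqrt(t)    # W_{s+t} - W_s ~ N(0, t)
    print(np.mean(x + incr <= y))                  # Monte Carlo estimate
    print(0.5 * (1 + erf((y - x) / sqrt(2 * t))))  # Phi((y - x)/sqrt(t))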
Since the finite-dimensional distributions of W_t are jointly Gaussian, all linear combinations like Σ a_i W_{t_i} are normally distributed and all conditional distributions are also normal. In order to identify these distributions, one needs to calculate the expectations, variances and covariances (as needed).
Corollary 2.2. The covariance function of the Wiener process is given by the following relation:

  Cov(W_s, W_t) = min(s, t) = s ∧ t,   0 < s < t < ∞.

Proof. Since Var W_s = s, Var W_t = t with s < t, and Var(W_t − W_s) = t − s,

  Cov(W_s, W_t) = [Var W_s + Var W_t − E(W_t − W_s)²]/2 = s = min(s, t).
Summarising, we obtain a clear distributional picture of the Wiener process in terms of finite-dimensional distributions.

Proposition 2.3. For all 0 < t_1 < ... < t_n, the random vector (W_{t_1}, ..., W_{t_n}) is Gaussian with mean zero and covariance matrix given by

  Cov(W_{t_i}, W_{t_j}) = min(t_i, t_j).

A stochastic process whose finite-dimensional distributions are Gaussian is called a Gaussian process. Thus, we obtain an equivalent definition of the Wiener process; a simulation sketch follows the proposition.

Proposition 2.4. A stochastic process W_t, t ≥ 0, is the Wiener process if and only if the following properties are satisfied:

a) W_0 = 0 a.s.;
b) the trajectories W_t, t ≥ 0, are a.s. continuous;
c) W_t, t ≥ 0, is a Gaussian process;
d) for any t, s ≥ 0, EW_t = 0 and Cov(W_t, W_s) = t ∧ s.
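A minimal simulation sketch (not part of the original notes; grid size and path counts are arbitrary choices), building a Wiener path from independent N(0, dt) increments as in Definition 2.1 c)-d), with an empirical check of Var W_1 = 1 from Proposition 2.4 d); sketch (Python):

    import numpy as np

    rng = np.random.default_rng(6)
    N = 1000                                   # assumed number of grid steps on [0, 1]
    dt = 1.0 / N
    dW = rng.standard_normal(N) * np.sqrt(dt)  # independent increments N(0, dt)
    W = np.concatenate(([0.0], np.cumsum(dW))) # W_0 = 0; partial sums give the path

    # empirical check of d): the variance of W_1 over many paths is close to 1
    paths = np.cumsum(rng.standard_normal((5000, N)) * np.sqrt(dt), axis=1)
    print(paths[:, -1].var())                  # approx Var W_1 = 1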
Example 2.5. In order to find the distribution of a linear combination of the values of W_t, one has to calculate the corresponding mean and variance; e.g. for η = W_1 + 3W_4 − 2W_5 we get Eη = 0 and

  Var(η) = E(W_1 + 3W_4 − 2W_5)² = 1 + 9·4 + 4·5 + 2·3·1 − 2·2·1 − 2·3·2·4 = 11.
Example 2.6. Let us calculate the probability that W_t does not go below level 0 at the points t = 0, 1, 2. By the definition and the independence of increments,

  P{W_t ≥ 0, t = 0, 1, 2} = P{W_1 ≥ 0, W_2 ≥ 0}
    = P{W_1 ≥ 0, W_1 + (W_2 − W_1) ≥ 0} = ∫_{x≥0} P{W_2 − W_1 ≥ −x} dΦ(x)
    = ∫_{x>0} Φ(x) dΦ(x) = ∫_{[1/2,1]} s ds = 3/8.
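A Monte Carlo confirmation of Example 2.6 (not part of the original notes): W_1 is standard normal and W_2 = W_1 plus an independent standard normal increment; sketch (Python):

    import numpy as np

    rng = np.random.default_rng(7)
    n = 10**6
    W1 = rng.standard_normal(n)            # W_1 ~ N(0, 1)
    W2 = W1 + rng.standard_normal(n)       # W_2 = W_1 + independent N(0, 1) increment
    print(np.mean((W1 >= 0) & (W2 >= 0)))  # approx 3/8 = 0.375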
Example 2.7. For the Wiener process W_t define η_t = ∫_0^t W_s ds, t > 0. The integral is well defined and measurable (as the limit of integral sums), so that η_t is again a stochastic process. Each integral sum is normally distributed. Since the limit of normally distributed random variables is normally distributed, we obtain that η_t is normally distributed for each t. Thus, we need to identify the expectation and the variance of η_t:

  E ∫_0^t W_s ds = lim_n E Σ_i W_{t_i} Δt_i = 0,   t > 0.

By Fubini's theorem,

  Var ∫_0^t W_s ds = Cov(∫_0^t W_s ds, ∫_0^t W_s ds) = E[∫_0^t W_u du ∫_0^t W_v dv]
    = ∫_0^t ∫_0^t E[W_u W_v] du dv = ∫_0^t ∫_0^t Cov(W_u, W_v) du dv = ∫_0^t ∫_0^t min(u, v) du dv
    = 2 ∫_0^t ∫_0^u v dv du = ∫_0^t u² du = t³/3.

Since

  ∫_0^t ∫_0^t E|W_u W_v| du dv ≤ ∫_0^t ∫_0^t √(uv) du dv < ∞,

the conditions of Fubini's theorem are satisfied.
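A numerical check of Example 2.7 (not part of the original notes; t and the grid are arbitrary choices): the Riemann sums of ∫_0^t W_s ds are approximately Gaussian with mean 0 and variance t³/3; sketch (Python):

    import numpy as np

    rng = np.random.default_rng(8)
    t, N, n_paths = 1.0, 500, 10_000
    dt = t / N
    W = np.cumsum(rng.standard_normal((n_paths, N)) * np.sqrt(dt), axis=1)
    integral = W.sum(axis=1) * dt          # Riemann sum approximating int_0^t W_s ds
    print(integral.var(), t**3 / 3)        # both approx 1/3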
2.2. Key properties of the Wiener process

The following results about transformations of the Wiener process can be proved by applying Proposition 2.4.

Lemma 2.8 (Scale invariance). If W_t, t ≥ 0, is the Wiener process, then so is

  W_a(t) = a^{-1/2} W_{at},   t ≥ 0,

for all a > 0.

Proof. Scaling does not influence the almost sure continuity of the sample paths. Further, W_a(0) = 0 a.s. It remains to show that the increments are independent Gaussian random variables and to calculate the variance of

  W_a(t) − W_a(s) = (W_{at} − W_{as}) / a^{1/2},

which is equal to (at − as)/a = t − s.
Lemma 2.9 (Time shift). If W_t, t ≥ 0, is the Wiener process and T > 0 is a fixed constant, then

  W^T(t) = W_{t+T} − W_T,   t ≥ 0,

is also the Wiener process, and it is independent of W_s, s ∈ [0, T].

Proof. The mean function of the process W^T is 0, since EW^T(t) = E(W_{t+T} − W_T) = 0. If h ≥ 0, then

  Cov(W^T(t), W^T(t + h)) = Cov(W_{T+t} − W_T, W_{T+t+h} − W_T)
    = (T + t) − T − T + T = t.

The independence follows from the independence of the increments.
Lemma 2.10 (Symmetry). If W_t, t ≥ 0, is the Wiener process, then so is −W_t, t ≥ 0.

Lemma 2.11 (Time inversion). If W_t, t ≥ 0, is the Wiener process, then the stochastic process

  W_I(t) = t W_{1/t} for t > 0,   W_I(0) = 0,

is stochastically equivalent to W.

Proof. Clearly, EW_I(t) = t EW_{1/t} = t·0 = 0 for all t > 0. For any h > 0,

  Cov(W_I(t), W_I(t + h)) = t(t + h) Cov(W_{1/t}, W_{1/(t+h)}) = t(t + h) min(1/t, 1/(t + h)) = t.

It remains to confirm that W_I is continuous at the origin. For this, write

  W_I(t) = W_{1/t} / (1/t),

so the continuity as t ↓ 0 would follow from

  W_s / s → 0 as s → ∞.

This is the strong law of large numbers for the Wiener process. It can first be shown for s = n, since

  W_n / n = (W_1 + (W_2 − W_1) + ... + (W_n − W_{n−1})) / n,

using the conventional strong law of large numbers for the sum of i.i.d. random variables in the numerator. For general s one can write

  W_s / s = (W_n + (W_s − W_n)) / n · n / s

for n being the integer part of s. By the time shift property, W_s − W_n is equivalent to W_t with t = s − n ∈ [0, 1) being the fractional part of s, and the absolute value of W_t, t ∈ [0, 1], is bounded in view of the continuity property.
A family of σ-algebras F_t, t ≥ 0, is called a filtration if F_s ⊂ F_t whenever s ≤ t, i.e. this family is non-decreasing. Assume that the filtration is right-continuous, i.e.

  F_t = ∩_{s>t} F_s = F_{t+},   t ≥ 0.

The process W_t, t ≥ 0, is called the (F_t)-Wiener process if W_t is adapted (i.e. W_t is F_t-measurable for all t) and if the increment W_s − W_t is independent of F_t for all t ≤ s. For simplicity, define F_t as the σ-algebra generated by W_s for s ∈ [0, t].

The Markov property of the Wiener process can be formulated as follows. Let F_t be the σ-algebra generated by W_s for s ≤ t and let F^t be the σ-algebra generated by W_{t+s} − W_t for s ≥ 0. Then these two σ-algebras are independent, i.e. P(A ∩ B) = P(A)P(B) for each A ∈ F_t and B ∈ F^t.

A random variable τ with values in [0, ∞] is said to be a stopping time if {τ ≤ t} ∈ F_t for all t. The stopping σ-algebra F_τ consists of all measurable events A such that A ∩ {τ ≤ t} ∈ F_t for all t. In plain words, A ∈ F_τ is determined by the behaviour of W_t for t ∈ [0, τ]; indeed, F_τ is generated by the random variables W_{t∧τ} for t ≥ 0.
The following result extends the time shift property to the case when $T$ is a stopping time.
Theorem 2.12 (Strong Markov property). For each a.s. finite stopping time $\tau$, the process $B_t = W_{\tau+t} - W_\tau$, $t \ge 0$, is again a Wiener process independent of $\mathcal{F}_\tau$.
Proof. It suffices to show that for all $0 \le t_1 \le \cdots \le t_k$ and any bounded continuous functions $f$ and $g$ on $\mathbb{R}^k$,
$$I_\tau = \mathbf{E}[f(W_{t_1\wedge\tau},\ldots,W_{t_k\wedge\tau})\,g(B_{t_1},\ldots,B_{t_k})] = \mathbf{E}f(W_{t_1\wedge\tau},\ldots,W_{t_k\wedge\tau})\,\mathbf{E}g(W_{t_1},\ldots,W_{t_k}).$$
Assume first that $\tau$ takes an at most countable number of values, denoted by $r_1, r_2, \ldots$. Note that $\{\tau = r_n\} \in \mathcal{F}_{r_n}$ and
$$F_n = f(W_{t_1\wedge\tau},\ldots,W_{t_k\wedge\tau})\mathbf{1}_{\{\tau=r_n\}} = f(W_{t_1\wedge r_n},\ldots,W_{t_k\wedge r_n})\mathbf{1}_{\{\tau=r_n\}}$$
is $\mathcal{F}_{r_n}$-measurable. Then
$$\mathbf{1}_{\{\tau=r_n\}}g(B_{t_1},\ldots,B_{t_k}) = \mathbf{1}_{\{\tau=r_n\}}g(W_{r_n+t_1}-W_{r_n},\ldots,W_{r_n+t_k}-W_{r_n}).$$
By the time shift property, the last factor is independent of $\mathcal{F}_{r_n}$ and
$$\mathbf{E}g(W_{r_n+t_1}-W_{r_n},\ldots,W_{r_n+t_k}-W_{r_n}) = \mathbf{E}g(W_{t_1},\ldots,W_{t_k}).$$
Thus,
$$I_\tau = \sum_{r_n}\mathbf{E}[F_n\,g(W_{r_n+t_1}-W_{r_n},\ldots,W_{r_n+t_k}-W_{r_n})] = \mathbf{E}g(W_{t_1},\ldots,W_{t_k})\sum_{r_n}\mathbf{E}F_n,$$
where the last sum equals $\mathbf{E}f(W_{t_1\wedge\tau},\ldots,W_{t_k\wedge\tau})$.

To finish the proof, we approximate a general finite stopping time $\tau$ by a sequence of discrete stopping times $\tau_n$ defined as $\tau_n = (k+1)2^{-n}$ if $\tau \in (k2^{-n},(k+1)2^{-n}]$ for $k = -1, 0, \ldots$. Then $\tau \le \tau_n \le \tau + 2^{-n}$ and $\tau_n \downarrow \tau$. It is easy to check that $\tau_n$ is also a stopping time. Since the result holds for $\tau_n$, it suffices to use the continuity to confirm it for $\tau$.
2.3. Brownian bridge and Brownian motion in higher dimensions
A process closely related to the standard Brownian motion is the Brownian bridge. Like the standard Brownian motion, it starts at 0; additionally, the Brownian bridge ends at 0 at time $t = 1$ with probability 1.
Definition 2.13. A Brownian bridge is a stochastic process $B^o_t$, $t \in [0,1]$, defined by
$$B^o_t = W_t - tW_1, \quad t \in [0,1],$$
where $W_t$ is the Wiener process.

By the properties of Brownian motion, $\mathbf{E}B^o_t = \mathbf{E}W_t - t\,\mathbf{E}W_1 = 0$ and
$$\mathbf{E}(B^o_tB^o_s) = \mathbf{E}[(W_t - tW_1)(W_s - sW_1)] = \min(t,s) - ts = s(1-t)$$
for all $0 \le s \le t \le 1$. It is possible to show that the Brownian bridge can be equivalently defined as the Wiener process conditioned on $W_1 = 0$.
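The definition $B^o_t = W_t - tW_1$ translates directly into a simulation recipe; the following sketch (grid size, seed and sample count are free choices, numpy assumed) generates bridge paths and checks the covariance $s(1-t)$ empirically:

    import numpy as np

    rng = np.random.default_rng(1)
    n_steps, n_paths = 500, 20_000
    dt = 1.0 / n_steps
    times = np.arange(1, n_steps + 1) * dt

    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
    B = W - times * W[:, -1:]            # B^o_t = W_t - t W_1

    s, t = 0.3, 0.7                      # check Cov(B_s, B_t) = s(1 - t)
    i, j = int(s * n_steps) - 1, int(t * n_steps) - 1
    print(np.mean(B[:, i] * B[:, j]), s * (1 - t))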
Lemma 2.14 (Scaling and inversion). If $W_t$ is the Wiener process, then $(1-t)W_{t/(1-t)}$ and $tW_{(1-t)/t}$ are Brownian bridges. If $B^o$ is a Brownian bridge, then $(1+t)B^o_{t/(1+t)}$ and $(1+t)B^o_{1/(1+t)}$ are Wiener processes on $[0,1]$.

Proof. The continuity of the sample paths follows immediately from the definition, so it suffices to show that the covariance agrees with what we have seen previously for the Brownian bridge and the Brownian motion, respectively.
Example 2.15. Define $\zeta(t) = \int_0^t B^o_s\,ds$, $t \in [0,1]$. Then $\mathbf{E}\int_0^t B^o_u\,du = \int_0^t \mathbf{E}B^o_u\,du = 0$. For the variance function, write
$$\operatorname{Var}\int_0^t B^o_s\,ds = \operatorname{Cov}\Big(\int_0^t B^o_s\,ds,\int_0^t B^o_s\,ds\Big) = \mathbf{E}\int_0^t B^o_u\,du\int_0^t B^o_v\,dv$$
$$= \int_0^t\int_0^t \mathbf{E}B^o_uB^o_v\,du\,dv = \int_0^t\int_0^t \operatorname{Cov}(B^o_u,B^o_v)\,du\,dv$$
$$= \int_0^t\int_0^t [\min(u,v) - uv]\,du\,dv = t^3/3 - \int_0^t\int_0^t uv\,du\,dv$$
$$= t^3/3 - \Big(u^2/2\big|_0^t\Big)^2 = t^3/3 - t^4/4 = t^3(4-3t)/12 > 0, \quad t \in (0,1].$$
Both Brownian motion and Brownian bridge can be defined in higher dimensions.

Definition 2.16. Define Brownian motion and Brownian bridge in $\mathbb{R}^d$, $d \ge 2$, as the random elements
$$W(t) = (W_1(t), W_2(t), \ldots, W_d(t)), \quad t \ge 0,$$
and
$$B^o(t) = (B^o_1(t), B^o_2(t), \ldots, B^o_d(t)), \quad t \in [0,1],$$
respectively, where the $W_i(t)$ are independent standard Brownian motions and the $B^o_i(t)$ are independent Brownian bridges.

It is possible to introduce further relationships between the components by linearly transforming $W$. The Wiener process on the line and in the plane (i.e. $d = 1, 2$) is recurrent, i.e. it infinitely often returns to any open set. Due to the increase in dimension, the Wiener process is transient for $d \ge 3$, i.e. it returns to any bounded set at most a finite number of times.
2.4. Sample path properties of Brownian motion
Definition 2.17. Let $f$ be a function defined on $[0,T]$ for $T > 0$. The quadratic variation of $f$ is defined as
$$[f,f](t) = [f,f]([0,t]) = \lim_n \sum_i |f(t^n_i) - f(t^n_{i-1})|^2,$$
where for each $n$, $\{t^n_i\}$ is a partition of $[0,t]$ with $t \in (0,T]$, and the limit above is taken over all partitions whose maximum mesh vanishes as $n \to \infty$. If 2 is replaced by $p$, we obtain the definition of the $p$-variation, and of the total variation if $p = 1$.
The covariation $[f,g](t)$ of two functions $f$ and $g$ is defined similarly. The following theorem shows that the Wiener process has locally finite quadratic variation and that its quadratic variation is deterministic. If the function $f$ is Lipschitz (e.g. if it has a bounded derivative), then its quadratic variation vanishes, while the linear variation is finite. One can check that the linear variation of a differentiable function is given by $\int_0^T |f'_s|\,ds$. The situation with the Wiener process is quite different.
Theorem 2.18 (Quadratic variation). The quadratic variation of the Wiener process over $[0,T]$ equals $T$ for all $T > 0$, i.e.
$$\sum_{i=1}^n (W_{t^n_i} - W_{t^n_{i-1}})^2 \to T \quad \text{as } \max_{i\le n}(t^n_i - t^n_{i-1}) \to 0,$$
where the limit is in mean square (i.e. in $L^2$). Further, if the partitions are successive refinements, then the convergence holds almost surely.
Proof. For a given partition $0 = t_0 \le \cdots \le t_n = T$ of $[0,T]$, denote
$$\zeta_n = \sum_{i=1}^n (W_{t^n_i} - W_{t^n_{i-1}})^2.$$
We aim to show that $\mathbf{E}(\zeta_n - T)^2 \to 0$ as $n \to \infty$. First, confirm that the expectation of $\zeta_n$ is exactly $T$. Indeed,
$$\mathbf{E}\zeta_n = \sum_{i=1}^n \mathbf{E}(W_{t^n_i} - W_{t^n_{i-1}})^2 = \sum_{i=1}^n (t^n_i - t^n_{i-1}) = T.$$
Thus, it remains to check that the variance of $\zeta_n$ converges to zero. Denote by $\eta$ a standard normal random variable. Since $\operatorname{Var}(\eta^2) = 2$ and $W_t - W_s$ is distributed as $\eta\sqrt{t-s}$,
$$\operatorname{Var}\zeta_n = \sum_{i=1}^n \operatorname{Var}(W_{t^n_i} - W_{t^n_{i-1}})^2 = 2\sum_{i=1}^n (t^n_i - t^n_{i-1})^2 \le 2\max_{i\le n}(t^n_i - t^n_{i-1})\,T \to 0.$$
Let us show the almost sure convergence for the case of the dyadic partitions $\{kT2^{-n}\}$. The monotone convergence theorem implies that
$$\mathbf{E}\Big(\sum_{n=1}^\infty (\zeta_n - T)^2\Big) = \sum_{n=1}^\infty \mathbf{E}(\zeta_n - T)^2 = \sum_{n=1}^\infty 2\cdot 2^n(T2^{-n})^2 < \infty.$$
Therefore, $\sum(\zeta_n - T)^2$ is finite almost surely, i.e. the common term of this series a.s. converges to zero. The proof for general partitions relies on the fact that
$$\mathbf{E}(\zeta_{n-1} \mid \zeta_n, \zeta_{n+1}, \ldots) = \zeta_n,$$
i.e. $\zeta_n$ is a reversed martingale, and then uses the a.s. convergence theorem for martingales.
The above result is often written as $[W,W]([0,T]) = T$.
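The convergence of the dyadic sums to $T$ is easy to observe numerically; in the following sketch (all parameters are illustrative choices) each refinement halves the mesh, and the sums stabilise near $T$:

    import numpy as np

    rng = np.random.default_rng(2)
    T, n_max = 1.5, 16
    n_fine = 2 ** n_max
    # simulate W on the finest dyadic grid
    W = np.concatenate(([0.0],
        np.cumsum(rng.normal(0.0, np.sqrt(T / n_fine), n_fine))))

    for n in range(6, n_max + 1, 2):
        step = 2 ** (n_max - n)          # coarsen to 2^n intervals
        incr = np.diff(W[::step])
        print(n, (incr ** 2).sum())      # approaches T = 1.5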
Theorem 2.19 (Linear variation). The Wiener process has almost surely unbounded variation on every interval of positive length.

Proof. Without loss of generality, consider the interval $[0,T]$. Since
$$\sum_{i=1}^n |W_{t^n_i} - W_{t^n_{i-1}}|^2 \le \max_{i\le n}|W_{t^n_i} - W_{t^n_{i-1}}|\sum_{i=1}^n |W_{t^n_i} - W_{t^n_{i-1}}|,$$
and, by the continuity of the sample functions of $W_t$,
$$\max_{i\le n}|W_{t^n_i} - W_{t^n_{i-1}}| \to 0 \quad \text{as } n \to \infty,$$
the fact that the quadratic variation equals $T > 0$ implies
$$\limsup_n \sum_{i=1}^n |W_{t^n_i} - W_{t^n_{i-1}}| = \infty,$$
hence the claim.
The above fact tells us that we cannot define $\int f(t)\,dW_t$ by path-wise integration using the common Riemann-Stieltjes approach. Indeed, the trajectories of the Brownian motion are very rough: they are not differentiable at any point, as shown in the following theorem.
Theorem 2.20 (Non-differentiability of trajectories). The Wiener process is a.s. not differentiable at any fixed $T \ge 0$.

Proof. Let us show first that the non-differentiability holds for $T = 0$. Put
$$A_n = \Big\{\frac{|W_t|}{t} > n \text{ for some } t \in [0, 1/n^2]\Big\}.$$
Then
$$\mathbf{P}(A_n) \ge \mathbf{P}\Big\{\frac{|W_{1/n^4}|}{1/n^4} > n\Big\} = \mathbf{P}\{|W_1| > 1/n\} \to 1 \quad \text{as } n \to \infty,$$
where the scaling property has been used to replace $n^2W_{1/n^4}$ with the identically distributed $W_1$. The sequence of events $A_n$ is contracting, therefore $\mathbf{P}(\bigcap_n A_n) = \lim\mathbf{P}(A_n) = 1$, hence the claim.

For $T > 0$ we use the fact that $W_T(s) = W_{T+s} - W_T$, $s \ge 0$, is a Brownian motion, which by the above derivation is not differentiable at $s = 0$; hence $W_t$ is not differentiable at $t = T$.
The above result establishes the non-differentiability at any fixed point only. It is possible to show the even stronger statement that $W_t$ is nowhere differentiable with probability one.
2.5. Supremum of the Wiener process
In this section we consider the distribution of the supremum of the Wiener process,
$$M_t = \sup_{0\le s\le t} W_s, \quad t > 0.$$
For any $x > 0$, the following relation holds:
$$\mathbf{P}\{M_t \ge x\} = \mathbf{P}\{T_x \le t\},$$
where $T_x = \inf\{t > 0 : W_t = x\}$ is called the hitting time of $x$. Note that both $T_x$ and $M_t$ are random variables.
The random variable $T_x$ is also a stopping time. Indeed, $\{T_x \le t\}$ is determined solely by $W_s$, $s \in [0,t]$, and so is $\mathcal{F}_t$-measurable. By the strong Markov property,
$$\widetilde W_{T_x}(t) = W_{t+T_x} - W_{T_x}$$
is also a Brownian motion, independent of $W_t$, $t \le T_x$. Relying on this property, we are able to prove the following very important fact, which provides the distribution of the supremum of a Brownian motion. It also allows one to analyse distributional properties of complicated functionals.
Theorem 2.21 (Reflection principle). If $W$ is the Wiener process and $\tau$ is a stopping time, then $W$ has the same distribution as the reflected process
$$\widetilde W_t = W_{t\wedge\tau} - (W_t - W_{t\wedge\tau}), \quad t \ge 0.$$

Proof. It suffices to assume that $\tau$ is a.s. finite. Define $B_t = W_{t\wedge\tau}$ and $B'_t = W_{t+\tau} - W_\tau$. Then $B'$ is a Wiener process independent of $(\tau, B)$. Since $-B'_t$ has the same distribution as $B'_t$, we deduce that the triplets $(\tau, B, B')$ and $(\tau, B, -B')$ share the same distribution. It remains to note that
$$W_t = B_t + B'_{(t-\tau)^+}, \qquad \widetilde W_t = B_t - B'_{(t-\tau)^+}.$$
Theorem 2.22 (Distribution of supremum). Given a Brownian motion, for every $x \ge 0$ we have
$$\mathbf{P}\{M_t \ge x\} = 2\,\mathbf{P}\{W_t \ge x\} = \mathbf{P}\{|W_t| \ge x\} = 2[1 - \Phi(x/\sqrt t)].$$

Proof. First,
$$\mathbf{P}\{W_t \ge x\} = \mathbf{P}\{W_t \ge x, M_t \ge x\} = \mathbf{P}\{W_t \ge x \mid M_t \ge x\}\,\mathbf{P}\{M_t \ge x\} = \mathbf{P}\{W_t \ge x \mid T_x \le t\}\,\mathbf{P}\{M_t \ge x\}.$$
We have that $W_{T_x}(t) = W_{T_x+t} - W_{T_x}$ is a Brownian motion independent of $W_t$, $t \le T_x$. Note also that $W_{T_x} = x$ a.s. Then, conditioning on the event $\{T_x \le t\}$, we obtain
$$\mathbf{P}\{W_t \ge x \mid T_x \le t\} = \mathbf{P}\{W_{T_x+(t-T_x)} - x \ge 0 \mid T_x \le t\} = \mathbf{P}\{W_{T_x}(t) \ge 0\} = \frac12.$$
Hence, putting the pieces together, we obtain
$$\mathbf{P}\{W_t \ge x\} = \frac12\,\mathbf{P}\{M_t \ge x\}.$$
The further identities are obvious.
Note that, for each $t$, the random variables $M_t$ and $|W_t|$ have the same law. Of course, the processes $M$ and $|W|$ do not have the same law ($M$ is increasing and $|W|$ is not).
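The identity $\mathbf{P}\{M_t \ge x\} = 2[1-\Phi(x/\sqrt t)]$ is convenient to verify by simulation; in the sketch below (parameters chosen freely, numpy and scipy assumed available) the empirical tail of the running maximum is compared with the theoretical value. Note that the discrete-grid maximum slightly underestimates the true supremum, so the empirical number sits a little below the theoretical one for coarse grids:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    t, n_steps, n_paths, x = 1.0, 2_000, 50_000, 1.2
    dt = t / n_steps

    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
    M = W.max(axis=1)                      # running maximum at time t

    print((M >= x).mean())                 # empirical P{M_t >= x}
    print(2 * (1 - norm.cdf(x / np.sqrt(t))))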
Corollary 2.23. For any $x$ and $t$ positive we have
$$\mathbf{P}\Big\{\inf_{0\le s\le t} W_s \le -x\Big\} = 2\,\mathbf{P}\{W_t \le -x\} = 2\,\mathbf{P}\{W_t \ge x\} = 2[1 - \Phi(x/\sqrt t)].$$
Proof. The claim follows readily from the previous theorem and the symmetry property of W.
Example 2.24. Let us calculate the probability that Brownian motion stays below zero for all $t \in [0,1]$, i.e. we are interested in $\mathbf{P}\{W_t \le 0,\ t \in [0,1]\}$. Then
$$\mathbf{P}\{W_t \le 0,\ t \in [0,1]\} = \mathbf{P}\{M_1 \le 0\}.$$
By the law of the supremum (maximum) of Brownian motion,
$$\mathbf{P}\{M_1 \le 0\} = 1 - \mathbf{P}\{M_1 > 0\} = 1 - 2\,\mathbf{P}\{W_1 > 0\} = 0,$$
implying $\mathbf{P}\{W_t \le 0,\ t \in [0,1]\} = 0$.
Example 2.25. It is easy to see that $M_t$ is almost surely positive for each $t > 0$. By a symmetry argument, $\inf_{0\le s\le t} W_s$ is almost surely negative for each $t > 0$. Thus, starting at $x = 0$ at time $t = 0$, (almost surely) any trajectory of Brownian motion crosses the time axis infinitely many times in $(0,t]$ for any $t > 0$.

Thus, the zero set $\{t \ge 0 : W_t = 0\}$ of the Brownian motion a.s. does not have isolated points. Now we show that the Lebesgue measure $\mathcal{L}(\cdot)$ of any level set (and also of the zero set) vanishes almost surely.
Proposition 2.26 (Level sets). If $u \in \mathbb{R}$, then $\mathcal{L}(\{t : W_t = u,\ t \in [0,1]\}) = 0$ almost surely.

Proof. By Fubini's theorem for the process $\mathbf{1}_{\{W_t=u\}}$ we get
$$\mathbf{E}\mathcal{L}(\{t : W_t = u,\ t \in [0,1]\}) = \mathbf{E}\int_0^1 \mathbf{1}_{\{W_t=u\}}\,dt = \int_0^1 \mathbf{E}\mathbf{1}_{\{W_t=u\}}\,dt = \int_0^1 \mathbf{P}\{W_t = u\}\,dt = 0.$$
It remains to note that the Lebesgue measure is non-negative.
The zero set $\{t \ge 0 : W_t = 0\}$ is actually an uncountable set of fractal type.
2.6. Arcsine law
Recall that the hitting time is defined by
$$T_x = \inf\{t > 0 : W_t = x\}, \quad x \in \mathbb{R}_+.$$
Let us define another quantity,
$$T^o_x = \inf\{t > 0 : W_t > x\}.$$

Lemma 2.27. For all $x \in \mathbb{R}_+$, $T_x = T^o_x$ a.s.
Proof. Clearly $T^o_x \ge T_x$ a.s. Since $B_s = W_{T_x+s} - W_{T_x}$ has the same distribution as the original Wiener process, $\sup_{s\in[0,\varepsilon]} B_s > 0$ a.s. for each $\varepsilon > 0$, whence $T^o_x \le T_x + \varepsilon$ a.s. Thus, $T_x = T^o_x$ a.s.
Since $M_t$ is non-decreasing, it is possible to define its (generalised) inverse as $x \mapsto \inf\{t \ge 0 : M_t \ge x\}$. Since
$$\inf\{t > 0 : W_t \ge x\} = \inf\{t > 0 : \sup_{0\le s\le t} W_s \ge x\} = \inf\{t > 0 : M_t \ge x\},$$
we see that $T_x$ is the inverse function of $M_t$.
Furthermore,
$$\mathbf{P}\{T_x \le t\} = 2[1 - \Phi(x/\sqrt t)], \quad t > 0.$$
By differentiation, we obtain that (for each $x \ne 0$) the density of $T_x$ is
$$\frac{|x|}{\sqrt{2\pi}}\,t^{-3/2}\exp(-x^2/(2t)),$$
called the inverse Gaussian distribution. If $x \ne 0$ then $\mathbf{E}T_x = \infty$, since
$$\mathbf{E}T_x = \frac{|x|}{\sqrt{2\pi}}\int_0^\infty t^{-1/2}\exp(-x^2/(2t))\,dt = \infty.$$
Note that $\mathbf{P}\{T_x < \infty\} = 1$, i.e. the point $x$ is visited with probability one. On the other hand, the expected time for this to happen, $\mathbf{E}T_x$, is infinite if $x \ne 0$. This is reminiscent of null recurrent random walks.
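The law of $T_x$ can be checked against $2[1-\Phi(x/\sqrt t)]$ by recording first passage times of simulated paths on a discrete grid; a minimal sketch with freely chosen parameters (numpy and scipy assumed):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    x, horizon, n_steps, n_paths = 1.0, 4.0, 4_000, 20_000
    dt = horizon / n_steps

    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
    hit = (W >= x).any(axis=1)                 # crossed level x by the horizon
    first = np.argmax(W >= x, axis=1) * dt     # first grid time above x

    t = 2.0
    print(np.mean(hit & (first <= t)))         # empirical P{T_x <= t}
    print(2 * (1 - norm.cdf(x / np.sqrt(t))))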
For any $x \ne 0$, the probability that the Wiener process started at $x$ has at least one zero in the time interval $(0,t)$ is given by
$$\frac{|x|}{\sqrt{2\pi}}\int_0^t u^{-3/2}\exp(-x^2/(2u))\,du.$$
Indeed, assume that $x < 0$. The Wiener process starting at $x$ can be written as $W_x(t) = W_t + x$, so
$$\mathbf{P}\{W_x(s) \text{ has a zero between } 0 \text{ and } t\} = \mathbf{P}\Big\{\sup_{0\le s\le t} W_x(s) \ge 0\Big\} = \mathbf{P}\Big\{\sup_{0\le s\le t} W_s + x \ge 0\Big\}$$
$$= \mathbf{P}\Big\{\sup_{0\le s\le t} W_s \ge -x\Big\} = 2\,\mathbf{P}\{W_t \ge -x\} = \mathbf{P}\{T_{-x} \le t\} = \frac{|x|}{\sqrt{2\pi}}\int_0^t u^{-3/2}\exp(-x^2/(2u))\,du.$$
For $x > 0$ the proof is similar and based on the symmetry argument.
Theorem 2.28 (Arcsine law). The probability that the Wiener process on $[0,1]$ attains its last zero in $[0,1]$ before time $t > 0$ is
$$\frac{2}{\pi}\arcsin\sqrt t.$$
Proof. Let $U$ be uniformly distributed on $(0,2\pi)$ and let $\eta$ and $\zeta$ be independent standard Gaussian random variables. Then
$$\mathbf{P}\{W \text{ has last zero in } [0,1] \text{ before time } t\} = \mathbf{P}\{\sup\{s \in [0,1] : W_s = 0\} < t\}$$
$$= \mathbf{P}\Big\{\sup_{t\le s\le 1} W_s < 0\Big\} + \mathbf{P}\Big\{\inf_{t\le s\le 1} W_s > 0\Big\} = 2\,\mathbf{P}\Big\{\inf_{t\le s\le 1} W_s > 0\Big\}$$
$$= 2\,\mathbf{P}\Big\{\inf_{t\le s\le 1}[W_s - W_t] > -W_t,\ W_t > 0\Big\} = 2\,\mathbf{P}\Big\{\sup_{t\le s\le 1}[W_s - W_t] < W_t,\ W_t > 0\Big\}$$
$$= 2\,\mathbf{P}\{|W_1 - W_t| < W_t,\ W_t > 0\} = \mathbf{P}\{|W_1 - W_t| < |W_t|\}$$
$$= \mathbf{P}\{\eta^2(1-t) \le \zeta^2 t\} = \mathbf{P}\Big\{\frac{\eta^2}{\eta^2 + \zeta^2} \le t\Big\} = \mathbf{P}\{\sin^2 U \le t\} = \mathbf{P}\{|\sin U| \le \sqrt t\} = \frac{2}{\pi}\arcsin\sqrt t.$$

We recall that if $U$ is uniform on $(0,2\pi)$ then $\xi = \sin^2 U$ is arcsine distributed.
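A direct empirical check of the arcsine law: simulate paths on a fine grid, record the last sign change before time 1, and compare the empirical distribution function with $(2/\pi)\arcsin\sqrt t$. The parameter choices below are illustrative:

    import numpy as np

    rng = np.random.default_rng(5)
    n_steps, n_paths = 2_000, 20_000
    dt = 1.0 / n_steps

    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

    # last grid point before a sign change approximates the last zero
    sc = W[:, :-1] * W[:, 1:] <= 0
    last_zero = np.where(sc.any(axis=1),
                         sc.shape[1] - 1 - np.argmax(sc[:, ::-1], axis=1),
                         0) * dt

    for t in (0.25, 0.5, 0.75):
        print(t, (last_zero < t).mean(), 2 / np.pi * np.arcsin(np.sqrt(t)))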
Theorem 2.29 (Arcsine law). The probability that the Wiener process has no zero in the time interval $(a,b)$ with $0 < a < b < \infty$ is
$$\frac{2}{\pi}\arcsin\sqrt{a/b}.$$
The probability that the Wiener process has at least one zero in the time interval $(a,b)$ with $0 < a < b < \infty$ is
$$\frac{2}{\pi}\arccos\sqrt{a/b}.$$
Theorem 2.30. If
$$\gamma_t = \sup\{s \le t : W_s = 0\} = \text{last zero before } t,$$
$$\beta_t = \inf\{s \ge t : W_s = 0\} = \text{first zero after } t,$$
for $t > 0$, then
$$\mathbf{P}\{\gamma_t \le x\} = \frac{2}{\pi}\arcsin\sqrt{x/t}, \quad 0 < x < t,$$
$$\mathbf{P}\{\beta_t \ge y\} = \frac{2}{\pi}\arcsin\sqrt{t/y}, \quad 0 < t < y,$$
$$\mathbf{P}\{\gamma_t \le x,\ \beta_t \ge y\} = \frac{2}{\pi}\arcsin\sqrt{x/y}, \quad 0 < x < y.$$
Proof. Writing $\mathbf{P}\{\gamma_t \le x\} = \mathbf{P}\{W \text{ has no zero in } (x,t)\}$ and
$$\mathbf{P}\{\gamma_t \le x,\ \beta_t \ge y\} = \mathbf{P}\{W \text{ has no zero in } (x,y)\},$$
it is easy to see that the proof follows immediately from the above results.
3. Integration with respect to Brownian motion
3.1. Itô's stochastic integral
Stochastic integration plays the same important role as integration does in measure theory. There are several common points with Riemann integration; however, there are also a few important differences. Since the Brownian motion has unbounded variation, the integral $\int f(t)\,dW_t$ cannot be defined in the usual way, as the limit of Riemann sums, even if $f$ is a deterministic function. In the following we are also interested in the case where $f$ is a random function (process). The first thing we must keep in mind to understand the stochastic integral is that we will not try to define this integral for every path separately; instead we think of $W_t$ as an element of the space of square integrable functions, and the integral is to be defined as an element of the same space.

To see that the classical integration theory fails when applied to integration with respect to Brownian motion, we study the stochastic integral $\int_0^1 W_t\,dW_t$.
Recall that the quadratic variation of a stochastic process $\xi_t$ is defined as
$$[\xi,\xi](0,T) = \lim \sum_{i=0}^{n-1}(\xi_{t_{i+1}} - \xi_{t_i})^2,$$
where $0 = t_0 \le t_1 \le \cdots \le t_n = T$ and $\max|t_{i+1} - t_i| \to 0$. In the case of the Wiener process, $[W,W](0,T) = T$.
Recall also that the linear variation of the Wiener process is infinite, so that it is not possible to define the above integral pathwise. Indeed, paralleling the Riemann integration, we consider first an approximating sum
$$\sum_{i=0}^{n-1} W_{t_i}[W_{t_{i+1}} - W_{t_i}], \quad n \ge 2,$$
with $t_1,\ldots,t_n$ being a partition of $[0,1]$. By the properties of the Wiener process we get
$$\sum_{i=0}^{n-1} W_{t_i}[W_{t_{i+1}} - W_{t_i}] = \frac12\sum_{i=0}^{n-1}[W^2_{t_{i+1}} - W^2_{t_i}] - \frac12\sum_{i=0}^{n-1}[W_{t_{i+1}} - W_{t_i}]^2$$
$$= W_1^2/2 - \frac12\sum_{i=0}^{n-1}[W_{t_{i+1}} - W_{t_i}]^2 \to W_1^2/2 - [W,W](0,1)/2 \quad (n \to \infty)$$
$$= W_1^2/2 - 1/2,$$
where the limit holds in mean square, and almost surely for nested partitions. Similar calculations imply
$$\sum_{i=0}^{n-1} W_{t_{i+1}}[W_{t_{i+1}} - W_{t_i}] = \frac12\sum_{i=0}^{n-1}[W^2_{t_{i+1}} - W^2_{t_i}] + \frac12\sum_{i=0}^{n-1}[W_{t_{i+1}} - W_{t_i}]^2$$
$$= W_1^2/2 + \frac12\sum_{i=0}^{n-1}[W_{t_{i+1}} - W_{t_i}]^2 \to W_1^2/2 + [W,W](0,1)/2 \quad \text{as } n \to \infty$$
$$= W_1^2/2 + 1/2,$$
so we obtain two different limits, i.e. we cannot use the classical way to define a stochastic integral.
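This phenomenon is easy to reproduce numerically: evaluating the integrand at the left endpoints gives $W_1^2/2 - 1/2$, at the right endpoints $W_1^2/2 + 1/2$. A minimal sketch (the grid size and seed are arbitrary, numpy assumed):

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100_000
    dW = rng.normal(0.0, np.sqrt(1.0 / n), n)
    W = np.concatenate(([0.0], np.cumsum(dW)))      # W_0, ..., W_1 on a grid

    left = np.sum(W[:-1] * np.diff(W))              # Ito-type sum
    right = np.sum(W[1:] * np.diff(W))              # anticipating sum

    print(left, W[-1] ** 2 / 2 - 0.5)               # these two agree closely
    print(right, W[-1] ** 2 / 2 + 0.5)              # and so do these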
However, it is easy to define the integral for a constant $a \in \mathbb{R}$ as
$$\int_0^T a\,dW_t = a(W_T - W_0) = aW_T.$$
Define the natural filtration generated by the Wiener process as
$$\mathcal{F}_t = \sigma(W_s, s \le t), \quad t \ge 0.$$
Similarly to the Lebesgue integration, the stochastic integral is first defined for simple random functions.

Definition 3.1. Let $f(t) = f(t,\omega)$ be a random function defined on $[0,\infty)\times\Omega$. We say that $f$ is a step random function if it can be written as
$$f(t) = \sum_{i=0}^{n-1}\eta_i\mathbf{1}_{[t_i,t_{i+1})} \tag{3.1}$$
such that for any $i = 0,\ldots,n-1$, $\eta_i$ is a square integrable and $\mathcal{F}_{t_i}$-measurable random variable and $0 = t_0 < t_1 < \ldots < t_n < \infty$ is a partition of $[0,\infty)$ (we suppress $t$ in $\mathbf{1}_{[t_i,t_{i+1})}$; the rigorous notation is $\mathbf{1}_{[t_i,t_{i+1})}(t)$). We denote this set of functions by $H_0$.
Definition 3.2. The stochastic integral of a random function $f \in H_0$ with respect to $(W_t)_{t\ge0}$ is defined by
$$I(f) = \int_0^\infty f(t)\,dW_t = \sum_{i=0}^{n-1}\eta_i[W_{t_{i+1}} - W_{t_i}].$$
It is easy to see that this definition is consistent, i.e. the value of the integral does not change if we change the partition without altering $f$.
Note that $I(f)$ is a random variable. Since $\eta_i$ is $\mathcal{F}_{t_i}$-measurable, the independence of increments of Brownian motion implies that
$$\operatorname{Cov}(\eta_i, W_{t_{i+1}} - W_{t_i}) = \mathbf{E}\eta_i(W_{t_{i+1}} - W_{t_i}) - \mathbf{E}\eta_i\,\mathbf{E}(W_{t_{i+1}} - W_{t_i})$$
$$= \mathbf{E}\big[\mathbf{E}[\eta_i(W_{t_{i+1}} - W_{t_i}) \mid \mathcal{F}_{t_i}]\big] = \mathbf{E}\big[\eta_i\,\mathbf{E}[W_{t_{i+1}} - W_{t_i}]\big] = 0,$$
so we see that the expectation of $I(f)$ vanishes.
Remark 3.3. The fact that $\mathbf{E}I(f) = 0$ for all step functions of type (3.1) is important for the construction of the integral. If $\eta_i$ is not $\mathcal{F}_{t_i}$-measurable, this fact no longer holds. For instance, consider $f(t) = W_1\mathbf{1}_{[0,1)}$. Then with any reasonable definition of an integral one should have
$$\int_0^1 W_1\,dW_t = W_1^2,$$
which does not have zero expectation.
Proposition 3.4 (Itô's isometry). If $f \in H_0$ then
$$\mathbf{E}I(f)^2 = \int_0^\infty \mathbf{E}f(t)^2\,dt < \infty.$$

Proof. It follows from the definition that
$$f(t)^2 = \sum_{i=0}^{n-1}\eta_i^2\mathbf{1}_{[t_i,t_{i+1})},$$
and
$$I(f)^2 = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1}\eta_i\eta_j[W_{t_{i+1}} - W_{t_i}][W_{t_{j+1}} - W_{t_j}].$$
Hence
$$\int_0^\infty \mathbf{E}f(t)^2\,dt = \sum_{i=0}^{n-1}\mathbf{E}\eta_i^2(t_{i+1} - t_i).$$
We have further
$$\mathbf{E}\big[\eta_j^2[W_{t_{j+1}} - W_{t_j}]^2\big] = \mathbf{E}\eta_j^2(t_{j+1} - t_j).$$
Since $\eta_j\eta_k[W_{t_{j+1}} - W_{t_j}]$ and $[W_{t_{k+1}} - W_{t_k}]$ are independent for $k \ne j$ (take $k > j$), the cross terms vanish and
$$\mathbf{E}I(f)^2 = \sum_{i=0}^{n-1}\mathbf{E}\eta_i^2(t_{i+1} - t_i).$$
So defined, the stochastic integral $I$ maps the set $H_0$ into the family $L^2(\Omega)$ of square integrable random variables with expectation zero. Note that the set $H_0$ is equipped with the $L^2(\Omega\times[0,T])$-metric. It is possible to restrict the stochastic integral defined for step functions to a finite interval by setting
$$\int_0^T f(t)\,dW_t = I(f\mathbf{1}_{[0,T)})$$
for any $T > 0$.
Remark 3.5. The functions in $H_0$ and their corresponding integrals have a straightforward economic interpretation. Let $W_t$ be the gain at time $t$ for a unit investment. By $\eta_i$ we denote the amount invested at time $t_i$. This amount can only be determined on the basis of the knowledge of the past and current gains, so it has to be determined by the history until time $t_i$, and so is $\mathcal{F}_{t_i}$-measurable. Then $\eta_i(W_{t_{i+1}} - W_{t_i})$ is the gain on the investment in period $i$ and $I(f)$ is the total gain.
The next step is to extend the definition of the stochastic integral to a larger class of (random) integrands. To this end, we note that $I$ is a linear and isometric (Itô's isometry) operator from $H_0$ into $L^2(\Omega)$. Since $H_0$ is a subset of the Banach space $L^2(\Omega\times[0,\infty))$ of square integrable measurable processes, a classical result from functional analysis tells us that $I$ admits a unique isometric extension from the closure $\bar H_0$ of $H_0$ in $L^2(\Omega\times[0,\infty))$ into $L^2(\Omega)$. It remains to characterise $\bar H_0$. We shall need the following definition.
Definition 3.6. A stochastic process $\xi(t,\omega)$, $t \ge 0$, is called progressively measurable if for any $t \ge 0$, the map $(s,\omega)\mapsto\xi(s,\omega)$ is a measurable function from $[0,t]\times\Omega$ (equipped with the product $\sigma$-algebra $\mathcal{B}[0,t]\otimes\mathcal{F}_t$) to $\mathbb{R}$.

A progressively measurable process is $(\mathcal{F}_t)_{t\ge0}$-adapted, i.e. $\xi_t$ is $\mathcal{F}_t$-measurable for all $t$. We have the following partial converse result.
Proposition 3.7. If $\xi_t$, $t \ge 0$, is $(\mathcal{F}_t)_{t\ge0}$-adapted (i.e. $\xi_t$ is $\mathcal{F}_t$-measurable for all $t \ge 0$) and the trajectories of $\xi$ are right-continuous (or else left-continuous), then $\xi$ is progressively measurable.

Proof. We prove it for the right-continuous case. For $n \ge 1$ define $\xi^n_s = \xi_{k2^{-n}t}$ in case $(k-1)2^{-n}t < s \le k2^{-n}t$, and also let $\xi^n_0 = \xi_0$. Then $\xi^n$ is measurable as a map from $[0,t]\times\Omega$ with the product $\sigma$-algebra into the real line. Furthermore, $\xi^n$ converges to $\xi$, so that the limit is measurable too; see Theorem 1.15 and note that the real line is a metric space.
Observe that any function $f \in H_0$ is progressively measurable. Recall that the set $\bar H_0$ is the family of all stochastic processes $f(t)$, $t \ge 0$, such that
$$\mathbf{E}\int_0^T f(t)^2\,dt < \infty, \tag{3.2}$$
which can be approximated by step random processes $f_n \in H_0$ in the following sense:
$$\lim_n \mathbf{E}\int_0^\infty (f(t) - f_n(t))^2\,dt = 0.$$
Next, we denote by $\mathcal{H}[0,\infty)$ the Banach space of square integrable progressively measurable stochastic processes, i.e. $\int_0^\infty f^2(t)\,dt$ is assumed to have a finite expectation. The following result says that each function from $\mathcal{H}[0,\infty)$ also belongs to $\bar H_0$.

Proposition 3.8. For any $f \in \mathcal{H}[0,\infty)$, there exists a sequence $f_n \in H_0$, $n \ge 1$, such that
$$\lim_n \mathbf{E}\int_0^\infty (f(t) - f_n(t))^2\,dt = 0.$$
Proof. It suffices to prove the claim for $f$ such that $f(t) = 0$ for $t \ge T$, where $T$ is fixed. Indeed, the dominated convergence theorem yields that
$$\mathbf{E}\int_0^\infty (f(t) - f(t)\mathbf{1}_{\{t\le n\}})^2\,dt = \mathbf{E}\int_n^\infty f^2(t)\,dt \to 0 \quad \text{as } n \to \infty.$$
It is also convenient to assume that $f(t) = 0$ for $t \le 0$. Now we recall the well-known fact from integration theory that an $L^2$-function is continuous in $L^2$. More precisely, if $h \in L^2([0,T])$ and $h(t) = 0$ outside $[0,T]$, then
$$\lim_{a\to0}\int_{-T}^T (h(t+a) - h(t))^2\,dt = 0.$$
This is seen by approximating an $L^2$-function by continuous functions. This and the inequality
$$\int_{-T}^T (f(t+a) - f(t))^2\,dt \le 2\Big(\int_{-T}^T f^2(t+a)\,dt + \int_{-T}^T f^2(t)\,dt\Big) \le 4\int_0^T f^2(t)\,dt,$$
along with the dominated convergence theorem, imply that
$$\lim_{a\to0}\mathbf{E}\int_{-T}^T (f(t+a) - f(t))^2\,dt = 0. \tag{3.3}$$
Now let, for any $t \in [0,T]$,
$$\kappa_n(t) = 2^{-n}\lfloor 2^n t\rfloor.$$
By means of the change of variables $t + s = u$, $t = v$, we get
$$\int_0^1 \mathbf{E}\int_0^T (f(\kappa_n(t+s) - s) - f(t))^2\,dt\,ds = \int_0^{T+1}\mathbf{E}\int_{(u-1)\vee0}^{u\wedge T} (f(\kappa_n(u) - u + v) - f(v))^2\,dv\,du.$$
Owing to (3.3), the last expectation tends to zero uniformly with respect to $u$, since $0 \le u - \kappa_n(u) \le 2^{-n}$. It follows that there exists a sequence $n(k)$ such that for almost every $s \in [0,1]$
$$\lim_k \mathbf{E}\int_0^T (f(\kappa_{n(k)}(t+s) - s) - f(t))^2\,dt = 0. \tag{3.4}$$
Fix any $s$ for which (3.4) holds. Then (3.4) and the inequality $a^2 \le 2b^2 + 2(a-b)^2$ show that $f^2(\kappa_{n(k)}(t+s) - s)$ is integrable at least for all large $k$. The proof is then completed by observing that for any fixed $s$, $f(\kappa_{n(k)}(t+s) - s) \in H_0$.
Thus, for each $f \in \mathcal{H}[0,\infty)$ the stochastic integral is defined as the limit of integrals of functions from $H_0$ that approximate $f$. We then write
$$I(f) = \int_0^\infty f(t)\,dW_t.$$
If $g = \mathbf{1}_{(0,T]}f$, define
$$I(g) = \int_0^\infty \mathbf{1}_{(0,T]}f(t)\,dW_t = \int_0^T f(t)\,dW_t.$$
From the uniqueness of the extension of $I$ from $H_0$ to $\mathcal{H}[0,\infty)$, we have the following.

Theorem 3.9. For any $f \in \mathcal{H}[0,\infty)$ the stochastic integral $I(f)$ exists and is a square integrable, almost surely unique random variable satisfying
$$\mathbf{E}I(f)^2 = \mathbf{E}\int_0^\infty f(t)^2\,dt < \infty.$$
It is important to know that every almost surely continuous random function $f(t)$, $t \ge 0$, satisfying (3.2) and such that $f(t)$ is, for all $t$, measurable with respect to the $\sigma$-algebra generated by the random variables $W_s$, $s \le t$, is integrable with respect to the Wiener process.
Example 3.10. The integral $\int_0^1 t\,dW_t$ is a random variable with mean zero. Further,
$$\operatorname{Var}\int_0^1 t\,dW_t = \mathbf{E}\Big(\int_0^1 t\,dW_t\Big)^2 = \int_0^1 t^2\,dt = 1/3.$$
The same result can be obtained by a direct calculation of the variance using the covariance function of the Wiener process.
Example 3.11. As shown by a direct calculation of integral sums,
$$\int_0^T W_t\,dW_t = \frac{W_T^2}{2} - \frac{T}{2}.$$
Note that the expectation of the integral is zero and the variance is
$$\mathbf{E}\Big(\int_0^T W_t\,dW_t\Big)^2 = \int_0^T \mathbf{E}W_t^2\,dt = \int_0^T t\,dt = T^2/2.$$
This can also be confirmed by a direct calculation of the variance of $(W_T^2 - T)/2$.
3.2. Continuous-time martingales and properties of the stochastic integral
The aim of this part is to establish an important property of the stochastic integral, namely that it is a martingale with respect to the filtration generated by the Brownian motion. We start by giving the definition of continuous-time martingales, which is an extension of the one in discrete time.
Definition 3.12. Let us consider a filtration $\mathcal{F}_t$, $t \ge 0$, on the probability space $(\Omega,\mathcal{F},\mathbf{P})$. A family $M_t$, $t \ge 0$, of random variables is an $\mathcal{F}_t$-martingale if

1. $M_t$ is adapted, i.e. $M_t$ is $\mathcal{F}_t$-measurable for all $t \ge 0$;
2. $M_t$ is integrable, i.e. $\mathbf{E}|M_t| < \infty$ for all $t \ge 0$;
3. for any $s \le t$, $\mathbf{E}[M_t \mid \mathcal{F}_s] = M_s$.

If we replace the equality in the last condition by an inequality, we obtain
a submartingale if, for any $s \le t$, $\mathbf{E}[M_t \mid \mathcal{F}_s] \ge M_s$;
a supermartingale if, for any $s \le t$, $\mathbf{E}[M_t \mid \mathcal{F}_s] \le M_s$.
Remark 3.13. It follows from this definition that, if $M_t$, $t \ge 0$, is a martingale, then $\mathbf{E}[M_t] = \mathbf{E}[M_0]$ for all $t \ge 0$.

Here are some examples of martingales.
Proposition 3.14. If $W_t$, $t \ge 0$, is a Brownian motion with its natural filtration $\mathcal{F}_t = \sigma(W_s, s \le t)$, $t \ge 0$, then the processes $W_t$, $W_t^2 - t$ and $\exp\{\lambda W_t - \frac{\lambda^2}{2}t\}$ for any $\lambda \in \mathbb{R}$ are $\mathcal{F}_t$-martingales.

Proof. We only prove the martingale property of $W_t$, $t \ge 0$. If $s \le t$, then $W_t - W_s$ is independent of the $\sigma$-algebra $\mathcal{F}_s$. Thus
$$\mathbf{E}[W_t - W_s \mid \mathcal{F}_s] = \mathbf{E}[W_t - W_s] = 0.$$
The assertion follows by recalling that $W_s$ is $\mathcal{F}_s$-measurable.
Next, for any $t \ge 0$ and $f \in \mathcal{H}[0,\infty)$ we write $I_t(f) = I(f\mathbf{1}_{[0,t)})$.

Proposition 3.15. For any $f \in \mathcal{H}[0,\infty)$ the process $I_t(f)$, $t \ge 0$, is a martingale. In particular, $\mathbf{E}[I_t(f)] = 0$ for any $t \ge 0$.
Proof. We start by proving the martingale property for a step random function $f \in H_0$. Fix $t$ and without loss of generality assume that $t \in \{t_0, t_1, \ldots, t_n\}$ (otherwise we can artificially refine the partition by adding the point $t$ to it). If $t = t_k$, then
$$\mathbf{1}_{[0,t)}f(s) = \sum_{i=0}^{k-1} f(t_i)\mathbf{1}_{[t_i,t_{i+1})}(s), \qquad I_t(f) = \sum_{i=0}^{k-1} f(t_i)\big(W_{t_{i+1}} - W_{t_i}\big).$$
It follows that $I_t(f)$ is $\mathcal{F}_t$-measurable. Furthermore, if $s \le t$, $s, t \in \{t_0, t_1, \ldots, t_n\}$, $s = t_r$ and $t = t_k$ with $r \le k$, then
$$\mathbf{E}[I_t(f) - I_s(f) \mid \mathcal{F}_s] = \sum_{i=r}^{k-1}\mathbf{E}\big[f(t_i)\,\mathbf{E}[W_{t_{i+1}} - W_{t_i} \mid \mathcal{F}_{t_i}] \mid \mathcal{F}_s\big] = 0.$$
Thus, for any $f \in H_0$, the martingale property of $(I_t(f))_{t\ge0}$ follows. Next, consider a sequence $f_n \in H_0$ such that $f_n \to f$ in $L^2(\Omega\times[0,\infty))$. From the above result, we have for any $0 \le s \le t$,
$$\mathbf{E}[I_t(f_n) \mid \mathcal{F}_s] = I_s(f_n) \quad \text{a.s.} \tag{3.5}$$
Furthermore, from Itô's isometry, we get that
$$\mathbf{E}(I_t(f_n) - I_t(f))^2 = \mathbf{E}\int_0^t (f_n(s) - f(s))^2\,ds \to 0.$$
Taking the conditional expectation is a continuous operator on $L^2(\Omega)$, being a projection operator in $L^2(\Omega)$. Hence, upon passing to the limit in (3.5) in the mean-square sense, we get an equality which shows that $I_t(f)$ is $\mathcal{F}_t$-measurable (as a function almost surely equal to the $\mathcal{F}_t$-measurable function $\mathbf{E}[\,\cdot\mid\mathcal{F}_t]$) and that the martingale property holds. This proves the result. The last claim follows readily from Remark 3.13.
Example 3.16. Note that $\int_0^t W_s\,dW_s = W_t^2/2 - t/2$ is a martingale.

The integral is linear, i.e. $I(af + bg) = aI(f) + bI(g)$. If $f = g$ a.s. for all $s \in [0,T]$, i.e. $f$ and $g$ are modifications of each other, then the integrals of $f$ and $g$ over $[0,T]$ are a.s. equal.

One can also prove that the mapping $t \mapsto I_t(f)$ is a.s. continuous. In order to prove this we start with Doob's inequality.
Lemma 3.17 (Doob's inequality). For each $f \in H_0$,
$$\mathbf{E}\sup_{s\ge0}\Big(\int_0^s f(t)\,dW_t\Big)^2 \le 4\,\mathbf{E}\int_0^\infty f(t)^2\,dt. \tag{3.6}$$

Proof. Denote $I_sf = \int_0^s f(t)\,dW_t$. Then
$$I_sf = \sum_{i=0}^{n-1} f(t_i)(W_{t_{i+1}\wedge s} - W_{t_i\wedge s})$$
is continuous in $s$, so the supremum on the left-hand side of (3.6) can be taken over a countable family of $s$ and therefore is a random variable. Since $(I_{s_k}f, \mathcal{F}_{s_k})$ is a discrete-time martingale with respect to $k$, Doob's inequality for discrete-time martingales implies that
$$\mathbf{E}\sup_{k\le m}(I_{s_k}f)^2 \le 4\,\mathbf{E}(I_{s_m}f)^2 = 4\,\mathbf{E}\int_0^{s_m} f_t^2\,dt \le 4\,\mathbf{E}\int_0^\infty f_t^2\,dt.$$
This holds for not necessarily ordered $s_1,\ldots,s_m$, and so, by passing to the limit and numbering all rational numbers $s$, one obtains (3.6).
Theorem 3.18. If $f \in \bar H_0$, then $I_sf$ has a continuous modification.

Proof. First find $f_n(t) \in H_0$ such that
$$\mathbf{E}\int_0^\infty (f(t) - f_n(t))^2\,dt \le 2^{-n}.$$
Then
$$\mathbf{1}_{[0,s)}(t)f(t) = \mathbf{1}_{[0,s)}(t)f_1(t) + \mathbf{1}_{[0,s)}(t)(f_2(t) - f_1(t)) + \cdots,$$
where the convergence of the right-hand side is understood in the $L^2$-sense. By Itô's isometry,
$$I_sf = I_sf_1 + I_s(f_2 - f_1) + \cdots,$$
where each summand on the right-hand side is continuous. It remains to show that the series converges uniformly. By Doob's inequality,
$$\mathbf{E}\sup_{s\ge0}(I_s(f_{n+1} - f_n))^2 \le 4\,\mathbf{E}\int_0^\infty (f_{n+1}(t) - f_n(t))^2\,dt \le 16\cdot 2^{-n}.$$
Now the Markov inequality implies that
$$\mathbf{P}\Big(\sup_{s\ge0}|I_s(f_{n+1} - f_n)| \ge n^{-2}\Big) \le 16n^4 2^{-n}.$$
Since the series $\sum n^4 2^{-n}$ converges, the Borel-Cantelli lemma implies that $\sup_{s\ge0}|I_s(f_{n+1} - f_n)| \le n^{-2}$ for all sufficiently large $n$. Since the series $\sum n^{-2}$ converges, this yields the uniform convergence.
The following result establishes Wald's identities for stochastic integrals.

Theorem 3.19. Let $f$ be an Itô-integrable function and let $\tau$ be an almost surely finite stopping time. Assume that $\mathbf{E}\int_0^\tau f_s^2\,ds < \infty$. Then
$$\int_0^\tau f_s\,dW_s = \int_0^\infty \mathbf{1}_{\{s<\tau\}}f_s\,dW_s \quad \text{a.s.}$$
and
$$\mathbf{E}\int_0^\tau f_s\,dW_s = 0, \qquad \mathbf{E}\Big(\int_0^\tau f_s\,dW_s\Big)^2 = \mathbf{E}\int_0^\tau f_s^2\,ds.$$
Note in particular that, choosing $f$ identically equal to one, we obtain Wald's identities for the martingale $W_t$, namely
$$\mathbf{E}W_\tau = 0, \qquad \mathbf{E}(W_\tau)^2 = \mathbf{E}\tau.$$
3.3. Itô processes

If a stochastic process $\xi_t$ can be represented as
$$\xi_t = \xi_0 + \int_0^t \sigma_s\,dW_s + \int_0^t b_s\,ds \quad \text{a.s.}, \ t \ge 0,$$
with an Itô-integrable stochastic process $\sigma_t = \sigma_t(\omega)$ and an adapted, jointly measurable in $(\omega,t)$ process $b_t = b_t(\omega)$ which is absolutely integrable on any finite segment, then $\xi_t$ is called an Itô process and we write
$$d\xi_t = \sigma_t\,dW_t + b_t\,dt$$
and say that $\xi_t$ has the stochastic differential $\sigma_t\,dW_t + b_t\,dt$.
Example 3.20. Since $\int_0^t W_s\,dW_s = W_t^2/2 - t/2$, we can write
$$d\xi_t = W_t\,dW_t,$$
where $\xi_t = (W_t^2 - t)/2$. Using the obvious linearity property of stochastic differentials, we deduce
$$d(W_t^2) = 2W_t\,dW_t + dt.$$
In other words, this means
$$W_t^2 = W_0^2 + \int_0^t 2W_s\,dW_s + \int_0^t ds.$$
3.4. Functions of Brownian motion
Note that the chain rule for differentiation does not hold for stochastic differentials. For instance,
$$d(W_t^2) = 2W_t\,dW_t + dt.$$
In the following we find an expression for $df(W_t)$ for any twice continuously differentiable function $f$. We make use of Itô's integral to find an expression for functions of Brownian motion. We first need a result dealing with the Riemann integral of some function of Brownian motion.
Theorem 3.21. Let $g$ be a bounded continuous function on $\mathbb{R}$ and let $0 = t_0 < t_1 < \cdots < t_n = t$ be a partition of $[0,t]$. Then for any $\theta_i \in [W_{t_i}, W_{t_{i+1}}]$ we have
$$\lim \sum_{i=0}^{n-1} g(\theta_i)[W_{t_{i+1}} - W_{t_i}]^2 = \int_0^t g(W_s)\,ds \tag{3.7}$$
in probability, as $n \to \infty$ and the partition size goes to zero.
Proof. By the definition of the integral and continuity of $g$,
$$S_n = \sum_{i=0}^{n-1} g(W_{t_i})(t_{i+1} - t_i) \to \int_0^t g(W_s)\,ds \quad \text{a.s. as } n \to \infty.$$
Let us show further that $S_n - \zeta_n \to 0$ as $n \to \infty$ in the mean square, where
$$\zeta_n = \sum_{i=0}^{n-1} g(W_{t_i})[W_{t_{i+1}} - W_{t_i}]^2.$$
Using the independence of increments and conditioning we have
$$\mathbf{E}(S_n - \zeta_n)^2 = \mathbf{E}\Big(\sum_{i=0}^{n-1} g(W_{t_i})\big[(W_{t_{i+1}} - W_{t_i})^2 - (t_{i+1} - t_i)\big]\Big)^2$$
$$= \mathbf{E}\sum_{i=0}^{n-1} g^2(W_{t_i})\,\mathbf{E}\Big[\big((W_{t_{i+1}} - W_{t_i})^2 - (t_{i+1} - t_i)\big)^2 \,\Big|\, W_{t_i}\Big]$$
$$= 2\,\mathbf{E}\sum_{i=0}^{n-1} g^2(W_{t_i})(t_{i+1} - t_i)^2 \le 2\max_{i\le n}(t_{i+1} - t_i)\,\mathbf{E}\sum_{i=0}^{n-1} g^2(W_{t_i})(t_{i+1} - t_i) \to 0 \quad \text{as } n \to \infty$$
if $\max_i(t_{i+1} - t_i) \to 0$ for $n \to \infty$.

If $\theta_i \in [W_{t_i}, W_{t_{i+1}}]$, then
$$\Big|\sum_{i=0}^{n-1} g(\theta_i)[W_{t_{i+1}} - W_{t_i}]^2 - \zeta_n\Big| \le \max_{i=0,\ldots,n}|g(\theta_i) - g(W_{t_i})|\sum_{i=0}^{n-1}[W_{t_{i+1}} - W_{t_i}]^2 \to 0$$
in probability by the continuity of $g$ and $W$ and the definition of the quadratic variation. Consequently, both $S_n$ and $\zeta_n$ have the same limit in probability as given in (3.7), hence the result.
Theorem 3.22 (Basic Itô formula). If $f(x)$ is a twice continuously differentiable function, then for any $t \ge 0$
$$f(W_t) = f(0) + \int_0^t f'(W_s)\,dW_s + \frac12\int_0^t f''(W_s)\,ds.$$
Proof. By representing $f$ as a sum, it is possible to assume that $f$ has compact support, so that the function and its first and second derivatives are bounded.

Since $f'$ is continuous and so is $W_s$, and further $f'(W_s)$ is measurable with respect to the $\sigma$-algebra generated by all $W_u$, $u \le s$, both integrals above are well defined. Now let $0 = t_0 < t_1 < \cdots < t_n = t$ be a partition of $[0,t]$. By telescoping summation we write
$$f(W_t) = f(0) + \sum_{i=0}^{n-1}[f(W_{t_{i+1}}) - f(W_{t_i})].$$
From the Taylor expansion we get
$$f(W_{t_{i+1}}) - f(W_{t_i}) = f'(W_{t_i})[W_{t_{i+1}} - W_{t_i}] + \frac12 f''(\theta_i)[W_{t_{i+1}} - W_{t_i}]^2$$
with $\theta_i \in [W_{t_i}, W_{t_{i+1}}]$. Thus,
$$f(W_t) = f(0) + \sum_{i=0}^{n-1} f'(W_{t_i})[W_{t_{i+1}} - W_{t_i}] + \frac12\sum_{i=0}^{n-1} f''(\theta_i)[W_{t_{i+1}} - W_{t_i}]^2.$$
Letting the size of the partition $\max_i(t_{i+1} - t_i)$ go to zero, we see that the first sum converges to the Itô integral $\int_0^t f'(W_s)\,dW_s$ and, by Theorem 3.21, the second sum converges to $\int_0^t f''(W_s)\,ds$.
The result of Itô's formula can be expressed in terms of stochastic differentials:
$$df(W_t) = f'(W_t)\,dW_t + \frac12 f''(W_t)\,dt.$$
Note that the second term on the right-hand side makes the result quite different from the classical calculus, where only first-order differentials appear. For instance, $d(W_t^2)$ is not equal to $2W_t\,dW_t$. Indeed, integrating $d(W_t^2)$ over $[0,T]$ we get $W_T^2$, which has non-zero mean, while $\int_0^T 2W_t\,dW_t$ has zero expectation.

Note that Itô's formula can be easily informally derived using the following multiplication rule:
$$(dW_t)^2 = dt, \qquad dW_t\,dt = 0, \qquad (dt)^2 = 0.$$
Let $\xi$ and $\eta$ be two almost surely continuous processes such that their Itô stochastic integrals exist. Define
$$\xi_t = \xi_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s, \qquad \eta_t = \eta_0 + \int_0^t a_s\,ds + \int_0^t \gamma_s\,dW_s, \quad t \ge 0.$$
In the following theorem we obtain the covariation of $\xi$ and $\eta$.

Theorem 3.23. The covariation process of $\xi$ and $\eta$ is
$$[\xi,\eta]([0,t]) = \int_0^t \sigma_s\gamma_s\,ds < \infty, \quad t \ge 0.$$
The proof relies on the definition of the covariation and approximation of both $\xi$ and $\eta$ by simple functions. In particular,
$$[\xi,\xi]([0,t]) = \int_0^t \sigma_s^2\,ds < \infty, \quad t \ge 0.$$
Example 3.24. By the Itô integral representation we may write
$$W_t = \int_0^t dW_s,$$
which follows by taking $f(t) = \mathbf{1}_{(0,t]}$ or using the definition of Itô's integral. Thus, for $s > 0$,
$$[W,W]([0,s]) = \int_0^s du = s.$$
Repeating the proof of Theorem 3.21, we see that for $\theta_i \in [\xi_{t_i}, \xi_{t_{i+1}}]$
$$\lim \sum_{i=0}^{n-1} g(\theta_i)[\xi_{t_{i+1}} - \xi_{t_i}]^2 = \int_0^t g(\xi_s)\sigma_s^2\,ds = \int_0^t g(\xi_s)\,d[\xi,\xi]_s.$$
This yields the following generalisation of Itô's formula. If $d\xi_t = b_t\,dt + \sigma_t\,dW_t$ and $f$ is twice continuously differentiable, then
$$df(\xi_t) = f'(\xi_t)\,d\xi_t + \frac12 f''(\xi_t)\sigma_t^2\,dt,$$
where the expression for $d\xi_t$ can be substituted. An alternative expression is
$$df(\xi_t) = f'(\xi_t)\,d\xi_t + \frac12 f''(\xi_t)\,d[\xi,\xi]_t.$$
The following result establishes the product rule for stochastic differentials.

Theorem 3.25. Let $\xi_t$ and $\eta_t$ be real-valued processes having stochastic differentials. Then
$$d(\xi_t\eta_t) = \xi_t\,d\eta_t + \eta_t\,d\xi_t + (d\xi_t)(d\eta_t).$$

Proof. Apply Itô's formula to
$$\xi_t\eta_t = \frac14\big((\xi_t + \eta_t)^2 - (\xi_t - \eta_t)^2\big).$$

An alternative proof of Itô's formula first establishes this product rule, then proves the formula for $f$ being a polynomial, and then approximates any twice differentiable function with polynomials.
Example 3.26. If $f(x) = x^2/2$, then
$$f(W_t) = W_t^2/2 = 0 + \int_0^t W_s\,dW_s + \frac12\int_0^t ds,$$
hence $\int_0^t W_s\,dW_s = W_t^2/2 - t/2$.

Example 3.27. If $f(x) = x^m$, $m \ge 2$, then
$$W_t^m = m\int_0^t W_s^{m-1}\,dW_s + \frac{m(m-1)}{2}\int_0^t W_s^{m-2}\,ds.$$
3.5. Multidimensional integration
Both the concept of the stochastic differential and Itô's integral can be generalised to the multidimensional case, i.e. to the Wiener process that takes values in $\mathbb{R}^d$.

Let $W_t = (W^1_t, \ldots, W^d_t)$ be the $d$-dimensional Wiener process with independent components. Consider a $d$-dimensional process $f = (f^1, \ldots, f^d)$ such that all its components are Itô integrable. Define
$$\int_0^t f_s\,dW_s = \int_0^t f^1_s\,dW^1_s + \cdots + \int_0^t f^d_s\,dW^d_s,$$
so that $f_s\,dW_s$ is interpreted as the scalar product. The integrability condition for $f$ can also be formulated as
$$\mathbf{E}\int_0^T \|f_s\|^2\,ds < \infty.$$
The properties of the multidimensional integral are similar to those of the one-dimensional integral. It is also possible to integrate a matrix-valued process $\sigma^{ij}_t$ as
$$\int_0^t \sigma_s\,dW_s$$
by interpreting the integrand as the matrix multiplied by a vector. If $\int_0^t \operatorname{tr}(\sigma_s\sigma_s^\top)\,ds$ is integrable ($\operatorname{tr}$ denotes the trace), then
$$\mathbf{E}\Big\|\int_0^t \sigma_s\,dW_s\Big\|^2 = \mathbf{E}\int_0^t \operatorname{tr}(\sigma_s\sigma_s^\top)\,ds.$$
The stochastic differential in the multidimensional case is defined as
$$d\xi_t = \sigma_t\,dW_t + b_t\,dt,$$
where $\sigma_t$ is a matrix-valued process and $b_t$ is a vector-valued process. The formal operations with stochastic differentials are greatly simplified by noticing the following multiplication table:
$$dW^i_t\,dW^j_t = \mathbf{1}_{\{i=j\}}\,dt, \qquad dW^i_t\,dt = (dt)^2 = 0.$$
Theorem 3.28 (General Itô formula). Let $u:\mathbb{R}^d\to\mathbb{R}$ be a twice continuously differentiable function. If the $d$-dimensional process $\xi = (\xi^1,\ldots,\xi^d)$ has a stochastic differential, then
$$du(\xi_t) = \sum_{i=1}^d \frac{\partial u}{\partial x_i}(\xi_t)\,d\xi^i_t + \frac12\sum_{i,j=1}^d \frac{\partial^2 u}{\partial x_i\partial x_j}(\xi_t)\,d\xi^i_t\,d\xi^j_t.$$
Assume that $d\xi_t = \sigma_t\,dW_t + b_t\,dt$. Using the multiplication rule for stochastic differentials, it is possible to arrive at
$$du(\xi_t) = (L_tu)(\xi_t)\,dt + (\sigma_t^\top\operatorname{grad}u)(\xi_t)\,dW_t,$$
where $\operatorname{grad}u$ is the vector composed of the partial derivatives of $u$ and $L_t$ is the second-order differential operator defined as
$$(L_tu)(x) = \sum_{i,j=1}^d a^{ij}_t\frac{\partial^2 u}{\partial x_i\partial x_j} + \sum_{i=1}^d b^i_t\frac{\partial u}{\partial x_i},$$
and $(a^{ij}_t)$ form the matrix $\frac12\sigma_t\sigma_t^\top$.
Another special variant of Itô's formula deals with functions that also depend on $t$. If $F(t,x)$ is twice continuously differentiable, then
$$dF(t,W_t) = F'_t(t,W_t)\,dt + F'_W(t,W_t)\,dW_t + \frac12 F''_{WW}(t,W_t)\,dt.$$
This can be seen by letting $\xi_t = (t, W_t)$, i.e.
$$b_t = (1, 0), \qquad \sigma_t = \begin{pmatrix} 0 & 0\\ 0 & 1\end{pmatrix}.$$
Example 3.29. The following process (called geometric Brownian motion),
$$Y_t = \exp(W_t - \tfrac12 t), \quad t \ge 0,$$
is well known in financial mathematics. Since $F(t,x) = e^{x - \frac12 t}$ is a twice continuously differentiable function, we get
$$dY_t = -\frac12 e^{W_t - \frac12 t}\,dt + e^{W_t - \frac12 t}\,dW_t + \frac12 e^{W_t - \frac12 t}\,dt = e^{W_t - \frac12 t}\,dW_t = Y_t\,dW_t.$$
In other words,
$$dY_t = Y_t\,dW_t,$$
which is a stochastic differential equation for $Y_t$.
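The equation $dY_t = Y_t\,dW_t$ can be integrated numerically with the Euler scheme and compared against the exact solution $Y_t = \exp(W_t - t/2)$ along the same Brownian path; a minimal sketch with freely chosen step sizes (numpy assumed):

    import numpy as np

    rng = np.random.default_rng(8)
    T, n_steps = 1.0, 10_000
    dt = T / n_steps

    dW = rng.normal(0.0, np.sqrt(dt), n_steps)
    W = np.cumsum(dW)

    Y_euler = 1.0
    for k in range(n_steps):
        Y_euler += Y_euler * dW[k]        # Euler step for dY = Y dW

    Y_exact = np.exp(W[-1] - 0.5 * T)     # geometric Brownian motion
    print(Y_euler, Y_exact)               # close for small dt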
3.6. Examples of applying Itô's formula
The following result establishes a connection between Itô's formula and the theory of partial differential equations.

Theorem 3.30. Let $\xi_0$ be deterministic, such that $\xi_0$ lies in an open set $Q \subset \mathbb{R}^d$. Let $\tau$ be the first exit time of an Itô process $\xi_t$ from $Q$ and let $u$ be a twice continuously differentiable function on $Q$. Furthermore, assume that $\tau$ is a.s. finite and $\mathbf{E}\int_0^\tau |L_su(\xi_s)|\,ds < \infty$. Then
$$u(\xi_0) = \mathbf{E}u(\xi_\tau) - \mathbf{E}\int_0^\tau L_su(\xi_s)\,ds.$$
In particular, if $Lu$ vanishes, then $u(\xi_0) = \mathbf{E}u(\xi_\tau)$ describes the solution of the elliptic partial differential equation $Lu = 0$.
Example 3.31. Consider the Wiener process $W_t$. In this case Itô's formula becomes
$$u(W_t) = u(W_0) + \int_0^t \operatorname{grad}u(W_s)\,dW_s + \int_0^t Lu(W_s)\,ds,$$
where
$$Lu = \frac12\Delta u = \frac12\Big(\frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_d^2}\Big)$$
is one half of the Laplace operator $\Delta$. Consider a bounded open domain $D$ in $\mathbb{R}^d$. A function $u$ such that $\Delta u = 0$ identically on $D$ is said to be harmonic. If $u$ is harmonic, then
$$u(W_t) = u(W_0) + \int_0^t \operatorname{grad}u(W_s)\,dW_s$$
is a martingale (note that the stochastic integral is a martingale). Compare with the discrete-time case: if $X_n$ is a Markov chain with transition matrix $P$ and if $f$ is a harmonic function (i.e. $Pf = f$), then $f(X_n)$ is a martingale.
In particular, if $W_0 = x_0$ is deterministic, then
$$\mathbf{E}u(W_\tau) = u(x_0)$$
for a suitable stopping time $\tau$ (that satisfies the condition of the optional sampling theorem). If the values of $u$ on the boundary of $D$ are known and given by a function $g$, this makes it possible to retrieve the harmonic function inside $D$. For this, consider the Wiener process that starts at $x_0$ and define $\tau$ to be the first exit time from $D$. By continuity, $W_\tau$ lies on the boundary of $D$ and so $u(W_\tau) = g(W_\tau)$ is known. By simulating many Wiener processes up to the first exit time, we obtain a sample of values for $g(W_\tau)$, and their average serves as an estimator for $u(x_0)$.
Example 3.32. Let $\tau$ be the first exit time of $W_t$ from the open ball $B_R$ of radius $R$ centred at the origin. Define
$$u(x) = \frac1d(R^2 - \|x\|^2).$$
We apply Itô's formula, noticing that $\sigma$ is the identity matrix and $L = \frac12\Delta$ with the Laplace operator $\Delta$. Then almost surely
$$u(W_t) = -t - \int_0^t \frac2d W_s\,dW_s + \frac1d R^2, \quad t > 0.$$
Applying this for $t\wedge\tau$ instead of $t$ and noticing that $0 \le u(W_{t\wedge\tau}) \le \frac1d R^2$, we see that
$$\mathbf{E}\int_0^{t\wedge\tau}\|W_s\|^2\,ds \le R^2t < \infty.$$
Thus, Wald's identities yield that
$$\mathbf{E}u(W_{t\wedge\tau}) = -\mathbf{E}(t\wedge\tau) + \frac1d R^2,$$
whence
$$\mathbf{E}(t\wedge\tau) = \frac1d R^2 - \mathbf{E}u(W_{t\wedge\tau}).$$
Letting $t \to \infty$ and noticing that $u(W_{t\wedge\tau}) \to u(W_\tau) = 0$, we obtain that $\mathbf{E}\tau = R^2/d$.
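The identity $\mathbf{E}\tau = R^2/d$ is pleasant to confirm by Monte Carlo; the sketch below (step size and sample count are arbitrary choices, numpy assumed) simulates $d$-dimensional Brownian paths until they leave the ball $B_R$:

    import numpy as np

    rng = np.random.default_rng(9)
    d, R, dt, n_paths = 3, 1.0, 1e-4, 2_000

    exit_times = np.empty(n_paths)
    for p in range(n_paths):
        x = np.zeros(d)
        t = 0.0
        while x @ x < R * R:              # still inside the ball
            x += rng.normal(0.0, np.sqrt(dt), d)
            t += dt
        exit_times[p] = t

    print(exit_times.mean(), R * R / d)   # should be close to 1/3 here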
3.7. Stochastic differential equations

Let $(W_t, \mathcal{F}_t)_{t\ge0}$ be a $d$-dimensional Wiener process. Let $b(t,x)$ and $\sigma(t,x)$ be Borel functions defined on $(0,\infty)\times\mathbb{R}^d$, where $b$ is vector-valued and $\sigma$ is matrix-valued. Assume that there exists a finite constant $K$ such that for all $x, y, t$,
$$\|\sigma(t,x)\| + \|b(t,x)\| \le K(1 + \|x\|),$$
$$\|\sigma(t,x) - \sigma(t,y)\| + \|b(t,x) - b(t,y)\| \le K\|x - y\|,$$
where the norm of a matrix $\sigma$ is defined as $\|\sigma\| = (\sum(\sigma^{ij})^2)^{1/2}$. Let $\xi_0$ be an $\mathcal{F}_0$-measurable random vector. Consider the following equation:
$$\xi_t = \xi_0 + \int_0^t \sigma(s,\xi_s)\,dW_s + \int_0^t b(s,\xi_s)\,ds, \quad t \ge 0, \tag{3.8}$$
or, alternatively written as
$$d\xi_t = \sigma(t,\xi_t)\,dW_t + b(t,\xi_t)\,dt$$
with the corresponding initial value $\xi_0$.

By its solution we mean a continuous $\mathcal{F}_t$-adapted process $\xi$ such that (3.8) holds almost surely. Note that the continuity of $\xi$ together with the conditions on $b$ and $\sigma$ imply that the integrals in (3.8) are well defined.

Theorem 3.33. The solution of (3.8) exists and is a Markov process.

This existence result is proved by the Picard method of successive approximations for differential equations.
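Although the existence proof proceeds by Picard iteration, in practice solutions of (3.8) are usually approximated by the Euler-Maruyama scheme $\xi_{k+1} = \xi_k + \sigma(t_k,\xi_k)\Delta W_k + b(t_k,\xi_k)\Delta t$. A one-dimensional sketch for the (hypothetical, Lipschitz) Ornstein-Uhlenbeck coefficients $b(t,x) = -x$, $\sigma(t,x) = 1$:

    import numpy as np

    rng = np.random.default_rng(10)
    T, n_steps, xi0 = 5.0, 5_000, 2.0
    dt = T / n_steps

    b = lambda t, x: -x        # drift, Lipschitz in x
    sigma = lambda t, x: 1.0   # diffusion coefficient

    xi = xi0
    for k in range(n_steps):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt))
        xi = xi + sigma(t, xi) * dW + b(t, xi) * dt

    # one sample of xi_T; for this SDE, xi_T ~ N(xi0*e^{-T}, (1-e^{-2T})/2)
    print(xi)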
4. Lévy processes

A Lévy process is a stochastic process with stationary and independent increments. The basic theory was developed, principally by Paul Lévy, in the 1930s. In the last 20 years there has been a renewed interest in this class of processes. The reasons are two-fold. On the theoretical side, they include many interesting examples such as Brownian motion, stable processes, simple and compound Poisson processes, subordinators, etc. Moreover, Lévy processes are the simplest generic class of processes which have (a.s.) continuous sample paths interspersed with random jumps of arbitrary size occurring at random times. On the applied side, they have been intensively used in recent years for describing asset price dynamics in finance. They are also used for modelling physical phenomena such as turbulence via Burgers' equation, viscoelasticity, etc.
4.1. Some basic ideas
Our state space is the Euclidean space $\mathbb{R}^d$. The inner product of two vectors $x = (x_1,\ldots,x_d)$ and $y = (y_1,\ldots,y_d)$ is $\langle x,y\rangle = \sum_{i=1}^d x_iy_i$ and the associated norm is $\|x\| = \langle x,x\rangle^{1/2}$.
Let $(\Omega,\mathcal{F},\mathbf{P})$ be a probability space. For a random vector $\xi$ we define its law as $p_\xi(A) = \mathbf{P}(\xi\in A)$ for Borel sets $A$. If $\xi$ and $\eta$ are two independent random variables, then the law of $\xi+\eta$ is given by the convolution of measures
$$p_{\xi+\eta}(A) = (p_\xi * p_\eta)(A) = \int_{\mathbb{R}^d} p_\xi(A - x)\,p_\eta(dx).$$
The characteristic function of $\xi$ (or of $p_\xi$) is $\varphi_\xi:\mathbb{R}^d\to\mathbb{C}$, where
$$\varphi_\xi(u) = \mathbf{E}e^{i\langle u,\xi\rangle} = \int_{\mathbb{R}^d} e^{i\langle u,x\rangle}\,p_\xi(dx).$$
We recall that $\xi_1,\ldots,\xi_n$ are independent random vectors if and only if
$$\mathbf{E}\exp\Big\{i\sum_{j=1}^n\langle u_j,\xi_j\rangle\Big\} = \varphi_{\xi_1}(u_1)\cdots\varphi_{\xi_n}(u_n)$$
for all $u_1,\ldots,u_n \in \mathbb{R}^d$.
4.2. Infinite divisibility
Let $\mu$ be a probability measure on $\mathbb{R}^d$. Define $\mu^n = \mu * \cdots * \mu$ ($n$ times). We say that $\mu$ has a convolution $n$th root if there exists a probability measure $\mu^{1/n}$ for which $(\mu^{1/n})^n = \mu$.

Definition 4.1. A probability measure $\mu$ is infinitely divisible if it has a convolution $n$th root for all $n \in \mathbb{N}$. In this case $\mu^{1/n}$ is unique. A random variable $\xi$ is infinitely divisible if its law $p_\xi$ is infinitely divisible, i.e. for each $n \ge 2$ there exist i.i.d. random variables $\xi_1,\ldots,\xi_n$ such that
$$\xi \overset{d}{=} \xi_1 + \cdots + \xi_n.$$
Theorem 4.2. A probability measure $\mu$ is infinitely divisible if and only if for all $n \ge 2$ there exists a probability measure $\mu_n$ with characteristic function $\varphi_n$ such that
$$\varphi_\mu(u) = (\varphi_n(u))^n$$
for all $u \in \mathbb{R}^d$. Moreover, $\mu_n = \mu^{1/n}$.

Proof. If $\mu$ is infinitely divisible, take $\varphi_n = \varphi_{\mu^{1/n}}$. Conversely, for each $n \ge 2$, by Fubini's theorem,
$$\varphi_\mu(u) = \int_{\mathbb{R}^d}\cdots\int_{\mathbb{R}^d} e^{i\langle u,x_1+\ldots+x_n\rangle}\,\mu_n(dx_1)\cdots\mu_n(dx_n) = \int_{\mathbb{R}^d} e^{i\langle u,x\rangle}\,\mu_n^n(dx).$$
Since $\varphi_\mu(u) = \int_{\mathbb{R}^d} e^{i\langle u,x\rangle}\,\mu(dx)$ and $\varphi_\mu$ determines $\mu$ uniquely, we obtain $\mu = \mu_n^n$.
If $\mu$ and $\nu$ are each infinitely divisible, then so is $\mu * \nu$.

The space of infinitely divisible measures is closed under weak convergence, i.e. if $(\mu_n)_{n\ge0}$ is a sequence of infinitely divisible measures and $\mu_n \overset{w}{\to} \mu$, then $\mu$ is infinitely divisible. Recall that $\mu_n \overset{w}{\to} \mu$ means
$$\lim_n \int_{\mathbb{R}^d} f(x)\,\mu_n(dx) = \int_{\mathbb{R}^d} f(x)\,\mu(dx)$$
for each continuous bounded function $f:\mathbb{R}^d\to\mathbb{R}$.
Here is the fundamental result of this part.

Theorem 4.3 (Lévy-Khintchine formula). A random vector $\xi \in \mathbb{R}^d$ is infinitely divisible if and only if its characteristic function $\varphi_\xi(u) = \mathbf{E}(e^{i\langle\xi,u\rangle})$ is given by
$$\varphi_\xi(u) = e^{\psi(u)}, \tag{4.1}$$
where
$$\psi(u) = i\langle a,u\rangle - \frac12\langle Qu,u\rangle + \int_{\mathbb{R}^d}\big(e^{i\langle u,x\rangle} - 1 - i\langle u,x\rangle\mathbf{1}_{\{\|x\|<1\}}\big)\,\nu(dx) \tag{4.2}$$
with uniquely determined $a \in \mathbb{R}^d$, a positive semi-definite symmetric $d\times d$ matrix $Q$ and a measure $\nu$ on $\mathbb{R}^d\setminus\{0\}$ such that
$$\int_{\mathbb{R}^d\setminus\{0\}}\min(1,\|x\|^2)\,\nu(dx) < \infty. \tag{4.3}$$
Remark 4.4. That the integral in the exponent is well defined follows from the requirements on $\nu$ and the fact that
$$\big(e^{i\langle u,x\rangle} - 1 - i\langle u,x\rangle\mathbf{1}_{\{\|x\|<1\}}\big) = O(\|x\|^2) \ \text{(resp. } O(1)\text{)} \quad \text{as } \|x\| \to 0 \ \text{(resp. } \|x\| \to \infty\text{)}.$$
The function $x\mathbf{1}_{\{\|x\|<1\}}$ in the Lévy-Khintchine formula may be replaced with $x/(1\vee\|x\|)$ or any measurable function for which the integrand remains $O(\|x\|^2)$ (resp. $O(1)$) as $\|x\| \to 0$ (resp. $\|x\| \to \infty$). The only effect of this replacement is that the value of $a$ changes.
Example 4.5 (The normal distribution). Let $\xi \sim N(\mu,\Sigma)$ with $\mu \in \mathbb{R}^d$ and a positive-definite symmetric matrix $\Sigma$. A standard calculation yields
$$\varphi_\xi(u) = e^{i\langle u,\mu\rangle - \frac12\langle u,\Sigma u\rangle},$$
and hence
$$(\varphi_\xi(u))^{1/n} = e^{i\langle u,\mu/n\rangle - \frac12\langle u,\frac1n\Sigma u\rangle}.$$
We see that $\xi$ is infinitely divisible, with each $\xi_i \sim N(\frac\mu n,\frac1n\Sigma)$ for $1 \le i \le n$. Moreover, $a = \mu$, $Q = \Sigma$ and $\nu = 0$ in (4.2).
Example 4.6 (Poisson distribution). Consider a Poisson random variable $N \sim \text{Po}(\lambda)$, so that
$$\mathbf{P}(N = n) = \frac{\lambda^n}{n!}e^{-\lambda}, \quad n \ge 0,$$
for some $\lambda > 0$. Its characteristic function is
$$\varphi_N(u) = \sum_{n=0}^\infty e^{iun}\frac{\lambda^n}{n!}e^{-\lambda} = e^{\lambda(e^{iu}-1)},$$
from which we deduce that $N$ is infinitely divisible with each $N_i \sim \text{Po}(\frac\lambda n)$, $1 \le i \le n$. The Poisson distribution appears if $d = 1$, $a = 0$, $Q = 0$, and $\nu$ is the measure of mass $\lambda$ concentrated at 1 in (4.2).
Example 4.7 (Compound Poisson distribution). Let $\xi_1,\xi_2,\ldots$ be i.i.d. random vectors with distribution $\mu$, i.e. $\mathbf{P}(\xi_1 \in A) = \mu(A)$ for any measurable set $A$. Let $N$ be a Poisson random variable of mean $\lambda$, independent of the $\xi_i$. We say that
$$\xi = \xi_1 + \cdots + \xi_N \tag{4.4}$$
has a compound Poisson distribution. Since $N$ is infinitely divisible, $\xi$ is infinitely divisible too, no matter what kind of distribution $\mu$ is. The total probability formula implies that
$$\varphi_\xi(u) = \sum_{n=0}^\infty \mathbf{E}(e^{i\langle u,\xi_1+\cdots+\xi_n\rangle})\,\mathbf{P}(N = n) = \sum_{n=0}^\infty \varphi_\mu(u)^n\frac{\lambda^n}{n!}e^{-\lambda} = \exp\{-\lambda(1 - \varphi_\mu(u))\}.$$
This formula corresponds to (4.2) for $Q = 0$, $\nu = \lambda\mu$ and
$$a = \lambda\int_{\|x\|<1} x\,\mu(dx).$$
Not every infinitely divisible distribution with $Q = 0$ can be obtained this way, since, in general, $\int_{\|x\|<1} x\,\nu(dx)$ and moreover $\int_{\|x\|<1}\nu(dx)$ are not finite, while $\int_{\|x\|<1}\|x\|^2\,\nu(dx)$ is finite by (4.3). However, every infinitely divisible distribution with $Q = 0$ can be obtained as a limit of compound Poisson distributions.
Example 4.8 (The uniform distribution is not infinitely divisible). We show first that the characteristic function of an infinitely divisible distribution never vanishes.

Let $\xi$ be infinitely divisible. Then $|\varphi_\xi(u)|^2$ is also a characteristic function (namely of $\xi - \xi'$, where $\xi'$ is an independent copy of $\xi$). The infinite divisibility of $\xi$ implies that $|\varphi_\xi(u)|^{2/n}$ is a characteristic function for all $n \ge 2$. The limit of a converging sequence of characteristic functions is again a characteristic function. Since $|\varphi_\xi(u)|$ is a real number lying in $[0,1]$, $|\varphi_\xi(u)|^{2/n}$ converges to a function that takes only the values zero and one. Since the characteristic function is continuous and takes the value one at the origin, the limit cannot take the value zero at all; this would happen at any point where $\varphi_\xi$ vanishes, so vanishing is impossible.

Consider $\xi$ uniformly distributed on $[-1,1]$. Its characteristic function is $u^{-1}\sin u$, which vanishes for certain $u$ and so cannot be the characteristic function of an infinitely divisible distribution.
4.3. Definition of Lévy processes

Consider a stochastic process $X_t$, $t \ge 0$. Assume that the realisations (the paths) of $X_t$ are functions that take values in $\mathbb{R}^d$, are right-continuous and have left limits for all $t$ (one says $X_t$ has càdlàg paths). Sometimes it is useful to add to $\mathbb{R}^d$ a cemetery point $\partial$.
Definition 4.9. $X_t$, $t \ge 0$, is said to be a Lévy process if

1. for every $t \ge 0$ and $s > 0$, the increment $X_{t+s} - X_t$ is independent of $\sigma(X_u, u \le t)$;
2. for every $t \ge 0$ and $s > 0$, the increment $X_{t+s} - X_t$ has the same distribution as $X_s$;
3. $X_t$, $t \ge 0$, is stochastically continuous, i.e. for every $t \ge 0$ and $\varepsilon > 0$,
$$\lim_{s\to t}\mathbf{P}(|X_t - X_s| > \varepsilon) = 0.$$

One can show that the Lévy process $X_t$, $t \ge 0$, admits a modification which has càdlàg paths. The right-continuity implies that $X_0 = 0$ with probability 1.
The definition implies that
$$X_1 = (X_1 - X_{1-1/n}) + (X_{1-1/n} - X_{1-2/n}) + \cdots + (X_{1/n} - X_0),$$
where all summands on the right-hand side are independent and identically distributed. Therefore, $X_1$ has an infinitely divisible distribution.

We now list some examples of important Lévy processes.
Example 4.10 (Brownian motion). The Brownian motion (or Wiener process) $W_t$ is the Lévy process whose increments are normally distributed. The Brownian motion is called standard if this normal distribution has mean zero and the unit covariance matrix. A general Brownian motion can be obtained from the standard one by an affine transformation $QW_t + at$.

If the standard Brownian motion $W_t$ takes values in the real line, then $W_{t+s} - W_t \sim N(0,s)$. The independent increments property implies that $\mathbf{E}(W_tW_s) = \min(t,s)$.

The Brownian motion appears as the limit (in distribution) of a simple random walk; for instance, the standard Brownian motion is the limit of $\xi^{(n)}_t = S_{[nt]}/(\sigma\sqrt n)$, where $S_n = \eta_1 + \cdots + \eta_n$ is the sum of $n$ i.i.d. random variables with mean zero and variance $\sigma^2$. One should however take care here and use an appropriate definition that allows the convergence of a discontinuous process $\xi^{(n)}_t$ to the continuous Brownian motion.
Example 4.11 (Poisson process). A Lévy process $X_t$, $t \ge 0$, is said to be a Poisson process if its increment $X_{t+s} - X_t$ has a Poisson distribution with parameter $\lambda s$. The number $\lambda > 0$ is called the intensity of the Poisson process. One usually writes $N_t$, $t \ge 0$, to denote the Poisson process.

The Poisson process can alternatively be described via the collection of random variables (time moments) $S_1, S_2, \ldots$ where $N_t$ has jumps. Then
$$\mathbf{P}(S_1 > t) = \mathbf{P}(X_t = 0) = e^{-\lambda t},$$
i.e. $S_1$ is exponentially distributed with parameter $\lambda$. Similarly, $S_k$ has the Gamma distribution $\text{Ga}(k,\lambda)$. Thus, one can represent
$$S_1 = \eta_1, \quad S_2 = \eta_1 + \eta_2, \quad \ldots, \quad S_n = \eta_1 + \cdots + \eta_n,$$
where $\eta_1, \eta_2, \ldots$ are i.i.d. $\text{Exp}(\lambda)$ random variables.

An important property of the Poisson process says that, given $N_t = n$, the jump points in $[0,t]$ are distributed as $n$ i.i.d. $\text{Un}[0,t]$ random variables.
The process
$$M_t = N_t - \lambda t$$
is a martingale. Furthermore, if $H_t$ is a predictable process (i.e. $H_t$ is $\mathcal{F}_{t-}$-measurable for each $t$) and $\mathbf{E}\int_0^t |H_s|\,ds$ is finite for each $t \ge 0$, then
$$M_t = \int_0^t H_s\,dN_s - \lambda\int_0^t H_s\,ds$$
is a martingale. For instance, this holds if $H_s$ is a non-random function. Note that
$$\int_0^t H_s\,dN_s = \sum_{k=1}^{N_t} H_{S_k},$$
where $S_1, S_2, \ldots$ are the jump times of the Poisson process.
Example 4.12 (Compound Poisson process). Let $\xi_1,\xi_2,\ldots$ be a sequence of i.i.d. random variables (or random vectors). If $N_t$ is a Poisson process with intensity $\lambda$, independent of the $\xi_i$, define
$$X_t = \sum_{i=1}^{N_t}\xi_i,$$
which is called the compound Poisson process. One can interpret $X_t$ as the total claim amount if individual claims of sizes given by $\xi_1,\xi_2,\ldots$ occur at the jump times of a Poisson process. If $\mu$ is the common distribution of the $\xi_i$, then
$$\mathbf{E}e^{i\langle u,X_t\rangle} = e^{t\psi(u)},$$
where
$$\psi(u) = \lambda\int_{\mathbb{R}^d}(e^{i\langle u,x\rangle} - 1)\,\mu(dx).$$
This formula corresponds to (4.2) with $\nu = \lambda\mu$ and
$$\psi(u) = i\lambda\int_{\mathbb{R}^d}\langle u,x\rangle\mathbf{1}_{\{\|x\|<1\}}\,\mu(dx) + \int_{\mathbb{R}^d}\big(e^{i\langle u,x\rangle} - 1 - i\langle u,x\rangle\mathbf{1}_{\{\|x\|<1\}}\big)\,\lambda\mu(dx).$$
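A compound Poisson path is straightforward to simulate: draw the jump times as partial sums of i.i.d. $\text{Exp}(\lambda)$ variables (as in Example 4.11) and add i.i.d. jump sizes. The sketch below uses an illustrative standard normal jump distribution $\mu$; all parameter values are free choices (numpy assumed):

    import numpy as np

    rng = np.random.default_rng(11)
    lam, T = 2.0, 10.0

    # jump times: cumulative sums of Exp(lam) interarrival times
    arrivals = np.cumsum(rng.exponential(1.0 / lam, size=int(lam * T * 5) + 50))
    arrivals = arrivals[arrivals <= T]

    jumps = rng.normal(0.0, 1.0, size=arrivals.size)   # xi_i ~ mu = N(0, 1)

    def X(t):
        """Compound Poisson process at time t."""
        return jumps[arrivals <= t].sum()

    print(len(arrivals), X(T))   # number of jumps by time T, and X_T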
The following inequality from martingale theory is often useful.

Proposition 4.13 (Doob's martingale inequality). Let $M_n$ be a submartingale. Define
$$M^*_n = \max\{M_0,\ldots,M_n\}.$$
If $1/p + 1/q = 1$ with $p > 1$, then
$$\|M^*_n\|_p \le q\|M_n\|_p, \tag{4.5}$$
where $\|\xi\|_p = (\mathbf{E}(\xi^p))^{1/p}$.

This inequality was proved in the first part of the course (discrete time).
Poisson measures and Poisson point processes

Let $E$ be a Polish space with $\sigma$-algebra $\mathcal{E}$. Let $\nu$ be a $\sigma$-finite measure on $E$. A Poisson measure with intensity $\nu$ is a collection of random variables $\Pi(A,\omega)$, $A \in \mathcal{E}$, $\omega \in \Omega$, defined on $(\Omega,\mathcal{F})$ such that

1. for every Borel subset $B$ of $E$ with $\nu(B) < \infty$, $\Pi(B,\cdot)$ has a Poisson distribution with parameter $\nu(B)$;
2. if $B_1,\ldots,B_n$ are disjoint Borel sets, the random variables $\Pi(B_1,\cdot),\ldots,\Pi(B_n,\cdot)$ are independent;
3. for any $\omega \in \Omega$, $\Pi(\cdot,\omega)$ is a counting measure on $E$.

Next, let $E = \mathbb{R}^d\times[0,\infty)$ and let $\Delta_t$, $t \ge 0$, be a stochastic process taking values in $\mathbb{R}^d$. For every Borel subset $B$, let us introduce the counting process of $B$,
$$N^B_t = \operatorname{Card}\{s \le t : \Delta_s \in B\}\ \big(= \Pi(B\times[0,t])\big), \quad t \ge 0.$$
We say that the process $\Delta_t$, $t \ge 0$, is a Poisson point process with intensity measure $\nu$ if for every Borel subset $B$, $N^B_t$, $t \ge 0$, is a Poisson process with intensity $\nu(B)$. We say that the Poisson point process $\Delta_t$, $t \ge 0$, is discrete if $\nu(\mathbb{R}^d) < \infty$.
4.4. The characteristic exponent of Lévy processes
Let $X_t$, $t \ge 0$, be a Lévy process in $\mathbb{R}^d$. The infinite divisibility of $X_1$ implies that
$$\mathbf{E}e^{i\langle u,X_1\rangle} = e^{\psi(u)}.$$
Thus,
$$\mathbf{E}e^{i\langle u,X_t\rangle} = e^{t\psi(u)},$$
which is shown first for rational $t$ and then by an approximation using the right-continuity of the process. The function $\psi$ is called the characteristic exponent of the process.
Example 4.14. The Poisson process appears if $\psi(u) = \lambda(e^{iu} - 1)$. The standard Brownian motion appears if $\psi(u) = -\frac12\|u\|^2$.

The so-called stable process appears if $\psi(ku) = k^\alpha\psi(u)$. In this case $k^{-1/\alpha}X_{kt}$ coincides in distribution with $X_t$. The corresponding Lévy measure is given by $\nu(dr\,d\theta) = r^{-\alpha-1}\,dr\,\sigma(d\theta)$ in polar coordinates, with $\alpha \in (0,2)$ and some $\sigma$-finite measure $\sigma$ on the unit sphere of $\mathbb{R}^d$.
Proposition 4.15. The finite-dimensional distributions of a Lévy process $X_t$, $t \ge 0$, are determined by the law of $X_t$ for any choice of $t > 0$.

Proof. To check that the joint distribution of $(X_{t_1},\ldots,X_{t_n})$ is determined by the distribution of $X_t$, it suffices to check that that of $(X_{t_1}, X_{t_2} - X_{t_1},\ldots,X_{t_n} - X_{t_{n-1}})$ is, for any $0 < t_1 < \cdots < t_n$. By the stochastic continuity and the independence and stationarity of the increments, this holds if the distribution of $X_t$ determines the distribution of $X_{t_j} - X_{t_{j-1}}$ for $t_j - t_{j-1} = \frac{k}{l}t$ and $t_0 = 0$. This follows from
$$\varphi_{X_{t_j}-X_{t_{j-1}}}(u) = \varphi^{1/l}_{X_{l(t_j-t_{j-1})}}(u) = \varphi^{1/l}_{X_{kt}}(u) = \varphi^{k/l}_{X_t}(u).$$
The following result shows (by an explicit construction) that any infinitely divisible probability distribution can be viewed as the distribution of a Lévy process evaluated at time 1.

Theorem 4.16 (Classification theorem). If $\psi$ is given by (4.2), then there exists a unique probability measure under which $X$ is a Lévy process with characteristic exponent $\psi$. The jump process of $X$, namely $\Delta X = (\Delta X_t, t \ge 0)$, is a Poisson point process with intensity measure $\nu$.

Remark 4.17. The proof of the theorem shows that one can express a Lévy process as the sum of three independent Lévy processes $X^{(1)}$, $X^{(2)}$ and $X^{(3)}$, where $X^{(1)}$ is a linear transform of a Brownian motion with drift, $X^{(2)}$ is a compound Poisson process having only jumps of size at least 1, and finally $X^{(3)}$ is a pure jump process having jumps of size less than 1.

We proceed by relating analytical properties of the characteristic exponent to the probabilistic behaviour of the Lévy process.

We say that the process has bounded variation if its sample paths have bounded variation on every compact time interval a.s.
Proposition 4.18. A Lévy process $X_t$, $t \ge 0$, has bounded variation if and only if $Q = 0$ and $\int (1 \wedge |x|)\, \Pi(dx) < \infty$, i.e. $\Psi$ has the form
\[ \Psi(u) = -i\langle d, u\rangle + \int_{\mathbb{R}^d} \bigl(1 - e^{i\langle u, x\rangle}\bigr)\, \Pi(dx) \]
for some $d \in \mathbb{R}^d$, which is known as the drift coefficient.
Note that a compound Poisson process has bounded variation. Conversely, a process with bounded variation is compound Poisson if $d = 0$ and $\Pi$ has finite total mass.
Proposition 4.19. Suppose that the dimension is $d = 1$. Then
1. We have
\[ \lim_{|u|\to\infty} |u|^{-2}\, \Psi(u) = \frac{1}{2} Q, \]
where $Q$ denotes the Gaussian coefficient.
2. If $X_t$, $t \ge 0$, has bounded variation and drift coefficient $d$, then
\[ \lim_{|u|\to\infty} u^{-1}\, \Psi(u) = -i d. \]
Proof. We only prove 1.; the argument for 2. is similar. Plainly,
\[ \lim_{|u|\to\infty} u^{-2}\bigl(e^{iux} - 1 - iux\, \mathbf{1}_{|x|<1}\bigr) = 0 \]
for every $x$. On the other hand, making use of the inequalities
\[ |1 - \cos a| \le 2(1 \wedge a^2) \qquad\text{and}\qquad |a - \sin a| \le 2(|a| \wedge |a|^3), \]
it is easy to see that for every $u$ large enough
\[ u^{-2}\bigl|e^{iux} - 1 - iux\, \mathbf{1}_{|x|<1}\bigr| \le 4(1 \wedge x^2). \]
So the dominated convergence theorem applies and yields that
\[ \lim_{|u|\to\infty} u^{-2} \int_{\mathbb{R}\setminus\{0\}} \bigl(1 - e^{iux} + iux\, \mathbf{1}_{|x|<1}\bigr)\, \Pi(dx) = 0. \]
We conclude the proof by the Lévy-Khintchine formula.
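For a concrete exponent the limit in part 1 can be observed numerically. The sketch below uses an illustrative $\Psi$ with Gaussian coefficient $Q$, drift $a$ and a single jump size $x_0 < 1$ occurring at rate $\lambda$ (all values arbitrary):

```python
import numpy as np

# Psi(u) = i*a*u + (Q/2)*u^2 + lam * (1 - exp(i*u*x0) + i*u*x0), with |x0| < 1
a, Q, lam, x0 = 0.4, 3.0, 2.0, 0.5

def psi(u):
    return 1j * a * u + 0.5 * Q * u**2 + lam * (1 - np.exp(1j * u * x0) + 1j * u * x0)

for u in [1e1, 1e2, 1e3]:
    print(u, psi(u) / u**2)  # should approach Q/2 = 1.5
```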
Proposition 4.20. The characteristic exponent $\Psi$ is bounded if and only if $X_t$, $t \ge 0$, is a compound Poisson process.

Proof. If $X_t$, $t \ge 0$, is a compound Poisson process, then its characteristic exponent is clearly bounded. Conversely, Proposition 4.19 shows that the boundedness of $\Psi$ yields that $Q$ is null and $d = 0$. Moreover, note that
\[ \operatorname{Re} \Psi(u) = \int_{\mathbb{R}\setminus\{0\}} (1 - \cos ux)\, \Pi(dx). \]
Since
\[ 1 - e^{-t x^2/2} = \frac{1}{\sqrt{2\pi t}} \int_{\mathbb{R}} (1 - \cos ux)\, e^{-u^2/(2t)}\, du, \]
Fubini's theorem yields that
\[ \int_{\mathbb{R}\setminus\{0\}} \bigl(1 - e^{-t x^2/2}\bigr)\, \Pi(dx) = \frac{1}{\sqrt{2\pi t}} \int_{\mathbb{R}} e^{-u^2/(2t)}\, \operatorname{Re} \Psi(u)\, du \le \sup\{\operatorname{Re} \Psi(u) : u \in \mathbb{R}\}. \]
Letting $t$ tend to $\infty$ and using monotone convergence, we deduce that the Lévy measure $\Pi$ is finite. Hence the Lévy process is a compound Poisson process.
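The Gaussian identity used above says that $E(1 - \cos(xZ)) = 1 - e^{-tx^2/2}$ for $Z \sim N(0, t)$; a quick numerical sanity check (scipy assumed available, values arbitrary):

```python
import numpy as np
from scipy.integrate import quad

t, x = 0.7, 1.9
val, _ = quad(lambda u: (1 - np.cos(u * x)) * np.exp(-u**2 / (2 * t))
              / np.sqrt(2 * np.pi * t), -np.inf, np.inf)
print(val, "vs", 1 - np.exp(-t * x**2 / 2))
```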
4.5. The Markov property
Let $\mathcal{F}_t$ be the natural complete filtration generated by $X_t$, $t \ge 0$, i.e. $\mathcal{F}_t$ is the complete $\sigma$-algebra generated by $X_s$ for $s \le t$.

The simple Markov property states that, given $X_t$, the future process $X_{t+s}$, $s \ge 0$, is independent of the past. This follows from the fact that $X_{t+s} - X_t$, $s \ge 0$, is independent of $\mathcal{F}_t$.
The following result follows from the $0$-$1$ law.

Proposition 4.21. The filtration $\mathcal{F}_t$ is right-continuous, that is,
\[ \mathcal{F}_t = \bigcap_{s > t} \mathcal{F}_s \]
for every $t \ge 0$. In particular, the initial $\sigma$-algebra $\mathcal{F}_0$ is trivial.
For instance, the event $\{X_t < 0 \text{ for all sufficiently small } t\}$ belongs to $\mathcal{F}_0$ and so has either probability zero or one.
Let $P_x$ denote the distribution of $X_t + x$, $t \ge 0$, that is, the distribution of the Lévy process started from $x$. Denote by $E_x$ the expectation with respect to $P_x$.

The convolution operators act on essentially bounded functions $f$ as
\[ P_t f(x) = E_x f(X_t) = E f(X_t + x) = \int_{\mathbb{R}^d} f(x+y)\, P(X_t \in dy). \]
Let $C_0(\mathbb{R}^d)$ be the space of continuous functions vanishing at $\infty$. Endowed with the uniform topology, i.e. with the norm $\|f\| = \sup_{x \in \mathbb{R}^d} |f(x)|$, $C_0(\mathbb{R}^d)$ is a Banach space.
Proposition 4.22. The family of operators $(P_t)_{t \ge 0}$ defined above is a strongly continuous contraction semigroup on $C_0(\mathbb{R}^d)$, i.e. for any $t \ge 0$, $P_t$ is a bounded linear operator on $C_0(\mathbb{R}^d)$ and
1. $P_0 = \mathrm{Id}$,
2. $P_t P_s = P_{t+s}$ for any $t, s \ge 0$ (semigroup property),
3. $P_t f$ converges uniformly to $f$ as $t \downarrow 0$ (strong continuity),
4. $\|P_t\| = \sup_{f \in C_0(\mathbb{R}^d),\, \|f\| \le 1} \|P_t f\| \le 1$ (contraction).
Moreover, we have $P_t f \in C_0(\mathbb{R}^d)$ for every $t \ge 0$. $(P_t)_{t \ge 0}$ is called a Feller semigroup.
Proof. By definition, for any $t \ge 0$, $P_t$ is a linear operator on $C_0(\mathbb{R}^d)$ and $P_0 = \mathrm{Id}$. The semigroup property follows from the Markov property. Indeed, we have for any $s, t \ge 0$,
\[ P_t P_s f(x) = E_x E_{X_t} f(X_s) = E_x f(X_{t+s}). \]
Let us now show the strong continuity. Let $f \in C_0(\mathbb{R}^d)$. It is easy to see that $f$ is uniformly continuous on $\mathbb{R}^d$. Given $\varepsilon > 0$, choose $\delta > 0$ so that $|f(x+y) - f(x)| < \varepsilon$ whenever $|y| \le \delta$. Then
\[ |P_t f(x) - f(x)| \le \Bigl| \int_{|y|\le\delta} \bigl(f(x+y) - f(x)\bigr)\, P(X_t \in dy) \Bigr| + \Bigl| \int_{|y|>\delta} \bigl(f(x+y) - f(x)\bigr)\, P(X_t \in dy) \Bigr| \]
\[ \le \varepsilon + 2\|f\| \int_{|y|>\delta} P(X_t \in dy) \le \varepsilon + 2\|f\|\, \varepsilon \]
for all small $t$, because of the stochastic continuity, namely the convergence in probability of $X_t$ to $X_0 = 0$ as $t \downarrow 0$. Hence, $\|P_t f - f\| \to 0$ as $t \downarrow 0$, i.e. $(P_t)_{t \ge 0}$ is strongly continuous. Note also that
\[ P_t f(x) = \int_{\mathbb{R}^d} f(x+y)\, P(X_t \in dy) = E f(X_t + x). \]
Thus, $P_t$ is a bounded operator with $\|P_t\| \le 1$. By the dominated convergence theorem, $P_t f \in C_0(\mathbb{R}^d)$ for every $t \ge 0$, which completes the proof.
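For Brownian motion the semigroup property can be checked by Monte Carlo: $P_t P_s f(x) = E f(x + X_t + X'_s)$ with independent increments should agree with $P_{t+s} f(x)$. A sketch (test function and parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def P(t, f, x, n=500_000):
    """Monte Carlo value of P_t f(x) = E f(x + X_t) for standard Brownian motion."""
    return np.mean(f(x + rng.normal(0.0, np.sqrt(t), size=n)))

f = lambda y: np.exp(-y**2)        # a test function in C_0(R)
x, s, t, n = 0.3, 0.5, 1.0, 500_000

# P_t P_s f(x) = E f(x + X_t + X'_s) with X_t, X'_s independent increments
lhs = np.mean(f(x + rng.normal(0.0, np.sqrt(t), n) + rng.normal(0.0, np.sqrt(s), n)))
print(lhs, "vs P_{t+s} f(x) =", P(t + s, f, x))
```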
The Markov property of $X_t$ also holds at random stopping times. A non-negative random variable $\tau$ is called a stopping time if $\{\tau \le t\} \in \mathcal{F}_t$ for all $t \ge 0$. The stopping $\sigma$-algebra $\mathcal{F}_\tau$ consists of all events $A$ such that $A \cap \{\tau \le t\} \in \mathcal{F}_t$ for all $t \ge 0$.
Proposition 4.23. Let $\tau$ be a stopping time with $P(\tau < \infty) > 0$. Then, conditionally on $\{\tau < \infty\}$, the process $X_{\tau+t} - X_\tau$, $t \ge 0$, is independent of $\mathcal{F}_\tau$ and has the same distribution as $X_s$, $s \ge 0$.
Proof. If $\tau$ is deterministic, the result is the ordinary Markov property. Assume that $\tau$ takes a countable number of values $t_k$, $k \ge 1$. Since $\{\tau = t_k\} \in \mathcal{F}_{t_k}$ and $X_{t_k+s} - X_{t_k}$, $s \ge 0$, is independent of $\mathcal{F}_{t_k}$ with the same law as $X$, we obtain
\begin{align*}
P\bigl((X_{\tau+s} - X_\tau,\, s \ge 0) \in A \mid \mathcal{F}_\tau\bigr)
&= \sum_{k=1}^{\infty} P\bigl((X_{\tau+s} - X_\tau,\, s \ge 0) \in A,\ \tau = t_k \mid \mathcal{F}_\tau\bigr) \\
&= \sum_{k=1}^{\infty} P\bigl((X_{t_k+s} - X_{t_k},\, s \ge 0) \in A,\ \tau = t_k \mid \mathcal{F}_{t_k}\bigr) \\
&= \sum_{k=1}^{\infty} \mathbf{1}_{\tau = t_k}\, P\bigl((X_s,\, s \ge 0) \in A\bigr) \\
&= P\bigl((X_s,\, s \ge 0) \in A\bigr)
\end{align*}
for every set $A$ that consists of functions (possible realisations of the process). Finally, an arbitrary stopping time $\tau$ can be approximated by a decreasing sequence of discrete stopping times $\tau_n$, using the right-continuity of the process. For instance, one can take $\tau_n = 2^{-n}\bigl(\lfloor 2^n \tau \rfloor + 1\bigr)$.
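The dyadic approximation $\tau_n = 2^{-n}(\lfloor 2^n \tau \rfloor + 1)$ is easy to tabulate: each $\tau_n$ takes countably many values, is again a stopping time, and decreases to $\tau$:

```python
import numpy as np

def dyadic_upper(tau, n):
    """tau_n = 2^{-n} (floor(2^n tau) + 1): the smallest dyadic number of level n
    strictly above tau; tau_n takes countably many values and decreases to tau."""
    return (np.floor(2.0**n * tau) + 1) / 2.0**n

tau = np.pi / 3          # an arbitrary "stopping time" value, tau ~ 1.0472
for n in range(1, 7):
    print(n, dyadic_upper(tau, n))
```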
The Markov property can be formulated using the translation operators: $X \circ \theta_\tau = (X_{\tau+t},\, t \ge 0)$. Then the conditional distribution of $X \circ \theta_\tau$ given $\mathcal{F}_\tau$ is $P_x$ with $x = X_\tau$.

The same argument as in Proposition 4.21 shows that $\mathcal{F}_\tau = \mathcal{F}_{\tau+}$.
Although the process $X_t$ is not left-continuous, it is possible to show the following quasi-left-continuity property.

Proposition 4.24. Let $\tau_n$, $n \ge 1$, be an increasing sequence of stopping times such that $\tau_n \uparrow \tau$ almost surely as $n \to \infty$. Then
\[ \lim_{n\to\infty} X_{\tau_n} = X_\tau \quad \text{a.s. on } \{\tau < \infty\}. \]
In particular, if $\tau_n < \tau$ a.s., then $X$ is continuous at time $\tau$ a.s. on $\{\tau < \infty\}$.
Proof. Assume that $\tau$ is a.s. finite. By the existence of left limits, $X_{\tau_n} \to X_{\tau-}$ as $n \to \infty$. Let $f$ and $g$ be two functions from $C_0$. Then
\[ \lim_{n\to\infty} E\bigl(f(X_{\tau_n})\, g(X_{\tau_n+t})\bigr) = E\bigl(f(X_{\tau-})\, g(X_{(\tau+t)-})\bigr). \]
By the right-continuity of the paths,
\[ \lim_{t\downarrow 0} E\bigl(f(X_{\tau-})\, g(X_{(\tau+t)-})\bigr) = E\bigl(f(X_{\tau-})\, g(X_\tau)\bigr). \]
The Markov property at time $\tau_n$ yields that
\[ E\bigl(f(X_{\tau_n})\, g(X_{\tau_n+t})\bigr) = E\bigl(f(X_{\tau_n})\, P_t g(X_{\tau_n})\bigr), \]
which converges to $E(f(X_{\tau-})\, P_t g(X_{\tau-}))$ as $n \to \infty$. If $t \downarrow 0$, the Feller property implies that
\[ \lim_{t\downarrow 0} E\bigl(f(X_{\tau-})\, P_t g(X_{\tau-})\bigr) = E\bigl(f(X_{\tau-})\, g(X_{\tau-})\bigr). \]
Thus
\[ E\bigl(f(X_{\tau-})\, g(X_\tau)\bigr) = E\bigl(f(X_{\tau-})\, g(X_{\tau-})\bigr), \]
that is, $X_\tau = X_{\tau-}$ a.s.
4.6. Basic ideas of potential theory for Lévy processes
Resolvents
Remember that the operators
\[ P_t f(x) = E_x f(X_t) = \int_{\mathbb{R}^d} f(x+y)\, P(X_t \in dy) \]
form a semigroup. The resolvent of this semigroup is defined as
\[ U^q f(x) = \int_0^\infty e^{-qt}\, P_t f(x)\, dt = E_x\Bigl( \int_0^\infty e^{-qt} f(X_t)\, dt \Bigr), \]
where $q > 0$. Therefore, $q\, U^q f(x) = E_x f(X_\tau)$, where $\tau \sim \operatorname{Exp}(q)$ is an exponentially distributed random variable independent of $X$. Then
\[ f = \lim_{q\to\infty} q\, U^q f \]
for each $f \in C_0$.
Fubini's theorem yields the resolvent equation: for any $q, r > 0$, we have
\[ U^q - U^r + (q - r)\, U^q U^r = 0. \]
Thus, if $U^q f = U^q g$ for one $q$, then $U^q f = U^q g$ for all $q$, and also $f = g$.
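The probabilistic reading of the resolvent, $q U^q f(x) = E_x f(X_\tau)$ with $\tau \sim \operatorname{Exp}(q)$, suggests a direct Monte Carlo check against the defining time integral. A sketch for standard Brownian motion (test function, parameters and truncation level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
q, x, n = 1.5, 0.2, 400_000
f = lambda y: np.exp(-np.abs(y))

# q U^q f(x) = E_x f(X_tau), tau ~ Exp(q) independent of X; for standard BM,
# given tau the position X_tau is N(0, tau)
tau = rng.exponential(1.0 / q, size=n)
mc_killed = np.mean(f(x + rng.normal(0.0, np.sqrt(tau))))

# the same quantity as q * int_0^infty e^{-qt} P_t f(x) dt, on a truncated grid
ts = np.linspace(1e-4, 30.0, 1000)
ptf = np.array([np.mean(f(x + rng.normal(0.0, np.sqrt(t), 10_000))) for t in ts])
print(mc_killed, "vs", q * np.sum(np.exp(-q * ts) * ptf) * (ts[1] - ts[0]))
```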
Let $\mathcal{D}$ be the image of $C_0$ under $U^q$. Then $U^q$ is a bijection between $C_0$ and $\mathcal{D}$. The infinitesimal generator $\mathcal{A} : \mathcal{D} \to C_0$ is defined by the relation
\[ U^q (q I - \mathcal{A}) = I, \]
where $I$ is the identity operator. The resolvent equation implies that the definition of $\mathcal{A}$ does not depend on $q$.
Denote the Fourier transform of an integrable function $f$ (i.e. $f \in L^1$) by
\[ \mathcal{F}f(\xi) = \int_{\mathbb{R}^d} e^{-i\langle \xi, x\rangle} f(x)\, dx. \]
Then
\[ \mathcal{F}(P_t f)(\xi) = e^{-t\Psi(\xi)}\, \mathcal{F}f(\xi), \qquad (4.6) \]
\[ \mathcal{F}(U^q f)(\xi) = \frac{\mathcal{F}f(\xi)}{q + \Psi(\xi)}. \qquad (4.7) \]
Then
\[ \mathcal{F}(\mathcal{A} f)(\xi) = -\Psi(\xi)\, \mathcal{F}f(\xi). \]
By the Lévy-Khintchine formula, one obtains that $\mathcal{A} f$ is defined for all sufficiently smooth functions that also decrease to zero sufficiently fast, and
\[ \mathcal{A} f(x) = -\langle a, \nabla f(x)\rangle + \frac{1}{2} \sum_{i,j=1}^{d} Q_{ij}\, \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \int_{\mathbb{R}^d} \bigl( f(x+y) - f(x) - \mathbf{1}_{|y|<1}\, \langle y, \nabla f(x)\rangle \bigr)\, \Pi(dy). \]
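For standard Brownian motion ($a = 0$, $Q = 1$, $\Pi = 0$, dimension 1) this gives $\mathcal{A} f = \frac{1}{2} f''$. A quick check of $(P_t f - f)/t \to \frac{1}{2} f''$ using the closed-form Gaussian convolution of $f(y) = e^{-y^2}$ (an elementary computation, not taken from the notes):

```python
import numpy as np

# For f(y) = exp(-y^2) and X_t ~ N(0, t) one computes
# P_t f(x) = E f(x + X_t) = (1 + 2t)^{-1/2} exp(-x^2 / (1 + 2t)).
f = lambda y: np.exp(-y**2)
d2f = lambda y: (4 * y**2 - 2) * np.exp(-y**2)       # second derivative of f
Ptf = lambda t, x: np.exp(-x**2 / (1 + 2 * t)) / np.sqrt(1 + 2 * t)

x = 0.7
for t in [1e-1, 1e-2, 1e-3]:
    print(t, (Ptf(t, x) - f(x)) / t, "vs A f(x) =", 0.5 * d2f(x))
```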
4.7. Subordinators
Cumulant
A subordinator is a Lévy process taking values in $[0, \infty)$, which implies that its sample paths are increasing. If $X$ is a subordinator and $\tau \sim \operatorname{Exp}(q)$ is an independent exponentially distributed random variable, then the process
\[ X^{(q)}_t = \begin{cases} X_t, & t \in [0, \tau), \\ +\infty, & t \ge \tau, \end{cases} \]
is called a subordinator killed at rate $q$.
When working with subordinators, it is easier to use the Laplace transform instead of the Fourier transform. Then
\[ E e^{-u X_t} = e^{-t \Phi(u)}, \]
where
\[ \Phi(u) = \Psi(iu) = d u + \int_{(0,\infty)} \bigl(1 - e^{-ux}\bigr)\, \Pi(dx) \]
is called the Laplace exponent or the cumulant of the subordinator. The Lévy measure $\Pi$ has support in $[0,\infty)$ and fulfils
\[ \int_{(0,\infty)} \min(1, x)\, \Pi(dx) < \infty. \]
Clearly, each subordinator is transient. Its potential measure
\[ U(A) = U(0, A) = E_0\Bigl( \int_0^\infty \mathbf{1}_{X_t \in A}\, dt \Bigr) \]
is a Radon measure (i.e. its value on any set can be approximated by the values on compact subsets). Its Laplace transform is given by
\[ \int_0^\infty e^{-ux}\, U(dx) = E_0\Bigl( \int_0^\infty e^{-u X_t}\, dt \Bigr) = \frac{1}{\Phi(u)}. \]
The function $\mathcal{U}(x) = U([0, x])$ is called the renewal function of the subordinator.
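For the Poisson subordinator of rate $\lambda$ (so $\Phi(u) = \lambda(1 - e^{-u})$), the process spends an $\operatorname{Exp}(\lambda)$ holding time in each state $k = 0, 1, 2, \dots$, which makes the identity $\int e^{-ux}\, U(dx) = 1/\Phi(u)$ easy to check by simulation (truncation level $K$ chosen so that the tail $e^{-uK}$ is negligible):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, u, K, n = 2.0, 0.8, 80, 20_000

# int e^{-ux} U(dx) = E int_0^infty e^{-u X_t} dt = sum_k e^{-u k} E(holding time at k)
holds = rng.exponential(1.0 / lam, size=(n, K))   # holding times in states 0..K-1
est = np.mean(np.sum(np.exp(-u * np.arange(K)) * holds, axis=1))
print(est, "vs 1/Phi(u) =", 1.0 / (lam * (1 - np.exp(-u))))
```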
Important subordinators

The Poisson process (in this case $\Pi$ is concentrated at $1$).

The stable subordinator with index $\alpha \in (0, 1)$:
\[ \Phi(u) = u^\alpha = \frac{\alpha}{\Gamma(1-\alpha)} \int_0^\infty \bigl(1 - e^{-ux}\bigr)\, x^{-\alpha-1}\, dx. \]
The value $\alpha = 1$ corresponds to the degenerate case.

The Gamma process, defined by $\Pi(dx) = a\, x^{-1} e^{-bx}\, dx$. In this case
\[ a \log(1 + u/b) = \int_0^\infty \bigl(1 - e^{-ux}\bigr)\, \Pi(dx), \]
the so-called Frullani integral.
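Since $X_t \sim \operatorname{Gamma}(at, b)$ for the Gamma process, the cumulant $a \log(1 + u/b)$ can be verified by Monte Carlo (illustrative parameters; note that numpy parametrises the Gamma law by shape and scale $= 1/b$):

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, t, u, n = 1.5, 2.0, 0.8, 1.1, 400_000

# Gamma subordinator: X_t ~ Gamma(shape a*t, rate b), Phi(u) = a * log(1 + u/b)
x = rng.gamma(a * t, 1.0 / b, size=n)
print(np.mean(np.exp(-u * x)), "vs", np.exp(-t * a * np.log(1 + u / b)))
```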