Monte Carlo Sampling For Random Differential Equations: Master INVESTMAT 2017-2018 Unit 4

Aims of Monte Carlo method

Monte Carlo sampling (MCS) is a statistical sampling method that was popularized by
physicists from Los Alamos National Laboratory in the USA in the 1940s.
In the following we explain how this method works when dealing with random
differential equations (r.d.e.’s), although it can be applied in many other contexts,
such as:
Approximation of definite integrals.
Numerical optimization.
MCS relies on the simulation of both random variables (r.v.’s) and stochastic processes
(s.p.’s). Hence, to better understand MCS we first need to introduce some
preliminaries about random number generation, since what MCS basically requires
is generating random values of r.v.’s that follow specific probability distributions.

Part I

Sampling Random Variables

Pseudorandom number generation: sampling r.v.’s

The easiest method to generate random numbers with a fixed probability distribution
is based on the following idea:
Assume that X is a continuous r.v. whose d.f. $F_X(x)$ is continuous and strictly
increasing. Then, for fixed x, −∞ ≤ x ≤ +∞, one gets

$$F_X(x) = P[\{\omega \in \Omega : X(\omega) \le x\}] = u \in [0, 1].$$

Taking the inverse gives the key result:

$$x = F_X^{-1}(u) \;\Rightarrow\; X = F_X^{-1}(U).$$

Inverse distribution method

Let $F_X(x) = P[\{\omega \in \Omega : X(\omega) \le x\}]$ be the distribution function (d.f.) of the r.v. X. Then

$$U \sim \mathrm{Un}([0, 1]) \;\Rightarrow\; F_X^{-1}(U) \text{ has d.f. } F_X(x).$$

For distributions with nonconnected support or jumps, $F_X$ is not strictly
increasing or not continuous, and simulating such r.v.’s becomes more difficult. In
these cases one chooses the generalized inverse $F_X^{-1}(u) = \inf\{x : F_X(x) \ge u\}$. An alternative method is
the so-called Acceptance–Rejection algorithm (see reference [2]).
Sometimes the computation of $F_X^{-1}$ is very difficult and only approximations of it are
available. Fortunately, some software packages, such as Mathematica, have powerful
commands to generate r.v.’s.
Example 1: Sampling an exponential r.v. X ∼ Exp(λ ), λ > 0
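In this case:

$$f_X(x) = \lambda \exp(-\lambda x), \quad x > 0,$$

$$F_X(x) = \int_0^x f_X(y)\,dy = 1 - \exp(-\lambda x) = u,$$

$$F_X^{-1}(u) = -\frac{1}{\lambda}\log(1-u).$$

So, we can generate X as follows:

$$X = -\frac{1}{\lambda}\log(1-U), \qquad U \sim \mathrm{Un}([0,1]),$$

or, equivalently in distribution (since 1 − U is also uniformly distributed on [0, 1]),

$$X = -\frac{\log(U)}{\lambda}, \qquad U \sim \mathrm{Un}([0,1]).$$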
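The inverse distribution method for the exponential case can be sketched in a few lines of Python. This is only an illustration: the seed, the sample size and the value λ = 2 are arbitrary assumptions, and the function name `sample_exponential` is hypothetical.

```python
import math
import random

def sample_exponential(lam, n, rng):
    """Draw n values of X ~ Exp(lam) via X = -log(1 - U) / lam."""
    # rng.random() returns U in [0, 1), so 1 - U lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

rng = random.Random(12345)   # fixed seed (arbitrary) so the run is reproducible
lam = 2.0                    # illustrative rate parameter
xs = sample_exponential(lam, 100_000, rng)

sample_mean = sum(xs) / len(xs)
# The sample mean should be close to the exact mean E[X] = 1 / lam.
print(sample_mean, 1.0 / lam)
```

Comparing the sample mean with 1/λ is only a rough check; a histogram against the density λ exp(−λx) is a more informative test.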

Exercise 1: Testing the quality of the inverse distribution method for sampling an
exponential r.v.
Test with Mathematica the method described above. Is it accurate?

Example 2: Sampling a standard Gaussian r.v. X ∼ N(0; 1)
In this case:
$$F_X(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{y^2}{2}\right) dy \;\Rightarrow\; F_X^{-1}(u) \text{ is very difficult to compute!}$$

The following expression provides an accurate approximation:

$$F_X^{-1}(u) \approx y + \frac{p_0 + p_1 y + p_2 y^2 + p_3 y^3 + p_4 y^4}{q_0 + q_1 y + q_2 y^2 + q_3 y^3 + q_4 y^4}, \qquad y = \sqrt{-2\log(1-u)}, \qquad 0.5 \le u < 1.$$

The case 0 < u < 0.5 is handled by symmetry: $F_X^{-1}(u) = -F_X^{-1}(1-u)$. The coefficients $p_i$ and $q_i$, 0 ≤ i ≤ 4, are
given by

p0 = −0.322232431088, q0 = 0.099348462606,
p1 = −1, q1 = 0.588581570495,
p2 = −0.342242088547, q2 = 0.531103462366,
p3 = −0.0204231210245, q3 = 0.10353775285,
p4 = −0.0000453642210148, q4 = 0.0038560700634.
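A minimal Python sketch of this rational approximation follows. It assumes that the formula with $y = \sqrt{-2\log(1-u)}$ is applied on the upper half 0.5 ≤ u < 1 and that the lower half is obtained from the symmetry of N(0; 1); the function names are hypothetical.

```python
import math

# Coefficients p_0..p_4 and q_0..q_4 as listed above.
P = (-0.322232431088, -1.0, -0.342242088547,
     -0.0204231210245, -0.0000453642210148)
Q = (0.099348462606, 0.588581570495, 0.531103462366,
     0.10353775285, 0.0038560700634)

def poly(coeffs, y):
    """Evaluate c0 + c1*y + ... + c4*y^4 with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * y + c
    return result

def normal_quantile(u):
    """Approximate F^{-1}(u) for the N(0; 1) distribution, 0 < u < 1."""
    if u < 0.5:
        # Lower half: use the symmetry F^{-1}(u) = -F^{-1}(1 - u).
        return -normal_quantile(1.0 - u)
    y = math.sqrt(-2.0 * math.log(1.0 - u))
    return y + poly(P, y) / poly(Q, y)

# Sampling then reduces to normal_quantile(U) with U ~ Un([0, 1]).
print(normal_quantile(0.975))   # close to the familiar 97.5% point of N(0; 1)
```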

Exercise 2: Testing the quality of the inverse distribution method for sampling a standard
Gaussian r.v.
Test with Mathematica the method described above.

Exercise 3: Generating a Poisson r.v.
The following procedure describes the generation of values, {Yn }, sampled from a
Poisson distribution of parameter γ > 0, Yn ∼ Po(γ) (hence E[Yn ] = γ). For that goal,
first remember that
$$F_Y(y) = P[Y \le y] = e^{-\gamma} \sum_{k=0}^{m} \frac{\gamma^k}{k!}, \qquad m \le y < m + 1, \quad y \ge 0.$$

Then, for each Un ∼ U([0, 1]) previously generated, we determine

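$$m \text{ such that:} \quad e^{-\gamma} \sum_{k=0}^{m-1} \frac{\gamma^k}{k!} < U_n \le e^{-\gamma} \sum_{k=0}^{m} \frac{\gamma^k}{k!} \quad\Rightarrow\quad \text{put } Y_n = m.$$

(For m = 0 the left-hand sum is empty and equals 0.)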

In order to generate n simulations, Yn , the above procedure is repeated n times.

Test this method with Mathematica. Is it accurate? (Hint: Compare the results
provided by this method against the ones directly obtained by a Mathematica
command through histograms).
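The search for m described in this exercise can be sketched in Python as follows. The value γ = 3, the seed and the sample size are illustrative assumptions, and `sample_poisson` is a hypothetical name.

```python
import math
import random

def sample_poisson(gamma, rng):
    """Return one value of Y ~ Po(gamma) by inverting the d.f. F_Y."""
    u = rng.random()
    m = 0
    term = math.exp(-gamma)   # e^{-gamma} * gamma^m / m! for m = 0
    cdf = term                # F_Y(m)
    # Increase m until F_Y(m - 1) < u <= F_Y(m); the guard on `term`
    # stops the loop if the terms underflow to zero.
    while u > cdf and term > 0.0:
        m += 1
        term *= gamma / m     # now e^{-gamma} * gamma^m / m!
        cdf += term
    return m

rng = random.Random(7)        # arbitrary seed
gamma = 3.0                   # illustrative parameter
ys = [sample_poisson(gamma, rng) for _ in range(50_000)]

mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)
# For a Poisson r.v. both the mean and the variance equal gamma.
print(mean_y, var_y)
```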
Exercise 4: Two additional methods for generating standard Gaussian r.v.’s:
Box-Muller and Polar-Marsaglia methods
Test the following methods with Mathematica. Are they accurate? (Hint: Compare
the results provided by both methods against the ones directly obtained by a
Mathematica command through histograms).

Box-Muller method:

$$U_1, U_2 \sim U(0,1) \text{ independent} \;\Rightarrow\; \begin{cases} X_1 = \sqrt{-2\ln(U_1)}\,\cos(2\pi U_2),\\ X_2 = \sqrt{-2\ln(U_1)}\,\sin(2\pi U_2), \end{cases}$$

$$\Rightarrow\; X_1, X_2 \sim N(0;1) \text{ independent} \;\Rightarrow\; \mu + \sigma X_i \sim N(\mu;\sigma^2),\ i = 1, 2, \text{ independent}.$$
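A short Python sketch of the Box-Muller transformation is given below. The seed, the sample size and the values of µ and σ are arbitrary assumptions made only for the illustration.

```python
import math
import random

def box_muller(rng):
    """Return one pair (X1, X2) of independent N(0; 1) values."""
    u1 = 1.0 - rng.random()   # shifted to (0, 1] so that log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(42)       # arbitrary seed
pairs = [box_muller(rng) for _ in range(50_000)]
x1 = [p[0] for p in pairs]
x2 = [p[1] for p in pairs]

mean_x1 = sum(x1) / len(x1)
var_x1 = sum((x - mean_x1) ** 2 for x in x1) / len(x1)
print(mean_x1, var_x1)        # should be close to 0 and 1

mu, sigma = 3.0, 0.5          # illustrative values only
y1 = [mu + sigma * x for x in x1]   # values distributed as N(mu; sigma^2)
```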

Exercise 4 (continuation): Two additional methods for generating standard Gaussian
r.v.’s: Box-Muller and Polar-Marsaglia methods
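The Polar-Marsaglia method avoids trigonometric evaluations:

$$U_1, U_2 \sim U(0,1) \text{ independent} \;\Rightarrow\; \begin{cases} V_1 = 2U_1 - 1,\\ V_2 = 2U_2 - 1, \end{cases} \;\Rightarrow\; V_1, V_2 \sim U(-1,1) \text{ independent}$$

$$\Rightarrow\; Z = (V_1)^2 + (V_2)^2 \;\Rightarrow\; \begin{cases} \text{if } Z \ge 1 \text{ (or } Z = 0\text{)}: & U_1, U_2, V_1, V_2 \text{ are recomputed},\\[4pt] \text{if } 0 < Z < 1: & X_1 = V_1 \sqrt{\dfrac{-2\ln(Z)}{Z}}, \quad X_2 = V_2 \sqrt{\dfrac{-2\ln(Z)}{Z}}, \end{cases}$$

$$\Rightarrow\; X_1, X_2 \sim N(0;1) \text{ independent} \;\Rightarrow\; \mu + \sigma X_i \sim N(\mu;\sigma^2),\ i = 1, 2, \text{ independent}.$$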
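The Polar-Marsaglia method can be sketched in Python as below. The sketch uses the standard form of the method, with $Z = V_1^2 + V_2^2$ and the factor $\sqrt{-2\ln(Z)/Z}$; the seed and the sample size are arbitrary assumptions.

```python
import math
import random

def polar_marsaglia(rng):
    """Return one pair (X1, X2) of independent N(0; 1) values."""
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        z = v1 * v1 + v2 * v2
        # Accept only points strictly inside the unit disc (and not the origin);
        # otherwise draw a new pair.
        if 0.0 < z < 1.0:
            factor = math.sqrt(-2.0 * math.log(z) / z)
            return v1 * factor, v2 * factor

rng = random.Random(2018)     # arbitrary seed
pairs = [polar_marsaglia(rng) for _ in range(50_000)]
x1 = [p[0] for p in pairs]

mean_x1 = sum(x1) / len(x1)
var_x1 = sum((x - mean_x1) ** 2 for x in x1) / len(x1)
print(mean_x1, var_x1)        # should be close to 0 and 1
```

On average a fraction π/4 of the candidate pairs is accepted, which is the price paid for avoiding the sine and cosine evaluations.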

Exercise 5: The linear congruential generator method
All the above generation methods (Inverse Transformation, Poisson, Box-Muller,
Polar-Marsaglia, etc.) are based on generating numbers that are uniformly distributed on
[0, 1]. In this exercise we propose to study a popular method to generate numerical
values of a uniform r.v. on [0, 1]. The method is based on the following congruence

Xn+1 ≡ (aXn + c) mod (m), n = 0, 1, 2, . . . ,

where a, c and m are positive integers and m is typically large, and X0 is a starting
number (usually called seed).
As d mod (m) is the remainder when dividing d by m, one has

0 ≤ d mod (m) ≤ m − 1,

and the sequence {Un : n ≥ 0} is calculated as

$$U_n = \frac{X_n}{m}, \quad n = 0, 1, 2, \ldots, \qquad 0 \le U_n < 1, \ \forall n \ge 0.$$

For certain values of the parameters a, c and m, the sequence Un may possess
the statistical properties of numbers that are uniformly distributed on [0, 1].

Exercise 5 (continuation): The linear congruential generator method
Linear congruential generators eventually repeat. The smallest value of p such that
$X_{i+p} = X_i$ is called the cycle length or period of the generator. For a linear congruential
generator, p ≤ m.
The next result is a useful criterion to determine the cycle length of certain linear
congruential generators when c ≠ 0.
The period of a linear congruential generator is m if and only if the following three conditions hold:
g.c.d.(c, m) = 1.
Every prime factor of m divides a − 1.
If 4 divides m, then 4 divides a − 1.
If c = 0 and m is a prime number, the longest possible period is m − 1. This is
achieved when

$a^k - 1$ is not divisible by m for k = 1, 2, . . . , m − 2.

Exercise 5 (continuation): The linear congruential generator method
Using Mathematica, construct the following linear congruential generators for a
uniform r.v. on [0, 1] and compare the corresponding histograms with the ones
provided by a direct Mathematica command:
1 For the case c = 0, take $a = 7^5 = 16807$ and $m = 2^{31} - 1 = 2\,147\,483\,647$ (this is a Mersenne
prime); then one gets the following congruential generator

$$X_{n+1} \equiv (16807\, X_n) \bmod (2^{31} - 1), \qquad n = 0, 1, 2, \ldots$$

2 For the case c ≠ 0, propose your own linear congruential generator.
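As a rough illustration of item 1, the generator with $a = 7^5$, c = 0 and $m = 2^{31} - 1$ can be written in Python as follows. The seed X0 = 1 and the function names are assumptions made for this sketch; the exercise itself is meant to be done in Mathematica.

```python
A = 7 ** 5            # multiplier a = 16807
M = 2 ** 31 - 1       # modulus m = 2147483647 (a Mersenne prime)

def lcg_states(seed, n, a=A, c=0, m=M):
    """Return the n states X_1, ..., X_n of X_{k+1} = (a X_k + c) mod m."""
    states = []
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        states.append(x)
    return states

def lcg_uniforms(seed, n, a=A, c=0, m=M):
    """Return U_k = X_k / m for the n states generated from `seed`."""
    return [x / m for x in lcg_states(seed, n, a, c, m)]

# With seed X_0 = 1 the first two states are 7^5 and 7^10.
print(lcg_states(1, 2))

us = lcg_uniforms(1, 10_000)
print(sum(us) / len(us))      # should be close to 1/2
```

With c = 0 the seed must not be a multiple of m, since otherwise every state is 0.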

Part II

Solving Random Differential Equations by Monte Carlo Sampling

Our initial pattern example

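In our exposition we will consider the following initial value problem (i.v.p.):

$$\begin{cases} \dot{X}(t) = -\alpha X(t),\\ X(0) = \beta, \end{cases} \;\Leftrightarrow\; \begin{cases} \dot{X}(t,\omega) = -\alpha(\omega) X(t,\omega),\\ X(0,\omega) = \beta(\omega), \end{cases} \quad \omega \in \Omega,$$

where α = α(ω) and β = β(ω) are r.v.’s defined on a common probability space
$(\Omega, \mathcal{F}, P)$.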
Notice that α and β can be independent or dependent r.v.’s. Their statistical
dependence structure is given by their joint d.f. $F_{\alpha,\beta}(a, b)$. If they are
statistically independent, this d.f. factorizes as $F_{\alpha,\beta}(a, b) = F_\alpha(a) F_\beta(b)$, where $F_\alpha(a)$
and $F_\beta(b)$ are their respective individual (marginal) d.f.’s.

Solving a differential equation in the random context has a wider meaning than in the
deterministic scenario:

Deterministic: the solution is a function x(t).
Random: the solution is a s.p. X(t, ω) = X(t), which is described through its
expectation function: E[X(t)]
variance (or standard deviation) function: V[X(t)]
higher moments: $E[(X(t))^n]$, n ≥ 3
covariance: C[X(t), X(s)]
1-p.d.f. and 1-d.f.

In practice, often only the expectation and variance/standard deviation functions are computed.


MCS procedure
1 Generate identically and independently distributed random numbers

Z (i) = (α (i) , β (i) ), i = 1, 2, . . . , M,

according to the joint distribution of α and β : Fα,β (a, b). Notice that the
dependence structure of α and β is required to be known.
2 For each i = 1, 2, . . . , M, solve the governing deterministic equation

Ẋ (t) = −αX (t),
X (0) = β ,

and obtain:
X (i) (t) = X (t, Z (i) ).
3 Estimate the required solution statistics. For example, the expectation and
variance of the solution s.p. can be estimated, respectively, by:

$$\bar{X}_M(t) = \frac{1}{M} \sum_{i=1}^{M} X(t, Z^{(i)}) \approx E[X(t)],$$

$$\sigma^2[X_M(t)] = \frac{1}{M} \sum_{i=1}^{M} \left( X(t, Z^{(i)}) - \bar{X}_M(t) \right)^2 \approx V[X(t)].$$

Other solution statistics can be estimated via proper schemes from the solution
ensemble $\{X^{(i)}\}$. It is obvious that Steps 1 and 3 are preprocessing and
postprocessing steps, respectively. Only Step 2 requires the solution of the original
problem, and it involves repetitive simulations of the deterministic counterpart of the
problem.
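The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: Step 2 uses the closed-form solution $\beta e^{-\alpha t}$ of the deterministic problem instead of a numerical solver, the inputs are taken as α = 1 and β ∼ Un([0, 1]), and all function names are hypothetical.

```python
import math
import random

def solve_deterministic(alpha, beta, t):
    """Step 2: solution of X'(t) = -alpha X(t), X(0) = beta (closed form)."""
    return beta * math.exp(-alpha * t)

def mcs(sample_inputs, t_grid, M, rng):
    """Return lists of mean and variance estimates of X(t) on t_grid."""
    # Step 1: M i.i.d. draws Z^(i) = (alpha^(i), beta^(i)) from the joint distribution.
    draws = [sample_inputs(rng) for _ in range(M)]
    means, variances = [], []
    for t in t_grid:
        # Step 2: one deterministic solve per realization.
        xs = [solve_deterministic(a, b, t) for a, b in draws]
        # Step 3: sample statistics, with the 1/M normalization used above.
        mean = sum(xs) / M
        var = sum((x - mean) ** 2 for x in xs) / M
        means.append(mean)
        variances.append(var)
    return means, variances

def inputs_random_ic(rng):
    """Assumed inputs: alpha = 1 (deterministic), beta ~ Un([0, 1])."""
    return 1.0, rng.random()

rng = random.Random(2017)     # arbitrary seed
t_grid = [0.0, 1.0, 2.0, 3.0]
means, variances = mcs(inputs_random_ic, t_grid, 1000, rng)
print(means)
print(variances)
```

Passing the sampler as an argument keeps Step 1 separate from Steps 2 and 3, so other input distributions (including dependent α and β) only require a different `sample_inputs` function.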

MCS in practice: assuming that only the i.c. is random

Let us illustrate the solution of our particular i.v.p. via MCS with:

α = 1, β ∼ Un([0, 1]), M = 20, 0 ≤ t ≤ 3

Expectation of the solution stochastic process (s.p.)

Exact solution s.p.: $X(t) = X(t; \beta) = \beta e^{-t}$.

Exact expectation of the solution s.p.: $E[X(t)] = \frac{1}{2} e^{-t}$ ⇒ Let us check it!
Remember that: $E[\beta] = \frac{1}{2}$.
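One way to carry out this check uses only the linearity of the expectation, since $e^{-t}$ is deterministic and β ∼ Un([0, 1]):

$$E[X(t)] = E[\beta e^{-t}] = e^{-t}\, E[\beta] = e^{-t} \int_0^1 b \, db = \frac{1}{2} e^{-t}.$$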

Expectation computed by MCS with M simulations, $\bar{X}_M(t)$:

M = 20            M = 50            M = 100           M = 1000
0.534086 e^{-t}   0.487041 e^{-t}   0.520565 e^{-t}   0.484646 e^{-t}

As M increases one gets:

$$\lim_{M \to \infty} \bar{X}_M(t) = E[X(t)] = \frac{1}{2} e^{-t},$$

although the convergence is slow and, in general, not monotone.

MCS expectation approximations against the exact expectation

Standard deviation of the solution s.p.

Exact solution s.p.: $X(t) = X(t; \beta) = \beta e^{-t}$.

Exact variance of the solution s.p.: $V[X(t)] = \frac{e^{-2t}}{12}$ ⇒ Let us check it!
Remember that: $V[\beta] = \frac{1}{12}$.
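A possible way to check it: a deterministic factor leaves the variance squared, and the variance of β ∼ Un([0, 1]) follows from its first two moments,

$$V[\beta] = E[\beta^2] - (E[\beta])^2 = \int_0^1 b^2 \, db - \left(\frac{1}{2}\right)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12},$$

$$V[X(t)] = V[\beta e^{-t}] = e^{-2t}\, V[\beta] = \frac{e^{-2t}}{12}, \qquad \sigma[X(t)] = \sqrt{V[X(t)]} = \frac{e^{-t}}{2\sqrt{3}}.$$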

Standard deviation computed by MCS with M simulations, $\sigma[X_M(t)]$:

M = 20            M = 50            M = 100           M = 1000
0.280843 e^{-t}   0.303476 e^{-t}   0.307459 e^{-t}   0.285814 e^{-t}

Again, as M increases one gets:

$$\lim_{M \to \infty} \sigma[X_M(t)] = \sigma[X(t)] = \frac{1}{2\sqrt{3}} e^{-t} \approx 0.288675\, e^{-t},$$

but the convergence is slow and, in general, not monotone.

MCS standard deviation approximations against the exact one

We observe a behaviour analogous to the one noticed for the expectation
approximations. For the sake of clarity, the plot labels neither the
approximations nor the exact standard deviation.

MCS in practice: assuming that only the coefficient is random

In this case we have taken:

α ∼ Exp(2), β = 1, M = 20, 0 ≤ t ≤ 3

Notice that the diffusion coefficient (−α) is negative.
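For this case the exact expectation is available in closed form, which gives a reference for the MCS estimate: with β = 1 the solution is $X(t) = e^{-\alpha t}$, and for α ∼ Exp(λ) one has $E[e^{-\alpha t}] = \int_0^\infty \lambda e^{-\lambda a} e^{-a t}\, da = \lambda/(\lambda + t)$. The Python sketch below compares both; it deliberately uses a much larger M than the M = 20 quoted above so that the agreement is visible, and the seed is arbitrary.

```python
import math
import random

lam = 2.0                     # alpha ~ Exp(lam), as in the example
M = 20_000                    # assumed sample size, larger than M = 20 above
rng = random.Random(4)        # arbitrary seed

# Step 1: sample the random coefficient alpha (beta = 1 is deterministic).
alphas = [rng.expovariate(lam) for _ in range(M)]

def mcs_mean(t):
    """MCS estimate of E[X(t)], with X(t) = exp(-alpha t)."""
    return sum(math.exp(-a * t) for a in alphas) / M

def exact_mean(t):
    """Closed form E[exp(-alpha t)] = lam / (lam + t) for alpha ~ Exp(lam)."""
    return lam / (lam + t)

for t in (0.0, 1.0, 2.0, 3.0):
    print(t, mcs_mean(t), exact_mean(t))
```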

About the quality of the approximations

Error estimate of MCS follows immediately from the:

Central Limit Theorem (CLT)

H: Let X1, X2, . . . , XM be independent and identically distributed (i.i.d.) r.v.’s with
$E[X_i] = \mu$ and $V[X_i] = \sigma^2 < \infty$. Let

$$\bar{X}_M = \frac{1}{M} \sum_{i=1}^{M} X_i, \qquad U_M = \sqrt{M}\, \frac{\bar{X}_M - \mu}{\sigma}.$$

T: Then the distribution function of $U_M$ converges to a N(0; 1) distribution function
as $M \to \infty$:

$$U_M \xrightarrow[M \to \infty]{d} Z, \qquad Z \sim N(0; 1),$$

where d stands for convergence in distribution.


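Since $\{X(t, Z^{(i)})\}$ are i.i.d. r.v.’s for each t, with $\mu = E[X(t)]$ and $\sigma^2 = V[X(t)]$, one gets:

$$\bar{X}_M(t) = \frac{1}{M} \sum_{i=1}^{M} X(t, Z^{(i)}) = \mu + \frac{\sigma}{\sqrt{M}}\, U_M, \qquad U_M \xrightarrow[M \to \infty]{d} N(0; 1),$$

so that, for large M, $\bar{X}_M(t)$ is approximately distributed as $N\left(E[X(t)];\ \frac{V[X(t)]}{M}\right)$.

Hence the standard deviation of $\bar{X}_M(t)$ is $M^{-1/2} \sigma[X(t)]$.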

This leads to the widely adopted concept that the error convergence rate of MCS is
inversely proportional to the square root of the number of simulations.
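As an illustration of how this result is typically used (a standard consequence of the CLT, with the unknown σ[X(t)] replaced by its MCS estimate), an approximate 95% confidence interval for E[X(t)] is

$$\bar{X}_M(t) \pm 1.96\, \frac{\sigma[X_M(t)]}{\sqrt{M}}.$$

For the example with random i.c. and M = 1000, the coefficients of $e^{-t}$ reported earlier give

$$0.484646 \pm 1.96 \cdot \frac{0.285814}{\sqrt{1000}} \approx 0.484646 \pm 0.017715, \quad \text{i.e. } [0.4669,\ 0.5024],$$

which contains the exact coefficient 1/2.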


Drawbacks of MCS

The $O(1/\sqrt{M})$ convergence rate of MCS is relatively slow. Roughly speaking:
if a one-digit increase in the accuracy of the solution statistics is required, one needs to run
roughly 100 times more simulations, thus increasing the computational burden by a factor of 100.

We will check this statement in the computational examples!

For large and complex systems where the solution of a single deterministic realization
is time-consuming, this entails a tremendous numerical challenge.
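The "100 times more simulations per digit" rule can be observed numerically with a small experiment. The Python sketch below is only an illustration under assumptions: it estimates E[β] = 1/2 for β ∼ Un([0, 1]) (the value of the random i.c. example at t = 0), measures the root-mean-square error over 400 repetitions, and compares M = 10 with M = 1000; the seed and all sizes are arbitrary.

```python
import math
import random

def rmse_of_mean(M, trials, rng):
    """Root-mean-square error of the MCS mean of beta ~ Un([0, 1]) (exact mean 1/2)."""
    squared_error = 0.0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(M)) / M
        squared_error += (mean - 0.5) ** 2
    return math.sqrt(squared_error / trials)

rng = random.Random(0)                       # arbitrary seed
err_small = rmse_of_mean(10, 400, rng)       # M = 10
err_large = rmse_of_mean(1000, 400, rng)     # M = 1000, i.e. 100 times more simulations

# Theory: the error is sigma / sqrt(M) with sigma = 1 / (2 sqrt(3)),
# so the ratio of the two errors should be close to sqrt(100) = 10.
ratio = err_small / err_large
print(err_small, err_large, ratio)
```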


Advantages of MCS

The $O(1/\sqrt{M})$ convergence rate is independent of the total number of input
r.v.’s. This turns out to be an extremely useful property that virtually no other
method possesses.
Implementation of MCS is simple.
MCS relies on deterministic methods to compute the (approximate) solutions of
every trajectory. This is good news, since a large number of powerful
deterministic methods to deal with differential equations are available.

Part III


Now we present several illustrative examples where randomness enters the
r.d.e. through different terms (i.c.’s, coefficients and/or source term), considering
different probability distributions. This includes both independent and dependent
random inputs. In addition, we will consider different ways to
introduce statistical dependence, including the copula method.
Computations will be carried out with Mathematica.

We will consider the following initial value problem (i.v.p.):

Ẋ (t) = αX (t) + β ,
X (0) = γ,
The examples are:
1 γ is a r.v.: γ ∼ Un([0, 1]).
2 α is a r.v.: α ∼ Exp(λ = 2).
3 γ and α are independent r.v.’s: γ ∼ Un([0, 1]) and α ∼ N(−1; 1).
4 α and β are dependent r.v.’s: θ = (α, β) ∼ N2(µθ; Σθ), where

$$\mu_\theta = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad \Sigma_\theta = \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix}.$$
5 α and β are dependent r.v.’s generated by a kernel distribution from a sample.
6 α and β are dependent r.v.’s whose dependence structure is generated by a
copula (by Farlie–Gumbel–Morgenstern).
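As a rough sketch of how example 4 could be set up outside Mathematica, the Python code below samples the dependent pair (α, β) from the bivariate Gaussian distribution through the Cholesky factor of $\Sigma_\theta$ and evaluates the closed-form solution $X(t) = \gamma e^{\alpha t} + \beta (e^{\alpha t} - 1)/\alpha$ of the i.v.p. The choice γ = 1 (deterministic), the seed, the sample size and the function names are assumptions made only for this illustration.

```python
import math
import random

MU = (-1.0, 1.0)              # mean vector of theta = (alpha, beta)
# Cholesky factor L of Sigma = [[1, 1], [1, 4]], so that L L^T = Sigma:
# L = [[1, 0], [1, sqrt(3)]].
L11, L21, L22 = 1.0, 1.0, math.sqrt(3.0)

def sample_alpha_beta(rng):
    """Draw one dependent pair (alpha, beta) ~ N2(MU; Sigma)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    alpha = MU[0] + L11 * z1
    beta = MU[1] + L21 * z1 + L22 * z2
    return alpha, beta

def solution(t, alpha, beta, gamma):
    """Closed-form solution of X'(t) = alpha X(t) + beta, X(0) = gamma."""
    # (e^{alpha t} - 1) / alpha, with its limit t when alpha = 0.
    factor = math.expm1(alpha * t) / alpha if alpha != 0.0 else t
    return gamma * math.exp(alpha * t) + beta * factor

rng = random.Random(11)       # arbitrary seed
M = 40_000                    # assumed number of simulations
gamma = 1.0                   # assumed deterministic initial condition
draws = [sample_alpha_beta(rng) for _ in range(M)]

t = 1.0
xs = [solution(t, a, b, gamma) for a, b in draws]
mean_x = sum(xs) / M
print(mean_x)                 # MCS estimate of E[X(1)]
```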

1 D.P. Kroese, T. Taimre and Z.I. Botev (2011): Handbook of Monte Carlo
Methods, Wiley Series in Probability and Statistics, John Wiley and Sons, New
2 S.M. Ross (1990): A Course in Simulation, Macmillan, New York.

