Week 6 Handout

IS principles Example Options Path-dependent options

wi3425TU—Monte Carlo methods

L.E. Meester

Week 6


Week 6—Program for this week


Importance sampling is like a Formula 1 racing car:
if you know how to drive you can go very fast; otherwise,
you might end up in the hay (if you’re lucky).
1 Importance sampling principles
2 Continuing last week’s example
3 Importance sampling for options
How IS works with the normal distribution
Watch (out with) the weights
4 Path-dependent options
Multi-dimensional importance sampling
Steering the price path
A word of caution
Inserted at the start: points of attention for Monte Carlo
simulations that you “should” apply in your MC life, and definitely
on the exam. . .

Points of attention for Monte Carlo simulations I

1 Analyse your problem. Do this before you start programming.
Are there alternative ways to formulate it? This may lead to
alternative/better solutions and/or simulation possibilities.
2 Accuracy. Estimates should (where possible) always be
accompanied by an indication of their accuracy: standard
errors or confidence intervals (make sure the confidence level
is clear). The notation 3.12 ± 0.14 (s.e.) indicates an estimate
of 3.12 with a standard error of 0.14. If it is understood which
it is, 3.12 ± 0.14 or 3.12 (0.14) would suffice.


Points of attention for Monte Carlo simulations II

3 Significant digits. How many of the digits are significant
depends on the accuracy. For an estimate of 3.1237920388
with a standard error of 0.121298375, the 95% confidence
interval is about 3.1237920388 ± 0.2377448150. This says
that the estimate is accurate to about 0.24. Everything from
the third digit after the decimal point carries no additional
information and is therefore better omitted: the answer
3.12 ± 0.24 (95% CI) contains all the useful information.
A good rule of thumb: round the standard error (or the ±
part of the confidence interval) to two significant digits; then
state your estimate to the same precision as the rounded
standard error. So if your estimate is 9823.34 and your
standard error 327.89, you would write: 9820 ± 330 (s.e.).

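The rounding rule above is easy to automate. A small Python sketch (the course code is MATLAB; the helper name round_result is my own):

```python
import math

def round_result(estimate, se):
    """Round se to two significant digits; round estimate to the same precision."""
    magnitude = math.floor(math.log10(abs(se)))  # order of magnitude of the s.e.
    ndigits = 1 - magnitude                      # keep two significant digits
    return round(estimate, ndigits), round(se, ndigits)

print(round_result(9823.34, 327.89))            # (9820.0, 330.0)
print(round_result(3.1237920388, 0.121298375))  # (3.12, 0.12)
```

So 9823.34 with standard error 327.89 is reported as 9820 ± 330 (s.e.), exactly as in the rule of thumb above.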

Points of attention for Monte Carlo simulations III

4 Report relevant parameters with your results. If there are
parameters whose values can be set at your discretion, such as
the number of replications, the step size, or the control variate
parameter θ, one often experiments with them while
simulating. So even if the first line of the code is M=1e3, this
does not mean that this was the case for the reported results:
report the values actually used.
5 Consistency. In some situations a quantity may be estimated
in more than one way or using several methods. The resulting
estimates should then be consistent: differences between
them, expressed in the number of (appropriately computed)
standard errors, should not be too big. If they are, then
something is wrong and should be checked. . .


Points of attention for Monte Carlo simulations IV

6 Mistakes, checks. Everybody makes mistakes: errors of
reasoning, programming errors, et cetera. So, use every
opportunity you have to catch those mistakes. Never blindly
believe the final answer your program spits out, but scrutinize
intermediate results for errors. Usually, there are enough
things around that you might check. If necessary, create a
testing opportunity, for example a special case for which you
know the answer.
7 Random seeds. Initialise the random generator(s) so that
your results are reproducible. Set a seed chosen by yourself: if
we all mimic Higham and start our program with
rand(’state’,100) and randn(’state’,100), how
random is that?

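Point 7 in NumPy terms (the course uses MATLAB; this Python sketch just illustrates the idea): pick your own seed and the run becomes reproducible.

```python
import numpy as np

SEED = 271828  # a seed chosen by yourself, not one copied from a book

rng = np.random.default_rng(SEED)
first_run = rng.standard_normal(5)

rng = np.random.default_rng(SEED)   # re-seeding reproduces the same stream
second_run = rng.standard_normal(5)

print(np.array_equal(first_run, second_run))  # True
```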

Points of attention for Monte Carlo simulations V

8 Variance reduction. When applying variance reduction
methods, there are several possibilities, in order of
attractiveness: a) you know that the method will bring
reduction, and you can show this (before simulating); b) you
think that the method will bring some reduction, and you
have some arguments to support it (perhaps intuitive ones);
or c) you just try it out. Try to be aware of which of these
situations you are in, and if you “just tried something” and
afterwards realized “I could have known that this would
happen”, then try to make explicit why this is so and how you
could have known: this is how you learn to get better at this.
9 Bias. Some methods produce biased estimates: there is a
systematic deviation with respect to the unknown quantity
you are estimating. Make sure you are aware of this.


Points of attention for Monte Carlo simulations VI

10 Accuracy in the presence of bias. The Monte Carlo rule
“one hundred times as many replications gives me an
additional digit of accuracy” is no longer true when there is
bias, so blindly going for as large an M as possible is then
senseless. It is better to find a balance between the size of the
(remaining) bias and the standard error. This is a hard and
sometimes unsolvable problem; do what you can.


Importance sampling: the principle

For a function k : R → R one can determine

$$I = \int_{-\infty}^{\infty} k(x)\,f(x)\,dx$$

by Monte Carlo via I = E[k(T)], where T has density f.

Suppose: g is another probability density on R such that
g(x) > 0 whenever f(x) > 0; then

$$I = \int_{-\infty}^{\infty} k(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx.$$

This we interpret as $I = E\!\left[k(X)\,\frac{f(X)}{g(X)}\right]$, where X has pdf g.

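A minimal numerical illustration of this identity (my own example, not from the slides): estimate I = P(T > 2) for T ∼ N(0, 1), once directly and once by sampling from the shifted density g = N(2, 1) with weights f/g.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 100_000

def k(x):
    # k(x) = 1 if x > 2 else 0, so I = E[k(T)] = P(T > 2) for T ~ N(0,1)
    return (x > 2.0).astype(float)

# Crude Monte Carlo: only about 2% of the samples hit the region x > 2.
crude = k(rng.standard_normal(M)).mean()

# Importance sampling: X ~ g = N(2,1) hits x > 2 about half the time;
# the ratio of the two normal densities is w(x) = f(x)/g(x) = exp(2 - 2x).
X = rng.standard_normal(M) + 2.0
is_est = (k(X) * np.exp(2.0 - 2.0 * X)).mean()

print(crude, is_est)   # both near P(T > 2) ≈ 0.0228
```

Both estimates agree, but the importance sampling one has a much smaller standard error: the “important” region is sampled far more often.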

The likelihood ratio (LR)

The ratios

$$w(x) = \frac{f(x)}{g(x)} \quad\text{and}\quad w(X) = \frac{f(X)}{g(X)}$$

are called the likelihood ratio (LR).

We write

$$I = \int_{-\infty}^{\infty} k(x)\,w(x)\,g(x)\,dx$$

and

$$I = E[k(X)\,w(X)], \qquad X \sim g.$$


Sampling from the “wrong” distribution can be made OK

It’s OK to sample from g instead of f if we reweigh the results:

Values of X sampled from the interval (x, x + dx)
occur with frequency g(x)dx;
are multiplied by f(x)/g(x);
so contribute the correct amount f(x)dx to the integral I.

Example:
Suppose g(x0) = 2 f(x0); then
under g, values near x0 happen twice as often as they should;
the weight w(x0) = f(x0)/g(x0) = 0.5 corrects this.


How can we use this to our advantage?

We could obtain a small variance for k(X)w(X) if

$$k(x)\,w(x) = \frac{k(x)\,f(x)}{g(x)} \approx \text{constant};$$

so we should choose g(x) ≈ constant · k(x)f(x);
i.e., g(x) should be large when k(x)f(x) is large.
This makes sense intuitively, because the corresponding x-values
contribute most to the integral

$$I = \int_{-\infty}^{\infty} k(x)\,f(x)\,dx.$$

Name of the method: importance sampling (IS). We look
for a g that samples the important values more often than f does.

The zero-variance distribution: a mirage? (not on exam)

The optimal g would be

$$g(x) = \text{constant} \cdot k(x)\,f(x),$$

which is only possible if the function k is nonnegative;
this g is the so-called zero-variance density.
If you plug it into E[k(X)w(X)] you find

$$k(X)\,w(X) = \text{constant},$$

which means that the variance is zero!
However, the constant equals (the unknown) I . . .
Even though this may look stupid now, it is a very useful guideline
(look up “approximate zero-variance”).


Last week: tried IS on determining π


Earlier we used Y = 4√(1 − U²) with U ∼ U(0, 1) to
estimate π: E[Y] = ∫₀¹ 4√(1 − x²) dx = π and Var(Y) ≈ 0.8.

This fits the framework: k(x) = 4√(1 − x²), and f(x) = 1 for
0 ≤ x ≤ 1, f(x) = 0 elsewhere, so that E[k(U)] = I.
Last week we tried for g:
g₁(x) = (3/2)(1 − x²): failed, we did not see how to simulate from it;
g₂(x) = (2/3)(2 − x): worked, variance ≈ 0.25; see SchatPi_IS2.m.

Two more g : [0, 1] → R are instructive to explore:
g(x) ∝ √(1 − x²);
g(x) ∝ 1 − x.


IS for π: g(x) ∝ √(1 − x²)

Plan: try g(x) ∝ √(1 − x²) (where ∝ means “proportional to”);
get the distribution function G; then we can simulate from g using
the inverse distribution function method: solve G(x) = u, etc.

Set g(x) = √(1 − x²). With Maple we find:

$$G(t) = \int_0^t g(x)\,dx = \frac{1}{2}\,t\sqrt{1-t^2} + \frac{1}{2}\arcsin(t).$$

We forgot to normalize G: divide by G(1), which equals ¼π.

Even if you could solve G(t) = u, the answer involves our
unknown π . . .
This happens if you try to get the zero-variance g: a vicious circle.


IS for π: g (x) ∝ 1 − x

If we take g₁(x) = 2(1 − x) for 0 ≤ x ≤ 1, we find
G₁(x) = 1 − (1 − x)² for 0 ≤ x ≤ 1, and

$$G_1(x) = u \iff 1 - u = (1 - x)^2 \iff x = 1 - \sqrt{1 - u};$$

so X = 1 − √(1 − U) has cdf G₁.
If we simulate (SchatPi_IS.m) we find:
Var(k(X)w(X)) ≈ 2.14, a deterioration from 0.797.
Explanation: for this X,

$$k(X)\,w(X) = 2\sqrt{\frac{1+X}{1-X}},$$

and this goes to ∞ as X approaches 1.

[Figure: plot of k(x)w(x) = 2√((1 + x)/(1 − x)) on [0, 1), blowing up near x = 1.]
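The deterioration is easy to reproduce. A Python sketch along the lines of SchatPi_IS.m (my reconstruction; the .m file itself is not shown in the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 1_000_000
U = rng.random(M)

# Plain estimator: k(U) = 4*sqrt(1 - U^2), with E[k(U)] = pi, Var ≈ 0.797.
plain = 4.0 * np.sqrt(1.0 - U**2)

# IS with g(x) = 2(1 - x): X = 1 - sqrt(1 - U) has cdf G(x) = 1 - (1 - x)^2,
# and the weighted samples are k(X) * w(X) = 2*sqrt((1 + X)/(1 - X)).
X = 1.0 - np.sqrt(1.0 - U)
weighted = 2.0 * np.sqrt((1.0 + X) / (1.0 - X))

print(plain.mean(), plain.var())        # ≈ pi, ≈ 0.797
print(weighted.mean(), weighted.var())  # ≈ pi, ≈ 2.1: worse!
```

Both estimators are unbiased, but the IS version has a larger variance, confirming the slide's warning that a poorly chosen g makes things worse.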


A simple example of how importance sampling may help

For the price of a European call option we can write V = E[k(Z)],
with Z standard normal and

$$k(Z) = e^{-rT} \max\!\left(S_0\, e^{\left(r - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,Z} - E,\; 0\right).$$

Suppose we want to determine the price by simulation, for several
high strikes. Parameters: S₀ = 10, σ = 0.1, r = 0.06, T = 1.
Strikes from E = 9 to E = 17; see EurCall_IS.m.

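A crude Monte Carlo version of this experiment, as a Python sketch (EurCall_IS.m is MATLAB and not shown; M = 10⁶ is my guess based on the reported standard errors):

```python
import numpy as np

rng = np.random.default_rng(3)
S0, sigma, r, T = 10.0, 0.1, 0.06, 1.0
M = 1_000_000   # number of replications (my choice)

def eur_call_crude(E):
    """Crude MC estimate of V = E[k(Z)], with its standard error."""
    Z = rng.standard_normal(M)
    ST = S0 * np.exp((r - sigma**2 / 2) * T + sigma * np.sqrt(T) * Z)
    disc_payoff = np.exp(-r * T) * np.maximum(ST - E, 0.0)
    return disc_payoff.mean(), disc_payoff.std(ddof=1) / np.sqrt(M)

for E in (9, 11, 13, 15, 17):
    V, se = eur_call_crude(E)
    print(E, V, se)
```

For high strikes the estimate rests on a handful of non-zero payoffs, which is exactly the problem analysed on the next slides.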

Output from EurCall_IS.m:

strike   V̂          s.e.        s.e. as fraction of V̂
  9      1.54269     0.00097     0.00063
 11      0.25080     0.00050     0.00199
 13      0.008774    0.000087    0.00986
 15      0.0000824   0.0000075   0.09072
 17      0           0           NaN

The price drops off and the relative standard error (s.e. divided by
estimate) increases: 9% at E = 15; at E = 17: error!


Analysis of the simulation

Payoff zero means we are out of the money: S(T) ≤ E.
Writing things out we get S(T) = 10 · exp(0.055 + 0.1 · Z), so

$$k(Z) = 0 \iff S(T) \le E \iff 0.055 + 0.1\,Z \le \ln(E/10).$$

For E = 15 this applies if Z ≤ 3.50.
Therefore, we are trying to estimate E[k(Z)] where

$$P(k(Z) = 0) = P(Z \le 3.5) \approx 0.9998.$$

The majority of the simulated values is ZERO! . . .
At E = 17, apparently all of them. . . .



Importance sampling for the normal, with a shift

In the preceding example the simulated prices were too low to
generate enough non-zero payoffs.
We want to generate higher values of S(T) more frequently.
Writing k(z) for the discounted payoff and $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$,
our importance sampling equations are:

$$V = \int_{-\infty}^{\infty} k(z)\,f(z)\,dz = \int_{-\infty}^{\infty} k(y)\,\frac{f(y)}{g(y)}\,g(y)\,dy = E[k(Y)\,w(Y)],$$

where

$$w(Y) = \frac{f(Y)}{g(Y)} \quad\text{and } Y \text{ has density } g.$$


Let’s be simple: set Y ∼ N(µ, 1), so $g(y) = \frac{1}{\sqrt{2\pi}} e^{-(y-\mu)^2/2}$,
with µ to be determined. Then:

$$w(y) = \frac{f(y)}{g(y)} = \frac{\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2}}{\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(y-\mu)^2}} = e^{-\frac{1}{2}y^2 + \frac{1}{2}(y-\mu)^2} = e^{\frac{1}{2}\mu^2 - \mu y}.$$

So

$$V = E\!\left[k(Y)\,e^{\frac{1}{2}\mu^2 - \mu Y}\right], \qquad Y \sim N(\mu, 1),$$

may serve as a basis for an importance sampling simulation.
See: EurCall_IS2.m (µ = 3.5).
Even better is a µ that depends on the strike: choose µ so that
P(k(Y) > 0) = 0.5 for each E (EurCall_IS3.m).

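In Python this looks as follows (a sketch of what EurCall_IS3.m presumably does; M = 10⁵ is my choice):

```python
import numpy as np

rng = np.random.default_rng(4)
S0, sigma, r, T = 10.0, 0.1, 0.06, 1.0
M = 100_000

def mu_for(E):
    """Choose mu so that P(k(Y) > 0) = 0.5, i.e. the median of S(T) equals E."""
    return (np.log(E / S0) - (r - sigma**2 / 2) * T) / (sigma * np.sqrt(T))

def eur_call_is(E, mu):
    Y = rng.standard_normal(M) + mu                  # Y ~ N(mu, 1)
    ST = S0 * np.exp((r - sigma**2 / 2) * T + sigma * np.sqrt(T) * Y)
    w = np.exp(mu**2 / 2 - mu * Y)                   # likelihood ratio f/g
    vals = np.exp(-r * T) * np.maximum(ST - E, 0.0) * w
    return vals.mean(), vals.std(ddof=1) / np.sqrt(M)

for E in (13, 15, 17):
    mu = mu_for(E)
    V, se = eur_call_is(E, mu)
    print(E, round(mu, 2), V, se, se / V)
```

Note that mu_for(15) ≈ 3.50 and mu_for(17) ≈ 4.76, matching the µ column in the table on the next slide.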

Results

EurCall_IS3.m: for each E, choose µ so that P(k(Y) > 0) = 0.5.

   µ      E    V̂          s.e.        rel. s.e.
-1.60     9    1.5370     0.0116      0.0075
 0.40    11    0.2510     3.18e-04    0.0013
 2.07    13    0.0088     1.02e-05    0.0012
 3.50    15    8.29e-05   1.14e-07    0.0014
 4.76    17    3.13e-07   4.96e-10    0.0016

Except for the first two strikes, the relative s.e. has improved.
Our criterion is somewhat arbitrary; the idea is to have
enough samples that end in the money, but not too many (try
this out to see for yourself).
There is another point of attention: the weights.



A few remarks about the weights

The expected value of the weights is 1:

$$E[w(Y)] = \int w(y)\,g(y)\,dy = \int \frac{f(y)}{g(y)}\,g(y)\,dy = \int f(y)\,dy = 1.$$

But the distribution can be so skewed (shown below for µ = 3.5)
that we can only look at it on a log scale.

[Figure: histogram of the 10-logs of the simulated weights.]

Note that weights between 10⁻⁶ and 10 are all quite common.

A warning: we need to watch the weight distribution

Some samples have a weight 10⁷ times that of others. . .
This potentially means trouble: imagine
1 observation with weight 10;
999999 observations with weight 10⁻⁶.
Recall that the sample mean and standard deviation are not very
robust to outliers: if the weight distribution is very extreme,
these estimators break down.
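Numerically (for µ = 3.5, as above): ln w(Y) = µ²/2 − µY is normal with mean −µ²/2 and standard deviation µ, so the weights are log-normal and spread over many decades. A Python sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, M = 3.5, 100_000

Y = rng.standard_normal(M) + mu
w = np.exp(mu**2 / 2 - mu * Y)      # log-normal weights, E[w] = 1

log10w = np.log10(w)                 # ~ N(-mu^2/(2 ln 10), (mu/ln 10)^2)
print(np.median(log10w))             # ≈ -2.66: the typical weight is ~10^-2.7
print(log10w.min(), log10w.max())    # the sample spans many decades
print(w.mean())                      # E[w] = 1, but Var(w) = e^{mu^2} - 1 is
                                     # huge, so the sample mean is unstable
```

The last line shows the danger directly: although E[w(Y)] = 1, the sample mean of the weights fluctuates wildly because a few samples carry almost all the weight.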


Importance sampling: multi-dimensional

Suppose I = E[k(Z₁, . . . , Z_m)], where Z_i has density f_i and
Z₁, . . . , Z_m are independent. Then, just as before:

$$I = E[k(Z_1, \dots, Z_m)] = E[k(Y_1, \dots, Y_m)\,w(Y_1, \dots, Y_m)],$$

where Y_i has density g_i, the Y₁, . . . , Y_m are independent, and

$$w(y_1, \dots, y_m) = \frac{f(y_1, \dots, y_m)}{g(y_1, \dots, y_m)} = \frac{\prod_{i=1}^m f_i(y_i)}{\prod_{i=1}^m g_i(y_i)} = \prod_{i=1}^m \frac{f_i(y_i)}{g_i(y_i)}.$$

Requirement, again: the denominator must be positive for all
y₁, . . . , y_m for which the numerator is positive.


The choice of g1 , . . . , gm

A good choice for g₁, . . . , g_m puts larger weight on large
values of k · f (in the ideal situation the quotient is constant).
In practice this is not always easy.
Intuitive principle: make sure that important values occur
(more) often (compare with the earlier example).
For path-dependent options: make sure that paths for which
the option ends in the money have a large(r) probability of
occurring.


Application: Path dependent options

Based on the risk-neutral asset price model

$$S(t_{i+1}) = S(t_i)\, e^{(r - \sigma^2/2)(t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\,Z_i}, \qquad i = 0, \dots, n-1, \quad (1)$$

for S(t₀), . . . , S(t_n), many option prices may be written as

V = E[k(Z₁, . . . , Z_m)] with Z₁, . . . , Z_m independent N(0, 1).

Suppose we take Y₁, . . . , Y_m independent, with Y_i ∼ N(µ_i, 1);
then

$$V = E[k(Y_1, \dots, Y_m)\,w(Y_1, \dots, Y_m)].$$

Weight function:

$$w(y_1, \dots, y_m) = \frac{f(y_1, \dots, y_m)}{g(y_1, \dots, y_m)} = \prod_{i=1}^m \frac{f_i(y_i)}{g_i(y_i)}.$$


The weight function is a product of factors we already saw:

$$w_i(y_i) = \frac{f_i(y_i)}{g_i(y_i)} = \exp\!\left(\tfrac{1}{2}\mu_i^2 - \mu_i y_i\right),$$

so

$$w(y_1, \dots, y_m) = e^{\frac{1}{2}\mu_1^2 - \mu_1 y_1} \cdots e^{\frac{1}{2}\mu_m^2 - \mu_m y_m}.$$

The value k(Y₁, . . . , Y_m) gets the weight

$$w(Y_1, \dots, Y_m) = \prod_{i=1}^m e^{\frac{1}{2}\mu_i^2 - \mu_i Y_i} = \exp\!\left(\tfrac{1}{2}\sum_{i=1}^m \mu_i^2 - \sum_{i=1}^m \mu_i Y_i\right).$$

This is just the product of what we get in the one-dimensional case.


The parameters µ_i should be selected to give “important” paths a
larger chance of occurring.
Looking at

$$S(t_{i+1}) = S(t_i)\, e^{(r - \sigma^2/2)(t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\,Z_i},$$

we see: simulating with Y_i instead of Z_i boils down to
adding an extra drift term µ_i σ√(t_{i+1} − t_i) in the exponent.
This is so because Y_i has the same distribution as Z_i + µ_i, so
it is as if Z_i is replaced by Z_i + µ_i.
With this, one can get some idea of the effect (just as in the
call option example earlier):
with µ_i > 0 we add extra upward drift, and
with µ_i < 0 downward drift.


How do you “steer” the price path with IS?

Assume a fixed stepsize ∆t, T = N · ∆t.

Steering: suppose at time T (not necessarily “expiration”) we
want P(S(T) > E) ≈ 0.5. The current asset price model:

$$S(T) = S_0 \cdot \exp\!\Big(\big(r - \tfrac{1}{2}\sigma^2\big)\,T + \sigma\sqrt{\Delta t}\,\sum_{j=1}^{N} Z_j\Big).$$

Importance sampling: Z_j becomes Y_j = Z_j + µ_j, so the
exponent becomes:

$$\big(r - \tfrac{1}{2}\sigma^2\big)\,T + \sigma\sqrt{\Delta t}\,\sum_{j=1}^{N} Z_j + \sigma\sqrt{\Delta t}\,\sum_{j=1}^{N} \mu_j$$

(the last sum is the extra drift term).


The exponent:

$$\big(r - \tfrac{1}{2}\sigma^2\big)\,T + \sigma\sqrt{\Delta t}\,\sum_{j=1}^{N} Z_j + \sigma\sqrt{\Delta t}\,\sum_{j=1}^{N} \mu_j.$$

The rest is just computation: the exponent has a normal
distribution, so we should try to get its median (which
corresponds to Z_j = 0) to equal ln(E/S₀).
So, solve:

$$\big(r - \tfrac{1}{2}\sigma^2\big)\,T + \sigma\sqrt{\Delta t}\,\sum_{j=1}^{N} \mu_j = \ln(E/S_0).$$

For the common choice µ_j = µ this leads to

$$\mu = \frac{\ln(E/S_0) - \big(r - \tfrac{1}{2}\sigma^2\big)\,T}{\sigma N \sqrt{\Delta t}}.$$

A numerical example: S₀ = 5, E = 6, σ = 0.3, r = 0.05,
T = 1, ∆t = 10⁻³, µ_j = µ for all j; then

$$0.005 + 3\sqrt{10}\,\mu = \ln(6/5), \quad\text{or}\quad \mu \approx 0.0187.$$

See ch19_IS.m for an implementation example.

A variant: given S(t₀), how do we accomplish that at time t₁
the median of S(t₁) is at the barrier level B?
Assume the stepsize is ∆t and t₁ − t₀ = N₁∆t. This leads to
a slightly modified equation and solution:

$$\mu = \frac{\ln(B/S(t_0)) - \big(r - \tfrac{1}{2}\sigma^2\big)(t_1 - t_0)}{\sigma N_1 \sqrt{\Delta t}}.$$

For a down-and-in call with a barrier quite far below S₀, one might
want to first steer the path down to hit the barrier and then
up to be in the money at expiration.
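Putting the pieces together: a Python sketch of a steered path simulation (in the spirit of ch19_IS.m as I read the slides; here the payoff is a plain call on S(T), so the estimate can be checked against the Black–Scholes value of about 0.345):

```python
import numpy as np

rng = np.random.default_rng(6)
S0, E, sigma, r, T = 5.0, 6.0, 0.3, 0.05, 1.0
dt = 1e-3
N = round(T / dt)
M = 10_000

# Constant shift per step so that the median of S(T) equals the strike E.
mu = (np.log(E / S0) - (r - sigma**2 / 2) * T) / (sigma * N * np.sqrt(dt))
print(round(mu, 4))                       # ≈ 0.0187, as on the slide

Y = rng.standard_normal((M, N)) + mu      # Y_j = Z_j + mu, one row per path
sumY = Y.sum(axis=1)
ST = S0 * np.exp((r - sigma**2 / 2) * T + sigma * np.sqrt(dt) * sumY)

# Product of the per-step likelihood ratios: exp(N*mu^2/2 - mu * sum_j Y_j).
w = np.exp(N * mu**2 / 2 - mu * sumY)

vals = np.exp(-r * T) * np.maximum(ST - E, 0.0) * w
print(vals.mean(), vals.std(ddof=1) / np.sqrt(M))   # price and standard error
```

For a genuinely path-dependent payoff you would keep the cumulative sums (the whole path) instead of only sumY; the weight is unchanged.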


Overview importance sampling

Importance sampling: for E[k(X)], purposely simulating X
from a distribution different from the intended one.
To correct for this, simulated values k(X) get weight w(X).
Weights: realizations both smaller and larger than 1 occur.
w is the ratio of two probability densities and is therefore also
called the likelihood ratio or LR.
E[w(X)] = 1, and also E[w(Y₁, . . . , Y_m)] = 1: on average the
weight is 1.
The method is a bit more delicate than, for example,
antithetic variables.
If you push g too far away from f, the method may fail:


A warning and a rule of thumb

IS estimators are always unbiased; variance reduction is not
guaranteed.
Pushed too far, the distribution of the weights w(Y₁, . . . , Y_m)
becomes very skewed, and Var(w(Y₁, . . . , Y_m)) may become
so big that the variance of the IS estimator blows up.
How can this be avoided? Check the range of the
distribution of the weights, either theoretically (in the examples
they have a log-normal distribution) or by looking at a
histogram of (the 10-logs of) the simulated weights.
See EurCall_IS2.m and ch19_IS.m.
A (conservative) rule of thumb: with weights between 10⁻⁵
and 10² you are OK; (far) outside this range you are at risk.
(For some strikes EurCall_IS3.m is on the borderline.)
