
Advanced Probabilistic Numerical Methods in Finance

M2MO, Term 3, February-March 2020

Jean-François Chassagneux∗

- Course documents: on Moodle

∗ Université de Paris, U.F.R. de Mathématiques, chassagneux@lpsm.paris

References

[1] Frédéric Abergel and Rémi Tachet. A nonlinear partial integro-differential equation from mathematical finance. Discrete and Continuous Dynamical Systems - Series A, 27(3):907–917, 2010.

[2] Yacine Aït-Sahalia. Testing continuous-time models of the spot interest rate. The Review of Financial Studies, 9(2):385–426, 1996.

[3] Aurélien Alfonsi et al. Affine Diffusions and Related Processes: Simulation, Theory and Applications, volume 6. Springer, 2015.

[4] Aurélien Alfonsi, Benjamin Jourdain, and Arturo Kohatsu-Higa. Pathwise optimal transport bounds between a one-dimensional diffusion and its Euler scheme. The Annals of Applied Probability, 24(3):1049–1080, 2014.

[5] Vlad Bally and Gilles Pagès. Error analysis of the optimal quantization algorithm for obstacle problems. Stochastic Processes and their Applications, 106(1):1–40, 2003.

[6] Vlad Bally, Gilles Pagès, et al. A quantization algorithm for solving multidimensional discrete-time optimal stopping problems. Bernoulli, 9(6):1003–1049, 2003.

[7] Vlad Bally and Denis Talay. The law of the Euler scheme for stochastic differential equations. Probability Theory and Related Fields, 104(1):43–60, 1996.

[8] Denis Belomestny et al. Solving optimal stopping problems via empirical dual optimization. The Annals of Applied Probability, 23(5):1988–2019, 2013.

[9] Denis Belomestny and John Schoenmakers. Projected particle methods for solving McKean–Vlasov stochastic differential equations. SIAM Journal on Numerical Analysis, 56(6):3169–3195, 2018.

[10] Jean-Michel Bismut. Conjugate convex functions in optimal stochastic control. Journal of Mathematical Analysis and Applications, 44(2):384–404, 1973.

[11] Jean-Michel Bismut. Contrôle des systèmes linéaires quadratiques: applications de l'intégrale stochastique. In Séminaire de Probabilités XII, pages 180–264. Springer, 1978.

[12] François Bolley. Separability and completeness for the Wasserstein distance. In Séminaire de Probabilités XLI, pages 371–377. Springer, 2008.

[13] Mireille Bossy and Denis Talay. Convergence rate for the approximation of the limit law of weakly interacting particles: application to the Burgers equation. The Annals of Applied Probability, 6(3):818–861, 1996.

[14] Mireille Bossy and Denis Talay. A stochastic particle method for the McKean–Vlasov and the Burgers equation. Mathematics of Computation of the American Mathematical Society, 66(217):157–192, 1997.

[15] Bruno Bouchard, Jean-François Chassagneux, et al. Fundamentals and Advanced Techniques in Derivatives Hedging. Springer, 2016.

[16] Bruno Bouchard, Ivar Ekeland, and Nizar Touzi. On the Malliavin approach to Monte Carlo approximation of conditional expectations. Finance and Stochastics, 8(1):45–71, 2004.

[17] Bruno Bouchard and Nizar Touzi. Discrete-time approximation and Monte-Carlo simulation of backward stochastic differential equations. Stochastic Processes and their Applications, 111(2):175–206, 2004.

[18] Gerard Brunick and Steven Shreve. Mimicking an Itô process by a solution of a stochastic differential equation. The Annals of Applied Probability, 23(4):1584–1628, 2013.

[19] René Carmona, François Delarue, Gilles-Edouard Espinosa, and Nizar Touzi. Singular forward–backward stochastic differential equations and emissions derivatives. The Annals of Applied Probability, 23(3):1086–1128, 2013.

[20] René Carmona, François Delarue, et al. Probabilistic Theory of Mean Field Games with Applications I-II. Springer, 2018.

[21] René Carmona, Jean-Pierre Fouque, and Li-Hsien Sun. Mean field games and systemic risk. Communications in Mathematical Sciences, 13(4):911–933, 2015.

[22] Jean-François Chassagneux. Linear multistep schemes for BSDEs. SIAM Journal on Numerical Analysis, 52(6):2815–2836, 2014.

[23] Jean-François Chassagneux, Adrien Richou, et al. Numerical simulation of quadratic BSDEs. The Annals of Applied Probability, 26(1):262–304, 2016.

[24] Jean-François Chassagneux, Lukasz Szpruch, and Alvin Tse. Weak quantitative propagation of chaos via differential calculus on the space of measures. arXiv preprint arXiv:1901.02556, 2019.

[25] Emmanuelle Clément, Damien Lamberton, and Philip Protter. An analysis of a least squares regression method for American option pricing. Finance and Stochastics, 6(4):449–471, 2002.

[26] D. Crisan and K. Manolarakis. Solving backward stochastic differential equations using the cubature method: application to nonlinear pricing. In Progress in Analysis and its Applications, pages 389–397. World Scientific, 2010.

[27] Dan Crisan and Konstantinos Manolarakis. Solving backward stochastic differential equations using the cubature method: application to nonlinear pricing. SIAM Journal on Financial Mathematics, 3(1):534–571, 2012.

[28] Dan Crisan, Konstantinos Manolarakis, et al. Second order discretization of backward SDEs and simulation with the cubature method. The Annals of Applied Probability, 24(2):652–678, 2014.

[29] Dan Crisan, Konstantinos Manolarakis, and Nizar Touzi. On the Monte Carlo simulation of BSDEs: an improvement on the Malliavin weights. Stochastic Processes and their Applications, 120(7):1133–1158, 2010.

[30] P.E. Chaudru de Raynal and C.A. Garcia Trillos. A cubature based algorithm to solve decoupled McKean–Vlasov forward–backward stochastic differential equations. Stochastic Processes and their Applications, 125(6):2206–2255, 2015.

[31] Nicole El Karoui, Shige Peng, and Marie Claire Quenez. Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71, 1997.

[32] Michael B. Giles. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607–617, 2008.

[33] Paul Glasserman. Monte Carlo Methods in Financial Engineering, volume 53. Springer Science & Business Media, 2013.

[34] E. Gobet and P. Turkedjiev. Adaptive importance sampling in least-squares Monte Carlo algorithms for backward stochastic differential equations. Stochastic Processes and their Applications, 127(4):1171–1203, 2017.

[35] Emmanuel Gobet. Weak approximation of killed diffusion using Euler schemes. Stochastic Processes and their Applications, 87(2):167–197, 2000.

[36] Emmanuel Gobet. Monte-Carlo Methods and Stochastic Processes: From Linear to Non-linear. CRC Press, 2016.

[37] Emmanuel Gobet, Jean-Philippe Lemor, Xavier Warin, et al. A regression-based Monte Carlo method to solve backward stochastic differential equations. The Annals of Applied Probability, 15(3):2172–2202, 2005.

[38] Emmanuel Gobet, José G. López-Salas, Plamen Turkedjiev, and Carlos Vázquez. Stratified regression Monte-Carlo scheme for semilinear PDEs and BSDEs with large scale parallelization on GPUs. SIAM Journal on Scientific Computing, 38(6):C652–C677, 2016.

[39] Emmanuel Gobet and Plamen Turkedjiev. Linear regression MDP scheme for discrete backward stochastic differential equations under general conditions. Mathematics of Computation, 85(299):1359–1391, 2016.

[40] Emmanuel Gobet, Plamen Turkedjiev, et al. Approximation of backward stochastic differential equations using Malliavin weights and least-squares regression. Bernoulli, 22(1):530–562, 2016.

[41] S. Graf, H. Luschgy, et al. Asymptotics of the quantization errors for self-similar probabilities. Real Analysis Exchange, 26(2):795–810, 2000.

[42] Carl Graham, Thomas G. Kurtz, Sylvie Méléard, Philip Protter, and Mario Pulvirenti. Probabilistic Models for Nonlinear Partial Differential Equations: Lectures Given at the 1st Session of the Centro Internazionale Matematico Estivo (CIME) Held in Montecatini Terme, Italy, May 22–30, 1995. Springer, 2006.

[43] J. Guyon and P. Henry-Labordère. Non linear pricing. CRC Financial Mathematics, 2014.

[44] Julien Guyon and Pierre Henry-Labordère. The smile calibration problem solved. Available at SSRN 1885032, 2011.

[45] Julien Guyon and Pierre Henry-Labordère. Being particular about calibration. Risk, 25(1):88, 2012.

[46] István Gyöngy. Mimicking the one-dimensional marginal distributions of processes having an Itô differential. Probability Theory and Related Fields, 71(4):501–516, 1986.

[47] Martin B. Haugh and Leonid Kogan. Pricing American options: a duality approach. Operations Research, 52(2):258–270, 2004.

[48] Martin Hutzenthaler, Arnulf Jentzen, and Peter E. Kloeden. Strong and weak divergence in finite time of Euler's method for stochastic differential equations with non-globally Lipschitz continuous coefficients. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 467(2130):1563–1576, 2010.

[49] Benjamin Jourdain and Alexandre Zhou. Existence of a calibrated regime switching local volatility model and new fake Brownian motions. arXiv preprint arXiv:1607.00077, 2016.

[50] Rafail Khasminskii. Stochastic Stability of Differential Equations, volume 66. Springer Science & Business Media, 2011.

[51] P. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer, 1992.

[52] Vassili N. Kolokoltsov. Nonlinear Markov Processes and Kinetic Equations, volume 182. Cambridge University Press, 2010.

[53] Hiroshi Kunita. Stochastic Flows and Stochastic Differential Equations, volume 24. Cambridge University Press, 1997.

[54] Daniel Lacker, Mykhaylo Shkolnikov, and Jiacheng Zhang. Inverting the Markovian projection, with an application to local stochastic volatility models. arXiv preprint arXiv:1905.06213, 2019.

[55] V. Lemaire and G. Pagès. Multilevel Richardson-Romberg extrapolation. ArXiv e-prints, January 2014.

[56] Jean-Philippe Lemor, Emmanuel Gobet, Xavier Warin, et al. Rate of convergence of an empirical regression method for solving generalized backward stochastic differential equations. Bernoulli, 12(5):889–916, 2006.

[57] A. Lipton. The vol-smile problem. Risk, 15:61–65, 2002.

[58] Yating Liu. Optimal Quantization: Limit Theorems, Clustering and Simulation of the McKean-Vlasov Equation. PhD thesis, Sorbonne Université, UPMC; Laboratoire de Probabilités, Statistique et ..., 2019.

[59] Francis A. Longstaff and Eduardo S. Schwartz. Valuing American options by simulation: a simple least-squares approach. The Review of Financial Studies, 14(1):113–147, 2001.

[60] Harald Luschgy, Gilles Pagès, et al. Functional quantization rate and mean regularity of processes with an application to Lévy processes. The Annals of Applied Probability, 18(2):427–469, 2008.

[61] Jin Ma and Jiongmin Yong. Forward-Backward Stochastic Differential Equations and their Applications. Number 1702. Springer Science & Business Media, 1999.

[62] Gilles Pagès. Quadratic optimal functional quantization of stochastic processes and numerical applications. In Monte Carlo and Quasi-Monte Carlo Methods 2006, pages 101–142. Springer, 2008.

[63] Gilles Pagès. Numerical Probability: An Introduction with Applications to Finance. Springer, 2018.

[64] Gilles Pagès and Abass Sagna. Improved error bounds for quantization based numerical schemes for BSDE and nonlinear filtering. Stochastic Processes and their Applications, 128(3):847–883, 2018.

[65] Étienne Pardoux and Shige Peng. Adapted solution of a backward stochastic differential equation. Systems & Control Letters, 14(1):55–61, 1990.

[66] Étienne Pardoux and Shige Peng. Backward stochastic differential equations and quasilinear parabolic partial differential equations. In Stochastic Partial Differential Equations and their Applications, pages 200–217. Springer, 1992.

[67] Étienne Pardoux and Aurel Răşcanu. Stochastic Differential Equations, Backward SDEs, Partial Differential Equations, volume 69. Springer, 2014.

[68] Huyên Pham. Continuous-time Stochastic Control and Optimization with Financial Applications, volume 61. Springer Science & Business Media, 2009.

[69] Vladimir Piterbarg. Markovian projection method for volatility calibration. Available at SSRN 906473, 2006.

[70] Daniel Revuz and Marc Yor. Continuous Martingales and Brownian Motion, volume 293. Springer Science & Business Media, 2013.

[71] Leonard C.G. Rogers. Monte Carlo valuation of American options. Mathematical Finance, 12(3):271–286, 2002.

[72] Alain-Sol Sznitman. Topics in propagation of chaos. In École d'été de probabilités de Saint-Flour XIX–1989, pages 165–251. Springer, 1991.

[73] Denis Talay and Luciano Tubaro. Expansion of the global error for numerical schemes solving stochastic differential equations. Stochastic Analysis and Applications, 8(4):483–509, 1990.

[74] John N. Tsitsiklis and Benjamin Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12(4):694–703, 2001.

[75] Cédric Villani. Optimal Transport: Old and New, volume 338. Springer Science & Business Media, 2008.

[76] Jianfeng Zhang et al. A numerical scheme for BSDEs. The Annals of Applied Probability, 14(1):459–488, 2004.

[77] Alexandre Zhou. Étude théorique et numérique de problèmes non linéaires au sens de McKean en finance. PhD thesis, Paris Est, 2018.
Notations

$\mathcal{S}^2_c$: the space of stochastic processes $X$ satisfying

$$\mathbb{E}\Big[\sup_{t\in[0,T]} |X_t|^2\Big] < \infty$$

and whose sample paths are continuous.
Part I

Handouts

1 Introduction

These handouts have a non-empty intersection with some textbooks: [51, 36, 63, 33].

2 Review of the linear case

2.1 Mathematical & Financial Framework

• $(\Omega, \mathcal{A}, \mathbb{P})$ is a probability space supporting a $d$-dimensional Brownian motion $W$; $\mathbb{F}$ is the filtration generated by $W$.

• $T > 0$ is a terminal date.

2.1.1 SDEs

• We consider on $[0,T]$ the following SDE

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \tag{2.1}$$

where $b$ and $\sigma$ are measurable functions.

• (HL)(i): $\sigma$ and $b$ are Lipschitz continuous in time and space:

$$|b(t,x) - b(t',x')| + |\sigma(t,x) - \sigma(t',x')| \le L\big(|x - x'| + |t - t'|\big).$$

Theorem 2.1. (Strong existence and uniqueness for SDEs) Under (HL)(i), there exists a unique¹ continuous $\mathbb{F}$-adapted process $X$ taking its values in $\mathbb{R}^d$ such that

$$X_t^i = x_0^i + \int_0^t b^i(s, X_s)\,ds + \sum_{j=1}^d \int_0^t \sigma^{ij}(s, X_s)\,dW_s^j. \tag{2.2}$$

¹ Up to indistinguishability.
2.1.2 Useful estimates under (HL)

We use the Hölder and BDG inequalities and Gronwall's Lemma to obtain them.

• For all $p \ge 1$,

$$\mathbb{E}\Big[\sup_{t\in[0,T]} |X_t|^p\Big] \le C_T\big(1 + \mathbb{E}[|X_0|^p]\big). \tag{2.3}$$

• Time regularity:

$$\max_i\, \mathbb{E}\Big[\sup_{t\in[t_i,t_{i+1}]} |X_t - X_{t_i}|^2\Big] \le C\,|t_{i+1} - t_i|. \tag{2.4}$$

Stochastic flow. Observe that we can define, for all $(s,x) \in [0,T]\times\mathbb{R}^d$, the solution to

$$X_t^{s,x} = x + \int_s^t b(r, X_r^{s,x})\,dr + \int_s^t \sigma(r, X_r^{s,x})\,dW_r, \quad s \le t \le T,$$

and, by convention, $X_t^{s,x} = x$ for $t \in [0,s]$.

The flow of the SDE is the mapping $(s,x,t) \mapsto X_t^{s,x}$.

It has several important properties:

• Let $1 \le p < \infty$. There exists a constant $C_p$ such that

$$\mathbb{E}\Big[\sup_{t\in[0,T]} |X_t^{s,x}|^p\Big]^{\frac1p} \le C_p(1 + |x|). \tag{2.5}$$

• Let $2 \le p < \infty$. There exists a constant $C_p$ such that for all $(s,x,t)$ and $(s',x',t')$,

$$\mathbb{E}\big[|X_t^{s,x} - X_{t'}^{s',x'}|^p\big] \le C_p\Big(|x-x'|^p + (1 + |x|^p \vee |x'|^p)\big(|s-s'|^{\frac p2} + |t-t'|^{\frac p2}\big)\Big). \tag{2.6}$$
• The mapping $(s,x) \mapsto X^{s,x}$, with values in $\mathcal{S}^2_c$, is continuous.

• Let $x \in \mathbb{R}^d$; we observe, for all $0 \le r \le s \le t$,

$$X_t^{r,x} = X_t^{s, X_s^{r,x}}, \quad \mathbb{P}\text{-a.s.}$$

• Let $(s,x) \in [0,T]\times\mathbb{R}^d$. Then $(X_t^{s,x})_{0\le t\le T}$ is a Markov process and, if $f$ is a bounded measurable function, then for $s \le r \le t$

$$\mathbb{E}\big[f(X_t^{s,x})\,\big|\,\mathcal{F}_r\big] = \Lambda(r, X_r^{s,x}) \quad \mathbb{P}\text{-a.s.},$$

with $\Lambda(r,y) = \mathbb{E}[f(X_t^{r,y})]$.
2.1.3 Link with PDEs

• $\mathcal{L}^X$ is the Dynkin operator associated to $X$:

$$\mathcal{L}^X\varphi(t,x) = b(t,x)\,\partial_x\varphi(t,x) + \frac12\,\mathrm{Tr}\big[\partial^2_{xx}\varphi(t,x)\,a(t,x)\big] \tag{2.7}$$

for $\varphi \in C^2$, where $a := \sigma\sigma^\dagger$.

Theorem 2.2. (Feynman-Kac for parabolic PDEs - Cauchy condition) Assume that (HL)(i) holds true, $g \in C_p^0$, and that there is a function $u \in C_p^{1,2}([0,T)\times\mathbb{R}^d) \cap C_p^0([0,T]\times\mathbb{R}^d)$ which is a (classical) solution to the PDE:

$$\begin{cases} \partial_t u + \mathcal{L}^X u = 0 & \text{on } [0,T)\times\mathbb{R}^d, \\ u(T,\cdot) = g(\cdot). \end{cases} \tag{2.8}$$

Then

$$u(t,x) = \mathbb{E}\big[g(X_T^{t,x})\big].$$

↪ uniqueness result...

• In the sequel, we shall sometimes use the following assumption:

(HX∞): $b$ and $\sigma$ are $C_b^\infty$ and, moreover, $\sigma$ is uniformly elliptic: there exists $\varepsilon > 0$ such that

$$\upsilon^\dagger \sigma\sigma^\dagger(x)\,\upsilon \ge \varepsilon|\upsilon|^2 > 0 \tag{2.9}$$

for all $\upsilon \in \mathbb{R}^d \setminus \{0\}$.

• This assumption allows one to prove the smoothness of the function $u$ (for $t < T$) whatever the smoothness of the terminal condition.
2.1.4 Financial setting

...Pricing a European option in a "perfect" market!

• fixed interest rate $r$;

• price process $S$ solution to SDE (2.1) with $b^i(x) = r x^i$, i.e.

$$dS_t^i = r S_t^i\,dt + \sum_{j=1}^d \sigma^{ij}(S_t)\,dW_t^j$$

(we work directly under the risk-neutral probability).

• The super-replication price is defined as

$$p(G) := \inf\{p \in \mathbb{R} : \exists\,\phi \in \mathcal{A}_b \text{ s.t. } V_T^{p,\phi} \ge G\}. \tag{2.10}$$

$G$ is the random payoff, $\mathcal{A}_b$ is an admissible set of self-financing strategies (investment in the risky assets), and $V^{p,\phi}$ is the portfolio value with initial value $p$ following the strategy $\phi$.

• In a "perfect market",

$$p(G) = \mathbb{E}^{\mathbb{Q}}\big[e^{-rT}G\big] = V_0^{p(G),\phi^*} \tag{2.11}$$

for some optimal strategy $\phi^*$ (under some conditions on $G$ too).

↪ linear pricing rule in a complete market.

• When $G = g(S_T)$ (vanilla option), one can show that

$$p(G) = u(0, S_0),$$

where $u$ is the solution (in some sense) to the following parabolic PDE, see e.g. [15]:

$$\begin{cases} \partial_t u + \mathcal{L}^S u = ru & \text{on } [0,T)\times\mathbb{R}^d, \\ u(T,\cdot) = g(\cdot). \end{cases} \tag{2.12}$$

↪ Call: $g(x) = [x-K]^+$; Put: $g(x) = [K-x]^+$, etc.
2.2 Euler Scheme for SDEs

To compute the option price, one can solve the PDE numerically or try to evaluate the expectation, using e.g. Monte Carlo methods. We will indeed solve the PDE, but essentially using probabilistic methods, i.e. methods inspired by the representation of the value function as an expectation.

2.2.1 Definition and first properties

• Generally, it is not possible to simulate "exactly" the solution of an SDE.

• One then has to consider an approximation of $X$ given by a time discretisation of the SDE.

• The simplest time-discrete approximation of an SDE is the Euler, or Euler-Maruyama, approximation.

Definition of the Euler scheme

• We choose a discretisation/partition $\pi$ of $[0,T]$:

$$\pi = \{0 =: t_0 < t_1 < \dots < t_i < \dots < t_n := T\}.$$

We denote $h_i := t_{i+1} - t_i$ and $h = \max_i h_i$.

• Most of the time, we work with a constant time step $h = T/n$.

Definition 2.1. An Euler approximation of equation (2.1) associated with the partition $\pi$ is a discrete-time process $X^\pi = \{X_t^\pi,\ t \in \pi\}$ satisfying the iterative scheme

$$X_{t_{i+1}}^\pi = X_{t_i}^\pi + b\big(t_i, X_{t_i}^\pi\big)(t_{i+1} - t_i) + \sigma\big(t_i, X_{t_i}^\pi\big)\big(W_{t_{i+1}} - W_{t_i}\big) \tag{2.13}$$

for $i = 0, 1, \dots, n-1$, with initial value $X_0^\pi = X_0$.
Remark 2.1.

(i) If $\sigma \equiv 0$ then (2.13) reduces to the deterministic Euler scheme for ODEs.

(ii) $W_{t_{i+1}} - W_{t_i}$ is a Gaussian random variable with zero mean and variance $t_{i+1} - t_i = h$. Hence, in order to generate the increments $W_{t_{i+1}} - W_{t_i}$ of the Brownian motion $W$, we can use a sequence $(G_i)_{i\ge1}$ of independent Gaussian pseudo-random numbers $G_i \sim \mathcal{N}(0, h)$.
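As an illustration, here is a minimal NumPy sketch of this simulation; the helper name `euler_maruyama` and the Black-Scholes-type coefficients in the example are placeholders, not part of the course material:

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, n_paths, rng=None):
    """Simulate n_paths of the Euler scheme (2.13) on a uniform grid t_i = i*T/n.

    b, sigma : functions (t, x) -> drift / diffusion coefficient (1-dimensional here).
    Returns an array of shape (n_paths, n+1) holding X^pi at the grid times.
    """
    rng = rng or np.random.default_rng()
    h = T / n
    X = np.empty((n_paths, n + 1))
    X[:, 0] = x0
    for i in range(n):
        t = i * h
        dW = rng.normal(0.0, np.sqrt(h), size=n_paths)  # G_i ~ N(0, h)
        X[:, i + 1] = X[:, i] + b(t, X[:, i]) * h + sigma(t, X[:, i]) * dW
    return X

# Example: Black-Scholes dynamics dS = r S dt + vol S dW (illustrative values)
paths = euler_maruyama(lambda t, x: 0.02 * x, lambda t, x: 0.3 * x,
                       x0=100.0, T=1.0, n=100, n_paths=10_000)
```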

For later use, we introduce a "continuous" version of the Euler scheme.

Definition 2.2. We denote by $\{X_t^\pi,\ t \in [0,T]\}$ the continuous Euler scheme:

- at times $t_i$ belonging to $\pi$, we have

$$X_{t_{i+1}}^\pi = X_{t_i}^\pi + b\big(t_i, X_{t_i}^\pi\big)(t_{i+1} - t_i) + \sigma\big(t_i, X_{t_i}^\pi\big)\big(W_{t_{i+1}} - W_{t_i}\big);$$

- for $t_i < t < t_{i+1}$, we set

$$X_t^\pi = X_{t_i}^\pi + b\big(t_i, X_{t_i}^\pi\big)(t - t_i) + \sigma\big(t_i, X_{t_i}^\pi\big)(W_t - W_{t_i}). \tag{2.14}$$

• Associated differential operator (for later use in proofs):

$$\bar{\mathcal{L}}^{(s,z)}\varphi(t,x) = b(s,z)\,\partial_x\varphi(t,x) + \frac12\,\mathrm{Tr}\big[\partial^2_{xx}\varphi(t,x)\,a(s,z)\big]. \tag{2.15}$$

Moment estimate

Proposition 2.1. Under the assumptions of Theorem 2.1, we have, for $p \ge 2$,

$$\mathbb{E}\Big[\sup_{0\le t\le T} |X_t^\pi|^p\Big] \le C_p(1 + |X_0|^p),$$

where $C_p$ does not depend on $\pi$.

Proof. Cf. Exercise II.4. □
2.2.2 Weak convergence for vanilla options

• The goal is to estimate the error

$$\varepsilon_w := \mathbb{E}[g(X_T^\pi)] - \mathbb{E}[g(X_T)]. \tag{2.16}$$

• (Hr): $g$, $b$, $\sigma$ and $u$ are $C^{1,2}$ with Lipschitz derivatives.

Theorem 2.3. Under (Hr) and (HL),

$$\big|\mathbb{E}[g(X_T^\pi)] - \mathbb{E}[g(X_T)]\big| \le C h. \tag{2.17}$$

Proof. First, we note that, for $t \in [t_i, t_{i+1}]$,

$$\mathbb{E}\big[|f(X_t^\pi) - f(X_{t_i}^\pi)|^q\big]^{\frac1q} \le L_f\sqrt{h}, \tag{2.18}$$

where $L_f$ is the Lipschitz constant of $f$. For later use we introduce

$$L^i := \sup_{t\in[t_i,t_{i+1}]}\big\{|\partial_x u(t,\cdot)|_\infty + |\partial^2_{xx}u(t,\cdot)|_\infty + |\partial^3_{xxx}u(t,\cdot)|_\infty\big\},$$

and we denote by $L$ the Lipschitz constant of $b$ and $a$ (to simplify, in this proof $\sigma$ is bounded).

We observe that

$$\varepsilon_w = \sum_{i=0}^{n-1}\mathbb{E}\big[u(t_{i+1}, X_{t_{i+1}}^\pi) - u(t_i, X_{t_i}^\pi)\big].$$

Applying Itô's formula, we have

$$\mathbb{E}\big[u(t_{i+1}, X_{t_{i+1}}^\pi) - u(t_i, X_{t_i}^\pi)\big] = \int_{t_i}^{t_{i+1}}\mathbb{E}\Big[\partial_t u(t, X_t^\pi) + b(X_{t_i}^\pi)\partial_x u(t, X_t^\pi) + \frac12\sigma^2(X_{t_i}^\pi)\partial^2_{xx}u(t, X_t^\pi)\Big]\,dt.$$

Using the PDE satisfied by $u$, we get

$$\mathbb{E}\big[u(t_{i+1}, X_{t_{i+1}}^\pi) - u(t_i, X_{t_i}^\pi)\big] = \int_{t_i}^{t_{i+1}}\mathbb{E}\Big[\{b(X_{t_i}^\pi) - b(X_t^\pi)\}\partial_x u(t, X_t^\pi) + \frac12\{a(X_{t_i}^\pi) - a(X_t^\pi)\}\partial^2_{xx}u(t, X_t^\pi)\Big]\,dt. \tag{2.19}$$

For the first term in the RHS we compute

$$\big|\mathbb{E}\big[\{b(X_{t_i}^\pi) - b(X_t^\pi)\}\partial_x u(t, X_t^\pi)\big]\big| = \big|\mathbb{E}\big[\{b(X_{t_i}^\pi) - b(X_t^\pi)\}\{\partial_x u(t, X_t^\pi) - \partial_x u(t, X_{t_i}^\pi)\} + \{b(X_{t_i}^\pi) - b(X_t^\pi)\}\partial_x u(t, X_{t_i}^\pi)\big]\big|$$

$$\le L L^i h_i + \big|\mathbb{E}\big[\{b(X_{t_i}^\pi) - b(X_t^\pi)\}\partial_x u(t, X_{t_i}^\pi)\big]\big|,$$

where we used (2.18) and the Cauchy-Schwarz inequality to get the upper bound for the first term in the RHS of the above inequality. For the second term, we apply Itô's formula:

$$\mathbb{E}\big[\{b(X_{t_i}^\pi) - b(X_t^\pi)\}\partial_x u(t, X_{t_i}^\pi)\big] = -\mathbb{E}\Big[\partial_x u(t, X_{t_i}^\pi)\Big(\int_{t_i}^t \bar{\mathcal{L}}^{(t_i, X_{t_i}^\pi)}b(X_s^\pi)\,ds + \int_{t_i}^t \partial_x b(X_s^\pi)\sigma(X_{t_i}^\pi)\,dW_s\Big)\Big].$$

Conditioning with respect to $\mathcal{F}_{t_i}$ cancels the stochastic integral term. Using the assumptions on $b$, we get

$$\big|\mathbb{E}\big[\{b(X_{t_i}^\pi) - b(X_t^\pi)\}\partial_x u(t, X_{t_i}^\pi)\big]\big| \le C L^i h_i.$$

We thus obtain

$$\big|\mathbb{E}\big[\{b(X_{t_i}^\pi) - b(X_t^\pi)\}\partial_x u(t, X_t^\pi)\big]\big| \le C L^i h_i.$$

For the second term in (2.19), we perform similar computations and obtain the same upper bound. This leads to

$$|\varepsilon_w| \le C\sum_{i=0}^{n-1} L^i h_i^2 \tag{2.20}$$

$$\le C h. \tag{2.21}$$
Extensions.

i) Error expansion under more smoothness.

Talay & Tubaro [73] have proved:

Theorem 2.4. If $u \in C^\infty$,

$$\mathbb{E}[g(X_T^\pi)] = \mathbb{E}[g(X_T)] + \sum_{i=1}^n C_i h^i + O(h^{n+1}).$$

↪ Generally one cannot beat order one with an Euler scheme.

↪ Possibility of a Romberg-Richardson extrapolation method: e.g. compute $2\,\mathbb{E}\big[g(X_T^h)\big] - \mathbb{E}\big[g(X_T^{2h})\big]$, which cancels the first-order term in the expansion above.
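A minimal sketch of this extrapolation, reusing the hypothetical `euler_maruyama` helper from Section 2.2.1 (this simple variant uses independent samples for the two step sizes, which keeps the bias cancellation but is not variance-optimal):

```python
import numpy as np

def richardson_price(g, b, sigma, x0, T, n, n_paths):
    """2 E[g(X_T^h)] - E[g(X_T^{2h})]: cancels the first-order term of the
    Talay-Tubaro expansion, leaving an O(h^2) bias (n is assumed even)."""
    X_h  = euler_maruyama(b, sigma, x0, T, n,      n_paths)   # step h = T/n
    X_2h = euler_maruyama(b, sigma, x0, T, n // 2, n_paths)   # step 2h
    return 2.0 * np.mean(g(X_h[:, -1])) - np.mean(g(X_2h[:, -1]))
```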

ii) Measurable terminal condition.

• For the Euler scheme with Brownian increments, and under a condition on the diffusion coefficient, Bally & Talay [7] have proved

$$\mathbb{E}[g(X_T^\pi)] = \mathbb{E}[g(X_T)] + O(h)$$

for $g$ merely measurable and bounded!
2.3 Implementation using Monte Carlo methods

2.3.1 Quick review of the case without bias

• Assuming we know how to sample from $S_T$, the price $p_0 = \mathbb{E}\big[e^{-rT}g(S_T)\big] < \infty$ is approximated by

$$\hat{p}_0^N = \frac1N\sum_{j=1}^N e^{-rT}g(S_T^j),$$

where the $S_T^j$ are i.i.d. random variables with the same law as $S_T$.

↪ empirical mean.

• We observe that $\hat{p}_0^N$ is an unbiased estimator of $p_0$ and, importantly, we have the following result.

Theorem 2.5. (LLN) Since $(g(S_T^j))_j$ is a sequence of integrable i.i.d. random variables,

$$\hat{p}_0^N \to p_0 \quad \text{a.s.}$$

• We need to assess the accuracy of the previous estimate. We can use the following $L^2$-estimate:

Theorem 2.6. Assume that $(g(S_T^j))_j$ is a sequence of square integrable i.i.d. random variables. We have that

$$\mathrm{Var}\big[\hat{p}_0^N\big] = \mathbb{E}\big[|\hat{p}_0^N - p_0|^2\big] = \frac{\mathrm{Var}\big(e^{-rT}g(S_T)\big)}{N}.$$
• Remark: If $g(S_T) \in L^2$, then

$$\hat{V}_0^N := \frac{1}{N-1}\sum_{j=1}^N\big(e^{-rT}g(S_T^j) - \hat{p}_0^N\big)^2$$

is an unbiased estimator of $\mathrm{Var}\big(e^{-rT}g(S_T)\big)$.
• We can also describe the distribution of $\hat{p}_0^N$, at least asymptotically.

Theorem 2.7. (CLT) Assume that $(g(S_T^j))_j$ is a sequence of square integrable i.i.d. random variables, with $\mathrm{Var}[g(S_T)] > 0$, and set $\hat\sigma_0^N := \sqrt{\hat{V}_0^N}$; then

$$\sqrt{N}\,\frac{\hat{p}_0^N - p_0}{\hat\sigma_0^N}\,\mathbf{1}_{\{\hat\sigma_0^N > 0\}} \to \mathcal{N}(0,1) \quad \text{in distribution.}$$

• From this, we can deduce an asymptotic confidence interval.

Corollary 2.1. Under the assumptions of Theorem 2.7, we have

$$\mathbb{P}\Big(\sqrt{N}\,\frac{|\hat{p}_0^N - p_0|}{\hat\sigma_0^N} < z_{\frac\alpha2}\Big) \to 1 - \alpha,$$

where for $\epsilon \in [0,1]$, $z_\epsilon$ denotes the $1-\epsilon$ quantile of the standard normal distribution, i.e. $1 - \epsilon := \mathbb{P}(G < z_\epsilon)$ with $G \sim \mathcal{N}(0,1)$.

Figure 1: Probability density function of N (0, 1) and quantile

↪ This tells us that, for $N$ large enough, the probability that

$$p_0 \in \Big[\hat{p}_0^N - z_{\frac\alpha2}\,\frac{\hat\sigma_0^N}{\sqrt N},\ \hat{p}_0^N + z_{\frac\alpha2}\,\frac{\hat\sigma_0^N}{\sqrt N}\Big]$$

is close to $1 - \alpha$.

• Using concentration inequalities, one can obtain non-asymptotic confidence intervals. A sketch of the asymptotic interval in practice is given below.
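A minimal sketch of the unbiased estimator together with its asymptotic 95% confidence interval; the sampler and payoff below are illustrative placeholders:

```python
import numpy as np

def mc_price_with_ci(discounted_payoffs):
    """Empirical mean with asymptotic 95% confidence half-width (Corollary 2.1)."""
    N = len(discounted_payoffs)
    p_hat = discounted_payoffs.mean()
    sigma_hat = discounted_payoffs.std(ddof=1)   # square root of the unbiased V_0^N
    half_width = 1.96 * sigma_hat / np.sqrt(N)   # z_{alpha/2} for alpha = 5%
    return p_hat, half_width

# Example: exact Black-Scholes sampling of S_T, so there is no bias here
rng = np.random.default_rng()
S0, r, vol, T, K, N = 100.0, 0.02, 0.3, 1.0, 100.0, 100_000
S_T = S0 * np.exp((r - 0.5 * vol**2) * T + vol * np.sqrt(T) * rng.standard_normal(N))
price, hw = mc_price_with_ci(np.exp(-r * T) * np.maximum(S_T - K, 0.0))
```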

2.3.2 The case with bias

• BUT in practice, we do not know how to sample from $S_T$, and we use a discretisation scheme, e.g. the Euler scheme $S_T^\pi$.

• The simulation of the Euler scheme is quite straightforward as soon as one knows how to simulate the Brownian increments!

• We then compute

$$\hat{p}_0^{\pi,N} := \frac1N\sum_{j=1}^N e^{-rT}g(S_T^{\pi,j}).$$

• The error is decomposed into

$$\hat{p}_0^{\pi,N} - p_0 = \varepsilon_{MC} + \varepsilon_w \tag{2.22}$$

with $\varepsilon_{MC} := \hat{p}_0^{\pi,N} - \mathbb{E}\big[e^{-rT}g(S_T^\pi)\big]$ and $\varepsilon_w := \mathbb{E}\big[e^{-rT}g(S_T^\pi)\big] - \mathbb{E}\big[e^{-rT}g(S_T)\big]$.

↪ There is a balance to find between these two errors (see below).

• The study of the MC error can be done as previously, as soon as $\mathbb{E}\big[|g(S_T^\pi)|^2\big] < \infty$.
2.3.3 Convergence of the mean square error for the MC method

• In estimating the quantity $p_0 = \mathbb{E}\big[e^{-rT}g(S_T)\big]$, one faces a tradeoff between reducing the bias ($\varepsilon_w$) and reducing the variance (the statistical error $\varepsilon_{MC}$).
• To take this into account, one tries to minimise the mean square error.

Definition 2.3. For a given quantity $\alpha$ estimated by $\hat\alpha$, we set

$$MSE := \mathbb{E}\big[|\alpha - \hat\alpha|^2\big].$$

We observe

$$MSE = |\alpha - \mathbb{E}[\hat\alpha]|^2 + \mathrm{Var}[\hat\alpha] = \text{bias}^2 + \text{variance}.$$

Indeed,

$$\mathbb{E}\big[|\alpha - \hat\alpha|^2\big] = \mathbb{E}\big[|\alpha - \mathbb{E}[\hat\alpha]|^2\big] + 2\,\mathbb{E}\big[(\alpha - \mathbb{E}[\hat\alpha])(\mathbb{E}[\hat\alpha] - \hat\alpha)\big] + \mathbb{E}\big[|\hat\alpha - \mathbb{E}[\hat\alpha]|^2\big] = |\alpha - \mathbb{E}[\hat\alpha]|^2 + \mathrm{Var}(\hat\alpha),$$

the cross term vanishing since $\alpha - \mathbb{E}[\hat\alpha]$ is deterministic.
Optimising the computational effort

• Assuming weak convergence of order 1, we have

$$MSE \sim_c h^2 + \frac1N.$$

• The computational effort $\mathcal{C}$:

1. for one path, the computational effort is $\sim_c 1/h$;

2. there are $N$ paths to simulate;

↪ overall, $\mathcal{C} \sim_c N/h$.

• The goal is to minimise the MSE taking into account the computational cost:

$$\min_{h,N}\Big(c_1 h^2 + \frac{c_2}{N}\Big) \quad \text{s.t.} \quad \frac{c_3 N}{h} = \mathcal{C}.$$

• This leads to, setting $h^2 \sim_c \frac1N$,

$$\sqrt{MSE} = O\big(\mathcal{C}^{-\frac13}\big).$$

• In other words, to reach a precision $\sqrt{MSE} = O(\epsilon)$, one has

$$\mathcal{C} = O(\epsilon^{-3}).$$

• Note: to give an (asymptotic) confidence interval for $p_0$, one needs to make the discretisation error negligible with respect to the statistical error, and then to choose $N \sim_c h^{-2-\eta}$ for $\eta$ small.
2.4 Implementation using quantisation of Brownian increments

Discretisation of the Brownian increment ($d = 1$ for ease of presentation).

• $\widehat{\Delta W}_i$ stands for a discrete approximation of $\Delta W_i := W_{t_{i+1}} - W_{t_i}$:

$$\begin{cases} \widehat{X}_{t_0}^\pi = X_0, \\ \widehat{X}_{t_{i+1}}^\pi = \widehat{X}_{t_i}^\pi + b(\widehat{X}_{t_i}^\pi)h_i + \sigma(\widehat{X}_{t_i}^\pi)\widehat{\Delta W}_i, \quad 0 \le i < n. \end{cases} \tag{2.23}$$

• Matching moment property up to order $M$: for all $k \le M$,

$$\mathbb{E}\big[(\widehat{\Delta W}_i)^k\big] = \mathbb{E}\big[(\Delta W_i)^k\big]. \tag{2.24}$$

Proposition 2.2. Assume that $g$ and $u(t,\cdot)$ are $C_b^4$ (with bounds uniform in time), that $\widehat{\Delta W}$ has the matching moment property up to order $M = 3$, and that $\mathbb{E}\big[(\widehat{\Delta W}_i)^4\big] = O(|\pi|^2)$; then

$$\hat\varepsilon_w := \big|\mathbb{E}\big[g(\widehat{X}_T^\pi)\big] - \mathbb{E}[g(X_T)]\big| \le C|\pi|. \tag{2.25}$$

Example 2.1. Two-point discretisation:

$$\mathbb{P}\big(\widehat{\Delta W}_i = \pm\sqrt{h_i}\big) = \frac12.$$

One observes that

$$\mathbb{E}\big[(\widehat{\Delta W}_i)^2\big] = \mathbb{E}\big[(\Delta W_i)^2\big] = h_i \quad \text{and} \quad \mathbb{E}\big[(\widehat{\Delta W}_i)^{2k+1}\big] = \mathbb{E}\big[(\Delta W_i)^{2k+1}\big] = 0, \tag{2.26}$$

for all $k \ge 0$, by symmetry.

Proof. 1. We drop the $\pi$ in the proof and denote by $\bar{X}^{t,\xi}$ the Euler scheme with Gaussian increments $\Delta W$ started from $\xi$ at time $t$. In particular, we shall use:

$$\bar{X}_{t_{i+1}}^{t_i,\widehat{X}_{t_i}} = \widehat{X}_{t_i} + \Delta\bar{X}_i \quad \text{with} \quad \Delta\bar{X}_i := \hat{b}_i h_i + \hat\sigma_i\Delta W_i,$$

where $\hat{b}_i := b(\widehat{X}_{t_i})$ and $\hat\sigma_i := \sigma(\widehat{X}_{t_i})$. We observe that

$$\hat\varepsilon_w = \sum_{i=0}^{n-1}\mathbb{E}\big[u(t_{i+1}, \widehat{X}_{t_{i+1}}) - u(t_i, \widehat{X}_{t_i})\big] = \sum_{i=0}^{n-1}\Big(\mathbb{E}\big[u(t_{i+1}, \widehat{X}_{t_{i+1}}) - u(t_{i+1}, \bar{X}_{t_{i+1}}^{t_i,\widehat{X}_{t_i}})\big] + \mathbb{E}\big[u(t_{i+1}, \bar{X}_{t_{i+1}}^{t_i,\widehat{X}_{t_i}}) - u(t_i, \widehat{X}_{t_i})\big]\Big).$$

The second term has already been studied in the proof of Theorem 2.3. We just have to study the first one. We proceed by expanding the smooth function $u(t_{i+1}, \cdot)$ in terms of the increments of both Euler schemes. In what follows, we perform this for a generic $C_b^4$ function $\varphi$.

Introducing $\lambda \mapsto \varphi(\widehat{X}_{t_i} + \lambda\Delta\widehat{X}_i)$, where $\Delta\widehat{X}_i := \hat{b}_i h_i + \hat\sigma_i\widehat{\Delta W}_i$, and performing a Taylor expansion, we compute

$$\varphi(\widehat{X}_{t_i} + \Delta\widehat{X}_i) = \sum_{k=0}^3 \varphi^{(k)}(\widehat{X}_{t_i})\frac{(\Delta\widehat{X}_i)^k}{k!} + \int_0^1 \varphi^{(4)}(\widehat{X}_{t_i} + \lambda\Delta\widehat{X}_i)(\Delta\widehat{X}_i)^4\frac{(1-\lambda)^3}{6}\,d\lambda,$$

and similarly

$$\varphi\big(\bar{X}_{t_{i+1}}^{t_i,\widehat{X}_{t_i}}\big) = \sum_{k=0}^3 \varphi^{(k)}(\widehat{X}_{t_i})\frac{(\Delta\bar{X}_i)^k}{k!} + \int_0^1 \varphi^{(4)}(\widehat{X}_{t_i} + \lambda\Delta\bar{X}_i)(\Delta\bar{X}_i)^4\frac{(1-\lambda)^3}{6}\,d\lambda.$$

Due to the matching moment property and the boundedness assumptions, we get

$$\mathbb{E}\big[\varphi(\widehat{X}_{t_{i+1}})\big] = \mathbb{E}\big[\varphi\big(\bar{X}_{t_{i+1}}^{t_i,\widehat{X}_{t_i}}\big)\big] + O(h_i^2).$$

The proof is then concluded observing that, from the previous expansion,

$$\mathbb{E}\big[u(t_{i+1}, \widehat{X}_{t_{i+1}}) - u(t_{i+1}, \bar{X}_{t_{i+1}}^{t_i,\widehat{X}_{t_i}})\big] = O(h_i^2). \qquad \square$$

Tree methods

• When using discrete random variables for the increments, it is possible to compute the approximation on a tree.

↪ The two-point approximation used in Example 2.1 leads to a binomial tree.

• There is no statistical error, but the computation rapidly becomes intractable (unless special properties such as recombination are exploited). A sketch of the recombining binomial case is given below.
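A minimal sketch of the recombining case ($b = 0$, $\sigma$ constant, as in the finite-difference link that follows): the two-point scheme lives on the lattice $x_0 + k\sigma\sqrt h$ and the price is obtained by backward induction. The function name and parameters are illustrative:

```python
import numpy as np

def binomial_two_point(g, x0, sigma, T, n):
    """Price E[g(X_T)] for dX = sigma dW via the two-point scheme of Example 2.1.

    After m steps the scheme sits on the recombining lattice
    x0 + (2j - m) * sigma * sqrt(h), j = 0..m, and the value follows the
    backward induction u_m(j) = 0.5 * (u_{m+1}(j+1) + u_{m+1}(j)).
    """
    h = T / n
    dx = sigma * np.sqrt(h)
    j = np.arange(n + 1)
    u = g(x0 + (2 * j - n) * dx)      # terminal layer, n+1 nodes
    for _ in range(n):
        u = 0.5 * (u[1:] + u[:-1])    # one backward step; the layer shrinks by one node
    return u[0]

# Example: a put payoff, illustrative parameters
price = binomial_two_point(lambda x: np.maximum(100.0 - x, 0.0),
                           x0=100.0, sigma=20.0, T=1.0, n=200)
```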

Link with finite difference schemes

• Say $b = 0$, $\sigma$ constant: (2.8) ↪ backward heat equation.

• The tree approximation gives, for time $t_i$ and starting at $x$,

$$\bar{u}_i(x) = \mathbb{E}\big[\bar{u}_{i+1}\big(\widehat{X}_{t_{i+1}}^{t_i,x}\big)\big] = \frac12\bar{u}_{i+1}(x + \sigma\sqrt h) + \frac12\bar{u}_{i+1}(x - \sigma\sqrt h),$$

where $\bar{u}_i(\cdot)$ (resp. $\bar{u}_{i+1}(\cdot)$) stands for the approximation of $u(t_i,\cdot)$ (resp. $u(t_{i+1},\cdot)$).

• Rearranging the terms, we obtain

$$\frac{\bar{u}_i(x) - \bar{u}_{i+1}(x)}{h} = \frac{1}{2h}\Big(\bar{u}_{i+1}(x + \sigma\sqrt h) + \bar{u}_{i+1}(x - \sigma\sqrt h) - 2\bar{u}_{i+1}(x)\Big),$$

which corresponds to the (backward explicit) finite difference scheme

$$\frac{\bar{u}_i(x) - \bar{u}_{i+1}(x)}{h} = \frac{\sigma^2}{2\delta^2}\Big(\bar{u}_{i+1}(x + \delta) + \bar{u}_{i+1}(x - \delta) - 2\bar{u}_{i+1}(x)\Big)$$

with space discretisation $\delta = \sigma\sqrt h$ (satisfying the stability condition $\frac{\sigma^2 h}{2\delta^2} \le \frac12$).
Figure 2: MC simulation for a put (Gaussian increments)

Figure 3: MC simulation for a digital (Gaussian increments)

Figure 4: MC simulation for a digital (discrete increments)

Figure 5: Bias for a digital (discrete increments) - no variance
2.5 Strong convergence

2.5.1 Lipschitz case

Proposition 2.3. Assume that $b$ and $\sigma$ are Lipschitz continuous in $t$ and $x$; then

$$\mathbb{E}\Big[\sup_{t\in[0,T]} |X_t - X_t^\pi|^2\Big] \le C|\pi|.$$

Proof. 1. Stability: Let $\delta X_t := X_t - X_t^\pi$. Writing $\bar s := t_i$ for $s \in [t_i, t_{i+1})$, we have, for $0 \le u \le T$,

$$\delta X_u = \int_0^u\big(b(X_{\bar s}) - b(X_{\bar s}^\pi)\big)\,ds + \int_0^u\big(\sigma(X_{\bar s}) - \sigma(X_{\bar s}^\pi)\big)\,dW_s + \mathcal{T}(u) \tag{2.27}$$

with the truncation error $\mathcal{T}(u)$ given by

$$\mathcal{T}(u) = \int_0^u\big(b(X_s) - b(X_{\bar s})\big)\,ds + \int_0^u\big(\sigma(X_s) - \sigma(X_{\bar s})\big)\,dW_s. \tag{2.28}$$

We then compute

$$\mathbb{E}\Big[\sup_{u\le t}|\delta X_u|^2\Big] \le C\,\mathbb{E}\Big[\int_0^t|b(X_{\bar s}) - b(X_{\bar s}^\pi)|^2\,ds + \sup_{u\le t}\Big|\int_0^u\big(\sigma(X_{\bar s}) - \sigma(X_{\bar s}^\pi)\big)\,dW_s\Big|^2 + \sup_{u\le t}|\mathcal{T}(u)|^2\Big].$$

Applying the BDG inequality to the stochastic integral term and then using the Lipschitz property of $b$ and $\sigma$, we get

$$\mathbb{E}\Big[\sup_{u\le t}|\delta X_u|^2\Big] \le C\,\mathbb{E}\Big[\int_0^t|\delta X_{\bar s}|^2\,ds + \sup_{u\le t}|\mathcal{T}(u)|^2\Big], \tag{2.29}$$

leading to

$$\mathbb{E}\Big[\sup_{u\le t}|\delta X_u|^2\Big] \le C\Big(\int_0^t\mathbb{E}\Big[\sup_{u\le s}|\delta X_u|^2\Big]\,ds + \mathbb{E}\Big[\sup_{u\le T}|\mathcal{T}(u)|^2\Big]\Big). \tag{2.30}$$

Applying Gronwall's Lemma, we obtain

$$\mathbb{E}\Big[\sup_{u\le T}|\delta X_u|^2\Big] \le C\,\mathbb{E}\Big[\sup_{u\le T}|\mathcal{T}(u)|^2\Big]. \tag{2.31}$$

This step shows that the global error is controlled by the (global) truncation error!

2. Study of the truncation error:

We compute, recalling (2.28),

$$\sup_{u\le T}|\mathcal{T}(u)|^2 \le C\Big(\int_0^T|b(X_s) - b(X_{\bar s})|^2\,ds + \sup_{u\le T}\Big|\int_0^u\big(\sigma(X_s) - \sigma(X_{\bar s})\big)\,dW_s\Big|^2\Big). \tag{2.32}$$

Taking the expectation and applying the BDG inequality, we obtain

$$\mathbb{E}\Big[\sup_{u\le T}|\mathcal{T}(u)|^2\Big] \le C\,\mathbb{E}\Big[\int_0^T\big(|b(X_s) - b(X_{\bar s})|^2 + |\sigma(X_s) - \sigma(X_{\bar s})|^2\big)\,ds\Big], \tag{2.33}$$

which leads, since $b$, $\sigma$ are Lipschitz, to

$$\mathbb{E}\Big[\sup_{u\le T}|\mathcal{T}(u)|^2\Big] \le C\,\mathbb{E}\Big[\int_0^T|X_s - X_{\bar s}|^2\,ds\Big].$$

The proof is concluded using (2.4). □

• The result above naturally yields a control on the strong approximation error at each time step, with the same order.

• Let us assume a uniform grid $\pi$ with $h = T/n$. We now report the strong error between the process $X$ and its piecewise constant Euler scheme:

$$\mathbb{E}\Big[\max_i\sup_{t\in[t_i,t_{i+1})}|X_t - X_{t_i}^\pi|^p\Big]^{\frac1p} \le C_p(1 + |X_0|)\sqrt{\frac{T(1 + \log(n))}{n}}. \tag{2.34}$$

For a proof see Theorem 7.2 in [63].

Remark 2.2. Obviously, the same control holds true with $X_{t_i}$ instead of $X_{t_i}^\pi$. In fact it is sharp for both, taking the specific case of $X = W$; see [63], Chapter 7.
2.5.2 Non globally Lipschitz case

• Existence and uniqueness results are more complicated to obtain when the coefficients are no longer Lipschitz continuous.

• For strong existence and uniqueness results for the SDEs considered below, see the book by Khasminskii [50] for a Lyapunov function approach, or the book by Alfonsi [3] for affine processes, in particular the CIR process.

Explicit Euler scheme diverges. The explicit Euler scheme for SDEs with superlinearly growing coefficients may diverge [48].

• Proof for a simple example:

$$dX_t = -X_t^3\,dt + dW_t \quad \text{and} \quad X_0 = 0.$$

Existence and uniqueness come from Theorem 2.4.1 in [50]. It is also the case that $\mathbb{E}[|X_t|^p] < \infty$ for all $p \in [1,\infty)$.

• The Euler scheme is, as usual, on a uniform time grid $\pi$ with $n$ time steps, $h = T/n$:

$$X_{t_{i+1}}^\pi = X_{t_i}^\pi - (X_{t_i}^\pi)^3 h + \Delta W_i \quad \text{and} \quad X_0^\pi = X_0 = 0. \tag{2.35}$$

• The main observation is that

$$\lim_{n\to\infty}\mathbb{E}\big[|X_T^\pi|\big] = +\infty. \tag{2.36}$$

Proof. The proof is based on identifying an event with exponentially small probability but carrying a doubly exponential weight. Let

$$\Omega_n := \Big\{\sup_{i\ge1}|\Delta W_i| \le 1 \ \text{and} \ |W_{t_1}| \ge r_n\Big\},$$

where $r_n = \max\big(\frac3h, 2\big)$.

1. By induction we show that $|X_{t_i}^\pi| \ge r_n^{2^{i-1}}$ on $\Omega_n$.

This is true for $i = 1$ by definition of $\Omega_n$ and since $X_0^\pi = 0$.

Now assume the property holds true for $i < n$. Observing that $|a - (b+c)| \ge |a| - |b+c| \ge |a| - |b| - |c|$, we get, using the scheme definition (2.35),

$$|X_{t_{i+1}}^\pi| \ge h|X_{t_i}^\pi|^3 - |X_{t_i}^\pi| - 1 \ge h|X_{t_i}^\pi|^3 - 2|X_{t_i}^\pi|^2,$$

where we used the fact that $|X_{t_i}^\pi| \ge 1$ for the last inequality. We then compute

$$|X_{t_{i+1}}^\pi| \ge |X_{t_i}^\pi|^2\big(h r_n - 2\big) \ge (r_n)^{2^i},$$

since $h|X_{t_i}^\pi| \ge h r_n \ge 3$.

2. We now compute, using the independence of $W_{t_1}$ and the increments $(\Delta W_i)_{i\ge1}$,

$$\mathbb{P}(\Omega_n) \ge \mathbb{P}\Big(\sup_{i\ge1}|\Delta W_i| \le 1\Big)\,\mathbb{P}(|W_h| \ge r_n).$$

Observing that $\sup_{t\in[0,T]}|W_t| \le \frac12$ implies that $|W_t - W_s| \le 1$ for any $(s,t) \in [0,T]^2$, we get

$$\mathbb{P}(\Omega_n) \ge \mathbb{P}\Big(\sup_{t\in[0,T]}|W_t| \le \frac12\Big)\,\mathbb{P}(|W_h| \ge r_n) \ge c\,\mathbb{P}\Big(|W_1| \ge \frac{r_n}{\sqrt h}\Big) \quad \text{for some } c > 0.$$

Using the inequality

$$\mathbb{P}(|W_1| \ge x) \ge \frac{x e^{-x^2}}{4}, \tag{2.37}$$

see Exercise II.5, we then obtain

$$\mathbb{P}(\Omega_n) \ge c\,e^{-\frac{(r_n)^2}{h}}.$$

3. Combining step 1 with the above inequality, we conclude

$$\mathbb{E}\big[|X_T^\pi|\big] \ge c\,2^{2^{n-1}}e^{-9\frac{n^3}{T^3}} \to +\infty \quad \text{when } n\to\infty. \qquad \square$$

• Note that we then have, for $p \ge 1$,

$$\lim_{n\to\infty}\mathbb{E}\big[|X_T^\pi - X_T|^p\big] = +\infty.$$

Proof. By Jensen's inequality, we have

$$\mathbb{E}\big[|X_T^\pi - X_T|^p\big] \ge \mathbb{E}\big[|X_T^\pi - X_T|\big]^p \ge \big(\mathbb{E}[|X_T^\pi|] - \mathbb{E}[|X_T|]\big)^p;$$

moreover $\mathbb{E}[|X_T|] < \infty$, so the result comes from a direct application of (2.36). □

• The proof above can be extended to more general situations, see Theorem 1 in [48]. Namely, let

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t \quad \text{and} \quad X_0 = \xi,$$

which is assumed to have a strong solution. Denote by $X^\pi$ its classical (explicit!) Euler scheme.

Assume moreover that $\mathbb{P}(\sigma(\xi) \ne 0) > 0$ and that, for some constants $C \ge 1$ and $\beta > \alpha > 1$, for $|x| \ge C$,

$$\max(|\mu(x)|, |\sigma(x)|) \ge \frac{|x|^\beta}{C} \quad \text{and} \quad \min(|\mu(x)|, |\sigma(x)|) \le C|x|^\alpha;$$

then

$$\lim_{n\to\infty}\mathbb{E}\big[|X_T^\pi|\big] = +\infty.$$
• Depending on the SDE, various remedies have been proposed, among them the use of a tamed explicit scheme; a minimal sketch is given below.
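The sketch below applies one common choice of taming (in the spirit of the tamed schemes in the literature; the specific normalisation is illustrative, not the only possible one) to the cubic example $dX_t = -X_t^3\,dt + dW_t$:

```python
import numpy as np

def tamed_euler(x0, T, n, n_paths, rng=None):
    """Tamed Euler scheme for dX = -X^3 dt + dW: the drift increment is
    normalised as b(X)*h / (1 + h*|b(X)|), which bounds each drift step and
    prevents the explosion exhibited by the explicit scheme (2.35), while
    strong convergence is retained under suitable assumptions."""
    rng = rng or np.random.default_rng()
    h = T / n
    X = np.full(n_paths, x0)
    for _ in range(n):
        drift = -X**3
        X = X + drift * h / (1.0 + h * np.abs(drift)) \
              + rng.normal(0.0, np.sqrt(h), n_paths)
    return X
```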

• Typical examples of symptomatic financial models are short-rate models, such as the Aït-Sahalia models, see [2]:

$$dX_t = \Big(\frac{a_{-1}}{X_t} - a_0 + a_1 X_t - a_2 X_t^\varrho\Big)dt + \gamma X_t^\rho\,dW_t \quad \text{and} \quad X_0 > 0,$$

with $a_i \ge 0$ and $\rho, \varrho > 1$.
2.6 Path-dependent options

2.6.1 General considerations

• For some measurable functional $F : \mathcal{C} \to \mathbb{R}$ and some SDE solution $X$, we would like to evaluate

$$p_0 := \mathbb{E}\big[F(X_t,\ 0 \le t \le T)\big], \quad \text{assuming it is finite.}$$

This is done by introducing a discrete version $X^\pi$ of $X$ and possibly an approximation $F^\rho$ of $F$.

• In finance, typical functionals are:

1. Asian-option-like:

$$F\big((x_t)_{t\in[0,T]}\big) := f\Big(\int_0^T x_t\,dt\Big);$$

2. Lookback options in one dimension:

$$F\big((x_t)_{t\in[0,T]}\big) := f\Big(x_T,\ \sup_{t\in[0,T]}x_t,\ \inf_{t\in[0,T]}x_t\Big);$$

3. Barrier options:

$$F\big((x_t)_{t\in[0,T]}\big) := f(x_T)\,\mathbf{1}_{\{\tau_D(x) > T\}},$$

where $\tau_D$ is the exit time of the open domain $D$, namely

$$\tau_D(x) := \inf\{t \in [0,T] \,|\, x_t \notin D\}.$$

↪ In the sequel, we assume that $F$ is Lipschitz continuous for the sup-norm, namely

$$\forall x, y \in \mathcal{C}, \quad |F(x) - F(y)| \le L\max_{t\in[0,T]}|x_t - y_t|$$

for some positive constant $L$.

• We consider the piecewise linear version of the Euler scheme:

$$X_t^{\ell,\pi} = X_{t_i}^\pi + \frac{t - t_i}{t_{i+1} - t_i}\big(X_{t_{i+1}}^\pi - X_{t_i}^\pi\big), \quad t \in [t_i, t_{i+1}],$$

which is not adapted... but computable for any $t \in [0,T]$. Then,

$$\big|\mathbb{E}\big[F(X_\cdot) - F(X_\cdot^{\ell,\pi})\big]\big| \le C\sqrt{\frac{1 + \log(n)}{n}}.$$

• The continuous version of the Euler scheme $X^\pi$ can be used if $F$ depends only on quantities of the path that can be simulated (see e.g. the next section). Then, we have

$$\big|\mathbb{E}\big[F(X_\cdot) - F(X_\cdot^\pi)\big]\big| \le C_L\frac{1}{\sqrt n}.$$

• The lower bound in this case is given by the weak error analysis and is of order $\frac1n$. The best known upper bound [4] is $C\,n^{-\frac23+\epsilon}$ in dimension one, under a uniform ellipticity assumption.

Example: Weak convergence for barrier options (1d case). See [35] (or the 2019 lecture notes).
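As an illustration of the path-dependent case, here is a minimal sketch pricing an Asian-type payoff $f(\int_0^T X_t\,dt)$ with the piecewise constant Euler scheme and a Riemann-sum approximation of the time integral; the model, payoff and the hypothetical `euler_maruyama` helper from Section 2.2.1 are all placeholders:

```python
import numpy as np

def asian_price(f, b, sigma, x0, T, n, n_paths):
    """MC estimate of E[f(int_0^T X_t dt)]: the time integral is replaced by
    the left-point Riemann sum h * sum_i X_{t_i}, consistent with the
    piecewise constant reading of the Euler scheme."""
    X = euler_maruyama(b, sigma, x0, T, n, n_paths)
    h = T / n
    integral = h * X[:, :-1].sum(axis=1)      # left-endpoint Riemann sum
    return f(integral).mean()

# Example: fixed-strike call on the average (here T = 1, so the integral
# is the time average), illustrative parameters
price = asian_price(lambda a: np.maximum(a - 100.0, 0.0),
                    lambda t, x: 0.02 * x, lambda t, x: 0.3 * x,
                    x0=100.0, T=1.0, n=252, n_paths=50_000)
```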
2.7 Acceleration methods

• When there is no bias: one should implement variance reduction methods.

2.7.1 Reducing the bias

• If one uses a scheme of weak order $\alpha$, the complexity is then

$$\mathcal{C} = O\big(\epsilon^{-\frac{2\alpha+1}{\alpha}}\big)$$

for $\sqrt{MSE} = O(\epsilon)$.

• Recall that, if there is no bias, $\mathcal{C} \sim_c \epsilon^{-2}$...

• To obtain a smaller bias, one can:

1. use a high order scheme (for weak convergence), see below;

2. use a Romberg-Richardson extrapolation method.
2.7.2 Multilevel Monte Carlo

Introduced in [32].

We want to approximate

$$p = \mathbb{E}[G] \quad \text{where} \quad G = g(X_T),$$

for some Lipschitz function $g$, where $X$ is the solution to an SDE of type (2.1).

• We consider a collection of grids $\pi^\ell$ with step sizes

$$h_\ell = \frac{T}{2^\ell}, \quad \ell = 0, 1, \dots, L.$$

• We denote by

$$G_\ell = g\big(X_T^{\pi^\ell}\big)$$

the approximation of $G$ using (say) an Euler scheme for $X$ on each grid $\pi^\ell$.

↪ The classical method simply estimates $\mathbb{E}[G_L]$.

• We observe

$$\mathbb{E}[G_L] = \mathbb{E}[G_0] + \sum_{\ell=1}^L\mathbb{E}[G_\ell - G_{\ell-1}].$$

↪ The MLMC method independently estimates each of the expectations on the RHS in a way which minimises the overall variance for a given computational cost.

• $Y^0$ is an estimator for $\mathbb{E}[G_0]$ using $N_0$ sample paths, and $Y^\ell$ for $\mathbb{E}[G_\ell - G_{\ell-1}]$ using $N_\ell$ sample paths. We consider

$$Y^0 = \frac{1}{N_0}\sum_{j=1}^{N_0}G_0^j, \qquad Y^\ell = \frac{1}{N_\ell}\sum_{j=1}^{N_\ell}\big(G_\ell^j - G_{\ell-1}^j\big), \quad 1 \le \ell \le L.$$

We use the same Brownian motion to get $G_\ell$ and $G_{\ell-1}$!
• We observe, setting $Y = Y^0 + \sum_{\ell=1}^L Y^\ell$ (main estimator),

$$\mathrm{Var}\big[Y^\ell\big] = \frac{V_\ell}{N_\ell} \quad \text{and} \quad \mathrm{Var}[Y] = \sum_{\ell=0}^L N_\ell^{-1}V_\ell.$$

↪ Hopefully, $V_\ell$ will be smaller on finer levels...

• The total computational cost is proportional to

$$\sum_{\ell=0}^L N_\ell\,h_\ell^{-1}.$$

• The general result is as follows [32].

Theorem 2.8. In the setting presented above, assume that, for positive constants $c_1, c_2, c_3, \alpha, \beta, \gamma$ with $\alpha \ge \frac12\min(\gamma, \beta)$:

(i) $|\mathbb{E}[G_\ell - G]| \le c_1 2^{-\alpha\ell}$;

(ii) $\mathrm{Var}[Y^\ell] \le c_2 N_\ell^{-1}2^{-\beta\ell}$;

(iii) $\mathcal{C}_\ell \le c_3 N_\ell 2^{\gamma\ell}$, where $\mathcal{C}_\ell$ is the computational complexity of $Y^\ell$.

Then there exists $c_4 > 0$ such that there are values of $L$ and $N_\ell$ with $MSE := \mathbb{E}\big[|Y - \mathbb{E}[G]|^2\big] < \epsilon^2$ for $\epsilon < 1/e$, at a cost

$$\mathcal{C} \le \begin{cases} c_4\,\epsilon^{-2}, & \beta > \gamma, \\ c_4\,\epsilon^{-2}(\log(1/\epsilon))^2, & \beta = \gamma, \\ c_4\,\epsilon^{-2-(\gamma-\beta)/\alpha}, & 0 < \beta < \gamma. \end{cases}$$

• Precisely, assuming an Euler approximation of order 1 in the weak sense and $\frac12$ in the strong sense, we have:

Proposition 2.4. For $\epsilon > 0$, setting

$$N_\ell \sim_c \epsilon^{-2}L h_\ell \quad \text{and} \quad L \sim_c \log_2\epsilon^{-1},$$

we obtain

$$\mathbb{E}[G_L - G] = O(\epsilon), \quad \mathrm{Var}[Y] = O(\epsilon^2), \quad MSE = O(\epsilon^2), \quad \mathcal{C} = O\big(\epsilon^{-2}(\log\epsilon)^2\big).$$

↪ To be compared to a cost of $O(\epsilon^{-3})$ for the standard method!
Proof. We first observe that

$$MSE \le C\Big(\sum_{\ell=0}^L\frac{h_\ell}{N_\ell} + h_L^2\Big) \tag{2.38}$$

and we have the constraint on the overall computational effort ($\mathcal{C}$)

$$\sum_{\ell=0}^L\frac{N_\ell}{h_\ell} = \mathcal{C}.$$

We minimise the RHS of (2.38) with $L$ being fixed for now. Observing that the problem is similar to

$$\min\sum_{\ell=0}^L\frac{1}{x_\ell} \quad \text{s.t.} \quad \sum_{\ell=0}^L x_\ell = 1,$$

whose solution is simply $x_\ell = \frac{1}{L+1}$, we compute that

$$N_\ell = \frac{h_\ell\,\mathcal{C}}{L+1} \quad \text{and} \quad MSE = \frac{(L+1)^2}{\mathcal{C}} + h_L^2. \tag{2.39}$$

To reach $MSE \le C\epsilon^2$, we set

$$L = \frac{\log(1/\epsilon)}{\log(2)} \quad \text{and} \quad \mathcal{C} \sim_c \epsilon^{-2}(\log(1/\epsilon))^2. \tag{2.40}$$

Remark 2.3. It is possible to combine the Romberg-Richardson extrapolation method and the MLMC method, see Lemaire & Pagès [55].
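A minimal sketch of the MLMC estimator for a European payoff under Black-Scholes dynamics, with coupled Euler paths (the coarse path is driven by the sum of the two fine Brownian increments); the function names, parameters and level allocation are illustrative:

```python
import numpy as np

def mlmc_level(g, r, vol, x0, T, level, N, rng):
    """Coupled Euler estimate of E[G_l - G_{l-1}] (E[G_0] when level == 0)."""
    n_fine = 2 ** level
    h = T / n_fine
    Xf = np.full(N, x0)
    if level == 0:
        dW = rng.normal(0.0, np.sqrt(h), N)
        return g(Xf + r * Xf * h + vol * Xf * dW).mean()
    Xc = np.full(N, x0)
    for _ in range(n_fine // 2):
        dW1 = rng.normal(0.0, np.sqrt(h), N)
        dW2 = rng.normal(0.0, np.sqrt(h), N)
        Xf = Xf + r * Xf * h + vol * Xf * dW1
        Xf = Xf + r * Xf * h + vol * Xf * dW2
        Xc = Xc + r * Xc * (2 * h) + vol * Xc * (dW1 + dW2)  # same Brownian path
    return (g(Xf) - g(Xc)).mean()

def mlmc_price(g, r, vol, x0, T, L, eps, rng=None):
    """Y = Y^0 + sum_l Y^l with the allocation N_l ~ eps^-2 (L+1) h_l of Prop. 2.4."""
    rng = rng or np.random.default_rng()
    return sum(mlmc_level(g, r, vol, x0, T, l,
                          N=max(int(eps**-2 * (L + 1) * 2.0**(-l)), 2), rng=rng)
               for l in range(L + 1))

# Example: European call under GBM, illustrative parameters
price = mlmc_price(lambda s: np.maximum(s - 100.0, 0.0),
                   r=0.02, vol=0.3, x0=100.0, T=1.0, L=6, eps=1e-2)
```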

Figure 6: MLMC for a European option under geometric Brownian motion (M = 4), see [32]. (Panels reproduced from [32]: log-variance and log-mean of $P_\ell$ and $P_\ell - P_{\ell-1}$ against the level $\ell$; the number of samples $N_\ell$ per level for several accuracies $\epsilon$; and the normalised cost $\epsilon^2 C$ against $\epsilon$ for the standard and multilevel methods, with and without Richardson extrapolation.)
2.8 High order schemes

See the book [51] (and the 2019 lecture notes).

2.8.1 High order weak convergence

An example of an order 2 weak Taylor scheme. To get a higher order of weak convergence, one has to add more terms in the expansion of the coefficients.

Example 2.2. (autonomous 1-dimensional case) On a discrete time grid $\pi$, we define $X^\pi$ by:

• $X_0^\pi = X_0$;

• for $0 \le i \le n-1$,

$$\begin{aligned} X_{t_{i+1}}^\pi = X_{t_i}^\pi &+ b(X_{t_i}^\pi)h_i + \sigma(X_{t_i}^\pi)\Delta\hat W_i + \frac12\sigma(X_{t_i}^\pi)\sigma'(X_{t_i}^\pi)\big((\Delta\hat W_i)^2 - h_i\big) \\ &+ \frac12\Big(b'(X_{t_i}^\pi)\sigma(X_{t_i}^\pi) + b(X_{t_i}^\pi)\sigma'(X_{t_i}^\pi) + \frac12\sigma''(X_{t_i}^\pi)\sigma^2(X_{t_i}^\pi)\Big)\Delta\hat W_i\,h_i \\ &+ \frac12\Big(bb'(X_{t_i}^\pi) + \frac12 b''\sigma^2(X_{t_i}^\pi)\Big)h_i^2, \end{aligned}$$

where $\Delta\hat W_i$ is a three-point distributed random variable with distribution

$$\mathbb{P}\big(\Delta\hat W_i = \pm\sqrt{3h_i}\big) = \frac16, \qquad \mathbb{P}\big(\Delta\hat W_i = 0\big) = \frac23.$$
3 Computing sensitivities in the linear case

3.1 Finite-difference approximations

• We focus on the Delta, but this approach can be applied to other "Greeks" (Gamma, Vega, etc.).

• The price of the European option is given by $u(t,x) = \mathbb{E}\big[g(X_T^{t,x})\big]$, where $X$ is the solution to an SDE of type (2.1) (with $b = 0$, $d = 1$); its Delta is $\partial_x u(t,x)$.

• To approximate $\partial_x u(0,x)$, we use finite differences.

1. Forward difference: for $\epsilon > 0$,

$$\partial_x u(0,x) \approx \Delta_F(\epsilon) := \frac{u(0, x+\epsilon) - u(0,x)}{\epsilon}.$$

Remark: one can also consider the backward difference; it has the same properties here.

2. Central difference: for $\epsilon > 0$,

$$\partial_x u(0,x) \approx \Delta_C(\epsilon) := \frac{u(0, x+\epsilon) - u(0, x-\epsilon)}{2\epsilon}.$$

• Estimator (for the central difference): let $X_i^{x-\epsilon}$, $X_i^{x+\epsilon}$ be i.i.d. samples of $X_T^{0,x-\epsilon}$ and $X_T^{0,x+\epsilon}$; we then define

$$\hat\Delta_C^N(\epsilon) := \frac{1}{2N\epsilon}\sum_{i=1}^N\big(g(X_i^{x+\epsilon}) - g(X_i^{x-\epsilon})\big).$$
3.1.1 Bias

Proposition 3.1. Assuming smoothness of the value function, we have

$$\partial_x u(0,x) = \Delta_F(\epsilon) + O(\epsilon) \quad \text{and} \quad \partial_x u(0,x) = \Delta_C(\epsilon) + O(\epsilon^2).$$

Proof. For the central difference: if $u(0,\cdot)$ is $C^3$, we simply observe

$$u(0, x+\epsilon) = u(0,x) + \partial_x u(0,x)\epsilon + \frac12\partial^2_{xx}u(0,x)\epsilon^2 + O(\epsilon^3),$$

$$u(0, x-\epsilon) = u(0,x) - \partial_x u(0,x)\epsilon + \frac12\partial^2_{xx}u(0,x)\epsilon^2 + O(\epsilon^3),$$

and subtracting and dividing by $2\epsilon$ gives the result. □
3.1.2 Variance

We focus on $\hat\Delta_C^N$.

(i) Using independent sets of random numbers for $X_i^{x-\epsilon}$ and $X_i^{x+\epsilon}$:

$$\mathrm{Var}\big(\hat\Delta_C^N(\epsilon)\big) = \frac{1}{N\epsilon^2}\,\frac{\mathrm{Var}\big(g(X_T^{0,x+\epsilon})\big) + \mathrm{Var}\big(g(X_T^{0,x-\epsilon})\big)}{4}.$$

Simply observe that

$$\mathrm{Var}\big(\hat\Delta_C^N(\epsilon)\big) = \frac{1}{4\epsilon^2}\Big(\mathrm{Var}\big(\hat p^N(x+\epsilon)\big) + \mathrm{Var}\big(\hat p^N(x-\epsilon)\big)\Big).$$

(ii) Using the same set of random numbers for $X_i^{x-\epsilon}$ and $X_i^{x+\epsilon}$, and assuming $g$, $b$ and $\sigma$ are $C^1$:

$$\mathrm{Var}\big(\hat\Delta_C^N(\epsilon)\big) \simeq \frac1N\,\mathrm{Var}\big(g'(X_T^{0,x})\nabla X_T^x\big).$$

Simply observe that

$$\mathrm{Var}\big(\hat\Delta_C^N(\epsilon)\big) = \frac1N\,\mathrm{Var}\Big(\frac{g(X_T^{x+\epsilon}) - g(X_T^{x-\epsilon})}{2\epsilon}\Big).$$
56
3.2 Tangent process approach

3.2.1 Tangent process

We denote $X^x := X^{0,x}$, where $X$ is the solution to an SDE of type (2.1).

Theorem 3.1. If $b$ and $\sigma$ are $C_b^2$, then for all $t \in [0,T]$ the mapping $x \mapsto X_t^x$ is a.s. $C^1$. Its derivative, denoted $\nabla X^x$, is the solution to the following SDE:

$$\nabla X_t^x = I_d + \int_0^t\nabla b(X_s^x)\nabla X_s^x\,ds + \sum_{j=1}^d\int_0^t\nabla\sigma^{.j}(X_s^x)\nabla X_s^x\,dW_s^j.$$

Precisely, we have (up to a modification) that, a.s.,

$$\lim_{\epsilon\to0}\frac{X_t^{x+\epsilon} - X_t^x}{\epsilon} = \nabla X_t^x,$$

and $x \mapsto \nabla X^x$ is continuous.

Lemma 3.1. Under the assumptions of the previous theorem, we have

$$\mathbb{E}\Big[\sup_{t\le T}|\nabla X_t|^p\Big] \le C_p.$$

Proof of Theorem 3.1. We only prove ($d = 1$):

$$\lim_{\epsilon\to0}\mathbb{E}\Big[\sup_{u\le T}\Big|\nabla X_u^x - \frac{X_u^{x+\epsilon} - X_u^x}{\epsilon}\Big|^2 + |\nabla X_u^{x+\epsilon} - \nabla X_u^x|^2\Big] = 0. \tag{3.1}$$

To obtain the a.s. properties claimed in the statement of the theorem, one has to use the Kolmogorov continuity criterion, see e.g. Chapter 1, Theorem 1.8 in [70]. (Note: $b, \sigma \in C_b^1$ with $\alpha$-Hölder continuous derivatives, for some $\alpha > 0$, is actually sufficient, see [53].)

1. We define $\Delta_\epsilon X := X^{x+\epsilon} - X^x$, $\delta_\epsilon X := \frac{X^{x+\epsilon} - X^x}{\epsilon}$, and compute

$$\Delta_\epsilon X_t = \epsilon + \int_0^t\tilde b_u^\epsilon\,\Delta_\epsilon X_u\,du + \int_0^t\tilde\sigma_u^\epsilon\,\Delta_\epsilon X_u\,dW_u,$$

where we set $\tilde\sigma_u^\epsilon = \int_0^1\sigma'(X_u^x + \lambda\Delta_\epsilon X_u)\,d\lambda$ and $\tilde b_u^\epsilon = \int_0^1 b'(X_u^x + \lambda\Delta_\epsilon X_u)\,d\lambda$.

Since $\sigma$ and $b$ have bounded derivatives, we observe that $\tilde\sigma^\epsilon$ and $\tilde b^\epsilon$ are bounded. We compute, for $p \ge 2$,

$$\mathbb{E}\Big[\sup_{s\le t}|\Delta_\epsilon X_s|^p\Big] \le 3^{p-1}\Big(\epsilon^p + \mathbb{E}\Big[\sup_{s\le t}\Big|\int_0^s\tilde b_u^\epsilon\Delta_\epsilon X_u\,du\Big|^p\Big] + \mathbb{E}\Big[\sup_{s\le t}\Big|\int_0^s\tilde\sigma_u^\epsilon\Delta_\epsilon X_u\,dW_u\Big|^p\Big]\Big)$$

$$\le C\Big(\epsilon^p + \int_0^t\mathbb{E}\Big[\sup_{s\le u}|\Delta_\epsilon X_s|^p\Big]\,du + \mathbb{E}\Big[\sup_{s\le t}\Big|\int_0^s\tilde\sigma_u^\epsilon\Delta_\epsilon X_u\,dW_u\Big|^p\Big]\Big).$$

Then, using the BDG inequality, we have

$$\mathbb{E}\Big[\sup_{s\le t}|\Delta_\epsilon X_s|^p\Big] \le C\Big(\epsilon^p + \int_0^t\mathbb{E}\Big[\sup_{s\le u}|\Delta_\epsilon X_s|^p\Big]\,du\Big).$$

Applying Gronwall's Lemma, we obtain

$$\mathbb{E}\Big[\sup_{s\le t}|\Delta_\epsilon X_s|^p\Big] \le C_p\,\epsilon^p. \tag{3.2}$$

2. Next, we compute

$$\nabla X_t - \delta_\epsilon X_t = \int_0^t\tilde b_u^\epsilon(\nabla X_u - \delta_\epsilon X_u)\,du + \int_0^t\tilde\sigma_u^\epsilon(\nabla X_u - \delta_\epsilon X_u)\,dW_u + R_t^\epsilon$$

with

$$R_t^\epsilon = \int_0^t\nabla X_u\big(b'(X_u^x) - \tilde b_u^\epsilon\big)\,du + \int_0^t\nabla X_u\big(\sigma'(X_u^x) - \tilde\sigma_u^\epsilon\big)\,dW_u.$$

Using the same arguments as in the previous step, we obtain

$$\mathbb{E}\Big[\sup_{t\le T}|\nabla X_t - \delta_\epsilon X_t|^2\Big] \le C\,\mathbb{E}\Big[\sup_{t\le T}|R_t^\epsilon|^2\Big].$$

And we compute

$$\mathbb{E}\Big[\sup_{t\le T}|R_t^\epsilon|^2\Big] \le C\,\mathbb{E}\Big[\int_0^T\Big(|\nabla X_u(b'(X_u^x) - \tilde b_u^\epsilon)|^2 + |\nabla X_u(\sigma'(X_u^x) - \tilde\sigma_u^\epsilon)|^2\Big)\,du\Big].$$

Using the Cauchy-Schwarz inequality, we get

$$\mathbb{E}\Big[\int_0^T|\nabla X_u(b'(X_u^x) - \tilde b_u^\epsilon)|^2\,du\Big] \le C\,\mathbb{E}\Big[\int_0^T|\nabla X_u|^4\,du\Big]^{\frac12}\mathbb{E}\Big[\int_0^T|b'(X_u^x) - \tilde b_u^\epsilon|^4\,du\Big]^{\frac12}.$$

We observe that

$$|b'(X_u^x) - \tilde b_u^\epsilon| = \Big|\int_0^1\big(b'(X_u^x) - b'(X_u^x + \lambda\Delta_\epsilon X_u)\big)\,d\lambda\Big| \le C|\Delta_\epsilon X_u|,$$

leading to

$$\mathbb{E}\Big[\int_0^T|\nabla X_u(b'(X_u^x) - \tilde b_u^\epsilon)|^2\,du\Big] \le C\,\mathbb{E}\Big[\sup_{u\le T}|\Delta_\epsilon X_u|^4\Big]^{\frac12},$$

where we used Lemma 3.1. Similarly, we compute

$$\mathbb{E}\Big[\int_0^T|\nabla X_u(\sigma'(X_u^x) - \tilde\sigma_u^\epsilon)|^2\,du\Big] \le C\,\mathbb{E}\Big[\sup_{u\le T}|\Delta_\epsilon X_u|^4\Big]^{\frac12}.$$

We then get, using step 1, that

$$\mathbb{E}\Big[\sup_{t\le T}|R_t^\epsilon|^2\Big] \le C\epsilon^2,$$

and the first part of (3.1) follows letting $\epsilon \downarrow 0$.

3. For ease of notation, we set $b = 0$. We then have

$$\nabla X_t^x = 1 + \int_0^t\sigma'(X_s^x)\nabla X_s^x\,dW_s, \qquad \nabla X_t^{x+\epsilon} = 1 + \int_0^t\sigma'(X_s^{x+\epsilon})\nabla X_s^{x+\epsilon}\,dW_s.$$

Setting $\Gamma^\epsilon = \nabla X^{x+\epsilon} - \nabla X^x$, we get

$$\Gamma_u^\epsilon = \int_0^u\sigma'(X_s^x)\Gamma_s^\epsilon\,dW_s + \int_0^u\big(\sigma'(X_s^{x+\epsilon}) - \sigma'(X_s^x)\big)\nabla X_s^{x+\epsilon}\,dW_s.$$

Squaring the previous equality, we get

$$\sup_{u\le t}|\Gamma_u^\epsilon|^2 \le C\Big(\sup_{u\le t}\Big|\int_0^u\sigma'(X_s^x)\Gamma_s^\epsilon\,dW_s\Big|^2 + \sup_{u\le t}\Big|\int_0^u\big(\sigma'(X_s^{x+\epsilon}) - \sigma'(X_s^x)\big)\nabla X_s^{x+\epsilon}\,dW_s\Big|^2\Big).$$

Taking the expectation and using the BDG inequality, we compute

$$\mathbb{E}\Big[\sup_{u\le t}|\Gamma_u^\epsilon|^2\Big] \le C\Big(\mathbb{E}\Big[\int_0^t|\Gamma_s^\epsilon|^2\,ds\Big] + \mathbb{E}\Big[\int_0^t|X_s^x - X_s^{x+\epsilon}|^2|\nabla X_s^{x+\epsilon}|^2\,ds\Big]\Big).$$

Using the Cauchy-Schwarz inequality, we get

$$\mathbb{E}\Big[\sup_{u\le t}|\Gamma_u^\epsilon|^2\Big] \le C\Big(\int_0^t\mathbb{E}\Big[\sup_{u\le s}|\Gamma_u^\epsilon|^2\Big]\,ds + \mathbb{E}\Big[\sup_{s\in[0,T]}|X_s^x - X_s^{x+\epsilon}|^4\Big]^{\frac12}\Big).$$

Combining Gronwall's Lemma with (3.2), we obtain

$$\mathbb{E}\Big[\sup_{u\le T}|\nabla X_u^{x+\epsilon} - \nabla X_u^x|^2\Big] \le C\epsilon^2. \qquad \square \tag{3.3}$$

Example 3.1. Set $d = 1$. We have

$$\nabla X_t^x = 1 + \int_0^t b'(X_s^x)\nabla X_s^x\,ds + \int_0^t\sigma'(X_s^x)\nabla X_s^x\,dW_s,$$

leading to

$$\nabla X_t^x = \exp\Big(\int_0^t\Big(b'(X_s^x) - \frac{\sigma'(X_s^x)^2}{2}\Big)\,ds + \int_0^t\sigma'(X_s^x)\,dW_s\Big).$$

Example 3.2. Set $d = 1$. Give the expression of the tangent process in the BS model.
3.2.2 Computing the Delta (pathwise approach)

Proposition 3.2. Assume that $g$ is $C_b^1$ and $b$ and $\sigma$ are $C_b^2$; then $u$ is $C_b^1$ and

$$\partial_x u(0,x) = \mathbb{E}\big[\nabla g(X_T^x)\nabla X_T^x\big].$$

Proof. For the proof, we assume that $g$ is $C_b^2$. We are going to compute the limit of $\Delta_F(\epsilon)$ as $\epsilon \to 0$.

1. We have that

$$\Delta_F(\epsilon) = \frac1\epsilon\,\mathbb{E}\big[g(X_T^{x+\epsilon}) - g(X_T^x)\big] = \mathbb{E}\Big[\int_0^1 g'(X_T^x + \lambda\Delta_\epsilon X_T)\,d\lambda\,\frac{\Delta_\epsilon X_T}{\epsilon}\Big].$$

We then compute

$$\big|\mathbb{E}\big[\nabla g(X_T^x)\nabla X_T^x\big] - \Delta_F(\epsilon)\big| = \Big|\mathbb{E}\Big[g'(X_T^x)\Big(\nabla X_T^x - \frac{\Delta_\epsilon X_T}{\epsilon}\Big) - \int_0^1\big(g'(X_T^x + \lambda\Delta_\epsilon X_T) - g'(X_T^x)\big)\,d\lambda\,\frac{\Delta_\epsilon X_T}{\epsilon}\Big]\Big|$$

$$\le C\,\mathbb{E}\Big[\Big|\nabla X_T^x - \frac{\Delta_\epsilon X_T}{\epsilon}\Big|\Big] + C\,\mathbb{E}\Big[\frac{|\Delta_\epsilon X_T|^2}{\epsilon}\Big] \le C\Big(\mathbb{E}\Big[\Big|\nabla X_T^x - \frac{\Delta_\epsilon X_T}{\epsilon}\Big|\Big] + \epsilon\Big),$$

using (3.2). Using (3.1), we then obtain $\lim_{\epsilon\to0}\Delta_F(\epsilon) = \mathbb{E}[\nabla g(X_T^x)\nabla X_T^x]$, proving the differentiability of $u$.

2. We now prove that $\partial_x u$ is continuous. We compute

$$|\partial_x u(0, x+\epsilon) - \partial_x u(0,x)| = \big|\mathbb{E}\big[g'(X_T^{x+\epsilon})\nabla X_T^{x+\epsilon} - g'(X_T^x)\nabla X_T^x\big]\big|$$

$$\le \big|\mathbb{E}\big[g'(X_T^{x+\epsilon})(\nabla X_T^{x+\epsilon} - \nabla X_T^x)\big]\big| + \big|\mathbb{E}\big[(g'(X_T^{x+\epsilon}) - g'(X_T^x))\nabla X_T^x\big]\big|$$

$$\le C\,\mathbb{E}\big[|\nabla X_T^{x+\epsilon} - \nabla X_T^x|\big] + C\,\mathbb{E}\big[|X_T^{x+\epsilon} - X_T^x|^2\big]^{\frac12},$$

where for the last term we used the Cauchy-Schwarz inequality and Lemma 3.1. Now, using (3.3)-(3.2), we have

$$|\partial_x u(0, x+\epsilon) - \partial_x u(0,x)| \le C\epsilon.$$

Letting $\epsilon$ go to 0 concludes the proof. □

3.2.3 Practical implementation

It requires the discretisation of $(X^x, \nabla X^x)$. One may use the usual Euler scheme, setting:

- $(X_0^\pi, \nabla X_0^\pi) = (x, 1)$;

- and on $\pi$:

$$X_{t_{i+1}}^\pi = X_{t_i}^\pi + b(X_{t_i}^\pi)(t_{i+1} - t_i) + \sum_{j=1}^d\sigma^{.j}(X_{t_i}^\pi)\big(W_{t_{i+1}}^j - W_{t_i}^j\big),$$

$$\nabla X_{t_{i+1}}^\pi = \nabla X_{t_i}^\pi + \nabla b(X_{t_i}^\pi)\nabla X_{t_i}^\pi(t_{i+1} - t_i) + \sum_{j=1}^d\nabla\sigma^{.j}(X_{t_i}^\pi)\nabla X_{t_i}^\pi\big(W_{t_{i+1}}^j - W_{t_i}^j\big).$$

Remark: this simplifies if $d = 1$; a sketch is given below.
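A minimal sketch of the pathwise Delta in dimension one, discretising the pair $(X, \nabla X)$ with the Euler scheme above; the model functions and parameters are placeholders, and the payoff must be smooth enough (cf. Proposition 3.2):

```python
import numpy as np

def pathwise_delta(g_prime, b, db, sigma, dsigma, x0, T, n, n_paths, rng=None):
    """Estimate Delta = E[g'(X_T) * grad X_T] by jointly Euler-discretising
    the SDE and its tangent process (d = 1)."""
    rng = rng or np.random.default_rng()
    h = T / n
    X = np.full(n_paths, x0)
    gradX = np.ones(n_paths)                  # tangent process starts at 1
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(h), n_paths)
        X_new = X + b(X) * h + sigma(X) * dW  # update uses the pre-step X
        gradX = gradX + db(X) * gradX * h + dsigma(X) * gradX * dW
        X = X_new
    return (g_prime(X) * gradX).mean()
```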

3.3 Greek weights

• We are interested in computing the sensitivity of

$$u(\theta) = \mathbb{E}\big[g(X_T^\theta)\big]$$

with respect to the parameter $\theta$, at $\theta = \theta_0$.

Example 3.3. (i) If $\theta = x$, the starting point of $X$, then we are computing the 'Delta'.

(ii) If $X$ follows a BS model and $\theta = \sigma$, the volatility of $X$, then we are computing the 'Vega'.

• We are interested in finding random variables $\Pi$ s.t.

$$\partial_\theta u(\theta_0) = \mathbb{E}\big[g(X_T^{\theta_0})\Pi\big]$$

for a large class of payoff functions $g$.

• Precisely, we consider

$$\mathcal{W} := \big\{\Pi \in L^2 \,\big|\, \partial_\theta u(\theta_0) = \mathbb{E}\big[g(X_T^{\theta_0})\Pi\big] \text{ for all bounded measurable functions } g\big\}.$$

• Often in practice, this will lead to

$$\partial_\theta u(\theta_0) = \mathbb{E}\big[g(X_T^{\theta_0})\Pi\big] + \mathcal{E},$$

where $\Pi$ is an approximation of such a Greek weight and $\mathcal{E}$ is the associated error.
3.3.1 Likelihood ratio method

• Assuming that $X_T^\theta$ has a differentiable positive probability density $\phi(\theta, x)$:

$$\partial_\theta\mathbb{E}\big[g(X_T^\theta)\big]\Big|_{\theta=\theta_0} = \mathbb{E}\big[g(X_T^{\theta_0})\,s(\theta_0, X_T^{\theta_0})\big] \quad \text{with} \quad s(\theta,x) = \partial_\theta\ln(\phi(\theta,x)).$$

$s$ is called the score function.

We compute

$$\partial_\theta\mathbb{E}\big[g(X_T^\theta)\big] = \partial_\theta\int g(x)\phi(\theta,x)\,dx = \int g(x)\partial_\theta\phi(\theta,x)\,dx = \int g(x)\frac{\partial_\theta\phi(\theta,x)}{\phi(\theta,x)}\,\phi(\theta,x)\,dx.$$

• If $S^0 := s(\theta_0, X_T^{\theta_0}) \in L^2$, then

$$\mathcal{W} = \big\{\Pi \in L^2 \,\big|\, \mathbb{E}\big[\Pi\,\big|\,X_T^{\theta_0}\big] = S^0\big\}.$$

• We also observe that $\mathrm{Var}\big[g(X_T^{\theta_0})\Pi\big] \ge \mathrm{Var}\big[g(X_T^{\theta_0})S^0\big]$:

$$\mathrm{Var}\big[g(X_T^{\theta_0})\Pi\big] = \mathbb{E}\big[(g(X_T^{\theta_0})\Pi)^2\big] - (\partial_\theta u)^2 = \mathbb{E}\big[\mathbb{E}\big[(g(X_T^{\theta_0})\Pi)^2\,\big|\,X_T^{\theta_0}\big]\big] - (\partial_\theta u)^2$$

$$\ge \mathbb{E}\big[\big(\mathbb{E}\big[g(X_T^{\theta_0})\Pi\,\big|\,X_T^{\theta_0}\big]\big)^2\big] - (\partial_\theta u)^2 = \mathrm{Var}\big[g(X_T^{\theta_0})S^0\big],$$

by Jensen's inequality for the conditional expectation.
Example: Black-Scholes Delta. In the Black-Scholes setting, we have

$$\partial_x u(0,x) = e^{-rT}\,\mathbb{E}\Big[g(X_T^x)\,\frac{W_T}{x\sigma T}\Big].$$

The density $h$ of $X_T^x$ is lognormal and given by

$$h(y) = \frac{1}{y\sigma\sqrt T}\,\varphi(\zeta(y)), \qquad \zeta(y) = \frac{\log(y/x) - (r - \frac12\sigma^2)T}{\sigma\sqrt T},$$

and we compute

$$\partial_x h(y)/h(y) = -\zeta(y)\,\partial_x\zeta(y) = \frac{\log(y/x) - (r - \frac12\sigma^2)T}{x\sigma^2 T}.$$

Similar computations lead to the following form for the Vega:

$$\partial_\sigma u(0,x) = e^{-rT}\,\mathbb{E}\Big[g(X_T^x)\Big(\frac{W_T^2}{\sigma T} - W_T - \frac1\sigma\Big)\Big].$$

Higher order derivatives. Assuming that $X_T^\theta$ has a twice differentiable positive probability density $\phi(\theta,x)$:

$$\partial^2_{\theta\theta}\mathbb{E}\big[g(X_T^\theta)\big] = \mathbb{E}\Big[g(X_T^\theta)\,\frac{\partial^2_{\theta\theta}\phi(\theta, X_T^\theta)}{\phi(\theta, X_T^\theta)}\Big].$$

Example 3.4. (Black-Scholes Gamma)

$$\partial^2_{xx}u(0,x) = e^{-rT}\,\frac{1}{x^2\sigma T}\,\mathbb{E}\Big[g(X_T^x)\Big(\frac{W_T^2}{\sigma T} - W_T - \frac1\sigma\Big)\Big].$$
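A minimal sketch of these weights in the Black-Scholes model (parameters illustrative); the digital payoff in the example shows that no smoothness of $g$ is needed:

```python
import numpy as np

def bs_lrm_greeks(g, x, r, vol, T, N, rng=None):
    """Delta, Vega and Gamma by the likelihood ratio method in Black-Scholes,
    with weights W_T/(x vol T), W_T^2/(vol T) - W_T - 1/vol, and the
    Gamma weight of Example 3.4, all from the same simulation."""
    rng = rng or np.random.default_rng()
    W_T = np.sqrt(T) * rng.standard_normal(N)
    X_T = x * np.exp((r - 0.5 * vol**2) * T + vol * W_T)
    G = np.exp(-r * T) * g(X_T)
    vega_weight = W_T**2 / (vol * T) - W_T - 1.0 / vol
    delta = (G * W_T / (x * vol * T)).mean()
    vega = (G * vega_weight).mean()
    gamma = (G * vega_weight).mean() / (x**2 * vol * T)
    return delta, vega, gamma

# Works for non-smooth payoffs too, e.g. a digital option:
greeks = bs_lrm_greeks(lambda s: (s > 100.0).astype(float),
                       x=100.0, r=0.02, vol=0.3, T=1.0, N=200_000)
```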
3.3.2 Integration by parts

• We consider the Delta in the Black-Scholes setting ($r = 0$); the payoff function is $C_b^1$.

• From the tangent process approach, we know that:

$$\partial_x u(0,x) = \frac1x\,\mathbb{E}\big[g'(X_T^x)X_T^x\big].$$

• Using an IBP argument, we compute

$$\partial_x u(0,x) = \frac{1}{x\sigma T}\,\mathbb{E}\big[g(X_T^x)W_T\big].$$

Indeed, we have

$$\partial_x u(0,x) = \int g'\big(xe^{-\sigma^2T/2 + \sigma\sqrt T y}\big)\,e^{-\sigma^2T/2 + \sigma\sqrt T y}\,\varphi(y)\,dy,$$

which rewrites

$$\partial_x u(0,x) = \frac{1}{x\sigma\sqrt T}\int\partial_y\big[g\big(xe^{-\sigma^2T/2 + \sigma\sqrt T y}\big)\big]\,\varphi(y)\,dy = \frac{1}{x\sigma\sqrt T}\int g\big(xe^{-\sigma^2T/2 + \sigma\sqrt T y}\big)\,y\,\varphi(y)\,dy,$$

using the Gaussian integration by parts $\int f'(y)\varphi(y)\,dy = \int f(y)\,y\,\varphi(y)\,dy$.
3.3.3 Bismut's formula

Theorem 3.2.

$$\partial_x u(0,x) = \mathbb{E}\Big[g(X_T^x)\,\frac1T\int_0^T\sigma(X_s^x)^{-1}\nabla X_s^x\,dW_s\Big]. \tag{3.4}$$

(Here we have $\Pi = \frac1T\int_0^T\sigma(X_s^x)^{-1}\nabla X_s^x\,dW_s$.)

Proof. We observe that, for $0 \le s \le T$,

$$\partial_x u(0,x) = \mathbb{E}\big[\partial_x u(s, X_s^x)\nabla X_s^x\big],$$

and then

$$\partial_x u(0,x) = \frac1T\int_0^T\mathbb{E}\big[\partial_x u(s, X_s^x)\nabla X_s^x\big]\,ds.$$

By the Itô isometry,

$$\mathbb{E}\Big[\int_0^T\partial_x u(s, X_s^x)\nabla X_s^x\,ds\Big] = \mathbb{E}\Big[\int_0^T\partial_x u(s, X_s^x)\sigma(X_s^x)\,dW_s\int_0^T\sigma(X_s^x)^{-1}\nabla X_s^x\,dW_s\Big].$$

And via the martingale representation theorem, since $u$ is a solution to the PDE,

$$\int_0^T\partial_x u(s, X_s^x)\sigma(X_s^x)\,dW_s = g(X_T^x) - u(0,x). \qquad \square$$

Remark 3.1. If $X_t := x + bt + \sigma W_t$ then $\nabla X_t = 1$ and we have

$$\partial_x\mathbb{E}[\phi(X_t)] = \mathbb{E}\Big[\phi(X_t)\,\frac{W_t}{\sigma t}\Big]. \tag{3.5}$$
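A minimal sketch of a Monte Carlo estimator based on (3.4), approximating the stochastic integral in the weight by a left-point Riemann sum along the Euler paths of $(X, \nabla X)$ ($d = 1$; the function names and arguments are placeholders):

```python
import numpy as np

def bismut_delta(g, b, db, sigma, dsigma, x0, T, n, n_paths, rng=None):
    """Delta via Bismut's formula (3.4): the weight (1/T) int_0^T
    sigma(X_s)^{-1} grad X_s dW_s is discretised along the Euler paths
    of the SDE and of its tangent process (d = 1, sigma > 0)."""
    rng = rng or np.random.default_rng()
    h = T / n
    X = np.full(n_paths, x0)
    gradX = np.ones(n_paths)
    weight = np.zeros(n_paths)
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(h), n_paths)
        weight += gradX / sigma(X) * dW       # left-point discretisation of the integral
        X_new = X + b(X) * h + sigma(X) * dW
        gradX = gradX + db(X) * gradX * h + dsigma(X) * gradX * dW
        X = X_new
    return (g(X) * weight / T).mean()
```

Unlike the pathwise estimator, this weight-based estimator applies to merely measurable bounded payoffs $g$.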
4 U.S. options in complete markets

4.1 Definition and first properties

• An American option can be exercised at any time until its maturity $T > 0$.

• Setting: interest rate $r$ constant, $X$ solution on $[0,T]$ to

$$X_t = X_0 + \int_0^t rX_s\,ds + \int_0^t\sigma(s, X_s)\,dW_s,$$

where $W$ is a standard BM under the risk-neutral probability, and the market is complete ($\sigma$ is invertible).

Upon exercise, the option pays to the owner $g(X_t)$ ($g$ is the exercise payoff, Lipschitz continuous).

• Super-hedging price:

$$p_0^{us} := \inf\{y \in \mathbb{R} \,|\, \exists\,\phi \text{ admissible financial strategy s.t. } V_t^{y,\phi} \ge g(X_t)\ \forall\,t\in[0,T]\}.$$

• Dual representation of the price (optimal stopping problem):

↪ at time 0:

$$p_0^{us} = \sup_{\tau\in\mathcal{T}_{[0,T]}}\mathbb{E}\big[e^{-r\tau}g(X_\tau)\big] = \mathbb{E}\big[e^{-r\tau^\star}g(X_{\tau^\star})\big];$$

↪ at time $t \in [0,T]$:

$$p_t^{us} = \mathrm{esssup}_{\tau\in\mathcal{T}_{[t,T]}}\,\mathbb{E}\big[e^{-r(\tau-t)}g(X_\tau)\,\big|\,\mathcal{F}_t\big].$$

The discounted price is a super-martingale. One easily proves that $p_0^{us} \ge \sup_{\tau\in\mathcal{T}_{[0,T]}}\mathbb{E}\big[e^{-r\tau}g(X_\tau)\big]$.

• (Non-linear) PDE representation: $p^{us}(t,x) = \sup_{\tau\in\mathcal{T}_{[t,T]}}\mathbb{E}\big[e^{-r(\tau-t)}g(X_\tau^{t,x})\big]$ is a solution³ to

$$\min\big\{-\partial_t u - \mathcal{L}^X u + ru,\ u - g\big\} = 0 \quad \text{on } [0,T)\times\mathbb{R},$$

$$u(T,\cdot) = g(\cdot).$$

• This comes from the dynamic programming principle: for all stopping times $t \le \theta \le T$,

$$p^{us}(t,x) = \sup_{\tau\in\mathcal{T}_{[t,T]}}\mathbb{E}\big[e^{-r(\tau-t)}g(X_\tau^{t,x})\mathbf{1}_{\{\tau<\theta\}} + e^{-r(\theta-t)}p^{us}(\theta, X_\theta^{t,x})\mathbf{1}_{\{\theta\le\tau\}}\big]$$

(see e.g. Chapter 5 in [68]).

³ In some sense... e.g. the viscosity sense.
4.2 Bermudan options

• An option that can be exercised at a discrete set of times ($\mathcal{R}$) during its life:

$$\mathcal{R} = \{0 =: s_0, \dots, s_j, \dots, s_\kappa := T\}.$$

• Super-hedging price:

$$p_0^b := \inf\{y \in \mathbb{R} \,|\, \exists\,\phi \text{ admissible financial strategy s.t. } V_s^{y,\phi} \ge g(X_s)\ \forall\,s\in\mathcal{R}\}.$$

• Dual representation of the price:

↪ at time 0:

$$p_0^b = \sup_{\tau\in\mathcal{T}^{\mathcal{R}}_{[0,T]}}\mathbb{E}\big[e^{-r\tau}g(X_\tau)\big] = \mathbb{E}\big[e^{-r\tau^*}g(X_{\tau^*})\big];$$

↪ at time $t \in \mathcal{R}$:

$$p_t^b = \mathrm{esssup}_{\tau\in\mathcal{T}^{\mathcal{R}}_{[t,T]}}\,\mathbb{E}\big[e^{-r(\tau-t)}g(X_\tau)\,\big|\,\mathcal{F}_t\big].$$

• Backward programming algorithm (denoting $G := g(X)$):

Definition 4.1. (i) At $T$, set $Y_T = G_T$.

(ii) For $j < \kappa$, compute

$$Y_{s_j} = \max\big(C_{s_j}, G_{s_j}\big), \quad \text{where} \quad C_{s_j} := \mathbb{E}\big[e^{-r(s_{j+1}-s_j)}Y_{s_{j+1}}\,\big|\,\mathcal{F}_{s_j}\big] \quad \text{(continuation value)}.$$

We have $Y_t = p_t^b$, $t \in \mathcal{R}$.

• We now set $r = 0$ and prove the above results.
Proposition 4.1. (i) $Y$ is the smallest super-martingale above $G = g(X)$.

(ii) An optimal stopping time $\tau^\star$ is given by

$$\tau^\star = \inf\{t \in \mathcal{R} \,|\, Y_t = G_t\} \wedge T,$$

i.e. $Y_0 = \mathbb{E}[G_{\tau^\star}]$.

(iii) Martingale (Doob) decomposition: $Y_t = Y_0 + M_t^\star - A_t^\star$.

(iv) $Y$ is the super-hedging price.

Proof.

(i) By induction.

(ii) Since $Y$ is a super-martingale above $G$ on $\mathcal{R}$, we observe that $Y_0 \ge \mathbb{E}[Y_\tau] \ge \mathbb{E}[G_\tau]$ for all $\tau \in \mathcal{T}^{\mathcal{R}}_{[0,T]}$.

Let us define $\tau^\star = \inf\{t \in \mathcal{R} \,|\, Y_t = G_t\}$. We compute

$$\mathbb{E}[Y_{\tau^\star} - Y_0] = \mathbb{E}\Big[\sum_{j=0}^{\kappa-1}(Y_{s_{j+1}} - Y_{s_j})\mathbf{1}_{\{\tau^\star>s_j\}}\Big] = \mathbb{E}\Big[\sum_{j=0}^{\kappa-1}\mathbb{E}\big[(Y_{s_{j+1}} - Y_{s_j})\mathbf{1}_{\{\tau^\star>s_j\}}\,\big|\,\mathcal{F}_{s_j}\big]\Big]$$

$$= \mathbb{E}\Big[\sum_{j=0}^{\kappa-1}\big(\mathbb{E}\big[Y_{s_{j+1}}\,\big|\,\mathcal{F}_{s_j}\big] - Y_{s_j}\big)\mathbf{1}_{\{\tau^\star>s_j\}}\Big] = 0,$$

since $Y_{s_j} = C_{s_j} = \mathbb{E}[Y_{s_{j+1}}|\mathcal{F}_{s_j}]$ on $\{\tau^\star > s_j\}$. Together with $Y_{\tau^\star} = G_{\tau^\star}$, this proves that $Y_0 = \sup_{\tau\in\mathcal{T}^{\mathcal{R}}_{[0,T]}}\mathbb{E}[G_\tau]$.

(iii) For $t \in \mathcal{R}$, we have that $Y_t = Y_0 + M_t^\star - A_t^\star$, where $M^\star$ is a martingale and $A^\star$ is an increasing process, given recursively by $(M_0^\star, A_0^\star) = (0,0)$ and

$$M_{s_{j+1}}^\star = M_{s_j}^\star + Y_{s_{j+1}} - \mathbb{E}\big[Y_{s_{j+1}}\,\big|\,\mathcal{F}_{s_j}\big], \qquad A_{s_{j+1}}^\star = A_{s_j}^\star + Y_{s_j} - \mathbb{E}\big[Y_{s_{j+1}}\,\big|\,\mathcal{F}_{s_j}\big].$$

We notice that $Y_{s_j} - \mathbb{E}\big[Y_{s_{j+1}}\,\big|\,\mathcal{F}_{s_j}\big] = [G_{s_j} - C_{s_j}]_+$.

(iv) Denote

$$\Gamma := \{y \in \mathbb{R} \,|\, \exists\,\phi \text{ admissible financial strategy s.t. } V_s^{y,\phi} \ge g(X_s)\ \forall\,s\in\mathcal{R}\}.$$

a. Note that for $y \in \Gamma$, $V^{y,\phi}$ is a super-martingale above $G$, and thus in particular

$$y = V_0^{y,\phi} \ge Y_0,$$

which leads to $\inf\Gamma \ge Y_0$.

b. Denote $\eta^\star = Y_0 + M^\star$; we have that

$$Y_0 = \mathbb{E}[\eta_{\tau^\star}^\star] = \mathbb{E}[\eta_T^\star].$$

In our setting, this means that we can replicate $\eta_T^\star$ with an initial wealth of $Y_0$, i.e. there exists an admissible strategy $\phi$ s.t.

$$\eta_t^\star = V_t^{Y_0,\phi}, \quad \text{and then} \quad V_t^{Y_0,\phi} \ge Y_t \ge G_t, \quad t \in \mathcal{R}.$$

Thus $Y_0 \in \Gamma$. □
Proposition 4.2. Set $|\mathcal{R}| = \max_{0<j\le\kappa}(s_j - s_{j-1})$; then the following holds:

$$\sup_{t\in[0,T]}\mathbb{E}\big[|p_t^b - p_t^{us}|^2\big] \le C|\mathcal{R}|^\alpha,$$

where $\alpha = 1$ if $g$ is Lipschitz continuous and $\alpha = 2$ if $g \in C_b^2$.

Proof. We first observe that

$$0 \le p_t^{us} - p_t^b.$$

Let $\bar\tau$ be the projection on the grid $\mathcal{R}$ of $\tau^*$ (the optimal stopping time for the US option); we have

$$p_t^{us} - p_t^b = \mathbb{E}\big[g(X_{\tau^*})\,\big|\,\mathcal{F}_t\big] - p_t^b \le \mathbb{E}\big[g(X_{\tau^*})\,\big|\,\mathcal{F}_t\big] - \mathbb{E}\big[g(X_{\bar\tau})\,\big|\,\mathcal{F}_t\big] \le C\,\mathbb{E}\big[|X_{\tau^*} - X_{\bar\tau}|\,\big|\,\mathcal{F}_t\big] \le C\sqrt{|\mathcal{R}|}.$$

If $g \in C_b^2$, we apply Itô's formula to obtain

$$p_t^{us} - p_t^b \le C\,\mathbb{E}\Big[\Big|\int_{\tau^*}^{\bar\tau}\mathcal{L}^X g(X_s)\,ds\Big|\,\Big|\,\mathcal{F}_t\Big] \le C|\mathcal{R}|. \qquad \square$$
4.2.1 Discretisation of the forward process

• We now introduce a discretisation grid $\pi$:

$$\pi = \{0 =: t_0 < \dots < t_i < \dots < t_n := T\},$$

such that $\mathcal{R} \subset \pi$.

• We define the Bermudan option associated to the exercise payoff $g(X^\pi)$:

$$p_0^\pi = \sup_{\tau\in\mathcal{T}^{\mathcal{R}}_{[0,T]}}\mathbb{E}\big[e^{-r\tau}g(X_\tau^\pi)\big] = \mathbb{E}\big[e^{-r\tau^*}g(X_{\tau^*}^\pi)\big].$$

Proposition 4.3. In our setting, we have

$$|p_0^b - p_0^\pi| \le C|\pi|^{\frac12}$$

when $g$ is Lipschitz continuous.

Proof. See Exercise II.18. □
Extension to the continuous case

• We assume that $\mathcal{R} = \pi$.

• Combining Proposition 4.2 and Proposition 4.3, we have

$$|p_0^{us} - p_0^\pi| \le C\sqrt{|\pi|}.$$
4.2.2 Longstaff-Schwartz algorithm

• Observe that in Definition 4.1, the optimal stopping time $\hat\tau_0$ (from time 0) can be estimated by:

1. set $\hat\tau_\kappa = T$;

2. then set $\hat\tau_j = s_j\mathbf{1}_{A_j} + \hat\tau_{j+1}\mathbf{1}_{A_j^c}$ with $A_j = \{Y_{s_j} \le G_{s_j}\}$.

↪ Indeed, one has $Y_0 = \mathbb{E}[G_{\hat\tau_0}]$, namely $\tau^\star = \hat\tau_0$.

• The Longstaff-Schwartz algorithm [59] focuses on this sequence of optimal exercise times.

Definition 4.2. (Longstaff-Schwartz)

1. Set $\tilde\tau_\kappa = T$.

2. Then set $\tilde\tau_j = s_j\mathbf{1}_{E_j} + \tilde\tau_{j+1}\mathbf{1}_{E_j^c}$ with $E_j = \big\{\mathbb{E}\big[G_{\tilde\tau_{j+1}}\,\big|\,\mathcal{F}_{s_j}\big] \le G_{s_j}\big\}$.

↪ Then one has $Y_0 = \mathbb{E}[G_{\tilde\tau_0}]$.

• Note that the value process $Y$ is computed only through its representation in terms of optimal stopping times: indeed, $Y_{s_j} = \max\big(G_{s_j}, \mathbb{E}\big[G_{\tilde\tau_{j+1}}\,\big|\,\mathcal{F}_{s_j}\big]\big)$.
4.3 Dual approach

• We work in the Bermudan case on the discrete time grid <.

• Let M0 be the space of square integrable martingales M such that M0 = 0.

Theorem 4.1. The following holds:

Y0 = sup_τ E[Gτ] = inf_{M∈M0} E[ sup_{t∈ℜ} (Gt − Mt) ] . (4.1)

The inf is achieved for M*, the martingale part of the Doob-Meyer decomposition,
and

Y0 = sup_{t∈ℜ} (Gt − M*_t) .

• The previous representation has been introduced in [71] (for the continuous
case) and simultaneously in [47], where the formula involves supermartingales.

Proof. 1. First, we observe that, for any M ∈ M0,

Y0 = sup_τ E[Gτ − Mτ] ≤ E[ sup_{t∈ℜ} (Gt − Mt) ] .

2. Recall the Doob-Meyer decomposition theorem: Yt = Y0 + M*_t − A*_t, where A* is
an increasing predictable process with A*_0 = 0. Thus, we have

Gt ≤ Yt = Y0 + M*_t − A*_t ,

which leads to

(Gt − M*_t) ≤ Y0 − A*_t ≤ Y0 . 2

• In [71], sub-optimal martingales are constructed in an "ad-hoc" way; see Section
4.4.2 for possible systematic approaches.

• Numerical methods based on the dual formula (4.1) naturally lead to upper
bounds for the true price.

4.4 Implementation using regression techniques

• The regression part is required to estimate the conditional expectations (linear
or nonlinear regression [74, 47]).

• Approximation of the Bermudan option price on

ℜ = {0 =: s0 , . . . , sj , . . . , sκ := T } .

• Assume perfect simulation of the underlying and r = 0 (w.l.o.g.).

• Use the Backward Algorithm:

(i) at T , set Yκ = g(XT ).

(ii) For j < κ, compute

Yj = max( E[ Yj+1 | F_{s_j} ] , g(X_{s_j}) )

,→ need to compute the continuation value.

4.4.1 Linear Regression-based methods

• An easy backward induction proves Yj = uj(X_{s_j}), 1 ≤ j ≤ κ (X is Markovian).

• Denote Cj := cj(X_{s_j}) := E[ Yj+1 | X_{s_j} ] = E[ Yj+1 | F_{s_j} ] (continuation value);
we have

Yj = cj(X_{s_j}) ∨ g(X_{s_j}) .

How to approximate Cj? (assuming all quantities are square integrable)

• Observe that

Cj = argmin_{Z∈L²(F_{s_j})} E[ |Yj+1 − Z|² ] = argmin_{Z∈L²(σ(X_{s_j}))} E[ |Yj+1 − Z|² ] .

• Denoting L²_j the set of P_{X_{s_j}}-square integrable functions, we observe

cj = argmin_{f∈L²_j} E[ |Yj+1 − f(X_{s_j})|² ] .

• L²_j is too big! So one considers some basis functions (ψ_`)_{`≥1} and then picks a
finite number of them, say K, to get

f ≃ Σ_{`=1}^K β^` ψ_`

((β^`)_{1≤`≤K} depends on f ...)

• With this approximation, the minimisation problem can be rewritten

β̄_j = argmin_{β∈R^K} E[ | u_{j+1}(X_{s_{j+1}}) − Σ_{`=1}^K β^` ψ_`(X_{s_j}) |² ] (4.2)

and one sets

c̄_j := Σ_{`=1}^K β̄_j^` ψ_` , and cj = c̄_j + error

(Cj = C̄j + error, where C̄j = Σ_{`=1}^K β̄_j^` ψ_`(X_{s_j}).)

• The coefficient β̄_j is easily calculated, once it is observed that C̄j is the orthogonal
projection of Yj+1 on V = Vect(ψ1(Xj), . . . , ψK(Xj)):

We denote ψ := (ψ1, . . . , ψK) and set

B^j_ψ = E[ ψ(Xj)^⊤ ψ(Xj) ] , B^j_{ψu} = E[ ψ(Xj)^⊤ u_{j+1}(X_{j+1}) ] ;

we have that

β̄_j = (B^j_ψ)^{−1} B^j_{ψu} ,

assuming B^j_ψ non-singular.

By the property of the orthogonal projection on the vector space V, we have
E[ (Yj+1 − C̄j) Z ] = 0 for all Z ∈ V, leading to

E[ ( Yj+1 − Σ_{`=1}^K β̄_` ψ_`(Xj) ) ψ_r(Xj) ] = 0 , 1 ≤ r ≤ K .

• In practice, one has to consider estimated counterparts of B^j_ψ, B^j_{ψu}, e.g.

B̂^j_ψ = (1/N) Σ_{i=1}^N ψ(X^i_j)^⊤ ψ(X^i_j) ,

where (X^i_j)_{1≤i≤N} are i.i.d. copies of Xj.
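As an illustration, the empirical regression step can be written in a few lines; this is only a sketch, where the helper psi (the feature map returning the K basis functions) is a hypothetical name and a polynomial basis is used for concreteness.

```python
import numpy as np

def regression_step(X_j, U_next, psi):
    """Estimate beta_bar_j in (4.2) by least squares over N simulated paths."""
    Psi = psi(X_j)                                       # (N, K) design matrix
    beta, *_ = np.linalg.lstsq(Psi, U_next, rcond=None)  # solves B_psi beta = B_{psi u}
    return beta                                          # continuation value ~ Psi @ beta

psi_poly = lambda x: np.vander(np.asarray(x), 4, increasing=True)  # basis 1, x, x^2, x^3
```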

• Full approximation (Tsitsiklis-Van Roy): observe that in practice one has to
estimate ū_j = c̄_j ∨ g instead of u_j, leading to the following method (hats denote
the estimated counterparts of the barred quantities):

1. Simulate (X^i_j)_{1≤i≤N}, i.i.d. copies of Xj, for j ≤ κ.

2. At T, set Y^i_κ = g(X^i_κ) and û_κ = g.

3. For j < κ, compute β̂_j = (B̂^j_ψ)^{−1} B̂^j_{ψ û_{j+1}};

set ĉ_j := Σ_{`=1}^K β̂^`_j ψ_` and û_j = ĉ_j ∨ g.

• One can compute an approximated optimal policy τ̂* on each path and recompute
E[g(X_{τ̂*})].

Remark 4.1. 1. The choice of the basis functions is key, especially in high
dimension.

2. The linear-regression method is used for the Longstaff-Schwartz algorithm [59].

3. The error analysis (approximation/statistical) is involved, see [25, 39, 36].

4. This is a low-biased estimator of the American option price.

4.4.2 Implementation of the dual approach

• Based on Theorem 4.1, one should find a "good" martingale. Various approaches
beyond "guessing" [71] have been considered, see e.g. [8] and the references therein.

• We recall the method in [47], see also [33, Chapter 8].

• The martingale to use in (4.1) is M*, which is given by

ΔM*_j = Y_{s_{j+1}} − E[ Y_{s_{j+1}} | F_{s_j} ] . (4.3)

• To ensure the martingale property, Haugh & Kogan suggest to recompute the
previous martingale increment for each simulated path ⇒ Nested Simulation:

– An approximation û of the US option price has been computed.

– Simulate M paths of X: for each X^m_{s_j}, compute E[ û(X_{s_{j+1}}) | X_{s_j} ] by
resimulation and set

M̂_{j+1} = M̂_j + Δ̂_j with Δ̂_j = û(X_{s_{j+1}}) − E[ û(X_{s_{j+1}}) | X_{s_j} ] . (4.4)

– Compute (1/M) Σ_{m=1}^M max_{t∈ℜ} ( g(X^m_t) − M̂^m_t ) .
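A possible implementation of this nested-simulation procedure is sketched below for a Black-Scholes Bermudan put with r = 0; u_hat is a user-supplied approximation of the value function at each exercise date (e.g. obtained by regression), and all parameters are illustrative. Recall that any (approximate) martingale yields a valid upper bound, so even a crude u_hat is usable.

```python
import numpy as np

def dual_upper_bound(u_hat, S0=100., K=100., sigma=0.2, T=1., kappa=10,
                     n_outer=500, n_inner=500, seed=0):
    rng = np.random.default_rng(seed)
    h = T / kappa
    step = lambda s, z: s * np.exp(-0.5 * sigma**2 * h + sigma * np.sqrt(h) * z)
    payoff = lambda s: np.maximum(K - s, 0.0)
    S, M = np.full(n_outer, S0), np.zeros(n_outer)
    best = payoff(S) - M                              # running max of G_t - M_hat_t
    for j in range(kappa):
        inner = step(S[:, None], rng.standard_normal((n_outer, n_inner)))
        cond = u_hat(j + 1, inner).mean(axis=1)       # E[u_hat(S_{j+1}) | S_j] by resimulation
        S = step(S, rng.standard_normal(n_outer))
        M += u_hat(j + 1, S) - cond                   # martingale increment (4.4)
        best = np.maximum(best, payoff(S) - M)
    return best.mean()                                # high-biased price estimate

# crude illustrative choice: u_hat = payoff itself
print(dual_upper_bound(lambda j, s: np.maximum(100. - s, 0.0)))
```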

4.5 Quantization based methods

4.5.1 Introduction - cubature formula

See e.g. the review article [62] or Chapter 5 in the book [63]

• Let X be an Rd-valued random variable. Quantization → find the best approximation
of X by a discrete random variable taking at most M different values x :=
{x1 , . . . , xM }.

• The Voronoi tessellation of the M-quantizer x is a (Borel) partition C(x) := (Ci(x))_{1≤i≤M}
s.t.

Ci(x) ⊂ {ξ ∈ Rd | |ξ − xi| ≤ min_{j≠i} |ξ − xj|} .

• Nearest neighbour projection on x: Px : ξ ∈ Rd ↦ xi if ξ ∈ Ci(x).

• x-quantization of X: X̂^x = Px(X) (remark: if X is absolutely continuous, any
two x-quantizations are P-a.s. equal).

• Pointwise error: |X − X̂^x| := d(X, {x1, . . . , xM}) = min_{1≤i≤M} |X − xi|.

• Quadratic mean quantization error: E[ |X − X̂^x|² ]^{1/2} =: ‖X − X̂^x‖₂.

Optimal quantization (in this presentation: X has infinite support)

• Proposition 1 in [62] or Theorem 5.1 in [63]:

1. The quadratic distortion function x ↦ E[ |X − X̂^x|² ] reaches a minimum
at some quantizer x*.

2. The function M ↦ E[ |X − X̂^{x*}|² ] decreases to 0 as M → +∞.

• Upper bound on the convergence rate for X ∈ L^{2+ε}, for some ε > 0: there
exists C_{d,ε} s.t.

∀M ≥ 1, min_{x∈(Rd)^M} ‖X − X̂^x‖₂ ≤ C_{d,ε} ‖X‖_{2+ε} M^{−1/d} . (4.5)

This is Zador's Theorem, see e.g. [60].

• Any L²-optimal M-quantizer is the best approximation of X by L² r.v. taking
at most M values (least squares approximation), i.e.

‖X − X̂^x‖₂ := min{ ‖X − Y‖₂ | #Im(Y) ≤ M } . (4.6)

Proof. Set y := Im(Y) = {y1, . . . , yM} and observe that

min_i |X − yi| ≤ |X − Y| P-a.s. ,

so that ‖X − Y‖₂ ≥ ‖X − X̂^y‖₂ ≥ ‖X − X̂^x‖₂. 2

• Any L²-optimal M-quantizer x* ∈ (Rd)^M is stationary in the following sense:

E[ X | X̂^{x*} ] = X̂^{x*} . (4.7)

(In particular E[X] = E[X̂^x] as soon as the quantizer is stationary, which
generally does not correspond only to minima.)

Proof. We compute

E[ |X − X̂^{x*}|² ]
= E[ ( X − E[X | X̂^{x*}] )² + 2( X − E[X | X̂^{x*}] )( E[X | X̂^{x*}] − X̂^{x*} ) + ( E[X | X̂^{x*}] − X̂^{x*} )² ]
= E[ ( X − E[X | X̂^{x*}] )² ] + E[ ( E[X | X̂^{x*}] − X̂^{x*} )² ] ,

where the last equality is obtained by conditioning w.r.t. X̂^{x*}.

From the previous point, we also have, as E[X | X̂^{x*}] takes at most M different
values (it is a measurable function of X̂^{x*}),

E[ |X − X̂^{x*}|² ] ≤ E[ ( X − E[X | X̂^{x*}] )² ] .

This leads to

E[ ( E[X | X̂^{x*}] − X̂^{x*} )² ] = 0 .

• Two M-quantizers (M = 500) of N(0, I2), one of them being (close to being)
optimal... [62]

For a discussion of how to obtain optimal quantization grids, we refer to Section
5.3 in [63].

Cubature formulas

• Below, x is an L²-optimal M-quantizer for X.

• One naturally computes

E[g(X)] ≃ E[ g(X̂^x) ] = Σ_{i=1}^M g(xi) P(X ∈ Ci(x)) .

• If g is Lipschitz,

| E[g(X)] − E[ g(X̂^x) ] | ≤ [g]_Lip E[ |X − X̂^x| ] ≤ [g]_Lip ‖X − X̂^x‖₂ .

• If g is C¹ with Lipschitz derivative,

| E[g(X)] − E[ g(X̂^x) ] | ≤ [Dg]_Lip E[ |X − X̂^x|² ] , (4.8)

as X̂^x is stationary.

• Convergence when M → ∞ follows from (4.5).
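As an illustration, the following sketch builds a rough quadratic quantizer of N(0, 1) by a Monte Carlo Lloyd fixed point (each codepoint is replaced by the empirical mean of its Voronoi cell) and then applies the cubature formula; it is a toy version, not the algorithms discussed in [62, 63], and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n_samples, n_iter = 20, 200_000, 30
sample = rng.standard_normal(n_samples)        # Monte Carlo pool for N(0,1)
x = np.sort(rng.standard_normal(M))            # initial M-quantizer

for _ in range(n_iter):                        # Lloyd step: x_i <- E[X | X in C_i(x)]
    cells = np.argmin(np.abs(sample[:, None] - x[None, :]), axis=1)
    for i in range(M):
        if np.any(cells == i):
            x[i] = sample[cells == i].mean()

cells = np.argmin(np.abs(sample[:, None] - x[None, :]), axis=1)
weights = np.bincount(cells, minlength=M) / n_samples   # estimates P(X in C_i(x))
g = lambda t: np.maximum(t - 0.5, 0.0)         # an illustrative Lipschitz payoff
print(np.dot(g(x), weights), g(sample).mean()) # cubature formula vs. plain Monte Carlo
```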

Conditional expectation

• For X, Y, denote their quantized versions by X̂, Ŷ.

• Approximation of Θ = E[F(X)|Y] by Θ̂ = E[ F(X̂) | Ŷ ]...

• One needs to compute X̂, Ŷ and the law of (X̂, Ŷ).

• Say E[F(X)|Y] = ϕ_F(Y) with ϕ_F Lipschitz; then:

‖Θ − Θ̂‖₂ ≤ C( ‖X − X̂‖₂ + ‖Y − Ŷ‖₂ ) . (4.9)

Proof. See Exercise II.19. 2

4.5.2 Quantization tree for optimal stopping problem

This technique has been introduced in [6, 5], where a complete error analysis is done;
see also Section 11.3.2 in [63].

• We are given a discrete-time Markov chain (Xk)_{0≤k≤n} on a grid π (samples of
(Xt)_{0≤t≤T} or an associated scheme).

• We use the Backward Programming algorithm of Definition 4.1.

• In the Markovian setting, recall that

Yk = uk(Xk) where uk = ck ∨ g and ck(Xk) = E[ u_{k+1}(X_{k+1}) | Xk ] .

• For each k, we consider the (optimal) quantization X̂k of Xk on the grid
Ck := (C^i_k)_{1≤i≤Mk}.

• For any ϕ, we replace E[ϕ(X_{k+1})|Xk] by E[ ϕ(X̂_{k+1}) | X̂k ]; denote then

π^k_{ij} = P( X̂_{k+1} = x^j_{k+1} | X̂k = x^i_k ) .

• Define the pseudo-Snell envelope:

Ŷn = g(X̂n) ,
Ŷk = max{ E[ Ŷ_{k+1} | X̂k ] , g(X̂k) } . (4.10)

• A backward induction yields Ŷk = ŷk(X̂k), where

ŷn = g ,
ŷk(x^i_k) = max{ Σ_{j=1}^{M_{k+1}} π^k_{ij} ŷ_{k+1}(x^j_{k+1}) , g(x^i_k) } . (4.11)

• 'Online' computational cost: ∼ Σ_{k=0}^{n−1} Mk M_{k+1}.

• Note that the grids and the transition weights (π^k) are computed offline, e.g. by MC:

π^k_{ij} = lim_{N→∞} #{n | X̂^n_k = x^i_k & X̂^n_{k+1} = x^j_{k+1}} / #{n | X̂^n_k = x^i_k} ,

where (X^n_k)_{1≤n≤N, 1≤k≤κ} are MC simulations of the Markov chain (Xk)_{1≤k≤κ}.
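For illustration, the offline counting estimator and one step of the recursion (4.11) can be sketched as follows; the array names (Iq_k, Iq_k1, holding for each path the index of the quantization cell hit at dates k and k+1) are ours, not the course's.

```python
import numpy as np

def transition_matrix(Iq_k, Iq_k1, M_k, M_k1):
    P = np.zeros((M_k, M_k1))
    np.add.at(P, (Iq_k, Iq_k1), 1.0)              # joint cell-occupation counts
    row = P.sum(axis=1, keepdims=True)
    return np.divide(P, row, out=np.zeros_like(P), where=row > 0)

def backward_step(P_k, y_next, g_k):
    # y_hat_k(x_i) = max( sum_j pi^k_{ij} y_hat_{k+1}(x_j), g(x_i) ), cf. (4.11)
    return np.maximum(P_k @ y_next, g_k)
```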

4.5.3 Markovian quantization (grid method)

• Given δ > 0 and κ ∈ N*, we consider the bounded lattice grid:

Γ = {x ∈ δZ^d | |x^j| ≤ κδ, 1 ≤ j ≤ d} .

Observe that there are (2κ + 1)^d points in Γ.

• We introduce a projection operator Π on the grid Γ centered in X0, given by,
for x ∈ Rd,

(Π[x])^j = δ⌊δ^{−1}(x^j − X0^j) + 1/2⌋ + X0^j , if |x^j − X0^j| ≤ κδ ,
(Π[x])^j = κδ ,                                if x^j − X0^j > κδ ,
(Π[x])^j = −κδ ,                               if x^j − X0^j < −κδ .

• We use an optimal quantization of the Gaussian random variables (ΔWi):

ΔŴi := √hi GM( ΔWi / √hi ) ,

where GM denotes the projection operator on the optimal quantization grid for the
standard Gaussian distribution with M points in the support⁴.

Moreover, it is shown in [41] that

E[ |ΔWi − ΔŴi|^p ]^{1/p} ≤ C_{p,d} √h M^{−1/d} . (4.12)

• We introduce the following discrete/truncated version of the Euler scheme:

X̂^π_0 = X0 ,
X̂^π_{i+1} = Π[ X̂^π_i + hi b(X̂^π_i) + σ(X̂^π_i) ΔŴi ] . (4.13)

We observe that X̂^π is a Markovian process living on Γ and satisfying |X̂^π_i| ≤
C(|X0| + κδ), for all i ≤ n.


⁴ The grids can be downloaded from the website: http://www.quantize.maths-fi.com/ .
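A one-dimensional sketch of the scheme (4.13) could look as follows; the Gaussian quantizer used here is a crude quantile grid, NOT the optimal grid of footnote 4 (which one would load from the website above), and all parameters and function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def project(x, x0, delta, kappa):
    z = np.floor((x - x0) / delta + 0.5)            # nearest lattice point, as in Pi
    return x0 + delta * np.clip(z, -kappa, kappa)   # truncation of the grid

def quantized_euler_paths(x0, b, sigma, h, n, delta, kappa, M, n_paths=10_000, seed=0):
    rng = np.random.default_rng(seed)
    grid = norm.ppf((np.arange(M) + 0.5) / M)       # crude M-point grid for N(0,1)
    X = np.full(n_paths, float(x0))
    for _ in range(n):
        z = rng.standard_normal(n_paths)
        g = grid[np.argmin(np.abs(z[:, None] - grid[None, :]), axis=1)]  # G_M(z)
        X = project(X + h * b(X) + np.sqrt(h) * sigma(X) * g, x0, delta, kappa)
    return X                                        # Markov chain living on the grid

X = quantized_euler_paths(1.0, b=lambda x: 0.0 * x,
                          sigma=lambda x: 0.3 * np.ones_like(x),
                          h=0.01, n=100, delta=0.02, kappa=200, M=20)
```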

Definition 4.3. We denote by (Ŷ^π_i)_{0≤i≤n} the solution of the backward scheme satisfying

(i) The terminal condition is Ŷ^π_n := g(X̂^π_n) ;

(ii) for i < n, the transition from step i + 1 to step i is given by

Ŷ^π_i = max( E_{ti}[ Ŷ^π_{i+1} ] , g(X̂^π_i) ) . (4.14)

Proposition 4.4. For all i ∈ {0, ..., n}, there exists a function u^π(ti, ·) : Γ → R
such that

Ŷ^π_i = u^π(ti, X̂^π_i) .

This function is computed on the grid by the following backward induction: for all
i ∈ {0, ..., n} and x ∈ Γ,

u^π(ti, x) = max{ E[ u^π( ti+1 , Π[ x + hi b(x) + √hi σ(x) GM(U) ] ) ] , g(x) } ,

with U ∼ N(0, 1).

The terminal condition is given by u^π(tn, x) = g(x).

5 Non-linear pricing methods

5.1 Backward Stochastic Differential Equation

• We work in a Lipschitz setting. We give the main definitions and properties for
solutions to BSDEs.

• BSDEs were first introduced by Bismut [10, 11] and then studied in a general way
by Pardoux and Peng [65]. See also [31, 61, 67].

5.1.1 Definition

• For a prescribed terminal time T > 0, the solution of a backward stochastic
differential equation is a couple (Y, Z) satisfying on [0, T]

dYt = −f(t, Yt, Zt)dt + Zt dWt ,
YT = ξ , (5.1)

for some progressively measurable random function f, called the driver, and a
terminal condition ξ which is an FT-measurable random variable.

• We denote by S²(R^k) the vector space of RCLL⁵ adapted processes Y, with values
in R^k, such that:

‖Y‖²_{S²} := E[ sup_{0≤t≤T} |Yt|² ] < ∞ ; (5.2)

S²_c(R^k) is the subspace of continuous processes.

• The set H²(R^{k×d}) is the set of progressively measurable processes Z, valued in
R^{k×d}, such that

‖Z‖²_{H²} := E[ ∫_0^T |Zt|² dt ] < ∞ ,
⁵ Right Continuous with Left Limits.

where, for z ∈ R^{k×d}, |z|² = Tr(zz†).

Assumptions. The random function f, defined on [0, T] × Ω × R^k × R^{k×d} and
valued in R^k, is such that for all (y, z) ∈ R^k × R^{k×d}, the process {f(t, y, z)}_{0≤t≤T} is
progressively measurable. We also assume that

(H1): There exists a positive constant L such that, P-a.s.,

1. Lipschitz continuity in (y, z): for all t, y, y', z, z',

|f(t, y, z) − f(t, y', z')| ≤ L( |y − y'| + ‖z − z'‖ ) ;

2. Integrability condition:

E[ |ξ|² + ∫_0^T |f(r, 0, 0)|² dr ] < ∞ .

Theorem 5.1. Under (H1), there exists a unique solution (Y, Z) ∈ S²_c × H² to

Yt = ξ + ∫_t^T f(s, Ys, Zs) ds − ∫_t^T Zs dWs , 0 ≤ t ≤ T . (5.3)

Proof. See the proof of Theorem 2.1 in [31]. 2

5.1.2 Some key properties

Linear BSDEs We first study linear BSDE for which we can give an almost

explicit solution. For this section, we set k = 1: Y is then real valued and Z a

d-dimensional row vector.

Proposition 5.1. Let {(at , bt )}t∈[0,T ] be progressively measurable and bounded pro-

cesses with value in R × Rd . Let {ct }t∈[0,T ] be an element of H2 (R) and ξ a random

variable, FT –measurable, square integrable.

The linear BSDE

Yt = ξ + ∫_t^T {ar Yr + Zr br + cr} dr − ∫_t^T Zr dWr (5.4)

has a unique solution given by:

∀t ∈ [0, T], Yt = Γt^{−1} E[ ξΓT + ∫_t^T cr Γr dr | Ft ] , (5.5)

where, for all t ∈ [0, T],

Γt = exp( ∫_0^t br dWr − (1/2)∫_0^t |br|² dr + ∫_0^t ar dr ) .

Proof. See exercise II.20 2

Remark 5.1. We observe that if ξ and c are non-negative then Y ≥ 0.

Comparison Theorem

Theorem 5.2. Let k = 1 and assume that (ξ, f) satisfies (H1); the solution to
the associated BSDE is denoted (Y, Z). Let (Y', Z') be a solution of a BSDE with
parameters (ξ', f') satisfying ∫_0^T f'(t, Y't, Z't)dt ∈ L²(FT). We also assume that,
P-a.s., ξ ≤ ξ' and f(t, Yt, Zt) ≤ f'(t, Yt, Zt) λ ⊗ P-a.e. (λ denoting the Lebesgue
measure). Then,

P-a.s., ∀t ∈ [0, T], Yt ≤ Y't .

If, moreover, Y0 = Y'0, then P-a.s., Yt = Y't, 0 ≤ t ≤ T, and f(t, Yt, Zt) = f'(t, Yt, Zt)
λ ⊗ P-a.e. In particular, as soon as P(ξ < ξ') > 0 or f(t, Yt, Zt) < f'(t, Yt, Zt) on
a set with positive λ ⊗ P-measure, then Y0 < Y'0.

Proof. See Exercise II.21 2

5.1.3 Application to non-linear pricing

• We consider a market with one risky asset, solution of the SDE

Xt = x + ∫_0^t b(s, Xs) ds + ∫_0^t σ(s, Xs) dW̃s , (5.6)

and assume different rates for lending (r) and borrowing (R, with R > r).

• To simplify the notation, we assume that there exists a Brownian motion W
under Q ∼ P s.t.

Xt = x + ∫_0^t rXs ds + ∫_0^t σ(s, Xs) dWs . (5.7)

We will now work under the probability Q.

• A portfolio is constituted of cash α and risky asset (quantity φ); its value at
time t is

V^{υ,φ}_t = αt + φt Xt . (5.8)

• Between t and t + dt, the variation of

- the cash account is: [αt]^+ r dt − [αt]^− R dt = (rαt + [αt]^−(r − R))dt ;

- the risky asset account is: φt dXt .

,→ Working with self-financing strategies, we obtain

dVt = (rVt + [Vt − φt Xt]^−(r − R))dt + φt σ(t, Xt) dWt . (5.9)

• The (super-)replication price for a European option with terminal payoff
g(XT) is defined as

p0 = inf{υ ∈ R | ∃ψ admissible strategy, V^{υ,ψ}_T = g(XT) (≥)} . (5.10)

Equivalently, one has to solve for (Y, Δ) the following BSDE

Yt = g(XT) + ∫_t^T ( rYs + [Ys − Δs Xs]^−(r − R) ) ds − ∫_t^T Δs σ(s, Xs) dWs (5.11)

and obtain that p0 = Y0 and that Δ is an admissible strategy.

Sketch of proof. From Theorem 5.1, there exists a unique solution to

Yt = g(XT) + ∫_t^T ( rYs + [Ys − (Zs/σ(s, Xs)) Xs]^−(r − R) ) ds − ∫_t^T Zs dWs . (5.12)

Setting

φs = Zs / σ(s, Xs) , (5.13)

we have that Yt = V^{Y0,φ}_t and then V^{Y0,φ}_T ≥ g(XT). Thus, Y0 ≥ p0.

Now let v be s.t. ∃ψ, V^{v,ψ}_T ≥ g(XT) = YT. We observe that V^{v,ψ} has the same
dynamics as a BSDE and thus we can use the comparison theorem to conclude that
v ≥ Y0. Taking the infimum over v leads to p0 ≥ Y0.

5.2 Main properties in the Markov setting

See [66].

• One can show that Yt = u(t, Xt) for some Lipschitz function u.

• u is a solution (in the viscosity sense) of

−u^{(0)}(t, x) − f( x, u(t, x), ∂x u(t, x) σ(t, x) ) = 0 , (t, x) ∈ [0, T) × Rd , (5.14)
and u(T, x) = g(x) . (5.15)

• Assuming that u is a smooth solution to the above non-linear PDE, one can
conversely prove that

( Yt = u(t, Xt) , Zt = ∂x u(t, Xt) σ(t, Xt) ) . (5.16)

Proof. Apply Ito's formula. 2

Notations (in the one-dimensional setting). For a function ϕ : [0, T] × R → R, we
define

ϕ^{(0)}(t, x) = L^{(0)}ϕ(t, x) := ∂t ϕ(t, x) + L^X ϕ(t, x) , (5.17)
ϕ^{(1)}(t, x) = L^{(1)}ϕ(t, x) := σ(t, x) ∂x ϕ(t, x) , (5.18)

where L^X is given in (2.7).

For a multi-index β ∈ ∪_{n≥1}{0, 1}^n, with β = (j1, . . . , jk) for some k ≥ 1, we denote
−β := (j2, . . . , jk) and

ϕ_β = L^β ϕ = L^{(j1)}[ L^{−β} ϕ ] .

5.3 Numerical analysis of backward methods

• We seek to approximate (Y, Z) in order to approximate u.

• To alleviate the presentation, we assume that f := f(y, z) only.

• We work under two different sets of assumptions:

1. (HL): the coefficients b, σ, f, g are L-Lipschitz continuous.

2. (Hr): the functions u, b, σ are C^{2,4}_b (up to T).

Backward Algorithm: [17, 76] on a discrete grid π

- at tn = T:

(Y^π_n, Z^π_n) = (g(XT), 0) ; (5.19)

- for i < n, compute

Z^π_i = E_{ti}[ Y^π_{i+1} (W_{ti+1} − W_{ti})/(ti+1 − ti) ] ,
Y^π_i = E_{ti}[ Y^π_{i+1} + (ti+1 − ti) f(Y^π_i, Z^π_i) ] , (5.20)

where the grid π satisfies n|π| ≤ C, with |π| := max_i (ti+1 − ti).

Remark 5.2. (i) This is an implicit Euler scheme; the error is O(|π|^{1/2}) in the Lipschitz case
and, assuming smoothness of the coefficients, at most O(|π|), generally (see below).

(ii) Explicit version of the scheme: in the smooth case, Z^π_n := σ(XT) g'(XT).

(iii) One has to estimate the conditional expectations!
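For concreteness, here is a hedged sketch of the scheme (5.20) in which the conditional expectations are estimated by least-squares regression on an illustrative polynomial basis (anticipating the Monte Carlo methods of Section 5.6.1); the implicit step is solved by a few Picard iterations, and the inputs (Euler paths of X, Brownian increments, driver f, terminal g) are user-supplied.

```python
import numpy as np

def bsde_backward(X, dW, h, f, g, degree=4, n_picard=3):
    """X: (N, n+1) forward paths; dW: (N, n) Brownian increments; h: time step."""
    N, n = dW.shape
    Y = g(X[:, -1])                                    # terminal condition (5.19)
    Z = np.zeros(N)
    for i in range(n - 1, -1, -1):
        basis = np.vander(X[:, i], degree + 1, increasing=True)
        proj = lambda v: basis @ np.linalg.lstsq(basis, v, rcond=None)[0]
        Z = proj(Y * dW[:, i]) / h                     # Z_i ~ E_i[Y_{i+1} dW_i] / h
        Ey = proj(Y)                                   # E_i[Y_{i+1}]
        Ynew = Ey.copy()
        for _ in range(n_picard):                      # implicit step: Y = E_i[Y+] + h f(Y, Z)
            Ynew = Ey + h * f(Ynew, Z)
        Y = Ynew
    return Y.mean(), Z                                 # Y.mean() ~ Y_0 when X_0 is deterministic
```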

5.3.1 L²-stability

• Perturbation approach: consider

Ỹi = E_{ti}[ Ỹ_{i+1} + (ti+1 − ti) f(Ỹi, Z̃i) ] + ζi , (5.21)
Z̃i = E_{ti}[ Ỹ_{i+1} (W_{ti+1} − W_{ti})/(ti+1 − ti) ] , (5.22)

where ζi ∈ L²(F_{ti}).

• Let us set δY := Y^π − Ỹ, δZ := Z^π − Z̃; we introduce

Definition 5.1 (L²-stability). The scheme given in (5.20) is L²-stable if there exists
a constant C > 0 s.t.

max_i E[ |δYi|² ] ≤ C E[ |δYn|² + Σ_{i=0}^{n−1} n ζi² ] (5.23)

for |π| small enough, for all perturbations ζ.

Proposition 5.2. If f is Lipschitz continuous, the scheme (5.20) is L²-stable.

Proof. Let us observe that the scheme can be rewritten

Y^π_i = Y^π_{i+1} + hi f(Y^π_i, Z^π_i) − hi Z^π_i Hi − ΔMi , (5.24)

where Hi = ΔWi/hi. Note that (5.24) defines ΔMi; moreover it satisfies

E_{ti}[ΔMi] = E_{ti}[ΔMi Hi] = 0 and E[ |ΔMi|² ] < ∞ . (5.25)

These properties are obtained by using the definition of the scheme given in (5.20).

For the perturbed scheme, we have similarly:

Ỹi = Ỹ_{i+1} + hi f(Ỹi, Z̃i) + ζi − hi Z̃i Hi − ΔM̃i . (5.26)

Denoting δfi = f(Y^π_i, Z^π_i) − f(Ỹi, Z̃i) and δΔMi = ΔMi − ΔM̃i, we observe that

δYi + hi δZi Hi + δΔMi = δY_{i+1} + hi δfi + ζi . (5.27)

Squaring both sides and taking conditional expectations, we compute, using Young's
inequality,

|δYi|² + hi |δZi|² ≤ (1 + Ch) E_{ti}[ (δY_{i+1} + hi δfi)² ] + (C/h) |ζi|² . (5.28)

Note that

(δY_{i+1} + hi δfi)² ≤ ( |δY_{i+1}| + Chi |δYi| + Chi |δZi| )² (5.29)
≤ (1 + h/ε)( |δY_{i+1}| + Chi |δYi| )² + C(1 + ε/h) h²_i |δZi|² . (5.30)

Choosing h and ε such that C(h + ε) ≤ 1/2, we obtain

(δY_{i+1} + hi δfi)² ≤ (1 + Ch)|δY_{i+1}|² + Ch|δYi|² + (1/2) h²_i |δZi|² . (5.31)

Inserting the previous inequality in (5.28), we get

|δYi|² ≤ (1 + Ch) E_{ti}[ |δY_{i+1}|² ] + (C/h)|ζi|² (5.32)
≤ e^{Ch} E_{ti}[ |δY_{i+1}|² ] + (C/h)|ζi|² . (5.33)

Taking expectations on both sides and iterating over i, we obtain (5.23).

5.4 Convergence analysis assuming no error on X

5.4.1 Truncation error

• Let us introduce

Ẑ_{ti} = E_{ti}[ Y_{ti+1} ΔWi/hi ] . (5.34)

• We define the local truncation error as

ζ̂i := E_{ti}[ ∫_{ti}^{ti+1} ( f(Yt, Zt) − f(Y_{ti}, Ẑ_{ti}) ) dt ] . (5.35)

• It measures how well the true solution satisfies the scheme. Indeed, we observe
that

Y_{ti} = E_{ti}[ Y_{ti+1} ] + hi f(Y_{ti}, Ẑ_{ti}) + ζ̂i ,

i.e. (Y_{ti}, Ẑ_{ti})_i is a perturbed scheme in the sense of (5.21)-(5.22).

• The global truncation error is then defined as

T(π) := Σ_i E[ n|ζ̂i|² ] . (5.36)

• Assume at this point that there is no error made on the forward process, thus
δYn := 0, and we have

max_i E[ |Y_{ti} − Y^π_i|² ] ≤ C T(π) . (5.37)

Proof. This comes directly from the L²-stability of the scheme with the
perturbation given by ζ̂. 2

• We now study the order of the truncation error under (HL) or (Hr).

5.4.2 Order of convergence in the smooth case

Theorem 5.3. Under (Hr), we have that

T(π) ≤ C|π|² (5.38)

and thus the scheme is of order 1.

Proof. (Proof in the one-dimensional case.) 1. We observe that

|ζ̂i| ≤ | E_{ti}[ ∫_{ti}^{ti+1} ( f(Yt, Zt) − f(Y_{ti}, Z_{ti}) ) dt ] | + Chi |Z_{ti} − Ẑ_{ti}| . (5.39)

Using the PDE satisfied by u, we get

| E_{ti}[ f(Yt, Zt) − f(Y_{ti}, Z_{ti}) ] | = | E_{ti}[ u^{(0)}(t, Xt) − u^{(0)}(ti, X_{ti}) ] | (5.40)
= | E_{ti}[ ∫_{ti}^t u^{(0,0)}(s, Xs) ds ] | (5.41)
≤ C|π| , (5.42)

where we used Ito's formula for the second equality. Now we compute, setting Hi :=
ΔWi/hi,

Ẑ_{ti} = E_{ti}[ u(ti+1, X_{ti+1}) Hi ] = E_{ti}[ Hi ∫_{ti}^{ti+1} u^{(0)}(t, Xt) dt + (1/hi) ∫_{ti}^{ti+1} u^{(1)}(t, Xt) dt ] . (5.43)

Observe that

| E_{ti}[ Hi ∫_{ti}^{ti+1} u^{(0)}(t, Xt) dt ] | = | E_{ti}[ Hi ∫_{ti}^{ti+1} { u^{(0)}(t, Xt) − u^{(0)}(ti, X_{ti}) } dt ] | (5.44)
≤ C|π| (5.45)

and that

E_{ti}[ u^{(1)}(t, Xt) ] = E_{ti}[ u^{(1)}(ti, X_{ti}) + ∫_{ti}^t u^{(0,1)}(s, Xs) ds ] . (5.46)

We thus get

| E_{ti}[ (1/hi) ∫_{ti}^{ti+1} u^{(1)}(t, Xt) dt ] − u^{(1)}(ti, X_{ti}) | ≤ C|π| , (5.47)

leading to |Z_{ti} − Ẑ_{ti}| ≤ C|π|. We then obtain that

|ζ̂i| ≤ C|π|² . (5.48)

And, by summing this local error estimate, we conclude that T(π) ≤ C|π|². 2

5.4.3 Order of convergence in the Lipschitz case

• The Lipschitz case is more involved.

Proposition 5.3. Under (HL), we have that

T(π) ≤ C( |π| + max_i sup_{t∈[ti,ti+1]} E[ |Yt − Y_{ti}|² ] + Σ_{i=0}^{n−1} ∫_{ti}^{ti+1} E[ |Zt − Z̄_{ti}|² ] dt ) , (5.49)

where

Z̄_{ti} = (1/hi) E_{ti}[ ∫_{ti}^{ti+1} Zt dt ] .

Proof. We first observe that

Ẑ_{ti} = E_{ti}[ ( ∫_{ti}^{ti+1} Zs dWs ) Hi + Hi ∫_{ti}^{ti+1} f(Yt, Zt) dt ] (5.50)

and then compute

Z̄_{ti} = E_{ti}[ ( ∫_{ti}^{ti+1} Zs dWs ) Hi ] , (5.51)
| E_{ti}[ Hi ∫_{ti}^{ti+1} f(Yt, Zt) dt ] |² ≤ E_{ti}[ ∫_{ti}^{ti+1} |f(Yt, Zt)|² dt ] , (5.52)

leading to

|Z̄_{ti} − Ẑ_{ti}|² ≤ C E_{ti}[ ∫_{ti}^{ti+1} |f(Yt, Zt)|² dt ] . (5.53)

Then

| E_{ti}[ ∫_{ti}^{ti+1} ( f(Yt, Zt) − f(Y_{ti}, Ẑ_{ti}) ) dt ] |² ≤ hi E_{ti}[ ∫_{ti}^{ti+1} C( |Yt − Y_{ti}|² + |Zt − Z̄_{ti}|² + |Z̄_{ti} − Ẑ_{ti}|² ) dt ] (5.54)
≤ hi C E_{ti}[ ∫_{ti}^{ti+1} ( |Yt − Y_{ti}|² + |Zt − Z̄_{ti}|² ) dt ] (5.55)
+ C h²_i E_{ti}[ ∫_{ti}^{ti+1} |f(Yt, Zt)|² dt ] . (5.56)

The proof is concluded by summing over i and recalling that

E[ ∫_0^T |f(Yt, Zt)|² dt ] ≤ C . (5.57)

• We give without proof the following result, due to Zhang [76]:

max_i E[ ∫_{ti}^{ti+1} |Zt|² dt ] + Σ_i E[ ∫_{ti}^{ti+1} |Zt − Z̄_{ti}|² dt ] ≤ C|π| . (5.58)

• It is based on a representation of Z obtained by means of Malliavin calculus
and requires some heavy computations.

Theorem 5.4. Under (HL), we have that

T(π) ≤ C|π| (5.59)

and thus the scheme is of order 1/2.

Proof. Observe that

E[ sup_{t∈[ti,ti+1]} |Yt − Y_{ti}|² ] ≤ C E[ ∫_{ti}^{ti+1} |f(Yt, Zt)|² dt + ∫_{ti}^{ti+1} |Zt|² dt ] (5.60)
≤ C|π| , (5.61)

where we used (5.58). Then, the proof is concluded by combining the previous
estimate and (5.58) with Proposition 5.3. 2

5.5 Full discrete-time error analysis

• We replace X by its Euler scheme X^π.

• The scheme for (Y, Z) is the same, but the terminal condition (5.19) is now

(Y^π_n, Z^π_n) = (g(X^π_T), 0) . (5.62)

Theorem 5.5. Under (Hr), the following holds:

|Y^π_0 − u(0, X0)| ≤ C|π| . (5.63)

5.5.1 Truncation error

• Define Ỹi = u(ti, X^π_{ti}) and

Z̃i = E_{ti}[ Ỹ_{i+1} ΔWi/(ti+1 − ti) ] .

Proposition 5.4. With the above definitions,

Ỹi = E_{ti}[ Ỹ_{i+1} + (ti+1 − ti) f(Ỹi, Z̃i) ] + ζi , (5.64)

where

ζi = E_{ti}[ ζ^e_i + ζ^f_i + ζ^z_i ] (5.65)

and ζ^e_i, ζ^f_i and ζ^z_i are defined in (5.67), (5.69) and (5.70) respectively.

Proof. (b = 0.) 1. Applying Ito's formula, we have, setting V^i_s = σ(X^π_{ti}) ∂x u(s, X^π_s),

u(ti+1, X^π_{ti+1}) = u(ti, X^π_{ti}) + ∫_{ti}^{ti+1} { ∂t u(s, X^π_s) + (1/2) σ²(X^π_{ti}) ∂²_{xx} u(s, X^π_s) } ds + ∫_{ti}^{ti+1} V^i_s dWs . (5.66)

Then, introducing

ζ^e_i := −(1/2) ∫_{ti}^{ti+1} ( σ²(X^π_{ti}) − σ²(X^π_s) ) ∂²_{xx} u(s, X^π_s) ds , (5.67)

the above equality rewrites, using the PDE satisfied by u,

Ỹ_{i+1} = Ỹi − h f(Ỹi, V^i_{ti}) + ∫_{ti}^{ti+1} ( u^{(0)}(s, X^π_s) − u^{(0)}(ti, X^π_{ti}) ) ds + ζ^e_i + ∫_{ti}^{ti+1} V^i_s dWs . (5.68)

Now we define

ζ^f_i = − ∫_{ti}^{ti+1} ( u^{(0)}(s, X^π_s) − u^{(0)}(ti, X^π_{ti}) ) ds , (5.69)
ζ^z_i = h { f(Ỹi, Z̃i) − f(Ỹi, V^i_{ti}) } , (5.70)

and observe that

Ỹi = E_{ti}[ Ỹ_{i+1} + h f(Ỹi, Z̃i) + ζ^e_i + ζ^f_i + ζ^z_i ] . (5.71)

5.5.2 Convergence analysis

• We prove below Theorem 5.5: this relies (as usual) on

1. the stability result for the BTZ scheme;

2. a control of the truncation error coming from the regularity of u.

Lemma 5.1. Under (Hr), the following holds:

|E_{ti}[ζ^e_i]|² = O(h⁴) , (5.72)
|E_{ti}[ζ^f_i]|² = O(h⁴) , (5.73)
|E_{ti}[ζ^z_i]|² = O(h⁴) . (5.74)

Proof. See Exercise II.22 2

Proof of Theorem 5.5

We simply observe that (Ỹi, Z̃i)_i defines a perturbed scheme as in Section 5.3.1. We
can then combine Proposition 5.2 with Lemma 5.1 to conclude the proof, recalling also
(5.37).

5.6 Numerical illustration and further consideration

5.6.1 Monte Carlo based methods

• Regression methods, like the ones used for US options, can be easily adapted to
the BSDE setting; they were introduced in [37, 56] and extensively studied in [39, 40,
38, 34].

• Any method that permits to estimate conditional expectations can be used,
e.g. the Malliavin method [29, 17, 16].

5.6.2 Tree based methods

• We illustrate Section 5.5 with Example 5.1.

• A more systematic approach is based on Cubature methods (it allows in par-

ticular to work with degenerate diffusion), see e.g. [27, 28, 26].

Example 5.1. In this example [22], we work with d = 3 and X = W. We consider
the following BSDE

Yt = ω(1, W1)/(1 + ω(1, W1)) + ∫_t^1 (Z¹_s + Z²_s + Z³_s)( 5/6 − Ys ) ds − ∫_t^1 Zs dWs , (5.75)

where ω(t, x) = exp(x1 + x2 + x3 + t). Applying Ito's formula, we verify that the
solution is given by

Yt = ω(t, Wt)/(1 + ω(t, Wt)) and Z^l_t = ω(t, Wt)/(1 + ω(t, Wt))² , l ∈ {1, 2, 3} , t ∈ [0, 1] . (5.76)

• In this very smooth setting, we can introduce higher order schemes too. We
consider the Crank-Nicolson scheme (CN) and a linear multi-step scheme
(AMB2) on the graph below.

• The number of points used to approximate ΔW depends on the theoretical rate
of convergence of the discrete-time error and on the number of moments to match,
recall Section 2.4.

• Since X = W, we can easily use a recombination procedure.

Figure 7: Empirical convergence, Example 5.1

5.6.3 (Markovian) quantisation

• We illustrate here the Markovian quantisation with Example [23] in the case

of quadratic BSDEs.

• "Classical" quantization can be used for BSDEs as well, see e.g. [64].

Example 5.2. [23] We consider the following quadratic Markovian BSDE:

X^`_t = X^`_0 + ∫_0^t ν X^`_s dW^`_s , ` ∈ {1, 2, 3} , 0 ≤ t ≤ 1 ,
Yt = g(X1) + ∫_t^1 (a/2) ‖Zs‖² ds − ∫_t^1 Zs dWs , (5.77)

where a, ν, and (X^`_0)_{`∈{1,2,3}} are given real positive parameters and g : Rd → R is a
bounded Lipschitz function.

Applying Ito's formula, one can show that the solution is given by

Yt = (1/a) log( Et[ exp( a g(X1) ) ] ) , t ≤ 1 . (5.78)

For any given g, ν and a, it is possible to estimate the solution Y0 at time 0 using an
approximation of the Gaussian distribution at time T = 1, since X^`_1 = X^`_0 e^{−ν²/2 + νW^`_1}.

• For our numerical illustration, d = 2 and g is given by

g : x ↦ 3 Σ_{`=1}^2 sin²(x^`) . (5.79)

• We use a Markovian quantisation scheme as introduced in Section 4.5.3.

• The non-Lipschitz setting may cause instability, and we use in the graph below
a scheme also taming this quadratic growth (called 'adaptive truncation').

Figure 8: Comparison of schemes’ convergence

5.7 An example of forward method

• We present here a machine learning method to approximate BSDEs.

• Consider, for y ∈ R and Z ∈ H²,

Y^{y,Z}_t = y − ∫_0^t f(Y^{y,Z}_s, Zs) ds + ∫_0^t Zs dWs (5.80)

• and the following optimisation problem

V := min_{(y,Z)∈R×H²} E[ |g(XT) − Y^{y,Z}_T|² ] . (5.81)

• In the Lipschitz setting, the minimum in the above optimisation problem is
attained at the solution of the BSDE (5.1), with V = 0!

• Main idea: solve numerically the optimisation problem (5.81) to get an
approximation of the BSDE.
5.7.1 Discretization of the optimisation problem (5.81)

• The dynamics (5.80) of Y^{y,Z} is discretised using a classical Euler scheme on π:

Y^π_{t_{n+1}} = Y^π_{t_n} − h f(Y^π_{t_n}, Z_{t_n}) + Z_{t_n}(W_{t_{n+1}} − W_{t_n}) and Y^π_0 = y . (5.82)

• The random variable Z must also be discretised in some sense: parametric
specification!

1. Non-linear specification: Z = Z^Θ for some Θ ∈ R^K, where Θ stands for
the coefficients of a neural network ϕ_NN, and Z^Θ_{t_n} = ϕ_NN(t_n, X_{t_n}).

2. Linear specification: Z^θ_{t_n} = ϕ_L(t_n, X_{t_n}), where

ϕ_L(t_n, ·) = Σ_{k=1}^K θ_k φ_k(t_n, ·) , θ ∈ R^{dK} ,

and (φ_k)_{1≤k≤K} are some basis functions.

• Often, in practice, X has to be approximated too, by an Euler scheme for
example, recall (2.13).

• The discrete optimisation problem is now given by

V^π := min_{υ∈R^{1+K̄}} E[ |g(X^π_T) − Y^{π,υ}_T|² ] , (5.83)

where υ = (y, Θ0, . . . , Θ_{N−1}) and K̄ = d + (N − 1)K for the non-linear
specification, or υ = (y, θ0, . . . , θ_{N−1}) and K̄ = d + (N − 1)dK for the linear
specification.

• To solve the discrete optimisation problem, one generally uses a (stochastic)
gradient descent algorithm, as in the sketch below.
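A minimal sketch of this procedure with the non-linear specification, written with PyTorch, is given below; the toy driver, payoff, model and hyperparameters are illustrative choices and not those of the course.

```python
import torch

d, N_steps, batch = 1, 20, 512
h = 1.0 / N_steps
f = lambda y, z: -0.05 * y                      # a toy linear driver
g = lambda x: torch.clamp(x - 1.0, min=0.0)     # an illustrative payoff

y0 = torch.zeros(1, requires_grad=True)         # the scalar y in (5.83)
net = torch.nn.Sequential(torch.nn.Linear(d + 1, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, d))          # Z_Theta(t, x)
opt = torch.optim.Adam([y0, *net.parameters()], lr=1e-2)

for step in range(2000):
    X = torch.ones(batch, d)                    # X_0 = 1; X is simulated below
    Y = y0.expand(batch, 1)
    for n in range(N_steps):
        t = torch.full((batch, 1), n * h)
        Z = net(torch.cat([t, X], dim=1))
        dW = torch.randn(batch, d) * h ** 0.5
        Y = Y - h * f(Y, Z) + (Z * dW).sum(dim=1, keepdim=True)   # Euler step (5.82)
        X = X + 0.2 * X * dW                    # toy martingale dynamics, sigma = 0.2
    loss = ((g(X[:, :1]) - Y) ** 2).mean()      # the criterion V^pi in (5.83)
    opt.zero_grad(); loss.backward(); opt.step()

print(float(y0))                                # approximation of Y_0
```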

6 An introduction to McKean-Vlasov SDEs in finance

See the seminal paper [72] and, among others, [42, 52], and [20] for the large population
control point of view (e.g. Mean Field Games).

6.1 Definition

6.1.1 Introduction

• Example of a systemic risk study [21]:

X^i is the log-monetary reserve of each bank; the 'particle system' is given by

dX^{i,N}_t = (1/N) Σ_{j=1}^N ( X^{j,N}_t − X^{i,N}_t ) dt + dB^i_t . (6.1)

Bank i borrows from bank j if its reserve is lower.

The LLN should lead to the following equation when the number of banks goes to infinity:

dX̄t = ( E[X̄t] − X̄t ) dt + dBt . (6.2)

,→ The solution is given by an Ornstein-Uhlenbeck process (note that E[X̄t] = E[X0]).

• Typical questions:

1. Does the solution of (6.2) describe well the system (6.1) when N → +∞?

2. Alternatively: is (6.1) a good approximation of (6.2)? (See the simulation sketch below.)
6.1.2 Notations

• For p ≥ 1, Pp(Rd) is the space of probability measures µ satisfying ∫ |x|^p µ(dx) < ∞.

• Wasserstein distance on Pp:

Wp(µ, ν) := inf_{X∼µ, Y∼ν} E[ |X − Y|^p ]^{1/p} . (6.3)

Note that (Pp, Wp) is a Polish space⁶, see e.g. [75] or [12] among others.

• For later use: denote µ^N = (1/N) Σ_{i=1}^N δ_{x_i} and ν^N = (1/N) Σ_{i=1}^N δ_{y_i}; then

W²_2(µ^N, ν^N) ≤ (1/N) Σ_{i=1}^N |x_i − y_i|² , (6.4)

where x ∈ (Rd)^N, y ∈ (Rd)^N.

⁶ A complete separable metric space.

6.1.3 Existence and uniqueness

• We consider, for ξ ∈ L²,

Xt = ξ + ∫_0^t b(Xs, L(Xs)) ds + ∫_0^t σ(Xs, L(Xs)) dWs , t ≤ T . (6.5)

• In this context, (HL) rewrites

|b(x, µ) − b(x', µ')| + |σ(x, µ) − σ(x', µ')| ≤ L( |x − x'| + W2(µ, µ') ) . (6.6)

Theorem 6.1. Under (HL), there exists a unique solution to (6.5).

Proposition 6.1. Denote µt := L(Xt); then (µt)_{0≤t≤T} is a weak solution to

∂t µt = −∂x{ b(x, µt) µt } + (1/2) ∂²_{xx}{ σ²(x, µt) µt } . (6.7)
Proof of Theorem 6.1. [b = 0 for the proof] Let Φ : S² → S² be given by

Xt = Φ(x)t := ξ + ∫_0^t σ(xs, L(xs)) dWs (6.8)

and denote ΔX := X − X' = Φ(x) − Φ(x'). We compute

sup_{r≤t} |ΔXr|² ≤ sup_{r≤t} | ∫_0^r δσs dWs |² , (6.9)

where δσs := σ(xs, L(xs)) − σ(x's, L(x's)). It satisfies

E[ |δσs|² ] ≤ 2L² E[ |xs − x's|² + W²_2( L(xs), L(x's) ) ]
≤ 4L² E[ |xs − x's|² ] . (6.10)

Using the BDG inequality, we obtain

E[ sup_{r≤t} |ΔXr|² ] ≤ C E[ ∫_0^t |δσs|² ds ] . (6.11)

And from (6.10), we deduce

E[ sup_{r≤t} |ΔXr|² ] ≤ C ∫_0^t E[ |xs − x's|² ] ds (6.12)

and a fortiori, setting Δx = x − x',

E[ sup_{r≤t} |ΔXr|² ] ≤ C ∫_0^t E[ sup_{r≤s} |Δxr|² ] ds . (6.13)

Iterating this inequality, one obtains, denoting Φ^{(k+1)} = Φ ∘ Φ^{(k)},

E[ sup_{r≤T} | Φ^{(k)}(x)_r − Φ^{(k)}(x')_r |² ] ≤ ( (CT)^k / k! ) E[ sup_{r≤T} |Δxr|² ] , (6.14)

so that Φ has a unique fixed point. 2

...

Proof of Proposition 6.1. Let φ ∈ Cc([0, T) × R) and apply Ito's formula to get

E[φ(T, XT)] = E[φ(0, X0)] + E[ ∫_0^T ( ∂t φ + b(Xs, µs) ∂x φ + (1/2) σ(Xs, µs)² ∂²_{xx} φ )(s, Xs) ds ] . (6.15)

Thus, φ having compact support in [0, T) × R,

0 = E[φ(0, X0)] + ∫_0^T ∫_R ( ∂t φ + b(x, µs) ∂x φ + (1/2) σ(x, µs)² ∂²_{xx} φ )(s, x) µs(dx) ds , (6.16)

which expresses the weak solution property.

Recall that if µt has a smooth density, which we denote mt, we can use the integration
by parts formula to get

E[ b(Xs, µs) ∂x φ(s, Xs) ] = ∫_R b(x, µs) ∂x φ(s, x) ms(x) dx (6.17)
= − ∫_R φ(s, x) ∂x{ b(x, µs) ms(x) } dx . (6.18)

6.2 Particle system approximation

• We consider the following example, denoting µt := L(Xt):

dXt = ( ∫ β(Xt, y) µt(dy) ) dt + dWt , (6.19)
X0 = ξ ∼ µ0 ∈ P2 , (6.20)

where β is a Lipschitz function. (Note that b(x, ν) = E[β(x, χ)], where χ ∼ ν.)

,→ There is a unique solution to the above equation, as (HL) is satisfied.

(This is the laboratory example of [72].)
Proof. We observe that

b(x, ν) − b(x, ν') = E[ β(x, χ) − β(x, χ') ] for any χ ∼ ν, χ' ∼ ν' . (6.21)

Then, since β is Lipschitz continuous, we get

|b(x, ν) − b(x, ν')| ≤ L E[ |χ − χ'| ] (6.22)
≤ L E[ |χ − χ'|² ]^{1/2} , (6.23)

where we used Cauchy-Schwarz for the last inequality. Note that χ and χ' are
arbitrary, so that

|b(x, ν) − b(x, ν')| ≤ L inf_{χ∼ν, χ'∼ν'} E[ |χ − χ'|² ]^{1/2} = L W2(ν, ν') . (6.24)
• Associated particle system:

dX^i_t = (1/N) Σ_{j=1}^N β(X^i_t, X^j_t) dt + dW^i_t , (6.25)
X^i_0 = ξ^i ,

where (ξ¹, . . . , ξ^N) are i.i.d. with law µ0 and the (W^i) are i.i.d. Brownian motions.

• Observe that

(1/N) Σ_{j=1}^N β(x, X^j_t) = b(x, µ^N_t) , where µ^N_t := (1/N) Σ_{j=1}^N δ_{X^j_t} (empirical measure of the particle system). (6.26)

• We consider N independent particles with the dynamics (6.19):

dX̄^i_t = b( X̄^i_t, L(X̄^i_t) ) dt + dW^i_t , (6.27)

with the same BM as in (6.25); observe that L(X̄^i_t) = µt, and we denote

µ̄^N_t := (1/N) Σ_{j=1}^N δ_{X̄^j_t} . (6.28)

Theorem 6.2. The following holds, for all T > 0:

ϕ(T) := max_i E[ sup_{t≤T} |X^i_t − X̄^i_t|² ] ≤ C_T / N . (6.29)

• From this, one deduces that the law of the first k particles (X^i) converges to the
law of the first k independent particles (X̄^i):

This phenomenon is called propagation of chaos (the particles become independent
in the limit N → ∞), see [72].

Proof. Let δX^i := X^i − X̄^i; for r ≤ T,

|δX^i_r|² = | ∫_0^r { b(X^i_s, µ^N_s) − b(X̄^i_s, µs) } ds |² ≤ T ∫_0^r |b(X^i_s, µ^N_s) − b(X̄^i_s, µs)|² ds , (6.30)

leading to

E[ sup_{r≤t} |δX^i_r|² ] ≤ C_T E[ ∫_0^t ( |b(X^i_s, µ^N_s) − b(X̄^i_s, µ̄^N_s)|² + |b(X̄^i_s, µ̄^N_s) − b(X̄^i_s, µs)|² ) ds ] . (6.31)

Now,

E[ |b(X^i_s, µ^N_s) − b(X̄^i_s, µ̄^N_s)|² ] ≤ C E[ |X^i_s − X̄^i_s|² + (1/N) Σ_{j=1}^N |X^j_s − X̄^j_s|² ]
≤ C ϕ(s) , (6.32)

since b satisfies (HL), recalling (6.4).

Moreover,

|b(X̄^i_s, µ̄^N_s) − b(X̄^i_s, µs)|² = | (1/N) Σ_{j=1}^N { β(X̄^i_s, X̄^j_s) − b(X̄^i_s, µs) } |² . (6.33)

Denote Y^j = β(X̄^i_s, X̄^j_s) − b(X̄^i_s, µs) and observe that E[Y^j | X̄^i] = 0 for j ≠ i and
that, for k ∉ {i, j}, Y^j and Y^k are independent.

We then compute

E[ |b(X̄^i_s, µ̄^N_s) − b(X̄^i_s, µs)|² ] = (1/N²) E[ | Σ_{j=1}^N Y^j |² ] (6.34)
= (1/N²) { E[ |Y^i|² ] + 2 Σ_{k≠i} E[ Y^i Y^k ] + Σ_{j≠i, k≠i} E[ Y^j Y^k ] }
= (1/N²) Σ_{j=1}^N E[ |Y^j|² ] ≤ C/N . (6.35)

Combining (6.31)-(6.32)-(6.35), we arrive at

ϕ(t) ≤ C ∫_0^t ϕ(s) ds + C/N , (6.36)

and the result follows from Gronwall's Lemma. 2

• It is also possible to obtain a weak error estimate, as follows:

| E[ Ψ(µ^N_T) ] − Ψ(µT) | = O(1/N) (6.37)

for Ψ : P2 → R, under some smoothness conditions on Ψ, see [24].

Time discretization and other approximation schemes

• A full discretization of the particle system requires a time discretization, see e.g.
[14].

• The theoretical time discretization is also studied in Chapter 5 of [58] in a
Lipschitz setting.

• Other types of approximation schemes are possible: via "projection" [9], via
optimal quantization (see Chapter 7 of [58]), or via the cubature method, see [30].

6.3 Singular interaction

6.3.1 Burgers Equation

• The Burgers equation is given by

∂t u + u ∂x u = (ν²/2) ∂²_{xx} u and u(0, ·) = g(·) , (6.38)

where ν is the viscosity coefficient (for ν = 0, this is the inviscid Burgers equation,
which is a scalar conservation law).

It is a simple model from fluid dynamics, but (a slight modification of it) also
represents the Carbon Allowance Price in some simple models, see e.g. [19].

• When ν > 0, there exists a unique solution, given by

u(t, x) = [ ∫ ((x − y)/t) e^{−G(y)/ν² − |x−y|²/(2ν²t)} dy ] / [ ∫ e^{−G(y)/ν² − |x−y|²/(2ν²t)} dy ] (6.39)

or alternatively

u(t, x) = −ν² E[ Ĝ'_ν(x + νWt) ] / E[ Ĝ_ν(x + νWt) ] , (6.40)

with Ĝ_ν(y) = e^{−G(y)/ν²}.

Proof. We assume smoothness of the functions in the following computations. Observe
that u = ∂x w, where w is a solution to the HJB equation:

∂t w + (1/2)(∂x w)² = (ν²/2) ∂²_{xx} w and w(0, ·) = G(·) := ∫_{−∞}^{·} g(z) dz . (6.41)

Define θ := e^{−w/η} and compute

∂t θ = −(∂t w/η) θ , ∂x θ = −(∂x w/η) θ , ∂²_{xx} θ = ( −∂²_{xx} w/η + (∂x w)²/η² ) θ . (6.42)

Setting η = ν², we get from (6.41):

∂t θ = (ν²/2) ∂²_{xx} θ and θ(0, ·) = e^{−G(·)/ν²} . (6.43)

This implies that

θ(t, x) = (1/(ν√(2πt))) ∫ e^{−G(y)/ν² − |x−y|²/(2ν²t)} dy (6.44)
= ∫ Ĝ_ν(x − y) φ_ν(t, y) dy , (6.45)

where φ_ν(t, ·) denotes the density of νWt. We then observe that

u = ∂x w = −ν² ∂x θ / θ . (6.46)

Differentiating (6.44) or (6.45) yields the result. 2

A particle system representation. We follow [13] and assume that g is a cdf.

• Setting v = ∂x u, we obtain that v is a weak solution of

∂t v + ∂x(uv) = (σ²/2) ∂²_{xx} v and v0 = ∂x g , (6.47)

with X0 ∼ v0 and

Xt = X0 + ∫_0^t u(s, Xs) ds + σWt and Xt ∼ vt . (6.48)

We have u(t, x) = ∫_{−∞}^x v(t, y) dy = ∫ H(x, y) v(t, y) dy , with H(x, y) = 1_{y≤x}.

• When ν = 0, there exists a unique "good physical solution" (a.k.a. the entropy
solution). For g(x) = 1_{[0,+∞)}(x), it is given by

u(t, x) = (x/t) 1_{0≤x≤t} + 1_{x>t} = (0 ∨ x ∧ t)/t . (6.49)

• Since u(t, x) = E[H(x, Xt)], we introduce the particle system:

dX^i_t = (1/M) Σ_{j=1}^M H(X^i_t, X^j_t) dt + σ dW^i_t . (6.50)

• Simulation of the particle system gives:

Legend. blue : σ = 1, orange: σ = 0.5, green: σ = 0.1, red: σ = 0

parameters: M = 100000, Time grid with n = 21 dates.

• The algorithm converges to the true value when σ → 0. Note that the sum is
obtained by sorting the system: denoting rk(X^i_t) the rank of the particle X^i_t in the
cloud, (6.50) is replaced in practice by

X^i_{t_{k+1}} = X^i_{t_k} + ( rk(X^i_{t_k}) / M ) h + σ ΔW^i_k , (6.51)

as in the sketch below.
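A compact sketch of this rank-based scheme (illustrative parameters): the interaction (1/M) Σ_j H(X^i, X^j) equals rk(X^i)/M, so one sort per date replaces the O(M²) double sum.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, T, sigma = 10_000, 21, 1.0, 0.1
h = T / (n - 1)
X = np.zeros(M)                                 # X_0 ~ delta_0, i.e. g = 1_[0,+inf)
for _ in range(n - 1):
    rank = np.empty(M)
    rank[np.argsort(X)] = np.arange(1, M + 1)   # rank of each particle in the cloud
    X = X + (rank / M) * h + sigma * np.sqrt(h) * rng.standard_normal(M)
# u(T, x) ~ (1/M) sum_i 1_{X_i <= x}: empirical cdf of the particle cloud
print(np.mean(X <= 0.5), 0.5 / T)               # compare with the entropy solution x/t
```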
M

• If u0 = 1 − g, with g a cdf, then the particle system is (Exercise)

dX^i_t = (1/M) Σ_{j=1}^M (1 − H)(X^i_t, X^j_t) dt + σ dW^i_t and X^i_0 ∼ g' . (6.52)

For g(x) = 1 ∧ (x − 1) ∨ 0, the algorithm behaves very well, as it captures the shock
appearing at t = 1 in the entropy solution:

Legend. blue: T = 0, orange: T = 0.5, green: T = 1, red: T = 2.
Parameters: σ = 0.1, M = 10000, time grid with n = 21 dates.

• See [14] for a generalisation and a study of the error.

6.3.2 Calibration of LSV models

[45, 43]

Local volatility model.

• For a volatility function σ:

dSt = rSt dt + σ(t, St) St dWt and S0 = x . (6.53)

• Calibration to market data imposes restrictions on σ; namely, Dupire's formula
gives

σD(T, K)² = 2 ( ∂T C(T, K) + rK ∂K C(T, K) ) / ( K² ∂²_{KK} C(T, K) ) , (6.54)

where C(T, K) is the price observed today (S0 = x) of the Call option with strike K
and maturity T.

,→ No extra volatility risk in this class of models...

Stochastic volatility model.

• Given a stochastic process a:

dSt = rSt dt + at St dWt , (6.55)

where e.g. a follows a Brownian SDE

dat = β(t, at) dt + α(t, at) dBt , (6.56)

with B a BM possibly correlated to W.

Local Stochastic volatility model [57, 69].

• It should incorporate volatility risk but also calibrate to European call options:

dSt = rSt dt + at σ(t, St) dWt (6.57)

for a given stochastic process a.

• Calibration to European call options imposes

σ(t, x) = σD(t, x) / √( E[ a²_t | St = x ] ) , (6.58)

so that S above has the following dynamics:

dSt = rSt dt + ( σD(t, St) at / √( E[ a²_t | St ] ) ) dWt . (6.59)

• This comes from Gyöngy's Theorem [46], see also [18]. Namely, for an Ito process

dZt = βt dt + αt dWt , (6.60)

there exists a diffusion Z^D,

dZ^D_t = b(t, Z^D_t) dt + Σ(t, Z^D_t) dWt , (6.61)

such that L(Zt) = L(Z^D_t) for all t ≥ 0, where

b(t, x) = E[ βt | Zt = x ] and Σ(t, x)² = E[ α²_t | Zt = x ] . (6.62)

Heuristics: 1. For Gyöngy's Theorem (β = 0). Let φ ∈ Cc((0, T) × R) and apply Ito's
formula to get

E[φ(T, ZT)] = E[φ(0, Z0)] + E[ ∫_0^T ( ∂t φ + (1/2) α²_t ∂²_{xx} φ )(s, Zs) ds ] . (6.63)

Since φ has compact support, the boundary terms vanish and

0 = E[ ∫_0^T ( ∂t φ + (1/2) E[ α²_s | Zs ] ∂²_{xx} φ )(s, Zs) ds ] (6.64)
= E[ ∫_0^T ( ∂t φ + (1/2) Σ(s, Zs)² ∂²_{xx} φ )(s, Zs) ds ] . (6.65)

Denoting µt = L(Zt), we have

0 = ∫_0^T ∫ ( ∂t φ + (1/2) Σ(s, x)² ∂²_{xx} φ )(s, x) µs(dx) ds , (6.66)

so that (µt)t is a weak solution to the Fokker-Planck equation

∂t m = (1/2) ∂²_{xx}( Σ(t, x)² m ) and m0 = L(Z0) , (6.67)

which is also satisfied by L(Z^D_t).

2. Combining Gyöngy's Theorem and Dupire's result, we must have, recalling (6.57),

E[ ( at σ(t, St) )² | St = x ] = σD(t, x)² , (6.68)

which yields (6.58).

• Existence and uniqueness for (6.59): mostly open, see nevertheless [1, 49] and [54].

Numerical simulation of (6.59) [77, 44].

• Following the MC approach, set at = f(Yt), where Y is a process that can be simulated.

,→ The main question is to compute E[ f²(Yt) | St ].

• Consider φ : R → R a smooth function s.t. ∫ φ(x) dx = 1 and set

φ_ε(x) := (1/ε) φ(x/ε) , ∀x ∈ R . (6.69)

- If (S^i_t)_{1≤i≤N} is a particle system associated to St (with or without MF
interaction), an estimator of ρt(x) (the density function of St) is

ρt(x) ≃ (1/N) Σ_{i=1}^N φ_ε(x − S^i_t) = ∫ φ_ε(x − y) µ^N_t(dy) . (6.70)

- Nadaraya-Watson estimator at x:

Θ(t, x) := E[ f²(Yt) | St = x ] ≃ Σ_{i=1}^N f²(Y^i_t) φ_ε(x − S^i_t) / Σ_{i=1}^N φ_ε(x − S^i_t) =: ΘN(t, x) . (6.71)

- (6.58) is thus approximated by

σ(t, x) ≃ σN(t, x) := σD(t, x) / √(ΘN(t, x)) . (6.72)
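A direct sketch of (6.71) with a Gaussian kernel; the kernel choice and the bandwidth eps are illustrative.

```python
import numpy as np

def nadaraya_watson(x, S, fY2, eps):
    """Estimate E[f^2(Y_t) | S_t = x] on a vector of query points x, cf. (6.71)."""
    w = np.exp(-0.5 * ((x[:, None] - S[None, :]) / eps) ** 2)   # kernel weights
    return (w * fY2[None, :]).sum(axis=1) / w.sum(axis=1)
```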

Heuristics.

1. Consider the kernel estimator of the density ρt(x, y) of L(St, Yt), for two
bandwidths ε, ε', namely

(1/N) Σ_{i=1}^N φ_ε(x − S^i_t) φ_{ε'}(y − Y^i_t) . (6.73)

Then one sets

ρ(y|x) ≃ [ (1/N) Σ_{i=1}^N φ_ε(x − S^i_t) φ_{ε'}(y − Y^i_t) ] / [ (1/N) Σ_{i=1}^N φ_ε(x − S^i_t) ] (6.74)

and

E[ f²(Yt) | St = x ] ≃ [ (1/N) Σ_{i=1}^N φ_ε(x − S^i_t) ∫ f²(y) φ_{ε'}(y − Y^i_t) dy ] / [ (1/N) Σ_{i=1}^N φ_ε(x − S^i_t) ] ; (6.75)

sending ε' → 0, we retrieve (6.71).

Remark: if f²(y) = y and the function φ is symmetric, then one has directly

∫ y φ_{ε'}(y − Y^i_t) dy = Y^i_t . (6.76)

2. Alternatively, one can see

E[ f²(Yt) | St = x ] ”=” E[ f²(Yt) δ0(x − St) ] / E[ δ0(x − St) ] . (6.77)

After smoothing of δ0 (the Dirac mass at 0), we get

E[ f²(Yt) | St = x ] ≃ E[ f²(Yt) φ_ε(x − St) ] / E[ φ_ε(x − St) ] . (6.78)

A particle approximation of the previous expression leads to (6.71).

• Assume we are given an approximation Ȳ of Y on the grid {tk, k = 0, . . . , κ}.

• Particle system for the time-discretized (6.59): for all j ≤ N,

S^j_{t_{k+1}} = S^j_{t_k} + r S^j_{t_k} h + σN(tk, S^j_{t_k}) f(Ȳ^j_{t_k}) S^j_{t_k} ΔW^j_k . (6.79)

• Acceleration in practice:

1. In σN(t, x), ΘN(t, x) is not computed for all x = S^j_{t_k} but on a grid of R. The
value at x = S^j_{t_k} is then obtained by interpolation.

2. The sums in (6.71) have to be optimised for the method not to be too
computationally expensive.

Remark 6.1. Note that the quantity E[ f²(Yt) | St ] can also be obtained by a non-
parametric regression estimate (see US options).

Numerical Example. We consider a Fake Brownian Motion, namely a model of the
type:

dXt = ( f(Yt) / √( E[ f²(Yt) | Xt ] ) ) dWt . (6.80)

,→ The Markovian projection is indeed a Brownian motion, and thus L(Xt) = L(Wt)
for all t.

• In the numerics, Y follows the SDE

dYt = −Yt dt + dBt and Y0 = 0 , (6.81)

and

f : x ↦ 0.1 + sin²(x) . (6.82)

• We implement the previous scheme (see the sketch below) and obtain, at T = 1, with
M = 100000 particles and N_time = 20 dates: (Legend: orange: true Gaussian density;
blue: estimation of the XT density function.)
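A self-contained sketch of the resulting particle scheme for (6.80), with a reduced number of particles (the naive Nadaraya-Watson evaluation below is O(M²) per date; in practice one uses the grid-plus-interpolation acceleration described above); the bandwidth is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n, T = 2_000, 20, 1.0
h, eps = T / n, M ** (-0.2)                  # illustrative kernel bandwidth
f2 = lambda y: (0.1 + np.sin(y) ** 2) ** 2   # f^2 with f from (6.82)
X, Y = np.zeros(M), np.zeros(M)
for _ in range(n):
    w = np.exp(-0.5 * ((X[:, None] - X[None, :]) / eps) ** 2)   # Gaussian kernel weights
    theta = (w * f2(Y)[None, :]).sum(axis=1) / w.sum(axis=1)    # Theta_N(t, X^i), cf. (6.71)
    dW, dB = np.sqrt(h) * rng.standard_normal((2, M))
    X = X + np.sqrt(f2(Y) / theta) * dW      # Euler step for (6.80)
    Y = Y - Y * h + dB                       # Euler step for (6.81)
print(X.var())                               # should be close to Var(W_T) = T = 1
```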

• "Call price" in this model:

1. E[[X]+ ]:

prix= 0.381 std estim= 0.0018

BM: prix= 0.390 std estim= 0.0018

True: 0.399

2. E[[X − 1]+ ]

X: prix= 0.076 std estim= 0.0008

BM: prix= 0.077 std estim= 0.0008

3. E[[X − 0.5]+ ]

X: prix= 0.183 std estim= 0.0013

BM: prix= 0.189 std estim= 0.0013

139
4. supremum:

E[maxt (Xt )]= 0.600 std estim = 0.0018

E[maxt (Wt )]= 0.658 std estim = 0.0019

Contents

I Handouts 15

1 Introduction 16

2 Review of the linear case 17

2.1 Mathematical & Financial Framework . . . . . . . . . . . . . . . . . 17

2.1.1 SDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.1.2 Useful estimates under (HL) . . . . . . . . . . . . . . . . . . 18

2.1.3 Link with PDEs . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.1.4 Financial setting . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.2 Euler Scheme for SDEs . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.2.1 Definition and first properties . . . . . . . . . . . . . . . . . . 23

2.2.2 Weak convergence for vanilla options . . . . . . . . . . . . . . 27

2.3 Implementation using Monte Carlo Methods . . . . . . . . . . . . . . 31

2.3.1 Quick review of the case without bias . . . . . . . . . . . . . 31

2.3.2 The case with bias . . . . . . . . . . . . . . . . . . . . . . . . 34

2.3.3 Convergence of Mean Square Error for MC method . . . . . . 35

2.4 Implementation using quantisation of Brownian increments . . . . . 37

2.5 Strong convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2.5.1 Lipschitz case . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2.5.2 Non globally Lipschitz case . . . . . . . . . . . . . . . . . . . 47

2.6 Path-dependent options . . . . . . . . . . . . . . . . . . . . . . . . . 51

2.6.1 General consideration . . . . . . . . . . . . . . . . . . . . . . 51

2.7 Acceleration methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2.7.1 Reducing the bias . . . . . . . . . . . . . . . . . . . . . . . . 53

2.7.2 Multi-Level Monte Carlo . . . . . . . . . . . . . . . . . . . . . 54

2.8 High order schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

2.8.1 High order weak convergence . . . . . . . . . . . . . . . . . . 59

3 Computing sensitivities in the linear case 60

3.1 Finite-Difference Approximations . . . . . . . . . . . . . . . . . . . . 60

3.1.1 Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.1.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.2 Tangent process approach . . . . . . . . . . . . . . . . . . . . . . . . 63

3.2.1 Tangent process . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3.2.2 Computing the delta (pathwise approach) . . . . . . . . . . . 67

3.2.3 Practical implementation . . . . . . . . . . . . . . . . . . . . 69

3.3 Greek weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.3.1 Likelihood Ratio Method . . . . . . . . . . . . . . . . . . . . 71

3.3.2 Integration by part . . . . . . . . . . . . . . . . . . . . . . . . 73

3.3.3 Bismut’s formula . . . . . . . . . . . . . . . . . . . . . . . . . 74

4 U.S. options in complete market 75

4.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . . . 75

4.2 Bermudan option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.2.1 Discretisation of the forward process . . . . . . . . . . . . . . 82

4.2.2 Longstaff-Schwartz algorithm . . . . . . . . . . . . . . . . . . . 84

4.3 Dual approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.4 Implementation using regression techniques . . . . . . . . . . . . . . 87

4.4.1 Linear Regression-based methods . . . . . . . . . . . . . . . . 87

4.4.2 Implementation of the dual approach . . . . . . . . . . . . . . 90

4.5 Quantization based methods . . . . . . . . . . . . . . . . . . . . . . . 92

4.5.1 Introduction - cubature formula . . . . . . . . . . . . . . . . . 92

4.5.2 Quantization tree for optimal stopping problem . . . . . . . . 97

4.5.3 Markovian quantization (grid method) . . . . . . . . . . . . . 99

5 Non-linear pricing methods 101

5.1 Backward Stochastic Differential Equation . . . . . . . . . . . . . . . 101

5.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.1.2 Some key properties . . . . . . . . . . . . . . . . . . . . . . . 103

5.1.3 Application to non-linear pricing . . . . . . . . . . . . . . . . 104

5.2 Main properties in the Markov setting . . . . . . . . . . . . . . . . . 107

5.3 Numerical analysis of backward Methods . . . . . . . . . . . . . . . . 108

5.3.1 L2 -stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

5.4 Convergence analysis assuming no error on X . . . . . . . . . . . . . 112

5.4.1 Truncation error . . . . . . . . . . . . . . . . . . . . . . . . . 112

5.4.2 Order of convergence in the smooth case . . . . . . . . . . . 113

5.4.3 Order of convergence in the Lipschitz case . . . . . . . . . . 115

5.5 Full discrete-time error analysis . . . . . . . . . . . . . . . . . . . . . 118

5.5.1 Truncation error . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.5.2 Convergence analysis . . . . . . . . . . . . . . . . . . . . . . . 119

5.6 Numerical illustration and further consideration . . . . . . . . . . . . 121

5.6.1 Monte Carlo based methods . . . . . . . . . . . . . . . . . . . 121

5.6.2 Tree based methods . . . . . . . . . . . . . . . . . . . . . . . 121

5.6.3 (Markovian) quantisation . . . . . . . . . . . . . . . . . . . . 123

5.7 An example of forward method . . . . . . . . . . . . . . . . . . . . . 125

5.7.1 Discretization of the optimisation problem (5.81) . . . . . . . 126

6 An introduction to McKean-Vlasov SDEs in finance 128

6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

6.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

6.1.2 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

6.1.3 Existence and uniqueness . . . . . . . . . . . . . . . . . . . . 130

6.2 Particle system approximation . . . . . . . . . . . . . . . . . . . . . . 133

6.3 Singular interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

6.3.1 Burgers Equation . . . . . . . . . . . . . . . . . . . . . . . . . 139

6.3.2 Calibration of LSV model . . . . . . . . . . . . . . . . . . . . 143

II Exercises 158

III Partial correction to Exercises 168

Part II

Exercises
Exercise II.1. Prove inequality (2.4).

Exercise II.2. Prove inequality (2.5).

Exercise II.3. Prove inequality (2.6).

Exercise II.4. Prove the statement of Proposition 2.1.

Exercise II.5. Prove inequality (2.37).

Exercise II.6. Let X^π denote the Milstein scheme for X (one-dimensional SDE)
given by

Xt = X0 + ∫_0^t b(Xs) ds + ∫_0^t σ(Xs) dWs , t ≤ T ,

where b and σ are C²_b and σ is bounded.

1. Recall the definition of X^π and of the (global) truncation error T.

2. Prove the stability of the scheme, i.e.

E[ sup_{u≤T} |Xu − X^π_u|^p ] ≤ Cp E[ sup_{u≤T} |T(u)|^p ] ,

for p ≥ 2.

Exercise II.7. 1. Prove Gronwall's Lemma.

2. Let φ be a non-decreasing non-negative function satisfying

φ(t) ≤ A ∫_0^t φ(s) ds + B ( ∫_0^t φ²(s) ds )^{1/2} + C , ∀t ≤ T ,

where A, B, C are positive constants.

Show that, for all t ≤ T,

φ(t) ≤ 2C e^{(2A+B²)t} .

Exercise II.8. Let X^π be given by

X^π_{i+1} = X^π_i + b(X^π_i) h + σ(X^π_i) Δ_iW + (1/2)[σσ'](X^π_i)( (Δ_iW)² − h )
+ b'σ(X^π_i) Δ_iZ + (1/2)[bb' + (1/2) b''σ²](X^π_i) h²
+ [bσ' + (1/2) σ''σ²](X^π_i)( Δ_iW h − Δ_iZ )
+ (1/2)[σ²σ'' + σ(σ')²](X^π_i)( (1/3)(Δ_iW)² − h ) Δ_iW ,

where Δ_iZ = ∫_{ti}^{ti+1} (Ws − W_{ti}) ds and Δ_iW = W_{ti+1} − W_{ti}.

This is an Ito-Taylor scheme with strong order 1.5 (no proof required!).

1. Derive heuristically the scheme from the original SDE.

2. What is the distribution of (Δ_iW, Δ_iZ)?

3. Explain how to simulate (Δ_iW, Δ_iZ).

Exercise II.9. (Asian Option) Let X be a one-dimensional SDE given by

Xt = X0 + ∫_0^t b(Xs) ds + ∫_0^t σ(Xs) dWs , t ≤ T ,

representing the price of some stock. We assume a deterministic interest rate r and
the existence of a unique pricing measure.

1. Recall the price of a European option with payoff g(XT) at maturity T, where
g is some Lipschitz function. Explain how to compute this price using MC
simulation; in particular, give the Euler scheme X^π for X.

2. We now consider options written on AT = ∫_0^T Xs ds. Using a grid π with
constant step size, we introduce an Euler scheme A^π to compute A, based on
X^π. Show that

E[ |g(AT) − g(A^π_T)|^p ]^{1/p} ≤ C/√n ,

where g is a Lipschitz continuous function.

3. The Black-Scholes model.

(a) Write down the Euler scheme for the Black-Scholes model. Explain one
drawback of this scheme and why, in fact, we don't need to use it.

(b) We now assume that X is perfectly approximated on π and we consider
the same approximation A^π as above (using X...). Prove that

E[ |g(AT) − g(A^π_T)|^p ]^{1/p} ≤ C/n .

Exercise II.10. Let X be a one-dimensional SDE given by

Xt = X0 + ∫_0^t b(Xs) ds + ∫_0^t σ(Xs) dWs , t ≤ T .

We denote by X^n its Euler scheme using a grid π with constant time step T/n. We
consider a bounded measurable function f and assume that the following expansion
holds true:

E[f(XT)] = E[f(X^n_T)] + c1/n + O(1/n²) . (6.83)

1. We assume that the Euler scheme is computed using MC simulation. Give an
expression of the MSE, if one uses the 'usual' estimator.

2. One would like to take advantage of the expansion (6.83). Suggest a new
approximation of E[f(XT)] using X^n and X^{2n} with precision O(1/n²).
3. One would like to implement this new approximation in practice.

(a) We first simulate X n and X 2n using two independent batches of Brownian

increments. Give the expression of the MSE and the asymptotic variance

of the estimator (when n → ∞).

(b) We now simulate X n and X 2n using the same Brownian increments.

i. Explain how one could do that in practice, implement the method...

ii. Give the expression of the MSE and the asymptotic variance of the

estimator (when n → ∞) and compare with the previous result.

Exercise II.11. (Lookback Option) Let X be the solution of the one-dimensional
SDE:

Xt = X0 + ∫_0^t σ(Xs) dWs ,

representing the price of some underlying asset. We assume that σ is Lipschitz and
positive. We would like to approximate the price of an option with the following
payoff at maturity: g(max_{t∈[0,T]} Xt). We introduce the (continuous) Euler scheme
X^π for X, on a grid with constant timestep h = T/n.

We assume that

E[ g( max_{t∈[0,T]} Xt ) ] ≃ E[ g( max_{t∈[0,T]} X^π_t ) ] .

We know how to simulate X^π_{ti}, ti ∈ π. The goal here is to understand how to simulate

max_{t∈[0,T]} X^π_t = max_i max_{t∈[ti,ti+1]} X^π_t ,

knowing that (no proof required!)

Law( max_{t∈[ti,ti+1]} X^π_t | (X^π_{tj})_{j≤n} ) = Law( max_{t∈[ti,ti+1]} X^π_t | X^π_{ti}, X^π_{ti+1} ) .

1. Using the reflection principle, one can prove that, for z ≥ y,

P( sup_{t∈[0,T]} Wt ≥ z , WT ≤ y ) = P( WT ≥ 2z − y ) .

Deduce that, for z ≥ y,

P( sup_{t∈[0,T]} Wt ≥ z | WT = y ) = e^{−(2/T) z(z−y)} .

2. Show that, for M ≥ xi ∨ xi+1,

P( max_{t∈[ti,ti+1]} X^π_t ≤ M | X^π_{ti} = xi , X^π_{ti+1} = xi+1 )
= 1 − e^{−2(M−xi)(M−xi+1)/(hσ²(xi))} =: Fh(M; xi, xi+1) .

3. Explain how to compute E[ g( max_{t∈[0,T]} X^π_t ) ] in practice.

 
3. Explain how to compute E g(maxt∈[0,T ] Xtπ ) in practice.

Exercise II.12. Let b and σ be C²_b functions from R to R, and let X be the solution of
the following SDE:

Xt = X0 + ∫_0^t b(Xs) ds + ∫_0^t σ(Xs) dWs , 0 ≤ t ≤ T .

1. Recall the definition of ∇X, the tangent process for X, and the equation (E) it
satisfies.

2. Prove that

E[ sup_{t≤T} |∇Xt|^p ] ≤ Cp .

3. Show uniqueness for (E).
149
Exercise II.13. Let X be given by

X^x_t = x + ∫_0^t b(X^x_s) ds + ∫_0^t σ(X^x_s) dWs ,

where b and σ are C²_b.

1. Recall the definition of the tangent process.

2. Write down the Euler scheme for X and for ∇X, on a discrete time grid π.

3. In the d = 1 case, suggest an approximation of ∇X which is positive.

Exercise II.14. Let X be given by

X^x_t = x + ∫_0^t b(X^x_s) ds + ∫_0^t σ(X^x_s) dWs ,

where b and σ are C²_b.

We consider the Euler approximation of Question 2 in Exercise II.13.

1. Recall the rate of strong convergence of the Euler scheme for X.

2. Show that

E[ sup_{t∈[ti,ti+1]} |∇Xt − ∇X_{ti}|² ] ≤ C|π| .

3. Show that

E[ sup_{t≤T} |∇Xt − ∇X^π_t|² ] ≤ C|π| .

4. We approximate E[g(XT)∇XT] by E[g(X^π_T)∇X^π_T]. Give an upper bound on the
error when using the previous approximation.

5. Explain how to compute the previous approximation in practice.

Exercise II.15. We work in the Black-Scholes setting with

dXt = rXt dt + σXt dWt .

Using the likelihood ratio method, show that

1. the delta is given by

∂x u(0, x) = e^{−rT} E[ g(X^x_T) WT/(xσT) ] ;

2. the vega is given by

∂σ u(0, x) = e^{−rT} E[ g(X^x_T) ( W²_T/(σT) − WT − 1/σ ) ] ;

3. the Gamma is given by

∂²_{xx} u(0, x) = e^{−rT} (1/x²) E[ g(X^x_T) (1/(σT)) ( W²_T/(σT) − WT − 1/σ ) ] .

Exercise II.16. Let X be given by

X^x_t = x + ∫_0^t b(X^x_s) ds + ∫_0^t σ(X^x_s) dWs ,

where b and σ are C²_b. For α > 0 and H a bounded progressively measurable
process, we introduce

X^x_t(α) = x + ∫_0^t ( b(X^x_s(α)) − αHs σ(X^x_s(α)) ) ds + ∫_0^t σ(X^x_s(α)) dWs .

The goal of this exercise is to prove that Ut := ∂α X(α)|_{α=0} satisfies

dUt = Ut ( b'(Xt) dt + σ'(Xt) dWt ) − Ht σ(Xt) dt and U0 = 0 . (6.84)

1. We define

∂α Xt(α) = lim_{ε→0} ( Xt(α + ε) − Xt(α) ) / ε ;

using a heuristic argument, show that

∂α X^x_t(α) = ∫_0^t ( b'(X^x_s(α)) ∂α X^x_s(α) − αHs σ'(X^x_s(α)) ∂α X^x_s(α) − Hs σ(X^x_s(α)) ) ds
+ ∫_0^t σ'(X^x_s(α)) ∂α X^x_s(α) dWs .

2. Setting α = 0 in the above equation, we retrieve (6.84). We now define
Γ^ε := (X^x(ε) − X^x)/ε and

b̃^ε_t = ∫_0^1 b'( Xt + λ(X^ε_t − Xt) ) dλ , σ̃^ε_t = ∫_0^1 σ'( Xt + λ(X^ε_t − Xt) ) dλ ,

where X^ε := X^x(ε).

(a) Write down the dynamics of Γ^ε using b̃^ε, σ̃^ε and show that, for ε ∈ [−1, 1],

E[ sup_{t≤T} |Γ^ε_t|^p ] ≤ Cp .

(b) Show that

lim_{ε→0} E[ sup_{t≤T} |∂α X^x_t(0) − Γ^ε_t|² ] = 0 .

3. Recall the definition of ∇X^x_t, the tangent process for X.

4. Compute the dynamics of Ut/∇Xt and give the expression of U in terms of (∇Xt).

Exercise II.17. 1. Prove that the estimator given in Proposition ?? has finite

variance.

Exercise II.18. Prove Proposition 4.3 (hint: introduce a dominating Bermudan option with exercise payoff g(X) \vee g(X^\pi)).

Exercise II.19. 1. Prove that any L2 -optimal quantizer is stationary, recall (4.7).

In particular, prove (4.6).

2. Prove (4.8) and (4.9).

Exercise II.20. Prove Proposition 5.1.

Exercise II.21. Prove Theorem 5.2.

Exercise II.22. Prove Lemma 5.1.

Part III

Partial corrections to the exercises


Correction to Exercise II.4

1. Localisation: Let us define, for M a positive integer,
\[
\tau_M := \inf\{t \ge 0 \,|\, |X^\pi_t| \ge M\} \wedge T,
\]
which is a stopping time. We then consider
\[
X^{M,\pi}_t := X^\pi_{t\wedge\tau_M}.
\]
We observe that
\[
X^{M,\pi}_t = x + \int_0^{t\wedge\tau_M} b(\bar{s}, X^\pi_{\bar{s}})\,ds + \int_0^{t\wedge\tau_M} \sigma(\bar{s}, X^\pi_{\bar{s}})\,dW_s
= x + \int_0^{t} b(\bar{s}, X^{M,\pi}_{\bar{s}})\,\mathbf{1}_{\{\tau_M>s\}}\,ds + \int_0^{t} \sigma(\bar{s}, X^{M,\pi}_{\bar{s}})\,\mathbf{1}_{\{\tau_M>s\}}\,dW_s.
\]

2. Estimate: Applying the inequality (a + b + c)^p \le 3^{p-1}(a^p + b^p + c^p) (Jensen's inequality) and using step 1, we compute
\[
|X^{M,\pi}_t|^p \le 3^{p-1}\Big(x^p + \Big|\int_0^t b(\bar{s}, X^{M,\pi}_{\bar{s}})\mathbf{1}_{\{\tau_M>s\}}\,ds\Big|^p + \Big|\int_0^t \sigma(\bar{s}, X^{M,\pi}_{\bar{s}})\mathbf{1}_{\{\tau_M>s\}}\,dW_s\Big|^p\Big).
\]
Using the BDG inequality, we obtain
\[
\mathbb{E}\Big[\sup_{0\le u\le t}|X^{M,\pi}_u|^p\Big] \le C_p\Big(x^p + \mathbb{E}\Big[\sup_{0\le u\le t}\Big|\int_0^u b(\bar{s}, X^{M,\pi}_{\bar{s}})\mathbf{1}_{\{\tau_M>s\}}\,ds\Big|^p\Big] + \mathbb{E}\Big[\Big|\int_0^t \sigma(\bar{s}, X^{M,\pi}_{\bar{s}})^2\mathbf{1}_{\{\tau_M>s\}}\,ds\Big|^{\frac{p}{2}}\Big]\Big).
\]
Using the Hölder inequality (recalling that p/2 \ge 1),
\[
\mathbb{E}\Big[\sup_{0\le u\le t}|X^{M,\pi}_u|^p\Big] \le C_p\Big(x^p + t^{p-1}\,\mathbb{E}\Big[\int_0^t |b(\bar{s}, X^{M,\pi}_{\bar{s}})|^p\,ds\Big] + t^{\frac{p}{2}-1}\,\mathbb{E}\Big[\int_0^t |\sigma(\bar{s}, X^{M,\pi}_{\bar{s}})|^p\,ds\Big]\Big).
\]
Using the linear growth property of b and \sigma, we compute
\[
\mathbb{E}\Big[\sup_{0\le u\le t}|X^{M,\pi}_u|^p\Big] \le C_p\Big(1 + \int_0^t \mathbb{E}\big[|X^{M,\pi}_{\bar{s}}|^p\big]\,ds\Big),
\]
which a fortiori leads to
\[
\mathbb{E}\Big[\sup_{0\le u\le t}|X^{M,\pi}_u|^p\Big] \le C_p\Big(1 + \int_0^t \mathbb{E}\Big[\sup_{0\le u\le s}|X^{M,\pi}_u|^p\Big]\,ds\Big).
\]
Applying Gronwall's Lemma, we obtain
\[
\mathbb{E}\Big[\sup_{0\le u\le T}|X^{M,\pi}_u|^p\Big] \le C_p.
\]

3. Conclusion: We observe that, as M grows to infinity,
\[
\sup_{0\le u\le T}|X^{M,\pi}_u|^p = \sup_{0\le u\le T\wedge\tau_M}|X^\pi_u|^p
\]
converges increasingly to \sup_{0\le u\le T}|X^\pi_u|^p. The proof is then concluded by applying the monotone convergence theorem. □

Correction to Exercise II.5

Correction to Exercise II.6

1. We have
\[
X^\pi_{i+1} = X^\pi_i + b(X^\pi_i)(t_{i+1}-t_i) + \sigma(X^\pi_i)(W_{t_{i+1}} - W_{t_i}) + \frac12 a(X^\pi_i)\big((W_{t_{i+1}} - W_{t_i})^2 - (t_{i+1}-t_i)\big)
\]
and
\[
T(u) = \int_0^u \big(b(X_s) - b(X_{\bar{s}})\big)\,ds + \int_0^u \big(\sigma(X_s) - \sigma(X_{\bar{s}}) - a(X_{\bar{s}})(W_s - W_{\bar{s}})\big)\,dW_s.
\]
2. We set \delta X_t = X_t - X^\pi_t and observe that
\[
\delta X_t = \int_0^t \big(b(X_{\bar{s}}) - b(X^\pi_{\bar{s}})\big)\,ds + \int_0^t \big(\sigma(X_{\bar{s}}) - \sigma(X^\pi_{\bar{s}})\big)\,dW_s + \int_0^t \big(a(X_{\bar{s}}) - a(X^\pi_{\bar{s}})\big)(W_s - W_{\bar{s}})\,dW_s.
\]
The only new term in the analysis, compared with the Euler scheme, is
\[
A_t := \int_0^t \big(a(X_{\bar{s}}) - a(X^\pi_{\bar{s}})\big)(W_s - W_{\bar{s}})\,dW_s.
\]
We then compute, using the BDG inequality,
\[
\mathbb{E}\Big[\sup_{0\le u\le t}|A_u|^p\Big] \le C_p\,\mathbb{E}\Big[\Big(\int_0^t \big|\big(a(X_{\bar{s}}) - a(X^\pi_{\bar{s}})\big)(W_s - W_{\bar{s}})\big|^2\,ds\Big)^{\frac{p}{2}}\Big]
\le C_p\,\mathbb{E}\Big[\int_0^t \big|\big(a(X_{\bar{s}}) - a(X^\pi_{\bar{s}})\big)(W_s - W_{\bar{s}})\big|^p\,ds\Big]
\le C_p\,\mathbb{E}\Big[\int_0^t \big|a(X_{\bar{s}}) - a(X^\pi_{\bar{s}})\big|^p\,\mathbb{E}_{\bar{s}}\big[|W_s - W_{\bar{s}}|^p\big]\,ds\Big]
\le C_p\,\mathbb{E}\Big[\int_0^t \big|a(X_{\bar{s}}) - a(X^\pi_{\bar{s}})\big|^p\,ds\Big],
\]
where we used \mathbb{E}_{\bar{s}}\big[|W_s - W_{\bar{s}}|^p\big] \le C_p|\pi|^{\frac{p}{2}}. Since \sigma is bounded and C^2_b, the function a is Lipschitz and we finally obtain
\[
\mathbb{E}\Big[\sup_{0\le u\le t}|A_u|^p\Big] \le C_p\,\mathbb{E}\Big[\sup_{u\le t}|\delta X_u|^p\Big].
\]
The proof is then concluded using the same arguments as the ones used for the Euler scheme.
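
As a complement (not part of the original correction), here is a minimal Python sketch of one path of the scheme of Question 1, with a = \sigma'\sigma; the coefficient functions below are illustrative choices, not taken from the exercise.

```python
import numpy as np

def milstein_terminal(x0, b, sigma, dsigma, T, n, rng):
    """One path of X_{i+1} = X_i + b h + sigma dW + (a/2)(dW^2 - h),
    with a = sigma' * sigma, simulated up to the terminal time T."""
    h = T / n
    x = x0
    for _ in range(n):
        dw = rng.normal(0.0, np.sqrt(h))
        a = dsigma(x) * sigma(x)                    # a = sigma' sigma
        x = x + b(x) * h + sigma(x) * dw + 0.5 * a * (dw**2 - h)
    return x

rng = np.random.default_rng(0)
b = lambda x: -x                                    # illustrative drift
sigma = lambda x: 0.4 * np.sqrt(1.0 + x**2)         # illustrative volatility
dsigma = lambda x: 0.4 * x / np.sqrt(1.0 + x**2)    # its derivative
print(milstein_terminal(1.0, b, sigma, dsigma, 1.0, 100, rng))
```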

Correction to Exercise II.7

1.

2. We observe that, \phi being non-decreasing and non-negative,
\[
\Big(\int_0^t \phi^2(s)\,ds\Big)^{\frac12} \le \Big(\phi(t)\int_0^t \phi(s)\,ds\Big)^{\frac12} \le \frac{\phi(t)}{2B} + \frac{B}{2}\int_0^t \phi(s)\,ds,
\]
where we used Young's inequality for the last step. Finally, we obtain
\[
\phi(t) \le (2A + B^2)\int_0^t \phi(s)\,ds + 2C.
\]
The proof is concluded using the standard Gronwall's Lemma.

Correction to Exercise II.8

1. Apply Itô's formula to b(X_s) and \sigma(X_s) and discretise the integrals. We sketch the proof for b = 0. Applying Itô's formula once, we get
\[
X_{t_{i+1}} \simeq X_{t_i} + \sigma(X_{t_i})\Delta_i W + \frac12\sigma''\sigma^2\,(\Delta_i W\,h - \Delta_i Z) + \int_{t_i}^{t_{i+1}}\!\int_{t_i}^{t} \sigma'\sigma(X_s)\,dW_s\,dW_t.
\]
Expanding the last integral (keeping only the dW terms), we get
\[
X_{t_{i+1}} \simeq X_{t_i} + \sigma(X_{t_i})\Delta_i W + \frac12\sigma''\sigma^2\,(\Delta_i W\,h - \Delta_i Z) + \sigma'\sigma(X_{t_i})\int_{t_i}^{t_{i+1}}\!\int_{t_i}^{t} dW_s\,dW_t + (\sigma'\sigma)'\sigma(X_{t_i})\int_{t_i}^{t_{i+1}}\!\int_{t_i}^{t}\!\int_{t_i}^{s} dW_r\,dW_s\,dW_t.
\]
The proof is completed by checking that
\[
\Delta_i W^3 = 3\int_{t_i}^{t_{i+1}} (W_s - W_{t_i})\,ds + 3\int_{t_i}^{t_{i+1}} (W_s - W_{t_i})^2\,dW_s
= 3\int_{t_i}^{t_{i+1}} (W_s - W_{t_i})\,ds + 6\int_{t_i}^{t_{i+1}}\!\int_{t_i}^{t} (W_s - W_{t_i})\,dW_s\,dW_t + 3\int_{t_i}^{t_{i+1}} (s - t_i)\,dW_s
= 3(t_{i+1} - t_i)\,\Delta_i W + 6\int_{t_i}^{t_{i+1}}\!\int_{t_i}^{t}\!\int_{t_i}^{s} dW_r\,dW_s\,dW_t.
\]

2. (\Delta_i W, \Delta_i Z) is a Gaussian vector such that \mathbb{E}[\Delta_i Z] = 0, \mathbb{E}\big[|\Delta_i Z|^2\big] = \frac13\Delta^3 and \mathbb{E}[\Delta_i Z\,\Delta_i W] = \frac12\Delta^2.

3. Set \Delta_i W = \sqrt{h}\,G_1 and \Delta_i Z = \frac12 h^{\frac32}\big(G_1 + \frac{1}{\sqrt{3}}G_2\big), with (G_1, G_2) \sim \mathcal{N}(0, I_2).
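
A short sketch of the sampling recipe of Questions 2-3 (function and variable names are ours); the empirical moments can be checked against \Delta^2/2 and \Delta^3/3.

```python
import numpy as np

def sample_dw_dz(h, size, rng):
    """Sample the Gaussian pair (Delta_i W, Delta_i Z) with
    Var(dW) = h, Var(dZ) = h^3/3 and Cov(dW, dZ) = h^2/2."""
    g1 = rng.normal(size=size)
    g2 = rng.normal(size=size)
    dw = np.sqrt(h) * g1
    dz = 0.5 * h**1.5 * (g1 + g2 / np.sqrt(3.0))
    return dw, dz

rng = np.random.default_rng(0)
h = 0.01
dw, dz = sample_dw_dz(h, 10**6, rng)
print(np.var(dz), h**3 / 3)          # both close to 3.33e-7
print(np.mean(dw * dz), h**2 / 2)    # both close to 5e-5
```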

Correction to Exercise II.9

 
1. \mathbb{E}^{\mathbb{Q}}\big[e^{-rT}g(X_T)\big], with X^\pi_{i+1} = X^\pi_i + rX^\pi_i h + \sigma(X^\pi_i)\Delta_i W^{\mathbb{Q}}.

2. The Euler scheme.

3. The Black-Scholes model.

(a) X^\pi_{i+1} = X^\pi_i(1 + rh + \sigma\Delta_i W^{\mathbb{Q}}) can take negative values.

(b) We first observe that
\[
\mathbb{E}\big[|g(A_T) - g(A^\pi_T)|^p\big]^{\frac1p} \le C\,\mathbb{E}\big[|A_T - A^\pi_T|^p\big]^{\frac1p}, \qquad A_T - A^\pi_T = \int_0^T (X_s - X_{\bar{s}})\,ds.
\]
We compute, using an integration-by-parts argument,
\[
\int_{t_i}^{t_{i+1}} (X_s - X_{t_i})\,ds = \int_{t_i}^{t_{i+1}}\Big(\int_{t_i}^{s} rX_u\,du + \int_{t_i}^{s}\sigma X_u\,dW_u\Big)ds = \int_{t_i}^{t_{i+1}} (t_{i+1}-s)\,rX_s\,ds + \int_{t_i}^{t_{i+1}} (t_{i+1}-s)\,\sigma X_s\,dW_s.
\]
We then have, denoting by s^* the right endpoint of the grid interval containing s,
\[
\mathbb{E}\big[|A_T - A^\pi_T|^p\big] \le 2^{p-1}\Big(\mathbb{E}\Big[\Big|\int_0^T (s^*-s)\,rX_s\,ds\Big|^p\Big] + \mathbb{E}\Big[\Big|\int_0^T (s^*-s)\,\sigma X_s\,dW_s\Big|^p\Big]\Big)
\le C_p\Big(|\pi|^p\int_0^T \mathbb{E}\Big[\sup_u |X_u|^p\Big]ds + \mathbb{E}\Big[\Big|\int_0^T |(s^*-s)\,\sigma X_s|^2\,ds\Big|^{\frac{p}{2}}\Big]\Big)
\le C_p\,|\pi|^p\int_0^T \mathbb{E}\Big[\sup_u |X_u|^p\Big]ds \le C_p|\pi|^p.
\]
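
For concreteness, a minimal sketch of the quantity analysed above: a Monte Carlo estimate of \mathbb{E}[e^{-rT}g(A^\pi_T)] with A^\pi_T = \sum_i X_{t_i}(t_{i+1}-t_i), the Black-Scholes path being sampled exactly on the grid; the payoff g below is an illustrative choice.

```python
import numpy as np

def asian_mc(g, x0, r, sigma, T, n, N, rng):
    """Monte Carlo for E[exp(-rT) g(A^pi_T)], A^pi_T being the left-point
    Riemann sum of an exactly-sampled Black-Scholes path on the grid."""
    h = T / n
    x = np.full(N, float(x0))
    a = np.zeros(N)
    for _ in range(n):
        a += x * h                                   # left-point Riemann sum
        z = rng.normal(0.0, np.sqrt(h), N)
        x = x * np.exp((r - 0.5 * sigma**2) * h + sigma * z)
    return np.exp(-r * T) * np.mean(g(a))

rng = np.random.default_rng(0)
g = lambda a: np.maximum(a - 1.0, 0.0)               # illustrative Asian call payoff
print(asian_mc(g, 1.0, 0.05, 0.2, 1.0, 100, 10**5, rng))
```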

Correction to Exercise II.10

1. The 'classical' estimator is \frac{1}{N}\sum_{j=1}^{N} f((X^n_T)^j) and then
\[
MSE = \big|\mathbb{E}\big[f(X_T) - f((X^n_T)^1)\big]\big|^2 + \frac{\mathrm{Var}(f((X^n_T)^1))}{N} = O\Big(\frac{1}{n^2}\Big) + \frac{\mathrm{Var}(f((X^n_T)^1))}{N}.
\]
2. We have that
\[
\mathbb{E}\big[2f(X^{2n}_T) - f(X^n_T)\big] = \mathbb{E}[f(X_T)] + O\Big(\frac{1}{n^2}\Big).
\]
3. (a) We have
\[
MSE = O\Big(\frac{1}{n^4}\Big) + \frac{\mathrm{Var}\big[2f(X^{2n}_T) - f(X^n_T)\big]}{N} = O\Big(\frac{1}{n^4}\Big) + \frac{4\,\mathrm{Var}\big[f(X^{2n}_T)\big] + \mathrm{Var}\big[f(X^n_T)\big]}{N}
\]
because of independence. Then we have that, for n large,
\[
\mathrm{Var}\big[2f(X^{2n}_T) - f(X^n_T)\big] \sim 5\,\mathrm{Var}[f(X_T)].
\]
(b) We have
\[
MSE = O\Big(\frac{1}{n^4}\Big) + \frac{\mathrm{Var}\big[2f(X^{2n}_T) - f(X^n_T)\big]}{N},
\]
and in the limit
\[
\mathrm{Var}\big[2f(X^{2n}_T) - f(X^n_T)\big] \sim \mathrm{Var}[f(X_T)].
\]
This estimator therefore has a smaller variance.
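
A sketch of the estimator of Question 3(b), where the coarse Euler scheme reuses the fine Brownian increments; it is this coupling that brings the limiting variance down to \mathrm{Var}[f(X_T)] (independent paths, as in 3(a), would give approximately 5\,\mathrm{Var}[f(X_T)]). The coefficients and payoff are illustrative.

```python
import numpy as np

def euler_terminal(x0, b, sig, dw, h):
    """Euler scheme driven by the given array of Brownian increments."""
    x = np.full(dw.shape[1], float(x0))
    for i in range(dw.shape[0]):
        x = x + b(x) * h + sig(x) * dw[i]
    return x

def extrapolated_estimator(f, x0, b, sig, T, n, N, rng):
    """Estimate E[f(X_T)] by the mean of 2 f(X^{2n}_T) - f(X^n_T), the coarse
    scheme being driven by the aggregated fine Brownian increments."""
    h_fine = T / (2 * n)
    dw_fine = rng.normal(0.0, np.sqrt(h_fine), size=(2 * n, N))
    dw_coarse = dw_fine[0::2] + dw_fine[1::2]   # pairwise sums: coarse increments
    x_fine = euler_terminal(x0, b, sig, dw_fine, h_fine)
    x_coarse = euler_terminal(x0, b, sig, dw_coarse, T / n)
    sample = 2.0 * f(x_fine) - f(x_coarse)
    return sample.mean(), sample.var() / N      # estimate and its statistical variance

rng = np.random.default_rng(0)
b = lambda x: 0.05 * x                          # illustrative drift
sig = lambda x: 0.2 * x                         # illustrative volatility
f = lambda x: np.maximum(x - 1.0, 0.0)
print(extrapolated_estimator(f, 1.0, b, sig, 1.0, 50, 10**5, rng))
```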

Correction to Exercise II.11

1. We compute
\[
\mathbb{P}\Big(\sup_{t\in[0,T]} W_t \ge z \,\Big|\, W_T = y\Big) = \lim_{\eta\downarrow 0}\,\mathbb{P}\Big(\sup_{t\in[0,T]} W_t \ge z \,\Big|\, y \le W_T \le y+\eta\Big)
= \lim_{\eta\downarrow 0}\,\frac{\big[\mathbb{P}(W_T \ge 2z-(y+\eta)) - \mathbb{P}(W_T \ge 2z-y)\big]/\eta}{\big[\mathbb{P}(W_T \le y+\eta) - \mathbb{P}(W_T \le y)\big]/\eta}
= \frac{\partial_y\,\mathbb{P}(W_T \ge 2z-y)}{\partial_y\,\mathbb{P}(W_T \le y)}
= \frac{-e^{-\frac{(2z-y)^2}{2T}}\,(-1)}{e^{-\frac{y^2}{2T}}} = e^{-2\frac{z(z-y)}{T}}.
\]
2. We observe that
\[
\max_{t\in[t_i,t_{i+1}]} X^\pi_t \le M \iff \max_{t\in[t_i,t_{i+1}]} (W_t - W_{t_i}) \le \frac{M - X^\pi_{t_i}}{\sigma(X^\pi_{t_i})},
\]
leading to (W^{t_i} denoting a Brownian motion starting from 0 at time t_i)
\[
\mathbb{P}\Big(\max_{t\in[t_i,t_{i+1}]} X^\pi_t \le M \,\Big|\, X^\pi_{t_i} = x_i,\, X^\pi_{t_{i+1}} = x_{i+1}\Big)
= \mathbb{P}\Big(\max_{t\in[t_i,t_{i+1}]} W^{t_i}_t \le \frac{M-x_i}{\sigma(x_i)} \,\Big|\, W^{t_i}_{t_{i+1}} = \frac{x_{i+1}-x_i}{\sigma(x_i)}\Big)
= 1 - \mathbb{P}\Big(\max_{t\in[t_i,t_{i+1}]} W^{t_i}_t \ge \frac{M-x_i}{\sigma(x_i)} \,\Big|\, W^{t_i}_{t_{i+1}} = \frac{x_{i+1}-x_i}{\sigma(x_i)}\Big)
= 1 - e^{-2\frac{(M-x_i)(M-x_{i+1})}{h\sigma^2(x_i)}},
\]
using the previous question.

3. We use the inverse transform method. Once the values X^\pi_{t_i} have been simulated, we simulate \max_{t\in[t_i,t_{i+1}]} X^\pi_t by computing
\[
\frac12\Big(X^\pi_{t_i} + X^\pi_{t_{i+1}} + \sqrt{(X^\pi_{t_{i+1}} - X^\pi_{t_i})^2 - 2h\sigma^2(X^\pi_{t_i})\ln(U^i)}\Big),
\]
where the U^i are i.i.d. \mathcal{U}(0,1) random variables.
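
A sketch of the full procedure of Question 3 (Euler path plus the conditional maximum on each interval via the inverse transform above); the lookback-type payoff at the end is an illustrative choice.

```python
import numpy as np

def euler_running_max(x0, b, sig, T, n, N, rng):
    """Simulate X^pi on the grid and, on each interval, the conditional
    maximum via the inverse-transform formula of Question 3."""
    h = T / n
    x = np.full(N, float(x0))
    running_max = np.full(N, float(x0))
    for _ in range(n):
        x_next = x + b(x) * h + sig(x) * rng.normal(0.0, np.sqrt(h), N)
        u = rng.uniform(size=N)
        m = 0.5 * (x + x_next +
                   np.sqrt((x_next - x)**2 - 2.0 * h * sig(x)**2 * np.log(u)))
        running_max = np.maximum(running_max, m)
        x = x_next
    return running_max

rng = np.random.default_rng(0)
b = lambda x: 0.05 * x                      # illustrative drift
sig = lambda x: 0.2 * x                     # illustrative volatility
M = euler_running_max(1.0, b, sig, 1.0, 100, 10**5, rng)
print(np.mean(np.maximum(M - 1.1, 0.0)))    # E[g(max X^pi)] for g(m) = (m - 1.1)^+
```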

Correction to Exercise II.12

1. We recall that \nabla X^x_t is the solution to
\[
Y_t = 1 + \int_0^t b'(X^x_s)Y_s\,ds + \int_0^t \sigma'(X^x_s)Y_s\,dW_s. \tag{6.85}
\]
2. To ease the notation, we set b = 0. We then compute
\[
\sup_{u\le t}|Y_u|^p \le C_p\Big(1 + \sup_{u\le t}\Big|\int_0^u \sigma'(X^x_s)Y_s\,dW_s\Big|^p\Big).
\]
Taking the expectation and using the BDG inequality, we get
\[
\mathbb{E}\Big[\sup_{u\le t}|Y_u|^p\Big] \le C_p\Big(1 + \mathbb{E}\Big[\Big(\int_0^t|\sigma'(X^x_s)Y_s|^2\,ds\Big)^{\frac{p}{2}}\Big]\Big).
\]
This leads to, for p \ge 2,
\[
\mathbb{E}\Big[\sup_{u\le t}|Y_u|^p\Big] \le C_p\Big(1 + \mathbb{E}\Big[\int_0^t|Y_s|^p\,ds\Big]\Big) \le C_p\Big(1 + \int_0^t\mathbb{E}\Big[\sup_{u\in[0,s]}|Y_u|^p\Big]ds\Big).
\]
We conclude using Gronwall's Lemma.

Remark (see also lecture notes, Section 2.7, proof of Proposition 2.3): the above proof should be applied to the localised version of Y, i.e. Y_{\cdot\wedge\tau_n}, with
\[
\tau_n = \inf\{t \ge 0 \,|\, |Y_t| \ge n\} \wedge T,
\]
for n a positive integer.

3. Let Y and Z be solutions of (6.85) (with b = 0) and set \Delta = Y - Z. We then observe that
\[
\Delta_t = \int_0^t \sigma'(X^x_s)\Delta_s\,dW_s.
\]
We then have
\[
\sup_{u\le t}|\Delta_u|^2 = \sup_{u\le t}\Big|\int_0^u \sigma'(X^x_s)\Delta_s\,dW_s\Big|^2,
\]
and taking the expectation and using the BDG inequality, we obtain
\[
\mathbb{E}\Big[\sup_{u\le t}|\Delta_u|^2\Big] \le 4\,\mathbb{E}\Big[\int_0^t|\sigma'(X^x_s)\Delta_s|^2\,ds\Big].
\]
Since \sigma' is bounded, we get
\[
\mathbb{E}\Big[\sup_{u\le t}|\Delta_u|^2\Big] \le C\int_0^t\mathbb{E}\Big[\sup_{u\le s}|\Delta_u|^2\Big]ds.
\]
Then, using Gronwall's Lemma, we have that \mathbb{E}\big[\sup_{u\le t}|\Delta_u|^2\big] = 0.

Correction to Exercise II.13

1. It is the derivative of X^x with respect to x, i.e. \nabla X^x_t = \lim_{\epsilon\to 0}\frac{X^{x+\epsilon}_t - X^x_t}{\epsilon}.

2. See lecture notes, Section 4.3.3.

3. We compute that
\[
\nabla X^x_{t_i} = e^{\int_0^{t_i}\big(b'(X_s) - \frac{\sigma'(X_s)^2}{2}\big)ds + \int_0^{t_i}\sigma'(X_s)\,dW_s}.
\]
One can discretise X and then the above integrals, to get
\[
\nabla X^\pi_{t_i} = e^{\sum_{j=0}^{i-1}\big(b'(X^\pi_{t_j}) - \frac{\sigma'(X^\pi_{t_j})^2}{2}\big)(t_{j+1}-t_j) + \sum_{j=0}^{i-1}\sigma'(X^\pi_{t_j})(W_{t_{j+1}}-W_{t_j})},
\]
which is positive by construction.
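
A minimal sketch (ours, with illustrative coefficients) of this positive approximation, computed alongside the Euler scheme for X on the same Brownian increments:

```python
import numpy as np

def tangent_positive(x0, b, db, sig, dsig, T, n, rng):
    """Euler scheme for X together with the positive (exponential)
    approximation of the tangent process from Question 3."""
    h = T / n
    x, log_grad = x0, 0.0
    for _ in range(n):
        dw = rng.normal(0.0, np.sqrt(h))
        # increment of the discretised integrals, evaluated at the left point
        log_grad += (db(x) - 0.5 * dsig(x)**2) * h + dsig(x) * dw
        x = x + b(x) * h + sig(x) * dw
    return x, np.exp(log_grad)          # exp(...) > 0 by construction

rng = np.random.default_rng(0)
b, db = (lambda x: -x), (lambda x: -1.0)
sig = lambda x: 0.4 * np.sqrt(1.0 + x**2)
dsig = lambda x: 0.4 * x / np.sqrt(1.0 + x**2)
print(tangent_positive(1.0, b, db, sig, dsig, 1.0, 100, rng))
```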

Correction to Exercise II.14

1. The strong convergence has a rate equal to 1/2.

2. We observe that (say b = 0)
\[
\sup_{t\in[t_i,t_{i+1}]}|\nabla X_t - \nabla X_{t_i}|^2 = \sup_{t\in[t_i,t_{i+1}]}\Big|\int_{t_i}^{t}\sigma'(X_s)\nabla X_s\,dW_s\Big|^2.
\]
Taking the expectation and applying the BDG inequality, we obtain
\[
\mathbb{E}\Big[\sup_{t\in[t_i,t_{i+1}]}|\nabla X_t - \nabla X_{t_i}|^2\Big] \le C\,\mathbb{E}\Big[\int_{t_i}^{t_{i+1}}|\sigma'(X_s)\nabla X_s|^2\,ds\Big] \le C|\pi|\,\mathbb{E}\Big[\sup_s|\nabla X_s|^2\Big].
\]
Using Lemma 4.1 in Section 4.3.1, we obtain
\[
\mathbb{E}\Big[\sup_{t\in[t_i,t_{i+1}]}|\nabla X_t - \nabla X_{t_i}|^2\Big] \le C|\pi|.
\]
3. We define \delta_t = \nabla X_t - \nabla X^\pi_t and observe
\[
\delta_t = \int_0^t\big(\sigma'(X_s) - \sigma'(X^\pi_s)\big)\nabla X_s\,dW_s + \int_0^t\sigma'(X^\pi_s)\big(\nabla X_s - \nabla X_{\bar{s}}\big)\,dW_s + \int_0^t\sigma'(X^\pi_s)\,\delta_{\bar{s}}\,dW_s.
\]
Squaring, taking the supremum and the expectation, and applying BDG, we get
\[
\mathbb{E}\Big[\sup_{s\le t}|\delta_s|^2\Big] \le C\,\mathbb{E}\Big[\int_0^T\Big(\big|(\sigma'(X_s) - \sigma'(X^\pi_s))\nabla X_s\big|^2 + \big|\sigma'(X^\pi_s)(\nabla X_s - \nabla X_{\bar{s}})\big|^2\Big)ds\Big] + C\int_0^t\mathbb{E}\Big[\sup_{u\le s}|\delta_u|^2\Big]ds,
\]
where we also used the boundedness of \sigma'. We then apply Gronwall's Lemma to get
\[
\mathbb{E}\Big[\sup_{t\le T}|\delta_t|^2\Big] \le C\,\mathbb{E}\Big[\int_0^T\Big(\big|(\sigma'(X_s) - \sigma'(X^\pi_s))\nabla X_s\big|^2 + \big|\sigma'(X^\pi_s)(\nabla X_s - \nabla X_{\bar{s}})\big|^2\Big)ds\Big],
\]
and then, using again the boundedness of \sigma',
\[
\mathbb{E}\Big[\sup_{t\le T}|\delta_t|^2\Big] \le C\,\mathbb{E}\Big[\int_0^T\Big(\big|(\sigma'(X_s) - \sigma'(X^\pi_s))\nabla X_s\big|^2 + \big|\nabla X_s - \nabla X_{\bar{s}}\big|^2\Big)ds\Big].
\]
Applying the Cauchy-Schwarz inequality,
\[
\mathbb{E}\Big[\sup_{t\le T}|\delta_t|^2\Big] \le C\,\mathbb{E}\Big[\sup_s|\nabla X_s|^4\Big]^{\frac12}\,\mathbb{E}\Big[\sup_s|\sigma'(X_s) - \sigma'(X^\pi_s)|^4\Big]^{\frac12} + C\,\mathbb{E}\Big[\int_0^T|\nabla X_s - \nabla X_{\bar{s}}|^2\,ds\Big].
\]
Using the previous estimates, we get
\[
\mathbb{E}\Big[\sup_{t\le T}|\delta_t|^2\Big] \le C|\pi|.
\]
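
For Questions 4-5 of Exercise II.14, which are not detailed above, a minimal Monte Carlo sketch of \mathbb{E}[g(X^\pi_T)\nabla X^\pi_T], with \nabla X^\pi the Euler scheme of the tangent process driven by the same increments; g and the coefficients are illustrative choices.

```python
import numpy as np

def mc_g_times_tangent(g, x0, b, db, sig, dsig, T, n, N, rng):
    """Monte Carlo estimate of E[g(X^pi_T) grad X^pi_T], where grad X^pi is
    the Euler scheme of the tangent process, driven by the same increments."""
    h = T / n
    x = np.full(N, float(x0))
    grad = np.ones(N)
    for _ in range(n):
        dw = rng.normal(0.0, np.sqrt(h), N)
        grad = grad * (1.0 + db(x) * h + dsig(x) * dw)   # tangent Euler step
        x = x + b(x) * h + sig(x) * dw                   # Euler step for X
    return np.mean(g(x) * grad)

rng = np.random.default_rng(0)
b, db = (lambda x: -x), (lambda x: -1.0)
sig = lambda x: 0.4 * np.sqrt(1.0 + x**2)
dsig = lambda x: 0.4 * x / np.sqrt(1.0 + x**2)
g = lambda x: np.maximum(x - 1.0, 0.0)                   # illustrative payoff
print(mc_g_times_tangent(g, 1.0, b, db, sig, dsig, 1.0, 100, 10**5, rng))
```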

Correction to Exercise II.15

1. The density h of X^x_T is lognormal and given by
\[
h(x, y) = \frac{1}{y\sigma\sqrt{T}}\,\varphi(\zeta(x,y)), \qquad \zeta(x,y) = \frac{\log(y/x) - (r - \frac12\sigma^2)T}{\sigma\sqrt{T}}.
\]
And we compute
\[
s(x, y) = \partial_x h(x,y)/h(x,y) = -\zeta(x,y)\,\partial_x\zeta(x,y) = \frac{\log(y/x) - (r - \frac12\sigma^2)T}{x\sigma^2 T}.
\]
2. We now see u and h as functions of \sigma and y. We know that
\[
\partial_\sigma u(\sigma, x) = \int g(y)\,s(\sigma, y)\,h(\sigma, y)\,dy,
\]
where s(\sigma, y) now denotes the score \partial_\sigma h(\sigma,y)/h(\sigma,y). We compute
\[
\partial_\sigma\zeta = -\frac{\log(y/x) - rT}{\sigma^2\sqrt{T}} + \frac{\sqrt{T}}{2}, \qquad \partial_\sigma\zeta(X^x_T) = \sqrt{T} - \frac{W_T}{\sigma\sqrt{T}}, \qquad \zeta(X^x_T) = \frac{W_T}{\sqrt{T}},
\]
\[
\partial_\sigma h = \Big(-\frac{1}{\sigma} - \zeta\,\partial_\sigma\zeta\Big)h,
\]
and then
\[
\partial_\sigma u = \mathbb{E}\Big[g(X^x_T)\Big(-\frac{1}{\sigma} - \zeta(\sigma, X^x_T)\,\partial_\sigma\zeta(\sigma, X^x_T)\Big)\Big] = \mathbb{E}\Big[g(X^x_T)\Big(-\frac{1}{\sigma} + \frac{W_T^2}{\sigma T} - W_T\Big)\Big].
\]
3. We have
\[
\partial^2_{xx}u = \int g(y)\,\partial^2_{xx}h(x,y)\,dy = \int g(y)\,\frac{\partial^2_{xx}h(x,y)}{h(x,y)}\,h(x,y)\,dy.
\]
Recall that
\[
\partial_x h = -\zeta\,\partial_x\zeta\,h, \qquad \partial^2_{xx}h = -\partial_x[\zeta\,\partial_x\zeta]\,h + (\zeta\,\partial_x\zeta)^2\,h.
\]
Making these derivatives explicit leads to the desired result.
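
A sketch of the three likelihood-ratio estimators of Exercise II.15, using exact simulation of X_T in the Black-Scholes model; the call payoff g is an illustrative choice (for such a payoff the output can be checked against the closed-form Greeks).

```python
import numpy as np

def lrm_greeks(g, x, r, sigma, T, N, rng):
    """Likelihood ratio Monte Carlo estimators of delta, vega and gamma
    in the Black-Scholes model, using the weights of Exercise II.15."""
    W = rng.normal(0.0, np.sqrt(T), N)
    XT = x * np.exp((r - 0.5 * sigma**2) * T + sigma * W)
    disc = np.exp(-r * T)
    payoff = g(XT)
    delta = disc * np.mean(payoff * W / (x * sigma * T))
    vega_w = W**2 / (sigma * T) - W - 1.0 / sigma        # common weight
    vega = disc * np.mean(payoff * vega_w)
    gamma = disc / x**2 * np.mean(payoff * vega_w / (sigma * T))
    return delta, vega, gamma

rng = np.random.default_rng(0)
g = lambda y: np.maximum(y - 1.0, 0.0)                   # illustrative call payoff
print(lrm_greeks(g, 1.0, 0.05, 0.2, 1.0, 10**6, rng))
```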

Correction to Exercise II.16

1. Heuristic: assume that \partial_\alpha X exists and then differentiate under the integrals (both for ds and dW_s)!

2. (say b = 0)

(a) The dynamics of \Gamma^\epsilon are
\[
\Gamma^\epsilon_t = -\int_0^t H_s\sigma(X^\epsilon_s)\,ds + \int_0^t \tilde{\sigma}^\epsilon_s\Gamma^\epsilon_s\,dW_s.
\]
We compute, recalling that H is bounded,
\[
\sup_{u\le t}|\Gamma^\epsilon_u|^p \le C_p\Big(\int_0^T|\sigma(X^\epsilon_s)|^p\,ds + \sup_{u\le t}\Big|\int_0^u \tilde{\sigma}^\epsilon_s\Gamma^\epsilon_s\,dW_s\Big|^p\Big).
\]
Using the BDG inequality, the boundedness of \tilde{\sigma}^\epsilon and Gronwall's Lemma, we obtain, for p \ge 2,
\[
\mathbb{E}\Big[\sup_{u\le t}|\Gamma^\epsilon_u|^p\Big] \le C_p\,\mathbb{E}\Big[\int_0^T|\sigma(X^\epsilon_s)|^p\,ds\Big].
\]
Using then classical estimates for the solution of an SDE and the fact that \epsilon \in [-1, 1], we get
\[
\mathbb{E}\Big[\sup_{u\le t}|\Gamma^\epsilon_u|^p\Big] \le C_p.
\]
(b) We observe that, setting \delta_s = \partial_\alpha X_s(0) - \Gamma^\epsilon_s,
\[
\delta_t = -\int_0^t H_s\big(\sigma(X_s) - \sigma(X^\epsilon_s)\big)\,ds + \int_0^t\big(\sigma'(X_s)\delta_s + (\sigma'(X_s) - \tilde{\sigma}^\epsilon_s)\Gamma^\epsilon_s\big)\,dW_s.
\]
Using the usual arguments (sup + expectation + BDG + Gronwall), and noting that \sigma(X_s) - \sigma(X^\epsilon_s) = -\epsilon\,\tilde{\sigma}^\epsilon_s\Gamma^\epsilon_s, we obtain
\[
\mathbb{E}\Big[\sup_{t\le T}|\delta_t|^2\Big] \le C\epsilon^2\,\mathbb{E}\Big[\Big(\int_0^T|H_s\tilde{\sigma}^\epsilon_s\Gamma^\epsilon_s|\,ds\Big)^2\Big] + C\,\mathbb{E}\Big[\int_0^T\big|(\sigma'(X_s) - \tilde{\sigma}^\epsilon_s)\Gamma^\epsilon_s\big|^2\,ds\Big].
\]
Combining the result of 2.(a) and the estimate on |\sigma'(X_s) - \tilde{\sigma}^\epsilon_s| (see lecture notes) with a Cauchy-Schwarz argument, we obtain
\[
\mathbb{E}\Big[\sup_{t\le T}|\delta_t|^2\Big] \le C\epsilon^2,
\]
proving the statement.

Correction to Exercise II.17

Correction to Exercise II.18

Proof. (r = 0) We introduce a Bermudan option with exercise payoff Z_t = g(X_t) \vee g(X^\pi_t) and we set
\[
p^a_0 = \sup_{\tau\in\mathcal{T}^{<}_{[0,T]}} \mathbb{E}[Z_\tau] = \mathbb{E}[Z_{\tau^a}].
\]
We then simply observe that
\[
|p^\pi_0 - p^b_0| \le |p^\pi_0 - p^a_0| + |p^a_0 - p^b_0|,
\]
and that
\[
p^a_0 - p^\pi_0 \le \mathbb{E}\big[Z_{\tau^a} - g(X^\pi_{\tau^a})\big] \le \mathbb{E}\big[g(X^\pi_{\tau^a}) \vee g(X_{\tau^a}) - g(X^\pi_{\tau^a})\big] \le \mathbb{E}\big[|g(X_{\tau^a}) - g(X^\pi_{\tau^a})|\big] \le C\sqrt{|\pi|}.
\]
Similarly, one computes
\[
p^\pi_0 - p^a_0 \le C\sqrt{|\pi|}.
\]
The term |p^a_0 - p^b_0| is treated using the same arguments. □

Correction to Exercise II.19

1.

2. Proof. We first compute
\[
\Theta - \hat{\Theta} = \mathbb{E}[F(X)|Y] - \mathbb{E}\big[\mathbb{E}[F(X)|Y]\,\big|\,\hat{Y}\big] + \mathbb{E}\big[F(X) - F(\hat{X})\,\big|\,\hat{Y}\big]
= \varphi_F(Y) - \mathbb{E}\big[\varphi_F(Y)\,\big|\,\hat{Y}\big] + \mathbb{E}\big[F(X) - F(\hat{X})\,\big|\,\hat{Y}\big].
\]
From the best approximation property of conditional expectation, we obtain
\[
\big\|\varphi_F(Y) - \mathbb{E}\big[\varphi_F(Y)|\hat{Y}\big]\big\|_2 \le \big\|\varphi_F(Y) - \varphi_F(\hat{Y})\big\|_2 \le [\varphi_F]\,\|Y - \hat{Y}\|_2,
\]
and from the contraction property,
\[
\big\|\mathbb{E}\big[F(X) - F(\hat{X})|\hat{Y}\big]\big\|_2 \le \big\|F(X) - F(\hat{X})\big\|_2 \le [F]\,\|X - \hat{X}\|_2.
\]

Correction to Exercise II.20

Proof. We first write down the dynamics of \Gamma:
\[
d\Gamma_t = \Gamma_t(a_t\,dt + b_t\cdot dW_t), \qquad \Gamma_0 = 1.
\]
Using Doob's inequality, we easily get that \Gamma \in \mathcal{S}^2, as b is bounded. It is also clear that there is a unique solution to (5.3): the driver f(t, y, z) = a_t y + z b_t + c_t obviously satisfies (H1), and we know that Y \in \mathcal{S}^2.

Using the product formula, we compute
\[
d(\Gamma_t Y_t) = \Gamma_t\,dY_t + Y_t\,d\Gamma_t + d\langle\Gamma, Y\rangle_t = -\Gamma_t c_t\,dt + \Gamma_t Z_t\,dW_t + \Gamma_t Y_t\,b_t\cdot dW_t,
\]
showing that \Gamma_t Y_t + \int_0^t c_r\Gamma_r\,dr is a local martingale, which is in fact a true martingale, as c \in \mathcal{H}^2 and \Gamma, Y are in \mathcal{S}^2. Then,
\[
\Gamma_t Y_t + \int_0^t c_r\Gamma_r\,dr = \mathbb{E}\Big[\Gamma_T Y_T + \int_0^T c_r\Gamma_r\,dr\,\Big|\,\mathcal{F}_t\Big],
\]
which concludes the proof. □

Correction to Exercise II.21

The proof is done using a linearisation argument. Denoting U = Y' - Y, V = Z' - Z and \zeta = \xi' - \xi, we have
\[
U_t = \zeta + \int_t^T \big(f'(r, Y'_r, Z'_r) - f(r, Y_r, Z_r)\big)\,dr - \int_t^T V_r\,dW_r.
\]
We observe that
\[
f'(r, Y'_r, Z'_r) - f(r, Y_r, Z_r) = f'(r, Y'_r, Z'_r) - f'(r, Y_r, Z'_r) + f'(r, Y_r, Z'_r) - f'(r, Y_r, Z_r) + f'(r, Y_r, Z_r) - f(r, Y_r, Z_r),
\]
where the last difference is non-negative. We introduce a and b: a is valued in \mathbb{R} and b is a d-dimensional vector. We set
\[
a_r := \frac{f'(r, Y'_r, Z'_r) - f'(r, Y_r, Z'_r)}{U_r}\,\mathbf{1}_{\{U_r \ne 0\}}.
\]
For 0 \le i \le d, we consider the vector Z^{(i)}_r whose last d - i components are those of Z'_r and whose first i components are those of Z_r. For 1 \le i \le d, we set
\[
b^i_r := \frac{f'\big(r, Y_r, Z^{(i-1)}_r\big) - f'\big(r, Y_r, Z^{(i)}_r\big)}{V^i_r}\,\mathbf{1}_{\{V^i_r \ne 0\}}.
\]
Importantly, as f' is Lipschitz, the two processes are bounded and progressively measurable. We then observe that
\[
U_t = \zeta + \int_t^T (a_r U_r + V_r\cdot b_r + c_r)\,dr - \int_t^T V_r\,dW_r,
\]
where c_r = f'(r, Y_r, Z_r) - f(r, Y_r, Z_r). By assumption, we have \zeta \ge 0 and c_r \ge 0. Using the formula given in Proposition ??, we have, for all t \in [0, T],
\[
U_t = \Gamma_t^{-1}\,\mathbb{E}\Big[\zeta\Gamma_T + \int_t^T c_r\Gamma_r\,dr\,\Big|\,\mathcal{F}_t\Big],
\]
with, for 0 \le r \le T,
\[
\Gamma_r = \exp\Big\{\int_0^r b_u\cdot dW_u - \frac12\int_0^r |b_u|^2\,du + \int_0^r a_u\,du\Big\}.
\]
Following Remark ??, we get that U_t \ge 0.

If moreover U_0 = 0, we have
\[
0 = \mathbb{E}\Big[\zeta\Gamma_T + \int_0^T c_r\Gamma_r\,dr\Big],
\]
and the random variable inside the expectation is non-negative. It is therefore equal to zero \mathbb{P}-a.s., which implies \zeta = 0 and c_r = 0, concluding the proof of the Theorem.

Correction to Exercise II.22

Proof. 1. Observe that
\[
\zeta^e_i = -\frac12\int_{t_i}^{t_{i+1}}\big(\sigma^2(X^\pi_{t_i}) - \sigma^2(X^\pi_s)\big)\,\partial^2_{xx}u(s, X^\pi_s)\,ds \tag{6.86}
\]
has already been studied in the proof of the weak error for the Euler scheme. The upper bound is obtained by observing that
\[
\big(\sigma^2(X^\pi_{t_i}) - \sigma^2(X^\pi_s)\big)\partial^2_{xx}u(s, X^\pi_s) = \underbrace{\big(\sigma^2(X^\pi_{t_i}) - \sigma^2(X^\pi_s)\big)\big(\partial^2_{xx}u(s, X^\pi_s) - \partial^2_{xx}u(t_i, X^\pi_{t_i})\big)}_{=:A_i} + \underbrace{\big(\sigma^2(X^\pi_{t_i}) - \sigma^2(X^\pi_s)\big)\partial^2_{xx}u(t_i, X^\pi_{t_i})}_{=:B_i}.
\]
We have \mathbb{E}_{t_i}[|A_i|] \le \beta|\pi| and, applying Itô's formula, we also obtain |\mathbb{E}_{t_i}[B_i]| \le \beta|\pi|, where \beta is a random variable independent of i and whose moments are bounded. This yields (5.72) straightforwardly.

2. Using Itô's formula, we obtain, for s \in [t_i, t_{i+1}],
\[
\mathbb{E}_{t_i}\big[u^{(0)}(s, X^\pi_s)\big] = u^{(0)}(t_i, X^\pi_{t_i}) + \mathbb{E}_{t_i}\Big[\int_{t_i}^{s}\Big\{\partial_t u^{(0)}(t, X^\pi_t) + \frac12\sigma^2(X^\pi_{t_i})\,\partial^2_{xx}u^{(0)}(t, X^\pi_t)\Big\}dt\Big] = u^{(0)}(t_i, X^\pi_{t_i}) + O(h).
\]
We easily deduce that (5.73) holds true.

3. We now observe that
\[
|\zeta^z_i| \le Lh\,\big|\tilde{Z}_i - V^i_{t_i}\big|, \tag{6.87}
\]
where
\[
\tilde{Z}_i = \mathbb{E}_{t_i}\Big[u^{(0)}(t_{i+1}, X^\pi_{t_{i+1}})\,\frac{\Delta W_i}{t_{i+1}-t_i}\Big].
\]
Now, setting H_i := \frac{\Delta W_i}{h_i}, we compute
\[
\mathbb{E}_{t_i}\big[u(t_{i+1}, X_{t_{i+1}})H_i\big] = \mathbb{E}_{t_i}\Big[H_i\int_{t_i}^{t_{i+1}}\big[\partial_t + \tfrac12\sigma^2(X^\pi_{t_i})\partial^2_{xx}\big]u(t, X^\pi_t)\,dt\Big] + \frac{\sigma(X^\pi_{t_i})}{h_i}\int_{t_i}^{t_{i+1}}\mathbb{E}_{t_i}\big[\partial_x u(t, X^\pi_t)\big]\,dt. \tag{6.88}
\]
Observe that
\[
\Big|\mathbb{E}_{t_i}\Big[H_i\int_{t_i}^{t_{i+1}}\big[\partial_t + \tfrac12\sigma^2(X^\pi_{t_i})\partial^2_{xx}\big]u(t, X^\pi_t)\,dt\Big]\Big| = \Big|\mathbb{E}_{t_i}\Big[H_i\int_{t_i}^{t_{i+1}}\Big\{\big[\partial_t + \tfrac12\sigma^2(X^\pi_{t_i})\partial^2_{xx}\big]u(t, X^\pi_t) - \big[\partial_t + \tfrac12\sigma^2(X^\pi_{t_i})\partial^2_{xx}\big]u(t_i, X^\pi_{t_i})\Big\}dt\Big]\Big| \le C|\pi| \tag{6.89–6.91}
\]
and that
\[
\mathbb{E}_{t_i}\big[\partial_x u(t, X^\pi_t)\big] = \mathbb{E}_{t_i}\Big[\partial_x u(t_i, X_{t_i}) + \int_{t_i}^{t}\big[\partial_t + \tfrac12\sigma^2(X_{t_i})\partial^2_{xx}\big]\partial_x u(s, X_s)\,ds\Big]. \tag{6.92}
\]
We thus get
\[
\Big|\frac{\sigma(X^\pi_{t_i})}{h_i}\int_{t_i}^{t_{i+1}}\mathbb{E}_{t_i}\big[\partial_x u(t, X^\pi_t)\big]\,dt - V^i_{t_i}\Big| \le C|\pi|. \tag{6.93}
\]
Combining the previous inequality with (6.87), we easily obtain (5.74). □
