
Deep Optimal Stopping

Vanessa Pizante

University of Toronto

March 26, 2019

Optimal Stopping Problem

X = (X_n)_{n=0}^N is a Markov process in R^d on (Ω, F, (F_n)_{n=0}^N, P).

T is the set of all X-stopping times.

Recall: τ is an X-stopping time if {τ = n} ∈ F_n, ∀n ∈ {0, 1, ..., N}.

g : {0, 1, ..., N} × R^d → R is a measurable, integrable function.

Objective
Find V = sup_{τ∈T} Eg(τ, X_τ).

Introduction

Primary Resource
Becker, S., Cheridito, P., and Jentzen, A.: Deep Optimal Stopping. (2018).

Optimal stopping problems with finitely many stopping times can be solved
exactly.
The optimal value V is given by the Snell envelope.
Difficult to approximate numerically in higher dimensions.
[Becker et al., 2018] proposes decomposing τ into a sequence of N + 1 0-1
stopping decisions.
Each decision is learned via a deep feedforward neural network.

What is a Neural Network?

A neural network is a computational system modelled loosely after the human
brain: a neuron receives input from other neurons or from an external source,
which it uses to generate output.

Each input has a weight quantifying its relative significance.
You can think of a neuron's output as its relative firing rate.

(Figure: diagram of a biological neuron. Source: [Hijazi et al., 2015])

Feedforward Neural Network

Input Layer: x = (x_1, x_2, x_3).

Hidden Layer: Takes x and outputs y = (y_1, y_2, y_3), where y_i = f_y(w_i · x + b_i)
and w_i = (w_{i1}, w_{i2}, w_{i3}) for i = 1, 2, 3.

Output Layer: Takes y and outputs z_1 = f_z(w_z · y + b_z), where
w_z = (w_{z1}, w_{z2}, w_{z3}).

(Figure: a 3-3-1 network diagram; inputs x_1, x_2, x_3 feed hidden nodes
y_1, y_2, y_3, which feed the output z_1.)

f_y and f_z are called activation functions.

Training a Neural Network

The training data can consist of input-target pairs (x^m, z_1^m)_{m=1}^M or just
inputs (x^m)_{m=1}^M.

Training the network refers to setting the weight w_i and bias b_i terms using
the training data.
θ = {w_1, w_2, w_3, w_z, b_1, b_2, b_3, b_z} is the set of model parameters.

Training is done by minimizing a loss function, C, w.r.t. θ.

e.g. the mean squared-error loss function

    C(θ) = (1/2M) Σ_{m=1}^M ‖z_1^θ(x^m) − z^m‖²
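As a minimal PyTorch sketch (the tensor names z_pred and z, holding the network
outputs z_1^θ(x^m) and the targets z^m as rows, are assumptions):

import torch

# Mean squared-error loss C(θ) above, including the 1/(2M) factor:
# average the squared norms over the M training pairs, then halve.
def mse_loss(z_pred, z):
    return ((z_pred - z) ** 2).sum(dim=-1).mean() / 2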

Training a Neural Network

The optimal parameters are found by minimizing the loss function via a
gradient descent algorithm.
e.g. vanilla gradient descent with learning rate η has the update step:

    θ_{t+1} ← θ_t − η · ∇_θ C(θ_t)

∇_θ C(θ_t) is computed using backpropagation:

A method for efficiently moving backwards through the neural network graph to
compute each ∂C/∂θ_i via the chain rule.
A special case of reverse-mode autodifferentiation.

After training, we run the algorithm on a new dataset: the testing data.

Note: Minimizing a loss function via gradient descent is equivalent to
maximizing a reward function via gradient ascent.
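A minimal sketch of one such update step in PyTorch, using a toy loss
C(θ) = ‖θ‖² purely for illustration (the loss itself is an assumption):

import torch

theta = torch.randn(3, requires_grad=True)
eta = 0.1                       # learning rate η

C = (theta ** 2).sum()          # loss C(θ)
C.backward()                    # backpropagation fills theta.grad with ∇_θ C(θ)
with torch.no_grad():
    theta -= eta * theta.grad   # θ_{t+1} ← θ_t − η · ∇_θ C(θ_t)
theta.grad.zero_()              # reset gradients before the next step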

Reinforcement Learning (Optional)

Problem set-up:

Agent receives reward r_t = r(s_t, a_t) depending on its action a_t and the state
s_t.
Environment is a Markov decision process with unknown transition
probabilities p(s_{t+1}|s_t, a_t).

REINFORCE:

The agent aims to learn a policy π_θ(s_t, a_t) governing how it interacts with its
environment so as to maximize some cumulative reward function.
Samples different rollouts ρ = (s_1, a_1, ..., s_T, a_T) to obtain a policy under
which rollouts corresponding to a high reward are more likely to be chosen.

Model-free approach to policy iteration.

Q-Learning (Optional)

Q-function (a.k.a. action-value function): expected rewards (discounted by
factor γ) if you take action a and then follow your policy:

    Q^π(s, a) = E[ Σ_{i=0}^∞ γ^i r_{t+i} | s_t = s, a_t = a ]

Q-learning is an iterative algorithm based on the recursive formula for Q
derived via Bellman's equation:

    Q(s_t, a_t) ← Q(s_t, a_t) + α( r(s_t, a_t) + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) )

Note r(s_t, a_t) + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) is the Bellman error
⟹ Q-learning is equivalent to minimizing the Bellman error.
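A minimal sketch of the tabular update above (Q is an n_states × n_actions
array; the transition (s, a, r, s_next) is an assumed sample from the
environment):

import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    bellman_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * bellman_error   # step that shrinks the Bellman error
    return Q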

Optimal Stopping and Q-Learning (Optional)

Yu, H. and Bertsekas, D.P.: Q-learning Algorithms for Optimal Stopping Based on Least
Squares. (2007).

Given current state x_t we have two options:

Stop and incur cost f(x_t).
Continue and incur cost h(x_t, x_{t+1}).

Let Q*(x_t) = min_a Q(x_t, a), where a is either to stop or to continue.

Optimal policy is to stop as soon as (X_t) enters the set D = {x_t | f(x_t) ≤ Q*(x_t)}.

Optimal Stopping and Q-Learning (Optional)

Consider approximations for Q that use a linear projection of transformed x_t
[Yu and Bertsekas, 2007]:

    Q*(x_t) ≈ φ(x_t)'β_t

The corresponding projected Bellman error is then minimized w.r.t. β_t via a
least-squares approximation based on the Q-learning algorithm as follows:

    β_{t+1} = β_t + α(β̂_{t+1} − β_t)

where

    β̂_t = argmin_β Σ_{k=0}^t ( φ(x_k)'β − h(x_k, x_{k+1}) − γ min{f(x_{k+1}), φ(x_{k+1})'β_t} )²
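A rough NumPy sketch of one such iteration, under stated assumptions: phi, f, h
and the sampled trajectory xs are all hypothetical inputs, not part of the
original algorithm's code:

import numpy as np

def ls_q_step(beta, alpha, phi, f, h, xs, gamma=0.95):
    Phi = np.stack([phi(x) for x in xs[:-1]])            # rows φ(x_k)'
    target = np.array([h(xs[k], xs[k + 1])
                       + gamma * min(f(xs[k + 1]), phi(xs[k + 1]) @ beta)
                       for k in range(len(xs) - 1)])
    beta_hat, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # β̂ via least squares
    return beta + alpha * (beta_hat - beta)                  # β ← β + α(β̂ − β)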

Least-Squares Monte Carlo (LSMC)

Longstaff, F. and Schwartz, E.: Valuing American Options by Simulation: A Simple
Least-Squares Approach, (2001).

Simulate M stock price paths (x_n^m)_{n=0}^N.

Set V_N^m = g(N, x_N^m), ∀m, and then for n = N−1, ..., 0:

Approximate the continuation value q_n via q̂_n(x) = Σ_{k=0}^K β_k x^k, where β
minimizes the least-squares error

    Σ_{m=1}^M ( e^{−rT/N} V_{n+1}^m − q̂_n(x_n^m) )²

Set V_n^m = { g(n, x_n^m)            if q̂_n(x_n^m) < g(n, x_n^m)
            { e^{−rT/N} V_{n+1}^m    otherwise
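A minimal NumPy sketch of this recursion (simplified: it regresses on all paths
rather than only in-the-money paths, and assumes no exercise at time 0; the
path array x and payoff function g are assumed inputs):

import numpy as np

def lsmc(x, g, r, T, K=3):
    M, N1 = x.shape                           # x[m, n] = x_n^m, N1 = N + 1
    N = N1 - 1
    disc = np.exp(-r * T / N)                 # one-step discount factor
    V = g(N, x[:, N])                         # V_N^m = g(N, x_N^m)
    for n in range(N - 1, 0, -1):
        beta = np.polyfit(x[:, n], disc * V, K)   # least-squares fit of q̂_n
        q = np.polyval(beta, x[:, n])             # continuation value q̂_n(x_n^m)
        exercise = g(n, x[:, n])
        V = np.where(q < exercise, exercise, disc * V)
    return disc * V.mean()                    # Monte Carlo estimate at time 0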

Least-Squares Neural Network Approach

Kohler, Krzyżak and Todorovic: Pricing of High-Dimensional American Options by
Neural Networks, (2010).

Neural network applied to the Longstaff-Schwartz algorithm to price
higher-dimensional American options.
Replaces the least-squares regression with a neural network of the form:

    q^{θ_n} = a_2^{θ_n} ∘ σ ∘ a_1^{θ_n}

where

a_1 : R^d → R^K and a_2 : R^K → R, with a_i(x) = W_i x + b_i and Σ_{k=0}^K |w_{2,k}| ≤ C_n.
σ : R^K → [0, 1]^K with σ_k(y_k) = 1/(1 + e^{−y_k}).

The algorithm recursively (backwards in time) creates training data
(x_n^m, e^{−rt_n} · V_{n+1}^m) and uses a mean squared-error loss function.

Extension: More Complex Neural Networks
Hu, R.: Deep Learning for Ranking Response Surfaces with Applications to Optimal
Stopping Problems, (2019).

Figure 1: Left: feed-forward NN, similar to the NN applied in [Becker et al., 2018].
Right: UNET, applied in [Hu, 2019].

Source: [Hu, 2019]

Ranking Response Surfaces

The optimal stopping problem can be reframed as an image classification problem
via surface ranking.

Surface ranking problem: extract the index of the minimal surface for each
input x and treat it as a class label.
For optimal stopping, the labels are just stop or continue (at each time step).

Can now use convolutional NNs (e.g. UNET), which can be more
computationally efficient [Hu, 2019].

Disadvantage: convergence theory has yet to be fully developed for fully
convolutional NNs.

Expressing Stopping Times as a Series of 0-1 Decisions

Goal
Show the optimal decision to stop (X_n)_{n=0}^N can be made according to
{f_n(X_n)}_{n=0}^N, where f_n : R^d → {0, 1}.

Write the time-n stopping time, τ_n, as a function of {f_k(X_k)}_{k=n}^N.

Let T_n be the set of all X-stopping times τ s.t. n ≤ τ ≤ N.

Clearly T_N = {N}, so let τ_N = N · f_N(X_N) where f_N ≡ 1.

Now, for n = N−1, ..., 0, we can define:

    τ_n = Σ_{k=n}^N k f_k(X_k) Π_{j=n}^{k−1} (1 − f_j(X_j)) ∈ T_n    (1)
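A minimal NumPy sketch of (1): the product vanishes as soon as an earlier
decision has fired, so τ_n is simply the first time k ≥ n with f_k = 1
(the decision array f is an assumed input, with f[N] ≡ 1):

import numpy as np

def stopping_times(f, n=0):
    # f: array of shape (N+1, M) with f[k, m] = f_k(x_k^m) ∈ {0, 1};
    # argmax returns the first index where f = 1 along each path
    return n + np.argmax(f[n:], axis=0)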

Theorem 1
For n ∈ {0, ..., N − 1}, let τ_{n+1} ∈ T_{n+1} be of the form:

    τ_{n+1} = Σ_{k=n+1}^N k f_k(X_k) Π_{j=n+1}^{k−1} (1 − f_j(X_j))    (2)

Then there exists a measurable function f_n : R^d → {0, 1} such that τ_n ∈ T_n
satisfies

    Eg(τ_n, X_{τ_n}) ≥ V_n − (V_{n+1} − Eg(τ_{n+1}, X_{τ_{n+1}}))    (3)

where V_n and V_{n+1} satisfy

    V_n = sup_{τ∈T_n} Eg(τ, X_τ)    (4)

Sketch of Theorem 1 Proof

Fix a stopping time τ ∈ T_n and let ε = V_{n+1} − Eg(τ_{n+1}, X_{τ_{n+1}}).

Show g(τ_{n+1}, X_{τ_{n+1}}) is measurable using (2).
⟹ ∃ h_n measurable s.t. h_n(X_n) = E[g(τ_{n+1}, X_{τ_{n+1}}) | X_n] (by the Markov property).

Define D = {g(n, X_n) ≥ h_n(X_n)}, E = {τ = n} ∈ F_n and:

    τ_n = n·I_D + τ_{n+1}·I_{D^c} ∈ T_n
    τ̃ = τ_{n+1}·I_E + τ·I_{E^c} ∈ T_{n+1}

So Eg(τ_{n+1}, X_{τ_{n+1}}) ≥ Eg(τ̃, X_τ̃) − ε ⟹ E[g(τ_{n+1}, X_{τ_{n+1}})I_{E^c}] ≥ E[g(τ, X_τ)I_{E^c}] − ε.

Exercise: Show Eg(τ_n, X_{τ_n}) ≥ Eg(τ, X_τ) − ε.

Since τ is arbitrary, we have Eg(τ_n, X_{τ_n}) ≥ V_n − ε, satisfying (3).

The proof is completed by showing τ_n here satisfies (1).

What have we proven?

Theorem 1 shows that τ_n from (1) can adequately be used to compute V_n.

Meaning τ_n is the time-n optimal stopping time.
Follows from (3) due to the backwards recursive way τ_n is computed.

The optimal stopping time corresponding to V = sup_{τ∈T} Eg(τ, X_τ) is:

    τ = Σ_{n=1}^N n f_n(X_n) Π_{k=0}^{n−1} (1 − f_k(X_k))

...But how do we find the sequence {f_n}_{n=0}^N ?

Introducing a Neural Network

Using f_N ≡ 1, we construct a sequence of neural networks, f^{θ_n} : R^d → {0, 1},
to approximate f_n for n = N−1, ..., 0.
We can then approximate τ_{n+1} via

    Σ_{k=n+1}^N k · f^{θ_k}(X_k) Π_{j=n+1}^{k−1} (1 − f^{θ_j}(X_j))    (5)

Problem:

f^{θ_n} produces binary output ⟹ it is not continuous w.r.t. θ.
Need continuous output to train θ_n via a gradient-based optimization
algorithm.

Neural Network with Continuous Output

Introduce F^θ : R^d → (0, 1), a two-layer, feed-forward neural network of the form

    F^θ = ψ ∘ a_3^θ ∘ φ_{q_2} ∘ a_2^θ ∘ φ_{q_1} ∘ a_1^θ    (6)

where

q_1 and q_2 are the numbers of nodes in the hidden layers.
a_1^θ : R^d → R^{q_1}, a_2^θ : R^{q_1} → R^{q_2}, a_3^θ : R^{q_2} → R satisfy

    a_i^θ(x) = W_i x + b_i

φ_{q_i} : R^{q_i} → R^{q_i} are ReLU activation functions, φ_{q_i}(x_1, ..., x_{q_i}) = (x_1^+, ..., x_{q_i}^+).
ψ : R → R is the logistic sigmoid function, ψ(x) = 1/(1 + e^{−x}).

Return to Binary Decisions

Parameters are θ = {(W_i, b_i)}_{i=1}^3 ∈ R^q, where q = q_1(d + q_2 + 1) + 2q_2 + 1.

Can now use F^θ to find the optimal θ_n via gradient descent for n = N−1, ..., 0.
Afterwards, we can compute f^{θ_n} : R^d → {0, 1} using:

    f^{θ_n} = I_{[0,∞)} ∘ a_3^{θ_n} ∘ φ_{q_2} ∘ a_2^{θ_n} ∘ φ_{q_1} ∘ a_1^{θ_n}    (7)

Note [Becker et al., 2018] does not provide a formal proof that using F^θ to
optimize θ will provide optimal parameters for f^θ.
However, it does make sense intuitively when we consider F^{θ_n}(X_n) to be the
probability that stopping is the optimal decision at time n (given X_n).
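A minimal PyTorch sketch of (7): since ψ is strictly increasing with ψ(0) = 1/2,
applying I_{[0,∞)} before ψ is the same as thresholding F^θ at 1/2, so the hard
decision can be read off the trained soft network (model here is assumed to be
a NeuralNet as built on the PyTorch slide later):

import torch

def hard_decision(model, x):
    with torch.no_grad():
        return (model(x) >= 0.5).float()   # f^θ(x) ∈ {0, 1}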

Selecting a Reward Function

Main Idea
Want to define a reward function that yields a stopping decision at time n that
maximizes our expected future payoff.

If at time n we:
Stop ⟹ f_n(X_n) = 1 and we will receive payoff g(n, X_n).
Continue and thereafter proceed optimally ⟹ f_n(X_n) = 0 and we will
eventually receive payoff g(τ_{n+1}, X_{τ_{n+1}}).

So we want our reward function at time n to approximate

    sup_{f∈D} E[g(n, X_n)f(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − f(X_n))]    (8)

where D is the set of all measurable f : R^d → {0, 1}.

...But can we even replace f in (8) with the neural network f^θ?

Theorem 2
Let n ∈ {0, ..., N − 1} and fix a stopping time τ_{n+1} ∈ T_{n+1}. Then ∀ε > 0, there
exist q_1, q_2 ∈ N+ such that

    sup_{θ∈R^q} E[g(n, X_n)f^θ(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − f^θ(X_n))]
        ≥ sup_{f∈D} E[g(n, X_n)f(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − f(X_n))] − ε    (9)

where D is the set of all measurable functions f : R^d → {0, 1}.

Corollary 1
For any ε > 0, there exist q_1, q_2 ∈ N+ and neural network functions of the form
(7) such that f^{θ_N} ≡ 1 and the stopping time

    τ̂ = Σ_{n=1}^N n f^{θ_n}(X_n) Π_{k=0}^{n−1} (1 − f^{θ_k}(X_k))    (10)

satisfies Eg(τ̂, X_τ̂) ≥ sup_{τ∈T} Eg(τ, X_τ) − ε.

This means we can approximate the sequence of optimal stopping decisions
{f_n}_{n=0}^N with the sequence of optimized neural networks {f^{θ_n}}_{n=0}^N.

Sketch of Theorem 2 Proof (Optional)

By integrability of g, ∃ f̃ : R^d → {0, 1} measurable s.t.

    E[g(n, X_n)f̃(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − f̃(X_n))]
        ≥ sup_{f∈D} E[g(n, X_n)f(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − f(X_n))] − ε/4    (11)

Let f̃ = I_A, A = {x ∈ R^d | f̃(x) = 1}, and note

    B ↦ E[|g(n, X_n)|I_B(X_n)] and B ↦ E[|g(τ_{n+1}, X_{τ_{n+1}})|I_B(X_n)]

are finite Borel measures on R^d.

So ∃ K ⊆ A compact s.t.

    E[g(n, X_n)I_K(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − I_K(X_n))]
        ≥ E[g(n, X_n)f̃(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − f̃(X_n))] − ε/4    (12)

Sketch of Theorem 2 Proof Cont. (Optional)
Let ρ_K(x) = inf_{y∈K} ‖x − y‖_2 and note that k_j(x) = max{1 − j·ρ_K(x), −1},
j ∈ N, converges pointwise to I_K − I_{K^c}. So by the DCT:

    E[g(n, X_n)I_{k_j(X_n)≥0} + g(τ_{n+1}, X_{τ_{n+1}})(1 − I_{k_j(X_n)≥0})]
        ≥ E[g(n, X_n)I_K(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − I_K(X_n))] − ε/4    (13)

By [Leshno et al., 1993], k_j can be approximated uniformly on compact sets
by h(x) = Σ_{i=1}^r (v_i^T x + c_i)^+ − Σ_{i=1}^s (w_i^T x + d_i)^+, and thus:

    E[g(n, X_n)I_{h(X_n)≥0} + g(τ_{n+1}, X_{τ_{n+1}})(1 − I_{h(X_n)≥0})]
        ≥ E[g(n, X_n)I_{k_j(X_n)≥0} + g(τ_{n+1}, X_{τ_{n+1}})(1 − I_{k_j(X_n)≥0})] − ε/4    (14)

By considering I_{[0,∞)} ∘ h as an NN of the form f^θ, eq. (14) becomes:

    E[g(n, X_n)f^θ(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − f^θ(X_n))]
        ≥ E[g(n, X_n)I_{k_j(X_n)≥0} + g(τ_{n+1}, X_{τ_{n+1}})(1 − I_{k_j(X_n)≥0})] − ε/4    (15)

Combining (11), (12), (13) and (15) we obtain (9), as required.


Computing Estimates via the Neural Network

Simulate M independent paths of the Markov process, (x_n^m)_{n=0}^N.

θ_N is chosen so f^{θ_N} ≡ 1, and for n = N−1, ..., 0:

Use θ_{n+1}, ..., θ_N to compute τ̄_{n+1}^m along each of the M paths via

    τ̄_{n+1}^m = Σ_{k=n+1}^N k f^{θ_k}(x_k^m) Π_{j=n+1}^{k−1} (1 − f^{θ_j}(x_j^m))

At time n along path m, if we stop w.p. F^{θ_n}(x_n^m) and otherwise adhere to
{f^{θ_k}(x_k^m)}_{k=n+1}^N, the expected realized reward is

    r_n^m(θ_n) = g(n, x_n^m)F^{θ_n}(x_n^m) + g(τ̄_{n+1}^m, x^m_{τ̄_{n+1}^m})(1 − F^{θ_n}(x_n^m))

For sufficiently large M,

    (1/M) Σ_{m=1}^M r_n^m(θ_n)    (16)

approximates E[g(n, X_n)F^{θ_n}(X_n) + g(τ_{n+1}, X_{τ_{n+1}})(1 − F^{θ_n}(X_n))].

Computing Estimates via the Neural Network

Note (16) acts as the reward function we want to maximize w.r.t. θ_n via a
gradient ascent algorithm.

Next, translate F^{θ_n}(x_n^m) into 0-1 stopping decisions, f^{θ_n}(x_n^m).

Now generate our testing sample paths (y_n^m)_{n=0}^N for m = 1, ..., M.
For n = N−1, ..., 0, using the {θ_n} found during training, compute

    τ̃_{n+1}^m = Σ_{k=n+1}^N k f^{θ_k}(y_k^m) Π_{j=n+1}^{k−1} (1 − f^{θ_j}(y_j^m))

along each of the M sample paths.

Computing Estimates via the Neural Network
Overall optimal stopping time (estimate of τ̂ from (10)):

    τ̃^m = Σ_{n=1}^N n f^{θ_n}(y_n^m) Π_{k=0}^{n−1} (1 − f^{θ_k}(y_k^m))

Corresponding Monte Carlo estimate of V = Eg(τ̂, X_τ̂):

    V̂ = (1/M) Σ_{m=1}^M g(τ̃^m, y^m_{τ̃^m})    (17)

By the CLT, a 1 − α (α ∈ (0, 1)) confidence interval for V is:

    ( V̂ − z_{α/2} σ̂/√M , V̂ + z_{α/2} σ̂/√M )

where z_{α/2} is the 1 − α/2 quantile of N(0, 1) and

    σ̂² = (1/(M−1)) Σ_{m=1}^M ( g(τ̃^m, y^m_{τ̃^m}) − V̂ )²
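A minimal sketch of (17) and the interval (the array payoffs, with
payoffs[m] = g(τ̃^m, y^m_{τ̃^m}), is an assumed input):

import numpy as np
from scipy.stats import norm

def mc_estimate(payoffs, alpha=0.05):
    M = len(payoffs)
    V_hat = payoffs.mean()                 # Monte Carlo estimate V̂
    sigma_hat = payoffs.std(ddof=1)        # matches the 1/(M−1) estimator σ̂
    z = norm.ppf(1 - alpha / 2)            # 1 − α/2 quantile of N(0, 1)
    half_width = z * sigma_hat / np.sqrt(M)
    return V_hat, (V_hat - half_width, V_hat + half_width)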

Application: Bermudan Max Call Option

Example
Bermudan max-call option expiring at time T with strike price K written on d
assets, X^1, ..., X^d, with N + 1 equidistant exercise times
t_n = nT/N, n = 0, 1, ..., N.

Its payoff at time t is ( max_{i∈{1,...,d}} X_t^i − K )^+.

So its price is sup_τ E[ e^{−rτ} ( max_{i∈{1,...,d}} X_τ^i − K )^+ ].

Frame this as an optimal stopping problem sup_{τ∈T} Eg(τ, X_τ) where

    g(n, x) = e^{−rt_n} ( max_{i∈{1,...,d}} x_i − K )^+    (18)
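A minimal NumPy sketch of (18), with the defaults taken from the parameter
values listed on the next slide (the array layout, with the last axis indexing
the d assets, is an assumption):

import numpy as np

def g(n, x, r=0.05, K=100.0, T=3.0, N=9):
    # discounted payoff e^{−r t_n} (max_i x_i − K)^+ with t_n = nT/N
    return np.exp(-r * n * T / N) * np.maximum(x.max(axis=-1) - K, 0.0)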

Application: Bermudan Max Call Option

Assume a Black-Scholes market model with d uncorrelated assets.

For i = 1, ..., d set:

    x_0^i = 90, K = 100, σ_i = 0.2, δ_i = 0.1, r = 0.05, T = 3, N = 9

Asset price paths can be simulated via

    x_{n,i}^m = x_{0,i} · exp( Σ_{k=1}^n [ (r − δ_i − σ_i²/2)Δt + σ_i √Δt · Z_{k,i}^m ] )    (19)

where Δt = T/N and the Z_{k,i}^m ∼ N(0, 1) are i.i.d.
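A minimal NumPy sketch of (19), returning an array of shape (M, N+1, d); the
vectorized layout is an assumption:

import numpy as np

def simulate_paths(M, N, d, x0=90.0, r=0.05, delta=0.1, sigma=0.2, T=3.0):
    dt = T / N
    Z = np.random.randn(M, N, d)                        # Z_{k,i}^m ~ N(0, 1)
    incr = (r - delta - sigma**2 / 2) * dt + sigma * np.sqrt(dt) * Z
    log_x = np.concatenate([np.zeros((M, 1, d)), incr.cumsum(axis=1)], axis=1)
    return x0 * np.exp(log_x)                           # x_{n,i}^m paths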

Building Neural Network using PyTorch

Constructing a neural network of the form F^θ from (6) is simple using PyTorch!

import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self, d, q1, q2):
        super(NeuralNet, self).__init__()
        self.a1 = nn.Linear(d, q1)       # a_1: R^d -> R^q1
        self.relu = nn.ReLU()            # phi: componentwise ReLU
        self.a2 = nn.Linear(q1, q2)      # a_2: R^q1 -> R^q2
        self.a3 = nn.Linear(q2, 1)       # a_3: R^q2 -> R
        self.sigmoid = nn.Sigmoid()      # psi: logistic sigmoid

    def forward(self, x):
        out = self.a1(x)
        out = self.relu(out)
        out = self.a2(out)
        out = self.relu(out)
        out = self.a3(out)
        out = self.sigmoid(out)
        return out

Training the Network in PyTorch

import torch

def loss(y_pred, s, x, n, tau):
    # Negative of the Monte Carlo reward (16): minimizing this loss with a
    # gradient-descent optimizer maximizes the expected reward.
    r_n = torch.zeros(s.M)
    for m in range(s.M):
        r_n[m] = -s.g(n, m, x) * y_pred[m] - s.g(tau[m], m, x) * (1 - y_pred[m])
    return r_n.mean()

def NN(n, x, s, tau_n_plus_1):
    epochs = 50
    model = NeuralNet(s.d, s.d + 40, s.d + 40)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

    for epoch in range(epochs):
        F = model(x[n])                  # soft stopping probabilities F^{θ_n}
        optimizer.zero_grad()
        criterion = loss(F, s, x, n, tau_n_plus_1)
        criterion.backward()
        optimizer.step()

    return F, model

s.g is g, as defined in equation (18).

Adaptive Moment Estimation (Adam) Optimization

Input: α, r(θ), θ_0
 1  Set β_1 = 0.9; β_2 = 0.99; ε = 10^−8
 2  m_0 = v_0 = 0; t = 0
 3  while r(θ_t) not minimized do
 4      t = t + 1
 5      G_t = ∇_θ r(θ_{t−1})
 6      m_t = β_1 · m_{t−1} + (1 − β_1) · G_t
 7      v_t = β_2 · v_{t−1} + (1 − β_2) · (G_t)²
 8      m̂_t = m_t / (1 − (β_1)^t)
 9      v̂_t = v_t / (1 − (β_2)^t)
10      θ_t = θ_{t−1} − α · m̂_t / (√v̂_t + ε)
Output: θ_t

Adaptive: performs smaller updates (lower learning rate) for frequently
occurring features.
Moment Estimation: stores exponentially decaying averages of the gradient's
mean (m_t) and uncentered variance (v_t).
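A direct NumPy transcription of the pseudocode above, run for a fixed number of
steps rather than to convergence (grad_fn, returning ∇_θ r(θ), and the initial
theta are assumed inputs):

import numpy as np

def adam(grad_fn, theta, alpha=0.001, beta1=0.9, beta2=0.99, eps=1e-8, steps=1000):
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        G = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * G          # decaying mean of gradients
        v = beta2 * v + (1 - beta2) * G**2       # decaying uncentered variance
        m_hat = m / (1 - beta1**t)               # bias correction
        v_hat = v / (1 - beta2**t)
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta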

Results

 d    Actual   Estimate   Standard Error   95% Confidence Interval
 2     8.04      7.50          0.23             (7.05, 7.95)
 5    16.64     16.80          0.32             (16.18, 17.43)
10    26.20     23.83          0.29             (23.25, 24.40)

Table 1: Tabulated estimates for V. Results computed using M = 5000 sample paths.

Note: The actual value for d = 2 is from [Hu, 2019]; those for d = 5, 10 are from
[Becker et al., 2018].

Optimally Stopping a Fractional Brownian Motion
(Optional)

Fractional Brownian motion with Hurst parameter H is a Gaussian process
with mean zero and covariance structure

    E[W_t^H W_s^H] = (1/2)( t^{2H} + s^{2H} − |t − s|^{2H} )    (20)

Standard Brownian motion is fractional Brownian motion with H = 1/2.

Objective
We want to evaluate sup_{0≤τ≤1} E W_τ^H.

Challenge of Optimally Stopping FBM (Optional)

Optional Stopping Theorem (OST)
Let (X_t)_{t≥0} be a martingale and τ a stopping time with respect to the
filtration (F_t)_{t≥0}. Then if either of the following holds:
τ is bounded
X_{τ∧t} is bounded
it follows that EX_τ = EX_0.

So by the OST, EW_τ^{1/2} = 0 for any stopping time τ s.t. W_{τ∧t}^{1/2} is bounded above
by a constant.

However, if H ≠ 1/2, W^H is not a martingale, so the OST does not apply.

Optimally Stopping a Fractional Brownian Motion
(Optional)

Discretize the time interval [0, 1] into time steps t_n = n/100.
Create a 100-dimensional Markov process, (X_n)_{n=0}^{100}, to describe (W_{t_n}^H)_{n=0}^{100}:

    X_0 = (0, 0, ..., 0)
    X_1 = (W_{t_1}^H, 0, ..., 0)
    X_2 = (W_{t_2}^H, W_{t_1}^H, ..., 0)
    ...
    X_100 = (W_{t_100}^H, W_{t_99}^H, ..., W_{t_1}^H)

Letting g : R^100 → R be g(x_1, ..., x_100) = x_1, we now aim to compute

    sup_{τ∈T} Eg(X_τ)    (21)

where T is the set of all X-stopping times.

Optimally Stopping a Fractional Brownian Motion
(Optional)

[Becker et al., 2018] simulated (X_n)_{n=0}^{100} by defining Y_n^H = W_{t_n}^H − W_{t_{n−1}}^H for
each n = 1, ..., 100, which forms a stationary Gaussian process with
autocovariance

    E[Y_n^H Y_{n+k}^H] = ( |k + 1|^{2H} − 2|k|^{2H} + |k − 1|^{2H} ) / ( 2 · 100^{2H} )

Sample paths of (X_n) are then computed via W_{t_n}^H = Σ_{k=1}^n Y_k^H.

Then train neural networks of the form (7) with d = 100, q_1 = 110 and
q_2 = 55 and approximate (21) via Monte Carlo.
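A minimal NumPy sketch: draw the stationary increments Y^H via a Cholesky
factor of the autocovariance above, then cumulate to get W^H paths (the
sampling method is a simple stand-in, an assumption rather than the exact
scheme used by [Becker et al., 2018]):

import numpy as np

def simulate_fbm(M, N=100, H=0.7):
    k = np.arange(N)
    gamma = (np.abs(k + 1)**(2 * H) - 2 * np.abs(k)**(2 * H)
             + np.abs(k - 1)**(2 * H)) / (2 * N**(2 * H))
    cov = gamma[np.abs(k[:, None] - k[None, :])]    # Toeplitz covariance of Y^H
    L = np.linalg.cholesky(cov)
    Y = np.random.randn(M, N) @ L.T                 # correlated Gaussian increments
    return np.concatenate([np.zeros((M, 1)), Y.cumsum(axis=1)], axis=1)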

Optimally Stopping a Fractional Brownian Motion
(Optional)

Figure 2: Results of [Becker et al., 2018]. Expected value of optimally stopped FBM
w.r.t. its Hurst parameter H.

Source: [Becker et al., 2018]

Concluding Remarks

Neural networks (and machine learning algorithms in general) can aid in
solving the optimal stopping problem in higher dimensions.
New area of research:
The neural network employed by [Becker et al., 2018] has one of the simplest
architectures.
More complex architectures lack a fully developed convergence theory
[Hu, 2019].
Questions for further research:
Can we strategically select the number of nodes per hidden layer and/or the
number of epochs for this algorithm?
What other neural network architectures might be worth trying?
What other problems in mathematical finance can be approximated in higher
dimensions by applying neural networks?

References I

Becker, S., Cheridito, P., and Jentzen, A. (2018). Deep optimal stopping.
Technical report.

Hijazi, S., Kumar, R., and Rowen, C. (2015). Using convolutional neural networks
for image recognition.

Hu, R. (2019). Deep learning for ranking response surfaces with applications to
optimal stopping problems. arXiv e-prints, arXiv:1901.03478.

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. CoRR.

Kohler, M., Krzyżak, A., and Todorovic, N. (2010). Pricing of high-dimensional
American options by neural networks. Mathematical Finance, 20(3):383–410.

References II

Leshno, M., Lin, V. Y., Pinkus, A., and Schocken, S. (1993). Multilayer feedforward
networks with a nonpolynomial activation function can approximate any function.
Neural Networks, 6:861–867.

Yu, H. and Bertsekas, D. P. (2007). Q-learning algorithms for optimal stopping
based on least squares. 2007 European Control Conference (ECC), pages 2368–2375.

