Deep Optimal Stopping
Vanessa Pizante
University of Toronto
March 26, 2019
Optimal Stopping Problem
$X = (X_n)_{n=0}^N$ is a Markov process in $\mathbb{R}^d$ on $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n=0}^N, \mathbb{P})$.
Objective
Find $V = \sup_{\tau \in \mathcal{T}} \mathbb{E}\, g(\tau, X_\tau)$.
Introduction
Primary Resource
Becker, S., Cheridito, P., and Jentzen, A.: Deep Optimal Stopping. (2018).
Optimal stopping problems with finitely many stopping times can be solved
exactly.
Optimal V given by Snell Envelope.
Difficult to approximate numerically in higher dimensions.
[Becker et al., 2018] proposes decomposing τ into a sequence of N + 1 0-1
stopping decisions.
Each decision is learned via a deep feedforward neural network.
What is a Neural Network?
Feedforward Neural Network
Training a Neural Network
The training data can consist of input and target pairs $(x^m, z_1^m)_{m=1}^M$ or just inputs $(x^m)_{m=1}^M$.
Training the network refers to setting the weight $w_i$ and bias $b_i$ terms using the training data.
$\theta = \{w_1, w_2, w_3, w_z, b_1, b_2, b_3, b_z\}$ is the set of model parameters.
Training a Neural Network
The optimal parameters are found by minimizing the loss function via a
gradient descent algorithm.
e.g., the vanilla gradient descent update step with learning rate $\eta$:
$\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta C(\theta_t)$
After training, we run the algorithm on a new dataset: the testing data.
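As a concrete illustration (my own sketch, not from the slides), one vanilla gradient descent step can be written in PyTorch as follows; the model, loss function, and learning rate are placeholders.

import torch

def gradient_descent_step(model, loss_fn, x, z, lr=0.01):
    # One vanilla update: theta_{t+1} = theta_t - eta * grad_theta C(theta_t)
    loss = loss_fn(model(x), z)          # C(theta) evaluated on a batch (x, z)
    model.zero_grad()                    # clear gradients from the previous step
    loss.backward()                      # compute grad_theta C(theta)
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad             # in-place parameter update
    return loss.item()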
Reinforcement Learning (Optional)
Problem set-up:
Agent receives reward $r_t = r(s_t, a_t)$ depending on its action $a_t$ and the state $s_t$.
Environment is a Markov Decision Process with unknown transition probabilities $p(s_{t+1} \mid s_t, a_t)$.
REINFORCE:
Q-Learning (Optional)
Optimal Stopping and Q-Learning (Optional)
Yu, H. and Bertsekas, D.P.: Q-learning Algorithms for Optimal Stopping Based on Least
Squares. (2007).
Optimal policy is to stop as soon as $(X_t)$ enters the set $D = \{x_t \mid f(x_t) \le Q^*(x_t)\}$.
Optimal Stopping and Q-Learning (Optional)
$Q^*(x_t) \equiv \phi(x_t)'\beta_t$
$\beta_{t+1} = \beta_t + \alpha(\hat\beta_{t+1} - \beta_t)$
where
$\hat\beta_t = \arg\min_{\beta} \sum_{k=0}^{t} \Big(\phi(x_k)'\beta - h(x_k, x_{k+1}) - \gamma \min\{f(x_{k+1}), \phi(x_{k+1})'\beta_t\}\Big)^2$
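A small sketch (my own, under the notation above) of this least-squares fit; the feature-matrix layout, variable names, and signature are assumptions, not taken from [Yu and Bertsekas, 2007].

import numpy as np

def ls_q_fit(phi, phi_next, h, f_next, beta_t, gamma):
    # Solve the least-squares problem defining beta_hat above.
    # phi:      (t+1, K) matrix with rows phi(x_k)
    # phi_next: (t+1, K) matrix with rows phi(x_{k+1})
    # h:        (t+1,) vector of rewards h(x_k, x_{k+1})
    # f_next:   (t+1,) vector of stopping payoffs f(x_{k+1})
    targets = h + gamma * np.minimum(f_next, phi_next @ beta_t)
    beta_hat, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return beta_hat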
Least-Squares Monte Carlo (LSMC)
Least-Squares Neural Network Approach
$q^{\theta_n} = a_2^{\theta_n} \circ \sigma \circ a_1^{\theta_n}$
where
$a_1 : \mathbb{R}^d \to \mathbb{R}^K$, $a_2 : \mathbb{R}^K \to \mathbb{R}$ with $a_i(x) = W_i x + b_i$ and $\sum_{k=0}^{K} |w_{2,k}| \le C_n$.
$\sigma : \mathbb{R}^K \to [0, 1]^K$ with $\sigma_k(y_k) = 1/(1 + e^{-y_k})$.
The algorithm recursively (backwards in time) creates training data $(x_n^m, e^{-r t_n} \cdot V_{n+1}^m)$ and uses a mean squared error loss function.
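A minimal PyTorch sketch of such a one-hidden-layer regression network (my own illustration; the constraint $\sum_k |w_{2,k}| \le C_n$ is not enforced here).

import torch.nn as nn

class ContinuationNet(nn.Module):
    # q^{theta_n} = a_2 ∘ sigma ∘ a_1, used to regress continuation values
    def __init__(self, d, K):
        super().__init__()
        self.a1 = nn.Linear(d, K)        # a_1: R^d -> R^K
        self.sigmoid = nn.Sigmoid()      # sigma: componentwise logistic sigmoid
        self.a2 = nn.Linear(K, 1)        # a_2: R^K -> R

    def forward(self, x):
        return self.a2(self.sigmoid(self.a1(x)))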
Extension: More Complex Neural Networks
Hu, R.: Deep Learning for Ranking Response Surfaces with Applications to Optimal
Stopping Problems. (2019).
Figure 1: Left: Feed-forward NN, similar to NN applied in [Becker et al., 2018], Right:
UNET, applied in [Hu, 2019].
Ranking Response Surfaces
Surface ranking problem: Extract index of minimal surface for each input x
and treat it as a class label.
For optimal stopping, the labels are just stop or continue (at each time step).
Can now use convolutional NNs (e.g. UNET), which can be more computationally efficient [Hu, 2019].
Expressing Stopping Times as a Series of 0-1 Decisions
Goal
Show the optimal decision to stop $(X_n)_{n=0}^N$ can be made according to $\{f_n(X_n)\}_{n=0}^N$, where $f_n : \mathbb{R}^d \to \{0, 1\}$.
Theorem 1
For $n \in \{0, \ldots, N-1\}$, let $\tau_{n+1} \in \mathcal{T}_{n+1}$ be of the form:
$$\tau_{n+1} = \sum_{k=n+1}^{N} k\, f_k(X_k) \prod_{j=n+1}^{k-1} \big(1 - f_j(X_j)\big) \qquad (2)$$
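In words, such a stopping time is simply the first time (after $n$) at which the 0-1 decision fires. A tiny sketch of the full decomposition starting at time 0 (my own illustration, with hypothetical decision functions f_k):

def stopping_time(path, decisions):
    # Evaluate tau of the form above on one path: the first k with f_k(X_k) = 1.
    # path:      sequence of states X_0, ..., X_N
    # decisions: functions f_0, ..., f_N mapping a state to 0 or 1 (f_N should be 1 everywhere)
    for k, (x, f) in enumerate(zip(path, decisions)):
        if f(x) == 1:
            return k
    return len(path) - 1   # only reached if no decision fired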
Sketch of Theorem 1 Proof
So $\mathbb{E}\, g(\tau_{n+1}, X_{\tau_{n+1}}) \ge \mathbb{E}\, g(\tilde\tau, X_{\tilde\tau}) - \varepsilon \implies \mathbb{E}[g(\tau_{n+1}, X_{\tau_{n+1}})\, I_{E^c}] \ge \mathbb{E}[g(\tau, X_\tau)\, I_{E^c}] - \varepsilon$.
What have we proven?
Introducing a Neural Network
Problem:
Neural Network with Continuous Output
where
$q_1$ and $q_2$ are the number of nodes in the hidden layers.
$a_1^\theta : \mathbb{R}^d \to \mathbb{R}^{q_1}$, $a_2^\theta : \mathbb{R}^{q_1} \to \mathbb{R}^{q_2}$, $a_3^\theta : \mathbb{R}^{q_2} \to \mathbb{R}$ satisfy $a_i^\theta(x) = W_i x + b_i$.
$\varphi_{q_i} : \mathbb{R}^{q_i} \to \mathbb{R}^{q_i}$ are ReLU activation functions, $\varphi_{q_i}(x_1, \ldots, x_{q_i}) = (x_1^+, \ldots, x_{q_i}^+)$.
$\psi : \mathbb{R} \to \mathbb{R}$ is the logistic sigmoid function, $\psi(x) = 1/(1 + e^{-x})$.
Return to Binary Decisions
Note [Becker et al., 2018] does not provide a formal proof that using $F^\theta$ to optimize $\theta$ will yield optimal parameters for $f^\theta$.
However, it makes sense intuitively when we consider $F^{\theta_n}(X_n)$ to be the probability that stopping is the optimal decision at time $n$ (given $X_n$).
Selecting a Reward Function
Main Idea
Want to define a reward function that yields a stopping decision at time n that
will maximize our expected future payoff.
If at time n we:
Stop $\implies f_n(X_n) = 1$ and we receive the payoff $g(n, X_n)$.
Continue and thereafter proceed optimally $\implies f_n(X_n) = 0$ and we eventually receive the payoff $g(\tau_{n+1}, X_{\tau_{n+1}})$.
$$\sup_{f \in \mathcal{D}} \mathbb{E}\big[g(n, X_n) f(X_n) + g(\tau_{n+1}, X_{\tau_{n+1}})(1 - f(X_n))\big] \qquad (8)$$
...But can we even replace $f$ in (8) with a neural network $f^\theta$?
Theorem 2
Let $n \in \{0, \ldots, N-1\}$ and fix a stopping time $\tau_{n+1} \in \mathcal{T}_{n+1}$. Then $\forall \varepsilon > 0$ there exist $q_1, q_2 \in \mathbb{N}^+$ such that
Corollary 1
For any $\varepsilon > 0$, there exist $q_1, q_2 \in \mathbb{N}^+$ and neural network functions of the form (7) such that $f^{\theta_N} \equiv 1$ and the stopping time
$$\hat\tau = \sum_{n=1}^{N} n\, f^{\theta_n}(X_n) \prod_{k=0}^{n-1} \big(1 - f^{\theta_k}(X_k)\big) \qquad (10)$$
Sketch of Theorem 2 Proof (Optional)
$$\mathbb{E}[g(n, X_n)\tilde f^\theta(X_n) + g(\tau_{n+1}, X_{\tau_{n+1}})(1 - \tilde f^\theta(X_n))] \ge \sup_{f \in \mathcal{D}} \mathbb{E}[g(n, X_n)f(X_n) + g(\tau_{n+1}, X_{\tau_{n+1}})(1 - f(X_n))] - \varepsilon/4 \qquad (11)$$
$B \mapsto \mathbb{E}[|g(n, X_n)|\, I_B(X_n)]$ and $B \mapsto \mathbb{E}[|g(\tau_{n+1}, X_{\tau_{n+1}})|\, I_B(X_n)]$
So $\exists K \subseteq A$ compact s.t.
Sketch of Theorem 2 Proof Cont. (Optional)
Let $\rho_K(x) = \inf_{y \in K} \|x - y\|_2$ and note that $k_j(x) = \max\{1 - j \cdot \rho_K(x), -1\}$, $j \in \mathbb{N}$, converges pointwise to $I_K - I_{K^c}$. So by the DCT:
$$\mathbb{E}[g(n, X_n) I_{\{k_j(X_n) \ge 0\}} + g(\tau_{n+1}, X_{\tau_{n+1}})(1 - I_{\{k_j(X_n) \ge 0\}})] \ge \mathbb{E}[g(n, X_n) I_K(X_n) + g(\tau_{n+1}, X_{\tau_{n+1}})(1 - I_K(X_n))] - \varepsilon/4 \qquad (13)$$
By [Leshno et al., 1993], $k_j$ can be approximated uniformly on compact sets by $h(x) = \sum_{i=1}^{r} (v_i^T x + c_i)^+ - \sum_{i=1}^{s} (w_i^T x + d_i)^+$, and thus:
Computing Estimates via the Neural Network
Note (16) acts as the reward function we want to maximize w.r.t. $\theta_n$ via a gradient ascent algorithm.
Computing Estimates via the Neural Network
Overall optimal stopping time (estimate of τ̂ from (10)):
$$\tilde\tau^m = \sum_{n=1}^{N} n\, f^{\theta_n}(y_n^m) \prod_{k=0}^{n-1} \big(1 - f^{\theta_k}(y_k^m)\big)$$
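A sketch (my own, with assumed array shapes) of the resulting Monte Carlo estimate of $\mathbb{E}\, g(\tau, X_\tau)$, using the hard rule "stop once $F^{\theta_n}(y_n^m) \ge 1/2$" as a stand-in for the 0-1 decision $f^{\theta_n}$, which the slides do not spell out explicitly:

import numpy as np
import torch

def estimate_price(paths, payoffs, nets, threshold=0.5):
    # paths:   (M, N+1, d) array of simulated states y^m_n
    # payoffs: (M, N+1) array of (discounted) payoffs g(n, y^m_n)
    # nets:    list of N+1 trained networks F^{theta_n}; the last decision is forced to 1
    M, N1, _ = paths.shape
    stopped = np.zeros(M, dtype=bool)   # paths already stopped at an earlier date
    values = np.zeros(M)
    for n in range(N1):
        x = torch.as_tensor(paths[:, n, :], dtype=torch.float32)
        with torch.no_grad():
            stop = nets[n](x).squeeze(-1).numpy() >= threshold   # f^{theta_n}(y^m_n)
        if n == N1 - 1:
            stop[:] = True              # f^{theta_N} = 1: always stop at maturity
        newly = stop & ~stopped         # first n with f^{theta_n} = 1 gives tau^m
        values[newly] = payoffs[newly, n]
        stopped |= newly
    return values.mean()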
Application: Bermudan Max Call Option
Example
Bermudan max-call option expiring at time $T$ with strike price $K$ written on $d$ assets, $X^1, \ldots, X^d$, with $N + 1$ equidistant exercise times $t_n = nT/N$, $n = 0, 1, \ldots, N$.
The payoff at time $t$ is $\big(\max_{i \in \{1,\ldots,d\}} X_t^i - K\big)^+$.
So its price at time 0 is $\sup_{\tau} \mathbb{E}\Big[e^{-r\tau}\big(\max_{i \in \{1,\ldots,d\}} X_\tau^i - K\big)^+\Big]$.
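For reference, a direct translation of this payoff into code (my own helper functions; the names and signatures are purely illustrative):

import numpy as np

def max_call_payoff(x, K):
    # (max_i x_i - K)^+ for a batch of asset vectors x with shape (..., d)
    return np.maximum(np.max(x, axis=-1) - K, 0.0)

def discounted_payoff(n, x, K, r, T, N):
    # g(n, x) = e^{-r t_n} (max_i x_i - K)^+ with t_n = n T / N
    return np.exp(-r * n * T / N) * max_call_payoff(x, K)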
Application: Bermudan Max Call Option
where $\Delta t = T/N$ and $Z_{k,i}^m \sim \mathcal{N}(0, 1)$.
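The asset dynamics themselves are not reproduced on this slide; assuming the standard Black-Scholes model with interest rate r, dividend yield delta, and volatility sigma (the setting used in [Becker et al., 2018]), a path simulator could look like this sketch of my own:

import numpy as np

def simulate_gbm_paths(M, N, d, X0, r, delta, sigma, T, seed=0):
    # Assumed dynamics (geometric Brownian motion with dividend yield delta):
    #   X^i_{t_{k+1}} = X^i_{t_k} * exp((r - delta - sigma^2/2) dt + sigma sqrt(dt) Z_{k,i})
    rng = np.random.default_rng(seed)
    dt = T / N
    Z = rng.standard_normal((M, N, d))
    increments = (r - delta - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * Z
    log_paths = np.concatenate(
        [np.zeros((M, 1, d)), np.cumsum(increments, axis=1)], axis=1
    )
    return X0 * np.exp(log_paths)   # shape (M, N+1, d): paths at t_0, ..., t_N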
Building Neural Network using PyTorch
Constructing a neural network of the form $F^\theta$ from (6) is simple using PyTorch!

import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self, d, q1, q2):
        super(NeuralNet, self).__init__()
        self.a1 = nn.Linear(d, q1)      # a_1: R^d -> R^{q_1}
        self.relu = nn.ReLU()           # phi: componentwise ReLU
        self.a2 = nn.Linear(q1, q2)     # a_2: R^{q_1} -> R^{q_2}
        self.a3 = nn.Linear(q2, 1)      # a_3: R^{q_2} -> R
        self.sigmoid = nn.Sigmoid()     # psi: logistic sigmoid

    def forward(self, x):
        out = self.a1(x)
        out = self.relu(out)
        out = self.a2(out)
        out = self.relu(out)
        out = self.a3(out)
        out = self.sigmoid(out)
        return out
Training the Network in PyTorch
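The training code itself is not reproduced here; below is a minimal sketch of my own, assuming the reward in (8)/(16) is approximated by a Monte Carlo average over simulated paths and maximized with Adam (the function name, tensor shapes, and hyperparameters are placeholders):

import torch

def train_decision_network(net, g_now, g_future, x, lr=0.001, steps=3000):
    # x:        (M, d) tensor of simulated states y^m_n
    # g_now:    (M,) tensor of payoffs g(n, y^m_n) if we stop now
    # g_future: (M,) tensor of payoffs g(tau^m_{n+1}, y^m_{tau^m_{n+1}}) if we continue
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        F = net(x).squeeze(-1)                          # soft stopping decision in (0, 1)
        reward = (g_now * F + g_future * (1.0 - F)).mean()
        loss = -reward                                  # gradient ascent = descent on -reward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return net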
Adaptive Moment Estimation (Adam) Optimization
Input: $\alpha$, $r(\theta)$, $\theta_0$
1. Set $\beta_1 = 0.9$; $\beta_2 = 0.99$; $\varepsilon = 10^{-8}$
2. $m_0 = v_0 = 0$; $t = 0$
3. while $r(\theta_t)$ not minimized do
4.   $t = t + 1$
5.   $G_t = \nabla_\theta r(\theta_{t-1})$
6.   $m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot G_t$
7.   $v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot (G_t)^2$
8.   $\hat m_t = m_t / (1 - (\beta_1)^t)$
9.   $\hat v_t = v_t / (1 - (\beta_2)^t)$
10.  $\theta_t = \theta_{t-1} - \alpha \cdot \hat m_t / (\sqrt{\hat v_t} + \varepsilon)$
Output: $\theta_t$
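For illustration only, the update above translates almost line for line into NumPy; in practice torch.optim.Adam, used in the training sketch earlier, implements the same steps.

import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    # One Adam update, following steps 4-10 of the pseudocode above (t starts at 1)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v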
Results
Note: The actual estimate for $d = 2$ is from [Hu, 2019], and those for $d = 5, 10$ are from [Becker et al., 2018].
Optimally Stopping a Fractional Brownian Motion
(Optional)
Objective
We want to evaluate $\sup_{0 \le \tau \le 1} \mathbb{E}\, W_\tau^H$.
Challenge of Optimally Stopping FBM (Optional)
So by the Optional Stopping Theorem, $\mathbb{E}\, W_\tau^{1/2} = 0$ for any stopping time $\tau$ such that $W_{\tau \wedge t}^{1/2}$ is bounded above by a constant.
Optimally Stopping a Fractional Brownian Motion
(Optional)
Discretize the time interval $[0, 1]$ into time steps $t_n = \frac{n}{100}$.
Create a 100-dimensional Markov process $(X_n)_{n=0}^{100}$ to describe $(W_{t_n}^H)_{n=0}^{100}$:
$X_0 = (0, 0, \ldots, 0)$
$X_1 = (W_{t_1}^H, 0, \ldots, 0)$
$X_2 = (W_{t_2}^H, W_{t_1}^H, \ldots, 0)$
$\vdots$
$X_{100} = (W_{t_{100}}^H, W_{t_{99}}^H, \ldots, W_{t_1}^H)$
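A sketch of my own (not from the slides) that simulates fractional Brownian motion by the Cholesky method and stacks the values into the states $X_n$ above; the function names and the simulation method are assumptions.

import numpy as np

def fbm_cov(H, n_steps):
    # Covariance of (W^H_{t_1}, ..., W^H_{t_n}) with t_k = k/n:
    #   Cov(W^H_s, W^H_t) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2
    t = np.arange(1, n_steps + 1) / n_steps
    s, u = np.meshgrid(t, t)
    return 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))

def fbm_state_paths(H, M, n_steps=100, seed=0):
    # Simulate M FBM paths (Cholesky method) and stack them into the states X_0, ..., X_100
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(fbm_cov(H, n_steps))
    W = rng.standard_normal((M, n_steps)) @ L.T           # W^H at t_1, ..., t_100
    X = np.zeros((M, n_steps + 1, n_steps))
    for n in range(1, n_steps + 1):
        X[:, n, :n] = W[:, n - 1::-1]                     # (W_{t_n}, ..., W_{t_1}, 0, ..., 0)
    return X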
Optimally Stopping a Fractional Brownian Motion
(Optional)
Then train neural networks of the form (7) with $d = 100$, $q_1 = 110$, and $q_2 = 55$, and approximate (21) via Monte Carlo.
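With the PyTorch class from earlier, this would amount to something like the following (illustrative only):

nets = [NeuralNet(d=100, q1=110, q2=55) for _ in range(101)]   # one F^{theta_n} per time step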
Optimally Stopping a Fractional Brownian Motion
(Optional)
Figure 2: Results of [Becker et al., 2018]. Expected value of optimally stopped FBM
w.r.t. its Hurst parameter H.
Concluding Remarks
References
Becker, S., Cheridito, P., and Jentzen, A. (2018). Deep Optimal Stopping.
Hu, R. (2019). Deep Learning for Ranking Response Surfaces with Applications to Optimal Stopping Problems.
Leshno, M., Lin, V. Ya., Pinkus, A., and Schocken, S. (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6), 861-867.
Yu, H. and Bertsekas, D. P. (2007). Q-learning Algorithms for Optimal Stopping Based on Least Squares.