15.450 Analytics of Finance, Fall 2010
Lecture 5: Dynamic Portfolio Choice II
Leonid Kogan (MIT, Sloan)
$\phi_t$ denotes the share of the stock in the portfolio.
Principle of Optimality
Suppose we have solved the problem, and found the optimal policy $\phi^\star_t$.
Consider a tail subproblem of maximizing $E_s[U(W_T)]$ starting at some point in time $s$ with wealth $W_s$.
[Figure: tree diagram of the tail subproblem. Starting at time $s$ with wealth $W_s$, the subproblem policy $\phi^{(s)}$ attains the value function $J(s, W_s)$; each terminal node pays $U(W_T)$.]
Principle of Optimality
Let $\phi^{(s)}_s, \phi^{(s)}_{s+1}, \dots, \phi^{(s)}_{T-1}$ denote the optimal policy of the subproblem.
The Principle of Optimality states that the optimal policy of the tail
subproblem coincides with the corresponding portion of the solution of the
original problem.
The reason is simple: if the original policy could be improved upon within the tail subproblem, the original problem could be improved by replacing the corresponding portion of the policy with $(\phi^{(s)})$.
IID Returns
DP
Suppose that the time-$t$ conditional expectation of terminal utility under the optimal policy depends only on the portfolio value $W_t$ at time $t$, and nothing else. This conjecture needs to be verified later.

$$E_t\left[U(W_T)\,\middle|\,\phi^{(t)}_{t,\dots,T-1}\right] = J(t, W_t)$$

We call $J(t, W_t)$ the indirect utility of wealth.
Then we can compute the optimal portfolio policy at $t-1$ and the time-$(t-1)$ expected terminal utility as

$$J(t-1, W_{t-1}) = \max_{\phi_{t-1}} E_{t-1}\left[J(t, W_t)\right] \quad \text{(Bellman equation)}$$

$$W_t = W_{t-1}\left[\phi_{t-1} R_t + (1 - \phi_{t-1}) R_f\right]$$

$J(t, W_t)$ is called the value function of the dynamic program.
IID Returns
DP
DP is easy to apply.
Compute the optimal policy one period at a time using backward induction.
At each step, the optimal portfolio policy maximizes the conditional
expectation of the next-period value function.
The value function can be computed recursively.
The optimal portfolio policy is dynamically consistent: the state-contingent policy optimal at time 0 remains optimal at any future date $t$. The Principle of Optimality is a statement of dynamic consistency.
IID Returns
Binomial tree
Suppose the stock's gross return each period is IID binomial:
$$R_t = \begin{cases} u, & \text{with probability } p \\ d, & \text{with probability } 1 - p \end{cases}$$
Start at time $T-1$ and compute the value function

$$J(T-1, W_{T-1}) = \max_{\phi_{T-1}} E_{T-1}\left[U(W_T)\,\middle|\,\phi_{T-1}\right] = \max_{\phi_{T-1}} \Big\{ p\, U\!\left[W_{T-1}\left(\phi_{T-1} u + (1 - \phi_{T-1}) R_f\right)\right] + (1 - p)\, U\!\left[W_{T-1}\left(\phi_{T-1} d + (1 - \phi_{T-1}) R_f\right)\right] \Big\}$$

Note that the value function at $T-1$ depends on $W_{T-1}$ only, due to the IID return distribution.
IID Returns
Binomial tree
Backward induction. Suppose that at $t, t+1, \dots, T-1$ the value function has been derived, and is of the form $J(s, W_s)$.
Compute the value function at $t-1$ and verify that it still depends only on portfolio value:

$$J(t-1, W_{t-1}) = \max_{\phi_{t-1}} E_{t-1}\left[J(t, W_t)\,\middle|\,\phi_{t-1}\right] = \max_{\phi_{t-1}} \Big\{ p\, J\!\left[t, W_{t-1}\left(\phi_{t-1} u + (1 - \phi_{t-1}) R_f\right)\right] + (1 - p)\, J\!\left[t, W_{t-1}\left(\phi_{t-1} d + (1 - \phi_{t-1}) R_f\right)\right] \Big\}$$

The optimal portfolio policy $\phi^\star_{t-1}$ depends on time and the current portfolio value:
$$\phi^\star_{t-1} = \phi^\star(t-1, W_{t-1})$$
IID Returns, CRRA Utility
Binomial tree
Simplify the portfolio policy under CRRA utility $U(W_T) = \frac{1}{1-\gamma} W_T^{1-\gamma}$:

$$J(T-1, W_{T-1}) = \max_{\phi_{T-1}} E_{T-1}\left[\tfrac{1}{1-\gamma} W_T^{1-\gamma}\,\middle|\,\phi_{T-1}\right] = \max_{\phi_{T-1}} \Big\{ \tfrac{1}{1-\gamma} W_{T-1}^{1-\gamma}\, p \left(\phi_{T-1} u + (1 - \phi_{T-1}) R_f\right)^{1-\gamma} + \tfrac{1}{1-\gamma} W_{T-1}^{1-\gamma}\, (1 - p) \left(\phi_{T-1} d + (1 - \phi_{T-1}) R_f\right)^{1-\gamma} \Big\} = A(T-1)\, W_{T-1}^{1-\gamma}$$

where $A(T-1)$ is a constant given by

$$A(T-1) = \max_{\phi_{T-1}} \frac{1}{1-\gamma} \left[ p \left(\phi_{T-1} u + (1 - \phi_{T-1}) R_f\right)^{1-\gamma} + (1 - p) \left(\phi_{T-1} d + (1 - \phi_{T-1}) R_f\right)^{1-\gamma} \right]$$
IID Returns, CRRA Utility
Binomial tree
Backward induction:

$$J(t-1, W_{t-1}) = \max_{\phi_{t-1}} E_{t-1}\left[A(t)\, W_t^{1-\gamma}\,\middle|\,\phi_{t-1}\right] = \max_{\phi_{t-1}} \Big\{ p\, A(t)\, W_{t-1}^{1-\gamma} \left(\phi_{t-1} u + (1 - \phi_{t-1}) R_f\right)^{1-\gamma} + (1 - p)\, A(t)\, W_{t-1}^{1-\gamma} \left(\phi_{t-1} d + (1 - \phi_{t-1}) R_f\right)^{1-\gamma} \Big\} = A(t-1)\, W_{t-1}^{1-\gamma}$$

where $A(t-1)$ is a constant given by

$$A(t-1) = \max_{\phi_{t-1}} A(t) \left[ p \left(\phi_{t-1} u + (1 - \phi_{t-1}) R_f\right)^{1-\gamma} + (1 - p) \left(\phi_{t-1} d + (1 - \phi_{t-1}) R_f\right)^{1-\gamma} \right]$$
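The recursion above is straightforward to implement. Below is a minimal Python sketch (all numerical parameter values are illustrative assumptions, not taken from the lecture): it grid-searches the stock share $\phi$ each period and carries the constant $A(t)$ backward. Because returns are IID, the printed maximizer is the same at every date.

```python
import numpy as np

# Illustrative parameters (assumed, not from the lecture)
gamma = 5.0                              # relative risk aversion
p, u, d, Rf = 0.5, 1.25, 0.85, 1.02
T = 10                                   # number of periods

def expected_util(phi):
    """E[(phi R + (1 - phi) Rf)^(1-gamma)]/(1-gamma) for the binomial return."""
    ru = phi * u + (1 - phi) * Rf        # portfolio gross return, up state
    rd = phi * d + (1 - phi) * Rf        # portfolio gross return, down state
    if min(ru, rd) <= 0:
        return -np.inf                   # CRRA utility undefined at zero wealth
    g = p * ru ** (1 - gamma) + (1 - p) * rd ** (1 - gamma)
    return g / (1 - gamma)

phis = np.linspace(0.0, 1.0, 10_001)     # grid for the stock share
A = 1.0 / (1 - gamma)                    # terminal condition A(T)
for t in range(T - 1, -1, -1):
    vals = np.array([expected_util(phi) for phi in phis])
    phi_star = phis[vals.argmax()]
    # A(t) = A(t+1) * [p (.)^(1-gamma) + (1-p) (.)^(1-gamma)] evaluated at phi*
    A *= (1 - gamma) * vals.max()
    print(f"t = {t}: phi* = {phi_star:.4f}")   # identical each period: myopic
```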
Black-Scholes Model, CRRA Utility
Limit of binomial tree
Parameterize the binomial tree so that the stock price process converges to a Geometric Brownian motion with parameters $\mu$ and $\sigma$: $p = 1/2$,

$$u = \exp\left[\left(\mu - \tfrac{\sigma^2}{2}\right)\Delta t + \sigma\sqrt{\Delta t}\right], \qquad d = \exp\left[\left(\mu - \tfrac{\sigma^2}{2}\right)\Delta t - \sigma\sqrt{\Delta t}\right]$$

Let $R_f = \exp(r\,\Delta t)$. The time step is now $\Delta t$ instead of 1.
Take the limit of the optimal portfolio policy as $\Delta t \to 0$:

$$\arg\max_{\phi_t}\; A(t + \Delta t) \left[ p \left(\phi_t u + (1 - \phi_t) R_f\right)^{1-\gamma} + (1 - p) \left(\phi_t d + (1 - \phi_t) R_f\right)^{1-\gamma} \right] \approx \arg\max_{\phi_t}\; 1 + (1-\gamma)\left(r + \phi_t(\mu - r)\right)\Delta t - \tfrac{1}{2}\gamma(1-\gamma)\,\phi_t^2\,\sigma^2\,\Delta t$$

Optimal portfolio policy:
$$\phi^\star_t = \frac{\mu - r}{\gamma\sigma^2}$$
Black-Scholes Model, CRRA Utility
Optimal portfolio policy:
$$\phi^\star_t = \frac{\mu - r}{\gamma\sigma^2}$$
We have recovered Merton's solution using DP. Merton's original derivation was very similar, using DP in continuous time.
The optimal portfolio policy is myopic: it does not depend on the problem horizon.
The value function has the same functional form as the utility function: indirect utility of wealth is CRRA with the same coefficient of relative risk aversion as the original utility. That is why the optimal portfolio policy is myopic.
If the return distribution were not IID, the portfolio policy would be more complex: the value function would depend on additional variables, and thus the optimal portfolio policy would not be myopic.
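A quick numerical check of this limit, under assumed parameter values: re-solve the one-period binomial problem for shrinking $\Delta t$ and compare the maximizer with $(\mu - r)/(\gamma\sigma^2)$.

```python
import numpy as np

# Assumed illustrative parameters
mu, sigma, r, gamma = 0.08, 0.2, 0.02, 4.0

for dt in (1.0, 0.1, 0.01, 0.001):
    u = np.exp((mu - sigma ** 2 / 2) * dt + sigma * np.sqrt(dt))
    d = np.exp((mu - sigma ** 2 / 2) * dt - sigma * np.sqrt(dt))
    Rf = np.exp(r * dt)
    phis = np.linspace(0.0, 1.0, 100_001)
    # One-period expected CRRA utility of the gross portfolio return, p = 1/2;
    # for these parameters both outcomes stay positive on the whole grid.
    ru = phis * u + (1 - phis) * Rf
    rd = phis * d + (1 - phis) * Rf
    util = (0.5 * ru ** (1 - gamma) + 0.5 * rd ** (1 - gamma)) / (1 - gamma)
    print(f"dt = {dt}: phi* = {phis[util.argmax()]:.4f}")

print("Merton limit:", (mu - r) / (gamma * sigma ** 2))
```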
Formulation
State augmentation
Consider an objective based on the average portfolio value, $\frac{1}{T} \sum_{t=1}^{T} W_t$. Expected utility of this average cannot be expressed in the time-additive form
$$\sum_{t=0}^{T-1} u(t, Y_t, \phi_t) + u(T, Y_T)$$
with $Y_t = W_t$.
Formulation
State augmentation
Continue with the previous example. Define an additional state variable $A_t$:
$$A_t = \frac{1}{t} \sum_{s=1}^{t} W_s$$
Now the state vector becomes
$$Y_t = (W_t, A_t)$$
Is this a controlled Markov process?
The distribution of $W_{t+1}$ depends only on $W_t$ and $\phi_t$.
Verify that the distribution of $(W_{t+1}, A_{t+1})$ depends only on $(W_t, A_t)$:
$$A_{t+1} = \frac{1}{t+1} \sum_{s=1}^{t+1} W_s = \frac{1}{t+1} \left(t A_t + W_{t+1}\right)$$
$Y_t$ is indeed a controlled Markov process.
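A small simulation sketch (with made-up IID gross returns) confirming that the running average is updated recursively from $(W_t, A_t)$ alone:

```python
import numpy as np

rng = np.random.default_rng(0)
W, A, hist = 1.0, 0.0, []
for t in range(1, 11):
    W *= rng.choice([1.25, 0.85])        # illustrative IID gross returns
    hist.append(W)
    A = ((t - 1) * A + W) / t            # A_t from (A_{t-1}, W_t) only
    assert np.isclose(A, np.mean(hist))  # matches the full-history average
print(f"A_10 = {A:.4f}")
```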
Formulation
Optimal stopping
Optimal stopping is a special case of dynamic optimization, and can be formulated using the above framework.
Consider the problem of pricing an American option on a binomial tree. The interest rate is $r$ and the option payoff at the exercise date $\tau$ is $H(S_\tau)$.
The objective is to find the optimal exercise policy $\tau$, which solves
$$\max_{\tau} E^Q_0\left[(1 + r)^{-\tau} H(S_\tau)\right]$$
Bellman Equation
The value function and the optimal policy solve the Bellman equation
$$J(t-1, Y_{t-1}) = \max_{\phi_{t-1}} E_{t-1}\left[u(t-1, Y_{t-1}, \phi_{t-1}) + J(t, Y_t)\,\middle|\,\phi_{t-1}\right]$$
$$J(T, Y_T) = u(T, Y_T)$$
American Option Pricing
Consider the problem of pricing an American option on a binomial tree. The interest rate is $r$ and the option payoff at the exercise date $\tau$ is $H(S_\tau)$.
The objective is to find the optimal exercise policy, which solves
$$\max_{\phi_t \in \{0,1\}} E^Q_0\left[\sum_{t=0}^{T} (1 + r)^{-t}\, \phi_t\, X_t\, H(S_t)\right]$$
If $X_t = 1$, the option has not been exercised yet.
The option price satisfies $P(t, S_t, X=0) = 0$, and $P(t, S_t, X=1)$ satisfies
$$P(t, S_t, X=1) = \max\left[ H(S_t),\; (1 + r)^{-1} E^Q_t\left[P(t+1, S_{t+1}, X=1)\right] \right]$$
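As a concrete instance, here is a minimal sketch that prices an American put, $H(S) = \max(K - S, 0)$, by backward induction on a binomial tree; the contract and model parameters below are assumed for illustration.

```python
import numpy as np

# Assumed contract and model parameters
S0, K, r, sigma, T_years, N = 100.0, 100.0, 0.05, 0.2, 1.0, 500
dt = T_years / N
u = np.exp(sigma * np.sqrt(dt))
d = 1.0 / u
R = np.exp(r * dt)                  # gross risk-free return per step
q = (R - d) / (u - d)               # risk-neutral up probability

# Payoffs at maturity, nodes ordered from highest to lowest price
S = S0 * u ** np.arange(N, -1, -1) * d ** np.arange(0, N + 1)
P = np.maximum(K - S, 0.0)

# Bellman recursion: P(t) = max[H(S_t), R^{-1} E_t^Q[P(t+1)]]
for t in range(N - 1, -1, -1):
    S = S0 * u ** np.arange(t, -1, -1) * d ** np.arange(0, t + 1)
    cont = (q * P[:-1] + (1 - q) * P[1:]) / R
    P = np.maximum(K - S, cont)     # exercise iff payoff exceeds continuation

print(f"American put price: {P[0]:.4f}")
```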
Asset Allocation with Return Predictability
Formulation
Suppose stock returns have a binomial distribution: $p = 1/2$,
$$u_t = \exp\left[\left(\mu_t - \tfrac{\sigma^2}{2}\right)\Delta t + \sigma\sqrt{\Delta t}\right], \qquad d_t = \exp\left[\left(\mu_t - \tfrac{\sigma^2}{2}\right)\Delta t - \sigma\sqrt{\Delta t}\right]$$
where the conditional expected return $\mu_t$ is stochastic and follows a Markov process with transition density
$$f(\mu_t \mid \mu_{t-1})$$
Conditionally on $\mu_{t-1}$, $\mu_t$ is independent of $R_t$.
Let $R_f = \exp(r\,\Delta t)$.
The objective is to maximize expected CRRA utility of terminal portfolio value:
$$\max\; E_0\left[\frac{1}{1-\gamma} W_T^{1-\gamma}\right]$$
Asset Allocation with Return Predictability
Bellman equation
We conjecture that the value function is of the form
$$J(t, W_t, \mu_t) = A(t, \mu_t)\, W_t^{1-\gamma}$$
The Bellman equation takes the form
$$A(t-1, \mu_{t-1})\, W_{t-1}^{1-\gamma} = \max_{\phi_{t-1}} E_{t-1}\left[ A(t, \mu_t) \left( W_{t-1} \left(\phi_{t-1}(R_t - R_f) + R_f\right) \right)^{1-\gamma} \right]$$
The initial condition for the Bellman equation implies
$$A(T, \mu_T) = \frac{1}{1-\gamma}$$
We verify that the conjectured value function satisfies the Bellman equation if
$$A(t-1, \mu_{t-1}) = \max_{\phi_{t-1}} E_{t-1}\left[ A(t, \mu_t) \left(\phi_{t-1}(R_t - R_f) + R_f\right)^{1-\gamma} \right]$$
Note that the RHS depends only on $\mu_{t-1}$.
The optimal portfolio policy satisfies
$$\phi^\star_{t-1} = \arg\max_{\phi_{t-1}} E_{t-1}\left[ A(t, \mu_t) \left(\phi_{t-1}(R_t - R_f) + R_f\right)^{1-\gamma} \right] = \arg\max_{\phi_{t-1}} E_{t-1}\left[A(t, \mu_t)\right] E_{t-1}\left[ \left(\phi_{t-1}(R_t - R_f) + R_f\right)^{1-\gamma} \right]$$
because, conditionally on $\mu_{t-1}$, $\mu_t$ is independent of $R_t$.
The optimal portfolio policy is myopic: it does not depend on the problem horizon. This is due to the independence assumption.
We can find $\phi^\star_t$ numerically, as in the sketch below.
In the continuous-time limit of $\Delta t \to 0$,
$$\phi^\star_t = \frac{\mu_t - r}{\gamma\sigma^2}$$
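A minimal numerical sketch of this computation; the two-state chain for $\mu_t$, its transition matrix, and all parameter values are assumptions made for illustration. We discretize $\mu$, carry $A(t, \mu)$ backward, and grid-search over $\phi$; the printed maximizers are the same at every step, confirming the myopic property.

```python
import numpy as np

# Assumed illustrative parameters
gamma, r, sigma, dt = 4.0, 0.02, 0.2, 0.01
mus = np.array([0.04, 0.12])                  # low / high conditional expected return
Pmu = np.array([[0.9, 0.1],                   # transition probabilities f(mu_t | mu_{t-1})
                [0.1, 0.9]])
Rf = np.exp(r * dt)
phis = np.linspace(0.0, 3.0, 30_001)

def one_step_E(mu):
    """E[(phi (R - Rf) + Rf)^(1-gamma)] on the phi grid, binomial R with p = 1/2."""
    u = np.exp((mu - sigma ** 2 / 2) * dt + sigma * np.sqrt(dt))
    d = np.exp((mu - sigma ** 2 / 2) * dt - sigma * np.sqrt(dt))
    return 0.5 * (phis * (u - Rf) + Rf) ** (1 - gamma) \
         + 0.5 * (phis * (d - Rf) + Rf) ** (1 - gamma)

A = np.full(2, 1.0 / (1 - gamma))             # terminal condition A(T, mu)
for t in range(10):                           # backward induction steps
    EA = Pmu @ A                              # E[A(t, mu_t) | mu_{t-1}]
    vals = EA[:, None] * np.vstack([one_step_E(mu) for mu in mus])
    A = vals.max(axis=1)                      # A(t-1, mu_{t-1})
    phi_star = phis[vals.argmax(axis=1)]      # one maximizer per current state
    print(phi_star)                           # constant across steps: myopic
```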
Asset Allocation with Return Predictability
Hedging demand
Assume now that the dynamics of conditional expected returns are correlated with stock returns, i.e., the distribution of $\mu_t$ given $\mu_{t-1}$ is no longer independent of $R_t$.
The value function has the same functional form as before,
$$J(t, W_t, \mu_t) = A(t, \mu_t)\, W_t^{1-\gamma}$$
The optimal portfolio policy satisfies
$$\phi^\star_{t-1} = \arg\max_{\phi_{t-1}} E_{t-1}\left[ A(t, \mu_t) \left(\phi_{t-1}(R_t - R_f) + R_f\right)^{1-\gamma} \right]$$
The optimal portfolio policy is no longer myopic: dependence between $\mu_t$ and $R_t$ affects the optimal policy.
The deviation from the myopic policy is called hedging demand. It is non-zero because investment opportunities ($\mu_t$) change stochastically, and the stock can be used to hedge that risk.
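A two-period sketch of the effect, with all numbers assumed for illustration: the next-period expected return is more likely to be high after an up move, so the expectation no longer factors into two terms, and the maximizer shifts away from the myopic choice; the difference is the hedging demand.

```python
import numpy as np

# Assumed illustrative parameters
gamma, r, sigma, dt = 4.0, 0.02, 0.1, 1.0
Rf = np.exp(r * dt)
mus = {"L": 0.00, "H": 0.10}
phis = np.linspace(-1.0, 3.0, 40_001)

def ret(mu):
    """Binomial gross returns (up, down) given the current expected return mu."""
    m = (mu - sigma ** 2 / 2) * dt
    return np.exp(m + sigma * np.sqrt(dt)), np.exp(m - sigma * np.sqrt(dt))

def g(R):
    return (phis * (R - Rf) + Rf) ** (1 - gamma)

# Step 1: A(T-1, mu) from the terminal condition A(T, .) = 1/(1 - gamma)
A1 = {}
for s, mu in mus.items():
    u, d = ret(mu)
    A1[s] = ((0.5 * g(u) + 0.5 * g(d)) / (1 - gamma)).max()

# Step 2: at T-2 in state L, the next state is 'H' more often after an up
# move than after a down move, so mu_{T-1} and R_{T-1} are dependent.
q_up, q_dn = 0.9, 0.1          # P(mu = H | up move), P(mu = H | down move)
u, d = ret(mus["L"])
joint = 0.5 * (q_up * A1["H"] + (1 - q_up) * A1["L"]) * g(u) \
      + 0.5 * (q_dn * A1["H"] + (1 - q_dn) * A1["L"]) * g(d)
phi_full = phis[joint.argmax()]

# Myopic benchmark: replace A(T-1, mu) by its conditional mean
Abar = 0.5 * (q_up + q_dn) * A1["H"] + (1 - 0.5 * (q_up + q_dn)) * A1["L"]
phi_myopic = phis[(Abar * (0.5 * g(u) + 0.5 * g(d))).argmax()]

print(f"phi (full DP) = {phi_full:.4f}, phi (myopic) = {phi_myopic:.4f}")
print(f"hedging demand = {phi_full - phi_myopic:+.4f}")
```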
Optimal Control of Execution Costs
Solution
In the execution-cost problem of Bertsimas and Lo (1998), with $W_t$ shares left to buy, trade size $b_t$, and linear price impact, the first step of backward induction gives
$$b^\star_{T-1} = \frac{W_{T-1}}{2}, \qquad J(T-1, S_{T-2}, W_{T-1}) = W_{T-1} S_{T-2} + \frac{3}{4}\,\theta\, W_{T-1}^2$$
Optimal Control of Execution Costs
Solution
Continue with backward induction to find
$$b^\star_{T-k} = \frac{W_{T-k}}{k+1}, \qquad J(T-k, S_{T-k-1}, W_{T-k}) = W_{T-k} S_{T-k-1} + \frac{k+2}{2(k+1)}\,\theta\, W_{T-k}^2$$
Conclude that the optimal policy is deterministic:
$$b^\star_0 = b^\star_1 = \dots = b^\star_T = \frac{\bar{W}}{T+1}$$
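This recursion can be verified symbolically. The sketch below uses sympy and assumes the Bertsimas-Lo price dynamics $S_t = S_{t-1} + \theta b_t + \epsilon_t$ with $E[\epsilon_t] = 0$: each backward step minimizes expected cost over the trade size $b$, reproducing $b^\star = W/(k+1)$ and the value-function coefficients above.

```python
import sympy as sp

W, S, b, theta = sp.symbols('W S b theta', positive=True)

# Terminal step: buy all remaining W shares, J(T, S_{T-1}, W) = W (S_{T-1} + theta W)
J = W * (S + theta * W)

for k in range(1, 5):
    # Trade b at expected price S + theta*b (E[eps] = 0); J is linear in S,
    # so the expected continuation value is J with W -> W - b, S -> S + theta*b.
    cost = b * (S + theta * b) \
         + J.subs({W: W - b, S: S + theta * b}, simultaneous=True)
    b_star = sp.solve(sp.diff(cost, b), b)[0]   # first-order condition in b
    J = sp.expand(cost.subs(b, b_star))
    print(f"k = {k}: b* = {b_star}, J = {J}")
```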
Key Points
Principle of Optimality for Dynamic Programming.
Bellman equation.
Formulate dynamic portfolio choice using controlled Markov processes.
Merton's solution.
Myopic policy and hedging demand.
References
Bertsimas, D., and A. Lo, 1998, "Optimal control of execution costs," Journal of Financial Markets 1, 1-50.
MIT OpenCourseWare
http://ocw.mit.edu
15.450 Analytics of Finance
Fall 2010
For information about citing these materials or our Terms of Use, visit http://ocw.mit.edu/terms.