Dynamic Programming 2
University of Hohenheim
Dynamic Programming
We denote by $c_t$ the control variables that can be chosen by the agent and are typically flows (e.g., consumption).
We denote by $x_t$ the state variables, which are typically stocks and summarize the decision maker's situation (e.g., assets).
The choices of $c_t$ affect $x_{t+1}$, and the initial value of $x_t$ is denoted by $x_0$ and given exogenously.
The utility function is again given by
$$U_0 = \sum_{t=0}^{T} \beta^t u(c_t).$$
The dynamic constraint has the form
$$x_{t+1} = f(x_t, c_t).$$
Dynamic Programming
We can apply the same principle to the value function of this problem.
Maximum attainable utility is
$$V(x_s) = \max_{\{c_t\}_{t=s}^{T}} U_s = \max_{\{c_t\}_{t=s}^{T}} \sum_{t=s}^{T} \beta^{t-s} u(c_t).$$
Suppose that the decision maker has already solved the problem that starts at time $t+1$ for a given state variable $x_{t+1}$.
Then the maximum attainable utility at time $t$ can be broken into instantaneous utility and the maximum attainable utility afterwards, given the choice that leads to $x_{t+1}$.
We therefore have
$$V(x_t) = \max_{c_t}\; u(c_t) + \beta V(x_{t+1}) \quad \text{s.t.} \quad x_{t+1} = f(x_t, c_t).$$
This equation is called the Bellman Equation.
Dynamic Programming
The idea that given the decision at t, the subsequent decision should be optimal
starting at t + 1 is Bellman’s Principle of Optimality.
Solving the Bellman equation for all t delivers the optimal sequence of control
variables.
If T is finite, the sequence of Bellman equations can be solved recursively (solve for the last period, then for the second-to-last, and so on).
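This recursive procedure can be sketched numerically. The following is my own illustration (not from the text): backward induction for a simple "cake-eating" special case with $u(c) = \log c$ and the transition $x_{t+1} = x_t - c_t$; all parameter values are assumptions.

```python
import numpy as np

# Illustration (not from the text): backward induction for a
# finite-horizon cake-eating problem with u(c) = log(c) and the
# transition x_{t+1} = x_t - c_t.  All parameter values are assumed.
T, beta = 5, 0.95
grid = np.linspace(0.01, 1.0, 200)      # grid for the state x_t

V = np.zeros((T + 2, grid.size))        # V[T+1] = 0: no utility after T
policy = np.zeros((T + 1, grid.size))

for t in range(T, -1, -1):              # solve the last period first
    for i, x in enumerate(grid):
        c = grid[grid <= x]             # feasible consumption levels
        values = np.log(c) + beta * np.interp(x - c, grid, V[t + 1])
        j = np.argmax(values)
        V[t, i] = values[j]
        policy[t, i] = c[j]

# In the last period it is optimal to eat the whole remaining cake.
print(np.allclose(policy[T], grid))
```

As expected, the last-period policy consumes the entire remaining state, and each earlier period trades off current utility against the already-computed continuation value.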
However, this can be tedious, and for T = ∞ it does not work.
Fortunately, there are better methods that allow one to gain more insight.
We focus on T = ∞ from now on.
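For $T = \infty$, a standard numerical approach is value-function iteration: apply the Bellman operator repeatedly until $V$ converges. Below is a minimal sketch, again under my own assumed log utility and cake-eating transition rather than anything from the text; for this special case the optimal policy has the known closed form $c = (1-\beta)x$, which the iteration recovers approximately.

```python
import numpy as np

# Sketch of value-function iteration for the infinite-horizon Bellman
# equation V(x) = max_c { log(c) + beta V(x - c) }.  Grid, tolerance,
# and parameters are illustrative assumptions, not from the text.
beta = 0.9
grid = np.linspace(1e-3, 1.0, 300)
V = np.zeros(grid.size)

for _ in range(2000):                   # iterate the Bellman operator
    V_new = np.empty_like(V)
    for i, x in enumerate(grid):
        c = grid[grid <= x]             # feasible consumption
        V_new[i] = np.max(np.log(c) + beta * np.interp(x - c, grid, V))
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Closed form for this problem: optimal consumption is c = (1 - beta) x.
values = np.log(grid) + beta * np.interp(1.0 - grid, grid, V)
c_star = grid[np.argmax(values)]
print(abs(c_star - (1 - beta)) < 0.02)
```

Convergence is guaranteed because the Bellman operator is a contraction with modulus $\beta < 1$.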
Dynamic Programming
Step 3: Use the FOCs w.r.t. $c_t$ and $c_{t+1}$ to express $V'(x_{t+1})$ and $V'(x_{t+2})$:
$$u'(c_t) + \beta V'(x_{t+1}) \frac{\partial f(x_t, c_t)}{\partial c_t} = 0 \quad\Leftrightarrow\quad V'(x_{t+1}) = -\frac{u'(c_t)}{\beta\, \frac{\partial f(x_t, c_t)}{\partial c_t}}.$$
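Steps 1 and 2 of this derivation are not shown in this excerpt. As a sketch of where Step 3 leads (my reconstruction, assuming $u$ depends only on $c_t$): the envelope theorem applied to the Bellman equation gives
$$V'(x_{t+1}) = \beta V'(x_{t+2})\,\frac{\partial f(x_{t+1}, c_{t+1})}{\partial x_{t+1}},$$
and substituting the two FOC expressions yields the Euler equation
$$\frac{u'(c_t)}{\frac{\partial f(x_t, c_t)}{\partial c_t}} = \beta\, u'(c_{t+1})\, \frac{\frac{\partial f(x_{t+1}, c_{t+1})}{\partial x_{t+1}}}{\frac{\partial f(x_{t+1}, c_{t+1})}{\partial c_{t+1}}}.$$
For a budget constraint of the form $x_{t+1} = (1+r)x_t + w - c_t$, this reduces to the familiar $u'(c_t) = \beta (1+r)\, u'(c_{t+1})$.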
Dynamic Programming
The household "sums up" the utility that is generated by consuming $c(t)$ at each instant between $0$ and $T$:
$$U = \int_0^T u[c(t)]\, dt.$$
This follows from the analogy between $\int$ and $\sum$.
In discrete time we had $U = \sum_{i=0}^{T} u(c_i)$.
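The analogy can be made concrete with a numerical check of my own (not from the text): a Riemann sum with step $\Delta t$ approaches the integral as $\Delta t \to 0$. Here $u(c) = \log c$ and $c(t) = e^t$ are assumed, so the exact value is $T^2/2$.

```python
import numpy as np

# Illustration of the sum <-> integral analogy: a left Riemann sum
# with step T/n approaches U = \int_0^T u(c(t)) dt as n grows.
# With u(c) = log(c) and c(t) = exp(t), the exact value is T^2 / 2.
T = 2.0
exact = T**2 / 2                        # \int_0^2 t dt = 2

for n in (10, 100, 10000):
    t = np.linspace(0.0, T, n, endpoint=False)
    riemann = np.sum(np.log(np.exp(t))) * (T / n)
    print(n, abs(riemann - exact))
```

The printed error shrinks roughly in proportion to the step size, as expected for a left Riemann sum.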
In the following, we suppress the time index t whenever this does not impair the
clarity of the exposition.
We often assume an infinite horizon over which individuals optimize such that $T \to \infty$. This simplification could be justified by a dynastic perspective with intergenerational altruism. Then we have
$$U = \int_0^\infty u(c)\, dt.$$
Recall that households discount the future at rate $\rho$, i.e., they are impatient, such that the discount factor is $e^{-\rho t}$ and we have
$$U = \int_0^\infty u(c)\, e^{-\rho t}\, dt.$$
Moreover, a household has wealth a.
The household gets asset income r · a (interest paid on assets).
It earns labor income w (on normalized labor input L ≡ 1).
In such a setting, the assets of the household evolve over time according to the
following differential equation:
ȧ = w + ra − c.
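The flow budget constraint is a linear ODE, so it can be checked by simulation. The sketch below (constant $w$, $r$, $c$ are my own assumed values) integrates $\dot a = w + ra - c$ with a forward Euler step and compares the result with the closed-form solution of this linear ODE.

```python
import numpy as np

# Sketch (constant w, r, c are assumptions): the flow budget
# constraint a' = w + r a - c, integrated with a forward Euler step
# and compared against the closed-form solution of this linear ODE.
r, w, c, a0 = 0.03, 1.0, 1.2, 2.0
T, n = 10.0, 100_000
dt = T / n

a = a0
for _ in range(n):
    a += (w + r * a - c) * dt           # Euler step of a' = w + ra - c

k = (w - c) / r                         # closed form: a(t) = (a0 + k) e^{rt} - k
exact = (a0 + k) * np.exp(r * T) - k
print(abs(a - exact) < 1e-3)
```

With $c > w$, assets are run down faster than interest accrues, which the simulated path reflects.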
This is called the flow budget constraint.
If the time horizon ends at T, the household must not die in debt, i.e., a(T) ≥ 0 has to hold.
In case of an infinite planning horizon, the equivalent formulation is
$$\lim_{t\to\infty} a(t)\, e^{-\bar r t} \ge 0,$$
where $\bar r = \frac{1}{t} \int_0^t r(\tau)\, d\tau$ is the average interest rate.
Interpretation: Debt can only grow at a rate lower than r̄ .
Putting it differently: You are not allowed to borrow and pay back the borrowed
money with even higher debt.
Putting it yet another way: The present value of assets must be non-negative.
This is called the No-Ponzi-Game condition.
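A Ponzi strategy can be simulated directly. The construction below is my own (not from the text): a household permanently consumes more than its income and rolls the debt over, so that $a(t)e^{-rt}$ converges to a negative constant, violating the condition.

```python
import numpy as np

# A Ponzi-style strategy: permanently consume more than income
# and roll the debt over.  The present value a(t) e^{-rt} then tends
# to a negative constant, violating lim_{t->inf} a(t) e^{-rt} >= 0.
r, w = 0.05, 1.0
c = w + 0.05                            # consume more than income forever
dt, n = 0.01, 20_000                    # simulate up to t = 200

a = 0.0
for _ in range(n):
    a += (w + r * a - c) * dt           # the gap is financed by new debt

pv = a * np.exp(-r * n * dt)            # present value of terminal assets
print(pv < 0)                           # the condition is violated
```

The debt itself grows without bound, but the violation shows up cleanly in present-value terms.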
If a(T ) > 0, households could still increase utility by consuming more in the last
period of their lives such that the original solution would not have been optimal.
Therefore, only a(T ) = 0 is consistent with an optimal behavior in the last period.
Optimality in the case of an infinite planning horizon requires leaving no assets asymptotically:
$$\lim_{t\to\infty} a(t)\, e^{-\bar r t} = 0.$$
This is called the transversality condition.
It encompasses the No-Ponzi-Game condition.
Now we have all the ingredients to solve the optimization problem by means of
Pontryagin's Maximum Principle (Pontryagin et al. 1962).
Let
x be the state variables (assets, velocity, etc.),
c be the control variables (consumption, thrust, etc.),
f (t, x , c) be the objective function (discounted utility, etc.),
g(t, x , c) be the dynamic constraint (flow budget constraint, etc.)
Then, we need to determine the optimal control $c$ that solves the following maximization problem:
$$\max_{c(t)} \int_0^T f(t, x, c)\, dt \quad \text{s.t.} \quad \dot x = g(t, x, c), \quad x(0) = x_0,$$
and a suitably defined transversality condition.
Remark: T → ∞ is also allowed.
Remarks
The variable λ is called the costate variable:
It can be interpreted as the shadow price of the state variable,
analogous to a Lagrange multiplier but now depending on time,
such that λ is a function of t (one restriction per instant).
In the case of $T \to \infty$, the transversality condition becomes
$$\lim_{t\to\infty} x(t)\lambda(t) = 0.$$
Hamiltonian:
$$H = u(c)\, e^{-\rho t} + \lambda[w + ra - c].$$
Necessary first order conditions:
$$\frac{\partial H}{\partial c} = 0 \;\Rightarrow\; u'(c)\, e^{-\rho t} - \lambda = 0,$$
$$\frac{\partial H}{\partial \lambda} = \dot a \;\Rightarrow\; w + ra - c = \dot a,$$
$$\frac{\partial H}{\partial a} = -\dot\lambda \;\Rightarrow\; \lambda r = -\dot\lambda,$$
and $\lim_{t\to\infty} a(t)\lambda(t) = 0$.
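A standard consequence of these conditions (the derivation is not shown in this excerpt) is the consumption Euler equation: differentiating the first condition with respect to time and substituting $\dot\lambda = -r\lambda$ gives
$$u''(c)\,\dot c\, e^{-\rho t} - \rho\, u'(c)\, e^{-\rho t} = -r\, u'(c)\, e^{-\rho t} \quad\Rightarrow\quad -\frac{u''(c)\, c}{u'(c)} \cdot \frac{\dot c}{c} = r - \rho.$$
For CRRA utility $u(c) = c^{1-\theta}/(1-\theta)$ this is the Keynes-Ramsey rule $\dot c / c = (r - \rho)/\theta$.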
Consider a non-renewable resource (coal, crude oil, natural gas, etc.) with a fixed global stock, of which z(t) remains at time t.
Denote the amount of the resource that is extracted
at each point in time by q(t).
The market price of the resource is given by p(t).
The market interest rate is given by r .
We assume that the cost of extraction is negligible.
Price-taking firms aim to maximize the total profit stream.
Each unit that is extracted today reduces the future resource use,
and we have an intertemporal optimization problem.
Hamiltonian:
$$H = p \cdot q \cdot e^{-rt} + \lambda(-q).$$
Necessary first order conditions:
$$\frac{\partial H}{\partial q} = 0 \;\Rightarrow\; p\, e^{-rt} = \lambda, \qquad (1)$$
$$\frac{\partial H}{\partial \lambda} = \dot z \;\Rightarrow\; -q = \dot z, \qquad (2)$$
$$\frac{\partial H}{\partial z} = -\dot\lambda \;\Rightarrow\; 0 = -\dot\lambda. \qquad (3)$$
Since λ is the shadow value of leaving the resource in the ground, Equation (1)
implies that this has to be equal to the present value of the market price of the
resource.
Equation (3) implies that the present value of the resource in the ground has to be
constant.
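Taken together, (1) and (3) imply that the price must grow at the interest rate, $p(t) = p(0)e^{rt}$, which is the Hotelling rule. The sketch below checks this numerically; $p_0$ and $r$ are my own assumed values.

```python
import numpy as np

# Numerical check of what (1) and (3) together imply: the present
# value lambda = p(t) e^{-rt} is constant only if the price grows at
# the interest rate, p(t) = p(0) e^{rt} (the Hotelling rule).
r, p0 = 0.04, 10.0
t = np.linspace(0.0, 50.0, 6)
p = p0 * np.exp(r * t)                  # Hotelling price path

lam = p * np.exp(-r * t)                # present value of the price
print(np.allclose(lam, p0))            # constant along the entire path
```

Any price path growing faster or slower than $r$ would make the discounted price drift, so firms would shift all extraction to the most profitable instant.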