Linear Quadratic Dynamic Programming

Sang Seok Lee

Department of Economics, Bilkent University

Spring Semester 2017


This Lecture Is Based on

- Ljungqvist and Sargent (2004)
- Hansen and Sargent (2008)
- A nice alternative to my lecture slides is Martin Ellison's lecture slides at http://users.ox.ac.uk/~exet2581/recursive/lqg_mat.pdf

Matrix Calculus

- Let z, x, and a be n × 1 vectors and y be an m × 1 vector; let A be an n × n matrix and B be an m × n matrix
- Some rules (a numerical check of the second one follows below):
  - ∂(a'x)/∂x = a
  - ∂(x'Ax)/∂x = (A + A')x; if A is symmetric, it is 2Ax
  - ∂²(x'Ax)/∂x∂x' = A + A'; if A is symmetric, it is 2A
  - ∂(x'Ax)/∂A = xx'
  - ∂(y'Bz)/∂y = Bz; ∂(y'B)/∂y = B
  - ∂(y'Bz)/∂z = B'y; ∂(Bz)/∂z' = B
  - ∂(y'Bz)/∂B = yz'
- See http://en.wikipedia.org/wiki/Matrix_calculus for a quick (albeit clumsy) reference
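
As a sanity check, the rule ∂(x'Ax)/∂x = (A + A')x can be verified numerically. A minimal Python sketch, with a made-up matrix and vector:

  # Finite-difference check of d(x'Ax)/dx = (A + A')x
  import numpy as np

  rng = np.random.default_rng(0)
  n = 4
  A = rng.standard_normal((n, n))   # generic, non-symmetric n x n matrix
  x = rng.standard_normal(n)

  f = lambda v: v @ A @ v           # the quadratic form x'Ax
  grad_rule = (A + A.T) @ x         # the matrix-calculus rule

  eps = 1e-6
  grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                      for e in np.eye(n)])
  print(np.max(np.abs(grad_rule - grad_fd)))   # agrees up to ~1e-9
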
What Is Linear Quadratic Dynamic Programming?

- Dynamic programming in which
  - The return function is quadratic
  - The transition function is linear
  - The resulting optimal policy function is linear
- Let us call it LQDP from now on
- Implication: it is possible to solve the Bellman equation using linear algebra
- LQDP is very popular in macroeconomics
  - Kydland and Prescott (1982), which was discussed during the RBC lecture, is an example


Optimal Linear Regulator Problem

- We will solve the following optimal linear regulator problems:
  - Deterministic and undiscounted
  - Deterministic and discounted
  - Stochastic and discounted
Deterministic and Undiscounted

max_{{u_t}_{t=0}^∞} −∑_{t=0}^∞ (x_t'Rx_t + u_t'Qu_t)

s.t. x_{t+1} = Ax_t + Bu_t and a given x_0, where

- x_t is an n × 1 vector of state variables and u_t is a k × 1 vector of control variables
- R is a positive semidefinite symmetric matrix and Q is a positive definite symmetric matrix
- A is an n × n matrix and B is an n × k matrix
- We conjecture that the value function is quadratic:

  V(x) = −x'Px

  where P is a positive semidefinite symmetric matrix


Deterministic and Undiscounted

- The associated Bellman equation is

  −x'Px = max_u { −x'Rx − u'Qu − (Ax + Bu)'P(Ax + Bu) }

  where the transition law is substituted into tomorrow's state. Expanding,

  −x'Px = max_u { −x'Rx − u'Qu − x'A'PAx − x'A'PBu − u'B'PAx − u'B'PBu }

- Solving the problem on the right-hand side gives the following FOC (remember the symmetry of P):

  −2Qu − B'PAx − B'PAx − 2B'PBu = 0

  (Q + B'PB)u = −B'PAx

  u = −Fx, where F = (Q + B'PB)^{-1}B'PA
Deterministic and Undiscounted
−x'Px = max_u { −x'Rx − u'Qu − x'A'PAx − x'A'PBu − u'B'PAx − u'B'PBu }

u = −Fx, where F = (Q + B'PB)^{-1}B'PA

- Substituting u into the above,

  −x'Px = −x'Rx − (Fx)'Q(Fx) − x'A'PAx + x'A'PB(Fx) + (Fx)'B'PAx − (Fx)'B'PB(Fx)

  −x'Px = −x'Rx − x'F'QFx − x'A'PAx + x'A'PBFx + x'F'B'PAx − x'F'B'PBFx

- Note that x'A'PBFx = x'F'B'PAx because they are scalars
- Rearranging,

  −x'Px = −x'(R + F'QF + A'PA − 2A'PBF + F'B'PBF)x

- This implies

  P = R + F'QF + A'PA − 2A'PBF + F'B'PBF

  P = R + A'PA + F'(Q + B'PB)F − 2A'PBF
Deterministic and Undiscounted

P = R + A'PA + F'(Q + B'PB)F − 2A'PBF

- Substituting F = (Q + B'PB)^{-1}B'PA into it,

  P = R + A'PA + A'PB(Q + B'PB)^{-1}(Q + B'PB)(Q + B'PB)^{-1}B'PA
      − 2A'PB(Q + B'PB)^{-1}B'PA

  P = R + A'PA − A'PB(Q + B'PB)^{-1}B'PA   (∗)

- (∗) is called the algebraic matrix Riccati equation
- It has a positive semidefinite solution under certain conditions
- This can be solved numerically: one can iterate on

  P_{j+1} = R + A'P_j A − A'P_j B(Q + B'P_j B)^{-1}B'P_j A

  from P_0 = 0 until convergence (see the sketch below)
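
A minimal Python sketch of this iteration, assuming NumPy; the matrices A, B, R, Q below are made up purely for illustration:

  import numpy as np

  def solve_riccati(A, B, R, Q, tol=1e-10, max_iter=10_000):
      """Iterate P_{j+1} = R + A'P_j A - A'P_j B (Q + B'P_j B)^{-1} B'P_j A."""
      P = np.zeros_like(R)
      for _ in range(max_iter):
          P_new = R + A.T @ P @ A \
              - A.T @ P @ B @ np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A)
          if np.max(np.abs(P_new - P)) < tol:
              return P_new
          P = P_new
      raise RuntimeError("Riccati iteration did not converge")

  A = np.array([[0.9, 0.1], [0.0, 0.8]])   # stable example system
  B = np.array([[0.0], [1.0]])
  R = np.eye(2)                            # state cost
  Q = np.array([[0.5]])                    # control cost

  P = solve_riccati(A, B, R, Q)
  F = np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A)   # policy u = -Fx
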


Deterministic and Discounted

max_{{u_t}_{t=0}^∞} −∑_{t=0}^∞ β^t (x_t'Rx_t + u_t'Qu_t),   0 < β < 1

s.t. x_{t+1} = Ax_t + Bu_t and a given x_0

- The introduction of the discount factor β modifies the Riccati equation:

  P = R + βA'PA − β²A'PB(Q + βB'PB)^{-1}B'PA

- This can be solved by iterating on

  P_{j+1} = R + βA'P_j A − β²A'P_j B(Q + βB'P_j B)^{-1}B'P_j A

  from P_0 = 0 until convergence (a sketch follows below)

- The associated optimal policy function is

  u_t = −Fx_t, where F = β(Q + βB'PB)^{-1}B'PA

- The value function is still V(x) = −x'Px, where P is obtained from solving the Riccati equation above
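
A sketch of the discounted iteration, reusing the illustrative matrices from the previous sketch and an assumed β = 0.95:

  import numpy as np

  def solve_riccati_discounted(A, B, R, Q, beta, tol=1e-10, max_iter=10_000):
      """Iterate on the discounted Riccati equation from P_0 = 0."""
      P = np.zeros_like(R)
      for _ in range(max_iter):
          P_new = R + beta * A.T @ P @ A \
              - beta**2 * A.T @ P @ B @ np.linalg.solve(
                  Q + beta * B.T @ P @ B, B.T @ P @ A)
          if np.max(np.abs(P_new - P)) < tol:
              return P_new
          P = P_new
      raise RuntimeError("Riccati iteration did not converge")

  beta = 0.95                              # assumed discount factor
  A = np.array([[0.9, 0.1], [0.0, 0.8]])
  B = np.array([[0.0], [1.0]])
  R, Q = np.eye(2), np.array([[0.5]])

  P = solve_riccati_discounted(A, B, R, Q, beta)
  F = beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)  # u_t = -F x_t
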
Computer Codes

- Matlab's Control System Toolbox can be used to solve the Riccati equation on the previous slide (http://de.mathworks.com/help/control/ref/dare.html); a SciPy alternative is sketched below
- Tom Sargent provides a Matlab code (https://ideas.repec.org/c/dge/qmrbcd/21.html)
- Martin Ellison also provides a Matlab code (http://users.ox.ac.uk/~exet2581/recursive/riccati.m)
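
For Python users, SciPy's solve_discrete_are offers an alternative; a sketch, with two caveats: SciPy's q and r arguments play the roles of this lecture's R (state cost) and Q (control cost), and the discount factor can be absorbed by scaling A and B by √β:

  import numpy as np
  from scipy.linalg import solve_discrete_are

  beta = 0.95
  A = np.array([[0.9, 0.1], [0.0, 0.8]])
  B = np.array([[0.0], [1.0]])
  R, Q = np.eye(2), np.array([[0.5]])

  # Scaling by sqrt(beta) turns the undiscounted DARE into the discounted one
  P = solve_discrete_are(np.sqrt(beta) * A, np.sqrt(beta) * B, R, Q)
  F = beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)
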
Stochastic and Discounted

max_{{u_t}_{t=0}^∞} E_0 { −∑_{t=0}^∞ β^t (x_t'Rx_t + u_t'Qu_t) },   0 < β < 1

s.t. x_{t+1} = Ax_t + Bu_t + Cε_{t+1} and a given x_0, where

- ε_{t+1} is an n × 1 vector of shocks that is iid-normal with a zero vector as the mean and an identity matrix as the variance:

  E{ε_{t+1}} = 0 and E{ε_{t+1}ε_{t+1}'} = I

- The value function takes the form

  v(x) = −x'Px − d

  where P is a positive semidefinite matrix obtained from solving the Riccati equation from the deterministic and discounted case, and d = β(1 − β)^{-1} trace(PCC')

- The optimal policy function is (a computational sketch follows below)

  u_t = −Fx_t, where F = β(Q + βB'PB)^{-1}B'PA
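
A computational sketch of the stochastic case, continuing the illustrative matrices from above with a made-up C; note that P and F are computed exactly as in the deterministic discounted case:

  import numpy as np
  from scipy.linalg import solve_discrete_are

  beta = 0.95
  A = np.array([[0.9, 0.1], [0.0, 0.8]])
  B = np.array([[0.0], [1.0]])
  R, Q = np.eye(2), np.array([[0.5]])
  C = np.array([[0.1, 0.0], [0.0, 0.2]])   # n x n shock loading (illustrative)

  P = solve_discrete_are(np.sqrt(beta) * A, np.sqrt(beta) * B, R, Q)
  F = beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)
  d = beta / (1 - beta) * np.trace(P @ C @ C.T)   # the new constant term

  # Simulate x_{t+1} = (A - BF) x_t + C eps_{t+1} under the optimal policy
  rng = np.random.default_rng(0)
  x = np.array([1.0, 0.0])
  for t in range(100):
      x = (A - B @ F) @ x + C @ rng.standard_normal(2)
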
Stochastic and Discounted

- P and F for the deterministic and the stochastic case are identical
- This is called Certainty Equivalence
  - We discussed this in the previous lecture
- Let us prove the claims above

Stochastic and Discounted
- The Bellman equation is

  −x'Px − d = max_u [ −x'Rx − u'Qu − βE{(Ax + Bu + Cε)'P(Ax + Bu + Cε)} − βd ]

  where E{·} is taken with respect to ε_{t+1}

- Expanding,

  −x'Px − d = max_u [ −x'Rx − u'Qu − βE{x'A'PAx + x'A'PBu + x'A'PCε + u'B'PAx + u'B'PBu
                      + u'B'PCε + ε'C'PAx + ε'C'PBu + ε'C'PCε} − βd ]

- Taking the expectations (E{ε} = 0 eliminates the terms linear in ε),

  −x'Px − d = max_u [ −x'Rx − u'Qu − βx'A'PAx − βx'A'PBu − βu'B'PAx − βu'B'PBu − βE{ε'C'PCε} − βd ]

- The FOC of the right-hand side is

  −2Qu − βB'PAx − βB'PAx − 2βB'PBu = 0

  u = −β(Q + βB'PB)^{-1}B'PAx
Stochastic and Discounted
−x'Px − d = max_u [ −x'Rx − u'Qu − βx'A'PAx − βx'A'PBu − βu'B'PAx − βu'B'PBu − βE{ε'C'PCε} − βd ]

u = −Fx, where F = β(Q + βB'PB)^{-1}B'PA

- Using the well-known result about the expected value of a quadratic form,

  E{ε'C'PCε} = trace(C'PC)

- Because the trace is invariant under cyclic permutations,

  trace(C'PC) = trace(PCC')

- So E{ε'C'PCε} = trace(PCC') (a Monte Carlo check follows below)
- Substituting u and E{ε'C'PCε} into the above,

  −x'Px − d = −x'Rx − x'F'QFx − βx'A'PAx + βx'A'PBFx + βx'F'B'PAx − βx'F'B'PBFx
              − β trace(PCC') − βd

  −x'Px − d = −x'(R + F'QF + βA'PA − βA'PBF − βF'B'PA + βF'B'PBF)x − β(trace(PCC') + d)
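
A small Monte Carlo check of this identity, with a made-up positive semidefinite P and an arbitrary C:

  import numpy as np

  rng = np.random.default_rng(0)
  n = 3
  M = rng.standard_normal((n, n))
  P = M @ M.T                        # symmetric positive semidefinite
  C = rng.standard_normal((n, n))    # arbitrary n x n matrix

  draws = rng.standard_normal((200_000, n))               # eps ~ N(0, I)
  quad = np.sum((draws @ (C.T @ P @ C)) * draws, axis=1)  # eps'C'PC eps per draw
  print(quad.mean(), np.trace(P @ C @ C.T))               # should nearly agree
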
Stochastic and Discounted
−x'Px − d = −x'(R + F'QF + βA'PA − βA'PBF − βF'B'PA + βF'B'PBF)x − β(trace(PCC') + d)

F = β(Q + βB'PB)^{-1}B'PA

- Noting that x'A'PBFx = x'F'B'PAx, let us figure out P first:

  P = R + F'(Q + βB'PB)F + βA'PA − 2βA'PBF

  P = R + β²A'PB(Q + βB'PB)^{-1}(Q + βB'PB)(Q + βB'PB)^{-1}B'PA + βA'PA
      − 2β²A'PB(Q + βB'PB)^{-1}B'PA

  P = R + βA'PA − β²A'PB(Q + βB'PB)^{-1}B'PA

  which is identical to the one from the deterministic and discounted case, as claimed above

- Now, let us figure out d:

  d = β(trace(PCC') + d)

  d = β(1 − β)^{-1} trace(PCC')
Certainty Equivalence

- Even though d depends on CC', which is the covariance matrix of Cε_{t+1}, F in the optimal policy function does not depend on it
- This implies that this framework is not suitable for studying the effect of risk, such as precautionary savings
- This certainty equivalence property is due to the quadratic objective function, the linear transition function, and E_t{ε_{t+1}} = 0
- It breaks down without these assumptions
Closing Remark

- I recommend that you take a look at the Lagrangian formulation of the optimal linear regulator problem in Ljungqvist and Sargent (2004)
- Chow (1997) is an excellent textbook which shows deep connections between the Bellman and the Lagrangian approaches
- I also recommend Hansen and Sargent's (2008) textbook for a proper treatment of this topic and beyond
