Continuous Time Linear Quadratic Regulator


EE363 Winter 2008-09

Lecture 4
Continuous time linear quadratic regulator

continuous-time LQR problem

dynamic programming solution

Hamiltonian system and two point boundary value problem

infinite horizon LQR

direct solution of ARE via Hamiltonian

Continuous-time LQR problem

continuous-time system ẋ = Ax + Bu, x(0) = x0

problem: choose u : [0, T] → R^m to minimize


J = \int_0^T \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau + x(T)^T Q_f x(T)

T is time horizon
Q = Q^T ≥ 0, Qf = Qf^T ≥ 0, R = R^T > 0 are state cost, final state cost, and input cost matrices

. . . an infinite-dimensional problem: (trajectory u : [0, T] → R^m is the variable)
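To make the problem concrete, here is a minimal Python sketch (the double-integrator data A, B, Q, R, Qf, T, x0 and the helper lqr_cost below are assumptions for illustration, not part of the lecture) that evaluates J for a candidate input signal by simulating ẋ = Ax + Bu and accumulating the running cost; any choice of the trajectory u gives some value of J, and the LQR problem asks for the minimizing one:

```python
import numpy as np
from scipy.integrate import solve_ivp

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)             # state cost, Q = Q^T >= 0
R = np.array([[1.0]])     # input cost, R = R^T > 0
Qf = np.eye(2)            # final state cost
T = 10.0
x0 = np.array([1.0, 0.0])

def lqr_cost(u_of):
    """Evaluate J for a given input signal, supplied as a function of (t, x)."""
    def rhs(t, z):
        x = z[:2]
        u = u_of(t, x)
        running = x @ Q @ x + u @ R @ u              # integrand of J
        return np.concatenate([A @ x + B @ u, [running]])
    sol = solve_ivp(rhs, (0.0, T), np.concatenate([x0, [0.0]]), rtol=1e-8)
    xT, integral = sol.y[:2, -1], sol.y[2, -1]
    return integral + xT @ Qf @ xT                   # add terminal cost

print(lqr_cost(lambda t, x: np.zeros(1)))                 # u = 0 (open loop)
print(lqr_cost(lambda t, x: -np.array([x[0] + x[1]])))    # a feedback guess
```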



Dynamic programming solution

we'll solve LQR problem using dynamic programming

for 0 ≤ t ≤ T we define the value function Vt : R^n → R by


V_t(z) = \min_u \int_t^T \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau + x(T)^T Q_f x(T)

subject to x(t) = z, ẋ = Ax + Bu

minimum is taken over all possible signals u : [t, T] → R^m


Vt(z) gives the minimum LQR cost-to-go, starting from state z at time t
V_T(z) = z^T Q_f z



fact: Vt is quadratic, i.e., Vt(z) = z^T Pt z, where Pt = Pt^T ≥ 0

similar to discrete-time case:

Pt can be found from a differential equation running backward in time from t = T

the LQR optimal u is easily expressed in terms of Pt



we start with x(t) = z

let's take u(t) = w ∈ R^m, a constant, over the time interval [t, t + h],
where h > 0 is small

cost incurred over [t, t + h] is


\int_t^{t+h} \left( x(\tau)^T Q x(\tau) + w^T R w \right) d\tau \approx h \left( z^T Q z + w^T R w \right)

and we end up at x(t + h) ≈ z + h(Az + Bw)



min-cost-to-go from where we land is approximately

V_{t+h}(z + h(Az + Bw))
    = (z + h(Az + Bw))^T P_{t+h} (z + h(Az + Bw))
    \approx (z + h(Az + Bw))^T (P_t + h \dot{P}_t) (z + h(Az + Bw))
    \approx z^T P_t z + h \left( (Az + Bw)^T P_t z + z^T P_t (Az + Bw) + z^T \dot{P}_t z \right)

(dropping h^2 and higher terms)

cost incurred plus min-cost-to-go is approximately


 
z^T P_t z + h \left( z^T Q z + w^T R w + (Az + Bw)^T P_t z + z^T P_t (Az + Bw) + z^T \dot{P}_t z \right)

minimize over w to get (approximately) optimal w:

2 h w^T R + 2 h z^T P_t B = 0



so
w^* = -R^{-1} B^T P_t z

thus optimal u is time-varying linear state feedback:

u_{lqr}(t) = K_t x(t), \qquad K_t = -R^{-1} B^T P_t



HJ equation

now let's substitute w^* into the HJ equation:

z^T P_t z \approx z^T P_t z + h \left( z^T Q z + w^{*T} R w^* + (Az + Bw^*)^T P_t z + z^T P_t (Az + Bw^*) + z^T \dot{P}_t z \right)

yields, after simplification,

-\dot{P}_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q

which is the Riccati differential equation for the LQR problem

we can solve it (numerically) using the final condition PT = Qf
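As a hedged numerical sketch of this step (double-integrator data assumed for illustration; a generic ODE solver is one reasonable choice, not the only one), one can integrate the Riccati differential equation backward from P(T) = Qf and read off Kt = -R^{-1} B^T Pt:

```python
import numpy as np
from scipy.integrate import solve_ivp

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 10.0
Rinv = np.linalg.inv(R)

def riccati_rhs(t, p):
    # dP/dt from  -Pdot = A^T P + P A - P B R^{-1} B^T P + Q
    P = p.reshape(2, 2)
    return (-(A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q)).ravel()

# integrate backward in time from the final condition P(T) = Qf
sol = solve_ivp(riccati_rhs, (T, 0.0), Qf.ravel(), dense_output=True, rtol=1e-8)

def K(t):
    # time-varying LQR gain K_t = -R^{-1} B^T P_t
    return -Rinv @ B.T @ sol.sol(t).reshape(2, 2)

print(K(T))     # at the horizon this equals -R^{-1} B^T Qf
print(K(0.0))   # far from the horizon it is close to the steady-state gain
```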



Summary of cts-time LQR solution via DP

1. solve Riccati differential equation

-\dot{P}_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q, \qquad P_T = Q_f

(backward in time)

2. optimal u is ulqr(t) = Kt x(t), Kt := -R^{-1} B^T Pt

DP method readily extends to time-varying A, B, Q, R, and tracking problem



Steady-state regulator

usually Pt rapidly converges as t decreases below T

limit Pss satisfies (cts-time) algebraic Riccati equation (ARE)

A^T P + P A - P B R^{-1} B^T P + Q = 0

a quadratic matrix equation

Pss can be found by (numerically) integrating the Riccati differential equation, or by direct methods

for t not close to horizon T, LQR optimal input is approximately a linear, constant state feedback

u(t) = K_{ss} x(t), \qquad K_{ss} = -R^{-1} B^T P_{ss}
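One direct method that is readily available: scipy.linalg.solve_continuous_are solves the ARE numerically. A small sketch (double-integrator data assumed for illustration):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

# direct solution of  A^T P + P A - P B R^{-1} B^T P + Q = 0
Pss = solve_continuous_are(A, B, Q, R)
Kss = -np.linalg.solve(R, B.T @ Pss)      # Kss = -R^{-1} B^T Pss

# the ARE residual should be zero up to round-off
res = A.T @ Pss + Pss @ A - Pss @ B @ np.linalg.solve(R, B.T @ Pss) + Q
print(Pss)
print(Kss)
print(np.abs(res).max())
```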



Derivation via discretization

let's discretize using small step size h > 0, with Nh = T

x((k + 1)h) \approx x(kh) + h \dot{x}(kh) = (I + hA) x(kh) + h B u(kh)

J \approx h \sum_{k=0}^{N-1} \left( x(kh)^T Q x(kh) + u(kh)^T R u(kh) \right) + x(Nh)^T Q_f x(Nh)

this yields a discrete-time LQR problem, with parameters

\tilde{A} = I + hA, \quad \tilde{B} = hB, \quad \tilde{Q} = hQ, \quad \tilde{R} = hR, \quad \tilde{Q}_f = Q_f



solution to discrete-time LQR problem is u(kh) = Kk x(kh),

K_k = -(\tilde{R} + \tilde{B}^T P_{k+1} \tilde{B})^{-1} \tilde{B}^T P_{k+1} \tilde{A}

P_{k-1} = \tilde{Q} + \tilde{A}^T P_k \tilde{A} - \tilde{A}^T P_k \tilde{B} (\tilde{R} + \tilde{B}^T P_k \tilde{B})^{-1} \tilde{B}^T P_k \tilde{A}

substituting and keeping only h^0 and h^1 terms yields

P_{k-1} = hQ + P_k + h A^T P_k + h P_k A - h P_k B R^{-1} B^T P_k

which is the same as


\frac{1}{h} (P_k - P_{k-1}) = Q + A^T P_k + P_k A - P_k B R^{-1} B^T P_k

letting h → 0 we see that Pk → P(kh), where

-\dot{P} = Q + A^T P + P A - P B R^{-1} B^T P
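A minimal sketch of this limiting argument (double-integrator data assumed for illustration; the helper name is mine): run the discrete-time Riccati recursion for the discretized problem with progressively smaller h and observe that P at t = 0 barely changes:

```python
import numpy as np

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 10.0

def P_at_0(h):
    """Discrete Riccati recursion for the discretized problem, from P_N = Qf down to P_0."""
    N = int(round(T / h))
    At, Bt, Qt, Rt = np.eye(2) + h * A, h * B, h * Q, h * R   # Atilde, Btilde, ...
    P = Qf.copy()
    for _ in range(N):
        S = Rt + Bt.T @ P @ Bt
        P = Qt + At.T @ P @ At - At.T @ P @ Bt @ np.linalg.solve(S, Bt.T @ P @ At)
    return P

# P_0 changes little as h shrinks, consistent with Pk -> P(kh)
print(P_at_0(1e-2))
print(P_at_0(1e-3))
```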



similarly, we have

K_k = -(\tilde{R} + \tilde{B}^T P_{k+1} \tilde{B})^{-1} \tilde{B}^T P_{k+1} \tilde{A}
    = -(hR + h^2 B^T P_{k+1} B)^{-1} h B^T P_{k+1} (I + hA)
    \to -R^{-1} B^T P(kh)

as h → 0



Derivation using Lagrange multipliers

pose as constrained problem:

minimize     J = \frac{1}{2} \int_0^T \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau + \frac{1}{2} x(T)^T Q_f x(T)
subject to   \dot{x}(t) = A x(t) + B u(t), \quad t \in [0, T]

optimization variable is function u : [0, T] → R^m


infinite number of equality constraints, one for each t ∈ [0, T]

introduce Lagrange multiplier function λ : [0, T] → R^n and form


L = J + \int_0^T \lambda(\tau)^T \left( A x(\tau) + B u(\tau) - \dot{x}(\tau) \right) d\tau



Optimality conditions

(note: you need distribution theory to really make sense of the derivatives here . . . )

from ∇u(t)L = Ru(t) + B^T λ(t) = 0 we get u(t) = -R^{-1} B^T λ(t)

to find ∇x(t)L, we use

\int_0^T \lambda(\tau)^T \dot{x}(\tau) \, d\tau = \lambda(T)^T x(T) - \lambda(0)^T x(0) - \int_0^T \dot{\lambda}(\tau)^T x(\tau) \, d\tau

from ∇x(t)L = Qx(t) + A^T λ(t) + λ̇(t) = 0 we get

\dot{\lambda}(t) = -A^T \lambda(t) - Q x(t)

from ∇x(T)L = Qf x(T) - λ(T) = 0, we get λ(T) = Qf x(T)



Co-state equations

optimality conditions are

\dot{x} = Ax + Bu, \quad x(0) = x_0, \qquad \dot{\lambda} = -A^T \lambda - Q x, \quad \lambda(T) = Q_f x(T)

using u(t) = -R^{-1} B^T λ(t), can write as

\frac{d}{dt} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix}

the 2n × 2n matrix above is called the Hamiltonian for the problem


with conditions x(0) = x0, λ(T) = Qf x(T), called two-point boundary value problem
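One way to see the two-point boundary value structure numerically is to hand the Hamiltonian system, with exactly these boundary conditions, to a generic BVP solver. The sketch below (double-integrator data assumed for illustration; scipy.integrate.solve_bvp with a zero initial guess) only illustrates the boundary conditions; it is not meant to suggest this is how LQR is solved in practice:

```python
import numpy as np
from scipy.integrate import solve_bvp

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 10.0
x0 = np.array([1.0, 0.0])

# Hamiltonian of the problem
H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

def odefun(t, y):
    # y stacks [x; lambda] at each mesh point (shape 4 x len(t))
    return H @ y

def bc(ya, yb):
    # boundary conditions: x(0) = x0 and lambda(T) = Qf x(T)
    return np.concatenate([ya[:2] - x0, yb[2:] - Qf @ yb[:2]])

t_mesh = np.linspace(0.0, T, 50)
sol = solve_bvp(odefun, bc, t_mesh, np.zeros((4, t_mesh.size)))

# recover the optimal input at t = 0 from the costate: u = -R^{-1} B^T lambda
lam0 = sol.sol(0.0)[2:]
print(sol.status, -np.linalg.solve(R, B.T @ lam0))
```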



as in the discrete-time case, we can show that λ(t) = Pt x(t), where

-\dot{P}_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q, \qquad P_T = Q_f

in other words, value function Pt gives simple relation between x and λ

to show this, we show that λ = Px satisfies the co-state equation


\dot{\lambda} = -A^T \lambda - Q x

\dot{\lambda} = \frac{d}{dt} (P x) = \dot{P} x + P \dot{x}
    = -(Q + A^T P + P A - P B R^{-1} B^T P) x + P (A x - B R^{-1} B^T P x)
    = -Q x - A^T P x + P B R^{-1} B^T P x - P B R^{-1} B^T P x
    = -Q x - A^T \lambda



Solving Riccati differential equation via Hamiltonian

the (quadratic) Riccati differential equation

-\dot{P} = A^T P + P A - P B R^{-1} B^T P + Q

and the (linear) Hamiltonian differential equation

\frac{d}{dt} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix}

are closely related

λ(t) = Pt x(t) suggests that P should have the form Pt = λ(t) x(t)^{-1}

(but this doesn't make sense unless x and λ are scalars)



consider the Hamiltonian matrix (linear) differential equation

\frac{d}{dt} \begin{bmatrix} X(t) \\ Y(t) \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} X(t) \\ Y(t) \end{bmatrix}

where X(t), Y(t) ∈ R^{n×n}

then, Z(t) = Y(t) X(t)^{-1} satisfies the Riccati differential equation

-\dot{Z} = A^T Z + Z A - Z B R^{-1} B^T Z + Q

hence we can solve the Riccati DE by solving the (linear) matrix Hamiltonian DE, with final conditions X(T) = I, Y(T) = Qf, and forming P(t) = Y(t) X(t)^{-1}
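A short sketch of this recipe (double-integrator data assumed for illustration): integrate the matrix Hamiltonian DE backward from X(T) = I, Y(T) = Qf and form P(t) = Y(t) X(t)^{-1}:

```python
import numpy as np
from scipy.integrate import solve_ivp

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 10.0

H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

def rhs(t, z):
    # z holds the 4x2 matrix [X; Y] flattened
    return (H @ z.reshape(4, 2)).ravel()

# final conditions X(T) = I, Y(T) = Qf; integrate backward to t = 0
MT = np.vstack([np.eye(2), Qf])
sol = solve_ivp(rhs, (T, 0.0), MT.ravel(), dense_output=True, rtol=1e-8)

def P(t):
    M = sol.sol(t).reshape(4, 2)
    X, Y = M[:2], M[2:]
    return Y @ np.linalg.inv(X)       # P(t) = Y(t) X(t)^{-1}

print(P(0.0))   # agrees with integrating the Riccati DE directly
```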



\dot{Z} = \frac{d}{dt} (Y X^{-1})
    = \dot{Y} X^{-1} - Y X^{-1} \dot{X} X^{-1}
    = (-Q X - A^T Y) X^{-1} - Y X^{-1} (A X - B R^{-1} B^T Y) X^{-1}
    = -Q - A^T Z - Z A + Z B R^{-1} B^T Z

where we use two identities:

\frac{d}{dt} \left( F(t) G(t) \right) = \dot{F}(t) G(t) + F(t) \dot{G}(t)

\frac{d}{dt} F(t)^{-1} = -F(t)^{-1} \dot{F}(t) F(t)^{-1}



Infinite horizon LQR

we now consider the infinite horizon cost function


J = \int_0^\infty \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau

we define the value function as


V(z) = \min_u \int_0^\infty \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau

subject to x(0) = z, ẋ = Ax + Bu

we assume that (A, B) is controllable, so V is finite for all z

can show that V is quadratic: V(z) = z^T P z, where P = P^T ≥ 0



optimal u is u(t) = Kx(t), where K = -R^{-1} B^T P
(i.e., a constant linear state feedback)

HJ equation is ARE

Q + A^T P + P A - P B R^{-1} B^T P = 0

which together with P ≥ 0 characterizes P

can solve as limiting value of Riccati DE, or via direct method



Closed-loop system

with K LQR optimal state feedback gain, closed-loop system is

\dot{x} = Ax + Bu = (A + BK)x

fact: closed-loop system is stable when (Q, A) observable and (A, B) controllable

we denote eigenvalues of A + BK, called closed-loop eigenvalues, as λ1, . . . , λn

with assumptions above, ℜλi < 0
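A quick numerical check of this fact (double-integrator data assumed for illustration; (A, B) is controllable and (Q, A) is observable here): compute K from the ARE and look at the eigenvalues of A + BK:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = -np.linalg.solve(R, B.T @ P)       # K = -R^{-1} B^T P

eigs = np.linalg.eigvals(A + B @ K)    # closed-loop eigenvalues
print(eigs, np.all(eigs.real < 0))     # real parts are all negative
```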



Solving ARE via Hamiltonian

\begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} I \\ P \end{bmatrix} = \begin{bmatrix} A - BR^{-1}B^T P \\ -Q - A^T P \end{bmatrix} = \begin{bmatrix} A + BK \\ -Q - A^T P \end{bmatrix}

and so
\begin{bmatrix} I & 0 \\ -P & I \end{bmatrix} \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} I & 0 \\ P & I \end{bmatrix} = \begin{bmatrix} A + BK & -BR^{-1}B^T \\ 0 & -(A + BK)^T \end{bmatrix}

where 0 in lower left corner comes from ARE

note that
\begin{bmatrix} I & 0 \\ P & I \end{bmatrix}^{-1} = \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix}



we see that:

eigenvalues of Hamiltonian H are λ1, . . . , λn and -λ1, . . . , -λn

hence, closed-loop eigenvalues are the eigenvalues of H with negative real part



let's assume A + BK is diagonalizable, i.e.,

T^{-1} (A + BK) T = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)

then we have T^T (A + BK)^T T^{-T} = \Lambda, so

\begin{bmatrix} T^{-1} & 0 \\ 0 & T^T \end{bmatrix} \begin{bmatrix} A + BK & -BR^{-1}B^T \\ 0 & -(A + BK)^T \end{bmatrix} \begin{bmatrix} T & 0 \\ 0 & T^{-T} \end{bmatrix} = \begin{bmatrix} \Lambda & -T^{-1} B R^{-1} B^T T^{-T} \\ 0 & -\Lambda \end{bmatrix}



putting it together we get
     
\begin{bmatrix} T^{-1} & 0 \\ 0 & T^T \end{bmatrix} \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix} H \begin{bmatrix} I & 0 \\ P & I \end{bmatrix} \begin{bmatrix} T & 0 \\ 0 & T^{-T} \end{bmatrix}
    = \begin{bmatrix} T^{-1} & 0 \\ -T^T P & T^T \end{bmatrix} H \begin{bmatrix} T & 0 \\ P T & T^{-T} \end{bmatrix}
    = \begin{bmatrix} \Lambda & -T^{-1} B R^{-1} B^T T^{-T} \\ 0 & -\Lambda \end{bmatrix}

and so

H \begin{bmatrix} T \\ P T \end{bmatrix} = \begin{bmatrix} T \\ P T \end{bmatrix} \Lambda

thus, the n columns of \begin{bmatrix} T \\ P T \end{bmatrix} are the eigenvectors of H associated with the stable eigenvalues λ1, . . . , λn



Solving ARE via Hamiltonian

find eigenvalues of H, and let λ1, . . . , λn denote the n stable ones
(there are exactly n stable and n unstable ones)

find associated eigenvectors v1, . . . , vn, and partition as

\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} X \\ Y \end{bmatrix} \in R^{2n \times n}

P = Y X^{-1} is the unique PSD solution of the ARE

(this is very close to the method used in practice, which does not require
A + BK to be diagonalizable)
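A sketch of the eigenvector method (double-integrator data assumed for illustration), with scipy.linalg.solve_continuous_are used only as a consistency check:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
n = A.shape[0]

H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

# eigenvectors of H associated with the n stable eigenvalues
evals, evecs = np.linalg.eig(H)
V = evecs[:, evals.real < 0]           # 2n x n, columns stack [X; Y]
X, Y = V[:n], V[n:]
P = np.real(Y @ np.linalg.inv(X))      # P = Y X^{-1}

# consistency check against a direct ARE solver
print(np.abs(P - solve_continuous_are(A, B, Q, R)).max())
```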
