Continuous Time Linear Quadratic Regulator


EE363 Winter 2008-09

Lecture 4
Continuous time linear quadratic regulator

continuous-time LQR problem

dynamic programming solution

Hamiltonian system and two point boundary value problem

infinite horizon LQR

direct solution of ARE via Hamiltonian

Continuous-time LQR problem

continuous-time system ẋ = Ax + Bu, x(0) = x0

problem: choose u : [0, T] → R^m to minimize


J = \int_0^T \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau + x(T)^T Q_f x(T)

T is time horizon
Q = Q^T ≥ 0, Qf = Qf^T ≥ 0, R = R^T > 0 are state cost, final state cost, and input cost matrices

. . . an infinite-dimensional problem: (trajectory u : [0, T] → R^m is the variable)
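To make the problem concrete, here is a minimal Python sketch (the double-integrator data A, B, Q, R, Qf, T, x0 and the helper lqr_cost below are assumptions for illustration, not part of the lecture) that evaluates J for a candidate input signal by simulating ẋ = Ax + Bu and accumulating the running cost; any choice of the trajectory u gives some value of J, and the LQR problem asks for the minimizing one:

```python
import numpy as np
from scipy.integrate import solve_ivp

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)             # state cost, Q = Q^T >= 0
R = np.array([[1.0]])     # input cost, R = R^T > 0
Qf = np.eye(2)            # final state cost
T = 10.0
x0 = np.array([1.0, 0.0])

def lqr_cost(u_of):
    """Evaluate J for a given input signal, supplied as a function of (t, x)."""
    def rhs(t, z):
        x = z[:2]
        u = u_of(t, x)
        running = x @ Q @ x + u @ R @ u              # integrand of J
        return np.concatenate([A @ x + B @ u, [running]])
    sol = solve_ivp(rhs, (0.0, T), np.concatenate([x0, [0.0]]), rtol=1e-8)
    xT, integral = sol.y[:2, -1], sol.y[2, -1]
    return integral + xT @ Qf @ xT                   # add terminal cost

print(lqr_cost(lambda t, x: np.zeros(1)))                 # u = 0 (open loop)
print(lqr_cost(lambda t, x: -np.array([x[0] + x[1]])))    # a feedback guess
```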



Dynamic programming solution

we'll solve LQR problem using dynamic programming

for 0 ≤ t ≤ T we define the value function Vt : R^n → R by


V_t(z) = \min_u \int_t^T \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau + x(T)^T Q_f x(T)

subject to x(t) = z, ẋ = Ax + Bu

minimum is taken over all possible signals u : [t, T] → R^m


Vt(z) gives the minimum LQR cost-to-go, starting from state z at time t
V_T(z) = z^T Q_f z



fact: Vt is quadratic, i.e., Vt(z) = z^T Pt z, where Pt = Pt^T ≥ 0

similar to discrete-time case:

Pt can be found from a differential equation running backward in time from t = T

the LQR optimal u is easily expressed in terms of Pt



we start with x(t) = z

let's take u(t) = w ∈ R^m, a constant, over the time interval [t, t + h],
where h > 0 is small

cost incurred over [t, t + h] is


\int_t^{t+h} \left( x(\tau)^T Q x(\tau) + w^T R w \right) d\tau \approx h \left( z^T Q z + w^T R w \right)

and we end up at x(t + h) ≈ z + h(Az + Bw)



min-cost-to-go from where we land is approximately

V_{t+h}(z + h(Az + Bw))
    = (z + h(Az + Bw))^T P_{t+h} (z + h(Az + Bw))
    \approx (z + h(Az + Bw))^T (P_t + h \dot{P}_t) (z + h(Az + Bw))
    \approx z^T P_t z + h \left( (Az + Bw)^T P_t z + z^T P_t (Az + Bw) + z^T \dot{P}_t z \right)

(dropping h^2 and higher terms)

cost incurred plus min-cost-to-go is approximately


 
z^T P_t z + h \left( z^T Q z + w^T R w + (Az + Bw)^T P_t z + z^T P_t (Az + Bw) + z^T \dot{P}_t z \right)

minimize over w to get (approximately) optimal w:

2 h w^T R + 2 h z^T P_t B = 0



so
w^* = -R^{-1} B^T P_t z

thus optimal u is time-varying linear state feedback:

u_{lqr}(t) = K_t x(t), \qquad K_t = -R^{-1} B^T P_t



HJ equation

now let's substitute w^* into the HJ equation:

z^T P_t z \approx z^T P_t z + h \left( z^T Q z + w^{*T} R w^* + (Az + Bw^*)^T P_t z + z^T P_t (Az + Bw^*) + z^T \dot{P}_t z \right)

yields, after simplification,

-\dot{P}_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q

which is the Riccati differential equation for the LQR problem

we can solve it (numerically) using the final condition PT = Qf
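As a hedged numerical sketch of this step (double-integrator data assumed for illustration; a generic ODE solver is one reasonable choice, not the only one), one can integrate the Riccati differential equation backward from P(T) = Qf and read off Kt = -R^{-1} B^T Pt:

```python
import numpy as np
from scipy.integrate import solve_ivp

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 10.0
Rinv = np.linalg.inv(R)

def riccati_rhs(t, p):
    # dP/dt from  -Pdot = A^T P + P A - P B R^{-1} B^T P + Q
    P = p.reshape(2, 2)
    return (-(A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q)).ravel()

# integrate backward in time from the final condition P(T) = Qf
sol = solve_ivp(riccati_rhs, (T, 0.0), Qf.ravel(), dense_output=True, rtol=1e-8)

def K(t):
    # time-varying LQR gain K_t = -R^{-1} B^T P_t
    return -Rinv @ B.T @ sol.sol(t).reshape(2, 2)

print(K(T))     # at the horizon this equals -R^{-1} B^T Qf
print(K(0.0))   # far from the horizon it is close to the steady-state gain
```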



Summary of cts-time LQR solution via DP

1. solve Riccati differential equation

-\dot{P}_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q, \qquad P_T = Q_f

(backward in time)

2. optimal u is ulqr(t) = Kt x(t), Kt := -R^{-1} B^T Pt

DP method readily extends to time-varying A, B, Q, R, and tracking problem



Steady-state regulator

usually Pt rapidly converges as t decreases below T

limit Pss satisfies (cts-time) algebraic Riccati equation (ARE)

A^T P + P A - P B R^{-1} B^T P + Q = 0

a quadratic matrix equation

Pss can be found by (numerically) integrating the Riccati differential equation, or by direct methods

for t not close to horizon T, LQR optimal input is approximately a linear, constant state feedback

u(t) = K_{ss} x(t), \qquad K_{ss} = -R^{-1} B^T P_{ss}
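One direct method that is readily available: scipy.linalg.solve_continuous_are solves the ARE numerically. A small sketch (double-integrator data assumed for illustration):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

# direct solution of  A^T P + P A - P B R^{-1} B^T P + Q = 0
Pss = solve_continuous_are(A, B, Q, R)
Kss = -np.linalg.solve(R, B.T @ Pss)      # Kss = -R^{-1} B^T Pss

# the ARE residual should be zero up to round-off
res = A.T @ Pss + Pss @ A - Pss @ B @ np.linalg.solve(R, B.T @ Pss) + Q
print(Pss)
print(Kss)
print(np.abs(res).max())
```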



Derivation via discretization

let's discretize using small step size h > 0, with Nh = T

x((k + 1)h) \approx x(kh) + h \dot{x}(kh) = (I + hA) x(kh) + h B u(kh)

J \approx h \sum_{k=0}^{N-1} \left( x(kh)^T Q x(kh) + u(kh)^T R u(kh) \right) + x(Nh)^T Q_f x(Nh)

this yields a discrete-time LQR problem, with parameters

\tilde{A} = I + hA, \quad \tilde{B} = hB, \quad \tilde{Q} = hQ, \quad \tilde{R} = hR, \quad \tilde{Q}_f = Q_f



solution to discrete-time LQR problem is u(kh) = Kk x(kh),

K_k = -(\tilde{R} + \tilde{B}^T P_{k+1} \tilde{B})^{-1} \tilde{B}^T P_{k+1} \tilde{A}

P_{k-1} = \tilde{Q} + \tilde{A}^T P_k \tilde{A} - \tilde{A}^T P_k \tilde{B} (\tilde{R} + \tilde{B}^T P_k \tilde{B})^{-1} \tilde{B}^T P_k \tilde{A}

substituting and keeping only h^0 and h^1 terms yields

P_{k-1} = hQ + P_k + h A^T P_k + h P_k A - h P_k B R^{-1} B^T P_k

which is the same as


\frac{1}{h} (P_k - P_{k-1}) = Q + A^T P_k + P_k A - P_k B R^{-1} B^T P_k

letting h → 0 we see that Pk → P(kh), where

-\dot{P} = Q + A^T P + P A - P B R^{-1} B^T P
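A minimal sketch of this limiting argument (double-integrator data assumed for illustration; the helper name is mine): run the discrete-time Riccati recursion for the discretized problem with progressively smaller h and observe that P at t = 0 barely changes:

```python
import numpy as np

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 10.0

def P_at_0(h):
    """Discrete Riccati recursion for the discretized problem, from P_N = Qf down to P_0."""
    N = int(round(T / h))
    At, Bt, Qt, Rt = np.eye(2) + h * A, h * B, h * Q, h * R   # Atilde, Btilde, ...
    P = Qf.copy()
    for _ in range(N):
        S = Rt + Bt.T @ P @ Bt
        P = Qt + At.T @ P @ At - At.T @ P @ Bt @ np.linalg.solve(S, Bt.T @ P @ At)
    return P

# P_0 changes little as h shrinks, consistent with Pk -> P(kh)
print(P_at_0(1e-2))
print(P_at_0(1e-3))
```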



similarly, we have

K_k = -(\tilde{R} + \tilde{B}^T P_{k+1} \tilde{B})^{-1} \tilde{B}^T P_{k+1} \tilde{A}
    = -(hR + h^2 B^T P_{k+1} B)^{-1} h B^T P_{k+1} (I + hA)
    \to -R^{-1} B^T P(kh)

as h → 0



Derivation using Lagrange multipliers

pose as constrained problem:

minimize     J = \frac{1}{2} \int_0^T \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau + \frac{1}{2} x(T)^T Q_f x(T)
subject to   \dot{x}(t) = A x(t) + B u(t), \quad t \in [0, T]

optimization variable is function u : [0, T] → R^m


infinite number of equality constraints, one for each t ∈ [0, T]

introduce Lagrange multiplier function λ : [0, T] → R^n and form


L = J + \int_0^T \lambda(\tau)^T \left( A x(\tau) + B u(\tau) - \dot{x}(\tau) \right) d\tau



Optimality conditions

(note: you need distribution theory to really make sense of the derivatives here . . . )

from ∇u(t)L = Ru(t) + B^T λ(t) = 0 we get u(t) = -R^{-1} B^T λ(t)

to find ∇x(t)L, we use

\int_0^T \lambda(\tau)^T \dot{x}(\tau) \, d\tau = \lambda(T)^T x(T) - \lambda(0)^T x(0) - \int_0^T \dot{\lambda}(\tau)^T x(\tau) \, d\tau

from ∇x(t)L = Qx(t) + A^T λ(t) + λ̇(t) = 0 we get

\dot{\lambda}(t) = -A^T \lambda(t) - Q x(t)

from ∇x(T)L = Qf x(T) - λ(T) = 0, we get λ(T) = Qf x(T)



Co-state equations

optimality conditions are

\dot{x} = Ax + Bu, \quad x(0) = x_0, \qquad \dot{\lambda} = -A^T \lambda - Q x, \quad \lambda(T) = Q_f x(T)

using u(t) = -R^{-1} B^T λ(t), can write as

\frac{d}{dt} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix}

the 2n × 2n matrix above is called the Hamiltonian for the problem


with conditions x(0) = x0, λ(T) = Qf x(T), called two-point boundary value problem
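One way to see the two-point boundary value structure numerically is to hand the Hamiltonian system, with exactly these boundary conditions, to a generic BVP solver. The sketch below (double-integrator data assumed for illustration; scipy.integrate.solve_bvp with a zero initial guess) only illustrates the boundary conditions; it is not meant to suggest this is how LQR is solved in practice:

```python
import numpy as np
from scipy.integrate import solve_bvp

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 10.0
x0 = np.array([1.0, 0.0])

# Hamiltonian of the problem
H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

def odefun(t, y):
    # y stacks [x; lambda] at each mesh point (shape 4 x len(t))
    return H @ y

def bc(ya, yb):
    # boundary conditions: x(0) = x0 and lambda(T) = Qf x(T)
    return np.concatenate([ya[:2] - x0, yb[2:] - Qf @ yb[:2]])

t_mesh = np.linspace(0.0, T, 50)
sol = solve_bvp(odefun, bc, t_mesh, np.zeros((4, t_mesh.size)))

# recover the optimal input at t = 0 from the costate: u = -R^{-1} B^T lambda
lam0 = sol.sol(0.0)[2:]
print(sol.status, -np.linalg.solve(R, B.T @ lam0))
```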



as in the discrete-time case, we can show that λ(t) = Pt x(t), where

-\dot{P}_t = A^T P_t + P_t A - P_t B R^{-1} B^T P_t + Q, \qquad P_T = Q_f

in other words, value function Pt gives simple relation between x and λ

to show this, we show that λ = Px satisfies the co-state equation


\dot{\lambda} = -A^T \lambda - Q x

\dot{\lambda} = \frac{d}{dt} (P x) = \dot{P} x + P \dot{x}
    = -(Q + A^T P + P A - P B R^{-1} B^T P) x + P (A x - B R^{-1} B^T P x)
    = -Q x - A^T P x + P B R^{-1} B^T P x - P B R^{-1} B^T P x
    = -Q x - A^T \lambda



Solving Riccati differential equation via Hamiltonian

the (quadratic) Riccati differential equation

-\dot{P} = A^T P + P A - P B R^{-1} B^T P + Q

and the (linear) Hamiltonian differential equation

\frac{d}{dt} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix}

are closely related

λ(t) = Pt x(t) suggests that P should have the form Pt = λ(t) x(t)^{-1}

(but this doesn't make sense unless x and λ are scalars)



consider the Hamiltonian matrix (linear) differential equation

\frac{d}{dt} \begin{bmatrix} X(t) \\ Y(t) \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} X(t) \\ Y(t) \end{bmatrix}

where X(t), Y(t) ∈ R^{n×n}

then, Z(t) = Y(t) X(t)^{-1} satisfies the Riccati differential equation

-\dot{Z} = A^T Z + Z A - Z B R^{-1} B^T Z + Q

hence we can solve the Riccati DE by solving the (linear) matrix Hamiltonian DE, with final conditions X(T) = I, Y(T) = Qf, and forming P(t) = Y(t) X(t)^{-1}
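A short sketch of this recipe (double-integrator data assumed for illustration): integrate the matrix Hamiltonian DE backward from X(T) = I, Y(T) = Qf and form P(t) = Y(t) X(t)^{-1}:

```python
import numpy as np
from scipy.integrate import solve_ivp

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 10.0

H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

def rhs(t, z):
    # z holds the 4x2 matrix [X; Y] flattened
    return (H @ z.reshape(4, 2)).ravel()

# final conditions X(T) = I, Y(T) = Qf; integrate backward to t = 0
MT = np.vstack([np.eye(2), Qf])
sol = solve_ivp(rhs, (T, 0.0), MT.ravel(), dense_output=True, rtol=1e-8)

def P(t):
    M = sol.sol(t).reshape(4, 2)
    X, Y = M[:2], M[2:]
    return Y @ np.linalg.inv(X)       # P(t) = Y(t) X(t)^{-1}

print(P(0.0))   # agrees with integrating the Riccati DE directly
```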



\dot{Z} = \frac{d}{dt} (Y X^{-1})
    = \dot{Y} X^{-1} - Y X^{-1} \dot{X} X^{-1}
    = (-Q X - A^T Y) X^{-1} - Y X^{-1} (A X - B R^{-1} B^T Y) X^{-1}
    = -Q - A^T Z - Z A + Z B R^{-1} B^T Z

where we use two identities:

\frac{d}{dt} \left( F(t) G(t) \right) = \dot{F}(t) G(t) + F(t) \dot{G}(t)

\frac{d}{dt} F(t)^{-1} = -F(t)^{-1} \dot{F}(t) F(t)^{-1}



Infinite horizon LQR

we now consider the infinite horizon cost function


J = \int_0^\infty \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau

we define the value function as


V(z) = \min_u \int_0^\infty \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) d\tau

subject to x(0) = z, ẋ = Ax + Bu

we assume that (A, B) is controllable, so V is finite for all z

can show that V is quadratic: V(z) = z^T P z, where P = P^T ≥ 0



optimal u is u(t) = Kx(t), where K = -R^{-1} B^T P
(i.e., a constant linear state feedback)

HJ equation is ARE

Q + A^T P + P A - P B R^{-1} B^T P = 0

which together with P ≥ 0 characterizes P

can solve as limiting value of Riccati DE, or via direct method



Closed-loop system

with K LQR optimal state feedback gain, closed-loop system is

\dot{x} = Ax + Bu = (A + BK)x

fact: closed-loop system is stable when (Q, A) observable and (A, B) controllable

we denote eigenvalues of A + BK, called closed-loop eigenvalues, as λ1, . . . , λn

with assumptions above, ℜλi < 0
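A quick numerical check of this fact (double-integrator data assumed for illustration; (A, B) is controllable and (Q, A) is observable here): compute K from the ARE and look at the eigenvalues of A + BK:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = -np.linalg.solve(R, B.T @ P)       # K = -R^{-1} B^T P

eigs = np.linalg.eigvals(A + B @ K)    # closed-loop eigenvalues
print(eigs, np.all(eigs.real < 0))     # real parts are all negative
```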



Solving ARE via Hamiltonian

\begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} I \\ P \end{bmatrix} = \begin{bmatrix} A - BR^{-1}B^T P \\ -Q - A^T P \end{bmatrix} = \begin{bmatrix} A + BK \\ -Q - A^T P \end{bmatrix}

and so
\begin{bmatrix} I & 0 \\ -P & I \end{bmatrix} \begin{bmatrix} A & -BR^{-1}B^T \\ -Q & -A^T \end{bmatrix} \begin{bmatrix} I & 0 \\ P & I \end{bmatrix} = \begin{bmatrix} A + BK & -BR^{-1}B^T \\ 0 & -(A + BK)^T \end{bmatrix}

where 0 in lower left corner comes from ARE

note that
\begin{bmatrix} I & 0 \\ P & I \end{bmatrix}^{-1} = \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix}



we see that:

eigenvalues of Hamiltonian H are λ1, . . . , λn and -λ1, . . . , -λn

hence, closed-loop eigenvalues are the eigenvalues of H with negative real part



let's assume A + BK is diagonalizable, i.e.,

T^{-1} (A + BK) T = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)

then we have T^T (A + BK)^T T^{-T} = \Lambda, so

\begin{bmatrix} T^{-1} & 0 \\ 0 & T^T \end{bmatrix} \begin{bmatrix} A + BK & -BR^{-1}B^T \\ 0 & -(A + BK)^T \end{bmatrix} \begin{bmatrix} T & 0 \\ 0 & T^{-T} \end{bmatrix} = \begin{bmatrix} \Lambda & -T^{-1} B R^{-1} B^T T^{-T} \\ 0 & -\Lambda \end{bmatrix}



putting it together we get
     
\begin{bmatrix} T^{-1} & 0 \\ 0 & T^T \end{bmatrix} \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix} H \begin{bmatrix} I & 0 \\ P & I \end{bmatrix} \begin{bmatrix} T & 0 \\ 0 & T^{-T} \end{bmatrix}
    = \begin{bmatrix} T^{-1} & 0 \\ -T^T P & T^T \end{bmatrix} H \begin{bmatrix} T & 0 \\ P T & T^{-T} \end{bmatrix}
    = \begin{bmatrix} \Lambda & -T^{-1} B R^{-1} B^T T^{-T} \\ 0 & -\Lambda \end{bmatrix}

and so

H \begin{bmatrix} T \\ P T \end{bmatrix} = \begin{bmatrix} T \\ P T \end{bmatrix} \Lambda

thus, the n columns of \begin{bmatrix} T \\ P T \end{bmatrix} are the eigenvectors of H associated with the stable eigenvalues λ1, . . . , λn



Solving ARE via Hamiltonian

find eigenvalues of H, and let λ1, . . . , λn denote the n stable ones
(there are exactly n stable and n unstable ones)

find associated eigenvectors v1, . . . , vn, and partition as

\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} X \\ Y \end{bmatrix} \in R^{2n \times n}

P = Y X^{-1} is the unique PSD solution of the ARE

(this is very close to the method used in practice, which does not require
A + BK to be diagonalizable)
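A sketch of the eigenvector method (double-integrator data assumed for illustration), with scipy.linalg.solve_continuous_are used only as a consistency check:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# assumed example data (illustration only): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
n = A.shape[0]

H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

# eigenvectors of H associated with the n stable eigenvalues
evals, evecs = np.linalg.eig(H)
V = evecs[:, evals.real < 0]           # 2n x n, columns stack [X; Y]
X, Y = V[:n], V[n:]
P = np.real(Y @ np.linalg.inv(X))      # P = Y X^{-1}

# consistency check against a direct ARE solver
print(np.abs(P - solve_continuous_are(A, B, Q, R)).max())
```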
