Athans Workshop 10/07
F.L. Lewis
Moncrief-O'Donnell Endowed Chair
Head, Controls & Sensors Group
Supported by:
NSF - Paul Werbos
ARO - Randy Zachery
Mike Athans
Awards
American Automatic Control Council's Donald P. Eckman Award "for outstanding
contributions to the field of automatic control" (first recipient of the award, 1964)
American Society for Engineering Education's Frederick E. Terman Award as "the
outstanding young electrical engineering educator" (first recipient, 1969)
American Automatic Control Council's Education Award for "outstanding contributions and
distinguished leadership in automatic control education" (second recipient, 1980)
Fellow of the IEEE (1973)
Fellow of the AAAS (1977)
Distinguished Member of the IEEE Control Systems Society (1983)
IEEE Control Systems Society's 1993 H.W. Bode Prize, including the delivery of the Bode
Plenary Lecture at the 1993 IEEE Conference on Decision and Control
American Automatic Control Council's Richard E. Bellman Control Heritage Award (1995)
"In Recognition of a Distinguished Career in Automatic Control; As a Leader and Champion
of Innovative Research; As a Contributor to Fundamental Knowledge in Optimal, Adaptive,
Robust, Decentralized and Distributed Control; and as a Mentor to his Students"
Athans & Falb, Optimal Control (McGraw-Hill, 1966). The first book on optimal control?
[Figure: two-layer neural network with inputs x_1, ..., x_n, L hidden-layer neurons, and outputs y_1, ..., y_m; first-layer weights V, second-layer weights W.]

Summation equations:
$$ y_i = \sum_{k=1}^{L} w_{ik}\, \sigma\Big( \sum_{j=1}^{n} v_{kj} x_j + v_{k0} \Big) + w_{i0} $$

Matrix equation:
$$ y = W^T \sigma(V^T x) $$
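As a concrete reading of the matrix equation, here is a minimal numpy sketch of the two-layer network above, assuming a sigmoid activation for σ and absorbing the bias terms v_k0, w_i0 by augmenting the input and hidden vectors with a constant 1; all names and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_nn(x, V, W):
    """Forward pass y = W^T sigma(V^T x), biases absorbed by constant-1 entries."""
    x_aug = np.concatenate(([1.0], x))            # bias input (v_k0 terms)
    hidden = sigmoid(V.T @ x_aug)                 # L hidden-layer outputs
    hidden_aug = np.concatenate(([1.0], hidden))  # bias (w_i0 terms)
    return W.T @ hidden_aug                       # m network outputs

# Example sizes: n = 3 inputs, L = 5 hidden neurons, m = 2 outputs
rng = np.random.default_rng(0)
V = rng.normal(size=(4, 5))   # (n+1) x L first-layer weights
W = rng.normal(size=(6, 2))   # (L+1) x m second-layer weights
y = two_layer_nn(rng.normal(size=3), V, W)
```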
Feedback linearization

[Figure: NN feedback-linearization controller. A PD tracking loop (gains [Λ I] and K_v) wraps the robot system; a nonlinear inner loop supplies the NN estimate f̂(x) of the unknown robot nonlinearities; a feedforward loop injects the desired trajectory q_d, q̇_d, q̈_d. The NN is nonlinear in the parameters (NLIP).]
Control input:
$$ \tau = \hat W^T \sigma(\hat V^T x) + K_v r - v $$
with robustifying term
$$ v(t) = -K_Z (\|\hat Z\|_F + Z_M)\, r. $$
NN weight tuning, consisting of backprop terms (Werbos) plus robustifying terms:
$$ \dot{\hat W} = F \hat\sigma r^T - F \hat\sigma' \hat V^T x\, r^T - \kappa F \|r\| \hat W $$
$$ \dot{\hat V} = G x (\hat\sigma'^T \hat W r)^T - \kappa G \|r\| \hat V $$
with any constant matrices $F = F^T > 0$, $G = G^T > 0$, and scalar tuning parameter $\kappa > 0$. Initialize
the weight estimates as $\hat W = 0$, $\hat V =$ random.
Then the filtered tracking error $r(t)$ and NN weight estimates $\hat W, \hat V$ are uniformly ultimately
bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control
gains $K_v$.
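A rough Euler-integration sketch of these tuning laws, assuming sigmoid activations (so σ̂′ is taken as the diagonal Jacobian diag(σ(1−σ))) and treating F, G, κ, and the time step as user-chosen constants; this illustrates the update structure, not a tuned implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tuning_step(W, V, x, r, F, G, kappa, dt):
    """One Euler step of the NN weight tuning laws above.
    Shapes: W is L x m, V is n x L, x is n, r is m (filtered tracking error)."""
    z = V.T @ x                         # hidden-layer input, length L
    sig = sigmoid(z)                    # sigma-hat
    sig_jac = np.diag(sig * (1 - sig))  # sigma-hat' for sigmoid (assumed form)
    r_norm = np.linalg.norm(r)
    # backprop-like terms plus the robustifying (e-modification) terms
    W_dot = F @ (np.outer(sig, r) - sig_jac @ np.outer(V.T @ x, r)) \
            - kappa * r_norm * (F @ W)
    V_dot = G @ np.outer(x, sig_jac.T @ (W @ r)) - kappa * r_norm * (G @ V)
    return W + dt * W_dot, V + dt * V_dot
```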
Backstepping

[Figure: backstepping NN controller with two neural networks. NN#1 supplies the estimate F̂_1(x) in the outer tracking loop (errors e, ė through [Λ I], gain K_r, plus a robust control term); NN#2 supplies F̂_2(x) in the backstepping loop, which handles the actuator nonlinearities (1/K_B1, gain K) between the desired current i_d and the robot system. Inputs: desired trajectory q_d, q̇_d, q̈_d and reference q_r, q̇_r; output: joint variables q.]
NN Deadzone Precompensator

[Figure: actor-critic NN deadzone precompensator. The actor NN inverts the deadzone D(u) ahead of the mechanical system, using an estimate f̂(x) of the nonlinear function; the critic NN tunes the compensator from the filtered tracking error r (gain K_v, desired trajectory q_d, error e).]

Critic tuning:
$$ \dot{\hat W}_i = T \sigma_i(U_i^T w) r^T \hat W^T \sigma'(U^T u) U^T - k_1 T \|r\| \hat W_i - k_2 T \|r\| \|\hat W_i\| \hat W_i $$
Actor tuning:
$$ \dot{\hat W} = -S \sigma'(U^T u) U^T \hat W_i \sigma_i(U_i^T w) r^T - k_1 S \|r\| \hat W $$
Cellular Metabolism
[Image: cellular metabolism pathways; source: http://www.accessexcellence.org/RC/VL/GG/index.html]
R. Kalman 1960
Dynamics
$$ \dot r = w $$
$$ \dot w = \frac{v^2}{r} - \frac{\mu}{r^2} + \frac{F}{m} \sin\phi $$
$$ \dot v = -\frac{w v}{r} + \frac{F}{m} \cos\phi $$
$$ \dot m = -F_m $$
Objectives
Get to orbit in minimum time
Use minimum fuel
http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf
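For simulation, the dynamics translate directly into a right-hand-side function; a minimal sketch, with the gravitational parameter μ and the fuel burn rate taken as illustrative normalized constants (values assumed, not from the slides).

```python
import numpy as np

def orbit_dynamics(state, phi, F, mu=1.0, fuel_rate=0.07):
    """RHS of the orbit-raising dynamics above.
    state = (r, w, v, m): radius, radial velocity, tangential velocity, mass.
    phi is the thrust-direction control, F the thrust magnitude."""
    r, w, v, m = state
    r_dot = w
    w_dot = v**2 / r - mu / r**2 + (F / m) * np.sin(phi)
    v_dot = -w * v / r + (F / m) * np.cos(phi)
    m_dot = -fuel_rate                 # constant burn rate (assumption)
    return np.array([r_dot, w_dot, v_dot, m_dot])
```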
System:
$$ \dot x = f(x) + g(x)u $$
Cost:
$$ V(x(t)) = \int_t^\infty r(x,u)\, d\tau = \int_t^\infty \big(Q(x) + u^T R u\big)\, d\tau $$
For a given control, the cost satisfies this equation (the Hamiltonian):
$$ 0 = \dot V + r(x,u) = \left(\frac{\partial V}{\partial x}\right)^T f(x,u) + r(x,u) \equiv H\!\left(x, \frac{\partial V}{\partial x}, u\right) $$
Optimal cost (Bellman):
$$ 0 = \min_{u(t)} \left[ r(x,u) + \left(\frac{\partial V^*}{\partial x}\right)^T f(x,u) \right], \qquad V^*(0) = 0 $$
Optimal control:
$$ h(x(t)) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V^*}{\partial x} $$
HJB equation:
$$ 0 = \left(\frac{dV^*}{dx}\right)^T f + Q(x) - \tfrac{1}{4} \left(\frac{dV^*}{dx}\right)^T g R^{-1} g^T \frac{dV^*}{dx}, \qquad V^*(0) = 0 $$
LQR case:
$$ V(x(t)) = \int_t^\infty r(x,u)\, d\tau = x^T(t) P x(t) $$
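In the LQR special case ($f = Ax + Bu$, $Q(x) = x^T Q x$), the HJB equation reduces to the algebraic Riccati equation $A^T P + P A + Q - P B R^{-1} B^T P = 0$. A minimal sketch using scipy; the system matrices are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative 2-state system (values assumed for demonstration)
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)  # A'P + PA + Q - P B R^-1 B' P = 0
L = np.linalg.inv(R) @ B.T @ P        # optimal gain: u = -Lx
# V(x) = x' P x is the optimal cost; h(x) = -R^-1 B' P x the optimal control
```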
CT Policy Iteration
To avoid solving the HJB equation.

Utility: $r(x,u) = Q(x) + u^T R u$

For a given control, the Hamiltonian
$$ 0 = \left(\frac{\partial V}{\partial x}\right)^T f(x,u) + r(x,u) \equiv H\!\left(x, \frac{\partial V}{\partial x}, u\right) $$
is a (nonlinear) Lyapunov equation.

Iterative solution. Pick a stabilizing initial control.
Find cost:
$$ 0 = \left(\frac{\partial V_k}{\partial x}\right)^T f(x, h_k(x)) + r(x, h_k(x)), \qquad V_k(0) = 0 $$
Update control:
$$ h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_k}{\partial x} $$
LQR case. With $A_k = A - B L_k$, policy evaluation is the Lyapunov equation
$$ 0 = A_k^T P_k + P_k A_k + Q + L_k^T R L_k $$
2. Improve policy:
$$ L_k = R^{-1} B^T P_{k-1} $$
Define the Riccati map
$$ \mathrm{Ric}(P) \equiv A^T P + P A + Q - P B B^T P $$
Then Policy Iteration is Newton's method,
$$ P_{i+1} = P_i - \big(\mathrm{Ric}'_{P_i}\big)^{-1} \mathrm{Ric}(P_i), \qquad i = 0, 1, \ldots $$
with Frechet derivative
$$ \mathrm{Ric}'_{P_i}(P) \equiv (A - B B^T P_i)^T P + P (A - B B^T P_i) $$
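The LQR policy iteration above is Kleinman's algorithm; a compact sketch using scipy's Lyapunov solver for the policy-evaluation step (an initial stabilizing gain L0 must be supplied):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman(A, B, Q, R, L0, iters=20):
    """Policy iteration for the CT algebraic Riccati equation (Kleinman)."""
    L = L0
    for _ in range(iters):
        Ak = A - B @ L
        # policy evaluation: solve Ak' P + P Ak + Q + L' R L = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + L.T @ R @ L))
        # policy improvement
        L = np.linalg.inv(R) @ B.T @ P
    return P, L
```

Each iteration is a Newton step on Ric(P), so convergence from a stabilizing start is quadratic.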
Draguna Vrabie

Integral reinforcement form of the cost: with utility
$$ r(x(\tau), u(\tau)) = x^T(\tau) Q x(\tau) + u^T(\tau) R u(\tau), $$
the cost over any interval satisfies
$$ V(x(t)) = \int_t^{t+T} (x^T Q x + u^T R u)\, d\tau + V(x(t+T)) $$
LQR case:
$$ x^T(t) P x(t) = \int_t^{t+T} (x^T Q x + u^T R u)\, d\tau + x^T(t+T) P x(t+T) $$
Optimal gain is
$$ L = R^{-1} B^T P $$
1. Policy iteration

Cost update:
$$ V_{k+1}(x(t)) = \int_t^{t+T} (x^T Q x + u_k^T R u_k)\, d\tau + V_{k+1}(x(t+T)) $$
For LQR with $u_k(t) = L_k x(t)$,
$$ x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + x^T(t+T) P_{k+1} x(t+T) $$
Control update:
$$ L_{k+1} = R^{-1} B^T P_{k+1} $$
Lemma 1. With $L_k = R^{-1} B^T P_k$, the cost update
$$ x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + x^T(t+T) P_{k+1} x(t+T) $$
is equivalent to the Lyapunov equation
$$ (A - B R^{-1} B^T P_k)^T P_{k+1} + P_{k+1} (A - B R^{-1} B^T P_k) + P_k B R^{-1} B^T P_k + Q = 0 $$
and, in Newton form,
$$ P_{k+1} = P_k - \big(\mathrm{Ric}'_{P_k}\big)^{-1} \mathrm{Ric}(P_k), \qquad k = 0, 1, \ldots $$
It solves the Lyapunov equation without knowing A or B.

Proof: if $P_{k+1}$ solves the Lyapunov equation, then along the closed-loop trajectories of $\dot x = A_k x$,
$$ \frac{d(x^T P_{k+1} x)}{dt} = x^T (A_k^T P_{k+1} + P_{k+1} A_k) x = -x^T (Q + L_k^T R L_k) x, $$
so integrating over $[t, t+T]$ gives
$$ \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau = x^T(t) P_{k+1} x(t) - x^T(t+T) P_{k+1} x(t+T), $$
which is the cost update.
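Lemma 1 can also be checked numerically: the model-based Lyapunov solution below should match the $P_{k+1}$ recovered from trajectory data by the least-squares implementation sketched further below. A sketch using scipy (illustrative only):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lemma1_model_based(A, B, Q, R, Pk):
    """P_{k+1} from the Lyapunov equation of Lemma 1 (uses the model;
    the data-based critic update should reproduce this P without A)."""
    Rinv = np.linalg.inv(R)
    Ak = A - B @ Rinv @ B.T @ Pk
    return solve_continuous_lyapunov(Ak.T, -(Pk @ B @ Rinv @ B.T @ Pk + Q))
```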
Theorem (D. Vrabie). This algorithm converges and is equivalent to Kleinman's algorithm.
Only B is needed.
Algorithm Implementation

Critic update:
$$ x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T(\tau) (Q + L_k^T R L_k) x(\tau)\, d\tau + x^T(t+T) P_{k+1} x(t+T) $$
To set this up as a linear regression, write the quadratic form with a parameter vector $p_{k+1}$ and quadratic basis vector $\bar x$, so that $x^T P_{k+1} x = p_{k+1}^T \bar x$. Then
$$ p_{k+1}^T [\bar x(t) - \bar x(t+T)] = \int_t^{t+T} x^T(\tau) (Q + L_k^T R L_k) x(\tau)\, d\tau \equiv d(t, t+T) $$
Online procedure:
1. Observe $x(t)$ and apply $u_k(t) = L_k x(t)$.
2. Integrate the utility $\dot V = x^T Q x + (u_k)^T R u_k$ over $[t, t+T]$; do RLS until convergence to $P_{k+1}$.
3. Improve control: update the control gain to $L_{k+1} = R^{-1} B^T P_{k+1}$.
Algorithm Implementation

Or use a batch least-squares solution along the trajectory. With $\bar x_\Delta(t) \equiv \bar x(t) - \bar x(t+T)$, the update
$$ p_{k+1}^T \bar x_\Delta(t) = \int_t^{t+T} x^T(\tau) (Q + L_k^T R L_k) x(\tau)\, d\tau \equiv d(\bar x(t), L_k) $$
can be set up as
$$ p_{k+1} = (X X^T)^{-1} X Y $$
$$ X = [\bar x_\Delta(t_1)\ \ \bar x_\Delta(t_2)\ \ \ldots\ \ \bar x_\Delta(t_N)], \qquad Y = [d(\bar x_1, L_k)\ \ d(\bar x_2, L_k)\ \ \ldots\ \ d(\bar x_N, L_k)]^T $$
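A self-contained sketch of this batch least-squares critic update for a 2-state plant. The model matrices appear only to generate simulated trajectory data; the update itself never uses A. All numerical choices (basis, interval T, segment starts) are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def quad_basis(x):
    """Quadratic basis x_bar for 2 states: [x1^2, x1*x2, x2^2]."""
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def irl_critic_update(A, B, Q, R, Lk, x0s, T=0.1):
    """Batch LS critic update: returns P_{k+1} and the improved gain L_{k+1}.
    x0s: segment start states (need at least 3 for the 3 unknowns in P)."""
    X, Y = [], []
    for x0 in x0s:
        def rhs(t, z):                       # augmented state: [x; running cost]
            x = z[:2]
            u = -Lk @ x                      # current policy u_k = -L_k x
            return np.concatenate((A @ x + B @ u, [x @ Q @ x + u @ R @ u]))
        sol = solve_ivp(rhs, (0.0, T), np.concatenate((x0, [0.0])), rtol=1e-8)
        xT, d = sol.y[:2, -1], sol.y[2, -1]
        X.append(quad_basis(x0) - quad_basis(xT))   # regression vector x_bar_Delta
        Y.append(d)                                 # integral reinforcement
    p = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)[0]
    P = np.array([[p[0], p[1] / 2], [p[1] / 2, p[2]]])
    return P, np.linalg.inv(R) @ B.T @ P
```

Note the sign convention in this sketch: the policy is applied as $u_k = -L_k x$, matching $A_k = A - B L_k$.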
[Figure: adaptive critic structure. The system $\dot x = Ax + Bu$, $x_0$ runs in continuous time; the critic integrates $\dot V = x^T Q x + u^T R u$ over sample intervals (ZOH, period T) and updates V; the actor applies the feedback gain $L_k$, giving the control $u_k(t) = L_k x(t)$. The result is a dynamic control system.]

Sample periods need not be the same.
They can be selected on-line in real time.
2. Greedy update (Draguna Vrabie)

Cost update:
$$ V_{k+1}(x(t)) = \int_t^{t+T} (x^T Q x + u_k^T R u_k)\, dt + V_k(x(t+T)) $$
LQR case, with control policy $u_k(t) = L_k x(t)$:
$$ x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, dt + x^T(t+T) P_k x(t+T) $$
Algorithm Implementation

The critic update
$$ x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T(\tau) (Q + L_k^T R L_k) x(\tau)\, d\tau + x^T(t+T) P_k x(t+T) $$
is set up as
$$ p_{k+1}^T \bar x(t) = \int_t^{t+T} x^T(\tau) (Q + L_k^T R L_k) x(\tau)\, d\tau + p_k^T \bar x(t+T) $$
with regression vector $\bar x(t)$ and the previous weights $p_k$ on the right-hand side.

Online procedure:
1. Observe $x(t)$ and apply $u_k(t) = L_k x(t)$.
2. Integrate the utility $\dot V = x^T Q x + (u_k)^T R u_k$ over $[t, t+T]$; do RLS until convergence to $P_{k+1}$.
3. Improve control: $L_{k+1} = R^{-1} B^T P_{k+1}$.
[Figure: the same adaptive critic structure (system $\dot x = Ax + Bu$, $x_0$, with ZOH sampling period T; critic integrating $\dot V = x^T Q x + u^T R u$; actor applying the feedback gain $L_k$), now with the critic performing the greedy update.]
(Draguna Vrabie) With
$$ L_k = R^{-1} B^T P_k, \qquad A_k = A - B R^{-1} B^T P_k, $$
the greedy update
$$ V_{k+1}(x(t)) = \int_t^{t+T} \{x^T Q x + (u_k)^T R u_k\}\, d\tau + V_k(x(t+T)), \qquad V_0 = 0, $$
is equivalent to
$$ P_{k+1} = \int_0^T e^{A_k^T t} (Q + L_k^T R L_k) e^{A_k t}\, dt + e^{A_k^T T} P_k e^{A_k T} $$
a strange pseudo-discretized Riccati equation. Compare the DT Riccati equation
$$ P_{k+1} = A^T P_k A + Q - A^T P_k B (R + B^T P_k B)^{-1} B^T P_k A, $$
which with $L_k = (R + B^T P_k B)^{-1} B^T P_k A$ and $A_k = A - B L_k$ can be written
$$ P_{k+1} = A^T P_k A_k + Q. $$
Moreover,
$$ P_{k+1} - P_k = \int_0^T e^{A_k^T t} (P_k A + A^T P_k + Q - L_k^T R L_k) e^{A_k t}\, dt, \qquad A_k = A - B R^{-1} B^T P_k. $$
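The pseudo-discretized recursion can be evaluated exactly with matrix exponentials (Van Loan's block-exponential trick for the finite-horizon Gramian integral); a sketch, with all matrices illustrative:

```python
import numpy as np
from scipy.linalg import expm

def greedy_update(Pk, A, B, Q, R, T):
    """One step of the pseudo-discretized Riccati recursion above."""
    Lk = np.linalg.inv(R) @ B.T @ Pk
    Ak = A - B @ Lk
    n = A.shape[0]
    Qbar = Q + Lk.T @ R @ Lk
    # expm of the block matrix [[-Ak', Qbar], [0, Ak]] * T; its (1,2)
    # block yields the Gramian integral (Van Loan, 1978)
    M = np.block([[-Ak.T, Qbar], [np.zeros((n, n)), Ak]])
    E = expm(M * T)
    F2 = E[n:, n:]               # = e^{Ak T}
    integral = F2.T @ E[:n, n:]  # = int_0^T e^{Ak' t} Qbar e^{Ak t} dt
    return integral + F2.T @ Pk @ F2
```

Iterating greedy_update from $P_0 = 0$ corresponds to the greedy (value-iteration) sequence above.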
Robustness?
Comparison with adaptive control methods?
Value update along the trajectory:
$$ W(t, x(t)) = \int_t^{t+T} x^T(\tau) (Q + K^T R K) x(\tau)\, d\tau + W(t+T, x(t+T)) $$
with control $u_k(t) = L_k x(t)$. Sample periods need not be the same.
[Figure: simulation results. Three panels versus Time (s): the system states; the controller parameters; and the critic parameters P(1,1), P(1,2), P(2,2), which converge to the optimal values P(1,1)-optimal, P(1,2)-optimal, P(2,2)-optimal.]
Nonlinear case:
$$ V(x(t)) = \int_t^\infty r(x,u)\, d\tau = \int_t^\infty (Q(x) + u^T R u)\, d\tau $$

Policy Iteration
$$ V_{k+1}(x(t)) = \int_t^{t+T} \big(Q(x) + u_k^T R u_k\big)\, d\tau + V_{k+1}(x(t+T)) $$
$$ u_{k+1} = h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_{k+1}}{\partial x} $$
Only g(x) appears in these updates; no knowledge of f(x) is needed.
Algorithm Implementation

$$ V_{k+1}(x(t)) = \int_t^{t+T} \big(Q(x) + u_k^T R u_k\big)\, d\tau + V_{k+1}(x(t+T)) $$
Approximate the value by a neural network, $V = W^T \phi(x)$:
$$ W_{k+1}^T \phi(x(t)) = \int_t^{t+T} \big(Q(x) + u_k^T R u_k\big)\, d\tau + W_{k+1}^T \phi(x(t+T)) $$
$$ W_{k+1}^T [\phi(x(t)) - \phi(x(t+T))] = \int_t^{t+T} \big(Q(x) + u_k^T R u_k\big)\, d\tau $$
with regression vector $\phi(x(t)) - \phi(x(t+T))$. The improved policy is
$$ u_{k+1} = h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_{k+1}}{\partial x} = -\tfrac{1}{2} R^{-1} g^T(x) \nabla\phi^T(x) W_{k+1}. $$
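A scalar toy sketch of this NN-critic policy iteration. The plant f, g, the polynomial basis φ(x) = [x², x⁴], and the initial weights are all assumed for illustration; the batch least-squares fit mirrors the regression above.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative scalar plant x_dot = f(x) + g(x) u and cost weights (assumed)
f = lambda x: -x + x**3
g = lambda x: 1.0
Qx = lambda x: x**2
R = 1.0

phi  = lambda x: np.array([x**2, x**4])    # assumed critic basis
dphi = lambda x: np.array([2*x, 4*x**3])   # its gradient

def policy(x, W):
    """u = -1/2 R^-1 g(x) (dphi/dx)' W, the improved control above."""
    return -0.5 * (1.0 / R) * g(x) * (dphi(x) @ W)

def critic_update(W, x0s, T=0.05):
    """Batch LS fit of W_{k+1} from integral-reinforcement data pairs."""
    X, Y = [], []
    for x0 in x0s:
        def rhs(t, z):                     # z = [x, running cost]
            u = policy(z[0], W)
            return [f(z[0]) + g(z[0]) * u, Qx(z[0]) + R * u**2]
        sol = solve_ivp(rhs, (0.0, T), [x0, 0.0], rtol=1e-8)
        X.append(phi(x0) - phi(sol.y[0, -1]))   # regression vector
        Y.append(sol.y[1, -1])                  # measured reinforcement
    return np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)[0]

W = np.array([1.0, 0.0])                   # initial stabilizing weights (u = -x)
for _ in range(10):                        # policy iteration loop
    W = critic_update(W, x0s=np.linspace(-0.8, 0.8, 9))
```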
System:
$$ \dot x = f(x) + g(x)u $$
Cost:
$$ V(x(t)) = \int_t^\infty r(x,u)\, d\tau = \int_t^\infty (Q(x) + u^T R u)\, d\tau $$
Discrete-time counterpart (Bellman equation):
$$ V_h(x_k) = r(x_k, h(x_k)) + V_h(x_{k+1}) $$
Hamiltonian:
$$ 0 = \dot V + r(x,u) = \left(\frac{\partial V}{\partial x}\right)^T f(x,u) + r(x,u) \equiv H\!\left(x, \frac{\partial V}{\partial x}, u\right) $$
Optimal cost (Bellman):
$$ 0 = \min_{u(t)} \left[ r(x,u) + \left(\frac{\partial V^*}{\partial x}\right)^T f(x,u) \right] $$
Optimal control:
$$ h(x(t)) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V^*}{\partial x} $$
HJB equation:
$$ 0 = \left(\frac{dV^*}{dx}\right)^T f + Q(x) - \tfrac{1}{4} \left(\frac{dV^*}{dx}\right)^T g R^{-1} g^T \frac{dV^*}{dx}, \qquad V^*(0) = 0 $$