
Hamilton-Jacobi-Bellman Equation

1 The Hamilton-Jacobi-Bellman Equation


In our initial exposure to dynamic programming, we approximated continuously operating
systems by discrete systems. This approach leads to a recurrence relation that is ideally
suited for digital computer solution. In this section we shall consider an alternative
approach, which leads to a nonlinear partial differential equation: the Hamilton-Jacobi-
Bellman (HJB) equation.
The process described by the state equation

\dot{x}(t) = f(x(t), u(t), t)    (1)

is to be controlled to minimize the performance measure


J = h(x(t_f), t_f) + \int_{t_0}^{t_f} g(x(\tau), u(\tau), \tau)\, d\tau    (2)

where g and h are scalar functions, t_0 and t_f are fixed, and \tau is a variable of integration. Let
us use the principle of optimality to include this problem in a larger class of problems by
considering the performance measure:
J(x(t), t, u(\tau),\ t \le \tau \le t_f) = h(x(t_f), t_f) + \int_{t}^{t_f} g(x(\tau), u(\tau), \tau)\, d\tau    (3)

where t can be any value less than or equal to t_f and x(t) can be any admissible state
value. Notice that the performance measure will depend on the numerical values of x(t)
and t and on the optimal control history in the interval [t, t_f].
Let us now attempt to determine the controls that minimize (3) for all admissible x(t)
and all t \le t_f. The minimum cost function is then:

J^*(x(t), t) = \min_{u(\tau),\, t \le \tau \le t_f} \left\{ h(x(t_f), t_f) + \int_{t}^{t_f} g(x(\tau), u(\tau), \tau)\, d\tau \right\}    (4)

By subdividing the interval we obtain:

J^*(x(t), t) = \min_{u(\tau),\, t \le \tau \le t_f} \left\{ h(x(t_f), t_f) + \int_{t}^{t+\Delta t} g(x, u, \tau)\, d\tau + \int_{t+\Delta t}^{t_f} g(x, u, \tau)\, d\tau \right\}    (5)

The principle of optimality requires that:

J^*(x(t), t) = \min_{u(\tau),\, t \le \tau \le t+\Delta t} \left\{ \int_{t}^{t+\Delta t} g(x, u, \tau)\, d\tau + J^*(x(t+\Delta t), t+\Delta t) \right\}    (6)

where J^*(x(t+\Delta t), t+\Delta t) is the minimum cost of the process for the time interval
t+\Delta t \le \tau \le t_f with initial state x(t+\Delta t).
Assuming that the second partial derivatives of J^* exist and are bounded, we can
expand J^*(x(t+\Delta t), t+\Delta t) in a Taylor series about the point (x(t), t) (truncated after
the first-order terms) to obtain:

J^*(x(t), t) = \min_{u} \left\{ \int_{t}^{t+\Delta t} g(x, u, \tau)\, d\tau + J^*(x(t), t)
+ \frac{\partial J^*}{\partial t}(x(t), t)\, \Delta t + \left[ \frac{\partial J^*}{\partial x}(x(t), t) \right]^T \left[ x(t+\Delta t) - x(t) \right] \right\}    (7)

For small \Delta t:

x(t+\Delta t) - x(t) \approx \dot{x}(t)\, \Delta t

\int_{t}^{t+\Delta t} g(x, u, \tau)\, d\tau \approx g(x, u, t)\, \Delta t

Because J^*(x(t), t) is independent of u, it can be cancelled from the right- and left-hand
sides of (7). \dot{x} is replaced using the plant equation (1):

0 = \min_{u} \left\{ g(x, u, t)\, \Delta t + \frac{\partial J^*(x(t), t)}{\partial t}\, \Delta t + \left[ \frac{\partial J^*(x(t), t)}{\partial x} \right]^T f(x, u, t)\, \Delta t \right\}    (8)

Dividing the above equation by \Delta t yields:

0 = \frac{\partial J^*(x(t), t)}{\partial t} + \min_{u} \left\{ g(x, u, t) + \left[ \frac{\partial J^*(x(t), t)}{\partial x} \right]^T f(x, u, t) \right\}    (9)

Define the Hamiltonian as:

H\left( x(t), u(t), \frac{\partial J^*}{\partial x}, t \right) = g(x(t), u(t), t) + \left[ \frac{\partial J^*(x(t), t)}{\partial x} \right]^T f(x(t), u(t), t)    (10)

Then we write the partial differential equation (9) as:

0 = \frac{\partial J^*(x(t), t)}{\partial t} + \min_{u} H    (11)

To find the boundary value for this partial differential equation, set t = t_f; from (4)
we have:

J^*(x(t_f), t_f) = h(x(t_f), t_f)    (12)

We have obtained the Hamilton-Jacobi-Bellman (HJB) equation (11), subject to
the boundary condition (12). It provides the solution to optimal control problems for
general nonlinear dynamical systems. However, an analytical solution to the HJB equation
is difficult to obtain in most cases. A solution can sometimes be obtained analytically by
guessing a form for the minimum cost function. In general, the HJB equation must be
solved by numerical techniques, and a numerical solution involves some sort of discrete
approximation to the exact optimization relationship (11).
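
As an illustration of what such a discrete approximation can look like, here is a minimal sketch (not part of the original notes) that applies the backward recursion (6) on time, state and control grids for a scalar system. The dynamics f, running cost g and terminal cost h used at the bottom are assumed placeholder choices, made only so the script is self-contained.

```python
# Minimal grid-based approximation of the dynamic-programming recursion (6).
# Assumptions: scalar state and control, Euler discretization of the dynamics (1),
# linear interpolation of J* between state-grid points (np.interp clamps values
# at the grid edges).
import numpy as np

def hjb_grid_solver(f, g, h, x_grid, u_grid, t0, tf, n_steps):
    """Backward dynamic-programming approximation of J*(x, t) on a grid."""
    dt = (tf - t0) / n_steps
    times = np.linspace(t0, tf, n_steps + 1)
    J = np.empty((n_steps + 1, x_grid.size))
    J[-1] = h(x_grid, tf)                       # boundary condition (12)
    U = np.empty((n_steps, x_grid.size))        # minimizing control at each grid point
    for k in range(n_steps - 1, -1, -1):
        t = times[k]
        costs = np.empty((x_grid.size, u_grid.size))
        for j, u in enumerate(u_grid):
            x_next = x_grid + f(x_grid, u, t) * dt          # Euler step of (1)
            J_next = np.interp(x_next, x_grid, J[k + 1])    # J*(x(t + dt), t + dt)
            costs[:, j] = g(x_grid, u, t) * dt + J_next     # recursion (6)
        J[k] = costs.min(axis=1)
        U[k] = u_grid[costs.argmin(axis=1)]
    return times, J, U

if __name__ == "__main__":
    # Illustrative (assumed) problem: dx/dt = x + u,  g = u^2,  h = x^2.
    x_grid = np.linspace(-2.0, 2.0, 201)
    u_grid = np.linspace(-4.0, 4.0, 81)
    times, J, U = hjb_grid_solver(
        f=lambda x, u, t: x + u,
        g=lambda x, u, t: u**2,
        h=lambda x, tf: x**2,
        x_grid=x_grid, u_grid=u_grid, t0=0.0, tf=1.0, n_steps=100)
    print("J*(x=1, t=0) =", np.interp(1.0, x_grid, J[0]))
```

Finer grids and smaller time steps improve the approximation at the cost of more computation, which is the practical trade-off behind numerical HJB solvers.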

Exercise 1.1

Consider a general class of dynamical systems modeled by the differential equation:

\dot{x}(t) = a x(t) + b u(t)

with the associated cost index:

J = \frac{1}{2} f x^2(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left[ q x^2(t) + r u^2(t) \right] dt

where the initial time t_0 = 0, the final time t_f < \infty is fixed, and the final state x(t_f) is
free. Find a control u^*(t) that minimizes J.

Example 1.1

A first-order system is described by the differential equation

\dot{x}(t) = x(t) + u(t)

It is desired to find the control law that minimizes the cost function

J = \frac{1}{4} x^2(T) + \int_{0}^{T} \frac{1}{4} u^2(t)\, dt

The final time T is specified, and the admissible state and control values are not constrained
by any boundaries. Here

g(x(t), u(t), t) = \frac{1}{4} u^2(t)

f(x(t), u(t), t) = x(t) + u(t)
The Hamiltonian is:

H\left( x(t), u(t), \frac{\partial J^*}{\partial x}, t \right) = \frac{1}{4} u^2(t) + \frac{\partial J^*(x(t), t)}{\partial x} \left( x(t) + u(t) \right)

and since the control is unconstrained, a necessary condition that the optimal control must
satisfy is:

\frac{\partial H}{\partial u} = \frac{1}{2} u(t) + \frac{\partial J^*}{\partial x} = 0
The control indeed minimizes the Hamiltonian because

\frac{\partial^2 H}{\partial u^2} = \frac{1}{2} > 0

The optimal control is:

u^*(t) = -2 \frac{\partial J^*}{\partial x}
which, when substituted into the HJB equation

0 = \frac{\partial J^*}{\partial t} + \min_{u} H

gives:

0 = \frac{\partial J^*}{\partial t} + \frac{1}{4} \left( -2 \frac{\partial J^*}{\partial x} \right)^2 + \frac{\partial J^*}{\partial x} \left( x(t) - 2 \frac{\partial J^*}{\partial x} \right)

0 = \frac{\partial J^*}{\partial t} + x(t) \frac{\partial J^*}{\partial x} - \left( \frac{\partial J^*}{\partial x} \right)^2    (13)
The boundary value for J^* is:

J^*(x(T), T) = \frac{1}{4} x^2(T)    (14)

One way to solve the HJB equation is to guess a form for the solution and see if
it can be made to satisfy the differential equation and the boundary conditions. Since
J^*(x(T), T) is quadratic in x(T), guess

J^*(x(t), t) = \frac{1}{2} p(t) x^2(t)    (15)
where p(t) represents an unknown scalar function of t that is to be determined. Notice
that

\frac{\partial J^*}{\partial x} = p(t) x(t)    (16)

which together with the expression determined for u^*(t) implies that:

u^*(t) = -2 p(t) x(t)

Thus, if p(t) can be found such that (13) and (14) are satisfied, the optimal control is
linear feedback of the state; indeed, this was the motivation for selecting the form (15).
Substituting (15) and

\frac{\partial J^*}{\partial t} = \frac{1}{2} \dot{p}(t) x^2(t)    (17)

into (13) gives:

0 = \frac{1}{2} \dot{p}(t) x^2(t) - p^2(t) x^2(t) + p(t) x^2(t)

Since this equation must be satisfied for all x(t), p(t) may be calculated from the differ-
ential equation:

\frac{1}{2} \dot{p}(t) - p^2(t) + p(t) = 0    (18)

with the final condition p(T) = 1/2. Since p(t) is a scalar function of t, the solution
can be obtained using the transformation z(t) = 1/p(t), with the result:

p(t) = \frac{1}{1 + e^{2(t - T)}}    (19)

Obs. The solution of (18) is obtained as follows:

z(t) = \frac{1}{p(t)}, \qquad z(T) = \frac{1}{p(T)} = 2

(18) \;\Rightarrow\; -\frac{\dot{z}}{z^2} - \frac{2}{z^2} + \frac{2}{z} = 0

\dot{z} - 2z + 2 = 0

The solution of the homogeneous part of the above equation,

\dot{z} - 2z = 0, \quad \text{is} \quad z(t) = C_1 e^{2t}

and the general solution:


z(t) = C_1 e^{2t} + C_2
The constants are calculated by substituting the general solution into the equation and
using the final condition:

2 C_1 e^{2t} - 2 \left( C_1 e^{2t} + C_2 \right) + 2 = 0 \;\Rightarrow\; -2 C_2 + 2 = 0 \;\Rightarrow\; C_2 = 1

z(t) = C_1 e^{2t} + 1; \quad z(T) = C_1 e^{2T} + 1 = 2 \;\Rightarrow\; C_1 = e^{-2T}


Then, the general solution yields:

z(t) = e^{2(t - T)} + 1 \;\Rightarrow\; p(t) = \frac{1}{e^{2(t - T)} + 1}

The optimal control law is then:

u^*(t) = -2 \frac{\partial J^*(x, t)}{\partial x} = -2 p(t) x(t)

u^*(t) = -\frac{2}{1 + e^{2(t - T)}}\, x(t)
Notice that as T \to \infty, the linear time-varying feedback approaches the constant feedback
obtained with p(t) = 1, and the controlled system

\dot{x}(t) = x(t) - 2x(t) = -x(t)

is stable.
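
As a quick numerical check (not part of the original notes), one can integrate the Riccati equation (18) backward from p(T) = 1/2 with an explicit Euler scheme and compare the result against the closed-form solution (19); the horizon T, the step size, and the initial state x(0) below are illustrative choices.

```python
# Numerical check of Example 1.1: integrate (1/2) dp/dt - p^2 + p = 0 backward
# from p(T) = 1/2 and compare with the closed form p(t) = 1 / (1 + exp(2(t - T))).
import numpy as np

T, n = 5.0, 50_000
dt = T / n
t = np.linspace(0.0, T, n + 1)

p = np.empty(n + 1)
p[-1] = 0.5                              # final condition p(T) = 1/2
for k in range(n, 0, -1):
    dp = 2.0 * (p[k] ** 2 - p[k])        # from (18): dp/dt = 2(p^2 - p)
    p[k - 1] = p[k] - dp * dt            # explicit Euler step backward in time

p_exact = 1.0 / (1.0 + np.exp(2.0 * (t - T)))    # closed form (19)
print("max |p_numeric - p_exact| =", np.max(np.abs(p - p_exact)))

# Closed-loop simulation with u*(t) = -2 p(t) x(t):  dx/dt = x - 2 p(t) x.
x = np.empty(n + 1)
x[0] = 1.0
for k in range(n):
    x[k + 1] = x[k] + (x[k] - 2.0 * p[k] * x[k]) * dt
print("x(T) =", x[-1])   # the state decays over most of the horizon
```

The printed error shrinks as the step size is reduced, and the simulated state illustrates the stabilizing behaviour of the time-varying feedback discussed above.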

Example 1.2

A man is considering his lifetime plan of investment and expenditure [4]. He has an initial
level of savings x(0) and no income other than that which he obtains from investment
at a fixed interest rate \beta. His total capital is therefore governed by the equation:

\dot{x}(t) = \beta x(t) - u(t),

where \beta > 0 and u(t) is his rate of expenditure. He wishes to maximize

J = \int_{0}^{T} e^{-a t} \sqrt{u(t)}\, dt

for a given T. Find his optimal policy u^*(t).


The problem can be transformed into a minimization problem by changing the sign of the
performance measure:

J^*(x(t), t) = \min_{u} \left\{ -\int_{t}^{T} e^{-a\tau} \sqrt{u(\tau)}\, d\tau \right\}

Let a = 1 and \beta = 4.
We solve this problem using the Hamilton-Jacobi-Bellman equation:

0 = \frac{\partial J^*}{\partial t} + \min_{u} H \;\Longleftrightarrow\; J^*_t + \min_{u} H = 0

Then:

f(x, u, t) = 4x(t) - u(t), \qquad g(x, u, t) = -e^{-t} \sqrt{u(t)}

and the Hamiltonian is:

H = g(x, u, t) + \frac{\partial J^*}{\partial x} f(x, u, t) = -e^{-t} \sqrt{u} + J^*_x (4x - u)

The boundary condition is J^*(x(T), T) = 0, since h(x(T), T) = 0.
We calculate the control u that minimizes the Hamiltonian from the necessary condition:

\frac{\partial H}{\partial u} = 0 = -\frac{1}{2} \frac{e^{-t}}{\sqrt{u}} - J^*_x

u^* = \frac{1}{4 e^{2t} (J^*_x)^2} = \frac{1}{4} e^{-2t} (J^*_x)^{-2}
Substituting u^* into the Hamiltonian we obtain:

H^* = \frac{1}{4}\, \frac{e^{-2t} + 16 x (J^*_x)^2}{J^*_x}

Suppose we try a solution of the form:

J^*(x(t), t) = p(t) \sqrt{x(t)}

where p(t) \le 0, since J^* is the minimum of a quantity that is never positive.

Then:

J^*_t = \frac{\partial J^*}{\partial t} = \dot{p}(t) \sqrt{x(t)}

J^*_x = \frac{\partial J^*}{\partial x} = \frac{1}{2}\, \frac{p(t)}{\sqrt{x(t)}}
By substituting the above, the HJB equation becomes:

\left( \dot{p} + \frac{1}{2}\, \frac{e^{-2t} + 4p^2}{p} \right) \sqrt{x} = 0

It has to be satisfied for any x(t); thus we can obtain p(t) from:

\dot{p} + \frac{1}{2}\, \frac{e^{-2t} + 4p^2}{p} = 0
subject to the final condition p(T ) = 0.
The above equation is solved by introducing q(t) = p^2(t), which satisfies the linear equation
\dot{q} + 4q = -e^{-2t} and therefore has the general solution:

q(t) = C e^{-4t} - \frac{1}{2} e^{-2t}

The constant C is calculated from the final condition q(T) = p^2(T) = 0 and has the value:

C = \frac{1}{2} e^{2T}

The unknown function p(t) is then the negative root:

p(t) = -\sqrt{\frac{1}{2} \left( e^{2T - 4t} - e^{-2t} \right)}
2
Then the optimal control law is given by:

u^*(t) = \frac{1}{4} e^{-2t} (J^*_x)^{-2}, \quad \text{where} \quad J^*_x = \frac{1}{2}\, \frac{p(t)}{\sqrt{x(t)}}

thus:

u^*(t) = \frac{e^{-2t} x(t)}{p^2(t)} = \frac{2 x(t)}{e^{2(T - t)} - 1}
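
As a sanity check (not part of the original notes), the sketch below integrates the linear equation for q(t) = p^2(t) backward from q(T) = 0, compares it with the closed form above, and then simulates the capital under the feedback u^*(t) = 2x(t)/(e^{2(T-t)} - 1); the horizon T and the initial capital x(0) are illustrative choices.

```python
# Numerical check of Example 1.2 with a = 1 and beta = 4.
import numpy as np

T, n = 1.0, 100_000
dt = T / n
t = np.linspace(0.0, T, n + 1)

# Integrate  dq/dt = -4q - exp(-2t)  backward from q(T) = 0, where q = p^2.
q = np.empty(n + 1)
q[-1] = 0.0
for k in range(n, 0, -1):
    dq = -4.0 * q[k] - np.exp(-2.0 * t[k])
    q[k - 1] = q[k] - dq * dt
q_exact = 0.5 * (np.exp(2.0 * T - 4.0 * t) - np.exp(-2.0 * t))
print("max |q_numeric - q_exact| =", np.max(np.abs(q - q_exact)))

# Capital under the optimal expenditure rule:  dx/dt = 4x - u*(t).
x = np.empty(n + 1)
x[0] = 1.0
for k in range(n):
    u_star = 2.0 * x[k] / (np.exp(2.0 * (T - t[k])) - 1.0)
    x[k + 1] = x[k] + (4.0 * x[k] - u_star) * dt
print("x(T) =", x[-1])   # the capital is nearly exhausted at the final time
```

Expenditure is small early on, when it is more valuable to let the capital grow at the interest rate, and increases sharply toward the end of the horizon, which matches the form of u^*(t).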

References
[1] Donald E. Kirk, Optimal Control Theory: An Introduction, Prentice-Hall Inc., 1970.
[2] John T. Wen, Optimal Control, Center for Automation Technologies, Rensselaer Polytechnic Institute.
[3] Purdue University, Optimal Control Course.
[4] Richard Weber, Optimization and Control, Cambridge University.
[5] David I. Wilson, Optimal Control, Karlstad University, 2000.
