1 The Hamilton-Jacobi-Bellman Equation
where g and h are scalar functions, t_0 and t_f are fixed, and \tau is a variable of integration. Let us use the principle of optimality to include this problem in a larger class of problems by considering the performance measure:

J(x(t), t, u(\tau)_{t \le \tau \le t_f}) = h(x(t_f), t_f) + \int_t^{t_f} g(x(\tau), u(\tau), \tau) \, d\tau \qquad (3)

where t can be any value less than or equal to t_f and x(t) can be any admissible state value. Notice that the performance measure depends on the numerical values of x(t) and t and on the optimal control history in the interval [t, t_f].
Let us now attempt to determine the controls that minimize (3) for all admissible x(t) and all t \le t_f. The minimum cost function is then:

J^*(x(t), t) = \min_{u(\tau), \, t \le \tau \le t_f} \left\{ h(x(t_f), t_f) + \int_t^{t_f} g(x(\tau), u(\tau), \tau) \, d\tau \right\} \qquad (4)
The principle of optimality requires that:

J^*(x(t), t) = \min_{u(\tau), \, t \le \tau \le t + \Delta t} \left\{ \int_t^{t + \Delta t} g(x(\tau), u(\tau), \tau) \, d\tau + J^*(x(t + \Delta t), t + \Delta t) \right\} \qquad (6)

where J^*(x(t + \Delta t), t + \Delta t) is the minimum cost of the process for the time interval t + \Delta t \le \tau \le t_f with initial state x(t + \Delta t).
Assuming that the second partial derivatives of J^* exist and are bounded, we can expand J^*(x(t + \Delta t), t + \Delta t) in a Taylor series about the point (x(t), t) (truncated after the first-order terms) to obtain:

J^*(x(t), t) = \min_{u} \left\{ \int_t^{t + \Delta t} g(x(\tau), u(\tau), \tau) \, d\tau + J^*(x(t), t) + \frac{\partial J^*}{\partial t}(x(t), t) \, \Delta t + \left[ \frac{\partial J^*}{\partial x}(x(t), t) \right]^T \left[ x(t + \Delta t) - x(t) \right] \right\} \qquad (7)
For small \Delta t:

x(t + \Delta t) - x(t) \approx \dot{x}(t) \, \Delta t

\int_t^{t + \Delta t} g(x, u, \tau) \, d\tau \approx g(x, u, t) \, \Delta t
Because J^*(x(t), t) is independent of u, it cancels from the right- and left-hand sides of (7). Replacing \dot{x} by means of the plant equation (1) gives:

0 = \min_{u} \left\{ g(x, u, t) \, \Delta t + \frac{\partial J^*(x(t), t)}{\partial t} \, \Delta t + \left[ \frac{\partial J^*(x(t), t)}{\partial x} \right]^T f(x, u, t) \, \Delta t \right\} \qquad (8)
Dividing by \Delta t, letting \Delta t \to 0, and defining the Hamiltonian

H\left(x, u, \frac{\partial J^*}{\partial x}, t\right) = g(x, u, t) + \left[ \frac{\partial J^*(x(t), t)}{\partial x} \right]^T f(x, u, t)

we obtain:

0 = \frac{\partial J^*(x(t), t)}{\partial t} + \min_{u} H \qquad (11)
To find the boundary value for this partial differential equation, set t = t_f; from (4) we have:

J^*(x(t_f), t_f) = h(x(t_f), t_f) \qquad (12)
We have obtained the Hamilton-Jacobi-Bellman equation, HJB (11), subject to the boundary condition (12). It provides the solution to optimal control problems for general nonlinear dynamical systems. However, an analytical solution of the HJB equation is difficult to obtain in most cases. A solution can sometimes be obtained analytically by guessing a form of the minimum cost function. In general, the HJB equation must be solved by numerical techniques, and a numerical solution involves some sort of discrete approximation to the exact optimization relationship (11).
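As an aside (not part of the original notes), such a discrete approximation can be sketched as backward value iteration on a state grid. The plant \dot{x} = x + u, the quadratic costs, the horizon T = 1, and all grid sizes below are illustrative assumptions chosen to match Example 1.1, so the result can be compared with the known quadratic minimum cost.

```python
import numpy as np

# Backward value iteration (discrete dynamic programming) for the scalar HJB
# problem with plant x_dot = x + u and cost 0.25*x(T)^2 + integral 0.25*u^2 dt.
# Horizon T = 1 and all grid bounds/sizes are illustrative assumptions.
T, dt = 1.0, 0.002
xs = np.linspace(-2.0, 2.0, 401)       # state grid
us = np.linspace(-4.0, 4.0, 161)       # candidate control values
V = 0.25 * xs**2                       # boundary condition (12): J*(x(T), T) = h

for _ in range(int(T / dt)):           # march backward from t = T to t = 0
    # State reached from xs[i] after one Euler step under control us[j].
    x_next = np.clip(xs[:, None] + (xs[:, None] + us[None, :]) * dt, xs[0], xs[-1])
    # Bellman backup: stage cost plus interpolated cost-to-go, minimized over u.
    V = np.min(0.25 * us[None, :] ** 2 * dt + np.interp(x_next, xs, V), axis=1)

# Compare with the analytical minimum cost of Example 1.1:
# J*(x, 0) = 0.5 * p(0) * x^2 with p(0) = 1 / (1 + exp(-2T)).
p0 = 1.0 / (1.0 + np.exp(-2.0 * T))
x_test = 1.0
V_dp = float(np.interp(x_test, xs, V))
print(V_dp, 0.5 * p0 * x_test**2)      # the two values should agree closely
```

The grid-based value comes out close to the analytical one; refining the grids and the time step shrinks the gap, which is the sense in which the discrete scheme approximates (11).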
Exercise 1.1
where the initial time t_0 = 0, the final time t_f < \infty is fixed, and the final state x(t_f) is free. Find a control u^*(t) that minimizes J.
Example 1.1
It is desired to find the control law that minimizes the cost function

J = \frac{1}{4} x^2(T) + \int_0^T \frac{1}{4} u^2(t) \, dt

The final time T is specified, and the admissible state and control values are not constrained by any boundaries. Here:

g(x(t), u(t), t) = \frac{1}{4} u^2(t)

f(x(t), u(t), t) = x(t) + u(t)
The Hamiltonian:

H\left(x(t), u(t), \frac{\partial J^*}{\partial x}, t\right) = \frac{1}{4} u^2(t) + \frac{\partial J^*(x(t), t)}{\partial x} \left( x(t) + u(t) \right)

and since the control is unconstrained, a necessary condition that the optimal control must satisfy is:

\frac{\partial H}{\partial u} = \frac{1}{2} u(t) + \frac{\partial J^*}{\partial x} = 0
The control indeed minimizes the Hamiltonian function because

\frac{\partial^2 H}{\partial u^2} = \frac{1}{2} > 0

The optimal control:

u^*(t) = -2 \, \frac{\partial J^*}{\partial x}
when substituted into the HJB equation:

0 = \frac{\partial J^*}{\partial t} + \min_{u} H

gives:

0 = \frac{\partial J^*}{\partial t} + \frac{1}{4} \left( -2 \, \frac{\partial J^*}{\partial x} \right)^2 + \frac{\partial J^*}{\partial x} \left( x(t) - 2 \, \frac{\partial J^*}{\partial x} \right)

0 = \frac{\partial J^*}{\partial t} + x(t) \, \frac{\partial J^*}{\partial x} - \left( \frac{\partial J^*}{\partial x} \right)^2 \qquad (13)
The boundary value for J^* is:

J^*(x(T), T) = \frac{1}{4} x^2(T) \qquad (14)
One way to solve the HJB equation is to guess a form for the solution and see if it can be made to satisfy the differential equation and the boundary condition. Since J^*(x(T), T) is quadratic in x(T), guess

J^*(x(t), t) = \frac{1}{2} p(t) \, x^2(t) \qquad (15)

where p(t) represents an unknown scalar function of t that is to be determined. Notice that

\frac{\partial J^*}{\partial x} = p(t) \, x(t) \qquad (16)

which, together with the expression determined for u^*(t), implies that:

u^*(t) = -2 p(t) \, x(t)

Thus, if a p(t) can be found such that (13) and (14) are satisfied, the optimal control is linear feedback of the state; indeed, this was the motivation for selecting the form (15).
Substituting (15) and

\frac{\partial J^*}{\partial t} = \frac{1}{2} \, \dot{p}(t) \, x^2(t) \qquad (17)

into (13) gives:

0 = \frac{1}{2} \, \dot{p}(t) \, x^2(t) - p^2(t) \, x^2(t) + p(t) \, x^2(t)
Since this equation must be satisfied for all x(t), p(t) may be calculated from the differential equation:

\frac{1}{2} \, \dot{p}(t) - p^2(t) + p(t) = 0 \qquad (18)

with the final condition p(T) = 1/2. p(t) is a scalar function of t; the solution can therefore be obtained using the transformation z(t) = 1/p(t), which reduces (18) to a linear equation (whose homogeneous part, \dot{z} - 2z = 0, has the solution z(t) = C_1 e^{2t}), with the result:

p(t) = \frac{1}{1 + e^{2(t - T)}} \qquad (19)

Since 1 - 2p(t) \le 0 for t \le T, the resulting closed-loop system \dot{x}(t) = [1 - 2p(t)] \, x(t) is stable.
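As a quick numerical sanity check (not part of the original notes; T = 1 is an assumed value), one can verify that (19) satisfies (18), meets the final condition p(T) = 1/2, and yields a nonpositive closed-loop coefficient:

```python
import numpy as np

# Check that p(t) = 1 / (1 + exp(2*(t - T))) satisfies
#   0.5*p'(t) - p(t)**2 + p(t) = 0   with p(T) = 1/2.   (T = 1 assumed)
T = 1.0
p = lambda t: 1.0 / (1.0 + np.exp(2.0 * (t - T)))

t = np.linspace(0.0, T, 501)
h = 1e-6
p_dot = (p(t + h) - p(t - h)) / (2.0 * h)     # central finite difference
residual = 0.5 * p_dot - p(t) ** 2 + p(t)

print(np.max(np.abs(residual)))               # ~0, up to finite-difference error
print(p(T))                                   # 0.5, the required final condition
print(np.max(1.0 - 2.0 * p(t)))               # <= 0: x_dot = (1 - 2p)x is stable
```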
Example 1.2
A man is considering his lifetime plan of investment and expenditure [4]. He has an initial level of savings x(0) and no income other than that which he obtains from investment at a fixed interest rate. His total capital is therefore governed by the equation:

\dot{x}(t) = 4 x(t) - u(t)

where u(t) is his rate of expenditure; the performance index is the discounted enjoyment \int_0^T e^{-t} \sqrt{u(t)} \, dt of that expenditure. Then:

f(x, u, t) = 4 x(t) - u(t); \qquad g(x, u, t) = e^{-t} \sqrt{u(t)}
and the Hamiltonian is:

H = g(x, u, t) + \frac{\partial J^*}{\partial x} f(x, u, t) = e^{-t} \sqrt{u} + J_x (4x - u)

where J_x denotes \partial J^* / \partial x.
The boundary condition is J^*(x(T), T) = 0, since h(x(T), T) = 0.
We calculate the candidate optimal u from the necessary condition of stationarity of the Hamiltonian:

\frac{\partial H}{\partial u} = \frac{1}{2} \frac{e^{-t}}{\sqrt{u}} - J_x = 0

u^* = \frac{1}{4} \frac{1}{e^{2t} J_x^2} = \frac{1}{4} e^{-2t} J_x^{-2}
Substituting u^* into the Hamiltonian we obtain:

H^* = \frac{1}{4} \, \frac{3 e^{-2t} + 16 x J_x^2}{J_x}
Guess a solution of the form J^*(x(t), t) = p(t) \sqrt{x(t)}, with p(t) to be determined. Then:

J_t = \frac{\partial J^*}{\partial t} = \dot{p}(t) \sqrt{x(t)}

J_x = \frac{\partial J^*}{\partial x} = \frac{1}{2} \frac{p(t)}{\sqrt{x(t)}}
By substituting the above, the HJB equation becomes:

\left( \dot{p} + \frac{1}{2} \, \frac{3 e^{-2t} + 4 p^2}{p} \right) \sqrt{x} = 0
It has to be satisfied for any x(t); thus we can obtain p(t) from:

\dot{p} + \frac{1}{2} \, \frac{3 e^{-2t} + 4 p^2}{p} = 0

subject to the final condition p(T) = 0.
The solution of the above equation (the substitution q(t) = p^2(t) reduces it to the linear equation \dot{q} + 4q = -3 e^{-2t}) is:

p(t) = \frac{1}{2} \sqrt{4 C e^{-4t} - 6 e^{-2t}}
The constant C is calculated from the final condition and has the value:

C = \frac{3}{2 e^{-2T}} = \frac{3}{2} e^{2T}
The unknown function p(t) is:

p(t) = \frac{1}{2} \sqrt{6 e^{2T - 4t} - 6 e^{-2t}}
Then the optimal control law is given by:

u^*(t) = \frac{1}{4} e^{-2t} J_x^{-2}, \quad \text{where} \quad J_x = \frac{1}{2} \frac{p(t)}{\sqrt{x(t)}}

thus:

u^*(t) = \frac{4 x(t) e^{-2t}}{6 e^{2T - 4t} - 6 e^{-2t}} = \frac{4 x(t)}{6 \left( e^{2(T - t)} - 1 \right)}
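As a final numerical sanity check (not part of the original notes; T = 1 and the sample savings level x are assumed values), one can verify that the closed-form p(t) of this example satisfies the differential equation for p with p(T) = 0, and that the two expressions for u^*(t) agree:

```python
import numpy as np

# Verify p(t) = 0.5*sqrt(6*exp(2T - 4t) - 6*exp(-2t)) against
#   p'(t) + (3*exp(-2t) + 4*p**2) / (2*p) = 0,  p(T) = 0.   (T = 1 assumed)
T = 1.0
p = lambda t: 0.5 * np.sqrt(6.0 * np.exp(2.0 * T - 4.0 * t) - 6.0 * np.exp(-2.0 * t))

t = np.linspace(0.0, 0.99 * T, 401)        # stop short of T, where p(T) = 0
h = 1e-6
p_dot = (p(t + h) - p(t - h)) / (2.0 * h)  # central finite difference
residual = p_dot + (3.0 * np.exp(-2.0 * t) + 4.0 * p(t) ** 2) / (2.0 * p(t))
print(np.max(np.abs(residual)))            # ~0, up to finite-difference error
print(p(T))                                # 0, the required final condition

# The two forms of the optimal expenditure agree for any savings level x:
x0 = 2.0                                   # assumed sample value
J_x = p(t) / (2.0 * np.sqrt(x0))
u1 = 0.25 * np.exp(-2.0 * t) / J_x ** 2
u2 = 4.0 * x0 / (6.0 * (np.exp(2.0 * (T - t)) - 1.0))
print(np.max(np.abs(u1 - u2)))             # ~0
```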
References
[1] Donald E. Kirk, Optimal Control Theory. An Introduction, Prentice-Hall Inc., 1970
[2] John T. Wen, Optimal Control, Center for Automation Technologies, Rensselaer Polytechnic Institute
[3] Purdue University, Optimal Control Course
[4] Richard Weber, Optimization and Control, Cambridge University
[5] David I. Wilson, Optimal Control, Karlstad University, 2000