A. W. Starr² and Y. C. Ho³
1. Introduction
1 Paper received July 8, 1968. This research was supported by Joint Services Electronics Contracts Nos. N00014-67-A-0298-0006, 0005, 0008 and by NASA Grant No. NGR 22-007-068.
2 Graduate Student, Division of Engineering and Applied Physics, Harvard University, Cambridge, Massachusetts.
3 Gordon McKay Professor of Engineering and Applied Mathematics, Division of Engineering and Applied Physics, Harvard University, Cambridge, Massachusetts.
JOTA: VOL. 3, NO. 3, 1969 185
                 Player 2                          Player 2
               x        y                        x        y
Player 1  a   0, 1     2, 2      Player 1  a    2, 2    10, 1
          b   2, 2     1, 0                b    1, 10    5, 5
4 Player 1 (he) prefers a football game a to a concert b, while player 2 (she) prefers concert y
to football x. But each prefers going to the same event ax or by to going to separate events
ay or bx.
Two prisoners awaiting trial are held in separate cells. Each is offered a 50% reduction in his
sentence if he divulges evidence about the other's crime. If one refuses, the other can only be
convicted of a lesser offense (2 year sentence). A similar situation arises when two superpowers
contemplate building antimissile systems.
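The Nash property invoked throughout — no player can lower his own cost by deviating unilaterally — can be checked by enumeration for finite games like those above. A minimal sketch (an illustration, not part of the paper's development), reading the entries of the second table as cost pairs, with player 1 choosing rows a, b and player 2 columns x, y:

```python
# Enumerate pure-strategy Nash equilibria of a bimatrix game by checking
# unilateral deviations. Entries are read as costs (each player minimizes).

def pure_nash(cost1, cost2):
    """(row, col) pairs where neither player can lower his own cost alone."""
    rows, cols = len(cost1), len(cost1[0])
    eq = []
    for r in range(rows):
        for c in range(cols):
            if (all(cost1[r][c] <= cost1[rr][c] for rr in range(rows)) and
                    all(cost2[r][c] <= cost2[r][cc] for cc in range(cols))):
                eq.append((r, c))
    return eq

# Second table above, read as costs (rows a, b; columns x, y):
c1 = [[2, 10], [1, 5]]   # player 1's costs
c2 = [[2, 1], [10, 5]]   # player 2's costs
print(pure_nash(c1, c2))  # -> [(1, 1)]: the unique equilibrium (b, y)
```

Note that the unique equilibrium (b, y), with costs (5, 5), is worse for both players than (a, x) with costs (2, 2) — exactly the dilemma described above.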
must demand that the solution have some attribute such as minimax, Nash
equilibrium, stability against coalition formation, Pareto optimality, and so on.
One must also specify what information is available to each player during the
course of the game. Here, it is assumed that each player knows the current
value of the state vector as well as all system parameters and cost functions,
but he does not know the strategies of the rival players. Each player's strategy
can then be expressed as a function of time and the current state vector.
This does not exclude mixed strategies.
Strictly speaking, the set of admissible strategies φ(x, t) must be defined so that the trajectory x(t) satisfying (3) can be continued from any initial point (x0, t0).
of the usual dynamic programming argument, one can show that the value
functions Vi are solutions of the partial differential equations

∂Vi/∂t = −min_{ui} Hi(x, t, ψ1*, ..., ψ(i−1)*, ui, ψ(i+1)*, ..., ψN*, ∂Vi/∂x)   (5)

where Hi, the Hamiltonian for the ith player, has the functional form⁷

Hi(x, t, u1, ..., uN, λi) = Li(x, u1, ..., uN, t) + λi f(x, u1, ..., uN, t)   (6)

so that the minimized Hamilton-Jacobi equations are

−∂Vi/∂t = Hi(x, t, u1*, ..., uN*, ∂Vi/∂x)   (7)

When the equations

ẋ = f(x, u1*, ..., uN*, t)   (8)

ui* = ψi*(x, t, ∂V1/∂x, ..., ∂VN/∂x)   (9)
are integrated backward from all the points on the terminal surface, feasible
trajectories are obtained.⁸ The next section considers a class of games which
are normal.
Necessary conditions for a Nash equilibrium can also be obtained by an
extension of the variational methods used in optimal control theory. Case
(Ref. 3) did this, but his results are apparently valid only for strategies which
are functions of time only. In nearly all problems of interest, we require
strategies which are functions of the state vector and time. The conditions
given below satisfy this requirement. With the Hamiltonians Hi defined in (6),
7 Note that λi here is merely a dummy variable, to be replaced by ∂Vi/∂x in (5), although it can
be interpreted as a costate variable in the variational approach described below.
8 Note that condition (i) requires that a unique set of controls giving a saddle point of H(x, t, u, λ)
can be found as an explicit function of x, t, λ for any λ. This is a sufficient (but not necessary)
condition for the existence of a Nash trajectory. It is relatively easy to determine if (i) is satisfied,
since no differential equation need be solved.
ui = ψi(x, t) minimizes Hi(x, t, ψ1, ..., ψ(i−1), ui, ψ(i+1), ..., ψN, λi)   (13)
Note that, for N = 1 (optimal control problem), the second term in (11) is
absent. The optimal closed-loop control u(x, t) can then be obtained by solving
for the open-loop optimal control u(t) for every initial point (x, t). This method
is not valid in the N-player game, however, due to the summation term in (11).
In the optimal control problem, these necessary conditions are a set of ordinary
differential equations, but in the N-player, nonzero-sum game, they are a set
of partial differential equations, generally very difficult to solve.
max_{φ1, ..., φ(i−1), φ(i+1), ..., φN} Ji(φ1, ..., φi°, ..., φN) ≤ max_{φ1, ..., φ(i−1), φ(i+1), ..., φN} Ji(φ1, ..., φi, ..., φN)   (14)
Note that only the ith player's cost function enters into the computation
of his minimax control. This is equivalent to finding the equilibrium solution
of a two-player, zero-sum differential game, where the opponent of player i
chooses all the controls except the ith and tries to maximize Ji. Player i can
also calculate his minimax cost J̄i. If he plays φi°, he pays no more than J̄i.
He probably pays much less, since the other N − 1 players, each with his
own cost to minimize, are unlikely to choose the combination of strategies
which maximizes Ji (for example, they may play their own minimax controls).
Since it fails to take account of the other players' cost criteria and since it is
excessively pessimistic, the minimax solution is somewhat unsatisfactory in
the nonzero-sum game. In some reasonable, well-behaved games, J̄i = ∞.
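The security-level computation described here can be illustrated on a finite cost table. A minimal sketch (an illustration, not part of the paper's development), reading the entries of the second table in Section 1 as costs to be minimized:

```python
# Minimal sketch: the minimax (security) cost of a finite cost table. The
# row player guarantees min over his actions of the max over the opponent's
# actions of his own cost, regardless of the opponent's cost criterion.

def minimax_cost(cost):
    """Security level for the row player (cost[r][c] = row player's cost)."""
    return min(max(row) for row in cost)

def minimax_action(cost):
    """The row achieving the security level."""
    return min(range(len(cost)), key=lambda r: max(cost[r]))

c1 = [[2, 10], [1, 5]]   # player 1's costs: rows a, b; columns x, y
print(minimax_cost(c1), minimax_action(c1))   # -> 5 1
```

Here the row player's security level (5, playing b) happens to coincide with his Nash cost; in general, as noted above, he often pays much less than the security level, since the opponents are not actually trying to maximize his cost.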
Conversely, for any μ satisfying (17) with strict inequalities,⁹ the corresponding
control N-tuple which minimizes J in (16) belongs to the noninferior set.
The members of the noninferior set are not ordered by the vector
criterion. Thus, the negotiating problem, equivalent to selecting a vector μ
satisfying (17), cannot be solved unless further rules are specified.
3. Linear-Quadratic Games
In the linear-quadratic game, the system dynamics are linear and the cost criteria are quadratic in the state
vector and controls. Like its counterpart in optimal control theory, the
linear-quadratic game is analytically tractable and of some practical interest
(Ref. 1).
For the ith player, i = 1, ..., N, the problem is to choose a control
strategy ui = ψi(x, t) to minimize

Ji = ½ xᵀ(tf) Sif x(tf) + ½ ∫_{t0}^{tf} ( xᵀQix + Σ_{j=1}^{N} ujᵀRijuj ) dt   (18)

subject to

ẋ = Ax + Σ_{j=1}^{N} Bjuj   (19)
Using the value-function approach, one sees that the game is normal and
the minimizing control for the ith player is
ui* = −Rii⁻¹Biᵀλi = −Rii⁻¹Biᵀ(∂Vi/∂x)ᵀ   (21)
All the R i i must be positive definite; otherwise, the problem is meaningless.
Substituting (21) into the Hamilton-Jacobi equation (7), one obtains a partial
differential equation which can easily be solved by separation of variables.
Assuming that Vi(x, t) has the quadratic form

Vi(x, t) = ½ xᵀSi(t)x   (22)

one obtains a set of N coupled matrix Riccati differential equations (23) for the
Si(t); these are the same equations
obtained from the necessary conditions (10)-(13).
matrix equations (23) is integrated backward from the terminal time, the costs
incurred by each player are given by (22). One might wish to compare these
costs with those incurred when the players use arbitrary linear feedback
controls of the form
ui = −Ki(t)x   (24)

in which case the costs are given by quadratic forms

Ji = ½ xᵀPix   (25)
and the equilibrium controls (which are also saddle-point controls, because
the game is zero-sum) are

u1* = −R1⁻¹B1ᵀSx,   u2* = R2⁻¹B2ᵀSx   (29)
in agreement with the results of Ho, Bryson, and Baron (Ref. 1).
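The backward integration of coupled matrix equations like (23) is straightforward to carry out numerically. The sketch below is an assumed illustration, not the paper's computation: it takes the special case Rij = 0 for j ≠ i, in which the coupled Riccati equations follow directly from substituting ui* = −Rii⁻¹BiᵀSix into the value equations.

```python
import numpy as np

# Hedged sketch (assumed special case R_ij = 0 for j != i): backward Euler
# integration of the coupled matrix Riccati equations of an N-player Nash
# linear-quadratic game. With V_i = (1/2) x'S_i(t)x and
# u_i* = -R_ii^{-1} B_i' S_i x, the value equations reduce to
#   -dS_i/dt = Q_i + S_i A + A'S_i + S_i G_i S_i
#              - sum_j (S_i G_j S_j + S_j G_j S_i),  G_j = B_j R_jj^{-1} B_j',
# integrated backward from the terminal conditions S_i(t_f) = S_if.

def nash_riccati(A, Bs, Qs, Rs, Sfs, tf, steps=2000):
    """Return [S_1(0), ..., S_N(0)] by Euler steps from t_f back to 0."""
    Gs = [B @ np.linalg.inv(R) @ B.T for B, R in zip(Bs, Rs)]
    Ss = [np.array(Sf, dtype=float) for Sf in Sfs]
    dt = tf / steps
    for _ in range(steps):
        new = []
        for i in range(len(Bs)):
            rhs = Qs[i] + Ss[i] @ A + A.T @ Ss[i] + Ss[i] @ Gs[i] @ Ss[i]
            for j in range(len(Bs)):
                rhs -= Ss[i] @ Gs[j] @ Ss[j] + Ss[j] @ Gs[j] @ Ss[i]
            # S_i(t - dt) = S_i(t) + dt * (-dS_i/dt)
            new.append(Ss[i] + dt * rhs)
        Ss = new
    return Ss

# Symmetric scalar example: two identical players, A = 0, B_i = Q_i = R_ii = 1.
A = np.zeros((1, 1)); B = np.eye(1); Q = np.eye(1); R = np.eye(1)
S1, S2 = nash_riccati(A, [B, B], [Q, Q], [R, R], [np.zeros((1, 1))] * 2, tf=10.0)
print(S1[0, 0])   # approaches 1/sqrt(3), the steady solution of 1 - 3s^2 = 0
```

For N = 1 the summation collapses and the scheme reduces to the ordinary matrix Riccati equation of optimal control, which provides a convenient check on the implementation.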
where

Ṡi = −SiA − AᵀSi − Qi + Si ( Σ_{j=1}^{N} BjRij⁻¹Bjᵀ ) Si,   Si(tf) = Sif   (31)

provided that

Rii > 0,   Rij < 0,   j ≠ i,   j = 1, ..., N   (32)
If conditions (32) are not satisfied, the ith minimax control may fail to exist,
so that the ith minimax cost is infinite. Note that a minimax control might
exist for some players and fail to exist for others. A case of interest is Rij = 0
for all i, j # i. In this special case, the minimax control is either identically
zero or does not exist.
where

S(μ, t) = Σ_{i=1}^{N} μi Si   (34)

with the weights μi satisfying

Σ_{i=1}^{N} μi = 1,   μi ≥ 0,   i = 1, ..., N
4. Example: A Simple Pursuit-Evasion Problem
ṙ = v,   v̇ = ap − ae   (35)
where the accelerations ap and ae are controlled by the pursuer and evader,
respectively. The cost criteria are
Jp = ½ rf² + ½ ∫_0^T (cpp ap² + cpe ae²) dt,   Je = −½ rf² + ½ ∫_0^T (cep ap² + cee ae²) dt   (36)

where rf = r(T) and the final time T is fixed. [Following (27), we can obtain¹⁰
the zero-sum game of Ho, Bryson, and Baron by taking cpp = cp, cpe = −ce,
and cep = −cp.]
The Nash equilibrium controls for this game are found using (23),
where, for convenience, S1 = P and S2 = E. The controls are

ap* = −(1/cpp)[0 1]Px,   ae* = (1/cee)[0 1]Ex

with P and E satisfying the coupled matrix Riccati equations obtained by
specializing (23) to the dynamics (35) and costs (36).
Besides being in a more convenient form for computation, Eqs. (41) have
another significance. In the simpler pursuit-evasion game with velocity control

ṙ = vp − ve   (42)

with ap and ae in the cost criteria (27) replaced by vp and ve, respectively,
the Nash equilibrium solution is
11 The players measure their costs associated with their own controls in different units, each
unit chosen to give unit weighting to the terminal miss. One must be careful in interpreting
these costs; a direct comparison is not always meaningful.
expressed in terms of the scalar functions p*(t′) and e*(t′), which are
solutions of the linear equations (49) obtained from (26).
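The backward integration for the velocity-control game can also be sketched numerically. The criteria below are an assumed special case (each player's integrand contains only his own control, i.e., Rij = 0 for j ≠ i), not the paper's exact (36); p(t′) and e(t′) play the role of the scalar value coefficients.

```python
# Hedged sketch (assumed cost criteria, not the paper's exact (36)):
# dynamics r' = vp - ve, with Jp = (1/2)rf^2 + (cp/2) int vp^2 dt and
# Je = -(1/2)rf^2 + (ce/2) int ve^2 dt. With Vp = (1/2)p(t)r^2 and
# Ve = (1/2)e(t)r^2, the Nash controls are vp* = -p r/cp, ve* = e r/ce,
# and in time-to-go t' the coupled scalar Riccati equations are
#   dp/dt' = -p^2/cp - 2 p e/ce,   p(0) = 1
#   de/dt' = -e^2/ce - 2 e p/cp,   e(0) = -1

def nash_pe(cp, ce, t_go, steps=20000):
    """Euler-integrate the value coefficients out to time-to-go t_go."""
    p, e = 1.0, -1.0
    dt = t_go / steps
    for _ in range(steps):
        p, e = (p + dt * (-p * p / cp - 2.0 * p * e / ce),
                e + dt * (-e * e / ce - 2.0 * e * p / cp))
    return p, e

p, e = nash_pe(1.0, 1.0, 1.0)
print(p, e)   # pursuer's coefficient stays positive, evader's negative
```

The evader's coefficient e(t′) stays negative (his value function decreases the terminal miss he "earns"), while the pursuer's p(t′) stays positive, consistent with the sign conventions of the figures below.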
Fig. 1. Equilibrium costs, in units of ½r², vs. time-to-go for the zero-sum case.
the quantities of interest (the costs, apart from the factor ½r²) are p/ω and eω,
and these are the quantities plotted in all figures.
In the zero-sum case, (32) can be solved analytically. The resulting
costs, shown in Fig. 1 for a1 = a2 = 1, are
12 These prices are measured in different units, each chosen to make the corresponding prices
on terminal miss unity.
Fig. 2. Pursuer's and evader's equilibrium costs vs. time-to-go for several values of ω², including ω² = 0.444 and ω² = 8.
by both players. But this solution does not have the Nash property; the evader
can make a small improvement in his cost by changing to pe = 1.2, at great
additional cost to the pursuer, who still plays pp = 0.8. A similar effort by the
pursuer to improve his cost against pe = 0.71 leads to a higher cost for the
evader. Thus, we are confronted with an example of the prisoners' dilemma
Fig. 3. Costs, in units of ½r², vs. time-to-go when one player deviates from the pair pp, pe = 0.8, 1.2.
Note that, when s1 < 0, that is, when μ < ce²/(cp² + ce²), the noninferior
controls become infinite at some finite t′. With a = b = 0, γ = 1/μ(1 − μ),
and there is some range μ* < μ ≤ 1 such that the noninferior controls
are finite for all t′ (one must put a sufficiently large weight μ on the pursuer's
cost). When a set of noninferior controls is used, the costs incurred by the
two players can be obtained by solving (26). The resulting costs are
13 The corresponding results for the acceleration control problem can again be obtained from
the results for the velocity control problem via (40) and (38).
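The finite escape of the noninferior controls can be exhibited in closed form under the same assumed special case as the earlier sketch (each player's integrand contains only his own control; not the paper's exact criteria). Jointly minimizing J = μJp + (1 − μ)Je is then a single LQ problem whose scalar value coefficient s(t′) solves a Riccati equation in closed form:

```python
# Hedged closed-form sketch (assumed criteria, not the paper's exact ones):
# for r' = vp - ve with Jp = (1/2)rf^2 + (cp/2) int vp^2 dt and
# Je = -(1/2)rf^2 + (ce/2) int ve^2 dt, jointly minimizing
# J = mu*Jp + (1-mu)*Je gives V = (1/2) s(t') r^2 with
#   ds/dt' = -k s^2,  s(0) = 2*mu - 1,  k = 1/(mu*cp) + 1/((1-mu)*ce),
# i.e. s(t') = s0 / (1 + s0*k*t'). For s0 < 0 (too little weight on the
# pursuer) the solution escapes at the finite time-to-go t' = 1/(|s0|*k).

def noninferior_s(mu, cp, ce, t_go):
    """Value slope s at time-to-go t_go, or None past the escape time."""
    s0 = 2.0 * mu - 1.0
    k = 1.0 / (mu * cp) + 1.0 / ((1.0 - mu) * ce)
    denom = 1.0 + s0 * k * t_go
    if denom <= 0.0:
        return None          # noninferior controls blow up before t_go
    return s0 / denom

print(noninferior_s(0.6, 1.0, 1.0, 1.0))   # finite: enough weight on pursuer
print(noninferior_s(0.4, 1.0, 1.0, 2.0))   # None: escape before t' = 2
```

In this simplified setting the threshold weight is μ = ½ (which agrees with the quoted threshold ce²/(cp² + ce²) when cp = ce), illustrating the statement that a sufficiently large weight must be placed on the pursuer's cost.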
Fig. 4. Pursuer's and evader's costs, in units of ½r², for noninferior solutions with μ = 0.5 and μ = 0.55.
pursuer. The Nash equilibrium costs for the same co are shown for comparison.
Presumably, neither player would accept a negotiated solution giving him a
higher cost than the Nash solution. This would restrict the negotiable range
to approximately 0.5 < μ < 0.6. The exact limits of this range depend on
time-to-go. However, in some games, threat situations may exist (Ref. 4)
which force one player to accept a noninferior solution which is worse for
him than the Nash solution (recall that the Nash solution does not have the
minimax property). Of course, the present mathematical formulation of the
problem gives us no help in selecting a member of the noninferior set. Further
rules are needed to define a unique, fair noninferior solution. The lack of a
set of such rules acceptable to all parties is a major obstacle in practical
negotiations.
5. Economic Example
subject to

ẋi = fi(x1, ..., xN) − ui   (57)

ui ≥ 0,   xi ≥ x̄i   (58)
6. Conclusion
References
1. Ho, Y. C., Bryson, A. E., and Baron, S., Differential Games and Optimal Pursuit-Evasion Strategies, IEEE Transactions on Automatic Control, Vol. AC-10, No. 4, 1965.
2. Isaacs, R., Differential Games, John Wiley and Sons, New York, 1965.
3. Case, J. H., Equilibrium Points of N-Person Differential Games, University of Michigan, Department of Industrial Engineering, TR No. 1967-1, 1967.
4. Luce, R. D., and Raiffa, H., Games and Decisions, John Wiley and Sons, New York, 1957.
5. Da Cunha, N. O., and Polak, E., Constrained Minimization Under Vector-Valued Criteria in Finite-Dimensional Spaces, University of California at Berkeley, Electronics Research Laboratory, Memorandum No. ERL-M188, 1966.
6. Zadeh, L. A., Optimality and Non-Scalar-Valued Performance Criteria, IEEE Transactions on Automatic Control, Vol. AC-8, No. 1, 1963.
7. Klinger, A., Vector-Valued Performance Criteria, IEEE Transactions on Automatic Control, Vol. AC-9, No. 1, 1964.