
VOL. 7, NO. 2    WATER RESOURCES RESEARCH    APRIL 1971

Discrete Differential Dynamic Programing Approach
to Water Resources Systems Optimization

MANOUTCHEHR HEIDARI, VEN TE CHOW, PETAR V. KOKOTOVIĆ, AND DALE D. MEREDITH

University of Illinois, Urbana, Illinois 61801

Abstract. The optimization of operating policies of multiple unit and multiple purpose water resources systems by traditional dynamic programing with the use of high speed digital computers encounters two major difficulties: memory requirements and computer time requirements. This paper presents an iterative method that can ease the above difficulties considerably. The method starts with a trial trajectory satisfying a specific set of initial and final conditions and applies Bellman's recursive equation in the neighborhood of this trajectory. At the end of each iteration step a locally improved trajectory is obtained and used as the trial trajectory in the next step. The method has proved particularly effective in the case of so-called 'invertible' systems. The merits of the proposed approach are demonstrated through its application to a four-unit, two-purpose water resources system. To save computer time the example is restricted to deterministic inflows.

INTRODUCTION

To determine the operating policy of a reservoir system, we analyze a multiple stage process that usually has a nonlinear objective function, constraints on both states (storage) and decisions (release), and stochastic disturbances. Optimization techniques, such as the conjugate gradient [Fletcher and Powell, 1963; Fletcher and Reeves, 1964] and the second variation method [Bryson and Ho, 1969], may be adopted to search among a sequence of feasible decisions for the optimal set of decisions. However, these techniques require a sufficiently differentiable objective function, they cannot handle constraints without creating difficulties, and they require a major modification of the algorithm in order to incorporate the stochastic disturbances.

In contrast, dynamic programing [Bellman, 1957] does not have the limitations mentioned above. When expressed in discrete form, its recursive equation can even handle functions defined by tables. Constraints on states and decisions present no difficulties but reduce the computation efforts. Stochastic disturbances may be incorporated into the algorithm with little modification except for an increase in computer time. Dynamic programing is a technique specifically designed for analyzing multiple stage processes.

One disadvantage of dynamic programing is that it requires a high-speed memory that is beyond the capacity of known computers when the dimensionality is higher than four or five. Another difficulty usually encountered in the application of dynamic programing is the great amount of computer time required. Although the time savings due to dynamic programing are impressive when compared with the time required for the direct enumeration method, one must realize that the application of dynamic programing to a realistic problem can indeed be expensive in terms of computer time.

Several approaches avoiding such difficulties have been proposed, including the works of Mayne [1966], Larson [1968], Wong and Luenberger [1968], Jacobson [1968a, b, c], Lee [1969], and Korsak and Larson [1970]. Larson [1968] and Larson and Keckler [1967] have applied their modified algorithm of dynamic programing, which is based on Bellman's [1961] successive approximation method, to a four-dimensional water resources system. The works of Mayne [1966], Jacobson [1968a, b, c], and Jacobson and Mayne [1970] on differential dynamic programing may be considered the theoretical background for this study.
DISCRETE DIFFERENTIAL DYNAMIC PROGRAMING (DDDP) APPROACH

The DDDP is an iterative technique in which the recursive equation of dynamic programing is used to search for an improved trajectory among the discrete states in the neighborhood of a trial trajectory.

Consider the dynamic system whose state equation is

    s(n) = φ[s(n − 1), u(n − 1), n − 1]    n = 1, 2, ..., N    (1)

where n is an index specifying a stage (beginning of a time increment), N is the total number of time increments into which the time horizon has been divided, s(n) is an m-dimensional state vector at stage n (m being the number of state variables), u(n − 1) is a q-dimensional decision vector at stage n − 1 (q being the number of decision variables), and

    s(n) ∈ S(n)    u(n) ∈ U(n)    (2)

where S(n) is the admissible domain in the state space at stage n, and U(n) is the admissible domain in the decision space at stage n. In water resources systems, for example in a network of reservoirs, state refers to storage, and decision refers to release from storage. The objective function to be maximized is

    F = Σ_{n=1}^{N} R[s(n − 1), u(n − 1), n − 1]    (3)

where F is the sum of returns from the system over the time horizon and R[s(n − 1), u(n − 1), n − 1] is the return obtained as a result of a decision u(n − 1) that is made at stage n − 1 with the system in state s(n − 1) and that lasts until stage n.

The forward algorithm of dynamic programing may be used to optimize (3) over n stages as follows:

    F*[s(n), n] = max_{u(n−1) ∈ U(n−1)} {R[s(n − 1), u(n − 1), n − 1] + F*[s(n − 1), n − 1]}    (4)

where F*[s(n), n] is the maximum total of the returns from stage 0 to stage n when the state at stage n is s(n). Let us solve (1) for s(n − 1),

    s(n − 1) = ψ[s(n), u(n − 1), n − 1]    (5)

Substituting (5) into (4), we obtain the following recursive equation:

    F*[s(n), n] = max_{u(n−1) ∈ U(n−1)} {R[ψ[s(n), u(n − 1), n − 1], u(n − 1), n − 1] + F*[s(n − 1), n − 1]}    (6)

which may be solved for every s(n) as a function of u(n − 1) only. Solution of (6) for a specific state in (2) provides an optimum u(n − 1), i.e., the optimum decision that should be made for some state at stage n − 1 to bring the system to the specific state at stage n.

Let us assume that the objective function (3) for the system of (1) is to be optimized subject to (2) and that the m-dimensional state vectors at the initial and final stages are specified such that

    s(0) = a(0)    s(N) = a(N)    (7)

In the proposed DDDP approach a trial sequence of admissible decision vectors u'(n), n = 0, 1, ..., N − 1, called the trial policy, that satisfies (2) is assumed, and the state vectors at different stages are then determined. The sequence of values of the state vector satisfying (2) and (7) is called the trial trajectory and is designated by s'(n), n = 0, 1, ..., N. For invertible systems, which will be defined later, it is possible first to assume an admissible trial trajectory s'(n), n = 0, 1, ..., N, and then to use it to calculate the trial policy u'(n), n = 0, 1, ..., N − 1.

Introducing u'(n) and s'(n) into (3) we obtain F'

    F' = Σ_{n=1}^{N} R[s'(n − 1), u'(n − 1), n − 1]    (8)

where F' is the total return due to the trial trajectory and policy over the entire time horizon. F' may not be the optimum return.
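The forward recursion (4) can be written down compactly before the corridor idea is introduced. The following Python fragment is a minimal sketch, assuming a scalar state on a finite grid, a fixed initial state s0, and hypothetical callables R(s, u, n) and phi(s, u, n) for the stage return and the state equation; it illustrates the recursion, and it is not the authors' program.

```python
# Minimal sketch of the forward recursion (4); illustrative only.
def forward_dp(states, decisions, R, phi, s0, N):
    """Tabulate F*[s, n] for n = 1, ..., N by forward dynamic programing."""
    F = {(s0, 0): 0.0}        # F*[s(0), 0] = 0 at the fixed initial state
    best_u = {}               # optimum decision leading into each (s, n)
    for n in range(1, N + 1):
        for s_prev in states:
            if (s_prev, n - 1) not in F:
                continue                        # state not reachable at stage n-1
            for u in decisions:
                s_next = phi(s_prev, u, n - 1)  # state equation (1)
                if s_next not in states:
                    continue                    # state constraint (2)
                cand = F[(s_prev, n - 1)] + R(s_prev, u, n - 1)
                if cand > F.get((s_next, n), float("-inf")):
                    F[(s_next, n)] = cand       # recursion (4)
                    best_u[(s_next, n)] = u
    return F, best_u
```

The DDDP described below applies exactly this kind of sweep, but with the admissible states at each stage restricted to the lattice points of a corridor around the trial trajectory.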
Now, consider a set of incremental m-dimensional vectors

    Δs_i(n) = [δs_i1(n), δs_i2(n), ..., δs_im(n)]^T    n = 0, 1, ..., N;  i = 1, 2, ..., T^m    (9)

whose jth component δs_ij(n), j = 1, 2, ..., m, can take any one value σ_t, t = 1, 2, ..., T, from a set of assumed incremental values of the state domain. The value σ_t is the tth assumed increment from the state domain and T is the total number of assumed increments from the state domain. Thus the total number of Δs_i(n) vectors at stage n is T^m. When added to the trial trajectory at a stage, these vectors form an m-dimensional subdomain designated by D(n)

    D(n): s'(n) + Δs_i(n)    i = 1, 2, ..., T^m    (10)

Note that one value of σ_t must be zero since the trial trajectory is always in the subdomain. In Figure 1 two such subdomains, for m = 2, T = 4 and m = 3, T = 3, are presented. All D(n), n = 0, 1, ..., N, together are called a 'corridor' and designated by C, as shown in Figure 2 by the space between two solid lines for a system with m = 1, T = 3, and N = 10.

In DDDP a corridor C is used as a set of admissible states, and the optimization constrained to these states is performed by means of the recursive relation (6). The value of return F obtained is at least equal to or greater than F' in (8). If F is greater than F', the corresponding trajectory and policy obtained from corridor C are used in the next iteration step as the trial trajectory and trial policy. Thus the kth iteration step is as follows:

1. Use the results [s*(n)]_{k−1} and [u*(n)]_{k−1} of the (k − 1)th iteration step as the trial trajectory and policy for the kth iteration step; i.e.,

    [s'(n)]_k = [s*(n)]_{k−1}    [u'(n)]_k = [u*(n)]_{k−1}    (11)

2. Select [σ_1]_k, [σ_2]_k, ..., [σ_T]_k to define the kth corridor C_k, and use (6) to maximize F subject to s(n) ∈ C_k.

3. Among the optimum trajectories in corridor C_k, trace the optimum trajectory [s*(n)]_k satisfying boundary conditions (7) and the corresponding optimum policy [u*(n)]_k.

4. Determine F_k*; if F_k* − F_{k−1}* ≤ ε, where ε is some prespecified constant, stop the iteration; otherwise go to step 1.

Figure 3 shows the flow chart of this procedure; a sketch of the iteration loop is also given below.

Since the boundary conditions (7) must be satisfied, one may exclude from the analysis all the states in the subdomain at stage n = 0 except s'(0) = a(0). If in step 3 the trajectory having a final state a(N) is traced, the preservation of boundary conditions (7) is guaranteed.

Note that in the course of the iteration process, the corridor size may be varied gradually by choosing different [σ_t]_k, t = 1, 2, ..., T, in step 2. If the corridor size is kept constant for every iteration and little or no improvement can be achieved after the kth iteration, it is suggested that [σ_t]_k, t = 1, 2, ..., T, then be reduced starting at the (k + 1)th iteration and that the process be continued with the new corridor size until another iteration that behaves like the kth iteration is reached. Then the corridor size is further reduced starting at the next iteration, and the procedure is repeated until the condition in step 4 is satisfied. It is also possible to assume a different set of σ_t increments for each state variable.

Fig. 1. Examples of state subdomains at stage n. (a) A state subdomain D(n) defined by 16 lattice points in the neighborhood of s'(n) for a two-dimensional state vector and T = 4 (σ_1 = +2.0, σ_2 = +1.0, σ_3 = 0.0, and σ_4 = −1.0). (b) A state subdomain D(n) defined by 27 lattice points in the neighborhood of s'(n) for a three-dimensional state vector and T = 3 (σ_1 = +1.0, σ_2 = 0.0, and σ_3 = −1.0).
Fig. 2. Schematic representation of a trial trajectory [s'(n)]_k = [s*(n)]_{k−1} with n = 0, 1, ..., 10, the boundaries defining corridor C_k, and the optimal trajectory [s*(n)]_k with n = 0, 1, ..., N in C_k with N = 10 of the kth iteration for a system with m = 1 and T = 3.
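The iteration steps 1 through 4 can be summarized in a short sketch. The following Python fragment is a minimal illustration, assuming a hypothetical routine corridor_optimize(traj, sigmas) that applies recursion (6) to the lattice points of the corridor built around traj and returns the traced optimal trajectory, the optimal policy, and the total return; the routine name and its interface are assumptions, not the authors' program.

```python
# Hedged sketch of the DDDP iteration (steps 1-4), assuming a hypothetical
# corridor_optimize(traj, sigmas) that performs the constrained optimization
# within corridor C_k and traces the trajectory satisfying (7).
def dddp(trial_traj, sigmas, corridor_optimize, eps=1e-3, max_iter=100):
    """Iterate steps 1-4 until the improvement in F* is no more than eps."""
    traj, policy, F_prev = trial_traj, None, float("-inf")
    for k in range(1, max_iter + 1):
        # Steps 1-2: the current trajectory serves as the trial trajectory,
        # and corridor C_k is formed from the increments sigma_t at each stage.
        # Step 3: optimize by dynamic programing over the corridor.
        traj_k, policy_k, F_k = corridor_optimize(traj, sigmas)
        if F_k - F_prev <= eps:           # step 4: little or no improvement
            return traj_k, policy_k, F_k
        traj, policy, F_prev = traj_k, policy_k, F_k
    return traj, policy, F_prev
```

In this sketch the corridor size is held fixed; the gradual reduction of the increments described above would wrap this loop in an outer schedule.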

Fig. 3. Flow chart showing steps of the DDDP approach.

INVERTIBILITY OF SYSTEM EQUATIONS IN WATER RESOURCES SYSTEMS

The procedure described above requires interpolation to retrieve the second term on the right-hand side of (6). For high dimensional systems this interpolation usually produces inaccurate results and requires a large amount of computer time. For invertible systems, however, interpolation may be avoided.

A system is said to be invertible if the order of the state vector is equal to the order of the decision vector (i.e., m = q) and the matrix ∂φ_i/∂u_j, i, j = 1, ..., m, of the system

    s_1(n) = φ_1[s(n − 1), u(n − 1), n − 1]
    s_2(n) = φ_2[s(n − 1), u(n − 1), n − 1]
    ...
    s_m(n) = φ_m[s(n − 1), u(n − 1), n − 1]    (12)

is nonsingular for every n, u(n) ∈ U(n), and s(n) ∈ S(n). Assuming that (12) is an invertible system, one can solve for the decision variables in terms of the state variables

    u_1(n − 1) = ψ_1[s(n), s(n − 1), n − 1]
    u_2(n − 1) = ψ_2[s(n), s(n − 1), n − 1]
    ...
    u_m(n − 1) = ψ_m[s(n), s(n − 1), n − 1]    (13)
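As a numerical illustration of (12) and (13), consider a hypothetical two-reservoir cascade in which reservoir 1 spills into reservoir 2. The matrix ∂φ_i/∂u_j is then lower triangular with −1 on the diagonal, so it is nonsingular and the releases can be recovered from successive states by a linear solve. The sketch below is illustrative only; the cascade, the inflow values, and the helper name are assumptions and not part of the paper's example.

```python
import numpy as np

# Toy illustration of (12)-(13) for a hypothetical two-reservoir cascade.
B = np.array([[-1.0,  0.0],    # d(phi_i)/d(u_j): how each release enters
              [ 1.0, -1.0]])   # each storage balance
assert np.linalg.det(B) != 0   # nonsingular, so the system is invertible

def decisions_from_states(s_next, s_prev, inflow):
    """Invert the state equation: u(n-1) = psi[s(n), s(n-1), y(n-1)], cf. (13)."""
    # s(n) = s(n-1) + inflow + B u(n-1)  =>  u(n-1) = B^{-1} [s(n) - s(n-1) - inflow]
    return np.linalg.solve(B, s_next - s_prev - inflow)

u = decisions_from_states(np.array([4.0, 6.0]),   # s(n)
                          np.array([5.0, 5.0]),   # s(n-1)
                          np.array([2.0, 0.0]))   # inflow to reservoir 1 only
# u = [3., 2.]: release 3 from reservoir 1 and 2 from reservoir 2
```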
In the case of water resources systems it will be demonstrated that the assumption mentioned above is not restrictive. The ith component of the vector equation (12) for a water resources system may be written as

    s_i(n) = s_i(n − 1) + y_i(n − 1) − u_i(n − 1) − L_i(n − 1)    (14)

where s_i(n) is the storage at stage n, y_i(n − 1) is the inflow during the time period starting at stage n − 1 and lasting until stage n, u_i(n − 1) is the release, and L_i(n − 1) is the loss due to seepage and evaporation in the same time period. Since, for a system consisting of m components such as (14),

    ∂φ_i/∂u_j = −1 ≠ 0    i = j = 1, 2, ..., m    (15)

so that the matrix ∂φ_i/∂u_j, i, j = 1, 2, ..., m, is nonsingular, the water resources system is invertible. Note that invertibility is due to the association of a release with each storage unit in most reservoir systems. Ignoring the losses, one can write u_i(n − 1) in (14) in terms of inflow and states as

    u_i(n − 1) = s_i(n − 1) − s_i(n) + y_i(n − 1) = ψ_i[s_i(n − 1), s_i(n), y_i(n − 1)]    (16)

Assume that (3) is to be optimized with the forward dynamic programing algorithm for state s(n). Rather than use the state s(n) and a decision u(n − 1) ∈ U(n − 1) in (5) to calculate s(n − 1), one may use (16) to calculate the decisions that would be required for the states at stage n − 1, for which F*[s(n − 1), n − 1] has already been calculated, to go forward to state s(n). These decisions can then be tested to determine if they violate the constraints in (2). If the optimization is being carried out for the states in the corridor as defined for the DDDP, the use of invertibility provides T^m possible decisions, which when applied to the states in D(n − 1) will bring the system to s(n). Figure 4 shows the possible decisions for a system with m = 1 and T = 3. The T^m decisions then may be used in (6) to determine u*(n − 1) and F*[s(n), n] without interpolation to retrieve F*[s(n − 1), n − 1]. The same procedure may be repeated for the other states in the subdomain D(n) as defined in Figure 1.

Fig. 4. Possible decision paths leading to state s'(n) + δs_1(n) from stage n − 1 for a system with m = 1 and T = 3.

Using the equations for invertible systems, one may write (4) as

    F*[s(n), n] = max_{s(n−1) ∈ D(n−1)} {R[s(n − 1), ψ[s(n − 1), s(n), y(n − 1)], n − 1] + F*[s(n − 1), n − 1]}    (17)

where D(n − 1) is the state subdomain located in the neighborhood of the trial trajectory at stage n − 1.

It must be emphasized that the justification for this process lies in the assumption that σ_t, t = 1, 2, ..., T, are chosen properly. If they are not, most of the decisions calculated by (16) for state s(n) may be inadmissible. When the values of σ_t are kept within an admissible range, the policy slowly converges to the optimal one in the DDDP approach.

Use of the equations for invertible systems eliminates the inaccurate and time consuming interpolation required to retrieve the term F*[s(n − 1), n − 1] in (6) by forcing the trajectories to go through the states at stage n − 1 for which F*[s(n − 1), n − 1] has already been calculated and stored. For invertible, multiple dimensional systems the accuracy and speed of this procedure are much greater than the accuracy and speed of the interpolation procedure.
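To make the role of (16) and (17) concrete, the following Python fragment sketches one stage of the interpolation-free recursion for a single reservoir (m = 1) on an integer lattice. The return function R(s_prev, u, n), the release bounds, and the argument names are placeholders; this is a sketch of the idea, not the authors' program.

```python
# Sketch of recursion (17) for one stage with m = 1 on an integer lattice.
# F_prev maps each state in D_prev to F*[s(n-1), n-1]; R is a hypothetical
# stage-return function R(s_prev, u, n).
def stage_update(F_prev, D_prev, D_curr, inflow, n, R, u_min, u_max):
    """Return F*[., n] and the best predecessor for each state in D_curr."""
    F_curr, best_prev = {}, {}
    for s in D_curr:                      # candidate states s(n) in the corridor
        for s_prev in D_prev:             # states with F* already tabulated
            u = s_prev - s + inflow       # decision from the inversion (16)
            if not (u_min <= u <= u_max): # test the decision against (2)
                continue
            cand = R(s_prev, u, n - 1) + F_prev[s_prev]
            if cand > F_curr.get(s, float("-inf")):
                F_curr[s] = cand          # recursion (17), no interpolation needed
                best_prev[s] = s_prev
    return F_curr, best_prev
```

Because every candidate predecessor already lies on a lattice point of D(n − 1), F*[s(n − 1), n − 1] is retrieved by a table lookup rather than by interpolation, which is the efficiency argument made above.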
EXAMPLE

The following simplified system, which was formulated and solved by Larson [1968] by linear programing and successive approximation dynamic programing, was solved by means of the proposed approach.

The operating policy of the four-dimensional (m = 4) reservoir network presented in Figure 5 is to be optimized over 12 operating periods (N = 12). The inflows into reservoirs 1 and 2 during any operating period are y_1 and y_2, respectively. The outflows or releases (decisions) u_i(n), i = 1, 2, 3, 4 and n = 0, 1, ..., 11, from the reservoirs are used to generate hydropower, and u_4(n) after passing through the turbines is diverted toward an irrigation project. The storages of the four reservoirs represent a four-dimensional state vector whose constraints during any operating period were set as

    0 ≤ s_1(n) ≤ 10    0 ≤ s_2(n) ≤ 10
    0 ≤ s_3(n) ≤ 10    0 ≤ s_4(n) ≤ 15    (18)
    n = 0, 1, ..., 12

The constraints on decisions during any operating period are

    0 ≤ u_1(n) ≤ 3    0 ≤ u_2(n) ≤ 4
    0 ≤ u_3(n) ≤ 4    0 ≤ u_4(n) ≤ 7    (19)
    n = 0, 1, ..., 11

The system equations expressing the dynamic behavior of each component at any stage n are

    s_1(n) = s_1(n − 1) + y_1 − u_1(n − 1)
    s_2(n) = s_2(n − 1) + y_2 − u_2(n − 1)
    s_3(n) = s_3(n − 1) + u_1(n − 1) − u_3(n − 1)    (20)
    s_4(n) = s_4(n − 1) + u_2(n − 1) + u_3(n − 1) − u_4(n − 1)
    n = 1, 2, ..., 12

The inflows were set at

    y_1 = 2    y_2 = 3    (21)

for all time increments. All the preceding variables and constants have units of volume.

The performance criterion to be maximized is the sum of the returns due to power generated by the four power plants and the return from the diversion of u_4(n) to the irrigation project:

    F = Σ_{n=0}^{11} Σ_{i=1}^{4} b_i(n) u_i(n) + Σ_{n=0}^{11} b_5(n) u_4(n) + Σ_{i=1}^{4} g_i[s_i(N), a_i(N)]    (22)

where F is the total return from the system for the 12 time periods, b_i(n) is the unit return due to activity i, i = 1, ..., 5, during the period starting at stage n and lasting until stage n + 1, and g_i[s_i(N), a_i(N)] is a function that assesses a penalty to the system when the final state of the ith component of the system at stage N is s_i(N) instead of the desired state a_i(N), i = 1, 2, 3, 4. Such a penalty function is necessary for traditional dynamic programing, for which boundary conditions may not be satisfied.

The penalty function in (22) was assumed to be

    g_i[s_i(N), a_i(N)] = 0    for s_i(N) ≥ a_i(N)    (23)

with a penalty assessed otherwise. The desired state vectors of the initial and final stages for i = 1, 2, 3, 4 were assumed to be

    s_i(0) = a_i(0) = 5    s_i(12) = a_i(12) = 5    (24)

Fig. 5. Reservoir network of a simplified system.
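A compact encoding of the example system (18) through (21) might look as follows; the transition function mirrors (20) and the bounds mirror (18) and (19). This is an illustrative sketch, not the authors' program, and the helper names are assumptions.

```python
import numpy as np

# Sketch of the four-reservoir example, eqs. (18)-(21); illustrative only.
S_MAX = np.array([10, 10, 10, 15])   # storage bounds, eq. (18)
U_MAX = np.array([3, 4, 4, 7])       # release bounds, eq. (19)
Y = np.array([2.0, 3.0])             # inflows y1 = 2 and y2 = 3, eq. (21)

def transition(s_prev, u):
    """State equations (20): storages at stage n from storages and releases at n-1."""
    s1 = s_prev[0] + Y[0] - u[0]
    s2 = s_prev[1] + Y[1] - u[1]
    s3 = s_prev[2] + u[0] - u[2]
    s4 = s_prev[3] + u[1] + u[2] - u[3]
    return np.array([s1, s2, s3, s4])

def feasible(s, u):
    """Constraints (18) and (19)."""
    return bool(np.all((0 <= s) & (s <= S_MAX)) and np.all((0 <= u) & (u <= U_MAX)))
```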


There are a total of five activities in the above criterion: four hydropower generation activities and one irrigation activity. The unit benefit functions of these activities, b_i(n), i = 1, 2, ..., 5, are given in Table 1.

TABLE 1. Benefit Functions Used To Calculate Optimal Policies of the System in Figure 5

 n    b_1(n)  b_2(n)  b_3(n)  b_4(n)  b_5(n)
 0     1.1     1.4     1.0     1.0     1.6
 1     1.0     1.1     1.0     1.2     1.7
 2     1.0     1.0     1.2     1.8     1.8
 3     1.2     1.0     1.8     2.5     1.9
 4     1.8     1.2     2.5     2.2     2.0
 5     2.5     1.8     2.2     2.0     2.0
 6     2.2     2.5     2.0     1.8     2.0
 7     2.0     2.2     1.8     2.2     1.9
 8     1.8     2.0     2.2     1.8     1.8
 9     2.2     1.8     1.8     1.4     1.7
10     1.8     2.2     1.4     1.1     1.6
11     1.4     1.8     1.1     1.0     1.5

Larson's [1968] solution to this problem (solved by successive approximation dynamic programing and checked by linear programing) is shown in Figure 6 by solid lines. The optimum return is 401.3. (The optimal trajectory presented in table 12.11 of Larson [1968] is slightly in error, as noted through private communication with R. E. Larson, 1969.)

Application of the proposed approach to this system, which is invertible, starts with the assumption of a trial trajectory s'(n), n = 0, 1, ..., 12, satisfying (18) and (24). When substituted in (20) with the constants in (21), the trial trajectory will produce a trial policy u'(n), n = 0, 1, ..., 11, which should be checked for constraints (19). It is considerably easier to treat this problem as a free end point problem, i.e., not to satisfy either the initial or the final boundary condition. However, the simplicity of the system equations in this example makes it possible to satisfy both boundary conditions in (24). The penalty function (23) is therefore not needed in the DDDP, since the boundary conditions (24) are always satisfied. Three such trial trajectories, labeled 1, 2, and 3 in Figure 6, are calculated. Next, three values of σ are assumed,

    σ_1 = 1.0    σ_2 = 0    σ_3 = −1.0    (25)

and the T^m incremental vectors formed from them, when added to the trial trajectory, produce a subdomain consisting of 81 lattice points at each stage.

Fig. 6. Trial trajectories 1, 2, and 3 and optimal trajectory of system in Figure 5, using σ_1 = +1.0, σ_2 = 0.0, and σ_3 = −1.0.
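The 81 lattice points per stage follow from T^m = 3^4 combinations of the increments in (25). A short Python sketch of forming these incremental vectors and the subdomain D(n) of (10) is given below; it is illustrative only, and the helper names are assumptions.

```python
from itertools import product

# Sketch of forming the T**m = 3**4 = 81 incremental vectors of (9) with the
# increments of (25); D(n) is then s'(n) plus each vector, eq. (10).
SIGMA = (1.0, 0.0, -1.0)   # sigma_1, sigma_2, sigma_3 from (25)
M = 4                      # number of state variables

increments = list(product(SIGMA, repeat=M))   # 81 incremental vectors
assert len(increments) == 81

def subdomain(trial_state):
    """Lattice points of D(n) around one trial-trajectory state s'(n)."""
    return [tuple(s + d for s, d in zip(trial_state, delta))
            for delta in increments]
```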
The problem is solved three times. Each time the calculations start with one of the trial trajectories 1, 2, or 3. All three solutions converge to the optimal trajectory as shown in Figure 6. Figure 7 shows the rate of convergence of the three trial trajectories to the optimal trajectory. The number of iterations required for convergence is 7, 12, and 7 for trial trajectories 1, 2, and 3, respectively.

Fig. 7. Total benefit F as a function of the number of iterations for trial trajectories 1, 2, and 3.

After the required iterations for the three trial trajectories, the reduction of σ_t, t = 1, 2, 3, does not produce any improvement in the return. This result may be attributed to three factors: (1) the optimal trajectory of this system follows full integer states; (2) the trial trajectories in Figure 6 are chosen so that they follow full integer states; and (3) the values of σ_t, t = 1, 2, 3, for all stages are set at full integers. In a separate try, trial trajectory 1 is subjected to the iteration process with

    σ_1 = 1.3    σ_2 = 0    σ_3 = −1.3    (26)

for all stages starting with iteration 1, and the idea of reducing σ_t, t = 1, 2, 3, is employed. After a total of 18 iterations in four corridors, the states shown by solid circles in Figure 8 are obtained, producing a return of 399.06 as compared to the optimal return of 401.3. Thus one concludes that when the optimal values of σ_t are unknown, the result may be considered only an approximation to the optimum.

Fig. 8. Near-optimal trajectory shown by solid circles and optimal trajectory of system in Figure 5, using trial trajectory 1 and σ_1 = +1.3, σ_2 = 0.0, and σ_3 = −1.3.
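The corridor-reduction schedule described above can be sketched as an outer loop around the iteration routine. The fragment below is a hedged interpretation only: the shrink factor, the lower bound on the increments, and the corridor_optimize interface are assumptions rather than values or routines given in the paper.

```python
# Hedged sketch of the corridor-reduction schedule: when an iteration yields
# little or no improvement, the increments sigma_t are scaled down and the
# iteration continues with the smaller corridor.
def dddp_with_reduction(trial_traj, sigmas, corridor_optimize,
                        shrink=0.5, sigma_min=0.1, eps=1e-3):
    traj, policy, F_prev = trial_traj, None, float("-inf")
    while max(abs(s) for s in sigmas if s != 0) >= sigma_min:
        traj, policy, F_k = corridor_optimize(traj, sigmas)
        if F_k - F_prev <= eps:                      # stalled at this corridor size
            sigmas = [s * shrink for s in sigmas]    # reduce the corridor
        F_prev = F_k
    return traj, policy, F_prev
```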
SUMMARY AND CONCLUSIONS

The major factors that inspired the DDDP approach were the inherent drawbacks of traditional dynamic programing, namely, memory capacity and computer time requirements. By limiting optimization to a few lattice points around a trial trajectory, the memory requirements appear to have been curbed substantially. To illustrate this point numerically, consider the memory requirements of the example. The problem has four state variables, whose admissible ranges are given in (18). With the values of σ_t given in (25) the DDDP requires 243 words of computer memory, whereas traditional dynamic programing using the same grid size would require 63,888 words.

Another major difficulty in applying traditional dynamic programing is the computer time required because of the number of computations and comparisons that must be performed at each lattice point. At each stage of the example there are 21,296 lattice points. If the domain of the decisions given in (19) is divided into lattice points with Δu = 1 unit, a total of 4 × 5 × 5 × 8 = 800 combinations of decisions must be tested at each state lattice point of each stage. By limiting the optimization to the neighborhood of a trial trajectory, the number of lattice points is reduced, and therefore fewer tests have to be made per state of each stage. Furthermore, if the system is invertible, even greater efficiency may be achieved. For example, if T = 3 at each stage, then for a four-dimensional invertible problem there are only 3^4 = 81 possibilities that states at stage n − 1 may lead to a particular state at stage n. Therefore at a particular state of stage n only 81 tests instead of 800 will need to be made.

Table 2 summarizes the processing time of the IBM 360/75 required to solve the example by means of the proposed approach. The number of iterations in this table is one more than that needed to arrive at the optimum results. The last iteration is required to confirm that optimum results have been reached in the previous iteration.

TABLE 2. Computer (IBM 360/75) Time Requirements of the Proposed Approach for the Solution of the System in Figure 5

Trial        Nominal Operating    No. of        Total Processing    Processing Time per
Trajectory   Periods              Iterations    Time, sec           Iteration, sec
1            12                    8            35.32               4.42
2            12                   13            48.39               3.72
3            12                    8            31.04               3.88

If the values of σ_t are not chosen properly, it is possible for the procedure to converge to a local minimum or maximum. Jacobson and Mayne [1970] and the results of the present study indicate that it may be advisable to calculate the values of σ_t as a function of the stage either at the beginning of each iteration or when the results of two successive iterations show little or no improvement in the return.

One must realize that the difficulties encountered in the choice of the trial trajectory and in the determination of the state subdomain are by no means limited to the DDDP approach. Other iterative optimization techniques, such as the gradient methods and the second variation methods, also face these problems.

The incorporation of stochastic inflows into (17) is a rather straightforward operation. When the equations for an invertible system are used, the states of the system at n and n − 1 are known. As a result, the first term on the right-hand side of (17) represents the influence of the random disturbances, whereas the second term is deterministic. If the range of possible inflows is divided into V discrete levels, (17) may be written as

    F*[s(n), n] = max_{s(n−1) ∈ D(n−1)} Σ_{v=1}^{V} p[y(n − 1), v] {R[s(n − 1), ψ[s(n − 1), s(n), y(n − 1)], n − 1] + F*[s(n − 1), n − 1]}    (27)

where p[y(n − 1), v] is the probability of the vth level of the random variable y(n − 1). For systems with stochastic disturbances the independent probability density function must be replaced by the conditional probability density function.
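A sketch of the stochastic recursion (27) for a single reservoir is given below. The inflow levels, their probabilities, the return function R, and the handling of infeasible transitions are all assumptions introduced for illustration; the fragment is not the authors' program.

```python
# Sketch of the stochastic recursion (27) for m = 1: the expectation is taken
# over V discretized inflow levels with probabilities p_v (hypothetical values).
# F_prev maps each state in D_prev to F*[s(n-1), n-1].
def stochastic_stage_update(F_prev, D_prev, D_curr, inflow_levels, probs,
                            n, R, u_min, u_max):
    F_curr = {}
    for s in D_curr:                          # candidate states s(n)
        for s_prev in D_prev:                 # states with F*[s(n-1), n-1] known
            expected, feasible = 0.0, True
            for y, p in zip(inflow_levels, probs):
                u = s_prev - s + y            # inversion (16) for inflow level y
                if not (u_min <= u <= u_max):
                    feasible = False          # simplification: reject this s_prev
                    break                     # if any inflow level is infeasible
                expected += p * (R(s_prev, u, n - 1) + F_prev[s_prev])
            if feasible and expected > F_curr.get(s, float("-inf")):
                F_curr[s] = expected          # recursion (27)
    return F_curr
```

The rejection of a predecessor state whenever any inflow level gives an inadmissible release is one possible design choice; other treatments of infeasible transitions could be substituted without changing the structure of (27).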
NOTATION

a(0), a(N), m-dimensional vectors specifying the state of the system at the beginning and the end of the time horizon, respectively;
b_i(n), unit return due to the ith activity in the time period starting at stage n;
C, corridor formed by all D(n), n = 0, 1, ..., N;
D(n), subdomain formed at stage n;
F, sum of the returns for N time periods;
F', sum of the returns for N time periods due to u'(n) and s'(n);
F*, optimum sum of the returns for N time periods;
g_i, function assessing a penalty on the final state of the ith component of the system;
m, order of the system, i.e., the number of state variables;
N, total number of time periods in the time horizon;
n, beginning of a time period, called a stage;
p, probability of occurrence of a flow;
q, number of decision variables in the system;
R, return from the system in one time increment;
s(n), m-dimensional state (storage) vector at stage n;
S(n), admissible domain of s(n);
s'(n), n = 0, 1, ..., N, vector of the trial trajectory;
s*(n), n = 0, 1, ..., N, vector of the optimal trajectory;
T, total number of assumed increments from the state domain;
u(n), q-dimensional decision (release) vector to be implemented in the time period starting at stage n;
U(n), admissible domain of u(n);
u'(n), n = 0, 1, ..., N − 1, vector of the trial policy;
u*(n), n = 0, 1, ..., N − 1, vector of the optimum policy;
y_i, inflow into the ith reservoir;
Δs_i(n), ith m-dimensional incremental vector for stage n;
δs_ij(n), jth component of Δs_i(n);
ε, constant;
φ_i, function describing the dynamic behavior of the ith component of the system;
ψ_i, decision of the ith component as a function of states only;
σ_t, value of the tth assumed increment from the state domain.

Acknowledgments. This paper presents a portion of the results of a research project on 'Advanced Methodologies for Water Resources Planning' sponsored by the U.S. Office of Water Resources Research and supported by funds provided by the U.S. Department of the Interior as authorized under the Water Resources Research Act of 1964, P.L. 88-379, Agreement 14-01-0001-1899. A major portion of the work by the senior author was supported financially by the Illinois State Geological Survey. A detailed report on this study, including complete mathematical treatment of the DDDP, another example of application, and computer programs, will be published in the Civil Engineering Studies Hydraulic Engineering Series by the Department of Civil Engineering, University of Illinois, Urbana.

REFERENCES

Bellman, R., Dynamic Programming, 340 pp., Princeton University Press, Princeton, New Jersey, 1957.
Bellman, R., Adaptive Control Processes, 255 pp., Princeton University Press, Princeton, New Jersey, 1961.
Bryson, A. E., and Yu-Chi Ho, Applied Optimal Control, 481 pp., Blaisdell, Waltham, Massachusetts, 1969.
Fletcher, R., and M. J. D. Powell, A rapidly convergent descent method for minimization, Comput. J., 6, 163-168, 1963.
Fletcher, R., and C. M. Reeves, Function minimization by conjugate gradients, Comput. J., 7, 149-154, 1964.
Jacobson, D. H., Second-order and second-variation methods for determining optimal control: A comparative study using differential dynamic programming, Int. J. Contr., 7(2), 175-196, 1968a.
Jacobson, D. H., New second-order and first-order algorithms for determining optimal control: A differential dynamic programming approach, J. Optimization Theory Appl., 2(6), 411-440, 1968b.
Jacobson, D. H., Differential dynamic programming methods for solving bang-bang control problems, IEEE Trans. Automat. Contr., AC-13(6), 661-675, 1968c.
Jacobson, D. H., and D. Q. Mayne, Differential Dynamic Programming, 208 pp., American Elsevier, New York, 1970.
Korsak, A. J., and R. E. Larson, A dynamic programming successive approximation technique with convergence proofs, 2, Convergence proofs, Automatica, 6(2), 261-270, 1970.
Larson, R. E., State Increment Dynamic Programming, 256 pp., American Elsevier, New York, 1968.
Larson, R. E., and W. G. Keckler, Applications of dynamic programming to water resources problems, in Symposium on Computer Control of Natural Resources and Public Utilities, 52 pp., International Federation of Automatic Control, Haifa, Israel, September 1967.
Lee, E. S., Dynamic programming, quasilinearization and the dimensionality difficulty, J. Math. Anal. Appl., 27, 303-322, 1969.
Mayne, D., A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems, Int. J. Contr., 3(1), 85-95, 1966.
Wong, P. J., and D. G. Luenberger, Reducing the memory requirements of dynamic programming, Oper. Res., 16, 1115-1125, November-December 1968.

(Manuscript received October 5, 1970; revised December 14, 1970.)
