Dynamic Programming 747

22.1 INTRODUCTION
The decision-making process often involves several decisions to be taken at different times. For example,
problems of inventory control, evaluation of investment opportunities, long-term corporate planning, and
so on, require sequential decision-making. The mathematical technique of optimizing a sequence of
interrelated decisions over a period of time is called dynamic programming. The dynamic programming
approach uses the idea of recursion to solve a complex problem that is broken into a series of interrelated
(sequential) decision stages (also called subproblems), where the outcome of a decision at one stage affects
the decisions at each of the following stages. The word dynamic is used because time is explicitly
taken into consideration.
Dynamic programming (DP) differs from linear programming in two ways:
(i) In DP, there is no standard procedure (algorithm), as there is in LP, to solve any decision problem. The DP
technique breaks the given problem into a sequence of smaller subproblems, which are
then solved in sequential order (stage by stage).
(ii) The LP approach provides a one-time-period (single-stage) solution to a problem, whereas the DP approach
is useful for decision-making over time and solves each subproblem optimally.

22.2 DYNAMIC PROGRAMMING TERMINOLOGY

Regardless of the type or size of a decision problem, there are certain terms and concepts that are commonly
used in the dynamic programming approach for solving such problems.
Stage The dynamic programming problem can be decomposed or divided into a sequence of smaller sub-
problems called stages. At each stage there are a number of decision alternatives (courses of action) and
a decision is made by selecting the most suitable alternative. Stages represent different time periods in the
planning period. For example, in the replacement problem each year is a stage, in the salesman allocation
problem each territory represents a stage.
State Each stage in a dynamic programming problem is associated with a certain number of states. These
states represent various conditions of the decision process at a stage. The variables that specify the
condition of the decision process or describe the status of the system at a particular stage are called state
variables. These variables provide information for analyzing the possible effects that the current decision
could have upon future courses of action. At any stage of the decision-making process there could be a
finite or infinite number of states. For example, in the shortest-route problem, a specific city at any stage
is referred to as a state variable.

Return function At each stage, a decision is made that can affect the state of the system at the next
stage and help in arriving at the optimal solution at the current stage. Every decision that is made has its
own worth or benefit associated with it and can be described in the form of an algebraic equation, called a return
function. This return function, in general, depends upon the state variable as well as the decision made
at a particular stage. An optimal policy or decision at a stage yields the optimal (maximum or minimum) return
for a given value of the state variable.
Figure 22.1 depicts the decision alternatives available at each stage for evaluation. The range of
such decision alternatives and their associated returns at a particular stage is a function of the state input
to the stage itself. The state input to a stage is the output from the previous (higher-numbered) stage, and
that stage's output is, in turn, a function of the state input to it and the decision taken at that stage.
Thus, to evaluate any stage we need to know the values of the state input to it (there may be more than
one state input to a stage) and the decision alternatives and their associated returns at that stage.

Fig. 22.1  Information Flow between Stages
748 Operations Research: Theory and Applications

For a multistage decision process, functional relationship among state, stage and decision may be
described as shown in Fig. 22.2.

Fig. 22.2  Functional Relationship among Components of a DP Model

where    n = stage number
        sn = state input to stage n from stage n + 1. Its value is the status of the system resulting from
             the previous (n + 1) stage decision.
        dn = decision variable at stage n (independent of previous stages). This represents the range of
             alternatives available when making a decision at stage n.
        fn = rn(sn, dn) = return (or objective) function for stage n.

Further, suppose that there are n stages at which a decision is to be made. These n stages are all
interconnected by a relationship (called the transition function):

        Output at stage n = (Input to stage n) * (Decision at stage n)
that is,
        sn – 1 = sn * dn

where * represents any mathematical operation, namely addition, subtraction, division or multiplication. The
units of sn, dn and sn – 1 must be homogeneous.
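As a concrete illustration, the sketch below takes the operation * to be subtraction, as in a resource-allocation setting where the units allocated at stage n are removed from the state. The function name and the numbers used are hypothetical, chosen only to make the relationship sn – 1 = sn * dn concrete:

```python
# Hypothetical transition function with * taken as subtraction: the state
# s_n is the resource still available, the decision d_n is the amount
# allocated at stage n, and s_{n-1} = s_n - d_n is passed to the next stage.

def transition(s_n, d_n):
    """Return s_{n-1} = s_n - d_n, the resource left for stage n - 1."""
    if not 0 <= d_n <= s_n:
        raise ValueError("cannot allocate more than the available resource")
    return s_n - d_n

print(transition(10, 4))   # state handed on to stage n - 1
```

Note that sn and dn are measured in the same units (here, units of resource), as the homogeneity requirement demands.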
It can be seen that at each stage of the problem, there are two input variables: state variable, sn and
decision variable, dn. The state variable (state input) relates the present stage back to the previous stage.
For example, the current state sn provides complete information about various possible conditions in which
the problem is to be solved when there are n stages to go. The decision dn is made at stage n for optimizing
the total return over the remaining n – 1 stages. The decision dn, which optimizes the output at stage n,
produces two outputs: (i) the return function rn (sn, dn) and (ii) the new state variable sn – 1.
The return function is expressed as a function of the state variable sn and the decision variable dn.
The state of the process at the beginning of the next stage (stage n – 1) is given by the
transition function (state transformation):

        sn – 1 = tn(sn, dn)

where tn represents a state transformation function whose form depends on the particular problem to be
solved. This formula allows the transition from one stage to another.

22.3 DEVELOPING OPTIMAL DECISION POLICY

Dynamic programming is an approach in which the problem is broken down into a number of smaller
subproblems called stages. These subproblems are then solved sequentially until the original problem is
finally solved. A particular sequence of alternatives (courses of action) adopted by the decision-maker in
a multistage decision problem is called a policy. The optimal policy, therefore, is the sequence of alternatives
that achieves the decision-maker’s objective. The solution of a dynamic programming problem is based
upon Bellman’s principle of optimality (recursive optimization technique), which states:
The optimal policy must be one such that, regardless of how a particular state is reached, all later
decisions (choices) proceeding from that state must be optimal.
Based on this principle of optimality, an optimal policy is derived by solving one stage at a time, and then
sequentially adding a series of one-stage-problems that are solved until the optimal solution of the initial
problem is obtained. The solution procedure is based on either a backward induction process or a forward
induction process. In the backward process, the problem is solved by starting with the last stage and
working backwards towards the first stage, making optimal decisions at each stage. In the forward
process, the problem is solved by starting with the initial stage and working towards the last stage,
making an optimal decision at each stage.
The exact recursion relationship depends on the nature of the problem to be solved by dynamic
programming. The one stage return is given by:
f1 = r1 (s1 , d1)
and the optimal value of f1 under the state variable s1 can be obtained by selecting a suitable decision
variable d1. That is,

        f1*(s1) = Opt {r1(s1, d1)}
                   d1

The range of d1 is determined by s1, but s1 is determined by what has happened in Stage 2. Then in Stage 2
the return function would take the form:

        f2*(s2) = Opt {r2(s2, d2) * f1*(s1)};   s1 = t2(s2, d2)
                   d2

By continuing the above logic recursively for a general n-stage problem, we have

        fn*(sn) = Opt {rn(sn, dn) * f*n–1(sn – 1)};   sn – 1 = tn(sn, dn)
                   dn

The symbol * denotes any mathematical relationship between sn and dn, including addition, subtraction and
multiplication.
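The general recursion can be sketched as a short memoized program. Everything below except the recursion itself is an assumption: the combining operation * is taken as addition, Opt as minimization, and the stage returns rn, transitions tn and decision sets are hypothetical toy choices made only to give the recursion something concrete to work on:

```python
# A minimal sketch of the backward recursion
#   fn*(sn) = Opt over dn of { rn(sn, dn) + f*n-1(sn-1) },  sn-1 = tn(sn, dn)
# with Opt = min and * = addition. All stage data are hypothetical.

from functools import lru_cache

N = 3                                    # number of stages (hypothetical)
decisions = {1: [0, 1, 2], 2: [0, 1, 2], 3: [0, 1, 2]}

def r(n, s, d):
    """Hypothetical stage return rn(sn, dn): penalty on unallocated units."""
    return (s - d) ** 2

def t(n, s, d):
    """Hypothetical transition tn(sn, dn): state passed to stage n - 1."""
    return s - d

@lru_cache(maxsize=None)
def f(n, s):
    """f*(sn) with n stages to go: optimal total return from state s."""
    if n == 0:
        return 0                         # no stages remain, nothing to add
    return min(r(n, s, d) + f(n - 1, t(n, s, d))
               for d in decisions[n] if d <= s)

print(f(N, 4))
```

The memoization (lru_cache) ensures that each (stage, state) pair is evaluated only once, which is exactly what solving one-stage subproblems sequentially achieves in the tabular method.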
The General Procedure
The procedure for solving a problem by using the dynamic programming approach can be summarized in
the following steps:

Step 1: Identify the problem decision variables and specify the objective function to be optimized under
certain limitations, if any.
Step 2: Decompose (or divide) the given problem into a number of smaller sub-problems (or stages).
Identify the state variables at each stage and write down the transformation function as a function of the
state variable and the decision variable at the next stage.
Step 3: Write down a general recursive relationship for computing the optimal policy. Decide whether
to follow the forward or the backward method for solving the problem.
Step 4: Construct appropriate tables to show the required values of the return function at each stage,
as shown in Table 22.1.
Step 5: Determine the overall optimal policy or decisions and its value at each stage. There may be more
than one such optimal policy.

Table 22.1  Stage 1

  Decision, dn →      fn(sn, dn)      Optimal Return      Optimal Decision
                          dn              fn*(sn)               dn*
  States, sn

22.4 DYNAMIC PROGRAMMING UNDER CERTAINTY

Decision problems in which the conditions (constraints) at each stage, i.e. the state variables, are known
with certainty can be solved by dynamic programming.

Model I : Shortest Route Problem

Example 22.1 A salesman located in city A decided to travel to city B. He knew the distances of
alternative routes from city A to city B and drew a highway network map, as shown in Fig. 22.3.
The city of origin, A, is city 1. The destination city, B, is city 10. The other cities through which the
salesman will have to pass are numbered 2 to 9. The arrows represent the routes between cities, and the
distance in kilometres is indicated on each route. The salesman's problem is to find the shortest route
from city A to city B.

Fig. 22.3  Network of Routes

Solution To solve the problem, we need to define the problem's stages, decision variables, state variables,
return function and transition function. For this particular problem, the following definitions will be used
to denote the various state and decision variables:
        dn = decision variables that define the immediate destination when there are n (n = 1, 2, 3, 4)
             stages to go
        sn = state variables that describe a specific city at any stage
   Dsn, dn = distance associated with the state variable sn and the decision variable dn for the
             current nth stage
  fn(sn, dn) = minimum total distance for the last n stages, given that the salesman is in state sn and
             selects dn as the immediate destination
    fn*(sn) = optimal path (minimum distance) when the salesman is in state sn with n more stages
             to go for reaching the final stage (destination)
We start calculating distances between a pair of cities from destination city 10 (= x1) and work backwards
x 5 → x 4 → x 3 → x 2 → x1 to find the optimal path. The recursion relationship for this problem can be
stated as follows:
        fn*(sn) = Min {Dsn, dn + f*n–1(dn)};   n = 1, 2, 3, 4
                   dn

where f*n–1(dn) is the optimal distance for the previous stages.


Working backward in stages from city B to city A, we first determine the shortest distance to city B
(node 10) from state s1 = 8 (node 8) and from state s1 = 9 (node 9). Since the only immediate destination
from either state is node 10, with D8, 10 = 7 and D9, 10 = 9, the optimal values are f1*(8) = 7 and
f1*(9) = 9. These results are shown in Table 22.2.

Table 22.2  Stage 2

  Decision, d1 →    f1(s1, d1) = Ds1, d1    Minimum Distance    Optimal Decision
                            10                    f1*(s1)              d1*
  States, s1    8            7                       7                 10
                9            9                       9                 10

We move backward to stage 3. Suppose that the salesman is at state s2 = 5 (node 5). Here he has to
decide whether to go to d2 = 8 (node 8) or to d2 = 9 (node 9). For this he must evaluate the two sums:

        D5, 8 + f1*(8) = 4 + 7 = 11   (to state s1 = 8)
        D5, 9 + f1*(9) = 8 + 9 = 17   (to state s1 = 9)
The distance function for travelling from state s2 = 5 is the smaller of these two sums:

        f2*(5) =    Min    {11, 17} = 11   (to state s1 = 8)
                 d2 = 8, 9

Similarly, the calculation of the distance function for travelling from states s2 = 6 and s2 = 7 can be
completed as follows:

For state s2 = 6:
        f2*(6) =    Min    {D6, 8 + f1*(8) = 3 + 7 = 10;  D6, 9 + f1*(9) = 7 + 9 = 16}
                 d2 = 8, 9
               = 10   (to state s1 = 8)

For state s2 = 7:
        f2*(7) =    Min    {D7, 8 + f1*(8) = 8 + 7 = 15;  D7, 9 + f1*(9) = 4 + 9 = 13}
                 d2 = 8, 9
               = 13   (to state s1 = 9)
These results are entered into the two-stage table as shown in Table 22.3.

Table 22.3  Stage 3

  Decision, d2 →    f2(s2, d2) = Ds2, d2 + f1*(d2)    Minimum Distance    Optimal Decision
                          8             9                  f2*(s2)              d2*
  States, s2    5        11            17                    11                  8
                6        10            16                    10                  8
                7        15            13                    13                  9

The results that we obtain by continuing the same process for stages 4 and 5, are shown in Tables 22.4
and 22.5.

Table 22.4  Stage 4

  Decision, d3 →    f3(s3, d3) = Ds3, d3 + f2*(d3)    Minimum Distance    Optimal Decision
                        5         6         7              f3*(s3)              d3*
  States, s3    2      18        20        18                18               5 or 7
                3      14        18        17                14                  5
                4      17        20        18                17                  5

Table 22.5  Stage 5

  Decision, d4 →    f4(s4, d4) = Ds4, d4 + f3*(d4)    Minimum Distance    Optimal Decision
                        2         3         4              f4*(s4)              d4*
  States, s4    1      22        20        20                20               3 or 4

The above optimal results at the various stages can be summarized as below:

  Entering states (nodes):   10 → 8 → 5 → 3 → 1    or    10 → 8 → 5 → 4 → 1
  Distances:                  7 + 4 + 3 + 6 = 20   or     7 + 4 + 6 + 3 = 20

From the above it is clear that there are two alternative shortest routes for this problem, both having a
minimum distance of 20 kilometres.
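The backward computation of Tables 22.2 to 22.5 can be reproduced in a short program. The arc distances below are inferred from Fig. 22.3 via the stage tables (for instance, D3, 5 = f3(3, 5) – f2*(5) = 14 – 11 = 3), so treat them as a reconstruction of the network rather than the original figure:

```python
# Backward-induction solution of the Example 22.1 shortest-route problem.
# D[(i, j)] is the distance from city i to city j, reconstructed from the
# stage tables (Tables 22.2 to 22.5).

D = {
    (1, 2): 4, (1, 3): 6, (1, 4): 3,
    (2, 5): 7, (2, 6): 10, (2, 7): 5,
    (3, 5): 3, (3, 6): 8, (3, 7): 4,
    (4, 5): 6, (4, 6): 10, (4, 7): 5,
    (5, 8): 4, (5, 9): 8,
    (6, 8): 3, (6, 9): 7,
    (7, 8): 8, (7, 9): 4,
    (8, 10): 7, (9, 10): 9,
}

def shortest_routes(D, origin=1, destination=10):
    """Backward induction: compute f*(s) and enumerate every optimal route."""
    f = {destination: 0}       # f*(s): minimum distance from s to destination
    best = {destination: []}   # optimal immediate destinations d*(s)
    # Cities are numbered so that every arc leads to a higher number, so
    # sweeping from the highest-numbered state down is backward induction.
    for s in sorted({i for i, _ in D}, reverse=True):
        choices = {d: D[s, d] + f[d] for (i, d) in D if i == s}
        f[s] = min(choices.values())
        best[s] = [d for d, v in choices.items() if v == f[s]]
    def expand(s):             # follow the optimal decisions forward
        if s == destination:
            return [[s]]
        return [[s] + rest for d in best[s] for rest in expand(d)]
    return f[origin], expand(origin)

dist, routes = shortest_routes(D)
print(dist)     # 20
print(routes)   # [[1, 3, 5, 8, 10], [1, 4, 5, 8, 10]]
```

Both optimal routes and the minimum distance of 20 km agree with the tabular solution above.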
