
Design and Analysis of Algorithms

Dynamic programming method in problems of discrete optimization

Formally, a discrete control system is specified by the tuple

Ω = {D; x0; F; V(x); f(x, v); s(x, v)},

where D is the (finite) set of states of the system; x0 is the initial state; F is the set of final states, x0 ∈ D, x0 ∉ F, F ⊂ D; V(x) is a finite set of possible controls in state x, x ∈ D\F; f(x, v) is the transition function (from state x under control v the system passes into state f(x, v)), x ∈ D\F, v ∈ V(x), f(x, v) ∈ D; s(x, v) is the payoff function, where x ∈ D\F, v ∈ V(x); the values of the payoff function are assumed to be non-negative.
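For concreteness, here is a minimal Python sketch of how the tuple Ω might be held in ordinary containers. The states "a"–"e", the controls "u" and "w", and the payoff values are invented for illustration and are not part of the formal development.

```python
# A minimal sketch of the tuple Ω = {D; x0; F; V(x); f(x, v); s(x, v)}.
# All concrete states, controls, and payoffs below are hypothetical.

D = {"a", "b", "c", "d", "e"}          # set of states
x0 = "a"                               # initial state
F = {"d", "e"}                         # set of final states, x0 not in F

# V(x): admissible controls in each non-final state
V = {"a": {"u", "w"}, "b": {"u"}, "c": {"u", "w"}}

# f(x, v): transition function; s(x, v): non-negative payoff function
f = {("a", "u"): "b", ("a", "w"): "c", ("b", "u"): "d",
     ("c", "u"): "d", ("c", "w"): "e"}
s = {("a", "u"): 2, ("a", "w"): 1, ("b", "u"): 3,
     ("c", "u"): 3, ("c", "w"): 6}

# Basic consistency checks required by the definition of Ω.
assert x0 in D and x0 not in F and F < D
assert all(y in D for y in f.values())
```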
A state x of a system is called intermediate if it is neither initial nor final.
A finite sequence T = {x0, x1, x2, ..., xn} of states of the system Ω is called a trajectory of the system if the following conditions are met:

xt = f(xt−1, vt),

where vt ∈ V(xt−1), t = 1, 2, …, n.

The state x0 is the initial state of the trajectory T, and the state xn is its final state. The states x1, x2, …, xn−1 are the intermediate states of this trajectory. A trajectory is called complete if its initial state is the initial state of the system Ω and its final state is one of the final states of this system; thus, for a complete trajectory T = {x0, x1, x2, …, xn}, the first state coincides with the initial state of the system and xn ∈ F. A trajectory T = {x0, x1, x2, …, xn} is called a final x-trajectory if x0 = x and xn ∈ F; a final x-trajectory starts from the state x of the system Ω and ends at one of its final states. A trajectory T = {x0, x1, x2, …, xn} is called an initial x-trajectory if its first state is the initial state of the system and xn = x; an initial x-trajectory starts from the initial state of the system and ends in the state x.
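As a small illustration of these notions, the sketch below classifies a given sequence of states; it reuses the hypothetical x0 and F from the previous sketch and does not verify that the sequence obeys the transition rule (that check appears in a later sketch).

```python
# Hypothetical initial and final states, as in the earlier sketch.
x0, F = "a", {"d", "e"}

def is_complete(traj):
    """Complete: starts in the initial state of the system and ends in F."""
    return traj[0] == x0 and traj[-1] in F

def is_final_x_trajectory(traj, x):
    """Final x-trajectory: starts in x and ends in some final state."""
    return traj[0] == x and traj[-1] in F

def is_initial_x_trajectory(traj, x):
    """Initial x-trajectory: starts in the initial state and ends in x."""
    return traj[0] == x0 and traj[-1] == x

print(is_complete(["a", "c", "d"]))              # True
print(is_final_x_trajectory(["c", "d"], "c"))    # True
print(is_initial_x_trajectory(["a", "c"], "c"))  # True
```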
On the set of states of the system Ω, we introduce the binary reachability relation P(x, y): for arbitrary states x and y, the relation P(x, y) holds if there is a trajectory T whose initial state is x and whose final state is y. Clearly, this binary relation possesses the properties of reflexivity (for any state x, P(x, x) holds) and transitivity (for any x, y, z, it follows from P(x, y) and P(y, z) that P(x, z) also holds). An additional assumption is that the relation P(x, y) is antisymmetric (from P(x, y) and P(y, x) it follows that x = y). The antisymmetry of the binary reachability relation means that along any of its trajectories the system cannot realize cycles (return to a state in which it has already been). Possessing the properties of reflexivity, transitivity, and antisymmetry, the binary relation P(x, y) is a partial order relation.
A state x of the system Ω is said to be reachable if P(x0, x) holds; the initial state x0 is obviously reachable. A state x of the system Ω is said to be productive if P(x, w) holds for some final state w; every final state is obviously productive. Unreachable and unproductive states of the system can be excluded from consideration; in what follows, as a rule, we assume that every state of the system is reachable and productive. When studying systems generated by concrete applied problems, identifying the states that are both reachable and productive may require a separate analysis. Eliminating unreachable and unproductive states reduces the volume of the subsequent computations.
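The sketch below suggests one way the unreachable and unproductive states could be filtered out: a forward traversal from x0 gives the reachable states, and a backward traversal from F gives the productive ones. The system data, including the extra state "z", are hypothetical.

```python
from collections import deque

# Hypothetical system data (same conventions as in the definition of Ω);
# "z" is deliberately unreachable from x0.
D = {"a", "b", "c", "d", "e", "z"}
x0, F = "a", {"d", "e"}
V = {"a": {"u", "w"}, "b": {"u"}, "c": {"u", "w"}, "z": {"u"}}
f = {("a", "u"): "b", ("a", "w"): "c", ("b", "u"): "d",
     ("c", "u"): "d", ("c", "w"): "e", ("z", "u"): "b"}

def reachable_states():
    """States x with P(x0, x), found by a forward traversal from x0."""
    seen, queue = {x0}, deque([x0])
    while queue:
        x = queue.popleft()
        for v in V.get(x, ()):          # final states have no controls
            y = f[(x, v)]
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def productive_states():
    """States x with P(x, w) for some final w, by a backward traversal from F."""
    preds = {y: set() for y in D}
    for (x, v), y in f.items():
        preds[y].add(x)
    seen, queue = set(F), deque(F)
    while queue:
        y = queue.popleft()
        for x in preds[y]:
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return seen

useful = reachable_states() & productive_states()
print(sorted(useful))   # ['a', 'b', 'c', 'd', 'e'] -- "z" drops out
```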

Let T = {x0, x1, x2, …, xn} be an arbitrary trajectory of the system Ω, where xt = f(xt−1, vt), vt ∈ V(xt−1), t = 1, 2, …, n. The cost C(T) of this trajectory is defined as

C(T) = ∑t=1,…,n s(xt−1, vt).

Thus, the cost of a trajectory is the sum of the stepwise payoffs incurred in the course of realizing the trajectory.
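A short sketch of this cost computation: given the state sequence and the controls of a trajectory, it checks the transition rule and sums the stepwise payoffs. The tables f and s are the same hypothetical illustration data as before.

```python
# Hypothetical transition and payoff tables.
f = {("a", "u"): "b", ("a", "w"): "c", ("b", "u"): "d",
     ("c", "u"): "d", ("c", "w"): "e"}
s = {("a", "u"): 2, ("a", "w"): 1, ("b", "u"): 3,
     ("c", "u"): 3, ("c", "w"): 6}

def trajectory_cost(states, controls):
    """Cost C(T) = sum over t of s(x_{t-1}, v_t) for a trajectory
    given by its states x_0..x_n and controls v_1..v_n."""
    assert len(states) == len(controls) + 1
    total = 0
    for t, v in enumerate(controls, start=1):
        x_prev = states[t - 1]
        assert f[(x_prev, v)] == states[t]   # check x_t = f(x_{t-1}, v_t)
        total += s[(x_prev, v)]
    return total

print(trajectory_cost(["a", "c", "e"], ["w", "w"]))   # 1 + 6 = 7
```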
The main problem is to construct a complete trajectory that is best in terms of total cost, i.e., in terms of total income or total expense. Both extremal problems are identical with respect to the method of solution. To be definite, we adopt the following formulation.
Problem A. For the system Ω, find a complete trajectory T of minimum cost.
In addition, we introduce the family of problems A(x), where x belongs to D\F.
Problem A(x). For the system Ω and its state x, find a final x-trajectory of minimum cost.
The solutions of the formulated extremal problems A and A(x) are called the optimal complete trajectory and the optimal final x-trajectory, respectively.
The dynamic programming method is based on a fundamental principle formulated by the American mathematician Richard Bellman, the optimality principle:

"An optimal policy has the property that, whatever the initial state and the initial decision are, the remaining decisions must constitute an optimal policy with respect to the state resulting from the first decision."

Let B(x) denote the cost of a final x-trajectory that is optimal in problem A(x). Obviously,

B(x) = 0, x ∈ F. (1.1)

According to the optimality principle,

B(x) = min (s(x, v) + B(f(x, v))), x ∈ D\F, (1.2)

where the minimum is taken over all controls v ∈ V(x).
The values of the Bellman function are calculated from relations (1.1)–(1.2) step by step in the following order. At the first stage, the values B(x) = 0 are fixed for all x ∈ F. Then, at each subsequent stage, the next value of the Bellman function is computed for an arbitrary state x such that B(x) is still unknown but the values B(y) have already been found for all states y immediately following x (a state y of the system Ω is said to immediately follow a state x if the pair {x, y} is a trajectory of the system Ω). The last value obtained in this process is B(x0).
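One possible rendering of this backward count is sketched below. Memoized recursion is used instead of an explicit stage-by-stage schedule; since the system has no cycles, every value B(y) needed on the right-hand side of (1.2) is computed before it is used, which matches the order described above. The system data are the same hypothetical values as in the earlier sketches.

```python
import functools

# Hypothetical system data.
x0, F = "a", {"d", "e"}
V = {"a": {"u", "w"}, "b": {"u"}, "c": {"u", "w"}}
f = {("a", "u"): "b", ("a", "w"): "c", ("b", "u"): "d",
     ("c", "u"): "d", ("c", "w"): "e"}
s = {("a", "u"): 2, ("a", "w"): 1, ("b", "u"): 3,
     ("c", "u"): 3, ("c", "w"): 6}

@functools.lru_cache(maxsize=None)
def B(x):
    """Bellman function of the backward count; well defined because
    the reachability relation is acyclic."""
    if x in F:
        return 0                                            # relation (1.1)
    return min(s[(x, v)] + B(f[(x, v)]) for v in V[x])      # relation (1.2)

print(B(x0))   # 4: the minimum cost of a complete trajectory in this toy system
```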
To synthesize an optimal complete trajectory of the system, while performing the computation defined by relations (1.1)–(1.2) one should additionally compile a list of optimal transitions (LOT). Each entry of this list has the form [xi, xj], where xi is an arbitrary non-final state and xj = f(xi, v*) is the state into which the system passes from xi under the control v*, which is admissible in state xi and minimizes the sum s(xi, v) + B(f(xi, v)) on the right-hand side of formula (1.2); if there are several such controls, any one of them is fixed (our goal is to construct some optimal trajectory, not all such trajectories). The entries of the LOT correspond one-to-one to the non-final states of the system, the entry [xi, f(xi, v*)] corresponding to the state xi. We assume that the compiled LOT is ordered by increasing indices of the first components of its entries. To construct an optimal trajectory, we extract from the LOT a sequence of entries W whose first element is the entry corresponding to the state x0; each subsequent entry of the sequence W has as its first component the second component of the previous entry of this sequence. In the last entry of the sequence W, the second component is some final state of the system Ω. The sequence W is extracted from the compiled LOT uniquely. Let W = {[x0, y1], [y1, y2], [y2, y3], …, [ym−1, ym]}, where ym ∈ F. Then T = {x0, y1, y2, y3, …, ym} is the desired optimal trajectory.
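The sketch below combines the backward count with the construction of the list of optimal transitions and the extraction of the sequence W. The data are the same hypothetical toy system as before; ties in (1.2), if any, are broken by whichever control min() returns first.

```python
import functools

# Hypothetical system data.
x0, F = "a", {"d", "e"}
V = {"a": {"u", "w"}, "b": {"u"}, "c": {"u", "w"}}
f = {("a", "u"): "b", ("a", "w"): "c", ("b", "u"): "d",
     ("c", "u"): "d", ("c", "w"): "e"}
s = {("a", "u"): 2, ("a", "w"): 1, ("b", "u"): 3,
     ("c", "u"): 3, ("c", "w"): 6}

@functools.lru_cache(maxsize=None)
def B(x):
    """Backward-count Bellman function, relations (1.1)-(1.2)."""
    if x in F:
        return 0
    return min(s[(x, v)] + B(f[(x, v)]) for v in V[x])

# List of optimal transitions: one entry [x, f(x, v*)] per non-final state,
# where v* minimizes the right-hand side of (1.2).
lot = {x: f[(x, min(V[x], key=lambda v: s[(x, v)] + B(f[(x, v)])))]
       for x in V}

# Extract the sequence W: start from x0 and follow second components until F.
trajectory = [x0]
while trajectory[-1] not in F:
    trajectory.append(lot[trajectory[-1]])

print(trajectory)   # ['a', 'c', 'd'], an optimal complete trajectory of cost 4
```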
Equations (1.1)–(1.2) are dynamic programming recursive relations for
solving Problem A. The method for solving Problem A based on these relations is
called the dynamic programming method.
Let N be the number of states of the system Ω. Then the upper estimate for the number of values of the Bellman function computed in the course of solving Problem A equals N. The number of elementary operations performed when determining each particular value of this function by formula (1.2) does not exceed CN, where C is a constant independent of N. Thus, the upper estimate for the number of elementary operations performed when solving Problem A by the dynamic programming method is CN², where N is the number of states of the system Ω and C is a constant independent of N. Storing all values of the Bellman function found in the course of the computation requires an amount of memory proportional to N; memory of the same order is needed to maintain the list of optimal transitions. As a result, the amount of memory needed to solve Problem A by dynamic programming depends linearly on N.
It should be noted that, owing to the finiteness of the number of states of the system Ω, the number of its trajectories is finite, and Problem A can in principle be solved by enumerating a finite number of options. The dynamic programming method makes it possible to organize this enumeration in a certain way and to reduce it significantly.
The system Ω can be represented as a finite weighted directed graph G(Ω),
whose vertices correspond one-to-one to the states of the system, the arcs to the
controls, and the weights of the arcs to the costs of the corresponding transitions.
Let us first assume that the vertices are named after the corresponding states. The
vertex x0 is called initial; the vertices corresponding to the states of the subset F
are called final. In the graph G(Ω), an arc going from an arbitrary vertex x to a
vertex y is drawn if and only if in the system Ω for some control v* (v*∈V(x))
possible in the state x, we have y = f (x, v*); the weight of this arc is set equal to
s(x, v*). A vertex x is called immediately preceding a vertex y if there is an arc (x,
y) in the graph. In the graph G(Ω), there is a path from an arbitrary vertex x, x ∈ D,
to a vertex y, y ∈ D, if and only if the binary relation P(x, y) is satisfied. Since P(x,
y) is a partial order relation, the graph G(Ω) is acyclic. From the properties of
transitivity and antisymmetry of the binary relation P(x, y) and the assumption
made about the reachability of any state of the system Ω, it follows that the graph
G(Ω) has no arcs entering the vertex x0. By definition, the system Ω, having reached a final state, stops functioning; therefore, the graph has no arcs emanating from the final vertices.
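As a sketch, the graph G(Ω) can be stored as adjacency lists of weighted arcs derived directly from V, f, and s; the data below are the same hypothetical values used in the earlier sketches.

```python
# Hypothetical system data.
V = {"a": {"u", "w"}, "b": {"u"}, "c": {"u", "w"}}
f = {("a", "u"): "b", ("a", "w"): "c", ("b", "u"): "d",
     ("c", "u"): "d", ("c", "w"): "e"}
s = {("a", "u"): 2, ("a", "w"): 1, ("b", "u"): 3,
     ("c", "u"): 3, ("c", "w"): 6}

# arcs[x] lists the pairs (y, weight): an arc x -> y of weight s(x, v)
# for every control v admissible in x with y = f(x, v).
arcs = {x: [(f[(x, v)], s[(x, v)]) for v in sorted(V[x])] for x in V}

for x, out in arcs.items():
    for y, w in out:
        print(f"{x} -> {y}  weight {w}")
# Final vertices ("d", "e") have no outgoing arcs; x0 = "a" has no incoming arcs.
```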
The vertices of the graph G(Ω) are numbered by non-negative integers. The
vertex x0 (the initial vertex) is assigned the number 0.
We now turn to a scheme that applies the Bellman optimality principle to initial x-trajectories. For each state x other than the initial one, consider the problem A*(x): for the system Ω and its state x, find an initial x-trajectory of minimum cost. The solution of problem A*(x) is called the optimal initial x-trajectory of the system Ω.
The following fact holds.
Theorem 1.2. If the complete trajectory T = {x0, x1, x2, …, xn} is optimal, then any of its initial parts is also optimal.
The proof of this theorem is identical to the proof of Theorem 1.1.
The cost of the initial x-trajectory that is optimal in problem A*(x) will be denoted by B*(x). Obviously,

B*(x0) = 0. (1.3)

In the system Ω, an arbitrary state x immediately precedes a state y if the pair {x, y} is a trajectory of the system Ω. The control that transfers the system Ω from state x directly to state y will be denoted by v[x, y]. The set of states of the system immediately preceding the state y will be denoted by Γ(y). Based on Theorem 1.2, we write a relation that allows us to organize the computation of the values of the function B*(y) for all states y other than the initial one:

B*(y) = min {B*(x) + s(x, v[x, y])}, y ∈ D\{x0}, (1.4)

where the minimum is taken over all states x ∈ Γ(y). Note that the cost BA of the optimal complete trajectory (the optimal value of Problem A) is given by

BA = min B*(x), (1.5)

where the minimum is taken over the final states x ∈ F.

The calculation of the values of the function B*(y) based on the recurrence
relations (1.3)–(1.4) is performed step by step in the following order. At the first
stage, the value B*(x0) = 0 is fixed. Then, at each next stage, the calculation of the
next value of the function B* is performed for an arbitrary state y such that B*(y)
is unknown, but the values of B*(x) for all states x immediately preceding state y
have already been found.
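A sketch of this forward count: the sets Γ(y) are precomputed from the transition function, and B*(y) is again evaluated by memoized recursion, which is legitimate because the reachability relation is acyclic. The system data are the same hypothetical toy example as before.

```python
import functools

# Hypothetical system data.
x0, F = "a", {"d", "e"}
V = {"a": {"u", "w"}, "b": {"u"}, "c": {"u", "w"}}
f = {("a", "u"): "b", ("a", "w"): "c", ("b", "u"): "d",
     ("c", "u"): "d", ("c", "w"): "e"}
s = {("a", "u"): 2, ("a", "w"): 1, ("b", "u"): 3,
     ("c", "u"): 3, ("c", "w"): 6}

# Γ(y): for each state y, the states x immediately preceding it,
# together with the cost s(x, v[x, y]) of the corresponding transition.
preds = {}
for (x, v), y in f.items():
    preds.setdefault(y, []).append((x, s[(x, v)]))

@functools.lru_cache(maxsize=None)
def B_star(y):
    """Forward-count Bellman function, relations (1.3)-(1.4)."""
    if y == x0:
        return 0                                             # relation (1.3)
    return min(B_star(x) + cost for x, cost in preds[y])     # relation (1.4)

B_A = min(B_star(w) for w in F)                              # relation (1.5)
print(B_A)   # 4, the same optimal cost as the backward count gives
```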
The method for solving Problem A based on relations (1.3)–(1.5) is the direct Bellman method. In contrast, the method based on relations (1.1)–(1.2) is called the inverse Bellman method. The functions B*(x) and B(x) will be called the Bellman functions of the forward and backward counts, respectively. The implementation of either the inverse or the direct Bellman method for solving Problem A requires, by the upper estimate, a number of elementary operations that depends quadratically on the number of states of the system Ω. Sometimes the direct and inverse Bellman methods (meaning that the problem is solved by means of the dynamic programming relations) are called the methods of forward and backward counting, respectively.
