Leonardo R. Laura-Guarachi
SEPI-ESE-IPN
Plan de Agua Prieta 66, Plutarco Elías Calles, 11340 Miguel Hidalgo, Mexico City,
Mexico.
Saul Mendoza-Palacios II,∗
El Colegio de México, CEE.
Carretera Picacho Ajusco 20, Ampliación Fuentes del Pedregal, 14110 Tlalpan, Mexico
City, Mexico.
Abstract
This paper concerns optimal control problems for infinite-horizon discrete-
time deterministic systems with the long-run average cost (AC) criterion.
This optimality criterion can be traced back to a paper by Bellman [6] for
a class of Markov decision processes (MDPs). We present a survey of some
of the main approaches to study the AC problem, namely, the AC optimal-
ity (or dynamic programming) equation, the steady state approach, and the
vanishing discount approach, emphasizing the difference between the deter-
ministic control problem and the corresponding (stochastic) MDP. Several
examples illustrate these approaches and related results. We also state some
open problems.
I. The work of this author was partially supported by CONACYT grant 263963.
II. Work partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT-México) under grants CONACYT (Project No. A1-S-11222) and Ciencia-Frontera 2019-87787.
∗ Corresponding author.
URL: smendoza@math.cinvestav.mx (Saul Mendoza-Palacios)
1. Introduction
This paper concerns deterministic discrete-time control systems in which
the state process {xt , t = 0, 1, ...} ⊂ X evolves as
xt+1 = F (xt , at ), t = 0, 1, 2, ... (1)
where {at , t = 0, 1, ...} ⊂ A is the sequence of control variables or control
actions at each time t. In many applications, the state and action spaces
X and A are subsets of finite-dimensional spaces, say Rn and Rm . Here,
however, we suppose that the state and action (or control) spaces X and A
are so-called Borel spaces (that is, Borel subsets of complete and separable
metric spaces), which include all the spaces that appear in applications, even
finite or countable sets (with the discrete topology).
Given a cost-per-stage function c(x, a), let
JT(π, x) := Σ_{t=0}^{T−1} c(xt, at)    (2)
be the total cost in the first T stages (T = 1, 2, ...) when using the control
policy (or strategy) π = {a0 , a1 , ...} ⊂ A, given the initial state x0 = x. In
the long-run (or asymptotic or limiting) average cost (AC) control problem
we wish to minimize the objective function (or performance index)
J(π, x) := lim sup_{T→∞} (1/T) JT(π, x), x0 = x,    (3)
over all policies π, subject to (1). (In Section 2, below, we present a more
detailed description of the AC control problem.)
The AC value function is
J ∗ (x) := inf{J(π, x) : π ∈ Π} (4)
where Π is the set of admissible (or feasible) control policies or strategies
(see Section 2). A control policy π ∗ is said to be average–cost optimal (AC–
optimal) if
J(π ∗ , x) = J ∗ (x) ∀x ∈ X. (5)
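The AC criterion (2)-(3) is easy to explore numerically. The following is a minimal Python sketch; the system F(x, a) = 0.5x + a, the cost c(x, a) = x² + a², and the stationary policy f ≡ 0 are illustrative choices, not taken from the text.

```python
# Minimal numerical sketch of the AC criterion (2)-(3).
# F, c and the policy below are illustrative, not from the paper.

def average_cost(F, c, policy, x0, T):
    """Return (1/T) * J_T(pi, x0) for the feedback policy a_t = policy(x_t)."""
    x, total = x0, 0.0
    for _ in range(T):
        a = policy(x)
        total += c(x, a)
        x = F(x, a)
    return total / T

F = lambda x, a: 0.5 * x + a
c = lambda x, a: x**2 + a**2
zero = lambda x: 0.0            # the stationary policy f(x) = 0

# Under a = 0 the state x_t = 0.5**t * x0 -> 0, so the averages
# (1/T) * J_T tend to 0 as T grows, i.e. J(pi, x0) = 0 for this policy.
for T in (10, 100, 10_000):
    print(T, average_cost(F, c, zero, x0=1.0, T=T))
```

For this stable closed-loop system the lim sup in (3) is an honest limit; in general only the lim sup is guaranteed to exist.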
Remark 1.1. Concerning the definition in (3), note that at the outset we
do not know if the averages (1/T )JT (π, x) converge as T → ∞; therefore,
we have to use either “lim sup” or “lim inf” to ensure that J(π, x) is well
defined. Moreover, the reason for using “lim sup” rather than “lim inf” is
due to the Abelian theorem introduced in Section 5. We will come back to
this point after Lemma 5.6.
and Lasserre [25, 26], ...) that a time-homogeneous MDP can be represented
in compact form as a Markov control model CM := (X, A, Q, c), with
X, A and c as above, whereas Q represents the process transition law or
transition probability

Q(B|x, a) := Prob(xt+1 ∈ B | xt = x, at = a)    (6)

for all B ⊂ X, (x, a) ∈ X × A, and t = 0, 1, ... In particular, for the deterministic
system (1) the transition law is the Dirac (or unit) measure concentrated
at F(·, ·), that is,
Q(B|x, a) = δ_{F(x,a)}(B) := 1 if F(x, a) ∈ B, and 0 otherwise.    (7)
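In code, the degenerate transition law (7) is just a membership test on F(x, a). A hedged sketch, where the system function F and the Borel set B are illustrative choices:

```python
# Sketch of the degenerate transition law (7): for a deterministic system,
# Q(B | x, a) is the Dirac measure concentrated at F(x, a).  The set B is
# represented by its characteristic function (a membership predicate).

def Q(B, x, a, F):
    """Dirac transition law delta_{F(x,a)}(B) of equation (7)."""
    return 1 if B(F(x, a)) else 0

F = lambda x, a: x + a                 # illustrative system function
B = lambda y: 0.0 <= y <= 1.0          # the Borel set B = [0, 1]

print(Q(B, 0.25, 0.5, F))   # F(x, a) = 0.75 lies in B, so Q = 1
print(Q(B, 0.25, 1.0, F))   # F(x, a) = 1.25 is outside B, so Q = 0
```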
continuous function v. Due to this fact, our statements concerning MDPs
are always restricted to the weakly continuous (or Feller) case, as in Costa
and Dufour [15], Feinberg et al. [18], and Vega-Amaya [41], among others.
To the best of our knowledge, the analysis of AC problems for MDPs with
weakly continuous (or Feller) transition probabilities was initiated by Schäl
[38].
Unfortunately, having a weakly continuous transition law is still not suf-
ficient for some MDP results to be applicable to the deterministic case. The
reason is that typically MDPs require conditions such as ergodicity, irre-
ducibility and others that do not hold in the deterministic case.
For each state x ∈ X, we denote by A(x) ⊂ A the (nonempty) set of
feasible controls (or control actions) in x. The set of feasible state-action
pairs
K := {(x, a) ∈ X × A : a ∈ A(x)} (12)
is the graph of the set-valued function (or multifunction) x → A(x).
We also assume that the mapping from K to X in (11) is continuous.
Let F be the family of functions f : X → A such that f (x) is in A(x)
for all x ∈ X; that is, the graph {(x, f (x)) : x ∈ X} of f is in K. These
functions f are called selectors of the multifunction x → A(x). We assume
that F is nonempty.
For many purposes, an admissible control policy (or strategy) is just a
sequence π = {at , t = 0, 1, ...} such that at is in A(xt ) for all t = 0, 1, ... In
particular, if, for every t = 0, 1, ..., at = ft (xt ) for some function ft ∈ F, then
π is said to be a Markov (or feedback) control policy. Moreover, if there exists
f ∈ F such that ft ≡ f for all t, then π is called a stationary Markov policy
or simply a stationary policy. In this case (following a standard convention
for MDPs), we identify π with f .
As an example, if ξ is bounded, then (13) holds for all π ∈ Π and all
f ∈ F; hence Πξ = Π, and Fξ = F. For special functions ξ, the relation (13)
is a transversality-like condition, which is a standard requirement in some
optimal control problems.
Assumption 2.2. There is a policy π ∈ Π such that J(π, x) < ∞ for each
x ∈ X.
Theorem 3.1. Suppose that (j ∗ , l) is a solution to the ACOE (14), and let
Πl be as in Remark 2.1 (c) with ξ = l in (13). Then, for every initial state
x0 = x,
(b) j ∗ ≤ J ∗ (x) if Πl = Π.
Moreover, suppose that there exists a policy f∗ ∈ Fl such that f∗(x) ∈ A(x)
attains the minimum on the right-hand side of (14), i.e.,
(c) j ∗ = J(f ∗ , x) ≤ J(π, x) for all π ∈ Πl ; hence
Multiplying both sides of the latter inequality by 1/T and then letting T →
∞ we obtain (a), which in turn gives (b) if Πl = Π.
(c) If f ∗ satisfies (15), then we have equality throughout (16)–(17), which
yields the equality in (c). The inequality follows from (a). Finally, (d) is a
consequence of (b) and (c).
As in Remark 2.1(c), if l is bounded, then Πl = Π, as required in parts
(b) and (d) of Theorem 3.1.
The ACOE is very useful in the sense that, under the conditions of The-
orem 3.1, it gives the minimal AC j ∗ and an AC optimal policy f ∗ . It is also
useful to obtain refinements of AC optimality, such as “overtaking optimal”
or “bias optimal” controls. However, if we are only interested in obtaining
AC-optimal controls, it suffices to obtain an optimality inequality. This is
explained in Section 5.
Arguments similar to those in the proof of Theorem 3.1 give other useful
results, such as the following.
(b) If the inequality in (18) is reversed, i.e.,
Multiplying both sides by 1/T and letting T → ∞, since lim_{t→∞} l(xt^f)/t ≥ 0,
we obtain that j∗ ≥ J(f, x) for all x ∈ X.
(b) The proof only requires changing the inequality in (a).
with a given initial state x0 ∈ X, θ ∈ (0, 1). Consider the objective function
to be optimized as the long–run average reward (AR)
J(π, x0) = lim inf_{T→∞} (1/T) Σ_{t=0}^{T−1} log(at).    (21)
Note, from (21), that the stage reward is r(x, a) = log(a) (in lieu of the
stage cost c(x, a) in (2)).
To find a canonical triplet (j∗, l, f∗) that satisfies the average reward optimality
equation (AROE)

j∗ + l(x) = max_{a∈A(x)} [r(x, a) + l(F(x, a))],    (22)

we guess that l is of the form l(x) = b log(x) for some constant b. The maximum
on the right-hand side of (22) is then attained at a = cx^θ/(1 + b), which gives

j∗ + b log(x) = log(cx^θ/(1 + b)) + b log(cbx^θ/(1 + b))
             = (1 + b)θ log(x) + log(c/(1 + b)) + b log(cb/(1 + b)).

This last equation is satisfied if b = (1 + b)θ, which implies that b = θ/(1 − θ).
Therefore the canonical triplet (j∗, l, f∗) is given by

j∗ = log(c(1 − θ)) + (θ/(1 − θ)) log(cθ),    (23)

l(x) = (θ/(1 − θ)) log(x),    (24)

f∗(x) = c(1 − θ)x^θ.    (25)
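The triplet (23)-(25) can be verified numerically: a grid maximization of log(a) + l(F(x, a)) should match j∗ + l(x) at every state. A sketch with illustrative parameter values for c and θ (the text fixes neither):

```python
import math

# Numerical check of the canonical triplet (23)-(25) for the Brock-Mirman
# model F(x, a) = c*x**theta - a, r(x, a) = log(a).

c, theta = 1.5, 0.4
b = theta / (1 - theta)
j_star = math.log(c * (1 - theta)) + b * math.log(c * theta)   # (23)
l = lambda x: b * math.log(x)                                  # (24)
f_star = lambda x: c * (1 - theta) * x**theta                  # (25)

for x in (0.5, 1.0, 2.0):
    cap = c * x**theta                  # feasible controls are (0, cap)
    grid = [cap * k / 10_000 for k in range(1, 10_000)]
    rhs = max(math.log(a) + l(cap - a) for a in grid)
    lhs = j_star + l(x)
    assert abs(lhs - rhs) < 1e-4
    # the grid maximizer is close to f_star(x) = c*(1-theta)*x**theta
```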
If the reader is familiar with the dynamic programming (or Bellman)
equation, he or she surely noticed in Example 3.4 that the solution (23)-(25) to
the AROE (22) was obtained by means of the "guess and verify" procedure. That
is, in view of the stage reward in (21), we guessed that the function l(·) in
(22) is of the form l(x) = b log(x), and then we verified that this is indeed
the case for some value of b. We will use the same procedure in the following
example on an LQ (linear system, quadratic cost) problem. To simplify the
presentation, in this example we consider a scalar (or one-dimensional) LQ
problem, but similar results hold in the general vector case. For details see,
for instance, Section 7.4 in Bertsekas [7].
F(x, a) = δx + ηa,

subject to

xt+1 = δxt + ηat ∀t = 0, 1, ...    (28)
for every initial state x0 = x. From (14), the corresponding ACOE is

j∗ + l(x) = min_{a∈A(x)} [qx² + ra² + l(δx + ηa)].    (29)

In view of (27)-(28) we conjecture that l is of the form l(x) = bx² for some
constant b. To verify that this is indeed the case, we insert this function l in
(29) to obtain that the minimum is attained at

a∗ = −Bx with B := bδη/(r + bη²).    (30)
With this value of a = a∗, (29) becomes

j∗ + bx² = [qr + (qη² + rδ²)b]/(r + bη²) · x².    (31)

To proceed further, observe that with a∗ as in (30), it follows that (28)
becomes

xt+1 = Γxt with Γ := δ − Bη = rδ/(r + bη²).    (32)

Assuming that b is such that |Γ| < 1, the linear system (32) is stable and, as
t → ∞,

xt = Γ^t x0 → 0

for every initial state x0. We thus obtain from (30)-(31) that (j∗, l, f∗) is a
canonical triplet with j∗ = 0, l(x) = bx², and f∗(x) = −Bx, where b is the
unique positive solution of the quadratic (steady-state Riccati) equation

b = [qr + (qη² + rδ²)b]/(r + bη²).
♦
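The steady-state Riccati equation at the end of the example is easy to check numerically. A sketch, with illustrative values for q, r, δ, η (the text leaves the parameters generic):

```python
import math

# Solve the steady-state Riccati equation of the LQ example and verify
# the stability condition |Gamma| < 1.  Parameter values are illustrative.

q, r, delta, eta = 1.0, 1.0, 0.9, 1.0

# b = [q*r + (q*eta**2 + r*delta**2)*b] / (r + b*eta**2)
# <=>  eta**2 * b**2 + (r - q*eta**2 - r*delta**2) * b - q*r = 0
A2 = eta**2
B2 = r - q * eta**2 - r * delta**2
C2 = -q * r
b = (-B2 + math.sqrt(B2**2 - 4 * A2 * C2)) / (2 * A2)  # unique positive root

assert b > 0
assert abs(b - (q*r + (q*eta**2 + r*delta**2)*b) / (r + b*eta**2)) < 1e-9

B = b * delta * eta / (r + b * eta**2)                 # gain in (30)
Gamma = delta - B * eta                                # closed loop in (32)
assert abs(Gamma - r * delta / (r + b * eta**2)) < 1e-9
assert abs(Gamma) < 1      # x_{t+1} = Gamma * x_t is stable
```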
then (x∗, a∗) is called a minimum steady state-action pair for the AC control
problem (1)–(3).
The following Assumption 4.1 summarizes some of the main conditions
required in the steady-state approach to the AC problem, namely, (a) the
existence of a minimum steady state-action pair, (b) a dissipativity condition,
and (c) a stabilizability/reachability condition. (Concerning (b), see Remark
4.2)
Assumption 4.1. The OCP (1)–(3) satisfies:
(a) There exists a minimum steady state-action pair (x∗ , a∗ ) ∈ K.
(b) The problem (1)–(3) is dissipative, which means that there is a so-called
storage function λ : X → R such that, for every (x, a) ∈ K,

λ(x) − λ(F(x, a)) ≤ c(x, a) − c(x∗, a∗).    (34)
(c) Let λ and Πλ be as in (34) and Remark 2.1(c), respectively. For each
initial state x0 ∈ X, there is a control policy π̄ ∈ Πλ (which may
depend on x0 ) such that the corresponding path (x̄t , āt ) converges to
the minimum state-action pair (x∗ , a∗ ) in (a) as t → ∞.
Remark 4.2. (a) The notion of dissipativity was introduced in control
theory by Willems [42]. He considered a differential system, say ẋ(t) =
F (x(t), a(t)), instead of the discrete-time model (1). Dissipativity for
the discrete-time case was introduced by Byrnes and Lin [14]. The
reader should be warned, however, that the terminology is not stan-
dard. For instance, different authors may use different signs in (34).
(b) A function λ that satisfies (34) is called an excessive function by Flynn
[20]. He also uses a Lagrange multipliers approach to study the static
problem (33), which of course is a constrained optimization problem.
(c) If (j ∗ , l) satisfies the ACOE (14), then
(a) j ∗ = J(π̄, x) ≤ J(π, x) for all π ∈ Πλ , where π̄ satisfies Assumption
4.1(c); hence
so, by (13),
j ∗ ≤ J(π, x) ∀x ∈ X. (36)
On the other hand, by the stabilizability in Assumption 4.1(c), there is a pol-
icy π̄ ∈ Πλ for which the state–action sequence (x̄t , āt ) converges to (x∗ , a∗ ).
Therefore, since the stage cost c is continuous, c(x̄t, āt) converges to j∗, which
implies that J(π̄, x) = j∗ for all x. This proves (a).
(b) If Πλ = Π, then part (a) yields that π̄ is AC-optimal and also that
the AC value function is J ∗ (·) ≡ j ∗ .
In the remainder of this section we present some examples that illustrate
Theorem 4.3.
Example 4.5 (The Brock–Mirman model, cont’d.). In Example 3.4, for
the transition function F (x, a) = cxθ − a, and the stage reward (or utility)
function r(x, a) = log(a), it can be verified that the unique solution to the
corresponding steady state-action problem (33),
is given by
(x∗, a∗) = ( (cθ)^{1/(1−θ)}, c(1 − θ)(cθ)^{θ/(1−θ)} ).    (37)
We can obtain the dissipativity inequality (34) from the ACOE in Exam-
ple 3.4. However, it is illustrative to obtain (34) directly as follows.
From (23)-(24) and Theorem 4.3(a), we have
J(π̄, x) = r(x∗, a∗) = log(a∗) = log(c(1 − θ)) + (θ/(1 − θ)) log(cθ);    (38)
where π̄ ≡ f∗(·) in (25). To verify Assumption 4.1(b), note that F(x, a∗) −
F(x, a) = a − a∗. Thus, the strict concavity of the stage reward r(x, a) :=
log(a) gives

r(x, a) − r(x∗, a∗) ≤ (∂r/∂x)(x∗, a∗)(x − x∗) + (∂r/∂a)(x∗, a∗)(a − a∗)
                   = (1/a∗)(F(x, a∗) − F(x, a)).
Moreover, the system function F(x, a∗) is also concave in x and
(∂F/∂x)(x∗, a∗) = 1, so

F(x, a∗) − F(x∗, a∗) ≤ (∂F/∂x)(x∗, a∗)(x − x∗) = x − x∗,

that is, F(x, a∗) ≤ x for all x ∈ X. Therefore, from the last two inequalities
we get the corresponding dissipativity condition (34)
with the storage function λ(x) := (1/a∗)x. Notice that λ is different from l in
(24), Example 3.4.
Observe that for any initial state x0 ∈ X, the policy f∗ in (25) with
corresponding state-control path (xt, at), given by

xt = (cθ)^{(1−θ^t)/(1−θ)} x0^{θ^t},    at = c(1 − θ)(cθ)^{(θ−θ^{t+1})/(1−θ)} x0^{θ^{t+1}},

satisfies the stabilizability Assumption 4.1(c), i.e., (xt, at) converges to the
optimal stationary pair (x∗, a∗).
Hence, by Theorem 4.3, r(x∗ , a∗ ) = J(π̄, x) ≥ J(π, x) for all π ∈ Πλ and
all x ∈ X. ♦
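The convergence of the closed-loop path to (x∗, a∗) in (37) can be confirmed by simple iteration; the values of c and θ below are illustrative:

```python
import math

# Numerical check that, under the stationary policy f* of (25), the path
# (x_t, a_t) converges to the steady state-action pair (x*, a*) in (37).

c, theta = 1.5, 0.4
f_star = lambda x: c * (1 - theta) * x**theta
F = lambda x, a: c * x**theta - a

x_star = (c * theta)**(1 / (1 - theta))
a_star = c * (1 - theta) * (c * theta)**(theta / (1 - theta))

x = 0.1                      # arbitrary initial state
for _ in range(200):         # closed loop: x_{t+1} = c*theta*x_t**theta
    x = F(x, f_star(x))

assert abs(x - x_star) < 1e-8
assert abs(f_star(x) - a_star) < 1e-8
assert abs(F(x_star, a_star) - x_star) < 1e-12   # (x*, a*) is stationary
```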
Example 4.6 (The Mitra-Wan forestry model; [33, 31, 30]). Consider a
forestland covered by trees of the same species classified by age-classes from
1 to n. After age n, trees have no economic value. The state space in this
example can be identified with the n–simplex
∆ := {x ∈ Rⁿ : Σ_{i=1}^{n} xi = 1, xi ≥ 0, i = 1, ..., n}.
Now, assume the timber production per unit area is related to the tree
age-classes by the biomass vector
ξ = (ξ1 , ξ2 , ..., ξn ) ∈ Rn , ξi ≥ 0, i = 1, 2, . . . , n,
It can be shown that the control system (41) has a set of stationary states
given by the pairs (x, a) satisfying x1 ≥ x2 ≥ · · · ≥ xn and
a1 = x1 − x2 , a2 = x2 − x3 , ..., an = xn .
Moreover, for each age class i there is a pair of stationary state and control
(x^i, a^i), known as a normal forest, defined as follows: the state is x^i :=
(1/i, ..., 1/i, 0, ..., 0), where each of the first i coordinates is 1/i and the
remaining are 0; and the control is a^i := (0, ..., 0, 1/i, 0, ..., 0), where 1/i is in
the i-th coordinate.
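The stationarity of each normal forest can be checked in a few lines. The dynamics used below are an assumption: the standard Mitra-Wan transition (harvested area replanted as age class 1, unharvested land ages by one class), which is not displayed in this excerpt but is consistent with the stationarity conditions a1 = x1 − x2, ..., an = xn quoted above.

```python
# ASSUMED Mitra-Wan dynamics (not shown in the excerpt):
#   x'_1 = sum_j a_j,   x'_{j+1} = x_j - a_j   (j = 1, ..., n-1).
# Under these dynamics every normal forest (x^i, a^i) is a stationary pair.

def step(x, a):
    n = len(x)
    return [sum(a)] + [x[j] - a[j] for j in range(n - 1)]

n = 5
for i in range(1, n + 1):
    x_i = [1.0 / i] * i + [0.0] * (n - i)                  # normal forest
    a_i = [0.0] * (i - 1) + [1.0 / i] + [0.0] * (n - i)    # harvest class i
    assert all(abs(u - v) < 1e-12 for u, v in zip(step(x_i, a_i), x_i))
```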
Following the Brock-Mitra-Wan condition [11, 34], we assume that there
is a unique normal forest (x∗ , a∗ ) that satisfies
Introducing the function λ : ∆ → R, defined by λ(x) = kγN · x, and the
value j ∗ := r(x∗ , a∗ ), we have the corresponding dissipativity inequality (35),
over all policies π = {at, t = 0, 1, ...} ∈ Π, subject to (1). The corresponding
α-discount value function is

Vα(x) := inf{Vα(π, x) : π ∈ Π}, x ∈ X,    (45)

and a policy π∗ ∈ Π is said to be α-discount optimal (or α-optimal) if

Vα(π∗, x) = Vα(x) ∀x ∈ X.    (46)
(c) c is K-inf compact, which means that, for every sequence {(xt , at )} in
K such that xt → x and {c(xt , at )} is bounded above, it holds that {at }
has an accumulation point in A(x).
Clearly, in Assumption 5.1(b) we may replace “c nonnegative” by “c
bounded below”. On the other hand, a condition ensuring Assumption 5.1(c)
is, for example, that c is inf-compact, that is, for every real number r, the
set {(x, a) ∈ K|c(x, a) ≤ r} is compact. (See Feinberg et al. [18].)
Let L(X) be the family of real-valued functions on X which are lower
semicontinuous (l.s.c.) and bounded below. For each α ∈ (0, 1), we define
the operator Tα on L(X) as

Tα v(x) := min_{a∈A(x)} [c(x, a) + αv(F(x, a))], x ∈ X.    (47)
(b) For every v ∈ L(X) there exists a stationary policy f ∈ F such that
f(x) ∈ A(x) attains the minimum on the right-hand side of (47), i.e.
(using the notation in Remark 2.1(a)),
for all x ∈ X.
(c) The α-discount value function in (45)-(46) is in L(X) and it is a fixed-
point of Tα , that is, Tα Vα = Vα . More explicitly,
for all x ∈ X.
(e) A stationary policy f ∈ F is α-optimal if, and only if, f satisfies (50).
Theorem 5.2 is a standard result in discounted dynamic programming.
See, for instance, the references mentioned in the paragraph preceding
Assumption 5.1.
The equation (49) is called the α-discounted cost optimality equation (α-
DCOE), and it is also known as the discounted cost dynamic programming
or Bellman equation.
The so-called vanishing discount approach to AC control problems is
based on several connections between the α-discounted costs Vα and the
AC costs as α ↑ 1. These connections are discussed in the remainder of this
section. We begin with the simplest case.
i.e.,

(1 − α)Vα(π, x) = M + (1 − α) Σ_{t=0}^{∞} α^t [c(xt, at) − M].    (51)
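The identity (51) is purely algebraic, since (1 − α)Σ_t α^t = 1; a quick numerical confirmation with an illustrative cost sequence (truncating the series at a horizon where the tail is negligible):

```python
# Numerical check of identity (51): for any bounded {c_t} and constant M,
#   (1-alpha)*V_alpha = M + (1-alpha)*sum_t alpha**t * (c_t - M),
# because (1-alpha)*sum_t alpha**t = 1.

alpha, M, T = 0.95, 2.0, 5_000
c = [1.0 + (t % 3) for t in range(T)]      # illustrative cost sequence

lhs = (1 - alpha) * sum(alpha**t * c[t] for t in range(T))
rhs = M + (1 - alpha) * sum(alpha**t * (c[t] - M) for t in range(T))
assert abs(lhs - rhs) < 1e-9               # truncation error ~ M*alpha**T
```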
This equation is useful in several ways. For instance, if M = J(π, x), then
(51) becomes

(1 − α)Vα(π, x) = J(π, x) + (1 − α) Σ_{t=0}^{∞} α^t [c(xt, at) − J(π, x)].    (52)
as α ↑ 1.
(b) In particular, if c is continuous and Assumptions 4.1(a) and (c) hold,
then

lim_{α↑1} (1 − α)Vα(π̄, x) = J(π̄, x) = j∗,
log(at) = log(c(1 − θ)) + ((θ − θ^{t+1})/(1 − θ)) log(cθ) + θ^{t+1} log(x0),

and Proposition 5.3(b) holds with

j∗ = log(c(1 − θ)) + (θ/(1 − θ)) log(cθ).
♦
5.3. An Abelian theorem
Another connection between discounted cost problems and the AC case
is provided by the Abelian theorem in part (a) of the following lemma.
Lemma 5.6. Let {ct} be a sequence bounded below, and consider the lower
and upper limit averages (also known as Cesàro limits)

C^L := lim inf_{n→∞} (1/n) Σ_{t=0}^{n−1} ct,    C^U := lim sup_{n→∞} (1/n) Σ_{t=0}^{n−1} ct,

as well as the lower and upper Abel limits

A^L := lim inf_{α↑1} (1 − α) Σ_{t=0}^{∞} α^t ct,    A^U := lim sup_{α↑1} (1 − α) Σ_{t=0}^{∞} α^t ct.

Then

(a) C^L ≤ A^L ≤ A^U ≤ C^U.
For a proof of Lemma 5.6 see the references in Bishop et al. [8] or Sznajder
and Filar [39]. Part (b) in Lemma 5.6 is known as the Hardy-Littlewood
Theorem.
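The relation between Cesàro and Abel averages in Lemma 5.6(a) can be illustrated numerically with the alternating sequence ct = t mod 2, for which the Cesàro limit is 1/2 and the Abel averages equal α/(1 + α), also tending to 1/2:

```python
# Cesàro vs. Abel averages for c_t = t % 2 (values 0, 1, 0, 1, ...).
# Here C^L = C^U = 1/2, and (1-alpha)*sum alpha**t c_t = alpha/(1+alpha).

def cesaro(c, n):
    return sum(c(t) for t in range(n)) / n

def abel(c, alpha, T=100_000):
    # truncated Abel average; the tail is of order alpha**T
    return (1 - alpha) * sum(alpha**t * c(t) for t in range(T))

c = lambda t: t % 2

assert abs(cesaro(c, 10**6) - 0.5) < 1e-5
for alpha in (0.9, 0.99, 0.999):
    assert abs(abel(c, alpha) - alpha / (1 + alpha)) < 1e-6
assert abs(abel(c, 0.999) - 0.5) < 1e-3     # Abel limit also 1/2
```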
From Lemma 5.6(a) we can obtain useful bounds for the average costs
J(π, x). See the following Lemma 5.7 and also Theorem 5.9.
lim inf_{α↑1} (1 − α)Vα(π, x) ≤ lim sup_{α↑1} (1 − α)Vα(π, x) ≤ J(π, x).
(b) The value functions Vα (·) and J ∗ (·) in (45) and (4), respectively, satisfy
that
lim inf_{α↑1} (1 − α)Vα(x) ≤ lim sup_{α↑1} (1 − α)Vα(x) ≤ J∗(x)
for all x ∈ X.
Proof. (a) Consider an arbitrary policy π = {at } and the corresponding
state trajectory {xt }. Let ct := c(xt , at ). By Assumption 5.1(b), ct is
nonnegative for all t = 0, 1, .... Hence part (a) follows from Lemma
5.6(a).
(b) The first inequality in (b) follows from the first inequality in (a). More-
over, from (46) and part (a) again,
Remark 5.8. For each α ∈ (0, 1), let (xe (α), ae (α)) be a steady state-action
pair which is a limit point of some optimal state-action path. Then
According to (53) we have
ρα = (1 − α)Vα (xe (α)) = c(xe (α), ae (α)).
Moreover, as in Assumption 4.1 (b), we define the storage function for the
discounted problem by λα (·) := hα (·). Thus the equation (54) implies a
dissipative-like inequality as in (34),
λα (x) − αλα (F (x, a)) ≤ c(x, a) − c(xe (α), ae (α)) ∀a ∈ A(x). (55)
Note that if α ↑ 1 and (xe (α), ae (α)) converges to the optimal stationary pair
(x∗ , a∗ ) defined in (33), then the inequality (55) yields (34). As an example,
the latter fact holds in the LQ case. Actually, from Example 4.4 we can see
that, for any α ∈ (0, 1], (xe (α), ae (α)) = (x∗ , a∗ ) = (0, 0).
Comparing (54) with the ACOE (14), we may summarize the vanishing
discount approach as follows: Find conditions under which, as α ↑ 1, the pair
(ρα , hα (·)) in (54) converges to a solution (J ∗ , h∗ ) of the ACOE (14).
More explicitly, the idea is to determine a sequence of discount factors
αn → 1 and a pair (J ∗ , h∗ (·)) such that, as n → ∞,
hn ≡ hαn → h∗ and ραn → J ∗ , (56)
and (J ∗ , h∗ (·)) satisfies (14). We show in Subsection 5.5, by means of exam-
ples, that this is indeed a “feasible” approach. However, as far as we can tell,
there are no general results for the deterministic AC problem (1)-(3). All the
known results on the convergence in (56) to a solution of the ACOE refer to
“stochastic” MDPs, not to the “degenerate” case in (8).
The good news is that, in the degenerate case, (56) gives a pair (ρ∗ , h∗ (·))
that satisfies an AC optimality inequality (ACOI), which suffices to obtain
an AC optimal stationary policy f ∗ ∈ F. We will next present this fact.
Let ρα := (1 − α)mα be as in (53), with mα := inf_{x∈X} Vα(x). Moreover,
let ρ∗ := lim sup_{α↑1} ρα and

J∗ := inf_{x∈X} J∗(x) = inf_{x∈X} inf_{π∈Π} J(π, x).    (57)
By Lemma 5.7(b),
ρ∗ ≤ J ∗ . (58)
We now state the ACOI. The next theorem, which we present without proof,
is due to Feinberg et al. [18]. (Vega-Amaya [41] presents a self-contained
proof of Theorem 5.9, shorter than the proof in [18].)
Theorem 5.9. Suppose that Assumption 5.1 holds and, in addition,
(a) J ∗ < ∞;
(b) The function h(·) := lim inf_{α↑1} hα(·) is finite-valued.
Then there exists a function h∗ ∈ L(X), with h∗(·) ≤ h(·), and a stationary
policy f∗ ∈ F such that (J∗, h∗(·)) satisfies the ACOI

ρ∗ + h∗(x) ≥ min_{a∈A(x)} [c(x, a) + h∗(F(x, a))] ∀x ∈ X,

and, moreover,

ρ∗ + h∗(x) ≥ c(x, f∗) + h∗[F(x, f∗)].    (59)
Hence the policy f ∗ is AC-optimal.
Remark 5.10. (a) As in Proposition 3.2(a), the inequality (59) gives that
ρ∗ ≥ J(f∗, x) for all x ∈ X. Therefore, from (57) and (58),

ρ∗ ≥ J(f∗, ·) ≥ J∗ ≥ ρ∗,

and hence

J∗(·) = J(f∗, ·) = J∗ = ρ∗.
(b) Let J ∗ be as in (57). If a pair (π̄, x̄) ∈ Π × X is such that J(π̄, x̄) = J ∗ ,
then it is called a minimum pair. For (stochastic) MDPs, the existence
of a minimum pair can be determined in several ways, including infinite-
dimensional linear programming arguments. (See, for instance, Yu [44]
or Section 6.4 in Hernández-Lerma and Lasserre [25].)
To conclude this section we note that Costa and Dufour [15] obtain results
on the ACOI similar to Theorem 5.9 using two different sets of assumptions.
These results are valid in our present deterministic context. On the other
hand, they also obtain the ACOE and the convergence of the policy iteration
algorithm but for MDPs that, to the best of our knowledge, exclude the
deterministic problem (1)-(3).
5.5. Examples
In this subsection we introduce some examples to illustrate the several
approaches to the AC control problem.
for any policy π = {at} and initial state x0 = x. Clearly, Assumption 5.1
is satisfied, and the α-DCOE (49) can be obtained by the usual "guess and
verify" procedure. In fact, it is well known (see, for instance, Bertsekas [7],
Hernández-Lerma and Lasserre [25], ...) that the α-discount value function is
given by

Vα(x) = k(α)x²    (60)

for every α ∈ (0, 1) and x ∈ X, where k(α) is the unique positive solution of
the quadratic (Riccati) equation

k = [qr + (qη² + rδ²)αk]/(r + αkη²).    (61)
Note that, as α ↑ 1, (61) reduces to the quadratic equation for the AC case
at the end of Example 3.5. In other words, for α = 1, the positive solution
k = k(1) of (61) coincides with the constant b in (30)-(32). Moreover, from
(60),
mα := inf_{x∈X} Vα(x) = 0 ∀α ∈ (0, 1).
This yields that, for any sequence of discount factors αn → 1, the pairs
(ραn , hαn (·)) converge to the solution (j ∗ , l(·)) of the ACOE (29). See also
(56).
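The convergence k(α) → b as α ↑ 1 behind this vanishing-discount argument can be checked directly by solving (61) as a quadratic in k; the parameter values are illustrative:

```python
import math

# Solve the discounted Riccati equation (61) for k(alpha) and check that
# it converges, as alpha -> 1, to the steady-state constant b = k(1).

q, r, delta, eta = 1.0, 1.0, 0.9, 1.0

def positive_root(A, B, C):
    return (-B + math.sqrt(B**2 - 4 * A * C)) / (2 * A)

def k(alpha):
    # (61) <=> alpha*eta**2*k**2 + (r - alpha*(q*eta**2 + r*delta**2))*k - q*r = 0
    return positive_root(alpha * eta**2,
                         r - alpha * (q * eta**2 + r * delta**2),
                         -q * r)

b = k(1.0)    # for alpha = 1, (61) reduces to the steady-state equation
for alpha in (0.9, 0.99, 0.999):
    assert abs(k(alpha) - b) < 10 * (1 - alpha)   # k(alpha) -> b linearly
```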
To conclude, let us note that, for each α ∈ (0, 1), the α-optimal control
policy is given by

fα(x) = −B(α)x with B(α) := αk(α)δη/(r + αk(α)η²)

for every initial state x ∈ X, where X := [0, ∞). This problem has been
solved by several methods; see, for example, Ulus [40], Domínguez-Corella and
Hernández-Lerma [17], and Le Van and Saglam [32]. The α-optimal control
and the corresponding value function are
fα∗(x) = c(1 − θα)x^θ

and

Vα∗(x) = (1/(1 − α)) log[c(1 − θα)] + (θα/((1 − α)(1 − θα))) log(cθα) + (θ/(1 − θα)) log(x).

Therefore

hα(x) = Vα(x) − mα = (θ/(1 − θα)) log(x)

and

ρα = log[c(1 − θα)] + (θα/(1 − θα)) log(cθα).
This yields that, for any sequence of discount factors αn → 1, the pairs
(ραn , hαn (·)) converge to the solution (j ∗ , l(·)) of the AROE (22). Moreover,
the α-optimal control fα∗ (x) converges to the AC-optimal control f ∗ (·) in
(25), as α ↑ 1. ♦
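These vanishing-discount limits can also be confirmed numerically from the closed-form expressions of ρα and hα quoted above, compared against j∗ and l(·) of Example 3.4; the values of c and θ are illustrative:

```python
import math

# Check the vanishing-discount limits in the Brock-Mirman example:
# rho_alpha -> j* and h_alpha(x) -> l(x) as alpha -> 1.

c, theta = 1.5, 0.4

def rho(alpha):
    return math.log(c * (1 - theta * alpha)) \
        + (theta * alpha / (1 - theta * alpha)) * math.log(c * theta * alpha)

def h(alpha, x):
    return (theta / (1 - theta * alpha)) * math.log(x)

j_star = math.log(c * (1 - theta)) + (theta / (1 - theta)) * math.log(c * theta)
l = lambda x: (theta / (1 - theta)) * math.log(x)

assert abs(rho(0.999999) - j_star) < 1e-4
assert abs(h(0.999999, 2.0) - l(2.0)) < 1e-4
```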
6. Concluding remarks and open problems
This paper presents the three main approaches to analyze average cost
(AC) control problems for discrete-time deterministic systems, namely, the
AC optimality equation, the steady-state approach, and the vanishing dis-
count approach.
AC problems are a standard topic in the theory and applications of
discrete- and continuous-time “stochastic” Markov decision processes (MDPs).
Hence, since our control problems form a special class of MDPs, one might
expect that the results for MDPs are directly applicable to the deterministic
case. This, however, is not so. Indeed, as noted in Subsection 1.2,
many concepts for stochastic MDPs are not valid for deterministic or
"degenerate" MDPs. As a consequence, there are open problems such as the
following.
6. Extend the results in Sections 3, 4, 5 to the "differential case" in which
(1) and (2) are replaced by

ẋ(t) = F(x(t), a(t)), t ≥ 0,    (62)

and

JT(π, x) := ∫_0^T c(x(t), a(t)) dt, T ≥ 0.    (63)
References
[1] A. Arapostathis, V. S. Borkar, and M. K. Ghosh. Ergodic Control of
Diffusion Processes. Cambridge University Press, UK, 2012.
[8] C. J. Bishop, E. A. Feinberg, and J. Zhang. Examples concerning Abel
and Cesàro limits. Journal of Mathematical Analysis and Applications,
420(2):1654–1661, 2014.
[9] D. Blackwell. Discounted dynamic programming. The Annals of Math-
ematical Statistics, 36(1):226–235, 1965.
[10] V. S. Borkar, V. Gaitsgory, and I. Shvartsman. LP formulations of
discrete time long-run average optimal control problems: the nonergodic
case. SIAM Journal on Control and Optimization, 57(3):1783–1817,
2019.
[11] W. A. Brock. On existence of weakly maximal programmes in a multi-
sector economy. The Review of Economic Studies, 37(2):275–280, 1970.
[12] W. A. Brock and L. J. Mirman. Optimal economic growth and uncer-
tainty: the discounted case. Journal of Economic Theory, 4:479–513,
1972.
[13] W. A. Brock and L. J. Mirman. Optimal economic growth and uncer-
tainty: the no discounting case. International Economic Review, 14:
560–573, 1973.
[14] C. I. Byrnes and W. Lin. Losslessness, feedback equivalence, and the
global stabilization of discrete-time nonlinear systems. IEEE Transac-
tions on Automatic Control, 39(1):83–98, 1994.
[15] O. L. V. Costa and F. Dufour. Average control of Markov decision
processes with Feller transition probabilities and general action spaces.
Journal of Mathematical Analysis and Applications, 396(1):58–69, 2012.
[16] E. B. Davies. One-Parameter Semigroups. Academic Press, London,
1980.
[17] A. Domı́nguez-Corella and O. Hernández-Lerma. The maximum princi-
ple for discrete-time control systems and applications to dynamic games.
Journal of Mathematical Analysis and Applications, 475(1):253–277,
2019.
[18] E. A. Feinberg, P. O. Kasyanov, and N. V. Zadoianchuk. Average cost
Markov decision processes with weakly continuous transition probabili-
ties. Mathematics of Operations Research, 37(4):591–607, 2012.
[19] J. Flynn. Steady state policies for a class of deterministic dynamic
programming models. SIAM Journal on Applied Mathematics, 28(1):
87–99, 1975.
[20] J. Flynn. Optimal steady states, excessive functions, and deterministic
dynamic programs. Journal of Mathematical Analysis and Applications,
144(2):586–594, 1989.
[21] M. K. Ghosh and K. S. M. Rao. Differential games with ergodic payoff.
SIAM Journal on Control and Optimization, 43(6):2020–2035, 2005.
[22] J. González-Hernández and O. Hernández-Lerma. Extreme points of
sets of randomized strategies in constrained optimization and control
problems. SIAM Journal on Control and Optimization, 15(4):1085–1104,
2005.
[23] L. Grüne. On the relation between discounted and average optimal value
functions. Journal of Differential Equations, 148(1):65–99, 1998.
[24] L. Grüne, M. A. Müller, C. M. Kellett, and S. R. Weller. Strict dissi-
pativity for discrete time discounted optimal control problems. Mathe-
matical Control & Related Fields, 11(4):771–796, 2021.
[25] O. Hernández-Lerma and J. B. Lasserre. Discrete-Time Markov Control
Processes: Basic Optimality Criteria. Springer, New York, 1996.
[26] O. Hernández-Lerma and J. B. Lasserre. Further Topics on Discrete-
Time Markov Control Processes. Springer, New York, 1999.
[27] A. Hochart. Unique ergodicity of deterministic zero-sum differential
games. Dynamic Games and Applications, 11(1):109–136, 2021.
[28] R. A. Howard. Dynamic Programming and Markov Processes. John
Wiley, Cambridge, Mass., 1960.
[29] K. Kawaguchi. Optimal control of pollution accumulation with long-
run average welfare. Environmental and Resource Economics, 26(3):
457–468, 2003.
[30] L. R. Laura-Guarachi. An optimal control problem in forest manage-
ment. In Games and Evolutionary Dynamics: Selected Theoretical and
Applied Developments, pages 189–209. El Colegio de México, 2021.
[31] L. R. Laura-Guarachi and O. Hernández-Lerma. The Mitra-Wan forestry
model: a discrete-time optimal control problem. Natural Resource Modeling,
28(2):152–168, 2015.
[32] C. Le Van and H. C. Saglam. Optimal growth models and the Lagrange
multiplier. Journal of Mathematical Economics, 40(3-4):393–410, 2004.
[33] T. Mitra and H. Y. Wan Jr. Some theoretical results on the economics
of forestry. The Review of Economic Studies, 52(2):263–282, 1985.
[34] T. Mitra and H. Y. Wan Jr. On the Faustmann solution to the for-
est management problem. Journal of Economic Theory, 40(2):229–249,
1986.
[35] M. A. Müller. Dissipativity in economic model predictive control: be-
yond steady-state optimality. In Recent Advances in Model Predictive
Control, pages 27–43. Springer, 2021.
[36] M. A. Müller, D. Angeli, and F. Allgöwer. On necessity and robustness
of dissipativity in economic model predictive control. IEEE Transactions
on Automatic Control, 60(6):1671–1676, 2015.
[37] P. A. Samuelson. A note on measurement of utility. The Review of
Economic Studies, 4(2):155–161, 1937.
[38] M. Schäl. Average optimality in dynamic programming with general
state space. Mathematics of Operations Research, 18(1):163–172, 1993.
[39] R. Sznajder and J. A. Filar. Some comments on a theorem of Hardy
and Littlewood. Journal of Optimization Theory and Applications, 75
(1):201–208, 1992.
[40] A. Y. Ulus. On discrete time infinite horizon optimal growth problem.
An International Journal of Optimization and Control: Theories & Ap-
plications (IJOCTA), 8(1):102–116, 2018.
[41] Ó. Vega-Amaya. On the vanishing discount factor approach for Markov
decision processes with weakly continuous transition probabilities. Jour-
nal of Mathematical Analysis and Applications, 426(2):978–985, 2015.
[42] J. C. Willems. Dissipative dynamical systems, Part I: General theory.
Archive for Rational Mechanics and Analysis, 45(5):321–351, 1972.
[43] K. Yosida. Functional Analysis, 6th Edition. Springer, Berlin, 1980.
[44] H. Yu. On the minimum pair approach for average cost Markov decision
processes with countable discrete action spaces and strictly unbounded
costs. SIAM Journal on Control and Optimization, 58(2):660–685, 2020.