Makowski
A Lagrangian methodology is now described for studying this constrained problem; it requires the introduction of a family of auxiliary problems: For every γ > 0, let the mapping b^γ: S × U → ℝ be given by

    b^γ(x, u) := r(x, u) - γ c(x, u)                                    (2.4)

for all (x, u) in S × U, and define the corresponding Lagrangian functional by

    B^γ(π) := limsup_n (1/n) E^π [ Σ_{t=1}^n b^γ(X(t), U(t)) ]          (2.5)

for every policy π in P. As it occurs in Mathematical Programming, the solution of the constrained MDP (C_{A_V}) is closely related to various properties of the unconstrained MDP (LP_γ) associated with (2.5), where

    (LP_γ): maximize B^γ(π) over P.

To see this, observe that for any policy π in P, the inequality B^γ(π) ≥ J(π) - γ K(π) holds, whereas if the policy π yields (2.1) and (2.2) as limits, then B^γ(π) = J(π) - γ K(π). If, in addition to this property, a policy π* meets the constraint, i.e., K(π*) = V, and is optimal for the MDP (LP_γ), then necessarily for all π in P,

    B^γ(π*) = J(π*) - γ K(π*) ≥ B^γ(π) ≥ J(π) - γ K(π).                 (2.6)

Since K(π) ≤ V for any policy π in A_V, the inequality (2.6) and the fact γ > 0 readily imply that

    J(π*) ≥ J(π) + γ [K(π*) - K(π)] = J(π) + γ (V - K(π)) ≥ J(π)        (2.7)

for every policy π in A_V, whence the policy π* solves the constrained optimization problem (C_{A_V}).

From this discussion, it should be clear in what sense the Lagrangian problems {(LP_γ), γ > 0} are useful for solving the original constrained problem. Indeed, as the arguments given above indicate, any policy π* in P which

(R1): yields the expressions J(π*) and K(π*) as limits,
(R2): meets the constraint with K(π*) = V, and
(R3): solves the unconstrained MDP (LP_γ) for some γ > 0,

necessarily solves the constrained problem (C_{A_V}). This approach can be used either directly on specific problems mutatis mutandis, as illustrated in Sections 4 and 5, or it can provide a convenient theoretical framework for establishing general results on the existence and structural form of solutions to the constrained optimization problem.

To simplify the discussion, it is convenient to assume that S is finite and that the controlled chain has a single ergodic class under each policy f in F. In that case, under any policy f in F, the expressions (2.1), (2.2) and (2.5) exist as limits and are independent of the initial condition. Moreover, it is well known that an optimal policy for problem (LP_γ) can always be selected to be a pure strategy in G. This follows by standard arguments based on the corresponding Dynamic Programming equation (2.8), which states here that for all x in S, the relation

    B^γ + h^γ(x) = max_{u in U} [ b^γ(x, u) + Σ_{y in S} p_{xy}(u) h^γ(y) ]       (2.8)

holds for some real constant B^γ and some mapping h^γ: S → ℝ, where p_{xy}(u) denotes the one-step transition probability from x to y under action u. Their existence is guaranteed under the assumed conditions [13], with B^γ identified with the optimal value of problem (LP_γ). It is also well known that if G^γ denotes the class of policies g^γ in G with the property that for each x in S, the action u = g^γ(x) attains the maximum in the Dynamic Programming equation (2.8), then any policy in G^γ is optimal for the problem (LP_γ).

2.3 Optimality via randomized policies

In [26], it is shown under the simplifying assumptions stated earlier that whenever the problem is 'feasible', there exists a constrained optimal policy with a simple structure.

Theorem [26]: If for some 0 < γ < ∞, at least one policy g^γ in G^γ has the property that K(g^γ) ≤ V, then there exists a constrained optimal policy f* in F defined by a simple randomization between two pure Markov stationary policies g and ḡ in G^γ such that

    K(g) ≤ V ≤ K(ḡ).                                                    (2.9)

If the randomized policies f_q, 0 ≤ q ≤ 1, are defined by

    f_q := q ḡ + (1 - q) g,                                             (2.10)

then f* = f_{q*}, where the optimal bias q* is determined as the solution to the equation

    K(f_q) = V,  0 ≤ q ≤ 1.                                             (2.11)

The discussion of this result can be summarized as the search for a policy in F that satisfies the requirements (R1)-(R3) for some value γ > 0 of the Lagrangian parameter. The reader is referred to [26] for details.

2.4 Optimality via mixing policies

The existence of policies g and ḡ in G^γ with the property (2.9) can be further exploited to generate an alternate solution to the constrained MDP (C_{A_V}) in the one-parameter family of mixing policies {π(p), 0 ≤ p ≤ 1}. For every p in the unit interval [0,1], consider a two-sided coin biased so that the events Head and Tail occur with probability p and 1 - p, respectively. To define the mixing policy π(p), throw this biased coin exactly once at the beginning of time, before starting to operate the system. The policy π(p) is defined as the policy in P that operates according to g (resp. ḡ) if the outcome is Tail (resp. Head). It is not difficult to see that for such a policy, the relations

    J(π(p)) = (1 - p) J(g) + p J(ḡ)                                     (2.12a)
and
    K(π(p)) = (1 - p) K(g) + p K(ḡ)                                     (2.12b)

hold true.

Now, under the non-degeneracy assumption K(g) ≠ K(ḡ), pose p* := [V - K(g)] / [K(ḡ) - K(g)] ≥ 0 (owing to (2.9)). Observe from (2.12) that the policy π(p*) steers (2.2) to the value V, yields (2.1) and (2.2) as limits, and that B^γ(π(p)) = (1 - p) B^γ(g) + p B^γ(ḡ) = B^γ. In other words, the policy π(p*) meets the requirements (R1)-(R3), and consequently solves the constrained problem (C_{A_V}). It should be noted, in contrast with the policy f* obtained by randomization in Section 2.3, that the evaluation of the optimal mixing parameter p* is immediate if the values K(g) and K(ḡ) are available.

3. MORE ON CONSTRAINED OPTIMIZATION

3.1 Generalizations

It is possible to generalize the discussion of Section 2 in several directions:

Firstly, it is of interest to consider system dynamics where more complex probabilistic mechanisms are allowed for state transitions and/or where the state processes live in more general spaces. Beutler and Ross [4] consider a version of (C_{A_V}) for general semi-Markov decision processes with finite state space and compact action space. Nain and Ross [21] study a specific constrained MDP with countable (non-finite) state space. In both cases, similar results on the structure of the constrained optimal policy are reported. However, more work seems needed, as no general theory is available to date in the case of non-finite state spaces.

Secondly, as pointed out in the introduction, the main practical motivation for studying constrained optimization problems arises from the desire to handle situations with multiple (conflicting) objectives. The next natural step would consist in formulating constrained MDP's with multiple constraints as a nonlinear programming problem in the space of policies. More precisely, with the notation of Section 2, let J(π) be the cost associated with the policy π in P, and let K^i(π) denote the corresponding value of the i-th constraint, 1 ≤ i ≤ I. For every vector V = (V_1, ..., V_I) in ℝ^I, pose

    A_V := {π in P: K^i(π) ≤ V_i, 1 ≤ i ≤ I}                            (3.1)

and define the multiple constraint problem (C_{A_V}) as in Section 2. Very little is known on the existence and structure of optimal solutions for problems of this type. This probably could be traced back to the fact that the relationship of the corresponding Lagrangian problem to the original problem (C_{A_V}) is far more subtle in the multiple constraint situation.
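The single-constraint constructions of Sections 2.3 and 2.4 reduce to elementary computations once K(g) and K(ḡ) (or the map q → K(f_q)) are available. The following sketch (Python; the helper names and the closed-form example are illustrative, not from the paper) computes the mixing parameter p* of Section 2.4 in closed form, and recovers the bias q* of (2.11) by bisection under the assumption that q → K(f_q) is continuous and increasing:

```python
def mixing_parameter(K_g, K_gbar, V):
    """Optimal mixing parameter p* = (V - K(g)) / (K(gbar) - K(g)),
    under the non-degeneracy condition K(g) != K(gbar) and
    relation (2.9): K(g) <= V <= K(gbar)."""
    assert K_g <= V <= K_gbar and K_g != K_gbar
    return (V - K_g) / (K_gbar - K_g)

def optimal_bias(K_of_q, V, tol=1e-10):
    """Solve K(f_q) = V for q in [0,1] by bisection (relation (2.11)),
    assuming q -> K(f_q) is continuous and increasing with
    K(f_0) <= V <= K(f_1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if K_of_q(mid) < V:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the mixing policy, K(π(p)) is linear in p by (2.12b), so the two computations agree whenever q → K(f_q) happens to be linear; in general only the bisection applies.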
In fact, as of the writing of this paper, it is not clear how to obtain in general an optimal policy through randomization and/or mixing procedures similar to the ones presented in Section 2. Results are available only in particular instances; Altman and Shwartz [1] establish existence of a solution to a problem with multiple constraints for the competing queue model considered by Nain and Ross [21].

3.2 Implementation issues

Even if the policies g and ḡ, and the value γ* were readily available, the computation of the optimal bias q* may prove to be a non-trivial task, for it requires solving for q in the implicit equation K(f_q) = V on the interval [0,1], and makes it necessary to evaluate the expression K(f_q) for each 0 ≤ q ≤ 1. Both steps usually turn out to be highly difficult ones in many applications, and are often possible only via numerical methods; this difficulty is clearly illustrated on the competing queue model discussed in Section 4.

The optimal bias q* acts as an internal parameter; it is available in principle if the external (or model) parameters (i.e., the entries in the transition matrix) are known, but may not be easily available in practice owing to computational difficulties. Of course, this difficulty of implementation is further compounded when some of the external parameters are not known, since 'on line' identification of the external parameters does not provide a feasible means to evaluate q*. In any case, this points to adaptive methods for directly estimating q*, now treated as an unknown parameter, and this specifically for the purpose of generating an optimal control; in the terminology of adaptive systems, this is referred to as direct adaptive control [7]. This suggests broadening the notion of adaptive control for Markov chains to view it as a procedure for recursively updating the control to meet the performance criterion. Although this is a well-known problem in the general theory of adaptive systems, it seems to have not been studied much in the context of MDP's, at least to the authors' knowledge.

The reader's attention should be drawn to the fact that direct adaptive control ideas, with q* regarded as an unknown parameter, do not always lead to implementable policies. This was illustrated by Shwartz and Makowski [28,30] on the competing queue problem of Section 4.

The implementation issues discussed above can be addressed in the somewhat more general context of steering the cost (2.2) to a prespecified value V: Given is a parametrized family {f_q, 0 ≤ q ≤ 1} of Markov stationary policies, and assume, with the notation g := f_0 and ḡ := f_1, that K(g) ≤ V ≤ K(ḡ). The problem is then to find a policy f* in the parametrized family {f_q, 0 ≤ q ≤ 1} that steers the cost (2.2) to the value V, i.e., K(f*) = V. If the mapping q → K(f_q) is continuous, this can be achieved by selecting f* to be f_{q*}, with the bias q* being determined as the solution to the equation K(f_q) = V, 0 ≤ q ≤ 1. Although most of the ideas discussed in this paper apply mutatis mutandis to this more general situation, the discussion will be carried out only in the context of constrained MDP's for sake of clarity.

3.3 A time-sharing implementation

Although the optimal mixing policy π(p*) of Section 2.4 has a very simple structure, it is not stationary and ergodic. Indeed, under such a policy π(p*), the sample averages corresponding to K(π(p*)) do not satisfy the constraint, since on a set of probability p* (resp. 1 - p*), these limits will be K(ḡ) (resp. K(g)). This somewhat unappealing feature can be eliminated through the following time-sharing implementation of mixing policies. Assume the existence of a privileged state to which the system returns infinitely often under each one of the policies g and ḡ, and define a cycle as the time T between consecutive visits to that state. Denote the expectation of a cycle duration under policies g and ḡ by E^g(T) and E^ḡ(T), respectively. For every p in the unit interval [0,1], the mixing policy π(p) has a time-sharing implementation π_TS(p) which is now defined: Let p̄ be the element of [0,1] uniquely defined through the relation

    p = p̄ E^ḡ(T) / [ (1 - p̄) E^g(T) + p̄ E^ḡ(T) ]                       (3.2)

and consider two sequences of non-negative integers {n_j}_1^∞ and {n̄_j}_1^∞ with the property that

    lim_j n(j)/N(j) = 1 - p̄  and  lim_j n̄(j)/N(j) = p̄,                  (3.3)

where the notations

    n(j) := Σ_{i=1}^j n_i,  n̄(j) := Σ_{i=1}^j n̄_i  and  N(j) := n(j) + n̄(j)      (3.4)

are used for every j in ℕ. The discrete-time axis is divided into contiguous control frames; the (j+1)-st such control frame starts upon completion of the N(j)-th cycle and is made up of n_{j+1} + n̄_{j+1} cycles. The policy π_TS(p) is defined as the policy in P that during the j-th frame operates policy g for n_j cycles, and then policy ḡ for n̄_j cycles, j = 1, 2, .... Under the condition (3.3), well-known properties of first return times for Markov chains readily imply that

    J(π_TS(p*)) = (1 - p*) J(g) + p* J(ḡ) = J(π(p*))

and

    K(π_TS(p*)) = (1 - p*) K(g) + p* K(ḡ) = V,                          (3.5)

where the second equality is justified through (2.12) by the definition of p*. In fact, the convergence (3.5) takes place for the sample averages as well.

Note that if p̄ were rational, say of the form p̄ = n̄/(n + n̄) for some integers n and n̄, then the conditions (3.3) would be automatically satisfied upon choosing n_j = n and n̄_j = n̄ for all j in ℕ.

The reader will readily see that the time-sharing implementation π_TS(p*) of the optimal mixing policy π(p*) satisfies the requirements (R1)-(R3) and thus solves the constrained problem (C_{A_V}).

In Section 4, this approach is shown to be useful for identifying a solution to a multiple constraint problem.

3.4 A Certainty Equivalence implementation

A possible solution to the difficulties mentioned in Section 3.2 would be to estimate directly the optimal bias q* and then use the Certainty Equivalence Principle at each step. More precisely, this suggests using a possibly recursive estimation scheme that generates a sequence of bias values {q(n)}_1^∞ converging to q*. At step n, the RV q(n) constitutes an estimate of the bias value q*, which is thus interpreted as the (conditional) probability of using ḡ (given the available information H(n)), and it is thus natural to select the control action U(n) according to f_{q(n)}; the adaptive policy so generated by the sequence {q(n)}_1^∞ is denoted by α.

There are as many such adaptive schemes as there are schemes for estimating the optimal bias value q*. In each specific case, optimality of the adaptive policy α will be concluded if it can be established that the policy α

(A1): yields (2.1) and (2.2) as limits,
(A2): meets the constraint, i.e., K(α) = V, and
(A3): yields the same cost as f*, i.e., J(α) = J(f*).

This is done via a separate analysis and typically proceeds by showing that

    lim_n | f_{q(n)}(X(n)) - f*(X(n)) | = 0,                            (3.6)

the convergence taking place under P^α either almost surely or in probability, a property which readily follows from the (weak) consistency of the estimation scheme. Under (3.6) and possibly additional structural model assumptions, a method of proof due to Mandl [19] can be extended to show that J(α) = J(f*) and K(α) = K(f*). Examples of this approach are given in Sections 4 and 5.

At this point, the reader may wonder how such an estimation scheme is selected.

(i): Sometimes, it is feasible to compute the optimal bias q* as a function q*(θ) of the external parameters θ. In that case, the designer may want to consider using the Certainty Equivalence Principle in conjunction with a parameter estimation scheme, say based on the Maximum Likelihood Principle. This approach is illustrated in Section 5 on a problem of flow control.

(ii): In many applications, the function q → K(f_q) turns out to be continuous and strictly monotone, say increasing for sake of definiteness. In that case, the search for q* can be interpreted as finding the zero of the continuous, strictly monotone function K(f_q) - V, and this brings to mind ideas from the theory of Stochastic Approximations [24]. Here, this circle of ideas suggests generating a sequence of bias values {q(n)}_1^∞ through the recursion

    q(n+1) = [ q(n) + a_n (V - c(X(n+1), U(n+1))) ]_0^1,  n = 1, 2, ...          (3.7)

with q(1) given in [0,1].
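The recursion (3.7) is straightforward to simulate. The sketch below (Python; the toy Bernoulli cost model and all function names are illustrative assumptions, not the paper's queueing model) uses step sizes a_n = 1/n, which satisfy the conditions (3.8), and drives the running bias toward the zero of K(f_q) - V:

```python
import random

def clamp01(z):
    """The truncation [z] to the interval [0,1] used in recursion (3.7)."""
    return max(0.0, min(1.0, z))

def stochastic_approximation(sample_cost, V, n_steps=20000, seed=1):
    """Sketch of (3.7): q(n+1) = clamp(q(n) + a_n (V - c_n)) with a_n = 1/n.
    `sample_cost(q, rng)` is a stand-in for the observed instantaneous
    constraint cost c(X(n+1), U(n+1)) under the randomized policy f_q."""
    rng = random.Random(seed)
    q = 0.3  # q(1) given in [0,1]
    for n in range(1, n_steps + 1):
        c = sample_cost(q, rng)
        q = clamp01(q + (V - c) / n)
    return q

# Toy model: under f_q the one-step cost is Bernoulli with mean
# K(f_q) = (1-q)*0.2 + q*0.8, so the target bias is q* = (V - 0.2)/0.6.
def toy_cost(q, rng):
    mean = 0.2 + 0.6 * q
    return 1.0 if rng.random() < mean else 0.0
```

With V = 0.5 the toy model has q* = 0.5, and the iterates settle near that value as the steps decay.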
Here [z]_0^1 := 0 ∨ (z ∧ 1) for all z in ℝ, and the sequence of step sizes {a_n}_1^∞ satisfies

    0 < a_n ↓ 0,  Σ_{n=1}^∞ a_n = ∞,  Σ_{n=1}^∞ | a_{n+1} - a_n | < ∞.           (3.8)

The corresponding policy α_SA is structurally simple and easy to implement on-line; this simplicity of implementation is derived from the fact that the difficult step of directly solving for q* is completely bypassed.

4. OPTIMAL RESOURCE ALLOCATION

4.1 Model

Consider the following system of K + 1 infinite-capacity queues that compete for the use of a single server: Time is slotted and the service requirement of each customer corresponds exactly to one time slot. At the beginning of each time slot, the controller gives priority to one of the queues. If the k-th queue is given service attention during that slot, then with probability μ_k the serviced customer (if any) completes service and leaves the system, while with probability 1 - μ_k, the customer fails to complete service and remains in the queue. The arrival pattern is modelled as a renewal process, in that the batch sizes of customers arriving into the system in each slot are independent and identically distributed from slot to slot. Under these assumptions, the evolution of the system is fully described by an ℕ^{K+1}-valued process {X(n)}_1^∞, with X_k(n), 0 ≤ k ≤ K, representing the number of customers in the k-th queue at the beginning of the slot [n, n+1).

The mean number of customers arriving to the k-th queue is denoted by λ_k. Define the traffic intensity ρ := Σ_{k=0}^K λ_k/μ_k and assume henceforth that ρ < 1; this guarantees system stability [2].

For some mapping c: ℕ^{K+1} → ℝ_+, the cost to be minimized is defined by

    J(π) := limsup_n (1/n) E^π [ Σ_{t=1}^n c(X(t)) ]                    (4.1)

for every policy π in P. A special case abundantly treated in the literature is the one where c is linear and positive, i.e., for all x in ℕ^{K+1},

    c(x) = Σ_{k=0}^K c_k x_k,                                           (4.2)

with c_k ≥ 0, 0 ≤ k ≤ K. For this case, several authors have discussed the problem of selecting a service allocation strategy that minimizes (4.1) over the class P of all admissible service allocation strategies [3,5]. They all show the optimality of the μc-rule, i.e., the fixed priority assignment policy that orders the customer classes in increasing order of priority with the values μ_k c_k, 0 ≤ k ≤ K.

4.2 Single constrained queue

In [21], Nain and Ross considered the situation where several types of traffic, e.g., voice, video and data, compete for the use of a single synchronous communication channel. They formulate this situation as a system of K + 1 discrete-time queues that compete for the attention of a single server, and solve for the service allocation strategy that minimizes the long-run average of a linear expression in the queue sizes of the K customer classes {1, ..., K} under the constraint that the long-run average queue size of the remaining customer class 0 does not exceed a certain value V. Thus for any policy π in P, define J(π) by (4.1) with c given by (4.2) where c_0 = 0, and pose

    K(π) := limsup_n (1/n) E^π [ Σ_{t=1}^n X_0(t) ].                    (4.3)

Nain and Ross [21] extend some of the optimality results from [3,5] to show that if the constraint can be met in a non-trivial fashion, then an optimal policy with a very simple structure can be identified.

Theorem [21]: If the problem is feasible, there exists a constrained optimal policy f* in F which randomizes between two work-conserving static priority assignment policies.

As in Section 2, solving the constrained problem (C_{A_V}) reduces to finding a policy π* in P which meets the requirements (R1)-(R3). If the constraint is not satisfied while giving highest priority to the constrained queue (i.e., the 0-th queue), then there is no solution. On the other hand, if the constraint is satisfied while giving lowest priority to the constrained queue and ordering the other queues according to the μc-rule, then this policy is optimal for the constrained problem (C_{A_V}). The proof in the remaining case is available in [21], the main idea being that each one of the problems (LP_γ) is solved by a fixed priority assignment policy which admits a description as the μc-rule based on μ_0 γ and μ_k c_k, 1 ≤ k ≤ K.

The dynamics of this problem were generalized by Nain and Ross to a semi-Markov decision process [22]. Another generalization, given by Altman and Shwartz [1], involves a more general constraint (4.3) associated with an instantaneous cost d: ℕ^{K+1} → ℝ which is also an arbitrary linear function with positive coefficients (as in (4.2)).

Theorem [1]: If the problem is feasible, then there is a constrained optimal policy f_{q*} which randomizes between two work-conserving static priority assignment policies.

4.3 Single constraint - An adaptive implementation

Even with (4.3), the function q → K(f_q) is not easy to compute, in spite of the linearity of the instantaneous cost. Indeed, as pointed out by Nain and Ross [21], computing the quantity K(f_q) amounts to studying a coupled-processor problem whose solution can be obtained via a reduction to a Riemann-Hilbert problem [6].

The stochastic approximation algorithm that generates the sequence of bias estimates {q(n)}_1^∞ here takes the special form

    q(n+1) = [ q(n) + a_n (V - X_0(n+1)) ]_0^1,  n = 1, 2, ...          (4.5)

with q(1) given in [0,1].

For K = 1, there are only two fixed priority assignment policies, and therefore no a priori knowledge of the various statistics is required in order to implement this algorithm. As such, the proposed policy is implementable and constitutes an adaptive policy in the restricted technical sense understood in the literature on the non-Bayesian adaptive control problem for Markov chains [11]. However, for K > 1, α_SA is implementable only if g and ḡ have been determined.

The basic results assume finite third moments on the initial queue sizes and on the statistics of the arrival pattern. Under these additional conditions:

Theorem [30]: The sequence of biases {q(n)}_1^∞ converges in probability (under P^{α_SA}) to the optimal bias q*.

The derivation of this result makes combined use of results by Kushner and Shwartz [14] on the weak convergence of Stochastic Approximations via ODE methods and of moment estimates obtained for the queue size process in [30]. The condition (3.6) now holds and implies:

Theorem [30]: The policy α_SA solves the constrained optimization problem (C_{A_V}) with J(α_SA) = J(f*) and K(α_SA) = K(f*).

4.4 Multiple constraint - A time-sharing implementation

In this section, a special version of the multiple constraint problem (C_{A_V}) is studied. For every policy π in P, pose

    K^i(π) := limsup_n (1/n) E^π [ Σ_{t=1}^n X_i(t) ],  0 ≤ i ≤ K,      (4.6)

and for every vector V = (V_1, ..., V_K) in ℝ^K, consider the constrained problem

    (C_{A_V}): minimize K^0(π) over A_V,                                (4.7)

with A_V defined by (3.1).
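Returning for a moment to the fixed priority assignments of Sections 4.1 and 4.2, both rules can be sketched compactly (Python; function names are illustrative, and the Lagrangian priority indices follow the μ_0 γ and μ_k c_k description quoted above):

```python
def mu_c_rule(x, mu, c):
    """Fixed priority assignment: among the nonempty queues, serve the one
    with the largest index value mu_k * c_k (the mu-c rule of Section 4.1).
    x, mu, c are length-(K+1) sequences of queue sizes, service completion
    probabilities and unit holding costs."""
    nonempty = [k for k in range(len(x)) if x[k] > 0]
    if not nonempty:
        return None  # empty system: nothing to serve this slot
    return max(nonempty, key=lambda k: mu[k] * c[k])

def constrained_priorities(gamma, mu, c):
    """Priority indices for the Lagrangian problems (LP_gamma) of Section 4.2:
    the constrained queue 0 receives index mu_0 * gamma, while the other
    queues keep their indices mu_k * c_k (recall c_0 = 0 in the cost (4.2))."""
    return [mu[0] * gamma] + [mu[k] * c[k] for k in range(1, len(mu))]
```

Sweeping γ from 0 to ∞ moves queue 0 through the priority ordering, which is the mechanism behind the randomization between two adjacent fixed priority policies.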
As in Section 2.4, a mixing policy π(p) is now generated by throwing an L-sided die once at the beginning of time and operating forever according to the pure policy g_l if the outcome of the throw is the l-th side, 1 ≤ l ≤ L. For every p in the L-dimensional simplex, the relations

    K^i(π(p)) = Σ_{l=1}^L p_l K^i(g_l),  0 ≤ i ≤ K,                     (4.8)

hold true and suggest the following Linear Program (LP):

    minimize  Σ_{l=1}^L x_l K^0(g_l)                                    (4.9a)

subject to the constraints

    Σ_{l=1}^L x_l K^i(g_l) ≤ V_i,  1 ≤ i ≤ K,                           (4.9b)

and

    0 ≤ x_l ≤ 1,  1 ≤ l ≤ L,  and  Σ_{l=1}^L x_l = 1.                   (4.9c)

The relationship between the Linear Program (LP) and the original constrained problem (C_{A_V}) is formalized in the following result.

Theorem [1]: If the Linear Program (LP) has a solution, say p*, then the policy π(p*) solves the multiple constraint problem (C_{A_V}). Conversely, if problem (C_{A_V}) has a solution, then it can always be implemented by a mixing policy.

The solution of problem (C_{A_V}) via mixing policies has the clear advantage of requiring only the solution of the Linear Program (LP); this is to be contrasted with the difficult queueing problem that must be solved to find the optimal randomized policy [21]. A dynamic or time-sharing implementation can also be provided in the spirit of Section 3.3. Here the empty state is taken to be the privileged state, and a cycle thus coincides with a busy cycle for the queueing system. For sake of brevity, the discussion is restricted to the case where the element p in the L-dimensional simplex has all its components p_l, 1 ≤ l ≤ L, rational.

A threshold policy is a Markov stationary policy in F with a simple structure determined by two parameters L and q in ℕ and [0,1], respectively, whence a threshold policy is denoted hereafter by (L, q). According to the threshold policy (L, q), an incoming customer is admitted or rejected according to whether the queue size is < L or > L; if the queue size is exactly L, a biased coin with bias q is flipped and the outcome then determines whether or not the incoming customer can access the queue. The adopted convention interprets q as the (conditional) probability of accepting an incoming customer.

The policy that admits every single customer is denoted by (∞, 1). If K((∞,1)) ≤ V, then the problem (C_{A_V}) corresponding to (5.1)-(5.2) is trivially solved by (∞,1). On the other hand, if K((∞,1)) > V, then the problem has a non-trivial solution, which can be shown to be of threshold type. This result is derived again through a Lagrangian argument: For γ > 0, let the mapping b^γ: ℕ → ℝ be given by b^γ(x) = μ 1(x ≠ 0) - γ x, and define for every policy π in P, B^γ(π) as in (2.5). The unconstrained MDP (LP_γ) is now defined as in Section 2, and as in previous instances where this technique was used, solving the constrained problem (C_{A_V}) reduces to finding a policy π* in P which meets the requirements (R1)-(R3).

The unconstrained Lagrangian problem (LP_γ) can be solved through a tedious Dynamic Programming argument that shows concavity of the corresponding value function.

Theorem [17]: For every γ > 0, there exists a threshold policy (L_γ, q_γ) that solves the unconstrained Lagrangian problem (LP_γ). Moreover, for each threshold value L in ℕ, there always exists γ(L) > 0 so that any threshold policy (L, q), with q arbitrary in [0,1], solves the Lagrangian problem (LP_{γ(L)}).

Threshold policies are thus unconstrained optimal. For any threshold policy (L, q), the quantities (5.1) and (5.2) exist as limits. Moreover, K((L, q)) increases as L increases, whereas for fixed L in ℕ, the mapping q → K((L, q)) is continuous and strictly monotone increasing on [0,1]. Since K((∞,1)) > V, there exists L* in ℕ such that K((L*,0)) < V ≤ K((L*,1)), and the continuity and the strict monotonicity of q → K((L*, q)) then imply the existence of q* such that K((L*, q*)) = V.
the action implemented in the slot [n, n+1), the arrival during that slot and the service completion at the end of that slot, respectively.

Using the Strong Law of Large Numbers and results from the theory of Large Deviations, the appropriate version of condition (3.6) is shown to again take place [18], the rate of convergence being exponentially fast; this last fact is crucial to the basic result, which is obtained under a fourth moment assumption on the initial queue size.

Theorem [18]: The policy α_ML solves the constrained optimization problem (C_{A_V}) with J(α_ML) = J(f*) and K(α_ML) = K(f*).

It is not clear at this time how to design direct adaptive control schemes when the optimal threshold L* is not available. A stochastic approximation algorithm for generating recursively a sequence of estimates for L*, similar to the one used for q*, naturally comes to mind; however, the corresponding scheme fails to work owing to its sensitivity to variations of the integer-valued threshold [18].

REFERENCES

[1] E. Altman and A. Shwartz, "Optimal priority assignment with general constraints", submitted, 24th Allerton Conference on Communication, Control and Computing, October 1986.
[2] J. S. Baras, A. J. Dorsey and A. M. Makowski, "Discrete time competing queues with geometric service requirements: stability, parameter estimation and adaptive control", under revision, SIAM J. Control Opt.
[3] J. S. Baras, D.-J. Ma and A. M. Makowski, "K competing queues with geometric service requirements and linear costs: The μc-rule is always optimal", Systems & Control Letters, vol. 6, pp. 173-180 (1985).
[4] F. J. Beutler and K. W. Ross, "Time-average optimal constrained semi-Markov decision processes", Adv. Appl. Prob. vol. 18, pp. 341-359 (1986).
[5] C. Buyukkoc, P. Varaiya and J. Walrand, "The cμ rule revisited", Adv. Appl. Prob. vol. 17, pp. 234-235 (1985).
[6] G. Fayolle and R. Iasnogorodski, "Two coupled processors: The reduction to a Riemann-Hilbert problem", Z. Wahr. verw. Gebiete, vol. 47, pp. 325-351 (1979).
[22] P. Nain and K. W. Ross, "Optimal multiplexing of heterogeneous traffic with hard constraint", Proceedings of the Joint Performance '86 and ACM Sigmetrics Conference, Raleigh, North Carolina, May 1986.
[23] M. Reiser, "Performance evaluation of data communication systems", Proc. IEEE, vol. 70, pp. 171-196 (1982).
[24] H. Robbins and S. Monro, "A stochastic approximation method", Ann. Math. Stat. vol. 22, pp. 400-407 (1951).
[25] Z. Rosberg, P. V. Varaiya and J. Walrand, "Optimal control of service in tandem queues", IEEE Trans. Auto. Control, vol. AC-27, pp. 600-610 (1982).
[26] K. W. Ross, Constrained Markov Decision Processes with Queueing Applications, Ph.D. thesis, Computer, Information and Control Engineering, University of Michigan, 1985.
[27] R. Serfozo, "An equivalence between discrete and continuous time Markov decision processes", Oper. Res. vol. 23, pp. 618-620 (1979).
[28] A. Shwartz and A. M. Makowski, "An optimal adaptive scheme for two competing queues with constraints", 7th International Conference on Analysis and Optimization of Systems, pp. 515-532, Antibes, France, 1986.
[29] A. Shwartz and A. M. Makowski, "Adaptive policies for a system of competing queues I: Convergence results for the long-run average cost", in preparation (1986).
[30] A. Shwartz and A. M. Makowski, "Adaptive policies for a system of competing queues II: Implementable schemes for optimal server allocation", in preparation (1986).
[31] S. Stidham, Jr., "Optimal control of admission to a queueing system", IEEE Trans. Auto. Control, vol. AC-30, pp. 705-713 (1985).