Optimal Routing To Two Parallel Heterogeneous Servers With Resequencing
0018-9286/91$01.00 © 1991 IEEE
AYOUN AND ROSBERG: OPTIMAL ROUTING TO TWO PARALLEL HETEROGENEOUS SERVERS 1437
of fixed-position routings. Also, as it will become apparent, it is not optimal to keep server 1 idle if queue Q is not empty, and therefore the requirement of $J \ge 2$ does not exclude the head of the line.

Let $X(t)$ be a tuple denoting the state of the system at time $t$ (to be defined below) and $|X(t)|$ be the number of customers in the system at that state. A routing policy $\pi$ is any rule that at every time $t \ge 0$ decides, on the basis of past states and of past decisions up to time $t$, which idle servers to activate. Policies may leave a server idle even when there is a customer in the corresponding position.

With a holding cost accrued at a fixed rate of 1, the long-run average cost associated with the policy $\pi$ is then defined by

$$J_\pi(x) \stackrel{\mathrm{def}}{=} \limsup_{T \to \infty} \frac{1}{T} E_x^\pi \left[ \int_0^T |X(t)| \, dt \right], \quad \text{for every state } x \qquad (1)$$

where $E_x^\pi[\cdot]$ denotes the expectations with respect to the probability measure induced by the policy $\pi$ on the process $X = \{X(t), t \ge 0\}$ starting in state $x$. A routing policy $\pi^*$ is optimal if it minimizes (1), i.e., if

$$J_{\pi^*}(x) \le J_\pi(x)$$

for every policy $\pi$ and state $x$.

For the exponential system considered here, the optimization problem associated with (1) falls within the purview of continuous-time Markov decision processes which are uniformizable, i.e., which are equivalent to uniformized discrete-time Markov decision processes [6]. The reader is referred for details to [4], where the same problem without resequencing delays is studied. To define the discrete-time decision process, consider that at any given instant, each server is working either on a real customer, if activated, or on a dummy customer otherwise. Dummy customers always return to queue Q upon completing service and incur no contribution to the cost. Transitions associated either with arrivals or with service completions of a customer (either real or dummy) at one of the servers determine free transitions. These free transitions occur according to a Poisson process of rate $\lambda + \mu$. A (free) transition due to an arrival occurs with probability $\lambda/(\lambda + \mu)$, whereas a transition due to a service completion at server $i$ occurs with probability $\mu_i/(\lambda + \mu)$. If in state $x$ before a transition, the process will jump after this transition to a state which depends on the current state $x$ and on the action taken under the policy $\pi$ in use. The cost function for using policy $\pi$ which corresponds to (1) is then given by

$$J_\pi(x) \stackrel{\mathrm{def}}{=} \limsup_{N \to \infty} \frac{1}{N} E_x^\pi \left[ \sum_{m=0}^{N} |X(m)| \right], \quad x \in S \qquad (2)$$

where $X(m)$ now denotes the state sampled at the $m$th transition. We also need the total $\beta$-discounted cost ($0 < \beta < 1$) associated with the policy $\pi$, which is defined by

$$V_\pi^\beta(x) \stackrel{\mathrm{def}}{=} E_x^\pi \left[ \sum_{m=0}^{\infty} \beta^m |X(m)| \right], \quad x \in S. \qquad (3)$$

length of queue Q only. This subclass will be referred to as the resequencing-invariant class. A further simpler subclass are the threshold policies. A policy $t_m$ is a threshold policy with level $m \ge J$ if i) the first customer in queue Q is routed to server 1 whenever it becomes free; ii) the customer from position $J$ is routed to server 2 when and only when it is free and the number of customers in server 1 and queue Q is at least $m$.

One result of this study is that the optimal policy can be taken within the resequencing-invariant class. Another result is that for a certain range of positions $J$, the optimal policy can be taken within the threshold class. We also show that there is a preferable routing position $J_0$.

For the routing problem without resequencing delays, the routing position $J$ is irrelevant since service requirements are identically distributed. This problem was first studied in [7], where it was conjectured that the optimal policy would be of threshold type. In [1], a version of the problem with $N$ servers was considered under the assumptions that the system has an initial load of $n$ customers and no new customers enter the system, i.e., $\lambda = 0$. A simple policy which minimizes the expected flow time has been determined. This optimal policy has the following simple form [1]: for $1 < j \le N$, set

(4)

and define $R_1 = 0$. If there are $n$ customers that remain unprocessed and server $j$ is the fastest server available (i.e., with the largest $\mu_j$), then the idle server $j$ is activated, and a customer dispatched to it, if and only if $n > R_j$.

The conjecture from [7] on the threshold form of the optimal policy was settled in the affirmative in [4] for $N = 2$. Using policy iteration, it has been shown that the optimal policy is of threshold type with threshold level $R(\lambda)$ (which depends on $\lambda$). It was also conjectured there that as $\lambda \downarrow 0$, $R(\lambda)$ increases and converges to $R_2$ given by (4). In [12], simple stochastic coupling arguments were used to prove the optimality of the threshold policy for $N = 2$. Motivated by the conjecture made in [4], it has been shown in [10] (for a general number of parallel servers) and in [8] (for two servers) that the threshold policy above for $\lambda = 0$ is also optimal for small enough values of the arrival rate $\lambda$.

In light of the results above, one is naturally led to explore the idea that when resequencing delays are introduced, the optimal policy would also be of threshold type. We settle this question in the affirmative only for $J > J_0$.

The issue of resequencing delays in this context was first introduced in [3], where queueing statistics have been evaluated under the class of fixed-position threshold policies. It has been further shown there that for a given threshold level $m$, there is an optimal position $J^*$ from which one should route customers to server 2. This position is given by
In other words, when a customer has to be routed to server 2 according to the threshold policy $t_m$, then the best fixed position is the nearest to $J_0$. This property of $J_0$ will be referred to as its "optimality property."

Reviewing the optimality property of $J_0$ for a threshold policy, and considering the fact that threshold policies may not necessarily be optimal, we are intrigued by another question, whether $J_0$ has the optimality property for a more general class of policies. We will show that this is indeed the case.

The paper is organized as follows. In Section II, we define the state space and the transitions under fixed-position routings. Section III is subdivided into two parts. In Section III-A, we show that the faster server should be kept active as long as the service queue is not empty. In Section III-B, which is further subdivided, we consider the optimal control of the slower server. In Section III-B-1), we show that the optimal control is independent of the state of the resequencing queues. In Section III-B-2), we show the "optimality property" of position $J_0$, and in Section III-B-3) we show that for $J > J_0$, the optimal policy is of threshold type.

II. THE STATE PROCESS: DEFINITIONS AND BASIC RESULTS

In this section, we define the states and the transitions of the Markov decision process that describes our routing problem and examine its state evolution.

A. States and Transitions

We start with the state definition. After every transition $t$, $t = 0, 1, \ldots$, in the discrete-time decision process, let $n(t)$ denote the number of customers in queue Q, and $e_i(t)$, $i = 1, 2$, denote the state of server $i$ (with the understanding that $e_i(t) = 1$ if server $i$ is busy, and $e_i(t) = 0$ otherwise). To describe the resequencing queues R1 and R2 we need the following notion.

We say that customer $i$ in a resequencing queue is being delayed by customer $k_i$ if:
i) customer $k_i$ did not finish service;
ii) $k_i < i$;
iii) $k_i$ is the maximal $k$ that satisfies i) and ii).
Thus, customer $i$ is released immediately after the service completion of customer $k_i$.

Let $l(t)$ be the number of customers in queue R1 (after the $t$th transition) that are being delayed by the customer which is being served by server 2. Here, $l(t) = 0$ if $e_2(t) = 0$. Also (see Fig. 1), denote by $i_1(t) < i_2(t) < \cdots < i_{J-1}(t)$ the $J - 1$ customers with the lowest sequence numbers among those in queue Q and server 1 after the $t$th transition. The number of customers in queue R2 that are being delayed by customer $i_m(t)$, $1 \le m \le J - 1$, is denoted by $l_m(t)$. Observe that the customers in R1 can be delayed only by the customer which is being served by server 2, and those in R2 by one of the customers in $\{ i_m(t) \mid 1 \le m \le J - 1 \}$. (These are formally proven in Section II-B below.)

The lengths of the resequencing queues are determined by the tuple

$$R(t) = (l(t), (l_1(t), \ldots, l_{J-1}(t)))$$

which will be referred to as the state of the resequencing queues. Finally, let $k(t)$ be the highest position of the customers in $\{ i_m(t) \mid 1 \le m \le J - 1 \}$ that would delay the customer being served by server 2 during the $t$th transition, if he completes his service immediately. If there is no such customer in $\{ i_m(t) \mid 1 \le m \le J - 1 \}$, or if server 2 is idle, then $k(t) = 0$.

The variable $X(t) = (n(t), e_1(t), e_2(t), R(t), k(t))$ is a natural state variable that may assume values in $S = \mathcal{N} \times \{0,1\}^2 \times \mathcal{N}^J \times \{1, \ldots, J - 1\}$, where $\mathcal{N} = \{0, 1, \ldots\}$.

To describe the transitions of the process $X$ it is useful to define the transformations

$$A, D_1, D_2 : S \to S$$

that describe the states to which the process will jump from state $x$ when a free transition occurs. These transformations correspond to an arrival, a service completion at server 1, and a service completion at server 2, respectively. For the formal definition we need the following notations.

A state $x \in S$ stands for a tuple $x = (n, e_1, e_2, R, k)$, where $R = (l, (l_1, \ldots, l_{J-1}))$ with the understanding that $l_m$ customers in queue R2, $1 \le m \le J - 1$, are being delayed by customer $i_m$. For every $0 \le k \le J - 1$ and $e_2 \in \{0, 1\}$ denote

$$\ldots \quad \text{if } k = 0 \text{ and } e_2 = 1, \qquad \ldots \quad \text{if } k = 0.$$

The transformation $S^1$ defines the state that queues R1 and R2 would jump to from state $x$ when server 1 would complete service of a real customer. Observe that by definition, if $k = 0$ and $e_2 = 1$, then the customer that is being served by server 2 is the "oldest" in the system. Otherwise, the customer that is being served by server 1 is the "oldest." Thus, if $k > 0$ or $e_2 = 0$, when server 1 would complete service of a real customer, this customer and those in queue R2 which are being delayed by him would leave the system. In this case we necessarily have $l = 0$. If $k = 0$ and $e_2 = 1$, we necessarily have $R = (l, (0, \ldots, 0))$, and the customer that would finish service in server 1 would join queue R1. (These observations are proven in the next section.)

The transformation $S^2$ defines the state that queues R1 and R2 would jump to from state $x$ when server 2 would complete service of a real customer. Recall that for $k = 0$ and $e_2 = 1$ we necessarily have $R = (l, (0, \ldots, 0))$. Therefore, when the customer that is being served by server 2 would finish service, he and the customers in queue R1 would leave the system. If $k > 0$ and $e_2 = 1$, then the customer that would finish service in server 2 would join queue R2 and would be delayed by customer $i_k$.
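The "delayed by" relation of Section II-A, conditions i) to iii), can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: the function name `delaying_customer` and the representation of the customers that have not finished service as a set of sequence numbers are hypothetical and are not notation from the paper.

```python
from typing import Iterable, Optional


def delaying_customer(i: int, unfinished: Iterable[int]) -> Optional[int]:
    """Return k_i, the customer delaying customer i, or None if nobody delays i.

    `unfinished` holds the sequence numbers of customers that did not finish
    service (condition i). Among those with a smaller sequence number than i
    (condition ii), the maximal one is k_i (condition iii).
    """
    earlier = [k for k in unfinished if k < i]
    return max(earlier) if earlier else None


# Hypothetical snapshot: customers 3 and 7 are still unfinished, while
# customers 4 and 5 have completed service and wait in a resequencing queue.
unfinished = {3, 7}
waiting = [4, 5]
delayed_by = {i: delaying_customer(i, unfinished) for i in waiting}

# After customer 3 completes service, nobody with a smaller sequence number
# remains unfinished, so customers 4 and 5 are released.
after_completion = {i: delaying_customer(i, unfinished - {3}) for i in waiting}
```

In this snapshot both waiting customers are delayed by customer 3, which matches the remark that a customer is released immediately after the service completion of the customer delaying it.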
Now, the free transitions of process $X$ from state $x \in S$ (when no routings are made) are as follows.

$$A(x) = (n + 1, e_1, e_2, R, k),$$

$$D_2(x) = \begin{cases} \ldots, & \text{if } e_2 = 0; \\ (n, e_1, 0, S^2(R), 0), & \text{if } e_2 = 1 \end{cases}$$

where $x^+ = \max\{0, x\}$. The probabilities that a free transition $A(x)$, $D_1(x)$, or $D_2(x)$ occurs are $\lambda/(\lambda + \mu)$, $\mu_1/(\lambda + \mu)$, and $\mu_2/(\lambda + \mu)$, respectively.

Here, it is convenient to identify a stationary policy $\pi$ with a function $\pi : S \to \{P_h, P_1, P_2, P_b\}$ as follows. Assume that a free transition, either an arrival or a service completion, occurs that would make the state jump to $x \in S$ if no action were taken. The policy $\pi$ uses at state $x$ an operator $P_a$, $a \in \{h, 1, 2, b\}$, that makes the state jump instantaneously from $x$ to $P_a(x)$, where

$$P_h(x) = x;$$
$$P_1(n, 0, e_2, R, k) = (n - 1, 1, e_2, R, k), \quad n \ge 1;$$
$$P_2(n, e_1, 0, R, 0) = (n - 1, e_1, 1, R, J - 1), \quad n \ge 1;$$
$$P_b(n, 0, 0, R, 0) = (n - 2, 1, 1, R, J - 1), \quad n \ge 2.$$

The operator $P_h$ does not route any customers, $P_1$ routes the customer from the head of the queue to server 1, $P_2$ routes the customer from position $J$ to server 2, and $P_b$ does $P_1$ and $P_2$. (Notice that from the way we define the position $J$, the order in $P_b$ is irrelevant.)

B. Basic Results

Since the cost function is linear in the state variable and the total number of customers in the system changes by at most one at every transition, it is well known that an optimal policy exists for the $\beta$-discounted problem (associated with (3)), and that it can be taken in the class of Markov stationary policies [11]. One of the conclusions of this study is that the exact same result also holds for the long-run average cost criterion (2). Furthermore, for every stationary policy $\pi$, the limit in (2) exists and is independent of the initial state $x$.

Without loss of generality we may assume that $\lambda + \mu = 1$. Under any stationary policy $\pi$, the forward equations of $V_\pi^\beta(x)$ are

$$V_\pi^\beta(x) = |x| + \beta \left[ \lambda V_\pi^\beta(\pi(A(x))) + \mu_1 V_\pi^\beta(\pi(D_1(x))) + \mu_2 V_\pi^\beta(\pi(D_2(x))) \right] \qquad (6)$$

where $\pi(y) \in \{P_h(y), P_1(y), P_2(y), P_b(y)\}$.

In the following lemmas we present some basic properties of the state evolution. The first lemma resolves the order among the customers at any instant.

Denote (see Fig. 1):
$r_s^1(t)$: The customer (i.e., its sequence number) in the $s$th position in queue R1 at the $t$th transition (time $t$).
$r_s^2(t)$: The customer in the $s$th position in queue R2 at time $t$.
$q_s(t)$: The customer in the $s$th position in queue Q at time $t$.
$n_1(t)$: The number of customers in queue R1 at time $t$.
$n_2(t)$: The number of customers in queue R2 at time $t$.
$1(t)$: The customer which is being served by server 1 at time $t$, or 0 if the server is idle.
$2(t)$: The customer which is being served by server 2 at time $t$, or 0 if the server is idle.

Lemma 2.1: At every time $t$ and for every occupied positions $s$ and $p$, or, respectively, 1 and $J$, in the corresponding queues
a) $q_p(t) < q_s(t)$, for $p < s$;
b) $r_p^1(t) < r_s^1(t)$, for $p < s$;
c) $r_p^2(t) < r_s^2(t)$, for $p < s$;
d) $q_s(t') \le q_s(t)$, for $t' < t$;
e) $r_s^1(t) < 1(t) < q_1(t)$ for $1(t) > 0$, and $r_s^1(t) < q_1(t)$ for $1(t) = 0$;
f) $r_s^2(t) < 2(t) < q_J(t)$ for $2(t) > 0$, and $r_s^2(t) < q_J(t)$ for $2(t) = 0$;
g) $2(t) < r_s^1(t)$;
h) There exists a $p$, $1 \le p \le J - 2$, such that $1(t) < r_s^2(t)$ or $q_p(t) < r_s^2(t)$.

Proof: Properties a)-f) are direct consequences of the facts that customers join at the end of the queues and are being dispatched from fixed positions.
Property g): Customer $r_s^1(t)$ is being delayed by a lower customer. From properties a), b), and e), it could only be customer $2(t)$. Thus, $2(t) < r_s^1(t)$.
Property h): Similarly, for customer $r_s^2(t)$. From properties f) and a) it could only be one of the customers in $\{1(t), q_1(t), \ldots, q_{J-2}(t)\}$. □

In the next lemma we show that the two resequencing queues cannot be nonempty at the same time.

Lemma 2.2: At every time $t$, at least one of the queues R1 or R2 is empty.

Proof: Suppose that $n_1(t) > 0$ and $n_2(t) > 0$ for some $t$. As in the proof of Lemma 2.1 g), $n_1(t) > 0$ and a), b), and e) of Lemma 2.1 imply that $2(t) > 0$. Hence, from Lemma 2.1 f) and g)

$$r_s^2(t) < 2(t) < r_p^1(t). \qquad (7)$$

However, from part e) of the Lemma, $r_p^1(t) < 1(t) < q_1(t)$, and from part h), $r_p^1(t) < r_s^2(t)$, which is in contradiction with (7). □

The following lemma asserts that at every time $t$, any customer in queue R1 is being delayed by customer $2(t)$. Furthermore, any customer in queue R2 is being delayed by one of the customers in $\{1(t), q_1(t), \ldots, q_{J-2}(t)\}$.

For every $t$ denote by $L_1(t), L_2(t), \ldots, L_{J-1}(t)$ the sets of customers in queue R2 that are being delayed by customers $1(t), q_1(t), \ldots, q_{J-2}(t)$, respectively, and by $L(t)$ the set of customers in queue R1 that are being delayed by customer $2(t)$.
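As a rough illustration of the routing operators $P_h$, $P_1$, $P_2$, $P_b$ and of the free-transition probabilities described in Section II-A, the following Python sketch encodes a state as the tuple $(n, e_1, e_2, R, k)$. The class `State`, the function names, and the explicit argument `J` are assumptions made for this sketch only. The guards follow the operator definitions as they read in the text, and $\mu = \mu_1 + \mu_2$ is assumed.

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    n: int                          # customers in queue Q
    e1: int                         # 1 if server 1 is busy, else 0
    e2: int                         # 1 if server 2 is busy, else 0
    R: Tuple[int, Tuple[int, ...]]  # resequencing state (l, (l_1, ..., l_{J-1}))
    k: int                          # position delaying server 2's customer (0 if none)


def P_h(x: State, J: int) -> State:
    """Hold: route no customer."""
    return x


def P_1(x: State, J: int) -> State:
    """Route the head-of-line customer to an idle server 1."""
    if x.e1 != 0 or x.n < 1:
        raise ValueError("P_1 needs an idle server 1 and n >= 1")
    return x._replace(n=x.n - 1, e1=1)


def P_2(x: State, J: int) -> State:
    """Route the customer in position J to an idle server 2 (k becomes J - 1)."""
    if x.e2 != 0 or x.k != 0 or x.n < 1:
        raise ValueError("P_2 needs an idle server 2, k = 0 and n >= 1")
    return x._replace(n=x.n - 1, e2=1, k=J - 1)


def P_b(x: State, J: int) -> State:
    """Route to both servers at once."""
    if x.e1 != 0 or x.e2 != 0 or x.k != 0 or x.n < 2:
        raise ValueError("P_b needs both servers idle, k = 0 and n >= 2")
    return x._replace(n=x.n - 2, e1=1, e2=1, k=J - 1)


def free_transition_probs(lam: float, mu1: float, mu2: float):
    """Probabilities of an arrival, a completion at server 1, a completion at server 2.

    Assumes the uniformization rate is lam + mu with mu = mu1 + mu2.
    """
    total = lam + mu1 + mu2
    return lam / total, mu1 / total, mu2 / total


# Example with a hypothetical routing position J = 3 and empty resequencing queues.
J = 3
x = State(n=3, e1=0, e2=0, R=(0, (0, 0)), k=0)
both = P_b(x, J)
one_then_two = P_2(P_1(x, J), J)
two_then_one = P_1(P_2(x, J), J)
```

On this example the three results coincide, which is consistent with the remark that the order in $P_b$ is irrelevant.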
1440 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 36, NO. 12, DECEMBER 1991
1
an initial state at which a activates server 2 while leaving I X ( t >I ’
server 1 idle. By definition, server 2 is activated by the Jth IZ(t)I = forT+T2?;It<(pl/p2)T+~2;
customer from queue Q. We will show that a can be strictly 1 X ( t )I , otherwise.
improved.
To simplify notation we may assume without loss of For all other realizations in {T > TI}, iT mimics a’s
server. (From Lemma 3.1, it would be server 1.) where I,, --1,-e ,correspond to the customers in server 1
If { r < U } we couple at time 7,the residual service time and in the first ( J - 2) positions of queue Q. For R =
of customer 1 in X to the service time of that customer in -
(0, (0, ,0)) we fix the notation [O].
X . (Under these realizations both are exponential with pa- Lemma 3.3: There is a function h p ( R ) such that for
rameter p ,.) We also couple all other service requirements in every routing policy a whose decisions are independent of
both systems. From time 7 and on, .ii mimics a’s actions. R
By this coupling both systems start at time 7 in the same state
and therefore have the same state evolutions. Thus, for every
Proof: Let $x_0 = (n, e_1, e_2, R, k)$ and $\tilde{x}_0 = (n, e_1, e_2, [0], k)$ be two initial states, and $X$ and $\tilde{X}$ the processes that are governed by policy $\pi$ and start at $x_0$ and $\tilde{x}_0$, respectively. Since $\pi$ is independent of $R(t)$, we may couple the arrivals and service times in both systems. This is made possible by the same evolutions of $(n(t), e_1(t), e_2(t))$ and $(\tilde{n}(t), \tilde{e}_1(t), \tilde{e}_2(t))$. (Here, we use the tilde notation as in Section II.) There are two cases of $R$ that have to be considered.

Case i): Assume that $R = (0, (l_1, \ldots, l_{J-1}))$. Set $\tau_0 = \tilde{\tau}_0 = 0$, and for every $1 \le j \le J - 1$ let $\tau_j$ ($\tilde{\tau}_j$) be the instant that the customer present at time 0 in position $j$ leaves the system. By the coupling, $\tau_j$ and $\tilde{\tau}_j$ are identical.

Since $\pi$ routes from position $J$, the customers that are present at time 0 in the first $(J - 1)$ positions will be routed to server 1. Thus, $\tau_j$ is distributed as the sum of $j$ independent geometric r.v.'s with parameter $\mu_1$. By the definition of the resequencing delay, we therefore have

For this case the lemma follows by defining

$$h^\beta(R) = \sum_{i=1}^{J-1} l_i E\left[1 + \beta + \cdots + \beta^{\tau_i - 1}\right]. \qquad (10)$$

Case ii): Assume that $R = (l, (0, \ldots, 0))$. If $l = 0$ then the lemma is trivial. For $l > 0$, let $\tau$ be the instant that the customer present at time 0 in server 2 completes his service. Clearly, $\tau$ is geometrically distributed with parameter $\mu_2$. We have

$$|X(t)| = \begin{cases} |\tilde{X}(t)| + l, & \text{for } 0 \le t < \tau; \\ |\tilde{X}(t)|, & \text{for } t \ge \tau. \end{cases}$$

For this case the lemma follows by defining

$$h^\beta(R) = l E\left[1 + \beta + \cdots + \beta^{\tau - 1}\right]. \qquad (11)$$

From (10) and (11), the function

$$h^\beta(R) = l E\left[1 + \beta + \cdots + \beta^{\tau - 1}\right] + \sum_{i=1}^{J-1} l_i E\left[1 + \beta + \cdots + \beta^{\tau_i - 1}\right] \qquad (12)$$

satisfies (9). Here the expectations are taken with respect to the geometric r.v.'s which are clearly independent of $R$. □

The function $h^\beta(R)$ represents the accrued discounted cost that is contributed by the customers present at time 0 in the resequencing queues. For later references denote $h^\beta(k) \stackrel{\mathrm{def}}{=} h^\beta((0, (0, \ldots, 0, 1, 0, \ldots, 0)))$, where the 1 corresponds to position $k$. Observe that from (10)

$$h^\beta(k + 1) > h^\beta(k). \qquad (13)$$

By using Lemma 3.3 and the following value and policy iterations, we will show that the routing decisions of the $\beta$-optimal policy are independent of $R$.

Let $\mathcal{F}$ be the Banach space of all functions $f : S \to \mathbb{R}$ with the norm $\|\cdot\|$ defined by $\|f\| = \sup_{x \in S} |f(x) / \max\{1, |x|\}|$. From (6) and Lemma 3.2 we may define for every stationary policy $\pi$ the dynamic programming operator $T_\pi : \mathcal{F} \to \mathcal{F}$ by

$$(T_\pi f)(x) = |x| + \beta \left[ \lambda f(\pi(A(x))) + \mu_1 f(\pi(D_1(x))) + \mu_2 f(\pi(D_2(x))) \right] \qquad (14)$$

where $\pi(y) \in S$ is the state to which the process jumps from state $y$ after policy $\pi$ takes the action whether or not to route a customer to server 2 at state $y$. Also, define the optimal dynamic programming operator $T : \mathcal{F} \to \mathcal{F}$ by

$$(T f)(x) = |x| + \beta \left[ \lambda \min_\pi f(\pi(A(x))) + \mu_1 \min_\pi f(\pi(D_1(x))) + \mu_2 \min_\pi f(\pi(D_2(x))) \right]. \qquad (15)$$

The procedure by which a new value function is derived by using operator $T$ is known as value iteration, and that by which a new stationary policy is derived by using $T$ is known as policy iteration.

Theorem 3.1: The routing decisions of the $\beta$-optimal policy are independent of the state of the resequencing queues, $R$.

Proof: First, we show that if $\pi$'s decisions are independent of $R$, so are the decisions of the policy derived by the policy iteration $T V_\pi^\beta$. Then we show that the optimal policy preserves the same property. For every $f \in \mathcal{F}$, define

$$g_f(n, R) = f(n, 1, 0, R, 0) - f(n - 1, 1, 1, R, J - 1), \quad \text{for } n \ge J - 1. \qquad (17)$$

Let $\pi_0$ be a policy whose routing decisions are independent of $R$, and for every $m \ge 0$ define $\pi_{m+1}$ as the policy that is derived by the policy iteration $T V_{\pi_m}^\beta$. That is, $T_{\pi_{m+1}} V_{\pi_m}^\beta = T V_{\pi_m}^\beta$. From (15) and (17), $\pi_{m+1}(y)$ is either 0 or 1, depending on whether $g_{V_{\pi_m}^\beta}(n, R)$ is negative or nonnegative, respectively. From Lemma 3.3 it follows that if $\pi_m$'s decisions are independent of $R$, then $g_{V_{\pi_m}^\beta}(n, R) = g_{V_{\pi_m}^\beta}(n, [0])$, which implies that $\pi_{m+1}$'s decisions are also independent of $R$.

Since a limit point of $\{\pi_m\}$ does not necessarily exist, we cannot deduce the theorem by the policy iteration procedure. However, we can extract it by the value iteration procedure as follows. Consider the sign of $g_{V^\beta}$, where $V^\beta = \inf_\pi V_\pi^\beta$ is the $\beta$-value function.

Since $\pi_0$'s decisions are independent of $R$, it follows by the argument above that so are $\pi_m$'s decisions, and by Lemma 3.3, the sign of $g_{V_{\pi_m}^\beta}(n, R)$ is independent of $R$, $m \ge 0$. Since the $\lim_{m \to \infty} V_{\pi_m}^\beta$ exists and equals $V^\beta$ (see,
e.g., [4,Lemma 3]), the sign of gvs(n, R ) is also indepen- define the function
dent of R .
To conclude the proof, observe that the P-optimal policy y p ( k )= E [ 1+ p + . * .+ p z q
a*, is the solution to the optimality equations V B= W O . - E[1 + + * e * +PX(k)-'], k 2 1. (18)
Now, from (15), a * ( y ) = 1 if and only if g v s ( n , R ) =
def
g " P ( n , [O]) 2 0, and the solution is independent of R . 0 Note that for = 1, y ( k ) = yl(k) = E[Z,] - E[X(,,].
Theorem 3.1 would also apply to the optimal policy with The function yB(k ) represents the difference in the accrued
respect to the average cost, if one could guarantee the cost that is contributed by a customer present at time 0 in
following limits position k , under the two alternative routing policies above.
+
recalling that CY = pl /pl p,, we obtain by the forward
equations.
g* = lim (1 - Pk) v ~ ~ ( o ) y#) = CYEIPTo]yp(k- 1) - (1 - CY)
Pk-t 1
route from variable positions (with some restrictions), and X ( 2).From the identities for the rest of the departure times
show that if position k , k # J,, is feasible under a, then a
can be improved by one of the transformations above. V,P(n - 1 , 1 , 1 , R , k - 1) - V!(n - 1 , 1 , 1 , R , k )
Lemma 3.4: The following hold for every p, I0 < 1: = + I ) { E ~+ p ... +p7-11
(I,
i) if a routes customers to server 2 from positions larger
than or equal to k , k < J,, and k is a feasible position, then -E[1 + +p'-']}.
* e * (20)
T , ( l ,a), I 1 1, is at least as good as a; Thus, we have to show that the expression within the braces
ii) If a routes customers to server 2 from positions larger is nonnegative. To prove this, first note that customer k in
than or equal to k - 1, k > J,, and k is a feasible position, system 2 is routed to server 2, if and only if ( k 1) in X +
then there is a p2 < 1 such that Ti(1, a), 1 2 1, strictly is routed to server 2. Also note, that _since a routes from
improves a for every p2 5 < 1. positions k or higher, customer k in X would definitely be
Proofi The proof is based on a pathwise comparison o,f served by server 1, if this server would complete his first
the state process X under policy a to the state process X service before server 2 does. This event occurs with probabil-
under policy T i (a)(for part i)) or T; (a)(for part ii)). To ity PI I P l + P2.
compare realizations, we couple the arrivals and service Let To be as in (19) and T, be the number of steps after
completions in both systems. (Note that a service completion +
To that it takes to route customer ( k 1) in X to server 2
may correspond t,o different customers in X and X . ) The (and infinite, if he is routed to server 1). Denote by 7 the
process X and X are identical until step 1. Hence, for our conditional probability (conditioned on the state at time To)
pathwise comparison, we may assume without loss of gener- that { Tl < a}.By using the forward equations from time 0
ality that the processes start at step 1. Therefore it suffices to to To, it follows from the definitions of 2, and X(,) that
prove the lemma for T i (a) and T;(T).
Part i): Let x, = ( n ,1,0, R,O), n 2 k , be an initial
E[(1 + p ... +p7-1) - (1 +p . - * +p'-l)]
those that at time 0, are being delayed by him), would leave with probability one, we have
both systems at the same time. Furthermore, the departure
+
times of customer k 1 (and those that at time 0, are being E[(1 +p . - * +/3-1) - (1 +p ..* +pi-l)]
nIJ-2.
yO(J- 1) < 0 , p 2 < p < 1. (23)
Here, 1, = (I, (I,;**, Zk;-., Z,-,) = (0, (0, 1,
The proof is based on policy iteration and develops along the * * ,O)).
same lines as the proof in [4], with some changes that are From Lemma 3.5 one may show by successively using the
required from our different state space. Define a partial order operator Tt,, that Vt", also satisfies properties a)-f) of the
"- < " on the states, as follows. Recall that a state x is a lemma. Indeed, it is easy to construct in a recursive manner a
tuple x = ( n , e,, e2, R , k ) . We say that x i y , x,Y E S ,if function f, that satisfies properties a)-f). From the lemma it
at least one of the following conditions hold: def
follows that T:n+lfo= T t $ T t f o ) , n 2 1, also satisfies these
i) x = y (component-wise);
properties. Now, since limn+mT c fo = yt, we obtain the
ii) x = D , ( y ) ;
following corollary.
iii) x = D,(y);
Corollary 3.1: For every m 2 J , the P-discounted cost
iv) A ( x ) = y ;
function under policy t, satisfies properties a)-f) of Lemma
v) all components of x and y are equal except for one,
3.5
which is smaller in x .
The next lemma is the basis of our final result and its proof
vi) there is a Z E Ssuch that, x l z and ~ ( y .
is similar to that in [4, Lemma 41. The assumption J > J ,
For every f E ,F we also define the function:
and the property in (23) are crucial for reproducing the
proof. The lemma asserts that the new policy that is obtained
%(n, k )
from 5; by the policy iteration procedure, is also of thresh-
f(n-2,l,l,[O],k) -f(n-3,l91,[O],k), old type.
n 2 3 , O < k < m i n { n - 2 , J - I}; Lemma3.6: For every m,, 2 I J, < J 5 rn, < 00, there
+
exists an m,, J I rn, I m, 1, such that Ttm,5B_o =
f(O,l,l,[O],l) -f(o,0,1,[0],0)~ Tqo.
n=2, k=0. Proof: To prove the lemma we need to explore the
(24) properties of the function
k s J - 1 - V t , ( n - 2 , 1 , 1 , [ O ] , J - 2) > 0. (28)
g ( n ) > P k ( n + 1) + P a l g ( n - 1)
+ Pa2[v g n , 1705 [ 0 ] 4
- v g n - 1 , 1 , 0 , [o] + 1 , - , J q ]
2 PXg(n + 1) + Pa,g(n - 1)
+ Pa2[v g n , 190, [o] ,o) Again, from (28), the expression in the first braces is greater
than g ( n - 1). Furthermore, from Corollary 3.1 and prop-
- v ; p 1 , 1 , 1 , [ 0 ] , J - I)] erty a) of Lemma 3.5
= p X g ( n + 1) + P p , g ( n - 1) + P P 2 g ( n ) .
s ( n ) IPa& - 1). (31)
The last inequality follows from Corollary 3.1 and property Case iv): n 1 m , I3. (The policy tmoroutes a customer
+ +
a) of Lemma 3.5. Since X a, p2 = 1, we obtain for this at queue length n - 1 and above.)
case From (6)
(1 - P)g(.) - PX(g(. + 1) - g ( n) = Pp1 [ vqn - 2,171, [o] , J - 1)
IPa.l(g(n - 1) - g ( n ) ) ,
1 IJ - 1 In Im , - 2.
(29)
The same inequality is obtained for 1 In < J - 1, by
defining g ( n ) = Kt(n, 1, 0 , [OI, 0) - v[$n - 1, 1, 1,
[O], n), 1 5 n < J - P. From (28), the expression in the first braces is positive. From
Case i): n = m , - 2 > 1. (The policy tmoroutes a cus- Corollary 3.1 and property a) of Lemma 3.5, the expression
tomer at queue length n +
1, but does not route at queue in the second braces is also nonnegative. Thus
lengths n and below.)
From (6), the definition in (25), property e) of Lemma 3.5
and (28), we have
To complete the proof note that from (29)-(32), g ( n ) satis-
s(n) = P P I [ v;,(n - 1 , 1 , 0 , [o],o) fies the conditions of the corresponding function in [4, Eq.
(lo)]. As a consequence, the rest of the proof is identical to
-vp,,(n - 2,131, [o], J - 2)] the proof of [4,Lemma 41, and our lemma follows. 0
The assertion of the next theorem and its proof are identi-
+ Pa,[ Y p , 190, [ 0 ] , 0 ) cal to [4,Theorem 51. The proof applies the convergence of
the policy iteration to the 0-optimal policy.
- v;p - 1 , 1 , 0 , [o] + 1,-,,0)] Theorem 3.3: For every J > J, and P2 IP < 1:
i) there exists a stationary policy of threshold type, with
= PP& - 1) + PP2[ v;p, 190, [0],0) threshold m*(P) I00;
ii) if V $ X ) < v;+Jx), for some state x,then r n * ( ~ I
)
-v;p - 1 , 1 , 0 , [ 0 ] , 0 ) m.
In our final theorem we show by applying [5, Theorem 31,
+v;,(n - 1 , 1 , 0 , [ 0 ] , 0 ) that the optimal policy with respect to the average cost is of
threshold type. Here, we cannot reproduce the results from
- vp,,(n - 1 , 1 , 0 , [o] + l J - l , o ) ] [4, Section IV] since a close form for Vtmis intractable. We
will show instead, that Assumptions 1-5 of [5, Theorem 31
IPa,g(n - 1) + PP*(A'va,(" + 1) - h P ( J - 1))
hold for our problem. The main assumption requires the
following lemma.
1P p , g ( n - 1). (30)
Under every threshold policy t,, m Iw , define 7,
The last inequality follows from property c) of Lemma 3.5. (respectively, C,) as the number of steps (respectively, the
The same inequality is obtained for the case n = m , - 2 = accrued cost) until the first return to an empty system. Also,
1. Observe that the assumption J > J, > 2 and the require- let Ex(7,J and Ex(Cm)be their expected values given that
ment m , IJ, implies that m , > 3. the system starts at state x.
Case iii): n = m , - 1 I2. (The policy tmo routes a Lemma 3.7: If X < p , , then for every state x,
customer at queue length n and above.) SUP, E,(T,) < 03 and suprnE,(C,) < 00.
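The simplest instance behind Lemma 3.7 is the system in which server 2 never receives real customers, so that the number in system evolves as a uniformized M/M/1 queue with rates $\lambda$ and $\mu_1$. The Python sketch below is a hedged numerical check, not part of the paper's argument: it assumes the normalization $\lambda + \mu_1 + \mu_2 = 1$, counts dummy completions as steps that leave the state unchanged, and truncates the queue at a hypothetical level `n_max`. The function name and all parameter values are illustrative. Under these assumptions a first-step analysis gives the expected number of steps to empty the system from $n$ customers as $n/(\mu_1 - \lambda)$, which is finite exactly when $\lambda < \mu_1$.

```python
def expected_steps_to_empty(lam, mu1, mu2, n_max=60, sweeps=3000):
    """Approximate h[n], the expected number of uniformized steps needed to
    reach the empty state from n customers when only server 1 serves.

    Iterates the first-step equations
        h[n] = 1 + lam * h[n+1] + mu1 * h[n-1] + mu2 * h[n],  h[0] = 0,
    on the truncated state space {0, ..., n_max}.
    """
    assert abs(lam + mu1 + mu2 - 1.0) < 1e-12, "rates are assumed to sum to 1"
    assert lam < mu1, "stability condition of the lemma"
    h = [0.0] * (n_max + 1)
    for _ in range(sweeps):
        new = [0.0] * (n_max + 1)
        for n in range(1, n_max + 1):
            # Truncation assumption: arrivals are blocked at n_max.
            up = h[n + 1] if n < n_max else h[n]
            new[n] = 1.0 + lam * up + mu1 * h[n - 1] + mu2 * h[n]
        h = new
    return h


# Illustrative rates (hypothetical values, chosen so that lam < mu1).
lam, mu1, mu2 = 0.2, 0.5, 0.3
h = expected_steps_to_empty(lam, mu1, mu2)
closed_form_from_1 = 1.0 / (mu1 - lam)
closed_form_from_3 = 3.0 / (mu1 - lam)
```

The iteration starts from zero and increases toward the expected hitting times, so comparing `h[1]` and `h[3]` with the closed-form values gives a quick consistency check of the finiteness claim for this special case.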
Proof: For $m = \infty$, $\tau_\infty$ is distributed as the first return time to state 0 in an M/M/1 queue with arrival and service rates $\lambda$ and $\mu_1$, respectively. Since $\lambda < \mu_1$, $E_x(\tau_\infty) < \infty$. We will show that this implies a uniform bound on $E_x(\tau_m)$.

For every $m < \infty$, consider the systems that operate under $t_\infty$ (i.e., M/M/1) and under $t_m$. To compare their paths, we feed them with the same arrival process and couple the service completion times, either real or dummy, in both systems. (To clarify the coupling, imagine the servers producing completion events at rates $\mu_1$ and $\mu_2$, irrespective of whether or not a customer is being served. When a completion event occurs in a server that is serving a real customer, this customer would complete his service. Our coupling refers to these completion events, irrespective of the identities of the customers that are being served. It is quite clear that for exponential systems, this view is statistically the same as identifying the services with the customers.)

Under $t_\infty$, define $\sigma$ as the first instant that server 2 completes a dummy service immediately after the system becomes empty. That is, at time $\sigma - 1$ the system just became empty and the next jump was due to a service completion in server 2. Observe that from our coupling, at time $\sigma$, both systems are empty. Hence

$$\sup_m E_x(\tau_m) \le E_x(\sigma). \qquad (33)$$

From the renewal property of state 0 in an M/M/1 queue and that of the residual completion time, $\sigma$ can be represented as follows. Let $B_i$, $i \ge 1$, be the $i$th time that the system (under $t_\infty$) is empty, and $K$ be the number of returns to an empty system until server 2 completes a service immediately after the system becomes empty. We have

The finiteness of $E(\sigma^2)$ follows from the fact that all moments of $\tau_\infty$ and $K$ are finite. □

Remark 3.3: Since at most one customer could be served by server 2 at every instant, the same proof implies that the expected return time and accrued cost, given that $X(0) = x$, are uniformly bounded over all admissible policies.

Now we are ready for our final theorem.

Theorem 3.4: For every $J > J_0$, there exists a stationary policy of threshold type $t_{m^*}$, whose level is a limit point $m^* = \lim_{\beta_k \uparrow 1} m^*(\beta_k)$ ($m^*$ could be infinite).

Proof: We consider two cases.

Case i): $\mu_1 \le \lambda < \mu_1 + \mu_2$. The proof of this case is identical to the proof of the corresponding case in [4, Lemma 7, Theorem 8]. In this case, due to the instability of $t_\infty$, it is also shown that $m^* < \infty$.

Case ii): $\lambda < \mu_1$. In this case we apply Theorem 3 from [5]. Assumptions 1-3 there trivially hold in our problem. From Theorem 3.3, the policies $t_{m^*(\beta)}$ are $\beta$-optimal for every $\beta_2 \le \beta < 1$. Therefore, Lemma 3.7 implies that Assumptions 4 and 5 there also hold for a subsequence $\beta_k \uparrow 1$ for which $m^*(\beta_k) \to m^*$. Hence, our theorem is a direct consequence of [5, Theorem 3]. □

To combine this result with the results of Theorem 3.2 about the optimal position, let $t_{m^*(J)}$, $J > J_0$, be the optimal policy with respect to the average cost, given that the routing position is $J$. From the proof of Lemma 3.4, Part ii), it is clear that $t_{m^*(J_0+1)}$ is at least as good as $t_{m^*(J)}$. Hence, $t_{m^*(J_0+1)}$ is at least as good as any other fixed-position policy that routes customers from $J > J_0$. Furthermore, from Theorem 3.2, part b.1), the policy that routes customers whenever $t_{m^*(J_0+1)}$ does, but from position $J_0$, is at least as good as $t_{m^*(J_0+1)}$. Thus, the following corollary is
obtained.
0 = (e, + 1) + (e, + 1) + ... +(e, + 1). (34) Corollary 3.2: The policy t,$(Jo+l,is at least as good as
any other policy that routes from position J > J,.
Since at every step, the probability of a service completion at REFERENCES
server 2 is p, > 0, K is geometrically distributed with A. K. Agrawala, E. G . CoiEnan, Jr., M. R. Garey, and S. K.
parameter p,. Furthermore, O i , i 2 2 are i.i.d and indepen- Tripathi, “A stochastic optimization algorithm minimizing expected
flow times on uniform processors,” IEEE Trans. Computers, vol.
dent of K . By definition C-33, no. 4, pp. 351-356, Apr. 1984.
S. Ayoun, “Optimal control of a queueing system with two heteroge-
E @ , ) = Ex(7m) < 00; neous servers with resequencing,” M.S. thesis, Dep. Electr. Eng.,
Technion, Haifa, Israel, Feb. 1989.
E@,) < max { E o ( 7 m ) ; E1(7,)} < (35) I. Illiadis and Y. C. Lien, “Resequencing delay for a queueing system
with two heterogeneous servers under a threshold-type scheduling,”
IEEE Trans. Commun., vol. COM-36, pp. 692-702, 1988.
where the indexes in E,(.) correspond to states 0 and 1 in W. Lin and P. R. Kumar, “Optimal control of queueing systems with
the M / M / 1 queue. two heterogeneous servers,” IEEE Trans. Automat. Contr., vol.
From (33)-(35) and Wald’s lemma AC-84, pp. 696-703, Aug. 1984.
S. A. Lippman, “Semi-Markov decision processes with unbounded
rewards,” Management Sci., vol. 19, pp. 717-731, 1973.
supE,(~,) IEx(7,) + E ( K ) ( 1 + E ( 0 , ) ) < m. (36) -, “Applying a new device in the optimization of exponential
m queueing systems,” Operations Res., vol. 23, pp: 687-710, 1975.
R. L. Larsen, “Control of multiple exponential servers with applica-
tion to computer systems,” Ph.D. dissertation, Tech. Rep. 1041,
To prove that sup, E,(C,) < 03, denote by A ( t ) the Univ. Maryland, College Park, MD, 1981.
number of arrivals until time t. Given that the system under M. I. Reiman, “Optimal control of a heterogeneous two server queue
t,, m I03, starts at state x in light traffic,” AT&T Bell Lab., Murray Hill, NJ, 1989.
M. I. Reiman and B. Simon, “Open queueing systems in light
C, 5 1X I + u*A(a). traffic,” Math. Operations Res., 1989.
Z . Rosberg and A. Makowski, “Optimal routing to parallel heteroge-
neous servers-Small arrival rates,” IEEE Trans. Automat. Contr.,
From the Poisson arrivals vol. 35, pp. 789-796, July 1990.
M. Schal, “Conditions for optimality in dynamic programming and
for the limit of n-stage optimal policies to be optimal,” Z .
AYOUN AND ROSBERG: OPTIMAL ROUTING TO TWO PARALLEL HETEROGENEOUS SERVERS 1449
Warscheinlichhreitstheorie Verw. Gebiete, vol. 32, pp. 179-196, Zvi Rosberg received the B.Sc., M. A., and Ph.D.
1975. degrees from the Hebrew University, Jerusalem,
[I21 J. Walrand, “A note on the optimal control of a queueing system with Israel, in 1971, 1974, and 1978, respectively.
two heterogenec)us servers,” Syst. Contr. Lett., vol. 4,pp. 131-134, From 1972 to 1978 he was a Senior System
1984. Analyst in the General Computer Bureau of the
Israeli Government. From 1978 to 1979 he had a
Research Fellowship at the Center of Operation
Serge Ayoun received the B.Sc. and the M.Sc. Research and Econometrics (CORE), University of
degrees in computer engineering from the Tech- Louvain, Belgium. From 1979 to 1980, he was a
nion Institute of Technology, Haifa, Israel, in 1986 Visiting Assistant Professor at the University of
and 1989, respectively. Illinois. From 1980 to 1989, he was with the
Since 1989 he has been with IBM Israel Science Department of Computer Science, Technion, Israel. Since 1990, he has been
and Technology working in the area of image with IBM Israel, the Science and Technology Center. During 1985-1987 he
processing. was on leave at the IBM Thomas J. Watson Research Center, Yorktown
Heights, NY. His main research interest include probabilistic models of
communication networks and computer systems, performance evaluation,
queueing theory, and applied probability.