
1436 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 36, NO. 12, DECEMBER 1991

Optimal Routing to Two Parallel Heterogeneous Servers with Resequencing
Serge Ayoun and Zvi Rosberg

Abstract-Customers arrive to a single service queue according to a Poisson process with rate $\lambda$, from which they are routed to two parallel heterogeneous and exponential servers whose rates are $\mu_1 \ge \mu_2$. Customers are released from the system after service completion, according to their arrival order, a requirement introducing additional resequencing delays. Customers which are delayed due to resequencing are waiting in a resequencing queue. We consider the optimal routing problem under the class of fixed-position routing policies, that route customers to the faster server from the head of the service queue, and to the slower server from position $J$. The cost function is taken as the long-run average holding cost of the customers in the system. We show that an optimal stationary policy exists and is of the following type: the faster server is kept active as long as the service queue is not empty. The decision whether or not to route a customer to the slower server is independent of the state of the resequencing queue. If the position $J$ is greater than $J_0 = \lceil \ln(1 - \alpha)/\ln \alpha \rceil$, $\alpha = \mu_1/(\mu_1 + \mu_2)$, then customers are routed to the slower server if and only if the length of the service queue is at least $m^*$ (a threshold policy). We also show that the routing position $J_0$ is "optimal" in the sense that every policy can be improved by dispatching a customer from position $J_0$ (if not empty), rather than from position $J$.
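The position $J_0$ in the abstract depends only on the two service rates, so it can be evaluated directly. The following is a minimal sketch of that formula; the function name `preferred_position` and its argument names are hypothetical and chosen for illustration only.

```python
import math


def preferred_position(mu1: float, mu2: float) -> int:
    """Evaluate J_0 = ceil(ln(1 - alpha) / ln(alpha)), alpha = mu1 / (mu1 + mu2).

    mu1 is the rate of the faster server and mu2 the rate of the slower one.
    The name and signature are illustrative assumptions, not part of the paper.
    """
    if not (mu1 >= mu2 > 0):
        raise ValueError("expected mu1 >= mu2 > 0")
    alpha = mu1 / (mu1 + mu2)
    # Both logarithms are negative because 0 < 1 - alpha <= alpha < 1,
    # so the ratio is positive and at least 1.
    return math.ceil(math.log(1.0 - alpha) / math.log(alpha))
```

With equal rates the sketch returns 1 (the head of the line), and the value grows as the slower server becomes slower relative to the faster one; for example, `preferred_position(3.0, 1.0)` evaluates to 5.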
I. INTRODUCTION

In this paper we consider a queueing system (Fig. 1) which is composed of an infinite capacity queue $Q$ attended by two exponential servers operating at rates $\mu_1 > \mu_2$. Customers arrive into the system according to a Poisson process with rate $\lambda$, and are assigned consecutive integers which serve as their identifiers. Throughout we assume the stability condition $\lambda < \mu \stackrel{\text{def}}{=} \mu_1 + \mu_2$. Arriving customers join at the end of queue $Q$ and are routed to one of the servers according to some given routing policy (to be defined below). Customers in service cannot be rerouted.

In many applications of routing in communication networks, customers (messages) are released from the service system (the channel and receiver) according to the order of their arrivals. That is, customer $i$ is not released from the system unless he and all customers whose numbers are smaller than $i$ have finished their service. The waiting time of a customer that has completed his service, for the release of customers with lower sequence numbers, is referred to as resequencing delay. Note that resequencing delays are possible since servers are operating at different rates. Moreover, a routing policy may assign customers from an arbitrary position of the queue. Customers which are being delayed due to resequencing are waiting in one of two resequencing queues: R1 for customers which have been served by server 1, and R2 for those which have been served by server 2.

Fig. 1. Routing from position $J$ when $\mathbf{2}(t) = 0$ and $\mathbf{1}(t) \neq 0$.

Manuscript received February 23, 1990; revised May 16, 1991. Paper recommended by Associate Editor At Large, P. R. Kumar.
The authors are with IBM Israel, Science and Technology, Technion City, Haifa 32000, Israel.
IEEE Log Number 9103543.
0018-9286/91$01.00 © 1991 IEEE

The positions in queue $Q$ from which customers are being routed to the servers (which are perceived as two alternative routes) clearly affect the overall resequencing delays (see [3]). The optimal routing problem with variable positions turned out to be extremely difficult. Therefore, we restrict our attention to fixed-position routing policies which route customers to server 1 only from the head of queue $Q$, and to server 2 only from a fixed position $J$, $J \ge 2$. By position $J$ we mean the $J$th customer among those in server 1 and in queue $Q$. Beside tractability, this restriction is also motivated by the result in [2]. It has been shown there that if routing positions are allowed to vary in time, then under light and heavy loads one can take the optimal policy within the class
AYOUN AND ROSBERG: OPTIMAL ROUTING TO TWO PARALLEL HETEROGENEOUS SERVERS 1437

of fixed-position routings. Also, as it will become apparent, it is not optimal to keep server 1 idle if queue $Q$ is not empty, and therefore the requirement of $J \ge 2$ does not exclude the head of the line.

Let $X(t)$ be a tuple denoting the state of the system at time $t$ (to be defined below) and $|X(t)|$ be the number of customers in the system at that state. A routing policy $\pi$ is any rule that at every time $t \ge 0$ decides, on the basis of past states and of past decisions up to time $t$, which idle servers to activate. Policies may leave a server idle even when there is a customer in the corresponding position.

With a holding cost accrued at a fixed rate of 1, the long-run average cost associated with the policy $\pi$ is then defined by

$$J_\pi(x) \stackrel{\text{def}}{=} \limsup_{T \to \infty} \frac{1}{T} E_x^\pi\left[\int_0^T |X(t)|\,dt\right], \quad \text{for every state } x \tag{1}$$

where $E_x^\pi[\cdot]$ denotes the expectations with respect to the probability measure induced by the policy $\pi$ on the process $X = \{X(t), t \ge 0\}$ starting in state $x$. A routing policy $\pi^*$ is optimal if it minimizes (1), i.e., if

$$J_{\pi^*}(x) \le J_\pi(x)$$

for every policy $\pi$ and state $x$.

For the exponential system considered here, the optimization problem associated with (1) falls within the purview of continuous-time Markov decision processes which are uniformizable, i.e., which are equivalent to uniformized discrete-time Markov decision processes [6]. The reader is referred for details to [4], where the same problem without resequencing delays is studied. To define the discrete-time decision process, consider that at any given instant, each server is working either on a real customer, if activated, or on a dummy customer otherwise. Dummy customers always return to queue $Q$ upon completing service and incur no contribution to the cost. Arrivals, and service completions at one of the servers by a customer (either real or dummy), determine the free transitions. These free transitions occur according to a Poisson process of rate $\lambda + \mu$. A (free) transition due to an arrival occurs with probability $\lambda/(\lambda + \mu)$, whereas a transition due to a service completion at server $i$ occurs with probability $\mu_i/(\lambda + \mu)$. If in state $x$ before a transition, the process will jump after this transition to a state which depends on the current state $x$ and on the action taken under the policy $\pi$ in use. The cost function for using policy $\pi$ which corresponds to (1) is then given by

$$J_\pi(x) \stackrel{\text{def}}{=} \limsup_{N \to \infty} \frac{1}{N} E_x^\pi\left[\sum_{m=0}^{N} |X(m)|\right], \quad x \in S \tag{2}$$

where $X(m)$ now denotes the state sampled at the $m$th transition. We also need the total $\beta$-discounted cost ($0 < \beta < 1$) associated with the policy $\pi$, which is defined by

$$V_\beta^\pi(x) \stackrel{\text{def}}{=} E_x^\pi\left[\sum_{m=0}^{\infty} \beta^m |X(m)|\right], \quad x \in S. \tag{3}$$

The complex structure of the state space of $X$ (see Section II) results in a complex class of stationary policies. A simpler subclass are the policies whose decisions are functions of the length of queue $Q$ only. This subclass will be referred to as the resequencing-invariant class. A further simpler subclass are the threshold policies.

A policy $t_m$ is a threshold policy with level $m \ge J$ if i) the first customer in queue $Q$ is routed to server 1 whenever that server becomes free; ii) the customer from position $J$ is routed to server 2 when and only when that server is free and the number of customers in server 1 and queue $Q$ is at least $m$.

One result of this study is that the optimal policy can be taken within the resequencing-invariant class. Another result is that for a certain range of positions $J$, the optimal policy can be taken within the threshold class. We also show that there is a preferable routing position $J_0$.

For the routing problem without resequencing delays, the routing position $J$ is irrelevant since service requirements are identically distributed. This problem was first studied in [7], where it was conjectured that the optimal policy would be of threshold type. In [1], a version of the problem with $N$ servers was considered under the assumptions that the system has an initial load of $n$ customers and no new customers enter the system, i.e., $\lambda = 0$. A simple policy which minimizes the expected flow time has been determined. This optimal policy has the following simple form [1]: for $1 < j \le N$, set

[Equation (4), which defines the thresholds $R_j$, is illegible in the source.] (4)

and define $R_1 = 0$. If there are $n$ customers that remain unprocessed and server $j$ is the fastest server available (i.e., with the largest $\mu_j$), then the idle server $j$ is activated, and a customer dispatched to it, if and only if $n > R_j$.

The conjecture from [7] on the threshold form of the optimal policy was settled in the affirmative in [4] for $N = 2$. Using policy iteration, it has been shown that the optimal policy is of threshold type with threshold level $R(\lambda)$ (which depends on $\lambda$). It was also conjectured there that as $\lambda \downarrow 0$, $R(\lambda)$ increases and converges to $R_2$ given by (4). In [12], simple stochastic coupling arguments were used to prove the optimality of the threshold policy for $N = 2$. Motivated by the conjecture made in [4], it has been shown in [10] (for a general number of parallel servers) and in [8] (for two servers) that the threshold policy above for $\lambda = 0$ is also optimal for small enough values of the arrival rate $\lambda$.

In light of the results above, one is naturally led to explore the idea that when resequencing delays are introduced, the optimal policy would also be of threshold type. We settle this question in the affirmative only for $J > J_0$.

The issue of resequencing delays in this context was first introduced in [3], where queueing statistics have been evaluated under the class of fixed-position threshold policies. It has been further shown there that for a given threshold level $m$, there is an optimal position $J^*$ from which one should route customers to server 2. This position is given by

$$J^* = \begin{cases} m, & \text{if } m < J_0; \\ J_0, & \text{otherwise.} \end{cases} \tag{5}$$

In other words, when a customer has to be routed to server 2 according to the threshold policy $t_m$, then the best fixed position is the nearest to $J_0$. This property of $J_0$ will be referred to as its "optimality property."

Reviewing the optimality property of $J_0$ for a threshold policy, and considering the fact that threshold policies may not necessarily be optimal, we are intrigued by another question: whether $J_0$ has the optimality property for a more general class of policies. We will show that this is indeed the case.

The paper is organized as follows. In Section II, we define the state space and the transitions under fixed-position routings. Section III is subdivided into two parts. In Section III-A, we show that the faster server should be kept active as long as the service queue is not empty. In Section III-B, which is further subdivided, we consider the optimal control of the slower server. In Section III-B-1), we show that the optimal control is independent of the state of the resequencing queues. In Section III-B-2), we show the "optimality property" of position $J_0$, and in Section III-B-3) we show that for $J > J_0$, the optimal policy is of threshold type.

II. THE STATE PROCESS: DEFINITIONS AND BASIC RESULTS

In this section, we define the states and the transitions of the Markov decision process that describes our routing problem and examine its state evolution.

A. States and Transitions

We start with the state definition. After every transition $t$, $t = 0, 1, \ldots$, in the discrete-time decision process, let $n(t)$ denote the number of customers in queue $Q$, and $e_i(t)$, $i = 1, 2$, denote the state of server $i$ (with the understanding that $e_i(t) = 1$ if server $i$ is busy, and $e_i(t) = 0$ otherwise). To describe the resequencing queues R1 and R2 we need the following notion.

We say that customer $i$ in a resequencing queue is being delayed by customer $k_i$, if:
i) customer $k_i$ did not finish service;
ii) $k_i < i$;
iii) $k_i$ is the maximal $k$ that satisfies i) and ii).
Thus, customer $i$ is released immediately after the service completion of customer $k_i$.

Let $\ell(t)$ be the number of customers in queue R1 (after the $t$th transition) that are being delayed by the customer which is being served by server 2. Here, $\ell(t) = 0$ if $e_2(t) = 0$. Also (see Fig. 1), denote by $i_1(t) < i_2(t) < \cdots < i_{J-1}(t)$ the $J - 1$ customers with the lowest sequence numbers among those in queue $Q$ and server 1 after the $t$th transition. The number of customers in queue R2 that are being delayed by customer $i_m(t)$, $1 \le m \le J - 1$, is denoted by $\ell_m(t)$.

Observe that the customers in R1 can be delayed only by the customer which is being served by server 2, and those in R2 by one of the customers in $\{i_m(t) \mid 1 \le m \le J - 1\}$. (These are formally proven in Section II-B below.)

The lengths of the resequencing queues are determined by the tuple

$$R(t) = (\ell(t), (\ell_1(t), \ldots, \ell_{J-1}(t)))$$

which will be referred to as the state of the resequencing queues. Finally, let $k(t)$ be the highest position of the customers in $\{i_m(t) \mid 1 \le m \le J - 1\}$ that would delay the customer being served by server 2 during the $t$th transition, if he completes his service immediately. If there is no such customer in $\{i_m(t) \mid 1 \le m \le J - 1\}$, or if server 2 is idle, then $k(t) = 0$.

The variable $X(t) = (n(t), e_1(t), e_2(t), R(t), k(t))$ is a natural state variable that may assume values in $S = \mathcal{N} \times \{0, 1\}^2 \times \mathcal{N}^J \times \{0, 1, \ldots, J - 1\}$, where $\mathcal{N} = \{0, 1, \ldots\}$.

To describe the transitions of the process $X$ it is useful to define the transformations

$$A, D_1, D_2: S \to S$$

that describe the states to which the process will jump from state $x$, when a free transition occurs. These transformations correspond to an arrival, a service completion at server 1, and a service completion at server 2, respectively. For the formal definition we need the following notations.

A state $x \in S$ stands for a tuple $x = (n, e_1, e_2, R, k)$, where $R = (\ell, (\ell_1, \ldots, \ell_{J-1}))$, with the understanding that $\ell_m$ customers in queue R2, $1 \le m \le J - 1$, are being delayed by customer $i_m$. For every $0 \le k \le J - 1$ and $e_2 \in \{0, 1\}$ denote

[The displayed definitions of the transformations $S_k^1$ and $S_k^2$ are illegible in the source; the only surviving fragments are the case conditions "if $k = 0$ and $e_2 = 1$" and "if $k = 0$".]

The transformation $S_k^1$ defines the state that queues R1 and R2 would jump to from state $x$, when server 1 would complete service of a real customer. Observe that by definition, if $k = 0$ and $e_2 = 1$, then the customer that is being served by server 2 is the "oldest" in the system. Otherwise, the customer that is being served by server 1 is the "oldest." Thus, if $k > 0$ or $e_2 = 0$, when server 1 would complete service of a real customer, this customer and those in queue R2 which are being delayed by him would leave the system. In this case we necessarily have $\ell = 0$. If $k = 0$ and $e_2 = 1$, we necessarily have $R = (\ell, (0, \ldots, 0))$, and the customer that would finish service in server 1 would join queue R1. (These observations are proven in the next section.)

The transformation $S_k^2$ defines the state that queues R1 and R2 would jump to from state $x$, when server 2 would complete service of a real customer. Recall that for $k = 0$ and $e_2 = 1$ we necessarily have $R = (\ell, (0, \ldots, 0))$. Therefore, when the customer that is being served by server 2 would finish service, he and the customers in queue R1 would leave the system. If $k > 0$ and $e_2 = 1$, then the customer that would finish service in server 2 would join queue R2 and would be delayed by customer $i_k$.
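The state tuple $(n, e_1, e_2, R, k)$ described in this subsection can be represented as a small record. The sketch below is illustrative only: the class name `State`, the field names, and the helper `in_state_space` are hypothetical, and the `size` method assumes that $|x|$ counts the customers in queue $Q$, in service, and in both resequencing queues.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    n: int               # customers waiting in queue Q
    e1: int              # 1 if server 1 is busy, 0 otherwise
    e2: int              # 1 if server 2 is busy, 0 otherwise
    l: int               # customers in R1 delayed by the customer at server 2
    ls: Tuple[int, ...]  # (l_1, ..., l_{J-1}): customers in R2 delayed by i_1, ..., i_{J-1}
    k: int               # highest position that would delay the customer at server 2, 0 if none

    def size(self) -> int:
        """Number of customers in the system, |x| (assumed to include both resequencing queues)."""
        return self.n + self.e1 + self.e2 + self.l + sum(self.ls)


def in_state_space(x: State, J: int) -> bool:
    """Check the ranges and the conventions stated in the text for position J."""
    ranges_ok = (
        x.n >= 0
        and x.e1 in (0, 1)
        and x.e2 in (0, 1)
        and x.l >= 0
        and len(x.ls) == J - 1
        and all(v >= 0 for v in x.ls)
        and 0 <= x.k <= J - 1
    )
    # Conventions from the text: l = 0 and k = 0 whenever server 2 is idle.
    idle_ok = x.e2 == 1 or (x.l == 0 and x.k == 0)
    return ranges_ok and idle_ok
```

A frozen dataclass is used so that states are hashable and can serve as dictionary keys, for instance when tabulating a value function over a truncated state space.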

Now, the free transitions of process $X$ from state $x \in S$ (when no routings are made) are as follows.

$$A(x) = (n + 1, e_1, e_2, R, k),$$

[The displayed definition of $D_1(x)$ and the first case ($e_2 = 0$) of $D_2(x)$ are illegible in the source.]

$$D_2(x) = (n, e_1, 0, S_k^2(R), 0), \quad \text{if } e_2 = 1,$$

where $x^+ = \max\{0, x\}$. The probabilities that a free transition $A(x)$, $D_1(x)$, or $D_2(x)$ occurs are $\lambda/(\lambda + \mu)$, $\mu_1/(\lambda + \mu)$, and $\mu_2/(\lambda + \mu)$, respectively.

Here, it is convenient to identify a stationary policy $\pi$ with a function $\pi: S \to \{P_h, P_1, P_2, P_b\}$ as follows. Assume that a free transition, either an arrival or a service completion, occurs that would make the state jump to $x \in S$ if no action were taken. The policy $\pi$ uses at state $x$ an operator $P_a$, $a \in \{h, 1, 2, b\}$, that makes the state jump instantaneously from $x$ to $P_a(x)$, where

$$P_h(x) = x;$$
$$P_1(n, 0, e_2, R, k) = (n - 1, 1, e_2, R, k), \quad n \ge 1;$$
$$P_2(n, e_1, 0, R, 0) = (n - 1, e_1, 1, R, J - 1), \quad n \ge 1;$$
$$P_b(n, 0, 0, R, 0) = (n - 2, 1, 1, R, J - 1), \quad n \ge 2.$$

The operator $P_h$ does not route any customers, $P_1$ routes the customer from the head of the queue to server 1, $P_2$ routes the customer from position $J$ to server 2, and $P_b$ does $P_1$ and $P_2$. (Notice that from the way we define the position $J$, the order in $P_b$ is irrelevant.)

B. Basic Results

Since the cost function is linear in the state variable and the total number of customers in the system changes by at most one at every transition, it is well known that an optimal policy exists for the $\beta$-discounted problem (associated with (3)), and that it can be taken in the class of Markov stationary policies [11]. One of the conclusions of this study is that the exact same result also holds for the long-run average cost criterion (2). Furthermore, for every stationary policy $\pi$, the limit in (2) exists and is independent of the initial state $x$.

Without loss of generality we may assume that $\lambda + \mu = 1$. Under any stationary policy $\pi$, the forward equations of $V_\beta^\pi(x)$ are

$$V_\beta^\pi(x) = |x| + \beta\left[\lambda V_\beta^\pi(\pi(A(x))) + \mu_1 V_\beta^\pi(\pi(D_1(x))) + \mu_2 V_\beta^\pi(\pi(D_2(x)))\right] \tag{6}$$

where $\pi(y) \in \{P_h(y), P_1(y), P_2(y), P_b(y)\}$.

In the following lemmas we present some basic properties of the state evolution. The first lemma resolves the order among the customers at any instant.

Denote (see Fig. 1):

$r_s^1(t)$: The customer (i.e., its sequence number) in the $s$th position in queue R1 at the $t$th transition (time $t$).
$r_s^2(t)$: The customer in the $s$th position in queue R2 at time $t$.
$q_s(t)$: The customer in the $s$th position in queue $Q$ at time $t$.
$n_1(t)$: The number of customers in queue R1 at time $t$.
$n_2(t)$: The number of customers in queue R2 at time $t$.
$\mathbf{1}(t)$: The customer which is being served by server 1 at time $t$, or 0 if the server is idle.
$\mathbf{2}(t)$: The customer which is being served by server 2 at time $t$, or 0 if the server is idle.

Lemma 2.1: At every time $t$ and for every occupied positions $s$ and $p$, or, respectively, 1 and $J$, in the corresponding queues
a) $q_p(t) < q_s(t)$, for $p < s$;
b) $r_p^1(t) < r_s^1(t)$, for $p < s$;
c) $r_p^2(t) < r_s^2(t)$, for $p < s$;
d) $q_s(t') \le q_s(t)$, for $t' < t$;
e) $r_s^1(t) < \mathbf{1}(t) < q_1(t)$ for $\mathbf{1}(t) > 0$, and $r_s^1(t) < q_1(t)$ for $\mathbf{1}(t) = 0$;
f) $r_s^2(t) < \mathbf{2}(t) < q_J(t)$ for $\mathbf{2}(t) > 0$, and $r_s^2(t) < q_J(t)$ for $\mathbf{2}(t) = 0$;
g) $\mathbf{2}(t) < r_s^1(t)$;
h) There exists a $p$, $1 \le p \le J - 2$, such that $\mathbf{1}(t) < r_s^2(t)$ or $q_p(t) < r_s^2(t)$.

Proof: Properties a)-f) are direct consequences of the facts that customers join at the end of the queues and are being dispatched from fixed positions.

Property g): Customer $r_s^1(t)$ is being delayed by a lower customer. From properties a), b), and e), it could only be customer $\mathbf{2}(t)$. Thus, $\mathbf{2}(t) < r_s^1(t)$.

Property h): Similarly, for customer $r_s^2(t)$. From properties f) and a) it could only be one of the customers in $\{\mathbf{1}(t), q_1(t), \ldots, q_{J-2}(t)\}$. □

In the next lemma we show that the two resequencing queues cannot be nonempty at the same time.

Lemma 2.2: At every time $t$, at least one of the queues R1 or R2 is empty.

Proof: Suppose that $n_1(t) > 0$ and $n_2(t) > 0$ for some $t$. As in the proof of Lemma 2.1 g), $n_1(t) > 0$ and a), b), and e) of Lemma 2.1 imply that $\mathbf{2}(t) > 0$. Hence, from Lemma 2.1 f) and g)

$$r_s^2(t) < \mathbf{2}(t) < r_p^1(t). \tag{7}$$

However, from part e) of the Lemma, $r_p^1(t) < \mathbf{1}(t) < q_1(t)$, and from part h), $r_p^1(t) < r_s^2(t)$, which is in contradiction with (7). □

The following lemma asserts that at every time $t$, any customer in queue R1 is being delayed by customer $\mathbf{2}(t)$. Furthermore, any customer in queue R2 is being delayed by one of the customers in $\{\mathbf{1}(t), q_1(t), \ldots, q_{J-1}(t)\}$.

For every $t$ denote by $I_1(t), I_2(t), \ldots, I_J(t)$ the sets of customers in queue R2 that are being delayed by customers $\mathbf{1}(t), q_1(t), \ldots, q_{J-1}(t)$, respectively, and by $I(t)$ the set of customers in queue R1 that are being delayed by customer $\mathbf{2}(t)$.
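The arrival transformation $A$ and the routing operators $P_h$, $P_1$, $P_2$, $P_b$ of Section II-A act on the state tuple in a purely mechanical way, which the following sketch tries to make concrete. It is a minimal illustration under stated assumptions: states are plain tuples `(n, e1, e2, R, k)`, the function names are hypothetical, and the guards reproduce only the side conditions printed with the operators ($n \ge 1$ or $n \ge 2$ and the required idle servers); whether position $J$ is actually occupied is not checked here.

```python
def arrival(x):
    """A(x): one more customer joins queue Q."""
    n, e1, e2, R, k = x
    return (n + 1, e1, e2, R, k)


def p_h(x):
    """P_h: hold, i.e. route nobody."""
    return x


def p_1(x):
    """P_1: route the head of queue Q to an idle server 1."""
    n, e1, e2, R, k = x
    if e1 != 0 or n < 1:
        raise ValueError("P_1 needs an idle server 1 and n >= 1")
    return (n - 1, 1, e2, R, k)


def p_2(x, J):
    """P_2: route the customer at position J to an idle server 2; k becomes J - 1."""
    n, e1, e2, R, k = x
    if e2 != 0 or k != 0 or n < 1:
        raise ValueError("P_2 needs an idle server 2 (with k = 0) and n >= 1")
    return (n - 1, e1, 1, R, J - 1)


def p_b(x, J):
    """P_b: apply both P_1 and P_2."""
    n, e1, e2, R, k = x
    if e1 != 0 or e2 != 0 or k != 0 or n < 2:
        raise ValueError("P_b needs both servers idle (with k = 0) and n >= 2")
    return (n - 2, 1, 1, R, J - 1)
```

In this sketch, composing `p_1` and `p_2` in either order gives the same state as `p_b`, in line with the remark that the order in $P_b$ is irrelevant.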

Lemma 2.3:
a) If queue R1 is not empty at time $t$, then $e_2(t) = 1$ and at the next service completion of server 2, queue R1 will become empty.
b) Any customer in queue R2 is being delayed by one of the customers in $\{\mathbf{1}(t), q_1(t), \ldots, q_{J-1}(t)\}$.
c) The customers in every set $I_p(t)$ are consecutive, and those in $I_p(t)$ are smaller than those in $I_{p+1}(t)$.

Proof:
a) From Lemma 2.2, queue R2 is empty. From Lemma 2.1 b) and e), all customers in queue R1 are smaller than those in queue $Q$ and server 1. Thus, they could be delayed only by customer $\mathbf{2}(t)$.
b) If queue R2 is not empty, then by Lemma 2.2, queue R1 is empty. Thus, the customers in queue R2 could be delayed only by those in queue $Q$ and in server 1. The result now follows from parts a) and f) of Lemma 2.1.
c) The result is an immediate consequence of the fact that for $2 \le m \le J$, $q \in I_m(t)$ if and only if $i_m(t) < q < i_{m+1}(t)$ (see Fig. 1 for notation clarification). Similarly, for $q \in I_1(t)$. □

Remark 2.1: If $\mathbf{1}(t) = 0$ then $I_1(t) = \emptyset$. Otherwise, $I_J = \emptyset$. That is, there are at most $J - 1$ nonempty sets from $\{I_1(t), I_2(t), \ldots, I_J(t)\}$.

III. OPTIMAL ROUTING

In this section, we consider the $\beta$-discounted and the average-cost Markov decision processes. The optimal control is split into two parts: routing to the faster server and routing to the slower server. In Section III-A, we show by probabilistic arguments that the faster server should be utilized as long as queue $Q$ is not empty. In Section III-B, which is further subdivided, we consider the optimal control of the slower server. In Section III-B-1), we show that the optimal control is independent of the state of the resequencing queues. In Section III-B-2), we show the "optimality property" of position $J_0$, and in Section III-B-3) we show that for $J > J_0$, the optimal policy is of threshold type.

A. Routing to the Faster Server

In this section we use arguments similar to those presented in [12] in order to show that server 1 is kept active if queue $Q$ is not empty.

To fix the notation, all the proofs in this section are based on pathwise comparison arguments between an original state process $X$ under a given policy $\pi$, and another state process $\tilde{X}$ under policy $\tilde{\pi}$ derived from $\pi$. The latter is referred to as the tilde system, and we use a tilde to denote all relevant quantities in the tilde system.

Lemma 3.1: For every $0 < \beta < 1$, the $\beta$-optimal policy has the property that whenever it activates a server, it activates the fastest available one.

Proof: Let $\pi$ be any given policy and let $X(0) = x$ be an initial state at which $\pi$ activates server 2 while leaving server 1 idle. By definition, server 2 is activated by the $J$th customer from queue $Q$. We will show that $\pi$ can be strictly improved.

To simplify notation we may assume without loss of generality that the customers in queue $Q$ have consecutive numbers starting from 1. (This is possible since only the order among them determines their departure times from the system. Also, from the state definition of the resequencing queues, this assumption does not change the system state.)

Define a policy $\tilde{\pi}$ and a corresponding process $\tilde{X}$ as follows. The initial state is $\tilde{X}(0) = X(0)$, and at time 0, $\tilde{\pi}$ takes the same action as $\pi$, except that it activates server 1 (with customer number 1) instead of server 2. From then on, the realizations of $X$ and $\tilde{X}$ are coupled. This is done by feeding both systems with the same arrival process and assuming that the first service time at server 2 in $X$ equals $T_2 = (\mu_1/\mu_2)\tilde{T}_1$. (Here, $T_j$ is the service time of a customer at server $j$.) Observe that this coupling is made possible by the fact that $\tilde{T}_1$ is exponentially distributed with parameter $\mu_1$ and therefore $T_2$ is exponentially distributed with parameter $\mu_2$.

After time 0, policy $\tilde{\pi}$ mimics the actions of policy $\pi$ (activates the same servers by customers from the appropriate positions) with one exception:

i) Let $\tau$ be the first time that $\pi$ activates server 1. If $\tau < \tilde{T}_1$, then $\tilde{\pi}$ activates server 2 at time $\tau$ (instead of 1) with the customer from the $J$th position. Since server 1 is busy at that time, this is customer number $J$. (Observe that in the tilde system, server 2 is available at time $\tau$ since $\tau < \tilde{T}_1 < T_2$, which in turn implies that server 2 has not been activated under $\pi$ until time $\tau$, and therefore neither under $\tilde{\pi}$. Also, $\tilde{\pi}$ is feasible since $\pi$ and the realizations of all r.v.'s in $X$ are known in $\tilde{X}$.)

For all realizations where i) occurs we reach at time $\tau$, in both systems, states which are the same except for the following. In $\tilde{X}$, the customer at server 1 (customer number 1) has been given some service while in $X$ he has not. The converse holds for the customer at server 2 (whose number is $J$ in both systems). To continue the coupling observe that at time $\tau$, the residual service time of customer 1 in $\tilde{X}$ is exponentially distributed with parameter $\mu_1$, which is the same as his service time in $X$. Moreover, the condition $\{\tilde{T}_1 > \tau\}$ implies the condition $\{T_2 > (\mu_1/\mu_2)\tau\}$ and therefore the residual service time of customer $J$ in $X$ from time $(\mu_1/\mu_2)\tau$ ($> \tau$) is exponentially distributed with parameter $\mu_2$.

Hence, we can couple the residual service time of customer 1 in $\tilde{X}$ from time $\tau$ with his service time in $X$ which starts at time $\tau$. Furthermore, we can also couple the residual service time of $J$ in $X$ from time $(\mu_1/\mu_2)\tau$ ($> \tau$) with $\tilde{T}_2$, the service time of $J$ in $\tilde{X}$ from time $\tau$. The latter implies that customer $J$ completes service in $X$ at time $(\mu_1/\mu_2)\tau + \tilde{T}_2$, while in $\tilde{X}$ at time $\tau + \tilde{T}_2$. After time $\tau$, $\tilde{\pi}$ continues to mimic $\pi$'s actions. From the coupling above and the definition of $\tau$, it is clear that this is feasible. Hence, for all realizations in $X$ where i) occurs we have

[The value of $|\tilde{X}(t)|$ in the first case of this display is illegible in the source. The first case applies for $\tau + \tilde{T}_2 \le t < (\mu_1/\mu_2)\tau + \tilde{T}_2$; in the second case, $|\tilde{X}(t)| = |X(t)|$ otherwise.]

For all other realizations in $\{\tau > \tilde{T}_1\}$, $\tilde{\pi}$ mimics $\pi$'s

actions and we obtain

[Display (8), a chain of two equalities followed by an inequality, is illegible in the source.] (8)

The first equality is straightforward. The second equality follows from the fact that customer 1 leaves $\tilde{X}$ at time $\tilde{T}_1$, while no one is leaving $X$ before time $\tau$ (when customer number 1 starts his service). Note that even if customer $J$ had finished his service before $\tau$, he would be delayed by customer 1. The last inequality in (8) is based on the two following observations. From the definition of the fixed-position routing and the fact that at time $\tau$ server 1 is idle in $\tilde{X}$ (after customer 1 departs) we have the following.

i) Whenever $\pi$ routes customer number $i$, $1 \le i \le J - 1$, $\tilde{\pi}$ routes customer number $i + 1$. For customer numbers $i$, $i \ge J + 1$, both policies route the same customers.

ii) From time $\tau$ and on, all service completions (except for the first completion time of customer $J$ at server 2 in $X$) are coupled in both systems.

From i) and ii), when customer $i$, $1 \le i \le J - 1$, completes service in $X$, so does customer $i + 1$ in $\tilde{X}$. Therefore, since customer 1 had left $\tilde{X}$ before time $\tau$, customers $\{1, \ldots, J\}$ and those in R1 and R2 that have been delayed by them had left $\tilde{X}$ earlier than they left $X$. This holds also for all customers $i$, $i \ge J + 1$, and the last inequality is satisfied.

Since the discount factor is less than 1 and the event where $|\tilde{X}(t)| < |X(t)|$ occurs with positive probability, it follows that $\tilde{\pi}$ strictly improves $\pi$. □

Hereafter, we may restrict attention to policies with the property given in Lemma 3.1. Another property of the $\beta$-optimal policies is given in the following Lemma.

Lemma 3.2: For every $0 < \beta < 1$, the $\beta$-optimal policy keeps server 1 active if queue $Q$ is not empty.

Proof: Let $X(0) = x$ be an initial state such that queue $Q$ is not empty at time 0 and $e_1(0) = 0$, and let $\pi$ be a policy that does not activate server 1 at that state. We show that $\pi$ can be strictly improved.

From Lemma 3.1 we may assume that $\pi$ does not activate any server at time 0 (otherwise it is not optimal and we are done). Define the following policy $\tilde{\pi}$ and its corresponding state process $\tilde{X}$ which starts at the same initial state $\tilde{X}(0) = X(0)$, and where all arrivals are coupled to those in the original system. Let $U$ be an exponential r.v. with parameter $\mu_1$ which is independent of anything else in the systems. At time 0, $\tilde{\pi}$ routes customer 1 to server 1 and his service time is taken as $U$. Also, let $\tau$ be the first time that $\pi$ activates a server. (From Lemma 3.1, it would be server 1.)

If $\{\tau < U\}$ we couple at time $\tau$ the residual service time of customer 1 in $\tilde{X}$ to the service time of that customer in $X$. (Under these realizations both are exponential with parameter $\mu_1$.) We also couple all other service requirements in both systems. From time $\tau$ and on, $\tilde{\pi}$ mimics $\pi$'s actions. By this coupling both systems start at time $\tau$ in the same state and therefore have the same state evolutions. Thus, for every realization where $\{\tau < U\}$ we have $|\tilde{X}(t)| = |X(t)|$ for every $t$.

If $\{\tau > U\}$ then by time $\tau^-$, customer 1 and those that had been delayed by him left the tilde system. In the original system they are still in the system at that time. In this case, the service coupling is done by coupling the services which are given by the servers in both systems, irrespective of the customer identities. Again, from time $\tau$ and on, $\tilde{\pi}$ mimics $\pi$'s server activations with possibly one dummy activation.

The following is a consequence of the different states at time $\tau^-$, and the definitions of the fixed-position routing, $\tilde{\pi}$, and the resequencing delay. If at any given time after $\tau$, customer $i$ and those that have been delayed by him leave $X$, then customers $\{1, \ldots, i - 1\}$ and those that have been delayed by them had left $X$ earlier. Therefore, by the definition of $\tilde{\pi}$, customers $\{1, \ldots, i\}$ and possibly customer $i + 1$ and those that have been delayed by them had left $\tilde{X}$ by that time. Thus, $|\tilde{X}(t)| \le |X(t)| - 1$ for $U < t \le \tau$, and $|\tilde{X}(t)| \le |X(t)|$ otherwise. Since there is a positive probability that $\{\tau > U\}$ and $\beta < 1$, $\tilde{\pi}$ strictly improves $\pi$. □

Remark 3.1: Using the same argument as in the proof of Lemma 3.2, it is even simpler to show that keeping server 1 active whenever possible is also optimal within the nonfixed position routing policies.

Hereafter, we may further restrict attention to policies with the additional property that server 1 be kept active whenever possible.

B. Routing to the Slower Server

In the previous section we proved that the $\beta$-optimal policy keeps the faster server active whenever queue $Q$ is not empty. Thus, the optimality problem becomes a problem of routing customers to server 2. That is, at which set of states a customer should be routed to server 2 (when idle), given that the dispatching position is $J$. Hereafter, a routing decision will be understood as routing to the slower server only.

In the following sections we will derive some useful attributes of the optimal routing policy. The first attribute is that its decisions are independent of the states of the resequencing queues. Another attribute relates to the routing position $J$. It will be shown that $J_0$ has an "optimality property" in the sense that one would like to route from the nearest position to $J_0$. A third attribute is that for $J > J_0$, the optimal policy is of threshold type.

1) The Resequencing-Invariant Property: The next lemma is essential for the proof that state $R$ does not play any role in the optimal routing decision. Observe that from Lemma 2.2, Remark 2.1, and Lemma 3.2, the feasible values of $R$ are of the form $(0, (\ell_1, \ldots, \ell_{J-1}))$ or $(\ell, (0, \ldots, 0))$, where $\ell_1, \ldots, \ell_{J-1}$ correspond to the customers in server 1 and in the first $(J - 2)$ positions of queue $Q$. For $R = (0, (0, \ldots, 0))$ we fix the notation $[0]$.

Lemma 3.3: There is a function $h_\beta(R)$ such that for every routing policy $\pi$ whose decisions are independent of $R$

[The displayed relation that completes the statement of Lemma 3.3 is illegible in the source.]

Proof: Let $x_0 = (n, e_1, e_2, R, k)$ and $\tilde{x}_0 = (n, e_1, e_2, [0], k)$ be two initial states, and $X$ and $\tilde{X}$ the processes that are governed by policy $\pi$ and start at $x_0$ and $\tilde{x}_0$, respectively. Since $\pi$ is independent of $R(t)$, we may couple the arrivals and service times in both systems. This is made possible by the same evolutions of $(n(t), e_1(t), e_2(t))$ and $(\tilde{n}(t), \tilde{e}_1(t), \tilde{e}_2(t))$. (Here, we use the tilde notation as in Section II.) There are two cases of $R$ that have to be considered.

Case i): Assume that $R = (0, (l_1, \ldots, l_{J-1}))$. Set $\tau_0 = \tilde{\tau}_0 = 0$, and for every $1 \le j \le J - 1$ let $\tau_j$ ($\tilde{\tau}_j$) be the instant that the customer present at time 0 in position $j$ leaves the system. By the coupling, $\tau_j$ and $\tilde{\tau}_j$ are identical.

Since $\pi$ routes from position $J$, the customers that are present at time 0 in the first $(J - 1)$ positions will be routed to server 1. Thus, $\tau_j$ is distributed as the sum of $j$ independent geometric r.v.'s with parameter $\mu_1$. By the definition of the resequencing delay, we therefore have

[...]

For this case the lemma follows by defining

$$h^\beta(R) = \sum_{i=1}^{J-1} l_i \, E\left[1 + \beta + \cdots + \beta^{\tau_i - 1}\right]. \tag{10}$$

Case ii): Assume that $R = (l, (0, \ldots, 0))$. If $l = 0$ then the lemma is trivial. For $l > 0$, let $\tau$ be the instant that the customer present at time 0 in server 2 completes his service. Clearly, $\tau$ is geometrically distributed with parameter $\mu_2$. We have

$$|X(t)| = \begin{cases} |\tilde{X}(t)| + l, & \text{for } 0 \le t < \tau; \\ |\tilde{X}(t)|, & \text{for } t \ge \tau. \end{cases}$$

For this case the lemma follows by defining

$$h^\beta(R) = l \, E\left[1 + \beta + \cdots + \beta^{\tau - 1}\right]. \tag{11}$$

From (10) and (11), the function

$$h^\beta(R) = l \, E\left[1 + \beta + \cdots + \beta^{\tau - 1}\right] + \sum_{i=1}^{J-1} l_i \, E\left[1 + \beta + \cdots + \beta^{\tau_i - 1}\right] \tag{12}$$

satisfies (9). Here the expectations are taken with respect to the geometric r.v.'s, which are clearly independent of $R$. □

The function $h^\beta(R)$ represents the accrued discounted cost that is contributed by the customers present at time 0 in the resequencing queues. For later references denote $h^\beta(k) \stackrel{\text{def}}{=} h^\beta((0, (0, \ldots, 0, 1, 0, \ldots, 0)))$, where the 1 corresponds to position $k$. Observe that from (10)

$$h^\beta(k + 1) > h^\beta(k). \tag{13}$$

By using Lemma 3.3 and the following value and policy iterations, we will show that the routing decisions of the $\beta$-optimal policy are independent of $R$.

Let $\mathcal{F}$ be the Banach space of all functions $f: S \to \mathbb{R}$ with the norm $\|\cdot\|$ defined by $\|f\| = \sup_{x \in S} |f(x) / \max\{1, |x|\}|$. From (6) and Lemma 3.2 we may define, for every stationary policy $\pi$, the dynamic programming operator $T_\pi: \mathcal{F} \to \mathcal{F}$ by

$$(T_\pi f)(x) = |x| + \beta\left[\lambda f(\pi(A(x))) + \mu_1 f(\pi(D_1(x))) + \mu_2 f(\pi(D_2(x)))\right] \tag{14}$$

where $\pi(y) \in S$ is the state to which the process jumps from state $y$ after policy $\pi$ takes the action whether or not to route a customer to server 2 at state $y$. Also, define the optimal dynamic programming operator $T: \mathcal{F} \to \mathcal{F}$ by

$$(Tf)(x) = |x| + \beta\left[\lambda \min_\pi f(\pi(A(x))) + \mu_1 \min_\pi f(\pi(D_1(x))) + \mu_2 \min_\pi f(\pi(D_2(x)))\right]. \tag{15}$$

Notice that if the decision at state $y$, $\pi(y)$, for which the $\min_\pi f(\pi(y))$ is attained, is consistently chosen for every $y$, then $Tf$ defines a stationary policy $\pi'$ which satisfies

$$(T_{\pi'} f)(x) = (Tf)(x) = \min_\pi (T_\pi f)(x). \tag{16}$$

The procedure by which a new value function is derived by using operator $T$ is known as value iteration, and by which a new stationary policy is derived by using $T$, as policy iteration.

Theorem 3.1: The routing decisions of the $\beta$-optimal policy are independent of the state of the resequencing queues, $R$.

Proof: First, we show that if $\pi$'s decisions are independent of $R$, so are the decisions of the policy derived by the policy iteration $T V_\pi^\beta$. Then we show that the optimal policy preserves the same property. For every $f \in \mathcal{F}$, define

$$g_f(n, R) = f(n, 1, 0, R, 0) - f(n - 1, 1, 1, R, J - 1), \quad \text{for } n \ge J - 1. \tag{17}$$

Let $\pi_0$ be a policy whose routing decisions are independent of $R$, and for every $m \ge 0$ define $\pi_{m+1}$ as the policy that is derived by the policy iteration $T V_{\pi_m}^\beta$. That is, $T_{\pi_{m+1}} V_{\pi_m}^\beta = T V_{\pi_m}^\beta$. From (15) and (17), $\pi_{m+1}(y)$ is either 0 or 1, depending on whether $g_{V_{\pi_m}^\beta}(n, R)$ is negative or nonnegative, respectively. From Lemma 3.3 it follows that if $\pi_m$'s decisions are independent of $R$, then $g_{V_{\pi_m}^\beta}(n, R) = g_{V_{\pi_m}^\beta}(n, [0])$, which implies that $\pi_{m+1}$'s decisions are also independent of $R$.

Since a limit point of $\{\pi_m\}$ does not necessarily exist, we cannot deduce the theorem by the policy iteration procedure. However, we can extract it by the value iteration procedure as follows. Consider the sign of $g_{V^\beta}$, where $V^\beta = \inf_\pi V_\pi^\beta$ is the $\beta$-value function.

Since $\pi_0$'s decisions are independent of $R$, it follows by the argument above that so are $\pi_m$'s decisions, and by Lemma 3.3, the sign of $g_{V_{\pi_m}^\beta}(n, R)$ is independent of $R$, $m \ge 0$. Since the $\lim_{m \to \infty} V_{\pi_m}^\beta$ exists and equals $V^\beta$ (see,

e.g., [4, Lemma 3]), the sign of $g_{V^\beta}(n, R)$ is also independent of $R$.

To conclude the proof, observe that the $\beta$-optimal policy $\pi^*$ is the solution to the optimality equations $V^\beta = T V^\beta$. Now, from (15), $\pi^*(y) = 1$ if and only if $g_{V^\beta}(n, R) = g_{V^\beta}(n, [0]) \ge 0$, and the solution is independent of $R$. □

Theorem 3.1 would also apply to the optimal policy with respect to the average cost, if one could guarantee the following limits

$$g^* = \lim_{\beta_k \to 1} (1 - \beta_k) V^{\beta_k}(0)$$

for some sequence $\{\beta_k\}$. Then, since $g_{V^{\beta_k}}(n, R)$ is independent of $R$, the result is a straightforward consequence of the following optimality equations for the average cost problem:

[...]

By Lemma 3.7 and Remark 3.3 below, if $\lambda < \mu_1$ the limits above follow from [5, Theorem 3].

Hereafter, we may further restrict attention to policies whose routing decisions (to server 2) are functions of the length of queue Q only. Although this structure is the same as in the problem without resequencing delays, it does not imply that we have the same optimal policy. This is due to the different evolutions of the cost structures.

2) An Optimal Routing Position: In Section I, we described the optimality property of $J_0$ that has been derived in [3] for the class of fixed-position threshold policies $t_m$. In this section, we extend this property to a more general class of fixed-position policies.

From the results above, the $\beta$-optimal fixed-position routing policy can be taken in the class of stationary policies that are functions of two parameters. i) The set of lengths of queue Q at which a policy routes a customer to server 2 (if idle). ii) The position $J$ from which customers are being dispatched. Note that since the position is fixed, the set in i) is restricted by the position in ii).

We say that a class of policies $\Pi$ is routing-invariant if the policies differ only by the positions from which customers are being dispatched. That is, for every $\pi \in \Pi$, the sets in i) above are identical. One example is the class of threshold policies with level $m$ and routing positions $J$, $J \le m$. Let $J(\Pi)$ be the set of routing positions that correspond to class $\Pi$. We will show that the optimality property of $J_0$ holds for every routing-invariant class.

To proceed, we first characterize $J_0$ in terms of the expected delay of a customer in position $k$ under two alternative policies. One is a policy that routes customer $k$ to server 2, and customers $\{1, 2, \ldots, k - 1\}$ to server 1. The other policy routes customers $\{1, 2, \ldots, k\}$ to server 1.

Let $\{X_i\}$ be a sequence of independent geometric r.v.'s with parameter $\mu_1$, and $Y$ an independent geometric r.v. with parameter $\mu_2$. For $k \ge 1$, denote $X_{(k)} = \sum_{i=1}^{k} X_i$ and $Z_k = \max\{Y, X_{(k-1)}\}$, where $X_{(0)} = 0$. For every $0 < \beta \le 1$ define the function

$$\gamma^\beta(k) = E\left[1 + \beta + \cdots + \beta^{Z_k - 1}\right] - E\left[1 + \beta + \cdots + \beta^{X_{(k)} - 1}\right], \quad k \ge 1. \tag{18}$$

Note that for $\beta = 1$, $\gamma(k) \stackrel{\text{def}}{=} \gamma^1(k) = E[Z_k] - E[X_{(k)}]$. The function $\gamma^\beta(k)$ represents the difference in the accrued cost that is contributed by a customer present at time 0 in position $k$, under the two alternative routing policies above.

Recalling that $\alpha = \mu_1 / (\mu_1 + \mu_2)$, we obtain by the forward equations

$$\gamma^\beta(k) = \alpha E\left[\beta^{T_0}\right] \gamma^\beta(k - 1) - (1 - \alpha) E\left[\beta^{T_0 + X_{(k-1)}} \left(1 + \beta + \cdots + \beta^{X_k - 1}\right)\right] \tag{19}$$

where $T_0$ is the time of the first service completion at one of the servers, either real or dummy. (Recall that without loss of generality we assumed that $\lambda + \mu_1 + \mu_2 = 1$. Hence, $T_0$ is geometrically distributed with parameter $\mu_1 + \mu_2$.)

From (19) it is clear that $\gamma^\beta(k)$ is a decreasing function. Furthermore, there are $\varepsilon > 0$ and $\beta_0 < 1$ such that $\lim_{k \to \infty} \gamma^\beta(k) < -\varepsilon$ for every $\beta \ge \beta_0$. The latter property is an immediate consequence of the facts that $\lim_{k \to \infty} ([Z_k - X_{(k)}] - [X_{(k-1)} - X_{(k)}]) = 0$ with probability one, $E[X_{(k-1)} - X_{(k)}] = -1/\mu_1$ and $\lim_{\beta \to 1} \gamma^\beta(k) = E[Z_k] - E[X_{(k)}]$. Since $\gamma^\beta(1) > 0$, there exists an integer $J_0(\beta) \ge 2$ for which the function $\gamma^\beta(k)$ ($\beta \ge \beta_0$) becomes strictly negative for the first time.

As our prime interest is the average cost criterion, we consider $\beta$'s for which $J_0(\beta) = J_0(1)$. Since the $J_0(\beta)$'s are integers and $\gamma^\beta(k) \to \gamma(k)$, it is clear that there exists a $\beta_1 < 1$ such that $J_0(\beta) = J_0(1)$ for every $\beta \ge \beta_1$.

Finally, we show that $J_0(1) = J_0$ (which is defined in (5)). From the memoryless property of the geometric distribution it is standard to show that $\gamma(k) = \alpha^k / \mu_2 - (1 - \alpha^k) / \mu_1$, from which it follows that $\gamma(k) < 0$ if and only if $(\mu_2 / \mu_1)(1 + \mu_2 / \mu_1)^{k-1} > 1$. The latter relation implies that $J_0(1) = J_0$.

Now we are ready to prove the optimality property of $J_0$. We need the following policy transformations, which are applied also to nonfixed-position policies.

i) For every integer $l \ge 1$, every routing policy $\pi$ and $k \ge 2$, define $T_k^+(l, \pi)$ as the nonstationary policy that differs from $\pi$ only by the following action at the $l$th step. If $\pi$ routes a customer from position $k$, then $T_k^+(l, \pi)$ routes a customer from position $k + 1$, if not empty. Otherwise (at the $l$th step and other steps), it takes the same routing actions as $\pi$. That is, at step $l$, $T_k^+(l, \pi)$ routes customers to server 2 at the same length of queue Q that $\pi$ routes, but possibly from a higher position. At other steps, $T_k^+(l, \pi)$ routes customers to server 2 at the same lengths and from the same positions as $\pi$ routes.

ii) For every integer $l \ge 1$, every routing policy $\pi$ and $k \ge 3$, define $T_k^-(l, \pi)$ as the policy that differs from $\pi$ only by the following action at the $l$th step. If $\pi$ routes a customer from position $k$, then $T_k^-(l, \pi)$ routes a customer from position $k - 1$. For $l = 1$, these transformations will be denoted by $T_k^+(\pi)$ and $T_k^-(\pi)$, respectively.

In the following lemma, we consider policies that may

route from variable positions (with some restrictions), and X ( 2).From the identities for the rest of the departure times
show that if position k , k # J,, is feasible under a, then a
can be improved by one of the transformations above. V,P(n - 1 , 1 , 1 , R , k - 1) - V!(n - 1 , 1 , 1 , R , k )
Lemma 3.4: The following hold for every p, I0 < 1: = + I ) { E ~+ p ... +p7-11
(I,
i) if a routes customers to server 2 from positions larger
than or equal to k , k < J,, and k is a feasible position, then -E[1 + +p'-']}.
* e * (20)
T , ( l ,a), I 1 1, is at least as good as a; Thus, we have to show that the expression within the braces
ii) If a routes customers to server 2 from positions larger is nonnegative. To prove this, first note that customer k in
than or equal to k - 1, k > J,, and k is a feasible position, system 2 is routed to server 2, if and only if ( k 1) in X +
then there is a p2 < 1 such that Ti(1, a), 1 2 1, strictly is routed to server 2. Also note, that _since a routes from
improves a for every p2 5 < 1. positions k or higher, customer k in X would definitely be
Proofi The proof is based on a pathwise comparison o,f served by server 1, if this server would complete his first
the state process X under policy a to the state process X service before server 2 does. This event occurs with probabil-
under policy T i (a)(for part i)) or T; (a)(for part ii)). To ity PI I P l + P2.
compare realizations, we couple the arrivals and service Let To be as in (19) and T, be the number of steps after
completions in both systems. (Note that a service completion +
To that it takes to route customer ( k 1) in X to server 2
may correspond t,o different customers in X and X . ) The (and infinite, if he is routed to server 1). Denote by 7 the
process X and X are identical until step 1. Hence, for our conditional probability (conditioned on the state at time To)
pathwise comparison, we may assume without loss of gener- that { Tl < a}.By using the forward equations from time 0
ality that the processes start at step 1. Therefore it suffices to to To, it follows from the definitions of 2, and X(,) that
prove the lemma for T i (a) and T;(T).
Part i): Let x, = ( n ,1,0, R,O), n 2 k , be an initial
E[(1 + p ... +p7-1) - (1 +p . - * +p'-l)]

state at which a and T i (a)routes from different positions.


Since for all other initial states V!( x ) = V&(T)(x ) , we have
to show that
V,B(n - l , l , l , R , k - 1) - V!(n - l , l , l , R , k ) 2 0 .
This is due to the fact that after the first action, X (respec-
tively, X ) instantaneously jumps to state ( n - 1, I , 1, R , k
- 1) (respectively, to ( n - 1, 1, 1, R , k ) ) . From then on, -(1 +p ' * . +pZ*-l))]

both processes are governed by policy T. P2


As in the proof of Lemma 3.1, we may assume without - -E[(1 - B)pTO+X'k-l)
P l + P2
loss of generality that the customers in server 1 and in queue
Q are numbered by 1,2, +
n 1. We will show that for
e , '(1 + p - * * +OX"-')].
every customer i # k , its departure times in both systems are
Thus, from (18), (19) and the fact that
the same, while the expected departure time of customer k is
smaller in X . (1 + p . . . +px(k,-1) = (1 + p . +pX(P-l)-I
1
Since a routes from positions larger than or equal to k , it
follows from the coupling that every customer i < k (and
+ pX(.-l)(1 + p - * ' +pxk-')

those that at time 0, are being delayed by him), would leave with probability one, we have
both systems at the same time. Furthermore, the departure
+
times of customer k 1 (and those that at time 0, are being E[(1 +p . - * +/3-1) - (1 +p ..* +pi-l)]

delayed by him) would also be the same. This is plain from


+
the fact that ( k 1) leaves the system at the first instant at
-- +
which customers { 1,2, * , k l } have been released. Since
states (n - 1, 1, 1, R , k - 1) and ( n - 1, 1, 1, R , k ) differ
only by the locations of customers k and k 1, it is +
apparent from the coupling that this instant is the same in
both systems.
+
Every customer i > k 1 in both systems, is routed at
the same time and to the same server, and its completion time
is also the same. Since he would leave the system at the first
instant at which he and all preceding customers would have
been released, it follows by induction that its departure time
must be the same in both systems. Hence, it is left to show The last inequality follows from the monotonicity of y p ( k ) ,
that the expectedaccrued cost due to the delay of customer the definition of J , and the fact that k < J,. This completes
k , is smaller in X . the proof of Part i).
Let ~ ( 7 " ) be the departure time of customer k from system Part i): Let x, = ( n , 1,0, R , 0), n L k - 1, be an

initial state at which n and T i ( a ) routes from different nonstationary policy +


= TZ(1 1, n[) (alternatively
positions. Since for all other initial states V!(x) = +
= T i ( 1 1, a[)). Notice that for every I , nI satisfies
V & J x ) , we have to show that the conditions of Lemma 3 . 4 and therefore nI+l improves
nl. Let n, be the limiting policy. Policy n, is stationary
V!(n - 1 , 1 , 1 , R ,k - 2) and routes customers at the same queue lengths (of queue Q)
- V!(n - 1 , 1 , 1 , R , k - 1) < 0. that no does. However, if T, routes from position J > J,,
then n, routes from position ( J - 1). If no routes from
As in Part i), the departure times of every customer i # ( k
- 1) in both systems are the same and therefore it suffices to
position J < J,, then n, routes either from position ( J 1) +
(if not empty) or from position J (otherwise). Furthermore,
show that Lemma 3.4 and Remark 3 . 1 imply that for J > J,, n, is
E[1 + p . - .+ p 7 - 1 ] - E[1 + p ... + p<]0. (21) strictly better then no, and for J < J,, n, is at least as good
as no. For J < J, - 1, and for J = J, - 1 with yb(J, -
Here, 7 and 7" relate to the departure times of customer 1) > 0, n, is strictly better then no.
( k - 1). Similarly, alter the definitions of Tl and 7 in Part i) The following theorem extends the optimality property of
by relating them to customer ( k - 1). Again, by the forward J, to any routing invariant class.
equations we have Theorem 3.2: The following hold with respect to the
P-discounted cost, p2 I < 1, and to the average cost
E[(1 +p .*. +pf-1) - (1 +p '.' +p7-')]
criteria.
a) For every routing-invariant class ll which result in
-
--
PI
E [ pro( (1 +p +pZ"-'- ) positive recurrent Markov chains:
PI + P2 a.1) if J, E J ( l l ) , then the policy n E II that routes
- (1 +p . .. + p X w - l >)I from position J, is optimal within ll;
a.2) if for every J € J ( l l ) , J > J,, then the policy
+- P2 E [7 p T o + TI(( 1 + p . . . +pxu-2)- 1 ) n E ll that routes from position J* = min { J I J E J ( l l ) } , is
PI + P2 optimal within ll;
- (1 + p . .. + p Z k - - ' - l a.3) if for every J E J ( ~ )J, < J,, then the policy
111 n E ll that routes from position J* = max { J 1 J E J ( l l ) } , is
-~ P2 E [(1 - v)p=O+X'k-2' optimal within ll.
PI + P2 b) For every fixed-position routing policy n that routes
from position J and result in a positive recurrent Markov
.(1 + p . +pxk-l-'
)I chain:
P2 b.1) if J > J,, then a is inferior to the policy that
-
- ( I----
PI + P2
E[vBTo++b(k - 1) routes at the same lengths of queue Q, but from position J,.
b.2) if J < J,, then n is inferior to the non-fixed
+- P2 E [7 p T o + x ( * - 2 ) position policy that routes at the same lengths of queue Q,
PI + P2 +
but from position ( J 1) if not empty, and from J other-
.(1 + . e +pXk-I-l )(1 - PT9].
(22) wiseproof: The proof for the /3-discounted cost criterion is
To complete the proof we have to reduce the last positive immediate from Lemma 3 . 4 and the discussion that follows.
term above. Observe that given { Tl < m}, T , is definitely Indeed, within a routing-invariant class ll, one can succes-
smaller than the first service time that would be given by sively improve a policy by gradually increasing (accordingly,
server 1 after To. Therefore, Tl is stochastically smaller decreasing) the routing position within J(II), until one hits
than XI. Hence, by Jensen inequality the second summand in J,, m a x ( J 1 J E J ( ~ >or} min{JI J ~ J c l l ) } Parts
. a.l),
the right-hand side of (22), can be made arbitrarily close to a.2), and a.3) follow, respectively. Furthermore, b.1) is an
zero, for /3 arbitrarily close to one. immediate consequence of a. l ) , and b.2) follows from Part i)
Finally, by the definition of J, and the fact that ( k - 1) > of Lemma 3.4.
J,, it follows that y p ( k - 1) < 0. Thus, there is a p2, The results for the average cost criterion follows form [5]
PI Ip2 < 1, such that the right-hand side of (22) is negative by using the convergence
for every p2 I/3 < 1. This completes the proof of Part
lim (1 - ~ ) v ! ( =
x )V ~ ( X )
ii) . 0 i3+ 1
Remark 3.2: If yp(.Jo- 1) > 0, then in Part i) of Lemma
3.4, T l ( 1 , T ) strictly improves n. which holds for problems with a linear cost structure and
From Lemma 3.4, TZ(1, n) and T i ( 1 , n), 1 1 1, are continuous state jumps as ours (see [ 5 ] ) . 0
improvement transformations of policies that route from posi- 3) An Optimal Policy of Threshold Type: In this section
tions other than J,. Therefore, they could successively be we show that if the routing position J is greater than J,,
used to obtain a limiting stationary policy. then an optimal policy with respect to the average cost
Let no be a fixed-position routing policy that routes from criterion exists, and is of threshold type. Assume that J > J,
position J # J,. For every 1 2 0, recursively define the and start with the 0-discounted problem, Po I < 1. Recall

nIJ-2.
yO(J- 1) < 0 , p 2 < p < 1. (23)
Here, 1, = (I, (I,;**, Zk;-., Z,-,) = (0, (0, 1,
The proof is based on policy iteration and develops along the * * ,O)).
same lines as the proof in [4], with some changes that are From Lemma 3.5 one may show by successively using the
required from our different state space. Define a partial order operator Tt,, that Vt", also satisfies properties a)-f) of the
"- < " on the states, as follows. Recall that a state x is a lemma. Indeed, it is easy to construct in a recursive manner a
tuple x = ( n , e,, e2, R , k ) . We say that x i y , x,Y E S ,if function f, that satisfies properties a)-f). From the lemma it
at least one of the following conditions hold: def
follows that T:n+lfo= T t $ T t f o ) , n 2 1, also satisfies these
i) x = y (component-wise);
properties. Now, since limn+mT c fo = yt, we obtain the
ii) x = D , ( y ) ;
following corollary.
iii) x = D,(y);
Corollary 3.1: For every m 2 J , the P-discounted cost
iv) A ( x ) = y ;
function under policy t, satisfies properties a)-f) of Lemma
v) all components of x and y are equal except for one,
3.5
which is smaller in x .
The next lemma is the basis of our final result and its proof
vi) there is a Z E Ssuch that, x l z and ~ ( y .
is similar to that in [4, Lemma 41. The assumption J > J ,
For every f E ,F we also define the function:
and the property in (23) are crucial for reproducing the
proof. The lemma asserts that the new policy that is obtained
%(n, k )
from 5; by the policy iteration procedure, is also of thresh-
f(n-2,l,l,[O],k) -f(n-3,l91,[O],k), old type.
n 2 3 , O < k < m i n { n - 2 , J - I}; Lemma3.6: For every m,, 2 I J, < J 5 rn, < 00, there
+
exists an m,, J I rn, I m, 1, such that Ttm,5B_o =
f(O,l,l,[O],l) -f(o,0,1,[0],0)~ Tqo.
n=2, k=0. Proof: To prove the lemma we need to explore the
(24) properties of the function

g,p,o(n, R ) = v,",,cn, 190, K O )


- Vt",Jn-l,l,l,R,J-l), forn2J-1. (26)
n=l. From Lemma 3.3 it suffices to explore the function
(25)
def
g( a ) = g ,!
( n , [O]). This will be carried out by using the
In the following lemma we list some properties of f E 9 that forward eq2tions in (6) and representing g( n) in a recursive
propagates to Tt,f,m 2 J . This will be used to show that form. The forward equations depend on the value n and we
under every threshold policy t,, Vt", also satisfies the same separately consider all possible cases.
properties. Case :)i 1 5 J - 1 5 n < rn, - 2. (The policy tmodoes
Lemma 3.5: If f E 9 satisfies the following properties, not route a customer at queue lengths n 1 and below.) +
so does Tt,f,m 2 J : From (6)
a) for every x,y E S , if x F y then f(x) 5 f(y ) ; g ( n ) = PA[ q , ( n + 1 , 1 , 0 , [O],O)
b) for every n 2 2, A,(n, J - 1) 2 hO(min{n - 2, J
- 1)); - vp,,(n > 191, [o] 3 J - I)]
c) for every n 2 2, A>(n) 2 h'(min{ n - 1, J - 1));
d) for every n 2 2, A,(n, s) = A,(n,O), 0 < s < + OF,[ v g n - 1,1,0, [o] 3)
min{n - 2, J - 1);
- q n - 2 , 1 , 1 , [o] J - 2)]
e) f<n,e,, e2, R , k ) = f(n,e,, e,, LO], k ) + h'(R);
9

f) f(n - 1, 1, 1, [OI, k - 1) - f ( n - 1, 1, 1, [OI, k ) = + O F 2 [ q , c n >1>03101 0 ) >


ya(k), 11 k~ min{n, J - 1).
The proof of this lemma is standard but extremely tedious - V[,(n - 1 , 1 , 0 , [O] + 1,-,,0)]. (27)
and we do not present the details here. The main lines are as
follows. The function Tt,f is represented via (14) and the
properties are verified one by one. The full verification is
First, note that the expression in the first braces is g(n 1). +
Next, add and subtract V,",$n - 2, 1, 1, [O], J - 1) within
given in [2] and the reader may reproduce it based on the the second braces. From the proof of part ii) of Lemma 3.4
following properties which are easily shown: and the assumption J > J,, we have
+
hP( k 1) > hO( k ) . , ,

f(n + l , l , l , [ O ] , k ) 2 f ( n , 1 , 1 , [ 0 ] + l,, J- I),


vt,(n - 2 , 1 , 1 [O] , J - 1)

k s J - 1 - V t , ( n - 2 , 1 , 1 , [ O ] , J - 2) > 0. (28)

Thus, by (27) we obtain From (6)

g ( n ) > P k ( n + 1) + P a l g ( n - 1)
+ Pa2[v g n , 1705 [ 0 ] 4
- v g n - 1 , 1 , 0 , [o] + 1 , - , J q ]
2 PXg(n + 1) + Pa,g(n - 1)

+ Pa2[v g n , 190, [o] ,o) Again, from (28), the expression in the first braces is greater
than g ( n - 1). Furthermore, from Corollary 3.1 and prop-
- v ; p 1 , 1 , 1 , [ 0 ] , J - I)] erty a) of Lemma 3.5
= p X g ( n + 1) + P p , g ( n - 1) + P P 2 g ( n ) .
s ( n ) IPa& - 1). (31)
The last inequality follows from Corollary 3.1 and property Case iv): n 1 m , I3. (The policy tmoroutes a customer
+ +
a) of Lemma 3.5. Since X a, p2 = 1, we obtain for this at queue length n - 1 and above.)
case From (6)
(1 - P)g(.) - PX(g(. + 1) - g ( n) = Pp1 [ vqn - 2,171, [o] , J - 1)
IPa.l(g(n - 1) - g ( n ) ) ,
1 IJ - 1 In Im , - 2.
(29)
The same inequality is obtained for 1 In < J - 1, by
defining g ( n ) = Kt(n, 1, 0 , [OI, 0) - v[$n - 1, 1, 1,
[O], n), 1 5 n < J - P. From (28), the expression in the first braces is positive. From
Case i): n = m , - 2 > 1. (The policy tmoroutes a cus- Corollary 3.1 and property a) of Lemma 3.5, the expression
tomer at queue length n +
1, but does not route at queue in the second braces is also nonnegative. Thus
lengths n and below.)
From (6), the definition in (25), property e) of Lemma 3.5
and (28), we have
To complete the proof note that from (29)-(32), g ( n ) satis-
s(n) = P P I [ v;,(n - 1 , 1 , 0 , [o],o) fies the conditions of the corresponding function in [4, Eq.
(lo)]. As a consequence, the rest of the proof is identical to
-vp,,(n - 2,131, [o], J - 2)] the proof of [4,Lemma 41, and our lemma follows. 0
The assertion of the next theorem and its proof are identi-
+ Pa,[ Y p , 190, [ 0 ] , 0 ) cal to [4,Theorem 51. The proof applies the convergence of
the policy iteration to the 0-optimal policy.
- v;p - 1 , 1 , 0 , [o] + 1,-,,0)] Theorem 3.3: For every J > J, and P2 IP < 1:
i) there exists a stationary policy of threshold type, with
= PP& - 1) + PP2[ v;p, 190, [0],0) threshold m*(P) I00;
ii) if V $ X ) < v;+Jx), for some state x,then r n * ( ~ I
)
-v;p - 1 , 1 , 0 , [ 0 ] , 0 ) m.
In our final theorem we show by applying [5, Theorem 31,
+v;,(n - 1 , 1 , 0 , [ 0 ] , 0 ) that the optimal policy with respect to the average cost is of
threshold type. Here, we cannot reproduce the results from
- vp,,(n - 1 , 1 , 0 , [o] + l J - l , o ) ] [4, Section IV] since a close form for Vtmis intractable. We
will show instead, that Assumptions 1-5 of [5, Theorem 31
IPa,g(n - 1) + PP*(A'va,(" + 1) - h P ( J - 1))
hold for our problem. The main assumption requires the
following lemma.
1P p , g ( n - 1). (30)
Under every threshold policy t,, m Iw , define 7,
The last inequality follows from property c) of Lemma 3.5. (respectively, C,) as the number of steps (respectively, the
The same inequality is obtained for the case n = m , - 2 = accrued cost) until the first return to an empty system. Also,
1. Observe that the assumption J > J, > 2 and the require- let Ex(7,J and Ex(Cm)be their expected values given that
ment m , IJ, implies that m , > 3. the system starts at state x.
Case iii): n = m , - 1 I2. (The policy tmo routes a Lemma 3.7: If X < p , , then for every state x,
customer at queue length n and above.) SUP, E,(T,) < 03 and suprnE,(C,) < 00.
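As a rough numerical companion to Lemma 3.7 in the case $m = \infty$, the sketch below estimates the expected number of steps until the system empties when server 2 is never used. It assumes the uniformized discrete-time chain with $\lambda + \mu_1 + \mu_2 = 1$, in which each step is an arrival (probability $\lambda$), a completion at server 1 (probability $\mu_1$), or a dummy completion at server 2 (probability $\mu_2$). The function names and the parameter values are hypothetical, and the closed form $n / (\mu_1 - \lambda)$ used for comparison comes from Wald's identity for a random walk with drift, not from the paper.

```python
import random


def expected_steps_to_empty(n, lam, mu1):
    """Closed form for the expected number of steps to reach 0 from n customers.

    Each step moves +1 w.p. lam, -1 w.p. mu1, and 0 otherwise, so the mean
    drift per step is -(mu1 - lam); Wald's identity gives n / (mu1 - lam).
    """
    if not lam < mu1:
        raise ValueError("stability requires lam < mu1")
    return n / (mu1 - lam)


def simulate_steps_to_empty(n, lam, mu1, runs=20000, seed=0):
    """Monte Carlo estimate of the same quantity (illustrative only)."""
    if not lam < mu1 or lam + mu1 > 1.0:
        raise ValueError("need lam < mu1 and lam + mu1 <= 1")
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(runs):
        state, steps = n, 0
        while state > 0:
            u = rng.random()
            if u < lam:
                state += 1          # arrival
            elif u < lam + mu1:
                state -= 1          # real completion at server 1
            # otherwise: dummy completion at server 2, state unchanged
            steps += 1
        total_steps += steps
    return total_steps / runs


if __name__ == "__main__":
    lam, mu1 = 0.2, 0.5             # illustrative rates, mu2 = 0.3
    print(expected_steps_to_empty(3, lam, mu1))
    print(simulate_steps_to_empty(3, lam, mu1))
```

The finite value for $\lambda < \mu_1$ is what the lemma then extends, through the coupling argument, to a bound that is uniform over all thresholds $m$.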

Proof: For $m = \infty$, $\tau_\infty$ is distributed as the first return time to state 0 in an M/M/1 queue with arrival and service rates $\lambda$ and $\mu_1$, respectively. Since $\lambda < \mu_1$, $E_x(\tau_\infty) < \infty$. We will show that this implies a uniform bound on $E_x(\tau_m)$.

For every $m < \infty$, consider the systems that operate under $t_\infty$ (i.e., M/M/1) and under $t_m$. To compare their paths, we feed them with the same arrival process and couple the service completion times (either real or dummy) in both systems. (To clarify the coupling, imagine the servers producing completion events at rates $\mu_1$ and $\mu_2$, irrespective of whether or not a customer is being served. When a completion event occurs in a server that is serving a real customer, this customer would complete his service. Our coupling refers to these completion events, irrespective of the customer identities that are being served. It is quite clear that for exponential systems, this view is statistically the same as identifying the services with the customers.)

Under $t_\infty$, define $\sigma$ as the first instant that server 2 completes a dummy service immediately after the system becomes empty. That is, at time $\sigma - 1$ the system just became empty and the next jump was due to a service completion in server 2. Observe that from our coupling, at time $\sigma$, both systems are empty. Hence

$$\sup_m E_x(\tau_m) \le E_x(\sigma). \tag{33}$$

From the renewal property of state 0 in an M/M/1 queue and that of the residual completion time, $\sigma$ can be represented as follows. Let $\theta_i$, $i \ge 1$, be the $i$th time that the system (under $t_\infty$) is empty, and $K$ be the number of returns to an empty system until server 2 completes a service immediately after the system becomes empty. We have

$$\sigma = (\theta_1 + 1) + (\theta_2 + 1) + \cdots + (\theta_K + 1). \tag{34}$$

Since at every step the probability of a service completion at server 2 is $\mu_2 > 0$, $K$ is geometrically distributed with parameter $\mu_2$. Furthermore, $\theta_i$, $i \ge 2$, are i.i.d. and independent of $K$. By definition

$$E(\theta_1) = E_x(\tau_\infty) < \infty; \qquad E(\theta_2) < \max\{E_0(\tau_\infty), E_1(\tau_\infty)\} < \infty \tag{35}$$

where the indexes in $E_i(\cdot)$ correspond to states 0 and 1 in the M/M/1 queue.

From (33)-(35) and Wald's lemma

$$\sup_m E_x(\tau_m) \le E_x(\tau_\infty) + E(K)(1 + E(\theta_2)) < \infty. \tag{36}$$

To prove that $\sup_m E_x(C_m) < \infty$, denote by $A(t)$ the number of arrivals until time $t$. Given that the system under $t_m$, $m \le \infty$, starts at state $x$

$$C_m \le |x| + \sigma \cdot A(\sigma).$$

From the Poisson arrivals

[...]

The finiteness of $E(\sigma^2)$ follows from the fact that all moments of $\tau_\infty$ and $K$ are finite. □

Remark 3.3: Since at most one customer could be served by server 2 at every instant, the same proof implies that the expected return time and accrued cost, given that $X(0) = x$, is uniformly bounded over all admissible policies.

Now we are ready for our final theorem.

Theorem 3.4: For every $J > J_0$, there exists a stationary policy of threshold type $t_{m^*}$, whose level is a limit point $m^* = \lim_{\beta_k \to 1} m^*(\beta_k)$ ($m^*$ could be infinite).

Proof: We consider two cases.

Case i): $\mu_1 \le \lambda < \mu_1 + \mu_2$. The proof of this case is identical to the proof of the corresponding case in [4, Lemma 7, Theorem 8]. In this case, due to the instability of $t_\infty$, it is also shown that $m^* < \infty$.

Case ii): $\lambda < \mu_1$. In this case we apply Theorem 3 from [5]. Assumptions 1-3 there trivially hold in our problem. From Theorem 3.3, the policies $t_{m^*(\beta)}$ are $\beta$-optimal for every $\beta_2 \le \beta < 1$. Therefore, Lemma 3.7 implies that Assumptions 4 and 5 there also hold, for a subsequence $\beta_k \to 1$ for which $m^*(\beta_k) \to m^*$. Hence, our theorem is a direct consequence of [5, Theorem 3]. □

To combine this result with the results of Theorem 3.2 about the optimal position, let $t_{m^*(J)}$, $J > J_0$, be the optimal policy with respect to the average cost, given that the routing position is $J$. From the proof of Lemma 3.4, Part ii), it is clear that $t_{m^*(J_0+1)}$ is at least as good as $t_{m^*(J)}$. Hence, $t_{m^*(J_0+1)}$ is at least as good as any other fixed-position policy that routes customers from $J > J_0$. Furthermore, from Theorem 3.2, part b.1), the policy $t^{J_0}_{m^*(J_0+1)}$ that routes customers whenever $t_{m^*(J_0+1)}$ does, but from position $J_0$, is at least as good as $t_{m^*(J_0+1)}$. Thus, the following corollary is obtained.

Corollary 3.2: The policy $t^{J_0}_{m^*(J_0+1)}$ is at least as good as any other policy that routes from position $J > J_0$.

REFERENCES

[1] A. K. Agrawala, E. G. Coffman, Jr., M. R. Garey, and S. K. Tripathi, "A stochastic optimization algorithm minimizing expected flow times on uniform processors," IEEE Trans. Computers, vol. C-33, no. 4, pp. 351-356, Apr. 1984.
[2] S. Ayoun, "Optimal control of a queueing system with two heterogeneous servers with resequencing," M.S. thesis, Dep. Electr. Eng., Technion, Haifa, Israel, Feb. 1989.
[3] I. Illiadis and Y. C. Lien, "Resequencing delay for a queueing system with two heterogeneous servers under a threshold-type scheduling," IEEE Trans. Commun., vol. COM-36, pp. 692-702, 1988.
[4] W. Lin and P. R. Kumar, "Optimal control of queueing systems with two heterogeneous servers," IEEE Trans. Automat. Contr., vol. AC-29, pp. 696-703, Aug. 1984.
[5] S. A. Lippman, "Semi-Markov decision processes with unbounded rewards," Management Sci., vol. 19, pp. 717-731, 1973.
[6] S. A. Lippman, "Applying a new device in the optimization of exponential queueing systems," Operations Res., vol. 23, pp. 687-710, 1975.
[7] R. L. Larsen, "Control of multiple exponential servers with application to computer systems," Ph.D. dissertation, Tech. Rep. 1041, Univ. Maryland, College Park, MD, 1981.
[8] M. I. Reiman, "Optimal control of a heterogeneous two server queue in light traffic," AT&T Bell Lab., Murray Hill, NJ, 1989.
[9] M. I. Reiman and B. Simon, "Open queueing systems in light traffic," Math. Operations Res., 1989.
[10] Z. Rosberg and A. Makowski, "Optimal routing to parallel heterogeneous servers-Small arrival rates," IEEE Trans. Automat. Contr., vol. 35, pp. 789-796, July 1990.
[11] M. Schal, "Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal," Z.
AYOUN AND ROSBERG: OPTIMAL ROUTING TO TWO PARALLEL HETEROGENEOUS SERVERS 1449

Warscheinlichhreitstheorie Verw. Gebiete, vol. 32, pp. 179-196, Zvi Rosberg received the B.Sc., M. A., and Ph.D.
1975. degrees from the Hebrew University, Jerusalem,
[I21 J. Walrand, “A note on the optimal control of a queueing system with Israel, in 1971, 1974, and 1978, respectively.
two heterogenec)us servers,” Syst. Contr. Lett., vol. 4,pp. 131-134, From 1972 to 1978 he was a Senior System
1984. Analyst in the General Computer Bureau of the
Israeli Government. From 1978 to 1979 he had a
Research Fellowship at the Center of Operation
Serge Ayoun received the B.Sc. and the M.Sc. Research and Econometrics (CORE), University of
degrees in computer engineering from the Tech- Louvain, Belgium. From 1979 to 1980, he was a
nion Institute of Technology, Haifa, Israel, in 1986 Visiting Assistant Professor at the University of
and 1989, respectively. Illinois. From 1980 to 1989, he was with the
Since 1989 he has been with IBM Israel Science Department of Computer Science, Technion, Israel. Since 1990, he has been
and Technology working in the area of image with IBM Israel, the Science and Technology Center. During 1985-1987 he
processing. was on leave at the IBM Thomas J. Watson Research Center, Yorktown
Heights, NY. His main research interest include probabilistic models of
communication networks and computer systems, performance evaluation,
queueing theory, and applied probability.
