
Computers & Industrial Engineering 62 (2012) 1011–1024

Contents lists available at SciVerse ScienceDirect

Computers & Industrial Engineering


journal homepage: www.elsevier.com/locate/caie

Three m-failure group maintenance models for M/M/N unreliable queuing service
systems
Gia-Shie Liu ⇑
Department of Information Management, Lunghwa University of Science and Technology, No. 300, Sec. 1, Wanshou Rd., Guishan Shiang, Taoyuan County 33306, Taiwan

Article info

Article history:
Received 15 September 2009
Received in revised form 6 November 2011
Accepted 19 December 2011
Available online 5 January 2012

Keywords:
Group maintenance
m-Failure
Matrix-geometric
Queuing system
Continuous time Quasi Birth-and-Death Processes

Abstract

This paper considers group maintenance problems for an unreliable service system with N independent operating servers and a Markovian queue. A specific class of group maintenance policies is developed where the repair is started as soon as the number of failed servers reaches a predetermined threshold. This is actually a Quasi Birth-and-Death Process with two dimensions, the level for the arrival/service process and the phase for the failure/repair process. Two models with positive repair time and another with instantaneous repair are considered. The matrix geometric approach is applied to calculate the steady state distribution and the expected average cost for all three models. For the theoretical analysis, this paper proves that there exists an optimal group maintenance parameter m, which can find the minimal average cost for all three models. Additionally, some mathematical properties and sensitivity analyses are numerically demonstrated based on various parameters. Finally, the comparisons of these three proposed models in many aspects are also discussed.
© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

It is very important to keep service systems operating normally in this service-oriented era. Failing this, considerable costs are incurred through the loss of customers and delays in the production of goods. For example, telephone and mobile phone companies lose customers when failures of their service systems cause frequent communication traffic jams; a failure of the power systems operated by a power company delays production at manufacturing companies; moreover, for systems such as nuclear power systems, military weapon systems, airplanes, and submarines, it is essential to avoid breakdowns during operation, since a breakdown can lead to a very dangerous and disastrous condition. To reach this goal, besides inherent reliability, an appropriate maintenance policy can be applied to avoid failures of such operating systems.

This paper focuses on maintenance problems of some specific large service or operating systems such as telecommunication systems, internet service systems, electric power systems, nuclear power systems, military submarines, aircraft carriers, space stations, satellite systems, and automated manufacturing systems. Since the services or functions provided by these systems are so important, most of them are multi-unit systems, which apply parallel or redundant system designs to keep the whole system operating normally even when some service units are malfunctioning. Of course, a multi-unit system can also provide more capacity to meet all kinds of requests or missions given by customers. The simplest maintenance policy for a multi-unit system is to treat all units independently and follow the same maintenance policy separately for each unit in the system. Nevertheless, for the specific systems on which this paper focuses, replacement is impossible or difficult to carry out immediately upon every single unit failure, because the missions or operations cannot be canceled immediately for safety reasons, or because stopping the service/production operations would incur substantial setup costs, production delay costs, customer holding costs, or customer loss costs. In that case, group maintenance policies can play a key role in keeping those multi-unit systems operating normally.

In recent years, a large amount of research has been devoted to finding optimal maintenance policies under various assumptions. The maintenance models for multi-unit systems are generally based on those for single-unit systems. For single-unit systems, various maintenance policies, such as block replacement policies (Beichelt, 1993; Chien & Chen, 2007; Kennee, Gharbi, & Beit, 2007; Scarf, Dwight, & Al-Musrati, 2005; Sheu, 1997, 1998; Sheu & Griffith, 2002), age-replacement policies (Beichelt, 1993; Scarf et al., 2005; Sheu, 1998; Shen & Chien, 2004), periodic preventive maintenance policies (Jung & Park, 2003; Jung, Park, & Park, 2010; Sheu, Lin, & Liao, 2006; Vaughan, 2005; Yeh & Lo, 2001), failure limit maintenance policies (Arunraj & Maiti, 2010; Carazas & Souza, 2010; Chan & Asgarpoor, 2006; Das, Lashkari, & Sengupta, 2007; Grall, Berenguer, & Dieulle, 2002; Lapa, Pereira,

⇑ Tel.: +886 2 26788221 (H), +886 2 82093211x6328 (O); fax: +886 2 26788221.
E-mail addresses: liugtw@yahoo.com.tw, liug@mail.lhu.edu.tw

0360-8352/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cie.2011.12.028
1012 G.-S. Liu / Computers & Industrial Engineering 62 (2012) 1011–1024

Nomenclature

$c$: number of repairmen
$e$: $[1, 1, \ldots, 1]'$
$f$: failure rate for each server
$m$: the threshold of the number of failed servers to initiate the repair process
$m^*$: the optimal $m$
$N$: total number of servers in the system
$n$: the number of failed servers repaired
$Q$: infinitesimal generator for the failure/repair process
$\tilde{Q}$: infinitesimal generator for the M/M/N continuous time Markov Process
$P$: the related transition probability matrix derived from $Q$
$r$: repair rate with one repairman fixing one failed server
$r_w$: repair rate when $w$ servers are operational
$T$: the threshold of scheduled time to initiate the repair process
$\tau$: another threshold of scheduled time to initiate the repair process
$T^*$: the optimal $T$
$w$: number of operational servers in the system
$x$: number of customers in the system
$y_{xw}$: probability that $x$ customers are in the system and $w$ servers are operational
$\boldsymbol{y}_x$: $[y_{x0}, y_{x1}, \ldots, y_{xN}]$
$\boldsymbol{y}$: $[\boldsymbol{y}_0, \boldsymbol{y}_1, \boldsymbol{y}_2, \ldots]$
$\lambda$: customer arrival rate
$\lambda_w$: customer arrival rate when $w$ servers are operational
$\boldsymbol{\lambda}$: vector of arrival rates in each state of the number of operating servers
$\Delta(\boldsymbol{\lambda})$: $\operatorname{diag}(\lambda_0, \lambda_1, \ldots, \lambda_N)$
$\mu$: service rate of one server for each customer
$\mu_w$: service rate when $w$ servers are operational
$\boldsymbol{\mu}$: vector of service rates in each state of the number of operating servers
$\Delta(\boldsymbol{\mu})$: $\operatorname{diag}(0, \mu, 2\mu, \ldots, N\mu)$
$\Delta(\boldsymbol{\lambda} + \boldsymbol{\mu})$: $\operatorname{diag}(\lambda_0, \lambda_1 + \mu, \lambda_2 + 2\mu, \ldots, \lambda_N + N\mu)$
$\pi_w$: the stationary probability that $w$ servers are operational
$\boldsymbol{\pi}$: $[\pi_0, \pi_1, \ldots, \pi_N]$
$S$: fixed repair cost
r_cost: variable repair cost per failed server
$h$: holding cost per customer per unit time
$A_0$: $\Delta(\boldsymbol{\lambda})$
$A_2$: $\Delta(\boldsymbol{\mu})$
$A_1$: $Q - \Delta(\boldsymbol{\lambda}) - \Delta(\boldsymbol{\mu})$
$T_{00}$: $Q - \Delta(\boldsymbol{\lambda})$
$T_{01}$: $\Delta(\boldsymbol{\lambda})$
$T_{x2}$: $\Delta(\boldsymbol{\lambda})$ for $x = 1, \ldots, N-1$
$T_{x0}$: $\operatorname{diag}\{\mu \min(x, w),\ 0 \le w \le N\}$ for $1 \le x \le N-1$
$T_{x1}$: $Q - \Delta(\boldsymbol{\lambda}) - T_{x0}$ for $1 \le x \le N-1$
$t_{ij}$: the expected transition time from state $i$ to state $j$
$M_w$: the expected number of customers in the system when $w$ servers are working
$M$: $[M_0, M_1, \ldots, M_N]$

& de Barros, 2006; Love & Guo, 1996; Monga, Zuo, & Toogood, 1997; Pham & Wang, 1996; Saassouh, Dieulle, & Grall, 2007; Wu & Clements-Croome, 2005), are proposed in combination with minimal repairs, unscheduled replacements, and other options, depending on the situation. Most of the single-unit maintenance models mentioned above assume that all failures are instantly detected and repaired (Arunraj & Maiti, 2010; Beichelt, 1993; Carazas & Souza, 2010; Kennee et al., 2007; Scarf et al., 2005; Sheu, 1997, 1998; Sheu & Chien, 2004; Sheu & Griffith, 2002). In the real world, this is not always true; what usually happens is that the single-unit system must stop its service and may lose the customers waiting in line. Therefore, this paper considers random repair times and multi-unit service systems.

For multi-unit systems, several review papers have appeared (Cho & Parlar, 1991; Dekker, Wildeman, Schouten, & Frank, 1997; Wang, 2002). According to Wang (2002), maintenance policies for multi-unit systems can be implemented by applying single-unit maintenance policies to each unit separately if there is economic independence, failure independence, and structural independence between the units of those systems (Aghezzaf & Najid, 2008). The multi-unit systems discussed in this paper are clearly economically dependent: carrying out maintenance operations on several units simultaneously costs less money or time than doing so on each unit individually. There are various types of maintenance policies for multi-unit systems, such as block replacement policies, opportunistic maintenance policies, and group maintenance policies. Under classical block replacement policies, all units in the system are replaced simultaneously at periodic intervals, and each unit that fails in between is also replaced immediately (Barlow & Proschan, 1965; Sun, Xi, Du, & Pan, 2010). Since classical block maintenance policies may force the system to replace nearly new unfailed units, many studies have developed modified block maintenance policies to avoid this kind of unnecessary waste (Anisimov, 2005; Archibald & Dekker, 1996; Scarf & Cavalcante, 2010; Sun et al., 2010). For opportunistic maintenance policies, upon a unit failure, besides replacing the failed one, other

Fig. 1. Transition flow for group maintenance process.



unfailed units may also be replaced, depending on predetermined control threshold strategies such as a control limit for age (Berg, 1976, 1978; Pham & Wang, 2000) or a control limit for failure rate (Zheng & Fard, 1991). For group maintenance policies, failed units in the system are replaced together when some predetermined threshold is reached; the criteria include time, cost, number of failed units, etc. Since this paper focuses on specific systems in which immediate replacement is difficult to carry out whenever a single unit fails, group maintenance policies are more appropriate for our study.

According to our observation, three main types of group maintenance policies have been studied in the literature. The first type is the T-age group maintenance policy. The general idea of this policy is that no failed server is repaired until a scheduled time T, and then all failed servers in the system are fixed simultaneously. Barlow and Proschan (1965) show that the optimal scheduled time for preventive maintenance is nonrandom and that there exists a unique optimal policy if the distribution of time to failure has an increasing failure rate. Okumoto and Eslayed (1983) show that, in the case of exponential distributions, a closed form expression for $T^*$ can be developed; for general underlying failure distributions, bounds for $T^*$ are derived. Dekker and Roelvink (1995) determine the optimum time for preventively replacing a fixed group of components by introducing marginal cost criteria; the marginal cost is interpreted as the extra cost caused by deferring preventive replacement for an additional time unit, and both block and group age replacement are developed and compared. Li and Xu (2004) develop an age group replacement model that chooses the set of components for replacement randomly and study the effect of dependency in this selection; it is shown that preventive maintenance only makes sense for components whose lifetimes are new-better-than-used. Sun et al. (2010) simulate and analyze the effect of three different multi-component maintenance policies on system performance: age replacement, block replacement, and block replacement with minimal repair.

The second type of group maintenance policy is m-failure. The general idea of this policy is that no repairs are performed until m servers have failed, and then all failed servers in the system are fixed simultaneously. Assaf and Shanthikumar (1987) consider two models with exponential failure times and a more general maintenance policy, under which n servers are repaired when the number of failed servers reaches m. They show that the optimal repair policy is either not to repair any failed server or to repair all failed servers (n = m), which is effectively an m-failure group maintenance policy. Extending the repair and replacement model of Assaf and Shanthikumar (1987), Wilson and Benmerzouga (1990) discuss optimal m-failure policies with nonnegative random repair time. Pham and Wang (2000) further extend two $(\tau, T)$ opportunistic maintenance policies to a group maintenance policy that triggers corrective replacement depending on the number of failed components. Liu (2004) develops a specific class of m-failure group replacement policy for an M/M/N production/service system. A matrix-geometric method is applied to perform the steady state analysis and to obtain the expected average number of customers and the expected average cost. Mathematical analysis and numerical examples are given to demonstrate the properties of the optimal policy for various sets of parameter values.

The last type is the (m, T) group maintenance policy. The general idea is that no failed machine is repaired until a scheduled time T or until m servers have failed, whichever comes first, and then all failed machines in the system are fixed simultaneously. Nakagawa (1983) considers the optimal number m that minimizes the mean cost rate when the scheduled replacement time T is specified. Ritchken and Wilson (1990) consider a generalization of the combined (m, T) group replacement model which requires inspection at either the scheduled time T or the time when exactly m machines have failed, whichever comes first. At an inspection, all failed machines are replaced with new units while operating machines are serviced so that they become as good as new. Sheu and Jhang (1996) propose a two-phase maintenance policy for a group of identical repairable units that selects two time intervals and m to minimize the expected cost per unit time over an infinite time span.

Besides the above three typical types of group maintenance policies, Popova (2004) develops the optimal structure of Bayesian group replacement policies for a parallel system of n units with exponential failure times and random failure parameters. That paper shows that it is optimal to inspect the system only at failure times and obtains the exact form of the optimal Bayesian group replacement for a two-unit system. Assuming that machine failure times follow a Weibull distribution, Das et al. (2007) optimize maintenance costs by implementing a group policy subject to a desired machine reliability threshold.

Most group maintenance models mentioned above assume that the failed machines take negligible repair time and that the operating machines produce output at a constant rate (Barlow & Proschan, 1965; Assaf & Shanthikumar, 1987; Das et al., 2007; Dekker & Roelvink, 1995; Li & Xu, 2004; Nakagawa, 1983; Okumoto & Eslayed, 1983; Pham & Wang, 2000; Popova, 2004; Ritchken & Wilson, 1990; Sheu & Jhang, 1996; Sun et al., 2010). The repair time of failed units can in fact differ considerably between types of multi-unit systems. In the real world, we believe that no physical system can replace its failed units in zero time. However, some small or simple multi-unit systems, such as internet service systems or telecommunication service systems, may be assumed to take negligible repair time for replacing failed servers or failed switchboards because of their modular designs. For systems such as automated manufacturing systems, electric power systems, nuclear power systems, and submarine navigation systems, it is more reasonable to assume a positive repair time. Therefore, besides one model with instantaneous repair, this paper develops two models with positive repair time, which are more credible and better reflect the real repair process. Of these two models, one allows other units to fail during the maintenance process, and the other assumes that no other unit fails, because all normally operating units are also checked to make sure they stay in normal condition during the maintenance process. Presuming a constant production rate means that when a failed machine is left unrepaired, a production loss cost is incurred at a constant rate. Although the assumption of continuous inflow is appropriate for high-volume automatic production processes, in the real world it may be more realistic to assume that jobs arrive according to a random arrival stream. In this paper, the production loss cost rate is not constant; instead, it depends on the number of jobs waiting for service at any given time. This yields a more precise production loss cost than the traditional group maintenance models, in which constant production job arrivals are assumed.

Therefore, the group maintenance problems considered in this paper fall in the class of queuing systems with unreliable servers, for which a matrix-geometric method can be employed for the steady state analysis of a certain class of continuous time Markov Processes. The earliest results on matrix-geometric solutions are contained in the paper of Evans (1967) and the Ph.D. thesis of Wallace (1969) for block-Jacobi generators of continuous-parameter Markov Processes of the GI/M/1 type, called Quasi Birth-and-Death (QBD) Processes. The general theory, largely developed by Neuts (1981), was motivated by the problem of analyzing the steady state behavior of non-Markovian (M/G/1 and GI/M/1) queues operating in a random environment that can be

in one of N appropriately defined states. Latouche and Ramaswami (1999) further introduce more up-to-date developments of these celebrated matrix-geometric methods and subsequently call them matrix analytic methods. They mainly present the basic mathematical concepts and the algorithms of these matrix analytic theories. In fact, these matrix analytic methods can be applied in several areas. In recent years, many matrix analytic methods have been developed or improved to solve QBD problems for different queuing systems, such as the M/M/1 queue in a random environment (Neuts, 1978a, 1978b), Phase-Type queues (Latouche & Ramaswami, 1999; Neuts, 1981), Tandem queues with blocking (Latouche & Neuts, 1980), a Markovian queue with server breakdowns and repairs (Neuts & Lucantoni, 1979), Markovian queues with marked transition (He & Neuts, 1998), Multiserver retrial queues (Artalejo, Gomez-Corral, & Neuts, 2001), Fluid queues (Dzial, Breuer, da Silva Soares, Latouche, & Remiche, 2005; Silva Soares & Latouche, 2006), and even the most notable telecommunication systems at the present time (Conti, Gregori, Lenzini, & Neuts, 1994; Li, Widjaja, & Neuts, 1998). However, most of them focus on evaluating average performance characteristics for certain situations, such as finding the steady-state probability distribution; little effort has been spent on the control policy aspect, with the exception of the related problems of server vacations (Alfa & Frigui, 1996). In this paper we consider group maintenance policies for M/M/N unreliable queuing service systems. As mentioned before, the production loss cost is not constant but depends on the number of jobs waiting in the queue. This is actually a QBD process with two dimensions, the level for the arrival/service process and the phase for the failure/repair process. Consequently, it is appropriate to develop the group maintenance policies with matrix analytic methods, which handle the failure/repair process of servers and the arrival/service process of customers simultaneously and give the steady state probability vector for the number of customers waiting in the system.

In the sequel, Section 2 discusses the problem description and formulation. Section 3 presents the modeling and steady state analysis. Section 4 derives the related cost function. Section 5 demonstrates the properties of the cost functions. Section 6 illustrates the numerical results and related comparisons. Finally, Section 7 gives some conclusions and discussions.

2. Problem description and formulation

This paper considers an unreliable queuing system with N identical servers and a single queue. The system can be an internet service system, telecommunication service system, electric power system, nuclear power system, automatic manufacturing system, etc., depending on which of the group maintenance models developed in this paper is applied. Customers arrive according to a Poisson process with arrival rate $\lambda_w$, which generally depends on the number of working servers $w$. Each customer entering the system can only be served by one server; the service time for each customer follows an exponential distribution with service rate $\mu$. Each server operates for a random period of time, which follows an exponential distribution with failure rate $f$. The failed servers are repaired by a crew of $c$ repair persons. The repair rate for each repair person to fix one failed server is $r$. It is assumed that the repair crew works together and devotes its effort proportionately to all servers being repaired at a given time. Therefore, the system repair rate during the maintenance process depends on the number of servers being repaired and equals $r_w = rc/(N - w)$ while $w$ servers are operational (and $N - w$ are under repair). A class of group maintenance policies is developed, called m-failure, under which the maintenance process is started when the number of failed servers reaches a predetermined level $m$. During the maintenance process, all failed servers are replaced with new ones or repaired to become as good as new at the same time.

There is a fixed cost S associated with the initiation of the maintenance process, independent of the number of failed servers. Additionally, there is a variable cost r_cost for each failed server undergoing repair. Finally, there is a delay cost h per unit time for each customer waiting in the system. This cost structure implies the following tradeoff. For small values of m, the maintenance process starts when only a few servers have failed; therefore, the number of working servers is usually high and the delay cost is relatively low. On the other hand, the fixed cost S is paid more often, so the average repair cost is relatively high. In contrast, for large values of m, the repair process is triggered less often, which keeps the average repair cost low but at the same time increases the delay cost. The objective of this paper is to determine the optimal value of m that minimizes the expected average repair and delay cost.

Depending on the assumptions about the maintenance process, three group maintenance models are developed in this paper. The first model assumes positive repair time and allows other functional servers to fail during the maintenance process. The second model assumes positive repair time and does not allow normally operating servers to fail during the maintenance process, because all of them are also checked to make sure they are operational when the failed units are replaced. The third model assumes that the repair time for replacing failed units is negligible because of their modular designs.

2.1. Model 1: Positive repair time and server failures allowed during maintenance

This group maintenance model assumes positive repair time and allows failures of other normally operating servers during the maintenance process. In practical engineering applications, this model can be applied to multi-unit systems such as a typical automated manufacturing system, electric power system, nuclear power system, or submarine navigation system. The duration of the maintenance process follows an exponential distribution with system repair rate $r_w = rc/(N - w)$ while $w$ servers are operational. While the repairs are still in process, any server that fails also enters the repair facility, and the system repair rate is adjusted accordingly.

The above group maintenance model is related to that in Neuts and Lucantoni (1979), in which failed servers start repair at the failure instant if any repairman is available. This model is actually a QBD process with two dimensions, the level for the customer arrival/service process and the phase for the server failure/repair process. The infinitesimal generator $\tilde{Q}$ for the customer arrival/service process combined with the server failure/repair process takes the following form:

$$
\tilde{Q} =
\begin{bmatrix}
T_{00} & T_{01} & & & & & & \\
T_{10} & T_{11} & T_{12} & & & & & \\
 & \ddots & \ddots & \ddots & & & & \\
 & & T_{N-2,0} & T_{N-2,1} & T_{N-2,2} & & & \\
 & & & T_{N-1,0} & T_{N-1,1} & T_{N-1,2} & & \\
 & & & & A_2 & A_1 & A_0 & \\
 & & & & & A_2 & A_1 & \ddots \\
 & & & & & & A_2 & \ddots \\
 & & & & & & & \ddots
\end{bmatrix}
\tag{1}
$$

The elements of $\tilde{Q}$, the square blocks of dimension $(N + 1) \times (N + 1)$, are defined as zero matrices except that:

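$$
\begin{aligned}
&\Delta(\boldsymbol{\lambda}) = \operatorname{diag}(\lambda_0, \lambda_1, \ldots, \lambda_N); \quad \Delta(\boldsymbol{\mu}) = \operatorname{diag}(0, \mu, 2\mu, \ldots, N\mu); \\
&T_{00} = Q - \Delta(\boldsymbol{\lambda}); \quad T_{01} = T_{12} = T_{22} = \cdots = T_{N-1,2} = A_0 = \Delta(\boldsymbol{\lambda}); \\
&T_{x0} = \operatorname{diag}\{\mu \min(x, w),\ 0 \le w \le N\} \ \text{for } 1 \le x \le N-1; \\
&T_{x1} = Q - \Delta(\boldsymbol{\lambda}) - T_{x0} \ \text{for } 1 \le x \le N-1; \\
&A_2 = \Delta(\boldsymbol{\mu}); \quad A_1 = Q - \Delta(\boldsymbol{\lambda}) - A_2.
\end{aligned}
\tag{2}
$$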
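To make the block structure of (1) and (2) concrete, the following Python sketch assembles the blocks from a given failure/repair generator, the arrival-rate vector, and the service rate of one server. It is only an illustration under stated assumptions: the function names (`diag`, `mat_sub`, `qbd_blocks`) and the dictionary keyed by `(x, k)` are hypothetical choices made here, not part of the paper, and plain Python lists stand in for matrices.

```python
def diag(values):
    """Square matrix with `values` on the diagonal and zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_sub(a, b):
    """Element-wise difference a - b of two equally sized matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def qbd_blocks(Q, lam, mu):
    """Blocks of Eq. (2) for a phase generator Q (hypothetical helper).

    Q   : (N+1) x (N+1) failure/repair generator, row w = w operating servers
    lam : arrival rates [lambda_0, ..., lambda_N]
    mu  : service rate of one server
    Returns (T, A0, A1, A2), where T[(x, k)] holds the boundary block T_{xk}.
    """
    N = len(Q) - 1
    d_lam = diag(lam)                                   # Delta(lambda)
    d_mu = diag([w * mu for w in range(N + 1)])         # Delta(mu)
    A0, A2 = d_lam, d_mu
    A1 = mat_sub(mat_sub(Q, d_lam), A2)
    T = {(0, 0): mat_sub(Q, d_lam), (0, 1): d_lam}      # level 0 has no services
    for x in range(1, N):                               # levels with fewer than N customers
        Tx0 = diag([mu * min(x, w) for w in range(N + 1)])
        T[(x, 0)] = Tx0                                 # one service completion
        T[(x, 1)] = mat_sub(mat_sub(Q, d_lam), Tx0)     # within-level block
        T[(x, 2)] = d_lam                               # one arrival
    return T, A0, A1, A2
```

Adding up the blocks of any single level returns the phase generator (for instance $A_2 + A_1 + A_0 = Q$), so every row of the assembled generator sums to zero; this gives a quick consistency check on an implementation.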

The matrix $Q$ is the generator that describes the failure/repair process. All elements in $Q$ are equal to zero except that

$$
Q_{00} = -r_0; \quad Q_{w,w-1} = wf \ \text{for } 1 \le w \le N; \quad Q_{w,N} = r_w \ \text{for } 0 \le w \le N-m; \quad Q_{w,w} = -Q_{w,w-1} - Q_{w,N} \ \text{for } 1 \le w \le N.
\tag{3}
$$

In summary, the generator $Q$ can be described as follows:

$$
Q =
\begin{bmatrix}
-r_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & r_0 \\
f & -f - r_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & r_1 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & (N-m)f & -r_{N-m} - (N-m)f & 0 & \cdots & 0 & 0 & r_{N-m} \\
0 & 0 & \cdots & 0 & (N-m+1)f & -(N-m+1)f & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & (N-1)f & -(N-1)f & 0 \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & Nf & -Nf
\end{bmatrix}
\tag{4}
$$

The stationary probability vector $\boldsymbol{\pi}$ of $Q$ in (4) is obtained by solving the equations $\boldsymbol{\pi} Q = 0$, $\boldsymbol{\pi} e = 1$, and it is given explicitly by

$$
\pi_0 = \left[ 1 + \frac{r_0}{f} + \sum_{j=2}^{N-m+1} \frac{r_0}{f} \prod_{k=2}^{j} \frac{(k-1)f + r_{k-1}}{kf} + \sum_{j=N-m+2}^{N} \frac{r_0}{f} \left( \prod_{k=2}^{N-m+1} \frac{(k-1)f + r_{k-1}}{kf} \right) \left( \prod_{k=N-m+2}^{j} \frac{k-1}{k} \right) \right]^{-1},
$$

$$
\pi_1 = \frac{r_0}{f}\, \pi_0; \qquad \pi_j = \frac{r_0}{f} \left( \prod_{k=2}^{j} \frac{(k-1)f + r_{k-1}}{kf} \right) \pi_0 \ \text{ for } j = 2, \ldots, N-m+1;
$$

$$
\pi_j = \frac{r_0}{f} \left( \prod_{k=2}^{N-m+1} \frac{(k-1)f + r_{k-1}}{kf} \right) \left( \prod_{k=N-m+2}^{j} \frac{k-1}{k} \right) \pi_0 \ \text{ for } j = N-m+2, \ldots, N.
$$

2.2. Model 2: Positive repair time and no server failures allowed during maintenance

This group maintenance model assumes positive repair time and does not allow normally operating servers to fail during the maintenance process, because all of them are checked to make sure they are operational when the failed servers are replaced. In practical engineering applications, this model can be applied to a typical automated manufacturing system, electric power system, nuclear power system, or submarine navigation system in which the normally operating servers are checked to make sure they stay functional during the maintenance process. The model is similar to model 1 described above; the only difference is that there are no server failures during the period of maintenance. Therefore, the infinitesimal generator $\tilde{Q}$ for this model takes the form in (1), and its elements have almost the same definition as in (2), but with a modified generator $Q$ of the failure/repair process. Only a few elements in $Q$ need to be changed, as follows:

$$
Q_{w,w-1} = wf \ \text{for } N-m+1 \le w \le N; \quad \text{and} \quad Q_{w,w-1} = 0 \ \text{for } 1 \le w \le N-m.
$$

In summary, the modified generator $Q$ can be described as follows:

$$
Q =
\begin{bmatrix}
-r_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & r_0 \\
0 & -r_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & r_1 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & -r_{N-m} & 0 & \cdots & 0 & 0 & r_{N-m} \\
0 & 0 & \cdots & 0 & (N-m+1)f & -(N-m+1)f & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & (N-1)f & -(N-1)f & 0 \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & Nf & -Nf
\end{bmatrix}
\tag{5}
$$

The stationary probability vector $\boldsymbol{\pi}$ of $Q$ in (5) is found by solving the equations $\boldsymbol{\pi} Q = 0$, $\boldsymbol{\pi} e = 1$, and it is given explicitly by

$$
\pi_0 = \pi_1 = \pi_2 = \cdots = \pi_{N-m-1} = 0; \qquad \pi_N = \left( \frac{Nf}{r_{N-m}} + \sum_{j=N-m+1}^{N-1} \frac{N}{j} + 1 \right)^{-1};
$$

$$
\pi_{N-m} = \frac{Nf}{r_{N-m}}\, \pi_N; \qquad \pi_j = \frac{N}{j}\, \pi_N \ \text{ for } j = N-m+1, \ldots, N-1.
$$

2.3. Model 3: Instantaneous repair

This group maintenance model assumes that the repair time for replacing failed servers is negligible because of their standardized and modular designs. It also means that the replacement of all failed servers is finished immediately after the repair process is initiated. In real engineering applications, no physical system can replace its failed units in zero time. However, some fully modular multi-unit systems, such as internet service systems or telecommunication service systems, may be assumed to take negligible repair time for replacing failed servers or failed switchboards. Consequently, there are no server failures, customer arrivals, or customer services during the period of maintenance. The basic structure of this model is similar to that of model 2. The main difference is that there are no customer arrivals or customer services during the period of maintenance. To keep the same structure as the other group maintenance models developed in this paper, we build this model with an approximation approach, assuming that $r$ is much larger than the other transition rates. This model has exactly the same generator matrix $Q$ as in (5) of model 2, but its customer arrival and service rate diagonal matrices differ from those in (2):

$$
\Delta(\boldsymbol{\lambda}) = \operatorname{diag}(0, \ldots, 0, \lambda_{N-m+1}, \ldots, \lambda_N); \qquad \Delta(\boldsymbol{\mu}) = \operatorname{diag}(0, \ldots, 0, (N-m+1)\mu, \ldots, N\mu).
$$

By assuming a very large $r$, the stationary probability vector $\boldsymbol{\pi}$ of $Q$ in (5) is obtained by solving the equations $\boldsymbol{\pi} Q = 0$, $\boldsymbol{\pi} e = 1$, and it is given explicitly by

$$
\pi_0 = \pi_1 = \pi_2 = \cdots = \pi_{N-m-1} = \pi_{N-m} = 0; \qquad \pi_N = \left( \sum_{j=N-m+1}^{N-1} \frac{N}{j} + 1 \right)^{-1}; \qquad \pi_j = \frac{N}{j}\, \pi_N \ \text{ for } j = N-m+1, \ldots, N-1.
$$

All infinitesimal generators developed in models 1–3 follow the general form of the infinitesimal generator described in Neuts (1981). Therefore, this paper develops the related matrix geometric method to compute the steady state probability distribution for all three proposed models. The details of modeling and steady state analysis are presented in the next section.
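The closed-form stationary vectors given in Section 2 can be sanity-checked numerically. The Python sketch below rebuilds the failure/repair generator of (4) (Model 1) or (5) (Model 2), evaluates the closed forms, and measures how far $\boldsymbol{\pi} Q$ is from zero. All function names and the sample parameter values are assumptions introduced here for illustration; the repair rate uses $r_w = rc/(N - w)$ as stated in the text, the Model 1 vector is built from its product terms and then normalized, and Model 3 is treated as the large-$r$ limit of Model 2.

```python
def repair_rate(N, r, c, w):
    """System repair rate r_w = r*c/(N - w) with w operating servers (w < N)."""
    return r * c / (N - w)


def generator(N, m, f, r, c, failures_during_repair):
    """Failure/repair generator: Eq. (4) if failures_during_repair, else Eq. (5)."""
    Q = [[0.0] * (N + 1) for _ in range(N + 1)]
    for w in range(N + 1):
        repairing = w <= N - m                 # at least m servers have failed
        if w >= 1 and (failures_during_repair or not repairing):
            Q[w][w - 1] = w * f                # one more server fails
        if repairing:
            Q[w][N] = repair_rate(N, r, c, w)  # group repair restores all servers
        Q[w][w] = -sum(Q[w])                   # diagonal makes the row sum zero
    return Q


def pi_model1(N, m, f, r, c):
    """Closed form for Model 1: product terms relative to pi_0, then normalized."""
    pi = [1.0, repair_rate(N, r, c, 0) / f]
    for j in range(2, N + 1):
        if j <= N - m + 1:
            ratio = ((j - 1) * f + repair_rate(N, r, c, j - 1)) / (j * f)
        else:
            ratio = (j - 1) / j
        pi.append(pi[-1] * ratio)
    total = sum(pi)
    return [p / total for p in pi]


def pi_model2(N, m, f, r, c):
    """Closed form for Model 2."""
    r_nm = repair_rate(N, r, c, N - m)
    pi_N = 1.0 / (N * f / r_nm + sum(N / j for j in range(N - m + 1, N)) + 1.0)
    pi = [0.0] * (N + 1)
    pi[N] = pi_N
    pi[N - m] = N * f / r_nm * pi_N
    for j in range(N - m + 1, N):
        pi[j] = N / j * pi_N
    return pi


def pi_model3(N, m):
    """Closed form for Model 3 (instantaneous repair)."""
    pi_N = 1.0 / (sum(N / j for j in range(N - m + 1, N)) + 1.0)
    return [0.0] * (N - m + 1) + [N / j * pi_N for j in range(N - m + 1, N)] + [pi_N]


def residual(pi, Q):
    """Largest absolute entry of the row vector pi * Q."""
    n = len(pi)
    return max(abs(sum(pi[i] * Q[i][j] for i in range(n))) for j in range(n))


if __name__ == "__main__":
    N, m, f, r, c = 5, 2, 0.1, 1.0, 2          # illustrative values only
    print(residual(pi_model1(N, m, f, r, c), generator(N, m, f, r, c, True)))
    print(residual(pi_model2(N, m, f, r, c), generator(N, m, f, r, c, False)))
```

Both residuals should be at the level of floating-point rounding, and evaluating `pi_model2` with a very large `r` should approach `pi_model3`, which mirrors the approximation argument used for Model 3.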
3. Modeling and steady state analysis

This section models the problem described in Section 2 as a continuous Markov reward process and applies the matrix geometric method to compute the steady state distribution. To this end, let $x_t$ and $w_t$ denote the number of customers in the system and the number of operating servers at time $t$, respectively. Then $\{w_t, t \ge 0\}$ is a continuous time Markov Process with infinitesimal generator $Q$ that depends on the specific assumptions made on the repair process. Furthermore, $\{(w_t, x_t), t \ge 0\}$ is also a continuous time Markov Process, with infinitesimal generator $\tilde{Q}$. Let $\boldsymbol{\pi}$ denote the stationary distribution of $Q$ and $\boldsymbol{y}$ the stationary distribution of $\tilde{Q}$.

3.1. The related properties for steady state analysis

The steady state distribution is computed for all three models based on the different repair assumptions. The first thing to consider is whether the queue is stable in these three models. From the stability condition of QBD processes for the M/M/1 Queue in a Markovian environment given in Latouche and Neuts (1980, 1981), it follows that for the model described above, the queue is stable if and only if

$$
\boldsymbol{\pi} \boldsymbol{\lambda} < \boldsymbol{\pi} \boldsymbol{\mu},
\tag{6}
$$

where $\boldsymbol{\pi}$ is the stationary probability vector of $Q$ in (4) or (5), $\boldsymbol{\lambda} = [\lambda_0, \lambda_1, \lambda_2, \ldots, \lambda_N]$ is the vector of arrival rates, and $\boldsymbol{\mu} = [\mu_0, \mu_1, \mu_2, \ldots, \mu_N]$ is the vector of service rates. This is commonly referred

(2) Model 2:

$$
\boldsymbol{\pi} A_2 e > \boldsymbol{\pi} A_0 e \;\Rightarrow\; [0, 0, \ldots, \pi_{N-m}, \ldots, \pi_N] \operatorname{diag}(0, \mu, \ldots, (N-m)\mu, \ldots, N\mu)\, e > [0, 0, \ldots, \pi_{N-m}, \ldots, \pi_N] \operatorname{diag}(\lambda_0, \ldots, \lambda_N)\, e \;\Rightarrow\; \sum_{w=0}^{N} \lambda_w \pi_w < \mu \sum_{w=N-m}^{N} w \pi_w
$$

(3) Model 3:

$$
\boldsymbol{\pi} A_2 e > \boldsymbol{\pi} A_0 e \;\Rightarrow\; [0, 0, \ldots, \pi_{N-m+1}, \ldots, \pi_N] \operatorname{diag}(0, 0, \ldots, (N-m+1)\mu, \ldots, N\mu)\, e > [0, 0, \ldots, \pi_{N-m+1}, \ldots, \pi_N] \operatorname{diag}(0, 0, \ldots, \lambda_{N-m+1}, \ldots, \lambda_N)\, e \;\Rightarrow\; \sum_{w=N-m+1}^{N} \lambda_w \pi_w < \mu \sum_{w=N-m+1}^{N} w \pi_w
$$

□

The physical meaning of the stability condition $\boldsymbol{\pi} \boldsymbol{\lambda} < \boldsymbol{\pi} \boldsymbol{\mu}$ assumed in this model is that the weighted mean arrival rate should be less than the weighted mean service rate. It also means that the service capability of the operating servers can handle all customers waiting in the system, so that the number of customers in the queue does not grow to infinity. If the stability condition does not hold, the queue grows without bound and the stationary distribution cannot be obtained. In addition, the stability condition $\boldsymbol{\pi} \boldsymbol{\lambda} < \boldsymbol{\pi} \boldsymbol{\mu}$ can also be described by the stability index $\frac{\boldsymbol{\pi} \boldsymbol{\lambda}}{\boldsymbol{\pi} \boldsymbol{\mu}} < 1$. According to the result of Lemma 3, when the stability condition holds, the system is stable and the irreducible Markov Processes $\tilde{Q}$ of all three proposed models are positive recurrent. This property is the most important requirement for continuing the steady-state analysis of our continuous QBD processes. The following Theorem 3.1, which is a particular analogue of Theorem 1.7.1 and Theorem 3.1.1 given in Neuts (1981), is provided to
to as the ‘negative drift’ condition for stability in matrix-geometric
set up the required condition to obtain the stationary probability
solutions. (Neuts, 1981; Latouche & Ramaswami, 1999). In the con-
vector of this proposed Q . We shall subsequently assume that Lem-
tinuous QBD of our three proposed models, the similar stability
ma 3 holds, Theorem 3.1 then holds.
property holds, using the similar type of argument as the proof of
Throrem 7.2.4 given in Latouche and Ramaswami (1999).
Theorem 3.1. if the irreducible Markov Process Q is positive
recurrent, then
Lemma 3. The irreducible Markov Process Q is positive recurrent or
the queue is stable if and only if the following stability condition holds:
X
N XN XN XN (a) The minimal nonnegative matrix R satisfies the matrix-quaratic
ð1Þ Model 1 : k w pw < l wpw ð2Þ Model 2 : kw pw < l wpw equation
w¼0 w¼1 w¼Nm w¼Nm
X
N X
N  ÞÞ þ R2 Dðl
DðkÞ þ RðQ  Dðk þ l Þ ¼ 0
ð3Þ Model 3 : kw pw < l wpw
w¼Nmþ1 w¼Nmþ1 (b) The eigenvalues of R lie inside the unit disk.
P k
(c) B[R]e = 0, where the matrix B½R ¼ 1 k¼0 R Q k0
Proof. According to Theorem 7.2.4 given in Latouche and Ramasw- (d) There exists a positive vector y0 satisfied with y0 B[R] = 0, and
ami (1999), if the number of states N is finite and that the matrix normalized by y0(I  R)1e = 1
A = A0 + A1 + A2 is irreducible, the continuous time QBD is positive (e) Partitioning the stationary probability vector y  _of Q into the
recurrent if and only if (N + 1)-vectors y0, y1, y2, . . ., the N(N + 1)-vector (y0, y1,
p A2 e > p A0 e: ð7Þ y2, . . . , yN1) are first obtained by solving the system
It is clear that all infinitesimal generators developed in our three
y0 T 00 þ y1 T 10 ¼ 0
models, which have the form of Q in (1), are the continuous time
QBD processes. y DðkÞ þ y T x;1 þ y
x1 x xþ1 T xþ1;0 ¼ 0 for 1 6 x 6 N  2
Furthermore, the matrix Q in (4) and (5), which describe the yN2 DðkÞ þ yN1 ½T N1;1 þ RDðl
 Þ ¼ 0
failure/repair processes of our three models, are irreducible and the ð8Þ
X
N2
1
related number of states N is also finite. Therefore, depending on yx e þ yN1 ð1  RÞ e ¼ 1
different forms of Q , Q, 
k; l
 in model 1, model 2, and model 3 (7) x¼0
can be modified as below: then yx ¼ yN1 RxNþ1 for x P N  1; N > 2

(1) Model 1:
Proof.
p A2 e > p A0 e ) ½p0 ; p1 ; . . . ; pN Dðl Þe > ½p0 ; p1 ; . . . ; pN DðkÞe
) ½p0 ; p1 ; . . . ; pN diagð0; l; . . . ; NlÞe > ½p0 ; p1 ; . . . ; pN  (a) The proof of this first statement follows immediately from
X
N X
N the results of Theorem 1.7.1 and Theorem 3.1.1 given in
 diagðk0 ; . . . ; kN Þe ) kw pw < l wpw Neuts (1981) by using Dð kÞ to substitute for A0 ; Q  Dð

w¼0 w¼1 l Þ to substitute for A1 ; Dðl Þ to substitute for A2.

(b) The expected number of transitions before the first return to the level i, given that the process starts in the state (i, j), is finite if and only if the Markov Process $\tilde{Q}$ is positive recurrent. It is given by the jth component of the vector $\sum_{k=1}^{\infty} R^k e$, which is finite if and only if the matrix $\sum_{k=0}^{\infty} R^k = (I - R)^{-1}$ is finite, or equivalently the eigenvalues of R lie inside the unit disk.

(c) It can be easily verified that
\[ B[R]e = \sum_{k=0}^{\infty} R^k \tilde{Q}_{k0}\, e = -\sum_{k=0}^{\infty} R^k \sum_{v=0}^{k} A_v e = -\sum_{v=0}^{\infty} \sum_{k=v}^{\infty} R^k A_v e = -\left( \sum_{k=0}^{\infty} R^k \right) \left( \sum_{v=0}^{\infty} R^v A_v e \right) = -(I - R)^{-1} \left( \sum_{v=0}^{\infty} R^v A_v \right) e = 0 \]

(d) Clearly $y_0 B[R] = y_0 \sum_{k=0}^{\infty} R^k \tilde{Q}_{k0} = \sum_{k=0}^{\infty} y_0 R^k \tilde{Q}_{k0} = \sum_{k=0}^{\infty} y_k \tilde{Q}_{k0}$. Since $\bar{y}\tilde{Q} = 0$, $\sum_{k=0}^{\infty} y_k \tilde{Q}_{k0} = 0$ can be easily verified. The normalizing equation can be derived as follows:
\[ \sum_{x=0}^{\infty} y_x e = 1 \ \Rightarrow\ \sum_{x=0}^{\infty} y_0 R^x e = 1 \ \Rightarrow\ y_0 \sum_{x=0}^{\infty} R^x e = 1 \ \Rightarrow\ y_0 (I - R)^{-1} e = 1 \]
Finally, it can be concluded that the vector $y_0$ is strictly positive whenever $\tilde{Q}$ is positive recurrent.

(e) From $\bar{y}\tilde{Q} = 0$ and $\sum_{x=0}^{N-2} y_x e + y_{N-1}(I - R)^{-1} e = 1$, the vectors $y_0, y_1, \ldots, y_{N-1}$ can be easily obtained.

We next need to prove that $y_x = y_{N-1} R^{x-N+1}$ for $x \ge N-1$, $N > 2$. By conditioning on the time and the state of the last visit to the set N − 1, if there is such a visit, it follows that
\[ P^{(n)}_{x,j;x,j} = {}_{N-1}P^{(n)}_{x,j;x,j} + \sum_{v=1}^{m} \sum_{r=0}^{n} P^{(r)}_{x,j;N-1,v}\ {}_{N-1}P^{(n-r)}_{N-1,v;x,j}. \]
Adding these equations for n ranging from 1 to $n'$ and dividing the resulting sums by $n'$, as $n' \to \infty$, the left-hand side tends to $y_{x,j}$. Since the sum $\sum_{n=1}^{\infty} {}_{N-1}P^{(n)}_{x,j;x,j}$ is finite, the first term on the right-hand side tends to zero. The second term
\[ \sum_{v=1}^{m} \frac{1}{n'} \sum_{n=1}^{n'} \sum_{r=0}^{n} P^{(r)}_{x,j;N-1,v}\ {}_{N-1}P^{(n-r)}_{N-1,v;x,j} = \sum_{v=1}^{m} \frac{1}{n'} \sum_{r=0}^{n'} P^{(r)}_{x,j;N-1,v} \sum_{n=0}^{n'-r} {}_{N-1}P^{(n)}_{N-1,v;x,j} \]
tends to $\sum_{v=1}^{m} y_{N-1,v} \left(R^{x-N+1}\right)_{vj}$, since $\frac{1}{n'} \sum_{r=0}^{n'} P^{(r)}_{x,j;N-1,v}$ tends to $y_{N-1,v}$ and $\sum_{n=0}^{n'-r} {}_{N-1}P^{(n)}_{N-1,v;x,j}$ has the limit $\left(R^{x-N+1}\right)_{vj}$ for $1 \le v \le m$. This completes the proof of this statement. □

An additional accuracy check for obtaining R is provided by the following lemma:

Lemma 3.2. If the eigenvalues of R lie inside the unit disk and Ae = 0, where $A = \sum_{k=0}^{\infty} A_k$, then
\[ \text{(a)} \quad A_0 e = \sum_{k=1}^{\infty} R^k \sum_{v=k+1}^{\infty} A_v e \qquad \text{(b)} \quad R\bar{\mu} = \bar{\lambda} \]

Proof.

(a) From Theorem 3.1, there exists a minimal nonnegative matrix R that satisfies the matrix-quadratic equation $\sum_{k=0}^{\infty} R^k A_k = 0$ and whose eigenvalues lie inside the unit disk. Clearly, $\sum_{k=0}^{\infty} R^k A_k = 0 \Rightarrow A_0 + R\left(A - A_0 - \sum_{v=2}^{\infty} A_v\right) + \sum_{v=2}^{\infty} R^v A_v = 0$. By multiplying both sides of the equation by e, it can be shown that
\[
\begin{aligned}
& A_0 e + R\left(A - A_0 - \sum_{v=2}^{\infty} A_v\right) e + \sum_{v=2}^{\infty} R^v A_v e = 0 \ \Rightarrow\ A_0 e + RAe - RA_0 e - \sum_{v=2}^{\infty} R A_v e + \sum_{v=2}^{\infty} R^v A_v e = 0 \\
& \Rightarrow\ A_0 e - R A_0 e = \sum_{v=2}^{\infty} (R - R^v) A_v e \ \Rightarrow\ (I - R) A_0 e = \sum_{v=2}^{\infty} (R - R^v) A_v e
\end{aligned}
\]
Since the eigenvalues of R lie inside the unit disk, the above equation can be rewritten as follows:
\[ A_0 e = \left( \sum_{k=0}^{\infty} R^k \right) \sum_{v=2}^{\infty} (R - R^v) A_v e \ \Rightarrow\ A_0 e = \sum_{v=2}^{\infty} \sum_{k=1}^{v-1} R^k A_v e \ \Rightarrow\ A_0 e = \sum_{k=1}^{\infty} \sum_{v=k+1}^{\infty} R^k A_v e \]

(b) According to the definition of $\tilde{Q}$ in (1), it is trivial to show that
\[ A_0 e = \sum_{k=1}^{\infty} \sum_{v=k+1}^{\infty} R^k A_v e \ \Rightarrow\ \bar{\lambda} = R A_2 e \ \Rightarrow\ \bar{\lambda} = R\bar{\mu} \]
□

3.2. The analytic and algorithmic solutions

Following the results of Lemma 3, Theorem 3.1 and Lemma 3.2, the vector of steady state probabilities corresponding to x customers in the system can be expressed as $y_x = y_{N-1} R^{x-N+1}$, for x = N, N + 1, . . ., where matrix R is the solution to the following system of equations:

\[ \sum_{k=0}^{\infty} R^k A_k = 0, \qquad R\bar{\mu} = \bar{\lambda} \tag{9} \]

Furthermore, vectors $y_0, y_1, \ldots, y_{N-1}$ are computed using appropriate initial conditions. Therefore, the computation of the stationary distribution $\bar{y}$ entails, on the one hand, the solution of the above system of Eq. (9), and on the other hand, the computation of the initial vectors.

In the following, the special case N = m = 1 is considered first for all three models. Although this case is trivial from the point of view of policy design, it is sufficiently simple to admit analytic solutions and provides useful intuition. Afterwards the case of general N and m will be computed by a specific algorithm that depends on the model.

1. In the special case that N = 1 and m = 1 with analytic solutions

The same analytic solutions can be derived for all three models. The transition rate matrices Q in (4) and (5) take the same form

\[ Q = \begin{bmatrix} -cr & cr \\ f & -f \end{bmatrix} \tag{10} \]

The stationary distribution of Q is found by solving $\bar{\pi}Q = 0$ and $\bar{\pi}e = 1$, thus

\[ \pi_0 = \frac{f}{cr + f}, \qquad \pi_1 = \frac{cr}{cr + f} \tag{11} \]

The queue is stable ($\tilde{Q}$ is positive recurrent) if and only if

\[ \bar{\pi}\bar{\lambda} < \bar{\pi}\bar{\mu} \ \Rightarrow\ f\lambda_0 + cr\lambda_1 < cr\mu \tag{12} \]

Matrix R is the minimal solution of the equation $\sum_{k=0}^{\infty} R^k A_k = 0$, which takes the following form:

\[ D(\bar{\lambda}) + R\,[Q - D(\bar{\lambda} + \bar{\mu})] + R^2 D(\bar{\mu}) = 0 \]
\[
\Rightarrow\ \begin{cases}
R_{00}(cr + \lambda_0) - f R_{01} - \lambda_0 = 0 \\
\mu (R_{00} R_{01} + R_{01} R_{11}) + cr R_{00} - R_{01}(f + \lambda_1 + \mu) = 0 \\
R_{10}(cr + \lambda_0) - f R_{11} = 0 \\
\mu (R_{10} R_{01} + R_{11}^2) + cr R_{10} - R_{11}(f + \lambda_1 + \mu) + \lambda_1 = 0
\end{cases} \tag{13}
\]

In conjunction with the equation $R\bar{\mu} = \bar{\lambda}$, the following explicit solution for R is obtained. Specifically,

\[ R = \begin{bmatrix} \dfrac{\lambda_0 (f + \mu)}{\mu (cr + \lambda_0)} & \dfrac{\lambda_0}{\mu} \\[2ex] \dfrac{f \lambda_1}{\mu (cr + \lambda_0)} & \dfrac{\lambda_1}{\mu} \end{bmatrix} \tag{14} \]
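As a quick numerical sanity check of (10)–(14), the sketch below plugs illustrative rates into the closed form for R and confirms that it satisfies the matrix-quadratic equation, the accuracy check $R\bar{\mu} = \bar{\lambda}$ of Lemma 3.2, and the stability condition (12). The parameter values and the helper names (`mat_mul`, `mat_add`, `closed_form_r`, `quadratic_residual`) are assumptions chosen only for illustration; they do not come from the paper.

```python
# Sketch: check the N = m = 1 closed form (14) numerically.
# All parameter values and helper names are illustrative assumptions.

def mat_mul(a, b):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(*ms):
    """Entrywise sum of equally sized matrices."""
    return [[sum(m[i][j] for m in ms) for j in range(len(ms[0][0]))]
            for i in range(len(ms[0]))]

def closed_form_r(lam0, lam1, mu, f, cr):
    """Matrix R from (14); cr is the product c*r of the repair parameters."""
    return [[lam0 * (f + mu) / (mu * (cr + lam0)), lam0 / mu],
            [f * lam1 / (mu * (cr + lam0)), lam1 / mu]]

def quadratic_residual(r, lam0, lam1, mu, f, cr):
    """D(lambda) + R [Q - D(lambda + mu)] + R^2 D(mu); should be zero."""
    a0 = [[lam0, 0.0], [0.0, lam1]]                  # D(lambda)
    a1 = [[-cr - lam0, cr], [f, -f - lam1 - mu]]     # Q - D(lambda + mu)
    a2 = [[0.0, 0.0], [0.0, mu]]                     # D(mu) = diag(0, mu)
    return mat_add(a0, mat_mul(r, a1), mat_mul(mat_mul(r, r), a2))

lam0, lam1, mu, f, cr = 1.0, 1.0, 3.0, 0.1, 1.0      # assumed rates
R = closed_form_r(lam0, lam1, mu, f, cr)
residual = quadratic_residual(R, lam0, lam1, mu, f, cr)
max_residual = max(abs(x) for row in residual for x in row)

# Lemma 3.2(b): R * mu_vec = lambda_vec with mu_vec = (0, mu)^T.
r_times_mu = [R[0][1] * mu, R[1][1] * mu]

# Stability condition (12): f*lam0 + cr*lam1 < cr*mu.
stable = f * lam0 + cr * lam1 < cr * mu

print(max_residual, r_times_mu, stable)
```

With these assumed rates the residual is at rounding-error level, which is the behaviour expected of an exact solution of (13).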

The matrix geometric solution can now be expressed as $y_x = y_0 R^x$, for x = 1, 2, . . . . Furthermore, it can be shown that $\sum_{x=0}^{\infty} y_{xw} = \pi_w$, w = 0, . . . , N, or equivalently in vector form, $\sum_{x=0}^{\infty} y_x = \bar{\pi}$. Substituting $y_x = y_0 R^x$ into the previous equation gives $y_0 (I - R)^{-1} = \bar{\pi}$, from which it follows that

\[ y_0 = \bar{\pi}(I - R) \tag{15} \]

By substituting (11) and (14) into (15), we obtain the explicit result

\[ y_0 = \bar{\pi}(I - R) = \left[ \frac{f(\mu cr - \lambda_0 f - \lambda_1 cr)}{\mu (cr + \lambda_0)(cr + f)}, \quad \frac{\mu cr - \lambda_0 f - \lambda_1 cr}{\mu (cr + f)} \right] \tag{16} \]

Under the stability condition (12), $\mu cr - \lambda_0 f - \lambda_1 cr > 0$, so both components of $y_0$ are positive.

2. In the general case that N ≥ 2 with algorithmic solutions

It is not easy to obtain an analytic expression in the general case by solving the system of equations in (8) and (9) when the number of servers N, or the order of the system, becomes large.

For obtaining matrix R from the system of equations in (9), the computing steps are as follows:

First, the equation $A_0 + R A_1 + R^2 A_2 = 0$ is rewritten in the successive form $R = -A_0 A_1^{-1} - R^2 A_2 A_1^{-1}$. Then, the iterative method is applied to approximate R. In addition, the equation $R\bar{\mu} = \bar{\lambda}$ is used to keep the successive iterations within a compact set which contains the unique solution matrix.

The initial vectors $y_0, y_1, \ldots, y_{N-1}$ are computed by a specific algorithm that depends on the proposed model:

(1) Positive Repair (model 1 and model 2)

In some applications of these two models, the orders of the systems may be very high, but their particular structures allow efficient solution. Since the vector $\bar{\lambda}$ defined in these models is positive, the equations in (8) can be solved recursively, starting with the penultimate one. This leads to

\[ y_i = y_{N-1} H_i, \quad \text{for } 0 \le i \le N-2. \tag{17} \]

The matrices $H_i$ are defined as

\[
\begin{aligned}
& H_{N-1} = I, \\
& H_{N-2} = -(T_{N-1,1} + R A_2)\, D^{-1}(\bar{\lambda}), \\
& H_{N-m} = -(H_{N-m+1} T_{N-m+1,1} + H_{N-m+2} T_{N-m+2,0})\, D^{-1}(\bar{\lambda}), \quad \text{for } 3 \le m \le N.
\end{aligned} \tag{18}
\]

The vector $y_{N-1}$ is then obtained by solving the system

\[
\begin{aligned}
& y_{N-1} (H_0 T_{0,0} + H_1 T_{1,0}) = 0, \\
& y_{N-1} \left[ \sum_{m=0}^{N-2} H_m e + (I - R)^{-1} e \right] = 1.
\end{aligned} \tag{19}
\]

The computations in (17)–(19) can be implemented very efficiently in terms of computer memory. The recursive Eq. (18) can be computed by storing only two of the matrices $H_i$ at a time, while the vector $\sum_{m=0}^{N-2} H_m e$ is accumulated in each step. The system Eq. (19) can then be solved easily after the recursive computations are completed.

(2) Instantaneous repair (model 3)

Since some elements of the vector $\bar{\lambda}$ are zero in this model, it is impossible to implement the same approach used in (17)–(19) to solve the system Eq. (8). A more time-consuming, but safe, iterative method (Neuts, 1981) is used to solve this system. To do so, first let $D_0$ and $D_x$, $1 \le x \le N-1$, denote the matrices $-\mathrm{diag}(T_{00})$ and $-\mathrm{diag}(T_{x1})$, $1 \le x \le N-1$, respectively. Then the system of Eq. (8) may be written in the form

\[
\begin{aligned}
& y_0 = [y_0 (T_{00} + D_0) + y_1 T_{10}]\, D_0^{-1}, \\
& y_x = [y_{x-1} D(\bar{\lambda}) + y_x (T_{x1} + D_x) + y_{x+1} T_{x+1,0}]\, D_x^{-1}, \quad \text{for } 1 \le x \le N-2, \\
& y_{N-1} = \{ y_{N-2} D(\bar{\lambda}) + y_{N-1} [T_{N-1,1} + D_{N-1} + R D(\bar{\mu})] \}\, D_{N-1}^{-1}, \\
& \sum_{x=0}^{N-2} y_x e + y_{N-1} (I - R)^{-1} e = 1.
\end{aligned} \tag{20}
\]

The first three equations above can be solved with a successive iteration method. The last equation, which is the normalizing equation, may be used at each step to keep the successive iterations within a compact set that contains the unique solution vector.

4. The computation of cost function

This section considers the long-run expected average cost for the three different models:

4.1. Model 1: Positive repair time and server failures allowed during maintenance

The long-run expected average cost of this specific model can be computed by the following intermediate steps:

1. Expected number of customers in the system: First, denote the mean vector $M = [M_0, M_1, \ldots, M_N]$, where $M_w = L_w \pi_w$ and $L_w = E[\text{number of customers} \mid w \text{ servers working}]$, $0 \le w \le N$. From the results of the stability and steady state analysis, M can be expressed as follows:

\[ M = \sum_{x=1}^{\infty} x y_x = \sum_{x=1}^{N-2} x y_x + \sum_{x=N-1}^{\infty} x y_x, \quad \text{and} \]
\[
\begin{aligned}
\sum_{x=N-1}^{\infty} x y_x &= y_{N-1} \sum_{x=N-1}^{\infty} x R^{x-N+1} = y_{N-1} \sum_{j=0}^{\infty} (j + N - 1) R^j = y_{N-1} \sum_{j=0}^{\infty} j R^j + y_{N-1} (N-1) \sum_{j=0}^{\infty} R^j \\
&= y_{N-1} R \sum_{j=1}^{\infty} j R^{j-1} + y_{N-1} (N-1)(I - R)^{-1} = y_{N-1} R (I - R)^{-2} + y_{N-1} (N-1)(I - R)^{-1} \\
&= y_{N-1} (I - R)^{-2} [(N-1) I - (N-2) R]
\end{aligned}
\]

Therefore,

\[ M = \sum_{x=1}^{N-2} x y_x + y_{N-1} (I - R)^{-2} [(N-1) I - (N-2) R] \tag{21} \]

From the above results, the expected number of customers in the system is then given by Me.

2. Expected holding cost: Let h be the holding cost per unit time for each customer waiting in the system; then the expected holding cost is given by hMe.

3. Expected cycle time: Let $t_{ij}$ be the expected transition time from state i to j, and $t_{ii}$ the expected recurrence time of state i. In this model the cycle is the time between two successive repairs. Because the group maintenance is initiated when there are exactly N − m operating servers in the system, the cycle time can be defined as $t_{N-m,N-m}$. From matrix Q in (4) and the theory of Markov renewal processes (Ross, 1970),

\[ t_{j,N-m} = -\frac{1}{Q_{jj}} + \sum_{k \ne N-m,\ k \ne j} \frac{Q_{jk}}{-Q_{jj}}\, t_{k,N-m}, \quad \text{for } j = 0, 1, \ldots, N. \tag{22} \]

The above equations can be solved to obtain $t_{N-m,N-m}$.
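The first-passage equations (22) form a small linear system, so the cycle time can be sketched numerically. The generator used below is an assumed stand-in for Q in (4): it is rebuilt from the transition rates quoted for Model 1 in this section (failure rate wf out of state w, and repair to state N at rate cr/(N − w) from every state w ≤ N − m). The helper names are hypothetical, and the parameter values are those listed for Example 1 in Section 6.

```python
# Sketch: solve the first-passage equations (22) for the cycle time.
# The generator below is an assumed stand-in for Q in (4).

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def model1_generator(n, m, c, r, f):
    """Generator on states w = 0..n (operating servers), for 1 <= m <= n."""
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for w in range(n + 1):
        if w >= 1:
            q[w][w - 1] += w * f            # one more server fails
        if w <= n - m:
            q[w][n] += c * r / (n - w)      # group repair completes
        q[w][w] = -sum(q[w])                # rows of a generator sum to zero
    return q

def cycle_time(q, target):
    """Expected recurrence time t[target, target] from equations (22)."""
    size = len(q)
    a = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for j in range(size):
        rate_out = -q[j][j]
        a[j][j] = 1.0
        b[j] = 1.0 / rate_out
        for k in range(size):
            if k != j and k != target:
                a[j][k] = -q[j][k] / rate_out
    return solve_linear(a, b)[target]

N, c, r, f = 8, 4, 1.0, 0.1                 # Example 1 parameters
times = {m: cycle_time(model1_generator(N, m, c, r, f), N - m)
         for m in (1, 2, 8)}
print({m: round(t, 4) for m, t in times.items()})
```

Under these assumptions the computed values agree, to the printed precision of the table, with the Model 1 cycle times 1.549, 3.258 and 29.1786 reported for m = 1, 2 and 8 in Example 1.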

4. Expected variable cost of repair:
This part only considers the transition mechanism for the number of operating servers w from starting the repair process to actually finishing it. Group maintenance is started when the state is w = N − m. The next transition will be either to state N if maintenance is finished, or to state N − m − 1 if another working server fails, whichever happens first. If maintenance is finished first, then this cycle ends and a new cycle begins. Otherwise, the current cycle will continue until the group maintenance is finished. Therefore, during a transition cycle the state w varies between 0 and N − m. From any state w between 1 and N − m, the next transition can be either to N with transition rate cr/(N − w), or to w − 1 with transition rate wf. From state w = 0, the only possible transition is to state N with transition rate cr/N. When w = N, a new cycle begins. The corresponding transition flow diagram is shown in Fig. 1.

From the above transition mechanism, the probability of the number of repaired servers for each group maintenance cycle is derived as follows:

\[
\begin{aligned}
& p(m \text{ servers repaired}) = \frac{cr/m}{cr/m + (N-m)f}, \\
& p(i \text{ servers repaired}) = \left[ \prod_{j=m}^{i-1} \frac{(N-j)f}{(cr/j) + (N-j)f} \right] \cdot \frac{cr/i}{cr/i + (N-i)f} \quad \text{for } i = m+1, \ldots, N
\end{aligned} \tag{23}
\]

Then the expected variable cost of repair can be obtained as follows: $\mathrm{var\_rep\_cost}(m) = \mathrm{r\_cost} \cdot \sum_{i=m}^{N} i\, p[i \text{ servers repaired}]$, where r_cost is the variable repair cost per failed server.

5. Expected average cost:
In this model, several costs are incurred within each cycle. These are the fixed repair cost, the variable repair cost, and the holding cost. By applying the above steps, the expected average cost can be expressed as a function of the parameter m as follows:

\[ \mathrm{avg\_c}(m) = \frac{S}{t_{N-m,N-m}} + h \cdot Me + \frac{\mathrm{var\_rep\_cost}(m)}{t_{N-m,N-m}} \tag{24} \]

This equation can be used to determine the optimal group replacement parameter m.

4.2. Model 2: Positive repair time and no server failures allowed during maintenance

This model has a computing procedure similar to that of model 1, but with a few differences as follows:

1. Expected cycle time:
Because there is no server failure during the period of maintenance, the time between two successive repairs includes the time spent on repair and on the failure process from w = N to w = N − m + 1. Therefore

\[ t_{N-m,N-m} = \frac{1}{r_{N-m}} + \sum_{i=N+1-m}^{N} \frac{1}{if} \tag{25} \]

2. Expected variable repair cost:
The group maintenance is started when the state w is N − m. Since there are no server failures during the period of maintenance in this model, the number of repaired servers for each group maintenance action must be m. Then the expected variable repair cost can be obtained as $\mathrm{var\_rep\_cost}(m) = m \cdot \mathrm{r\_cost}$.

4.3. Model 3: Instantaneous repair

This model has a computing procedure similar to that of model 1, but with a few differences as follows:

1. Expected cycle time:
Because of instantaneous replacement, the time between two successive repairs only includes the time spent on the failure process from w = N to w = N + 1 − m. Therefore

\[ t_{N-m,N-m} = \sum_{i=N+1-m}^{N} \frac{1}{if} \tag{26} \]

2. Expected variable cost of repair:
The group maintenance is activated when the state w is equal to N − m. Because of the instantaneous repair, the group maintenance must be finished before any other transition takes place. Hence the number of repaired servers for each group replacement must be m. Then the expected variable cost of repair can be obtained as follows: $\mathrm{var\_rep\_cost}(m) = m \cdot \mathrm{r\_cost}$.

5. The properties of the cost function

In this section, to facilitate the proofs, it is assumed that $\lambda_0 = \lambda_1 = \cdots = \lambda_N$ for model 1 and model 2, and that $\lambda_0 = \lambda_1 = \cdots = \lambda_{N-m} = 0$, $\lambda_{N-m+1} = \cdots = \lambda_N$ for model 3.

Lemma 5.1. If m = 0, then $t_{N-m,N-m} = 0$.

Proof.

(1) Model 1
From Eq. (22), it directly follows that
\[ t_{N-m,N-m} = -\frac{1}{Q_{N-m,N-m}} + \sum_{k \ne N-m} \frac{Q_{N-m,k}}{-Q_{N-m,N-m}}\, t_{k,N-m}. \]
By substituting m = 0 in the previous equation, it is trivial to obtain
\[ t_{N,N} = -\frac{1}{Q_{N,N}} + \sum_{k \ne N} \frac{Q_{N,k}}{-Q_{N,N}}\, t_{k,N}. \]
According to Q in (4), it then follows:
\[ t_{N,N} = \frac{1}{Nf + r_N} + \frac{Nf}{Nf + r_N}\, t_{N-1,N}. \]
Since $r_N = cr/0 = \infty$, it can be easily verified that $t_{N,N} = 0$.

(2) Model 2
By substituting m = 0 into Eq. (25), it follows that $t_{N,N} = \frac{1}{r_N} + \sum_{i=N+1}^{N} \frac{1}{if}$. According to Q in (5), it then follows that $t_{N,N} = \frac{1}{r_N}$. Since $r_N = cr/0 = \infty$, it can be verified that $t_{N,N} = 0$.

(3) Model 3
From Eq. (26), it can be shown that
\[ t_{N-m,N-m} = \sum_{i=N+1-m}^{N} \frac{1}{if}. \]
It is obvious that $t_{N-m,N-m} = 0$ when m = 0. This completes the proof. □

Corollary 5.2. If m = 0, then the expected average cost avg_c(m) = ∞.

Proof. Substituting m = 0 in (24) for all three models, it is easy to show that avg_c(0) = ∞ according to the results of Lemma 5.1. □
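The cost expressions (23)–(26) and the limiting behaviour stated in Lemma 5.1 can be checked with a short sketch. The function names are hypothetical; the repair rate $r_{N-m}$ is taken as cr/m, consistent with the rate cr/(N − w) used in this section; and the mean numbers of customers Me and the Model 1 cycle time are not computed here but copied from the Example 1 table in Section 6.

```python
# Sketch of the cost expressions (23)-(26); helper names are illustrative.

def repair_count_distribution(n, m, c, r, f):
    """Model 1, Eq. (23): P(i servers repaired), i = m..n, for 1 <= m <= n."""
    probs = {}
    still_failing = 1.0                   # prob. that failures outran repair so far
    for i in range(m, n + 1):
        repair_rate = c * r / i           # repair rate with i failed servers
        fail_rate = (n - i) * f           # failure rate of the n - i working servers
        total = repair_rate + fail_rate
        probs[i] = still_failing * repair_rate / total
        still_failing *= fail_rate / total
    return probs

def variable_repair_cost_model1(n, m, c, r, f, r_cost):
    """r_cost times the expected number of servers repaired per cycle."""
    probs = repair_count_distribution(n, m, c, r, f)
    return r_cost * sum(i * p for i, p in probs.items())

def failure_phase_time(n, m, f):
    """Sum over i = n+1-m..n of 1/(i f): mean time to accumulate m failures."""
    return sum(1.0 / (i * f) for i in range(n + 1 - m, n + 1))

def cycle_time_model2(n, m, c, r, f):
    """Eq. (25) with r_{N-m} = c r / m, i.e. a repair phase of mean m / (c r)."""
    return m / (c * r) + failure_phase_time(n, m, f)

def cycle_time_model3(n, m, f):
    """Eq. (26): instantaneous repair, only the failure phase remains."""
    return failure_phase_time(n, m, f)

def average_cost(setup_cost, var_cost, holding_cost, mean_customers, cycle_time):
    """Eq. (24): (S + var_rep_cost) / t + h * Me."""
    return (setup_cost + var_cost) / cycle_time + holding_cost * mean_customers

N, c, r, f = 8, 4, 1.0, 0.1               # Example 1 parameters
S, r_cost, h = 2500.0, 1000.0, 200.0

# Model 2, m = 6: Me = 1.9106 is the tabulated mean number of customers.
t2 = cycle_time_model2(N, 6, c, r, f)
cost2 = average_cost(S, 6 * r_cost, h, 1.9106, t2)

# Model 1, m = 1: Me = 1.6699 and t = 1.549 are taken from the same table.
probs1 = repair_count_distribution(N, 1, c, r, f)
v1 = variable_repair_cost_model1(N, 1, c, r, f, r_cost)
cost1 = average_cost(S, v1, h, 1.6699, 1.549)

print(round(t2, 4), round(cost2), round(cost1))
```

With these inputs the sketch gives values that round to the tabulated entries 13.6786, 1004 and 2720 of Example 1, and `cycle_time_model2(N, 0, c, r, f)` evaluates to zero, in line with Lemma 5.1.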

Lemma 5.3. $t_{N-m,N-m}$ is increasing in m.

Proof.

(1) Model 1. It suffices to show that $t_{N-m-1,N-m-1} - t_{N-m,N-m} \ge 0$. From Eq. (22) and the transition flow for the group maintenance process in Fig. 1, it follows that

\[ t_{N-m,N-m} = \frac{1}{(N-m)f + \frac{cr}{m}} + \frac{(N-m)f}{(N-m)f + \frac{cr}{m}}\, t_{N-m-1,N-m} + \frac{\frac{cr}{m}}{(N-m)f + \frac{cr}{m}}\, t_{N,N-m} \]

and $t_{N-m-1,N-m-1} = t_{N-m-1,N-m} + t_{N-m,N-m-1} \ge t_{N-m-1,N-m} + \frac{1}{(N-m)f}$. Therefore,

\[
\begin{aligned}
t_{N-m-1,N-m-1} - t_{N-m,N-m} &\ge t_{N-m-1,N-m} + \frac{1}{(N-m)f} - t_{N-m,N-m} \\
&= \frac{\frac{cr}{m}}{(N-m)f + \frac{cr}{m}}\, t_{N-m-1,N-m} - \frac{\frac{cr}{m}}{(N-m)f + \frac{cr}{m}}\, t_{N,N-m} + \frac{1}{(N-m)f} - \frac{1}{(N-m)f + \frac{cr}{m}}
\end{aligned}
\]

It is clear that $t_{N-m-1,N-m-1} - t_{N-m,N-m} \ge 0$.

(2) Model 2. From (25), it follows that $t_{N-m,N-m} = \frac{1}{r_{N-m}} + \sum_{i=N+1-m}^{N} \frac{1}{if}$. It is clear that $t_{N-m,N-m}$ is increasing in m.

(3) Model 3. From (26), it follows that $t_{N-m,N-m} = \sum_{i=N+1-m}^{N} \frac{1}{if}$. It is obvious that $t_{N-m,N-m}$ is increasing in m. The proof is completed. □

Lemma 5.4. The stability index is increasing in m.

Proof. By definition, the stability index is equal to $\bar{\pi}\bar{\lambda} \big/ \left( \mu \sum_{w=1}^{N} w \pi_w \right)$. According to the structure of the generator Q in (4), when m increases, the system spends more time in states with lower w. Hence, $\mu \sum_{w=1}^{N} w \pi_w$ becomes smaller while the value of $\bar{\pi}\bar{\lambda}$ stays the same and equal to λ. Therefore, the stability index is increasing in m for all three models. □

Theorem 5.5. There exists an optimal group maintenance parameter 1 ≤ m ≤ N, for which the expected average cost is minimized.

Proof. From the result of Corollary 5.2, the average cost function satisfies avg_c(m) = ∞ when m = 0. On the other hand, if m > N (which means that the group maintenance is never implemented), the average number of customers in the system Me tends to infinity. Then the average holding cost h · Me = ∞, and the expected average cost avg_c(m) goes to infinity as well. From (22), it is trivial that $t_{N,N} = 0 < t_{N-m,N-m}$ where 1 ≤ m ≤ N. Hence, avg_c(m) ≤ avg_c(0) = avg_c(N + 1) = ∞ where 1 ≤ m ≤ N. Therefore, for all three models, there exists an optimal group maintenance parameter 1 ≤ m ≤ N, which minimizes the average cost. □

6. Numerical examples and result analysis

This section presents numerical examples and the related result analysis for the three proposed group maintenance models, which can be applied to different types of multi-unit systems. Example 1 is the basic example used to illustrate all three models in this section:

Example 1. N = 8, c = 4, λ_w = 5, w = 0, . . . , 8, μ = 3, f = 0.1, r = 1, S = 2500, r_cost = 1000, h = 200

Model   m   πλ/πμ    Average number of customers   t_{N−m,N−m}   Expected average cost
1       1   0.2158   1.6699                        1.549         2720
1       2   0.2343   1.6767                        3.258         1814
1       3   0.2561   1.6934                        5.1896        1471
1       4   0.2829   1.7334                        7.4413        1272
1       5   0.3175   1.8311                        10.1815       1137
1       6   0.3656   2.0928                        13.7449       1056
1       7   0.4422   3.09                          18.9658       1127
1       8   0.6079   14.4461                       29.1786       3249
2       1   0.2128   1.6668                        1.5           2667
2       2   0.2303   1.6674                        3.1786        1749
2       3   0.2516   1.6699                        5.0952        1413
2       4   0.2782   1.6803                        7.3452        1221
2       5   0.313    1.723                         10.0952       1088
2       6   0.3619   1.9106                        13.6786       1004
2       7   0.4397   2.9137                        18.9286       1085
2       8   0.6079   14.4461                       29.1786       3249
3       1   0.2083   1.6634                        1.2525        3127
3       2   0.2232   1.6639                        2.6836        2010
3       3   0.2414   1.6651                        4.3527        1597
3       4   0.2644   1.6696                        6.3552        1357
3       5   0.2948   1.688                         8.8577        1184
3       6   0.3383   1.7725                        12.1936       1052
3       7   0.409    2.3457                        17.1961       1022
3       8   0.5662   11.8507                       27.1986       2756

According to the results of the above example, the stability rate is increasing in m, the number of customers in the system is increasing in m, the transition time of a cycle is increasing in m, and the expected average cost is convex in m for all three models.

To compare the three proposed models, the following results can be concluded:

First, the optimal threshold on the number of failed servers to initiate the repair process is 6 for model 1, 6 for model 2, and 7 for model 3. Secondly, for every specific threshold m, model 3 has the lowest stability rate, then model 2, then model 1. Thirdly, for every specific threshold m, model 3 has the fewest customers in the system, then model 2, then model 1. Furthermore, for every specific threshold m, model 3 has the shortest transition time of a cycle, then model 2, then model 1. Finally, the optimal expected average cost is 1056 for model 1, 1004 for model 2, and 1022 for model 3.

The foregoing results are anticipated, because when no server failures, customer arrivals, and customer services are allowed during the period of maintenance, the system spends more time in states with more servers working, which means a lower stability rate, fewer customers waiting in the system, a shorter cycle transition time, and later group maintenance initiation.

In the following, the numerical analysis is performed by changing the value of one specific parameter at a time, based on Example 1.

6.1. Change arrival rate λ only

The following example examines the sensitivity of the model to the arrival rate, and tries to find some monotonic properties of different performance measures for all three models.

Example 2. N = 8, c = 4, λ_w, w = 0, . . . , 8, μ = 3, f = 0.1, r = 1, S = 2500, r_cost = 1000, h = 200

Model   λ_w, optimal m    πλ/πμ    Average number of customers   t_{N−m,N−m}   Expected average cost
1       λ_w = 4, m = 7    0.3537   2.0603                        18.9658       921
1       λ_w = 5, m = 6    0.3656   2.0928                        13.7449       1056
1       λ_w = 6, m = 6    0.4388   2.7417                        13.7449       1186
1       λ_w = 7, m = 5    0.4445   2.8084                        10.1815       1332
2       λ_w = 4, m = 7    0.3518   1.9231                        18.9286       887
2       λ_w = 5, m = 6    0.3619   1.9106                        13.6786       1004
2       λ_w = 6, m = 6    0.4342   2.4904                        13.6786       1119
2       λ_w = 7, m = 5    0.4382   2.5718                        10.0952       1257
3       λ_w = 3, m = 7    0.2454   1.0959                        17.1961       772
3       λ_w = 4, m = 7    0.3272   1.6082                        17.1961       874
3       λ_w = 5, m = 7    0.409    2.3457                        17.1961       1022
3       λ_w = 6, m = 6    0.406    2.2344                        12.1936       1144

According to the results in Example 2, the optimal group maintenance parameter m is decreasing in λ and the optimal expected average cost is increasing in λ for all three models. This is anticipated as well, because when the arrival rate is high, the queue size will be higher on average, which means earlier maintenance is necessary. Therefore, it can be suggested that a higher arrival rate calls for earlier group maintenance for all three proposed group maintenance models developed in this paper. There are also the following outcomes, which are not shown in the above table: the stability rate, the number of customers in the system and the expected average cost are increasing in the arrival rate λ for every specific m, but the transition time of a cycle is the same for all λ for every specific m.

6.2. Change setup cost S only

The next example examines the sensitivity of the model to the setup cost, and tries to find some monotonic properties of different performance measures for all three models.

Example 3. N = 8, c = 4, λ_w = 5, w = 0, . . . , 8, μ = 3, f = 0.1, r = 1, r_cost = 1000, h = 200

Model   S, optimal m         πλ/πμ    Average number of customers   t_{N−m,N−m}   Expected average cost
1       S = 2000, m = 6      0.3656   2.0928                        13.7449       1020
1       S = 2500, m = 6      0.3656   2.0928                        13.7449       1056
1       S = 6500, m = 7      0.4422   3.09                          18.9658       1338
2       S = 2000, m = 6      0.3619   1.9106                        13.6786       967
2       S = 2500, m = 6      0.3619   1.9106                        13.6786       1004
2       S = 7000, m = 7      0.4397   2.9137                        18.9286       1322
3       S = 0, m = 6         0.3383   1.7725                        12.1936       847
3       S = 100, m = 6       0.3383   1.7725                        12.1936       855
3       S = 2000, m = 7      0.409    2.3457                        17.1961       993
3       S = 2500, m = 7      0.409    2.3457                        17.1961       1022
3       S = 7000, m = 7      0.409    2.3457                        17.1961       1458
3       S = 100000, m = 8    0.5662   11.8507                       27.1986       6341

From the results in Example 3, it is evident that the optimal expected average cost is increasing in S and the optimal group replacement parameter m is increasing in S for all three models. This is intuitively expected, because when S is high, it is not economical to perform the group maintenance too early. Accordingly, it can be advised that a higher setup cost calls for later group maintenance for all three proposed models.

6.3. Change variable repair cost r_cost only

The following example examines the sensitivity of the three proposed models to the variable repair cost, and tries to find some monotonic properties of different performance measures.

Example 4. N = 8, c = 4, λ_w = 5, w = 0, . . . , 8, μ = 3, f = 0.1, r = 1, S = 2500, h = 200

Model   r_cost, optimal m       πλ/πμ    Average number of customers   t_{N−m,N−m}   Expected average cost
1       r_cost = 500, m = 6     0.3656   2.0928                        13.7449       828
1       r_cost = 1000, m = 6    0.3656   2.0928                        13.7449       1056
1       r_cost = 3000, m = 7    0.4422   3.09                          18.9658       1881
2       r_cost = 500, m = 6     0.3619   1.9106                        13.6786       784
2       r_cost = 1000, m = 6    0.3619   1.9106                        13.6786       1004
2       r_cost = 3000, m = 7    0.4397   2.9137                        18.9286       1824
3       r_cost = 100, m = 6     0.3383   1.7725                        12.1936       609

Example 4 (continued)

Model   r_cost, optimal m        πλ/πμ    Average number of customers   t_{N−m,N−m}   Expected average cost
3       r_cost = 1000, m = 7     0.409    2.3457                        17.1961       1022
3       r_cost = 30000, m = 8    0.5662   11.8507                       27.1986       11286

According to the results of the previous table, it is very clear that the optimal expected average cost is increasing in r_cost and the optimal group maintenance parameter m is also increasing in r_cost for all three cases. It is intuitive that when r_cost is high, the average cost of performing the group maintenance becomes high as well. Therefore, it is more economical to perform the group maintenance after more server failures. It can be concluded that a higher variable repair cost calls for later group maintenance for all three proposed models.

6.4. Change holding cost h only

The final example performs a sensitivity analysis on the holding cost and tries to find some monotonic properties of different performance measures for the three proposed models.

Example 5. N = 8, c = 4, λ_w = 5, w = 0, . . . , 8, μ = 3, f = 0.1, r = 1, S = 2500, r_cost = 1000

Model   h, optimal m      πλ/πμ    Average number of customers   t_{N−m,N−m}   Expected average cost
1       h = 100, m = 7    0.4422   3.09                          18.9658       818
1       h = 200, m = 6    0.3656   2.0928                        13.7449       1056
1       h = 600, m = 5    0.3175   1.8311                        10.1815       1869
2       h = 100, m = 7    0.4397   2.9137                        18.9286       793
2       h = 200, m = 6    0.3619   1.9106                        13.6786       1004
2       h = 700, m = 5    0.313    1.723                         10.0952       1949
3       h = 10, m = 8     0.5662   11.8507                       27.1986       505
3       h = 200, m = 7    0.409    2.3457                        17.1961       1022
3       h = 400, m = 6    0.3383   1.7725                        12.1936       1406

By observing the above table, it is very clear that the optimal expected average cost is increasing in h, and the optimal group maintenance parameter m is decreasing in h for all three cases. It is intuitive that when h is higher, the average holding cost is also higher. Therefore, it is more economical to perform the group maintenance earlier. It can be suggested that a higher holding cost calls for earlier group maintenance for all three proposed models.

convex in m for all three models. Combining the theoretical results in Section 5 and the numerical results in this section, it can be concluded that there exists a unique m that minimizes the expected average cost for all three proposed group maintenance models developed in this paper.

For comparisons of the three proposed models, Example 1 clearly shows the trend of decreasing stability rate, number of customers in the system, and cycle transition time in the order of model 1, 2, and then 3. Of course, the applied maintenance policy should be designed appropriately according to the characteristics of the system. But in cases where a system can carry out the repair processes of all three proposed group maintenance models, from the aspect of running the service/production process more smoothly, it can be proposed that group maintenance model 1 be the first choice for running the repair process; if the number of customers in the system keeps getting too high, by additionally checking the functional servers during the maintenance process, group maintenance model 1 can be turned into model 2 to improve this situation; moreover, by greatly increasing the repair rate of the maintenance process, group maintenance model 2 can be transformed into model 3 to further decrease the number of customers in the system. In addition, all 15 examples also show that the group maintenance threshold for model 3 is greater than or equal to those for the other two models; it can be concluded that the instantaneous repair model can perform group maintenance later than the other two positive repair models because of its very quick, even instantaneous, repair process.

7. Conclusion

This paper first formulates the group maintenance problem, which handles the failure/repair process of servers and the arrival/service process of customers simultaneously for M/M/N unreliable service systems. There are three variations in our proposed group maintenance models. The first one is the basic model with positive repair time, which allows server failures during maintenance. The second one is a modified model with positive repair time, which does not allow server failures during maintenance. The last one is a modified model with instantaneous repair. Compared with the traditional group replacement models, the three group maintenance models developed in this paper provide decision makers with a better tool to design an appropriate maintenance policy according to the characteristics of the service system and the random environment in the real world.

According to the theoretical results of Theorem 3.1 and Lemma 3.2, the matrix geometric solutions of the steady state probability distribution of the system are calculated by applying two specific algorithms to solve the related linear equations, depending on the three different proposed models. Furthermore, the expected number of customers in the system and the expected average cost functions depending on the group maintenance policy parameter m are developed based on the above matrix geometric solutions of the three proposed models.

Three m-failure group maintenance policies are built where the repair is initiated as soon as the number of failed servers reaches a predetermined threshold. For the theoretical analysis, this paper first shows that the transition time of a cycle and the stability rate are both increasing in m, and then proves that there exists an optimal group maintenance parameter m, which can be used to find the minimal average cost for all three models.

All numerical results obtained so far indicate that the stability
rate, the amount of customers in system, and the transition time
Finally, these numerical results of all 15 examples completely of a cycle are increasing in m, and the expected average cost is con-
demonstrate the properties of the theoretical results proved in Sec- vex in m for all three models. Therefore, there exists a unique m to
tion 5, which include the stability rate increasing in m, the transi- minimize the expected average cost. These numerical results com-
tion time of a cycle increasing in m, and the expected average cost pletely demonstrate the properties of the above theoretical results.
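The monotonic trends reported for Example 5 can be checked directly against the tabulated figures. The following is a minimal sketch, not part of the original analysis: the tuples are copied from the Example 5 table, the helper names (`strictly_increasing`, `check_trends`) are hypothetical, and the third numeric column is assumed to be the transition time of a cycle. Because the table lists only the optimal m for each holding cost h, the trend in m is read across rows with different h.

```python
# Rows copied from the Example 5 table:
# (h, optimal m, stability rate, mean customers, cycle time, expected average cost)
example5 = {
    "Model 1": [(100, 7, 0.4422, 3.09, 18.9658, 818),
                (200, 6, 0.3656, 2.0928, 13.7449, 1056),
                (600, 5, 0.3175, 1.8311, 10.1815, 1869)],
    "Model 2": [(100, 7, 0.4397, 2.9137, 18.9286, 793),
                (200, 6, 0.3619, 1.9106, 13.6786, 1004),
                (700, 5, 0.313, 1.723, 10.0952, 1949)],
    "Model 3": [(10, 8, 0.5662, 11.8507, 27.1986, 505),
                (200, 7, 0.409, 2.3457, 17.1961, 1022),
                (400, 5, 0.3383, 1.7725, 12.1936, 1406)],
}


def strictly_increasing(values):
    """Return True if each value is strictly larger than the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))


def check_trends(rows):
    """Check the monotonic claims for one model's rows."""
    by_h = sorted(rows, key=lambda r: r[0])  # ascending holding cost
    by_m = sorted(rows, key=lambda r: r[1])  # ascending threshold m
    return {
        "cost_increasing_in_h": strictly_increasing([r[5] for r in by_h]),
        "m_decreasing_in_h": strictly_increasing([-r[1] for r in by_h]),
        "stability_increasing_in_m": strictly_increasing([r[2] for r in by_m]),
        "customers_increasing_in_m": strictly_increasing([r[3] for r in by_m]),
        "cycle_time_increasing_in_m": strictly_increasing([r[4] for r in by_m]),
    }


results = {model: check_trends(rows) for model, rows in example5.items()}
for model, checks in results.items():
    print(model, checks)
```

Every check evaluates to True for all three models, which matches the trends stated in the text for this example; it says nothing about parameter settings outside the table.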

This paper also studies the sensitivity of the optimal m value and the average cost with respect to system parameters such as the arrival rate, setup cost, variable repair cost, and holding cost. In comparing the three proposed models, the results clearly show that the stability rate, the number of customers in the system, and the cycle transition time decrease in the order of model 1, model 2, and model 3. It is also shown that the group maintenance threshold for the instantaneous repair model is greater than or equal to those for the two positive repair time models. These results give decision makers greater insight into implementing different group maintenance policies at different levels of the system parameters.

In practical engineering applications, according to the results derived from the mathematical analysis and numerical examples in this paper, model 1 and model 2 can be applied to multi-unit systems such as a typical automated manufacturing system, electric power system, nuclear power system, or submarine navigation system, depending on whether their normally operating servers are checked to make sure they will stay in operation during the maintenance process. Model 3 can be applied to fully modular multi-unit systems, such as an internet service system or a telecommunication service system, in which replacing failed servers or failed switchboards is assumed to take negligible repair time.
