
Reliability Engineering and System Safety 67 (2000) 61–73

www.elsevier.com/locate/ress

A Monte Carlo methodological approach to plant availability modeling with maintenance, aging and obsolescence
E. Borgonovo, M. Marseguerra*, E. Zio
Dipartimento di Ingegneria Nucleare, Politecnico di Milano, Via Ponzio 34/3, 20133 Milan, Italy
Received 14 March 1999; accepted 23 June 1999

Abstract
In this paper we present a Monte Carlo approach for the evaluation of plant maintenance strategies and operating procedures under
economic constraints. The proposed Monte Carlo simulation model provides a flexible tool which enables one to describe many of the
relevant aspects for plant management and operation such as aging, repair, obsolescence, renovation, which are not easily captured by
analytical models. The maintenance periods are varied with the age of the components. Aging is described by means of a modified Brown–
Proschan model of imperfect (deteriorating) repair which accounts for the increased proneness to failure of a component after it has been
repaired. A model of obsolescence is introduced to evaluate the convenience of substituting a failed component with a new, improved one.
The economic constraint is formalized in terms of an energy, or cost, function; optimization studies are then performed using the
maintenance period as the control parameter. © 1999 Elsevier Science Ltd. All rights reserved.
Keywords: Monte Carlo simulation; Periodic maintenance; Aging; Obsolescence; Availability; Energy function; Optimization

1. Introduction

Managing an industrial plant entails evaluating and trading off the conflicting objectives of safe operation and economic service. For this reason, the relevant reliability studies which assess the reliability, availability and safety levels of operation under the given maintenance and repair strategies should be coupled with economic analyses aiming at reducing the plant associated costs of downtime, maintenance and repair.

The first scientific approaches to this management problem date back to the 1950s and 1960s and can be found in the review paper by McCall [1] and in the book by Barlow and Proschan [2]. As a result, various so-called maintenance optimization models were introduced in which both costs and benefits of maintenance were quantified and an optimum compromise between the two was sought. Well-known models originating from this period are the so-called age and the block replacement models.

From the practical point of view, at that time, preventive maintenance was strongly advocated as a means to reduce failures and unplanned downtime. In many companies, large time-based preventive maintenance programs were set up. In the 1970s, condition monitoring became a popular approach for predicting component failures using physical information on the actual state of the operating equipment. In many instances, this approach proved more effective than the large preventive maintenance programs: detailed studies of the mechanisms of component failures resulted in better, more reliable designs.

More recently, a very successful systematic method for establishing maintenance programs, the Reliability Centered Maintenance (RCM) method, has started to break through in many industries [3]. This method directs maintenance efforts towards those parts and units which are critical from the point of view of reliability, safety and production regularity. A decision logic and specific forms are used to identify the worthwhile maintenance activities. The approach is more qualitative than the optimization models but it is more all-embracing than models which have only a limited capability [4]. Extensions to classical RCM are being investigated aiming at introducing maintenance optimization by means of cost and reliability models. Benefits from the introduction of statistical decision tools are certainly expected.

Overall, this area of management has seen a flourishing effort through the years from researchers in the fields of operations research, management science, reliability engineering [5–10]. Classical gradient descent methods [11], as well as modern techniques such as genetic algorithms [12], are being considered for the optimization task.

* Corresponding author. Fax: +39-2-2399-6309.
E-mail address: marzio.marseguerra@polimi.it (M. Marseguerra)
0951-8320/00/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved.
PII: S0951-8320(99)00046-0
The present work locates itself within this management/operation area with the objective of developing a methodological framework to account for many of the realistic issues involved in plant reliability and economic management, such as maintenance, aging and obsolescence. Generally, these aspects are strongly interrelated so as to render an analytical approach basically prohibitive for real scale applications. For this reason we devote our attention to Monte Carlo simulation which provides a flexible tool capable of accounting for many of these aspects in a quite straightforward manner.

In Section 2 we present the model which forms the basis for the development of our Monte Carlo approach: failure and repair behaviors with aging effects are considered together with an adaptive, periodic maintenance strategy; moreover, an appropriate cost, or energy, function is introduced to assess the monetary value of the plant operation. This cost function will provide the basis for an optimization process of the management and operation of the system. In analyzing different management options we will assume that the system has already been designed to satisfy the safety constraints; in other words, the system configuration, redundancy allocation and component failure characteristics have been selected so as to guarantee a certain level of safety. Then, in the optimization we focus only on the available strategies for system maintenance, here represented by different choices of the maintenance periods of the various system components. In Section 2 we also present a possible model of technological evolution and obsolescence which can be included within the Monte Carlo simulation in a direct way. Section 3 contains a simple application of the model to a reference system drawn from the literature. Section 4 shows how the Monte Carlo scheme can be exploited to perform an optimization on the period of maintenance: an application to the system of reference is provided. In the application of the model we followed the methodological spirit of the work in that the choices of the numerical values of the model parameters are not necessarily realistic but rather they highlight certain particular model effects. Finally, we end the paper with some general remarks on the problem and some specific comments on the method proposed.

2. A quantitative model for plant availability and economic management

In this section we present the fundamentals of the Monte Carlo simulation model used for plant reliability and availability analysis as well as for its economic evaluation.

2.1. Failure and repair with aging

As is well known, the failure behavior of many mechanical and electrical components can be described in terms of a bath-tub time-dependent failure rate λ(t) such as that of Fig. 1. Adopting a language borrowed from life, we can interpret the three zones of this figure as those corresponding to infancy, maturity and senility, with decreasing, constant and increasing danger of death, respectively.

Fig. 1. Typical behavior of component failure rate. [Plot of λ versus t showing Zones A, B and C.]

It is common practice to represent this behavior analytically in terms of a composition of cumulative distribution functions (CDFs) of the failure times, each having the following Weibull expression:

\[ F(t) = 1 - \exp\left(-\int_0^t \lambda(t')\,dt'\right) = 1 - e^{-(\beta t)^{\alpha}} \tag{1} \]

where λ(t) = α β^α t^(α−1) is the time-dependent failure rate. The corresponding probability density function (PDF) is:

\[ f(t) = \alpha\,\beta^{\alpha}\,t^{\alpha-1}\,e^{-(\beta t)^{\alpha}} \tag{2} \]

In the present work we shall consider a single CDF having a monotonically increasing failure rate, i.e. a λ(t) with α > 1. This conservative assumption amounts to considering aging effects since the beginning of the life of the components. We shall assume that aging develops with a value α = 2.

The effects of component aging are counterbalanced by maintenance actions, performed with period τ, which rejuvenate the component. In practice, during the interval τ, the failure rate λ(t) increases only slightly so that, to simplify the calculations, we resort to exponential distributions characterized by stepwise constant failure rates whose values are determined by imposing that the probabilities of failure within each maintenance period coincide for the Weibull and the exponential distribution. More specifically, in the generic interval of length τ, the probability of a failure is

\[ P_W(t \le \tau) = 1 - e^{-(\beta\tau)^{\alpha}} \tag{3} \]

for the Weibull distribution, and

\[ P_e(t \le \tau) = 1 - e^{-\lambda^{*}\tau} \tag{4} \]

for the exponential distribution.

For the two CDFs to give the same probability of failure within [0, τ] the failure rate of the exponential distribution must have the effective value (constant throughout the
maintenance period)

\[ \lambda^{*} = \beta^{\alpha}\,\tau^{\alpha-1} = \frac{1}{\alpha}\,\lambda(\tau) \tag{5} \]

which is the average of the λ(t) function over the period τ. Obviously, for α-values close to unity the Weibull distribution is almost exponential and the two failure models are almost coincident throughout (Fig. 2(a) and (b)); otherwise, the discrepancy becomes significant (Fig. 3(a) and (b)) and coincidence of the CDFs occurs only at t = τ. In this latter case, assuming small repair times, the average number of failures within the period is the same by construction but the CDFs are quite different and the failures occur at very different times: the exponential distribution somewhat favors early failures whereas the Weibull distribution shifts the failures to later times, closer to the end of the period.

Note that the effective failure rate of a component, λ*, is strictly linked to the maintenance period τ and this will have significant effects on the optimization of maintenance with respect to this parameter, as will be seen below.

As for the repair process, we shall adopt the usual assumption of constant repair rate, μ. Although we realize that the repair process is anything but Markovian, this assumption, which can be easily removed in a Monte

Fig. 2. (a) Cumulative distributions: Weibull (α = 1.1, β = 0.021) vs exponential (λ = 0.0206). (b) Probability density functions: Weibull (α = 1.1, β = 0.021) vs exponential (λ = 0.0206). [Both panels are plotted against time (h), from 0 to 100.]

Fig. 3. (a) Cumulative distributions: Weibull (α = 2, β = 0.021) vs exponential (λ = 0.0176). (b) Probability density functions: Weibull (α = 2, β = 0.021) vs exponential (λ = 0.0176). [Both panels are plotted against time (h), from 0 to 100.]
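
As a quick numerical check of Eqs. (3)–(5), the sketch below computes the effective rate λ* and compares the Weibull and exponential CDFs. The maintenance period τ = 40 h is an assumption: it is not stated in the captions of Figs. 2 and 3, but together with β = 0.021 it reproduces the quoted rates 0.0206 (α = 1.1) and 0.0176 (α = 2). All function names are illustrative.

```python
import math


def effective_rate(alpha: float, beta: float, tau: float) -> float:
    """Constant failure rate lambda* = beta**alpha * tau**(alpha - 1), Eq. (5)."""
    return beta ** alpha * tau ** (alpha - 1.0)


def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    """Weibull CDF F(t) = 1 - exp(-(beta * t)**alpha), Eq. (1)."""
    return 1.0 - math.exp(-((beta * t) ** alpha))


def exponential_cdf(t: float, rate: float) -> float:
    """Exponential CDF with constant failure rate `rate`."""
    return 1.0 - math.exp(-rate * t)


BETA = 0.021  # Weibull parameter beta, from the figure captions
TAU = 40.0    # ASSUMED maintenance period (h); not given in the captions

for alpha in (1.1, 2.0):
    lam = effective_rate(alpha, BETA, TAU)
    # By construction the two CDFs coincide at t = tau ...
    end_gap = weibull_cdf(TAU, alpha, BETA) - exponential_cdf(TAU, lam)
    # ... but halfway through the period the exponential model has
    # already accumulated more failure probability than the Weibull one.
    mid_weibull = weibull_cdf(TAU / 2, alpha, BETA)
    mid_exponential = exponential_cdf(TAU / 2, lam)
    print(f"alpha={alpha}: lambda*={lam:.4f}, gap at tau={end_gap:.1e}, "
          f"F(tau/2): Weibull={mid_weibull:.3f}, exponential={mid_exponential:.3f}")
```

For α = 2 the gap at mid-period is large, which is the "early failures" effect described for Fig. 3; for α = 1.1 the two models stay close throughout.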
Carlo approach, will allow us to compare the analytical results with those obtained by the Monte Carlo simulation. Moreover, since the asymptotic system availability of a repairable component depends on the average repair time, the approximation of constant repair rate is significant only during the transient evolution [13].

Even with maintenance counterbalancing its effects, some aging of the components is inevitable. Given that we are going to use equivalent constant transition rates, we account for an effect of component deterioration due to extensive operation through the outcome of the repair process. More specifically, we assume that as a result of a repair action the component might not necessarily return to an “as good as new” condition since it is likely to become more fragile and prone to future failures. To account for imperfect, deteriorating repairs, we adopt a modified Brown–Proschan model of stochastic repairs which postulates that a system is repaired to an “as good as before” condition (minimal repair) only with a certain probability p and is, otherwise, returned in a “deteriorated” condition (deteriorating repair) [14,15]. Thus, these two conditions obey a Bernoulli distribution. Inclusion of this model within the Monte Carlo simulation scheme is straightforward. When a repair action is completed, we sample a uniform random number r in [0, 1]: if r < p, then the repair is minimal and the failure properties of the component are returned to the conditions existing prior to failure; otherwise, repair is deteriorating and the component emerges with a failure rate increased by a given percentage π of its value before the failure. The analyst-defined parameter π then specifies the amount of deterioration induced by the failure-repair process.

2.2. Maintenance

It is well recognized that maintenance is a central theme of plant management: indeed, an efficient maintenance policy may ensure safe, reliable and economic operation. On the contrary, an inefficient scheduling and choice of maintenance actions results, at the very least, in a waste of resources.

Various maintenance criteria have been followed to fulfill the large variety of requirements and constraints of the industrial world. More recently, the criterion of RCM has been supported as a unifying concept in maintenance practice [3]. It is essentially a qualitative approach aiming at developing a maintenance scheme which satisfies both the reliability and the economic constraints of plant management and operation. The main goal of RCM is that of identifying those maintenance activities which guarantee a certain level of system reliability: however, in the case of limited resources, as is always the case in practice, a compromise is sought between the number, frequency and type of intervention and the operation and management costs of the plant. It is then clear how relevant it is to study efficient maintenance strategies and how important it is to develop appropriate models that render the analysis quantitative.

The model here proposed is based on the following assumptions: (i) maintenance occurs only while the system is available; (ii) maintenance periodicity varies with the component’s aging due to imperfect, deteriorating repair; (iii) the maintenance action is such as to restore the conditions existing at the beginning of the previous maintenance period (Fig. 4); and (iv) the maintenance action is instantaneous.

Fig. 4. Linear growth of failure rate within the maintenance period τ and counterbalancing effect of maintenance. [Plot of λ versus t.]

In realistic situations, the maintenance activities become more and more frequent as the component ages. In our

Fig. 5. Adaptive maintenance period for an aging component. After component failure and repair (with time T_rep) the component ages according to the Brown–Proschan model and the maintenance period is shortened from τ to τ′. [Plot of λ*(t) versus time.]
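
The repair rule of Section 2.1 and the period adaptation illustrated in Fig. 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Component` class, the function name and the starting rate and period are hypothetical, while p = 0.8 and π = 0.3 are the values quoted for Fig. 6. The period is rescaled so that the product of failure rate and maintenance period stays constant.

```python
import random
from dataclasses import dataclass


@dataclass
class Component:
    rate: float    # effective constant failure rate lambda* (1/h)
    period: float  # current maintenance period tau (h)


def brown_proschan_repair(component: Component, p: float, pi: float,
                          rng: random.Random) -> bool:
    """Apply one modified Brown-Proschan repair in place.

    With probability p the repair is minimal and nothing changes.
    Otherwise the failure rate grows by the fraction pi and the
    maintenance period shrinks so that rate * period is preserved.
    Returns True for a minimal repair, False for a deteriorating one.
    """
    r = rng.random()  # uniform random number in [0, 1)
    if r < p:
        return True
    old_rate = component.rate
    component.rate = (1.0 + pi) * old_rate
    component.period *= old_rate / component.rate  # tau_new = tau_old / (1 + pi)
    return False


rng = random.Random(0)                         # fixed seed for repeatability
comp = Component(rate=0.0176, period=40.0)     # hypothetical starting values
initial_product = comp.rate * comp.period
for k in range(1, 11):
    minimal = brown_proschan_repair(comp, p=0.8, pi=0.3, rng=rng)
    kind = "minimal" if minimal else "deteriorating"
    print(f"repair {k:2d}: {kind:13s} rate={comp.rate:.4f} 1/h, period={comp.period:.2f} h")
```

Setting π = 1 halves the period after each deteriorating repair, which is the case drawn in Fig. 5.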
model, effectively, deterioration of a component is due to the imperfect repairs, as for the Brown–Proschan model previously introduced, and we allow for an adaptive schedule of maintenance intervention according to which the ratio between the maintenance period τ and the mean time 1/λ between successive failures is kept constant. Thus, after a minimal repair λ_new = λ_old and, then, τ_new = τ_old; vice versa, after a deteriorating repair, λ_new = (1 + π) λ_old and, then, τ_new = τ_old λ_old/λ_new = τ_old/(1 + π). Fig. 5 shows the situation for α = 2 and π = 1.

2.3. The profit function

Plant management is inevitably affected by economic constraints. In order to quantify the consequences of a given management action in economic terms it is common practice to introduce a profit (cost) or energy function E which contains all the relevant factors affecting the plant operation and management from an economic point of view [16–18]. Given a set of alternative management strategies, the choice will fall on the one which gives the largest value of E.

In general, neglecting any financial interest, we can write the profit function as follows:

\[ E(T_{MISS}) = \int_0^{T_{MISS}} \left[ B(t) - C(t) \right] dt \tag{6} \]

where the first term denotes the benefits obtained from plant operation; the second term denotes the global plant management and operation costs.

As defined, the profit function E gives the total benefit received from a given plant management and operation strategy in a period T_MISS. Often in practice, the mean net hourly benefit, namely E divided by the mission time T_MISS, is taken as an indicator of management performance.

Often an additional term containing the risk of plant accidents and the costs of consequences to the external environment is considered. However, here this term is neglected as we assume that the optimization with respect to the safety aspects of plant operation has already been accounted for in the design (in particular with the choice of the system configuration, redundancy allocation, components reliability, etc.) and we focus on the optimization of the period of component maintenance.

We assume that the net hourly profit from plant operation is proportional to the plant availability A(t, τ(t)), i.e. B(t, τ(t)) = B0 A(t, τ(t)), in which we have made explicit the dependence on the vector of the maintenance periods τ(t) = [τ1(t), τ2(t), τ3(t), …] of the various components. The elements of τ(t) serve as a control parameter in the maximization of the energy function. We recall that the τ_j's are stepwise dependent on t as explained in Section 2.2 (see Fig. 5). To keep this notation from becoming too unwieldy, in the following we shall drop the time dependence. We introduce the cost C_Mj of a single maintenance action on component j, and the hourly cost C_Rj of its repair. Finally, we denote by ⟨n_j⟩ the average number of repairs undergone by the jth component over the whole mission time T_MISS. The energy function can then be written as follows:

\[ E(T_{MISS}; \tau) = \int_0^{T_{MISS}} \left[ B_0 A(t,\tau) - \sum_j C_{Mj} \frac{1}{\tau_j} A(t,\tau) \right] dt - \sum_j C_{Rj} \frac{1}{\mu_j} \langle n_j \rangle \tag{7a} \]

or

\[ E(T_{MISS}; \tau) = E_0 - (E_U + E_M + E_R) \tag{7b} \]

where:

E_0 = B0 T_MISS is a constant term, independent of τ;
E_U = B0 ∫_0^{T_MISS} U(t, τ) dt is the cost of downtime, U(t, τ) being the plant instantaneous unavailability;
E_M = Σ_j C_Mj ∫_0^{T_MISS} (1/τ_j) [1 − U(t, τ)] dt represents the maintenance costs; in particular, the integral represents the mean number of maintenance actions for component j during T_MISS;
E_R = Σ_j C_Rj (1/μ_j) ⟨n_j⟩ is the repair cost.

The three terms of the energy function E in Eq. (7a) are directly computed during the Monte Carlo simulation. The plant profit is accumulated during each trial by collecting the time intervals of the available plant operation and multiplying them by the assumed constant hourly base profit B0. The maintenance cost in a single trial is obtained by computing the number of maintenance actions undertaken during the mission time for each component j, multiplying it by the cost of each maintenance action C_Mj, and summing over all components. Finally the repair expenditures in a single trial are computed as follows: when failure of the jth component is sampled, the repair clock is started to measure the time interval required for the completion of the repair process; the clock then stops when the sampled transition is the repair of that component: at this point, the time spent to repair the component is known and we multiply it by the hourly cost C_Rj to obtain the cost of that repair action. By collecting all the contributions from all repairs of all components occurred in the trial, we get the total cost due to repairs for that trial.

At the end of the simulation, the collected profits and maintenance and repair costs are averaged over the number of trials so as to provide us with an estimate of the profit, maintenance and repair costs (first, second and third term of Eq. (7a), respectively); division by the mission time gives the corresponding hourly rates.

2.4. Obsolescence

The problem of obsolescence is becoming more and more important in the management of modern plants [19]. Obsolescence is defined as the loss in value of a component, system or plant, due not to its conditions or past operation
history but to a change in the external scenario of technological evolution and marketing [20]. The overcoming of a given technology due to technical, legislative and/or marketing reasons typically leads to a decrease in value of the system which is not necessarily related to its past or current performance but can certainly influence its future life. Indeed, the availability on the market of improved components offers the enviable opportunity to plant managers of upgrading their system performance while rejuvenating the system itself.

In this section, we formalize the issue of obsolescence in a quantifiable manner and see how resource constraints play a fundamental role in the management of this problem. Qualitatively, we can say that as the system components age, the overall management costs increase mainly due to downtime costs; at the same time, new, improved components become available on the market and this further reduces the current plant value; the substitution of an old component with a new, improved one does increase the system performance but at a cost of purchase. The problem posed by the obsolescence issue is, then, that of deciding whether to continue operation with the current plant status or renovate it, partially or totally. To account for the various issues at stake, we postulate that as calendar time goes by new components are available on the market and they are characterized by a failure rate which decreases exponentially. If we buy a component at time t0, components of the same kind appear in the market at later times and they have better λ's according to the expression:

\[ \lambda^{j}_{i \to l}(t) = \lambda^{j}_{i \to l}(t_0)\, e^{-\sigma_j (t - t_0)}, \qquad t \ge t_0 \tag{8} \]

where σ_j is the rate of decrease in the failure rate of the newly produced components and λ^j_{i→l}(t0) is the failure rate of the component purchased at t0.

Typically the decision of replacing the nominal jth component with a new, improved one is faced at the time of failure, so that the alternatives are: repair, at an average cost C_Rj · 1/μ_j (where C_Rj is the hourly cost of repair of the jth component and the second factor represents the nominal average time for repair completion), or renovation by purchasing a new component j_NEW that has become available in the market at time t, at a cost C_NjNEW(t). This cost depends on many factors related to the patterns of evolution of both the related technology and market. Since a detailed modeling of these factors is beyond the scope of this paper, we simplified the issue by considering the purchase cost constant over time.

An additional factor in the decision is the residual value V_Rj(t − t0) of the jth component, if repaired. In this regard, we assume that it decreases continuously from the time of purchase t0 according to an exponential law with parameter θ_j.

Finally, the decision of replacing a component with a newly available one depends also on how the increased benefits and reduced costs associated to the operation and maintenance of the renovated system, from now on, compare with those that would be obtained with the old system, without renovation. The profit functions E(t → T_MISS; τ | j_NEW) and E(t + 1/μ_j → T_MISS; τ | j) will serve as measures of the benefits and costs in the renovated and old system configuration, respectively. Note that the renovation process is considered instantaneous whereas the average repair time 1/μ_j of the failed component is accounted for explicitly.

We now have all the ingredients to make a decision on how to proceed when the jth component fails: we will proceed to renovation, instead of repair, if:

\[ E(t \to T_{MISS}; \tau \mid j_{NEW}) + C_{N j_{NEW}}(t) \le E\left(t + \frac{1}{\mu_j} \to T_{MISS}; \tau \mid j\right) + C_{Rj}\,\frac{1}{\mu_j} + V_{Rj}(t). \tag{9} \]

The two sides of the inequality represent the total profit of running the system for the remaining portion of the mission time with the new or the old (repaired) component, respectively.

A highly complicated issue is the evaluation of the profits E(t → T_MISS; τ | j_NEW) and E(t + 1/μ_j → T_MISS; τ | j). The complication lies in the fact that, from the current system status as resulting from the past failure–repair–aging history, one should consider all possible future evolutions, thus facing the combinatorial explosion of possible scenarios. In our Monte Carlo approach this problem is drastically approximated by following system evolutions in which only one component can be renovated during T_MISS. As for E(t → T_MISS; τ | j_NEW), before starting the Monte Carlo simulation we establish a suitable sequence of times T_i, i = 1, 2, …, N0. For each component we, then, pre-compute N0 batches of 1000 Monte Carlo trials to evaluate the value of E′(T_i → T_MISS; τ | j_NEW), i = 1, 2, …, N0, which represents the profit of operating the system with the renovated component from the time T_i to the end of the mission time. For the evaluation of E(t + 1/μ_j → T_MISS; τ | j), before starting the Monte Carlo simulation we pre-compute one batch of 1000 trials with no renovations. When during the actual simulation the jth component fails at time t, we interpolate between the adjacent T_i values to determine E(t → T_MISS; τ | j_NEW) and retrieve the pre-computed value of E(t + 1/μ_j → T_MISS; τ | j); then, we insert these quantities in the inequality (9) to decide whether or not to actually substitute the failed component.

The above approximations seem reasonable since, in general, we do not expect the important components of the system to be renovated frequently within T_MISS.

3. The reference system

For the application of the proposed methodology we considered a gas compression system, taken from literature [16], comprising an active and a standby compressor which
will be denoted as components C1 and C2, respectively. Each component can only be in two states: good (g) or failed (f). For component C1, a failure cause due to mechanical wear of the lube oil pump has been identified as worth the effort of a periodic maintenance task, consisting in lube oil replacement every τ hours. Furthermore, such component can undergo repair with a given rate, μ^1_{f→g}. For component C2, the good state is further specified into active (a) or cold standby (s) mode (i.e. it can only fail once put into service). In nominal configuration, the system operates with C1 in the good state and C2 in cold standby. For simplicity, we assume that the compressor system configuration corresponding to both compressors being failed is an absorbing one.

Due to the methodological nature of the present work, in all the numerical applications which follow our attention is devoted to the effects that the various model assumptions have on the overall results and not on the significance of the actual results which may sometimes be unrealistic.

Given the relative simplicity of the system, in the absence of aging and obsolescence phenomena, its availability can be computed analytically by solving the related Markov chains [16]: the results obtained for this case have served us for partially testing the Monte Carlo simulation model developed.

In this regard, a few words are in order on the Monte Carlo treatment of the active-to-standby (a → s) and standby-to-active (s → a) transitions. In the present work, these transitions are simulated by establishing a proper correlation between the system states before and after the transition, following the multiplicative correlation model proposed in Ref. [21] and briefly described here. This model allows one to modify the transition rates by multiplying the original values by a pre-defined multiplication factor which depends on the current and arrival configurations. More precisely, in our case, the rates of the transitions a → s and s → a for component C2 are chosen several orders of magnitude smaller than the other ones so that these transitions are essentially inhibited with the system in nominal configuration. However, when the active component C1 fails, a correlation is activated which forces the standby component C2 to perform the transition s → a. To do this, the correlation multiplies the transition rate λ^2_{s→a} by a very large factor which renders it now several orders of magnitude larger than the rates of the other transitions (repair of C1, failure of C2). Then, the next stochastic transition is basically instantaneous and certain to bring the standby component C2 to the active operation mode. Analogously, we proceed to return component C2 to the standby mode when the failed active component C1 is brought back to operation after repair. Moreover, the assumption that the system configuration for which both components are failed is an absorbing one is here implemented by introducing yet another correlation to basically inhibit the repair transition of C1 (recall that in the model this is the only component to undergo repair) when the current system configuration is that of both components being failed.

These applications of the correlation model proposed in Ref. [21] confirm the great flexibility that such model introduces, handling, in particular, various kinds of dependencies as well as deterministic, operation-mode transitions.

Fig. 6 reports the comparison of the analytical (circles) and Monte Carlo (crosses) unavailability solutions for the reference system with no aging. The components’ failure and repair rates are reported in Table 1. The input data for the cost evaluation of the reference system are given in Table 2. Furthermore, recall that component C2 is in cold standby, so that λ^2_{g→f} = 0 h^-1, while the deterministic transitions from standby to active and vice versa are treated through the correlation model just explained. The mission time was taken as 4000 h; the number of trials in this and all following Monte Carlo simulations is 10^5, unless otherwise specified. The good agreement of the two solutions is evident.

Fig. 6. System unavailability. ○, analytic with no aging; +, Monte Carlo with no aging; *, Monte Carlo with aging (p = 0.8, π = 0.3, τ = 20 h).

The starred curve in Fig. 6 refers to the case with aging after repair (p = 0.8 and π = 0.3 in the Brown–Proschan model) and a periodic maintenance with τ = 20 h. The aging process is shown to worsen significantly the availability performance of the system, as expected. Only the Monte Carlo solution is computed in this case, the analytic approach becoming rather cumbersome.

As for the energy function E of Eqs. (7a) and (7b), since we will eventually be interested in optimizing the maintenance strategy with respect to the period τ and since E_0 is a constant term, independent of τ, in what follows we

Table 1
Transition rates

Component j    λ^j_{g→f} (h^-1)    μ^j_{f→g} (h^-1)
1              4.14 × 10^-4        1
2              0.1                 0
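
The analytic no-aging reference solution mentioned above can be approximated with a three-state Markov chain, sketched below. Several points are assumptions rather than statements from the paper: Table 1 is read as λ1 = 4.14 × 10^-4 h^-1 and μ1 = 1 h^-1 for C1 and 0.1 h^-1 for the failure of C2 once it is active; switching between standby and active is treated as instantaneous; the system is counted as unavailable only in the absorbing state; and the hourly downtime cost is taken as the C_U = 100 $/h of Table 2. The fourth-order Runge–Kutta integrator is a convenience chosen here, not the solution method of Ref. [16].

```python
def unavailability_curve(lam1, mu1, lam2, t_end, dt=0.5):
    """Integrate the state probabilities with classical RK4.

    States: 0 = C1 running, C2 in standby
            1 = C1 under repair, C2 running
            2 = both failed (absorbing, system unavailable)
    Returns the time grid and the unavailability U(t) = P(state 2).
    """
    def deriv(p):
        p0, p1, _ = p
        return (-lam1 * p0 + mu1 * p1,
                lam1 * p0 - (mu1 + lam2) * p1,
                lam2 * p1)

    def shifted(p, k, h):
        return tuple(pi + h * ki for pi, ki in zip(p, k))

    p = (1.0, 0.0, 0.0)  # start in the nominal configuration
    times, unavail = [0.0], [0.0]
    for n in range(round(t_end / dt)):
        k1 = deriv(p)
        k2 = deriv(shifted(p, k1, dt / 2))
        k3 = deriv(shifted(p, k2, dt / 2))
        k4 = deriv(shifted(p, k3, dt))
        p = tuple(pi + dt / 6 * (a + 2 * b + 2 * c + d)
                  for pi, a, b, c, d in zip(p, k1, k2, k3, k4))
        times.append((n + 1) * dt)
        unavail.append(p[2])
    return times, unavail


LAM1, MU1, LAM2 = 4.14e-4, 1.0, 0.1  # ASSUMED reading of Table 1 (1/h)
T_MISS = 4000.0                      # mission time (h), from the text
C_U = 100.0                          # hourly downtime cost ($/h), Table 2

times, unavail = unavailability_curve(LAM1, MU1, LAM2, T_MISS)

# Downtime term E_U of Eq. (7b): trapezoidal integral of U(t) times C_U.
downtime_hours = sum(0.5 * (u0 + u1) * (t1 - t0)
                     for t0, t1, u0, u1 in zip(times, times[1:], unavail, unavail[1:]))
e_u = C_U * downtime_hours
print(f"U(T_MISS) = {unavail[-1]:.4f}, E_U = {e_u:.0f} $")
```

Under these assumptions the chain leaves the two working states at an effective rate of about λ1 λ2 / (μ1 + λ2), so the unavailability grows almost exponentially over the mission time.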
shall directly consider the costs E′ = E_0 − E = E_U + E_M + E_R, so that the maximization of E corresponds to a minimization of the overall costs E′. Fig. 7 reports the time behavior of the hourly costs E_M/t, E_U/t, E_R/t and E′/t resulting from the Monte Carlo simulation in the case of no Brown–Proschan aging.

Table 2
Input data for the cost evaluation of the reference system

C_U ($/h)    C_R1 ($/h)ᵃ    C_M1 ($)ᵃ    t (h)
100          5              10           24

ᵃ Maintenance and repair actions are performed only on the active component C1.

The decreasing behavior of E_M/t (Fig. 7(a)) is a consequence of the fact that maintenance is performed only when the system is available: as time goes by, the system becomes less and less available (Fig. 6) and, correspondingly, the maintenance actions are reduced. On the contrary, as expected from the definition, the downtime cost E_U/t increases proportionally to the system unavailability (Fig. 7(b)). Finally, the repair cost E_R/t also shows a decreasing behavior (Fig. 7(c)): this is due to the assumption of the absorbing state when both components are down, so that the number of repairs is also reduced as the system becomes more and more unavailable. The combination of these three contributions gives rise to the global cost behavior E′/t of Fig. 7(d), where the dominance of the downtime term becomes evident.

Let us now see what happens to the system costs when the Brown–Proschan aging-after-repair effects are included (as before, p = 0.8 and π = 0.3). Recall that besides worsening the availability performance of the system, the effect of aging will also be that of shortening the maintenance period τ (so as to maintain the product λ*τ constant). This increase in the maintenance frequency inevitably makes the associated costs increase. In order to highlight the effects of maintenance costs, we have taken the previous case but for the hourly downtime cost C_U which has been (somewhat unrealistically) set equal to just $1/h. Fig. 8 reports the hourly and integral behavior of the cost curves, as computed by the Monte Carlo simulation. One immediately sees that the contribution due to maintenance has a maximum (circles). At the beginning, aging degrades the system performance by increasing the transition failure rates of the system components. As just explained, this induces more frequent maintenance actions which make the hourly costs increase. However, as the system availability decreases, the downtime increases: since during downtime no maintenance actions are undertaken, in the long run, as unavailability increases, the maintenance costs decrease.
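
The bookkeeping of the three cost terms discussed above (downtime, maintenance and repair, summed into E′) can be illustrated with a minimal Monte Carlo sketch for the case without aging. This is not the authors' code: the function names are hypothetical, the entry 0.1 h⁻¹ of Table 1 is read here as the failure rate of C2 while it operates, maintenance is treated as a pure cost event occurring every τ hours while the system is up, and the failure rate of C1 is kept independent of τ (the link of Eq. (5) is not modeled).

```python
import math
import random


def simulate_history(tau, rng, t_miss=4000.0, lam1=4.14e-4, mu1=1.0, lam2=0.1,
                     c_u=100.0, c_r1=5.0, c_m1=10.0):
    """Simulate one history and return the costs (E_U, E_M, E_R) in dollars.

    Assumed simplified model: C1 active, C2 in cold standby and not repaired;
    both components down is an absorbing state.
    """
    t = 0.0                 # current time (h); C1 is active, C2 in standby
    repair_hours = 0.0      # cumulated time spent repairing C1
    up_until = t_miss       # time at which the system becomes unavailable
    while True:
        t += rng.expovariate(lam1)          # C1 operates until its next failure
        if t >= t_miss:
            break
        repair = rng.expovariate(mu1)       # duration of the repair of C1
        c2_life = rng.expovariate(lam2)     # life of C2 once switched on
        remaining = t_miss - t
        if min(repair, c2_life) >= remaining:
            repair_hours += remaining       # mission ends while C1 is in repair
            break
        if c2_life < repair:
            repair_hours += c2_life         # C2 fails first: absorbing state
            up_until = t + c2_life
            break
        repair_hours += repair              # C1 repaired, C2 back to standby
        t += repair
    e_u = c_u * (t_miss - up_until)             # downtime cost
    e_m = c_m1 * math.floor(up_until / tau)     # maintenance every tau h while up
    e_r = c_r1 * repair_hours                   # repair cost
    return e_u, e_m, e_r


def mean_costs(tau, n_trials=10_000, seed=0):
    """Average the three cost terms over n_trials histories."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(n_trials):
        for k, cost in enumerate(simulate_history(tau, rng)):
            sums[k] += cost
    e_u, e_m, e_r = (s / n_trials for s in sums)
    return {"E_U": e_u, "E_M": e_m, "E_R": e_r, "E_prime": e_u + e_m + e_r}


if __name__ == "__main__":
    for name, value in mean_costs(tau=20.0).items():
        print(f"{name}: {value:.1f} $")
```

Because τ enters this sketch only through the maintenance term, it reproduces the cost accounting but not the trade-off between the contributions; for that, the failure rate of C1 would have to depend on τ as in Eq. (5).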

Fig. 7. System cost: (a) maintenance; (b) downtime; (c) repair; (d) total.

Fig. 8. (a) Instantaneous and (b) integral behavior of the costs as a function of time, in case of Brown–Proschan aging after repair (p = 0.8, π = 0.3, τ = 20 h).

Note also how, in this unrealistic case, the chosen cost values are such that maintenance expenditures become predominant at most times, when maintenance actions are made very frequent so as to overcome the running aging process.

The repair costs also show a similar, peaked behavior (crosses). Indeed, aging at the beginning worsens the behavior of the components, thus also inducing more frequent repairs. However, since the system failure (both components C1 and C2 down) constitutes an absorbing configuration, as time goes by the system is more and more in this state of unavailability where repairs are not performed. Finally, the value E′(T_MISS, τ) of the cumulated global costs at T_MISS in Fig. 8(b) can be considered as the total amount of expenses needed to operate the system up to that time. This shall be compared with the gained benefits from plant operation.

4. Maintenance optimization

The problem of choosing maintenance strategies is of foremost importance in plant management and operation. An efficient strategy should aim at guaranteeing the level of performance and availability of the system while allowing for a reduction in the resource expenditures.

The Monte Carlo scheme proposed here allows for a quantitative analysis of the maintenance strategies. The definition of an appropriate system energy model, such as the one proposed in the previous section, enables one to perform a search of the optimal strategy in terms of a maximization of the energy function E (Eqs. (7a) and (7b)), which corresponds to a minimization of E′ = E_U + E_M + E_R. In this section the search of the optimal maintenance period τ* for the reference system presented in Section 3 is performed within the Monte Carlo simulation framework.

As mentioned in Section 2.2, to describe the failure behavior of the system components, instead of considering a Weibull distribution we introduce an equivalent exponential distribution such that the probability of a failure at any time within an interval between maintenance actions is the same. By doing so, we are able to establish a connection between the failure rate of a component and the maintenance period τ (Eq. (5)) which forms the basis for the present optimization of the maintenance period. This implies that the optimization regards only the starting maintenance period referring to 'as new' components; this initial maintenance period is then modified during the components' life so as to counteract the Brown–Proschan degradation of the corresponding failure rates, as explained in Section 2.2. Indeed, Eq. (5) implies that the smaller τ is, the smaller the components' failure rates are, so that the system is more available and produces more benefits; on the other hand, there will be more maintenance actions and this will increase the associated costs. Therefore, if we look at the contributions of the cost function E′ in Eqs. (7a) and (7b), we expect that as τ decreases the downtime cost E_U decreases, as does the contribution due to repairs, E_R, since the probability of failure decreases; the maintenance term E_M, on the contrary, increases. The optimal choice of τ will then represent a compromise among the behaviors of the various contributions. Obviously, if τ* turns out to be very small, this means that the system design and/or the choice of components were poor; moreover, the assumption of instantaneous maintenance should be revisited. In the opposite case, a very large τ* would imply a very good system design and/or component characteristics; this would bring in issues of capital costs/interests and their relation to the goodness of the components which have not been considered here, for simplicity.

While an analytical search for the optimal τ-value is exceedingly complicated, the Monte Carlo approach is rather straightforward. We define, a priori, a range for τ within which the search is to be performed. From this range, we select a number of values τ_i, i = 1, 2, …, N_M, and for each value we perform a batch Monte Carlo evaluation of the cost function E′(t, τ_i). The choice of the optimal period will then fall on that value τ* which minimizes the cumulative cost E′(t, τ*) at t = T_MISS. This approach was applied to the reference system for values of τ_i in the range [2, 38] (in hours) and with a value of β (Eq. (5)) such that the failure rate of component C1 (which is the only one to be maintained and repaired) is equal to 9 × 10⁻⁴ h⁻¹ in
correspondence of the mean value τ = 20 h. It turns out that β = 0.021. The number of Monte Carlo trials per batch was chosen equal to 10 000.

We first consider the simple case with no aging and obsolescence, for which the analytical solution to the energy function can be obtained [16]. Fig. 9 shows the good agreement between the analytical and Monte Carlo results for the hourly cumulative costs due to downtime, maintenance and repair, as well as for the global hourly cost E′(T_MISS | τ_i)/t ($/h), for various values of τ_i: the optimal choice for the period τ* turns out to be 9.2 h.

Fig. 9. System costs as a function of the maintenance period τ: (a) maintenance; (b) downtime; (c) repair; (d) total.

As expected, Fig. 9(a)–(d) shows that for small values of τ_i, the costs due to highly frequent maintenance actions are such as to increase considerably the global costs; as the value
of τ_i is increased, the maintenance costs decrease but downtime, and its associated costs, slowly increase: the conflicting trends of these contributions give rise to a minimum of the global cost function; finally, for very low maintenance frequencies the system availability deteriorates significantly and the dominating contribution of the downtime costs gives rise to very large global cost values.

Let us now see what happens when aging is considered. Fig. 10 reports the Monte Carlo simulation results for the case of Brown–Proschan aging with p = 0.8 and π = 0.3. The optimal maintenance period reduces to τ* = 5.6 h: this is due to the fact that aging worsens the availability of the system so that the contribution of the downtime costs is felt at earlier times. The importance of the choice of the maintenance period can also be seen from the point of view of the cost-benefit analysis of plant operation. Fig. 11 shows the cumulative benefits and total costs, in dollars, as a function of time for the aging system with (a) non-optimized period τ = 20 h and (b) optimized period τ* = 5.6 h. In the first case, the plant turns out to be making a net profit only for the first 2300 h, after which the downtime costs are such as to dominate (Fig. 11(a)). On the contrary, the optimized maintenance period is such as to render the operation of the plant profitable throughout the mission time (Fig. 11(b)).

Fig. 10. System costs as a function of the maintenance period τ in presence of aging (p = 0.8, π = 0.3).

Fig. 11. System cumulative benefits and total costs as a function of time: (a) non-optimized τ = 20 h; (b) optimized τ* = 5.6 h.

Finally, we investigate the effects of obsolescence on the reference system of Section 3 with the data of Table 3, N_0 = 3 and maintenance period τ = 20 h.

Table 3
Data for the obsolescence process

Component j    C_Nj ($)    σ_j (h⁻¹)      θ_j (h⁻¹)
1              5, 100      1.2 × 10⁻⁴     0.05
2              5           0ᵃ             0ᵇ

ᵃ No improved types of the standby component C2 become available (no obsolescence).
ᵇ Component C2 does not lose value since it does not undergo obsolescence.

Fig. 12 reports the results of the system unavailability for the cases of no aging (stars), aging (crosses) and aging plus obsolescence (circles). The effect of obsolescence is seen to improve significantly the availability performance of the system as it counteracts the aging of the components.

Fig. 12. System unavailability as a function of time, under no aging/no obsolescence (*); aging/no obsolescence (+); aging/obsolescence (○).

Fig. 13 compares the effects of obsolescence on the cumulated total costs E′. Obviously, the results strongly depend on the input data: as shown in the figure, the operating costs of the system are significantly lowered in the case of renovation of component C1 at a price C_N1 equal to $5 (a); the advantages of renovation are completely defeated when the cost of a new component C1 is raised from $5 to $100 (b).

5. Conclusions

The operation and management of a plant requires proper accounting for the constraints coming from safety and reliability requirements as well as from budget and resource considerations. The analyses that need to be performed in order to evaluate the maintenance strategies and operating procedures need to consider many practical aspects, such as aging, repair, obsolescence, renovation, which are almost
impossible to be captured by analytical models. The Monte Carlo simulation instead provides a flexible tool which enables one to describe efficiently many of the relevant aspects for plant management and operation.

Fig. 13. Total cumulative costs for the system, with and without obsolescence. (a) C_N1, $5; (b) C_N1, $100.

In this paper, we have presented a Monte Carlo approach to the availability analysis of complex systems under periodic maintenance strategies and within the economic constraint of limited resources. The maintenance interventions have the objective of defeating the effects of aging: the latter have been accounted for by means of a modified Brown–Proschan model of imperfect, deteriorating repair which accounts for the increased proneness to failure of a repaired component. The effects of obsolescence have also been accounted for by assuming the availability of new components on the market, characterized by exponentially decreasing failure rates. The economic constraint has been formalized in terms of an energy, or cost, function which has allowed us to optimize the maintenance period and to evaluate the convenience of substituting a failed component with a new, improved one. The proposed approach has been illustrated with reference to a system taken from literature. Due to the methodological character of the work, no particular meaning should be given to the numerical results obtained. Rather, the choices of the numerical values of the model parameters have been directed by the intention of illustrating various model effects.

As a final consideration we note that as the issues of plant management and operation gain more and more importance, they will also become more and more complicated so that the Monte Carlo approach will become the fundamental tool for such analysis. Further efforts in the field are therefore needed both from a methodological point of view (such as, for example, the introduction of efficient variance reduction techniques for the evaluation of cost and energy integrals) and from an application point of view.

Currently, the authors have completed a work which combines Monte Carlo reliability and availability analysis with the powerful genetic search algorithms for optimization. The work is in the process of being prepared for submission to this journal for publication.

References

[1] McCall JJ. Maintenance policies for stochastically failing equipment: a survey. Mgmt Sci 1965;11:493–524.
[2] Barlow RE, Proschan F. Mathematical theory of reliability. New York: Wiley, 1965.
[3] Nowlan FS, Heap HF. Reliability-centered Maintenance. Technical Report AD/A066-579. National Technical Information Service, US Department of Commerce, Springfield, VA, 1978.
[4] Horton M. Optimum maintenance and RCM. Proceedings of the 3rd EsReDa Seminar on Equipment Aging and Maintenance, Chamonix, France, 14–15 October 1992.
[5] Pierskalla WP, Voelker JA. A survey of maintenance models: the control and surveillance of deteriorating systems. Nav Res Log Quart 1979;23:353–88.
[6] Bosch K, Jensen U. Maintenance models: a survey. Parts 1 and 2. OR Spektrum 1983;5:105–18, see also p. 129–48 (in German).
[7] Sherif YS, Smith ML. Optimal maintenance models for systems subject to failure: a review. Nav Res Log Quart 1981;28:47–74.
[8] Valdez Flores C, Feldman RM. A survey of preventive maintenance models for stochastically deteriorating single unit systems. Nav Res Log Quart 1989;36:419–46.
[9] Cho DI, Parlar M. A survey of maintenance models for multi-unit systems. Eur J Oper Res 1991;51:1–23.
[10] Martorell S, Munoz A, Serradell V. An approach to integrating surveillance and maintenance tasks to prevent the dominant failure causes of critical components. Reliab Engng System Safety 1995;50:179–87.
[11] Vaurio JK. Optimization of test and maintenance intervals based on risk and cost. Reliab Engng System Safety 1995;49:23–36.
[12] Munoz A, Martorell S, Serradell V. Genetic algorithms in surveillance and maintenance of components. Reliab Engng System Safety 1997;57:107–20.
[13] Vaurio JK. On time-dependent availability and maintenance optimization of standby units under various maintenance policies. Reliab Engng System Safety 1997;56:79–89.
[14] Brown M, Proschan F. Imperfect repair. J Appl Probab 1983;20:851–9.
[15] Lim TJ. Estimating system reliability with fully masked data under Brown–Proschan imperfect repair model. Reliab Engng System Safety 1998;59:277–89.
[16] Vatn J. Maintenance optimisation from a decision theoretical point of view. Reliab Engng System Safety 1997;58:119–26.
[17] Dubi A. Analytic approach and Monte Carlo methods for realistic systems. Tutorial notes, April 1997.
[18] Freeze RA, Massmann J, Smith L, Sperling T, James B. Hydrogeological decision analysis: a framework. Ground Water 1990;28(5):738–66.
[19] Framatome spare parts expertise. Framatome Nuclear Newsletter, no. 51, 1997.
[20] Song JS, Zipkin PH. Managing inventory with the prospect of obsolescence. Oper Res 1993;44:215–24.
[21] Marseguerra M, Zio E. Nonlinear Monte Carlo reliability analysis with biasing towards top event. Reliab Engng System Safety 1993;40:31–42.
