An Introduction To Probability Models in Reliability and Maintainability

2011 Annual RELIABILITY and MAINTAINABILITY Symposium
C. Richard Cassady, Ph.D.

Department of Industrial Engineering
University of Arkansas
4207 Bell Engineering Center
Fayetteville, Arkansas 72701 USA
e-mail: cassady@uark.edu
Tutorial Notes © 2011 AR&MS

SUMMARY & PURPOSE
The purpose of this tutorial is to provide attendees with basic coverage of the traditional, fundamental probability models
used to describe, improve, and optimize system reliability and maintainability. This coverage requires the discussion of some
basic concepts from probability and distribution theory. No specific models are endorsed. Instead, emphasis is placed on
identifying the key assumptions associated with each model.
C. Richard Cassady, Ph.D.

Richard Cassady is a Professor in the Department of Industrial Engineering and the Director of the Freshman Engineering
Program at the University of Arkansas. Prior to joining the faculty at UofA, he was on the faculty at Mississippi State
University. He received his Ph.D., M.S. and B.S. all in industrial and systems engineering from Virginia Tech. His primary
reliability research interests are in repairable systems modeling. This work includes the analysis and development of equipment
maintenance policies including preventive maintenance, selective maintenance and cannibalization. He is a Senior Member of
IIE, and a member of ASEE, INFORMS and SRE. He is also a member of the RAMS Management Committee.
Table of Contents
1. Reliability Models
2. Repairable Systems
3. Renewal Models
4. Minimal Repair Models
5. Imperfect Maintenance
6. Advanced Topics
7. Conclusions
8. References
9. Tutorial Visuals
ii – Cassady 2011 AR&MS Tutorial Notes

1. RELIABILITY MODELS

1.1 Probability Primer

Reliability is defined as the probability that a component (or an entire system) performs its
intended function for a specified period of time when operated in its design environment. If we
assume that the component is used for its intended function in its design environment, then
reliability can be defined as the probability of proper function. For this reason, any treatment
of basic reliability begins with coverage of key concepts from basic probability.

The basic building block of probability theory is the random experiment. A random experiment is
an occurrence that has an unpredictable outcome. The sample space for a random experiment
(denoted by Ω) is the set of all possible outcomes for the random experiment. Events (denoted by
italicized capital letters) are meaningful subsets of the sample space. Note that examples
intended to emphasize the definitions and reinforce the concepts presented throughout this
tutorial are included in the tutorial visuals in Section 9.

Probabilities (as will be explored later) are defined on events, but it is often interesting to
combine events using three common operators. The union (denoted by ∪) of a collection of events
is the event corresponding to the case in which at least one of the events in the collection
occurs. The intersection (denoted by ∩) of a collection of events is the event corresponding to
the case in which all of the events in the collection occur. The complement (denoted by ′) of an
event is the event corresponding to the case in which the event does not occur.

Applying event operators leads to additional definitions. If two events have no outcomes in
common, then the intersection of the two events is the null event (denoted by φ). In this case,
the two events are said to be mutually exclusive or disjoint. A collection of events is said to
be disjoint if there is no intersection among any of the events in the collection.

The branch of mathematics known as probability is based on three axioms (note that Pr(A) denotes
the probability of an event A).

Axiom 1: For any event A, 0 ≤ Pr(A) ≤ 1.
Axiom 2: Pr(Ω) = 1.
Axiom 3: If A and B are mutually exclusive events, then Pr(A ∪ B) = Pr(A) + Pr(B).

These axioms can be used to derive all of the most commonly-used probability rules. Two such
rules are

Pr(A′) = 1 − Pr(A)

and

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

where A and B are events.

In some cases, knowledge of the occurrence of one event may alter one’s belief regarding the
probability of some other event. In such cases, conditional probability is used to update the
probability of interest. Let Pr(A|B) denote the conditional probability of event A given the
occurrence of event B where

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$$

If a collection of events is such that knowledge of the occurrence of one event in the
collection has no impact on one’s beliefs regarding the probabilities of the other events in the
collection, then the collection is a collection of independent events. If A and B are
independent events, then Pr(A|B) = Pr(A). If {A1, A2, … , An} is a collection of independent
events, then

$$\Pr\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} \Pr(A_i)$$

1.2 Static Reliability Models

2. REPAIRABLE SYSTEMS

A good definition of a repairable system can be obtained by modifying the one provided by Ascher
and Feingold [9]. A repairable system (RS) is a system which, after failure, can be restored to
a functioning condition by some maintenance action other than replacement of the entire system.
Note that replacing the entire system may be an option, but it is not the only option.

Maintenance actions performed on a RS can be categorized into two groups: corrective maintenance
(CM) actions and preventive maintenance (PM) actions. CM actions are performed in response to
system failures, and they could correspond to either repair or replacement activities. PM
actions are not performed in response to RS failure, but they are intended to delay or prevent
system failures. Note that PM actions may or may not be cheaper and/or faster than CM actions.
As with CM actions, PM actions can correspond to either repair or replacement activities.
Finally, operational maintenance actions, e.g. putting gas in a vehicle, are not considered to
be PM actions.

PM actions can be divided into two sub-categories. Scheduled maintenance (SM) actions are
planned based on some measure of elapsed time. Condition-based maintenance (CBM) actions are
initiated based on data obtained from sensors applied to the RS. Vibration data and chemical
analysis data are two examples of the type of data used in CBM. While it is still a developing
science, CBM provides the potential for just-in-time, cost-effective maintenance. However,
scheduled maintenance is the only type of PM considered further in this tutorial.

Repairable systems modeling refers to the application of operations research techniques (e.g.,
probability modeling, optimization, simulation) to problems related to equipment maintenance.
Repairable system models are typically used to evaluate the performance of one or more
repairable systems and/or design maintenance policies for one or more repairable systems.

In our discussions, we assume that a RS is always in one of two states: functioning (up) or
down. Note that a system may be down for CM or down for PM.

The performance of a RS can be measured in several ways. We consider three categories of RS
performance measures: (1) number of failures, (2) availability measures, (3) cost measures. Let
N(t) denote the number of RS failures in the first t time units of system operation. Because of
the stochastic (random) nature of RS behavior, N(t) is a random variable. Thus, we may focus our
attention on the expected value, variance and probability distribution of N(t).

Availability can be loosely defined as the proportion of time that a RS is in a functioning
condition. However, there are four specific measures of availability found in the RS literature
[10]. All these measures are based on the RS status function:

$$X(t) = \begin{cases} 1 & \text{if system is functioning at time } t \\ 0 & \text{if system is down at time } t \end{cases} \tag{1}$$

See Fig. 1 for a graphical portrayal of X(t).

[Figure 1. Graphical Portrayal of X(t). Plot of X(t) (values 0 and 1) against t; annotations mark
the initiation and the completion of a maintenance action.]

The first of these availability measures is the availability function, A(t).

$$A(t) = \Pr[X(t) = 1] \tag{2}$$

Note that A(t) is typically difficult to obtain and rarely used in practice. The second measure
is limiting availability, A.

$$A = \lim_{t \to \infty} A(t) \tag{3}$$

By far the most commonly-used availability measure, limiting availability is often easy to
obtain mathematically. However, there are some cases in which limiting availability does not
exist. The third availability measure is the average availability function, Aavg(T).

$$A_{avg}(T) = \frac{1}{T} \int_0^T A(t)\,dt \tag{4}$$

Average availability corresponds to the average proportion of “uptime” over the first T time
units of system operation. Since it is based on A(t), average availability is typically
difficult to obtain and rarely used in practice. However, because it captures availability
behavior over a finite period of time, it is a valuable measure of RS performance. The final
availability measure is limiting average availability, Aavg.

$$A_{avg} = \lim_{t \to \infty} A_{avg}(t) \tag{5}$$

When it exists, limiting average availability is almost always equivalent to limiting
availability. To our knowledge, limiting average availability is almost never used in practice.

Cost functions are often used to evaluate the performance of a RS. The form of this function
depends on the reliability and maintainability characteristics of the RS of interest. However,
these functions typically include a subset of the following cost parameters.

cf    cost of a failure
cd    cost per time unit of “downtime”
cr    cost (per time unit) of CM
cp    cost (per time unit) of PM
ca    cost of RS replacement

Consider a RS that: (1) is modeled as a single component or a “black box”; (2) is intended to
function 24 hours per day, 7 days per week; (3) has self-announcing (obvious) failures; (4) is
binary-state – as in equation (1); (5) is “as good as new” at time t = 0; (6) is subjected to
either SM or no PM. We utilize our own taxonomy to capture the essential elements of the models
of such repairable systems. This taxonomy has six parts and can be summarized by 1/2/3/4/5/6
where 1 describes the probabilistic characteristics of the time to first failure of the RS, 2
describes the duration of CM, 3 describes the impact of CM on the age of the RS, 4 describes the
type of PM policy (if any), 5 describes the duration of PM, and 6 describes the impact of PM on
the age of the RS.

3. RENEWAL MODELS

The first class of RS models that we address is based on concepts and results from renewal
theory.

3.1 A Generic Case – The G/G/P Model

We first consider a RS having the six characteristics related to our taxonomy. In the
description G/G/P, the first G implies that the time to first failure of the RS is some type of
random variable. The second G implies that the duration of CM is some type of random variable.
The P implies that CM is perfect, i.e. CM restores the RS to an “as good as new” condition. Note
that the “as good as new” assumption is the key assumption and often the subject of criticism of
the corresponding models (except when CM corresponds to RS replacement). The absence of the last
three elements of the taxonomy implies that no PM is performed.

Let Ti denote the duration of the ith interval of RS function. Because of the “as good as new”
assumption, {T1, T2, … } is a sequence of independent and identically distributed (iid) random
variables. Let Di denote the duration of the ith CM action, and note that {D1, D2, … } are
assumed to be iid random variables. Therefore, each cycle (function, CM) has identical
probabilistic behavior, and the completion of a CM action is a renewal point for the stochastic
process {X(t), t ≥ 0}.

Regardless of the probability distributions governing Ti and Di, the limiting availability is
easy to obtain [10].

$$A = \frac{E(T_i)}{E(T_i) + E(D_i)} = \frac{MTTF}{MTTF + MTTR} \tag{6}$$

Suppose Ti is a Weibull random variable having shape parameter β = 2 and scale parameter η =
1000 hours. Then

$$R(t) = \Pr(T_i > t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \tag{7}$$

$$MTTF = \eta\,\Gamma\left(1 + \frac{1}{\beta}\right) = 886.2 \text{ hours} \tag{8}$$

Suppose Di is a normal random variable having a mean (MTTR) of 25 hours. Thus,

$$A = \frac{886.2}{886.2 + 25} = 0.9726 \tag{9}$$

For this example, availability and average availability values can be estimated using
simulation.
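The Weibull example of Section 3.1 can be checked numerically. The following is a minimal Python sketch, not part of the original tutorial: it evaluates equations (6) and (8) with the standard library and then estimates average availability over a finite horizon by simulating the alternating function/CM cycles. The function names, the 5-hour standard deviation of the normal repair time, the horizon, and the number of replications are illustrative assumptions; the tutorial specifies only the 25-hour mean of Di.

```python
import math
import random


def weibull_mttf(beta, eta):
    """Mean of a Weibull lifetime with shape beta and scale eta, equation (8)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def limiting_availability(mttf, mttr):
    """Limiting availability of the G/G/P model, equation (6)."""
    return mttf / (mttf + mttr)


def simulate_average_availability(beta, eta, mttr, repair_sd, horizon, n_reps, seed=0):
    """Estimate Aavg(horizon) by simulating renewal cycles (function, then perfect CM).

    repair_sd is an assumed value: the source gives only the mean of the normal
    repair time. Negative normal draws are truncated at zero.
    """
    rng = random.Random(seed)
    total_up = 0.0
    for _ in range(n_reps):
        clock = 0.0
        while clock < horizon:
            t_up = rng.weibullvariate(eta, beta)      # arguments are (scale, shape)
            total_up += min(t_up, horizon - clock)    # count uptime only inside the horizon
            clock += t_up
            if clock < horizon:
                clock += max(0.0, rng.gauss(mttr, repair_sd))
    return total_up / (n_reps * horizon)


if __name__ == "__main__":
    mttf = weibull_mttf(beta=2.0, eta=1000.0)
    print(round(mttf, 1))                                 # 886.2, as in equation (8)
    print(round(limiting_availability(mttf, 25.0), 4))    # 0.9726, as in equation (9)
    print(simulate_average_availability(2.0, 1000.0, 25.0, 5.0, 50_000.0, 200))
```

Because the system starts "as good as new", the simulated average availability over a finite horizon sits slightly above the limiting value and approaches it as the horizon grows.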

3.2 A Special Case – The CFR/E/− Model

In the description CFR/E/−, the CFR implies that Ti is an exponential random variable having
failure rate λ, and the E implies that Di is an exponential random variable having repair rate
μ. Since the RS has a constant failure rate (CFR), RS aging and the impact of CM are irrelevant.
Note that the CFR/E/− model is a special case of the G/G/P model. For this RS [10]:

$$A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\,e^{-(\lambda + \mu)t} \tag{10}$$

$$A = \frac{\mu}{\lambda + \mu} \tag{11}$$

$$A_{avg}(T) = \frac{\lambda\left[1 - e^{-(\lambda + \mu)T}\right] + \mu(\lambda + \mu)T}{(\lambda + \mu)^2\,T} \tag{12}$$

For example, suppose λ = 0.001 failures per hour (MTTF = 1000 hours) and μ = 0.025 repairs per
hour (MTTR = 40 hours). In this case,

$$A = \frac{0.025}{0.001 + 0.025} = \frac{1000}{1000 + 40} = 0.9615 \tag{13}$$

A plot of A(t) can be found in Fig. 2, and a plot of Aavg(T) can be found in Fig. 3.

[Figure 2. Example Availability Function – A(t) vs. t]

[Figure 3. Example Average Availability Function – Aavg(T) vs. T]

3.3 PM Optimization − The W/G/P/A/G/P Model

In the description W/G/P/A/G/P, the W indicates that the time to first failure is a Weibull
random variable with shape parameter β (β > 1) and scale parameter η. Note that since β > 1, the
RS has an increasing failure rate (IFR). The A indicates that the RS is subjected to an
age-based PM policy: If the RS functions without failure for τ time units, a PM action is
initiated. Furthermore, the second G indicates that the duration of PM is a random variable, and
the second P indicates that PM restores the RS to an “as good as new” condition. Note that PM
may be worthwhile if PM is cheaper and/or faster than CM, since the RS has an increasing failure
rate and PM reduces the age of the RS. Therefore, it may be of interest to derive an optimal PM
policy for the RS. Specifically, we can modify our existing probability models to identify the
value of τ that maximizes the limiting availability of the RS.

Let T denote the duration of an interval of RS function. Let f(t) denote the probability density
function (pdf) of T, and let F(t) denote the cumulative distribution function (cdf) of T. Let
DPM denote the duration of a PM action, and let DCM denote the duration of a CM action.

Fig. 4 contains a graphical portrayal of RS behavior under such a PM policy.

[Figure 4. RS Behavior under an Age-Based PM Policy. Timeline of one cycle: a period of function
lasting τ if T > τ, followed by PM of duration DPM, or lasting T if T ≤ τ, followed by CM of
duration DCM.]

Because of our assumptions, the completion of any maintenance action corresponds to a renewal
point, and

$$A = \frac{E(\text{uptime})}{E(\text{uptime}) + E(\text{downtime})} \tag{14}$$

We can derive E(uptime) and E(downtime) by conditioning on the value of T.

$$E(\text{uptime}) = \int_0^{\tau} t\,f(t)\,dt + \tau\,[1 - F(\tau)] \tag{15}$$

$$E(\text{downtime}) = E(D_{CM})\,F(\tau) + E(D_{PM})\,[1 - F(\tau)] \tag{16}$$

Note that the integral in E(uptime) typically must be evaluated numerically. Numerical analysis
can then be used to compute limiting availability values for various values of τ.

For example, suppose T is a Weibull random variable having β = 2 and η = 80 hours. Suppose
E(DCM) = 8 hours and E(DPM) = 2 hours. Fig. 5 contains a plot (generated by Mathematica) of the
limiting availability of the RS as a function of the age-based PM policy, τ. The optimal PM
policy is τ* = 47.5 hours with a corresponding limiting availability of 0.9182.

4. MINIMAL REPAIR MODELS

The second class of RS models that we address is based on the concept of minimal repair.

4.1 The Generic Case – The G/0/M Model

In the description G/0/M, the G again implies that the time to first failure of the RS is a
random variable. The 0 implies that CM is instantaneous, and the absence of the last
three elements of the taxonomy implies that no PM is performed. The M implies that CM is
minimal, i.e. CM restores the RS to an “as bad as old” condition. Minimal CM, or minimal repair,
implies that the RS functions after CM but its equivalent age is the same as it was at the time
of failure. As with the “as good as new” assumption, the realism of the “as bad as old”
assumption is often questioned.

[Figure 5. Example PM Optimization – A vs. τ]

Let T denote the duration of the first interval of RS function. Let f(t) denote the pdf of T,
let F(t) denote the cdf of T, and let z(t) denote the hazard function of T. Then, {N(t), t ≥ 0}
is a non-homogeneous Poisson process (NHPP) having intensity function z(t) [11]. Since {N(t),
t ≥ 0} is an NHPP having intensity function z(t), N(t) is a Poisson random variable having mean
Z(t), where Z(t) is the cumulative intensity function.

$$Z(t) = \int_0^t z(u)\,du \tag{17}$$

Furthermore, N(t+s) − N(s) is a Poisson random variable having mean Z(t+s) − Z(s) [11].

$$N(t) \in \{0, 1, \ldots\} \tag{18}$$

$$E[N(t)] = Var[N(t)] = Z(t) \tag{19}$$

$$\Pr[N(t) = n] = \frac{e^{-Z(t)}\,[Z(t)]^n}{n!} \tag{20}$$

$$N(t+s) - N(s) \in \{0, 1, \ldots\} \tag{21}$$

$$E[N(t+s) - N(s)] = Z(t+s) - Z(s) \tag{22}$$

$$Var[N(t+s) - N(s)] = Z(t+s) - Z(s) \tag{23}$$

$$\Pr[N(t+s) - N(s) = n] = \frac{e^{-[Z(t+s) - Z(s)]}\,[Z(t+s) - Z(s)]^n}{n!} \tag{24}$$

4.2 The CFR/0/− Model – The Poisson Process

If T is an exponential random variable then z(t) is a constant value λ and {N(t), t ≥ 0} is a
Poisson process having rate λ. In this case, the impact of CM is irrelevant. Since CM is
instantaneous, {N(t), t ≥ 0} is a Poisson process having rate λ and N(t) is a Poisson random
variable with mean λt [11].

$$N(t) \in \{0, 1, \ldots\} \tag{25}$$

$$E[N(t)] = Var[N(t)] = \lambda t \tag{26}$$

$$\Pr[N(t) = n] = \frac{e^{-\lambda t}\,(\lambda t)^n}{n!} \tag{27}$$

Furthermore, N(t+s) − N(s), the number of failures in the interval (s,t+s], is also a Poisson
random variable having mean λt [11]. The implication of this result is that the number of
failures in a given interval depends only on the length of the interval. Note that this is not
true for an NHPP.

4.3 The W/0/M Model – The Power Law Process

Suppose T is a Weibull random variable having shape parameter β and scale parameter η. Then,

$$z(t) = \frac{\beta}{\eta^{\beta}}\,t^{\beta - 1} \tag{28}$$

$$Z(t) = \left(\frac{t}{\eta}\right)^{\beta} \tag{29}$$

and {N(t), t ≥ 0} is a power law process. If β > 1 (β < 1), then the intensity function
increases (decreases) and failures tend to occur more (less) frequently over time. Suppose β =
1.75 and η = 1500 hours. Then,

$$E[N(1000)] = 0.4919 \tag{30}$$

$$E[N(2000) - N(1000)] = 1.1626 \tag{31}$$

$$E[N(3000) - N(2000)] = 1.7092 \tag{32}$$

$$\Pr[N(1000) > 2] = 0.0138 \tag{33}$$

$$\Pr[N(2000) - N(1000) > 2] = 0.1125 \tag{34}$$

$$\Pr[N(3000) - N(2000) > 2] = 0.2452 \tag{35}$$

4.4 Optimal Replacement – The W/0/M/B/0/P Model

Consider the W/0/M model with β > 1. Suppose the RS under consideration has an increasing
intensity function. Over time, failures will tend to occur more frequently, and at some point,
it will become economical to replace the system. Let τ denote the replacement time. Replacement
of this type would be equivalent to perfect, instantaneous PM under a Block PM policy. The
result is the W/0/M/B/0/P model. For such a RS, we can use a cost model to choose an optimal
value of τ.

Let cf denote the cost of a failure, let ca denote the cost of replacing the RS, and let C(τ)
denote the cost per unit time of RS ownership if the RS is replaced at time τ. Then:

$$E[C(\tau)] = \frac{1}{\tau}\left\{c_a + c_f\,E[N(\tau)]\right\} \tag{36}$$

$$E[C(\tau)] = \frac{1}{\tau}\left[c_a + c_f\,Z(\tau)\right] \tag{37}$$

$$E[C(\tau)] = \frac{c_a}{\tau} + \frac{c_f\,\tau^{\beta - 1}}{\eta^{\beta}} \tag{38}$$

Differentiation and algebraic manipulation yield [9]:

$$\tau^{*} = \left[\frac{c_a\,\eta^{\beta}}{c_f\,(\beta - 1)}\right]^{1/\beta} \tag{39}$$

For example, if β = 1.75, η = 1500 hours, ca = $1000, and cf = $75, then the RS should be
replaced after τ* = 7768 hours.

5. IMPERFECT MAINTENANCE

The final class of RS models that we address is based on the concept of imperfect maintenance
[12]. The phrase “imperfect maintenance” typically refers to the impact of CM
on a RS. Imperfect maintenance is more effective than minimal CM but not as effective as perfect
CM. A wide variety of imperfect maintenance models have been presented in the literature [12].
Our focus is on two of the models.

5.1 The W/E/K1 Model

Under this model, the time to first RS failure is a Weibull random variable having β > 1. In the
description W/E/K1, the K1 refers to Kijima’s first model [13] of the imperfect impact of CM. In
two papers [13,14], Kijima proposes this model and another imperfect CM model that he refers to
as “virtual age” models. K1 refers to the first of these models.

Let Vi denote the virtual age of the RS after the ith CM action, and let V0 = 0. Kijima’s first
virtual age model indicates that

$$V_i = V_{i-1} + a\,T_i \tag{40}$$

where 0 < a < 1. Under this model, each CM action removes a portion, 1 − a, of the age
accumulated during the most recent period of RS function.

After the first CM action, the RS age is V1 = aT1. The duration of the second period of RS
function (T2) is governed by a residual Weibull probability distribution (the parameters β and η
are unchanged) given survival to age V1. Let F1(t) denote the cdf of T1. Then:

$$F_1(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \tag{41}$$

Let F2(t) denote the cdf of T2. Then:

$$F_2(t) = 1 - \exp\left[-\left(\frac{t + V_1}{\eta}\right)^{\beta} + \left(\frac{V_1}{\eta}\right)^{\beta}\right] \tag{42}$$

After the (i − 1)th CM action, the RS age is:

$$V_{i-1} = a \sum_{j=1}^{i-1} T_j \tag{43}$$

The duration of the ith period of RS function is governed by a residual Weibull probability
distribution (the parameters β and η are unchanged) given survival to age Vi−1. Let Fi(t) denote
the cdf of Ti. Then:

$$F_i(t) = 1 - \exp\left[-\left(\frac{t + V_{i-1}}{\eta}\right)^{\beta} + \left(\frac{V_{i-1}}{\eta}\right)^{\beta}\right] \tag{44}$$

Since β > 1 and a > 0, the age of the RS continues to increase over time. Therefore, the
reliability performance of the RS degrades over time. Note that if a = 0, then CM is perfect,
and note that if a = 1, then CM is minimal.

Suppose we wish to evaluate the behavior of this type of RS using the availability function, the
average availability function, or the average number of failures. We can use discrete-event
simulation to conduct this evaluation. With the simulation approach, we begin by constructing a
computer program that mimics RS function, failure, CM and aging over some finite time period.
Within the computer program, we track the status of the RS, the cumulative uptime of the RS, and
the number of RS failures, and we occasionally stop the simulation clock and record these
metrics. Then, we replicate the entire process some specified number of times.

The inputs to the simulation model are the RS reliability and maintainability parameters (β, η,
μ, a) and the following simulation parameters: the run length (tend), the number of observation
points (m), and the number of replications (n). Note that the m observation points {t1, t2, … ,
tm} are equally-spaced over the simulation run. The outputs from the simulation model are:
Status(i,tj) – the status of the RS (1 or 0) at observation point j in replication i;
Uptime(i,tj) – the accumulated uptime of the RS at observation point j in replication i;
NumFailure(i,tj) – the cumulative number of RS failures at observation point j in replication i.
These outputs can be converted to the following performance estimates:

$$\hat{A}(t_j) = \frac{1}{n} \sum_{i=1}^{n} \text{Status}(i, t_j) \tag{45}$$

$$\hat{A}_{avg}(t_j) = \frac{1}{n\,t_j} \sum_{i=1}^{n} \text{Uptime}(i, t_j) \tag{46}$$

$$\hat{E}[N(t_j)] = \frac{1}{n} \sum_{i=1}^{n} \text{NumFailure}(i, t_j) \tag{47}$$

Suppose β = 1.75, η = 1, μ = 5, and a = 0.2. Using the output from n = 153,664 replications of a
simulation of length tend = 25 with m = 2000 observations, the RS performance estimates were
computed and compiled into Fig. 6. Note the degrading availability behavior.

[Figure 6. RS Performance under the W/E/K1 Model. Curves: A(t), Aavg(t) and E[N(t)] plotted
against t.]

5.2 The W/E/K2 Model

Under Kijima’s second virtual age model (K2), RS aging is governed by the following equation:

$$V_i = a\,(V_{i-1} + T_i) \tag{48}$$

Under this model, each CM action removes a portion of the total accumulated age. Using the same
parameters as the example in Section 5.1, the simulation modeling approach can be used to obtain
the RS performance estimates captured in Fig. 7. Note that under Kijima’s second virtual age
model, the RS achieves steady-state availability performance.

As a final note, the simulation approach is applicable to any RS model, but most appropriate for
those that cannot yield analytic performance results. Advances in computing technology make it a
more practical approach every day, and in many cases, the nature of the output makes it possible
to control the degree of uncertainty (e.g., confidence levels and precision) in the results.
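The simulation approach described in Section 5.1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the author's program. Periods of function are drawn by inverse-transform sampling from the residual Weibull cdf in equation (44), CM durations are taken to be exponential with rate μ (the "E" in W/E/K1), and the virtual age is updated with equation (40) for K1 or equation (48) for K2. All function names are hypothetical, and the small values of m and n used in the demonstration are chosen for speed rather than to reproduce the n = 153,664 replications reported above.

```python
import math
import random


def sample_up_time(v, beta, eta, rng):
    """Draw the next period of function given virtual age v by inverting cdf (44)."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    t = eta * ((v / eta) ** beta - math.log(1.0 - u)) ** (1.0 / beta) - v
    return max(0.0, t)  # guard against tiny negative values from rounding


def simulate_replication(beta, eta, mu, a, t_end, m, model, rng):
    """One replication: Status, Uptime and NumFailure at m equally spaced points."""
    obs = [t_end * (j + 1) / m for j in range(m)]
    status, uptime, failures = [0] * m, [0.0] * m, [0] * m
    clock = up_total = v = 0.0
    n_fail = 0
    j = 0
    while j < m:
        t_up = sample_up_time(v, beta, eta, rng)
        while j < m and obs[j] < clock + t_up:      # observation points while functioning
            status[j] = 1
            uptime[j] = up_total + (obs[j] - clock)
            failures[j] = n_fail
            j += 1
        clock += t_up
        up_total += t_up
        n_fail += 1
        # Virtual age after CM: equation (40) for K1, equation (48) for K2.
        v = v + a * t_up if model == "K1" else a * (v + t_up)
        t_down = rng.expovariate(mu)                # exponential CM duration, rate mu
        while j < m and obs[j] < clock + t_down:    # observation points while down
            uptime[j] = up_total
            failures[j] = n_fail
            j += 1
        clock += t_down
    return status, uptime, failures


def estimate_performance(beta, eta, mu, a, t_end, m, n, model="K1", seed=0):
    """Average n replications into the estimates of equations (45) to (47)."""
    rng = random.Random(seed)
    obs = [t_end * (j + 1) / m for j in range(m)]
    s_sum, u_sum, f_sum = [0] * m, [0.0] * m, [0] * m
    for _ in range(n):
        s, u, f = simulate_replication(beta, eta, mu, a, t_end, m, model, rng)
        for j in range(m):
            s_sum[j] += s[j]
            u_sum[j] += u[j]
            f_sum[j] += f[j]
    a_hat = [x / n for x in s_sum]
    a_avg_hat = [u_sum[j] / (n * obs[j]) for j in range(m)]
    en_hat = [x / n for x in f_sum]
    return obs, a_hat, a_avg_hat, en_hat


if __name__ == "__main__":
    for model in ("K1", "K2"):
        obs, a_hat, a_avg_hat, en_hat = estimate_performance(
            1.75, 1.0, 5.0, 0.2, 25.0, 50, 2000, model)
        print(model, round(a_hat[-1], 3), round(a_avg_hat[-1], 3), round(en_hat[-1], 2))
```

Setting a = 0 or a = 1 in this sketch gives the perfect-repair and minimal-repair cases, which offers a way to compare the simulation against the analytic results of Sections 3 and 4 before trusting it on the imperfect-maintenance models.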

[Figure 7. RS Performance under the W/E/K2 Model. Curves: A(t), Aavg(t) and E[N(t)] plotted
against t.]

6. ADVANCED TOPICS

The models presented in this tutorial are very basic and subject to some very limiting
assumptions. However, these models are applicable in many industrial and military settings. More
importantly, the concepts related to these models serve as the building blocks for more complex
and realistic repairable systems models. We summarize some of these advanced topics below.

6.1 System-Level Maintenance

A RS is almost always comprised of many components that have different maintenance “needs”.
Optimizing maintenance planning at the component level is likely to be suboptimal at the system
level. Therefore, we need system-level maintenance strategies for performing component-level
maintenance.

The availability function for a system is a function of the availability functions of its
components. If the components are independent, then component availability functions can be
combined like reliability functions (e.g., series, parallel, series-parallel) to construct the
system availability function.

Often, components that comprise a system are not independent. This dependence can be either
structural or economic. Structural dependence may manifest itself in terms of common-cause
failures or maintenance-induced damage. Economic dependence may result in the potential for
opportunism.

Opportunism can be explained with a simple example. Consider a RS comprised of two components.
Either due to failure or expiration of a PM interval, maintenance is about to be performed on
component 1. If component 2 is near the expiration of its PM interval, then it may be worthwhile
to go ahead and perform PM on component 2. Such an action is an opportunistic maintenance action
[10].

6.2 Fleet-Level Maintenance

Organizations often utilize fleets of repairable systems that share organizational resources.
Maintenance plans should account for the opportunities and limitations resulting from having a
fleet of identical repairable systems. One such example is cannibalization.

Cannibalization refers to a maintenance action in which a failed component in a RS is replaced
with a functioning component from another RS that is failed for some other reason and waiting in
some maintenance queue. Cannibalization is often used as a maintenance alternative due to the
absence of spare parts and the need for reduced maintenance delays.

6.3 Allocating Maintenance Resources

Many RS models assume that the resources required to perform maintenance are available in
unlimited quantities. In fact, components within a RS, and systems within an organization, share
maintenance resources. Typically, insufficient resources are available for performing all
desirable maintenance actions. In such a case, a strategy is needed for prioritizing maintenance
actions and/or selecting a subset of maintenance actions to perform. Such a strategy falls in
the domain of selective maintenance. For a single RS, selective maintenance strategies answer
questions such as: Which components should be repaired? Which components should be subjected to
PM? Which components should be replaced? [15,16,17,18] Given multiple RS in an organization,
selective maintenance strategies recommend allocation of the organization’s maintenance
resources.

6.5 Productive Maintenance (not Total Productive Maintenance)

Models for optimizing PM policies typically use (as we have) equipment availability or cost of
equipment operation as the performance measure. In most cases, a unit of equipment (a single RS)
only supports a part of an organization’s mission. Maintenance decisions should be made so that
the productivity of the entire organization is maximized. Examples of such a philosophy may
include incorporating production scheduling and PM planning or integrating vehicle load
assignment and PM planning.

7. CONCLUSIONS

In this tutorial, we present the basic reliability and maintainability concepts and mathematical
modeling approaches associated with repairable systems. The concepts and models should
facilitate the attendee’s understanding of the repairable systems modeling literature and
ability to formulate and solve repairable systems modeling problems.

As a final note, attendees may want to pursue opportunities to learn more about several areas
closely related to repairable systems modeling. These areas include Markov modeling, equipment
inspection models, spare parts inventory management, condition-based maintenance, and diagnostic
modeling.

8. REFERENCES

1. McCall, J.J. (1965) “Maintenance Policies for Stochastically Failing Equipment: A Survey”,
Management Science, Vol. 11, No. 5, pp. 493-524.
2. Pierskalla, W.P., and J.A. Voelker (1976) “A Survey of Maintenance Models: The Control and
Surveillance of Deteriorating Systems”, Naval Research Logistics Quarterly, Vol. 23, No. 3, pp.
353-388.
3. Osaki, S. and T. Nakagawa (1976) “Bibliography for Reliability and Availability of Stochastic
Systems”, IEEE Transactions on Reliability, Vol. 25, pp. 284-287.
4. Sherif, Y.S. and M.L. Smith (1981) “Optimal Maintenance Models for Systems Subject to Failure
– A Review”, Naval Research Logistics Quarterly, Vol. 28, pp. 47-74.
5. Valdez-Flores, C. and R.M. Feldman (1989) “Survey of Preventive Maintenance Models for
Stochastically Deteriorating Single-Unit Systems”, Naval Research Logistics, Vol. 36, No. 4, pp.
419-446.
6. Cho, D.I. and M. Parlar (1991) “A Survey of Maintenance Models for Multi-Unit Systems”,
European Journal of Operational Research, Vol. 51, pp. 1-23.
7. Dekker, R. (1996) “Applications of Maintenance Optimization Models: A Review and Analysis”,
Reliability Engineering and System Safety, Vol. 51, No. 3, pp. 229-240.
8. Wang, H. (2002) “A Survey of Maintenance Models for Deteriorating Systems”, European Journal
of Operational Research, Vol. 139, pp. 469-489.
9. Ascher, H. and H. Feingold (1984) Repairable Systems Reliability, Marcel Dekker, Inc., New
York.
10. Barlow, R.E. and F. Proschan (1965) Mathematical Theory of Reliability, John Wiley & Sons,
Inc., New York.
11. Ross, S.M. (1989) Introduction to Probability Models, Seventh Edition, Harcourt Academic
Press, San Diego.
12. Pham, H. and H. Wang (1996) “Imperfect Maintenance”, European Journal of Operational
Research, Vol. 94, pp. 425-438.
13. Kijima, M., H. Morimura, and Y. Suzuki (1988) “Periodic Replacement Problem Without Assuming
Minimal Repair”, European Journal of Operational Research, Vol. 37, pp. 194-203.
14. Kijima, M. (1989) “Some Results for Repairable Systems With General Repair”, Journal of
Applied Probability, Vol. 26, pp. 89-102.
15. Rice, W.F., C.R. Cassady and J.A. Nachlas (1998) “Optimal Maintenance Plans under Limited
Maintenance Time”, Industrial Engineering Research ’98 Conference Proceedings.
16. Cassady, C.R., W.P. Murdock and E.A. Pohl (2001) “A Deterministic Selective Maintenance
Model for Complex Systems”, Recent Advances in Reliability and Quality Engineering (H. Pham,
Editor), World Scientific, Singapore, pp. 311-325.
17. Cassady, C.R., E.A. Pohl and W.P. Murdock (2001) “Selective Maintenance Modeling for
Industrial Systems”, Journal of Quality in Maintenance Engineering, Vol. 7, No. 2, pp. 104-117.
18. Cassady, C.R., W.P. Murdock and E.A. Pohl (2001) “Selective Maintenance for Support
Equipment Involving Multiple Maintenance Actions”, European Journal of Operational Research,
Vol. 129, No. 2, pp. 252-258.
