
Reliability Engineering and System Safety 237 (2023) 109313


Maintenance optimization of a system subject to two-stage degradation, hard failure, and imperfect repair
Tom Ivar Pedersen *, Xingheng Liu, Jørn Vatn
Department of Mechanical and Industrial Engineering, Norwegian University of Science & Technology (NTNU), Trondheim 7491, Norway

ARTICLE INFO

Keywords: Maintenance optimization; Imperfect repair; Hard failure; Condition-based maintenance (CBM); Long-run cost rate; Process industry

ABSTRACT

This paper develops a condition-based maintenance (CBM) policy for a two-component system subject to continuous degradation and hard failure. The cumulative degradation of one of the components is modeled by a two-stage degradation process, which then determines the failure probability of the other component via a proportional hazards model. Imperfect repair reduces the degradation level without changing the degradation rate, whereas preventive renewal restores both components to the as-good-as-new condition. The CBM policy is optimized by finding the imperfect repair threshold and the preventive renewal time that minimize the long-run cost rate. We propose a numerical procedure to find the solution without Monte Carlo simulation. Furthermore, the modeling framework is flexible enough to account for maintenance delay and opportunistic maintenance. A case study of the maintenance of a cooling system taken from a company in the process industry is presented to illustrate the applicability of the proposed maintenance policy.

1. Introduction

Assets in the process industry have for many years been equipped with sensors that monitor the condition of the systems [1], prompting the application of condition-based maintenance (CBM) [2]. CBM optimization models have received increasing attention in the literature [3], but their application in practice lags [4,5]. This is partly because of the complexity of real-life systems compared to the stylized ones commonly studied [6]. When using analytical methods for maintenance optimization, simplifications must be made to make the calculations tractable. However, by deviating from the actual system, the results may become misleading and less cost-effective [7]. To avoid such consequences, it is tempting to plan and optimize the maintenance policy for a specific system in a targeted manner.

This paper investigates the optimization of a CBM policy for a two-component system. This is motivated by the operation and maintenance of a cooling system used on some electric arc furnaces in the process industry, where we have had access to industrial sensor data and maintenance records. This system consists of a cable supplying cooling water, named the upstream component (UC), and a downstream component (DC) that needs cooling. When the UC degrades, this results in a drop in the cooling water flow rate to the DC. Meanwhile, the need for cooling the DC fluctuates depending on several factors related to the production process and the ambient condition. However, the risk of hard failure in the DC increases as the cooling water flow rate decreases. Hard failure is, in this paper, understood as a type of failure characterized by a sudden breakdown. This contrasts with soft failures, where a component is considered failed when the degradation reaches a predefined threshold [8]. In the motivating case study, hard failure occurs when a lack of cooling causes damage to the DC so that this component can no longer contain the cooling water. As this causes water to leak into the furnace, production must be stopped, and the DC must be renewed. Furthermore, the renewal cost of the UC is small compared to the DC. The ability of the UC to fulfill its required function is easily monitored with online sensors, while the health of the DC can only be revealed with prohibitively expensive inspections.

Maintenance actions for this system include imperfect repair (IR), a cleaning procedure that partially restores the performance of the UC, and preventive renewal (PR), which requires a shutdown of the system and causes an unavailability cost. Although IR can temporarily restore the flow rate, it does not influence the degradation rate. As the degradation rate increases with time, IRs are needed more and more frequently, and eventually, the system reaches a state where it is no longer economical to continue performing IRs. PR, on the other hand, brings both the UC and the DC to the as-good-as-new (AGAN) state. Because of this, performing PR at the right time is important to minimize

* Corresponding author.
E-mail address: tom.i.pedersen@ntnu.no (T.I. Pedersen).

https://doi.org/10.1016/j.ress.2023.109313
Received 7 July 2022; Received in revised form 21 November 2022; Accepted 15 April 2023
Available online 20 April 2023
0951-8320/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
T.I. Pedersen et al. Reliability Engineering and System Safety 237 (2023) 109313

the long-run cost rate.

Although the maintenance policies proposed in this paper are tailored to a specific system, the presented approach can also be used as a basis for the maintenance optimization of similar systems. Examples of systems with a relatively simple and easily monitored UC, where an IR action is available, and which support a more complex DC, where the cost of failure is high, are numerous in the process industry. This can be other cooling systems with heat exchangers as the UC. Other examples are hydraulic or lubrication systems with the hydraulic oil or the lubricant as the UC. These are all examples of UCs where a reduction in their ability to perform the required function affects the failure rates of the DCs.

1.1. Literature review

Maintenance optimization of the above system must consider the following two aspects: the health of the UC when degradation, renewals, and imperfect repairs are taken into account and the stochastic dependence between the UC and the DC.

1.1.1. Modeling of the baseline degradation

The baseline degradation of continuously deteriorating systems, i.e., the evolution of the health indicator when the system is not subject to any maintenance interventions, is usually described by a general path model or stochastic processes such as the Wiener process, the Gamma process, the inverse Gaussian (IG) process [9–12], or their variations [13–15]. The Wiener process is most suitable for modeling non-monotonic degradation paths, while the Gamma and the IG processes are widely used for systems that degrade monotonically. Choosing the most appropriate baseline degradation is vital but often receives less attention. For example, the authors of [16] point out that despite its popularity, the Gamma process can be wrongly used when there is no evidence that the variance-to-mean ratio is a constant.

In the case study in this paper, the baseline degradation is non-monotonic. As shown in the literature review by [17], extensive development of Wiener-process-based methods has improved the capability and flexibility of approaches for modeling non-homogeneous degradation processes. Examples of this development are models that consider non-linearity, covariates, and heterogeneity among the individual degrading components. Recent papers that propose variants of Wiener-based degradation processes are [18], which proposed a Wiener-based degradation process for a two-stage degradation process, and [19], which models a multi-stage Wiener-based degradation process.

1.1.2. Modeling the effect of imperfect maintenance

When it comes to modeling the effect of imperfect maintenance, there are, according to [20], mainly three different approaches: reduce the virtual age [21–24], reduce the degradation rate [25,26], or reduce the degradation level. The latter is relevant to the system in the motivating example in this paper. The reduction in degradation level can either be proportional and deterministic [27–30], inspired by the arithmetic reduction of age (ARA) models, or can be modeled by a continuous random variable [31–34].

In the case study in this paper, IR has not been shown to affect the degradation rate or the virtual age. When IR is performed, the health indicator (HI) is restored to a level between as-bad-as-old (ABAO) and as-good-as-new (AGAN) based on a continuous probability distribution. In this study, the switch between IR and PR is time-based [35] rather than based on the number of IRs [33,36]. This is because a time-based policy generally facilitates the administration and material preparation work.

1.1.3. Stochastic dependence between components

Another important feature of this paper is the joint consideration of continuous degradation and hard failure. This issue was first addressed in [37], then investigated in [38–40]. These studies aim to determine whether the exponential formula for reliability still applies when the failure rate is conditioned on a stochastic process governed by covariates. A practical application can be found in [41], where a general path model is used to describe the continuous degradation that determines the failure rate. If the hazard rate at an inspection exceeds a certain threshold, preventive renewal is performed. This threshold is then optimized based on the long-run cost rate criterion. Similar problems are addressed in [42] and [43], where the baseline degradation is modeled by a Wiener process, and the optimal predictive maintenance strategy is developed.

In this paper, we apply the proportional hazards model, as in [42], and assume that the failure rate is constant within a short time interval (piecewise constant). There are two reasons for this. Firstly, our system is monitored at a relatively high frequency, so the information loss by such an approximation is limited. Secondly, this approximation facilitates estimating the system's degradation level when including the effect of IR and PR.

1.2. Contributions and scope of the paper

CBM optimization models that consider preventive renewal and imperfect repair together with hard failure are rare in the existing literature, except for a recent stream of papers using the semi-Markov decision process (SMDP) framework [44–47]. To the best of our knowledge, this is the first paper to jointly consider the optimization of imperfect repair (IR), preventive renewal (PR), and the degradation-dependent failure rate with a two-stage degradation process and maintenance delay. The main methodological contribution of this paper is to present a novel numerical routine that can efficiently find the long-run cost rate for the maintenance of the studied system without resorting to Monte Carlo simulation. Our method does not rely on a particular choice of the baseline degradation process: it does not matter whether a Wiener process or a general degradation path model is used. As long as the degradation process happens in independent increments with respect to time and a function providing a probability distribution of the baseline degradation in any time increment is available, the evolution of the health indicator can be tracked using the proposed numerical routine.

Moreover, our modeling framework is flexible enough to incorporate maintenance delay and opportunistic maintenance, making it useful for many practical situations. We have previously used a similar approach in [48]. In this paper we demonstrate the flexibility and pertinence of our proposed numerical routine by showing how it can be tailored to a specific industrial case study that would have been hard to model analytically. However, human errors, harmful maintenance, and parameter estimation are beyond the scope of this paper.

The remainder of this paper is organized as follows. In Section 2, the maintained system and assumptions are presented. The proposed maintenance policies are formulated in Section 3. A case study to illustrate the applicability of the proposed policies is presented in Section 4. The paper ends with conclusions in Section 5. The notations used in this paper are listed in Table 1.

2. System description and assumptions

Below are some general assumptions on the failure and degradation of the investigated system:

• The system consists of an upstream component (UC) and a downstream component (DC).
• The UC is subjected to a two-stage continuous degradation process, monitored at instants $\tau, 2\tau, 3\tau, \dots$
• Monitoring is done nearly continuously, i.e., $\tau$ is short, and there is no inspection cost.
• The DC is subject to hard failures. The failure cost is $c_f$.


• The DC is not subject to aging, and its failure rate at time $t$ depends solely on the UC's health indicator (HI) at time $t$.

Table 1
Notation.

Symbol | Description
$X_k$ | state of the health indicator for the upstream component (UC) at time $k\tau$
$\tau$ | inspection interval
$M_{IR1}, M_{IR2}, M_{PR}, M_c$ | maintenance threshold for imperfect repair (IR) of type 1 and 2 and preventive renewal, and lower control limit (LCL)
$T_S, T_D, T_R$ | duration of the S-phase, duration of the D-phase, and the renewal cycle length, $T_R = T_S + T_D$
$T_f, T_C$ | the hard failure time in the S-phase, time of change from S to D-phase, $T_S = \min(T_f, T_C)$
$T_{f,D}, T_{M_{PR}}$ | the local hard failure time in the D-phase, the local time of preventive renewal in the D-phase, $T_D = \min(T_{f,D}, T_{M_{PR}})$
$\mu(t; a_\mu, b_\mu), \sigma(t; a_\sigma, b_\sigma)$ | time-dependent drift and diffusion coefficient of the degradation process
$Q_k$ | the baseline degradation in the time interval $((k-1)\tau, k\tau]$, $Q_k = X_k^- - X_{k-1}^+$
$g_k$ | PMF of the baseline degradation in the time interval $((k-1)\tau, k\tau]$
$C(t), c_\infty$ | cumulative cost in the time interval $(0, t]$, long-run cost rate
$H(t; x)$ | probability of hard failure in the interval $(0, t]$ given the state of the health indicator, $x$
$r(x; \alpha, \beta, \xi)$ | distribution of the effect of imperfect repair (IR)
$\lambda_{IR1}, \lambda_{IR2}$ | arrival rate for IR of type 1 and 2
$c_{i1}, c_{i2}, c_p, c_f, c_u$ | cost of IR of type 1 and 2, preventive renewal (PR), failure, and unavailability
$N_{i1}(k), N_{i2}(k)$ | number of IR of type 1 and 2 in the time interval $(0, k\tau]$
$\mu_S$ | the expected state of the HI in the S-phase
$w, Z$ | time between arrival of maintenance windows, waiting time for the next maintenance window

Assumptions on the maintenance activities are as follows:

• Imperfect repair (IR) requests are made at two predefined thresholds of the HI of the UC: $M_{IR1}$ and $M_{IR2}$. The delay from when a request is made to the execution of IR is exponentially distributed. Execution of IR does not disturb the production process.
• IR restores the UC's HI to a state between AGAN and ABAO but does not affect the rate, $\mu(t)$, of the baseline degradation process.
• Preventive renewal (PR) of the UC is performed at a predetermined D-phase length, $T_{PR}$, with a cost of $c_p$. When PR of the UC has been performed, both the UC and the DC are considered AGAN. This is because the failure rate of the DC is assumed time-invariant.
• If PR of the UC is performed outside a maintenance window, this leads to an unavailability cost, $c_u$, due to production losses.
• Maintenance windows arrive periodically with a fixed interval length, $w$. If PR is performed in a maintenance window, $c_u = 0$.

Further details are given below.

2.1. The degradation stages

The degradation of the UC happens in a two-stage process. The component is healthy until the arrival of a shock that introduces a potential failure. The first phase is named the stable phase (S-phase). The second phase is named the degradation phase (D-phase). This is similar to the PF model, see e.g., [49], ch. 12, and is often encountered in practice [50]. In the absence of active maintenance, the health declines in the D-phase, based on an increasing and non-monotonic stochastic degradation process. The transition point between the two phases is assumed to be easily detected based on the condition monitoring data.

2.1.1. The stable phase (S-phase)

In the S-phase, the health indicator of the UC is denoted as $X_S$ with mean $\mu_S$ and is subject to small random fluctuations. Let the system be inspected every $\tau$ time unit. The duration of the S-phase, $T_S$, is said to be equal to $k\tau$ if a hard failure occurs between $(k-1)\tau$ and $k\tau$, or if the inspection at time $k\tau$ reveals that the UC has entered the D-phase. The definition of entry into the D-phase may be based on crossing a defined threshold of the health indicator or on the detection of changes in the characteristics of the degradation process [51,52]. $T_f$ is the hard failure time of the DC, and $T_C$ is the time when the UC is found to have entered the D-phase. Thus $T_S = \min(T_f, T_C)$.

The probability of $T_C = k\tau$ is $P[T_C = k\tau] = F_{T_C}(k\tau) - F_{T_C}((k-1)\tau)$, $k \ge 1$, where $F_{T_C}(t) = P[T_C \le t]$. In this paper, we model the length of the S-phase with the Weibull distribution. Thus,

$$F_{T_C}(t) = P[T_C \le t] = 1 - \exp\left(-\left(\frac{t}{\theta_W}\right)^{\alpha_W}\right), \quad \text{for } t \ge 0. \tag{1}$$

We assume that the failure rate of the DC is time-invariant but depends on the HI of the UC:

$$\lambda(x) = \lambda_0 \exp(\gamma(\eta - x)), \tag{2}$$

where the baseline failure rate is $\lambda_0$ and $\eta$ and $\gamma$ are shape parameters. Because the fluctuations in $X_S$ are assumed small, the cumulative probability of hard failure in the S-phase in the interval $(0, t]$ is approximated as $H(t; \mu_S) = P[T_f \le t \mid X_u = \mu_S, 0 \le u \le t]$. Thus:

$$H(k\tau; \mu_S) = 1 - \exp\left(-\int_0^{k\tau} \lambda(\mu_S)\, du\right) = 1 - \exp\left(-k\tau\, \lambda_0 \exp(\gamma(\eta - \mu_S))\right). \tag{3}$$

We simplify the notation of $H(k\tau; \mu_S)$ as $H_{k,S}$.

2.1.2. The degradation phase (D-phase)

In the D-phase, the HI of the UC at time $k\tau$ is denoted $X_k$ with probability mass function (PMF) $f_{X_k}(x) = P[X_k = x]$. We assume that the degradation in the D-phase follows a process with the following properties. First, the distribution of the health indicator at the start of the D-phase is known. Second, the degradation increments between two inspection times are independent and have a known distribution: $X_k - X_{k-1} \sim g_k$.

In the rest of this paper, the HI is discretized to simplify the calculation of its distribution as it evolves over time. The discretization error can be reduced if the HI is defined on a more finely spaced lattice but with an increased computational cost. Now, consider the interval $((k-1)\tau, k\tau]$, and assume there is no DC failure and no maintenance interventions in this interval. A convolution argument now yields $f_{X_k} = f_{X_{k-1}} * g_k$ for $k = 1, 2, \dots$, where $*$ is the convolution operator and $g_k$ is the PMF of the degradation increment in the interval $((k-1)\tau, k\tau]$.

Because the length of $\tau$ is short, we assume that the health indicator is constant within this time interval. Thus, the probability of hard failure of the DC in the same time interval, given that the health indicator is $x$, is $P[T \le k\tau \mid x, (k-1)\tau \le u \le k\tau] = H(\tau; x)$, where $H(\cdot)$ is defined in Eq. (3). Next, we give more details about the triggering mechanism and effects of imperfect repair actions.

2.2. Imperfect repair (IR)

When the HI of the UC is found to be $\le M_{IR1}$, an imperfect repair (IR) order is placed. This is denoted IR1, with cost $c_{i1}$ if successfully implemented. The operators also have a large number of other activities causing exponentially distributed delays, with mean $1/\lambda_{IR1}$. If the IR is not performed before the health indicator falls below another threshold, $M_{IR2}$, a higher priority is given to performing the IR, and additional personnel is called out. This is denoted IR2, with cost $c_{i2}$. The arrival rate of the additional personnel is $\lambda_{IR2}$. It is assumed that $c_{i2} > c_{i1}$ and $\lambda_{IR2} > \lambda_{IR1}$.

Let $Y$ be the improvement in the HI by performing IR. It is reasonable to assume that the improvement is capped between AGAN ($\mu_S$) and ABAO ($X_k^-$), where $\mu_S$ is the mean of the HI in the S-phase and $X_k^-$ is the


state of the HI just before the IR is performed. A scaled beta distribution is used to model the distribution of $Y$, i.e., the PMF is given by:

$$r(y; \alpha, \beta, \xi) = \frac{y^{\alpha-1}(\xi - y)^{\beta-1}}{\xi^{\alpha+\beta-1} B(\alpha, \beta)}, \quad \alpha, \beta, \xi > 0, \tag{4}$$

where $\alpha$ and $\beta$ are shape parameters and $B(\cdot)$ is the beta function. $\xi = \mu_S - X_k^-$ is a scale parameter to adjust the support of $Y$ to the range $[0, \xi]$.

2.3. Opportunistic maintenance (OM)

For systems where unavailability causes large losses, there may be potential for considerable cost savings by performing opportunistic maintenance [53]. In this paper, opportunistic maintenance (OM) is understood as maintenance activities performed when production is stopped due to reasons external to the maintained system and thus does not cause any additional downtime.

Maintenance windows are assumed to arrive at fixed intervals for this maintained system. Thus, the interarrival time of the maintenance windows, denoted $w$, is constant. $Z\tau$ is the waiting time for the next maintenance window, with the PMF $f_Z(z)$. $Z$ is assumed to have a uniform distribution, i.e., $f_Z(z) = 1/w$ for $z = 0, 1, \dots, w-1$.

3. Maintenance policies

Four different maintenance policies are presented in this section. First, a procedure for finding the long-run cost rate for a simple preventive renewal (PR) policy without imperfect repair (IR) is presented. We then adjust the policy to include OM. Subsequently, two IR policies with and without OM are presented. The motivation for choosing these four policies is that the company in the case study operates two plants with similar equipment and failure modes but with different maintenance policies. At one plant, only PR is performed. In contrast, in the other plant, the frontline personnel have implemented a modification that allows for performing IR of the UC without disturbing the production process. Therefore, the first two policies highlight the cost-effectiveness of OM for the first plant, and the other two policies (with IR and possibly OM) can be applied to the second plant to further reduce the maintenance cost.

3.1. The preventive renewal (PR) policy

We consider in this section a maintenance policy where no IR is performed. The HI of the UC is monitored every $\tau$ time unit. A preventive renewal (PR) of the UC is triggered if $X_k \le M_{PR}$. Both components are renewed if a hard failure occurs on the DC. The goal is to find the renewal threshold, $M_{PR}$, that minimizes the long-run cost rate. The long-run cost rate can be found by dividing the expected renewal cycle cost by the expected renewal cycle length [12]:

$$c_\infty(M_{PR}) = \frac{E[C(T_R)]}{E[T_R]}.$$

When the S and D phases are combined, the long-run cost rate for the PR policy can be found by,

$$c_\infty(M_{PR}) = \frac{\sum_{k=1}^{+\infty} P[T_C = k\tau]\left(c_p + c_u + c_f\left(H_{k,S} + (1 - H_{k,S})\, Q_{f,D}\right)\right)}{\sum_{k=1}^{+\infty} P[T_C = k\tau]\left(E[T_S \mid T_C = k\tau] + (1 - H_{k,S})\, E[T_D]\right)}, \tag{5}$$

where the expected length of the S-phase with a given $T_C$ is $E[T_S \mid T_C = k\tau] = \sum_{i=1}^{k}(1 - H_{i,S})$. $H_{k,S}$ is the cumulative probability of failure in the S-phase in the period $(0, k\tau]$, as presented in Eq. (3). $Q_{f,D}$ is the probability that the renewal cycle ends with a hard failure given that the UC has entered the D-phase. $E[T_D]$ is the expected length of the D-phase. $c_p$, $c_u$ and $c_f$ are the costs of PR, unavailability loss, and the added cost of failure, respectively. The procedures for finding $Q_{f,D}$ and $E[T_D]$ are presented in the following sub-sections.

3.1.1. Probability calculations for the PR policy

To simplify the notation, we use in the rest of this section a local time for the D-phase with $T_C = 0$. The duration of the D-phase, $T_D$, is equal to $k\tau$ if either a hard failure occurs between $(k-1)\tau$ and $k\tau$ or if $X_k$ at time $k\tau$ is $\le M_{PR}$. We denote the time of failure in the D-phase as $T_{f,D}$ and the first time that $X_k$ is found to be at or below $M_{PR}$ as $T_{M_{PR}}$. Thus, $T_D = \min(T_{f,D}, T_{M_{PR}})$.

To find $Q_{f,D}$ and $T_D$, we introduce the following notation: Let $S_k$ be the event that the system works at time $k\tau$, while $A_k$ is the event that a hard failure of the DC occurs in the interval $((k-1)\tau, k\tau]$ and $B_k$ is the event that a PR is performed at $k\tau$. $I$ is the set of values that $X$ can take with the chosen discretization level. The unconditional PMF of $X_k$ is denoted as $f_{X_k}(x) = P[X_k = x]$.

In the following, we distinguish between $X_k^-$ and $X_k^+$, where the former refers to the degradation level at time $k\tau$ without considering failure or maintenance, and the latter is the degradation level after the occurrence of these events is taken into account. We define,

$$f_{X_k^+}(x) = P\left[X_k^+ = x, S_k\right]$$

and

$$f_{X_k^-}(x) = P\left[X_k^- = x, S_{k-1}\right].$$

Note that $f_{X_k^+}(x)$, $k = 1, 2, \dots$ has zero probability mass for $x \le M_{PR}$. This is because the occurrence of the event $S_k$ implies that $X_k^-$ is above the preventive renewal threshold. Also note that $\sum_x f_{X_{k-1}^+}(x) = P[S_{k-1}]$ is the probability that the system survives until $(k-1)\tau$. As the probability of failure in the interval $((k-1)\tau, k\tau]$ is determined by $X_k^-$, we first compute its distribution by convolution,

$$f_{X_k^-} = f_{X_{k-1}^+} * g_k. \tag{6}$$

The probability of a hard failure in the interval $((k-1)\tau, k\tau]$ is,

$$P[A_k] = \sum_x P\left[A_k \mid X_k^- = x\right] P\left[X_k^- = x\right] = \sum_x H(\tau; x)\, f_{X_k^-}(x), \tag{7}$$

where $H(\cdot)$ is given by Eq. (3). Event $B_k$ can only occur if the renewal cycle has not already been terminated by a hard failure at time $k\tau$. Thus,

$$\begin{aligned} P[B_k] &= \sum_{x \in I} P\left[B_k \mid X_k^- = x\right] P\left[X_k^- = x, S_{k-1}\right] \\ &= \sum_{x \in I} \mathbf{1}(x \le M_{PR})\,(1 - H(\tau; x))\, P\left[X_k^- = x, S_{k-1}\right] \\ &= \sum_{x \le M_{PR}} f_{X_k^-}(x)\,(1 - H(\tau; x)), \end{aligned} \tag{8}$$

where $\mathbf{1}(\cdot)$ is the indicator function. We then update $f_{X_k^+}(x)$, which is required for the calculation of $P[A_{k+1}]$ and $P[B_{k+1}]$,

$$\begin{aligned} f_{X_k^+}(x) &= P\left[X_k^+ = x, S_k\right] \\ &= \sum_{y \in I} P\left[X_k^+ = x, S_k, X_k^- = y\right] \\ &= \sum_{y \in I} P\left[X_k^+ = x \mid S_k, X_k^- = y\right] P\left[S_k \mid X_k^- = y\right] P\left[X_k^- = y\right] \\ &= \sum_{y \in I} \mathbf{1}(x = y)\,\mathbf{1}(y > M_{PR})\,(1 - H(\tau; y))\, f_{X_k^-}(y) \\ &= \mathbf{1}(x > M_{PR})\,(1 - H(\tau; x))\, f_{X_k^-}(x). \end{aligned} \tag{9}$$

Note that $P[S_k] = \sum_x f_{X_k^+}(x)$. By combining Eqs. (7), (8), and (9), we have,

$$P[A_k] + P[B_k] + P[S_k] = \sum_x f_{X_k^-}(x) = P[S_{k-1}] = \sum_x f_{X_{k-1}^+}(x).$$
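As a rough illustration of Eqs. (6)–(9), the sketch below propagates the sub-PMF of the health indicator over one inspection interval on a fixed lattice and returns $P[A_k]$, $P[B_k]$, and the updated $f_{X_k^+}$. The function names, the dictionary representation of $g_k$ as lattice shifts, the clamping of probability mass at the lattice edges, and all numerical values are assumptions made for this example; none of them are taken from the paper or its case study.

```python
import math


def hazard(x, lam0, gamma, eta):
    """Eq. (2): failure rate of the DC for health indicator value x."""
    return lam0 * math.exp(gamma * (eta - x))


def step_failure_prob(x, tau, lam0, gamma, eta):
    """H(tau; x): hard-failure probability over one interval with the HI held at x."""
    return 1.0 - math.exp(-tau * hazard(x, lam0, gamma, eta))


def pr_step(f_plus_prev, increment_pmf, grid, m_pr, H):
    """One pass of Eqs. (6)-(9).

    f_plus_prev   : sub-PMF of X_{k-1}^+ on `grid` (list, same length as grid)
    increment_pmf : {lattice shift: probability}, a stand-in for g_k
    grid          : ascending HI lattice values
    H             : H(tau; x) evaluated at each grid point
    """
    n = len(grid)
    f_minus = [0.0] * n
    for i, mass in enumerate(f_plus_prev):
        for shift, p in increment_pmf.items():
            # Eq. (6): convolution; mass leaving the lattice is clamped (assumption).
            j = min(max(i + shift, 0), n - 1)
            f_minus[j] += p * mass
    # Eq. (7): hard failure during the interval.
    p_fail = sum(h * f for h, f in zip(H, f_minus))
    # Eq. (8): survived the interval and found at or below the PR threshold.
    p_renew = sum((1.0 - h) * f for x, h, f in zip(grid, H, f_minus) if x <= m_pr)
    # Eq. (9): survived and still above the PR threshold.
    f_plus = [0.0 if x <= m_pr else (1.0 - h) * f
              for x, h, f in zip(grid, H, f_minus)]
    return f_plus, p_fail, p_renew


# Illustrative values only.
grid = [float(v) for v in range(11)]
H = [step_failure_prob(x, tau=1.0, lam0=0.01, gamma=0.3, eta=10.0) for x in grid]
increment_pmf = {-1: 0.6, 0: 0.3, 1: 0.1}
f_plus = [0.0] * len(grid)
f_plus[8] = 1.0  # assumed known HI at the start of the D-phase
for k in range(1, 6):
    survived_before = sum(f_plus)
    f_plus, p_fail, p_renew = pr_step(f_plus, increment_pmf, grid, 3.0, H)
    # Mass balance: P[A_k] + P[B_k] + P[S_k] = P[S_{k-1}]
    assert abs(p_fail + p_renew + sum(f_plus) - survived_before) < 1e-12
```

The assertion inside the loop checks the mass-balance identity stated above at every step, which is a convenient sanity test for any implementation of the recursion.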


Finally, the probability that the renewal cycle ends with a hard failure in the D-phase is,

$$Q_{f,D} = \sum_{k=1}^{+\infty} P[A_k], \tag{10}$$

and the expected length of the D-phase is,

$$E[T_D] = \sum_{k=1}^{+\infty} k\tau\, P[T_D = k\tau] = \sum_{k=1}^{+\infty} k\tau\, (P[A_k] + P[B_k]). \tag{11}$$

The flowchart in Fig. 1 illustrates the calculation procedure for the PR policy.

Fig. 1. Flowchart illustrating the calculation procedure for the preventive renewal (PR) policy. If the downstream component is subject to a hard failure, both the downstream component (DC) and the upstream component (UC) are renewed. The UC's health indicator (HI), $X$, is assumed to be equal to the lower control limit (LCL) when the D-phase is entered. A recursive routine is used to track the evolution of $X$ for every time increment, $k\tau$, in the D-phase. If the HI is found to be at or below the preventive renewal threshold ($X_k \le M_{PR}$), preventive renewal of the UC is performed.

3.1.2. The PR policy with opportunistic maintenance (OM)

When opportunistic maintenance (OM) is introduced to the PR policy, renewal is done at the first maintenance window after $T_{M_{PR}}$. The length of the D-phase now becomes $T_D = \min(T_{M_{PR}} + Z\tau, T_{f,D})$, where $Z\tau$, the waiting time for the maintenance opportunity, is uniformly distributed as defined in Section 2.3. To find the long-run cost rate when OM is added to the policy, we must track the evolution of the PMF of the HI after crossing the renewal threshold, $M_{PR}$.

We introduce $X^-_{z,T_M=k}$ and $X^+_{z,T_M=k}$ to represent the degradation level before and after failure and maintenance at time $(k+z)\tau$, for the probability mass where $T_{M_{PR}} = k\tau$. The subscript $z \in \{0, 1, \dots, w-1\}$ represents possible realizations of $Z$, while the subscript $T_M$ is equivalent to $T_{M_{PR}}$. Similarly, we define $S_{z,T_M=k}$ as the event that the system is working at time $(k+z)\tau$, while $A_{z,T_M=k}$ is the event that a hard failure occurs while waiting for a maintenance window in the interval $((k+z-1)\tau, (k+z)\tau]$ and $B_{z,T_M=k}$ is the event that a maintenance window arrives and PR is performed at $(k+z)\tau$. The event $A_k^*$ refers to a hard failure that occurs in the time interval $((k-1)\tau, k\tau]$ with $x > M_{PR}$, i.e.,

$$P\left[A_k^*\right] = \sum_{x > M_{PR}} H(\tau; x)\, f_{X_k^-}(x). \tag{12}$$

We define,

$$f_{X^+_{z,T_M=k}}(x) = P\left[X^+_{k+z} = x, S_{k+z}, T_{M_{PR}} = k\tau\right]$$

and

$$f_{X^-_{z,T_M=k}}(x) = P\left[X^-_{k+z} = x, S_{k+z-1}, T_{M_{PR}} = k\tau\right].$$

The distribution of $X^-_{z,T_M=k}$ with $z = 0$ is:

$$f_{X^-_{z=0,T_M=k}}(x) = \mathbf{1}(x \le M_{PR}) \cdot f_{X_k^-}(x).$$

For the other values of $z$, the distribution of $X^-_{z,T_M=k}$ can be found by the convolution,

$$f_{X^-_{z,T_M=k}} = f_{X^+_{z-1,T_M=k}} * g_{k+z}.$$

The probability of a hard failure in the interval $((z+k-1)\tau, (z+k)\tau]$ while waiting for a maintenance window is,

$$\begin{aligned} P\left[A_{z,T_M=k}\right] &= \sum_{x \in I} P\left[A_{z,T_M=k} \mid X^-_{z,T_M=k} = x\right] P\left[X^-_{z,T_M=k} = x\right] \\ &= \sum_{x \in I} f_{X^-_{z,T_M=k}}(x)\, H(\tau; x). \end{aligned} \tag{13}$$

The probability of a maintenance window arriving and PR being performed at time $(z+k)\tau$ is,

$$\begin{aligned} P\left[B_{z,T_M=k}\right] &= \sum_{x \in I} P\left[B_{z,T_M=k} \mid X^-_{z,T_M=k} = x\right] P\left[X^-_{z,T_M=k} = x, S_{z-1,T_M=k}\right] \\ &= \sum_{x \in I} f_{X^-_{z,T_M=k}}(x) \cdot (1 - H(\tau; x)) \cdot P[Z = z \mid Z > (z-1)] \\ &= \sum_{x \in I} f_{X^-_{z,T_M=k}}(x) \cdot (1 - H(\tau; x)) \cdot \frac{1}{w - z}. \end{aligned} \tag{14}$$

We then update $f_{X^+_{z,T_M=k}}(x)$,

$$\begin{aligned} f_{X^+_{z,T_M=k}}(x) &= P\left[X^+_{z,T_M=k} = x, S_{z,T_M=k}\right] \\ &= \sum_{y} P\left[X^+_{z,T_M=k} = x, S_{z,T_M=k}, X^-_{z,T_M=k} = y\right] \\ &= \sum_{y} \left( P\left[X^+_{z,T_M=k} = x \mid S_{z,T_M=k}, X^-_{z,T_M=k} = y\right] \cdot P\left[S_{z,T_M=k} \mid X^-_{z,T_M=k} = y\right] P\left[X^-_{k+z} = y\right] \right) \\ &= \sum_{y} \mathbf{1}(x = y)\, f_{X^-_{z,T_M=k}}(y)\,(1 - H(\tau; y)) \left(1 - \frac{1}{w - z}\right) \\ &= f_{X^-_{z,T_M=k}}(x)\,(1 - H(\tau; x)) \left(1 - \frac{1}{w - z}\right). \end{aligned} \tag{15}$$

The probability of a hard failure in the D-phase is then found by combining Eqs. (12) and (13), as given in Eq. (16).
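The factor $1/(w - z)$ in Eqs. (14) and (15) is the conditional probability that the maintenance window arrives at step $z$ given that it has not arrived earlier. The short sketch below, which ignores hard failures (they enter through the separate factor $1 - H(\tau; x)$), chains these conditional probabilities and recovers the uniform PMF $f_Z(z) = 1/w$ assumed in Section 2.3. The function names are hypothetical, and exact rational arithmetic is used only to make the check exact.

```python
from fractions import Fraction


def window_arrival_prob(z, w):
    """P[Z = z | Z > z - 1] for Z uniform on {0, 1, ..., w - 1}."""
    return Fraction(1, w - z)


def waiting_time_pmf(w):
    """Rebuild P[Z = z] by chaining the conditional arrival probabilities."""
    pmf = []
    still_waiting = Fraction(1)  # probability that no window has arrived yet
    for z in range(w):
        p_arrive = window_arrival_prob(z, w)
        pmf.append(still_waiting * p_arrive)
        still_waiting *= 1 - p_arrive
    return pmf
```

For any positive integer `w`, every entry of `waiting_time_pmf(w)` equals `Fraction(1, w)`, which is why multiplying the surviving mass by $1 - 1/(w - z)$ at each step is consistent with a uniformly distributed waiting time.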


( )

+∞
[ ∗] ∑w− 1
[ ]
(16) ∑ ∑
MIR1 MIR1
Qf ,D = P Ak + P Az,TM =k , [ ⃒ ] [ ]
k=1 z=0
P[Bk ] = P Bk ⃒Xk− = x P Xk− = x, Sk = ψ (x)⋅fXk− (x)⋅(1 − H(τ; x)),
x> MIR2 x>MIR2

and by using Eqs. (12), (13), and (14), the expected length of the D- (20)
phase is found by,
and,

+ ∞
E[TD ] = k P[TD = kτ] ∑
MIR2
[ ⃒ ] [ ] ∑MIR2
\[
= \sum_{k=1}^{+\infty} \left( k\,P[A^{*}_{k}] + \sum_{z=0}^{w-1} (z + k)\left( P[A_{z,T_M=k}] + P[B_{z,T_M=k}] \right) \right). \tag{17}
\]

Finally, the long-run cost rate for the PR policy with OM can be found by using Eq. (16) for $Q_{f,D}$ and Eq. (17) for $E[T_D]$ in Eq. (5) and setting $c_u = 0$.

3.2. The imperfect repair (IR) policy

With the introduction of the ability to perform IR without disturbance to the production process, the HI can be kept above $M_{PR}$ arbitrarily long. However, with a degradation rate that increases with time, the frequency of IRs will, at some point, become so high that it is no longer economical to continue performing IRs on the UC. Furthermore, since there is no evidence in the case study that the performance of IR influences the degradation rate, we adopt a time-based switching policy rather than based on the number of IRs performed.
Based on this, a planned renewal time for the UC, $T_{PR}$, is introduced as one of the decision variables. $T_{PR}$ is defined in the local time in the D-phase. The other decision variables are the two maintenance thresholds for IR1 and IR2, named $M_{IR1}$ and $M_{IR2}$, respectively. When the S and D phases are combined, the long-run cost rate for the IR policy is found by,

\[
c_{\infty}(M_{IR1}, M_{IR2}, T_{PR}) = \frac{\sum_{k=1}^{+\infty} P[T_C = k\tau]\left( c_p + c_u + c_f H_{k,S} + (1 - H_{k,S})\,E[C_D] \right)}{\sum_{k=1}^{+\infty} P[T_C = k\tau]\left( E[T_S \mid T_C = k\tau] + (1 - H_{k,S})\,E[T_D] \right)}. \tag{18}
\]

$E[C_D] = c_{i1} E[N_{IR1}] + c_{i2} E[N_{IR2}] + c_f Q_{f,D}$ is the expected cost in the D-phase. $E[N_{IR1}]$ and $E[N_{IR2}]$ are the expected numbers of IR1 and IR2 within the renewal period, while $Q_{f,D}$ is the probability of failure in the D-phase. $T_D$ denotes the duration of the D-phase, i.e., $T_D = \min(T_{PR}, T_{f,D})$. $T_{PR}$ is the planned renewal time of the UC and $T_{f,D}$ is the hard failure time of the DC in the D-phase.

3.2.1. Probability calculations for the IR policy
To calculate $Q_{f,D}$, $E[N_{IR1}]$, $E[N_{IR2}]$, and $E[T_D]$, we must find the distribution of $T_{f,D}$. The same notation as in Section 3.1.1 is used here except for $B_k$, which in this section denotes the event that IR1 is performed with $M_{IR2} < x \le M_{IR1}$ at $k\tau$. $D_k$ denotes that IR2 is performed with $x \le M_{IR2}$ at $k\tau$. Let $\psi(x)$ be the probability of IR in the time interval $((k-1)\tau, k\tau]$ given the degradation level $x$,

\[
\psi(x) = \begin{cases}
0 & \text{if } x > M_{IR1} \\
1 - \exp(-\lambda_{IR1}\tau) & \text{if } M_{IR2} < x \le M_{IR1} \\
1 - \exp(-(\lambda_{IR1} + \lambda_{IR2})\tau) & \text{if } 0 < x \le M_{IR2}.
\end{cases} \tag{19}
\]

The probability of events $B_k$ and $D_k$ are,

\[
P[D_k] = \sum_{x=0}^{M_{IR2}} P[D_k \mid X_k^- = x]\,P[X_k^- = x, S_k] = \sum_{x=0}^{M_{IR2}} \psi(x) \cdot f_{X_k^-}(x) \cdot (1 - H(\tau, x)). \tag{21}
\]

Let $T_{PR} = r\tau$. The expected number of IRs is given by:

\[
E[N_{IR1}(r)] = \sum_{k=1}^{r} P[B_k], \tag{22}
\]

and,

\[
E[N_{IR2}(r)] = \sum_{k=1}^{r} P[D_k]. \tag{23}
\]

We then update $f_{X_k^+}(x)$,

\[
\begin{aligned}
f_{X_k^+}(x) &= P[X_k^+ = x, S_k] \\
&= \sum_{y \in I} P[X_k^+ = x, S_k, X_k^- = y] \\
&= \sum_{y \in I} P[X_k^+ = x \mid S_k, X_k^- = y]\,P[S_k \mid X_k^- = y]\,P[X_k^- = y].
\end{aligned}
\]

We have that $P[X_k^- = y] = f_{X_k^-}(y)$ and $P[S_k \mid X_k^- = y] = 1 - H(\tau; y)$. Then,

\[
\begin{aligned}
P[X_k^+ = x \mid S_k, X_k^- = y] &= P[(B_k \cup D_k)^C]\,\mathbb{1}(x = y) + P[(B_k \cup D_k)]\,f_{X_0}(x) \\
&= (1 - \psi(y))\,\mathbb{1}(x = y) + \psi(y)\,f_{X_0}(x),
\end{aligned}
\]

where $f_{X_0}$ is the PMF of $X$ after the IR, again found by the convolution argument:

\[
f_{X_0}(x) = \sum_{y=0}^{x} f_{X_k^-}(x - y) \cdot r(y; \alpha, \beta, \mu_S - x + y), \quad \forall\, x < \mu_S,
\]

where $r(\cdot)$ is the PMF of the IR effect given by Eq. (4).
Let $T_{PR} = r\tau$. The probability that a renewal cycle ends with a hard failure is given by:

\[
Q_{f,D}(r) = \sum_{k=1}^{r} P[A_k]. \tag{24}
\]

Eq. (7) still applies for $P[A_k]$. Thus, the expected length of the D-phase is:

\[
E[T_D] = \sum_{k=1}^{r} k\tau\,P[T_D = k\tau] = \sum_{k=1}^{r} k\tau\,P[A_k]. \tag{25}
\]

Finally, by combining Eqs. (22), (23), and (24), the expected cost in the D-phase, $C_D$, when $T_{PR} = r\tau$ is found by,

\[
E[C_D(r)] = c_{i1} E[N_{IR1}(r)] + c_{i2} E[N_{IR2}(r)] + c_f Q_{f,D}(r). \tag{26}
\]
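The quantities in Eqs. (19), (22)-(24), and (26), together with the update of the PMF of the HI, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the PMFs are assumed to be plain lists over a common discretized HI grid, the survival probabilities $1 - H(\tau; y)$ and the post-repair PMF $f_{X_0}$ are assumed to be supplied by the caller, and $\psi(x)$ is set to zero for $x \le 0$, a case that Eq. (19) leaves undefined.

```python
import math
from itertools import accumulate


def ir_arrival_probability(x, m_ir1, m_ir2, lam_ir1, lam_ir2, tau=1.0):
    """Eq. (19): probability that an IR arrives within one interval of length tau,
    given the HI level x. Assumes m_ir2 < m_ir1; returns 0 for x <= 0 (assumption)."""
    if m_ir2 < x <= m_ir1:
        return 1.0 - math.exp(-lam_ir1 * tau)
    if 0.0 < x <= m_ir2:
        return 1.0 - math.exp(-(lam_ir1 + lam_ir2) * tau)
    return 0.0  # x > M_IR1: no IR order has been placed


def update_pmf(f_minus, psi, survival, f_x0):
    """PMF update on a common grid:
    f_plus(x) = sum_y [(1 - psi(y)) 1(x = y) + psi(y) f_x0(x)] * survival(y) * f_minus(y).
    """
    # Joint mass of "level y before inspection" and "no hard failure in the interval".
    alive = [s * f for s, f in zip(survival, f_minus)]
    # Total mass that receives an IR and is redistributed according to f_x0.
    repaired_mass = sum(p * a for p, a in zip(psi, alive))
    return [(1.0 - p) * a + repaired_mass * g for p, a, g in zip(psi, alive, f_x0)]


def d_phase_cost(p_a, p_b, p_d, c_i1, c_i2, c_f):
    """Eqs. (22)-(24) and (26): E[C_D(r)] for every r = 1..len(p_a).
    p_a, p_b, p_d hold P[A_k], P[B_k], P[D_k] for k = 1, 2, ..."""
    n_ir1 = accumulate(p_b)  # E[N_IR1(r)], Eq. (22)
    n_ir2 = accumulate(p_d)  # E[N_IR2(r)], Eq. (23)
    q_f = accumulate(p_a)    # Q_{f,D}(r), Eq. (24)
    return [c_i1 * n1 + c_i2 * n2 + c_f * q for n1, n2, q in zip(n_ir1, n_ir2, q_f)]
```

Because the cumulative sums are computed once, a single pass yields the expected cost for every candidate renewal time $r$, which is convenient when $T_{PR}$ is a decision variable to be searched over.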

T.I. Pedersen et al. Reliability Engineering and System Safety 237 (2023) 109313

Fig. 2. Flowchart illustrating the calculation procedure for the imperfect repair (IR) policy. If the downstream component is subject to a hard failure, both the downstream component (DC) and the upstream component (UC) are renewed. The UC's health indicator (HI), $X$, is assumed to be equal to the lower control limit (LCL) when the D-phase is entered. A recursive routine is used to track the evolution of $X$ for every time increment, $k\tau$, in the D-phase. Preventive renewal (PR) of the UC is performed at a predetermined length of the D-phase ($T_{PR}$). An imperfect repair (IR) order is placed if $X_k^- \le M_{IR1}$. If $X_k^- \le M_{IR2}$, additional personnel are called out in order to increase the arrival rate of the IR. $Y$ is the improvement in the HI by performing IR.

The flowchart in Fig. 2 illustrates the calculation procedure for the IR
policy.

3.2.2. The IR policy with opportunistic maintenance (OM)
When opportunistic maintenance (OM) is introduced to the IR policy, the length of the D-phase becomes $T_D = \min(T_{PR} + Z\tau, T_{f,D})$. Unlike the policy presented in Section 3.1.2, where the renewal decision is based on a degradation threshold, $M_{PR}$, the renewal decision for the IR policy is based on a threshold in the time domain, $T_{PR}$. This simplifies the calculations of the expected length and expected cost of the D-phase when OM is introduced.
The expected length of the D-phase with $T_{PR} = r\tau$ and OM is found by:

\[
E[T_D \mid T_{PR} = r\tau] = \frac{1}{w}\sum_{z=0}^{w-1} E\left[\min(T_{PR} + z\tau, T_{f,D})\right] = \frac{1}{w}\sum_{z=0}^{w-1}\sum_{k=1}^{r+z} k\tau\,P[A_k]. \tag{27}
\]

The expected cost in the D-phase with OM can be found in a similar way,

\[
E[C_D \mid T_{PR} = r\tau] = \frac{1}{w}\sum_{z=0}^{w-1} E[C_D(M_{IR1}, M_{IR2}, T_{PR} + z\tau)] = \frac{1}{w}\sum_{z=0}^{w-1}\sum_{k=1}^{r+z}\left( c_f P[A_k] + c_{i1} P[B_k] + c_{i2} P[D_k] \right). \tag{28}
\]

The long-run cost rate for the IR policy with OM can be found by using Eq. (27) for the expected length and Eq. (28) for the expected cost in the D-phase and letting $c_u = 0$ when calculating Eq. (18).

4. Case study

This section presents the application of the proposed policies to the motivating case study. The estimated parameters for the case study, shown in Table 2, are based on a combination of available sensor data, maintenance records, and expert judgments collected from the company operating the system investigated in this study.

4.1. The parameters used in the case study

The effect of the imperfect repair (IR), the maintenance delay for IR1, and the distribution of the S-phase were found using maximum likelihood estimates (MLE) based on available sensor data and maintenance records. The parameters for the degradation process were also estimated based on available sensor data by using a curve fitting routine based on the Levenberg–Marquardt algorithm on one of the previous degradation paths. The sensor data indicated both temporal and unit-to-unit variability in the degradation process. However, as parameter estimation is not the focus of this paper, only point estimates for the parameters were used.
Because the operator had used a conservative maintenance policy, there were no known examples of failure in the downstream component (DC) caused by a lack of cooling from the upstream component (UC) in the available data. However, based on a physical understanding of the system, it is evident that a lack of cooling water flow from the UC increases the probability of failure of the DC. Because of this, we had to resort to expert judgments to come up with estimates for the hazard model parameters.
The cost parameters, interarrival time of maintenance windows, and maintenance delay for IR2 were set based on available maintenance records and company procedures. The cost parameters have been scaled, but the relative size of these parameters has been retained. The unit for the HI of the UC, the cooling water flow, is m$^3$/h. The sampling interval for the collected sensor data was one hour and the sensor readings were specified with two decimal places. This discretization of time, $\tau = 1$ hour, and the HI, $dx = 0.01$, proved to give a reasonable trade-off between accuracy and computational time and was used in the case study.

Table 2
Parameters used in the case study.

Parameters                                        Value
Degradation process
  $a_\mu$, $b_\mu$                                $-1.7 \cdot 10^{-7}$, $1.4 \cdot 10^{-3}$
  $a_\sigma$, $b_\sigma$                          0.01, $1.6 \cdot 10^{-4}$
Cost structure
  $c_{i1}$, $c_{i2}$, $c_u$, $c_p$, $c_f$         1, 4, 100, 100, 10 000
Interarrival time of the maintenance windows
  $w$                                             730
Maintenance delay
  $\lambda_{IR1}$, $\lambda_{IR2}$                0.038, 1
Hazard model
  $\lambda_0$, $\eta$, $\gamma$                   $6.3 \cdot 10^{-6}$, 5.0, 3.8
IR parameters
  $\alpha$, $\beta$                               1.32, 1.27
S-phase
  $\alpha_W$, $\theta_W$, $\mu_S$, LCL            2.8, 13 179, 7.80, 7.47

4.2. The degradation process in the case study

The expected length of the S-phase was found to be 10 324 h, based on Eq. (1) and the parameters in Table 2. The change from the S to the D-phase was, in the case study, defined as the first time the health indicator was found to be below a defined lower control limit, LCL, for the third inspection in a row. Based on this, we approximated that $X(t = T_C) \approx \mathrm{LCL}$. From inspections of historical degradation paths, such as the example shown in Fig. 3, we chose to model the degradation of the UC in the D-phase as an increasing and non-monotonic degradation process with normally distributed degradation increments. The following functions for the mean and standard deviation of the degradation increments were used: $\mu(t) = a_\mu \exp(b_\mu t)$ and $\sigma(t) = a_\sigma \exp(b_\sigma t)$ with $t = 0$ at $T_C$. Thus, the distribution of the degradation increments in the D-phase, at time $k\tau$, can be modeled as:

\[
Q_k \sim N\left( \tau a_\mu \exp(b_\mu (k\tau - T_C)),\ \sqrt{\tau}\, a_\sigma \exp(b_\sigma (k\tau - T_C)) \right), \quad \text{for } k\tau > T_C.
\]

Fig. 3. An example of a degradation path from the industrial dataset.

The PMF of $Q_k$ is denoted as $g_k(q)$ with $g_k(q) = G_k(q + 0.5\,dx) - G_k(q - 0.5\,dx)$. $G_k$ is the cumulative distribution of the degradation increments at time $k\tau$,

\[
G_k(q) = P[Q_k \le q] = \frac{1}{2}\left( 1 + \operatorname{erf}\left( \frac{q - \mu(k\tau - T_C)}{\sigma(k\tau - T_C)\sqrt{2}} \right) \right),
\]

where $\operatorname{erf}(\cdot)$ is the error function, i.e., $\operatorname{erf}(x) = (2/\sqrt{\pi}) \int_0^x \exp(-u^2)\,du$. An
advantage of using the error function is that there are several fast numerical approximations for this function, which makes it well-suited for use in the numerical routine.

4.3. Results

A grid search was performed to find the optimal threshold for the decision variable in the PR policy, with and without OM, using Eq. (5). The optimal threshold for $M_{PR}$ when PR is performed immediately at $T_{M_{PR}}$ is 5.27, with a cost rate of 0.01097 (Fig. 4). If the PR is delayed until the next maintenance window, with $w = 730$, the long-run cost rate is minimized by setting $M_{PR} >$ LCL. In other words, the optimal time for performing the PR is at the first maintenance window that arrives after $T_C$. Because no unavailability cost is incurred when renewals are performed during maintenance windows, i.e., $c_u = 0$, the optimal cost rate for the PR policy with OM is 0.00828.

Fig. 4. The long-run cost rate, $c_\infty$, for the PR policy depending on the threshold for PR ($M_{PR}$). The dotted line represents the PR policy when PR of the UC is performed immediately at $T_{M_{PR}}$. The solid line is the PR policy with OM, with $w = 730$. LCL is the lower control limit used to detect the shift from the S-phase to the D-phase.

Similarly, a grid search was performed to find the optimal thresholds for the decision variables for the IR policy using Eq. (18). Fig. 5 shows a grid plot of the long-run cost rate with the optimal threshold for $T_{PR}$, depending on $M_{IR1}$ and $M_{IR2}$ for the IR policy without OM. For this policy, the IR thresholds that minimize the long-run cost rate are $M_{IR1} = 6.04$ and $M_{IR2} = 4.77$, while the level of $T_{PR}$ that minimizes the long-run cost rate is 7785. For the IR policy with OM, the optimal thresholds for the decision variables are $M_{IR1} = 6.01$, $M_{IR2} = 4.76$, and $T_{PR} = 6979$. The long-run cost rates for the IR policy without and with OM are 0.01059 and 0.00543, respectively (Fig. 6).

Fig. 5. Grid plot of the long-run cost rate, $c_\infty$, for the IR policy depending on the levels of $M_{IR1}$ and $M_{IR2}$. The point at $M_{IR1} = 6.04$ and $M_{IR2} = 4.77$ marks the IR thresholds that minimize the long-run cost rate.

Fig. 6. The long-run cost rate, $c_\infty$, for the IR policy with the optimal thresholds for $M_{IR1}$ and $M_{IR2}$. The dotted line represents the policy where PR is performed immediately at $T_{PR}$, while the solid line represents the policy with OM, with $w = 730$.

When opportunistic maintenance was not used, the difference in the minimal long-run cost rate between the PR and the IR policy was found to be negligible. The reduction in cost rate when using the IR policy is only $(0.01097 - 0.01059)/0.01097 \approx 3\%$ compared with the PR policy. However, when OM is introduced, the reduction in cost rate by using the IR policy is $(0.00828 - 0.00543)/0.00828 \approx 34\%$. This exemplifies the benefit of having the option to perform imperfect repairs so the performance of the UC can be kept in check while waiting for the next maintenance window to arrive.

4.4. The performance of the proposed numerical procedures

The numerical routines for all four policies were implemented in Python 3.8.10 with the NumPy 1.20.2 [54] and SciPy 1.7.0 [55] libraries. The computational times for finding the expected cost ($E[C_D]$) and duration ($E[T_D]$) of the D-phase with the proposed numerical routines are compared with Monte Carlo (MC) simulations with $10^4$ sample paths in Table 3.

Table 3
Comparison of computational times for finding the expected cost and duration of the D-phase. For the PR policy, the decision variable, $M_{PR} = 5.0$, was used, and the numerical routine was evaluated for $k \in [1, 10\,000]$. The decision variables, $M_{IR1} = 6.0$, $M_{IR2} = 4.8$, and $T_{PR} = 8000$, were used for the IR policy. The values presented in Table 2 were used for the remainder of the parameters. The values in the table are the mean duration based on ten runs on the NTNU IDUNN computer cluster [56].

Policy        Numerical [seconds]   MC simulation [seconds]   Ratio
PR            3.6                   790                       219
PR with OM    2017                  1061                      0.5
IR            28 (a)                1231                      44 (a)
IR with OM    33 (a)                1303                      39 (a)

(a) For the IR policies, the numbers are not entirely comparable because, for the numerical routine, the expected cost and duration in the D-phase were evaluated for all values of $T_{PR}$ in the range [1, 8000]. In contrast, only $T_{PR} = 8000$ was evaluated with the MC simulation.

For the PR policy with OM, the numerical procedure took about twice as long as the Monte Carlo simulation due to
the inefficient procedure for separate calculations of the PMF of $X_k$ after $T_{M_{PR}}$ for each $k\tau$, as presented in Section 3.1.2. Thus, the practical advantage of using the numerical procedure for this policy is limited when $w$ is large. However, for the other three policies, considerable computational time can be saved using the proposed numerical routine compared to Monte Carlo simulations. For the IR policies, the ratio is not entirely comparable. For the numerical routine, the expected cost ($E[C_D]$) and duration ($E[T_D]$) of the D-phase were evaluated for each $T_{PR} \in [1, 8000]$ as the routine is recursive. In contrast, only $E[C_D]$ and $E[T_D]$ at $T_{PR} = 8000$ were evaluated with the Monte Carlo simulation.

4.5. Further considerations when applying the proposed method

There are several aspects outside the scope of this paper that are important to take into consideration if the proposed policy is to be applied. Among these are parameter uncertainty and the choice of optimization technique. As this study presents no novelty in these areas, we will only present a brief discussion on these aspects in this subsection.
One way to take parameter uncertainty into account is by doing a sensitivity analysis. This can be done by calculating how changing each individual parameter affects the long-run cost rate [57]. The results of this analysis can be visualized either through spider plots or tornado diagrams [58] and may help the operator make decisions that take better account of the parameter uncertainty. Another benefit of such an analysis is identifying parameters where the parameter uncertainty should be propagated into the model. As the HI is subject to continuous monitoring, better estimates of the degradation parameters of an individual UC can be found as data from that degradation path becomes available.
Recently, several papers have proposed approaches for adaptive parameter estimation for non-monotonic degradation processes. In [59], the drift of a nonlinear Wiener-based degradation process is updated by taking account of temporal uncertainty and unit-to-unit variability. In [60], temporal variability in the environmental covariate that affects the degradation is implemented in a Wiener-based degradation process. A Bayesian framework is then used for online updating of the parameter estimates. [61] presents another example of Bayesian updating of parameter estimation for Wiener-based degradation processes.
With the parameters used in the case study, the cost function appears unimodal in the plots in Section 4. However, the cost function for the IR policies may become multimodal because the cost of performing IR can occur several times within a renewal cycle. If, for instance, the variance of the degradation process is low, the cost of the IRs will come at almost the same time for all degradation paths, causing a cost function with multiple local minima. Because of this, algorithms suitable for multimodal optimization problems should be considered when optimizing the IR policies. See, e.g., [62] for an introduction to numerical optimization methods.

5. Conclusions

This paper proposes a condition-based maintenance (CBM) policy for a two-component system subject to two-stage continuous degradation and hard failure. The policy also includes maintenance delay and opportunistic maintenance. The numerical procedures for finding the long-run cost rate for the CBM policies with imperfect repair (IR), presented in this paper, were found to be considerably faster than Monte Carlo simulations.
For the specific case study in this paper, the potential cost savings from using imperfect repair (IR) are substantial when maintenance windows are available. This exemplifies the potential for cost savings by combining maintenance modeling and optimization with the inventiveness of frontline personnel to develop maintenance policies tailored to the maintained system and maintenance capabilities on hand.
In this paper, all model parameters are assumed to be known. A potential direction of further work is to include parameter uncertainty in the model, as discussed in Section 4.5. Another potential direction of future work is exploring whether other modeling frameworks can further reduce the long-run cost rate. In this paper, the maintenance decisions are based on control limits that are fixed during the renewal cycle. An advantage of using fixed thresholds is that this makes the arrival of maintenance activities predictable for the operators. However, there is probably potential for further reduction of the long-run cost rate by using frameworks where the maintenance policy can be adapted as new information arrives in an online manner, e.g., factors such as the current health state or time to the next maintenance window. An example of such a framework is the semi-Markov Decision Process.

CRediT authorship contribution statement

Tom Ivar Pedersen: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Visualization. Xingheng Liu: Methodology, Validation, Formal analysis, Writing – original draft, Writing – review & editing. Jørn Vatn: Methodology, Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgment

This research is a part of BRU21 – NTNU Research and Innovation Program on Digital and Automation Solutions for the Oil and Gas Industry (www.ntnu.edu/bru21) and the Norwegian Center for Research based Innovation on Subsea Production and Processing (SUBPRO, www.ntnu.edu/subpro). The work of the second author is funded by the Research Council of Norway (SUBPRO, project 237893).

References

[1] Van De Kerkhof RM, Akkermans HA, Noorderhaven NG. Knowledge lost in data: organizational impediments to condition-based maintenance in the process industry. In: Zijm H, Klumpp M, Clausen U, Ten Hompel M, editors. Logistics and supply chain innovation: bridging the gap between theory and practice. Springer; 2015. p. 223–37.
[2] Lee J, Wu F, Zhao W, Ghaffari M, Liao L, Siegel D. Prognostics and health management design for rotary machinery systems—Reviews, methodology and applications. Mech Syst Signal Process 2014;42:314–34. https://doi.org/10.1016/j.ymssp.2013.06.004.
[3] de Jonge B, Scarf PA. A review on maintenance optimization. Eur J Oper Res 2020;285:805–24. https://doi.org/10.1016/j.ejor.2019.09.047.
[4] Bokrantz J, Skoogh A, Berlin C, Wuest T, Stahre J. Smart Maintenance: an empirically grounded conceptualization. Int J Prod Econ 2020;223:107534. https://doi.org/10.1016/j.ijpe.2019.107534.
[5] Veldman J, Klingenberg W, Wortmann H. Managing condition-based maintenance technology: a multiple case study in the process industry. J Qual Maint Eng 2011;17:40–62. https://doi.org/10.1108/13552511111116240.
[6] Olde Keizer MCA, Flapper SDP, Teunter RH. Condition-based maintenance policies for systems with multiple dependent components: a review. Eur J Oper Res 2017;261:405–20. https://doi.org/10.1016/j.ejor.2017.02.044.
[7] Zio E. The Monte Carlo simulation method for system reliability and risk analysis. London: Springer; 2013.
[8] Zhu W, Fouladirad M, Bérenguer C. A multi-level maintenance policy for a multi-component and multifailure mode system with two independent failure modes. Reliab Eng Syst Saf 2016;153:50–63. https://doi.org/10.1016/j.ress.2016.03.020.
[9] Si XS, Wang W, Hu CH, Zhou DH, Pecht MG. Remaining useful life estimation based on a nonlinear diffusion degradation process. IEEE Trans Reliab 2012;61:50–67. https://doi.org/10.1109/TR.2011.2182221.


[10] Wang X, Xu D. An inverse gaussian process model for degradation data. Technometrics 2010;52:188–97. https://doi.org/10.1198/TECH.2009.08197.
[11] Ye Z-S, Chen N. The inverse Gaussian process as a degradation model. Technometrics 2014;56:302–11. https://doi.org/10.1080/00401706.2013.830074.
[12] van Noortwijk JM. A survey of the application of gamma processes in maintenance. Reliab Eng Syst Saf 2009;94:2–21. https://doi.org/10.1016/j.ress.2007.03.019.
[13] Giorgio M, Guida M, Pulcini G. A new class of markovian processes for deteriorating units with state dependent increments and covariates. IEEE Trans Reliab 2015;64:562–78. https://doi.org/10.1109/TR.2015.2415891.
[14] Liu X, Matias J, Jäschke J, Vatn J. Gibbs sampler for noisy transformed Gamma process: inference and remaining useful life estimation. Reliab Eng Syst Saf 2022;217:108084. https://doi.org/10.1016/j.ress.2021.108084.
[15] Wang X, Balakrishnan N, Guo B. Residual life estimation based on a generalized Wiener degradation process. Reliab Eng Syst Saf 2014;124:13–23. https://doi.org/10.1016/j.ress.2013.11.011.
[16] Guida M, Postiglione F, Pulcini G. A time-discrete extended gamma process for time-dependent degradation phenomena. Reliab Eng Syst Saf 2012;105:73–9. https://doi.org/10.1016/j.ress.2011.12.016.
[17] Zhang Z, Si X, Hu C, Lei Y. Degradation data analysis and remaining useful life estimation: a review on Wiener-process-based methods. Eur J Oper Res 2018;271:775–96. https://doi.org/10.1016/j.ejor.2018.02.033.
[18] Gao H, Cui L, Dong Q. Reliability modeling for a two-phase degradation system with a change point based on a Wiener process. Reliab Eng Syst Saf 2020;193. https://doi.org/10.1016/j.ress.2019.106601.
[19] Liao G, Yin H, Chen M, Lin Z. Remaining useful life prediction for multi-phase deteriorating process based on Wiener process. Reliab Eng Syst Saf 2021;207:107361. https://doi.org/10.1016/j.ress.2020.107361.
[20] Huynh KT. A hybrid condition-based maintenance model for deteriorating systems subject to nonmemoryless imperfect repairs and perfect replacements. IEEE Trans Reliab 2020;69:781–815. https://doi.org/10.1109/TR.2019.2942019.
[21] Baraldi P, Balestrero A, Compare M, Benetrix L, Despujols A, Zio E. A modeling framework for maintenance optimization of electrical components based on fuzzy logic and effective age. Qual Reliab Eng Int 2013;29:385–405. https://doi.org/10.1002/qre.1388.
[22] Mercier S, Castro IT. On the modelling of imperfect repairs for a continuously monitored gamma wear process through age reduction. J Appl Probab 2013;50:1057–76. https://doi.org/10.1239/jap/1389370099.
[23] Ahmadi R. A new approach to modeling condition-based maintenance for stochastically deteriorating systems. Int J Reliab Qual Saf Eng 2014;21:1450024. https://doi.org/10.1142/S0218539314500247.
[24] Ahmadi R. Scheduling preventive maintenance for a nonperiodically inspected deteriorating system. Int J Reliab Qual Saf Eng 2015;22:1550029. https://doi.org/10.1142/S0218539315500291.
[25] Deloux E, Dijoux Y, Fouladirad M. Generalization of the proportional hazards model for maintenance modelling and optimization. Proc Inst Mech Eng O J Risk Reliab 2012;226:439–47. https://doi.org/10.1177/1748006X12448149.
[26] Zhang M, Gaudoin O, Xie M. Degradation-based maintenance decision using stochastic filtering for systems under imperfect maintenance. Eur J Oper Res 2015;245:531–41. https://doi.org/10.1016/j.ejor.2015.02.050.
[27] Letot C, Dehombreux P, Fleurquin G, Lesage A. An adaptive degradation-based maintenance model taking into account both imperfect adjustments and AGAN replacements. Qual Reliab Eng Int 2017;33:2043–58. https://doi.org/10.1002/qre.2166.
[28] Zhao X, He S, Xie M. Utilizing experimental degradation data for warranty cost optimization under imperfect repair. Reliab Eng Syst Saf 2018;177:108–19. https://doi.org/10.1016/j.ress.2018.05.002.
[29] Ponchet A, Fouladirad M, Grall A. Maintenance policy on a finite time span for a gradually deteriorating system with imperfect improvements. Proc Inst Mech Eng O J Risk Reliab 2011;225:105–16. https://doi.org/10.1177/1748006XJRR349.
[30] Castro IT, Mercier S. Performance measures for a deteriorating system subject to imperfect maintenance and delayed repairs. Proc Inst Mech Eng O J Risk Reliab 2016;230:364–77. https://doi.org/10.1177/1748006X16641789.
[31] Meier-Hirmer C, Riboulet G, Sourget F, Roussignol M. Maintenance optimization for a system with a gamma deterioration process and intervention delay: application to track maintenance. Proc Inst Mech Eng O J Risk Reliab 2009;223:189–98. https://doi.org/10.1243/1748006XJRR234.
[32] Guo C, Wang W, Guo B, Si X. A maintenance optimization model for mission-oriented systems based on Wiener degradation. Reliab Eng Syst Saf 2013;111:183–94. https://doi.org/10.1016/j.ress.2012.10.015.
[33] Van PD, Bérenguer C. Condition-based maintenance with imperfect preventive repairs for a deteriorating production system. Qual Reliab Eng Int 2012;28:624–33. https://doi.org/10.1002/qre.1431.
[34] Nicolai RP, Frenk JBG, Dekker R. Modelling and optimizing imperfect maintenance of coatings on steel structures. Struct Saf 2009;31:234–44. https://doi.org/10.1016/j.strusafe.2008.06.015.
[35] Chen Y, Gong W, Xu D, Kang R. Imperfect maintenance policy considering positive and negative effects for deteriorating systems with variation of operating conditions. IEEE Trans Autom Sci Eng 2018;15:872–8. https://doi.org/10.1109/TASE.2017.2675405.
[36] Do P, Voisin A, Levrat E, Iung B. A proactive condition-based maintenance strategy with both perfect and imperfect maintenance actions. Reliab Eng Syst Saf 2015;133:22–32. https://doi.org/10.1016/j.ress.2014.08.011.
[37] Yashin A, Arjas E. A note on random intensities and conditional survival functions. J Appl Probab 1988;25:630–5. https://doi.org/10.2307/3213991.
[38] Singpurwalla ND. Survival in dynamic environments. Stat Sci 1995:86–103. https://doi.org/10.1214/ss/1177010132.
[39] Finkelstein MS. On the exponential formula for reliability. IEEE Trans Reliab 2004;53:265–8. https://doi.org/10.1109/TR.2004.829164.
[40] Lehmann A. Joint modeling of degradation and failure time data. J Stat Plan Inference 2009;139:1693–706. https://doi.org/10.1016/j.jspi.2008.05.027.
[41] Liu B, Liang Z, Parlikad AK, Xie M, Kuo W. Condition-based maintenance for systems with aging and cumulative damage based on proportional hazards model. Reliab Eng Syst Saf 2017;168:200–9. https://doi.org/10.1016/j.ress.2017.04.010.
[42] Hu J, Chen P. Predictive maintenance of systems subject to hard failure based on proportional hazards model. Reliab Eng Syst Saf 2020;196:106707. https://doi.org/10.1016/j.ress.2019.106707.
[43] Zheng H, Kong X, Xu H, Yang J. Reliability analysis of products based on proportional hazard model with degradation trend and environmental factor. Reliab Eng Syst Saf 2021;216. https://doi.org/10.1016/j.ress.2021.107964.
[44] Najafi S, Zheng R, Lee CG. An optimal opportunistic maintenance policy for a two-unit series system with general repair using proportional hazards models. Reliab Eng Syst Saf 2021;215. https://doi.org/10.1016/j.ress.2021.107830.
[45] Zheng R, Chen B, Gu L. Condition-based maintenance with dynamic thresholds for a system using the proportional hazards model. Reliab Eng Syst Saf 2020;204:107123. https://doi.org/10.1016/j.ress.2020.107123.
[46] Zheng R, Makis V. Optimal condition-based maintenance with general repair and two dependent failure modes. Comput Ind Eng 2020;141. https://doi.org/10.1016/j.cie.2020.106322.
[47] Zheng R, Wang J, Zhang Y. A hybrid repair-replacement policy in the proportional hazards model. Eur J Oper Res 2022. https://doi.org/10.1016/j.ejor.2022.05.020. In press.
[48] Pedersen TI, Vatn J. Optimizing a condition-based maintenance policy by taking the preferences of a risk-averse decision maker into account. Reliab Eng Syst Saf 2022;228:108775. https://doi.org/10.1016/j.ress.2022.108775.
[49] Rausand M, Barros A, Høyland A. System reliability theory: models, statistical methods, and applications. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2021.
[50] Moubray J. Reliability-centered maintenance. 2nd ed. New York: Industrial Press; 1997.
[51] Fouladirad M, Grall A, Dieulle L. On the use of on-line detection for maintenance of gradually deteriorating systems. Reliab Eng Syst Saf 2008;93:1814–20. https://doi.org/10.1016/j.ress.2008.03.020.
[52] Lei Y, Li N, Guo L, Li N, Yan T, Lin J. Machinery health prognostics: a systematic review from data acquisition to RUL prediction. Mech Syst Signal Process 2018;104:799–834. https://doi.org/10.1016/j.ymssp.2017.11.016.
[53] Wang H. A survey of maintenance policies of deteriorating systems. Eur J Oper Res 2002;139:469–89. https://doi.org/10.1016/S0377-2217(01)00197-7.
[54] Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature 2020;585:357–62. https://doi.org/10.1038/s41586-020-2649-2.
[55] Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020;17:261–72. https://doi.org/10.1038/s41592-019-0686-2.
[56] Själander M, Jahre M, Tufte G, Reissmann N. EPIC: an energy-efficient, high-performance GPGPU computing research infrastructure. arXiv 2019. https://arxiv.org/abs/1912.05848.
[57] Howard R, Abbas A. Foundations of decision analysis, global edition. Harlow, England: Pearson Education Limited; 2016.
[58] Eschenbach TG. Spiderplots versus tornado diagrams for sensitivity analysis. Interfaces Provid 1992;22:40–6. https://doi.org/10.1287/inte.22.6.40.
[59] Yu W, Shao Y, Xu J, Mechefske C. An adaptive and generalized Wiener process model with a recursive filtering algorithm for remaining useful life estimation. Reliab Eng Syst Saf 2022;217:108099. https://doi.org/10.1016/j.ress.2021.108099.
[60] Xu X, Tang S, Yu C, Xie J, Han X, Ouyang M. Remaining useful life prediction of lithium-ion batteries based on wiener process under time-varying temperature condition. Reliab Eng Syst Saf 2021;214. https://doi.org/10.1016/j.ress.2021.107675.
[61] Liu D, Wang S. A degradation modeling and reliability estimation method based on Wiener process and evidential variable. Reliab Eng Syst Saf 2020;202. https://doi.org/10.1016/j.ress.2020.106957.
[62] Nocedal J, Wright S. Numerical optimization. 2nd ed. New York: Springer; 2006.
