
Reliability Engineering and System Safety 49 (1995) 23-36

© 1995 Elsevier Science Limited
Printed in Northern Ireland. All rights reserved
ELSEVIER 0951-8320(95)00035-6 0951-8320/95/$9.50

Optimization of test and maintenance intervals based on risk and cost
J. K. Vaurio
Imatran Voima Oy, PL 23, 07901 Loviisa, Finland
(also with Lappeenranta University of Technology)

(Received 1 December 1994; accepted 23 March 1995)

A general procedure is presented for optimizing the test and maintenance
intervals of safety related systems and components. The method is based on
minimizing the total plant-level cost under the constraint that the total
accident frequency (risk) remains below a set criterion.
The measure of risk is the time-average value of an accident rate. The
probabilities of component failures and other basic events are linear functions
of, or inversely proportional to, the test or maintenance intervals. Human errors
and common cause failures are included in the formalism. Analytical results
are obtained for single components and simple systems while a numerical
procedure is given for obtaining optimal intervals for complex plants with
multiple systems and initiating events.

1 INTRODUCTION

Engineered safety systems are usually standby systems that are tested periodically to reveal and repair failures that may have occurred since the previous activation or inspection. Downtime (unavailability) can be caused by testing or by failures and repairs caused by activations or occurring during standby, as well as by human errors associated with tests. Frequent testing increases the testing costs while infrequent testing leads to increasing downtime and risk. It is obvious that optimal test intervals usually exist for minimizing costs while satisfying the safety goal.

Early optimizations of single component test intervals were based on minimizing the time-average unavailability without cost considerations.¹⁻³ These models are now expanded in Section 2 to include cost minimization together with a risk or unavailability criterion. It is also possible that a single component is involved in two different periodic activities: testing and maintenance. Optimization of two periods is not a trivial exercise, as is illustrated by an example in Section 4.1.

For systems with multiple components it is not generally possible to solve optimal test intervals in analytical form, even if only the system unavailability is optimized without cost consideration. Although analytical unavailability equations are available for many m-out-of-n parallel redundancy systems (e.g. literature listed in Ref. 4), minimization requires a numerical solution with computer codes like ICARUS,⁵ or trial and error by using time-dependent unavailability codes such as FRANTIC.⁶

Some studies on acceptable test intervals have set criteria for individual components such that the risk increase during standby unavailability should be a certain fraction of the average risk.⁷ While this can be reasonable for risk-important components it does not consider costs or fully account for the fact that many components can belong to the same testing group. It can also lead to unrealistically long test intervals for a large number of less important components.

Test-caused failures have also been included in optimizing system test intervals on the basis of risk.⁸,⁹ A general algorithm has been suggested for solving test intervals that minimize the system average unavailability (without cost consideration).¹⁰

Recently test interval calculation examples have been reported for more complex systems or plants using, as a criterion, an upper limit for the time-dependent risk (accident frequency).¹¹,¹² Such calculations are believed to reduce the number of test episodes in some sense (at least compared to arbitrarily specified values), but they do not consider the fact that different testing groups have different numbers of components involved and require different amounts of resources.

It is also possible to study the risk impact of changing the value of a specific test interval and use the result as a basis for extending the interval.¹³ If this is systematically applied to testing groups with high cost and low risk sensitivity, a near-optimal selection of intervals might be obtained. However, some or all intervals might need to be reduced to ensure that the risk remains acceptable. It is the objective of this paper to develop a systematic method to determine exactly which intervals to increase or decrease, and by how much, to minimize the total cost. The procedure is presented in Sections 3 and 4. Accident costs are included in the formalism in Section 5. The procedure is formulated so that it is easy to adopt in connection with fault tree codes that identify and manipulate terms (minimum cut sets) based on event identifications rather than keeping track of the time-dependencies of all terms explicitly. The procedure is illustrated with several examples, including common cause failures and human errors.

The essential features in this paper are (1) explicit minimization of cost instead of (or in addition to) minimizing the risk or unavailability, (2) considering the total plant-level cost and risk rather than individual components or systems, (3) inclusion of preventive maintenance intervals and costs in addition to surveillance tests or inspections, and (4) incorporating both standby and normally operating components, human errors and common cause failures.

The position taken in this paper is that the time-average risk is the most important safety criterion. Even if technical specifications of a plant may have action rules concerning simultaneous failures or allowed repair times, all these are included (or should be) in a correct risk model for the average accident rate.

2 OPTIMIZATION AT COMPONENT LEVEL

In this Section the optimization problem and solutions are formulated for a single component, or a train of components in series. A general unavailability model is first defined in Section 2.1, as a function of a test or maintenance interval $T$. Finding the value of $T$ that minimizes the unavailability is explained in Section 2.2. The costs associated with testing (or maintenance) are included in the formalism and used as the optimization criterion in Section 2.3. Minimizing the total cost is the primary objective while a risk or unavailability limit sets an additional constraint for the optimization. It will be seen throughout this paper that minimizing the cost generally leads to longer characteristic intervals (while satisfying a risk limit) than minimizing the risk alone.

2.1 Component model

The time-average unavailability of a general standby component with test interval $T$ can be presented as

$$u = u(T) = p + \frac{\Delta}{T} + z\lambda T \tag{1}$$

with $z = \tfrac{1}{2}$ and parameters $p$, $\Delta$ and the failure rate $\lambda$ independent of $T$. (We assume that $u \le 0.1$ for eqn (1) to hold, as is usually the case in practice.) Each of the parameters can be a function of other parameters due to different causes of unavailability. What is important for the optimization approach is that any unavailability contribution can be presented as a term that is either (1) independent of $T$, (2) proportional to $T$, or (3) inversely proportional to $T$.

Analyzing published detailed component models⁴,⁶,¹⁴,¹⁵ and failure modes as to their dependency on time parameters indicates that $p$ can include contributions (terms) such as
- probability of failure due to a demand (an initiating event),
- probability of a human error of omission (failure to return to service after a test),
- repair time unavailability contributions of any failures caused during standby,
- probability of failure during a mission time,
- unavailability due to monitored failures (detected immediately).

The outage time parameter $\Delta$ can consist of
- mean downtime due to a test or maintenance carried out with intervals $T$,
- repair time unavailability contributions of any failures caused by tests.

Equation (1) can also be applied directly for a train of several components in series when the whole train is tested simultaneously with intervals $T$. In such a case $p$ and $\lambda$ are the sums of the corresponding parameters of the individual components while $\Delta$ is the downtime per test or maintenance for the whole train.

One should notice that eqn (1) is valid also for normally operating (rather than standby) components. For those the constant term $p$ accounts for normal unavailability due to failure repair times $\tau$ [$p = \lambda\tau/(1 + \lambda\tau)$] while $\Delta$ can be the mean outage time due to maintenance carried out with intervals $T$. The last term $z\lambda T$ can account, at least approximately, for a failure rate increase between maintenances when $z$ is properly defined. (A method to handle more general time-dependent failure rates is outlined in Section 4.1.)

2.2 Unavailability optimization

Assuming that risk optimization is equivalent to minimizing the time-average unavailability $u$, the
optimal test interval can be solved by setting the derivative $du/dT$ of eqn (1) equal to zero. This yields

$$\hat{T} = \sqrt{2\Delta/\lambda}. \tag{2}$$

However, one should not blindly use this result as the only criterion for selecting $T$. With $T = \hat{T}$ the last two terms of eqn (1) are equal, and the sum is rather insensitive to $T$ within a factor of 2 from $\hat{T}$. On this basis alone one is entitled to test less frequently than exactly at intervals $\hat{T}$ without increasing the risk unreasonably. There is another important consideration. In case of $\Delta = 0$ (or small $\Delta$), eqn (2) yields small $\hat{T}$ (i.e. frequent testing) even when $\tfrac{1}{2}\lambda\hat{T}$ could be significantly smaller than $p$. In relative terms there is no benefit in selecting $\tfrac{1}{2}\lambda T$ orders of magnitude smaller than $p$. It should be quite acceptable to select the value of $T$ anywhere in the interval

$$\hat{T}/2 < T < 2\hat{T} + p/\lambda. \tag{3}$$

For example, with $p = 10^{-3}$, $\lambda = 10^{-5}$/hr, $\Delta = 0.2$ hr, eqn (2) yields $\hat{T} = 200$ hr. The unavailability at $T = \hat{T}$ is $u = p + \lambda\hat{T} = 3 \times 10^{-3}$. At $T = 2\hat{T} + p/\lambda = 500$ hr the unavailability is $u = 3.9 \times 10^{-3}$, only 30% higher than the absolute minimum. This is a small difference compared to the uncertainty bounds usually assessed for reliability parameters.

2.3 Cost-based optimization

There are always some costs associated with testing and maintenance. Selecting an interval that minimizes the unavailability can lead to an expensive maintenance program not warranted by risk considerations. In this section a cost function is defined and minimized, leading to less frequent testing while the unavailability still remains below a set limit.

Assume that the average cost associated with each periodic test (or maintenance) is $C$. The total cost over a long time $t$ is $Ct/T$. It is appropriate to use the cost rate $y(T) = C/T$ as a function to be minimized.

Assume that a maximum unavailability limit $u_{\max}$ has been specified by authorities or selected based on the results of a risk assessment (e.g. using importance measures such as Risk Achievement).⁷⁻⁹ Because $y(T)$ is a decreasing function of $T$, one wishes to find $T$ as large as possible within the limits of allowable risk. The optimal solution $T = T_0$ is now the largest real solution (if such exists) of the equation

$$u(T) \le u_{\max}, \tag{4}$$

where the l.h.s. is according to eqn (1) (with $z = \tfrac{1}{2}$). The situation is illustrated in Fig. 1 where both $y(T)$ and $u(T)$ are presented as functions of $T$. It shows that $u(T)$ is rather flat around $\hat{T}$. It is obvious that eqn (4) does not have a real solution if even the absolute minimum of eqn (1) is larger than $u_{\max}$, i.e. $u_{\max} < p + \sqrt{2\lambda\Delta}$. If $u_{\max} \ge p + \sqrt{2\lambda\Delta}$, there are solutions for eqn (4), and the largest one is obtained with exact equality $u = u_{\max}$:

$$T_0 = \lambda^{-1}\left\{u_{\max} - p + \left[(u_{\max} - p)^2 - 2\lambda\Delta\right]^{1/2}\right\}. \tag{5}$$

With the numerical values of $\lambda$, $p$ and $\Delta$ in the example of the previous section, eqn (4) has no real solution if $u_{\max} < 3 \times 10^{-3}$. With $u_{\max} = 10^{-2}$, eqn (5) yields $T_0 = 1777.5$ hr, significantly larger than $\hat{T} = 200$ hr. The amount of testing work is reduced by an order of magnitude by using $T_0$ instead of $\hat{T}$.

It is interesting to notice the following characteristics of the solution (eqn (5)): (a) $T_0 \ge \hat{T}$, (b) $T_0$ does not depend on $u_{\max}$ and $p$ individually but only through the difference $u_{\max} - p$, and (c) $T_0$ is independent of $C$. Thus, the result is insensitive to any uncertainties in the cost estimate.

Fig. 1. Unavailability and cost rate as functions of T. (Horizontal axis: time interval T, in units of 100 hr; curves: unavailability and cost rate.)

3 PLANT-LEVEL OPTIMIZATION

In this section, the cost and risk functions are defined, and the derivatives of both are developed in terms that are suitable for the optimization procedure. The main definitions and assumptions are the following:
(1) there are $I \ge 1$ characteristic time periods $T_i$, $i = 1, 2, \ldots, I$ at the plant or system under study. Each $T_i$ is an interval of periodic tests or maintenance actions for a certain group of components, a timing group. The components of a timing group are tested or maintained with the same interval, but not necessarily simultaneously. Thus, components of a timing group can belong to the same train or to different trains of a redundant system;
(2) any component can be a member of one or more timing groups, e.g. one group (and period) for testing and another group for maintenance;
(3) the unavailability-state of each component is
modeled by one or more basic events in the logic plant risk model. Different basic events of one component can be associated with different timing groups, e.g. some with testing, some with maintenance;
(4) multiple component unavailabilities due to repeated human errors or common-cause failures are also modeled as basic events in the plant (system) logic model;
(5) the basic events are mutually independent;
(6) the risk (accident rate) function $F$ can be presented as a linear function of the initiating event frequencies and the basic event probabilities (i.e. in the usual 'sum of products' form);
(7) the basic event probability (b.e.p.) $u_k$ of each of the $K$ basic events depends on no more than one $T_i$. Each $u_k$ ($k = 1, 2, \ldots, K$) can be independent of any $T_i$ or proportional to $T_i$ or to $T_i^{-1}$ (according to the general component model, eqn (1)).

It will be seen that these assumptions cause no severe limitations to current risk assessment practice. (In fact, assumption 7 can be relaxed by permitting $u_k$ to be a sum of products of independent factors that each depend on some $T_i$. This will be demonstrated in Section 4.1.)

The times of interest, $T_i$, are variables subject to optimization. If any test or maintenance interval is a fixed constant, predetermined by rules or criteria other than optimization, it is not included in the set of $I$ variables as defined. The unavailabilities of the basic events associated with such fixed intervals are constants in the current problem, independent of any of the periods $T_i$, $i = 1, 2, \ldots, I$. Vector notation $\mathbf{T} = \{T_i\}$ is used to indicate any function of all $T_i$'s.

3.1 Cost function

At the plant level the cost rate function to be optimized (minimized) is

$$y(\mathbf{T}) = \sum_{i=1}^{I} \frac{C_i}{T_i}, \tag{6}$$

a sum of terms of which each is inversely proportional to one $T_i$. Each $C_i$ is a measure of cost per action, carried out with intervals $T_i$.

There are several alternatives for selecting the cost constants $C_i$. With $C_i = Y$, the period of plant operation (e.g. $Y = 1$ yr), one is minimizing the total number of group tests or maintenances in time $Y$. With $C_i = Y n_i$, where $n_i$ is the number of component trains belonging to timing group $i$, one is minimizing the total number of test or maintenance episodes in time $Y$. Finally, selecting $C_i = Y n_i c_i$, where $c_i$ is a measure of cost or workload per one train episode in timing group $i$, one is minimizing the total cost in time $Y$.

The form of $y$ is such that one hopes to maximize every $T_i$ to the extent allowed by a risk constraint. The problem in general terms is to solve the times $T_i$ that minimize the cost function

$$y = \sum_{i=1}^{I} \frac{C_i}{T_i} = \min!, \quad \text{under the condition } F(\mathbf{T}) \le R, \tag{7}$$

i.e. under the constraint that the risk function $F(\mathbf{T})$ does not exceed a set limit value $R$. $F$ will be defined more specifically in Section 3.2. It is also assumed that non-negative upper and lower limits $Y_i^+$ and $Y_i^-$ may be given:

$$Y_i^- \le T_i \le Y_i^+, \quad i = 1, 2, \ldots, I. \tag{8}$$

This is called the admissible region of $\mathbf{T}$. Some or all of the limits can be 0 or $\infty$. Equation (1) indicates that in practice $Y_i^-$ should be larger than any $\Delta$ of components associated with timing group $i$ ($T_i$), and $Y_i^+$ should be smaller than any $\lambda^{-1}$ of those components.

Some basic features of the problem are as follows:

Case A: If $F(\mathbf{Y}^+) \le R$, then the optimal solution is $\mathbf{T} = \mathbf{Y}^+$, because $y$ cannot be made any smaller than the absolute minimum cost $y_0 = y(\mathbf{Y}^+)$.

Case B: If $F(\mathbf{Y}^+) > R$, the optimal solution $\mathbf{T} = \mathbf{T}_0$, if one exists, satisfies $F(\mathbf{T}_0) = R$ (i.e. exact equality holds rather than $F < R$).

Proof: If there is a solution $\mathbf{T}$ such that $F(\mathbf{T}) < R$, it is always possible to increase any $T_i$ (thereby reducing cost) until $T_i = Y_i^+$ or until $F(\mathbf{T}) = R$. Thus, at the end either $F(\mathbf{T}_0) = R$, or all $T_i = Y_i^+$.

Thus, in the case where $F(\mathbf{Y}^+) > R$, the problem can be reformulated

$$y = \sum_{i=1}^{I} \frac{C_i}{T_i} = \min!, \quad \text{under the condition } F(\mathbf{T}) = R. \tag{9}$$

The problem is qualitatively similar to Fig. 1 when $F$ (in place of 'unavailability') is considered as a function of any single $T_i$, $F$ has a limit $R$ (in place of $u_{\max}$) and the cost rate is a decreasing function of $T_i$. However, multiple intervals cannot be solved as simply as a single interval in Section 2. Of course, this problem does not have a solution if the minimum of $F$ is larger than $R$.

3.2 Risk function

The risk function $F(\mathbf{T})$ represents the frequency of accidents with specified consequences, e.g. the core damage frequency of a nuclear power plant. The Boolean logic form of an accident can be presented as

$$A = \sum_{\nu} \Phi_\nu \cdot \sum_{\mu} M_{\nu\mu}, \tag{10}$$

where $\Phi_\nu$ indicates the occurrence of an initiating event of type $\nu$, and $M_{\nu\mu}$ is a minimum cut set (MCS)
relevant to event $\nu$, i.e. a set of unavailability states (basic events) of safety components. Each $M_{\nu\mu}$ is a logic product (intersection) of individual basic event Boolean variables $Z_k$ ($k = 1, 2, \ldots, K$).

The algebraic probability of $A$ per unit time is the accident frequency, $F(\mathbf{T})$. It can be presented in the form

$$F = \sum_{\nu} f_\nu G_\nu, \tag{11}$$

where $f_\nu$ is the frequency of initiating event type $\nu$, and the probability $G_\nu$ is a sum of terms, products of basic event probabilities $u_k = P(Z_k)$, relevant to initiating event type $\nu$. Symbolically,

$$G_\nu = \sum_{\beta} d_\beta\, u_k \cdots u_r, \tag{12}$$

where $d_\beta = \pm 1$. In rare-event approximation all $d_\beta = 1$ and each term (product) is a probability of an MCS. In a complete exact expression there can be negative terms ($d_\beta = -1$) due to intersections of MCS's according to Poincaré's theorem. Most important for the solution procedure is that $F$ is linear in terms of any basic event probability $u_k$. We assume here that common cause failures (CCF) as well as human errors (HE) and component failure modes are all modeled as basic events in the system logic models.

According to the general component model eqn (1), each b.e.p. $u_k$ can be independent of any $T_i$ or proportional to $T_i$ or to $T_i^{-1}$. The same is true for CCF's and HE's, as will be shown. Thus, every term (eqn (12)) is proportional to some powers of some $T_i$'s, and some of the terms might be independent of any $T_i$. It is permissible but not necessary to delete from $F$ all terms that are independent of all $T_i$'s if one subtracts the same amount from $R$.

From the linearity of $F$ in terms of any $u_k$ it follows that

$$\frac{\partial F}{\partial u_k} = \frac{\sum \text{of terms containing } u_k}{u_k}. \tag{13}$$

This is always positive in a coherent plant model in which increasing failure probability always increases risk.

This fact has a Corollary: If there is no basic event proportional to a particular $T_j$, then $F$ is a nonincreasing function of that $T_j$. One can select the optimal value $T_j = Y_j^+$, and eliminate this component of $\mathbf{T}$ from further consideration.

Assume from now on that such $T_j$'s have been eliminated from the problem and are not among the times to be optimized. The optimization problem is then (unless $F(\mathbf{Y}^+) \le R$)

$$y = \sum_{i=1}^{I} \frac{C_i}{T_i} = \min!, \quad \text{under the condition } F(\mathbf{T}) = R, \tag{14}$$

and for every $T_i$ there is at least one basic event with $u_k \propto T_i$. One should notice that the absolute values of $C_i$ do not influence the optimization, only the relative values (i.e. $y$ can be multiplied by any constant $> 0$).

The optimization procedure to be described is relevant to cases $I \ge 2$. In case $I = 1$ the condition $F = R$ uniquely determines the single time interval $T_1$ if such exists [or $T_1 = Y_1^+$ if $F(Y_1^+) \le R$].

3.2.1 Derivatives of F(T)

Definitions:
$K_i^+$ = subset of basic event indices $k$ ($1 \le k \le K$) with probability proportional to $T_i$, i.e. $u_k = z_k \lambda_k T_i$ with some constant $z_k > 0$ and failure rate $\lambda_k > 0$;
$K_i^-$ = subset of basic event indices $k'$ ($1 \le k' \le K$) with probability inversely proportional to $T_i$, i.e. $u_{k'} = \Delta_{k'}/T_i$ with some constant $\Delta_{k'} > 0$.

Thus, $K_i = K_i^+ \cup K_i^-$ is the set of basic events (indices) that belong to timing group $i$.

The derivative of $F$ with respect to a specific $T_i$ is

$$\frac{\partial F}{\partial T_i} = \sum_{k \in K_i} \frac{\partial F}{\partial u_k} \frac{\partial u_k}{\partial T_i}, \tag{15}$$

where the sum is over all terms of $F$ containing at least one basic event with $u_k = z_k \lambda_k T_i$ ($k \in K_i^+$) or $u_{k'} = \Delta_{k'}/T_i$ ($k' \in K_i^-$). From these one obtains

$$\frac{\partial u_k}{\partial T_i} = \frac{u_k}{T_i} \quad \text{for } k \in K_i^+, \tag{16}$$

$$\frac{\partial u_{k'}}{\partial T_i} = -\frac{u_{k'}}{T_i} \quad \text{for } k' \in K_i^-. \tag{17}$$

Defining

$$s_k = \text{sum of terms in } F \text{ containing a specific } u_k = z_k \lambda_k T_i \quad (k \in K_i^+), \tag{18}$$

$$s_{k'} = \text{sum of terms in } F \text{ containing a specific } u_{k'} = \Delta_{k'}/T_i \quad (k' \in K_i^-), \tag{19}$$

and taking into account eqns (13), (15), (16) and (17) one obtains

$$\frac{\partial F}{\partial T_i} = \frac{S_i}{T_i}, \qquad S_i = S_i^+ - S_i^-, \qquad S_i^+ = \sum_{k \in K_i^+} s_k, \qquad S_i^- = \sum_{k' \in K_i^-} s_{k'}. \tag{20}$$

It is quite common that many terms that include $u_{k'} = \Delta_{k'}/T_i$ also contain some $u_k = z_k \lambda_k T_i$, canceling terms out from $S_i$. One should notice that $S_i^+$ can have the same term more than once because several basic event probabilities in one term can be proportional to
$T_i$. On the other hand, $S_i^-$ normally does not have repeated terms because there is a rule not to test or maintain redundant trains simultaneously (to avoid making multiple components unavailable at the same time). Based on this same reason $S_i^+$ is normally a polynomial function of $T_i$ of degree $\ge 0$, while $S_i^-$ can also be such a polynomial but additionally have terms $\propto T_i^{-1}$. Consequently, $S_i$ can be negative for small values of $T_i$.

Considering an unconstrained optimization problem without any limits for the time variables $T_i$, the problem defined by eqn (14) can be reformulated as

$$\sum_{i=1}^{I} \frac{C_i}{T_i} + L\,[F(\mathbf{T}) - R] = \min!, \tag{21}$$

where $L$ is a Lagrange multiplier (constant). Setting the derivatives of eqn (21) equal to zero yields the conditions

$$-\frac{C_i}{T_i^2} + L\,\frac{S_i(\mathbf{T})}{T_i} = 0, \quad i = 1, 2, \ldots, I. \tag{22}$$

This shows that at the optimal point $\mathbf{T}_0$ the ratios $C_i/(T_i S_i)$ are equal and $L$ is equal to this cost/risk sensitivity ratio. However, finite limits $Y_i^- \le T_i \le Y_i^+$ can complicate the problem and make eqn (22) invalid at the point where eqn (14) is satisfied in the admissible region. The general optimization procedure will be presented in Section 4. (The idea of first solving the problem without limits for $T_i$, and then setting those limits afterwards to $Y_i^+$ or $Y_i^-$ does not generally yield the optimum solution.)

3.2.2 Coefficients z_k and common cause failures

For a single component (or train) it has been pointed out (eqn (1)) that the coefficient $z_k = \tfrac{1}{2}$ gives the correct time-average unavailability term for basic events with probability $\propto T_i$ ($k \in K_i^+$). It has also been shown that different values of $z_k$ are needed for basic events in parallel redundancy systems, and the values depend on the success criterion and on how the tests are staggered.¹⁶ The same is true for common cause failures. In a common cause basic event probability, $u_k = z_k \lambda_k T$, $\lambda_k$ is the rate of failure of a specific group of multiple components. Table 5 of Ref. 16 gives the factors $z_k$ between 0.125 and 0.669, for m-out-of-n:G systems ($1 \le m \le n \le 4$) for three testing schemes: (1) sequential testing, (2) staggered testing, and (3) staggered testing with extra testing of other trains whenever a failure is observed in one train. Values are given in Ref. 16 for single failures as well as for common cause failures of any multiplicity.

3.2.3 Human errors

It was pointed out in Section 2.1 that the probability of a human error of omission (failure to return a component to service after a test) can be taken into account as a constant term (as $p$ in eqn (1)) for a single component. This is usually satisfactory also for components in redundancy systems with staggered testing schemes because the dependency between tests of different trains is small. However, especially with sequential testing, there is a higher probability to repeat an error in other trains if one is made in the first test of a testing cycle. If the probability of the first error is $\gamma_0$ and $\gamma_n$ is the conditional probability of the $(n+1)$th error, given $n$ preceding errors, then a group of $n$ components becomes unavailable with probability $p = \gamma_0 \gamma_1 \cdots \gamma_{n-1}$ due to human errors.¹⁷ This is usually larger than $p' = \gamma_0^n$ that results when only independent errors are modeled. Thus, the difference has to be modeled as a common cause event impacting a group of $n$ trains to avoid underestimating the risk.

Essential for the optimization procedure here is that the human error multi-train unavailabilities, e.g. $p - p'$, do not depend on the value of $T_i$. This is because a set of errors, if it occurs, remains in the system for the whole period $T_i$ (or a certain fraction of $T_i$). Similarly, operator errors made during a transient (e.g. failure to operate a safety system) are typically system level basic events impacting multiple components, and independent of any $T_i$.

Even if derivations in eqn (15) need not be done with respect to constant $u_k$'s, constant unavailability events are normally present in the terms added up in eqns (18)-(20). Thus, it is necessary to model repeated human errors properly.

3.2.4 Modeling example

To illustrate the concepts introduced so far, consider a parallel 1-out-of-2:G system of identical components, tested sequentially. The unavailability of this system can be modeled by $K = 8$ basic events $Z_1, \ldots, Z_8$ with probabilities (unavailabilities) $u_1 = u_2 = \gamma_0$, $u_3 = u_4 = (1/\sqrt{3})\,\lambda_{1/2} T$, $u_5 = u_6 = \Delta/T$, $u_7 = \gamma_0\gamma_1 - \gamma_0^2$, $u_8 = \tfrac{1}{2}\lambda_{2/2} T$, respectively. Thus, $K^+ = \{3, 4, 8\}$, $K^- = \{5, 6\}$. ($\lambda_{k/n}$ is the rate of failures of specific $k$ components out of $n$ components.)

The Boolean risk function is

$$A = \Phi \cdot (Z_1 Z_2 + Z_1 Z_4 + Z_1 Z_6 + Z_3 Z_2 + Z_3 Z_4 + Z_3 Z_6 + Z_5 Z_2 + Z_5 Z_4 + Z_7 + Z_8).$$

(Notice that $Z_5 Z_6$ is missing because simultaneous maintenance is not done.) In rare-event approximation $F$ is obtained from $A$ by replacing $\Phi$ with the initiating event frequency $f$ and each $Z_k$ by $u_k$. The sums of terms needed in the derivative equations are

$$s_3 = f u_3 u_2 + f u_3 u_4 + f u_3 u_6,$$
$$s_4 = f u_1 u_4 + f u_3 u_4 + f u_5 u_4,$$
$$s_8 = f u_8, \qquad s_5 = f u_5 u_2 + f u_5 u_4,$$

and

$$s_6 = f u_1 u_6 + f u_3 u_6.$$

Finally, the derivative is

$$\frac{dF}{dT} = \frac{S}{T}$$

with

$$S = f\,(u_3 u_2 + u_3 u_4 + u_1 u_4 + u_3 u_4 + u_8 - u_5 u_2 - u_1 u_6).$$

For this simple system the result can be verified directly.

With staggered testing the only differences are that $Z_7$ can be deleted ($\gamma_1 = \gamma_0$, $u_7 = 0$), $u_3 = u_4 = \sqrt{5/24}\,\lambda_{1/2} T$ and $u_8 = \lambda_{2/2} T/4$. Thus, the risk is lower with staggered testing than with consecutive testing.

The method may seem a little cumbersome in this simple problem. However, computer codes that solve large fault trees for total plant risk models usually manipulate minimum cut sets and basic events rather than cut set term dependencies on $T_i$ directly. The method is easy to adopt for use in connection with such fault tree codes. Events belonging to different timing groups ($K_i^+$ and $K_i^-$ for $i = 1, \ldots, I$) need to be identified so that a code can list and add up terms that contain specified events.

3.3 Multiple initiating events

A few simple examples with unconstrained times are analyzed in this section to demonstrate the method.

Consider a risk function of two time variables $T_1$ and $T_2$ that are associated with different systems and initiating events. For example, $T_1$ can be a test interval for an emergency core cooling system (ECCS) that responds to loss of coolant accidents (LOCA) while $T_2$ is a test interval for an emergency power supply system (EPSS) responding to loss of offsite power (LOSP) events. In the simplest case the risk function can be presented as

$$F = a T_1^n + b T_2^m. \tag{23}$$

Here $a$ and $b$ can depend on other test and maintenance intervals as well as initiating event frequencies but can be considered constants in this optimization problem. (Also terms that are independent of both $T_1$ and $T_2$ have been subtracted from $F$ before optimization.) The integers $n \ge 1$, $m \ge 1$ depend on whether common cause failures or other multiple failures dominate the risk. In this case the optimization of the cost function $y = C_1/T_1 + C_2/T_2$ by eqn (22) leads to optimal intervals

$$T_1 = \left(\frac{C_1}{nLa}\right)^{1/(n+1)}, \qquad T_2 = \left(\frac{C_2}{mLb}\right)^{1/(m+1)}, \tag{24}$$

where $L$ has to satisfy the equation

$$R = a\left(\frac{C_1}{nLa}\right)^{n/(n+1)} + b\left(\frac{C_2}{mLb}\right)^{m/(m+1)}. \tag{25}$$

In case of a general linear risk function

$$F = \sum_{i=1}^{I} a_i T_i \tag{26}$$

the optimal times $T_i = T_{i,0}$ are proportional to $R$ according to

$$a_i T_{i,0} = R\,\frac{\sqrt{a_i C_i}}{\sum_{j=1}^{I} \sqrt{a_j C_j}}. \tag{27}$$

In the case $I = 2$ this yields

$$T_{1,0} = \frac{R}{a_1 (1 + x)}, \qquad T_{2,0} = \frac{R\,x}{a_2 (1 + x)}, \tag{28}$$

where

$$x = \left(\frac{a_2 C_2}{a_1 C_1}\right)^{1/2}.$$

Example: At a certain plant the current test intervals for both ECCS and EPSS are four weeks, and the acceptable risk is $F = R = 10^{-5}$/yr. The risk due to LOCA is 90% leaving 10% for LOSP. The unavailability of both systems is dominated by common cause failures, which justifies a linear risk model (for dominating terms). This means that $R/a_1 = 746.7$ hr and $R/a_2 = 6720$ hr. The test related costs of EPSS are nine times as high as those for ECCS (due to related fuel and maintenance), i.e. $C_2/C_1 = 9$, so that $x = 1$. Equations (28) yield new optimal test intervals $T_1 = 373$ hr and $T_2 = 3360$ hr, without increasing risk. The cost reduction due to changing from four week intervals to the new values is 64%.

It is also possible that a dominating family of initiating events is protected against by two redundant systems, with different test intervals, $T_1$ and $T_2$. The risk function $F = a T_1 T_2$ leads to optimum test intervals

$$T_1 = \left(\frac{R C_1}{a C_2}\right)^{1/2} \quad \text{and} \quad T_2 = \left(\frac{R C_2}{a C_1}\right)^{1/2}.$$

In all these examples the intervals increase with an increasing risk limit and depend on the cost ratios rather than the absolute values of $C_1$ and $C_2$. This makes the optimization rather insensitive to systematic errors in the cost estimates.

In all cases it is rather straightforward to evaluate how sensitive the risks and the costs are to possible deviations of $T_i$ from the optimum values. Doubling the intervals (or $R$) with the risk eqn (26) would double the risk and reduce the cost by 50%. In case of $F = a T_1 T_2$ the risk is quadrupled by doubling both $T_1$ and $T_2$.

4 OPTIMIZATION PROCEDURE

The starting point of general optimization is eqn (14) (because the case $F < R$ is relevant only when
$F(Y^+) < R$, in which case $T = Y^+$ is the solution). An iterative procedure can be obtained by considering the risk and cost differentials from eqns (20) and (6), respectively:

$$dF = \sum_{i=1}^{I} \frac{\partial F}{\partial T_i}\,dT_i = \sum_{i=1}^{I} S_i(T)\,\frac{dT_i}{T_i}, \qquad (29)$$

$$dy = -\sum_{i=1}^{I} \frac{C_i}{T_i}\,\frac{dT_i}{T_i}. \qquad (30)$$

If the condition $F = R$ is not satisfied with the initial values of $T$, the first task is to use eqn (29) to make effective changes $\Delta T_i$ until the level $F = R$ is reached. After that one makes combinations of changes so that $\Delta F = 0$ while the cost is reduced, i.e. $\Delta y < 0$. In this part the idea is to increase $T_i$ for $i$ such that the partial cost/risk sensitivity ratio

$$r_i = \frac{C_i}{T_i S_i(T)} \qquad (31)$$

is large compared to the total cost/risk sensitivity ratio

$$r = \sum_{i=1}^{I} \frac{C_i}{T_i} \Big/ \sum_{i=1}^{I} S_i(T), \qquad (32)$$

and reduce $T_j$ if $r_j < r$ in such a way that $F$ remains constant, i.e.

$$S_j \frac{\Delta T_j}{T_j} = -S_i \frac{\Delta T_i}{T_i} \qquad (33)$$

(at least two values are changed at a time).

A step by step procedure is presented in Appendix 1. The total risk function $F$ is assumed to be available, typically stored in a computer file in the form of terms (sum of products) and associated basic events. Usually $F$ is known numerically at least in one admissible point with initial values $T$.

4.1 A maintenance example

To illustrate the solution procedure consider a simple risk function

$$F = f\,\tfrac{1}{2}\lambda T_1 + f\,\frac{\Delta}{T_2}, \qquad (34)$$

where
$f$ = the initiating event rate (demand rate),
$T_1$ = interval of periodic tests or inspections,
$T_2$ = interval of periodic maintenance,
$\Delta$ = average duration of a maintenance outage,
$\lambda$ = standby failure rate of a safety system.
(One can assume that any terms that are independent of both $T_1$ and $T_2$ have been deleted from $F$ and subtracted from the risk criterion $R$.)

The cost rate function to be minimized is

$$y = \frac{C_1}{T_1} + \frac{C_2}{T_2}, \qquad (35)$$

where $C_1$ and $C_2$ are the costs associated with testing and maintenance, respectively ($I = 2$).

The form of $F$ is such that $T_2$ can be increased without limit because both risk and cost can be reduced by increasing $T_2$. A selected upper limit $Y_2^+$ for $T_2$ together with a condition $F = R$ dictates a unique value for $T_1$ without any further use of the cost function. Minimizing the risk alone would lead to $T_2 = Y_2^+$ and $T_1 = Y_1^-$.

A more genuine optimization problem can be formulated by taking into account that the failure rate $\lambda$ increases in time until a maintenance is performed. Assuming a linear growth rate $2\varepsilon$ for $\lambda$, the average failure rate over a maintenance interval is

$$\bar{\lambda} = \lambda_0(1 + \varepsilon T_2), \qquad (36)$$

where $\lambda_0$ is the failure rate immediately after maintenance. The risk function now becomes

$$F = fu_1 + fu_1u_2 + fu_3 \qquad (37)$$

as a sum of products, where $u_1 = \tfrac{1}{2}\lambda_0T_1$, $u_2 = \varepsilon T_2$ and $u_3 = \Delta/T_2$. With numerical values $f = 0.01$/yr, $\varepsilon = 10^{-4}$/hr, $\Delta = 20$ hr, $R = 10^{-4}$/yr, $C_1 = 1000\,\$$ and $C_2 = 4000\,\$$, the solution is presented in Appendix 2 following the procedure presented in Appendix 1, step by step. (The units of the variables are given in the early steps of the procedure.) The result is $T_1 = 785$ hr, $T_2 = 10\,734$ hr, and the minimum cost is $y = 1.646\,\$$/hr. (The same $T_1$ and $T_2$ are obtained whenever $C_2/C_1 = 4$, independent of the absolute cost values.)

It is interesting to notice that in this case $u_2$ is not really a probability but a factor that can have values larger than unity. The solution process works because the problem is mathematically in a proper form.

The risk (eqn (37)) as a function of $T_2$ is qualitatively similar to the operational unavailability obtainable by other models (e.g., Fig. 3 of Ref. 19). However, the methodology is not limited to the linear form of eqn (36), used here only as an example. It is possible to generalize to a polynomial of any degree $q$ (or a Taylor series) as an ageing model between maintenance events,

$$\lambda = \lambda_0\left(1 + \sum_{i=1}^{q} \varepsilon_i t^i\right), \quad 0 \le t < T_2.$$

The first part of the risk function, $f\,\tfrac{1}{2}\bar{\lambda}T_1$, where $\bar{\lambda}$ is the average of $\lambda$ over $T_2$, can be written as

$$fu_0 + fu_0u_1 + fu_0u_1u_2 + \cdots + fu_0u_1 \cdots u_q,$$

where

$$u_0 = \tfrac{1}{2}\lambda_0T_1, \quad u_1 = \tfrac{1}{2}\varepsilon_1T_2, \quad u_v = \frac{v\,\varepsilon_v}{(v+1)\,\varepsilon_{v-1}}\,T_2$$

for $v = 2, \ldots, q$. Thus, the risk function is in the form of a sum of products of factors that are linear in $T_1$ and $T_2$. This is directly suitable for the optimization procedure.

4.2 Existence and uniqueness of a solution

It is quite possible, of course, that the minimum of $F$ is larger than $R$, in which case there is no solution. It is of interest to know that the minimum of $F$ that may be reached (in step 5 of the optimization procedure, Appendix 1) is a unique global minimum. This can be proved, at least under the following assumption:

$F$ is presented as a rare event approximation, i.e. as a sum of non-negative terms, each consisting of an initiating event frequency multiplied by a minimum cutset probability (product of a set of unavailabilities $u_k$).

Consider $F$ as a function of any one particular $T_i$. $F$ consists of three mutually exclusive sets of terms:
$G_i^+$ = set of terms in which the number of basic events $k \in K_i^+$ is larger than the number of basic events $k' \in K_i^-$;
$G_i^0$ = set of terms in which the number of basic events $k \in K_i^+$ is equal to the number of basic events $k' \in K_i^-$;
$G_i^-$ = set of terms in which the number of basic events $k \in K_i^+$ is smaller than the number of basic events $k' \in K_i^-$.

Thus, every term in $G_i^+$ is proportional to $T_i^{n}$ with an integer $n \ge 1$, every term in $G_i^0$ is independent of $T_i$, and every term in $G_i^-$ is proportional to $T_i^{-m}$, with an integer $m \ge 1$. Consider now the definitions of $S_i^+$, $S_i^-$ and $S_i$ (eqn (20)). Every term of $G_i^0$ that is in $S_i^-$ is equally repeated in $S_i^+$, so that the difference $S_i = S_i^+ - S_i^-$ does not contain any terms of $G_i^0$ (terms independent of $T_i$). Both $S_i^+$ and $S_i^-$ can contain terms from both sets $G_i^+$ and $G_i^-$, but in the difference, $S_i$, equal terms cancel out, leaving positive terms $\sim T_i^{n}$ and negative terms $\sim T_i^{-m}$. Thus, $S_i$ can be written as

$$S_i = P_i^+(T_i) - P_i^-(1/T_i), \qquad (38)$$

where $P_i^+$ and $P_i^-$ are polynomials of degree $\ge 1$ with non-negative coefficients and $P_i^+(0) = P_i^-(0) = 0$ ($\lim P_i^- = 0$ for $T_i \to \infty$).

If $P_i^- \equiv 0$, $S_i$ is increasing monotonically (even all derivatives are non-negative) as a function of $T_i \ge 0$. If $P_i^+ \equiv 0$, $S_i$ is negative and approaches zero at $T_i \to \infty$. If both $P_i^+$ and $P_i^-$ have nonzero terms, $S_i$ is negative at small values of $T_i$, but monotonically increasing, and positive for large values of $T_i$. Thus, there is a single point at $T_i > 0$ where $S_i = 0$ and $F$ has a minimum value. Then $F$ has a unique smallest value at some point $T_i \ge 0$ ($Y_i^- \le T_i \le Y_i^+$). Only in the special case $P_i^+ \equiv P_i^- \equiv 0$ the minimum is not unique ($F$ is independent of $T_i$, and the economic optimization leads to $T_i = Y_i^+$).

Consider now $F$ as a function of all $T_i$'s. If a minimum point of $F$ at $T_0$ has been found such that $S_i = 0$ for all $i$ with finite $T_{i0}$, $F$ increases monotonically in every direction from $T_0$. There can be no other local minima. This proves the uniqueness of the solution, at least in the case of a rare-event approximation.

4.3 Repair costs

In this section we consider how to take into account repair costs associated with any component $j$ belonging to test timing group $i$.

The probability of failure per one test interval is $\alpha_j + \lambda_jT_i$, where $\alpha_j$ is the failure probability due to a demand (test) and $\lambda_j$ is the standby failure rate for component $j$. Thus, the repair cost per unit time is

$$y_{rj} = \frac{\alpha_j}{T_i}\,b_{\alpha j} + \lambda_j b_{\lambda j}, \qquad (39)$$

where $b_{\alpha j}$ and $b_{\lambda j}$ are the mean cost of repair for failures of type $\alpha$ and $\lambda$, respectively. Note that the second term is independent of $T_i$ and, therefore, does not have an impact on the optimization of $T_i$. If the sum $\sum_j y_{rj}$ is added to $y$ in eqn (9) as an additional cost, the form of the optimization problem does not change. The only change needed is to replace $C_i$ with $C_i + \sum_j \alpha_j b_{\alpha j}$, the sum over components belonging to the timing group $i$. The general impact of the additional cost term is to increase the optimum interval $T_i$. (Thus neglecting the repair cost due to $\alpha_j$ could lead to more frequent nonoptimal testing.)

4.4 Outage costs

Failures of standby safety systems can cause plant outage and loss of production in two ways due to Technical Specifications:
(1) Multiple or common cause failures forcing plant shutdown;
(2) Repair time exceeding the allowed outage time (AOT) of a component.

Assuming that common cause failures cause most of the losses of the first kind, as is usually the case, the cost per unit time is $\sum_k \lambda_k\tau_kC_p$, where the sum is over common cause failures with rates $\lambda_k$ and repair times $\tau_k$, and $C_p$ is the net value of production per unit time. Because this term is independent of any $T_i$, it does not influence the optimization on the cost function side ($y$ in eqn (9)). Common cause failures are already part of the risk function $F(T)$ and certainly influence optimization, but the cost of outage they cause does not influence optimization.

Partially similar conclusions can be drawn about

losses of the second kind. Usually a small fraction $\gamma_k$ of component repair times exceeds AOT. Failures during standby can cause outage cost rate $\sum_k \gamma_k\lambda_k\tau_kC_p$, the sum over all components belonging to the timing group $i$. This part is again independent of any $T_i$ and irrelevant for the optimization procedure. Failures due to a demand (and exceeding AOT) cause a cost rate term

$$\frac{1}{T_i}\sum_{k \in K_i} \gamma_k\alpha_k\tau_kC_p.$$

When this is included in the cost function, the impact is simply to replace $C_i$ with $C_i + \sum_{k \in K_i} \gamma_k\alpha_k\tau_kC_p$ (increasing the optimal $T_i$ somewhat). The product $\gamma_k\alpha_k$ is often so small that the additional term can be deleted.

5 TOTAL COST OPTIMIZATION

In this section the accident cost is included in the cost function. The first task in Section 5.1 is to find times $T_i$ that yield the absolute minimum total cost rate. Only if the risk at that point exceeds a set criterion $R$ is it necessary to continue with the constrained optimization described in Section 5.2.

5.1 Unconstrained optimization

The problem is to solve $T = T_0$ such that the total cost rate is minimized:

$$y = \sum_{i=1}^{I} \frac{C_i}{T_i} + C_AF(T) = \min! \qquad (40)$$

where $C_A$ is the cost of an accident.

The functional form of $y$ shows that it is a large and decreasing function of any $T_i$ at small values of $T_i$ and increasing for large values of $T_i$ if there is at least one MCS with probability $\sim T_i^{n}$ with $n > 0$. (If no such term exists, both the risk and the cost are minimized by selecting $T_i$ as large as possible; such $T_i$ can be held constant at $Y_i^+$ and removed from further optimization.) Thus, there is a minimum of $y$ in the region $T_i > 0$, $i = 1, 2, \ldots, I$. Equation (40) yields the conditions

$$\frac{C_i}{T_i} - C_AS_i(T) = 0; \quad i = 1, \ldots, I. \qquad (41)$$

Comparison with eqns (22) shows that the problem is mathematically similar to the risk-limited optimization except that the Lagrange multiplier is now replaced with a known quantity $C_A$. One can conclude from eqn (41) that (a) every $S_i$ is positive ($i = 1, \ldots, I$) at the optimal point $T_0$ (because $C_AT_i/C_i > 0$), (b) all cost/risk sensitivity ratios $r_i = C_i/(T_iS_i)$ are equal, to $C_A$, at the optimal point, (c) only the relative costs $C_i/C_A$ influence the optimization, and (d) the terms of $F$ that are independent of any $T_i$ do not influence the optimization.

Examples:

E.1 With a risk function $F = \sum_{i=1}^{I} a_iT_i^{n_i}$, eqns (41) yield the optimal intervals

$$T_i = \left(\frac{C_i}{n_ia_iC_A}\right)^{1/(n_i+1)}, \quad i = 1, 2, \ldots, I.$$

The optimal cost rate can be shown to be

$$y = \sum_{i=1}^{I} \frac{n_i + 1}{n_i}\,\frac{C_i}{T_i}.$$

E.2 The linear multiple initiating event example (eqn (26)) leads to $T_i = (C_i/a_iC_A)^{1/2}$, and the risk rate at the optimal point is $F = \sum_{i=1}^{I} (a_iC_i/C_A)^{1/2}$. A numerical example with $I = 2$, $a_1 = 1.5288 \times 10^{-12}/\mathrm{hr}^2$ (LOCA), $a_2 = 1.6987 \times 10^{-13}/\mathrm{hr}^2$ (LOSP), $C_2/C_1 = 9$, $C_A/C_1 = 10^6$ yields $T_1 = 808.77$ hr, $T_2 = 7278.8$ hr and $F = 2.166 \times 10^{-5}$/yr.

E.3 The product form $F = aT_1T_2$ with eqn (41) yields $T_1 = (C_1^2/aC_2C_A)^{1/3}$, $T_2 = (C_2^2/aC_1C_A)^{1/3}$. The risk and the cost at the optimum point are $F = (aC_1C_2/C_A^2)^{1/3}$ and $y = 3C_AF$, respectively.

E.4 The formalism can be used also for optimizing loss-of-profit risk in alternating production systems, not only accident risks with standby systems. Consider a two-train production system, one train operating and one on standby. The roles of the trains are altered with intervals $T$. The standby train is maintained with cost $C$ and duration $\Delta$. The rate of production loss events is

$$F = f\left(\rho + \tfrac{1}{2}\lambda T + \frac{\Delta}{T}\right),$$

where $f$ is the failure rate of the operating train, and $\rho$ and $\lambda$ are parameters for the train on standby. The 'accident' cost can be defined as $C_A = \tau P$, where $\tau$ is the average repair time and $P$ is the profit rate for the system in operation. Equation (41) yields optimum $T$,

$$T_0 = \left[\frac{2}{\lambda}\left(\Delta + \frac{C}{fC_A}\right)\right]^{1/2}.$$

(This is the solution also if $f$ is an initiating event rate and there is only one component or train in a safety system, with parameters $\rho$, $\lambda$, $\Delta$.)

In all these examples higher values of $C_A$ lead to higher optimal cost, shorter intervals and lower risk.

Analytical solution of eqns (41) is not always feasible. An iterative method can be based on the differential

$$dy = \sum_{i=1}^{I} \left(-\frac{C_i}{T_i} + C_AS_i\right)\frac{dT_i}{T_i}. \qquad (42)$$
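The closed form of example E.1 and the numbers of example E.2 can be checked with a few lines of Python. This is a minimal sketch, not part of the original paper: the function names `optimal_intervals` and `risk`, the choice of $C_1 = 1$ as the cost unit, and the conversion of 8760 hr per year are assumptions made here for illustration. The sketch evaluates $T_i = (C_i/n_ia_iC_A)^{1/(n_i+1)}$ and then computes the residuals of condition (41).

```python
HOURS_PER_YEAR = 8760.0  # assumed conversion used to express F per year


def optimal_intervals(costs, coeffs, powers, accident_cost):
    """Closed form of example E.1 for F = sum(a_i * T_i**n_i)."""
    return [
        (c / (n * a * accident_cost)) ** (1.0 / (n + 1))
        for c, a, n in zip(costs, coeffs, powers)
    ]


def risk(intervals, coeffs, powers):
    """Risk rate F(T) in the units of the coefficients (here 1/hr)."""
    return sum(a * t ** n for t, a, n in zip(intervals, coeffs, powers))


# Example E.2 data, with C_1 = 1 as the (arbitrary) cost unit.
costs = [1.0, 9.0]                  # C_2/C_1 = 9
coeffs = [1.5288e-12, 1.6987e-13]   # a_1 (LOCA), a_2 (LOSP), in 1/hr^2
powers = [1, 1]                     # linear risk function, eqn (26)
accident_cost = 1.0e6               # C_A/C_1 = 10^6

T = optimal_intervals(costs, coeffs, powers, accident_cost)
F_per_hr = risk(T, coeffs, powers)
F_per_yr = F_per_hr * HOURS_PER_YEAR

# Residuals of condition (41): C_i/T_i - C_A*S_i, with S_i = n_i*a_i*T_i**n_i.
residuals = [
    c / t - accident_cost * n * a * t ** n
    for c, t, a, n in zip(costs, T, coeffs, powers)
]

# Total cost rate y = sum(C_i/T_i) + C_A*F; for the linear case y = 2*C_A*F.
y = sum(c / t for c, t in zip(costs, T)) + accident_cost * F_per_hr

print(f"T1 = {T[0]:.2f} hr, T2 = {T[1]:.1f} hr, F = {F_per_yr:.3e} /yr")
```

Under these assumptions the intervals and the risk rate should agree with the E.2 figures to about four significant digits, and the residuals of eqn (41) should vanish to rounding error. Changing `powers` to other positive integers exercises the general E.1 formula.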

Effective cost reduction is accomplished by increasing $T_i$ when the coefficient $-C_i/T_i + C_AS_i$ is most negative, or decreasing $T_i$ when the coefficient is most positive. Throughout this process one has to observe the limits $Y_i^- \le T_i \le Y_i^+$, of course. In the end $r_i \approx C_A$ for all $i$ such that $Y_i^- < T_i < Y_i^+$.

After the minimum of $y$ has been found, one has to verify that $F(T_0) \le R$. If $F(T_0) > R$, one can try changing any consecutive testing schemes to staggered testing, because this usually reduces risk, and try optimization again. If still $F(T_0) > R$, one has to find a new point $T$ where $y$ is the smallest under the constraint $F \le R$. Figure 2 illustrates the general behaviour of the cost and risk functions in terms of a particular $T_i$. In a normal case $F(T_0) \le R$, and the solution is found.

[Figure 2: cost and risk $F(T)$ plotted against the interval $T_i$.]
Fig. 2. Optimum intervals $T_i$: (1) $T_0$ is the solution when $R > F(T_0)$, (2) $T_B$ is the solution, $F(T_B) = R$, when $F_{\min} \le R < F(T_0)$, (3) there is no solution if $R < F_{\min}$.

If $F(T_0) > R$, one has to reduce $T_i$ until the risk limit is satisfied. This is explained in the next section.

5.2 Risk-constrained optimization

If the absolute minimization of $y$ yields a risk rate that is larger than $R$, eqn (40) has to be replaced with

$$x = \sum_{i=1}^{I} \frac{C_i}{T_i} + C_AF + \mathcal{L}(F - R) = \min! \qquad (43)$$

where $\mathcal{L} > 0$ is a Lagrange multiplier. Equation (41) is then replaced by

$$\frac{C_i}{T_i} - (C_A + \mathcal{L})S_i = 0, \quad i = 1, 2, \ldots, I. \qquad (44)$$

At a point where eqn (44) is satisfied, $dy = -\mathcal{L}\,dF$. Thus, it is not possible to reduce the cost rate $y$ any further by reducing $F$ below $R$.

When eqn (44) is compared with the earlier optimization eqn (22), one observes that definition $L = \mathcal{L} + C_A$ creates a complete equivalence between the two. This yields the interpretation that $L$ in eqn (22) is the 'effective' accident cost. Thus, the optimization procedure similar to Appendix 1 can be applied (starting now from step 5). It leads to $r = C_A + \mathcal{L}$ and to the minimum of $y = \sum_{i=1}^{I} C_i/T_i + C_AF$ if the minimum of $F$ is not larger than $R$. If the minimum of $F$ is larger than $R$, the process ends up with $S_i \to 0$ for all $i$ ($r_i \to \infty$), in which case there is no acceptable solution.

Analytical solutions for several simple yet typical risk functions are given in Table 1.

In case of example E.2 of Section 5.1 one can conclude that the risk limit $R = 10^{-5}$/yr can be satisfied if $C_A$ is replaced with $C_A + \mathcal{L}$, which is $(2.166)^2 = 4.69$ times the original $C_A$. The times $T_1$ and $T_2$ are reduced to 373 hr and 3360 hr, respectively, and the total cost is increased by a factor of 2.166.

In all examples E.1-E.3 risk can be reduced to any value $R > 0$ by controlling the value of $C_A + \mathcal{L}$ in place of $C_A$. Thus, one can call $\mathcal{L}$ the 'virtual' accident cost and $C_A + \mathcal{L}$ the 'effective' total accident cost dictated by a risk criterion, $R$. The maintenance example of Appendix 2 yielded cost/risk sensitivity ratios $r_i \approx r = 137 \times 10^{6}\,\$$, which is the effective accident cost dictated by $R = 10^{-4}$/yr.

In example E.4 there is a limit to which $F$ can be reduced by increasing $C_A$. The limit is the absolute minimum of $F$.

6 CONCLUSION

A systematic procedure has been developed for optimizing test and maintenance intervals by minimizing the total testing and maintenance costs while satisfying a risk criterion. The measure of risk is the time-average accident frequency. Component failures, common cause failures and human errors are included and modeled by basic events, the probabilities of which are simple functions of test and maintenance intervals. The procedure is formulated in terms of minimum cut sets (or terms) and basic events, subjects that are easily manipulated by computer codes solving fault trees. Analytical solutions have been obtained for several special risk models, illustrating how different factors influence the optimization. Essential features of the methodology include combined consideration of risk and cost, inclusion of maintenance effects in addition to test periods, and versatility in handling operating and standby components and various failure modes. The impacts of repair costs and outage times have also been considered. Examples have shown that significant economic benefits can be obtained by optimizing test and maintenance intervals. It has been shown that

realistic optimization of maintenance is possible if the risk model includes beneficial effects of maintenance in addition to any unavailability caused by maintenance. Including the cost of an accident in the formalism does not essentially complicate the problem. In many cases it can make the solution analytically easier. In summary, the following general conclusions can be drawn from this study:
(1) The optimal test and maintenance intervals depend only on the relative cost values rather than the absolute values. This makes a solution rather insensitive to systematic errors in the cost estimates.
(2) The intervals based on minimum cost are generally larger than those based on minimum risk.
(3) Overestimating the cost of an individual test or maintenance action leads to an extended interval for that particular action (while reducing the intervals for other actions, in some cases). However, a specified risk limit always guarantees that unacceptably long intervals are avoided.
(4) The minima of the risk and cost functions are often rather flat. Therefore the consequences of small errors are not severe. Furthermore, sensitivities to small errors can be easily evaluated in case analytical solutions are available (e.g., Table 1).
(5) Increasing the accident cost estimate leads to a higher total cost, shorter intervals and a lower accident rate. Thus, the accident cost parameter can be used to control the risk.
(6) The Lagrange multiplier of an optimization problem has several interesting interpretations: it is the effective accident cost as well as a common cost/risk sensitivity ratio (in case the accident cost has not been specified in advance, or if minimizing the total cost leads to a risk higher than the limit).

It is possible that the owner of a plant and the authorities ('society') have different ideas about the cost of an accident. Thus, they can come to different conclusions about acceptable test and maintenance intervals. Specifying a risk limit instead of an accident cost allows proper freedom for the interval optimization.

Table 1. Optimal time intervals $T$, minimum costs and Lagrange multipliers (effective accident costs)

Column A: cost function $y = \sum_{i=1}^{I} C_i/T_i + C_AF = \min!$
Column B: cost function $y = \sum_{i=1}^{I} C_i/T_i = \min!$, with $F \le R$

Risk function $F(T) = aT_1T_2$
  A: $T_1 = [C_1^2/(aC_2C_A)]^{1/3}$, $T_2 = [C_2^2/(aC_1C_A)]^{1/3}$, $y = 3C_AF$
  B: $T_1 = [RC_1/(aC_2)]^{1/2}$, $T_2 = [RC_2/(aC_1)]^{1/2}$, $L = (aC_1C_2)^{1/2}/R^{3/2}$

Risk function $F(T) = \sum_{i=1}^{I} a_iT_i$
  A: $T_i = [C_i/(a_iC_A)]^{1/2}$, $y = 2C_AF$
  B: $T_i = [C_i/(a_iL)]^{1/2}$, where $F = R$ gives $L = \left[\sum_{i=1}^{I} (a_iC_i)^{1/2}\right]^2/R^2$

Risk function $F(T) = \sum_{i=1}^{I} a_iT_i^{n_i}$
  A: $T_i = [C_i/(n_ia_iC_A)]^{1/(n_i+1)}$, $y = \sum_{i=1}^{I} \frac{n_i+1}{n_i}\frac{C_i}{T_i}$
  B: $T_i = [C_i/(n_ia_iL)]^{1/(n_i+1)}$, where $L$ is a solution of $\sum_{i=1}^{I} a_i\,[C_i/(n_ia_iL)]^{n_i/(n_i+1)} = R$

Risk function $F(T) = f(\rho + \tfrac{1}{2}\lambda T + \Delta/T)$
  A: $T = [2(\Delta + C/(fC_A))/\lambda]^{1/2}$ when $R \ge F(T)$. If $R < F(T)$, use column B.
  B: $T = [d + (d^2 - 1)^{1/2}]\,(2\Delta/\lambda)^{1/2}$ when $d = (R - f\rho)/[f(2\lambda\Delta)^{1/2}] \ge 1$, with $L = C/\{2f\Delta[d^2 - 1 + d(d^2 - 1)^{1/2}]\}$; if $d < 1$, there is no solution [$R < F(\hat{T})$].

REFERENCES

1. Jacobs, I. M., Reliability of engineered safety features as a function of testing frequency. Nuclear Safety, 9 (1968) 303-312.
2. Hirsch, H. M., Setting test intervals and allowable bypass times as a function of protection system goals. IEEE Trans. Nuclear Science, N-18 (1971) 488-494.
3. Signoret, J. P., Availability of a periodically tested standby system, NUREG/TR-0027 1976.

4. Vaurio, J. K., Comments on system availability analysis and optimal test intervals. Nucl. Engng Design, 128 (1991) 401-402.
5. Vaurio, J. K. & Sciaudone, D., Unavailability modeling and analysis of redundant safety systems. ANL-79-87, Argonne National Laboratory 1979.
6. Vesely, W. E. & Goldberg, F. F., FRANTIC--A computer code for time dependent unavailability analysis. NUREG-0193 1977.
7. IAEA-TECDOC-737, Advances in reliability analysis and probabilistic safety assessment for nuclear power reactors. Technical Committee Report, Budapest, 7-11 September 1992, International Atomic Energy Agency (1994).
8. Kim, I. S., Martorell, S., Vesely, W. E. & Samanta, P. K., Quantitative evaluation of surveillance test intervals including test-caused risks. NUREG/CR-5775, BNL-NUREG-52296 1992.
9. Čepin, M., Kožuh, M. & Mavko, B., Risk impacts associated with surveillance tests. In Proc. 4th TÜV-Workshop on Living PSA Application, Hamburg, 2-3 May 1994, Technischer Überwachungs-Verein Nord e.V. 1994.
10. Uryas'ev, S., On optimization of test strategies. In Proc. PSA'93, Clearwater Beach, January 26-29, 1993, American Nuclear Society 1993.
11. Sandstedt, J., Demonstration case studies on living PSA. In Proc. PSA'93, Clearwater Beach, January 26-29, 1993, American Nuclear Society 1993.
12. Robertson, M. & Byrne, J., Converting a static PSA for the Dounreay prototype fast reactor into a living PSA. In Proc. PSA/PRA and Severe Accidents '94, 17-20 April 1994, Ljubljana, Nuclear Society of Slovenia 1994.
13. Levinson, S. H. & Enzinna, R. S., ESFAS reliability analysis to justify test interval extension. In Proc. PSAM-II, San Diego, March 20-25, 1994, Society for Risk Analysis (1994).
14. Papazoglou, I. A., Bari, R. A., Buslik, A. J., Hall, R. E., Ilberg, D., Samanta, P. K., Teichmann, T., Youngblood, R. W., El-Bassioni, A., Fragola, J., Lofgren, E. & Vesely, W., Probabilistic safety analysis procedures guide. NUREG/CR-2815, Section 5.6.3, U.S. Nuclear Regulatory Commission (1984).
15. Vaurio, J. K., Developments in reliability data collection and analysis. In Proc. IFAC Symp. SAFEPROCESS 94, Espoo, Finland, 13-15 June, 1994.
16. Vaurio, J. K., The theory and quantification of common cause shock events for redundant standby systems. Reliab. Engng & System Safety, 43 (1994) 289-305.
17. Apostolakis, G. E. & Bansal, P. P., Effect of human error on the availability of periodically inspected redundant systems. IEEE Trans. Reliab. R-26 (1977) 220-225.
18. Swain, A. D. & Guttmann, H. E., Handbook of human reliability analysis with emphasis on nuclear power plant applications. NUREG/CR-1278, U.S. Nuclear Regulatory Commission 1983.
19. Vesely, W. E., Quantifying maintenance effects on unavailability and risk using Markov Modeling. Reliab. Engng & System Safety, 41 (1993) 177-187.

APPENDIX 1: ITERATIVE PROCEDURE FOR SOLVING OPTIMAL INTERVALS

Setting up the problem:
A. Select $I^*$ timing groups $i$ ($i = 1, 2, \ldots, I^*$) for which the times $T_i$ are subject to optimization (i.e. not fixed based on considerations other than optimization). Determine also the limits of variation $Y_i^- \le T_i \le Y_i^+$, $i = 1, 2, \ldots, I^*$. Values within this interval are called admissible. (For example, $Y_i^+$ can be 1 year for testing, 4 years for major maintenance overhaul, etc.)
B. Select risk criterion $R$.
   Special case: calculate risk at $T = Y^+$ or otherwise verify that $F(Y^+) > R$. If $F(Y^+) \le R$, the optimal solution is $T = Y^+$.
C. Identify basic events (indices $k$) that belong to each subset $K_i^+$ (i.e. the unavailability $u_k \propto T_i$), and to each $K_i^-$ ($u_k \propto T_i^{-1}$), for all timing groups $i$ ($i = 1, 2, \ldots, I^*$).
D. If there is no basic event with probability proportional to a particular $T_j$ ($K_j^+$ is empty), set this $T_j$ to the maximum value ($T_j = Y_j^+$) permanently, and eliminate $T_j$ from further consideration. Calculate new $F(T)$ if any $T_j$ was changed. (This step does not increase risk or cost but can reduce both because $S_j \le 0$ in the whole admissible range no matter what values $T_i > 0$ the other variables have.) The number of remaining timing groups is $I \le I^*$. Determine the cost parameters $C_i$ for $i = 1, 2, \ldots, I$.
E. It is permissible, but not necessary, to delete from the risk rate $F$ any terms that are independent of all free variables $T_j$. $R$ has to be reduced by the same amount. (E.g. terms that do not contain any basic events in any $K_i^+$ or $K_i^-$ can be subtracted.)

Optimization procedure
1. Calculate all $S_i(T)$, $i = 1, 2, \ldots, I$ (eqn (20)). If any $S_i \le 0$, increase the corresponding $T_i$ until $S_i > 0$ or until $T_i = Y_i^+$.
2. Determine $F = F(T)$, cost rates $C_i/T_i$ and $y(T)$ (eqn (6)), sensitivities $S_i(T)$ (eqn (20)), cost/risk sensitivity ratios $r_i$ (eqn (31)) and $r$ (eqn (32)). If any $S_i < 0$ while $T_i < Y_i^+$, repeat from step 1 until no $T_i$ is changed. At the end of this step $T_j = Y_j^+$ for any $j$ such that $S_j(T) \le 0$.
3. If $F = R$, go to step 6.
4. If $F < R$ (while $F(Y^+) > R$): Identify $imax$ such that $r_{imax}$ = largest positive $r_i$ for which $T_i < Y_i^+$ ($i = 1, 2, \ldots, I$). (There is at least one positive $r_i$ unless all $S_i \le 0$ and $T = Y^+$ is the solution.) Increase $T_{imax}$ by a reasonable increment $\Delta T_{imax}$ such that

$$0 < \frac{\Delta T_{imax}}{T_{imax}} \le \min\left\{\frac{R - F}{S_{imax}},\ \frac{Y_{imax}^+ - T_{imax}}{T_{imax}}\right\}. \qquad (A1.1)$$

   This assures that $F$ approaches $R$ without $T_{imax}$ exceeding $Y_{imax}^+$. Go to step 2.
5. If $F > R$:
   Verify: If all $T_i$'s are either $= Y_i^-$ or $Y_i^+$, or if the ratios $S_i/F$ for all positive $S_i$'s ($T_i < Y_i^+$) are small,

   e.g. <0.01, then the risk cannot be essentially reduced (see eqn (29)), and solution does not exist. Otherwise:
   Identify $imin$ such that $r_{imin}$ = smallest positive $r_i$ for which $T_i > Y_i^-$ ($i = 1, 2, \ldots, I$). Reduce $T_{imin}$. The absolute value of a change should be limited,

$$0 < -\frac{\Delta T_{imin}}{T_{imin}} \le \min\left\{\frac{F - R}{S_{imin}},\ \frac{T_{imin} - Y_{imin}^-}{T_{imin}}\right\}. \qquad (A1.2)$$

   Go to step 2.
6. At this point one has $F \approx R$ (unless one has exited the procedure because $T = Y^+$ or because the minimum of $F$ has been reached with $R < F$). If there are less than two values of $T_i$ satisfying $Y_i^- < T_i < Y_i^+$, no further optimization is possible: the optimum vector $T$ has been found. If there are at least two values $T_i$ with $Y_i^- < T_i < Y_i^+$ (with corresponding $S_i > 0$), continue from step 7.
7. Define indices $imax$ and $imin$ such that
   $r_{imax}$ = largest positive $r_i$ for which $T_i < Y_i^+$,
   $r_{imin}$ = smallest positive $r_i$ for which $T_i > Y_i^-$,
   $i = 1, 2, \ldots, I$.
8. If $(r_{imax} - r_{imin})/r$ is small, e.g. <0.02, one has reached the optimal solution $T$ and the optimal cost $y = \sum_{i=1}^{I} C_i/T_i$. Otherwise, go to step 9.
9. The idea now is to increase $T_{imax}$ (to reduce cost most effectively) and reduce $T_{imin}$ (to increase cost least effectively) so that the risk does not change as a net result, $\Delta F \approx 0$. This can be accomplished by selecting changes $\Delta T_{imax}$ and $\Delta T_{imin}$ that satisfy eqn (33). A practical suggestion is to calculate new values

$$T_{imax,\mathrm{new}} = \min\left\{T_{imax}\left[\frac{r_{imax}}{r}\right]^{\ldots},\ Y_{imax}^+\right\}, \qquad (A1.3)$$

$$T_{imin,\mathrm{new}} = \max\left\{T_{imin}\left[1 - \frac{S_{imax}}{S_{imin}}\left(\frac{T_{imax,\mathrm{new}}}{T_{imax}} - 1\right)\right],\ Y_{imin}^-\right\}. \qquad (A1.4)$$

   Go to step 2.

Note for a special case: If there are no basic events with probabilities inversely proportional to any $T_i$ (all $K_i^-$ are empty), the maximum risk rate is $F(Y^+)$ and the minimum is $F(Y^-)$. If $R < F(Y^-)$, there is no solution. If $R \ge F(Y^+)$, then $T = Y^+$ is the solution. In the range $F(Y^-) \le R < F(Y^+)$ a solution exists and $S_i \ge 0$ for all $i = 1, 2, \ldots, I$ in the admissible region.

APPENDIX 2: STEP BY STEP SOLUTION OF THE MAINTENANCE PROBLEM

A. $Y_1^- = 0$, $Y_1^+ = 2000$ hr, $Y_2^- = 2000$ hr, $Y_2^+ = 20\,000$ hr.
B. $F(Y^+) = 3.1 \times 10^{-4}$/yr $> R$.
C. $K_1^+ = \{1\}$, $K_1^- = \{\,\}$ (empty), $K_2^+ = \{2\}$, $K_2^- = \{3\}$.
D. $C_1 = 1000\,\$$, $C_2 = 4000\,\$$.
1. $S_1^+ = fu_1 + fu_1u_2$, $S_1^- = 0$, $S_2^+ = fu_1u_2$, $S_2^- = fu_3$,
   $S_1 = fu_1 + fu_1u_2 = f\,\tfrac{1}{2}\lambda_0T_1(1 + \varepsilon T_2)$,
   $S_2 = fu_1u_2 - fu_3 = f\left(\tfrac{1}{2}\lambda_0T_1\varepsilon T_2 - \Delta/T_2\right)$.
   Initial values: $T_1 = 2000$ hr, $T_2 = 20\,000$ hr ($T = Y^+$).
2. $F = 3.1 \times 10^{-4}$/yr, $S_1 = 3 \times 10^{-4}$/yr (>0), $S_2 = 1.9 \times 10^{-4}$/yr (>0), $y = 0.7\,\$$/hr, $r_1 = 1666.7\,\$\cdot$yr/hr, $r_2 = 1052.6\,\$\cdot$yr/hr.
5. $F > R$, $r_2 < r_1 \to imin = 2$, $-\Delta T_2/T_2 = \tfrac{1}{2}$, $T_2 = 10\,000$ ($T_1 = 2000$).
2. $F = 2.2 \times 10^{-4}$, $S_1 = 2 \times 10^{-4}$ (>0), $S_2 = 0.8 \times 10^{-4}$ (>0), $y = 0.9$, $r_1 = 2500.0$, $r_2 = 5000.0$.
5. $r_1 < r_2 \to imin = 1$, $-\Delta T_1/T_1 = \tfrac{1}{2} \to T_1 = 1000$ ($T_2 = 10\,000$).
2. $F = 1.2 \times 10^{-4}$, $S_1 = 1 \times 10^{-4}$ (>0), $S_2 = 0.3 \times 10^{-4}$ (>0), $y = 1.4$, $r_1 = 10\,000$, $r_2 = 13\,333$.
5. $r_1 < r_2 \to imin = 1$, $-\Delta T_1/T_1 = 0.2 \to T_1 = 800$ ($T_2 = 10\,000$).
2. $F = 1 \times 10^{-4}$, $S_1 = 0.8 \times 10^{-4}$ (>0), $S_2 = 0.2 \times 10^{-4}$ (>0), $y = 1.65$, $r_1 = 15\,625$, $r_2 = 20\,000$, $r = 16\,500$.
3. $\to$ 6. $\to$ 7. $r_1 < r_2 \to imax = 2$, $imin = 1$.
8. $(r_2 - r_1)/r = 0.255$ (not small).
9. New $T_2 = 12\,121$, new $T_1 = 757.58$.
2. $F = 1.0029 \times 10^{-4}$, $S_1 = 0.83792 \times 10^{-4}$, $S_2 = 0.29413 \times 10^{-4}$, $y = 1.65$, $r_1 = 15\,753$, $r_2 = 11\,219$, $r = 14\,575$.
5. $r_2 < r_1 \to imin = 2$, $-\Delta T_2/T_2 = 0.00994 \to T_2 = 12\,000$ ($T_1 = 757.58$).
2. $F = 1.0000 \times 10^{-4}$, $S_1 = 0.8333 \times 10^{-4}$, $S_2 = 0.2879 \times 10^{-4}$, $y = 1.6533$, $r_1 = 15\,840$, $r_2 = 11\,579$, $r = 14\,746$.
3. $\to$ 6. $\to$ 7. $r_2 < r_1 \to imax = 1$, $imin = 2$.
8. $(r_1 - r_2)/r = 0.289$ (not small).
9. New $T_1 = 785.18$, new $T_2 = 10\,734$.
2. $F = 1.000 \times 10^{-4}$, $S_1 = 0.8140 \times 10^{-4}$, $S_2 = 0.2351 \times 10^{-4}$, $y = 1.646$, $r_1 = 15\,646$, $r_2 = 15\,850$, $r = 15\,592$.
3. $\to$ 6. $\to$ 7. $r_1 < r_2 \to imax = 2$, $imin = 1$.
8. $(r_2 - r_1)/r = 0.013$ (small).
