RCM II
group
BOSTON
OXFORD
JOHANNESBURG
MELBOURNE NEW DELHI
SINGAPORE
KCJJrIHlCU 1992,
Second edition 1997
nch'Y><-,>d
1994
1997
1997
Contents
Preface
Acknowledgements
2 Functions
2.1 Describing functions
2.2 Performance standards
2.3 The operating context
2.4 Different types of functions
2.5 How functions should be listed
3 Functional Failures
3.1 Failure
3.2 Functional failures
5 Failure Consequences
Operational consequences
Non-operational consequences
Hidden failure consequences
Conclusion
6.1 Technical [...]
6.4 Scheduled restoration tasks
6.5 Scheduled discard tasks
6.6 Failures which are not [...]
Default actions
[...] task intervals
The technical feasibility of failure-finding
Walk-around checks
No scheduled maintenance
Work packages
Maintenance planning and control
[...] defects
Who knows?
RCM review groups
Facilitators
Maintenance effectiveness
What RCM achieves
15.1 The [...] of the airlines
15.2 RCM in other sectors
15.3 Why RCM [...]
Index
Preface
Humanity continues to depend to an [...] extent on the wealth generated by mechanised and automated businesses. We depend more and more on services such as the [...] or trains which run on time. More than ever, these depend on the continued [...] of physical assets.
Yet when these assets fail, not only is this wealth eroded and not only are these services interrupted, but our very survival is threatened. Equipment failure has played a part in some of the worst accidents and environmental incidents in industrial [...], incidents which have become [...] Amoco [...], Bhopal and [...]. [...] which these failures occur and what must be done to [...] are rapidly [...], [...] as it becomes more [...] how many of these failures are caused by the very activities which are supposed to prevent them.
The first industry to confront these issues was the international civil aviation industry. On the basis of research which [...] and widely-held beliefs about [...] asset continues to [...] as its users want it to perform. This framework is known within the aviation industry as [...] and outside it as Reliability-centred Maintenance, or RCM.
Reliability-centred Maintenance was developed [...] a [...] years. One of the principal milestones in its [...] commissioned by the United States Department of Defense from United Airlines and [...] by Stanley Nowlan and the late Howard Heap [...] 1978. The report provided a [...] of the [...] of RCM by the civil aviation industry. It forms the basis of both editions of this book and of much of the work done in this field outside the airline industry in the last fifteen years.
Since the 1980s, the author and his associates have [...] to apply RCM in hundreds of industrial locations around the world, work which led to the development of RCM2 for industries other than aviation in 1990.
The first edition of this book (published in the UK in 1991 and the USA in 1992) provided a [...] introduction to RCM2. Since [...], the RCM [...] has continued [...] to the extent that it became necessary to revise the first edition to incorporate the new developments. Several new [...] have been added, while others have been revised and extended. Foremost among the [...] are:
a more [...] review of the role of functional analysis and the definition of failed states in Chapters 2 and 3
a much broader and [...] look at failure modes and effects analysis in the [...], with [...] emphasis on the [...] of levels of detail required in Chapter 4
new material on how to establish acceptable levels of risk in Chapter 5 and [...] 3
the addition of more [...] approaches to the determination of failure-finding task intervals in Chapter 8
more about the implementation of RCM recommendations in Chapter 11, with extra [...] on the RCM [...] process
more information on how RCM should and should not be applied [...], [...] a more [...] look at the role of the RCM facilitator
new material on the measurement of the overall performance of the maintenance function in Chapter 14
a brief review of asset hierarchies in Appendix 1, together with a summary of the [...] role played by functional hierarchies and functional block [...] in the application of RCM
a review of different types of human error in Appendix 2, together with a look at the part they play in the failure of physical assets
The book is intended for maintenance, [...] and [...] managers who wish to learn what RCM is, what it achieves and how it is applied. It will also provide students on business or [...] studies courses with a [...] introduction to the formulation [...] strategies for the [...] to financial) [...]. Finally, the book will be invaluable for any students of any branch of [...] who seek [...] of the state-of-the-art in [...] Maintenance.
JOHN MOUBRAY
Lutterworth
Leicestershire
September 1997
Acknowledgements
Introduction to Reliability-centred Maintenance
[...] connection [...] pressure [...] and to contain [...]. [...] are [...] attitudes and skills in all branches of [...] to the limit. Maintenance [...] are having to [...] ways of thinking and [...], as [...] and managers. At the same time the limitations of maintenance [...] apparent, no matter how much [...] are computerised.
In the face of this avalanche of [...], [...] looking for a new [...] starts and dead ends which [...]
[Figure 1.1: timeline from 1940 to 2000]
Downtime has [...] assets, reducing output, increasing operating [...] customer service. By the [...] concern in the [...]. [...] the effects of downtime are [...] move towards [...], where reduced stocks of work-in-progress mean that quite small breakdowns are [...] much [...] to [...] a whole [...]. In recent [...] the [...] become [...].
Greater automation also means that more and more failures affect our [...] to sustain satisfactory [...] standards of service as it does [...] networks as much as they can interfere with the consistent achievement of [...].
More and more failures have serious [...] some [...] either conform to [...] they cease to [...] on the [...] of our [...] assets [...] which becomes a simple matter of [...] survival.
[...] on physical assets is [...] so [...] and to own. To secure the maximum return [...] must be kept [...] efficiently [...] in absolute terms [...] it is now the [...]. As [...] thirty years it has moved from almost nowhere to the top of the [...] as a cost control priority.
New research
Quite apart from [...], new research is changing many of our most basic beliefs about age and failure. In particular, it is apparent that there is less and less connection between the age of most assets and how likely they are to fail.
Figure 1.2 shows how the earliest view of failure was simply that as [...] were more likely to fail. A growing awareness of 'infant mortality' led to widespread Second Generation belief in the 'bathtub' curve.
[Figure 1.2: timeline from 1940 to 2000]
Third Generation research has revealed that not one or two but six failure patterns actually occur in practice. This is discussed in detail [...] effect on maintenance.
New techniques
There has been [...] growth in new maintenance [...] and techniques. Hundreds have been [...] over the past fifteen years, and more are [...] every week.
[Figure 1.3: timeline from 1940 to 2000]
RCM provides a framework which enables users to respond to these challenges quickly and [...]. It does so because it never loses sight of the fact that maintenance is about physical assets. If these assets did not exist, the maintenance function itself would not exist. So RCM starts with a zero-based review of the maintenance [...] of [...] or system [...] in the [...].
These questions are introduced [...] then considered in detail in Chapters 2 to 10.
Functional Failures
The objectives of maintenance are defined by the functions and associated [...] of the asset under consideration. But how does maintenance achieve these objectives?
The only occurrence which is likely to stop any [...] to the standard [...] by its users is some kind of failure. This [...] that maintenance achieves its [...] by adopting a suitable approach to the [...] of failure. However, before we can [...] a suitable blend of failure management tools, we need to identify what failures can occur. The RCM process does this at two levels:
[...] what circumstances amount to a failed state
then by [...] what events can cause the asset to [...] into a failed state.
In the world of RCM, failed states are known as functional failures because [...] occur when an asset is unable [...] to [...] standard [...] which is [...] to the user.
In addition to the total [...], this definition encompasses [...] where the asset still functions but at an unacceptable level of performance ([...] situations where the asset cannot sustain [...] levels of quality or [...]). [...] these can [...] be identified after the functions and [...] standards of the asset have been defined.
Functional failures are discussed at [...] in Chapter 3.
Failure Modes
[...] once each functional failure has [...], [...] all the events which are reasonably [...] to [...] state. These events are known as failure modes. [...] failure modes include those which have occurred on the same or similar equipment [...] context, failures which are [...] prevented [...] maintenance [...], and failures which have not [...] considered to be real [...] in the context in [...].
Most traditional lists of failure modes [...] deterioration or normal wear and tear. [...] failures caused by [...] and [...] flaws so that all reasonably [...]. [...] effort are not wasted [...] to treat [...] instead of [...]. [...] it is equally [...] to ensure that time is not wasted on the analysis itself [...] into too much detail.
Failure Effects
The fourth step in the RCM process entails listing failure effects, which describe what happens when each failure mode occurs. These descriptions should include all the information needed to support the evaluation of the consequences of the failure, such as:
what evidence [...] that the failure has occurred
in what ways (if any) it poses a threat to [...] or the environment
in what ways [...] it affects production [...]
what [...] is caused by the failure
what must be done to [...] the failure.
[...] very [...] and also for eliminating waste.
Failure Consequences
A detailed [...] of an average industrial [...] is likely to yield between three and ten thousand [...] failure modes. Each of these failures affects the [...] in some way, but in each case, the effects are different. [...] may also affect product quality, customer [...] or the environment. They will all take time and cost money to [...].
It is these consequences which most strongly influence the extent to which we try to [...] each failure. In other words, if a failure has serious consequences, we are likely to go to great [...] to try to avoid it. On the other hand, if it has little or no effect, then we may decide to do no routine maintenance beyond basic cleaning and lubrication.
A [...] of RCM is that it [...] that the consequences of failures are far more important than their technical characteristics. In [...] that the only reason for [...] any kind of [...] maintenance is not [...] per se, but to avoid or at least to reduce the [...]. The RCM process classifies these consequences into four groups, as follows:
Hidden failure consequences: Hidden failures have no direct impact, [...] expose the [...] to multiple failures with [...] often [...]. [...] of these failures are associated with [...].
Safety and environmental consequences: A failure has safety consequences if it could hurt or kill someone. It has environmental consequences if it could lead to a breach of any [...] national or international environmental standard.
Operational consequences: A failure has operational consequences if it affects production (output, product quality, customer service or operating costs in addition to the direct cost of repair).
Non-operational consequences: Evident failures which fall into this [...] affect neither [...] nor production, so they involve only the direct cost of repair.
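The four consequence groups can be read as a simple classification rule. The following is a minimal sketch only, assuming each failure mode has been summarised by four yes/no answers. The names `Consequence`, `FailureEffects` and `classify` are hypothetical, and the order in which the questions are asked (hidden first, then safety and environment, then operations) is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden failure consequences"
    SAFETY_ENVIRONMENTAL = "safety and environmental consequences"
    OPERATIONAL = "operational consequences"
    NON_OPERATIONAL = "non-operational consequences"


@dataclass
class FailureEffects:
    """Hypothetical yes/no summary of the effects of one failure mode."""
    evident: bool                    # does the failure have a direct impact on its own?
    could_hurt_or_kill: bool         # safety
    could_breach_env_standard: bool  # environment
    affects_production: bool         # output, quality, customer service, operating costs


def classify(effects: FailureEffects) -> Consequence:
    # Assumed question order: hidden first, then safety/environment,
    # then operations; what remains involves only the direct cost of repair.
    if not effects.evident:
        return Consequence.HIDDEN
    if effects.could_hurt_or_kill or effects.could_breach_env_standard:
        return Consequence.SAFETY_ENVIRONMENTAL
    if effects.affects_production:
        return Consequence.OPERATIONAL
    return Consequence.NON_OPERATIONAL
```

For example, an evident failure that only stops output would be passed in as `FailureEffects(True, False, False, True)` and fall into the operational group.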
We will see later how the RCM process uses these categories as the basis of a [...] framework for maintenance decision-making. [...] a structured review of the consequences of each failure mode in terms of the above [...], it [...] the [...] and [...] and the environment into the mainstream of maintenance management.
The consequence evaluation process also shifts emphasis away from the idea that all failures are bad and must be prevented. In [...] it focuses attention on the maintenance activities which have most effect on the performance of the [...], and diverts energy away from those which have little or no effect. It also encourages [...] to think more [...] about [...] chosen when it [...]
Figure 1.4: The traditional view of failure (horizontal axis: age; the curve shows a "LIFE" period before failures increase)
[...] characteristics [...] often found where equipment comes into direct contact with the product. [...] failures are also often associated with [...] abrasion and [...].
However, equipment in [...] it was twenty years ago. This has led to startling [...] in the [...] of [...], as shown in Figure 1.5. The graphs show conditional probability of failure against [...] age for a [...] of electrical and mechanical items.
Pattern A is the well-known bathtub curve. It [...] with a [...], followed by a constant or [...], then by a wear-out zone. Pattern B shows constant or [...] conditional probability of [...] in a wear-out zone (the same as Figure 1.4).
[Figure 1.5: failure patterns A to F]
Pattern C shows [...], but there is no identifiable [...]. Pattern D shows [...] conditional probability of failure when the item is new or just [...] of the [...], then a rapid increase to a constant [...], while pattern E shows a constant conditional probability of failure at all [...]. Pattern F starts with [...] infant mortality, which [...] or very slowly [...] conditional [...] of failure.
Studies done on civil aircraft showed that 4% of the items conformed to pattern A, 2% to B, 5% to C, 7% to D, 14% to E and no fewer than 68% to pattern F. The number of times these [...] occur in aircraft is not [...] the same as in [...]. But there is no doubt that as assets become more [...], we [...] more and more of [...] E and [...].
These findings contradict the belief that there is [...] a connection between [...] and [...]. This belief led to the idea that the more often an item is [...]. [...] this is seldom true. Unless there is a dominant [...], age limits do little or nothing to improve the [...]. In fact scheduled overhauls can increase overall failure rates by introducing infant mortality into otherwise stable systems.
An awareness of these facts has led some [...] of [...] maintenance [...]. In [...] to do for failures with minor consequences. But when the [...] consequences are [...], [...] must be [...] to prevent or predict the failures, or at least to [...] the [...].
This [...] us back to the [...] of [...] mentioned [...]. RCM divides proactive tasks into three [...], as follows:
scheduled restoration tasks
scheduled discard tasks
scheduled on-condition tasks.
On-condition tasks
The [...] need to [...] certain [...] of failure, and the [...] inability of classical techniques to do so, are behind the growth of new [...] on the [...] some [...] of the fact that they are about to occur. These warnings are known as potential failures, and are defined as identifiable [...] conditions which indicate that a functional failure is about to occur or is in the process [...].
The new techniques are used to detect potential failures so that action can be taken to avoid the consequences which could occur if they degenerate into functional failures. [...] are called on-condition tasks because items are left in service on the condition [...] to meet desired performance standards. ([...] maintenance includes [...] condition-based maintenance and condition monitoring.)
Used appropriately, on-condition tasks are a very [...] way [...], but they can also be an [...] waste of time. RCM enables decisions in this area to be made with [...] confidence.
Default Actions
[...] of default [...], as follows:
[...] task is worth [...] if it reduces the risk of the multiple failure associated with that function to an [...] low level. If such a task cannot be found then a scheduled failure-finding task must be performed. [...] then the [...] default decision is that the item may have to be redesigned, [...] on the consequences of the multiple [...].
[...] or environmental consequences, a [...] task is [...] worth [...] if it reduces the risk of that failure on its own to a very low level indeed, if it does not eliminate it [...]. If a task cannot be found which reduces the risk of the failure to an [...] low [...], the item must be redesigned or the process must be changed.
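The two default rules that survive in the passage above can be sketched as a small decision function. This is an illustration under stated assumptions, not the book's decision diagram: the function name `default_action`, the category labels and the returned strings are hypothetical, and the step from failure-finding to redesign is assumed to apply when no suitable failure-finding task exists.

```python
def default_action(category: str,
                   proactive_task_found: bool,
                   failure_finding_task_found: bool = False) -> str:
    """Pick a course of action for one failure mode.

    category: "hidden" or "safety_environmental" (hypothetical labels).
    proactive_task_found: a task exists that reduces the risk to the
        level required for that category.
    failure_finding_task_found: only relevant to hidden failures.
    """
    if category == "hidden":
        if proactive_task_found:
            return "proactive task"
        if failure_finding_task_found:
            return "scheduled failure-finding task"
        # Assumed fallback; whether redesign is needed depends on the
        # consequences of the multiple failure.
        return "item may have to be redesigned"
    if category == "safety_environmental":
        if proactive_task_found:
            return "proactive task"
        return "redesign item or change process"
    raise ValueError(f"category not covered by this sketch: {category!r}")
```

The sketch deliberately covers only the hidden and the safety or environmental cases, because only those rules can be read in the text above.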
Compare this with the traditional [...] to the development of maintenance policies. Traditionally, the maintenance [...] of each asset are assessed in terms of its real or assumed technical characteristics, without [...] the consequences of failure. [...] resulting schedules are used for all similar assets, [...] without considering that different consequences [...] in different [...] contexts. This results in [...] numbers of schedules which are [...], not because they are [...] the technical sense, but because [...] achieve nothing.
Note also that the RCM process considers the maintenance requirements of each asset before [...] whether it is [...] reconsider the [...]. This is simply because the maintenance [...] who is on duty [...] to maintain the equipment as it exists [...], not what should be there or what [...] be there at some [...] in the future.
Review groups
We have seen how the RCM process embodies seven basic questions. In practice, maintenance people simply cannot answer all these questions on their own. This is because many (if not most) of the answers can only be supplied by production or operations people. This applies especially to questions concerning functions, desired performance, failure effects and failure consequences.
For this reason, a review of the maintenance requirements of any asset should be done by small teams which include at least one person from the maintenance function and one from the operations function. The seniority of the group members is less important than the fact that they should have a thorough knowledge of the asset under review. Each group member should also have been trained in RCM. The make-up of a typical RCM review group is shown in Figure 1.6:
The use of these groups not only enables management to gain access to the knowledge and expertise of each member of the group on a systematic basis, but the members themselves gain a greatly enhanced understanding of the asset in its operating context.
Figure 1.6: A typical RCM review group
Facilitator
Engineering Supervisor
Operations Supervisor
Craftsman (M and/or E)
Operator
External Specialist (if needed) (Technical or Process)
Facilitators
RCM review groups work under the guidance of highly trained specialists in RCM, known as facilitators. The facilitators are the most important people in the RCM review process. Their role is to ensure that:
the RCM analysis is carried out at the right level, that system boundaries are clearly defined, that no important items are overlooked and that the results of the analysis are properly recorded
RCM is correctly understood and applied by the group members
the group reaches consensus in a brisk and orderly fashion, while retaining the enthusiasm and commitment of individual members
[...] quickly and finishes on time.
The outcomes of an RCM [...]
If it is [...] in the manner [...] outcomes, as follows:
maintenance schedules to be done by the maintenance department
revised [...] for the [...] of the asset
a list of areas where [...] of the asset or the way in which it is operated to deal with situations where the asset cannot deliver the desired [...] in its current configuration.
Two less [...] outcomes are that [...] in the process learn a [...] deal about how the [...], and also tend to function better as teams.
[...] without [...] to reconsider all maintenance policies from scratch. It also enables equipment users to demonstrate that their maintenance programs are built on rational foundations (the audit trail [...] more and more [...]). [...] the information stored on RCM worksheets reduces the [...] with its attendant loss of [...] and [...].
An RCM review of the maintenance [...] of each asset also [...] a much clearer view of the skills [...] to maintain each asset, and for [...] what spares should be held in stock. A valuable [...] and manuals.
Better teamwork: RCM [...] a common, [...] understood technical language for everyone who has [...] to do with maintenance. This [...] maintenance and [...] people a better understanding of what maintenance can [...] achieve and what must be done to achieve it.
Functions
Most [...] become [...] because [...] for things, be [...] electrical or structural. This [...] leads [...] feel offended [...]
2.1 Describing functions
[...] a function statement [...]. It is also helpful to start such statements with the word 'to' ('to pump water', 'to [...]', etc).
[...] in the next [...] of this chapter, users [...] an asset to fulfil a function. [...] also [...] it to do so [...] to an [...] level [...]. So a function definition, and by implication the definition of the objectives of maintenance for the asset, is not complete unless it [...] the level of [...] desired [...] the user [...]
Figure 2.1: Initial capability vs desired performance (vertical axis: performance)
So if deterioration is [...], it must be allowed for. This means that when any asset is put into [...], it must be able to deliver more than the minimum standard [...] desired by the user. What the asset is able to [...] is [...] inherent reliability). Figure 2.2 illustrates the right relationship between this capability and desired [...].
Figure 2.2: Allowing for deterioration
[Figure annotations: INITIAL CAPABILITY (what it can do): maintenance cannot raise the capability of the asset above this level. The objective of maintenance is to ensure that capability stays above this level. Vertical axis: performance.]
Figure 2.4: A non-maintainable situation
Two conclusions which can be drawn from the above examples are that:
for any asset to be maintainable, the desired performance of the asset must fall within the envelope of its initial capability
in order to determine whether this is so, we not only need to know the initial capability of the asset, but we also need to know exactly what minimum performance the user is prepared to accept in the context in which the asset is being used.
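The first conclusion amounts to a comparison between two numbers. The sketch below is a hedged illustration, assuming performance is a single quantity where more is better (for example a pump's output in litres per minute). The function names and the example figures are hypothetical.

```python
def deterioration_margin(initial_capability: float,
                         desired_performance: float) -> float:
    """Headroom between what the asset can do and what the user wants."""
    return initial_capability - desired_performance


def is_maintainable(initial_capability: float,
                    desired_performance: float) -> bool:
    """True when desired performance lies inside the envelope of initial
    capability, leaving some margin for deterioration.

    A strict comparison is used here on the assumption that an asset with
    no margin at all fails as soon as any deterioration occurs.
    """
    return deterioration_margin(initial_capability, desired_performance) > 0


# Hypothetical figures: a pump able to deliver 1000 litres per minute.
print(is_maintainable(1000, 800))   # desired 800: margin of 200
print(is_maintainable(1000, 1100))  # desired 1100: beyond initial capability
```

Maintenance can only act inside the margin returned by `deterioration_margin`; when it is zero or negative, no maintenance program can deliver the desired performance.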
Qualitative standards
In spite of the need to be precise, it is sometimes impossible [...] quantitative performance standards so we have to live with qualitative statements.
For instance, the primary function of a [...] (if not 'attractive'). What is meant by 'acceptable' varies from person to person and is impossible to quantify. As a result, user and maintainer need to take care to ensure that they share a common understanding of what is meant by words like 'acceptable' before [...] up a system intended to preserve that acceptability.
Absolute performance standards
A function statement which contains [...] an absolute.
For instance, the concept of containment is associated with nearly all enclosed systems. Function statements [...] containment are often written as follows:
To contain liquid X.
The absence of a [...] standard [...] that the system must contain all the [...], and that [...] at all amounts to a failed state. In cases where an enclosed system can tolerate some leakage, the amount which can be tolerated [...]
Variable performance standards
Performance [...] (or [...]) [...] applied [...] between two extremes.
[Figure 2.5]
For instance, the primary function of a sweet-packing machine [...] gm of sweets into [...]. The primary function of the grinding machine might be:
To finish grind main [...] in a cycle time of 3.00 ± 0.03 [...] a diameter of 75 ± 0.1 mm with a surface finish of Ra 0.2.
In cases like [...], the desired performance limits [...] and lower [...] limits. The limits of [...] as three standard deviations either side of the [...] the upper and lower control limits. Quality management [...] that in a well [...] process, the difference between the control limits should ideally be half the difference between the [...] limits. This multiple should allow [...] a maintenance viewpoint.
[...] and lower limits not only [...] also apply to other functional specifications such as the accuracy of [...] and the [...] of control systems and protective devices. This issue is discussed further in Chapter 3.
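The relationship between the two sets of limits can be worked through numerically. This is a minimal sketch, assuming the control limits sit three standard deviations either side of the mean and assuming the diameter tolerance in the grinding example is 75 ± 0.1 mm. The function names and the `ratio` and `k` parameters are hypothetical.

```python
def control_limits(mean: float, sigma: float, k: float = 3.0) -> tuple[float, float]:
    """Lower and upper control limits, k standard deviations from the mean."""
    return mean - k * sigma, mean + k * sigma


def max_sigma(lower_spec: float, upper_spec: float,
              ratio: float = 0.5, k: float = 3.0) -> float:
    """Largest standard deviation for which the width of the control band
    (2 * k * sigma) is no more than `ratio` times the width of the band
    between the desired performance limits."""
    spec_width = upper_spec - lower_spec
    return ratio * spec_width / (2 * k)


# Diameter limits assumed to be 74.9 mm and 75.1 mm (75 +/- 0.1 mm).
sigma = max_sigma(74.9, 75.1)           # roughly 0.0167 mm
low, high = control_limits(75.0, sigma)  # roughly 74.95 mm and 75.05 mm
print(round(sigma, 4), round(low, 2), round(high, 2))
```

Under these assumptions the process would need a standard deviation of about 0.017 mm or less for the control band to occupy half of the tolerance band, leaving the other half as margin for deterioration.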
[...] is drawn at a rate of 900 litres per minute, the primary function would be:
To pump slurry into Tank B at not less than 900 litres per minute.
This is a [...] standard than in the previous location, so the standard to which it has to be maintained rises accordingly. Because it is now pumping [...] instead of water, the nature, [...] and severity of the failure modes [...]. As a result, [...] the pump itself is unchanged, it is likely to end up with a completely different maintenance program in the new context.
[...] processes [...]
The presence of [...] or alternative means of [...] feature of the operating context which must be considered in detail when [...] the functions of any asset.
Figure 2.7: Different operating contexts (panel label: Stand Alone)
Quality standards
Quality standards and standards of customer service are two more [...] of the [...] context which can lead to differences between the descriptions of the functions of otherwise identical machines.
For example, identical [...] stations on two transfer machines [...] same basic function, to mill a [...] of cut, [...] tolerance and surface finish [...] all be different. [...] lead to quite different conclusions about the [...]
Environmental standards
An [...] important [...] of the operating context of any asset is the [...] which it has on the environment. [...] worldwide interest in environmental issues means that when we maintain any asset, we actually have to [...] two sets of 'users'. The first is the people who [...] the asset itself. The second is [...] as a [...], which wants both the asset and the process of which it forms part not to cause undue harm to the environment.
What [...] wants is [...] in the form of increasingly [...] environmental standards and [...]. These are international, national, [...]
[...] not [...] simply to ensure that a [...] or process is [...] sound at the moment it is commissioned. [...] also have to be taken to ensure that [...] life. [...] because all over [...] more and more incidents which [...] affect the environment [...] asset did not behave as it should [...]
[...] now [...] themselves [...] acceptable levels of risk. In some cases, [...] in others to individual sites and in [...] others to individual processes or assets. [...] standards [...] they are an important [...] of the [...] context.
Shift arrangements
[...] context. Some plants [...] for [...] hours per [...] five [...] a week (and even less in bad [...]). Others operate continuously for seven [...] a [...], others somewhere in between.
In a [...] shift plant, production lost due to failures can [...] up by [...] overtime. This overtime leads to [...], so maintenance [...] evaluated in the [...] of these [...].
On the other hand, if an asset is [...] 24 hours per [...] week, it is seldom possible to make up for lost [...] lost sales. This costs a [...] deal more than extra [...], so it is worth [...]
As products move [...] their life [...] as economic conditions [...], organisations can move from one end of this [...] to the [...] quickly. For this reason, it is [...] to review maintenance policies every time this [...] of the [...] context changes.
So from the maintenance viewpoint, a balance has to be struck between the economic [...] and:
the cost [...] of those failures, or
[...] maintenance tasks with a view to [...] the failures.
To strike this balance [...], this [...] be particularly [...] understood in [...]
[...] time
[...] times are influenced by [...]
If this approach [...] seen [...] part of the (initial) [...]
Market demand
[...] sometimes features [...] in demand for the [...] can [...] in summer than in winter, while urban transport companies experience [...] demand during rush hours. [...] much more [...] of industry, this [...] understood when assessing failure consequences.
Raw material [...]
Sometimes the [...] context is influenced by [...] the supply of raw materials. Food manufacturers [...] periods of intense [...] harvest times and [...] activity at other times. This [...] to fruit processors and sugar mills. [...] operational failures [...] but [...] of raw materials if these cannot [...] deteriorate.
Documenting the [...] context
For all the above reasons, it is essential to ensure that everyone involved in the development of a maintenance program for [...] asset [...] understands the [...] context of that asset. The best way to do so is to document the [...], if necessary up to and [...] the overall mission statement of the entire organisation, as part of the RCM process.
Figure 2.8 overleaf shows a [...] context statement [...] machine mentioned earlier. The crankshaft is [...] in Type 2 engines used in motor car model X.
[Figure 2.8 labels: Make car model X; Make engines; Make Type 2 engines; asset: Motown [...]; Finish grind crankshaft main and big end journals]
[...] expectations change, but still we find [...] since [...] was built. Defining [...] very close [...] between maintainers and users. It is also usually a profound learning experience for everyone involved.
Functions are divided into two main [...] (primary and secondary functions) and then further divided into various [...]. These are reviewed on the following pages, [...] with [...].
Primary functions
Primary functions are [...] easy to recognise. In [...] the names of most industrial assets are based on their [...] of a [...]
><
A similar situation is often found i n manufacturi ng, where the same asset may be
different functions at different times. For i n stance, a single reactor
used to
vessel in a chemical plant m ight be used at different times to reflux
continuunder three different sets of conditions, as follows:
ously) three
3
2
1
Product
2 bar
Pressu re
1 0 bar
6 bar
1 20"C
1 40 c
1 80C
500 litres 600 litres 750 l itres
Batch s ize
Functions
37
3'-'JJC.U
In the above example, a combined function statement could be 'to reflux up to 750 litres of product at temperatures up to 180°C and pressures up to 10 bar.' This will [...] some overmaintenance some of the [...], but [...] will ensure that the asset can handle the worst stresses to which it will be exposed.
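A combined function statement of this kind is the worst case taken across every duty the asset must perform. The sketch below illustrates that idea only. The class and function names are hypothetical, and the pairing of pressure, temperature and batch size with each product is an assumption; the worst case depends only on the maximum of each quantity, so the pairing does not affect the result.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DutyCondition:
    pressure_bar: float
    temperature_c: float
    batch_litres: float


def worst_case(conditions: Iterable[DutyCondition]) -> DutyCondition:
    """Envelope of the most demanding value of each quantity."""
    conditions = list(conditions)
    return DutyCondition(
        pressure_bar=max(c.pressure_bar for c in conditions),
        temperature_c=max(c.temperature_c for c in conditions),
        batch_litres=max(c.batch_litres for c in conditions),
    )


# Assumed pairing of the values given for the three products.
products = [
    DutyCondition(pressure_bar=2, temperature_c=120, batch_litres=500),
    DutyCondition(pressure_bar=10, temperature_c=140, batch_litres=600),
    DutyCondition(pressure_bar=6, temperature_c=180, batch_litres=750),
]
envelope = worst_case(products)
print(f"to reflux up to {envelope.batch_litres} litres at up to "
      f"{envelope.temperature_c} C and up to {envelope.pressure_bar} bar")
```

Maintaining to the envelope is simple to administer but, as the text notes, means the asset is maintained to a higher standard than some of its duties need.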
Serial or dependent primary functions
One often encounters assets which must [...] two or more [...] functions in series. These are known as serial functions.
Secondary Functions
Most assets are expected to fulfil one or more functions in addition to their primary functions. These are known as secondary functions.
For example, the primary function of a motor car might be described as follows:
to transport up to 5 people at speeds of up to 140 km/h along made roads
If this was the only function of the vehicle, then the only objective of the maintenance program for this car would be to preserve its [...] to carry up to 5 [...] at speeds of up to 140 km/h along made roads. However, this [...] because most car owners expect far more from their vehicles, [...] from the ability to carry [...] to the ability to indicate how much fuel is in the fuel tank.
[...] are divided [...]:
environmental integrity
safety/structural integrity
control/containment/comfort
appearance
protection
economy/efficiency
superfluous functions.
The first letters of each line in this list form the word ESCAPES.
[...] secondary functions are usually less obvious than primary functions, the [...] of a [...] function can still have serious consequences, sometimes more serious than the loss [...] function. As [...] secondary functions often need as much if not more maintenance than primary functions, [...] too must be clearly identified. The following pages [...] the main [...] of these functions in more detail.
Functions
39
A further subset of functions is those which deal with contamination and
hygiene. These are most often found in the food and pharmaceutical industries.
The associated requirements lead to dedicated maintenance routines (such as
cleaning).
Structural
Many assets have a structural secondary function. This usually involves
supporting some other asset or component.
protect
exrmctea to support
Control
In many cases, users not only want assets to fulfil functions to a given
standard of performance, but they also want to be able to regulate the
performance. This expectation is summarised in separate function statements.
For instance, the primary function of a car as suggested earlier was 'to transport
up to 5 people at speeds of up to 140 km/h along made roads'. One control function
associated with this function could be 'to enable driver to regulate speed at will
between -15 km/h and +140 km/h'.
Indication and feedback form an important subset of the control category
of functions. This includes functions which provide operators with real time
information about the process (such as VDU's and control panels), or which record
such information for later analysis (such as digital or analog recording devices
and voice recorders).
Performance standards associated with these functions not only relate to the
ease with which it should be possible to read and assimilate or to play back the
information, but also cover its accuracy.
For instance, the function of the speedometer of a car might be described as 'to
indicate the road speed to the driver to within +5 -0% of actual speed'.
Containment
In the case of assets used to store things, a primary function is to contain
whatever is being stored. However, containment should also be acknowledged as a
secondary function of all devices used to transfer material of any sort. This
includes pumps, conveyors and chutes.
function of items like gearremarks on
26
or
their assets not to cause them
are l isted under the
of ' comfort' because the
freedom from anxi
classified under
egory of functions
that this doesn 1t ...., .
,-'u
The appearance of many items embodies a specific secondary function. For
instance, the primary function of the paintwork on most industrial equipment is to
protect it from corrosion, but a bright colour might be used to enhance its
visibility for safety reasons. Similarly, the main function of a sign is to show
the name of the company which displays it, but a secondary function is to project
an image.
Protective devices
As physical assets become more complex, the number of ways in which they can
fail is growing almost exponentially. This has led to corresponding growth in the
variety and severity of failure consequences. In an attempt to eliminate (or at
least to reduce) these consequences, increasing use is being made of automatic
protective devices. These work in one of several ways. One is to draw the
attention of the operators to abnormal conditions (warning lights and audible
alarms which respond to failure effects that are monitored by sensors such as load
cells, overload devices and temperature or pressure switches). Protective devices
also include items such as rupture discs.
The maintenance of protective devices - especially devices which are not
fail-safe - is discussed in much more detail in Chapters 5 and 8. However,
this demonstrates two fundamental points:
It is only possible to consider the maintenance requirements of protective
devices if we understand their functions. So when listing the functions of
any asset, we must list the functions of all protective devices.
A final point about protective devices concerns the way their functions
should be described. These devices act by exception (in other words when
something else goes wrong), so it is important to describe them correctly.
In particular, protective function statements should include the words 'if'
or 'in the event of', followed by a very brief summary of the circumstances
or the event which would activate the protection.
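The wording rule for protective function statements can be checked mechanically when functions are being listed. The sketch below is only an illustration: `names_trigger` is a hypothetical helper, and the sample statements are invented for the example rather than taken from any worksheet.

```python
import re

# Trigger phrases named in the text; matched as whole words, case-insensitively.
_TRIGGER = re.compile(r"\b(if|in the event of)\b", re.IGNORECASE)


def names_trigger(statement: str) -> bool:
    """Return True if a protective function statement says what activates it."""
    return _TRIGGER.search(statement) is not None


# Illustrative statements (assumptions, not quoted from the book).
good = "To be capable of stopping the machine in the event of overspeed"
vague = "To be capable of stopping the machine"
print(names_trigger(good), names_trigger(vague))
```

A check like this only confirms that a trigger phrase is present. Whether the circumstances that follow it are described correctly still has to be judged by the people reviewing the asset.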
Economy/efficiency
Anyone who uses assets of any sort only has finite financial resources.
This leads them to put a limit on what they are prepared to spend on
operating and maintaining it. How much they are prepared to spend is governed by
a combination of three factors:
the actual extent of their financial resources
how much they want whatever the asset will do for them
the availability and cost of other ways of achieving the same end.
At the operating context level, functional expectations concerning costs are
usually spelt out in the form of expenditure budgets. At the asset level,
economic issues can be addressed directly by function statements which define
what users expect in areas such as efficiency and loss of materials.
For instance, a car might be expected 'to consume not more than 6 litres of fuel
per 100 km at a constant 120 km/h, and not more than 4 litres of fuel per 100 km
at 60 km/h.' A fossil fuel power station might be expected 'to export at least 45%
of the latent energy in the fuel as electrical power.' A plant which uses a
solvent might want 'to lose no more than 0.5% of solvent X per month'.
Superfluous functions
Items or components are sometimes encountered which are completely
superfluous. This usually happens when equipment has been modified frequently,
or when it has been over-specified when it was built.
For example, a pressure reducing valve was built into the supply line between a
gas manifold and a gas turbine. The original function of the valve was to reduce
the gas pressure from 120 psi to 80 psi. The system was later modified to reduce
the manifold pressure to 80 psi, after which the valve served no useful purpose.
A note on the ESCAPES categories
There will often be doubt about which of the ESCAPES categories some
functions belong to. For instance, should the function of a seat adjustment
mechanism be classified under the heading of 'control' or 'comfort'?
In practice, the precise classification does not matter. What does matter is
that we identify and define all the functions which are likely to be expected by
the user.
Figure 2.9: Describing functions

Functional Failures

3.1 Failure
In the previous chapter, we explained that people acquire assets because they
want the assets to do something. Not only that, but they expect their assets to
fulfil the intended functions to an acceptable standard of performance.
Chapter 2 went on to explain that for any asset both to do what its users
want and to allow for deterioration, the initial capability of the asset must
exceed the desired standard of performance. Thereafter, as long as the
capability of the asset continues to exceed the desired standard of performance,
the user will be satisfied.
On the other hand, if for any reason the asset is unable to do what the
user wants, the user will consider it to have failed.
Reliability-centred Maintenance
Figure 3.1: The general failed state
Conversely, i t is
it cannot pump the
it still contains the
This shows why it is more accurate to define failure in terms of the loss of
specific functions rather than the failure of an asset as a whole. It also shows
why the RCM process uses the term 'functional failure' to describe failed states,
rather than 'failure' on its own. However, to complete the definition of failure,
we also need to look more closely at performance standards.
Performance Standards and Failure
As discussed in the first part of this chapter, the boundary between
satisfactory performance and failure is specified by a performance standard.
Given that performance standards apply to individual functions, 'failure' can be
defined precisely by defining a functional failure as follows:
A functional failure is defined as the inability of any asset to fulfil a
function to a standard of performance which is acceptable to the user
Figure 3.2: Functional failure
For example, the primary function of the pump in Figure 2.1 is 'to pump water
from Tank X to Tank Y at not less than 800 litres/minute'. This function could
suffer from two functional failures, as follows:
fails to pump any water at all
pumps water at less than 800 litres per minute.
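The two failed states of the pump follow directly from its performance standard. A minimal sketch is shown below. The function name and its parameters are hypothetical, and the required flow is left as an argument because the standard depends on the operating context.

```python
from typing import Optional


def pump_functional_failure(flow_lpm: float, required_lpm: float = 800.0) -> Optional[str]:
    """Classify a measured flow against the required flow.

    Returns a description of the functional failure, or None if the pump
    is fulfilling its function to the required standard.
    """
    if flow_lpm <= 0:
        return "fails to pump any water at all"
    if flow_lpm < required_lpm:
        return f"pumps water at less than {required_lpm:g} litres per minute"
    return None


print(pump_functional_failure(0))
print(pump_functional_failure(650))
print(pump_functional_failure(820))
```

Calling the same function with `required_lpm=900` treats a flow of 850 litres per minute as a functional failure, even though the pump itself is behaving exactly as before.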
Figure 3.3: Asset still OK despite some deterioration (vertical axis: PERFORMANCE; labels: INITIAL CAPABILITY (what it can do), margin for deterioration, actual deterioration)
The function of a crankshaft grinding machine was listed as 'To finish grind main
bearing journals in a cycle time of 3.00 ± 0.03 minutes to a diameter of 75 ± 0.1 mm
with a surface finish of Ra 0.2'. This function could suffer from the following
functional failures:
Completely unable to grind workpiece
Grinds workpiece in a cycle time longer than 3.03 minutes
Grinds workpiece in a cycle time less than 2.97 minutes
Diameter exceeds 75.1 mm
Diameter is below 74.9 mm
Surface finish too rough.
Of course, if only one limit applies to a particular parameter, then only one
failed state is possible. For instance, the absence of a lower limit on the
roughness specification in the above example suggests that it is not possible
to make the item too smooth. In some circumstances, this may not actually be
true, so care needs to be taken to verify this point when analysing functions
of this type.
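The link between specification limits and failed states can be sketched in a few lines: each limit that exists yields one failed state, so a parameter with both limits yields two and a parameter with one limit yields one. The `Parameter` class and `failed_states` function are hypothetical names used only for this illustration, and the complete loss of function ('completely unable to grind workpiece') is deliberately left out because it does not derive from a limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Parameter:
    """A performance parameter with optional lower and upper limits."""
    name: str
    unit: str
    lower: Optional[float] = None  # None means no lower limit applies
    upper: Optional[float] = None  # None means no upper limit applies


def failed_states(p: Parameter) -> List[str]:
    """Return one failed state per limit that actually applies."""
    states = []
    if p.upper is not None:
        states.append(f"{p.name} exceeds {p.upper:g} {p.unit}")
    if p.lower is not None:
        states.append(f"{p.name} is below {p.lower:g} {p.unit}")
    return states


# Limits taken from the grinding machine example.
cycle = Parameter("Cycle time", "minutes", lower=2.97, upper=3.03)
diameter = Parameter("Diameter", "mm", lower=74.9, upper=75.1)
roughness = Parameter("Surface roughness", "Ra", upper=0.2)

for parameter in (cycle, diameter, roughness):
    print(failed_states(parameter))
```

As the text warns, the single failed state for roughness is only valid if it really is impossible to make the item too smooth. If it is not, a lower limit belongs in the specification.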
In practice, the failed states associated with upper and lower limits can
manifest themselves in two ways. Firstly, the spread of capability could
breach the specification limits in one direction only. This is illustrated in
Figure 3.4, which shows that this type of failed state can be likened to a
number of shots hitting a target in a tight group but way off centre.
Figure 3.4:
The second failed state occurs when the spread of capability is so broad
that it breaches both the upper and the lower specification limits. Figure
3.5 shows that this can be likened to shots scattered all over the target.
Note that in both of the above cases, not all of the products produced
by the processes in question will be failed. If the breach is minor, only a
small percentage of out-of-spec products will be produced. However, the
further off centre the grouping in the first case, or the broader the spread in
the second case, the higher will be the percentage of failures.
Figure 3.5:
For instance, the function of a temperature gauge could be listed as 'to display the
temperature of process X to within (say) ±2% of the actual process temperature'.
This gauge can suffer from three functional failures, as follows:
fails altogether to display process temperature
displays a temperature more than 2% higher than the actual temperature
displays a temperature more than 2% lower than the actual temperature.
For example, we saw how the pump shown in Figure 2.1 fails if it is completely
unable to pump water, and if it is able to pump less than 800 litres/minute. If the
same pump is used to fill a tank from which water is drawn at 900 litres/minute,
the second failed state occurs if the throughput drops below 900 litres/minute.
Figure 3.6: Different views about failure
The maintenance manager (who controls the hydraulic oil budget) may ask the
operators for access to the hydraulic system to repair leaks 'because oil
consumption is excessive'. However, access may be denied because the operators think
the machine 'is still working OK'. When this happens, the maintenance people (1)
record that the machine 'was not released for preventive maintenance', and (2)
form the opinion that their production colleagues 'don't believe in PM'. For similar
reasons, the maintenance manager might not release a maintenance person to
repair a small leak when asked to do so by the safety officer.
In fact, all three parties almost certainly do believe in prevention. The real problem
is that they have not taken the trouble to agree exactly what is meant by 'failed', so
they do not share a common understanding of what they are trying to prevent.
The performance standard used to define functional failure - in other words,
the point where we say 'so far and no further' - defines the level of proactive
maintenance needed to avoid that failure (in other words, to sustain the required
level of performance). Much time and energy can be saved if these performance
standards are established before the failures occur.
The performance standards used to define failure must be set by operations
and maintenance people working together with anyone else who has something
legitimate to say about how the asset should behave.
Figure 3.7: RCM II Information Worksheet (© 1996 Aladon Ltd). System: 5 MW Turbine. Sub-system: Exhaust System.
These issues are discussed at length later in this chapter, but first we look
at why we need to analyse failure modes at all.
is all about
UHi, .'-J.Llr,
to deal with
record individual
them).
Sadly, in too many cases, these failure modes are discussed, recorded or
otherwise dealt with after they have occurred. Dealing with failures after
they have happened is of course the essence of reactive maintenance.
Proactive management, on the other hand, means dealing with events
before they occur, or at least deciding how they should be dealt with if
they were to occur. In order to do this, we need to know beforehand what
events are likely to occur. The 'events' in this context are failure modes.
So if we wish to apply truly proactive maintenance to any physical asset,
we must try to identify all the failure modes which are reasonably likely to
affect that asset. Ideally, they should be identified before they occur at
all, or if this is not possible, before they occur again.
Once each failure mode has been identified, it then becomes possible
to consider what happens when it occurs, to assess its consequences and
to decide what (if anything) should be done to anticipate, prevent, detect
or correct it, or perhaps even to design it out.
Figure: failure modes of a centrifugal pump, with possible remedies (screen in suction line; training to fit impellers correctly)
If the failure mechanism is the impeller coming adrift, this would almost
certainly be because it wasn't put on properly in the first place. (If we knew
that this was so, then the failure mode should be described as 'Impeller fitted
incorrectly'.) This in turn means that the failure mode is most likely to occur
soon after start up, as shown in Figure 4.2, and we would deal with it by
improving training or procedures.
reinforces the
that the level at which we manage the
This
maintenance of any asset is not at the level of the asset as a whole this
and not even at the level
this case,
but at the level of each failure mode. So before we can deSV5,te1na1t1c, ""''""'""'"''h'"" maintenance management
for any
what
modes are (or could be).
"---- that one of the failure modes could be elimand another by improving
or procemode is dealt with by scheduled maintenance.
.,..,.,,,,,.,.,.,.,, 5 to 9 describe an
approach to deciding what is likely to
,1,,. .. i;"'"' w
be the most suitable way of uvuuu5
ith each failure.
Note also that the failure management c r d uh r. n c
one of several possibilities in each case.
For instance we could monitor impeller wear by monitoring the pump performance and only
the i mpeller when it needs it We also need to bear in mind
that adding a screen to the suction line adds three more failure possibilities, which
need to be analysed in turn (it could block up, it could be holed and therefore cease
to screen, and it could disintegrate and damage the impeller.)
C'hapters 6 to 9 examine these alternatives in more detail.
These points all indicate that the identification of failure modes is one
of the most important steps in the development of any program intended
to ensure that any asset continues to fulfil its intended functions. In practice,
depending on the complexity of the item, its operating context and the
level at which it is being analysed, between one and thirty failure modes
are usually listed per functional failure.
The next two sections of this chapter consider two of the key issues in
this area under the headings of categories of failure modes and level of detail.
Thereafter, the last three sections of the chapter consider failure effects,
sources of information for an FMEA, and how failure modes and effects
should be listed.
Figure 4.3:
Deterioration
Any physical asset that fulfils a function which brings it into contact with
the real world is subject to a variety of stresses. These stresses cause the
asset to deteriorate by lowering its capability, or more accurately, its
resistance to stress. Eventually resistance drops so much that the asset can
no longer deliver the desired performance - in other words, it fails.
Deterioration covers all forms of 'wear and tear' (such as abrasion and
degradation of insulation). These failure modes should of course be included in
a list of failure modes wherever they are thought to be reasonably likely. The
level of detail with which they need to be recorded is discussed in the next
part of this chapter.
Lubrication is associated with two types of failure modes. The first
concerns lack of lubricant, and the second the failure of the lubricant itself.
With respect to lack of lubricant, the situation has changed considerably
in the last two decades. Twenty years ago, most lubrication points
were replenished manually. The cost of lubricating each point was tiny
compared to the cost of not doing so. It was also tiny compared
to the cost of analysing the lubrication requirements of each point in detail.
This meant that it was not worth carrying out an in-depth analytical exercise to
set up a lubrication program. Instead, lubrication schedules were usually
set up on the basis of a quick survey by a lubrication specialist.
Nowadays 'sealed-for-life' components and centralised lubrication systems
have become the norm in most industries. This has led to a massive reduction
in the number of points where a human has to apply oil or grease to a machine,
and a massive increase in the consequences of failure (especially the failure of
centralised lubrication systems).
From the RCM viewpoint, this means that it is now cost-effective to:
use RCM to analyse centralised lubrication systems in their own right
consider the loss of lubricant in the few remaining manually lubricated
points as individual failure modes.
The second category of failures associated with lubrication concerns
deterioration of the lubricant itself. It is caused by phenomena such as
shearing of the oil molecules and oxidation. Deterioration of the oil may also
be caused by the build-up of sludge, or the presence of water or other
contaminants. A lubricant may also fail to do its job because the wrong
lubricant has been used. If any or all of these failure modes are considered to
be reasonably likely in the context under consideration, they should be recorded
and subjected to further analysis. This also applies to transformer oil.
Dirt
Dirt or dust is a very common cause of failure. It interferes directly with
machines by causing them to block, stick or jam. It is also a principal
cause of the failure of functions which deal with the appearance of assets
(things which should look clean look dirty). Dirt can also cause product
quality problems, either by getting into the mechanisms of machine tools
or by getting directly into products such as food or pharmaceuticals. As a
result, failures caused by dirt should be listed in the FMEA whenever they are
likely to interfere with a significant function of the asset.
When components or assemblies come adrift, the consequences are usually very
serious, so the relevant failure modes should be listed. These are usually the
failure of soldered joints or rivets due to fatigue or corrosion, or the failure
of threaded components or electrical connections, which can also fail due to
fatigue or corrosion, or which simply come undone.
Also take care to record the functions and associated failure modes of
the locking mechanisms, such as split pins, which are used to ensure the
integrity of assemblies.
Human errors which reduce capability
The final subset of the 'falling capability' category of failure modes are
those caused by human error. As the name implies, these refer to errors
which reduce the capability of the process to the extent that it is unable
to function as required by the user.
Examples include manually operated valves left shut, causing a process to be
unable to start, parts incorrectly fitted by maintenance craftsmen, or sensors set
in such a way that a machine trips out when nothing is wrong.
Sustained, deliberate overloading
In many industries, users quickly give in to the temptation simply to speed
up equipment in response to increased demand for existing products. In other
cases, people use assets acquired for one product to process a product with
different characteristics (such as larger, heavier unit sizes or higher quality
standards). People do this in the belief that they will get more out of their
facilities without any increase in capital investment. This may even be true in
the short term. However, this solution carries long-term penalties in terms of
reduced reliability and/or availability, especially when the increased stresses
begin to approach or exceed the ability of the asset to withstand them.
Figure 4.4: Failure Mode Category 2
In these situations, users tend to claim that 'there must be something wrong
with our maintenance', while maintainers complain that the users are 'flogging
the machine to death'. Users focus on what they want, while maintainers tend to
think in terms of what the asset can do.
Responding to increased demand sometimes takes the form of formal
programs. These programs entail increasing the capacity of a facility such as
a production line to accommodate the demand. Much to the disappointment of their
sponsors, these programs often seem to end up causing more problems than they
solve. This usually happens because a few small systems or components are left
out of the overall program. How this occurs is illustrated in Figure 4.5.
If the asset suffers from failure modes of this type, they should be recorded in
the FMEA so that they can be dealt with appropriately.
8 I 9
10
11
12
materials
Manufacturing processes often suffer functional failures caused by process
materials which are out of specification (in terms of such variables as
consistency, hardness or pH). Similarly, packaging plants often suffer
from inadequate or incompatible packaging materials.
In both cases, the machines fail or run badly because they cannot handle
the out-of-spec material. This can be seen as an increase in applied stress.
In practice, these 'failure modes' are seldom the result of a failure of the
asset under review, but are nearly always the effect of a failure elsewhere
in the system. This means that remedial action has to be applied to a different
asset. However, acknowledging these failures in the analysis of the
affected asset helps to ensure that they will receive attention when the
system which is really causing the problem is analysed. As a result, these
failure modes should be incorporated in the FMEA where they are known
to affect the asset under review, with a comment in the failure effects
column which directs attention to the real source of the problem.
Initial incapability
Chapter 2 explained that for any asset to be maintainable, its desired
performance must fall within the envelope of its initial capability. It went
on to mention that the majority of assets are in fact built this way. However,
situations do arise where desired performance is outside the envelope
of the initial capability right from the outset, as shown in Figure 4.6.
This incapability problem seldom affects entire assets. It usually affects
just one or two functions of one or two components, but these weak links upset
the operation of the whole chain. The first step towards rectifying design
problems of this nature is to list them as failure modes in an FMEA.
Figure 4.6: Failure Mode Category 3
In practice, it can be difficult to find an appropriate level of detail.
However, it is important to do so, because the level of detail profoundly
affects the validity of the FMEA and the amount of time needed to do it. Too
little detail and/or too few failure modes lead to superficial and sometimes
dangerous analyses. Too many failure modes or too much detail causes the entire
RCM process to take much longer than it needs to. In extreme cases, excessive
detail can cause the process to take two or even three times longer than
necessary (a phenomenon known as analysis paralysis).
This means that it is essential to try to strike the right balance. Some of
the key factors which need to be taken into account are discussed in the
following paragraphs.
Causation
The causes of any functional failure can be defined to almost any level of
detail, and different levels are appropriate in different situations. At one
extreme, it is sometimes sufficient to summarise the causes of a functional
failure in one statement, such as 'machine fails'. At the other, we may need
to consider what goes wrong at the molecular level and/or to explore the
remoter corners of the psyche of the operators and maintainers in a bid to
define so-called root causes of failure.
The extent to which failure modes can be described at different levels
of detail is illustrated in Figure 4.7 on the next three pages.
Figure 4.7 is based on the pump set shown in Figure 4.2, some of whose failure
modes were listed in Figure 4.1. Figure 4.7 lists ways in which the pump set
could suffer from the functional failure 'unable to transfer water at all'. These
failure modes are considered at seven different levels of detail. The top level
(level 1) is failure of the pump set as a whole. Level 2 recognises the failure
of the pump, the motor, the switchgear and inlet/outlet pipework. The lower levels
consider the failures in progressively more detail. When considering this
example, please note that:
the levels have been defined and failure modes allocated to each level for the
purposes of this example only. They are not any kind of universal classification
Figure 4.7 does not show all failure possibilities at each level, so don't use this
example as a definitive model
it is possible to analyse some of the failure modes at even lower levels than level
7, but it would very seldom be necessary to do so
the failure modes listed apply only to the functional failure 'unable to transfer
water at all'. Figure 4.7 does not show failure modes which would cause other
functional failures, such as loss of containment or loss of protection.
Figure 4.7 (spread over three pages in the original)
Root causes
The term 'root cause' is often used in connection with the analysis
of failures. It implies that if one drills down far enough, it is possible to
arrive at a final and absolute level of causation. In fact, this is seldom the
case.
For instance, in Figure 4.7 the failure mode involving the impeller nut is listed
at level 6, which in turn is caused by an 'assembly error' at level 7. If we were to
go down one level further, the assembly error might have occurred because the
'fitter was distracted' (level 8). He might have been distracted because his 'child
was ill' (level 9). This failure might have occurred because the 'child ate bad food
in restaurant' (level 10).
Clearly, this process of drilling down could go on almost forever - way
beyond the point at which the organisation doing the FMEA has any control
over the failure modes. This is why this chapter stresses repeatedly
that the level at which any failure mode should be identified is the level at
which it is possible to identify an appropriate failure management policy.
(This is equally true whether one is carrying out an FMEA before failures
occur or a 'root cause analysis' after a failure has occurred.)
The fact that the level which is appropriate varies for different failure
modes means that we do not have to list all failure modes at the same level
on the Information Worksheet. Some failure modes might be identified at
level 2, others at level 7, and the rest somewhere in between.
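The stopping rule for drilling down can be sketched as a walk down the causal chain that halts at the first level where a failure management policy can be identified. In the sketch below, `Cause` and `level_to_record` are hypothetical names, and the `policy_identifiable` flags are assumptions made for illustration. In a real review they are judgments made by the people carrying out the FMEA.

```python
from typing import List, NamedTuple, Optional


class Cause(NamedTuple):
    level: int
    description: str
    policy_identifiable: bool  # assumed judgment of the review group


def level_to_record(chain: List[Cause]) -> Optional[Cause]:
    """Return the shallowest cause for which a policy can be identified."""
    for cause in sorted(chain, key=lambda c: c.level):
        if cause.policy_identifiable:
            return cause
    return None


# Causal chain from the impeller example; the flags are illustrative only.
chain = [
    Cause(6, "impeller nut failure", False),
    Cause(7, "assembly error", True),
    Cause(8, "fitter was distracted", False),
    Cause(9, "child was ill", False),
    Cause(10, "child ate bad food in restaurant", False),
]
chosen = level_to_record(chain)
print(chosen)
```

Under these assumed flags the failure mode is recorded at level 7, and levels 8 to 10 are never examined. A different operating context could move the stopping point up or down.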
For instance, in one particular context, it may be appropriate to list only those
failure modes shaded in grey in Figure 4.7. In another context, it may be
appropriate for an entire FMEA for an identical pump set to consist of the single
failure mode 'pump set fails'. Another context may call for yet another selection.
Human error
Part 3 of this chapter mentioned a number of ways in which human error could
cause machines to fail. It went on to suggest that if the associated failure
modes are thought to be reasonably likely, they should be incorporated in the
FMEA. This has been done in Figure 4.7, where all the failure modes ending with
the word 'error' are some form of human error. A brief summary of the issues
involved in the classification and management of such errors is given elsewhere
in this book.
Probability
Different failure modes occur at different frequencies. Some may occur
regularly, at average intervals measured in months, weeks or even days.
Others may be extremely improbable, with mean times between occurrences
measured in millions of years. When preparing an FMEA, decisions must be made
as to what failure modes are so unlikely that they can safely be ignored.
This means that we do not try to list every failure possibility regardless of
its likelihood.
A review of existing maintenance schedules should only be carried out as
a final check after the rest of the RCM analysis has been completed, in order to
reduce the possibility of perpetuating the status quo.
Some users of RCM are tempted to assume that all reasonably likely failure
modes are covered by their existing schedules, and hence that these are the only
failure modes which need to be considered in the FMEA.
This assumption is not always valid. On the other hand, we don't want to
waste time on failures which have never occurred before and which are
unlikely in the context in question.
For instance, 'sealed-for-life' bearings are installed on the motor of the
pump shown in Figure 4.7. This means that the likelihood of lubrication failure
is low - so low that it would not be included in most FMEA's. On the other hand,
failure due to lack of lubricant probably would be included in FMEA's prepared
for manually lubricated components, centralised lube systems and gearboxes.
However, the decision not to list a failure mode should be tempered by
careful consideration of the failure consequences.
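The interplay between likelihood and consequences when deciding whether to list a failure mode can be reduced to a very small rule. The sketch below is a deliberate simplification: the two-level likelihood rating, the three-level consequence rating and the function `should_list` are all assumptions introduced for illustration, not a scale defined by the text.

```python
from enum import Enum


class Likelihood(Enum):
    REASONABLY_LIKELY = "reasonably likely"
    UNLIKELY = "unlikely"


class Consequences(Enum):
    TRIVIAL = "trivial"
    SIGNIFICANT = "significant"
    SEVERE = "severe"


def should_list(likelihood: Likelihood, consequences: Consequences) -> bool:
    """List reasonably likely failure modes, plus unlikely ones that are severe."""
    if likelihood is Likelihood.REASONABLY_LIKELY:
        return True
    return consequences is Consequences.SEVERE


# Lubrication failure of a sealed-for-life bearing: unlikely, not severe.
print(should_list(Likelihood.UNLIKELY, Consequences.SIGNIFICANT))
# Lack of lubricant at a manually lubricated point: reasonably likely.
print(should_list(Likelihood.REASONABLY_LIKELY, Consequences.SIGNIFICANT))
# An improbable event with very severe consequences.
print(should_list(Likelihood.UNLIKELY, Consequences.SEVERE))
```

In practice both ratings are judgments that depend on the operating context, so the same failure mode can be listed for one asset and omitted for an identical one elsewhere.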
Consequences
If the consequences are likely to be very severe indeed, then less likely
failure possibilities should be listed and subjected to further analysis.
For instance, if the pump set in Figure 4.7 were installed in an ordinary factory,
a failure mode in which the pump is struck by an object falling from the sky would
be dismissed immediately. However, if the pump were pumping something really
hazardous, this failure mode is more likely to be taken seriously.
(Appropriate failure management policies might be to ban aircraft from flying
over the facility, or to build a roof which can withstand a crashing aircraft.)
Another example from Figure 4.7 is 'motor not switched on'. This failure mode
is likely to be dismissed on grounds of improbability in most situations. Even
if it does occur, the consequences may be so trivial that it is excluded from the
FMEA. However, if it could occur and it does matter - especially in cases where
machines must be switched on in a particular sequence and something could be
damaged if they are not - then this failure mode should be considered.
Cause vs Effect
Care should be taken not to confuse causes and effects when listing failure
modes. This is a subtle mistake most often made by people who are new
to the RCM process.
For example, one plant had some 200 gearboxes, all of the same design and all
performing more or less the same function on the same type of equipment.
Initially, the following failure modes were recorded for one of these gearboxes:
Gear teeth stripped.
review recalled that each failure h ad happened in the past to their knowledge
were
old). The failures did not affect
of the
So the implication was that it might be worth doing
but they did affect
n.rc,HOir"ITl\,tO tasks like 'check
for wear' or 'check gearbox for backlash',
However, furthe r discussion revealed
and
that both failures had occurred because the oil level had not been checked when
it should h ave been, so the
had
due to lack of oil. What
had failed if they had been
is m ore, no-one coul d recal l that a ny of the
property lubricated. As a result, the failure mode was eventually recorded as:
Gearbox fails d u e to l ack of oil.
obvious proactive task, which was to check
This u nderlined the
is not to suggest that all gearboxes should be anathe oil level
i n this way. Some are m uc h more complex or m uc h more heavily loaded,
to a wider variety of failure modes. In other cases, the failure
and so are
m ay be much more severe, which would call for a more defensive
fail u re OO!SS1tm1t1!es.
73
'"
TYtYVH'rn,r,:;,
74
Evidence of Failure
Failure effects should be described in a way which enables the RCM analysts
to decide whether the failure will become evident to the operating crew under
normal circumstances.
For instance, the description should state whether the failure causes warning
lights to come on or alarms to sound (or both), and whether the warning is given
on a local panel or in a central control room (or both).
Modern industrial
proportion of failure
If there is a possibility that someone could get hurt or killed as a direct
result of the failure, or an environmental standard or regulation could be
breached, the failure effect should describe how this could happen.
In this context, downtime means the total amount of time the asset would normally be out of service owing to this failure, from the moment it fails until the moment it is fully operational again. As indicated in Figure 4.8, this is much longer than the repair time.
Figure 4.8: Downtime vs repair time
In addition to downtime, any other ways in which the failure could have an effect on the operational capability of the asset should be listed. Possibilities include:
whether and how product quality or customer service is affected, and if so whether any financial penalties are involved
whether any other equipment or activity also has to stop (or slow down)
whether the failure leads to an increase in overall operating costs in addition to the direct cost of repair (such as higher energy costs)
what secondary damage (if any) is caused by the failure.
Corrective Action
Few manufacturers are involved in the day-to-day operation of the equipment. After the end of the warranty period, almost none receive regular feedback from the users about what fails and why. The best that many of them can do is try to draw conclusions about how their equipment is performing from a combination of anecdotal evidence and an analysis of spares sales (except when a really spectacular failure occurs, in which case [...] tend to take over from [...]. At this point, rational technical discussion about root causes often ceases).
Manufacturers also have little access to information about the operating context of the equipment, desired standards of performance, failure consequences and the skills of the user's operators and maintainers. More often the manufacturers know nothing about these issues. As a result, FMEAs compiled by these manufacturers are usually generic and often highly speculative, which limits their value.
The small minority of equipment manufacturers who are able to compile an FMEA on their own fall into one of two categories:
the operating context may be different: the operating context of your asset may have features which make it susceptible to failure modes that do not appear in the list. Conversely, some of the modes in the list might be improbable (if not impossible) in your context.
performance standards may differ: your asset may operate to standards of performance which mean that your whole definition of failure may be completely different from that used to develop the FMEA.
they are often incomplete
more often than not, they describe what was done to repair the failure ('replaced main bearing') rather than what caused it
they do not describe failures which have not yet occurred
they often describe failure modes which are really the effect of some other failure.
These drawbacks mean that technical history records should only be used as a source of information when compiling an FMEA, and never as the sole source.
Level of analysis
RCM is defined as a process used to determine what must be done to ensure that any physical asset continues to do whatever its users want it to do in its operating context. In the light of this definition, we have seen that it is necessary to define the context in detail before we can apply the process. However, we also need to define what the 'physical asset' is to which the process will be applied.
For example, if we apply RCM to a truck, is the entire truck the asset? Or should we subdivide the truck and analyse (say) the drive train separately from the fuel system, the chassis and so on? Or should we further subdivide the drive train and analyse (say) the gearbox separately from the propshaft, differentials, axles and wheels? Or should the engine not be divided into block, engine management system, cooling, fuel and so on before starting the analysis? What about subdividing the fuel system into tank, pump, [...] and filters?
For example, when thinking about the failure modes which could affect the vehicle, a possibility which comes to mind is a blocked fuel line. The fuel line is part of the fuel system, so it seems sensible to address this failure mode by raising a Worksheet for the fuel system. Figure 4.9 indicates that if the analysis is carried out at this level, the blocked fuel line might be the seventh failure mode identified out of a total of a dozen which could cause the functional failure 'unable to transfer any fuel at all'.
Figure 4.9: RCM II Information Worksheet for the fuel system
at a low level,
failure consequences.
Starting at the top
Instead of starting the analysis towards the bottom of the equipment hierarchy, we could start at the top.
For example, the primary function of a truck was listed on page 28 as follows: 'To transport up to 40 tons of material at [...] of up to 75 [...] 60 [...] from Startsville to Endburg on one tank of fuel.' The first functional failure associated with this function is 'Unable to move at all'. The four failure modes shown in Figure 4.9 could all cause this functional failure, so instead of being listed on an Information Worksheet for the fuel system, they could have been listed on an Information Worksheet covering the entire truck, as shown in Figure 4.10.
Figure 4.10: RCM II Information Worksheet for the entire truck (System: 40 ton truck)
The main advantages of starting the analysis at this level are as follows:
functions and performance expectations are much easier to define
failure consequences are much easier to assess
it is easier to identify and analyse control loops and circuits as a whole
there is less repetition of functions and failure modes
it is not necessary to raise a new information worksheet for each new sub-system, so analyses carried out at this level consume far less paper.
However, the main disadvantage of performing the analysis at this level is that there are hundreds of failure modes which could render the truck completely unable to move. These range from a flat tyre to a broken crankshaft. So if we were to try to list all the failure modes at this level, it is likely that several would be overlooked altogether.
For instance, we have seen how the blocked fuel line might have been the seventh failure mode out of twelve to be identified in the analysis carried out at the 'fuel system' level. However, at the truck level, Figure 4.10 shows that it might have been 73rd out of several hundred failure modes.
These problems suggest that it may be sensible to carry out the analysis at an intermediate level. In fact, we are almost spoiled for choice, because most assets can be sub-divided into many levels and the RCM process applied at any one of these levels. Given this, how do we select the level at which to perform the analysis?
We have seen that the top level usually embodies too many failure modes per function to permit sensible analysis. In spite of this however, we still need to identify the main functions of the asset or system at the highest level in order to provide a framework for the rest of the analysis. For example, we buy a truck to carry goods from A to B, not to pump fuel along a fuel line. Although the latter function contributes to the former, the overall performance of the asset and hence of its maintenance tends to be judged at the higher levels. For instance, the chief executive of a truck fleet is much more likely to ask 'how is truck X performing?' than 'how is the fuel system on truck X performing?' (unless the fuel system is known to be causing a problem).
Figure 4.11: Functions and failures at different levels (asset hierarchy including gearbox, propshaft, engine block, cooling, fuel system, fuel tank and fuel pump; example functional failure: 'Unable to carry any fuel at all')
On the other hand, we have seen that the initial inclination is nearly always to start too low in the asset hierarchy. For this reason, a good general rule (especially for people new to RCM) is to carry out the analysis one level or even two levels higher than at first seems sensible. This is because it is easier to break complex sub-systems out of a high-level analysis than it is to go up a level when one has started too low. This is discussed in more detail in the next section of this chapter.
With a bit of practice (especially concerning what is meant by 'a level at which it is possible to identify a suitable failure management policy'), the most suitable level at which to carry out any analysis eventually becomes intuitively obvious. In this context, note that it is not necessary to analyse every system at the same level throughout the asset hierarchy.
For instance, the entire braking system could be analysed at level 2 as shown in Figure 4.11, but it may be necessary to analyse the engine at level 3 or even level 4.
Option 1
List all the reasonably likely failure modes of the sub-assembly individually as part of the main analysis, in other words, at levels equivalent to level 3, 4, 5 or 6 in Figure 4.7.
For example, consider an asset which could stop completely as a result of the failure of a small gearbox. On the Information Worksheet for this asset, this gearbox failure could be listed as shown below:
In general, the failure modes which could affect a sub-assembly can be listed in a higher level analysis if the sub-assembly is considered to suffer from no more than about 6 failure modes which could cause any functional failure of the higher level assembly and which will be worth analysing.
Option 2
List the failure of the sub-assembly as a single failure mode on the Information Worksheet to begin with, then raise a new worksheet to analyse the functions, functional failures, failure modes and effects of the sub-assembly as a separate exercise.
For example, the failure of the gearbox discussed above could have been listed as follows:
A sub-assembly is usually worth analysing in this fashion if [...] failure modes of the sub-assembly could cause the loss of any one function of the main assembly. If there are between 7 and 9 failure modes per functional failure, use either option one or option two, bearing in mind that [...] mean more [...] but fewer failure modes per [...].
Option 3
List the failure of the sub-assembly on the Information Worksheet as a single failure mode, in other words, record its effects and leave it at that.
For example, if it was considered appropriate to treat the gearbox failure discussed above in this fashion, it would be listed as shown overleaf:
This approach should only be adopted for a component or sub-assembly which has the following characteristics:
it is not subject to detailed diagnostic and repair routines when it fails, but is simply replaced and either discarded or sent to later repair
it is small but complex
it does not have any dominant failure modes
it is not likely to be [...] to any form [...].
Option 4
A complex sub-assembly might suffer from one or two dominant failure modes which are readily [...] and a number of less common failures which might not be [...] and/or the consequences of the failures do not warrant it.
For example, a small electric motor operating in a dusty environment might be almost certain to fail due to overheating if its cooling fan gets blocked, but other failures are few and far between and not very serious if they do occur. In this case, the failure modes for this motor might be listed as follows:
motor fan blocked by dust
motor fails [...] other [...]
This option is [...]
Services
The failures of services (water, steam, gases, vacuum, etc) are treated as a single failure mode from the point of view of the asset which is supplied by that service, because detailed analysis of these failures is beyond the scope of the asset in question. Such failures are noted for information purposes, their effects recorded and are then analysed in detail when the service is analysed as a whole.
[...] listed in the last column of the Information Worksheet alongside the relevant failure mode, as shown in Figure 4.13.
Figure 4.13: Information Worksheet (System: 5 MW Gas Turbine; System No: 05)
Failure Consequences
to
each
each failure mode.
This combination of context, standards and effects means that every failure has a specific set of consequences associated with it. If the consequences are very serious, then considerable efforts will be made to prevent the failure, or at least to anticipate it in time to reduce or eliminate the consequences. This is especially true if the failure could hurt or kill someone, or if it is likely to have a serious effect on the environment. It is also true of failures which interfere with production or operations, or which [...]. If the failure only has minor consequences, it is possible that no proactive action will be taken and the failure simply corrected each time it occurs.
This suggests that the consequences of failures are more important than their technical characteristics. It also suggests that the whole idea of proactive maintenance is not so much about preventing failures as it is about avoiding or reducing the consequences of failure.
Figure: Three pumps (Stand Alone, Duty, Stand-by)
Failures of this kind are classed as evident because someone will eventually find out about them when they occur on their own. This leads to the following definition of an evident function:
This approach also means that the safety, environmental and economic consequences of each failure are assessed in one step, which is much more cost-effective than considering them separately.
The next four sections of this chapter consider each of these categories in detail, starting with the evident categories and then moving on to the rather more complex issues surrounding hidden failures.
For example, consider a failure mode which could result in death or injury to ten people (what could happen). The probability that this failure mode could occur is one in a thousand in any one year (how likely it is to occur). On the basis of these figures, the risk associated with this failure is:
10 x (1 in 1 000) = 1 casualty per 100 years
Now consider a second failure mode which could cause 1 000 casualties, but the probability that this failure could occur is one in 100 000 in any one year. The risk associated with this failure is:
1 000 x (1 in 100 000) = 1 casualty per 100 years
In these examples, the risk is the same although the figures upon which it is based are quite different. Note also that these examples do not indicate whether the risk is tolerable; they merely quantify it. Whether or not the risk is tolerable is a separate and much more difficult question which is dealt with later.
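The arithmetic behind these two examples can be sketched in a few lines of Python. This is an illustration only: the function name `annual_risk` is an assumption introduced here, and risk is taken simply as the number of casualties multiplied by the probability of occurrence in any one year, as in the examples above.

```python
def annual_risk(casualties: float, annual_probability: float) -> float:
    """Expected casualties per year: what could happen x how likely it is."""
    return casualties * annual_probability


# First failure mode: ten casualties, 1 in 1 000 chance in any one year.
risk_a = annual_risk(10, 1 / 1_000)

# Second failure mode: 1 000 casualties, 1 in 100 000 chance in any one year.
risk_b = annual_risk(1_000, 1 / 100_000)

# Expressed the way the text does: average number of years per casualty.
years_per_casualty_a = 1 / risk_a
years_per_casualty_b = 1 / risk_b
```

Both failure modes come out at roughly one casualty per 100 years, which is why the text stresses that the calculation quantifies the risk but says nothing about whether it is tolerable.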
in more detail.
What could happen if the failure occurred?
Two issues need to be considered when considering what could happen if a failure were to occur. These are what actually happens and whether anyone is likely to be hurt or killed as a result.
What actually happens if any failure mode occurs should be recorded on the RCM Information Worksheet as its failure effects, as explained at length in Chapter 4. Part 5 of Chapter 4 also listed a number of typical effects which pose a threat to safety or the environment.
The fact that these effects could hurt or kill someone does not necessarily mean that they will do so every time they occur. Some may even occur quite often without doing so. However, the issue is not whether such consequences are inevitable, but whether they are possible.
For example, if the hook were to fail on a travelling crane used to carry steel coils, the falling load would only hurt or kill anyone who happened to be standing under or close to it at the time. If no-one was nearby, then no-one would get hurt. However, the possibility that someone could be hurt means that this failure mode should be treated as a safety hazard and analysed accordingly.
This example demonstrates the fact that the RCM process assesses safety consequences at the most conservative level. If it is reasonable to assume that any failure mode could affect safety or the environment, we assume that it can, in which case it must be subjected to further analysis. (We see later that the likelihood that someone will get hurt is taken into consideration when evaluating the tolerability of the risk.)
A more complex situation arises when dealing with safety hazards that are already covered by some form of built-in protection. We have seen that one of the main objectives of the RCM process is to establish the most effective way of managing each failure in the context of its consequences. This can only be done if these consequences are evaluated to begin with as if nothing was being done to manage the failure (in other words, to predict or prevent it or to mitigate its consequences).
Protective devices which are designed to deal with the failed or the failing state (alarms, shutdowns and relief systems) are nothing more than built-in failure management systems. As a result, to ensure that the rest of the analysis is carried out from an appropriate zero-base, the consequences of the failure of protected functions should ideally be assessed as if protective devices of this type are not present.
For example, a failure which could cause a fire is always regarded as a safety hazard, because the presence of a fire-extinguishing system does not necessarily guarantee that the fire will be controlled and extinguished.
it may be prudent to list a wildly unlikely but nonetheless dangerous failure mode in an FMEA, purely to place on record the fact that it was considered and then rejected. In these cases, a comment like 'This failure mode is considered too unlikely to justify further analysis' should be recorded in the failure effects column.)
This illustrates the relationship between the probability of being killed which any one person is prepared to tolerate and the extent to which that person believes he or she is in control. In more general terms, this might vary for an individual as shown in Figure 5.2.
Figure 5.2: Tolerability of fatal risk (vertical axis marked from 10^-4 to 10^-7; four categories of perceived control):
I believe I have complete control (driving my car or in my home workshop)
I believe I have some control and some choice about exposing myself (on the site where I work)
I believe I have no control, but I don't have to expose myself (in a passenger aircraft)
I have no control, and no choice about exposing myself and/or my family (off-site exposure to industrial accidents)
These figures do not necessarily reflect the views of the author - they illustrate what one individual might decide that he or she is prepared to tolerate. Note also that they are based on the perspective of one individual going about his or her daily business. This view then has to be translated into a degree of risk for the whole population (all the workers on a site, all the [...]). In other words, if I tolerate a probability of 1 in 100 000 of being killed at work in any one year and I have 1 000 co-workers who all share the same view, then we all tolerate that on average 1 person on our site will be killed every 100 years - and that person may be me, and it may [...].
Bear in mind that any quantification of risk in this fashion can only be approximate. In other words, if I say I tolerate a probability of 10^-5, it is never more than a ballpark figure. It indicates that I am prepared to tolerate a probability of being killed at work which is lower than that which I tolerate when I use the roads. [...]
The next step is to translate the probability which we are prepared to tolerate that any one of us will be killed at work into a tolerable probability for each event (such as a failure) which could kill someone.
For example, continuing the logic of the previous example, the probability that any one of my 1 000 co-workers will be killed in any one year is 1 in 100 (assuming that everyone on the site faces the same hazards). Furthermore, if the activities carried out on the site embody (say) 10 000 events which could kill someone, then the average probability that each event could kill one person must be reduced to 10^-6 in any one year. This means that the probability of an event which is likely to kill ten people must be reduced to 10^-7, while the probability of an event which has a 1 in 10 chance of killing one person must be reduced to 10^-5.
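The chain of reasoning in this example can be written out as a short Python sketch. The function and parameter names are assumptions made for illustration, and the calculation simply spreads the site-wide tolerance evenly over the events, as the example does.

```python
import math


def tolerable_event_probability(individual_tolerance: float,
                                people_on_site: int,
                                hazardous_events: int,
                                deaths_per_event: float = 1.0) -> float:
    """Tolerable annual probability of one event, given what each person tolerates.

    deaths_per_event is the expected number of people killed if the event
    occurs (10 for an event likely to kill ten people, 0.1 for an event
    with a 1 in 10 chance of killing one person).
    """
    # Probability that any one person on the site is killed in any one year.
    site_tolerance = individual_tolerance * people_on_site
    # Share that tolerance equally among all events which could kill someone.
    per_event = site_tolerance / hazardous_events
    # Scale by how many deaths the event is expected to cause.
    return per_event / deaths_per_event


one_person = tolerable_event_probability(1e-5, 1_000, 10_000)
ten_people = tolerable_event_probability(1e-5, 1_000, 10_000, deaths_per_event=10)
one_in_ten = tolerable_event_probability(1e-5, 1_000, 10_000, deaths_per_event=0.1)
```

With the figures from the text, the three results are approximately 10^-6, 10^-7 and 10^-5 respectively. As the text cautions, these are ballpark values, not precise limits.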
Although perception of the degree of control usually dominates decisions about the tolerability of risk, it is by no means the only issue. Other factors which help us decide what is tolerable include the following:
individual values: To explore this issue in any depth is well beyond the scope of this book. Suffice it to contrast the views on tolerable risk likely to be held by a mountaineer with those of someone who suffers from vertigo, or those of an underground miner with those of someone who suffers from claustrophobia.
industry values: while every industry nowadays recognises the need to operate as safely as possible, there is no escaping the fact that some are intrinsically more dangerous than others. Some even compensate for [...] of risk with [...]. The views of any individual who works in that industry boil down to his or her perception of whether the intrinsic risks are 'worth it', in other words, whether the benefit justifies the risk.
the effect on [...]: the safety of children - especially unborn children - has an especially powerful effect on people's views about what is tolerable. Adults frequently display a surprising and even [...] disregard for their own safety.
For example, the author worked with one group which had occasion to discuss the properties of a certain chemical. Words like 'toxic' and 'carcinogenic' were treated with indifference, even though most of the members of this group were the people most at risk. However, as soon as it emerged that the chemical was also mutagenic and teratogenic, and the meaning of these words was explained to the group, the chemical was suddenly viewed with much greater respect.
Safety and Proactive Maintenance
If a failure could
lates that we must
nrP '1PlnI
Figure 5.3: Identifying and developing a maintenance strategy for a failure which affects safety or the environment (decision diagram with Yes/No branches; see Parts 4 and 5 of this chapter)
increased
As we have seen, these consequences tend to be economic in nature, so they are usually evaluated in economic terms. However, in certain more extreme cases (such as losing a war), the 'cost' may have to be evaluated on a more qualitative basis.
Avoiding Operational Consequences
The overall economic effect of any failure mode which has operational consequences depends on two factors:
how much the failure costs each time it occurs, in terms of its effect on operational capability plus repair costs
how often it happens.
In the previous section of this chapter, we did not pay much attention to how often failures are likely to occur. (Failure rates have little bearing on safety-related failures, because the objective in these cases is to avoid any failures on which to base a rate.) However, if the failure consequences are economic, the total cost is affected by how often the consequences are likely to occur. In other words, to assess the economic impact of these failures, we need to assess how much they are likely to cost over a period of time.
Assume that it has been agreed that one failure mode which can affect this pump is 'bearing seizes due to normal wear and tear'. For the sake of simplicity, assume that the motor on this pump is equipped with an overload switch, but there is no trip alarm wired to the control room.
This failure mode and its effects might be described on an RCM Information Worksheet as shown in Figure 5.5 above.
Water is drawn out of the tank at a rate of 800 litres per minute, so the tank runs dry 2.5 hours after the low level alarm sounds. It takes 4 hours to replace the bearing, so the downstream process stops for 1.5 hours. So this failure costs:
1.5 x 5 000 = 7 500
in lost production every three years, plus the cost of replacing the bearing.
Assume that it is technically feasible to check the bearing for audible noise once a week (the basis upon which we make this kind of judgement is discussed in the next chapter). If the bearing is found to be noisy, the operational consequences of failure can be avoided by ensuring that the tank is full before starting work on the bearing. This provides five hours of storage so the bearing can now be replaced in four hours without interfering with the downstream process.
Assume also that the pump is located in an unmanned pumping station. It has been agreed that the check should be carried out by a maintenance craftsman, and that the total time needed to do each check is twenty minutes. Assume further that [...], in which case it costs 8 to do each check. If the MTBF of the bearing is 3 years, he will do about 150 checks per failure. In other words, the cost of the checks is:
150 x 8 = 1 200
every three years, plus the cost of replacing the bearing.
In this example, the scheduled task is clearly cost-effective relative to the cost of the operational consequences of the failure.
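The comparison made in this example can be laid out as a small Python sketch. The constant and function names are assumptions introduced for illustration; the figures are those quoted in the example, with no currency unit assumed. The cost of replacing the bearing is left out because it is incurred in both cases.

```python
# Figures quoted in the worked example (per failure cycle of about three years).
DOWNTIME_COST_PER_HOUR = 5_000   # cost of stopping the downstream process
STOPPAGE_HOURS = 1.5             # stoppage if the failure is not anticipated
CHECKS_PER_FAILURE = 150         # roughly weekly checks over a 3-year MTBF
COST_PER_CHECK = 8               # cost of one twenty-minute check


def cost_of_unanticipated_failure() -> float:
    """Lost production each time the bearing seizes without warning."""
    return STOPPAGE_HOURS * DOWNTIME_COST_PER_HOUR


def cost_of_scheduled_checks() -> float:
    """Cost of doing the noise check for one whole failure cycle."""
    return CHECKS_PER_FAILURE * COST_PER_CHECK


def task_is_worth_doing() -> bool:
    """The proactive task is worth doing only if it costs less than the failure."""
    return cost_of_scheduled_checks() < cost_of_unanticipated_failure()
```

Here the checks cost 1 200 per cycle against 7 500 of lost production, so the task passes the economic test. The same function would return False for a failure whose consequences cost less than the checks.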
This suggests that if a failure has operational consequences, the basis for deciding whether a proactive task is worth doing is economic, as follows:
Figure 5.6: Identifying and developing a maintenance strategy for a failure which has operational consequences
Note that this analysis is carried out for each individual failure mode, and not for the asset as a whole. This is because each task is appropriate only to the costs of the failure mode which it is meant to prevent. In each case, it is a go/no go decision.
When considering individual failure modes in this way, it is not always necessary to do a detailed cost-benefit study based on actual downtime costs and MTBFs as shown in the example on page 106. This is because the economic desirability of proactive tasks is often intuitively obvious when assessing failure modes with operational consequences.
However, whether or not the economic consequences are evaluated formally, this aspect of the RCM process must still be applied thoroughly. (In practice, this step is surprisingly often overlooked by people new to the process. Maintenance people in particular have a tendency to implement tasks on the basis of technical feasibility alone, which results in [...] maintenance programs.)
Finally, bear in mind that the operational consequences of any failure are heavily influenced by the context in which the asset is operating. This is yet another reason why care should be taken to ensure that the context is identical before applying a maintenance program designed for one asset to another. These issues were discussed in Part 3 of Chapter 2.
Figure 5.7: Pump with stand-by (offtake from tank: 800 litres/minute)
The duty pump is switched on by one float switch when the level in Tank Y drops to 120 000 litres, and switched off by another when the level reaches 240 000 litres. A third switch is located just below the low level switch of the duty pump, and this switch is designed both to sound an alarm in the control room if the water level reaches it, and to switch on the stand-by pump. If the tank runs dry, the downstream process has to be shut down. This also costs the organisation which uses the pump 5 000 per hour.
As before, assume that it has been agreed that one failure mode which can affect the duty pump is 'bearing seized', and that this seizure is caused by normal wear and tear. Assume that the motor on the duty pump is also equipped with an overload switch, but again there is no trip alarm wired to the control room. This failure mode and its effects might be described on an RCM Information Worksheet as shown in Figure 5.8:
In this example, the cost of doing the scheduled task is now much greater than the cost of not doing it. As a result, it is not worth doing the proactive task even though the pump is technically identical to the pump in the previous example.
This suggests that it is only worth trying to prevent a failure which has non-operational consequences if, over a period of time, the cost of the proactive task is less than the cost of correcting the failure. If it is not, then scheduled maintenance is not worth doing.
protected functions: it is only valid to say that a failure will have non-operational consequences because a stand-by or redundant component is available if it is reasonable to assume that the protective device will be functional when the failure occurs. This of course means that a suitable maintenance program must be applied to the protective device (the stand-by pump in the example given above). This issue is discussed at length in the next part of this chapter.
If the consequences of the multiple failure of a protected system are particularly serious, it may be worth trying to prevent the failure of the protected function as well as the protective device in order to reduce the probability of the multiple failure to a tolerable level. (As explained on page 97, if the multiple failure has safety consequences, it may be wise to assess consequences as if the protection was not present at all, and then to revalidate the protection as part of the task selection process.)
For example, Pump C in Figure 5.7 can be regarded as a protective device, because it 'protects' the pumping function if Pump B should fail. Pump B is of course the protected function.
The existence of such a system creates two sets of failure possibilities, depending on whether the protective device is fail-safe or not. We consider the implications of each set in the following paragraphs, starting with devices which are fail-safe.
Fail-safe protective devices
In this context, fail-safe means that the failure of the device on its own will become evident to the operating crew under normal circumstances.
Figure (timeline of a protected function and a fail-safe protective device):
1: Failure of "fail-safe" device is evident immediately
3: Protective device reinstated: situation back to normal
p ressurisation. Similarly, if
Figure 5.10 (timeline of a protected function and a protective device which is not fail-safe):
1: Failure of non-fail-safe device is not evident to operators
Protected function operating without protection because no-one knows that the protective device has failed
3: If protected function fails here, the result is a multiple failure
In the case of the relief valve, if the pressure in the vessel rises excessively while the valve is jammed, the vessel will probably explode (unless someone acts very quickly or unless there is other protection in the system). If Pump B fails while Pump C is in a failed state, the result will be total loss of pumping.
In the first of these examples, the consequences of the multiple failure are very serious indeed, so we would go to great lengths to preserve the integrity of the hidden function. In the second case, the consequences of the multiple failure are purely economic, and how much it costs would influence how hard we would try to prevent the hidden failure.
Further
could follow if
vibration switches: A vibration switch designed to shut down a fan may be configured in such a way that its failure is hidden. However, this only matters if the fan vibration rises above tolerable limits, [...] the fan itself [...] consequences.
ultimate level switches: Ultimate level switches are designed to activate an alarm or shut down equipment if a primary level switch fails to operate. In other words, if an ultimate low level switch fails, there are no consequences unless the primary switch also fails, in which case the vessel or tank would run dry.
Figure 5.11 (panels 5.11a, 5.11b and 5.11c): protected function and protective device, showing the average unavailability of the protective device and the probability of a failure
note
page 96
pumps in [...] may be such that the users tolerate a probability of multiple failure of less than 1 in 1 000 in any one year (10^-3). Assume that it has also been estimated that if the duty pump is properly maintained, the mean time between unanticipated failures of the duty pump can be increased to ten years, which corresponds to a probability of failure in any one year of one in ten, or 10^-1. So to reduce the probability of the multiple failure to less than 10^-3, the unavailability of the stand-by pump must not be allowed to exceed 10^-2, or 1%. In other words, it must be maintained in such a way that its availability exceeds 99%. This is illustrated in Figure 5.12 below.
[Figure 5.12: Protected function: probability of failure in any one year reduced to 1 in 10. Protective device: unavailability 1%, availability 99%.]
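The arithmetic in the pump example can be sketched in a few lines of Python. This is only an illustration. It assumes, as the example appears to, that the probability of the multiple failure is the product of the probability that the protected function fails and the unavailability of the protective device, and that the two failures are independent. The function and variable names are hypothetical.

```python
def max_unavailability(tolerable_multiple_failure_prob: float,
                       protected_failure_prob: float) -> float:
    """Largest unavailability the protective device may have.

    Assumes: P(multiple failure) = P(protected function fails)
                                   * unavailability of protective device.
    """
    if not 0 < protected_failure_prob <= 1:
        raise ValueError("protected_failure_prob must be in (0, 1]")
    if not 0 <= tolerable_multiple_failure_prob <= 1:
        raise ValueError("tolerable_multiple_failure_prob must be in [0, 1]")
    return tolerable_multiple_failure_prob / protected_failure_prob


# Figures taken from the pump example in the text.
tolerable = 1 / 1000            # tolerable probability of multiple failure per year
duty_pump_failure = 1 / 10      # duty pump: mean time between failures of ten years

limit = max_unavailability(tolerable, duty_pump_failure)
required_availability = 1 - limit

print(f"Maximum unavailability: {limit:.1%}")
print(f"Required availability:  {required_availability:.1%}")
```

With the figures from the example, the limit works out at about 1% unavailability, in line with the 99% availability quoted in the text.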
[Table: columns "Failure of Protected Function", "Failed State of Protective Device", "Multiple Failure" and "Tolerable Rate of Multiple Failure". Surviving entries include "Motor burns out: 500 to rewind", "Boiler over[...]", "Relief valves [...] shut", and rates of 1 in 50, 1 in 1 000 and 1 in 10 000 000. The row structure is not recoverable.]
[Figure: scale from 1 through 10^-1, 10^-2 and 10^-3 to 10^-4, plotted against cost categories "Trivial (no cost)", "up to 100", "1 000", "10 000" and "and over".]
un i versal
to tolerate
[...] 5.2 and 5.14 [...] that it [...] be possible to produce a schedule of
risk which combines [...] risks and economic risks in one continuum. How
this [...] be done is discussed in Appendix 3.
In some cases, it may [...] sometimes impossible [...] quantitative analysis
of the probability of multiple failure in the manner described above. In such
cases, it may be enough to make a judgment about the required availability of
the protective device based on a qualitative assessment of the reliability of
the protected function and the [...] consequences of the multiple failure.
This approach is discussed further in [...] 8. However, if the multiple
failure is particularly [...] analysis should be [...]
The [...] consider in more detail how it is [...] to influence:
- the rate at which protected functions fail
- the availability of protective devices.
[...] the protected function by:
- [...] some sort of [...] maintenance
- [...] the way in which the protected function is operated
- [...] the [...] of the protected function.
increase the availability of the protective device by:
- [...] some sort of [...] maintenance
- [...] periodically if the protective device has failed
- [...] the protective device.
Prevent [...] protected function
We have seen that the probability of a multiple failure is partly based on
the rate of failure of the protected function. This could almost [...] be
reduced [...] or operation of the protected [...], by changing its [...] or
even [...] a last [...]
[...] if the failures of a protected function can be anticipated [...], the
mean time between failures of this function would be increased. This in turn
would reduce the probability of the multiple failure.
For [...] one way to prevent the simultaneous failure of [...] to try to
prevent unanticipated failures of [...] B. By [...] these failures, the mean
time between failures of [...] would be [...], so the probability of the
[...] failure would be [...] shown in [...] 5.12.
Prevent the [...]
In order to [...] must try to ensure that the hidden function is not in a
failed state if and when the protected function fails. If a [...] task could
be found which was [...] availability of the [...], then a [...] is
theoretically almost impossible.
For [...] if a proactive task could be found which could [...] 100%
availability of Pump C while it is [...], then we can be sure that C would
always take over if B failed. [...] consequences of the [...] of protection,
as discussed [...]
In [...] it is most [...] that any [...] function, hidden or otherwise, to
achieve an [...]. What it must do, however, is deliver the [...] needed to
reduce the probability of the multiple failure to a tolerable level.
For [...] assume that a [...] task is found which [...] achieve an [...] of
the [...]. If the mean time between unanticipated [...] Pump B is 10 years,
then the [...] 1 000) in any one year, as discussed earlier,
Detect the [...]
If [...] to find a suitable way [...] a hidden failure, it is still possible
to reduce the risk of the [...] failure by checking the hidden function [...]
to find out if it is still [...]. If this check is carried out at suitable
intervals and if [...] it is found to be [...]
[Figure 5.15: Identifying and developing a maintenance strategy for a hidden failure. No: "Proactive maintenance is worth doing if it secures the availability needed to reduce the probability of a multiple failure to a tolerable level". Yes: "The failure is evident. See Parts 3 to 5 of this chapter".]
For example, consider a motor vehicle which suffers from a blocked fuel line.
The [...] (in other words, the average "operator") would not be able to
diagnose this failure mode without expert assistance, so there might be a
temptation to call this a hidden failure. However, the loss of the function
caused by this failure mode is evident, because the car stops working.
The question [...]
There is often a temptation to describe a failure as 'hidden' if a
considerable period of time elapses between the moment the failure occurs and
the moment it is discovered. In fact, this is not the case. If the loss of
function eventually becomes apparent to the [...], and it does so as a direct
and inevitable result of this failure on its own, then the failure is treated
as evident, no matter how much time [...] between the failure in question and
its [...]
This example demonstrates that time is not an issue when [...] hidden
failures. We are simply [...] whether anyone will [...] become aware of the
fact that the failure has occurred on its own, and not if they will be aware
when it occurs.
Primary and secondary functions
Thus far we have focused on the primary function [...], which is to be
capable of fulfilling the function they are designed to fulfil when called
upon to do so. As we have seen, this is usually after the protected function
has failed. However, an important [...] function of many of these devices is
that they should not work when nothing is wrong.
The failure of the first function is hidden, but the failure of the second is
evident because if it occurs, the switch transmits a [...] shutdown signal
and the machine stops. If this is likely to occur in [...], it should be
listed as a failure mode of the function which is interrupted ([...] the
primary function of the machine). As a result, there is usually no need to
list the implied second function separately, but the failure mode would be
listed under the relevant function if it is reasonably [...] to occur.
The operating crew
When [...] whether a failure is evident, the term [...] crew refers to anyone
who has occasion to observe the equipment or what it is doing at any time in
the course of their normal daily [...], and who can be relied upon to report
that it has failed.
Failures can be observed by people with many different points of view.
They include operators, drivers, quality inspectors, [...]sors, and even the
tenants of buildings. [...] whether any of these people can be relied upon to
detect and report a failure [...] on four critical elements:
the observer must be in a position either to detect the failure mode itself
or to detect the loss of function caused by the failure mode. This may
be a physical location or access to [...] information (including
management information) which will draw attention to the fact that
something is wrong.
For
We shall see later that failure-finding tasks are covered by the RCM
task selection process, so once [...] it should be assumed at this [...]
that this task is not being done (even though the task is [...] in the
normal duties of the [...]). This is because the RCM process might reveal a
more effective [...], or the need to do the same task at a [...] or lower
[...]
[...] from the question of maintenance [...] there is often considerable
doubt about what the 'normal' duties of the [...] crew [...] are. This occurs
most often where standard operating procedures are either poorly documented
or do not exist. In these cases, the RCM review process does much to help
clarify what these duties should be, and can do much to [...] lay the
foundations of a full set of operating procedures. This [...] to [...]
plants.)
'Fail-safe' devices
It often [...] that a protective circuit is said to [...] fail-safe when it
[...] not. This usually occurs when part of a circuit is considered instead
of the circuit as a whole.
An example [...] provided by a pressure switch, this time attached to a
[...]static bearing. The switch was meant to shut down the machine if the oil
pressure in the bearing fell below a certain level. It [...] during [...]
that if the electrical signal from the switch to the control panel was
interrupted, the machine would shut down, so the failure of the switch was
[...] to be evident.
However, further discussion revealed that a [...] could deteriorate with age,
so the switch could become [...] sensing changes in the pressure. This
failure was hidden, and the maintenance program for the switch was developed
accordingly.
5.7 Conclusion
This chapter has demonstrated how the RCM process [...] comprehensive [...]
framework for [...] failures. As summarised in [...]
6 Proactive Maintenance 1:
Preventive Tasks
6.1 Technical Feasibility and Proactive Tasks
As mentioned in Chapter 1, the actions which can be taken to deal with
failures can be divided into the following two categories:
proactive tasks: these are tasks undertaken before a failure occurs, in
order to prevent the item from getting into a failed state. They embrace
what is traditionally known as 'predictive' and 'preventive' maintenance,
although RCM uses the terms scheduled restoration, scheduled discard
and on-condition maintenance
default actions: these deal with the failed state, and are chosen when it
is not possible to identify an effective proactive task. Default actions
include failure-finding, redesign and run-to-failure.
These two [...] correspond to the sixth and seventh of the seven
questions which make up the basic RCM decision process, as follows:
what can be done to predict or prevent each failure?
what if a suitable predictive or preventive task cannot be found?
Chapters 6 and 7 focus on the sixth question. This deals with the criteria
used to decide whether proactive tasks are technically feasible. They also
look in more detail at how we decide whether [...] of tasks
are worth doing. (Chapters 8 and 9 review default actions.)
When we ask whether a proactive task is technically feasible, we are
simply asking whether it is possible for the task to prevent or anticipate
the failure in question. This has nothing to do with economics - economics
are part of the consequence evaluation process which has [...] considered at
length. Instead, technical feasibility [...] on the technical
characteristics of the failure mode and of the task itself.
Two issues dominate proactive task selection from the technical viewpoint.
These are:
the relationship between the age of the item under consideration and
how likely it is to fail
what happens once a failure has started to occur.
The rest of this chapter considers tasks which could apply when there is
a relationship between age (or exposure to stress) and failure. Chapter 7
considers the more difficult cases where there is no such relationship.
[Figure 6.2: Absolute predictability]
[Figure 6.3: A realistic view of age-related failures (horizontal axis: Age, x 10 000)]
Part B is exposed to a generally higher level of stress throughout its life than part
A, so it deteriorates more quickly. Deterioration also accelerates in response to
the two stress peaks at 8 000 km and 30 000 km. On the other hand, for some
reason part A seems to deteriorate at a steady pace despite two stress peaks at
23 000 km and 37 000 km. So one component fails at 53 000 km and the other
at 80 000 km.
This [...] shows that the failure age of identical parts [...] under
apparently identical conditions varies widely. [...] although some parts
last much longer than others, the failures of a [...] number of parts which
deteriorate in this fashion would tend to congregate around some average
life, as shown in Figure 6.4.
[Figure 6.4: Frequency of failure and "average life"]
So even when resistance to failure does decline with age, the point at which
failure occurs is often much less predictable than common sense suggests.
Chapter 12 explores the quantitative implications of this situation in more
[...]. It also explains that the failure frequency curve shown in Figure 6.4
can be drawn as a conditional probability of failure curve, as shown in
Figure 6.5 below. (The term useful life defines the age at which there is
a rapid increase in the conditional probability of failure. It is used to
distinguish this [...] from the average life shown in [...].)
[Figure 6.5: Conditional probability of failure and "useful life" (horizontal axis: Age, x 10 000)]
[Figure: The effect of [...] failures (horizontal axis: Age, x 10 000)]
[Figure 6.7: Failures which are age-related]
Pattern A B. it
The scheduled res
In other words:
[...] mentioned earlier, a reduction in the number of failures is not
sufficient if the failure has [...] or environmental consequences, because
we want to eliminate these failures altogether.
[Figure 6.8 (horizontal axis: months)]
In this example, the useful life is 12 months, while the average life is 18
months. In a period of 3 years, the failure occurs twice if no [...]
maintenance is done, while the preventive task would be done three times. In
other words, the preventive task has to be done 50% more often than the
corrective task which would have to be performed if the failure was allowed
to occur on its own.
If each failure costs 2 000 in lost production and repair, failures would
cost 4 000 over a three year period. If each scheduled restoration task costs
(say) 1 100, these tasks would cost 3 300 over the same period. So in this
case, the task is cost-effective.
On the other hand, if the average life was 24 months and all other figures
remained the same, failures would only occur 1.5 times every three years and
would cost 3 000 over this period. The scheduled restoration task still costs
3 300 over the same period, so it would not be cost-effective.
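The comparison in this example can be reproduced with a short Python sketch. It is an illustration only, under simplifying assumptions that are not spelled out in the text: failures recur on average once per average life if nothing is done, the scheduled restoration task is performed once per useful life, and the task prevents every failure. All function and parameter names are hypothetical.

```python
def costs_over_period(period_months: float,
                      average_life_months: float,
                      useful_life_months: float,
                      cost_per_failure: float,
                      cost_per_task: float) -> tuple[float, float]:
    """Return (cost of running to failure, cost of scheduled restoration).

    Assumes one failure per average life if no task is done, and one
    scheduled task per useful life if the task is adopted.
    """
    expected_failures = period_months / average_life_months
    tasks_done = period_months / useful_life_months
    return expected_failures * cost_per_failure, tasks_done * cost_per_task


def is_cost_effective(failure_cost: float, task_cost: float) -> bool:
    """The task is worth doing only if it costs less than the failures it avoids."""
    return task_cost < failure_cost


# First case from the text: average life 18 months, useful life 12 months.
case_1 = costs_over_period(36, 18, 12, cost_per_failure=2000, cost_per_task=1100)
# Second case: average life 24 months, all other figures unchanged.
case_2 = costs_over_period(36, 24, 12, cost_per_failure=2000, cost_per_task=1100)

for label, (failure_cost, task_cost) in (("18 months", case_1), ("24 months", case_2)):
    verdict = "cost-effective" if is_cost_effective(failure_cost, task_cost) else "not cost-effective"
    print(f"Average life {label}: failures {failure_cost:.0f}, tasks {task_cost:.0f} -> {verdict}")
```

The sketch only compares direct costs over a fixed period. It ignores any failures that still occur between tasks.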
Safe-life limits
Safe-life limits [...] apply to failures which have [...] or environmental
consequences, so the associated tasks must prevent all failures. In other
words, no failures should occur before this limit is reached. This means
that safe-life limits cannot apply to items which conform to pattern A,
because infant mortality means that some items must fail prematurely. In
fact, [...] cannot apply to any failure mode where the probability of
failure is more than zero when the item enters service.
In practice, safe-life limits can only apply to failure modes which occur
in such a [...] that no failures can be expected to occur before the
wear-out zone is reached.
Ideally, safe-life limits should be determined before the item is put into
service. It should be tested in a simulated operating environment to
determine what life is actually achieved, and a conservative fraction of this
life used as the safe-life limit. This is illustrated in Figure 6.9.
[Figure 6.9: Safe-life limits (horizontal axis: Age, with the safe-life limit marked)]
For new assets, this means that a failure mode which has [...] economic
consequences should also be [...] into an age-exploration program to find out
if a life limit [...]. [...] as with scheduled restoration, there is seldom
[...] of task in an initial scheduled maintenance program.
The Technical Feasibility of Scheduled Discard Tasks
The above comments indicate that scheduled discard tasks [...] feasible under
the [...] circumstances:
Scheduled discard tasks are technically feasible if:
[...] most of the items survive to that age (all of the items if the [...]
There is no need to ask if the task will restore the original condition
because the item is replaced with a new one.
In all four of these examples, when the items enter service it is not possible
to predict when the failures will occur. For this reason, such failures are
described as 'random'.
[...] anti-icing equip[...] gear, moveable high-lift devices, pressure and
temperature control systems [...] the cabin, extensive navigation and
communications equipment, complex instrumentation and complex ancillary
support systems.
[Figure 6.14: Failures which [...] not age related]
[...] age limits do little or [...] scheduled overhauls can actually increase
overall failure rates by introducing infant mortality into otherwise stable
[...]. This is borne out by the high and [...] number of [...] accidents
around the world which have occurred either while maintenance is under way or
[...] after a maintenance intervention. It is also borne out by the [...]
machine operator who says that "every time maintenance works on it over [...]
it takes us until Wednesday to [...] the [...]".)
[Figure 7.1: The P-F curve, marking the point where it has failed (functional failure)]
to detect whether
Predictive Tasks
The P-F interval tells us how often on-condition tasks must be done. If we
want to detect the potential failure before it becomes a functional failure,
the interval between checks must be less than the P-F interval.
In [...] it is [...] sufficient to select a task [...] equal to half the P-F
interval. This ensures that the inspection will detect the potential failure
before the functional failure occurs, while (in most cases) providing a
reasonable amount of time to do something about it. This leads to the [...]
of the nett P-F interval.
The Nett P-F Interval
The nett P-F interval is the minimum interval [...] between the [...] of a
potential failure and the occurrence of the functional failure. This is
illustrated in Figures 7.3 and [...], which both show a failure with a P-F
interval of nine months.
[Figure 7.3: P-F interval: 9 months; inspection interval: 1 month; nett P-F interval: 8 months]
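The figures quoted for Figure 7.3 suggest a simple relationship, sketched below in Python. This is a hedged illustration rather than the book's own formula: it assumes the nett P-F interval is the P-F interval less the inspection interval (which matches 9 months less 1 month giving 8 months), and it uses the "half the P-F interval" guideline mentioned above as a default. The function names are hypothetical.

```python
def nett_pf_interval(pf_interval: float, inspection_interval: float) -> float:
    """Minimum warning time left after a potential failure is detected.

    Assumes the worst case: the potential failure becomes detectable just
    after an inspection, so it is only found one full inspection interval
    later.
    """
    if inspection_interval <= 0:
        raise ValueError("inspection_interval must be positive")
    if inspection_interval >= pf_interval:
        raise ValueError("inspection interval must be shorter than the P-F interval")
    return pf_interval - inspection_interval


def default_task_interval(pf_interval: float) -> float:
    """Guideline from the text: a task interval of half the P-F interval."""
    return pf_interval / 2


# Figures from Figure 7.3: P-F interval 9 months, monthly inspections.
monthly_nett = nett_pf_interval(9, 1)
# The same failure inspected at half the P-F interval.
half_interval = default_task_interval(9)
half_nett = nett_pf_interval(9, half_interval)

print(f"Monthly inspections: nett P-F interval = {monthly_nett} months")
print(f"Inspections every {half_interval} months: nett P-F interval = {half_nett} months")
```

Shorter inspection intervals buy a longer nett P-F interval, at the cost of doing the task more often.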
The nett P-F interval governs the amount of time available to take whatever
action is needed to reduce or eliminate the consequences of the failure.
Depending on the operating context of the asset, warning of incipient
failure enables the users of an asset to reduce or avoid consequences in
a number of ways, as follows:
downtime: corrective action can be planned at a time which does not
disrupt operations. The opportunity to plan the corrective action properly
also means that it is likely to be done more quickly.
repair costs: users may be able to take action to eliminate the [...]
damage which would be caused by unanticipated failures. This would
reduce the downtime and the repair costs associated with the failure.
For instance, a timely warning might enable users to switch a machine off
before (say) a collapsing bearing allows a rotor to touch a stator.
For [...] failures which affect the balance of large fans cause serious
problems very [...], so on-line vibration sensors are used to shut the fans
down when such failures occur. In this case, the P-F interval is very short,
so monitoring is continuous. Note also that once again, the monitoring device
is being used to avoid the consequences of the failure.
[Figure 7.5: Inconsistent P-F intervals (horizontal axis: Time)]
Clearly, in these cases a task interval [...] be selected which is
substantially less than the shortest of the likely P-F intervals. In this
way, we can always be reasonably certain [...] the potential failure before
it becomes a functional failure. If the nett P-F interval associated with
this minimum interval is [...] for suitable action to be taken to deal with
the consequences of the [...], then the on-condition task [...] technically
feasible.
On the other [...] of them can be - then [...]val, and the task in question
should [...] other way of dealing with the failure.
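The selection rule for inconsistent P-F intervals can be sketched as follows. This is a minimal Python illustration under stated assumptions: the nett P-F interval is taken as the shortest likely P-F interval less the task interval, and the numbers in the usage lines are invented for the example. None of the names come from the text.

```python
def on_condition_task_feasible(likely_pf_intervals: list[float],
                               task_interval: float,
                               time_needed_to_act: float) -> bool:
    """Check an on-condition task against the shortest likely P-F interval.

    Returns False if the task interval is not shorter than the shortest
    likely P-F interval, or if the resulting nett P-F interval leaves too
    little time to deal with the consequences of the failure.
    """
    if not likely_pf_intervals:
        raise ValueError("at least one P-F interval estimate is needed")
    shortest = min(likely_pf_intervals)
    if task_interval >= shortest:
        return False
    nett = shortest - task_interval
    return nett >= time_needed_to_act


# Hypothetical figures, in weeks: P-F intervals observed between 6 and 12 weeks.
observed = [6, 9, 12]
enough_warning = on_condition_task_feasible(observed, task_interval=3, time_needed_to_act=2)
too_little_warning = on_condition_task_feasible(observed, task_interval=3, time_needed_to_act=4)
interval_too_long = on_condition_task_feasible(observed, task_interval=6, time_needed_to_act=1)
print(enough_warning, too_little_warning, interval_too_long)
```

When the check fails, the text's conclusion applies: the task is dropped and the failure is dealt with in some other way.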
Condition monitoring
The most sensitive on-condition maintenance [...] involve the use of some
[...] of equipment to detect potential failures. In other [...] equipment is
used to monitor the condition of other equipment. These techniques are known
as condition monitoring to distinguish them from other [...] of on-condition
maintenance.
Condition monitoring embraces several hundred different techniques,
so a detailed study of the subject is well beyond the scope of this [...].
However, Appendix 4 provides a brief summary of about 100 of the better
known [...]. All of these techniques are [...] to detect failure [...]
particle effects
chemical effects
physical effects
[...] effects
electrical effects.
These [...] can be seen as [...] sensitive versions of the human senses. Many
of them are now very sensitive indeed, and a few give several months (if not
several [...]) [...] of failure. However, a major limitation of [...] every
condition monitoring device is that it monitors only one condition. For [...]
a vibration analyser only monitors vibration and cannot detect chemicals or
[...]. So [...] sensitivity is bought at the price of the versatility
inherent in the human senses.
The P-F intervals associated with different monitoring techniques vary
from a few minutes to several months. Different [...] also pinpoint
failures with [...]. Both of these factors must be considered when assessing
the feasibility of any technique.
[...] condition monitoring [...] can be spectacularly ef[...], but when [...]
are inappropriate they can [...] and sometimes bitterly disappointing waste
of time. As a [...] the criteria [...] whether on-condition tasks are
technically feasible and worth doing should be [...] rigorously to condition
[...]
In zone 2 in [...] 7.6 the process is out of control but still within
specification. (Oakland 1991 describes how [...] shifts of this [...] a cusum
chart.) [...] identifiable condition which indicates that a functional
failure is about to occur. In other words, it is a potential failure.
[...] the situation the process [...] as shown in zone 3 in [...]
This example describes only one of many ways [...] which SPC can be
used to measure and manage process [...]. A full [...] of all [...] is well
beyond the scope of this book. [...] the [...] point to note at this [...]
is that if deviations on charts like these can [...] related directly to
specific failure [...], then the charts are sources of on-condition data
which can make a valuable contribution to overall proactive maintenance
efforts.
In control and in specification = OK
Out-of-control and in specification = potential failure
Out-of-control and out-of-spec = functional failure
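The three labels above amount to a small decision rule, which can be written out as a Python sketch. It is illustrative only. The labels do not cover a process that is in control but out of specification, so treating any out-of-specification result as a functional failure is an assumption made here, and the names `ProcessState` and `classify` are hypothetical.

```python
from enum import Enum


class ProcessState(Enum):
    OK = "OK"
    POTENTIAL_FAILURE = "potential failure"
    FUNCTIONAL_FAILURE = "functional failure"


def classify(in_control: bool, in_specification: bool) -> ProcessState:
    """Map an SPC reading onto the three states named in the figure labels."""
    if not in_specification:
        # Assumption: out of specification is a functional failure,
        # whether or not the process is statistically in control.
        return ProcessState.FUNCTIONAL_FAILURE
    if not in_control:
        # Out of control but still within specification: a warning
        # that a functional failure is on the way.
        return ProcessState.POTENTIAL_FAILURE
    return ProcessState.OK


for in_control, in_spec in ((True, True), (False, True), (False, False)):
    print(in_control, in_spec, "->", classify(in_control, in_spec).value)
```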
[Figure 7.7: [...] for on-condition maintenance. Gauge zones: Normal, Potential failure, Functional failure]
The process of [...] marked up ([...] coloured) as shown in [...] or anyone
else needs to do is look at the gauge and [...] if the pointer is in the
[...] failure zone, or [...] more drastic action if it is in the functional
failure (red?) zone. However, the gauge must still be monitored at intervals
which are less than the P-F interval.
[...] reasons, this [...] which are [...] state. Also take care to ensure
that [...] marked up in this way are not taken off and remounted in the wrong
place.)
[...] a human is able [...] of the potential failure and hence about the most
appropriate action to be taken, whereas a condition monitoring device can
only take [...] and send a signal.
Selecting the Right Category
Many failure modes are preceded by more than one - often several - different
potential [...], so more than one category of on-condition task [...]. Each
of these will have a different P-F interval, and each will [...] different
[...] and levels of skill.
[...] a ball [...]
[Figure 7.8: Different potential failures which can [...] one failure mode]
However, if the same pipeline was carrying a toxic substance like cyanide, any
leak at all would be [...] as functional failure. In this case it is not
feasible [...] for leaks, so some other method would need to be found [...]
This would almost certainly entail some sort of modification.
how
This
age
The P-F interval and [...]
When [...] these principles for the first time, people often have difficulty
in [...] between the 'life' of a component and the P-F interval. This leads
them to base on-condition task frequencies on the real [...] 'life' of the
item. If it exists at [...], this life is usually many [...] than the P-F
interval, so the task achieves little or nothing.
[...] we measure the life of a component forwards from the moment
it enters service. The P-F interval is measured back from the functional
[...], so the two [...] are often completely unrelated. The distinction is
important because failures which are not related to age (in other [...]) are
as likely to [...] as those which are not.
[Figure: Failures [...] a random [...]; horizontal axis: Age (years); inspections done at 2 monthly intervals; potential [...] detected at [...] 2 months [...] functional [...]]
[...] this does not mean that on-condition tasks apply only to items
which fail on a random basis. They can also be applied to items which
[...], as discussed later in this chapter.
[...] P-F interval is [...]
[Figure 7.10: How a rolling element bearing fails due to 'normal wear and tear' (surviving label: "Cracks migrate to the surface of the outer race")]
For example, the symptoms described in Figure 7.10 are based on failure due
to normal wear and tear. Very similar symptoms would be exhibited in the final
stages of the failure of a [...] where the failure process has been initiated
by dirt, lack of lubrication or brinelling.
In practice, the precise root cause of many failures can only be identified
using sophisticated instruments. For [...] it might be possible to determine
the root cause of the failure of a [...] by [...] a ferrograph to separate
particles from the lubricating oil and [...] the particles under an electron
microscope.
However, if two different failures have the same [...] and if the
P-F interval is broadly similar for each set of symptoms - as it probably
would be in the case of the bearing examples - the distinction between
root causes is irrelevant from the failure detection viewpoint. (The
distinction does of course become relevant if we are [...] to eliminate
the root cause of the failure.)
[...] failure only becomes detectable when the fatigue cracks [...] to the
surface and the surface starts [...] up. The point at which this happens in
the life of any one bearing depends on the [...] of rotation of the bearing,
the magnitude of the load, the extent to which the outer race itself rotates,
whether the [...] surface is [...] prior to or during installation, how hot
the bearing [...] in [...], the [...] of the shaft relative to the housing,
the materials used to manufacture the bearing, how well it was made, etc.
Effectively this combination of variables makes it impossible to predict how
many operating [...] must [...] before the cracks reach the surface, and
hence when the [...] will start exhibiting the symptoms mentioned in [...]
7.10. (For those interested in pursuing this subject [...], chaos theory - in
particular the 'butterfly effect' - shows how tiny differences between the
initial conditions which apply to any dynamic system lead to dramatic
differences after the passage of time. This may explain why minute variations
between the initial conditions of two rolling element [...] can lead to [...]
differences between the ages at which they fail. See [...]
Deterioration accelerates in the final [...] of most failures. [...]
deterioration is likely to accelerate when bolts start to [...], when filter
elements get blinded, when V-belts slacken and start slipping, when
electrical contactors overheat, when seals start to fail, when rotors become
unbalanced and so on. But it does not accelerate in every case.
[Figure 7.11: Cross section of the tyre tread; P-F interval at least 5 000 km; horizontal axis: Operating Age (x 1 000 km)]
Truck
Left front
Right front tyre
Left rear tyre
Right rear tyre
1 40 000
47 500
22 000
1 2 500
38 000
1 42 500
50 000
24 500
2
40 500
1 45 000
500
27 000
4 500
43 000
1 47 500
1 000*
500
7 000
45 500
rt:ar:,1,;:irt::)r;l at
of UF tyre dropped below 3 mm and tyre 'v,_,,,._...,,.,,
rarHQf'Drl with new spare
.,
However, if the process of deterioration [...] linear and the task itself is
very expensive, then it might be worth ensuring that we [...] potential
failures when it is [...] necessary.
For instance, if an on-condition task entails shutting down and [...] turbine
to check the turbine discs for cracks, and we are certain that deterioration
[...] detectable after the turbine has been in service [...] of time (in
other words, the [...]), then we should [...] the turbine out of service to
check for the cracks after it [...] which there is a reasonable likelihood
that detectable cracks will start to emerge. Thereafter, the frequency of
checking is based on the rate at which a detectable crack is likely to
deteriorate into a failure.
For the record, the age at which cracks are likely to start becoming detectable
is known as the crack initiation life, whereas the time (or number of stress
cycles) which elapses from the moment a crack becomes detectable until it
grows so large that the item fails is known as the crack propagation life.
In cases like these, the cost of doing the task would be much greater than
the cost of the associated planning systems, so it is worth ensuring that we
only start doing the tasks when it is really necessary. However, if it is felt
that this [...] worthwhile, bear in mind that the planning process has to
employ two completely different time-frames, as follows:
[...] used to decide when we should start doing the on-condition tasks. This
is the operating age at which potential failures are likely to start becoming
detectable.
the second time-frame governs how [...] we should do the tasks after [...]
reached. This time-frame is of course the P-F interval.
For example, it [...] be felt that the turbine disc is unlikely to develop any
detectable cracks until it has been in service for at least 50 000 hours, but
that it takes a minimum of ten thousand hours for a detectable crack to
deteriorate into disc failure. [...] don't need to start checking for cracks
until the item has been in service for 50 000 hours, but thereafter it must be
checked at intervals of less than ten thousand hours.
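The two time-frames in the turbine disc example can be turned into a simple inspection schedule, sketched below in Python. It is a hedged illustration: the text only requires intervals of less than ten thousand hours, so the 5 000 hour interval and the 70 000 hour planning horizon are assumptions chosen for the example, and the function name is hypothetical.

```python
def inspection_ages(start_age: int,
                    propagation_life: int,
                    check_interval: int,
                    horizon: int) -> list[int]:
    """Operating ages at which the on-condition task falls due.

    start_age        : age at which cracks are likely to become detectable
                       (the first time-frame)
    propagation_life : minimum time from detectable crack to failure,
                       in effect the P-F interval (the second time-frame)
    check_interval   : chosen task interval, which must be shorter than
                       the propagation life
    horizon          : last operating age to plan for
    """
    if check_interval <= 0:
        raise ValueError("check_interval must be positive")
    if check_interval >= propagation_life:
        raise ValueError("check interval must be less than the propagation life")
    ages = []
    age = start_age
    while age <= horizon:
        ages.append(age)
        age += check_interval
    return ages


# Figures from the text: first check at 50 000 hours, P-F interval 10 000 hours.
# Assumed for illustration: checks every 5 000 hours, planned out to 70 000 hours.
schedule = inspection_ages(start_age=50_000, propagation_life=10_000,
                           check_interval=5_000, horizon=70_000)
print(schedule)
```

No checks are scheduled before the start age, which is where the saving on an expensive task comes from.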
Planning with this [...] of sophistication [...] a very detailed
understanding of the failure mode under consideration, together with highly
sophisticated planning systems. In practice, few failure modes are this well
understood. When they are, even fewer organisations possess planning
[...] which can switch from one time-frame to another as described
above, so this issue needs to be approached with care.
In closing this discussion, it must be stressed that all the curves - P-F
and age-related - which have been drawn in this part of this chapter have
been drawn for [...] mode at a time.
For instance, in the [...] concerning tyres, the failure process was 'normal'
wear. Different failure [...] (such as flat spots worn on the tyres due to
emergency braking or [...] to the carcass caused by hitting kerbs) would lead
to different conclusions because both the technical characteristics and the
consequences of these failure modes are different.
This approach is of course potentially very dangerous, because there
is also no [...] that the initial [...] interval, no matter how short,
will be shorter than the P-F interval to begin with (unless serious
consideration is [...] to the failure process itself).
Arbitrary intervals
The difficulties associated with the two approaches described above lead
[...] quite seriously that some arbitrary 'reason[...] short' interval should
be selected for all on-condition tasks. This arbitrary approach is the least
satisfactory (and the most dangerous) way to [...] task [...] because there
is [...] no guarantee that the [...] interval will be shorter than the P-F
interval. On the other hand, the true P-F interval may be much longer than
the arbitrary [...], in which case the task ends up being done much more
often than necessary.
For instance, if a daily task [...] only needs to be done once a month, that
task is [...] thirty times as much as it should.
Research
The best way to establish a [...] P-F interval is to simulate the failure
in such a way that there are no serious consequences when it eventually
does occur. [...] this is done when aircraft components are tested to failure
on the [...] rather than in the air. This not only provides data about the
life of the [...], as discussed in Chapter 6, but it also enables the
observers to study at leisure how failures develop and how quickly this
[...]. However, laboratory [...] is expensive and it takes time to [...] even
when it is accelerated. So it is only worth doing in cases where a [...]
number of components are at risk - such as an aircraft fleet - and the
failures have very serious consequences.
A rational approach
The above paragraphs indicate that in most cases, it is either impossible, impractical or too expensive to try to determine P-F intervals on an empirical basis. On the other hand, it is even more unwise simply to take a shot in the dark. Despite these problems, P-F intervals can still be estimated with surprising accuracy on the basis of judgement and experience.
The first trick is to ask the right question. It is essential that anyone who is trying to determine a P-F interval understands that we are
how
In other words, we are
how much time (or
If everyone in the group achieves consensus, then the P-F interval has been established and the group can go on to consider other task selection criteria such as the consistency of the P-F interval and whether the nett interval is long enough to avoid the failure consequences.
If the group cannot achieve consensus, then it is not possible to give a positive answer to the question "what is the P-F interval?". When this happens, the associated on-condition task must be abandoned as a way of detecting the failure mode under consideration, and the failure must be dealt with in some other way.
The third trick is to concentrate on one failure mode at a time. In other words, if the failure mode is wear, then the analysis should concentrate on the characteristics of wear, and should not discuss corrosion or fatigue (unless the symptoms of the other failure modes are almost identical and the rate of deterioration is also very similar).
Finally, it must be clearly understood by everyone taking part in such an analysis that the objective is to arrive at an on-condition task interval which is less than the P-F interval, but not so much less that resources will be wasted on the checking process.
The effectiveness of such a group is redoubled if management expresses an appreciation of the fact that it is made up of human beings, and that humans are not infallible. However, the analysts must also appreciate that if the failure has safety consequences, the price of getting it wrong could (literally) be fatal for themselves or their colleagues, so they need to take special care in this area.
Figure 7.12 (horizontal axis: Age (x 10 000))
On-condition tasks
On-condition tasks are considered first in the task selection process, for the following reasons:
* they can nearly always be performed without moving the asset from its installed position and usually while it is in operation, so they seldom interfere with the production process. They are also easy to organise.
* they identify specific potential failure conditions so corrective action can be clearly defined before work starts. This reduces the amount of repair work to be done, and enables it to be done more quickly.
* by identifying equipment on the point of potential failure, they enable it to realise almost all of its useful life (as illustrated by the tyre example).
Scheduled restoration
If a suitable on-condition task cannot be found for a particular failure, the next choice is a scheduled restoration task. It too must be technically feasible, so the failures must be concentrated about an average age. If they are, scheduled restoration prior to this age can reduce the incidence of functional failures. This may be cost-effective for failures with major economic consequences, or if the cost of the scheduled restoration task is significantly lower than the cost of the functional failure.
The disadvantages of scheduled restoration are that:
* it can only be done when items are stopped and (usually) sent to the workshop, so the tasks nearly always affect production in some way
* the age limit applies to all items, so many items or components which might have survived to higher ages will be removed
* restoration tasks involve shop work, so they generate a much higher workload than on-condition tasks.
However, scheduled restoration is more conservative than scheduled discard because it involves restoring items instead of throwing them away.
Scheduled discard tasks
Scheduled discard is the least cost-effective of these tasks, but where it is technically feasible, it does have a few desirable features.
Combinations
For a very small number of failure modes which have safety or environmental consequences, a task cannot be found which on its own reduces the risk of failure to an acceptably low level, and a suitable modification does not suggest itself. In these cases, it is sometimes possible to find a combination of tasks (usually from two different task categories, such as an on-condition task and a scheduled discard task), which reduces the risk of the failure to an acceptable level. Each task is carried out at the frequency appropriate for that task. However, it must be stressed that situations in which this is necessary are very rare, and care should be taken not to employ such tasks on a 'belt and braces' basis.
The task selection process
The task selection process is summarised in Figure 7.13. This basic order of preference is valid for the vast majority of failure modes, but it does not apply in every single case. If a lower order task is clearly going to be a more cost-effective method of managing a failure than a higher order task, then the lower order task should be selected.
Figure 7.13 (text recovered from the figure, in order of preference):
* Do the on-condition task at intervals less than the P-F interval
* Do the scheduled restoration task at intervals less than the age limit
* Do the scheduled discard task at intervals less than the age limit
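The order of preference described above can be sketched as a small selection routine. This is only an illustration under stated assumptions: the task names are taken from the text, but the function, its arguments and the way a 'clearly more cost-effective' lower order task is passed in are hypothetical.

```python
from typing import Optional, Set

# Basic order of preference, highest first.
ORDER = ("on-condition", "scheduled restoration", "scheduled discard")


def select_task(feasible_and_worth_doing: Set[str],
                clearly_more_cost_effective: Optional[str] = None) -> Optional[str]:
    """Pick a proactive task for one failure mode.

    feasible_and_worth_doing: tasks already judged technically feasible
        and worth doing for this failure mode.
    clearly_more_cost_effective: optional override, used when a lower
        order task is clearly the more cost-effective choice.
    """
    if clearly_more_cost_effective in feasible_and_worth_doing:
        return clearly_more_cost_effective
    for task in ORDER:
        if task in feasible_and_worth_doing:
            return task
    return None  # no proactive task; a default action is needed


print(select_task({"scheduled discard", "on-condition"}))
print(select_task({"on-condition", "scheduled restoration"},
                  clearly_more_cost_effective="scheduled restoration"))
```

Returning `None` stands for the case where no proactive task qualifies and the failure has to be managed by a default action.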
Default Actions 1: Failure-finding Tasks
8.2 Failure-finding
Why bother?
Much of what has been written to date on the subject of maintenance strategy refers to three types of maintenance: predictive, preventive and corrective. Predictive tasks entail checking if something is failing. Preventive maintenance means overhauling items or replacing components at fixed intervals. Corrective maintenance means fixing things either when they are found to be failing or when they have failed.
However, there is a whole family of maintenance tasks which falls into none of the above categories.
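For instance, when we test a fire alarm, we are not checking whether it is failing, and we are not overhauling, replacing or repairing it. We are simply checking if it still works.
Tasks designed to check whether something still works are known as failure-finding tasks.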
This led to the conclusion that the probability of a multiple failure can be reduced by reducing the unavailability of the protective device - in other words, by increasing its availability. Chapter 5 went on to explain that the best way to do this is to prevent the protective device from getting into a failed state by applying some sort of proactive maintenance.
sensor to
the circuit
response.
For example, a pressure switch may be designed to shut down a machine if the lubricating oil pressure falls below a certain level. Wherever possible, switches like this should be checked by dropping the oil pressure to the required level and checking whether the machine shuts down.
Similarly, a fire detection circuit should be checked from smoke detector to fire alarm by blowing smoke at the detector and checking if the alarm sounds.
Do not
Dismantling anything always creates the possibility that it will be put back incorrectly. If this happens to a hidden function, the fact that it is hidden means that no-one will know it has been left in a failed state until the next check (or until it is needed). For this reason, we should always look for ways of checking the functions of protective devices without disconnecting or otherwise disturbing them.
This having been said, some devices have to be dismantled or removed altogether to check if they are working properly. In these cases, great care must be taken to do the task in such a way that the devices will still work when they are returned to service. (The mathematical implications of the fact that a failure-finding task might induce a failure are considered later in this chapter.)
It must be possible to check
In a very small but still significant number of cases, it is impossible to carry out a failure-finding task of any sort. These are:
* where it is impossible to gain access to the protective device in order to check it (this is almost always a result of thoughtless design).
* when the function of the device cannot be checked without destroying it (as in the case of fusible devices and rupture discs). In most of these cases, other technologies are available (such as circuit breakers instead of fuses). However, in one or two cases our only options are to find some other way of managing the risks associated with untestable protection until something better comes along, or to abandon the processes concerned.
Minimise risk while the task is being done
It should be possible to carry out a failure-finding task without significantly increasing the risk of the multiple failure.
An example of a borderline task is overspeeding something in order to check whether the overspeed protection mechanism works.
Figure 8.2: Brake light failures (chart of bikes 1 to 10; legend: checked/OK, checked/failed, failed during this year)
In this case, the failure-finding interval of one year is equal to 10% of the MTBF of ten years. However, we don't know exactly when each failed light ceased to function. One might have failed the day after the last check, another the day before the current check, and the rest at some time in between. All we know for sure is that each of the four lights failed some time during the year preceding the check. So in the absence of any better information, we assume that on average, each failed light failed half way through the year. In other words, on average, each of the failed lights was out of service for half a year. This means that over the four year period, our failed lights were in a failed state for a total of:
4 failed lights x 0.5 years each in a failed state = 2 years.
So on the basis of the above information, it seems that we can expect an average unavailability from our brake lights of:
2 years in a failed state ÷ 40 years in service = 5%.
This corresponds to an availability of 95%.
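The same estimate can be reproduced in a few lines. This is a minimal sketch of the arithmetic above; the function and argument names are assumptions made for illustration, and the half-interval rule is the averaging assumption stated in the example.

```python
def average_unavailability(failures_found: int,
                           check_interval_years: float,
                           total_service_years: float) -> float:
    """Estimate unavailability from failure-finding records.

    Assumes, as in the example, that each failed item failed on average
    half way through the interval preceding the check that found it.
    """
    years_in_failed_state = failures_found * 0.5 * check_interval_years
    return years_in_failed_state / total_service_years


# Ten bikes checked once a year for four years = 40 years in service,
# with four failed brake lights found over that period.
u = average_unavailability(failures_found=4,
                           check_interval_years=1.0,
                           total_service_years=10 * 4)
print(f"unavailability = {u:.0%}, availability = {1 - u:.0%}")
```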
The above example suggests that there is a linear correlation between the unavailability (5%), the failure-finding interval (1 year) and the reliability of the protective device as given by its MTBF (10 years), as follows:

$$\text{Unavailability} = 0.5 \times \frac{\text{failure-finding interval}}{\text{MTBF of the protective device}}$$
It can be shown that this linear relationship is valid for all unavailabilities
of less than
provided that the protective device conforms to an exponential survival distribution (failure pattern E or random failure). (See Cox & Tait 1991, pp 283-284 or Andrews & Moss 1993, pp 110-112.)
Excluding task time and repair time
Note that the 'unavailability' of the protective device does not include any unavailability incurred while the failure-finding task is being carried out, nor does it include any unavailability caused by the need to repair the device if it is found to be failed. This is so for two reasons:
* both the failure-finding task and any repair work which might be needed should be carried out under tightly controlled conditions. These conditions should reduce - if not eliminate - the chance of a multiple failure while the intervention is under way. This entails either shutting down the protected system or arranging alternative protection until the protective device has been fully restored. If this is done, the unavailability resulting from the (controlled) intervention can be ignored in any assessments of the probability of multiple failure.
In the RCM decision process, the latter point is covered by the criteria for deciding whether a failure-finding task is worth doing. If there is a significant increase in the likelihood of a multiple failure while the task is
(Strictly speaking, the above calculations are only valid if the brake lights on all the bikes are used about the same number of times each week. If there is a wide variation, both the MTBF and the failure-finding interval should be calculated in terms of distance travelled, or even more accurately, in terms of the number of times the brakes - and hence the brake lights - are used. However, the key point to note at this stage is the connection between the failure-finding interval, the desired availability and the MTBF.)
For people who are uncomfortable with mathematical formulae, the formula above can be used to develop a simple table, as follows:

Availability required for the hidden function:   99.99%   99.95%   99.9%   99.5%   99%   98%   95%
Failure-finding interval (as % of MTBF):         0.02%    0.1%     0.2%    1%      2%    4%    10%
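The table can be regenerated from the linear relationship described earlier, in which the unavailability equals half the failure-finding interval divided by the MTBF. The sketch below simply inverts that relationship; the function name is an assumption made for illustration.

```python
def ffi_as_fraction_of_mtbf(required_availability: float) -> float:
    """Failure-finding interval as a fraction of the MTBF.

    Inverts: unavailability = 0.5 x (failure-finding interval / MTBF).
    Only intended for the range covered by the table (availability of
    95% or better, exponential survival distribution assumed).
    """
    unavailability = 1.0 - required_availability
    return 2.0 * unavailability


for availability in (0.9999, 0.9995, 0.999, 0.995, 0.99, 0.98, 0.95):
    fraction = ffi_as_fraction_of_mtbf(availability)
    print(f"{availability:.2%} availability -> interval = {fraction:.2%} of MTBF")
```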
Required availability
Having established the relationship between availability, reliability and failure-finding intervals, the next issue to consider is how we decide what availability we require. Part 6 of Chapter 5 explained that this can be done as follows:
1: first ask what probability the organisation is prepared to tolerate for the multiple failure which could occur if the hidden function was not working when called upon to do so
2: then determine the probability that the protected function will fail in the period under consideration
3: finally determine what availability the hidden function must achieve to reduce the probability of the multiple failure to the desired level
We also need to find out the mean time between failures of the hidden function. Once this has been done, we are in a position to look at Figure 8.3 and select the task interval which corresponds to the level of availability established in step 3. This process is illustrated in the following example.
Figure 8.4 (label recovered from the figure: MTBF = 10 years)
this meant that the unavailability of the stand-by pump must not exceed 1%, so the availability of this pump has to be 99% or better.
Figure 8.3 suggests that to achieve an availability of 99% for the pump, someone would need to carry out a failure-finding task (in other words, check that it is fully functional) at an interval of 2% of its mean time between failures. Records might show that the stand-by pump has a mean time between failures of 8 years (or about 400 weeks), so the failure-finding task interval should be:
2% of 400 weeks = 8 weeks
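The three steps and the pump figures can be traced through in code. This is a sketch under stated assumptions: the function names are hypothetical, the tolerable multiple failure probability of 1 in 1 000 and the duty pump failure rate of 1 in 10 years are the figures quoted for this example, and the interval rule is the linear relationship behind Figure 8.3.

```python
def required_unavailability(tolerable_multiple_failure_prob: float,
                            protected_function_failure_prob: float) -> float:
    """Step 3: unavailability the hidden function is allowed to have."""
    return tolerable_multiple_failure_prob / protected_function_failure_prob


def failure_finding_interval(unavailability: float, mtbf: float) -> float:
    """Interval in the same units as the MTBF (twice the unavailability times MTBF)."""
    return 2.0 * unavailability * mtbf


# Step 1: tolerable probability of the multiple failure (per year).
tolerable = 1 / 1000
# Step 2: probability that the duty pump fails (per year).
duty_pump_fails = 1 / 10
# Step 3: allowed unavailability of the stand-by pump (about 1%).
u = required_unavailability(tolerable, duty_pump_fails)

# Stand-by pump MTBF of 8 years, taken as about 400 weeks.
interval_weeks = failure_finding_interval(u, mtbf=400)
print(f"allowed unavailability = {u:.1%}, interval = {interval_weeks:.1f} weeks")
```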
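The failure-finding interval can be expressed as:

$$\mathrm{FFI} = 2 \times U_{TIVE} \times M_{TIVE} \qquad (2)$$

where FFI is the failure-finding task interval, $M_{TIVE}$ is the mean time between failures of the protective device and $U_{TIVE}$ is the allowed unavailability of the protective device.

The probability of the multiple failure is the probability of failure of the protected function multiplied by the unavailability of the protective device:

$$\frac{1}{M_{MF}} = \frac{1}{M_{TED}} \times U_{TIVE} \qquad (3)$$

where $M_{MF}$ is the mean time between multiple failures and $M_{TED}$ is the mean time between failures of the protected function. This can be rearranged as follows:

$$U_{TIVE} = \frac{M_{TED}}{M_{MF}} \qquad (4)$$

Substituting formula 4 into formula 2 allows the failure-finding interval to be determined in a single step, as follows:

$$\mathrm{FFI} = \frac{2 \times M_{TIVE} \times M_{TED}}{M_{MF}} \qquad (5)$$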
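If we apply this formula to the figures used in the duty/stand-by pump system mentioned above, $M_{MF}$ is 1 000 years, $M_{TIVE}$ is 8 years and $M_{TED}$ is 10 years, so:

$$\mathrm{FFI} = \frac{2 \times 8 \times 10}{1\,000}\ \text{years} = 0.16\ \text{years} \approx 2\ \text{months}$$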
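The single-step calculation is easy to check numerically. The sketch below assumes the formula takes the form FFI = 2 x (MTBF of the protective device) x (MTBF of the protected function) / (mean time between multiple failures), which is the form that reproduces the two-month result; the function and parameter names are illustrative only.

```python
def ffi_years(m_tive: float, m_ted: float, m_mf: float) -> float:
    """Failure-finding interval in years.

    m_tive: MTBF of the protective device (years)
    m_ted:  MTBF of the protected function (years)
    m_mf:   tolerable mean time between multiple failures (years)
    """
    return 2.0 * m_tive * m_ted / m_mf


ffi = ffi_years(m_tive=8, m_ted=10, m_mf=1000)
print(f"FFI = {ffi:.2f} years, or about {ffi * 12:.1f} months")
```

Working in years throughout keeps the units consistent; converting to months or weeks is left to the last line.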
modes in a
device
Up to this point, all the failure possibilities which could cause a protective device to fail have been grouped together as one failure mode ('stand-by pump fails'). The vast majority of protective devices can be treated in this way, because all the failure modes which could cause each device to cease to function are checked when the function of the device as a whole is checked.
However, it is sometimes appropriate to carry out a detailed FMEA of the device in order to identify individual failure modes which might on their own cause the device to be unable to provide the required protection. This is usually done in two sets of circumstances:
* when some of the failure modes are known to be susceptible to proactive maintenance but others are neither predictable nor preventable. In these cases, the appropriate on-condition or scheduled restoration/discard task should be applied to the failure modes which qualify, and failure-finding tasks applied to the remainder of the failure modes
it to
there
a
in a failed state_
11
MMF
the
( 1 - p) .can be
In this
If the act of switching is the only cause of failure, and if the failure conforms to an exponential survival distribution, the probability of a multiple failure is the demand rate multiplied by the probability that the switch fails on any one operation.
For example, if the demand rate is once in 40 years and the switch lasts an average of 600 000 operations, then the probability of a multiple failure is:
1 in (40 x 600 000) = 1 in 24 000 000 years.
Thi s is so because if the failure is caused
of operating the switch to check if it has failed will ,,,.u,....,. , u... .,,"'"'...., ,"'H
enable you to find out if the l ast
'-'1-'''"'"'"''"V"
do not indicate whether the underlying failure pattern is age-related, in which case some form of scheduled restoration or scheduled discard might be appropriate, or whether it is random
If this value of
will almost
inadequate and
discussed in the next chapter.
marised i n
is a fuller descrip tion of this aspect of the procthan the two boxes at the
foot of the left hand column
in
r1 1 'Jo (Tr,l m
Figure 8.5: Failure-finding: the decision process
consequences.
operational or
Note that if a suitable proactive task cannot be found for a failure under either of these headings, it simply means that we do not carry out scheduled maintenance on that component in its present form. It does not mean that we simply forget about it. As we see in the next section of this chapter, there may be circumstances under which it is worth changing the design of the component to reduce overall costs.
9.2 Redesign
has arisen
as we have
and
which must be followed to ..!.v
ri.:;n r,:::,.
a successful maintenance program. In this part of this chapter, we look at issues which affect the relationship between design and maintenance, and then consider the part played by redesign in the task selection process.
The term 'redesign' is used in its broadest sense in this chapter. Firstly, it refers to any change to the specification of any item of equipment. This means any action which should result in a change to a drawing or a parts list. It includes changing the specification of a component, adding a new item, replacing an entire machine with one of a different make or type, or relocating a machine. It also means any other once-off change to a process or procedure which affects the operation of the plant. It even covers training as a method of dealing with a specific failure mode (which can be seen as redesigning the capability of the person being trained).
Most modifications take from six months to three years from conception to commissioning, depending on the cost and complexity of the new design. On the other hand, the maintenance person who is on duty today has to maintain the equipment as it exists today, not what should be there or what might be there some time in the future. So today's realities must be dealt with before tomorrow's design changes.
Secondly, most organisations are faced with many more apparently desirable design improvement opportunities than are physically or economically feasible. By focusing on failure consequences, RCM does much to help us to develop a rational set of priorities for these projects, especially because it separates those which are essential from those that are merely desirable. Clearly, such priorities can only be established after the review has been carried out.
Inherent reliability vs desired performance
Among other things, Part 2 of Chapter 2 stressed that the inherent reliability of any asset is established by its design and by how it is made, and that maintenance cannot yield reliability beyond that inherent in the design. This led to two important conclusions.
Firstly, if the inherent reliability or built-in capability of an asset is greater than the desired performance, maintenance can help achieve the desired performance. Most equipment is adequately specified, designed and built, so it is usually possible to develop a satisfactory maintenance program, as described in previous chapters. In other words, in most cases, RCM helps us to extract the desired performance from the asset as it is currently configured.
On the other hand, if desired performance exceeds inherent reliability, then no amount of maintenance can deliver the desired performance. In these cases 'better' maintenance cannot solve the problem, so we need to look beyond maintenance for the solutions. Options include:
modifying the equipment
changing operating procedures
lowering our expectations and deciding to live with the problem.
This reminds us that maintenance is not always the answer to chronic reliability problems. It also reminds us that we must establish as soon and as precisely as possible what we want each piece of equipment to do in its operating context before we can start talking about the appropriateness of its design or its maintenance requirements.
Safety or environmental consequences
If a failure could affect safety or the environment and no proactive task or combination of tasks can be found which reduces the risk of the failure to an acceptable level, something must be changed, simply because we are dealing with a safety or environmental hazard which cannot be adequately anticipated or prevented. In these cases, redesign is usually undertaken with one of two objectives:
* to reduce the probability of the failure mode occurring to a level which is acceptable. This is usually done by replacing the affected component with one which is stronger or more reliable.
* to change the item or the process in such a way that the failure no longer has safety or environmental consequences. This is most often done by installing one or more protective devices, which are designed:
- to alert operators to abnormal conditions
- to shut down the equipment in the event of a failure
- to eliminate or relieve abnormal conditions which follow a failure and which might otherwise cause more serious damage
- to take over from a function which has failed
- to prevent dangerous situations from arising.
Remember that if such a device is added, its maintenance requirements must also be analysed.
Safety or environmental consequences can also be reduced by eliminating hazardous materials from a process, or even by abandoning a dangerous process altogether.
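As mentioned earlier, when dealing with safety or environmental consequences, RCM does not raise the question of economics. If the level of risk associated with any failure is regarded as unacceptable, we are obliged either to prevent the failure or to make the process safe. The alternative is to accept conditions that are known to be unsafe or environmentally unsound.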
._,,,,u.Lvl,
",.."'"'11 1
Other
Actions
1 91
Hidden failures
In the case of hidden failures, the risk of a multiple failure can be reduced by modifying the equipment in one of four ways:
* make the hidden function evident by adding another device: Certain hidden functions can be made evident by adding another device which draws the attention of the operator to the failure of the hidden function.
For instance, a battery used to power a smoke detector is a classical hidden function if no additional protection is provided. However, a warning light is fitted to most such detectors in such a way that the light goes out if the battery fails. In this way the additional protection makes the function of the battery evident. (Note that the light only tells us about the condition of the battery, not about the ability of the detector to detect smoke.)
Special care is needed in this area, because extra functions installed for this purpose also tend to be hidden. If too many layers of protection are added, it becomes increasingly difficult, if not impossible, to define sensible failure-finding tasks. A much more effective approach is to substitute an evident function for the hidden function, as explained in the next paragraph.
* substitute an evident function for the hidden function: In most cases this means substituting a fail-safe device for one which is not fail-safe. This is difficult to do in practice, but if it is done, the need for a failure-finding task falls away at once.
For example, one widely used way to warn the driver of a vehicle that his brake lights have failed is to fit a warning light which is switched on if the brake lights fail. (In many cases, this light is also switched on for a short while when the ignition is switched on. However, so are all the other warning lights on the dashboard. Under these circumstances one failed light could easily be overlooked, so its function is effectively hidden.)
The system might also be configured i n such
only be tested by d i sabling a brake
and
and i nvasive
which is
on. This is a
likely to be dismi ssed
than it solves,
m ultiple fail u res associated with this
the design.
quences, so it is necessary to
One way to eliminate this problem i s to make the
and of the warning system evident This can be
cables for the
l ight, and
the cables
through them at the b rake l ights every time h e u ses the brakes.
In this situation, it
a pinp rick of light at the end each
d river if either a brake
or a cable fails. In other words, the function
protective device is now evident, so
* substitute a more reliable (but still hidden) device for the existing hidden function: Figure 8.3 suggests that a more reliable hidden function (in other words, one which has a higher mean time between failures) will enable the organisation to achieve one of three objectives:
- to reduce the probability of the multiple failure without changing the failure-finding task intervals. This increases the level of protection
- to increase the interval between tasks without changing the probability of the multiple failure. This reduces resource requirements
- to reduce the probability of the multiple failure and increase the task intervals, giving increased protection with less effort.
* duplicate the hidden function: If it is not possible to find a single protective device which has a high enough MTBF to give the desired level of protection, it is still possible to achieve any of the above three objectives by duplicating (or even triplicating) the hidden function.
Let us return to the example of a duty pump with a stand-by. It was explained on page 179 that if the users want the probability of a multiple failure to be less than 1 in 1 000, and the unanticipated failure rate of the duty pump is reduced to 1 in 10 years, then the availability of the stand-by pump has to be 99% or better. This led to the conclusion that a failure-finding task should be done on the stand-by pump every 2 months in order to achieve an availability of 99% (based on an MTBF for this pump of 8 years).
However, now let us assume that someone has decided that the probability of a multiple failure in this system should not exceed 1 in 100 000 (or $10^{-5}$), rather than 1 in 1 000. If the mean time between unanticipated failures of the duty pump ($M_{TED}$) is unchanged at 10 years, applying formula 4 in Chapter 8 shows that the unavailability ($U_{TIVE}$) of the stand-by pump should not exceed:

$$U_{TIVE} = \frac{M_{TED}}{M_{MF}} = \frac{10}{100\,000} = 10^{-4}$$

So the unavailability of the stand-by pump must now not exceed $10^{-4}$ (0.01%). If the MTBF of the stand-by pump is unchanged at 8 years, applying formula 2 from Chapter 8 yields the following:

$$\mathrm{FFI} = 2 \times 10^{-4} \times 8\ \text{years} = 14\ \text{hours}$$

Activating a stand-by pump this often is plainly impractical, so more thought has to be given to the design of this system.
In fact, Figure 9.1 opposite shows that if we were to add a second stand-by pump, and ensure that the availability of each stand-by pump on its own exceeds 99% (corresponding to an unavailability of 1%, or $10^{-2}$), the probability of the multiple failure would be:

$$10^{-1} \times 10^{-2} \times 10^{-2} = 10^{-5}$$

or 1 in 100 000. Figure 8.3 suggests that this can be achieved by doing a failure-finding task on each stand-by pump at the original frequency of once every 8 weeks. In other words, a much higher level of protection is achieved without changing the task interval.
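Both halves of this example can be checked numerically. The sketch below is illustrative only: the function names and the 8 760-hour year are assumptions, and multiplying the unavailabilities together assumes that the stand-by pumps fail independently of one another.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def ffi_hours(unavailability: float, mtbf_years: float) -> float:
    """Failure-finding interval in hours (2 x unavailability x MTBF)."""
    return 2.0 * unavailability * mtbf_years * HOURS_PER_YEAR


def multiple_failure_probability(p_duty_fails: float,
                                 standby_unavailability: float,
                                 n_standby: int) -> float:
    """Probability that the duty pump fails while every stand-by is failed.

    Assumes the stand-by pumps fail independently of each other.
    """
    return p_duty_fails * standby_unavailability ** n_standby


# One stand-by pump held to an unavailability of 1 in 10 000:
single_pump_interval = ffi_hours(unavailability=1e-4, mtbf_years=8)
print(f"single stand-by: check every {single_pump_interval:.0f} hours")

# Two stand-by pumps, each 99% available, duty pump failing 1 year in 10:
duplicated = multiple_failure_probability(0.1, 0.01, n_standby=2)
print(f"duplicated stand-by: 1 in {1 / duplicated:,.0f}")
```

Setting `n_standby=1` with the same 99% availability gives the original 1 in 1 000 figure, which makes the benefit of duplication easy to compare.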
Figure 9.1: The effect of duplicating a hidden function (duty pump: probability of failure in any 1 year unchanged at 1 in 10; stand-by pump 1: availability 99%; stand-by pump 2: availability 99%)
Figure 9.2: Decision diagram for a preliminary assessment of a proposed modification (outcomes shown in the diagram: 'Redesign is not justified' and 'Redesign is desirable')
If the equipment does not have long to go, it is not worth modifying it. On the other hand, if it is to be around for a while, the modification might have a chance to pay for itself. This is why the first question in Figure 9.2 asks:
Is the remaining useful life of the equipment high?
Some organisations demand that modifications should pay for themselves within a specified period - say, two years. This effectively sets the planning horizon of the equipment at two years. This type of policy reduces the number of projects initiated on the basis of projected cost-benefits and ensures that only those which will pay for themselves quickly are submitted for approval. So if the answer to the first question in Figure 9.2 is 'no', redesign is probably not justified.
Figure 9.3 (labels partly recovered from the figure: 'The problem: lumps which block the ...'; 'Proposed modification: a stainless ...')
If we are certain that the modification to the hopper will work, a discounted cash flow analysis on the figures provided for the hopper (at a discount rate of ...) shows that the modification will pay for itself
* in five years if the blockage occurs four times per year,
* in seven years if it occurs three times per year and
* in more than ten years if it occurs twice per year.
Figure 10.1: The RCM decision diagram (© 1991 Aladon Ltd). Questions and outcomes recovered from the diagram:
* Is a task to detect whether the failure is occurring or about to occur technically feasible and worth doing? If yes: scheduled on-condition task
* Is a scheduled restoration task to reduce the failure rate (or, in some columns, to avoid failures) technically feasible and worth doing? If yes: scheduled restoration task
* Is a scheduled discard task to reduce the failure rate (or, in some columns, to avoid failures) technically feasible and worth doing? If yes: scheduled discard task
* Is a failure-finding task to detect the failure technically feasible and worth doing? If yes: scheduled failure-finding task
* Could the multiple failure affect safety or the environment? If yes: redesign is compulsory. If no: no scheduled maintenance, redesign may be desirable
* Is a combination of tasks to avoid failures technically feasible and worth doing? If yes: combination of tasks. If no: redesign is compulsory
* In the remaining columns, if no proactive task qualifies: no scheduled maintenance, redesign may be desirable
The decision worksheet is divided into sixteen columns. The columns headed F, FF and FM identify the failure mode under consideration. They are used to cross-refer the information and decision worksheets, as shown in Figure 10.3 below:
Figure 10.3: Cross-referencing the information and decision worksheets (the example shows an RCM II Information Worksheet and an RCM II Decision Worksheet for a Cooling Water Pumping System)
The headings on the next ten columns refer to the questions on the RCM Decision Diagram in Figure 10.1, as follows:
* the columns headed H, S, E, O and N are used to record the answers to the questions concerning the consequences of each failure mode
* the next three columns (headed H1, H2, H3 etc) record whether a proactive task has been selected, and if so, what type of task
* if it becomes necessary to answer any of the default questions, the columns headed H4 and H5, or S4 are used to record the answers.
The last three columns record the task which has been selected (if any), the frequency with which it is to be done and who has been selected to do it. The 'proposed task' column is also used to record the cases where redesign is required or it has been decided that the failure mode does not need scheduled maintenance.
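For readers who keep the worksheet electronically, the column groups described above map naturally onto a small record type. The sketch below is one possible layout, not a prescribed format: every field name is hypothetical, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DecisionRow:
    """One line of a decision worksheet (field names are illustrative)."""
    f: str   # function reference
    ff: str  # functional failure reference
    fm: str  # failure mode reference
    # Answers to the consequence questions, e.g. {"H": "Y", "S": "N"}.
    consequences: Dict[str, str] = field(default_factory=dict)
    # Answers for the three proactive task columns, keyed "1", "2", "3".
    proactive: Dict[str, str] = field(default_factory=dict)
    # Answers to default questions, e.g. {"H4": "Y"}.
    defaults: Dict[str, str] = field(default_factory=dict)
    proposed_task: str = ""
    initial_interval: str = ""
    can_be_done_by: str = ""

    def reference(self) -> str:
        """Cross-reference back to the information worksheet."""
        return f"{self.f}-{self.ff}-{self.fm}"


row = DecisionRow(f="4", ff="B", fm="4",
                  consequences={"H": "N"},
                  defaults={"H4": "Y"},
                  proposed_task="Check that the stand-by pump starts",
                  initial_interval="8 weeks",
                  can_be_done_by="Operator")
print(row.reference())
```

Keeping F, FF and FM as separate fields preserves the cross-reference to the information worksheet, which is the purpose of the first three columns.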
In the following paragraphs, each of these four sections of the decision worksheet is reviewed in the context of the associated questions on the Decision Diagram.
Failure Consequences
The meanings of questions H, S, E and O in Figure 10.1 are discussed at length in Chapter 5. These questions are asked for each failure mode, and the answers recorded on the decision worksheet on the basis shown in Figure 10.4 below.
Figure 10.4 (instructions partly recovered from the figure: write the letter Y or N in the column for the question, then go to the question indicated, for example question S1, O1 or N1)
Figure 10.5 shows how the answers to these questions are recorded on the decision worksheet. Note that:
* each failure mode is dealt with in terms of one category of consequences only. So if it is classified as having environmental consequences, we do not also evaluate its operational consequences (at least when performing the first analysis of any asset). This means that if for instance a 'Y' is recorded in column E, nothing is recorded in column O.
* once the consequences of the failure mode have been categorised, the next step is to seek a suitable preventive task.
7 .5 also summarises the criteria used to decide whether such tasks are worth
''") Ta.rrt'Vt"H
Figure 10.5 (text partly recovered from the figure):
* To be worth doing, any preventive task must reduce the risk of this failure on its own to an acceptable level
* To be worth doing, over a period of time any preventive task must cost less than the cost of the operational consequences plus the cost of repair of the failure which it is meant to prevent
* Non-operational consequences: ...
Proactive Tasks
The eighth to tenth columns on the decision worksheet are used to record whether a proactive task has been selected, as follows:
* the column headed H1/S1/O1/N1 is used to record whether a suitable on-condition task could be found to anticipate the failure mode in time to avoid the consequences
* the column headed H2/S2/O2/N2 is used to record whether a suitable scheduled restoration task could be found
* the column headed H3/S3/O3/N3 is used to record whether a suitable scheduled discard task could be found
205
206
Reliability-centred Maintenance
Could the multiple failure affect safety or the environment? (This question is only asked if the answer to question H4 is no.) If the answer to this question is yes, redesign is compulsory. If the answer is no, the default action is no scheduled maintenance, but redesign may be desirable.
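The default logic for a hidden failure can be sketched in a few lines. This is a minimal sketch, assuming that question H4 asks whether a failure-finding task is feasible and worth doing and that the question quoted above is question H5; the function name and return strings are illustrative.

```python
from typing import Optional


def hidden_failure_default(h4_yes: bool, h5_yes: Optional[bool] = None) -> str:
    """Default action for a hidden failure when no proactive task was found."""
    if h4_yes:
        # Assumed meaning of H4: a failure-finding task is selected.
        return "failure-finding task"
    if h5_yes is None:
        # H5 is only asked, and must be answered, when the answer to H4 is no.
        raise ValueError("question H5 must be answered when the answer to H4 is no")
    if h5_yes:
        return "redesign is compulsory"
    return "no scheduled maintenance (redesign may be desirable)"
```

For instance, `hidden_failure_default(False, True)` returns the compulsory redesign outcome, while omitting the second argument after a 'no' to H4 is treated as an incomplete worksheet.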
Proposed Task
If a proactive task or a failure-finding task has been selected during the decision-making process, a description of the task should be recorded in the column headed 'proposed task'. Ideally, the task should be described as precisely on the decision worksheet as it will be on the document which reaches the person doing the task. If this is not possible, then the task should at least be described in enough detail to make the intent absolutely clear to whoever writes up the detailed task description.
For example, consider a situation where an on-condition task has been selected for a rolling element bearing. Chapter 7 explained how such bearings can suffer from a variety of potential failure conditions, including noise, vibration, heat, wear and so on. Many machines have more than one and often several such bearings. So at the very least, the 'proposed task' should state which bearing is to be checked for what condition. In other words, if a bearing is to be checked for noise, the proposed task should name the bearing and say that it is to be checked for noise, and not just read 'check bearing'.
As explained in the next chapter, if we are confronted with a number of tasks which need to be done at a wide range of different frequencies, the time to consider consolidating them into a smaller number of work packages is when compiling maintenance schedules. However, the initial task frequencies should remain on the decision worksheet to remind us how the schedule frequencies were derived (in other words, to preserve the 'audit trail').
Note also that task intervals can be based on any appropriate measure of exposure to stress. This includes time, distance travelled, stop-start cycles, throughput, or any other measurable variable which bears a relationship to the failure mechanism. However, calendar time tends to be used wherever possible because it is the simplest and cheapest to administer.
Can Be Done By
The last column on the decision worksheet is used to list who should do each task. Note that the RCM process considers this issue one failure mode at a time. In other words, it does not approach the subject with any preconceived ideas about who should (or should not) do maintenance work. It simply asks who has the competence and confidence to do this task correctly.
The answer could be anyone at all. Tasks may be allocated to maintainers, operators, insurance inspectors, the quality function, specialist technicians, vendors, structural inspectors or laboratory technicians.
A sometimes controversial issue which arises at this stage concerns simple on-condition and failure-finding tasks. It sometimes makes sense to allocate these tasks to maintainers, but in many cases, using maintainers to do these tasks has the following drawbacks:
if the task interval is short, the inspection frequency will be very high, sometimes more than once per shift. This can lead to so many tasks that maintainers do little more than travel from one task to the next. This travelling between the tasks makes the use of maintainers for this purpose expensive (particularly if they are skilled tradespeople), often to the point where it is simply not worth using them in this capacity.
many skilled people find such tasks boring and are often reluctant to do them at all.
[Figure: example of a completed decision worksheet. Legible entries include a weekly check for noise by a craftsman and a four-weekly task by an operator.]
11 Implementing RCM Recommendations
11.1 Implementation - The Key Steps
We have seen how the formal application of the RCM process ends with completed decision worksheets. These specify a number of routine tasks which need to be done at regular intervals to ensure that the asset continues to do whatever its users want it to do, together with the default actions which must be taken if an appropriate routine task cannot be found.
The people who take part in this process learn about how the asset works and about how it fails. This on its own frequently causes the participants to change their behaviour in ways which often lead directly to remarkable improvements in asset performance. However, in order to derive the maximum long-term benefit from RCM, steps must be taken to implement the recommendations on a formal basis. These steps should ensure that:
all the recommendations are approved by the managers with overall responsibility for the assets
all routine tasks are described clearly
all actions which call for once-off changes (to designs, to the way the asset is operated or to the capability of operators and maintainers) are identified and implemented
routine tasks and operating procedure changes are incorporated into appropriate work packages.
[Figure: extract from decision worksheets. Legible entries: No scheduled maintenance; Redesign guard; Check agitator gearbox oil level (weekly); Check tension of main drive chain; Calibrate gauge. Intervals shown include monthly and yearly.]
the people who did the
(If this
start to lose
efforts put into
whether manageJ.U"",. "">", and more
ment was serious about involving them in the first
made
Levels
The analysis should be carried out at the appropriate level. The most common fault is to analyse assets at too low a level, and the usual symptom is large numbers of items with only one or two functions defined per item.
Functions
All the functional failures associated with each function should be listed (usually complete failure plus the negative of each performance standard in the function statement).
Failure modes
Ensure that failure modes which have happened or which are reasonably likely have not been omitted. Failure mode descriptions should also be specific. In particular, they should include a verb, not just a component.
Failure Effect
Pump impeller jammed by rock
Failure
Failure effect
damaged
2 Screens worn.
whether ( and how) the failure will be evident to the operating crew
whether
Consequence evaluation
Special care should be taken to ensure that the hidden function question (question H on page 200) has been answered correctly. In particular, due weight should have been attached to the terms 'on its own' and 'under normal circumstances' in this question, as discussed on pages 124 and 126. Special attention should also be paid to the evaluation of the safety and environmental consequences of evident failures, and to the effectiveness of any tasks which might have been selected to manage failures in these two categories.
Task selection
Any tasks which have been selected should not only satisfy the criteria for technical feasibility as discussed in Chapters 6, 7 and 8, but they should also address the consequences of the failure. Key points to look out for:
if the answer to question H is 'No' and the answer to question H4 is 'No', then question H5 must be answered. If the answer to H5 is 'Yes', the proposed task should not be 'no scheduled maintenance'
' Ye s' ; the nrr,nn<OA,i task should not
tasks or default actions should be described in enough detail to leave the auditor in no doubt as to what is intended. In particular, task descriptions should not simply name the type of task (such as 'on-condition task'), but should describe a task which applies to the failure mode in question. A task should not incorporate a combination of tasks because this signifies two different failure modes (unless the answer to question S4 is yes). For example:
Wrong
tension
or
tension of chain
chain
wear
Initial interval
Right
Check feedscrew coupling for loose bolts and tighten if necessary
or
Visually check agitator coupling for cracks and report defects to the maintenance supervisor etc
Calibrate gauge
Pages 206 and 207 explained that each task should be defined as precisely as possible on the decision worksheet. This saves the duplication of effort which occurs if detailed procedures have to be written up later by someone else. It also reduces the possibility of error. If time does not allow the procedures to be specified during the RCM analysis, then they must be written up later. As mentioned below, this can often be done as part of an ISO 9000-type initiative.
Note that if detailed task descriptions are to be written up later, this should ideally be done by someone who took part in the analysis. If this is not possible, the third party should understand that he or she is being asked to define the tasks on the decision worksheet in more detail, and not to re-audit the analysis.
cH""'' '- -' " ' '-'V>
L
whether the equipment should be shut down and/or isolated while the task is being done
precautions which must be taken
tools
Specifying these items can avoid much unproductive travelling to and fro after the job has started.
ISO 9000 and RCM
A major objective of RCM is to decide what work should be done. (In other words, to ensure that 'they do the right job'.) On the other hand, a major thrust of quality systems like ISO 9000 is to define what people should be doing as precisely as possible in order to minimise the chance of errors. (In other words, to ensure that 'they do the job right'.)
This transfer of tasks from RCM decision worksheets to end-user documents can be seen as the point where the output of an RCM analysis becomes the input to an ISO 9000 procedure writing exercise. It also suggests that if both initiatives are to be applied, it makes sense to apply RCM first.
Reliability-centred Maintenance
assets
the RCM process, but the
should not be made to
,l-,.
ut:t)1u:1:mr
should consult afterwards with the people who did the review
in order to develop a correctly focused
n n ,?Yr7T,0/1
22 1
iT'lTitnH
t1.....
maintenance procedures to
incorporated into the vv..,,. .... u,..
the balance of the maintenance routines are ,n.u.;'"' .... into sec)ar,ate
schedules and checklists,
-n.-r,r>Arh l rC> O
1-1...
etc
Figure 11.3:
Maintenance Schedules
A maintenance schedule is a document listing a number of maintenance tasks to be done by a person with a specified level of skill on a specified asset at a specified frequency. Figure 11.4 shows the relationship between these schedules and the RCM decision worksheets:
Figure 11.4:
Figure 11.5:
Consolidating task frequencies
Contradictions
When a low frequency schedule incorporates a higher frequency schedule, should the latter be incorporated as a single instruction, or should it be rewritten in full? In other words, should an annual schedule include an instruction such as 'do the three-monthly schedule', or should all the tasks on the three-monthly schedule be written out in the annual schedule?
In fact it is wise to rewrite the schedules in full in order to avoid the possibility of contradictions.
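The idea of writing each schedule out in full can be sketched as follows. This is an illustrative sketch under stated assumptions, not a prescribed method: tasks are grouped by who does them and by interval, intervals are expressed in weeks, and a lower frequency schedule is assumed to incorporate every higher frequency schedule whose interval divides into it exactly. The function name is hypothetical and the intervals for the second and third tasks are assumed for the example.

```python
from collections import defaultdict


def build_schedules(tasks):
    """tasks: iterable of (description, interval_weeks, done_by).

    Returns {(done_by, interval_weeks): [task descriptions written out in full]}.
    """
    by_key = defaultdict(list)
    for description, interval_weeks, done_by in tasks:
        by_key[(done_by, interval_weeks)].append(description)

    schedules = {}
    for (done_by, interval) in by_key:
        full = []
        # Write out the higher frequency tasks in full rather than by reference.
        for (other_by, other_interval), descriptions in sorted(
            by_key.items(), key=lambda item: item[0][1]
        ):
            if other_by == done_by and interval % other_interval == 0:
                full.extend(descriptions)
        schedules[(done_by, interval)] = full
    return schedules


schedules = build_schedules([
    ("Check agitator gearbox oil level", 1, "Operator"),
    ("Check tension of main drive chain", 4, "Operator"),   # interval assumed
    ("Calibrate gauge", 52, "Craftsman"),                   # interval assumed
])
```

Because every schedule carries its own copy of the shorter-interval tasks, a later change to the weekly schedule has to be repeated in the longer schedules; that maintenance burden is the price of avoiding cross-references.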
i
p1vu.1v11.1
ever we
WeibuJ/ distributions
Afuinte1umce
clown into
functions and
and
then identifies the fai lure
function hito functional
modes which
functional failure. This
to establish
to
in the
to
establish the
device.
C\J UU>t"'l t P
from the
of information which
i nclude:
that the fail u re has r,,ft"n,rori which
of
to
obt a i ned
Figure 13.1: A typical RCM review group (Operations Supervisor, Operator, Engineering Supervisor, Craftsman (M and/or E))
'<>'H'
understood me:an1mg
'mean time between
294
Maintenance
how
it lasts. Thi s is usually thought of as i ts ' life' or i ts
at the end of which the item under consideration fails and is eitherrebuilt
Strictly
this phenomThi s is usually referred
when it
how
it out
to as 'downtime' or 'unavailability', and measures how much of the time the item is incapable of fulfilling a stated function to the satisfaction of the user, in relation to the amount of time the user would like it to
Unavailability
avail-ability)
is
that it has survived to
in the
how
zt is
of that period. We have seen that this is the conditional probability of failure. This could be described as a measure of 'dependability', if only to distinguish it from the other three variables.
One common variation of this measure is the 'B10 life'. Chapter 12 explained that this is usually measured from the moment the item is put into service, and is the period before which not more than 10% of the items can be expected to fail. (In other words, the conditional probability of failure in the stated period is not more than 10%.)
\Vhich
if a
costs per unit of
it i s
output among all those used by an electric
would
its
load)
want i t to generate
.. lnterms
much of the time as orn;s1t)le.
of this function, the m ost ao1oror0nate measure of maintenance effectiveness is
availability.
load. They may even
operational reasons. Slowdowns or s hutdowns of this nature affect the utilisation
of the asset as opposed to its availability. In essence,
meas u res what
percentage of time the mach i ne i s available t o fulfil its
quirement, while utilisation measures how m uch it
On the other hand, the
m ight only be used oer1001ca11v
peak demands for power (peak toads).
will be that the generator comes on stream as soon as it
measure of effectiveness wil l be how often it does so
by a failure
it fails to do so.
Safety is often measured in terms of number of days or number of manhours worked between lost time incidents or fatalities. This is a form of 'mean time between failures'. Similar measures are used for environmental incidents.
On the product quality front, a scrap rate of 4% can be seen as a measure of unavailability, in the sense that while a machine is producing scrap it is not available to produce good product. (A scrap rate of 4% corresponds to an availability of 96%). Scrap rates can also be expressed as (say) 20 parts per million, which is another way of expressing a failure rate. Both are valid measures of maintenance effectiveness, especially in highly mechanised or automated processes.
Different Expectations
Every function has associated with it a set of continuity expectations (reliability and/or durability and/or availability and/or dependability).
For instance, two of the functions associated with the bodywork of a car are 'to isolate the occupants of the car from the elements' and 'to look acceptable'. Most car owners expect the bodywork to be able to fulfil the first function for the expected life of the car (unless the car is a convertible or someone opens a door or a window). On the other hand, everyone knows that cars start 'to look unacceptable' in the space of a few days or weeks. So in the first case we have a continuity expectation which would be measured in hundreds of thousands of kilometres or decades, while in the second case the expectation is measured in hundreds of kilometres or weeks.
This issue is complicated by the fact that the loss of nearly every function can be caused by more than one - sometimes dozens - of failure modes. Each failure mode has associated with it a specific failure rate (or MTBF), and each will take the function out of service for an amount of time which is specific to that failure mode. As a result, the continuity characteristics of any function will be a composite of the continuity characteristics of all the failure modes which could cause the loss of that function.
This leads to two conclusions, as follows:
we need a thorough understanding of all the failure modes which are likely to cause each loss of function in order to be able to design, operate and maintain an asset in such a way that the effectiveness expectations which we have of each function will be achieved
it is unreasonable to hold the maintainer of an asset alone accountable for the achievement of any continuity (reliability/availability/durability/dependability) expectation of any function of any asset. The achievement of these expectations is also a function of how it is designed, built and operated. Accountability for the associated results should be shared between the people responsible for all of these functions. (In other words, 'maintenance' effectiveness as it is defined in this chapter is not only a measure of the effectiveness of the maintenance department. It measures how effectively everyone associated with the asset is doing whatever is necessary to ensure that it continues to do what its users want it to do.)
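The way in which the continuity characteristics of a function are built up from its failure modes can be illustrated with a small calculation. This is a sketch under simplifying assumptions that are not stated in the text: the failure modes are taken to be independent, each with a constant failure rate, and downtime is taken to be small compared with operating time. The function name and the figures in the example are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def function_continuity(failure_modes):
    """failure_modes: list of (mtbf_years, downtime_hours_per_failure).

    Returns (composite MTBF of the function in years, approximate availability).
    """
    # Failure rates of independent failure modes add.
    failures_per_year = sum(1.0 / mtbf for mtbf, _ in failure_modes)
    # Expected downtime per year is the sum of rate x downtime for each mode.
    downtime_hours_per_year = sum(hours / mtbf for mtbf, hours in failure_modes)
    composite_mtbf = 1.0 / failures_per_year
    availability = 1.0 - downtime_hours_per_year / HOURS_PER_YEAR
    return composite_mtbf, availability


# Two hypothetical failure modes: one every 2 years costing 8 hours,
# one every 4 years costing 4 hours.
mtbf, availability = function_continuity([(2.0, 8.0), (4.0, 4.0)])
```

In this example the function fails on average every 16 months even though neither failure mode on its own occurs more often than once every two years, which is why a continuity expectation cannot be assessed one failure mode at a time.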
Different Functions
Perhaps the most important point about measuring the effectiveness of maintenance activities is the fact that every asset has more than one and sometimes dozens of functions. As we have seen, a set of continuity expectations is associated with each function. This means that if
so each functional failure needs to be considered on its own merits, as follows:
Functional failure A: fails to pump at all: Obviously, if a pump isn't working, it cannot be used. However, there are five pumps in the station, so the level of availability required depends on the pattern of demand. For example, the station owner might tell us that he "hardly ever" has all five pumps in use at once - so seldom that we can ignore the possibility. He might also tell us that four pumps are in use simultaneously for a total of not more than one hour a day, and then never for more than about ten minutes at a time. If each pump has an average availability of 95%, two pumps will be out of service simultaneously for no more than 2% of the time. In other words, four pumps would be available 98% of the time, while there is a demand for four pumps 4% of the time. Under these circumstances, only a tiny fraction of customers would need to wait, and then not for very long. This might tempt the owner to accept an availability of 95%. (If he regularly had five or more customers wanting to buy gas at the same time, he would expect a much higher availability. But it may cost him somewhat more to achieve it, especially if he has to pay a premium for rapid response when calling out technicians to deal with failures.)
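The arithmetic behind the pump example can be checked with a short binomial calculation. This is a rough sketch which assumes that the five pumps fail independently of each other and of demand, an assumption the text does not state; the function name is illustrative. On that basis the chance of two or more pumps being out of service at once comes to about 2.3%, which is of the same order as the rounded 2% quoted above.

```python
from math import comb


def prob_at_least_k_down(n_pumps, availability, k):
    """Probability that k or more of n independent pumps are unavailable."""
    u = 1.0 - availability  # unavailability of a single pump
    return sum(
        comb(n_pumps, j) * u**j * availability ** (n_pumps - j)
        for j in range(k, n_pumps + 1)
    )


p_two_or_more_down = prob_at_least_k_down(5, 0.95, 2)   # about 0.0226
p_four_available = 1.0 - p_two_or_more_down             # about 0.977

# If four pumps are wanted about 4% of the time (one hour a day), the fraction
# of time a customer has to wait is roughly the product of the two figures.
p_customer_waits = 0.04 * p_two_or_more_down            # about 0.0009
```

The last figure, roughly one part in a thousand, is what lies behind the remark that only a tiny fraction of customers would need to wait.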
Customers might find slow pumps sufficiently irritating to take their business elsewhere, especially if there are faster alternatives nearby. Consequently, the owner is likely to want any of his pumps which hasn't failed completely to pump at the required rate "all the time - or at least, as close to all the time as you can make it". This might turn out to mean 99.8% of the time that the pump is not otherwise out of action - another form of 'availability'.
Pumps more than 40 litres/minute: If the pump pumps too fast, it is likely to generate sufficient back pressure to keep tripping the 'tank full' pressure mechanism in the nozzle. Customers would have to learn to throttle back the flow rate by not squeezing the handle so much, which many regulars might also find irritating enough to cause them to take their business elsewhere. As a result, the owner is likely to say that he wouldn't want this failed state to occur "too often". He might then quantify this expectation as a failure rate - say not more than once in fifty years on any one pump.
Function 2: to indicate volume and value of fuel delivered to customer to within 0.03% of actual volume/value. This function can fail in two ways, as follows:
Functional failure A: indicates that more than 0.03% less fuel has been delivered than actual: If this happens, the station owner appears to be selling less fuel than is actually being delivered, so he loses money. The failure becomes apparent after a while, because the ratio of fuel sold to fuel received will start to drop. Nevertheless, the owner would probably still seek a low failure rate - say not more than once in 1 000 years on any one pump. (If the indicator fails completely, it shows that nothing has been delivered. If this happens, one lucky customer might get a free tank of fuel, then the station manager would shut down the affected pump until the problem is rectified.)
299
Functional failure B: indicates that more than 0.03% more fuel has been delivered
(thus
in the com1...;,<.,4U<C,t H,,<;,<;> would l ead
any one pump.
c-1' ".>n ri ,, nr,
customer's fuel
Functional failure A: level drops below 2 000 litres: Based on normal patterns of demand, fresh supplies of fuel are ordered when the tank level approaches 5 000 litres, and we are told that they are nearly always delivered before the level reaches 2 000 litres. If the level in the tank drops much below 2 000 litres, there is a greatly increased chance that the tank will empty, causing the station to lose business. As a result, the station manager expedites the delivery if the level drops to 2 000 litres (as indicated by the low level alarm). He says he needs to expedite deliveries about once a year, which he says is about right. Here he is expressing effectiveness in terms of a failure rate. (Note that this failed state is caused by increased demand and/or late deliveries. It has nothing to do with the maintenance department in the classical sense, but it still bears on the need to 'cause the business to continue'.)
Functional failure B: level rises above 48 000 litres: The level in the tank is only likely to rise above 48 000 litres if the delivery driver is not paying attention to the tank level indicator when filling the tank or if the level indicator itself has failed. In both cases the warning light comes on at 48 500 litres. We are told that this happens "about once every six months" - another failure rate which the people involved might say they accept.
Functional failure C: tank contains something other than gasoline: The tank can only contain something other than gasoline if it is filled with something else - (say) diesel. If this happens, customers could fill their tanks with the wrong fuel. The station owner could face claims for damages, and the resulting bad publicity could put him out of business, so he would rather this didn't happen at all. When reminded that 'never' is an unattainable ideal, he might decide to accept a failure rate of (say) once in 100 000 years.
Function: to switch on a local warning light if level drops to 5 000 litres. The station manager checks the level in all the fuel tanks every day in order to track consumption, and orders more fuel when levels approach 5 000 litres. The low level warning light serves as a reminder if the level indicator fails or if there is a sudden increase in demand between readings. This light is needed about once every two years. If it does not work when needed, the low level alarm sounds when the level drops to 2 000 litres. If an initial order is placed at this late stage, the tank will almost certainly run dry and the station will be out of business until the tanker arrives. The owner says he will accept a mean time between multiple failures (MMF) of 400 years. In the light of this expectation, the maximum unavailability the station can tolerate for the low level warning light is MTED/MMF = 2/400 = 0.5%. This means that the low level warning light is being maintained effectively if its availability remains above 99.5%.
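The ratio used above can be written as a one-line calculation. The sketch below is illustrative only: the function and argument names are hypothetical, with the first argument standing for the interval at which the protective device is called upon (MTED in the text's notation, two years for the warning light) and the second for the mean time between multiple failures which the owner is prepared to accept (MMF).

```python
def max_tolerable_unavailability(m_ted_years, m_mf_years):
    """Largest unavailability of a protective device consistent with the
    accepted mean time between multiple failures: MTED / MMF."""
    return m_ted_years / m_mf_years


# Low level warning light: needed about once every two years, MMF of 400 years.
light_unavailability = max_tolerable_unavailability(2, 400)   # 0.005, i.e. 0.5%
light_min_availability = 1 - light_unavailability             # 0.995, i.e. 99.5%

# The same ratio for a device needed about once a year with an MMF of 100 years.
alarm_unavailability = max_tolerable_unavailability(1, 100)   # 0.01, i.e. 1%
```

The second case corresponds to a required availability of 99% and shows how quickly the requirement tightens as the protected event becomes more frequent or the accepted multiple failure interval becomes longer.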
Function: to switch on a local warning light if level rises to 48 500 litres. The high level warning light is backed up by an audible alarm, so following similar logic to the above, the owner might come to the conclusion that he will accept an availability of 97.5% for this warning light.
Function 8: to sound an alarm if the level in the tank drops below 2 000 litres. If the alarm does not sound, there is a 50% chance that the tank will run dry before the tanker arrives, and the station would be out of business for about one hour on average under these circumstances. This leads the station owner to conclude that he will not accept this multiple failure (level drops below 2 000 litres while low level alarm is failed) more than "once in a hundred years". As discussed above, the level drops below 2 000 litres about once a year, so the maximum unavailability the station can tolerate for the low level alarm is MTED/MMF = 1/100 = 1%. In other words, the alarm is being maintained effectively if its availability remains above 99%.
A
B
3
4
5
7
8
Each pump
pump
Each pump
Each pump
Each pump
Each pump
Each pump
Each pump
Whole system
A
B
A
B
A
A
A
A
A
99.5/o
50 years
1 000 years
50 000 years
1 000 years
1 000 years
1 000 000 years
500 years
Tank
1 year
Tank
6 m onths
1 00 000 years Tank
97.5''/o
99 /o
The
illustrates several important
of maintenance
as follows:
UL
H/L
UL alarm
we ,u-e not
maintenance
when
effectiveness. The
ment effectiveness we are
.:hiftino
distinction is important, because ., ... ..,.., e'mt:>nas1s
from the
to focus on
to its functions helps
maintainers i n
what the equipment does rather than what it is.
r\a,A,t'YY\ ,'1T\f'>''
tYt<'>'}VIH"tr"<'
302
even
assets have a surprisingly l arge n umber of fu nctions.
Each of these functions has a unique set
Before it is
to
a
maintenance effecwe need to know what all
or otherwise in each case.
This means that it is not possible to list a single continuity statement for an entire asset, such as "to fail not more than once every two years" or "to last at least eleven years". We need to be specific about which function must not be lost more than once every two years or must not fail for at least eleven years (or more precisely, which functional failure must not occur more than once every two years, or which functional failure must not occur before eleven years).
It is possible for many assets to operate too fast as well as too slowly.
which
asset would i ncrease the OEE as defined
means that it possible to obtain an apparent
r,prtc,,rrrH ll"ll"P
the asset to
i n a fai led state.
by
machine was that it
For instance, a primary performance standard of the
The 1' means that If the machine
should produce 101 1
hour, it is in a failed state
produces more than 102 units
it starts going faster than a
u,nrv_,n_,,nac,c orbecause
heat and damage the
However, if it operates at 103
102%. This increases the 'overall'-''"'"''"""'''"''
at a time when the machine Iv ctv\UQIIV
the OEE as defined above only relates to the primary function of any asset. This is misleading because every asset - machine tools included - has many more functions than the primary function, and each of these will have its own continuity expectations. Consequently, the OEE is not a measure of 'overall' effectiveness at all, but only a measure of the effectiveness with which the primary function of the asset is being fulfilled.
finally, for the reasons discussed earlier, truly user-oriented maintenance enterprises need to turn their attention away from equipment effectiveness towards functional effectiveness. So if measures of this sort must be used, it is much more accurate to refer to them as measures of 'primary functional effectiveness' rather than 'overall equipment effectiveness'.
Conclusion
The two most important conclusions to emerge
are that:
t' h <;int'.:>t
Labour
The cost of maintenance labour typically amounts to between one third and two thirds of total maintenance costs, depending on the industry and on overall wage levels in the country concerned. Maintenance labour costs should include expenditure on contract labour (which is often incorrectly grouped under 'spares and materials' because it is bought out). When considering maintenance labour, it is also wise not to make the common mistake of treating maintenance work done by operators as a zero cost "because the
In
operators for this work, the onmn11S2lttcrn
ffil:Uff[en1ance, and the COSt should be ,,,, 1;,,..,,..,, , u i .,;,,ciiu:" 1 uv,.,., .... , , .. .
306
Maintenance
maintenance labour effi-
Time recovery
of total
time paid for)
hours and as a
Overtime
of normal hours)
of
Relative and absolute amounts of time spent on different
default actions and modifications, and subsets of
Spares and materials
Spares and materials usually account for the portion of maintenance expenditure which does not come under the heading of 'labour'. How well they are managed is measured in the following ways:
Total expenditure on spares and materials (total and per unit of output)
Total value of spares in stock
Stock turns (total value of spares and materials in stock divided by the total annual expenditure on these items)
stock items which are in stock
VVli. vvH <,t,:;..v of
Relative and absolute values
of stocks (consumables,
of estimates
hours vs actual hours for jobs which
307
Some of these
measures are useful for ..m,.1 kinr
,,H,,"'r imme diate deaction
cisions or initating short-term
budgets, time recovery, schedule completion rates, .,u.,,,n,.....,,:...,
more useful for tracking trends and ('t,,rnr,,r n u JJe1ctOJrn1;an<:e with sirnilar
facilities in order to plan
term remedial action
per unit of output, service levels and ratios in
are
a
help in
attention on what must be done to ensure that
maintenance resources are used as i;:;u11,.,CHl;J
,,..,.,., ,..,,..a, ., as pv,,.,t<J1>.
Maintenance efficiency is easy to measure. The issues which it addresses are usually under the direct control of maintenance managers. For these two reasons, there is often a tendency for these managers to focus too much attention on maintenance efficiency and not enough on maintenance effectiveness. This is unfortunate, because the issues discussed under the
11
[Figure: timeline from 1940 to 2000]
The use of RCM helps to fulfil all of the Third Generation expectations. The extent to which it does so is summarised in the following paragraphs, starting with safety and environmental integrity.
from the technical viewpoint, the decision process dictates that failures which could affect safety or the environment must be dealt with in some fashion - it simply does not tolerate inaction. As a result, tasks are selected which are designed to reduce all equipment-related safety or environmental hazards to an acceptable level, if not eliminate them completely. The fact that these two issues are dealt with by groups which include both technical experts and representatives of the 'likely victims' means that they are also dealt with realistically.
the structured approach to protected systems, especially the concept of the hidden function and the orderly approach to failure-finding, leads to substantial improvements in the maintenance of protective devices. This reduces the probability of multiple failures which have serious consequences. (This is perhaps the most powerful single feature of RCM. Using it correctly significantly lowers the risk of doing business.)
involving groups of operators and maintainers directly in the analysis makes them much more sensitive to the real hazards associated with their assets. This makes them less likely to make dangerous mistakes, and more likely to make the right decisions when things do go wrong.
the overall reduction in the number and frequency of routine tasks (especially invasive tasks which upset basically stable systems) reduces the risk of critical failures occurring either while maintenance is under way or shortly after start-up.
This issue is particularly important if we consider that preventive maintenance played a part in two of the three worst accidents in industrial history (Bhopal, Chernobyl and Piper Alpha). One was caused directly by a proactive maintenance intervention which was currently under way (cleaning a tank full of methyl isocyanate at Bhopal). On Piper Alpha, an unfortunate series of incidents and oversights might not have turned into a catastrophe if a crucial relief valve had not been removed for preventive maintenance at the time.
[Figure: chart of accidents plotted by year; only part of the year axis (63 to 71) is legible]
Source: C A Shifrin: "Aviation Safety
Aviation Week & Space Technology: Vol
Higher Plant Availability and Reliability
on the perform
achieving 95%
potential than one which is f' l l ..ro n t" I H
has
l r'tllP'111 n a 8 5%. Nonetheless , if t is
RCM achieves
i
significant improvements
of the
The
of course improved by
number and the
of unanticipated failures which have
consequences.
The RCM process
to achieve this in the following ways:
the
consequences of everyfailure
review
which has not already been dealt with as a
w ith
criteria used to assess task
ensure that only
the
the most effective tasks are selected to deal with each failure mode.
What RCM
3 11
by
the in6 each failure mode to the relevant functional
tlinann,'i ,. which
formation worksheet provides a tool
leads in tum to shorter
times.
r,:U 'lt"tT'l fT
1v'-H,_ ..
, y . ,, ,.., , . v ,, . ._ , ,
on on-condition
maintenance reduces
with a corresponincrease in avail ab ility . In addition,
ding
l ist of all the failure modes which are reasonably
a dispassionate assessment of the rel ationship between
reveals that there is
no reason at all
routine overlumls
rron 11,9 1u ,i Thi s l eads to a reduction
scheduled downtime without a corresponding increase in unscheduled downtime.
a n,n t,, r; <, ,, c
,-.
in
a shutdown
of the above comments, it is often necessary to
or an overhaul for any of the following reasons:
- to prevent a failure which is genuinely
to
a potential failure
a h idden functional failure
- to
- to carry out a modification.
In these cases, the disciplined review of the need for [...]
or corrective action that is part of the RCM process leads to shorter shutdown
worklists, which leads in turn to shorter shutdowns. Shorter shutdowns
"''"''' '""'" '"'"' uu1u"Lc and hence more
to be comrHe1tea
'HJjA-.TPP,llft
u.n ,, -. u- 4 u "
as explained on page
RCM provides an
and
participate in the process to learn
how to
operate and maintain new plant. This enables them to avoid many of the
errors which would otherwise be made as a result of the
process, and to ensure that the plant is maintained
from the outset.
At least four organisations with whom the author has worked in the UK and
USA achieved what each described as 'the fastest and smoothest start-up in
the company's history' after
RCM to new installations. In each
RCM was applied in the final stages of
cerned are in the automotive, steel, paper and
r"
the elimination [...]
and hence of superfluous failures.
As mentioned [...], it is not unusual to find that between 5% and
20% of the [...] of a [...] plant are superfluous, but
can still [...] the plant when they fail. Eliminating such [...]
leads to [...] increase in reliability,
'-' "-f.L &. v v V'-'' "'HU.i,;..
of mainte-
r,:, ,_)[! 1 t , o
For example, RCM has led to the following reductions in routine maintenance
workloads when applied to existing systems:
Reliability-centred Maintenance
how the plant should be operated together with the identification of chronic
failures leads to a reduction in the number and severity
of failures, which leads to a reduction in the amount of money which
must be spent on [...] them.
Better Teamwork
In a curious way, teamwork seems to have become both a means to an end
and an end in itself in many [...]. The ways in which the
structured RCM approach to maintenance problem [...] and decision
making contributes to teambuilding were summarised on page 268. Not
only does this approach foster teamwork within the review groups themselves,
but it also improves communication and [...] between:
- production or [...] and the maintenance function
- management, supervisors, technicians and [...]
- [...], vendors, users and maintainers.
A Maintenance Database
The RCM Information and Decision Worksheets
as follows:
additional
p1 \JV lU\.c
number of
An Integrative Framework
As mentioned in Chapter l, all of the issues discussed above are
the mainstream of maintenance
and many are
feature ofRCM
'-'TPn-rnJ-<.'. TPn frame work for t!wlrlin
.__,.,.,, ,_. ,, _., ,..._ all of them at once, and
in the
everyone who h as -...u1 L:i;u6 to do with the
process.
".lt"' n l t t'\ct,'\ H
emDoait'.a in a handbook on maintenance evaluation and
drafted
t1f'1ll) l,m11ner:1t.
program
747 maintenance program has been suicccssrut
r:1nn-111nor, 1n,,1 [t>t:nnunu, led
improvements, which were [...] later in a second document, MSG-2:
Airline/Manufacturer Maintenance Program Planning Document.
MSG-2 [...] used to [...] the scheduled maintenance [...]
the Lockheed 1011 and the [...]. These programs
have also been [...]
MSG-2 has also been applied to tactical military aircraft such as the Lockheed
S-3 and P-3 and the McDonnell F4J. A similar document [...]
was the [...] the Airbus
Industrie A-300 and the Concorde.
The [...] outlined in MSG-1 and MSG-2 was [...]
a scheduled-maintenance program that assured the maximum [...]
and also provided them at the lowest cost. As an [...]
economic [...] achieved with this [...]
under traditional maintenance policies [...] scheduled [...]
the Douglas DC-8 airplane [...]
339 items, in contrast to seven such items in the DC-10 program.
One [...] to overhaul limits in the later
Elimination of scheduled
turbine propulsion
'"'" ';"'"""'' led to major reductions in labour and materials
reduced the
maintnwnce
this was a respectable
then cost more than US$1 million
the Boeing
under the MSG-I
e.u1er.iaea onl_v 66 000 manhours
structural
a basic interval
000 hoursfor the first
Under traditional ma'. uuentim:e
than 4 rnillion manhours to arrive at
cies it took an exJr,e1,wttu1e
/"'1,>
the smaller and
the sanze structural
DC-8. Cost reductions <d' this mtuu,izti,tae are of obvious impor
maintainin.g large
r
n ,, al''IJW,, c,
Ut VC.Hfj,"
<.: .A,LU t u , uc .
'1V H ln /l'/'T7
,4
32 1
t ,n- ,n , l H >'
program.
led to
centred Maintenance.
The Nowlan and Heap report provided the basis for MSG3,
which was issued in 1980 and revised in 1988 and 1993. MSG3 remains to this
day the process used to develop and refine maintenance programs for all
major types of civil aircraft.
The projects
,u,.....,.. smelting,
forming)
\w'U;i:;., u ,: v0, assembly, components, tyres)
(nickel, aluminium, platinum)
v,(,J,., ,H H:::
F,, ,
base metal
banking
breweries and soft drinks
buildings and building services
chemicals (industrial and household)
cosmetic manufacture
r-r.,rn n,ntt:r manufacture
electric utilities
steam, gas and nuclear
distribution,
gas distribution
harbours
iron mining
mass housing
microelectronics
undertakings [...] navies and air
nuclear facilities
pharmaceuticals
equipment
cellulose, tlbre[ia;s
postal services
pulp & paper (tissue paper, fine paper, "'""""""""U,f.JH.....
steelmaking
water distribution
woodworking.
The fact that RCM has been [...] received at all
levels, and has enabled users to achieve some remarkable successes in all
these countries, [...] that it is much less affected by cultural differences
than many other [...] of this nature.
The [...] listed opposite all became aware of and started [...]
RCM at different [...], so implementation is more advanced on some
sites than others. The overall situation can be summarised as follows:
- in about 25% of the cases, senior managers have [...]ary training
- about 10% of the organisations have applied RCM to all of the equipment
on at least one site
- the remaining 65% have reviewed some of their [...] and most
plan to continue [...] RCM to most if not all of their [...]
'"'" .,... ,.,, ..,., in each case.
However, Chapter 14 provides [...]
summary of the results achieved
together with a brief review of some of the
to
'JJ
1:-.l
Lubrication
Dealt with
work
.....
.....
Not handled as a
..
Q'tl
failure could
Asks the multiple failure could
the affect
affect
after
at the foot of the
I hidden function column
failure i s evident. Yes and no
colanswers lead to two
and leads to the same
umns: 'Yes1 defaults to
defaults as MSG3
to desirable
Safety
consequences
;::i
<"'l
I\)
page
default action if no
can be fou nd: does not specify
.
.
Ill
::,:
G)
ti.)
to
::,:
lubrication
Other l ubricaas individual
is a
issue
about lubrication is
at the head of every task
selection
selection criteria
users to consider tasks
from all categories before makina a
selectlon
Not considered
I Considered in auestion E
Appendix 1 :
Asset Hierarchies and
Functional Block Diagrams
Plant registers and asset hierarchies
Most organisations own, or at least use, hundreds if not thousands of
physical assets. These assets range in size from small pumps to steel rolling
mills, aircraft carriers or office blocks. They may be concentrated on one
small site or spread over thousands of square kilometres. Some of these
assets will be mobile, others will be fixed.
Before any organisation can apply RCM - a process used to determine
what must be done to ensure [...] to do whatever
its users want it to do - it must know what these assets are and where
they are. In all but the smallest and simplest facilities, this means that a list
of all the plant, [...] and buildings owned or used by the organisation,
and [...] sort, must be [...].
This list [...] designed in a way which makes it possible to keep
track of the assets that have been [...], those that have
yet to be [...] and those that are not [...]. [...]
is also needed for other [...]
as the planning and scheduling of routine and non-routine maintenance [...],
history recording and maintenance cost allocation. As a result, it
should be set up and the associated [...] in such
a way that it can be used for all these purposes.
Chapter 4 [...] that RCM can be [...] at almost any level in a
hierarchy. It also suggested that the most appropriate level is the level which
leads to a reasonably [...] number of failure modes per function.
'Appropriate' levels become much easier to [...] if the [...] is [...]
as a hierarchy which makes it possible to identify any asset at any level [...]
down to and including [...] (line replaceable units)
or even spare parts. The truck on page 85 provides one example.
Figure A1.1 overleaf shows another [...]
boiler house in a food factory.
Figure A1.1: Asset hierarchy

Number          Asset
                FoodCo Inc
01              Factory 1
02              Factory 2
03              Factory 3
0301            Preparation Dept
0302            Packing Dept
0303            Site Services
030301          Power Supply
030302          Compressed Air
030303          Water System
030304          Boiler House
03030401        Coal Handling System
03030402        Boiler No 1
0303040201      Shell & Tubes
0303040202      Grate
0303040203      Feed Pump
030304020301 to 030304020303   Motor, Switchgear [...]
0303040204      FD Fan
03030403        Boiler No 2
03030404        Ash Handling System
0304            Maintenance Dept
04              Distribution
05              Head Office
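The numbering in Figure A1.1 appears to add two digits for each level of the hierarchy, so the parent of any asset can be found by dropping the last two digits of its number. The following is a minimal Python sketch of that idea, not a prescribed implementation: the function names and the small `REGISTER` extract are illustrative assumptions, and only entries whose numbers are legible in the figure are used.

```python
from typing import Dict, List, Optional

# Illustrative extract of a plant register keyed by hierarchical asset number.
# Assumption: each level of the hierarchy adds exactly two digits to the number.
REGISTER: Dict[str, str] = {
    "03": "Factory 3",
    "0303": "Site Services",
    "030304": "Boiler House",
    "03030402": "Boiler No 1",
    "0303040203": "Feed Pump",
}


def level(number: str) -> int:
    """Depth of an asset in the hierarchy (a two-digit number is level 1)."""
    return len(number) // 2


def parent(number: str) -> Optional[str]:
    """Number of the parent asset, or None for a level 1 asset."""
    return number[:-2] if len(number) > 2 else None


def path(number: str, register: Dict[str, str]) -> List[str]:
    """Names of the asset and all its registered ancestors, top level first."""
    codes = [number[:i] for i in range(2, len(number) + 1, 2)]
    return [register[code] for code in codes if code in register]


if __name__ == "__main__":
    print(" > ".join(path("0303040203", REGISTER)))
```

A register organised this way makes it straightforward to pick the level at which an RCM analysis is carried out, because every asset below the chosen number shares its prefix.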
Functional hierarchies and functional block diagrams
oo :s s1101e to /'l e> '<1,:,. ! ,n
>"r'' "" '
1
.:>T"'""
u<UU'-/UV;
Figure A1.3: Asset Hierarchy
For example, the boiler house operators and maintainers would be fully aware
of the fact that coal, water and air go into the boiler at one end and that steam,
flue gas and ash (and occasionally dirty water) come out of the other. Most of
them would probably regard the notion that these simple facts should be drawn
in a diagram at best as a waste of time. As discussed at length in Chapter 2, the
real [...] is not to identify the simple and usually obvious relationships
between processes, but to define the desired performance relative to the initial
capability for all the key elements of each system, and then to define what must
be done to ensure that the system continues to deliver the desired performance.
1'""
[...] assets at or below the level chosen for the analysis are dealt with as part
of the normal RCM process. Part 7 of Chapter 4 showed that the functions
of lower level assets are either listed as [...] functions in the
main analysis, or dealt with as failure modes, or in [...]ally
complex subsystems, broken out for separate [...]
For instance, the example of the truck shown in Figure 4.11 in Chapter 4
showed how a blockage in the fuel line could simply be treated as a failure mode
of either the engine or the drive system, without needing a separate function
statement for the fuel system or the fuel line.
System boundaries
When applying RCM to any asset or [...], it is of course important to
define [...] to be analysed [...] and where it ends.
If a comprehensive asset hierarchy has been drawn up and a decision
taken to analyse a particular asset at a particular level, then the [...]
usually automatically encompasses all the assets below that system in the
asset hierarchy. The only exceptions are subsystems which are judged to
be so [...] that they will not be [...] at all, or very complex
[...] which are set aside for separate analysis.
Care is needed with control loops which consist of a sensor in one system
which sends a signal to a second system, which in turn
activates an actuator in a third. Chapter 4 explained that this issue can often
be dealt with either by conducting the [...] at a high enough level to ensure
that the 'system' encompasses the entire loop, or by [...] control
systems separately (after the controlled systems have been analysed). However,
sometimes this is not practical, in which case a decision must be made
[...] to which [...] will encompass the control loop in its entirety.
Care is also needed to ensure that assets or [...] right on the
boundaries do not 'fall between the cracks'. This applies [...] to
items like valves and [...]
It is wise not to be too [...] about boundary [...], because as
understanding grows during the RCM process, perceptions about what
should or should not be incorporated in the analysis frequently change.
This means that boundaries may need to be extended to incorporate some [...],
others may be dropped and others which are included
initially may be set aside for later analysis.
[...] of rigid boundary definitions tend to be [...] the [...]
external contractors [...] to apply RCM on behalf of end-users, because
boundaries must be defined precisely in order to define the commercial
scope of the contracts. The fact that the analysis is the subject of a
formal contract means that boundaries have to be defined much more precisely
- [...] much more rigidly - than is [...] technical
point of view. Contracts of this type then either have to be renegotiated
every time a boundary needs to be changed, or the boundary is not moved
when it should be, [...] in a suboptimal [...]. The best way to avoid
the time and cost associated with these commercial manoeuvrings is to avoid
contracting out this [...] of maintenance policy formulation [...]
Appendix 2:
Human Error
Chapter 4 mentioned that a [...] many equipment failures are caused
by 'human error'. It went on to mention that if a [...] human error is
considered to be a credible reason why a functional failure could occur, then
that error should be included in the FMEA. [...] human error is an
enormous [...] in its own right. The purpose of this appendix is to
provide a brief summary of the [...] of human error, and to [...]
how they might be dealt with within the framework of RCM.
Principal Categories of Human Error
When considering the interaction between [...], Blanchard et al
group the main factors under four [...]:
- anthropometric factors
- human sensory factors
- physiological factors
- psychological factors.
[...] every 'human error' can be traced to a failure or a problem which
has occurred in at least one [...]. As [...] we review them
briefly in the first part of this appendix, before [...] in more detail at
the fourth [...]
Anthropometric factors
Anthropometric factors are those which relate to the size and/or strength
of the operator or maintainer. Errors occur because a person (or part of a
person, such as a hand or arm):
- simply cannot fit into the space available to do [...]
- cannot reach [...]
- is not strong enough to lift or move something.
If a failure is [...] or is reasonably likely to occur for any of these
reasons, it is highly unlikely that a proactive maintenance task will be
found to deal with it. Note also that if a human error occurs for one of these
reasons, the human error is not the root cause. The failure mode is [...]
poor [...] and the human error is a failure effect.
If the consequences are such that something must be done about a failure
which is [...] for anthropometric reasons, the only viable course
of action is likely to be [...]. This will [...] involve
reconfiguring the asset in such a way that it becomes more accessible or easier
to move. In this context, Figure A2.1 shows some dimensions which are
considered by the US Navy to be adequate for reasonable human access
in confined spaces.
Figure A2.1: [...]
The term 'physiological factors' refers to environmental stresses which
affect human [...]. The stresses include [...] or low temperatures,
[...] what RCM can do is alleviate if not eliminate the hostile [...]
which so often exists between maintenance and [...], as [...]
on page 268. This makes [...] less inclined to blame
each other for errors, and more inclined to find solutions.
<lT>,r> n ,c- n , r,
An,,:,,;i l",r,r><'
The three sets of factors discussed so far all relate to external phenomena
which cause the human to make an error. As a [...] they are [...]
easy to [...] and to deal with (although doing so may sometimes be [...]).
A far more [...] and [...] of errors are
those which find their roots in the psyches of the humans themselves. As
a result, these [...] factors are discussed in more detail in the
next section of this appendix.
Psychological errors
Reason [...] divides the [...] of human error into those
which are unintended and those which are intended. An unintended error
is one which occurs when someone does a task which he or she should be
[...] but does it [...] the job [...]. An intended error
occurs when someone deliberately sets out to do something, but what they [...]
Figure A2.2: [...] of psychological errors
Unintended errors are subdivided into slips and lapses, and
intended errors are subdivided into mistakes and violations.
These [...] are illustrated in Figure A2.2 and are discussed
in the following paragraphs.
Slips and lapses
Slips and lapses are also known as skill-based errors. [...]
somebody who is fully [...] to do [...]
it correctly many times in the [...]
when somebody does [...]
wires a motor [...] occur
when someone misses out a [...] in a sequence of activities [...]
instance, if a mechanic leaves a tool behind after [...] in a machine or
[...] to fit a [...]
For instance, a protected system might be set up in such a way that excessive
pressure should cause an alarm to sound and a warning light to illuminate. However,
a situation might arise where the alarm is failed, the pressure increases and
the light comes on. The absence of the alarm may lead the operator to believe that
the warning light on its own is only a false alarm, especially if it has a history of
spurious failures. In this case, the operator may choose to take no action until the
light is [...], a course of action which has been appropriate in the past. On
this occasion however, it is not the right thing to do.
The application of a bad rule means just what it says. The normally chosen
or prescribed course of action is just plain wrong.
A classic example of a bad rule is a maintenance program which schedules items
for fixed interval overhauls in order to deal with failure modes which conform to
failure pattern E or F (see Figure 1.5 or 12.1). In the case of Pattern F especially,
an action designed to improve reliability will in fact make it worse, by upsetting a
stable system and inducing infant mortality.
[...] the 'root cause' of the failure is the rule itself or the process
by which it is selected. If the rule is promulgated or selected by someone
other than the person who performs the task - in other words, if the person
doing the task is only following orders - then the mistake is really the effect
of another failure.
The RCM process helps to reduce the possibility of misapplying good
rules in two ways:
- the thorough analysis of failure effects, especially what could happen
if a hidden function is in a failed state when it is needed, means that people
are less likely to jump to inappropriate conclusions when the situation
does arise (especially if they have been involved in the RCM process)
- [...] attention on the functions and maintenance of protective
[...] the RCM process greatly reduces the probability that these
devices will be in a failed state in the first place.
The chances of bad habits .;,
'"''"""''"''''
' "<T
6 are also reduced if care is taken
rise to C'l"'\,,,r,nn"'
the FM EA to r>C>r, n ru failure modes which
hc r,,,, .......... rh, to reduce them to
consequences of a false
c ases where the ......a.r.n,:,"'"''" an d/or the
alarm warrant it, the most
entails ,,c;, riloe , r,n
bad rules because the
to reduce
RCM
whole RCM process is all about defining the most
'rules' for
maintaining any asset
care must be take to ensure that the rules of RCM itself are
not applied badly. This is u.....,, ,_.,Ju,.,
application of RCM
;'-'111-')
Knowledge-based mistakes
Knowledge-based mistakes occur when someone is confronted with a
situation which has not occurred before and which has not been [...]
(in other [...] one for which there are no [...]). In situations like [...]
to make a decision about an [...]
and a mistake occurs if this decision is wrong.
[...] the author has found that a common [...] which occurs
in this context is a belief on [...] managers and [...] that
"I know, therefore my company knows". [...]
if a crisis occurs late at [...]
when all the [...]
useless if it is not in the mind of the person who has to take the first [...]
deal with the crisis.
This [...] that the first and most obvious way to avoid knowledge-based
mistakes is to improve the [...] of the [...] who have to make the
decisions. In most cases, these are the operators and maintainers.
Operators and maintainers are [...] to make appropriate decisions more often
if they clearly understand how the [...] what can go
wrong (functional failures and failure [...]) [...] of each failure
([...] effects). As mentioned several times in [...] 2, 13 and [...]
1
enhaoc
1- ,.,,
..1
h
1.UHl\.;lJJU-
t.V'-l 'U.t,HU.,
- minimise novelty, because new and alien technologies put people at the
bottom of the learning curve, where mistakes are most likely to happen
- avoid [...] coupling. This means designing systems in such a way that
if failures do occur, consequences develop slowly [...] to give people
time to think and hence more opportunity to make the right decisions.
Violations
A violation occurs when someone knowingly and deliberately commits
an error. Violations fall into three categories:
- routine violations. For instance, when people make a habit of not wearing
items of protective [...] (such as hard hats) despite rules which
clearly state that they should
The [...] for routine and exceptional violations usually consists of appropriate
enforcement of the rules by management. However, once again,
involvement in the RCM process [...] people a clearer understanding of
the need for safety procedures and the risks they are running if they violate
them. The [...] of [...] is beyond the scope of this book.
Conclusion
The most important conclusions to emerge from this appendix are that:
- not all human errors are necessarily the fault of the person who made
the error. In many cases, the error is either forced by external circumstances
or by inappropriate rules. So if blame is to be allocated for any
error, care must be taken to identify the real source
- human error is at least as common a reason why equipment fails to do
what its users want it to do as deterioration, if not more so. As a result,
it should be dealt with as part of the RCM process, either as a failure
mode when it is a root cause, or as a failure effect when it consists of
inappropriate responses to other failures
- in the industrial context, it is only possible to come to grips with human
errors if the people involved in committing the errors are involved
directly in [...] them, and developing appropriate solutions.
Appendix 3:
A Continuum of Risk
[...] that it [...] be possible to [...] a schedule of
acceptable risks which combines [...] risks and economic risks in one
continuum. It [...] that this [...] be made possible [...]
5.2 and 5.14 in some way.
[...] 5.14, repeated [...] A3.1, showed what an organisation
might decide that it can [...] for one event that has
economic consequences only.
Figure 5.2 [...] what one individual might be
prepared to tolerate in a [...] situation from
any event which could prove fatal in that situation,
as summarised in [...]
Figure A3.2: Acceptability of fatal risk
[...] next [...] was to translate the probability which myself and my
co-workers are [...] tolerate that any one of us [...] be killed by any
event at work into a tolerable [...] for each event
([...] failure) which could kill someone.
[...] previous example, the probability that any
one of my 1 000 co-workers will be killed in any one year is 1 in 100 (assuming
that everyone on [...] site faces the same [...]). Furthermore, if the
activities carried [...] the site [...]
(say) 10 000 events which could kill someone, then [...] average [...]
that each event could kill one person must be reduced
to 10^-6. This means that the
of an
Figure A3.3:
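The arithmetic in the example above can be checked with a short Python sketch. The figures (1 000 co-workers, a 1 in 100 chance per year that any one of them is killed, 10 000 potentially lethal events) come from the text; the function name and the assumption that the risk is shared equally among people and among events are illustrative only.

```python
def tolerable_probability_per_event(risk_any_worker_killed: float,
                                    lethal_events: int) -> float:
    """Share the tolerable yearly risk equally among all lethal events."""
    if lethal_events <= 0:
        raise ValueError("lethal_events must be positive")
    return risk_any_worker_killed / lethal_events


WORKERS = 1_000                 # co-workers on the site
RISK_ANY_WORKER_KILLED = 1 / 100  # yearly chance that any one of them is killed
LETHAL_EVENTS = 10_000          # events on the site which could kill someone

# Implied yearly risk for each individual, assuming everyone faces the same risk.
per_person = RISK_ANY_WORKER_KILLED / WORKERS
# Tolerable yearly probability for each event that could kill one person.
per_event = tolerable_probability_per_event(RISK_ANY_WORKER_KILLED, LETHAL_EVENTS)

if __name__ == "__main__":
    print(f"implied risk per person per year: {per_person:.0e}")
    print(f"tolerable probability per event:  {per_event:.0e}")
```

Dividing equally is the simplest possible allocation; in practice an organisation might weight events differently according to how many people each could kill.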
Figure A3.4: Acceptability of one lethal event where I have some control and some choice
Figure A3.5: Acceptability of one lethal event where I have no control and some choice
Figure A3.6: [...] lethal event where I have no control and no choice
Please note once again that these [...] are not meant to be prescriptive,
and do not [...] reflect the [...] of the author or any other organisation
or individual as to what should or should not be acceptable.
Appendix 4:
1 Introduction
[...] that most failures give some warning of the
fact that they are about to occur. This warning is called a potential failure,
and is defined as an [...] physical condition which indicates that a
functional failure is either about to occur or is in [...]. A functional failure
is defined as the inability of an item to meet a [...] standard.
Techniques to detect potential failures are known as on-condition maintenance
[...], because items are inspected and left in service on the condition that
[...] meet [...] standards. [...] of these [...] is determined [...]
the P-F interval, which is the interval between the emergence of the
potential failure and its [...] into a functional failure.
Basic on-condition maintenance [...] in the form of the human senses [...]
[...] 7, the main technical [...] of using [...]
wide range of potential failure conditions using these four senses.
However, the disadvantages are that [...] humans is relatively [...]
and the associated P-F intervals [...] very short.
[...] potential failure can be detected, the longer the P-F interval.
[...] intervals mean that inspections need to be done less
often and/or that there is more time to take whatever action is needed to
avoid the consequences of the failure. This is [...] so much effort is [...]
to define potential failure conditions and [...] techniques [...]
detecting them which [...] the [...] possible P-F intervals.
However, Figure A4.1 shows that a longer P-F interval means
that the potential failure must be detected at a point which is higher up the
P-F curve. But the [...] move up this curve, the smaller the deviation
from [...] if the final stages of deterioration [...]
not linear. The smaller [...] the more sensitive must be the
monitoring [...] to detect the potential failure.
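The link between P-F interval and inspection frequency can be illustrated with a small Python sketch. It assumes the usual rule that an on-condition task must be done at intervals shorter than the P-F interval, leaving enough time after detection to act; the function name, the units (weeks) and the sample figures are hypothetical and are not taken from this appendix.

```python
def max_task_interval(pf_interval: float, time_needed_to_act: float) -> float:
    """Longest inspection interval that still leaves time to act.

    In the worst case a potential failure appears just after an inspection,
    so it is only found one full task interval later. What then remains of
    the P-F interval must be at least time_needed_to_act.
    """
    if time_needed_to_act >= pf_interval:
        raise ValueError("P-F interval is too short to respond in time")
    return pf_interval - time_needed_to_act


if __name__ == "__main__":
    # Hypothetical figures in weeks: a more sensitive technique detects the
    # potential failure earlier, gives a longer P-F interval and so allows
    # less frequent inspections.
    for technique, pf_weeks in [("human senses", 2.0), ("sensitive instrument", 12.0)]:
        interval = max_task_interval(pf_weeks, time_needed_to_act=1.0)
        print(f"{technique}: inspect at least every {interval} weeks")
```

Many practitioners choose a more conservative interval than this upper bound (for instance half the P-F interval); the sketch only shows the limit implied by the stated assumption.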
Figure A4.1: P-F intervals and [...] from "normal" conditions
monitor, as follows:
- Dynamic monitoring detects potential failures (especially those associated with
[...] equipment) which cause abnormal amounts [...]
to be emitted in the form of waves such [...] and acoustic effects.
- Particle monitoring detects potential failures which cause discrete particles
of different sizes and [...] the environment in which the item or
component [...]
- [...] failure effects encompass [...] in the appearance or structure of the
[...] which can be detected [...], and the associated monitoring [...]
detect potential failures in the form [...]
the visible effects of wear and dimensional [...]
- Temperature monitoring techniques look for potential failures which cause a
rise in the temperature of the equipment itself ([...] to a rise in the
temperature of the material being [...] the equipment).
- Electrical monitoring techniques look for [...] electrical [...]
conductivity, dielectric strength and potential.
3 Dynamic Monitoring
A Preliminary Note on Vibration
spectrum
each individual component in order to
However, the situation is COl11i'.)l1catctt
acceleration. So step
rneasured and what .-n ,:., ,:-, c nr, t1 n
is to decide which techn1qt1e
now called
systems in vibration
The role
systems can now find and >.H<<,::,1 1u,:,,__ t'il',-,l"'\ ! f'.> l'rH:
vibration , 1 1 1 .d t v , , ,
re2to111gs with al] !:he data from. """"'
of this part of this
The
al vibration
in
;e;,..
of failure
of two parts: a
anical vibrations i nto
to
Skill: To
the nature
much lower and contribute very
,'1 f:, ,H,u,,.,. When these
do grow the
of deterioration. Difficult to set
3.2 Octave Band Analysis
Conditions
on a recorder
Skill: To
trained technician
to use w he n the measurement parameters have been
Good detection
en9:ir,eer: Portable:
Condition
P-F
,,c,ru , s ,,,c,rf
to , n t,,'>r'l",,rnt
P-F interval:
and vibration
the
'""" '" ''"'"""'" a suitably trained skilled worker, to interpret the
results an ,;,. v ,,..ttttc>t-.,,,,,,1 technician
can be done in 'real time' and is therefore faster than FFT
and does not suffer from certain pitfalls caused by the batch nature of
FFT, such as loss of data [...]. CPB spectra are very [...] for rapid
fault detection. Equipment [...]
measurement and
level
Condition
To reduce the
of the waveform.
variable
pass. low pass and band
P-F
to mon ths
bank of
rolls on
skilled worker.
,t the resu lts
t"\uf;.:,.t'\f ,:,,n
>
consi derable
nt ,;:u-r..
I
'
'.l'l''l<> I U ,.' C> r"t
wear, imbalance,
"''''"' <.,vr,1.,c,, bel t drives, compressors, """- "'" "' roller bear
vv,,u , u.-,,,,, electric motors, pumps, turbines, etc
""
the
to be trended
v l H,UlF;,'-'c>
s mall
stage
to months
Condition
of the transform,
Skill: .. ,,,,m mwe1sta1:wu1g
P-F
Skill: A
Several
to months
shafts and
or metal
systems
0-1
Skill:
Reveals some faults that may have gone undetected in their earlier
stages or which are buried in the noise floor
of the vibration spectrum.
More consistent than demodulation. Outputs are independent of machine [...]
and instrument Fmax.
Applicable to a broad range [...] from
very slow [...] in excess of 1 kHz
loose
high
Condition
and
3.13 Proximity
etc
P-F interval:
to
Skill: A
Portable:
tin1e: LJl!l!!IlOSllC
P-F
technici an
size and
i nformation
measurement Limited to rol ler element nP 'lrt1, oc
to
3. 15 Ultrasonic Analysis
Conditions monitored:
wear,
c aused by
Applications: Leaks in pressure and vacuum systems (ie. boilers, heat
condensers, chillers, distillation columns, vacuum furnaces,
gas
steam traps: valve and valve seat wear: pump
of seals and
the
pipe or tank leaks
,.nn uu.u;::.-.,,
Skit!: A
Condition
can
3.16 Kurtosis
Conditions monitored: Shock
element
P-F interval:
Skill: A
to months
excavations
fonnation
rials. Irrelevant electrical and mechanical noise can interfere with measurements.
Gives limited information on the type of flaw. Interpretation may be difficult
4 Particle Monitoring
4.1 Ferrography
Conditions ,nonitored: Wear,
transmissions,
P-F interval:
several months
is dil uted w ith a fixer solvent (tetrachloro
slide under the influence of a
of the
the
gas turbines,
Condition
P-F interval:
Skill:
months
Requires
P-F interval:
Condition
a
to
in ..........-.. ,,.,..... n-
d n n ! <HJP
Skill: To
microns
of the contaminants.
oil caused
wear,
oil systems
lo months
P-F
n umber of translucent
also vary
nmr-tu l ,;,. c in the
beam. Provides no information on the chem ical comJ)0;1t1on of contami
Dilution
avoid coinci
f6r
dence t:ITOr where
appear as one
4.8 Real Time Ferromagnetic Sensor
compressors and
P-F interval: Weeks to months
n n ,HH I T>I
On-line tecltmuwe
I A <'
Condition
to
the
'"''''"
4.10 Graded Filtration
Conditions monitored: Particles
corrosion and contaminants
turbines, transirrn:sP-F interval:
weeks to months
A small amoun t
distribution
materiat or dirt
'"'""''-''''
of the capi ndicates imminent failure. The debris
colour, and texture) aepe1au1111g
H HvA
v (, CU H H ,<Hl"JJl
of
compressors and
to few
Condition
P- F
to
Skill: A
Portable
Sediment
D- 1698)
in transfonners, breakers, and cables
P-F
be taken offline
has
4.15 LIDAR (Light Detection And Ranging)
trained
be conducted in
Conditions
smoke from smokestacks
P-F
areas
level
skill.
5 Chemical Monitoring
A Preliminary Note on the Chemical Detection of Contaminants in Fluids
to detect elements i n
POl[enrnu faiiure has '"""''"' r,d
of the fluid
Condition
Wear metals: the
Tin from
turbine aircraft
contamination
Aluminium from
Boron from coolant leaks in oil
Calcium when found in fuel
Copper from oil cooler
Corrosion: the
Aluminium from
Iron
u,11 1 r. 1u 1 n n ""'""''""'Yt'
in
in oil
Conditions
silicon: corrosion
lead,
zinc,
turbines, transmis-
compressors and
to months
!ll'fC> f V I P t1,P
of a
fla me ()f other
of
the
converted into
the results: an
microns.
5.5
Fluorescence Spectroscopy
turbines, transmissions
P-F
months
which
4: Condition
P-F interval:
months
Skill: To draw
line scans
caused
and
P-F
Several months
'"""''", ,L,,,
to two
mm
1816) apart, until breakdown
and trended. Five breakdowns
with one
an electrician.
conduct
trained
test:
Transformer does
to be taken offCt.>rll<' H H /P to
trained
OJ!
Condition
DIAL
Conditions monitored:
in
anti-knock
in
and/or m:;oc:rs,mt
Calcium from
- Chromium from an anti-oxidant in
Cobalt from
trace
in some lubricants
- Nickel
vanadium
natural
oils
in crude oils
in
with
in
anti-wear
systems,
to months
P-F interval:
Condition
u n ., a v n n,,
P-F interval:
5.12
P-F
such
nnd
.such
Fourier transform infrared spectrometer to record
eluted from the column. Different detectors
different
volume):
insolubles).
transmisinterval:
Skill:
months
Condition
interval: Months
Skill: To
f\/liPr,nc.t,un,
L,r,,,.,.,,.,.,,,,d i,trl':CIJt><
P-F
5.16
P-F
and creates a
ion then
atom. The
fills the core hole and a third
,nn c,::>r\./P energy This electron has
etc
Months
trained technician
msoe<;ttcm unless this
to do so
nlPl<.:li!rf>
the extent or
location of corrosion:
Internal combustion
Condition
P-F interval: Weeks to months
UUUU U H '
oil
be
chemicals used in
the
Maintenance
turbines,
Weeks
months
Skill:
within
for petroleum based
only
oils.
within
chem-
Condition
Skill:
it cnames
it dnrrr>l'lt::<'
"''"'i'><'1l"\/ 11nn1rn1,;,r,;:
other aspects of
Water also
system
rusts and corrodes metal "'"'''""'''"'"
follows:
wear
- it gums up valves and
it shortens the life of filters
it entrains more air, which
bulk modulus.
D-1744)
interval: Days to weeks
1n,:,rntu1,11 A measur ed
is reacted with Karl Fischer reagent which
contains iodine. When iodine is
current will pass between
with the
electrodes. Moisture entrained in
which has not reacted with the iodine remains. Once
the
the iodine and the test is
The
used to determine the titration
point
and calculate the water concentration. The duration of the test indicates the water
transformers should not exceed
at 20°C
content
water
per million).
Oils
compressors,
4:
5.25 Crackle Test (Human
Water in oil
P-F interval:
Skill:
of water present.
detector)
to
Test
low as 25
1 0 0()0
a more
mix),
and economical .
Oil colour can
6
6.1 Liquid
Penetrants
discontinuities or cracks due to
corrosion stress
heat-treatment, corrosion
machined sur
structures, compressor receivers
P-F interval:
a,cc1Jrctm2. to
fluorescent or dual sento remove them from the test
penetrants) and the
post-emulsified or solvent
4: Condition
Conditions
P-F interval:
6.3
Conditions monitored: Surface and n,,,..,.,.._,.,,.,
embrittlement
interval:
Fluorescent
sprays
should be carried out under ultraviolet
Skill:
6.4
Conditions monitored: Surface discontinuities and
heat tre,atn1e11tL '"''"''''''"'""' cn1hriltlen1er1t.
P-F
months,
co1ntam1ng fine iron oxide parmspec:ncm and a
field
Provides a record.
:vu
several months.
..._,.,.L1.,l,,,'1'.
1VA-J.!_;i.,J.g
Skill: A
the
of materials.
echo technique.
rec:tmt1u1n e: Problems of modulation associated
to be obtained.
,,,::,,,:,,--i ,.,.....
Condition
6.7 Ultrasonics Resonance
Conditions monitored:
(Also used for
ll/l./ l l !...IA v , ,
Discontinuities
for
6.9
Conditions monitored: General and localised
mills.
P-F interval: Several months.
stream for a
are removed and checked for
several months)
these measurements, relative metal loss from the
can be estimated.
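The estimate mentioned above can be sketched as a simple unit conversion. This is a minimal illustration only: it assumes the measurement is the mass lost by an exposed test piece, which the surviving text does not state explicitly. The function name, parameter names and the sample figures are hypothetical and are not taken from the original.

```python
# Hypothetical sketch: converting measured mass loss into an average
# penetration rate. All names and sample values are illustrative assumptions.

HOURS_PER_YEAR = 8760.0
MM_PER_CM = 10.0


def corrosion_rate_mm_per_year(mass_loss_g: float,
                               area_cm2: float,
                               exposure_hours: float,
                               density_g_cm3: float) -> float:
    """Average metal loss rate in mm/year.

    thickness lost (cm) = mass loss / (exposed area * density)
    rate (mm/year)      = thickness lost * 10 mm/cm * 8760 h/year / exposure hours
    """
    if min(area_cm2, exposure_hours, density_g_cm3) <= 0:
        raise ValueError("area, exposure time and density must be positive")
    if mass_loss_g < 0:
        raise ValueError("mass loss cannot be negative")
    thickness_lost_cm = mass_loss_g / (area_cm2 * density_g_cm3)
    return thickness_lost_cm * MM_PER_CM * HOURS_PER_YEAR / exposure_hours


if __name__ == "__main__":
    # Assumed example: 0.5 g lost from 20 cm2 of steel (7.86 g/cm3) in 90 days.
    rate = corrosion_rate_mm_per_year(0.5, 20.0, 90 * 24.0, 7.86)
    print(f"Estimated average rate: {rate:.3f} mm/year")
```

The result is an average over the exposure period, so it says nothing about whether the loss was uniform or localised; trending successive exposures is what reveals a change in rate.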
Skill: A
trained technician.
100 kHz
4 MHz induces
meter.
work with
c h a rt .-, ,-. . -,,,,----
materials.
6.11
structures, metallic
pumps, shafts,
P-F
months.
parts or structures
crack-like
Two-sided access
Condition
6.12
6.13
Conditions monitored:
and their
defects, corrosion,
P-F
Skill: A
6.14 Cold
Probes
Skill: As
!nSrPTf"> P t'"
to
Skill: As
As for
unit is trans,.,.,...,.,,,"'r from a cold l ight
a flexible fibre cable into
built into its \Vhich
The instrument can be
to take a detailed ..-,n., u,-:n,"
or cine cameras,
Condition
6.17 Electron
Conditions monitored: The
circumstances of
and
6.18 Colour
Conditions monitored: Oil colour and condition
interval: Weeks to months
6.19 Oil
Conditions monitored: Oil oxidation, water contamination,
and
contamination
oils
contamination.
No test
Particles less than microns cannot
the
and source of contamination cannot
6.20 Oil Odour
Conditions monitored: Oil oxidation
,n n ,T,CHH
worker
Skill:
6.18 Strain
Strain
tunnels, the load
4: Condition
P-F interval:
to months
Resistance
foil and semiconductor
"""" . ..,,, that vvhen an electrical
1
" " /,,,-,,.,.
of
a nd
increased internal and
due to poor lubrication and reduced
,;, f j , r, ,t u n "1,'-P
Monitor
6.22
P-F interval:
weeks to months.
computer for
Skill:
months.
,,,,,="""'"C,"i to a reference oil . Identical baHs are
and the reference oil. The time
r. r. . . . ,r1
Skill: A trained
to
Accurate to within
in most
needs to be translucent
Oil
falls, dark or oxidised oils may be unsuitable. Not a field
it
v l l,,",Jll\.,,
transmis-
Flammable
Condition
7
A Preliminary Note on
time.
in
ation. It i::; based on the n .., ....,.,,,.1."
emit infra-red radiation. Thermal ,,"Y,un,n,Y
visible
paper,
metals,
welds, buried steam lines, steam traps,
kilns, tyre
and rubber manufacture.
P-F interval: A
and
detectors. The detectors
amount of current
then nr,'V'>CC.>i1 an on- boa rd """'" ' c,u
ted on a view finder or monitor as
3
Skill: A
7.2 Focal Plan
Electrical: currentiresistance re1auonsm1 os fonn
of
lubrication,
P-F
radiation onto a
of {'ipr.:,,,. trc
and thermal resolutions that were
unknown.
composed of many small elements. The detectors convert the
which is
and
into a visible
Condition
Skill: A
7.4
Conditions monitored: Surface temperature.
Hot spots, insulation failure.
P-F interval:
to
Permanent record
on
the
Colours do not
temperatures: Service life of each
colour in the interim).
not
fixed
Skill:
trained technician.
a
and
and direct indication of corrosion
corrosion
Some instruments record the
1,.,. 1 0
sion condition: Automatic and
Sensitive to corrosion
nr..
results.
fraction of a mil per
tor,, ,L:>nrq, ; l'\i'l ,:[i l< l lf'P<.:
nr.,... ,, ,,
1r11pt
total corrosion).
gas transmission
water distribution
paper mills,
abrasive
electrical
HH,n , t r.,r, n n
P-F
nnEH''!<;'..l1lH'\n
converted
corrosion rate.
method.
time
"-''--F"-'''"'''
Skill:
can be influenced
A
filled oil transformer should have a
in-service oil filled transformer
field technician.
One of the
Not an
tests.
circuits.
P-F
current flow. If there is no current
eqiu1i::,m1::m under
this current must be
cmTent' . The insn caHed
Skill: Technicians
Test
and bounce.
circuit breakers.
and medium
months.
necessary.
data:
breakers can benefit from this test.
weeks.
the contacts. The
calculated
4: Condition
Resistance values
ures before
interval: Several
and
circuit unbalances
of the motor
indicates imbalanced magnetic
\Xll1,t11l H1'C
r
and/or
the
and
field technician,
'""'''""''- and minimum c urrent test
which
non-destructive. u1mtwc:u,,1nt and
can be used in
the field. Tests can be done at the MCC, requiring no break in motor connections.
deterioration
ment can also n;:.,ri"nrn,
Cannot evaluate one coil by itself.
complex and
determine the location and
of a fault.
8.10 Motor Current Signature Analysis
resistance
Conditions monitored: Broken rotor
or
between bars and
uneven rotor-stator air gaps, rotor misposition, deteriorated
or shorted rotor or stator core lamination.
Applications: AC or DC motors.
,-,.n">,nra l u
and
been HY\t"lrr."""''
the spectra (this has
interpretation standpoint). Equipment
P-F interval:
weeks to months,
attached to motor feed lines either at the Motor Control
conditioning unit condi
sensed from the feed lines. Data files
to
deviations.
Skill: To attach
and interpret the
A
occurs when a small void, crack, or
larity in an insulation system
electric field to build up. Sensors are used
sensor is connected between the grounded
to pick-up the PD. On
CT circuit. On cables, the sensor connected around the
side of the
around the insulated conwire that connects the cable shield or
are placed on the motor frame or around the ground
ductor. On motors,
three issues
connection or around the insulated motor lead. In
are considered:
and the magnitude of the pulse (field
the number pulses per
coulombs)
in
for
to set PD thresholds.
A flux coil sensor is placed at the centre of the axial outboard end of
the motor. (Consistent positioning of this sensor is essential for reliable and
trendable data.) The signal [...] from the sensor is transformed into the frequency
domain by an FFT analyser. A trend of certain magnetic flux frequencies will
indicate electrical [...] associated with the rotor and stator windings.
Most of the [...] in a flux coil spectrum occur at frequencies which have some
relationship to running speed. Broken rotor bars increase the sideband activity
around [...]. Unbalanced supply [...] (which causes motor [...]
and eventually leads to premature deterioration of the stator windings) shows no
[...] except around the [...] occurring at line frequency + 1 x RPM. One of
the first faults a winding will encounter is turn-to-turn shorts, which then
[...] into [...] or phase-to-ground shorts. A winding fault can be indicated
around the 3 x running speed sideband of line frequency. A variation of this
technique is used to detect turn-to-turn shorts by looking at the family of 'slot
pass' frequencies from measurements taken with a flux coil. Flux measurements
are taken as mentioned above, and the [...] is analysed at the 'slot
pass' frequencies. The principal slot pass frequency occurs at the product
of the number of rotor bars and running speed. The technique involves comparing
spectra over time to determine when changes occur.
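The frequencies of interest described above can be worked out before a spectrum is examined. The following is a minimal sketch, not part of the original text: the function names, the 40-bar rotor, the 1800 RPM running speed and the 60 Hz supply are all assumed values chosen for illustration. It expresses running speed in Hz (RPM divided by 60) so that every result is in the same unit as the analyser's frequency axis.

```python
# Hypothetical sketch: locating the frequencies discussed in the text.
# Function names and the example motor data are illustrative assumptions.


def running_speed_hz(running_speed_rpm: float) -> float:
    """Convert shaft speed from revolutions per minute to Hz."""
    if running_speed_rpm <= 0:
        raise ValueError("running speed must be positive")
    return running_speed_rpm / 60.0


def slot_pass_frequency_hz(rotor_bars: int, running_speed_rpm: float) -> float:
    """Principal slot pass frequency: number of rotor bars x running speed."""
    if rotor_bars <= 0:
        raise ValueError("rotor_bars must be positive")
    return rotor_bars * running_speed_hz(running_speed_rpm)


def line_frequency_sideband_hz(line_frequency_hz: float,
                               running_speed_rpm: float,
                               multiple: int = 1) -> float:
    """Sideband at line frequency + (multiple x running speed)."""
    return line_frequency_hz + multiple * running_speed_hz(running_speed_rpm)


if __name__ == "__main__":
    bars, rpm, line_hz = 40, 1800.0, 60.0  # assumed example motor
    print("slot pass frequency (Hz):", slot_pass_frequency_hz(bars, rpm))
    print("line + 1 x running speed (Hz):",
          line_frequency_sideband_hz(line_hz, rpm, 1))
    print("line + 3 x running speed (Hz):",
          line_frequency_sideband_hz(line_hz, rpm, 3))
```

Because the technique relies on comparing spectra over time, the same calculated frequencies would be used as fixed markers on each successive spectrum so that changes in amplitude at those points can be trended.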
Skill: To record the spectrum: an electrician/technician with an understanding of
motors. To interpret the results: an engineer.
Advantages: One of the few techniques that can detect faults associated with electrical
insulation of electric motors while the motor is online.
Disadvantages: High
results.
,,>rn,r,;,f
tO ,nt
: ...1.,,F'"'
Skill: Field technician.
the
test can be
is low level and 'rides' on top of the DC of
Disadvantages: Test could take
time on
9 A Note on Leaks
With the exception
covered in much detail in this
storage tanks. This is because
which 1'""" ' 1 11 ,,
d.,.,,,,,..,, nt,,-r, of 36 different leak detection methods
and commissioned
Laboratory, Edison,
L>, .,d--n
-r,,,h
Index
Atomic emission
AE:
Abrasion: 59
Accelerometer: 351
Acceptable:
296
looks:
risk: 30, 98-101, 256,
Acoustic emission:
Actuarial analysis:
A.ct1nm:1steJnng RCM:
Age-related failures: 11, 131-133, 160-161, 235-238, 243-246
Air
Association of America
(ATA):
Aircraft accidents: 309
Airlines and RCM: 318-321
366-367
All-metal debris
Analysis paralysis: 65
Analytical ferrography:
Anthropometric
335-336
Anxiety: 40
Appearance: 40
n nruut rHY RCM : 1 6- 1
8,
Asset hierarchy: 327-330
Atomic absorption spectroscopy: 374
Atomic emission spectroscopy:
Attentional failures:
Attractiveness:
Audit trail: 20, 316
274,
Auditing RCM: 18, 214-217,
276
Availability: 294-304,
of hidden functions: 115-118,
12,
Bathtub
Battery 1n11Je(lam:e
Benchmarking:
Benefits of RCM:
Car:
169, 206
Combination
Comfort: 40
consequence evaluation: 11,
Consequences of failure: 10-11, 15, 71, 90-127,
lV(H(l;UlCe : 9 1
categories: 10
Crankshaft
26.
34, 49
Criticality assessment: 280
Customer service: 3, 19, 29, 104, 201, 280
Cusum chart: 151
Damage:
Secondary damage
Database: 19, 267-268, 315-317
Data: 255-260
failure:
Technical history data
Debottlenecking: 62-63
enaoscicme: 394
,. nui:cuu
.
u . 200-201, 267
Decision support: 5
Decision Worksheet: 198-211, 267,
Default actions: 14, 91, 170-197
Defect rer>oritmii:
Design
Desired performance: 23-24, 130, 189,
256
Detective
17J
Deterioration: 58-50,
Device:
in maintenance schedules:
the ultimate:
Control:
Control limits:
Corrator: 401-402
Corrective maintenance: 171
39 1
1 9,
maintenance: 278. 305
operating: 104, 20 I
repair: 104, 108-110,
test:
Craftsman:
Maintenance craftsman
see Maintenance effectiveness
Failure effects
Effects:
294-295
energy: 294
maintenance;
time: 230-23 1
Electrical
150, 350, 401-411
Electrical resistance meter: 402
Electrical surge comparison: 406
Electro-chemical corrosion monitoring:
382
worth
Failure modes: 9, 53-73,
270,
First Generation: 2
Flow orc)cesses:
Fluorescence spectroscopy (X-ray):
FMEA: 53-89
see also Failure modes and Failure effects
400
Focal plan
Fourier
Fourier transform infrared spectroscopy:
378
Fractional dead time: 117
Frequencies:
consolidation:
failure-finding: 175-185
Maintenance schedules
low:
Maintenance schedules
on-condition: 145-149, 163-165
scheduled discard: 138-140
scheduled restoration: 135
Frequency analysis: 356
Fuel line: 81-85
Functions: 7-8, 21-44, 215, 261-262
definition
22
different types: 35-44
Evident functions
evident:
hidden:
Hidden functions
8, 35-37
secondary: 8, 34, 37-44
superfluous: 43
329-333
Functional
Functional
Functional hierarchies: 329-333
Functional
297-304
Functional failures: 8-9, 45-52, 124, 155,
216, 262
definition: 47
Gas station:
Gearbox failure:
86-88
failure data: 79
Knowledge-based mistakes: 338, 341-342
Kurtosis: 361
Manufacturer
289
colour: 395
Oil
396
Once-off changes: 18, 220-221
11,
145-169, 205,
On-condition
definition:
intervals: 145-149
155-156
28-35, 50,
Operating
procedure: 18
15, 93,
Operational consequences:
310
259,
103-108, 170,
definition of: 104
193-197
Operations managers: 261-268, 291
()peratiOllS "' """"'nll,t.'Arc } 7 , 262-268, 29 1
262-268, 291
\JfJuu u , ,.;
311
also Scheduled restoration
UV(lflOlldlnJ!.: 6 1 -64
Oxidation: 134
144-145, 157-162, 343-349
linear: 160-161
and random
205, 256,
P-F interval: 145-149,
410
nett
vs operating
156, 160-162
Packaging
Pan-view fibrescopes:
280
Pareto
Partial discharge: 408-409
failure: 47
Particle effects: 150, 349,
Particle monitoring: 349, 362-370
282-285, 315
226-229
1ov.1tr,enutern:v schedules: 224-225,
230-232
time: 230
running time: 231
Poka yoke: 339
Pore-blockage technique: 364-365
Potential failure: 14, 144-145, 154, 155,
205, 256, 310, 348-349
definition: 144
Potential monitoring: 402-403
Potentiometric titration:
TAN/TBN: 383-384
TBN:
407-408
Power
Predictive tasks: 144-169, 171
Pressure relief valve:
Relief valve
Pressure switch: 125
Preventive tasks: 133-140, 171
Primary effects monitoring: 149, 152-153
Primary functions: 35-37, 125
PRN: 280
Proactive maintenance: 129-169
Proactive tasks: 1-14, 91, 102-103, 106,
107, 170,
combination: 169
hidden failures: 121
selection: 167-169
Probability
Mean Time between Failures
note on P96
Random failure:
1 40 , 1 43,
and P-F ,,,1,,,.,,,,, 1,,. 1 56
Raw material supply:
RCM:
Maintenance
RCM:
protective
redesign: 192
maintenance: 188-189
non-operational consequences: 109-110,
193-197
Scheduled restoration
17, 266-269
Review
Rigid borescopes: 393
Risk: 95-101
acceptability: 98-101
continuum of: 118-120
economic: 119,
346
evaluation: 101
fatal: 98-99,
of failure: 69, 159, 335-341
Root
Rotating disc electroche;:
Routine maintenance:
and hidden functions: 120-122
reduction:
342
Routine violations:
340-341
Rule-based mistakes:
Run-to-failure: 11, 14
Running time:
S -N
SPC: see Statistical
control
Safety first: 93
30,
103
Safety legislation:
see Relief valve
and actuarial analysis: 251
Scale parameter:
Scanning auger electron microscopy:
381-382
Scanning electron microscopy: 381
Schedule: see Maintenance schedule
Scheduled discard: 11, 13, 137-140, 168,
205,
265
tn>l'H ll rli''-1 1 38- 1 40
>
worth doing: 140
Scheduled on-condition tasks:
On-condition
143
Scheduled overhauls:
Scheduled restoration
Spike
Staff turnover: 20, 316
Standard operating procedure: 221-222
Stand-by pump: see Pump
control: 151-152
Statistical
Steel mill: 310, 311
Strain gauges: 396-397
Stress:
Applied stress
Strippable magnetic film: 389-390
Structural integrity: 39
Superfluous functions: 43
Survival distribution: 237-245
27, 48
boundaries: 17, 270, 334
TQM: 21, 288
Task:
Proactive
proactive:
packaging: 223
proposed: 206-207
selection process:
169,
Teamwork: 20, 268, 315
Technical:
characteristics: 15, 129
Technically feasible
259-260
Technically feasible: 14, 90-91, 129-130,
205, 324
failure-finding tasks: 185
on-condition tasks: 149
scheduled discard
140
scheduled restoration tasks: 135-136
Techniques maintenance: 5
150, 350, 399-401
Temperature
Temperature indicating paint: 401
Templating: 281
Thermography: 399
Thin-layer activation: 380
Third Generation:
307
Time and hidden failures: 124-125
Time synchronous averaging analysis:
355-366
Time waveform
Total failure: 47
Total quality management:
TQM
Traditional view of failure: 11
to maintenance: 16
Traditional
Vibration switch: 115
221
Yield: