Guidelines For Application of The EPRI Preventive Maintenance Basis
Effective December 6, 2006, this report has been made publicly available in accordance with
Section 734.3(b)(3) and published in accordance with Section 734.7 of the U.S. Export
Administration Regulations. As a result of this publication, this report is subject to only
copyright protection and does not require any license agreement from EPRI. This notice
supersedes the export control restrictions and any proprietary licensed material notices
embedded in the document prior to publication.
SINGLE USER LICENSE AGREEMENT
THIS IS A LEGALLY BINDING AGREEMENT BETWEEN YOU AND THE ELECTRIC POWER RESEARCH INSTITUTE, INC.
(EPRI). PLEASE READ IT CAREFULLY BEFORE BREAKING OR TEARING THE WARNING LABEL AND OPENING THIS
SEALED PACKAGE.
BY OPENING THIS SEALED REPORT YOU ARE AGREEING TO THE TERMS OF THIS AGREEMENT. IF YOU DO NOT
AGREE TO THE TERMS OF THIS AGREEMENT, PROMPTLY RETURN THE UNOPENED REPORT TO EPRI AND THE
PURCHASE PRICE WILL BE REFUNDED.
1. GRANT OF LICENSE
EPRI grants you the nonexclusive and nontransferable right during the term of this agreement to use this
report only for your own benefit and the benefit of your organization. This means that the following may
use this report: (I) your company (at any site owned or operated by your company); (II) its subsidiaries
or other related entities; and (III) a consultant to your company or related entities, if the consultant has
entered into a contract agreeing not to disclose the report outside of its organization or to use the report
for its own benefit or the benefit of any party other than your company.
This shrink-wrap license agreement is subordinate to the terms of the Master Utility License Agreement
between most U.S. EPRI member utilities and EPRI. Any EPRI member utility that does not have a Master
Utility License Agreement may get one on request.
2. COPYRIGHT
This report, including the information contained in it, is either licensed to EPRI or owned by EPRI and is
protected by United States and international copyright laws. You may not, without the prior written
permission of EPRI, reproduce, translate or modify this report, in any form, in whole or in part, or prepare
any derivative work based on this report.
3. RESTRICTIONS
You may not rent, lease, license, disclose or give this report to any person or organization, or use the
information contained in this report, for the benefit of any third party or for any purpose other than as
specified above unless such use is with the prior written permission of EPRI. You agree to take all
reasonable steps to prevent unauthorized disclosure or use of this report. Except as specified above, this
agreement does not grant you any right to patents, copyrights, trade secrets, trade names, trademarks
or any other intellectual property, rights or licenses in respect of this report.
4. TERM AND TERMINATION
This license and this agreement are effective until terminated. You may terminate them at any time by
destroying this report. EPRI has the right to terminate the license and this agreement immediately if you
fail to comply with any term or condition of this agreement. Upon any termination you may destroy this
report, but all obligations of nondisclosure will remain in effect.
5. DISCLAIMER OF WARRANTIES AND LIMITATION OF LIABILITIES
NEITHER EPRI, ANY MEMBER OF EPRI, ANY COSPONSOR, NOR ANY PERSON OR ORGANIZATION
ACTING ON BEHALF OF ANY OF THEM:
(A) MAKES ANY WARRANTY OR REPRESENTATION WHATSOEVER, EXPRESS OR IMPLIED, (I) WITH
RESPECT TO THE USE OF ANY INFORMATION, APPARATUS, METHOD, PROCESS OR SIMILAR ITEM
DISCLOSED IN THIS REPORT, INCLUDING MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE, OR (II) THAT SUCH USE DOES NOT INFRINGE ON OR INTERFERE WITH PRIVATELY OWNED
RIGHTS, INCLUDING ANY PARTY’S INTELLECTUAL PROPERTY, OR (III) THAT THIS REPORT IS
SUITABLE TO ANY PARTICULAR USER’S CIRCUMSTANCE; OR
(B) ASSUMES RESPONSIBILITY FOR ANY DAMAGES OR OTHER LIABILITY WHATSOEVER (INCLUDING
ANY CONSEQUENTIAL DAMAGES, EVEN IF EPRI OR ANY EPRI REPRESENTATIVE HAS BEEN ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES) RESULTING FROM YOUR SELECTION OR USE OF THIS
REPORT OR ANY INFORMATION, APPARATUS, METHOD, PROCESS OR SIMILAR ITEM DISCLOSED IN
THIS REPORT.
6. EXPORT
The laws and regulations of the United States restrict the export and re-export of any portion of this report,
and you agree not to export or re-export this report or any related technical data in any form without the
appropriate United States and foreign government approvals.
7. CHOICE OF LAW
This agreement will be governed by the laws of the State of California as applied to transactions taking
place entirely in California between California residents.
8. INTEGRATION
You have read and understand this agreement, and acknowledge that it is the final, complete and exclusive
agreement between you and EPRI concerning its subject matter, superseding any prior related
understanding or agreement. No waiver, variation or different terms of this agreement will be enforceable
against EPRI unless EPRI gives its prior written consent, signed by an officer of EPRI.
Guidelines for Application of the
EPRI Preventive Maintenance Basis
TR-112500
EPRI • 3412 Hillview Avenue, Palo Alto, California 94304 • PO Box 10412, Palo Alto, California 94303 • USA
800.313.3774 • 650.855.2121 • askepri@epri.com • www.epri.com
DISCLAIMER OF WARRANTIES AND LIMITATION OF LIABILITIES
THIS DOCUMENT WAS PREPARED BY THE ORGANIZATION(S) NAMED BELOW AS AN
ACCOUNT OF WORK SPONSORED OR COSPONSORED BY THE ELECTRIC POWER RESEARCH
INSTITUTE, INC. (EPRI). NEITHER EPRI, ANY MEMBER OF EPRI, ANY COSPONSOR, THE
ORGANIZATION(S) BELOW, NOR ANY PERSON ACTING ON BEHALF OF ANY OF THEM:
ORDERING INFORMATION
Requests for copies of this report should be directed to the EPRI Distribution Center, 207 Coggins
Drive, P.O. Box 23205, Pleasant Hill, CA 94523, (800) 313-3774.
Electric Power Research Institute and EPRI are registered service marks of the Electric Power
Research Institute, Inc. EPRI. POWERING PROGRESS is a service mark of the Electric Power
Research Institute, Inc.
Copyright © 2000 Electric Power Research Institute, Inc. All rights reserved.
CITATIONS
Principal Investigator
D. H. Worledge
The report is a corporate document that should be cited in the literature in the following manner:
Guidelines for Application of the EPRI Preventive Maintenance Basis. EPRI, Palo Alto, CA:
2000. TR-112500.
REPORT SUMMARY
The EPRI Preventive Maintenance (PM) Basis project has developed detailed technical basis
documents to support PM tasks and task interval selection for 39 component types. This basis
information has many practical applications in the development, optimization, and justification
of PM program activities in power plants. This report provides step-by-step guidelines for using
the PM Basis information for six applications thought to provide the most benefit to nuclear
plant maintenance personnel. This report also provides a vision for future enhancements to the
PM Basis data to optimize its usefulness and value.
Background
EPRI has sponsored the development of reliability-centered maintenance (RCM) and streamlined
RCM for power generation and delivery systems. EPRI also introduced the concept of the Task
Selection Template, which greatly improves the quality, consistency, and efficiency of task
selection. Although streamlined RCM has proved to be a sound and cost-effective method for
maintenance task and interval selection, technical information needed to establish the basis for
the selected activities is not typically available to the plant maintenance professional.
In response to this need for technical basis information, EPRI sponsored the PM Basis project
beginning in 1997. The PM Basis project provides the user with the technical basis for PM tasks
and intervals by component type and gives information to adapt the intervals to plant conditions.
Task Selection Templates, a synopsis of the task content and intervals, and the reasons why these
choices are technically valid are presented for 39 major component types in a series of report
volumes. These reports are being widely used by EPRI-member utilities. Their use has led to
requests for EPRI to provide additional related products, including guidelines for application of
the PM Basis information.
Objectives
• To provide clear guidance on the use of the PM Basis information for six applications that
are thought to provide the most benefit to nuclear plant maintenance personnel
• To provide a vision for future enhancements to the PM Basis data to optimize the
data’s usefulness and value
Approach
The author of this report participated in all of the expert panels and report development for the
39 component types. He also assisted EPRI-member utilities, in collaboration with EPRI, to
apply the PM Basis information during its development. Based on this experience and continuing
communications with EPRI members who were applying the PM Basis information on their own,
the author produced step-by-step guidelines for applications that were identified as the most
useful by participants at an EPRI PM Optimization meeting in October 1998.
Results
This report contains specific guidelines for six discrete applications of the PM Basis information:
• PM Task Evaluation
• Interval Evaluation
• PM Audit
• As-Found Condition
• Task Deferral
• Cause Evaluation
This report also outlines an application of the PM Basis information to establish a performance
indicator based on the number of PMs allowed to exceed their task interval within an allowed
grace period.
EPRI Perspective
This work represents an important step in the full use of the PM Basis data developed by EPRI in
1997 and 1998. There is a continuing need, however, for a more integrated product that provides
all the content and functionality of these products but offers additional value. This need includes
the following:
• Efficient and integrated access to all of the PM Basis information from a single database
• Feedback from users to improve and supplement the existing PM Basis information
• Periodic updates of the PM Basis information including more component types, more basis
information for existing component types, and additional guidelines and analysis tools
Ongoing EPRI-sponsored activities are designed to accomplish each of the above objectives.
These activities are part of the Preventive Maintenance Information Repository (PMIR) project.
Also, there is a mistaken impression among some potential users of the PM Basis reports that the
primary value of the PM Basis information is for the development and update of RCM studies. In
fact, the need for the technical PM basis information is generally valid, regardless of the method
selected for task and interval selection. Furthermore, this information is essential for the
continuing optimization and evolution of the PM program throughout the life of the plant. This
report clearly shows the potential usefulness of the PM Basis information in the day-to-day
maintenance of power plant equipment.
TR-112500
Keywords
Preventive maintenance
Maintenance optimization
Component reliability
Power plant reliability
EPRI Licensed Material
ABSTRACT
This report contains specific guidelines for six discrete applications of the Preventive
Maintenance (PM) Basis information. The PM Basis information provides the user with the
technical basis for tasks and intervals by component type, and gives information to adapt the
intervals to plant conditions. Task Selection Templates, a synopsis of the task content and
intervals, and the reasons why these choices are technically valid are presented for 39 major
component types. The applications detailed in this report are: 1) PM Task Evaluation, 2) Interval
Evaluation, 3) PM Audit, 4) As-Found Condition, 5) Task Deferral, and 6) Cause Evaluation.
Participants at an EPRI PM Optimization meeting in October 1998 identified these applications
as the most useful. In an appendix, this report discusses an application of the PM Basis
information to establish a performance indicator based on the number of PMs allowed to exceed
their task interval within an allowed grace period. This report also provides a vision for future
enhancements to the PM Basis data to optimize their usefulness and value. These guidelines are
valuable for the development of a PM program and for the continuing optimization and evolution
of a PM program throughout the life of a plant.
CONTENTS
1 INTRODUCTION
1.1 Background and Perspective
1.2 Summary and Objectives
1.3 PM Optimization Flow Chart
A INTERVAL CHARTS
1 INTRODUCTION
A power generating plant is a complex facility that is most easily represented as an assembly of
systems, each designed to perform a discrete set of well-defined functions. Such systems are
composed of many components. The levels of component reliability and availability needed to
achieve the desired performance of system functions are inherent in the system design and
component selection and are sustained through maintenance of the components. Therefore, to
maintain the plant effectively, appropriate tasks and task intervals must be assigned to
components to sustain this inherent reliability and availability and so achieve the system
functions.
EPRI has been a leader in the electric power industry in developing reliability-centered
maintenance (RCM) for power generation and delivery systems. RCM is a structured approach to
task selection that focuses on:
• Maintaining important system functions
• Preventing important functional failures of critical equipment
• Selecting the most appropriate maintenance tasks and task intervals
RCM prescribes condition-directed tasks instead of time-directed tasks, calls for failure-finding
tasks on standby equipment, and recommends operational and design changes when an
applicable, effective maintenance task cannot be identified. EPRI sponsored the development
and validation of streamlined RCM methods to reduce the cost and duration of RCM analyses.
These methods have proven to be successful and continue to be widely applied in the industry.
The EPRI streamlined methods report, TR-105365, introduced the concept of the Task Selection
Template. The template has greatly improved the quality, consistency, and efficiency of task
selection. Furthermore, it provides an integrated set of preventive maintenance (PM) tasks and
intervals for consideration. Included are condition-directed, time-directed, and failure-finding
tasks including surveillance and operator round monitoring. When considered as a package, these
tasks represent an optimized set of activities to prevent critical failure modes, considering likely
degradation mechanisms and common causes.
Although streamlined RCM has proven to be a sound and cost-effective method for maintenance
task and interval selection, technical information to establish the basis for the selected activities
is not typically available to the RCM analyst. If the analyst develops or acquires this technical
basis information, it is not typically documented as part of the RCM study. For example,
documentation usually does not include failure causes or degradation mechanisms. It almost
never documents failure locations, important degradation influences, failure timing, discovery
opportunities, or the importance of duty cycle and operating environment of the equipment.
Although this discussion has centered on a maintenance program developed using RCM, the
need for the technical PM basis information is generally valid, regardless of the method selected
for task and interval selection. Furthermore, this information is essential for the continuing
optimization and evolution of the PM program throughout the life of the plant.
In response to this need for technical basis information, EPRI has sponsored the PM Basis
project, which began in 1997. The PM Basis project provides you with the technical basis for PM
tasks and intervals by component type and supplies information to adapt the intervals to plant
conditions. A recommended PM program (the Task Selection Template), a synopsis of the task
content and intervals, and the reasons why these choices are technically valid in a variety of
circumstances are presented for 39 major component types in a series of report volumes. A
thorough description of these reports as well as the structured way in which this information was
developed is contained in the PM Basis Overview Report, TR-106857.
These reports are being widely used by EPRI-member utilities. They are used as a resource
during PM basis development, as a part of the periodic assessment of PM program effectiveness,
and during the evolution of the PM program as a result of plant experience.
Their use has led to requests for EPRI to provide additional related products. These products
include guidelines for application of the PM Basis information, an electronic version of the PM
Basis information with an efficient user interface, and updates and enhancements to the PM
Basis information in the future.
This report contains specific guidelines for six discrete applications of the PM Basis information.
These applications are:
• PM Task Evaluation
• Interval Evaluation
“I want to know if the task interval is about right or how it should change.”
• PM Audit
• As-Found Condition
• Task Deferral
• Cause Evaluation
“How can I use these data for cause evaluations and corrective actions?”
Guidelines for each of these applications are included in this document as stepwise instructions,
including specific links into the PM Basis reports and data tables and references to tables and
charts developed from the PM Basis data specifically for this guideline. These latter tables and
charts are included as appendices in this report. Several guidelines refer to the PM Optimization
Flowchart in Figure 1-1 at the end of this section. Instructions within a guideline to go to a
specific reference in the EPRI PM Basis reports are noted in bold typeface with a section
reference in brackets, such as Correlation Table [EPMB 2.1].
One specific application, performed by the author of this report and supported by EPRI, occurred
during the development of these guidelines. This application, “A Strategy to Manage PM Tasks
within a Grace Period,” employed a sophisticated use of the PM Basis information and resulted
in guidance for cost-effective use of the number of PMs in the grace period as an indicator of PM
program performance and effectiveness. At the same time, it provides justification for many
nuclear plants to relax their severe requirement to complete effectively all required PMs prior to
entry into the grace period. Because of the generic value of this application result, it is included
as an appendix in this report.
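As a rough illustration of the grace-period indicator described above, the following Python sketch counts how many PM tasks are past their nominal interval but still inside a grace period. The PMTask record, its field names, the task names, and the 25% grace fraction are all assumptions made for this example; none of them is taken from the PM Basis reports or from the appendix.

```python
from dataclasses import dataclass


@dataclass
class PMTask:
    name: str             # hypothetical task name
    interval_days: int    # nominal task interval
    days_since_done: int  # time elapsed since the task was last completed


def grace_status(task, grace_fraction=0.25):
    """Classify a task as 'on_time', 'in_grace', or 'overdue'.

    grace_fraction is an assumed parameter: the grace period is taken to be
    this fraction of the nominal interval.
    """
    grace_limit = task.interval_days * (1.0 + grace_fraction)
    if task.days_since_done <= task.interval_days:
        return "on_time"
    if task.days_since_done <= grace_limit:
        return "in_grace"
    return "overdue"


def grace_indicator(tasks, grace_fraction=0.25):
    """Return the fraction of tasks that are currently in the grace period."""
    if not tasks:
        return 0.0
    in_grace = sum(
        1 for t in tasks if grace_status(t, grace_fraction) == "in_grace"
    )
    return in_grace / len(tasks)


# Hypothetical data for illustration only.
tasks = [
    PMTask("Pump A oil sample", 90, 80),
    PMTask("Valve B stroke test", 180, 200),
    PMTask("Motor C inspection", 365, 500),
    PMTask("Breaker D overhaul", 730, 800),
]
```

Calling grace_indicator(tasks) on this sample gives the share of the PM population that is being carried in the grace period, which is the kind of quantity the appendix proposes to track over time.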
The PM Basis reports and these applications guidelines are complementary products. There is a
continuing need, however, for a more integrated product that provides all of the content and
functionality of these products but offers additional value, including
• Efficient, integrated access to all of the PM Basis information from a single database
• Feedback from users to improve and supplement the existing PM Basis information
• Periodic updates of the PM Basis information including more component types, more basis
information for existing component types, and additional guidelines and analysis tools.
Ongoing EPRI-sponsored activities are designed to accomplish each of the above objectives.
These activities are part of the Preventive Maintenance Information Repository (PMIR) project.
The PM Basis information will be brought together by including all of the existing PM Basis
information within a single relational database. A user interface will be built to enable access to
the information from user-friendly screens and menu selections. More sophisticated searches,
queries, and reports will be available. The application guidelines will be integrated with the PM
Basis Database so that you can execute the application completely from within the PMIR
software. The PM Basis Overview Report will also be accessible from within the PMIR product,
enabling you to access important definitions and explanations during use of the Database.
Feedback from users will be sent by electronic messages to the EPRI PMIR coordinator directly
from the PMIR application. Input from users will be requested any time that you have alternative
or additional information to add to the PMIR Database, especially if that information results in
selection of tasks or intervals at odds with the PM Basis Task Selection Templates.
Periodic updates of the data will be performed by EPRI to incorporate consideration of all
feedback information from users as well as any additional analyses performed or secured by
EPRI that enhance the basis for PM tasks and intervals. These updates will be issued in such a
way that users can critically review the changes for their own use as they are made available.
This important feature will enable you to maintain configuration control of your own PM Basis
documentation based on the EPRI PMIR information.
Figure 1-1 shows how the EPRI PM Basis Database can assist with PM task optimization.
[Flow chart. Box text recovered from the figure; connecting arrows are not reproduced.]
• Start
• Has the Failure History Been Satisfactory? (decision box; appears on two branches)
• Retain Existing Tasks and Intervals, But Check Template for More Cost-Effective Choices.
• Retain Current Tasks and Intervals.
• Are the EPRI Tasks Cost Effective and Acceptable? (decision box)
• Adopt the EPRI Tasks and Intervals.
• Review Failure Causes to Find Additional Failure Mechanism. Add Task or Improve Task Execution.
• Perform PM Task Evaluation and Interval Evaluation.
Figure 1-1
PM Optimization Flow Chart
2 PM TASK EVALUATION GUIDELINE
2.1 Context
Before proceeding with PM Task Evaluation, you should examine the sample procedure
described in Figure 1-1. This procedure provides an approach to PM optimization that considers
criticality, failure history, vendor recommendations, current PM tasks, and the acceptability of
the EPRI recommendations. The flow chart references both PM task evaluation and interval
evaluation and serves as an introduction to how plant PM optimization can use these individual
processes.
PM Task Evaluation selects and evaluates PM tasks to provide adequate protection against
component failures. Evaluation of task intervals is done separately using interval evaluation,
which is described in Section 3. Evaluation of a PM task should be carried out in the context of
other tasks that are being performed on the component.
The main reasons for these evaluations could be 1) to optimize tasks and intervals as part of a
programmatic improvement in which a large number of tasks and components are addressed or
2) to change individual tasks in response to poor performance, or a finding of poor or
consistently good material condition. Evaluating the tasks follows the same general process,
regardless of the reason for the evaluation.
There are also three initial questions that should always be considered when evaluating a PM
task, but they cannot be answered by the database. They all presume that a specific task is being
evaluated.
• Is this task required to satisfy the plant operating license, such as a Technical Specification,
EQ, Appendix R, etc.?
• Is this task required by a relevant code or standard, such as ASME Section XI?
• Is this task part of a management commitment to regulators, insurance, etc.?
If the answer to any of these questions is Yes, proceed with implementing the task in the short
term. Further task evaluation, below, may suggest that changing the task requirement would add
value.
The PM Task Evaluation Process has a short version and a detailed version. The short version is
described first. The Short PM Task Evaluation Process is recommended for routine applications,
especially when evaluating a large number of components. For individual components that
present particular problems, the additional detail of the Detailed PM Task Evaluation Process
may be justified.
Each question is addressed in two parts. The first part (see Section 2.2.2) is the interpretation of
the question and explains what the issue is about and why it is important. The second part (see
Section 2.2.3) shows you how to get the answer to the question, most often by reference to the
EPRI PM Basis Reports. These references are abbreviated thus: [EPMB 3.2] to mean that the
material can be found in Section 3.2 of each EPRI PM Basis Report.
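Where guideline text is handled electronically, the bracketed reference convention can be picked out mechanically. The short Python sketch below is one possible way to do that; the pattern, the function name, and the sample sentence are assumptions for illustration and do not describe any EPRI software.

```python
import re
from typing import List

# Matches references such as "[EPMB 3.2]" and captures the section number.
EPMB_REF = re.compile(r"\[EPMB\s+(\d+(?:\.\d+)*)\]")


def epmb_sections(text: str) -> List[str]:
    """Return the section numbers of all [EPMB x.y] references in text."""
    return EPMB_REF.findall(text)


# Hypothetical sentence using the reference convention.
sample = "See the Correlation Table [EPMB 2.1] and the material in [EPMB 3.2]."
sections = epmb_sections(sample)
```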
This section explains what is meant by each question and why it is important.
The objective of PM could be to prevent all failures of a component or to limit the number of
failures to an acceptable level. Components with these two PM objectives are classified in PM
optimization processes as critical and non-critical, respectively. Both types of components lead
you to consider performing some PM tasks. If neither objective applies to this component, it
should be classed as run-to-failure (RTF), which means that no PM tasks of any kind will be
performed on it.
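The classification in this paragraph can be sketched as a small Python function. The enum and the two boolean inputs are hypothetical names chosen for illustration; deciding which PM objective applies to a component remains an engineering judgment that the code does not make.

```python
from enum import Enum


class PMClass(Enum):
    CRITICAL = "critical"
    NON_CRITICAL = "non-critical"
    RUN_TO_FAILURE = "run-to-failure"


def classify_component(prevent_all_failures: bool,
                       limit_failures: bool) -> PMClass:
    """Map the PM objective for a component to its classification.

    prevent_all_failures: the objective is to prevent all failures.
    limit_failures: the objective is to limit failures to an acceptable level.
    If neither objective applies, the component is run-to-failure.
    """
    if prevent_all_failures:
        return PMClass.CRITICAL
    if limit_failures:
        return PMClass.NON_CRITICAL
    return PMClass.RUN_TO_FAILURE
```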
The PM objective for critical components is to prevent all failures that are known to occur and
are expected to occur at least once in the life of the equipment. Critical components must have
sufficient PM coverage in scope and frequency to address a wide spectrum of possible failure
mechanisms and certainly all of the common failure mechanisms.
In some cases, you may not be able to find cost-effective PM tasks for a critical component. If a
design modification cannot eliminate the failure mechanism or its effects or if a task is not cost
effective, the component defaults to RTF, and you have to accept the cost of a failure.
Components that lack important functions may simply be allowed to fail if there are no serious
consequences. They can be repaired after they fail, but they would not merit expenditure of
maintenance resources to prevent the failure. Such components would be run-to-failure.
However, many components fall between these extremes. For example, they might have to meet
a reliability target that permits several failures (perhaps a maximum of three failures in two
years), so they should not be allowed to fail too often. Or they might result in significant costs when
they fail, although such costs are not on the same scale as a loss of production. This could be a
result of self or secondary damage, increased waste disposal costs, additional testing or
requalification of other equipment, or significant radiation exposure during repair (more than
during PM). All such components are classified as non-critical.
Non-critical components might need a PM task to address just one or two catastrophic failure
mechanisms if the objective involves preventing expensive damage to the component or to
another component. If the objective is to meet a performance criterion that permits several
failures, it should be sufficient to address only the most common failures.
You may not be able to find cost-effective PM tasks for non-critical components. In these cases,
they default to RTF, and you have to accept the cost of a failure.
Tasks recommended by the PM Basis Database should be compared to existing tasks to obtain
candidate PM tasks. You should be aware that the tasks in the Database may not be packaged in
the same way as current tasks, that is, certain line item activities may be performed as part of
another task, and tasks that have similar names may differ appreciably in scope. The purpose of
this question is simply to identify the Database recommendations and to identify which task
names are equivalent to current tasks. Evaluation of the candidates is performed subsequently. If
you do not have any current tasks, select the Database recommendations as candidates. You can
also include vendor-recommended tasks for critical components if you know how they relate to
the EPRI tasks.
The expert panel of utility and vendor personnel who recommended the tasks in the Template
suggested the complete set of tasks, for the given conditions, as being a moderately conservative
group of tasks that could be applied by a utility with little operating experience or with limited
corporate memory of its failure history. The intention was to address all of the failure
mechanisms that are known to happen and that are expected to occur at least once in the life of
the equipment. This is also the philosophy of reliability-centered maintenance (RCM).
However, your current PM tasks for a critical component may differ from the tasks
recommended in the EPRI Template. In this case, you should review the vendor
recommendations for additional task insights and investigate the coverage of failure mechanisms
using this step of the procedure. See the overall recommended approach to PM optimization in
the PM Optimization flow chart in Figure 1-1.
If your past history of failures has been unsatisfactory for a non-critical component and you do
not find the EPRI recommendations acceptable, you would follow this step of the procedure to
decide among the options. See the overall recommended approach to PM optimization in the PM
Optimization flow chart in Figure 1-1.
The purpose of this step is to permit you to select among the tasks with insight into which failure
mechanisms are being protected against and which are not. The tasks selected must have
adequate applicability for a wide range of failure causes for critical components. Among these,
the most common failure mechanisms should be particularly well-protected against for critical
components.
Some, if not all, of the most common failure causes should be protected against for non-critical
components, but it is usually not economically feasible to provide very wide coverage.
Sometimes non-critical components may need to be protected against only one or two
particularly catastrophic failure mechanisms.
To provide protection, a task must contain activities that address the failure mechanisms. When a
task is assigned that in principle is applicable to a failure mechanism, the degree of protection
provided by the task is largely a matter of the interaction between the task interval and the
expected time to failure. That interaction is dealt with by the Interval Evaluation in Section 3.
The best way to think about the PM objectives is first to consider if the component is critical,
that is, whether it is worth trying to prevent all failures expected during the life of the equipment.
If not, the component is probably run-to-failure (RTF). However, before consigning it to receive
no PM tasks at all, you should determine if there are other reasons to require some minimal level
of PM. If there are other reasons, the component is non-critical.
Furthermore, any Maintenance Rule structures, systems, and components (SSC) functions that
have very restrictive performance criteria (allowing only one failure in two years among the
whole set of components that provide the function) are not likely to meet those criteria
unless the PM objective for individual components is to prevent all failures. Therefore,
components whose failure can defeat an SSC function that has such tight performance criteria
will usually be critical components, regardless of whether they are risk significant.
However, some components might have to meet a more relaxed reliability target (for example, a
maximum of three failures in two years), so individual failures are not too important, but they,
nevertheless, should not be allowed to fail too often. Or a component might cause significant
costs when it fails, although such costs are not on the scale of a loss of production. This could be
a result of self or secondary damage, additional waste disposal costs, additional testing or
requalification of equipment, or significant radiation exposure during repair (more than during
PM). All of these components would be classified as non-critical.
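The classification logic described above can be sketched as a small decision function. This is only an illustrative sketch: the field names and the function name are hypothetical, and the text says that components with tight SSC criteria will "usually" be critical, so the hard rule below is a simplification that should yield to engineering judgment.

```python
from dataclasses import dataclass


@dataclass
class ComponentAssessment:
    # Hypothetical field names standing in for the questions asked in the text.
    worth_preventing_all_failures: bool  # is it worth preventing every expected failure?
    tight_ssc_criteria: bool             # e.g. only one failure allowed in two years
    relaxed_reliability_target: bool     # e.g. a maximum of three failures in two years
    significant_failure_cost: bool       # secondary damage, waste, requalification, dose


def classify(a: ComponentAssessment) -> str:
    """Return 'critical', 'non-critical', or 'RTF', checking in the order the text uses."""
    if a.worth_preventing_all_failures or a.tight_ssc_criteria:
        return "critical"
    # "Other reasons to require some minimal level of PM" make a component non-critical.
    if a.relaxed_reliability_target or a.significant_failure_cost:
        return "non-critical"
    return "RTF"  # run-to-failure: no PM tasks at all


pump = ComponentAssessment(False, False, True, False)
print(classify(pump))
```

The order of the checks matters: the run-to-failure outcome is reached only after every reason for some minimal level of PM has been ruled out.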
In addition, you should always question whether a failure would be evident in a timely way to the
operating staff and whether it could be important if it were not detected. If the failure would not
be evident and could be important, you need to include some kind of functional test as a
failure-finding task. The importance of the failure should at least reach the non-critical level.
The set of tasks recommended by the Database can be seen most easily in the Template form
[EPMB 2.1], which shows the range of tasks and intervals recommended for eight combinations
of Critical, Non-Critical, High and Low Duty Cycle, and Severe and Mild Service Conditions.
All the tasks on the Template are candidates unless they are designated NR (Not Recommended)
in the matrix for the relevant column.
Before selecting the column that corresponds to the applicable combination of Critical (C) or
Non-Critical (N), High Duty Cycle (H) or Low Duty Cycle (L), and Severe Service Condition (S)
or Mild Service Condition (M), review the definitions of these terms in the Definitions of
Template Application Conditions [EPMB 2.5].
Note any tasks that do not apply. The reason for this is that the next step involves selecting tasks
from the Correlation Table [EPMB Table 3.2] on which all the tasks are displayed regardless of
the criticality, duty cycle, and service conditions. The scope of most tasks is outlined in the PM
Application Notes [EPMB 2.3].
The simplest procedure, recommended for this Short PM Task Evaluation Process, is to select all
the PM tasks recommended in the appropriate column of the Template, as described in the
previous step. However, if these differ appreciably from current tasks for a critical component
(regardless of failure history) or if you do not find them acceptable (for example, not cost
effective) for a non-critical component that has an unsatisfactory failure history, you can use the
following procedure to investigate the effectiveness of PM tasks against relevant failure
mechanisms. An overview of these options can be seen in the PM Optimization flow chart in
Figure 1-1.
The failure mechanisms that each task protects against are shown in the Correlation Table
[EPMB Table 3.2] in column Location/Degradation. An X in a task column indicates that the
task is applicable to the failure mode shown in the row. Remember that the Correlation Table
shows all of the tasks, not just those recommended on the Template for a specific set of
conditions.
There are priority failure mechanisms that should be covered by the PM program with a high
degree of confidence for critical components. These priority mechanisms include the most
common failure causes, which can be found in Building A PM Strategy [EPMB 3.1], together
with any failure mechanisms that you know have happened before at this plant, especially those
that have occurred more than once.
If a non-critical component needs protection against only one or two specific failure
mechanisms, these rows in the Correlation Table should be covered by at least one candidate
task. Otherwise, non-critical components should have one applicable and cost-effective PM task
for each of the most common failure mechanisms.
A good approach to selecting tasks is to first select the task that seems to cover the most priority
failure mechanisms or that is preferred for other reasons, and find the failure mechanisms that are
not covered by it. If any of these failure mechanisms are in the priority group, you need to select
additional tasks to cover them. After all priority failure mechanisms are covered, you can check
to ensure that enough of the other failure mechanisms are also covered and that those left
uncovered do not include any that you do not want to leave unprotected.
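The selection approach just described resembles a greedy covering procedure, which can be sketched as follows. This is a minimal sketch under stated assumptions: the Correlation Table is represented as a hypothetical mapping from each task to the set of failure mechanisms marked with an X, and the task and mechanism names are invented for illustration. They are not taken from any EPRI PM Basis Report.

```python
def select_tasks(correlation, priority):
    """Greedily pick tasks until every priority mechanism is covered (or cannot be)."""
    uncovered = set(priority)
    selected = []
    while uncovered:
        # Pick the task that covers the most still-uncovered priority mechanisms.
        best = max(correlation, key=lambda t: len(correlation[t] & uncovered))
        gained = correlation[best] & uncovered
        if not gained:
            break  # no remaining task addresses the leftover priority mechanisms
        selected.append(best)
        uncovered -= gained
    return selected, uncovered


# Hypothetical Correlation Table: task -> mechanisms marked with an X.
correlation = {
    "vibration analysis": {"bearing wear", "misalignment"},
    "oil analysis": {"bearing wear", "lubricant degradation"},
    "visual inspection": {"seal leakage"},
    "operator rounds": {"seal leakage", "misalignment"},
}
priority = {"bearing wear", "lubricant degradation", "seal leakage"}

selected, uncovered = select_tasks(correlation, priority)

# Final check from the text: which non-priority mechanisms are left unprotected?
covered = set().union(*(correlation[t] for t in selected))
left_unprotected = set().union(*correlation.values()) - covered
print(selected, uncovered, left_unprotected)
```

The greedy choice is only a starting point. The text also allows the first task to be one that is "preferred for other reasons", and the unprotected mechanisms still need a deliberate review.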
In general, task selection should be biased in favor of condition-monitoring tasks and in favor of
the most cost-effective tasks. Do not rely on operator rounds as the only task to address a priority
failure mechanism unless it is clear that an average operator can detect the incipient failure
before it happens.
Do not forget a failure-finding task if one was indicated by Question B. Most of the Templates
include one or two such tasks as functional or operability tests. If you need a failure-finding task
and one has not been selected by the above process, you should add it.
Candidate PM Tasks
6. Which PM tasks are recommended by the vendor?
7. Which PM tasks are recommended by the EPRI PM Basis Database?
Cost Effectiveness
Each question is addressed in two parts. The first part (see Section 2.3.2) contains the
interpretation of the question; it explains what the issue is about and why it is important. The
second part (see Section 2.3.3) shows you how to get the answer to the question, most often by
reference to the EPRI PM Basis Reports. These references are abbreviated thus: [EPMB 3.2], to
mean that the material can be found in Section 3.2 of each EPRI PM Basis Report.
This section explains what is meant by each question and why it is important.
The objective of PM could be to prevent all failures of a component or to limit the number of
failures to an acceptable level. Components with these two PM objectives are classified in RCM
and PM optimization processes as critical and non-critical, respectively. Both cases lead you to
consider performing some PM tasks. If neither of these objectives applies to this component, it
should be classed as run-to-failure (RTF), which means that no PM tasks of any kind will be
performed on it.
The PM objective for critical components is to prevent all failures that are known to occur and
are expected to occur at least once in the life of the equipment. Critical components must,
therefore, have sufficient PM coverage in scope and frequency to address a wide spectrum of
possible failure mechanisms and certainly all the common failure mechanisms.
In some cases, you may not be able to find cost-effective PM tasks for a critical component. If a
design modification cannot eliminate the failure mechanism or its effects or if a task is not cost
effective, the component defaults to RTF, and you have to accept the cost of a failure.
Components that lack important functions may simply be allowed to fail if there are no serious
consequences. They can be repaired after they fail, but they would not merit expenditure of
maintenance resources to prevent the failure. Such components would be run-to-failure.
However, many components fall between these extremes. For example, they might have to meet
a reliability target that permits several failures in a certain time period, but they should not be
allowed to fail too often. Or they might result in significant costs when they fail even though such costs
are not on the same scale as a loss of production. This could be a result of self or secondary
damage, increased waste disposal costs, additional testing or requalification of other equipment,
or significant radiation exposure during repair (more than during PM). All such components are
classed as non-critical.
Non-critical components might need a PM task to address just one or two catastrophic failure
mechanisms if the objective involves preventing expensive damage to the component or to
another component. If the objective is to meet a performance criterion that permits several
failures, it should be sufficient to address only the most common failures.
You may not be able to find cost-effective PM tasks for non-critical components. In these cases,
the components default to RTF, and you have to accept the cost of a failure.
Failure finding refers to the practice of performing a test to determine if the component has
already failed. Failure finding is important only when a component failure would not be evident
to the operating staff soon after it occurs. In this case, the component could remain failed for a
long time, possibly causing a more damaging situation as failures to other components occur
over time.
Most failure-finding tasks in a nuclear power plant are embodied in technical specifications as
surveillance tests applied to trains of standby safety systems. However, the possible need for a
failure-finding task should always be kept in mind, even for non-critical components.
Additional Perspective
Failure-finding tasks may improve the probability that a component that is required to operate on
demand will actually operate when required. If the probability is improved, it is because there is
a standby failure rate over time, and the standby mission time depends on the test interval.
However, in this case, these tasks do not decrease the number of failures experienced during
operation of the component or even the number of failures during standby because they do not
decrease the run failure rate or the standby failure rate. The improvement in the failure to start
probability comes only because of the decrease in average unavailability related to the test
interval. Therefore, the test intervals control the unavailability of a standby component.
Failure-finding tasks will, however, increase the total number of failures experienced when the
results of the tests themselves are included under an alternative and very common assumption.
The assumption is that the probability of failure in a test is a constant. That is, it is not the result
of failure during the standby time, and moreover, the component cannot fail during standby. This
leads to more failures when there are more tests. Under this restrictive assumption, failure-
finding tasks have absolutely no effect on either the reliability or the availability of the
component during standby and also have no effect on its reliability during its mission time. The
number of failures experienced from real demands remains the same, equal to the number of
demands times the constant probability of failure on demand, and does not involve the test
interval.
Under either assumption, it is clear that the more a component is tested, the more opportunity
there is for it to be left in an inoperative condition by human error, and there may also be an
increase in wear on the component caused by increased testing.
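The two assumptions contrasted above can be made concrete with a short numerical sketch. It rests on assumptions that the report does not state explicitly: under the first model, standby failures are taken to occur at a constant rate and to be revealed only at the next test, which gives an interval-averaged unavailability of 1 - (1 - exp(-λT))/(λT), approximately λT/2 when λT is small. Under the second model the probability of failure on demand is a constant. The function names and the numbers are hypothetical.

```python
import math


def average_unavailability(standby_failure_rate, test_interval):
    """Mean unavailability over one test interval for a constant standby failure rate."""
    lt = standby_failure_rate * test_interval
    if lt == 0.0:
        return 0.0
    # Average of 1 - exp(-rate * t) over 0 <= t <= test_interval.
    return 1.0 + math.expm1(-lt) / lt


def expected_test_failures(p_fail_on_demand, n_tests):
    """Constant-probability model: failures seen in tests grow with the number of tests."""
    return p_fail_on_demand * n_tests


rate = 1.0e-4            # hypothetical standby failure rate, per hour
monthly = average_unavailability(rate, 720.0)
biweekly = average_unavailability(rate, 360.0)
print(monthly, biweekly, rate * 720.0 / 2.0)

# Under the constant-probability model, doubling the tests doubles the test failures
# while the failures on real demands are unchanged.
print(expected_test_failures(0.01, 12), expected_test_failures(0.01, 24))
```

In the first model, shortening the test interval reduces average unavailability roughly in proportion. In the second it only adds test failures, which is the contrast the text draws.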
The purpose of this explanation is to emphasize that the nuclear industry’s belief in the
effectiveness of surveillance testing must stem from a conviction that equipment is most likely to
fail during the time elapsed in standby. Otherwise, testing without predictive capability is
irrational. Additionally, testing should be no more frequent than absolutely necessary, and
failure-finding tests should embody as much predictive capability as possible. The test of
applicability for failure finding to be predictive is the ability to detect incipient failure with a lead
time approaching the test interval.
Critical components need defense against the wide range of events that are expected to occur at
least once in the life of the equipment. They need excellent defense against the most likely
failure mechanisms. Non-critical components need defense against at least some of the most
likely things that can happen. Cost-effectiveness considerations in later questions may limit what
can be achieved for non-critical components.
It will be presumed that past events are indicative of future events unless there is a convincing
reason to believe that the root causes of past failures have been eliminated. Critical components
certainly need defense against failure mechanisms that have occurred before in the same plant or
that are indicated by as-found condition information.
The susceptibility of individual components to failures can be addressed in four ways. A broad
treatment of most of these effects is obtained by characterizing service conditions to encompass
environmental effects and by using the duty cycle to characterize how equipment is operated. In
the Database, these two parameters exert an overall influence on the PM program, mainly
through task intervals but sometimes through the choice of tasks. The third approach asks if there
are pervasive and extreme local conditions, for example, a high moisture level, which may
suggest extra protection against specific failure mechanisms. Finally, as-found equipment
condition reports may indicate equipment that is experiencing more degradation than expected.
Service conditions refer to the existence of local environmental factors such as heat, vibration,
oil vapor, salt spray, moisture, and so on. The Database characterizes these as either severe or
mild service conditions. Further information on service conditions can be found in the EPRI PM
Basis Overview Report (TR-106857).
Duty cycle refers to the way in which the equipment is used, for example, continuous, standby,
intermittent, fully loaded, partially loaded, long periods of inactivity, etc. The Database treats
duty cycle as being either high or low. Further information on duty cycles can be found in the
EPRI PM Basis Overview Report (TR-106857).
The vendor manual and subsequent vendor communications generally include recommended PM
tasks and intervals. These recommendations are often conservative for a variety of reasons. In
general, you should make your own decisions on which tasks and intervals are appropriate, using
your own operating experience and industry operating experience as exemplified by your
industry operating experience program, NMAC reports, and the EPRI PM Basis Reports. Vendor
recommendations can be especially relevant when you have equipment with design features that
are not shared by the majority of power plants and that are not included in the EPRI PM Basis
Reports.
The tasks selected must have adequate applicability for a wide range of failure causes for critical
components. Among these, the most common failure mechanisms and those that have happened
before at this plant should be particularly well covered for critical components.
Some, if not all, of the most common failure causes should be covered for non-critical
components. Sometimes non-critical components may need only one or two particularly
catastrophic failure mechanisms to be covered.
Critical components should not be left with unprotected failure mechanisms unless the risk is
explicitly acknowledged. To be applicable, a task must contain activities that address the failure
mechanisms. This involves the task scope and content. After a task is assigned that in principle is
applicable to a failure mechanism, the degree of protection provided by the task is largely a
matter of the interaction between the task interval and the expected time to failure. That
interaction is dealt with by Interval Evaluation in Section 3.
Condition-monitoring tasks are tasks that measure or detect some aspect of degradation before
the failure point is reached. If they permit a meaningful estimate of time remaining to failure or
at least can detect a degraded condition with enough lead time to permit a planned outage rather
than a forced outage to make repairs, they are referred to as predictive PM tasks. Condition-
monitoring tasks are likely to be more cost effective than time-directed tasks, even though they
must typically be performed fairly frequently. An additional advantage is that they are usually
not intrusive tasks, although some equipment isolation and realignment may be needed. This
means that condition-monitoring tasks do not carry a significant risk of causing additional failure
mechanisms, that is, caused by the PM task itself.
Condition-monitoring tasks can be effective against random failures such as those giving rise to a
constant failure rate and also against wearout failures for which the times to failure are very
uncertain and/or variable.
If condition-monitoring tasks were indeed favored over time-directed tasks during Step 8, there
should be no significant changes introduced in this step. However, it is possible that a condition-
monitoring task was not accepted as a candidate because of a lack of confidence in its ability to
detect degradation. It is the purpose of this step to question this assumption and to seek
alternatives to costly time-directed tasks.
This question simply requires a check that the most cost-effective time-directed tasks have been
selected. Time-directed tasks cannot be very effective when the failure time is truly a random
variable because there is then no “good” time to perform the task. The existence of a significant
number of random failure mechanisms seriously dilutes the effectiveness of a time-directed task,
even when it is capable of detecting the types of degradation involved.
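The point that there is no "good" time to perform a time-directed task against random failures can be illustrated numerically. The sketch below assumes a Weibull time-to-failure model, which is an assumption made here for illustration and is not named in the report: a shape parameter of 1 gives a constant failure rate (random failures), while a shape greater than 1 represents wearout. All parameter values are hypothetical.

```python
import math


def weibull_survival(t, scale, shape):
    """Probability that the item survives beyond time t."""
    return math.exp(-((t / scale) ** shape))


def conditional_failure_probability(age, window, scale, shape):
    """Probability of failing within `window`, given survival to `age`."""
    return 1.0 - weibull_survival(age + window, scale, shape) / weibull_survival(age, scale, shape)


# Random failures (shape = 1): the risk in the next year is the same at any age,
# so renewing the item at a fixed interval does not reduce it.
young_random = conditional_failure_probability(1.0, 1.0, scale=10.0, shape=1.0)
old_random = conditional_failure_probability(8.0, 1.0, scale=10.0, shape=1.0)

# Wearout (shape = 3): the risk rises with age, so a time-directed task can help.
young_wearout = conditional_failure_probability(1.0, 1.0, scale=10.0, shape=3.0)
old_wearout = conditional_failure_probability(8.0, 1.0, scale=10.0, shape=3.0)

print(young_random, old_random, young_wearout, old_wearout)
```

When a component has a mixture of both kinds of mechanism, only the wearout portion of its failures responds to the timing of the task, which is how random mechanisms dilute the benefit.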
The best way to think about the PM objectives is first to consider if the component is critical,
that is, whether it is worth trying to prevent all failures expected during the life of the equipment.
If not, the component is probably run-to-failure (RTF). However, before consigning it to receive
absolutely no PM tasks at all, you should determine if there are other reasons to require some
minimal level of PM. If there are other reasons, the component is non-critical.
Furthermore, any Maintenance Rule SSC functions that have performance criteria allowing
only one failure in two years among the whole set of components that provide the function (for
example, the train) are not likely to meet the performance criteria unless the PM objective for
individual components is to prevent all failures. Therefore, components whose failure can defeat
an SSC function that has such tight performance criteria will usually be critical components
regardless of whether they are risk significant or not.
However, some components might have to meet a more relaxed reliability target (for example, a
maximum of three failures in two years), so individual failures are not too important, but they,
nevertheless, should not be allowed to fail too often. Or a component might cause significant
costs when it fails even though such costs are not on the same scale as a loss of production. This
could be a result of self or secondary damage, additional waste disposal costs, additional testing
or requalification of other equipment, or significant radiation exposure during repair (more than
during PM). All of these components would be classed as non-critical.
This is simply a matter of deciding if a failure would be evident in a timely way to the operating
staff and if the undetected failure could be important. The importance of the failure should at
least reach the non-critical level.
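The failure-finding decision reduces to two conditions, which can be written out as a short check. The function name and the classification labels are hypothetical and serve only to make the logic explicit.

```python
def needs_failure_finding_task(evident_to_operators: bool, classification: str) -> bool:
    """A failure-finding task is needed when a failure would be hidden and important."""
    hidden = not evident_to_operators
    # "Important" here means the failure reaches at least the non-critical level.
    important = classification in {"critical", "non-critical"}
    return hidden and important


print(needs_failure_finding_task(evident_to_operators=False, classification="non-critical"))
```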
Look in the Correlation Table [EPMB Table 3.2] at column Location/Degradation. Notice that
these failures are already correlated with PM tasks that address them. The events listed can be
expected to happen in power plants at least once in the life of the equipment. These data fields
are further described in the EPRI PM Basis Overview report under Failure Locations and under
Degradations and Influences.
An additional element you need to note is the set of priority failure mechanisms, that is, those
that are the most likely failure mechanisms and those that have already occurred. The most likely
ones can be found in Building A PM Strategy [EPMB 3.1]. The common failure causes are
described in varying degrees of detail and do not correspond one-to-one to records in the
Correlation Table. Be careful to make a note of all failure locations and degradation
mechanisms that correspond to the common failure causes.
Industry information should be obtainable from your operating experience program, for example,
vendor bulletins and Generic Letters, and other industry sources such as NMAC reports. Add as
new records at the bottom of the Correlation Table any additional failure mechanisms from
these sources that you think are credible during the life of the equipment. Mark with an X each
PM task that could detect the degradation before failure or prevent it from occurring.
It is not usually worth reviewing maintenance work orders (MWOs) to find previously
experienced failure mechanisms. Instead, rely more on the memory of experienced people. You
might use the list of generic failure mechanisms in the Correlation Table [EPMB Table 3.2] to
trigger their recollections. Broaden the question to other components of the same type with
similar duty cycles and service conditions. The number of events is not as important as the
failure mechanisms. However, the existence of several corrective MWOs might indicate
recurring problems. Note the failure mechanisms that have occurred previously, especially any
that have occurred more than once, and any that have been indicated by as-found condition
reports, and add them to your list of priority failure mechanisms.
For critical equipment only, you are looking for specific factors that apply to this equipment that
may promote certain failure mechanisms. You may also need to specify service conditions and
duty cycle.
It is usually not necessary to characterize service conditions or duty cycle to make task
selections; these factors play their main role in determining task intervals. However, sometimes
they are relevant to task selection, so check the Template Form [EPMB 2.1] to see if this is the
case. To decide whether service conditions are severe or mild, and whether the duty cycle is high
or low, look at the list of relevant factors in the Definitions Of Template Application
Conditions [EPMB 2.5].
For critical equipment only, you should note any obvious, outstanding, and pervasive influences
that would not be expected in a nominal environment (for example, high temperature) that
clearly apply to the equipment in question. The final step described below is to find the failure
mechanisms that are driven by these effects.
To follow up on these susceptibilities for critical equipment, look at the Failure Location,
Degradation Mechanism, and Degradation Influence columns of the Degradation Table
[EPMB Table 3.1]. If you know or suspect particular susceptibilities as noted above, look for
them in the Degradation Influences column. The associated failure locations and degradation
mechanisms will enable you to ensure adequate coverage by PM tasks. You may want to make a
separate list of these susceptibilities for later use. Include any that are indicated by as-found
condition reports.
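The lookup described above amounts to filtering the Degradation Table by its Degradation Influences column. The sketch below assumes a simple in-memory representation of the table. The row contents are invented for illustration and are not taken from any EPRI PM Basis Report.

```python
from typing import NamedTuple


class DegradationRow(NamedTuple):
    failure_location: str
    degradation_mechanism: str
    degradation_influence: str


def rows_for_influences(table, influences):
    """Return rows whose influence text mentions any of the suspected influences."""
    wanted = [s.lower() for s in influences]
    return [
        row for row in table
        if any(w in row.degradation_influence.lower() for w in wanted)
    ]


# Hypothetical rows standing in for a Degradation Table.
table = [
    DegradationRow("stator winding", "insulation breakdown", "High temperature"),
    DegradationRow("bearing", "wear", "Inadequate lubrication"),
    DegradationRow("terminal box", "corrosion", "Moisture"),
]

susceptibilities = rows_for_influences(table, ["high temperature", "moisture"])
for row in susceptibilities:
    print(row.failure_location, "/", row.degradation_mechanism)
```

The resulting list of failure locations and degradation mechanisms is the separate list of susceptibilities that the text suggests keeping for later use.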
Your vendor manuals will recommend PM tasks and intervals. This step is simply intended to
obtain this information for critical equipment only. Check the vendor tasks against the
recommendations in the EPRI PM Basis Database identified in the course of answering
Question 7. The Database is already moderately conservative in its recommendations, so if the
vendor recommends tasks that the EPRI Database omits, there should be a good reason for
including them in your PM program.
The most likely technically justifiable reason is that your equipment has design features that
were not included in the EPRI Database. One way to check this is to examine the list of failure
locations and degradation mechanisms in the Failure Location column and the Degradation
Mechanism column of the Degradation Table [EPMB Table 3.1].
The set of tasks recommended by the Database can be seen most easily in the Template form
[EPMB 2.1], which shows the range of tasks and intervals recommended for eight combinations
of Critical, Non-Critical, Duty Cycle, and Service Conditions. This form is the basic roadmap to
PM tasks in the Database. Select the column that corresponds to the Critical (C), Non-Critical
(N), High Duty Cycle (H), Low Duty Cycle (L), Severe Service Condition (S), or Mild Service
Condition (M), as determined by previous questions. Select each task as a candidate unless it is
designated NR (Not Recommended) in the matrix for the selected column.
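The column selection and candidate filtering can be sketched as follows. This is an illustrative sketch: the three-letter column codes built from C/N, H/L, and S/M, the dictionary layout of the Template, and the task names and intervals are all assumptions made here, not the Database's actual format or content.

```python
from itertools import product

# Eight combinations of criticality, duty cycle, and service condition.
COLUMNS = ["".join(c) for c in product("CN", "HL", "SM")]


def column_code(critical: bool, high_duty: bool, severe_service: bool) -> str:
    """Build the column code for one component from the three yes/no answers."""
    return ("C" if critical else "N") + ("H" if high_duty else "L") + ("S" if severe_service else "M")


def candidate_tasks(template, column):
    """Every task on the Template is a candidate unless marked NR in the chosen column."""
    return [task for task, cells in template.items() if cells[column] != "NR"]


# Hypothetical Template: task -> {column code: interval or "NR"}.
template = {
    "Vibration analysis": dict.fromkeys(COLUMNS, "3M"),
    "Internal inspection": {**dict.fromkeys(COLUMNS, "5Y"), "NHM": "NR", "NLM": "NR"},
}

col = column_code(critical=False, high_duty=False, severe_service=False)
candidates = candidate_tasks(template, col)
print(col, candidates)
```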
The scope of each task is explained in the PM Application Notes [EPMB 2.3]. This contains an
outline of the line item activities to be covered by the task. Check to see if vendor tasks and
current tasks match Database tasks reasonably well in scope. If they do, they are already
candidates. If they do not, do not select the vendor tasks as candidate tasks unless you have a
technical reason that will become the basis for the task (for example, special task for failure
mechanisms affecting a unique design feature or enhanced PM to address a historical and plant-
specific susceptibility).
Prepare a list of candidate tasks. Even if your intention is ultimately to select only a single task
or if you are evaluating only a single task, it is a good idea to include alternatives on the
candidate list.
The most common failure mechanisms and those that have happened before at this plant should
have been noted as priority failure mechanisms. These priority failures should be covered by the
PM program with a high degree of confidence for critical components. This could mean
addressing each of them with at least two applicable PM tasks as far as practical. The majority of
the other failure mechanisms should be covered by at least a single task. However, if you have to
leave some failure mechanisms without any protection at all, you should check that these do not
consist of any that have occurred before and that they do not contain any to which the component
has a predisposition to fail (that is, those you listed as having a susceptibility).
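These coverage checks can be expressed as a small audit over the Correlation Table. As before, this is a sketch under assumptions: the table is a hypothetical mapping from task to the mechanisms it addresses, the default of two tasks per priority mechanism reflects the "at least two applicable PM tasks as far as practical" guidance for critical components, and all names and data are invented for illustration.

```python
def audit_coverage(correlation, selected, priority, occurred_before, susceptible,
                   min_priority_tasks=2):
    """Report weakly covered priority mechanisms and risky unprotected mechanisms."""
    counts = {}
    for task in selected:
        for mechanism in correlation[task]:
            counts[mechanism] = counts.get(mechanism, 0) + 1

    all_mechanisms = set().union(*correlation.values()) | set(priority)
    weak_priority = sorted(m for m in priority if counts.get(m, 0) < min_priority_tasks)
    unprotected = sorted(m for m in all_mechanisms if counts.get(m, 0) == 0)
    # Unprotected mechanisms that have occurred before or match a known susceptibility.
    flagged = sorted(m for m in unprotected if m in occurred_before or m in susceptible)
    return weak_priority, unprotected, flagged


correlation = {
    "vibration analysis": {"bearing wear", "misalignment"},
    "oil analysis": {"bearing wear", "lubricant degradation"},
    "visual inspection": {"seal leakage"},
    "thermography": {"winding insulation breakdown"},
}
weak, unprotected, flagged = audit_coverage(
    correlation,
    selected=["vibration analysis", "oil analysis"],
    priority={"bearing wear", "lubricant degradation"},
    occurred_before={"seal leakage"},
    susceptible=set(),
)
print(weak, unprotected, flagged)
```

Any entry in the flagged list is a mechanism that the text says should not be left without protection, so it calls for an additional candidate task.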
The first two of these three control points are addressed in this step.
Task choices can be made quickly using the Correlation Table [EPMB Table 3.2]. The best
approach to selecting tasks is to first select the task that seems to cover the most priority failure
mechanisms and then to select additional tasks to cover the failure mechanisms that are not
covered by the task. After all the priority failure mechanisms are covered, you can determine that
enough of the generic failure mechanisms are also covered and that those left uncovered do not
include mechanisms to which the component has a predisposition to fail.
For critical components, the failure mechanisms that are not covered by any PM task should be
explicitly acknowledged. If a non-critical component needs protection against only one or two
specific failure mechanisms, these rows should be covered by at least one candidate task.
Otherwise, non-critical components should have at least one applicable PM task for each of the
priority failure mechanisms.
In general, task selection should be biased in favor of condition-monitoring tasks. Do not rely on
operator rounds as the only task to address a priority failure mechanism unless it is clear that an
average operator can detect the incipient failure before it happens.
Do not forget a failure-finding task if one was indicated by Question 2. Most of the Templates
include one or two such tasks as functional or operability tests. If you need a failure-finding task
and one has not been selected by the above process, you should add it.
Review of task content could be done here but is best left until cost-effectiveness issues have
been addressed in Step 10.
If a potentially useful condition-monitoring task was rejected in the selection of candidate tasks
because of a lack of confidence in its ability to detect incipient failures, you might reconsider
that decision in the light of three important facts:
• Condition-monitoring tasks are performed frequently. This means that if the interval is short
enough and the “look ahead” capability of the task is great enough, you may get more than
one opportunity to detect accumulating degradation. This can increase the applicability of the
tasks.
• Condition-monitoring tasks only disturb the equipment in minor ways, if at all, and so have
much less chance than intrusive tasks of introducing additional failures. This is a major
benefit because maintenance error is a significant contributor to failures following many
intrusive time-directed PM tasks and can be responsible for increasing failure rates by several
hundred percent in comparison with nonintrusive tasks. Whenever equipment is reassembled,
often with new parts, it experiences an infant mortality phase again. Infant mortality is easily
observable in reliability statistics of most equipment, and failure rates early in life are usually
several times higher than failure rates of mature equipment. It is also found that rework rates
in nuclear power plants peak soon after a refueling outage, and forced outages in fossil power
plants peak soon after a major maintenance outage. These effects are large; consequently,
condition-monitoring tasks could be preferred even if they do not detect degradation quite as
reliably as time-directed tasks.
Equipment varies greatly in its susceptibility to maintenance error. To determine the degree
to which this might be important, look at the Determination Of Time Directed Task
Intervals [EPMB 1.5]. For many component types, but not all, utility experts listed the most
likely ways in which maintenance error causes failures. Other insights can be obtained by
looking at the Degradation Table [EPMB Table 3.1]. The failure locations and degradation
mechanisms where maintenance error contributes are those with “maintenance error,”
“improper assembly,” or “personnel error” entries in the Degradation Influences column.
• Many condition-monitoring tasks are less costly than time-directed tasks, notwithstanding the
higher frequency of performance. The combination of enhanced cost effectiveness and
nonintrusiveness could lead to additional condition-monitoring tasks being proposed beyond
those in the first set of candidates. To prompt ideas for additional monitoring techniques,
look at the Discovery Opportunity column of the Degradation Table [EPMB Table 3.1].
It may be possible to relax the technical coverage of the failure mechanisms somewhat in return
for implementing a less expensive time-directed task. This is a matter of reexamining the Step 8
evaluation. A sense of the trade-off on time-directed task intervals can be obtained by looking at
the Template [EPMB 2.1].
At the end of this step, it is a good idea to review the summary of task content for each task to
ensure that each task does what you need it to do. Look at the Task Content List in the PM
Application Notes [EPMB 2.3].
3 INTERVAL EVALUATION GUIDELINE
3.1 Context
Before proceeding with Interval Evaluation, it is recommended that you examine the sample
procedure described in the PM Optimization flow chart in Figure 1-1. This procedure provides
an approach to PM optimization that considers criticality, failure history, vendor
recommendations, current PM tasks and intervals, and the acceptability of the EPRI
recommendations. Figure 1-1 makes reference to both PM Task Evaluation and Interval
Evaluation and serves as an introduction to how a plant PM optimization project can use these
evaluation processes. The Interval Evaluation process can also be used to address issues
concerning individual task intervals.
Interval Evaluation selects and evaluates PM task intervals to provide adequate protection
against component failures. Evaluation of whether the tasks themselves are technically
applicable and cost effective is done separately using PM Task Evaluation, described in Section
2. Do not attempt to evaluate or change a task interval in response to concerns about equipment
condition or reliability until you are sure that the correct PM tasks are being performed.
Evaluation of a task interval should be carried out in the context of other PM tasks that are being
performed on the component.
The main reasons for these evaluations could be one or more of the following:
• To optimize tasks and intervals as part of a programmatic improvement in which a large
number of tasks and components are addressed
• To change individual task intervals in response to poor reliability
• To change individual task intervals in response to unacceptable or consistently good material
condition
• To justify deferring a task
Evaluating the intervals follows the same general process regardless of the reason for the
evaluation; however, a version that is focused on justifying task deferrals can be found in
Section 6.
There are also three initial questions that should always be considered when evaluating a PM
task interval, but they cannot be answered by the Database. They all presume that a specific task
interval is being evaluated.
• Is this task interval specified by the plant operating license, such as for Technical
Specifications, EQ, Appendix R, etc.?
• Is this task interval specified by a relevant code or standard such as ASME Section XI?
• Is this task interval part of a management commitment to regulators, insurers, etc.?
If the answer to any of these questions is Yes, you will probably continue to implement the
existing task interval in the short term. However, further evaluation of the interval might suggest
that changing the requirement would add value.
Each question is addressed in two parts. The first part (see Section 3.2.2) is interpretation of the
question and explains what the issue is about and why it is important. The second part (see
Section 3.2.3) shows you how to get the answer to the question, most often by reference to the
EPRI PM Basis Reports. These references are abbreviated thus: [EPMB 2.5] to mean that the
material can be found in Section 2.5 of each EPRI PM Basis Report.
This section explains what is meant by each question and why it is important.
The objective of PM could be to prevent all failures of a component or to limit the number of
failures to an acceptable level. Components with these two PM objectives are classified in PM
optimization processes as critical and non-critical, respectively. Both types of components lead
you to consider performing some PM tasks. If neither objective applies to this component, it
should be classed as run-to-failure (RTF), which means that no PM tasks of any kind will be
performed on it.
The PM objective for critical components is to prevent all failures that are known to occur and
are expected to occur at least once in the life of the equipment. Critical components must have
sufficient PM coverage in scope and frequency to address a wide spectrum of possible failure
mechanisms and certainly all of the common failure mechanisms.
In some cases, you may not be able to find cost-effective PM tasks for a critical component. If a
design modification cannot eliminate the failure mechanism or its effects, or if a task is not cost
effective, the component defaults to RTF, and you have to accept the cost of a failure.
Components that lack important functions may simply be allowed to fail if there are no serious
consequences. They can be repaired after they fail, but they would not merit expenditure of
maintenance resources to prevent the failure. Such components would be run-to-failure.
However, many components fall between these extremes. For example, they might have to meet
a reliability target that permits several failures in a certain time period, so they should not be
allowed to fail too often. Or they result in significant costs when they fail although such costs are
not on the same scale as a loss of production. This could be a result of self or secondary damage,
increased waste disposal costs, additional testing or requalification of other equipment, or
significant radiation exposure during repair (more than during PM). All such components are
classified as non-critical.
Non-critical components might need a PM task to address just one or two catastrophic failure
mechanisms if the objective involves preventing expensive damage to the component or to
another component. If the objective is to meet a performance criterion that permits several
failures, it should be sufficient to address only the most common failures.
You may not be able to find cost-effective PM tasks for non-critical components. In these cases,
they default to RTF, and you have to accept the cost of a failure.
The purpose of this question is to identify the intervals recommended by the Database. Task
intervals recommended by the Database may depend on criticality, duty cycle, and service
conditions. You should be aware that the tasks in the Database may not be packaged in the same
way as current tasks, that is, certain line item activities may be performed as part of another task,
and tasks that have similar names may differ appreciably in scope.
3. Is there a technical reason why the interval is longer or shorter than the EPRI PM
Basis Database recommendation?
The expert panel of utility and vendor personnel who recommended the tasks and intervals in the
Template made the recommendations as being moderately conservative choices that could be
applied by a utility with little operating experience or with limited corporate memory of its
failure history. The purpose of this step is to permit you to be guided by the recommendation, but
you are expected to apply insight related to the conditions and history at your plant. The most
important factors to be considered are the history of component failures and reports of
component condition.
The answer to this question will already be known if the process in Section 2 has been
completed. If not, proceed as follows.
The best way to think about the PM objectives is first to consider if the component is critical,
that is, whether it is worth trying to prevent all failures expected during the life of the equipment.
If not, the component is probably run-to-failure (RTF). However, before consigning it to receive
no PM tasks at all, you should determine if there are other reasons to require some minimal level
of PM. If there are other reasons, the component is non-critical.
Furthermore, any Maintenance Rule SSC functions, which have very restrictive performance
criteria (that allow only one failure in two years among the whole set of components that provide
the function, for example, the train), are not likely to meet the performance criteria unless the
PM objective for individual components is to prevent all failures. Therefore, components whose
failure can defeat an SSC function that has such tight performance criteria will usually be
critical components, regardless of whether they are risk significant or not.
However, some components might have to meet a more relaxed reliability target (for example, a
maximum of three failures in two years), so individual failures are not too important, but the
components should nevertheless not be allowed to fail too often. Or a component might cause significant
costs when it fails, although such costs are not on the same scale as a loss of production. This
could be a result of self or secondary damage, additional waste disposal costs, additional testing
or requalification of other equipment, or significant radiation exposure during repair (more than
during PM). All of these components are classified as non-critical.
In addition, you should always ask whether a failure would fail to become evident to the
operating staff in a timely way and whether it could be important if it went undetected. If this
is the case, you need to include some kind of functional test as a failure-finding task. The
importance of the failure should at least reach the non-critical level.
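The classification reasoning above can be sketched as two small decision functions. This is only an illustration: the function and parameter names are hypothetical, the yes/no inputs stand in for engineering judgements that the text describes in prose, and the last paragraph is read here as concerning failures that would not be evident to the operating staff.

```python
from enum import Enum


class PMCategory(Enum):
    CRITICAL = "critical"
    NON_CRITICAL = "non-critical"
    RUN_TO_FAILURE = "run-to-failure"


def classify_component(prevent_all_failures: bool, other_reason_for_pm: bool) -> PMCategory:
    """Classify a component from two judgements made by the evaluator.

    prevent_all_failures: it is worth trying to prevent all failures expected
        during the life of the equipment.
    other_reason_for_pm: there is some other reason for a minimal level of PM
        (a relaxed reliability target, significant failure costs, and so on).
    """
    if prevent_all_failures:
        return PMCategory.CRITICAL
    if other_reason_for_pm:
        return PMCategory.NON_CRITICAL
    return PMCategory.RUN_TO_FAILURE


def needs_failure_finding_task(evident_to_operators: bool, important_if_undetected: bool) -> bool:
    """A hidden failure that matters calls for a functional test."""
    return (not evident_to_operators) and important_if_undetected
```

For example, classify_component(False, True) returns PMCategory.NON_CRITICAL, and needs_failure_finding_task(False, True) returns True.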
The set of intervals recommended by the Database can be seen most easily in the Template form
[EPMB 2.1], which shows the range of tasks and intervals recommended for eight combinations
of Critical, Non-Critical, High and Low Duty Cycle, and Severe and Mild Service Conditions.
Select the column that corresponds to the applicable combination of Critical (C) or Non-Critical
(N), High (H) or Low (L) Duty Cycle, and Severe (S) or Mild (M) Service Condition. Check the
definitions of these terms in the Definitions Of Template Application Conditions [EPMB 2.5]
to select the combination that applies to the component in question. To be sure that you are
seeking an interval for the appropriate task name in the Template, check its scope, outlined in the
Task Contents list of the PM Application Notes [EPMB 2.3].
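The eight Template combinations can be enumerated mechanically. The sketch below assumes, purely for illustration, that a column is identified by a three-letter code built from the letters given above (for example "CHS"); the actual column labels in the Template [EPMB 2.1] may be laid out differently.

```python
from itertools import product


def template_columns():
    """All combinations of criticality, duty cycle, and service condition."""
    return ["".join(combo) for combo in product("CN", "HL", "SM")]


def column_for(critical: bool, high_duty_cycle: bool, severe_service: bool) -> str:
    """Build the hypothetical three-letter column code for one component."""
    return (
        ("C" if critical else "N")
        + ("H" if high_duty_cycle else "L")
        + ("S" if severe_service else "M")
    )
```

A critical component with a low duty cycle in severe service would map to column_for(True, False, True), which yields "CLS" under this assumed coding.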
3. Is there a technical reason why the interval is longer or shorter than the EPRI PM
Basis Database recommendation?
The task interval appropriate for your plant may differ from the Database recommendation. It
might be shorter if you have experienced one or more failures or poor equipment condition
during previous PM task intervals. It might already be longer if past as-found equipment
condition was invariably good at previous task intervals and if you experienced no failures.
If the current interval is shorter than the EPRI recommendation, determine that there is no
technical reason in the history of the equipment that is responsible, for example, failures or
adverse equipment condition or trend. If there is no technical justification for limiting the current
interval, proceed at once to extend the interval to the EPRI recommended value, even if the
change is greater than 25%. The EPRI value will then be the value used until further as-found
evidence justifies changing it. If there is a valid technical reason to restrict the interval, you
should retain the current interval.
Remember that time-directed PM tasks cannot provide a valid and effective defense against truly
random failure mechanisms, that is, those for which you do not expect any kind of failure-free
period; they can happen at any time, even to a new component. Therefore, if a time-directed task
interval is shorter than the EPRI recommendation due to past failures, poor equipment condition,
or an adverse trend in condition, be sure that the task is indeed applicable to the relevant failure
mechanism. You can check this by looking at the Time of Failure column in the Task
Correlation Table [EPMB Table 3.2]. You will often find many “Random…” entries in this
column, which indicate that although the task can detect these degradations when
they occur, you should not expect to optimize this particular task interval on the basis of these
conditions. Nor should you permit the occasional failure from these random failure mechanisms
to influence the choice of the task interval if it is a time-directed task. Instead, select condition
monitoring type tasks to form the main defense against random failure mechanisms.
If the current interval is equal to or longer than the EPRI interval, determine that there is no valid
technical reason in the history of the equipment to prevent a further extension in interval from
being considered, for example, previous failures, adverse as-found equipment condition, or an
adverse trend in condition that may have caused it to be reduced to the current value. In the
absence of such evidence, proceed to the next step (4).
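The comparison with the EPRI recommendation can be summarized as a short decision function. This is a sketch under stated assumptions: the function name, the use of years as the unit, and the wording of the returned notes are hypothetical, and the boolean input stands in for the evaluator's review of failure history, as-found condition, and trends.

```python
def review_interval(current_years: float, epri_years: float,
                    technical_reason_in_history: bool):
    """Return (interval_to_use, note) following the logic described above."""
    if current_years < epri_years:
        if technical_reason_in_history:
            # A valid technical reason restricts the interval.
            return current_years, "retain current interval"
        # No justification for the shorter interval, even if the change exceeds 25%.
        return epri_years, "extend to EPRI recommended value"
    if technical_reason_in_history:
        # History suggests the interval was reduced to its current value for cause.
        return current_years, "do not consider further extension"
    return current_years, "proceed to step 4"
```

For a time-directed task, the history check should first exclude random failure mechanisms, since the text advises that these should not influence the interval.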
The prime indicator of a need to reduce the interval is poor component condition. You should
reduce the interval if the component condition has deteriorated to the point where you have little
confidence that, even after it has been restored, it will remain unfailed through the following
interval. Sometimes, an adverse trend in equipment condition can be extrapolated to show the
likely condition by the end of the interval.
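One simple way to extrapolate a trend is a least-squares straight line through the as-found measurements. The sketch below assumes the degradation is roughly linear in time, which is an assumption added here and will not hold for every mechanism; the function name is hypothetical.

```python
def extrapolate_linear(times, values, target_time):
    """Fit a least-squares line to (times, values) and evaluate it at target_time."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matching time/value pairs")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return mean_v + slope * (target_time - mean_t)
```

The projected value at the end of the next interval can then be compared with whatever acceptance limit the plant applies to that parameter.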
If you have already experienced one or more failures or have equipment in a severely degraded
condition, you need to establish the cause of failure or degradation to be sure that it is a
degradation mechanism that the task in question is supposed to address, that is, to detect before
failure occurs. In addition, the degradation must be of the wearout kind for a time-directed task
to have any significant probability of improving the situation. You can check which mechanisms
are the wearout kind by looking for a failure-free period in the Time of Failure column in the
Task Correlation Table [EPMB Table 3.2]. Be sure that the task was actually performed the
last time it should have been.
The time period over which the equipment should be free of failures should not be too long
(<<1/λ), but it should be long enough to contain at least a few task intervals. These
component-task intervals of experience may be accumulated over a group of like
components.
Note: The equipment condition requirement is necessary for a confident extension in interval,
and it requires a judgement that the condition is not merely good, but good enough to regularly
last to the extended interval.
There will be a more confident condition assessment if the condition of a number of similar
components is observed, if the observer knows what kind of degradation to look for, and if some
measured parameter can be trended. When a group of similar components is available for
interval extension, it is beneficial to stagger the initiation of the interval increase among
components so that some components can deliver condition information at the extended interval
before the others reach the extended interval.
Read the Progression of Degradation to Failure, and Failure Discovery and Intervention text of
the PM Application Notes [EPMB 2.3] to find clues to whether the recommended interval is a
candidate for interval extension. Sometimes this is discussed explicitly, especially when there
appears to be little scope for interval extension.
Residual failure rates when PM is effective (that is, the interval is less than or equal to the
shortest failure-free interval) are normally low enough to result in mean time between failures
(MTBFs) around 10 to 25 years, even when randomly occurring failure modes are factored in.
Therefore, the fact that zero failures have been observed in two or three intervals, or in 10 or
even more years, cannot be taken by itself as a justification to extend the interval. It is
not valid to extend intervals by referring to the absence of failures over time without also
considering the condition of the equipment.
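A short calculation illustrates why an absence of failures is weak evidence. It assumes, as a modeling simplification that is not stated in this report, that residual failures occur at a constant rate, so the chance of seeing no failures in a period is exp(-period/MTBF).

```python
import math


def prob_zero_failures(years: float, mtbf_years: float) -> float:
    """Probability of no failures in `years` for a constant failure rate 1/MTBF."""
    if mtbf_years <= 0:
        raise ValueError("MTBF must be positive")
    return math.exp(-years / mtbf_years)


# Ten years of experience on one component, at both ends of the quoted MTBF range.
p_mtbf_10 = prob_zero_failures(10, 10)   # exp(-1.0)
p_mtbf_25 = prob_zero_failures(10, 25)   # exp(-0.4)
```

Under this assumption a failure-free decade is a fairly likely outcome across the whole 10 to 25 year MTBF range, so observing it says little about how much margin the current interval contains.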
25% Extension
If equipment condition is satisfactory, you should feel comfortable extending the interval by
about 25% without further analysis. This is because in the worst case where the original interval
is optimal (that is, the interval is already set at the shortest failure-free interval of all the relevant
wearout failure mechanisms), and some failure modes become unprotected by the change, the
failure rate is unlikely to increase by more than 15 to 30%. The importance of good equipment
condition at the existing interval is that it adds a significant measure of conservatism to this
estimate by making it much less likely that the worst case applies.
When extensions are required for practical reasons to be larger than 25%, for example when 1.5
years would be changed to three years, this 100% increase in interval may increase the failure
rate by 125 to 250% (that is, the new rate could be 3.5 times the old rate) in the above worst case,
that is, when you start at the optimum interval and failure modes become unprotected. However,
this result is not only the worst case but also supposes that you do not have any prior knowledge
of whether failure modes will become unprotected by the interval extension. Observing
consistently good equipment condition at the existing interval is one way to be sure that you will
not uncover failure mechanisms for even modest interval extensions. Judging that the condition
will remain good for the duration of a large proposed extension can be more demanding.
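The percentages quoted above convert to multipliers with simple arithmetic. The helper names below are hypothetical; the numbers are the ones given in the text.

```python
def extension_percent(old_interval: float, new_interval: float) -> float:
    """Percentage increase in interval, e.g. 1.5 years to 3 years."""
    return (new_interval - old_interval) / old_interval * 100.0


def rate_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase in failure rate into new rate / old rate."""
    return 1.0 + percent_increase / 100.0


ext = extension_percent(1.5, 3.0)   # 100% extension
low = rate_multiplier(125)          # lower end of the quoted worst-case range
high = rate_multiplier(250)         # upper end of the quoted worst-case range
```

So the quoted worst-case range of 125 to 250% corresponds to a new failure rate between 2.25 and 3.5 times the old rate.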
To gain additional confidence that a large extension will not result in unacceptable failures, you
should try to ensure that one or more of the relevant failure modes do not lose their PM
protection. The relevant PM Task column in the Correlation Table [EPMB Table 3.2] will
display the Failure Locations and Degradation Mechanisms that are addressed by the task. The
smallest failure-free interval in the Time of Failure column will give an idea of the optimal
interval with regard to that mechanism. Consider, however, that not every failure mode is
relevant, that is, needs to be addressed, as the following list illustrates:
1. For non-critical components, you need to defend only against failure causes that have
occurred before at this plant and against the most common failure causes, and maybe not all of
these, depending on economic factors. Find the most common failure causes in Building A
PM Strategy [EPMB 3.1].
2. Random failure mechanisms cannot be defended against by time-directed tasks and should be
ignored for these tasks, even if a time scale is indicated for them; they should be included for
condition monitoring tasks.
3. Even for critical components, you may leave certain failure mechanisms unprotected if you
believe that there is a technical reason (for example, design, operating conditions) why they
would not influence your equipment. You can look at the factors that influence the
degradation by finding the Degradation Influences column in the Degradation Table
[EPMB Table 3.1]. Depending on the situation, some of the wearout failure mechanisms (that
is, those with a failure-free interval) could nevertheless be left unprotected if all of the
following are true:
• They are not ones that have occurred before (see item 1).
• There is not a significant likelihood that the influences driving them will arise.
4. Failure modes might not need to be protected by the task in question if they are adequately
protected by another PM task that is performed at an appropriate interval. Check the other
task columns on the Correlation Table [EPMB Table 3.2].
5. You are currently evaluating failure mechanisms that can be addressed by the selected PM
task. There will probably be other failure mechanisms that are not covered by any PM task
and failures caused by the imperfect execution of PM tasks. The above numerical estimates
of the effect on failure rates assumed that the task in question addressed 50% of the wearout
failure mechanisms and that the interval in question belonged to a task that was 90%
effective.
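Items 1 through 4 of the list above can be read as a screening filter over the mechanisms addressed by a time-directed task. The sketch below is one possible reading, not a definitive rule set: the class, its field names, and the sample mechanisms are hypothetical, and the economic judgement mentioned in item 1 is left to the evaluator.

```python
from dataclasses import dataclass


@dataclass
class Mechanism:
    name: str
    wearout: bool                # has a failure-free period (not random)
    occurred_before: bool        # has occurred before at this plant
    common: bool                 # among the most common failure causes
    influences_likely: bool      # driving influences could arise at this plant
    covered_by_other_task: bool  # adequately protected by another PM task


def needs_protection_by_task(m: Mechanism, critical: bool) -> bool:
    """Screen one mechanism for a time-directed task under review."""
    if not m.wearout:
        return False  # item 2: random mechanisms are ignored for time-directed tasks
    if m.covered_by_other_task:
        return False  # item 4: another task already provides the protection
    if critical:
        # item 3: may be left unprotected only if it has not occurred before
        # and its driving influences are unlikely to arise
        return m.occurred_before or m.influences_likely
    # item 1: non-critical components defend against prior and common causes
    return m.occurred_before or m.common


# Hypothetical examples for illustration only.
wear = Mechanism("bearing wear", True, False, True, True, False)
random_fault = Mechanism("random electronic fault", False, True, True, True, False)
```

The smallest failure-free interval among the mechanisms that pass this screen then gives an idea of how far the interval can be extended.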
The Interval Charts, in Appendix A of this report, provide a quick way to estimate the change in
failure rate for an increase in PM task interval. There are three charts for different values of task
effectiveness. As an example, if the task effectiveness is 70% and if the task addresses only 10%
of the wearout mechanisms, the E=0.7 chart shows an increase in failure rate that is only about
3% for a 50% interval extension.
Figure 3-1
Worst Case Increase in Failure Rate for Increase in Task Interval, Task Effectiveness 70%
[Chart not reproduced. Horizontal axis scale: 0 to 100. Curves are shown for Rho = 1.0, 0.5, 0.3,
and 0.1.]
Figure 3-2
Worst Case Increase in Failure Rate for Increase in Task Interval, Task Effectiveness 90%
[Chart not reproduced. Horizontal axis scale: 0 to 100. Curves are shown for Rho = 1.0, 0.5, 0.3,
and 0.1. Rho is the fraction of wearout mechanisms that is addressed by the task but not
addressed by any other tasks.]
Figure 3-3
Worst Case Increase in Failure Rate for Increase in Task Interval, Task Effectiveness 95%
[Chart not reproduced. Vertical axis: Increase in Failure Rate (%). Horizontal axis scale: 0 to
100. Rho is the fraction of wearout mechanisms that is addressed by the task but not addressed
by any other tasks.]
4
PM AUDIT GUIDELINE
4.1 Context
The following PM Audit process is intended to provide a quick survey of the PM tasks and
intervals on key equipment in a few systems. It provides conclusions on how well the tasks and
intervals conform to the suggestions in the EPRI PM Basis Database. It also provides technically
based recommendations for changes.
The main reasons for performing such an audit would be to validate other findings of
shortcomings in the PM program and to assess the nature and degree of changes required to
optimize the PM program. The audit can be performed on two or three systems in one week by
two persons, one of whom should be a PM coordinator or an equivalent person who understands
the PM program and is cognizant of the plant databases.
The following procedure assumes that the systems to be audited have been selected. The
methodology of the procedure is to compare plant PM tasks with the EPRI Database
recommendations using the forms that are included as Appendix B. This is repeated for specific
component tags; the reasons for any differences are then added, together with recommendations
for cases where there is a lack of justification. The set of forms becomes the audit report.
An interview with the appropriate system, component, or program engineer may be needed in
Step 8 and again later. Usually two brief interviews are needed, so it helps to conduct the audit
physically close to these individuals. Read through all the procedure steps before starting, and
give some thought to the order in which to address the component types and systems in relation
to the availability of staff.
1. Make a list of the major component types to be evaluated in the first system. For example,
these could be: horizontal pump, 4KV motor, 4KV breaker, 480V load center breaker,
Terry® turbine, check valve, MOV, AOV, and heat exchanger. The P&IDs and FSAR
description of the system can help.
2. Select the first of these component types and go to the PM Audit forms in Appendix B. Find
the form for the component type. Make a few copies of that page. Repeat this for the other
component types. You may subsequently need to repeat this action more times as you choose
to evaluate further examples of the same component types. The audit is performed by
completing each table, adding recommendations, and summarizing the overall findings in a
few paragraphs.
3. Enter the tag numbers for a group of components of the first component type in the upper left
part of the first table (Component IDs). This group should all have the same duty cycle,
service conditions, and functional importance (for example, “SW-P1 to SW-P5” for five
service water pumps).
4. Enter the functional importance of the components as “critical” or “non-critical” in the top
right box next to “Category.” The objective of PM could be to prevent all failures of a
component as far as possible (a critical component) or to limit the number of failures to an
acceptable level (a non-critical component).
Components may be critical by virtue of their importance to safety, to production, or to both.
Examples of critical components would be those causing a trip, a power reduction >5%, or a
plant transient; entering an LCO without a good expectation of restoring to operability before
expiration of the AOT; causing a loss of a safety function; causing a loss of a redundant train
of a safety system; causing a personnel hazard; or causing a large, uncontrolled
environmental release of a toxic substance.
Furthermore, any Maintenance Rule SSC functions, which have performance criteria that
allow only one failure in two years among the whole set of components that provide the
function (for example, the train), are not likely to meet the performance criteria unless the
PM objective for individual components is to prevent all failures. Therefore, components
whose failure can defeat an SSC function that has such restricted performance criteria will
usually be critical components regardless of whether they are risk significant or not.
However, some components might have to meet a more relaxed reliability target (for
example, a maximum of three failures in two years, or even more), so individual failures are
not too important, but the components should nevertheless not be allowed to fail too often. Or a
component causes significant costs when it fails even though such costs are not on the same
scale as a loss of production. This could be a result of self or secondary damage, additional
waste disposal costs, additional testing or requalification of other equipment, or significant
radiation exposure during repair (more than during PM). All of these components are classed
as non-critical.
5. It is not necessary to decide on duty cycle and service conditions if these characteristics make
no difference to tasks and intervals. To determine if it is necessary to know the duty cycle
and service conditions, look either at the Template form [EPMB 2.1] for the component type
or at the Template Sensitivity table in Appendix C of this report. The Template will show
explicitly whether duty cycle and service conditions make a difference to the tasks and task
intervals. The Template Sensitivity table is a summary of these dependencies for critical
components. If the component type shows a “T” or an “I” on the Template Sensitivity table,
it means that the tasks or intervals depend on the relevant characteristic. An “I,M” means that
the interval has a minor dependence, within the discretionary ability of a plant to decide its
own initial interval. Shaded entries indicate cases where there is either no dependence on the
relevant characteristic for tasks or intervals, only a minor dependence for task intervals, or no
need for a separate assessment of criticality. It is suggested that you use the Template
Sensitivity table for easy reference for critical components, which are likely to be the
majority of those evaluated.
If you need to, check the factors that affect the duty cycle and service conditions for the tags
in question by looking at the Definitions Of Template Application Conditions [EPMB
2.5], and then add the duty cycle and/or service conditions to the criticality assignment on the
audit form (for example, critical, high, mild). Reference to the Template Sensitivity table can
also avoid spending time on criticality assignments in some cases (for example, compressors
or positive displacement pumps).
6. Complete the third box on the right of the audit form by adding the task intervals from the
EPRI Database Template to the row marked “Industry PM Basis Tasks and Intervals.”
7. Search for plant PM tasks that correspond to those on the audit form. You should refer to the
PM Application Notes [EPMB 2.3] for the component type to check the content of tasks
such as “External Visual Inspection,” “Off-line Electrical Tests,” and others where it may not
be obvious whether the plant task has equivalent content. Enter the plant intervals in the
second box on the right of the audit form, or enter “No Task.” Add tasks that do not
correspond to the EPRI Database tasks. The tasks in the second and third boxes have a one-
to-one correspondence.
Do not ignore regulatory programs such as Technical Specification Surveillance Tests, EQ,
Appendix J, ASME Section XI testing, check valve and MOV programs, or other plant
programs such as thermal performance monitoring. Also include operator rounds, engineer
walkdowns, and tasks that may not exist as scheduled PM tasks such as thermography
rounds.
It has been found that the availability of information on plant PM tasks has a direct bearing
on how much work can be accomplished in an audit of fixed length. The additional value of
the EPRI Database in this step is that the search is not vague and open ended, but focused
and directed. An interview may be necessary with the system, component, or program
engineer to answer questions about plant PM tasks, and to verify the information added on
duty cycle and service conditions in steps 5 and 6.
8. Compare the plant tasks and intervals with the EPRI Database recommendations and look for
explanations for the differences. Enter the explanations in the fourth row on the right of the
audit form in “Plant Basis for the Differences”. Keep the numbered items in one-to-one
correspondence with those above. These explanations could be comments such as “Repeat
failures have led to short interval,” “(2) and (3) more frequent than necessary but driven by
pump intervals,” “No erosion has ever been observed,” “Overhaul has no basis - not been
done since 1988,” “Just increased from 3Y due to consistently good condition,” or “No
specific reason for SOV replacement interval.”
It is quite likely that an interview will be required with an appropriate system, component, or
program engineer to elicit this information.
9. Enter recommendations in the fifth row of the audit form to address the most important
differences. These should usually follow the format “Consider changing abc, to address xyz,”
where a few words of explanation are added from the text of PM Application Notes [EPMB
2.3] to point out the existing vulnerability or lack of cost-effectiveness.
10. Finally, after all relevant equipment has been processed, write a few additional paragraphs to
add the objectives of the audit, add the names of those interviewed, and summarize the main
findings.
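For plants that keep the audit record electronically, the form filled in by steps 3 through 9 could be represented roughly as follows. This is a sketch only: the class and field names are hypothetical and do not reproduce the layout of the forms in Appendix B, and the sample intervals are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuditTaskRow:
    task_name: str
    epri_interval: Optional[str]   # None if the plant task has no EPRI Database counterpart
    plant_interval: Optional[str]  # None corresponds to "No Task"
    plant_basis: str = ""          # step 8: explanation for the difference
    recommendation: str = ""       # step 9: "Consider changing abc, to address xyz"


@dataclass
class AuditForm:
    component_ids: str             # step 3, e.g. "SW-P1 to SW-P5"
    category: str                  # steps 4 and 5, e.g. "critical, high, mild"
    rows: List[AuditTaskRow] = field(default_factory=list)

    def differences(self) -> List[AuditTaskRow]:
        """Rows where plant and EPRI Database intervals do not match."""
        return [r for r in self.rows if r.epri_interval != r.plant_interval]

    def lacking_justification(self) -> List[AuditTaskRow]:
        """Differences with no recorded plant basis; candidates for recommendations."""
        return [r for r in self.differences() if not r.plant_basis.strip()]


# Hypothetical entries for illustration only.
form = AuditForm("SW-P1 to SW-P5", "critical, high, mild")
form.rows.append(AuditTaskRow("External Visual Inspection", "1Y", "1Y"))
form.rows.append(AuditTaskRow("Overhaul", "6Y", None))
form.rows.append(AuditTaskRow("Oil Analysis", "6M", "3M", plant_basis="Repeat failures"))
```

Keeping one row per task preserves the one-to-one correspondence between the plant and EPRI Database entries that steps 7 and 8 call for.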
5
EVALUATION OF AS-FOUND CONDITION GUIDELINE
5.1 Context
Evaluation of as-found condition assumes that you have condition reports or trends developed
over a period of time on equipment that includes or closely resembles the equipment in
question in design, environment, and operating conditions. The condition information is to be
evaluated to decide if the PM tasks or task intervals should be modified to prevent failures. If PM
tasks or intervals are determined to require modification, the guidance offered in Section 2 and
Section 3 should be followed.
As-found information is obtained in a wide range of formats, extending from a few words to
detailed health assessments compiled from a variety of condition monitoring techniques. The
examples provided below give an idea of the level of as-found condition information that can be
supported by the EPRI PM Basis Database. It is beneficial to collect such specific information
covering the most common failure mechanisms. In fact, complete sets of as-found data collection
forms have been prepared by EPRI contractors for about 20 of the component types in the EPRI
PM Basis Database.
No matter how the condition information is recorded, it must lend itself to straightforward
interpretation as follows:
• Condition Acceptable - No action needed at this time. The condition of the equipment is
determined to be good enough that there is no need to perform the PM task at this time (that
is, at the existing interval). Equipment performance is expected to remain satisfactory
until the next scheduled task implementation even if the current task were not performed.
• Condition Marginal - Equipment may or may not perform acceptably until next time if the
current task is not performed now. Some deterioration exists, which should give no cause for
alarm. The degradation is indicative of normal wear and exposure to anticipated service
conditions and would normally be corrected, either during the current PM task or by a
separate corrective maintenance (CM) work order. In a condition marginal situation, action is
generally taken to improve the condition because without action it is not clear whether it may
become unacceptable before the next time the task is performed. Marginal degradation
indicates that the task content and interval may be approximately appropriate.
x Condition Unacceptable - Action required now. A degraded condition exists that is judged to
be unacceptable at this time. The unacceptable condition does not have to be equivalent to a
failure, but, even though the condition will be corrected at this time, it casts doubt on the
ability of the equipment to function satisfactorily in the future if the current task remains at
5-1
EPRI Licensed Material
the current interval. The designation of “unacceptable” suggests that the risk of failure is too
high for comfort and that corrective action should have been taken earlier.
For example, for a PM task in which a lubricant sample is taken and lubricant level is
monitored, levels of degradation can be expressed with the following organization of
information:
If the measurement has a trip value and a reset value, both may be captured.
The following procedure will assume that the above assignments of acceptable, marginal, and
unacceptable can be determined from the information available. If this is not the case, the
information is not capable of supporting decisions relating to PM tasks and intervals. Consider
the existence of stable trends in condition, if any. Extrapolation of such trends can provide the
answer to items below.
1. Identify the applicable PM task that best addresses the degradation mechanisms underlying
the condition information of interest. This may not be the task during which the condition
was noticed and reported.
As an example, suppose the two applicable tasks are an overhaul at 15 years and an inspection
every 3 years, and an inspection produces an “Unacceptable” condition report involving a
degradation mechanism that develops steadily over 10 years. Even though the degradation is
observed during one of the inspections, that observation may have little bearing on whether the
inspection interval should be 1 year, 2 years, or 4 years. On the other hand, the estimated
failure-free period of the mechanism might be very significant to whether the overhaul interval
should be changed to 8 years or 10 years.
However, if the failure-free period is not known and cannot be estimated, it may be better to
rely on the inspection to find and monitor the progress of the degradation. The inspection
task then becomes a condition monitoring exercise for this degradation, and the inspection
interval becomes the focus of the evaluation.
An applicable PM task is one that is designed to address the degradation mechanisms being
evaluated. These aspects of as-found condition are the relevant ones for the applicable task.
2. For time-directed tasks (in contrast to condition monitoring activities), the degradation
mechanisms being evaluated must be of the wearout kind (that is, having an expected failure-
free period) if the appearance or absence of the degradation is to have an influence on the
task interval. Failing this, the condition information is not relevant for adjusting the task
interval.
3. If one or more relevant aspects of equipment condition are unacceptable, this will generally
indicate that the interval should be shortened, especially if the condition has been seen before
or has occurred on similar equipment, or is a known frequent cause of failures in industry
experience.
4. If all aspects of equipment condition are acceptable, lengthen the interval. An intermediate
step would be to defer the task one time and to reassess the condition. See Section 6, Task
Deferral, or Section 3, Interval Evaluation.
5. If some aspects of equipment condition are marginal, and none are unacceptable, do not
change the interval or defer the task.
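The decision logic of steps 3 to 5 can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the EPRI guideline: the names `Condition` and `interval_recommendation` are hypothetical, and the sketch reads step 4 as applying when all relevant aspects are acceptable and step 5 as applying when some are marginal and none are unacceptable. It presumes that the relevance checks of steps 1 and 2 have already been made.

```python
from enum import Enum


class Condition(Enum):
    ACCEPTABLE = "acceptable"
    MARGINAL = "marginal"
    UNACCEPTABLE = "unacceptable"


def interval_recommendation(relevant_conditions):
    """Map the as-found ratings of the relevant condition aspects to steps 3-5."""
    conditions = list(relevant_conditions)
    if not conditions:
        raise ValueError("at least one relevant condition rating is required")
    # Step 3: any unacceptable aspect generally points to a shorter interval.
    if any(c is Condition.UNACCEPTABLE for c in conditions):
        return "shorten interval"
    # Step 4: every aspect acceptable; lengthen, or defer once and reassess.
    if all(c is Condition.ACCEPTABLE for c in conditions):
        return "lengthen interval (or defer once and reassess)"
    # Step 5: some aspects marginal, none unacceptable.
    return "no change"
```

A mixed report such as `[Condition.ACCEPTABLE, Condition.MARGINAL]` falls through to the "no change" branch.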
The Database can provide assistance in answering these questions. In particular, the Correlation
Table [EPMB Table 3.2] will indicate if a degradation mechanism is addressed by other tasks
that are performed (look for “X” across the row) and will also show which other degradation
mechanisms are relevant for each task (look for “X” down the task column).
The Correlation Table will also show in the Time of Failure column whether a degradation
mechanism is expected to possess a failure-free period.
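The row and column lookups described above can be illustrated with a small Python sketch. The data layout, field names, and function names below are assumptions made for illustration; the rows are placeholders and are not taken from the EPRI PM Basis Database.

```python
# Hypothetical in-memory form of a Correlation Table: one record per
# degradation mechanism; "tasks" holds the task columns marked "X".
correlation_table = [
    {"mechanism": "mechanism A", "time_of_failure": "failure-free period",
     "tasks": {"Task 1", "Task 3"}},
    {"mechanism": "mechanism B", "time_of_failure": "random",
     "tasks": {"Task 1"}},
    {"mechanism": "mechanism C", "time_of_failure": "failure-free period",
     "tasks": {"Task 2"}},
]


def tasks_addressing(table, mechanism):
    """Look across the row: tasks marked "X" for one mechanism."""
    for record in table:
        if record["mechanism"] == mechanism:
            return sorted(record["tasks"])
    return []


def mechanisms_for_task(table, task):
    """Look down the column: mechanisms marked "X" for one task."""
    return [r["mechanism"] for r in table if task in r["tasks"]]


def has_failure_free_period(table, mechanism):
    """Read the Time of Failure entry; anything not "random" counts as wearout."""
    for record in table:
        if record["mechanism"] == mechanism:
            return record["time_of_failure"] != "random"
    raise KeyError(mechanism)
```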
6
PM TASK DEFERRAL EVALUATION GUIDELINE
6.1 Context
The Task Deferral Evaluation is focused on justifying a one-time deferral of a single task, rather
than making a decision to increase the interval permanently. Because the deferral is limited to a
single occasion, the level of risk is generally less than that which would accompany a permanent
change. However, deferrals are often sought for purely logistical reasons and may not be
supported by equipment condition information.
The following procedure assumes that you have no historical adverse equipment condition
information to suggest that the deferred task will lead to an unacceptable equipment condition or
failure.
It will also be assumed that there is no specific reason to suspect that deferring the task will leave
known failure mechanisms unmonitored. Nevertheless, since the possibility exists, two things
will happen. There will be an increase in the failure rate and an increase in the probability of
experiencing a failure during the period of deferral. The failure rate increases with the proportion
of the interval represented by the deferral. The probability of a failure increases with the failure
rate and with the length of the deferral.
This suggests different treatment for critical and non-critical equipment. For an explanation of
critical and non-critical equipment see Section 3.
In both cases it is assumed that the increase in probability of a failure should not be more than
0.1 (that is, 10% probability of a failure in absolute terms). For critical equipment it is also
assumed that the failure rate should not be permitted to increase by more than a factor of two.
This is because an increase of a basic event probability in the plant PSA by a factor of two will
cause a relative increase in the core damage frequency equal to the Fussell-Vesely (FV)
risk-significance fraction. This should be acceptable on a one-time basis for a limited period except
for the most risk-significant equipment. It is assumed that PM tasks for very risk-significant
equipment (for example, FV>5%) would not be deferred without additional evaluation. The
failure rate criterion is not applied in the case of non-critical equipment.
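The screening criteria stated above can be sketched as follows. This is a hedged illustration, not the calculation used to derive the deferral rules: the function names are hypothetical, `failure_probability` assumes a constant failure rate over the period considered, and `relative_cdf_increase` uses the usual rare-event approximation in which core damage frequency is linear in the basic event probability.

```python
import math

MAX_FAILURE_PROBABILITY_INCREASE = 0.1  # limit stated for all equipment
MAX_FAILURE_RATE_FACTOR = 2.0           # limit stated for critical equipment only
FV_ADDITIONAL_EVALUATION = 0.05         # FV > 5% needs additional evaluation


def failure_probability(rate_per_year, years):
    """Probability of at least one failure, assuming a constant failure rate."""
    return 1.0 - math.exp(-rate_per_year * years)


def relative_cdf_increase(fv, probability_factor):
    """Relative increase in core damage frequency when a basic event
    probability is multiplied by probability_factor (linear approximation)."""
    return fv * (probability_factor - 1.0)


def deferral_within_criteria(probability_increase, rate_factor, critical, fv=0.0):
    """Return True if a one-time deferral meets the stated screening criteria."""
    if critical and fv > FV_ADDITIONAL_EVALUATION:
        return False  # very risk-significant: evaluate further before deferring
    if probability_increase > MAX_FAILURE_PROBABILITY_INCREASE:
        return False
    if critical and rate_factor > MAX_FAILURE_RATE_FACTOR:
        return False
    return True
```

With `probability_factor = 2`, `relative_cdf_increase` returns the FV fraction itself, which is the relationship the text relies on.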
The following rules were derived from detailed failure rate calculations that took into account the
generic number and distribution of failure mechanisms, random failures from non-wearout
failure mechanisms, task effectiveness, and the proportion of wearout failures addressed by a PM
task. The rules assumed that the latter proportion was conservatively 100%; different
assumptions can be inserted during additional evaluation. The calculations also assumed that
there is no conservatism in the existing PM program in order to maximize the effects of
deferrals.
For non-critical equipment, defer one time without further evaluation, up to the limits shown in
Table 6-1. If the deferral is longer than these limits, it requires evaluation.
Table 6-1
Non-Critical Equipment Deferral Limits
For critical equipment, defer one time without further evaluation, up to the limits shown in Table
6-2. If the deferral is longer than these limits, it requires evaluation. Tasks with intervals of 1.5
years or less should not be deferred without evaluation.
Table 6-2
Critical Equipment Deferral Limits
The above rules should cover almost all cases of interest without the need for further evaluation.
However, any one of the following three methods may be used to justify task deferral outside the
above limits:
• Discover if the equipment condition has consistently been good enough to enable it to reach
the deferred task execution time without failure. This requires data from plant-specific
experience.
• If the interval is sufficiently less than the interval recommended in the Template Form
[EPMB 2.1] so that even with the deferral, the combined period does not exceed 125% of the
Template interval, defer the task. This requires a conservative task interval.
• Evaluate the limit in the above tables using more specific input data:
1. Find the proportion of wearout failure mechanisms that are addressed by the deferred
task. To do this, count the records in the Correlation Table [EPMB Table 3.2] that
have a failure-free period indicated under Time of Failure (that is, those that are not
random), and that also have an X in the relevant PM Task column.
2. Find the total number of wearout modes by adding all those with failure-free
intervals.
3. Divide the number of wearout modes addressed by the task by the total number of
wearout modes. Call this fraction ρ (rho).
A. Select the chart that corresponds most closely to the overall effectiveness of the
task (E). If the degradation addressed by a task is present when the task is
performed, effectiveness means the probability that it is not still present after the
task is performed, that is, that the task succeeds in doing what it is supposed to do.
Most tasks will have an effectiveness around 90% (E = 0.9); this means that, on
average, the task would miss or fail to correct the targeted degradation about once in
every ten times it is performed.
The E = 0.7 chart is for tasks that have only a marginal chance of being
successful. The E = 0.95 chart is for tasks that have a better chance than most of
finding the targeted degradation. These might be overhauls, tasks that perform a
very specific test, or tasks that replace subcomponents.
B. Select the curve for the value of ρ. Read the percentage increase in failure rate
that corresponds to the percentage you are increasing the interval by deferring the
task.
C. For non-critical equipment, use Z = 0.06 for E = 0.7; Z = 0.02 for E = 0.9;
Z = 0.01 for E = 0.95.
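As an illustration of steps 1 to 3 and of the 125% Template check in the list above, the following Python sketch computes ρ from a table of records and tests the combined period against the Template interval. The record layout and function names are assumptions, and the rows are placeholders rather than database content.

```python
def wearout_fraction(records, task):
    """rho: share of wearout mechanisms (those with a failure-free period)
    that have an "X" in the column of the deferred task."""
    wearout = [r for r in records if r["failure_free_period"]]
    if not wearout:
        raise ValueError("no wearout mechanisms in the table")
    addressed = [r for r in wearout if task in r["tasks"]]
    return len(addressed) / len(wearout)


def within_template_margin(current_interval, deferral, template_interval):
    """True if interval plus deferral stays within 125% of the Template interval."""
    return current_interval + deferral <= 1.25 * template_interval


# Placeholder records: four wearout mechanisms and one random mechanism.
records = [
    {"failure_free_period": True, "tasks": {"Task 1"}},
    {"failure_free_period": True, "tasks": {"Task 1", "Task 2"}},
    {"failure_free_period": True, "tasks": {"Task 1"}},
    {"failure_free_period": True, "tasks": {"Task 2"}},
    {"failure_free_period": False, "tasks": {"Task 1"}},
]
rho = wearout_fraction(records, "Task 1")  # 3 of 4 wearout mechanisms
```

The random mechanism is excluded from both the numerator and the denominator, as the procedure requires.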
7
CAUSE EVALUATION GUIDELINE
7.1 Context
The EPRI Database contains information on failure locations, degradation mechanisms, and
opportunities for detection of degraded conditions, which can be useful to a cause evaluation. A
cause evaluation normally results from a failure event or from poor performance, such as failing
to meet Maintenance Rule performance criteria. For these cases, the Degradation Table [EPMB
Table 3.1] will provide the best view of the data.
Additionally, the selection and justification of appropriate corrective actions may involve
changes to PM tasks or intervals. This can include justifying why adopting a particular PM task
is not the appropriate corrective action. To optimize corrective actions that consist of selecting
PM tasks, follow the procedures described in Section 2 and in Section 3.
It is assumed that you will either have a particular failure location or degradation mechanism in
mind or be focused on one or more symptoms without being specific about the failure location
and degradation mechanism. Either way, it will be useful to read the following information to
obtain an overview of the EPRI Database capabilities to support cause evaluations.
To see a list of the failure locations and degradation mechanisms that are most likely for this
equipment, look at the Degradation Table [EPMB Table 3.1]. The failure locations and
degradation mechanisms listed are a good way to be sure you have considered the range of
failure locations and mechanisms that might be relevant in a cause evaluation. If the failure
location or degradation mechanism of interest appears in the table, you can discover the factors
that are likely to influence the development of a failure by looking at the column headed
Degradation Influences. You might also want to check if it is among the common failures noted
in the Building A PM Strategy text [EPMB 3.1].
When you are focused on the right records in the Degradation Table [EPMB Table 3.1], look to
the right to find the Discovery Opportunity field. The entries in this field usually contain multiple
items. Almost always, these are discovery opportunities in which the degraded condition can be
detected. By their nature, they frequently correspond to symptoms of the condition. If you start
with a symptom, the best approach is to look for it in the Discovery Opportunity column.
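Starting from a symptom, the search of the Discovery Opportunity column might look like the sketch below. The field names and the function `find_by_symptom` are hypothetical, and the rows are placeholders; the actual layout of the Degradation Table may differ.

```python
def find_by_symptom(degradation_table, symptom):
    """Return rows whose Discovery Opportunity entries mention the symptom."""
    needle = symptom.lower()
    return [
        row for row in degradation_table
        if any(needle in entry.lower() for entry in row["discovery_opportunities"])
    ]


# Placeholder rows, not data from the EPRI Database.
degradation_table = [
    {"failure_location": "location A", "degradation_mechanism": "mechanism A",
     "degradation_influences": ["influence 1"],
     "discovery_opportunities": ["Vibration", "Inspection"]},
    {"failure_location": "location B", "degradation_mechanism": "mechanism B",
     "degradation_influences": ["influence 2"],
     "discovery_opportunities": ["Oil analysis"]},
]

matches = find_by_symptom(degradation_table, "vibration")
```

Each matching row then gives the candidate failure location, degradation mechanism, and Degradation Influences to consider in the cause evaluation.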
A
INTERVAL CHARTS
[Chart: Failure Rate Per Year (0 to 0.025) over a horizontal axis running from 5 to 10; curves labeled Model and MBA]
Figure A-1
Comparison of Single Mode Model with Accurate Solution (MBA)
[Chart: Excess Failure Rate (%) (0 to 50) over a horizontal axis running from 5 to 50; curves for E; B = 0.9; 1, 0.8; 1, 0.9; 2, and 0.8; 2]
Figure A-2
Excess Failure Rate (%) vs. Standard Deviation When Mean of Deviations of Task Time
from Designated Interval Equals Zero
[Chart: Excess Failure Rate (%) versus Mean of Difference Between Actual Task Time and Designated Interval (% of D. Interval), from -10 to 50; curves for E; B = 0.9; 1, 0.8; 1, 0.9; 2, and 0.8; 2]
Figure A-3
Excess Failure Rate (%) vs. Deviation from Interval (%) with Standard Deviation = 12.5%
[Chart: Excess Failure Rate (%) versus Mean of Difference Between Actual Task Time and Designated Interval (% of D. Interval); curves for E; B = 0.9; 1, 0.8; 1, 0.9; 2, and 0.8; 2]
Figure A-4
Excess Failure Rate (%) vs. Deviation from Interval (%) with Standard Deviation = 25%
B
PM AUDIT FORMS
MOV
Component IDs: Unit: Component Type: MOV
Category:
Current Plant Program:
1. Diagnostics - Direct Force
2. Diagnostics - Electrical Force
7. Functional Test
Recommendations:
AOV
Component IDs: Unit: Component Type: AOV
Category:
Current Plant Program:
1. Calibration of Accessories
2. Packing Inspection / Adjustment
3. Visual External Inspection
4. Diagnostic Scan
5. Internal Leak Detection
6. Ultrasonic Techniques - Minimum Wall Thickness
7. Air Supply Filter Replacement
8. Actuator Assembly Overhaul
9. Replacement of Accessories
10. Valve Assembly Overhaul
11. Packing Replacement
12. Stroke Test (Timed Stroke, SOV & Limit Switch Actuation)
Recommendations:
SOV
Component IDs: Unit: Component Type: SOV
Category:
Current Plant Program:
1. Valve Body Bonnet - Elastomer Replacement
2. Functional Tests
Recommendations:
Check Valve
Component IDs: Unit: Component Type: Check Valve
Category:
Current Plant Program:
1. Diagnostic Tests
2. Radiography
5. Overhaul
6. Functional Tests
Recommendations:
PORV - SOV
Component IDs: Unit: Component Type: PORV - SOV
Category:
Current Plant Program:
1. Calibration
2. Acoustic and Tail Pipe Temperature Monitoring
3. Overhaul Valve
Recommendations:
PORV - Pneumatic
Component IDs: Unit: Component Type: PORV - Pneumatic
Category:
Current Plant Program:
1. Calibration
2. Diagnostic Tests
4. Packing Replacement
Recommendations:
Safeties - Spring
Component IDs: Unit: Component Type: Safeties - Spring
Category:
Current Plant Program:
1. Set Point Verification
2. Operability Test
Intervals:
Recommendations:
Horizontal Pump
Component IDs: Unit: Component Type: Horizontal Pump
Category:
Current Plant Program:
1. Vibration Analysis
2. Oil Analysis
3. Performance Trending
5. Coupling Inspection
8. Partial Disassembly
9. Refurbishment
Recommendations:
Vertical Pump
Component IDs: Unit: Component Type: Vertical Pump
Category:
Current Plant Program:
1. Vibration Analysis
2. Performance Trending
6. Refurbishment
7. Functional Testing
Recommendations:
Recommendations
3. Coupling Inspection
8. Functional Tests
Recommendations:
HV Motor
Component IDs: Unit: Component Type: HV Motor
Category:
Current Plant Program:
1. Thermography
2. Vibration Monitoring
3. Oil Analysis
4. Electrical Tests - On-Line
5. Mechanical Tests - On-Line
6. Electrical Tests - Off-Line
7. Mechanical Tests - Off-Line
8. External Visual Inspection
9. Partial Disassembly and Inspection
10. Partial Refurbishment
11. Refurbishment
12. Functional Tests
Recommendations:
MV Motor (4KV)
Component IDs: Unit: Component Type: MV Motor (4KV)
Category:
Current Plant Program:
1. Thermography
2. Vibration Monitoring
3. Oil Analysis
4. Electrical Tests - On-Line
5. Mechanical Tests - On-Line
6. Electrical Tests - Off-Line
7. Mechanical Tests - Off-Line
8. External Visual Inspection
9. Partial Disassembly and Inspection
10. Partial Refurbishment
11. Refurbishment
12. Functional Tests
Industry PM Basis Tasks and Intervals:
1. Thermography
2. Vibration Monitoring
3. Oil Analysis
4. Electrical Tests - On-Line
5. Mechanical Tests - On-Line
6. Electrical Tests - Off-Line
7. Mechanical Tests - Off-Line
8. External Visual Inspection
9. Partial Disassembly and Inspection
10. Partial Refurbishment
11. Refurbishment
12. Functional Tests
Plant Basis for the Differences:
Recommendations:
LV Motor
Component IDs: Unit: Component Type: LV Motor
Category:
Current Plant Program:
1. Thermography (motors ≥200 Hp)
2. Vibration Monitoring (motors ≥200 Hp)
3. Oil Analysis (motors ≥200 Hp)
5. Electrical Tests - Off-Line (motors ≥200 Hp)
6. Brush Maintenance
8. Functional Testing
Industry PM Basis Tasks and Intervals:
1. Thermography (motors ≥200 Hp)
2. Vibration Monitoring (motors ≥200 Hp)
3. Oil Analysis (motors ≥200 Hp)
4. Electrical Tests - Off-Line (motors <200 Hp)
5. Electrical Tests - Off-Line (motors ≥200 Hp)
6. Brush Maintenance
7. External Visual Inspection
8. Functional Testing
Recommendations:
DC Motor
Component IDs: Unit: Component Type: DC Motor
Category:
Current Plant Program:
1. Vibration Monitoring
2. Insulation and Winding Resistance Test
4. Brush Maintenance
5. Functional Testing
Recommendations:
MCC
Component IDs: Unit: Component Type: MCC
Category:
Current Plant Program:
1. Thermographic Scan (Buckets and MCC Housing)
2. Clean, inspect, tighten, cycle (Buckets and MCC Housing)
Recommendations:
MV Breaker
Component IDs: Unit: Component Type: MV Breaker
Category:
Current Plant Program:
1. Thermography - Breaker and Cubicle including bus
2. Breaker - Visual Inspection
4. Breaker - Overhaul
6. Cubicle - Overhaul
8. Functional Test
Recommendations:
LV Breaker
Component IDs: Unit: Component Type: LV Breaker
Category:
Current Plant Program:
1. Thermography - Breaker and Cubicle including bus
2. Breaker - Visual Inspection
4. Breaker - Overhaul
6. Functional Test
Recommendations:
3. Detailed Inspection
Recommendations:
3. Cell Inspection
4. Detailed Inspection
6. Total Replacement
Recommendations:
Nickel-Cadmium Battery
Component IDs: Unit: Component Type: Nickel-Cadmium Battery
Category:
Current Plant Program:
1. Battery Monitoring
2. Cell Inspection
3. Detailed Inspection
Recommendations:
Charger
Component IDs: Unit: Component Type: Charger
Category:
Current Plant Program:
1. Thermography (includes visual on-line inspection)
2. Clean and Inspect
3. Component Replacement
4. Breaker Testing
Recommendations:
Inverter
Component IDs: Unit: Component Type: Inverter
Category:
Current Plant Program:
1. Thermography (includes visual on-line inspection)
2. Clean and Inspect
3. Component Replacement
4. Breaker Testing
Recommendations:
Heat Exchanger
Component IDs: Unit: Component Type: Heat Exchanger
Category:
Current Plant Program:
1. Performance Monitoring
2. NDE Inspection
4. Internal Inspection
5. Cleaning
6. Leak Testing
Recommendations:
Main Condenser
Component IDs: Unit: Component Type: Main Condenser
Category:
Current Plant Program:
1. Performance Monitoring
2. NDE Inspection
3. Waterbox Inspection
4. Hotwell Inspection
5. Cleaning
Recommendations:
Feedwater Heater
Component IDs: Unit: Component Type: Feedwater Heater
Category:
Current Plant Program:
1. Performance Monitoring
2. NDE Inspection
3. Internal Inspection
Recommendations:
4. Internal Inspection
5. Overhaul
6. Functional Tests
Recommendations:
3. Internal Inspection
4. Overhaul
5. Functional Tests
Recommendations:
3. Oil Analysis
4. Vibration Analysis
5. Calibration
7. Internal Inspection
Recommendations:
4. Internal Inspection
Recommendations:
3. Calibration
4. Performance Monitoring
5. Thermography
8. Internal Inspection
Recommendations:
Control Relay
Component IDs: Unit: Component Type: Control Relay
Category:
Current Plant Program:
1. Thermography
2. As-Found Testing and Calibration, Testing only for non-critical (test is operability only)
3. Functional Test
4. Replacement
Recommendations:
Protective Relay
Component IDs: Unit: Component Type: Protective Relay
Category:
Current Plant Program:
1. As-Found Testing and Calibration
2. Replacement of Electrolytic Capacitors
Recommendations:
Timing Relay
Component IDs: Unit: Component Type: Timing Relay
Category:
Current Plant Program:
1. Thermography
2. As-Found Testing and Calibration
3. Functional Testing
Recommendations:
Recommendations:
5. Lubrication
7. Turbine Overhaul
Recommendations:
5. Calibration
Recommendations:
Recommendations:
C
TEMPLATE SENSITIVITY TABLE
KEY:
T = Difference in tasks
I = Difference in intervals
I,M = Difference in intervals, but minor.
This shade indicates when the component is driven by another plant program or follows another component’s criticality. Criticality is important but no separate assessment is done.
This shade indicates when only minor differences in task intervals exist. The component is not worth a separate assessment of the attribute.
This shade indicates when tasks and intervals do not depend at all on the attribute. A separate assessment is not needed.
COMPONENT TYPE   VOL (PM Basis)   CRITICALITY   DUTY CYCLE (CRITICAL)   SERVICE CONDITIONS (CRITICAL)
AOV 1 T I I
Medium voltage switchgear 2 I,M I,M
Low voltage switchgear 3 I,M I,M
MCC (follow criticality of load) 4 T
Check (follow GL89-04) 5 I I,M I
MOV 6 T I I
SOV (follow criticality of AOV or other actuated equipment) 7 T I
Low voltage motor 8 T I,M I,M
Medium voltage motor 9 I,M I
High voltage motor 10 I,M I
DC motor 11 T I
Vertical pump 12 I
Horizontal pump 13 I T
Reciprocating air compressor 14
Rotary screw air compressor 15
PORV - solenoid 16 T
PORV - pneumatic 17
Safety reliefs 18
HVAC chillers & compressors 19 T
HVAC dampers & ducting 20 T T
HVAC air handlers 21 T
Inverter 22 I I
Charger 23 I I
Battery flooded lead acid 24 T T
Battery - valve regulated lead acid 25 T
Battery Nicad 26 I,M
Liquid ring pump 27 T I
Positive displacement pump 28
Relays protective (simplify criticality assignment) 29 I I
Relays control (simplify criticality assignment) 30 T I I
Relays timing (simplify criticality assignment) 31 T I,M
Heat exchangers 32 T T T
Feedwater heaters 33 I
Condenser 34
Main feedwater pump turbines 35
Single stage terry turbines 36 I
Turbine EHC hydraulic controls 37
Station type oil-filled transformers 38 T,I T,I I
D
A STRATEGY TO MANAGE PM TASKS WITHIN A
GRACE PERIOD
The interval charts and other numerical estimates used in this guideline are based on a series of
generic models of failure rates and their response to preventive maintenance, developed
specifically for this application. These models go by the names of “Effective Maintenance
Model,” “Run To Failure Model,” and “Missed Modes Model.” These models are described in
the following EPRI white paper, which was written to respond to utility requests for guidance on
the use of a grace period for the implementation of scheduled PM tasks. The grace period
analysis was performed by superimposing a statistical distribution of task performance times on
the above three models. The combination was given the name “KGB Model.” The paper is
reprinted here in its entirety because the maintenance and failure rate models support the
application guideline and the conclusions of the grace period analysis are relevant to many
utilities.
David H. Worledge
Applied Resource Management
Background
Practical constraints result in some PM tasks at nuclear power plants being performed later than
scheduled. This is unavoidable even in good maintenance programs where the PM intervals are
optimal or conservative. To limit the risk of additional failures, most plants adopt a “grace
period” for performing a PM task, limited to (for example) 25% beyond the scheduled time. PM
tasks delayed longer than the grace period are reported as delinquent.
Some plants schedule the tasks at intervals that are 20% shorter than the technically optimal
intervals so that a grace period 25% beyond the scheduled interval still meets the intent of the
optimal interval. Consequently, most PM tasks are scheduled and performed considerably sooner
than their optimal intervals in order to reduce almost to zero the number that become delinquent.
This trend adds to PM costs and may harm reliability by introducing unnecessary maintenance
error.
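The arithmetic behind the 20% and 25% figures can be checked with a short Python sketch. The function name `scheduled_and_overdue` is hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def scheduled_and_overdue(optimal_interval,
                          conservatism=Fraction(1, 5),
                          grace=Fraction(1, 4)):
    """Return (scheduled interval, overdue interval) for a task scheduled
    `conservatism` shorter than optimal with a `grace` fraction allowed
    beyond the scheduled interval."""
    scheduled = optimal_interval * (1 - conservatism)
    overdue = scheduled * (1 + grace)
    return scheduled, overdue


# A 10-year optimal interval scheduled 20% early, with 25% grace.
scheduled, overdue = scheduled_and_overdue(Fraction(10))
```

Here the scheduled interval is 8 years and the overdue point falls at 10 years, so the end of the grace period coincides with the optimal interval.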
The objective of this paper is to generate a reasonable strategy from a reliability perspective,
which plants can adopt as policy, regarding: 1) how long the grace period should be, 2) when
tasks should be performed within the grace period, 3) how to track the plant performance in
meeting this goal, and 4) the degree to which the overdue date might be exceeded.
The Appendix describes a generic numerical model to assess the impact of PM task intervals on
reliability. The rest of the paper uses the model results to determine if current industry practice is
reasonable, and how it might be improved.
Intrusive PM tasks performed before their technically optimal task intervals are likely to increase
the failure rate because maintenance error and material defects are introduced more often. The
effect (infant mortality) is commonplace for a wide range of equipment (e.g. switchgear, AOVs,
check valves, relays), and is evidenced by significant levels of rework soon after a maintenance
outage (Corio, Marie R. and Costantini, Lynn P., “Frequency and Severity of Forced Outages
Immediately Following Planned and Maintenance Outages”, Generating Availability Trends
Summary Report, NERC, 1989). Therefore scheduling intrusive tasks too soon is detrimental.
Nevertheless, to prevent many tasks from becoming delinquent it is a practical necessity to
perform a significant proportion of all tasks before their optimal intervals.
If an intrusive task is performed 20% earlier than its optimal interval, the infant mortality part of
the failure rate, which is already roughly equal to the best failure rate that good PM can produce
(see Appendix), will increase by a commensurate 20%, regardless of the fact that no “naturally
occurring” failure modes are expected.
We distinguish two cases, in both of which the technically optimal interval is that beyond which
wearout failure modes can be expected to occur.
[Figure: timelines for cases A and B, marking the grace period and the region where some failures are expected]
Because infant mortality erodes the benefits of good PM, in case A the degree of conservatism
(and hence the grace period), should not exceed an amount which, following current industry
practice, we will initially consider to be 20% of the technically optimal interval, (i.e. 25% of the
scheduled interval). Less is better if it is also practical.
In case B, we also initially consider the grace period to be 25% of the scheduled interval, but in
this case, no infant mortality considerations arise. Instead, there is a concern that reliability may
be worsened, because wearout failure modes could in principle occur during the grace period.
We will assume that despite this residual concern, no wearout failure modes are actually known
to occur with high probability within the grace period. This is a good assumption for grace
periods in nuclear power plants.
Both cases are considered because they are common in the industry.
Case A: Within the grace period it is advantageous from a reliability perspective to perform
intrusive tasks as close as possible to the end of the grace period (i.e. to the overdue date), so
they are not performed too frequently. Condition monitoring tasks and other non-intrusive tasks
may be performed sooner with little detrimental impact on reliability. Since some intrusive tasks
must still be performed before others, it is better that these be the tasks with the longer intervals,
because these will represent a smaller proportionate increase in failure rate. For example, 90 days
before the overdue date is a 17% shortening for an 18-month interval, but only a 2.5% shortening
for a 10-year interval.
Case B: Within the grace period it is advantageous from a reliability perspective to perform all
tasks as close as possible to the start of the grace period (i.e. to the scheduled date). Since some
tasks must be performed before others, it is better that these be the tasks with the longer
intervals, because these will represent a smaller proportionate increase in failure rate. For
example, 90 days before the overdue date is an 8.3% extension for an 18-month interval, but is a
22.5% extension for a 10-year interval.
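The percentages quoted for cases A and B can be reproduced with the following sketch. It assumes that 90 days is treated as 3 months and that the case B grace period is 25% of the interval; the function names are hypothetical.

```python
from fractions import Fraction

GRACE = Fraction(1, 4)  # assumed grace period: 25% of the interval


def shortening_case_a(months_before_overdue, interval_months):
    """Case A: fraction by which the interval is shortened when the task
    is performed this many months before the overdue date."""
    return Fraction(months_before_overdue, interval_months)


def extension_case_b(months_before_overdue, interval_months):
    """Case B: fraction by which the interval is extended when the task
    is performed this many months before the overdue date."""
    return GRACE - Fraction(months_before_overdue, interval_months)


for interval in (18, 120):  # 18 months and 10 years
    a = float(shortening_case_a(3, interval)) * 100
    b = float(extension_case_b(3, interval)) * 100
    print(f"{interval} months: case A shortening {a:.1f}%, case B extension {b:.1f}%")
```

For the 18-month interval the shortening is 1/6 (about 17%) and the extension is 1/12 (about 8.3%); for the 10-year interval the values are 2.5% and 22.5%.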
In case A, performing PM tasks before their overdue date confers no reliability benefit. The sole
benefit is the practical matter of avoiding too many overdue tasks. Consequently, plants need
track only those tasks approaching the overdue date, and only to the degree that it facilitates task
implementation to avoid delinquency. For example, tracking tasks within 90 days of the overdue
date could be a solution.
The number of tasks permitted to be within 90 days of their overdue date could be limited to
somewhere in the range 50 to 200, depending on plant experience with getting tasks completed.
There does not seem to be a useful purpose in limiting the overall number of tasks in the whole
grace period, since there is no reliability penalty for being “in grace.”
In case B, performing PM tasks before their overdue date does reduce the reliability penalty of
exceeding the scheduled date. The following section, “Exceeding The Due Date”, puts this
concern into quantitative perspective, and demonstrates that performing tasks up to 25% beyond
their due date does not lead to a significant increase in failure rate.
Tracking tasks that are within 90 days of the overdue date could be a practical solution. The
number of tasks permitted to be within 90 days of their overdue date should also be limited,
depending on plant experience with getting tasks completed. In this case, there is also a useful
purpose in limiting the overall number of tasks in the whole grace period, since there is a
reliability penalty for being “in grace.”
In case A, if the population of actual task performance times is centered anywhere between the
scheduled intervals and the optimal task intervals (with standard deviation 12.5%), the average
failure rate increases by about 6% at the most. In case B, if most of the population is positioned
between the scheduled date and the overdue date but with 15% of the components in the grace
period past the overdue date, the overall failure rate would be increased by only 20%.
Moreover, a specific component which does not get its task performed until 25% (this is 2
standard deviations if the mean is at the optimal interval) beyond the overdue date in case A, or
25% beyond the scheduled date in case B, experiences an increase in failure rate of no more than
about 30%. In fact, it is the relatively slow response of failure rate to increasing interval that
permits the possibility of finding the right interval by trial and error without excessive danger
from overshooting.
The results indicate that it should be possible to permit a certain proportion of tasks to be
performed beyond the optimal date without significant harmful effect. It is suggested that the
strategy to be followed should avoid designating tasks as delinquent unless their performance
times exceed some limit beyond the overdue date in case A. The model results show that even if
15% of the components in the grace period go past 125% of the overdue date in case A, the
overall failure rate would be increased by only 20%. In case B, a delinquent component should
be one that has not received its PM task by the overdue date. Even then, if 15% of the
components in the grace period become delinquent, the overall failure rate would be increased by
only 20%.
Proposed Strategy
The proposed strategy would focus on completing, by the technically optimal date, (i.e. by the
overdue date in case A, and by the scheduled date in case B) PM tasks which:
1. Are for risk significant components because their Fussell-Vesely (FV) parameter >0.5%
4. Are known to be needed to prevent a known high risk of failures, e.g. replacing head valves
in reciprocating compressors at some plants.
In case A, the due date (date scheduled) would be programmed at no more than 20% less than
the overdue date. The grace period would be the time between these dates. There would be no
negative connotation attached to being in the grace period. The grace period exists only to focus
on completing tasks by the overdue date. This case is more expensive to implement than case B,
and does not contain as much conservatism as might be expected from the adoption of task
intervals that are shorter than the technically optimal intervals. The negative impact on
reliability of infant mortality is likely to cancel out the benefits of conservative task intervals.
In case B, the due date would be the technically optimal interval. The grace period would extend
an additional 25% of this interval. There is a disbenefit to being in the grace period, but this is
moderate and controlled by other steps. This case is less expensive to implement than case A.
In both cases, the number of tasks that are within 90 days of the overdue date could be tracked
and limited to a number in the range 50 to 200, depending on plant experience. In case B only,
the number of tasks that are in grace should also be limited to an overall maximum.
In case A, control workflow so that intrusive tasks with short intervals (e.g. 2 years or less) are
preferentially completed during this 90 day window, and not before, i.e. as close as possible to
the overdue date. Condition monitoring, and non-intrusive tasks could be performed earlier
rather than later in the grace period to assist in workflow management.
In case B, control workflow so that tasks with longer intervals (e.g. 3 years or more) are
preferentially completed before this 90-day window.
Screen tasks during the 90-day period before the overdue date so as to prevent tasks of the
following types from going over the overdue date:
Tasks which:
1. Are for risk significant components because their Fussell-Vesely (FV) parameter >0.5%
4. Are known to be needed to prevent a high risk of failures, e.g. replacing head valves in
reciprocating compressors at some plants.
In both cases, permit some of the other tasks normally in the grace period to go over the overdue
date if necessary for practical reasons (e.g. spare parts not available), without being declared
delinquent. Limit the total number of tasks to go beyond their overdue date to be no more than
15% of the total in the grace period.
Establish an upper time limit equal to the overdue date plus 25% in case A, and the overdue date
plus 15% in case B, (note that this is a proportion of the scheduled interval, not an absolute
number of days), beyond which any task would be declared delinquent.
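The tracking limits in the proposed strategy could be monitored with a few counters. A minimal sketch, assuming each task is described only by the number of days remaining to its overdue date (negative once past it) and a flag for whether its scheduled date has passed; the data layout, the names, the cap of 100 (one value within the suggested 50 to 200 range), and the choice to count tasks past the overdue date as part of the grace period total are all illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    days_to_overdue: int   # negative once the overdue date has passed
    in_grace: bool         # True once the scheduled (due) date has passed


def tracking_summary(tasks, window_days=90, window_cap=100, past_overdue_cap=0.15):
    """Count tasks in the 90-day window before the overdue date and check the
    window cap and the 15% limit on tasks that have gone past the overdue date."""
    in_grace = [t for t in tasks if t.in_grace]
    in_window = [t for t in tasks if 0 <= t.days_to_overdue <= window_days]
    past_overdue = [t for t in in_grace if t.days_to_overdue < 0]
    fraction_past = len(past_overdue) / len(in_grace) if in_grace else 0.0
    return {
        "in_window": len(in_window),
        "window_ok": len(in_window) <= window_cap,
        "fraction_past_overdue": fraction_past,
        "past_overdue_ok": fraction_past <= past_overdue_cap,
    }
```

A plant would substitute its own task records and limits; the sketch only shows how the two counts relate to the limits described above.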
The method depends on a few observations that stemmed from the EPRI PM Basis Database. A
complex component (e.g. a motor or valve, etc.) has a large number of failure mechanisms,
divided between wearout failure mechanisms (which have an expected period of failure free
operation before failures start to be observed), and random failure mechanisms, which can occur
at any time. Further, the expected failure free periods for the wearout mechanisms seem to
occupy all time scales available, i.e. they range from less than one year to the design life of the
equipment, say 40 years.
At the end of a failure free period a given wearout mode has some probability each year of
producing a failure. If you waited long enough (this could be 100 years or more), and did no PM,
you could be pretty certain that each mode would have produced a failure. Generally, modes
with short failure free periods (e.g. 1 year) will have higher subsequent annual failure
probabilities than modes with longer failure free periods (e.g. 15 years).
If you want to calculate the expected number of failures per year when no preventive
maintenance is performed, you would expect this failure time distribution to play a major role.
Since no one normally possesses such information, such an RTF (run-to-failure) reliability
prediction is not normally attempted. However, if a PM task is performed on a regular schedule,
with an interval not too different from the failure free period, it is clear that a reliability
calculation will depend far less on the details of the failure time distribution. This is because an
effective PM task will discover emerging degradation and correct it, thus restoring the
component to something approaching an as-new condition. This will take place in the early part
of the failure time distribution so the bulk of its form and magnitude will scarcely be sampled in
this situation.
Failure rates are also affected by whether the PM task is actually always performed, or
performed on time, as well as by personnel errors which result in degradation not being
recognized, repairs which are ineffective, defects and faults being introduced during the tasks, as
well as by task intervals which are longer than they should be, and by modes which are not
addressed by any task. Furthermore, although experienced maintenance personnel usually have a
good idea of the most likely failure free periods to expect, modes can still be affected by many
factors which change their failure free intervals in ways which are hard to predict.
These observations suggest that a realistic maintenance decision model could be constructed
using a uniform distribution of failure free periods, and an overall effectiveness for each PM
task. This effectiveness, E, would be the probability of diagnosing degradation and successfully
correcting it, when such degradation exists and the task is performed. We would expect this
parameter to be in the range 75% to 95%.
An individual wearout failure mode with failure free interval n years has a failure time
distribution taken as uniform starting at n years and stretching out for another 2n years, so that
normalization requires it to have an annual failure probability of 1/2n per year. All failure rates
calculated in the model will be proportional to this probability, but the results are presented as
ratios of failure rates so that the impact of this assumption is greatly reduced.
Assuming that Nw modes are active for a component, and that these have failure free periods
uniformly distributed between some minimum, m years, and an upper limit of 40 years, there
will be Nw/(40-m) modes “starting up” each year on average.
In the interval between I and 2I, we expect to get (1-E)/2n failures per year from any mode with
failure free period, n, and these will endure for (2I-n) years until the task is performed again. We
expect this to happen every interval, so the failure rate for a single mode is thus
$$\Lambda_{e1} = \frac{(1-E)(2I-n)}{2nI} \qquad (1)$$
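Equation (1) is straightforward to evaluate. A minimal sketch, assuming it applies to modes with failure free period n between I and 2I as described above; the function name is illustrative:

```python
def single_mode_rate(E, I, n):
    """Equation (1): failures per year from one wearout mode with failure free
    period n (I <= n <= 2I) when a task of effectiveness E is done every I years."""
    if not I <= n <= 2 * I:
        raise ValueError("equation (1) applies only for I <= n <= 2I")
    return (1 - E) * (2 * I - n) / (2 * n * I)


# Example: 80% effective task every 5 years, mode with a 6-year failure free period.
print(single_mode_rate(E=0.8, I=5, n=6))
```

The rate falls to zero at n = 2I, because such a mode has no time to produce failures before the next task.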
A chart of this relation against an accurate solution (using MBA software) to the underlying
alternating renewal process is shown in Figure 1 for I = 5 years. Note that MBA gives a larger
rate because it includes contributions of order $(1-E)^2$ and higher.
The chart shows that the model is only a few percent non-conservative (i.e. predicting low) for
shorter modes, and becomes more so for longer modes. When this result is integrated across a
spectrum of modes, the shorter ones dominate, giving a result, below, that is a reasonable
representation (i.e. within a few percent) of the underlying renewal process.
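Since Nw.dn/(40-I) modes start up in dn years, the total contribution to the failure rate from all
modes that can contribute is
$$\Lambda_e = \frac{N_w (1-E)}{2I(40-I)} \int_I^{2I} \frac{(2I-n)}{n}\,dn = \frac{N_w (1-E)}{2(40-I)}\,(2\ln 2 - 1) \qquad (2)$$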
Since I is usually much less than 40 years the dependence on task interval is weak. Equation (2)
gives a failure rate of about 0.02 failures per year for effective PM with intervals up to 20 years,
when E=80% and Nw = 20. This is close to experience.
To this must be added the random rate, which cannot be protected with time-directed PM tasks.
If we assume that it is not cost effective to continue to reduce the wearout failures below the
level of the remaining random component we would conclude that $\Lambda_r = B\,\Lambda_e$ with B ~ 1 or 2, so
that the total effective maintenance failure rate must be close to $(1 + B)\,\Lambda_e$ ~ 0.04 or 0.06 failures
per year (i.e. 17 to 25 years between failures when well maintained).
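The figures quoted for equation (2) can be checked numerically. A minimal sketch, assuming equation (2) is the single-mode rate of equation (1) integrated over failure free periods n from I to 2I, with Nw/(40-I) modes per year of failure free period; the function names and the midpoint integration are illustrative:

```python
import math


def effective_rate_closed_form(E, I, Nw):
    """Closed form of the integral: Nw (1-E) (2 ln 2 - 1) / (2 (40 - I))."""
    return Nw * (1 - E) * (2 * math.log(2) - 1) / (2 * (40 - I))


def effective_rate_numeric(E, I, Nw, steps=10_000):
    """Midpoint-rule integration of equation (1) from n = I to n = 2I."""
    dn = I / steps
    total = 0.0
    for k in range(steps):
        n = I + (k + 0.5) * dn
        total += (1 - E) * (2 * I - n) / (2 * n * I) * dn
    return Nw / (40 - I) * total


rate = effective_rate_closed_form(E=0.8, I=5, Nw=20)
for B in (1, 2):
    total_rate = (1 + B) * rate
    print(f"B={B}: {total_rate:.3f} failures/year, {1 / total_rate:.0f} years between failures")
```

With E = 80%, Nw = 20 and I = 5 years the wearout rate comes out at about 0.022 failures per year, consistent with the rounded value of 0.02 used in the text.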
Risk of Performing PM
Intrusive PM tasks run a risk of introducing additional failures. A simple treatment enables the
most important conclusion to be drawn. Consider that performing an intrusive task introduces an
additional failure with a probability Pim (subscript for infant mortality). This applies each time the
task is performed so it increases the failure rate on average by $\Lambda_{im}$ = Pim / I, where I is the task
interval. The parameter Pim is likely to be in the range 5% to 15% for a wide range of equipment.
A value of Pim = 0.1 with a 5 year interval adds $\Lambda_{im}$ = 0.02 failures/year to $(1 + B)\,\Lambda_e$, above; an
amount that equals the effective maintenance failure rate. If the interval is unnecessarily
decreased from 5 years to 4 years, $\Lambda_{im}$ will increase by 25% (from 0.02 to 0.025 failures/year), a
significant erosion of effective PM. Other values of Pim and I give a similar conclusion.
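The infant mortality arithmetic can be reproduced directly. A minimal sketch of the relation (added rate = Pim / I), with illustrative names:

```python
def infant_mortality_rate(p_im, interval_years):
    """Average added failure rate from an intrusive task: P_im / I."""
    return p_im / interval_years


base = infant_mortality_rate(0.1, 5)      # about 0.02 failures/year
shorter = infant_mortality_rate(0.1, 4)   # about 0.025 failures/year
relative_increase = shorter / base - 1    # about 0.25

print(base, shorter, f"{relative_increase:.0%}")
```

Because the added rate is inversely proportional to the interval, shortening the interval by one fifth raises the added rate by one quarter.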
When there is no PM, but failures are repaired in a time short compared to the mean time
between failures, renewal theory provides an asymptotic solution for the above single mode that
has a uniform time to failure distribution from n years to 3n years.
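For a renewal process the long-run failure rate tends to the reciprocal of the mean time to failure, which for a uniform distribution from n to 3n years is 2n years. A minimal simulation sketch of that limit, assuming instantaneous repair to as-new condition; the function name, seed and horizon are illustrative:

```python
import random


def simulated_rtf_rate(n, horizon_years=200_000, seed=0):
    """Failures per year for one mode with no PM: each time to failure is drawn
    uniformly from n to 3n years and the mode is renewed at every failure."""
    rng = random.Random(seed)
    t, failures = 0.0, 0
    while True:
        t += rng.uniform(n, 3 * n)
        if t > horizon_years:
            return failures / horizon_years
        failures += 1


# Compare the simulated long-run rate with 1 / (2n) for n = 5 years.
print(simulated_rtf_rate(5), 1 / (2 * 5))
```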
The general approach from this point is to develop a “Missed Modes Model” which will add the
effect of failure modes which cause failures because the PM interval is too long. Such modes
have nothing to prevent them from occurring and can greatly increase the failure rate. Armed
with the Effective Maintenance and Missed Modes results we will impose a distribution of times
at which the tasks actually get performed.
We envisage a single component with a set of failure modes, as before. The shortest failure free
interval is at m years. The others are distributed uniformly between m and 40 years. Since the
PM task interval is I>m, modes with m < failure free interval < I are “missed” by the task and so
are not attenuated by the factor (1-E). Modes with failure free interval > I would be treated as
effective maintenance in the manner shown above.
Failures will accrue in the first interval at 1/2n per year per mode, for a total of (I-n) years.
Integrating over all contributing modes gives a failure rate of:
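$$\Lambda_m = \frac{N_w}{2I(40-m)} \int_m^{I} \frac{(I-n)}{n}\,dn$$
or
$$\Lambda_m = \frac{N_w}{2(40-\alpha I)}\,[\alpha - (1 + \ln\alpha)], \qquad \alpha = m/I$$
$$\Lambda_e' = \frac{(1-E)\,N_w}{2I(40-m)} \int_I^{2I} \frac{(2I-n)}{n}\,dn$$
so that
$$\Lambda_m + \Lambda_e' = \frac{N_w}{2(40-\alpha I)}\,[\alpha - (1+\ln\alpha)] + \Lambda_e\,\frac{(40-I)}{(40-\alpha I)} \qquad (7)$$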
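The closed form for the missed modes rate can be checked against direct integration. A minimal sketch, assuming the integral of (I - n)/n over failure free periods from m to I with prefactor Nw/(2I(40-m)), and m = αI; the names and the midpoint rule are illustrative:

```python
import math


def missed_modes_closed_form(I, m, Nw):
    """[Nw / (2 (40 - m))] * [alpha - (1 + ln alpha)], with alpha = m / I."""
    alpha = m / I
    return Nw / (2 * (40 - m)) * (alpha - (1 + math.log(alpha)))


def missed_modes_numeric(I, m, Nw, steps=20_000):
    """Midpoint-rule value of [Nw / (2 I (40 - m))] * integral_m^I (I - n) dn / n."""
    dn = (I - m) / steps
    total = 0.0
    for k in range(steps):
        n = m + (k + 0.5) * dn
        total += (I - n) / n * dn
    return Nw / (2 * I * (40 - m)) * total


print(missed_modes_closed_form(5, 2, 20), missed_modes_numeric(5, 2, 20))
```

The rate vanishes when m = I (α = 1), that is, when no mode has a failure free period shorter than the task interval.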
The ratio of this total rate to the effective maintenance and random rate is:
$$\text{Ratio} = \frac{\dfrac{N_w}{2(40-\alpha I)}\,[\alpha - (1+\ln\alpha)] + \Lambda_e\left[\dfrac{(40-I)}{(40-\alpha I)} + B\right]}{\Lambda_e\,(1+B)} \qquad (9)$$
with $\Lambda_e$ given by equation (2). The value of this ratio, minus 1, shows the fractional increase in
the rate and is a prototype of the ratio we shall seek in the KGB population model.
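The ratio of equation (9) can be evaluated for trial values of the shortest failure free period. A minimal sketch, assuming equation (2) has the closed form Nw(1-E)(2 ln 2 - 1)/(2(40-I)), which is what integrating equation (1) from I to 2I gives; Nw cancels in the ratio and is set to 1, and the function name is illustrative:

```python
import math


def ratio_total_to_effective(E, B, I, m):
    """Equation (9): (missed modes + modified effective + random) rate divided by
    the effective maintenance plus random rate, with alpha = m / I."""
    alpha = m / I
    lam_e = (1 - E) * (2 * math.log(2) - 1) / (2 * (40 - I))   # equation (2), Nw = 1
    lam_m = (alpha - (1 + math.log(alpha))) / (2 * (40 - alpha * I))
    return (lam_m + lam_e * ((40 - I) / (40 - alpha * I) + B)) / (lam_e * (1 + B))


# Fractional increase when the shortest mode is at 4 years but the task interval is 5 years.
print(ratio_total_to_effective(E=0.8, B=1, I=5, m=4) - 1)
```

When m = I the ratio is exactly 1, since no modes are missed.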
KGB Model
The KGB model introduces a group of components which have individual task performance
times, xj, which do not necessarily equal their own task interval, Ij. Throughout, it will be
convenient to use the dimensionless parameter $\gamma$ = (xj – Ij)/Ij to represent the fraction by which
any given task time exceeds its interval. Consequently, $\gamma$ = 0, 1 for xj = Ij, 2Ij, and $\kappa$ is the
standard deviation in $\gamma$ space of an assumed normal distribution of $\gamma$. There is no further need
to use the subscripts j. We assume that the task interval represents the effective maintenance
case, i.e. the shortest mode occurs at I. Components whose PM task is performed before this time
are effectively being dealt with by effective maintenance because all modes are longer than the
task time. Components whose task time is later than the designated interval possess missed
modes, and so are treated with the missed mode model, plus a modified effective maintenance
model for the modes which arise after the task performance.
Complications which arise include, 1) a normalization shift when the tails of the (infinite) normal
distribution overlap the practical bounds of the problem at x = 0 and x = 2I, and 2) whether to
correct the effective maintenance model to allow for the restricted range of modes since the
shortest mode is at I > task time.
Consider the following figure showing a task time of x to (x + dx) for general guidance:
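[Figure: time axis marked 0, I, x, 2I and x' (γ = -1, 0, γ, 1), divided into regions labeled Effective Maintenance, Missed Modes and Modified Effective Maintenance, with a task time element dx at x and the Distribution of Modes, n, beginning at I.]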
When x > I a missed mode at n provides an annual contribution of 1/2n failures for (x-n) years.
Modes with n from I to x contribute. The rate is thus:
$$\Lambda_m'(x) = \frac{N_w}{2I(40-I)} \int_I^{x} \frac{(x-n)}{n}\,dn$$
$$\Lambda_m'(\gamma) = \frac{N_w}{2(40-I)}\,[(1+\gamma)\ln(1+\gamma) - \gamma]$$
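The change of variable from x to γ can be checked numerically. A minimal sketch, assuming x = I(1 + γ) and the integral of (x - n)/n over n from I to x with prefactor Nw/(2I(40-I)); the names and the midpoint rule are illustrative:

```python
import math


def missed_rate_gamma(gamma, I, Nw):
    """[Nw / (2 (40 - I))] * [(1 + gamma) ln(1 + gamma) - gamma], for gamma >= 0."""
    return Nw / (2 * (40 - I)) * ((1 + gamma) * math.log(1 + gamma) - gamma)


def missed_rate_x(x, I, Nw, steps=20_000):
    """Midpoint-rule value of [Nw / (2 I (40 - I))] * integral_I^x (x - n) dn / n."""
    dn = (x - I) / steps
    total = 0.0
    for k in range(steps):
        n = I + (k + 0.5) * dn
        total += (x - n) / n * dn
    return Nw / (2 * I * (40 - I)) * total


I, gamma = 5.0, 0.25
print(missed_rate_gamma(gamma, I, 20), missed_rate_x(I * (1 + gamma), I, 20))
```

For small γ the bracket behaves like γ²/2, so the missed modes contribution grows slowly at first as a task slips past its interval.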
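The average of this quantity over the N components contributing to it is:
$$\frac{N}{\varphi} \int_0^{1} \Lambda_m'(\gamma)\,f(\gamma)\,d\gamma$$
where N is the number of components in the population, $\varphi$ is a normalization adjustment, and $f(\gamma)$
is the population normal distribution. $\varphi$ and $f(\gamma)$ are given by:
$$f(\gamma) = \frac{1}{\kappa\sqrt{\pi}} \exp\left[-\frac{(\gamma - \bar{\gamma})^2}{\kappa^2}\right]$$
$$\varphi = \int_{-1}^{+1} f(\gamma)\,d\gamma$$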
The region to the right of x in the above diagram represents effective maintenance, but only for
modes arising at times greater than x. Contributing modes each add failures for a time (x + I – n)
and these contributions should be integrated from x to (x + I):
$$\Lambda_{e2}'(x) = \frac{N_w (1-E)}{2I(40-I)} \int_x^{x+I} \frac{(x+I-n)}{n}\,dn$$
$$\Lambda_{e2}'(\gamma) = \frac{N_w (1-E)}{2(40-I)} \left[(2+\gamma)\ln\left\{\frac{2+\gamma}{1+\gamma}\right\} - 1\right]$$
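The modified effective maintenance rate can be checked in the same way. A minimal sketch, assuming x = I(1 + γ), so that the integral of (x + I - n)/n from x to x + I reduces to I[(2 + γ) ln((2 + γ)/(1 + γ)) - 1]; the names and the midpoint rule are illustrative:

```python
import math


def modified_effective_gamma(gamma, E, I, Nw):
    """Closed form: [Nw (1-E) / (2 (40-I))] * [(2+g) ln((2+g)/(1+g)) - 1]."""
    bracket = (2 + gamma) * math.log((2 + gamma) / (1 + gamma)) - 1
    return Nw * (1 - E) / (2 * (40 - I)) * bracket


def modified_effective_x(x, E, I, Nw, steps=20_000):
    """Midpoint-rule value of the same rate from the integral over n = x to x + I."""
    dn = I / steps
    total = 0.0
    for k in range(steps):
        n = x + (k + 0.5) * dn
        total += (x + I - n) / n * dn
    return Nw * (1 - E) / (2 * I * (40 - I)) * total


print(modified_effective_gamma(0.25, 0.8, 5.0, 20), modified_effective_x(6.25, 0.8, 5.0, 20))
```

At γ = 0 the bracket equals 2 ln 2 - 1, so the expression reduces to the effective maintenance rate obtained by integrating equation (1) from I to 2I.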
The average of this quantity over the N components contributing to it is:
$$\frac{N}{\varphi} \int_0^{+1} \Lambda_{e2}'(\gamma)\,f(\gamma)\,d\gamma$$
Components which have the PM task performed earlier than the shortest mode, i.e. with x < I,
have effective PM with two qualifications. The first is that performing the task too frequently can
add a significant number of failures, increasing Nw and B in a way that can not be modeled. The
second is that when a mode adds failures, attenuated by the factor (1-E), the modes that
contribute are not the full spectrum from x to (x + I), as before, because the shortest mode is at I
> x. These two effects oppose each other. The second can be calculated but without the first, the
failure rate would be artificially reduced.
Consequently, the contribution from these components has been assumed to be the normal
effective maintenance rate of equation (2) times the number of components in this part of the
group:
$$\frac{N\,\Lambda_e}{\varphi} \int_{-1}^{0} f(\gamma)\,d\gamma$$
The new total failure rate, $\Lambda_T(\bar{\gamma}, \kappa)$, is obtained by adding the separate rates.
A fractional excess failure rate can be examined to find the percentage change in the rate
compared to having all components maintained at the effective maintenance rate:
$$\text{Excess Ratio}(\bar{\gamma}, \kappa, B, E) = \frac{\Lambda_T(\bar{\gamma}, \kappa) - (1+B)\,N\,\Lambda_e}{(1+B)\,N\,\Lambda_e}$$
The Excess Ratio does not depend on Nw, nor on N, because they cancel in the ratio.
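The pieces of the model can be assembled into a single calculation of the Excess Ratio. This is only a sketch under several assumptions that are not confirmed by the text: the population density is taken as exp[-(γ - γ̄)²/κ²]/(κ√π), truncated to γ between -1 and +1 and renormalized; the early, late and random contributions are simply summed per component; and Nw and N are set to 1 because they cancel. The function name, the default interval and the midpoint integration are illustrative.

```python
import math


def excess_ratio(gamma_bar, kappa, E, B, I=5.0, steps=4000):
    """Fractional excess failure rate of the population relative to (1 + B) * Lambda_e."""
    c = 1.0 / (2 * (40 - I))                       # Nw / (2 (40 - I)) with Nw = 1
    lam_e = c * (1 - E) * (2 * math.log(2) - 1)    # effective maintenance rate

    def f(g):                                      # assumed population density of gamma
        return math.exp(-((g - gamma_bar) / kappa) ** 2) / (kappa * math.sqrt(math.pi))

    def late(g):                                   # missed + modified effective, g >= 0
        missed = (1 + g) * math.log(1 + g) - g
        modified = (1 - E) * ((2 + g) * math.log((2 + g) / (1 + g)) - 1)
        return c * (missed + modified)

    def integrate(func, a, b):                     # midpoint rule
        h = (b - a) / steps
        return sum(func(a + (k + 0.5) * h) for k in range(steps)) * h

    phi = integrate(f, -1.0, 1.0)                  # normalization adjustment
    early_part = lam_e * integrate(f, -1.0, 0.0) / phi
    late_part = integrate(lambda g: late(g) * f(g), 0.0, 1.0) / phi
    total = early_part + late_part + B * lam_e     # per component (N = 1)
    return (total - (1 + B) * lam_e) / ((1 + B) * lam_e)


print(excess_ratio(gamma_bar=0.0, kappa=0.25, E=0.8, B=1))
```

The sketch is not intended to reproduce the published figures, which depend on details of the software that are not given here; it only illustrates how the ratio is built from the separate rates.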
Note Added: The model is called the KGB model because it describes the effects of a population
distribution whose main parameters are called K and GB (gamma bar) in the software.
Results
Figure 2 shows the percentage Excess Ratio for 4 sets of (E, B) values as a function of standard
deviation, when the population mean is, in fact, equal to the designated task intervals.
When the population standard deviation is 25% of the designated intervals (25 on x axis), the
worst case shown has an increase of 15% in the number of failures per year.
If the initial base case were less effective than an optimized PM program this change would be
smaller. For 80% effective tasks, and a random contribution that is twice the effective
maintenance rate, the population would need to spread to a standard deviation of more than 50%
of the intervals in order to increase the failure rate by 15%.
Figures 3 and 4 show the general behavior of the Excess Ratio. Using these results and the fact
that 16% of a normal distribution lies beyond one standard deviation, suggests the rule of thumb:
“Provided no more than 15% of PM tasks are executed beyond 125% of the optimal
interval, the increase in failure rate will most likely be less than 20%.”
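The 16% figure is the upper tail of a normal distribution beyond one standard deviation. A quick check using Python's standard library, assuming (as in the rule of thumb) a mean at the optimal interval and a standard deviation of 25% of the interval:

```python
from statistics import NormalDist

# Fraction of a normal population lying more than one standard deviation above the mean.
tail_beyond_one_sd = 1 - NormalDist().cdf(1.0)

# Same quantity in interval units: mean excess 0, standard deviation 0.25,
# fraction of tasks executed beyond 125% of the optimal interval.
tail_beyond_125_percent = 1 - NormalDist(mu=0.0, sigma=0.25).cdf(0.25)

print(f"{tail_beyond_one_sd:.1%}", f"{tail_beyond_125_percent:.1%}")
```

Both values are about 15.9%, which the rule of thumb rounds to 15%.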
Of course, these rules hold only as well as our assumptions about the values of E and B, a normal
distribution of task execution times, and a uniform distribution of failure free periods for wearout
failure modes. However, these were reasonable assumptions for generic PM programs and
generic components. Furthermore, the baseline PM program was assumed to be well optimized
to give the most sensitivity to the distribution of task times.
For PM programs which are not so well optimized, e.g. where task intervals already have a
conservatism built into them, or where PM tasks are less effectively performed, where the
random background of failures is higher, or where the infant mortality effects of decreased
intervals are present, the effects of delayed PM tasks will be smaller than estimated above.