
Purchased by Anita Pharmatrisanti - anitapharma02@gmail.com - For Personal Use Only - Not For Distribution

Subject 16: Reliability Engineering

Cover figure: the six Subject Groups (Strategy and Planning; Asset Management Decision-Making; Lifecycle Delivery; Asset Information; Organisation and People; Risk and Review) arranged around Managing the Organisation, the Asset Management System and the Asset Portfolio.

© Copyright The Institute of Asset Management 2016. All rights reserved www.theIAM.org
Reliability Engineering Version 1.1 October 2016
The Institute of
Asset Management

About the IAM
The Institute of Asset Management (the IAM) is a not-for-profit, professional body. We are owned and controlled by our Members and committed to remaining independent from commercial and trade associations. We exist to advance the discipline of Asset Management, not only for people and organisations involved in the acquisition, operation and care of physical assets but also for the benefit of the general public. Our priorities are to promote the generation and application of knowledge, training and good practice and to help individuals become demonstrably competent.

Disclaimer
The IAM publishes this document for the benefit of its members and the public. This document is for guidance and information only. The IAM and its agents, servants or contractors do not accept any liability for any losses arising under or in connection with this information. This limit on liability applies to all and any claims in contract, tort (including negligence), misrepresentation (excluding fraudulent misrepresentation), breach of statutory duty or otherwise. This limit on liability does not exclude or restrict liability where prohibited by the law nor does it supersede the express terms of any related agreements.

Copyright
All copyright and other intellectual property rights arising in any information contained within this document are, unless otherwise stated, owned by The Institute of Asset Management Ltd or other companies in The Institute of Asset Management Ltd group of companies. No part of this publication may be reproduced in any material form (including photocopying and storing in any medium or electronic means and whether or not transiently or incidentally) without the written permission of The Institute of Asset Management Ltd.

Acknowledgements
This Subject Specific Guidance (SSG) has been produced by the Institute of Asset Management (IAM) through the significant efforts of many individuals and organisations. The Institute would like to thank the following in particular for their contributions.

Development Team Reviewed and endorsed by


Mark Knight – CGI
Allan Mornement – Pigott & Associates
Duncan Maxwell – PEME
Flori Mihai – Main Roads WA
Thiagarajan Karthikeyan – JK Asset Management Ltd
James Hilton – AstraZeneca
Ray Galeozzie – LSC Group Limited
Mark Norris – Atkins
Mark Thompson – Southern Water

The scope of Asset Management

Group 1: Strategy & Planning
1. Asset Management Policy
2. Asset Management Strategy & Objectives
3. Demand Analysis
4. Strategic Planning
5. Asset Management Planning

Group 2: Asset Management Decision-Making
6. Capital Investment Decision-Making
7. Operations & Maintenance Decision-Making
8. Lifecycle Value Realisation
9. Resourcing Strategy
10. Shutdowns & Outage Strategy

Group 3: Lifecycle Delivery
11. Technical Standards & Legislation
12. Asset Creation & Acquisition
13. Systems Engineering
14. Configuration Management
15. Maintenance Delivery
16. Reliability Engineering
17. Asset Operations
18. Resource Management
19. Shutdown & Outage Management
20. Fault & Incident Response
21. Asset Decommissioning & Disposal

Group 4: Asset Information
22. Asset Information Strategy
23. Asset Information Standards
24. Asset Information Systems
25. Data & Information Management

Group 5: Organisation & People
26. Procurement & Supply Chain Management
27. Asset Management Leadership
28. Organisational Structure
29. Organisational Culture
30. Competence Management

Group 6: Risk & Review
31. Risk Assessment & Management
32. Contingency Planning & Resilience Analysis
33. Sustainable Development
34. Management of Change
35. Assets Performance & Health Management
36. Asset Management System Monitoring
37. Management Review, Audit & Assurance
38. Asset Costing & Valuation
39. Stakeholder Engagement

© Copyright 2014 Institute of Asset Management (www.theiam.org/copyright)


Contents
Acknowledgements
1 INTRODUCTION TO SUBJECT SPECIFIC GUIDANCE
1.1 Purpose of the SSGs
1.2 The SSGs in context
1.3 SSGs and the issue of Complexity versus Maturity
1.4 Further reading

2 INTRODUCTION
2.1 The Purpose, Intended Use, and Intended Audience of this SSG
2.2 Aligning this Document
2.3 Complexity versus Maturity
2.4 Navigating & Using this document

3 WHAT DOES “RELIABILITY ENGINEERING” MEAN?
3.1 High Level Definition of the SSG Topic Area
3.2 SSG Approach to Reliability Engineering
3.3 Terminology

4 CONCEPTS, PRINCIPLES AND KEY FACTORS
4.1 Concepts
4.2 Principles
4.3 Key Factors for Reliability Engineering

5 RELIABILITY ENGINEERING TOOLBOX
5.1 Selecting the Right Tool for the Right Job
5.2 Mathematical Modelling
5.3 Reliability Engineering Tool Selection Matrix
5.4 FMEA/FMECA
5.4.1 What is it?
5.4.2 When is it used?
5.4.3 Complementary techniques
5.4.4 Example procedure steps
5.4.5 Prerequisites
5.4.6 Typical Proprietary Tools
5.4.7 Objectives
5.4.8 Derived Benefits
5.4.9 References
5.5 FTA
5.5.1 What is it?
5.5.2 When is it used?
5.5.3 Complementary techniques
5.5.4 Example procedure steps
5.5.5 Prerequisites
5.5.6 Typical Proprietary Tools
5.5.7 Objectives
5.5.8 Derived Benefits
5.5.9 References
5.6 HAZOP
5.6.1 What is it?
5.6.2 Limitations
5.6.3 When is it used?
5.6.4 Complementary techniques
5.6.5 Example procedure steps
5.6.6 Prerequisites
5.6.7 Typical Proprietary Tools
5.6.8 Objectives
5.6.9 Derived Benefits
5.6.10 References
5.7 ETA
5.7.1 What is it?
5.7.2 When is it used?
5.7.3 Complementary techniques
5.7.4 Example procedure steps
5.7.5 Prerequisites
5.7.6 Typical Proprietary Tools
5.7.7 Objectives
5.7.8 Derived Benefits
5.7.9 References
5.8 RCM
5.8.1 What is it?
5.8.2 When is it used?
5.8.3 Complementary techniques
5.8.4 Example procedure steps
5.8.5 Prerequisites
5.8.6 Typical Proprietary Tools
5.8.7 Objectives
5.8.8 Derived Benefits
5.8.9 References
5.9 CRO
5.9.1 What is it?
5.9.2 When is it used?
5.9.3 Complementary techniques
5.9.4 Example procedure steps
5.9.5 Prerequisites
5.9.6 Typical Proprietary Tools
5.9.7 Objectives
5.9.8 Derived Benefits
5.9.9 References
5.10 RCA
5.10.1 What is it?
5.10.2 When is it used?
5.10.3 Complementary techniques
5.10.4 Example procedure steps
5.10.5 Typical Proprietary Tools
5.10.6 Objectives
5.10.7 Derived Benefits
5.10.8 References
5.11 FRACAS/DRACAS
5.11.1 What is it?
5.11.2 When is it used?
5.11.3 Complementary techniques
5.11.4 Example procedure steps
5.11.5 Prerequisites
5.11.6 Typical Proprietary Tools
5.11.7 Objectives
5.11.8 Derived Benefits
5.11.9 References
5.12 TPM
5.12.1 What is it?
5.12.2 When is it used?
5.12.3 Complementary techniques
5.12.4 Example procedure steps
5.12.5 Prerequisites
5.12.6 Typical Proprietary Tools
5.12.7 Objectives
5.12.8 Derived Benefits
5.12.9 References
5.13 OEE
5.13.1 What is it?
5.13.2 When is it used?
5.13.3 Complementary techniques
5.13.4 Example procedure steps
5.13.5 Prerequisites
5.13.6 Typical Proprietary Tools
5.13.7 Objectives
5.13.8 Derived Benefits
5.13.9 References
5.14 LCVR
5.14.1 What is it?
5.14.2 When is it used?
5.14.3 Complementary techniques
5.14.4 Example procedure steps
5.14.5 Prerequisites
5.14.6 Typical Proprietary Tools
5.14.7 Objectives
5.14.8 Derived Benefits
5.14.9 References

6 CASE STUDIES
6.1 Case Study 1
6.1.1 Introduction
6.1.2 Challenge
6.1.3 Approach
6.1.4 Deliverables
6.1.5 Results
6.2 Case Study 2
6.2.1 Introduction
6.2.2 Challenge
6.2.3 Approach
6.2.4 Deliverables
6.2.5 Results
6.3 Case Study 3
6.3.1 Introduction
6.3.2 Challenge
6.3.3 Approach
6.3.4 Deliverables
6.3.5 Results
6.4 Case Study 4
6.4.1 Introduction
6.4.2 Challenge
6.4.3 Approach
6.4.4 Deliverables
6.4.5 Results
6.5 Case Study 5
6.5.1 Introduction
6.5.2 Challenge
6.5.3 Approach
6.5.4 Deliverables
6.5.5 Results
6.6 Case Study 6
6.6.1 Introduction
6.6.2 Challenge
6.6.3 Approach
6.6.4 Deliverables
6.6.5 Results
6.7 Case Study 7
6.7.1 Introduction
6.7.2 Challenge
6.7.3 Approach
6.7.4 Deliverables
6.7.5 Results
6.8 Case Study 8
6.8.1 Introduction
6.8.2 Challenge
6.8.3 Approach
6.8.4 Maintenance Analyses
6.8.5 Re-usable Maintenance Support Processes
6.8.6 Innovative Technologies for Condition Monitoring
6.8.7 Deliverables
6.8.8 Results

7 REFERENCES

1 Introduction to Subject Specific Guidance
This Subject Specific Guidance (SSG) is part of a suite of documents designed to expand and enrich the description of the Asset Management discipline as summarised in the IAM’s document ‘Asset Management – an Anatomy’ (referred to throughout this document as “The Anatomy”).

The SSGs cover the 39 Subjects in The Anatomy either directly, on a ‘one to one’ basis (where a subject is very broad), or grouped (where subjects are very closely related).

1.1 Purpose of the SSGs
This document provides guidance for good asset management. It is part of a suite of Subject Specific Guidance documents that explain the 39 subject areas identified in “Asset Management – an Anatomy”, also published by the Institute of Asset Management. These subject areas are also acknowledged by the Global Forum for Maintenance and Asset Management as the “Asset Management Landscape”.

PAS 55 and ISO 55001 set out requirements which describe what is to be done to be competent in asset management; however, they don’t offer advice on how it should be done. The SSGs are intended to develop the next level of detail for each subject in The Anatomy. They should therefore be read as guidance; they are not prescriptive, but rather intended to help organisations by providing a consolidated view of good practice, drawn from experienced practitioners across many sectors.

The SSGs include simple as well as complex solutions, together with real examples from different industries to support the explanatory text, because industries and organisations differ in scale and sophistication. In addition, organisations are at different stages of asset management; some may be relatively mature while others are at the beginning of the journey.

Accordingly, there is flexibility for each organisation to adopt its own ‘fit for purpose’ alternative practical approaches and solutions that are economic, viable, understandable and usable. The underlying requirement for continual improvement should drive progress.

1.2 The SSGs in context
The SSGs are a core element within the IAM Body of Knowledge and they have been peer reviewed and assessed by the IAM Expert Panel. They align fully with the IAM’s values and beliefs that relate to both the development of excellence in the asset management discipline and provision of support to those who seek to achieve that level of excellence.

1.3 SSGs and the issue of Complexity versus Maturity
It is important to understand and contrast these terms. Put simply:
• The complexity of the business will drive the complexity of the solution required; and
• The maturity of the organisation will determine its ability to recognise and implement an appropriate solution.

A very mature organisation may choose a simple solution where a naive organisation may think that a complex solution will solve all its problems. In truth, there is no universal best practice in Asset Management, only good practice that is appropriate for the operating context of any particular organisation. What is good practice for one organisation may not be good practice for another.

For example, an organisation that is responsible for managing 100 assets, all in the same location, could use a spreadsheet-based solution for an Asset Register and work management system. This is arguably good practice for that organisation. However, for a utility business with thousands of distributed assets, this is unlikely to represent a good practice solution.

When reading the SSGs, the reader should have a view of the complexity and maturity of the organisation, and interpret the guidance that is offered in that context.

1.4 Further reading
The Anatomy provides a starting point for development and understanding of an Asset Management capability and the SSGs follow on to support that further. However, the opportunity doesn’t end there; the IAM provides a range of expert and general opinion and knowledge which is easily accessed by members through the IAM website.


2 Introduction
2.1 The purpose, intended use, and the organisations within them. To provide additional
intended audience of this SSG context this SSG provides case study examples from
PAS 55 and ISO 5500x1 are the formal specification different sectors to demonstrate the key points of
and standard for the implementation of an Asset guidance. However, any document generic enough
Management System, setting out the minimal to be applied to multiple industry sectors must be at
requirements an organisation would need to a relatively high level of detail.
meet to gain accreditation to that specification or
standard. For any organisation or individual wanting Those familiar with BSI PAS 55:2008 will be aware
to master the discipline, knowledge of PAS 55 and that this specification itemises 28 requirements for
ISO 5500x is not the whole picture. As well as the organisations seeking to demonstrate good Asset
standard and management system aspects, they Management practices.2 These requirements are a
need to understand the full breadth and depth of clear foundation for implementing and operating
the component parts that make up the landscape of an Asset Management System. They are, however,
Asset Management and this is supported through the distinct from the capabilities such organisations
SSGs. need – these are the 39 Subjects described in
the Anatomy.3 The areas for requirements are
Standards could therefore be regarded as ‘what’ is summarised in the following figures from PAS 55 and
required for an Asset Management System. This ISO 5500x.
SSG, as one of many being developed by the IAM,
supports the ‘how’ to deliver the component parts The Asset Management Anatomy has been built
and in its development has tried to cover the range around 6 Subject Groups and 39 subjects and now
of industry sectors currently associated with the IAM provides a stable platform on which the IAM can
and recognise that differences in levels of maturity develop SSGs. These six subject groups and 39
and operating contexts exist within those sectors and subjects are also aligned with The Asset Management

Stakeholder and organisation context


4.7 Management review 4.2 Asset management policy 4.1 Understanding the organizatio and its context
4.2 Understanding the needs and expectations of the stakeholders
5.1 Leadership and commitment
Act Plan 5.3 Organizational roles, responsibilities and authority
Organizational plans and
organizational objectives
4.6 Performance assessment and
4.3 Asset management strategy,
improvement
objectives and plans 4.3 Determining the scope of the Asset management
4.6.1 Performance and condition PAS 55:2008 4.3.1 Asset management strategy asset management system policy
5.2 Policy
monitoring
4.6.2 Investigation of asset-related
Management 4.3.2 Asset management objectives 6.2.1 Asset management objectives
Strategic asset management
4.3.3 Asset management plan(s)
failures, incidents and system structure 4.3.4 Contingency planning
plan (SAMP)
Asset management objectives
nonconformities
4.6.3 Evaluation of compliance 6.2.2 Planning to achieve asset
4.1 General requirements Do management objectives
4.4 Asset management system
4.6.4 Audit 6.1 Actions to address risks and opportunities
8.3 Outsourcing (scope)
4.6.5 Improvement actions Plans for developing for the asset management system
4.6.6 Records asset management
4.4 Asset management enablers Asset management plans
systems - relevant
and controls support
Check 4.4.1 Structure, authority and 8.1 Operational planning and control
8.2 Outsourcing (control)
responsibilities 7.1 Resources
8.3 Management of change
4.5 Implementation of asset 4.4.2 Outstanding of asset Asset management 7.2 Competence
Implementation of asset systems - relevant
management plans(s) management activities management plans
7.3 Awareness
support elements 7.4 Communication
4.5.1 Life cycle activities 4.4.3 Training, awareness and
7.5 Information requirements
4.4.2 Tools, facilities and equipment competence 7.5 Documented information
4.4.4 Communication, participation
and consultation
Asset portfolio
4.4.5 Asset management system
documentation
4.4.6 Information management
8.2 Management of change
4.4.7 Rick management 9.1 Monitoring, measurement, analysis and evaluation
Performance evaluation and improvements
4.4.8 Legal and other requirements 9.2 Internal audit
4.4.9 Management of change 9.3 Management review
10 Improvement

Figure 1: Requirements for good asset management practices - PAS 55 (left) ISO 55002 (right)

1. The term ISO 5500x is used generically in this document to refer to the family of standards comprised on ISO 55000, 55001, and 55002 unless specific reference is made
to a section within one of those standards.
2. Clause 4, PAS 55
3. Asset Management – an Anatomy Version 2, July 2014, Institute of Asset Management
4. Second Edition, March 2014

3 © Copyright The Institute of Asset Management 2016. All rights reserved.


Reliability Engineering Version 1.1 October 2016
The Institute of
Asset Management

Create or Operate and Dispose an/or


Identify need
Acquire Maintain Replace

Sell, Recycle
Install and Operate and
Identify need Select Purchase and/or
Configure Maintain Replace

Manage
Purchased by Anita Pharmatrisanti - anitapharma02@gmail.com - For Personal Use Only - Not For Distribution

Identify need Design Construct Commission Operate and Decommission Residual


Maintain Liabilities

Figure 2: Core asset life cycle stages and examples of variations6

Landscape4 (published by The Global Forum on Note that in ISO 5500x the explicit identification of
Maintenance and Asset Management) to facilitate different life cycle activities (such as create/acquire,
the exchange and alignment of maintenance and operate, maintain, renew/dispose) has been dropped
asset management knowledge and practices. and instead it accommodates more diverse life cycle
stages of different asset types. The figure above
The six Subject Groups5 are: shows some examples of lifecycle variations.
• Strategy and Planning.
• Asset Management Decision-Making. We recognise that you will spend more time and
• Lifecycle Delivery. money if you don’t have the right assets in the first
• Asset Information. place, and we recognise that prior stages are also
• Organisation and People. important by getting an asset design right in the
• Risk & Review. first place. However, many of the earlier and later
lifecycle phases are covered by other areas within
This SSG specifically pertains to Reliability Engineering the Lifecycle Delivery subject group of the Asset
which sits within the Lifecycle Delivery area of Management Landscape, as listed below, and this
the Asset Management Subject Groups. It will SSG focuses more on the Operate / Utilise & Maintain
become part of a full series of SSGs covering all / In Service area of the asset lifecycle.
39 Subjects and a smaller series of Sector Specific
Guidelines (where these are desired by a particular Lifecycle Delivery:
sector). These are not designed as text books or • Technical Standards & Legislation.
course material but as reference documents for • Asset Creation & Acquisition.
professionals working in or requiring guidance in • Systems Engineering.
this field. We would expect everybody involved in • Configuration Management.
asset management to have a working knowledge of • Maintenance Delivery.
the 39 Subjects, but the degree to which they might • Reliability Engineering (this SSG).
need deep or specialist knowledge will depend on • Asset Operations.
the job or task they perform. • Resource Management.
• Shutdown & Outage Management.
Reliability Engineering has a broad scope and as such, • Fault & Incident Response.
the group developing this SSG considers lifecycle • Asset Decommissioning & Disposal.
as a central component of this document, and this
document is based on the assumption that most people In applying this SSG, the reader is requested to view
who read this SSG will be people who have existing Reliability Engineering as a toolbox. It has many
assets and are looking at better ways to manage the sections with many tools that apply across the whole
reliability associated with them. For this reason this SSG lifecycle. But some tools are used more often than
focuses on the “in service” phase of the asset lifecycle. others and some tools are more general whereas
5. Updated to reflect Version 2 of the Anatomy
6. An Anatomy of Asset Management Issue 3 July 2014

© Copyright The Institute of Asset Management 2016. All rights reserved. 4


Reliability Engineering Version 1.1 October 2016
The Institute of
Asset Management

others have very specific applications. Given the • Alignment (‘line of sight’) of organisational
assumption that most use of the SSG will probably be in objectives feeding clearly into asset management
the “in service” stage and be related to mostly physical strategies, objectives, plans and day-to-day activities.
asset management this SSG describes the applicability • Whole life cycle asset management planning and
of techniques to be used for these purposes. cross-disciplinary collaboration to achieve the
best value combined outcome.
The following diagram illustrates typical priorities and • Risk management and risk-based decision-
concerns for asset managers. This topic is covered in making.
other IAM documents but aligning the organisational • The enablers for integration and sustainability;
and strategic goals with work carried out under the particularly leadership, consultation,
asset management system is critical for maintaining communication, competency development and
clear line of sight. This line of sight is maintained by information management.
using the right tool for the right job.
An in depth comparison of these two publications
Typical priorities & ‘values’
is outside the scope of this document but some key
differences are listed below9:
Keeping stakeholders happy Corporate/
Organization • PAS 55 primarily relates to the management of
Management
Portfolio return on investment physical assets, whereas ISO 55001 is intended to
n

be used for managing physical assets in particular,


tio

compliance & sustainability Manage Asset Portfolio


isa
tim

but it can also be applied to other asset types.


Op

Systems performance,
lue

cost & risk optimization Manage Asset Systems


Va

• In PAS 55, asset management strategy was


g
tin

Life Cycle Activities:


deemed to include both strategies for managing
os
eC

efficiency & Manage individual Assets over their Life Cycles


ycl

effectiveness
eC

assets and strategies for improving asset


Lif

management, but in ISO 55001, these are split


Figure 3: Typical priorities and concerns7 out into discrete requirements.
• PAS 55 defines explicit identification of different
As we look to maintain this line of sight across the life cycle activities (such as create/acquire,
whole lifecycle it is useful to see where the above operate, maintain, renew/dispose) but these have
diagram fits with respect to other components of PAS been dropped in ISO 55001 to accommodate
55 and ISO 5500x. While there are dependencies more diverse life cycle stages of different asset
between plans, policies, and objectives it is the types.
lifecycle activities where the plans and capabilities • Requirements for optimisation in planning and
come together although strategic asset management decision-making are retained in ISO 55001 but
planning is one area where PAS 55 and ISO 5500x are described differently.
are not aligned. • ISO 55001 reflects stakeholder needs and uses
‘value’ to determine the best balance in achieving
2.2 Aligning this document conflicting objectives.
The authors have tried to align this SSG with the • The required steps for risk management are
key components of both PAS 55 and ISO 5500x, but reduced within ISO 55001 (already specified in
the reader should note that there are several areas ISO 31000).
where the PAS 55 specification and the ISO 55000 • ISO 55001 requirements for audit and
standard(s) differ. As an international standard ISO documented information have also been
5500x represents a good path for organisations tightened.
interested in improving their asset management
practices to follow but PAS 55 remains a very 2.3 Complexity versus maturity
complete and easy to read guide. However, there are It is important to understand and contrast the
many key areas where there is alignment, including8: terms complexity and maturity and how they

7. An Anatomy of Asset Management Issue 3 July 2014


8. ISO 55000, John Woodhouse, Chairman, Experts Panel, Institute of Asset Management
9. Ibid

5 © Copyright The Institute of Asset Management 2016. All rights reserved.


Reliability Engineering Version 1.1 October 2016
The Institute of
Asset Management

impact an organisation looking to improve its asset to assess capabilities and a roadmap for improving
management performance, specifically within the them. In its simplest form, a maturity model is a set
area covered by this SSG. of characteristics, attributes, indicators, or patterns
that represent progression and achievement in a
In simple terms: particular domain or discipline. The artefacts that
• The complexity of the business will drive the make up the model are typically agreed upon by
complexity of the solution required. the domain or discipline and are validated through
• Organisational maturity will determine its ability application and iterative recalibration11.
to recognise and implement an appropriate
Purchased by Anita Pharmatrisanti - anitapharma02@gmail.com - For Personal Use Only - Not For Distribution

solution. A maturity model allows an organisation or industry


to have its practices, processes, and methods
A very mature organisation may choose a simple evaluated against a clear set of artefacts that
solution where a less mature organisation may establish a benchmark. These artefacts typically
incorrectly perceive that a complex solution will solve represent best practice and may incorporate
all its problems. In truth, there is no universal best standards or other codes of practice that are
practice in Asset Management or other areas – only important in a particular domain or discipline. While
good practice that is appropriate for the operating context of any particular organisation10.

Part of gaining an understanding of an organisation’s capabilities involves answering these questions:
• How can you tell if you are doing a good job of managing these assets and monitoring your progress on an ongoing basis?
• How do you manage the interactions of systems and processes that are continually evolving?
• How do poor processes impact interoperability, safety, reliability, efficiency, and effectiveness?

The most complex or sophisticated Asset Management solution does not necessarily represent a mature approach to Asset Management in all business contexts. What is good practice for one organisation may not be good practice for another. For example, an organisation that is responsible for managing 100 assets, all in the same location, could use a spreadsheet-based solution for an Asset Register and Work Management System. This is arguably good practice for that organisation. When reading the SSGs, the complexity and maturity of the organisation must be considered and the guidance placed in that context.

Maturity models exist for many different challenge problems. They provide a way for organisations to approach problems and challenges in a structured way by providing both a benchmark against which

this SSG is not a maturity model, it does borrow some of these concepts and presents a set of tools and techniques to be considered12.

In producing this SSG the team discussed the benefits of applying maturity models focused on the use of specific Reliability Engineering tools. There would undoubtedly be benefits in applying such models if they existed, but they are out of scope for this document. However, the IAM has done some internal work on this topic and seeks to establish consensus or identify weaknesses and/or opposition (based on good or better facts or arguments) through a Green Paper. Also note that the IAM produced a PAS 55 Assessment Methodology (PAM). PAM contained a series of questions to explore the maturity of an organisation’s asset management capability across all the elements of BSI PAS 55:2008. It enabled organisations to undertake a self-assessment and ‘gap analysis’ of their current asset management practices, and it is available as a free download13 for paid-up members of the IAM. During production of this SSG, PAM was replaced by the Self-Assessment Methodology14 (SAM) tool. SAM allows organisations to assess their capability across either the 28 elements of BSI PAS 55:2008 or the 27 sub-clauses of ISO 55001. It may also be used by third-party independent assessors.

10. Maturity Models 101: A Primer for Applying Maturity Models to Smart Grid Security, Resilience, and Interoperability, Software Engineering Institute, 2012, Caralli, Knight, Montgomery
11. Ibid
12. Ibid
13. https://theiam.org/products-and-services/pas55-methodology
14. https://theiam.org/products-and-services/Self-Assessment-Methodology


2.4 Navigating and using this document


The following descriptions provide a high-level overview of each section in this document.

Introduction – This section. Introduction to the purpose of the SSGs and how they relate to PAS 55 and ISO 5500x.
What Does “Reliability Engineering” Mean? – Overview of scope and key terminology.
Concepts, Principles and Key Factors – High-level summary of the important considerations, concepts and principles of Reliability Engineering.
Reliability Engineering Toolbox – Description and overview of several helpful techniques for Reliability Engineering.
Guidance for Reliability Engineering – Builds on the previous section and looks more at key inputs, outputs and required experience.
Case Studies – Examples to illustrate the use of some Reliability Engineering tools.

Table 1: Overview of this document by section


3 What does “Reliability Engineering” mean?

3.1 High-level definition of the SSG topic area

As previously described, the IAM recognises 6 Subject Groups and 39 subjects which provide a stable platform on which the IAM can develop SSGs. Reliability Engineering is recognised as one of the 39 subjects. Root Cause Analysis (RCA) was also part of this topic in Version 1 of the Anatomy (Reliability Engineering & RCA). However, Reliability Engineering is a broad topic that covers many areas and techniques. These components of Reliability Engineering, one of which is RCA, are treated separately in this SSG to differentiate the value that each offers. For this reason this document describes the overall scope as Reliability Engineering, which is consistent with Version 2 of the Anatomy.

In terms of time spent in each lifecycle phase, operation and maintenance generally makes up the largest amount of elapsed time. It is therefore the largest amount of time in which the organisation interacts with the assets. It is also the time when the assets perform the services for which they were designed and installed. In this phase the assets deliver value, but they are also used and exposed to operational risks. This is therefore where they can fail and impact an organisation’s activities. Unreliability has a number of unfortunate consequences and is therefore a serious threat for many products and services. For example, poor reliability can have implications for15:
• Safety.
• Competitiveness.
• Profit margins.
• Cost of repair and maintenance (warranty and service costs).
• Delays further up the supply chain.
• Reputation and public liability.
• Development risk.
• Goodwill.

Reliability Engineering has been developed in response to the need to control these risks. Reliability Engineering consists of the systematic application of engineering principles and techniques throughout a product lifecycle to ensure that a system or device has the ability to perform a required function under given conditions for a given time interval. The goal always needs to be to identify potential reliability problems as early as possible in the product lifecycle and ensure that the reliability requirements will be met. As with many disciplines, early identification of issues is more effective than late identification, and prevention is better than cure. From a Reliability Engineering perspective, the costs of changes to an asset during construction or service are much higher than the costs of changes to the design. Likewise, a good design is cheaper than retrofits to poor designs or replacement with superior assets. In terms of Whole Life Cost, around 85% is locked in during the design phase, so this is a critical time for those managing the asset to be involved.

The benefit that Reliability Engineering brings to asset management practitioners is a structured approach. It uses data and statistical techniques to quantify the causes and likelihood of asset failure as well as their consequences. But Reliability Engineering is not defined by a single approach or a specific life cycle stage. It can be applied before assets are installed, during operation, and after failure, and different approaches give more helpful results in different situations. For this reason several approaches to assessing the impact of asset failure on reliability have developed over time. This SSG therefore takes a top-down review of different approaches, so that readers can judge which are applicable to their own situation.
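The data-driven approach described above can be illustrated with the simplest possible model. The sketch below is a hypothetical example, not a method prescribed by this SSG. It assumes failures occur at a constant rate, an exponential model that suits only random failure behaviour. It estimates that rate from invented fleet data and converts it into the probability of surviving a stated interval. The function names `failure_rate` and `reliability` are illustrative only.

```python
import math

def failure_rate(failures: int, exposure: float) -> float:
    """Point estimate of a constant failure rate: failures per unit of exposure.

    Exposure may be operating hours, kilometres or cycles, as long as the
    same unit is used for the interval passed to reliability().
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return failures / exposure

def reliability(rate: float, interval: float) -> float:
    """Probability of surviving `interval` units, assuming a constant failure rate."""
    return math.exp(-rate * interval)

# Hypothetical fleet data: 4 failures over 20,000 pump operating hours.
lam = failure_rate(4, 20_000.0)          # 0.0002 failures per hour
mtbf = 1.0 / lam                         # mean time between failures: 5,000 hours
r_1000 = reliability(lam, 1_000.0)       # exp(-0.2), about 0.819
print(f"rate={lam:.6f}/h, MTBF={mtbf:.0f} h, R(1000 h)={r_1000:.3f}")
```

Exposure can be measured in hours, kilometres or cycles provided the same unit is used throughout. A point estimate says nothing about statistical confidence. With only a few recorded failures, a confidence bound would be needed before claiming that a reliability requirement has been met.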

15. Source - Warwick Manufacturing Group


The objectives of Reliability Engineering can be summarised as:
1) To apply engineering knowledge and specialist tools/techniques to prevent or reduce the likelihood and/or consequence of failure. You should first determine the consequences and probability of failure and then decide what, if anything, to do. You can decide to reduce or accept the consequences. You can decide to reduce or accept the probability of failure, or do nothing.
2) To identify and correct the causes of failures that do occur, despite the efforts to prevent them.
3) To determine ways of coping with failures that do occur, if their causes have not been identified or corrected.

On a case-by-case basis, the above objectives may need commercial justification if a proposed Reliability Engineering solution is likely to be ‘resource hungry’. Such commercial evaluation can or should consider the following:
• Reduced production losses.
• Lower production unit cost.
• Reduced maintenance costs.
• Improved employee safety.
• Better process stability.
• Extended equipment life.
• Reduced spare parts inventory.
• Improved sense of employee ownership.
• Reduced risk of environmental issues.
• Reduced overtime.
• Continuous improvement.

The key aspects of reliability management include:
• Corporate-level involvement.
• Reliability as an integral part of product development, not a parallel activity.
• Reliability procedures integrated into the design process.
• Reliability built into the programme plan, with production of a reliability plan.
• Ownership of the reliability plan within the design team.
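The first objective above is the decision whether to reduce or accept the probability or consequence of failure. It is often supported by comparing expected annual costs. The following is a minimal sketch that assumes risk can be expressed as annual probability of failure multiplied by a monetary consequence. The option names, costs and probabilities are hypothetical. Safety or environmental consequences that fall under a zero-tolerance policy would not be traded off this way.

```python
from dataclasses import dataclass

@dataclass
class Option:
    """A candidate response to one failure mode (hypothetical example data)."""
    name: str
    annual_cost: float          # yearly cost of the mitigation itself
    failure_probability: float  # annual probability of failure with this option
    consequence: float          # monetary cost of one failure

    def expected_annual_cost(self) -> float:
        # Mitigation cost plus risk, where risk = probability x consequence.
        return self.annual_cost + self.failure_probability * self.consequence

def cheapest(options: list[Option]) -> Option:
    """Return the option with the lowest expected annual cost."""
    return min(options, key=Option.expected_annual_cost)

options = [
    Option("accept (run to failure)", 0.0, 0.20, 50_000.0),
    Option("reduce probability (condition monitoring)", 4_000.0, 0.05, 50_000.0),
    Option("reduce consequence (hold a spare)", 2_500.0, 0.20, 15_000.0),
]
for o in options:
    print(f"{o.name}: {o.expected_annual_cost():,.0f}")
print("lowest expected cost:", cheapest(options).name)
```

With these invented figures, holding a spare gives the lowest expected annual cost: 5,500, against 6,500 for condition monitoring and 10,000 for accepting the failure. A real evaluation would also weigh the non-monetary benefits listed above.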

[Figure residue: PDCA representation of the ISO 55001 clauses. The clauses shown are 4 Context of the organisation, 5 Leadership, 6 Planning, 7 Support, 8 Operation, 9 Performance evaluation and 10 Improvement. They appear alongside variations of the asset life cycle stages, from Identify need through Operate and Maintain to Dispose and/or Replace.]

Figure 4: SSG approach to Reliability Engineering


3.2 SSG approach to Reliability Engineering
There are many disciplines associated with all aspects of an asset lifecycle that relate to reliability. In these areas, general approaches and methodologies specific to particular disciplines can add much value and guidance. Reliability Engineering thus has to consider and balance many different drivers, benefits, policies, risks and external influences within the boundaries of these constraints as it aims to provide optimal asset performance. To achieve this goal, the following four key elements16 in Figure 4 give a concise focus for Reliability Engineering activities:

• Reliability deals with potential events (Upper Left)
This means that failure is regarded as a probabilistic phenomenon. The likelihood of failure may be random or may vary over time according to a distribution (probability density function). Reliability Engineering is concerned with delivering a specified probability of not failing, at a specified statistical confidence level. Events may occur at any time in the lifecycle, but the main focus of this SSG is on operation and maintenance activities.

• Reliability is predicated on required function (Upper Right)
Generally, reliability is a measure of operation without failure. However, reliability is often tied to asset systems in which individual components may fail without necessarily impacting system reliability, and the system may still perform its intended primary function. Even if no individual part of the system has failed, the system as a whole may eventually cease to perform its primary function because it has been in a failing state. In that case system reliability is clearly impacted. The system requirements specification is the criterion against which reliability is measured. There are processes to be considered at each lifecycle stage in order to meet required functionality.

16. Asset Management – an anatomy Version 3, July 2014, Institute of Asset Management


• Reliability applies to a specified period: usually time (Lower Right)
Reliability Engineering seeks to ensure that components and materials will meet the requirements during the specified period. Units other than time may be used, for example distance or number of cycles. Specifying a period gives a basis for comparing reliability and allows time to be used as a normalising denominator in those comparisons. These metrics are often applied to maintenance strategies, and the focus of this document is on the “in service” stages of the lifecycle.

• Activities are restricted to operation under stated conditions (Lower Left)
This constraint is necessary because it is impossible to design a system for unlimited conditions. Many approaches have been developed over time for different aspects of Reliability Engineering, so this SSG attempts to relate the tools it covers to when they are applicable.

The following sections of this document describe the concept of Reliability Engineering techniques as a toolbox. The figure below, from An Anatomy of Asset Management, shows three linked elements. An organisation’s objectives (the “requirements”, e.g. improvement of reliability or elimination of particular problems) need techniques specific to the asset life cycle (the “Reliability Engineering tools”). They also need the ability to apply those techniques and learn from the results (the “capabilities”).

3.3 Terminology
Reliability is a word that is used in many contexts. Since it is the foundation for Reliability Engineering, we have presented several related terms here. This lets the reader understand the context in which these terms are used in this guidance document, and what is and is not covered.

Resilience has gained more attention internationally in recent years, particularly in regard to the management of natural disasters. Many countries, including those in Europe, Australia, the USA and Japan, have developed or are developing a critical infrastructure resilience strategy.

[Figure 5 residue. Elements shown: Organizational Strategic Plan; Asset Management Policy; AM Objectives (Requirements); AM Strategies; AM Plans; AM Capabilities, meaning processes, resources, competencies and technologies; Development Plan for the Management System (Capabilities); continual improvement loops; Life Cycle Activities, namely acquire/create, utilize, maintain and dispose/replace, supported by Reliability Engineering Tools; Portfolio of asset systems and assets.]

Figure 5: Key elements within an asset management system terminology


Reliability – Reliability is the probability that an item can perform its intended function for a specified interval under stated conditions. A single-word definition for reliability is dependability17. From IEC 60050 – 191-12-01: probability that an item can perform a required function under given conditions for a given time interval18.
Resiliency – Infrastructure resilience is the ability to reduce the magnitude and/or duration of disruptive events. The effectiveness of a resilient infrastructure or enterprise depends upon its ability to anticipate, absorb, adapt to and/or rapidly recover from a potentially disruptive event19.
Redundancy – Duplication of components in electronic or mechanical equipment so that operations can continue following failure of a part.
Reliability Engineering – Reliability Engineering consists of the systematic application of engineering principles and techniques throughout a product lifecycle to ensure that a system or device has the ability to perform a required function under given conditions for a given time interval. It is an ongoing process that starts at the conceptual phase of a product design, including defining system requirements, and continues throughout all phases of a product lifecycle. The goal always needs to be to identify potential reliability problems as early as possible in the product lifecycle and ensure that the reliability requirements will be met. Changes to a design are orders of magnitude less expensive in the early part of a design phase than once the product is manufactured and in service20.
Root Cause Analysis – Once a system or a group of assets is operational, the Root Cause(s) of any failures that occur will need to be recorded. This would typically involve deploying a Failure Recording and Corrective Action System (FRACAS) or Data Recording and Corrective Action System (DRACAS). Any such system must enable the Root Cause(s) of failure to be captured. The information must also be captured in a way that is consistent with the Failure Modes, Effects and Criticality Analysis (FMECA) that was undertaken. This enables feedback to Operations and Maintenance Decision-Making on actual failure rates for different failure modes, which may influence the choice of maintenance or inspection intervention21.

Table 2: Basic terminology
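To show why the redundancy defined in Table 2 improves reliability, the sketch below compares a series arrangement with active parallel redundancy. It assumes component failures are statistically independent, an assumption that common-cause failures break. The pump reliability of 0.90 over the interval of interest is a hypothetical figure.

```python
from math import prod

def series(reliabilities: list[float]) -> float:
    """System works only if every component works (no redundancy)."""
    return prod(reliabilities)

def parallel(reliabilities: list[float]) -> float:
    """System works if at least one redundant component works.

    Assumes independent failures, so P(all fail) is the product of (1 - R_i).
    """
    return 1.0 - prod(1.0 - r for r in reliabilities)

r = 0.90  # hypothetical reliability of one pump over the interval of interest
print(f"single pump:                 {r:.3f}")
print(f"two pumps both needed:       {series([r, r]):.3f}")    # 0.810
print(f"two pumps, either sufficient: {parallel([r, r]):.3f}")  # 0.990
```

The same arrangement is also why a parallel pair can mask a failed unit until the second one fails. This is the ‘hidden’ failure situation discussed in Section 4.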

For example, in the context of critical infrastructure, resilience refers to:
• Coordinated planning across sectors and networks to assess failure probability, consequence and post-failure activities.
• Responsive, flexible and timely recovery measures.
• The development of an organisational culture that has the ability to provide a minimum level of service during interruptions, emergencies and disasters, and to return to full operations quickly.

In this way, resilience results in the building of capacity in organisations to be agile, adaptive and to improve by learning from experience22.

While there are metrics to measure reliability, resilience is trickier to measure. When an asset or system fails, its reliability is impacted. But when a system withstands an event without failing, how do you measure that? How severe does the event need to be, and how do you measure that? In the end, resilience affects different industries differently and is approached differently depending on local factors. Metrics will therefore likely be tied to appropriate locally agreed standards.

Resilience fits within the overall concept of reliability but is particularly valuable for dealing with severe and non-traditional hazards. The measurement of resilience is another topic that this SSG cannot address. The list of topics outside the scope of this SSG serves to illustrate the breadth of Reliability Engineering as a topic. For readers who

17. Practical Reliability Tools for Refineries and Chemical Plants (Barringer)
18. Due to be superseded soon by IEC 60050-192
19. US National Infrastructure Advisory Council 2010, pg. 15.
20. IAM Knowledge Center
21. Ibid
22. Australia, Critical Infrastructure Resilience Strategy, 2010


are interested in delving deeper into the topic of resilience, the University of Kansas and the European Network and Information Security Agency (ENISA) have both done some interesting work on the topic of resilience metrics23 24.

The following illustrations from the University of Kansas use two axes that show operational state against service levels seen by a customer. The first diagram shows the impact on the state of the network when an event occurs and how the event is first remediated and then recovered from. The second diagram compares the impacts of an event on two networks, one that is not resilient to events and one that is.

[Figures 6 and 7 residue: operational state (normal operation, partially degraded, severely degraded) plotted against service parameters (acceptable, impaired, unacceptable). Figure 6 traces the defend, detect, remediate and recover strategy. Figure 7 compares the aggregate network state with and without resilience.]
Figure 6: Resilience state space and strategy inner loop25
Figure 7: University of Kansas framework to quantify network resilience26

23. https://wiki.ittc.ku.edu/resilinets/Main_Page#ResiliNets_Wiki
24. Measurement Frameworks and Metrics for Resilient Networks and Services: Technical report, ENISA, February 2011
25. Modelling and Analysis of Network Resilience, IEEE COMSNETS, Bangalore, India, 2011
26. https://wiki.ittc.ku.edu/resilinets/Main_Page#ResiliNets_Wiki


4 Concepts, Principles and Key Factors


This section is a high-level summary of the important considerations, concepts and principles of Reliability Engineering. It lists the key factors and areas of guidance that are explained in more detail in the following section, and summarises the key points of guidance drawn from the following sections.

4.1 Concepts
This document uses a toolbox to explain Reliability Engineering. Several tools can be used, and the goal of this SSG is to describe which tools should be considered for particular instances. Readers are also recommended to review the additional references made in this document for further information.

4.2 Principles
The principles of Reliability Engineering were introduced in the previous section, and this SSG uses Figure 4 to relate these concepts to PAS 55 and to the tools discussed in this document. These concepts are simple and foundational to this topic.

Reliability Engineering:
• Deals with potential events.
• Is predicated on required function.
• Applies to a specified interval: usually time, but may be linked to operational cycles or throughput.
• Applies to operation under stated conditions.

This SSG will discuss the applicability of different tools to different situations, to guide how and when to apply the techniques. An important fact to bear in mind is that the onset of failure may occur well in advance of the point of functional failure, even though it will ultimately lead to a loss of function. This idea is embodied in the P-F curve.

The P-F curve shows the time taken for the onset of failure to be detected. Point ‘P’ (Potential Failure) is where the onset of failure becomes detectable, and its position varies with the detection method used. Point ‘F’ (Functional Failure) occurs when the system no longer meets the user requirements. The time range between P and F is commonly called the P-F interval. It is the window of opportunity for conducting an appropriate (originating) task to detect the onset of failure. Once that task is completed, the (remedial) task can be planned and completed and the system returned to its required performance state.

[Figure 8 residue: condition (required performance, starting at 100%) plotted against operating age. The curve marks the onset of potential failure, the point at which potential failure becomes detectable (P), the P-F interval, the minimum acceptable performance, the point of functional failure (F) and the failed condition.]

Figure 8: Example P-F curve
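The P-F interval limits how often condition checks must be made. Suppose an on-condition task is carried out every I units of operating age and the P-F interval is PF. A potential failure that becomes detectable just after an inspection is only found I later. That leaves a worst-case net warning time of PF − I in which to plan and complete the remedial task. The sketch below applies that reasoning. It assumes a known, fixed P-F interval, although in practice the interval varies with the detection method, as noted above. The function names and figures are hypothetical. Inspecting at half the P-F interval is a commonly quoted rule of thumb, not a requirement of this SSG.

```python
def net_warning_time(pf_interval: float, inspection_interval: float) -> float:
    """Worst-case time left between detection and functional failure.

    If the potential failure becomes detectable just after an inspection,
    it is only found one inspection interval later.
    """
    if inspection_interval >= pf_interval:
        # The failure may progress from P to F between two inspections.
        return 0.0
    return pf_interval - inspection_interval

def max_inspection_interval(pf_interval: float, lead_time: float) -> float:
    """Longest inspection interval that still guarantees `lead_time` of warning."""
    if lead_time >= pf_interval:
        raise ValueError("lead time cannot be met within this P-F interval")
    return pf_interval - lead_time

# Hypothetical bearing: vibration analysis gives a P-F interval of 8 weeks,
# and planning plus executing a replacement needs 3 weeks.
print(max_inspection_interval(8.0, 3.0))  # 5.0 weeks at most
print(net_warning_time(8.0, 4.0))         # 4.0 weeks if inspected every PF/2
```

A shorter interval buys more warning time at the cost of more inspections. The balance between the two is a maintenance-strategy decision.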


A hidden failure is one where an undetected failure does not lead to a loss of primary function in normal circumstances. It warrants further discussion and is also covered in the section on Reliability Centred Maintenance (RCM) on page 33, specifically in Figure 16: Decision Logic for functional failures NOT evident to operations personnel during normal duties on page 37. A ‘hidden’ failure will not be noticed under normal operating conditions and requires a second failure to occur before it is noticed. For example, you will not know that a fire detector has failed until a defect that causes a fire has occurred.

An asset failure that impacts reliability may have its cause anywhere from design to a point immediately before (functional) failure. With that in mind, the P-F curve provides another way to view the applicability of this SSG. This SSG gives an overview of tools that can be applied27 during the lifecycle of an asset, including at any point on the P-F curve. An organisation may have multiple similar assets, or may be procuring replacement assets. In these cases, lessons learned from failures may be used to redesign the asset (where this is an option) or to change installation/maintenance procedures. They may also be used to modify the management of similar assets that have not failed yet.

Before discussing failure shapes, note that these discussions primarily relate to mechanical components; solid state or microelectronic assets are somewhat different. Microelectronic assets are susceptible to stress from their surrounding environment, and heat and humidity in particular can cause problems. Failure Modes, Effects and Criticality Analysis (FMECA) is particularly useful as a tool for electronics. Modern electronics are generally extremely reliable, but they are susceptible to many failure modes28. The challenge is that internal failure modes progress without necessarily producing matching external symptoms. Once a failure mode reaches a certain threshold, the device as a whole suddenly transitions from normal operation to failure.

The goal of this SSG is to provide guidance on how to determine the causes of system failures and identify suitable techniques to help manage them. Earlier we discussed probabilities and consequences, and part of managing failures is deciding when to prevent them and when to tolerate them. If the cost of preventing a failure is greater than the cost of the consequence, then allowing the failure may be a viable option. This is not a trivial topic, however. The timeframes for pre-failure actions need to be weighed against those for post-failure actions, and against any additional impacts that a failure may cause. In critical situations, or where safety is an issue, there may also be a zero-tolerance policy to failure and/or injury, which will dictate how this is managed.

Whatever the balance of risk and reward in deciding the right approach, rates of failure over time may vary for many reasons. The following failure shapes give examples of temporal variations that may occur.

[Figure 9 residue: six failure patterns plotted against time. Age related (11%): Pattern A, bathtub (4%); Pattern B, wear out (2%); Pattern C, fatigue (5%). Random (89%): Pattern D, initial break-in period (7%); Pattern E, random (14%); Pattern F, infant mortality (68%).]
Figure 9: Failure shapes (John Moubray, Nowlan and Heap)

Furthermore, a large proportion of modern electronics show intermittent failures when problems occur. This document does not go into detail on this topic, but it is mentioned here as a reminder to the reader. Complex asset systems may involve many components, so intermittent failures may be difficult to diagnose. Their effects may show up in other areas of system performance and may prove hard to reproduce or identify. Redundancy is also often built into microelectronics, so problems may be hidden rather than intermittent until overall failure of that component occurs.
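The failure patterns in Figure 9 are often approximated with the two-parameter Weibull distribution. This is a commonly used model, not one prescribed by this SSG. Its hazard (instantaneous failure rate) is h(t) = (β/η)(t/η)^(β−1). A shape parameter β < 1 gives a falling hazard (infant mortality, as in Pattern F). β = 1 gives a constant hazard (random failure, as in Pattern E). β > 1 gives a rising hazard (wear-out or fatigue, as in Patterns B and C). The sketch below evaluates that hazard with hypothetical parameter values. A single Weibull cannot reproduce the bathtub (Pattern A) or break-in (Pattern D) shapes, which need a combination of distributions.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1).

    beta: shape parameter (dimensionless); eta: scale parameter, in the
    same units as t (e.g. operating hours).
    """
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)

def trend(beta: float, eta: float = 1_000.0) -> str:
    """Classify the hazard trend by comparing it at an early and a late age."""
    early = weibull_hazard(100.0, beta, eta)
    late = weibull_hazard(900.0, beta, eta)
    if late < early:
        return "decreasing (infant mortality)"
    if late > early:
        return "increasing (wear-out)"
    return "constant (random)"

# Hypothetical shape parameters, one per behaviour.
for beta in (0.5, 1.0, 3.0):
    print(f"beta={beta}: {trend(beta)}")
```

In practice β and η would be fitted to recorded failure data, for example from a FRACAS, rather than chosen by hand.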

27. Note that the P-F curve is drawn in relation to operating age and some of the tools discussed will be used in design i.e. before the operating phase commences.
28. Practical Reliability Engineering, Chapter 9, O’Connor


[Figure 10 residue: three variations of the asset life cycle stages. (1) Identify need, Create or Acquire, Operate and Maintain, Dispose and/or Replace. (2) Identify need, Select, Purchase, Install and Configure, Operate and Maintain, Sell, Recycle and/or Replace. (3) Identify need, Design, Construct, Commission, Operate and Maintain, Decommission, Manage Residual Liabilities.]
Figure 10: Variations of asset life cycle stages

4.3 Key factors for Reliability Engineering
The key factors for Reliability Engineering are the situations and drivers that lead asset managers and engineers to reach for Reliability Engineering tools. They are thus the drivers for needing to apply Reliability Engineering techniques. This document uses a two-dimensional matrix, with asset lifecycle phases along the top and a list of drivers/challenges down the side, to show where each technique has value. The intention is to show, for specific situations, where the strengths of each approach are most helpful.

The following section gives a high-level overview of several techniques (“tools”), with indications of where they are applicable. It is intended as a primer in Reliability Engineering for people who have not used these methods before. It should help those looking to take the guidance from PAS 55 and ISO 5500x and select tools to help implement it.

Seven basic questions, as used within RCM29, can help determine the drivers for selecting which technique(s) to use. These RCM (see page 34) related questions would normally be asked in sequence to support the process. The first five are asked when developing the FMECA (see page 26) and the other two when conducting the task analysis. Reliability is predicated on required function, so these questions are designed to focus on maintaining the required functions of a system.

The reader should also bear in mind that there are two general types of applications for these tools, even though the following questions help in understanding the various contexts in which tools may be considered:
• An incident has occurred, or a situation exists, where action needs to be taken in the short term. This may include giving instructions to largely untrained staff so as to meet immediate objectives.
• The organisation is looking at how to meet longer term objectives in terms of improvement processes.

The seven basic RCM questions are:
• What are the functions of the asset?
• In what way can the asset fail to fulfil its functions?
• What causes each functional failure?
• What happens when each failure occurs?
• What are the consequences30 of each failure?
• What should be done to prevent or predict the failure?
• What should be done if a suitable proactive task cannot be found?

It is also important to understand that all tools can be “simple” or complex. They can be as complicated as needed, but in many cases they can also provide quick benefits. As with maturity versus complexity, more complex is not necessarily better.

Take, for example, a situation where an asset has failed and caused other assets to be unable to operate normally. The organisation responsible for operating the equipment may want to bring the systems back online quickly to meet deadlines or reliability targets, or both. It is important to understand why and how the asset failed, so that the system does not fail again once repaired. The desired approach is to gather and analyse tangible evidence of cause and effect, together with proposed solutions.

Some systems take a long time to restart after even the briefest failure. The cost of clean-up, maintenance and restarting can be significant, as can the impact on customers and deadlines. Simply replacing the failed equipment after a two-day outage, only to experience a repeat fault, would be totally unacceptable. Several tools have the potential to help the operator in this situation: Failure Modes and Effects Analysis (FMEA)/FMECA, Hazard and Operability study (HAZOP), RCA, and Failure Reporting and Corrective Action System (FRACAS)/Defect Reporting and Corrective Action System (DRACAS). But

29. http://www.reliabilityweb.com/art08/7_questions_rcm.htm
30. For those used to looking at risk as the product of probability and consequence/severity this is covered in the ‘CA’ of the FMECA and will likely have an associated
criticality matrix.

© Copyright The Institute of Asset Management 2016. All rights reserved. 16


Reliability Engineering Version 1.1 October 2016
The Institute of
Asset Management

imagine if this was the third time in a few weeks that • Identify critical assets – the failure of these assets
a similar failure had occurred. Clearly there is reason has the most significant impact – what are the
to do a deeper investigation and look at whether critical parts, components, functions analysed?
there is a common cause affecting all of the incidents • Assess asset inter-dependability (one asset or
and if so what can be done to proactively avoid component failure can influence others) – what
them in the future. interdependencies if any exist between these
parts, function, components?
Whichever tool or combination of tools is selected, • Components classification hierarchy (e.g.
it or they must be implemented in the context of a critical, potentially critical, run to failure),
Reliability Plan which should contain the followings31: differentiate between critical, potential critical, or
• Statement of reliability requirement. run to failure parts, components or processes.
• Organisation for reliability. • Source the information and data required for
• Reliability activities to be performed and why. performing the Reliability Engineering analysis and
• Timing of major activities. ensure its quality (relevance, currency accuracy)
• Management of suppliers. when looking to events that had already occurred.
• Standards and company procedures to be used. • Data can include: components and their hierarchy
• Lesson learned feedback. and criticality; components condition; age;
• Risk Analysis/risk register. relationship between components.
• Data collection and analysis procedure. • Develop and apply a consistent convention/
• Reliability monitoring plan. terminology for defining the components and the
failure modes.
Additionally, although each Reliability Engineering
tool may target a specific type of issue, the
application of most tools will involve consideration
of the elements listed below. The intention is to give
the reader a flavour of some of the elements that
could be considered and we recognise that this list
and the approaches suggested are not exhaustive
and that several elements could be quantified as a
compound or multidimensional measure showing
e.g. quantified effects on risk to life, product quality,
company reputation etc. As an introduction to this
topic the SSG does not try to offer ways in which to
combine techniques to solve complex or interrelated
issues and this is why there is a “Derived Benefits”
section for each tool so that a decision on which tool
to select could be influenced by what else a reader
could benefit from:
• Determine the scope of the Reliability Engineering
program (Functions, condition, timeframe, why is
the Reliability Engineering tool applied, what is
the expected outcome and in what timeframe?).
• Identify the asset reliability criteria – what are the
asset reliability criteria that need to be achieved or
preserved? e.g. safety criteria, energy reduction;
level of service, regulatory conformance etc.
• Identify asset system boundaries, interface issues
– to what process or system functions or
components/parts is the Reliability Engineering
analysis tool applied to?

31. O’Connor, Chapter 15, Reliability management, Practical Reliability Engineering, John Wiley
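The component classification and interdependency elements listed above can be illustrated with a small data model. The following is a minimal Python sketch, assuming a hypothetical register in which each component carries one of the three classes named in the text (critical, potentially critical, run to failure) plus a list of the components that depend on it. The component names and links are invented for illustration. A breadth-first traversal then lists every component whose function could be affected if a given component fails.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Criticality(Enum):
    # The three classes named in the component classification element.
    CRITICAL = "critical"
    POTENTIALLY_CRITICAL = "potentially critical"
    RUN_TO_FAILURE = "run to failure"


@dataclass
class Component:
    name: str
    criticality: Criticality
    # Names of components that rely on this one to perform their function.
    # Hypothetical data; in practice this would come from the asset register.
    dependants: list[str] = field(default_factory=list)


def affected_by(failed: str, register: dict[str, Component]) -> list[str]:
    """Return every component that could lose function if `failed` fails,
    following dependency links breadth-first (the failed item is excluded)."""
    seen = {failed}
    queue = deque([failed])
    order: list[str] = []
    while queue:
        current = queue.popleft()
        for child in register[current].dependants:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical pumping-station register; names are illustrative only.
register = {
    "transformer": Component("transformer", Criticality.CRITICAL, ["motor_control_centre"]),
    "motor_control_centre": Component("motor_control_centre", Criticality.CRITICAL, ["pump_a", "pump_b"]),
    "pump_a": Component("pump_a", Criticality.POTENTIALLY_CRITICAL),
    "pump_b": Component("pump_b", Criticality.POTENTIALLY_CRITICAL),
    "site_lighting": Component("site_lighting", Criticality.RUN_TO_FAILURE),
}

impacted = affected_by("transformer", register)
critical_hit = [n for n in impacted if register[n].criticality is Criticality.CRITICAL]
print(impacted)
print(critical_hit)
```

In this invented register a transformer failure reaches the motor control centre and both pumps, one of which paths passes through another critical item. The sketch only follows explicit links; common-cause effects such as shared utilities or environment would need to be modelled separately.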


5 Reliability Engineering Toolbox

The engineer’s toolbox for Reliability Engineering is a feast if you enjoy alphabet soup.

This is the main body of the guidance. It consists of a number of sub-sections, each covering one of the following ‘tools’. The key factors considered are the drivers for needing a Reliability Engineering approach so, as well as giving an overview of each tool, this section addresses how specific tools can help to address individual factors.

The tools discussed in this SSG are:

FMEA | Failure Modes and Effects Analysis | See p24
FMECA | Failure Modes, Effects and Criticality Analysis | See p24
FTA | Fault Tree Analysis | See p27
HAZOP | Hazard and Operability study | See p28
ETA | Event Tree Analysis | See p31
RCM32 | Reliability Centred Maintenance | See p33
CRO | Cost Risk Optimisation | See p38
RCA | Root Cause Analysis | See p39
FRACAS | Failure Reporting and Corrective Action System | See p41
DRACAS | Defect Reporting and Corrective Action System | See p41
TPM | Total Productive Maintenance | See p44
OEE | Overall Equipment Effectiveness | See p45
LCVR | Life Cycle Value Realisation | See p46
Table 3: Reliability Engineering “tools”

It should be assumed that different examples of good practice will be appropriate for different business contexts. For example, an asset register for a ‘simple’ company can be a spreadsheet and still be fit for purpose/good practice, but for a major utility good practice for an asset register will be very different.

This section provides a high-level description of the prerequisites, objectives and ‘spin-offs’ that can be expected from the use of each tool. The descriptions also focus on the “how”, as opposed to the “what”, of applying these tools. The discussion of the “how” includes:
• People (roles, responsibilities).
• Process (inputs/outputs, key steps, controls, etc.).
• Information or knowledge.
• Pitfalls or challenges and how to avoid them.
• Examples of good industry practice.
• Success factors and how these can be achieved.
• Suitable benchmarks.

5.1 Selecting the right tool for the right job
Table 4 lists some of the tasks frequently carried out by asset managers/engineers. The purpose of this table is to act as a guide to this document. It takes the tools listed in Table 3: Reliability Engineering “tools” and maps them to a number of business tasks (requirements). In this way the reader can look for a situation similar to one they may have seen and then look up which tools are most suited to adding value to that task. The suggested ‘suitable tools’ are not exclusive to a specific task; each task may be suited to one or more tools, which may well be used together to arrive at the desired result.

When selecting which tool to use, sometimes there is a good fit and sometimes there isn’t, and this makes it difficult to prescribe which tools you should use. Some are forward looking, some look at historical events, some focus on costs, some focus on risk, some use actual data and some are driven by probabilities, but all have value within Reliability Engineering and often, as mentioned above, the best result comes from using more than one of these tools together.

Some tools help with identifying what has happened, some with uncovering the range of possibilities of what may happen, and the combination of the two can be a powerful aid. Different tools may also provide qualitative or quantitative findings, but one thing that is always important to bear in mind is the extent of the problem being addressed. Whether what is optimal for a single asset analysed in isolation is still optimal for the wider system, or for the company as a whole, needs to be considered if line of sight is to be maintained. Line of sight is defined33 by the IAM as “the clear connectivity between the organization strategic plan (commonly called the business plan) and the on-the-ground daily activities of individual departments (planning, engineering, procurement, operations, maintenance, performance management, etc.).”

Eliminating barriers such as silos for P&L and KPIs, together with sharing data, is equally essential so that business processes are viewed holistically. Sharing data provides more input for tools and often requires support from top management, but it also helps develop line of sight.

Ultimately the choice of tool depends on the challenges and/or objectives within each organisation, but as a starting point for which tools to consider, the following points provide a brief overview of each tool:

• FMEA / FMECA
Techniques to identify an asset’s potential or actual failure modes and their effects/criticalities at various levels within a system. They can support the derivation of appropriate activities through complementary Reliability Engineering tools such as RCM, where applicable and effective, to ensure the inherent reliability requirements are met.

• FTA
A ‘top down’, highly-structured deductive analysis tool which enables defined potential events to be evaluated by looking backward (top down) at the potential causes, predicting the probability of each contributing cause happening.

• HAZOP
A detailed method for the systematic examination of a well-defined process or operation, either planned or existing, often used to identify significant operability or quality problems.

• ETA
A commonly applied technique for identifying the consequences that can occur following a potentially hazardous event, suited to design stage and/or pre-commissioning Reliability Engineering studies.

• RCM
A methodology used to identify the strategies which can be implemented to manage the failure modes causing the functional failure of any physical asset in a given operating context, by asking the seven basic questions of RCM.

• CRO
Often used in long-term investment planning to model the impacts of multiyear factors and look for optimal solutions as to when to repair, refurbish or replace assets while maintaining an acceptable risk profile and balancing multiple constraints.

• RCA
An umbrella term for various reactive methodologies used to identify the root cause of failures after an event has occurred. The objective is to make recommendations to prevent recurrence, recognising that there may be more than one root cause for a failure and several effective actions.

• FRACAS / DRACAS
Process-driven, closed-loop reporting systems requiring procedures for reporting, data collection, failure reporting, trend analysis and corrective action management. Whereas FRACAS specifically addresses failures, DRACAS embraces failures and other incidents, leading to an overall improvement in equipment capability.

• TPM
More often applied in a manufacturing than a services environment, this is a method that brings together both design and operational aspects of professional maintenance to deliver real improvement in an operational environment.

• OEE
A data collection and analysis tool, developed within manufacturing companies, for understanding the overall performance of an asset, recognising that downtime is not the only cause of reduced asset performance.

• LCVR
Methods that assess and optimise the combination of all direct and indirect expenditures, cash flows, risks and performance benefits that may be associated with asset ownership or responsibility over an asset’s or asset system’s life cycle.

To maintain this focus on typical business challenges, this SSG also presents short case studies that relate to the use of these tools. While it was not possible to find an exact fit for each business task during the production of this SSG, we have made every endeavour to find a case study that captures the challenge posed by each task and which relates to the use of the related tools.

If only a single tool is of interest, please feel free to focus on that section of this SSG. We have tried to create a useful introduction to Reliability Engineering and its concepts, and to tie typical challenges faced by Reliability Engineering practitioners to the tools and case studies that will provide the stepping stones to a learning path that will be rewarding and contribute to your growth within this area.

5.2 Mathematical Modelling
Before describing each of the tools above, there is another type of technique that needs discussion: the application of mathematical modelling in Reliability Engineering. Mathematical modelling, and in particular the use of statistical probability, is a core concept in Reliability Engineering and is so pertinent that probability is even part of the definition of the word “reliability”. Since reliability is the treatment of performance as a probability, a discussion of mathematical techniques is warranted as part of the Reliability Engineering toolkit, because it provides a foundation for many tools.

We can measure actual reliability in terms of actual failures, time between failures, or quantified impacts of the failures, but this becomes less precise as subjectivity enters the determination of which failures to include in the calculations. The difficulty in applying statistical techniques arises because, unlike many uses of statistics that look at means and medians where large numbers of samples sit, Reliability Engineering generally looks at extreme behaviours and unlikely situations. A single failure can have far-reaching safety, economic or operational impacts, so we need techniques for predicting the likelihood of failure as well as its potential impact.

The user of this guide should consider how far they can or should quantify failure probabilities and consequences, and how wide a view they should take: single asset, asset system, single process or company wide? They should also consider whether they are looking at one of many similar assets, at differing assets, or at similar assets in differing situations.

If we start with the assumption that a population of supposedly identical components, operating under similar conditions, fail at different points in time (refer to Figure 9), then a failure phenomenon can only be described in probabilistic terms. But even this is not strictly accurate, since operating environments often differ and contain a level of uncertainty due to environmental variations. Take, for instance, electric utilities, where similar assets are installed in geographically dispersed locations that are subject to different operating and environmental conditions.

ISO 31000 defines risk as the effect of uncertainty on objectives; the quality and quantity of asset data are therefore crucial to reducing both uncertainty and risk, since ambiguity and subjectivity can cause inaccuracy in models. Reliability Engineering will always be about dealing with uncertainty, but the more we can reduce uncertainty, the more effective we should become. Six Sigma is a set of techniques and tools for process improvement developed by Motorola. Six Sigma seeks to improve the quality of process outputs by identifying and removing the causes of defects (errors) and minimising variability.

Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results: simulations are run many times in order to obtain the distribution of an unknown probabilistic entity. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to obtain a closed-form expression, or infeasible to apply a deterministic algorithm. They are particularly useful for assessing how the failure probabilities of individual items interact to affect the failure probability of a complex system made of these items.

Probabilistic risk assessment is a systematic and comprehensive methodology to evaluate risks associated with a complex engineered system and is used in Event Tree Analysis and Fault Tree Analysis (described later in this section). Probabilistic algorithms that depend on random input, such as those used in Monte Carlo analysis, have a chance of producing an incorrect result. Monte Carlo simulations address this by running the simulation many times to obtain a distribution of probable results.

Bayesian inference has found application in a range of fields and is a method of inference in which the probability estimate for a hypothesis is updated as additional evidence is acquired. To evaluate the probability of a hypothesis, a prior probability must be specified, which is then updated in the light of new data. Bayesian methods also account for uncertainty resulting from lack of information.

The Weibull distribution is widely used in Reliability Engineering and elsewhere due to its versatility and relative simplicity. Depending on the values of its parameters, the Weibull distribution can model a variety of life behaviours. A graphical representation of a Weibull distribution is used in Section 5.3.6 of Asset Management – an anatomy, along with depictions of lognormal and exponential distributions, to represent Reliability Engineering.

And the list goes on. The purpose of the last few paragraphs is to reinforce the need for strong modelling skills when applying Reliability Engineering techniques. There are several books on this topic which go very deep into mathematical proofs and the details of various techniques.

32. NOTE: A search for RCM and related publications on the Internet reveals appearances in a plethora of guises: hyphenated and not hyphenated, Centered and Centred, capitalised and lower case. In this SSG we have adopted RCM as the generic term to cover all of the above variations.
33. Asset Management – an anatomy, IAM
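To make the Monte Carlo and Weibull paragraphs above concrete, the Python sketch below estimates the probability that a simple system has failed by a given time. It assumes, for illustration only, a hypothetical system of one component in series with a redundant pair, each with an independent Weibull lifetime whose reliability is R(t) = exp(-(t/η)^β), where β is the shape and η the scale (characteristic life). A shape below 1 gives a decreasing failure rate, a shape of 1 a constant rate (the exponential distribution), and a shape above 1 an increasing, wear-out rate. Because this small system has a closed-form answer, the simulated estimate can be checked against it; Monte Carlo is most valuable where no such closed form exists. All parameter values are invented.

```python
import math
import random

# (shape, scale) pairs per component; hypothetical values for illustration only.
Params = list[tuple[float, float]]


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """R(t) = exp(-(t / scale) ** shape): probability of surviving beyond time t."""
    return math.exp(-((t / scale) ** shape))


def system_failed(t: float, life_a: float, life_b: float, life_c: float) -> bool:
    """Hypothetical layout: A in series with a redundant pair (B, C).
    The system has failed by time t if A has failed, or if both B and C have."""
    return life_a <= t or (life_b <= t and life_c <= t)


def monte_carlo_unreliability(t: float, params: Params, trials: int = 100_000, seed: int = 1) -> float:
    """Estimate P(system failed by t) by repeatedly sampling component lives."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    failures = 0
    for _ in range(trials):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
        lives = [rng.weibullvariate(scale, shape) for shape, scale in params]
        if system_failed(t, *lives):
            failures += 1
    return failures / trials


def analytic_unreliability(t: float, params: Params) -> float:
    """Closed-form answer for the same layout, assuming independent components."""
    r_a, r_b, r_c = (weibull_reliability(t, shape, scale) for shape, scale in params)
    r_pair = 1 - (1 - r_b) * (1 - r_c)  # parallel: survives if either survives
    return 1 - r_a * r_pair             # series: A and the pair must both survive


params: Params = [(1.5, 20_000.0), (2.0, 8_000.0), (2.0, 8_000.0)]
t = 5_000.0  # operating hours

print(f"Monte Carlo estimate: {monte_carlo_unreliability(t, params):.4f}")
print(f"Analytic result:      {analytic_unreliability(t, params):.4f}")
```

With these invented parameters both figures come out at roughly 0.21, i.e. about a 21% chance that the system has failed by 5,000 hours. Increasing `trials` narrows the sampling error of the Monte Carlo estimate at the cost of run time, which is the trade-off the text describes when it notes that simulations are repeated many times.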

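The Bayesian updating described above can be sketched for a constant failure rate. The Python example below assumes that failures occur as a Poisson process with an unknown rate λ and uses a gamma prior, which is conjugate to the Poisson likelihood: a Gamma(α, β) prior updated with n failures observed over an exposure time T gives a Gamma(α + n, β + T) posterior. The prior (standing in for generic data such as a reliability handbook) and the field counts are hypothetical. The simple point estimate n/T, the reciprocal of the mean time between failures, is printed for comparison.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(alpha, beta) belief about a constant failure rate (failures per hour).
    alpha behaves like a count of failures, beta like hours of exposure."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / self.beta

    @property
    def cv(self) -> float:
        # Coefficient of variation (sd / mean) of a gamma distribution is 1 / sqrt(alpha).
        return 1 / math.sqrt(self.alpha)

    def update(self, failures: int, exposure_hours: float) -> "GammaBelief":
        # Conjugate update: Poisson failure counts keep the belief gamma-shaped.
        return GammaBelief(self.alpha + failures, self.beta + exposure_hours)


# Hypothetical prior, e.g. from generic data: about 2 failures per 100,000 h.
prior = GammaBelief(alpha=2.0, beta=100_000.0)

# Hypothetical field evidence: 5 failures in 50,000 operating hours.
failures, hours = 5, 50_000.0
posterior = prior.update(failures, hours)

print(f"Prior mean rate:     {prior.mean:.2e} per h (CV {prior.cv:.2f})")
print(f"Observed rate n/T:   {failures / hours:.2e} per h (MTBF {hours / failures:,.0f} h)")
print(f"Posterior mean rate: {posterior.mean:.2e} per h (CV {posterior.cv:.2f})")
```

Here the posterior mean (about 4.7e-5 per hour) sits between the generic prior (2e-5) and the raw field estimate (1e-4), and the smaller coefficient of variation reflects the reduced uncertainty once evidence is added. The constant-rate assumption is itself a modelling choice; wear-out behaviour would call for a different likelihood, such as a Weibull model.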

5.3 Reliability Engineering Tool Selection Matrix

Requirement | Intended Process | Suitable Tool(s) | Case Study
To support implementation of a continuous review programme | Analysis of existing asset history to identify asset ‘bad performers’ so that remedial actions can be implemented | FMEA/FMECA, FRACAS, RCM, FTA | Case Study 1, Page 49
To conduct downstream/upstream design check as part of change management | Ensure that technical characteristics of proposed asset replacement are compatible with related asset systems | ETA | Case Study 2, Page 50
To respond to an unexpected event that has occurred | Identify root cause of unexpected event so that remedial actions can be implemented | FTA, ECA, RCA | Case Study 3, Page 52
To make a repair/replace decision | Document technical and commercial analysis of options | CRO | n/a
To investigate reason for multiple co-existing or recurring faults | Establish sequence/timeline of faults and determine if there is failure propagation | FMEA/FMECA, HAZOP, ETA, RCA, FRACAS | n/a
To carry out pre-commissioning risk assessment | Identify, evaluate and mitigate: potential effect of human errors; safety and product liability problem areas; non-compliance with regulatory standards; potential single-point failures, areas of system vulnerability | FMEA/FMECA, FTA, HAZOP, RCA, ETA | n/a
To predict failure modes which may seriously affect expected or desired operation | Identify: relevant critical assets; potential failure modes which would affect operations; causes of failure; mitigation options | FMEA/FMECA, RCM, (HAZOP) | Case Study 4, Page 54
To develop test plans | Determine test criteria from available documentation, and develop into test plans and diagnostic procedures | FMEA/FMECA | Case Study 5, Page 55
To improve ownership / operational standards | Identify relevant operation problems (‘bad actors’). Identify operation-associated failure modes. Engage operating team. Develop mitigating options | FRACAS/DRACAS, RCM/TPM | Case Study 6, Page 58
To increase reliability of an asset to realise its inherent reliability | Identify failure modes from asset history. Develop mitigating options | FMECA, RCM | Case Study 6, Page 59
Table 4: Reliability Engineering Tool Selection Matrix


5.4 FMEA / FMECA - Failure Modes, Effects and Criticality Analysis

5.4.1 What is it?
FMEA and FMECA are Reliability Engineering techniques used to identify an asset’s potential or actual failure modes and their effects/consequences at various levels within a system, with a view to deriving an appropriate activity, where applicable and effective, to ensure the inherent reliability requirements are met. The only difference between the two methods is that FMECA has an additional step, which is concerned with identifying the criticality of the failure modes.

This analysis tool can be employed either from a ‘bottom up’, asset-based perspective, which lets you look at the results, at successively higher system levels, of events occurring at system component level, or from a ‘top down’, functionality-based viewpoint; each has its own strengths in different situations. The ‘bottom up’ FMECA is used in design as described below; however, a ‘top down’ functional FMECA can also be used, with the level of detail expanded as required or as dictated by appropriate standards/guidance. The ‘top down’ approach is accepted by Lloyd’s, for example.

Analysis may require a multi-disciplinary approach, ideally led by a facilitator with FMEA/FMECA experience.

5.4.2 When is it used?
Both techniques can be linked to the risk assessment process, as its precursors, and are used in the Life-cycle Costs and Value Optimisation processes, facilitating the Decision Making process throughout each life-cycle stage of a product, asset or system. As stated earlier in this SSG, we are focused on the ‘in service’ phase of the asset lifecycle; however, we recognise that ‘design phases’ can occur within the ‘in service’ phase due to the requirement for modifications, technology refreshes etc., so FMEA/FMECA has applicability throughout the asset lifecycle.

Figure 11 illustrates the relationship between FMECA, Risk and Optimised Decision Making.

[Figure 11: The role of FMECA in Optimised Decision Making34. Elements shown: functional failure, mode of failure, effect of failure and criticality of failure (FMECA only), grouped under FMEA/FMECA; risk; risk exposure costs; cost of risk reduction options; options evaluation process; optimised decision making.]

Both methods can be used during any of the life cycle stages of a product or asset.

During the conceptual and design phase the methods are used in conjunction with modelling of the whole-life-cycle costs of different design options to support the capital investment decision process. During the operation and maintenance stages the methods are used to optimise Operations and Maintenance Decisions. Similarly, the methods are used to inform Asset Disposal Decisions. Some more specific examples where FMEA and FMECA are used are:
• When an existing product is being redesigned.
• When an existing process, product or service is being applied in a new way.
• Before developing control plans for a new or modified process.
• When improvement goals are planned for an existing process, product or service.
• When analysing failures of an existing process, product or service.
• Periodically and as part of continuous review, throughout the life of the process, product or service.
• When a new process, product or service is being designed.

34. Adapted from International Infrastructure Management Manual V2, 2002
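Figure 11 shows risk feeding an options evaluation that weighs risk exposure costs against the cost of risk reduction options. The Python sketch below illustrates that comparison under simplifying assumptions that are not stated in the SSG: risk exposure is expressed as an annualised cost equal to the annual probability of the failure mode multiplied by its consequence cost, costs are not discounted, and the preferred option is the one with the lowest total of option cost plus residual risk exposure over the evaluation period. The option names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    annual_cost: float          # cost of the risk reduction option per year (hypothetical)
    annual_failure_prob: float  # residual probability of the failure mode per year


def risk_exposure(prob: float, consequence_cost: float) -> float:
    """Annualised risk exposure cost = probability x consequence (simplifying assumption)."""
    return prob * consequence_cost


def evaluate(options: list[Option], consequence_cost: float, years: int) -> list[tuple[str, float]]:
    """Undiscounted total cost over the period for each option, cheapest first."""
    totals = [
        (o.name, years * (o.annual_cost + risk_exposure(o.annual_failure_prob, consequence_cost)))
        for o in options
    ]
    return sorted(totals, key=lambda item: item[1])


# Hypothetical figures for one failure mode with a 200,000 consequence cost.
options = [
    Option("do nothing", annual_cost=0.0, annual_failure_prob=0.10),
    Option("condition monitoring", annual_cost=5_000.0, annual_failure_prob=0.03),
    Option("replace now", annual_cost=15_000.0, annual_failure_prob=0.01),
]

for name, total in evaluate(options, consequence_cost=200_000.0, years=5):
    print(f"{name}: {total:,.0f}")
```

With these invented figures condition monitoring has the lowest five-year total (55,000), ahead of replacement (85,000) and doing nothing (100,000). A real options evaluation would normally add discounting, multiple failure modes and non-financial consequences.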


Like any other Reliability Engineering method, FMEA and FMECA are applied to:
• Operations under stated conditions.
• Required system/product functions and specifications.
• A specified period, which is not always time (e.g. it can be km, cycles etc.).

An understanding of the following factors is critical in the application of FMEA and FMECA:
• Failure35 can be defined as the point at which an asset stops performing the functions for which it was designed, or the inability of an item to meet a desired standard or performance.
• Partial Failures36 lead to a reduced level of performance and can be considered as separate functional failures, since a reduced level of functionality may be acceptable for a period of time.
• Failure mode can be defined as the way, or mode, in which something might fail, or a single event that may cause a functional failure.
• Failure Effects can be defined as the physical manifestation, if any, that results from the occurrence of a failure mode.
• Criticality can be defined as the relative measure of the consequence of a particular failure mode and its frequency of occurrence.

Broadly, the failure modes of an asset can be defined in terms of:
• Capacity/utilisation: the effect of over- or under-capacity on the required level of service.
• Reliability and levels of service: where the asset fails to meet the required level of service.
• Structural: the physical condition of the asset and its components, as a measure of its deterioration, remaining life and service potential.
• Cost or economic impact: where the cost to maintain or operate an asset is likely to exceed the economic return expected, or the customer’s willingness to pay, to retain an asset.
• Obsolescence: where technological changes or lack of replacement parts can render assets uneconomical to operate.
• Operator error.

Each of these types of failure has distinct attributes that require consideration and evaluation. Specific information needs to be collected in order to ascertain each mode of failure and its magnitude and consequences.

Not all failures and failure modes are critical. The “criticality” of a failure mode is determined by considering the level of risk it poses if the failure is not prevented. Failures are prioritised by criticality, i.e. according to how serious their consequences are and how frequently they occur; criticality is normally derived using an agreed criticality matrix.

Risk can be determined with regard to safety, social, loss of reputation, environmental and financial considerations, and any combination of these and more.

The purpose of the FMEA or FMECA is to take actions to eliminate, mitigate or reduce failures, starting with the highest-priority ones. Failure modes and effects analysis also documents current knowledge and actions about the risks of failures, for use in continuous improvement.

5.4.3 Complementary Techniques
Cost Risk Optimisation (CRO), RCM, HAZOP

5.4.4 Example Procedure Steps
The following table shows the steps used when conducting an FMEA and FMECA.

Step | Description | FMEA | FMECA
1 | Functions: identify and record what the asset does (not what it is), including required standards of performance. | ✓ | ✓
2 | Functional Failures: document the ways in which the asset can fail to fulfil its Functions. | ✓ | ✓
3 | Failure Modes: determine what causes each Functional Failure. | ✓ | ✓
4 | Failure Effects: describe what happens if nothing is done to predict or prevent each Failure Mode. | ✓ | ✓
5 | (Failure) Criticality: determine how much each Failure Mode matters. | | ✓
Table 5: FMEA / FMECA steps

35. The system has no output at all.
36. The system is operating, but below the required standard.
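Step 5 of Table 5 and the criticality paragraph above rank failure modes using an agreed criticality matrix. The SSG does not prescribe a particular matrix, so the Python sketch below assumes a hypothetical one: severity and likelihood are each scored from 1 to 5, their product is the criticality score (treating risk as consequence multiplied by frequency), and score bands map to classes. The band thresholds and example failure modes are invented; an organisation would substitute its own agreed matrix.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    asset: str
    mode: str
    severity: int    # 1 (negligible) .. 5 (catastrophic), hypothetical scale
    likelihood: int  # 1 (rare) .. 5 (frequent), hypothetical scale

    @property
    def score(self) -> int:
        # Criticality as consequence x frequency of occurrence.
        return self.severity * self.likelihood


def criticality_class(score: int) -> str:
    """Map a 1-25 score onto hypothetical matrix bands."""
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"


def prioritise(modes: list[FailureMode]) -> list[tuple[str, str, int, str]]:
    """Highest-criticality failure modes first, so actions start with the top priorities."""
    ranked = sorted(modes, key=lambda m: m.score, reverse=True)
    return [(m.asset, m.mode, m.score, criticality_class(m.score)) for m in ranked]


# Hypothetical failure modes for illustration only.
modes = [
    FailureMode("pump_a", "seal leak", severity=3, likelihood=4),
    FailureMode("pump_a", "bearing seizure", severity=5, likelihood=3),
    FailureMode("valve_7", "fails to close", severity=4, likelihood=1),
]

for row in prioritise(modes):
    print(row)
```

Multiplying ordinal scores is one common convention; some organisations instead read the class directly from the cell of the matrix, which avoids treating, say, severity 5 with likelihood 1 as equivalent to severity 1 with likelihood 5.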


5.4.5 Prerequisites 5.4.7 Objectives


Situation • Identification of both local and far-reaching
• When there is an intended design change effects of instigating change (Change
involving replacement or modification to a system Management).
component (i.e. material change to improve • Identification of risk consequent on instigating
quality). change, and generation of ideas for reducing that
• When there is an intended change to the way in risk.
which a process is to be controlled (i.e. • Identification of opportunities for process
replacement of semi-automatic by fully-automatic improvement.
Purchased by Anita Pharmatrisanti - anitapharma02@gmail.com - For Personal Use Only - Not For Distribution

mechanism).
• When there is an intended change to the output 5.4.8 Derived Benefits
required from a system (e.g. increase in flow or pressure).
• When there is a need to analyse the consequence(s) of a failure event in an existing system.

Data
• System design documents – to include system boundary definition (e.g. Piping and Instrumentation (P&I) diagram).
• Historical failure rate data for system components (e.g. In Service Data, NPRD 2011).[37]
• Operating statements/user operating requirements.

Experience
• Knowledge of operation and maintenance of the process under consideration.
• Ability to interpret P&I diagrams.
• Knowledge of the impact of system performance change outside the process under consideration (e.g. on other processes, on customers, on environments).
• Familiarity with FMEA/FMECA methodology.

5.4.6 Typical Proprietary Tools
There is benefit in using an appropriate tool that allows interfacing with other Reliability Engineering disciplines, employs a "write once, read many times" approach and potentially links to an appropriate Engineering Breakdown Structure (or Bill of Materials or similar). No proprietary tool is necessary for a simple FMECA, however, as the process can be carried out using a spreadsheet, either produced in-house or from a readily available free template on the Internet. In this situation, care should be taken that the design of the spreadsheet is appropriate for the analysis in question and tailored to suit the particular project.

5.4.8 Derived Benefits (continued)
• Captures collective knowledge of team.
• Provides more visibility of reliability performance problems.
• Increases repeatability and reproducibility across the system.
• Provides documented evidence of cause and effect and proposed solutions.
• Identifies the critical to quality characteristics of a system.
• Provides a record of the analysis of the logic and basic causes leading to a top event.
• Helps solve problems at their root rather than just fixing the obvious.
• Generates recommendations for reducing risk.
• Focuses upon key areas on which to concentrate quality control, inspection and manufacturing process controls.
• Provides early identification of design deficiencies.
• Evaluates the probability (or rate of occurrence) of anomalous operating conditions in preparation for criticality analysis.
• Decomposes large and complex systems into smaller, more manageable parts.

5.4.9 References
10 steps to creating an FMEA
http://blog.gembaacademy.com/2007/06/28/10-steps-to-creating-a-fmea
British Standard BS EN 60812:2006
NOTE: Transport for London (Underground, Buses, Overground, Trams etc.) requires suppliers to deliver to this standard, as do many others.
IEC 60812
Analysis techniques for system reliability – Procedure for failure mode and effects analysis (FMEA)
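As a rough illustration of the spreadsheet approach described in 5.4.6, the sketch below holds FMECA worksheet rows in Python and ranks them. It assumes the widely used risk priority number (RPN), the product of severity, occurrence and detection ratings on 1 to 10 scales, as the criticality measure. The class name, column set, items, failure modes and ratings are all hypothetical; an organisation's own rating scales and worksheet design should be used in practice.

```python
from dataclasses import dataclass


@dataclass
class FmecaRow:
    """One row of a simple FMECA worksheet (hypothetical column set)."""
    item: str
    failure_mode: str
    effect: str
    severity: int    # assumed scale: 1 (negligible) to 10 (hazardous)
    occurrence: int  # assumed scale: 1 (remote) to 10 (very frequent)
    detection: int   # assumed scale: 1 (almost certain to detect) to 10 (undetectable)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-10 scales, as a spreadsheet
        # data-validation rule might.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical worksheet entries for illustration only.
worksheet = [
    FmecaRow("Pump", "Seal leak", "Loss of containment", 7, 4, 3),
    FmecaRow("Pump", "Bearing seizure", "Loss of flow", 6, 3, 5),
    FmecaRow("Valve", "Fails to open", "No flow to process", 8, 2, 6),
]

# Rank failure modes so that the highest RPN is reviewed first.
ranked = sorted(worksheet, key=lambda row: row.rpn, reverse=True)
for row in ranked:
    print(f"{row.rpn:>4}  {row.item}: {row.failure_mode} -> {row.effect}")
```

With these illustrative ratings the valve failing to open (RPN 96) is ranked above the bearing seizure (90) and the seal leak (84).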

37. Nonelectronic Parts Reliability Data (NPRD) provides failure rate data for a wide variety of component types including mechanical, electromechanical, and electronic
assemblies. It provides summary and detailed data sorted by part type, quality level, environment and data source.


5.5 FTA - Fault Tree Analysis

5.5.1 What is it?
Fault Tree Analysis (FTA) is another technique for reliability and safety analysis. Bell Telephone Laboratories developed the concept in 1962 for the US Air Force for use with the Minuteman system. It was later adopted and extensively applied by the Boeing Company. FTA is a highly-structured, ‘top down’ deductive analysis tool which enables defined potential events to be evaluated by working backward (top down) to their potential causes and predicting the probability of each contributing cause occurring. It is recommended that this is carried out by a trained FTA practitioner if it is to be effective.

5.5.2 When is it used?
The deductive analysis begins with a general conclusion, then attempts to determine the specific causes of that conclusion by constructing a logic diagram called a fault tree (top-down approach). The main purpose of fault tree analysis is to help identify potential causes of system failures before the failures actually occur. It can also be used to evaluate the probability of the top event using analytical or statistical methods. These calculations use quantitative system reliability and maintainability information, such as failure probability, failure rate and repair rate.

FTA is a tool to investigate problems where you keep asking questions such as:
• What could cause that?
• What could this cause?

5.5.3 Complementary Techniques
RCA - FTA does not function well as an RCA method, because it does not work well when human actions are inserted as a cause; however, it is often used to support an RCA, as is Event Tree Analysis (ETA).

5.5.4 Example Procedure Steps
To perform an FTA, follow these steps:
1) Define the fault condition and write down the top level failure.
2) Using technical information and professional judgement, determine the possible reasons for the failure to occur. Remember, these are level two elements because they fall just below the top level failure in the tree.
3) Continue to break down each element with additional gates to lower levels. Consider the relationships between the elements to help you decide whether to use an “and” or an “or” logic gate.
4) Finalise and review the complete diagram. The chain can only be terminated in a basic fault: human, hardware or software.
5) If possible, evaluate the probability of occurrence for each of the lowest level elements and calculate the statistical probabilities from the bottom up.

5.5.5 Prerequisites
Situation
Used during the design stage of a system to identify weaknesses in that system and to identify the probability of failure. Includes consideration of human errors (e.g. design of the control system for mechanical handling equipment).

Data
• System design documents – to include system boundary definition (e.g. P&I diagram).
• Historical failure rate data for system components (e.g. In Service Data, NPRD 2011).
• List of potential failure modes (e.g. fail to start/stop, fail to open/shut).

Experience
• Knowledge of operation and maintenance of the process under consideration.
• Ability to interpret P&I diagrams.
• Knowledge of FTA methodology.

5.5.6 Typical Proprietary Tools
• Isograph ‘Fault Tree+’. Note that training by the software tool supplier should be preceded by an understanding of the terminology, diagrams and calculations involved in FTA.
• Reliasoft.
• Relex.

5.5.7 Objectives
• Identification of root causes of failure leading to a higher level event.
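Step 5 of the procedure in 5.5.4 (calculating probabilities from the bottom up) can be sketched in Python as follows. This is a minimal sketch that assumes all basic events are independent and that each appears only once in the tree; repeated events or common-cause failures need minimal cut set methods or a dedicated tool. An AND gate multiplies its input probabilities, and an OR gate takes one minus the product of the input non-occurrence probabilities. The class names, the tree and the probability values are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    """Lowest-level element of the tree (human, hardware or software fault)."""
    name: str
    probability: float  # probability of occurrence over the period of interest


@dataclass
class Gate:
    """Logic gate combining lower-level elements into a higher-level event."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List["Node"] = field(default_factory=list)


Node = Union[BasicEvent, Gate]


def probability(node: Node) -> float:
    """Evaluate the tree from the bottom up, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    input_probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        # The output occurs only if every input occurs.
        return prod(input_probs)
    if node.kind == "OR":
        # The output occurs if at least one input occurs.
        return 1.0 - prod(1.0 - p for p in input_probs)
    raise ValueError(f"Unknown gate type: {node.kind!r}")


# Hypothetical top event with illustrative probabilities.
top = Gate("Pump fails to deliver flow", "OR", [
    BasicEvent("Motor fails", 0.01),
    Gate("Loss of both power supplies", "AND", [
        BasicEvent("Supply A fails", 0.05),
        BasicEvent("Supply B fails", 0.04),
    ]),
])

print(f"Top event probability: {probability(top):.5f}")  # 0.01198
```

Here the AND gate gives 0.05 x 0.04 = 0.002, and the OR gate gives 1 - (0.99 x 0.998) = 0.01198.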


5.5.8 Derived Benefits
• Provides the timeline of events leading to the incident/event.
• Provides a record of the analysis of the logic and basic causes leading to a top event.
• Exposes all the different relationships that are necessary to result in a specific (top) event.
• Builds a framework for thorough qualitative and quantitative evaluation of the top event.

5.5.9 References
Overview
Practical Reliability Engineering, 5th Edition, Patrick D T O’Connor.
Basic Concepts
Fault Tree Analysis http://www.weibull.com/basics/fault-tree/index.html
In depth explanation and example
FTA Concepts & Applications http://www.hq.nasa.gov/office/codeq/risk/docs/ftacourse.pdf
BS EN 61025:2007
Fault tree analysis (FTA)

5.6 HAZOP - Hazard and Operability Study

5.6.1 What is it?
A HAZOP is a detailed method for the systematic examination of a well-defined process or operation, either planned or existing. The primary purpose of a HAZOP study is to identify and evaluate hazards within a planned process or operation. The hazards may be of several types, including those to people and property, both on and off-site. It is also important to consider the potential effects on the environment.

In addition, a HAZOP study is often used to identify significant operability or quality problems, and this is commonly included as a defined objective of a study. Operability problems arise through the reliability as well as the manner of the plant operation, with consequences such as downtime, damaged equipment and the expense of lost, spoilt or out-of-specification product leading to expensive re-run or disposal costs. The need to consider quality issues varies greatly with the details of the operation, but in some industries it is a crucial area.

A HAZOP study may also consider quality issues in the proposed design. It is advisable to cover aspects of maintenance operations, including isolation, preparation and removal for maintenance, since these often create hazards as well as operability problems. Where there are manual operations or activities, it may be necessary to analyse the ergonomics of the whole operation or activity in detail.

The HAZOP study method was developed by ICI in the 1960s, and its use and development were encouraged by the Chemical Industries Association. Since then, it has become the technique of choice for many of those involved in the design of new processes and operations. In addition to its power in identifying Safety, Health and Environmental (SHE) hazards, a HAZOP study can also be used to search for potential operating problems.

Whilst the most common use of a HAZOP study is during the design of a new facility, it is often applied to existing facilities and modifications. It has also been successfully applied to process documentation, pilot plant and hazardous laboratory operations, as well as to tasks such as commissioning and decommissioning, emergency operations and incident investigation.

A HAZOP study is a versatile technique, and good results may be achieved by several different approaches provided the basic principles are followed.

This is a highly-structured analysis tool requiring a multi-discipline team (see recommended reading for suggested team composition). It is essential that this process is led by a trained HAZOP practitioner if it is to be effective.

5.6.2 Limitations
Difficulties may be caused by inadequate terms of reference or poor definition of the study scope. A HAZOP study is not intended to become a design meeting. Nevertheless, some actions may result in changes to the design, and potential problems may be found within the intended range of operation.

The analysis of problems within a HAZOP study is normally qualitative although, on occasion, the


approximate quantitative analysis of a problem can help the team to decide on the need for action and the action itself. A HAZOP study may identify problems that need further quantitative analysis, including Quantitative Risk Assessment (QRA); this is done outside the HAZOP meeting.

A HAZOP study is not an infallible method of identifying every possible hazard or operability problem that could arise during actual operations.

Expertise and experience within the team are crucial to the quality and completeness of a study. The accuracy and extent of the information available to the team, the scope of the study and the manner in which it is conducted all influence its success. Only a systematic, creative and imaginative examination can yield a high-quality report but, even then, not every potential problem will necessarily be found. Additionally, the study will only be effective if the issues identified during the study are resolved and put into practice.

5.6.3 When is it used?
A HAZOP study is a structured analysis of a system, process or operation, carried out by a multi-disciplinary team. The team proceeds with a line-by-line or stage-by-stage examination of a firm design for the process or operation. Whilst being systematic and rigorous, the analysis also aims to be open and creative.

This is done by using a set of guidewords in combination with the system parameters to seek meaningful deviations from the design intention. A meaningful deviation is one that is physically possible, for example no flow, high pressure or reverse reaction. Deviations such as no temperature or reverse viscosity have no sensible physical meaning and are not considered. The team concentrates on those deviations that could lead to potential hazards to safety, health or the environment. It is important to distinguish between the terms hazard and risk, which are defined as follows: a ‘hazard’ is a physical situation with the potential for human injury, damage to property, damage to the environment or a combination of these. A ‘risk’ is the likelihood of a specified undesirable event occurring within a specified period or in specified circumstances.

In addition to the identification of hazards, it is common practice for the team to search for potential operating problems. These may concern security, human factors, quality, financial loss or design defects.

Where causes of a deviation are found, the team evaluates the consequences using experience and judgement. If the existing safeguards are deemed inadequate, the team recommends an action for change or calls for further investigation of the problem. The consequences and related actions may be risk-ranked. The analysis is recorded and presented as a written report, which is used in the implementation of the actions.

THE HAZOP PROCESS
Select equipment node → Choose deviation or parameters & guide words → Identify causes → Associate consequences → Apply risk ranking → Agree actions to be taken → Monitor actions for completion
Figure 12: The HAZOP process[38]

5.6.4 Complementary Techniques
ETA, FMEA/FMECA, RCM

5.6.5 Example Procedure Steps
At the outset of a HAZOP study, the team creates a conceptual model of the system or operation. This uses all available, relevant material, such as a firm, detailed design, an outline of operating procedures, material data sheets and the reports of earlier hazard studies. Hazards and potential operating problems are then sought by considering possible deviations from the design intention of the section or stage under review. The design intention should include a statement of the intended operating range (envelope), which is usually more limiting than the physical design conditions. For those deviations where the team can suggest a cause, the consequences are estimated using the team’s experience, and existing safeguards

38. http://www.isograph.com/software/hazop/


are taken into account. Where the team considers the risk to be non-trivial, or where an aspect requires more investigation, a formal record is generated to allow the problem to be followed up outside the meeting. The team then moves on with the analysis.

The validity of the analysis depends upon having the right people in the team, the accuracy of the information used and the quality of the design.

It is normally assumed that the design work has been done in a competent manner, so that operations within the design envelope are safe. Even where this is the case, the later stages of the project must also be carried out correctly; that is, engineering standards are followed and there are proper standards of construction, commissioning, operation, maintenance and management. A good HAZOP study tries to take account of these aspects and of the changes that can reasonably be expected during the lifetime of the operation. Such a study may well identify problems that are within the design limits, as well as problems which develop as the plant ages or which are caused by human error.

5.6.6 Prerequisites
Situation
A planned or existing process that may present risks to personnel or equipment, or which may give rise to problems that could prevent efficient operation (e.g. installation of a water treatment facility).

Data
• System design documents – to include system boundary definition (e.g. P&I diagram).
• History record of previous HAZOP studies carried out on the same system (including the design stage).
• Record of all changes to process, materials, procedures and environments implemented since commissioning/previous HAZOP study.
• Health and Safety Executive (HSE) history relative to the subject system.
• M&R records relative to the subject system.
• Relevant legislative requirements and current conformity status.

Experience
• Knowledge of operation and maintenance of the process under consideration.
• Ability to interpret P&I diagrams.
• Knowledge of HAZOP methodology.

5.6.7 Typical Proprietary Tools
Isograph ‘Hazop+’. Note that training by the software tool supplier should be preceded by an understanding of the terminology, diagrams and calculations involved in HAZOP analysis.

5.6.8 Objectives
Identification of deviations from the design intent, and analysis of the consequential potential risks.

5.6.9 Derived Benefits
• Provides documentation of cause and effect and proposed solutions.
• Provides a record of the analysis of the logic and basic causes leading to a top event.
• Captures collective knowledge of team.
• Identifies and addresses credible deviations from the design intent.
• Identifies the ‘critical to quality’ characteristics of a system.
• Solves problems at their root rather than just fixing the obvious.
• Focuses upon key areas on which to concentrate quality control, inspection and manufacturing process controls.
• Generates recommendations for reducing risk.
• Provides a better understanding of uncertainty and the impact of potential risks on key factors.
• Provides an early identification of design deficiencies.
• Facilitates decomposition of large and complex systems into smaller, more manageable parts.
• Documents all the different relationships that are necessary to result in a specific (top) event.
• Identifies end events that would otherwise not be foreseen.

5.6.10 References
Detailed methodology
Guide for the execution of HAZOP study http://www.red-bag.com/engineering-guides/247-bn-eg-ue105-guide-for-the-execution-of-hazop-study.html
Recommended viewing
RMP HAZOP study series http://www.youtube.com/watch?v=-rLiAKoJUDk
BS IEC 61882:2001
Hazard and operability studies (HAZOP studies). Application guide
IEC 61882
Hazard and operability studies (HAZOP studies). Application guide
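The guideword step described in 5.6.3 can be sketched as a simple enumeration: each guideword is combined with each parameter for a node, combinations with no sensible physical meaning are filtered out, and every remaining deviation becomes a blank worksheet row for the team to complete (causes, consequences, safeguards, risk rank, actions). The guideword and parameter lists, the node name and the row fields are hypothetical; only the excluded pairs ("no temperature", "reverse viscosity") come from the text. In a real study the team, not the code, judges which deviations are meaningful.

```python
from itertools import product
from typing import List

# Hypothetical guidewords and parameters for one node; a real study uses the
# team's agreed guideword list and the parameters relevant to the P&I diagram.
GUIDEWORDS = ["No", "More", "Less", "Reverse", "Other than"]
PARAMETERS = ["Flow", "Pressure", "Temperature", "Viscosity"]

# Combinations with no sensible physical meaning (the examples given in the
# text). In practice the team decides which deviations are meaningful.
NOT_MEANINGFUL = {("No", "Temperature"), ("Reverse", "Viscosity")}


def deviation_worksheet(node: str) -> List[dict]:
    """Return one blank worksheet row per meaningful guideword/parameter deviation."""
    rows = []
    for guideword, parameter in product(GUIDEWORDS, PARAMETERS):
        if (guideword, parameter) in NOT_MEANINGFUL:
            continue  # skip deviations that cannot physically occur
        rows.append({
            "node": node,
            "deviation": f"{guideword} {parameter.lower()}",
            # The remaining fields are filled in by the team during the meeting.
            "causes": [],
            "consequences": [],
            "safeguards": [],
            "risk_rank": None,
            "actions": [],
        })
    return rows


rows = deviation_worksheet("Node 1: pump discharge line")
print(len(rows))             # 18
print(rows[0]["deviation"])  # No flow
```

Generating the full guideword/parameter matrix up front is intended to make the line-by-line examination systematic, so that no combination is skipped by omission; the team then works through the rows one at a time.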
5.7 ETA - Event Tree Analysis

5.7.1 What is it?
ETA is a commonly applied technique for identifying the consequences that can occur following a potentially hazardous event. ETA is particularly suited to design stage and/or pre-commissioning Reliability Engineering studies. Using ETA, design and procedural weaknesses can be identified, and the probabilities of the various outcomes from events can be determined. The technique uses a forward logic approach for the assessment of reliability and safety, and is based on binary logic, in which an event has or has not happened or a component has or has not failed.

ETA's origins date back to 1968, when the UKAEA first introduced it in its design offices, initially in an attempt to use whole plant risk assessment to optimise the design of a 500MW Steam Generating Heavy Water Reactor. The methodology has not changed much since and, although it was first applied to risk management in the nuclear industry, it is now used by a wide variety of industries, including:
• Offshore Oil.
• Transportation.
• Gas Production.
• Shipbuilding.

5.7.2 When is it used?
ETA begins with an initiating event and works towards the final result, with the method providing information on how failure can occur and its probability of occurrence. It allows multiple failures to be analysed and highlights system weaknesses.

This technique may be applied to a system early in the design process to identify potential issues that may arise, rather than correcting the issues after they occur. Because of this forward logic process, ETA used as a tool in risk assessment can help to prevent negative outcomes from occurring by providing a risk assessor with the probability of occurrence.

Although ETA can be relatively simple, software can be used for more complex systems to build the diagram and perform the calculations, which reduces the scope for human error. In the nuclear industry, a package which is able to perform both ETA and FTA is widely used.

5.7.3 Complementary Techniques
ETA complements other techniques such as FTA: where ETA looks at the outcomes of an event, FTA looks at the causes of the event. HAZOP is also complementary.

5.7.4 Example Procedure Steps
Steps required to perform ETA:

Step | Description
1 | Define the system: define what needs to be involved or where to draw the boundaries.
2 | Identify the accident scenarios: perform a system assessment to find hazards or accident scenarios within the system design.
3 | Identify the initiating events: use a hazard analysis to define initiating events.
4 | Identify intermediate events: identify countermeasures associated with the specific scenario.
5 | Build the event tree diagram.
6 | Obtain event failure probabilities: if the failure probability cannot be obtained, use fault tree analysis to calculate it.
7 | Identify the outcome risk: calculate the overall probability of the event paths and determine the risk.
8 | Evaluate the outcome risk: evaluate the risk of each path and determine its acceptability.
9 | Recommend corrective action: if the outcome risk of a path is not acceptable, develop design changes that change the risk.
10 | Document the ETA: document the entire process on the event tree diagrams and update for new information as needed.
Table 6: Steps for performing ETA


As stated above, an event tree begins with an initiating event such as a component failure, an increase in temperature/pressure or the release of a hazardous substance. The consequences of the initiating event are then followed through a series of possible paths, with each path assigned a probability of occurrence, so that the probabilities of the various outcomes can be calculated.

Below is an example of where Event Tree Analysis could be used during the design phase of a fire protection system.

In the following example, fire protection is provided by a sprinkler system. Audible alarms are used to alert office staff, visitors and contractors to exit the building.

1) The smoke detectors should detect a rise in temperature if a fire occurs; if they have failed, they will not.
2) The smoke detectors identify a change in temperature and send a signal to the control panel. The control panel will either work or, if it has failed, it will not.
3) The control panel identifies the signal from the smoke detectors and sends a signal to the audible alarm system.
4) The audible alarm system should detect the signal from the control panel and activate the siren; if it has failed, it will not.
5) The audible alarm system identifies the signal from the control panel and activates the siren.
6) The control panel identifies the signal from the smoke detectors and sends a signal to the sprinkler system.
7) The sprinkler system should detect the signal from the control panel and activate the sprinklers; if it has failed, it will not.
8) The sprinkler system identifies the signal from the control panel and activates the sprinklers.

The example above can be shown as a flowchart:

[Initiating event: Fire starts. Branch questions, each answered Y or N: Fire detected? Fire alarm works? Sprinkler works? Resultant events include limited damage, extensive damage, people escape, wet escape and possible fatalities.]
Figure 13: Example ETA flowchart

By using this method, design factors such as the need for extra equipment can also be put in place before new assets are purchased. The flowchart shows the resultant event if any of the components of the system were to fail. Mathematical calculation can be applied to work out the likely chance of failure; for the scenario above, this is shown below:

Initiating event: Fire starts (frequency = 1/yr)
Scenario | Fire spreads quickly? | Sprinkler fails to work? | People cannot escape? | Resultant event
1 | Y (P = 0.1) | Y (P = 0.3) | Y (P = 0.5) | Multiple fatalities
2 | Y (P = 0.1) | Y (P = 0.3) | N (P = 0.5) | Loss / damage
3 | Y (P = 0.1) | N (P = 0.7) | | Fire controlled
4 | N (P = 0.9) | | | Fire contained
Figure 14: Mathematical calculation applied to ETA flowchart

The fire starts once per year:
1) There is a 10% chance the fire spreads quickly (a 90% chance it does not, and is therefore contained).
2) There is a 30% chance the sprinklers fail to work (a 70% chance they do work, and the fire is therefore controlled).
3) There is a 50% chance people cannot escape (a 50% chance they can escape, resulting in loss or damage).
4) If people cannot escape (50%), the likely result is multiple fatalities.
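The branch arithmetic for the fire example in Figure 14 can be reproduced with a small event tree in Python. Each branch point holds the probability of its "Y" branch, the "N" branch takes the complement, and the frequency of each end point is the initiating event frequency multiplied by the probabilities along its path. The `Branch` structure and `paths` function are illustrative assumptions, not a prescribed format; the probabilities are those given for the fire example.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass
class Branch:
    """A branch point in the event tree: a Y/N question about one event."""
    question: str
    p_yes: float  # probability of the "Y" branch; the "N" branch gets 1 - p_yes
    yes: "Node"
    no: "Node"


Node = Union[Branch, str]  # a plain string is a resultant event (end of a path)


def paths(node: Node, p: float = 1.0) -> Iterator[Tuple[str, float]]:
    """Yield (resultant event, path probability) for every end point, Y branches first."""
    if isinstance(node, str):
        yield node, p
        return
    yield from paths(node.yes, p * node.p_yes)
    yield from paths(node.no, p * (1.0 - node.p_yes))


# Fire example from Figure 14.
fire_tree = Branch("Fire spreads quickly?", 0.1,
    yes=Branch("Sprinkler fails to work?", 0.3,
        yes=Branch("People cannot escape?", 0.5,
                   yes="Multiple fatalities", no="Loss / damage"),
        no="Fire controlled"),
    no="Fire contained")

FREQUENCY_PER_YEAR = 1.0  # initiating event: fire starts once per year

results = [(outcome, FREQUENCY_PER_YEAR * p) for outcome, p in paths(fire_tree)]
for scenario, (outcome, freq) in enumerate(results, start=1):
    print(f"{scenario}) {outcome:<20} {freq:.3f} per year")
# The end-point frequencies should sum to the initiating event frequency.
print(f"Total: {sum(freq for _, freq in results):.3f}")
```

Because every path starts from the same initiating event and each Y/N pair of branches sums to 1, the four scenario frequencies (0.015, 0.015, 0.070 and 0.900 per year) add back up to the initiating frequency of 1 per year.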


At the end of each branch (scenario), the probabilities along the path are multiplied together (and by the initiating event frequency of 1 per year) as follows:
1) 0.1 x 0.3 x 0.5 = 0.015 per year.
2) 0.1 x 0.3 x 0.5 = 0.015 per year.
3) 0.1 x 0.7 = 0.07 per year.
4) 0.9 per year.

All the final probabilities should add up to 1 (100%): 0.015 + 0.015 + 0.07 + 0.9 = 1.

5.7.5 Prerequisites
Situation
One or more potential ‘top level’ failure events to analyse (e.g. engine will not start), which could be due to one or more co-existing faults that need identifying and analysing for root cause.

Data
• System design documents – to include system boundary definition (e.g. P&I diagram).
• Historical failure rate data for system components (e.g. In Service Data, NPRD 2011).
• Minimum subject system reliability standard (e.g. failure rate < 1 per 10,000 hours operation).

Experience
• Ability to interpret P&I diagrams.
• Familiarity with the FTA process, terminology, diagrams and calculations.

5.7.6 Typical Proprietary Tools
Isograph ‘Reliability Workbench’. Note that training by the software tool supplier should be preceded by an understanding of the terminology, diagrams and calculations involved in FTA.

RiskSpectrum PSA – used for both ETA and FTA in the nuclear industry.

5.7.7 Objectives
• Calculation of predicted reliability of the subject system.
• Identification of critical components in the subject system which contribute to vulnerability.
• Identification of focus points for maintenance or design changes which will reduce the risk of system failure.
• Recognition of consequences and risks of any component failure.

5.7.8 Derived Benefits
• Provides a timeline of events leading to an incident/event.
• Accounts for timing, dependence and domino effects among various accident contributors.
• Generates recommendations for reducing risk.
• Performs analysis simultaneously in the failure or success domain.
• Facilitates decomposition of large and complex systems into smaller, more manageable parts.
• Identifies end events that would otherwise not be foreseen.
• Facilitates an increase in system dependability by using more appropriate activities that take into consideration the operating context.

5.7.9 References
Overview
Practical Reliability Engineering, 5th Edition, Patrick D T O’Connor
In depth explanation
Fault Tree Handbook NUREG-0492, US Nuclear Regulatory Commission (1981).
IEC 62502
Analysis techniques for dependability. Event tree analysis (ETA)

5.8 RCM - Reliability Centred Maintenance

5.8.1 What is it?
RCM has been defined and refined over the years since 1977. In the context of asset management, RCM can be defined as: a process to define the asset management interventions required to ensure that assets continue to meet the user requirements in their current operating context.

The table below captures the variation in RCM definitions from 1997 to 2014.

RCM is a methodology that can be used to identify the strategies which can be implemented to manage the failure modes causing the functional failure of any physical asset in a given operating context. The RCM process involves asking the following seven basic questions, which are answered in the sequence shown below:


1) What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
2) In what ways can it fail to fulfil its functions (functional failures)?
3) What causes each functional failure (failure modes)?
4) What happens when each failure occurs (failure effects)?
5) In what way does each failure matter (failure consequences)?
6) What should be done to predict or prevent each failure (proactive tasks and task intervals)?
7) What should be done if a suitable proactive task cannot be found (default actions)?

Definition of RCM
Source | Description
John Moubray RCM definition: 1991/1997 | A process used to determine what must be done to ensure that any physical asset continues to do what its users want it to do in its present operating context.
F. Stanley Nowlan and Howard F. Heap: Dec 1978 | The term Reliability-Centered Maintenance refers to a scheduled maintenance program designed to realise the inherent reliability capabilities of equipment.
SAE JA1011 August 1998 and SAE JA1012 Jan 2002 | RCM is a specific process used to identify the policies which must be implemented to manage the failure modes which could cause the functional failure of any physical asset in a given operating context.
MIL-HDBK-338B-1998, page 58 | A disciplined logic or methodology used to identify preventive and corrective maintenance tasks to realise the inherent reliability of equipment at a minimum expenditure of resources.
Dependability management: BS IEC 60300-3-10:2001, page 27 | RCM is a method established for achieving an initial preventive maintenance program, which is intended to ensure that inherent reliability and safety levels for equipment and structures are achieved and maintained.
Dependability management: IEC 60300-3-11:2009, Application of Reliability Centred Maintenance | RCM is a method to identify and select failure management policies to efficiently and effectively achieve the required levels of safety, availability and economy of operation. Failure management policies can include maintenance activities, operational changes, design modification, or other actions in order to mitigate the consequence of failures.
Defence Standard 00-45: Reliability Centred Maintenance to Manage Engineering Failures | The Defence Standard that provides the requirements for deriving maintenance programmes within the Ministry of Defence.
S4000P: International Specification For Developing and Continuously Improving Preventive Maintenance | Development of preventive maintenance task requirements and a methodology to continuously improve preventive maintenance during a product (asset) in-service phase, therefore being applicable during the whole product (asset) lifecycle.

RCM analyses require a multi-discipline team or multi-stakeholder engagement, and are ideally led by a facilitator with current experience in the methodologies (and proprietary tools) to be used.

5.8.2 When is it used?
Typically, RCM implementations focus on risk reduction/mitigation while seeking to realise the inherent reliability that has been designed into the asset. The quality of an RCM analysis depends on the knowledge of the various stakeholders (e.g. maintainer, operator, Original Equipment Manufacturer (OEM)) and on the confidence in, and quality of, historic data, coupled with the data quality of enterprise systems. These play a significant part and provide the foundation for making auditable and defensible asset


management decisions. In addition to the failure management strategy, organisations use the RCM analysis for the following:
• To understand fully the risk levels at which organisations operate their asset systems.
• To define the critical capital and operating maintenance requirements of all major operational assets.
• To optimise operational and maintenance procedures.
• To enhance asset knowledge, including the mechanics, criticality and frequency of failure.
• To identify opportunities for operational and capital expenditure savings by delivering justifiable operational and maintenance regimes.
• To aid the design of future capital works and the procurement of plant and equipment by understanding the full implications of the operation of individual assets and sites.
• To verify and validate the operational and maintenance risk and operating regimes.
• To redefine the inspection strategy by asset group/class.
• To identify and support the use of Condition Monitoring.
• To identify and embed Reliability in the design.
• To define Asset level Inspection strategy.
• To develop basic and advanced analytics for Asset Systems and Assets.
• To develop the process for applying the RCM analysis to a wider population of similar Asset groups with a similar operating context but a varying level of risk.

5.8.3 Complementary Techniques
Activities that support the RCM process are:
• Backfit RCM.
• In Service Maintenance Optimisation Review.

Whilst not necessarily considered a complementary technique, the development of FMEA/FMECA within the RCM process precedes the RCM algorithm / decision process. RCM also complements CRO.

5.8.4 Example Procedure Steps
RCM analysis is typically facilitated / conducted by experienced practitioners who can either undertake a series of structured analysis sessions with multidisciplinary teams for an asset system or an asset, or develop the analysis through individual engagement with the relevant stakeholders. The choice between these forms of interaction can depend on the pressures on an organisation's resources and its ability to support the analysis activity in a full-time capacity, with efficiencies to be gained through individual engagement. The asset to be analysed can also be chosen according to the requirements and strategy of the organisation, e.g. complete systems, sub-systems or lower-level items such as a motor or pump set that is the subject of reliability issues. Whichever approach is taken, the analysis follows the RCM process of defining function, functional failure, failure modes, effects, criticality, consequence analysis and task analysis. The period and duration of the RCM analysis may vary depending on the complexity of the asset systems, the operating context, and the capability and competency of the stakeholders. The selection of intervention actions is built on decision logic defined in the standards, which at a high level takes into account:
• Will the functional failure become apparent to the operator under normal circumstances if the Failure Mode occurs on its own?

The answer to this question tells you whether the failure is hidden or evident and takes you down the appropriate side of the algorithm, with further decision logic questions; for example, if the answer to the above question is "no":
• Does the functional failure cause loss or secondary damage that could have an adverse effect on operating safety or lead to a serious environmental impact?
• Does the hidden functional failure, in combination with a second failure/event, cause loss or secondary damage that could have an effect on operating safety or lead to a serious environmental impact?
• Is there a design change which will make a hidden functional failure apparent to the crew or operator?

Based on the decision logic, a range of options can be selected, including:
• On Condition Tasks.
• Hard Time Tasks (also known separately as Scheduled Restoration and Scheduled Replacement Tasks).
• Failure Finding Tasks.
• Redesign (Mandatory or Desirable).
• No Scheduled Maintenance.
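Whichever form of engagement is used, each analysis answers the seven RCM questions for every function in turn, which gives the worksheet a nested shape: function, functional failure, failure mode (with its effect and consequence) and the selected task. The following is a minimal Python sketch of that hierarchy, assuming a simple in-memory model; the class names, fields, the `open_items` helper and the hoist brake example are hypothetical illustrations and are not taken from SAE JA1012, IEC 60300-3-11 or any RCM software tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class TaskType(Enum):
    """Failure management options produced by the RCM decision logic."""
    ON_CONDITION = "On Condition Task"
    HARD_TIME = "Hard Time Task"            # scheduled restoration / replacement
    FAILURE_FINDING = "Failure Finding Task"
    REDESIGN = "Redesign"
    NO_SCHEDULED_MAINTENANCE = "No Scheduled Maintenance"


@dataclass
class FailureMode:
    cause: str                              # Q3: what causes the functional failure
    effect: str                             # Q4: what happens when it occurs
    consequence: str                        # Q5: in what way the failure matters
    task: Optional[TaskType] = None         # Q6/Q7: proactive task or default action
    interval: Optional[str] = None          # task interval, if one applies


@dataclass
class FunctionalFailure:
    description: str                        # Q2: how the function can fail
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    description: str                        # Q1: function and performance standard
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def open_items(functions: List[Function]) -> List[FailureMode]:
    """Hypothetical helper: failure modes that still have no task decision."""
    return [
        fm
        for f in functions
        for ff in f.functional_failures
        for fm in ff.failure_modes
        if fm.task is None
    ]


# Hypothetical example: a crane hoist brake.
hold = FunctionalFailure("Unable to hold the rated load when stopped")
hold.failure_modes.append(
    FailureMode("Brake lining worn", "Load creeps when hoist stops",
                "Evident safety", TaskType.ON_CONDITION)
)
hold.failure_modes.append(
    FailureMode("Brake spring fatigued", "Brake fails to engage", "Evident safety")
)
hoist = Function("Hold and lower loads up to rated capacity", [hold])

pending = open_items([hoist])
print([fm.cause for fm in pending])  # ['Brake spring fatigued']
```

Listing failure modes without a task decision is one simple way to keep the analysis auditable while it is in progress; a real worksheet would also hold criticality and the rationale behind each task interval.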


[Figure 15: An RCM Decision Algorithm (flowchart). Questions 1 to 7 classify each failure mode as evident or hidden and by consequence (safety and environmental, operational, non-operational); questions 8 to 23 then test, in turn, a scheduled on-condition task, a scheduled restoration or discard task and, where applicable, a combination of tasks or a scheduled failure-finding task, with redesign or no scheduled maintenance as the default action. The effectiveness criteria for each consequence category are:]
• Evident safety and environmental consequences: the failure management policy must reduce the risk of the failure to a tolerable level.
• Evident operational consequences: over a period of time, the failure management policy must cost less than the cost of the operational consequences plus repair costs.
• Evident non-operational consequences: over a period of time, the failure management policy must cost less than the cost of repairing the failure.
• Hidden safety and environmental consequences: the failure management policy must reduce the risk of the multiple failure to a tolerable level.
• Hidden operational consequences: over a period of time, the failure management policy must reduce the probability of a multiple failure (and associated total costs) to an acceptable minimum.
• Hidden non-operational consequences: over a period of time, the failure management policy must reduce the probability of a multiple failure (and associated total costs) to an acceptable minimum.

Figure 15: RCM Failure Effect Characterisation³⁹
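The branching in Figure 15 can be expressed compactly in code. The Python sketch below assumes that each question has already been answered by the analysis team: whether a task is "technically feasible and worth doing" is a judgement made against the effectiveness criteria shown in Figure 15, not something the code decides. The function names and outcome strings are illustrative assumptions. Because questions 2 and 3 (and 5 and 6) lead to the same branch, each pair is combined into one test.

```python
from enum import Enum


class Consequence(Enum):
    EVIDENT_SAFETY_ENV = "evident safety and environmental"
    EVIDENT_OPERATIONAL = "evident operational"
    EVIDENT_NON_OPERATIONAL = "evident non-operational"
    HIDDEN_SAFETY_ENV = "hidden safety and environmental"
    HIDDEN_OPERATIONAL = "hidden operational"
    HIDDEN_NON_OPERATIONAL = "hidden non-operational"


NO_PM = "No scheduled maintenance (redesign may be desirable)"


def classify(evident, injure_or_kill, breach_environmental, affects_operations):
    """Questions 1 to 7. For hidden failures the last three answers
    refer to the multiple failure, not to the failure mode on its own."""
    if injure_or_kill or breach_environmental:  # questions 2-3 or 5-6
        return Consequence.EVIDENT_SAFETY_ENV if evident else Consequence.HIDDEN_SAFETY_ENV
    if affects_operations:  # question 4 or 7
        return Consequence.EVIDENT_OPERATIONAL if evident else Consequence.HIDDEN_OPERATIONAL
    return Consequence.EVIDENT_NON_OPERATIONAL if evident else Consequence.HIDDEN_NON_OPERATIONAL


def select_policy(consequence, on_condition, restoration_or_discard,
                  combination=False, failure_finding=False):
    """Questions 8 to 23. Each flag is True if the team judged that task
    type technically feasible and worth doing for this failure mode."""
    if on_condition:
        return "Scheduled on-condition task"
    if restoration_or_discard:
        return "Scheduled discard or restoration task"
    if consequence is Consequence.EVIDENT_SAFETY_ENV:
        # Question 10: fall back to a combination of tasks, else redesign.
        return "Combination of tasks" if combination else "Redesign compulsory"
    if consequence in (Consequence.EVIDENT_OPERATIONAL, Consequence.EVIDENT_NON_OPERATIONAL):
        return NO_PM
    # Hidden failures: questions 17, 20 and 23 ask about failure-finding.
    if failure_finding:
        return "Failure-finding task"
    if consequence is Consequence.HIDDEN_SAFETY_ENV:
        return "Redesign compulsory"
    return NO_PM


# Hypothetical failure mode: a hidden failure whose multiple failure
# affects operational capability but not safety or the environment.
c = classify(evident=False, injure_or_kill=False,
             breach_environmental=False, affects_operations=True)
print(c.value, "->", select_policy(c, on_condition=False,
                                   restoration_or_discard=False,
                                   failure_finding=True))
# hidden operational -> Failure-finding task
```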

Note: Depending on the selection of an applicable and effective task, and on the information that supported the periodicity of this task (e.g. engineering judgement), Age Exploration can be used to confirm that the periodicity is the most suitable and/or to adjust it based on feedback.

The final outcome of RCM analysis, post-validation, is a defensible, auditable and well-defined failure management strategy ready for implementation. Many organisations successfully implement the analysis and realise the benefits that this approach can bring, going on to review the analysis on a regular, periodic basis and to test the applicability and effectiveness of the derived tasks through activities such as Backfit RCM and In Service Maintenance Optimisation reviews.

Note that some organisations have developed their own decision logic based on RCM, such as the example from Network Rail shown in Figure 16, which is focused on hidden failures.

5.8.5 Prerequisites

Situation
These methodologies may be applied to determine an optimum maintenance regime for any physical asset which may fail. Historically they have been seen as (and can be) resource-intensive, and therefore most suitable for application to critical equipment (e.g. crane hoist equipment, selected for HSE, legal and/or process criticality); however, differing approaches to the application of these methodologies can see them applied efficiently, mindful of a business's needs and the pressures on its resources. Ideally the analysis is carried out in the early phases of a project lifecycle; the derived optimum maintenance schedule will then be subject to ongoing in-service review to identify new failure modes and to test the applicability and effectiveness of current maintenance tasks. The methodologies can also be applied as a review of legacy equipment with OEM-based maintenance, again to identify an optimum maintenance regime.

39. SAE JA1012 “A Guide to the Reliability-Centered Maintenance (RCM) Standard,” January 2002


[Figure 16 (flowchart): Network Rail decision logic for hidden failures. If the failure is evident, the Evident Failure diagram is used. Otherwise, H1 asks: does the failure mode, in combination with a further failure, cause a loss of function or secondary damage which could have a direct effect on safety or the environment? If yes, the Safety and Environmental branch (HS1 to HS4) applies; if no, H2 asks: does the failure mode, in combination with a further failure, have a direct effect on the operational capability of the asset? If yes, the Operational branch (HO1 to HO4) applies; if no, the Non-Operational branch (HN1 to HN4) applies. Each branch asks in turn whether an on-condition task (to detect whether a failure is about to occur), a scheduled restoration task, a scheduled discard task and a scheduled failure-finding task is effective. In the Safety and Environmental branch each task must also reduce the potential multiple failure rate to a tolerable level; in the other branches the task must be worth doing. The first suitable task is done at less than the P-F Interval (on-condition), at less than the Life (restoration or discard) or at the failure-finding interval (failure-finding). If no task is suitable, redesign is mandatory in the Safety and Environmental branch; in the Operational and Non-Operational branches no scheduled maintenance is performed and redesign may be desirable.]

Figure 16: Decision logic for functional failures not evident to Operations Personnel during normal duties⁴⁰
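Figure 16 specifies when a task is done (at less than the P-F Interval for on-condition tasks, and at the failure-finding interval for failure-finding tasks) but not how those intervals are obtained. The Python sketch below uses two conventions that are common in the RCM literature but are not stated in this guide, so treat them as assumptions: an on-condition interval set at a fraction of the P-F interval (half is a widely used rule of thumb, kept here as an adjustable parameter), and the approximation FFI ≈ 2 × U × MTBF for a single hidden protective device with a constant failure rate, where U is the tolerable unavailability. The approximation follows because the mean unavailability of such a device tested every T is close to T / (2 × MTBF) when T is small compared with the MTBF; the exact expression is included as a check. Function names and example figures are hypothetical.

```python
import math


def on_condition_interval(pf_interval, fraction=0.5):
    """On-condition task interval: must be less than the P-F interval.
    `fraction` is an assumed planning factor, not a value from the guide."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return fraction * pf_interval


def failure_finding_interval(mtbf, tolerable_unavailability):
    """Approximate failure-finding interval for a single hidden protective
    device with a constant failure rate: FFI ~= 2 * U * MTBF."""
    return 2.0 * tolerable_unavailability * mtbf


def mean_unavailability(interval, mtbf):
    """Exact mean unavailability of that device when it is tested, and
    restored if found failed, every `interval` (same units as `mtbf`)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    x = interval / mtbf
    return 1.0 - (1.0 - math.exp(-x)) / x


# Hypothetical figures: P-F interval of 8 weeks; protective device
# MTBF of 50 years with a tolerable unavailability of 1%.
print(on_condition_interval(8.0))                    # 4.0
ffi = failure_finding_interval(50.0, 0.01)
print(f"{ffi:.2f}")                                  # 1.00
print(f"{mean_unavailability(ffi, 50.0):.4f}")       # 0.0099
```

The exact check shows the approximation is slightly conservative here (about 0.99% rather than 1%); it becomes less accurate as the interval approaches the MTBF.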

Data
• Manufacturer's documentation, including drawings, parts list and recommended maintenance regime.
• Operating Statements, Standardised Operating Checks.
• Component replacement criteria (e.g. brake lining residual thickness).
• Historical failure rate data for system components (e.g. In Service Data, NPRD 2011 etc.).
• Functional requirements (e.g. hoist rate/capacity).
• Statutory requirements (e.g. regular testing).
• FMEA/FMECA study findings (if available).
• An RCM Management Plan detailing appropriate standards, tools and stakeholders relevant to the analysis.

Experience
• Familiarity with RCM process, terminology, diagrams, calculations.
• Knowledge of domain and required asset functionality and reliability.

5.8.6 Typical Proprietary Tools
Mutual Consultants Ltd ‘RCM Desktop’. Note that Mutual Consultants are licensees of Aladon LLC, and that the current version of RCM Desktop is based on the RCM 2 methodology. Note also that training by the software tool supplier should be preceded by an understanding of the terminology, diagrams and calculations involved in RCM analysis.

Whatever toolset is selected, many tools amount to little more than a means of recording information and decisions while providing appropriate outputs and an audit trail. Successful analysis depends on the methodology, not the tool, and analysis can be recorded successfully (in the worst case) using an Excel spreadsheet or even a Word document.

5.8.7 Objectives
• Logic-backed, safe and defensible maintenance regime.

40. Source: Network Rail


5.8.8 Derived Benefits
• Supports operations and maintenance cooperation.
• Retains and shares asset and system knowledge.
• Supports reduction of overall costs through a more efficient planned maintenance effort.
• Provides a fully documented audit trail for verification.
• Acts as a management tool for maintenance managers which enhances control and direction of maintenance programmes.
• Delivers a clear rationale to the maintenance organisation of its objectives and purpose and the reasons for which it is performing the scheduled maintenance tasks.
• Provides vital feedback for design / modification / enhancement / capital planning activity.

5.8.9 References
In depth explanation
RCM (3) by John Moubray/The Aladon Network/J.R.Paul Lanthier, due August 2014.
IEC 60300-3-11 Dependability management - Part 3-11: Application guide - Reliability Centred Maintenance
ASD 4000P International specification for developing and continuously improving preventive maintenance

5.9 CRO - Cost Risk Optimisation

5.9.1 What is it?
The goal of CRO, like any optimisation problem, is to find the values of the controllable factors determining the behaviour of a system that maximise desirable outcomes. Different techniques can be used to achieve the desired results, but the two principal methods are probabilistic and Bayesian. While Bayesian methods are also probabilistic, they follow more of a rules-based approach as opposed to a simulation approach. From the input data, a model is formed which predicts how the asset base will perform into the future, in terms of both serviceability and cost. CRO analyses require a multi-disciplinary team and are essentially led by a facilitator with current experience in the methodologies (and proprietary tools) to be used.

5.9.2 When is it used?
CRO is often used for long-term investment planning activities, to model the impacts of multiyear factors and look for optimal solutions as to when to repair, refurbish or replace assets while maintaining an acceptable risk profile and balancing multiple constraints. These techniques can be focused on organisational processes (Bayesian approach) or on specific assets, or groups of similar assets modelled together based on specific shared characteristics or individual asset characteristics.

[Figure 17 (chart): business impact (£k/yr) plotted against maintenance or life cycle interval (months/years), showing preventative action costs, risks of failure and their total impact. The optimum lies at the minimum of the total impact curve; the chart also marks a 'balance point' that is not optimal.]

Figure 17: Balancing risk and cost to find the optimum solution⁴¹

5.9.3 Complementary Techniques
RCM, FMEA/FMECA, RCA

5.9.4 Example Procedure Steps
One of the earliest tasks in the creation of a CRO model is the determination of its Functional Specification. This describes the purpose of the model and contains high-level specifications which should address the following questions:
• What assets are to be modelled?
• What are the Key Performance Indicators for the system, and how will they be modelled?
• What are the likely data requirements, and how will issues with the data be handled?
• What costs should be calculated in the model?
• What are the possible intervention methods, and (in general) their effects?
• What scenarios might be run?

Intervention is a term used in CRO to represent proactive modification of assets or their behaviour to manage costs and/or performance. Depending on the performance of the assets, varying operational costs will be incurred. The need for reduction of

41. Asset Management – An Anatomy V3, IAM, July 2014


costs leads to the suggestion of interventions (which will have their own intrinsic, direct costs), which modify the assets (or their performance) in some way. This modifies the outcomes of the performance predictions, and thus the operational costs incurred and the need for further intervention. The loop continues until the optimum balance is determined between operational and capital expenditure, subject to risk and performance constraints.

5.9.5 Prerequisites

Situation
This tool is ideally suited to situations where there are conflicting alternative options (e.g. shutdown planning on continuous process plants when repair/renew decisions are involved). The methodology focuses on the financial impacts of the competing options. Although normally used for major decisions (e.g. repair/replace furnace flare stack), the methodology can also be applied to minor stand-alone assets (e.g. repair/replace pump).

Data
• Historical failure rate data for system components (e.g. In Service Data, NPRD 2011 etc.).
• Cost and budget data.
• Functional requirements (e.g. capacity/emission limits).
• Technical and cost data for alternatives being considered.
• FMEA/FMECA study findings (if available).

Experience
• Previous experience with CRO process (e.g. SALVO), terminology, diagrams, calculations.
• Knowledge of required asset functionality and reliability.

5.9.6 Typical Proprietary Tools
• Decision Support Tools (DSTL) APT software suite.
• WiLCO from SEAMS.

5.9.7 Objectives
• Better understanding of costs and risks involved with selected option.
• Optimum analysis of selected options.

5.9.8 Derived Benefits
• Screens out potentially costly system modifications (retrofits).
• Generates recommendations for work and investment based on level of desired risk.
• Provides a better understanding of uncertainty and impact of potential risks on key factors.
• Evaluates and compares multiple investment scenarios.

5.9.9 References
Overview
Cost/Risk Optimisation (John Woodhouse) http://www.plant-maintenance.com/articles/Costriskop.pdf
Case study
SASOL experiences in cost/risk optimization (SSF Pty / TWP Ltd) http://www.twpl.co.uk/_assets/client/case-study/SASOL%20cost%20risk%20optimization%20ERTC%202003v2.pdf

5.10 RCA - Root Cause Analysis

5.10.1 What is it?
RCA is an umbrella term for various methodologies used to identify the root cause of failures (as opposed to addressing the symptoms of the failure), with the objective of making recommendations to prevent recurrence of the failure.

RCA is a reactive method, as the analysis is completed after an event has occurred. There may be more than one root cause for a failure, and thus there may be several effective actions that address the root causes. RCA can be considered an iterative process and a tool of continuous improvement.

This tool is simple to use and rapidly identifies the originating (root) cause of a range of actual target events (e.g. flooding occurred, power failed). It is most often the methodology of choice when rapid analyses are required.

5.10.2 When is it used?
The purpose of RCA is to identify the factors influencing an asset prior to failure, in order to identify what behaviours, actions or conditions need to be changed to prevent recurrence of a similar failure, and to identify the lessons to be learned to promote better outcomes.


To be effective, RCA must be performed systematically, with the conclusions and root causes identified backed up by documented evidence. Usually a team effort is required. The RCA should also establish a sequence of events or timeline to understand the relationships between contributory (causal) factors, root cause(s) and the defined problem or event to be prevented in the future.

5.10.3 Complementary Techniques
Typical RCA tools in common use include:
• Cause and Effect / Fishbone (Ishikawa) Diagram.
• Pareto Chart.
• FTA.
• FMEA / FMECA.
• CRO.
• FRACAS/DRACAS.

Multiple RCAs can be managed through the use of FRACAS/DRACAS.

5.10.4 Example Procedure Steps
[Figure 18 (flowchart); steps shown: Define the problem; Gather data & evidence; Establish timeline; Why?; Identify corrective actions; Identify payback on corrective actions; Implement corrective actions; Monitor success of corrective actions.]

Figure 18: Example RCA procedure steps

Situation
This tool is simple to use and rapidly identifies the originating (root) cause of a range of actual target events (e.g. flooding occurred, power failed). It is most often the methodology of choice when rapid analyses are required.

Data
• Definition of problem (e.g. flooding has occurred in oil cellar).
• Timeline of event, including sequence of potentially contributing events (e.g. sump pump failed to start, maintenance carried out).
• Understanding of mitigating system (e.g. sump pump / float switch / power supply).
• Historical failure data for asset (e.g. FRACAS/DRACAS / Computerised Maintenance Management System (CMMS) data).

Experience
• Familiarity with RCA process.
• Knowledge of required asset functionality and associated systems.

5.10.5 Typical Proprietary Tools
No proprietary tools are required. Templates are available for download from the web to assist in documenting the available methodologies.

5.10.6 Objectives
• To identify root cause of actual events.
• To identify opportunities for increasing system reliability.

5.10.7 Derived Benefits
• Provides a timeline of events leading to incident/event.
• Captures collective knowledge of team.
• Increases visibility of reliability performance problems.
• Identifies and addresses credible deviations from the design intent.
• Increases repeatability and reproducibility across the system.
• Provides documented evidence of cause and effect and proposed solutions.
• Accounts for timing, dependence and domino effects among various accident contributors.
• Solves problems at their root rather than just fixing the obvious.
• Documents all the different relationships that are necessary to result in a specific (top) event.
• Builds a framework for thorough qualitative and quantitative evaluation of the top event.

5.10.8 References
Guidelines
Root Cause Analysis: Simplified Tools and Techniques (Bjørn Andersen)
Guidelines
The Root Cause Analysis Handbook (Max Ammerman)
IEC 62740 Root Cause Analysis (RCA)
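Of the RCA tools listed in 5.10.3, the Pareto chart is the most directly computational: events are grouped by cause, ranked by frequency, and the cumulative share shows which few causes account for most events. The following is a minimal Python sketch, assuming the causes have already been assigned to each recorded event (for example from FRACAS/DRACAS or CMMS data). The 80% threshold is the conventional Pareto cut-off, used here as an adjustable assumption; the `pareto` function name and the event counts in the example are hypothetical.

```python
from collections import Counter


def pareto(causes, threshold=0.8):
    """Rank causes by frequency. Returns rows of (cause, count, cumulative
    share) and the 'vital few' causes needed to reach `threshold`."""
    ranked = Counter(causes).most_common()  # highest count first
    total = sum(n for _, n in ranked)
    rows, vital_few, cumulative = [], [], 0
    for cause, n in ranked:
        if cumulative / total < threshold:
            vital_few.append(cause)  # threshold not yet reached before this cause
        cumulative += n
        rows.append((cause, n, cumulative / total))
    return rows, vital_few


# Hypothetical causes recorded against ten oil cellar flooding events.
events = (["float switch stuck"] * 6 + ["power supply lost"] * 3
          + ["sump pump motor failed"])
rows, vital_few = pareto(events)
for cause, n, share in rows:
    print(f"{cause:<24}{n:>3}{share:>7.0%}")
print(vital_few)  # ['float switch stuck', 'power supply lost']
```

The ranking only shows where to focus; each of the vital few causes would still be taken through the "Why?" step to reach its root cause.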


5.11 FRACAS/DRACAS - Failure/Data Reporting, Analysis and Corrective Action Systems

5.11.1 What is it?
The purpose of a FRACAS or DRACAS is to provide a defined, systematic closed loop process to address non-conformances that occur within an asset or system, develop and manage a plan of action to correct them and, if applicable, ensure that these actions have a positive effect on future performance.

A FRACAS/DRACAS can be seen as a process that enables individual RCA activities to be conducted and manages the data outputs from those activities, to allow systematic improvement of asset or system performance as applicable.

FRACAS vs DRACAS
The terms FRACAS and DRACAS are frequently confused, misused and interchanged, unfortunately too often to the detriment of equipment development. Whereas FRACAS specifically addresses failure, DRACAS embraces failure as well as other incidents and observations which, together with failures, may lead to an overall improvement in equipment capability if appropriately addressed.

• Incident: Incidents are not confined to those ‘known’ faults and failures which affect the ability of equipment to perform or be operated satisfactorily (this would be to pre-judge an incident as a fault/failure). Other events, such as observed deterioration, may also be reported as incidents, as well as actions such as modifications, scheduled maintenance and the repair/replacement of faulty items. Collecting such ‘data’ in a FRACAS/DRACAS provides important information on all occurrences and observations which arise during a reporting period and facilitates sentencing, classification, failure and trend analysis.
• Failure: An event that prevents an item from performing a required function to the required specification.
• Observation: Technical observations from hands-on personnel are used for keeping a record of problems which are not defects or failures in their own right. For example, an item may be superficially degraded but still fully capable of operating satisfactorily. These records are useful in anticipating failures from progressive, worsening conditions. The observation of a degradation condition may initiate a design review, even though failure has not arisen.
• Non-Conformance: Failure to conform to accepted standards.
Table 8: FRACAS/DRACAS terminology
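The terminology in Table 8 maps naturally onto a record type in a FRACAS/DRACAS database. The Python sketch below assumes a minimal in-memory log; the `Record` fields, the `in_scope` and `failure_trend` helpers and the example entries are all hypothetical. It encodes the FRACAS/DRACAS distinction drawn above in a simplified form (an assumption, not a rule from the guide): a FRACAS is treated as accepting only failure records, while a DRACAS also accepts incidents, observations and non-conformances. Failures can then be counted per item over a reporting period as one input to trend analysis.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum


class RecordType(Enum):
    """Record types from Table 8."""
    INCIDENT = "Incident"
    FAILURE = "Failure"
    OBSERVATION = "Observation"
    NON_CONFORMANCE = "Non-Conformance"


@dataclass(frozen=True)
class Record:
    item: str
    record_type: RecordType
    reported: date
    description: str


def in_scope(record, system):
    """Simplified scope rule (an assumption): a FRACAS accepts failures
    only; a DRACAS also accepts incidents, observations, non-conformances."""
    if system == "FRACAS":
        return record.record_type is RecordType.FAILURE
    if system == "DRACAS":
        return True
    raise ValueError(f"unknown system: {system}")


def failure_trend(records, start, end):
    """Count failure records per item reported between start and end."""
    return Counter(
        r.item for r in records
        if r.record_type is RecordType.FAILURE and start <= r.reported <= end
    )


# Hypothetical log entries.
log = [
    Record("hoist brake", RecordType.FAILURE, date(2016, 1, 12), "Failed to hold load"),
    Record("hoist brake", RecordType.OBSERVATION, date(2016, 2, 3), "Lining superficially worn"),
    Record("hoist brake", RecordType.FAILURE, date(2016, 4, 20), "Failed to hold load"),
    Record("gearbox", RecordType.FAILURE, date(2016, 8, 1), "Bearing seized"),
]
print(sum(in_scope(r, "FRACAS") for r in log))  # 3
print(sum(in_scope(r, "DRACAS") for r in log))  # 4
print(dict(failure_trend(log, date(2016, 1, 1), date(2016, 6, 30))))  # {'hoist brake': 2}
```

Keeping the observation record, even though the FRACAS rule excludes it, illustrates why a DRACAS can anticipate failures from worsening conditions before they arise.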


5.11.2 When is it used?
A FRACAS/DRACAS is by its nature a process-driven, closed loop reporting system. It requires procedures for reporting, data collection, failure reporting (which may come from disparate defect reporting systems), trend analysis and corrective action management, and can include the formation of an Incident Sentencing Committee (ISC), made up of the relevant stakeholders, which reviews and allocates sentencing codes that start to identify the root cause of the non-conformance. FRACAS/DRACAS procedures will guide how failures are reported, where information is stored, and which analysis methodologies to use, when and how; a FRACAS/DRACAS procedure is therefore specific to an individual organisation's needs. A FRACAS/DRACAS procedure should also define the individual responsibilities at all levels and describe the basic elements required. A typical FRACAS/DRACAS flow is shown in Figure 19.

[Figure 19 (closed-loop diagram); boxes shown: Failure occurrence; Data collection; Failure report; Failure maintenance trend analysis; Recommended corrective actions; FRACAS updating.]

Figure 19: Typical FRACAS flow

5.11.3 Complementary Techniques
RCA

5.11.4 Example Procedure Steps
Typical FRACAS/DRACAS stakeholders may consist of one or more of the following roles:
• Engineering Manager - is responsible for ensuring that program or project requirements and objectives regarding the failure reporting system are addressed. This could include ensuring that all the responsible parties fulfil their obligations as detailed in the procedure.
• Program/Project Manager - is responsible for ensuring the implementation of corrective actions at the program level as a result of the evaluation and identification of problems relating to Reliability issues.
• Reliability Engineer - is responsible for reviewing collected “Field Data” for quality and completeness. In addition, the engineer will review and analyse all data from all sources to identify any potential problems and trends.
• System Maintainers - are responsible for the actual collection of “Field Data”. The System Maintainer may be the organisation implementing the FRACAS/DRACAS or the customer of the system / product.
• Production/QA Staff - would be responsible for the collection of failure data during the production phase.

Depending on the complexity of the system, the management of a FRACAS/DRACAS could be achieved using a simple form or may require a more complex database.

FRACAS/DRACAS Corrective Action Process

1) Data Acquisition
The data highlighting incidents for FRACAS/DRACAS will be collected and collated in a variety of ways from different data sources, depending on the type of asset or the type of incident (non-conformance). The organisation implementing this process will need to clarify which types of data/failure will provide a practical input, and how, by whom and how often they will be interrogated. ‘Incident Report Forms’ need to be designed for internal use within the project to allow departments to raise appropriate incidents for consideration and inclusion within the FRACAS/DRACAS process.

2) Data Recording
As data is received, it will be entered into the FRACAS/DRACAS system/database as incidents by a Data Administrator (DA). Any documentary evidence of the incident is to be retained as part of the audit trail.


3) Data Classification
This includes initial sentencing of Non-Conformity Reports or similar defect documentation.

4) Incident Sentencing Committee (ISC) Processes
ISC meetings can be held at appropriate intervals throughout the asset life. The interval will be agreed at the first ISC meeting and can be varied as required and as agreed by the ISC. Each ISC meeting will consist of the following activities:

a. Meeting Preparation
i. The DC (Data Co-ordinator, who is responsible for general administration, meeting agendas, etc.) will issue the calling notice, including the meeting agenda.
ii. The DC will issue the FRACAS/DRACAS Status/Trends/Analysis Report and any other applicable documentation.

b. The actual meeting agendas will vary depending on circumstances, but the following would generally be covered:
i. Review of the minutes of the previous ISC and associated actions.
ii. Review of the updated FRACAS/DRACAS system/database, including formal agreement or amendment of sentencing (as stated in the FRACAS/DRACAS Status/Analysis report) and any appropriate Corrective Action (CA) decisions.
iii. Review of the status of any outstanding CAs and agreement of further actions if required.
iv. Any other business.

c. The DC will issue the minutes of the meeting.

5) Corrective Actions
A typical corrective action process is depicted in Figure 21 and is carried out as required by the FSC. As the nature and magnitude of the individual tasks vary, there is no overall standard sequence to this process. The action will be given to an appropriate Nominated Actionee (NA), and RCA may be undertaken by the actionee (individual or team), the project design team, the Supplier or the OEM. The NA will then present any proposed solution, along with impact statements, penalties and recovery costs, to the relevant stakeholders in order to gain agreement on implementation.

The incident feedback information (its format depends upon the source of the original incident and whether or not standardised Incident Report Forms are used) would then be recorded in the database (via the FRACAS/DRACAS administrator or DA) in order to be discussed at the next meeting of the ISC. Once the CA process has been completed, the incident can be formally closed.

[Figure 21 diagram labels: incident report identifies problem; entered in database; initial sentence; evaluate root cause; propose solution; implement solution; incident update; incident closure. Roles shown: data administrator, ISC, stakeholders, nominated actionee.]
Figure 21: Corrective action process roles

5.11.5 Prerequisites
Situation
FRACAS/DRACAS is a methodology that underpins a programme of continuous improvement. It is based on a closed loop process that collects asset non-conformance data in a format that, where applicable, lends itself to analysis, sentences this data, and extends that analysis to recommendations for improvements or modifications. As such, it is not strictly a tool like RCA, FMECA etc., which are used in connection with a single potential or actual event; rather, it is used as part of a comprehensive management system which encompasses the use of the ‘stand-alone tools’ listed in this SSG.
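The closed loop described in steps 2) to 5) can be sketched as an incident record whose status may only move along permitted transitions, with every change logged as an audit trail. This is a minimal sketch under stated assumptions: the status names are loosely based on the labels in Figure 21, and the transition rules, class names and the example incident are hypothetical. Because the source notes that corrective action tasks have no standard sequence, the sketch only enforces that an incident is recorded and sentenced before a CA is assigned, and is closed only after the CA is complete.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    REPORTED = "reported"        # incident report identifies a problem
    RECORDED = "recorded"        # entered in the database by the Data Administrator (DA)
    SENTENCED = "sentenced"      # initial sentence, agreed or amended at the ISC
    CA_ASSIGNED = "ca_assigned"  # corrective action given to a Nominated Actionee (NA)
    CA_COMPLETE = "ca_complete"  # solution implemented and incident updated
    CLOSED = "closed"            # formally closed once the CA process is complete


# Hypothetical transition rules: sentencing may be amended (SENTENCED -> SENTENCED),
# but an incident can only be closed after its corrective action is complete.
ALLOWED = {
    Status.REPORTED: {Status.RECORDED},
    Status.RECORDED: {Status.SENTENCED},
    Status.SENTENCED: {Status.SENTENCED, Status.CA_ASSIGNED},
    Status.CA_ASSIGNED: {Status.CA_COMPLETE},
    Status.CA_COMPLETE: {Status.CLOSED},
    Status.CLOSED: set(),
}


@dataclass
class Incident:
    incident_id: str
    description: str
    status: Status = Status.REPORTED
    # Audit trail: (date, new status, note) for every status change.
    history: list[tuple[date, Status, str]] = field(default_factory=list)

    def advance(self, new_status: Status, note: str, on: Optional[date] = None) -> None:
        """Move to new_status if the transition is allowed, logging it."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {new_status.value}")
        self.history.append((on or date.today(), new_status, note))
        self.status = new_status


def open_incidents(incidents: list[Incident]) -> list[Incident]:
    """Incidents still awaiting closure, e.g. for the ISC status report."""
    return [i for i in incidents if i.status is not Status.CLOSED]


if __name__ == "__main__":
    inc = Incident("INC-001", "Saloon door fails to close")  # hypothetical incident
    inc.advance(Status.RECORDED, "Entered by DA")
    inc.advance(Status.SENTENCED, "Initial sentence: design fault")
    print([i.incident_id for i in open_incidents([inc])])  # ['INC-001']
```

Rejecting an early closure with an exception mirrors the intent that the loop is only closed once the CA process has been completed.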


Data
• Historical failure rate data for system components (e.g. In Service Data, NPRD 2011 etc.).
• Non-conformance data for assets and systems (e.g. input from a CMMS database). The data collection element cannot always be rigorously disciplined, as it is partly culture-driven; there are many examples of poor quality data being submitted in support of an ‘asset non-conformance’.

Experience
• Familiarity with the closed-loop system of initiation/analysis/corrective action/review. It is also worth noting that there may be disparate sources that feed the FRACAS/DRACAS process.

5.11.6 Typical Proprietary Tools
There are proprietary FRACAS/DRACAS systems available; however, experience shows that companies are likely to construct their own FRACAS/DRACAS system to meet their specific requirements, e.g. around their CMMS, which could provide the core data warehouse. It should be recognised, however, that data capture may come from a number of disparate sources, all of which need to be recognised and integrated into the FRACAS/DRACAS process.

5.11.7 Objectives
• To provide a comprehensive data warehouse of actual events in a format that, where applicable, can be readily analysed.
• To establish a discipline for the capture and entry of event data.
• To establish a closed loop process of feedback to those responsible for the capture and entry of non-conformance data, in order to encourage high quality baseline information.
• To either close out or provide improvements/modifications to assets that have had non-conformance data reported.

5.11.8 Derived Benefits
• Captures the collective knowledge of the team.
• Increases visibility of reliability performance problems.
• Provides documented evidence of cause and effect and proposed solutions.
• Screens out potentially costly system modifications (retrofits).
• Focuses upon key areas on which to concentrate quality control, inspection and manufacturing process controls.
• Develops operations and maintenance cooperation.
• Generates recommendations for reducing risk.

5.11.9 References
Overview
Organizational Best Practices for FRACAS Implementation: Management Considerations when Developing and Deploying a Corrective Action System [42]
http://www.cadcam.com.au/plm/WindchillQuality-4915-BestPracticesFRACAS.pdf
MIL-STD-2155(AS)
Failure reporting, analysis and corrective action system (FRACAS)

5.12 TPM - Total Productive Maintenance/Manufacturing

5.12.1 What is it?
Total Productive Maintenance (TPM), sometimes more usefully titled ‘Total Productive Manufacturing’, is a model that brings together aspects of professional maintenance to deliver real improvement in an operational environment.

Traditionally it can be difficult to show a project team how their design decisions can truly affect the reliability and performance of equipment. In addition, operators who take care of equipment and understand machine condition can be invaluable in giving early notification of a deviation from standard. TPM brings both of these aspects together under the ‘Early Equipment Management’ and ‘Autonomous Maintenance’ pillars respectively.

[Figure 22 diagram: Total Productive Maintenance supported by pillars labelled Focused Improvement, Autonomous Maintenance, Professional Maintenance, Quality, Training and Skills Development, Early Equipment Management, and Safety, Health & Environment.]
Figure 22: The pillars of TPM [43]
42. This paper lists further recommended reading as a bibliography.
43. Note: diagrams showing the “pillars of TPM” often show 5, 6, 7, 8 or more pillars, depending on the source.


5.12.2 When is it used?
TPM is more often applied to a manufacturing than to a services environment, although the beauty of TPM is that it provides an umbrella that brings all the people who have an influence on asset performance under one model.

5.12.3 Complementary Techniques
TPM can be used as part of a larger Asset Management program where particular focus is required on improving the ‘Acquire, Operate, Maintain and Dispose’ parts of the ‘Lifecycle Delivery’ outlined in the IAM ‘Anatomy 2’.

5.12.4 Example Procedures Steps
The decision to adopt a TPM approach should be made whilst developing the organisation’s Asset Management Strategy, as TPM is best applied holistically to the whole organisation.

As with any system or approach, TPM should be implemented by starting with an assessment of current maturity, identification of key stakeholders, education and engagement, and forming an implementation team with key representation from all relevant sections of the business (typically production, maintenance and engineering managers, with top level executive representation), followed by development of a master plan for TPM implementation. Pilots and focused implementation initiatives can help establish a more sustainable business change, provide evidence of benefits, and offer the opportunity to learn through a more rapid implementation.

5.12.5 Prerequisites
Situation
TPM can be used during the development of a company’s maintenance strategy to structure the document and define the interactions between key functions more clearly. It can be used alongside the Asset Management Policy in organisations that manage the complete asset lifecycle.

Data
• Details of policy and procedures across the organisation.

Experience
• Basic understanding of Asset Management Strategy.
• Understanding of TPM to support development and implementation.

5.12.6 Typical Proprietary Tools
There are a large number of interpretations of the TPM pyramid. It is suggested that a model is developed within the organisation by the implementation team, the purpose being to translate the theory into the language of the business and market sector. This generates a model that is more engaging for the organisation and supports a sustainable implementation of TPM.

5.12.7 Objectives
TPM will allow an organisation to deliver a holistic Asset Management Strategy covering all aspects of its asset lifecycle, structured in a way that is not only globally recognised but also well organised to aid understanding and ownership of the different aspects of the process.

5.12.8 Derived Benefits
Delivery of a TPM program will benefit an entire organisation. The different pillars of the approach bring different benefits, but it is the holistic implementation that drives increased operational efficiency and lower operating costs. Improvements in reliability, quality, morale, job satisfaction, and health and safety all contribute to the ability to grow the business.

5.12.9 References
TPM for Supervisors
Productivity Press Development Team (1996), Productivity Press, ISBN 978-1-56327-161-8

5.13 OEE - Overall Equipment Effectiveness

5.13.1 What is it?
Overall Equipment Effectiveness (OEE) is a data collection and analysis tool that was developed within manufacturing companies to understand the overall performance of an asset, recognising that downtime is not the only cause of reduced asset performance.

In simple terms, it can be described as the product of availability, performance and quality:

OEE = Availability x Performance x Quality
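A minimal Python sketch of the OEE product. The three factor definitions follow the calculation method given in this section (running time is total available time less downtime); the class, field and function names, and the example shift figures (8 hours available, 1 hour of downtime, a design rate of 100 units per hour, 600 units produced of which 570 are good), are hypothetical illustrations rather than data from the source.

```python
from dataclasses import dataclass


@dataclass
class PeriodRecord:
    """Asset performance data for one period (hypothetical field names)."""
    total_available_time: float  # hours the asset was available to run
    downtime: float              # hours lost to stoppages
    design_rate: float           # design output per hour
    total_output: float          # units produced in the period
    good_output: float           # units meeting quality requirements


def oee(rec: PeriodRecord) -> dict[str, float]:
    """Return availability, performance, quality and their product."""
    running_time = rec.total_available_time - rec.downtime
    if rec.total_available_time <= 0 or running_time <= 0:
        raise ValueError("the period must include some running time")
    if rec.design_rate <= 0 or rec.total_output <= 0:
        raise ValueError("design rate and total output must be positive")
    availability = running_time / rec.total_available_time
    performance = rec.total_output / (running_time * rec.design_rate)
    quality = rec.good_output / rec.total_output
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


if __name__ == "__main__":
    shift = PeriodRecord(total_available_time=8.0, downtime=1.0,
                         design_rate=100.0, total_output=600.0, good_output=570.0)
    for name, value in oee(shift).items():
        print(f"{name}: {value:.4f}")
    # availability 0.8750, performance 0.8571, quality 0.9500, oee 0.7125
```

Because the factors multiply, a modest shortfall in each (here 87.5%, about 85.7% and 95%) compounds into a much lower overall figure of about 71%, which is why OEE exposes losses that a downtime-only measure would miss.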


OEE is normally calculated in the following manner:
• Availability = Running time / Total available time (where running time is the total available time less the downtime).
• Performance = Actual output in time period / (Running time x Design rate).
• Quality = Good quality output / Total output.

5.13.2 When is it used?
OEE was developed for use in manufacturing processes; however, the principles of recording and categorising losses have been successfully applied to facilities and services.

5.13.3 Complementary Techniques
TPM

5.13.4 Example Procedures Steps
OEE is best implemented through the operating team with the following steps:
• Training and awareness.
• Implement a manual data collection process.
• Introduce data review through regular weekly meetings.
• Identify improvement opportunities.

Once the process is introduced, it should be sustained through the use of a Plan-Do-Check-Act (PDCA) cycle and regular production loss review meetings.

As the organisation matures, there are many automated data capture systems available which can make the process more efficient and gather more data; however, it is not advisable to move to these without first establishing the PDCA cycle in which the data can be used and acted upon.

5.13.5 Prerequisites
Situation
• OEE is most often used in manufacturing processes; however, the principles of recording and categorising losses have been successfully applied to facilities and services.

Data
• Asset performance data must be available, including operating times, the duration and cause of downtime and stoppages, asset performance or run rate, and yield.

Experience
• RCA capability to investigate significant losses and plan improvements.
• Basic data analysis capability.

5.13.6 Typical Proprietary Tools
OEE tools are available from many suppliers; however, it is often most useful to begin with a manual process using asset performance log books and basic Excel spreadsheets.

5.13.7 Objectives
Identification of key asset losses to allow focused improvement activity.

5.13.8 Derived Benefits
Through the measurement of OEE and its integration into a TPM or reliability program, or through PDCA, improvements to asset or equipment performance can be implemented and measured.

5.13.9 References
Overall Equipment Effectiveness (OEE)
Hansen, Robert C (2005). Industrial Press. ISBN 978-0-8311-3237-8.
OEE for the Production Team
Koch, Arno (2007). Makigami. ISBN 978-90-78210-08-5 (English), ISBN 978-90-78210-07-8 (Dutch), ISBN 978-3-940775-04-7 (German).
OEE for Operators: Overall Equipment Effectiveness
Productivity Press Development Team (1999), Productivity Press, ISBN 978-1-56327-221-9.

5.14 LCVR - Life Cycle Value Realisation

5.14.1 What is it?
Life Cycle Value Realisation (LCVR) encompasses methods that assess and optimise the combination of all direct and indirect expenditures, cash flows, risks and performance benefits that may be associated with asset ownership or responsibility over an asset’s or asset system’s life cycle, from first identification of need to final disposal, decommissioning and any residual liabilities thereafter. This topic is covered in more detail in a separate SSG.
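Section 5.14.2 describes life cycle cost as a single economic valuation in which future costs and cash flows are expressed as present-day equivalents. A minimal sketch of that discounting step follows, assuming yearly cash flows grouped by cost category (such as those listed in 5.14.4) and a single constant discount rate. The class and function names, category keys, options and all figures are hypothetical; recoverable value is entered as a negative cost, the options are compared over the same horizon, and the risk and performance terms that full LCVR would quantify are assumed to be already expressed as money.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    """One investment option: yearly costs (+) and recoveries (-) by category."""
    name: str
    # category -> amounts per year, index 0 = year 0 (today)
    cash_flows: dict[str, list[float]] = field(default_factory=dict)


def present_value(amounts: list[float], rate: float) -> float:
    """Discount a yearly series to its present-day equivalent value."""
    return sum(a / (1.0 + rate) ** year for year, a in enumerate(amounts))


def life_cycle_cost(option: Option, rate: float) -> dict[str, float]:
    """Present value per category plus the total, in one common unit."""
    by_category = {cat: present_value(series, rate)
                   for cat, series in option.cash_flows.items()}
    by_category["total"] = sum(by_category.values())  # categories only
    return by_category


def cheapest(options: list[Option], rate: float) -> str:
    """Name of the option with the lowest total present-value cost."""
    return min(options, key=lambda o: life_cycle_cost(o, rate)["total"]).name


if __name__ == "__main__":
    refurbish = Option("refurbish", {
        "capital": [100.0, 0.0, 0.0],
        "maintenance": [0.0, 20.0, 20.0],
        "end_of_life": [0.0, 0.0, -10.0],  # recoverable value
    })
    replace = Option("replace", {
        "capital": [160.0, 0.0, 0.0],
        "maintenance": [0.0, 5.0, 5.0],
        "end_of_life": [0.0, 0.0, -20.0],
    })
    for opt in (refurbish, replace):
        print(opt.name, round(life_cycle_cost(opt, 0.05)["total"], 1))
    print(cheapest([refurbish, replace], rate=0.05))  # refurbish
```

Keeping the per-category breakdown alongside the total supports the auditability objective in 5.14.7, since reviewers can see which cost element drives the difference between options.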


5.14.2 When is it used?
The primary purpose of LCVR is to support asset management decisions in relation to costs, risks and value opportunities, taking account of both the immediate/short term impacts and any longer term consequences. The correct application of LCVR can produce increased financial and economic benefits, improved decision making effectiveness and better communication with stakeholders, as well as driving improved cross-disciplinary governance and consistency.

To optimise value, the LCVR approach should take a combined view of costs, performance, benefits, risks and the wider implications of decisions made across the life cycle of the asset. The life cycle cost of an asset or asset system represents a single economic valuation of the total costs, risks and other business impacts associated with ownership over its life cycle, expressed in a common unit (such as present day equivalent values of future costs or other cash flows).

The costs, risks and cash flows incorporated into the analysis should include any impacts that are likely to affect the intended investment decisions, optimisations during asset life, or end-of-life decisions.

5.14.3 Complementary Techniques
CRO.

5.14.4 Example Procedures Steps
Life Cycle Costing (LCC) generally requires assessment of the following:
a) Design, procurement, construction and commissioning costs.
b) Operational and capital maintenance costs over the presumed economic life of the asset or system.
c) Related risks, such as failure rates and consequences.
d) Performance constraints and ‘lost opportunity costs’ (if comparing options with different performance capabilities).
e) Related costs, such as socio-economic costs.
f) Anticipated end-of-life costs (such as decommissioning & disposal) or other cash flows (such as recoverable value).
g) Any post-disposal period of residual liabilities.

5.14.5 Prerequisites
Situation
In seeking to optimise value, an organisation will require information relating to the costs, risks, performance and benefits that result not only from the asset management decision being made but also from secondary effects, such as those resulting from interdependency with other decisions. Such information should be accurate enough for the decision to be demonstrably robust.

Data
Life Cycle Costing generally requires assessment of the following:
a) Design, procurement, construction and commissioning costs.
b) Operational and capital maintenance costs over the presumed economic life of the asset or system.
c) Related risks, such as failure rates and consequences.
d) Performance constraints and ‘lost opportunity costs’ (if comparing options with different performance capabilities).
e) Related costs, such as socio-economic costs.
f) Anticipated end-of-life costs (such as decommissioning & disposal) or other cash flows (such as recoverable value).
g) Any post-disposal period of residual liabilities.

Experience
Data and expert knowledge are essential inputs to the process, so clarity of information sources, consistent quantification methods for risk and intangibles, and the handling of uncertainty are important contributors to LCVR processes.

5.14.6 Typical Proprietary Tools
The SALVO Project (http://www.salvoproject.org/)

5.14.7 Objectives [44]
• Financial & business performance benefits.
• Consistency, robustness & auditability of decisions.
• Engagement and credibility with stakeholders.
• Rationalising corporate data.

5.14.8 Derived Benefits
The effective application of Life Cycle Value Realisation results in a number of significant and tangible benefits. The benefits are likely to be

particularly significant where the decisions being made are of high criticality (i.e. they involve spending significant sums or have high potential risk/performance consequences) or high complexity (i.e. many factors involved, with complex interactions, or great uncertainty in assumptions and secondary consequences).

5.14.9 References
Life Cycle Value Realisation SSG
IAM

44. Breakdowns of each of these into more granular benefits can be found in the Life Cycle Value Realisation SSG document.

6 Case Studies
A number of case study examples of good practice will be included to supplement the
guidance and to show what “good” looks like. These will be from a range of sectors.
They will give real-life examples of the theory explained in previous sections.

Case studies should recognise different operating contexts, e.g.:
• Good practice for a small asset management organisation might be different from that for a large organisation.
• For asset owners as opposed to asset operators, etc.

The case study should be introduced by stating what it is intended to demonstrate and how it links to or explains the guidance. This section will allow contributors to bring case studies from their own organisations or experience. It will be a section that can live and be easily maintained as common practice matures in this area.

Note from the authors: In selecting case studies for this SSG we essentially used existing published studies and tried to reverse engineer the techniques used. We recognise that it would be more useful if the case studies explained how/why the chosen technique was used, since this would be excellent guidance for others seeking to select an appropriate technique. If possible, it would also be useful to know what difficulties were encountered and how they were overcome. For those who write future case studies, there would be considerable ‘added value’ if these two points were documented during execution of a project and included in case study write-ups. To do so would be a further step towards best practice.


6.1 Case Study 1


Requirement: To support implementation of a continuous improvement programme
Intended Process: Analysis of existing asset history to identify asset ‘bad performers’ so that remedial actions can be implemented

6.1.1 Introduction
This is a case study taken from Metronet on the “Renewing the Tube” project. In 2009 Metronet was in the process of purchasing the first new fleet of London Underground trains for 17 years, whilst planning to introduce another four fleets from 2012 onwards, and the Fleet Asset Managers wished to lay the foundations of a long term investment plan to maximise the return on their investment and insulate that investment from political changes. The design life of the new fleet would be 40 years; however, existing fleets, based on technology and materials developed in the early 1960s, had already exceeded this design life.

6.1.2 Challenge
Purchases of major long-life assets such as rolling stock, planes and wind turbines represent a substantial investment by companies or governments, from which the owner needs to ensure maximum return over the asset’s economically viable operational life. This case study discusses one particular example of how a customer explored the potential to extend the life of their asset purchase by approximately 50%, and the methods they used to achieve this.

Significant forward planning was required, as the known time interval for extension would be 20 years. This is the life of the bogies: after 20 years, the cast iron develops sufficient fatigue cracking to require replacement, so for an investment to be worthwhile, the fleet would need to achieve 60 years.

Reliability Engineering looks at all aspects of the asset lifecycle, from concept through specification to design, production, commissioning, operating, maintaining, decommissioning and disposal/recycling. In this instance, it looked at both the fleet and the infrastructure (depot and equipment) due to the multiple interfaces between the two.

Figure 23: London Underground Train

6.1.3 Approach
Investigation and assessment were undertaken to aid decision making on when to refurbish or replace major systems, such as door engines, traction motors, wheels and brakes. The case study initially examined the doors, as this system failed most frequently and at the highest cost, closely followed by traction and brakes. These three systems accounted for over 40% of failures by both frequency and cost. An investigation was initiated into the differences between the door system maintenance regimes on the different fleets to identify, and then roll out, best practice.

As an example of how Reliability Engineering can impact design during the “in service” part of the lifecycle, the case study looked at how the design can be made more cost effective by designing for ease of maintenance or replacement. It considered whether it would be better to create modular systems that can be removed for servicing off-line or to allow easy access in situ, rather than how to minimise the cost of operating what was already in place, or what the selected supplier delivers after interpreting specifications.

The work also considered the degree of performance recovery from planned maintenance and how best to plan maintenance interventions through the use of tools such as RCM, since not all suppliers have embraced such working techniques and many products were supplied prior to the development of RCM; e.g. the 67TS rolling stock was in service 11 years prior to the release of Nowlan and Heap’s landmark study of commercial aircraft reliability and maintenance in 1978.
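Identifying the ‘bad performers’ described above amounts to a Pareto ranking of systems by failure frequency or cost. A minimal sketch follows: the function names and the 40% threshold used to pick the top systems are hypothetical, and the failure costs are invented so that doors, traction and brakes together account for 42% of the total, echoing the "over 40%" reported in the case study without reproducing its data.

```python
def pareto(losses: dict[str, float]) -> list[tuple[str, float, float]]:
    """Rank systems by a loss measure (failure count or cost), largest first.

    Returns (system, share of total, cumulative share) tuples.
    """
    total = sum(losses.values())
    if total <= 0:
        raise ValueError("total loss must be positive")
    ranked = sorted(losses.items(), key=lambda item: item[1], reverse=True)
    rows, cumulative = [], 0.0
    for system, value in ranked:
        share = value / total
        cumulative += share
        rows.append((system, share, cumulative))
    return rows


def bad_performers(losses: dict[str, float], threshold: float) -> list[str]:
    """Smallest set of top-ranked systems whose cumulative share reaches threshold."""
    selected = []
    for system, _share, cumulative in pareto(losses):
        selected.append(system)
        if cumulative >= threshold:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical failure costs (arbitrary units) aggregated across fleets.
    failure_cost = {
        "doors": 180, "traction": 130, "brakes": 110, "hvac": 105,
        "couplers": 100, "lighting": 95, "radio": 90, "pneumatics": 85,
        "wipers": 55, "public address": 50,
    }
    for system, share, cumulative in pareto(failure_cost):
        print(f"{system:15s} {share:6.1%} {cumulative:6.1%}")
    print(bad_performers(failure_cost, threshold=0.4))  # ['doors', 'traction', 'brakes']
```

Running the same ranking once by failure count and once by cost, as the case study did, shows whether the frequent failures are also the expensive ones.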


In order to determine what would be required to extend the life of a fleet due for delivery 2 years later, it was necessary to understand the current situation with the existing fleets and what had been learnt over nearly 150 years of operating the London Underground.

The initial stage involved gathering data and lessons learnt from the existing fleets. Failure data was gathered from the eight Metronet fleets in order to identify common failures, which would allow each fleet to learn from improvements made on other fleets as well as provide feedback into the design process.
1) Defining the problem was an important first step to identify issues with current practices and subsequently identify a root cause.
2) The second step was to clearly define the target, i.e. what does success look like when the problem is resolved? There is a gap between where you are and where you wish to be that needs to be closed; closing that gap was the target.
3) The third step was Root Cause Analysis (RCA).
4) The fourth step was to develop a countermeasure, planned to address the gap and meet the target.
5) The fifth step was to test and refine the countermeasure to achieve the target.
6) The sixth and final step was to assess the lessons learnt and consider where else this solution could be applied, or whether there are similar issues that would benefit from this solution.

6.1.4 Deliverables
• Maintenance practices based on RCM and condition monitoring
• Designs were reviewed for ease of maintenance or replacement
• Performance recovery predicted following service or refurbishment
• Decision support tool developed to determine whole life cost of intervention and design options

6.1.5 Results
The following benefits were identified as a result of the analysis:
• A Whole Life Cost Model (WLCM) was produced as a Decision Support Tool for the next 65 years
• The effect of changes of use, i.e. more frequent service, longer operating hours (24hr tube), PPM frequency etc., could be demonstrated by the WLCM
• Better understanding of existing fleet performance through improved analysis and Reliability Growth Plans
• Improved performance through the introduction of RCM
• Ability to constructively influence the new fleet design, e.g. exchangeable wiring cartridges

6.2 Case Study 2

Requirement: To conduct a downstream/upstream design check as part of change management
Intended Process: Ensure that the technical characteristics of a proposed asset replacement are compatible with related asset systems

6.2.1 Introduction
To ensure that the technical characteristics of a proposed asset replacement are mutually compatible with related asset systems, this design check concerned the replacement of four vertical spindle horizontal hot strip coilers with two horizontal mandrel downcoilers. Justification for the replacement included:
• strip surface damage in the water-cooled twist guides meant that increasing surface quality requirements could no longer be met.
• vertical spindle coiler and associated asset downtime was identified as a limitation to increasing mill throughput.

This design check concerns the interface between the new equipment and the installed environment. It is not a design or reliability check on the as-purchased equipment or system, nor a dimensional check on the physical environment. It is a compatibility design check to ensure that critical operational interfaces match, for example: cooling water pressure, flow, temperature and reliability; OHT operation cycle time for mandrel changes; and strip speeds and interval times.

6.2.2 Challenge
The downcoiler system design will have been ‘frozen’ against external system data supplied at the time of placing the purchase and installation contract.


Due to the long ‘gestation time’ of this type of contract, it is important to ensure that this external system data is still valid and represents true operational values, rather than being based on external system design values.

6.2.3 Approach
A boundary diagram [Process Map] was constructed from the new downcoiler system data to show all process and service material flows and operational interventions (i.e. incoming strip and outgoing coils; cooling water supply and waste water discharge; coiler mandrel changes). From this, a table was constructed to show the maximum and minimum parameters for each of the above data.

To support this, additional data was obtained from operational records (to establish current system performance at the identified system boundaries), from the FRACAS system (to identify current performance anomalies) and from the original design P&ID drawings, to provide design and current performance data. Against each item of ‘new system data’, the external system data was entered as ‘design data’ and ‘actual operational data’. Each data value entered was qualified by reference to the source of information. Where the value had to be calculated from base information, the calculations were included.

Figure 24: Example of a strip mill downcoiler installation

Where values were obtained from operational records for ‘actual operational data’, the margin of error was deduced from the record sample size reviewed, and upper/lower boundaries were calculated. Where comparison of values (new system, existing system design and existing system actual) gave rise to concern, a risk evaluation was carried out using the ETA methodology and included in the report, together with proposals for risk mitigation.
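The source does not say how the margin of error was derived from the sample size. One common approach, assumed here, is a normal-approximation confidence interval on the mean of the sampled readings, mean ± z·s/√n; for small samples a Student-t multiplier would widen the bounds, and if the spread of individual readings matters more than the mean, a prediction or tolerance interval would be more appropriate. The pass/fail rule (the actual-value bounds must lie inside the range the new system requires), the function names and the pressure readings are hypothetical.

```python
import math
import statistics


def operating_bounds(samples: list[float], z: float = 1.96) -> tuple[float, float]:
    """Lower/upper bounds on the mean reading: mean +/- z * s / sqrt(n).

    z = 1.96 gives roughly 95% confidence under a normal approximation.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two readings are needed to estimate the spread")
    mean = statistics.mean(samples)
    margin = z * statistics.stdev(samples) / math.sqrt(n)  # sample standard deviation
    return mean - margin, mean + margin


def check_interface(required: tuple[float, float], samples: list[float]) -> str:
    """'passed' if the actual-value bounds lie inside the required range."""
    lower, upper = operating_bounds(samples)
    req_min, req_max = required
    return "passed" if req_min <= lower and upper <= req_max else "failed"


if __name__ == "__main__":
    # Hypothetical cooling water pressure readings (bar) from operational records.
    pressure = [4.6, 4.8, 4.5, 4.9, 4.7, 4.6, 4.8, 4.7]
    low, high = operating_bounds(pressure)
    print(f"actual: {low:.2f} to {high:.2f} bar")
    print(check_interface((4.0, 5.5), pressure))   # passed
    print(check_interface((4.65, 5.5), pressure))  # failed
```

Recording the sample size alongside each bound keeps the qualification by source that the approach requires: a value drawn from few records carries a visibly wider margin.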

[Figure 25 diagram, titled “Some typical downcoiler system boundary criteria”: hot rolled steel strip (speed, temperature, width, thickness, cycle time); hot rolled coils (frequency, weight, O.D., I.D., temperature, cycle time); mandrel changes (frequency, cycle time); closed circuit cooling water in (flow rate, temperature, pressure) and out (flow rate, temperature, impurities); descaling water, clean (flow rate, temperature, pressure); laminar flow cooling water, clean (flow rate, temperature, pressure); dirty water discharge (flow rate, temperature, impurities); compressed air (flow rate, pressure, quality); oxygen (flow rate, pressure, quality); electrical power (V, A, pf); emissions (noise, steam); consumables (hydraulic oil, coil strapping).]

Figure 25: Boundary diagram


6.2.4 Deliverables
The design check report consisted of three sections:
1) Boundary diagram, together with a table of required/design/actual flow data, including sources of information and calculations where necessary. Each set of data was compared and either ‘passed’ or ‘failed’.
2) Risk evaluation for all ‘failed’ areas, with the evaluation expressed and ranked in terms of lost production opportunity.
3) Proposals for risk mitigation.

6.2.5 Results
This report was submitted to the project manager and reviewed by him together with operations and maintenance staff. Where considered appropriate, actions were initiated as part of the project plan to eliminate areas of non-alignment. In this instance, improvements to the final stage of cooling water treatment and filtration were recommended and implemented, and an area was identified for coiler mandrel overhaul within the area covered by the OHT crane carrying out the mandrel changes.
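Deliverable 2) ranks the ‘failed’ areas by lost production opportunity, and the risk evaluation in 6.2.3 used the ETA (Event Tree Analysis) methodology. A minimal sketch of that calculation: each outcome path's frequency is the initiating event frequency multiplied by the branch probabilities along the path, and the expected loss is the sum of frequency x consequence over all paths. The interface names, initiating frequencies, branch probabilities and tonnages below are invented for illustration and are not the case study's results.

```python
import math
from dataclasses import dataclass


@dataclass
class Outcome:
    """One path through an event tree (hypothetical structure)."""
    branch_probabilities: list[float]  # probability of each branch taken along the path
    lost_production_t: float           # consequence of this outcome, tonnes


def expected_loss(initiating_per_year: float, outcomes: list[Outcome]) -> float:
    """Expected lost production per year: sum of outcome frequency x consequence."""
    path_probs = [math.prod(o.branch_probabilities) for o in outcomes]
    if not math.isclose(sum(path_probs), 1.0, rel_tol=1e-6):
        raise ValueError("outcome path probabilities should sum to 1")
    return sum(initiating_per_year * p * o.lost_production_t
               for p, o in zip(path_probs, outcomes))


def rank_failed_interfaces(
        trees: dict[str, tuple[float, list[Outcome]]]) -> list[tuple[str, float]]:
    """Rank 'failed' interfaces by expected lost production, largest first."""
    scored = [(name, expected_loss(freq, outs)) for name, (freq, outs) in trees.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    trees = {
        "cooling water impurities": (12.0, [
            Outcome([0.9], 0.0),         # filtration copes
            Outcome([0.1, 0.8], 50.0),   # blockage found at next planned stop
            Outcome([0.1, 0.2], 400.0),  # blockage causes an unplanned stoppage
        ]),
        "mandrel change cycle time": (4.0, [
            Outcome([0.5], 20.0),
            Outcome([0.5], 60.0),
        ]),
    }
    for name, loss in rank_failed_interfaces(trees):
        print(f"{name}: {loss:.0f} t/year")
```

Checking that the path probabilities sum to one guards against an incomplete tree, which would otherwise understate the expected loss and distort the ranking.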


6.3 Case Study 3

Requirement: To respond to an unexpected event that has occurred
Intended Process: Identify the root cause of an unexpected event so that remedial actions can be implemented

6.3.1 Introduction
Technicians are, by their training and experience, often excellent problem solvers. However, we were experiencing repeat failures and prolonged downtime through iterative, assumption-filled problem solving. A number of high profile examples include:
• X many days downtime for _____.
• X attempts to solve a problem for _____.

6.3.2 Challenge
Develop a problem solving framework that is understood, respected and owned by the Maintenance Technician team.

6.3.3 Approach
The Maintenance Manager worked in collaboration with the company's lean programme and with other manufacturing organisations in the region. He engaged with a number of members of the Technician team to develop a basic problem solving process and to agree the trigger levels at which problem solving would be used. The team also produced relevant training material using real examples from the department.

Figure 26: Process used in the case study. The problem solving activity flow (Version 1.0, 20/9/2012) involves the Technician, Engineering planner, Maintenance lead and Customer. Triggers: sub-part rejection, batch slot missed, repeat failures, customer complaints, engineering cause for miss vs plan, and downtime longer than one shift. Steps: problem identified; go to the problem; identify containment and do it; clarify the problem (complete 5W1H); determine root cause (complete 5 Why); identify countermeasure; review/performance meeting. The flow covers both breakdown action and planned activity, and each stage is recorded on the Problem Solving Form.

Figure 27: Problem solving form (F1 Engineering Problem Solving Template). Header fields: Plant, Equipment, Plant item no., SAP no., Number. Clarify the problem: What is the problem? Where did it happen? When did it happen? Which trend (if any) is visible? Who found the problem? How big is the problem? Problem statement. 5 Why: Why #1 to Why #5. Countermeasures #1 to #4, each with SAP reference, owner (Who), date required and completion. Notes/comments and early life review.

6.3.4 Deliverables
A robust problem solving process including:
• worksheet / form.
• process / procedure.
• training documentation (including examples).

6.3.5 Results
The number of repeat breakdowns has been drastically reduced. Analysis of the plant CMMS data suggests that they have been almost eliminated.
The Technician team really engaged with the process as they developed it. The training material was well received and has now been replicated in other areas.
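The agreed trigger levels and the Why chain on the problem solving form can be expressed as a small data model. In this sketch the trigger names come from Figure 26. The eight-hour shift length, the class and function names, and the example answers are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SHIFT_HOURS = 8.0  # assumed shift length; the case study only says "longer than one shift"


@dataclass
class Event:
    downtime_hours: float = 0.0
    sub_part_rejection: bool = False
    batch_slot_missed: bool = False
    repeat_failure: bool = False
    customer_complaint: bool = False
    engineering_cause_for_miss_vs_plan: bool = False


def needs_problem_solving(e: Event) -> bool:
    """Return True if any of the agreed triggers is met."""
    return (e.sub_part_rejection or e.batch_slot_missed or e.repeat_failure
            or e.customer_complaint or e.engineering_cause_for_miss_vs_plan
            or e.downtime_hours > SHIFT_HOURS)


@dataclass
class ProblemSolvingForm:
    problem_statement: str
    whys: List[str] = field(default_factory=list)
    countermeasures: List[str] = field(default_factory=list)

    def add_why(self, answer: str) -> None:
        # The form provides space for Why #1 to Why #5.
        if len(self.whys) >= 5:
            raise ValueError("the form only has space for five Whys")
        self.whys.append(answer)

    def root_cause(self) -> Optional[str]:
        # Treat the deepest recorded answer as the candidate root cause.
        return self.whys[-1] if self.whys else None


# Hypothetical example.
event = Event(downtime_hours=10.5)
if needs_problem_solving(event):
    form = ProblemSolvingForm("Conveyor stopped repeatedly on line 2")
    form.add_why("Drive motor tripped on overload")
    form.add_why("Bearing was running hot")
    form.add_why("Bearing was not being lubricated")
    form.countermeasures.append("Add bearing to lubrication route")
    print(form.root_cause())
```

Writing the triggers down explicitly, as the team did, means problem solving starts on the same conditions every time.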


6.4 Case Study 4

Requirement: To predict failure modes which may seriously affect expected or desired operation
Intended Process: Identify:
- relevant critical assets
- potential failure modes which would affect operations
- causes of failure
- mitigation options

6.4.1 Introduction
A pharmaceutical manufacturing organisation was building a new sterile manufacturing asset some 20 years after the previous one was completed. The facility produces several variants of just one product using a combination of bespoke and customised equipment. There was a genuine concern about delivering a robust and reliable asset (both process equipment and services) that would achieve 100% reliability between bi-annual shutdowns.

Figure 28: Sterile Manufacturing Facility

6.4.2 Challenge
Building in redundancy was the traditional approach to this problem, but its cost was prohibitive and it would not have addressed random failure. The challenge was how to work with the design team and equipment suppliers so that they focused not just on design quality but also on functional reliability.

6.4.3 Approach
It was important that the project leadership team were engaged and sold on the benefit of using a process focused on reliability, so that they would readily support the project team through the workstream.

To help focus and prioritise activities, a Criticality analysis was performed. This process considers the impact on SHE, Quality and Operations in the event of a system failure, plotted against the perceived reliability, maintainability and complexity of that system.

The RCM (Reliability Centred Maintenance) studies were completed both on site and at suppliers during the design phase of the project. The process included RCM training, process mapping, determining functions, failure mode and effect analysis, and risk priority number scoring. The team then decided how to reduce the risk priority number through re-design, inspection, maintenance or spares mitigation. This process was inclusive, and it was important to ensure we were genuinely engaging designers, maintenance managers, operators and technicians.

The identified actions were incorporated into the project process to track their delivery.

6.4.4 Deliverables
• Criticality Assessment that prioritised process equipment and services.
• RCM studies for critical process equipment and services.
• Re-design activities with tangible risk reduction scoring.
• Maintenance and spare part requirements identified.

6.4.5 Results
Of the 822 Failure Modes identified across 11 critical process and service systems, 85% were random. Traditional scheduled maintenance would have addressed only 15%.

% Failure modes addressed with traditional (left) and SPP5 approach (right). The traditional maintenance approach would have addressed just 15% (i.e. age related) of failure modes, leaving 85% unaddressed. RCM has addressed 89% of failure modes, leaving 11% unaddressed.

Through this work we have addressed 89% of these failure modes. Over 30% were addressed through re-design, which completely eliminated those failure modes.

Each failure mode was scored against Severity, Occurrence and Detection, and the Risk Priority Number is their product (S x O x D). The Risk Priority Number has been reduced by 68%. The project team are confident in delivering what is a unique and complex asset with reliability built in.

In addition, we have trained Engineers, Technicians and Operators, who now have a better understanding of both the nature of failure and the equipment itself.

We have had excellent feedback from our suppliers, two of whom have adopted the process for their future project work:

“We found this RCM session very useful and will use this methodology in future also for our other machines and especially for new developments.
Best regards,
Udo Baeuerle, Head of Technology”
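The Risk Priority Number used in this study is the product of the Severity, Occurrence and Detection scores. The sketch below computes the RPN for each failure mode and the percentage reduction in total RPN after mitigation. It makes several assumptions: a 1 to 10 scoring scale, the function names, and all of the scores. None of these reproduce the study's data. A failure mode eliminated by re-design simply drops out of the "after" list.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number = Severity x Occurrence x Detection."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("scores are assumed to use a 1 to 10 scale")
    return severity * occurrence * detection


def rpn_reduction_pct(before, after) -> float:
    """Percentage fall in total RPN; each item is an (S, O, D) tuple.
    Failure modes removed by re-design are simply absent from `after`."""
    total_before = sum(rpn(*s) for s in before)
    total_after = sum(rpn(*s) for s in after)
    return 100.0 * (total_before - total_after) / total_before


# Hypothetical scores for three failure modes before mitigation ...
before = [(8, 5, 6), (6, 4, 5), (9, 3, 7)]
# ... and after: inspection improves detection, maintenance lowers occurrence,
# and the third mode is designed out.
after = [(8, 5, 2), (6, 2, 5)]

print(f"{rpn_reduction_pct(before, after):.1f}% reduction")
```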


6.5 Case Study 5

Requirement: To develop test plans, including compliance specifications, for commissioning to production of a complex asset with interdependent components.
Intended Process: Determine the components of the asset and their interdependencies.
Determine the other assets that interact with the study asset (e.g. energy and water supply) and whether any of these required testing too.
Determine the type of activities the asset undertakes and their characteristics (e.g. operating in hot and cold conditions, under load).
Determine the order/sequence of testing, based on component interdependencies.
Use data from available documentation.

6.5.1 Introduction
The subject of this case study is the development of test plans for commissioning of a 540,000 tonne/year steel rod and bar mill in North Africa. The mill was the final stage of one of the process lines of an integrated steel plant. The product mix was round bar from 5.5mm to 12mm diameter for wire drawing, round bar from 6mm to 14mm diameter for manufacturing, and deformed bar from 6mm to 25mm diameter for construction.

Typical long products mill train

6.5.2 Challenge
The mill was a complex, semi-continuous process line comprising three main sections:
• billet reception, handling and reheating furnace;
• mill train, including finishing blocks, coil and bar runout and cooling tables;
• finished coil and bar handling, storage and dispatch.

The production assets servicing this process line were the billet reception and reheating furnace section and the finished coil and bar section. Service utilities included closed and open cooling water supplies; scale handling equipment; overhead cranes; oil cellars; compressed air, oxygen and gas supplies; substations and the electrical distribution network; and communications, control systems and lighting.

Each discrete equipment unit of the mill complex and utilities required a Cold Test Plan. The Plan included:
• a check on the correct rotation of electric motors (with couplings disconnected);
• unimpeded movement of mill equipment within design parameters;
• physical checking of isolation valves.

Each system required a Hot Test Plan. Examples included:
• physical operation of lubrication and cooling water circuits;
• firing up the billet reheating furnace;
• load testing of overhead cranes;
• operation of individual mill stands throughout their speed range, but only after successful hot testing of cooling, lubrication and other service circuits and networks.

Each of the mill sections was subjected to a separate and sequential Load Test. These started with billet reception and ran through to each of the three finished product categories, testing the minimum and maximum dimensions for each category. Once the Load Tests had been successfully completed, the whole mill complex entered the Performance Test phase. This phase was intended to prove, as far as possible within an agreed operating period, that the mill was capable of reaching the contractual 540,000 tonne/year output with the contractually defined product mix.

6.5.3 Approach
Each of the four test programmes required a different approach using a different set of input data.

Commissioning test sequence. The sequence runs as follows:
• Cold Tests on units.
• Hot Tests on systems and utilities.
• Load Tests on billet reception and reheating furnace; mill train and cooling beds; and coil and bar handling.
• Performance Tests on wire drawing, round bar and re-bar sizes/qualities.

Cold Tests
Most of the asset-related documentation provided with the mill came from sub-suppliers to the turnkey contractor. It required analysis and reformatting to create the Equipment Register (the baseline for the CMMS). The preliminary analysis allowed grouping into generic equipment types. A Cold Test Schedule was developed for each type and reviewed against the manufacturer’s technical documentation.

An FMEA approach was used to develop the Cold Test schedule. This was not a ‘full-blown’ FMEA, because that would already have been conducted during the design and manufacturing phases of the individual units. Instead, the FMEA results from these earlier phases (Functions, Functional Failures, Failure Modes) were used as a basis for developing failure/success criteria for the installed equipment. Not all equipment required a Cold Test, and some of the tests required services to be available (e.g. power, compressed air). Where practical, this was achieved by sequential cold test planning, but in most cases it required a temporary service supply. Each Cold Test Schedule clearly documented:
• What would and would not be tested
• Prerequisite equipment installation status
• How the test would be conducted, equipment and materials needed, test configurations and procedures, pass/fail criteria
• Responsibilities and the approval process
• Risks and contingencies

Results of these tests were recorded in the Equipment Register as part of the equipment history.

Hot Tests
Piping and Instrumentation Diagrams (P&IDs) were used to allocate each equipment item (the units in the commissioning test sequence diagram) to discrete systems, and a Hot Test Schedule was developed for each system. For this phase a full FMEA was carried out at ‘System’ level, with emphasis on (Failure) Criticality. Initiation of some system tests depended on completion of an ‘upstream’ system test, so the resulting test plan was set out in the form of a dependency diagram. Each Hot Test Schedule was documented in the same way as the Cold Test Schedules, with the addition of:
• Prerequisite equipment Cold Test status
• Prerequisite system Hot Test status (from dependency diagram)
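A dependency diagram for the Hot Tests is, in effect, a directed acyclic graph, so a valid test order can be derived with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to group system tests into waves. The system names and dependencies are hypothetical, loosely based on the services listed for the mill, and are not the mill's actual diagram.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency diagram: each system maps to the set of upstream
# system Hot Tests that must be passed before its own Hot Test can start.
hot_test_dependencies = {
    "electrical distribution": set(),
    "compressed air": {"electrical distribution"},
    "closed cooling water": {"electrical distribution"},
    "lubrication circuit": {"electrical distribution"},
    "reheating furnace firing": {"closed cooling water", "compressed air"},
    "mill stand operation": {"lubrication circuit", "closed cooling water"},
}


def hot_test_waves(dependencies):
    """Group system tests into waves. Every test in a wave has all of its
    upstream tests in earlier waves, so tests within a wave could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the diagram contains a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as passed, releasing dependent tests
    return waves


for number, wave in enumerate(hot_test_waves(hot_test_dependencies), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

If the diagram contains a cycle, `prepare()` raises `graphlib.CycleError`. This provides a useful check that the planned sequence can actually be executed.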


Results of these tests were recorded in the Equipment Register as part of the system history.

Load Tests
The mill train contract specification was used as a basis for designing the Load Tests. The objective was to prove that the mill could achieve the speeds, tolerances and qualities required in the contract for each of the contracted sizes. The mill train was divided into three sections for the purpose of designing Load Tests. Twelve finished sizes were listed in the contractual product mix. The initial plan was to carry out the Load Tests for each of the three sections in sequence, but in practice they were conducted as one integrated test plan.

A Load Test schedule was developed for each size listed. Each schedule included upstream tests (billet handling, reheating furnace, shears) and downstream tests (coil and bar handling and storage), and followed the same format as the Cold and Hot Tests. The FMEA approach was still valid. However, the consequence of failure was now more closely tied to failure to meet downstream operational and contractual requirements than had been the case for the Cold and Hot Tests. Provisional Acceptance of the mill was dependent on successful completion of all ten Load Test schedules.

Contractual product dimension schedule

Performance Tests
With the exception of the 6mm deformed bar, each of the listed sizes contributed to a target annual mixed finished product output of 540,000 tonnes. On successful completion of the Load Test schedules and Provisional Acceptance, the mill began an operational programme run by client management and staff, with technical support from the turnkey contractor. The aim was to achieve a steady state of production over a three month period. There were no specific Performance Test schedules for this period. Instead, the production records were analysed and the data extrapolated to support negotiation between the client and the turnkey contractor on the mill’s expectation of reaching the target annual mix.

6.5.4 Deliverables
Cold Test schedules, and test results including a report on any remedial actions taken.

Hot Test schedules, and test results including a report on any remedial actions taken.

Load Test schedules, and test results including a report on any remedial actions taken.

6.5.5 Results
Complete test schedules were developed by the turnkey contractor for Cold, Hot and Load Tests. The contractor remained totally responsible for carrying out these tests, and the client’s engineer witnessed their conduct. Client staff, who had been trained externally by the turnkey contractor at other facilities, were invited to participate in the Load Tests as part of their continued training.

The first deformed (reinforcement) bars were produced before the bar storage area of the mill had been completed, and were put to good use in the floor slab construction.
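The extrapolation of production records towards the 540,000 tonne/year target can be illustrated with a deliberately simple linear annualisation. This sketch rests on strong assumptions. The monthly tonnages are invented, and the calculation ignores product mix, ramp-up effects and planned stoppages, all of which a real negotiation would need to address.

```python
TARGET_T_PER_YEAR = 540_000  # contractual target output from the case study


def annualised_output(monthly_tonnes, months_per_year=12):
    """Naive linear extrapolation of observed monthly output to a full year."""
    if not monthly_tonnes:
        raise ValueError("need at least one month of production data")
    return sum(monthly_tonnes) * months_per_year / len(monthly_tonnes)


# Hypothetical steady-state production over a three month period (tonnes).
observed = [41_000, 43_500, 45_000]
estimate = annualised_output(observed)
shortfall = TARGET_T_PER_YEAR - estimate
print(f"Annualised: {estimate:,.0f} t, shortfall: {shortfall:,.0f} t "
      f"({100 * estimate / TARGET_T_PER_YEAR:.1f}% of target)")
```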


6.6 Case Study 6

Requirement: To improve ownership / operational standards
Intended Process: Identify relevant operation problems (‘bad actors’).
Identify operation-associated failure modes.
Engage the operating team.
Develop mitigating options.

6.6.1 Introduction
A packing line that processes nearly 80% of the world supply of a pharmaceutical product was suffering from poor performance and variable output. The data showed that the carton and machine interface was affecting both availability and performance. Availability suffered through downtime (for example, rectifying problems such as carton jams), and performance suffered because the cartoning machine had to be run more slowly.

Figure 29: Cartoning Machine

6.6.2 Challenge
This production line involves a considerable amount of manual operation, and the cartoning machine is quite old and fairly low-tech. There was a perception amongst the operating team that the machine or the carton needed re-designing.

6.6.3 Approach
Through discussion with the Process Line Manager, Plant Manager and Engineering Manager, it was agreed to use an RCM process. Operators, Technicians and Quality Assurance Associates were trained in the process and its history, and learnt about age-related and random failure.

The RCM studies identified tasks that the operating team could undertake to improve performance and to spot issues early, so that downtime could be minimised and interventions planned.

The output, along with a session on age-related and random failure, was presented to the whole operating team. The tasks were implemented through daily and weekly inspection checklists carried out by the operating team. The team modified and updated these documents themselves, which increased their ownership of the process.

6.6.4 Deliverables
• Awareness and understanding about asset care and reliability within the Operating Team.
• Increased level of technical understanding amongst asset care practitioners (Lead Operators and Technicians).
• RCM study documenting the process and learning.
• Work instruction and back up sheets for Operator Asset Care tasks.
• Re-design activities with a tangible risk reduction scoring.

Downtime (CAM and carton build), in minutes, from June 2011 to November 2013, with the point at which the study was complete marked.
Addressing failure: redesign 48%, traditional time based 20%, on condition 10%, spares 6%, not addressed 16%.
• 84% of identified failure modes have been addressed.
• Traditional maintenance would only have addressed 30% at best.
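The failure-mode breakdown in the chart for this study can be checked with simple arithmetic: the addressed share is everything except the "not addressed" slice. The dictionary below reproduces the chart percentages. The assignment of the 10% and 16% slices is inferred from the 84% total quoted alongside the chart, and the function name is illustrative.

```python
# Percentage of identified failure modes by mitigation route (from the chart).
shares = {
    "redesign": 48,
    "traditional time based": 20,
    "on condition": 10,
    "spares": 6,
    "not addressed": 16,
}


def addressed_share(breakdown):
    """Share of failure modes with any mitigation route, in percent."""
    return sum(pct for route, pct in breakdown.items() if route != "not addressed")


assert sum(shares.values()) == 100  # the slices should cover every failure mode
print(f"{addressed_share(shares)}% addressed")  # 84% addressed
```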

6.6.5 Results
The packing line performance has not only improved but has also become more consistent, allowing more confidence in supply.

Feedback from the operating team was fantastic. When the Plant Manager visited a weekly performance review meeting and asked what was contributing to the improved, consistent performance, the operating team passionately replied: “It’s all since we introduced the new daily and weekly cleaning and inspection tasks.”

6.7 Case Study 7

Requirement: To increase reliability of an asset
Intended Process: Identify failure modes from asset history.
Develop mitigating options.

Figure 30: One of two identical cranes

6.7.1 Introduction
Reliability Centred Maintenance (RCM) is the application of a structured method to establish the optimum preventative maintenance for a given asset (system or equipment) in its unique operating environment. It is an effective, proven methodology for rationalising legacy maintenance and for deriving optimised maintenance for new equipment. It begins by identifying the performance requirements of the equipment, the ways in which the equipment fails and the plausible root causes of failure. It then details the effects and consequences of each failure. This allows the probability of occurrence and the severity of each failure to be assessed, and identifies significant safety, environmental, operational or cost consequences.

The methodology allows the selection of an appropriate maintenance task that addresses each identified failure. RCM provides a fully auditable decision-making process for determining the maintenance requirements of an asset, and as such is an effective first component in any process to determine maintenance requirements.

6.7.2 Challenge
To identify the most appropriate, cost effective maintenance regime for the crane with no detrimental safety, availability or economic impact.

LSC Group was requested to conduct an RCM assessment of the current maintenance for the crane and to identify the most appropriate and cost effective maintenance regime, with no detrimental safety, availability or economic impact.

6.7.3 Approach
An RCM analysis was carried out to identify the various alternative solutions for managing the crane maintenance schedule.

Working alongside DML (now Babcock Marine), LSC Group supplied the RCM expertise and was responsible for recording the analysis and reporting the results. DML supplied the appropriate engineering expertise, allowing the RCM analysis to be conducted on a team basis. This partnership brought together the following attributes:
• Knowledge and experience of the RCM process.
• Detailed knowledge of the appropriate design features, installation and commissioning.
• Knowledge of how the crane is used, operated, maintained and supported.
• Knowledge of the condition of the crane and its components at overhaul, including understanding of the actual failure modes and their effects.
• Specialist knowledge of constraining influences, e.g. Health and Safety, environmental legislation, regulatory bodies, etc.

Detailed technical data was required to conduct the analysis and was obtained from manufacturer’s handbooks, assembly drawings and wiring diagrams. Available failure data was reviewed to ensure that all failures that had previously occurred were addressed. Maintenance records were reviewed to give an indication of the condition of equipment after use. Where historical failure data was not available, the engineering judgement of the team, with its thorough knowledge of the equipment, was used. Site visits were essential. They allowed the team to understand the physical attributes of the crane, appreciate the difficulties experienced by operators and maintainers, and assess the environmental conditions under which the crane operated.

6.7.4 Deliverables
The following deliverables were provided in the form of an RCM Analysis Report:

• Operating Context Statement
This statement provided a physical and functional description of the crane and described the physical environment in which the crane was operated. It also provided precise details of the manner in which the crane was used and specified quantitative performance requirements.

• Analysis Comments Report
This report provided details of the documents used and referenced during the analysis, identified the individuals involved and listed any assumptions made throughout the analysis.

• FMECA / Consequence and Task Analysis
These outputs recorded the results of the analysis and detailed the justification for maintenance task selection and frequency. The analysis for the crane identified 66 functions, 132 functional failures and 694 engineering failure modes. A total of 180 scheduled maintenance tasks were derived, of which 54 were new tasks that had not previously been carried out. Many of the existing maintenance tasks were not justified by the RCM analysis, and an auditable rationale was provided to support their removal from the maintenance schedule. Two ‘Mandatory Redesigns’ were recommended to address engineering failure modes with safety consequences, and a ‘Desirable Redesign’ was recommended to prevent economic consequences.

• Planned Maintenance Schedule
This schedule listed the routine preventative maintenance tasks required to support the crane in its operating environment for the duration of its anticipated life cycle (30 years). A move to a calendar based maintenance cycle was recommended in preference to the existing operating hours based cycle. This makes the maintenance activities easier to manage and plan, and increases crane availability.

• Maintenance Comparator Report
This report allowed a direct comparison between the existing maintenance schedule and the RCM-derived maintenance schedule. The comparator showed a 43% reduction in the maintenance effort required to support the crane. This reduction was possible because the man-hours needed to meet the requirements of the RCM-derived maintenance schedule were significantly less than the man-hours required to complete the existing schedule.

When approval is sought from the regulatory body for a design change notification as a result of modifying the maintenance schedule, the RCM Analysis can be used as the supporting justification and validation document. The level of detail in the analysis encompasses and far exceeds that provided in existing procedures and reports, ensuring that all maintenance activity considerations have been addressed.

To reinforce and directly support the development and maturity of the Integrated Schedule, an LSC Group risk management specialist was tasked with updating the current Risk Register. Working with the alliance, the LSC Group specialist helped to establish a coherent, comprehensive and forward looking Risk Register, defining the key risks and mitigation actions.

Through collaboration, the team were able to produce a Risk Register that clearly identified short, medium and long term risks. Through a number of workshops and interactive working sessions, the LSC Group specialist helped facilitate the definition of all existing and future risks so that they were properly defined, measured and understood. The specialist also ensured that every risk had a mitigation action owner.

With improvements identified, the team could work towards driving them forward, ensuring that risk management and scheduling activity would be more coherently integrated as part of project controls. The assessment of the project risks and the resulting update of the Risk Register enabled senior management to make better business decisions, adding benefit and increasing levels of delivery success.

6.7.5 Results
The RCM Analysis has identified the appropriate maintenance tasks to ensure safe, reliable and economical use of the crane throughout its life. The analysis has justified the removal of existing maintenance tasks from the schedule. It has also justified new tasks to prevent adverse safety, availability or economic consequences. All of this was done in a structured, auditable manner, and the net result is greater equipment availability at a significantly reduced cost.

The RCM-derived maintenance schedule identified reductions of 725 man-hours per year compared to the current regime. When applied to a second identical crane, this gave a potential saving of £1.3 million through the anticipated life of the cranes, with the RCM analysis costs being recouped within the first year.

Further cost savings are possible by using this analysis as a template to produce maintenance schedules for other cranes. Some level of revisit would be anticipated, depending on the type of crane and its location.

6.8 Case Study 8

Requirement: To increase reliability of an asset
Intended Process: Identify failure modes from asset history.
Develop mitigating options.

6.8.1 Introduction
Commissioned in 1998, HMS Ocean has been deployed on a range of military operations. Like many platforms, it has a complex set of equipment, systems and subsystems on board. HMS OCEAN, like the majority of Royal Navy platforms, is maintained using Reliability Centred Maintenance (RCM) methodology.

RCM is the UK Ministry of Defence (MOD) mandated maintenance policy for future platforms, major equipment procurements and selected in-service platforms. As a philosophy, RCM focuses on optimising availability through preventative maintenance. It aims to realise an asset’s inherent reliability and so deliver maximum operational availability, safely.

Figure 31: HMS Ocean

6.8.2 Challenge
More hours of maintenance were being demanded than Ship's Staff had the capacity to deliver. There was therefore a requirement to reduce the maintenance burden whilst maintaining equipment safety and delivering the necessary operational capability.

For HMS OCEAN, putting the required maintenance theory into practice was resulting in an inability to complete all the necessary maintenance. Factors such as training, manning issues and changing ship operating profiles affected the capacity and capability of Suitably Qualified and Experienced Personnel (SQEP) to complete the maintenance. Furthermore, delays in feeding back experience from defect and repair events to optimise future maintenance were causing the problems to continue and grow.
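The RCM task selection step in these case studies can be illustrated with a much simplified decision sequence. The sketch prefers an on-condition task where a potential failure can be detected. Failing that, it chooses scheduled restoration or discard for age-related failure, then failure-finding for hidden failures. Where no task applies and the consequences are safety related, redesign becomes mandatory. This ordering follows the general shape of common RCM decision logic but is an assumption here, not LSC Group's procedure. The class names, fields and example failure modes are hypothetical, and a real analysis would also check whether failure-finding alone is adequate for hidden safety failures.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    SAFETY_ENVIRONMENTAL = "safety/environmental"
    OPERATIONAL = "operational"
    ECONOMIC = "economic"


@dataclass
class FailureMode:
    description: str
    consequence: Consequence
    hidden: bool = False                        # failure not evident to operators
    detectable_potential_failure: bool = False  # warning condition can be monitored
    age_related: bool = False                   # clear wear-out behaviour
    worth_doing: bool = True                    # task reduces risk or cost enough


def select_task(fm: FailureMode) -> str:
    """Simplified RCM-style task selection for a single failure mode."""
    if fm.detectable_potential_failure and fm.worth_doing:
        return "on-condition task"
    if fm.age_related and fm.worth_doing:
        return "scheduled restoration or discard"
    if fm.hidden:
        return "failure-finding task"
    if fm.consequence is Consequence.SAFETY_ENVIRONMENTAL:
        return "mandatory redesign"
    return "no scheduled maintenance (redesign may be desirable)"


# Hypothetical crane failure modes.
modes = [
    FailureMode("hoist rope wear", Consequence.SAFETY_ENVIRONMENTAL,
                detectable_potential_failure=True),
    FailureMode("overload cut-out fails to operate",
                Consequence.SAFETY_ENVIRONMENTAL, hidden=True),
    FailureMode("slew brake seizure", Consequence.SAFETY_ENVIRONMENTAL),
    FailureMode("cab light failure", Consequence.ECONOMIC),
]
for fm in modes:
    print(f"{fm.description}: {select_task(fm)}")
```

In a full analysis, each branch would also record the justification and the task interval. That record is what makes the RCM decision process auditable.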


It was recognised that an increasing number of unreliable, high maintenance systems were generating more work than the Ship's Company had the capacity to complete. This was further affecting the availability and sustainability of all ship systems.

6.8.3 Approach
A prioritised schedule of maintenance activity was developed using proven RCM methodology to achieve the required reduction in Ship's Staff maintenance hours.

LSC Group, which had worked with HMS OCEAN since 1999, was chosen to help reduce the maintenance burden. The target was to reduce Ship's Staff maintenance by more than 20% through the identification and completion of maintenance reviews for a range of equipment.

The purpose of the maintenance review was to investigate whether the current Ship's Staff maintenance burden could be reduced by improving the process for aligning maintenance intervals with anticipated failure modes. The review also looked at the potential use of innovative technology and Condition Monitoring techniques.

6.8.4 Maintenance Analyses
Working closely with the MOD and Babcock team, LSC Group specialists conducted a programme of analyses to identify those tasks that generate high maintenance load for both the Weapons and Marine Engineering departments.

• Operating Context Review: to check that equipment is operated and analysed in the way it had been identified to operate.
• FMECA: identifying the failure mode and effect, reviewed against the function and functional failure.
• Task Review: ensuring that the task manages the failure mode that has been identified. This would address the issues associated with equipment and systems not currently being operated in accordance with RCM philosophy.

A total of twenty one analyses associated with six high failing systems were identified. The maintenance schedules were then revalidated using the process defined above.

6.8.6 Innovative Technologies for Condition Monitoring
During the review, LSC Group identified a number of Condition Monitoring techniques and technologies that would reduce the maintenance burden on Ship's Staff. Improvements included:
• Continuous monitoring of exhaust emissions: this can be used to identify potential failures on Internal Combustion Engines through trend analysis of exhaust gases.
• Automated chlorine monitoring to test water: this reduces the need to complete routine, daily maintenance.
• Fire hydrants on upper deck: lubrication was reduced by changing the maintenance process.

It was quickly identified that 83 individual tasks The use of On Condition Tasks and Condition
(some 5% of the overall number of maintenance Monitoring would enable intervention to be planned
tasks) generated 60% of the maintenance load. before a predicted failure could occur. Furthermore
the new technologies would significantly reduce the
6.8.5 Re-usable Maintenance Support Processes time required to complete maintenance tasks.
A critical focus of the project was to ensure that
re-usable processes to support maintenance reviews 6.8.7 Deliverables
were developed using RCM methodology. LSC Group recommended a series of process,
maintenance and condition monitoring improvements
LSC Group looked to incorporate mission profiles that would significantly reduce the man hours required
and readiness states to highlight which systems and to complete maintenance.
equipment should be reviewed. • A total of twenty one analyses associated with six
high failing systems were identified.
Targeting a reduction in ship staff hours and tasks, • 83 high loading maintenance tasks identified and
LSC Group developed a rolling Maintenance Review proposed changes would reduce from ~19000 to
process for In-Service systems including: 8800 hours per year.


• 213 changes were recommended to maintenance schedules.
• Average maintenance hours per year were anticipated to reduce by 46% (Marine Engineering Department) and 14% (Weapons Engineering Department).
• Introduction of Condition Monitoring techniques would enable intervention to be planned before a predicted failure could occur:
- Continuous monitoring of exhaust emissions – early detection of possible failures.
- Automated chlorine monitoring to test water – reducing daily maintenance from 2 hours of work for 1 person per day to an automatic daily check with alarms, requiring little human intervention.
- Fire hydrants on upper deck – reducing lubrication effort by changing the maintenance process.
• Data cleansing of key maintenance databases, reducing errors and re-baselining the data required to support maintenance decision-making.
• Introduction of regular review cycles to support the review process and ensure engagement of the stakeholder community.
• Stakeholder engagement to ensure that revised maintenance schedules are understood, approved and implemented.

Overall, the maintenance review could reduce the maintenance burden on the platform, improving planning, reducing risk and maintaining the required level of capability and availability of the platform.

6.8.8 Results
The overall objective of the maintenance review was to address the balance between capacity and capability to undertake the necessary maintenance activity. The review predicted a 20% reduction in the Ship's Staff man-hours required to maintain key equipment onboard HMS OCEAN, whilst continuing to meet the required availability levels through optimised maintenance schedules. These maintenance schedules continue to deliver the necessary defensibility and auditability and meet the evolved operational context.

The processes developed through this review could be applied to any current and future Royal Navy (RN) and Royal Fleet Auxiliary (RFA) platforms. This study has proven that intelligently and rigorously applied RCM provides optimised and defensible maintenance to maximise equipment and platform reliability.
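Section 6.8.4 reports that 83 tasks, some 5% of the total, were found to generate 60% of the maintenance load. That screening step is in effect a Pareto analysis. Tasks are ranked by annual hours, and the smallest set is kept whose cumulative hours reach a chosen share of the total. The Python sketch below illustrates this ranking under stated assumptions. The function name `pareto_screen`, the 0.6 default share and the task data in the usage lines are hypothetical. None of them come from the HMS OCEAN review.

```python
from typing import Dict, List, Tuple


def pareto_screen(task_hours: Dict[str, float],
                  load_share: float = 0.6) -> List[Tuple[str, float]]:
    """Return the smallest set of tasks, highest annual hours first, whose
    combined hours reach at least `load_share` of the total maintenance load.

    Hypothetical helper for illustration: task_hours maps a task identifier
    to its annual maintenance hours.
    """
    if not 0 < load_share <= 1:
        raise ValueError("load_share must be in the range (0, 1]")
    total = sum(task_hours.values())
    target = load_share * total
    # Rank tasks from highest to lowest annual maintenance hours.
    ranked = sorted(task_hours.items(), key=lambda item: item[1], reverse=True)
    selected: List[Tuple[str, float]] = []
    cumulative = 0.0
    for task, hours in ranked:
        if cumulative >= target:
            break  # the chosen share of the load is already covered
        selected.append((task, hours))
        cumulative += hours
    return selected


if __name__ == "__main__":
    # Hypothetical annual hours per task, for illustration only.
    hours = {"T1": 500, "T2": 300, "T3": 100, "T4": 50, "T5": 30, "T6": 20}
    top = pareto_screen(hours, 0.6)
    covered = sum(h for _, h in top) / sum(hours.values())
    print(f"{len(top)} of {len(hours)} tasks carry {covered:.0%} of the load")
```

Running the example prints `2 of 6 tasks carry 80% of the load`. A real screen would use recorded task hours, so its result is only as good as the underlying maintenance data.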


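Section 6.8.6 describes identifying potential engine failures through trend analysis of exhaust gases, so that intervention can be planned before a predicted failure occurs. One simple way to do this is to fit a least-squares straight line to recent condition readings and project when the line will cross an alarm limit. This is shown here as a sketch, not as the method used on HMS OCEAN. The following are all hypothetical:
- the function `projected_crossing_time`;
- the choice of a linear trend;
- the assumption that a rising reading means deterioration;
- the temperature values and 450 °C limit in the usage lines.

```python
from typing import Optional, Sequence


def projected_crossing_time(times: Sequence[float],
                            readings: Sequence[float],
                            limit: float) -> Optional[float]:
    """Fit an ordinary least-squares line to (time, reading) pairs and return
    the time at which the fitted line reaches `limit`.

    Returns None when the fitted trend is flat or falling, on the hypothetical
    assumption that only a rising reading indicates deterioration.
    """
    n = len(times)
    if n != len(readings) or n < 2:
        raise ValueError("need at least two paired (time, reading) values")
    mean_t = sum(times) / n
    mean_r = sum(readings) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be identical")
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, readings))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_t
    if slope <= 0:
        return None  # no upward trend, so no projected crossing
    return (limit - intercept) / slope


if __name__ == "__main__":
    # Hypothetical exhaust gas temperatures (deg C) at running hours 0 to 30.
    hours = [0.0, 10.0, 20.0, 30.0]
    temps = [400.0, 405.0, 410.0, 415.0]
    crossing = projected_crossing_time(hours, temps, limit=450.0)
    if crossing is not None:
        lead_time = crossing - hours[-1]
        print(f"Limit projected at {crossing:.0f} h; {lead_time:.0f} h to plan intervention")
```

The example prints `Limit projected at 100 h; 70 h to plan intervention`. A negative lead time would mean the fitted line has already passed the limit. A straight-line fit is the simplest choice; real exhaust gas data may need filtering or a non-linear model.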
7 References
1. The term ISO 5500x is used generically in this document to refer to the family of standards comprising ISO 55000, ISO 55001 and ISO 55002, unless specific reference is made to a section within one of those standards.
2. Clause 4, PAS 55
3. Asset Management – an Anatomy Version 2, July 2014, Institute of Asset Management
4. Second Edition, March 2014
5. Updated to reflect Version 2 of the Anatomy
6. An Anatomy of Asset Management Issue 3 July 2014
7. An Anatomy of Asset Management Issue 3 July 2014
8. ISO 55000, John Woodhouse, Chairman, Experts Panel, Institute of Asset Management
9. Ibid
10. Maturity Models 101: A Primer for Applying Maturity Models to Smart Grid Security, Resilience, and
Interoperability, Software Engineering Institute, 2012, Caralli, Knight, Montgomery
11. Ibid
12. Ibid
13. https://theiam.org/products-and-services/pas55-methodology
14. https://theiam.org/products-and-services/Self-Assessment-Methodology
15. Source - Warwick Manufacturing Group
16. Asset Management – an anatomy Version 3, July 2014, Institute of Asset Management
17. Practical Reliability Tools for Refineries and Chemical Plants (Barringer)
18. Due to be superseded soon by IEC 60050-192
19. US National Infrastructure Advisory Council 2010, pg. 15.
20. IAM Knowledge Center
21. Ibid
22. Australia, Critical Infrastructure Resilience Strategy, 2010
23. https://wiki.ittc.ku.edu/resilinets/Main_Page#ResiliNets_Wiki
24. Measurement Frameworks and Metrics for Resilient Networks and Services: Technical report, ENISA,
February 2011
25. Modelling and Analysis of Network Resilience, IEEE COMSNETS, Bangalore, India, 2011
26. https://wiki.ittc.ku.edu/resilinets/Main_Page#ResiliNets_Wiki
27. Note that the P-F curve is drawn in relation to operating age and some of the tools discussed will be used
in design i.e. before the operating phase commences.
28. Practical Reliability Engineering, Chapter 9, O’Connor
29. http://www.reliabilityweb.com/art08/7_questions_rcm.htm
30. For those used to looking at risk as the product of probability and consequence/severity, this is covered in the 'CA' (criticality analysis) of the FMECA and will likely have an associated criticality matrix.
31. O’Connor, Chapter 15, Reliability management, Practical Reliability Engineering, John Wiley
32. NOTE: A search for RCM and related publications on the Internet reveals appearances in a plethora of
guises - hyphenated and not hyphenated, Centered and Centred, capitalised and lower case. In this SSG
we have adopted RCM as the generic term to cover all of the above variations.
33. Asset Management – an anatomy, IAM
34. Adapted from International Infrastructure Management Manual V2, 2002
35. The system has no output at all
36. The system is operating, but below the required standard
37. Nonelectronic Parts Reliability Data (NPRD) provides failure rate data for a wide variety of component
types including mechanical, electromechanical, and electronic assemblies. It provides summary and
detailed data sorted by part type, quality level, environment and data source.


38. http://www.isograph.com/software/hazop/
39. SAE JA1012 “A Guide to the Reliability-Centered Maintenance (RCM) Standard,” January 2002
40. Source: NetworkRail
41. Asset Management – An Anatomy V3, IAM, July 2014
42. This paper lists further recommended reading as a Bibliography.

43. Note that diagrams showing the "pillars of TPM" often show 5, 6, 7, 8 or more pillars, depending on the source.
44. Breakdowns of each of these into more granular benefits can be found in the Life Cycle Value Realisation SSG document.

With thanks to our sponsors whose support is greatly appreciated

Sponsorship helps cover the costs of design, check and production processes. Sponsors are not in any way linked to editorial control or input to the IAM’s published content.
Sponsorship of an SSG is not evidence, and must not be used to claim or imply, that the sponsoring organisation is regarded by the IAM as competent or expert in these areas
and the use of the above logos does not imply endorsement in any way by the IAM.

Institute of Asset Management


ISBN 9781908891259
St. Brandon’s House
29 Great George Street,
Bristol, BS1 5QT
United Kingdom

T: +44 (0)8454 560 565


E: office@theIAM.org
www.theIAM.org