
REPORT 434-20, SEPTEMBER 2019

RISK ASSESSMENT DATA DIRECTORY

Guide to finding and using reliability data for QRA

Acknowledgements
Safety Committee

Photography used with permission courtesy of


©Opla/iStockphoto and Harald Pettersen / © Equinor (Front cover)
©Photo_Concepts/iStockphoto (Back cover)

Feedback

IOGP welcomes feedback on our reports: publications@iogp.org

Disclaimer

Whilst every effort has been made to ensure the accuracy of the information
contained in this publication, neither IOGP nor any of its Members past present or
future warrants its accuracy or will, regardless of its or their negligence, assume
liability for any foreseeable or unforeseeable use made thereof, which liability is
hereby excluded. Consequently, such use is at the recipient’s own risk on the basis
that any use by the recipient constitutes agreement to the terms of this disclaimer.
The recipient is obliged to inform any subsequent recipient of such terms.

This publication is made available for information purposes and solely for the private
use of the user. IOGP will not directly or indirectly endorse, approve or accredit the
content of any course, event or otherwise where this publication will be reproduced.

Copyright notice

The contents of these pages are © International Association of Oil & Gas Producers.
Permission is given to reproduce this report in whole or in part provided (i) that
the copyright of IOGP and (ii) the sources are acknowledged. All other rights are
reserved. Any other use requires the prior written permission of IOGP.

These Terms and Conditions shall be governed by and construed in accordance


with the laws of England and Wales. Disputes arising herefrom shall be exclusively
subject to the jurisdiction of the courts of England and Wales.

Revision history

VERSION DATE AMENDMENTS

1.0 September 2019 First release



Contents

Abbreviations 5

1. Scope and application 6


1.1 Scope 6
1.2 Application 6
1.3 Definitions 6

2. Summary of recommended data 10


2.1 Data Sources 10
2.2 Sources of Reliability Data 10

3. Guidance on use of data 13


3.1 Introduction 13
3.2 Failure Rate Calculation 14
3.3 Calculation of On-Demand Failure Probability 24
3.4 Guidance Specific to the OREDA Handbook 24

4. Review of data sources 27


4.1 OREDA Database and Handbook(s) 27
4.2 MIL-HDBK-217F 30
4.3 FIDES 30
4.4 EPRD-97 and NPRD-95 31
4.5 PDS Data Handbook 31
4.6 FARADIP III 31
4.7 IEEE 493-2007 31
4.8 Sintef Reports, SubseaMaster and WellMaster 32

5. Typical safety systems reliability 33


5.1 Process Isolation Systems 33
5.2 Fire Safety and Passive Safety Systems 36
5.3 Refuges and Shelters 39

6. Recommended data sources for further information 40

7. References 41


Abbreviations

ALARP	As Low As Reasonably Practicable
BOP	Blow Out Preventer
BPCS	Basic Process Control System
C&E	Cause and Effect (diagram)
CCF	Common Cause Failure
CCPS	Centre for Chemical Process Safety
CMF	Common Mode Failure
DAL	Design Accident Load
DC	Diagnostic Coverage
DCS	Distributed Control System
E&P	Exploration and Production
E/E/PE	Electrical/Electronic/Programmable Electronic
ESD	Emergency Shutdown
EUC	Equipment Under Control
FGS	Fire and Gas System
FIT	Failure in Time
FMEDA	Failure Modes Effects and Diagnostics Analysis
FTA	Fault Tree Analysis
GRP	Glass Reinforced Plastic
HFT	Hardware Fault Tolerance
HSE	Health and Safety Executive (UK)
I/O	Input/Output
IEC	International Electrotechnical Commission
IOGP	International Association of Oil & Gas Producers
IPF	Instrumented Protective Function
IPL	Independent Protection Layer
LOPA	Layer of Protection Analysis
MooN	M out of N
MRT	Mean Repair Time
MTBF	Mean Time Between Failures
MTTF	Mean Time to Failure
MTTFS	Mean Time to Fail Spurious
MTTR	Mean Time to Restoration
NFPA	National Fire Protection Association
P&ID	Piping and Instrumentation Drawing
PFD	Process Flow Diagram
PFDavg	Average Probability of Failure on Demand
PFH	Probability of Failure per Hour
PFHD	Probability of Dangerous Failure per Hour
PLC	Programmable Logic Controller
PRV	Pressure Relief Valve
PSV	Pressure Safety Valve
PTI	Proof Test Interval
QRA	Quantitative Risk Assessment
RRF	Risk Reduction Factor
SCSSV	Surface Controlled Subsurface Safety Valve
SFF	Safe Failure Fraction
SIF	Safety Instrumented Function
SIL	Safety Integrity Level
SIS	Safety Instrumented System
SRS	Safety Requirements Specification
TEMSC	Totally Enclosed Motor Propelled Survival Craft
TR	Temporary Refuge


1. Scope and application

1.1 Scope
The reliability of Instrumented Protective Functions (IPFs) is a key input to Quantitative
Risk Assessment (QRA) of hydrocarbon exploration and production facilities. The IPFs
include systems such as the Fire and Gas System (FGS), Emergency Shutdown (ESD) and
blowdown systems, High Integrity Pressure Protection System (HIPPS), Process Shutdown
(PSD) systems and blowout prevention.

This datasheet provides guidance on obtaining, selecting and using reliability data for these
systems and for their component parts, for use in QRA.

1.2 Application
This datasheet contains specimen data taken from previous OGP datasheets; this specimen
data is presented in the appendices. In addition, the recommended data sources that are
identified in section 2.2 should be consulted to ensure that all data are the most up to date
and relevant for any analysis. Guidance on using and processing data is given in Section 3.

The data presented are applicable to activities in support of operations within exploration
for and production of hydrocarbons.

1.3 Definitions
For the purposes of this document, the following terms and definitions apply.

Common Cause Failures Concurrent failures of different devices, resulting from a single
event, where these failures are not consequences of each
other. These failures may occur at the same time or within a
short time of each other.

Common cause failures are dependent failures and may occur


due to external events, systematic fault, human error etc.
Common Mode Failures Concurrent failures of different devices characterised by the
same failure mode. Common mode failures can have the same
or different underlying causes.
Critical Failure Failure of an equipment unit that causes an immediate
cessation of the ability to perform a required function.
Dangerous Failure Failure which tends to impede a given safety action. A failure is
dangerous only regarding a given safety action.


Detected/Revealed/Overt Failure In relation to hardware and software, failures or faults which
are not hidden because they announce themselves or are discovered through normal operation
or through dedicated detection methods (e.g., diagnostics, operator intervention and tests).

Revealed is mainly used for failures or faults which announce themselves when they occur.

Detected is mainly used for failures or faults which do not announce themselves when they
occur and remain hidden until detected.
Failure The inability of an equipment unit or system to perform a
specified function.
Failure Mode The way that a device fails. These ways are generally grouped
into one of four failure modes: Safe Detected (SD), Dangerous
Detected (DD), Safe Undetected (SU), and Dangerous
Undetected (DU).

A dangerous failure is a failure which impedes or disables a


given safety function.

A safe failure favours a given safety function.


Failure Rate The number of failures per unit time for a piece of equipment.
Usually assumed to be a constant value. It can be broken down
into several categories such as safe and dangerous, detected
and undetected, and independent/normal and common
cause. Care must be taken to ensure that burn in and wear
out are properly addressed so that the constant failure rate
assumption is valid.

Note: In some cases, time can be replaced by units of use.


In most cases, the reciprocal of MTTF can be used as the
predictor for the failure rate, i.e., the average number of
failures per unit of time in the long run if the units are replaced
by an identical unit at failure.
Fault Abnormal state that may cause a reduction in, or loss of, the
capability of a device to perform a required function.
Hidden Failure A failure that is not revealed to operation or maintenance
personnel and that needs a specific action (e.g., periodic test)
to be identified.


Mean Repair Time (MRT) Expected overall repair time. This includes the following:
• Time spent before performing a repair (after a fault has
been detected); i.e., acquiring resources, and/or spare
parts, including potential logistic delays, and taking the
system out of service, including time required for insulation
removal, scaffolding preparation, warm up/cool down &
drying time;
• Time spent to perform the repair; and
• Time spent to put the repaired component back into
operation, i.e., time to restore all the necessary insulation,
warm up/cool down & dry out, test the repaired component,
scaffolding removal (if intrusive), etc.
Mean Time Between Predicted time between failures of a system during operation.
Failures (MTBF) The MTBF assumes that the failed system is immediately
repaired (mean time to repair, or MTTR), as a part of a renewal
process. This contrasts with the mean time to failure (MTTF),
which measures average time to failure if the system is not
repaired (infinite repair time).

MTBF = MTTF + MTTR


Mean Time to Failure The average amount of time until a system fails or its
(MTTF) “expected” failure time. Please note that the MTTF can be
assumed to be the inverse of failure rate (λ) for a series of
components, all of which have a constant failure rate for the
useful life period of the components.

MTTF = MTBF - MTTR.


Mean Time to Expected overall time taken to fully restore the system. This
Restoration (MTTR) includes:
• The time taken to detect and diagnose the failure;
• Time spent before performing a repair (after a fault has
been detected), i.e., acquiring resources, and/or spare
parts, including potential logistic delays, and taking the
system out of service, including time required for insulation
removal, scaffolding preparation, warm up/cool down &
drying time;
• Time spent to perform the repair; and
• Time spent to put the repaired component back into
operation, i.e., time to restore all the necessary insulation,
warm up/cool down & dry out, test the repaired component,
scaffolding removal (if intrusive), etc.

Note: MTTR is not necessarily the sum of the above listed


delays, but it must consider the possible overlaps between them.


MooN SIS, or part thereof, made up of “N” independent channels, which are so connected
that “M” channels are sufficient to perform the SIF.
Non-critical failure Failure of an equipment unit that does not cause a cessation
of the ability to perform a required function (also known as no-
effects failure).
Random Failure A failure occurring at a random time, which results from
one or more degradation mechanisms. Random failures
can be effectively predicted with statistics and are the basis for the
probability of failure on demand calculations required for safety
integrity levels.
Redundancy Use of multiple devices or systems to perform the same
function. Redundancy can be implemented by identical devices
or diverse devices.
Reliability 1) The probability that a device will perform its objective
adequately, for the period specified, under the operating
conditions specified.
2) The probability that a component, piece of equipment or
system will perform its intended function for a specified
period, usually operating hours, without requiring corrective
maintenance.
Safe Failure Failure which tends to favour a given safety action. A safe
failure in a non-redundant system leads to a spurious trip. In a
system with redundancy, it leads to operation where the safety
action is available but with a lower probability of failure on
demand.
Safety Integrity Ability of the SIS to perform the required SIF as and when
required.
Safety Integrity Level Discrete level (one out of four – SIL 1 to SIL 4) allocated to
the SIF for specifying the safety integrity requirements to be
achieved by the SIS.


2. Summary of recommended data

2.1 Data Sources


Where guideline values are given in the Appendices, these are taken from sources that are
either in the public domain or from pre-existing IOGP datasheets. It is strongly advised that in
all analyses the best available data are taken from the relevant source as listed in section 4.

2.2 Sources of Reliability Data


There are numerous sources of reliability data. Figure 2.1 shows the various categories into
which reliability data are usually divided. It is generally recommended to select the
database considered most appropriate for the equipment type.

Figure 2.1: Reliability Database Categories – reliability databases are commonly grouped into
Machinery (rotating machines, general machinery), Electrical & Electronic Parts (electrical
components, electronic parts, non-electronic parts), Mechanical Parts (valves, process vessels)
and Human Reliability.

2.2.1 List of Data Sources


The recommended sources of reliability data are presented in Table 2-1.

Table 2-1: Reliability Data Literature Sources

Data Source | Scope/Equipment | Publisher | Available From

OREDA Handbook 2015, 6th Edition – Volume 1 | Topsides Equipment | OREDA Participants | http://www.oreda.com/
OREDA Handbook 2015, 6th Edition – Volume 2 | Subsea Equipment | OREDA Participants | http://www.oreda.com/
MIL-HDBK-217F, 1991, Reliability Prediction of Electronic Equipment | Electronic Equipment | Department of Defense, United States of America | http://www.weibull.com/
MIL-HDBK-217F Notice 1, 1992, Reliability Prediction of Electronic Equipment | Electronic Equipment | Department of Defense, United States of America | http://www.weibull.com/
MIL-HDBK-217F Notice 2, 1995, Reliability Prediction of Electronic Equipment | Electronic Equipment | Department of Defense, United States of America | http://www.weibull.com/
Nonelectronic Parts Reliability Data – NPRD-2016 | Electrical Assemblies and Electromechanical/Mechanical Parts | Quanterion Solutions Incorporated/RIAC | RMQSI Knowledge Centre
Electronic Parts Reliability Data – EPRD-2014 | Electronic Components | Quanterion Solutions Incorporated/RIAC | RMQSI Knowledge Centre
Failure Mode/Mechanism Distributions FMD-2016 | Electrical, Mechanical and Electromechanical Parts | Quanterion Solutions Incorporated/RIAC | RMQSI Knowledge Centre
PDS Data Handbook, Reliability Data for SIS, 2013 Edition | Sensors, Detectors, Valves and Logic Solvers | SINTEF | https://www.sintef.no/projectweb/pds-main-page/
Failure Rate Data In Perspective (FARADIP), Three | Electrical, electronic, mechanical, pneumatic, instrumentation and protective devices | Technis | http://www.technis.org.uk/
493-2007 – IEEE Recommended Practice for the Design of Reliable Industrial and Commercial Power Systems | Electrical power generation and distribution equipment | IEEE Standards Association | http://www.techstreet.com/ieee
WellMaster Reliability Management System (RMS) | Well and subsea equipment | Exprosoft | http://www.exprosoft.com/
WellMaster BOP | Blow Out Preventers | Exprosoft | http://www.exprosoft.com/
Electrical & Mechanical Component Reliability Handbook, 3rd Edition | Electrical and Mechanical Components | Exida | http://www.exida.com/Books
Safety Equipment Reliability Handbook (SERH), 4th Edition | Vol. 01: Sensors; Vol. 02: Logic Solvers; Vol. 03: Final Elements | Exida | http://www.exida.com/Books
Human Factors in Reliability, Handbook (Edited by W.G. Ireson), pp 12.2–12.37, McGraw-Hill, New York, 1966 | Human factors | McGraw-Hill | McGraw-Hill
Kirwan, B. (1994) A Guide to Practical Human Reliability Assessment, CPC Press | Human factors | CPC Press | CPC Press
The validation of three human reliability quantification techniques – THERP, HEART, JHEDI, Parts 1, 2 and 3 | Human factors | Applied Ergonomics 27(6) 359-373 | Applied Ergonomics 27(6) 359-373
Lees’ Loss Prevention in the Process Industries, Volume 3, Hazard Identification, Assessment and Control, Third Edition | Various | Elsevier Butterworth-Heinemann | https://www.elsevier.com

Note: Some of the data sources listed in Table 2-1 are now considered to be outdated, especially for electronic components, as
the technology has advanced significantly since these were compiled. For example, MIL-HDBK-217F [1] is over 20 years old.
Use of these should be limited to specific cases when contemporary data sources do not contain the information.


2.2.2 Applicability of Data Sources to Equipment Type


Each major industry sector usually has an industry specific database of reliability of
equipment and components. Although it is highly advisable to use the industry specific
source, there may be occasions where the ‘next best-fit’ source must be used (at times,
with some correction factors applied to the data).

Table 2-2: Applicability of Data Sources to Component Types

Data Source | Electronic Components | Electrical Machines | Sensors | Logic Devices | Valves | Pressure Safety Devices | Process Vessels | Subsea Equipment

OREDA Handbook Vol. 1 Yes Yes Yes Yes Yes Yes Yes

OREDA Handbook Vol. 2 Yes

MIL-HDBK-217F, 1991 Yes Yes Yes Yes

NPRD-2016 Yes

EPRD-2014 Yes Yes

FMD-2016 Yes Yes Yes Yes Yes

PDS Data Handbook Yes Yes Yes

FARADIP, Three Yes Yes

493-2007 - IEEE Yes Yes

WellMaster RMS Yes

WellMaster BOP Yes

EMCRH Yes Yes

SERH Yes Yes Yes Yes


3. Guidance on use of data

3.1 Introduction
Reliability is a broad term that covers multiple aspects of a system or product and focuses
on the ability of a product to perform its intended function. Mathematically, if an equipment
item is performing its intended function at time equals zero, reliability can be defined as
“the probability that an item will continue to perform its intended function without failure
for a specified period under stated conditions”. The product defined here could be an
electronic or mechanical hardware product, a software product, a manufacturing process
or even a service.

The science of reliability prediction is based upon the principles of statistical analysis.
Reliability engineering uses a probabilistic approach rather than a deterministic one. This
probability can be calculated or stated to reside within certain statistical confidence limits.
Fundamental to such a calculation is the ability to source basic reliability data. Ideally such
data1 should be:
• Current
• Auditable
• Specific (applicable to equipment/component type)
• Extensive (large sample with many recorded failures)
• Applicable to environment
• Suitable for life trending

Unfortunately, real world data sources rarely meet these ideals and it is therefore
necessary to accept compromises. When performing QRA, it is important that the
limitations of the data source are understood, and where necessary alternatives sought.

For QRA, the reliability parameters to be taken from the database would be the failure rate
(or the mean time to failure, MTTF) and/or the average probability of failure on demand
(PFDavg). See Section 3.3 for details of probability of failure on demand calculation.

Where information is extracted from the OREDA [2] or another industry standard
database it is not (in general) necessary to perform any further statistical analysis of the
failure patterns for QRA purposes. The approach described in Section 3.2 applies where
basic information relating to times to failure is available for analysis, for example from
maintenance records or breakdown reports. In these circumstances, it is necessary to
judge the quality of the data and to then apply the appropriate analytical technique. The
techniques for data analysis presented herein are divided into two classifications as follows:
• Based on sample statistics
• Based on inferences from the associated statistical distributions

1 “CASE AB” check to determine appropriateness of the data.


The characteristics of distributions are much harder to derive, especially from field
breakdown reports rather than laboratory test data, but have the potential to provide
more information. Note that it is not the intention to provide a comprehensive theoretical
background to data analysis in this document, but instead to provide some practical
techniques that may be used to prepare reliability data.

This document outlines three techniques for data analysis, namely:


• Calculation of point estimate of failure rate applied where adequate data are available
– refer to Section 3.2.2
• Prediction of failure rate within defined confidence limits applied where only sparse
failure data are available – refer to Section 3.2.3
• Use of probability plotting to derive information relating to the underlying statistical
distribution – refer to Section 3.2.4

3.2 Failure Rate Calculation

3.2.1 Background
The observed failure rate for a component is defined as the ratio of the total number of
failures to the total cumulative observation or operational time. For items displaying a
constant failure rate, if λ is the failure rate of the N items then:

λ=k⁄T
Where k is the total number of failures and T is the total observation time across the N items.

For the case where components are replaced after failure (as applies to industry field
databases), then the total cumulative observation time may be defined as N × field
operational lifetime.

Strictly, this calculation provides a point estimate of the failure rate and if the exercise were
repeated with another set of identical equipment and conditions, it may yield results that
are not identical to the first. Any number of such measurements may be made providing a
number of “point estimates” for the failure rate, with the true value of the failure rate only
being provided after all components have failed (for a non-replacement test). In practice,
therefore, it is necessary to make a prediction about the total population of items based
on the failure patterns of a sample. This process of statistical inference can be performed
using the properties of a “χ2” (chi squared) distribution. This allows us to bound the
population failure rate within confidence limits (typically 90% or 60% may be used).

It is also necessary to make some assumptions about the pattern of failures across time,
considering the shape of the commonly depicted ‘bathtub curve’ (Figure 3.1). This curve
typifies the expected component failure rate across time and is divided into three distinct
areas, namely:
• Early life: Characterized by a decreasing failure rate
• Useful life: Constant failure rate
• Wear out: Increasing failure rate


Figure 3.1: The Bathtub Curve

To perform analysis of failure patterns outside of the constant failure rate period a level of
detailed information is required that is typically not available from the recorded data (e.g.,
actual age of equipment at failure, homogeneous samples). Therefore, an assumption
is made that all failures recorded are experienced during the useful life phase, and the
pattern of these failures may be described by a random, exponential distribution. This can,
at least to a certain extent, be justified on the following grounds:
• Early life failures resulting from commissioning problems may not be recorded as
equipment failures
• Early life failures resulting from manufacturing defects can be largely eliminated by
testing prior to installation
• Wear out failures are largely eliminated by preventative maintenance and planned
renewals. Note that this assumption may be less valid for wear out of subsea
equipment where no planned maintenance will be performed

The discussion allows us to analyse the data from each source, and in most cases to
calculate a mean value, confidence intervals about the mean value and the associated
variance.

The next set of sections briefly describe the following calculations:


• Point Estimate Failure Rate
• Chi-Squared Failure Rate
• Probability Plotting

Of these, the constant failure rate (point estimate or Chi-Squared failure rate) model is
most commonly used for QRA purposes.


3.2.2 Point Estimate Failure Rate


Where adequate data are available, i.e., the sample set is sufficiently large, a point estimate
of the failure rate can be made simply by taking the ratio of the total number of recorded
failures to the total cumulative observed time.

If k is the total number of failures of N items, then the failure rate λ(t) is given by:
λ(t) = k ⁄ T
where T is the total cumulative observed time.

As the product matures, the weaker units die off, the failure rate becomes nearly constant,
and modules have entered what is considered the normal life period. This period is
characterised by a relatively constant failure rate. The length of this period is referred to
as the system life of a product or component. It is during this period that the lowest failure
rate occurs. The amplitude on the bathtub curve is at its lowest during this time. The useful
life period is the most common time frame for making reliability predictions. Most of the
failure rates quoted in data references (such as MIL-HDBK-217) apply to this period.

Also, during the “constant failure rate” period, the mean time between failures (MTBF) is
often reported instead of the failure rate. MTBF can be obtained by:
MTBF = 1⁄λ

Note: This assumption is only valid for the flat region of the bathtub curve. It is inappropriate
to extrapolate MTBF to give an estimate of the service life time, which may be less than
suggested by the MTBF given the higher failure rates in the wear-out part of the bathtub curve.

The MTBF numbers are preferred in engineering usage as large positive numbers (say 5000
hours) are more intuitive than very small numbers (say 0.0002 per hour) and can be linked to
maintenance intervals based on hours of operation.
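
As a minimal illustration of these relationships in code (the function names and example figures below are illustrative, not taken from any data source):

```python
def point_estimate_failure_rate(failures: int, total_observed_hours: float) -> float:
    """Point estimate of a constant failure rate: lambda = k / T."""
    if total_observed_hours <= 0:
        raise ValueError("total observed time must be positive")
    return failures / total_observed_hours

def mtbf_from_failure_rate(failure_rate: float) -> float:
    """MTBF = 1 / lambda, valid only in the constant-failure-rate (flat) region."""
    return 1.0 / failure_rate

# Illustrative example: 4 failures over 20 items observed for 8760 hours each
lam = point_estimate_failure_rate(4, 20 * 8760)
print(f"lambda = {lam:.2e} per hour, MTBF = {mtbf_from_failure_rate(lam):.0f} hours")
```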

3.2.3 Constant Failure Rate Model (Chi Squared)


This model is most appropriate if the number of recorded failures is less than 5 and/or
the total observed time is relatively small. Where the total number of failures is small or
zero, a point estimate of failure rate is inappropriate, therefore a technique of statistical
inference and confidence limits should be applied. Also, as the sample size increases, the
distribution approaches the normal distribution.

This can be addressed via a Chi Squared (χ2) test using the methodology described below.
This method can forecast the failure rate when no failures have been recorded in the
observed time using confidence intervals.

Required Input Data


• Total observed time ‘T’
• Number of failures ‘k’ where ‘k’ is an integer between 0 and 5
• Confidence interval2 ‘CI’
• Test data truncation information – failure based or time based
2 In statistics, a confidence interval (CI) is a type of interval estimate of a population parameter where a population parameter is a
quantity that indexes a family of probability distributions.


Calculation
1) Calculate α = 1 - CI
2) Calculate n as follows:

Single-Sided Limits*
a) n = 2k for a failure-truncated test
b) n = 2(k + 1) for a time-truncated test

Double-Sided Limits*
a) n = 2k, with probability (1 - α ⁄ 2), for the lower limit
b) n = 2k (failure-truncated) or 2(k + 1) (time-truncated), with probability α ⁄ 2, for the upper limit

3) Look up the value of χ² corresponding to n and α (using statistics tables for the Chi-Squared distribution, or see Table 3-1).
4) The Failure Rate Confidence Limit λCI is calculated using λCI = χ² ⁄ (2T)

* Confidence Limits

The limits defining the interval are called confidence limits. These are the highest and the
lowest values in the interval. The two-sided version tests against the alternative that the
true variance is either less than or greater than the specified value. The one-sided version
only tests in one direction. The choice of a two-sided or one-sided test is determined by the
problem.

Figure 3.2: Chi-Squared Method Confidence Limits – the probability density f(χ²), with a single upper tail of area α for the single-sided limit, and two tails each of area α ⁄ 2 for the double-sided limit.

It is worth noting that λCI (= χ² ⁄ 2T) is a conservative estimate. The true value has a probability
of α of being higher than the estimate (based on a single-sided upper confidence limit). Using
the upper bound of the failure rate is a conservative approach, and hence it can be used
instead of the maximum likelihood estimate when the sample is small.


Table 3-1: Chi-Squared Distribution

Worked Example

Problem: Equipment maintenance records show that 5 identical devices, each with a
recorded running time of 1000 hours, have experienced no recorded failures. Calculate
the failure rate at 90% confidence (single-sided upper limit).

Solution:

The following inputs have been provided:


• Individual device observed time = 1000 hours
• Number of failures ‘k’ = 0
• Confidence interval ‘CI’ = 90% or 0.9

Since no failures have occurred, it is appropriate to use time-truncated equations.


Calculation Step Value


Total observed time ‘T’ T = 5 ×1000 = 5000 hours
Calculate α = 1 - CI α = 1 - 0.9 = 0.1
n = 2(k+1) for time-truncated test n = 2 (0+1) = 2
From the table, note χ2 χ2 = 4.605
Upper bound of failure rate λCI (90% confidence) λCI = χ2 ⁄ 2T = 0.0004605 or 4.61E-04 failures/hour

Note: The decision to use statistical interpretation or point estimate is based on the
number of recorded failures. For items with a very high failure rate, a significant number of
failures could equate to a small amount of experience years, but typically a large amount of
experience years is also required for a point estimate.
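
The worked example can be reproduced numerically; the sketch below is illustrative, assumes SciPy is available, and implements the time-truncated, single-sided upper limit described above (the function and argument names are not from any named library):

```python
from scipy.stats import chi2

def chi_squared_upper_failure_rate(k: int, total_time_hours: float,
                                   confidence: float = 0.90,
                                   time_truncated: bool = True) -> float:
    """Upper confidence bound on a constant failure rate (chi-squared method).

    k: number of observed failures (small k, e.g. 0-5, is where this method is most useful)
    total_time_hours: aggregated observation time T
    confidence: single-sided confidence level CI
    time_truncated: True for time-truncated data, False for failure-truncated data
    """
    alpha = 1.0 - confidence
    n = 2 * (k + 1) if time_truncated else 2 * k
    # chi2.ppf(1 - alpha, n) is the chi-squared value with upper tail area alpha
    chi2_value = chi2.ppf(1.0 - alpha, n)
    return chi2_value / (2.0 * total_time_hours)

# Worked example: 5 devices x 1000 h, no failures, 90% confidence
print(chi_squared_upper_failure_rate(k=0, total_time_hours=5000.0))  # ~4.6e-4 per hour
```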

3.2.4 Probability Plotting (Weibull Distribution)


Where sufficient good quality data are available, probability plotting techniques may be
used to derive information relating to the underlying statistical distribution. Graphical
plotting techniques may be implemented manually or by computer and involve analysis of
the cumulative distribution of the data.

A probability plot allows the user to plot time-to-failure data on a specially-constructed


plotting paper, which differs from distribution to distribution. Based on the linearity of the
data points on the plot, the user can determine whether he or she has chosen a distribution
that is appropriate to the data. The user can also make estimates of the distribution’s
parameters from scales on the plot. A commonly used distribution for failure data is the
Weibull Distribution. This distribution was originally postulated in 1951 by the Swedish engineer
Waloddi Weibull. It is particularly suited to reliability life data plotting because of its
flexibility, having no specific shape but instead being described by shaping parameters. It is
a three-parameter distribution, the parameters being:
• Characteristic life (α)
• Shape factor (β)
• Location parameter (γ)

Often only two parameters α and β are used by setting γ = 0.

There are special cases associated with values of the shape factor:
• β=1 corresponds to exponential distribution
• β<1 represents burn in (decreasing failure rate)
• β>1 represents wear out (increasing failure rate)

Note: In line with convention, β is used here to represent the shape factor of the Weibull
distribution. This is not the same β used to describe the dependent failure fraction of
common cause failures.


By using a graphical plotting technique, the data can be quickly analysed without detailed
knowledge of statistical mathematics. A simple procedure for this is as follows:
• Determine test sample size and times to failure
• List times to failure in ascending order
• Establish median rankings from published tables (or calculate/estimate from
formulae)
• Plot times and corresponding ranks on Weibull plot paper. This is essentially log-log
graph paper but with scales for reading β and α
• Draw best fit straight line and read off α at the 63.2% intercept
• Draw a parallel line through intercept on y axis and read off β

An example using median ranking

Median ranking is the most frequently used method for probability plotting, especially if the
data are known not to be normally distributed. Median ranking tables are available from
statistics text books, or they may be estimated by the following equation:

R = (i - 0.3) / (N + 0.4)

where i is the failure order number and N is the total number of failures.

In the following example, failures are listed from 1 to 30 with their corresponding time to
failure and median rank. These are then plotted on the Weibull paper.

Table 3-2: Rank Data and Median Ranks

Each group of three columns gives: Failure Number, Time to Failure (hours), Median Rank

1 10 0.02 11 2000 0.35 21 77000 0.68

2 38 0.06 12 5000 0.38 22 102000 0.71

3 80 0.09 13 8300 0.42 23 119000 0.75

4 140 0.12 14 12000 0.45 24 134000 0.78

5 215 0.15 15 16300 0.48 25 146000 0.81

6 310 0.19 16 21500 0.52 26 159000 0.85

7 460 0.22 17 27500 0.55 27 172000 0.88

8 670 0.25 18 36000 0.58 28 187000 0.91

9 1050 0.29 19 48200 0.62 29 204000 0.94

10 1900 0.32 20 74000 0.65 30 230000 0.98


The Weibull plot is as follows:

Figure 3.3: Weibull Plot

Plot Line and Read Values of characteristic life (α) and shape factor (β)

It is generally acceptable to fit a straight-line plot by eye through the data points. The value
of shape factor is read by drawing a line perpendicular to the plotted line through the plot
origin. The value of β can then be read from the intercept of this line and the β scale. The
value for the characteristic life may be read from the intercept of the plotted line with
the “estimator line”. The position of the estimator is determined by the intercept of the
perpendicular line with the α scale.

In the above plot all three stages of the bathtub curve are displayed; the values are
approximately:

Characteristic life (α): 87 hours, 320 hours, 1000 hours
Shape factor (β): 0.7, 1.0, 3.4
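
The graphical procedure can also be approximated numerically. The sketch below is illustrative, assumes NumPy is available, uses the Bernard median-rank approximation quoted above, and estimates β and α from a least-squares fit of the linearised two-parameter Weibull CDF, ln(-ln(1 - R)) = β·ln(t) - β·ln(α); the failure times are invented for illustration only:

```python
import numpy as np

def weibull_fit_median_rank(times_to_failure):
    """Estimate Weibull shape (beta) and characteristic life (alpha)
    from complete failure data using median-rank regression."""
    t = np.sort(np.asarray(times_to_failure, dtype=float))
    n = len(t)
    i = np.arange(1, n + 1)
    median_rank = (i - 0.3) / (n + 0.4)          # Bernard's approximation
    x = np.log(t)
    y = np.log(-np.log(1.0 - median_rank))       # linearised Weibull CDF
    beta, intercept = np.polyfit(x, y, 1)        # slope of the fitted line is beta
    alpha = np.exp(-intercept / beta)            # y = 0 at t = alpha (63.2% point)
    return beta, alpha

# Illustrative times to failure (hours)
beta, alpha = weibull_fit_median_rank([110, 240, 390, 570, 810, 1100, 1500, 2100])
print(f"shape beta = {beta:.2f}, characteristic life alpha = {alpha:.0f} hours")
```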

3.2.4.1 Probability Plotting – Complex Scenarios


If a straight line is not obtained in the Weibull plot, there could be one or more underlying
reasons, including:
• Data having been censored
• More than one failure mechanism (mixed Weibull effects)
• Errors in sampling
• There is a threshold parameter (i.e., a three parameter Weibull distribution applies)
• Distribution not Weibull


3.2.4.2 Dealing with Censored Data


At the end of a reliability trial or when processing field data there may be several items
that have not failed. This is referred to as a censored data sample. Those items that have
survived are referred to as “suspended”. To calculate the median ranks in this situation the
following procedure should be followed:
• Determine test sample size and times to failure
• List times to failure in ascending order
• Place suspended test items at the appropriate points in list
• For each failed item calculate the mean order number i_ti (see the sketch after this list):

  i_ti = i_ti-1 + N_ti

  where N_ti = ((n + 1) - i_ti-1) / (1 + (n - number of preceding items))

  and n is the sample size

• Establish median rankings from published tables (or calculate/estimate from


formulae)
• Plot times and corresponding ranks on Weibull plot paper
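
A minimal sketch of this adjustment, directly implementing the mean order number formula quoted above (the names and example values are illustrative):

```python
def adjusted_median_ranks(events):
    """events: list of (time, is_failure) tuples, suspensions marked False.
    Returns (time, median_rank) for the failed items only, using the
    mean order number formula quoted above and Bernard's approximation."""
    events = sorted(events, key=lambda e: e[0])
    n = len(events)
    ranks = []
    prev_order = 0.0
    for position, (time, is_failure) in enumerate(events):
        # 'position' equals the number of items preceding this one in the ordered list
        if not is_failure:
            continue  # suspended items shift later ranks but receive none themselves
        increment = ((n + 1) - prev_order) / (1 + (n - position))
        order = prev_order + increment
        median_rank = (order - 0.3) / (n + 0.4)
        ranks.append((time, median_rank))
        prev_order = order
    return ranks

# Example: failures at 300 h and 900 h, one unit suspended (still running) at 500 h
print(adjusted_median_ranks([(300, True), (500, False), (900, True)]))
```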

3.2.4.3 Mixed Distributions


If the data do not fit to a straight line, especially where an obvious change of slope is seen,
it may be that more than one mode of failure is being displayed by the sample. If this is the
case, the data pertaining to each failure mode must be segregated and analysed separately.

3.2.4.4 Failure Free Period


Should the data still yield a curve rather than a straight line, it is possible that a failure free
life period is being exhibited, i.e., a three-parameter rather than a two-parameter Weibull
distribution is applicable.

The third Weibull parameter (location parameter), γ, locates the distribution along the
abscissa. Changing the value of γ has the effect of “sliding” the distribution and its
associated function either to the right (if γ > 0) or to the left (if γ < 0). The parameter γ may
assume all values and provides an estimate of the earliest time a failure may be observed.
A negative γ may indicate that failures have occurred prior to the beginning of the test or
prior to actual use. The life period 0 to +γ is the failure free operating period of such units.

To account for this, an attempt can be made to predict the failure free period. This may
be based on engineering judgement and knowledge of the items under consideration or
may simply be the time until the first failure occurs. The data are then replotted from this
time and if a straight line results the failure free period is as estimated and the remaining
parameters may be estimated from the plot. If another curve is produced the process is
repeated.


3.2.5 Treatment of Common Cause Failures


A Common Cause Failure (CCF) is the result of an event that, because of dependencies,
causes a coincidence of failure states in two or more separate channels of a redundant
system, leading to the defined system failing to perform its intended function. CCFs can
degrade the performance of any redundant system and are of concern when analysing
protective functions.

Several mathematical techniques exist for the treatment of CCFs. One of the simplest
and most practical is the Beta factor approach. This assumes that λ, the total failure rate
for each redundant unit in the system, is composed of independent and dependent failure
contributions as follows:

λ = λc + λi

where λi is the failure rate for independent failures and λc is the failure rate for dependent failures.

The parameter beta (β) can then be defined as:

β = λc ⁄ λ
Note: β is also commonly used to represent the shape factor of the Weibull distribution; this is
not the same as the β used to describe the dependent failure fraction of common cause
failures.

Thus, β is the relative contribution of dependent failures to total failures for the item. The
lack of available data of sufficient quality relating to dependent failures necessitates the
use of an estimation technique for beta, guided by several shaping parameters (the
subjective assessment of defensive mechanisms). Such a quantification method, known as
the partial beta factor model, may be applied for detailed assessment.

For a simpler approach a representative value of β may be assumed between 0.01 (highly
diverse components or systems) and 0.1 (similar components or systems).
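
As a minimal sketch of the beta factor split (the function name and example values are illustrative):

```python
def split_common_cause(total_failure_rate: float, beta: float):
    """Split a unit's total failure rate into dependent (common cause)
    and independent contributions using the beta factor model."""
    dependent = beta * total_failure_rate              # lambda_c = beta * lambda
    independent = (1.0 - beta) * total_failure_rate    # lambda_i = lambda - lambda_c
    return dependent, independent

# Example: total dangerous failure rate of 5e-6 per hour, beta = 0.05
lam_c, lam_i = split_common_cause(5e-6, 0.05)
print(lam_c, lam_i)
```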

3.2.6 Failure Rate Calculation using the OREDA Estimator


The OREDA handbook [2] recognises that the data it presents are not taken from a
homogeneous sample. To merge these non-homogenous data into a single multi-sample
estimate with an average failure rate (point estimate of the total number of failures divided by
aggregated time in service) is likely therefore to result in an unrealistically short confidence
interval. An approach referred to as the OREDA-estimator is applied to derive a mean
failure rate with associated upper and lower 90% confidence bounds. A description of the
theoretical basis for the OREDA-estimator is given in [3].

The handbook also gives point estimates of failure rate; the numerical difference between
this and the OREDA estimator gives an indication of the degree of diversity in failure rates
between parts of the overall population. OREDA recommends that the OREDA estimator be
used when data are taken from this source.


3.3 Calculation of On-Demand Failure Probability


The on-demand failure probability may be listed in the failure data source, e.g., OREDA [2]
or occasionally FARADIP [4]. It is usually more appropriate, however, to calculate a specific
probability of failure on demand for a given protective function. Typically, such failures are
unrevealed and must be detected by means of manual or automatic proof testing. OREDA
does not list the PFD directly, but reports the frequency of the ‘Failure on Demand’ failure
mode, from which the PFD can be calculated, as described in section 3.4.1.1.

For a protective system having dangerous failure rate λd and proof test interval T, the
probability of failure on demand or unavailability due to unrevealed failures is presented in
Table 3-3.

The table gives a simplified set of equations to calculate the PFDavg for different redundant
architecture combinations usually represented by MooN (M out of N). For example, an on-
demand 2oo3 system implies that 2 elements must work correctly out of 3 for the overall
system to work successfully.

Further details are available in functional safety standard, IEC 61508 Part 6 [5].

Table 3-3: Average Probability of Failure on Demand

MooN:   1oo1      1oo2        2oo2   2oo3     2oo4
PFDavg: λdT ⁄ 2   (λdT)² ⁄ 3  λdT    (λdT)²   (λdT)³

Note that these simplified formulae are only applicable under specific conditions. For
example, 1oo1 must have λdT ≪ 1.
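
A sketch of these simplified formulae in code (a minimal illustration of Table 3-3, assuming a dangerous failure rate λd and proof test interval T, valid only when λd·T ≪ 1 and ignoring common cause and diagnostic effects):

```python
def pfd_avg(moon: str, dangerous_failure_rate: float, proof_test_interval_hours: float) -> float:
    """Simplified average PFD for the MooN architectures of Table 3-3.
    Valid only for lambda_d * T << 1; no common cause or diagnostics included."""
    x = dangerous_failure_rate * proof_test_interval_hours
    formulas = {
        "1oo1": x / 2.0,
        "1oo2": x**2 / 3.0,
        "2oo2": x,
        "2oo3": x**2,
        "2oo4": x**3,
    }
    return formulas[moon]

# Illustrative example: lambda_d = 2e-6 per hour, annual proof test (8760 h)
for arch in ("1oo1", "1oo2", "2oo2", "2oo3", "2oo4"):
    print(arch, pfd_avg(arch, 2e-6, 8760.0))
```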

3.4 Guidance Specific to the OREDA Handbook

3.4.1 Selecting Appropriate Data


The item selected from the database must be appropriate in terms of fit to the system
under analysis and in terms of data quality. Specifically, the following should be considered:

Technology Does the data correctly represent the equipment being


assessed?
It may be necessary for the analyst to provide or seek expert
judgement on certain questions, e.g., can data for a diesel engine
be used for a spark ignited engine?
Environment Will the environmental conditions influence the failure rate?
OREDA data are gathered from offshore North Sea. This
introduces specific failure mechanisms (saline environment,
humidity, temperature), if transferring the data to another
environment additional failure modes and mechanisms may be
involved, and/or it is likely that the frequencies will be different.


Operational Mode Is the data appropriate for the operating mode of the equipment?
Equipment operated frequently in a standby mode (emergency
generators, firewater pumps) will exhibit different failure modes
and frequency compared to equipment operating continuously.
Number of Recorded Is the equipment failure data set large enough to be
Failures representative?
Equipment with few recorded failures will have a large uncertainty
associated with their failure rate. In such cases, Chi-Squared
method may be better.
Population and Does the data set encompass a wide enough population?
Installations It is desirable for data to be selected for equipment with a large
population across a wide number of installations. This avoids
data representing localised effects or dominated by one design or
manufacturer.
Time in Service Has the data been gathered from equipment which has spent a
sufficiently long time in similar conditions?
It is desirable for data to be selected for equipment with a long
time in service (calendar time). The operational time may be
considerably less for equipment that is normally on standby (e.g.,
firewater pumps).

3.4.1.1 Number of Demands


Where stated, this value can be used to derive an on-demand failure probability (but note
also that an on-demand failure probability is occasionally stated in the comment field). For
example, one selected data item (taxonomy code 1.3.2) has 7 recorded critical failures for
the mode “fails to start on demand”. The number of demands is given as 860, and hence
the on-demand critical failure probability can be calculated as 7/860 = 0.008.
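
The same calculation as a short check, using the figures quoted above:

```python
# On-demand critical failure probability from the OREDA example quoted above
failures_on_demand = 7
number_of_demands = 860
print(failures_on_demand / number_of_demands)  # ~0.008
```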

3.4.1.2 Repair Time


Repair times are stated in terms of active repair hours and repair man-hours (min, mean
and max). In general, the “active repair hours” will be of most interest but this field is
sometimes blank. In these instances, an estimate can be made at 50% of the repair man-
hours. Note that the active repair time does not include time for fault realisation, spare
parts or crew mobilisation or the impact of any applied maintenance strategy or delays (See
MTTR definition in section 1.3).

3.4.1.3 Lack of Critical Failure Class Data


The OREDA database classifies the failure severity according to four different classes:
• Critical: A critical failure is one that causes immediate and complete loss of the
capability of a system to provide its output.
• Degradation: A failure mechanism that evolves over time and will typically develop to
a critical failure in time if not corrected.


• Incipient: Incipient failures have no immediate effect upon function. However, if they are
not attended to, they can develop into a critical or degraded failure over time.
• Unknown: Instances where failure severity was not recorded or could not be deduced.

These can be represented as a simplified state diagram as shown in Figure 3.4.

Figure 3.4: Simplified Failure State Diagram – a healthy item passes to the Critical state at
rate λCritical, or to the non-critical Degradation and Incipient states at rates λDegradation
and λIncipient respectively.

In some cases, OREDA does not have any values recorded against ‘Critical’ failure class.
In such cases, it can be assumed that a proportion of degradation and incipient failures
can result in critical failure if not suitably attended to in good time. For simplicity, the
QRA analyst can assume that critical failure rate is the summation of the degradation and
incipient failure rates. However, the following points must be true for this to be a valid
assumption:
• The sample size of the data set is sufficiently large
• No credit has been taken for preventative maintenance
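
A minimal sketch of this conservative assumption (the function name is illustrative):

```python
def assumed_critical_failure_rate(degradation_rate: float, incipient_rate: float) -> float:
    """Conservative stand-in where no 'critical' failures are recorded:
    assume degraded and incipient failures would all progress to critical."""
    return degradation_rate + incipient_rate
```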

The QRA analyst could also consider other data sources which may have the requisite
data. If weighting of data between different sources is considered necessary, the method is
available in the Estimation Procedures section of OREDA [2].

Although unlikely, an analyst may be required to use more sophisticated methods. Such
methods are available, for example Bayesian estimation techniques, where the parameters’
prior distributions are founded on a broader range of the data gathered within the OREDA
project [31].


4. Review of data sources

4.1 OREDA Database and Handbook(s)


Originally initiated by the Norwegian Petroleum Directorate in 1981 to collect reliability data
for safety equipment, OREDA is a project organization sponsored by eight oil companies
with worldwide operations. OREDA’s main purpose is to collect and exchange reliability
data among the participating companies and to act as a forum for co-ordination and
management of reliability data collection within the oil and gas industry. OREDA has established
a comprehensive databank of reliability and maintenance data for exploration and production
equipment from a wide variety of geographic areas, installations, equipment types and
operating conditions. Offshore subsea and topside equipment are primarily covered, but onshore
equipment may also be included. The data are stored in a database, and specialized software
has been developed to collect, retrieve and analyse the information. A more recent addition to
the OREDA database is information pertaining to subsea equipment including control systems,
flowlines, manifolds, production risers, templates, wellheads and Xmas trees amongst others.

Note: Access to the electronic database is restricted to participants in the OREDA program.

The project phases are reported in various handbooks as follows:

Table 4-1: OREDA Project Phases History

Phase Time-Period Members Milestones

I 1981-1984 8 • OREDA Reliability Data Handbook (’84 edition)

II 1985-1988 7 • Issued phase II reliability database


• Guideline for Data Collection
• Software for data storage and analysis (DOS)

III 1989-1992 10 • Issued phase III reliability database


• 2nd handbook issued (’92 edition)

III-IV 1993 10 • Data analysis methods developed and tested

IV 1994-1996 10 • Collected data on preventive maintenance

V 1997-1999 10-11 • 3rd Handbook issued (’97 edition)


• 1st edition ISO 14224 issued. Current edition was issued in 2016 [6]

VI 2000-2001 10 • Strengthen focus on subsea equipment


• Co-operation with subsea manufacturers established

VII 2002-2003 8 • 4th Handbook issued (2002 edition)


• Focus on safety & subsea equipment

VIII 2004-2005 9 • Involved in revised ISO 14224 standard (second edition)

IX 2006-2008 9 • Adoption of OREDA taxonomy and software to ISO 14224


• Continued focus on worldwide span of data coverage

X 2009-2011 8-10 • 5th handbook issued (2009 edition)

XI 2012-2014 8-10 • Development of new data collection software

XII 2015 8 • 6th handbook issued [2]


OREDA data equipment groups and the equipment items covered are listed in Table 4-2.

Table 4-2: OREDA Data Categories

System Equipment Class

Volume 1

1. Machinery 1.1 Compressors

1.2 Gas Turbines

1.3 Pumps

1.4 Combustion Engines

1.5 Turboexpanders

1.6 Steam Turbines

2. Electric Equipment 2.1 Electric Generators

2.2 Electric Motors

2.3 Battery and UPS

3. Mechanical Equipment 3.1 Heat Exchangers

3.2 Vessels

3.3 Heaters and Boilers

4. Control and Safety Equipment 4.1 Fire & Gas Detectors

4.2 Input Devices

4.3 Control Logic Units

4.4 Valves (described by application code)

4.5 Valves (described by taxonomy code)

Volume 2

5. Subsea 5.1 Control Systems

5.2 Flowlines

5.3 Manifolds

5.4 Pipelines (SSIV)

5.5 Risers

5.6 Running Tools

5.7 Templates

5.8 Wellheads & X-mas Trees

4.1.1 OREDA Data Presentation


The OREDA handbook [2] presents the following data recorded for each equipment
taxonomy class recorded.

Boundaries

Each equipment item class has an inventory description provided at the start of the
respective chapter. This should be examined carefully to identify equipment items for the


system under consideration that lie outside the defined OREDA boundary. These must then
be considered as separate items. An example of this would be a compressor or electrical
generator where the prime mover is listed as a separate item.

Taxonomy code

The taxonomy code gives an identification of the equipment item selected from the
database. It is good practice to record this code and to include it within calculations as a
reference for any data extracted.

Population

Total number of items under surveillance.

Aggregated time in service (calendar time)

This is the total recorded observation time for the population.

Aggregated time in service (operational time)

Total recorded observation time for the population when it is required to fulfil its functional
role. Note that this may be an estimated value.

Number of demands

Total number of recorded demand cycles for the population. Note that this may be an
estimated value.

Failure Mode

This column presents the recorded modes of failure for the equipment item, divided into
severity classes critical, degraded, incipient and unknown. In general, only the critical
severity class failures need be considered, i.e., those that cause an immediate and
complete loss of an items function. Where an equipment item performs more than one
function (e.g., process and protective) it may be necessary to review each failure mode and
identify the requirement to progress it into the risk calculation, either as an aggregated
failure rate value for the equipment item or as individual failure events; i.e., critical failures
may include dangerous, non-dangerous and safe failures. These failures may be critical to
production but not to the equipment’s protective function.

Number of Failures

This is the total number of failures aggregated across all modes. In general, the higher the
number of failures, the greater the confidence in the calculated failure rate.

Failure Rate

All failure rates in the OREDA handbook are presented in terms of failures per million
hours. The following data are presented for each mode, calculated both in terms of
calendar and operational time:


• Mean: estimated average failure rate, calculated using the “OREDA” estimator – see
Section 3.2.6 for details
• Lower, Upper: 90% confidence bounds for the failure rate
• SD: Standard deviation
• n/T: Point estimate of the failure rate i.e. total number of failures divided by the total
time in service

For most calculations, it is recommended that the mean value (i.e., based on the OREDA
estimator) is used. Note that the difference in value between the point estimate and mean
failure rate relates to the degree of diversity in the population.

4.2 MIL-HDBK-217F
The MIL-HDBK-217 [1] handbook contains failure rate models for the various part types
used in electronic systems, such as integrated circuits, transistors, diodes, resistors,
capacitors, relays, switches, and connectors.

The handbook details two methods for reliability prediction, namely parts count and parts
stress calculation. Parts count prediction is recommended during the design phase of a
project. It is simpler than parts stress and requires less detailed information. To calculate a
system failure rate the following method is used:

For each component part of a system, a baseline failure rate value is selected from tables
based on the type of the part and the operating environment. This value is then modified
by multiplying by a quality factor, again selected from a table (e.g., military or commercial
specification). For microelectronics, a learning factor may also be applied. The overall
system failure rate is then derived by summation of the parts failure rates; hence the
title “parts count”. In general, parts count analysis will provide an adequate estimate of a
system’s failure rate for use in QRA.
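
A sketch of the parts count idea under the assumptions above; the part names, base failure rates and factors below are placeholders for illustration, not values taken from MIL-HDBK-217F:

```python
from dataclasses import dataclass

@dataclass
class Part:
    name: str
    quantity: int
    base_failure_rate: float       # failures per 1e6 hours for the chosen environment (placeholder)
    quality_factor: float = 1.0
    learning_factor: float = 1.0   # applied to microelectronics only

def parts_count_failure_rate(parts):
    """System failure rate as the sum of (quantity x base rate x modifying factors)."""
    return sum(p.quantity * p.base_failure_rate * p.quality_factor * p.learning_factor
               for p in parts)

# Illustrative parts list (all rates and factors are placeholders, not handbook values)
system = [
    Part("signal relay", 4, 0.30, quality_factor=1.5),
    Part("film resistor", 120, 0.002),
    Part("microcontroller", 1, 0.08, quality_factor=2.0, learning_factor=1.2),
]
print(f"system failure rate ~ {parts_count_failure_rate(system):.3f} per 1e6 hours")
```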

Parts stress analysis involves derivation of more multiplying factors that in turn require
detailed analysis of the system.

4.3 FIDES
This is a reliability standard created by the FIDES Group - a consortium of leading French
international defence companies: AIRBUS, Eurocopter, Giat, MBDA and THALES. The FIDES
methodology is based on the physics of failures and is supported by the analysis of test
data, field returns and existing modelling. The FIDES Guide is a global methodology for
reliability engineering in electronics. It has two parts, namely a reliability prediction guide
and a reliability process control and audit guide.

Its key features are:


• Provides models for electrical, electronic, electromechanical components and some
subassemblies
• Considers all technological and physical factors that play an identified role in a
product’s reliability


• Considers the mission profile


• Considers the electrical, mechanical and thermal over-stresses
• Considers failures linked to the development, production, field operation and maintenance
processes

The guide can be downloaded from: http://fides-reliability.org/files/UTE_Guide_FIDES_2009_Ed_A_EN.pdf

4.4 EPRD-97 and NPRD-95


The databases EPRD-97 (Electronic Parts Reliability) [7] and NPRD-95 (Non Electronic
Parts Reliability) [8] were developed by the United States Department of Defense Reliability
Information Analysis Center (RIAC). The EPRD-97 database contains failure rate data on
electronic components, namely capacitors, diodes, integrated circuits, optoelectronic devices,
resistors, thyristors, transformers and transistors. The NPRD-95 database contains failure
rate data on a wide variety of electrical, electromechanical and mechanical components. Both
databases contain data obtained by long-term monitoring of the components in the field.
The collection of the data was from the early 1970s through 1994 (for NPRD-95) and through
1996 (for EPRD-97). The purpose of both databases is to provide failure rate data on
commercial quality components and on state-of-the-art components, complementing
MIL-HDBK-217F by providing data on component types not addressed therein.

4.5 PDS Data Handbook


The PDS Data Handbook [9] provides reliability data estimates for components of control
and safety systems. Data for field devices (sensors, valves) and control logic (electronics)
are presented, including data for subsea equipment. The data are based on various
sources, including OREDA and expert judgement. Some values for β factors for analysis of
common cause failures are also presented.

4.6 FARADIP III


FARADIP (Failure RAte Data In Perspective) [4] is an electronic database that presents data
concatenated from over 40 published data sources. It provides failure rate data ranges
for a nested hierarchy of items covering electrical, electronic, mechanical, pneumatic,
instrumentation and protective devices. Failure mode percentages are also provided.

4.7 IEEE 493-2007


The objective of this book [10] is to present the fundamentals of reliability analysis applied
to the planning and design of industrial and commercial electric power distribution
systems. The intended audience for this material is primarily plant electrical engineers.

The design of reliable power distribution systems is significant because of the high cost
associated with power outages. It is necessary to consider the cost of power outages when
making design decisions for power distribution systems.


4.8 Sintef Reports, SubseaMaster and WellMaster


ExproSoft is a spin-off from the Norwegian research institute SINTEF and has acquired
all commercial rights to the reliability databases previously operated by that institute. These
products have since been refined and extended, creating integrated reliability database and
analysis tools for the upstream sector.

A study (JIP) on reliability of well completion equipment (“Wellmaster Phase III”) [11] was
completed by SINTEF in November 1999. This has resulted in a database of well completion
equipment, with a total of 8000 well-years of completion experience represented.

A subsea equipment reliability database project was completed by ExproSoft in late 2000
(Phase I) [12]. This project led to the development of the SubseaMaster database and
software version 1.0. Phase II of SubseaMaster was launched as a joint industry project in
May 2001 and was completed in April 2003.

ExproSoft sells copies of the SINTEF reports referred to in this datasheet.

Weblink: http://www.exprosoft.com/products/wellmaster-rms/


5. Typical safety systems reliability

In this section, reliability aspects pertaining to the various types of protection systems used in
oil and gas installations are discussed. Note that the reliability figures quoted are representative
values for the relevant system and are subject to assumptions and limitations (such as
adequate maintenance being carried out on the components, operating conditions, etc.).

The systems discussed in this section are presented in Figure 5.1.

Safety Systems:
• Process Isolation Systems: Process Shutdown System, Emergency Shutdown System, Emergency Quick Disconnect Systems, High Integrity Pressure Protection System, Blowdown System
• Fire Safety Systems: Active Fire Protection, Passive Fire Protection
• Passive Safety Systems: Blastwalls, Cryogenic Spill Protection
• Refuge and Shelters: Temporary Refuges, Escape and Evacuation

Figure 5.1: Typical Safety Systems

5.1 Process Isolation Systems


Process isolation systems respond to unacceptable deviations in process parameters (such
as pressure or vessel level) by triggering a safety action, which may involve closing valves or
stopping pumps. Process isolation systems can be regarded as preventative safety
systems, as they are designed to prevent propagation of the initiating events and thereby inhibit
realisation of the hazardous scenarios. The level of isolation may range from a small
section of the plant to full-scale shutdowns. Typically:
• Process shutdown systems (PSD) perform 'local' shutdowns such as isolation of a vessel or section of the plant
• Emergency shutdown systems (ESD) perform a higher level of shutdown which may involve isolation of large sections of the plant or the entire operation
• Blowdown systems perform controlled depressurisation of the plant
• Emergency quick disconnect systems (EQD) physically disconnect two systems. This type of safety functionality is part of blowout preventers, riser-boat disconnects, cargo loading arms, etc.
• Safety instrumented systems (SIS) perform a safety function to a known minimum level of reliability called the safety integrity level (SIL). The SISs may be independent systems or part of the PSD/ESD³
• High integrity pressure protection systems (HIPPS) perform overpressure protection to a considerably higher level of reliability. A HIPPS is a SIS, typically in the SIL 3 range (PFDavg ≥ 10⁻⁴ to < 10⁻³), and is usually used where full-flow pressure relief is not available. The majority of HIPPS designs aim to achieve high-end SIL 3 reliability

5.1.1 Reliability Ranges and Guidance


All SIFs will have a defined Safety Integrity Level (SIL), which gives the reliability range
that the SIF must meet. Often, a maximum PFDavg is also defined as a target. The QRA of
the whole system can use this reliability figure for a better estimate of the system's
performance. The reliability ranges are as follows:

Table 5-1: Reliability Ranges for SIFs

Target SIL    PFDavg                 Risk Reduction Factor (RRF)
SIL 4         ≥ 10⁻⁵ to < 10⁻⁴       > 10,000 to ≤ 100,000
SIL 3         ≥ 10⁻⁴ to < 10⁻³       > 1,000 to ≤ 10,000
SIL 2         ≥ 10⁻³ to < 10⁻²       > 100 to ≤ 1,000
SIL 1         ≥ 10⁻² to < 10⁻¹       > 10 to ≤ 100

If the QRA analyst is aware of a system being classed as a SIS, it is better to obtain the
achieved reliability data for use in the assessment. The following table gives typical
values that can be used in the absence of such information or for a crude estimate. These
are based on the geometric mean of the SIL range, as suggested by the UKOOA guidelines [13];
arithmetic averages can also be used. A short sketch reproducing these geometric means is
given after the notes to Table 5-2.

Table 5-2: Reliability for typical SIS

SIS Type (Typical SIL)                    PFDavg (Geometric Mean)    Risk Reduction Factor (RRF)
PSD (SIL 1)                               3.16E-02                   32
ESD (SIL 2)                               3.16E-03                   316
HIPPS (SIL 2 – SIL 3 range)               3.16E-03 – 3.16E-04        316 – 3,162
EQD (SIL 2)                               3.16E-03                   316
Blowdown System (SIL 1), Notes 1 and 2    3.16E-02                   32
Fire & Gas Detection (SIL 1), Note 1      3.16E-02                   32

³ In many cases, some functions within the PSD and ESD may be classified as Safety Instrumented Functions (SIFs). The overall system
that executes the SIF is the SIS. These systems have a defined range of reliability that they must meet to comply with the functional
safety standards. Hence, it is recommended to use the 'minimum achieved reliability' (i.e., the lower bound) for these systems in
the QRA process in the absence of any relevant information on such systems. A SIL verification report will typically contain the achieved
reliability for these systems.


Note 1: Blowdown systems and fire and gas detection systems are not SIS in the strict
sense. These systems are typically treated as being equivalent to SIL 1, i.e., they are
typically engineered to achieve at least the SIL 1 reliability range. If such systems are relatively
old (i.e., nearing obsolescence or end-of-life), a reduced reliability range is recommended
to allow for normal degradation and systematic errors which may not be rectified by
maintenance. The SIL target only considers the reliability of hardware and does not reflect
the overall reliability of such systems.

Note 2: The reliability values are for an automatic blowdown system which is activated as
part of the safety logic. For manual initiation of the blowdown system, it is suggested that
an availability of 60% to 70% is assumed.
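
The geometric-mean values in Table 5-2 can be reproduced directly from the SIL band limits in Table 5-1. The following is a minimal Python sketch of that calculation, for illustration only; the function names are arbitrary.

    # Reproduces the geometric-mean PFDavg values of Table 5-2 from the band limits of Table 5-1.
    import math

    def sil_band(sil):
        """Return the (lower, upper) PFDavg limits for a given SIL, per Table 5-1."""
        upper = 10.0 ** (-sil)          # e.g. the SIL 1 band is >= 1e-2 to < 1e-1
        lower = 10.0 ** (-(sil + 1))
        return lower, upper

    def geometric_mean_pfd(sil):
        """Geometric mean of the SIL band, as suggested by the UKOOA guidelines [13]."""
        lower, upper = sil_band(sil)
        return math.sqrt(lower * upper)

    for sil in (1, 2, 3):
        pfd = geometric_mean_pfd(sil)
        print(f"SIL {sil}: PFDavg ~ {pfd:.2e}  (RRF ~ {1.0 / pfd:.0f})")
    # SIL 1 gives ~3.16E-02 (RRF ~32) and SIL 2 ~3.16E-03 (RRF ~316), matching Table 5-2.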

5.1.2 Effect of Proof Testing


For on-demand process isolation systems, it is essential to perform proof tests at a frequency
assumed in the reliability calculation. If the system is not being tested to the frequency
assumed in the calculation and in accordance with the proof test procedure, any reliability claim
should be carefully considered. It is highly recommended that the claimed achieved reliability
should only be used in conjunction with the confirmation that proof test intervals are being met.

At a sub-system level (sensors, logic solver and final elements), the probability of failure on
demand is dependent upon the test interval as mentioned in section 3.3. An example (using the
formulae in section 3.3) depicting the impact of proof testing interval is shown in Figure 5.2.

In the example, a dangerous failure rate (λd) of 8.00 x 10⁻³ per year has been assumed. The figure
shows that the average PFD increases as the interval between tests increases. There is
usually an optimal test interval for each system or function which balances the test interval
requirements against the cost of conducting tests and other variables.

Impact of Proof Test Interval: PFD Average versus Test Interval (Years)

Architecture   0.25        0.5         1           2           3
1oo1           1.00E-03    2.00E-03    4.00E-03    8.00E-03    1.20E-02
1oo2           1.33E-06    5.33E-06    2.13E-05    8.53E-05    1.92E-04
2oo2           2.00E-03    4.00E-03    8.00E-03    1.60E-02    2.40E-02
2oo3           4.00E-06    1.60E-05    6.40E-05    2.56E-04    5.76E-04

Figure 5.2: Proof Test Interval Impact on PFD
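
For reference, the figure values can be reproduced with the widely used simplified PFDavg approximations for the listed voting architectures (λd·T/2 for 1oo1, (λd·T)²/3 for 1oo2, λd·T for 2oo2 and (λd·T)² for 2oo3). The Python sketch below assumes these approximations and is illustrative only; refer to section 3.3 for the formulae used in this datasheet.

    # Reproduces the Figure 5.2 values using simplified PFDavg approximations.
    lambda_d = 8.00e-3  # assumed dangerous failure rate, per year

    def pfd_avg(architecture, lam, T):
        """Simplified average probability of failure on demand for proof test interval T (years)."""
        x = lam * T
        return {
            "1oo1": x / 2.0,
            "1oo2": x ** 2 / 3.0,
            "2oo2": x,
            "2oo3": x ** 2,
        }[architecture]

    for architecture in ("1oo1", "1oo2", "2oo2", "2oo3"):
        row = ["{:.2e}".format(pfd_avg(architecture, lambda_d, T)) for T in (0.25, 0.5, 1, 2, 3)]
        print(architecture, *row)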


5.1.3 System Survivability


The reliability figures typically do not account for survivability of equipment in hazardous
events. If the fire or explosion resulting from a loss of containment event (or other event) is of
such magnitude that it exceeds the design accidental load (DAL) or design overpressure
resistance of sensors, actuators or valves, it should be assumed that the relevant protection
system may be rendered non-functional, even if it is a fail-safe design. Also, if the load exceeds
the maximum value that can be endured by the pipework, the protection system will not be able
to achieve isolation, even if the valves close as required, due to damage to the pipes. The
DAL criteria should therefore also be applied to the pipework.

In QRA, it is recommended that the following cases are modelled:

Isolation Blowdown
Case 1 Performs as per design Performs as per design
Case 2 Performs as per design Fails
Case 3 Fails Performs as per design
Case 4 Fails Fails

The probability of each case combination can be estimated using the overall reliability of
each system; a simple sketch of this calculation is shown below. For case 3, the installation's
shutdown logic should be consulted to ensure that it permits blowdown without isolation.
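
As an illustration, and assuming for simplicity that the isolation and blowdown systems fail independently, the case probabilities can be built from overall reliabilities such as those in Table 5-2. The values below are illustrative only; a common cause contribution would require a beta-factor treatment rather than the simple products used here.

    # Minimal sketch of the four-case split, assuming independent failures.
    p_isolation = 1.0 - 3.16e-3   # e.g. ESD isolation at SIL 2 (Table 5-2)
    p_blowdown  = 1.0 - 3.16e-2   # e.g. blowdown system at SIL 1 (Table 5-2)

    cases = {
        "Case 1 (isolation works, blowdown works)": p_isolation * p_blowdown,
        "Case 2 (isolation works, blowdown fails)": p_isolation * (1.0 - p_blowdown),
        "Case 3 (isolation fails, blowdown works)": (1.0 - p_isolation) * p_blowdown,
        "Case 4 (isolation fails, blowdown fails)": (1.0 - p_isolation) * (1.0 - p_blowdown),
    }

    for name, probability in cases.items():
        print(f"{name}: {probability:.4e}")
    # The four case probabilities sum to 1.0.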

5.2 Fire Safety and Passive Safety Systems


The fire safety systems on an installation are a combination of active and passive mechanisms.

Active fire protection systems comprise water deluge systems (pumps, deluge valves, ring-
main, deluge nozzles, monitors), chemical foam systems, water mist systems etc. Passive
protection systems comprise passive fire protection (PFP) coatings, firewalls and blast
walls. For cryogenic spills, a cryogenic spill protection system may be used.

5.2.1 Reliability Ranges and Guidance


An active fire protection system is a combination of various equipment items. Hence, its
reliability can be estimated by appropriately combining the reliabilities of the individual elements
taken from various data sources (a simple series-combination sketch is shown after Table 5-3).
Alternatively, the failure probabilities listed in Table 5-3 can be used for legacy systems which
do not meet current standards. These have been taken from Northeast Utilities, Millstone 3 PRA
Appendix 2-K, 1983 [14].

Table 5-3: Active Fire Suppression System Failure Probabilities

System Failure Probability (On-Demand)


Water Deluge 0.049
Halon 0.20
Carbon Dioxide 0.116
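
A minimal sketch of the series combination mentioned above is given below. The element unavailabilities are hypothetical placeholders used only to show the arithmetic, not recommended data.

    # Illustrative series combination of active fire protection elements.
    element_unavailability = {
        "fire water pumps":      5.0e-3,
        "deluge valve":          3.0e-2,   # deluge valves typically dominate system unavailability
        "ring main and nozzles": 2.0e-3,
    }

    # For a series system (every element must work on demand), the on-demand failure
    # probability is 1 minus the product of the element availabilities; for small
    # unavailabilities this is approximately their sum.
    availability = 1.0
    for q in element_unavailability.values():
        availability *= (1.0 - q)
    q_system = 1.0 - availability

    print(f"System on-demand failure probability ~ {q_system:.3e}")
    print(f"Rare-event approximation (sum)        ~ {sum(element_unavailability.values()):.3e}")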


For modern fire water systems, an availability of at least 90% is required for the deluge
system, and 99% is desired. Various analyses show that fire water system unavailability is
largely dependent on the reliability of the deluge valves; the deluge valves constitute 95.5%
of the total unavailability.

For the purposes of a QRA, the data presented in Appendix 14 (Failure and Event Data) of Lees'
Loss Prevention in the Process Industries [15] can be used. A summary is presented in
Table 5-4, and a sketch converting these rates into on-demand probabilities follows the table.

Table 5-4: Fire Protection Systems Failure Rates

System Failure Rate (per hour)


Inert gas extinguishing system 1.5 x 10-5
Dry sprinkler system (total failure) 2.3 x 10-7
Wet sprinkler system 7.2 x 10-8
CO2 extinguishing system 2.1 x 10-6
Wall hydrants 4.4 x 10-8
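
Where an on-demand failure probability is needed rather than a failure rate, the rates in Table 5-4 can be converted using the approach of section 3.3. The sketch below assumes, purely for illustration, an annual proof test and the simplified λT/2 approximation.

    # Converts the per-hour failure rates of Table 5-4 into an approximate PFDavg,
    # assuming an annual proof test (an assumption for illustration only).
    HOURS_PER_YEAR = 8760.0
    test_interval_years = 1.0

    failure_rates_per_hour = {
        "Inert gas extinguishing system":       1.5e-5,
        "Dry sprinkler system (total failure)": 2.3e-7,
        "Wet sprinkler system":                 7.2e-8,
        "CO2 extinguishing system":             2.1e-6,
        "Wall hydrants":                        4.4e-8,
    }

    for system, lam in failure_rates_per_hour.items():
        pfd_avg = lam * HOURS_PER_YEAR * test_interval_years / 2.0
        print(f"{system}: PFDavg ~ {pfd_avg:.2e}")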

National Fire Protection Association (NFPA) standards are widely used for defining required
water/foam delivery rates and durations, along with the API 6F series for fire tests and API 2218
for fireproofing. The most common standards used by design engineers are listed in Table 5-5.

Table 5-5: Fire Protection Systems – List of Common Standards

Code Title
NFPA 4 Standard for Integrated Fire Protection and Life Safety System Testing
NFPA 10 Standard for Portable Fire Extinguishers
NFPA 11 Standard for Low-, Medium-, and High-Expansion Foam
NFPA 12 Standard on Carbon Dioxide Extinguishing Systems
NFPA 12A Standard on Halon 1301 Fire Extinguishing Systems
NFPA 13 Standard for the Installation of Sprinkler Systems
NFPA 14 Standard for the Installation of Standpipe and Hose Systems
NFPA 15 Standard for Water Spray Fixed Systems for Fire Protection
NFPA 16 Standard for the Installation of Foam-Water Sprinkler and Foam-Water
Spray Systems
NFPA 17 Standard for Dry Chemical Extinguishing Systems
NFPA 17A Standard for Wet Chemical Extinguishing Systems
NFPA 20 Standard for the Installation of Stationary Pumps for Fire Protection
NFPA 25 Standard for the Inspection, Testing, and Maintenance of Water-Based
Fire Protection Systems
NFPA 99B Standard for Hypobaric Facilities

NFPA 130 Standard for Fixed Guideway Transit and Passenger Rail Systems
NFPA 750 Standard on Water Mist Fire Protection Systems
NFPA 770 Standard on Hybrid (Water and Inert Gas) Fire Extinguishing Systems
NFPA 1150 Standard on Foam Chemicals for Fires in Class A Fuels
NFPA 2001 Standard on Clean Agent Fire Extinguishing Systems
NFPA 2010 Standard for Fixed Aerosol Fire-Extinguishing Systems
ISO 13702:2015 Petroleum and natural gas industries – Control and mitigation of fires
and explosions on offshore production installations – Requirements and
guidelines

In addition, legacy systems may have been designed to the standard BS 5306 [16]. This
standard mandated a water delivery rate of 9.81 litres/min/m² over the exposed vessel
surface and its supports. For protection from lower levels of thermal radiation
from fires on adjacent units, lower rates of water application are allowable.
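
As a quick worked example of this delivery rate, the sketch below multiplies the rate by an assumed exposed surface area; the area is purely illustrative.

    # Hypothetical worked example of the BS 5306 delivery rate.
    delivery_rate_lpm_per_m2 = 9.81   # litres/min/m2, per BS 5306
    exposed_area_m2 = 150.0           # assumed exposed vessel surface, including supports

    required_flow_lpm = delivery_rate_lpm_per_m2 * exposed_area_m2
    print(f"Required deluge flow ~ {required_flow_lpm:.0f} litres/min")  # approx. 1470 litres/min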

For PFP-based systems, the criterion will typically be that a protected surface does not reach
a certain temperature within a defined time period during a standard test. The protective
system should meet the requirements of a fire test (for example, the pool fire test described in
UL 1709 [17] or the jet fire test described in ISO 22899 [18]). For well-maintained systems,
an availability of at least 90% is required but 99% is desired.

In QRAs, the analyst should also consider DAL values for items such as firewalls and blast
walls. If the thermal radiation or overpressure from an event exceeds the specified DAL, the
protection measure should be assumed to be impaired and ineffective.

5.2.2 Effect of Proof Testing


For active fire safety systems, it is essential to perform proof tests at a frequency assumed
in the reliability calculation. If the system is not being tested to the frequency assumed in the
calculation and in accordance with the proof test procedure, any reliability claim should be
carefully considered. It is highly recommended that the claimed achieved reliability should
only be used in conjunction with the confirmation that proof test intervals are being met.

The proof test should include all parts of the overall system including “standby” equipment.

5.2.3 System Survivability


The reliability figures typically do not account for survivability of fire safety equipment in
hazardous events. If the fire or explosion resulting from a loss of containment event (or
others) is of such magnitude that it exceeds the design accidental load (DAL) or design
overpressure resistance of sensors, actuators or valves, it should be assumed that the
relevant active fire protection system may be rendered non-functional, even if it is a fail-
safe design.


Similarly, the effectiveness of PFP and other passive means of safety may be severely
compromised if the hazardous event load exceeds the DAL. In such cases, the PFP can be
damaged or even blown off the equipment it is protecting.

5.3 Refuges and Shelters


For temporary refuges (TR), the reliability and availability targets are usually specified in
the performance standards. As these are constructed using firewall- and blast wall-type
structures, the inherent availability of such walls governs the availability of TRs. In addition,
the QRA analyst must consider the HVAC reliability and the quality of door seals.

Typically, an availability of 90% to 99% on a sliding scale can be assumed for such systems,
provided they meet the maintenance requirements and the DAL is not exceeded.

The reliability of evacuation craft, such as totally enclosed motor propelled survival craft
(TEMPSC), lifeboats and life rafts, is difficult to ascertain due to a lack of data. HSE
research report RR599 [19] discusses the issue in detail. However, for QRA purposes,
the assumed reliability should not exceed 90% for newer craft. Ageing craft undergo
degradation of the glass reinforced plastic (GRP) and hence offer a much lower reliability
as a system.

Readers are recommended to study:

• OCIMF – “Results of a Survey Into Lifeboat Safety”
• OCIMF, Intertanko and SIGTTO – “Lifeboat Incident Survey 2000”
• MAIB – “Review of Lifeboat and Launching Systems’ Accidents”


6. Recommended data sources for further information

The textbook Functional Safety – A Straightforward Guide to IEC 61508 [20] presents
background theory and several worked examples, including fault trees and analysis of
common cause failures.

Layer of Protection Analysis – Simplified Process Risk Assessment [21] also presents
worked examples together with some specimen reliability data.

Background reliability theory can be found in Practical Reliability Engineering [22] and
Reliability, Maintainability and Risk [3]. The latter also contains some reliability data from
FARADIP [4].

Reliability Technology [23] contains (older) reliability data from the nuclear industry.

Other useful sources are as follows:


• SINTEF, Reliability of Surface Controlled Subsurface Safety Valves, 21/2/1983, STF18
A83002. [24]
• Holand, P.: Subsea BOP Systems, Reliability and Testing. Phase V. STF75 A89054
(ISBN 82-595-8585-5), 1989. [25]
• Holand, P.: Reliability of Surface Blowout Preventers (BOPs) STF75 A92026 (ISBN 82-
595-7173-0), 1992. [26]
• SINTEF; Reliability of Surface Controlled Subsurface Safety Valves, Phase IV - Main
Report 1991 STF75 A91038. [27]
• Holand, P.: Reliability of Subsea BOP Systems for Deepwater Application, Phase II
DW (unrestricted version). STF38 A99426 (ISBN 82-14-01661-4), 1999. [28]
• Brand, VP, UPM3.1: A pragmatic approach to dependent failures assessment for
standard systems, ISBN 085 356, 1996. [29]
• Lees’ Loss Prevention in the Process Industries, Volume 3, Hazard Identification,
Assessment and Control, Third Edition, 2004, Butterworth-Heinemann, Hardcover
ISBN: 9780750675550 [15]
• Guidelines for Process Equipment Reliability Data, with Data Tables, 1989, American
Institute of Chemical Engineers, Print ISBN: 9780816904228 [30]


7. References

[1] US DoD, Reliability Prediction of Electronic Equipment, MIL-HDBK-217F, Notice 2 1995.


[2] OREDA Participants, OREDA 2015 Handbook ISBN 82-14-02705-5.
[3] Dr David J Smith, Reliability, Maintainability and Risk Sixth edition, ISBN 0-7506-5168-7, 2001.
[4] FARADIP (FAilure RAte Data In Perspective), Maintenance 2000 Limited, Broadhaugh Building, Suite 110,
Camphill Road, Dundee DD5 2ND 1987 onwards.
[5] IEC 61508, Parts 1 – 7, Edition 2, Functional safety of electrical/electronic/programmable electronic safety-
related systems, 2010.
[6] ISO 14224: 2016, Petroleum, petrochemical and natural gas industries - Collection and exchange of
reliability and maintenance data for equipment.
[7] Electronic Parts Reliability Data 1997 (EPRD-97), Reliability Analysis Center, PO Box 4700, Rome, NY.
[8] Non-Electronic Part Reliability Data 1995 (NPRD-95), Reliability Analysis Center, PO Box 4700, Rome, NY.
[9] Reliability Data for Safety Instrumented Systems - PDS Data Handbook, 2006 Edition, Sydvest, Trondheim, Norway.
[10] Institute of Electrical and Electronics Engineers IEEE 493-2007, Recommended Practice for the Design of
Reliable Industrial and Commercial Power Systems (“Gold Book”).
[11] Exprosoft, Klæbuveien 125, Lerkendal Stadion, Trondheim, Wellmaster Database, ongoing.
[12] Exprosoft, Klæbuveien 125, Lerkendal Stadion, Trondheim, Subseamaster Database, ongoing.
[13] United Kingdom Offshore Operators Association (UKOOA). Instrument-Based Protective Systems,
Document Number CP012, 1995.
[14] Northeast Utilities, Millstone 3 PRA Appendix 2-K, 1983
[15] Lees’ Loss Prevention in the Process Industries, Volume 3, Hazard Identification, Assessment and
Control, Third Edition, 2004, Butterworth-Heinemann, Hardcover ISBN: 9780750675550.
[16] BS 5306-0:2011, Fire protection installations and equipment on premises. Guide for selection of installed
systems and other fire equipment.
[17] UL 1709, Standard for Rapid Rise Fire Tests of Protection Materials for Structural Steel, Edition 5, 2017.
[18] ISO 22899-1:2007, Determination of the resistance to jet fires of passive fire protection materials.
[19] Overview of TEMPSC performance standards, RR 599, Prepared by Serco Technical and Assurance
Services for the Health and Safety Executive 2007.
[20] Smith & Simpson, Functional Safety, ISBN 0-7506-5270-5, 2001.
[21] Center for Chemical Process Safety, Layer of Protection Analysis, ISBN 0-8169-0811-7, 2001.
[22] O’Connor, P., Practical Reliability Engineering, ISBN 0-471-95767-4, 1996.
[23] Green & Bourne, Reliability Technology, ISBN 0 471 32480-9, 1981.
[24] SINTEF, Reliability of Surface Controlled Subsurface Safety Valves, 21/2/1983, STF18 A83002.
[25] Holand, P.: Subsea BOP Systems, Reliability and Testing. Phase V. STF75 A89054 (ISBN 82-595-8585-5), 1989.
[26] Holand, P.: Reliability of Surface Blowout Preventers (BOPs) STF75 A92026 (ISBN 82-595-7173-0), 1992.
[27] SINTEF; Reliability of Surface Controlled Subsurface Safety Valves, Phase IV - Main Report 1991 STF75 A91038.
[28] Holand, P.: Reliability of Subsea BOP Systems for Deepwater Application, Phase II DW (unrestricted version).
STF38 A99426 (ISBN 82-14-01661-4), 1999.
[29] Brand, VP, UPM3.1: A pragmatic approach to dependent failures assessment for standard systems, ISBN 085 356, 1996.
[30] Guidelines for Process Equipment Reliability Data, with Data Tables, 1989, American Institute of Chemical
Engineers, Print ISBN: 9780816904228.
[31] Jørn Vatn, OREDA Data Analysis - Compressor study, SINTEF Report STF75 F92.


The reliability of Instrumented Protective Functions (IPFs) is a key input to Quantitative Risk
Assessment (QRA) of hydrocarbon exploration and production facilities. The IPFs include
systems such as the Fire and Gas System (FGS), Emergency Shutdown (ESD) and blowdown
systems, the High Integrity Pressure Protection System (HIPPS), Process Shutdown (PSD)
systems and blowout prevention. This datasheet provides guidance on obtaining, selecting and
using reliability data for these systems and for their component parts, for use in QRA.
