
OTC-29544-MS

Developing Probabilistic Risk Assessment, PRA, for a BOP System Reliability

John Steven Holmes and Viral Shah, Baker Hughes a GE Company

Copyright 2019, Offshore Technology Conference

This paper was prepared for presentation at the Offshore Technology Conference held in Houston, Texas, USA, 6 – 9 May 2019.

This paper was selected for presentation by an OTC program committee following review of information contained in an abstract submitted by the author(s). Contents of
the paper have not been reviewed by the Offshore Technology Conference and are subject to correction by the author(s). The material does not necessarily reflect any
position of the Offshore Technology Conference, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Offshore Technology Conference is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of OTC copyright.

Abstract
Recent regulatory changes have moved in the direction of more oversight and more prescriptive solutions.
One of the areas that can help operators and drillers alike deal with the new regulations is improved risk
analysis. This paper addresses a methodology for Probabilistic Risk Analysis (PRA) modeling of blowout
preventer (BOP) systems. The PRA utilizes a combination of event trees and fault trees to determine the
probability of a hazard under a given set of conditions. The fault trees are populated with reliability data
from the best available sources.
Traditional BOP risk analysis has been based on a deterministic approach. The probabilistic approach provides
a logical method to assess top-level hazards resulting from specific component failures. This approach
has been used in other industries such as nuclear power and space exploration. A method of combining
testing intervals with PRA results to determine the probability of failure on demand is also included.

Introduction
Currently, offshore drilling in the USA requires testing of critical BOP functions every 14 days. This testing
regimen taxes the BOPs through the sheer number of times they are functioned, and it taxes operations by
interrupting continuous drilling. A PRA study of a system can be used to show that the risk of moving from
a 14-day to a 21-day test interval, for instance, is minimal, while the change reduces wear and tear on the
equipment and improves operational efficiency at the same time.

Probabilistic Risk Assessment (PRA)


A probabilistic risk assessment is constructed in four major steps. Throughout the process the OEM, drilling
contractor, and oil company work in partnership. Each member brings different perspectives and expertise.
At various stages, it may be best for one party or another to drive the process, but this should not be done
in isolation. Even though the oil company may drive part of the process, both the drilling contractor and
the OEM have perspectives and a stake in the decisions made. Actual creation of the PRA model should be
done by someone with reliability experience and preferably PRA experience.
Throughout the entire process, the analysts should consider uncertainty analysis and sensitivity analysis.
Because the PRA is the basis for the risk-informed safety case (NASA & BSEE, JSC-BSEE-NA-24402-02),
the analyst needs to account for uncertainty. Normally, uncertainty is associated with every data source.
According to NASA (2011), "The uncertainty is usually expressed as a probability distribution function for
the possible values of a variable." When it is prohibitive to develop the uncertainty distribution, the analyst
can perform sensitivity analysis to determine the criticality of various pieces of data (NASA, 2011).
The first step in PRA development is to set the boundaries of the analysis. The boundaries are defined
by determining the end states of interest and the scenarios needed to produce those end states (NASA &
BSEE, JSC-BSEE-NA-24402-02). For example, end states may be damage to environment, injury, death,
etc. The analysis would produce very different results if the goal was to analyze the possibility of death
versus the probability of non-productive time. Determining the analysis goals is an important part of setting
the boundaries. Generally, the best party to establish the end states of interest and the scenarios is the oil
company involved in the exploration with the support of the drilling contractor and the OEM.
The second step in the PRA process is to determine the initiating events (IE) that could lead to a specific
end state (NASA & BSEE, JSC-BSEE-NA-24402-02). Initiating events could be a well kick, a drive off,
an inadvertent LMRP disconnect, etc. The initiating events overlap the expertise of the oil company and the
drilling contractor. Either the oil company or the drilling contractor may be best suited to lead this analysis
with the support of the other two parties.
Step three in the process is the development of event trees (NASA & BSEE, JSC-BSEE-NA-24402-02).
The event trees represent how the drilling contractor's concept of operation is executed in the event of
an emergency. These trees include a sequence of events that must fail to lose a function and result in an
undesired end state. Generally, development of the event trees could best be led by the drilling contractor
with the support of the OEM and the oil company.
Finally, pivotal event analysis is done (NASA & BSEE, JSC-BSEE-NA-24402-02). Pivotal events are
generally the specific steps that must fail in the event tree. These are normally modelled with fault trees.
While some of the events may be operational or human factors, most of them will be equipment failures.
Since the correct function of the equipment is best understood by the OEM, this analysis is best led by the
OEM with the support of the drilling contractor and oil company.

Event Trees
Event trees are a convenient way to combine discrete probabilities into the probability of a final outcome.
This methodology is a technique of basic statistics. Walpole & Myers (2016) referred to the technique as
a tree diagram used to define the entire sample space. The solution to a tree diagram is straightforward.
Consider the diagram of Figure 1. The outcomes on the right are simply the product of the individual event
probabilities. For example, the bubble on the top right would simply be the product of probability PA and
probability PB.

Figure 1—Example tree diagram
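The product rule for tree diagrams can be sketched in a few lines of Python. The branch probabilities below are illustrative assumptions, not data from the paper:

```python
# Each end state of a two-event tree diagram (as in Figure 1) is the
# product of the branch probabilities along its path.
# p_a and p_b are illustrative values only.
p_a = 0.9  # probability that event A occurs
p_b = 0.8  # probability that event B occurs

# The four end states on the right side of the tree
outcomes = {
    ("A", "B"): p_a * p_b,                      # top-right bubble: P_A * P_B
    ("A", "not B"): p_a * (1 - p_b),
    ("not A", "B"): (1 - p_a) * p_b,
    ("not A", "not B"): (1 - p_a) * (1 - p_b),
}

# The tree spans the entire sample space, so the end states sum to 1
total = sum(outcomes.values())
```

Because the tree enumerates the whole sample space, summing the end-state probabilities to 1 is a quick sanity check on any hand-built event tree.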



The event trees used in a PRA are structured to quantify the probability of a consequence of interest
(NASA & BSEE, pra-05012017, 2017). To achieve this goal, the end states and the initiating events must
be defined ahead of time. The end state definition determines the boundaries of the analysis (NASA &
BSEE, JSC-BSEE-NA-24402-02, 2017). Some end states to consider may be safe shutdown, loss of life,
environmental damage, loss of or damage to equipment, blowout, fire, collateral damage, etc.
Several techniques exist to define the initiating events. These events may be found through FMEA,
hazard analysis, previous risk assessments, functional analysis, etc. (NASA & BSEE, pra-05012017, 2017).
Each end state can be analyzed in terms of the loss of some function. Loss of function happens because of
initiating events or combinations of initiating events. A graphical way to represent the necessary functions
with respect to the end state is a master logic diagram (NASA, 2011, SP-2011-3421). See Figure 2. The
master logic diagram defines relationships much like a functional block diagram. It allows the analysis team
to see which initiating events will affect various components or subsystems. This type of relational analysis
will allow the team to construct the event trees and understand which subsystems need to be treated as
pivotal events with respect to which initiating events. A key decision in this analysis is the trade-off between
complexity and fidelity of the model. Often a model can be made more complex with limited improvement
to the fidelity. The team must determine the appropriate depth of detail.

Figure 2—Master logic diagram

Once the outcomes, initiating events, and pivotal events are defined, an event tree can be constructed.
The event tree generally represents a sequence of events across the top and the probability tree in the body of
the diagram, see Figure 3. The purpose of the event tree is to determine the probability of a safe (or unsafe)
outcome in response to a given initiating event. This purpose implies that a separate event tree needs to be
constructed for each initiating event.

Figure 3—Event tree

Consider the highly simplified event tree of Figure 3. The initiating event may be a well kick. The
probability of a well kick has been studied by SINTEF, and data are available in the ExproSoft 2012 report,
Reliability of Deepwater Subsea BOP Systems and Well Kicks. Pivotal event A may be the probability of
the driller detecting the well kick in time to respond. This type of human factors data has been studied by
NASA; guidance is available in the Probabilistic Risk Assessment Procedures Guide for Offshore
Applications (NASA & BSEE, JSC-BSEE-NA-24402-02). Pivotal event B may be the probability of an
annular BOP closing when commanded. This probability necessarily includes the reliability of the BOP, the
hydraulic controls, the electronic controls, etc. Guidance on the final device probabilities is available in
ExproSoft (2012), but the probability of the entire functional loop must be calculated. For a PRA, the loop
probability is calculated using fault trees. For someone skilled in the art, it is obvious that an actual PRA
would include many more pivotal events, including shear ram closure, pipe ram closure, etc.

Fault Trees
Fault trees are a logical construct that capture the combinations of failures in a system that must occur to
cause the loss of a system function (O'Connor, 2005). Generally, in PRA, fault trees are used to calculate
the probability of a Pivotal Event occurring and these probabilities are rolled up to calculate the event tree
values. Fault trees are constructed of basic events, OR gates, and AND gates.
An OR gate represents a function failure when any of the components or subsystems at its input fail. An
AND gate represents a function failure when all the components or subsystems at its input fail. Consider
the highly simplified fault tree of Figure 4. F(A) is the probability of the annular BOP failing, F(Y) is the
probability of the yellow control failing, and F(B) is the probability of the blue control failing. The diagram
shows that loss of the annular function results from loss of the annular BOP or from loss of both the yellow
and blue controls.

Figure 4—Simplified fault tree
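Under the usual independence assumption, the gate logic of Figure 4 reduces to simple arithmetic. A minimal sketch in Python, with illustrative (not measured) failure probabilities:

```python
# OR gate: the function fails if ANY input fails.
# For independent inputs, P(fail) = 1 - product of (1 - p_i).
def or_gate(*probs):
    survive = 1.0
    for p in probs:
        survive *= (1.0 - p)
    return 1.0 - survive

# AND gate: the function fails only if ALL inputs fail.
def and_gate(*probs):
    fail = 1.0
    for p in probs:
        fail *= p
    return fail

f_a = 1e-4  # F(A): annular BOP fails (illustrative value)
f_y = 1e-3  # F(Y): yellow control fails (illustrative value)
f_b = 1e-3  # F(B): blue control fails (illustrative value)

# Figure 4: loss of annular = F(A) OR (F(Y) AND F(B))
loss_of_annular = or_gate(f_a, and_gate(f_y, f_b))
```

With these numbers the redundant control pods contribute little to the top event; the single annular BOP dominates, which is exactly the kind of insight cut set analysis surfaces at full scale.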

When developing fault trees, consideration must be given to common cause failures. Common cause
failures are failures that may occur in two identical components at the same time from a shared cause. For
example, two components may have similar sensitivity to temperature shock, or two components may have
been made to the same design on the same assembly line. A good explanation of common cause failures,
and how to compute coupling factors, can be found in IEC 61508 (2010); an entire appendix in that
document is dedicated to calculating beta factors. Representing common cause failure in a fault tree is
straightforward. When two components share a common cause failure (CCF), the failure structure is
represented as in Figure 5. Since common cause is a dependent probability (not independent), this
representation provides the correct mathematical construct. This method is preferred because it promotes
easier model validation on the back end of the development.

Figure 5—Representing common cause failure
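As a numerical illustration of why the CCF branch matters, the beta-factor model splits each channel's failure rate into an independent part and a common part. The beta factor, failure rate, and exposure time below are assumptions for illustration, not factors taken from the standard's appendix:

```python
# Beta-factor common cause model: a fraction beta of the failure rate is
# common to both redundant channels; the remainder fails independently.
beta = 0.05  # assumed coupling (beta) factor
lam = 1e-5   # assumed per-hour failure rate of each channel
t = 336.0    # exposure time in hours (one 14-day test interval)

p_channel = (1 - beta) * lam * t  # independent failure prob. of one channel
p_ccf = beta * lam * t            # common cause failure probability

# Loss of both channels: both fail independently, OR the CCF occurs
p_both = p_channel ** 2 + p_ccf
```

Even with a modest beta of 5%, the CCF term (about 1.7e-4) dwarfs the independent-failure term (about 1.0e-5), which is why modeling the CCF explicitly, as in Figure 5, matters for redundant systems.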

When developing fault trees, the basic events should cover every component that could contribute to a
failure. These may be represented as individual components or as an aggregated set of components for a
subsystem (assuming data are available for the subsystem failure rate). It is also absolutely critical that basic
events be assigned a reference designator that relates back to the system schematic or P&ID diagram. These
designators are the only linkage between the model and the system design. Finally, a rigorous naming
convention should be followed for all events in the model. BSEE recommends naming conventions in NASA
& BSEE, JSC-BSEE-NA-24402-02.

Data Sources
Probabilistic risk models can be fairly complex. A recent model of a BOP system was created. Even
after aggregating the surface controls into cabinet-level failure rates and aggregating the subsea electronic
modules, the system model still required 1620 basic events to describe the system. The model also required
nearly 1000 gates to describe the failure logic.
Each basic event in a system requires data for the model to use during the calculation phase of analysis.
Typical data requirements include:

• Event name

• Component failure rate

• Repair time

• Mission time

The event name should be created following the rigorous naming convention recommended by BSEE
(NASA & BSEE, JSC-BSEE-NA-24402-02). Component failure rates should be set based on the best
available data. The repair time and mission time are operational data depending on how the equipment is
used.
It can be difficult to identify the best available data for failure rates, yet this is extremely important to
creating an accurate model. The phrase 'best available data' means that the analyst should use the data
considered the most accurate or most reliable. For example, one component may have been designed with
a rigorous testing and qualification program; the data resulting from that program may be more accurate
for that component than general industry data for similar components. Table 1 shows many data sources
that can be used for drilling equipment. During this project, the order of preference was as listed in Table 1.

Table 1—Data sources

Methodology
As described in the previous sections, the PRA starts by identifying the boundary conditions and
expressing them in an event tree. Once the events are defined, fault trees are created to provide the underlying
analytical structure for the analysis. The method of calculation used for this study was software
known as SAPHIRE (Smith, et al., 2009).

SAPHIRE is an acronym for Systems Analysis Programs for Hands-on Integrated Reliability
Evaluations. The tool was originally developed by Idaho National Laboratory for use by the Nuclear
Regulatory Commission. SAPHIRE integrates the fault trees, event trees, and necessary reliability data in
one package. It also provides calculation capabilities to analyze the results and produce solution sets (called
cut sets) that can be used as inputs to the risk-informed decision process.
The blowout preventer model resulted in 222 fault trees consisting of 1620 basic events and 957 logic
gates. Development of this model started with customer-provided event trees. The system schematics
and P&ID diagrams were analyzed to create the fault tree structure. This model was focused on safety,
not downtime, so the fault trees focused on failure modes that would prevent a BOP from closing
through any path. This was a correct simplifying assumption for the objective of this model. A model could
be constructed with the objective of downtime analysis, but it would have more complexity.
During the development of the model, each tree was assessed for uncertainty. In general, it was easy to
understand the impact of most failures. In areas where there was a question, a simple sensitivity test was
done to help validate the model. The models were then updated and revised to include common cause
failures. The analysis was run and the cut sets were observed. The cut set analysis was used to identify
failure probabilities of specific functions. It also allowed the team to determine whether there were any
single point failures or failures where common mode failures were dominant. The analysis then allowed a
roll-up of failure rate to a system level.

Probability of Failure on Demand


Probability of failure on demand (PFD) is a metric that is calculated from reliability data, diagnostic
coverage, and proof test intervals. The method is described in IEC 61508 (2010) and summarized by Holmes
(2015). The basic equations of a simple 1oo1 system are shown in Equations 1–4 (Holmes, 2015). The
calculations for more complex redundant systems can be found in IEC 61508 (2010). Since these equations
use both diagnostic testing and proof testing to establish a metric, they are ideal for making operational
trade-offs between monitoring methods and pressure test intervals (see the next section).
Equation 1: λ_D = λ_DU + λ_DD
Equation 2: λ_DD = DC × λ_D
Equation 3: λ_DU = (1 − DC) × λ_D
Equation 4: PFD = λ_DU × (T1/2 + MRT) + λ_DD × MTTR
where λ is failure rate, DU is dangerous undetected, DD is dangerous detected, D is dangerous, DC is
diagnostic coverage, T1 is the proof test interval, and MRT and MTTR are repair times in the standard. For
the purposes of drilling, MRT and MTTR can be calculated as the amount of time it takes to come to a safe
state (close a BOP) after a defect is detected.
For a given dangerous failure rate, these equations demonstrate that as the diagnostic coverage (DC)
increases (increased monitoring), dangerous failures move from undetected to detected. This has the
effect of reducing reliance on the proof test interval (T1). These equations show that increasing the proof
test interval will increase the PFD, but increasing diagnostic coverage will reduce it.
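This trade-off can be seen numerically. The sketch below evaluates the simplified 1oo1 relationships described above for an assumed dangerous failure rate; the values are illustrative, not data from the study:

```python
def pfd_1oo1(lam_d, dc, t1, mrt, mttr):
    # Simplified 1oo1 form: the dangerous failure rate is split by
    # diagnostic coverage into detected and undetected parts.
    lam_dd = dc * lam_d        # dangerous detected
    lam_du = (1 - dc) * lam_d  # dangerous undetected
    return lam_du * (t1 / 2 + mrt) + lam_dd * mttr

lam_d = 1e-5         # assumed dangerous failure rate, per hour
mrt = mttr = 0.0125  # 45 s to reach a safe state, in hours

pfd_low_dc = pfd_1oo1(lam_d, dc=0.60, t1=336.0, mrt=mrt, mttr=mttr)
pfd_high_dc = pfd_1oo1(lam_d, dc=0.90, t1=336.0, mrt=mrt, mttr=mttr)
pfd_long_t1 = pfd_1oo1(lam_d, dc=0.60, t1=504.0, mrt=mrt, mttr=mttr)
```

Raising diagnostic coverage at a fixed interval lowers the PFD, while lengthening the proof test interval at fixed coverage raises it; balancing the two is the basis of the trade-off developed in the next section.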

Methods of Demonstrating Risk for Extended Pressure Test Cycles


Two methods of assessing the risk of extended pressure testing are proposed. The first method uses the PRA
directly by varying mission times. The second method uses the output of the PRA to assess the dangerous
failure rate (as described in IEC 61508, 2010) and then uses the probability of failure on demand metric to
assess risk.

Mission time analysis


The failure rate of various functions in the system is calculated from the data included in the PRA. For
repairable components, the PRA data include a failure rate, a repair time, and a mission time. The failure
rates are determined by the best available data as described above. The repair time and mission time data
are determined by the analysis team based on domain knowledge of how the equipment is used. These data
may vary from one model to another but need to be the same when comparing models to each other.
One methodology of assessing the risk of varying test intervals is to use the mission time parameter as the
variable of interest. The PRA risk models are very effective at comparing resultant cut sets between models
to assess the impact of a change to the system design or system assumptions. To perform this analysis, the
components of the model would have mission times set to the current test interval (e.g., 14 days). The model
would be run and cut sets would be determined. Then the mission times would be set to the desired test
interval (e.g., 21 days) and the model rerun. The model results could be compared in a straightforward
way to show how much risk is incurred by changing the test interval. Any increased risk would have to be
offset by a specially developed monitoring plan.
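For a single basic event, the effect of extending the mission time can be previewed without rerunning the full model. A small sketch, assuming a constant (exponential) failure rate that is illustrative only:

```python
import math

lam = 1e-5  # assumed constant failure rate, per hour

# Failure probability over one mission (unreliability) at each interval
q_14_day = 1 - math.exp(-lam * 336.0)  # 14 days = 336 hours
q_21_day = 1 - math.exp(-lam * 504.0)  # 21 days = 504 hours

# For small lam*t, the risk scales almost linearly with mission time,
# so the ratio is close to 504/336 = 1.5
ratio = q_21_day / q_14_day
```

The full PRA comparison works on cut sets rather than single events, but this near-linear scaling of each basic event is what drives the model-to-model difference the analysis quantifies.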

Probability of failure on demand analysis


An alternative method of analysis is to consider the probability of failure on demand (PFD) metric. This
methodology would start with the development of the PRA to determine the dangerous failure rate. An
assessment of the sensors, the available data, what data are available in analog form versus simply as
alarms, the period of monitoring, etc. would be done to determine the diagnostic coverage. The diagnostic
coverage is a metric that defines the percentage of the system covered by diagnostics. It is likely that
information is available in the system that is not being continuously monitored; rather, some of the system
data may currently be used only for diagnostics or maintenance. Such data would not be included in the
continuous diagnostics analysis.
Assume MRT = MTTR = 45 seconds (for drilling); note that 45 seconds = 0.0125 hr, 14 days = 336
hr, and 21 days = 504 hr. Two equations can then be set up, as in Equations 5–6, to establish the PFD for
a 14-day test cycle and a 21-day test cycle.
Equation 5: PFD_14 = (1 − DC_14) × λ_D × (336/2 + 0.0125) + DC_14 × λ_D × 0.0125
Equation 6: PFD_21 = (1 − DC_21) × λ_D × (504/2 + 0.0125) + DC_21 × λ_D × 0.0125
If these two equations are set equal to each other, the only remaining unknown is the diagnostic
coverage required for a 21-day test. Once that number is determined, an action plan can be established that
demonstrates the amount of increased monitoring required to keep the PFD from rising when the proof test
interval is increased from 14 to 21 days.
A numerical example was calculated assuming the current diagnostic coverage (for a 14-day proof test
interval) was 75%. Using Equations 5–6, the diagnostic coverage required for a 21-day proof test interval
would be 83%. By increasing the diagnostic coverage by 8 percentage points, the proof test requirement
can be relaxed without impact to the probability of failure on demand.
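The arithmetic behind the 75% to 83% example can be reproduced directly: setting the 14-day and 21-day PFD expressions equal cancels the dangerous failure rate, leaving the required 21-day coverage in closed form.

```python
t14, t21 = 336.0, 504.0  # proof test intervals in hours
mrt = mttr = 0.0125      # 45 s to reach a safe state, in hours
dc14 = 0.75              # current diagnostic coverage at 14 days

# PFD(14 days) = PFD(21 days); lambda_D cancels from both sides:
#   (1 - dc14)*(t14/2 + mrt) + dc14*mttr
#       = (1 - dc21)*(t21/2 + mrt) + dc21*mttr
target = (1 - dc14) * (t14 / 2 + mrt) + dc14 * mttr
dc21 = ((t21 / 2 + mrt) - target) / ((t21 / 2 + mrt) - mttr)
# dc21 comes out to about 0.833, i.e., roughly 83% coverage needed
```

Because the failure rate cancels, this required coverage depends only on the two intervals, the current coverage, and the repair times.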

Conclusions
The work completed with our customer on the PRA models shows that these models can be used to
examine events and to establish the risk level associated with each event. This can be useful in determining
the test interval for functioning BOPs.

Future Work
Probabilistic risk assessments are now established technology. These models can be built from P&ID
diagrams and electrical schematics and provide useful information to all parties involved. Much of the PRA
data today relies on industry JIP data and standards data. In the future it would be useful to be able to answer
the question, "What is the probability of this event on THIS rig?" The only way to answer such a question is
to include rig-specific data in the model so the predictions are specific to the rig.
Future work in this area could include integrating rig analytics, captured on a system like SeaLytics™,
with the PRA. Specific techniques could include Bayesian updating. Bayes' theorem allows probabilities
to be updated as more information becomes available. The base data set could be the JIP and
standards-based data, updated with Bayesian methods as the rig's data capture system increases the amount
of rig-specific information available to the model.
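One conjugate form of such an update is the gamma-Poisson model often used for failure rates: the generic JIP/standards estimate acts as the prior, and rig-specific exposure updates it. The prior parameters and observed counts below are hypothetical:

```python
# Gamma prior on the failure rate (per hour), with parameters chosen so
# the prior mean matches an assumed generic rate of 1e-5 per hour.
alpha_prior = 2.0
beta_prior = 2.0e5  # prior mean = alpha/beta = 1e-5 per hour

# Hypothetical rig-specific experience from the data capture system
failures_observed = 1
hours_observed = 5.0e4

# Conjugate Bayesian update for a Poisson failure process:
# alpha accumulates failures, beta accumulates exposure hours
alpha_post = alpha_prior + failures_observed
beta_post = beta_prior + hours_observed

rate_posterior = alpha_post / beta_post  # updated rig-specific rate
```

As rig hours accumulate, the posterior shifts from the generic prior toward the rig's own experience, which is exactly the behavior the rig-specific question above calls for.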

Acknowledgments
We would like to acknowledge Baker Hughes a GE Company for the execution of this program and allowing
the authors to learn while providing benefit to our customers. It was a significant effort to build the PRA
models and validate them in partnership with our customer.
We would also like to acknowledge Anadarko Petroleum Company for their assistance in completing
this work.

References
Bureau of Safety and Environmental Enforcement. Probabilistic Risk Assessment (PRA) Study
(2018). https://www.bsee.gov/what-we-do/offshore-regulatory-programs/risk-assessment-analysis/probabilistic-risk-
assessment-analysis (accessed 21 December 2018).
ExproSoft, 2012, ES 201252 Report, Reliability of Deepwater Subsea BOP Systems and Well Kicks.
Holmes, J. 2015. Safety Integrity Level Requirements in Deepwater Drilling - Where Safety Meets Reliability. IEEE
Xplore. DOI: 10.1109/RAMS.2015.7105116.
International Electrotechnical Commission, 2010. IEC 61508 Functional Safety of Electrical/Electronic/Programmable
Electronic Safety-Related Systems. (2nd ed.).
Military Handbook, MIL-HDBK-217F, Reliability Prediction of Electronic Equipment (December 1991).
NASA, 2011, SP-2011-3421, Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners,
2nd Edition. https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120001369.pdf (accessed 21 December 2018).
NASA & BSEE, pra-05012017-whitepaper, Probabilistic Risk Assessment: Applications for the Oil & Gas Industry. https://
www.bsee.gov/sites/bsee.gov/files/pra-05012017-whitepaper.pdf (accessed 21 December 2018).
NASA & BSEE, JSC-BSEE-NA-24402-02, Probabilistic Risk Assessment Procedures Guide for Offshore
Applications (DRAFT). https://www.bsee.gov/sites/bsee.gov/files/ProbalisticRiskAssessment%20%28PRA%29/
bsee_pra_procedures_guide_-_10-26-17.pdf (accessed 21 December 2018).
Naval Surface Warfare Center. (1992). Handbook of Reliability Prediction Procedures for Mechanical Equipment. https://
apps.dtic.mil/dtic/tr/fulltext/u2/a273174.pdf (accessed 8 January 2019).
Quanterion Solutions, NPRD-2016. Nonelectronic Parts Reliability Data. (2016). https://www.quanterion.com/wp-
content/uploads/2015/09/NPRD-2016-1.pdf (accessed 8 January 2019).
O'Connor, P. 2005. Practical reliability engineering. 4th ed. Wiley: Hoboken.
OREDA. 2009. Offshore Reliability Data, 5th ed. Volume 1 – Topside Equipment. SINTEF.
OREDA. 2009. Offshore Reliability Data, 5th ed. Volume 2 – Subsea Equipment. SINTEF.
Smith, C. et al. (2009). SAPHIRE Basics: An Introduction to Probabilistic Risk Assessment via the Systems Analysis
Program for Hands-on Integrated Reliability Evaluations Software. https://www.nrc.gov/docs/ML1204/ML12044A174.pdf
(accessed 17 January 2019)
United States Nuclear Regulatory Commission, Probabilistic Risk Assessment (PRA). https://www.nrc.gov/about-nrc/
regulatory/risk-informed/pra.html#Definition (accessed 21 December 2018).
Walpole, R. & Myers, R. 2016. Probability & Statistics for Engineers & Scientists. 9th ed. Pearson.

Appendix - Authors
John Holmes is a Chief Consulting Engineer at Baker Hughes, a GE company. He has been an engineer and
engineering manager since 1985 in small, medium, and large corporations, working predominantly in new
product development. In that time he has published several papers, received 32 US patents, and been awarded
several international patents. Dr. Holmes also has twelve years of part-time teaching experience in the Purdue
system. Dr. Holmes received his doctorate in Engineering Management from Northcentral University and
holds a BS degree from Purdue, an MS degree from Indiana State University, and an MBA from Sullivan
University. He is also a Certified Functional Safety Professional, ASQ Certified Reliability Engineer, GE
Certified Six Sigma Black Belt, and a registered Professional Engineer.

Viral Shah is the Systems Engineering Manager at Baker Hughes, a GE company. He has been an engineer
with Baker Hughes since 2008, working in product teams and New Product Introduction teams. He holds
3 US patents and has several other publications. Viral received his MS degree in Mechanical Engineering
from Texas A&M University and is a GE Certified Six Sigma Green Belt.
