Fault Tree Analysis

You might also like

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 19

Introduction

 There is a need to analyze all the possible failure mechanisms in complex


systems (e.g. nuclear power plants).
 Also it performs probabilistic analyses for the expected rate of failures.
 Estimate probabilities of events that are modelled as logical combinations or
logical outcomes of other random events.
 Two main methods:
a) Fault tree analysis
b) Event tree analysis
 Decision trees also exist and are used in risk analysis (combines all feasible
alternatives, possible outcomes and their probabilities, monetary
consequences and utility evaluations).
 Other graphical methods include:
a) Reliability block diagrams
b) Functional logic diagrams
c) Failure Modes and Effects Analysis (FMEA)

1
Fault Tree Analysis
CIVE

A technique by which many events that interact to produce other events


can be related using simple logical relationships.

 It is one of the principal methods of probabilistic safety (or risk) analysis (PRA).
 It was developed by Bell Telephone Laboratories in 1962 for the U.S. Air Force
Minuteman system, later adopted and extensively used by Boeing Company.
 Fault tree diagrams are used most often as a system-level risk assessment
technique, that can model the possible combinations of equipment failures,
human errors, and external conditions that can lead to a specific type of
accident.
 It follows a top-down structure and represent a graphical model of the pathways
within a system between basic events that can lead to a foreseeable loss event
(or a failure) referred to as the top event.
 The contributory events and conditions are interconnected using standard logic
symbols (AND, OR, etc.), also referred to as gates.
 The events that must coexist to cause the top event are described using the
AND relationship, alternate events that can individually cause the top event are
described using the OR relationship.
 The concurrency not lead to a serious or adverse consequence. The relative
likelihood of a number of potential consequences will depend o n the
conditions or subsequent events that follow potential consequences can be
systematically identified using an event tree.

2
Basic Events

An event that cannot be developed any


further.

 All basic events are generally assumed to be statistically independent unless


they are common cause failures (i.e. failures arising from a common cause or an
initiating event).
 Basic events can be either:
a) Primary fault events, i.e. subsystem failure due to a basic mode such as a
structural fault, failure to open or close, or to start or stop.
Or,
b) Secondary fault events, i.e. subsystem failure due to excessive operational
or environmental stress resulting in the system elements to be out of
tolerance.

Advantages:
 It allows the use of reliable information on component failure and other basic
events to estimate the overall risk associated with new system designs for which
no historical data exists.
 It is simple to understand and easy to implement.
 Qualitative descriptions of potential problems and combinations of events
causing specific problems of interest.
 Quantitative estimates of failure frequencies and likelihoods, and relative
importance of various failure sequences and contributing events.
 Lists of recommendations for reducing risks.
 Quantitative evaluations of recommendation effectiveness.

Limitations:
 It is difficult to conceive all possible scenarios leading to the top event.
 Construction of fault trees for large systems can be tedious.
 Correlations between basic events (e.g. failure of components belonging to the
same batch) are difficult to model and exact solutions to correlated events do not
exist.
 Subjective decisions regarding the level of detail and completeness are often
necessary.
CIVE

3
Notation
Symbol Name Description
Primary Event Symbols
Circle Basic Event: A basic initiating fault
requiring no further development.

Oval Conditioning Event: Specific conditions or


restrictions that apply to any logic gate
(used with INHIBIT gate).
Diamond Undeveloped Event: An event that is not
developed further because it is of
insufficient consequence or because
information is unavailable.
House External Event: An event which is
normally expected to occur (not a fault
event).
Intermediate Event Symbols
Rectangle A fault event that occurs as a result of
the logical combination of other events.

Gate Symbols
OR Gate The union operation of events, i.e. the
output event occurs if (at least) one or
more of the inputs occur
AND Gate The intersection operation of events, i.e. the
output event occurs if and only if all the
inputs occurred.
INHIBIT The output event occurs if the (single) input event occurs in the
Gate presence of an enabling condition (i.e. Conditioning Event (oval)
drawn to the right of the gate)
Transfer Symbols
Triangle-in Indicates that the tree is developed
further someplace else (e.g. another
page).
Triangle-out Indicates that this portion of the tree is a sub- tree
connected to the corresponding Triangle-In .

4
CIVE

General Procedure for Fault Tree Analysis

Step 1. Define the system of interest:


 Specify and clearly define the boundaries and initial conditions of the
system for which failure information is needed.
Step 2. Define the top event for the analysis:
 Specify the problem of interest that the analysis will address (e.g. a
specific quality problem, shutdown, safety issue, etc.).
Step 3. Define the treetop structure:
 Determine the events and conditions (i.e., intermediate events) that most
directly lead to the top event.
Step 4. Explore each branch in successive levels of detail:
 Determine the events and conditions that most directly lead to each
intermediate event. Repeat the process at each successive level of the tree
until the fault tree model is complete.
Step 5. Solve the fault tree for the combinations of events contributing to the
top event:
 Examine the fault tree model to identify all the possible combinations of
events and conditions that can cause the top event of interest. A
combination of events and conditions sufficient and necessary to cause the
top event is called a minimal cut set.
Step 6. Identify important dependent failure potentials to adjust the model:
 Study the fault tree model and the list of minimal cut sets to identify
potentially important dependencies among events. Dependencies are
single occurrences that may cause multiple events or conditions to occur
at the same time.
Step 7. Perform quantitative analysis:
 Use statistical characterizations regarding the failure and repair of specific
events and conditions in the fault tree model to predict the future
performance of the system.
Step 8. Use the results in decision-making:
 Use the results of the analysis to identify the most significant
vulnerabilities in the system and to make effective recommendations for
reducing the risks associated with those vulnerabilities.

5
Rules of Fault Tree Construction

A fault tree should only be constructed once the functioning of the entire system is
fully understood.
The objective is to identify all the component failures, or combinations thereof that
could lead to the top event (Steps 2 - 4 above)

Rule 1. State the fault event as a fault, including the description and timing of a
fault
condition at some particular time. It includes:
a) What is the fault state of that system or component is
b) When that system or component is in a faulty state.
Test the fault event by asking:
a) Is it a fault?
b) Is the what-and-when portion included in the fault statement?

Rule 2. There are two basic types of fault statements, state-of-system and state-of-
Component. To continue the tree:
a) If state-of-system fault statement, use Rule 3
b) If state-of-component fault statement, use Rule 4

Rule 3. A state-of-system fault may use an AND, OR, or INHIBIT gate or no gate at all.
To determine which gate to use, the faults must be then:
a) Minimum necessary and sufficient fault events,
b) Immediate fault events.

Rule 4. A state-of-component fault always uses an OR gate.


To continue, look for the primary, secondary, and command failure fault
events. Then state those fault events.
 Primary failure: It is the failure of that component within the design
envelope or environment.
 Secondary failures: These are the failures of that component due to
excessive environments exceeding the design environment.
 Command faults: These are the inadvertent operation of the component
because of a failure of a control element.

6
CIVE

Rule 5. No gate-to-gate relationships.


Put an event statement between any two gates.

Rule 6. Expect no miracles.


Those things that would normally occur as the result of a fault will occur, and
only those things. Also, normal system operation may be expected to occur
when faults occur.

Rule 7. In an OR gate, the input does not cause output.


If any input exists, the output exists. Fault events under the gate may be a
restatement of the output events.

Rule 8. An AND gate defines a causal relationship.


If the input events coexist, the output is produced.

Rule 9. An INHIBIT gate describes a causal relationship between one fault and
another,
but the indicated condition must be present. The fault is the direct and sole
cause of the output when that specified condition is present. Inhibit conditions
may be faults or situations, which is why AND & INHIBIT gates differ.

7
How to Perform Fault Tree Analysis (FTA)

As previously mentioned, the FTA is a logical breakdown from the Top-level undesired
event, cascaded to the Base-level event (root cause). Each path has a probability
assigned. The paths related to the highest severity / highest probability combinations
are identified and will require mitigation. Starting at the Base-level event (at the
bottom of the FTA) and working the path up to the undesirable Top-level event is
called a Cut Set. There are many cut sets within the FTA. Each has an individual
probability assigned to it. The Base-level event is often color-coded to identify the risk
level indicated.

The 5 basic steps to perform a Fault Tree Analysis are as follows:


1. Identify the hazard.
2. Obtain and understanding of the system being analysed.
3. Create the fault tree.
4. Identify the cut sets.
5. Mitigate the risk.

Step 1. Identify the Hazard: Knowing the consequence of the failure is useful in
defining the top-level event of the fault tree. The top-level event, or hazard, should be
defined as precisely as possible:
 How much?
 How long (duration)?
 What is the safety impact?
 What is the environmental impact?
 What is the regulatory impact?

Step 2. Obtain and understanding of the system being analyzed:


 Create or acquire appropriate support information:
a) List of components (Bill of Material)
b) Boundary Diagram
c) Schematic
d) Code Requirements
e) Engineering Noises and Environments
f) Examples of similar products or failures

8
CIVE

 List the potential causes of the hazard to the next level. This is similar to
the ‘5 why’ process, except the development of a fault tree should be
focused on a single level before progressing to the next,
a) Include system design engineers, who have full knowledge of the
system and its functions, in the higher levels of the Fault Tree
Analysis. This knowledge is very important for cause selection.
b) Include Reliability Engineers who can assist in developing the
relationships of causes to a failure or fault.
 Estimate the probability of the causes at the Base-level event.
 Label all causes with codes.
 Prioritize or sequence causes in the order of occurrence or probability.

Step 3. Create the Fault Tree:


In the FTA example to the right, the team would stop the analysis on “Air Present”
because Oxygen presence is outside of the control
of the team developing the FTA.
Analysis continues down to the next level on “Fuel
Leak”. The team performing the FTA is brought
together to focus on the potential causes of fuel
leaks. The analysis is not limited to mechanical
failures alone. The inclusion of electronics and
software in complex design brings both the
opportunity to create or mitigate failures. The risks
may be prevented through engineering choices or
controlled through Quality Control.
The example tree continues to additional, more
detailed levels. The Base-level event (depicted as a
circle or oval) is the point at which the team can
address the risk.
The Base-level event is typically colour coded as
follows:
 Red: Critical Risk
 Orange: High Risk
 Yellow: Minor Risk
 Green: Acceptable / Very Low Risk

9
Step 4. Identify the Cut Sets:
 Risk is estimated for each event,
a) When available, the failure rate data can be used to calculate the risk of
a single chain or many chains.
b) If there is no data, an estimate is established based on subjective
guidelines similar to those used in FMEA development.
 The Cut Sets with risk greater than the system can tolerate (i.e. safety or
inoperative conditions) are selected for mitigation.
 Actions are required for Critical (red) and High Risks (orange).

Step 5. Mitigate the Risk:


Risk Mitigation can take many forms. A popular method is to use the criticality
method. Other techniques require a level of mitigation calculated to Defects per
Million Opportunities (DPMO). Safety systems may require resulting risk to be
mitigated to:
 Error Proofing (cannot Occur)
 1 in 10 million (1 X 10 to the minus 7)

Action logs and revision records are kept for follow-up and closure of each
undesirable risk. Any risk not mitigated to an acceptable level is a candidate
for mistake-proofing or quality control, which protects the consumer from the risk.

 Risk mitigation strategies:


When a risk is unacceptable the team may have several options available. The
following are a few examples of the options available:
 Design change.
 Selection of a component with higher reliability to replace the Base-level
event component.
 Physical Redundancy of the Component.
 Software Redundancy.
 Warning System.
 Quality Control

10
CIVE

Fault Tree Analysis – Example Implementation

Scaffolding Fall Fault Tree Analysis


We can conveniently analyze falls from scaffolding through Fault Tree Analysis. There can be
two potential causes, such as fall by accident, and the safety belt not working. The safety belt
not working is further divided into ‘broken by equipment,’ and ‘did not wear a safety belt.’ In
this way, ‘Fall by accident’ is also analyzed with this diagram.

Fall from
Scaffolding

Height &
ground
conditions

Fall from the


scaffolding

Safety belt not


Fall by accident
working

Broken safety Did not wear


equipment safety belt

Slip & Loss


fall balance

Safety Forget
belt to wear
Upholder broken
broken

11
When to Use Fault Tree Analysis

Every industry needs Fault Tree Analysis for:

 Probability of failure: FTA is typically used to find the probability of failure


in any system or event. With the help of a fault tree diagram, you can easily
calculate the probability of failure by analyzing basic events at the lower level
of a system. When you read the failure analysis, you can calculate the
probability of success in the next step. You can also use it in project work to
identify possible failures that might occur in the system.
 Defining Top-level system failures: Fault tree analysis is commonly used to
define systems at the basic level. A fault tree diagram moves branches from
top to bottom while defining every basic level event and its immediate causes.
It helps you identify the cause of Top-level failures in the system.
 Identifying system characteristics and next-level events: It helps you
understand the system's characteristics and chain of events. The top to
bottom fault tree diagram identifies the chain of next-level events that can
happen and possibly cause failure in the system. It chains all of the events
together and creates a failure path for a better analysis.
 The root cause of an event using logic gates: FTA identifies the root cause
of failure in any event. When it comes to designing a system, finding the
reason behind failures and fixing them gets complex. A fault tree uses logic
gates to reach the root cause for any positive or negative events in the system.
It analyzes the faults in the system to identify negative events that cause
failure. It also defines the positive events that make the system run properly.

12
Types of Fault Tree Analysis
CIVE

The Standard Fault Tree Analysis is not the only option. For specific industries and use
cases, FTA has been extended. These extensions could visualize features that are
difficult to express by standard fault trees. These are some of them:
 Dynamic FTA Dynamic Fault Trees: Dynamic fault trees (DFT), which extend
existing fault trees, model complex system behaviours and interact with them.
 Repairable FTA – Repairable Fault Trees – Enhance the FTA model with the
ability to describe complex dependent repairs to system components.
 Extended: Takes into account multi-state components as well as random
probabilities.
 Fuzzy: This complex mathematical concept, called fuzzy set theory, takes into
account unreliable variables that are hard to predict (like wind and weather).
 State-event FTA: The SEFT is used to analyse dynamic behaviour that fault trees
can’t model.

FTAs generally fall under two categories: Qualitative and Quantitative.


 Qualitative analysis must be performed at all times.
 Quantitative analysis can only be used in cases where you have a good idea of the
likelihood of certain events. Let’s look at each one in detail.

 Qualitative Fault Tree Analysis:

Qualitative fault tree analysis is used to understand the fault tree structure and analyse
the system’s vulnerabilities. We can conduct qualitative fault tree analysis in many
ways.
 Minimal cut (MCS): It helps to identify vulnerabilities in a system. A system that
has a low number of components, or a large number of elements with high failure
probabilities, would be deemed unreliable. These elements are identified in
MCS’s fault tree. You can improve reliability by reducing the likelihood of failure
or adding redundancies.

13
 The Minimal Path Sets (MPS): It will help you to determine the system’s
strength. This is a method that identifies the minimum number of components
necessary to keep the system functioning. Once you have identified the elements,
you can work to reduce their failure rate. This improves the system’s reliability.
 Common Causes Errors (CCE): Determine if multiple elements can cause
failures. CCE identifies critical components. Your team must ensure that these
components are regularly inspected and replaced if necessary. A computer-aided
maintenance management system (CMMS), can plan and schedule maintenance
for these critical components.

 Quantitative Fault Tree Analysis:

The quantitative FTA method can be used for calculating the probability of failure in
the analysis. This will allow you to better understand your risk and prioritize it.
Quantitative FTA may produce stochastic or important measures:
 We can use stochastic measures to determine the likelihood of the system failing.
 Importance Measures indicate the importance of a cut set, path or system to
reliability.
 Once you have a good idea of the probability of your basic events you can
calculate the probabilities for your intermediate events using the gates
connecting them. OR gates and AND gates are the most common gates. Here is an
example.

14

Benefits & Limitations of Fault Tree Analysis


CIVE

 Benefits:
 A visual representational record of a system is determined to show the
logical relationships between causes and events that lead to system failure.
 The analysis helps others to understand the results and pinpoint the
weaknesses quickly.
 At the same time, it also lays the foundation for any further evaluation and
analysis.
 When we make upgrades or changes to the system, with Fault Tree
Analysis, you already have steps for possible changes and effects.
 A Fault Tree diagram can also be used to maintain procedures and design
quality tests.

 Limitations: Fault Tree Analysis also suffers from several limitations,

 We need to foresee and evaluate first the undesired event. Thus, we have to
anticipate all contributing factors to the cause.
 The effort sometimes may prove to be very expensive and time-consuming
unless you equip certain skills to analyse the FTA yourself.
 If the reader doesn’t possess skills, then Fault Tree Analysis may not be
desirable to use.

Conclusion:

Fault Tree Analysis is not an easy process even though the examples might tell you
otherwise. You will feel more able to see the future and predict failures if you have the
right team. You will be the one who plans fault repair into scheduled maintenance
downtime, and your team will work proactively rather than reactively.

15
Bibliography

1. Industrial Safety & Maintenance Management – M.P Poonia & S.C Sharma
2. Industrial Safety, Health and Environment Management Systems – R.K Jain
3. Fault Tree Analysis Guide with Example – www.safetyculture.com
4. Fault Tree Analysis – www.sixsigmastudyguide.com
5. What is Fault Tress Analysis And How To Perform It – limblecmms.com
6. Fault Tress Analysis 8 steps process – accendroreality.com

16
CIVE

Abbreviations

 FTA - Fault Tree Analysis


 ETA - Event Tree Analysis
 FMEA - Failure Mode & Effects Analysis
 PRA - Probabilistic Safety (or Risk) Analysis
 RCA - Root Cause Analysis
 DPMO - Defects per Million Opportunities
 DFT - Dynamic Fault Trees
 SEFT - State Event Fault Trees
 MCS - Minimal Cuts
 MPS - Minimal Path Sets
 CCE - Common Causes Errors
 CMMS - Computer-aided Maintenance Management System
 QFTA - Qualitative/Quantitative Fault Tree Analysis

*******

17

You might also like