FMEA A Life Cycle Cost Perspective
DETC2000/RSAFP-14478
ABSTRACT
Failure Modes and Effects Analysis (FMEA) is a method to identify and prioritize potential failures of a product or process. The traditional FMEA uses three factors, Occurrence, Severity, and Detection, to determine the Risk Priority Number (RPN). This paper addresses two major problems with the conventional FMEA approach: 1) The Detection index does not accurately measure contribution to risk, and 2) The RPN is an inconsistent risk-prioritization technique.
The authors recommend two deployment strategies to address these shortcomings: 1) Organize the FMEA around failure scenarios rather than failure modes, and 2) Evaluate risk using probability and cost. The proposed approach uses consistent and meaningful risk evaluation criteria to facilitate life cost-based decisions.
KEYWORDS: FMEA, FMECA, Risk Priority Number (RPN), reliability, risk management

1. INTRODUCTION
While FMEA is considered a tool to help improve reliability, it can also guide improvements to the serviceability and diagnosability of a system (Di Marco et al., 1995: Lee, 1999). However, product development teams must balance risk reduction strategies (improved reliability, serviceability and diagnostics) with cost.
Companies are aware of warranty costs associated with product failures. Additionally, many companies now offer long-term service contracts, making the manufacturer responsible for costs formerly borne by the customer. Now more than ever, companies need to make cost-driven decisions when compromising between the risk of failure and the cost of abatement. The Risk Priority Number (RPN), used to evaluate risk in FMEA, is not sufficient for making cost-driven decisions. At best, the RPN provides a qualitative assessment of risk. In addition, components of the RPN (Severity, Detection, and Occurrence) have inconsistent definitions, resulting in questionable risk priorities.
This paper introduces a scenario-based FMEA to delineate and evaluate risk events more accurately.
Scenario-based FMEA is an FMEA organized around undesired cause–effect chains of events rather than failure modes. Scenario-based FMEA uses probability and cost to evaluate the risk of failure.
The purpose of scenario-based FMEA is to improve the representation of failures, and to evaluate these failures with consistent and meaningful metrics. This approach facilitates economic decision-making about the system design. Table 1 shows a rough comparison between traditional FMEA, serviceability design, and scenario-based FMEA.

Table 1 Comparison of risk-reduction techniques
method | strategy | failure probability | cost of failure | product cost | total life cost
traditional FMEA | ↑ reliability | ↓ | same | ↑ | ?
serviceability design | ↓ service cost | same | ↓ | ↑ | ?
scenario-based FMEA | ↓ life-cycle cost (failure cost and product cost) | cost-based decision | cost-based decision | cost-based decision | ↓

Traditional FMEA is usually focused on improving product reliability – often at the expense of product cost. Serviceability design reduces the cost of the failure (if failure does occur) but generally adds to product cost through design changes. Scenario-based FMEA can be used to weigh the expected cost of the failure against the estimated cost of the solution strategy.
The next section covers definitions of the Risk Priority Number. Section 3 discusses shortcomings of traditional FMEA and the RPN. Section 4 compares scenario-based FMEA to traditional FMEA using a hair dryer and video projector as
Detection Definition 2: What is the chance of the customer catching the problem before the problem results in catastrophic failure?
The rating decreases as the chance of detecting the problem is increased, for example:
10 – Almost impossible to detect
...
1 – Almost certain to detect
This definition measures how likely the customer is to detect the failure based on the system's diagnostics and warnings, or inherent indications of pending “catastrophic” failure.

3. SHORTCOMINGS OF TRADITIONAL FMEA

3.1 What is Risk?
The purpose of FMEA is to prioritize potential failures in accordance with their “risk.” There are many definitions of risk, and we have listed a few below:
• The possibility of incurring damage (Hauptmanns and Werner, 1991)
• Exposure to chance of injury or loss (Morgan and Henrion, 1990)
• Possibility of loss or injury,… uncertain danger (Webster’s Dictionary, 1988)

3.3 Confusion Associated with the “Detection” Index
One definition of Detection relates to the quality processes of an organization. This definition quantifies how likely the company’s “controls” are to detect the failure mode during the product development process. There is some confusion associated with this definition, for example: If a “design control” detects a failure mode, does this imply it will be prevented? How can we quantify the competency of quality processes? If a potential failure is identified as a line item on an FMEA, has it already been “detected”?
Controls used by the organization can help identify potential failures (e.g. design reviews, reliability estimation, modeling, testing) and estimate the probability of a failure. However, this is an organizational issue, rather than a product issue. Assessing the effectiveness of an entire development effort is extremely difficult and highly subjective.
Another definition of Detection relates to the detectability of a failure once the product is in the hands of the customer. Once the product is in operation, “Detection” reflects the probability that the failure will occur in one mode (detected early) versus another (undetected until ‘catastrophic’ failure). For example, FMEA might list an oil leak resulting in engine failure with a low Detection rating (easy to detect) due to a prominent oil warning light. However, the RPN will give a low
[Figure 1 plot: series labeled Palady, 1995; McDermott et al., 1996; Humphries, 1994; horizontal axis Approximate Probability, 1.E-11 to 1.E-01]
Figure 1 Occurrence rankings and probabilities
An Occurrence rating of “5” could correspond to failure rates which span several orders of magnitude (e.g. from around 0.1 to 0.001 in Figure 1).
The ratings for Severity are not related to any standard measure. These 1-10 values can only provide a relative ranking of “consequences” or “hazard” for a particular FMEA.
Detection is not related directly to probability or any other standard measure (with the exception of some “process FMEA” manuals). In addition, Detection has several disparate definitions as we’ve discussed.

3.5 Measure Theoretic Perspective on the RPN
The scales for Severity, Occurrence, and Detection are ordinal. Ordinal scales are used to rank-order items such as the size of eggs, or quality of hotels. Ordinal measures preserve transitivity (order) but their magnitude is not “meaningful.” For

• What are the potential effects on the system?
• How likely is the cause to occur and result in the specified effects?
There is a risk associated with the scenario, that is, the series of causes and effects.
A failure scenario is an undesired cause-and-effect chain of events. Each scenario can happen with some probability and results in negative consequences.
The cause-effect chain can be lengthened when new effects and causes of “causes” are identified. Failure scenarios are used frequently in the Risk Analysis field (Kaplan and Garrick, 1981: Modarres, 1992). The difference between failure modes and scenarios is as follows:
• A failure mode describes a cause and the immediate effect.
• A failure scenario is an undesired cause-effect chain of events, from the initiating cause to the end effect, including all intermediate effects (Figure 2).
[Figure 2 diagram: cause → immediate effect → next-level effect → end effect; the failure mode spans the cause and immediate effect, the failure scenario spans the whole chain]
Figure 2 Failure modes and scenarios
Each failure scenario happens with some probability and results in negative consequences. For example, brake failure in
each scenario, such as “low fluid pressure” or “linkage failure” and causes for these events can be added. Figure 3 shows a map of potential failure scenarios for a brake system failure.
manufacturing line result in different failure scenarios: they can be “detected” at discrete points during production or after they have been shipped, with differing probabilities and consequences.
According to Bayes’ theorem, scenario probability is the product of the probability of the cause and the conditional probability of the end effect (Lindley, 1965). Scenario-based FMEA accommodates a) multiple causes that could result in a single end effect, and b) single causes that result in multiple effects. Figure 3 shows examples of failure scenarios throughout a product's life cycle.

[Figure 4, scenario probabilities for cause a:]
a → b: probability = $p(a)\,p(b|a)$
a → d: probability = $p(a)\,[1-p(b|a)]\,p(d|a)$
a → h: probability = $p(a)\,[1-p(b|a)]\,[1-p(d|a)]\,p(h|a)$

$\text{total expected failure cost} = \sum_{i=1}^{n} p_i c_i$ (3)

Figure 5 shows the composition of the expected cost equation for scenario a–d (from Figure 4).
$ec_{a\text{-}d} = p(a)\,[1-p(b|a)]\,p(d|a)\,c_{a\text{-}d}$
[Figure 5 annotations: $ec_{a\text{-}d}$ is the measure of risk; $p(a)\,[1-p(b|a)]\,p(d|a)$ is the probability of the scenario, made up of the probability of the cause, $p(a)$, and the conditional probability of the consequences, $[1-p(b|a)]\,p(d|a)$; $c_{a\text{-}d}$ is the measure of consequences]
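The scenario probabilities and the expected-cost sum of equation (3) can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names and every numeric input (probabilities and costs) are assumptions chosen to show the arithmetic for cause a with end effects b, d, and h.

```python
def scenario_probabilities(p_a, p_b_given_a, p_d_given_a, p_h_given_a):
    """Probabilities of the three scenarios that start at cause a."""
    return {
        "a-b": p_a * p_b_given_a,
        "a-d": p_a * (1 - p_b_given_a) * p_d_given_a,
        "a-h": p_a * (1 - p_b_given_a) * (1 - p_d_given_a) * p_h_given_a,
    }


def total_expected_cost(probabilities, costs):
    """Equation (3): sum of p_i * c_i over all scenarios."""
    return sum(probabilities[name] * costs[name] for name in probabilities)


# Hypothetical inputs, for illustration only (not taken from the paper).
probs = scenario_probabilities(p_a=0.01, p_b_given_a=0.5,
                               p_d_given_a=0.4, p_h_given_a=1.0)
costs = {"a-b": 20.0, "a-d": 200.0, "a-h": 1000.0}

for name, p in probs.items():
    print(f"scenario {name}: probability {p:.4f}, "
          f"expected cost {p * costs[name]:.2f}")
print(f"total expected failure cost: {total_expected_cost(probs, costs):.2f}")
```

With these assumed inputs the rare but expensive scenario a-h contributes most of the total, which is the kind of information an ordinal rating cannot express.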
$ec_i = p(i) \sum_{j=1}^{n} p(j|i)\, c_{i\text{-}j}$ (12)
From equations 4-6, the total risk associated with cause “a” would be
$ec_a = p(a)\,\big[\,p(b|a)\,c_{a\text{-}b} + [1-p(b|a)]\,p(d|a)\,c_{a\text{-}d} + [1-p(b|a)]\,[1-p(d|a)]\,p(h|a)\,c_{a\text{-}h}\,\big]$
Similarly, expected costs could be calculated for a given end event, using the probability of all contributing causes. This technique yields cost estimates that aren’t possible using the Risk Priority Number. For comparison, Figure 6 shows the composition of the RPN.

RPN = O x D x S
[Figure 6 annotations: RPN is the measure of risk; O is a 1-10 rating corresponding to the probability of failure cause & mode; D is the difficulty of detecting failure during the design, manufacturing, OR operation (1-10); S is a 1-10 rating of consequences]
Figure 6 Composition of the Risk Priority Number

Expected cost has an infinite range of values, but the RPN is restricted to integer values between 1 and 1000. The RPN effectively expands a 0-1 probability into a 1-100 component (Occurrence × Detection) and compresses the measure of consequences into a 1-10 range (Severity).
Gilchrist (1993) was among the first to recommend supplanting the RPN with expected cost. We build on Gilchrist's ideas in the next section by performing a detailed comparison between expected cost and the RPN.

[Figure 7 plot: Cost against Severity Rating, with curves for an Aerospace Company and an Automotive Company]
Figure 7 Hypothetical Cost-Severity relationships
Other “cost” metrics could be substituted, such as toxicity levels, fatalities, etc., as long as the magnitude of the measure is meaningful. Specific examples of cost functions are shown in Table 5.

Table 5 Examples of cost-Severity tables
Severity | linear Cost ($) | exponential Cost ($) | hybrid Cost ($)
1 | 50 | 10 | 20
2 | 100 | 50 | 100
3 | 150 | 200 | 400
4 | 200 | 700 | 1000
5 | 250 | 2500 | 2000
6 | 300 | 10000 | 3500
7 | 350 | 35000 | 6000
8 | 400 | 130000 | 10000
9 | 450 | 500000 | 15000
10 | 500 | 2000000 | 20000

Assumption 3: Detection is omitted from RPN calculations
For this example, we will set the Detection rating equal to 1 for all failures. We assume that the “probability of detection” has been rolled into the “probability of occurrence” rating.

Theoretical Example: RPN vs. Expected Cost
This example uses a standard probability-Occurrence relationship (from Table 4) and a linear cost-Severity relationship (from Table 5). These two tables yield 100 pairs of {probability, cost} and their corresponding {Occurrence,
[Plot residue: points a, b, c, d, e plotted against Expected Cost (log axis, 0.00001 to 1000); caption not recovered]
Table 6 RPN can have different expected costs

4.3 Other Occurrence and Severity Functions
Figure 9 displays expected cost-RPN relationships for nine combinations of Severity-cost relationships (from Table 5) and Occurrence-probability mappings (from Figure 1).
[Figure 9: grid of plots against expected cost (log axes); column labels AIAG, McDermott, Palady; recovered row labels linear and hybrid]
Scenario | Requirement | Potential Failure Modes | Potential Causes of Failure | Probability | Occurrence | Effects | Effects | Cost | Severity | exp. Cost | RPN
g | convert power to rotation | no rotation | motor failure | 0.001 | 6 | no air flow | hair not dried | 100 | 8 | 0.1 | 48
c | convert rotation to flow | no fan rotation | loose fan connection | 0.01 | 8 | no air flow | hair not dried | 30 | 6 | 0.3 | 48
d | convert power to rotation | no rotation | obstruction impeding fan | 1E-04 | 4 | motor overheat | melt casing | 1000 | 9 | 0.1 | 36
i | supply power to fan | no power to fan | broken fan switch | 0.001 | 6 | no air flow | hair not dried | 30 | 6 | 0.03 | 36
j | supply power to fan | no power to fan | loose switch connection | 0.001 | 6 | no air flow | hair not dried | 30 | 6 | 0.03 | 36
k | supply power to fan | no power to fan | short in power cord | 0.001 | 6 | no air flow | hair not dried | 30 | 6 | 0.03 | 36
a | convert power to rotation | low rotation | foreign matter-friction | 0.1 | 10 | reduced air flow | inefficient drying | 10 | 3 | 1 | 30
b | convert power to rotation | no rotation | obstruction impeding fan | 0.1 | 10 | no air flow | hair not dried | 10 | 3 | 1 | 30
f | supply power to fan | no power to fan | no source power | 0.01 | 8 | no air flow | hair not dried | 10 | 3 | 0.1 | 24
l | convert power to rotation | low rotation | rotor/stator misalignment | 1E-04 | 4 | reduced air flow | hair not dried | 30 | 6 | 0.003 | 24
e | supply power to fan | no power to fan | short in power cord | 1E-05 | 2 | no air flow | potential injury | 10000 | 10 | 0.1 | 20
m | supply power to fan | low power to fan | low source power | 1E-04 | 4 | reduced air flow | inefficient drying | 10 | 3 | 0.001 | 12
h | convert power to rotation | low rotation | rotor/stator misalignment | 0.01 | 8 | noise | noise | 5 | 1 | 0.05 | 8

plots probability vs. Severity (Figure 11).
[Figure 11 plot: probability P (log scale, 1 down to 10^-6) against Severity Classification IV, III, II, I (increasing severity); plotted points A1, A2, A3, A4, B2, B3, C1, C2, D1, D2, E1, E2]
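The RPN and expected-cost columns of the hair dryer table can be recomputed and ranked with a short script. The sketch below is illustrative rather than the authors' procedure: the data are transcribed from the table, Detection is fixed at 1 as in Assumption 3, and the function and variable names are assumptions.

```python
# (scenario, probability, Occurrence, cost in $, Severity), transcribed
# from the hair dryer table.
rows = [
    ("g", 1e-3, 6, 100, 8),
    ("c", 1e-2, 8, 30, 6),
    ("d", 1e-4, 4, 1000, 9),
    ("i", 1e-3, 6, 30, 6),
    ("j", 1e-3, 6, 30, 6),
    ("k", 1e-3, 6, 30, 6),
    ("a", 1e-1, 10, 10, 3),
    ("b", 1e-1, 10, 10, 3),
    ("f", 1e-2, 8, 10, 3),
    ("l", 1e-4, 4, 30, 6),
    ("e", 1e-5, 2, 10000, 10),
    ("m", 1e-4, 4, 10, 3),
    ("h", 1e-2, 8, 5, 1),
]


def rpn(occurrence, severity, detection=1):
    """Risk Priority Number; Detection defaults to 1 (Assumption 3)."""
    return occurrence * severity * detection


def expected_cost(probability, cost):
    """Expected cost of one failure scenario."""
    return probability * cost


# Rank scenarios from highest to lowest priority under each criterion.
by_rpn = sorted(rows, key=lambda r: rpn(r[2], r[4]), reverse=True)
by_cost = sorted(rows, key=lambda r: expected_cost(r[1], r[3]), reverse=True)

print("RPN order:          ", " ".join(r[0] for r in by_rpn))
print("expected cost order:", " ".join(r[0] for r in by_cost))
```

Scenarios a and b share the highest expected cost ($1 each) yet sit in the middle of the RPN ordering, while g and c lead the RPN ordering with expected costs of $0.1 and $0.3.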
The prioritization given by decreasing expected cost does not match the RPN ordering. Figure 10 shows how the RPN can give lower priority to failures that have high expected cost.
[Figure 10 plot: exp. cost (axis 0 to 1.2) and RPN (axis 0 to 60) for the failure scenarios, in the order h, m, e, l, f, b, a, k, j, i, d, c, g]
Figure 10 Failures prioritized by expected cost have different priorities than the RPN
Understandably, engineers are not inclined to make probability or cost estimates without any substantiating data. Apparently there is less apprehension with using 1-10 point estimates for probability and severity. However, probability and cost carry more meaning than their RPN counterparts. Both risk criteria are based on educated estimates; the set containing probability and cost contains more information. Additionally, probability and cost are ratio scales, which can be legitimately multiplied into a composite risk measure.

Figure 11 Criticality matrix for prioritizing failures (adapted from Bowles, 1998)
FMECA standards use probability to measure the chance of a failure and plot it on a log scale against severity categories. This technique is preferable to the RPN for several reasons:
• failure frequency is measured with probability
• the Detection index is eliminated, and
• ordinal measures are not multiplied.
However, the heuristics used to calculate equivalent risk failures are meaningless without knowing the relative magnitude of the Severity classifications. For example, E1, A2, and B3 in Figure 11 are considered “equivalent rank.” Depending on the scale of the severity classifications, these points may or may not have equivalent risk priority.

4.6 Using Expected Cost to Make Design Decisions: Projector Bulb Example
Consider a computer video projector used for presentations: bulb failure is a known problem and occurs with probability 0.01. The cost of bulb replacement is $50. The bulb can fail either A) during a presentation or B) while “idle” (i.e., failure without an audience), with respective probabilities of 0.05 and 0.95. The cost of failure during a presentation is estimated to be $500 more than the $50 cost of replacing the bulb. Expected cost for the two failure scenarios is as follows:
I) Current bulb failure rate ($P_{\text{bulb failure}} = 0.01$)
$EC_{\text{bulb failure}} = EC_{\text{presentation}}\ (\text{scenario A}) + EC_{\text{idle}}\ (\text{scenario B})$
$= P_{\text{bulb failure}}\,[\,P_{\text{presentation}}\,C_{\text{presentation}} + P_{\text{idle}}\,C_{\text{idle}}\,]$
$= 0.01\,[(0.05)(\$550) + (0.95)(\$50)]$
$= \$0.75$
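The expected cost for the current design can be reproduced with a small helper. This is a minimal sketch with assumed function and variable names; the inputs for the current design come from the example above, while the halved failure probability at the end is a hypothetical what-if, not one of the paper's cases.

```python
def expected_failure_cost(p_failure, scenarios):
    """Expected cost of a failure that splits into conditional scenarios.

    scenarios: list of (conditional probability, cost in $) pairs.
    """
    return p_failure * sum(p * cost for p, cost in scenarios)


REPLACEMENT_COST = 50.0       # cost of replacing the bulb
PRESENTATION_PENALTY = 500.0  # extra cost when failure hits a presentation

scenarios = [
    (0.05, REPLACEMENT_COST + PRESENTATION_PENALTY),  # A) during a presentation
    (0.95, REPLACEMENT_COST),                         # B) while idle
]

current = expected_failure_cost(0.01, scenarios)
print(f"current design: ${current:.2f}")

# Hypothetical what-if (an assumption, not a case from the paper):
# halving the bulb failure probability halves the expected cost.
what_if = expected_failure_cost(0.005, scenarios)
print(f"failure probability halved: ${what_if:.3f}")
```

Keeping the conditional scenarios separate from the failure probability makes it easy to compare design changes that act on different terms, such as a more reliable bulb versus a cheaper replacement.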
= $0.55
IV) Early warning of bulb failure (75% of “in presentation” failures eliminated)
[Plot residue: bar chart comparing current design, reliable design, easy service design, and early warning design; caption not recovered]
Pidle
C idle