9th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes
September 2-4, 2015. Arts et Métiers ParisTech, Paris, France
Available online at www.sciencedirect.com
ScienceDirect
IFAC-PapersOnLine 48-21 (2015) 844–851
Failure Prediction Methodology for Improved Proactive Maintenance using Bayesian Approach
A. Abu-Samah ∗ M.K. Shahzad ∗ E. Zamai ∗ A. Ben Said ∗∗
∗ Univ. Grenoble Alpes, G-SCOP, F-38000 Grenoble, France (e-mail: asma.abu-samah@grenoble-inp.fr; muhammadkashif.shahzad@grenoble-inp.fr; eric.zamai@grenoble-inp.fr).
∗∗ STMicroelectronics, 850 Rue Jean Monnet, 38926, Crolles, France (e-mail: anis.bensaid@st.com)
Abstract: Failure prediction is essential for predictive maintenance because it helps prevent failure occurrences and reduce maintenance costs. At present, mathematical and statistical modeling are the prominent approaches used for failure prediction; they are based on physical models of equipment degradation and on machine learning methods, respectively. Neither approach ensures that failures are predicted early enough to leave sufficient time to treat potential causes proactively. Therefore, in this paper, we present a Bayesian-based methodology to learn failure signatures and associate them with potential failure occurrences. In this approach, event-driven maintenance data are used as symptoms and aggregated over discretized intervals. The failure probabilities predicted by the Bayesian network are plotted as a temporal evolution, which is then exploited to extract either rules or patterns as failure signatures, together with critical regions. These are then used to monitor and predict potential failure occurrences. The proposed methodology is tested on data collected from a well-reputed semiconductor manufacturer, with promising results.
© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Keywords: Failure prediction, predictive maintenance, Bayesian network, rules extraction.
1. INTRODUCTION

In a highly competitive production environment, unscheduled equipment breakdowns disrupt production capacity. This calls for faster failure diagnosis and repair and, eventually, the capability to handle failure occurrences proactively for optimized maintenance management (Yang et al. (2008); Haddad et al. (2012)). One of the promising approaches to address this challenge is online failure prediction, which requires the current state of a system to be monitored and evaluated in order to predict the occurrence of failures in the near future. Contributions to this approach can be divided into methods that reevaluate temporal inputs and those that rely on maintenance logs.

With the development of sensor technology and real-time data collection, research on failure prediction can be placed in the former category (Lu et al. (2007); das Chagas Moura et al. (2011)). In this category, the equipment condition is modelled using structural time series, followed by forecasting of future deteriorated states using state-space modelling with regression methods. Apart from time series analysis, some methods use classifiers such as Support Vector Machines (Susto et al. (2013)) to predict failures based on the distance from the decision boundary between the faulty and non-faulty classes. The failure probability is then estimated to plan maintenance decisions. The accuracy of such failure prediction methods is limited because they take into account only physical degradation. Furthermore, in certain applications such as the Semiconductor Industry (SI), the occurrence of a failure is an event-based phenomenon that is difficult to model statistically for failure prediction using only temporal data, due to the imbalance between functional and dysfunctional data (Susto et al. (2012)). Given this context and the availability of large-scale equipment logs and contextual data collected for productivity improvement and control purposes, failure prediction models using these data are becoming popular. Li et al. (2007) detect failure signatures based on frequent co-occurrences of failures and pair them with a time-to-failure survival model, while several methods use Hidden Markov Models (HMM) to estimate sequences of hidden degradation states of a system before a failure occurs (Salfner (2005); Zhou et al. (2010); Vrignat et al. (2015)). The accuracy of these models depends on the quality of the temporal data, and they are not suitable for a large number of variables.

In addition to the above, failure occurrence can also be influenced by the complex environment of the manufacturing process. Accordingly, this paper integrates contextual information collected from product, process, equipment and maintenance data sources to predict a system fail-

The authors gratefully acknowledge STMicroelectronics for their support and provision of data for the TT case study. The authors also acknowledge the European project INTEGRATE and region RhoneAlpes for ongoing research.
Copyright © 2015 IFAC
2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.09.632
SAFEPROCESS 2015
September 2-4, 2015. Paris, France A. Abu-Samah et al. / IFAC-PapersOnLine 48-21 (2015) 844–851 845
Learning a BN involves creating the qualitative part of the network, which is the causal structure between variables, commonly known as a Directed Acyclic Graph (DAG), and the quantitative part, which consists of computing the set of conditional probability distributions of the variables, most often represented as Conditional Probability Tables (CPT). The resulting network is used to perform probabilistic inference from multiple variables, such as calculating the value of P(failure|presence of all causes and/or symptoms) as well as P(a given cause|knowing the failure). The Bayes theorem is the heart of this computation (Margaritis (2003)). Since their introduction, BNs have been extended to cover many important problems. Kobbacy et al. (2011) discuss the various utilities of BNs in manufacturing, with emphasis on their applicability when uncertainty is the key characteristic.

Creation of the BN is the heart of the first 2 steps. It presents the causal and conditional dependencies between two types of nodes, paired with a critical part of the methodology, which is the handling of time: (i) Predictors, corresponding to the observable events and statistical information coming from multiple data sources, and (ii) Failure code, with 'no failure' included as the targeted equipment condition.

Step-1 comprises the identification of Predictors from the manufacturing process, quality inspection, maintenance and process control operations databases. It is one of the most difficult and complex tasks, as it requires multidisciplinary expertise from each domain. The first task is the definition of the time interval for data collection, because in the database the targeted variables can be either event-based or continuous data, collected at irregular intervals. To overcome this challenge, we propose a time discretization with the objective of monitoring the failure probabilities systematically. The next task is to divide the historical dataset into two parts: BN learning with rules extraction (BNL&RE) and validation (V). In this paper, we distinguish the notions of test and validation. Test refers to the cross-validation task using BNL&RE sample data, while validation is the final step of the methodology. The two parts of the data are initially divided evenly as 50:50.

Step-2 is focused on learning and optimizing the BN structure and computing its CPT. The structure of a BN can be obtained either through expert knowledge or by learning from the data. In the methodology, it is proposed to learn it from data using a score-based unsupervised learning algorithm that uses Minimum Description Length (MDL) as an objective function, for its trade-off between data fit and model complexity (Lam and Bacchus (1994)). The prediction and accuracy criteria are defined by the end user to validate the generated structure and its CPT as the chosen Bayesian inference model. If they are not fulfilled, the initial BN structure can be further optimized using other learning algorithm(s). Non-compliance with the user-defined criteria using the selected algorithms results in increasing and adjusting the ratio of the BNL&RE and V datasets. This task follows recursive relearning and optimization of the BN structure until the user-defined criteria are met or Size(V) is less than 0.25*Size(Complete Dataset). The last BN model is then used to plot failure probabilities on the testing dataset over the discretized time intervals, for each failure separately. These graphs are the input to the next step.

2.2 Rules Extraction for Real Time Detection of Failures Before Its Occurrence

At this stage of the proposed methodology, we are equipped with the probability graph for each failure over discrete time intervals. These probability distributions from the BNL&RE testing set are then analyzed in Step-3 to extract patterns for all types of failures separately. We assume that if a failure type probability exceeds a certain level, it defines the type of failure the system in question is in, and that the occurrence of failures can be predicted by identifying the consistency of the failure probabilities the system is experiencing. This assumption is based on the fact that dependencies among the chosen variables exist and are translated into conditional probabilities of target events. If no pattern exists, a Critical Region (CR) and a number of sufficient consecutive points are computed to find rule(s) for prediction. When a pattern exists, a CR may also be needed to refine the results. This step concludes with the prediction of failure using pattern/rule(s) on the training BNL&RE dataset and the computation of lead time (the time interval from the prediction to the failure occurrence) for each prediction. If none of the pattern/rule(s) extracted fulfills the user's criteria, the step is repeated to identify new pattern/rule(s) by relaxing either the CR or the number of consecutive points. Prediction of failures with rules on the validation set is the final step, Step-4 of this methodology. We also compute the predictability index (PI) for all chosen rules as the average of prediction accuracy (Eq. 1), precision (Eq. 2) and lead time percentage (Eq. 3) ¹.

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \text{Lead time \%} = \frac{\text{Lead time}_{TP}}{\text{Lead time}_{TP} + \text{Lead time}_{FP}} \tag{3} \]

3. CASE STUDY AND PROOF OF CONCEPT

This section introduces the case study and proof of concept for the methodology, tested on one of the two process reactors, one of the many modules of Thermal Treatment equipment (TASMI) from a reputed semiconductor manufacturer. It is used to grow and deposit oxide and nitride layers on the surface of silicon wafers. It is also used for annealing (heat treatment) after production steps to stabilize the crystalline structure of silicon wafers. The reactor module, TASMI01, considered in this case study comprises: 1-Exterior chamber, 2-Inner chamber with quartz (liner), 3-Wafer support (boat), 4-Elevator boat rotation, 5-Watertight door for loading and unloading, 6-Heating elements, 7-Gas panel, 8-Temperature sensor, 9-Pressure gauge (manometer), 10-Pressure regulator. At present, the best preventive maintenance effort made so far in this manufacturing industry exists in the shape of Fault Detection and Classification (FDC) technique that

¹ TP=True Positive, TN=True Negative, FP=False Positive and FN=False Negative
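As a rough illustration of Eqs. 1 to 3 in Section 2.2, the predictability index can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the names `RuleOutcome` and `predictability_index` are hypothetical, "Lead time TP" and "Lead time FP" are read here as the total lead time accumulated over true-positive and false-positive predictions, and a zero denominator is mapped to 0.0. All three are assumptions, and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class RuleOutcome:
    """Confusion counts and lead times for one rule (hypothetical container)."""
    tp: int
    tn: int
    fp: int
    fn: int
    lead_time_tp: float  # assumed: total lead time over true-positive predictions
    lead_time_fp: float  # assumed: total lead time over false-positive predictions


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: an undefined ratio (zero denominator) counts as 0.0.
    return numerator / denominator if denominator else 0.0


def accuracy(o: RuleOutcome) -> float:
    return _ratio(o.tp + o.tn, o.tp + o.fp + o.tn + o.fn)  # Eq. 1


def precision(o: RuleOutcome) -> float:
    return _ratio(o.tp, o.tp + o.fp)  # Eq. 2


def lead_time_pct(o: RuleOutcome) -> float:
    return _ratio(o.lead_time_tp, o.lead_time_tp + o.lead_time_fp)  # Eq. 3


def predictability_index(o: RuleOutcome) -> float:
    # PI is the plain average of the three ratios above.
    return (accuracy(o) + precision(o) + lead_time_pct(o)) / 3.0


# Made-up counts for illustration only.
example = RuleOutcome(tp=8, tn=80, fp=2, fn=10, lead_time_tp=40.0, lead_time_fp=10.0)
pi_value = predictability_index(example)
```

With these made-up counts the three ratios are 0.88, 0.80 and 0.80. Because PI weights them equally, a rule can still score well on accuracy alone when true negatives dominate, which is one reason to read the three components alongside the averaged index.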
Fig. 8. Step-3&4: Extraction and refinement of rules from each failure probability distribution
3.3 Step-3: Definition and Refinement of Rule(s) for Failure Prediction

The graphs plotted in Step-2 play a pivotal role in this step, which defines and refines pattern/rule(s) for detecting a failure before its occurrence. For the application in this case study, we propose a scheme to find the pattern/rules (Figure 8). Initially, based on observation of the probability distribution of each failure, the existence of an obvious pattern is checked. If one exists, a Critical Region (CR) common to all occurrences of the respective failure is defined for pattern Pi. The limits of this region are selected as min(probability) and max(probability), observed among all failure occurrences. However, if there is no pattern, the CR is defined first, with min and max limits defined as max(mean, mode, median) and max(probabilities) respectively for all failure occurrences. This is followed by the computation of the min and max number of consecutive points as potential identifiers of the respective failures. The cardinality of [min,max] becomes the number of rule(s) extracted from the respective region for each failure. This step concludes with the prediction of failure using rule(s) on the training BNL&RE dataset and the computation of lead time for each prediction. The user-defined criterion is set as the minimum time needed to fix each type of failure. If none of the extracted rules fulfills the user-defined lead time criterion, the scheme is repeated to identify new patterns by relaxing the CR.

As a result, similar patterns have been detected for all occurrences of Failure b and Failure c. Failure b shows continuously increasing probabilities, and Failure c shows a W or M point-to-point pattern in which consecutive values are at least 5 times smaller or bigger. Next, critical regions common to all occurrences of each failure are defined. For Failure a, no pattern could be extracted, and a CR over all occurrences of this failure needs to be established first before assigning the rules on the number of consecutive points. Table 2 summarizes our results for patterns/rules and their respective critical regions. The probability distribution for each failure in chosen time intervals, with the count of occurrences in brackets, is presented in Figure 9. In the graphs, the horizontal lines show the upper and lower limits of the CR, whereas the brown boxes mark the patterns associated with the defined rules.

Table 2. Summary of results from rule(s) extraction

Failure    Base of Rule  Critical Region (CR)                                            Rule(s)
Failure a  Non-pattern   Lower limit = max(mean) = 0.175                                 1) [Min=2; Max=22] consecutive points inside the CR.
                         Upper limit = max(probability observed for the failure) = 0.64
Failure b  Pattern       Lower limit = min(probability of failure occurrence) = 0.08     1) Sequentially increasing probabilities, and
                         Upper limit = max(probability observed for the failure) = 0.71  2) [Min=3; Max=11] consecutive points inside the CR following rule 1.
Failure c  Pattern       Lower limit = min(probability of failure occurrence) = 0.04     1) M/W pattern, and
                         Upper limit = max(probability observed for the failure) = 0.57  2) 2 consecutive points of the pattern, P(t) = factor of 5*|P(t+1)|
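The non-pattern rule reported for Failure a in Table 2 can be sketched as a simple run-length check: an alarm is raised once the failure probability has stayed inside the CR for a minimum number of consecutive intervals. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the inclusive treatment of the CR limits, the reading of "max(mean)" as the largest per-occurrence mean, and the probability series are all illustrative; only the minimum of the [Min, Max] range is checked.

```python
from statistics import fmean
from typing import Optional, Sequence, Tuple


def critical_region_no_pattern(
    occurrences: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """CR limits for a failure without an obvious pattern.

    Assumption: the lower limit is the largest per-occurrence mean
    ("max(mean)" in Table 2) and the upper limit is the largest
    probability observed over all occurrences.
    """
    lower = max(fmean(series) for series in occurrences)
    upper = max(max(series) for series in occurrences)
    return lower, upper


def first_alarm(
    probabilities: Sequence[float], lower: float, upper: float, min_points: int
) -> Optional[int]:
    """Index of the interval at which `min_points` consecutive
    probabilities have fallen inside [lower, upper], or None."""
    run = 0
    for index, p in enumerate(probabilities):
        if lower <= p <= upper:
            run += 1
            if run >= min_points:
                return index
        else:
            run = 0  # the run must be consecutive, so reset on any miss
    return None


# Illustrative series only; the CR limits are those reported for Failure a.
series = [0.05, 0.20, 0.10, 0.30, 0.45, 0.70]
alarm_index = first_alarm(series, lower=0.175, upper=0.64, min_points=2)
```

If a failure were recorded at interval 5 of this made-up series, the alarm at index 4 would give a lead time of one interval, which would then be compared against the user-defined minimum repair time before the rule is accepted.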
potential unscheduled time caused by equipment failure because early diagnosis and appropriate action plans can be activated. On the other hand, early interventions can incur unnecessary maintenance and resource costs. Therefore, a proper definition of lead time is needed for our methodology to operate most effectively. This is left to the end user, who can better judge the type of failure and the associated repair duration and so decide the target lead time for early failure prediction.

4. CONCLUSION

This paper presents a methodology for failure prediction using a BN approach, complemented with the extraction of rules for failure prediction and the computation of lead time and predictability index. Its advantage is that Predictors coming from multiple data sources are combined in a single prediction model. It successfully uses event-driven Predictors as temporal characteristics to predict potential failures. Promising results from this offline prediction in the case study motivate extending it to real-time prediction and using machine learning algorithms for pattern/rule extraction.

Our work has some limitations and potential extensions. First, rules are extracted separately for each failure; the detection of multiple failures in the same interval is not treated in the proposed methodology. Second, PI is an average of averages computed from prediction accuracy and precision plus the time gain percentage. A sensitivity analysis is required to assure online prediction reliability. Third, we have only used the BN model for failure inference, when in fact it can be used as a fault diagnosis tool as well. We will explore these directions in future work.

REFERENCES

Acid, S. and de Campos, L.M. (2003). Searching for Bayesian network structures in the space of restricted acyclic partially directed graphs. Journal of Artificial Intelligence Research, 445–490.
Arroyo-Figueroa, G. and Sucar, L.E. (1999). A temporal Bayesian network for diagnosis and prediction. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 13–20. Morgan Kaufmann Publishers Inc.
Chickering, D.M. (2002). Learning equivalence classes of Bayesian-network structures. The Journal of Machine Learning Research, 2, 445–498.
das Chagas Moura, M., Zio, E., Lins, I.D., and Droguett, E. (2011). Failure and reliability prediction by support vector machines regression of time series data. Reliability Engineering & System Safety, 96(11), 1527–1534.
Gallagher, N.B., Wise, B.M., Butler, S.W., White, D., and Barna, G.G. (1997). Development and benchmarking of multivariate statistical process control tools for a semiconductor etch process: improving robustness through model updating. In Proc. ADCHEM, volume 97, 78–83.
Glover, F. (1986). Future paths for integer programming and links to artificial intelligence. Computers & Operations Research, 13(5), 533–549.
Haddad, G., Sandborn, P.A., and Pecht, M.G. (2012). An options approach for decision support of systems with prognostic capabilities. Reliability, IEEE Transactions on, 61(4), 872–883.
Kobbacy, K.A., Vadera, S., McNaught, K., and Chan, A. (2011). Bayesian networks in manufacturing. Journal of Manufacturing Technology Management, 22(6), 734–747.
Lam, W. and Bacchus, F. (1994). Learning Bayesian belief networks: An approach based on the MDL principle. Computational Intelligence, 10(3), 269–293.
Li, Z., Zhou, S., Choubey, S., and Sievenpiper, C. (2007). Failure event prediction using the Cox proportional hazard model driven by frequent failure signatures. IIE Transactions, 39(3), 303–315.
Lu, S., Tu, Y.C., and Lu, H. (2007). Predictive condition-based maintenance for continuously deteriorating systems. Quality and Reliability Engineering International, 23(1), 71–81.
Margaritis, D. (2003). Learning Bayesian network model structure from data. Ph.D. thesis, US Army.
Munteanu, P. and Bendou, M. (2001). The EQ framework for learning equivalence classes of Bayesian networks. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, 417–424. IEEE.
Salfner, F. (2005). Predicting failures with hidden Markov models. In Proceedings of 5th European Dependable Computing Conference (EDCC-5), 41–46.
Samah, A.A., Shahzad, M.K., Zamaï, E., and Hubac, S. (2014). Methodology for integrated failure-cause diagnosis with Bayesian approach: Application to semiconductor manufacturing equipment. In Second European Conference of the Prognostics and Health Management Society 2014.
Susto, G.A., Pampuri, S., Schirru, A., De Nicolao, G., McLoone, S., and Beghi, A. (2012). Automatic control and machine learning for semiconductor manufacturing: Review and challenges. In Proceedings of the 10th European Workshop on Advanced Control and Diagnosis (ACD 2012).
Susto, G.A., Schirru, A., Pampuri, S., Pagano, D., McLoone, S., and Beghi, A. (2013). A predictive maintenance system for integral type faults based on support vector machines: An application to ion implantation. In Automation Science and Engineering (CASE), 2013 IEEE International Conference on, 195–200. IEEE.
Teyssier, M. and Koller, D. (2012). Ordering-based search: A simple and effective algorithm for learning Bayesian networks. arXiv preprint arXiv:1207.1429.
Vrignat, P., Avila, M., Duculty, F., and Kratz, F. (2015). Failure event prediction using hidden Markov model approaches.
Yang, Z.M., Djurdjanovic, D., and Ni, J. (2008). Maintenance scheduling in manufacturing systems based on predicted machine degradation. Journal of Intelligent Manufacturing, 19(1), 87–98.
Zhou, Z.J., Hu, C.H., Xu, D.L., Chen, M.Y., and Zhou, D.H. (2010). A model for real-time failure prognosis based on hidden Markov model and belief rule base. European Journal of Operational Research, 207(1), 269–283.