Accepted Manuscript: Computers & Industrial Engineering
PII: S0360-8352(17)30520-X
DOI: https://doi.org/10.1016/j.cie.2017.10.033
Reference: CAIE 4970
Please cite this article as: Baptista, M., Sankararaman, S., de Medeiros, I.P., Nascimento Jr., C., Prendinger, H.,
Henriques, E.M.P., Forecasting Fault Events for Predictive Maintenance using Data-driven Techniques and ARMA
Modeling, Computers & Industrial Engineering (2017), doi: https://doi.org/10.1016/j.cie.2017.10.033
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Forecasting Fault Events for Predictive Maintenance
using Data-driven Techniques and ARMA Modeling
Abstract
Presently, airline maintenance scheduling does not take fault predictions into account and happens at fixed time-intervals. This may result in unnecessary maintenance interventions and also in situations where components are not taken out of service despite exceeding their designed risk of failure. To address this issue we propose a framework that can predict when a component/system will be at risk of failure in the future and, therefore, advise when maintenance actions should be taken. To facilitate such prediction, we employ an auto-regressive moving average (ARMA) model along with data-driven techniques, and compare the performance of multiple data-driven techniques. The ARMA model adds a new feature that is used within the data-driven model to give the final prediction. The novelty of our work is the integration of the ARMA methodology with data-driven techniques to predict fault events. This study reports on a real industrial case of unscheduled removals of a critical valve of the aircraft engine. Our results suggest that the support vector regression model can outperform the life usage model on the evaluation measures of sample standard deviation, median error, median absolute error, and percentage error. The generalized linear model provides an effective approach for predictive maintenance with comparable results to the baseline. The remaining data-driven models have a lower overall performance.
Keywords: Real Case Study; Aircraft Prognostics; Predictive Maintenance; Data-driven Techniques; ARMA Modeling; Life Usage Modeling
2010 MSC: 00-01, 99-00
Preprint submitted to Journal of Computers & Industrial Engineering June 24, 2017
1. Introduction

The precise and accurate prediction of fault events is of central interest to the field of predictive maintenance. Prognostics, as the prediction of events related to the condition of engineering systems [1], can support the practice of predictive maintenance with advanced fault detection capabilities as well as technologies for the prediction of useful lifetimes. As a driver and key enabler of efficient maintenance, prognostics assumes a pivotal role both in improving maintenance practices and in reducing the cost of operations. As a discipline that relies extensively on past experience and observation to perform prediction, data is of great importance to prognostics. A kind of data that plays a major role in performing successful prognostics is life usage data, i.e. data related to the life-cycle of the system. It is important to extract useful knowledge from usage data to better understand and predict the evolution of failures through time.

The methods with the longest history in maintenance are the life usage (LU) models [2]. These methods focus on the use of life usage data to estimate the hard-time interval at which the equipment necessarily needs to be subject to an intervention. This kind of maintenance scheduling continues to be widely in use, especially when the complexity of the system does not allow a model-based approach and there is insufficient data to perform more sophisticated prognostics. However, maintenance practices that rely on usage models do not take any predictions into account. This often results in unnecessary and unproductive equipment replacements as well as emergency/unscheduled repairs.

To address the previous issue, we propose a methodology that combines two well-known prognostics techniques: data-driven modeling and auto-regressive moving average (ARMA) forecasting. Here, the focus is not on modeling maintenance schedules, as in life usage, but instead on the online prediction of the next fault event based on past experience.

Data-driven modeling consists in the use of non-parametric methods to extract useful information from data. In the context of prognostics, data-driven methods are usually used to predict the equipment's end-of-life (EOL), i.e. the point where the equipment no longer meets its design specifications [3, 4]. These models have traditionally been used with sensor data, that is, data from some sort of measurements or instrumentation, or with a combination of sensor and life usage data. Data-driven models have rarely been used with only usage data. An exception to this is the work of Datong et al. [5], who use a combined online support vector regression with different global and local kernels to predict fault events.

In our past work we have successfully applied the data-driven technique of support vector regression to maintenance events [6]. Also, we have proposed a framework
for prognostics using both messages and removal data [7]. The focus in this latter work was to analyze the influence of fault messages on prognostics using several types of data-driven models. In both previous works we did not systematically account for all the past information on removal data to predict future fault events. Only simplistic features such as "past removal time", "mean of past removal times", and "standard deviation of past removal times" were used to train the data-driven models. In contrast, this paper explicitly focuses on incorporating the temporality of fault events to facilitate prognostics. First, it includes several additional statistical measures of removal times to improve the learning process within the data-driven technique. Second, the entire history (i.e., temporality) of fault events is captured using an ARMA model, and the output of this ARMA model is used as a feature that integrates with the data-driven model. As a result, the available removal data is utilized more efficiently.

ARMA is a powerful forecasting methodology that is able to capture trends found in a time series and project its future values. We chose this approach because it allowed us to naturally capture the empirical properties of temporally interrelated data. Several authors have used ARMA in prognostics. Ho and Xie [8] use an autoregressive integrated moving average (ARIMA) model to predict the time of the next fault event. A number of authors [9, 10, 11] employ the ARMA methodology to forecast failure rates. Other authors [12, 13] use ARMA to directly forecast equipment faults. Similar to our work, these papers use ARMA to capture information from usage/removal data. However, their focus is only on ARMA and no data-driven model is used. In our case, data-driven modeling allows us to leverage ARMA predictions, using machine learning to gather further insight into the forecasting outcome and improve its accuracy and efficiency. Further, data-driven modeling can capture additional information from the removal data using features such as the "time since last repair" or the "maximum past time between repairs".

Different forms of prediction have complementary powers and limitations. By combining diverse techniques, it is possible to improve the performance of prognostics systems [14]. This is the main motivation for this study. Here, we apply ARMA, a traditional forecasting technique, together with sophisticated data-driven methods. These two forms of prediction are combined to obtain a better prediction accuracy. The integration of ARMA methodologies with a wide range of data-driven techniques is the main novelty of our work. So far, the combination of these two kinds of prognostics methods has not been explored extensively. The authors that have studied the integration of these two approaches usually only test a single data-driven technique with ARMA and tend to focus on sensor data. For instance, in the work of Kozlowski [15] the author fuses three predictors (auto-regressive moving average (ARMA), neural networks, and fuzzy logic) to predict the lifetime of batteries based on impedance measurements (sensor data). The work that is closest in spirit to our work is that in [16, 17]. They also combine a data-driven approach (least squares support vector regression) with ARMA forecasting on a dataset of aircraft fault data. However, the predictions of the data-driven and ARMA models are combined using a density function estimation method. In contrast, in our framework the predictions of the ARMA model are integrated as features within the data-driven model. The interaction between the two models is hence much stronger in our framework.

The following points summarize the main contributions of this work:

• Novel framework for online fault prediction: we propose a unique solution to integrate an ARMA model with data-driven techniques. The proposed framework performs prognostics in an online fashion, meaning it is based on runtime monitoring of past experience [18].

• Statistical features: in addition to the ARMA-based feature, we also consider a set of 13 statistical features for the prediction. These features are general enough to be applied to any kind of prognostics problem.

• PCA solution: the framework includes a feature transformation module whose goal is to transform the set of raw features into a reduced set of uncorrelated features. Although PCA is a traditional technique in prognostics, it is rarely used in prognostics based solely on usage data.

We validate our approach by comparing the proposed framework to a traditional life usage Weibull model. This comparison is done on a real industrial case involving the maintenance of a critical valve of the aircraft. This valve is a critical and complex element of the aircraft air management system; the tracking of a potential failure is a difficult and time-consuming process even for maintenance experts. The goal here is to help improve the state of the art in this kind of problem.

The remainder of the paper is organized as follows. Section 2 describes the baseline life usage model while Section 3 presents our framework. The case study and the data set are described in Section 4. The methodology is explained in Section 5 and the results are presented in Section 6. To conclude, Section 7 summarizes the paper and outlines future research directions.

2. Life Usage Model

The traditional modeling approach to usage/fault data is life usage (LU) models [2]. In these models, the usage measures of the equipment are fit to a probability distribution to characterize failure behavior. Commonly used distributions include the Exponential, for constant failure rates, and the Weibull or Log-normal, for varying rates.
Figure 1: Typical bathtub failure curve. Weibull distributions with β < 1 have a failure rate that decreases with time, also known as infantile or early-life failures. Weibull distributions with β close to or equal to 1 have a fairly constant failure rate, indicative of a useful life with random failures. Weibull distributions with β > 1 have a failure rate that increases with time, also known as wear-out failures. These comprise the three sections of the classic "bathtub curve".

Figure 2: Life Usage model. The hard-time interval T is estimated from past usage data ($\{y_E(t)\}_{t=1}^{n}$) using a Weibull distribution. Maximum likelihood estimation (MLE) is used to fit the Weibull distribution (defined by the parameters α, β, γ) to the data. The interval T is used to set the future fixed-time maintenance plan.

Reliability theory tends to recommend that, unless there is strong evidence of failure times following another distribution, the Weibull distribution should be used as the preferred failure model [19]. The Weibull is a continuous distribution proposed in 1951 [20]. Although initially received with skepticism, it has become widely used in reliability, mostly due to its ability to deal with small sample sizes and its flexibility to approximate a wide range of statistical distributions. The three-parameter (α, β, γ) Weibull probability density function is defined as

$$ f_T(t) = \begin{cases} \dfrac{\beta}{\alpha}\left(\dfrac{t-\gamma}{\alpha}\right)^{\beta-1} e^{-\left(\frac{t-\gamma}{\alpha}\right)^{\beta}} & t \ge 0, \\ 0 & t < 0, \end{cases} \qquad (1) $$

where α is the scale parameter (or characteristic life), β is the shape parameter (or slope), and γ is the location parameter (or failure-free time). The two-parameter Weibull distribution is obtained when γ is set to zero.

The slope of the Weibull plot (β) determines which member of the family of Weibull failure distributions best describes the data and also indicates the class of failures. According to reliability theory and the classical "bathtub curve" [21], failures can be grouped into three distinct classes (see Figure 1), based on the behavior of the failure rate function

$$ h_T(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} \qquad (2) $$

• β < 1.0 indicates infant mortality and a stage of early failure (I): this early stage describes the time interval in which failure behavior is not sufficiently developed due to unknown influences, such as design problems or incorrect configuration.

• β = 1.0 indicates a stage of random failures (independent of age) (II): this second stage represents the period where failure has a constant rate.

• β > 1.0 indicates a stage of wear-out failures (III): this third stage happens at the end of the equipment life-cycle. In it the equipment tends to have more wear-out failures due to the aging process.

In a life usage model based on the Weibull distribution, the fixed time between two preventive maintenance actions is usually chosen based on the scale parameter of the Weibull distribution (α), according to the criticality and risk of the equipment. This parameter, usually designated as the equipment characteristic life, is important as it relates to the mean time between failures (MTBF) and to the point by which 63.2% of the equipment will probably have already failed.

As an example, consider Figure 2, which illustrates the workings of a life usage model. The model receives as input a set of consecutive past failures. These are represented on the first timeline ($\{y_E(t)\}_{t=1}^{n=6}$). The model (f) outputs a plan of hard intervals (T) based on the fitting of the Weibull distribution (Equation 1) to the data set of event times ($\{y_E(t)\}_{t=1}^{n=6}$), using the characteristic life of the distribution (T = f_r(α)).

3. Proposed Framework

In this section we introduce the framework that we developed to predict fault events. We start by explaining the general framework and its constituent parts. Then, we describe in detail each of the modules of the framework in a dedicated subsection.

3.1. General Description

The goal of our framework is to predict the next fault event given the past history of events. The framework consists of two models: (1) an ARMA model and (2) a data-driven model. The purpose of the ARMA model is
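As an illustration of the life usage baseline of Section 2, the Weibull fit and the resulting hard-time interval can be sketched as follows. This is a minimal sketch using `scipy`; the removal times below are synthetic stand-ins, not the paper's data:

```python
import numpy as np
from scipy import stats

# Synthetic past removal times in days (stand-ins, not the paper's data)
removal_times = np.array([120.0, 95.0, 210.0, 160.0, 140.0, 180.0])

# Fit a two-parameter Weibull by maximum likelihood (location gamma fixed at 0)
beta, _, alpha = stats.weibull_min.fit(removal_times, floc=0)

# alpha is the characteristic life: the time by which about 63.2% of units
# are expected to have failed, since F(alpha) = 1 - e^{-1}
prob_failed_by_alpha = stats.weibull_min.cdf(alpha, beta, loc=0, scale=alpha)

# A life usage model sets the fixed maintenance interval T from alpha
T = alpha
print(round(prob_failed_by_alpha, 3))  # 0.632
```

In practice the hard-time interval would be further adjusted by the criticality and risk of the equipment, as described above.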
Figure 3: Proposed framework. The proposed framework consists of two models: the auto-regressive moving average (ARMA) model and the data-driven model. Prediction proceeds in three steps: first, the past time series of failure events ($\{y_E(t)\}_{t=1}^{n}$) is input into the ARMA model; then the ARMA model outputs predictions which are fed into the data-driven model; finally, the data-driven model uses the features to calculate the final model predictions ($\{\hat{y}_E(t)\}_{t=1}^{n}$).
on a set of mathematical equations, predicts the next fault event. These predictions are used as input to the PCA solution to generate the final features of the data-driven model (see Figure 4). The goal of having an ARMA model is to have the ability to perform sequential learning: the ARMA model captures the entire history of past events and generates predictions that aid the data-driven model in the process of calculating its final predictions (see Fig. 6). The ARMA model is able to capture the time series not only as a function of its past values but also as a moving average (unevenly weighted) of past noise or residual values. To estimate the parameters of the ARMA model, we fit the model to the uni-variate time series of event times by linear least squares.

3.2.2. Statistical Features

The statistics module (see Figure 4) consists in creating initial features that are then used by the PCA to create the final features to train the data-driven model. We construct 13 statistical features from the initial dataset of removal times, which are general enough to be applied to any kind of application (see Table 1). This initial set of features has the goal of capturing, for each removal time, the past history of removals, catching its central tendency and distribution spread.

To generate the initial features (see Table 1), we apply a set of basic functions from descriptive statistics [22]. Among the considered functions are the mean, median, first quartile and interquartile range. We also include the standard deviation as a measure of the spread of the distribution of past event times (larger standard deviations indicate wider distributions). Skewness and kurtosis are also included as measures of the symmetry and flatness of the distribution. In addition to the previous features, a measure of the sampling instability of the mean of past events, the standard error (SE), is considered in the model. Given the correlated nature of these statistical functions, the resulting features are highly correlated among each other. To remove these correlations we apply PCA.

Table 1: Raw statistical features.

    Feature                     Calculation
    Time of previous event      y_E(t-1)
    Min of past events          min{y_E(1), ..., y_E(t-1)}
    Q1 of past events           CDF^-1(0.25)
    Median of past events       CDF^-1(0.50)
    Mean of past events         (1/(n-1)) * sum_{t=1}^{n-1} y_E(t)
    Q3 of past events           CDF^-1(0.75)
    Max of past events          max{y_E(1), ..., y_E(t-1)}
    Std of past events          σ{y_E(1), ..., y_E(t-1)}
    Se of past events           σ/√(n-1)
    Skewness of past events     E[((y_E - ȳ_E)/σ)^3]
    Kurtosis of past events     μ4/σ^4
    Nr previous short events    sum_{i=1}^{t-1} isShort(i)
    Nr previous events          t - 1

    Note: CDF stands for cumulative distribution function, y_E(t) is the time to event E, n the number of past events at time t, σ the standard deviation, ȳ the mean, E the expectation operator, and μ4 the 4th central moment. The function isShort outputs 1 if the removal is short and 0 otherwise.

As a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated ones, the principal components (PCs), PCA is a suitable dimension reduction tool to address our feature selection problem. We hence use PCA to reduce our initial set of 13 statistical feature variables to a smaller set of uncorrelated final features that still contains most of the information of the original larger set. Fig. 5 illustrates the entire procedure.

3.2.3. Principal Component Analysis

Principal component analysis (PCA) is a technique that is used to reduce the dimensionality of a multivariate dataset [23]. It transforms the initial raw features into another set of final new features, the PCA features. The methodology uses an orthogonal transformation to convert the set of correlated features into a set of values of linearly uncorrelated new features.

As shown in Fig. 4 and Fig. 6, we use the predictions of the ARMA model as raw features that are transformed in the PCA module. This allows us to have sequential learning in the framework. First, we calculate predictions using the ARMA model, use the output of the ARMA model as raw features, use PCA to generate uncorrelated features which are used to train a data-driven model, and finally, we generate the final predictions with the data-driven model.

Figure 5: Statistical features. The statistical features are derived from past usage data ($\{y_E(t)\}_{t=1}^{n}$) using a set of functions from descriptive statistics. These features are then output to the Principal Component Analysis module to derive the Principal Components (PCs) features. These final features are used to train the data-driven models.

3.3. Data-driven Model

Formally, the data-driven model can be described as a mathematical function $\hat{f}$ that attempts to provide estimates as close as possible to the actual true function of interest. In our case, the model aims to approximate the function of time to event based on a set of PCA explaining feature variables (x):

$$ \hat{f} : X \longrightarrow \hat{Y} \qquad (3) $$
$$ \hat{f}(x) = \hat{f}(\{x_1(t), \ldots, x_f(t)\}) = \hat{y}_E(t) \qquad (4) $$

The process of deriving predictions of the next event follows the standard methodology in data-driven modeling [24]. There are two major phases of the process: the training phase and the prediction phase. Fig. 6 illustrates the process.

Figure 6: Training and prediction phase. Prior to prediction, the data-driven model needs to be trained on historical data $\{y_E(t)\}_{t=1}^{n}$ to capture statistical relationships between input and output. Here, the input are the features derived from the training set by the feature extraction module $\{x_1(t), \ldots, x_f(t)\}_{t=1}^{n}$, and the target output is the actual time to failure/inspection $\{y_E(t)\}_{t=2}^{n+1}$. During the prediction phase, the features extracted from a new set of historical data $\{x_1(t), \ldots, x_f(t)\}_{t=1}^{m}$ are used to estimate the next time to failure/inspection $\hat{y}_E(m+1)$.

As shown, in the training phase, a machine learning algorithm (e.g. random forests, neural networks) receives input from the feature extraction module (see Section 3.2). Here, the input consists in a reduced set of PCA features. Using these features, the algorithm is able to create a data-driven model. Five machine learning techniques are tested; four of them classified as top regression algorithms in data mining [25]: k-nearest neighbors, random
forests, neural networks and support vector machines. We also apply generalized linear regression.

The prediction phase uses the past series of failures and scheduled events to estimate the new failure time. Again, here feature extraction (see Section 3.2) is necessary to derive PCA features from the predictions of the ARMA model and the raw statistical features. Each machine learning method has its own way of using these features to calculate the next event based on the training data. Regarding the update of the data-driven model, each time a new failure or scheduled event occurs, this information can be used to update the training set and update the existing data-driven model in operation.

We chose data-driven modeling as the predictive approach of our work as this methodology is widely used in prognostics. The ability to derive good predictions from removal times using a data-driven method is an important initial step to build more complex models.

Figure 7: Illustrative example of bleed air system. The valve of interest is located near the engine (engine bleed valve). 1 Feb 2017. Digital image. Web.
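The train/predict/update cycle described above can be sketched as follows. This is a hypothetical sketch with a scikit-learn regressor standing in for the data-driven model; the feature rows and target values are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative feature matrix (one row of extracted features per past event)
# and targets (the next time-to-removal in days); all values are stand-ins
X = np.array([[120.0, 1.0], [95.0, 2.0], [210.0, 3.0], [160.0, 4.0], [140.0, 5.0]])
y = np.array([95.0, 210.0, 160.0, 140.0, 180.0])

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Prediction phase: estimate the next event from the newest feature vector
x_new = np.array([[180.0, 6.0]])
next_event = model.predict(x_new)[0]

# Update phase: once the actual event (say 175 days) is observed, append it
# to the training set and refit the model in operation
X = np.vstack([X, x_new])
y = np.append(y, 175.0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
```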
4. Case Study

To validate our framework and compare its performance with the baseline life usage model, we studied a real-world case from the aeronautics field. This same case has been studied in other previous works such as [26, 6, 7, 27]. Here, in this section we briefly describe the case study by first introducing the background and then the used data set.

4.1. Background

Engine bleed air systems can vary widely in design and operation from one airplane type to another, but they all perform the same basic group of functions. These systems ensure the critical tasks of monitoring and controlling the cabin temperature and air flow to the cockpit, passenger, and cargo areas, as well as the cooling of avionics. A schematic of the studied engine air bleed system is presented in Figure 7. The system consists of a complex structure of ducts, tapes, valves and regulators. Of special interest is the engine bleed valve (EBV), a kind of pressure regulator and shut-off valve located near the aircraft engine.

The engine bleed valve is one of the most important components of the bleed air system. The criticality of this component follows from the number of important functions it performs: this valve is responsible for transferring bleed air from the air turbine starter to other parts of the engine or the aircraft for further use. Also, when the valve is open, each engine is able to supply air to its corresponding air conditioning pack and anti-ice system.

The engine bleed valve is subject to extreme conditions. When subject to vibratory loads, angular or radial deflections, high temperatures and/or differential thermal growth, the valve may become worn, unsealed and/or may begin to leak. As a result, the valve can show signs of fast and unexpected mechanical degradation.

Typical maintenance of an engine bleed valve involves having the valve removed. Here, by removal we mean a maintenance and repair action where the equipment is removed from the aircraft and restored to its original condition or replaced by a new/repaired unit. The engine bleed valve is one of the most frequent causes of scheduling interruptions for aircraft fleets. Costs could be reduced with the prevention of the high number of unscheduled and/or unnecessary removals that occur.

4.2. Data set

To address the problem of unscheduled removals, we study real data of 584 engine bleed valve removals recorded between 2010 and 2015 from aircraft of three airlines. Figure 8a shows a histogram of the time between removals. As shown in the plot, the maintenance data of the valves is highly dispersed, with the interval between removals ranging from 0 to 653 days. The highest density of removals is located around 164 days. This value marks the division between "short" and "long" removals. K-means clustering with k=2 was used to assess this value. Even though the distinction between short and long removals is clear from Figure 8a, this inference is not straightforward by looking at the removals' past history. The prognostics problem is hence complex.

Figure 8a depicts a parametric and non-parametric estimate of the probability density function of the removal times. Specifically regarding the Weibull fit, both the parametric estimate of Figure 8a and the probability Q-Q plot of Figure 8b show that the Weibull distribution fits the data well. We also tested the fitted distribution with a Kolmogorov-Smirnov test for goodness of fit. Here, the null hypothesis was that the distributions are equivalent against the alternative hypothesis that they differ. According to the test results, the null hypothesis cannot be rejected (0.03, p = 0.66).

Regarding the features of Tab. 1, Mann-Whitney tests were used to compare the distributions per type of event (short or long) (see Table 3). The test results (p < 0.05)
(a) Density plot of time between removals. (b) QQ plot of time between removals.

Figure 8: Histogram and QQ Plot of data set. The histogram on the left shows the fitting of the Weibull and empirical estimation methods to the data. The Weibull shape parameter β, being close to 1, suggests the equipment is in its operating life stage. The Q-Q plot on the right allows a visual assessment of the model fit. The plot shows that the Weibull distribution fits the data well, with just a slight deviation in the right tail.

Table 2: Airline predictors, by type of event (short or long events), p-values of χ² test added.

               nLong   %Long   nShort   %Short   nAll   %All
    Airline 1     46    50.0      375     76.2    421   72.1
    Airline 2     41    44.6      108     21.9    149   25.5
    Airline 3      5     5.4        9      1.8     14    2.4
    p < 0.01      92   100        492    100      584  100

H: The data-driven modeling approach integrated with auto-regressive moving average (ARMA) forecasting can outperform traditional life usage models (Weibull analysis).

In this section we describe the methodology followed to test this hypothesis. We describe the evaluation methods used and the performance metrics.
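The bias- and accuracy-oriented measures used in this comparison (e.g. median error, median absolute error, RMSE, percentage error) can be computed as in the following sketch; the predicted and actual removal times are synthetic stand-ins:

```python
import numpy as np

# Synthetic actual and predicted removal times in days (stand-ins)
actual = np.array([150.0, 95.0, 210.0, 160.0, 140.0])
predicted = np.array([140.0, 110.0, 190.0, 165.0, 150.0])

errors = predicted - actual
median_error = np.median(errors)                      # signed error (bias)
median_abs_error = np.median(np.abs(errors))          # accuracy
rmse = np.sqrt(np.mean(errors ** 2))
pct_error = np.mean(np.abs(errors) / actual) * 100.0  # mean percentage error

print(median_error, median_abs_error)  # 5.0 10.0
```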
Table 3: Raw statistical features: numerical predictors by type of event (short or long), p-values of Mann-Whitney test added.
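The per-feature comparison behind Table 3 can be sketched with a two-sided Mann-Whitney U test from `scipy`; the feature values below are synthetic stand-ins for one of the Table 1 features:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic values of one statistical feature, grouped by event type (stand-ins)
feature_short = np.array([30.0, 45.0, 25.0, 60.0, 40.0, 35.0, 50.0])
feature_long = np.array([200.0, 260.0, 300.0, 240.0, 280.0, 220.0])

# A small p-value indicates the feature distributions differ between
# short and long events
stat, p_value = mannwhitneyu(feature_short, feature_long, alternative="two-sided")
print(p_value < 0.05)  # True for these clearly separated groups
```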
5.3. Performance Metrics

A considerable number of performance metrics have been used in prognostics [29]. In this study we feature a subset of these. The accuracy and precision metrics used include, for example, the root mean squared error (RMSE) and the median absolute deviation (MAD), among others. We also include the mean time between failures (MTBF), a metric from the reliability domain [30]. This last metric is recommended in [29] as a cost/benefit metric intended to measure in an abstract way the benefit provided by prognostics. An extensive list of the considered metrics and calculation methods is provided in Table 4. We encourage the reader to get more details on these metrics from [31].

Algorithm 1 Evaluation of the Weibull model.
1: procedure 10-fold Cross Validation
2:   Input: {yE(t)}, t = 1..n
3:   # Split the removal vector into k = 10 equal samples
4:   {vi}, i = 1..k ← splitVector({yE(t)}, k = 10)
5:   for i = 1 to k do
6:     # Select the testing fold
7:     in.test ← vi
8:     # Select the remaining training folds
9:     in.train ← {vj}, j = 1..k, j ≠ i
10:    α, β ← fitWeibull(in.train)
11:    for r = 1 to length(in.test) do
12:      # Compute the error for each instance in test
13:      er ← (α − in.testr)
14: Output: {ei}, i = 1..n  # Returns the error vector

Algorithm 2 Evaluation of the data-driven approach.
1: procedure 10-fold Cross Validation
2:   # Receive as inputs the data set of removal times and the designation of the technique to be used
3:   Input: {yE(t)}, t = 1..n, k = 10, technique
4:   # Split the removal vector into k = 10 equal-size samples
5:   {vi}, i = 1..k ← splitVector({yE(t)}, k = 10)
6:   for i = 1 to k do  # Select the testing fold
7:     in.test ← vi
8:     # Select the remaining training folds
9:     in.train ← {vj}, j = 1..k, j ≠ i
10:    # Perform PCA on the training data
11:    pca.train ← PCA(in.train)
12:    # Train model m of technique t on the data
13:    m ← train(pca.train, technique)
14:    for r = 1 to length(in.test) do
15:      # Compute the error of each testing instance
16:      er ← (predict(m, in.testr) − in.testr)
17: Output: {ei}, i = 1..n  # Returns the error vector

6. Results

In this section we present the results of comparing the life usage approach to our data-driven approach. Concretely, we compare the life usage approach to 5 variants of our model, each using a different machine learning technique, namely generalized linear regression (GLM), k-nearest neighbors (KNN), neural networks (NN), support vector regression (SVM), and random forests (RF). The numerical results of this comparison are presented in Table 5. Please see Table 4 for a complete description of the evaluation measures.

From Table 5, the main conclusion is that the data-driven modeling approach was able to improve on, or match, the life usage method on the performance measures. The only technique with worse overall performance was the neural networks. This can be explained by the fact that, given the high density of removals with a short duration, the neural network tends to overfit to this subset of removals and predicts a removal every day, resulting in poor performance. The remaining data-driven models were better able to output predictions closer to the true values of the removal times. In the following subsections we analyze each method according to the different evaluation measures.

Bias and Accuracy Results

Fig. 9 displays the distribution of signed, absolute, squared and percentage errors for the different models. These plots allow analyzing the bias and accuracy of the data-driven techniques versus life usage. In the group of data-driven methods, the techniques with a comparable or superior accuracy to life usage were the nearest neighbors, random forests, generalized linear model and support vector machines. In particular, the support vector machines had equal or considerably better results on all metrics except for the mean error (ME).

The inferior results of the support vector machines on the mean error measure can be explained by the fact that
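The cross-validation procedure of Algorithm 1 can be sketched in Python as follows. This is our own illustrative sketch, not the authors' implementation: the fold-splitting scheme, the damped fixed-point iteration for the Weibull maximum-likelihood fit, and the function names are all assumptions.

```python
import math
import random

def fit_weibull(data, iters=500):
    """MLE for the 2-parameter Weibull (scale alpha, shape beta) via a damped
    fixed-point iteration on the shape equation (a common textbook scheme)."""
    n = len(data)
    mean_log = sum(math.log(x) for x in data) / n
    beta = 1.0
    for _ in range(iters):
        s1 = sum(x ** beta for x in data)
        s2 = sum((x ** beta) * math.log(x) for x in data)
        new_beta = 1.0 / (s2 / s1 - mean_log)
        beta = 0.5 * (beta + new_beta)  # damping for stable convergence
    alpha = (sum(x ** beta for x in data) / n) ** (1.0 / beta)
    return alpha, beta

def evaluate_weibull(removals, k=10, seed=0):
    """Algorithm 1: k-fold cross-validated errors, e_r = alpha - y_r."""
    idx = list(range(len(removals)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        # train on the k-1 remaining folds, test on fold i
        train = [removals[j] for f in folds[:i] + folds[i + 1:] for j in f]
        alpha, _beta = fit_weibull(train)
        errors.extend(alpha - removals[j] for j in folds[i])
    return errors
```

Note that, as in Algorithm 1, the error is the fitted scale α minus the observed removal time, rather than a per-instance prediction; Algorithm 2 replaces the Weibull fit with PCA followed by a trained regression model.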
Table 4: Performance metrics.

Root Mean Squared Error         RMSE   sqrt( (1/n) * sum_{t=1..n} (ŷE(t) − yE(t))^2 )
Mean Absolute Percentage Error  MAPE   (1/n) * sum_{t=1..n} | (ŷE(t) − yE(t)) / yE(t) |
Mean Absolute Error             MAE    (1/n) * sum_{t=1..n} | ŷE(t) − yE(t) |
Median Absolute Error           MdAE   median( { |ŷE(t) − yE(t)| }, t = 1..n )
Median Absolute Deviation       MAD    median( { |yE(t) − median({yE(t)}, t = 1..n)| }, t = 1..n )
Sample Standard Deviation       SSD    sqrt( sum_{t=1..n} ((ŷE(t) − yE(t)) − ME)^2 / (n − 1) )
Mean Time Between Failure       MTBF   mean of the predictions ŷE(t)

Note: n stands for the number of observations {yE(t)}, t = 1..n, in the testing set. For each observation yE(t), the model outputs the prediction ŷE(t). Here, the variable yE(t) means the time to the next removal at time index t. All measures are given in days except the MAPE (%) and SSD.
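The metrics of Table 4 are straightforward to compute from a vector of true removal times and predictions. A minimal sketch (the function name and dictionary layout are ours; MAPE is returned in %, matching the table's units):

```python
import math
from statistics import mean, median

def prognostic_metrics(y_true, y_pred):
    """Compute the Table 4 metrics; errors are prediction minus truth."""
    errs = [p - t for p, t in zip(y_pred, y_true)]
    abs_errs = [abs(e) for e in errs]
    n = len(errs)
    me = mean(errs)
    med_y = median(y_true)
    return {
        "ME": me,
        "MdE": median(errs),
        "RMSE": math.sqrt(mean(e * e for e in errs)),
        "MAE": mean(abs_errs),
        "MdAE": median(abs_errs),
        "MAPE": 100.0 * mean(abs(e) / t for e, t in zip(errs, y_true)),
        # MAD is computed on the observed removal times themselves
        "MAD": median(abs(t - med_y) for t in y_true),
        "SSD": math.sqrt(sum((e - me) ** 2 for e in errs) / (n - 1)),
        "MTBF": mean(y_pred),
    }
```

For example, `prognostic_metrics([10, 20, 30, 40], [12, 18, 33, 37])` yields a zero mean error and a MAPE of 11.875%.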
Table 5: Performance results.

              ME      MdE     MAE    MdAE    RMSE    MAPE     MAD     SSD    MTBF
Life Usage   -0.06   33.58   72.85   59.53   98.26  522.39   60.32   98.36   88.93
Data-driven
  NN        -87.99  -54.00   87.99   54.00  131.71   93.29   59.30   98.12    1.00
  KNN        -2.16   19.30   76.28   52.10  105.19  584.63   67.68  105.28   86.82
  RF          4.97   26.76   76.04   55.22  103.53  577.13   66.52  103.52   93.96
  GLM         0.17   26.78   72.49   55.99   98.61  525.32   62.20   98.71   89.15
  SVM       -31.21   -0.69   66.61   39.92  102.57  322.57   60.15   97.81   57.77

(Bias: ME, MdE. Accuracy: MAE, MdAE, RMSE, MAPE. Precision: MAD, SSD. Reliability: MTBF.)

Figure 9: Box plots of signed, absolute, squared and percentage errors for the life usage and data-driven models. Overall, the support vector machines is the data-driven model with the best accuracy results, showing narrow error ranges with low first and third quartile values as well as mean and median statistics comparable or superior to those of life usage, except for the mean error. The only data-driven model that consistently showed worse accuracy than life usage was the neural networks. The remaining models had comparable results.
Precision Results

The precision results of Table 5 can be analyzed visually in Figure 10. The figure displays the density plot of the signed errors of the different methods. Overall, from Figure 10 and Table 5, it can be seen that the precision of the compared approaches is essentially the same. With the exception of the nearest neighbors and random forests, which had a slightly larger median absolute deviation (MAD), the data-driven models showed comparable results on this metric and also with regard to the sample standard deviation (SSD).

As shown in Figure 10, the error distribution of the life usage model has a negative skew (−1.75), with the long "tail" on the negative side of the peak. All data-driven models also reported a negative skew similar to that of life usage (except for the special case of the neural networks). In this respect the models were similar. Please note that the optimal behavior would be a perfectly symmetrical distribution with narrow ranges and the mean exactly at the peak.
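The skewness figure quoted above (−1.75 for the life usage errors) can be reproduced with the standard Fisher-Pearson moment coefficient; a small sketch, assuming that definition was the one used:

```python
def sample_skewness(xs):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

# A symmetric sample has zero skew; a long left tail, as reported for the
# life usage error distribution, gives a negative coefficient.
```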
Figure 12: Biplot for a normed PCA (correlation circle): the direction and length of the arrows show the quality of the correlation between the raw features and between the raw features and the PCA features.

individual raw feature. Two raw feature vectors pointing in the same direction are highly positively correlated (cosine = 1); two vectors pointing in opposite directions are highly inversely correlated (cosine = -1); and two vectors at a 90° angle are independent (cosine = 0). The length of a feature vector represents its weight on each axis and indicates the relative importance of the predictor to each of the first two PCA features, represented on the x-axis and y-axis respectively. It can be seen from the length and position of the ARMA vector that this feature has a considerable effect on the first and second PCA features. These results suggest that the ARMA predictor is an important predictor of removal time.

6.3. Summary of Findings

To summarize the findings of this study, the six studied models are classified qualitatively in Table 6 according to three criteria, namely bias, accuracy, and precision. The reliability criterion is not included in our evaluation given its subjectivity. We apply the quantile function to compute the percentiles that transform the continuous variables of Table 5 into the criteria in Table 6. The more asterisks, the better the model on the criterion.

Table 6: Qualitative comparison of baseline and data-driven models.

         LU     NN     KNN    RF     GLM    SVM
ME       ****   *      ****   ****   ****   *
MdE      *      *      ****   ***    **     ****
MAE      ****   *      *      **     ***    ****
MdAE     *      ***    ****   ****   **     ****
RMSE     ****   *      *      **     ***    ***
MAPE     ***    ****   *      *      ***    ****
MAD      ****   ****   *      *      **     ****
SSD      ***    ****   *      *      **     ****
Score:   ***    **     **     **     ***    ****

The overall comparison of the methods shows that the models are complementary: their dissimilar characteristics make them suitable for different objectives and industrial scenarios. There is no clear-cut winning method: each of the studied approaches has its own benefits and drawbacks. For instance, the life usage model has the best results with regard to mean error but shows poor results in median error; the method might show poor accuracy in situations where low MdE and MdAE are needed. Its high MAPE is also especially distressing as, in our case, it means that a considerable number of short removals will occur before a scheduled maintenance intervention. One of the methods that best seems to address this problem is the support vector machines.

The support vector regression shows promising achievements. This method had more favorable results than the life usage method. Here, the most important outcomes were the mean and median absolute errors (MAE and MdAE) and the percentage error (the MAPE). Having better median and mean absolute errors is important as it means that on (absolute) average the model is better able to predict failure. The MAE (mean absolute error) is one of the most important metrics in the aeronautics industry as it provides a clear view of the absolute accuracy of the model [29]. More importantly, we had better results with the SVM on MAPE (percentage error). This kind of metric weights the errors related to short removals more heavily than those of long removals. Industrially, a large error in a long removal is not as important as one in a short removal. Please note that having a short removal of 20 days predicted as 10 days is more serious (in the industrial environment) than having a long removal of 100 days predicted as 90 days: in the first case, 50% of the residual life is lost, while in the second case only 10% is. Accordingly, attaining a MAPE as low as 322% (vs. 522%) is an improvement. In challenging data sets such as ours, with a high density of short removals, it might be a better choice to use a data-driven SVM-based method than the traditional life usage approach.

The generalized linear model may also be a viable alternative to the life usage model. From Table 6 it can be seen that the model has positive results in most metrics and outperforms the life usage in median errors (MdE and MdAE). In contrast with the support vector machines, the model outputs unbiased estimates. Regarding the remaining models, the estimates of the neural networks model are too conservative. This technique may however output interesting estimates in other maintenance scenarios. The nearest neighbors and random forests models lose considerably in precision when compared to other methods.

Given these results, we claim that there is sufficient evidence to support our main hypothesis: the data-driven modeling approach combined with ARMA forecasting can outperform traditional life usage models.
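The correlation-circle reading described earlier for Figure 12 (arrow coordinates are correlations of each raw feature with the first two principal components; the cosine between two arrows approximates the correlation between the features) can be reproduced on toy data. This is an illustrative sketch, not the paper's code: the Jacobi eigensolver and all function names are our own assumptions.

```python
import math
import random

def jacobi_eig(a, sweeps=50):
    """Eigen-decomposition of a small symmetric matrix via cyclic Jacobi
    rotations; returns (eigenvalues, matrix whose columns are eigenvectors)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-12:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # rotation angle that zeroes the (p, q) entry
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rows
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):  # columns
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # accumulate eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def correlation_circle(X):
    """Coordinates of each raw feature on the first two PCs of a normed PCA;
    loading_ij = eigvec_ij * sqrt(eigval_j) = corr(feature i, PC j)."""
    n, d = len(X), len(X[0])
    def standardize(c):
        m = sum(c) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
        return [(x - m) / sd for x in c]
    Z = [standardize(c) for c in zip(*X)]
    corr = [[sum(Z[i][t] * Z[j][t] for t in range(n)) / (n - 1)
             for j in range(d)] for i in range(d)]
    eigvals, V = jacobi_eig(corr)
    p1, p2 = sorted(range(d), key=lambda i: -eigvals[i])[:2]
    return [(V[i][p1] * math.sqrt(max(eigvals[p1], 0.0)),
             V[i][p2] * math.sqrt(max(eigvals[p2], 0.0))) for i in range(d)]
```

On data where two features are nearly collinear and a third is independent, the first two arrows come out almost parallel (cosine near 1) and every arrow stays inside the unit circle, as in a correlation-circle biplot.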
Concretely, we have shown that the method of support vector regression is able to outperform the life usage model on the evaluation measures of sample standard deviation, median error, median absolute error and percentage error. The generalized linear model showed good performance but could not outperform the overall performance of the life usage model. The remaining models had lower performance.

7. Conclusion

Models based on usage data will continue to be necessary, as there are many reliability situations where only this kind of data is available. In this work we attempted to address the question of how to improve usage-based prognostics, that is, the prediction of fault events based on failure/inspection data. Here, we proposed the use of a data-driven approach integrated with ARMA forecasting as an alternative to traditional life usage modeling. Our results showed that the proposed approach can have a better performance than traditional life usage for absolute and percentage errors. This was shown on a concrete case study involving a critical valve of commercial aircraft. From the set of data-driven techniques studied, the regressive support vector machines produced the best overall results.

With this study, we have shown that it is possible to improve prognostics models that are based solely on removal times. The goal of this work was to reinforce the notion that it is important to extract useful information not only from sensor data but also from usage data. The development of advanced models of usage data can help better understand the influence of usage predictors on prognostics and also create better prognostics models. In data-driven prognostics, it is possible to build more sophisticated forms of prognostics on top of simpler ones by the inclusion of new features. It is therefore a viable research direction to use data-driven techniques to first extrapolate predictions from usage data and later on use these insights to build more complex models based both on sensor and usage data.

An important research direction can be the application of data-driven techniques to other case studies. Future research may also include the exploration of ensemble techniques and uncertainty management.

References

1. Goebel K., Daigle M., Saxena A., Sankararaman S., I. R., Celaya J. Prognostics: The Science of Making Predictions. 2017.
2. Schwabacher M. A Survey of Data-driven Prognostics. In: Proceedings of the AIAA Infotech@Aerospace Conference. IEEE; 2005:1–5.
3. Jardine A.K., Lin D., Banjevic D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing 2006;20(7):1483–510.
4. Si X.S., Wang W., Hu C.H., Zhou D.H. Remaining Useful Life Estimation – A Review on the Statistical Data Driven Approaches. European Journal of Operational Research 2011;213(1):1–14.
5. Datong L., Yu P., Xiyuan P. Fault prediction based on time series with online combined kernel SVR methods. In: Instrumentation and Measurement Technology Conference (I2MTC'09). IEEE; 2009:1163–6.
6. Baptista M., de Medeiros I.P., Malere J.P., Prendinger H., Nascimento Jr. C.L., Henriques E. Improved time-based maintenance in aeronautics with regressive support vector machines. In: Annual Conference of the Prognostics and Health Management Society. PHM Society; 2016.
7. Baptista M., de Medeiros I.P., Malere J.P., Nascimento C., Prendinger H., Henriques E.M. Comparative case study of life usage and data-driven prognostics techniques using aircraft fault messages. Computers in Industry 2017;86:1–14.
8. Ho S., Xie M. The use of ARIMA models for reliability forecasting and analysis. Computers & Industrial Engineering 1998;35(1-2):213–6.
9. Singh N. Forecasting time-dependent failure rates of systems operating in series and/or in parallel. Microelectronics Reliability 1994;34(3):391–403.
10. Li R.Y., Kang R. Research on failure rate forecasting method based on ARMA model. Systems Engineering and Electronics 2008;8.
11. Li B., Zhao J., Guo J. Innovative metrics for equipment failure evaluation and prediction system based on ARMA model. Systems Engineering and Electronics 2011;33(1):98–101.
12. Zhao J., Xu L., Liu L. Equipment fault forecasting based on ARMA model. In: International Conference on Mechatronics and Automation (ICMA). IEEE; 2007:3514–8.
13. Huang J.G., Luo H., Long B., Wang H.J. Prediction research about small sample failure data based on ARMA model. In: International Conference on Testing and Diagnosis (ICTD'09). IEEE; 2009:1–6.
14. Weigel A.P., Liniger M., Appenzeller C. Can multi-model combination really enhance the prediction skill of probabilistic ensemble forecasts? Quarterly Journal of the Royal Meteorological Society 2008;134(630):241–60.
15. Kozlowski J.D. Electrochemical cell prognostics using online impedance measurements and model-based data fusion techniques. In: Aerospace Conference; vol. 7. IEEE; 2003:3257–70.
16. Su S., Zhang W., Zhao S. Fault prediction for nonlinear system using sliding ARMA combined with online LS-SVR. Mathematical Problems in Engineering 2014;2014.
17. Su S., Zhang W., Zhao S. Online fault prediction for nonlinear system based on sliding ARMA combined with online LS-SVR. In: 33rd Chinese Control Conference (CCC). IEEE; 2014:3287–91.
18. Salfner F., Lenk M., Malek M. A survey of online failure prediction methods. ACM Computing Surveys (CSUR) 2010;42(3):10.
19. Liu C.C. A Comparison between the Weibull and Lognormal Models used to Analyse Reliability Data. Ph.D. thesis; University of Nottingham; 1997.
20. Weibull W. A statistical distribution function of wide applicability. Journal of Applied Mechanics 1951;103:293–7.
21. Klutke G.A., Kiessler P.C., Wortman M. A Critical Look at the Bathtub Curve. IEEE Transactions on Reliability 2003;52(1):125–9.
22. Trochim W.M. Descriptive statistics. Research Methods Knowledge Base; 2006.
23. Jolliffe I. Principal Component Analysis. Wiley Online Library; 2002.
24. Schwabacher M., Goebel K. A survey of artificial intelligence for prognostics. In: AAAI Fall Symposium. 2007:107–14.
25. Wu X., Kumar V., Quinlan J.R., Ghosh J., Yang Q., Motoda H., McLachlan G.J., Ng A., Liu B., Philip S.Y., et al. Top 10 Algorithms in Data Mining. Knowledge and Information Systems 2008;14(1):1–37.
26. Baptista M., de Medeiros I.P., Malere J.P., Prendinger H., Nascimento Jr. C.L., Henriques E. A Comparison of Data-driven Techniques for Engine Bleed Valve Prognostics using Aircraft-derived Fault Messages. In: Third European Conference of the Prognostics and Health Management Society. PHM Society; 2016.
27. Baptista M.L., de Medeiros I.P., Malere J.P., Nascimento C.L., Prendinger H., Henriques E. Aircraft on-condition reliability assessment based on data-intensive analytics. In: Aerospace Conference, 2017 IEEE. IEEE; 2017:1–12.
28. Refaeilzadeh P., Tang L., Liu H. Cross-validation. In: Encyclopedia of Database Systems. Springer; 2009:532–8.
29. Saxena A., Celaya J., Balaban E., Goebel K., Saha B., Saha S., Schwabacher M. Metrics for evaluating performance of prognostic techniques. In: Prognostics and Health Management. IEEE; 2008:1–17.
30. IEEE. IEEE Standard 1413: Standard Methodology for Reliability Prediction and Assessment for Electronic Systems and Equipment; 2002.
31. Goebel K., Saxena A., Saha S., Saha B., Celaya J. Prognostic performance metrics. Machine Learning and Knowledge Discovery for Engineering Systems Health Management 2011;147.
Research highlights:
• A novel framework for predictive maintenance is proposed
• The ARMA methodology is combined with PCA and data-driven techniques