
ISA Transactions
journal homepage: www.elsevier.com/locate/isatrans

Research article

Data-driven remaining useful life prediction via multiple sensor signals and deep long short-term memory neural network

Jun Wu a,∗, Kui Hu a, Yiwei Cheng b, Haiping Zhu b, Xinyu Shao b, Yuanhang Wang c

a School of Naval Architecture and Ocean Engineering, Huazhong University of Science and Technology, Wuhan, China
b School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
c China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, China

∗ Corresponding author. E-mail address: wuj@hust.edu.cn (J. Wu).

highlights

• A new deep long short-term memory (DLSTM) model is constructed for accurate remaining useful life (RUL) prediction.
• The proposed DLSTM model fuses multi-sensor signals for enhanced RUL prediction performance.
• The proposed method is well suited to multi-sensor scenarios, which is validated by two multi-sensor experiments.

article info a b s t r a c t

Article history: Remaining useful life (RUL) prediction is very important for improving the availability of a system and
Received 4 April 2019 reducing its life cycle cost. This paper proposes a deep long short-term memory (DLSTM) network-
Received in revised form 26 June 2019 based RUL prediction method using multiple sensor time series signals. The DLSTM model fuses
Accepted 2 July 2019
multi-sensor monitoring signals for accurate RUL prediction, which is able to discover the hidden
Available online xxxx
long-term dependencies among sensor time series signals through deep learning structure. By grid
Keywords: search strategy, the network structure and parameters of the DLSTM are efficiently tuned using
Remaining useful life an adaptive moment estimation algorithm so as to realize an accurate and robust prediction. Two
Deep long short-term memory (DLSTM) various turbofan engine datasets are adopted to verify the performance of the DLSTM model. The
neural networks experimental results demonstrate that the DLSTM model has a competitive performance in comparison
Deep learning
with state-of-the-arts reported in literatures and other neural network models.
Sensor data fusion
© 2019 ISA. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Any unexpected failure of critical equipment might lead to great economic losses, or even catastrophic consequences. If the remaining useful life (RUL) is predicted in advance, the failure might be avoided by predictive maintenance. Thus, RUL prediction becomes more and more important to enhance the availability and reliability of equipment as well as to reduce its life cycle cost. Based on the analysis of historical data, RUL prediction can estimate the residual time of normal operation in the healthy state [1–3].
According to the literature, the existing RUL prediction methods can be categorized into three classes, that is, model-based methods [4], data-driven methods [5], and hybrid methods [6]. In practice, the development of a failure is influenced by various factors, which leads to many uncertain parameters in failure modeling. The data-driven methods, however, perform deep analysis and mining of the collected monitoring signals of the system degradation process to predict the RUL, so they are easier to implement compared to the other two kinds of methods.
Time series signals are the most common choice in data-driven RUL prediction since they are capable of describing the health degradation well [7]. Many RUL prediction methods have achieved excellent prognostics using time series data [1,8]. Ompusunggu [9] described a Kalman filter (KF)-based RUL prediction method for automatic transmission clutches. Wu [10] utilized an extreme learning machine (ELM), optimized by glowworm swarm optimization, for RUL prediction of lithium-ion batteries. Khelif [11] proposed a direct RUL estimation approach using support vector regression (SVR), which avoids degradation state estimation or failure threshold setting. Morando [12] adopted an echo state network model for RUL forecasting of fuel cells using time series data. Li [13] constructed a convolution neural network to achieve RUL prediction by using multivariate time series in the acquired monitoring signals. However, these methods achieve RUL prediction by mapping the relationship between monitoring signals and RUL values, and do not take into account the


relevance of time series signals at different times that reflect the micro changes of the health state.
Recurrent neural network (RNN), as a solution to the above problem, may extract vital information from the previously processed data across time steps and integrate it into the current cell state to model the sequential data [14]. Malhi [15] introduced an RNN-based long-term prediction method for health condition monitoring of machinery. Chandra [16] achieved chaotic time-series prediction via RNNs trained by a competitive cooperative coevolution method. However, the problem of gradients vanishing or exploding during training limits the wide application of traditional RNNs [17–20]. Fortunately, an improved RNN structure named long short-term memory (LSTM) has been created to relieve this problem and used for time series data prediction. By introducing a set of memory neurons, LSTM has demonstrated excellent capability in learning robust and sensitive data. Guo [21] introduced an LSTM-based feature fusion method to fuse multiple feature data and construct a health index for RUL prediction of rolling bearings. However, the predicted mean errors for two experimental cases are 32.48% and 23.24%, which are not good enough to meet the requirements of industrial application. Cheng [22] utilized LSTM for failure prediction of rolling bearings and reported a convincing result. However, only the vibration signal is analyzed and the multi-sensor data scenario is not considered. More recently, Elsheikh [23] developed a bidirectional handshaking LSTM model for RUL prediction. However, it is a challenge for deep learning (DL) to be performed due to the limitations of the network structure.
Experience shows that predictors with multiple sensors can increase the prediction accuracy and reliability [24–27]. In LSTM, multi-sensor data are input in matrix form, which contains temporal and spatial information for robust and enhanced estimates. On this basis, a deep LSTM (DLSTM)-based RUL prediction approach is proposed for equipment in this paper using multiple sensor time series signals. We exploit the combination of DL and LSTM to construct the DLSTM. In the DLSTM, multiple sensor signals are fused to explore more potential information. Given a grid search strategy, the network structure and parameters of the DLSTM are tuned efficiently by the adaptive moment estimation (Adam) algorithm to obtain an accurate and robust RUL prediction model. A dropout method is introduced in the DLSTM model to relieve overfitting problems and regularize the model.
The main contributions of this paper are summarized as follows. First, a new DLSTM model is constructed for accurate RUL prediction, where some explorations are made to optimize the network structure and parameters as well as to relax overfitting for specific prediction problems. Second, the proposed DLSTM model fuses multi-sensor monitoring signals for enhanced RUL prediction performance, and is able to capture the hidden long-term dependencies among sensor time series signals via the DL structure. Third, the proposed method is suitable for multi-sensor scenarios, which is verified in the experimental part.
The remainder of this paper is organized as follows. In Section 2, a new DLSTM-based RUL prediction method is proposed using multiple sensor signals, which involves model parameter optimization. In Section 3, two practical experimental studies are implemented for validation of the DLSTM-based RUL prediction method. The conclusions are drawn in Section 4.

2. DLSTM-based RUL prediction with multiple sensors

2.1. Basic theory of LSTM

Unlike common neural networks, RNN has a unique structure in which the output of the hidden layers returns recurrently as input, which means that the hidden layers have self-connections across time. Therefore, RNN has a strong ability in processing sequential and time-dependent data [14]. Mathematically, the output of the hidden layers at time t is described as:

h_t = ϕ(w_hx x_t + w_hh h_{t-1} + b_h)   (1)

where w_hx and w_hh are weight coefficients of the input data x_t and the previous output h_{t-1} of the hidden layer neurons, and b_h represents the bias. However, because of the vanishing gradient problem in the backward propagation process during model training, RNN has no ability to capture long-term dependencies in the data.
To alleviate this issue, LSTM was proposed by Hochreiter and Schmidhuber [28] based on the RNN architecture. In the LSTM network structure, LSTM neurons replace traditional RNN hidden neurons to construct the LSTM layer. Every LSTM neuron has three well-designed gate functions, namely the forget gate, input gate and output gate. This structure ensures that the LSTM neuron has the ability to discover and memorize long-term dependencies. Fig. 1 depicts the structure of an LSTM neuron.

Fig. 1. Structure of LSTM neuron.

As shown in Fig. 1, the three gate functions in the LSTM neuron provide a good nonlinear control mechanism for controlling information input and removal. The input gate resolves what information will be entered into the neuron state. The forget gate determines what information in the neuron state needs to be discarded. In addition, the output gate decides what information is to be exported from the neuron state. The computational process in LSTM neurons can be mathematically expressed as

g_t = ϕ(w_gx x_t + w_gh h_{t-1} + b_g)   (2)
i_t = σ(w_ix x_t + w_ih h_{t-1} + b_i)   (3)
f_t = σ(w_fx x_t + w_fh h_{t-1} + b_f)   (4)
o_t = σ(w_ox x_t + w_oh h_{t-1} + b_o)   (5)
s_t = g_t ⊙ i_t + s_{t-1} ⊙ f_t   (6)
h_t = ϕ(s_t) ⊙ o_t   (7)

where w_gx, w_fx, w_ix and w_ox are weights of the input data x_t; w_gh, w_fh, w_ih and w_oh are weights of the previous output h_{t-1} of the LSTM neurons; b_g, b_f, b_i and b_o indicate the biases of the input node, forget gate, input gate and output gate; g_t, f_t, i_t and o_t are the outputs of the input node, forget gate, input gate and output gate; σ and ϕ represent the sigmoid and tanh functions; s_t and s_{t-1} are the LSTM neuron states at times t and t − 1; and ⊙ represents pointwise multiplication.
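To make the gate computations of Eqs. (2)–(7) concrete, the following is a minimal NumPy sketch of one forward step of a single LSTM layer; the dictionary-based weight layout, the shapes and the random initialization are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, s_prev, W, U, b):
        """One LSTM time step following Eqs. (2)-(7).
        W, U, b map each of 'g' (input node), 'i', 'f', 'o' to its weights/bias."""
        g_t = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # Eq. (2): input node
        i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # Eq. (3): input gate
        f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # Eq. (4): forget gate
        o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # Eq. (5): output gate
        s_t = g_t * i_t + s_prev * f_t                          # Eq. (6): cell state
        h_t = np.tanh(s_t) * o_t                                # Eq. (7): hidden output
        return h_t, s_t

    # Example shapes: 12 sensor inputs, 100 hidden neurons, a 30-step sequence.
    n_in, n_hid = 12, 100
    W = {k: 0.1 * np.random.randn(n_hid, n_in) for k in 'gifo'}
    U = {k: 0.1 * np.random.randn(n_hid, n_hid) for k in 'gifo'}
    b = {k: np.zeros(n_hid) for k in 'gifo'}
    h, s = np.zeros(n_hid), np.zeros(n_hid)
    for x_t in np.random.randn(30, n_in):
        h, s = lstm_step(x_t, h, s, W, U, b)

Because the cell state s_t is carried forward additively, the gradient path through Eq. (6) avoids the repeated squashing that makes plain RNNs forget long-range structure.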


Fig. 2. Illustration of structure of the built DLSTM model.

2.2. Proposed DLSTM model for RUL prediction with multiple sensors

DL, as an extension of the artificial neural network, can adaptively capture potential features in data through a multi-layer network structure [29–31]. From a structural point of view, DLSTM contains multiple hidden layers, which is a form of DL. In this paper, a DLSTM model is constructed to realize automatic fusion of multi-sensor data and accurate prediction of RUL. Fig. 2 shows the structure of the constructed DLSTM model.
The entry of input data is controlled by the input neurons. Hence, the number of input neurons is equal to the number of selected sensor signals. The sensor time series data are divided into different parts and used for network model training, validation and testing, respectively. The input data is constructed as a two-dimensional matrix. The numbers of rows and columns in the matrix are k and T, where k denotes the quantity of selected sensors and T indicates the quantity of sampling data. The multi-sensor data are eventually fused to the RUL values.
Multiple LSTM layers are stacked in the constructed DLSTM model for carrying out deep excavation and fusion of the multi-sensor data. Different LSTM layers are spatially connected, and the data are output from the upper layer to the neurons in the next layer. The same LSTM layer is time-dependent, and the previous output of an LSTM layer loops back into this layer as input. Numerous LSTM neurons are contained in every LSTM layer to capture the long-term dependencies of the sensor data. In every LSTM layer, LSTM neurons exchange information with each other to realize self-connection across time. Moreover, the output of each neuron not only circulates into itself at the next moment, but is also shared with other neurons. For the DLSTM model, the number of LSTM layers and the neuron number in each LSTM layer are vital to model performance. Hence, these two important parameters are optimized by a grid search strategy in this paper, which is detailed in Section 2.3.
A fully-connected dense layer is adopted as the output layer, into which the LSTM layer outputs are sent, and the multi-sensor data are eventually fused to the RUL values. The mean squared error function, a commonly used loss function in machine learning, is adopted for minimizing the error between the predicted RULs and the RUL labels. During the testing stage, online sensor data is sequentially sent into the trained DLSTM and the predicted RULs are acquired.

2.3. Model parameter optimization

Model parameters directly affect the performance of the built DLSTM for RUL prediction. To obtain the optimum RUL prediction performance, the parameter optimization of the DLSTM model is mainly implemented from the following three aspects.

2.3.1. Model structure
The size of the DLSTM model structure, including the LSTM layer number and the neuron number in each LSTM layer, needs to be well confirmed. These are two important hyperparameters in the DLSTM model that control the architecture and topology of the network. Much of the literature has proposed parameter optimization methods for DL models. Almalaq [32] proposes a parameter optimization method for DL networks based on a genetic algorithm, which uses model parameters to form genes for hybridization and mutation, and obtains optimal network parameters through multiple iterations. Hu [33] successfully realized the optimization of a DL model by using a differential evolution algorithm, and the optimized model has excellent prediction performance. Although these two optimization methods have achieved good results, the optimization process is extremely complex. Because of the high complexity of the DLSTM network structure and the long training process, combining DLSTM with these optimization methods will inevitably put forward higher requirements for computing resources.
Grid search is used in the DLSTM for network configuration exploration, as it has simple and clear principles, easy-to-implement algorithms and low computational resource requirements. The candidates of the LSTM layer number and the neuron number in each layer form a two-dimensional grid, and each node parameter in the grid is verified to choose the optimal network structure parameters. Finally, the parameters with the best validation prediction performance are considered optimal and used in the online RUL prediction.

2.3.2. Optimization of the loss function
The optimization algorithm of the loss function will directly affect the efficiency and time of DLSTM training. In the proposed DLSTM model, an algorithm called adaptive moment estimation (Adam) is utilized to replace the traditional stochastic gradient descent (SGD) optimizer for minimizing the loss function of the DLSTM. In the SGD algorithm, a fixed learning rate (LR) is kept to update all the weights, which means that the LR remains unchanged during training. The Adam algorithm designs an independent adaptive LR for each parameter by analyzing the first moment estimate (FME) and second moment estimate (SME) of the gradient. Hence, the Adam algorithm has higher computational efficiency and requires fewer configuration resources. The updating process of the network parameters by the Adam algorithm is expressed as:

g_t = ∇_θ L(θ_{t-1})   (8)


m_t = β_1 m_{t-1} + (1 − β_1) g_t   (9)
n_t = β_2 n_{t-1} + (1 − β_2) g_t²   (10)
m̂_t = m_t / (1 − β_1^t)   (11)
n̂_t = n_t / (1 − β_2^t)   (12)
θ_t = θ_{t-1} − η · m̂_t / (√n̂_t + ϵ)   (13)

where g_t indicates the gradient of the loss function L(θ) with respect to the network parameter set θ; m_t and n_t respectively represent the FME and SME of the gradient; m̂_t and n̂_t are the deviation corrections of m_t and n_t, respectively; β_1 and β_2 represent the decay rates of the FME and SME, respectively; η denotes the step size and ϵ is the numerical stability constant; and θ_{t-1} is the value of the parameter set at time t − 1.

2.3.3. Dropout
Generally, the training of neural networks becomes more time-consuming and overfitting turns into a serious problem as the number of network layers increases [34]. Overfitting results in the prediction model performing excellently on training data, but not on test data. To overcome this problem, dropout is adopted in the DLSTM to prevent the same features from being captured repeatedly.
The schematic of the dropout method applied to the DLSTM model is shown in Fig. 3. In this schematic, the red circles are the neurons in the hidden layers which are temporarily discarded from the network according to a certain probability during the training process of the DLSTM. Since the discards occur randomly, a different network is trained in each mini-batch. Hence, dropout can effectively alleviate the data overfitting problem of the DLSTM. It should be pointed out that dropout only works during the training process and is disabled during the testing process, which means all hidden neurons are working during the testing process. In this paper, all candidate dropout values are tested, and the optimal value is finally applied to the DLSTM.

3. Experiment analysis

3.1. Dataset description

The datasets used for evaluating the proposed method are the NASA turbofan engine datasets, which are produced by the C-MAPSS platform [35]. Different operational settings, including fuel velocity and pressure, are variably input for the simulation of different faults and degradation processes in the turbofan engines. During the experiments, the turbofan engines begin to run in good condition and some faults develop, generating degradation until a failure happens.
The C-MAPSS platform offers four datasets: FD001∼FD004. In each dataset, both training and testing sets are included. The training sets hold the signals of the whole lifetime, while the testing sets only contain multiple sensor data terminated at some time before engine failure, and the RUL needs to be predicted. Both the training sets and the testing sets consist of a series of cycles, and each cycle contains 26 columns which respectively indicate the engine ID, cycle index, three operational settings and 21 sensor measurements.
Two of the datasets, FD001 and FD003, are adopted in this paper since their engines have an obvious and clear health degradation process. FD001 has only one failure mode while FD003 has two failure modes. In addition, both FD001 and FD003 contain 100 training engines and 100 testing engines.
Fig. 4 shows the sensor 2 measurements of all engines in FD001. As shown in Fig. 4, the nonlinear sensory signals have various engine lifespans, which makes accurate RUL prediction of the engines a challenging task.
It should be noted that all the methods in this paper are executed on Anaconda and Python 3.6, and the computing equipment is a computer with an Intel Core i5-4460 (3.20 GHz) CPU and 16 GB RAM.

3.2. Performance evaluation indicators

In this study, three indicators, namely the Score, R and RUL error range, are adopted to evaluate the RUL prediction performance. The indicator Score provided by the data creators [36] is expressed as

Score = Σ_{i=1}^{n} (exp(−(R̂_i − R_i)/13) − 1),  if R̂_i − R_i < 0
Score = Σ_{i=1}^{n} (exp((R̂_i − R_i)/10) − 1),   if R̂_i − R_i ≥ 0   (14)

where R̂_i represents the predicted RUL, R_i indicates the real RUL, and n indicates the total number of testing engines.
The indicator R is defined as

R = √((1/n) Σ_{i=1}^{n} (R̂_i − R_i)²)   (15)

Both the Score and R are used to assess the difference between the forecasted RULs and the actual RULs, and a small Score or R value represents a good predictive effect. However, there is a subtle difference between the two indicators. As shown in Fig. 5, the Score penalizes late predictions more than early predictions.
The indicator RUL error range represents the margin of error of all RUL prediction values. A smaller RUL error range reflects the higher effectiveness and stability of the prediction method.

3.3. Multiple sensor data preprocessing

3.3.1. Sensor data smoothing
The multiple sensor data obtained from the turbofan engines present large random fluctuations and noise jamming, which might affect the performance of RUL prediction. The exponential smoothing algorithm is adopted to remove the noise and weaken the random fluctuations in the sensor data, which is expressed as

x′_t = α x_t + (1 − α) x′_{t-1},  if t ≥ 2
x′_1 = x_1,                      if t = 1   (16)

where x_t is the actual sensor measurement at time t, x′_t represents the smoothed value at time t, x′_{t-1} holds the smoothed value at time t − 1, and α indicates the smoothing coefficient.
The value of α directly decides the smoothing effect of the engine sensor data, which indirectly affects the accuracy of the RUL prediction. Fig. 6 shows the preprocessed data of sensor 2 with different α compared with the raw sensor data in FD001. Three different α values are used for the sensor data smoothing: 0.25, 0.5 and 0.75. As illustrated in Fig. 6, the fluctuations of the smoothed sensor data are reduced compared with the raw sensor data, and the smoothed sensor data can well reflect the trend of the raw sensor data. In addition, it is found through a series of comparative experiments that the preprocessed sensor data with an α value of 0.25 has smaller fluctuations, which means the data smoothing effect is better. So, α is set to 0.25 in this experiment.
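As a hedged illustration, Eq. (16) can be implemented in a few lines of Python; the array-based interface is an assumption made for exposition.

    import numpy as np

    def exponential_smoothing(x, alpha=0.25):
        """Eq. (16): x'_1 = x_1 and x'_t = alpha*x_t + (1-alpha)*x'_{t-1} for t >= 2."""
        x = np.asarray(x, dtype=float)
        smoothed = np.empty_like(x)
        smoothed[0] = x[0]                       # t = 1 branch
        for t in range(1, len(x)):               # t >= 2 branch
            smoothed[t] = alpha * x[t] + (1.0 - alpha) * smoothed[t - 1]
        return smoothed

A smaller α weights the previous smoothed value more heavily, which is why α = 0.25 produces the least fluctuation among the three tested values.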

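For reference, the two error indicators of Section 3.2 (Eqs. (14) and (15)) can be sketched as follows; the vectorized interface is an assumption.

    import numpy as np

    def score(rul_pred, rul_true):
        """Eq. (14): asymmetric Score; late predictions (d >= 0) are penalized
        more heavily (divisor 10) than early ones (divisor 13)."""
        d = np.asarray(rul_pred, float) - np.asarray(rul_true, float)
        return float(np.sum(np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0))

    def rmse(rul_pred, rul_true):
        """Eq. (15): R, the root mean squared prediction error over n engines."""
        d = np.asarray(rul_pred, float) - np.asarray(rul_true, float)
        return float(np.sqrt(np.mean(d ** 2)))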

Fig. 3. Dropout applied to the DLSTM model.

Fig. 4. Sensor 2 measurements for all engines in FD001.

Fig. 5. Illustration of the difference between the Score and R.

3.3.2. Sensor selection
As mentioned above, the signals acquired from 21 sensors are contained in the C-MAPSS dataset. However, not all the sensors can well indicate the variation of health degradation. For an accurate RUL prediction, sensor selection is implemented. Reasonable sensor signals have a good correlation with the process of health degradation, showing a trend of monotonic increase or decrease [37]. Hence, the monotonicity (Mon) and correlation (Corr) of the signal data are analyzed to achieve sensor selection.
In order to measure the trend of the engine data, the mean trend feature of the engine data is first extracted, expressed as

f(t) = (f_U(t) + f_L(t)) / 2   (17)

where f(t) is the trend value of the sensor signal at time t, and f_U(t) and f_L(t) represent the upper and lower envelopes of the sensor signal.
Then, the Mon and Corr of the sensor signals are analyzed, which are defined as follows:

Mon = |Σ_t sign(f(t+1) − f(t)) − Σ_t sign(f(t) − f(t+1))| / (T − 1)
Corr = |T Σ_t f(t)·t − Σ_t f(t) Σ_t t| / √([T Σ_t f(t)² − (Σ_t f(t))²][T Σ_t t² − (Σ_t t)²])   (18)

where T indicates the quantity of the signal samples.
Next, a composite selection criterion (CSC) is formed by a combination of Mon and Corr, which is expressed as

max_Ω CSC = ω_1 Mon + ω_2 Corr,  s.t. Σ_{i=1}^{2} ω_i = 1, ω_i > 0   (19)

where Ω represents all candidate sensor signals and ω_i represents the weighting coefficients. It can be seen from the formula that the CSC is linearly and positively correlated with Mon and Corr. In other words, the higher the CSC index value, the better the sensor represents the changing trend. In order to pick out superior features, the threshold is set to 0.75.
Fig. 7 illustrates the sorted CSC values of the sensors in FD001 and FD003. It can be observed from Fig. 7(a), where the CSCs of the 21 sensors (S1∼S21) in FD001 are sorted, that the CSCs of S2, S3, S4, S7, S8, S11, S12, S13, S15, S17, S20 and S21 are greater than the threshold. Thus, they are selected. In Fig. 7(b), the data of sensors S4, S7, S11, S12, S15, S20 and S21 are selected to construct the training dataset in FD003.

Fig. 6. Raw sensor data and smoothed data of sensor 2 in FD001.

Fig. 7. (a) Sorted CSC values of sensors in FD001; (b) Sorted CSC values of sensors in FD003.

3.3.3. RUL label
Note that the label values have a significant impact on the prediction performance. Several papers have proved that piecewise linear labeling of the C-MAPSS engine degradation data is effective and beneficial [20,38]. In other words, the RUL label is assumed to be constant in the initial period and to degrade linearly afterwards. Following the literature [20,38], sample points in the early stage are labeled with a constant RUL value, which is set to 125 in this paper.

3.4. Case one

Dataset FD001 is analyzed for validation of the proposed method. First of all, the selected sensor data of the 100 engines are used to build the training datasets after the preprocessing of the multiple sensor data is completed, in which 90 engines are randomly chosen for the DLSTM model training and the remaining 10 engines are used to verify the model's effectiveness.
Next, considering the characteristics of the monitoring signals in the datasets, the proposed DLSTM model for RUL prediction is constructed and its parameters are determined optimally. The model is trained by the grid search method to explore an optimal DLSTM structure. A two-dimensional grid is formed by the LSTM layer number and the neuron number in each layer, and each node parameter in the grid is verified as a candidate parameter. Considering the time constraints and computational complexity, the LSTM layer number is set from 1 to 6 and the neuron number in each LSTM layer is set from 50 to 300. In this grid, each two-parameter combination is used to construct a new DLSTM. The 10 validation engines are adopted to verify each model structure, and the RMSE is utilized for comparing the model training results.
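The following Keras-style sketch shows how the piecewise linear labels and the two-dimensional grid search could be wired together; build_dlstm(), the window shapes and the data arrays (x_train, y_train, x_val, y_val) are hypothetical stand-ins, not the authors' code.

    import itertools
    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Dense

    def piecewise_rul(total_cycles, max_rul=125):
        """Piecewise linear RUL label: capped at 125 early on, linear decay after."""
        return np.minimum(np.arange(total_cycles, 0, -1), max_rul)

    def build_dlstm(n_layers, n_units, n_sensors, dropout=0.5):
        """Stack n_layers LSTM layers and a dense output mapping to the RUL."""
        model = Sequential()
        for i in range(n_layers):
            kwargs = {'return_sequences': i < n_layers - 1}
            if i == 0:
                kwargs['input_shape'] = (None, n_sensors)
            model.add(LSTM(n_units, **kwargs))
            model.add(Dropout(dropout))
        model.add(Dense(1))
        model.compile(optimizer='adam', loss='mean_squared_error')
        return model

    def grid_search(x_train, y_train, x_val, y_val, n_sensors):
        """Scan layer counts 1-6 and widths 50-300; keep the lowest validation RMSE."""
        best_rmse, best_cfg = np.inf, None
        for n_layers, n_units in itertools.product(range(1, 7), range(50, 301, 50)):
            model = build_dlstm(n_layers, n_units, n_sensors)
            model.fit(x_train, y_train, epochs=1000, batch_size=10, verbose=0)
            rmse = np.sqrt(model.evaluate(x_val, y_val, verbose=0))
            if rmse < best_rmse:
                best_rmse, best_cfg = rmse, (n_layers, n_units)
        return best_cfg, best_rmse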


Fig. 8. Training of DLSTM with different layer numbers and neuron numbers.

Fig. 8 shows the training results of the DLSTM with different layer numbers and neuron numbers. It can be seen from Fig. 8 that each model structure has a different performance. Most importantly, the DLSTM model with 5 LSTM layers and 100 neurons in each LSTM layer achieves the optimal performance. Hence, a DLSTM model is constructed with 5 LSTM layers and 100 neurons in this case. Table 1 records the training results of the DLSTM for a subset of the parameter combinations. These six parameter combinations were selected because they achieved the better training results. By comparing the parameters and training times, it can be concluded that the training time of the DLSTM extends gradually with the increase of the LSTM layer number and the neuron number in each LSTM layer.

Table 1
Training results of DLSTM for a subset of the parameter combinations.

No.  LSTM layer number  Neurons in each layer  RMSE   Training time (s)
1    2                  150                    20.40  34804.87
2    3                  200                    20.35  69720.27
3    3                  250                    18.56  98679.69
4    4                  250                    20.52  153776.33
5    5                  100                    18.43  56994.93
6    5                  300                    18.70  260307.16

The dropout method is adopted to reduce the data overfitting of the DLSTM model. In this paper, different dropout values have been tried to determine the optimum for the constructed DLSTM. Fig. 9 shows the training results of the constructed DLSTM with different dropout values. It is obvious from Fig. 9 that the DLSTM model has the smallest RMSE and gains the best training performance when the dropout is 0.7. Therefore, the dropout value is set to 0.7 in this case for an excellent training effect of the DLSTM.

Fig. 9. Training of DLSTM with different dropout values.

Different optimization algorithms are compared with the Adam algorithm for minimizing the loss function in this paper, including SGD, root mean square prop (RMSprop), adaptive gradient (Adagrad) and Adadelta. Fig. 10 shows the training of the DLSTM with the different optimization algorithms. It can be observed from Fig. 10 that all four optimizers can help the DLSTM achieve network optimization after 1000 epochs. Nevertheless, Adam in the DLSTM shows higher efficiency and better convergence than the other algorithms. Therefore, Adam is utilized in the constructed DLSTM as the loss function optimizer.

Fig. 10. Training of DLSTM with different optimizers.

Finally, a DLSTM model with 5 LSTM layers and 100 neurons in each LSTM layer is built for the RUL prediction of the engines. The dropout value is set to 0.7 and the loss function optimization algorithm is Adam. Table 2 reports the other parameters of the constructed DLSTM model.

Table 2
Other parameters of the constructed DLSTM model.

Parameter              Value
Initial learning rate  0.001
Training epochs        1000
Batch size             10
Loss function          Mean square error

After the model training is completed, 10 engines are utilized for performance validation of the trained DLSTM. Fig. 11 compares the predicted RULs and true RULs of the 10 validation engines. It can be seen from Fig. 11 that the predictions can well reflect the change of the true RULs for the 10 engines.

Fig. 11. Prediction results of 10 engines for validation.

During the online process, the monitoring signals of the testing engines are sequentially fed into the DLSTM model and the predicted RULs are obtained. Fig. 12 shows the real RUL versus the predicted RUL for the 100 engines in FD001. It can be found that the predicted RULs correspond closely to the real RULs.

Fig. 12. Real RUL versus predicted RUL for 100 engines.

Some state-of-the-art methods are utilized for comparison with the DLSTM in this paper, including MLP [39], SVR [39], relevance vector regression (RVR) [39], deep convolution neural network (DCNN) [39], ELM with fuzzy clustering (ELM-FC) [40], support vector machine (SVM) [41], echo state network with KF (ESN-KF) [42] and a similarity-based approach (SBA) [43]. The performance comparisons are shown in Table 3, where N/A indicates that the information is not available. Table 3 compares the performance of the proposed DLSTM and the state-of-the-art methods, which are implemented on the testing engines in dataset FD001, with respect to the Score, R and RUL error range. It can be observed that many methods demonstrate advantages on this RUL prediction problem, including DCNN, ELM, SBA etc. Compared with the other methods, the proposed DLSTM owns the smallest Score, R and RUL error range. This means that the proposed DLSTM has the best prediction performance, which illustrates the effectiveness of the proposed DLSTM on this engine prognostic problem.



Table 3
Performance comparisons.

Methods      Score  R      RUL error range
MLP [39]     17992  37.56  N/A
SVR [39]     1381   20.96  N/A
RVR [39]     1502   23.80  N/A
DCNN [39]    1287   18.45  N/A
ELM-FC [40]  1046   N/A    [−80, 120]
SVM [41]     N/A    29.82  [−64, 69]
ESN-KF [42]  N/A    63.46  [−185, 120]
SBA [43]     791    N/A    N/A
DLSTM        655    18.33  [−47, 56]

3.5. Case two

Due to the inclusion of more failure modes, accurate RUL prediction using dataset FD003 is more difficult than using dataset FD001. In this paper, the dataset FD003 is utilized to compare the RUL prediction performance of the DLSTM and other RNN methods, including the deep RNN (DRNN) [44], deep gated recurrent unit (DGRU) [45], bi-directional GRU (BDGRU) [46] and bi-directional LSTM (BDLSTM) [47]. Grid search and dropout are adopted in all methods to obtain the optimal models. The key parameters of all methods are recorded in Table 4.

Table 4
Key parameters of all methods.

Model   LSTM layer number  Neurons in each layer  Dropout
DRNN    4                  100                    0.5
DGRU    4                  150                    0.5
BDLSTM  3                  150                    0.6
BDGRU   3                  300                    0.6
DLSTM   2                  250                    0.7

Fig. 13 shows the box plots of the RUL errors for the five models. The predicted RUL errors of the 100 engines for the five models are concentrated near 0. Compared with the other models, the predicted RUL errors of the DLSTM are more concentrated, which shows the better prediction stability of the DLSTM.

Fig. 13. The boxplots of predicted RUL error for five models.

Table 5 compares the prediction results of the five models based on the three evaluation indicators and the time consumption. It can be seen from Table 5 that the proposed DLSTM consistently outperforms the other compared RNN models in terms of the Score, R and RUL error range. The BDLSTM and BDGRU are better at processing time series data due to their unique bi-directional network structure. Hence, the evaluation indicators of the BDLSTM and BDGRU are only slightly worse than the optimal results. Owing to the simplest structure of the DRNN, the prediction results of the DRNN are relatively poor compared with the other evolved RNN structures. In terms of the time consumption analysis, the DLSTM shows no obvious advantage compared with the other models, because the DLSTM has a relatively complex network structure. The online average calculation time of all the models meets industrial requirements, which proves the DLSTM can be applied to practical equipment in industrial systems.

Table 5
Performance comparisons of five models on the dataset FD003.

Methods  Score  R      RUL error range  Training time (s)  Online average calculation time (s)
DRNN     1358   26.12  [−73, 44]        81503.25           0.11
DGRU     1105   20.86  [−48, 45]        296997.23          0.15
BDLSTM   980    19.48  [−54, 67]        234484.16          0.28
BDGRU    967    19.94  [−52, 67]        449436.09          0.36
DLSTM    828    19.78  [−44, 38]        346454.71          0.18

4. Conclusions

A novel sensor data-driven RUL prediction method is introduced in this paper by using the built DLSTM model. This method integrates multiple sensory signals by utilizing the DLSTM to achieve a more accurate and robust prediction. Grid search is applied to obtain the best network structure, and the Adam algorithm is used to optimize the parameters quickly and efficiently. In addition, the dropout method is adopted to relieve overfitting during training. To verify the prediction effect of the DLSTM, the C-MAPSS turbofan engine datasets are utilized in this paper. With the data preprocessing and sensor selection, satisfactory prediction performance is obtained by the DLSTM-based RUL prediction method for the turbofan engines. It is found that the DLSTM is ideal for predicting RUL and achieves the best prediction performance

on dataset FD001. Meanwhile, a variety of RNN models are compared with the DLSTM, and the prediction results on dataset FD003 show that the DLSTM model is superior to the other RNN models.
In addition, this method has the ability of multi-sensor data fusion and can be widely applied to different types of equipment in the industrial field. Hence, the application of the proposed method to other multi-sensor monitored equipment will be carried out in the future.

Acknowledgments

This research is funded in part by the National Natural Science Foundation of China under the Grant No. 51875225 and 51605095, in part by the National Key Research and Development Program of China under the Grant No. 2018YFB1702302, and in part by the Key Research and Development Program of Guangdong Province, China under the Grant No. 2019B090916001.

Declaration of competing interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

[1] Lei Y, Li N, Guo L, Li N, Yan T, Lin J. Machinery health prognostics: a systematic review from data acquisition to RUL prediction. Mech Syst Signal Process 2018;104:799–834.
[2] Wu J, Wu CY, Cao S, Or SW, Deng C, Shao XY. Degradation data-driven time-to-failure prognostics approach for rolling element bearings in electrical machines. IEEE Trans Ind Electron 2019;66(1):529–39.
[3] Wu J, Wu C, Lv Y, Deng C, Shao X. Design a degradation condition monitoring system scheme for rolling bearing using EMD and PCA. Ind Manage Data Syst 2017;117:713–28.
[4] Li L-L, Zhang X-B, Tseng M-L, Zhou Y-T. Optimal scale gaussian process regression model in insulated gate bipolar transistor remaining life prediction. Appl Soft Comput 2019;78:261–73.
[5] Wang YH, Deng C, Wu J, Wang YC, Xiong Y. A corrective maintenance scheme for engineering equipment. Eng Fail Anal 2014;36:269–83.
[6] Liao CL, Köttig F. A hybrid framework combining data-driven and model-based methods for system remaining useful life prediction. Appl Soft Comput 2016;44:191–9.
[7] Ramasso E, Rombaut M, Zerhouni N. Joint prediction of continuous and discrete states in time-series based on belief functions. IEEE Trans Cybern 2013;43:37–50.
[8] Liu J, Zio E. Prediction of peak values in time series data for prognostics of critical components in nuclear power plants. IFAC-PapersOnLine 2016;49(28):174–8.
[9] Ompusunggu AP, Papy JM, Vandenplas S. Kalman-filtering-based prognostics for automatic transmission clutches. IEEE/ASME Trans Mech 2016;21:419–30.
[10] Wu JR, Xu JX, Huang XL. An indirect prediction method of remaining life based on glowworm swarm optimization and extreme learning machine for lithium battery. In: Proceedings of the 2017 36th Chinese control conference (CCC). Dalian (China); 2017. p. 7259–64.
[11] Khelif R, Chebel-Morello B, Malinowski S, Laajili E. Direct remaining useful life estimation based on support vector regression. IEEE Trans Ind Electron 2017;64(3):2276–84.
[12] Morando S, Jemei S, Gouriveau R, Zerhouni N, Hissel D. Fuel cells remaining useful lifetime forecasting using echo state network. In: Proceedings of the 2014 IEEE vehicle power and propulsion conference. Coimbra (Portugal); 2014. p. 1–6.
[13] Li X, Ding Q, Sun JQ. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab Eng Syst Saf 2018;172:1–11.
[14] Liu H, Zhou J, Zheng Y, Jiang W, Zhang Y. Fault diagnosis of rolling bearings with recurrent neural network-based autoencoders. ISA Trans 2018;77:167–78.
[15] Malhi A, Gao RX. Recurrent neural networks for long-term prediction in machine condition monitoring. In: Proceedings of the 21st IEEE instrumentation and measurement technology conference. Como (Italy); 2004. p. 2048–53.
[16] Chandra R. Competition and collaboration in cooperative coevolution of Elman recurrent neural networks for time-series prediction. IEEE Trans Neural Netw Learn Syst 2015;26(12):3123–36.
[17] Lukoševičius M, Jaeger H. Reservoir computing approaches to recurrent neural network training. Comput Sci Rev 2009;3(3):127–49.
[18] Liu B, Cheng J, Cai K, Shi P, Tang X. Singular point probability improve LSTM network performance for long-term traffic flow prediction. In: National conference of theoretical computer science (NCTCS 2017): Theoretical Computer Science. Wuhan (China); 2017. p. 328–40.
[19] Zhao R, Wang DZ, Yan RQ, Mao KZ, Shen F, Wang JJ. Machine health monitoring using local feature-based gated recurrent unit networks. IEEE Trans Ind Electron 2018;65(2):1539–48.
[20] Wu YT, Yuan M, Dong SP, Lin L, Liu YQ. Remaining useful life estimation of engineered systems using vanilla LSTM neural networks. Neurocomputing 2018;275:167–79.


[21] Guo L, Li N, Jia F, Lei Y, Lin J. A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 2017;240:98–109.
[22] Cheng YW, Zhu HP, Wu J, Shao XY. Machine health monitoring using adaptive kernel spectral clustering and deep long short-term memory recurrent neural networks. IEEE Trans Ind Inform 2019;15(2):987–97.
[23] Elsheikh A, Yacout S, Ouali MS. Bidirectional handshaking LSTM for remaining useful life prediction. Neurocomputing 2019;323:148–56.
[24] Xia M, Li T, Xu L, Liu LZ, Silva CW. Fault diagnosis for rotating machinery using multiple sensors and convolutional neural networks. IEEE/ASME Trans Mech 2018;23:101–10.
[25] Al-Sharman MK, Emran BJ, Jaradat MA, Najjaran H, Al-Husari R, Zweiri Y. Precision landing using an adaptive fuzzy multi-sensor data fusion architecture. Appl Soft Comput 2018;69:149–64.
[26] Wang W, Hong G, Wong Y, Zhu K. Sensor fusion for online tool condition monitoring in milling. Int J Prod Res 2007;45(21):5095–116.
[27] Wu J, Su Y, Cheng Y, Shao X, Deng C, Liu C. Multi-sensor information fusion for remaining useful life prediction of machining tools by adaptive network based fuzzy inference system. Appl Soft Comput 2018;68:12–23.
[28] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9:1735–80.
[29] Qiu X, Ren Y, Suganthan PN, Amaratunga GAJ. Empirical mode decomposition based ensemble deep learning for load demand time series forecasting. Appl Soft Comput 2017;54:246–55.
[30] Wang JL, Zhang J, Wang XX. Bilateral LSTM: a two-dimensional long short-term memory model with multiply memory units for short-term cycle time forecasting in re-entrant manufacturing systems. IEEE Trans Ind Inf 2018;14(2):748–58.
[31] Liu B, Wang L, Liu M, Xu C. Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems. arXiv preprint arXiv:1901.06455.
[32] Almalaq A, Zhang JJ. Evolutionary deep learning-based energy consumption prediction for buildings. IEEE Access 2019;7:1520–31.
[33] Hu YL, Chen L. A nonlinear hybrid wind speed forecasting model using LSTM network, hysteretic ELM and differential evolution algorithm. Energ Convers Manage 2018;173:123–42.
[34] Wielgosz M, Skoczeń A, Mertik M. Using LSTM recurrent neural networks for monitoring the LHC superconducting magnets. Nucl Instrum Methods A 2017;867:40–50.
[35] Frederick DK, DeCastro JA, Litt JS. User's guide for the commercial modular aero-propulsion system simulation (C-MAPSS). Technical Manual TM2007-215026, Cleveland (USA): NASA/ARL; 2007.
[36] Saxena A, Goebel K, Simon D, Eklund N. Damage propagation modeling for aircraft engine run-to-failure simulation. In: Proceedings of the 2008 international conference on prognostics and health management. Denver (USA); 2008. p. 1–9.
[37] Elbouchikhi E, Choqueuse V, Amirat Y, Benbouzid MEH, Turri S. An efficient Hilbert-Huang transform-based bearing faults detection in induction machines. IEEE Trans Energy Convers 2018;32(2):401–13.
[38] Li X, Ding Q, Sun JQ. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab Eng Syst Safe 2018;172:1–11.
[39] Babu GS, Zhao P, Li XL. Deep convolutional neural network based regression approach for estimation of remaining useful life. In: International conference on database systems for advanced applications. Dallas (USA); 2016. p. 214–28.
[40] Javed K, Gouriveau R, Zerhouni N. A new multivariate approach for prognostics based on extreme learning machine and fuzzy clustering. IEEE Trans Cybern 2015;45(12):2626–39.
[41] Louen C, Ding SX, Kandler C. A new framework for remaining useful life estimation using support vector machine classifier. In: Proceedings of the 2013 conference on control and fault-tolerant systems (SysTol). Nice (France); 2013. p. 228–33.
[42] Peng Y, Wang H, Wang JM, Liu DT, Peng XY. A modified echo state network based remaining useful life estimation approach. In: Proceedings of the 2012 IEEE conference on prognostics and health management. Denver (USA); 2012. p. 1–7.
[43] Wang T, Yu J, Siegel D, Lee J. A similarity-based prognostics approach for remaining useful life estimation of engineered systems. In: Proceedings of the 2008 IEEE conference on prognostics and health management. 2008. p. 1–6.
[44] Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw 2015;61:85–117.
[45] Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
[46] Zhao R, Wang D, Yan R, Mao K, Shen F, Wang J. Machine health monitoring using local feature-based gated recurrent unit networks. IEEE Trans Ind Electron 2018;65(2):1539–48.
[47] Zhang Z, Pinto J, Plahl C, Schuller B, Willett D. Channel mapping using bidirectional long short-term memory for dereverberation in hands-free voice controlled devices. IEEE Trans Consum Electr 2014;60(3):525–33.
