Fault_Knowledge_Transfer_Assisted_Ensemble_Method_for_Remaining_Useful_Life_Prediction
3, MARCH 2022
Abstract—Machinery remaining useful life (RUL) prediction is an important task in condition-based maintenance. Data-driven methods have been widely studied and applied; however, almost all existing studies learn degradation trends regardless of the fault conditions, although different fault conditions can lead to different degradation patterns. This article proposes a novel fault information assisted RUL prediction method based on a convolutional long short-term memory (LSTM) ensemble network, where fault conditions are obtained via fault knowledge transfer. Divergence minimization and domain adversarial adaptation are combined to transfer fault knowledge from a fault dataset to the run-to-failure data in a weakly supervised manner. With the predicted fault information, the RUL prediction network can learn various degradation patterns under different faults separately using a structure of multiple LSTMs. Then an ensemble strategy based on soft fault conditions is designed to get the final RUL prediction results. Experiments on bearing datasets verify the effectiveness of the proposed method.

Index Terms—Convolutional neural network (CNN), fault diagnosis, long short-term memory (LSTM) network, remaining useful life prediction, transfer learning.

Manuscript received April 21, 2021; accepted May 11, 2021. Date of publication May 18, 2021; date of current version December 6, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 51975356, in part by the Shanghai AI Creativity Development Project under Grant 2019-RGZN-01026, and in part by the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0102. Paper no. TII-21-1760. (Corresponding author: Yixiang Huang.)
Pengcheng Xia, Yixiang Huang, Peng Li, and Chengliang Liu are with the State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: xpc19960921@sjtu.edu.cn; david.huangyx@gmail.com; peng.li@sjtu.edu.cn; chlliu@sjtu.edu.cn).
Lun Shi is with the Shanghai SmartState Technology Company, Ltd, Shanghai 201306, China, and also with the State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: shilun@sjtu.edu.cn).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2021.3081595.
Digital Object Identifier 10.1109/TII.2021.3081595
1551-3203 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

With the rapid development of sensor, control, and monitoring technologies in industry, condition-based maintenance (CBM) has been widely studied and implemented to ensure the reliability of complex industrial systems [1]. CBM provides maintenance decisions based on the condition monitoring information, and diagnostics and prognostics are the two main tasks in a CBM system [2]. Diagnostics aims to detect and identify the fault modes of a machine system, whereas prognostics assesses the health condition and predicts the remaining useful life (RUL) [3]. An accurate RUL estimation can provide useful information for predictive maintenance decisions, thus reducing unplanned breakdown costs. RUL prediction has attracted considerable attention and research in the past decades. However, making accurate RUL predictions still faces many challenges due to the complex machinery degradation mechanisms.

According to [4], RUL prediction methods can be mainly categorized into model-based, data-driven, and combination models. Model-based methods develop physical or mathematical models to describe the degradation process, such as the Paris–Erdogan model [5]. Data-driven approaches build models based on the historical condition monitoring data, which is easy to implement. Combination models integrate the model-based and data-driven models to develop a more comprehensive model. Since machinery systems are becoming more and more complex, it is difficult to develop a reliable model-based method suitable for different conditions. Integrating an applicable combination model is also challenging. With the many big-data-based algorithms arising in recent years, data-driven methods have been widely studied and applied. Useful features or health indicators are first extracted to represent the degradation trend [6], and then RUL is predicted based on the features. Recently, with the development of deep learning and its wide applications in various fields, some end-to-end deep learning methods have been established for machinery RUL prediction. Li et al. [7] employed a deep convolutional neural network (CNN) to predict the RUL of turbofan engines. Zhao et al. [8] combined CNN and a bidirectional long short-term memory (LSTM) network to predict milling tool wear. Miao et al. [9] proposed a dual-task deep LSTM network to simultaneously assess the degradation state and predict the RUL of aeroengines. Qin et al. [10] proposed a gated recurrent unit network with dual attention gates for RUL prediction of bearings.

These data-driven prognostic methods learn machinery degradation trends from the available historical data regardless of its fault modes and degradation patterns, and correspondingly suffer from prediction uncertainty [4]. In practice, machinery components may show various degradation patterns across different individuals and even at different degradation stages due to their diverse fault modes. Therefore, fault conditions can provide prior knowledge that guides data-driven methods to better model various degradation patterns and improve RUL prediction accuracy [11].

Though the fault condition affects the degradation process, in the literature, very limited RUL prediction researches take it
Authorized licensed use limited to: Center for Science Technology and Information (CESTI). Downloaded on June 12,2024 at 02:38:01 UTC from IEEE Xplore. Restrictions apply.
XIA et al.: FAULT KNOWLEDGE TRANSFER ASSISTED ENSEMBLE METHOD FOR REMAINING USEFUL LIFE PREDICTION 1759
into consideration. Liu et al. [12] proposed a joint-loss CNN to perform bearing fault diagnosis and RUL prediction simultaneously using a partially shared structure. Experiments show that the introduction of the diagnosis task improves the RUL prediction performance. However, there is a limitation that the fault mode of each training sample must be observed and recorded in the experiments, which is unrealistic in most practical applications. In addition, multiple faults may occur successively during a degradation process and compound faults may be observed after the tests [13], so it is difficult to determine the fault type at each prediction time.

In practice, it is difficult to obtain the fault conditions in real applications since fault modes are usually unobserved. A variety of fault diagnosis methods have been developed based on condition monitoring data to recognize the fault modes. Diagnosis models can also be categorized as model-based, signal-based, and knowledge-based (data-driven) according to [14]. Knowledge-based or data-driven methods are the most widely studied and applied recently. For instance, Fu et al. [15] combined fast Fourier transform and uncorrelated multilinear principal component analysis to diagnose faults of wind turbine systems. These data-driven methods require human experience for feature engineering. As a result, deep learning algorithms, especially CNN, have gained more and more attention and shown great success in fault diagnosis tasks in recent years [16]. 1-D CNN is used to address 1-D vibration data directly, such as the deep CNN with wide first-layer kernels (WDCNN) [17] and multiscale learning based CNNs [18]. The 2-D CNN is usually utilized after time series permutation or signal-to-image conversion, like the CNN based on LeNet-5 proposed in [19]. Some more complicated CNN-based methods like Cascade CNN [20], which introduces a cascade structure and dilated convolution operation, have also been proposed to address the fault diagnosis problem.

However, in real cases, it is difficult and unrealistic to collect sufficient labeled data to train a reliable diagnosis model. Models established based on laboratory data usually fail due to different data distributions caused by diverse machines or working conditions. Fortunately, transfer learning provides a promising tool to address this problem [21]. Lu et al. [22] introduced the domain adaptation technique to address the fault diagnosis problem under different working conditions. Guo et al. [23] studied knowledge transfer between different machines or datasets with proposed CNN-based domain adaptation methods. Chen et al. [24] proposed a transfer learning scheme with a CNN pretrained on the source dataset. Li et al. [25] proposed a domain adversarial network based method to accomplish diagnosis knowledge transfer from multiple different source machines.

To integrate fault information to improve RUL prediction performance and develop a universally applicable method, we propose a fault information assisted convolutional LSTM ensemble method for RUL prediction, where fault conditions are diagnosed through transfer learning. We develop a fault knowledge transfer neural network (FTNN) based on a domain-shared 1-D CNN and domain adaptation techniques combining divergence minimization and domain adversarial adaptation to transfer fault knowledge from an existing fault dataset to the run-to-failure samples. Based on the diagnosed fault information, a deep network combining 1-D CNN and multiple LSTMs is proposed for RUL prediction. Each LSTM learns the degradation pattern under a single fault mode, and an ensemble strategy based on soft fault conditions acquired by the FTNN is then designed to get the final RUL prediction results. The effectiveness of the proposed method is verified by a case study of bearings. Results demonstrate that the introduction of fault information greatly improves the prognostic accuracy. The main contributions of this article are summarized as follows.

1) Fault information is introduced to assist the RUL prediction task through fault knowledge transfer based on a weakly-supervised domain adaptation. The diagnosed fault condition helps the model to learn various degradation patterns under different fault modes. Through knowledge transfer, this method can be universally explored without prior fault information.
2) To capture degradation patterns caused by different faults, a convolutional LSTM ensemble network is proposed for RUL prediction in an end-to-end way. 1-D CNN is used to extract features and a set of LSTMs models each kind of degradation pattern, respectively. Considering fault severity and multiple faults, an ensemble scheme based on soft fault conditions is designed to get comprehensive RUL prediction results.

The rest of this article is organized as follows. Section II introduces some theoretical preliminaries of our method. Section III describes the proposed method in detail. A case study and results are presented in Section IV. Finally, Section V concludes this article.

II. PRELIMINARIES

A. Fault Knowledge Transfer

For traditional intelligent algorithms used for fault diagnosis, a general assumption exists that training samples and testing samples have the same probability distribution. But for samples from different operation conditions or machines, this assumption usually fails. Transfer learning aims to address this distribution mismatch problem. Let $X$ be the sample from a dataset and $P(X)$ be its marginal probability distribution. $X$ belongs to a feature space $\mathcal{X}$, i.e., $X \in \mathcal{X}$. Then, a domain is defined as $D = \{\mathcal{X}, P(X)\}$. Samples from the dataset with fault labels belong to the source domain $D_s$, and the target domain $D_t$ contains samples from another dataset with insufficient labels. In general, $P_s(X_s) \neq P_t(X_t)$. One of the most commonly used methods is domain adaptation, which aims to learn domain-invariant features. The feature representations should follow almost the same distribution regardless of whether they are generated from the source domain or the target domain. One popular domain adaptation method is to minimize a divergence that measures the distribution discrepancy of the source and target domains, such as maximum mean discrepancy (MMD) [26] and correlation alignment [27]. Another family of domain adaptation methods introduces the domain adversarial neural network [28], which aims to learn a feature representation that contains no discriminative information for the domain classifier to recognize which domain it is from.
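As a rough illustration of divergence minimization (a sketch, not the authors' implementation), the squared MMD between two sets of feature vectors can be estimated with a Gaussian kernel, in the form this article gives later in (13). The function names `gaussian_kernel` and `mmd2`, the toy feature vectors, and the bandwidth value are assumptions made for illustration only.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two sample sets."""
    ns, nt = len(source), len(target)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in source for b in source) / ns ** 2
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in target for b in target) / nt ** 2
    k_st = sum(gaussian_kernel(a, b, sigma) for a in source for b in target) / (ns * nt)
    return k_ss + k_tt - 2.0 * k_st

# Hypothetical 2-D features: one set near the origin, one shifted away from it.
source_feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
shifted_feats = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]

print(mmd2(source_feats, source_feats))   # identical sets: essentially zero
print(mmd2(source_feats, shifted_feats))  # shifted set: clearly positive
```

A small value indicates that the two feature distributions are hard to tell apart under the chosen kernel, which is the property a domain-invariant feature extractor is trained toward.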
1760 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 18, NO. 3, MARCH 2022
where $N_s$ is the number of source domain samples. We should use a source dataset containing almost all potential fault types to provide sufficient fault knowledge. That is, $Y_t \subseteq Y_s$, where $Y = \{1, 2, \ldots, K\}$ denotes the label space of a domain, and $K$ is the number of condition categories. In practice, the machine components at the beginning of run-to-failure tests are usually brand new, so samples at the very early stage can be regarded as normal without faults. Consequently, the target domain can have access to part of the normal condition samples. We assume that $N_t^{nor}$ samples are labeled as the normal category, which form a subdataset $X_t^{nor} = \{(x_i^{t,nor}, y = \text{“normal”})\}_{i=1}^{N_t^{nor}}$. The remaining $N_t^{unl}$ samples are all unlabeled, forming a subdataset $X_t^{unl} = \{x_i^{t,unl}\}_{i=1}^{N_t^{unl}}$. Therefore, the target domain dataset is $X_t = X_t^{nor} \cup X_t^{unl}$, which has $N_t = N_t^{nor} + N_t^{unl}$ samples. Our task is to transfer the fault knowledge from the source domain $D_s$ to the target domain $D_t$ in a weakly supervised manner.

B. Convolutional Neural Network

CNN is one of the most widely used deep learning models. 1-D CNN has been widely applied in fault diagnosis and prognosis tasks to address sequential signal data. Due to the strong feature extraction ability of CNN and the great success 1-D CNN has gained in diagnosis and prognosis tasks, 1-D CNN is employed as a feature extractor to extract feature representations from raw signal data in the proposed method. Fig. 1 shows a simplified 1-D CNN structure with a convolutional layer and a pooling layer. The input $Z$ is a signal sequence of $N$ channels, i.e., $Z = [z_1, z_2, \ldots, z_L]$, where $L$ is the length of the sequence and $z_i \in \mathbb{R}^N$. The convolutional layer utilizes kernels sliding along the time direction to perform convolution operations as follows:

$$c_i^{(j)} = \phi\left(w_c^{(j)} \cdot z_{i:i+l-1} + b_c^{(j)}\right) \tag{1}$$

where $\oplus$ denotes the concatenation operator.

Then a down-sampling process is usually performed through a max-pooling layer, which extracts the local maximum values along one direction to reduce the trainable parameters, formulated by

$$p_i^{(j)} = \max_{s(i-1)+1 \le k \le si} \left\{ c_k^{(j)} \right\} \tag{3}$$

where $s$ is the length of each pooling region, and $c_k^{(j)}$ is the convolution output at the $k$th time step obtained by the $j$th kernel. Finally, the pooling results obtained by multiple kernels are stacked column by column to form a feature map.

C. LSTM Network

The LSTM network is a popular variant of RNN. LSTM introduces gate mechanisms to enhance its capability of capturing long-term dependencies. Three gates, i.e., input gate, forget gate, and output gate, are used to control the information flow, and a cell state $C_t$ is updated at each step. The basic theory of an LSTM cell is illustrated in Fig. 2 and formulated as follows:

$$i_t = \sigma(W_i \cdot [x_t, h_{t-1}] + b_i) \tag{4}$$
$$f_t = \sigma(W_f \cdot [x_t, h_{t-1}] + b_f) \tag{5}$$
$$o_t = \sigma(W_o \cdot [x_t, h_{t-1}] + b_o) \tag{6}$$
$$C_t = i_t \odot \tanh(W_c \cdot [x_t, h_{t-1}] + b_c) + f_t \odot C_{t-1} \tag{7}$$
$$h_t = o_t \odot \tanh(C_t) \tag{8}$$

where $x_t$ and $h_t$ denote the input and hidden state of time $t$, $W_*$ and $b_*$ are the weight matrix and bias matrix, respectively, $\odot$ represents the elementwise product operator, $\sigma(\cdot)$ denotes the sigmoid activation function, i.e., $\sigma(x) = 1/(1 + e^{-x})$, and $\tanh(\cdot)$
denotes the hyperbolic tangent activation function, i.e., $\tanh(x) = (e^x - e^{-x})/(e^x + e^{-x})$.

III. PROPOSED METHOD

In this article, a fault knowledge transfer assisted RUL prediction method is proposed. The proposed method consists of two important parts: one is the FTNN, and the other is the convolutional LSTM ensemble network for RUL prediction.

A. Fault Knowledge Transfer Neural Network

The proposed method aims to utilize the abundant fault knowledge in existing fault datasets to provide prior fault information for the RUL prediction task. Since the fault datasets and the RUL prediction datasets usually have different data distributions, it is improper to directly use the fault knowledge across datasets. Transfer learning is an effective method to address this problem. Therefore, we use transfer learning, or more specifically, the domain adaptation technique, to develop a fault knowledge transfer network. Since knowledge transfer across datasets in this article can be challenging, we combine divergence minimization and domain adversarial adaptation to ensure the knowledge transfer ability. Domain adaptation must be based on a feature extractor used to learn domain-invariant features. In the proposed method, the input of the network is raw signals with noise. According to many previous studies in the diagnosis and prognosis fields, CNN is a suitable and powerful network for extracting features from raw signals. Therefore, CNN and transfer learning combined with an adversarial mechanism are integrated to construct the FTNN.

The structure of the proposed FTNN is illustrated in Fig. 3. The network contains three main parts. A feature extractor $G_f$, represented by blue blocks, is used for building feature representations from raw signal data. A condition classifier $G_c$, which is shown as yellow blocks for source domain samples and green blocks for target domain samples, is used for health condition recognition, where MMD is employed for domain adaptation and the classification loss is calculated. The purple blocks represent the domain classifier $G_d$ for domain adversarial adaptation.

1) Feature Extractor: The feature extractor $G_f$ employs three 1-D convolutional blocks to extract features from raw signal data of both source and target domains, i.e., $N_s$ source domain samples $x_i^s$ and $N_t$ target domain samples $x_i^t$. Each convolutional block contains a 1-D convolutional layer with the rectified linear unit (ReLU) [29] activation function ($\mathrm{ReLU}(x) = \max\{0, x\}$) and a 1-D max-pooling layer. Through the hierarchical structure, a high-level feature representation is extracted, and then a flattened feature vector is formed through a flatten layer.

2) Condition Classifier: The condition classifier $G_c$ uses two fully-connected (FC) layers and an output layer to recognize fault categories based on the feature representation. The operation of the $l$th FC layer ($l = \{1, 2\}$) can be expressed as follows:

$$y_l^D = \phi\left(W_l^C \cdot y_{l-1}^D + b_l^C\right) \tag{9}$$

where $D = \{s, t\}$ represents results from the source and target domain samples, respectively, $y_0$ denotes the flattened feature vector obtained by the feature extractor, and $\phi$ is the ReLU activation function. The output layer employs a softmax function to calculate the output probability of each category, which is formulated by

$$c^D = \begin{bmatrix} p(\hat{c}^D = 1 \mid x^D) \\ p(\hat{c}^D = 2 \mid x^D) \\ \vdots \\ p(\hat{c}^D = K \mid x^D) \end{bmatrix} = \frac{1}{\sum_{k=1}^{K} e^{(\zeta_k^T y_2 + \eta_k)}} \begin{bmatrix} e^{(\zeta_1^T y_2 + \eta_1)} \\ e^{(\zeta_2^T y_2 + \eta_2)} \\ \vdots \\ e^{(\zeta_K^T y_2 + \eta_K)} \end{bmatrix} \tag{10}$$
where $\zeta_k$ and $\eta_k$ denote the weight and bias parameters corresponding to the $k$th output, respectively, and $\hat{c}^D$ denotes the predicted category of sample $x^D$ from domain $D$. The category with the maximum conditional probability is the predicted condition type. The cross-entropy loss function is employed to measure the condition classification loss of the source domain samples $X_s$ and the partial target domain samples of the normal category $X_t^{nor}$ as follows:

$$L_c = -\frac{1}{N_s} \sum_{n=1}^{N_s} \sum_{k=1}^{K} \mathbb{1}[y_n^s = k] \log(c_{n,k}^s) - \frac{\alpha}{N_t^{nor}} \sum_{n=1}^{N_t^{nor}} \log(c_{n,1}^t) \tag{11}$$

where $c_{n,k}^s$ denotes the $k$th element of vector $c_n^s$, which is the output corresponding to the $n$th input sample in the source domain, $y_n^s$ is the corresponding fault label, where label 1 is the normal category, $c_n^t$ is the output vector corresponding to the $n$th labeled target domain sample $x_n^{t,nor}$, $c_{n,1}^t$ is the first element of $c_n^t$, and $\alpha$ is the tradeoff parameter for the target domain loss.

However, a distribution discrepancy of the feature representations exists. MMD is introduced to the two FC layers to measure the distribution discrepancy of features of all the input samples $x_i^s$ and $x_i^t$ from the two domains, and the distribution shift is adapted by minimizing MMD. This optimization objective can be expressed as follows:

$$L_{MMD} = \mathrm{MMD}^2(y_1^s, y_1^t) + \mathrm{MMD}^2(y_2^s, y_2^t) \tag{12}$$

$$\mathrm{MMD}^2(y_l^s, y_l^t) = \frac{1}{N_s^2} \sum_{m=1}^{N_s} \sum_{n=1}^{N_s} k(y_{l,m}^s, y_{l,n}^s) + \frac{1}{N_t^2} \sum_{m=1}^{N_t} \sum_{n=1}^{N_t} k(y_{l,m}^t, y_{l,n}^t) - \frac{2}{N_s N_t} \sum_{m=1}^{N_s} \sum_{n=1}^{N_t} k(y_{l,m}^s, y_{l,n}^t) \tag{13}$$

where $y_{l,n}^D$ is the feature of the $l$th FC layer ($l = \{1, 2\}$) corresponding to the $n$th input sample in domain $D$, and $k(\cdot, \cdot)$ denotes a kernel function. In this article, the Gaussian kernel function is utilized, which is formulated as $k(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)$, where $\sigma$ is the kernel bandwidth.

3) Domain Classifier: The domain classifier $G_d$ aims to recognize the domain each sample belongs to. The feature extractor $G_f$ tries to generate domain-invariant feature representations while the domain classifier $G_d$ updates to distinguish the domain labels. Then, the domain adaptation process can be converted to a minimax problem by minimizing the condition classification loss and maximizing the domain recognition loss simultaneously. The domain classifier contains an FC layer and an output layer, which are calculated in a similar way as (9) and (10), where the output $d^D$ is a 2-D vector corresponding to the two domain categories. The domain recognition loss is also defined by cross entropy as

$$L_d = -\frac{1}{N_s} \sum_{n=1}^{N_s} \log(d_{n,1}^s) - \frac{1}{N_t} \sum_{n=1}^{N_t} \log(d_{n,2}^t) \tag{14}$$

where $d_{n,k}^D$ denotes the $k$th element of the output vector $d_n^D$, where label 1 is the source domain and label 2 is the target domain.

To address the minimax problem using a traditional gradient-based optimization algorithm, a gradient reverse layer (GRL) is introduced as in [28]. The GRL performs an identity mapping in forward propagation, whereas it reverses the sign of the gradient during backward propagation before it is passed to the preceding layers, i.e., $-\partial L_d / \partial \theta_f$. Then, the overall optimization objective of the FTNN can be expressed as

$$L = L_c + \beta L_{MMD} + \gamma L_d \tag{15}$$

where $\beta$ and $\gamma$ are the tradeoff parameters for the MMD loss and the domain classification loss. The network is trained by minimizing this objective via the back-propagation algorithm. The network parameters are updated as follows:

$$\theta_f \leftarrow \theta_f - \delta \left( \frac{\partial L_c}{\partial \theta_f} + \beta \frac{\partial L_{MMD}}{\partial \theta_f} - \gamma \frac{\partial L_d}{\partial \theta_f} \right) \tag{16}$$

$$\theta_c \leftarrow \theta_c - \delta \left( \frac{\partial L_c}{\partial \theta_c} + \beta \frac{\partial L_{MMD}}{\partial \theta_c} \right) \tag{17}$$

$$\theta_d \leftarrow \theta_d - \delta \frac{\partial L_d}{\partial \theta_d} \tag{18}$$

where $\theta_f$, $\theta_c$, and $\theta_d$ are the parameters of $G_f$, $G_c$, and $G_d$, respectively, and $\delta$ is the learning rate.

B. Convolutional LSTM Ensemble Network for RUL Prediction

After the fault condition of each sample is predicted by the FTNN, a convolutional LSTM ensemble network, which uses the fault information as prior information, is proposed to get more accurate RULs. In the proposed RUL prediction network, we use multiple subnetworks to model the degradation patterns under different fault modes, respectively. Since degradation patterns rely on temporal information, whereas CNN is not good at temporal modeling, LSTM, which has strong temporal modeling ability, is used to learn degradation patterns from the feature representations extracted by CNN. The network structure is illustrated in Fig. 4, where the feature extractor $G_f$ shares the same structure with that in the FTNN. After feature representations are extracted, multiple LSTMs are used to model the degradation patterns under different fault modes, and multiple RUL values are obtained. Finally, an ensemble process is proposed to get the predicted RUL based on the fault conditions obtained by the FTNN, including fault labels and soft fault conditions.

As degradation trend information is important in RUL prediction tasks and LSTM is capable of addressing sequential information, the time window technique is used in our method. When we predict the RUL at time $t$, the signal data of the preceding consecutive $T - 1$ time cycles are also combined to form a sample $X_t = [x_{t-T+1}, x_{t-T+2}, \ldots, x_t] \in \mathbb{R}^{L \times T}$, where $x_t \in \mathbb{R}^L$ denotes the signal data at time $t$, and $T$ is the length of the time window. Then, the signal sequence of $T$ channels is input to the feature extractor, which shares the same structure as that in the FTNN except that a flatten layer is not introduced after the last pooling layer.
Fig. 4. Structure illustration of the convolutional LSTM ensemble network for RUL prediction.
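The routing and ensemble logic of the structure in Fig. 4 can be illustrated with a short Python sketch: during training only the LSTM of the most probable condition is used, whereas at test time the $K$ predicted RULs are combined with the soft fault condition as weights, as in (21). The function names and the numeric values below are assumptions chosen for illustration, not values from the experiments.

```python
def hard_select(soft_condition, ruls):
    """Training-time routing: take the RUL from the most probable condition."""
    k = max(range(len(soft_condition)), key=lambda i: soft_condition[i])
    return ruls[k]

def soft_ensemble(soft_condition, ruls):
    """Test-time ensemble: r_hat = c_1 * r_1 + c_2 * r_2 + ... + c_K * r_K."""
    if len(soft_condition) != len(ruls):
        raise ValueError("one weight is needed per predicted RUL")
    return sum(c * r for c, r in zip(soft_condition, ruls))

# Hypothetical softmax output of the FTNN for K = 4 conditions,
# and hypothetical RULs predicted by the four LSTMs.
soft_condition = [0.1, 0.6, 0.3, 0.0]
ruls = [0.9, 0.5, 0.3, 0.2]

print(hard_select(soft_condition, ruls))    # RUL of the second condition
print(soft_ensemble(soft_condition, ruls))  # weighted average, about 0.48
```

Because the softmax output sums to one, the ensemble is a convex combination of the individual predictions, so it always lies between the smallest and largest of the $K$ RULs.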
The high-level feature map is then input to an LSTM ensemble module, which contains $K$ LSTM networks with the same structure, where $K$ is the number of all possible condition categories. Each LSTM network has an LSTM layer and a regression layer. The LSTM layer takes the sequential feature map, whose number of channels is the same as the kernel number of the last convolutional block, as input, and the hidden state at the terminal time step is adopted as the input of the regression layer, which is calculated as

$$\hat{r}^k = W_r^k h^k + b_r^k \tag{19}$$

where $h^k$ denotes the hidden state at the terminal time step of the LSTM layer in the $k$th LSTM network, $W_r^k$ and $b_r^k$ are the weight and bias matrix of the corresponding regression layer, respectively, and $\hat{r}^k$ is the RUL predicted by the $k$th LSTM.

Since different fault modes can lead to different degradation patterns, each LSTM in the LSTM ensemble module aims to learn the machinery degradation pattern under a specific fault mode separately. At time $t$, we feed the current signal sample $x_t$ into the FTNN to get the predicted fault condition. In the training process, if the machine at time $t$ is predicted to be under the $k$th fault condition, the feature map generated by the feature extractor is input to the corresponding $k$th LSTM. Then, the output is taken as the predicted RUL to calculate the mean-squared-error (MSE) loss by

$$L_r = \frac{1}{N_{tr}} \sum_{j=1}^{N_{tr}} (\hat{r}_j - r_j)^2 \tag{20}$$

where $\hat{r}_j$ and $r_j$ are the predicted and actual RUL of the $j$th input, respectively, and $N_{tr}$ denotes the number of training samples. The network is trained by minimizing this loss value.

However, machines can have different fault severity at different times. In addition, multiple faults may exist simultaneously at some time, in which case the predicted condition type only represents the one with the greatest possibility. To overcome these problems, an ensemble mechanism is proposed for the testing process. We take the output after the softmax operation in the FTNN (i.e., $c$), which we call the soft fault condition, as the similarity degree between the health condition and a specific fault. The feature map is input to all the $K$ LSTMs to get $K$ predicted RULs independently, and the soft fault condition is used as ensemble weights to get a more comprehensive RUL by

$$\hat{r} = c_1 \hat{r}^1 + c_2 \hat{r}^2 + \cdots + c_K \hat{r}^K \tag{21}$$

C. Method Summary

Algorithm 1 gives the detailed algorithmic process of the proposed method.

IV. CASE STUDY

The bearing is one of the most important machine components in industrial applications. Accurately predicting the RUL of bearings can ensure the reliability of many rotary machines. In this article, the proposed method is validated on a bearing run-to-failure dataset with fault knowledge transfer from a bearing fault dataset.

A. Description of Datasets

1) CWRU Bearing Dataset: The CWRU bearing dataset is a popular bearing fault dataset from experiments by Case Western Reserve University [30]. Vibration signals are collected with a sampling frequency of 12 kHz under four different working conditions. Three types of single point faults are introduced to bearings with a fault diameter of 0.1778 mm; thus, four health conditions are included, i.e., normal (N), inner race fault (IF), ball fault (BF), and outer race fault. We select samples with 1200 points, and data augmentation is performed by 80% overlapping. The data details are listed in Table I.
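The overlapping segmentation used for data augmentation can be sketched as follows. The sketch assumes that "80% overlapping" means consecutive 1200-point samples share 80% of their points, which gives a stride of 240 points; the helper name `segment` and the synthetic record are hypothetical.

```python
def segment(signal, length=1200, overlap=0.8):
    """Cut a 1-D signal into fixed-length samples with the given overlap ratio."""
    step = int(round(length * (1.0 - overlap)))  # 240 points for 1200 and 0.8
    if step < 1:
        raise ValueError("overlap too large for the chosen sample length")
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, step)]

# Synthetic record of 12 000 points (one second at the 12 kHz sampling rate).
record = list(range(12000))
samples = segment(record)

print(len(samples))      # number of overlapping samples obtained
print(samples[1][0])     # start index of the second sample, i.e., the stride
```

Without overlap the same record would yield only ten samples, so the overlap multiplies the number of training samples at the cost of strong correlation between neighbors.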
vary a lot and mainly two degradation patterns can be observed, one of which has a slightly increasing stage (e.g., Bearing 3) and the other degrades abruptly when near the end-of-life (e.g., Bearing 5). This indicates that different faults may occur during the experiments for different bearings. To cover these two main

where $\hat{r}$ and $r$ are the predicted and actual RUL, respectively, and $n$ is the number of samples.

The first convolutional layer of the feature extractor uses wide kernels to suppress high frequency noise just like the WDCNN in [17]. We set the kernel size of the first layer as 64 and stride as
TABLE II
DETAILS OF THE PROPOSED MODEL IN THE CASE STUDY
Fig. 10. RUL prediction results of CLSTM and our proposed method
on (a) Bearing 3 and (b) Bearing 5. The red straight line represents the
actual RUL.
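For readers who want to reproduce the kind of comparison reported for Fig. 10 and Fig. 11, the following Python sketch computes the root-mean-square error between predicted and actual RUL and the relative RMSE reduction between two methods. It assumes the standard RMSE definition; the function names and all numbers are illustrative and are not taken from the experiments.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual RUL sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def rmse_reduction(baseline_rmse, proposed_rmse):
    """Relative RMSE reduction of a proposed method over a baseline, in percent."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse

# Made-up normalized RUL curves for illustration only.
actual = [1.0, 0.8, 0.6, 0.4, 0.2]
baseline_pred = [0.8, 0.9, 0.4, 0.6, 0.4]
proposed_pred = [0.9, 0.8, 0.5, 0.5, 0.2]

base = rmse(baseline_pred, actual)
prop = rmse(proposed_pred, actual)
print(base, prop, rmse_reduction(base, prop))
```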
Fig. 11. RMSE values of six prognostic methods.

(MSCNN) using time-frequency representation [38]. Since the training sets are chosen differently in different literature, we fine-tune the main hyperparameters of these three methods to get the best performance, following the same setup as our grid search experiments. Besides, the results are all model outputs without RUL curve smoothing. All the experiments are repeated for 10 trials to reduce randomness. The RMSE values of our proposed method and the compared methods are summarized in Fig. 11.

From the experiment results, we can see that our proposed method outperforms all the other methods and achieves great performance improvements on both testing bearings (48.61% and 31.94% RMSE reduction compared with the CLSTM without fault knowledge). It can be seen from Fig. 10 that the prediction results do not differ much from those of the CLSTM at the beginning normal stages. However, when faults begin to occur, the prior fault information guides the proposed model to utilize different degradation knowledge for different fault conditions, thus leading to much more accurate prognostic results. Especially at the middle stage of Bearing 3, the complex fault conditions help the model to acquire much more accurate prediction results. The analysis shows that the model can better model degradation trends under a specific fault mode, and that the RUL prediction via the ensemble process, which considers complex fault conditions, can then have higher accuracy. It can be noticed from Fig. 10(b) that there are some RUL prediction results near the end-of-life deviating far from the actual RULs for both CLSTM and our proposed model on Bearing 5. Since the prediction results with or without fault knowledge both have large relative errors, we infer this is because the extracted feature representations at that time period are similar to the ones at the early degradation period in the training set, leading to much larger predicted RULs. We inspect the predicted fault types of these about 30 samples, which

V. CONCLUSION

In this article, we proposed a novel fault knowledge transfer assisted convolutional LSTM ensemble method for RUL prediction. Divergence minimization and domain adversarial adaptation techniques were combined to transfer fault knowledge from a fault dataset to the run-to-failure samples. With the diagnosed fault condition information, the RUL prediction network can learn various degradation patterns under different faults using the structure of multiple LSTMs. Then, an ensemble process based on predicted soft fault conditions was proposed to get RUL prediction results. Experiments on bearing datasets validate the effectiveness and superiority of our proposed method.

For further research and applications, the proposed method presents a new prognostic paradigm that diagnoses fault conditions to improve the RUL prediction results, which can also provide some guidance for research on the degradation patterns under different fault types. This may contribute to understanding the machinery degradation mechanisms and constructing some hybrid models [39]. Because of the more accurate RUL prediction performance, our model can be integrated into some complex industrial prognosis systems. For example, the RUL prediction algorithm can be integrated into key-performance-indicator oriented prognosis systems [40] to improve production system reliability, or into industrial cyberphysical systems to contribute to the maintenance decision-making process [41].

Although great improvements have been achieved by our method in the case study, there are some limitations that cannot be ignored. First, it is assumed that a fault dataset containing almost all potential fault types is available to provide sufficient fault knowledge. Therefore, our proposed method may not be applicable to some uncommon machine components for which a fault dataset may be difficult to get or not all the main fault types can be covered. Second, it is difficult to measure the degree of distribution discrepancy between the fault dataset and the target run-to-failure dataset, i.e., it is unclear whether the fault dataset is suitable for knowledge transfer, so the model performance may have relatively larger uncertainties. As a next step, we will try to transfer fault knowledge from multiple source domain datasets to enhance the fault condition prediction ability and develop a suitable metric to measure the transferable degree between a fault dataset and a run-to-failure dataset.
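
The soft-fault ensemble and the RMSE comparison discussed in this article can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the final RUL is a probability-weighted sum of the outputs of the fault-specific LSTM branches, and the function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def soft_ensemble_rul(fault_probs, branch_ruls):
    """Combine per-fault RUL estimates using soft fault probabilities.

    Assumption (not confirmed by the article): the ensemble output is the
    weighted sum of branch outputs, with weights normalized to sum to 1.
    """
    if len(fault_probs) != len(branch_ruls):
        raise ValueError("one probability is needed per branch")
    total = sum(fault_probs)
    if total <= 0:
        raise ValueError("fault probabilities must sum to a positive value")
    return sum(p / total * r for p, r in zip(fault_probs, branch_ruls))


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual RUL sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction_percent(baseline_rmse, proposed_rmse):
    """Relative RMSE reduction of a proposed model over a baseline, in percent."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse


# Hypothetical soft fault probabilities for three fault conditions and the
# normalized RUL predicted by each corresponding branch (illustrative only).
probs = [0.7, 0.2, 0.1]
branch_outputs = [0.50, 0.40, 0.30]
print("ensemble RUL:", soft_ensemble_rul(probs, branch_outputs))

# Hypothetical RMSE values for a baseline and a proposed model.
print("RMSE reduction (%):", rmse_reduction_percent(0.20, 0.10))
```

Under this assumption, a sample whose fault probabilities concentrate on one condition is predicted almost entirely by that condition's branch, while an ambiguous sample blends several branches.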
REFERENCES

[1] S. Alaswad and Y. Xiang, “A review on condition-based maintenance optimization models for stochastically deteriorating system,” Rel. Eng. Syst. Saf., vol. 157, pp. 54–63, 2017.
[2] A. K. S. Jardine, D. Lin, and D. Banjevic, “A review on machinery diagnostics and prognostics implementing condition-based maintenance,” Mech. Syst. Signal Process., vol. 20, no. 7, pp. 1483–1510, 2006.
[3] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, and D. Siegel, “Prognostics and health management design for rotary machinery systems-reviews, methodology and applications,” Mech. Syst. Signal Process., vol. 42, no. 1/2, pp. 314–334, 2014.
[4] M. S. Kan, A. C. C. Tan, and J. Mathew, “A review on prognostic techniques for non-stationary and non-linear rotating systems,” Mech. Syst. Signal Process., vol. 62/63, pp. 1–20, 2015.
[5] P. Paris and F. Erdogan, “A critical analysis of crack propagation laws,” J. Basic Eng., vol. 85, no. 4, pp. 528–533, Dec. 1963.
[6] P. Kundu, A. K. Darpe, and M. S. Kulkarni, “Weibull accelerated failure time regression model for remaining useful life prediction of bearing working under multiple operating conditions,” Mech. Syst. Signal Process., vol. 134, 2019, Art. no. 106302.
[7] X. Li, Q. Ding, and J.-Q. Sun, “Remaining useful life estimation in prognostics using deep convolution neural networks,” Rel. Eng. Syst. Saf., vol. 172, pp. 1–11, 2018.
[8] R. Zhao, R. Yan, J. Wang, and K. Mao, “Learning to monitor machine health with convolutional bi-directional lstm networks,” Sensors, vol. 17, no. 2, pp. 273–290, 2017.
[9] H. Miao, B. Li, C. Sun, and J. Liu, “Joint learning of degradation assessment and rul prediction for aeroengines via dual-task deep lstm networks,” IEEE Trans. Ind. Informat., vol. 15, no. 9, pp. 5023–5032, Sep. 2019.
[10] Y. Qin, D. Chen, S. Xiang, and C. Zhu, “Gated dual attention unit neural networks for remaining useful life prediction of rolling bearings,” IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6438–6447, Sep. 2021.
[11] H.-E. Kim, A. C. Tan, J. Mathew, and B.-K. Choi, “Bearing fault prognosis based on health state probability estimation,” Expert Syst. Appl., vol. 39, no. 5, pp. 5200–5213, 2012.
[12] R. Liu, B. Yang, and A. G. Hauptmann, “Simultaneous bearing fault recognition and remaining useful life prediction using joint-loss convolutional neural network,” IEEE Trans. Ind. Informat., vol. 16, no. 1, pp. 87–96, Jan. 2020.
[13] M. Cerrada et al., “A review on data-driven fault severity assessment in rolling bearings,” Mech. Syst. Signal Process., vol. 99, pp. 169–196, 2018.
[14] Z. Gao and X. Liu, “An overview on fault diagnosis, prognosis and resilient control for wind turbine systems,” Processes, vol. 9, no. 2, 2021, Art. no. 300.
[15] Y. Fu, Z. Gao, Y. Liu, A. Zhang, and X. Yin, “Actuator and sensor fault classification for wind turbine systems based on fast Fourier transform and uncorrelated multi-linear principal component analysis techniques,” Processes, vol. 8, no. 9, 2020, Art. no. 1066.
[16] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, “Deep learning and its applications to machine health monitoring,” Mech. Syst. Signal Process., vol. 115, pp. 213–237, 2019.
[17] W. Zhang, G. Peng, C. Li, Y. Chen, and Z. Zhang, “A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals,” Sensors, vol. 17, no. 2, pp. 425–445, 2017.
[18] R. Liu, F. Wang, B. Yang, and S. J. Qin, “Multiscale kernel based residual convolutional neural network for motor fault diagnosis under nonstationary conditions,” IEEE Trans. Ind. Informat., vol. 16, no. 6, pp. 3797–3806, Jun. 2020.
[19] L. Wen, X. Li, L. Gao, and Y. Zhang, “A new convolutional neural network-based data-driven fault diagnosis method,” IEEE Trans. Ind. Electron., vol. 65, no. 7, pp. 5990–5998, Jul. 2018.
[20] F. Wang, R. Liu, Q. Hu, and X. Chen, “Cascade convolutional neural network with progressive optimization for motor fault diagnosis under nonstationary conditions,” IEEE Trans. Ind. Informat., vol. 17, no. 4, pp. 2511–2521, Apr. 2021.
[21] R. Yan, F. Shen, C. Sun, and X. Chen, “Knowledge transfer for rotary machine fault diagnosis,” IEEE Sensors J., vol. 20, no. 15, pp. 8374–8393, Aug. 2020.
[22] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, and T. Zhang, “Deep model based domain adaptation for fault diagnosis,” IEEE Trans. Ind. Electron., vol. 64, no. 3, pp. 2296–2305, Mar. 2017.
[23] L. Guo, Y. Lei, S. Xing, T. Yan, and N. Li, “Deep convolutional transfer learning network: A new method for intelligent fault diagnosis of machines with unlabeled data,” IEEE Trans. Ind. Electron., vol. 66, no. 9, pp. 7316–7325, Sep. 2019.
[24] Z. Chen, K. Gryllias, and W. Li, “Intelligent fault diagnosis for rotary machinery using transferable convolutional neural network,” IEEE Trans. Ind. Informat., vol. 16, no. 1, pp. 339–349, Jan. 2020.
[25] X. Li, W. Zhang, Q. Ding, and X. Li, “Diagnosing rotating machines with weakly supervised data using deep transfer learning,” IEEE Trans. Ind. Informat., vol. 16, no. 3, pp. 1688–1697, Mar. 2020.
[26] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, “Deep domain confusion: Maximizing for domain invariance,” 2014, arXiv:1412.3474.
[27] B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep domain adaptation,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 443–450.
[28] Y. Ganin et al., “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 2096–2030, 2016.
[29] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[30] W. A. Smith and R. B. Randall, “Rolling element bearing diagnostics using the case western reserve university data: A benchmark study,” Mech. Syst. Signal Process., vol. 64, pp. 100–131, 2015.
[31] P. Nectoux et al., “Pronostia: An experimental platform for bearings accelerated degradation tests,” in Proc. IEEE Int. Conf. Prognostics Health Manage., 2012, pp. 1–8.
[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[33] M. Baktashmotlagh, M. T. Harandi, B. C. Lovell, and M. Salzmann, “Unsupervised domain adaptation by domain invariant projection,” in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 769–776.
[34] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[35] L. van der Maaten and G. Hinton, “Visualizing data using t-sne,” J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, 2008.
[36] L. Ren, Y. Sun, J. Cui, and L. Zhang, “Bearing remaining useful life prediction based on deep autoencoder and deep neural networks,” J. Manuf. Syst., vol. 48, pp. 71–77, 2018.
[37] M. Xia, T. Li, T. Shu, J. Wan, C. W. de Silva, and Z. Wang, “A two-stage approach for the remaining useful life prediction of bearings using deep neural networks,” IEEE Trans. Ind. Informat., vol. 15, no. 6, pp. 3703–3711, Jun. 2019.
[38] J. Zhu, N. Chen, and W. Peng, “Estimation of bearing remaining useful life based on multiscale convolutional neural network,” IEEE Trans. Ind. Electron., vol. 66, no. 4, pp. 3208–3216, Apr. 2019.
[39] Z. Gao, C. Cecati, and S. X. Ding, “A survey of fault diagnosis and fault-tolerant techniques-part ii: Fault diagnosis with knowledge-based and hybrid/active approaches,” IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3768–3774, Jun. 2015.
[40] Y. Jiang and S. Yin, “Recent advances in key-performance-indicator oriented prognosis and diagnosis with a MATLAB toolbox: Db-kit,” IEEE Trans. Ind. Informat., vol. 15, no. 5, pp. 2849–2858, May 2019.
[41] S. Yin, J. J. Rodriguez-Andina, and Y. Jiang, “Real-time monitoring and control of industrial cyberphysical systems: With integrated plant-wide monitoring and control framework,” IEEE Ind. Electron. Mag., vol. 13, no. 4, pp. 38–47, Dec. 2019.

Pengcheng Xia received the B.S. degree in 2018 from Shanghai Jiao Tong University, Shanghai, China, where he is currently working toward the Ph.D. degree, both in mechanical engineering.
His research interests include machinery health monitoring, prognostics and health management, machine learning, and signal processing.
Yixiang Huang (Member, IEEE) received the B.S. degree in power and energy engineering, and the M.S. and Ph.D. degrees in mechatronics engineering from Shanghai Jiao Tong University, Shanghai, China, in 2002, 2006, and 2010, respectively.
He is currently an Associate Professor of mechanical engineering with Shanghai Jiao Tong University, where he studies the topics of intelligent maintenance, prognostics, and machine learning. He was with the NSF Industry/University Cooperative Research Center for Intelligent Maintenance Systems, University of Cincinnati, USA. He is a regular reviewer for a number of international journals. His current research interests include big data analysis, sparse coding, and dimensionality reduction for industrial applications and computational intelligence techniques and their applications in various industrial domains.

Chengliang Liu received the B.S. degree in mechanical manufacturing from the Shandong University of Technology, Shandong, China, in 1985, and the M.S. and Ph.D. degrees in mechanical engineering from Southeast University, Nanjing, China, in 1991 and 1998, respectively.
He has been invited as a Senior Scholar with the University of Michigan, Ann Arbor, MI, USA and the University of Wisconsin, Madison, WI, USA, since 2001. He is currently a Professor with the Department of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China. His current research interests include mechatronic systems, MEMS design, intelligent robot control, remote monitoring techniques, and condition based monitoring.