1758 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 18, NO. 3, MARCH 2022

Fault Knowledge Transfer Assisted Ensemble Method for Remaining Useful Life Prediction

Pengcheng Xia, Yixiang Huang, Member, IEEE, Peng Li, Chengliang Liu, and Lun Shi
Abstract—Machinery remaining useful life (RUL) prediction is an important task in condition-based maintenance. Data-driven methods have been widely studied and applied; however, almost all existing studies learn degradation trends regardless of the fault conditions, which can lead to different degradation patterns. This article proposes a novel fault information assisted RUL prediction method based on a convolutional long short-term memory (LSTM) ensemble network, where fault conditions are obtained via fault knowledge transfer. Divergence minimization and domain adversarial adaptation are combined to transfer fault knowledge from a fault dataset to the run-to-failure data in a weakly supervised manner. With the predicted fault information, the RUL prediction network can learn the degradation patterns under different faults separately using a structure of multiple LSTMs. Then an ensemble strategy based on soft fault conditions is designed to get the final RUL prediction results. Experiments on bearing datasets verify the effectiveness of the proposed method.

Index Terms—Convolutional neural network (CNN), fault diagnosis, long short-term memory (LSTM) network, remaining useful life prediction, transfer learning.

Manuscript received April 21, 2021; accepted May 11, 2021. Date of publication May 18, 2021; date of current version December 6, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 51975356, in part by the Shanghai AI Creativity Development Project under Grant 2019-RGZN-01026, and in part by the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0102. Paper no. TII-21-1760. (Corresponding author: Yixiang Huang.)
Pengcheng Xia, Yixiang Huang, Peng Li, and Chengliang Liu are with the State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: xpc19960921@sjtu.edu.cn; david.huangyx@gmail.com; peng.li@sjtu.edu.cn; chlliu@sjtu.edu.cn).
Lun Shi is with the Shanghai SmartState Technology Company, Ltd, Shanghai 201306, China, and also with the State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: shilun@sjtu.edu.cn).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2021.3081595.
Digital Object Identifier 10.1109/TII.2021.3081595
1551-3203 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

With the rapid development of sensor, control, and monitoring technologies in industry, condition-based maintenance (CBM) has been widely studied and implemented to ensure the reliability of complex industrial systems [1]. CBM provides maintenance decisions based on condition monitoring information, and diagnostics and prognostics are the two main tasks in a CBM system [2]. Diagnostics aims to detect and identify the fault modes of a machine system, whereas prognostics assesses the health condition and predicts the remaining useful life (RUL) [3]. An accurate RUL estimation can provide useful information for predictive maintenance decisions, thus reducing unplanned breakdown costs. RUL prediction has attracted considerable attention and research over the past decades. However, making accurate RUL predictions still faces many challenges due to the complex machinery degradation mechanisms.

According to [4], RUL prediction methods can be mainly categorized into model-based, data-driven, and combination models. Model-based methods develop physical or mathematical models to describe the degradation process, such as the Paris–Erdogan model [5]. Data-driven approaches build models based on historical condition monitoring data, which is easy to implement. Combination models integrate the model-based and data-driven models to develop a more comprehensive model. Since machinery systems are becoming more and more complex, it is difficult to develop a reliable model-based method suitable for different conditions. Integrating an applicable combination model is also challenging. With numerous big-data-based algorithms arising in recent years, data-driven methods have been widely studied and applied. Useful features or health indicators are first extracted to represent the degradation trend [6], and then the RUL is predicted based on the features. Recently, with the development of deep learning and its wide applications in various fields, some end-to-end deep learning methods have been established for machinery RUL prediction. Li et al. [7] employed a deep convolutional neural network (CNN) to predict the RUL of turbofan engines. Zhao et al. [8] combined a CNN and a bidirectional long short-term memory (LSTM) network to predict milling tool wear. Miao et al. [9] proposed a dual-task deep LSTM network to simultaneously assess the degradation state and predict the RUL of aeroengines. Qin et al. [10] proposed a gated recurrent unit network with dual attention gates for RUL prediction of bearings.

These data-driven prognostic methods learn machinery degradation trends from the available historical data regardless of its fault modes and degradation patterns, and correspondingly suffer from prediction uncertainty [4]. In practical applications, machinery components may show various degradation patterns across different individuals, and even at different degradation stages, due to their diverse fault modes. Therefore, fault conditions can provide prior knowledge that guides data-driven methods to better model various degradation patterns and improve RUL prediction accuracy [11].

Although the fault condition affects the degradation process, very few RUL prediction studies in the literature take it
into consideration. Liu et al. [12] proposed a joint-loss CNN to perform bearing fault diagnosis and RUL prediction simultaneously using a partially shared structure. Experiments show that introducing the diagnosis task improves the RUL prediction performance. However, a limitation is that the fault mode of each training sample must be observed and recorded during the experiments, which is unrealistic in most practical applications. In addition, multiple faults may occur successively during a degradation process, and compound faults may be observed after the tests [13], so it is difficult to determine the fault type at each prediction time.

In real applications, it is difficult to obtain the fault conditions since fault modes are usually unobserved. A variety of fault diagnosis methods have been developed based on condition monitoring data to recognize the fault modes. Diagnosis models can also be categorized as model-based, signal-based, and knowledge-based (data-driven) according to [14]. Knowledge-based or data-driven methods have been the most widely studied and applied recently. For instance, Fu et al. [15] combined fast Fourier transform and uncorrelated multilinear principal component analysis to diagnose faults of wind turbine systems. These data-driven methods require human experience for feature engineering. As a result, deep learning algorithms, especially CNNs, have gained more and more attention and shown great success in fault diagnosis tasks in recent years [16]. 1-D CNNs are used to process 1-D vibration data directly, such as the deep CNN with wide first-layer kernels (WDCNN) [17] and multiscale learning based CNNs [18]. 2-D CNNs are usually utilized after time series permutation or signal-to-image conversion, like the CNN based on LeNet-5 proposed in [19]. Some more complicated CNN-based methods, like Cascade CNN [20], which introduces a cascade structure and dilated convolution operations, have also been proposed to address the fault diagnosis problem.

However, in real cases, it is difficult and unrealistic to collect sufficient labeled data to train a reliable diagnosis model. Models established on laboratory data usually fail due to the different data distributions caused by diverse machines or working conditions. Fortunately, transfer learning provides a promising tool to address this problem [21]. Lu et al. [22] introduced a domain adaptation technique to address the fault diagnosis problem under different working conditions. Guo et al. [23] studied knowledge transfer between different machines or datasets with the proposed CNN-based domain adaptation methods. Chen et al. [24] proposed a transfer learning scheme with a CNN pretrained on the source dataset. Li et al. [25] proposed a domain adversarial network based method to accomplish diagnosis knowledge transfer from multiple different source machines.

To integrate fault information to improve RUL prediction performance and develop a universally applicable method, we propose a fault information assisted convolutional LSTM ensemble method for RUL prediction, where fault conditions are diagnosed through transfer learning. We develop a fault knowledge transfer neural network (FTNN) based on a domain-shared 1-D CNN and domain adaptation techniques combining divergence minimization and domain adversarial adaptation to transfer fault knowledge from an existing fault dataset to the run-to-failure samples. Based on the diagnosed fault information, a deep network combining a 1-D CNN and multiple LSTMs is proposed for RUL prediction. Each LSTM learns the degradation pattern under a single fault mode, and an ensemble strategy based on the soft fault conditions acquired by the FTNN is then designed to get the final RUL prediction results. The effectiveness of the proposed method is verified by a case study of bearings. Results demonstrate that the introduction of fault information greatly improves the prognostic accuracy. The main contributions of this article are summarized as follows.

1) Fault information is introduced to assist the RUL prediction task through fault knowledge transfer based on weakly supervised domain adaptation. The diagnosed fault condition helps the model to learn various degradation patterns under different fault modes. Through knowledge transfer, this method can be universally applied without prior fault information.
2) To capture degradation patterns caused by different faults, a convolutional LSTM ensemble network is proposed for RUL prediction in an end-to-end way. A 1-D CNN is used to extract features and a set of LSTMs models each kind of degradation pattern, respectively. Considering fault severity and multiple faults, an ensemble scheme based on soft fault conditions is designed to get comprehensive RUL prediction results.

The rest of this article is organized as follows. Section II introduces some theoretical preliminaries of our method. Section III describes the proposed method in detail. A case study and results are presented in Section IV. Finally, Section V concludes this article.

II. PRELIMINARIES

A. Fault Knowledge Transfer

For traditional intelligent algorithms used for fault diagnosis, a general assumption is that training samples and testing samples have the same probability distribution. But for samples from different operation conditions or machines, this assumption usually fails. Transfer learning aims to address this distribution mismatch problem. Let $X$ be the sample from a dataset and $P(X)$ be its marginal probability distribution. $X$ belongs to a feature space $\mathcal{X}$, i.e., $X \in \mathcal{X}$. Then, a domain is defined as $\mathcal{D} = \{\mathcal{X}, P(X)\}$. Samples from a dataset with fault labels belong to the source domain $\mathcal{D}_s$, and the target domain $\mathcal{D}_t$ contains samples from another dataset with insufficient labels. In general, $P_s(X_s) \neq P_t(X_t)$. One of the most commonly used methods is domain adaptation, which aims to learn domain-invariant features. The feature representations should follow almost the same distribution regardless of whether they are generated from the source domain or the target domain. One popular domain adaptation approach is to minimize a divergence that measures the distribution discrepancy between the source and target domains, such as maximum mean discrepancy (MMD) [26] and correlation alignment [27]. Another family of domain adaptation methods introduces a domain adversarial neural network [28], which aims to learn a feature representation that contains no discriminative information for the domain classifier to recognize which domain it is from.
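As a rough illustration of the divergence-minimization idea, the sketch below computes a biased empirical estimate of the squared MMD between two small sets of feature vectors using a Gaussian kernel. It is a minimal pure-Python sketch and not the authors' implementation: the function names `gaussian_kernel` and `mmd_squared`, the toy feature vectors, and the bandwidth `sigma=1.0` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source: List[Sequence[float]],
                target: List[Sequence[float]],
                sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two sets of feature vectors."""
    ns, nt = len(source), len(target)
    # Mean kernel value within the source set.
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in source for b in source) / ns ** 2
    # Mean kernel value within the target set.
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in target for b in target) / nt ** 2
    # Mean kernel value across the two sets.
    k_st = sum(gaussian_kernel(a, b, sigma) for a in source for b in target) / (ns * nt)
    return k_ss + k_tt - 2.0 * k_st


# Hypothetical toy features: one set near the origin, one shifted away from it.
source_feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
shifted_feats = [[2.0, 2.1], [2.2, 2.0], [2.1, 2.1]]

same = mmd_squared(source_feats, source_feats)
shifted = mmd_squared(source_feats, shifted_feats)
```

In this toy setting the estimate is zero when both sets coincide and grows when one set is shifted, which is the quantity a divergence-based adaptation loss would push down during training.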
Fig. 1. Illustration of 1-D convolutional layer and 1-D pooling layer.
Fig. 2. Illustration of LSTM cell.

In our proposed method, we aim to get the fault conditions of run-to-failure samples. However, if we predict the fault conditions using a well-trained diagnosis model, the distribution mismatch problem must exist, since the two datasets are obtained from different machines and under different working conditions. Therefore, transfer learning is employed to address this problem. In this article, the target domain contains samples from run-to-failure tests used for the RUL prediction study. Then, a dataset with fault labels from the same type of machinery component is used to form the source domain, i.e., $X_s = \{(x_i^{s}, y_i^{s})\}_{i=1}^{N_s}$, where $N_s$ is the number of source domain samples. We should use a source dataset containing almost all potential fault types to provide sufficient fault knowledge. That is, $Y_t \subseteq Y_s$, where $Y = \{1, 2, \ldots, K\}$ denotes the label space of a domain, and $K$ is the number of condition categories. The machine components at the beginning of run-to-failure tests are usually brand new, so samples at the very early stage can be regarded as normal without faults. Consequently, the target domain has access to a portion of normal condition samples. We assume that $N_t^{nor}$ samples are labeled as the normal category, forming a subdataset $X_t^{nor} = \{(x_i^{t,nor}, y = \text{“normal”})\}_{i=1}^{N_t^{nor}}$. The remaining $N_t^{unl}$ samples are all unlabeled, forming a subdataset $X_t^{unl} = \{x_i^{t,unl}\}_{i=1}^{N_t^{unl}}$. Therefore, the target domain dataset is $X_t = X_t^{nor} \cup X_t^{unl}$, which has $N_t = N_t^{nor} + N_t^{unl}$ samples. Our task is to transfer the fault knowledge from the source domain $\mathcal{D}_s$ to the target domain $\mathcal{D}_t$ in a weakly supervised manner.

B. Convolutional Neural Network

CNN is one of the most widely used deep learning models. 1-D CNN has been widely applied in fault diagnosis and prognosis tasks to process sequential signal data. Due to the strong feature extraction ability of CNN and the great success 1-D CNN has gained in diagnosis and prognosis tasks, 1-D CNN is employed as a feature extractor to extract feature representations from raw signal data in the proposed method. Fig. 1 shows a simplified 1-D CNN structure with a convolutional layer and a pooling layer. The input $Z$ is a signal sequence of $N$ channels, i.e., $Z = [z_1, z_2, \ldots, z_L]$, where $L$ is the length of the sequence and $z_i \in \mathbb{R}^{N}$. The convolutional layer utilizes kernels sliding along the time direction to perform convolution operations as follows:

$$ c_i^{(j)} = \phi\left(w_c^{(j)} \cdot z_{i:i+l-1} + b_c^{(j)}\right) \tag{1} $$

where $c_i^{(j)}$ is the output at the $i$th time step obtained by the $j$th kernel, $w_c^{(j)}$ and $b_c^{(j)}$ are the weight matrix and bias matrix of the $j$th kernel, respectively, $\phi(\cdot)$ represents the nonlinear activation function, and $z_{i:i+l-1}$ denotes the concatenation of $l$ consecutive input data, i.e.,

$$ z_{i:i+l-1} = z_i \oplus z_{i+1} \oplus \cdots \oplus z_{i+l-1} \tag{2} $$

where $\oplus$ denotes the concatenation operator.

Then a down-sampling process is usually performed through a max-pooling layer, which extracts the local maximum values along one direction to reduce the trainable parameters, formulated by

$$ p_i^{(j)} = \max_{s(i-1)+1 \le k \le si} \left\{ c_k^{(j)} \right\} \tag{3} $$

where $s$ is the length of each pooling region and $c_k^{(j)}$ is the convolution output at the $k$th time step obtained by the $j$th kernel. Finally, the pooling results obtained by multiple kernels are stacked column by column to form a feature map.

C. LSTM Network

The LSTM network is a popular variant of the RNN. LSTM introduces gate mechanisms to enhance its capability of capturing long-term dependencies. Three gates, i.e., input gate, forget gate, and output gate, are used to control the information flow, and a cell state $C_t$ is updated at each step. The basic theory of an LSTM cell is illustrated in Fig. 2 and formulated as follows:

$$ i_t = \sigma(W_i \cdot [x_t, h_{t-1}] + b_i) \tag{4} $$
$$ f_t = \sigma(W_f \cdot [x_t, h_{t-1}] + b_f) \tag{5} $$
$$ o_t = \sigma(W_o \cdot [x_t, h_{t-1}] + b_o) \tag{6} $$
$$ C_t = i_t \odot \tanh(W_c \cdot [x_t, h_{t-1}] + b_c) + f_t \odot C_{t-1} \tag{7} $$
$$ h_t = o_t \odot \tanh(C_t) \tag{8} $$

where $x_t$ and $h_t$ denote the input and hidden state at time $t$, $W_*$ and $b_*$ are the weight matrix and bias matrix, respectively, $\odot$ represents the elementwise product operator, $\sigma(\cdot)$ denotes the sigmoid activation function, i.e., $\sigma(x) = 1/(1 + e^{-x})$, and $\tanh(\cdot)$
denotes the hyperbolic tangent activation function, i.e., $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$.

III. PROPOSED METHOD

In this article, a fault knowledge transfer assisted RUL prediction method is proposed. The proposed method consists of two important parts: one is the FTNN, and the other is the convolutional LSTM ensemble network for RUL prediction.

A. Fault Knowledge Transfer Neural Network

The proposed method aims to utilize the abundant fault knowledge in existing fault datasets to assist the RUL prediction task by providing prior fault information. Since the fault datasets and the RUL prediction datasets usually have different data distributions, it is improper to directly use the fault knowledge across datasets. Transfer learning is an effective method to address this problem. Therefore, we use transfer learning, or more specifically, a domain adaptation technique, to develop a fault knowledge transfer network. Since knowledge transfer across the datasets in this article can be challenging, we combine divergence minimization and domain adversarial adaptation to ensure the knowledge transfer ability. Domain adaptation must be based on a feature extractor used to learn domain-invariant features. In the proposed method, the input of the network is raw signals with noise. According to many previous studies in the diagnosis and prognosis fields, CNN is well suited to extracting features from raw signals. Therefore, CNN and transfer learning combined with an adversarial mechanism are integrated to construct the FTNN.

The structure of the proposed FTNN is illustrated in Fig. 3. The network contains three main parts. A feature extractor $G_f$, represented by blue blocks, is used for building feature representations from raw signal data. A condition classifier $G_c$, which is shown as yellow blocks for source domain samples and green blocks for target domain samples, is used for health condition recognition, where MMD is employed for domain adaptation and the classification loss is calculated. The purple blocks represent the domain classifier $G_d$ for domain adversarial adaptation.

Fig. 3. Structure illustration of FTNN.

1) Feature Extractor: The feature extractor $G_f$ employs three 1-D convolutional blocks to extract features from raw signal data of both source and target domains, i.e., $N_s$ source domain samples $x_i^{s}$ and $N_t$ target domain samples $x_i^{t}$. Each convolutional block contains a 1-D convolutional layer with the rectified linear unit (ReLU) [29] activation function ($\mathrm{ReLU}(x) = \max\{0, x\}$) and a 1-D max-pooling layer. Through the hierarchical structure, a high-level feature representation is extracted, and then a flattened feature vector is formed through a flatten layer.

2) Condition Classifier: The condition classifier $G_c$ uses two fully-connected (FC) layers and an output layer to recognize fault categories based on the feature representation. The operation of the $l$th FC layer ($l = \{1, 2\}$) can be expressed as follows:

$$ y_l^{D} = \phi\left(W_l^{C} \cdot y_{l-1}^{D} + b_l^{C}\right) \tag{9} $$

where $D = \{s, t\}$ represents results from source and target domain samples, respectively, $y_0$ denotes the flattened feature vector obtained by the feature extractor, and $\phi$ is the ReLU activation function. The output layer employs a softmax function to calculate the output probability of each category, which is formulated by

$$ c^{D} = \begin{bmatrix} p(\hat{c}^{D} = 1 \mid x^{D}) \\ p(\hat{c}^{D} = 2 \mid x^{D}) \\ \cdots \\ p(\hat{c}^{D} = K \mid x^{D}) \end{bmatrix} = \frac{1}{\sum_{k=1}^{K} e^{(\zeta_k^{T} y_2^{D} + \eta_k)}} \begin{bmatrix} e^{(\zeta_1^{T} y_2^{D} + \eta_1)} \\ e^{(\zeta_2^{T} y_2^{D} + \eta_2)} \\ \cdots \\ e^{(\zeta_K^{T} y_2^{D} + \eta_K)} \end{bmatrix} \tag{10} $$
where $\zeta_k$ and $\eta_k$ denote the weight and bias parameters corresponding to the $k$th output, respectively, and $\hat{c}^{D}$ denotes the predicted category of sample $x^{D}$ from domain $D$. The category with the maximum conditional probability is the predicted condition type. The cross-entropy loss function is employed to measure the condition classification loss of the source domain samples $X_s$ and the partial target domain samples of the normal category $X_t^{nor}$ as follows:

$$ L_c = -\frac{1}{N_s} \sum_{n=1}^{N_s} \sum_{k=1}^{K} \mathbb{1}[y_n^{s} = k] \log(c_{n,k}^{s}) - \frac{\alpha}{N_t^{nor}} \sum_{n=1}^{N_t^{nor}} \log(c_{n,1}^{t}) \tag{11} $$

where $c_{n,k}^{s}$ denotes the $k$th element of vector $c_n^{s}$, which is the output corresponding to the $n$th input sample in the source domain, $y_n^{s}$ is the corresponding fault label, where label 1 is the normal category, $c_n^{t}$ is the output vector corresponding to the $n$th labeled target domain sample $x_n^{t,nor}$, $c_{n,1}^{t}$ is the first element of $c_n^{t}$, and $\alpha$ is the tradeoff parameter for the target domain loss.

However, a distribution discrepancy between the feature representations exists. MMD is introduced to the two FC layers to measure the distribution discrepancy of the features of all the input samples $x_i^{s}$ and $x_i^{t}$ from the two domains, and the distribution shift is adapted by minimizing MMD. This optimization objective can be expressed as follows:

$$ L_{MMD} = \mathrm{MMD}^2(y_1^{s}, y_1^{t}) + \mathrm{MMD}^2(y_2^{s}, y_2^{t}) \tag{12} $$

$$ \mathrm{MMD}^2(y_l^{s}, y_l^{t}) = \frac{1}{N_s^2} \sum_{m=1}^{N_s} \sum_{n=1}^{N_s} k(y_{l,m}^{s}, y_{l,n}^{s}) + \frac{1}{N_t^2} \sum_{m=1}^{N_t} \sum_{n=1}^{N_t} k(y_{l,m}^{t}, y_{l,n}^{t}) - \frac{2}{N_s N_t} \sum_{m=1}^{N_s} \sum_{n=1}^{N_t} k(y_{l,m}^{s}, y_{l,n}^{t}) \tag{13} $$

where $y_{l,n}^{D}$ is the feature of the $l$th FC layer ($l = \{1, 2\}$) corresponding to the $n$th input sample in domain $D$, and $k(\cdot, \cdot)$ denotes a kernel function. In this article, the Gaussian kernel function is utilized, which is formulated as $k(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)$, where $\sigma$ is the kernel bandwidth.

3) Domain Classifier: The domain classifier $G_d$ aims to recognize the domain each sample belongs to. The feature extractor $G_f$ tries to generate domain-invariant feature representations while the domain classifier $G_d$ updates to distinguish the domain labels. Then, the domain adaptation process can be converted to a minimax problem by minimizing the condition classification loss and maximizing the domain recognition loss simultaneously. The domain classifier contains an FC layer and an output layer, which are calculated in a similar way as (9) and (10), where the output $d^{D}$ is a 2-D vector corresponding to the two domain categories. The domain recognition loss is also defined by cross entropy as

$$ L_d = -\frac{1}{N_s} \sum_{n=1}^{N_s} \log(d_{n,1}^{s}) - \frac{1}{N_t} \sum_{n=1}^{N_t} \log(d_{n,2}^{t}) \tag{14} $$

where $d_{n,k}^{D}$ denotes the $k$th element of the output vector $d_n^{D}$, where label 1 is the source domain and label 2 is the target domain.

To address the minimax problem using a traditional gradient-based optimization algorithm, a gradient reverse layer (GRL) is introduced as in [28]. The GRL performs identity mapping in forward propagation, whereas it reverses the sign of the gradient during backward propagation before the gradient is passed to the preceding layers, i.e., $-\partial L_d / \partial \theta_f$. Then, the overall optimization objective of the FTNN can be expressed as

$$ L = L_c + \beta L_{MMD} + \gamma L_d \tag{15} $$

where $\beta$ and $\gamma$ are the tradeoff parameters for the MMD loss and the domain classification loss. The network is trained by minimizing this objective via the back-propagation algorithm. The network parameters are updated as follows:

$$ \theta_f \leftarrow \theta_f - \delta \left( \frac{\partial L_c}{\partial \theta_f} + \beta \frac{\partial L_{MMD}}{\partial \theta_f} - \gamma \frac{\partial L_d}{\partial \theta_f} \right) \tag{16} $$

$$ \theta_c \leftarrow \theta_c - \delta \left( \frac{\partial L_c}{\partial \theta_c} + \beta \frac{\partial L_{MMD}}{\partial \theta_c} \right) \tag{17} $$

$$ \theta_d \leftarrow \theta_d - \delta \frac{\partial L_d}{\partial \theta_d} \tag{18} $$

where $\theta_f$, $\theta_c$, and $\theta_d$ are the parameters of $G_f$, $G_c$, and $G_d$, respectively, and $\delta$ is the learning rate.

B. Convolutional LSTM Ensemble Network for RUL Prediction

After the fault condition of each sample is predicted by the FTNN, a convolutional LSTM ensemble network, which uses the fault information as prior information, is proposed to get more accurate RULs. In the proposed RUL prediction network, we use multiple subnetworks to model the degradation patterns under different fault modes, respectively. Since degradation patterns rely on temporal information, whereas CNN is not good at temporal modeling, LSTM, which has strong temporal modeling ability, is used to learn degradation patterns from the feature representations extracted by the CNN. The network structure is illustrated in Fig. 4, where the feature extractor $G_f$ shares the same structure as that in the FTNN. After feature representations are extracted, multiple LSTMs are used to model the degradation patterns under different fault modes, and multiple RUL values are obtained. Finally, an ensemble process is proposed to get the predicted RUL based on the fault conditions obtained by the FTNN, including fault labels and soft fault conditions.

As degradation trend information is important in RUL prediction tasks and LSTM is capable of processing sequential information, a time window technique is used in our method. When we predict the RUL at time $t$, the signal data of the preceding $T - 1$ consecutive time cycles are also combined to form a sample $X_t = [x_{t-T+1}, x_{t-T+2}, \ldots, x_t] \in \mathbb{R}^{L \times T}$, where $x_t \in \mathbb{R}^{L}$ denotes the signal data at time $t$, and $T$ is the length of the time window. Then, the signal sequence of $T$ channels is input to the feature extractor, which shares the same structure as that in the FTNN except that a flatten layer is not introduced after the last pooling layer.
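The time window construction described above can be sketched in a few lines of pure Python. This is only an illustrative sketch under stated assumptions: the helper name `make_window_samples` and the toy data are hypothetical, each sample is stored as $T$ rows of length $L$ (the transpose of the $L \times T$ layout in the text), and times with fewer than $T$ available cycles are simply skipped, which the article does not specify.

```python
from typing import List

Signal = List[float]  # one time cycle: L vibration points


def make_window_samples(cycles: List[Signal], window: int) -> List[List[Signal]]:
    """Build one sample per time t from the T = `window` most recent cycles.

    Each sample is [x_{t-T+1}, ..., x_t], stored as T rows of length L.
    Assumption: times with fewer than T available cycles are skipped.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    samples = []
    for t in range(window - 1, len(cycles)):
        samples.append(cycles[t - window + 1 : t + 1])
    return samples


# Toy run-to-failure record: 12 cycles of 4 points each (hypothetical values).
cycles = [[float(i)] * 4 for i in range(12)]
samples = make_window_samples(cycles, window=10)
```

With 12 cycles and a window of 10, the sketch yields three samples, the first covering cycles 0 to 9 and the last covering cycles 2 to 11.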
Fig. 4. Structure illustration of the convolutional LSTM ensemble network for RUL prediction.
The high-level feature map is then input to an LSTM ensemble module, which contains $K$ LSTM networks with the same structure, where $K$ is the number of all possible condition categories. Each LSTM network has an LSTM layer and a regression layer. The LSTM layer takes as input the sequential feature map, whose number of channels is the same as the kernel number of the last convolutional block, and the hidden state at the terminal time step is adopted as the input of the regression layer, which is calculated as

$$ \hat{r}_k = W_r^{k} \cdot h_k + b_r^{k} \tag{19} $$

where $h_k$ denotes the hidden state at the terminal time step of the LSTM layer in the $k$th LSTM network, $W_r^{k}$ and $b_r^{k}$ are the weight and bias matrix of the corresponding regression layer, respectively, and $\hat{r}_k$ is the RUL predicted by the $k$th LSTM.

Since different fault modes can lead to different degradation patterns, each LSTM in the LSTM ensemble module aims to learn the machinery degradation pattern under a specific fault mode separately. At time $t$, we feed the current signal sample $x_t$ into the FTNN to get the predicted fault condition. In the training process, if the machine at time $t$ is predicted to be under the $k$th fault condition, the feature map generated by the feature extractor is input to the corresponding $k$th LSTM. Then, the output is taken as the predicted RUL to calculate the mean-squared-error (MSE) loss by

$$ L_r = \frac{1}{N_{tr}} \sum_{j=1}^{N_{tr}} (\hat{r}_j - r_j)^2 \tag{20} $$

where $\hat{r}_j$ and $r_j$ are the predicted and actual RUL of the $j$th input, respectively, and $N_{tr}$ denotes the number of training samples. The network is trained by minimizing this loss value.

However, machines can have different fault severities at different times. In addition, multiple faults may exist simultaneously at some time, in which case the predicted condition type only represents the most probable one. To overcome these problems, an ensemble mechanism is proposed for the testing process. We take the output after the softmax operation in the FTNN (i.e., $c$), which we call the soft fault condition, as the degree of similarity between the health condition and a specific fault. The feature map is input to all $K$ LSTMs to get $K$ predicted RULs independently, and the soft fault condition is used as the ensemble weights to get a more comprehensive RUL by

$$ \hat{r} = c_1 \hat{r}_1 + c_2 \hat{r}_2 + \cdots + c_K \hat{r}_K \tag{21} $$

where $c_k$ is the $k$th element of $c$.

C. Method Summary

Algorithm 1 gives the detailed algorithmic process of the proposed method.

IV. CASE STUDY

The bearing is one of the most important machine components in industrial applications. Accurately predicting the RUL of bearings can ensure the reliability of many rotary machines. In this article, the proposed method is validated on a bearing run-to-failure dataset with fault knowledge transfer from a bearing fault dataset.

A. Description of Datasets

1) CWRU Bearing Dataset: The CWRU bearing dataset is a popular bearing fault dataset from experiments by Case Western Reserve University [30]. Vibration signals are collected with a sampling frequency of 12 kHz under four different working conditions. Three types of single point faults are introduced to bearings with a fault diameter of 0.1778 mm; thus, four health conditions are included, i.e., normal (N), inner race fault (IF), ball fault (BF), and outer race fault. We select samples with 1200 points, and data augmentation is performed by 80% overlapping. The data details are listed in Table I.
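The overlapping segmentation used for data augmentation can be illustrated with a short sketch. It assumes that "80% overlapping" means consecutive 1200-point samples share 80% of their points, giving a stride of 240 points; the function name `segment_with_overlap` and the toy signal are hypothetical and not taken from the article.

```python
from typing import List


def segment_with_overlap(signal: List[float],
                         length: int = 1200,
                         overlap: float = 0.8) -> List[List[float]]:
    """Cut a 1-D signal into fixed-length segments that overlap by `overlap`.

    Assumption: the stride is length * (1 - overlap), i.e. 240 points here.
    Trailing points that do not fill a whole segment are dropped.
    """
    stride = int(round(length * (1.0 - overlap)))
    if stride < 1:
        raise ValueError("overlap is too large for the given segment length")
    last_start = len(signal) - length
    return [signal[s:s + length] for s in range(0, last_start + 1, stride)]


# Toy signal of 2400 points standing in for a raw vibration record.
toy_signal = [float(i) for i in range(2400)]
segments = segment_with_overlap(toy_signal)
```

For the 2400-point toy signal this gives six segments starting at points 0, 240, ..., 1200, compared with only two segments if no overlap were used.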
Algorithm 1: The Proposed Fault Knowledge Transfer Assisted Ensemble Method.
Input: A fault dataset Xs, training set of the target run-to-failure dataset Xt, online recording data Xo, hyper-parameters of FTNN and RUL prediction network (epoch for FTNN: epoch1, epoch for RUL prediction ensemble network: epoch2, etc.).
Output: RUL prediction value for Xo.
1: Data Pre-processing: Form training samples with the same lengths for both datasets;
2: Build: FTNN and RUL prediction ensemble network;
3: # FTNN Training
4: for i=1:epoch1 do
5:   Input all the samples from Xs and Xt into FTNN;
6:   Update parameters of FTNN by minimizing (15);
7: end for
8: Input training samples of target dataset Xt to the trained FTNN and get fault labels;
9: # RUL Prediction Ensemble Network Training
10: for j=1:epoch2 do
11:   Input training samples from Xt into RUL prediction ensemble network along with the predicted fault labels;
12:   Update parameters of the RUL prediction ensemble network by minimizing (20);
13: end for
14: # Online Testing
15: Input online recording samples Xo to the trained FTNN and get soft fault labels;
16: Input Xo into RUL prediction ensemble network along with soft fault labels and get the predicted RUL value according to (21)

TABLE I
DESCRIPTION OF CWRU DATASET

2) PRONOSTIA Bearing Dataset: The bearing run-to-failure data used for RUL prediction are from accelerated life tests on the PRONOSTIA platform [31]. Horizontal and vertical vibration signals are collected every 10 s for 0.1 s with a sampling frequency of 25.6 kHz. Bearings under the first working condition, i.e., a rotating speed of 1650 r/min and a load of 4200 N, are chosen for study. Fig. 5 illustrates the run-to-failure root-mean-squared (rms) values of horizontal vibrations of the seven bearings under the first condition. We can see that bearing lifetimes vary a lot and that mainly two degradation patterns can be observed, one of which has a slightly increasing stage (e.g., Bearing 3) while the other degrades abruptly near the end of life (e.g., Bearing 5). This indicates that different faults may occur during the experiments for different bearings. To cover these two main patterns, we choose Bearing 3 and Bearing 5 for model testing, and the other five bearings are used for training.

Fig. 5. RMS values of horizontal vibrations of bearings over full life-time.

B. Experiment Details

In the experiment, the CWRU dataset is the source domain and the training set of the PRONOSTIA dataset acts as the target domain. To ensure the same sample length, we select the first 1200 data points of each time cycle of horizontal vibration signals as one target domain sample. We take the first 200 time cycles of each training bearing as normal samples. In the RUL prediction process, the time window length is set as 10, i.e., data of 10 consecutive time cycles are used to form an input sample. As a result, there are 8062 samples in total in the source domain, i.e., $N_s = 8062$, and the sample details are listed in Table I. There are 9809 training samples in the target domain, i.e., $N_t = 9809$, which are the union of the five training bearings, and each sample is from an independent time cycle. Since samples from the first 200 time cycles of each bearing are taken as normal samples, there are 1000 samples in total labeled as the normal category, i.e., $N_t^{nor} = 1000$. All these 8062 samples in the source domain and 9809 training samples in the target domain are used to train the FTNN. As for the RUL prediction network, five bearings, i.e., 5/7 of the bearings in the dataset, are used for model training, and the remaining two bearings, i.e., 2/7, are used for testing. There are 9809 training samples contained in the five training bearings and 4802 testing samples contained in the two testing bearings. In other words, training data account for 67% and testing data account for 33% in terms of sample numbers. The RUL values are normalized linearly to [0,1] because of the wide range of lifetimes. Since four fault condition categories are included, four LSTMs are contained in the RUL prediction network. The RUL prediction performance is measured by the root-mean-squared error (RMSE) formulated as

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{r}_i - r_i)^2} \tag{22} $$

where $\hat{r}_i$ and $r_i$ are the predicted and actual RUL of the $i$th sample, respectively, and $n$ is the number of samples.

The first convolutional layer of the feature extractor uses wide kernels to suppress high frequency noise just like the WDCNN in [17]. We set the kernel size of the first layer as 64 and stride as
16 just following the experience in [17]. In practice, we tend to use a deeper network to enhance the modeling ability of the CNN. The kernel size of the remaining convolutional layers is chosen from 3, 5, and 7, so the output size of the third pooling layer is 7, 6, and 4 accordingly. It is not appropriate to add one more convolutional layer and pooling layer, otherwise the feature size will be too small to contain sufficient information. Therefore, the number of convolutional layers and pooling layers is set to 3. Other important hyperparameters are the kernel size m of the remaining convolutional layers and the hidden size n of the LSTM layer. Grid search is performed to determine these two hyperparameters. To ensure that as many types of fault as possible are covered so that all LSTMs can be trained, models are trained using the 5 training bearings and applied on one testing bearing (Bearing 3) for parameter validation. The RMSE values on Bearing 3 are shown in Fig. 6(a). From the results, we can notice that different trends over n can be observed for different m. We can infer from Fig. 6(a) that 5 is the most suitable kernel size in our case study. When m is five, a too small hidden size n of the LSTM will restrict the modeling ability and a too large n will overfit the data, both of which decrease the performance. When m is three, the small kernel receptive field may decrease the capability of the CNN so that fewer useful features may be extracted. Then, a slightly larger hidden size n of the LSTM may overfit the data, resulting in performance reduction, so that RMSEs increase with n. When m is seven, the CNN may have overfitted the data so that the change of n cannot influence the performance too much, and the results fluctuate with the increase of n. Based on the results, m and n are set as 5 and 64, respectively. The implementation details of the FTNN and the convolutional LSTM ensemble network are described in Table II. The dropout technique [32] is introduced after the LSTM layer to prevent overfitting, with a dropout rate of 0.5. In the FTNN, the kernel bandwidth σ of MMD is calculated by σ = msd/2, where msd is the median squared distance among all pairs of samples [33]. The tradeoff parameter for target domain loss α is tested for values [0.25, 0.5, 0.75, 1.0, 1.5, 2.0], and is finally set as 0.5 according to the results illustrated in Fig. 6(b). These normal samples are only used to assist the domain adaptation process to increase the adaptation performance of samples at the beginning of life. As expected, the choice of α will not influence the performance too much (less than 0.01 in terms of RMSE). Some fluctuations can be observed when α is less than 1.0. β and γ are set to change with the training process by β, γ = 2/(1 + exp(−10 · p)) − 1, where p denotes the training process percentage, which changes linearly from 0 to 1 [28]. β and γ start from 0 at the beginning and gradually increase to 1 along with the training process. In the FTNN, each batch contains 128 samples from the source domain and 128 from the target domain. The stochastic gradient descent algorithm is used to train the network for 50 epochs with a learning rate of 0.05 × 0.98^e, where e denotes the epoch number. In the RUL prediction ensemble network, the network is trained for 50 epochs using the Adam optimizer [34] with a batch size of 128, and the learning rate is set as 0.0002 × 0.98^e.

TABLE II
DETAILS OF THE PROPOSED MODEL IN THE CASE STUDY

Fig. 6. RMSE values on Bearing 3 with different hyperparameters. (a) Different kernel sizes m of convolution layers and different hidden sizes n of LSTM layers. (b) Different tradeoff parameter for target domain loss α.

C. Results and Discussions

After training of the FTNN, the fault diagnosis accuracy on the source domain can reach 100%. Then, fault conditions of run-to-failure samples are predicted. To demonstrate the effectiveness of the domain adaptation process intuitively, we first train our proposed FTNN using source domain samples without any domain adaptation, i.e., without the domain classifier and the MMD modules, and then the target domain samples are input into the well-trained network. The high-level feature representations of both source domain samples and target domain samples are extracted from the last fully-connected layer of the condition classifier, and the t-distributed stochastic neighbor embedding (t-SNE) technique [35] is used to map these high-dimensional features to a 2-D space. Features of both source and target domain samples obtained by our proposed FTNN with domain adaptation are processed similarly. Fig. 7 visualizes the 2-D feature representations without and with domain adaptation. It can be clearly observed that although source domain samples are mapped into four separated clusters, the target domain samples are mainly mapped to two of the fault clusters. In addition, target domain samples with normal categories are mixed with the unlabeled samples and are not mapped to the same region as the normal samples from the source domain. Therefore, the distribution mismatch problem does exist and knowledge transfer is needed. After the domain adaptation by FTNN, it can be seen that the distributions of two domain samples are drawn closer, and the
target domain samples with normal categories are all mapped to the cluster with normal labels of the source domain. In addition, we also visualize the feature representations extracted by the feature extractor in the convolutional LSTM ensemble network using t-SNE to show the distributions of training and testing samples, which is shown in Fig. 8. We can observe that the training and testing samples basically share the same distributions, so it is proper to test our trained model using the testing samples.

Fig. 7. Visualization of learned feature representations before domain adaptation (left) and after domain adaptation (right). Symbols S and T represent the source and target domains, respectively. T-N and T-UN represent the samples with normal categories and unlabeled samples in the target domain, respectively.

Fig. 8. Visualization of feature representation distributions of the training and testing samples.

The predicted fault labels of testing bearings are illustrated in Fig. 9. We can find that the two bearings are predicted as normal state at the beginning stages, whereas different fault types may become dominant along with time with respect to different bearings, which coincides with the various degradation patterns shown in Fig. 5. The RUL prediction results over full life-time on the two bearings are shown in Fig. 10. To show the effectiveness of the proposed fault knowledge transfer assisted RUL prediction mechanism, a convolutional LSTM (CLSTM) network without any prior fault information is used for comparison. The CLSTM network shares the same structure with our proposed RUL prediction ensemble network except that only a single LSTM is utilized without the ensemble process. We also plot the prediction results of the CLSTM in Fig. 10 for comparison, and it can be seen that the results of our proposed model are much closer to the actual RUL curve at most stages.

Fig. 9. Predicted fault labels of (a) Bearing 3 and (b) Bearing 5 by the FTNN.

Fig. 10. RUL prediction results of CLSTM and our proposed method on (a) Bearing 3 and (b) Bearing 5. The red straight line represents the actual RUL.

Besides, a 1-D CNN, which is widely used for RUL prediction in an end-to-end manner, is also introduced for comparison. Compared with the CLSTM, the 1-D CNN replaces the LSTM layer with dropout by an FC layer with 64 neurons and a ReLU function, and a flatten layer is added before it. Other training settings are kept the same as for our proposed model. To further show the superiority of our proposed method, three state-of-the-art RUL prediction methods, all of which have been verified on the PRONOSTIA dataset, are applied in our study for comparison. The first is a deep learning method based on a deep autoencoder and a deep neural network [36], as distinct from a simple DNN (see Fig. 11). The second is a two-stage approach based on denoising (TS-DNN; see Fig. 11) [37]. And the third one is a multiscale convolutional neural network
(MSCNN; see Fig. 11) using time-frequency representation [38]. Since the training sets are chosen differently in different literature, we fine-tune the main hyperparameters of these three methods to get the best performance following the same setup as our grid search experiments. Besides, the results are all model outputs without RUL curve smoothing. All the experiments are repeated for 10 trials to reduce randomness. The RMSE values of our proposed method and the compared methods are summarized in Fig. 11.

Fig. 11. RMSE values of six prognostic methods.

From the experiment results, we can see that our proposed method outperforms all the other methods and achieves great performance improvements on both testing bearings (48.61% and 31.94% RMSE reduction compared with the CLSTM without fault knowledge). It can be seen from Fig. 10 that the prediction results do not differ much from those of the CLSTM at the beginning normal stages. However, when faults begin to occur, the prior fault information guides the proposed model to utilize different degradation knowledge for different fault conditions, thus leading to much more accurate prognostic results. Especially at the middle stage of Bearing 3, the complex fault conditions instead help the model to acquire much more accurate prediction results. The analysis shows that the model can better model degradation trends under a specific fault mode, and that the RUL prediction via the ensemble process, which considers complex fault conditions, can then have higher accuracy. It can be noticed from Fig. 10(b) that there are some RUL prediction results near the end-of-life deviating far from the actual RULs for both the CLSTM and our proposed model on Bearing 5. Since the prediction results with or without fault knowledge both have large relative errors, we infer this is because the extracted feature representations at that time period are similar to the ones at the early degradation period in the training set, leading to much larger predicted RULs. We inspect the predicted fault types of these about 30 samples, which have estimated RULs of more than 0.5 in our proposed method, and we find that almost all these samples are predicted as normal conditions. This indicates that the feature representations of these samples are close to the normal samples in the feature space, so misclassification occurs. This particular signal pattern at the degradation stage will increase prognostic uncertainty and decrease reliability to some extent in real applications. To suppress this error, more training samples including this particular degradation pattern should be added to teach the model, or more experiments could be conducted to identify the fault types corresponding to this particular degradation pattern.

V. CONCLUSION

In this article, we proposed a novel fault knowledge transfer assisted convolutional LSTM ensemble method for RUL prediction. Divergence minimization and domain adversarial adaptation techniques were combined to transfer fault knowledge from a fault dataset to the run-to-failure samples. With the diagnosed fault condition information, the RUL prediction network can learn various degradation patterns under different faults using the structure of multiple LSTMs. Then, an ensemble process based on predicted soft fault conditions was proposed to get RUL prediction results. Experiments on bearing datasets validate the effectiveness and superiority of our proposed method. For further research and applications, this proposed method presents a new prognostic paradigm by diagnosing fault conditions to improve the RUL prediction results, which can also provide some guidance for research on the degradation patterns under different fault types. This may contribute to understanding the machinery degradation mechanisms and constructing some hybrid models [39]. Because of the more accurate RUL prediction performance, our model can be integrated into some complex industrial prognosis systems. For example, the RUL prediction algorithm can be integrated into the key-performance-indicators oriented prognosis systems [40] to improve the production system reliability or integrated into some industrial cyberphysical systems to contribute to the maintenance decision making process [41].

Although great improvements have been achieved by our method in the case study, there are some existing limitations that cannot be ignored. First, it is assumed that a fault dataset containing almost all potential fault types is available to provide sufficient fault knowledge. Therefore, our proposed method may not be applicable for some uncommon machine components where a fault dataset may be difficult to get or not all the main fault types can be covered. Second, it is difficult to measure the distribution discrepancy degree between the fault dataset and the target run-to-failure dataset, i.e., it is unclear whether the fault dataset is suitable for knowledge transfer, so the model performance may have relatively larger uncertainties. As a next step, we will try to transfer fault knowledge from multiple source domain datasets to enhance the fault condition prediction ability and develop a suitable metric to measure the transferable degree between a fault dataset and a run-to-failure dataset.
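As a compact recap of the quantities used in the case study, the following Python sketch collects the schedule for β and γ, the exponentially decayed learning rate, a soft-label-weighted ensemble, and the RMSE of (22). It is illustrative only: the function names are hypothetical, and the weighted-sum form of the ensemble is an assumption about (21) (soft fault labels that sum to one weighting the per-fault LSTM outputs), not a reproduction of our implementation.

```python
import math


def adaptation_weight(p):
    """Schedule for beta and gamma: 2 / (1 + exp(-10 p)) - 1, with p in [0, 1]."""
    return 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0


def decayed_learning_rate(base_lr, epoch, decay=0.98):
    """Learning rate base_lr * decay**epoch, e.g. 0.05 * 0.98**e for the FTNN."""
    return base_lr * decay ** epoch


def soft_ensemble_rul(soft_labels, branch_ruls):
    """Assumed ensemble: sum over fault categories of soft label times branch RUL.

    soft_labels: per-category weights, assumed to sum to one.
    branch_ruls: RUL estimates from the per-category LSTM branches.
    """
    if len(soft_labels) != len(branch_ruls):
        raise ValueError("one soft label is needed per LSTM branch")
    return sum(c * r for c, r in zip(soft_labels, branch_ruls))


def rmse(predicted, actual):
    """Root-mean-squared error between predicted and actual normalized RULs."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Four fault categories, as in the case study; the numbers are made up.
    print(adaptation_weight(0.5))
    print(decayed_learning_rate(0.05, 10))
    print(soft_ensemble_rul([0.5, 0.5, 0.0, 0.0], [0.2, 0.4, 0.9, 0.9]))
    print(rmse([0.5, 0.5], [0.0, 1.0]))
```

With hard (one-hot) fault labels the weighted sum reduces to selecting a single LSTM branch, which is why soft labels matter at stages where the diagnosed fault condition is ambiguous.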
REFERENCES

[1] S. Alaswad and Y. Xiang, “A review on condition-based maintenance optimization models for stochastically deteriorating system,” Rel. Eng. Syst. Saf., vol. 157, pp. 54–63, 2017.
[2] A. K. S. Jardine, D. Lin, and D. Banjevic, “A review on machinery diagnostics and prognostics implementing condition-based maintenance,” Mech. Syst. Signal Process., vol. 20, no. 7, pp. 1483–1510, 2006.
[3] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, and D. Siegel, “Prognostics and health management design for rotary machinery systems-reviews, methodology and applications,” Mech. Syst. Signal Process., vol. 42, no. 1/2, pp. 314–334, 2014.
[4] M. S. Kan, A. C. C. Tan, and J. Mathew, “A review on prognostic techniques for non-stationary and non-linear rotating systems,” Mech. Syst. Signal Process., vol. 62/63, pp. 1–20, 2015.
[5] P. Paris and F. Erdogan, “A critical analysis of crack propagation laws,” J. Basic Eng., vol. 85, no. 4, pp. 528–533, Dec. 1963.
[6] P. Kundu, A. K. Darpe, and M. S. Kulkarni, “Weibull accelerated failure time regression model for remaining useful life prediction of bearing working under multiple operating conditions,” Mech. Syst. Signal Process., vol. 134, 2019, Art. no. 106302.
[7] X. Li, Q. Ding, and J.-Q. Sun, “Remaining useful life estimation in prognostics using deep convolution neural networks,” Rel. Eng. Syst. Saf., vol. 172, pp. 1–11, 2018.
[8] R. Zhao, R. Yan, J. Wang, and K. Mao, “Learning to monitor machine health with convolutional bi-directional lstm networks,” Sensors, vol. 17, no. 2, pp. 273–290, 2017.
[9] H. Miao, B. Li, C. Sun, and J. Liu, “Joint learning of degradation assessment and rul prediction for aeroengines via dual-task deep lstm networks,” IEEE Trans. Ind. Informat., vol. 15, no. 9, pp. 5023–5032, Sep. 2019.
[10] Y. Qin, D. Chen, S. Xiang, and C. Zhu, “Gated dual attention unit neural networks for remaining useful life prediction of rolling bearings,” IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6438–6447, Sep. 2021.
[11] H.-E. Kim, A. C. Tan, J. Mathew, and B.-K. Choi, “Bearing fault prognosis based on health state probability estimation,” Expert Syst. Appl., vol. 39, no. 5, pp. 5200–5213, 2012.
[12] R. Liu, B. Yang, and A. G. Hauptmann, “Simultaneous bearing fault recognition and remaining useful life prediction using joint-loss convolutional neural network,” IEEE Trans. Ind. Informat., vol. 16, no. 1, pp. 87–96, Jan. 2020.
[13] M. Cerrada et al., “A review on data-driven fault severity assessment in rolling bearings,” Mech. Syst. Signal Process., vol. 99, pp. 169–196, 2018.
[14] Z. Gao and X. Liu, “An overview on fault diagnosis, prognosis and resilient control for wind turbine systems,” Processes, vol. 9, no. 2, 2021, Art. no. 300.
[15] Y. Fu, Z. Gao, Y. Liu, A. Zhang, and X. Yin, “Actuator and sensor fault classification for wind turbine systems based on fast Fourier transform and uncorrelated multi-linear principal component analysis techniques,” Processes, vol. 8, no. 9, 2020, Art. no. 1066.
[16] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, “Deep learning and its applications to machine health monitoring,” Mech. Syst. Signal Process., vol. 115, pp. 213–237, 2019.
[17] W. Zhang, G. Peng, C. Li, Y. Chen, and Z. Zhang, “A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals,” Sensors, vol. 17, no. 2, pp. 425–445, 2017.
[18] R. Liu, F. Wang, B. Yang, and S. J. Qin, “Multiscale kernel based residual convolutional neural network for motor fault diagnosis under nonstationary conditions,” IEEE Trans. Ind. Informat., vol. 16, no. 6, pp. 3797–3806, Jun. 2020.
[19] L. Wen, X. Li, L. Gao, and Y. Zhang, “A new convolutional neural network-based data-driven fault diagnosis method,” IEEE Trans. Ind. Electron., vol. 65, no. 7, pp. 5990–5998, Jul. 2018.
[20] F. Wang, R. Liu, Q. Hu, and X. Chen, “Cascade convolutional neural network with progressive optimization for motor fault diagnosis under nonstationary conditions,” IEEE Trans. Ind. Informat., vol. 17, no. 4, pp. 2511–2521, Apr. 2021.
[21] R. Yan, F. Shen, C. Sun, and X. Chen, “Knowledge transfer for rotary machine fault diagnosis,” IEEE Sensors J., vol. 20, no. 15, pp. 8374–8393, Aug. 2020.
[22] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, and T. Zhang, “Deep model based domain adaptation for fault diagnosis,” IEEE Trans. Ind. Electron., vol. 64, no. 3, pp. 2296–2305, Mar. 2017.
[23] L. Guo, Y. Lei, S. Xing, T. Yan, and N. Li, “Deep convolutional transfer learning network: A new method for intelligent fault diagnosis of machines with unlabeled data,” IEEE Trans. Ind. Electron., vol. 66, no. 9, pp. 7316–7325, Sep. 2019.
[24] Z. Chen, K. Gryllias, and W. Li, “Intelligent fault diagnosis for rotary machinery using transferable convolutional neural network,” IEEE Trans. Ind. Informat., vol. 16, no. 1, pp. 339–349, Jan. 2020.
[25] X. Li, W. Zhang, Q. Ding, and X. Li, “Diagnosing rotating machines with weakly supervised data using deep transfer learning,” IEEE Trans. Ind. Informat., vol. 16, no. 3, pp. 1688–1697, Mar. 2020.
[26] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, “Deep domain confusion: Maximizing for domain invariance,” 2014, arXiv:1412.3474.
[27] B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep domain adaptation,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 443–450.
[28] Y. Ganin et al., “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 2096–2030, 2016.
[29] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[30] W. A. Smith and R. B. Randall, “Rolling element bearing diagnostics using the case western reserve university data: A benchmark study,” Mech. Syst. Signal Process., vol. 64, pp. 100–131, 2015.
[31] P. Nectoux et al., “Pronostia: An experimental platform for bearings accelerated degradation tests,” in Proc. IEEE Int. Conf. Prognostics Health Manage., 2012, pp. 1–8.
[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[33] M. Baktashmotlagh, M. T. Harandi, B. C. Lovell, and M. Salzmann, “Unsupervised domain adaptation by domain invariant projection,” in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 769–776.
[34] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[35] L. van der Maaten and G. Hinton, “Visualizing data using t-sne,” J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, 2008.
[36] L. Ren, Y. Sun, J. Cui, and L. Zhang, “Bearing remaining useful life prediction based on deep autoencoder and deep neural networks,” J. Manuf. Syst., vol. 48, pp. 71–77, 2018.
[37] M. Xia, T. Li, T. Shu, J. Wan, C. W. de Silva, and Z. Wang, “A two-stage approach for the remaining useful life prediction of bearings using deep neural networks,” IEEE Trans. Ind. Informat., vol. 15, no. 6, pp. 3703–3711, Jun. 2019.
[38] J. Zhu, N. Chen, and W. Peng, “Estimation of bearing remaining useful life based on multiscale convolutional neural network,” IEEE Trans. Ind. Electron., vol. 66, no. 4, pp. 3208–3216, Apr. 2019.
[39] Z. Gao, C. Cecati, and S. X. Ding, “A survey of fault diagnosis and fault-tolerant techniques-part ii: Fault diagnosis with knowledge-based and hybrid/active approaches,” IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3768–3774, Jun. 2015.
[40] Y. Jiang and S. Yin, “Recent advances in key-performance-indicator oriented prognosis and diagnosis with a MATLAB toolbox: Db-kit,” IEEE Trans. Ind. Informat., vol. 15, no. 5, pp. 2849–2858, May 2019.
[41] S. Yin, J. J. Rodriguez-Andina, and Y. Jiang, “Real-time monitoring and control of industrial cyberphysical systems: With integrated plant-wide monitoring and control framework,” IEEE Ind. Electron. Mag., vol. 13, no. 4, pp. 38–47, Dec. 2019.

Pengcheng Xia received the B.S. degree in 2018 from Shanghai Jiao Tong University, Shanghai, China, where he is currently working toward the Ph.D. degree, both in mechanical engineering.
His research interests include machinery health monitoring, prognostics and health management, machine learning, and signal processing.
Yixiang Huang (Member, IEEE) received the B.S. degree in power and energy engineering, and the M.S. and Ph.D. degrees in mechatronics engineering from Shanghai Jiao Tong University, Shanghai, China, in 2002, 2006, and 2010, respectively.
He is currently an Associate Professor of mechanical engineering with Shanghai Jiao Tong University, where he studies the topics of intelligent maintenance, prognostics, and machine learning. He was with the NSF Industry/University Cooperative Research Center for Intelligent Maintenance Systems, University of Cincinnati, USA. He is a regular reviewer for a number of international journals. His current research interests include big data analysis, sparse coding, and dimensionality reduction for industrial applications and computational intelligence techniques and their applications in various industrial domains.

Peng Li received the B.S. degree from the South China University of Technology, Guangzhou, China, in 2018, and the M.S. degree from Shanghai Jiao Tong University, Shanghai, China, in 2021, both in mechanical engineering.
His research interests include machine learning and prognostics and health management of machine tools.

Chengliang Liu received the B.S. degree in mechanical manufacturing from the Shandong University of Technology, Shandong, China, in 1985, and the M.S. and Ph.D. degrees in mechanical engineering from Southeast University, Nanjing, China, in 1991 and 1998, respectively.
He has been invited as a Senior Scholar with the University of Michigan, Ann Arbor, MI, USA and the University of Wisconsin, Madison, WI, USA, since 2001. He is currently a Professor with the Department of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China. His current research interests include mechatronic systems, MEMS design, intelligent robot control, remote monitoring techniques, and condition based monitoring.

Lun Shi received the Ph.D. degree in mechanical manufacturing and automation from the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, in 2003.
He is currently an Associate Professor with the School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China. His research interests include intelligent industrial equipment and ultraprecision positioning and machining.
