
2022 IEEE 5th International Conference on Electronics Technology (ICET)

Dual Mix-up Adversarial Domain Adaptation for Machine Remaining Useful Life Prediction
Yanjun Dong, University of Electronic Science and Technology of China, Chengdu, China, 202021081830@std.uestc.edu.cn
Chunhua Zhou, Shanghai Radio Equipment Research Institute, Shanghai, China, zhouchh800223@sina.com
Zhou Wu, Peking University Education Foundation, Peking, China, zhouwu@pku.edu.cn
Mianzhi Cheng, Shanghai Radio Equipment Research Institute, Shanghai, China, vocalano@163.com

Abstract—Remaining useful life (RUL) prediction is one of the core issues in the equipment maintenance process. It aims to accurately forecast a machine's run-to-failure life span using previous and current state data. As various data-driven models are proving to be effective, and because RUL labels for machines under particular conditions are difficult to obtain, domain adaptation approaches are beginning to be explored for the RUL prediction issue. We propose a novel Dual Mix-up Adversarial Domain Adaptation (DMADA) approach to further improve RUL forecasting accuracy, building on existing RUL domain adaptation studies. In DMADA, both time-series mix-up and domain mix-up regularization are conducted. Virtual samples generated by linear interpolation lead to an enriched and more continuous sample space. The linear interpolation encourages consistency of predictions in-between samples and allows the model to explore the feature space more thoroughly. At the same time, the domain mix-up conserves the invariance of the learned features. The two mix-up regularizations combined promote both the transferability and the discriminability of the extracted features, which is essential to satisfactory unsupervised domain adaptation performance. Thorough experiments on the C-MAPSS dataset are conducted, and the satisfactory results prove the proposed approach effective.

Index Terms—RUL prediction, adversarial domain adaptation, mix-up

I. INTRODUCTION

The goal of remaining useful life (RUL) prediction is to accurately predict the remaining time for a machine to function normally until a failure happens. Through analysis of the remaining life prediction results, the equipment's availability and reliability can be improved. At the same time, maintenance costs and the risk of operational failure events can be reduced. Studies on RUL prediction help to mitigate the serious costs brought by equipment's sudden failures, and thus have important practical value [1].

The methods for forecasting RUL are mainly divided into two categories: model-based ones [2], [3] and data-driven ones [4]–[6]. Model-based methods require information about the machine's construction as well as statistical knowledge to build physical models. They are complex to build, expertise-requiring, and hard to generalize. Data-driven approaches are becoming more popular along with the development of machine learning and deep learning; studies like [7], [8] show data-driven methods' effectiveness, and no prior knowledge of the machines is needed.

Traditional deep learning approaches only work when a large amount of labeled data is supplied and there is no distribution difference between training data and testing data. When it comes to real-world industrial applications, however, it can be difficult to collect sufficient labeled data of the target machine under a specific working condition because of time or financial constraints. Domain adaptation has been developed to address such issues.

In the RUL prediction problem, raw data is sampled as readings from continuous signal streams on different sensors. This leads to a gap between the sampled data distribution and the real data distribution due to real-world precision constraints. As a result, even when the target domain is eventually well-aligned with the source domain, the prediction effect in the target domain is not good enough, because the data from both domains is sampled sparsely and the knowledge learned from the source domain is flawed in the first place.

To address this issue, and inspired by the mix-up technique [9], [10], we propose a novel dual mix-up adversarial domain adaptation (DMADA) approach to conduct domain adaptation for the RUL prediction problem. In our approach, a time-series mix-up regularization and a domain mix-up regularization are both designed to close the gap between the sampled data distribution and the real data distribution. Mix-up generates convex combinations as virtual samples. With the dual mix-up regularization, a wider range of samples is available and the predictions in-between samples are enforced to be more consistent. The time-series mix-up leads to a more continuous and enriched sample space, allowing the feature extractor to explore more thorough information and to learn more discriminative representations. At the same time, the domain mix-up strengthens the domain discriminator, resulting in the model learning more domain-invariant representations due to the feature extractor being adversarially trained with the domain discriminator.



In this paper, our contributions are as follows:

1. The novel DMADA is proposed, leveraging time-series mix-up and domain mix-up to bridge the distance between the sampled data and real-world data distributions in cross-domain RUL prediction problems.
2. Our method encourages better alignment between the two domains, and thorough experiments show an overall more precise RUL prediction.

II. RELATED WORK

A. RUL Prediction

Deep learning methods for RUL prediction are mainly divided into convolutional neural network (CNN) based ones and recurrent neural network (RNN) based ones. [7] was the first attempt to adopt a CNN for RUL prediction. Li [11] developed a multi-scale feature extraction method that connects the raw data directly to the true RUL value. LSTM is a widely used RNN variant designed for long-term sequence feature extraction. Cheng [8] proposed to ensemble Bayesian inference with a series of LSTM modules. Shi [12] developed a dual LSTM framework and achieved real-time RUL prediction with high precision. Al-Dulaimi [13] combined LSTM with CNN, and the designed hybrid approach also proved useful.

B. Adversarial Domain Adaptation

Inspired by the famous Generative Adversarial Network, Ganin [14] proposed the milestone Domain Adversarial Neural Network, in which a domain discriminator and the feature extractor are trained adversarially together. The domain discriminator distinguishes whether a feature comes from the source domain or the target domain, enforcing the feature generator to learn domain-invariant features. However, for the multi-class classification task, even when the domain discriminator is fooled, there is no guarantee that the feature distributions are similar. Long et al. [15] proposed the Conditional Domain Adversarial Network (CDAN) and used multi-linear conditioning to solve this issue. Furthermore, CDAN aligns the feature-category joint distribution to align features as well as the corresponding labels. Tzeng [16] proposed to use a soft label distribution matching loss to transfer knowledge between two domains. Later, Tzeng [17] proposed Adversarial Discriminative Domain Adaptation (ADDA), which divides the feature generator into a source one and a target one; only the target domain feature generator is adversarially trained with the domain discriminator.

C. Cross-domain RUL prediction

Domain adaptation methods are mainly developed for classification tasks, while RUL prediction is a time-series regression problem. Thus, fewer domain adaptation methods have been studied for this issue. [18] is the first attempt at a domain adaptive method on the C-MAPSS dataset, employing the basic DANN approach. [19] applied the framework of ADDA; [20] then added an Info-loss to the ADDA framework, applying the idea of contrastive learning to maximize the mutual information between target domain samples and target domain features, and achieved the best results so far among the domain adaptive methods on the C-MAPSS dataset. Li [21] combined adaptive batch normalization (AdaBN) with deep convolutional networks in the AdaBN-DCNN framework. Fu et al. [22] designed a deep residual LSTM network with domain invariance to extract high-dimensional features.

III. METHODOLOGY

A. Settings and notations

To illustrate the problem clearly, we present below the mathematical notation for the basic domain adaptation issue [23]. In domain adaptation, a domain is denoted as D = {X, P(X)}, where X is the feature space and P(X) represents the marginal distribution of the data in the feature space. There are two basic domains: the source domain D_S = {X_S, P_S(X)} with labels and the target domain D_T = {X_T, P_T(X)} without labels. The general idea of unsupervised domain adaptation is to take the knowledge learned in the labeled source domain and transfer it to the unlabeled target domain to get better results. In our machine RUL prediction problem, we aim to conduct supervised deep learning under one working condition and predict other engines' RUL values under different working conditions.

The source domain is denoted as D_S = {X_S^i, y_S^i}_{i=1}^{n_S}, where n_S is the total number of samples. In X_S^i \in R^{M \times K}, M represents the M input sensors and K means K time steps; y_S^i \in R is the RUL label. Likewise, the target domain is denoted as D_T = {X_T^j}_{j=1}^{n_T}, where X_T^j \in R^{M \times K} and n_T is the overall number of samples in the target domain. Firstly, in the source domain we pre-train a feature extractor E_S and a RUL predictor R, conducting conventional supervised deep learning. The weights of E_S are transferred to the target domain feature extractor E_T for initialization, and the RUL predictor remains unchanged. A domain discriminator D is set up to train adversarially together with E_T; E_T is optimized with the adversarial loss and the Info-loss, as well as the time-series mix-up loss and the domain mix-up loss. A detailed explanation of each step is provided in the following sections.

B. Supervised pre-training

1) Recurrent Feature Extractor: In the time-series regression problem, the long short-term memory (LSTM) network [24] is widely used. It introduces a gating mechanism to control the accumulation rate of information, which can selectively add new information and selectively forget the accumulated information. In our proposed DMADA, we leverage a five-layer bi-directional LSTM model as the feature extractor. It processes the time series in both directions, leading to a more thorough exploitation of the samples.
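For concreteness, a minimal PyTorch sketch of such a bi-directional LSTM extractor is given below. The depth and width (five bi-LSTM layers with 32 neurons each) follow Section IV-B; taking the last time step of the output sequence as the sample feature is our assumption, since the paper does not specify the pooling.

import torch
import torch.nn as nn

class BiLSTMExtractor(nn.Module):
    """Five-layer bi-directional LSTM feature extractor (sizes from Sec. IV-B)."""
    def __init__(self, n_sensors=14, hidden=32, layers=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_sensors, hidden_size=hidden,
                            num_layers=layers, batch_first=True,
                            bidirectional=True)

    def forward(self, x):        # x: (batch, K time steps, M sensors)
        out, _ = self.lstm(x)    # out: (batch, K, 2 * hidden)
        return out[:, -1, :]     # last step as the feature vector (assumed)

# example: a batch of 8 windows of 30 cycles x 14 sensors -> (8, 64) features
# f = BiLSTMExtractor()(torch.randn(8, 30, 14))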

2) RUL predictor: The RUL predictor is a multi-layer neural network R. It maps the features extracted by the bi-LSTM module, f_S = E_S(X_S), to the desired one-dimensional RUL value: R^d \to R. In the supervised pre-training process, the feature extractor E_S and the predictor R are trained together with the mean square error (MSE) loss between the true RUL labels and the RUL values predicted by the network. The loss is formulated as below:

    L_{mse} = \frac{1}{n_S} \sum_{i=1}^{n_S} ( \hat{y}_S^i - y_S^i )^2,    (1)

where n_S is the total number of source samples, \hat{y}_S = R(E_S(X_S)) is the predicted RUL value, and y_S is the true value of the corresponding sample.

[Fig. 1: The architecture of the proposed dual mix-up adversarial domain adaptation (DMADA).]

C. Dual Mix-up Adversarial Domain Adaptation (DMADA)

1) Adversarial Domain Adaptation: In the adversarial domain adaptation module, E_T and R are the feature extractor and the RUL predictor respectively. Since there is no labeled data in the target domain, we initialize E_T and R with the weights of the pre-trained source model. Samples x_S and x_T are fed into the extractors to obtain the features f_S = E_S(x_S) and f_T = E_T(x_T). To encourage E_T to learn more domain-invariant representations, a domain discriminator D is built. The domain discriminator's task is to distinguish which domain a feature comes from as accurately as possible. The adversarial training formulation is as follows:

    \min_{E_T} \max_{D} L_{adv} = E_{X_S \sim P_S} [ \log D(E_S(X_S)) ] + E_{X_T \sim P_T} [ \log (1 - D(E_T(X_T))) ],    (2)

where E_S and E_T denote the feature extractors of the source domain and the target domain respectively, and X_S and X_T represent the time-series data from the two domains. During the training process, the domain discriminator D tries to maximize L_adv, while the feature extractor E_T tries to minimize L_adv to learn domain-invariant representations. In this way, we reduce the discrepancy between the feature distribution of the target domain and that of the source domain.
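A minimal sketch of one alternating update for Eq. (2) is shown below; the binary cross-entropy form and the alternating schedule are our assumptions, since the paper only states the min-max objective.

import torch

bce = torch.nn.BCEWithLogitsLoss()

def discriminator_step(D, f_s, f_t, opt_d):
    """D maximizes L_adv: classify source features as 1, target as 0."""
    logit_s, logit_t = D(f_s.detach()), D(f_t.detach())
    loss_d = bce(logit_s, torch.ones_like(logit_s)) + \
             bce(logit_t, torch.zeros_like(logit_t))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

def extractor_step(D, E_t, x_t, opt_e):
    """E_T minimizes L_adv: make D mistake target features for source."""
    logit_t = D(E_t(x_t))
    loss_e = bce(logit_t, torch.ones_like(logit_t))  # fool D
    opt_e.zero_grad(); loss_e.backward(); opt_e.step()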
2) InfoNCE Loss: Adversarial domain adaptation methods successfully transfer knowledge by learning domain-invariant features. Unfortunately, target domain-specific information can be ignored when minimizing the adversarial loss. Our DMADA follows the suggestion of CADA [20] to preserve target domain-specific features by leveraging the InfoNCE loss. The formulation of the InfoNCE loss is as follows:

    \min_{E_T, \Theta} L_{InfoNCE} = - E_{X_T} \left[ \log \frac{ e^{\phi_k(x_k, f_T)} }{ \sum_{x_j \in X_T} e^{\phi_k(x_j, f_T)} } \right],    (3)

where \phi_k(x_j, f_T) = x_j^T \Theta(f_T), and \Theta is a fully connected layer.
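A sketch of Eq. (3) over one mini-batch could look as follows; flattening each raw sample x into a vector for the inner product x^T \Theta(f_T) is our reading of the formula, not a detail the paper spells out.

import torch
import torch.nn.functional as F

def info_nce_loss(x_t, f_t, theta):
    """Eq. (3): for each target feature f_i, its own sample x_i is the
    positive, and the other samples in the batch act as negatives.
    theta: the FC layer Theta, e.g. nn.Linear(64, 30 * 14)."""
    x = x_t.flatten(1)                     # (B, M*K) raw samples as vectors
    scores = theta(f_t) @ x.t()            # scores[i, j] = x_j . Theta(f_i)
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)  # -log softmax of the positives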
D. Dual Mix-up Regularization

In our proposed DMADA, we apply the dual mix-up method to the domain adaptation issue of the RUL prediction problem. Generally, mix-up is a data augmentation method producing virtual training samples [9]. The principle of mix-up is to randomly select two sample pairs {x_i, y_i} and {x_j, y_j} and generate convex combinations:

    \tilde{x} = M_\lambda(x_i, x_j) = \lambda x_i + (1 - \lambda) x_j,
    \tilde{y} = M_\lambda(y_i, y_j) = \lambda y_i + (1 - \lambda) y_j,    (4)

where \lambda is randomly chosen from a beta distribution \beta(\alpha, \alpha) with \alpha \in (0, \infty). This idea is relatively intuitive, yet effective. It leverages linear interpolation to enrich the dataset, which leads to a more continuous latent feature space. Mix-up requires no prior expertise about the dataset, and it frequently improves a deep learning model's generalization ability [9]. Both the time-series and the domain mix-up are conducted on the time-cycle level.

1) Time-series Mix-up Regularization: The time-series mix-up regularization is divided into two situations: a labeled one and an unlabeled one. For the source domain, all time-series data has a corresponding label, and a random pair from the same mini-batch is selected to generate (\tilde{x}_S, \tilde{y}_S). The mixed-up source sample encourages the predictions to be more consistent in-between samples:

    L_S^r(E_S, R) = E_{(x_S^i, y_S^i), (x_S^j, y_S^j) \sim D_S} \, \ell( h(\tilde{x}_S), \tilde{y}_S ),    (5)

where h is the composition of the feature extractor E_T and the RUL predictor R, h = R \circ E. As for the target domain, there are no real labels. Fig. 1 shows the process of the unlabeled time-series mix-up. First of all, in order to utilize the mix-up regularization, we leverage the output of the RUL predictor, \hat{y}_T = h(x_T), as the pseudo label of the target domain. In this way we are able to conduct the time-series convex combination, denoted as (\tilde{x}_T, M_\lambda(\hat{y}_T^i, \hat{y}_T^j)), from a pair of data (x_T^i, x_T^j) and their corresponding pseudo labels (\hat{y}_T^i, \hat{y}_T^j). Moreover, because M_\lambda(\hat{y}_T^i, \hat{y}_T^j) is the direct mix-up of the pseudo labels, another obvious constraint surfaces: it is preferable that M_\lambda(\hat{y}_T^i, \hat{y}_T^j) and h(\tilde{x}_T) be consistent.

    L_T^r(E_T, R) = E_{(x_T^i, x_T^j) \sim D_T} \, L_1( h(\tilde{x}_T), M_\lambda(\hat{y}_T^i, \hat{y}_T^j) ),    (6)

We use the L1-norm to penalize the difference between M_\lambda(\hat{y}_T^i, \hat{y}_T^j) and h(\tilde{x}_T).

2) Domain Mix-up Regularization: In the above adversarial domain adaptation framework, both source features and target domain features are fed into the domain discriminator D. Because there is considerable discrepancy between the source domain and the target domain, the discriminator D should be able to differentiate not only the true features but also the mixed-up features \tilde{f}_S = E_S(\tilde{x}_S) and \tilde{f}_T = E_T(\tilde{x}_T). By this method, the discriminator D can learn more about the domain difference and distinguish more precisely in the latent feature space. The adversarial loss after domain mix-up is as follows:

    L_{adv}^r(E_S, E_T, D) = E_{(x_S^i, x_S^j) \sim D_S} [ \log D(E_S(\tilde{x}_S)) ] + E_{(x_T^i, x_T^j) \sim D_T} [ \log (1 - D(E_T(\tilde{x}_T))) ],    (7)

Because the feature extractor E_T and the domain discriminator D are trained adversarially, the performance of E_T improves as that of the discriminator D improves. The feature extractor E_T learns more domain-invariant representations in the process of minimizing L_{adv}^r.
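The three mix-up losses above might be sketched as follows. The \lambda ~ \beta(\alpha, \alpha) sampling and within-mini-batch pairing follow the text; using the MSE for \ell in Eq. (5) mirrors the pre-training loss and is our assumption, as are all helper names.

import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mix(a, b, alpha=0.2):
    """Convex combination of Eq. (4) with lambda ~ Beta(alpha, alpha)."""
    lam = Beta(alpha, alpha).sample().to(a.device)
    return lam * a + (1 - lam) * b

def source_mixup_loss(E, R, x_s, y_s, alpha=0.2):
    """Labeled time-series mix-up of Eq. (5) within one mini-batch."""
    idx = torch.randperm(x_s.size(0), device=x_s.device)
    lam = Beta(alpha, alpha).sample().to(x_s.device)
    x_mix = lam * x_s + (1 - lam) * x_s[idx]
    y_mix = lam * y_s + (1 - lam) * y_s[idx]
    return F.mse_loss(R(E(x_mix)).squeeze(-1), y_mix)   # ell assumed MSE

def target_mixup_loss(E, R, x_t, alpha=0.2):
    """Unlabeled mix-up of Eq. (6): the prediction on the mixed input
    should match the same mix of pseudo labels (L1 penalty)."""
    with torch.no_grad():
        y_hat = R(E(x_t)).squeeze(-1)                   # pseudo RUL labels
    idx = torch.randperm(x_t.size(0), device=x_t.device)
    lam = Beta(alpha, alpha).sample().to(x_t.device)
    x_mix = lam * x_t + (1 - lam) * x_t[idx]
    y_mix = lam * y_hat + (1 - lam) * y_hat[idx]
    return F.l1_loss(R(E(x_mix)).squeeze(-1), y_mix)

def domain_mixup_adv_loss(D, E_s, E_t, x_s, x_t, alpha=0.2):
    """Domain mix-up term of Eq. (7): D also scores the features of
    mixed-up samples from each domain (source = 1, target = 0)."""
    xs = mix(x_s, x_s[torch.randperm(x_s.size(0), device=x_s.device)], alpha)
    xt = mix(x_t, x_t[torch.randperm(x_t.size(0), device=x_t.device)], alpha)
    ls, lt = D(E_s(xs)), D(E_t(xt))
    return F.binary_cross_entropy_with_logits(ls, torch.ones_like(ls)) + \
           F.binary_cross_entropy_with_logits(lt, torch.zeros_like(lt))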

3) Overall loss function: The overall objective is

    \min_{E, \Theta} \max_{D} \; L_S^r(E_S, R) + L_T^r(E_T, R) + L_{adv}(E_S, E_T, D) + L_{adv}^r(E_T, D) + \lambda L_{InfoNCE}(E_T, \Theta),    (8)

where L_S^r(E_S, R), L_T^r(E_T, R) and L_{InfoNCE}(E_T, \Theta) are used to optimize the feature extractor on the target domain, E_T, and the losses L_{adv}(E_S, E_T, D) and L_{adv}^r(E_T, D) are leveraged to update the domain discriminator D; both also contribute to improving the performance of the feature extractor. In particular, the two mix-up regularizations enrich the dataset, smooth the output distribution, and make the latent feature space more continuous. The time-series mix-up bridges the gap between the sample distribution and the real data distribution, encouraging better alignment between the two domains. The domain mix-up enforces the learned representations to be more domain-invariant. Combined together in DMADA, the two mix-ups lead to better RUL prediction performance.
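Putting the pieces together, one adversarial update implementing Eq. (8) could look like the sketch below, reusing the loss helpers sketched in the previous sections; the helper names and the exact interleaving of the D and E_T steps are our assumptions.

import torch
import torch.nn.functional as F

def dmada_step(E_s, E_t, R, D, theta, x_s, y_s, x_t,
               opt_et, opt_d, lam_nce=1.0, target_mixup_on=True):
    """One training step for Eq. (8). opt_et is assumed to cover the
    parameters of E_t and theta; E_s stays frozen after pre-training."""
    bce = F.binary_cross_entropy_with_logits

    # discriminator update: maximize L_adv and L^r_adv (Eqs. 2 and 7)
    ds, dt = D(E_s(x_s).detach()), D(E_t(x_t).detach())
    loss_d = bce(ds, torch.ones_like(ds)) + bce(dt, torch.zeros_like(dt))
    loss_d = loss_d + domain_mixup_adv_loss(D, E_s, E_t, x_s, x_t)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # feature-extractor update: the E_T side of Eq. (8)
    dt = D(E_t(x_t))
    loss_e = bce(dt, torch.ones_like(dt))                    # fool D
    loss_e = loss_e + source_mixup_loss(E_t, R, x_s, y_s)    # Eq. (5)
    if target_mixup_on:                                      # after preheat epochs
        loss_e = loss_e + target_mixup_loss(E_t, R, x_t)     # Eq. (6)
    loss_e = loss_e + lam_nce * info_nce_loss(x_t, E_t(x_t), theta)  # Eq. (3)
    opt_et.zero_grad(); loss_e.backward(); opt_et.step()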
IV. EXPERIMENT

A. Data Preparation

We use the data from the turbofan engine degradation simulation generated with C-MAPSS [26] for our experiment. Four different combinations of working conditions and fault modes are denoted as FD001, FD002, FD003, and FD004; their properties are summarized in Table I. Under each operating condition, readings from various sensors on different engines are collected. The goal is to leverage the sampled signal data and predict another engine's remaining normally working cycles as precisely as possible.

TABLE I
PROPERTIES OF THE 4 DATASETS

    Dataset                   FD001   FD002   FD003   FD004
    #Training engines           100     260     100     249
    #Testing engines            100     259     100     248
    #Max life span (cycles)     362     378     512     128
    #Operating conditions         1       6       1       6
    #Fault types                  1       1       2       2

Following the studies [18], [20], we first process the raw time-dependent signal data with the sliding window method. One sample is composed of 30 continuous time cycles and the step size is 1. Then, we chose the following sensors: S2, S3, S4, S7, S8, S9, S11, S12, S13, S14, S15, S17, S20, and S21, since they are informative with respect to the degradation of the engines. Thus, every input sample is 30 x 14 in dimension. Also, a linear piece-wise RUL is adopted: when the RUL label is larger than 130 cycles, it is re-set to 130, since the machine is then considered healthy. At last, in order to prevent value differences across working conditions from influencing the adaptation process, all data are Min-Max normalized into [0, 1].
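The preprocessing just described might be sketched as follows; the window length (30), step (1), RUL cap (130), and [0, 1] normalization follow the text, while the per-engine array layout is an assumption.

import numpy as np

def sliding_windows(signal, window=30, step=1):
    """Cut one engine's (T, 14) sensor series into (N, 30, 14) samples."""
    return np.stack([signal[i:i + window]
                     for i in range(0, len(signal) - window + 1, step)])

def piecewise_rul(total_life, window=30, cap=130):
    """Linear piece-wise RUL label per window end, clipped at 130 cycles."""
    ends = np.arange(window, total_life + 1)   # last cycle in each window
    return np.minimum(total_life - ends, cap).astype(float)

def minmax_normalize(x, lo, hi):
    """Min-max normalize sensor readings into [0, 1] (per-sensor lo/hi)."""
    return (x - lo) / (hi - lo + 1e-8)

# example: one engine that ran 200 cycles with the 14 selected sensors
# X = sliding_windows(minmax_normalize(raw, raw.min(0), raw.max(0)))
# y = piecewise_rul(200)   # X.shape == (171, 30, 14), y.shape == (171,)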
B. Experiment Settings

In our proposed DMADA, there are a source domain feature extractor E_S, a target domain feature extractor E_T, a shared RUL predictor R, and a domain discriminator D. The feature extractors of the source domain and the target domain are both five consecutive bi-directional LSTM layers with 32 neurons in each layer. The RUL predictor R is composed of three fully connected (FC) layers with 32, 16, and 1 hidden neurons respectively. The domain classifier is also built with three successive FC layers with 64, 32, and 1 hidden neurons respectively. Each FC layer is followed by a nonlinear activation function (ReLU), and dropout regularization is used to mitigate over-fitting.
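Under the sizes stated above, the four networks and the weight transfer from E_S to E_T might be assembled as below; the dropout rate and exact layer placement are assumptions, and BiLSTMExtractor is the sketch from Section III-B.

import copy
import torch.nn as nn
import torch.optim as optim

E_s = BiLSTMExtractor(n_sensors=14, hidden=32, layers=5)
R = nn.Sequential(                      # RUL predictor: FC 32-16-1
    nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(32, 16), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(16, 1))
# ... pre-train E_s and R on the source domain with the MSE loss (Eq. 1) ...

E_t = copy.deepcopy(E_s)                # weight transfer: E_T starts from E_S
D = nn.Sequential(                      # domain discriminator: FC 64-32-1
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1))

opt_et = optim.Adam(E_t.parameters(), lr=0.5e-5)
opt_d = optim.Adam(D.parameters(), lr=0.5e-5)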
Following the study of [20], during training each mini-batch contains 256 shuffled samples from engines under the same operating condition. The Adam optimizer is used to minimize the losses; the learning rate is 0.5e-5 for both the domain discriminator D and the feature extractor E_T, and for the InfoNCE module it is set to 1e-2. The number of training epochs varies from 100 to 300 for the different tasks. As for the weight \lambda of the InfoNCE loss, we follow the settings of [20] as well.

We follow the studies [9], [10] and chose \alpha = 0.2 for the \beta distribution. As Fig. 2 shows, when \lambda is chosen from \beta(0.2, 0.2), most generated virtual samples lie much closer to a real sample. In this way, we enrich the dataset while avoiding intermediate samples that are far from the real ones. In our DMADA, the domain mix-up, as well as the time-series mix-up in the source domain, is conducted from the very first epoch.

[Fig. 2: The probability density function of the beta distribution for different values of \alpha.]

In contrast, when conducting the time-series mix-up in the target domain, the output of the RUL predictor is taken as the pseudo RUL values only after waiting for a few epochs, because the predicted RUL values are likely to approach the true target labels only after some training. The target domain time-series mix-up loss is therefore added to the overall objective after a few epochs. This waiting period (the preheat epochs) is set differently for every task; the details are listed in Table II.

TABLE II
PARAMETER SETTINGS FOR \lambda AND THE PREHEAT EPOCHS

    Scenario       \lambda    Preheat epochs
    FD001→FD002    0.2        10
    FD001→FD003    0.2        60
    FD001→FD004    0.001      60
    FD002→FD001    0.001      40
    FD002→FD003    0.001      40
    FD002→FD004    0.2        20
    FD003→FD001    0.2        70
    FD003→FD002    0.2        40
    FD003→FD004    0.2        60
    FD004→FD001    0.2        40
    FD004→FD002    0.001      40
    FD004→FD003    0.001      40

C. Experiment Results

To evaluate the performance of our DMADA model, we follow the studies [18], [20]: both the root mean square error (RMSE) and the Score function are leveraged. The RMSE is formulated as follows:

    RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} ( \hat{y}_i - y_i )^2 },    (9)

where N represents the total number of testing samples and \hat{y} denotes the predicted RUL value.

The Score function punishes the situation more when the predicted RUL value is larger than the true label, because in that case the model predicts that the machine deteriorates slower than it actually does, which is relatively more dangerous than predicting that the failure happens sooner than it does. The Score is formulated as follows:

    Score = \begin{cases} \frac{1}{N} \sum_{i=1}^{N} \left( e^{\frac{y_i - \hat{y}_i}{13}} - 1 \right), & \text{if } \hat{y}_i < y_i, \\ \frac{1}{N} \sum_{i=1}^{N} \left( e^{\frac{\hat{y}_i - y_i}{10}} - 1 \right), & \text{if } \hat{y}_i > y_i. \end{cases}    (10)

TABLE III
COMPARISON AGAINST FIVE STATE-OF-THE-ART METHODS (each cell: RMSE | Score)

    Scenario       WDGRL [25]       ADDA [19]        RULDDA [18]      CADA [20]        DMADA
    FD001→FD002    21.46 | 33160    31.26 | 4865     24.08 | 2684     19.52 | 2122     18.40 | 2016
    FD001→FD003    71.70 | 15936    57.09 | 32472    43.08 | 10259    39.58 | 8415     25.22 | 2417
    FD001→FD004    57.24 | 86139    56.66 | 68859    45.70 | 26981    31.23 | 11577    28.16 | 7324
    FD002→FD001    15.24 | 157572   19.73 | 689      23.91 | 2430     13.88 | 351      13.65 | 316
    FD002→FD003    41.45 | 19053    37.22 | 11029    47.26 | 12756    33.53 | 5213     28.18 | 3575
    FD002→FD004    37.62 | 523722   37.64 | 16856    45.17 | 25738    33.71 | 15106    31.23 | 12831
    FD003→FD001    36.05 | 18307    40.41 | 32451    27.15 | 2391     19.54 | 1451     19.90 | 1667
    FD003→FD002    40.11 | 32112    42.53 | 459911   30.42 | 6754     19.33 | 5257     19.20 | 4608
    FD003→FD004    29.98 | 296061   31.88 | 82520    31.82 | 5775     20.61 | 3219     21.38 | 3503
    FD004→FD001    42.01 | 45394    37.81 | 43794    32.37 | 13377    20.10 | 1840     19.63 | 1092
    FD004→FD002    35.88 | 38221    36.67 | 23822    27.54 | 4937     18.50 | 4460     19.07 | 3735
    FD004→FD003    18.18 | 77977    23.59 | 1117     23.31 | 1679     14.49 | 682      14.12 | 665
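In NumPy, the two metrics read as below; note the 1/N factor in the Score follows Eq. (10) as printed here, whereas the classic C-MAPSS score omits it.

import numpy as np

def rmse(y_hat, y):
    """Root mean square error of Eq. (9)."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

def score(y_hat, y):
    """Asymmetric Score of Eq. (10): late predictions (y_hat > y) are
    penalized more steeply (divisor 10) than early ones (divisor 13)."""
    d = y_hat - y
    return np.mean(np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1)

# e.g. over-estimating RUL by 10 cycles costs more than under-estimating:
# score(np.array([110.]), np.array([100.])) > score(np.array([90.]), np.array([100.]))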
There are four datasets with different operating conditions and fault modes. We treat one of them as the source domain and another one as the target domain in turn. The pre-training is conducted with the labeled source domain data, and the domain alignment is performed with the unlabeled target domain data. In this way we get 12 scenarios, with each dataset serving as source and as target. We compare our approach against the state-of-the-art methods that also study cross-domain RUL prediction on the C-MAPSS dataset. As shown in Table III, in 9 out of the 12 tasks the proposed DMADA gets the best result. In particular, our approach shows great improvement on tasks such as FD002→FD003, where the differences between the domains are large. Overall, our approach outperforms the second-best method (CADA) on average by 6% in RMSE and 18% in Score.

D. Ablation study

In this section, an ablation study is performed to investigate the independent contribution of the two mix-up modules of DMADA. We conducted experiments on the 12 adaptation tasks without the time-series mix-up and without the domain mix-up, respectively. As Table IV shows, removing either the time-series mix-up or the domain mix-up on average does not affect the RMSE results much. However, without the time-series mix-up the Score results are 5.1% worse, and without the domain mix-up the average Score is 27.66% worse compared to the full DMADA. The Score function is valued more as an evaluation criterion because it is preferred that the predicted remaining life expectancy be no longer than the true RUL. Furthermore, DMADA performs best when considering the RMSE and the Score together, indicating that the time-series mix-up and the domain mix-up conducted together indeed help to accurately predict the RUL value.

TABLE IV
THE ABLATION STUDY OF THE TWO MIX-UP MODULES (each ablated cell: value and change relative to the full DMADA)

    RMSE
    Scenario       w/o time-series mix-up   w/o domain mix-up   DMADA
    FD001→FD002    18.89 (+2.66%)           18.70 (+1.63%)      18.40
    FD001→FD003    22.64 (-10.23%)          25.37 (+0.59%)      25.22
    FD001→FD004    29.48 (+4.69%)           29.32 (+4.12%)      28.16
    FD002→FD001    13.50 (-1.10%)           13.62 (-0.22%)      13.65
    FD002→FD003    29.67 (+5.29%)           29.88 (+6.03%)      28.81
    FD002→FD004    31.24 (+0.03%)           31.18 (-0.16%)      31.23
    FD003→FD001    21.48 (+7.94%)           18.65 (-6.28%)      19.90
    FD003→FD002    19.76 (+2.92%)           18.91 (-1.51%)      19.20
    FD003→FD004    21.41 (+0.14%)           21.21 (-0.80%)      21.38
    FD004→FD001    20.62 (+5.04%)           19.90 (+1.38%)      19.63
    FD004→FD002    19.71 (+3.36%)           18.06 (-5.39%)      19.70
    FD004→FD003    14.04 (-0.57%)           14.26 (+0.99%)      14.12
    Average        (+1.68%)                 (+0.04%)

    Score
    Scenario       w/o time-series mix-up   w/o domain mix-up   DMADA
    FD001→FD002    2204 (+9.29%)            2039 (+1.10%)       2017
    FD001→FD003    1724 (-28.67%)           2244 (-7.15%)       2416
    FD001→FD004    7265 (-0.81%)            7288 (-0.50%)       7324
    FD002→FD001    312 (-1.32%)             333 (+5.33%)        316
    FD002→FD003    3860 (+7.96%)            4090 (+14.39%)      3575
    FD002→FD004    23245 (+81.16%)          17007 (+32.55%)     12831
    FD003→FD001    2042 (+22.46%)           1231 (-26.17%)      1667
    FD003→FD002    6801 (+47.58%)           6568 (+42.52%)      4608
    FD003→FD004    6028 (+72.03%)           3559 (+1.57%)       3503
    FD004→FD001    1243 (+13.74%)           1153 (+5.51%)       1093
    FD004→FD002    6701 (+79.41%)           3653 (-2.20%)       3735
    FD004→FD003    859 (+29.08%)            627 (-5.78%)        665
    Average        (+27.66%)                (+5.10%)

V. CONCLUSION

We propose a novel Dual Mix-up Adversarial Domain Adaptation approach to conduct cross-domain RUL prediction. DMADA adds the time-series mix-up and the domain mix-up to the adversarial domain adaptation framework. The enriched dataset leads to a more continuous latent feature space, and better alignment between the source domain and the target domain is achieved because DMADA encourages the sample distribution to be closer to the real distribution in both domains. DMADA obtains satisfactory RUL prediction precision on the C-MAPSS dataset: average improvements of 6% and 18% in the two evaluation metrics are achieved compared with the state-of-the-art methods, indicating its effectiveness.

REFERENCES

[1] Fault Diagnosis. John Wiley & Sons, Ltd, 2006, ch. 5, pp. 172–283.
[2] Y. Lei, N. Li, S. Gontarz, J. Lin, S. Radkowski, and J. Dybala, "A model-based method for remaining useful life prediction of machinery," IEEE Trans. Reliab., vol. 65, no. 3, pp. 1314–1326, 2016.
[3] N. Li, Y. Lei, T. Yan, N. Li, and T. Han, "A Wiener-process-model-based method for remaining useful life prediction considering unit-to-unit variability," IEEE Trans. Ind. Electron., vol. 66, no. 3, pp. 2092–2101, 2019.
[4] Y. Wang, Y. Zhao, and S. Addepalli, "Remaining useful life prediction using deep learning approaches: A review," Procedia Manufacturing, vol. 49, pp. 81–88, 2020.
[5] B. Yang, R. Liu, and E. Zio, "Remaining useful life prediction based on a double-convolutional neural network architecture," IEEE Trans. Ind. Electron., vol. 66, no. 12, pp. 9521–9530, 2019.
[6] K. Liu, Y. Shang, Q. Ouyang, and W. D. Widanage, "A data-driven approach with uncertainty quantification for predicting future capacities and remaining useful life of lithium-ion battery," IEEE Trans. Ind. Electron., vol. 68, no. 4, pp. 3170–3180, 2021.
[7] G. Sateesh Babu, P. Zhao, and X.-L. Li, "Deep convolutional neural network based regression approach for estimation of remaining useful life," in International Conference on Database Systems for Advanced Applications, 2016, pp. 214–228.
[8] Y. Cheng, J. Wu, H. Zhu, S. W. Or, and X. Shao, "Remaining useful life prognosis based on ensemble long short-term memory neural network," IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–12, 2020.
[9] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, "mixup: Beyond empirical risk minimization," arXiv preprint arXiv:1710.09412, 2017.
[10] Y. Wu, D. Inkpen, and A. El-Roby, "Dual mixup regularized learning for adversarial domain adaptation," vol. 12374, 2020, pp. 540–555.
[11] H. Li, W. Zhao, Y. Zhang, and E. Zio, "Remaining useful life prediction using multi-scale deep convolutional neural network," Applied Soft Computing, vol. 89, p. 106113, 2020.
[12] Z. Shi and A. Chehade, "A dual-LSTM framework combining change point detection and remaining useful life prediction," Reliability Engineering & System Safety, vol. 205, p. 107257, 2021.
[13] A. Al-Dulaimi, S. Zabihi, A. Asif, and A. Mohammadi, "Hybrid deep neural network model for remaining useful life estimation," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 3872–3876.
[14] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, "Domain-adversarial training of neural networks," 2016.
[15] M. Long, Z. Cao, J. Wang, and M. I. Jordan, "Conditional adversarial domain adaptation," in Advances in Neural Information Processing Systems, vol. 31, 2018.
[16] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, "Simultaneous deep transfer across domains and tasks," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
[17] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[18] P. R. de Oliveira da Costa, A. Akçay, Y. Zhang, and U. Kaymak, "Remaining useful lifetime prediction via deep domain adaptation," Reliability Engineering & System Safety, vol. 195, p. 106682, 2020.
[19] M. Ragab, Z. Chen, M. Wu, C. K. Kwoh, and X. Li, "Adversarial transfer learning for machine remaining useful life prediction," in 2020 IEEE International Conference on Prognostics and Health Management (ICPHM), 2020, pp. 1–7.
[20] M. Ragab, Z. Chen, M. Wu, C. S. Foo, C. K. Kwoh, R. Yan, and X. Li, "Contrastive adversarial domain adaptation for machine remaining useful life prediction," IEEE Trans. Ind. Informatics, vol. 17, no. 8, pp. 5239–5249, 2021.
[21] J. Li, X. Li, and D. He, "Domain adaptation remaining useful life prediction method based on AdaBN-DCNN," in 2019 Prognostics and System Health Management Conference (PHM-Qingdao), 2019, pp. 1–6.
[22] S. Fu, Y. Zhang, L. Lin, M. Zhao, and S.-S. Zhong, "Deep residual LSTM with domain-invariance for remaining useful life prediction across domains," Reliability Engineering & System Safety, vol. 216, p. 108012, 2021.
[23] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[24] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735–1780, 1997.
[25] J. Shen, Y. Qu, W. Zhang, and Y. Yu, "Wasserstein distance guided representation learning for domain adaptation," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[26] A. Saxena, K. Goebel, D. Simon, and N. Eklund, "Damage propagation modeling for aircraft engine run-to-failure simulation," in 2008 International Conference on Prognostics and Health Management, 2008.

