IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 18, NO. 12, DECEMBER 2022 9057

LM-CNN: A Cloud-Edge Collaborative Method for Adaptive Fault Diagnosis With Label Sampling Space Enlarging

Lei Ren, Member, IEEE, Zidi Jia, Graduate Student Member, IEEE, Tao Wang, Yehan Ma, Member, IEEE, and Lihui Wang

Abstract—In cloud manufacturing systems, fault diagnosis is essential for ensuring stable manufacturing processes. The most crucial performance indicators of fault diagnosis models are generalization and accuracy. An urgent problem is the lack and imbalance of fault data. Most existing approaches to this issue demand the labels of faults as a priori knowledge and require extensive target fault data. These approaches may also ignore the heterogeneity of various equipment. To address these issues, in this article, we propose a cloud-edge collaborative method for adaptive fault diagnosis with label sampling space enlarging in cloud manufacturing, named label-split multiple-inputs convolutional neural network (LM-CNN). First, a multiattribute cooperative representation-based fault label sampling space enlarging approach is proposed to extend the variety of diagnosable faults. Second, a multi-input multi-output data augmentation method with label-coupling weighted sampling is developed. Third, a cloud-edge collaborative adaptation approach to fault diagnosis for scene-specific equipment in cloud manufacturing systems is proposed. Experiments demonstrate the effectiveness and accuracy of our method.

Index Terms—Cloud-edge collaboration, cloud manufacturing system, fault diagnosis, label-split multiple-inputs convolutional neural network (LM-CNN).

Manuscript received 10 May 2022; accepted 28 May 2022. Date of publication 8 June 2022; date of current version 30 September 2022. This work was supported by the National Key Research and Development Program of China under Grant 2020AAA0109202 and the NSFC (National Natural Science Foundation of China) under Project 92167108 and Project 62173023. Paper no. TII-22-2020. (Corresponding authors: Lei Ren; Zidi Jia.)

Lei Ren, Zidi Jia, and Tao Wang are with the School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China (e-mail: renlei@buaa.edu.cn; jiazidi@buaa.edu.cn; iamownt@buaa.edu.cn).

Yehan Ma is with the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: yehanma@sjtu.edu.cn).

Lihui Wang is with the Department of Production Engineering, KTH Royal Institute of Technology, 10044 Stockholm, Sweden (e-mail: lihui.wang@iip.kth.se).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2022.3180389.

Digital Object Identifier 10.1109/TII.2022.3180389

1551-3203 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Shenzhen Institute of Advanced Technology CAS. Downloaded on December 07,2023 at 14:42:36 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

THE booming development of cloud computing, the Internet of Things (IoT), and other technologies has led to profound changes in the production and operation modes of the manufacturing industry. Cloud resources have been applied in depth to all stages of manufacturing. A new manufacturing paradigm, namely cloud manufacturing (CMfg) [1], has been spawned, driving the development and transformation of today's manufacturing industry.

As the most important manufacturing resource in a CMfg system, manufacturing equipment must operate stably and continuously. Predictive maintenance of complex equipment is particularly crucial, since equipment failures may bring production processes to a halt and pose significant safety risks [2]. As an essential part of predictive maintenance, fault diagnosis plays an important role in engineering systems [3], such as generating sets, chemical equipment, and precision machine tools, which require extremely high reliability and safety for efficient operation. It is critical to detect and identify any type of potential anomaly or fault as soon as possible and to implement fault-tolerant operations that minimize performance degradation and avoid hazardous situations [4]. Due to the distributed heterogeneous integration, on-demand dynamic architecture, collaborative interoperability, and high scalability of CMfg systems [5], the equipment fault diagnosis system in CMfg needs to be more integrated, low-cost, and efficient.

CMfg systems connect massive amounts of manufacturing equipment in the industrial Internet of Things (IIoT) [6]. Monitoring this equipment generates large amounts of industrial data [7], which contain abundant information for equipment fault diagnosis. With the development of technologies such as Big Data and artificial intelligence (AI), data-driven equipment health indicator prediction methods have been extensively studied and are gradually becoming the mainstream methods [8], [9]. For instance, Ren et al. [10] proposed a convolutional deep belief network (CDBN) that autonomously selects sensitive features. Wang et al. [11] proposed a multiattention 1-D convolutional neural network (MA1DCNN) to enhance the extraction of features from fault pulses. Guo et al. [12] proposed a multitask convolutional neural network with multidomain feature fusion for fault diagnosis and localization. It is evident that industry-specific deep learning methods have become the trend in addressing fault diagnosis issues.

Meanwhile, the distributed and integrated characteristics of CMfg systems also provide new solution paradigms for
industrial equipment fault diagnosis [13], [14]. The distributed cloud-edge-based industrial data analysis model has become the development trend and has received extensive interest from academia and industry [15]. Cloud-edge collaborative solutions can reduce latency and improve scalability [16], [17]. A large number of relevant studies have been conducted. For instance, Kaur et al. [18] proposed a software-defined networking-based edge-cloud interaction approach to handle streaming Big Data in IIoT environments. Tang et al. [19] proposed a cloud-assisted and edge-decision manufacturing architecture that encompasses both the cloud and the production edge, providing data access and autonomous decision-making capabilities. Li et al. [20] proposed an orchestration scheme that integrates edge-centric computing and content-centric networking to improve the service capabilities of 5G mobile networks.

Though fault diagnosis in CMfg systems has been extensively researched, numerous pressing issues remain to be addressed. First, as industrial processes and industrial equipment structures become increasingly complex, many unknown conditions will inevitably occur [21]. Historical data may be unavailable for their quantitative analysis, so these unknown conditions are often not predictable. However, without prompt handling, these conditions may lead to many immeasurable consequences. Second, as IIoT Big Data in CMfg systems is generated on a large scale without interruption, the data collected from various equipment may be unbalanced [22]. The sample amounts of different faults may vary greatly, and the distribution of the collected historical data may not match the actual distribution. This greatly limits the accuracy of fault diagnosis models [23]. Third, even if collaborative training of fault diagnosis models with data collected from various edges is a feasible and effective method, the working conditions of these scenarios may be diverse, resulting in poor scenario adaptation of the trained models and possibly unnecessary costs. These problems make it difficult for general methods to predict effectively.

In CMfg systems, industrial data from various edges can be fused with cloud-edge techniques to jointly train fault diagnosis models in the cloud [24]. With these data collected across scenes, a more comprehensive and accurate fault diagnosis system can be constructed. We intend to address the data-related issues from two aspects: interdata correlation and interlabel correlation. Conventional deep learning-based fault diagnosis methods generally just train a simple classifier to distinguish all known faults, which is clearly not an ideal way of diagnosis. The monitoring signals of similar faults (e.g., faults of the same diameter or location) may have similar characteristics, making it much more difficult to classify two faults with similar labels than to distinguish two faults with completely disparate labels. However, data from faults with highly similar labels can complement each other and play sample-enhancing roles. Besides, with the coupling and orthogonality of various attributes among labels, the label sampling space can be enlarged. To solve these issues, we propose a cloud-edge collaborative adaptive fault diagnosis method, the label-split multiple-inputs convolutional neural network (LM-CNN). The main contributions are as follows. First, a fault label sampling space enlarging method with label-splitting is proposed. The labels of faults are deconstructed into sublabels with various attributes. These sublabels are analyzed with a multitask learning approach to redefine the faults. The sublabels can also be adopted to characterize some unknown faults that lack data and to make predictive diagnoses. Second, we develop a label-coupling weighted sampling-based multiple-input multiple-output (MIMO) data augmentation method and a compatible neural network structure. The data from each edge are resampled and aggregated to eliminate the bias of the sample distribution and are utilized to capture the variability and correlation among faults. Third, a cloud-edge collaborative scene-specific fault diagnosis adaptation approach is proposed. The global fault diagnosis model in the cloud is further distilled for scene-specific and real-time prediction at each edge.

The rest of this article is organized as follows. Section II presents the related works. A fault diagnosis method with label sampling space enlarging is presented in Section III. The cloud-edge collaborative framework of the adaptive fault diagnosis model is presented in Section IV. In Section V, we verify the effectiveness of the proposed method through experiments on bearing fault diagnosis. Finally, Section VI concludes this article.

II. RELATED WORKS

This section briefly reviews the related works on fault diagnosis for faults lacking data and on cloud-edge collaborative fault diagnosis methods. We subsequently analyze several problems of existing fault diagnosis methods.

A. Adaptive Fault Diagnosis for Faults Lacking of Data

With the rapid development of modern industrial systems, the complexity of equipment is constantly increasing. The reliable operation of equipment is the key to industrial manufacturing. Therefore, predictive maintenance of equipment, especially fault diagnosis, receives significant interest. However, due to the dynamic nature of production tasks and of the status of industrial equipment, the potential safety hazards of the equipment are uncertain and constantly changing. With the increasing complexity of industrial equipment and industrial processes, many unknown faults lack historical data. Besides, in an industrial scenario, few or no data are available for faults that occur in daily process operations. Since many faults may pose a significant safety hazard, production equipment will mostly not be allowed to run to failure in order to collect samples for training a fault diagnosis system [25]. This means that obtaining sufficient fault samples is time-consuming and expensive.

To solve the problem of data imbalance or a lack of fault data, it is usually necessary to augment the data. A common approach is to adopt generative models to generate fault data. For instance, Wu et al. [26] used a locally weighted minority oversampling strategy to identify information from a small number of fault samples and generated fault samples using an EM-based interpolation algorithm. Liu et al. [27] proposed
a deep feature-augmented generative adversarial network to improve the performance of imbalanced fault diagnosis. Another acceptable approach is to utilize easily available historical fault data to construct fault diagnosis models that are then applied to faults for which data are lacking [28]. Such issues are usually solved by transfer learning, meta-learning, or domain adaptation methods. For instance, Feng and Zhao [29] utilized zero-shot learning to describe the fault classes and used supervised principal component analysis to extract features associated with the attributes. Lu et al. [30] utilized a sparse autoencoder for deep domain adaptation between training and target faults.

However, these methods also demand the labels of target faults as a priori knowledge or require extensive target fault data. Thus, these methods can only perform simple knowledge transfer between known faults and cannot make predictions for unknown faults. Therefore, an adaptive fault diagnosis method that can automatically make predictions for unknown faults is of practical value. We attempt to make predictions for faults that lack samples by enlarging the sampling space of fault labels. By analyzing known faults, a larger label space is constructed. The attributes in the labels are deconstructed, and the characteristics of the faults are analyzed from the perspective of various conjunctive attributes. Thus, the constructed fault diagnosis model can diagnose far more fault classes than the known faults.

B. Cloud-Edge Collaborative Fault Diagnosis

CMfg systems require a distributed and integrated fault diagnosis architecture, so cloud-edge collaboration technologies are needed to empower that architecture. Cloud computing is capable of handling huge amounts of complex data for resource sharing and computing. Edge computing resources have low transmission latency and can be configured for real-time processing in IIoT edge-specific scenarios.

Since cloud-edge collaboration builds a flexible and efficient computing architecture for cloud manufacturing [31], it has been extensively applied in the field of industrial equipment fault diagnosis. This cloud-edge approach to fault diagnosis can improve the training speed and agility of the model [32]. Data from multiple edges can also be fused to collaboratively train models [33]. However, most existing cloud-edge-based approaches do not account for both the relevance and the heterogeneity of each edge scenario. Most cloud-edge-based IIoT Big Data analytics approaches are applied either to a single scenario or to all edge scenarios indiscriminately. As a result, these Big Data analytics models usually cannot achieve generalizability and specificity simultaneously.

It is a worthwhile research issue to structure a fault diagnosis architecture that is generalizable and applicable to all edge scenarios. This involves two main issues. One is to systematically integrate industrial data from each edge and use them to frame a global fault diagnosis model. The other is to adapt the global fault diagnosis model into a scenario-adaptive model for the specific conditions of each edge.

III. LABEL-SPLIT MULTIPLE-INPUTS CONVOLUTIONAL NEURAL NETWORK-BASED FAULT DIAGNOSIS

In this section, we present our proposed model, LM-CNN, for diagnosing faults without samples. Such faults are difficult to diagnose with traditional data-driven methods. The proposed method describes various attributes (location, size, load, etc.) of each fault by splitting the label of each fault into multiple sublabels and training the diagnostic model for each sublabel separately.

A. Multiattribute Label-Splitting Based Label Sampling Space Enlarging

Most neural networks can diagnose faults well given high-quality historical data. However, a lack of data is a common problem in real industrial situations. A prerequisite for accurate predictions of specific faults is the availability of historical data for such faults. We aim to investigate a method that makes relatively reliable predictions for certain faults even without historical data. Thus, the fault label sampling space needs to be enlarged. Our solution concept is to obtain knowledge of unknown faults by extracting the features of known faults. Industrial equipment fault data often contain many types of information, and some information exists in various faults simultaneously. This leads to relevancies among these faults, so the training process of a diagnostic model for a certain fault can adopt samples from other faults. For example, the vibration signals of two rolling bearing faults with the same location but different diameters can be adopted to complement each other's training sets.

Taking rolling bearings as an example, the fault diagnosis system needs to determine the location, diameter, and load of the faulty bearing. Assume that the label of a fault can be described as a fault of the kth diameter (k = 1, ..., K) at the jth location (j = 1, ..., J) under load i (i = 1, ..., I). Traditional fault classification methods often train discriminators for M-class classification (M is the number of faults, M ≤ I × J × K), which prevents similar labels from being utilized by the model. Meanwhile, misdiagnosis often occurs between two faults with similar labels due to their highly similar characteristics. That is, faults with similar operating conditions or similar fault types may not be well distinguished. We call the faults whose labels are similar to the target fault “Labeled Similar Faults (LSF)” and call the similar attributes sublabels, as shown in Fig. 1.

With the assistance of the LSFs, the probability of such misdiagnosis can be decreased. The historical data of a fault's LSFs can be used as enhanced data for the diagnostic model of that fault. We propose a method to enlarge the label sampling space by describing each sublabel. In doing so, some faults lacking data can be predicted.

The data distributions corresponding to the sublabels of these unknown faults can be captured by analyzing their LSFs, since the target fault shares these sublabels with its LSFs. If the various attributes of a fault can be characterized by its LSFs, it is not essential to maintain massive amounts of data for this class of fault.
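To make the label split concrete, the following Python sketch (not taken from the paper) represents each bearing fault label as a (load, location, diameter) tuple. The sublabel values, the helper names, and the rule that an LSF is any known fault sharing at least one sublabel value with the target are all assumptions made for illustration. For a target fault that has no data, the sketch collects, for each sublabel, the known faults whose samples could train the target's class of that sublabel.

```python
from itertools import product

# Hypothetical sublabel vocabularies for the rolling-bearing example (illustrative values only).
SUBLABELS = {
    "load": (0, 1, 2),
    "location": ("inner", "ball", "outer"),
    "diameter": (7, 14, 21),
}
NAMES = tuple(SUBLABELS)


def split_label(fault):
    """Split a composite label (load, location, diameter) into named sublabels."""
    return dict(zip(NAMES, fault))


def labeled_similar_faults(target, known):
    """Known faults sharing at least one sublabel value with the target (assumed LSF rule)."""
    t = split_label(target)
    return [f for f in known
            if f != target and any(split_label(f)[s] == t[s] for s in NAMES)]


def sublabel_pools(target, known):
    """For each sublabel, the known faults whose samples can train the target's class."""
    t = split_label(target)
    return {s: [f for f in known if split_label(f)[s] == t[s]] for s in NAMES}


# Every I*J*K combination is a diagnosable (fused) class; the target has no data.
all_faults = list(product(*SUBLABELS.values()))
target = (1, "ball", 14)
known = [f for f in all_faults if f != target]

lsfs = labeled_similar_faults(target, known)
pools = sublabel_pools(target, known)
print(len(all_faults), len(known), len(lsfs))   # 27 26 18
print({s: len(p) for s, p in pools.items()})     # {'load': 8, 'location': 8, 'diameter': 8}
```

With I = J = K = 3, the fused label space has 27 classes, although only 26 of them have data. Each of the three sublabel pools for the missing fault contains eight known faults. This illustrates how every class of every sublabel draws on several original classes.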


Fig. 1. Faults with the split labels. The relationship between the target fault and its LSFs can be visualized in the figure. Subsequent operations can be performed for the target fault and its LSFs.

Fig. 2. Faults with the split labels.

Of course, this approach is feasible only if the number of LSFs of such a fault is large enough. Each sublabel contains two or more classes, regardless of how the sublabels are divided. This ensures that mining the differences between the LSFs and the other faults is sufficient to qualitatively describe the characteristics of this fault.

The label in this case can be split into three sublabels: load, diameter, and position. Diagnostic modeling separately obtains a new submodel for each sublabel. With these diagnostic models, results for I × J × K fused classes are obtained to describe the faults. With the splitting, the training set for each class of each sublabel consists of multiple original classes, significantly increasing sample richness. Besides, the LSFs are no longer a distraction for the classification task, since the classification task is replaced with simpler tasks.

The prediction accuracy after label splitting is higher than the original one due to the coupling between sublabels. Assume that, for the prediction of faults, $p(T) > p(\mathrm{LSF}_i) > p(O_j)$, where $p(T)$ represents the probability of the target fault, and $p(\mathrm{LSF}_i)$ and $p(O_j)$ represent the probabilities of the ith LSF and the jth other fault, respectively. Although the model's prediction of one LSF may be biased, its impact can still be reduced by predicting the remaining LSFs. For S sublabels, the prediction function for the bearing fault in the example can be restructured as

$$\hat{p}_\theta(T \mid x) = \prod_{s=1}^{S} p_\theta(L_s \mid x) \ge p_\theta(L_1, \ldots, L_S \mid x). \tag{1}$$

B. Data Augmentation With Resampling and “ensembled” CNN Fault Diagnosis Model

In recent years, convolutional neural networks (CNNs) have gained much interest from academia and industry due to their superior ability to capture local features of data, and they have become one of the most widely studied and applied types of neural network. Besides, CNNs have fewer parameters and are less computationally intensive than attention-based mechanisms such as the Transformer, making them more suitable for industrial equipment health prediction. Causal convolution, proposed for temporal issues, has been shown to be more applicable to temporal problems than other architectures. Therefore, a CNN is adopted as the base network.

A typical CNN consists of convolutional layers, pooling layers, and fully connected layers. We utilize discrete 2-D convolution, which is

$$x'(i, j) = \sum_{m=-s}^{s} \sum_{n=-s}^{s} K(m, n)\, x(i + m, j + n) \tag{2}$$

where $x$, $x'$, and $K$ are the input, output, and convolution kernel of the convolutional layer, respectively. The network exploits residual connections to avoid vanishing and exploding gradients.

For an input $x$, the output $\hat{y}$ of a classification model is the probability distribution $p_\theta(\hat{y} \mid x)$, which inevitably brings uncertainty to the prediction. For a neural network with MIMO configuration, assuming there are M faults, the network receives M inputs simultaneously and provides M outputs [34], where each output is the prediction for its corresponding input, as shown in Fig. 2. Each of the M samples fed to the model simultaneously is sampled, probabilistically, from one of the M fault classes, so the fault class is not fixed at each input position. The number of resampled samples can be exponentially larger than the original number, so data aggregation serves as a data enhancement method. Meanwhile, since it is a resampling method, it is not affected by data imbalance in practice.

This configuration of neural networks can deliver ensembles “for free.” With only a small increase in computation, discrepancies among various faults can be captured [35]. With this “ensemble” resampling method, the resampled data may match the characteristics of the application scenario better than the original dataset. The resampling strategy is that M sets of input samples are resampled independently from the training set and aggregated together.

The model consists of two parts: a CNN module and a perceptron. To the best of our knowledge, MIMO-based data augmentation methods have not been used in the field of industrial equipment fault diagnosis. The CNN module is arranged in a MIMO structure to extract the hidden local patterns, whereas the perceptron is employed to classify the faults with the captured patterns. The training process is modified relative to the
conventional neural network. M samples are stitched together as a whole and fed into the neural network simultaneously. The loss function is the negative natural logarithm of the predictions of the M sets, as shown below:

$$L_M(\theta) = \mathbb{E}_{(x_1, y_1) \in X, \ldots, (x_M, y_M) \in X} \left[ \sum_{m=1}^{M} -\log p_\theta(y_m \mid x_1, \ldots, x_M) \right]. \tag{3}$$

At the time of evaluation, the input data are tiled M times, i.e., $x_1 = x_2 = \cdots = x_M$. Since each input is the same, each output of the model can be used to predict the fault class independently. In addition, the M outputs of the model can be “ensembled” through a fully connected network to provide more accurate predictions, where $\alpha_m$ is the weight of the mth output:

$$p_\theta(\hat{y} \mid x) = \sum_{m=1}^{M} \alpha_m\, p_\theta(y_m \mid x_1, \ldots, x_M). \tag{4}$$

Though MIMO can improve robustness and reduce uncertainty, it does not change class specificity in the classification task. In fact, since the MIMO structure can be regarded as an “ensemble,” a class-specific resampling method for each “subnetwork” naturally suggests itself. If a subnetwork focuses more on a specific fault, the ability of that subnetwork to describe that fault is enhanced. Furthermore, from a macro perspective, the whole network can explore both the specific characteristics of each fault and the similarities and discrepancies among faults.

Addressing this problem from a modeling perspective is obviously costly and may lead to uncertainties. From another perspective, since the input data of the MIMO network need to be resampled, the class specificity of each “subnetwork” can be achieved through the resampling method.

The effect of sublabels can be considered when resampling. For each subnetwork, the priority of interest, from highest to lowest, is its target fault, the LSFs of that target, and the other faults. Assume that the original label is split into N sublabels. For sublabel n (n = 1, ..., N), let $\varphi_n$ denote the probability that the target fault's class of that sublabel is sampled. It should be ensured that the target label as a whole is sampled with probability $\Phi_T > 0.5$, i.e.,

$$\Phi_T = \prod_{n=1}^{N} \varphi_n > 0.5. \tag{5}$$

For the LSFs and the other faults, the sampling probabilities $\Phi_{\mathrm{LSF}}$ and $\Phi_O$ are

$$\Phi_{\mathrm{LSF}} = \prod_{n_l=1}^{N_l} \varphi_{n_l} \prod_{n_o=1}^{N-N_l} \left(1 - \varphi_{n_o}\right), \qquad \Phi_O = \prod_{n=1}^{N} \left(1 - \varphi_n\right) \tag{6}$$

where $N_l$ is the number of similar sublabels of the LSFs, and $\varphi_{n_l}$ represents the probability of the lth sublabel of the LSF.

For faults without samples, the data from their LSFs can be utilized to form their training data. Such a fault is used as an independent fault category for training the model, and the samples of its LSFs are used as the training set for that fault. In this case, the subnetworks of the MIMO network that correspond to the missing faults are no longer fed with target fault samples, and the remaining faults maintain their original proportions, i.e., the resampling probabilities are

$$\Phi_{\mathrm{LSF}} = \frac{1}{1 - \prod_{n=1}^{N} \varphi_n} \prod_{n_l=1}^{N_l} \varphi_{n_l} \prod_{n_o=1}^{N-N_l} \left(1 - \varphi_{n_o}\right), \qquad \Phi_O = \frac{1}{1 - \prod_{n=1}^{N} \varphi_n} \prod_{n=1}^{N} \left(1 - \varphi_n\right). \tag{7}$$

Assuming that there are K faults with historical data and n faults without historical data, it is feasible to construct a more specific model for the missing faults by training a model with K + n inputs and K + n outputs.

C. Multitask Network Adaptive Fine-Tuning

As mentioned above, faults with similar labels may be confused by the classification model, yet they contain much shared fault information due to their great similarity. However, mining the similarities and differences between LSFs requires complex neural network structures, which may also cause mutual interference. Therefore, a pretraining–fine-tuning approach is adopted. The differences between faults are captured in pretraining; in fine-tuning, the correlations between faults are exploited to capture the differences within sublabels.

The fine-tuning stage classifies sublabels, so the labels are split beforehand. The labels are split into sublabels according to the conditions and the multiple attributes contained within the faults, such as the type of workpiece, the position on the workpiece, the diameter of the fault, and the load of the equipment. Suppose there are N sublabels and each sublabel can be divided into $c_n$ classes (n = 1, ..., N); then a total of $\prod_{n=1}^{N} c_n$ faults can be fused from the split sublabels.

In the fine-tuning stage, a classifier is designed for each sublabel using the pretrained model (it is also feasible to fuse several sublabels into one). In this phase, the structure of the CNN layer is kept unchanged and only the structure of the MLP layer is modified.

An MLP is utilized in the pretraining model for fault classification. For the split sublabels, however, an MLP fails to highlight the local specificity of sublabels. Thus, a radial basis function (RBF) neural network is adopted. The RBF neural network contains three layers, i.e., the input, hidden, and output layers. The input is projected nonlinearly to the hidden layer with RBFs, and the hidden layer is then mapped to the output layer with a fully connected layer. The activation function of the radial basis can be expressed as

$$R(x) = \exp\left( -\frac{1}{2\sigma^2} \sum_{k=1}^{K} \left\lVert x_k - c_i \right\rVert^2 \right) \tag{8}$$

where $\sigma$ and $c_i$ are the parameters of the Gaussian kernel.

The calculation of the loss function also takes class specificity into account when training the model. The MIMO model adopted in this article has a total of $M^2$ outputs for M faults, and the output $O_{ij}$ represents the possibility that the
input sample of the ith subnetwork is predicted to be the jth fault (i, j = 1, ..., M). The total loss is defined as follows:

$$L_M(\theta) = \mathbb{E}_{(x_1, y_1) \in X, \ldots, (x_M, y_M) \in X} \left[ \xi L_M^{T}(\theta) + \eta L_M^{\mathrm{LSF}}(\theta) + \zeta L_M^{O}(\theta) \right] \tag{9}$$

where $L^{T}$, $L^{\mathrm{LSF}}$, and $L^{O}$ represent the losses of the subnetworks corresponding to the target fault, the LSFs, and the other faults, respectively (the correspondence refers to the position of the subnetwork rather than to the real sample), and $\xi$, $\eta$, and $\zeta$ are coefficients with $\xi \ge \eta \ge \zeta > 0$.

The flow of pretraining and fine-tuning is shown in Fig. 3. The CNN layer of the pretrained model is still retrained, and the MLP layer is replaced by N groups of RBF modules (N is the number of split sublabels). Each RBF module predicts which category of its sublabel the fault belongs to. The prediction results of the RBF modules are fused through a fully connected layer to produce the final fault diagnosis results.

Fig. 3. Pretraining and fine-tuning model of our method.

IV. CLOUD-EDGE COLLABORATIVE FRAMEWORK FOR LM-CNN-BASED ADAPTIVE FAULT DIAGNOSIS

Given that both the training and the application of our proposed model are cross-scene and cross-condition, a cloud-edge collaborative model is considered to improve it. In this section, given the flexibility and real-time capability of edge computing and the efficiency of cloud computing, a cloud-edge collaborative industrial equipment fault diagnosis framework is proposed, as shown in Fig. 4. It consists of three parts: the equipment plane, the edge plane, and the cloud plane.

Fig. 4. Cloud-edge collaborative framework for our method.

A. Edge Plane

1) Data Collection and Data Preprocessing: Real-time data collected from industrial equipment by sensors are transmitted to the edge for preprocessing. The collected data may be unstructured and heterogeneous. In addition, these data may have missing, redundant, and noisy values. Therefore, missing values in the collected data should be filled in, outliers removed, and noise suppressed. The hidden noise in the data is removed by wavelet transform. The data preprocessed in the edge plane are then transferred to the cloud plane.

B. Cloud Plane

After the data are preprocessed on the edge plane, the data from various equipment and plants are uploaded to the servers in the cloud plane. The data are used to train the fault diagnosis model. As the data are updated, the model continuously undergoes incremental evolution. According to the specificity requirements of each edge, the models in the cloud are compressed and transmitted to each edge to meet the real-time requirements of fault diagnosis tasks.

1) LM-CNN Fault Diagnosis Model Training: The updated data are resampled and aggregated for feature augmentation to train the initial MIMO fault diagnosis model. Meanwhile, another set of resampled data, processed by label splitting, is utilized to fine-tune the pretrained model to obtain a global fault diagnosis model.

The pretraining model is preferably trained with as many faults as possible. On the one hand, this improves the information-mining ability of the pretraining model, which is the basis for the high performance of the model. On the other hand, more adequate samples in the fine-tuning phase ensure adequate information mining of the sublabels of faults.

2) Scene-Specific Fault Diagnosis Adaptation: A specific edge may have working conditions that are much less complex than the faults described by the global fault data, and the counts of some sublabels there may be much smaller than the global ones. As a result, the complex global fault diagnosis model may not accurately describe the faults at that edge, and it may also consume unnecessary computational resources. Thus, the model is subject to tuning.

Authorized licensed use limited to: Shenzhen Institute of Advanced Technology CAS. Downloaded on December 07,2023 at 14:42:36 UTC from IEEE Xplore. Restrictions apply.
REN et al.: LM-CNN: A CLOUD-EDGE COLLABORATIVE METHOD FOR ADAPTIVE FAULT DIAGNOSIS 9063

The global fault diagnosis model can be modified with scenario TABLE I
CLASS INFORMATION OF EXPERIMENT ON CWRU DATASET WITH VARIOUS
specificity and then utilized for equipment fault diagnosis in that LOCATIONS AND DIAMETERS
edge.
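The fine-tuning head described before Section IV (the MLP layer replaced by N RBF modules, one per sublabel, whose outputs are fused by a fully connected layer) can be illustrated with a minimal NumPy sketch of its forward pass. It assumes Gaussian RBF units with a shared width and a linear read-out inside each module; the class names, the number of centers, the kernel width, and the random initialization are hypothetical stand-ins for parameters that would be learned during fine-tuning, and random vectors replace the CNN features.

```python
import numpy as np


class RBFModule:
    """One sublabel head: Gaussian RBF units followed by a linear read-out.

    Hypothetical sketch: centers and read-out weights are randomly initialized
    here; in a real model they would be learned during fine-tuning.
    """

    def __init__(self, in_dim, n_centers, n_classes, gamma, rng):
        self.centers = rng.normal(size=(n_centers, in_dim))
        self.gamma = gamma
        self.readout = rng.normal(scale=0.1, size=(n_centers, n_classes))

    def __call__(self, feats):
        # Squared Euclidean distance from every feature vector to every center.
        d2 = ((feats[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=-1)
        phi = np.exp(-self.gamma * d2)  # (batch, n_centers) RBF activations
        return phi @ self.readout       # (batch, n_classes) scores within this sublabel


class LabelSplitHead:
    """N RBF modules (one per sublabel) fused by a fully connected layer."""

    def __init__(self, in_dim, sublabel_sizes, n_faults, n_centers=16, seed=0):
        rng = np.random.default_rng(seed)
        gamma = 1.0 / in_dim  # assumed kernel width, scaled to the feature dimension
        self.rbf_modules = [RBFModule(in_dim, n_centers, k, gamma, rng) for k in sublabel_sizes]
        self.fc_w = rng.normal(scale=0.1, size=(sum(sublabel_sizes), n_faults))
        self.fc_b = np.zeros(n_faults)

    def __call__(self, feats):
        sub_scores = [m(feats) for m in self.rbf_modules]  # one score block per sublabel
        fused = np.concatenate(sub_scores, axis=1)
        return sub_scores, fused @ self.fc_w + self.fc_b   # per-sublabel and final fault scores


if __name__ == "__main__":
    # Two sublabels as in the CWRU experiments: location (B, IR, OR) and diameter (07, 14, 21).
    head = LabelSplitHead(in_dim=64, sublabel_sizes=[3, 3], n_faults=9)
    feats = np.random.default_rng(1).normal(size=(4, 64))  # stand-in for CNN features
    sub_scores, fault_scores = head(feats)
    print([s.shape for s in sub_scores], fault_scores.shape)  # [(4, 3), (4, 3)] (4, 9)
```

Returning the per-sublabel scores alongside the fused output keeps the location and diameter predictions readable on their own, which is what allows attributes of a fault to be described even when the fused prediction is wrong.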
Through information exchange between the edge-computing resources of this plant and the cloud-computing resources, training samples applicable to this scenario are screened out. Assume there are M faults in the global fault label space in the cloud, of which m faults are to be diagnosed at the edge (m < M), and the data of m̂ corresponding faults are available in the cloud, as shown in Fig. 4. Clearly, there is no need to use the whole training dataset.
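The screening of scenario-relevant training faults can be sketched in plain Python. The sketch treats a fault's LSFs as the faults that share at least one sublabel (location or diameter) with it, which is our reading of the term rather than a definition stated here, and it assumes CWRU-style fault codes such as "IR14"; the function names are hypothetical.

```python
def split_label(fault):
    """Split a CWRU-style code such as 'IR14' into its (location, diameter) sublabels."""
    location = fault.rstrip("0123456789")
    return location, fault[len(location):]


def screen_training_faults(target_faults, faults_with_data):
    """Keep faults that have data and share a sublabel with at least one target fault.

    Assumption: such faults are the target faults themselves or their LSFs.
    """
    target_locations = {split_label(f)[0] for f in target_faults}
    target_diameters = {split_label(f)[1] for f in target_faults}
    selected = []
    for fault in faults_with_data:
        location, diameter = split_label(fault)
        if location in target_locations or diameter in target_diameters:
            selected.append(fault)
    return selected


if __name__ == "__main__":
    global_faults = [loc + d for loc in ("B", "IR", "OR") for d in ("07", "14", "21")]
    edge_d_targets = ["OR07", "OR14", "IR07", "IR14"]
    with_data = [f for f in global_faults if f != "OR14"]  # OR14 has no historical data
    print(screen_training_faults(edge_d_targets, with_data))
    # ['B07', 'B14', 'IR07', 'IR14', 'IR21', 'OR07', 'OR21']
```

With the edge D setting of the scene-specific experiment in Section V-B, this rule keeps every Table I fault except B21 and OR14, which matches the training set used there.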
The tuning proceeds in two steps. First, because the conditions on an individual edge are relatively simple compared with the global ones, not all faults need to be considered. The first step of the compression therefore prunes the global model: the input and output neurons that are unrelated to the target edge are removed (each input position of the model is associated with a particular fault). Second, the pruned model is fine-tuned to restore accuracy by distilling the prepruning model into it: the output of the penultimate layer of the global model serves as the teacher label that the pruned model learns to reproduce. In this step, only the faults of the target edge and the LSFs of these faults are used as training samples for distillation.

Fig. 5. Scene-specific faults in global faults.
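The two tuning steps can be sketched on a dense stand-in for the network. The sketch assumes that each input slot and each output neuron is tied to one fault, so pruning reduces to selecting first-layer columns and output-layer rows, and it assumes a mean-squared-error distillation loss between the penultimate-layer outputs of the pruned student and the global teacher; the loss is not stated here, the slot layout is hypothetical, and all function names are illustrative.

```python
import numpy as np


def prune_input_slots(first_w, slot_dim, keep_slots):
    """Keep the first-layer columns of the input slots tied to relevant faults.

    first_w has shape (hidden, n_slots * slot_dim); slot s occupies columns
    s * slot_dim ... (s + 1) * slot_dim - 1 (assumed layout).
    """
    cols = np.concatenate([np.arange(s * slot_dim, (s + 1) * slot_dim) for s in keep_slots])
    return first_w[:, cols]


def prune_output_layer(weight, bias, keep_faults):
    """Keep only the output neurons (rows) of faults relevant to the target edge."""
    keep = np.asarray(keep_faults)
    return weight[keep], bias[keep]


def distillation_loss(student_feats, teacher_feats):
    """Assumed MSE between student and teacher penultimate-layer outputs."""
    return float(np.mean((student_feats - teacher_feats) ** 2))


def distill_step(student_w, x, teacher_feats, lr=0.1):
    """One gradient-descent step for a linear student x @ W on the MSE above."""
    pred = x @ student_w
    grad = 2.0 * x.T @ (pred - teacher_feats) / pred.size
    return student_w - lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 5))           # stand-in for samples of the edge's faults and their LSFs
    teacher = x @ rng.normal(size=(5, 3))  # stand-in for the global model's penultimate output
    w = np.zeros((5, 3))
    before = distillation_loss(x @ w, teacher)
    for _ in range(200):
        w = distill_step(w, x, teacher)
    print(before > distillation_loss(x @ w, teacher))  # True
```

Because only the rows and columns tied to the edge's faults survive, the pruned model is narrower, which is consistent with the halved model width and nearly halved computation time reported for edge D in Section V-B.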

V. INDUSTRIAL CASE STUDY


We demonstrate the performance of the LM-CNN-based fault diagnosis method by verifying its accuracy and its ability to diagnose unknown faults. We also validate the effectiveness of the scenario-specific model adaptation method. Without loss of generality, we use existing neural networks (ResNet18, ResNet34) and their versions improved with our method for the comparison experiments.

A. Experimental Setup

1) Dataset: To verify the effectiveness of LM-CNN, comparative experiments were conducted on the CWRU dataset [36], a widely used bearing fault dataset provided by the Case Western Reserve University Electrical Technology Laboratory. The dataset was collected by accelerometers under four load conditions (0, 1, 2, and 3 hp) and covers four health states: normal, inner race fault, outer race fault, and ball fault. The damage diameters are 7, 14, 21, and 28 mils. Fan-end and drive-end acceleration data are recorded at a sampling frequency of 12 kHz, and drive-end acceleration data are also recorded at 48 kHz.

The drive-end acceleration signal sampled at 12 kHz is used for the fault diagnosis experiments. To test the performance of the method, we select faults with damage diameters of 7, 14, and 21 mils at the ball (B), the inner race (IR), and the 6 o'clock position of the outer race (OR), as shown in Table I. Before resampling, each fault contains 470 training samples, and each sample contains 512 sampling points. In the experiments, the label of each fault was split into two sublabels: location and diameter.

2) Operating Environment: The experimental environment comprises two parts, the edge devices and the cloud server, which feed the collected data into the cloud-edge collaboration framework for fault diagnosis. One cloud server and four edge devices are configured in the cloud-edge system, as shown in Fig. 6. The cloud server is equipped with an AMD Ryzen Threadripper 3970X 32-Core CPU, 96 GB RAM, and an NVIDIA GeForce RTX 3090 GPU, and each edge device has an Intel(R) i5-8500 CPU and 24 GB RAM. The switch used for communication is a HUAWEI S1730S-L8P-A1, and the edge devices and the cloud server communicate via LAN. The equipment associated with the four edge devices operates under various loads (the specific loads are given with each experiment). Historical data of different fault categories are stored on different edge devices. Specifically, edge A stores the data of faults B07, B14, and B21; edge B stores the data of faults B14, B21, IR14, and IR21; edge C stores the data of faults IR07, IR21, OR07, and OR21; and edge D stores the data of faults IR07, IR14, OR07, and OR14.

B. Experiments and Analysis

Three sets of experiments were conducted. First, experiments with imbalanced data compare the performance of LM-CNN with several existing methods. Second, ablation experiments validate the fault diagnosis accuracy and robustness of LM-CNN and its ability to diagnose unknown faults with missing data. Third, experiments validate the effectiveness of the scene-specific approach. The batch size is 256, and the learning rate is 3e−4.
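The sample construction and label splitting described in the dataset setup can be sketched as follows: a drive-end record is cut into 512-point samples, and each fault code is split into a location and a diameter sublabel, from which the Acc, AccL, and AccD indicators used in the ablation study can be computed. Non-overlapping windows and the function names are assumptions, and the resampling step is not reproduced.

```python
import numpy as np

SEGMENT_LEN = 512  # sampling points per sample, as in the dataset setup


def segment_signal(signal, seg_len=SEGMENT_LEN):
    """Cut a 1-D vibration record into non-overlapping windows of seg_len points."""
    signal = np.asarray(signal)
    n = len(signal) // seg_len  # trailing points that do not fill a window are dropped
    return signal[: n * seg_len].reshape(n, seg_len)


def split_label(fault):
    """Split a code such as 'OR07' into its (location, diameter) sublabels."""
    location = fault.rstrip("0123456789")
    return location, fault[len(location):]


def split_accuracies(true_faults, pred_faults):
    """Return (Acc, AccL, AccD): full-label, location, and diameter accuracy."""
    pairs = [(split_label(t), split_label(p)) for t, p in zip(true_faults, pred_faults)]
    n = len(pairs)
    acc = sum(t == p for t, p in pairs) / n
    acc_l = sum(t[0] == p[0] for t, p in pairs) / n
    acc_d = sum(t[1] == p[1] for t, p in pairs) / n
    return acc, acc_l, acc_d


if __name__ == "__main__":
    record = np.sin(np.linspace(0.0, 100.0, 12_000))  # stand-in for 1 s of 12 kHz drive-end data
    print(segment_signal(record).shape)  # (23, 512)
    print(split_accuracies(["IR14", "OR07", "B21", "IR07"],
                           ["IR14", "OR14", "IR21", "B07"]))  # (0.25, 0.5, 0.75)
```

The example shows why AccL and AccD can exceed Acc: a prediction such as OR14 for OR07 is wrong as a fault but still gets the location sublabel right.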

TABLE III
COMPARISON OF DIAGNOSIS RATE (%) AMONG LM-CNN AND VARIOUS STATE-OF-THE-ART METHODS

TABLE IV
PERFORMANCE COMPARISONS AMONG CNN, SEPARATED-CNN, MIMO-CNN, AND LM-CNN

Fig. 6. Configuration of edge devices and cloud servers and cloud-edge collaboration framework.

TABLE II
DESCRIPTION OF IMBALANCE DATASETS

1) Comparison of Performance for Imbalanced Data Among Various Methods: Our data augmentation approach targets the problems of data imbalance and small data volume. In this subsection, we experimentally verify the performance of the proposed method on imbalanced data. The acceleration signals of the selected faults at 1, 2, and 3 hp are used as the training and test datasets. Four datasets are constructed to simulate various imbalanced data distributions. Idx1 contains equal proportions of samples for each fault and can therefore be regarded as a balanced dataset. In Idx2–4, B, IR, and OR are assigned different proportions, with 30%, 20%, and 10% of the samples used for training, while the test samples remain at 50% for comparison. Table II shows the division details.

We compared the proposed LM-CNN (ResNet18) with the models proposed by Yang et al. [37], Tra et al. [38], Wen et al. [39], and Chen et al. [40] through fault diagnosis experiments. The results, averaged over 10 trials, are shown in Table III. The proposed LM-CNN achieves higher diagnostic accuracy and a smaller standard deviation, and its standard deviation is significantly lower than that of MIMO-CNN. This shows the advantages of our data augmentation method and of the fine-tuning method with label splitting.

2) Performance Comparison of Various Methods for Missing Sample Data: In this part, we compare the fault diagnosis performance of LM-CNN for faults with missing sample data through ablation experiments. The acceleration signals of these faults at 2 and 3 hp are used as the training set, and those at 0 and 1 hp are used as the test set. The experimental results are shown in Table IV. We use the overall diagnostic accuracy (Acc), the location prediction accuracy (AccL), and the diameter diagnostic accuracy (AccD) to evaluate the accuracy of the models, and the diagnostic time per batch (Time) as a time-efficiency indicator. We verified the validity of each structure in the model through ablation studies: LM-CNN is compared with the basic CNN, the MIMO-augmented CNN (MIMO-CNN), and a CNN trained separately after label splitting (Separated). The experimental results show the superiority of our model. In fact, repeated experiments have shown that the MIMO-augmentation structure by itself does not significantly improve the accuracy of the diagnostic model (compare CNN and MIMO-CNN). Although the number of patterns available for extraction increases significantly, they do not seem to be directly applicable to the fault diagnosis task; MIMO acts more as an “ensemble” that brings the accuracy of the model closer to its upper limit. The characteristics captured by MIMO apparently need further processing before they become valuable information, and the fine-tuning method with label splitting is well suited to this task. The effectiveness of LM-CNN is shown by comparing LM-CNN with the other models, and the improvement of LM-CNN over MIMO-CNN demonstrates the ability of the label-splitting method to interpret and process this information. Besides, a comparison of
the computational times shows that this “ensemble” model does not significantly reduce computational efficiency.

TABLE V
PERFORMANCE COMPARISONS OF FAULTS WITHOUT DATA BETWEEN MIMO-CNN AND LM-CNN

TABLE VI
CLASSIFICATION RATE OF IDX.2 AND IDX.4
Each row in the table is the frequency of faults being misclassified to each category. The diagnosis rate for each fault is marked in red, and the LSFs for faults without data are marked in yellow.

TABLE VII
PERFORMANCE COMPARISON OF SCENE-SPECIFIC MODELS WITH GLOBAL AND LOCAL MODELS

In addition, the ability of LM-CNN to diagnose faults lacking samples was evaluated. We validated the diagnostic ability of the models when the data of various fault classes are missing. The three faults B21, IR07, and OR14 are removed from the training set; they are chosen so that the remaining faults contain all of their LSFs and more than one LSF corresponds to each of their sublabels. Table V shows that LM-CNN is robust to missing samples, where Acc(lost) denotes the diagnosis rate of faults without training data. Even when the training set is missing a class of data, the fault diagnostic capability remains similar to that of the basic model (compare Idx.2 in Table V with MIMO-CNN in Table IV). Moreover, the accuracy of LM-CNN in this case is much higher than that of the model without fine-tuning.

The diagnosis results of each fault with ResNet18 in Idx.2 and Idx.4 of Table V are shown in Table VI. Although the diagnosis rate of B21 in Idx.2 is only about 22%, it is much higher than that of the MIMO-only model (less than 0.5%). Compared with the original model, whose diagnosis rate for B21 converges to 0, LM-CNN is clearly effective. Meanwhile, LM-CNN achieves a correct diagnosis rate close to 60% for both the location sublabel and the diameter sublabel of B21. Although LM-CNN is not sufficient to make an accurate diagnosis of unknown faults, it can describe their various attributes. This shortfall may arise because our fine-tuning method does not yet fully exploit the synergy between the two sublabels. The results of Idx.4 demonstrate that the same kind of diagnosis can be made when the data of multiple faults are missing; the prediction accuracy for the diameter of OR14 is about 90%. Furthermore, comparing the diagnosis rates of the faults without missing data in the two sets shows that the diagnosis rate of a fault increases as its number of LSFs increases.

3) Validity of the Scene-Specific Method: This experiment validates the performance of the proposed scene-specific model. Suppose the equipment at edge D may fail on the outer and inner races of the bearing, with possible fault diameters of 7 and 14 mils. That is, faults OR07, OR14, IR07, and IR14 may occur in this scene, while OR14 has no historical data. The training data comprise all faults in Table I except B21 (unrelated to this edge equipment) and OR14. The proposed scene-specific model is compared with the global model and with a local model trained only with the data from this edge. To keep the comparison consistent, the local model and the global model have the same structure.

Table VII shows the experimental results. Because of insufficient fault data collection, the local model cannot diagnose fault OR14 at all; OR14 cannot even be recognized. At this edge, the fault diagnosis performance of the global model is inferior to that of the scene-specific model, because the global model can be disturbed by other faults unrelated to this edge (in particular, some faults unrelated to this edge are LSFs of the faults in this scenario). The scene-specific model, on the other hand, has a significantly lower misdiagnosis rate because the effects of unrelated faults are excluded during fine-tuning. This demonstrates the effectiveness of our approach.

The scene-specific model is also much more computationally efficient. In fact, the width of the model is
reduced by half in this set of experiments (unrelated neurons have been pruned). As a result, the computation time is also reduced by nearly half.

VI. CONCLUSION

This article proposes a fault label sampling space enlarging approach, the label-split multiple-input convolutional neural network, for adaptive fault diagnosis. The proposed method enlarges the label sampling space of faults using the existing fault data and addresses data imbalance. By exploiting the coupling of label attributes, these attributes can be analyzed separately so as to extend the sampling space of fault labels. The proposed approach first analyzes the heterogeneity and similarities of faults through data augmentation; in addition, the fault characteristics are redefined by analyzing the split fault sublabels, which also enables qualitative analysis of faults that lack historical data. Finally, a distributed fault diagnosis framework is constructed with cloud-edge computing: a global fault diagnosis model is trained in the cloud with data from each edge, and the global model is then tuned with the fault data relevant to the target scenario.

However, the method is only a preliminary attempt. Scenario-specific fault diagnosis models require more efficient construction methods, and fault data augmentation and model computation also call for more efficient methods.

REFERENCES

[1] L. Ren, L. Zhang, L. Wang, F. Tao, and X. Chai, “Cloud manufacturing: Key characteristics and applications,” Int. J. Comput. Integr. Manuf., vol. 30, no. 6, pp. 501–515, 2017.
[2] Q. Luo, Y. Chang, J. Chen, H. Jing, H. Lv, and T. Pan, “Multiple degradation mode analysis via gated recurrent unit mode recognizer and life predictors for complex equipment,” Comput. Ind., vol. 123, 2020, Art. no. 103332.
[3] B. Cai, L. Huang, and M. Xie, “Bayesian networks in fault diagnosis,” IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2227–2240, Oct. 2017.
[4] Z. Gao, C. Cecati, and S. X. Ding, “A survey of fault diagnosis and fault-tolerant techniques–Part I: Fault diagnosis with model-based and signal-based approaches,” IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3757–3767, Jun. 2015.
[5] L. Zhang et al., “Cloud manufacturing: A new manufacturing paradigm,” Enterprise Inf. Syst., vol. 8, no. 2, pp. 167–187, 2014.
[6] L. Ren, Y. Liu, D. Huang, K. Huang, and C. Yang, “MCTAN: A novel multichannel temporal attention-based network for industrial health indicator prediction,” IEEE Trans. Neural Netw. Learn. Syst., to be published, doi: 10.1109/TNNLS.2021.3136768.
[7] X. Xu, “From cloud computing to cloud manufacturing,” Robot. Comput.-Integr. Manuf., vol. 28, no. 1, pp. 75–86, 2012.
[8] H. Luo, S. Yin, T. Liu, and A. Q. Khan, “A data-driven realization of the control-performance-oriented process monitoring system,” IEEE Trans. Ind. Electron., vol. 67, no. 1, pp. 521–530, Jan. 2020.
[9] H. Shao, H. Jiang, H. Zhang, and T. Liang, “Electric locomotive bearing fault diagnosis using a novel convolutional deep belief network,” IEEE Trans. Ind. Electron., vol. 65, no. 3, pp. 2727–2736, Mar. 2018.
[10] L. Ren, T. Wang, Y. Laili, and L. Zhang, “A data-driven self-supervised LSTM-DeepFM model for industrial soft sensor,” IEEE Trans. Ind. Informat., to be published, doi: 10.1109/TII.2021.3131471.
[11] H. Wang, Z. Liu, D. Peng, and Y. Qin, “Understanding and learning discriminant features based on multiattention 1DCNN for wheelset bearing fault diagnosis,” IEEE Trans. Ind. Informat., vol. 16, no. 9, pp. 5735–5745, Sep. 2020.
[12] S. Guo, B. Zhang, T. Yang, D. Lyu, and W. Gao, “Multitask convolutional neural network with information fusion for bearing fault diagnosis and localization,” IEEE Trans. Ind. Electron., vol. 67, no. 9, pp. 8005–8015, Sep. 2020.
[13] A. H. Sodhro, S. Pirbhulal, and V. H. C. de Albuquerque, “Artificial intelligence-driven mechanism for edge computing-based industrial applications,” IEEE Trans. Ind. Informat., vol. 15, no. 7, pp. 4235–4243, Jul. 2019.
[14] R. Gao et al., “Cloud-enabled prognosis for manufacturing,” CIRP Ann., vol. 64, no. 2, pp. 749–772, 2015.
[15] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge computing: Vision and challenges,” IEEE Internet Things J., vol. 3, no. 5, pp. 637–646, Oct. 2016.
[16] X. Wang, L. Ren, R. Yuan, L. T. Yang, and M. J. Deen, “QTT-DLSTM: A cloud-edge-aided distributed LSTM for cyber-physical-social Big Data,” IEEE Trans. Neural Netw. Learn. Syst., to be published, doi: 10.1109/TNNLS.2022.3140238.
[17] X. Zhou, X. Yang, J. Ma, and K. I.-K. Wang, “Energy efficient smart routing based on link correlation mining for wireless edge computing in IoT,” IEEE Internet Things J., to be published, doi: 10.1109/JIOT.2021.3077937.
[18] K. Kaur, S. Garg, G. S. Aujla, N. Kumar, J. J. P. C. Rodrigues, and M. Guizani, “Edge computing in the industrial Internet of Things environment: Software-defined-networks-based edge-cloud interplay,” IEEE Commun. Mag., vol. 56, no. 2, pp. 44–51, Feb. 2018.
[19] H. Tang, D. Li, J. Wan, M. Imran, and M. Shoaib, “A reconfigurable method for intelligent manufacturing based on industrial cloud and edge intelligence,” IEEE Internet Things J., vol. 7, no. 5, pp. 4248–4259, May 2020.
[20] H. Li, K. Ota, and M. Dong, “ECCN: Orchestration of edge-centric computing and content-centric networking in the 5G radio access network,” IEEE Wireless Commun., vol. 25, no. 3, pp. 88–93, Jun. 2018.
[21] C. Wang, L. Guo, C. Wen, Q. Hu, and J. Qiao, “Event-triggered adaptive attitude tracking control for spacecraft with unknown actuator faults,” IEEE Trans. Ind. Electron., vol. 67, no. 3, pp. 2241–2250, Mar. 2020.
[22] F. Zhou, S. Yang, H. Fujita, D. Chen, and C. Wen, “Deep learning fault diagnosis method based on global optimization GAN for unbalanced data,” Knowl.-Based Syst., vol. 187, 2020, Art. no. 104837.
[23] W. Liang, Y. Hu, X. Zhou, Y. Pan, and K. I-Kai Wang, “Variational few-shot learning for microservice-oriented intrusion detection in distributed industrial IoT,” IEEE Trans. Ind. Informat., vol. 18, no. 8, pp. 5087–5095, Aug. 2022.
[24] J. Pan and J. McElhannon, “Future edge cloud and edge computing for Internet of Things applications,” IEEE Internet Things J., vol. 5, no. 1, pp. 439–449, Feb. 2018.
[25] C. Sun, M. Ma, Z. Zhao, S. Tian, R. Yan, and X. Chen, “Deep transfer learning based on sparse autoencoder for remaining useful life prediction of tool in manufacturing,” IEEE Trans. Ind. Informat., vol. 15, no. 4, pp. 2416–2425, Apr. 2019.
[26] Z. Wu, W. Lin, B. Fu, J. Guo, Y. Ji, and M. Pecht, “A local adaptive minority selection and oversampling method for class-imbalanced fault diagnostics in industrial systems,” IEEE Trans. Rel., vol. 69, no. 4, pp. 1195–1206, Dec. 2020.
[27] S. Liu, H. Jiang, Z. Wu, and X. Li, “Data synthesis using deep feature enhanced generative adversarial networks for rolling bearing imbalanced fault diagnosis,” Mech. Syst. Signal Process., vol. 163, 2022, Art. no. 108139.
[28] S. Shao, S. McAleer, R. Yan, and P. Baldi, “Highly accurate machine fault diagnosis using deep transfer learning,” IEEE Trans. Ind. Informat., vol. 15, no. 4, pp. 2446–2455, Apr. 2019.
[29] L. Feng and C. Zhao, “Fault description based attribute transfer for zero-sample industrial fault diagnosis,” IEEE Trans. Ind. Informat., vol. 17, no. 3, pp. 1852–1862, Mar. 2021.
[30] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, and T. Zhang, “Deep model based domain adaptation for fault diagnosis,” IEEE Trans. Ind. Electron., vol. 64, no. 3, pp. 2296–2305, Mar. 2017.
[31] H. Luo, H. Zhao, and S. Yin, “Data-driven design of fog-computing-aided process monitoring system for large-scale industrial processes,” IEEE Trans. Ind. Informat., vol. 14, no. 10, pp. 4631–4641, Oct. 2018.
[32] X. Zhou, X. Xu, W. Liang, Z. Zeng, and Z. Yan, “Deep-learning-enhanced multitarget detection for end-edge-cloud surveillance in smart IoT,” IEEE Internet Things J., vol. 8, no. 16, pp. 12588–12596, Aug. 2021.
[33] T. Wang, H. Ke, X. Zheng, K. Wang, A. K. Sangaiah, and A. Liu, “Big Data cleaning based on mobile edge computing in industrial sensor-cloud,” IEEE Trans. Ind. Informat., vol. 16, no. 2, pp. 1321–1329, Feb. 2020.
[34] M. Havasi et al., “Training independent subnetworks for robust prediction,” in Proc. Int. Conf. Learn. Representations, 2020, pp. 1–13.
[35] S. Fort, H. Hu, and B. Lakshminarayanan, “Deep ensembles: A loss landscape perspective,” arXiv:1912.02757.
[36] W. A. Smith and R. B. Randall, “Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study,” Mech. Syst. Signal Process., vol. 64/65, pp. 100–131, 2015.
[37] Y. Yang, H. Zheng, Y. Li, M. Xu, and Y. Chen, “A fault diagnosis scheme for rotating machinery using hierarchical symbolic analysis and convolutional neural network,” ISA Trans., vol. 91, pp. 235–252, 2019.
[38] V. Tra, J. Kim, S. A. Khan, and J.-M. Kim, “Bearing fault diagnosis under variable speed using convolutional neural networks and the stochastic diagonal Levenberg–Marquardt algorithm,” Sensors, vol. 17, no. 12, 2017, Art. no. 2834.
[39] L. Wen, X. Li, L. Gao, and Y. Zhang, “A new convolutional neural network-based data-driven fault diagnosis method,” IEEE Trans. Ind. Electron., vol. 65, no. 7, pp. 5990–5998, Jul. 2018.
[40] Z. Chen, A. Mauricio, W. Li, and K. Gryllias, “A deep learning method for bearing fault diagnosis based on cyclic spectral coherence and convolutional neural networks,” Mech. Syst. Signal Process., vol. 140, 2020, Art. no. 106683.

Lei Ren (Member, IEEE) received the Ph.D. degree in computer science from the Institute of Software, Chinese Academy of Sciences, Beijing, China, in 2009.
He is currently a Professor with the School of Automation Science and Electrical Engineering, Beihang University, Beijing. His research interests include industrial Internet-of-Things, industrial Big Data and AI, and cloud manufacturing.
Dr. Ren is an Associate Editor of IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, SIMULATION-Transactions of the Society for Modeling and Simulation International, and other international journals.

Zidi Jia (Graduate Student Member, IEEE) received the bachelor's degree in automation and the master's degree in control engineering from the School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China, in 2017 and 2020, respectively. He is currently working toward the Ph.D. degree in pattern recognition and intelligent systems with the School of Automation Science and Electrical Engineering, Beihang University, Beijing, China.
His research interests include industrial artificial intelligence, the industrial Internet-of-Things, and industrial process monitoring.

Tao Wang received the B.S. degree in automation in 2020 from the School of Automation Science and Electrical Engineering, Beihang University, Beijing, China, where he is currently working toward the postgraduate degree in electronic information.
His research interests include industrial intelligence, natural language processing, and knowledge distillation.

Yehan Ma (Member, IEEE) received the B.S. and M.S. degrees in automation from the Harbin Institute of Technology (HIT), Harbin, China, in 2013 and 2015, respectively, and the Ph.D. degree in computer science from Washington University in St. Louis, St. Louis, in 2020.
She is currently an Assistant Professor with the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China. She has authored or coauthored papers in top-tier conferences and journals, including EMSOFT, ICCPS, TCAD, TCPS, and TCST. She has broadly investigated techniques and solutions for the holistic management of computation, communication, and control in cyber-physical systems. Her research interests include the industrial Internet of Things, cyber-physical systems, and edge computing.
Dr. Ma is a recipient of the Fullgraf Fellowship and a Shanghai Chenguang Scholar.

Lihui Wang received the Ph.D. degree in intelligence science from Kobe University, Kobe, Japan, in 1993.
He is currently a Chair Professor with the KTH Royal Institute of Technology, Stockholm, Sweden. He has authored or coauthored ten books and authored more than 600 scientific publications. His research interests include cyber-physical systems, real-time monitoring and control, human–robot collaboration, and adaptive manufacturing systems.
Dr. Wang is the Editor-in-Chief of International Journal of Manufacturing Research, Journal of Manufacturing Systems, and Robotics and Computer-Integrated Manufacturing. He is a Fellow of the Canadian Academy of Engineering (CAE), the International Academy for Production Engineering (CIRP), the Society of Manufacturing Engineers (SME), and the American Society of Mechanical Engineers (ASME). He is also a registered Professional Engineer in Canada, and he served as the President of the North American Manufacturing Research Institution of SME from 2020 to 2021 and as the Chairman of the Swedish Production Academy from 2018 to 2020.
