Neural Computing and Applications
https://doi.org/10.1007/s00521-021-06205-1
ORIGINAL ARTICLE
Transfer learning-based deep CNN model for multiple faults detection in SCIM
Prashant Kumar1 • Ananda Shankar Hati1
Received: 15 October 2020 / Accepted: 8 June 2021

© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021
Abstract
A deep learning-based fault detection approach for squirrel cage induction motors (SCIMs) can provide a reliable solution for industry. This paper combines transfer learning-based knowledge transfer with a deep convolutional neural network (dCNN) to develop a novel framework for detecting multiple and simultaneous faults in SCIMs. Compared with existing techniques, the transfer learning-based deep CNN (TL-dCNN) method facilitates faster training and higher accuracy. The current signals are acquired with Hall sensors and converted to images for input to the TL-dCNN model. This approach provides autonomous feature learning and decision-making with minimal human intervention. The developed method is also compared with existing state-of-the-art techniques; it outperforms them and achieves an accuracy of 99.40%. The dataset for the TL-dCNN model is generated from the experimental setup, and programming is done in Python with the Keras and TensorFlow packages.
Keywords Deep learning · Transfer learning · Convolutional neural network · Bearing faults · Broken rotor bars
1 Introduction

Industrial revolution 4.0 is driving modern industries towards automation, the Industrial Internet of Things (IIoT), cloud computing, and on-demand manufacturing. The squirrel cage induction motor (SCIM) has found immense application in industries like mining, textile, railways, cement, and many more. Mine winders use SCIMs, and fault detection in such a scenario is of utmost importance because production depends entirely on it. Similarly, in many other industries, the squirrel cage induction motor is the driving force for production, and condition monitoring is vital in these scenarios. SCIMs are robust machines, yet faults are inevitable. Maintenance of SCIMs is essential for avoiding sudden breakdown and production losses. Condition-based monitoring of SCIMs is vital for optimum production. Achieving higher standards in production requires new maintenance strategies for SCIMs, which include artificial intelligence (AI) technique-based fault detection (FD). The faults in the SCIM can be categorized as mechanical and electrical faults [3, 9, 38]. Mechanical faults include broken rotor bars, bearing faults, air-gap eccentricity, etc. [7, 13, 37], and electrical faults include inter-turn short circuit, crawling, single phasing, reverse phase current, dielectric failure, etc. [23, 27, 38, 39]. Traditionally, signal-based methods are applied for FD in SCIMs. Signal processing techniques like time-domain analysis [19, 51], frequency-domain analysis [17, 29], enhanced frequency analysis [26, 41], and time-frequency analysis [6, 46] are deployed for fault detection. The traditional methods of condition monitoring are cumbersome and need human expertise [47].
Corresponding author: Ananda Shankar Hati (anandashati@iitism.ac.in)
Prashant Kumar (prashant88344@gmail.com)
1 Department of Mining Machinery Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, India

The shortcomings of traditional methods can be eliminated by the application of artificial intelligence and automation in condition monitoring. The availability of highly efficient computational machines has also led to the inclusion of AI techniques in the condition-based maintenance of SCIM. The amalgamation of AI-based techniques has ensured an automatic fault diagnosis of SCIM. It requires the application of machine learning (ML) techniques like artificial
neural network (ANN), support vector machine (SVM), k-nearest neighbor (k-NN), decision trees and random forest, fuzzy logic, and expert systems [11, 21, 25, 34, 44, 49]. However, even these ML-based techniques require human expertise and are dependent on human decision-making. This framework requires signal preprocessing, feature extraction, and feature selection steps. The available techniques achieve good accuracy in FD. Still, performance is overwhelmingly dependent on the manual feature extraction and feature selection step, the preprocessing step, and the parameter selection of the fault classifier. The human expertise-based features are extracted for specific tasks, and they can perform accurately under those circumstances. However, they might not perform as well under different scenarios [31]. In [14], the authors developed a bearing FD approach using optimized feature selection by genetic algorithm (GA) and SVM as a classifier. In [1], the authors developed a fault detection technique that utilizes data from multiple sensors and their fusion, with fault classification using the SVM and short-time Fourier transform (STFT). An FD approach based on SVM using the continuous wavelet transform (CWT) and motor vibration is proposed in [18]. In [15], the authors developed an FD technique for bearing faults using ANN and SVM-based classifiers with features extracted from the vibration signal. In [52], the authors proposed a bearing fault detection approach that uses features based on hierarchical entropy as input to the SVM in combination with particle swarm optimization. In [33], bearing faults were diagnosed by the combination of the Hilbert-Huang transform, SVM, and support vector regression. In [36], the authors performed feature learning using a sparse autoencoder and neural network for FD in SCIM. The authors of [50] proposed a bearing FD approach using an SVM classifier and feature learning from the ensemble empirical mode decomposition of the signal. In [48], the authors proposed a bearing fault detection technique that extracts time-domain features from the signal and uses ANN as a classifier. In [22], a bearing fault detection approach is developed using the SVM and improved ant-colony optimization. The developed method was compared with different SVM classifiers based on GA, cross-validation, and the standard ant-colony optimization method.

In recent times, deep learning (DL) architectures have gained popularity in solving complex problems and have hence attracted researchers from different fields. DL is relatively new in FD in SCIM, and only a few research works are available compared to the conventional ML-based FD schemes in SCIM. Most of the ML-based techniques require manual feature extraction and selection [16]. Deep learning algorithms like convolutional neural network (CNN) [5, 42], deep neural network (DNN), autoencoders [2], and deep belief network (DBN) [2, 30, 45] are used for FD in SCIM. The DL algorithm provides automatic feature extraction and selection due to its deep architecture and offers an end-to-end learning solution. It is cumbersome to do feature extraction and selection for FD in SCIM, as different faults have different characteristic frequencies, which depend on supply frequency, loading, and other parameters. The available methods have been developed for a particular fault, are vulnerable to the working conditions, and are not suitable for general applications.

Deep learning has the ability to work directly on the raw signals, and it can overcome the shortcomings of the ML-based methods. It has shown excellent properties in learning representative features from raw signals due to its deep architecture. Deep learning has been used in many fields, including but not limited to computer vision [20], speech recognition [8], biomedicine [4], and natural language processing [28]. Due to DL's excellent feature learning properties, it has attracted researchers globally in the field of condition-based maintenance of SCIM. In [12], the authors proposed real-time condition monitoring and FD in motors with the help of a 1-D CNN. Initially, CNN was developed for 2-D images, and the input for the CNN had to be in 2-D format. Several authors have converted 1-D vibration data into 2-D images for CNN [24, 31, 43]. In [31], the authors proposed machine fault diagnosis using knowledge transfer and DNN. In [35], the authors developed a stacked sparse autoencoder-based DNN with an unsupervised learning procedure followed by a supervised fine-tuning process for bearing fault detection. In [10], the authors proposed bearing FD using CNN without feature extraction. In [40], a batch-normalized DNN was proposed for faster training and automatic bearing and gearbox FD. Most of the developed DL models have a limited number of hidden layers, and deeper DL models have not been investigated for FD in SCIM. However, developing deeper CNN architectures has its challenges, like high computational time, a higher number of hyper-parameters, and the need for more labeled data and computational power.

To overcome these barriers of deep DL architectures, transfer learning (TL) is promising. TL takes the information acquired while tackling one problem and applies it to a different but related problem. Instead of training a complete neural network, a network that is already trained on large labeled data is used with some fine-tuning and improvement. For implementing TL, pretrained models are used, which are trained on large datasets from scratch. The present manuscript proposes the use of TL with the help of an already established VGG16 [32] model trained on ImageNet with fine-tuning and modification in dense layers. The pre-trained VGG16 CNN model will
reduce the computational time as the network architectures, hyper-parameters, and model parameters are already adjusted and can be transferred for FD in SCIM. In this way, TL initializes the parameters efficiently, and the fault classification model gets a reasonable initialization.

The present paper proposes an intelligent and novel fault detection technique formulated on a transfer learning-based deep CNN (TL-dCNN) model for broken rotor bar and bearing fault detection. The proposed method is formulated on the analysis of the current signals acquired from multiple sensors. The performance of the TL-dCNN classifier is evaluated for bearing and broken rotor bar fault detection. The paper comprises six sections. Section 2 describes the convolutional neural network and the method of transfer learning. Section 3 proposes transfer learning and deep CNN-based fault detection. Section 4 presents the test setup and FD approach in detail. Section 5 elaborates the performance analysis of the proposed method along with the comparison with established methods. Finally, Sect. 6 concludes the proposed work.

2 Convolutional neural network and method of transfer learning

Convolutional neural network (CNN) is used in image recognition, image classification, biomedical applications, and many other fields. The application of CNN has seen a boom due to its efficient feature learning capabilities. CNN inherently combines feature extraction and classification, which provides an edge over traditional machine learning techniques. The CNN structure comprises numerous trainable layers that execute several linear and nonlinear operations. The CNN architecture comprises convolution layers, pooling layers, and fully connected layers. The convolution blocks perform the convolution and pooling operations. The deep CNN network consists of convolution blocks of varying kernel sizes. The main idea behind CNN is the convolution operation. It embraces the idea of a limited receptive field: each node in the following layer is associated with only a small portion of the input. This association reduces the number of parameters, which permits faster training.

2.1 Convolution layer

A filter or kernel is applied to the input image in the convolution layer (CL), and it convolves over the whole input image. The output is a dimensionally reduced feature map that is provided to the sub-sampling layer. The layers of neurons in a feed-forward NN denote the original image or a feature map. Filters denote a group of connections that get replicated across the entirety of the input. The output layer is the feature map generated by this filter. A neuron in a feature map is activated if the filter contributing to its activation detected an appropriate pattern at the corresponding position in the previous layer. It can be expressed by:

$$X_{m'}^{(i)} = C\left(\sum_{m=1}^{M} W_{mm'}^{(i)} * X_{m}^{(i-1)} + B_{m'}^{(i)}\right) \qquad (1)$$

where $i$ denotes the layer number, $m'$ denotes the index of the output feature map, and $m$ denotes the index of the input feature map. $W$ denotes the weight matrix, $B$ denotes the bias term, and $*$ denotes a 2-D discrete convolution operator. $C$ stands for the nonlinear activation function. The model developed in this paper uses the ReLU (Rectified Linear Unit) activation function. It has the benefit that it does not activate all the neurons simultaneously and tackles the issue of dead neurons.

2.2 Pooling layer

The input feature maps are downsized in the pooling layer (PL) to provide local translation invariance. It summarizes the presence of features in patches of the input feature map. The size of the filters in the PL is smaller than the size of the feature map. The pooling operation helps in making the feature map representation invariant to minor changes in the input. In the VGG16 model, the size of the filters is 2 × 2 in the pooling layer. It also helps in avoiding the over-fitting problem. The pooling can be max-pooling or average pooling. The max-pooling operation computes the maximum value for each patch of the feature map, and average pooling computes the average value for each patch of the feature map.

2.3 Fully connected layer

In the CNN architecture, the fully connected layer (FCL) is a vital component. A CNN architecture might consist of multiple FCLs. FCLs are quite useful in image recognition and image classification problems. The FCL goes through its own backpropagation process to determine the most suitable weights. Each neuron receives weights that prioritize the most appropriate label. The neurons in the FCL finally vote on each label, and the winner of the voting is the final classification decision. The softmax activation function is generally used for multi-class classification problems. The softmax activation function limits its output to the range between 0 and 1.
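The operations described in Sects. 2.1 to 2.3 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (the paper uses Keras and TensorFlow). The function names `conv2d_same`, `relu`, `max_pool_2x2`, and `softmax` are hypothetical, a single input and a single output feature map are assumed, and the convolution is computed as cross-correlation with zero padding, which is the usual convention in deep learning libraries.

```python
import math


def conv2d_same(image, kernel, bias=0.0):
    """Slide a square kernel over a 2-D list, zero-padded so the output keeps the input size.

    Assumption: computed as cross-correlation (the kernel is not flipped).
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = bias
            for dr in range(k):
                for dc in range(k):
                    rr, cc = r + dr - pad, c + dc - pad
                    # Positions outside the image contribute zero (padding).
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr][dc] * image[rr][cc]
            out[r][c] = acc
    return out


def relu(feature_map):
    """Rectified linear unit: negative activations are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2 x 2 max-pooling with stride 2; a trailing odd row or column is dropped."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]


def softmax(scores):
    """Map class scores to values between 0 and 1 that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Chaining `max_pool_2x2(relu(conv2d_same(image, kernel)))` corresponds to one convolution layer followed by one pooling layer; applying `softmax` to the scores of the last fully connected layer gives the class probabilities used for the final decision.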
2.4 Adaptive gradient optimizer

The choice of the optimizer is crucial for quicker convergence of deep models. It helps in the proper classification of different faults. The hyper-parameters are a fundamental component for quicker convergence. After every epoch, the adaptive gradient optimizer provides a specific learning rate for the weight update. The learning rate is the hyper-parameter required for calculating the weight update, so the update depends on its choice. If the learning rate is too small, the weight update will be excessively slow, and if it is too large, the weights will overshoot the minimum and may never arrive at the desired loss level. The sensitivity issue for each dimension is often caused by the high-dimensional non-convex nature of neural architecture optimization. Also, the learning rate needs to be scaled for each dimension for quicker convergence. The adaptive gradient optimizer adaptively alters the learning rate for each dimension. Mathematically, the operation of the adaptive gradient optimizer can be represented by:

$$W_t = W_{t-1} - \lambda'_t \frac{\partial \nabla}{\partial W_{t-1}} \qquad (2)$$

where $W$ denotes the weight that is changed during the iteration, $\lambda'_t$ denotes the learning rate, which starts from its initial value and alters after each iteration, and $\nabla$ denotes the loss function.

2.5 Transfer learning and its tuning

Transfer learning (TL) is the process of knowledge transfer where knowledge learned from solving one problem is applied to a different but related problem. TL is quite a common technique in image classification as it reduces computational time. The basic layout of transfer learning is given in Fig. 1. The knowledge of one model, which is already trained on a large dataset, is transferred to the required model for accelerated performance and less computational time. Figure 1 shows dataset 1, which is a large dataset on which model 1 is trained; the knowledge of model 1 is transferred to model 2 for faster learning. It is an example of simple transfer learning. With transfer learning, information from the pretrained models can be transferred to the new model. It helps in mitigating the issues caused by the limited availability of data for a new task.

Fig. 1 Simple architecture of transfer learning

Fig. 2 Transfer learning layout for fault detection in SCIM

Consider a domain $D = (\mathcal{I}, P(I))$, which is expressed by two components: a feature space $\mathcal{I}$ and a marginal probability distribution $P(I)$, where $I = [i_1, i_2, i_3, \ldots, i_n] \in \mathcal{I}$. If two domains are different, then they have either different feature spaces ($\mathcal{I}_t \neq \mathcal{I}_s$) or different marginal distributions ($P(I_t) \neq P(I_s)$). For a specific domain $D$, a task $T = (\mathcal{Y}, f(\cdot))$ consists of two parts: a label space $\mathcal{Y}$ and a predictive function $f(\cdot)$, which is not observed but can be learned from training data $\{(i_m, y_m) \mid m \in \{1, 2, 3, \ldots, N\}\}$, where $i_m \in \mathcal{I}$ and $y_m \in \mathcal{Y}$. From a probabilistic viewpoint, $f(i_m)$ can also be written as $p(y_m \mid i_m)$, so we can rewrite task $T$ as $T = (\mathcal{Y}, P(Y \mid I))$. In general, if two tasks are different, then they may have different label spaces ($\mathcal{Y}_t \neq \mathcal{Y}_s$) or different conditional probability distributions ($P(Y_t \mid I_t) \neq P(Y_s \mid I_s)$). For a given source domain $D_s$ and corresponding learning task $T_s$, and a target domain $D_t$ and learning task $T_t$, transfer learning aims to improve the
learning of the conditional probability distribution $P(Y_t \mid I_t)$ in $D_t$ with the information gained from $D_s$ and $T_s$, where $D_t \neq D_s$ or $T_t \neq T_s$.

In SCIM fault diagnosis, TL helps in deciding the parameters of the fault detection model through knowledge transfer from a pre-trained model. The transfer learning-based weight initialization reduces the computational cost and speeds up the training of the proposed FD model built on the deep CNN architecture. Training a deep CNN model from scratch is a time-consuming and cumbersome task. When training from scratch, the weights are initialized randomly and are updated based on the loss function and related data.

The transfer learning-based deep CNN (TL-dCNN) fault detection model provides a time-efficient approach for multiple and simultaneous fault detection in SCIM. Fine-tuning is the process of updating the weights of the higher hidden layers. The knowledge transfer by transfer learning minimizes the number of parameters updated during training. The layout of the TL-based FD model is given in Fig. 2. Tables 1 and 2 summarize the hyper-parameters of the convolution layer and max-pooling layers. The FD model involves numerous blocks of stacked convolution and max-pooling layers. The proposed fault detection model (TL-dCNN) uses a classifier comprising two fully connected layers with 4096 nodes and an output layer with a softmax activation function and 5 nodes (based on the number of classes) on top of the output of the last convolution block of the VGG16 architecture. The five convolutional blocks enclosed within a lock symbol keep the same weights as in VGG16, and the remaining blocks (classifier) are enclosed with an unlock symbol and are fine-tuned as per the requirement. The classifier blocks are enclosed with an unlock symbol for simplicity and understanding. The five convolutional blocks of the VGG16 model are used in the fault detection model for extracting features from the given current images. The extracted bottleneck features are fed to the classifier, which includes two fully connected layers with 4096 nodes and an output layer with a softmax activation function and 5 nodes. The classifier is trained using the adaptive gradient optimizer with an initial learning rate of 0.001 and the categorical cross-entropy loss function. The standard categorical cross-entropy loss function is given by:

$$J_{cce} = -\frac{1}{M} \sum_{k=1}^{K} \sum_{m=1}^{M} y_m^k \log\left(h_\theta(i_m, k)\right) \qquad (3)$$

where $M$ denotes the number of training examples, $K$ denotes the number of classes, $y_m^k$ denotes the target label for training example $m$ for class $k$, $i_m$ denotes the input for training example $m$, and $h_\theta$ denotes the model with neural network weights $\theta$.

2.6 1-D current signal to image conversion method

The traditional method involves substantial pre-processing before investigation, as conventional data-driven approaches are incapable of handling raw data signals. Feature extraction from the raw signals and the selection of optimum features are vital in data processing. Feature extraction and selection depend on past experience, resources, and human expertise. The FD technique is substantially dependent on the feature selection. Incorrect feature selection would lead to poor performance of the FD model. So, the features are the driving force and vital for the efficient performance of the FD
model. The TL-dCNN model, i.e., the FD model, needs images as input. An image of size 224 × 224 × 3 is needed for input to the proposed FD model. A simple technique is used for transforming the 1-D current signals into images. To develop the required images of size 224 × 224 × 3, the current data are normalized to the range 0 to 1. Then, the normalized data are separated into fragments of 256 data points. These fragments are transformed into 2-D arrays. The PIL package available in Python is used for transforming the 2-D arrays into images. Each current image requires 256 data points.

Table 1 Hyper-parameter of convolution layer
Layer   Number of filters   Kernel size   Stride   Padding   Activation
CL 1    64                  3 × 3         1        Same      ReLU

Table 2 Hyper-parameter of max-pooling layer
Layer                 Filter size   Stride
Max-pooling layer 1   2 × 2         2
Max-pooling layer 2   2 × 2         2
Max-pooling layer 3   2 × 2         2
Max-pooling layer 4   2 × 2         2
Max-pooling layer 5   2 × 2         2

3 SCIM Fault detection using deep CNN and TL

This paper proposes a fault detection technique using the concept of transfer learning and a high-depth CNN model for bearing and broken rotor bar FD in SCIM. It can automatically learn fault features and recognize SCIM health. The FD in SCIM requires image conversion from raw signals, data preparation, and the pretrained Oxford Visual Geometry Group (VGG16) model with fine-tuning and adjustment. VGG16 is a state-of-the-art model for image classification on the ImageNet dataset. In the proposed framework, the VGG16 model is used to decide the weights of the convolution blocks of the FD model. The fully connected layers are adjusted and fine-tuned as per the fault classification requirement. The acquired current signals are transformed into images with the simple procedure mentioned in Sect. 2.6. The current data are converted to images as the TL-dCNN model takes only 2-D input. The developed deep CNN architecture is inspired by the VGG16 model. The weights of the convolution blocks are the same as the weights of the VGG16 model, and the FCLs are fine-tuned. The output layer has multiple neurons, and their number depends on the number of working states of the SCIM to be classified. The fully connected layers are initialized randomly, and weights are adjusted during the training. The output layer uses a softmax function for decision-making. The fault detection model is saved after numerous training runs, and the parameters of the model are saved accordingly. The layout of the proposed framework is given in Fig. 3.

Fig. 3 Layout of the proposed framework for fault detection in SCIM

4 Experimental setup and fault detection methodologies

Multiple fault detection is a complex task, and it requires enormous data for attaining good fault classification accuracy. The dataset was created with the help of the test setup. The details of the experimental setup and in-depth analysis are given below.

4.1 Experimental setup

A complete in-depth analysis of the proposed fault detection model is carried out on the data acquired from the experimental setup. The experimental setup includes a 5 kW squirrel cage induction motor, a data acquisition system, current sensors, and a workstation. The experimental setup is shown in Fig. 4. It uses a National Instruments DAQ system for current data collection with the help of NI LabVIEW software. Different bearing faults, namely the inner race fault (BIRF), outer race fault (BORF), and ball defect (BBDF), were imitated in the laboratory by damaging the bearing. The different conditions of the bearing are shown in Figs. 5, 6, 7 and 8. Similarly, the broken rotor bars were imitated by damaging the rotor bar externally and then mounting the rotor in the motor. The different conditions of the rotor are shown in Figs. 9, 10
and 11. The workstation has a dedicated NVIDIA Quadro P4000 graphics processing unit (GPU) for faster computation, along with an Intel Xeon E3 central processing unit and 64 GB RAM. The fault detection model is developed in the Spyder IDE with Python programming. The Keras and TensorFlow packages were used for model development. Table 3 lists the technical aspects of the experimental setup.

Fig. 4 Experimental setup for current data acquisition under different operating conditions
Fig. 5 Healthy bearing
Fig. 6 Bearing with inner race fault
Fig. 7 Bearing with outer race fault
Fig. 8 Bearing with ball defect
Fig. 9 Healthy rotor

4.2 Fault detection methodology

The proposed fault detection methodology is based on the current signals, deep CNN architecture, and application of transfer learning. A sampling rate of 10 kHz is used for the acquisition of stator currents. The current signals are recorded for five different working conditions of the SCIM for 200 s under various loading situations: no-load, 25%, 50%, 75%, and 100% load. The five working conditions are healthy (H), bearing inner race fault (BIRF), bearing outer race fault (BORF), ball bearing defect (BBDF), and broken rotor bar (BRB). The current is converted to an image of size 224 × 224 × 3 with the method elaborated in Sect. 2.6. The size of the image has been kept at 224 × 224 × 3 because the developed model uses a pre-trained VGG16 model, which is trained on images of this size. So, to fully capitalize on the
VGG16 model and for implementing transfer learning, the current signals are converted to this size.

Fig. 10 Rotor with one broken bar

The five operating conditions are treated as separate classes. Each operating condition comprises 2000 images for training and 500 images for testing. The training dataset comprises 10,000 images in total for the five operating conditions. Similarly, the test dataset comprises 2500 images in total for the five operating conditions. The output layer of the proposed model has to classify five different conditions based on the given input data, so it is a five-class classification task. The output layer of the proposed model has five neurons corresponding to the different classes. Apart from the five convolution blocks, the remaining fully connected layers are initialized randomly. The weights of the five convolution blocks are the same as in the VGG16 model (i.e., the weights are locked), and the remaining blocks are fine-tuned. During training, the learning rate was set to 0.001 and is updated by the adaptive gradient optimizer.
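The signal-to-image conversion that this methodology relies on (Sect. 2.6) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' code. The paper specifies normalization to the range 0 to 1, fragments of 256 points, and a final size of 224 × 224 × 3, but it does not state the 2-D array shape or the resizing method. The sketch assumes min-max normalization over the whole record, a row-major 16 × 16 array per fragment, nearest-neighbour enlargement by a factor of 14, and replication of the grey value into three channels. All function names are hypothetical.

```python
def normalize(signal):
    """Min-max scale a 1-D sequence to the range 0..1 (assumed normalization)."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    return [(x - lo) / span for x in signal]


def fragments(signal, size=256):
    """Split into consecutive non-overlapping fragments; an incomplete tail is dropped."""
    count = len(signal) // size
    return [signal[i * size:(i + 1) * size] for i in range(count)]


def fragment_to_image(fragment, side=16, target=224):
    """Turn one normalized 256-point fragment into a target x target x 3 nested list of 0..255 values."""
    if len(fragment) != side * side or target % side != 0:
        raise ValueError("fragment length must be side*side and target a multiple of side")
    scale = target // side  # 224 // 16 = 14
    # Row-major reshape to side x side and scaling to 8-bit grey levels (assumption).
    grid = [[round(255 * fragment[r * side + c]) for c in range(side)]
            for r in range(side)]
    image = []
    for r in range(target):
        src = grid[r // scale]  # nearest-neighbour enlargement (assumption)
        image.append([[src[c // scale]] * 3 for c in range(target)])
    return image


def signal_to_images(signal):
    """Normalize a current record and convert every complete fragment to an image."""
    return [fragment_to_image(f) for f in fragments(normalize(signal))]
```

With Pillow installed, each 16 × 16 grid could instead be turned into an image object and resized there, which is closer to the PIL-based procedure mentioned in Sect. 2.6.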
Fig. 11 Rotor with two broken bars

Table 3 Technical aspects of the experimental setup
Equipment   Details
SCIM        Number of stator slots: 36
            Number of rotor slots: 28
Bearing     Manufacturer: FAG
            Model: 6208
            Pitch diameter: 60 mm
            Ball diameter: 12 mm
            Number of balls: 9

5 Results and discussion

5.1 Results and comparison with other models

An ample set of experiments was conducted on the test setup to acquire current data for the TL with deep CNN (TL-dCNN)-based FD model. The study shows that the TL-dCNN requires less computational time than a model trained from scratch, because training from scratch involves adjusting more parameters and hyper-parameters. TL-dCNN achieves high accuracy in 50 epochs, while the deep CNN model trained from scratch (dCNN-TFS) requires around 80 epochs to achieve good accuracy. Figure 12 shows the classification accuracy of the TL-dCNN and dCNN-TFS models for the same number of epochs. The dCNN-TFS model takes more epochs than the TL-dCNN model to achieve an accuracy of more than 95%. Due to transfer learning, the TL-dCNN model converges faster and achieves high classification accuracy. The parameters and hyper-parameters of the deep CNN model without transfer learning are adjusted through a trial and error process, which takes a lot of time. In contrast, the proposed method inherits tuned hyper-parameters from the pre-trained model, and fine-tuning adapts it to the fault classification task.
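A rough parameter count helps to see which weights are locked and which are updated in such a model. The sketch below is an indicative calculation, not a figure reported in the paper. It assumes the standard VGG16 configuration of thirteen 3 × 3 convolution layers, and it assumes that the 7 × 7 × 512 output of the last block (224 halved by five pooling layers) is flattened before the first 4096-node layer. The paper does not state how the convolution output is connected to the classifier, so the trainable total depends on that assumption. The names `VGG16_CONV_LAYERS` and `count_parameters` are hypothetical.

```python
# (input channels, output channels) of each 3 x 3 convolution layer in standard VGG16.
VGG16_CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of one convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_parameters(num_classes=5, flat_features=7 * 7 * 512, hidden=4096):
    """Return (frozen, trainable) parameter counts under the stated assumptions."""
    frozen = sum(conv_params(i, o) for i, o in VGG16_CONV_LAYERS)
    trainable = (dense_params(flat_features, hidden)
                 + dense_params(hidden, hidden)
                 + dense_params(hidden, num_classes))
    return frozen, trainable


print(count_parameters())  # (14714688, 119566341)
```

Under these assumptions, about 14.7 million convolution weights are reused without any update, while the fully connected classifier is the part that is trained on the current images.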
Fig. 12 Performance comparison between TL-based deep CNN model and without TL

The comparison with state-of-the-art methods shows that the proposed method has better accuracy than the existing methods. The proposed method is compared with three conventional machine learning methods: support vector machine (SVM), random forest (RF), and k-nearest neighbor (kNN). SVM, kNN, and RF require handcrafted features for training rather than the original signals. In order to
Table 4 Handcrafted features for comparison classifier trained from scratch is good. However, the
Signals Features
training time is high, and transfer learning facilitates faster
training and smoother operation. The confusion matrices
Current Mean         Kurtosis       Variance
Envelope             RMS            Skewness
Impulse Factor       IMF Energy     Shape Factor
Standard Deviation   Crest Factor   Wavelet Energy

analyze the proposed method with the existing state-of-the-art methods,
handcrafted features are required. The handcrafted features used for the SVM,
kNN, and RF classifiers are tabulated in Table 4. Table 5 compares the
performance of the existing methods with that of the proposed method. The
accuracy of the SVM, kNN, and RF-based fault detection techniques based on
handcrafted features is 90%, 78.60%, and 89.40%, respectively. The accuracy of
the dCNN-TFS model is 96.80%. Figures 13, 14 and 15 present the confusion
matrices for the ML-based fault detection techniques. Figures 16 and 17 present
the confusion matrices for the dCNN-TFS and TL-dCNN models. The performance of
the conventional machine learning models relies heavily on the selected
features, which restricts their performance. In addition, a deep CNN model
trained from scratch required 80 epochs to achieve an accuracy of 96.80%, while
the proposed model required 50 epochs to achieve an accuracy of 99.40%. The
Gaussian SVM-based fault detection model required 50 epochs for training to
achieve an accuracy of 90%. The accuracy of the kNN-based fault detection model
is poor and not on par with the other models. Also, kNN is a nonparametric
algorithm, so preparing a kNN classifier does not require iterating over the
training data for multiple epochs to optimize a set of parameters. The random
forest consists of many trees, which makes the algorithm slow and ineffective
for real-time analysis. It is clear from the confusion matrices that the
proposed method outperforms the other available FD models. Unlike the SVM, kNN,
and RF classifiers, the proposed method performs efficiently with no dependence
on human expertise and knowledge. The performance of deep CNN

show that ML-based techniques perform well in identifying health conditions and
bearing outer race faults. However, they struggle to identify broken rotor
conditions, bearing inner race faults, and bearing ball defects. The TL-dCNN
classifier achieves higher accuracy in identifying all states of the SCIM. This
shows the effectiveness of the method under different bearing faults and broken
rotor bars.

The distributed and sparse representations learned by deep architectures
(TL-dCNN model) are relatively more powerful than those learned by general
shallow conventional machine learning models. It is beneficial to employ deep
architectures rather than shallow models for learning effective representations
of data. The depth of the networks allows better feature learning. The deep
structure provides good domain adaptability. The proposed model is
computationally costly compared to machine learning (kNN or SVM)-based fault
detection models. The deep learning models have a higher depth, which causes
the comparatively higher training time. However, advantages like automatic
feature extraction, better feature learning, good domain adaptability, and
end-to-end feature learning justify using the proposed model over the
conventional Gaussian SVM, kNN, and random forest-based fault detection models.

5.2 Discussion

The proposed method outperforms the available techniques and makes accurate
predictions with minimum human intervention. The proposed method shows that
fine-tuning, along with transfer learning of the model, achieves higher
accuracy in fewer epochs compared to a model trained from scratch. The CNN
model with higher depth requires high computational time if trained from
scratch. Transfer learning makes it possible to overcome this deficiency of
deep CNNs, and it is used effectively in the present manuscript. The proposed
method also reduces the computational burden relative to the deep CNN trained
from scratch (353.4 s versus 452.52 s of training time in Table 5) despite
being a deep network.
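For contrast with the automatic feature learning discussed above, the statistical features of the kind listed in Table 4 can be computed from a window of current samples in a few lines. The following is only a minimal sketch under stated assumptions: the function name `time_domain_features`, the synthetic 50 Hz test signal, the 10 kHz sampling rate, and the use of population (biased) moments are illustrative choices, not the authors' implementation. The envelope, IMF energy, and wavelet energy features are omitted because they need additional signal-processing steps that the text does not specify.

```python
import math


def time_domain_features(x):
    """Return common statistical features of a 1-D signal window.

    Hypothetical helper: the definitions below are standard textbook
    forms and are assumed here, not taken from the paper.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(x) / n
    # Population (biased) central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0:
        raise ValueError("constant signal: shape features are undefined")
    std = math.sqrt(m2)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "mean": mean,
        "variance": m2,
        "std": std,
        "rms": rms,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,          # non-excess kurtosis
        "crest_factor": peak / rms,        # peak relative to RMS
        "shape_factor": rms / mean_abs,    # RMS relative to mean |x|
        "impulse_factor": peak / mean_abs  # peak relative to mean |x|
    }


# Synthetic stand-in for a measured stator current: ten full cycles
# of a unit-amplitude 50 Hz sine sampled at an assumed 10 kHz.
fs = 10_000
window = [math.sin(2 * math.pi * 50 * k / fs) for k in range(2000)]
features = time_domain_features(window)

for name, value in features.items():
    print(f"{name:>15s}: {value: .4f}")
```

Each window yields one such feature vector, which is what a classifier such as SVM, kNN, or RF would consume. The need to choose these definitions by hand is the dependence on human expertise that the text contrasts with the end-to-end TL-dCNN model.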
Table 5 Comparison of proposed method with SVM and kNN-based fault detection technique

Classifier      Accuracy (%)   Training time (s)   Predicting time (s)   Averaging error
kNN             78.60          5.2                 2.1                   –
Gaussian SVM    90             10.5                1.7                   –
Random Forest   89.40          28.52               4.8                   –
Deep CNN        96.80          452.52              24.51                 0.3385
TL-dCNN         99.40          353.4               22.4                  0.0652
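
The trade-offs in Table 5 can be made concrete with a little arithmetic. The sketch below uses only the values from Table 5 and the epoch counts quoted in the text (80 epochs for the deep CNN trained from scratch, 50 for the proposed model). The helper name `relative_reduction` and the variable names are illustrative assumptions.

```python
# Values copied from Table 5 (accuracy in %, training time in seconds).
table5 = {
    "kNN":           {"accuracy": 78.60, "train_s": 5.2},
    "Gaussian SVM":  {"accuracy": 90.00, "train_s": 10.5},
    "Random Forest": {"accuracy": 89.40, "train_s": 28.52},
    "Deep CNN":      {"accuracy": 96.80, "train_s": 452.52},
    "TL-dCNN":       {"accuracy": 99.40, "train_s": 353.4},
}

# Epoch counts quoted in the text for the two deep models.
epochs = {"Deep CNN": 80, "TL-dCNN": 50}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is smaller than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


scratch, transfer = table5["Deep CNN"], table5["TL-dCNN"]

# Savings of transfer learning relative to training from scratch.
train_time_saving = relative_reduction(scratch["train_s"], transfer["train_s"])
epoch_saving = relative_reduction(epochs["Deep CNN"], epochs["TL-dCNN"])
accuracy_gain = transfer["accuracy"] - scratch["accuracy"]

# Cost of the deep model relative to the fastest-training strong baseline.
slowdown_vs_svm = transfer["train_s"] / table5["Gaussian SVM"]["train_s"]

print(f"training time saved vs. scratch: {train_time_saving:.1f} %")
print(f"epochs saved vs. scratch:        {epoch_saving:.1f} %")
print(f"accuracy gain vs. scratch:       {accuracy_gain:.2f} points")
print(f"training time vs. Gaussian SVM:  {slowdown_vs_svm:.1f}x")
```

On these numbers, transfer learning cuts training time by roughly a fifth and the epoch count by more than a third relative to training from scratch. The TL-dCNN still takes over thirty times longer to train than the Gaussian SVM, which matches the statement that the proposed model is computationally costly compared to the ML-based models.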
Fig. 13 Confusion matrix for model based on kNN
Fig. 14 Confusion matrix for model based on Gaussian SVM
Fig. 15 Confusion matrix for model based on random forest
Fig. 16 Confusion matrix for dCNN-TFS model
Fig. 17 Confusion matrix for TL-dCNN model

6 Conclusion

In the present work, a novel and intelligent fault detection technique is
proposed using a CNN architecture with higher depth for multiple fault
detection in squirrel cage induction motors. The comparative study shows that
the proposed method outperforms the existing state-of-the-art methods. The
proposed technique does not rely on handcrafted features and trains faster
with transfer learning-based knowledge transfer. It can simultaneously detect
the different bearing faults and broken rotor bar conditions. It eliminates
the errors introduced by manual feature extraction and selection. The proposed
method was compared to other state-of-the-art methods, and the results
demonstrated that it outperforms them with higher reliability and accuracy.
The proposed method has end-to-end learning capabilities that allow its use in
various fault applications in SCIM. In the future, the proposed
technique can also be utilized for detecting other
mechanical and electrical faults.