1, JANUARY 2015

Vibration Spectrum Imaging: A Novel Bearing Fault Classification Approach
Muhammad Amar, Student Member, IEEE, Iqbal Gondal, Member, IEEE, and
Campbell Wilson, Member, IEEE

Abstract—Incipient fault detection in low signal-to-noise ratio (SNR) conditions requires robust features for accurate condition-based machine health monitoring. Accurate fault classification is positively linked to the quality of features of the faults. Therefore, there is a need to enhance the quality of the features before classification. This paper presents a novel vibration spectrum imaging (VSI) feature enhancement procedure for low SNR conditions. An artificial neural network (ANN) has been used as a fault classifier using these enhanced features of the faults. The normalized amplitudes of spectral contents of the quasi-stationary time vibration signals are transformed into spectral images. A 2-D averaging filter and binary image conversion, with appropriate threshold selection, are used to filter and enhance the images for the training and testing of the ANN classifier. The proposed novel VSI augments and provides the visual representation of the characteristic vibration spectral features in an image form. This provides enhanced spectral images for ANN training and thus leads to a highly robust fault classifier.

Index Terms—Artificial neural networks (ANNs), bearing fault, fault diagnosis, image processing, machine health monitoring (MHM).

Manuscript received July 9, 2013; revised October 15, 2013, January 20, 2014, and March 18, 2014; accepted April 26, 2014. Date of publication May 30, 2014; date of current version December 19, 2014.
M. Amar and C. Wilson are with the Faculty of Information Technology, Monash University, Clayton, Vic. 3800, Australia (e-mail: Muhammad.
I. Gondal is with Monash University, Clayton, Vic. 3800, Australia, and also with Federation University, Ballarat, Vic. 3350, Australia.
Digital Object Identifier 10.1109/TIE.2014.2327555

I. INTRODUCTION

ROTARY machines are a vital part of our daily lives as we rely on their flawless performance. Thus, the timely and precise fault diagnosis of the machines is vital. Majority of the faults in the motors (about 41% of the total faults) are due to bearings [1]. Bearing faults, in motor driven systems, can generate mechanical noise and degrade the quality of a product line. In worst cases, it can even cause downtime for the entire system, resulting in economic losses to the customers [2]. Faults in the bearing are mostly due to the defects in the inner raceway, outer raceway, or ball. These elements are illustrated in Fig. 1.

Fig. 1. Frequent fault-causing elements in a bearing, adapted from [2].

Vibrations are among the most widely used signals for the detection of such type of faults because these signals are an indicator for bearing defects, provided that a suitable processing procedure is applied [3]. Many studies have used vibration signals for abnormality and fault detection [2], [4]–[8]. Thus, the use of information-rich vibration signals for the fault detection is a reasonable choice.

The overall fault classification problem can be classified into two parts, feature selection and classifier design. In recent years, several feature extraction techniques have been developed for rotary machine fault classification, using time, frequency, and multiresolution analysis (MRA) features [2]–[22]. These techniques perform well for a variety of problems and have certain advantages over others, depending upon the nature of the problem. Fourier transform (FT) has been among the widely used feature extraction tools for machine health monitoring (MHM) applications [2], [5], [6], [23]. However, the FT-based method cannot localize the transients efficiently in time; thus, short-time Fourier transform (STFT) is employed to localize the transients. The major concern with FT and STFT is that the accuracy of extracting frequency information is limited by the length of the window relative to the duration of the fault signature [15]. To overcome this hurdle, MRA [wavelet analysis (WA)] or empirical mode decomposition (EMD) is used. With the use of WA, the vibration signal can be decomposed into series of wavelet components, each of which covers a specific frequency subband called node [7], [16]. WA provides high resolution both in time and frequency domains and can use the infinite set of possible base functions [22]. EMD decomposes a signal into many intrinsic mode functions (IMFs) [24], where each IMF contains information about the original signal. EMD is a good choice for nonlinear and nonstationary signals. Vibration signals keep changing their locations in the time domain window because of unsynchronized window size with motor periodic vibrations. A key drawback in wavelet and EMD is the lack of a translation-invariant property in handling vibration signals, thus making node and IMF contents time variant. The application of wavelet transform (WT) to these time-invariant vibration signals results in translation-variant contents in the nodes; therefore, direct assessment from these contents often turns out to be tedious or leads to inaccurate results [12]. Thus, statistical features from WT node contents are calculated for classification purposes [7], [12]. Statistical features for pattern classification, calculated from WT nodes, contain the fraction of the overall information in the signal nodes as most of the time or multiresolution information of the signals is discarded. Loss of the information such as in [22] to check a certain threshold value from node contents to generate a digital output worked well, but in fault pattern classification for translation-variant features, it has drawbacks. EMD, on the other hand, gives decomposed IMFs which work well for nonstationary signals. However, for fault pattern detection for translation-variant signals, EMD also requires the adaptive selection of relatively important IMFs and feature extraction from these IMFs, while Zhang et al. [24] use STFT for feature extraction from IMFs. Both WT and EMD give decomposed signals which are sensitive to the translation variance of vibrations because of the unsynchronized window length and motor speed. In contrast, FT has the advantage of directly representing time moving vibration signals in the form of nonmoving, translation-invariant, and equivalent spectral pattern. These translation-invariant patterns can be processed and used as direct input to a classifier for

Fig. 2. Flowchart of the algorithm.
In the context of spectral features using FT, existing techniques use a vibration snap shot/single time segment [23] to calculate spectral features for diagnosis purposes. Because of several internal and external noise sources [4], the quasi-stationary nature of bearing vibrations [4], [6] and operating condition variations can cause the single-time-segmented spectral contents of a particular fault to appear different for different time segments. This gives an imprecise insight into the fault contents in the spectrum. Therefore, instead of using a single-time-segment spectrum, several-time-segment spectrums of vibration signals are used into a spectral image. This enables us to visualize the trend of spectral patterns, on a larger time scale as compared to a single time segment. The visualization of spectral features in an image gives better understanding of faults and noise contents and helps in feature enhancement methods. For this purpose, image processing has been introduced in this paper to visualize and efficiently refine the spectral features by eliminating intermittent and insignificant frequencies of incoherent noise, distributed over the entire spectral image using a 2-D averaging filter. These feature enhancements also help to ameliorate the suppressed frequency amplitudes of transients of vibrations caused by FT in extracting constituent frequency information constrained by the relative length of the window. The proposed vibration spectrum imaging (VSI) can enhance features which can be used with an appropriate classifier for fault detection.

K-means cluster, fuzzy classifier, support vector machine (SVM), Bayesian algorithms, and artificial neural network (ANN) are among the widely used classifiers for MHM [6], [10], [12], [25]–[27]. Each of these classifiers has been used successfully in MHM. The selection of an appropriate classifier depends upon the nature and number of features and available training data sets. ANNs are biologically inspired learning processes and are used for applications such as in prognosis, classification, function approximation, and pattern recognition [10]. ANNs are known for their capabilities to learn a large number of features with nonlinear and complex patterns and have been used for many MHM techniques [2], [5], [6], [8]–[10], [20], [27]. Thus, in this paper, an ANN fault classifier with a VSI image pattern has been used where each pixel of the image has been used as an input. The classification accuracy and robustness of the VSI features under different signal-to-noise ratios (SNRs) have been used for performance evaluation. Results have shown that the proposed VSI, image processing, and ANN have surpassed existing solutions in classification accuracy and robustness. Fig. 2 shows the framework for the proposed method.

The rest of this paper is organized as follows. Section II describes the time segmentation and spectral imaging of the vibration signal, Section III presents the feature enhancement method, Section IV explains the ANN architecture and learning procedures, Section V discusses the experimental results, and Section VI concludes this paper.

II. TIME SEGMENTATION AND SPECTRAL IMAGING

The proposed algorithm time-segments the input signal, as a first step. The vibration signal for processing and information extraction is divided into time segments using a fixed rectangular window of 1024 samples. The window size selection is discussed in Section V-B. These time-segmented vibration signals are then used to obtain spectral contents. For a signal X of length l with a window size of W samples, if U time segments are combined to make a time-segmented image, then there are m training set examples given by

$$m = \frac{l}{W U}. \tag{1}$$

With the W window size and U number of time segments stacked in an image $x_i$ of a vibration signal X, it can be represented by

$$x_i = \left[ X_u^i(w) \right], \quad u = 1, 2, 3, \ldots, U; \; w = 1, 2, 3, \ldots, W. \tag{2}$$

The spectral contents of each time segment adapted into an image are then calculated and normalized to form a spectral image $f^i$ using fast Fourier transform (FFT). An exemplary spectral image of motor vibrations is shown in Fig. 3.

Fig. 3. Exemplary motor vibration spectral image.
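The segmentation and imaging steps of (1) and (2) can be illustrated with a minimal NumPy sketch. The function name `spectral_images`, the synthetic test tone, and the per-image peak normalization are assumptions made for illustration only; the text states that the spectra are normalized but does not specify how.

```python
import numpy as np

def spectral_images(x, W=1024, U=8):
    """Stack U consecutive W-sample segments into images of FFT magnitudes.

    Returns an array of shape (m, U, W // 2 + 1), where m follows (1).
    Peak normalization per image is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) // (W * U)                        # (1): number of images
    segments = x[: m * W * U].reshape(m, U, W)   # (2): U rows of W samples each
    spectra = np.abs(np.fft.rfft(segments, axis=-1))  # one-sided spectrum per row
    peak = spectra.max(axis=(1, 2), keepdims=True)
    peak[peak == 0] = 1.0                        # guard against all-zero images
    return spectra / peak

# Hypothetical input: a 1-kHz tone sampled at 12 000 samples per second for 2 s.
fs = 12_000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
images = spectral_images(x)
print(images.shape)
```

With W = 1024, the one-sided FFT yields 513 frequency indices per segment, so each image in this sketch has the (8 × 513) size used later in the experiments.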
III. FEATURED IMAGE ENHANCEMENT

The grayscale spectral image obtained in the previous section contains the information of characteristic frequencies of the bearing health, as shown in Fig. 3. There are a few bright or high amplitude vertical continuous spectral lines and a few with lesser amplitude or brightness in the image. This image is treated using the 2-D averaging filter (3) for feature enhancement and noise mitigation

$$f_d^i = \mathrm{Avg\_Filter}(f^i, Z). \tag{3}$$

In (3), Z is a matrix of (rows × columns) dimensions, with each element of the matrix having a unit value. A new value of any pixel p of the image is calculated by averaging rows (number of vertical neighboring pixel amplitudes) and columns (number of horizontal neighboring pixel amplitudes) by keeping p in the center. The size of the matrix has both vertical and horizontal averaging effects on the image, and its dimension selection plays an important role in image enhancement under low SNR. Using larger row and column values causes too much augmentation of the features because of averaging; in contrast, using smaller values achieves insignificant augmentation. The selection of dimensions of Z and its effects on feature augmentation will be discussed in Section V. Depending upon the dimension of the Z matrix, the Avg_Filter does the 2-D averaging, horizontal and vertical in the image. Thus, the filter makes the continuous lines or coherent features prominent by filling the missing values with the average and mitigating the random dots and insignificant lines in the image by averaging out the contents over the filter size. As a result, averaging will enhance coherent or featured patterns and will depreciate incoherent or noise spectral contents of Fig. 3 as shown in Fig. 4.

Fig. 4. Enhanced VSI of an exemplary motor vibration signal.

These depreciated patterns or noise can be eliminated, and augmented features can be retained by converting the grayscale image into a binary one, using an appropriate threshold thr

$$f_b^i = \mathrm{Binary}(f_d^i, thr). \tag{4}$$

The optimum binary thr is calculated using the thr cost

$$\mathrm{Cost}(thr) = \sum_{db=-d} \sum_{im=1}^{n} \left( BI_{im} - TI_{im,db,thr} \right). \tag{5}$$

Here, d is the minimum decibel value for a particular setup/experiment, n is the number of classes, BI is a baseline image, and TI is a test image. Considering the BI images for each class, (5) is used to calculate the cost of a particular thr value. The thr value that minimizes the cost function over the range of decibel values for a given number of classes is selected as the optimum thr for binary conversion in (4).

This binary conversion of any pixel value index by ind for the $f_d^i$ image works as

$$f_b^i(ind) = \begin{cases} 1, & f_d^i(ind) \ge thr \\ 0, & f_d^i(ind) < thr. \end{cases} \tag{6}$$

Both the Avg_Filter and the binary thr help in retaining featured patterns and removing the noise patterns. For this purpose, the dimension of the filtration matrix Z and the thr value are very important for spectral enhancement purpose. The optimal size of the filter and the thr value are explained in detail in Section V. The binary-converted-spectral image of the exemplary signal, using an appropriate threshold value, is shown in Fig. 5. It can be seen that insignificant values are filtered out from the image. This enhanced image is called featured image and can be fed to the neural network to learn the important features in this image for fault classification.

Fig. 5. VSI after binary conversion.

ANN classifiers are a biologically inspired nonlinear empirical model [6]. ANN design includes the input layer, hidden layers, and output layer. The number of input layer neurons is equal to the input features and the number of output neurons is equal to the number of classes. There can be one or more than one hidden layers with different numbers of neurons. Most of the times, increasing the number of hidden layer neurons guarantees good learning, but it results in increased computational cost. Therefore, the minimum number of the hidden layers and neurons for certain classification accuracy is preferred. Training of ANN can be supervised or unsupervised, depending upon the nature of the problem [2], [9]. In this paper, we are using supervised learning as, for each input feature pattern, we have an associated target output. In this paper, input feature patterns are spectral images, and targets are fault classes. Before starting the training, the available data are divided into three sets: training, validation, and testing. Mean square error (MSE) is used to measure the accuracy of the training. The steepest descent method is used to adjust the weights and biases. The next section will discuss the experimental setup and ANN classification accuracy under different SNR conditions.

A. Experimental Setup

Fig. 6 depicts the experimental setup for recording the actual vibration data sets [28]. Experiments have been conducted with four different bearings, including one normal and three having faults in their inner race, ball, and outer race. Bearing specifications are as follows (specified both in inches and millimeters): inside diameter = 0.9843 in (25.001 mm), outside diameter = 2.0472 in (51.999 mm),
thickness = 0.5906 in (15.001 mm), ball diameter = 0.3126 in (7.940 mm), and pitch diameter = 1.537 in (39.040 mm). Drive-end (12 K) fault specifications: Inner raceway, outer raceway, and ball faults—all of these are of diameter = 0.007 in (0.178 mm) and depth = 0.011 in (0.279 mm). The minimum available fault diameter of 0.007 in (0.178 mm) has been used from 0.007 in (0.178 mm), 0.014 in (0.355 mm), 0.021 in (0.533 mm), and 0.028 in (0.711 mm) to study incipient faults. Vibration signatures of the faults, in their early stages, result in very poor SNR; thus, the smallest available fault diameter with additive white Gaussian noise (AWGN) has been used to mimic incipient faults under poor SNR [7]. Faults in the bearings are created by electrodischarge machining. In the experiment, faulty bearings support the shaft of the motor with a load of 2 hp at a speed of 1750 r/min. The vibration data have been collected through accelerometers using a 16-channel digital-audio-tape recorder and sampled at the rate of 12 000 samples per second, giving us a 6000-Hz spectrum. Amar et al. [28] provide all possible available details about bearings, faults, accelerometers, speed, and experimental specifications. Vibration data, for certain bearing faults, can also be generated using simulated models [29].

Fig. 6. Experimental setup adapted from [28].

B. Spectral Imaging and Feature Enhancement

The recorded vibration data were converted into spectral images. The 1750 r/min and 12 000 samples per second correspond to a 412-sample length of window per cycle. Two cycle lengths of vibration samples have been used to capture fault information. The next nearest power of 2 for two cycles gives us a 1024 window length. Keeping the window length relative to the fault signature duration gives the augmented amplitude calculation of spectral features [27].

Spectral images are of (8 × 513) size having 4104 elements, with U = 8. This indicates that 8 consecutive time segment spectral contents have been stacked to form an image with each segment of 513 frequency indices in the image. In the high SNR case, for noise mitigation, the number of time segments stacked into an image does not matter because averaging has insignificant effect, but a larger U will reduce the training set size (1). In contrast, in low SNR, better noise mitigation can be achieved using a larger number of stacked segments because averaging depreciates the incoherent noise over the filter dimensions. With a larger U value, better enhancement of features can be achieved, but it results in a delayed and slow diagnostic response. The limited size of available recorded data also puts a limit on the stack size for reasonable training set size (1). In this paper, images of (8 × 513) size have been used to have a fairly large training set and feature augmentation at the same time. The number of stacked segments into an image and its effect on augmentation and computational time delays are subject to future studies. Fig. 7 shows spectral images of four vibration signals named normal, inner-race fault, ball fault, and outer-race fault at SNR = 0 dB.

Fig. 7. Spectral images of a normal and three fault signals at 0-dB SNR.

Fig. 8. Spectral images of a normal and three fault signals after averaging filter (8 × 4) at 0-dB SNR.

Fig. 9. Binary converted spectral images of a normal and three fault signals with thr = 0.7 at 0-dB SNR.

These spectral images are processed by (3) with an averaging filter of (8 × 4) and then converted to binary (4) with thr = 0.7 as shown in Figs. 8 and 9, respectively. The selection of the (8 × 4) filter size and thr = 0.7 is of major concern in VSI and is discussed later in detail. These obtained binary images are feature images and are used to form a data set that will be used for the training and testing of the ANN.
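The enhancement chain of (3), (4), and (6) with the (8 × 4) filter and thr = 0.7 can be sketched as follows. This is only an illustration under stated assumptions: the function names `avg_filter`, `binarize`, and `enhance` are hypothetical, the borders are zero padded, and the filtered image is rescaled to a peak of 1 before thresholding. The text does not state the border handling or any rescaling step, so both are choices of this sketch.

```python
import numpy as np

def avg_filter(img, rows=8, cols=4):
    """2-D moving average over a rows x cols all-ones matrix Z, as in (3).

    Zero padding at the borders is an assumption of this sketch.
    """
    img = np.asarray(img, dtype=float)
    top, left = rows // 2, cols // 2
    padded = np.pad(img, ((top, rows - 1 - top), (left, cols - 1 - left)))
    out = np.zeros_like(img)
    for r in range(rows):          # accumulate the shifted copies of the image
        for c in range(cols):
            out += padded[r:r + img.shape[0], c:c + img.shape[1]]
    return out / (rows * cols)

def binarize(img, thr=0.7):
    """Binary conversion as in (4) and (6): 1 where pixel >= thr, else 0."""
    return (np.asarray(img) >= thr).astype(np.uint8)

def enhance(img, rows=8, cols=4, thr=0.7):
    """Filter, rescale to a peak of 1 (assumption), then threshold."""
    filtered = avg_filter(img, rows, cols)
    peak = filtered.max()
    if peak > 0:
        filtered = filtered / peak
    return binarize(filtered, thr)

# Synthetic 8 x 513 image: one coherent spectral line plus one isolated dot.
image = np.zeros((8, 513))
image[:, 100] = 1.0   # coherent line present in all 8 stacked segments
image[3, 300] = 1.0   # incoherent dot present in a single segment
featured = enhance(image)
print(featured[:, 98:104])
print(int(featured[:, 300].sum()))
```

In this synthetic case the coherent line is retained while the isolated dot is removed. With zero padding, the line survives only in the middle rows, which is one possible reading of the top-and-bottom truncation described for larger row values.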
C. ANN Training

The training set obtained from feature images with SNR = 0 dB contains spectral images with each image having 4104 frequency features. These 4104 features are the inputs, and the four target classes are the output of the ANN. One hidden layer with a different number of neurons has been investigated to achieve the best training with a minimum number of neurons for faster computation. Three neurons in the hidden layer gave the required accuracy with minimum computation as shown in Fig. 10.

Fig. 10. Selection of minimum number of hidden layer neurons.

Experimentally recorded data set was divided into 896 (70%) training, 192 (15%) validation, and 192 (15%) test sets after random shuffling. Using feedforward and back propagation algorithm, ANN has been trained for pattern classification. The architecture of ANN is shown in Fig. 11. Fig. 12 shows the training, validation, and testing curves of the ANN.

Fig. 11. Trained neural network architecture.

Fig. 12. Training curves of ANN.

Our studies show that the best case achieved by the validation and test sets gives very infinitesimal MSE, 3.2e−10, implying the accomplished generalized learning of the ANN for known and unknown patterns. After training, ANN was assessed with different normal and abnormal vibration signals, and it was able to classify fault signals into respective classes with 100% accuracy with SNR = 0 dB.

D. Discussion on Filter Size and Binary Threshold Value Selection With Different SNR Conditions

With SNR = 0 dB, ANN was able to classify the faults efficiently. However, in an industrial environment, fault signatures are contaminated with noise [7], and also, incipient faults have very low SNR. Thus, the robustness of the VSI is to be tested under different SNRs. AWGN has been added with recorded vibration signals to achieve different SNR values representing incipient fault conditions. In [7], the authors have discussed the classification accuracy at an SNR of −10 dB, as the worst case scenario, and have compared classification results with existing techniques using the same experimental setup for inchoate faults. In this paper, we will compare the results with these established methods to prove the efficacy of the proposed method. Figs. 13 and 14 show the spectral images of four fault classes at SNR = −10 dB and SNR = −15 dB. In these cases, SNR is so low that it is hard even to see the straight lines suppressed under noise in the spectral images as compared to Fig. 7 at SNR = 0 dB. As mentioned previously, the (8 × 4) filter size and thr = 0.7 are the optimized values in this approach, and thus, we will first discuss the averaging filter and its size effects and then the thr value to enhance these spectral images.

Fig. 13. SNR = −10 dB spectral images.

Fig. 14. SNR = −15 dB spectral images.

1) Averaging Filter Size: Fig. 15 shows the enhanced spectral images of Fig. 13, with the (8 × 4) averaging filter.
Then, Fig. 16 shows the binary converted spectral images of Fig. 15 using thr = 0.7. Fig. 17 shows the binary converted spectral images of Fig. 13 with the same threshold thr = 0.7 without applying the averaging filter. Now, by looking at Figs. 16 and 17, it is clear that the averaging filter enhances the image by brightening the fault signature features and by mitigating the noise of the spectral images.

Fig. 15. Enhanced spectral images with averaging filter of (8 × 4) at SNR = −10 dB.

Fig. 16. Enhanced image of −10-dB SNR with averaging filter of (8 × 4) converted to binary image with thr = 0.7.

Fig. 17. Binary image with thr = 0.7 without averaging filter.

Let us discuss the dimensions of the filter size. Figs. 18–22 show the binary converted images processed with different averaging filter sizes. The averaging filter is a 2-D filter with both horizontal and vertical effects on the image.
1) With a larger column value, a larger number of adjacent frequencies in the spectrum will be affected by the averaging as shown in Fig. 22. Thus, a maximum column value that avoids the merging of the adjacent frequencies is preferred, and that value is 4 in this case.
2) The row value has effects on filling the missing values among the stacked segments and mitigating the noise. With a larger row value, the missing value can be better estimated, and larger reduction in noise can be achieved. However, a larger row value truncates the featured image from top and bottom as shown in Fig. 21. Smaller row values do not estimate the missing values accurately as shown in Fig. 18. Therefore, a compromised row value is required which gives us optimum results, and that value is 8.

Thus, comparing Figs. 18–22 with Fig. 16, it is evident that, with the averaging filter of (8 × 4), we get better enhancement of features and noise mitigation.

Fig. 18. thr = 0.7, SNR = −10 dB, and Z dimensions = (2 × 2).

Fig. 19. thr = 0.7, SNR = −10 dB, and Z dimensions = (4 × 4).

Fig. 20. thr = 0.7, SNR = −10 dB, and Z dimensions = (8 × 8).

2) Binary Threshold Value: After determining the size of the averaging filter, now, we will discuss about the threshold value for binary conversion. Figs. 23 and 24 show the effects of the threshold values of thr = 0.5 and thr = 0.9 with the use
of the averaging filter of size (8 × 4). Comparing these figures with Fig. 16, it is clear that using a higher thr value results in the loss of features and a lesser thr value results in inefficient noise removal. After a number of studies using different thr values, it was found that thr = 0.7 is a better choice for binary conversion.

A visually inspected optimized thr = 0.7 value can be confirmed by (5). The plot of the cost function in (5) over the thr range of 0–1 with n = 4 and d = 12 is shown in Fig. 25 using baseline images of four classes with the averaging filter of (8 × 4). It is evident from the plot that the cost function minimizes at the thr = 0.7 value.

Fig. 25 indicates that 0.7 minimizes the cost function. The cost of the function is high before and after thr = 0.7. Costs at thr values lesser than 0.7 point toward excessive noise in the images, and costs at values greater than 0.7 indicate poor fault features. Thus, binary conversion with thr = 0.7 is the optimum choice to remove the noise under poor SNR and still ensures good classification by retaining the spectral features.

Fig. 21. thr = 0.7, SNR = −10 dB, and Z dimensions = (16 × 16).

Fig. 22. thr = 0.7, SNR = −10 dB, and Z dimensions = (16 × 4).

Fig. 23. thr = 0.9, SNR = −10 dB, and Z dimensions = (8 × 4).

Fig. 24. thr = 0.5, SNR = −10 dB, and Z dimensions = (8 × 4).

Fig. 25. Binary threshold value selection.

3) Costs Associated With Average Filtering and Binary Threshold: There are two noise sources in the spectral images: intentionally added AWGN and processing noise. Processing noise is the error caused by different enhancement procedures in the spectral images. The total noise or error cost associated with the test image relative to the baseline image can be studied by modifying (5) into (7)

$$\mathrm{Cost}(db) = \frac{\sum_{im=1}^{n} \operatorname{abs}\left( BI_{im} - TI_{im} \right)}{Norm}. \tag{7}$$

Norm is the normalization factor and is equal to the product of the number of image pixels and the number of classes. If we have a white test image and a black baseline image or vice versa, then (7) will give us the cost of 1, which is the maximum possible error or cost. As similarity between baseline and test images increases, the cost value decreases. Therefore, the cost of 1 corresponds to maximum dissimilarity, and that of 0 corresponds to maximum similarity, indicating the absence of AWGN and processing noise. If we know the cost of AWGN, then the scale of 0 to 1 can be used to estimate the processing noise caused by image enhancement procedures for a spectral image at any given SNR. Fig. 26 shows the error curves of different image enhancement procedures in comparison to no-enhancement reference curve, i.e., AWGN only at any SNR level.
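The two ingredients of this cost analysis, adding AWGN at a target SNR and evaluating the normalized cost of (7), can be sketched as below. The function names `add_awgn` and `cost` and the array layout (classes × rows × columns) are assumptions for illustration; the pixelwise absolute difference is summed over all pixels of each class image, which is how the normalization by pixels × classes is read here.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that signal power / noise power = snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def cost(baseline_images, test_images):
    """Normalized cost as in (7) for images stacked as (classes, rows, cols)."""
    bi = np.asarray(baseline_images, dtype=float)
    ti = np.asarray(test_images, dtype=float)
    norm = bi[0].size * bi.shape[0]   # pixels per image x number of classes
    return float(np.abs(bi - ti).sum() / norm)

# Limiting cases named in the text: opposite images give 1, identical give 0.
white = np.ones((4, 8, 513))
black = np.zeros((4, 8, 513))
print(cost(white, black), cost(white, white))
```

For images with pixel values in [0, 1], this cost stays between 0 and 1, so curves computed at different SNR levels can be compared on the same scale.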
Fig. 26. Comparison of error caused by different image enhancement

Comparative results are presented in Fig. 26 in the form of four error curves. These curves are no-enhancement (AWGN only), averaging filter (AWGN and processing noise of the averaging filter), binary threshold (AWGN and processing noise of binary conversion), and combination of both the averaging filter and binary threshold (AWGN and processing noise caused by both the averaging filter and binary conversion). The increased cost/error level shown by the averaging filter curve over the no-enhancement curve indicates that, to achieve feature augmentation and missing values, the averaging filter introduces noise in the image; thus, the total cost goes well above the reference. This augmentation and increased noise can be seen in Figs. 7 and 8. In contrast, the binary threshold curve always stays below the reference, indicating noise mitigation, and is vivid in Fig. 9 when compared with Figs. 7 and 8. This can be interpreted as that the averaging filter increases the noise level to achieve content augmentation while the binary conversion mitigates noise but achieves no feature augmentation; thus, the combination of both should provide better enhancement by providing augmentation and noise mitigation in the images.

Fig. 26 shows that the curve with the combination of both the averaging filter and binary threshold minimizes the cost in comparison to all other curves. The averaging filter achieved feature augmentation but increased the total noise which was then mitigated by binary threshold and thus resulted in better feature enhancement.

E. Classification Accuracy

Now, we will discuss the classification accuracy of the trained ANN under different SNR conditions to validate the robustness of the VSI-based ANN classifier. Table I shows the classification accuracy of the VSI with the SNR varying from 0 dB to −15 dB. Results show that VSI is capable to have higher classification accuracy even under adverse conditions.

Fig. 27 shows the confusion matrix of VSI with ANN at −10 dB. The classifier has achieved overall very high (96.9%) classification accuracy. To investigate how good the classifier has learned about different classes, we need to determine the hit and false alarm probability pair (hit probability and false alarm probability), and this can be calculated from the confusion matrix, shown in Fig. 27, by considering the number of true positives, false positives, false negatives, and true negatives. For better fault classification, the high hit and low false alarm probabilities are preferred, and for a perfect classifier, these probabilities will be 1 and 0, respectively [30]. For the four classes, from the confusion matrix (see Fig. 27), the hit and false alarm probability pairs are (1, 0), (0.87, 0), (1, 0), and (1, 0.04), respectively. The overall hit and false alarm probabilities of the classifier are 0.96 and 0.01, respectively. High hit probability (0.96) and low false alarm probability (0.01) indicate that VSI with ANN is capable to distinguish different classes accurately, showing very high correct detection percentage with very low false alarms even under poor SNR.

Fig. 27. Confusion matrix of VSI-based ANN at −10 dB.

TABLE II. CLASSIFICATION COMPARISON OF VSI AT SNR = −10 dB WITH THAT IN [7]

Table II shows the comparison of well-established techniques for classification accuracies mentioned in [7] in the worst case scenario of SNR = −10 dB for inchoate/incipient faults. Table II indicates that the accuracy of VSI with ANN surpasses the existing best case techniques by 5.67% at SNR = −10 dB. Even, according to Tables I and II, VSI at SNR = −12 dB outperforms that in [7] at SNR = −10 dB.
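The per-class hit and false alarm probabilities discussed above can be computed from a confusion matrix as in the following sketch. The function name `hit_false_alarm` and the 4 × 4 matrix are hypothetical: the counts are illustrative values chosen for this example and are not taken from Fig. 27. Hit probability is taken here as TP / (TP + FN) and false alarm probability as FP / (FP + TN), which is the usual reading of these terms.

```python
def hit_false_alarm(confusion):
    """Per-class (hit, false alarm) pairs from a square confusion matrix.

    confusion[t][p] is the number of samples of true class t that were
    predicted as class p.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    pairs = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # class k missed
        fp = sum(confusion[t][k] for t in range(n)) - tp   # others called k
        tn = total - tp - fn - fp
        hit = tp / (tp + fn) if tp + fn else 0.0
        false_alarm = fp / (fp + tn) if fp + tn else 0.0
        pairs.append((hit, false_alarm))
    return pairs

# Hypothetical counts for four classes with 48 test images each.
example = [
    [48, 0, 0, 0],
    [0, 42, 0, 6],
    [0, 0, 48, 0],
    [0, 0, 0, 48],
]
pairs = hit_false_alarm(example)
for label, (hit, fa) in zip(["normal", "inner race", "ball", "outer race"], pairs):
    print(f"{label}: hit={hit:.3f}, false alarm={fa:.3f}")
```

A perfect classifier would give the pair (1, 0) for every class; any misclassified sample lowers the hit probability of its true class and raises the false alarm probability of the class it was assigned to.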
To validate the robustness of the VSI features, the ANN was also trained with the WT features proposed in [7] for comparison. These features gave 100% accuracy at 0 dB with a high hit probability and a low false alarm probability, but at −10 dB, they gave an overall hit probability of 0.38 and a false alarm probability of 0.15. This indicates that the WT features cannot discriminate among the different classes with ANN under poor SNR, whereas the VSI features remain robust when used with ANN under the same conditions.

The translation-invariant nature of FT spectral contents, in contrast to WT, and image processing, with visual aid for feature enhancement, have helped to increase the classification accuracy under low SNR. The averaging filter size and the binary threshold value are crucial parameters for this reliable fault classification. The 2-D averaging filter simultaneously mitigates the incoherent noise and augments the coherent fault features even under low SNR. Applying an appropriate binary threshold then removes the mitigated noise and keeps the augmented fault signatures, and thus provides the feature enhancement that helps to achieve improved classification accuracy.

VI. CONCLUSION

In this paper, VSI and ANN-based classification of inner race, outer race, and ball faults against normal behavior has been presented. The spectral contents of the translation-variant time-segmented vibration signal, transformed into a spectral image, have been processed for feature enhancement using the 2-D averaging filter and grayscale-to-binary image conversion. These feature-enhanced images are then used to train the ANN classifier. Results have shown that ANN with VSI has learned the complex known and unknown patterns with 96.90% accuracy, outperforming the existing techniques by 5.67% at the worst case of SNR = −10 dB.

REFERENCES

[1] Motor Reliability Working Group, Power Systems Reliability Subcommittee, Power Systems Engineering Committee, Industrial and Commercial Power Systems Department, IEEE Industry Applications Society, “Report of large motor reliability survey of industrial and commercial installations, part I,” IEEE Trans. Ind. Appl., vol. IA-21, no. 4, pp. 853–864, Jul. 1985.
[2] B. Li, M.-Y. Chow, Y. Tipsuwan, and J. C. Hung, “Neural-network-based motor rolling bearing fault diagnosis,” IEEE Trans. Ind. Electron., vol. 47, no. 5, pp. 1060–1069, Oct. 2000.
[3] A. Bellini, F. Immovilli, R. Rubini, and C. Tassoni, “Diagnosis of bearing faults of induction machines by vibration or current signals: A critical comparison,” in Conf. Rec. IEEE IAS Annu. Meeting, Oct. 2008, pp. 1–8.
[4] M. Amar, I. Gondal, and C. Wilson, “Unitary anomaly detection for ubiquitous safety in machine health monitoring,” in Proc. Neural Inf. Process., 2012, pp. 361–368.
[5] S. Hayashi, T. Asakura, and S. Zhang, “Study of machine fault diagnosis system using neural networks,” in Proc. IJCNN, 2002, vol. 1, pp. 956–961.
[6] H. Su and K. T. Chong, “Induction machine condition monitoring using neural network modeling,” IEEE Trans. Ind. Electron., vol. 54, no. 1, pp. 241–249, Feb. 2007.
[7] M. F. Yaqub, I. Gondal, and J. Kamruzzaman, “Inchoate fault detection framework: Adaptive selection of wavelet nodes and cumulant orders,” IEEE Trans. Instrum. Meas., vol. 61, no. 3, pp. 685–695, Mar. 2012.
[8] Y. Yang and W. Tang, “Study of remote bearing fault diagnosis based on BP neural network combination,” in Proc. 7th ICNC, Jul. 2011, vol. 2, pp. 618–621.
[9] J. F. Martins, V. F. Pires, and A. J. Pires, “Unsupervised neural-network-based algorithm for an on-line diagnosis of three-phase induction motor stator fault,” IEEE Trans. Ind. Electron., vol. 54, no. 1, pp. 259–264, Feb. 2007.
[10] C. S. Tyagi, “A comparative study of SVM classifiers and artificial neural networks application for rolling element bearing fault diagnosis using wavelet transform preprocessing,” in Proc. World Acad. Sci., Eng. Technol., Sep. 2008, vol. 45, pp. 319–327.
[11] F. Filippetti, G. Franceschini, C. Tassoni, and P. Vas, “Recent developments of induction motor drives fault diagnosis using AI techniques,” IEEE Trans. Ind. Electron., vol. 47, no. 5, pp. 994–1004, Oct. 2000.
[12] C. Chen, B. Zhang, and G. Vachtsevanos, “Prediction of machine health condition using neuro-fuzzy and Bayesian algorithms,” IEEE Trans. Instrum. Meas., vol. 61, no. 2, pp. 297–306, Feb. 2012.
[13] Y. Jun-rong, Y. Min, C. Xia, and H. Yan, “Fault diagnosis of rolling bearing based on rough set and neural network,” Appl. Mech. Mater., vol. 58–60, pp. 974–977, Jun. 2011.
[14] M. Xia, F. Kong, and F. Hu, “An approach for bearing fault diagnosis based on PCA and multiple classifier fusion,” in Proc. 6th IEEE Joint Int. ITAIC, Aug. 2011, pp. 321–325.
[15] G. G. Yen and K.-C. Lin, “Wavelet packet feature extraction for vibration monitoring,” IEEE Trans. Ind. Electron., vol. 47, no. 3, pp. 650–667, Jun. 2000.
[16] S. K. Goumas, M. E. Zervakis, and G. S. Stavrakakis, “Classification of washing machines vibration signals using discrete wavelet analysis for feature extraction,” IEEE Trans. Instrum. Meas., vol. 51, no. 3, pp. 497–508, Jun. 2002.
[17] X. Lou and K. A. Loparo, “Bearing fault diagnosis based on wavelet transform and fuzzy inference,” Mech. Syst. Signal Process., vol. 18, no. 5, pp. 1077–1095, Sep. 2004.
[18] S. Seker and E. Ayaz, “Feature extraction related to bearing damage in electric motors by wavelet analysis,” J. Franklin Inst., vol. 340, no. 2, pp. 125–134, Mar. 2003.
[19] F. Li, G. Meng, L. Ye, and P. Chen, “Wavelet transform-based higher-order statistics for fault diagnosis in rolling element bearings,” J. Vib. Control, vol. 14, no. 11, pp. 1691–1709, Nov. 2008.
[20] B. Samanta and K. R. Al-Balushi, “Artificial neural network based fault diagnostics of rolling element bearings using time-domain features,” Mech. Syst. Signal Process., vol. 17, no. 2, pp. 317–328, Mar. 2003.
[21] A. Malhi and R. X. Gao, “PCA-based feature selection scheme for machine defect classification,” IEEE Trans. Instrum. Meas., vol. 53, no. 6, pp. 1517–1525, Dec. 2004.
[22] K. L. V. Iyer, X. Lu, Y. Usama, V. Ramakrishnan, and N. C. Kar, “A twofold Daubechies-wavelet-based module for fault detection and voltage regulation in SEIGs for distributed wind power generation,” IEEE Trans. Ind. Electron., vol. 60, no. 4, pp. 1638–1651, Apr. 2013.
[23] M. Blodt, P. Granjon, B. Raison, and G. Rostaing, “Models for bearing damage detection in induction motors using stator current monitoring,” IEEE Trans. Ind. Electron., vol. 55, no. 4, pp. 1813–1822, Apr. 2008.
[24] Y. Zhang, C. Bingham, M. Gallimore, Z. Yang, and J. Chen, “Machine fault detection during transient operation using measurement denoising,” in Proc. IEEE Int. Conf. CIVEMSA, Jul. 2013, pp. 110–115.
[25] C. T. Yiakopoulos, K. C. Gryllias, and I. A. Antoniadis, “Rolling element bearing fault detection in industrial environments based on a K-means clustering approach,” Expert Syst. Appl., vol. 38, no. 3, pp. 2888–2911, Mar. 2011.
[26] V. Sugumaran and K. I. Ramachandran, “Fault diagnosis of roller bearing using fuzzy classifier and histogram features with focus on automatic rule learning,” Expert Syst. Appl., vol. 38, no. 5, pp. 4901–4907, May 2011.
[27] M. Amar, I. Gondal, and C. Wilson, “Multi-size-window spectral augmentation: Neural network bearing fault classifier,” in Proc. IEEE 8th Conf. ICIEA, Jun. 2013, pp. 261–266.
[28] Bearing Data Center, Jan. 2009. [Online]. Available: http://csegroups.
[29] F. Immovilli, C. Bianchini, M. Cocconcelli, A. Bellini, and R. Rubini, “Bearing fault model for induction motor with externally induced vibration,” IEEE Trans. Ind. Electron., vol. 60, no. 8, pp. 3408–3418, Aug. 2013.
[30] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, “Benchmarking classification models for software defect prediction: A proposed framework and novel findings,” IEEE Trans. Softw. Eng., vol. 34, no. 4, pp. 485–496, Jul./Aug. 2008.

Authors’ photographs and biographies not available at the time of publication.
