https://doi.org/10.1007/s11042-024-18546-9
Abstract
For system safety and reliability, Remaining Useful Life (RUL) prediction is considered in many industries. Traditional machine learning techniques lack rich feature representation and adaptive feature extraction. Deep learning techniques such as Long Short-Term Memory (LSTM) networks have achieved excellent performance in RUL prediction. However, an LSTM network relies mainly on the most recent data and may therefore capture only limited contextual information. To address this problem, this paper proposes a hybrid combination of a Convolutional Neural Network (CNN) and an LSTM (CNN+LSTM). The proposed hybrid model predicts how long a machine can operate without breaking down. In the proposed work, the 1D horizontal and vertical vibration signals of a mechanical bearing are first converted to 2D images using the Continuous Wavelet Transform (CWT). These 2D images are fed to the CNN for key feature extraction, and the extracted features are then fed to the LSTM network to predict the RUL of the bearing. The PRONOSTIA dataset is used to demonstrate the performance of the proposed model and to compare it with other state-of-the-art methods. Experimental results show that the proposed CNN+LSTM hybrid model achieves higher accuracy (98%) and better robustness than existing methods.
1 Introduction
Multimedia Tools and Applications
and energy efficiency, prediction of RUL is crucial. Because of the complicated structure of a machine, its maintenance is an important and challenging task, and a poorly maintained machine is more likely to fail at a critical time. Different maintenance approaches are therefore used to increase the reliability, safety, accessibility, and operational quality of the machine. Besides these benefits, they also reduce unplanned downtime and the operational costs of the machine.
Industries such as automobile, manufacturing, and aircraft use different machine maintenance techniques [1, 5]. The traditional approach, reactive or breakdown maintenance, performs repairs only after the machine fails: the equipment continues to operate until it breaks down in a final failure. A key advantage of reactive maintenance is that the machine runs for as long as possible before one of its elements breaks down. Its key limitation, however, is that it responds only after a failure has already occurred, which may result in significant damage to the machine or a high probability of an accident. Further, reactive maintenance is unsuitable when a machine contains expensive parts [16], because completely damaged expensive parts cost more to repair. Hence, reactive maintenance is not a good choice for such machines.
To overcome the problems of reactive maintenance, equipment can be tested frequently at regular intervals. This technique is known as preventive maintenance. Although preventive maintenance enhances the machine's efficiency, performance, and safety, it also raises maintenance costs. Another limitation of preventive maintenance is that it is hard to determine when these tests should be performed [3, 15].
These problems can be addressed by a more advanced technique, predictive maintenance, in which the machine's elements are assessed before a failure happens. Predictive maintenance estimates the maintenance time, i.e., the time remaining before the machine fails. It also detects the machine's flaws and indicates which part of the machine needs to be repaired. Thus, predictive maintenance not only reduces the downtime of the machine but also extends its lifetime.
Predictive maintenance techniques are further divided into model-based and data-driven categories, and both can be used to estimate a machine's RUL. Model-based approaches rely on a physical model built from physics concepts to anticipate when a device may fail [18, 23]. However, model-based approaches are hard to implement, since their models are difficult to comprehend and building a reliable physical model is challenging.
Data-driven methods for predicting the RUL of a mechanical element have recently gained much attention because of their advantages over model-based approaches [4, 7]. In a data-driven approach, a prediction model of mechanical element failure is built from data: the model takes data as input and predicts the output from that input. The first step in building such a machine learning model is to collect data under various operating conditions.
Data-driven approaches may be constrained if the available data are unclear or insufficient. When a large volume of data about a machine is available, its degradation or damage probability can be computed efficiently and precisely. Applying a data-driven approach with limited or insufficient data is challenging, but it is still possible with strategies and techniques such as data augmentation, feature engineering, transfer learning, regularization, and ensemble methods. Further, with a data-driven methodology, the damaged component of a machine can be determined even without prior knowledge of the equipment [24]. Hence, data-driven approaches are easier to execute than model-based approaches. This paper proposes a CNN+LSTM-based hybrid data-driven approach for predicting the RUL of a mechanical bearing.
The proposed CNN+LSTM-based RUL prediction algorithm belongs to the predictive maintenance category, in which a machine element is inspected based on its predicted RUL. Hence, the main goal of the proposed work is to predict the RUL of a machine bearing. The fundamental motivation for combining a CNN and an LSTM network for RUL prediction from vibration signals is that vibration signals from mechanical equipment are typically time-series data. CNNs excel at learning hierarchical features from spatial data, and they can be adapted to work with time-series data [14].
Using CNNs as the model's initial layers, relevant features can be extracted automatically from the vibration signals [10]. This reduces the need for manual feature engineering, which can be time-consuming and error-prone. Further, mechanical equipment vibrations often contain both spatial and temporal patterns. CNNs are excellent at capturing spatial patterns within the signals, while LSTMs are designed to capture temporal dependencies [13]. By combining these two architectures, both the spatial and the temporal information in the data can be modelled and exploited effectively. In the proposed approach, the vibration signals of a machine bearing are used to monitor its health: the horizontal and vertical vibrations of the bearing indicate the present health condition of the machine.
In short, the fundamental idea of the proposed work is as follows. Data on the machine's mechanical element, i.e., the bearing, are collected under different working conditions. The acquired bearing dataset consists of horizontal and vertical vibration signals recorded at regular intervals, each sample timestamped with the hour, minute, second, and microsecond. The horizontal and vertical 1D signals are preprocessed and converted into 2D images using the Continuous Wavelet Transform (CWT).
The converted 2D images of the horizontal and vertical signals are then divided into two parts: training and testing. The training images are fed to the CNN, which extracts key feature vectors from them, and these feature vectors are fed to the LSTM for RUL prediction of the mechanical bearing. To check the accuracy of the proposed model, different performance metrics are calculated, such as the Sum of Mean Squared Error (SME) and the Mean Absolute Error (MAE). The key contributions of the proposed work are as follows:
1. To acquire, analyze, and preprocess a mechanical bearing dataset under different operating conditions.
2. To convert 1D vibration signals into 2D images and extract key feature vectors from the mechanical bearing dataset.
3. To design and develop a CNN+LSTM algorithm for the RUL prediction of a mechanical bearing.
4. To calculate different performance metrics of the proposed hybrid model for mechanical bearing RUL prediction.
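The error metrics mentioned above can be sketched in NumPy as follows. The text does not define the Sum of Mean Squared Error (SME) precisely, so only the standard MAE, MSE, and RMSE are shown; the function names are illustrative, not the authors' code.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred):
    """Mean Squared Error: penalizes large errors more heavily than MAE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: MSE expressed in the units of the target (e.g. the HI)."""
    return float(np.sqrt(mse(y_true, y_pred)))

print(mae([1, 2, 3], [1, 2, 5]), mse([1, 2, 3], [1, 2, 5]))
```

MAE is less sensitive to occasional large errors near the end of a bearing's life, while MSE and RMSE emphasize them, which is why both kinds of metric are usually reported together.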
The remaining sections of the paper are organized as follows. Section 2 deals with the
background and literature survey. Section 3 describes the proposed CNN+LSTM based RUL
prediction algorithm. The experimental setup and result analysis are described in Section 4.
Finally, Section 5 highlights the conclusion and future direction of the proposed work.
2 Literature survey
Li et al. [12] proposed an RUL prediction algorithm for rolling bearings. In their work, the authors combined the elastic net with an LSTM; the resulting algorithm is known as E-LSTM. Their approach considered temporal and spatial correlation to forecast the RUL, and it reduced the over-fitting of the LSTM network with an elastic net-based regularization term. The key limitation of this work is that the authors applied the vibration signals directly to the E-LSTM; therefore, its accuracy could have been better.
Xi et al. [21] designed and developed an RUL prediction algorithm for a dynamic system subject to multiple dependent degradations, referred to as an online RUL system. By observing multidimensional data, the authors predicted the RUL of a blast furnace using a sequential Kalman filter, and they verified their solution numerically. The key limitation of their work is that the Kalman filter assumes a linear relationship between the dependent and independent variables; therefore, the accuracy of the model could have been better.
Liu et al. [11] proposed an RUL prediction system for aircraft using a Deep Convolutional Neural Network (DCNN) and a Light Gradient Boosting Machine (LightGBM). The advantage of their system is that neither signal processing of the raw sensor data nor prior expertise is required. The authors applied time windows of raw data as input to the DCNN for key feature vector extraction and replaced the fully connected layer of the DCNN with LightGBM. By replacing the fully connected layer with LightGBM, they improved the RUL prediction accuracy.
Wang et al. [19] performed a comparative study of four different deep neural networks, including the Deep Belief Network (DBN), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), for predicting the RUL of mechanical elements. The key limitation of this work is that the authors neither conducted experiments nor proposed an algorithm.
Sayyad et al. [15] summarized the research on RUL prediction of mechanical components. The authors highlighted the different datasets available for RUL prediction, described the merits and demerits of the existing work, and outlined future research directions. The key limitation of this work is that the authors did not propose any novel approach for RUL forecasting and did not conduct any experiments.
Hong et al. [6] proposed a fast and accurate RUL prediction algorithm for degradable equipment, using different signal decomposition methods to minimize the influence of outliers. First, they extracted time-frequency domain features using the Wavelet Transform (WT). Then, they jointly used three Recurrent Neural Networks (RNNs) to predict the RUL of the equipment. Their experiments used a gas turbine engine dataset.
Wenqiang et al. [20] proposed a Temporal Convolutional Network (TCN)-based RUL prediction system for mechanical elements. They applied K-means clustering to identify the operating condition of the system and then used a sliding time window to construct the model input. Finally, they compared their results with other RUL prediction techniques; the advantage of their work is that the resulting algorithm achieved greater accuracy.
Wang et al. [17] proposed an RUL prediction system based on a Deep Residual Attention Network (DRAN) that handles data from different sensors. The DRAN comprises a representation learning sub-network and an RUL prediction sub-network, and it is constructed to extract the important information hidden in the sensor data while suppressing useless information.
Liu et al. [9] proposed a feature attention-based end-to-end RUL prediction approach. During training, their model assigned greater attention weights to the key features of the input data. The weighted feature vectors were then fed to Bidirectional Gated Recurrent Units (BGRU) to extract long-term dependencies, and finally a fully connected network predicted the RUL. They used a turbofan engine dataset in their experiments.
Yang et al. [22] proposed a double CNN architecture to which the original vibration signal is fed directly. The double CNN model extracts as much key information as possible from the input data and includes two stages: the first CNN model identifies the incipient fault point, and the second CNN model predicts the RUL. The authors compared the double CNN model with other state-of-the-art algorithms, and it performed well in terms of the RUL prediction accuracy of mechanical elements.
Jin et al. [8] proposed a handcrafted feature flows (HFFs) feature extraction technique. In
their work, they suppressed the raw signal noise and improved the sequential information in
the data. Further, they proposed a Bi-directional LSTM (Bi-LSTM) based two-stream network
for RUL prediction. They experimented on the commercial modular aero propulsion system
simulation (C-MAPSS) dataset.
3 Proposed work
The flow chart of the proposed CNN+LSTM-based RUL prediction system is shown in Fig. 1. The proposed work consists of several stages: mechanical bearing dataset acquisition, data preprocessing, CNN+LSTM model training, hyperparameter tuning, model testing, and performance evaluation. First, the mechanical bearing data are acquired from [2]. The acquired dataset, known as the PRONOSTIA dataset, is time-series data consisting of a mechanical bearing's horizontal and vertical vibration signals. After acquisition, the dataset is analyzed to detect outliers and null values; these are removed during preprocessing, and the 1D time-series data are converted to 2D images. The dataset is then divided into training, validation, and testing parts, which are used to train, validate, and test the proposed model, respectively. Finally, different performance metrics are calculated. Each step in Fig. 1 is described in detail below.
The PRONOSTIA dataset (PHM IEEE 2012 Data Challenge dataset) [2] is used to develop the RUL prediction model. It consists of two parts: a learning (training) part and a testing part. The learning part contains data for 6 rolling bearings under different operating conditions: Bearing1_1, Bearing1_2, Bearing2_1, Bearing2_2, Bearing3_1, and Bearing3_2. The testing part contains data for 11 rolling bearings under different operating conditions: Bearing1_3, Bearing1_4, Bearing1_5, Bearing1_6, Bearing1_7, Bearing2_3, Bearing2_4, Bearing2_5, Bearing2_6, Bearing2_7, and Bearing3_3.

[Fig. 1: Flow chart of the proposed CNN+LSTM-based RUL prediction system. Mechanical bearing dataset acquisition → data preprocessing (removing null values and outliers, conversion of vibration signals into 2D images, partition of the dataset into training, validation, and testing) → model training on the training dataset (feature selection using CNN, RUL prediction using LSTM, hyperparameter tuning) → accept or reject → save model → test dataset → calculate performance metrics]
In the PRONOSTIA dataset, each bearing's horizontal and vertical signals are recorded over time in (.csv) files, and each bearing has several (.csv) files. Each file contains six fields: hour, minute, second, microsecond, horizontal acceleration, and vertical acceleration.
The horizontal and vertical acceleration signals are sampled at 25.6 kHz. The sampling rate (sampling frequency) determines how many samples are captured every second; here it is 25,600 samples per second for each channel. Every 10 seconds, a 0.1-second recording is saved to a data file, so each file contains 25,600 × 0.1 = 2,560 data points for each of the horizontal and vertical acceleration channels. A set of observations collected at discrete, evenly spaced periods is referred to as time-series data; since the vibration signals are recorded every 10 seconds, the acquired dataset is time-series data.
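The file layout and sampling arithmetic above can be made concrete with a small loader. This is a minimal sketch, assuming each file is comma-separated with no header row and the six columns in the order listed; the `delimiter` parameter is there in case a file uses a different separator. The function name `load_snapshot` is hypothetical, and the three-row example uses synthetic values, not real PRONOSTIA data.

```python
import io
import numpy as np

FS = 25_600                                    # sampling rate in Hz (25.6 kHz)
SNAPSHOT_S = 0.1                               # length of each recording in seconds
SAMPLES_PER_FILE = round(FS * SNAPSHOT_S)      # 2560 samples per channel

def load_snapshot(source, delimiter=","):
    """Load one vibration file: hour, minute, second, microsecond, horiz. acc., vert. acc."""
    data = np.loadtxt(source, delimiter=delimiter, ndmin=2)
    if data.shape[1] != 6:
        raise ValueError(f"expected 6 columns, got {data.shape[1]}")
    # seconds since midnight, built from the four timestamp columns
    t = data[:, 0] * 3600 + data[:, 1] * 60 + data[:, 2] + data[:, 3] * 1e-6
    return t, data[:, 4], data[:, 5]

# Synthetic three-row example (not real PRONOSTIA values)
example = io.StringIO("9,39,39,65664,0.5,-0.1\n9,39,39,65703,0.4,-0.2\n9,39,39,65742,0.3,0.1\n")
t, horiz, vert = load_snapshot(example)
```

The roughly 39 µs step between the synthetic timestamps matches the sample period 1/25,600 s, which is a quick sanity check when reading real files.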
Since the dataset in its original form is not appropriate for training the proposed CNN+LSTM-based hybrid model, it must first be analyzed and preprocessed. First, we analyzed the dataset by counting the null values and outliers it contains. Outliers were detected using the box plot method. Since the PRONOSTIA dataset is large and very few records are null, the records containing null values were deleted from the dataset.
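The two cleaning steps above can be sketched in NumPy. The box-plot rule is assumed here to be the usual whisker convention, flagging values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]; the text does not state the whisker factor, so `k=1.5` is an assumption, and both function names are hypothetical.

```python
import numpy as np

def drop_null_rows(data):
    """Delete every record (row) that contains at least one NaN."""
    data = np.asarray(data, dtype=float)
    return data[~np.isnan(data).any(axis=1)]

def boxplot_outlier_mask(x, k=1.5):
    """True where a value lies outside the box-plot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

clean = drop_null_rows([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
mask = boxplot_outlier_mask([1, 2, 3, 4, 5, 100])
```

Vibration amplitudes grow sharply near failure, so flagged points are worth inspecting before removal rather than dropped blindly; genuine late-life peaks carry degradation information.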
Then, the CWT is applied to the time-series data to convert the 1D data into 2D images. After applying the CWT, the resulting 2D coefficients are normalized with min-max normalization, which rescales all values to the range [0, 1]:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum coefficient values. Figure 2 shows the horizontal vibration signal of a bearing in 1D format. The horizontal axis represents time in microseconds, and the vertical axis represents the frequency of the horizontal vibration in hertz.
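The min-max formula above translates directly into NumPy. This sketch normalizes one coefficient array at a time; whether the authors normalized per image or over the whole dataset is not stated, so per-array scaling is an assumption. The small `eps` guard (also an addition) avoids division by zero for a constant array.

```python
import numpy as np

def min_max_normalize(coeffs, eps=1e-12):
    """Rescale an array of CWT coefficients to the range [0, 1]."""
    c = np.asarray(coeffs, dtype=float)
    lo, hi = c.min(), c.max()
    return (c - lo) / max(hi - lo, eps)   # eps guards against a constant input

scaled = min_max_normalize([[2.0, 4.0], [6.0, 10.0]])
```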
Figure 3 shows the vertical vibration in 1D format; its horizontal axis represents time in microseconds, and its vertical axis represents the frequency of the vertical signal in hertz. High peaks in Figs. 2 and 3 represent high vibration of the mechanical bearing in the horizontal and vertical directions, respectively. The acquired dataset gives the horizontal and vertical vibration of the mechanical element with respect to time in microseconds. Since the 1D signals in the two directions are difficult to interpret, a 2D image is generated from the 1D horizontal and vertical signals. Vibration signals are better visualized, evaluated, and controlled using signal processing; therefore, a signal processing technique, the CWT, is used to transform the 1D horizontal and vertical features into a 2D image. Because time-frequency domain characteristics contain more information about the vibration signals, mechanical bearing degradation can be detected more easily. In regression problems such as RUL prediction, this richer representation makes the analysis easier. Figure 4 shows the 2D representation of the horizontal and vertical vibration signals. These 2D feature images are used to train the proposed CNN+LSTM model. After preprocessing, the dataset is divided into training and testing parts, and the training part is further divided into training and validation subsets, as described below.
The training part is used to train the model to predict the RUL of mechanical elements, and it is split into 90% training and 10% validation data. The training subset is used to train the proposed CNN+LSTM model, and the validation subset is used to check whether the model performs correctly on data on which it has not been trained. One of the most important roles of the validation subset is to detect over-fitting during the training phase.
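The 90/10 split above can be sketched with index arrays. Whether the authors shuffled before splitting is not stated, so the sketch supports both options: a seeded random split, and a chronological split that keeps the last 10% of snapshots for validation (often preferable for time-series data, since it avoids validating on samples interleaved with training samples). The function name and seed are illustrative.

```python
import numpy as np

def train_val_split(n_samples, val_fraction=0.1, shuffle=True, seed=0):
    """Return (train_idx, val_idx); 90% / 10% by default."""
    idx = np.arange(n_samples)
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)   # random split
    n_val = int(round(n_samples * val_fraction))
    n_train = n_samples - n_val
    # without shuffling, the validation set is the chronologically last part
    return idx[:n_train], idx[n_train:]

train_idx, val_idx = train_val_split(100)
train_seq, val_seq = train_val_split(100, shuffle=False)
```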
A model that makes accurate predictions on its training data but inaccurate predictions on data it was not trained on (e.g., validation or test data) is said to be over-fitted. To avoid over-fitting, the proposed model is evaluated on the validation set during training, and its results on the validation data are compared with its results on the training data. If the two are comparable, the model is not over-fitting.
The proposed RUL algorithm is a hybrid combination of a CNN and an LSTM, in which the CNN extracts key feature vectors from the input data and the LSTM predicts the RUL of the mechanical bearing. To train the CNN to extract key features, the 1D signals are first converted into 2D images using the CWT. These 2D images are fed to the CNN to extract key feature vectors, which are then fed to the LSTM to predict the RUL of the bearing. The RUL of a mechanical bearing is represented by a Health Indicator (HI). The three stages, namely conversion of the 1D signals into 2D images using the CWT, feature extraction using the CNN, and RUL prediction using the LSTM, are described in detail as follows:
(a) CWT The CWT represents a vibration signal in both time and frequency, and it is also useful for capturing the time-varying aspects of the signal. In a wavelet transform, a signal is represented in terms of wavelets: brief wave-like oscillations whose amplitude starts at zero, increases, and then returns to zero. The CWT provides a well-defined and understandable interpretation of the time and frequency components, which helps in clearly comprehending the bearing deterioration process. The critical limitation of a 1D representation of a vibration signal is that it contains only time-domain information about the bearing, which makes the signal difficult to interpret. Therefore, the 1D signals are transformed into 2D images. The 2D CWT images carry information from both the time and frequency domains, which helps in accurately viewing the bearing degradation process.
The CWT can use different types of wavelets, such as the Morlet wavelet, Gaussian derivative wavelet, frequency B-spline wavelet, Mexican hat wavelet, Shannon wavelet, and Complex Morlet Wavelet (CMW). In the proposed work, the CMW is used to convert the 1D signals into 2D images. The key reason is that the CMW is a function whose spectrum contains only positive frequencies, so it responds only to the non-negative frequencies of a given signal, and it produces a transform whose modulus is less oscillatory than that of a real wavelet. This property is a key advantage for detecting and tracking the instantaneous frequencies contained in the signal, which is why the CMW was preferred over the other wavelet types for this regression problem. The conversion of the 1D horizontal and vertical vibration signals into 2D images is shown in Fig. 4. The mathematical expression of the CMW is given in (1).
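$$Y(t) = e^{-t^{2}/2}\cos(5t) \qquad (1)$$

where $t$ is the time instant at which the horizontal and vertical vibrations are measured. Expression (1) is the real part of the complex Morlet wavelet $e^{-t^{2}/2}e^{i5t}$.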
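To make the 1D-to-2D conversion concrete, the sketch below computes a complex-Morlet scalogram (|CWT|) directly in NumPy and stacks the horizontal and vertical channels into a 2×128×128 array. It is a sketch under several assumptions not stated in the text: 128 log-spaced analysis frequencies between a hypothetical 100 Hz and 12 kHz, the time axis reduced from 2,560 samples to 128 columns by averaging blocks of 20 samples, and $\omega_0 = 5$ so that the real part of the wavelet matches (1). The function names are hypothetical; a library such as PyWavelets could be used instead.

```python
import numpy as np

def complex_morlet(t, w0=5.0):
    """Complex Morlet wavelet; its real part is exp(-t^2/2) * cos(5t) for w0 = 5, i.e. Eq. (1)."""
    return np.exp(-t ** 2 / 2) * np.exp(1j * w0 * t)

def cwt_scalogram(x, freqs, fs, w0=5.0):
    """|CWT| of a 1D signal: one row per target frequency (Hz), one column per sample."""
    x = np.asarray(x, dtype=float)
    out = np.empty((len(freqs), x.size))
    for row, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)                # scale (in seconds) whose centre frequency is f
        half = int(np.ceil(4 * s * fs))         # +/- 4 standard deviations of the Gaussian envelope
        half = min(half, (x.size - 1) // 2)     # never longer than the signal (truncates low freqs)
        t = np.arange(-half, half + 1) / fs
        psi = complex_morlet(t / s, w0) / np.sqrt(s)
        # correlating with psi equals convolving with its time-reversed complex conjugate
        coef = np.convolve(x, np.conj(psi[::-1]), mode="same") / fs
        out[row] = np.abs(coef)
    return out

def snapshot_to_image(horiz, vert, fs=25_600, size=128, fmin=100.0, fmax=12_000.0):
    """Stack the scalograms of both channels into a (2, size, size) array."""
    freqs = np.geomspace(fmin, fmax, size)      # log-spaced analysis frequencies (assumed)
    channels = []
    for sig in (horiz, vert):
        scal = cwt_scalogram(sig, freqs, fs)
        step = scal.shape[1] // size            # 2560 // 128 = 20 samples per image column
        scal = scal[:, : step * size].reshape(size, size, step).mean(axis=2)
        channels.append(scal)
    return np.stack(channels)
```

For a pure 1 kHz tone, the row whose centre frequency is 1 kHz has the largest average magnitude, which is a quick way to check the scale-to-frequency mapping $f = \omega_0 / (2\pi s)$. The resulting arrays would then be min-max normalized before being fed to the CNN.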
(b) CNN After the 1D signals are converted into 2D images with the CMW, the resulting images are processed by a CNN to extract key feature vectors. Figure 5 shows the architecture of the CNN in the proposed CNN+LSTM model. The fundamental motivation for using a CNN in the proposed algorithm is that it handles image data more efficiently than other algorithms: its architecture automatically extracts the critical feature vectors needed to train the model, and it can extract more valuable information from an input image. A CNN takes an image as an array of pixel values and accepts input with the structure $N \times C_x \times H_x \times W_x$, where $N$ is the batch size, $C_x$ is the number of channels (filters), $H_x$ is the image height in pixels, and $W_x$ is the image width in pixels. A CNN architecture consists of three types of layers: convolution layers, pooling layers, and fully connected layers. The convolution layer extracts the critical features from the input image, and the pooling layer reduces the dimension of the feature maps. Finally, the fully connected layers learn, from the flattened features, the high-level features needed to make decisions about the input data.

[Fig. 5: Architecture of the CNN in the proposed CNN+LSTM model. Input → Conv1 (3×3 kernels, 16 kernels, padding 1, stride 1) → Max pool1 (stride 2) → Conv2 (3×3, 16 kernels, padding 1, stride 1) → Max pool2 (stride 2) → Conv3 (3×3, 16 kernels, padding 1, stride 1) → Max pool3 (stride 2) → Conv4 (3×3, 64 kernels, padding 1, stride 1) → Max pool4 (stride 2) → Flat layer → FC1 → FC2 → Output; each max-pooling layer is also labelled padding 2]
As shown in Fig. 5, the proposed CNN architecture consists of four convolution layers, four max-pooling layers, one flatten layer, and two fully connected layers. A two-channel 2D image (horizontal and vertical acceleration) of size 2×128×128 is applied as input to the first convolution layer. This layer performs a convolution operation with 16 kernels (filters) of size 3×3, with padding 1 and stride 1. In a convolution layer, the kernels are applied to the pixels of the input image. Random initial weights are assigned to each kernel in the first iteration, and the kernel values are updated repeatedly as the CNN is trained; hence, the value of each kernel is learned over a number of training epochs. During the convolution operation, the kernel slides across the complete input image from the top-left corner to the bottom-right corner.
A convolution operation multiplies the kernel weights element-wise with the pixel values under the kernel and sums the products. Stride refers to the step size with which the convolution or pooling filter moves across the input volume: a larger stride results in fewer filter positions and a smaller output volume, whereas a smaller stride leads to more filter positions and a larger output volume. Padding adds extra pixels (usually zeros) around the edges of the input volume before the convolution or pooling operation is applied. Padding is essential for controlling the spatial dimensions of the output and for preserving the spatial information of the input.
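The sliding-window operation described above, with stride and zero padding, can be sketched for a single output feature map. As in CNN frameworks, this is cross-correlation (the kernel is not flipped). The input is a multi-channel array of shape (C, H, W) and the kernel has shape (C, K, K), so a layer with 16 kernels would call this 16 times and stack the results. The function name is illustrative, and the explicit loops favor clarity over speed.

```python
import numpy as np

def conv2d(x, kernel, stride=1, padding=0):
    """One output feature map: x has shape (C, H, W), kernel has shape (C, K, K)."""
    x = np.asarray(x, dtype=float)
    # zero padding around the spatial edges only, not across channels
    x = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    k = kernel.shape[-1]
    out_h = (x.shape[1] - k) // stride + 1
    out_w = (x.shape[2] - k) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):                       # top-left to bottom-right
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = x[:, r:r + k, c:c + k]
            out[i, j] = np.sum(patch * kernel)   # element-wise product, then sum
    return out

fmap = conv2d(np.ones((2, 5, 5)), np.ones((2, 3, 3)), stride=1, padding=1)
```

With padding 1 and stride 1, a 3×3 kernel keeps the spatial size unchanged; corner outputs are smaller than the centre because part of their window covers zero padding.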
Padding is applied in the proposed work because it prevents image shrinkage and reduces the loss of spatial dimensions. Instead of feeding data samples to the CNN one at a time, the training dataset is partitioned into batches of equal size (each batch contains the same number of input images); batch sizes of 32, 64, 128, and 256 were tried during training. Batch normalization standardizes the activations of each channel over the batch to zero mean and unit variance and then rescales them with learned parameters. Further, a Rectified Linear Unit (ReLU) activation function is used in the convolution layers; its output is given by (2):

$$f(x) = \max(0, x) \qquad (2)$$

where $x$ represents the input value. The ReLU activation function is used because it allows the model to learn faster and to learn complex input patterns. After the convolution operation on a 2×128×128 image, the output of the first convolution layer has dimension N×16×128×128. The output dimension of a convolution layer is given by (3).
$$H_{out} = \left\lfloor \frac{H_x + 2P - K}{S} \right\rfloor + 1, \qquad W_{out} = \left\lfloor \frac{W_x + 2P - K}{S} \right\rfloor + 1 \qquad (3)$$

where $H_x$ and $W_x$ are the height and width of the input image, $P$ is the padding, $K$ is the kernel size, and $S$ is the stride.
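Equation (3) can be used to trace the spatial size through the four convolution and pooling blocks. This sketch assumes 3×3 convolutions with padding 1 and stride 1, and 2×2 max pooling with stride 2 and padding 0; padding 0 is assumed because it reproduces the 16×64×64 and 128×8×8 sizes stated in the text (a pooling padding of 2 would give 66×66 instead). Only spatial sizes are traced; channel counts come from Table 1.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Eq. (3) for one spatial dimension: floor((size + 2P - K) / S) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_spatial_sizes(size=128, blocks=4):
    """Spatial size after each conv + max-pool block."""
    sizes = []
    for _ in range(blocks):
        size = conv_output_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
        size = conv_output_size(size, kernel=2, stride=2, padding=0)  # 2x2 max pool halves it
        sizes.append(size)
    return sizes

sizes = trace_spatial_sizes()      # spatial sizes after blocks 1-4
flat_len = 128 * sizes[-1] ** 2    # 128 channels x 8 x 8 after the last block
```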
The N×16×128×128 feature map generated by the first convolution layer is then passed to a max-pooling layer, which applies a 2×2 max-pooling filter with stride 2 to produce a pooled feature map of 16×64×64. A stack of convolution and pooling layers, with their stride, padding, batch normalization, ReLU, and max-pooling operations, is applied until the feature map shape reaches 128×8×8. The output of the last pooling layer (128×8×8) is flattened by the flatten layer into a 1D feature vector of length 8192 (128 × 8 × 8). Two fully connected layers with ReLU activation then reduce the size of the flattened vector, and dropout is used to reduce over-fitting of the proposed model. The parameters (filters, stride, padding, feature map dimension, and activation function) of the different CNN layers are listed in Table 1.
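The post-convolution steps named above (batch normalization, ReLU from (2), 2×2 max pooling, and flattening) can be sketched in NumPy for a batch of shape (N, C, H, W). The order BN → ReLU → pooling, the training-mode batch statistics, and the default scale and shift of 1 and 0 are assumptions for illustration, since Table 1 is not reproduced here; the function names are hypothetical.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm: per-channel mean/variance over N, H, W."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def relu(x):
    return np.maximum(0.0, x)                     # Eq. (2)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; H and W must be even."""
    n, c, h, w = x.shape
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

def flatten(x):
    """(N, C, H, W) -> (N, C*H*W), e.g. 128 x 8 x 8 -> 8192."""
    return x.reshape(x.shape[0], -1)

def block_tail(conv_out):
    """Steps applied to a convolution output inside one CNN block."""
    return max_pool_2x2(relu(batch_norm(conv_out)))

x = np.random.default_rng(0).normal(size=(4, 16, 128, 128))   # stand-in conv1 output
y = block_tail(x)                                              # -> (4, 16, 64, 64)
```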
(c) LSTM The mechanical bearing input data are time-series data with intrinsic time dependence: the current output depends on prior inputs. Therefore, an appropriate strategy for learning the hidden patterns in time-series data is required. A CNN processes each sampled image independently, so its output for the first sampled image does not depend on the second. Such models cannot represent long-term dependencies, do not keep the order of the data items, and do not share parameters across time steps. An LSTM handles all of these sequential-modelling issues: it handles variable-length sequences, preserves sequence order, tracks long-term dependencies, and shares parameters across the sequence.
[Figure: five chained LSTM units passing the hidden state from one unit to the next, followed by a fully connected (FC) layer]
Each LSTM unit takes as input the hidden state from the previous unit and the encoded vector sequence produced by the CNN encoder. At time step ‘t’, the LSTM unit receives the encoded vector sequence of the CNN encoder for the ‘t’ time-step image. The output of the LSTM unit at time step ‘t’ and the encoded vector sequence of the CNN encoder for the ‘t+1’ time-step image are fed into the LSTM unit at time step ‘t+1’. The same pattern repeats at time steps ‘t+2’, ‘t+3’, and ‘t+4’: at each step, the LSTM unit receives the output of the LSTM unit at the previous time step together with the current encoded vector sequence, i.e. the encoded vector sequence of the CNN encoder for the current time-step image.
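The time-step chaining above implies that the per-image encoded vectors are grouped into sequences of consecutive snapshots before they reach the LSTM. A minimal sketch, assuming overlapping windows of five steps (t to t+4, one per LSTM unit in the figure) with a stride of one. The window length, the stride, and the toy feature vectors are assumptions, not values stated in the paper.

```python
from typing import List, Sequence


def make_windows(features: Sequence[List[float]],
                 length: int = 5) -> List[List[List[float]]]:
    """Group per-image feature vectors into overlapping windows.

    Window i holds the vectors for time steps i, i+1, ..., i+length-1,
    i.e. t, t+1, ..., t+4 for the assumed default length of 5.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [list(features[i:i + length])
            for i in range(len(features) - length + 1)]


# Hypothetical 2-dimensional encoded vectors for 7 consecutive snapshots.
feats = [[float(k), float(k) * 10] for k in range(7)]
windows = make_windows(feats)
print(len(windows))    # 3 windows of 5 time steps each
print(windows[0][-1])  # [4.0, 40.0], the t+4 step of the first window
```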
The internal architecture of an LSTM cell is described in Fig. 7. Each LSTM cell consists of four essential units: an input gate, an output gate, a forget gate, and a memory cell. The forget gate removes selected information from the previous cell state. The input gate controls which new information is written to (updated in) the cell state. The memory cell stores the information. The output gate determines what information is passed on to the next LSTM unit as the hidden state.
The operations of an LSTM unit are described in (4) to (9).

Input gate ($i_t$) The input gate determines how much of the new information (derived from the current input and the previous hidden state) to let into the cell state. It takes the current input $x_t$ and the previous hidden state $h_{t-1}$ as inputs and outputs a value between 0 and 1 for each element of the cell state.

$$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \qquad (4)$$

Where $W_{ii}$ and $W_{hi}$ are weight matrices; $x_t$ is the input vector at time instant $t$; $b_{ii}$ and $b_{hi}$ are bias vectors; $h_{t-1}$ is the hidden state at time instant $t-1$; and $\sigma$ is the sigmoid function.

Forget gate ($f_t$) The forget gate decides what information to discard from the previous cell state $C_{t-1}$. It considers the current input $x_t$ and the previous hidden state $h_{t-1}$.

$$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \qquad (5)$$

Where $W_{if}$ and $W_{hf}$ are weight matrices and $b_{if}$ and $b_{hf}$ are bias vectors.
Candidate cell state ($\tilde{C}_t$) This step computes a candidate cell state $\tilde{C}_t$ that could be added to the cell state. It considers the current input $x_t$ and the previous hidden state $h_{t-1}$.

$$\tilde{C}_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \qquad (6)$$
Where $W_{ig}$ and $W_{hg}$ are weight matrices and $b_{ig}$ and $b_{hg}$ are bias vectors.

Cell state ($C_t$) update The updated cell state $C_t$ combines the previous cell state $C_{t-1}$, after forgetting certain parts ($f_t \odot C_{t-1}$), with the new candidate values ($i_t \odot \tilde{C}_t$), where $\odot$ denotes element-wise multiplication.

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (7)$$
Output gate ($o_t$) The output gate decides how much of the cell state $C_t$ to expose as output, based on the current input $x_t$ and the previous hidden state $h_{t-1}$.

$$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \qquad (8)$$

Where $W_{io}$ and $W_{ho}$ are weight matrices and $b_{io}$ and $b_{ho}$ are bias vectors.

Hidden state update ($h_t$) The hidden state $h_t$ is updated based on the cell state $C_t$ and the output gate $o_t$.

$$h_t = o_t \odot \tanh(C_t) \qquad (9)$$
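Equations (4) to (9) can be traced in a minimal pure-Python sketch of a single LSTM step, unrolled over five time steps. The weight and bias names follow the equations (the same naming used in PyTorch's `torch.nn.LSTM` documentation). The all-zero parameters, the sizes, and the input vectors are hypothetical, chosen only so that the result is easy to check by hand; a practical model would use learned weights and a library implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gate(w_x: Matrix, b_x: Vector, w_h: Matrix, b_h: Vector,
         x: Vector, h: Vector, act: Callable[[float], float]) -> Vector:
    """Element-wise act(W_x x_t + b_x + W_h h_{t-1} + b_h)."""
    return [act(a + bx + c + bh)
            for a, bx, c, bh in zip(matvec(w_x, x), b_x, matvec(w_h, h), b_h)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step following Eqs. (4)-(9)."""
    i = gate(p["W_ii"], p["b_ii"], p["W_hi"], p["b_hi"], x, h_prev, sigmoid)    # (4)
    f = gate(p["W_if"], p["b_if"], p["W_hf"], p["b_hf"], x, h_prev, sigmoid)    # (5)
    g = gate(p["W_ig"], p["b_ig"], p["W_hg"], p["b_hg"], x, h_prev, math.tanh)  # (6)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]           # (7)
    o = gate(p["W_io"], p["b_io"], p["W_ho"], p["b_ho"], x, h_prev, sigmoid)    # (8)
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                              # (9)
    return h, c


def zero_params(input_size: int, hidden_size: int) -> Dict[str, list]:
    """Hypothetical all-zero parameters, only to make the example deterministic."""
    p: Dict[str, list] = {}
    for k in "ifgo":
        p[f"W_i{k}"] = [[0.0] * input_size for _ in range(hidden_size)]
        p[f"W_h{k}"] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p[f"b_i{k}"] = [0.0] * hidden_size
        p[f"b_h{k}"] = [0.0] * hidden_size
    return p


# Unroll over time steps t, ..., t+4 with hypothetical 2-D encoded vectors.
params = zero_params(input_size=2, hidden_size=1)
h, c = [0.0], [1.0]
for x_t in ([0.3, -0.7], [0.1, 0.2], [0.5, 0.5], [-0.4, 0.9], [0.0, 1.0]):
    h, c = lstm_step(x_t, h, c, params)
print(c)  # [0.03125]
```

With zero weights every gate evaluates to σ(0) = 0.5 and the candidate to tanh(0) = 0, so (7) halves the cell state at each of the five steps (1 → 0.03125).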
In (4) to (9), $h_{t-1}$ is the hidden state (output) of the previous time step; $x_t$ is the input vector applied to the LSTM unit; the $b$ terms are bias vectors; and $C_t$ and $C_{t-1}$ are the cell states at time instants $t$ and $t-1$, respectively. An LSTM cell uses its gates (input, forget, and output) to control the flow of information into and out of the cell state, and uses the cell state to store and manage long-term dependencies in the data. This architecture helps capture long-term dependencies and mitigates the vanishing gradient problem that can occur in traditional RNNs. Finally, based on the output vector sequence generated by the LSTM units in the proposed CNN+LSTM algorithm, a health indicator (HI), i.e. the fault or failure probability of the mechanical bearing, is calculated. The estimated RUL of the mechanical bearing is then calculated by (10).

$$RUL = P(t) - C(t) \qquad (10)$$

Where $P(t)$ and $C(t)$ represent the predicted fault time and the current time instant, respectively.
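Equation (10) needs a predicted fault time P(t); how it is derived from the HI is not detailed in this section. The sketch below assumes a hypothetical rule (the first time at which the predicted fault probability reaches 0.5) and then applies (10). The threshold, the time grid, and the probability values are illustrative assumptions only.

```python
from typing import Optional, Sequence


def predicted_failure_time(times: Sequence[float],
                           fault_prob: Sequence[float],
                           threshold: float = 0.5) -> Optional[float]:
    """Hypothetical rule: first time the fault probability reaches `threshold`."""
    for t, p in zip(times, fault_prob):
        if p >= threshold:
            return t
    return None  # no failure predicted within the horizon


def remaining_useful_life(failure_time: float, current_time: float) -> float:
    """Eq. (10): RUL = P(t) - C(t), in the same time unit as the inputs."""
    return failure_time - current_time


times = [0.0, 10.0, 20.0, 30.0, 40.0]   # hypothetical time grid (hours)
probs = [0.05, 0.10, 0.30, 0.60, 0.90]  # hypothetical predicted fault probabilities
p_t = predicted_failure_time(times, probs)
print(p_t)                               # 30.0
print(remaining_useful_life(p_t, 12.0))  # 18.0
```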
Algorithm 1 describes the data preprocessing for the mechanical element’s RUL prediction. In Algorithm 1, step 1 removes outliers from the dataset. Steps 2 and 3 calculate the mean and standard deviation of the data. Step 4 sets the threshold value for outlier removal. Steps 5 to 10 remove outliers using the z-score. Steps 11 to 15 convert the 1D data into 2D images by applying the CWT. Algorithm 2 describes the proposed CNN+LSTM hybrid algorithm. Steps 1 to 3 create the CNN model, steps 4 to 6 create the LSTM model, and steps 7 to 9 combine the CNN and the LSTM into the CNN+LSTM model. Step 10 defines the input shape. Steps 11 to 14 generate the CNN output, and step 15 stacks the CNN outputs. Steps 16 to 19 predict the mechanical element’s RUL using the LSTM.
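Steps 2 to 10 of Algorithm 1 (z-score outlier removal) can be sketched as follows. This is a minimal illustration, not the authors' code: the threshold of 3.0 is an assumed, commonly used value (the actual threshold of step 4 is not given in this section), and the CWT conversion of steps 11 to 15 is omitted; it would typically rely on a wavelet library such as PyWavelets (`pywt.cwt`).

```python
import statistics
from typing import List, Sequence


def remove_outliers_zscore(data: Sequence[float],
                           threshold: float = 3.0) -> List[float]:
    """Keep samples whose absolute z-score does not exceed `threshold`."""
    mean = statistics.fmean(data)   # step 2: mean
    std = statistics.pstdev(data)   # step 3: (population) standard deviation
    if std == 0:
        return list(data)           # constant signal: nothing to remove
    # Steps 5-10: compute each sample's z-score and drop the outliers.
    return [x for x in data if abs((x - mean) / std) <= threshold]


signal = [0.0] * 19 + [100.0]  # hypothetical signal with one spike
cleaned = remove_outliers_zscore(signal)
print(len(cleaned))  # 19: the spike (z-score about 4.36) is removed
```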
Since RUL prediction of a mechanical bearing is a regression problem, the performance of the proposed CNN+LSTM algorithm is evaluated using regression metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), Explained Variance Score (EVS), and R²-score. Each of these performance metrics is described as follows:
a. Mean Squared Error (MSE) The MSE is a risk metric corresponding to the expected value of the squared error (quadratic) loss. It returns the mean of the squared errors over all data points. The MSE is described in (11).

$$MSE(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} (y_i - \hat{y}_i)^2 \qquad (11)$$

Where $n$ is the number of data points, $y_i$ is the true (expected) value, and $\hat{y}_i$ is the predicted value of the $i$-th data point.
b. Mean Absolute Error (MAE) The MAE is a risk metric corresponding to the expected value of the absolute error loss ($\ell_1$-norm loss). MAE returns a non-negative floating-point value, and an MAE of 0.0 corresponds to the best regression model. MAE is described by (12).

$$MAE(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} \lvert y_i - \hat{y}_i \rvert \qquad (12)$$
c. Explained Variance Score (EVS) The EVS compares the variance of the prediction error with the variance of the true values. Its best possible value is 1.0. The EVS is described in (13).

$$EVS(y, \hat{y}) = 1 - \frac{\operatorname{Var}\{y - \hat{y}\}}{\operatorname{Var}\{y\}} \qquad (13)$$
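d. R²-score The R² (coefficient of determination) describes how well the model fits the data, i.e. the proportion of the variance in $y$ explained by the model. Its value usually lies between 0 and 1, with 1 the best possible score; it can become negative when the model performs worse than always predicting the mean $\bar{y}$. The R² value is described in (14).

$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (14)$$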
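The four metrics (11) to (14) can be computed directly from their definitions, as in the minimal sketch below (pure Python; in practice, `sklearn.metrics` provides `mean_squared_error`, `mean_absolute_error`, `explained_variance_score`, and `r2_score`). The four-point example data are hypothetical. They also illustrate the difference between EVS and R²: when the errors have a non-zero mean, EVS ignores that bias while R² penalises it.

```python
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Eq. (11): mean of the squared errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Eq. (12): mean of the absolute errors."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def _variance(v: Sequence[float]) -> float:
    """Population variance (divides by n)."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)


def evs(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Eq. (13): 1 - Var{y - y_hat} / Var{y}."""
    errors = [a - b for a, b in zip(y, y_hat)]
    return 1.0 - _variance(errors) / _variance(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Eq. (14): 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical true values
y_pred = [1.0, 2.0, 3.0, 5.0]  # hypothetical predictions
print(mse(y_true, y_pred), mae(y_true, y_pred))
print(evs(y_true, y_pred), r2(y_true, y_pred))
```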
Fig. 14 Batch=32
Fig. 15 Batch=64
Fig. 16 Batch=128
In the next stage, to find the best optimizer, we applied different optimizers (Adam, RMSprop, and SGD) to the proposed CNN+LSTM model. Figures 18 and 19 show the loss and accuracy of the proposed CNN+LSTM model with the different optimizers. The Adam optimizer achieves lower loss and higher accuracy than RMSprop and SGD.
The low loss of the proposed CNN+LSTM model can be attributed to its two-stage design: key feature vectors are first extracted from the input data using the CWT and the CNN, and these feature vectors are then applied to the LSTM model to predict the RUL of a mechanical bearing. Because the LSTM performs well on time-series (sequential) data, the RUL prediction accuracy of the proposed CNN+LSTM is high.
The performance of the proposed CNN+LSTM model is evaluated using the loss function. The loss value is determined by the difference between the actual (expected) value and the model-predicted value. Since this difference can be positive or negative, we selected the Mean Squared Error (MSE) as the loss function in our proposed work: squaring each error prevents positive and negative errors from cancelling out. A large loss means a large mismatch between the predicted and expected values; on the training data this indicates underfitting, and more hyperparameter tuning is required. Conversely, a small loss means the predicted value is approximately equal to the expected value, which indicates high accuracy of the model. Table 4 lists the fine-tuned hyperparameter values for the proposed CNN+LSTM model. The dropout rate is set to 0.2. The ReLU activation function is used in the CNN to avoid the vanishing gradient problem in the proposed CNN+LSTM model. A maximum of 100 epochs is set. The MSE loss is used to guide the fine-tuning of the hyperparameters.
Fig. 17 Batch=256

Figures 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 show the day-wise RUL prediction for different mechanical bearings, together with the performance of the pure CNN model in estimating the likelihood of defects on the training dataset. The blue dots show the predicted fault probability on the training set, the red dots the predicted fault probability on the validation set, and the black line the expected failure probability. The expected values are the ideal values. Since the acquired mechanical bearing dataset is run-to-failure, each bearing functions normally at the start and gradually deteriorates as the run progresses.
To train the proposed CNN+LSTM algorithm, we used the torch (PyTorch) package to load the time-series mechanical bearing dataset. PyTorch includes several utilities that make data loading simple; simple data loading in turn makes feeding data into the model straightforward and keeps the code readable. Ideally, the predictions match the expected labels: if the actual value is 0.1, the predicted value should also be 0.1. The expected labels are plotted together with the training and validation (Val) predictions to show how closely the predicted results follow the ideal expected outcomes.
We prepared the data for the CNN+LSTM model by packing it into [(NFL)xCxHxW] sequences. During training, we first calculated the loss function to determine the difference between the actual (expected) value and the model-predicted value. The loss function measures the performance of our proposed CNN+LSTM algorithm on the mechanical bearing dataset and shows whether the model is improving and predicting accurately. In addition, we evaluated the proposed model on other performance metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), Explained Variance Score (EVS), and R²-score. To reduce the loss and obtain the most accurate possible results, we used the Adam (Adaptive Moment Estimation) optimizer to train the CNN. To optimize the proposed RUL model further, a learning rate scheduler is implemented: during the training loop, it adjusts the learning rate according to the epoch number. The learning rate determines how much the model weights change in response to the prediction error each time they are updated, and an epoch is one complete pass through the training dataset.

Further, to compare the performance of the proposed CNN+LSTM algorithm with other state-of-the-art machine learning and deep learning models, we implemented Linear Regression (LR), Multivariate Regression (MVR), CNN, and LSTM models. Table 5 compares the performance of LR, MVR, CNN, LSTM, and CNN+LSTM in terms of MSE, MAE, EVS, and R². The proposed CNN+LSTM performs better than the other regression algorithms.
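The policy and settings of the learning rate scheduler mentioned above are not specified in this section. As one common possibility, a step-decay schedule (the rule implemented by PyTorch's `torch.optim.lr_scheduler.StepLR`) multiplies the learning rate by `gamma` every `step_size` epochs. The base rate of 1e-3, step of 30 epochs, and `gamma` of 0.1 below are hypothetical values, shown over the 100-epoch budget stated for the model.

```python
def step_decay_lr(epoch: int, base_lr: float = 1e-3,
                  step_size: int = 30, gamma: float = 0.1) -> float:
    """Learning rate at `epoch` under step decay (hypothetical settings)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Every `step_size` completed epochs, the rate is multiplied by `gamma`.
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 29, 30, 60, 99):
    print(epoch, step_decay_lr(epoch))
```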
The results of the CNN+LSTM architecture on the test dataset are shown in Figure. Similar findings were observed with the CNN architecture. The obtained results need further improvement. Continuous vibration measurements are used to predict the fault probability and RUL of the bearing arrangement. Because the input consists of time-series data, this paper investigates the use of an LSTM in conjunction with a CNN encoder to obtain credible predictions. However, the CNN architecture alone outperforms the CNN+LSTM design on the training and validation sets. On the test data, both architectures performed poorly, indicating that the networks did not generalize well.
Both architectures overfitted the training data. This behaviour could be related to the following factors. First, the CWT converts the horizontal and vertical 1-D vibration input signals into 2-D feature maps. During the conversion, windows of size 20 are taken from the 1-D signal and their average values are used so that the feature maps are 128×128. This conversion method may not be appropriate and should be investigated further. Second, both proposed architectures (CNN and CNN+LSTM) may be larger than required, causing them to overfit the training data. Third, although the data come from a run-to-failure experiment, the original 1-D input signals stay within ±5 m/s² for most of the run and exceed 20 m/s² in magnitude only at the end.
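The window averaging described above can be sketched as follows, assuming non-overlapping windows and PRONOSTIA vibration snapshots of 2560 samples (0.1 s recorded at 25.6 kHz), so that 2560 / 20 = 128 averaged points per snapshot. How these 128 points are then expanded into a 128×128 map (for example, a CWT evaluated at 128 scales) is an assumption and is not shown; the input signal below is synthetic.

```python
from typing import List, Sequence


def window_average(signal: Sequence[float], window: int = 20) -> List[float]:
    """Average consecutive, non-overlapping windows of `window` samples."""
    if len(signal) % window != 0:
        raise ValueError("signal length must be a multiple of the window size")
    return [sum(signal[i:i + window]) / window
            for i in range(0, len(signal), window)]


snapshot = [float(k % 20) for k in range(2560)]  # synthetic 2560-sample snapshot
reduced = window_average(snapshot)
print(len(reduced))  # 128
print(reduced[0])    # 9.5, the mean of 0, 1, ..., 19
```

Averaging acts as a low-pass filter, so high-frequency fault signatures can be attenuated; this is one reason the conversion may deserve the further investigation suggested above.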
Consequently, the failure probability is unlikely to change linearly over the run, as was assumed for the training labels. While the signal is in the nominal range, the fault probability should be low; as the values move into the abnormal range, the fault probability should rise. This is the primary cause of both models’ poor performance on the test data, as the training labels may not be appropriate for the learning objective. We propose the following for future work: increase the number of helpful training labels by improving the feature extraction scheme; and, rather than converting to 2-D feature maps, create training labels and use the 1-D signals as input to 1-D convolutions for encoding, with an LSTM to predict the fault risk.
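To make the label argument concrete, the sketch below contrasts linear run-to-failure labels, which the discussion above indicates were used in training, with a hypothetical amplitude-based label that stays at 0 inside the nominal ±5 m/s² range and rises to 1 at 20 m/s², the two magnitudes quoted above. The amplitude rule only illustrates the reasoning in the text; it is not a labelling scheme proposed in the paper.

```python
from typing import List


def linear_labels(n: int) -> List[float]:
    """Linear run-to-failure labels: 0 at the first snapshot, 1 at the last."""
    if n < 2:
        raise ValueError("need at least two snapshots")
    return [i / (n - 1) for i in range(n)]


def amplitude_label(peak: float, nominal: float = 5.0,
                    failure: float = 20.0) -> float:
    """Hypothetical label from a snapshot's peak magnitude (m/s^2):
    0 within the nominal range, rising linearly to 1 at `failure`."""
    ratio = (abs(peak) - nominal) / (failure - nominal)
    return min(max(ratio, 0.0), 1.0)


print(linear_labels(5))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print([amplitude_label(p) for p in (3.0, 12.5, -25.0)])  # [0.0, 0.5, 1.0]
```

Under the linear rule, a snapshot halfway through a healthy run gets label 0.5 even though its signal is still nominal; the amplitude rule would assign it 0.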
This paper proposed a novel deep learning algorithm for RUL prediction of a mechanical bearing. The proposed deep learning-based RUL prediction algorithm is a hybrid combination of CNN and LSTM. The 1D horizontal and vertical signals were converted into 2D images using the CWT. By stacking different layers, such as convolution, pooling, and fully connected layers, a CNN network was designed in the proposed CNN+LSTM. Key feature vectors were extracted from the 2D images using the CNN. Further, an LSTM network consisting of several LSTM cells was designed, and the feature vectors extracted by the CNN were applied to the LSTM to predict the RUL of a mechanical bearing. The performance of the proposed model was optimized by fine-tuning different hyperparameters. The CNN+LSTM achieved an accuracy of 98% and a loss of 2%. In future work, the prediction approach will be applied to and evaluated on more sophisticated platforms with multiple components.
Data Availability The data used for the experiments are openly available at: https://paperswithcode.com/dataset/pronostia-bearing-dataset
Declarations
References
1. Cui L, Wang X, Wang H, Ma J (2019) Research on remaining useful life prediction of rolling element
bearings based on time-varying Kalman filter. IEEE Trans Instrum Meas 69(6):2858–2867
2. Dataset PB (2023) PRONOSTIA bearing dataset. https://paperswithcode.com/dataset/pronostia-bearing-dataset
3. Deng Y, Du S, Wang D, Shao Y, Huang D (2023a) A calibration-based hybrid transfer learning framework
for RUL prediction of rolling bearing across different machines. IEEE Trans Instrum Meas 72:1–15
4. Deng Y, Lv J, Huang D, Du S (2023b) Combining the theoretical bound and deep adversarial network for
machinery open-set diagnosis transfer. Neurocomputing, pp 126391
5. Han Y, Chen S, Gong C, Zhao X, Zhang F, Li Y (2023) Accurate SM disturbance observer-based demagne-
tization fault diagnosis with parameter mismatch impacts eliminated for IPM motors. IEEE Trans Power
Electron 38(5):5706–5710
6. Hong J, Wang Q, Qiu X, Chan HL (2019) Remaining useful life prediction using time-frequency fea-
ture and multiple recurrent neural networks. In: 2019 24th IEEE international conference on emerging
technologies and factory automation (ETFA). IEEE, pp 916–923
7. Jimenez JJM, Schwartz S, Vingerhoeds R, Grabot B, Salaün M (2020) Towards multi-model approaches
to predictive maintenance: a systematic literature survey on diagnostics and prognostics. J Manuf Syst
56:539–557
8. Jin R, Chen Z, Wu K, Wu M, Li X, Yan R (2022) Bi-LSTM-based two-stream network for machine
remaining useful life prediction. IEEE Trans Instrum Meas 71:1–10. https://doi.org/10.1109/TIM.2022.3167778
9. Liu H, Liu Z, Jia W, Lin X (2021) Adjustable uncertainty set constrained unit commitment with operation
risk reduced through demand response. IEEE Trans Industr Inform 17(2):1197–1207
10. Liu H, Yuan H, Hou J, Hamzaoui R, Gao W (2022) PUFA-GAN: a frequency-aware generative adversarial
network for 3D point cloud upsampling. IEEE Trans Image Process 31:7389–7402
11. Liu L, Wang L, Yu Z (2021) Remaining useful life estimation of aircraft engines based on deep convolution
neural network and LightGbM combination model. Int J Comput Intell Syst 14:1–10
12. Liu ZH, Meng XD, Wei HL, Chen L, Lu BL, Wang ZH, Chen L (2021) A regularized LSTM method for
predicting remaining useful life of rolling bearings. Int J Autom Comput 18:581–593
13. Ma M, Mao Z (2021) Deep-convolution-based LSTM network for remaining useful life prediction. IEEE
Trans Industr Inform 17(3):1658–1667. https://doi.org/10.1109/TII.2020.2991796
14. Qu Z, Liu X, Zheng M (2022) Temporal-spatial quantum graph convolutional neural network based on
Schrödinger approach for traffic congestion prediction. IEEE Transactions on Intelligent Transportation
Systems
15. Sayyad S, Kumar S, Bongale A, Kamat P, Patil S, Kotecha K (2021) Data-driven remaining useful
life estimation for milling process: sensors, algorithms, datasets, and future directions. IEEE Access
9:110255–110286
16. Shi J, Li Y, Zhang MZ, Liu W (2018) Remaining useful life prediction based on modified relevance vector
regression algorithm. In: 2018 Prognostics and system health management conference (PHM-Chongqing).
IEEE, pp 900–907
17. Wang B, Han T, Lei Y, Li N (2019) Remaining useful life prediction based on deep residual attention
network. In: 2019 International conference on sensing, diagnostics, prognostics, and control (SDPC).
IEEE, pp 79–84
18. Wang B, Zhu D, Han L, Gao H, Gao Z, Zhang Y (2023) Adaptive fault-tolerant control of a hybrid
canard rotor/wing UAV under transition flight subject to actuator faults and model uncertainties. IEEE
Transactions on Aerospace and Electronic Systems
19. Wang Y, Zhao Y, Addepalli S (2020) Remaining useful life prediction using deep learning approaches: a
review. Procedia Manuf 49:81–88
20. Wenqiang J, Jian C, Yi C (2019) Remaining useful life prediction for mechanical equipment based on
temporal convolutional network. In: 2019 14th IEEE international conference on electronic measurement
& instruments (ICEMI). IEEE, pp 1192–1199
21. Xi X, Chen M, Zhou D (2019) Remaining useful life prediction for multi-component systems with hidden
dependencies. Sci China Inf Sci 62:1–16
22. Yang B, Liu R, Zio E (2019) Remaining useful life prediction based on a double-convolutional neural
network architecture. IEEE Trans Ind Electron 66(12):9521–9530
23. Yao J, Lu B, Zhang J (2022) Tool remaining useful life prediction using deep transfer reinforcement learn-
ing based on long short-term memory networks. The International Journal of Advanced Manufacturing
Technology pp 1–10
24. Zhao D, Liu F (2022) Cross-condition and cross-platform remaining useful life estimation via adversarial-
based domain adaptation. Sci Rep 12(1):878
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.