Multimedia Tools and Applications

https://doi.org/10.1007/s11042-024-18546-9

Mechanical element’s remaining useful life prediction using a hybrid approach of CNN and LSTM

Neeraj Kumar Sharma1 · Sriramulu Bojjagani1

Received: 2 July 2022 / Revised: 27 September 2023 / Accepted: 29 January 2024


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024

Abstract
Remaining Useful Life (RUL) prediction is used in many industries to ensure the safety and
reliability of systems. Traditional machine learning techniques offer limited feature
representation and adaptive feature extraction. Deep learning techniques such as
Long Short-Term Memory (LSTM) have achieved excellent performance for RUL prediction.
However, the LSTM network relies mainly on the most recent data, so it may capture only part
of the contextual information. This paper proposes a hybrid combination of a Convolutional
Neural Network (CNN) and LSTM (CNN+LSTM) to solve this problem. The proposed hybrid
model predicts how long a machine can operate without breaking down. In the proposed
work, 1D horizontal and vertical signals of the mechanical bearing are first converted to 2D
images using the Continuous Wavelet Transform (CWT). These 2D images are applied to the CNN
for key feature extraction. Finally, these key features are applied to the LSTM deep neural
network for predicting the RUL of a mechanical bearing. The PRONOSTIA dataset is used to
demonstrate the performance of the proposed model and to compare it with
other state-of-the-art methods. Experimental results show that the proposed CNN+LSTM-based
hybrid model achieves higher accuracy (98%) and better robustness than existing
methods.

Keywords RUL · CNN · LSTM · CWT · Deep learning · Machine learning

Neeraj Kumar Sharma
neeraj16ks@gmail.com
Sriramulu Bojjagani
sriramulubojjagani@gmail.com
1 School of Engineering and Sciences (SEAS), Department of Computer Science and Engineering, SRM
University-AP, Amaravati, Andhra Pradesh, India

1 Introduction

In today’s world, we depend on a wide range of mechanical equipment, so the reliability of
that equipment is increasingly important. RUL prediction using machine learning serves
various needs across different industries and applications. Key reasons include cost
reduction, asset optimization, safety, reliability improvement, and energy efficiency.
Because of the complicated structure of a machine, its maintenance is an important and
challenging task. When maintenance is poor, the probability of failure at a critical time is
higher. Different maintenance approaches are used to increase the reliability, safety,
accessibility, and operational quality of the machine. These approaches also reduce the
machine's unplanned downtime and operational costs.
Different industries, such as automobile, manufacturing, and aircraft, use machine
maintenance techniques [1, 5]. The traditional approach is reactive (breakdown)
maintenance, in which repair is performed only after the machine fails. Under this
strategy, equipment continues to operate until it breaks down due to a final failure.
A key advantage of the reactive maintenance approach is that it gives the machine a longer
running time, until one of its elements breaks down. However, its key limitation is that it
responds only after a failure has occurred, which may result in significant damage to the
machine or a high probability of an accident. Further, the reactive maintenance approach is
unsuitable when a machine contains expensive parts [16], since completely damaged expensive
parts cost more to repair. Hence, using the reactive technique for machine maintenance is
not a good idea.
To overcome the problems of the reactive maintenance technique, equipment can be tested
at regular intervals. This technique is known as preventive maintenance. Although
preventive maintenance enhances the machine’s efficiency, performance, and safety, it also
raises maintenance costs. Its other limitation is that it is hard to determine when to
perform the tests [3, 15]. A more suitable alternative is the predictive maintenance
technique, in which the machine’s elements are tested before failure happens. This approach
estimates the maintenance time, i.e., the time it will take for a machine to fail. Further,
it also detects the machine’s flaws and indicates which part of the machine needs to be
repaired. Thus, the predictive maintenance approach not only reduces the downtime of the
machine but also extends its lifetime.
The predictive maintenance technique is further divided into model-based and data-driven
categories, either of which can be used to estimate the machine’s RUL. Model-based
approaches rely on a physical model built from physics concepts to anticipate when a device
may fail [18, 23]. However, model-based approaches are hard to implement, since their models
are hard to comprehend and building a reliable physical model is challenging.
Data-driven methods for predicting the RUL of a mechanical element have recently gained
much attention due to their advantages over model-based approaches [4, 7]. In a data-driven
approach, a prediction model of a mechanical element's failure is built from data: the
model takes data as input and predicts the output from it. The first step in building such
a machine learning model is to collect data under various operating conditions.
Data-driven approaches may be constrained if the data obtained is unclear or
insufficient. When a significant volume of data about a machine is accessible, the
degradation or damage probability of that machine may be computed efficiently and precisely.
Applying a data-driven approach when data is limited is challenging, but it is still
possible with strategies and techniques such as data augmentation, feature engineering,
transfer learning, regularization, and ensemble methods. Further, using a data-driven
methodology, the damage to a machine component can be estimated even without prior
knowledge of the equipment [24]. Hence, data-driven approaches are easier to execute than
model-based approaches. This paper proposes a CNN+LSTM-based hybrid data-driven approach
for predicting the RUL of a mechanical bearing.
The proposed CNN+LSTM-based RUL prediction algorithm comes under the predictive
machine maintenance technique category, in which we can test the machine element based
on a predicted RUL of the mechanical element. Hence, the main goal of the proposed work
in this paper is to predict the RUL of a machine bearing. The fundamental motivation to use
the hybrid combination of CNN and LSTM network for RUL prediction based on vibration
signals is that vibration signals from mechanical equipment are typically time series data.
CNNs excel at learning hierarchical features from spatial data, and they can be adapted to
work with time series data [14].
Using CNNs as our model’s initial layers, we can automatically extract relevant features
from the vibration signals [10]. It reduces the need for manual feature engineering, which
can be time-consuming and error-prone. Further, mechanical equipment vibrations often
contain both spatial and temporal patterns. CNNs are excellent at capturing spatial patterns
within the signals, while LSTMs are designed to capture temporal dependencies [13]. By
combining these two architectures, we can effectively model and exploit spatial and temporal
information in the data. In the proposed mechanical element’s RUL prediction approach,
vibration signals of a machine bearing are often used to monitor the health of the machine’s
bearing. The horizontal and vertical vibrations of a machine’s bearing indicate the present
health condition of a machine.
In short, the fundamental idea of the proposed work is as follows.
Data on the machine’s mechanical element, i.e., the bearing, is collected under
different working conditions. The acquired machine bearing dataset consists of horizontal
and vertical vibration signals of the machine’s bearing recorded at regular intervals. These
signals are captured at different dates, hours, minutes, and microseconds. The horizontal and
vertical 1D signals are preprocessed and converted to 2D images using the Continuous
Wavelet Transform (CWT).
Further, converted 2D images of horizontal and vertical signals are divided into two parts:
training and testing. The training part of 2D images of horizontal and vertical vibration signals
is applied to CNN for key feature vector extraction from the 2D images. The extracted feature
vectors from the CNN are applied to the LSTM for RUL prediction of mechanical bearing.
To check the accuracy of the proposed model, we calculate different performance metrics
such as the Sum of Mean Squared Error (SME), Mean Absolute Error (MAE), etc. The key
contributions of the proposed work in this paper are as follows:
1. To acquire, analyze, and pre-process the dataset of a mechanical bearing on different
operating conditions.
2. To convert 1D vibration signal to 2D images and extract key feature vectors from the
mechanical bearing dataset.
3. To design and develop a proposed CNN+LSTM algorithm for the RUL prediction of a
mechanical bearing.
4. To calculate different performance metrics of the proposed hybrid model for mechanical
bearing RUL prediction.
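
To make the performance metrics named in contribution 4 concrete, the following is a minimal Python sketch of two common regression errors. It assumes that MAE is the usual mean absolute error and that the squared-error metric is the usual mean squared error; the RUL values are made up for illustration and are not results from this paper.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted RUL values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference; penalizes large errors more than MAE does."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical RUL values (fraction of life remaining), for illustration only.
actual_rul = [1.0, 0.75, 0.5, 0.25, 0.0]
predicted_rul = [0.9, 0.8, 0.5, 0.3, 0.1]

mae = mean_absolute_error(actual_rul, predicted_rul)
mse = mean_squared_error(actual_rul, predicted_rul)
print(f"MAE = {mae:.3f}, MSE = {mse:.4f}")
```

Lower values of both metrics indicate predictions closer to the actual RUL.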
The remaining sections of the paper are organized as follows. Section 2 deals with the
background and literature survey. Section 3 describes the proposed CNN+LSTM based RUL
prediction algorithm. The experimental setup and result analysis are described in Section 4.
Finally, Section 5 highlights the conclusion and future direction of the proposed work.

2 Literature survey

Li et al. [12] proposed an RUL prediction algorithm for rolling bearings. The authors
used a hybrid combination of the elastic net with an LSTM, known as E-LSTM. Their
approach considered temporal and spatial correlation to forecast the RUL. Further, they
reduced the over-fitting of the LSTM network with an elastic net-based regularization term.
The key limitation of their work is that the authors applied the vibration signals directly
to the E-LSTM. Therefore, the accuracy of their approach could have been better.
Xi et al. [21] designed and developed an RUL prediction algorithm for a dynamic system
subject to multiple dependent degradations. The proposed system is referred to as an online
RUL system. By observing the multidimensional data, the authors predicted the RUL of a
blast furnace. In their proposed work, the authors used a sequential Kalman filter. They also
verified their solution by the numerical approach. The key limitation of their proposed work
is that the Kalman filter assumes that the dependent and independent variables are linearly
related. Therefore, the accuracy of the model could have been better.
Liu et al. [11] proposed an RUL prediction system for aircraft. The authors used Deep
Convolutional Neural Network (DCNN) and Light Gradient Boosting Machine (LightGBM)
algorithms in their proposed work. The advantage of the proposed system is that signal
processing of raw sensor data and prior expertise are optional. In their proposed work, the
authors applied the time window of raw data as input to the DCNN for key feature vector
extraction. Further, they replaced the fully connected layer of the DCNN with LightGBM.
Hence, by replacing the fully connected layer from LightGBM, they improved the prediction
accuracy of the mechanical element.
Wang et al. [19] performed a comparative study of four different deep neural networks,
such as Deep Belief Network (DBN), Convolution Neural Network (CNN), and Recurrent
Neural Network (RNN), for predicting the RUL of mechanical elements. The key limitation
of their work is that the authors neither conducted experiments nor proposed an
algorithm.
Sayyad et al. [15] summarized the different research works on RUL prediction of mechan-
ical components. In their proposed work, the authors highlighted the different data sets
available on RUL prediction. They also described the merits and demerits of the existing
work on RUL prediction. In the end, they highlighted future research directions on RUL
prediction. The key limitation of their work is that the authors did not propose any
novel approach for RUL forecasting, and they did not conduct any experiments.
Hong et al. [6] proposed an RUL prediction algorithm for degradable equipment. Their
proposed work used different signal decomposition methods to minimize the outlier. They
designed and developed a fast, accurate RUL prediction system for degradable equipment.
First, they extracted the time-frequency domain features using Wavelet Transform (WT).
Further, they jointly utilized three Recurrent Neural Networks (RNN) to predict the RUL of
degradable equipment. Their experiment used a gas turbine engine dataset for RUL prediction.
Wenqiang et al. [20] proposed a Temporal Convolution Network (TCN) based RUL prediction
system for mechanical elements. They applied K-means clustering to identify the operating
condition of the system and then used a sliding time window to construct the model input.
In the end, they compared their results with other RUL prediction techniques. The advantage
of their work is that the resulting RUL prediction algorithm achieves greater accuracy.

Wang et al. [17] proposed an RUL prediction system based on a Deep Residual Attention
Network (DRAN). The authors handled data from different sensors for RUL prediction. The
proposed DRAN comprises a representation learning sub-network and an RUL prediction
sub-network. Further, they constructed the DRAN to extract important information hidden in
the sensor data and suppress useless information.
Liu et al. [9] proposed a feature attention-based end-to-end RUL prediction approach. In
their proposed work, they applied input data with greater attention weight to key feature vec-
tors during the training phase of the proposed model. Then, these weighted feature vectors are
applied to Bidirectional Gated Recurrent Units (BGRU) to extract long-term dependencies.
In the end, they used a fully connected network for predicting RUL. They used the turbofan
engines dataset in their experimentation.
A double CNN architecture was proposed by Yang et al. [22]. Their research work fed
the original vibration signal to the proposed double CNN. The double CNN model extracted
maximum key information from the input data. The proposed double CNN architecture
includes two stages: the first CNN model identifies the incipient fault point, and the second
CNN model is constructed for RUL prediction. Further, they compared the proposed double
CNN model performance with other state-of-the-art algorithms. The proposed algorithm
performs well regarding the RUL prediction accuracy of mechanical elements.
Jin et al. [8] proposed a handcrafted feature flows (HFFs) feature extraction technique. In
their work, they suppressed the raw signal noise and improved the sequential information in
the data. Further, they proposed a Bi-directional LSTM (Bi-LSTM) based two-stream network
for RUL prediction. They experimented on the commercial modular aero propulsion system
simulation (C-MAPSS) dataset.

3 Proposed work

The flow chart of the proposed work for the CNN+LSTM-based RUL prediction system is
shown in Fig. 1. As shown in Fig. 1, the proposed work consists of different stages such
as mechanical bearing dataset acquisition, data preprocessing, proposed CNN+LSTM-based
model training, hyperparameter tuning, model testing, and performance evaluation. First,
the mechanical bearing data is acquired [2]. The acquired dataset, known as the PRONOSTIA
dataset, is time-series data consisting of a mechanical bearing’s horizontal and vertical
vibration signals. After acquisition, the dataset is analyzed to detect outliers and null
values. Data preprocessing then removes the null values and outliers and converts the 1D
time-series data to 2D images. The dataset is divided into training, validation, and
testing parts to train, validate, and test the proposed model, respectively. In the end,
different performance metrics are calculated. A detailed description of all the
steps in Fig. 1 is as follows.

3.1 Data collection

The PRONOSTIA dataset (PHM IEEE 2012 Data Challenge Data set) [2] is used to develop the
RUL prediction model. The PRONOSTIA dataset consists of two parts: the learning (training)
part and the testing part. The learning part contains data for 6 rolling bearings under
different operating conditions: Bearing1_1, Bearing1_2, Bearing2_1, Bearing2_2, Bearing3_1,
and Bearing3_2. The testing part contains data for 11 rolling bearings under different
operating conditions: Bearing1_3, Bearing1_4, Bearing1_5, Bearing1_6, Bearing1_7, Bearing2_3,
Bearing2_4, Bearing2_5, Bearing2_6, Bearing2_7, and Bearing3_3.

Fig. 1 Flow diagram of the proposed work (mechanical bearing dataset acquisition; data
preprocessing: removing null values and outliers, conversion of vibration signal into 2D
images; partition of dataset into training, validation, and testing; model training with
feature selection using CNN and RUL prediction using LSTM; hyperparameter tuning with
accept/reject comparison on the validation dataset; saving the model; calculating
performance metrics on the test dataset)

In the PRONOSTIA dataset, each bearing’s horizontal and vertical signals are recorded over
time in (.csv) files, with several (.csv) files corresponding to each bearing. Each (.csv)
file contains six fields: hour, minute, second, microsecond, horizontal acceleration, and
vertical acceleration.
The vertical and horizontal acceleration vibration signals are sampled at 25.6 kHz. The
sampling rate, also known as the sampling frequency, determines how many samples are
captured every second, here 25600 for each of the two signals. Every 10 seconds, a
recording of 0.1 seconds is saved in a data file. As a result, each data file has 2560
data points (25600 × 0.1) for each of the horizontal and vertical acceleration signals.
A set of observations collected at discrete and evenly spaced intervals is referred to as
time-series data. Since vibration signals are recorded every 10 seconds in the acquired
mechanical bearing dataset, the acquired dataset is time-series data.
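
The arithmetic and file layout described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `read_vibration_file` and the three sample rows are hypothetical, and the column order is assumed to follow the six fields listed in the text (hour, minute, second, microsecond, horizontal acceleration, vertical acceleration).

```python
import csv
import io

SAMPLING_RATE_HZ = 25_600   # 25.6 kHz, as stated in the text
RECORD_DURATION_S = 0.1     # each file stores a 0.1-second recording

# Number of data points per signal in one file
samples_per_file = round(SAMPLING_RATE_HZ * RECORD_DURATION_S)


def read_vibration_file(handle):
    """Return (horizontal, vertical) acceleration lists from a six-column file."""
    horizontal, vertical = [], []
    for row in csv.reader(handle):
        # Skip incomplete (null) records instead of guessing their values.
        if len(row) != 6 or any(field.strip() == "" for field in row):
            continue
        horizontal.append(float(row[4]))
        vertical.append(float(row[5]))
    return horizontal, vertical


# Three made-up rows standing in for a real data file; the last one has a null field.
sample = io.StringIO(
    "9,39,39,65664,0.552,-0.146\n"
    "9,39,39,65703,0.501,-0.480\n"
    "9,39,39,65742,,0.435\n"
)
horizontal, vertical = read_vibration_file(sample)
print(samples_per_file, horizontal, vertical)
```

Dropping rows with empty fields mirrors the choice, described in Section 3.2, of deleting null records rather than imputing them.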

3.2 Data preprocessing and signal processing

The dataset in its original form is not appropriate for training the proposed CNN+LSTM-based
hybrid model. Therefore, the acquired dataset needs to be analyzed and pre-processed.
First, we analyzed the dataset by counting the null values and outliers in it. We performed
outlier detection using the box plot method. Since the PRONOSTIA dataset is large and very
few records are null, we handled null values by deleting the affected records from the
dataset.

Fig. 2 Horizontal vibration signal

Then, CWT is applied to the time-series data to convert the 1D data into 2D images. After
applying CWT to the 1D gathered data, we normalized the coefficients of the 2D CWT data
using the min-max normalization approach, which re-scales all the 2D data to the range
between 0 and 1 by mapping each coefficient x to (x − x_min)/(x_max − x_min). Figure 2 shows
the CWT of the horizontal vibration signal of a bearing in 1D format. The horizontal axis
represents the time in microseconds, and the vertical axis represents the frequency of the
horizontal vibration in hertz.
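
The min-max normalization described above can be sketched in a few lines of Python. The helper name `min_max_normalize` and the small coefficient matrix are illustrative assumptions; the handling of a constant image is a choice made here and is not stated in the paper.

```python
def min_max_normalize(image):
    """Rescale a 2D list of coefficients linearly to the range [0, 1]."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no spread; map every value to 0.0 (assumption).
        return [[0.0 for _ in row] for row in image]
    return [[(value - lo) / (hi - lo) for value in row] for row in image]


# Made-up 2x2 block of CWT coefficients
coeffs = [[-2.0, 0.0], [1.0, 6.0]]
normalized = min_max_normalize(coeffs)
print(normalized)
```

The smallest coefficient maps to 0 and the largest to 1, with all other values placed proportionally between them.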
Figure 3 represents the vertical vibration in 1D format. In Fig. 3, the horizontal axis
represents the time in microseconds, and the vertical axis represents the frequency of the
vertical signal in hertz. A high line in Figs. 2 and 3 represents high vibration of the
mechanical bearing in the horizontal and vertical directions. The acquired dataset gives the
horizontal and vertical vibration of the mechanical element with respect to time in
microseconds. Since it is difficult to interpret the 1D signals in the horizontal and
vertical directions separately, a 2D image is generated from the 1D horizontal and vertical
signals. Vibration signals are better visualized, evaluated, and controlled using signal
processing. Therefore, we used signal processing, specifically the CWT, to convert the 1D
horizontal and vertical features into a 2D image. Because time-frequency domain
characteristics contain more information about the vibration signals, they make mechanical
bearing degradation easier to detect. In regression problems like RUL prediction, more
information means faster and easier analysis. Figure 4 shows the 2D representation of the
horizontal and vertical vibration signals. These 2D feature images are used to train the
proposed CNN+LSTM model. After preprocessing, the dataset is divided into training and
testing parts. Further, the training part is divided into training and validation parts,
which are described as follows.

Fig. 3 Vertical vibration signal

Fig. 4 Converted 2D images from 1D vibration signals

3.3 Training and validation

The training part is useful for training the model for predicting the RUL of mechanical
elements. This dataset part is divided into two categories: 90% training and 10% validation.
The training part is used to train the proposed CNN+LSTM model, and the validation part
is used to validate the proposed model. The validation part of the dataset is implemented to
determine whether a model performs correctly or not on the data for which it has not been
trained. One of the most important roles of the validation part is to ensure that our model
stays balanced during the training phase.
When a model makes accurate predictions on the training data but inaccurate predictions on
data it was not trained on (for example, validation or test data), the model is said to be
over-fitted. To avoid over-fitting, we validated the proposed model on the validation
dataset. During training, we checked whether the results the model produces for the
validation data are comparable to those it produces for the training data, which tells us
whether the model is overfitting.
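
A minimal Python sketch of the 90%/10% split and of the over-fitting check described above is given below. Several details are assumptions made for illustration: the samples are shuffled before splitting, the helper names are hypothetical, and the 50% relative gap used to flag over-fitting is an arbitrary threshold, not a value from this paper.

```python
import random


def split_train_validation(samples, validation_fraction=0.1, seed=0):
    """Shuffle the samples and hold out a fraction of them for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def looks_overfitted(train_loss, val_loss, tolerance=0.5):
    """Flag a model whose validation loss exceeds its training loss by more
    than `tolerance` (relative gap). The threshold is an assumption."""
    return val_loss > train_loss * (1.0 + tolerance)


# 100 placeholder sample indices standing in for 2D CWT images
train_set, val_set = split_train_validation(range(100))
print(len(train_set), len(val_set))
print(looks_overfitted(train_loss=0.010, val_loss=0.050))
print(looks_overfitted(train_loss=0.010, val_loss=0.012))
```

Comparing the two losses after every epoch is one simple way to check whether validation results stay comparable to training results.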

3.4 Proposed CNN+LSTM algorithm

The proposed RUL algorithm is a hybrid combination of CNN and LSTM, where the CNN extracts
the key feature vectors from the input data and the LSTM predicts the RUL of a mechanical
bearing. To let the CNN extract the key features, the 1D signals are first converted into
2D images by applying CWT. These 2D images are then applied to the CNN, and the extracted
key feature vectors are applied to the LSTM for predicting the RUL of a mechanical bearing.
The RUL of a mechanical bearing is represented by a Health Indicator (HI). The three stages,
namely the conversion of the 1D signal to 2D images using CWT, feature extraction using CNN,
and RUL prediction using LSTM, are described as follows:
(a) CWT The CWT is used to represent a vibration signal in both time and frequency, and it
is useful for computing the time-varying aspects of a vibration signal. In a wavelet
transform, signals are represented in terms of wavelets. Wavelets are wave-shaped
oscillations whose magnitude starts at zero, grows, and then returns to zero.
The CWT provides a well-defined and understandable interpretation of time and frequency
components, which aids in clearly comprehending the bearing deterioration process. By
contrast, the critical limitation of a 1D representation of a vibration signal is that it
only contains the time-domain information of a mechanical bearing, which makes vibration
signals challenging to interpret in 1D form alone. Therefore, 1D signals are transformed
into 2D images. The 2D CWT image features carry information about the signals from both the
time and frequency domains, which assists in accurately viewing the bearing
degradation process.
The CWT can use different types of wavelets, such as the Morlet wavelet, Gaussian derivative
wavelet, frequency B-spline wavelet, Mexican hat wavelet, Shannon wavelet, and Complex
Morlet Wavelet (CMW). We implemented the CMW method in the proposed work to convert
the 1D signal into 2D images. The key reason is that a CMW is a function whose spectrum has
only positive frequencies, so it responds only to the non-negative frequencies of a given
signal. It produces a transform whose modulus is less oscillatory than in the case of a real
wavelet. This property of the CMW is a key advantage for detecting and tracking
instantaneous frequencies contained in the signal. Therefore, the CMW produces better
results for regression problems when compared to other wavelet types. The conversion of the
1D horizontal and vertical vibration signals into 2D images is represented in Fig. 4. The
mathematical expression of the CMW is given by (1).
$$Y(t) = \exp\left(-t^{2}/2\right)\cos(5t) \tag{1}$$

Where ‘t’ is the time instance on which horizontal and vertical vibrations are measured.
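
To illustrate how a wavelet such as the one in (1) turns a 1D signal into a 2D time-scale array, the following is a minimal pure-Python sketch. It is an assumption-laden illustration rather than the implementation used in this work: it evaluates the real-valued expression in (1) directly, uses a brute-force sum in place of an optimized CWT routine, and applies it to a synthetic sine wave with hypothetical scales.

```python
import math


def wavelet(t):
    """Wavelet from Eq. (1): exp(-t^2 / 2) * cos(5t)."""
    return math.exp(-t * t / 2.0) * math.cos(5.0 * t)


def cwt(signal, scales):
    """Brute-force CWT: one row of coefficients per scale, one column per shift."""
    n = len(signal)
    rows = []
    for s in scales:
        row = []
        for shift in range(n):
            acc = 0.0
            for k in range(n):
                acc += signal[k] * wavelet((k - shift) / s)
            row.append(acc / math.sqrt(s))  # usual 1/sqrt(scale) normalization
        rows.append(row)
    return rows


# Synthetic signal: sine wave at 0.1 cycles per sample (not bearing data)
signal = [math.sin(2 * math.pi * 0.1 * k) for k in range(64)]
scales = [1, 2, 4, 8, 16]

# Magnitudes of the coefficients form the 2D "image" (scales x time)
scalogram = [[abs(c) for c in row] for row in cwt(signal, scales)]
print(len(scalogram), len(scalogram[0]))

# Scale with the strongest response in the middle of the signal
mid = len(signal) // 2
strongest = max(range(len(scales)), key=lambda i: scalogram[i][mid])
print("strongest scale at mid-signal:", scales[strongest])
```

Each row of `scalogram` corresponds to one scale (inversely related to frequency) and each column to one time shift, which is the 2D layout a CNN can consume as an image. The wavelet in (1) oscillates at roughly 5/(2π) ≈ 0.8 cycles per unit time, so a 0.1 cycles-per-sample sine is expected to respond most strongly near scale 8.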
(b) CNN After the 1D signals are converted into 2D images by applying the CMW, the
transformed 2D images are passed to a CNN to extract key feature vectors. Figure 5 shows
the architecture of the CNN in the proposed CNN+LSTM model. The fundamental motivation for
using a CNN in the proposed CNN+LSTM algorithm is that a CNN handles image data more
efficiently than other algorithms. The layered architecture of the CNN automatically
extracts the critical feature vectors needed to train the model. A CNN can assess images
effectively and extract more valuable information from an input image. A CNN takes images
as arrays of pixel values (px). It accepts input in the [N × C_x × H_x × W_x] structure,
where N is the batch size, C_x is the number of channels or filters, H_x is the image
height in pixels, and W_x is the image width in pixels. A CNN architecture consists of
three different types of layers: convolution layer, pooling layer, and fully connected
layer. The convolution layer extracts the critical features from the input image. The
pooling layer is responsible for reducing the dimension of the feature maps. Ultimately,
the fully connected layer is responsible for learning, from the flattened features, the
high-level features essential for making decisions about the input data, such as
classification.

Fig. 5 Architecture of CNN (input, four convolution layers with 3×3 kernels, stride 1, and
padding 1, each followed by a max-pooling layer with stride 2, then a flat layer, two fully
connected layers FC1 and FC2, and the output)

As shown in Fig. 5, the proposed CNN architecture consists of four convolution layers,
four max-pooling layers, one flatten layer, and two fully connected layers. A 2D image of
size 2×128×128, with one channel each for the horizontal and vertical acceleration, is
applied as input to the first convolution layer. This layer performs a convolution
operation on the 2D image by applying 16 kernels (filters), each of size 3×3, with
padding=1 and stride=1. In the convolution layer, kernels are used to evaluate the pixels
of the input image. Random initial weights are assigned to each kernel in the first
iteration. In successive iterations, the values in each kernel are updated repeatedly as
the CNN is trained, so the final value of each kernel is learned over a number of epochs
during the training phase. During the convolution operation, the kernel matrix moves across
the complete input image from the top left corner to the bottom right corner.
A convolution operation is performed by element-wise multiplication of the kernel’s
weights with the input image’s pixel values under the kernel, followed by summation. In the
convolution layer, stride refers to the step size at which the convolution or pooling
filter moves across the input volume. A larger stride results in fewer filter positions and
a smaller output volume, while a smaller stride leads to more filter positions and a larger
output volume. Further, padding involves adding extra pixels (usually zeros) around the
edges of the input volume before applying the convolution or pooling operations. Padding is
essential for controlling the output’s spatial dimensions and preserving the input’s
spatial information.
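
The roles of the kernel, stride, and zero padding described above can be seen in the following single-channel Python sketch. It is an illustration only: the function `conv2d`, the 4×4 test image, and the identity kernel are hypothetical, and the loop-based implementation stands in for the optimized routines of a deep learning library.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D convolution (no kernel flip, as is usual in CNNs)."""
    k = len(kernel)
    h, w = len(image), len(image[0])

    # Zero padding: surround the image with `padding` rows/columns of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]

    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of kernel and image patch, then summation.
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += padded[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
identity = [[0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]

same = conv2d(image, identity, stride=1, padding=1)     # keeps the 4x4 size
strided = conv2d(image, identity, stride=2, padding=1)  # shrinks to 2x2
unpadded = conv2d(image, identity, stride=1, padding=0) # shrinks to 2x2
print(len(same), len(strided), len(unpadded))
```

With a 3×3 kernel, padding of 1 and stride of 1 preserve the input size, while a stride of 2 or the absence of padding reduces it.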
The fundamental motivation to apply padding in the proposed work is that it prevents
image shrinkage and reduces dimensionality loss. Instead of applying each data sample one
at a time to the CNN, we partitioned the training dataset into batches of equal
size (each batch consists of the same number of input images). Different batch sizes, such
as 32, 64, 128, and 256, are selected during the training process of the CNN. Batch
normalization re-scales a batch's data, i.e., pixel values, to a common scale (standard
batch normalization shifts each channel to zero mean and unit variance). Further, a
Rectified Linear Unit (ReLU) activation function is utilized at the convolution layer. The
ReLU activation function generates the output value described in (2).

$$f(x) = \max(0, x) \tag{2}$$

Where ‘x’ represents the input value. The key reason to apply the ReLU activation function
is that it allows the model to learn faster and to learn complex input patterns. After a
convolution operation is performed on a 2×128×128 image, the output of the first
convolution layer has dimension N×16×128×128. The output dimension of the convolution layer
is given by (3).
$$\frac{(H_x, W_x) + 2P - K}{S} + 1 \tag{3}$$

Where Hx and Wx represent the height and width of an input image; ‘P’ represents the
padding value; ‘K’ represents the dimension of the kernel; and ‘S’ represents the stride size.
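As a worked check of (3), the short sketch below traces the spatial size of a 128x128 input through four convolution and max-pooling stages. The helper name `conv_output_size` is hypothetical, and the pooling padding of 0 is an assumption made here because it is the value that reproduces the 128 to 8 reduction stated in the text.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output height or width from Eq. (3): (size + 2P - K) / S + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumption: 3x3 convolutions with stride 1 and padding 1, followed by
# 2x2 max pooling with stride 2 and no padding.
size = 128
sizes = [size]
for _ in range(4):
    size = conv_output_size(size, kernel=3, stride=1, padding=1)   # convolution keeps the size
    size = conv_output_size(size, kernel=2, stride=2, padding=0)   # pooling halves it
    sizes.append(size)

flattened = 128 * size * size   # 128 channels after the last pooling layer
print(sizes, flattened)
```

Under these assumptions the spatial size goes 128, 64, 32, 16, 8, and flattening the final 128x8x8 feature map gives 8192 values.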
Further, the feature map of dimension Nx16x128x128 generated by the first convolution layer
is applied to the max-pooling layer. In the max-pooling layer, a max-pooling operation
with a 2x2 filter and a stride of 2 is applied to the convolved image to produce a
pooled feature map of 16x64x64. A stack of convolution and pooling layers with
operations such as stride, padding, batch normalization, ReLU, and max pooling
is applied until the feature map shape reaches 128x8x8. The output of the last pooling layer,
i.e. 128x8x8, is flattened into a 1D feature vector of 8192 elements using a flatten layer. Then,
two fully connected layers with the ReLU activation function are used to reduce the size of the
flattened vector. Dropout and the ReLU activation function are used to reduce the over-fitting
of the proposed model. A detailed description of all the parameters (filters, stride, padding,
feature map dimension, activation function) of the different layers of the CNN
is given in Table 1.
(c) LSTM The mechanical bearing input data is time-series data. It has intrinsic time
dependence, where the current output depends on prior inputs. Therefore, an appropriate strategy
for learning hidden patterns in time-series data is required. A CNN processes each sampled
image independently, i.e. the output for the 1st sampled image is independent of the 2nd sampled
image. Used alone for sequential modelling, it is therefore unable to represent long-term
dependencies, to keep the order of data items, or to share parameters across the sequence.
An LSTM can efficiently handle all the above-described issues during sequential modelling,
such as handling variable-length sequences, preserving sequence order, tracking long-term
dependencies, and sharing parameters across the sequence.

Table 1 Parameters of CNN architecture

Layer       Filters/Neurons  Filter size  Stride  Padding  Feature map    Act. function
Conv1       16               3x3          1       1        Nx16x128x128   ReLU
Max Pool1   -                2x2          2       2        Nx16x64x64     -
Conv2       16               3x3          1       1        Nx32x64x64     ReLU
Max Pool2   -                2x2          2       2        Nx32x32x32     -
Conv3       32               3x3          1       1        Nx64x32x32     ReLU
Max Pool3   -                2x2          2       2        Nx64x16x16     -
Conv4       64               3x3          1       1        Nx128x16x16    ReLU
Max Pool4   -                2x2          2       2        Nx128x8x8      -
Flatten     8192             -            -       -        8192           -
Fc1         256              -            -       -        256            ReLU
Fc2         128              -            -       -        128            ReLU


Fig. 6 CNN+LSTM model (time steps t to t+4: at each step a CNN encoder feeds an LSTM unit, the hidden state is passed between successive LSTM units, and the last LSTM unit feeds an FC layer)

In the proposed CNN+LSTM algorithm, a unidirectional LSTM neural network is used
to process the data generated by CNN for RUL prediction. A flattened output (128 previous
time stamps) generated by the CNN is applied to the LSTM.
Figure 6 shows the detailed architecture of the proposed LSTM algorithm. The proposed
LSTM network consists of three different types of layers: LSTM cells, a fully connected
layer, and an output layer. Since the RUL prediction problem is of moderate size and not too
many feature vectors need to be trained, four hidden LSTM cells are used in the proposed LSTM.
Further, each LSTM cell consists of 128 neurons. In the LSTM, to avoid the vanishing gradient
problem, a ReLU activation function is utilized in place of a sigmoid function. After the LSTM
cells, a fully connected layer with 128 neurons is used. At the end, an output
layer with one neuron is used to calculate the RUL of a mechanical bearing. Table 2 describes
the architecture parameters of the proposed LSTM network.
As shown in Fig. 6, an altered feature vector sequence received from the CNN is
applied to the LSTM. The LSTM units examine the input vector sequences at time steps
‘t’, ‘t+1’, ‘t+2’, ‘t+3’, and ‘t+4’, and output the final time step ‘t+5’. At time step ‘t’, the LSTM

Table 2 Parameters of LSTM architecture

Parameters of LSTM architecture                               Value
Number of timestamp                                           128
Number of neurons in a LSTM cell                              64 to 128
Number of hidden layers (number of stacked LSTM layers)       4
Activation function                                           ReLU, Tanh
Recurrent dropout                                             -
Input shape                                                   128
Batch size                                                    16 to 256
Number of neurons at fully connected layer                    64 to 256
Output layer neuron                                           1


Fig. 7 Internal components of an LSTM

unit takes input from the hidden layers (hidden states) and takes the encoded vector sequence
of the CNN encoder of the ‘t’ time step. The output of the LSTM unit at time step ‘t’ and the
current encoded vector sequence, i.e. the encoded vector sequence of the CNN encoder of
the ‘t+1’ time step image, are fed into the LSTM unit at a time step ‘t+1’. At time step ‘t+2’,
the LSTM unit receives the output of the LSTM unit at time step ‘t+1’ as well as the current
encoded vector sequence, which is the encoded vector sequence of the CNN encoder for the
‘t+2’ time step image.
At time step ‘t+3’, the LSTM unit receives the output of the LSTM unit at time step ‘t+2’
as well as the current encoded vector sequence, which is the encoded vector sequence of the
CNN encoder for the ‘t+3’ time step image. At time step ‘t+4’, the LSTM unit receives the
output of the LSTM unit at time step ‘t+3’ as well as the current encoded vector sequence,
which is the encoded vector sequence of the CNN encoder for the t+4 time step image.
The internal architecture of an LSTM cell is described in Fig. 7. Each LSTM cell consists of
four essential units: an input gate, an output gate, a forget gate, and a memory cell. The forget
gate is responsible for removing certain information from the input. The input gate is responsible
for remembering or updating the information. The memory cell is responsible for storing the
information. The output gate outputs the information to the next LSTM unit.
The detailed description of the different operations in an LSTM unit is given in (4) to (9).
Input gate ($i_t$) The input gate determines how much of the input information (the current
input and the previous hidden state) to let into the cell state. It takes the current input $x_t$
and the previous hidden state $h_{t-1}$ as inputs and outputs a value between 0 and 1 for each
element of the cell state.
$$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \tag{4}$$
Where $W_{ii}$ and $W_{hi}$ are weight matrices; $x_t$ is the input vector at time instant ‘t’;
$b_{ii}$ and $b_{hi}$ are bias vectors; $h_{t-1}$ is the hidden state at time instant ‘t-1’.
Forget gate ($f_t$) The forget gate decides what information from the previous cell state
$C_{t-1}$ to forget. It considers the current input $x_t$ and the previous hidden state $h_{t-1}$.
$$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \tag{5}$$
Where $W_{if}$ and $W_{hf}$ are the weight matrices; $b_{if}$ and $b_{hf}$ are the bias vectors.

Candidate cell state ($\tilde{C}_t$) This step computes a candidate cell state $\tilde{C}_t$ that
could be added to the cell state. It considers the current input $x_t$ and the previous hidden
state $h_{t-1}$.
$$\tilde{C}_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \tag{6}$$


Where $W_{ig}$ and $W_{hg}$ are the weight matrices; $b_{ig}$ and $b_{hg}$ are the bias vectors.
Cell state ($C_t$) update The updated cell state $C_t$ is a combination of the previous cell
state $C_{t-1}$ after forgetting certain parts ($f_t \odot C_{t-1}$) and the added new candidate
values ($\tilde{C}_t \odot i_t$), where $\odot$ denotes element-wise multiplication.
$$C_t = f_t \odot C_{t-1} + \tilde{C}_t \odot i_t \tag{7}$$

Output gate ($o_t$) The output gate decides how much of the cell state $C_t$ to expose to the
output based on the current input $x_t$ and the previous hidden state $h_{t-1}$.
$$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \tag{8}$$
Where $W_{io}$ and $W_{ho}$ are the weight matrices; $b_{io}$ and $b_{ho}$ are bias vectors.
Hidden state update ($h_t$) The hidden state $h_t$ is updated based on the cell state $C_t$ and
the output gate $o_t$.
$$h_t = o_t \odot \tanh(C_t) \tag{9}$$
Where $h_{t-1}$ represents the previous hidden state output; $x_t$ describes the
input vector applied to the LSTM unit; the $b$ terms describe the bias values; $C_t$ and
$C_{t-1}$ describe the cell state at time instants ‘t’ and ‘t-1’ respectively. An LSTM cell has
mechanisms (input gate, forget gate, and output gate) to control the flow of information into
and out of the cell state, and it uses the cell state to store and manage long-term dependencies
in the data. This architecture helps capture long-term dependencies and address the vanishing
gradient problem that can occur in traditional RNNs. Ultimately, based on the vector sequence
generated by the LSTM unit in the proposed CNN+LSTM algorithm, an HI (Health Indicator),
i.e. the fault probability or failure probability of a mechanical bearing, is calculated. The
estimated RUL of a mechanical bearing is calculated by (10).
$$RUL = P(t) - C(t) \tag{10}$$
Where P(t) and C(t) represent the predicted fault time and the current time instant,
respectively.
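The gate operations of a single LSTM unit, and the way the hidden and cell states are carried from one time step to the next, can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the trained network: the toy sizes (3 input features, 2 hidden units), the constant weights, the parameter dictionary layout, and the merging of each gate's two bias vectors into one are all choices made here for readability.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W_x, x, W_h, h, b):
    """Compute W_x x + W_h h + b row by row (b merges the two bias vectors of a gate)."""
    return [
        sum(wx * xi for wx, xi in zip(W_x[r], x))
        + sum(wh * hi for wh, hi in zip(W_h[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step: input, forget, candidate, cell, output, hidden."""
    i = [sigmoid(v) for v in affine(p["W_ii"], x, p["W_hi"], h_prev, p["b_i"])]    # input gate
    f = [sigmoid(v) for v in affine(p["W_if"], x, p["W_hf"], h_prev, p["b_f"])]    # forget gate
    g = [math.tanh(v) for v in affine(p["W_ig"], x, p["W_hg"], h_prev, p["b_g"])]  # candidate
    c = [fk * ck + gk * ik for fk, ck, gk, ik in zip(f, c_prev, g, i)]             # cell state
    o = [sigmoid(v) for v in affine(p["W_io"], x, p["W_ho"], h_prev, p["b_o"])]    # output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                               # hidden state
    return h, c


# Hypothetical toy sizes and constant weights, for illustration only.
n_in, n_hid = 3, 2
params = {}
for gate in "ifgo":
    params["W_i" + gate] = [[0.1] * n_in for _ in range(n_hid)]
    params["W_h" + gate] = [[0.1] * n_hid for _ in range(n_hid)]
    params["b_" + gate] = [0.0] * n_hid

# Unroll over five time steps (t ... t+4), carrying the states forward.
sequence = [[0.5, -0.2, 0.1]] * 5
h, c = [0.0] * n_hid, [0.0] * n_hid
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
```

In the full model, each `x_t` would be the encoded feature vector produced by the CNN for one time step, and the final hidden state would be passed to the fully connected layer.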
Algorithm 1 describes the data preprocessing for the mechanical element’s RUL predic-
tion. In Algorithm 1, step 1 describes removing outliers from the dataset. Step 2 and step 3

Algorithm 1 Data preprocessing.

1: procedure Remove_outliers(data[])
2:   mean = cal_mean(data[])
3:   standard_deviation = cal_std_dev(data[])
4:   Set threshold
5:   cleaned_data = []
6:   for i=0 to i<data[].size() do
7:     z_score = (data[i] − mean) / standard_deviation
8:     if abs(z_score) ≤ threshold then
9:       cleaned_data.append(data[i])
10:  return cleaned_data
11: wavelet = ’cmor’
12: define scales = range[1, 128]
13: for i=0 to cleaned_data[].size do
14:   coeff, freq = cwt(data_1d, scales, wavelet)
15:   plot_image(coeff, freq)


describe the calculation of the mean and standard deviation of the data. Step 4 sets the threshold
value used to remove outliers. Steps 5 to 10 describe the removal of outliers using the z_score.
Steps 11 to 15 describe the conversion of the 1D data into 2D images by applying the CWT.
Algorithm 2 describes the proposed CNN+LSTM hybrid algorithm. Steps 1 to 3 describe the creation
of the CNN model. Steps 4 to 6 describe the creation of the LSTM model. Steps 7 to 9 describe
the construction of the CNN+LSTM model from the CNN and LSTM. Step 10 describes the input shape.
Steps 11 to 14 describe the generation of the CNN outputs. Step 15 describes the stacking of the
CNN outputs. Steps 16 to 19 describe the prediction of the mechanical element’s RUL using the LSTM.
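The two preprocessing stages of Algorithm 1 (z-score outlier removal followed by a Morlet CWT) can be sketched with the Python standard library alone. This is a simplified illustration, not the code used for the experiments: the threshold of 3.0, the wavelet bandwidth and centre frequency, the reduced scale range, and the toy sine signal are all assumptions, and in practice a library CWT routine would normally replace the naive loop below.

```python
import cmath
import math
import statistics


def remove_outliers(data, threshold=3.0):
    """Keep samples whose absolute z-score is at most `threshold`."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    if std == 0:
        return list(data)
    return [v for v in data if abs((v - mean) / std) <= threshold]


def morlet(t, bandwidth=1.5, centre=1.0):
    """Complex Morlet wavelet; `bandwidth` and `centre` are assumed values."""
    envelope = math.exp(-t * t / bandwidth) / math.sqrt(math.pi * bandwidth)
    return envelope * cmath.exp(2j * math.pi * centre * t)


def cwt(signal, scales):
    """Naive CWT: one row of complex coefficients per scale, one column per time shift."""
    n = len(signal)
    rows = []
    for s in scales:
        # Sample the conjugated, scaled wavelet once for every possible offset.
        kernel = {d: morlet(d / s).conjugate() / math.sqrt(s) for d in range(-(n - 1), n)}
        rows.append([sum(signal[k] * kernel[k - b] for k in range(n)) for b in range(n)])
    return rows


# Toy 1D signal: a sine with a 16-sample period plus one spike acting as an outlier.
raw = [math.sin(2 * math.pi * k / 16) for k in range(128)]
raw[40] = 25.0
cleaned = remove_outliers(raw, threshold=3.0)

scales = range(1, 25)                                   # reduced range to keep the sketch fast
coeffs = cwt(cleaned, scales)
scalogram = [[abs(v) for v in row] for row in coeffs]   # 2D "image": scales x time
```

The magnitude matrix `scalogram` plays the role of the 2D image that is later passed to the CNN.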

3.5 Performance evaluation of CNN+LSTM

Since RUL prediction of a mechanical bearing is a regression problem, the performance of the
proposed CNN+LSTM algorithm is evaluated using different performance metrics such as
Mean Squared Error (MSE), Mean Absolute Error (MAE), Explained Variance Score (EVS), and
R2-score. The considered performance metrics are described as follows:
a. Mean Squared Error (MSE) The MSE is a risk metric corresponding to the expected
value of the squared error or loss. It returns the mean of the squared errors over all data
points. The MSE is described in (11).
$$MSE(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} (y_i - \hat{y}_i)^2 \tag{11}$$

Where ‘n’ represents the number of data points; ‘ŷ’ represents the predicted value of a
variable; ‘y’ represents the expected (true) value of a variable.
b. Mean Absolute Error (MAE) The MAE describes a risk metric corresponding to the
expected value of the absolute error loss or L1-norm loss. MAE calculates a non-negative floating
Algorithm 2 Proposed CNN+LSTM.

1: procedure create_cnn_model()
2:   cnn_model = define convolutional and pooling layers
3:   return cnn_model
4: procedure create_lstm_model()
5:   lstm_model = define lstm layer and cell
6:   return lstm_model
7: procedure create_cnn_lstm_model()
8:   cnn_model = create_cnn_model()
9:   lstm_model = create_lstm_model()
10:  input = Input(shape = (image_width, image_height, num_channels))
11:  cnn_outputs[]
12:  for each_image in image[] do
13:    cnn_output = cnn_model(each_image)
14:    cnn_outputs.append(cnn_output)
15:  cnn_outputs_stacked = Concatenate(axis = −1)(cnn_outputs)
16:  for each cnn_outputs_stacked in cnn_outputs[] do
17:    lstm_output = lstm_model(cnn_outputs_stacked)
18:  RUL = Dense(vocab_size, activation = ’relu’)(lstm_output)
19:  return RUL


point value. An MAE value of 0.0 represents the best regression model. MAE is described by (12).
$$MAE(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} |y_i - \hat{y}_i| \tag{12}$$

c. Explained Variance Score (EVS) It compares the variance of the prediction error with
the variance of the true values. The best possible value of this metric is 1.0. The EVS is
described in (13).
$$EVS(y, \hat{y}) = 1 - \frac{Var\{y - \hat{y}\}}{Var\{y\}} \tag{13}$$

d. R2-score The R2 score describes how well the model fits the dataset for predicting future
values. The value of R2 usually lies between 0 and 1. In rare cases, the R2
value can also be negative. The R2 value is described in (14).
$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{14}$$
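The four metrics in (11) to (14) can be computed directly from their definitions, as in the following sketch. The function names and the small example vectors are illustrative assumptions and are not results from the bearing dataset.

```python
import statistics


def mse(y, y_hat):
    """Mean squared error, Eq. (11)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mae(y, y_hat):
    """Mean absolute error, Eq. (12)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def evs(y, y_hat):
    """Explained variance score, Eq. (13): 1 - Var(y - y_hat) / Var(y)."""
    residuals = [a - b for a, b in zip(y, y_hat)]
    return 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y)


def r2(y, y_hat):
    """R2 score, Eq. (14): 1 - SS_res / SS_tot."""
    y_bar = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Made-up example values, for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
scores = {
    "MSE": mse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "EVS": evs(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

EVS and R2 differ only when the residuals have a non-zero mean, because EVS subtracts the mean residual before measuring the error variance.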

4 Results and analysis

We performed extensive experiments to check the performance of the proposed CNN+LSTM-based
RUL prediction algorithm. We used an HP system with an Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
and 8 GB RAM. To build the model, we used Python 3.8 and a Google cloud-based IDE.
To implement and visualize the performance of the proposed RUL algorithm,
we used other useful Python libraries such as Tensorflow 2.0, Keras, Pandas, Numpy,
Matplotlib, and Seaborn. We configured the Tensorflow 2.8 library to implement the proposed
CNN+LSTM-based RUL prediction algorithm. Further, using the Keras tuner, we fine-tuned
the hyperparameters of the proposed algorithm. The performance of the proposed RUL algorithm
is visualized using the Matplotlib and Seaborn libraries. Figures 8 and 9 show the horizontal
and vertical vibration outliers, respectively. In both cases (horizontal and vertical
vibrations), the number of outliers is much smaller than the total number of values; therefore,
we dropped the outliers from the dataset. Table 3 summarizes the mechanical bearing dataset in
terms of min, max, and std deviation for all the feature vectors.
The machine-bearing dataset contained 13 Comma Separated Value (.CSV) files, and each
(.CSV) file contained 1230 data points. Further, each data point in (.CSV) represents the
horizontal and vertical vibrations of a mechanical bearing in different operating conditions.
To train, test, and validate the proposed CNN+LSTM model, we divided the machine-bearing
dataset into two parts: the 80% training part and the 20% testing part. Further, we divided the
training data set into the 90% training part and the 10% validation part. Accessing the bearing
data points from the original (.CSV) dataset is difficult and time-consuming. Therefore, for
easy access and fast reading of the data points, we converted the original bearing data set into
a pickle file, i.e. (.pkz). The converted pickle file contains 1D data for horizontal and vertical
vibration of a machine bearing in different conditions.
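The 80/20 and 90/10 partitioning and the conversion to a pickle file described above can be sketched as follows. The function name `split_dataset`, the shuffling with a fixed seed, and the dictionary layout of the pickled record are assumptions made for illustration; the paper does not state whether the samples were shuffled.

```python
import io
import pickle
import random


def split_dataset(samples, test_frac=0.2, val_frac=0.1, seed=0):
    """80/20 train/test split, then a 90/10 train/validation split of the training part."""
    items = list(samples)
    random.Random(seed).shuffle(items)          # assumption: shuffled split
    n_test = round(len(items) * test_frac)
    test, train_full = items[:n_test], items[n_test:]
    n_val = round(len(train_full) * val_frac)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))

# Hypothetical record layout: 1D horizontal and vertical vibration values.
record = {"horizontal": [0.1, -0.2, 0.05], "vertical": [0.0, 0.3, -0.1]}
buffer = io.BytesIO()            # an in-memory stand-in for a pickle file on disk
pickle.dump(record, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
```

With 1000 samples this yields 720 training, 80 validation, and 200 test samples.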


Fig. 8 Horizontal vibration outliers

Fig. 9 Vertical vibration outliers

Table 3 Dataset description

        Hour   Minute  Second  Micro    H-vib.    V-vib.
Count   2559   2559    2559    2559     2559      2559
Mean    13     39      19      115663   −0.0012   0.02
std     0      0       0       28861    0.444     0.41
Min     13     39      19      65703    −1.67     −1.08
25%     13     39      19      90683    −0.284    −0.23
50%     13     39      19      115660   −0.006    0.025
75%     13     39      19      140640   0.30      0.29
Max     13     39      19      165620   1.52      1.35


Fig. 10 Loss at lr=0.01

In the proposed work, a signal processing method is utilized to extract time-frequency
features that aid in distinguishing between healthy and problematic mechanical bearing
operations. A CWT with the Morlet wavelet is performed on the 1D data. After performing
the CWT on a 1D data file, the 1D mechanical bearing data is converted into 2D images. The
converted 2D images are normalized to a given range, i.e. (0,1). During the training phase,
these 2D images are applied to a CNN to extract the key feature vectors. The features extracted
by the CNN are applied to the LSTM for RUL prediction of a mechanical bearing.
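The normalization of the 2D images to the range (0,1) can be illustrated with a min-max rescaling. This is one common way to do it and is shown here as an assumption, since the text does not specify the exact normalization formula; the function name and the toy image are hypothetical.

```python
def min_max_normalize(image):
    """Rescale all pixel values of a 2D image linearly so they lie in [0, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]


toy_image = [[0, 5], [10, 20]]          # stand-in for a CWT magnitude image
normalized = min_max_normalize(toy_image)
```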
Fine-tuning a CNN with an LSTM model for RUL prediction involves optimizing various
hyperparameters and model configurations to achieve the best predictive performance. We first
define the architecture of our proposed CNN+LSTM model, such as the number of layers, units,
activation functions, etc. Then, we select different hyperparameters such as the learning rate,
batch size, and optimizer. Figures 10, 11, 12, and 13 show the loss of the proposed
CNN+LSTM model at learning rates 0.01, 0.02, 0.03, and 0.04, respectively. The loss curve
of the CNN+LSTM is smooth at a learning rate of 0.01. As the learning rate increases,
the loss curve becomes noisy, as shown in Figs. 11 to 13. Further,
decreasing the learning rate below 0.01 requires more training, and the performance is
almost the same as in the case of 0.01. Therefore, the learning rate is selected as 0.01 in
the proposed algorithm.
After determining the best learning rate, the batch-size hyperparameter is selected
to train the CNN+LSTM model. Figures 14, 15, 16, and 17 show the performance of the
CNN+LSTM model for batch sizes 32, 64, 128, and 256, respectively. As shown in Fig. 14 (batch
size = 32), the bias and variance of the CNN+LSTM model are low compared to the larger
batch sizes. The bias and variance increase as the batch size grows to 64, 128, and 256.
Therefore, we select a batch size of 32 in the proposed CNN+LSTM model.

Fig. 11 Loss at lr=0.02


Fig. 12 Loss at lr=0.03

Fig. 13 Loss at lr=0.04

Fig. 14 Batch=32

Fig. 15 Batch=64


Fig. 16 Batch=128

In the next stage, to find the best optimizer, we applied different optimizers such as Adam,
RMSprop, and SGD to the proposed CNN+LSTM model. Figures 18 and 19 show the loss
and accuracy of the proposed CNN+LSTM model for the different optimizers. The Adam
optimizer performs better, in terms of both loss and accuracy, than RMSprop and SGD.
The key reason behind the low loss of the proposed CNN+LSTM model is that we first
extracted key feature vectors from the input data using the CWT and CNN, and then
applied these feature vectors to the LSTM model to predict the RUL of a mechanical bearing.
An LSTM performs well on time-series or sequential data; therefore, the RUL prediction
accuracy is high in the case of the proposed CNN+LSTM.
The performance of the proposed CNN+LSTM model is evaluated using the loss function.
Since the prediction error can be positive or negative, we selected MSE (Mean Squared Error)
as the loss function in our proposed work. The loss value is determined by the difference
between the actual (expected) value and the model-predicted value. If the loss function
produces a large value, there is a large mismatch between the predicted and expected values.
This is a case of underfitting, and more hyperparameter tuning is required for the model.
On the other hand, if the loss function produces a small value, the predicted value
is approximately equal to the expected value. This condition shows the high accuracy of the
model. Table 4 describes the fine-tuned hyperparameter values for the proposed CNN+LSTM
model. The dropout or batch normalization rate is defined as 0.2. The ReLU activation function
is selected in the CNN to avoid the vanishing gradient problem in the proposed CNN+LSTM
model. The maximum number of epochs is set to 100 in the proposed CNN+LSTM model. A
Mean Squared Error (MSE) loss function is defined to fine-tune the hyperparameters.
Figures 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 show the day-wise RUL prediction of a
mechanical bearing, i.e. the predicted RUL of different mechanical bearings. These figures also
show the performance

Fig. 17 Batch=256


Fig. 18 Loss on different optimizers

Fig. 19 Accuracy on different optimizers

Table 4 Parameters of CNN+LSTM

CNN+LSTM parameters                 Value
Activation functions                ReLU, Tanh
Dropout and batch normalization     0.2
Learning rate                       0.01
Batch size                          32
Number of epoch                     100
Loss function                       MSE

Fig. 20 RUL prediction on day 1


Fig. 21 RUL prediction on day 2

Fig. 22 RUL prediction on day 3

Fig. 23 RUL prediction on day 4

Fig. 24 RUL prediction on day 5


Fig. 25 RUL prediction on day 6

Fig. 26 RUL prediction on day 7

Fig. 27 RUL prediction on day 8

Fig. 28 RUL prediction on day 9


Fig. 29 RUL prediction on day 10

of the pure CNN model in estimating the likelihood of defects on the training dataset. The
blue dots show the predicted training fault probability values, the red dots represent the
predicted validation fault probability values, and the black line represents the expected
failure probability values. The expected values are the ideal values. Since the acquired
mechanical bearing dataset is run-to-failure, the bearing functions well at the start and
gradually deteriorates as the run progresses.
To train the proposed CNN+LSTM algorithm, we imported a torch package for loading
the time series mechanical bearing dataset. The torch package includes several utilities that
make data loading simple; simple data loading in turn makes feeding data into the model simple
and the code easier to read. Ideally, if the actual value is 0.1, the predicted value should be
0.1 as well. The expected labels are displayed together with the training and validation
results to indicate how close the predicted results are to the ideal expected outcomes.
We prepared the data for the CNN+LSTM model by packing it into [(NFL)xCxHxW]
sequences. During training, we first calculated the loss function to determine the difference
between the actual or expected value and the model-predicted value. The loss function
measures the performance of our proposed CNN+LSTM algorithm on a mechanical bearing dataset.
It indicates whether our model is improving and whether it is predicting accurately.
In addition, we evaluated our proposed model’s performance
on different performance metrics such as Mean Squared Error (MSE), Mean Absolute Error
(MAE), Explained Variance Score (EVS), and R2-score. To reduce the loss and offer the most
accurate possible results, we used the Adam (Adaptive Moment Estimation) optimizer to train
the CNN. To optimize the proposed RUL model, a learning rate scheduler is implemented.
Based on the number of epochs, the learning rate scheduler adjusts the learning rate for
better results during the training loop. Each time the model weights are updated, the learning
rate determines how much to change the model in response to the estimated error. An epoch is
one complete pass over the training data samples. Further, to compare the performance
of the proposed CNN+LSTM algorithm with other machine learning
and deep learning models, we implemented other algorithms, namely Linear Regression
(LR), Multivariate Regression (MVR), CNN, and LSTM. Table 5 describes the performance
comparison among the different models (LR, MVR, CNN, LSTM, and CNN+LSTM).
The proposed CNN+LSTM performs better than the other algorithms in terms of
MSE, MAE, EVS, and R2.
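An epoch-based learning rate scheduler of the kind mentioned above can be sketched as a step-decay function. The decay factor of 0.5 and the interval of 25 epochs are hypothetical values chosen for illustration; the text only states that the learning rate is adjusted based on the number of epochs, starting from the selected rate of 0.01 over at most 100 epochs.

```python
def step_decay(epoch, initial_lr=0.01, drop=0.5, epochs_per_drop=25):
    """Return the learning rate for a given epoch, halving it every 25 epochs.

    `drop` and `epochs_per_drop` are assumed values, not taken from the paper.
    """
    return initial_lr * drop ** (epoch // epochs_per_drop)


# Learning rate for each of the 100 training epochs.
schedule = [step_decay(epoch) for epoch in range(100)]
```

A function of this shape is typically handed to the training framework, which calls it at the start of each epoch to update the optimizer's learning rate.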
The results of the CNN+LSTM architecture on the test dataset are shown in Figure.
Similar findings were also observed with CNN architecture. The obtained results need to be
corrected. Continuous vibration measurements are used to predict fault probability and RUL
for the bearing arrangement. Time series data features are inherent in input. As a result, this


Table 5 Error comparison of different models

Error/Model   LR     MVR    CNN    LSTM   CNN+LSTM
MSE           24     15     19     12     3.2
MAE           4.8    3.87   4.3    3.2    1.7
EVS           −2.1   −1.8   −1.9   −1.2   0.96
R2            0.58   0.75   0.65   0.78   0.98

paper aims to investigate the use of LSTM in conjunction with a CNN encoder to obtain
credible predictions. However, the CNN architecture alone outperforms the CNN+LSTM
design on the training and validation sets. On the test data, both architectures fared badly,
indicating that the networks did not generalize effectively.
Both architectures have overfitted the training data. This behaviour could be related to
the following factors. The CWT converts the horizontal and vertical 1-D vibration input signals
into 2-D feature maps. During the conversion, windows of size 20 are obtained from the 1-D
signal, and the average value is utilized to ensure that the feature maps are 128x128. This
conversion method may not be appropriate and should be investigated further. Both
architectures (CNN and CNN+LSTM) may be larger than required, causing them to overfit the
training data. Although the data represents a run-to-failure experiment, the original 1-D
input signals appear to be within ±5 m/s² for the most part until the end of the
run and, at the end, exceed 20 m/s² in magnitude.
As a result, it is unlikely that the chance of failure changes linearly, as was assumed
in training. When the data is in the nominal range, the fault probability should be low, but
as the values go into the abnormal range, the fault probability should rise. This is the primary
cause of both models’ poor performance on test data, as the training labels may not be appropriate
for the learning objective. We propose the following for future work: increase the number of
helpful training labels by improving the feature extraction scheme; and, rather than converting
to 2-D feature maps, create training labels and use 1-D signals as input to 1-D convolutions
for encoding and an LSTM to predict fault risk.

5 Conclusion and future works

This paper proposed a novel deep learning algorithm for RUL prediction of a mechanical
bearing. The proposed deep learning-based RUL prediction algorithm is a hybrid combination of
CNN and LSTM. The 1D horizontal and vertical signals were converted into 2D images using
the CWT. By stacking different layers, such as convolution, pooling, and fully connected
layers, a CNN was designed in the proposed CNN+LSTM. Key feature vectors were
extracted from the 2D images using the CNN. Further, an LSTM network was designed with
several LSTM cells. The feature vectors extracted by the CNN were applied to the LSTM
to predict the RUL of a mechanical bearing. The performance of the proposed
model was optimized by fine-tuning the different hyperparameters. The accuracy and loss of the
CNN+LSTM were 98% and 2%, respectively. In
future, the prediction approach will be applied to and evaluated on more sophisticated platforms
with multiple components.
Data Availability The data used for experimental purposes is available as an open source at: https://
paperswithcode.com/dataset/pronostia-bearing-dataset


Declarations

Competing interests The authors declare no competing interests.

References
1. Cui L, Wang X, Wang H, Ma J (2019) Research on remaining useful life prediction of rolling element
bearings based on time-varying Kalman filter. IEEE Trans Instrum Mea 69(6):2858–2867
2. Dataset PB (2023) PRONOSTIA bearing dataset bearing dataset. https://paperswithcode.com/dataset/
pronostia-bearing-dataset
3. Deng Y, Du S, Wang D, Shao Y, Huang D (2023a) A calibration-based hybrid transfer learning framework
for RUL prediction of rolling bearing across different machines. IEEE Trans Instrum Meas 72:1–15
4. Deng Y, Lv J, Huang D, Du S (2023b) Combining the theoretical bound and deep adversarial network for
machinery open-set diagnosis transfer. Neurocomputing, pp 126391
5. Han Y, Chen S, Gong C, Zhao X, Zhang F, Li Y (2023) Accurate SM disturbance observer-based demagne-
tization fault diagnosis with parameter mismatch impacts eliminated for IPM motors. IEEE Trans Power
Electron 38(5):5706–5710
6. Hong J, Wang Q, Qiu X, Chan HL (2019) Remaining useful life prediction using time-frequency fea-
ture and multiple recurrent neural networks. In: 2019 24th IEEE international conference on emerging
technologies and factory automation (ETFA). IEEE, pp 916–923
7. Jimenez JJM, Schwartz S, Vingerhoeds R, Grabot B, Salaün M (2020) Towards multi-model approaches
to predictive maintenance: a systematic literature survey on diagnostics and prognostics. J Manuf Syst
56:539–557
8. Jin R, Chen Z, Wu K, Wu M, Li X, Yan R (2022) Bi-LSTM-based two-stream network for machine
remaining useful life prediction. IEEE Trans Instrum Meas 71:1–10. https://doi.org/10.1109/TIM.2022.
3167778
9. Liu H, Liu Z, Jia W, Lin X (2021) Adjustable uncertainty set constrained unit commitment with operation
risk reduced through demand response. IEEE Trans Industr Inform 17(2):1197–1207
10. Liu H, Yuan H, Hou J, Hamzaoui R, Gao W (2022) PUFA-GAN: a frequency-aware generative adversarial
network for 3D point cloud upsampling. IEEE Trans Image Process 31:7389–7402
11. Liu L, Wang L, Yu Z (2021) Remaining useful life estimation of aircraft engines based on deep convolution
neural network and LightGbM combination model. Int J Comput Intell Syst 14:1–10
12. Liu ZH, Meng XD, Wei HL, Chen L, Lu BL, Wang ZH, Chen L (2021) A regularized LSTM method for
predicting remaining useful life of rolling bearings. Int J Autom Comput 18:581–593
13. Ma M, Mao Z (2021) Deep-convolution-based LSTM network for remaining useful life prediction. IEEE
Trans Industr Inform 17(3):1658–1667. https://doi.org/10.1109/TII.2020.2991796
14. Qu Z, Liu X, Zheng M (2022) Temporal-spatial quantum graph convolutional neural network based on
Schrödinger approach for traffic congestion prediction. IEEE Transactions on Intelligent Transportation
Systems
15. Sayyad S, Kumar S, Bongale A, Kamat P, Patil S, Kotecha K (2021) Data-driven remaining useful
life estimation for milling process: sensors, algorithms, datasets, and future directions. IEEE Access
9:110,255-110,286
16. Shi J, Li Y, Zhang MZ, Liu W (2018) Remaining useful life prediction based on modified relevance vector
regression algorithm. In: 2018 Prognostics and system health management conference (PHM-Chongqing).
IEEE, pp 900–907
17. Wang B, Han T, Lei Y, Li N (2019) Remaining useful life prediction based on deep residual attention
network. In: 2019 International conference on sensing, diagnostics, prognostics, and control (SDPC).
IEEE, pp 79–84
18. Wang B, Zhu D, Han L, Gao H, Gao Z, Zhang Y (2023) Adaptive fault-tolerant control of a hybrid
canard rotor/wing UAV under transition flight subject to actuator faults and model uncertainties. IEEE
Transactions on Aerospace and Electronic Systems
19. Wang Y, Zhao Y, Addepalli S (2020) Remaining useful life prediction using deep learning approaches: a
review. Procedia Manuf 49:81–88
20. Wenqiang J, Jian C, Yi C (2019) Remaining useful life prediction for mechanical equipment based on
temporal convolutional network. In: 2019 14th IEEE international conference on electronic measurement
& instruments (ICEMI). IEEE, pp 1192–1199

123
Multimedia Tools and Applications

21. Xi X, Chen M, Zhou D (2019) Remaining useful life prediction for multi-component systems with hidden
dependencies. Sci China Inf Sci 62:1–16
22. Yang B, Liu R, Zio E (2019) Remaining useful life prediction based on a double-convolutional neural
network architecture. IEEE Trans Ind Electron 66(12):9521–9530
23. Yao J, Lu B, Zhang J (2022) Tool remaining useful life prediction using deep transfer reinforcement learn-
ing based on long short-term memory networks. The International Journal of Advanced Manufacturing
Technology pp 1–10
24. Zhao D, Liu F (2022) Cross-condition and cross-platform remaining useful life estimation via adversarial-
based domain adaptation. Sci Rep 12(1):878

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.
