Machine learning for prognostics and health management of industrial mechanical systems and equipment: A systematic literature review

Special Issue: Performance Measurement and Management Systems: opportunities, trends and new perspectives
International Journal of Engineering Business Management
Volume 15: 1–20
© The Author(s) 2023
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/18479790231186848
journals.sagepub.com/home/enb

Lorenzo Polverino1, Raffaele Abbate1, Pasquale Manco1, Donato Perfetto1, Francesco Caputo1, Roberto Macchiaroli1 and Mario Caterino2
Abstract
In the last decade, the adoption of technological tools in the manufacturing industry, such as the Internet of Things (IoT)
and Machine Learning (ML), has led to the advent of Industry 4.0 (I4.0). In this scenario, intelligent devices can generate
large volumes of data about industrial machinery and equipment that can be used to make maintenance more efficient.
Prognostics and Health Management (PHM) is an emerging maintenance strategy that uses systems’ Condition Monitoring
through IoT sensors installed on machinery to diagnose their faults or estimate their Remaining Useful Life (RUL). This study
aims to conduct a Systematic Literature Review (SLR) on the use of ML techniques in the field of PHM of industrial mechanical
systems and equipment. Fifty studies were found eligible for this SLR. The diagnostics and prognostics approaches and
the types of ML algorithm used in the 50 papers were analyzed, together with the Key Performance Indicators
(KPIs) used for their validation. The analysis found that Shallow Learning (SL) and Deep Learning (DL) algorithms are
the most applied, while KPIs are used differently according to the type of task (classification or regression). Moreover,
the results highlighted that many authors still use artificial datasets to test their algorithms, instead of datasets based on real data
retrieved from their own components. For the latter type of dataset, this paper also introduces a schematic framework to standardize
the step-by-step diagnostics and prognostics process carried out by the authors.
Keywords
Machine learning, prognostics and health management, fault diagnosis, fault prognosis, remaining useful life, key performance
indicators, systematic literature review
Date received: 30 March 2023; accepted: 20 June 2023
1 Department of Engineering, University of Campania “Luigi Vanvitelli”, Italy
2 Department of Industrial Engineering, University of Salerno, Italy

Corresponding author:
Raffaele Abbate, Department of Engineering, University of Campania “Luigi Vanvitelli”, Via Roma, 29, Aversa (CE), Aversa 81031, Italy.
Email: raffaele.abbate@unicampania.it

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Introduction
Words like “Internet of Things” (IoT), “Cyber-Physical Systems” (CPS), “Internet of Services” (IoS), “Digital Twins” (DT), and “Machine Learning” (ML) have laid the foundations for the so-called Industry 4.0 (I4.0), which has prompted many companies to completely renew the concept of maintenance, improving productivity, preventing downtimes and reducing costs.1,2
Among the different maintenance strategies, Prognostics and Health Management (PHM) represents one of the most innovative, perfectly fitting into the new I4.0 scenario; indeed, it is based on systems’ Condition Monitoring (CM) through IoT sensors installed on machinery.3 PHM fulfils
the two important tasks related to diagnosis and prognosis to define their health state and avoid unexpected failures by preventing damages.4 Different parameters can be monitored in PHM according to the type of equipment, such as temperature, vibration, pressure, acoustic emission, force, tension, and others.5 Industrial Structures, Systems, or Components (SSCs) are considered to be in a normal state if these parameters remain above a predetermined threshold.6 Indeed, the evolution in time of these parameters can be used to monitor any deviation from normal operating conditions, which can help to determine how long the equipment remains in good condition before it falls into an unhealthy state. Therefore, PHM is mainly focused on both Fault Diagnosis (FD), when a failure state is present and it is necessary to investigate the source of the anomaly, and Fault Prognosis (FP), when it is necessary to predict the future degradation until complete failure occurs;7 in the latter case, the Remaining Useful Life (RUL) of SSCs is often estimated. RUL is defined as the length of time from the current time to the end of the useful life, that is, when the system condition reaches the failure threshold.8 The forecasting window plays a crucial role in prognosis because the objective is to provide an estimate of the future time-step when a certain event will occur.9 In recent years, several methods to evaluate RUL or FD have been proposed, such as model-based approaches, data-driven approaches, or a mix of both. Model-based approaches rely on the knowledge of the inherent system failure mechanism to build a mathematical degradation model that describes the physical nature of the fault;10 on the other hand, data-driven techniques rely on collected data to extract knowledge about the health status of the monitored equipment. This task is particularly suitable to be performed by ML algorithms.11 These algorithms range from conventional Shallow Learning (SL) techniques, such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DTR), and Random Forest (RF), to more recent techniques, such as Deep Learning (DL) algorithms.12
Machine learning is emerging as one of the major approaches for PHM and RUL estimates. Machine learning is mainly used for solving two types of tasks, namely, “Classification” and “Regression”. Classification tasks have a finite number of output classes, while, in regression tasks, an infinite number of outputs are represented as real-valued data. By its nature, FD is a classification problem. RUL prediction, instead, is often a regression problem, even if there are rare cases in which the RUL is treated as a classification problem.13
Regardless of the type of algorithm used or the type of task faced, an important step in using ML in PHM is being able to measure the performance of the algorithm. Therefore, it is necessary to define Key Performance Indicators (KPIs) to determine the accuracy of an algorithm and the associated methodology.
The study aims to conduct a Systematic Literature Review (SLR) on the use of ML techniques in the field of PHM of industrial mechanical systems and critical equipment. To the best of the authors’ knowledge, the problem presented in this paper has not been addressed previously. Thus, to fill this gap, the study investigates diagnostics and prognostics applied to the industrial SSCs, the kinds of ML algorithms used, and the KPIs used for validating them.
The rest of this paper is outlined as follows: the next section presents the methodology followed to conduct the research that led to the identification of the selected studies; then, the results are analyzed and discussed by carrying out a bibliometric analysis and answering the aforementioned RQs; finally, the last section highlights the conclusions and future works.

Research methodology
To have an overview of the use of ML techniques in the field of PHM of industrial SSCs, an SLR14 similar to the ProKnow-C methodology15 was performed to answer the following Research Questions (RQs):

- RQ 1. What are the most used ML algorithms for PHM’s diagnostics and prognostics of industrial SSCs?
- RQ 2. What are the main performance metrics of ML algorithms adopted in PHM of industrial SSCs?

To answer the aforementioned questions, a literature search was performed on the Scopus database (www.scopus.com), which is often used as a unique database because it groups several types of journals covering different fields of science and, in addition, it provides exhaustive data for each document and complete information on the author(s) and their institution profiles.16–19 The research string was run on January 10, 2023. Aiming to restrict the search field to the desired themes only, several combinations of keywords were used to include all the possible papers related to the concepts of:

- PHM (i.e., “PdM” OR “predictive maintenance” OR “data-driven PdM” OR “prognostic”, OR “condition-based maintenance”);
- diagnostics and prognostics (i.e., “fault” OR “RUL”);
- ML (i.e., DL).

Each of these keywords was searched in the abstract, title or keywords (TITLE-ABS-KEY) of the documents, that is, at least one keyword for each of the 3 above-mentioned batches must be present either in the title, or in the abstract, or in the document keywords. This query provided a first set
of 483 results; then, some Inclusion Criteria (IC) were considered:

- IC 1. Only papers in the final publication stage;
- IC 2. Only English language papers;
- IC 3. Only recent papers that were published between 2008 and 2023.

This first filtering returned 418 papers; to further limit this number of documents, a fourth IC was added to the others:

- IC 4. Only journal papers (other kinds of documents, such as books or conference papers, were not considered).

Following this last IC, the database was restricted to 254 documents. The final search string is reported below:
TITLE-ABS-KEY ((Prognostic PRE/2 Management OR “PhM” OR “Data-driven PhM” OR “Predictive Maintenance” OR PdM OR “Data-driven PdM” OR “prognostic*” OR “condition-based maintenance” OR CBM) AND (“Machine Learning” OR “ML” OR “Deep Learning” OR “DL”) AND (fault OR failure OR “Remaining Useful Life” OR “RUL”)) AND PUBYEAR >2007 AND PUBYEAR <2024 AND (LIMIT-TO (PUBSTAGE, “final”)) AND (LIMIT-TO (DOCTYPE, “ar”)) AND (LIMIT-TO (LANGUAGE, “English”)).
At this point, three further steps, described below, were conducted to find the final set of papers:

- First, 31 documents were removed by simply reading the paper and journal titles, because they were not related to the industrial mechanical systems field (i.e., medical, railway, robotics, or chemical field);
- Second, the remaining 223 abstracts were analyzed, discarding 148 documents because either they did not consider industrial mechanical applications, but aeronautical, aerospace, and chemical applications, or they did not consider applications to validate their ML algorithms at all; the remaining 75 documents went to the next analysis step, even if 24 of these 75 needed a more in-depth analysis because, by simply reading the abstracts, it was impossible to determine either whether the authors considered some kind of application, or whether the applications were relevant to the industrial field of interest;
- Finally, a full paper analysis was conducted, which allowed excluding 25 documents because they consider neither ML algorithms (but statistical techniques), nor industrial mechanical applications. Therefore, 50 papers out of the 254 were considered eligible for the following analysis.

An overview of the whole search process is provided in Figure 1.

Results and discussion
In this section, the results are shown and discussed according to the previously defined RQs. First, in section 3.1, a bibliometric analysis20 was carried out to highlight the trends of the analyzed publications over the years. Next, RQ 1 and RQ 2 were answered, respectively, in section 3.2 and section 3.3, where the ML algorithms and KPIs used in the 50 analyzed studies are examined; finally, in section 3.4, a schematic framework was developed to standardize the diagnostics and prognostics process carried out by the authors who used their own unique datasets, and not the common publicly available datasets.

Bibliometric analysis
Figure 2 shows how the 50 selected papers are distributed over the years, including the number of citations received per year. They cover an 8-year long period, from 2016 until 2023, although IC 3, defined in the previous section, considers eligible papers starting from 2008. Only 20 papers of the first set of 483 results belong to the 2008–2015 years and none of them is about the industrial mechanical field, but rather the medical, railway, chemical or aeronautical field. For this reason, they were excluded from the final analysis. In Figure 2, it is possible to note that the number of papers increased in the last few years, reaching the peak of 14 publications in 2021. This increasing number of studies over the years is not surprising, considering that the term “Industry 4.0” was used for the first time in Germany in 2011, precisely during the Hanover Fair, where the Communication Promoters Group of the Industry-Science Research Alliance (FU) announced a project for the development of the German industrial manufacturing sector, the “Zukunftsprojekt Industrie 4.0” 21; since then, the German model, combined with the improvements in the interconnectivity of the IoT and robotics devices brought by Artificial Intelligence (AI) technologies, has inspired numerous researchers to continue researching the ML field to improve productivity and reduce the costs related to industrial maintenance.22
Concerning the number of citations per year, it is possible to note from Figure 2 that the trend is not stable, with a peak of 746 citations in 2019, an average of 373.8, and 0 citations in 2023, because of the narrow time window available to receive citations in this year, considering that the literature search date is January 10th, 2023.
Figure 1. Overview of the literature identification process.
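As a quick consistency check on the identification process summarized in Figure 1, the record counts reported in the research methodology can be chained in a few lines of Python. This is only an illustrative sketch: the stage labels and variable names are assumptions introduced here, while the numbers are those stated in the text.

```python
# Record counts reported in the text; stage labels are illustrative.
initial_results = 483      # first Scopus query
after_ic_1_to_3 = 418      # IC 1-3: final stage, English, 2008-2023
after_ic_4 = 254           # IC 4: journal papers only

# Manual screening steps: (label, documents removed), as reported in the text.
screening = [
    ("title screening", 31),
    ("abstract screening", 148),
    ("full-paper analysis", 25),
]

remaining = after_ic_4
trace = [("after IC 4", remaining)]
for label, removed in screening:
    remaining -= removed
    trace.append((label, remaining))

for label, count in trace:
    print(f"{label}: {count} documents")
```

Under these counts the chain ends at the 50 papers retained for the analysis, with 223 and 75 documents left after the title and abstract screenings, respectively.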
Figure 2. Publication and citations trend per year.
Table 1 shows the most relevant journals of the analyzed papers (journals with only one paper each were put together in the last row named “others”).
The 50 analyzed papers present 6 different types of SSCs (Figure 3) and 9 different types of datasets (Figure 4): 8 are online public datasets, and 1 is an “own-datasets” type, that is, datasets created specifically for the task addressed and the industrial application of the authors. It is possible to note that the sum of the percentage values in Figure 3 and in Figure 4 is beyond 100% because often more than one type of SSC and/or dataset was examined by the authors. About the mechanical systems and components analyzed in the papers, bearings are in 74% of the analyzed studies, followed by gears at 16%, milling machine’s cutting tools at 10%, and a pump’s impeller, a ball screw and a hot strip mill’s roller, each at 2%. These trends can be explained by noting that most of the problems arising in rotating machinery are caused by faulty gears and bearings.23 As components between the stationary and the rotating parts of industrial machinery, bearings represent an essential part of them; in fact, bearing faults cause more than 50% of induction motors’ failures, mainly because of overheating, excessive axial and radial loads, and electrical stress such as the presence of bearing currents.24 As a consequence of the predominant presence of bearings and gears as SSCs analyzed by authors, four popular public datasets turned out to be the most used in the analyzed papers, that is, for bearings, the IEEE PHM 2012 Challenge dataset (36%) and the XJTU-SY and CWRU bearing datasets (18%), and, for gears, the PHM 2009 challenge dataset (8%). The remaining four datasets consist of two datasets for gearboxes (University of Alberta gearbox and 2021 Tsinghua University dataset), one dataset for milling machine’s cutting tools (IMS-Foxconn dataset), and one dataset for bearings (NASA bearing dataset). Moreover, from Figure 4, it is possible to note that 38% of the analyzed papers present datasets created for the specific problems investigated by the authors; this theme is examined in depth in section 3.4.
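To see why the shares in Figure 3 add up to more than 100%, the percentages quoted above can be converted back into paper counts. The following Python sketch assumes that each share is computed over the 50 analyzed papers and that the three least frequent components account for 2% each; the dictionary keys are illustrative labels.

```python
TOTAL_PAPERS = 50

# Share of papers (in %) in which each component type appears, as quoted in the text.
ssc_share_percent = {
    "bearings": 74,
    "gears": 16,
    "milling cutting tools": 10,
    "pump impeller": 2,
    "ball screw": 2,
    "hot strip mill roller": 2,
}

# Convert shares to paper counts; a paper is counted once per component type it studies.
ssc_paper_counts = {
    name: round(share * TOTAL_PAPERS / 100)
    for name, share in ssc_share_percent.items()
}

total_share = sum(ssc_share_percent.values())
total_mentions = sum(ssc_paper_counts.values())

print(ssc_paper_counts)
print(f"sum of shares: {total_share}%")
print(f"component mentions: {total_mentions} across {TOTAL_PAPERS} papers")
```

Since the component mentions exceed the number of papers, at least some studies must examine more than one type of SSC, which is the explanation given in the text.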
Table 1. Number of papers related to the most relevant journals.
Journal | # Papers
Reliability engineering and system safety | 4
IEEE Transactions on instrumentation and measurement | 4
Applied sciences (Switzerland) | 3
Measurement: Journal of the international measurement confederation | 3
Knowledge-based systems | 2
IEEE Transactions on industrial electronics | 2
Journal of manufacturing science and engineering, Transactions of the ASME | 2
IEEE access | 2
Journal of computing and information science in engineering | 2
Journal of manufacturing systems | 2
IEEE/ASME Transactions on mechatronics | 2
IEEE Transactions on industrial informatics | 2
Advanced engineering informatics | 2
Others | 16
Figure 3. Structures, systems, or components used in the analyzed papers.
Machine learning algorithms for PHM of SSCs
This section aims to respond to RQ 1, that is, What are the most used ML algorithms for PHM’s diagnostics and prognostics of industrial SSCs?
The constant increase in data availability due to intelligent sensors installed on SSCs, in addition to the technological progress in terms of computers’ hardware and software and a large number of cross-platform tools and libraries, such as MATLAB, Python, R, and scikit-learn, has led to the rapid development of multiple ML techniques to better address the issue of PHM of SSCs. These techniques range from the first classic SL techniques to the more recent DL ones. The word “shallow” comes from the single hidden layer belonging to the first simple neural networks; nowadays, “Shallow Learning” usually refers to all the traditional ML models, that is, those proposed before 2006;25 among these, those used in the 50 analyzed studies are: shallow ANN, i.e., neural networks with only one hidden layer of nodes, SVM, DTR, RF, statistical models, and hybrids, that is, combinations of these algorithms; on the other hand, DL models are based on neural networks with the addition of multiple hidden layers between the network’s input and output;7 among these, those used in the 50 analyzed studies are: Deep Neural Network (DNN), Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), Auto-Encoders (AE), Restricted Boltzmann Machines (RBM) and hybrids, that is, combinations of these algorithms. Furthermore, the cases of SL/DL hybrid methods, that is, algorithms in which SL and DL models are combined, are not uncommon.
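The distinction between shallow and deep networks drawn above depends only on the number of hidden layers. A minimal Python sketch, using plain lists instead of any ML library, is given below; the function names, the layer sizes, the constant weights, and the ReLU activation are illustrative assumptions and are not taken from the analyzed papers.

```python
def dense_relu(inputs, weights, biases):
    """One fully connected layer followed by a ReLU activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, z))
    return outputs


def forward(inputs, hidden_layers, output_layer):
    """Propagate the inputs through every hidden layer, then a linear output layer."""
    activations = inputs
    for weights, biases in hidden_layers:
        activations = dense_relu(activations, weights, biases)
    out_weights, out_biases = output_layer
    return [
        sum(w * a for w, a in zip(neuron_weights, activations)) + bias
        for neuron_weights, bias in zip(out_weights, out_biases)
    ]


def network_family(hidden_layers):
    """Label a network as in the text: one hidden layer is shallow, more are deep."""
    if not hidden_layers:
        raise ValueError("at least one hidden layer is expected")
    return "shallow ANN" if len(hidden_layers) == 1 else "deep network"


def constant_layer(n_inputs, n_neurons, value=0.5):
    """Hypothetical layer with constant weights and zero biases, for illustration only."""
    return ([[value] * n_inputs for _ in range(n_neurons)], [0.0] * n_neurons)


features = [1.0, 2.0]                      # e.g. two sensor-derived features
shallow_hidden = [constant_layer(2, 3)]    # a single hidden layer
deep_hidden = [constant_layer(2, 3), constant_layer(3, 3), constant_layer(3, 3)]
output_layer = constant_layer(3, 1)

shallow_output = forward(features, shallow_hidden, output_layer)
deep_output = forward(features, deep_hidden, output_layer)
print(network_family(shallow_hidden), shallow_output)
print(network_family(deep_hidden), deep_output)
```

The same `forward` routine serves both cases; only the length of the hidden-layer list changes, which is the structural difference the SL/DL terminology refers to for neural networks.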
Figure 4. Datasets used in the analyzed papers.
Table 2 shows all the ML algorithms used in the analyzed papers, clarifying their nature (type of algorithm) and their family (SL, DL, or SL/DL hybrid methods).
Moreover, the frequency of citations of the aforementioned ML algorithms is shown in Figure 5, which makes clear the predominance of the DL methods (82%, i.e., 41/50 sample papers) over both the SL methods (10%, i.e., 5/50 sample papers) and the Hybrid SL/DL ones (8%, i.e., 4/50 sample papers). One of the reasons for the higher use of DL, supplanting the traditional SL algorithms, is the ability to skip the manual extraction of features from the input data before they are fed into the network, thanks to a nested series of consecutive computations that result in the extraction of a set of complex and highly informative features; moreover, in recent years, an increasing number of empirical results have shown that these models return better results in terms of diagnostics and prognostics performance, compared to “shallow” methods. The main problem is that, compared with SL models, DL ones require a larger amount of training data (not always available) and the models to build are more complex.7
The pie charts in Figure 6 show how the ML techniques are distributed among the 50 sample papers, dividing them into SL algorithms (a), DL algorithms (b), and hybrid ones (c). About the SL algorithms, as aforementioned, only 5 of the 50 analyzed papers use SL methods, with a prevalence of RF (30%, i.e., 3 times), followed by DT and SVM (20%, 2 times each), ANN, ANFIS (hybrid between ANN and a statistical model), and R-S-G statistical model (10%, 1 time each). RF, DT, SVM, and ANN have been used for prognostics tasks, while ANFIS and R-S-G statistical models for diagnostics tasks. About the DL algorithms used in 41 of the 50 analyzed papers, they are distributed as follows: CNN and RNN are the most used (25.6%, i.e., 11 times each), followed by DNN and Hybrid ones (16.3%, i.e., 7 times each); the latter consist of two models, that is, a mash-up between CNN and RNN (14%, i.e., 6 times) and a mash-up between RNN and DNN (2.3%, i.e., 1 time); the least used DL algorithms are RBM (9.3%, i.e., 4 times) and AE (7%, i.e., 3 times). CNN, DNN, RBM, AE, and Hybrid ones have been used both for prognostics and diagnostics tasks, while RNN has been used only for prognostics tasks. Finally, about the DL/SL hybrid algorithms, there are four of them: CNN-RF, RBM-ANN, DNN-DT, and AE-SVM; each of these algorithms has been used only for prognostics tasks.

Machine learning KPIs for PHM of SSCs
The efficiency and effectiveness of an ML model can be evaluated using Key Performance Indicators. KPIs are usually divided into 2 groups: (i) KPIs for ML classification tasks, for which the output is divided into positive and negative classes. For instance, considering FD described through a simple binary classification, the negative class stands for “fault” and the positive class stands for “working”; (ii) KPIs for ML regression tasks, for which the output may be any value. For instance, for RUL, the output may range from 0 to 100, where 0 stands for “fault” and the other values stand for “still working.” Since different ML tasks produce different outputs (i.e., continuous or discrete), the related KPIs are consequently different too.
Table 2. Machine learning algorithms used in the 50 sample papers.
Acronym | Full name | Algorithm nature | Algorithm family
ANFIS | Adaptive neuro-fuzzy inference system | Hybrid (ANN-statistical model) | SL
AOA | Adversarial out-domain augmentation | DNN | DL
BGRU-DANN | Bidirectional gated recurrent unit-domain adversarial neural network | Hybrid (RNN-DNN) | DL
BiLSTM | Bi-directional long short-term memory | RNN | DL
CABLSTM | Convolution-based attention mechanism bidirectional long short-term memory | Hybrid (CNN-RNN) | DL
CLSTM | Convolution-based long short-term memory | Hybrid (CNN-RNN) | DL
CLSTMF | Convolutional long short-term memory fusion networks | Hybrid (CNN-RNN) | DL
CNN | Convolutional neural network | CNN | DL
CNN-BiLSTM | Convolutional neural network-bi-directional long short-term memory | Hybrid (CNN-RNN) | DL
CNN-gcForest | Convolutional neural network-gcForest | Hybrid (CNN-RF) | DL-SL
CNN-LSTM | Convolutional neural network-long short-term memory | Hybrid (CNN-RNN) | DL
CWT-CNN | Continuous wavelet transform-convolutional neural network | CNN | DL
DBN | Deep belief network | RBM | DL
DBN-FNN | Deep belief network feed-forward neural network | Hybrid (RBM-ANN) | DL-SL
DCNN–MLP | Deep convolutional neural network–multilayer perceptron dual network | Hybrid (CNN-DNN) | DL
DNN | Deep neural network | DNN | DL
DSARN | Deep subdomain adaptive regression network | DNN | DL
ET | Extremely randomized trees | DT | SL
FNN | Feed-forward neural network | ANN | SL
GAN | Generative adversarial network | DNN | DL
GRUNN | Gated recurrence unit neural network | RNN | DL
GRU-PF | Gated recurrent unit neural network with particle filters | RNN | DL
InDo-DDM | Inter domain-decision discrepancy minimization | DNN | DL
LSTM | Long short-term memory | RNN | DL
MDAN | Multisource domain adaptation network | DNN | DL
MRPRF | MapReduce-based parallel random forests | RF | SL
MS-DRNN | MultiStream-deep recurrent neural network | RNN | DL
NICE | Nonlinear independent components estimation | DNN | DL
PSO-CNN | Particle swarm optimization with convolutional neural network | CNN | DL
RBM | Restricted Boltzmann machine | RBM | DL
RF | Random forest | RF | SL
R-S-G | Residual building unit-soft thresholding-global context | Statistical model | SL
SCAE | Stacked contractive auto-encoder | AE | DL
SDAE-SVDD | Stacked denoising auto-encoder-support vector data description | Hybrid (AE-SVM) | DL-SL
SPADA | Stacked auto-encoder based partial adversarial domain adaptation | AE | DL
SVM | Support vector machine | SVM | SL
SVR | Support vector regression | SVM | SL
SWAE | Stacked wavelet auto-encoder | AE | DL
TGAN-EBT | Tabular generative adversarial networks – ensemble bagged tree | Hybrid (DNN-DT) | DL-SL
WMSCCN | Wide convolution and multi-scale convolution | CNN | DL
WSGRU | Wavelet sequence-based gated recurrent unit | RNN | DL
XGBoost | Extreme gradient boosting | DT | SL
Figure 5. Frequency of the shallow learning, deep learning, and hybrid algorithms used in the 50 analyzed papers.
Figure 6. Types of shallow learning (a), Deep learning (b), and hybrid (c) algorithms used in the 50 analyzed papers.
Table 3. KPIs for ML classification tasks.
KPI | Formula | Description
Accuracy (A) | $A = \frac{TP + TN}{TP + TN + FP + FN}$ (1) | It is the ratio between the total number of correctly classified samples and the total number of samples within the test set. It is bounded to [0, 1], where 1 represents predicting all positive and negative samples correctly, and 0 represents predicting none of the positive or negative samples correctly.
Recall (R) | $R = \frac{TP}{TP + FN}$ (2) | It is the ratio between correctly classified positive samples and all samples that actually belong to the positive class. It is bounded to [0, 1], where 1 represents perfectly predicting the positive class, and 0 represents incorrect prediction of all positive class samples.
Precision (P) | $P = \frac{TC}{TC + FC} = \begin{cases} PPV = \frac{TP}{TP + FP} \\ NPV = \frac{TN}{TN + FN} \end{cases}$ (3) | It is the ratio between correctly classified class samples and all samples assigned to that class. “Class” is a variable that can assume both “positive” (C = P) and “negative” (C = N) values. The positive case of the precision (C = P) is called “positive predictive value” (PPV), which is the ratio between correctly classified positive samples and all samples classified as positive, while the negative case of the precision (C = N) is called “negative predictive value” (NPV), which is the ratio between correctly classified negative samples and all samples classified as negative. P, PPV and NPV are bounded to [0, 1], where 1 represents all samples in the class correctly predicted, and 0 represents no correct predictions in that class.
F1 score (F1) | $F1 = \frac{2}{1/R + 1/P}$ (4) | It is the harmonic mean of R and P; therefore, it penalizes extreme values of either. It is bounded to [0, 1], where 1 represents the perfect model (max R and P values) and 0 represents zero P and R values. Note that a high F1 value indicates a high P value as well as a high R value, while a low F1 value is not enough to know if the problem of the model resides in low Recall (type-I problem), low Precision (type-II problem), or both of them (type-III problem). Therefore, F1 is often used together with other metrics, to better understand if the model suffers from the type-I, type-II, or type-III problem.
Area under the receiver operating characteristic curve (AUROC) | $FPR = \frac{FP}{TN + FP}$ (5); $AUROC = \frac{TP + FN - \frac{TN(TP + 1)}{2}}{TN \cdot TP}$ (6) | It is the area under the curve plotted between the false positive rate (FPR) on the x-axis and recall on the y-axis. FPR, just like recall, has values in the range [0, 1], but 1 represents incorrect prediction of all negative class samples, and 0 represents perfectly predicting the negative class. The AUROC value changes according to the model, but in the case of simple binary classification, the AUROC is given by equation (6). It is bounded to [0, 1], where 0 means that the model is predicting a negative class as a positive class and vice versa, and 1 means that the model has a perfect capacity to separate the classes.

Confusion matrix (CMX) | $\text{Confusion Matrix} = \begin{bmatrix} TP & FN \\ FP & TN \end{bmatrix}$ (7) | It is an n×n matrix, where “n” is the number of classes that are to be predicted. In the case of binary classification (n = 2), the confusion matrix looks like equation (7). It is not exactly a performance metric, but it is the starting point from which the other metrics are defined and the results are evaluated.
TP = positive class samples correctly predicted; TN = negative class samples correctly predicted; FP = negative class samples incorrectly predicted as positive; FN = positive class samples incorrectly predicted as negative; TC = true class; FC = false class.
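Equations (1) to (5) of Table 3 can be evaluated directly from the four entries of a binary confusion matrix. The Python sketch below is a minimal illustration; the function name and the example counts are hypothetical, and no guard is included for empty classes (which would cause a division by zero).

```python
def classification_kpis(tp, tn, fp, fn):
    """Compute the KPIs of equations (1)-(5) from binary confusion-matrix counts."""
    recall = tp / (tp + fn)   # equation (2)
    ppv = tp / (tp + fp)      # positive case of equation (3)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # equation (1)
        "recall": recall,
        "ppv": ppv,
        "npv": tn / (tn + fn),                         # negative case of equation (3)
        "f1": 2 / (1 / recall + 1 / ppv),              # equation (4), using PPV as P
        "fpr": fp / (tn + fp),                         # equation (5)
    }


# Hypothetical counts for a binary fault-diagnosis test set of 100 samples.
kpis = classification_kpis(tp=40, tn=45, fp=5, fn=10)
for name, value in kpis.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case the accuracy (0.85) is higher than the recall (0.80), which illustrates why Table 3 suggests reading F1 together with the other metrics rather than in isolation.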
Table 4. KPIs for ML regression tasks.
KPI Formula Description
Mean squared error P

N It is the average of the squares of the errors, that is, the average
MSE ¼ N1 di ðtÞ RULi ðtÞÞ2 (8)
ðRUL
(MSE) i¼1
squared difference between the predicted and the actual
RUL values at i-th time-instant.
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Root mean squared PN It is the root of the MSE and represents the standard deviation
ci ðtÞRULi ðtÞÞ
ðRUL 2
error (RMSE) of the residuals (prediction errors); residuals are a measure
RMSE ¼ i¼1 N (9) of how far from the regression line data points are. The
RMSE is more sensitive to outliers than the MSE because the
effect of each error on RMSE is proportional to the size of
the squared error.
Mean absolute error PN It is the arithmetic average between the predicted and the
MAE ¼ N1 jRUL di ðtÞ RULi ðtÞj (10)
(MAE) i¼1
actual RUL values at time-instant t. MAE, just like MSE and
RMSE, does not provide any “direction” of error, that is,
whether the model is overfitting or underfitting the
forecast. Moreover, it also measures the average magnitude
of error, that is, how far the predictions are from the actual
output.
Mean absolute PN c It is the arithmetic average between the predicted and the
percentage error MAPE ¼ 100%
N jRULi ðtÞRUL
RULi ðtÞ
i ðtÞ
j (11) actual RUL values, related to the actual RUL values at time-
i¼1
(MAPE) instant t. It measures the forecast accuracy, evaluating the
size of the error in percentage terms.
Coefficient of PN c 2 It is the proportionate amount of variation in the dependent
ðRUL ðtÞ RUL ðtÞÞ
R2 ¼ 1 Pi¼1N
i i
determination (R2) 2
(12) variable explained by the independent variables in the linear
i¼1
ðRUL i ðtÞRULÞ
regression model. It allows to understand how strong is the
predictive capacity of a linear regression model.
Cumulative relative P N It is a normalized weighted sum of relative accuracies at
accuracy (CRA) CRA ðλÞ ¼ N1 ðwi RAλ Þ (13) specific time instances (RAλ). The latter is defined as a
i¼1
measure of the error in RUL prediction, relative to the
actual RUL at a specific time index (RULi(λi)). RAλ is used as
Where: a metric to emphasize that errors closer to the actual failure
8 di ðλi Þj of a component are more severe. λi is defined as the
> jRULi ðλi Þ RUL
>
< RAλ ¼ 1 normalized time and it is the ratio between the time-instant
RULi ðλi Þ
> (ti), and the time-to-failure of component (tf); λ is bounded
>
: ti to [0,1], where 0 means that the component is on its
λi ¼
tf maximum state of health, while 1 means that the component
has failed.27,28

Scoring function (Ai) expðlnð0:5Þ ðEri =5ÞÞ if Eri ≤ 0 This metric was used for the IEEE PHM 2012 prognostic
Ai ¼ (14)
expðlnð0:5Þ ðEri =20ÞÞ if Eri > 0 challenge, and it sets asymmetric penalties for late and early
PN predictions. The letter “i” stands for i-th bearing, in fact, if
A score ¼ N1 Ai (15) there is more than one test bearing (as it often happens), it is
i¼1
RULi ðtÞRUL ci ðtÞ possible to evaluate the average score of the RUL prediction
Where: Eri ¼ 100% RULi ðtÞ
for all testing bearings (A-score). Ai is 1 when the per cent
error Eri is 0; as the per cent error increases, the score
decreases.29
N = Cardinality of the dataset; $RUL_i(t)$ = i-th actual RUL value at t-instant; $\widehat{RUL}_i(t)$ = i-th predicted RUL value at t-instant; $\overline{RUL}$ = mean value of the actual RUL samples in the dataset; $RUL_i(\lambda_i)$ = i-th actual RUL value at λ-instant; $\widehat{RUL}_i(\lambda_i)$ = i-th predicted RUL value at λ-instant; $t_i$ = i-th time-instant; $t_f$ = time-to-failure of component; $w_i$ = i-th weight factor as a function of RUL at all time instants, that is, $w_i(RUL_i)$; $Er_i$ = percentage error of the i-th bearing.
Note: MSE, RMSE, MAE, and CRA are bounded to [0, +∞], while MAPE, R2, and Ai are bounded to [0, 1]; since MSE, RMSE, MAE, and MAPE are coefficients
that evaluate an error, the lower the value, the greater the accuracy of the forecast. Instead, for R2, CRA, and Ai, the higher the metric, the better the
prediction performance.
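As an illustration of the "higher is better" regression KPIs in Table 4, the following minimal Python sketch evaluates R2 (Eq. 12), CRA (Eq. 13) and the scoring function (Eqs. 14 and 15) on plain lists of actual and predicted RUL values. The function names, the default unit weights in CRA, and the sample values are assumptions made for this example only; the exponent signs in the scoring function are chosen so that Ai equals 1 at zero error and decreases as the error grows, as the description of the metric states.

```python
import math


def r2(actual, predicted):
    """Coefficient of determination, Eq. (12)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def cra(actual, predicted, weights=None):
    """Cumulative relative accuracy, Eq. (13).

    Unit weights are an assumption of this sketch; actual RUL values
    must be non-zero because they appear in the denominator of RA.
    """
    if weights is None:
        weights = [1.0] * len(actual)
    ra = [1.0 - abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(w * r for w, r in zip(weights, ra)) / len(actual)


def phm_score(actual, predicted):
    """Average of the per-bearing scores A_i, Eqs. (14) and (15)."""
    scores = []
    for a, p in zip(actual, predicted):
        er = 100.0 * (a - p) / a  # percentage error Er_i
        if er <= 0:
            # Predicted RUL above the actual one: steeper penalty.
            scores.append(math.exp(-math.log(0.5) * (er / 5.0)))
        else:
            # Predicted RUL below the actual one: milder penalty.
            scores.append(math.exp(math.log(0.5) * (er / 20.0)))
    return sum(scores) / len(scores)


# Hypothetical RUL values, for illustration only.
actual_rul = [100.0, 80.0, 60.0, 40.0]
predicted_rul = [95.0, 84.0, 60.0, 30.0]

print("R2   :", r2(actual_rul, predicted_rul))
print("CRA  :", cra(actual_rul, predicted_rul))
print("Score:", phm_score(actual_rul, predicted_rul))
```

With these definitions, an overestimate of 5% and an underestimate of 20% both yield a per-bearing score of 0.5, which makes the asymmetry between late and early predictions explicit.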
To answer the RQ 2 (What are the main performance metrics of ML algorithms adopted in PHM of industrial SSCs?), first of all, a brief description of the Evaluation Metrics (EM) used in the 50 analyzed papers is shown in Table 3 (KPIs for classification tasks) and Table 4 (KPIs for regression tasks).26
Moreover, the frequency of citations of the aforementioned EM is shown in the diagram in Figure 7, where the KPIs are divided into classification (a) and regression (b) tasks. Among all the 50 analyzed papers, 18 deal with the classification task, while the remaining 36 deal with the
Figure 7. Frequency of the key performance indicators for classification (a) and regression (b) tasks.
Table 5. Characteristics of the selected studies.
Article | Cited by | Industrial application(s) | Objective | Dataset(s) | ML Technique(s) | ML task | KPI(s)
30 | 27 | Ball screw | Prognostics | Own dataset | GRU-PF | Regression | RMSE, MAE
31 | 46 | Bearing | Diagnostics | Own dataset | DBN | Classification | A
32 | 94 | Bearing | Diagnostics | Own dataset; CWRU bearing dataset | CNN | Classification | A
33 | 22 | Bearing | Diagnostics | Own dataset | CNN | Classification | A
34 | 16 | Bearing | Diagnostics | CWRU bearing dataset | WMSCCN | Classification | A
35 | 10 | Bearing | Diagnostics | CWRU bearing dataset; XJTU-SY dataset | NICE | Classification | AUROC
36 | 4 | Bearing | Diagnostics | CWRU bearing dataset | CNN | Classification | A, (CMX)
37 | 9 | Bearing | Diagnostics | CWRU bearing dataset | R-S-G | Classification | A, P, R, F1, (CMX)
38 | 0 | Bearing | Diagnostics | CWRU bearing dataset | CNN-BiLSTM | Classification | A, (CMX)
39 | 0 | Bearing | Diagnostics | CWRU bearing dataset | CNN | Classification | A
40 | 192 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | RBM | Regression | Ai
41 | 100 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | CWT-CNN | Regression | Ai
42 | 264 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | DNN | Regression | MAE, MAPE, RMSE
43 | 175 | Bearing | Prognostics | Own dataset | CNN | Regression | RMSE, CRA
44 | 11 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | MS-DRNN | Regression | MSE, MAE
45 | 7 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | LSTM | Regression | Ai
46 | 52 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | SDAE-T-GSVDD | Classification | A
47 | 65 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset; XJTU-SY dataset | GAN | Regression | RMSE, MAE, MAPE
48 | 15 | Bearing | Prognostics | NASA bearing dataset | LSTM | Regression | RMSE, MAE, MAPE
49 | 25 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset; XJTU-SY dataset | DSARN | Regression | RMSE, MAE, Ai
50 | 12 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | SCAE | Regression | RMSE, MAPE
51 | 82 | Bearing | Prognostics | CWRU bearing dataset; XJTU-SY dataset | CNN-gcForest | Classification | A, (CMX)
52 | 108 | Bearing | Prognostics | Own dataset | CLSTM | Regression | RMSE, MAE
53 | 18 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | ET, RF, XGBoost, SVM | Regression | R2, Ai
54 | 11 | Bearing | Prognostics | Own dataset | DBN | Regression | RMSE, MAPE
55 | 7 | Bearing | Prognostics | Own dataset | WSGRU | Regression | RMSE, MAE
56 | 4 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | BGRU-DANN | Regression | R2, MAE, MSE
57 | 39 | Bearing | Prognostics | PHM 2012 challenge dataset; XJTU-SY dataset | DCNN–MLP | Regression | RMSE, MAE
58 | 19 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | CABLSTM | Regression | MAE, MSE, Ai
59 | 8 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | TGAN-EBT | Regression | RMSE, MAE
60 | 3 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset | LSTM | Regression | RMSE, Ai
61 | 4 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset; XJTU-SY dataset | CLSTMF | Regression | RMSE, MAPE
62 | 4 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset; XJTU-SY dataset | MDAN | Regression | RMSE, Ai
63 | 0 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset; XJTU-SY dataset | LSTM | Regression | RMSE, MAPE
64 | 0 | Bearing | Prognostics | Own dataset | GRUNN | Regression | MAE, RMSE
65 | 0 | Bearing | Prognostics | IEEE PHM 2012 challenge dataset; XJTU-SY dataset | AOA | Regression | RMSE, MAE, Ai
66 | 50 | Bearing; Gear | Diagnostics | Own dataset | ANFIS | Classification | A
67 | 14 | Bearing; Impeller | Diagnostics | Own dataset | PSO-CNN | Classification | A, (CMX)
68 | 22 | Bearing; Gear | Diagnostics | CWRU bearing dataset; PHM 2009 challenge dataset | SPADA | Classification | A, (CMX)
69 | 0 | Bearing; Gear | Diagnostics | CWRU bearing dataset; PHM 2009 challenge dataset | InDo-DDM | Classification | A
70 | 86 | Gear | Diagnostics | Own dataset | SWAE | Classification | A
71 | 1 | Gear | Diagnostics | PHM 2009 challenge dataset; 2021 Tsinghua University gearbox dataset; University of Alberta gearbox dataset (Ref. 72) | CNN | Classification | A
10 | 238 | Gear | Prognostics | Own dataset | DBN-FNN | Regression | MAPE, RMSE
73 | 163 | Gear | Prognostics | PHM 2009 challenge dataset and own dataset | CNN | Classification | A, R, (CMX)
74 | 20 | Hot strip mill’s roller | Prognostics | Own dataset | LSTM, CNN, DBN | Regression | RMSE, MAE
75 | 316 | Milling machine’s cutting tool | Prognostics | Own dataset | FNN, RF, SVR | Regression | R2, MSE
76 | 37 | Milling machine’s cutting tool | Prognostics | Own dataset | MRPRF | Regression | R2, MSE
77 | 40 | Milling machine’s cutting tool | Prognostics | Own dataset | BiLSTM | Regression | RMSE, MAE
78 | 31 | Milling machine’s cutting tool | Prognostics | IMS-Foxconn dataset | BiLSTM | Regression | MAE
79 | 8 | Milling machine’s cutting tool | Prognostics | Own dataset | CNN-LSTM | Regression | RMSE, MAE, R2
regression task. Four papers considered both classification and regression tasks. For the classification task, Accuracy is the most used metric (58.6%, i.e., 17 times), followed by CMX (about 24.1%, i.e., 7 times), Recall (about 6.9%, i.e., 2 times), and AUROC, Precision, and F1 (about 3.4%, i.e., 1 time each). For the regression task, RMSE is the most used metric (32.4%, i.e., 22 times), followed by MAE (26.5%, i.e., 18 times), Ai (13.2%, i.e., 9 times), MAPE (11.8%, i.e., 8 times), R2 and MSE (7.4%, i.e., 5 times each), and CRA (1.5%, i.e., 1 time).
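The percentages quoted above can be reproduced from the occurrence counts alone: they appear to be shares of all KPI occurrences (29 for classification, 68 for regression) rather than shares of papers. The short Python sketch below makes that arithmetic explicit; the dictionary and function names are illustrative assumptions, while the counts are those reported in the text.

```python
# KPI occurrence counts as reported in the text.
classification_counts = {
    "Accuracy": 17, "CMX": 7, "Recall": 2,
    "AUROC": 1, "Precision": 1, "F1": 1,
}
regression_counts = {
    "RMSE": 22, "MAE": 18, "Ai": 9, "MAPE": 8,
    "R2": 5, "MSE": 5, "CRA": 1,
}


def shares(counts):
    """Return each KPI's share of all KPI occurrences, in percent."""
    total = sum(counts.values())
    return {kpi: round(100.0 * n / total, 1) for kpi, n in counts.items()}


classification_shares = shares(classification_counts)
regression_shares = shares(regression_counts)

for task, table in (("classification", classification_shares),
                    ("regression", regression_shares)):
    print(task, table)
```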
Table 5 summarizes the main characteristics of the selected papers sorted by industrial application type, in terms of article, received citations, objective, industrial application(s), dataset(s), ML technique(s), ML task types, and KPI(s) used.
Figure 8. All papers’ diagnostics and prognostics distribution related to regression and classification Machine Learning (ML) tasks.
PHM framework for the “own-dataset papers”

Figure 8 shows the distribution of the analyzed papers between diagnostics and prognostics, and how regression and classification tasks are allocated to them. Prognostics overtakes its counterpart with 70% of the papers (split between classification and regression tasks) versus 30% of the papers that address diagnostics (only through classification tasks). However, the prognostics and diagnostics percentages shown in Figure 8 could be misleading, because they do not necessarily reflect real prognostics and diagnostics data from the manufacturing industry; rather, they reflect the complexity of monitoring and analyzing data through IoT devices in industry, which has led to the use of pre-existing datasets just to find the best ML algorithms proposed by the authors of the 50 papers. This is the reason why the papers that present datasets created specifically for the task addressed by their authors (own-dataset papers) are further investigated in this section; from Table 5 it can be seen that an own dataset has been used in 19 of the 50 analyzed studies. As a first step, to better understand the real partition between prognostics and diagnostics in the industrial field, Figure 9 shows the own-dataset papers’ diagnostics and prognostics distribution related to regression and classification ML tasks. Comparing Figures 8 and 9, both trends are confirmed, that is, a clear predominance of Prognostics over Diagnostics and of Regression over Classification.
Figure 9. Own-dataset papers’ diagnostics and prognostics distribution related to regression and classification ML tasks.
Therefore, Figure 10 below shows a single common PHM framework which describes the step-by-step diagnostics and prognostics process carried out by the authors of the 19 own-dataset papers. It is worth noting that the path is not unique, since some steps could be repeated for the diagnostic and prognostic tasks; for example, although the prognostic step relies on the results of the diagnostic step, it may be necessary to perform steps 2 to 5 again since the task purpose has changed. Moreover, step 8 could follow both steps 6 and 7. The aforementioned steps are described as follows:

- Data acquisition. The raw data (vibrations, temperatures, pressures, acoustic emissions, etc.) are acquired over time by the sensors installed on the critical components in laboratories’ test platforms. The variables analyzed by the sensors change depending on the type of SSCs. Vibration seems to be the most analyzed variable for bearings,31–33,43,52,54,67 followed by temperature,55 and oil supply pressure, pressure applied to bearings and lubrication oil flow;64 moreover, vibration, cutting force, and acoustic signals are the constant variables analyzed by the sensors for milling machine’s cutting tool;75–77,79 vibration is the only analyzed variable for gears;10,70,73 strip temperature, strip thickness, strip width, strip flatness, and roller gap are used to analyze the degradation performance of the hot strip mill’s roller;74 finally, for the ball screw, vibration and position of the screw are used to evaluate its wear state.30 A singular case concerns Ref. 66, where only current signals are used as raw data for bearings and gears’ diagnostics.
- Feature extraction. The raw data are converted into statistical features usable by the specific ML algorithm. In particular, this conversion may take place in three different domains: time-domain (TD), frequency-domain (FRD), and time-frequency domain (TFD).

Time-domain is based on converting raw data into statistical features such as mean, median, standard deviation, variance, root mean square (RMS), skewness, and kurtosis. For example, Wu et al.73 use twelve different time-domain features to form a single feature vector as an input to a neural network: VPP, standard deviation, variance, mean, RMS, ARV, form factor, crest factor, kurtosis, kurtosis factor, pulse factor, and margin factor. Other papers that adopt this type of time-domain based feature extraction are Refs. [10,43,54,75,76,79]. Other time-domain feature extraction methods are Hierarchical Symbolic Analysis (HSA),67 and a unique deep multilayer LSTM model that can fully extract the features from the raw monitoring data.74

Frequency-domain is about extracting statistical features by applying the Fast Fourier Transform (FFT) to raw data; typical statistical frequency-domain features are Mean Frequency (MF), Root Mean Square Fluctuations (RMSF), Frequency Modulation (FM), Root Variance Frequency (RVF), Power Spectrum Deformation (PSD), etc.; for instance, Xie et al.31 extract frequency-domain features and use them as inputs to a DBN model.

Time-frequency domain considers both time and frequency domains to capture how the frequency components of the signal vary as a function of time. It is commonly used to monitor the state of rotating machinery, and it is very effective for non-stationary time-series analysis. For example, the vibration signal of a bearing is non-stationary and has a weak defect signal within a strong background of noise.80 Wavelet Transform (WT), Continuous Wavelet Transform (CWT), and Empirical Mode Decomposition (EMD) have been used to extract features from raw signals, such as in Ref. 55, where the wavelet sequences are realized using the CWT, given its capability to handle non-stationary signals with a multiscale representation, which can provide the hierarchy of structural information to show the dynamic characteristics of the vibration signals. Another example of a TFD method is given in Ref. 32, where 8 different TFD methods are used to extract features for bearing fault diagnosis.

Four further cases are about bearings’ prognostics,52,64 bearings and gears’ diagnostics,66 and gears’ diagnostics,70 in which both frequency and time domains are investigated
Figure 10. Diagnostics and prognostics process followed by the 19 own-dataset papers.
separately. In particular, in Ref. 24, statistical features in time and frequency domains, such as RMS, square root value, absolute mean, kurtosis, and others, are used to describe the degradation process of bearings; in Ref. 28, a total of 16 classic time-domain features and 3 frequency-domain features (FC, RMSF, and RVF) are extracted from five sensors as input to the proposed model; in Ref. 37, the frequency-domain analysis is used for each current signal (features are extracted from electrical signals) to extract a characteristic value corresponding to different load variation states, while the time-domain analysis is applied to extract values that allow tracking the evolution of the bearing and gear degradations; in Ref. 33, the time-domain analysis has been carried out by evaluating standard deviation, kurtosis, shape factor, and impulse factor, extracted from each sample of each sensor, while the frequency-domain features have been calculated from the corresponding spectrum sample of each sensor, defining 13 different statistical indexes.
Two other examples of “meshing” feature extraction domains are in Ref. 30 and Ref. 77, where all three domains (TD, FRD, and TFD) are examined separately to identify the ML algorithm with the greatest number of useful features.

- Feature selection. The sub-set of the extracted features could contain redundant information; therefore, to retain only the most meaningful information, according to the best ability to predict or diagnose faults of the SSCs, it is downsized through three types of feature selection techniques: filters (FI), wrappers (WR) and embedded methods (EMM).

FI is based on simply finding the best features’ sub-set according to the specified objective of diagnostics or prognostics through several statistical methods, such as correlation, time-series, chi-square test, and others; unlike the following two methods, FI does not use ML algorithms to perform the PHM task, therefore it yields a more versatile sub-set of features, which can then be employed by numerous ML algorithms. For example, Saravanakumar et al.66 use Spearman correlation to find how the extracted features are correlated with the actual RUL of bearings. Other examples of filter-based techniques are in Ref. 30 and Ref. 75.
WR is based on a specific ML algorithm that has to fit a given dataset. The evaluation criterion is simply linked to the classic ML performance metrics, including those described in sub-section 3.3. Wrappers are usually able to achieve better performances than FI-based techniques since they are optimized for a specific ML algorithm which is in turn tailored for a specific task. On the other hand, wrappers are biased toward the ML algorithm they are based on and therefore the resulting feature sub-set is not very versatile, that is, it will not be generally adequate for alternative ML techniques.7 For example, to automatically select and classify the most informative features, Marei et al.79 employ a CNN model, then using test accuracy to get feedback about the performance of the feature selection.
EMM incorporates the feature selection process into the ML algorithm, which is able to pull out the most representative features from the extracted features’ sub-set. Examples of the embedded approach can be found in Ref. 31, where an adaptive DBN optimized by the Nesterov momentum (NM) is used to extract features from rotating machinery and recognize bearing fault types and degrees simultaneously, or in Ref. 43 and Ref. 73, where
Figure 11. Machine Learning algorithms’ nature used in the 19 own-dataset papers.
Table 6. Characteristics of the 19 own-dataset papers.
Article | Industrial application | Objective | Data acquisition | Feature extraction method | Feature selection method | Health indicator | ML algorithm family | ML Technique(s) | KPI(s)
30 | Ball screw | Prognostics | Vibration; position of the screw | TD, FRD, TFD | FI | no | DL | GRU-PF | RMSE, MAE
31 | Bearing | Diagnostics | Vibration | FRD | EMM | no | DL | DBN | A
32 | Bearing | Diagnostics | Vibration | TFD | EMM | no | DL | CNN | A
33 | Bearing | Diagnostics | Vibration | TD, FRD | EMM | no | DL | CNN | A
67 | Bearing | Diagnostics | Vibration | TD | EMM | no | DL | PSO-CNN | A, (CMX)
43 | Bearing | Prognostics | Vibration | TD | EMM | no | DL | CNN | RMSE, CRA
52 | Bearing | Prognostics | Vibration | TD, FRD | EMM | no | DL | CLSTM | RMSE, MAE
54 | Bearing | Prognostics | Vibration | TD | EMM | no | DL | DBN | RMSE, MAPE
55 | Bearing | Prognostics | Vibration; temperature | TFD | EMM | no | DL | WSGRU | RMSE, MAE
64 | Bearing | Prognostics | Vibration; oil supply pressure; pressure on bearing; lubrication oil flow | TD, FRD | EMM | no | DL | GRUNN | MAE, RMSE
66 | Bearing; gear | Diagnostics | Current signals | TD, FRD | FI | yes | SL | ANFIS | A
70 | Gear | Diagnostics | Vibration | TD, FRD | EMM | no | DL | SWAE | A
10 | Gear | Prognostics | Vibration | TD | EMM | yes | DL-SL | DBN-FNN | MAPE, RMSE
73 | Gear | Prognostics | Vibration | TD | EMM | no | DL | CNN | A, R, (CMX)
74 | Hot strip mill’s roller | Prognostics | Strip temperature; strip thickness; strip width; strip flatness; roller gap | TD | EMM | yes | DL | LSTM, CNN, DBN | RMSE, MAE
75 | Milling machine’s cutting tool | Prognostics | Vibration; cutting force; acoustic signals | TD | FI | no | SL | FNN, RF, SVR | R2, MSE
76 | Milling machine’s tool | Prognostics | Vibration; cutting force | TD | EMM | no | SL | MRPRF | R2, MSE
77 | Milling machine’s tool | Prognostics | Vibration; cutting force | TD, FRD, TFD | EMM | yes | DL | BiLSTM | RMSE, MAE
79 | Milling machine’s cutting tool | Prognostics | Vibration; cutting force; acoustic signals | TD | WR | no | DL | CNN-LSTM | RMSE, MAE, R2
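To make the columns of Table 6 more concrete, the following Python sketch chains one possible combination of them: time-domain feature extraction, a filter-type (FI) selection based on Spearman correlation with the RUL, a train/test split, a simple regression model, and RMSE/MAE as KPIs. It is only a sketch under stated assumptions and does not reproduce the method of any reviewed paper: the synthetic vibration windows, the 0.8 correlation threshold, the alternating train/test split, and the one-feature linear model are all hypothetical choices made for illustration.

```python
import math
import random


def time_domain_features(window):
    """Extract a few classic time-domain features from one signal window."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "rms": rms,
        "kurtosis": sum((x - mean) ** 4 for x in window) / (n * var ** 2),
        "crest_factor": max(abs(x) for x in window) / rms,
    }


def ranks(values):
    """Rank positions of the values (ties are not expected in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = float(rank)
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Data acquisition (synthetic stand-in): noise whose amplitude grows with wear.
random.seed(0)
n_windows = 60
rul = [float(n_windows - 1 - k) for k in range(n_windows)]
windows = [
    [random.gauss(0.0, 1.0 + 2.0 * k / (n_windows - 1)) for _ in range(256)]
    for k in range(n_windows)
]

# Feature extraction (time domain).
features = [time_domain_features(w) for w in windows]

# Feature selection (filter): keep features strongly rank-correlated with RUL.
rho = {name: spearman([f[name] for f in features], rul) for name in features[0]}
selected = [name for name, r in rho.items() if abs(r) >= 0.8]
best = max(rho, key=lambda name: abs(rho[name]))

# ML model application: least-squares line RUL = a + b * feature (train split).
train_idx = list(range(0, n_windows, 2))
test_idx = list(range(1, n_windows, 2))
x_train = [features[k][best] for k in train_idx]
y_train = [rul[k] for k in train_idx]
mx, my = sum(x_train) / len(x_train), sum(y_train) / len(y_train)
b = (sum((x - mx) * (y - my) for x, y in zip(x_train, y_train))
     / sum((x - mx) ** 2 for x in x_train))
a = my - b * mx

# ML model evaluation: RMSE and MAE on the held-out windows.
errors = [rul[k] - (a + b * features[k][best]) for k in test_idx]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
mae = sum(abs(e) for e in errors) / len(errors)

print("selected features:", selected)
print("model feature:", best, "RMSE:", rmse, "MAE:", mae)
```

Because the filter step never consults the regression model, the retained features could be handed to any other learner, which is the versatility attributed to FI techniques in this section.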
the complex process of feature selection is compressed into a single deep learning algorithm (CNN) which is able to learn how to select features directly from the original vibration signals in order to predict RUL43 or diagnose faults.73 Other examples of the EMM are in Refs. [10,32,33,52,54,55,64,67,70–77].

- Health Indicator creation. Sometimes, the features sub-set is converted into a single health indicator (HI) through dimension reduction approaches before being passed as input to the ML algorithm. For instance, Deutsch et al.10 combine the 6 extracted TD-based features (RMS, energy operator RMS, FM0, narrowband kurtosis, amplitude modulation kurtosis, and frequency modulation RMS) into a 1-D HI to predict the RUL of a gear. Other examples of HI creation are in Refs. [66,74] and Ref. [77].
- ML model application. The selected features sub-set is divided into two sub-datasets (training and testing) used to train the SL or DL models and predict RUL or diagnose faults of the SSCs. Figure 11 shows the nature of the ML algorithms used by the authors of the 19 papers, classifying them by algorithm family (SL and DL) and PHM task type (Diagnostics and Prognostics).
  - Diagnostics. It directly refers to fault diagnosis of the SSCs. As shown in Figure 11, a hybrid SL method (ANFIS) together with 3 different DL methods (RBM, AE, and CNN) have been used in the 19 own-dataset papers to diagnose faults, each of them once except CNN, which is predominant (3 times in the 19 own-dataset papers).
  - Prognostics. It directly refers to RUL prediction of the SSCs. As shown in Figure 11, numerous ML methods have been used to carry out prognosis in the PHM field, such as three different SL methods (SVR, ANN, and RF), a hybrid DL/SL method (DBN-FNN), and four different DL methods, that is, a hybrid one (CNN-LSTM), RBM, CNN, and RNN; the latter is predominant, having been used 6 times in the 19 own-dataset papers.
- ML model evaluation. The final step is about evaluating the performance of the ML model for PHM through the KPIs already described in Table 3 and Table 4. It is not necessary to show the EM used in the 19 own-dataset papers, because the choice of KPIs for the evaluation of ML algorithms does not depend on the type of dataset used by the authors (own or online free datasets), but on the ML task (classification or regression). Therefore, Figure 7 already contains the necessary information to understand which EMs are used the most.

Table 6 summarizes the results described in this section about the 19 own-dataset papers regarding the framework shown in Figure 10; it is sorted by industrial application type and the paper’s objective.

Conclusions
An SLR about the PHM of industrial mechanical systems and equipment was carried out. The focus was on the most used ML algorithms in the diagnostics and prognostics field, and on the related KPIs employed for validating them. A literature search on the Scopus database led to 50 studies eligible for the above-mentioned analyses, 31 of which present common public datasets, while the remaining 19 present own datasets, i.e., datasets created specifically for the task addressed and the industrial application used by the authors. Concerning the family of ML algorithms, DL ones turn out to be the most used. Moreover, among the DL techniques, CNN and RNN are the most applied, while RF is predominant among SL techniques. Regarding the KPIs, Accuracy is by far the most used for ML classification tasks, while for ML regression tasks the frequency of the KPIs is more balanced among RMSE, MAE, Ai, and MAPE. Later, a further detailed analysis has been carried out with the aim of finding a common PHM framework which describes the step-by-step Diagnostics and Prognostics process carried out by the authors of the 19 own-dataset papers. This analysis aims to provide the reader with a common practice for the best choice of the ML algorithms and the related evaluation metrics for the manufacturing industry.
Overall, the analyses carried out in this paper show that research is moving towards the use of more recent DL techniques, rather than the classic SL algorithms, although DL methods are more complex to build and require the so-called “big Data,” not always available. On the other hand, the automated end-to-end feature extraction, together with an improved capacity of generalization, has led to a large-scale replacement of the traditional SL architectures with DL ones.
The main limitation of this SLR concerns its field of application, restricted to industrial mechanical systems and equipment; in fact, other industrial fields, such as aeronautical, chemical, robotics, and railway fields have been excluded. Therefore, future studies may fill this gap.

Author contributions
Lorenzo Polverino: Study conception and design, data collection, analysis and interpretation of results, writing – original draft
Raffaele Abbate: Study conception and design, Methodology, analysis and interpretation of results, Review & editing
Pasquale Manco: Methodology, Review & editing
Donato Perfetto: Methodology, Review & editing
Francesco Caputo: Funding acquisition, Supervision, Review
Roberto Macchiaroli: Funding acquisition, Supervision, Review
Mario Caterino: Study conception and design, analysis and interpretation of results, Review & editing.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the project DESIRE (DEsign Solutions for Industry 4 Ready processes) under the PON “Ricerca e Innovazione” 2014-2020 and FSC.

ORCID iD
Raffaele Abbate https://orcid.org/0000-0001-7885-3861

References
1. Abbate R, Caterino M, Fera M, et al. Maintenance digital twin using vibration data. Procedia Comput Sci 2022; 200: 546–555.
2. Manco P, Caterino M, Fera M, et al. Maintenance management for geographically distributed assets: a criticality-based approach. Reliab Eng Syst Saf 2022; 218: 108148, p. 12.
3. Calabrese F, Regattieri A, Bortolini M, et al. Predictive maintenance: a novel framework for a data-driven, semi-supervised, and partially online prognostic health management application in industries. Appl Sci 2021; 11(8): 3380.
4. Vogl GW, Weiss BA and Helu M. A review of diagnostic and prognostic capabilities and best practices for manufacturing. J Intell Manuf 2019; 30(1): 79–85.
5. Martin P, Jaroslav V and BednáJ. Predictive maintenance and intelligent sensors in smart factory: review. Sensors 2021; 21(4): 1470–1510.
6. Shafiee M and Maxim F. A proactive group maintenance policy for continuously monitored deteriorating systems: application to offshore wind turbines. Proc Inst Mech Eng O J Risk Reliab 2015; 229(5): 373–384.
7. Biggio L and Kastanis I. Prognostics and health management of industrial assets: current progress and road ahead. Frontiers in Artificial Intelligence 2020; 3: 578613, p. 24.
8. Zheng S, Ristovski K, Farahat A, et al. Long short-term memory network for remaining useful life estimation. In: IEEE International Conference on Prognostics and Health Management, 19–21 June 2017, Dallas, TX, USA: 8.
9. Lee J, Wu F, Zhao W, et al. Prognostics and health management design for rotary machinery systems - Reviews, methodology and applications. Mech Syst Signal Pro 2014; 42: 314–344.
10. Deutsch J and He D. Using deep learning-based approach to predict remaining useful life of rotating components. IEEE Trans Syst Man Cybern Syst 2018; 48(1): 11–20.
11. Mahmood S and Sunday O. Artificial intelligence in prognostic maintenance. In: Proceedings of the 29th European Safety and Reliability Conference (ESREL), 2019.
12. Jian C, Zhuohong Y, Xi L, et al. A Review of Data Driven Machinery Fault Diagnosis Using Machine Learning Algorithms. J Vib Eng Technol 2022; 10: 27.
13. Wu Z, Lin W and Ji Y. An integrated ensemble learning model for imbalanced fault diagnostics and prognostics. IEEE Access 2018; 6: 8394–8402.
14. Sekeroglu B, Abiyev R, Ilhan A, et al. Systematic literature review on machine learning and student performance prediction: critical gaps and possible remedies. Appl Sci 2021; 11(22): 23.
15. Caiado RGG, Dias RdF, Mattos LV, et al. Towards sustainable development through the perspective of eco-efficiency - A systematic literature review. J Clean Prod 2017; 165: 890–904.
16. Divya D, Marath B and Santosh Kumar M. Review of fault detection techniques for predictive maintenance. J Qual Maint Eng 2023; 29(2): 420–441.
17. Krechowicz A, Krechowicz M and Katarzyna P. Machine learning approaches to predict electricity production from renewable energy sources. Energies 2022; 15(23): 1–41.
18. Riahi Y, Saikouk T, Gunasekeran A, et al. Artificial intelligence applications in supply chain: a descriptive bibliometric analysis and future research directions. Expert Syst Appl 2021; 173(C): 1–19.
19. Bottani E and Murino T. Green supply chain management: a meta-analysis of recent reviews. In: IFIP international conference on advances in production management systems (APMS), 2021, pp. 632–640.
20. Caterino M, Rinaldi M, Fera M, et al. Research trends in clean, green and sustainable manufacturing: a bibliometric review. IFAC-Papers OnLine 2022; 55(10): 2425–2430.
21. European Commission. Germany: industrie 4.0. January 2017. [Online]. Available: https://ati.ec.europa.eu/sites/default/files/2020-06/DTM_Industrie4.0_DE.pdf. (Accessed 30 January 2023).
22. Mazzei D and Ramjattan R. Machine learning for industry 4.0: a systematic review using deep learning-based topic modelling. Sensors 2022; 22(22): 26.
23. Vakharia V, Gupta VK and Kankar PK. A multiscale permutation entropy based approach to select wavelet for fault diagnosis of ball bearings. J Vib Control 2015; 21(16): 3123–3131.
24. Singleton RK, Strangas EG and Aviyente S. The use of bearing currents and vibrations in lifetime estimation of bearings. IEEE Trans Ind Inform 2017; 13(3): 1301–1309.
25. Xu Y, Zhou Y, Sekula P, et al. Machine learning in construction: from shallow to deep learning. Dev Built Environ 2021; 6(13): 13.
26. Polverino L, Abbate R, Manco P, et al. Machine Learning Key Performance Indicators (KPIs) for Prognostics and Health Management (PHM) of mechanical systems and equipment: a systematic literature review. Conference Perf Manag 2022; 10.
27. Lesage J and Longoria RG. Mission feasibility assessment for mobile robotic systems operating in stochastic environments. J Dyn Sys Meas Control 2014; 137(3): 12.
28. Saxena A, Celaya J, Saha B, et al. Metrics for offline evaluation of prognostic performance. Int J Progn Health Manag 2010; 1(1): 2153–2648.
29. Nectoux P, Gouriveau R, Medjaher K, et al. PRONOSTIA: an experimental platform for bearings accelerated degradation tests. In: IEEE International Conference on Prognostics and Health Management; 2012: 1–8.
30. Deng Y, Shichang D, Shiyao J, et al. Prognostic study of ball screws by ensemble data-driven particle filters. J Manuf Sys 2020; 56: 359–372.
31. Xie J, Du G, Shen C, et al. An end-to-end model based on improved adaptive deep belief network and its application to bearing fault diagnosis. IEEE Access 2018; 6: 63584–63596.
32. Wang J, Mo Z, Zhang H, et al. A deep learning method for bearing fault diagnosis based on time-frequency image. IEEE Access 2019; 7: 42373–42383.
33. Yang F, Zhang W, Tao L, et al. Transfer learning strategies for deep learning-based PHM algorithms. Appl Sci 2020; 10(7): 19.
34. Wang Y, Ning D and Feng S. A novel capsule network based on wide convolution and multi-scale convolution for fault diagnosis. Appl Sci 2020; 10(10): 16.
35. Zhang L, Lin J, Shao H, et al. End-to-end unsupervised fault detection using a flow-based model. Reliab Eng Syst Saf 2021; 215: 107805, p. 14.
36. Zhai X, Qiao F, Ma Y, et al. A novel fault diagnosis method under dynamic working conditions based on a cnn with an adaptive learning rate. IEEE Trans Instrum Meas 2022; 48(1): 11–20.
37. Lyu P, Zhang K, Yu W, et al. A novel RSG-based intelligent bearing fault diagnosis method for motors in high-noise industrial environment. Advanced Engineering Informatics 2022; 52: 16.
38. You K, Qiu G and Gu Y. Rolling bearing fault diagnosis using hybrid neural network with principal component analysis. Sensors 2022; 22(22): 20.
39. Ruan D, Wang J, Yan J, et al. CNN parameter design based on fault signal analysis and its application in bearing fault diagnosis. Advanced Engineering Informatics 2023; 55: 12.
40. Liao L, Jin W and Pavel R. Enhanced restricted boltzmann machine with prognosability regularization for prognostics and health assessment. IEEE Trans Ind Electron 2016; 63(11): 1.
41. Yoo Y and Baek J-G. A novel image feature for the remaining useful lifetime prediction of bearings based on continuous wavelet transform and convolutional neural network. Appl Sci 2018; 8(7): 17.
43. Yang B, Liu R and Zio E. Remaining useful life prediction based on a double-convolutional neural network architecture. IEEE Trans Ind Electron 2019; 66(12): 9521–9530.
44. Su Y, Tao F, Jin J, et al. Failure prognosis of complex equipment with multistream deep recurrent neural network. J Comp Inform Sci Eng 2020; 20: 11.
45. Hur J-W and Akpudo UE. A deep learning approach to prognostics of rolling element bearings. Int J Integr Eng 2020; 12(3): 178–186.
46. Mao W, Chen J, Liang X, et al. A new online detection approach for rolling bearing incipient fault via self-adaptive deep feature matching. IEEE Trans Instrum Meas 2020; 69(2): 443–456.
47. Li X, Zhang W, Ma H, et al. Data alignments in machinery remaining useful life prediction using deep adversarial neural networks. Knowl Based Syst 2020; 197: 13.
48. Akpudo U and Hur J-W. A feature fusion-based prognostics approach for rolling element bearings. J Mecha Sci Technol 2020; 34(10): 4025–4035.
49. Ding Y, Jia M and Cao Y. Remaining useful life estimation under multiple operating conditions via deep subdomain adaptation. IEEE Trans Instrum Meas 2021; 70: 11.
50. Ding Y, Ding P and Jia M. A novel remaining useful life prediction method of rolling bearings based on deep transfer auto-encoder. IEEE Trans Instrum Meas 2021; 70: 12.
51. Xu Y, Li Z, Wang S, et al. A hybrid deep-learning model for fault diagnosis of rolling bearings. Meas J Int Meas Confed 2021; 169: 108502.
52. Ma M and Mao Z. Deep-convolution-based LSTM network for remaining useful life prediction. IEEE Trans Ind Inform 2021; 17(3): 1658–1667.
53. Shi J, Yu T, Goebel K, et al. Remaining useful life prediction of bearings using ensemble learning: the impact of diversity in base learners and features. J Comp Inform Sci Eng 2021; 21(2): 12.
54. Ma M, Sun C, Mao Z, et al. Ensemble deep learning with multi-objective optimization for prognosis of rotating machinery. ISA Trans 2021; 113: 166–174.
55. Ma M and Mao Z. Deep wavelet sequence-based gated recurrent units for the prognosis of rotating machinery. Struct Health Monit 2021; 20(4): 147592172093315.
56. Wen B, Xiao M, Wang X, et al. Data-driven remaining useful life prediction based on domain adaptation. PeerJ Comp Sci 2021; 7: 1–25.
57. Huang C-G, Huang H-Z, Li Y-F, et al. A novel deep convolutional neural network-bootstrap integrated method for RUL prediction of rolling bearing. J Manuf Syst 2021; 61: 757–772.
58. Luo J and Zhang X. Convolutional neural network based on attention mechanism and Bi-LSTM for bearing remaining life prediction. Appl Intell 2022; 52(1): 1076–1091.
42. Li X, Zhang W and Ding Q. Deep learning-based remaining 59. Bhavsar K, Vakharia V, Chaudhari R, et al. A comparative
useful life estimation of bearings using multi-scale feature study to predict bearing Degradation Using Discrete Wavelet
extraction. Reliab Eng Syst Saf 2019; 182: 208–218. Transform (DWT), Tabular Generative Adversarial Networks
(TGAN) and machine learning models. Machines 2022; for intelligent fault diagnosis. Knowl Based Syst 2023; 259:
10(3): 18. 110065, p. 13.
60. Berghout T, Mouss L-H, Bentrcia T, et al. A semi-supervised 70. Shao H, Lin J, Zhang L, et al. A novel approach of multi-
deep transfer learning approach for rolling-element bearing sensory fusion to collaborative fault diagnosis in mainte-
remaining useful life prediction. IEEE Trans Energy Conver nance. J Manuf Syst 2021; 74: 65–76.
2022; 37(2): 1200–1210. 71. Han T, Zhou T, Xiang Y, et al. Cross-machine intelligent fault
61. Wan S, Li X, Zhang Y, et al. Bearing remaining useful life diagnosis of gearbox based on deep learning and parameter
prediction with convolutional long short-term memory fusion transfer. Struct Control Health Monit 2022; 29(3): 21.
networks. Reliab Eng Syst Saf 2022; 224: 13. 72. Chen Y, Rao M, Chen X, et al. Experiment design and data
62. Ding Y, Ding P, Zhao X, et al. Transfer learning for remaining collection on the fixed-axis gearbox under time-varying op-
useful life prediction across operating conditions based on eration conditions technical report, D. o. M. E. Reliability
multisource domain adaptation. IEEE/ASME Trans Mecha- Research Lab, Ed., Edmonton, Alberta, 2018.
tron 2022; 27(5): 4143–4152. 73. Wu C, Jiang P, Ding C, et al. Intelligent fault diagnosis of
63. Li Y, Wang H, Li J, et al. A 2-D long short-term memory rotating machinery based on one-dimensional convolutional
fusion networks for bearing remaining useful life prediction. neural network. Comp Ind 2019; 108: 53–61.
IEEE Sens J 2022; 22(22): 21806–21815. 74. Jiao R, Peng K and Dong J. Remaining useful life pre-
64. Ding N, Li H, Xin Q, et al. Multi-source domain generalization diction for a roller in a hot strip mill based on deep re-
for degradation monitoring of journal bearings under unseen current neural networks. IEEE/CAA J Autom Sin 2021;
conditions. Reliab Eng Syst Saf 2023; 230: 108966, p. 21. 8(7): 1345–1353.
65. Ding Y, Jia Y, Cao Y, et al. Domain generalization via ad- 75. Wu D, Jennings C, Terpenny J, et al. A comparative study on
versarial out-domain augmentation for remaining useful life machine learning algorithms for smart manufacturing: tool wear
pred1iction of bearings under unseen conditions. Knowl prediction using random forests. J Manuf Sci Eng 2017; 139(7): 9.
Based Syst 2023; 261: 110199, p. 11. 76. Wu D, Jennings C, Terpenny J, et al. Cloud-based parallel
66. Soualhi M, Nguyen K, Soualhi A, et al. Health monitoring of machine learning for tool wear prediction. J Manuf Sci Eng
bearing and gear faults by using a new health indicator Trans ASME 2018; 140(4): 10.
extracted from current signals. Meas J Int Meas Confed 2019; 77. Huang C-G, Yin X, Huang H-Z, et al. An enhanced deep
141: 37–51. learning-based fusion prognostic method for RUL prediction.
67. Saravanakumar R, Krishnaraj N, Venkatraman S, et al. IEEE Trans Reliab 2020; 69(3): 1097–1109.
Hierarchical symbolic analysis and particle swarm optimi- 78. Li X, Jia X, Wang Y, et al. Industrial remaining useful life
zation based fault diagnosis model for rotating machineries prediction by partial observation using deep learning with
with deep neural networks. Meas J Int Meas Confed 2021; supervised attention. IEEE/ASME Trans Mechatron 2020;
171: 8. 25(5): 2241–2251.
68. Liu Z-H, Lu B-L, Wei H-L, et al. A stacked auto-encoder 79. Marei M and Li W. Cutting tool prognostics enabled by
based partial adversarial domain adaptation model for in- hybrid CNN-LSTM with transfer learning. Int J Adv Manuf
telligent fault diagnosis of rotating machines. IEEE Trans Ind Technol 2022; 118(3–4): 817–836.
Inform 2021; 17(10): 6798–6809. 80. Yang Y, Peng Z, Zang W, et al. Parameterised time-frequency
69. Su Z, Zhang J, Tang J, et al. A novel deep transfer learning analysis methods and their engineering applications: a review of
method with inter-domain decision discrepancy minimization recent advances. Mech Syst Signal Process 2019; 119: 182–221.



Date received: 30 March 2023; accepted: 20 June 2023

