
DOCTORAL THESIS

Machinery diagnostic techniques


for maintenance optimization

Juhamatti Saari

Operation and Maintenance Engineering


Machinery diagnostic techniques
for maintenance optimization

Juhamatti Saari
Dept. of Civil, Environmental, and Natural Resources Engineering
Division of Operation and Maintenance Engineering
Luleå University of Technology

Doctoral thesis

November 2018
Printed by Luleå University of Technology, Graphic Production 2018

ISSN 1402-1544
ISBN 978-91-7790-248-5 (print)
ISBN 978-91-7790-249-2 (pdf)
Luleå 2018
www.ltu.se
To my family and all the people who have believed in me
during my academic journey and have guided me here.

“You never change things by fighting the existing reality. To change
something, build a new model that makes the existing model obsolete.”
– R. Buckminster Fuller

“Long-term commitment to new learning and new philosophy is
required of any management that seeks transformation. The timid
and the fainthearted, and the people that expect quick results, are
doomed to disappointment.”
– W. Edwards Deming

“Am I tilting at windmills or is this a real giant?”
– The present author, in his capacity as a researcher
Acknowledgements
I would like to thank my supervisors, Professor Jan Lundberg, Dr
Matti Rantatalo and Dr Johan Odelius, for their support. Moreover,
I would like to thank Professors Uday Kumar and Diego Galar for
providing me with the opportunity to pursue my PhD studies. In
addition, special thanks are extended to Professor Sulo Lahdelma for
giving feedback and engaging in interesting discussions during the
research for my thesis, and to Allan Holmgren for his technical support
and all the miscellaneous topics we talked about while working in the
laboratory. SKF AB and Vinnova are gratefully acknowledged for
their financial contribution and support.
Abstract

One of the future challenges of machinery diagnostics and prognostics
is to prepare for the Internet of Things (IoT), where it is possible to
change and improve existing approaches drastically. An intensifying
application of the IoT will increase the use of embedded sensors and,
therefore, create a demand for diagnostic tools in which manual work
is minimized and the analysis is mainly handled by smart algorithms. The auto-
mated anomaly detection of large assets and their components with a
system of smart algorithms needs proper optimization. Foremost, it is
critical to avoid machinery failures, since they can interrupt produc-
tion, cause unbearable production losses for the business and, even
more, can put the lives of personnel in danger if a catastrophic failure
occurs. On the other hand, if all the components are repeatedly creating
false alarms, the verification of these incidents may be overwhelming.
This research studied how a one-class support vector machine (SVM)
can be optimized to function properly by taking the criticality of
the system into consideration. Another topic dealt with was how a
one-class SVM can be used for identifying the location of faults by
carefully selecting proper input features. Furthermore, a method was
tested where a variational Bayesian for Gaussian mixture algorithm
was used for pre-processing and separating the condition monitoring
data into operation mode classes. Later, these classes can be used for
reducing the time needed to acquire the condition monitoring data or
for giving more information as to how prognostic algorithms should be
selected. In addition, a method was tested which involved the use of
a Random Forest for feature selection and for creating indifference
to load and other similar external factors by comparing separate
classes with each other. Overall, the idea is that all of these tech-
niques can be combined and merged in order to improve machinery
diagnostic tools and prepare for the coming era of digitalization.
Sammanfattning
Framtidens utmaningar för maskindiagnostik och livslängdsprognoser
är att förbereda dessa för användning av Internet of Things (IoT) och
därmed utnyttja de möjligheter till bättre diagnostik och prognoser
som då ges. Framtidens IoT kommer att kräva övervakningsverk-
tyg där manuell arbetskraft minimeras och där detta huvudsakli-
gen hanteras av smarta algoritmer. Anomalitetsdetektering av stora
anläggningar och dess komponenter med hjälp av smarta algoritmsys-
tem kräver korrekt optimering för att fungera på avsett sätt, framför
allt för att undvika dyrbara maskinfel. Detta är betydelsefullt efter-
som maskinfel kan störa produktionen och därmed orsaka stora pro-
duktionsförluster och dessutom innebära fara för personalen. Å an-
dra sidan, om alla komponenter skapar upprepade falska larm, kan
verifieringen av dessa incidenter bli övermäktig. I den här avhan-
dlingen presenteras studier av hur Support Vektor Maskiner (SVM)
kan optimeras genom att ställa in dessa algoritmer på ett korrekt sätt
genom att ta hänsyn till systemets kritikalitet. Vidare har studier
genomförts som visar hur dessa resultat kan användas för att iden-
tifiera felens lokalisering genom att noggrant välja rätt inmatnings-
funktioner. Dessutom har en metod testats där algoritmer av typen
Variational Bayesian Gaussian Mixture (VBGM) har använts för förbehandling och
separering av tillståndsövervakningsdata i olika operationsmodklasser.
I framtiden kan dessa klasser användas för att förkorta tiden för att
samla in dessa tillståndsövervakningsdata eller för att ge mer informa-
tion om hur prognostiska algoritmer ska väljas. Slutligen testades en
metod för funktionsval genom att använda tekniken Random Forest
och därmed göra det möjligt att ta hänsyn till andra externa faktorer
genom att jämföra och separera klasser av data. Sammanfattningsvis
är tanken att alla dessa tekniker kan kombineras med varandra för
att förbättra maskinsystemens diagnostiska verktyg och på detta sätt
förbereda sig för den kommande epoken av digitalisering.
List of appended papers

This doctoral thesis is composed of the following appended publications:

Paper A Saari, J., & Odelius, J. (2017). Optimizing the novelty detection
algorithm using a criticality index for rotating machine fault de-
tection on a production line. Submitted and under review.
Paper B Saari, J., Strömbergsson, D., Lundberg, J., & Thomson, A. (2017).
Detection and identification of windmill bearing faults using a
one-class support vector machine (SVM). Submitted and under
review.
Paper C Saari, J., & Odelius, J. (2018). Detecting operation regimes us-
ing unsupervised clustering with infected group labelling to im-
prove machine diagnostics and prognostics. Operations Research
Perspectives, 5, 232-244.
Paper D Saari, J., Lundberg, J., Odelius, J., & Rantatalo, M. (2018).
Selection of features for fault diagnosis on rotating machines us-
ing random forest and wavelet analysis. Insight-Non-Destructive
Testing and Condition Monitoring, 60(8), 434-442.
Paper E Saari, J., Odelius, J., Lundberg, J., & Rantatalo, M. (2015).
Using wavelet transform analysis and the support vector machine
to detect angular misalignment of a rubber coupling. In MCMD
and MPMM 2015 conference (pp. 117-126).
Contents

Contents vii

Nomenclature ix

I Part I 1

1 INTRODUCTION 3
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Future of condition monitoring . . . . . . . . . . . . . . . 6
1.1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 Linkage of research questions and appended papers . . . . 8
1.2.2 Contribution of authors . . . . . . . . . . . . . . . . . . . 9
1.2.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 STATE OF THE ART 11


2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Data acquisition and data processing . . . . . . . . . . . . . . . . 13
2.2.1 Vibration analysis . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1.1 Time-frequency analysis . . . . . . . . . . . . . . 16
2.2.1.2 Envelope analysis . . . . . . . . . . . . . . . . . . 18
2.2.2 Feature selection and dimensional reduction . . . . . . . . 18
2.3 Maintenance decision making . . . . . . . . . . . . . . . . . . . . 20
2.3.1 Detection of faults and system behaviour . . . . . . . . . . 20
2.3.2 Operating context . . . . . . . . . . . . . . . . . . . . . . . 23


2.3.3 Fault propagation . . . . . . . . . . . . . . . . . . . . . . . 25


2.3.4 Soft issues concerning PHM . . . . . . . . . . . . . . . . . 27
2.4 Concluding remarks concerning the state of the art . . . . . . . . 30
2.5 Research Gaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3 RESEARCH METHODS 33
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.1 Test rig design . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.2 Measurement equipment . . . . . . . . . . . . . . . . . . . 36
3.2.3 Misalignment of the output shaft . . . . . . . . . . . . . . 37
3.2.4 Several seeded component faults . . . . . . . . . . . . . . . 38
3.3 Test rig for studying bearing defects . . . . . . . . . . . . . . . . . 40
3.4 Case studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.1 Vibration signals collected from an underground loader . . 41
3.4.2 Bearing test data used in the Paper A . . . . . . . . . . . 43
3.4.3 Damaged wind turbine bearing . . . . . . . . . . . . . . . 44

4 Summary of the appended papers 45


4.1 Paper A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Paper B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Paper C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 Paper D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5 Paper E . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5 RESULTS AND DISCUSSIONS 49


5.1 Results from additional bearing degradation tests in the laboratory 49
5.2 Results and discussion related to RQ 1 . . . . . . . . . . . . . . . 53
5.2.1 Comparison of results obtained using the laboratory tests . 59
5.2.1.1 Detection models using OCSVM . . . . . . . . . 60
5.3 Results and discussion related to RQ 2 . . . . . . . . . . . . . . . 70
5.4 Results and discussion related to RQ 3 . . . . . . . . . . . . . . . 74
5.5 Results and discussion related to RQ 4 . . . . . . . . . . . . . . . 79


6 CONCLUSIONS AND FUTURE WORK 85


6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

REFERENCES 91

II Part II 103

Paper A 105

Paper B 121

Paper C 143

Paper D 157

Paper E 167

Nomenclature

ANN Artificial neural network

AS Anomaly score

BPFI Ball pass frequency inner (inner ring)

BPFO Ball pass frequency outer (outer ring)

CBM Condition based maintenance

CM Condition monitoring

CWT Continuous wavelet transform

DWT Discrete wavelet transform

EM algorithm Expectation-maximization algorithm

FDI Fault detection and identification

ICA Independent component analysis

IVHM Integrated vehicle health management

IoT Internet of things

ISHM Integrated system health management

LHD Load, haul, dump machine

MEMS Micro-electro-mechanical system

NN Neural network

OCSVM One-class SVM

PCA Principal component analysis

PEID Product embedded information devices

PHM Prognostics and health management

RBF Radial basis function

RFID Radio frequency identification

ROC Receiver operating characteristic

RUL Remaining useful life

SCADA Supervisory control and data acquisition

SK Spectral kurtosis

STFT Short-time Fourier transform

SVC Support vector clustering

SVDD Support vector domain description

SVM Support vector machine

VBGM Variational Bayesian for Gaussian mixture model

VHM Vehicle health management

WT Wavelet transform

WVD Wigner-Ville distribution


Part I

Chapter 1

INTRODUCTION

1.1 Background
During the machine age, maintenance evolved each time new innovations occurred.
The first steps of the machine age were taken when the Industrial Revolution
began in the 18th century with the invention of the steam engine. This
changed production by replacing human labour with machines and more efficient
means of production were achieved. At this stage, the focus in maintenance was
primarily on corrective maintenance performed to restore failed systems to an
operational state [Kobbacy and Murthy, 2008].
Fixing things when they broke remained the main focus of maintenance until
the next rapid progress took place during the technological revolution, which
created innovations such as mass production and production lines just before
World War 1. These changes were driven by the use of electricity, which made the
production of steel and other products easier. This progress led to a realization
of the importance of preventive maintenance, which was fully appreciated during
the Second World War [Kobbacy and Murthy, 2008]. Furthermore, during this
era, work efficiency was improved and the productivity of factories was increased.
However, these developments created a demand for skilled workers who knew how
to operate and maintain the new machines of the era.
Preventive maintenance was mainly carried out at a fixed time based on recom-
mendations, and the machines were repaired or replaced according to a schedule.


The Third Industrial Revolution is said to have begun after nuclear energy was
harnessed, and this also changed how maintenance was performed. This revolution
witnessed the rise of electronics with the transistor and microprocessor. Inno-
vations in telecommunications and computers led to a point where a high level
of automation in production was possible due to programmable logic controllers
and robots. Due to these innovations, it was possible to implement preventive
maintenance strategies based on the condition of the system. Using these strate-
gies, sensor readings could be compared to previous values in order to decide
when a value had crossed some predefined threshold and maintenance actions
could be performed to restore the machine to its original state. This predictive
maintenance approach was later defined as condition-based maintenance. Even
nowadays, the vast majority of improvements seen in condition-based mainte-
nance are due to the increased capacity of computers, which has made on-site
data processing possible and has improved data processing methods as well as
sensor technology.
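The threshold logic described above can be sketched in a few lines; this is a minimal illustration only, with synthetic readings and a conventional three-sigma limit as assumed choices, not a method prescribed by any particular standard:

```python
import numpy as np

def threshold_alarm(readings, baseline, k=3.0):
    """Flag readings exceeding the baseline mean by k standard deviations.

    Mirrors classic condition-based maintenance practice: compare a
    monitored value (e.g. overall RMS vibration) against a limit
    derived from healthy-state data.
    """
    limit = np.mean(baseline) + k * np.std(baseline)
    return [r > limit for r in readings], limit

# Hypothetical RMS vibration values: a healthy baseline and new readings.
baseline = [0.9, 1.0, 1.1, 1.0, 0.95]
alarms, limit = threshold_alarm([1.0, 1.05, 2.5], baseline)
```

Here only the last reading crosses the computed limit and would trigger a maintenance action.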
It is possible that new maintenance approaches will soon emerge. The
Internet and digitalization have led to a point where the Fourth Industrial Revo-
lution, Industry 4.0, can be said to be beginning [Wollschlaeger et al., 2017]. The
driving forces behind it are the Internet of Things (IoT) and machine-to-machine
communication (M2M), where machines can communicate with other machines
or with people effortlessly. These smart technologies will inevitably change the
way in which these systems can be managed, since the state of systems and their
components can be monitored and estimated in real time with minimal human
intervention [Liao et al., 2017]. Once this is achieved, it may be possible not only
to ease the workload of manual labour, but also to delegate to machines some
of the tasks which currently require skilled workers.
Since machines have become complex and difficult to comprehend as separate
entities, the word asset is sometimes used instead. An asset is defined according
to ISO 55000 [2014] as an item or entity that has potential or actual value to an
organization. It seems that the next future step in maintenance will be to provide
an integrated solution where maintenance is involved in every step of life cycle
asset management, from design to disposal, and maintenance is fundamentally
considered to be an integral part of the core of the business, similarly to oper-
ation and design. With the help of supercomputing analytical tools and cloud
computing platforms, there is a potential for creating new innovations in mainte-
nance and the vision of having self-repairable machines is one step closer, even for
complex systems [Umeda et al., 1995]. At the moment these innovations related
to the IoT are available, but are not fully utilized. One possible reason for this
is that no one really knows what type of factories will be in operation in the
future and, therefore, long-term commitment to the creation of new platforms
and protocols is seldom achieved.
For this reason, one of the challenges in maintenance is how to minimize the
overall cost by taking into account all the tangible matters, such as the labour
cost and time, as well as intangible matters such as the reputation of the com-
pany. This is made especially difficult by the fact that the core business of many
successful companies is not related to physical assets any more. Before this mini-
mization of costs can be achieved and maintenance approaches can be optimized,
two aspects of the monitored machines should be known more accurately, i.e. the
current state (health) of the machines and their estimated future state. The art
of determining these states is known as machinery diagnostics and prognostics.
According to ISO 13372 [2012], diagnostics is the examination of symptoms and
syndromes to determine the nature of faults or failures, whereas prognostics is
the analysis of the symptoms of faults to predict the future condition and residual
life within the design parameters. Here, symptoms means indications perceived,
by means of human observations and measurements, of the presence of one or
more faults [ISO 13372, 2012].
As can be seen from the above definitions, there are many aspects which will
affect machinery diagnosis and prognosis and these tasks are not as straightfor-
ward as they may first seem. Although there is a fundamental scientific reasoning
behind each value of the estimated state, terms or phrases such as “symptoms”
or “within the design parameters” make diagnosis and prognosis much harder
than they may first seem. In order to simplify the definition, instead of using the
wording “... to predict the future condition and residual life within the design
parameters”, it has become more popular to speak in terms of estimating the
remaining useful life (RUL) [Saxena et al., 2008; Si et al., 2011; Sikorska et al.,
2011]. Nevertheless, this has the same meaning as the definition of prognostics
provided by ISO 13372 [2012].

1.1.1 Future of condition monitoring


It is expected that the IoT will transform traditional machines into smart ma-
chines by exploiting their underlying technologies, such as ubiquitous and perva-
sive computing, embedded devices, communication technologies, sensor networks,
and Internet protocols and applications [Al-Fuqaha et al., 2015].
Even though machines have become and are becoming smarter with the in-
creasing use of sensors embedded into assets and with the increasing amount
of data being collected, fundamental problems of condition monitoring are still
present. According to Randall [2011], one of these problems is to distinguish when
changes seen in the measured values are due to changes in the sources (alleged
faulty components) creating the stresses seen in the vibration signals or when
they are due to changes in the transmission path or other unknown factors.
To overcome this problem, one of the researched topics in condition monitoring
is the integration of the sensor and the associated electronics into the structure [Holm-
Hansen and Gao, 1997]. This approach can be exemplified by the replacement
of conventional bearings with smart bearings, which would mean that the sensor
would be located at the heart of the machine and that the transmission path would
be reduced to a minimum since the signals would not have to travel through
structures. It would also remove the responsibility for installing sensors from
the end user. Another benefit would be an identical sensing configuration for
all installations (i.e. identical transmission path effects) [Holm-Hansen and Gao,
1997]. This may expand the capability of detecting several different types of
failures, since even low-cost machines would have a sensor and these sensors
would be in close proximity to the loading element. On top of that, information
could be shared with components monitored nearby and this information could
be used for estimating the total health of the component inside a larger system.
Although the fundamental problems of condition monitoring have not changed,
methods and algorithms have changed, mainly due to the rapid development of
computers discussed above. This development has created many algorithms which
are referred to as machine learning algorithms and whose aim is to give computer
systems the ability to make predictions based on training data. These algorithms
can be used, for instance, to state when the machine is not healthy (e.g. classifica-
tion), to estimate the future state (e.g. regression) or to divide the collected data
into meaningful groups (data mining clustering). Machine learning has created
a solid framework for asset health monitoring and many monitoring techniques
have been applied successfully over the past two decades [Farrar and Worden,
2012]. However, there are still problems within condition monitoring which need
to be solved before new maintenance approaches can be obtained.

1.1.2 Problem statement


Many state-of-the-art condition monitoring techniques and technologies are de-
signed and tailored for a specific system and do not perform well when adapted
for other systems without extensive manual work, which is usually quite time
consuming and thus too expensive. This has created the problem that, although
the number of tools available for diagnostics and prognostics is increasing, they
are not integrated in industrial applications, and hence many of these techniques
require industrial validation.
A challenge in machinery diagnostics and prognostics is to be able to optimize
each sub-task (e.g. fault detection or identification) in such a manner that the
state of the system and the estimated future state are known as well as possible,
before reducing the amount of information available. Therefore, more parallel
thinking is needed in order to determine if some of the sub-task problems can be
solved by using other sources of information.
A key problem is to utilize the potential of machine learning methods in
combination with the existing condition monitoring techniques, or to find new
approaches which surpass the old methods completely.

1.2 Research questions


The overall purpose of the present research has been to develop generalized fault
detection methods that can easily be adjusted for different machinery in varying
operating conditions. An additional aim has been that these methods should be
able to facilitate prognostic approaches by allowing a smoother transition from
knowing the state of the machinery to estimating its future state.
Based on these issues, the following research questions (RQs) have been for-
mulated.

RQ 1 How can rotating machinery fault detection be improved by considering
how critical the machinery is and its usage?

RQ 2 How can operation changes be detected using vibration data together with
a data mining clustering algorithm for machinery diagnostic and prognostic
purposes?

RQ 3 How can less ad hoc condition indicators be extracted from vibration signals
by using failure data?

RQ 4 How does one develop a framework where a smooth transition from diag-
nostic to prognostic approaches is possible by taking into account issues
such as the amount of collected data, the current and predicted future use
of the system, and known physical relations between the wear process and
load?

1.2.1 Linkage of research questions and appended papers


The main linkage between the research questions and the appended papers is
shown in Table 1.1.

Table 1.1: Linkage of the research questions (RQs), thesis and appended
papers (A-E)

       A B C D E Thesis
RQ 1   X X X
RQ 2   X
RQ 3   X X
RQ 4   X X X X X X


1.2.2 Contribution of authors


The contribution of each author to the appended papers is shown in Table 1.2,
divided into the following activities:

1. topic formulation,

2. test rig design & measurement setup,

3. measurements,

4. data analysis,

5. drafting the paper,

6. revision of important intellectual content,

7. final approval for submission.

Table 1.2: Contribution of the authors of the appended papers (A-E).

A B C D E
Saari, J. 1,4,5,6,7 1,4,5,6,7 1-7 1-7 1-7
Lundberg, J. 5,6,7 1,2,5,6,7 2,5,6,7
Odelius, J. 4,5,6,7 5,6,7 1,5,6,7 5,6,7
Rantatalo, M. 1,5,6,7 5,6,7
Strömbergsson, D. 4,5,6,7
Thomson, A. 3,4,5,6,7

1.2.3 Limitations
The research has been limited to the study of methods which are relevant for
data-driven models by excluding the usage of physics-based simulation models.
In all the methods studied, it is assumed that the bottleneck of the proposed
architecture is not constituted by wireless communications or how smart sensors
will be powered in the future. All the algorithms are assumed to work in a
centralized manner where the computing power is not restricting the available
methods and where the signals (e.g. raw vibration signals and speed, temperature
and load signals) can be transferred into the cloud from the smart bearing without
any data loss. Furthermore, the present thesis is mostly limited to analyzing
vibration data, although the proposed techniques can be applied to other
data types as well.

Chapter 2

STATE OF THE ART

2.1 Background
By systematically studying the literature, it has been revealed that there are two
prevalent methodologies for system health monitoring: prognostics and health
management (PHM) and condition-based maintenance (CBM).
According to Kalgren et al. [2006] PHM is “a health management approach
utilizing measurements, models, and software to perform incipient fault detection,
condition assessment, and failure progression prediction”.
Ben-Daya et al. [2016] explained that the CBM approach consists of the
following steps:

1. data collection using sensors and condition monitoring techniques;

2. assessing the item's state if it has not failed, or detecting the fault if it has failed;

3. predicting the future condition;

4. taking an appropriate maintenance action.
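The four steps above can be sketched as a minimal loop over a single health indicator. The alarm limits and the linear-trend extrapolation are illustrative placeholder assumptions, not methods prescribed by Ben-Daya et al. [2016]:

```python
import numpy as np

def cbm_cycle(history, new_reading, alarm_limit=2.0, failure_limit=3.0):
    """One pass through the CBM loop for one health indicator."""
    # 1. Data collection: append the latest sensor reading.
    history = list(history) + [new_reading]

    # 2. State assessment / fault detection against limits.
    state = "failed" if new_reading >= failure_limit else (
        "degraded" if new_reading >= alarm_limit else "healthy")

    # 3. Prediction: extrapolate a linear trend to the failure limit
    #    (a crude stand-in for a real prognostic model).
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, 1)
    steps_to_failure = (failure_limit - new_reading) / slope if slope > 0 else float("inf")

    # 4. Decision: map the assessed state to a maintenance action.
    action = {"healthy": "continue monitoring",
              "degraded": "plan maintenance",
              "failed": "repair now"}[state]
    return state, steps_to_failure, action

state, eta, action = cbm_cycle([1.0, 1.2, 1.4, 1.6], 1.8)
```

With the steadily trending data above, the indicator is still below the alarm limit, and the linear trend suggests roughly six more sampling intervals before the failure limit is reached.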


As can be seen from these two definitions, there are no big differences be-
tween these two methodologies and they can be considered to be analogous. In
fact, occasionally PHM is referred to as merely a further development of the
CBM concept, since the term PHM is a more recent term than CBM. Further-
more, in the literature, sometimes the terms VHM/IVHM (vehicle health manage-
ment/integrated vehicle health management) or integrated systems health man-
agement (ISHM) are used instead of PHM. However, these terms have the same
meaning and the difference between them is only their area of application, i.e.
earthbound vehicles or airborne vehicles (e.g. space vehicles) [Schwabacher and
Goebel, 2007]. A brief background to these different terms is given in the study
by Baroth et al. [2001].
Since the term CBM is older than PHM, more search hits are found by using
this term; see Table 2.1. Moreover, when only recent hits (published
during the past decade) are considered, more research still relates to CBM,
although the term PHM has clearly gained popularity.

Table 2.1: Number of hits (patents and citations excluded) for terms relating to
fault diagnostic and prognostic techniques.

                                                Google Scholar        ScienceDirect
Term                              Abbreviation  All      2007-2017    All    2007-2017
Prognostics and health management PHM           10700    9910         515    511
Condition-based maintenance       CBM           32600    17400        772    552

In order to cover the state of the art in the fields of CBM and PHM fully,
the subject was divided into several categories using the classification seen in
the flowchart in Figure 2.1: data acquisition, data processing and maintenance
decision making. The steps shown in Figure 2.1 were mainly adopted in studies
performed by Jardine et al. [2006], Ben-Daya et al. [2016] and Heng et al. [2009].
In the review by Jardine et al. [2006], a greater emphasis was placed on data
processing and the maintenance decision-making process and its sub-categories.
A book written by Ben-Daya et al. [2016] includes more practical examples of
how to carry out the CBM steps. Heng et al. [2009] provided more details on
several health prediction approaches and their challenges. Figure 2.1 illustrates
how all the steps together affect the end result, which subsequently may lead all
the way to estimating the RUL. Furthermore, the relation between risk and the
RUL is shown in order to highlight the fact that there is a connection between
these two. The connection between risk and the RUL is mainly handled through
the selection of the detection threshold and the confidence interval. However,
more practical decisions such as selecting appropriate sensors and tools will also
affect the estimated RUL.
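The link between risk and the detection threshold mentioned above can be made explicit with a small expected-cost calculation. The anomaly scores and cost figures below are synthetic, and weighting missed faults more heavily for critical machinery follows the spirit of the criticality-based tuning studied in this thesis rather than a prescribed formula:

```python
import numpy as np

def pick_threshold(healthy_scores, faulty_scores, c_fp, c_fn):
    """Choose the alarm threshold that minimises expected cost.

    c_fp: cost of a false alarm; c_fn: cost of a missed fault
    (set high for critical machinery). Scores above the threshold
    raise an alarm.
    """
    candidates = np.unique(np.concatenate([healthy_scores, faulty_scores]))
    best_t, best_cost = None, np.inf
    for t in candidates:
        fpr = np.mean(healthy_scores > t)   # false alarm rate
        fnr = np.mean(faulty_scores <= t)   # missed detection rate
        cost = c_fp * fpr + c_fn * fnr
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Synthetic anomaly scores for healthy and faulty conditions.
healthy = np.array([0.1, 0.2, 0.3, 0.4])
faulty = np.array([0.5, 0.6, 0.7, 0.8])
t_critical = pick_threshold(healthy, faulty, c_fp=1.0, c_fn=10.0)
```

When the two score populations overlap, raising c_fn pushes the chosen threshold down, trading more false alarms for earlier detection on critical assets.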

[Figure 2.1 (flowchart, image not reproduced): the chart links DATA ACQUISITION
(condition monitoring data such as vibration, acoustic, oil analysis, temperature
and pressure signals, together with event data such as breakdowns and overhauls),
DATA PROCESSING (data cleaning; data types: values, waveforms, images and
multidimensional data) and MAINTENANCE DECISION MAKING (diagnostics and
prognostics using statistical, physics-based and machine learning approaches),
which together with cost, probability and consequences determine RISK and the
estimated RUL, under the overall business strategy and management.]

Figure 2.1: Flowchart for the condition monitoring procedure and its relation to
business strategy.

2.2 Data acquisition and data processing


Data collection is the collection and organization of items of data to produce
meaningful information. The goal of data processing in condition monitoring is
to extract and select features, which are measurable properties or characteristics
of the phenomenon being observed. In PHM, there are three tasks which mainly
determine which features should be measured: stating the current condition (used
for diagnostic purposes), measuring the degradation process (used for prognostics)
and estimating how the system will alter owing to changes not related to faults
or failures (used for improving the diagnostic and prognostic techniques) [Jardine
et al., 2006; Malhi and Gao, 2004; Randall, 2011].
In the past, data processing was typically performed on clean datasets from
well-known and limited sources [Hashem et al., 2015]. However, with the advent
of Industry 4.0, this has started to change, and more unstructured datasets will
be in use, and obtaining high-quality data from vast collections of data sources
is and will be a challenge [Hashem et al., 2015]. It is reasonable to assume that
the use of this type of dataset, coming from various places, will most likely also
become more popular for PHM, and therefore the use of tools for processing the
data will increase and become more important. In fact, it is estimated that 80%
of data analysis is devoted to the process of cleaning and preparing the data [Dasu
and Johnson, 2003], a share which could be reduced to some extent if unstructured
data could be exploited more than it is nowadays.
The data collected in a CBM programme can be divided into two main
types, i.e. event data and condition monitoring data [Jardine et al., 2006]. Event
data include information on what has happened, the cause of the event and
what has been done in connection with the event. Although this information
can be useful for PHM purposes, it is rarely utilized since it requires manual
work and, therefore, is much more prone to errors than condition monitoring
data [Jardine et al., 2006]. Hodkiewicz and Ho [2016] studied how these data
could be utilized more than they are now. They developed a rule-based data
cleansing tool for extracting useful information from work orders. As a result,
it was found that work orders can be used to generate useful information for
failure analysis [Hodkiewicz and Ho, 2016]. It is reasonable to estimate that in
the future, the IoT will make it possible to combine these historical event data
records with condition monitoring (CM) data by stating the time when certain
failures have occurred. A database can be created where the data are labelled
accordingly without extensive manual work. This can be beneficial for many
CM approaches, since for instance more accurate supervised machine learning
approaches can be used instead of solely relying on unsupervised methods.

2.2.1 Vibration analysis


The predominant and most widely used CM data type is vibration data, since
many failure modes can cause an increase in the machine vibrations [Randall,
2011]. Vibration analysis has accordingly proven to be the leading method for estimating
the current and future state of the machine. However, raw vibration signals (for
time domain analysis) can rarely be used as they are, since faults are visible only
in certain frequency components. Moreover, these frequency components may
also be masked with other vibration sources coming from structural resonance
or other machine components. In fact, many of the advances made in vibration
analysis concern improved pre-processing steps which make fault-relevant
information visible earlier. The objective of all pre-processing
tools is to extract features which can be used to assess the status of the system.
In the present thesis, features are defined as individual measurable properties or
characteristics of the phenomenon being observed. The features extracted make
up a dataset given as an input for the machine learning algorithm. However, in
some references these are named as metrics, indicators, key indicators, variables
or parameters.
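As a minimal illustration of such features, the sketch below computes a few widely used time-domain statistics (RMS, peak, crest factor, skewness and kurtosis) from a raw vibration signal. The function name and the synthetic signals are the present author's illustrations, assuming only NumPy is available; they are not taken from any of the cited studies.

```python
import numpy as np

def time_domain_features(x):
    """Compute a few widely used time-domain vibration features.

    Illustrative sketch only; the names follow common usage, not the
    feature set of any specific study cited in the text.
    """
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mean = np.mean(x)
    std = np.std(x)
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,  # impulsiveness indicator
        # non-excess kurtosis: 4th standardized moment, ~3 for Gaussian noise
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
    }

# Example: Gaussian noise vs. noise with repetitive impacts (a crude fault proxy)
rng = np.random.default_rng(0)
healthy = rng.standard_normal(10000)
faulty = healthy.copy()
faulty[::500] += 8.0  # 20 periodic impulses superimposed on the noise
f_h = time_domain_features(healthy)
f_f = time_domain_features(faulty)
```

An impulsive (faulty) signal raises the kurtosis and crest factor well above the Gaussian baseline, which is why these statistics are popular incipient-fault indicators.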
Three main approaches for extracting features from vibration signals are as
follows: envelope analysis [McFadden and Smith, 1984], time-frequency anal-
ysis [Cohen, 1989; Neild et al., 2003] and cepstrum analysis [Oppenheim and
Schafer, 2004; Randall, 1982].
Comparing these three approaches, envelope analysis and time-frequency anal-
ysis are used more for detecting many types of failure modes, while cepstrum
analysis is found to be effective for detecting faults where the signal contains
families of harmonics and sidebands [Randall, 2011], such as gear faults. How-
ever, as stated by Randall and Antoni [2011], bearing defects can be considered
to be pseudo-cyclostationary, and hence envelope analysis is more suited for de-
tecting bearing faults, even though cepstrum analysis can be used for detecting
bearing faults in some situations. For these reasons, the basis for extracting
features for monitoring critical components of a system favors either envelope
analysis or time-frequency analysis. Although the optimum method for extracting
fault-specific features would be to use all the methods simultaneously, this
would create other types of problems. These problems relate to the processing
of the data and where it should be performed. If raw signals cannot be transferred
from distant locations to central computers (an off-site, centralized process) due to
limitations in the computer networks, the processing has to take place on-site (in a
de-centralized process). However, this will require more computational power and can be
too much of a burden for embedded smart sensors, since many signal processing
methods have to be used simultaneously. Therefore, in reality, mostly only one
signal processing method or a few such methods are used for detecting most of
the probable failure modes in a system.

2.2.1.1 Time-frequency analysis

Time-frequency analysis studies a signal in both the time and the frequency
domain simultaneously by using various time-frequency representations. It is
more suitable than frequency analysis when the speed varies, e.g. for mov-
ing vehicles and wind turbines. It can also reveal small impacts based on the
continuous signal, which may be impossible for pure frequency analysis. Three
known time-frequency analysis methods used for condition monitoring are the
Wigner-Ville distribution (WVD), the short-time Fourier transform (STFT) and
the wavelet transform (WT). The WVD is the oldest known time-frequency trans-
form method. Wigner [1932] applied it to quantum mechanics at the beginning
of the 1930s, and in the 1940s, Ville [1948] applied the transform to signal pro-
cessing, which explains the origin of its name. Since then it has been used to
diagnose many types of machine faults. For example, Staszewski et al. [1997]
used it to detect gearbox faults. Although they concluded that the WVD offers
superior frequency-domain resolution, it can produce high-energy cross-term coefficients
in the transform plane even though no corresponding signal components actually exist. This
interference term can be filtered out, but then some of the excellent frequency resolution
will be lost. When such filtering of the plane is included, the WVD is referred to
as the pseudo-Wigner-Ville distribution. A comparative study of the use of this
method for CM is to be found in an article written by Baydar and Ball [2001].
The STFT, or windowed Fourier transform, uses a windowing function to
separate a small section from the signal (a short time) and produces a snapshot
of the signal. Overlapping each analysed segment and summing them will lead
to an image (named a spectrogram) which can represent how the signal will
vary in time. The windowing function can have different shapes, as in standard
frequency analysis. By choosing the optimum windowing function, the detection
performance can be improved. However, the benefits can be quite minimal, and
mostly a Gaussian windowing function is used, since it has been proven to work in
many cases [Wang and McFadden, 1993]. The STFT has also been widely used for
detecting faults in rotating machines [Bartelmus and Zimroz, 2009; Cohen, 1989].
Unfortunately, owing to the fixed window length used when calculating the spectrogram,
it is not possible to obtain an exact time-frequency representation; i.e. the more
precisely a frequency component is known, the less precisely its time instance can be
determined. This is a manifestation of the famous Heisenberg uncertainty principle [Allen
and Mills, 1993].
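A minimal spectrogram sketch along these lines, assuming SciPy is available; the sampling rate, Gaussian window parameters and test signal (a tone whose frequency steps, as a crude stand-in for a speed change in a rotating machine) are illustrative choices, not taken from the cited studies:

```python
import numpy as np
from scipy import signal

fs = 10_000  # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
# Tone that steps from 500 Hz to 2 kHz halfway through the record
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))

# STFT magnitude (spectrogram) with a Gaussian window, as is common practice.
# nperseg fixes the window length and hence the time/frequency trade-off.
f, tt, Sxx = signal.spectrogram(
    x, fs=fs, window=("gaussian", 64), nperseg=512, noverlap=256
)

# Dominant frequency in each time slice of the spectrogram
dominant = f[np.argmax(Sxx, axis=0)]
```

Early time slices peak near 500 Hz and late ones near 2 kHz, but each estimate is only resolved to within one frequency bin (fs/nperseg ≈ 20 Hz here) and one window length in time, which is the uncertainty trade-off described above.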
This fixed resolution pitfall of the STFT is one of the reasons why the wavelet
transform was developed [Peng and Chu, 2004]. Wavelets are functions whose
translations and dilations can be used for expansions of square-integrable func-
tions. Instead of having a fixed window shape, as is the case in the STFT, the idea
was devised of using the same basic filter shape (mother wavelet) and shrinking
its time domain extension. This leads to a time-scale representation that can have
a good time resolution for high-frequency events and a good frequency resolution
for low-frequency events. For this reason, the wavelet transform is a promis-
ing tool which can be used for many types of machine faults [Peng and Chu,
2004]. It can detect transient signals whose origin is, for example, a broken gear
tooth [Bafroui and Ohadi, 2014; Fan and Zuo, 2006] or signals which are longer
in duration and are caused, for example, by worn gears [Bafroui and Ohadi, 2014]
or worn bearings [Li and Ma, 1997].
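The idea can be illustrated with a bare-bones complex Morlet CWT, assuming only NumPy; this is a non-optimized sketch (boundary effects ignored), and the transient parameters are invented for illustration rather than taken from the cited studies:

```python
import numpy as np

def morlet_cwt(x, scales, w0=6.0):
    """Continuous wavelet transform with a complex Morlet mother wavelet.

    Illustrative sketch: each row of the output is the signal correlated
    with the mother wavelet dilated to one scale.
    """
    n = len(x)
    out = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        m = int(min(10 * s, n))           # wavelet support of ~10 scales
        tau = np.arange(-m // 2, m // 2)
        # Morlet: complex exponential under a Gaussian envelope
        psi = np.exp(1j * w0 * tau / s) * np.exp(-0.5 * (tau / s) ** 2)
        psi /= np.sqrt(s)                 # scale normalization
        out[i] = np.convolve(x, np.conj(psi[::-1]), mode="same")
    return out

# A short transient buried in noise -- e.g. a broken-tooth impact
rng = np.random.default_rng(1)
n = 2048
x = 0.3 * rng.standard_normal(n)
x[1000:1032] += 3.0 * np.sin(2 * np.pi * 0.2 * np.arange(32)) * np.hanning(32)

scales = np.arange(2, 40)
coeffs = np.abs(morlet_cwt(x, scales))
# The transient shows up as a localized energy ridge in time
t_peak = np.argmax(coeffs.max(axis=0))
```

The energy ridge is localized both in time (around the transient) and in scale (around the scale whose centre frequency matches the burst), which is exactly the multi-resolution property described above.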
Even though wavelets have been studied with a view to optimizing and au-
tomating feature extraction for problems such as gear and bearing faults [Rafiee
et al., 2010; Yan et al., 2014], there is still a problem with wavelets concerning
the selection process for the mother wavelet, which should resemble the signal
created by the fault [Randall, 2011]. Therefore selecting the mother wavelet is
not an easy task since, even if the fault can be simulated using physical models
in order to know the shape of the transient signal, the transformation path is still
unknown and may create a different type of signal once it is acquired.

2.2.1.2 Envelope analysis

The main reason for developing envelope analysis was to shift frequency analysis
from the very high range of resonant carrier frequencies, to the much lower range
of fault frequencies, so that the analysis could be performed with good resolution.
In a study by Randall and Antoni [2011], envelope analysis was declared to be an
effective method, especially for detecting incipient bearing defects. In the same
study it was also said that the main problem with this technique is improper
selection of the envelope window frequency and window bandwidth, and the best
selection is usually achieved only by manual selection for each case. Moreover,
Randall and Antoni [2011] proposed the use of a method where spectral kurtosis
(SK) is applied to select the band where the biggest dB change is seen when
comparing the current condition to the original condition, to make the selection
more automated. The use of SK was investigated further in other studies [Antoni,
2007; Lei et al., 2011; Tang et al., 2016]. Based on these studies, it can be stated
that the disadvantage of this approach is that, if the bandwidth is selected using
SK, noise coming from other components or unwanted system resonances may
lead to a sub-optimal bandwidth selection. Nevertheless, it seems that
techniques based on SK have been shown to be a most promising way to detect
incipient faults as early as possible [Wang et al., 2016].
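A minimal envelope-analysis sketch of the procedure described above (band-pass filtering around an assumed resonance band, then the envelope spectrum via the Hilbert transform), applied to a simulated bearing-fault signal. All frequencies and filter settings are hypothetical choices; the band is selected by hand here, which is precisely the step that spectral kurtosis aims to automate.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 20_000                 # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
fault_freq = 87.0           # hypothetical fault (impact repetition) frequency
carrier = 4_000.0           # structural resonance excited by each impact

# Simulate repetitive impacts: decaying resonance bursts at the fault rate
x = 0.2 * np.random.default_rng(2).standard_normal(len(t))
for t0 in np.arange(0.0, 1.0, 1 / fault_freq):
    idx = (t >= t0) & (t < t0 + 0.005)
    x[idx] += np.exp(-800 * (t[idx] - t0)) * np.sin(2 * np.pi * carrier * (t[idx] - t0))

# 1) Band-pass around the carrier (resonance) band, chosen by hand here
b, a = butter(4, [3000, 5000], btype="band", fs=fs)
xb = filtfilt(b, a, x)

# 2) Envelope via the Hilbert transform, then the envelope spectrum
env = np.abs(hilbert(xb))
spec = np.abs(np.fft.rfft(env - env.mean()))
freqs = np.fft.rfftfreq(len(env), 1 / fs)
mask = (freqs > 10) & (freqs < 500)
detected = freqs[mask][np.argmax(spec[mask])]
```

The envelope spectrum shows its strongest line at the low fault frequency (plus harmonics), even though the raw spectrum is dominated by the high-frequency resonance; this frequency shift is the whole point of the method.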

2.2.2 Feature selection and dimensional reduction


In contrast to the traditional method of using one feature or one fault indicator,
“black box” approaches use multiple features and have been attracting increasing atten-
tion [Olden and Jackson, 2002]. These machine learning approaches utilise multi-
ple input features together to make a prediction as to whether faults are present
or not. However, this has created the problem that, when the dimensionality of
the data increases, the volume of the space increases so fast that the available
data become sparse and reduce the performance of many algorithms [Domingos,
2012].
For this reason, techniques have been studied which can be used to reduce the
sparsity, select only those features which are more relevant, and discard others
[Martin-del Campo and Sandin, 2017]. This type of feature engineering is said
to be the holy grail of machine learning, but is difficult because some irrelevant
features may in fact be relevant when combined with other features [Domingos,
2012; Guyon and Elisseeff, 2003].
Feature selection methods can be divided into two groups: methods which
use supervised techniques and methods which use unsupervised ones [Guyon and
Elisseeff, 2003; Malhi and Gao, 2004].
Principal component analysis (PCA) is perhaps one of the simplest multivari-
ate analysis techniques for reducing the dimensionality. PCA can be seen as an
operation which can reveal the internal structure of the data in a way that best
explains the variance in the data [Malhi and Gao, 2004; Zuo et al., 2005].
Malhi and Gao [2004] used PCA to reduce features when classifying seeded
inner and outer race defects of a bearing; they conducted one test where an initial
scratch on the outer raceway of a bearing was continually tested until the entire
raceway was damaged and practically non-functional. As a result, they found out
that PCA was able to improve the classification of a feedforward neural network
as well as the k-means clustering. Malhi and Gao [2004] also showed that, in studies
performed with seeded faults, some of the features formed very uniform groups of
data, and therefore the task was somewhat easier than it would be
in real case studies. The limitation of PCA is that low-variance components
are considered as noise and will be discarded as useless. However, in some cases
these features together may play an important role, especially when the difference
between the faulty case and the healthy case is small. For these cases, methods
such as independent component analysis (ICA) can achieve better results [Zuo
et al., 2005]. Zuo et al. [2005] used ICA as a way to reduce features extracted
using wavelet analysis and proved that ICA produced better results than PCA
concerning the detection of gear tooth failure. Their test was performed using a
dataset collected from a SpectraQuest test rig with seeded faults.
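A compact PCA sketch via the singular value decomposition, illustrating dimensionality reduction on synthetic features driven by a few latent factors; the data and dimensions are invented for illustration, assuming only NumPy:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD: project data onto the directions of largest variance.

    Returns the projected data and the fraction of variance explained
    by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return Xc @ Vt[:n_components].T, explained[:n_components]

# 20 noisy features that are really driven by only 2 latent factors
rng = np.random.default_rng(3)
latent = rng.standard_normal((500, 2))
mixing = rng.standard_normal((2, 20))
X = latent @ mixing + 0.05 * rng.standard_normal((500, 20))

Z, explained = pca(X, n_components=2)
```

Here two components capture almost all of the variance, which is the favourable case for PCA; as noted above, when the fault signature lives in the discarded low-variance directions, this reduction throws away exactly the relevant information.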
Random Forests can be used to rank the importance of variables in a regression
or classification problem in a natural way. A technique was described in Breiman's
original paper [Breiman, 2001], which is based on measuring the variable
importance in a dataset by fitting a Random Forest to the data. During the
fitting process, the out-of-bag error for each data point is recorded and
averaged over the forest; errors on an independent test set can be substituted if
bagging is not used during training. The values of each feature are then permuted
among the out-of-bag data and the resulting increase in error is measured; features
which produce a large increase in this score are ranked as more important than
features which produce a small one.
The advantage of this approach over approaches such as PCA or ICA is that
the feature elimination is more explicit and irrelevant features will be eliminated
in a more robust way [Menze et al., 2009; Saeys et al., 2008]. However, this
is achieved at the expense of using ensemble methods where the predictions of
several classifiers will be combined. Nevertheless, these types of feature selection
methods can be useful for cases when, for instance, the fault detection features
are not optimal and post-analysis is needed.
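The ranking idea can be sketched with scikit-learn's Random Forest; note that `feature_importances_` is the library's impurity-based measure, used here as a convenient stand-in for Breiman's out-of-bag permutation importance, and the data are synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
n = 400
# One informative "feature" (e.g. an envelope-spectrum amplitude at a fault
# frequency) plus nine pure-noise features
informative = np.concatenate([rng.normal(0, 1, n // 2), rng.normal(3, 1, n // 2)])
X = np.column_stack([informative] + [rng.standard_normal(n) for _ in range(9)])
y = np.array([0] * (n // 2) + [1] * (n // 2))  # 0 = healthy, 1 = faulty

forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
# Rank features from most to least important
ranking = np.argsort(forest.feature_importances_)[::-1]
```

The informative feature is ranked first and the out-of-bag score gives an internal estimate of the classification accuracy without a separate test set, which is the practical appeal of this approach for post-analysis of candidate features.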

2.3 Maintenance decision making


Maintenance decision making consists of several processes, such as planning, man-
agement, policy selection, efficiency analysis, life-cycle management and outsourc-
ing [Ruschel et al., 2017]. The nature of these processes is very dependent on the
branch of industry in question. Nevertheless, they all are related to the business
risk and, once the business risk is minimized, all the decision-making processes
will be optimized accordingly. Consequently, there is a link between the RUL and
the risk, as illustrated in Figure 2.1. In the following sections, we mainly discuss
maintenance decision making involving issues directly related to PHM.

2.3.1 Detection of faults and system behaviour


As was the case for feature selection, fault detection algorithms can be classi-
fied as being either unsupervised or supervised. In fact, many feature selection
algorithms may be the same as the algorithms used for fault detection, and the
only difference between the two types of algorithms may be their objective func-
tion, with the latter type classifying a new data instance as being either normal
or anomalous. Note that anomaly here means either a fault or an instance of
unexpected behaviour which differs from previous cases.


A popular choice for a fault detection algorithm is a supervised classification
algorithm [Gao et al., 2002; Konar and Chattopadhyay, 2011; Kotsiantis et al.,
2006; Li et al., 2000; Paya et al., 1997; Soualhi et al., 2015].
Among the popular supervised classifiers, two classifiers stand out, namely
classifiers based on neural networks (NNs) and classifiers based on support vector
machines (SVMs). According to Widodo and Yang [2007], the SVM can solve
the learning problem with a smaller number of samples than are required using
an ANN. Therefore, the SVM can have a high accuracy and good generalization
for a smaller number of samples, which can be advantageous for machinery fault
diagnostics and prognostics since it is difficult to obtain sufficient fault samples
in practice.
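As a minimal illustration of this kind of supervised classifier, the sketch below trains an RBF-kernel SVM on three synthetic condition classes, assuming scikit-learn; the classes and their two-dimensional features are hypothetical stand-ins, not data from any of the cited studies:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Three synthetic condition classes (e.g. healthy / inner-race fault /
# outer-race fault), each a Gaussian blob in a 2-D feature space
rng = np.random.default_rng(5)
centers = np.array([[0, 0], [3, 0], [0, 3]])
X = np.vstack([c + 0.5 * rng.standard_normal((100, 2)) for c in centers])
y = np.repeat([0, 1, 2], 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
# RBF-kernel SVM; scikit-learn's SVC handles the multi-class case
# internally with a one-vs-one strategy
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

With well-separated synthetic classes the accuracy is near perfect; the practical difficulties discussed below (kernel and parameter selection, and the need for labelled fault data) are hidden by this idealized setup.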
The SVM is a computational learning method based on statistical learning
theory (SLT), developed by Vapnik [1995]. An SVM algorithm constructs and
searches the separating hyperplanes with a maximum margin by transforming
the problem description into a dual space by means of a Lagrangian. Gao et al.
[2002] used an SVM and a wavelet packet transform for the fault diagnosis of
valves in three-cylinder reciprocating pumps and found that the polynomial and
RBF kernel produce similar results. In addition, they found that a one-vs-one
method was more accurate than a one-vs-all method where faulty-condition data
were compared with data consisting of a combination of both healthy-condition
data and faulty-condition data. In addition, the SVM approach showed better
effectiveness and robustness than using ANNs. Konar and Chattopadhyay [2011]
used an SVM to detect bearing faults in an induction motor using a continu-
ous wavelet transform (CWT) to extract relevant features. Two mother wavelets
were tested, namely the Morlet and Daubechies 10 wavelets. The results were
excellent and this approach was considered to be a better alternative than using
an ANN with DWT-based features. Soualhi et al. [2015] used an SVM for de-
tecting bearing faults by dividing the training data into three classes: good-state
data, medium-state data and degraded state data. The kernel function in their
experiment was a polynomial of degree 3. The features used as an input were
obtained using the Hilbert-Huang transform. Although their technique was found
to be promising, it is still unclear how to select the appropriate kernel function
and how the degraded states should be selected when applying the method to other
bearing types. Furthermore, a generic disadvantage of the SVM is that it requires
labelled training data (historical data) as a prerequisite for its use. A similar
multi-class SVM method was used by Li et al. [2013]. In their study, the input
features were obtained by extracting the following time domain features: the
range, mean, absolute average, mean square value, RMS, variance and standard
deviation, skewness and kurtosis. The selection of suitable input features was
performed using an improved ant colony optimization (IACO) algorithm. This
technique was compared with the cross-validation and genetic algorithm (GA)
methods. The technique was found to be feasible, but the training data were
obtained using a test rig with seeded faults, with the speed and load remain-
ing constant during the experiment. Nourmohammadzadeh and Hartmann [2015]
used SVMs which were enhanced using genetic algorithms for classifying centrifu-
gal pump faults. In total, five faults were present and each fault was trained using
a one-vs-all protocol. Four different kernel functions were used, namely the poly-
nomial, RBF, linear and quadratic functions. They also compared the results
with the results for three other methods, namely the ANN, KNN and decision
trees. Comparing all the methods studied, it was concluded that the SVM with
a Gaussian kernel had the best accuracy. Moreover, in their study, the SVM was
found to be superior to the ANN in most of the cases. As said previously, the
requirement for historical data is perhaps the biggest limiting factor for the SVM
and other similar approaches. Therefore, some approaches have been developed
where similar methods are used but the training is performed using only nominal
data [Schölkopf et al., 2001; Tax and Duin, 1999; Yu, 2013].
Yu [2013] developed a probabilistic approach based on support vector cluster-
ing (SVC) and implemented it to detect and classify faults in a complex chemical
process. He conducted a case study on Tennessee Eastman chemical process data.
The aim of Yu's method was to separate healthy operations from faulty opera-
tions by forming separated clusters. Therefore, the method can be considered
to perform the separation of data by using clustering. Although Yu considered
the method to be unsupervised, he recommended using operator feedback to val-
idate the results. Ground-breaking work on the use of unsupervised SVMs was
performed by Tax and Duin [1999], who developed a method called support vector
domain description (SVDD), and by Schölkopf et al. [2001], who developed
a method named the one-class SVM (OCSVM or ν-SVM). In SVDD, one
determines the minimal volume of the hypersphere enclosing most of the target
data (nominal data). New instances outside the boundaries of the describing hy-
persphere are then classified as outliers. SVDD was used for detecting helicopter
drivetrain faults by Camerini et al. [2018] in a study where health indicators were
based on the anomaly score (AS), explained in a study by the Civil Aviation
Authority [2011]. The results obtained by Camerini et al. [2018] indicated that it
is possible to obtain improved information from vibration data from health and
usage monitoring systems by fusing traditional condition indicators into a single
AS using data description models. However, these authors also stated that the
optimal selection of model parameters is still an open issue. In their study, the
selection was carried out using the approach proposed by Tax and Duin [2001],
where a class of artificial outliers is generated in order to estimate the volume
of the classifier. These issues are elaborated on further by Xiao et al. [2014],
who stated that parameter selection methods used for the binary-class SVM do
not apply to the one-class SVM. These authors divided methods for parameter
selection into two categories: indirect and direct methods. The indirect methods
are independent of OCSVM models, only utilizing the data distribution of one
class. The direct methods select the optimal parameter by using feedback from
trained OCSVM models to tune the parameter, and afterwards train new models
based on the tuned parameter [Xiao et al., 2014].
There are studies where one-class SVM has been implemented for fault de-
tection [Fernández-Francos et al., 2013; Martı́nez-Rego et al., 2011; Yin et al.,
2014]. Although good results were obtained in all these studies, a shortcoming is
that the authors do not explain how the selection of the tuning parameter was
performed. As explained by Shin et al. [2005], this is not a trivial problem and
should be emphasized more in these types of studies.
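A minimal one-class SVM sketch in this spirit, trained on nominal data only, assuming scikit-learn; the synthetic data are invented, and the value of the ν parameter is an arbitrary illustrative choice, which is precisely the non-trivial tuning problem noted above:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(6)
# Train on nominal (healthy) data only -- no fault labels required
X_train = rng.standard_normal((500, 2))
# nu upper-bounds the fraction of training points treated as outliers
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

X_healthy = rng.standard_normal((200, 2))
X_faulty = rng.standard_normal((200, 2)) + 5.0  # shifted (anomalous) distribution

# predict() returns +1 for inliers and -1 for outliers
healthy_flag_rate = np.mean(ocsvm.predict(X_healthy) == -1)
fault_detect_rate = np.mean(ocsvm.predict(X_faulty) == -1)
```

The clearly shifted "faulty" points are flagged almost without exception, while only a small fraction of new healthy points fall outside the learned boundary; in practice the false-alarm rate is governed by the ν and kernel-width choices discussed above.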

2.3.2 Operating context


The aim of machinery condition monitoring is not only to detect faults. It can
also be used for estimating a system's behavioural changes. These changes may
occur due to operational changes, e.g. changes in the produced process
output or seasonal temperature variations. In some cases there is a direct
method for measuring these changes, for instance the use of in-motion rail scales
to measure the difference between a fully loaded and an unloaded cargo train [Lin
et al., 2016]. However, in many situations the cause and effect of the operation
mode and the measured parameters are not that obvious. Heng et al. [2009] stated
that the failure behaviour of each unit is a function of the changes in the work
schedule and the operating environment, as well as other duty parameters, and
thus the current condition of an operating unit needs to be monitored online. As
stated by Timusk et al. [2008], different operation modes affect the performance
of the fault detection algorithms. Therefore, there is a connection between the
detection of operational changes and fault detection.
Overall, there are two approaches for detecting operational changes with ma-
chine learning techniques: using classification algorithms [Hanafizadeh et al.,
2015] and using clustering algorithms [Iverson, 2004; Löwe et al., 2016]. The
difference between these two methods is that in classification, a new object is clas-
sified into a pre-defined class, whereas in clustering, a set of objects are grouped
together based on the relationship between the objects. Data mining cluster-
ing could be used to distil the data collected from various sources into a sensible
form. Shin and Jun [2015] stated that it has been difficult to achieve effectiveness
in maintenance operations because there is no information visibility during the
product usage period. They also expected that emerging technologies such as ra-
dio frequency identification (RFID), micro-electro-mechanical systems (MEMS),
wireless tele-communication, supervisory control and data acquisition (SCADA),
and product-embedded information devices (PEID) will very soon be used for
gathering and monitoring the status data of products during their usage period.
Therefore, this area of PHM will most likely be attracting more interest in many
applications in the future. However, one of the future challenges in collecting
process data in order to assess the operating context may be the problem of ob-
taining a good time and space correlation. This is because many of these data
types cannot be synchronized with other data, such as vibration data, since the
time stamps of process events do not match when the sampling frequency of the process
data is only a fraction of that of the CM data and the data are acquired using measurement systems (e.g.
SCADA) designed by other vendors than those who designed the systems pro-
ducing the CM data. In fact, time and space correlation is an important property
of data from the IoT [Chen et al., 2014].
Perhaps the most common unsupervised clustering method is the k-means
algorithm [Jain, 2010]. This algorithm is initialized by picking k initial cluster
points and allocating all the data points to the closest one. Another popular
cluster algorithm proven to be successful in many situations is the expectation-
maximization algorithm [Chamroukhi et al., 2011; Dempster et al., 1977; Yang
et al., 2012].
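A minimal clustering sketch for separating two hypothetical operating modes (e.g. idle vs. full load) seen through two process variables; the variables, mode centres and values are invented, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two hypothetical operating modes observed through two process
# variables (say, shaft speed and power draw); values are illustrative
rng = np.random.default_rng(7)
mode_a = rng.normal([10, 1], 0.5, (300, 2))   # idle
mode_b = rng.normal([50, 8], 0.5, (300, 2))   # full load
X = np.vstack([mode_a, mode_b])

# k-means: pick k centroids, assign each point to the closest one,
# and iterate until the assignment stabilizes
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

Once the operating modes are separated in this way, condition-monitoring features can be normalized or modelled per mode, which addresses the effect of operating context on fault detection noted above.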
In aeronautics, studies have been published whose aim has been not only
to detect the point of operational change, for the sake of detecting faults, but
also to find ways to estimate the time until the system cannot operate and per-
form the required function [Schumann et al., 2015; Suarez et al., 2004]. Suarez
et al. [2004] tracked real-time onboard damage accumulation using a model called
PHM/ALPS. The goal was to evaluate the current mission profile (operating con-
ditions) using past mission profiles (historical data) to demonstrate the indepen-
dent life prediction capability.
Camerini et al. [2018] explained how the method called anomaly score can im-
prove the feature-level data fusion, adopting an approach where, after initial data
cleaning, CM data were separated into several categories based on the operational
context.

2.3.3 Fault propagation


Roemer et al. [2006] summarized in their study the range of possible prognostic
approaches as a function of their applicability to various systems and their rel-
ative implementation cost, as can be seen in Figure 2.2. According to Roemer
et al. [2006] a major challenge in machinery prognostics is how to manage the
inherent uncertainty. In their study they also emphasized that accurate and pre-
cise prognosis demands good probabilistic models of the fault growth supported
by sufficient statistical samples of failure data to assist in training, validating and
fine-tuning prognostic algorithms. However, not all prognostic methods require
probabilistic models of the fault growth.
Figure 2.2: Prognosis technical approaches. [Roemer et al., 2006]

Although many prognostic approaches first define the current state (diagnosis)
and then estimate the future state (RUL calculations), there are also some tech-
niques which solely estimate the end-of-life state. These techniques are based on
reliability models where the average usage life has been calculated without con-
sidering the individual differences or operational differences. Many of these types
of approaches do not require any physical knowledge or fault growth models.
Sometimes these models are referred to as empirical models.
Perhaps the best-known and most widely used standardized method is the L10 bearing
life rating method [ISO 281, 2007; Palmgren, 1945]. It is a semi-empirical
method which estimates the running time by which 10 % of a population of similar
bearings will have failed under laboratory conditions. Even though these types of
prognostic methods are very limited and cannot be used to pinpoint the exact time of failure, they
have at least two main advantages. Firstly, they can be used when defining
the appropriate time for scheduled maintenance. They can act as a reference
measure to determine whether scheduled maintenance tasks are under- or over-
designed. Secondly, there are some advanced prognostic models which are based
on these models and which add individual information about the operational or
other conditions, for instance the stress level [Wesley Hines and Usynin, 2008].
However, they can rarely be used in online prognosis since there is no measure
which takes into account the current state of the system. On the other hand, L10
can be a good choice as the starting point for more advanced bearing prognostic
methods.
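The basic rating-life relation underlying L10 can be sketched as follows; this is illustrative only, with invented bearing numbers, and it omits the adjustment factors (for reliability, lubrication and contamination) that the standard adds:

```python
def l10_hours(C, P, rpm, exponent=3.0):
    """Basic rating life L10 in operating hours.

    L10 = (C/P)^p million revolutions, where C is the dynamic load
    rating, P the equivalent dynamic load, and p = 3 for ball bearings
    (10/3 for roller bearings). Illustrative sketch only; the standard
    adds life-adjustment factors that are omitted here.
    """
    million_revs = (C / P) ** exponent
    return million_revs * 1e6 / (60.0 * rpm)

# Hypothetical ball bearing: C = 30 kN, load P = 5 kN, running at 1500 rpm
hours = l10_hours(C=30e3, P=5e3, rpm=1500)
```

Because the load enters with a cubic (or higher) exponent, a modest load reduction extends the nominal life considerably, which is why such reference calculations are useful when checking whether scheduled maintenance intervals are under- or over-designed.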
In Wesley Hines and Usynin [2008], prognostic techniques were also catego-
rized into three groups, as follows: experience-based prognostic techniques,
techniques using evolutionary or trending models and model-based prognostic tech-
niques. Nevertheless, the methods listed in each group were rather similar, the
first being the type 1 techniques and the last being the type 3 techniques. The
type 1 techniques belong to a group where the failure modes are not separated
and only historical time-to-failure data are used for modelling the average remain-
ing life under average usage conditions. For some applications these techniques
may be adequate, but several conditions must be fulfilled before it is meaningful to apply
them. Some of these factors, according to the present author, are:

• A comprehensive historical failure database is needed.

• The data should only concern similar fault modes which can be treated as
a homogeneous group.

• The operational usage should be similar and extreme conditions should have
the same distribution over the system's lifetime.

• In addition, the system should remain similar (no extensive modifications or
updates) so that the historical data remain valid.

The type 2 techniques are similar to the type 1 techniques, with the exception
that the type 2 models use prior observations of explanatory variables such as
stress or temperature and the response variable (e.g. the failure time) to predict
the life of a component [Wesley Hines and Usynin, 2008]. The type 3 approaches,
or effect-based prognostics, use a degradation measure to form a prognostic pre-
diction [Wesley Hines and Usynin, 2008]. These approaches quantify the proba-
bility of failure at a given time by measuring the system degradation using direct
or indirect variables, for instance using physics based degradation models.

2.3.4 Soft issues concerning PHM


Saha et al. [2009] stated that for the end-of-life predictions of critical systems,
it is imperative to establish faith in the prognostic systems before incorporating
their predictions into the decision-making process. The inherent uncertainties of
prognostic systems are the aggregate of many unknowns and can result in con-
siderable prediction variability [Roemer et al., 2006], and therefore PHM models
always have some prediction boundaries.
In fault diagnostics, the option is either to aim at an earlier and more sensitive
time of fault detection or to aim at a later and less sensitive detection with
lower uncertainty. The former option may give more reaction time for making a
correct decision, but may increase the expenses due to false alarms and the cost
of verifying the alleged faults. The latter option, on the other hand, will most
likely give fewer false alarms, but allow less time to make the correct diagnostic
decision and may lead to unexpected failures if the time between detection and
failure is too short or the incipient fault is not detected at all before the
total failure.
This trade-off between these two cases in terms of prediction algorithms is
often plotted as a ROC (receiver operating characteristic) space (see Figure 2.3),
and the challenge is to find a reasonable, rational and desirable balance between
sensitivity and specificity [Swets et al., 2000]. Sensitivity is a measure of how well the model is able to predict when the system is in a faulty state, and specificity is a measure of how well the model is able to predict when the system is in a healthy state. The dark area in the ROC space represents models which make worse predictions overall than those that would be obtained with a random flip of an unbiased coin. However, the least informative models lie on the diagonal line, since the entropy (in the information-theoretic sense) of these models is at its maximum and they contain no information at all. According to a study by Fawcett [2006], ROC curves are insensitive to changes in the class distribution, and therefore optimized models can be identified using the ROC convex hull, as explained in that study.
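The sensitivity/specificity trade-off can be made concrete with a small sketch that computes a ROC-space point for a given alarm threshold. The anomaly scores and labels below are invented for illustration only:

```python
import numpy as np

# Anomaly scores for labelled examples (1 = faulty, 0 = healthy); illustrative values.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1,   1,   0,   1,   1,    0,   0,   1,   0,   0])

def roc_point(threshold):
    """Return (1 - specificity, sensitivity) for a given alarm threshold."""
    alarm = scores >= threshold
    tp = np.sum(alarm & (labels == 1))
    fn = np.sum(~alarm & (labels == 1))
    fp = np.sum(alarm & (labels == 0))
    tn = np.sum(~alarm & (labels == 0))
    sensitivity = tp / (tp + fn)   # how well faults are caught
    specificity = tn / (tn + fp)   # how well healthy states are recognised
    return 1 - specificity, sensitivity

# A liberal (sensitive) threshold versus a conservative one:
print(roc_point(0.35))  # liberal: high sensitivity, more false alarms
print(roc_point(0.75))  # conservative: few false alarms, lower sensitivity
```

Sweeping the threshold over all score values traces out the full ROC curve of the detector.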
In the ROC space diagram (see Figure 2.3), there is one area called the conservative area and one called the liberal area. In general, conservative models are able to detect when the system is healthy with higher certainty than when it is faulty. Liberal models should be able to generate an alarm with higher certainty when the system is no longer healthy, at the cost of reducing the certainty of knowing when the system is healthy. In fault detection, the liberal area is for


[Figure: ROC space with sensitivity on the vertical axis and 1 − specificity on the horizontal axis; points A, B and C marked, with the liberal and conservative regions indicated.]

Figure 2.3: Illustration of a ROC curve. Point A represents the ideal classifier, which is able to classify all the data points correctly with zero false positives or negatives. At point B, the classifier is able to classify correctly when the machine is faulty, with zero false negatives. At point C, the classifier is able to classify correctly when the machine is healthy, with zero false positives.

systems for which it is important to know when a fault has occurred, even at
the expense of having more false positive alarms. Liberal models, according to the present author, can be used for instance when the system can be considered to
be critical for the core business. Nuclear power plants and airplanes are good
examples of this type of system. On the other hand, the conservative area can
be for systems which are slightly more peripheral to the core business and can
be adjusted to have fewer false alarms, even at the expense of missing some of
the incipient faults. A good example of this type of system is a redundant pump,
which will not affect the production during a shutdown. This type of system
can also include many systems which are normally not considered to be part of
the machines being subjected to condition monitoring, such as mobile machines
supporting the production.
In the current standard CBM procedure [ISO 17359, 2011], a criticality assessment is recommended “to create a prioritized list of machines to be included or not in the condition monitoring programme”. However, this criticality information is not used later, for instance to set the sensitivity of the detection algorithm or to select confidence boundaries for RUL estimations. Therefore, the present author believes it is worth investigating how to include the criticality information in future diagnostic and prognostic techniques.

2.4 Concluding remarks concerning the state of the art

• PHM and CBM can be considered analogous, and more importance should be attached to the content of the techniques than to the definitions or applications.*

• The data collected in a CBM programme can be structured into two main types: event data and condition monitoring data [Jardine et al., 2006].

• SK has been shown to be a promising tool for detecting incipient bearing faults and can be used for many applications following the same procedure [Wang et al., 2016].

• Eighty percent of data analysis is devoted to the process of cleaning and preparing the data [Dasu and Johnson, 2003].

• The OCSVM has been implemented successfully for fault detection. However, the selection of good tuning parameters is an unsolved problem [Shin et al., 2005].

• Different operation modes affect the performance of fault detection algorithms [Timusk et al., 2008].

• Prognostic approaches can be classified into three different categories, namely failure-data-based techniques, stress-based techniques and effect-based techniques; some of these approaches are the precursor of a diagnosis and some bypass the diagnosis to define the current state of the system [Wesley Hines and Usynin, 2008].

• Concerning end-of-life predictions of critical systems, it is imperative to establish faith in the prognostic systems before incorporating their predictions into the decision-making process [Saha et al., 2009].

• A major challenge in machinery prognostics is how to manage the inherent uncertainty [Roemer et al., 2006].

• Accurate and precise prognosis demands good probabilistic models of the fault growth [Roemer et al., 2006].

• Criticality assessment of machines according to ISO 17359 [2011] is only to be used for creating a list of machines to be included in the condition monitoring programme. Finding new ways of utilizing this information is to be recommended.*

*Based on the present author's findings, which in turn rest on several references.

2.5 Research Gaps

Based on a study of the research performed in this area, the present author has found that six topics have either not yet been explored or are under-explored. Five of these six research gaps (items 1–5 below) have been partly studied in the research performed for this thesis; see the research questions for more information.

1. Many promising CM techniques should be re-evaluated using datasets taken from real case studies where faults have occurred, or tested using more realistic laboratory equipment. One option would be to build a dataset similar to those used for machine learning techniques, which can consist of several mechanical failure datasets [Center for Machine Learning and Intelligent Systems, 2007].

2. Research is needed on selecting the correct sensitivity when using unsupervised anomaly detection methods for fault detection.

3. Research is needed on improved methods for avoiding false alarms or falsely identified faults when estimating the failure modes of a mechanical rotating system.

4. Research is needed on developing methods for mitigating or exploiting operating mode changes when improving machinery diagnostic techniques.

5. Research is needed on how to avoid adopting diagnostic techniques which will later become useless when more advanced techniques are introduced or the RUL is to be estimated. Mostly this concerns data-treatment techniques that avoid drastic preprocessing steps and the resulting information losses.

6. Research is needed on developing a robust method for estimating the wear progress in many machine components by extracting relevant features from the vibration signals.

Chapter 3

RESEARCH METHODS

This chapter presents the methods applied in the research conducted for this
thesis and explains some of the choices made for that research.


3.1 Background
The research topic presented in this thesis was part of a series of studies per-
formed to investigate several aspects of a new smart bearing [SKF AB, 2013].
The smart bearing concept involves the embedding of smart sensors in the bear-
ing or its housing. The above-mentioned series of studies include the exploration
of methods for analysing the collected data and combining them with existing
information in order to improve machine condition monitoring (for more infor-
mation, see [SKF AB, 2013]). The focus of the research presented in this thesis
has been directed towards methods for improving condition monitoring techniques
by combining several methods in such a way that they will work in steps or in
parallel, for the purpose of increasing the possibility of using smart bearings in
the future. This research has followed the guidelines specified in the project de-
scription written in an agreement between Luleå University of Technology (LTU)
and the bearing manufacturer SKF AB. In addition to the Division of Operation,
Maintenance and Acoustics, three other divisions from LTU have been collaborat-
ing in several projects which have dealt with issues relating to embedded sensors
or wireless communication challenges, and therefore these issues have not been
explored in the research performed for this thesis.

3.2 Experiments
Due to difficulties in obtaining a proper dataset for failures and their propagation,
our first decision was to build a test rig for laboratory conditions where faults
could be simulated in a controlled environment. Unlike the rigs seen in many other
research papers, a bigger test rig was built where the forces are closer to the ones
seen in industry and natural sources of noise from other rotating components are
present. Such a rig was built to make the detection and identification of small
incipient component faults much more difficult, and thus provide a tougher and
more precise validation of the usefulness of the tested algorithms. Furthermore,
the test rig was built so that it could be run over a longer period if a fault
degradation process were to be needed. A description of the test rig and the
equipment used for measurements is also provided in Papers D and E.


3.2.1 Test rig design


Many off-the-shelf and custom-built test rigs have been designed for studying mechanical faults. The more commonly used test rigs include the PRONOSTIA platform (see [FEMTO, 2018]), the SpectraQuest test rig (see [Inc, 2018]) and a rig developed by Case Western Reserve University (see [University, 2018]). Most of these test rigs are suitable for studying the physical relations behind the investigated faults, but their disadvantage is that all the components are miniaturized, which can result in scaling problems, and robust analysis is not achieved since, for instance, noise coming from other sources is not present. Moreover, many of the faults are simulated and not caused by natural wear.
The following demands were specified before designing the test rig.

• It should be suitable for investigating seeded faults of rotating components with varying load and speed.

• It should be possible to carry out longer tests where degradation progresses from the start-up of a new system until failure.

• It should be possible to simulate many different types of failures to determine their effect on the fault diagnostic capabilities of the tested algorithms.

• It should include other components that will produce natural noise.

The rig (Figure 3.1) used in some parts of this research study was specifically
designed to test several different fault types in a more natural environment. The
rig includes many vibration sources (e.g. a large electric motor, a hydraulic
pump, oil valves, etc.) which can mask the vibration signals coming from the
fault location. Table 3.1 lists some of the main components of the test rig.
Figure 3.2 shows a schematic diagram of the test rig. Mm is the main electric
motor, which is able to produce a maximum of 75 kW of power. The load is
transferred through a two-stage gearbox, and the output shaft is connected to
a hydraulic pump which can produce a maximum torsional load of 500 Nm.
To avoid cavitation, an initial pressure of 1 bar is fed to the pump using an
additional support motor (Ms ), seen in the hydraulic circuits in Figure 3.2. The
load is controlled using the hydraulic oil flow by adjusting the hydraulic valve


[Figure: photographs of the test rig. (a) Mechanical subsystem, with the tacho sensor, vibration sensor, torque sensor, gearbox, coupling and pump indicated and the x-y-z coordinate directions marked. (b) Hydraulic subsystem, with the electric motor (Ms) and hydraulic valve indicated.]

Figure 3.1: Test rig for studying mechanical faults.

seen in Figure 3.2. Additional functions in the hydraulic system are a cooling
and a filtering system (see Figure 3.1b). The torsional load (Mi ) is measured
between the gearbox and the main electric motor.

3.2.2 Measurement equipment


Three accelerometers (in the x-y-z directions) were mounted on the gearbox. The
sensor model used was the ICP TO608 (10 mV/g). All the accelerometers were
stud-mounted to ensure that even high-frequency vibrations could be sensed.


[Figure: schematic of the test rig, showing the main motor (Mm), gears G1–G4, sensor S1, the shaft speed and torque (ωI, MI), the misalignment angle γ and offset h, the support motor (Ms) and the misalignment apparatus, divided into the mechanical and hydraulic subsystems.]

Figure 3.2: Schematic of the test rig. The sensor position relative to gear 2 was approximately (−1 cm, −6 cm, 1 cm).

The test rig data acquisition was achieved by using the PXI platform from
National Instruments. The following modules were used. The NI PXI-
4472B was used for acquiring the vibration signals and this module has eight
channels, each of which has a sample rate of 102.4 kS/s. The NI PXIe-6361
was used for measuring the torsional load and speed, and this module has 16
analogue inputs (2 MS/s), two analogue outputs (2.86 MS/s), 24 digital I/Os and
4 counters/timers. With this module it was also possible to control the speed of
the main electric motor, for example.

3.2.3 Misalignment of the output shaft


During the laboratory tests, a misalignment was introduced between the hydraulic pump (producing the load) and the medium-sized industrial gearbox connected with a rubber coupling. The coupling was of the so-called doughnut type, which is one of the typical rubber couplings used to endure some degree of shaft misalignment when one's goal is to have zero misalignment. This type of coupling is mainly used in cases where sudden impacts are likely to happen, such as those occurring in highly loaded gearboxes.


Table 3.1: Test rig specifications.

Specification            Explanation
Motor                    75 kW three-phase AC electric motor
Controller               ABB inverter
Max RPM                  1600 RPM
Gearbox                  Mekanex 602A, ratio 3.61
Coupling                 Rubber doughnut coupling
Load                     Hydraulic pump, max. torque 500 Nm
Vibration sensors        IMI PCB 10 mV/g
Torque sensor            Linear range 0–1000 Nm
Misalignment apparatus   Max. offset misalignment of 5 mm and max. angular misalignment of 3° (output shaft)

3.2.4 Several seeded component faults


Four individual types of defects were tested in this study: angular and offset misalignment of a rubber coupling, a partially broken gear tooth, and macro-pitting of the gear contact surface. For the angular misalignment, the test rig was tilted by moving the shaft and the pump in the horizontal x-direction (see Figure 3.3). The angle (γ) of the misalignment was three degrees. The offset misalignment (h) was produced by moving the pump and the pump shaft horizontally 3 mm in the z-direction. The gear damage was created to mimic two common gear failures, namely a partially broken gear tooth (see Figure 3.4a) and macro-pitting of the mesh surface (see Figure 3.4b). Both gear defects were created on gear 2 (G2), seen in Figure 3.2. The macro-pits had an average radius of 0.6 mm. The width of the break in the partially broken tooth was half the tooth width. The data collected from the test rig were used in the studies documented in Papers D and E.


Figure 3.3: Misalignment apparatus.

(a) Broken tooth.

(b) Macro-pitting.

Figure 3.4: Simulated gear faults.


3.3 Test rig for studying bearing defects


One of the demands for the test rig in the beginning was that it should be able to
carry out longer tests where degradation of the components could be monitored
from the start-up to a failure. However, this was not fully achieved until a very
late stage of the thesis studies when an additional test rig was built. The new
test rig was built using the former test rig by removing the gearbox and the
misalignment apparatus and designing a new bearing house which was directly
installed on the electric motor shaft, as seen in Figure 3.5. The load was applied
to the bearing with a hydraulic system where a cylinder pulled the bearing in
the axial direction, as can be seen in Figure 3.5a. In addition, a small triaxial
vibration sensor (Brüel & Kjær type 4524B) was glue-mounted directly above
the studied bearing (see Figure 3.5b). The tested bearing was oil-lubricated by
filling the housing half full with Mobil DTE 25 hydraulic oil (ISO VG 46). The
type of the tested bearing was an SKF 61900 deep groove ball bearing.

[Figure: photographs of the modified rig. (a) Bearing housing and the hydraulic cylinder setup, with the tacho sensor, hydraulic cylinder and load sensor indicated. (b) Position of the bearing housing and the vibration sensor.]

Figure 3.5: Modified test rig for studying bearing faults.

During the measurement campaign, a fixed 3 kN load was used and the ro-
tation speed was kept constant at 1620 RPM. The load was estimated based on
the hydraulic pressure and the known area of the piston. The load was selected
by calculating the theoretical L10 to be less than one day. A load of 4 kN was
also tested, but led to an early ball failure, which was not the failure type sought,
and therefore the load was reduced to 3 kN. In total, eight tests were performed
by running a healthy bearing until a pre-defined vibration level (in the axial direction) was met.
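The load selection based on the theoretical L10 can be sketched with the basic rating life formula for ball bearings. The dynamic load rating used below is an assumed approximate catalogue value for a small deep groove ball bearing of this size, so the result is only indicative; the manufacturer's data should be checked for the exact value:

```python
# Basic bearing rating life: L10 in hours = (1e6 / (60 * n_rpm)) * (C / P)^p,
# with life exponent p = 3 for ball bearings.
def l10_hours(c_kn, p_kn, n_rpm, exponent=3.0):
    """Basic rating life in operating hours (10 % failure probability)."""
    return 1e6 / (60.0 * n_rpm) * (c_kn / p_kn) ** exponent

c = 2.7  # kN, assumed dynamic load rating for a bearing of this size
print(round(l10_hours(c, 3.0, 1620), 1))  # a few hours at the 3 kN test load
print(round(l10_hours(c, 4.0, 1620), 1))  # considerably shorter at 4 kN
```

With these assumptions the 3 kN load keeps the expected L10 well under one day, which is consistent with the rationale given above.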


The pre-defined RMS levels were varied from 1.5 g to 6 g in order to create several degrees of bearing damage, from “small” to “severe”. Table 3.2 shows the pre-defined threshold values and the failures confirmed after post-mortem analysis. The measurements were acquired with a sample frequency of 25.6 kHz; each sample was three seconds long, and samples were taken once per minute. Results from these tests can be seen in Section 5.5. The purpose of these tests was to illuminate some of the issues seen in Papers A-D and to create a database which could be used in the future, for instance to study the use of prognostic techniques.
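The stopping criterion can be sketched as a simple RMS check on each acquired sample; the synthetic sine signal below is only a stand-in for a real vibration measurement:

```python
import numpy as np

FS = 25_600          # sample rate (Hz), as in the measurement campaign
SAMPLE_SECONDS = 3   # length of each acquired sample

def rms(signal):
    """Root-mean-square level of a vibration sample."""
    return np.sqrt(np.mean(np.asarray(signal, dtype=float) ** 2))

def should_stop(signal, threshold_g):
    """True if the sample's RMS level (in g) has reached the stop threshold."""
    return rms(signal) >= threshold_g

# Illustrative check on a synthetic 3 s sample: a sine of amplitude a has RMS a/sqrt(2).
t = np.arange(FS * SAMPLE_SECONDS) / FS
sample = 2.0 * np.sin(2 * np.pi * 100 * t)   # amplitude 2 g -> RMS about 1.41 g
print(round(rms(sample), 2))
print(bool(should_stop(sample, 1.5)))        # below the 1.5 g threshold
```

In the actual campaign, a check of this kind would run once per minute on each newly acquired sample until the test-specific threshold in Table 3.2 is reached.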

Table 3.2: Threshold values for the termination of each test and the post-mortem cause of the test stoppage.

Test  Threshold (RMS level [g])  Test duration (min)  Fault location
1     4.0                        403                  IR
2     4.0                        241                  IR
3     1.5                        191                  IR
4     2.5                        264                  IR
5     2.0                        216                  IR
6     6.0                        252                  IR+BF
7     4.0                        289                  OR
8     2.0                        519                  OR

IR = inner race, OR = outer race, BF = ball failure

3.4 Case studies


A couple of the research questions formulated could not be solved using only data
collected from the test rigs. Therefore, other sources of data were also used.

3.4.1 Vibration signals collected from an underground loader


In order to determine how several operation modes can be investigated using a
real case study, data collected from an underground LHD loader were used (in
the study presented in Paper C). A case study is an ideal choice for this type of
study, since operation modes are to a certain extent unknown because they can be


affected by the operating environment and change depending on the position of the loader. Originally the loader data were collected for fault diagnostic purposes to detect mechanical component faults in a loader's front axle. The LHD model
in question is made by Sandvik and is an LH621. Vibration measurements were
performed using a CompactRIO 9024 data logger from National Instruments and
four SKF Copperhead CMPT 2310 accelerometer sensors were used. The sensors
were installed on the front axle, two on the left side of the axle and two on the
right side, as seen in Figure 3.6. The vibration measurements were synchronized
with the Cardan axle speed, which was obtained using the tachometer pulse from
the drive shaft. The vibration measurements were continuous, which means that
every operation regime of the LHD was recorded with a precise time stamp. The
sample rate was 12.8 kHz.

(a) An LHD machine.

(b) Sensor in the horizontal direction. (c) Sensor in the vertical direction.

Figure 3.6: A typical underground loader and two sensors mounted on the front
axle.


Figure 3.7: Bearing test rig and position of the accelerometers [Qiu et al., 2006].

3.4.2 Bearing test data used in Paper A

Vibrational data from the bearing dataset provided by the Center for Intelligent
Maintenance Systems (IMS) at the University of Cincinnati were used in this
study [Lee et al., 2007]. Four double-row bearings (Rexnord ZA-2115 bearings)
were installed on one shaft and two accelerometers were mounted on each of
them to register the vibration signals in two different spatial axes, as shown in
Figure 3.7. The shaft was driven by an AC motor and coupled by rub belts. The
rotation speed was kept constant at 2,000 rpm and a 6,000 lb radial load was
added to the shaft and bearings by a spring mechanism. Vibration data were
collected, first every 5 min and then, after 215 min, every 10 min. In total, the
test run lasted 355 h 45 min. Each file consists of 20,480 points with the sampling
rate set to 20 kHz. At the end of the test, an inner race defect occurred in bearing
3. Therefore, data were captured from the two accelerometers mounted directly
on the housing of bearing 3. Pictures of the defect can be seen in the study
conducted by Qiu et al. [2006].


3.4.3 Damaged wind turbine bearing


The data used in Paper B were acquired from a real wind turbine where a high-speed shaft bearing on the generator's side was damaged after a measurement campaign. An accelerometer was mounted in the axial direction on the gearbox housing close to the high-speed shaft (HSS) bearing on the generator's side (GS). In total, 219 measurements (one per day) were collected. Some of the mea-
surements taken during the measurement campaign when the wind turbine was
non-operational were deleted. The sample rate was 12.8 kHz and the measure-
ment time per measurement was 1.28 seconds. Due to a non-disclosure agreement,
more information cannot be given to specify the type of the wind turbine.

Chapter 4

Summary of the appended papers

4.1 Paper A

In the study presented in Paper A, fault detection techniques for condition mon-
itoring were investigated. The aim was to study a novelty detection algorithm
and methods for defining an appropriate model sensitivity in order to avoid false
alarms when the criticality of the system is low and late detection when the criti-
cality is high. Here a method was proposed which involved the use of an OCSVM
algorithm with a Gaussian kernel where the input features were extracted using
vibration analysis in the time domain. The purpose of the proposed method was
to determine what type of attributes would affect the sensitivity and whether
these attributes could be linked and selected using the criticality of the system.
In the study presented in Paper A, some of the plausible techniques applied
nowadays for estimating the criticality of the system were reviewed before the
case study was performed. In Paper A, it was revealed that the model tuning
parameters gamma (γ) and nu (ν), together with the threshold parameters, will
define the sensitivity. Furthermore, it was shown that when using a testing set
where only nominal data were present, many of the models did not achieve 100
% accuracy. Therefore, each chosen model was tested using an accuracy index
where the accuracy of the model was compared with the initial accuracy. Overall,
there was a clear trend showing that all the tuning parameters were able to detect
the fault before the end of the measurement campaign. However, the possibility


exists of estimating a more appropriate point of detection by choosing the model


tuning parameters correctly. Nevertheless, even if one uses the criticality index, there is a need to select a threshold where more than one anomaly is allowed to cross the boundary. Therefore, one needs to select the model
tuning parameters and the threshold simultaneously in order to define the correct
sensitivity of the model. The results suggest that criticality analysis can improve
the fault detection modelling, which will enhance the effectiveness of maintenance
decision making.
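A minimal sketch of this type of detector, using scikit-learn's OneClassSVM with a Gaussian (RBF) kernel on synthetic nominal data; the feature values and parameter settings below are illustrative, not those used in Paper A:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Nominal (healthy) training features, e.g. time-domain vibration indicators;
# synthetic two-dimensional data standing in for the real feature vectors.
healthy = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# nu bounds the fraction of training points treated as outliers, and gamma sets
# how tightly the Gaussian kernel wraps the nominal data; together they fix the
# sensitivity of the detector.
model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(healthy)

faulty = rng.normal(loc=4.0, scale=1.0, size=(50, 2))   # shifted: anomalous
pred_healthy = model.predict(healthy)   # +1 = inlier, -1 = anomaly
pred_faulty = model.predict(faulty)

print(np.mean(pred_healthy == 1))   # close to 1 - nu on the nominal data
print(np.mean(pred_faulty == -1))   # most shifted points are flagged
```

Raising nu or gamma makes the model more liberal (more alarms), lowering them makes it more conservative, which is exactly the trade-off that the criticality assessment is meant to steer.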

4.2 Paper B
In the study presented in Paper B, the method presented in Paper A was tested
using a real case study. Here the aim was to determine whether the OCSVM
was able to detect the bearing fault using specific features extracted with en-
velope analysis. The goal of the study was to detect and identify wind turbine
bearing faults using fault-specific features extracted from vibration signals. Au-
tomatic identification was achieved by training models by using these features as
an input for a one-class support vector machine. Detection models with differ-
ent sensitivity were trained in parallel by changing the model tuning parameters.
Efforts were made to find a procedure for selecting the model tuning parameters by first defining the criticality of the system and then estimating what the initial baseline specificity should be, in order to know which tuning parameter values would be adequate. As a result, an inner race fault of a bearing was detected at an
early stage of the fault propagation and much sooner than it was detected using
traditional methods. The results were promising. When inner-race-related fea-
tures (the BPFI and its sidebands) and outer-race-related features (the BPFO
and its sidebands) were combined, a point-wise model was able to identify the
fault location correctly.
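Envelope analysis of the kind used for extracting fault-specific features can be sketched with Hilbert demodulation. The carrier frequency, the BPFI value and the synthetic signal below are illustrative assumptions, not the wind turbine data:

```python
import numpy as np
from scipy.signal import hilbert

FS = 12_800  # Hz, as in the wind turbine measurements

def envelope_spectrum(x, fs):
    """Amplitude spectrum of the signal envelope (Hilbert demodulation)."""
    env = np.abs(hilbert(x - np.mean(x)))
    env -= np.mean(env)
    spec = np.abs(np.fft.rfft(env)) / len(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, spec

def band_amplitude(freqs, spec, f0, tol=2.0):
    """Peak envelope amplitude within +/- tol Hz of a fault frequency f0."""
    mask = (freqs >= f0 - tol) & (freqs <= f0 + tol)
    return spec[mask].max()

# Synthetic test: impacts repeating at a hypothetical BPFI of 107 Hz,
# amplitude-modulating a 3 kHz resonance (all numbers are illustrative).
t = np.arange(int(FS * 1.28)) / FS
bpfi = 107.0
carrier = np.sin(2 * np.pi * 3000 * t)
modulation = 1 + 0.8 * np.cos(2 * np.pi * bpfi * t)
x = modulation * carrier + 0.1 * np.random.default_rng(1).normal(size=t.size)

freqs, spec = envelope_spectrum(x, FS)
print(band_amplitude(freqs, spec, bpfi) > band_amplitude(freqs, spec, 450.0))
```

Amplitudes read off at the BPFI, the BPFO and their sidebands in this way form the fault-specific feature vectors that are fed to the one-class support vector machine.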

4.3 Paper C
In the study presented in Paper C, an unsupervised clustering technique (VBGM)
was investigated in order to determine whether it could be used for detecting operation regimes for an underground LHD (load-haul-dump) machine. The input


features were extracted from the vibration signals measured on the front axle.
Moreover, the speed of the Cardan axle was one of the input features. The initial
results showed that, although groups could be formed, it would be difficult to
know what they represented and this would make further use of the technique
rather difficult. Therefore, in order to obtain correct labels for each separated
class, we used a smaller dataset for infecting these clusters to determine what the
major proportion of the data content was. Once the label is known, these clusters
can be used either to define the correct time to use diagnostic analysis tools or
to estimate what the effect of individual groups will be when fault propagation
is seen. Promising results were obtained, revealing that each operation regime
sought was detected in a sensible manner using vibration RMS values together
with the speed. However, other extracted time domain features were unable to
give reasonable results.
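A minimal sketch of variational Bayesian Gaussian mixture clustering on RMS and speed features, assuming synthetic data for three hypothetical operating regimes; the regime names and all numbers below are invented for illustration:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(2)

# Illustrative two-feature input: vibration RMS [g] and shaft speed [RPM],
# drawn from three synthetic operating regimes (idling, hauling, loading).
idling = np.column_stack([rng.normal(0.1, 0.02, 200), rng.normal(0, 20, 200)])
haul = np.column_stack([rng.normal(0.8, 0.1, 200), rng.normal(1500, 60, 200)])
load = np.column_stack([rng.normal(1.5, 0.2, 200), rng.normal(600, 50, 200)])
X = np.vstack([idling, haul, load])

# The variational Bayesian mixture can prune unused components, so an upper
# bound on the number of regimes suffices; the cluster labels still have to be
# interpreted afterwards using a small labelled subset.
vbgm = BayesianGaussianMixture(n_components=8, random_state=0, max_iter=500).fit(X)
clusters = vbgm.predict(X)
print(len(np.unique(clusters)))  # effective number of regimes found
```

This is the key practical advantage over a plain Gaussian mixture: one does not need to know the number of operating regimes in advance, only a generous upper bound.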

4.4 Paper D
In the study presented in Paper D, fault-sensitive features were extracted us-
ing wavelet analysis together with a machine learning technique called Random
Forests. The aim was to determine how a feature set can be reduced to consist
only of those features which are relevant for identifying the studied defects. An
additional aim was to select only those features which are indifferent to load,
speed or other operating conditions. In this study the solution was to initially
filter vibration signals by using the wavelet transform in such a manner that each
scale would be selected from the same relative position, which would be indiffer-
ent to the rotation speed. In order to reduce the feature set, a Random Forest
was used to select only those variables which were the most important ones. To
reduce the ad hoc scenario even further, each test was performed three times
using a different load. Later, all the selected variables were compared and an
intersection was taken in order to determine the most important features. The
proposed method gave some indications that it could be a viable method, espe-
cially when some previous feature set has failed to detect the fault and a new set
of features is needed.
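The load-wise feature selection and intersection step can be sketched as follows; the synthetic “wavelet-scale” features and the indices of the fault-sensitive features are assumptions for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n, n_feat = 300, 12  # samples per load case, candidate wavelet-scale features

def make_dataset(load_offset):
    """Synthetic feature matrix: features 2 and 7 respond to the fault, while
    every feature is shifted by the load level (illustrative stand-ins for
    wavelet-scale energies)."""
    y = rng.integers(0, 2, n)                      # 0 = healthy, 1 = faulty
    X = rng.normal(load_offset, 1.0, (n, n_feat))  # load shifts every feature
    X[:, 2] += 2.0 * y                             # fault-sensitive features
    X[:, 7] += 1.5 * y
    return X, y

def top_features(X, y, k=4):
    """Indices of the k most important features according to a Random Forest."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    return set(np.argsort(rf.feature_importances_)[-k:])

# Select important features separately for each load level, then intersect,
# keeping only features that matter regardless of the operating condition.
selected = set.intersection(*(top_features(*make_dataset(off)) for off in (0.0, 1.0, 2.0)))
print(sorted(selected))
```

Features that owe their importance to the load level drop out of the intersection, leaving the load-indifferent, fault-sensitive ones, which mirrors the selection idea of Paper D.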


4.5 Paper E
In the study presented in Paper E, the use of wavelet analysis together with an
SVM classification algorithm was tested for detecting the degree of an angular
misalignment. The method was experimentally evaluated in a laboratory test rig
for four different operating conditions by varying the rotational speed and load.
An angular misalignment was introduced between a hydraulic pump (producing
the load) and a medium-sized industrial gearbox connected with a rubber cou-
pling. Vibration data were collected using two accelerometers mounted in an
axial and a radial direction directly on the gearbox casing. The final results of
the confusion matrices clearly indicate that this method can detect misalignment
even when the speed and load vary. This study showed that the SVM algorithm
is a powerful classification algorithm and the problem of using it to obtain perfect
classification results lies more in acquiring a proper dataset where both faulty and
healthy cases are seen and the future conditions remain the same. Furthermore,
the question arose as to what would happen to the detection capabilities when
unseen faults occurred.
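A minimal sketch of SVM classification of misalignment severity evaluated with a confusion matrix, assuming synthetic two-dimensional features in place of the wavelet-based features used in Paper E:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(4)

# Three misalignment classes (0 = aligned, 1 = mild, 2 = severe); two synthetic
# features standing in for wavelet-based vibration features, with added spread
# mimicking varying speed and load.
def simulate(n=150):
    y = rng.integers(0, 3, n)
    X = np.column_stack([
        0.5 * y + rng.normal(0, 0.2, n),        # feature growing with misalignment
        1.0 + 0.3 * y + rng.normal(0, 0.3, n),
    ])
    return X, y

X_train, y_train = simulate()
X_test, y_test = simulate()

clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train, y_train)
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)                       # rows: true class, columns: predicted class
print(np.trace(cm) / cm.sum())  # overall accuracy
```

As noted above, the hard part in practice is not the classifier itself but obtaining training data that covers both healthy and faulty cases under representative operating conditions.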

Chapter 5

RESULTS AND DISCUSSIONS

The results in this thesis are mainly based on the appended papers. To validate and elaborate on some of the subjects discussed in the appended papers, additional laboratory tests were conducted using the test setup presented in Section 3.3. The data and results from these tests have not been published previously outside of this thesis.

5.1 Results from additional bearing degradation tests in the laboratory
In total, eight life-length tests were conducted to study the degradation of the bearings; see Table 3.2. The RMS vibration level during the progression of the inner race fault for the first six tests is shown in Figure 5.1a. The initial defect of the inner race of the bearing occurred when the baseline level of the vibration rose above 0.9 g. The last two tests, with outer race failures, are presented in Figure 5.1b. For the outer race, the threshold value was 1.4 g. The variation of the failure time for all eight tests was high, even though the load and speed were kept at the same level in each test. Some of the bearings only lasted slightly more than two hours, while some bearings survived for more than five hours (test 1).
Figure 5.2 shows the calculated time when 10 % of the bearings should fail
using a Weibull distribution. Based on the experiments conducted, the estimated
lifetime of the bearing was slightly less than 100 minutes. The calculation method


applied here is the same as that used when estimating the L10 rating life. The
calculated theoretical L10 bearing life rating for the bearing used in the tests is
eight hours, which is much higher than the real calculated lifetime. However,
the bearing load for these tests (a load ratio of 0.9) was much higher than the
normal load for any applications using the bearing in question. Furthermore,
when calculating the modified SKF life rating, L10mh, by estimating the operating temperature to be 80 °C, and by using the contamination factor ηc = 0.5 and the correct operating viscosity of the oil (10.9 cSt), the expected life rating is
estimated to be two hours. Therefore, it seems that the new modified bearing life
rating calculator produces a rating which is much closer to reality than the old L10
rating in this particular case. What is more interesting to know in some industrial
applications is how long the bearing can survive until total failure after the initial
defect has occurred. Therefore, by setting the same threshold for the initial defect
and plotting the degradation time of all the tests assuming zero time after the
defect, one can see the remaining useful life after the initial defect. Figure 5.3
shows the size of the spall when the bearing was run after the initial defect for
all the inner race failures. Most of these tests were not run until total failure in
order to see the size of the damage during the damage growth. Figure 5.3 shows
that in test 6 (the black curve), which was the only test where the bearing was
run until complete failure, the bearing was able to survive slightly over two hours
after the initial defect, before complete failure occurred. Furthermore, many of
the censored tests follow the same path as test 6, which indicates that the path
after the initial defect is very similar in each case, even though the time before the
initial defect was seen varied greatly in each test. In Figure 5.4 the same graph is
plotted for the outer race faults. However, the time until complete failure is not
seen because mostly the failure occurred in the inner raceway and neither of the
two tests was run until complete failure. Furthermore, the vibration behaviour
when the outer race fault occurred was not constant, which can be seen from
Figures 5.1b and 5.4.
When estimating the damage accumulation for the inner race faults, it can
be seen from Figure 5.3 that the fault starts from one spall and then steadily
increases until the raceway has been damaged severely. Furthermore, it was seen
from test 6 that at the end of the test, balls were separated from the ball cage.


This most likely caused one of the balls to be displaced, destroying the ball and
thus the bearing completely. After this secondary damage the RMS level rose
from 6 g to over 20 g in less than one minute.
In the case of the outer race faults, the damage accumulation was slightly
more complex. During test 7, two smaller spalls were seen which were located at
a distance from each other which corresponded approximately to the width of 1.5
balls. In test 8 there was only one spall and it was deeper and bigger than that
in test 7. Images from the faults can be seen in Figure 5.4. These results indicate
that in RUL estimations, it is important to determine the failure mode and to
know the time when the initial defect occurs. This knowledge should improve the
estimation of the RUL.
(a) Inner race failures. (b) Outer race failures.

Figure 5.1: Failure time of each bearing test for the SKF 61900 bearing. The
rotation frequency during the tests was 27 Hz and the load was 3 kN.

Figure 5.2: Calculated L10 rating using data from all eight tests (Weibull fit:
shape = 2.281, scale = 265.2; the marked L10 time is 98.88 min). The definition
of failure for inner race faults was a vibration RMS level of 0.9 g. For outer
race faults the level was 1.4 g.
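The estimated lifetime above can be checked directly from the fitted Weibull parameters: the time by which a fraction q of the population has failed is t_q = η(−ln(1 − q))^(1/β). A minimal sketch, assuming only the shape and scale values reported in Figure 5.2:

```python
import math

# Weibull parameters reported for the eight bearing tests (Figure 5.2):
# shape (beta) = 2.281, scale (eta) = 265.2 minutes.
beta, eta = 2.281, 265.2

def weibull_bq(q, beta, eta):
    """Time at which a fraction q of the population has failed,
    i.e. the inverse of the Weibull CDF F(t) = 1 - exp(-(t/eta)^beta)."""
    return eta * (-math.log(1.0 - q)) ** (1.0 / beta)

b10 = weibull_bq(0.10, beta, eta)  # the L10 (B10) life
print(f"Estimated L10 life: {b10:.1f} min")
```

With the reported parameters this reproduces the roughly 100-minute estimate discussed above.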


Figure 5.3: Degradation path for each inner race fault after the initial defect
occurred.

Figure 5.4: Degradation path for each outer race fault after the initial defect
occurred.


5.2 Results and discussion related to RQ 1


RQ 1 How can rotating machinery fault detection be improved by considering
how critical the system is and its usage?

RQ 1 is answered in Papers A, B and C. In Paper A, an unsupervised detection
method called the one-class SVM was studied. Figure 5.5 shows the accuracy
index (the accuracy compared with the baseline accuracy; see Paper A
for more details) of 15 separately trained one-class SVM models for 20 different
degradation sets. The degradation set numbers increase over time as the
test progresses, and the last set was acquired when the test stopped. Each
set consists of 150 measurements. The training and testing datasets include four
time domain features (the RMS, peak, kurtosis and skewness) extracted from the
vibration signals coming from sensors mounted on the test rig. In this test rig,
the double-row ball bearing was under a static load and ran until an inner race
defect was seen. The difference between these 15 models, as shown in Figure 5.5,
is that the nu (ν) and gamma (γ) values increase from 0.01 to 0.6 at the same
rate. The ν parameter indicates the upper bound on the fraction of training
points outside the estimated region, and γ is the kernel parameter, which controls
the influence of individual training samples. Values higher than 0.6 were also
tested, but were omitted from the figure since the initial testing accuracy using
the separated nominal test dataset was below 50 %. More information about the
test and results can be found in Paper A.
The results obtained in the study presented in Paper A clearly indicate that
the detection time depends quite heavily on the model tuning parameters. The
dependence is especially high during the period when the first fault indications
are most likely to be seen, i.e. between degradation sets 3 and 7. In
this area the deviation in the model sensitivity is high. For instance, the accuracy
index value for degradation set 4 is over 70 points higher when comparing the
sensitivity levels of the lowest model (γ, ν = 0.01) and the highest model (γ, ν =
0.6). On the other hand, at the end (degradation sets 16 to 19), just before
the assumed secondary damage happens to the system, there is no big
difference between these 15 selected models.
To connect these results with the results obtained in the study performed


Figure 5.5: One-class SVM model accuracy obtained by varying both the γ and ν
values equally, from 0.01 to 0.6. In the accuracy index, the model accuracy is
compared against the baseline specificity.

by Fernández-Francos et al. [2013], who used the same dataset, one can note
that some limitations can be seen in their method. Even though they included a
sensitivity test, which was able to select more influential features automatically
from the frequency domain, the selection of appropriate tuning parameters was
not discussed. In their study the ν value was 0.01 and the γ value was 0.05,
and these values were regarded as adequate without any thorough investigation
or reasoning behind the selection. However, it is a reasonable postulate that the
same values cannot be used for every system, since each system may have totally
different performance requirements (e.g. concerning the quality of the output or
the level of maintainability). This is in agreement with the findings of the study
performed by Shin et al. [2005], who stated that follow-up studies are needed


which investigate how to select appropriate parameters for OCSVMs.


Paper A dealt with the issue of a possible connection between the criticality
of the system and the selection of the tuning parameters; a weak connection
was detected in that low ν and γ values lead to earlier detection. The
findings showed that selecting the model tuning parameters is not sufficient, since
determining the threshold for how many anomalies are allowed to be outside
the trained nominal space is equally important. Therefore, after selecting the
Gaussian kernel, there are three attributes which should be selected before the
connection between the model and the system criticality can be made, i.e. ν, γ
and the threshold of the alarm.
Although in this study the OCSVM algorithm was used with a Gaussian
kernel, one can also expect these types of results when using other unsupervised
detection algorithms or other types of kernels. The only difference may be that
the criticality needs to be taken into account using some other attributes, and
the number of attributes may very well be more than three. Having to select only
three attributes makes the OCSVM a very applicable technique when trying to
correlate the criticality and the selected values of these attributes.
Another benefit of the OCSVM is that the parameter ν may also be pre-
selected using the collected data. Since ν sets an upper bound on the fraction
of outliers [Schölkopf et al., 2001], there may be a way to define its value by
examining the noise level in the collected data. If the noise level is low, a lower
value of the ν parameter can be selected. However, before making any solid
conclusions, this should be thoroughly investigated with real case studies.
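A sketch of how such a data-driven pre-selection of ν could look; the 3-MAD outlier rule and the floor value are illustrative assumptions rather than a method from the appended papers:

```python
import numpy as np

def estimate_nu(X, floor=0.01):
    """Illustrative pre-selection of the OCSVM nu parameter: estimate the
    fraction of noisy training points with a robust (median/MAD) rule and
    use it, bounded below by `floor`, as the upper bound on outliers."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)  # robust std estimate
    outliers = np.any(np.abs(X - med) > 3.0 * mad, axis=1)
    return float(max(floor, outliers.mean()))

rng = np.random.default_rng(1)
clean = rng.normal(size=(300, 4))   # low-noise training data
noisy = clean.copy()
noisy[:30] += 10.0                  # 10 % gross outliers injected
print(estimate_nu(clean), estimate_nu(noisy))
```

The idea matches the text: a low noise level in the collected data would translate into a low ν value.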
Note that the only exception when the criticality should not affect the ac-
curacy of the detection algorithm is when the classifier is ideal (see the ROC
curve in Paper A for details). However, in practice this is seldom achieved since,
when detecting mechanical faults, what usually happens is that, if the classifier
is ideal, one wants to know the fault even earlier and, once again, the classifier
will be either liberal or conservative. For instance, in bearing fault detection,
this would mean that, if the spall of the inner or outer race is known exactly and
always correctly, the demand changes and the next step is perhaps to know when
the crack growth is starting. The problem of selecting the three parameters
correctly is that, for instance, low model tuning parameters together with a
high threshold value appear to work as well as medium parameter values (ν and γ
around 0.5) with a medium threshold value, as can be seen from Figure 5.5. Since
a good method for selecting these attributes easily was not found, the possibility
of using the initial accuracy (baseline specificity) of the trained model was tested
even further in the study presented in Paper B.
The case study for the further research was a wind turbine converter (WEC).
Before any analysis, a rough estimation was made by considering the environment
in which the wind turbine was operating, in order to select an appropriate
baseline specificity for the OCSVM models. It was decided that this specificity
should be approximately 0.85–0.8 (A2 models). For comparison, two other values
were selected, 0.7–0.65 (A3 models) and 0.95–0.9 (A1 models), as illustrated in
Figure 5.6. It is practically impossible to know what type of ROC curve each
particular model has when using the training data; therefore, the assumption
was made that the curve is parabolic and may give sensitivity values where
A1 < A2 < A3.

Figure 5.6: Illustration of a ROC curve (sensitivity vs. 1 − specificity, with
liberal and conservative regions) and the three selected baseline specificities
(A1, A2 and A3).

Using the same logic as in Paper A, keeping the ν and γ parameters equal
or as equal as possible, and selecting a feature set where each feature was sensitive
to the correct fault led to good results, and most of the time detection was
accomplished earlier than with traditional methods. However, with the same model
tuning parameters, detection of the abnormal behaviour would have happened at
the same time when the features were sensitive to other types of faults and other
component faults (in this case, another type of bearing located next to the damaged
one). These results can be seen by comparing the models shown in Figure 5.7. In
this case bearing B2 remained healthy during the data collection, while bearing
B1 was damaged and was confirmed to have an inner race spall. If the method could
be used for both detection and identification, the detection time of the models shown in
Figure 5.7a should occur at the correct time and before any of the models shown
in Figures 5.7b, 5.7c or 5.7d detect the fault.
Nevertheless, the detection by all of the models shown in Figure 5.7,
which occurred on day 326, can be considered an early detection of the
abnormal behaviour of the WEC. More details can be found in Paper B.
To generalize these results, it can be said that these types of fault
detection methods can most likely be improved by limiting the model tuning parameters to
the correct levels, taking into account the criticality calculations using the
ROC space, as explained in Paper B. However, selecting the number of
anomalies allowed before an alarm is raised remains rather unclear and requires
further investigation. On the other hand, it may very well be the case that, after
using these types of methods in practice, an empirical level can be achieved which
will work in many cases.
In Paper B, several combinations of feature sets were used. However, in the future
it could be beneficial to investigate and find methods which can make models
sensitive only to certain faults and insensitive to other types of faults or to
changes seen in the operation.
One possible way to make features insensitive to other types of faults could be
to compare the level of the defect frequencies against the overall vibration level.
This may reduce the leakage of frequency components, which normally occurs in
real-life vibration measurements and reduces the accuracy of the analysis (see
the study by Wu and Zhao [2009] for more information).
To reduce the vibration fluctuation caused by changes in the operation, the
method discussed in Paper C could advantageously be used as a

(a) Sensitive to detect inner race faults (B1). (b) Sensitive to detect outer race
faults (B1). (c) Sensitive to detect inner race faults (B2). (d) Sensitive to
detect outer race faults (B2).

Figure 5.7: Detecting bearing faults using the one-class SVM algorithm. Models
sensitive to detect inner race faults have been trained using the features BPFI, BPFI
1st H and BPFI 2nd H. Models sensitive to detect outer race faults have been
trained using the features BPFO, BPFO 1st H and BPFO 2nd H. Bearing 1 (B1)
was damaged at the end of the test and bearing 2 (B2) remained intact.

pre-processing step, with the testing data collected only when the operation regime is
similar to the regime in force when the detection model was trained. More details on how to use
the method can be found in Paper C and in Section 5.3.


5.2.1 Comparison of results obtained using the laboratory tests
There are some limitations concerning the data acquired from the wind turbine
converter. Firstly, the data for training were quite limited, since only one
measurement was taken per day. Secondly, it was difficult to know when the
initial defect had occurred, which makes it almost impossible to estimate whether
a possible detection is a false alarm or an early detection. Thirdly,
only inner race faults occurred and, therefore, there was no possibility of
determining whether the method would also work for outer race defects. For these
reasons further analysis was performed using the data taken from the laboratory
tests, see Section 3.3. These data were processed similarly to the data processing
performed in the case of the wind turbine converter in Paper B. Figure 5.8 shows
the trend of the condition indicators named BPFIrss and BPFOrss. These
features were calculated using the following steps.

1. An envelope of the vibration signal was created, setting the bandpass filter
to 500–10k Hz.

2. The frequency components located near the defect frequencies (±1 Hz)
related to the BPFI and BPFO were selected. Since the signals are modulated
with the rotation frequency, sidebands are also included when calculating
the BPFIrss.

3. Step 2 was repeated for the 1st and 2nd harmonics.

4. The absolute values of each isolated frequency component were then summed
together to obtain the BPFIrss and BPFOrss, see Table 5.1.
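A rough sketch of steps 1–4, assuming an illustrative filter order, the ±1 Hz tolerance from step 2, and a synthetic 137 Hz defect frequency exciting a 3 kHz resonance (none of these values come from the test rig):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_spectrum(x, fs, band=(500, 10_000)):
    """Step 1: band-pass filter the raw signal and take the Hilbert
    envelope; return the one-sided amplitude spectrum of the envelope."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    env = np.abs(hilbert(sosfiltfilt(sos, x)))
    env -= env.mean()                          # remove the DC component
    spec = np.abs(np.fft.rfft(env)) / len(env)
    return np.fft.rfftfreq(len(env), 1.0 / fs), spec

def defect_feature(freqs, spec, f_defect, harmonics=3, tol=1.0):
    """Steps 2-4: sum the spectral amplitudes within +/- tol Hz of the
    defect frequency and its harmonics (cf. BPFIrss / BPFOrss)."""
    return sum(spec[np.abs(freqs - h * f_defect) <= tol].sum()
               for h in range(1, harmonics + 1))

# Synthetic check: impulses at a 137 Hz "defect rate" ringing a 3 kHz
# resonance should stand out at 137 Hz in the envelope spectrum.
fs, dur, f_def = 50_000, 2.0, 137.0
t = np.arange(int(fs * dur)) / fs
ring = np.exp(-t[:500] / 1e-3) * np.sin(2 * np.pi * 3000 * t[:500])
x = np.zeros_like(t)
for k in np.arange(0, dur, 1 / f_def):
    i = int(k * fs)
    x[i:i + 500] += ring[: len(x) - i]
freqs, spec = envelope_spectrum(x, fs)
print(defect_feature(freqs, spec, f_def))
```

In this toy signal the feature evaluated at the true defect frequency is far larger than the same feature evaluated at an unrelated frequency, which is the behaviour the trend plots in Figure 5.8 rely on.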

As can be seen from Figure 5.8, both the inner race fault and the outer
race fault can be diagnosed once the defect has occurred by following the trend.
However, as can be seen from Figure 5.8a, it becomes harder to identify the fault
location once the defect progresses (after 250 minutes). Based on these findings,
it seems that there is a certain time window within which identification of the fault
is optimal.


(a) Defect which has occurred in the inner race of the bearing (Test 6).
(b) Defect which has occurred in the outer race of the bearing (Test 7).

Figure 5.8: Trend of condition indicators (normalized values of BPFIrss and
BPFOrss) calculated from the enveloped accelerometer signals for the SKF 61900
bearing.

Table 5.1: Detailed explanation of features used in Figure 5.8 and Table 5.3.

Feature      Explanation                                                                            Unit
BPFI         Sum of the BPFI and the absolute values of the ±3 shaft speed SBs                      g
BPFI 1st H   Sum of the 1st harmonic of the BPFI and the absolute values of the ±3 shaft speed SBs  g
BPFI 2nd H   Sum of the 2nd harmonic of the BPFI and the absolute values of the ±3 shaft speed SBs  g
BPFIrss      Sum of the absolute values of all the BPFI-related defect frequency components         g
BPFOrss*     Sum of the absolute values of all the BPFO-related defect frequency components         g

Feature set
Features   A            B            C            D
F1         BPFI         BPFO         BPFI         |BPFI − BPFO|
F2         BPFI 1st H   BPFO 1st H   BPFI 1st H   |BPFI 1st H − BPFO 1st H|
F3         BPFI 2nd H   BPFO 2nd H   BPFI 2nd H   |BPFI 2nd H − BPFO 2nd H|
F4                                   BPFO         (BPFI + BPFO)/2
F5                                   BPFO 1st H   (BPFI 1st H + BPFO 1st H)/2
F6                                   BPFO 2nd H   (BPFI 2nd H + BPFO 2nd H)/2

Abbreviation   Meaning
BPFI           Ball pass frequency, inner
BPFO           Ball pass frequency, outer
SB             Sideband

*The BPFO-related features are calculated similarly to the BPFI-related features,
but without including any of the peak values of the shaft speed SBs.

5.2.1.1 Detection models using OCSVM

Since the OCSVM is able to handle more than one-dimensional features, the
input features were not summed together as is done when calculating the
BPFIrss and BPFOrss features. Instead, the BPFI, BPFI 1st H, BPFI 2nd H,
BPFO, BPFO 1st H and BPFO 2nd H were used as individual features.


In total, four feature sets were tested. The first set, set A, included only
features sensitive for the detection of inner race faults. The second set, set B,
included only features sensitive for the detection of outer race faults. In a compar-
ison of the detection times of the models trained using set A and those trained
using set B, the ideal outcome would be that the former models would detect
bearing defects in the inner race earlier than the latter models, and that the
latter models would detect defects in the outer race earlier than the former mod-
els. Thus both the detection and identification could be accomplished using the
OCSVM.
The third and fourth feature sets, sets C and D, were created by combining
both the inner and outer race-sensitive features into the same model. In set C
each BPFI- and BPFO-related feature was handled separately and, in total, six
input features were used. The set D features were combined by calculating the
separation distance between the BPFI and BPFO features and their mean distance
to the x-axis. Table 5.1 explains how each feature was calculated for sets A–D.
Before using the OCSVM for fault detection, the duration of the training
period needs to be defined. Since the L10 for this particular case is around 100
minutes, the training period must logically be shorter than that. Therefore, the
data were divided into a training set, taken during the first 60 minutes,
and a baseline testing set consisting of the data for the following 20 minutes
(20 data points). Similarly to the procedure in Paper B, 15 ν and γ parameter values
were selected and the baseline accuracy was tested; the results are shown in
Table 5.2.
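The parameter sweep behind Table 5.2 can be sketched as follows; the random stand-in data and the exact grid values are assumptions for illustration, not the thesis data:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Stand-ins for the three BPFI-related features: 60 healthy training
# points (first 60 min) and a 20-point baseline testing set (next 20 min).
train = rng.normal(size=(60, 3))
baseline = rng.normal(size=(20, 3))

def baseline_specificity(nu, gamma):
    """Share of known-healthy baseline points the trained model accepts."""
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(train)
    return float(np.mean(model.predict(baseline) == 1))

# Sweep nu and gamma as in Table 5.2, then pick the models whose baseline
# specificity is closest to the chosen targets (here 95 %, 90 % and 50 %).
grid = np.round(np.linspace(0.01, 0.5, 15), 3)
table = {(nu, g): baseline_specificity(nu, g) for nu in grid for g in grid}
for target in (0.95, 0.90, 0.50):
    nu, g = min(table, key=lambda k: abs(table[k] - target))
    print(f"target {target:.2f}: nu={nu}, gamma={g}, spec={table[(nu, g)]:.2f}")
```

Selecting models by their baseline specificity, rather than by raw (ν, γ) values, is the step that makes the targets of 95 %, 90 % and 50 % comparable across datasets.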
For further analysis, three models were selected where the aim was to have a
baseline accuracy of 95 %, 90 % and 50 %. These three models are marked with
a grey background in Table 5.2. The results seen in Table 5.2 were calculated
using feature set A and data collected from test 6.
Figure 5.9 shows the detection accuracy of test 6 using feature sets A–D with
the three selected model tuning parameters marked in Table 5.2. The predicted
healthy points (the y-axis) were estimated using a sliding window where the five
most recently measured points were tested to determine how many of them would
be seen as anomalous. If all the points were inside the trained space, they would


Table 5.2: One-class SVM baseline specificity (%) obtained by varying the ν and γ
parameters for test 6. Three BPFI-related features were used as input. The tuning
parameters selected to train the detection models are highlighted in the original
with different shades of grey: A1 (ν = 0.01, γ = 0.01), A2 (ν = 0.045, γ = 0.12)
and A3 (ν = 0.22, γ = 0.29).

ν \ γ   0.01 0.045 0.08 0.12 0.15 0.19 0.22 0.26 0.29 0.33 0.36 0.40 0.43 0.47 0.5
0.01     95   95   90   90   75   75   75   75   70   70   65   60   45   45   40
0.045    95   95   90   90   75   75   75   75   70   70   65   60   45   45   40
0.08     95   90   90   90   75   75   75   75   70   70   65   60   45   45   40
0.12     85   80   80   75   75   75   75   75   70   70   65   60   45   45   40
0.15     80   80   75   75   75   75   75   75   70   70   65   60   45   45   40
0.19     80   80   80   75   75   75   75   75   60   65   60   60   45   45   40
0.22     80   80   80   75   75   75   70   60   50   45   45   45   45   45   40
0.26     80   75   75   70   60   55   50   40   40   40   45   45   45   40   40
0.29     75   70   55   55   50   50   45   40   40   40   40   40   40   40   40
0.33     60   55   55   55   50   50   40   35   35   35   40   40   40   40   35
0.36     55   55   55   55   55   45   35   35   35   35   35   35   35   35   35
0.40     45   45   50   50   50   45   35   35   35   35   35   35   35   35   35
0.43     45   45   45   45   45   40   35   35   35   35   35   35   35   35   35
0.47     45   40   40   35   35   30   30   35   35   35   35   35   35   35   35
0.5      40   40   35   35   35   30   30   30   30   30   30   30   30   30   30

all indicate that the machine is healthy. The x-axis shows the time of the latest
measured point in the sliding window. During the 6th test, the defect occurred in
the inner raceway. Therefore, the models whose results are shown in Figure 5.9a
should indicate the fault before the models whose results are shown in Figure 5.9b
if the method is capable of detecting the inner and outer race faults and separating
them from each other.
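The sliding-window alarm logic just described can be sketched as a small helper, assuming anomalous points are encoded as 1 and healthy points as 0:

```python
def detection_time(flags, times, window=5, required=5):
    """Return the time of the first sliding window in which at least
    `required` of the `window` most recent points are anomalous (1),
    or None if no alarm is ever raised."""
    for i in range(window - 1, len(flags)):
        if sum(flags[i - window + 1 : i + 1]) >= required:
            return times[i]
    return None

# Toy example: one isolated anomaly, then a run of anomalous points.
flags = [0, 0, 1, 0, 1, 1, 1, 1]
times = list(range(84, 92))                      # minutes
print(detection_time(flags, times, required=4))  # alarm at minute 90
```

Requiring several consecutive anomalous points, rather than alarming on a single one, is what suppresses isolated false positives in the trend plots.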
As can be seen from Figure 5.9a, five consecutive points were seen as anoma-
lous after the test had continued for 123 minutes. Visually estimating the time
when there was a defect using the RMS vibration level, the estimated time was
119 minutes (see Figure 5.8a). The trained detection models using inner race-
related features were able to detect the fault almost at the same time. The reason
for the slightly later detection of these models is that the sliding window function
needs the five consecutive anomalous points in order to detect the presence of the
fault. As can be seen from model 1 (the black curve), detection would also have
been possible to achieve by using three consecutive anomalous points without
any false alarms. If this had been the case, the time of the alarm would have
been 121 minutes. Comparing the models where the features are related to inner
race or outer race defect frequencies only (Figures 5.9a and 5.9b), two interesting
results should be noted. Firstly, model 3 (the blue curve) would have caused an


(a) Detecting inner race faults, feature set A. (b) Detecting outer race faults,
feature set B. (c) Detecting inner and outer race faults with the same model,
feature set C. (d) Detecting inner and outer race faults with the same model,
feature set D.

Figure 5.9: Models for detecting bearing faults using the one-class SVM algorithm
with data from test 6. It was confirmed that the defect occurred in the inner
race of the bearing.


early false alarm with feature set B, no matter how many consecutive anomalous
points out of five consecutive points had been used for the detection threshold.
Secondly, the other two models (models 1 and 2, the black and red curves) detected
the fault at roughly the same time using feature set B, with model 2 detecting
the fault slightly earlier than model 1. These results indicate that the method
of using only inner or outer race-specific features extracted with the explained
enveloping method is not a good method for identifying the fault location. Fig-
ure 5.9c shows the detection accuracy when inner and outer race-related features
were individually added to the same model. These models were able to detect
the fault slightly later than the models using feature set A or B. However, what
is interesting to note is their consistency. As can be seen, all the models detected
the presence of the fault at the same time and only a few times were one or two
points out of five seen as anomalies. Therefore, this indicates that combining
inner and outer race features into the same model makes it more robust, but
slightly less sensitive. The models trained using feature set D (Figure 5.9d) led
to poor results. Two models predicted the fault almost right after the training
period (the red and blue curves) and one model was unable to predict it at all
by using five consecutive anomalous points. However, the use of these combined
features creates an additional problem concerning the identification of the fault.
If all the features are used as a feature set of a detection model, it cannot estimate
the fault location.
Table 5.3 shows the times when the bearing fault was detected for all eight
bearing tests. Similarly to Figure 5.9, Table 5.3 presents the results for three
model tuning parameter setups used with four input feature sets. Also presented
in the table are the real failure times estimated by visually inspecting the trend of
several vibration indicators (e.g. RMS BP F Irss BP F Orss and peak). For most
of the tests, a clear indication was found with a sudden increase being visible.
However, for test 8, at a time of 341 minutes the vibration level rose to almost
twice the baseline level and stayed slightly elevated for 14 minutes
until it dropped back to the level seen at the beginning of the test
(see Figure 5.10a). A clear increase in the vibration level was then seen after 472
minutes when the vibration level rose to a level more than four times the level
it had been at before. Therefore, for this test, two real failure times were given,


Table 5.3: Times for detecting bearing faults using the OCSVM when four
consecutive points out of the five most recent consecutive testing points are seen
as anomalous events.

       Real failure   Input          Detection time (Min)*
Test   time (Min)     feature set    M1      M2      M3
1      325            A              328+    327+    291−
                      B              329+    112+    86−
                      C              329+    112+    110+
                      D              106+    85−     110+
2      162            A              166+    166+    106−
                      B              166+    123+    102−
                      C              164+    164+    164+
                      D              86+     93−     108−
3      186            A              190+    190+    133+
                      B              190+    190+    85−
                      C              188+    188+    97−
                      D              152−    92−     >191
4      243            A              247+    89+     89+
                      B              88+     86+     86+
                      C              247+    116+    102−
                      D              102−    84+     81−
5      126            A              129+    129+    129+
                      B              129+    129+    91+
                      C              130−    130−    129+
                      D              81+     81+     81+
6      119            A              122+    122+    120+
                      B              122+    121+    110−
                      C              125−    125−    125−
                      D              91−     85+     93−
7      228            A              231−    229−    108−
                      B              231−    231−    85−
                      C              231−    231−    85−
                      D              >289    >289    >289
8      341 or 472     A              343−    343−    343−
                      B              485−    342−    340−
                      C              485−    485−    340−
                      D              >519    >519    397−

In tests 1–6, failure occurred in the inner raceway; in tests 7–8 it occurred in the outer raceway.
Feature sets A–D: see Table 5.1 for detailed information.
Model parameters: M1: ν = 0.01, γ = 0.01; M2: ν = 0.045, γ = 0.115; M3: ν = 0.22, γ = 0.29.
*Detection time symbols: “+” BPFIrss is relatively larger than BPFOrss;
“−” BPFIrss is relatively smaller than BPFOrss.


(a) BPFIrss and BPFOrss feature values during test 8. (b) BPFOrss feature
values during the first 160 minutes of test 4.

Figure 5.10: Trend of condition indicators calculated from the enveloped
accelerometer signals for the SKF 61900 bearing.

since there might be a possibility that a first defect occurred at 341 minutes.
Instead of five consecutive anomalous points out of five consecutive points being
used as the alarm level, four consecutive anomalous events now detect the fault.


This selection was made based on the results seen in Figure 5.9. As mentioned
earlier, the detection models using feature sets C and D cannot be employed for
identification without any further analysis. For this reason, next to the detection
time a symbol was added in Table 5.3 to indicate where the fault is. The symbol
“+” means that the fault is estimated to be in the inner raceway and the symbol
“−” means that it is estimated to be in the outer raceway. These symbols were
calculated using the following procedure.

1. Calculate the median value of each defect frequency peak (i.e. each peak
of the inner and outer defect frequencies and their harmonics) during the
training period.

2. Sum together the medians of the defect frequencies related to the inner race
to obtain Ipeak_sum_of_median, and do the same for the outer race
frequencies to obtain Opeak_sum_of_median.

3. Once the fault has been detected using the OCSVM, calculate the mean of
the points in the evaluated sliding window, i.e. the mean of five measurement
points (Ipeak_sum_of_mean and Opeak_sum_of_mean).

4. If the ratio between Ipeak_sum_of_mean and Ipeak_sum_of_median is larger than that
between Opeak_sum_of_mean and Opeak_sum_of_median, the symbol is “+”, and if
not, the symbol is “−”.
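Steps 1–4 can be sketched as follows, with hypothetical peak matrices (rows = measurements, columns = defect-frequency peaks) as inputs:

```python
import numpy as np

def fault_location_symbol(i_train, o_train, i_window, o_window):
    """Steps 1-2: sum the per-peak medians over the training period;
    step 3: mean of the summed peaks over the five-point sliding window;
    step 4: '+' (inner race) if the inner ratio dominates, else '-'."""
    i_base = np.median(i_train, axis=0).sum()
    o_base = np.median(o_train, axis=0).sum()
    i_now = np.sum(i_window, axis=1).mean()
    o_now = np.sum(o_window, axis=1).mean()
    return "+" if i_now / i_base > o_now / o_base else "-"

# Toy check: the inner race peaks grow fivefold while the outer race
# peaks stay at their training level -> an inner race fault ("+").
i_train = o_train = np.ones((20, 3))
print(fault_location_symbol(i_train, o_train,
                            5 * np.ones((5, 3)), np.ones((5, 3))))  # "+"
```

Comparing ratios against the training-period medians, rather than absolute levels, keeps the symbol insensitive to the overall vibration level of a particular bearing.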

When studying Table 5.3, one can observe that all the inner race faults were
detected at the correct time using feature set A and model 1. However, when
comparing feature sets A and B, one can see that the outer race-sensitive features
worked almost as well as the inner race-related ones. Furthermore, during test 4
the outer race features caused a false alarm at a time of 88 minutes. A possible
reason for this false alarm can be seen in Figure 5.10b, where the trend of the
outer race-related features is shown. It is most likely the case that, after the
training period, the running-in smoothing occurred around 88 minutes and all the
values dropped below the nominal baseline value. For this reason the OCSVM
trained detection model shows this as a collective anomaly. However, during this
period the system was most likely running even better than it had been doing
at the beginning. This phenomenon is likely to happen when the training period


cannot cover all the nominal situations of the system and can be considered as a
disadvantage of the method.
By comparing the models trained with input feature set C with those trained
with sets A and B, similar results can be seen. With set C and model 1 (M1),
detection is achieved in all the tests almost at the same time as it is achieved
with models only trained with features related to the fault that caused the defect.
Moreover, when comparing the detection time with the real failure time, one can
observe that the detection was achieved almost always shortly after the defect
had been initiated. The only exception was in test 8, where the detection was
made 13 minutes later than the optimal time. However, even during this test, the
detection took place at the same time as the detection achieved with feature set
B. When studying the symbols indicating the fault location for all the detection
models, one can observe that the identification was almost always successfully
accomplished when the detection time was near the actual failure time. It was
only during test 5, using models 1 and 2 with feature set C, that the fault was
estimated to be in the outer raceway while in reality it was in the inner raceway.
As can be seen in Table 5.3, the models using feature set D performed poorly.
The detection time was mostly wrong, and occasionally it could not detect the
failure at all (e.g. test 7). The initial idea of using this model was to be able to
ignore all the changes which would affect both the inner and outer race-related
features at the same time. However, at least for the setup studied in the present
research, where the speed, noise and operation attributes remained similar, these
types of features behaved poorly.
Overall, using one OCSVM and features extracted from enveloped signals filtered at specific defect frequencies seems to provide a viable solution for estimating the time of the initial bearing damage. Identifying the location of the damage was more difficult, since the features were also sensitive to failure types other than those they were meant to find. The reason
for this seems to be that the whole noise level of the enveloped signal increases
and, consequently, every model finds points after the defect to be anomalous. One
possible method for performing the fault identification after the detection is to
take the baseline median and compare it to the mean value during the detection.
However, no clear advice can be provided as to which method should be used at


this point. The benefit of comparing the baseline median to the current mean is
the simplicity of this approach. By comparing the laboratory test results with the
results from the wind turbine case study, the following similarities and differences
can be found.

• In both cases, good models were achieved when the ν parameter value was low. This suggests that it is easier to find good models by changing the γ value and selecting an appropriate number of consecutive outliers required before the model declares the presence of a fault.

• In both cases, using the OCSVM detection was successful when spalling
occurred.

• No reliable feature sets were found which could avoid false alarms and ac-
complish significantly earlier detection of faults.

• By comparing models trained using only inner and outer race-related fea-
ture sets, it was not possible to identify the fault location and differentiate
between inner and outer raceway failures.
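The detection rule summarised in these bullet points — an OCSVM trained on nominal data with a low ν, a tuned γ, and an alarm raised only after a chosen number of consecutive outliers — can be sketched as follows using scikit-learn's OneClassSVM. The parameter values are illustrative placeholders, not the tuned values from Papers A and B.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_detector(train_features, nu=0.01, gamma=0.1):
    # A low nu keeps the fraction of training outliers small; gamma and
    # the consecutive-outlier count are then the main tuning handles.
    return OneClassSVM(kernel='rbf', nu=nu, gamma=gamma).fit(train_features)

def first_alarm(model, stream, k=5):
    """Return the index at which the k-th consecutive outlier is seen
    (the collective-anomaly rule), or None if no alarm is raised."""
    run = 0
    for i, x in enumerate(stream):
        run = run + 1 if model.predict(x.reshape(1, -1))[0] == -1 else 0
        if run >= k:
            return i
    return None
```

With this rule a single spurious outlier does not trigger an alarm, which mirrors the sliding-window evaluation used in the laboratory tests.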

However, it is worth mentioning that these laboratory tests are not ideal case
studies for detecting small initial defects, since the size of the bearing was rather
small (with an outer diameter of 22 mm) and the sensor was located near the
bearing. The reason why a small bearing might behave differently from a large
bearing is the different ratios between the initial defect and the ball size for
these two types of bearing. The metal grain size is the same for the big and the
small bearing, but the contact area for a large ball is greater and creates smaller
vibration levels. Therefore, it would be interesting to produce a similar dataset
using a larger bearing for comparison.


5.3 Results and discussion related to RQ 2


RQ 2 How can operation changes be detected using vibration data together with
a data mining clustering algorithm for machinery diagnostic and prognostic
purposes?

In the study presented in Paper C, a method called the variational Bayesian Gaussian mixture model (VBGM) was investigated concerning its use for detecting the estimated operation modes of an underground loader using vibration data acquired for monitoring the condition of the front axle. The advantage of this method over similar methods (for instance the EM algorithm; see Section 2) is that, according to its developers [Schölkopf et al., 2001], it can reduce the number
of clusters if the number of clusters initially selected is too high. It is a parametric
approximation model which is reasonably fast for online use and can be used for
large-scale assets. Therefore, it should be very suitable for use in the era of M2M
communication when the operation modes need to be defined separately for each
component. However, the disadvantage of this method is that it cannot select
the best features automatically, and therefore the results are highly dependent
on the selected input features. It is most likely that for each case there is some
feature set which should work extremely well, and that an extensive amount of
manual work is required before this set can be found. In the study presented in
Paper C, the input features selected were time domain features extracted from
the vibration signals. The assumption was that these time domain features (the
RMS, peak, skewness and kurtosis) can detect the general change of appearance
in the vibration signals, which will be transformed when the operation mode is
changing from one to another. Furthermore, the speed of the Cardan axle was
selected as one of the inputs, since it was shown to be a clear indicator of many of
the operation modes sought (see Paper C for more details). The VBGM algorithm
was run three times by changing the day when the vibration signal was acquired.
The settings were kept at default values and the initial K value was set to ten.
Some of the results can be seen in Figure 5.11. One can observe that there are
some changes in the position between each cluster found when comparing the
results for the different days with one another, but overall the results were found
to be promising. However, two problems were encountered which were related to


the number of clusters and to the fact that one has no knowledge of what each
cluster
represents. For this reason, the clusters were “infected” using a smaller set of data where the operation mode was known, in order to determine whether a major part of these data instances would fall into one particular cluster. Even
though some part of the data needs to be labelled, this method still has some
advantages over supervised methods. For instance, there is a possibility that
these labelled data may not be distributed over a certain cluster, which would
be a sign that there are some unknown operation modes which have not been
considered and re-evaluation can be performed. Moreover, it may be possible to
merge some of the clusters if the labelled data are distributed evenly in a few
clusters. However, sometimes this can also be an indicator that the considered operation mode should be separated into two different groups instead of being kept as one group.
In Figure 5.13, the model 1 clusters for data collected on a Tuesday are labelled and similar clusters have been merged using the collected infection data (see Paper C for more details). The results indicate that the final model, using the RMS together with the speed, was able to detect the different operation regimes.
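As a rough sketch of the clustering and infection steps, scikit-learn's BayesianGaussianMixture can stand in for the VBGM used in Paper C: with a Dirichlet process prior it likewise drives the weights of superfluous components towards zero when the initial K is set too high. The majority-vote labelling below is the present author's illustration of the infection idea, not the exact implementation from the paper.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_modes(X, k_init=10, seed=0):
    # Components that are not needed receive near-zero weights, so the
    # effective number of clusters can end up below k_init.
    return BayesianGaussianMixture(
        n_components=k_init,
        weight_concentration_prior_type='dirichlet_process',
        random_state=seed).fit(X)

def infect_clusters(model, X_labelled, labels):
    """Name each cluster by a majority vote of a small labelled
    'infection' dataset. Clusters attracting no labelled data are left
    unnamed and may indicate unseen operation modes."""
    assign = model.predict(X_labelled)
    names = {}
    for c in np.unique(assign):
        votes = [labels[i] for i in range(len(labels)) if assign[i] == c]
        names[c] = max(set(votes), key=votes.count)
    return names
```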


[Figure 5.11 shows six 3-D scatter panels: Model 1 (RMS-V against RMS-H and Speed) and Model 2 (Kurtosis-V against Kurtosis-H and Speed), each for Tuesday, Wednesday and Thursday.]

Figure 5.11: Results obtained using the VBGM method for data collected on
separate days. The speed is the rotation speed of the Cardan axle and the features
were obtained from acceleration sensors mounted vertically and horizontally on
the front axle. The initial K value was 10. All the values were normalized using
the z-score method.


[Figure 5.12 panels: (a) Data before clustering. (b) Data being clustered into separated groups. (c) Infecting groups by using a smaller set of data collected during a specific time period. (d) Clusters with correct labels.]

Figure 5.12: Proposed method for determining operation groups by using unsupervised clustering.

[Figure 5.13 shows a 3-D scatter plot (RMS-V against RMS-H and Speed) with the merged clusters labelled Loading, Transit (AUTO), Transit (MAN), Hauling, Idle and Unknown (Noise).]

Figure 5.13: Final model where clusters (model 1 Tuesday) have been merged
based on the collected infection data.


5.4 Results and discussion related to RQ 3


RQ 3 How can less ad hoc condition indicators be extracted from vibration signals
by using failure data?

RQ 3 is answered in Papers D and C. The method proposed in these papers


uses a Random Forest and provides many benefits which make it possible to
reduce the feature size and select only those features which are sensitive enough
to find the fault in question [Cutler and Zhao, 2001]. One of the benefits of using
a Random Forest is that this approach does not require a large amount of fine
tuning and adjustments to obtain a satisfactory model. This is an important
prerequisite when selecting a machine learning method; it should be possible to
use such a method on several occasions without spending too much time adjusting
it to obtain good results [Cutler and Zhao, 2001].
In order to reduce the need for ad hoc adjustments, a method was proposed
in Paper D which takes into account the load and where the Random Forest
is trained in several situations and the results are then compared. The pro-
posed method should reduce the number of bad features which only appear ef-
fective because system resonances or other non-fault-related factors are changing
when comparing healthy- and faulty-condition datasets. Figure 5.14 presents a
flowchart describing how this should be accomplished. The separation and se-
lection of load zones or other non-fault-related factors can be achieved using the
method presented in Paper C (see Figure 5.12) or other similar methods, depend-
ing on the system. For instance, if the load zones can be measured accurately
with a load sensor and the operation of the system is well known, pre-processing
of the data using the method presented in Paper C is not needed.
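The comparison of top features across load zones can be sketched as below. Paper D uses Random Survival Forests with a minimal-depth threshold; as a simplified stand-in, this sketch ranks features with scikit-learn's RandomForestClassifier and impurity-based importances, so it illustrates the intersection idea of Figure 5.14 rather than reproducing the exact method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def top_features(X_nominal, X_faulty, keep=10, seed=0):
    # Train healthy-vs-faulty within a single load zone and keep the
    # highest-ranked features.
    X = np.vstack([X_nominal, X_faulty])
    y = np.r_[np.zeros(len(X_nominal)), np.ones(len(X_faulty))]
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    return set(np.argsort(rf.feature_importances_)[::-1][:keep])

def load_indifferent_features(zones, keep=10):
    """zones: list of (X_nominal, X_faulty) pairs, one per load zone.
    Returns the intersection of the per-zone top features; an empty set
    suggests that no load-indifferent feature exists for this fault."""
    tops = [top_features(Xn, Xf, keep) for Xn, Xf in zones]
    return set.intersection(*tops)
```

A feature that only appears effective because of a load-dependent resonance will rank highly in one zone but not in the others, and so drops out of the intersection.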
The method was tested with the following seeded faults: angular misalign-
ment, offset misalignment, a partially broken gear tooth and uniformly macro-
pitted gear wear. The feature set was extracted using wavelet analysis where
several Morlet mother wavelet bandwidths were tested. The results can be seen in Tables 5.4 and 5.5.
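The feature extraction can be sketched as below, assuming the common definition of the complex Morlet wavelet, psi(t) = (pi*fb)^(-1/2) exp(2j*pi*fc*t) exp(-t^2/fb), with bandwidth parameter fb and centre frequency fc; the transform is implemented directly with NumPy so that the role of fb is explicit. The scale grid and parameter values are illustrative, and the pseudo-frequency follows the footnote of Table 5.6.

```python
import numpy as np

def morlet_cwt(signal, scales, fb, fc):
    """Minimal CWT with the complex Morlet wavelet
    psi(t) = (pi*fb)**-0.5 * exp(2j*pi*fc*t) * exp(-t**2/fb)."""
    rows = []
    for a in scales:
        half = max(1, int(np.ceil(4 * a * np.sqrt(fb))))  # Gaussian support
        t = np.arange(-half, half + 1) / a
        psi = (np.pi * fb) ** -0.5 * np.exp(2j * np.pi * fc * t - t ** 2 / fb)
        # Correlate the signal with the scaled, conjugated wavelet.
        rows.append(np.convolve(signal, np.conj(psi)[::-1], mode='same')
                    / np.sqrt(a))
    return np.array(rows)

def wavelet_features(signal, scales, fb, fc=1.0, fs=1.0):
    """RMS, peak and entropy of the coefficient magnitudes at each scale,
    plus the pseudo-frequency P-freq = centre frequency * fs / scale."""
    mag = np.abs(morlet_cwt(signal, scales, fb, fc))
    rms = np.sqrt((mag ** 2).mean(axis=1))
    peak = mag.max(axis=1)
    p = mag / mag.sum(axis=1, keepdims=True)  # per-scale distribution
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    pfreq = fc * fs / np.asarray(scales, dtype=float)
    return rms, peak, entropy, pfreq
```

In practice a library implementation (e.g. the 'cmor' family in PyWavelets) would normally replace the hand-rolled transform above.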
Studying the threshold values shown in Tables 5.4 and 5.5, we can observe
that no specific loads were better than others at detecting the gear faults. Top
variables for macro-pitting were found when the median load was around 100


[Figure 5.14 flowchart: raw vibration signals (nominal and faulty) are first separated, e.g. by operation mode or load zone; a continuous wavelet transform (complex Morlet, with tuned scales and bandwidth parameter) is applied and statistical parameters (e.g. the RMS, peak and entropy values of each coefficient) are extracted; Random Survival Forests (with node size, mtry and number of trees as parameters) rank the features, and the top features are compared across zones until a non-zero union is found.]

Figure 5.14: Flowchart for selecting features and the parameters which need to
be considered.

Nm for all the low fb values, but never in all the investigated load zones. This
would indicate that, when using the given wavelet analysis, there is no benefit
to be gained from separating the data into their own load zones, since common
variables cannot be found.
It appears that features based on the entropy and peak values are very sensi-
tive to load change, as many of the intersected variables are based solely on the
RMS values (see Table 5.6). This suggests that the RMS is a better feature than
the entropy or peak when the load is changing.
One disadvantage of using a Random Forest is that evaluating the model size
can be a slow process. Another disadvantage is that it is a black box approach and
even excellent results can be hard to validate [Cutler and Zhao, 2001]. Therefore,
the present author recommends that this type of technique should not be used
every time a fault detection system is needed, but only on those occasions when
previously tried models with common features (see Papers A and B) have failed
and have been unable to detect the fault.


Table 5.4: Random Forest results using features extracted by using complex
Morlet wavelet.
Bandwidth Distribution of top variables
Fault type Load No. top variables Depth threshold RMS Peak Entropy
50 Nm 8 4.3283 8 0 0
Angular misalignment 100 Nm 1 3.474 1 0 0
150 Nm 112 4.3613 61 28 23
50 Nm 76 1.0545 54 11 11
Offset misalignment 100 Nm 0
150 Nm 84 2.3292 57 14 13
fb = 0.01
50 Nm 9 3.2947 7 1 1
Tooth failure, gear 100 Nm 90 1.29 56 16 18
150 Nm 9 4.2133 8 0 1
50 Nm 0
Macro-pitting, gear 100 Nm 257 4.3405 107 86 64
150 Nm 0
50 Nm 8 4.3476 8 0 0
Angular misalignment 100 Nm 102 3.0766 80 7 15
150 Nm 111 4.2891 61 21 29
50 Nm 3 1.4849 3 0 0
Offset misalignment 100 Nm 0
150 Nm 94 2.4445 56 17 21
fb = 0.05
50 Nm 8 3.1577 6 0 2
Tooth failure, gear 100 Nm 86 1.2929 59 6 21
150 Nm 87 4.4262 51 15 21
50 Nm 199 3.0884 100 48 51
Macro-pitting, gear 100 Nm 269 4.9426 106 88 75
150 Nm 0
50 Nm 4 4.1741 2 0 2
Angular misalignment 100 Nm 105 3.3058 83 10 12
150 Nm 129 4.9589 65 29 35
50 Nm 79 1.0297 45 10 24
Offset misalignment 100 Nm 0
150 Nm 90 2.4029 63 13 14
fb = 0.1
50 Nm 100 2.4745 66 9 25
Tooth failure, gear 100 Nm 0
150 Nm 91 4.3337 50 17 24
50 Nm 0
Macro-pitting, gear 100 Nm 256 5.1772 108 77 71
150 Nm 0
50 Nm 111 5.0662 60 19 32
Angular misalignment 100 Nm 117 2.37 66 11 40
150 Nm 86 4.106 59 10 17
50 Nm 0
Offset misalignment 100 Nm 109 3.4691 59 15 35
150 Nm 115 2.3642 64 17 34
fb = 0.5
50 Nm 102 2.1907 69 11 22
Tooth failure, gear 100 Nm 0
150 Nm 4 3.3228 2 1 1
50 Nm 0
Macro-pitting, gear 100 Nm 210 3.029 93 62 55
150 Nm 0


Table 5.5: Random Forest results using features extracted by using complex
Morlet wavelet.
Bandwidth Distribution of top variables
Fault type Load No. top variables Depth threshold RMS Peak Entropy
50 Nm 11 4.25 11 0 0
Angular misalignment 100 Nm 107 2.4421 59 16 32
150 Nm 86 4.328 53 8 25
50 Nm 0
Offset misalignment 100 Nm 113 3.46 59 18 36
150 Nm 126 2.478 65 22 39
fb = 1
50 Nm 109 3.0175 75 12 22
Tooth failure, gear 100 Nm 116 1.0774 79 9 28
150 Nm 94 3.3357 60 16 18
50 Nm 0
Macro-pitting, gear 100 Nm 0
150 Nm 0
50 Nm 8 4.1441 7 0 1
Angular misalignment 100 Nm 5 3.1009 2 0 3
150 Nm 88 4.153 51 9 28
50 Nm 83 1.2533 47 24 12
Offset misalignment 100 Nm 0
150 Nm 0
fb = 3
50 Nm 96 3.3474 67 18 11
Tooth failure, gear 100 Nm 108 1.146 80 16 12
150 Nm 92 3.4083 60 13 19
50 Nm 0
Macro-pitting, gear 100 Nm 0
150 Nm 0
50 Nm 9 4.1637 9 0 0
Angular misalignment 100 Nm 5 3.1195 3 1 1
150 Nm 86 4.084 51 9 26
50 Nm 88 1.305 53 25 10
Offset misalignment 100 Nm 0
150 Nm 0
fb = 5
50 Nm 96 3.4203 61 27 8
Tooth failure, gear 100 Nm 105 1.2324 78 19 8
150 Nm 82 3.4183 50 13 19
50 Nm 0
Macro-pitting, gear 100 Nm 0
150 Nm 0
50 Nm 87 4.4439 58 25 4
Angular misalignment 100 Nm 6 3.2563 5 1 0
150 Nm 81 4.0654 47 12 22
50 Nm 81 1.4162 47 27 7
Offset misalignment 100 Nm 0
150 Nm 0
fb = 10
50 Nm 112 3.4673 64 31 17
Tooth failure, gear 100 Nm 93 1.3358 66 23 4
150 Nm 98 3.3591 58 19 21
50 Nm 0
Macro-pitting, gear 100 Nm 0
150 Nm 0


Table 5.6: Intersection of top variables and best variables found after running the
Random Forest algorithm with a changing load.
TOP 5 Variables*
Bandwidth Fault type No. common variables 1 2 3 4 5
P-freq
Angular misalignment 0
Feature
P-freq
Offset misalignment 0
Feature
fb = 0.01
P-freq 15,8
Tooth failure, gear 1
Feature RMS
P-freq
Macro-pitting, gear 0
Feature
P-freq 1181,5 1670,9 1701,9 3974,2 4047,9
Angular misalignment 6
Feature RMS RMS RMS RMS RMS
P-freq
Offset misalignment 0
Feature
fb = 0.05
P-freq 15,8 21,95
Tooth failure, gear 2
Feature RMS RMS
P-freq
Macro-pitting, gear 0
Feature
P-freq 3974 4048
Angular misalignment 2
Feature RMS RMS
P-freq
Offset misalignment 0
Feature
fb = 0.1
P-freq
Tooth failure, gear 0
Feature
P-freq
Macro-pitting, gear 0
Feature
P-freq 106,4 114,2 124,2 126,5 135,8
Angular misalignment 38
Feature RMS RMS RMS RMS RMS
P-freq
Offset misalignment 0
Feature
fb = 0.5
P-freq
Tooth failure, gear 0
Feature
P-freq
Macro-pitting, gear 0
Feature
P-freq 271,5
Angular misalignment 1
Feature RMS
P-freq
Offset misalignment 0
Feature
fb = 1
P-freq 114,2 124,2 178,9 496,8 506,0
Tooth failure, gear 28
Feature RMS RMS Ent. RMS RMS
P-freq
Macro-pitting, gear 0
Feature
P-freq
Angular misalignment 0
Feature
P-freq
Offset misalignment 0
Feature
fb = 3
P-freq 114,2 496,8 1086,1 2024,0 2583,2
Tooth failure, gear 23
Feature RMS RMS RMS RMS RMS
P-freq
Macro-pitting, gear 0
Feature
P-freq 1291,6
Angular misalignment 1
Feature RMS
P-freq
Offset misalignment 0
Feature
fb = 5
P-freq 2810,2 2862,3 3072,0 3653,2 5166,5
Tooth failure, gear 15
Feature RMS RMS RMS RMS RMS
P-freq
Macro-pitting, gear 0
Feature
P-freq 1182 1203 1292 1292 17378
Angular misalignment 5
Feature RMS RMS RMS Peak RMS
P-freq
Offset misalignment 0
Feature
fb = 10
P-freq 417,7 496,8 702,5 715,6 851,0
Tooth failure, gear 17
Feature RMS RMS RMS RMS RMS
P-freq
Macro-pitting, gear 0
Feature
*If more than 5 common features were found, the order was calculated based on the load group with the highest load.
The pseudo-frequency (P-freq [Hz]) was calculated using the equation P-freq = center frequency × sample rate / scale.


5.5 Results and discussion related to RQ 4


RQ 4 How does one develop a framework where a smooth transition from diag-
nostic to prognostic approaches is possible by taking into account issues
such as the amount of collected data, the current and predicted future use
of the system, and known physical relations between the wear process and
load?

The final framework can be seen in Figure 5.15. In this framework, there are
the following three phases:“start”, “analyse and create database” and “advanced
prognostic approach”. “Start” is the phase which is run every time events have
occurred, e.g. stoppages or changes in the system or in its operation. The idea in
this connection is to keep the database valid without causing extensive manual
work.

Start -phase
In the “start” phase, the first task is to define the criticality of each monitored
component and the system. As discussed in Papers A and B, once this criticality
has been defined, it can give some indications as to how to adjust the detection
algorithms for the given component or system. FMECA and risk analysis seem to
be two of the few viable practical solutions, as stated in the ISO standard for con-
dition monitoring [ISO 17359, 2011]. However, criticality estimation can be based
on either qualitative or quantitative approaches, as discussed in Paper A. Quan-
titative methods seem to be the methods which should be preferred. However,
these require extensive historical data on breakdowns, which may not be avail-
able, and, if they are available, their analysis requires thorough pre-processing.
For this reason, criticality estimation may in practice be based on qualitative
approaches in the beginning. Although qualitative approaches may be less ac-
curate, they have one advantage over quantitative approaches, namely that they
can engage the owner of the asset in estimating the criticality of the system more
comprehensively, which can be beneficial if the detection algorithm is wrongly
adjusted because the criticality was wrongly estimated in the beginning. Nev-
ertheless, estimating the criticality will require some manual work. The second


task in the proposed framework is to define when a process cycle is ending. This
task can be based on the experience of experts alone, or it can be supported by
other data sources, such as the CMMS system, where accurate information about
the process is stored reliably. This is important for novelty detection algorithms,
which are trained to map the nominal space and generate an alarm when features
drift away from this mapped space, since, if all the operation modes are not seen
during the training phase, the detection algorithm may cause false alarms when
unseen operation modes are occurring. Once this cycle is completed, one can
decide either to use the whole dataset for training the detection algorithms or to
separate it using, for instance, a method explained in Paper C. For prognostic
purposes, it can be important to segment each operation mode of a whole process
cycle, which in many approaches can improve the RUL estimation with rather
small efforts. In this proposed approach the training period can be the same for
novelty detection and operation mode clustering. However, the collected training
data should not be the same. For instance, novelty detection should be trained
with features which are sensitive to certain faults or are able to detect all the
possible fault types, if they cannot be known in advance. In contrast, operation
mode clustering features should be sensitive to operational changes and insensi-
tive to small defects such as cracks or minor component wear. If time domain
vibration features such as the RMS are used for separating operation modes in
the beginning, the same feature set cannot be used as an input for the novelty
detection algorithm.

Analyse and create database -phase


In the second phase, estimations should be performed to select the most efficient
diagnostic and prognostic approach when the system is no longer new and there
are records of failures and events available. Here there should most likely be a
transition from using unsupervised techniques, such as those used in Papers A and
B, to using supervised ones, such as those used in Paper E. The reason why one
should start using supervised approaches, even though unsupervised approaches
may have worked initially, is that supervised approaches can optimize the model
more accurately by using a cross-validation or other similar procedures. However,


this transition does not have to happen immediately and in the beginning both
unsupervised and supervised methods can work in parallel. Once the first failure
data have been seen and have been recorded in the database, they can be used
for tuning the parameters of the novelty detection algorithm by calculating the
sensitivity and specificity of the model (as explained in Paper A) and by select-
ing tolerable values keeping in mind the criticality of the system. Later, this
model can be used together with the supervised methods, such as the SVM, for
identifying future faults.
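The tuning step described here — computing sensitivity and specificity against the first recorded failures and selecting tolerable values in view of the system's criticality — can be sketched as follows. The criticality classes and their minimum sensitivities are invented for illustration; in practice they would follow from the FMECA or risk analysis of the first phase.

```python
def sensitivity_specificity(pred, truth):
    """pred, truth: sequences of 0 (healthy) / 1 (faulty)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical criticality classes mapped to the lowest tolerable sensitivity.
MIN_SENSITIVITY = {'A': 0.99, 'B': 0.95, 'C': 0.90}

def select_model(candidates, truth, criticality):
    """candidates: {name: predictions}. Among models whose sensitivity
    meets the criticality requirement, pick the one with the highest
    specificity (i.e. the fewest false alarms)."""
    best, best_spec = None, -1.0
    for name, pred in candidates.items():
        sens, spec = sensitivity_specificity(pred, truth)
        if sens >= MIN_SENSITIVITY[criticality] and spec > best_spec:
            best, best_spec = name, spec
    return best
```

For a highly critical component a missed detection is intolerable, so a model with some false alarms may still be preferred; for a less critical one, the specificity requirement can dominate.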
The aim of the second phase of the framework is to estimate the quality of the
data in order to select optimal diagnostic and prognostic approaches. For some
systems, records might be available concerning failures seen and data acquired
on faults. In these cases, supervised classification algorithms can even be used at
the beginning instead of unsupervised algorithms. For prognostic purposes, these
historical data can help one to decide what type of algorithm can be used. In the
research for this thesis, prognostic techniques have not been studied or analysed.
However, studies performed by Roemer et al. [2006] and Wesley Hines and Usynin
[2008] deal with these techniques and categorize them into four classes. Empirical-
model-based prognostics (EMBP) is a technique that was not included in their
four classes. EMBP differs from failure-data-based prognostics in the way in
which the data have been acquired. In EMBP the data for the models are collected
in laboratory conditions, whereas in failure-data-based prognostics, the data are
collected from the monitored system. Most likely EMBP is the only possible
prognostic approach when there are no historical failure data in the beginning or
there are no physical models available to estimate the failure propagation.

Advanced prognostic approach -phase


The third phase concerns selection of the optimal prognostic approach. The pre-
requisite for more accurate approaches is a correct estimation of the future use of
the system. Two important factors for achieving this are knowledge of the period
of time for the usage and knowledge of the stresses during the usage. It is most
likely that in the future, when embedded smart sensors (see for instance [SKF
AB, 2013]) will be installed in systems, it will be possible to know the stresses


in some components of the system quite accurately. However, this will create a
greater need for techniques such as those explained in Paper C, i.e. techniques for
separating the operation modes and for estimating the future use for each mode
separately. When there are embedded sensors available for a certain system, there
may be a way to calculate the stress directly or indirectly for that system using
process parameters as an input. In cases such as this, the data can be sepa-
rated based directly on the modes which can predict the level of stress instead
of being separated based on all the operation modes. If the relation between the
measured parameters and the stress is unknown, it may also be possible to use
physical models where the stresses in a specific point can be estimated using vir-
tual sensors when one possesses knowledge of some of the process parameters or
the power usage. However, the use of virtual sensors requires precise simulations
with accurate models and usually requires extensive prior work. Moreover, these
models should be kept up to date when changes are made to the system. For these
reasons, their usability is mainly limited to applications for critical systems, and
in the near future it is difficult to imagine that these types of models will be in
generic use.
The purpose of the proposed framework is to outline the interrelation of the individual methods discussed in Papers A–E and which other aspects need to be considered when creating functional PHM techniques. Note that many of the steps illustrated in the structure require multiple sub-steps, which are not discussed in detail. Therefore, the answer to research question 4 cannot be said to be complete. Nevertheless, the proposed framework can act as a rough guideline and a reference point for future research topics.


[Figure 5.15 flowchart, in three steps. Step 1, “Start”: define the criticality of the system and its monitored components (FMECA, risk analysis); collect training data until a process cycle is completed; separate the operation modes; then write the database and proceed to step 2, or reuse existing diagnostic models if neither the system nor its operation has changed. Step 2, “Analyse and create database”: depending on whether records of failures and events exist, use novelty detection or supervised (multiclass) classification with separation of the failure data; where physical models of fault propagation exist, create synthetic failure data or extrapolate records of censored data (empirical-model-based prognostics); update the database and proceed to step 3. Step 3, “Advanced prognostic approach”: estimate the future operation modes and separate the data; depending on whether the stress can be measured, estimated with a virtual sensor, or only judged qualitatively from past failure and event data, select stress-based, effects-based or failure-data-based prognostics.]

Figure 5.15: Flowchart of the proposed PHM approach; the operation mode and
failure mode estimations affect the selection of the prognostic approach.

Chapter 6

CONCLUSIONS AND FUTURE


WORK

6.1 Conclusions
The following conclusions can be made on the basis of the research performed for
this thesis to answer the four research questions (RQs) formulated.

RQ 1 How can rotating machinery fault detection be improved by considering


how critical the machinery is and its usage?

• The proposed use of a criticality index together with the OCSVM allows
easier parameter selection for the fault detection model by enabling one to
choose between under-fitted and over-fitted models more easily. Difficult questions still remain as to how to select a good combination of model tuning parameters and how to select the threshold for allowed outside points.

• It was found that OCSVM detection models using combined feature sets, in which inner race and outer race related features are used together, improved the detection of the inner race faults of the studied wind turbine bearing compared with traditional methods. As for the laboratory tests, almost all of them showed a detection time the same as or close to that obtained using separate inner and outer feature sets. This approach of combining features can reduce the number of detection models needed, since one model can identify two different fault types.


• No robust method was found for identifying the bearing fault location, either for the wind turbine bearing or for the bearing analysed in the laboratory, using the investigated OCSVM algorithm together with the features extracted using the enveloping method. Therefore, an additional step is needed in order to identify the correct fault location. Comparing the mean of each defect-related feature against its median value during the training period proved to be a good candidate for identifying the fault location after detection.

RQ 2 How can operation changes be detected using vibration data together with
a data mining clustering algorithm for machinery diagnostic and prognostic
purposes?

• Using the angular speed of the Cardan axle and the RMS on the horizontal
and vertical level made it possible to separate the operation modes of an
underground loader into different classes. Therefore, the operation mode
classes could be used as a pre-processing step which could define the time
and position for acquiring the diagnostic data. Later, these operation modes
could help one to select and use an appropriate RUL estimator.

• The use of statistical features such as kurtosis, skewness and peak values,
which were extracted from the vibration signals, failed to detect the oper-
ation regimes of an underground loader using the proposed method.

RQ 3 How can less ad hoc condition indicators be extracted from vibration signals
by using failure data?

• Using Random Forests to select top variables and comparing the results
with separated modes can help in selecting good load-indifferent features
generally not considered when selecting features manually.

• A wavelet bandwidth parameter (fb) shorter than the time between individual gear tooth impacts was successful in finding two top variables using the Random Forest; their pseudo-frequency was rather low when the fault in question was a partially broken gear tooth. When fb was increased slightly to cover two repetitive impacts, sensitivity was lost and no top variables were found.

When the pseudo-frequency was increased even further, top common variables
were found again, but their pseudo-frequency was much higher than that found
with low fb values.

• None of the top features found were load-indifferent when the fault in
  question was macro-pitting. Therefore, the load should remain the same during
  the training of a classifier and when acquiring the testing points for that
  particular fault type.

• Taking the intersection of the top variables where only the load changed
showed almost conclusively that in the proposed method shown in Paper D,
the RMS level of the wavelet-filtered vibration signals was a better feature
for classifying mechanical faults than the entropy or peak value.
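
The feature-ranking procedure summarized above can be illustrated as follows; the wavelet-band feature names, the synthetic data and the choice of top-k are hypothetical stand-ins for the setup of Paper D:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def top_variables(X, y, names, k=3, seed=0):
    """Rank features by Random Forest (Gini) importance, return top-k names."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [names[i] for i in order[:k]]

# Hypothetical wavelet-band features under two load levels; 'rms_band2' is
# made informative under both loads, mimicking a load-indifferent feature.
rng = np.random.default_rng(1)
names = ["rms_band1", "rms_band2", "peak_band1", "entropy_band1"]

def make_data(shift):
    X = rng.normal(size=(200, 4))
    y = (rng.random(200) > 0.5).astype(int)
    X[y == 1, 1] += shift  # the fault raises rms_band2
    return X, y

X_lo, y_lo = make_data(3.0)   # low load
X_hi, y_hi = make_data(2.5)   # high load
load_indifferent = set(top_variables(X_lo, y_lo, names)) & \
                   set(top_variables(X_hi, y_hi, names))
```

Taking the intersection of the top variables across load levels, as above, is what singles out features that remain informative when only the load changes.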

RQ 4 How does one develop a framework where a smooth transition from
diagnostic to prognostic approaches is possible by taking into account issues
such as the amount of collected data, the current and predicted future use
of the system, and known physical relations between the wear process and
load?

• The criticality class of the system and its components should be taken into
account when tuning a detection algorithm for the first time.

• When a decision has been taken to use supervised methods instead of
  unsupervised methods, there should be a smooth transition during which both
  techniques work in parallel. One option is to keep using unsupervised
  methods, such as OCSVM, and adjust the model tuning parameters using the
  labelled faulty data, instead of training a supervised classifier
  immediately after the first fault has occurred.
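
One way to realise this transition step is sketched below: the OCSVM remains the unsupervised detector, trained on healthy data only, but its tuning parameters nu and gamma are selected with the help of the first labelled faulty samples. The data and the parameter grid are hypothetical:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Healthy data for unsupervised training/validation, plus the first
# labelled faulty samples (all values hypothetical).
rng = np.random.default_rng(2)
healthy_train = rng.normal(0.0, 1.0, size=(300, 2))
healthy_val = rng.normal(0.0, 1.0, size=(100, 2))
faulty = rng.normal(5.0, 1.0, size=(30, 2))

best = None
for nu in (0.01, 0.05, 0.1):
    for gamma in (0.01, 0.1, 1.0):
        model = OneClassSVM(nu=nu, gamma=gamma).fit(healthy_train)
        fpr = np.mean(model.predict(healthy_val) == -1)  # false alarm rate
        tpr = np.mean(model.predict(faulty) == -1)       # detected faults
        if best is None or tpr - fpr > best[0]:
            best = (tpr - fpr, nu, gamma)
score, nu, gamma = best
```

The detector is still trained on healthy data only; the labelled faults are used merely to choose between candidate models, postponing the switch to a fully supervised classifier.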

6.2 Future work


When developing a diagnostic and prognostic technique combining several types
of information sources and PHM methods in the future, especially when using
smart sensors, top priority should be given to the following two objectives: firstly,
making the technique as generic as possible and, secondly, making the transition
from diagnostics to prognostics as easy as possible so that the technique can be
upgraded, if necessary. The first objective implies, for instance, that the
technique developed should be capable of detecting bearing faults in different
systems which resemble each other, in order to minimize the time and manual
work required for implementing the method. In connection with the second objective, it
should be noted that the primary demand for many systems is merely to know
when faults have occurred without any estimation of the RUL. However, more
effort should be invested in developing techniques where the transition from
diagnostics to prognostics requires less effort. Moreover, one should strive to
develop techniques where the time spent on training and gathering the
diagnostic data can be partially reused in prognostics, not only by using a
triggering method to determine the best possible time for gathering the
diagnostic data, but also by collecting more data and then distinguishing the
best time for performing the diagnostic step. Moreover, methods which minimize
the need to consider multiple tuning parameters and additional rules during
the PHM process should be prioritized. It is also important to develop plug-and-play smart sensors which
can be used for detecting faults in systems or system components. Such sensors
have not previously played a role in CBM, and they can expand the possibilities
of CBM by providing knowledge of the state of the asset from much wider
perspectives. Furthermore, the data collected with plug-and-play smart sensors can
be combined and used for assessing the state of the critical parts of the system
more accurately by using data fusion techniques. The prognostic techniques
referred to as type 3 approaches in this thesis require knowledge of how the stress
varies during the operation cycle. However, in practice, using specific sensors
for measuring the load may be too expensive, and therefore indirect methods
are most likely the only or at least the first option and should be the focus of
more research in the future. However, this may lead to the use of approximation
techniques, which may never be as good as simpler techniques using direct sensor
readings. Nevertheless, approximation techniques can be implemented for several
applications even when the dataset is not perfect. The next practical step in the
future should be to conduct more degradation studies where, for instance, the
size of the spall of the bearing surface can be estimated using vibration analysis
or other analysis methods. The aim should be to find, for prognostic purposes,
good features which can provide accurate knowledge of the state of the faulty
system when variables such as the speed and load are changing. Later, these
results should be compared with the estimated operation modes in order to
determine the relation between each operation mode and the degradation speed. In
addition, further research needs to be performed on the implementation of the
approaches documented in this thesis for other real-world applications, in order
to verify these approaches and investigate whether they might have unexpected
shortcomings.

REFERENCES

Ala Al-Fuqaha, Mohsen Guizani, Mehdi Mohammadi, Mohammed Aledhari, and


Moussa Ayyash. Internet of things: A survey on enabling technologies, pro-
tocols, and applications. IEEE Communications Surveys & Tutorials, 17(4):
2347–2376, 2015. 6

R.L Allen and D. Mills. Signal analysis: time, frequency, scale, and structure.
Wiley InterScience, 1993. 17

Jerome Antoni. Fast computation of the kurtogram for the detection of transient
faults. Mechanical Systems and Signal Processing, 21(1):108–124, 2007. 18

Civil Aviation Authority. Intelligent management of helicopter vibration health
monitoring report. CAA Paper, 1:2011, 2011. 23

Hojat Heidari Bafroui and Abdolreza Ohadi. Application of wavelet energy and
shannon entropy for feature extraction in gearbox fault detection under varying
speed conditions. Neurocomputing, 133(0):437 – 445, 2014. ISSN 0925-2312.
doi: http://dx.doi.org/10.1016/j.neucom.2013.12.018. 17

E Baroth, W Powers, Jack Fox, B Prosser, J Pallix, K Schweikard, and J Zakra-


jsek. Ivhm (integrated vehicle health management) techniques for future space
vehicles. In 37th Joint Propulsion Conference and Exhibit, page 3523, 2001.
12

W. Bartelmus and R. Zimroz. Vibration condition monitoring of planetary gear-


box under varying external load. Mechanical Systems and Signal Processing,
23(1):246–257, 2009. doi: 10.1016/j.ymssp.2008.03.016. 17


N. Baydar and A. Ball. A comparative study of acoustic and vibration signals in


detection of gear failures using wigner-ville distribution. Mechanical Systems
and Signal Processing, 15(6):1091–1107, 2001. doi: 10.1006/mssp.2000.1338.
16

Mohammed Ben-Daya, Uday Kumar, and D.N. Prabhakar Murthy.


Condition-Based Maintenance, pages 355–387. John Wiley & Sons, Ltd,
2016. ISBN 9781118926581. doi: 10.1002/9781118926581.ch16. 11, 12

Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001. 20

V Camerini, G Coppotelli, and S Bendisch. Fault detection in operating helicopter


drivetrain components based on support vector data description. Aerospace
Science and Technology, 73:48–60, 2018. 23, 25

Center for Machine Learning and Intelligent Systems. Machine learning reposi-
tory, 2007. URL http://archive.ics.uci.edu/ml/about.html. 31

Faicel Chamroukhi, Allou Samé, Patrice Aknin, and Gérard Govaert. Model-
based clustering with hidden markov model regression for time series with
regime changes. In Neural Networks (IJCNN), The 2011 International Joint
Conference on, pages 2814–2821. IEEE, 2011. 25

Min Chen, Shiwen Mao, and Yunhao Liu. Big data: A survey. Mobile Networks
and Applications, 19(2):171–209, 2014. 25

Leon Cohen. Time-frequency distributions-a review. Proceedings of the IEEE,


77(7):941–981, 1989. 15, 17

Adele Cutler and Guohua Zhao. Pert-perfect random tree ensembles. Computing
Science and Statistics, 33:490–497, 2001. 74, 75

Tamraparni Dasu and Theodore Johnson. Exploratory data mining and data
cleaning, volume 479. John Wiley & Sons, 2003. 14, 30

Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood


from incomplete data via the em algorithm. Journal of the royal statistical
society. Series B (methodological), pages 1–38, 1977. 25


Pedro Domingos. A few useful things to know about machine learning.


Communications of the ACM, 55(10):78–87, 2012. 18, 19

Xianfeng Fan and Ming J. Zuo. Gearbox fault detection using hilbert and wavelet
packet transform. Mechanical Systems and Signal Processing, 20(4):966 – 982,
2006. ISSN 0888-3270. doi: http://dx.doi.org/10.1016/j.ymssp.2005.08.032. 17

Charles R Farrar and Keith Worden. Structural health monitoring: a machine


learning perspective. John Wiley & Sons, 2012. 7

Tom Fawcett. An introduction to roc analysis. Pattern recognition letters, 27(8):


861–874, 2006. 28

FEMTO. Femto-st franche comt electronique mcanique thermique et optique -


sciences et technologies, 2018. URL http://www.femto-st.fr/en/Research-
departments/AS2M/Research-groups/PHM/Pronostia. 35

Diego Fernández-Francos, David Martı́nez-Rego, Oscar Fontenla-Romero, and


Amparo Alonso-Betanzos. Automatic bearing fault diagnosis based on one-
class ν-svm. Computers & Industrial Engineering, 64(1):357–365, 2013. 23,
54

Junfeng Gao, Wengang Shi, Jianxun Tan, and Fengjin Zhong. Support vector
machines based approach for fault diagnosis of valves in reciprocating pumps.
In Electrical and Computer Engineering, 2002. IEEE CCECE 2002. Canadian
Conference on, volume 3, pages 1622–1627. IEEE, 2002. 21

Isabelle Guyon and André Elisseeff. An introduction to variable and feature


selection. The Journal of Machine Learning Research, 3:1157–1182, 2003. 19

P. Hanafizadeh, J. Eshraghi, A. Taklifi, and S. Ghanbarzadeh. Experimental


identification of flow regimes in gas-liquid two phase flow in a vertical pipe.
Meccanica, pages 1–12, 2015. doi: 10.1007/s11012-015-0344-4. 24

Ibrahim Abaker Targio Hashem, Ibrar Yaqoob, Nor Badrul Anuar, Salimah
Mokhtar, Abdullah Gani, and Samee Ullah Khan. The rise of big data on
cloud computing: Review and open research issues. Information Systems, 47:
98–115, 2015. 14


Aiwina Heng, Sheng Zhang, Andy CC Tan, and Joseph Mathew. Rotating ma-
chinery prognostics: State of the art, challenges and opportunities. Mechanical
systems and signal processing, 23(3):724–739, 2009. 12, 24

Melinda Hodkiewicz and Mark Tien-Wei Ho. Cleaning historical maintenance


work order data for reliability analysis. Journal of Quality in Maintenance
Engineering, 22(2):146–163, 2016. 14

Brian T Holm-Hansen and Robert X Gao. Smart bearing utilizing embedded


sensors: design considerations. In Proc. SPIE 4th International Symposium on
Smart Structures and Materials, pages 602–610, 1997. 6

SpectraQuest Inc. Spectraquest inc.,: Machinery fault simulators, 2018.


URL http://spectraquest.com/products/simulators/machinery-fault-
simulators. 35

ISO 13372:2012. Condition monitoring and diagnostics of machines – vocabulary,
2012. 5, 6

ISO 17359. Condition monitoring and diagnostics of machines - general guidelines.


2011. 29, 31, 79

ISO 281:2007. Rolling bearings – dynamic load ratings and rating life, 2007. 26

ISO 55000. Asset management – overview, principles and terminology, 2014. 4

David L Iverson. Inductive system health monitoring. 2004. 24

Anil K Jain. Data clustering: 50 years beyond k-means. Pattern recognition


letters, 31(8):651–666, 2010. 25

Andrew KS Jardine, Daming Lin, and Dragan Banjevic. A review on machin-


ery diagnostics and prognostics implementing condition-based maintenance.
Mechanical systems and signal processing, 20(7):1483–1510, 2006. 12, 14, 30


Patrick W Kalgren, Carl S Byington, and Michael J Roemer. Defining phm,


a lexical evolution of maintenance and logistics. In Autotestcon, 2006 IEEE,
pages 353–358. IEEE, 2006. 11

Khairy Ahmed Helmy Kobbacy and DN Prabhakar Murthy. Complex system


maintenance handbook. Springer Science & Business Media, 2008. 3

P Konar and P Chattopadhyay. Bearing fault detection of induction motor using


wavelet and support vector machines (svms). Applied Soft Computing, 11(6):
4203–4211, 2011. 21

Sotiris B Kotsiantis, Ioannis D Zaharakis, and Panayiotis E Pintelas. Ma-


chine learning: a review of classification and combining techniques. Artificial
Intelligence Review, 26(3):159–190, 2006. 21

J Lee, H Qiu, G Yu, and J Lin. Rexnord technical services, bearing data set, ims,
university of cincinnati, nasa ames prognostics data repository, 2007. 43

Yaguo Lei, Jing Lin, Zhengjia He, and Yanyang Zi. Application of an improved
kurtogram method for fault diagnosis of rolling element bearings. Mechanical
Systems and Signal Processing, 25(5):1738–1749, 2011. 18

Bo Li, M-Y Chow, Yodyium Tipsuwan, and James C Hung. Neural-network-


based motor rolling bearing fault diagnosis. IEEE transactions on industrial
electronics, 47(5):1060–1069, 2000. 21

C James Li and Jun Ma. Wavelet decomposition of vibrations for detection of


bearing-localized defects. Ndt & E International, 30(3):143–149, 1997. 17

Xu Li, Xunan Zhang, Chenchen Li, Li Zhang, et al. Rolling element bearing fault
detection using support vector machine with improved ant colony optimization.
Measurement, 46(8):2726–2734, 2013. 22

Yongxin Liao, Fernando Deschamps, Eduardo de Freitas Rocha Loures, and Luiz
Felipe Pierin Ramos. Past, present and future of industry 4.0-a systematic liter-
ature review and research agenda proposal. International Journal of Production
Research, 55(12):3609–3629, 2017. 4


Janet Lin, Thomas Nordmark, and Liangwei Zhang. Data analysis of heavy haul
wagon axle loads on Malmbanan line, Sweden: A case study for LKAB. Luleå
tekniska universitet, 2016. 24

Roland Löwe, Henrik Madsen, and Patrick McSharry. Objective classification of


rainfall in northern europe for online operation of urban water systems based
on clustering techniques. Water, 8(3):87, 2016. 24

Arnaz Malhi and Robert X Gao. Pca-based feature selection scheme for machine
defect classification. IEEE Transactions on Instrumentation and Measurement,
53(6):1517–1525, 2004. 14, 19

S Martin-del Campo and Fredrik Sandin. Online feature learning for condi-
tion monitoring of rotating machinery. Engineering Applications of Artificial
Intelligence, 64:187–196, 2017. 19

David Martı́nez-Rego, Oscar Fontenla-Romero, and Amparo Alonso-Betanzos.


Power wind mill fault detection via one-class ν-svm vibration signal analysis.
In Neural Networks (IJCNN), The 2011 International Joint Conference on,
pages 511–518. IEEE, 2011. 23

PD McFadden and JD Smith. Vibration monitoring of rolling element bearings


by the high-frequency resonance technique – a review. Tribology international,
17(1):3–10, 1984. 15

Bjoern H Menze, B Michael Kelm, Ralf Masuch, Uwe Himmelreich, Peter Bachert,
Wolfgang Petrich, and Fred A Hamprecht. A comparison of random forest
and its gini importance with standard chemometric methods for the feature
selection and classification of spectral data. BMC bioinformatics, 10(1):213,
2009. 20

SA Neild, PD McFadden, and MS Williams. A review of time-frequency methods


for structural vibration analysis. Engineering Structures, 25(6):713–728, 2003.
15

Abtin Nourmohammadzadeh and Sven Hartmann. Fault classification of a


centrifugal pump in normal and noisy environment with artificial neural


network and support vector machine enhanced by a genetic algorithm, pages


58–70. 2015. 22

Julian D Olden and Donald A Jackson. Illuminating the black box: a random-
ization approach for understanding variable contributions in artificial neural
networks. Ecological modelling, 154(1-2):135–150, 2002. 18

Alan V Oppenheim and Ronald W Schafer. From frequency to quefrency: A


history of the cepstrum. IEEE signal processing Magazine, 21(5):95–106, 2004.
15

Arvid. Palmgren. Ball and roller bearing engineering. SKF Industries, Inc., First
edition, 1945. 26

BA Paya, II Esat, and MNM Badi. Artificial neural network based fault di-
agnostics of rotating machinery using wavelet transforms as a preprocessor.
Mechanical systems and signal processing, 11(5):751–765, 1997. 21

Z.K. Peng and F.L. Chu. Application of the wavelet transform in machine condi-
tion monitoring and fault diagnostics: a review with bibliography. Mechanical
Systems and Signal Processing, 18(2):199 – 221, 2004. ISSN 0888-3270. doi:
http://dx.doi.org/10.1016/S0888-3270(03)00075-X. 17

Hai Qiu, Jay Lee, Jing Lin, and Gang Yu. Wavelet filter-based weak signature
detection method and its application on rolling element bearing prognostics.
Journal of sound and vibration, 289(4):1066–1090, 2006. 43

J. Rafiee, M.A. Rafiee, and P.W. Tse. Application of mother wavelet functions for
automatic gear and bearing fault diagnosis. Expert Systems with Applications,
37(6):4568 – 4579, 2010. ISSN 0957-4174. doi: http://dx.doi.org/10.1016/
j.eswa.2009.12.051. 17

RB Randall. A new method of modeling gear faults. Journal of Mechanical


Design, 104(2):259–267, 1982. 15

Robert B Randall and Jerome Antoni. Rolling element bearing diagnostics – a
tutorial. Mechanical systems and signal processing, 25(2):485–520, 2011. 15,
18


Robert Bond Randall. Vibration-based condition monitoring: industrial,


aerospace and automotive applications. John Wiley & Sons, 2011. 6, 14,
15, 17

Michael J Roemer, Carl S Byington, Gregory J Kacprzynski, and George Vacht-


sevanos. An overview of selected prognostic technologies with application to
engine health management. ASME Paper No. GT2006-90677, 2006. 25, 26, 28,
31, 81

Edson Ruschel, Eduardo Alves Portela Santos, and Eduardo de Freitas Rocha
Loures. Industrial maintenance decision-making: A systematic literature re-
view. Journal of Manufacturing Systems, 45:180–194, 2017. 20

Yvan Saeys, Thomas Abeel, and Yves Van de Peer. Robust feature selection
using ensemble feature selection techniques. In Joint European Conference
on Machine Learning and Knowledge Discovery in Databases, pages 313–325.
Springer, 2008. 20

Bhaskar Saha, Kai Goebel, Scott Poll, and Jon Christophersen. Prognostics
methods for battery health monitoring using a bayesian framework. IEEE
Transactions on instrumentation and measurement, 58(2):291–296, 2009. 27,
31

Abhinav Saxena, Jose Celaya, Edward Balaban, Kai Goebel, Bhaskar Saha,
Sankalita Saha, and Mark Schwabacher. Metrics for evaluating performance
of prognostic techniques. In Prognostics and health management, 2008. phm
2008. international conference on, pages 1–17. IEEE, 2008. 5

Bernhard Schölkopf, John C Platt, John Shawe-Taylor, Alex J Smola, and


Robert C Williamson. Estimating the support of a high-dimensional distri-
bution. Neural computation, 13(7):1443–1471, 2001. 22, 23, 55, 70

Johann Schumann, Indranil Roychoudhury, and Chetan Kulkarni. Diagnostic


reasoning using prognostic information for unmanned aerial systems. 2015. 25

Mark Schwabacher and Kai Goebel. A survey of artificial intelligence for prog-
nostics. In Aaai fall symposium, pages 107–114, 2007. 12


Hyun Joon Shin, Dong-Hwan Eom, and Sung-Shick Kim. One-class support
vector machinesan application in machine fault detection and classification.
Computers & Industrial Engineering, 48(2):395–408, 2005. 23, 30, 54

Jong-Ho Shin and Hong-Bae Jun. On condition based maintenance policy. Journal
of Computational Design and Engineering, 2(2):119–127, 2015. 24

Xiao-Sheng Si, Wenbin Wang, Chang-Hua Hu, and Dong-Hua Zhou. Remain-
ing useful life estimation–a review on the statistical data driven approaches.
European journal of operational research, 213(1):1–14, 2011. 5

JZ Sikorska, Melinda Hodkiewicz, and Lin Ma. Prognostic modelling options for
remaining useful life estimation by industry. Mechanical Systems and Signal
Processing, 25(5):1803–1836, 2011. 5

SKF AB. Skf launches skf insight, groundbreaking intelligent bearing technol-
ogy, 2013. URL http://www.skf.com/fi/news-and-media/news-search/
2013-04-08-skf-launches-skf-insight-groundbreaking-intelligent-
bearing-technology.html. 34, 81

Abdenour Soualhi, Kamal Medjaher, and Noureddine Zerhouni. Bearing health


monitoring based on hilbert–huang transform, support vector machine, and
regression. IEEE Transactions on Instrumentation and Measurement, 64(1):
52–62, 2015. 21

W.J. Staszewski, K. Worden, and G.R. Tomlinson. Time-frequency analysis in


gearbox fault detection using the wigner-ville distribution and pattern recog-
nition. Mechanical Systems and Signal Processing, 11(5):673–692, 1997. 16

Eva L Suarez, Michael J Duffy, Robert N Gamache, Robert Morris, and An-
drew J Hess. Jet engine life prediction systems integrated with prognostics
health management. In Aerospace Conference, 2004. Proceedings. 2004 IEEE,
volume 6, pages 3596–3602. IEEE, 2004. 25

John A Swets, Robyn M Dawes, and John Monahan. Psychological science can
improve diagnostic decisions. Psychological science in the public interest, 1(1):
1–26, 2000. 28


Guiji Tang, Fucheng Zhou, and Xinghua Liao. Fault diagnosis for rolling bearing
based on improved enhanced kurtogram method. In Ubiquitous Robots and
Ambient Intelligence (URAI), 2016 13th International Conference on, pages
881–886. IEEE, 2016. 18

David MJ Tax and Robert PW Duin. Support vector domain description. Pattern
recognition letters, 20(11):1191–1199, 1999. 22

David MJ Tax and Robert PW Duin. Uniform object generation for optimizing
one-class classifiers. Journal of machine learning research, 2(Dec):155–173,
2001. 23

Markus Timusk, Mike Lipsett, and Chris K Mechefske. Fault detection using
transient machine signals. Mechanical Systems and Signal Processing, 22(7):
1724–1749, 2008. 24, 30

Yasushi Umeda, Tetsuo Tomiyama, and Hiroyuki Yoshikawa. A design method-


ology for self-maintenance machines. Journal of Mechanical Design, 117(3):
355–362, 1995. 5

Case Western Reserve University. Bearing data center website, 2018.


URL http://www.femto-st.fr/en/Research-departments/AS2M/Research-
groups/PHM/Pronostia. 35

Vladimir N Vapnik. The nature of statistical learning theory. 1995. 21

J Ville. Théorie et applications de la notion de signal analytique. Câbles et
Transmission, 2:61–74, 1948. 16

WJ Wang and PD McFadden. Early detection of gear failure by vibration analysis


i. calculation of the time-frequency distribution. Mechanical Systems and Signal
Processing, 7(3):193–203, 1993. 17

Yanxue Wang, Jiawei Xiang, Richard Markert, and Ming Liang. Spectral kur-
tosis for fault detection, diagnosis and prognostics of rotating machines: A
review with applications. Mechanical Systems and Signal Processing, 66:679–
698, 2016. 18, 30


J Wesley Hines and Alexander Usynin. Current computational trends in equip-


ment prognostics. International Journal of Computational Intelligence Systems,
1(1):94–102, 2008. 26, 27, 31, 81

Achmad Widodo and Bo-Suk Yang. Support vector machine in machine condition
monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21
(6):2560–2574, 2007. 21

E. Wigner. On the quantum correction for thermodynamic equilibrium. Phys.


Rev., 40:749–759, Jun 1932. doi: 10.1103/PhysRev.40.749. 16

Martin Wollschlaeger, Thilo Sauter, and Juergen Jasperneite. The future of in-
dustrial communication: Automation networks in the era of the internet of
things and industry 4.0. IEEE Industrial Electronics Magazine, 11(1):17–27,
2017. 4

Jing Wu and Wei Zhao. A simple interpolation algorithm for measuring multi-
frequency signal based on dft. Measurement, 42(2):322 – 327, 2009. ISSN 0263-
2241. doi: https://doi.org/10.1016/j.measurement.2008.06.008. URL http:
//www.sciencedirect.com/science/article/pii/S026322410800105X. 57

Yingchao Xiao, Huangang Wang, Lin Zhang, and Wenli Xu. Two methods of
selecting gaussian kernel parameters for one-class svm and their application to
fault detection. Knowledge-Based Systems, 59:75–84, 2014. 23

Ruqiang Yan, Robert X Gao, and Xuefeng Chen. Wavelets for fault diagnosis of
rotary machines: A review with applications. Signal processing, 96:1–15, 2014.
17

Miin-Shen Yang, Chien-Yo Lai, and Chih-Ying Lin. A robust em clustering


algorithm for gaussian mixture models. Pattern Recognition, 45(11):3950–3961,
2012. 25

Shen Yin, Xiangping Zhu, and Chen Jing. Fault detection based on a robust one
class support vector machine. Neurocomputing, 145:263–268, 2014. 23


Jie Yu. A support vector clustering-based probabilistic method for unsupervised


fault detection and classification of complex chemical processes using unlabeled
data. AIChE Journal, 59(2):407–419, 2013. 22

Ming J Zuo, Jing Lin, and Xianfeng Fan. Feature separation using ica for a one-
dimensional time series and its application in fault detection. Journal of Sound
and Vibration, 287(3):614–624, 2005. 19

Part II

Paper A

Optimizing the novelty detection algorithm using a criticality index for rotating
machine fault detection on a production line

Juhamatti Saaria,b,∗, Johan Odeliusb


a SKF-LTU University Technology Centre, Luleå University of Technology, SE-97187 Luleå, Sweden (e-mail: juhamatti.saari@ltu.se)
b Division of Operation, Maintenance and Acoustics, Luleå University of Technology, SE-97187 Luleå, Sweden

Abstract
In fault detection techniques for condition monitoring, defining an appropriate model sensitivity is important in order
to avoid false alarms, e.g. due to changes in the operating conditions. Poorly optimized models may lead to a scenario
where operators doubt each alarm and finally disable the whole fault detection system. In this paper, a method is
presented where the sensitivity of a fault detection model is adjusted using a criticality index. The criticality index
is based on risk and cost-benefit analyses. The purpose is dual: model selection and alarm threshold definition. This
approach is important in the new era of the Internet of Things, where a great number of individual models are trained
and many components are monitored simultaneously.
As proof of concept, the method was applied using the one-class support vector machine (one-class SVM). The
models were validated using bearing degradation data created at a laboratory. The results show that each of the trained
models can be an appropriate detection model as long as the desired sensitivity is known. All models were able to
detect the bearing fault once the threshold limit was set to the correct position. The results suggest that criticality
analysis can improve the fault detection modelling, which will enhance the effectiveness of maintenance decision
making.
Keywords: Fault detection, Novelty detection, One-class SVM, Maintenance optimization, FMECA

1. Introduction

In machine diagnostics, the goal is to detect an abnormal condition (fault detection), determine which component
is defective (fault isolation), and estimate the nature and extent of the fault (fault identification). A wide range of
methods are available for fault detection and they differ depending on the type of system to which they are applied
(Venkatasubramanian et al., 2003; Jardine et al., 2006). Automatic fault diagnosis using machine learning techniques
has attracted increasing attention during recent years (Dai and Gao, 2013; Precup et al., 2015). Sub-categories of
machine learning techniques are supervised learning algorithms, e.g. the support vector machine (SVM) (Widodo and
Yang, 2007; Saari et al., 2015) and unsupervised learning algorithms, e.g. the deep belief network, the k-nearest
neighbours algorithm (Tamilselvan and Wang, 2013; Traore et al., 2015), and the novelty detection algorithm (Ding et al.,
2014; Kemmler et al., 2013; Fernández-Francos et al., 2013). What makes the fault detection problem challenging is
the fact that this decision often needs to be made without any historical data on previously seen similar faults.
Therefore, unsupervised machine learning techniques, which work without any labelled data, are more commonly used
for this task. However, this makes the model tuning difficult, since there are no good evaluation metrics (sensitivity
and specificity) available due to the lack of data.
When performing condition monitoring, it is recommended that one should perform criticality assessment, e.g. by
using the failure rate and the mean time to repair, the consequential or secondary damage, the cost of maintenance or
spares, the cost of machine downtime, and safety and environmental changes (ISO17359, 2011). There are, however,
no guidelines on criticality assessment for fault detection accuracy, besides the recommendation that the detection

∗ Corresponding author

Preprint submitted to journal October 24, 2018

should be as early as possible. Detecting failures as early as possible without considering what will happen once the
alleged fault has been detected can have several drawbacks, e.g. the users may lose confidence in the system and
even disable it completely in response (Katipamula and Brambley, 2005). Katipamula and Brambley (2005) stress
that it is particularly important to avoid too many false alarms in the early stages of introducing a new technology.
The criticality of the system therefore affects the sensitivity of the diagnostic tools; methods applied to non-critical
systems should be tuned to generate fewer false alarms, whereas the opposite applies to critical systems. In other
words, a diagnostic tool that is designed to detect an abrupt change quickly will be sensitive to noise and can lead to
frequent false alarms during normal operation (Venkatasubramanian et al., 2003).
The balance between early detection and false alarms is also of increasing importance for large systems and
IoT-enabling systems, e.g. complex software systems (Jiang et al., 2009). For automated fault detection, a method
for assessing a criticality index is therefore needed. This study also addresses the problems involved in the model
evaluation process for cases where an appropriate model needs to be selected using incomplete information. This
scenario is common in cases where only healthy condition data are available. Novelty detection can therefore
be used instead of pure classification algorithms, rather than waiting until enough data have been collected and labelled.
This paper is divided into two parts. In Part A, classification metrics are discussed and a criticality index based
on risk and cost-benefit analyses is described. In Part B, the scenario is discussed in which only healthy data are
available and effective performance metrics cannot be used. Here, a method using the criticality index to optimize a
novelty detection algorithm, the one-class support vector machine, is proposed for fault detection. The data for training
the detection model were produced in a bearing run-to-failure laboratory experiment.

2. Part A
2.1. Classification metrics for fault detection
Fault detection is a traditional binary classification problem and the model outcome is labelled as positive (p)
or negative (n). A positive outcome means that an alarm was raised and a negative outcome that the system was
considered to work under the nominal condition or, in other words, was regarded as healthy. However, there are
actually four possible outcomes which need to be considered carefully each time the model is trained. These four
outcomes are as follows.

• True positive (TP): The prediction is “faulty” and the system is faulty.
• False positive (FP): The prediction is “faulty” and the system is healthy.
• True negative (TN): The prediction is “healthy” and the system is healthy.
• False negative (FN): The prediction is “healthy” and the system is faulty.

Using these four outcomes, the following four important metrics can be expressed in Equations 1-4 (Fawcett,
2006):

    False positive rate (FPR) = FP / (FP + TN)                          (1)

    Specificity = TN / (TN + FP)                                        (2)

    Sensitivity = TP / (TP + FN)                                        (3)

    Accuracy (ACC) = (TP + TN) / (TP + FP + TN + FN)                    (4)
In fault detection, a verbal definition of these metrics would be as follows:

• FPR: a measure of how many times a false alarm was triggered when only healthy data were used, aka the false
alarm rate;

• Specificity: a measure of how well the model is able to predict when the system is in a healthy state, aka 1 − FPR;

• Sensitivity: a measure of how well the model is able to predict when the system is in a faulty state, aka recall;

• ACC: a measure of how many times the prediction model was able to identify the true state of the system.

In the ideal case, the perfect classifier will have zero FPs and FNs (i.e. ACC = 1) and the model selection is easy
to justify without considering the circumstances in which the machine is operating. However, for reasons such as noise in
the measurements, imperfect key indicators (features), and process noise, an accuracy of 100% is difficult to achieve in
practice. Accordingly, performance metrics based on sensitivity and specificity are better than plain accuracy
(Fawcett, 2006).
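As a minimal illustration, the four metrics of Equations 1-4 can be computed directly from the confusion counts (a sketch; the function name is ours, not from the paper):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics of Equations 1-4 from the four outcome counts."""
    fpr = fp / (fp + tn)                          # Eq. 1, the false alarm rate
    specificity = tn / (tn + fp)                  # Eq. 2, equal to 1 - FPR
    sensitivity = tp / (tp + fn)                  # Eq. 3, aka recall
    accuracy = (tp + tn) / (tp + fp + tn + fn)    # Eq. 4
    return fpr, specificity, sensitivity, accuracy
```

For example, a model with 8 true positives, 2 false positives, 90 true negatives and 0 false negatives reaches an accuracy of 0.98, while its false alarm rate is roughly 0.022.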

2.2. ROC analysis


The trade-off between sensitivity and specificity is often plotted as a ROC (receiver operating characteristic) space,
see Figure 1. The challenge is to find a reasonable, rational, desirable balance between sensitivity and specificity
(Swets et al., 2000). The dark area in the ROC curve represents the model where the sensitivity and specificity are
low and which is making worse predictions than those that would be obtained with a random flip of an unbiased coin.
According to Fawcett (2006), ROC curves are insensitive to changes in the class distribution, and therefore optimized
models can be identified using the ROC convex hull explained in their study.

[Figure 1: ROC space, with sensitivity on the y-axis and 1-specificity on the x-axis; the liberal region lies towards point B and the conservative region towards point C.]

Figure 1: Illustration of a ROC curve. Point A represents the ideal classifier, which is able to classify all the data points correctly with zero false
positives or negatives. At point C, the classifier is able to classify correctly when the machine is faulty, with zero false positives. At point B, the
classifier is able to classify correctly when the machine is healthy, with zero false negatives.

In the ROC space diagram (see Figure 1) there is one area called the conservative area and one called the liberal
area. In general, conservative models are able to detect when the system is healthy with fewer false alarms (false
positives), but may miss faults by giving more false negatives. Liberal models should be able to raise an alarm
when the system is no longer healthy, but may give more false alarms by increasing the false positive count. In fault
detection, the liberal area is for systems for which it is important to know when a fault has occurred, even at the
expense of having more false positive alarms. This type of system is usually considered to be critical for the core
business. Nuclear power plants and airplanes are good examples of this type of system. The conservative area is for
systems which are slightly more peripheral to the core business and can be adjusted to have fewer false alarms, even
at the expense of missing some of the incipient faults. A good example of this type of system is a redundant pump,
which will not affect the production during a shutdown. This type of system can also include many systems which are

normally not considered to be part of the machines being subjected to condition monitoring, such as mobile machines
supporting the production.
In the ideal case, the steps for optimizing the fault detection classifier using the ROC analysis are as follows:

• defining the group to which the system belongs (conservative or liberal);

• defining the lower acceptable limits of the sensitivity and the specificity;
• calculating the sensitivity and specificity of each model under consideration;

• selecting the optimal classifier model by using the ROC convex hull (Fawcett, 2006), which is able to identify
potentially optimal classifiers.
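These steps can be sketched as follows (a simplified stand-in: instead of the ROC convex hull of Fawcett (2006), candidates below the acceptable limits are discarded and the remainder are ranked by the metric that matters for the chosen group; all names are ours):

```python
def select_model(candidates, min_sensitivity, min_specificity, liberal):
    """Pick a classifier from (name, sensitivity, specificity) tuples.

    Liberal systems favour sensitivity (catching faults); conservative
    systems favour specificity (avoiding false alarms).
    """
    acceptable = [m for m in candidates
                  if m[1] >= min_sensitivity and m[2] >= min_specificity]
    if not acceptable:
        return None  # no model meets the lower limits
    rank = (lambda m: m[1]) if liberal else (lambda m: m[2])
    return max(acceptable, key=rank)
```

With candidates [("A", 0.9, 0.7), ("B", 0.8, 0.95)] and limits of 0.7 (sensitivity) and 0.6 (specificity), a liberal system would pick A and a conservative one B.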

However, in practice, the model selection is not that straightforward. First of all, there are no well-explained
methods for deciding whether the model should be conservative or liberal. Secondly, even when one knows whether
the model should be conservative or liberal, there are no methods for deciding how conservative or liberal it should be
(for selecting the sensitivity and specificity limits). Lastly, in most cases, false positives and false negatives have not
been verified or even seen, since the machine is operating under the nominal state.
Therefore, other methods are needed, either to optimize the model differently or to try in some way to define the
sensitivity and specificity values more indirectly before enough data have been collected to calculate the sensitivity
and specificity accurately.

2.3. Risk, cost and benefit


On many occasions, optimal models cannot be selected by using the measured data, since the selection decision
is very subjective and depends on many other related issues, such as the operating conditions and regulatory demands.
It is difficult to compile a comprehensive list of the issues which have an impact on the selection decision.
However, a useful compilation of the things which should be considered in this connection can be found
in the ISO asset management standard (ISO55000, 2014). This standard emphasizes the importance of the organization
realizing the value of all of its assets and not solely focusing on the condition monitoring of physical assets. Neverthe-
less, fault detection is a key part of asset management and ISO 55000 can be used as a helpful guideline. The standard
includes factors such as the nature and purpose of the organization, its operating context, its financial constraints and
regulatory requirements, and the needs and expectations of the organization and its stakeholders, which have to be
considered when improving the asset management.

Risk
Risk analysis can be based on e.g. failure mode and effect analysis (FMEA), fault tree analysis (FTA) and Bayesian
networks (BN). These are analyses that often already have been carried out for critical assets. However, methodologi-
cal approaches that would assist practitioners in selecting suitable risk assessment techniques are missing (Chemweno
et al., 2015), and therefore it is difficult to decide which method should be used for risk analysis. Nevertheless, the
organization should determine the actions that are necessary for addressing the risk with the overall purpose of under-
standing the cause, effect and likelihood of adverse events occurring (ISO55000, 2014). Moreover, this information
should not be wasted when optimizing fault detection models.
Risk assessment can be quantitative or qualitative. The output of a quantitative risk assessment will typically be
a number, such as the cost impact per unit time (Khan and Haddara, 2003). The number can then be used to divide
each system being monitored into groups based on some preselected cost threshold. Those groups which are more
costly will receive a high risk value and those which are less costly a low risk value. In qualitative risk assessment,
the results are often shown in the form of a simple risk matrix where one axis of the matrix represents the probability
and the other represents the consequences (Khan and Haddara, 2003). Therefore, this method is less rigorous and
in most cases very debatable. Quantitative risk assessment requires a great deal of data for both the assessment of
probabilities and the assessment of consequences.

Cost-benefit analysis
In the cost-benefit analysis (CBA), the tangible and intangible costs and benefits of both false positives and false
negatives should be analysed. In these calculations the benefit is evaluated by estimating how important it is to detect
the fault as early as possible. An important factor is the possibility of forecasting the time until failure given the
degradation rate, see e.g. Jardine et al. (2006).
Many CBA models have been developed. For instance, Rouhan and Schoefs (2003) evaluated the global cost of
inspection planning for offshore structures based on decision and detection theories, and included both the probability
of false alarms and the probability of detection. In relation to false alarms, cost of downtime and spare part man-
agement and logistics are important factors in the CBA. A model combining risk and cost-benefit calculation was
suggested by Khan and Haddara (2003), who studied a risk-based maintenance methodology whose purpose was the
minimization of hazards (both to humans and the environment) caused by the unexpected failure of equipment. In
their model they incorporated some factors which took into account the damage done to property and assets and the
fatality factor of each accident.

2.4. Criticality index


Because risk, cost and benefit estimations are highly dependent on the system concerned, some rough guidelines
on how to calculate the criticality index (Cidx ) are given here by the present authors. Important aspects
are:

• The Cidx value is set between 0 and 100. If the risk and/or cost are calculated using different boundaries, they
need to be scaled accordingly. Values less than 50 make the system conservative and values over 50 make it
liberal.
• Emphasis should be placed on data quality, e.g. by conducting a qualitative risk analysis if there are not enough
data present to make a quantitative risk analysis.

If the Cidx value is close to 0, the decision to apply the detection algorithm to that particular component should be
revised, since it may be useless to waste resources by monitoring a system which is non-critical.
The criticality index Cidx depends on the failure mode. For simplicity, we suggest an approach which involves
choosing the maximum Cidx value for each system or taking an average of the top three values. An alternative would
be to base the calculation on each particular critical failure mode (and train several models with varying criticality).
However, individually defining the fault detection accuracy rate may be impractical in the context of large systems.
Implementing a criticality index should allow rather fast implementation possibilities for companies whose main-
tenance protocol is already based on reliability-centred maintenance (RCM), or which have used failure mode, ef-
fects and criticality analysis (FMECA) to identify potential failure modes and the risk of each failure mode occur-
ring (McElroy et al., 2015; Lipol and Haq, 2011).
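A minimal sketch of the guideline above, assuming per-failure-mode scores that have already been scaled to the 0-100 range (the choice between the maximum and the top-three average is left as a flag; the function is illustrative only):

```python
def criticality_index(mode_scores, use_max=False):
    """Combine per-failure-mode criticality scores (0-100) into one Cidx."""
    if use_max:
        return max(mode_scores)
    # average of the top three scores (or fewer, if fewer modes exist)
    top_three = sorted(mode_scores, reverse=True)[:3]
    return sum(top_three) / len(top_three)
```

A system whose failure-mode scores are 90, 70, 40 and 10 would then get Cidx = 90 with use_max=True, or (90 + 70 + 40) / 3 ≈ 66.7 with the top-three average, placing it on the liberal side of the 50 threshold.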

3. Part B

3.1. Novelty detection


The aim of novelty detection is to identify new or unknown data which a machine learning system is not aware
of during training (Markou and Singh, 2003). Novelty detection algorithms use two main approaches for es-
timating the probability density function (PDF), namely the parametric approach (for known distributions) and the
non-parametric approach (for unknown distributions) (Desforges et al., 1998). In this study we used a method called
the one-class support vector machine (one-class SVM), which is sometimes referred to as the ν-SVM. It is a non-
parametric method where the PDF is estimated using the kernel method.
Since only data from one class are available, the one-class SVM algorithm cannot maximize the margin between
two classes, as is done when using the regular SVM (Schölkopf et al., 1999). Instead, the goal is to develop an
algorithm which returns a function that takes the value +1 in a small region (the nominal region in this case) and −1
elsewhere (the faulty region). The strategy is to map the data into a feature space F (using a known kernel function)
and to separate them from the origin with the maximum margin.

The objective function of a one-class SVM (Equation 5) resembles that of the two-class SVM, with some small
differences. Instead of the cost function, in the one-class SVM it is the parameter nu (ν) ∈ [0, 1] that characterizes the
solution of the quadratic optimization problem (Schölkopf et al., 1999)

    min_{ω,ξ,ρ}  ‖ω‖²/2 + (1/(νn)) ∑_{i=1}^{n} ξ_i − ρ                  (5)

subject to:

    (ω · Φ(x_i)) ≥ ρ − ξ_i   ∀i ∈ N
    ξ_i ≥ 0                  ∀i ∈ N,

where n is the number of instances, ρ is the offset parameter and ξ is the slack variable. ω and ρ are the hyperplane
parameters, the hyperplane being given by ω · Φ(x) − ρ = 0.
This quadratic optimization problem is solved using Lagrange multipliers; the decision function for a data
point x then becomes (Schölkopf et al., 1999)

    f(x) = sgn( ∑_{i=1}^{n} α_i K(x_i, x) − ρ ),                        (6)

where the coefficients α_i > 0 and K is the kernel function.


The kernel must be chosen with great care, as it has a crucial effect on the performance (Müller et al., 2001).
Determining which kernel to use depends on the data and the number of features.
In this study the Gaussian radial basis function (RBF) was used:

    K(x, x′) = exp(−γ ‖x − x′‖²),                                       (7)

where γ is the kernel parameter and ‖x − x′‖ is the dissimilarity measure.
Having selected the RBF kernel and using the default tolerance of the termination criterion, there are two parameters
which need to be optimized when using a one-class SVM toolbox. These parameters are gamma (γ) and nu (ν) (Chang
and Lin, 2011). Gamma defines the extent of the influence of a single training example; increasing the γ value reduces
this extent. ν sets an upper bound on the fraction of outliers (training examples regarded as out-of-class) and a lower
bound on the fraction of training examples used as support vectors (Schölkopf et al., 2000).
The ν and γ parameters can be defined using the traditional grid search method, where the goal is to find the best
pair by manually specifying parameter subsets. Other methods which can be used include Bayesian optimization and
gradient-based optimization; nevertheless, the problem is then to define an appropriate value of the objective function
(the accuracy of the fault detection).
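A sketch of the grid search, using the scikit-learn OneClassSVM front-end to LIBSVM (the synthetic data and the parameter grid are placeholders for the real nominal features):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 8))   # stand-in for nominal feature vectors
X_test = rng.normal(size=(100, 8))    # held-out nominal data

best = None
for nu in (0.01, 0.05, 0.1, 0.2):
    for gamma in (0.01, 0.05, 0.1, 0.2):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_train)
        # predict() returns +1 inside the learned nominal region, -1 outside,
        # so the share of +1 predictions on held-out nominal data is the specificity
        specificity = float(np.mean(model.predict(X_test) == 1))
        if best is None or specificity > best[0]:
            best = (specificity, nu, gamma)
```

In practice, the plain specificity score here would be replaced by the criticality-driven selection procedure described in Section 3.3.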
In theory, the one-class SVM should under-fit the data when the ν and γ parameters are chosen to be too low,
producing fewer false positives at the risk of missing some of the actual faults.
An interesting advantage of combining criticality calculations with novelty detection algorithms lies in the
dynamic properties of this combination. Since novelty detection only needs the nominal data for re-training, the
calculation of the cost and benefit can be performed dynamically and the model can be re-defined at any given time.
For instance, during the holiday season, the cost of setting off a false alarm can be much higher due to the lack of
manpower, and therefore the detection model can be slightly under-fitted temporarily. By doing so, fewer false alarms
should occur and yet critical failures are prevented.

3.2. Novelty detection combined with anomaly detection


Using the one-class SVM for fault detection entails the problem of deciding how to proceed when anomalies
are seen and confirmed. One possibility is to use a different algorithm, for instance the EllipticEnvelope algorithm
(provided by scikit-learn (Pedregosa et al., 2011)), to switch from the novelty detection algorithm to an
anomaly detection algorithm (with data from the nominal condition and a few instances of faults).
One alternative, if multiple algorithms are not used, is to use outliers for fine-tuning the one-class SVM. This can
be accomplished by using testing data containing anomalies to calculate the model sensitivity and select a one-class

SVM model optimized using different ν and γ values. With this method, choosing between a conservative or a liberal
model will be easier (see Figure 1).
Once enough anomalies are seen and when the system is in a faulty state, there should be a smooth transition from
detection algorithms to pure classification algorithms such as the traditional SVM, where labels are known for both
the healthy and the faulty cases, see e.g. (Widodo and Yang, 2007; Saari et al., 2015).
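A sketch of such a switch, assuming a handful of confirmed fault captures alongside the nominal data (synthetic numbers; scikit-learn's EllipticEnvelope fits a robust covariance estimate and flags the most distant points):

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(1)
X_nominal = rng.normal(size=(300, 8))         # healthy captures
X_faults = rng.normal(loc=6.0, size=(15, 8))  # confirmed anomalies

# Refit with a contamination level reflecting the confirmed share of faults
X = np.vstack([X_nominal, X_faults])
contamination = len(X_faults) / len(X)
detector = EllipticEnvelope(contamination=contamination, random_state=0).fit(X)
labels = detector.predict(X)   # +1 = nominal, -1 = anomaly
```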

3.3. Baseline as an accuracy threshold


Usually with machine learning algorithms, after training, the model is tested using a smaller portion of the same
data (Chandola et al., 2009). Therefore, the nominal data are randomly divided into two portions (for example 80%
for training and 20% for testing). Once each model has been trained, the testing data will give the initial accuracy,
which is later referred to as the baseline specificity of the model, since there are no faulty data present. The procedure
for selecting the novelty detection model is performed as follows.

• Normalize the nominal data and select the ratio for training and validation (e.g. 80/20);

• Calculate the baseline specificity using the smaller portion of the nominal data;

• Select a range for the parameters by keeping models whose initial specificity is higher than 50%. With this data
set, 15 parameter values between 0.01 and 0.6 were selected;

• Re-train the model using the complete data set, see Figure 3;

• Assume models to be conservative when the initial specificity is higher than 75% and liberal when it is
lower than 75%;

• Calculate Cidx and choose the most appropriate model. If Cidx is high, lean towards slightly over-fitted models
(liberal models) and, if it is low, lean towards under-fitted models (conservative models). The threshold for the
accuracy will be approximately the same as Cidx .
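The procedure above can be sketched as follows (illustrative only; the function names are ours, and the re-trained final model together with its baseline specificity feeds the alarm rule of Figure 3):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_with_baseline(X_nominal, nu, gamma, split=0.8, seed=0):
    """80/20 split for the baseline specificity, then re-train on all data."""
    idx = np.random.default_rng(seed).permutation(len(X_nominal))
    cut = int(split * len(X_nominal))
    train, test = X_nominal[idx[:cut]], X_nominal[idx[cut:]]
    initial = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(train)
    bspec = float(np.mean(initial.predict(test) == 1))  # baseline specificity
    final = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_nominal)
    return final, bspec

def alarm(model, bspec, X_segment, c_idx):
    """Accuracy index in % (segment accuracy over baseline specificity);
    an alarm is raised when it drops below the criticality index."""
    acc = float(np.mean(model.predict(X_segment) == 1))
    return 100.0 * acc / bspec < c_idx
```

A degradation segment far outside the nominal region then drives the accuracy index towards zero and triggers the alarm for any reasonable Cidx.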

4. Degradation data

Vibrational data from the bearing data set provided by the Center for Intelligent Maintenance Systems (IMS) of
the University of Cincinnati were used in this study (Lee et al., 2007). Four double-row bearings (Rexnord ZA-2115
bearings) were installed on one shaft and two accelerometers were mounted on each of them to register the vibration
signals in two different spatial axes, as shown in Figure 2. The shaft was driven by an AC motor and coupled by rub
belts. The rotation speed was kept constant at 2,000 rpm and a 6,000 lb radial load was added to the shaft and bearings
by a spring mechanism.
Vibration data were collected, first every 5 min and, after 215 min, every 10 min. In total the test run lasted 355
h 45 min. Each file consists of 20,480 points with the sampling rate set at 20 kHz. At the end of the test, an inner
race defect occurred in bearing 3. Therefore, data were captured from the two accelerometers mounted directly on the
housing of bearing 3. Pictures of the damage can be seen in the paper by Qiu et al. (2006) (Fig. 17a).

4.1. Feature extraction


For the experiments performed in the present study, the data were divided into 20 different degradation sets. The
first degradation set (nominal data) consisted of the first 656 data captures, corresponding to a total running time of
105 h 45 min. This data set was used for training the one-class SVM algorithm with varying ν and γ parameters. Each
of the subsequent 19 data sets consisted of 150 captures, each overlapping the previous one by 50%.
At the end the bearing was considered to be in a faulty state, but, with regard to the 19 data sets created for testing, it
was difficult to know when the degradation had begun. Judging by the feature values, a fault is assumed to be present
from the third degradation data set onwards, since a small increase in the RMS level is seen during that time.
Figure 2: Bearing test rig and position of the accelerometers (Qiu et al., 2006).

From the signals sent by the two accelerometers mounted on the bearing, four time domain features were extracted,
namely the RMS level, kurtosis, peak value and peak-to-peak value. There are two reasons why these features were
selected. First of all, the assumption is that the model should not alarm only when a certain fault (e.g. an inner race
fault) is occurring, and therefore the features need to be universal, as the RMS level usually is. Secondly, these features
simplify the interpretation of the results without losing their functionality. However, for each system the selection of
features should be done carefully, or more advanced feature selection techniques, such as those studied by Guyon and
Elisseeff (2003), should be used to find the optimum set of features.
In total there were eight features (two vibration sources and four extracted features) which were used as input to
the one-class SVM. Figure 4 shows the level of each feature during the test. The separation of the data segments in
time is also marked in the figure.
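A sketch of the extraction for a single capture (one accelerometer channel), using the common definitions of these four features; applied to both channels this gives the eight model inputs:

```python
import numpy as np

def time_domain_features(x):
    """RMS level, kurtosis, peak value and peak-to-peak value of one capture."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    centred = x - np.mean(x)
    kurtosis = np.mean(centred ** 4) / np.mean(centred ** 2) ** 2
    peak = np.max(np.abs(x))
    peak_to_peak = np.max(x) - np.min(x)
    return rms, kurtosis, peak, peak_to_peak
```

Kurtosis is here the normalized fourth moment, so a pure square wave gives the value 1, while a Gaussian signal (typical of a healthy bearing) gives a value close to 3.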

4.2. Training, testing and the alarm threshold selection


The one-class SVM models were trained using 15 different ν and γ values, whose range was from 0.01 to 0.6.
When the parameter values were higher than 0.6, the baseline specificity was lower than 50% and therefore such
values were omitted.
Figure 3 illustrates the acquisition of the baseline specificity, the training of the model using all the data points (the
final model), the calculation of the accuracy index (the accuracy of the testing set divided by the baseline specificity),
and how the criticality index Cidx was used for defining the alarm threshold. With this method, the one-class SVM was
trained twice. First the original data are divided into training data and testing data and, once the baseline specificity
has been calculated, the final model is calculated using the complete nominal data set (set 0). Adopting this procedure,
data will not be wasted, which can be important for cases where the nominal data set is small. The reason for calculating
the accuracy index is mainly to avoid a situation where the threshold limit would have to be scaled individually for
every system, since a model cannot reach an accuracy close to 100%.
Since the data used came from a test rig, there was no possibility of calculating the real Cidx by using the theoretical
method explained in Part A. Therefore, six Cidx values were chosen to cover the range of risk levels for
common systems. The assumed risk levels were: 99 (extremely important), 95 (highly important), 90 (important), 80 (of
average importance), 60 (of low importance), and 40 (of very low importance).

5. Results

The baseline specificity of the one-class SVM with several ν and γ parameters is presented in Table 1. The models
were built using the nominal data set, of which 80 % was used for training and 20 % for testing. The data points were
selected randomly for the training set and the testing sets.
Each model failed to achieve 100% baseline specificity, as can be seen in Table 1. Therefore, the one-class SVM
alarm threshold should not be based on 100% baseline specificity with the given data set, since this would lead
to models which repeatedly send false alarms. Models with low ν and γ values were able to achieve a high
baseline specificity. These results were quite expected, since low γ values should increase the nominal space and thus
make the models more under-fitted than higher γ values would. A very clear trend was seen, where higher parameter values gave a
[Figure 3: flow chart. The nominal data (set 0) are split 80/20 into training and testing data; the initial model gives the baseline specificity (BSPEC); the final model is trained on 100% of the nominal data and applied to the degradation data (sets 1-19); the resulting accuracy index (ACC divided by BSPEC) is compared against Cidx: OK when at or above Cidx, ALARM when below.]
Figure 3: Calculating the baseline specificity and how to use it for creating the accuracy index.

Table 1: One-class SVM accuracy (%) by varying the ν and γ parameters.

ν \ γ    0.01   0.05   0.09   0.14   0.18   0.22   0.26   0.31   0.35   0.39   0.43   0.47   0.52   0.57   0.6
0.01 97.71 96.95 95.42 92.37 92.37 90.08 87.02 84.73 84.73 82.44 80.92 77.86 77.86 76.34 72.52
0.05 96.18 95.42 95.42 92.37 92.37 90.08 87.02 84.73 84.73 82.44 80.92 77.86 77.86 76.34 72.52
0.094 93.13 94.66 91.60 90.08 89.31 88.55 86.26 85.50 84.73 82.44 80.92 77.86 77.86 76.34 72.52
0.14 90.84 90.08 90.84 90.08 88.55 87.79 85.50 84.73 83.97 83.21 81.68 77.86 77.86 76.34 72.52
0.18 86.26 87.79 86.26 87.02 87.02 84.73 83.21 83.21 83.21 82.44 82.44 78.63 77.86 76.34 72.52
0.22 81.68 84.73 82.44 83.97 81.68 80.92 83.21 82.44 82.44 82.44 81.68 79.39 77.86 76.34 72.52
0.26 80.15 80.92 81.68 82.44 80.15 80.15 80.15 80.15 80.92 80.15 79.39 77.10 76.34 75.57 71.76
0.31 78.63 78.63 78.63 77.10 79.39 78.63 77.86 77.86 77.86 77.86 77.10 75.57 75.57 73.28 69.47
0.35 75.57 75.57 74.05 70.99 73.28 74.81 74.81 74.05 75.57 74.05 73.28 73.28 71.76 71.76 68.70
0.39 71.76 73.28 68.70 68.70 70.23 70.23 69.47 70.23 72.52 70.99 70.99 70.23 69.47 67.18 64.12
0.43 67.18 67.18 66.41 67.94 67.18 65.65 66.41 66.41 67.94 67.18 65.65 66.41 64.12 63.36 61.83
0.47 64.89 64.89 65.65 65.65 64.89 65.65 65.65 63.36 62.60 63.36 62.60 60.31 61.07 59.54 56.49
0.52 61.07 60.31 64.12 63.36 64.12 62.60 60.31 59.54 59.54 59.54 58.02 58.78 57.25 55.73 55.73
0.56 53.44 55.73 54.20 57.25 58.02 57.25 57.25 56.49 55.73 54.20 53.44 53.44 52.67 51.91 53.44
0.6 47.33 49.62 50.38 50.38 51.15 50.38 48.86 47.33 49.62 49.62 47.33 47.33 46.56 46.56 46.56

lower accuracy. Parameter values higher than 0.6 were also tested, and when both parameters were close
to 1, the accuracy was around 2%. When these models were tested using the degradation testing
sets, they behaved very randomly and no clear trend was seen.
Figure 5a shows the results for the level of accuracy of each diagonal model (ν and γ are equal, see Table 1) after
the training period. In Figures 5b and 5c only one of the parameters changes while the other remains constant.
When the parameter values were lower than 0.6 (see Figure 5), all the models converged to zero at the end, and
each of the models increased or decreased similarly over the given degradation data sets. However, based on the
individual feature values (see Figure 4), detection at the time when the accuracy is 0% would most likely have been
too late for all the models. Therefore, the threshold limit should be lower than 100, but higher than zero. These results
clearly indicate how important it is not only to choose an appropriate model, but also to choose a threshold level
accordingly for each system.
Comparing the situation where only ν was fixed (Figure 5b) with the situation where only γ was fixed (Figure 5c),
it was seen that varying ν had a bigger impact on the accuracy index value than varying γ.
To see how early each model is able to detect abnormal behaviour, the results shown in Figure 5a are also presented
as a table, where each cell value gives the number of the degradation testing set which detected the fault. The models
are numbered from 1 to 15, where model 1 has the lowest parameter values (ν and γ are 0.01) and model 15 the
highest values (ν and γ are 0.6). According to the given procedure, models 1-9 are considered to be conservative
[Figure 4: four panels showing (a) the RMS level, (b) the peak level, (c) the kurtosis value and (d) the peak-to-peak level during the test.]

Figure 4: The increase in the feature values from the start to the failure and the time segment for each testing set which did not overlap another set.
The nominal data set was collected from the start to 105 h 45 min.

and models 10-15 are liberal. Altogether, six different scenarios were tested, in which the threshold value was chosen
to be 99, 95, 90, 80, 60 or 40. These values are the same as the estimated criticality values, hence the name Cidx . The
results can be seen in Table 2.
Table 2: Effect of lowering the threshold value (analogous to the Cidx ). Each cell value represents the number of the degradation testing set which
was able to detect abnormal behaviour.

Cidx \ Model   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15
99 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
95 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1
90 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1
80 15 4 3 2 2 1 1 1 1 1 1 1 1 1 1
60 16 7 4 4 3 3 3 3 3 3 3 3 3 3 3
40 16 16 16 4 4 4 4 4 4 4 4 4 4 4 4

From these results it can be seen that, using the Cidx values 99 and 95, all the models would have sent the alarm
immediately, and these alarms would definitely have been seen as false alarms. However, if the criticality of the system
is over 95%, some false alarms can be accepted. When the threshold limit was 60, the selection of the model tuning
parameters was almost insignificant: almost all the models were able to detect the fault during the third degradation
data set. Only the first four models would have sent the alarm later than that, and only the first model detected it
perhaps too late. Overall, even though using the criticality of the system can help to identify an appropriate model,
some problems would still remain. For instance, using conservative models with high parameter values would also
mean that the criticality index should be high (perhaps around 80); however, as can be seen in Table 2, these models
were unable to avoid false alarms. It should be noted, though, that the proposed method is not aimed at finding optimal
model parameters and threshold values for individual cases, but is rather intended as a selection tool when individual
model selection is impractical, as in the new era of the Industrial Internet, where hundreds or even thousands of
individual models are trained and many components are monitored simultaneously.

[Figure 5: three panels of the accuracy index over the degradation sets: (a) varying both γ and ν values, (b) fixed ν value and varying γ values, (c) fixed γ value and varying ν values.]

Figure 5: One-class SVM model accuracy with different γ and ν values. In the accuracy index, the model accuracy is compared against the baseline
specificity.
6. Future work

Since feature selection plays an important role, in the future it would be interesting to repeat the study using features
which are more sensitive to specific fault modes, for instance enveloped signals with specific band-pass filters to locate
the fault of interest. Despite having some advantages, this approach might have the disadvantage that a detection model
specifically tailored for certain faults might miss other types of fault modes or locations; it therefore needs a thorough
investigation and was left out of this study.
It would also be interesting in the future to ascertain how each model behaves when the full development of the fault
is known, meaning that proper specifications are made of when the initial defect started. In this way, the sensitivity
and specificity can be calculated accurately and the models selected using the given procedure can be compared.

7. Conclusions

The proposed criticality index simplifies the selection of fault detection model parameters by making it easier to choose
between under-fitted and over-fitted models. Another benefit is the possibility of expanding the model usability
to systems which are usually neglected when assessing a condition monitoring system, by selecting the model parameters
in such a way that the model only reacts when a detected fault is certainly a true positive. The results obtained using
the bearing data set show that highly over-fitted models should be avoided (ν over 0.6), at least when the calculated
initial baseline specificity is poor.

Acknowledgements

This research study was supported by SKF AB and Vinnova and was performed within the framework of the
SKF-LTU University Technology Centre at Luleå University of Technology. The authors thank Chih-Chung Chang
and Chih-Jen Lin for their contribution of creating the LIB-SVM library and making it available for public use.

References
Chandola, V., Banerjee, A., and Kumar, V. (2009). Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3):15.
Chang, C.-C. and Lin, C.-J. (2011). Libsvm: a library for support vector machines. ACM transactions on intelligent systems and technology (TIST),
2(3):27.
Chemweno, P., Pintelon, L., Van Horenbeek, A., and Muchiri, P. (2015). Development of a risk assessment selection methodology for asset
maintenance decision making: An analytic network process (anp) approach. International Journal of Production Economics, 170:663–676.
Dai, X. and Gao, Z. (2013). From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis. IEEE Transactions on
Industrial Informatics, 9(4):2226–2238.
Desforges, M., Jacob, P., and Cooper, J. (1998). Applications of probability density estimation to the detection of abnormal conditions in engineer-
ing. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 212(8):687–703.
Ding, X., Li, Y., Belatreche, A., and Maguire, L. P. (2014). An experimental evaluation of novelty detection methods. Neurocomputing, 135:313–
327.
Fawcett, T. (2006). An introduction to roc analysis. Pattern recognition letters, 27(8):861–874.
Fernández-Francos, D., Martínez-Rego, D., Fontenla-Romero, O., and Alonso-Betanzos, A. (2013). Automatic bearing fault diagnosis based on
one-class ν-svm. Computers & Industrial Engineering, 64(1):357–365.
Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of machine learning research, 3(Mar):1157–1182.
ISO17359 (2011). Condition monitoring and diagnostics of machines - general guidelines. Technical report.
ISO55000 (2014). Asset management – overview, principles and terminology.
Jardine, A. K., Lin, D., and Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance.
Mechanical systems and signal processing, 20(7):1483–1510.
Jiang, M., Munawar, M. A., Reidemeister, T., and Ward, P. A. (2009). Automatic fault detection and diagnosis in complex software systems by
information-theoretic monitoring. In 2009 IEEE/IFIP International Conference on Dependable Systems & Networks, pages 285–294. IEEE.
Katipamula, S. and Brambley, M. R. (2005). Review article: methods for fault detection, diagnostics, and prognostics for building systems – a
review, part i. HVAC&R Research, 11(1):3–25.
Kemmler, M., Rodner, E., Wacker, E.-S., and Denzler, J. (2013). One-class classification with gaussian processes. Pattern Recognition,
46(12):3507–3518.
Khan, F. I. and Haddara, M. M. (2003). Risk-based maintenance (rbm): a quantitative approach for maintenance/inspection scheduling and
planning. Journal of Loss Prevention in the Process Industries, 16(6):561–573.
Lee, J., Qiu, H., Yu, G., and Lin, J. (2007). Rexnord technical services, bearing data set, ims, university of cincinnati, nasa ames prognostics data
repository.

Lipol, L. S. and Haq, J. (2011). Risk analysis method: Fmea/fmeca in the organizations. International Journal of Basic & Applied Sciences
IJBAS-IJENS, 11(05):74–82.
Markou, M. and Singh, S. (2003). Novelty detection: a review - part 1: statistical approaches. Signal processing, 83(12):2481–2497.
McElroy, L. M., Khorzad, R., Nannicelli, A. P., Brown, A. R., Ladner, D. P., and Holl, J. L. (2015). Failure mode and effects analysis: a comparison
of two common risk prioritisation methods. BMJ quality & safety, pages bmjqs–2015.
Muller, K.-R., Mika, S., Ratsch, G., Tsuda, K., and Scholkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE transactions
on neural networks, 12(2):181–201.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas,
J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of
Machine Learning Research, 12:2825–2830.
Precup, R.-E., Angelov, P., Costa, B. S. J., and Sayed-Mouchaweh, M. (2015). An overview on fault diagnosis and nature-inspired optimal control
of industrial process applications. Computers in Industry, 74:75–94.
Qiu, H., Lee, J., Lin, J., and Yu, G. (2006). Wavelet filter-based weak signature detection method and its application on rolling element bearing
prognostics. Journal of sound and vibration, 289(4):1066–1090.
Rouhan, A. and Schoefs, F. (2003). Probabilistic modeling of inspection results for offshore structures. Structural safety, 25(4):379–399.
Saari, J., Odelius, J., and Lundberg, J. (2015). Using wavelet transform analysis and the support vector machine to detect angular misalignment
of a rubber coupling. In Proceedings of the Maintenance, Condition Monitoring and Diagnostics Maintenance Performance Measurement and
Management, pages 117–126. Pohto.
Schölkopf, B., Smola, A. J., Williamson, R. C., and Bartlett, P. L. (2000). New support vector algorithms. Neural computation, 12(5):1207–1245.
Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., Platt, J. C., et al. (1999). Support vector method for novelty detection. In NIPS,
volume 12, pages 582–588. Citeseer.
Swets, J. A., Dawes, R. M., and Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological science in the public
interest, 1(1):1–26.
Tamilselvan, P. and Wang, P. (2013). Failure diagnosis using deep belief learning based health state classification. Reliability Engineering & System
Safety, 115:124–135.
Traore, M., Chammas, A., and Duviella, E. (2015). Supervision and prognosis architecture based on dynamical classification method for the
predictive maintenance of dynamical evolving systems. Reliability Engineering & System Safety, 136:120–131.
Venkatasubramanian, V., Rengaswamy, R., Yin, K., and Kavuri, S. N. (2003). A review of process fault detection and diagnosis: Part i: Quantitative
model-based methods. Computers & chemical engineering, 27(3):293–311.
Widodo, A. and Yang, B.-S. (2007). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal
Processing, 21(6):2560–2574.

Paper B

Detection and identification of windmill bearing faults using a one-class support
vector machine (SVM)

Juhamatti Saaria,b,∗, Daniel Strömbergssona,c , Jan Lundbergb , Allan Thomsond


a SKF-LTU University Technology Centre, Luleå University of Technology, SE-97187 Luleå, Sweden
b Division of Operation, Maintenance and Acoustics, Luleå University of Technology, SE-97187 Luleå, Sweden
c Division of Machine Elements, Luleå University of Technology, SE-97187 Luleå, Sweden
d SKF (U.K), Industrial Digitalisation & Solutions, Livingston, EH54 7DP, Scotland

Abstract
The maintenance cost of wind turbines needs to be minimized in order to keep their competitiveness and, there-
fore, effective maintenance strategies are important. The remote location of wind farms has led to an opportunistic
maintenance strategy where maintenance actions are postponed until they can be handled simultaneously, once the
optimal opportunity has arrived. For this reason, early fault detection and identification are important, but should not
lead to a situation where false alarms occur on a regular basis. The goal of the study presented in this paper was
to detect and identify wind turbine bearing faults by using fault-specific features extracted from vibration signals.
Automatic identification was achieved by training models by using these features as an input for a one-class support
vector machine. Detection models with different sensitivity were trained in parallel by changing the model tuning
parameters. Efforts were also made to find a procedure for selecting the model tuning parameters by first defining the
criticality of the system and using it when estimating how accurate the detection model should be. The method was able
to detect the fault earlier than traditional methods, without any false alarms. However, no optimal combination of features
and model tuning parameters was achieved which could identify the fault location without using any additional
techniques.
Keywords: Novelty detection, Wind turbine, Bearing fault diagnostics

1. Introduction

Wind energy is an important source of clean and renewable energy. In order to keep its competitiveness, the main-
tenance cost needs to be minimized and, therefore, effective maintenance strategies need to be adopted. The remote
location of wind farms has led to an opportunistic maintenance strategy where component degradations are monitored
over a long period and any non-critical maintenance actions are postponed until they can be handled simultaneously,
once the optimal opportunity has arrived (Keizer et al., 2017). For this reason, early fault detection and identification
are important, especially for the rotational structures of wind energy converters (WECs) (Hameed et al., 2009).
Two approaches commonly used for WEC fault detection are to look for inconsistencies in the process data which
indicate the presence of faults (Zaher et al., 2009; Schlechtingen and Santos, 2011) and to measure vibrations in the
converters (Villa et al., 2011; Barszcz and Randall, 2009). Vibration analysis is especially applicable to monitor the
gears and bearings in the gearbox, the bearings in the generator and the main bearing (Hameed et al., 2009). Vibration
analysis is carried out using spectral analysis and involves a process where, based on knowledge of mechanics, defect
frequencies are isolated which can give an early indication of certain faults. Furthermore, envelope analysis (a.k.a.
the high-frequency resonance technique) is found to be the foremost technique (Randall and Antoni, 2011), since it
is able to detect defect frequencies even if they are masked by vibration generated by structural elements and other
machine elements. When applying this technique, each time a defect in a rolling element bearing enters the rolling

∗ Corresponding author (e-mail: juhamatti.saari@ltu.se)

Preprint submitted to a journal November 8, 2018

element/raceway contact under load, it will cause an impact which usually excites a resonance in the system at a
high frequency (above 10 kHz), and thus the defect in question is easier to detect using the developed demodulation
technique (McFadden and Smith, 1984).
After relevant features are extracted (for instance the peak of the defect frequencies), one must define an appropri-
ate threshold for the alarm in order to know when the incipient fault has reached the limit when actions are necessary
for retaining or restoring the original function of the component in question (i.e. trend analysis). A common method
for setting the threshold is to monitor the values under nominal conditions, to define the threshold manually by assum-
ing the data points to have a Gaussian distribution, and to use the standard deviation together with the ”68–95–99.7”
rule. This is a fast and rough way to find the anomalies caused by faults in the system. However, this technique has
some limitations. Firstly, it can only be applied for one feature at a time and cannot spot the small changes which may
together be considered as the fault anomaly, since the features are not compared with each other, and since several
features are merely combined and summed together. This may lead to a situation where different features counterbal-
ance each other. Secondly, it is difficult to define which sigma range should be used for selecting the confidence level
and the number of outliers which should be allowed; this requires a manual effort.
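The manual sigma-rule thresholding described above can be sketched as follows (a hedged illustration; the choice of k and the baseline data are placeholders):

```python
import numpy as np

def sigma_threshold(baseline, k=3.0):
    # Assume the nominal-condition values are roughly Gaussian; by the
    # "68-95-99.7" rule, mean + 3*sigma is exceeded by only a small
    # fraction of healthy points, so healthy exceedances should be rare.
    baseline = np.asarray(baseline, dtype=float)
    return baseline.mean() + k * baseline.std()

def alarms(values, threshold):
    # Indices of measurements exceeding the manually set threshold.
    return np.flatnonzero(np.asarray(values, dtype=float) > threshold)
```

The limitations noted above apply directly: the threshold acts on one feature at a time, and the choice of k (the sigma range) remains a manual decision.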
In order to avoid these problems, machine learning techniques have been investigated where multiple key indica-
tors can be used as an input. Using supervised methods, a fault detection threshold is set by using existing historical
data, which are divided into two groups labelled nominal condition data and faulty condition data. These data are used
later for optimizing the boundary between these two classes. However, the drawback of machine learning techniques
is the need to have failure data covering all the possible failure modes. For windmill WECs, this problem is sometimes
avoided by training fault detection algorithms using data collected during failures observed in other similar turbines.
However, when applying this approach, the working environment and data collection are assumed to be similar, which
may not be a totally realistic assumption.
According to Hameed et al. (2009), an Aegis PR pattern recognition tool has been developed which is capable
of detecting wind turbine blade faults using unsupervised techniques. However, in the review presented in (Hameed
et al., 2009), no mention is made of using similar techniques for WEC gearbox faults. In unsupervised techniques,
the threshold is set using nominal data collected before any faults have been observed. There is no requisite for
failure data in such techniques and they can work with multiple input features. One unsupervised technique tested
for real machine fault monitoring problems is the kernel-based one-class support vector machine (SVM) (Schölkopf
et al., 1999; Shin et al., 2005; Fernández-Francos et al., 2013). Shin et al. (2005) used a one-class SVM for detecting
several machine faults and concluded it to be superior to the compared artificial neural network methods. However,
the one-class SVM was shown to be sensitive to the selected parameters and, therefore, follow-up studies are needed
which investigate how to select appropriate parameters for one-class SVMs (Shin et al., 2005). Fernández-Francos
et al. (2013) demonstrated how the one-class SVM was able to detect bearing defects by using energy-related features
extracted by dividing the power spectrum of the raw vibration signals into several sub-bands. After an anomaly
detection, an envelope analysis was performed where the classification of faults was carried out. In their experiment,
accurate fault detection was achieved each time using the same parameter values, which may reduce the sensitivity, as
stated by Shin et al. (2005).
The aim of the present study was to detect and identify WEC bearing faults by extracting relevant features using
envelope analysis, and then use these features as an input for the unsupervised one-class SVM. In order to diagnose
the presence of a specific fault as early as possible, several detection models were trained in parallel by changing
the input features together with the model tuning parameters. This was expected to lead to a situation where faults
could be detected and identified by comparing the accuracy of the different models. Moreover, efforts were made to
find a procedure for selecting the model tuning parameters by first defining the criticality of the system by estimating
how accurate the detection model should be. The analyses were compared with analyses performed using traditional
methods where the failure thresholds were set manually.

2. Measurement campaign and feature extraction


An SKF CMSS WIND-100-10 (100mV/g) accelerometer was mounted in the axial direction on the gearbox hous-
ing close to the high-speed shaft (HSS) generator side (GS) bearing. Figure 1 shows the schematics of the WEC,
where the gearbox stages are seen. The annulus rings of both planetary stages were fixed and the output shaft is on the
generator side. The accelerometer was mounted near the HSS shaft where bearing 1 and 2 are located (see Figure 1).

The HSS bearings, which were damaged during the measurement campaign, have been shown in statistical analyses to be
the gearbox bearings that most often develop a fault (Keller et al., 2012). The failed bearing was a tapered roller bearing
and its diameter was over 30 cm.

[Figure 1 here: schematic of the WEC drivetrain showing the main shaft, the 1st and 2nd planetary stages (sun, planets, carrier, fixed rings), the helical gear stage (gear and pinion), the output shaft with bearings 1 and 2, and the accelerometer location]

Figure 1: Schematic figure of the WEC where the location of the accelerometer is shown.

In total 518 measurements (one per day) were collected. The sample rate was 12.8 kHz and the measurement
time per measurement was 1.28 seconds. This led to a frequency resolution of 0.7813 Hz for each bin. During
the measurement campaign, the speed of the high-speed shaft varied from 702 to 1174 RPM. Moreover, a process
parameter which indicates how much power the converter is producing was recorded (referred to as the feature P).
The ratio between the highest and the smallest value of P was 2.77. In order to trend the wind turbine
degradation, key condition indicators (CI) were extracted (see Table 1) which are sensitive to certain faults. The defect
frequency multipliers for bearing 1 (B1) were 11.37 (BPFI) and 8.63 (BPFO). For bearing 2 (B2), these multipliers
were 9.25 (BPFI) and 6.75 (BPFO).
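For reference, the defect frequencies in Hz follow directly from the shaft speed and these multipliers (a small sketch using only the numbers stated above):

```python
def defect_frequency_hz(shaft_rpm, multiplier):
    # Defect frequency (Hz) = shaft speed (rev/s) * bearing defect multiplier.
    return shaft_rpm / 60.0 * multiplier

# Bearing 1 BPFI (multiplier 11.37) across the observed HSS speed range:
bpfi_low = defect_frequency_hz(702, 11.37)    # ~133.0 Hz at 702 RPM
bpfi_high = defect_frequency_hz(1174, 11.37)  # ~222.5 Hz at 1174 RPM
```

Because the shaft speed varies, the defect frequency location in the spectrum shifts accordingly, which is why the features below track peaks relative to the shaft speed rather than at fixed bins.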

Table 1: Explanation of uncommon features.

Feature        Explanation                                                                               Unit
BPFI           Sum of the BPFI and the absolute values of the ± 3 shaft speed SBs                        g
BPFI 1st H     Sum of the 1st harmonic of the BPFI and the absolute values of the ± 3 shaft speed SBs    g
BPFI 2nd H     Sum of the 2nd harmonic of the BPFI and the absolute values of the ± 3 shaft speed SBs    g
BPFIrss        Sum of the absolute values of all the BPFI-related defect frequency components            g
*BPFOrss       Sum of the absolute values of all the BPFO-related defect frequency components            g
P              Process (produced power)                                                                  Watt

*The BPFO-related features are calculated similarly to how the BPFI-related features are calculated,
without including any of the peak values of the shaft speed SBs.

Abbreviation   Meaning
BPFI           Ball pass frequency inner
BPFO           Ball pass frequency outer
SB             Sideband
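The BPFI-type features in Table 1 can be computed from an enveloped spectrum roughly as follows (a sketch under assumptions: `freqs`/`spec` come from an envelope spectrum, and the 1 Hz tolerance reflects the 0.7813 Hz bin width; function and parameter names are hypothetical):

```python
import numpy as np

def sum_peak_and_sidebands(freqs, spec, defect_hz, shaft_hz, n_sb=3, tol_hz=1.0):
    # Sum the absolute spectrum amplitude at the defect frequency and at its
    # +-n_sb shaft-speed sidebands, mirroring the BPFI feature in Table 1.
    total = 0.0
    for k in range(-n_sb, n_sb + 1):
        target = defect_hz + k * shaft_hz
        mask = np.abs(freqs - target) <= tol_hz
        if mask.any():
            total += np.abs(spec[mask]).max()  # peak within the tolerance window
    return total
```

The BPFIrss/BPFOrss variants would instead sum over all the defect-related components without the sideband peaks.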

3. One-class support vector machine

The aim of novelty detection is to identify new or unknown data which a machine learning system is not aware of
during training (Markou and Singh, 2003). A novelty detection algorithm uses two main approaches for estimating the
probability density function (PDF), namely the parametric approach (for known distributions) and the non-parametric
approach (for unknown distributions) (Desforges et al., 1998). In this study we used a method called the one-class
support vector machine (OCSVM, one-class SVM or nu-SVM), which was originally developed by Schölkopf et al.
(1999). It is a non-parametric method where the PDF is estimated using the kernel method.
Since only data from one class are available (nominal data in this case), the one-class SVM algorithm cannot
maximize the margin between two classes similarly to what is done when using the regular SVM (Schölkopf et al.,
1999). Instead, the goal is to develop an algorithm which returns a function that takes value 1 in a small region (a
nominal region in this case) and -1 elsewhere (a faulty region). The strategy is to map the data into feature space F
(using a known kernel function) and to separate them from the origin with the maximum margin, which is also known
as the hyperplane. The objective function of a one-class SVM (Equation 1) resembles that of the two-class SVM
with some small differences. Instead of the cost function, in the one-class SVM it is the parameter ν ∈ [0, 1] that
characterizes the solution of the quadratic optimization problem (Schölkopf et al., 1999)

$$\min_{\omega,\,\xi,\,\rho}\;\frac{\|\omega\|^{2}}{2}+\frac{1}{\nu n}\sum_{i=1}^{n}\xi_{i}-\rho \qquad (1)$$

Subject to:

$$(\omega\cdot\Phi(x_{i}))\geq\rho-\xi_{i}\quad\forall i\in N, \qquad \xi_{i}\geq 0\quad\forall i\in N,$$

where n is the number of instances, ρ is the offset parameter and ξ is the slack variable. ω and ρ are the hyperplane
parameters appearing in the equation $\omega^{T}x+\rho=0$.
This quadratic optimization problem is solved using Lagrange multipliers, and the decision function rule for a
datapoint x then becomes (Schölkopf et al., 1999)

$$f(x)=\operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_{i}y_{i}K(x,x_{i})+b\right), \qquad (2)$$

where the coefficient αi > 0, b is a constant and K is the kernel function.


To ensure an excellent performance, the kernel must be chosen with great care, as it has a crucial effect on the
performance (Muller et al., 2001). Determining which kernel to use depends on the data and the number of features.
In this study the Gaussian radial base function (RBF) was used:
$$K(x,x')=\exp\left(-\gamma\,\|x-x'\|^{2}\right), \qquad (3)$$

where γ is the kernel parameter and $\|x-x'\|$ is the dissimilarity measure.

4. Training the one-class SVM

Selecting the Gaussian (RBF) kernel and using the default tolerance of the termination criterion, there are two
parameters (γ and ν) which need to be optimized when using the one-class SVM toolbox (Chang and Lin, 2011). γ
defines the extent of the influence of a single training example; increasing the γ value lowers the influence. ν sets
an upper bound on the fraction of outliers (training examples regarded as out-of-class) and is a lower bound on the
fraction of training examples used as support vectors (Schölkopf et al., 2000).
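The effect of these two parameters can be tried out with an off-the-shelf implementation (a minimal sketch using scikit-learn's `OneClassSVM`, which wraps LIBSVM; the synthetic data and the ν and γ values here are arbitrary, not those of the study):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
nominal = rng.normal(0.0, 1.0, size=(100, 2))  # stand-in for 100 days of nominal features

# nu bounds the fraction of training outliers; gamma sets the RBF kernel width.
model = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(nominal)

# +1 = inside the learned nominal region, -1 = novelty (cf. Eq. (2))
labels = model.predict(np.vstack([nominal, [[8.0, 8.0]]]))
```

Larger ν allows more training points outside the nominal region (a more liberal model), while larger γ tightens the region around individual training examples.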

[Figure 2 here: ROC space with sensitivity on the y-axis and 1-specificity on the x-axis, showing the ideal point A, the liberal point B, the conservative point C, and the targeted areas A1, A2 and A3]

Figure 2: Illustration of an ROC curve. Point A represents the ideal classifier, which is able to classify all the data points correctly with zero false
positives and zero false negatives. Point B represents the classifier which is able to classify correctly the times when the machine is faulty with zero
false negatives. Point C represents the classifier which is able to classify correctly every time when the machine is healthy with zero false positives.
Areas A1 (specificity 1–0.95), A2 (specificity 0.8–0.75) and A3 (specificity 0.6–0.55) are areas which can be targeted when setting the initial trade-off accuracy
between sensitivity and specificity.

5. Estimating the criticality of the system

The trade-off between sensitivity (how well the model is able to predict when the system is in a faulty state)
and specificity (how well the model is able to predict when the system is in a healthy state) is often plotted as an
ROC (receiver operating characteristic) space, as seen in Figure 2. The challenge is to find a reasonable, rational and
desirable balance between sensitivity and specificity (Swets et al., 2000).
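For clarity, the two quantities can be written out explicitly (a small helper with hypothetical naming; 1 = faulty, 0 = healthy):

```python
def sensitivity_specificity(y_true, y_pred):
    # Sensitivity = TP / (TP + FN): share of faulty states detected.
    # Specificity = TN / (TN + FP): share of healthy states passed without alarm.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

A liberal model raises sensitivity at the cost of specificity (more false alarms); a conservative model does the opposite.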
In the ROC space diagram, there is one area called the conservative area and one called the liberal area. In general,
conservative models are able to detect when the system is healthy with fewer false alarms related to a false positive,
but may miss faults by giving more false negatives. Liberal models should be able to trigger an alarm signalling when
the system is no longer healthy, but may give more false alarms by increasing the false positive value. The remote
location of wind farms makes them hard to maintain. Therefore, the cost of false alarms is high and should be avoided.
On the other hand, critical faults must be avoided since the initial cost of building a new wind turbine is extremely
high. For these reasons, the wind turbine can be considered as a slightly liberal system. Considering these attributes,
the closest estimated area for detecting WEC faults should be near the A2 area seen in Figure 2. Models under group
A2 should give a rather good detection sensitivity when a fault is present, but may sometimes give false alarms. The
other areas tested in this study are those marked as A1 and A3. Theoretically, area A1 should give fewer false alarms,
but should give a later indication of when a fault is present than models near area A2 or A3. Models in group A3
should give more false alarms than those in the other groups, but should give earlier indications of when a fault has
occurred. In this study these three areas are estimated using the following procedure when training the one-class SVM
detection algorithm.
1. Nominal data were collected before any faults were present (covering 120 days).
2. The nominal data were divided into two classes, i.e. a training set (covering 100 days) and a testing set (covering
20 days).
3. The initial testing set (covering 20 days) was used to calculate the accuracy (later referred to as the baseline
specificity). Three baseline specificity values were targeted, 0.95, 0.85 and 0.7, and they were based on the
selected areas A1, A2 and A3.
4. A grid-search method was used in which 15 ν and 15 γ parameter values were evaluated; the maximum ν and γ values were
chosen up to a value which would give a baseline accuracy of 50 % or higher.

5. Four models from each A area were selected. If more than four models from the selected area were found, the
models were selected where ν and γ are equal or almost equal.
6. Once the parameter values were selected, the models were re-trained using all the nominal data (covering 120
days).
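The parameter-selection steps above can be sketched as a grid search (an illustration under assumptions: scikit-learn's `OneClassSVM`, synthetic data, and illustrative grid bounds; the study's actual bounds were chosen so that the baseline accuracy stays at 50 % or higher):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def select_models(train, test, target, width=0.05, n_grid=15):
    # Steps 1-5: train on nominal data, measure the baseline specificity on a
    # held-out nominal test set, and keep (gamma, nu) pairs whose specificity
    # falls inside the targeted band (e.g. target=0.85 for area A2).
    selected = []
    for gamma in np.linspace(0.01, 2.0, n_grid):
        for nu in np.linspace(0.01, 0.6, n_grid):
            m = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(train)
            spec = (m.predict(test) == 1).mean()   # baseline specificity
            if target - width <= spec <= target + width:
                selected.append((gamma, nu, spec))
    return selected
```

Step 6 would then re-train the chosen models on all the nominal data before deployment.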

6. Results

The presentation of the results is divided into two parts. First, we present an analysis of the condition of the wind
turbine performed using traditional condition monitoring methods (i.e. key indicators extracted from the time domain
and enveloped signals), and then we present a corresponding analysis performed using the one-class SVM.

6.1. Traditional analysis

[Figure 3 here: RMS acceleration and RMS of the enveloped signal trended over the measurement days, with Event 1 (incipient fault) and Event 2 (spall detected) marked with arrows]

Figure 3: RMS level of the measured acceleration signals from the wind turbine.

Before any hindsight failure analysis, two events were detected using available traditional vibration analysis tools.
These events (Events 1 and 2) are marked in Figure 3 with arrows. Event 1 indicates a point (on the 240th day) when
a small fault was first noticed, but could not be identified or verified. Event 2 (on the 347th day) indicates the time
when the fault was identified to be in the inner race of bearing 1. Detailed spectra for both of these events and for two
measurements during the spall growth are shown in Figure 4.
The spectrum for event 1 is shown in Figure 4a, where the BPFI defect frequency component, its two harmonics
and sidebands are marked with narrow lines. This event was estimated by condition monitoring technicians to most
likely be a surface anomaly, which was not yet a spall. As can be seen from Figure 4a, the overall level of vibration
has increased compared to the nominal condition. However, clear indications cannot be seen in the close
proximity of the marked defect frequencies. The spectrum for event 2, seen in Figure 4b, shows an even more elevated
overall vibration level for multiple frequency components. Nevertheless, neither the BPFI defect frequency component nor its
harmonics are significantly higher than any other frequency components. Some of the sidebands are elevated higher
than any other frequency components, which may indicate the start of a spall. Five days later, the BPFI defect frequency
component and its harmonics can clearly be seen in Figure 4c. At this point, most of the vibration
frequency components are ten times higher than what they were in the beginning. At this point, most of the good
vibration analysis tools should be able to detect the presence of a fault. Figure 4d shows the spectrum after the spall
growth had continued for at least 50 days. At this point the BPFI defect frequencies were clearly separated from the other
frequency components.
[Figure 4 here: four enveloped spectra (gE vs. frequency, 0–600 Hz), each compared against the nominal condition, with the BPFI defect frequency component, its 1st and 2nd harmonics and the sidebands marked]

(a) Day 240 (Event 1).
(b) Day 340 (Event 2).
(c) Day 350.
(d) Day 400.

Figure 4: Enveloped spectra of the acceleration signals. The yellow curve shows the spectrum calculated during the first measurement day.

Figure 3 shows the RMS level of the accelerometer signal measured near the high-speed shaft of the WEC as a
function of time. The RMS and enveloped RMS values in event 1 are only slightly higher than those seen on days
81–89 and day 167. Therefore, the threshold would have needed to be placed exactly on the correct level in order to
detect this incipient fault without any false alarms. The identification of the inner race fault (event 2) might have taken
place earlier, since similar shapes (to those appearing during event 2) were also seen twice (see the peaks on day 324
and day 335) before the confirmed spalling. This would indicate that the spalling started at least as early as day 324,
but was unnoticed. However, when comparing the peaks seen in event 2 to the one seen at day 81–89, an identification
was made rather early, since the spalling could easily have been undetected with the threshold not set exactly to the
correct position. Furthermore, most likely the latest detection time using RMS or enveloped RMS indicators would
have been around the 354th measurement, since the levels are twice as high as those seen on day 8 (an RMS over 5

and an enveloped RMS over 8). Note that the RMS levels would only be able to send an alarm and would be unable
to identify the location of the fault.
Figure 5a shows the trend of four key indicators used for identifying bearing faults. These features were calculated
from the enveloped accelerometer signals and each feature is explained in Table 1. Moreover, events 1 and 2 are marked
similarly to how they were marked in Figure 3.
At event 1, the time of the incipient fault, no difference between inner and outer race features can be observed.
Without any further processing, the four features do not provide data to determine fault location at this early stage of
the degradation. When comparing the levels of each feature presented in Figure 5a near event 2, a slight increase is
seen in the level of BPFIrss in bearing 1. These results indicate that the fault is located in the inner race in bearing 1.
At day 324 an increased level of BPFIrss is also seen, and this is most likely the first point when some indications of the
fault location could have been identified using features based on the envelope analysis.
Figure 5b shows the same analysis as Figure 5a, with the difference that each feature has been divided by the process parameter P. When the features are scaled with P, the peaks previously seen at days 81–89 and at day 167 are smeared out. Because the process parameter P is correlated with the load, these peaks were probably caused by high load in the WEC system. In fact, on day 167 the P feature took its eighth largest value of the whole measurement campaign. It could therefore be advantageous to scale vibration levels with load-based features when engineering input features for the one-class SVM.
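The point-wise scaling used for Figure 5b can be sketched as follows; mapping P into the range [1, 2] matches the normalization described later for the detection experiments, while the function name and array shapes are assumptions for this example.

```python
import numpy as np

def scale_by_process(features, p):
    """Divide each feature vector point-wise by a load-correlated process
    parameter P, min-max mapped to [1, 2] so that the divisor is never
    close to zero.

    features: (n_samples, n_features) array of envelope-band amplitudes.
    p:        (n_samples,) process parameter, one value per measurement.
    """
    features = np.asarray(features, dtype=float)
    p = np.asarray(p, dtype=float)
    span = p.max() - p.min()
    p_norm = np.ones_like(p) if span == 0 else 1.0 + (p - p.min()) / span
    return features / p_norm[:, None]
```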

[Figure: trends of B1 BPFIrss, B1 BPFOrss, B2 BPFIrss and B2 BPFOrss over days 0–600, with Event 1 (incipient fault) and Event 2 (spall detected) marked.]
(a) Features without scaling with the process parameter.
(b) Features scaled with the process parameter.

Figure 5: Trend of wind turbine features calculated from the enveloped accelerometer signals.

6.2. Fault detection using a one-class SVM
Based on the vibration analysis performed using the traditional methods, the WEC degradation data set was
divided into several segments (see Table 2). These segments represent estimated stages of fault progression. For
instance, if the trained one-class SVM model sends an alarm between days 121 and 237, it is considered to be a false
alarm. Note, that these segments are estimated using manual vibration analysis and cannot be validated with high
certainty, especially during the early period of the measurement campaign. Therefore, in the Table 2 description of
the symptoms is given how the segments are divided based on the vibration levels during the measurement campaign.

Table 2: Separating the wind turbine degradation data into several segments according to the condition of the bearing using traditional methods.

Segments (d)   WEC condition    Estimated stage                          Symptoms
1–100          Nominal          Training                                 Low level of overall vibration
101–120        Nominal          Baseline specificity testing/Training    Low level of overall vibration
121–237        Nominal          False positive                           Low level of overall vibration
238–340        Incipient fault  Early alarm                              Sudden increase of CI levels
341–350        Faulty system    Late alarm/Early fault identification    Sidebands around BPFI defect frequencies
351–430        Faulty system    Late fault identification                Increase of BPFI-related defect frequencies
431–end        Faulty system    Fast fault propagation                   High level of overall vibration

Table 3: One-class SVM baseline specificity (%) by varying the ν and γ parameters. Three estimated areas (A1-A3) are highlighted using different
shades of grey background. Three BPFI-related features were used as an input.

ν γ 0.01 0.045 0.08 0.12 0.15 0.19 0.22 0.26 0.29 0.33 0.36 0.40 0.43 0.47 0.5
0.01 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
0.045 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95
0.08 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95
0.12 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95
0.15 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95
0.19 85 85 85 85 85 85 85 90 85 90 85 85 85 85 85
0.22 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85
0.26 80 80 80 80 80 80 80 80 80 80 85 85 85 85 85
0.29 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70
0.33 60 60 60 60 60 60 60 60 60 60 60 60 60 55 55
0.36 60 60 60 60 60 60 55 55 55 55 55 55 55 55 55
0.40 50 50 50 50 50 50 50 50 50 50 55 55 55 55 55
0.43 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
0.47 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
0.5 45 45 45 45 50 50 50 50 50 50 50 50 50 50 50
A1 A2 A3

One-class SVM models for fault detection were selected using the steps explained in Section 5. The nominal
(healthy) data were divided into two sets, the first of which was used for initial training while the second was used for
calculating the baseline specificity. The input features for the results presented in Table 3 were BPFI, BPFI 1st H and
BPFI 2nd H for bearing 1. The testing set consisted of twenty points and, therefore, the specificity changes in steps of
five percentage points. As can be seen in Table 3, the specificity decreases almost monotonically as the ν value
increases, whereas the γ value has less effect on the baseline specificity.
Based on the results presented in Table 3, 12 sets of model parameters were selected by choosing four models from
area 1 (highlighted with a light grey background), where the targeted baseline specificity was 95 %. In this area the
ν values were between 0.045 and 0.15 and the γ values between 0.045 and 0.5. Four models were selected from area 2
(highlighted with a medium grey background), where the targeted baseline specificity was 85 %. In this area both the ν
and the γ values were between 0.19 and 0.26. Four models were selected from area 3 (highlighted with a dark grey
background), where the targeted baseline specificity was 70 %. In this area the ν value was always 0.29 and the γ values
were between 0.01 and 0.5. If more than four models had the same initial specificity within the selected range, the
models nearest the diagonal, where ν and γ have equal or almost equal values, were chosen. After this step, models
with the chosen tuning parameters were re-trained using all 120 nominal points in order to increase the training set size.
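The ν–γ grid search over the baseline specificity can be sketched as below. The study builds on LIBSVM; this sketch uses scikit-learn's OneClassSVM interface to the same algorithm, and the function name is invented for this example.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def baseline_specificity_grid(train, test, nus, gammas):
    """Baseline specificity (% of healthy test points predicted healthy)
    for every (nu, gamma) combination, as in Tables 3 and 4."""
    grid = np.zeros((len(nus), len(gammas)))
    for i, nu in enumerate(nus):
        for j, gamma in enumerate(gammas):
            model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(train)
            pred = model.predict(test)          # +1 = healthy, -1 = anomalous
            grid[i, j] = 100.0 * np.mean(pred == 1)
    return grid
```

With a 20-point test set the specificity moves in steps of 5 percentage points, which is why the tables contain values such as 95, 90 and 85.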
The accuracy of each chosen model was then tested on the remaining data set (from measurement point 121 to
the end) using a sliding window over the five latest measurement points. When all five measurements indicated the
presence of a fault (0/5 healthy points), an alarm was raised and the accuracy from that point onwards was set to zero.
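The sliding-window alarm rule described above can be sketched as follows (the function name and the +1/−1 healthy/anomalous encoding are assumptions for this example):

```python
def window_alarm(predictions, window=5, max_healthy=0):
    """Return the index at which a collective alarm is first raised.

    predictions: sequence of +1 (healthy) / -1 (anomalous) model outputs.
    An alarm is raised when at most `max_healthy` of the last `window`
    points are predicted healthy; the default 0/5 corresponds to five
    consecutive anomalous points. Returns None if no alarm occurs.
    """
    for i in range(window - 1, len(predictions)):
        recent = predictions[i - window + 1 : i + 1]
        n_healthy = sum(1 for p in recent if p == 1)
        if n_healthy <= max_healthy:
            return i
    return None
```

Relaxing `max_healthy` to 1 gives a 4/5 rule, which alarms earlier at the cost of more false alarms.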

Fault detection using specific fault frequency features


Figure 6 shows the alarm times for each of the 12 models when the input features were specified for bearing
1, and Figure 7 shows the corresponding results when the input features were specified for bearing 2. In these figures,
the first row shows the four models whose parameter values were selected using a baseline specificity of 0.95. These
models are considered to be in area 1.
When comparing these first-row models with each other, the results are very similar. All models specified for the inner
race fault of bearing 1 sent an alarm on day 328, which can be considered a good detection result. However, the models
specified for detecting bearing 1 outer race faults detected the fault at the same time. Therefore, it is not possible to
say with any accuracy whether the fault is on the inner or the outer raceway.
When these models were compared with models trained using a feature set sensitive to faults of the bearing that
remained healthy (upper row, Figure 7), similar results were seen. It can therefore be concluded that these
A1 models could have been used for detecting abnormal behaviour when 5 consecutive anomalous points are used as
the alarm criterion, but that it was not possible to identify the fault location. Surprisingly, the A1 model (ν = 0.115,
γ = 0.115) trained to detect the inner race fault of bearing 2 would have flagged abnormal behaviour with 4/5
consecutive anomalous points at day 285. Even though this can be considered an early alarm time, it would have been
difficult to differentiate it from a false alarm, especially since this bearing was not the one that was damaged.
The second rows of Figures 6 and 7 show the four models whose parameter values were selected using a baseline
specificity of 0.85. These models are considered to be in area 2 and should give an earlier alarm than the A1 models,
but may produce more false alarms. These models would also have raised the alarm at the same time (day 328) if all
measurement points in the tested window were required to be anomalous. Two of the models, however, would have
raised the alarm at day 285 with 4/5 consecutive anomalous points, although the sensitivity was then the same as when
the model was trained using the feature set specified to detect bearing 2 inner race faults.
The third rows of Figures 6 and 7 show the four models whose parameter values were selected using a baseline
specificity of 0.7. The results for these models were rather similar to those of the A2 models; the only exception was
that all of them would have raised the alarm on day 285 with 4/5 consecutive anomalous points.
In general, these results indicate that when collective anomalies are used, with 5 consecutive anomalous points as the
alarm threshold, the selection of the model tuning parameters is less important. However, when 3/5 consecutive
anomalous points or fewer are used as the threshold, large differences between the models were seen. For instance,
using the 3/5 rule for the A1 models (first rows of Figures 6 and 7), there is only one period in which false alarms
occur, whereas with the same threshold the A2 and A3 models show two periods (the first starting at day 170 and the
second at day 203) with multiple false alarms.
Note that the parameter selection based on the calculated baseline specificity was only performed using features
related to the inner race fault, in order to be able to compare models trained using the same tuning parameters. In
reality, the parameter values should be re-calculated from the baseline specificity when different input features are
used, in order to obtain the correct model accuracy for other faults too.

[Figure: six pairs of panels showing predicted healthy points (0–5) over days 125–375 for the A1 models (ν = 0.08 or 0.115, γ = 0.01–0.115), A2 models (ν = 0.185 or 0.22, γ = 0.185–0.22) and A3 models (ν = 0.29, γ = 0.22–0.325).]

Figure 6: Detecting bearing 1 faults using the one-class SVM algorithm. The left-hand column shows the number of predicted healthy points when
the models have been trained using features related to inner race faults (BPFI, BPFI 1st H and BPFI 2nd H). The right-hand column shows the
corresponding results when the models have been trained using features related to outer race faults (BPFO, BPFO 1st H and BPFO 2nd H).

[Figure: six pairs of panels showing predicted healthy points (0–5) over days 125–375 for the A1 models (ν = 0.08 or 0.115, γ = 0.01–0.115), A2 models (ν = 0.185 or 0.22, γ = 0.185–0.22) and A3 models (ν = 0.29, γ = 0.22–0.325).]

Figure 7: Detecting bearing 2 faults using the one-class SVM algorithm. The left-hand column shows the number of predicted healthy points when
the models have been trained using features related to inner race faults (BPFI, BPFI 1st H and BPFI 2nd H). The right-hand column shows the
corresponding results when the models have been trained using features related to outer race faults (BPFO, BPFO 1st H and BPFO 2nd H).

Fault detection using specific fault frequency features divided by the process parameter
The results using normalized features are presented in Figures 8 and 9. The approach is similar to the previously
presented one, except that now each feature is divided by the measured and normalized process feature (values
normalized between 1 and 2). Comparing the first-row models in Figure 8, it can be seen that the fault was detected at
day 257 by the model with ν = 0.115 and γ = 0.08 and by the model with ν = 0.115 and γ = 0.115. The other
two models were able to detect the fault at day 301. The fault detections on days 257 and 301 can both be considered
early alarms. As a comparison, the models trained to detect inner race faults of bearing 2 also raised the alarm at
day 257. This indicates that fault detection using features divided by the process parameter is not as promising as
it first seems. For the second-row models (area A2), an alarm was again triggered for bearing 1 at day 257, as shown
in Figure 8. Furthermore, all A2 models would have caused false alarms before day 196, and the third-row models
were even worse than the A2 models.
Overall, based on the results presented in Figures 8 and 9, fault detection and identification using the one-class SVM
performed worse than the traditional methods. Features divided by the process parameter caused more false positive
point anomalies, and good results were not achieved even though some of the peaks, most likely caused by increased
stresses in the WEC, were smeared out. Using the process parameter to reduce the effects of operational changes was
therefore unsuccessful.

Fault identification by combining fault-specific features


Because the identification of the fault location was unsuccessful using models with inner ring and outer ring features
separately, one-class SVM models including both sets of features were tested. In total, six features were considered:
BPFI, BPFO, BPFI 1st, BPFO 1st, BPFI 2nd and BPFO 2nd. Models with combined features, which are able to detect
both inner and outer ring faults, would also avoid the problem of redundant detection models in systems with several
bearings.
Before training the one-class SVM models, the baseline specificity was estimated using the grid search over the 15 ν
and 15 γ values. The results are shown in Table 4, from which four models were selected from each area (A1–A3) for
further study.
Table 4: One-class SVM baseline specificity (%) by varying the ν and γ parameters. Three estimated areas (A1–A3) are highlighted using different
shades of grey background. The six combined BPFI- and BPFO-related features were used as input.

ν γ 0.01 0.045 0.08 0.12 0.15 0.19 0.22 0.26 0.29 0.33 0.36 0.40 0.43 0.47 0.5
0.01 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
0.045 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
0.08 100 100 100 100 100 100 100 95 95 95 95 95 95 95 95
0.12 95 95 95 95 95 95 95 90 90 90 90 90 90 90 90
0.15 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
0.19 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85
0.22 75 75 75 75 75 75 75 75 75 75 75 75 85 85 80
0.26 70 70 70 70 70 70 75 75 75 75 70 70 70 70 70
0.29 70 70 70 65 65 65 65 65 65 65 65 65 65 65 65
0.33 55 55 55 55 55 55 55 55 55 55 55 55 55 55 65
0.36 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55
0.40 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55
0.43 50 50 50 50 50 50 50 50 50 50 50 50 50 55 55
0.47 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
0.5 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
A1 A2 A3

The results when the defect frequency features were targeted to detect faults in bearing 1 are presented in Figure 10. As can
be seen in the figure, all A1 models detected the fault at day 355, which can be considered a late fault detection time.
Before day 325 there were no anomalous points, except immediately after day 125. Using 3 out of 5 consecutive
anomalies as the detection threshold, these models would have detected the anomalous event around day 325, which
can be considered an early time of detection. The A2 models detected the anomalous event at day 329 by using 5/5
[Figure: six pairs of panels showing predicted healthy points (0–5) over roughly days 125–305 for the A1 models (ν = 0.08 or 0.115, γ = 0.01–0.115), A2 models (ν = 0.185 or 0.22, γ = 0.185–0.22) and A3 models (ν = 0.29, γ = 0.22–0.325).]

Figure 8: Detecting bearing 1 faults using the one-class SVM algorithm. The left-hand column shows the number of predicted healthy points when
the models have been trained using features related to inner race faults (BPFI, BPFI 1st H and BPFI 2nd H), which are divided point-wise by feature
vector P. The right-hand column shows the corresponding results when the models have been trained using features related to outer race faults
(BPFO, BPFO 1st H and BPFO 2nd H), which are divided point-wise by feature vector P.

[Figure: six pairs of panels showing predicted healthy points (0–5) over roughly days 125–285 for the A1 models (ν = 0.08 or 0.115, γ = 0.01–0.115), A2 models (ν = 0.185 or 0.22, γ = 0.185–0.22) and A3 models (ν = 0.29, γ = 0.22–0.325).]

Figure 9: Detecting bearing 2 faults using the one-class SVM algorithm. The left-hand column shows the number of predicted healthy points when
the models have been trained using features related to inner race faults (BPFI, BPFI 1st H and BPFI 2nd H), which are divided point-wise by feature
vector P. The right-hand column shows the corresponding results when the models have been trained using features related to outer race faults
(BPFO, BPFO 1st H and BPFO 2nd H), which are divided point-wise by feature vector P.

as the threshold of alarm. Compared with the A1 group models, there are more occasions where one or two
measurements are classified as anomalies. This is expected and aligns with the initial hypothesis behind the ROC-space
selection of models. The A3 models also detected the anomalous event at day 329; because they are slightly more
over-fitted than the A2 models, more anomalies were seen than in the A2 group.
By selecting a threshold where four faulty cases raise an alarm (0 or 1 out of 5 points predicted as healthy), all 12
models would have detected the anomalous event at an early stage of the fault.
In reality, it is not useful for detection models to target only one component in a system. Therefore, a one-class SVM
was trained using the same technique, but with features that are more sensitive to bearing 2 faults. Since bearing 2
remained healthy during the measurement campaign, the fault in bearing 1 should be detected and localized later or
not at all.
The result for the one-class SVM trained for bearing 2 is shown in Figure 11. With features sensitive to bearing 2
faults and 5 out of 5 faulty points as the threshold, all A1 models detected the fault in bearing 1 at day 357, only two
days later than the models trained to detect bearing 1 faults. Using 4/5 as the threshold, the anomalous event was
detected 10 days earlier than when using features sensitive to bearing 1 faults. These results indicate that the one-class
SVM can detect abnormal events using the A1 models even though the fault does not occur in the bearing they were
trained for. The reason can be an overall increased vibration level, as well as leakage of frequency components, as
explained by Wu and Zhao (2009). To improve fault localization, each feature could be divided by the overall vibration
level of the signal envelope.
The A2 group models detected the anomalous event at the same time or some 27 days later (with the 0/5 threshold)
than the models trained for the fault in bearing 1. For the A3 group, detection occurred at the same time as when
using features sensitive to bearing 1 faults.
Comparing the A1, A2 and A3 groups, the best results were achieved by the one-class SVM models in the A2 group.
For this group, the anomalous event was detected earlier than, or at the same time as, with features insensitive to the
inner race fault that occurred in bearing 1. Furthermore, these models were able to raise the alarm earlier than the
fault was detected using traditional methods.
Because the one-class SVM models using both inner and outer ring features were not able to identify the location
of the fault, further development is needed. One approach is to calculate the median value of each specific defect
frequency component at the beginning of the test and compare it against the mean value of the measurement points
inside the detection window once the model has detected the anomalous event.
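This post-processing idea could be sketched as follows: the median of each defect-frequency feature over the healthy start of the test is compared with the mean over the detection window, and the features are ranked by their growth ratio. The function and feature names are illustrative assumptions, not part of the study.

```python
import numpy as np

def localize_fault(baseline_history, detection_window, feature_names):
    """Rank defect-frequency features by growth relative to a healthy baseline.

    baseline_history: (n_baseline, n_features) features from the start of the test.
    detection_window: (n_window, n_features) points inside the detection window.
    Returns (name, ratio) pairs sorted by decreasing mean-to-median ratio;
    the largest ratio points to the most likely fault location.
    """
    baseline = np.median(np.asarray(baseline_history, dtype=float), axis=0)
    current = np.mean(np.asarray(detection_window, dtype=float), axis=0)
    ratio = current / baseline
    order = np.argsort(ratio)[::-1]
    return [(feature_names[k], float(ratio[k])) for k in order]
```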
If the investigated method is used for detecting other types of faults or in other wind turbines, some aspects ought
to be considered.

• The training set should be long enough to cover even some of the rarely seen operating conditions. Otherwise,
too many false alarms occur and good tuning parameters are difficult to find.
• Since the best results were achieved by using 4/5 or 5/5 consecutive anomalous points as the alarm threshold,
collective anomaly detection should work better than point anomaly detection.
• In most cases, features specified to detect a certain fault were as accurate as features specified to detect
another type of fault. Therefore, the one-class SVM model cannot identify the fault location, and further studies are
needed.
• Since identification was not possible, it is advisable to use a feature set in which numerous defect-frequency-related
features are combined.

7. Conclusions

The present study arrived at the following findings concerning the detection of WEC bearing faults by using a
one-class SVM with the given data set, where an inner race spalling was later confirmed.

• Accurate fault detection was achieved when BPFI-related and BPFO-related features were combined as one
feature set and the model parameters were selected using pre-defined baseline specificity of 0.85.

[Figure: three panels (A1 models: ν = 0.08 or 0.115, γ = 0.01–0.115; A2 models: ν = 0.185 or 0.22, γ = 0.15–0.465; A3 models: ν = 0.255 or 0.29, γ = 0.045–0.36) showing predicted healthy points (0–5) over days 125–375.]

Figure 10: Detecting bearing 1 faults using the one-class SVM algorithm. Models have been trained using features related to inner race faults
(BPFI, BPFI 1st H and BPFI 2nd H) and to outer race faults (BPFO, BPFO 1st H and BPFO 2nd H).

• Fault detection capability was worse when each measured feature was scaled using the process parameter.
• No combination of model tuning parameters and feature set was found that was able to identify the bearing
fault location without additional post-processing techniques.

8. Acknowledgements
The authors would like to thank SKF AB and Vinnova for their financial support.

References
Barszcz, T., Randall, R. B., 2009. Application of spectral kurtosis for detection of a tooth crack in the planetary gear of a wind turbine. Mechanical
Systems and Signal Processing 23 (4), 1352–1365.

[Figure: three panels (A1 models: ν = 0.08 or 0.115, γ = 0.115–0.29; A2 models: ν = 0.185 or 0.22, γ = 0.15–0.465; A3 models: ν = 0.255 or 0.29, γ = 0.045–0.36) showing predicted healthy points (0–5) over days 125–375.]

Figure 11: Detecting bearing 2 faults using the one-class SVM algorithm. Models have been trained using features related to inner race faults
(BPFI, BPFI 1st H and BPFI 2nd H) and to outer race faults (BPFO, BPFO 1st H and BPFO 2nd H).

Chang, C.-C., Lin, C.-J., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST)
2 (3), 27.
Desforges, M., Jacob, P., Cooper, J., 1998. Applications of probability density estimation to the detection of abnormal conditions in engineering.
Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 212 (8), 687–703.
Fernández-Francos, D., Martínez-Rego, D., Fontenla-Romero, O., Alonso-Betanzos, A., 2013. Automatic bearing fault diagnosis based on one-
class ν-SVM. Computers & Industrial Engineering 64 (1), 357–365.
Hameed, Z., Hong, Y., Cho, Y., Ahn, S., Song, C., 2009. Condition monitoring and fault detection of wind turbines and related algorithms: A
review. Renewable and Sustainable energy reviews 13 (1), 1–39.
Keizer, M. C. O., Flapper, S. D. P., Teunter, R. H., 2017. Condition-based maintenance policies for systems with multiple dependent components:
A review. European Journal of Operational Research.
Keller, J., McDade, M., LaCava, W., Guo, Y., Sheng, S., 2012. Gearbox reliability collaborative update. US National Renewable Energy Laboratory
(NREL) report PR-5000-54558.
Markou, M., Singh, S., 2003. Novelty detection: a review. Part 1: Statistical approaches. Signal Processing 83 (12), 2481–2497.

McFadden, P., Smith, J., 1984. Vibration monitoring of rolling element bearings by the high-frequency resonance technique: a review. Tribology
International 17 (1), 3–10.
Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B., 2001. An introduction to kernel-based learning algorithms. IEEE Transactions on
Neural Networks 12 (2), 181–201.
Randall, R. B., Antoni, J., 2011. Rolling element bearing diagnostics: a tutorial. Mechanical Systems and Signal Processing 25 (2), 485–520.
Schlechtingen, M., Santos, I. F., 2011. Comparative analysis of neural network and regression based condition monitoring approaches for wind
turbine fault detection. Mechanical systems and signal processing 25 (5), 1849–1875.
Schölkopf, B., Smola, A. J., Williamson, R. C., Bartlett, P. L., 2000. New support vector algorithms. Neural computation 12 (5), 1207–1245.
Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., Platt, J. C., et al., 1999. Support vector method for novelty detection. In: NIPS.
Vol. 12. Citeseer, pp. 582–588.
Shin, H. J., Eom, D.-H., Kim, S.-S., 2005. One-class support vector machines: an application in machine fault detection and classification. Computers
& Industrial Engineering 48 (2), 395–408.
Swets, J. A., Dawes, R. M., Monahan, J., 2000. Psychological science can improve diagnostic decisions. Psychological science in the public interest
1 (1), 1–26.
Villa, L. F., Reñones, A., Perán, J. R., De Miguel, L. J., 2011. Angular resampling for vibration analysis in wind turbines under non-linear speed
fluctuation. Mechanical Systems and Signal Processing 25 (6), 2157–2168.
Wu, J., Zhao, W., 2009. A simple interpolation algorithm for measuring multi-frequency signal based on DFT. Measurement 42 (2), 322–327.
URL http://www.sciencedirect.com/science/article/pii/S026322410800105X
Zaher, A., McArthur, S., Infield, D., Patel, Y., 2009. Online wind turbine fault detection through automated SCADA data analysis. Wind Energy
12 (6), 574–593.

6. PAPER B

Paper C

Operations Research Perspectives 5 (2018) 232–244


Detecting operation regimes using unsupervised clustering with infected
group labelling to improve machine diagnostics and prognostics

Juhamatti Saari a,b,*, Johan Odelius b
a SKF-LTU University Technology Centre, Luleå University of Technology, Luleå, SE-97187, Sweden
b Division of Operation, Maintenance and Acoustics, Luleå University of Technology, Luleå, SE-97187, Sweden

ARTICLE INFO

Keywords: Maintenance; Operation regime; Clustering; Data mining; LHD

ABSTRACT

Estimating the stress level of components while operation modes are varying is a key issue for many prognostic
models in condition monitoring. The identification of operation profiles during production is therefore important.
Clustering condition monitoring data with regard to operation regimes will provide more detailed information about
the variation of stress levels during production. The distribution of the operation regimes can then support prognostics
by revealing the cause-and-effect relationship between the operation regimes and the wear level of components.
In this study unsupervised clustering technique was used for detecting operation regimes for an underground
LHD (load-haul-dump machine) by using features extracted from vibration signals measured on the front axle
and the speed of the Cardan axle. The clusters were also infected with a small portion of the data to obtain the
corresponding labels for each cluster. Promising results were obtained where each sought-for operation regime
was detected in a sensible manner using vibration RMS values together with speed.

1. Introduction by using data to distinguish, for example, a faulty system from a healthy
one [19,27]. Regression models are mainly used for prognosis where
Prognostic and health management (PHM) of a system is a discipline that links studies of failure mechanisms to system lifecycle management [25]. One of the challenges of PHM is to estimate the stress level of the components of a system when the operation modes vary. For many systems, it is either impossible or impractical to measure component stress accurately, so the next best thing may be to detect operation profiles during production. However, for complex systems, even the operation profile can be unknown and may change on a daily basis. There is a need for methods which can use pre-existing data (condition monitoring or process data), often collected for other purposes, to detect operation regimes. The results can be used to predict different life scenarios in the case of incipient faults, or to determine the correct time and place to apply diagnostic techniques.

Machine learning and pattern recognition techniques for data mining have been improving dramatically recently, with many more areas of application, including PHM. They have been adapted for and are used in the PHM of machines in the automotive industry [6], defense and space programs [30] and heavy industries [33]. Machine learning techniques used in PHM can be divided into three rough categories: classification, regression and clustering techniques. Classification algorithms are used to classify two or more categories. In regression analysis, the time to failure is estimated using existing historical data (see for instance [28]). Regression analysis involves the use of such techniques as neural networks, fuzzy logic systems and simpler univariate regression models; these techniques are not strictly reserved for regression analysis and can also be used for data mining. Recently, Hanafizadeh et al. [16] used supervised neural networks to identify flow regimes in a pipe to determine when the flow type was changing during operation. This technique aims to improve the control of the process by determining when it is not optimal. However, it is not the most practical one for identifying operation regimes of complex machines; the data need to be labelled while training the model, and this is seldom done in a varying operating environment, as, for instance, with mobile machines. Suarez et al. [26] tracked real-time onboard damage accumulation using a model called PHM/ALPS. The goal was to evaluate the current mission profile (operating conditions) using past mission profiles (historical data) to demonstrate independent life prediction capability. It is difficult to adapt this type of technique for operation regime detection, however, unless several mission profiles are pre-recorded or simulated. Unsupervised clustering techniques may be more practical than supervised ones in some cases, since they do not require historical data from several different operating conditions.
Corresponding author at: SKF-LTU University Technology Centre, Luleå University of Technology, Luleå, SE-97187, Sweden
E-mail address: juhamatti.saari@ltu.se (J. Saari).

https://doi.org/10.1016/j.orp.2018.08.002
Received 7 February 2018; Received in revised form 6 July 2018; Accepted 3 August 2018
Available online 04 August 2018
2214-7160/ © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/BY-NC-ND/4.0/).

J. Saari, J. Odelius Operations Research Perspectives 5 (2018) 232–244

The benefit of unsupervised techniques is the possibility of finding natural groups and patterns in the data by optimizing the boundaries and the clusters in the data. Mostly these techniques are used for anomaly detection, where several clusters are formed to characterize typical system behaviour and an alarm is sent when a data vector falls outside the clusters [17]. Perhaps the most common unsupervised clustering method is the k-means algorithm [18]. This algorithm is initialized by picking k initial cluster points and allocating all data points to the closest one. Another popular clustering algorithm proven to be successful in many situations is expectation-maximization [9,11]. When detecting operation regimes, there are limitations to using these algorithms (see for instance [13,31]). Perhaps the biggest problem when trying to implement these techniques is the need to set the number of clusters in advance, as this is rarely known for complex machines operating under unknown conditions or in a changing environment. For instance, a load change at one time and position during production might create three separate clusters which cannot be treated as one mode.

To overcome this problem, Corduneanu and Bishop [10] have developed a variational Bayesian Gaussian mixture (VBGM) model. With this algorithm, it is not necessary to know the exact number of clusters (k) in the beginning, since only the maximum number of clusters needs to be set. Similar techniques have been applied to defining operation regimes in the process industry using control parameters, such as valve openings or temperature [34], but in those techniques the k value is defined using another algorithm [12].

Although unsupervised techniques have advantages when compared to supervised ones, there are some practical limitations. One is the validation of cluster labels, i.e., determining what each cluster actually represents. To overcome this problem, we propose a method where the VBGM algorithm is first used to separate a large set of condition monitoring data into groups (clusters), which are later infected with a smaller set of labelled data. We apply the method to the analysis of vibration data collected from a complex machine operating under harsh conditions (an underground mining loader, LHD). The aim is to see how the unsupervised algorithm, together with infection data, can be applied for separating operation modes using only condition monitoring data. We use vibration measurement data collected for diagnosis purposes and containing noise from many natural sources. The work is novel in that it applies the VBGM clustering algorithm to real data and explains how it can be used generically with infection data to predict labelled clusters.

2. Background and labelling operation regimes

The clustering technique (VBGM) used in this study for separating the data into different clusters is based on the work by Corduneanu and Bishop [10], which can also be found in the book by Bishop [7]. When the VBGM algorithm is used for mining condition monitoring data, more specifically, to separate data into meaningful operation regimes, it is not necessary to know the exact number of clusters, since components whose expected mixing coefficients are numerically indistinguishable from zero are not plotted [7]. The method is also more practical (generalizable), since it can rely on the data when the training set is large and on the prior distribution assumption when the data set is small.

In a Gaussian mixture model, for each observation x_n we have a corresponding latent variable z_n comprising a 1-of-K binary vector with elements z_{nk} for k = 1, ..., K. The observed data set is denoted X = \{x_1, \ldots, x_N\}; similarly, the latent variables are denoted Z = \{z_1, \ldots, z_N\}. The conditional distribution of Z, given the mixing coefficients \pi, is defined as follows [7]:

    p(Z \mid \pi) = \prod_{n=1}^{N} \prod_{k=1}^{K} \pi_k^{z_{nk}}.    (1)

For the observed data, the conditional distribution, given the latent variables and the component parameters, is as follows [7]:

    p(X \mid Z, \mu, \Lambda) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mathcal{N}(x_n \mid \mu_k, \Lambda_k^{-1})^{z_{nk}},    (2)

where \mu = \{\mu_k\} is the mean and \Lambda = \{\Lambda_k\} is the precision.

Using a conjugate prior distribution, a Dirichlet distribution is chosen over the mixing coefficients \pi, defined as [7]:

    p(\pi) = \mathrm{Dir}(\pi \mid \alpha_0) = C(\alpha_0) \prod_{k=1}^{K} \pi_k^{\alpha_0 - 1},    (3)

where C(\alpha_0) is the normalization constant for the Dirichlet distribution. The hyperparameter \alpha_0 can be interpreted as the effective number of observations associated with each component of the mixture. If \alpha_0 is small, the posterior distribution will be influenced primarily by the data rather than by the prior.

By introducing an independent Gaussian-Wishart prior governing the mean and precision of each Gaussian component, the distribution can be written as [7]:

    p(\mu, \Lambda) = p(\mu \mid \Lambda)\, p(\Lambda) = \prod_{k=1}^{K} \mathcal{N}(\mu_k \mid m_0, (\beta_0 \Lambda_k)^{-1})\, \mathcal{W}(\Lambda_k \mid W_0, \nu_0).    (4)

The joint distribution of all the random variables is then given by [7]:

    p(X, Z, \pi, \mu, \Lambda) = p(X \mid Z, \mu, \Lambda)\, p(Z \mid \pi)\, p(\pi)\, p(\mu \mid \Lambda)\, p(\Lambda).    (5)

In Eq. (5), only the variables X are observed.

We consider a variational distribution which factorizes between the latent variables and the parameters, so that [7]:

    q(Z, \pi, \mu, \Lambda) = q(Z)\, q(\pi, \mu, \Lambda).    (6)

With this assumption it is possible to obtain a tractable practical solution to the Bayesian mixture model. The optimal solution is found by seeking the distribution for which the lower bound is largest.

A toolbox for the algorithm is publicly available at Mathworks [22]. In this study, we kept the parameter settings at their defaults each time the algorithm was run: \alpha_0 = 1 and \beta_0 = 1, where \beta_0 affects the initial precision value (\Lambda).

To overcome the problem of not knowing what each cluster represents, we propose collecting another set of data which is much smaller than the training set (see Fig. 1). This smaller set can be used to infect some or all of the found clusters, in order to learn what they represent, by predicting their clusters using the already trained models. The benefit of the technique is that the training can be carried out on a much larger data set, so that rare patterns which may occur during production will be included in the model. A disadvantage, however, may be the difficulty of interpreting the cluster labels if the data are distributed evenly among the clusters. In such cases, the parameters need to be re-selected, or different initial parameter values used, to achieve better results. The infection data should be collected in such a manner that one complete cycle of the operation is present.

With this technique, once the computationally demanding training phase is over (although it is comparable to that of traditional maximum likelihood approaches), real-time or near real-time cluster prediction for new data sets is achievable for several systems/components by using on-site feature extraction and wireless communication together with centralized computing.

The ideal way to collect the infection data set would be to let the operator determine when to acquire data during operation (first-hand knowledge), or to automate the data collection using RFID tags or other similar techniques, which are used in many industries to keep track of mobile machines (for instance, in the mining industry). The time period for data collection should cover the whole operation mode in the beginning; only later, if an operation mode is distributed evenly over many clusters, should a deeper analysis and better selection be done.
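The train-then-infect procedure can be sketched with scikit-learn's BayesianGaussianMixture, a variational Bayesian mixture implementation analogous to (though not identical to) the Matlab toolbox [22] used in this study; the regime data below are synthetic stand-ins for (speed, vibration feature) vectors:

```python
# Sketch of the proposed train-then-infect scheme using a variational
# Bayesian Gaussian mixture. Regime names and data are illustrative only.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Unlabelled "training" set: three operation regimes, unknown to the model.
regimes = {
    "idle":    rng.normal([0.0, 1.0], 0.2, size=(500, 2)),
    "loading": rng.normal([2.5, 4.0], 0.3, size=(500, 2)),
    "transit": rng.normal([13.0, 2.0], 0.4, size=(500, 2)),
}
X_train = np.vstack(list(regimes.values()))

# Only the maximum number of clusters (K) has to be chosen; components with
# negligible mixing coefficients are effectively pruned by the variational fit.
model = BayesianGaussianMixture(n_components=10,
                                weight_concentration_prior=1.0,
                                max_iter=500, random_state=0).fit(X_train)
effective = int(np.sum(model.weights_ > 0.01))

# "Infection": a much smaller labelled set is pushed through the trained
# model, and each hit cluster inherits the corresponding regime label.
cluster_label = {}
for name, data in regimes.items():
    infect = data[rng.choice(len(data), size=20, replace=False)]
    for c in np.unique(model.predict(infect)):
        cluster_label[int(c)] = name

print("effective clusters:", effective)
print("labelled clusters:", cluster_label)
```

Only the maximum number of components is specified up front; the surviving clusters are then named by the small labelled set, while unlabelled clusters can later be treated as noise.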


Fig. 1. The proposed method for determining operation groups by using unsupervised clustering.

2.1. Parameter selection

The number of input parameters for detecting the operation regime can concern two or more dimensions. The optimal parameter selection would be to choose the ones which are connected to the stress induced in the monitored component. For instance, the power of an electric motor can be a good parameter when the assumption is that an increase in the torsional load might be causing problems for the system being monitored, since it is known that P = Mω, where P is the power, M is the moment and ω is the angular velocity.

However, in practice a sensor network is always optimized based on the price-benefit ratio, and therefore many of the measured parameters are only indirectly related to the stress of a certain component. It is therefore rational to study those parameters which are most commonly collected for other purposes (e.g. diagnosis purposes). Note that this may create another problem, where the used feature values start to change once the monitored component is degrading, so that the trained clusters are no longer valid. For instance, vibration-based features have a tendency to increase near the end of a component's technical life. One way the problem can be avoided is to use vibration sources which are not directly related to the diagnosis of the given component, but are located nearby (monitoring other component faults). For instance, as in the given case study (see Section 4), sensors located on the left side of the front axle can be used for monitoring operation changes happening on the right side and vice versa. This way (once detection algorithms are available), both can be studied in parallel without using the same features for diagnosis purposes and for detecting the operation regimes.

Generic unsupervised feature selection is the focus of many studies [8,15,35]. If these selection filters can be successfully used as a pre-processing step, they would make the operation regime clustering easily adaptable to alternate systems. However, the authors believe that a comprehensive, generic method for selecting input parameters probably cannot be achieved in the near future for solving the explained problem, at least not before some of the solutions are in generic use and their practical shortcomings have been encountered. Therefore the best approach is to use input features which are tailored to each system separately. However, the first priority should be to find features which have the potential to function with many types of systems. Therefore, as a first step, we have chosen to study how common vibration features, without any pre-processing, are able to separate the operation regimes in our case study.

3. Relation to diagnostic and prognostic techniques

In general, the formal definition of the RUL can be expressed via the stochastic degradation process of the system [36]. The stochastic process can be modelled with a first hitting time (FHT) model [21], denoted by X(t), t ∈ 𝒯, x ∈ 𝒳, with initial value X(0) = x_0, where 𝒯 is the time space and 𝒳 is the state space of the process. The set where failure occurs can be denoted as a boundary set ℬ, where ℬ ⊂ 𝒳. The FHT model states that the RUL has ended when the process lies outside the boundary set ℬ. Two important aspects for defining the RUL correctly are to define the boundary set ℬ correctly and to estimate the time when the process X(t) lies outside the boundary set [21].

This general description of estimating the RUL cannot consider any operational changes, and the assumption is made that the system will only have one homogeneous operation mode. However, in most real-world systems, it is reasonable to assume that some of the working states will cause a higher degradation rate than others [24]. Therefore the state space 𝒳 can consist of the possible states during the operation, S_r, r = 0, 1, 2, …. With the help of the proposed clustering technique it is possible to separate the process variable data into several clusters, so that the number of states r ≤ K, where K is the number of found clusters.
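The effect of state-dependent degradation rates on an FHT-based RUL estimate can be illustrated with a small Monte Carlo sketch; the state names, occupancy fractions, degradation rates and failure boundary below are invented for illustration and are not values from this study:

```python
# Monte Carlo sketch of the RUL as a first hitting time (FHT) when the
# degradation rate depends on the operation state. All numbers are
# illustrative assumptions, not measurements from the case study.
import numpy as np

rng = np.random.default_rng(1)

states = ["idle", "transit", "loading", "hauling"]
rate = {"idle": 0.0, "transit": 0.02,   # mean degradation per hour in state
        "loading": 0.10, "hauling": 0.05}
boundary = 20.0                         # failure threshold (boundary set B)

def simulate_rul(occupancy, n_paths=100, dt=1.0):
    """Hours until the degradation path X(t) first exceeds the boundary."""
    ruls = []
    for _ in range(n_paths):
        x, t = 0.0, 0.0
        while x < boundary:
            s = rng.choice(states, p=occupancy)      # state occupied this step
            # gamma-distributed increment with mean rate[s] * dt
            x += rng.gamma(2.0, rate[s] / 2.0 + 1e-9) * dt
            t += dt
        ruls.append(t)
    return np.array(ruls)

base = simulate_rul([0.4, 0.2, 0.2, 0.2])
heavy = simulate_rul([0.1, 0.1, 0.7, 0.1])   # loading-dominated production
print(f"mean RUL: {base.mean():.0f} h, loading-heavy: {heavy.mean():.0f} h")
```

Re-running the simulation with a different occupancy vector shows how a shift towards a harsher operation state (here "loading") shortens the estimated RUL, which is exactly the recalculation the proposed clustering is meant to enable.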


Once the separation of the process variable data has been done, several approaches can be taken. First, the degradation path can be recorded during independent failures in order to estimate the degradation rate in each of them. Later this information can be used to recalculate the RUL by varying the estimated distribution of each operation state. Second, for each operation mode it is possible to define an individual degradation model that can be used instead of one universal degradation model. For instance, in fracture mechanics, crack propagation can have three modes (opening, sliding and tearing), and once the relation between the operation and the stress type is known, the correct model can be used in each defined operation state. Third, it can also be beneficial for diagnostics to acquire the training data only when the system is in a particular state (i.e., cluster (x) in Fig. 2).

A universal flowchart showing how defined operation regimes can be used for condition monitoring is provided in Fig. 2. In Fig. 2, the steps related to diagnosis are for defining the correct failure mode, which can help to define the correct degradation model or assist in selecting an appropriate boundary set ℬ. Eventually the estimation of the RUL can be achieved using knowledge-based reasoning [2], physical models [3] or data-driven approaches [3]. A more detailed explanation of each approach was recently given in the review by Zhang et al. [36].

Fig. 2. Flowchart of the proposed method: monitoring the condition of a complex system by combining operational regimes with fault diagnosis and prognosis techniques.

The proposed model is built by completing the following steps:

• Collect a set of data using the chosen parameters with a sufficient time segment.
• Normalize the data (e.g. using the z-score method).
• Choose the maximum number of clusters (K).
• Collect small samples from each operational mode to infect and label the clusters (see Fig. 2c).
• Once the model has been trained, study the distribution of new data instances to determine how the operation modes vary during the process, and re-evaluate the need for merging or separating the found clusters.

4. Case study and selected input parameters

To test the proposed method in a real environment, data were collected from an LHD (load-haul-dump machine) working in an underground mine (Fig. 3a). Such LHDs are subjected daily to several different operating conditions in which the environment, boulder sizes, road condition and even the operator change. Loaders belong to a class of vehicles for which it can be very challenging to use traditional condition monitoring techniques, since the rotation speed changes and the loads and type of load vary. This combination of factors is usually so demanding that there are no good methods for estimating when some of the critical components are going to fail, since even performing a diagnosis can sometimes be challenging. Nowadays this type of machine mostly relies on preventive maintenance and weekly inspections [1]. However, the development of these machines is heading towards full automation [14], and therefore the demand for methods enabling a condition-based maintenance protocol is increasing.

The LHD model in question is made by Sandvik and is an LH621. Vibration measurements were performed using a National Instruments CompactRIO 9024 data logger together with four SKF Copperhead CMPT 2310 accelerometer sensors. The data were originally collected for diagnosis purposes, and the sensors were therefore installed near the most critical component (see [20]). The sensors were installed on the front axle, two on the left side of the axle and two on the right side, as seen in Fig. 3. The vibration measurements were synchronized with the cardan axle speed, which was obtained using the tachometer pulse from the drive shaft. The vibration measurements were continuous, which means that every operation regime of the LHD was recorded with a precise time stamp. The sample rate was 12.8 kHz.

In these machines there is an in-built condition monitoring tool that can record several parameters from the machine, such as the RPM of the engine, the machine speed, the RPM of the cardan axle, the temperature at several positions, the driven gear and the hydraulic pressure. All this information can be used for detecting the operational regimes in the future with the proposed technique, once the data are synchronized and shared online with condition monitoring systems (e.g. using machine-to-machine communication). Some feasibility analyses using the given tool have already been performed in other studies [37], including the use of a Kalman filter technique [32].

4.1. Feature extraction

Since raw vibration measurements can rarely be used, a common method is to extract features which indicate certain attributes (qualitative or quantitative). Some features are sensitive to overall vibration levels, for instance, the RMS value. Other features can give good results when used to detect impacts, such as the peak value or the peak-to-peak value. We selected five commonly used features that would distinguish how the shape and form of the vibration signal changed: the RMS, peak, peak-to-peak, kurtosis and skewness values. All these parameters should vary while the machine is in operation.
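The five selected features can be computed per segment as follows; the burst-like synthetic segment stands in for a real 12.8 kHz vibration record:

```python
# Sketch of the five time-domain features extracted per signal segment.
# The synthetic segment below is a stand-in for a real vibration record.
import numpy as np
from scipy.stats import kurtosis, skew

def time_domain_features(x):
    """RMS, peak, peak-to-peak, kurtosis and skewness of one segment."""
    return {
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "peak": float(np.max(np.abs(x))),
        "peak_to_peak": float(np.ptp(x)),
        "kurtosis": float(kurtosis(x, fisher=False)),  # 3.0 for a Gaussian
        "skewness": float(skew(x)),
    }

rng = np.random.default_rng(2)
fs = 12800                                  # sample rate used in the case study
segment = rng.normal(0.0, 1.0, 5 * fs)      # 5 s of background vibration
segment[::fs] += 20.0                       # impulsive hits (e.g. boulder impacts)

feats = time_domain_features(segment)
print(feats)
```

The impulsive hits inflate the peak and kurtosis values far more than the RMS, which is why such features react differently to the same operating event.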


Fig. 3. A typical underground loader and two sensors mounted onto the front axle.

Although many studies (related to fault identification) consider pre-processing the vibration measurements using frequency filtering or time-frequency filtering [4,5,23,29], we have only studied time-domain features, for two reasons. First, because the proposed method should be generic, features which do not require extensive manual pre-processing should be prioritized. Second (based on our literature study), similar studies using vibration measurements for detecting operation regimes have not been done using time-domain features.

4.2. Operation regimes of the LHD

Roughly, the operation can be divided into five different phases: driving between the working face and the maintenance hall (located underground), transit to the loading position, loading, hauling and idling. How each of these phases wears the monitored component(s) depends on the failure type (the malfunctioning component and the type of failure). One can estimate the harmfulness of each stage only when each group has been analysed separately after failures have occurred and have been documented correctly.

Transit from the maintenance hall to the working face
In this phase the speed is usually higher than in the other stages, and at the same time the LHD has the lowest static load, since there is no ore inside the shovel. During this phase the LHD is almost always driven using manual operation, and some speed decreases might occur, depending on the traffic inside the caverns and how tight the corners are in the tunnel. During this phase, an uneven road or holes in the road can have an effect on the health of a monitored component.

Transit to the loading position
This phase is similar to the first phase, since there is no load which needs to be carried. However, the difference is that, during this phase, the LHD is operated automatically and the maximum speed is limited to a lower RPM than in manual driving. Moreover, there might be some differences in the road condition, since the smaller tunnels used for this operation may not be in as good a condition as the main tunnels. Therefore, there might be a slight increase in the vibration levels compared to those encountered in the first phase.

Loading
Loading is the event when the ore is picked up after it has been drilled and blasted from the working face. This operation can be performed using a manned LHD or an LHD remotely controlled by an operator sitting in a van. During this phase the LHD is normally operated at lower speeds, but is subjected to heavy impacts, since some of the rocks can be rather big. According to previous studies, this might be the most harmful phase for, e.g., the gears in the front axle [14]. This is especially true of LHDs driven using remote control, since in this case the operator loses their intuitive sense of the machine's handling and cannot perceive the subtle differences between smooth and rough handling.

Hauling and dumping
During the hauling phase, the ore is carried inside the LHD's shovel and is transported to the dumping point, where it is crushed or prepared for crushing into smaller pieces. The difference between the hauling operation and the two transit stages is that, during hauling, the total weight of the LHD is much greater and the average speed is lower than during the transit stages. Moreover, the hauling operation is performed using an autopilot and the operator only monitors the event. During hauling, impact forces do not usually occur, since sudden movements are avoided, and collisions caused by human error do not happen, since the operation is automated. However, higher static loads might be harmful to some of the components, especially the front axle, since most of the weight is carried by the front axle because of the shovel's location at the front.

Dumping is a very quick operation and has little effect on mechanical components. Moreover, when concentrating on monitoring the condition of the other components, for example the hydraulic cylinders of the shovel, it would be useful to differentiate between this operation and hauling. However, when defining the operation modes affecting the lifetime of, e.g., the front axle, this would be futile, and therefore hauling and dumping can be considered as two parts of a single phase (as is done in this study).

4.3. Extraction of the infection data

In order to collect the infection data, we extracted the WAV signal of one of the acceleration signals and its spectrogram (0–6 kHz) to characterize and reveal the different regimes.
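The spectrogram-based characterization can be sketched as follows, assuming a synthetic two-tone signal in place of the real acceleration record; the tone frequencies are arbitrary illustrative choices:

```python
# Sketch of characterizing regimes via a spectrogram of an acceleration
# signal, restricted to the 0-6 kHz band as in this study. The signal is
# synthetic: two "regimes" with different (made-up) spectral content.
import numpy as np
from scipy.signal import spectrogram

fs = 12800
t = np.arange(0, 10, 1 / fs)                      # 10 s record
low = np.sin(2 * np.pi * 200 * t[: len(t) // 2])    # e.g. an idling tone
high = np.sin(2 * np.pi * 3000 * t[len(t) // 2:])   # e.g. a transit tone
noise = 0.1 * np.random.default_rng(3).normal(size=len(t))
x = np.concatenate([low, high]) + noise

f, tt, Sxx = spectrogram(x, fs=fs, nperseg=1024)
band = f <= 6000                                   # keep the 0-6 kHz band

# The dominant frequency per time slice separates the two regimes clearly.
dominant = f[band][np.argmax(Sxx[band], axis=0)]
print(dominant[:3], dominant[-3:])
```

In practice the regime boundaries are read off visually from the spectrogram (together with the audio playback of the WAV file), rather than from a single dominant-frequency track.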


Fig. 4. LHD cycles during operation (50 min time periods).

The infection data were collected on Tuesday for defining the manual transit and idle operation regimes, and on Wednesday for defining the other operation modes. As shown in Fig. 4, the LHD works for periods of approximately 50 min. During each period, there are three cycles in which the LHD will first transit to the loading position (transit with automatic operation), collect the ore (loading) and return to the dumping position (hauling + dumping). As can be observed, there is a similar pattern as each mode changes, except that the separation between a transit and a hauling operation cannot be seen. However, these two operations can be distinguished based on the fact that hauling always follows loading and transit always follows dumping.

Fig. 5 shows the time period when the LHD is in transit from the maintenance hall to the cavern where the ore is being removed from the face. The duration of this phase is approximately one hour and 45 min. The difference between this transit and that visualized in Fig. 4 is that in the former transit, the LHD is operated manually and the speed can be much higher. The reason why this mode of operation should be separated from automated transit is that, in some cases, the speed increase might be crucial for the health of the components, since, e.g., the front axle is lubricated using oil and sump starvation may occur at higher speeds, thus increasing the wear process in some circumstances. Moreover, this phase may also be a good opportunity to acquire vibration signals which can be used later for diagnosis purposes.

Sometimes there are periods when the LHD is stopped for some unknown reason and must wait before it can become productive again. This can be seen clearly in Fig. 5 during the period from 39 to 44 min. It is also advantageous to separate this mode from the other regimes, since it should not damage the machine at all. If this mode increases during production, this can be taken into account when calculating the RUL.

As can be observed in Figs. 4b and 5b, the rotation speed of the cardan axle seems to be a very promising indicator for distinguishing most operation regimes from each other. Therefore, in the further tests, speed was one of the parameters used in all the models.

Based on this analysis, we manually selected (by using audiovisual tools) the isolated data segments which belonged to these five different groups. Later these data were used when infecting the clusters in the way described in Section 2. Note, however, that this type of analysis is unnecessary when using the proposed method in real life, since the infection data can be separated from the other data during production.

4.4. Typical operation during one week

In Fig. 6 it can be observed that the overall pattern for most days in the week seems to be rather similar. There are two shifts during each day, which are operated quite punctually. However, on Monday and Friday the daily operation time is considerably shorter for some reason, possibly due to weekly predictive maintenance tasks and other maintenance issues. Therefore, the further tests were performed comparing the data collected during the three days from Tuesday to Thursday.

5. Results and discussion

The results were processed in three steps. Firstly, the distribution and values of the selected input parameters were visualized in order to see how noise and other factors might have influenced the measurements. Secondly, the models obtained after clustering were evaluated by estimating how the trained models converged when the training data were collected on different days. If the parameters are suitable and can be used for data separation, each model should have approximately the same number of clusters in similar positions when the data are taken at a random time during production. Thirdly, the models trained using data collected on Tuesday were evaluated using the infection data, to determine how well each operation regime was represented by each cluster.
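The preprocessing used before clustering in this study (z-score normalization followed by excluding values more than four standard deviations out) can be sketched as follows, with synthetic feature vectors standing in for the measured ones:

```python
# Minimal sketch of the preprocessing applied before clustering: z-score
# normalization per feature, then excluding samples lying more than four
# standard deviations from the mean (to suppress noise-induced anomalies).
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=(1000, 2))
X[:5] *= 50.0                            # a few noise-corrupted feature vectors

Z = (X - X.mean(axis=0)) / X.std(axis=0)           # z-score per feature
keep = np.all(np.abs(Z) <= 4.0, axis=1)            # 4-sigma exclusion rule
X_clean = Z[keep]

print(f"kept {X_clean.shape[0]} of {X.shape[0]} samples")
```

The surviving, normalized samples are what is fed to the clustering algorithm, so that no single noisy burst can dominate the fitted components.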


Fig. 5. LHD transit to the working face (1 h 43 min time period).

Fig. 6. Time periods when the loader was running during one week.


Fig. 7. Distribution of the rotation speed of the Cardan axle on a typical day (Tuesday). Each speed value is the average of a 5 sec segment.

Parameter evaluation

The nine parameters selected were as follows: the rotation speed of the Cardan axle, RMSV (vertical), RMSH (horizontal), PeakV, PeakH, KurtosisH, KurtosisV, SkewnessH and SkewnessV. To test how these parameters could be used together to determine the operational behaviour of the LHD, each parameter was statistically evaluated in order to ensure that a mixture of Gaussian distributions is meaningful. The distribution of the speed for each chosen day can be seen in Fig. 7. When comparing the 3 full working days (Tuesday, Wednesday and Thursday), it was clear that each day had a similar speed distribution in which 3 almost Gaussian modes were present, centred around the speeds 2.5 Hz, 7 Hz and 13 Hz. Moreover, there was a narrow spike at 0 Hz, indicating that the machine was idling.

As shown in Fig. 7, the idling state (the bin where the speed is zero) is very dominant. Scheduled maintenance based on operation time is therefore not suitable if the idle time is not considered. This finding is well aligned with the study by [21], which stated that the time scale for the process is not the same as the calendar or clock time. Further, the mean speed is not a good indicator of the wear of bearings and gears. A better approximation would be a separate analysis of the three Gaussian distributions centred on the three separate frequency areas (bins at 2.5 Hz, 7 Hz and 13 Hz). Naturally, this concerns only how the speed will affect the RUL; several other aspects, such as the level of vibrations, are also present.

The other features extracted from the vibration signals can be found in Table 1. Unlike the speed, each of the vibration features has a more unimodal shape, quite close to the shape of a normal distribution, except that there is a high-end tail. The reason for the long tail might be rocks and boulders randomly hitting the bottom of the loader and causing transient increases in the acceleration levels. This can have a huge effect, especially on features like kurtosis and range. However, when examining the difference between the median and mean values of the kurtosis, one can observe that these peaks happen quite seldom, since the median value is much lower. One can also observe that the range values are almost the same as the peak values scaled by 2. After normalization, which is almost always performed in machine learning applications, it would be redundant to use both parameters as inputs for the cluster algorithm; therefore, the models were trained only by using the peak as an input parameter, without considering the range value.

Separating the data using the clustering technique

Each model was trained using the speed of the Cardan axle as a single parameter together with each vibration feature (RMS, kurtosis, peak values and skewness) individually, to see how well the algorithm would produce similar results when the training data were collected on different days (Tuesday, Wednesday and Thursday). Only the shape and size could be evaluated, since the numbering varied because the initial starting point was chosen randomly inside the feature space. Model 5 is the exception, since it includes all nine parameters and cannot be evaluated visually. Therefore, model 5 was only evaluated using the infection data. Before using the chosen parameters as inputs for the model, normalization was performed using the z-score method. After normalization, those feature values which were more than four times larger than the standard deviation were excluded from the final models in order to avoid anomalies caused by noise. As can be observed in Fig. 8, the algorithm was not able to reduce the number of clusters to five (the number of analysed regimes). Instead, all the models found ten clusters, which was the initial K value. This indicates that the data are not dense and focused exclusively on the operation regimes.

The results indicate that it would be advisable to assume a K value higher than the assumed number of operational regimes; later, if the cluster infection explained in Fig. 1 works properly, all the clusters without a label can be neglected as noise. Moreover, if one operation mode is dominant in two clusters, the clusters can either be combined into a single cluster or kept separate and treated differently. For instance, in the loading mode, crashing into a boulder and gently lifting ore can be seen in different clusters and later be defined as separate operations.

Model 1 (RMS and speed)
In Fig. 8, model 1 represents the clusters which were trained using the RMS and speed parameters. When comparing the results obtained when the training data were collected on different days, one can observe that the cluster positions and sizes are rather similar. This indicates that the RMS and speed as input features are able to produce similar results and do not depend heavily on the collection day.

When the values are low, this model has some difficulty in finding similar clusters. For example, when comparing clusters 1 and 4 (Tuesday), cluster 10 (Wednesday) and clusters 4 and 2 to each other, one can observe that the data in this area are sometimes divided into two clusters and sometimes concentrated in only one. Nevertheless, the results seem to be promising, since these small variations are to be expected when using these types of approximation schemes.

Model 2 (kurtosis and speed)
Model 2 represents the clusters which were trained using the kurtosis and speed parameters. One can observe that, when comparing the cluster positions and sizes, most of them are not similar. These results indicate that using kurtosis values in a real industrial environment can give poor results. This might be due to random noise peaks, which can alter the kurtosis value quite dramatically, as seen in Table 1. Perhaps in the future, if kurtosis is used as an input feature, one should use a pre-processing filter in which a narrower frequency band of the spectrum is taken, which should reduce the impacts coming from noise sources.

Model 3 (peak values and speed)
Model 3 represent clusters which were trained using the peak and
By comparing the range and peak values seen in Table 1 one can
speed parameters. By comparing the cluster positions and sizes, most of

151
J. Saari, J. Odelius Operations Research Perspectives 5 (2018) 232–244

Table 1
Statistical information on each feature before using the VBGM algorithm. V stands for the vertically and H stands for the horizontally mounted accelerometer values.
The signal length was five sec. for each individual value, and during each day around 10,000 parameter values were calculated during different operation regimes.
Parameter Tuesday Wednesday Thursday

Median Mean Std Max Min Median Mean Std Max Min Median Mean Std Max Min

RMS V (g) 0.12 0.13 0.10 4.86 0.00 0.12 0.12 0.08 3.42 0.00 0.11 0.11 0.09 5.10 0.00
RMS H (g) 0.09 0.10 0.09 4.98 0.00 0.09 0.10 0.08 3.49 0.00 0.08 0.09 0.08 5.20 0.00
Kurtosis V 4.13 78.82 424.76 24921.61 1.70 4.25 91.98 419.14 28050.70 2.12 4.50 90.59 473.01 27711.59 2.12
Kurtosis H 4.45 138.68 452.19 16851.51 1.63 4.83 162.74 507.36 27201.94 1.64 5.25 143.30 566.03 27989.68 1.85
Peak V (g) 0.68 1.91 2.82 31.60 0.00 0.69 2.07 3.07 31.60 0.00 0.68 1.91 2.92 31.60 0.00
Peak H (g) 0.58 2.08 3.87 32.19 0.00 0.60 2.22 3.96 32.19 0.00 0.60 1.92 3.42 32.19 0.00
Skewness V –0.01 –0.09 2.55 104.76 –32.87 –0.01 –0.12 2.42 147.07 –28.46 –0.01 –0.12 2.81 146.42 –109.79
Skewness H 0.01 0.15 2.41 93.24 –34.30 0.01 0.05 2.50 144.15 –47.75 0.01 0.08 3.25 147.10 –110.64
Range V (g) 1.38 3.91 5.76 58.61 0.01 1.39 4.20 6.23 63.19 0.01 1.37 3.87 5.85 63.19 0.01
Range H (g) 1.16 4.13 7.59 64.41 0.01 1.20 4.42 7.84 64.41 0.01 1.21 3.82 6.70 64.41 0.01
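The time-domain features summarised in Table 1 can be reproduced from a raw vibration window using the standard moment-based definitions; a minimal sketch in plain Python (illustrative only, not the authors' implementation):

```python
# Compute the Table 1 features (RMS, kurtosis, skewness, peak, range)
# for a single vibration window, using biased central moments.
import math

def window_features(x):
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n          # variance (biased)
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "rms": math.sqrt(sum(v * v for v in x) / n),
        "kurtosis": m4 / m2 ** 2,                     # Pearson kurtosis; ~3 for Gaussian noise
        "skewness": m3 / m2 ** 1.5,
        "peak": max(abs(v) for v in x),
        "range": max(x) - min(x),
    }

# A single large transient (e.g. a boulder hitting the loader) raises the
# kurtosis, peak and range far more than the RMS:
quiet = [0.1, -0.1] * 50
spiky = quiet[:-1] + [5.0]
print(round(window_features(quiet)["kurtosis"], 6))   # 1.0 (pure square wave)
print(window_features(spiky)["kurtosis"] > 10)        # True
```

This illustrates why the mean kurtosis in Table 1 is far above the median: rare transients dominate the mean, while the median reflects the quiet majority of windows.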

them are not similar.

The initial assumption was that the peak values might be a good indicator when trying, for example, to separate the loading regime from the other regimes, since the vibration peaks increase during loading, which can be seen when comparing each stage manually, as in Figs. 4 and 5. The increase in the peak values can be seen especially at the horizontal level. The poorly converging results obtained when using the VBGM method in this case might be due to the fact that such peak values also occur during production in other stages and therefore are masked and cannot form clear operation regimes.

Model 4 (skewness and speed)

Model 4 represents clusters which were trained using the skewness and speed parameters. The data behaviour for each day is quite similar and, when the speed increases, the skewness dispersion is very small. When the speed is low, the skewness values are greatly dispersed and many clusters can be found within this region. These results indicate that skewness might be a good parameter for separating operational regimes. At least the VBGM method can give promising results when the clusters are not labelled.

5.1. Results after defining operation regimes using data infection

For the label infection, the models trained using the data collected on Tuesday were chosen. The results can be found in Table 2 for models 1-5, which were all infected using the same data. If the proposed method worked, all five different modes should be found in the 'largest mode' row. For simpler cases, the percentage proportion of each mode should be close to 100%. However, for complex systems, it is not reasonable to assume a perfect data separation and, therefore, the percentage proportion of each mode can be somewhere between 30% and 100%, depending on how many operation regimes we are trying to separate. A minimum threshold limit cannot be defined, since it also depends on how many labelled clusters one aims to find. For instance, each operation mode should have a single cluster where its percentage proportion is dominant when comparing the percentages column-wise. For example, in Table 2, the largest mode in cluster 9 in model 1 is loading, which can be regarded as a good result, since 49% is a dominant proportion. To calculate how large the percentage proportion is with regard to the column-wise percentage distribution, we can add all the values in the column in question and then perform the following calculation: LargestMode = 100 × 49/(8 + 49 + 6) ≈ 77%, which means that loading is a very dominant operation mode.

The idling mode was detected accurately by most of the models, especially by model 1 (RMS and speed), where it was 100% concentrated in one cluster. For prognostic purposes, this can be used when estimating the total production time. For instance, maintenance tasks based on the total operating hours can be postponed if the idling time increases dramatically.

One interesting finding is that inside any given operation mode, for example loading, there are data instances that are actually labelled as idling. In model 1, for example, 23% of the infection data of the loading cluster belongs to the idling cluster. Actually, this makes sense, since the LHD occasionally stops during loading before choosing the next portion of ore to be loaded.

When comparing the results of each model seen in Table 2, model 1 was the only model which was able to separate each sought-for operation regime in such a way that each operation mode is seen as the largest mode in at least one of the clusters. Although hauling, transit with manual operation ('transit (man)') and transit with automatic operation ('transit (auto)') were rather mixed, this can be considered a good result. Even the fact that loading is separated into four different clusters can be considered a reasonable result, since there are so many different phases included in loading that it would perhaps be wise to divide loading into two or more operation regimes. In Fig. 9, the clusters of model 1 (data collected on Tuesday) are labelled and similar clusters are merged by using the collected infection data (Table 2). The results indicate that this final model, using RMS together with speed, can be used to detect the different operation regimes.

Model 2 and model 4 do not work for the given case study, because one of the clusters includes a majority of the data points. Furthermore, the results for model 3 (peak values and speed) show that the separation of regimes was unsuccessful. However, cluster 6, where there was a large proportion of loading, was identified correctly. Therefore, when combining the peak parameter with other parameters, one could obtain some extra information which could be useful.

Model 5, where all the parameters are combined, found only three clusters where all the operation regimes were located, and most of them were in two data clusters. One can conclude from this that combining good parameters (like RMS in this case) and bad parameters (like kurtosis in this case) will lead to poorer results than just using the good parameters.

6. Conclusions

There is a need for techniques that can use existing information to estimate external factors such as operation regimes. The proposed method is one such method. It employs an unsupervised clustering technique on condition monitoring data and then infects these data with a smaller data set to label each cluster. It is suggested as a technique useful for industry, as a larger amount of training data can be collected without needing to know the correct labels for all operation modes beforehand. Using speed and vibration RMS values (model 1) gave reasonable results for the distribution of the operation regimes during production.

The use of common statistical features, such as kurtosis, skewness


Fig. 8. Results obtained using the VBGM method for data collected on separate days. The speed is the rotation speed of the Cardan axle and features were obtained
from acceleration sensors mounted vertically and horizontally on the front axle. The initial K value was 10.
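The column-wise dominance calculation used with Table 2 (LargestMode = 100 × 49/(8 + 49 + 6), reported as 77% for cluster 9 of model 1) can be sketched as follows; the dictionary below simply restates the cluster 9 column of Table 2:

```python
# Column-wise dominance of an operation mode within one cluster, as
# described in the text: divide the mode's percentage by the sum of all
# regime percentages falling into that cluster.
def dominance(column_percentages, mode):
    """column_percentages: {regime: % of that regime's points in this cluster}."""
    total = sum(column_percentages.values())
    return 100 * column_percentages[mode] / total

# Cluster 9, model 1 (from Table 2): loading dominates the column.
cluster9_model1 = {"Idling": 0, "Transit (Man)": 8, "Transit (Auto)": 0,
                   "Loading": 49, "Hauling": 6}
print(int(dominance(cluster9_model1, "Loading")))  # 77, matching the text
```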


Table 2
Distribution of the validation data (Tuesday).
Model 1 (RMS and speed)

Operation Distribution of data points %


regime
cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6 cluster 7 cluster 8 cluster 9 cluster 10 no. of
points

Idling 0 0 0 100 0 0 0 0 0 0 70
Transit (Man) 1 2 0 15 14 47 5 7 8 2 1125
Transit (Auto) 0 0 0 5 39 15 36 1 0 3 97
Loading 7 10 0 23 0 0 0 5 49 5 154
Hauling 0 0 0 16 10 11 56 0 6 0 98
Largest mode Loading Loading Empty Idling Transit (Auto) Transit Hauling Transit Loading Loading
(Man) (Man)
Model 2 (Kurtosis and speed)
Operation Distribution of data points %
regime
cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6 cluster 7 cluster 8 cluster 9 cluster 10 no. of
points
Idling 0 0 100 0 0 0 0 0 0 0 70
Transit (Man) 0 0 73 0 1 19 0 1 6 0 1125
Transit (Auto) 0 0 93 0 0 7 0 0 0 3 97
Loading 0 0 82 0 3 15 0 0 0 0 154
Hauling 0 0 82 0 3 15 0 0 0 0 98
Largest mode Empty Empty Idling Empty Loading+Hauling Transit Empty Transit Transit Transit (Auto)
(Man) (Man) (Man)
Model 3 (Peak and speed)
Operation Distribution of data points %
regime
cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6 cluster 7 cluster 8 cluster 9 cluster 10 no. of
points
Idling 0 0 0 0 0 0 0 100 0 0 70
Transit (Man) 15 0 1 5 10 8 0 23 32 5 1125
Transit (Auto) 22 0 2 0 23 2 0 43 4 4 97
Loading 0 0 0 0 8 36 0 42 14 0 154
Hauling 1 0 0 0 5 2 0 83 7 2 98
Largest mode Transit Empty Transit Transit Transit (Auto) Loading Empty Idling Transit Transit (Man)
(Auto) (Auto) (Man) (Man)
Model 4 (Skewness and speed)
Operation Distribution of data points %
regime
cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6 cluster 7 cluster 8 cluster 9 cluster 10 no. of
points
Idling 49 0 0 0 0 0 51 0 0 0 70
Transit (Man) 85 0 7 1 0 0 3 0 0 4 1125
Transit (Auto) 93 0 4 0 0 2 0 0 1 4 97
Loading 71 0 24 1 0 0 3 0 0 0 154
Hauling 86 0 10 4 0 0 0 0 0 2 98
Largest mode Transit Empty Loading Hauling Empty Transit Idling Empty Transit Transit
(Auto) (Auto) (Auto) (Auto+Man)
Model 5 (All combined)
Operation Distribution of data points
regime
cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6 cluster 7 cluster 8 cluster 9 cluster 10 no. of
points
Idling 0 0 0 100 0 0 0 0 0 0 70
Transit (Man) 0 0 0 33 0 0 0 18 0 50 1125
Transit (Auto) 0 0 0 12 0 0 0 1 0 87 97
Loading 0 0 0 69 0 0 0 1 0 31 154
Hauling 0 0 0 33 0 0 0 0 1 66 98
Largest mode Empty Empty Empty Idling Empty Empty Empty Transit Hauling Transit (Auto)
(Man)

and peak values, which were extracted from the vibration signals, failed to detect operation regimes using the proposed method. Possibly, environmental noise affected the measurements too much. If time-domain features are to be used as inputs, some pre-processing filters are needed to reduce the noise and improve the results.

In the proposed approach, the chosen number of clusters (k) should be larger than the number of sought-for operation regimes, to avoid having a cluster where only noise is included and two operation modes become mixed because noise is mixed with one of the operation groups. Using the infection method, it is possible to take an operation mode which is divided into two clusters and either merge the two clusters into one operation regime or treat them as two individual regimes and further analyse why they should be treated as two separate groups.

Future work

Based on the identified operation regimes, future work can investigate the relation of component wear to the detected operation modes using the studied approach. Furthermore, simulations can be performed where the RUL is estimated using several different operational distributions; this can help to decide whether it is possible to continue production using fail-safe operation modes or whether the system should be


Fig. 9. Final model where clusters (Model 1 Tuesday) are merged based on the collected infection data.

maintained promptly. Therefore, more work is needed where these models are used while collecting several degradation data sets, in order to truly validate their effectiveness.

In the future, the proposed method should be tested using process parameters (together with or without vibration features). More emphasis should be placed on feature selection and on the pre-processing step, where filtering techniques are also applied, without losing the focus of finding input parameters which are generic and do not require extensive pre-processing before they can be applied to many different systems. The infection step should be carried out with more precise data (letting the operator do the selection), with a view to gaining a better understanding of the relations between the parameters and the operation regimes, and possibly the harmful effects of the various operation regimes on machine health. Furthermore, studies comparing the investigated method with other similar methods should be performed.

Acknowledgements

The authors would like to thank SKF AB, National Instruments, Pyhäsalmi Mine and Sandvik for their contributions and support.

References

[1] Al-Chalabi H, Lundberg J, Ahmadi A, Jonsson A. Case study: model for economic lifetime of drilling machines in the Swedish mining industry. Eng Econ 2015;60(2):138-54.
[2] Alamaniotis M, Grelle A, Tsoukalas LH. Regression to fuzziness method for estimation of remaining useful life in power plant components. Mech Syst Signal Process 2014;48(1-2):188-98.
[3] An D, Kim NH, Choi J-H. Practical options for selecting data-driven or physics-based prognostics algorithms with reviews. Reliab Eng Syst Safety 2015;133:223-36.
[4] Antoni J, Randall R. The spectral kurtosis: application to the vibratory surveillance and diagnostics of rotating machines. Mech Syst Signal Process 2006;20(2):308-31.
[5] Bartelmus W, Zimroz R. A new feature for monitoring the condition of gearboxes in non-stationary operating conditions. Mech Syst Signal Process 2009;23(5):1528-34.
[6] Benedettini O, Baines TS, Lightfoot H, Greenough R. State-of-the-art in integrated vehicle health management. Proceedings of the Institution of Mechanical Engineers 2009;223(2):157-70.
[7] Bishop CM. Pattern recognition and machine learning. Springer; 2006.
[8] Boutemedjet S, Bouguila N, Ziou D. A hybrid feature extraction selection approach for high-dimensional non-Gaussian data clustering. IEEE Trans Pattern Anal Mach Intell 2009;31(8):1429-43.
[9] Chamroukhi F, Samé A, Aknin P, Govaert G. Model-based clustering with hidden Markov model regression for time series with regime changes. Neural Networks (IJCNN), The 2011 International Joint Conference on. IEEE; 2011. p. 2814-21.
[10] Corduneanu A, Bishop CM. Variational Bayesian model selection for mixture distributions. Artificial Intelligence and Statistics. Morgan Kaufmann, Waltham, MA; 2001. p. 27-34.
[11] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B (Methodological) 1977:1-38.
[12] Figueiredo MA, Jain AK. Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 2002;24(3):381-96.
[13] Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 1998;41(8):578-88.
[14] Gustafson A, Schunnesson H, Galar D, Kumar U. The influence of the operating environment on manual and automated load-haul-dump machines: a fault tree analysis. Int J Min Reclam Environ 2013;27(2):75-87.
[15] Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res 2003;3(Mar):1157-82.
[16] Hanafizadeh P, Eshraghi J, Taklifi A, Ghanbarzadeh S. Experimental identification of flow regimes in gas-liquid two-phase flow in a vertical pipe. Meccanica 2015:1-12. https://doi.org/10.1007/s11012-015-0344-4.
[17] Iverson DL. Inductive system health monitoring. 2004.
[18] Jain AK. Data clustering: 50 years beyond k-means. Pattern Recognit Lett 2010;31(8):651-66.
[19] Kotsiantis SB, Zaharakis ID, Pintelas PE. Machine learning: a review of classification and combining techniques. Artif Intell Rev 2006;26(3):159-90.
[20] Laukka A, Saari J, Ruuska J, Juuso E, Lahdelma S. Condition-based monitoring for underground mobile machines. Int J Ind Syst Eng 2016;23(1):74-89.
[21] Lee M-LT, Whitmore GA. Threshold regression for survival analysis: modeling event times by a stochastic process reaching a boundary. Stat Sci 2006:501-13.
[22] Pattern recognition and machine learning toolbox. MathWorks; 2016.
[23] Peng Z, Peter WT, Chu F. A comparison study of improved Hilbert-Huang transform and wavelet transform: application to fault diagnosis for rolling bearing. Mech Syst Signal Process 2005;19(5):974-88.
[24] Si X-S, Hu C-H, Kong X, Zhou D-H. A residual storage life prediction approach for systems with operation state switches. IEEE Trans Ind Electron 2014;61(11):6304-15.
[25] Si X-S, Wang W, Hu C-H, Zhou D-H, Pecht MG. Remaining useful life estimation based on a nonlinear diffusion degradation process. IEEE Trans Reliab 2012;61(1):50-67.
[26] Suarez EL, Duffy MJ, Gamache RN, Morris R, Hess AJ. Jet engine life prediction systems integrated with prognostics health management. Aerospace Conference, 2004. Proceedings. 2004 IEEE. Vol. 6. IEEE; 2004. p. 3596-602.
[27] Timusk M, Lipsett M, Mechefske CK. Fault detection using transient machine signals. Mech Syst Signal Process 2008;22(7):1724-49.
[28] Tse P, Atherton D. Prediction of machine deterioration using vibration based fault trends and recurrent neural networks. J Vib Acoust 1999;121(3):355-62.
[29] Urbanek J, Barszcz T, Zimroz R, Antoni J. Application of averaged instantaneous power spectrum for diagnostics of machinery operating under non-stationary operational conditions. Measurement 2012;45(7):1782-91.
[30] Walker M, Figueroa F, Toro-Medina J. PHM enabled autonomous propellant loading operations. Aerospace Conference, 2017 IEEE. IEEE; 2017. p. 1-11.
[31] Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, et al. Top 10 algorithms in data mining. Knowl Inf Syst 2008;14(1):1-37. https://doi.org/10.1007/s10115-007-0114-2.
[32] Wyłomańska A, Śliwiński MP. Identification of loading process based on hydraulic pressure signal. 2016. p. 459-66.
[33] Yang Z-X, Wang X-B, Zhong J-H. Representational learning for fault diagnosis of wind turbine equipment: a multi-layered extreme learning machines approach. Energies 2016;9(6):379.
[34] Yu J, Qin SJ. Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models. AIChE J 2008;54(7):1811-29.
[35] Yuwono M, Guo Y, Wall J, Li J, West S, Platt G, et al. Unsupervised feature selection using swarm intelligence and consensus clustering for automatic fault detection and diagnosis in heating ventilation and air conditioning systems. Appl Soft Comput 2015;34:402-25.
[36] Zhang Z, Si X, Hu C, Lei Y. Degradation data analysis and remaining useful life estimation: a review on Wiener-process-based methods. Eur J Oper Res 2018.
[37] Zimroz R, Wodecki J, Król R, Andrzejewski M, Sliwinski P, Stefaniak P. Self-propelled mining machine monitoring system - data validation, processing and analysis. Mine Planning and Equipment Selection. Springer; 2014. p. 1285-94.

Paper D

DOI: 10.1784/insi.2018.60.8.434
AUTOMATED CM

Selection of features for fault diagnosis on rotating machines using random forest and wavelet analysis
J Saari, J Lundberg, J Odelius and M Rantatalo

Identification of component faults using automated condition monitoring methods has a huge potential to improve the
prediction of machine failures. The ongoing development of the Internet of Things (IoT) will support and benefit feature
selection and improve preventative maintenance decision making. However, there may be problems with the selection
of features that best describe a specific fault and remain valid even when the operation mode is changing (for example
different levels of load). In this study, features were extracted from vibration signals using wavelet analysis; a feature
subset was selected using the random forest ensemble technique. Three different datasets were created where the load
of the system was changing while the rotational speed remained the same. The tests were repeated five times by first
recording the nominal condition and then introducing four faults: angular misalignment; offset misalignment; partially
broken gear tooth failure; and macro-pitting of the gear. To improve previous feature selection techniques, a method is
proposed where, before training a classifier, the most promising features are compared at different degrees of torsional
load. The results indicate that the proposed method of using random forests to select top variables can help to choose
good features that may not have been considered in manual feature selection or in individual load zones.

1. Introduction

In machine fault diagnosis and prognosis, identifying component faults, such as misalignment or gear and bearing faults, and determining their location has been the topic of research for many decades[1,2]. In the future, the Internet of Things (IoT) may create smart machines and change the way condition monitoring is carried out by exploiting a number of technologies, such as ubiquitous and pervasive computing, embedded devices, communication technologies, sensor networks, internet protocols and applications[3].

To diagnose machine faults, key indicators (also known as main features or variables) that best describe a specific fault need to be selected. One common practice is to extract these features from vibration signals using the time, frequency or time-frequency domain based on the physical understanding of the system[4]. These features are used to detect faults and identify the fault type, ie diagnosis[5], or to estimate the time to failure, ie prognosis[6].

A major proportion of traditional diagnosis techniques rely on manual analysis with key indicators selected and measured based on the physical knowledge of the system, for example extracting the peak value of a specific band in a frequency spectrum or filtering the sum of peak harmonics. However, changes in load can affect the value of the key indicator and make it harder to interpret. In some cases, system resonances may diminish or amplify the value of the feature, for example when the calculated bandpass area is in a different position, and cause false positive or false negative readings[7].

Some more recently developed techniques of feature extraction and selection use supervised machine learning classification algorithms, such as the artificial neural network (ANN)[8,9], the support vector machine (SVM)[10] and fuzzy logic[11,12]. With such techniques, there is no need to predefine specific key indicators beforehand (black box approach). Using a multi-dimensional feature space instead of one or two key indicators, a fault may be identified by a slight decrease or increase in multiple features. A disadvantage of these techniques is the need to have both healthy and faulty data. In addition, the stochastic nature of future faults may make it difficult for the classification algorithm to detect unseen faults (different size and location). If this is not taken into account, the result may be ad hoc fault diagnostic solutions.

In vibration analysis, before selecting which features to study (manually or using algorithms), the features need to be extracted from the raw vibration signals. During the last decade, wavelet analysis has been shown to reveal many different types of faults quite efficiently, as it provides a much greater representation of transient signals than the more common fast Fourier transform (FFT) analysis[13,14]. According to Peng and Chu[4], the key concern when extracting features out of wavelet coefficients is to select the ones best describing the fault (for example wavelet type, scale and statistical parameters). Lin and Qu[15] successfully used minimal entropy to select the optimal value of the bandwidth parameter for the Morlet wavelet. Bafroui and Ohadi[8] performed a similar study, but instead of minimum entropy they used maximum energy (Shannon entropy) to reduce the number of features. Some problems may arise with these techniques when a fault of interest creates a weaker transient signal than a noise source found nearby. The methods are also unable to define the scale parameters of the minimum entropy. Liu and Han[16] used multiscale entropy, calculating a sample entropy across a sequence of scales to analyse the series complexity under different scales. A similar technique for detecting the presence of a series of transients and their location in the frequency domain (or the scale domain) was proposed by Antoni[17].

Submitted 29.08.17 / Accepted 29.06.18

Juhamatti Saari, Jan Lundberg, Johan Odelius and Matti Rantatalo are with the Division of Operation, Maintenance and Acoustics, Luleå University of Technology, 97187 Luleå, Sweden.
Juhamatti Saari is also with SKF-LTU University Technology Centre, Luleå University of Technology, 97187 Luleå, Sweden. Email: juhamatti.saari@ltu.se
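The extraction of features from wavelet coefficients discussed above can be illustrated with a toy example: the energy of the detail coefficients at each decomposition scale forms a simple feature vector. The sketch below uses a Haar decomposition purely for brevity (the studies cited use the Morlet and other wavelets, and far more elaborate parameter choices):

```python
# Toy multi-level Haar wavelet decomposition: at each level, split the
# signal into an approximation and a detail band, and use the detail
# energy per scale as a feature. Illustrative only.
import math

def haar_dwt(x):
    """One Haar step: (approximation, detail) from an even-length signal."""
    a = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    return a, d

def scale_energies(x, levels):
    feats = []
    for _ in range(levels):
        x, d = haar_dwt(x)                  # keep decomposing the approximation
        feats.append(sum(c * c for c in d))  # detail energy of this scale
    return feats

# A short high-frequency transient concentrates its energy in the finest scale:
signal = [0.0] * 16
signal[6], signal[7] = 1.0, -1.0
print([round(e, 6) for e in scale_energies(signal, 3)])  # [2.0, 0.0, 0.0]
```

A low-frequency component would instead push energy into the coarser scales, which is why scale-wise features can localise a fault in the scale (frequency) domain.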

158 Insight • Vol 60 • No 8 • August 2018



Instead of entropy, this study used a method based on spectral


kurtosis. Nevertheless, the aim of both of the latter techniques was
to find optimal parameters to visualise and/or extract features able to
specify the fault type and location.
In some cases, for instance when the fault type is unknown and the
optimisation of features is completed too soon, useful information is
lost during signal preprocessing. Yet, blindly extracting many features
using several scaling factors can result in a feature set in which
many of the features have little or no meaning, thus decreasing the
classification accuracy. Methods such as random forests are useful,
as they are able to scrutinise every part of the signal to find a variable
subset that holds most of the required information content[18-20]. These
techniques may not be suitable for online fault detection nowadays
because the required computational power is too high, but they
may provide useful information by selecting good features offline,
especially those features not considered previously or neglected
during the preprocessing of the vibration signals (for example a
specific bandwidth and scale parameter in wavelet analysis). For this
to be practicable, a method for selecting only the strong features and
reducing the size of the feature set[21] needs to be found.
To improve feature selection for fault diagnosis, a method is
proposed in which, before training a classifier, promising features
are selected and compared in data collected for different types of
operational modes. Once the non-zero intersection is found, there should be a good chance that these features are robust and can withstand the effects of ad hoc situations. The method may also allow the discovery of new fault indicators not commonly used for a particular fault type, but which may be able to train a fault detection algorithm (not necessarily a classification algorithm). In this study, features were extracted using wavelet analysis and a feature subset selection was created using a random forest ensemble technique. Overall, three different datasets were used in which the load of the system was changing (while running the test-rig under nominal conditions and for four different fault types).

Figure 1. Test-rig for studying mechanical faults: (a) mechanical subsystem; and (b) hydraulic subsystem

2. Experiment

2.1 Test-rig

The test-rig (Figure 1) used in this study was specifically designed to test several different fault types in a more natural environment. The scale of the test-rig is much closer to the size of normal machine components used in industry and it includes many sources of vibration (for example a large electric motor, a hydraulic pump, oil valves, etc), which can mask the vibration signals coming from the fault location. Table 1 lists some of the main components of the test-rig.

Table 1. Test-rig specifications

Specification            Explanation
Motor                    75 kW electric motor
Controller               ABB inverter
Maximum angular speed    1600 r/min
Gearbox                  Mekanex 602A, ratio 3.61
Number of teeth (z1)     30
Number of teeth (z2)     39
Number of teeth (z3)     18
Number of teeth (z4)     50
Coupling                 Rubber doughnut coupling
Load                     Hydraulic pump, maximum torque 500 Nm
Vibration sensors        IMI PCB 10 mV/g
Torque sensor            Linear range 0-1000 Nm

Figure 2 shows the schematic view of the test-rig. Mm is the main electric motor, able to produce a maximum of 75 kW of power. Load is transferred through a two-stage gearbox and the output shaft is connected to a hydraulic pump, which is able to produce a maximum of 500 Nm torsional load. To avoid cavitation, an initial pressure of 1 bar is fed to the pump using an additional support motor (Ms), seen in the hydraulic circuits. Load is controlled using the hydraulic oil flow, by adjusting the hydraulic valve seen in Figure 2. Other additional functions in the hydraulic system are a cooling and filtering system (see Figure 1(b)). Torsional load (Mi) is measured between the gearbox and the main electric motor.

Figure 2. Schematic of the test-rig
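The cross-load selection idea proposed in the introduction (keep only the features whose importance remains high at every degree of torsional load) can be sketched in pure Python. The feature names and importance scores below are made-up illustrative values, standing in for the random forest importances of wavelet features at three load levels:

```python
# Sketch of selection by intersection: rank features per load level by a
# precomputed importance score (e.g. a random forest variable importance)
# and keep only those appearing in the top-k set at every load level.
def robust_features(importances_per_load, k):
    """importances_per_load: list of {feature: importance} dicts, one per load."""
    top_sets = [
        set(sorted(imp, key=imp.get, reverse=True)[:k])
        for imp in importances_per_load
    ]
    return set.intersection(*top_sets)

# Hypothetical importance scores for four wavelet features at three loads:
low_load  = {"w1_rms": 0.30, "w2_kurt": 0.25, "w3_peak": 0.10, "w4_skew": 0.05}
mid_load  = {"w1_rms": 0.28, "w2_kurt": 0.07, "w3_peak": 0.31, "w4_skew": 0.06}
high_load = {"w1_rms": 0.33, "w2_kurt": 0.05, "w3_peak": 0.27, "w4_skew": 0.08}

print(robust_features([low_load, mid_load, high_load], k=2))  # {'w1_rms'}
```

Only the feature that stays in the top-2 set at every load survives; features that are strong in one load zone but weak in another are discarded, which is the robustness property sought in the paper.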


2.2 Simulated faults

Four individual types of damage were tested in this study: angular and offset misalignment of a rubber coupling; partially broken teeth; and macro-pitting of the gear contact surface. For the angular misalignment, the test-rig was tilted by moving the shaft and the pump in a horizontal direction (see Figure 3). The angle (γ) of the misalignment was 3°. Offset misalignment (h) was produced by moving the pump and the pump shaft horizontally by 3 mm.

Figure 3. Horizontal misalignment apparatus

The coupling of the test-rig was a doughnut type. This type of coupling is mainly used in cases where sudden impacts are likely to happen, such as highly loaded gearboxes. Gear damage was created to mimic two common gear failures: partially broken gear teeth (see Figure 4(a)) and macro-pitting of the mesh surface (see Figure 4(b)). Both gear damages were carried out on gear 2, as seen in Figure 2. The macro-pits had an average radius of 0.6 mm. The size of the tooth break was half the tooth width. Table 2 shows the number of teeth for each gear and the estimated gear geometry values for gear 2.

Figure 4. Simulated gear faults: (a) broken tooth; and (b) macro-pitting

2.2.1 Gear geometry calculations
Helical gears are known to be quieter and to run more smoothly than spur gears, as the contact area is more gradual and several teeth mesh simultaneously, usually for a longer period. In the case of tooth failure, a high transverse contact ratio value ϵα may make diagnosis harder, as the surrounding tooth pairs will compensate for the loss of one.

ϵα can be calculated using the following equation[22]:

ϵα = Lab / (π·mt·cos(αt)) ................................. (1)

and the line of contact, Lab, can be calculated using the following equation:

Lab = √((da2/2)² − (db2/2)²) + √((da1/2)² − (db1/2)²) − a·sin(αwt) .... (2)

The face contact ratio ϵβ, the contact ratio in an axial plane, can also be calculated using the following equation[22]:

ϵβ = b2·sin(β) / (π·mn) ..................................... (3)

The measurements and estimations used to solve these equations appear in Table 2. Equations (1) and (3) can be solved using this information, giving a transverse contact ratio of 1.7 and a face contact ratio of 1.5 for gear 2.

Table 2. Measured gear geometry for gears 1 and 2 and the equations used for calculating the line of action and the contact ratio[22]

Name | Value
Reference diameter (d2) | Measured | 56 mm
Reference diameter (d1) | Measured | 43.5 mm
Width (b2) | Measured | 16.8 mm
Reference helix angle (β) | Measured | 22.5°
Profile shift coefficient (x2) | Estimated | 0.1
Profile shift coefficient (x1) | Estimated | 0.1
Normal pressure angle (αn) | Default | 20°

Name | Equation
Working axle distance (aw) | a·cos(αt) / cos(αwt)
Axle distance (a) | mt·(z1 + z2) / 2
Module transverse (mt) | d2 / z2
Normal module (mn) | mt·cos(β)
Addendum (ha) | m·(1 + x) − ∆ha
∆ha | m·((z1 + z2) / (2·cos(β)) + x1 + x2) − aw
Transverse pressure angle (αt) | arctan(tan(αn) / cos(β))
Transverse pressure angle of gear pair (αwt) | inv(αwt) = inv(αt) + 2·(x1 + x2)·tan(αn) / (z1 + z2)
da | d + 2·ha
db | d·cos(αt)

Calculated values
m | 1.4
ϵα | 1.7
ϵβ | 1.5

2.3 Measurements and data acquisition
The load was kept constant with three different values. More specifically, the median torsional load was approximately
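As a numerical check on these gear geometry relations, the face contact ratio of equation (3) can be recomputed from the Table 2 values. This is an illustrative sketch, not the authors' calculation; the tooth count z2 is a hypothetical value (not recoverable from the extracted table) chosen so that the normal module comes out at the tabulated 1.4:

```python
import math

# Measured values from Table 2
d2 = 56.0                  # reference diameter of gear 2 (mm)
b2 = 16.8                  # face width of gear 2 (mm)
beta = math.radians(22.5)  # reference helix angle

z2 = 37                    # hypothetical tooth count (assumption, see text)

m_t = d2 / z2                     # transverse module, Table 2: mt = d2/z2
m_n = m_t * math.cos(beta)        # normal module, Table 2: mn = mt*cos(beta), ~1.4
eps_beta = b2 * math.sin(beta) / (math.pi * m_n)   # equation (3), ~1.5
```

With these inputs, eps_beta evaluates to about 1.46, which rounds to the face contact ratio of 1.5 reported for gear 2.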


50 Nm, 100 Nm and 150 Nm. The rotational frequency in all situations was kept at a constant 24 Hz and measured from the input shaft using an eddy current speed sensor.

Vibration data were collected using an accelerometer stud mounted directly on the gearbox in a radial direction, as close as technically possible to gears 1 and 2. The relative position to gear 2 was approximately (−1 cm, −6 cm, 1 cm), as seen in Figure 1(a). Data collection was carried out using a National Instruments PXI platform. The selected sampling frequency was 51.2 kHz. All of the signal processing and analysis was carried out using either LabVIEW, MATLAB or R software. The rotating speed was measured using a tachometer and the load was measured using a torque sensor located between the electric motor and the gearbox. The load, Mi, and the speed, ωi, were measured from the input shaft side.

2.4 Post-processing of the time signals
The duration of each vibration signal was around 0.2 s. The segment was chosen to be four times the rotation of an input shaft. Since the gearbox ratio is 3.61, all shafts of the gearbox had enough time to make at least one complete turn in each chosen time segment. A total of 300 vibration data samples had no faults and 300 had faults.

2.4.1 Wavelet analysis and feature extraction
Using the wavelet transform (WT) for feature extraction has several advantages. First, the WT method is a combination of time domain and frequency analysis and can detect transient signals and their time of appearance. The continuous wavelet transform (CWT) can be expressed as:

S(a, b) = (1/√a) ∫₋∞^∞ f(t)·ψ((t − b)/a) dt ............. (4)

where a is the scaling factor and b is the translation value. ψ(t) is a continuous function usually referred to as the mother wavelet. In this study, a complex Morlet wavelet (CMW) is used as the mother wavelet. A CMW can be defined as:

ψ(t, fc, fb) = (1/√(π·fb))·exp(−t²/fb)·exp(j·2π·fc·t) ............ (5)

where fc is the centre frequency and fb is the bandwidth parameter. For efficiency, the chosen mother wavelet should resemble the vibration signal stimulated by the fault. However, since each fault type may excite a totally unique signal, it might be very difficult to computationally investigate each wavelet type separately. Previous studies have ascertained that, even though the Morlet wavelet may not be the best candidate, it is usually one of the best candidates for many types of fault[23]. Impacts usually create a transient signal, the shape of which is very close to the shape of the Morlet wavelet after damping effects. Therefore, it is reasonable to choose the Morlet wavelet as the mother wavelet when trying to implement an automated fault diagnosis method based on wavelet analysis.

After the WT is performed, some statistical parameters are needed to reveal the change in the attributes of each decomposed WT parameter. In spectrum analysis, the amplitude of each frequency component is generally used to determine the behaviour of each feature. However, in wavelet analysis, each scale coefficient can be treated as a time domain signal and more values than just the peak value can be used.

2.4.2 Selection of scales and WT bandwidth values
Selecting many scales will lead to a better representation of the information of each frequency component at the expense of computational power. Therefore, a common way to reduce the computation time is to select scales in octaves and then divide each octave into a chosen number of voices per octave. In this study, the lowest scales were selected, in which the pseudo-frequency was the same as the shaft rotational frequency; as there were three rotating shafts, this was carried out three times. After this, the three scales were increased in octaves with four voices until the Nyquist frequency was reached. In total, 131 scales were used.

To reshape the Morlet wavelet, eight arbitrary bandwidth values, fb, were used in this study: 0.01, 0.05, 0.1, 0.5, 1, 3, 5 and 10.

Figure 5 gives an illustrative example of a time domain signal sample analysed using the WT. The scalograms in this figure were calculated for each different condition using the parameter values fc = 1 and fb = 0.5. The scalograms show the value of each absolute coefficient in relation to the pseudo-frequencies and its sampled position. The scalogram in Figure 5(a) was generated when the system was considered to be healthy and no faults were present. As can be seen in Figure 5(a), the coefficient values of most of the lower frequency contents remain constant throughout the whole revolution of the output shaft. However, when the pseudo-frequency is around 300 Hz, there is a transient increase in the values, which happens three times per round. When examining the higher frequency content, there are transient spikes occurring in a wide bandwidth area from 1400 Hz to 1800 Hz. These spikes occur several times during the rotation, have their highest value in the beginning and then attenuate slowly before reaching their lowest value. When visually comparing the nominal condition with the cases seen in Figures 5(b)-5(e), it is difficult to discern clear differences. When comparing the measurements taken when the system was in its nominal condition (Figure 5(a)) with those taken in the angular misalignment case (Figure 5(b)) and the offset misalignment case (Figure 5(c)), it seems that the coefficient values at the lower frequencies (below 300 Hz) are reduced when the misalignment is present. However, if this phenomenon were used as a condition indicator, it would be difficult to identify these faults visually by comparing these scalograms with the given wavelet parameters, especially since multiple scalograms are needed for the comparison and several wavelet parameter values must be used to find the most sensitive ones. Furthermore, it would require a trained eye to spot the correct frequency band and pattern to identify and separate each failure mode. It should also be noted that the fact that the scalograms seen in Figure 5 did not reveal a clear presence of the faults would suggest that the faults are not large and, therefore, not easily seen with any arbitrary bandwidth value.

Figure 5. Scalograms calculated using a complex Morlet wavelet. The rotation frequency was 24 Hz and the median load was approximately 150 Nm, the bandwidth parameter value was 0.5 and the centre frequency was 1: (a) nominal condition; (b) angular misalignment; (c) offset misalignment; (d) partially chipped tooth; and (e) macro-pitting

Three statistical parameters were chosen to extract features from the wavelet filtered signals: root mean square (RMS), peak and entropy.

2.4.3 Random forest algorithm
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest[24]. Breiman[24] concluded that datasets with many weak inputs are becoming more common, making it difficult for the usual classifiers, such as neural networks and trees, to obtain good results. Random forests seem to have the ability to work with very weak classifiers as long as their correlation is low. Random forests can be used not only for classification and regression tasks but also to select variables by assessing variable importance (VIMP). In random forests, two parameters should be decided beforehand: the number of variables randomly sampled at each split (mtry) and the minimum terminal node size. Setting the node size larger causes smaller trees to be grown (thus taking less time). However, choosing too large a value is computationally expensive. If the number of trees (ntree) is too small, the accuracy may decrease. Increasing the value up to the point where every input row is predicted at least a few times should be adequate[24].

This study used the randomForestSRC package based on Ishwaran and Kogalur's Random Survival Forests (RSF) package[25]. Variable selection used minimal depth variable selection, where features are selected by calculating the distance between the shortest tree and the root of the largest tree. A lower value means that the features in that tree are more meaningful than others. Once this is carried out for the selected number of trees, the results for each feature are calculated and averaged. This method gives a good estimation of how meaningful the features are. According to Ishwaran et al[25], it should improve both the variance and the bias of the results. The selected parameters for randomForestSRC were mtry = 88 and conservative level = low. Other parameters included the following default values: ntree = 1000; nsplit = 10; and nodesize = 2. More information on how to optimise these parameters can be found in the work by Ishwaran et al[26].

In total, random forest classification was carried out separately 96 times (four faults, three load cases and eight different wavelet bandwidth parameters), where nominal data was compared against faulty data (300 samples of each). Since there were 131 wavelet coefficient vectors, 393 features were extracted out of each vibration sample. The size of the input matrix was then 600 × 393. All features were normalised using the z-scores method. Figure 6 gives a flowchart showing how the proposed method selects meaningful features.

Figure 6. Flowchart for selecting features and the parameters that need to be considered

3. Results and discussion

3.1 Angular and offset misalignment
As can be seen in Tables 3 and 4, variables for angular misalignment were easier to spot in a medium load; the minimum depth threshold was lowest each time (3.474, 3.0766, 3.3058, 2.37, 2.4421, 3.1009, 3.1195, 3.2563), no matter what the fb value was. In many cases, RMS was the most dominant feature; no other features were among the top variables when fb values of 0.01, 0.05, 1 and 5 were used. This suggests that RMS should be used (instead of or together with the peak values) when trying to detect angular misalignment using wavelet filtered signals.

As for the top variables for offset misalignment (Tables 3 and 4), the proposed method had difficulty selecting key variables when the load was around 100 Nm; some were only found when using fb values of 0.5 and 1. There was no clear indication as to whether to use low or high fb values; the top variables were not found in all load cases using the same fb value. Perhaps the vertical sensor position and the horizontal offset misalignment made the classification task too difficult or the nature of the produced load (caused by the misalignment) was too static. None of the features were as dominant for offset misalignment as for angular misalignment.
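The processing chain of Section 2.4 (complex Morlet wavelet of equation (5), CWT of equation (4), then RMS, peak and entropy per scale) can be sketched in a few lines. This is an illustrative numpy implementation, not the authors' LabVIEW/MATLAB/R code; the scale convention follows the footnote of Table 5, so a scale of a samples corresponds to a pseudo-frequency of fc·fs/a:

```python
import numpy as np

def cmw(t, fc=1.0, fb=0.5):
    """Complex Morlet wavelet, equation (5)."""
    return np.exp(-t**2 / fb) * np.exp(2j * np.pi * fc * t) / np.sqrt(np.pi * fb)

def cwt_scale(x, scale, fc=1.0, fb=0.5):
    """One scale of the CWT, equation (4); `scale` is in samples,
    so the pseudo-frequency is fc * fs / scale (Table 5 footnote)."""
    half = int(np.ceil(4.0 * np.sqrt(fb) * scale))  # Gaussian support of the CMW
    u = np.arange(-half, half + 1)
    w = cmw(u / scale, fc, fb) / np.sqrt(scale)
    # correlate the signal with the conjugated, scaled wavelet
    return np.convolve(x, np.conj(w)[::-1], mode="same")

def scale_features(coeffs):
    """RMS, peak and Shannon entropy of the absolute wavelet coefficients."""
    mag = np.abs(coeffs)
    rms = np.sqrt(np.mean(mag**2))
    peak = np.max(mag)
    p = mag / (np.sum(mag) + 1e-12)
    entropy = -np.sum(p * np.log(p + 1e-12))
    return rms, peak, entropy
```

Repeating `cwt_scale` and `scale_features` over the 131 scales would yield the 393-feature vector per vibration sample described above.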


3.2 Partially broken gear tooth and gear macro-pitting
When looking at the threshold values shown in Tables 3 and 4, it can be seen that no specific loads were better than others at detecting the gear faults.

With low fb values (see Table 3), top variables for macro-pitting were found when the median load was around 100 Nm for all of the low fb values, but never in all investigated load zones. With high fb values (see Table 4), none of the top variables were found. This suggests that there is no real physical meaning behind the chosen variables and parameters. Therefore, finding good variables for macro-pitting is very case-specific when using the proposed method. However, it seems better to use low fb values than high fb values.

In the case of tooth failure, the number of top variables was high in all cases despite the fb value, as seen in Tables 3 and 4. With fb values of 0.1 and 0.5, a case study where the load was 100 Nm was unable to define any specific top variables, while other cases were able to find at least eight top variables and, most of the time, several dozen.

3.3 Features indifferent to load
To avoid models requiring similar training and testing conditions (exactly the same load), the number of selected top variables remaining the same when the load was different was tested. Table 5 compares load scenarios taken from Tables 3 and 4 with the same fb value and calculates the intersections.

As identified by Piotrowski[27], coupling misalignment generally produces a frequency that is twice the shaft speed frequency. However, looking at Table 5, it can be seen that the top variables are not centred on those frequencies. In fact, many of the top variables have much higher pseudo-frequencies in angular misalignment.

In some studies, the bandwidth parameter (or the analogue windowing parameter using short-time Fourier transform) is

Table 3. Random forest results from features extracted using a complex Morlet wavelet
Bandwidth | Fault type | Load | Number of top variables | Depth threshold | Distribution of top variables (RMS / Peak / Entropy)
50 Nm 8 4.3283 8 0 0
Angular misalignment 100 Nm 1 3.474 1 0 0
150 Nm 112 4.3613 61 28 23
50 Nm 76 1.0545 54 11 11
Offset misalignment 100 Nm 0 – – – –
150 Nm 84 2.3292 57 14 13
fb = 0.01
50 Nm 9 3.2947 7 1 1
Tooth failure, gear 100 Nm 90 1.29 56 16 18
150 Nm 9 4.2133 8 0 1
50 Nm 0 – – – –
Macro-pitting, gear 100 Nm 257 4.3405 107 86 64
150 Nm 0 – – – –
50 Nm 8 4.3476 8 0 0
Angular misalignment 100 Nm 102 3.0766 80 7 15
150 Nm 111 4.2891 61 21 29
50 Nm 3 1.4849 3 0 0
Offset misalignment 100 Nm 0 – – – –
150 Nm 94 2.4445 56 17 21
fb = 0.05
50 Nm 8 3.1577 6 0 2
Tooth failure, gear 100 Nm 86 1.2929 59 6 21
150 Nm 87 4.4262 51 15 21
50 Nm 199 3.0884 100 48 51
Macro-pitting, gear 100 Nm 269 4.9426 106 88 75
150 Nm 0 – – – –
50 Nm 4 4.1741 2 0 2
Angular misalignment 100 Nm 105 3.3058 83 10 12
150 Nm 129 4.9589 65 29 35
50 Nm 79 1.0297 45 10 24
Offset misalignment 100 Nm 0 – – – –
150 Nm 90 2.4029 63 13 14
fb = 0.1
50 Nm 100 2.4745 66 9 25
Tooth failure, gear 100 Nm 0 – – – –
150 Nm 91 4.3337 50 17 24
50 Nm 0 – – – –
Macro-pitting, gear 100 Nm 256 5.1772 108 77 71
150 Nm 0 – – – –
50 Nm 111 5.0662 60 19 32
Angular misalignment 100 Nm 117 2.37 66 11 40
150 Nm 86 4.106 59 10 17
50 Nm 0 – – – –
Offset misalignment 100 Nm 109 3.4691 59 15 35
150 Nm 115 2.3642 64 17 34
fb = 0.5
50 Nm 102 2.1907 69 11 22
Tooth failure, gear 100 Nm 0 – – – –
150 Nm 4 3.3228 2 1 1
50 Nm 0 – – – –
Macro-pitting, gear 100 Nm 210 3.029 93 62 55
150 Nm 0 – – – –
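The screening summarised in these tables can be imitated on a small example. The paper used minimal-depth selection from the randomForestSRC R package; the sketch below is a hypothetical Python analogue that ranks features of a toy "nominal vs faulty" dataset with a random forest and permutation importance, so the thresholds and counts here do not correspond to the tabulated values:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Toy stand-in for the paper's 600 x 393 matrix: 300 nominal + 300 faulty
# samples, with only the first three columns carrying fault information.
X = rng.normal(size=(600, 50))
y = np.repeat([0, 1], 300)
X[y == 1, :3] += 1.5          # hypothetical fault-sensitive features

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(forest, X, y, n_repeats=5, random_state=0)
top5 = np.argsort(imp.importances_mean)[::-1][:5]   # most important columns
```

On this synthetic data, the three informative columns dominate the ranking, mirroring how minimal depth surfaces the "top variables" counted in Tables 3 and 4.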


calculated using physical knowledge from the system[28]. Although Randall and Antoni[28] used the technique to find the maximum value of spectral kurtosis by selecting the optimum window length, the general idea is similar, as the length of the impulse is the sought-after value. Therefore, in manual calculations, the optimum fb value should be greater than the duration of one impulse and less than the time between two impacts (ti). In the case of tooth failure for gear 2, ti was calculated as 0.054 s (the same as the rotation time of the intermediate shaft). When a very small bandwidth was used (fb = 0.01 or 0.05), which is close to the one calculated using the physical knowledge, it was found that some of the top variables use scales with rather low pseudo-frequencies (15.8 Hz and 21.95 Hz). Although the pseudo-frequency is not exactly the same as the impact frequency (18.5 Hz), this would suggest that the optimum bandwidth parameter (fb = 0.05) will function and is able to find at least one good key indicator that is indifferent to the load.

3.4 General findings and improvements
One way to improve the proposed method could be to place the sensor much closer to the gear (wireless sensor) or add another step to the signal processing where, for instance, the wavelet filtered signal is enveloped to improve the signal-to-noise ratio. However, adding more signal processing steps too soon might cause other problems and the aim is to keep the method as general as possible (multiple fault modes). The boundaries of the proposed method should be carefully investigated before adding any preprocessing steps, since there is a high chance that some valuable information will be lost.

It appears that features based on the entropy and peak values are very sensitive to load change, as many of the intersected variables are based solely on the RMS values (see Table 5). This suggests the RMS is a better feature than entropy or peak when the load is changing.

Results could be improved by using a shaft encoder, as this allows the angle of the shaft to be recorded much more accurately.
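The timing figures quoted above follow directly from the rig geometry; a small check, assuming the first gear stage ratio equals d2/d1 from Table 2:

```python
f_in = 24.0          # input shaft rotational frequency (Hz)
d1, d2 = 43.5, 56.0  # reference diameters from Table 2 (mm)

f_int = f_in * d1 / d2   # intermediate shaft frequency, ~18.6 Hz (paper: ~18.5 Hz)
t_i = 1.0 / f_int        # time between two impacts, ~0.054 s
```

The computed t_i of about 0.0536 s agrees with the 0.054 s used for the optimum fb calculation.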

Table 4. Random forest results from features extracted using a complex Morlet wavelet
Bandwidth | Fault type | Load | Number of top variables | Depth threshold | Distribution of top variables (RMS / Peak / Entropy)
50 Nm 11 4.25 11 0 0
Angular misalignment 100 Nm 107 2.4421 59 16 32
150 Nm 86 4.328 53 8 25
50 Nm 0 – – – –
Offset misalignment 100 Nm 113 3.46 59 18 36
150 Nm 126 2.478 65 22 39
fb = 1
50 Nm 109 3.0175 75 12 22
Tooth failure, gear 100 Nm 116 1.0774 79 9 28
150 Nm 94 3.3357 60 16 18
50 Nm 0 – – – –
Macro-pitting, gear 100 Nm 0 – – – –
150 Nm 0 – – – –
50 Nm 8 4.1441 7 0 1
Angular misalignment 100 Nm 5 3.1009 2 0 3
150 Nm 88 4.153 51 9 28
50 Nm 83 1.2533 47 24 12
Offset misalignment 100 Nm 0 – – – –
150 Nm 0 – – – –
fb = 3
50 Nm 96 3.3474 67 18 11
Tooth failure, gear 100 Nm 108 1.146 80 16 12
150 Nm 92 3.4083 60 13 19
50 Nm 0 – – – –
Macro-pitting, gear 100 Nm 0 – – – –
150 Nm 0 – – – –
50 Nm 9 4.1637 9 0 0
Angular misalignment 100 Nm 5 3.1195 3 1 1
150 Nm 86 4.084 51 9 26
50 Nm 88 1.305 53 25 10
Offset misalignment 100 Nm 0 – – – –
150 Nm 0 – – – –
fb = 5
50 Nm 96 3.4203 61 27 8
Tooth failure, gear 100 Nm 105 1.2324 78 19 8
150 Nm 82 3.4183 50 13 19
50 Nm 0 – – – –
Macro-pitting, gear 100 Nm 0 – – – –
150 Nm 0 – – – –
50 Nm 87 4.4439 58 25 4
Angular misalignment 100 Nm 6 3.2563 5 1 0
150 Nm 81 4.0654 47 12 22
50 Nm 81 1.4162 47 27 7
Offset misalignment 100 Nm 0 – – – –
150 Nm 0 – – – –
fb = 10
50 Nm 112 3.4673 64 31 17
Tooth failure, gear 100 Nm 93 1.3358 66 23 4
150 Nm 98 3.3591 58 19 21
50 Nm 0 – – – –
Macro-pitting, gear 100 Nm 0 – – – –
150 Nm 0 – – – –


Table 5. Intersection of top variables and best variables found after running the random forest algorithm when the load is changing

Bandwidth | Fault type | Number of common variables | Top five variables* (1-5)
P-freq – – – – –
Angular misalignment 0
Feature – – – – –
P-freq – – – – –
Offset misalignment 0
Feature – – – – –
fb = 0.01
P-freq 15.8 – – – –
Tooth failure, gear 1
Feature RMS – – – –
P-freq – – – – –
Macro-pitting, gear 0
Feature – – – – –
P-freq 1181.5 1670.9 1701.9 3974.2 4047.9
Angular misalignment 6
Feature RMS RMS RMS RMS RMS
P-freq – – – – –
Offset misalignment 0
Feature – – – – –
fb = 0.05
P-freq 15.8 21.95 – – –
Tooth failure, gear 2
Feature RMS RMS – – –
P-freq – – – – –
Macro-pitting, gear 0
Feature – – – – –
P-freq 3974 4048 – – –
Angular misalignment 2
Feature RMS RMS – – –
P-freq – – – – –
Offset misalignment 0
Feature – – – – –
fb = 0.1
P-freq – – – – –
Tooth failure, gear 0
Feature – – – – –
P-freq – – – – –
Macro-pitting, gear 0
Feature – – – – –
P-freq 106.4 114.2 124.2 126.5 135.8
Angular misalignment 38
Feature RMS RMS RMS RMS RMS
P-freq – – – – –
Offset misalignment 0
Feature – – – – –
fb = 0.5
P-freq – – – – –
Tooth failure, gear 0
Feature – – – – –
P-freq – – – – –
Macro-pitting, gear 0
Feature – – – – –
P-freq 271.5 – – – –
Angular misalignment 1
Feature RMS – – – –
P-freq – – – – –
Offset misalignment 0
Feature – – – – –
fb = 1
P-freq 114.2 124.2 178.9 496.8 506.0
Tooth failure, gear 28
Feature RMS RMS Entropy RMS RMS
P-freq – – – – –
Macro-pitting, gear 0
Feature – – – – –
P-freq – – – – –
Angular misalignment 0
Feature – – – – –
P-freq – – – – –
Offset misalignment 0
Feature – – – – –
fb = 3
P-freq 114.2 496.8 1086.1 2024.0 2583.2
Tooth failure, gear 23
Feature RMS RMS RMS RMS RMS
P-freq – – – – –
Macro-pitting, gear 0
Feature – – – – –
P-freq 1291.6 – – – –
Angular misalignment 1
Feature RMS – – – –
P-freq – – – – –
Offset misalignment 0
Feature – – – – –
fb = 5
P-freq 2810.2 2862.3 3072.0 3653.2 5166.5
Tooth failure, gear 15
Feature RMS RMS RMS RMS RMS
P-freq – – – – –
Macro-pitting, gear 0
Feature – – – – –
P-freq 1182 1203 1292 1292 17378
Angular misalignment 5
Feature RMS RMS RMS Peak RMS
P-freq – – – – –
Offset misalignment 0
Feature – – – – –
fb = 10
P-freq 417.7 496.8 702.5 715.6 851.0
Tooth failure, gear 17
Feature RMS RMS RMS RMS RMS
P-freq – – – – –
Macro-pitting, gear 0
Feature – – – – –
*If more than five common features are found, the order was calculated based on the load group with the highest load.
Pseudo-frequency (P-freq (Hz)) was calculated using the equation P-freq = centre frequency*sample rate/scale.
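For reference, the footnote's conversion in code form (fs = 51.2 kHz and fc = 1, as used throughout this study):

```python
def pseudo_freq(scale, fc=1.0, fs=51200.0):
    # Table 5 footnote: P-freq = centre frequency * sample rate / scale
    return fc * fs / scale
```

For example, pseudo_freq(2048) gives 25.0 Hz, and the scale matching the 24 Hz input shaft is fs/24, roughly 2133 samples.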


With the analogue eddy current sensor, some reading errors were seen when selecting the time for one rotation; this might reduce the accuracy of the proposed method when the speed is fluctuating. By using the encoder, the time domain vibration signal could easily be transformed into the angle domain, so the benefit could be seen in two places. In the future, it would be interesting to use the proposed method in places where input variables are extracted using process parameters only or are combined with variables originating from the vibration signals. This way, even more variables could be found that are not considered in more traditional condition monitoring methods.

4. Conclusions
• The proposed method of using random forests to select top variables can help to select good features that are not generally considered when selecting features manually.
• A wavelet bandwidth parameter (fb) shorter than the time between individual impacts was successful at finding two top variables, the pseudo-frequency of which was rather low, where the fault in question was the partially broken gear tooth. When the fb was increased slightly to cover two repetitive impacts, sensitivity was lost and no top variables were found. When pseudo-frequency was increased even further, top common variables were found again but their pseudo-frequency was much higher than was found with low fb values.
• None of the top features were the same for the macro-pitting fault; therefore, knowing the load appears to be important when training a classifier for that particular fault type.
• Taking the intersection of top variables where only the load was changing showed almost conclusively that, in the proposed method, RMS was a better feature for classifying mechanical faults than entropy or peak value.

Acknowledgements
This work was supported by SKF AB, Vinnova and SKF's University Technology Centre at the University of Luleå.

References
1. A K Jardine, D Lin and D Banjevic, 'A review on machinery diagnostics and prognostics implementing condition-based maintenance', Mechanical Systems and Signal Processing, Vol 20, No 7, pp 1483-1510, 2006.
2. A Heng, S Zhang, A C Tan and J Mathew, 'Rotating machinery prognostics: state-of-the-art, challenges and opportunities', Mechanical Systems and Signal Processing, Vol 23, No 3, pp 724-739, 2009.
3. A Al-Fuqaha, M Guizani, M Mohammadi, M Aledhari and M Ayyash, 'Internet of Things: a survey on enabling technologies, protocols and applications', IEEE Communications Surveys & Tutorials, Vol 17, No 4, pp 2347-2376, 2015.
4. Z Peng and F Chu, 'Application of the wavelet transform in machine condition monitoring and fault diagnostics: a review with bibliography', Mechanical Systems and Signal Processing, Vol 18, No 2, pp 199-221, 2004.
5. ISO 17359:2011, 'Condition monitoring and diagnostics of machines – general guidelines', 2011.
6. J Sikorska, M Hodkiewicz and L Ma, 'Prognostic modelling options for remaining useful life estimation by industry', Mechanical Systems and Signal Processing, Vol 25, No 5, pp 1803-1836, 2011.
7. C Stander, P Heyns and W Schoombie, 'Using vibration monitoring for local fault detection on gears operating under fluctuating load conditions', Mechanical Systems and Signal Processing, Vol 16, No 6, pp 1005-1024, 2002.
8. H H Bafroui and A Ohadi, 'Application of wavelet energy and Shannon entropy for feature extraction in gearbox fault detection under varying speed conditions', Neurocomputing, Vol 133, pp 437-445, 2014.
9. B Samanta and K Al-Balushi, 'Artificial neural network-based fault diagnostics of rolling element bearings using time-domain features', Mechanical Systems and Signal Processing, Vol 17, No 2, pp 317-328, 2003.
10. A Widodo and B-S Yang, 'Support vector machine in machine condition monitoring and fault diagnosis', Mechanical Systems and Signal Processing, Vol 21, No 6, pp 2560-2574, 2007.
11. W Wang, F Ismail and F Golnaraghi, 'A neuro-fuzzy approach to gear system monitoring', IEEE Transactions on Fuzzy Systems, Vol 12, No 5, pp 710-723, 2004.
12. C Mechefske, 'Objective machinery fault diagnosis using fuzzy logic', Mechanical Systems and Signal Processing, Vol 12, No 6, pp 855-862, 1998.
13. F Auger and P Flandrin, 'Improving the readability of time-frequency and time-scale representations by the reassignment method', IEEE Transactions on Signal Processing, Vol 43, No 5, pp 1068-1089, 1995.
14. W Wang and P McFadden, 'Application of wavelets to gearbox vibration signals for fault detection', Journal of Sound and Vibration, Vol 192, No 5, pp 927-939, 1996.
15. J Lin and L Qu, 'Feature extraction based on Morlet wavelet and its application for mechanical fault diagnosis', Journal of Sound and Vibration, Vol 234, No 1, pp 135-148, 2000.
16. H Liu and M Han, 'A fault diagnosis method based on local mean decomposition and multiscale entropy for roller bearings', Mechanism and Machine Theory, Vol 75, pp 67-78, 2014.
17. J Antoni, 'The spectral kurtosis: a useful tool for characterising non-stationary signals', Mechanical Systems and Signal Processing, Vol 20, No 2, pp 282-307, 2006.
18. R Díaz-Uriarte and S A De Andres, 'Gene selection and classification of micro-array data using random forest', BMC Bioinformatics, Vol 7, No 1, p 3, 2006.
19. G Biau, 'Analysis of a random forests model', Journal of Machine Learning Research, Vol 13, pp 1063-1095, April 2012.
20. M Pal and G M Foody, 'Feature selection for classification of hyperspectral data by SVM', IEEE Transactions on Geoscience and Remote Sensing, Vol 48, No 5, pp 2297-2307, 2010.
21. M Dash and H Liu, 'Feature selection for classification', Intelligent Data Analysis, Vol 1, No 1-4, pp 131-156, 1997.
22. ISO 6336:1996, 'Calculation of load capacity of spur and helical gears', 1996.
23. J Rafiee, M Rafiee and P Tse, 'Application of mother wavelet functions for automatic gear and bearing fault diagnosis', Expert Systems with Applications, Vol 37, No 6, pp 4568-4579, 2010.
24. L Breiman, 'Random forests', Machine Learning, Vol 45, No 1, pp 5-32, 2001.
25. H Ishwaran, U B Kogalur, E Z Gorodeski, A J Minn and M S Lauer, 'High-dimensional variable selection for survival data', Journal of the American Statistical Association, Vol 105, No 489, pp 205-217, 2010.
26. H Ishwaran, U B Kogalur, X Chen and A J Minn, 'Random survival forests for high-dimensional data', Statistical Analysis and Data Mining, Vol 4, No 1, pp 115-132, 2011.
27. J Piotrowski, Shaft Alignment Handbook, CRC Press, 2006.
28. R B Randall and J Antoni, 'Rolling element bearing diagnostics: a tutorial', Mechanical Systems and Signal Processing, Vol 25, No 2, pp 485-520, 2011.



Paper E

Using wavelet transform analysis and the support vector machine to detect angular misalignment of a rubber coupling

Juhamatti Saari∗, Johan Odelius∗∗, Jan Lundberg∗∗, Matti Rantatalo∗∗

∗ SKF-LTU University Technology Centre, Luleå University of Technology, SE 97187 Luleå, Sweden (e-mail: juhamatti.saari@ltu.se)
∗∗ Division of Operation, Maintenance and Acoustics, Luleå University of Technology, SE 97187 Luleå, Sweden
Abstract: Shaft misalignment is a common problem for many types of rotating systems. It
can cause machine breakdowns due to the premature failure of bearings or other components.
Common diagnostic approaches rely on detecting increasing vibration response spectra at
multiples of the shaft speed. However, in many time-variant systems, such as wind turbines, the
speed and load vary considerably, which can make spectrum analysis insufficient. In this paper,
a method for detecting shaft misalignment by using wavelet analysis is proposed. The method
was experimentally evaluated in a laboratory test rig for four different operating conditions
by varying the rotational speed and load. An angular misalignment was introduced between a
hydraulic pump (load) and a medium-sized industrial gearbox connected with a rubber coupling.
Vibration data were collected by using two accelerometers mounted in an axial and a radial
direction directly on the gearbox casing. The features extracted from wavelet representation
were classified by using a support vector machine algorithm. The detection of misalignment
and the sensitivity of the proposed method are presented using validation data and confusion
matrices. The final results of the confusion matrices clearly indicate that this method can detect
misalignment even when the speed and load vary. The proposed method can be used for shaft-connected systems where many similar systems (comprising an electric
motor, a gearbox and a centrifugal pump) are working under the same circumstances.

Keywords: Shaft misalignment, wavelet transform, support vector machine, machine learning

1. INTRODUCTION

A properly aligned rotating machine can reduce the maintenance cost by increasing the lifespan of the system and its components. It can reduce bearing, coupling, seal and shaft failures and improve the manufacturing quality by reducing the overall vibration levels. Moreover, an aligned system will enjoy reduced energy consumption. It has been estimated that the energy consumption can be reduced by 4-5% on average Luedeking (2012). Furthermore, this is just the tip of the iceberg, which is relatively easy to see. The biggest savings are to be gained through the increased lifetime of the system and the reduced unscheduled downtime and repair expenses. Therefore, the need for automatic detection of misalignment definitely exists, even though this need is not recognized by many, since misalignment may remain hidden and only secondary failures of other components may be detected. In order to increase the lifetime of the system or accurately estimate its remaining technical life, knowing the goodness of the shaft alignment plays an important role. In this paper, a continuous wavelet transform and a support vector classifier have been tested to detect the angular misalignment of a rubber coupling.

Nowadays, when the computational power of computers has increased and more methods are available for the use of this computer power to solve condition monitoring problems, there is a demand for automatic fault diagnosis methods. Methods for fault diagnosis usually involve experts performing the final analysis and/or scheduled routines for manual inspections Kothamasu et al. (2006). Well-known methods are the extraction of sensitive features from the time domain and spectrum analysis (performed before or after enveloping). Based on the spectrum, known hit repetition frequencies can be located which can indicate certain faults. To diagnose the state of a machine, experts often use methods in both the time and the frequency domain, as well as many more (non-vibration-based) methods, to draw the final conclusions about the state of the machine. In order to develop automatic fault diagnosis methods, it should be possible to use similar approaches. However, there are some problems that need to be solved before this will be possible. First of all, experts usually have a deep knowledge of similar machines and they can use all of their senses to gather information from the surroundings of the machine. For example, for a human it is easy to determine from the vibration signals if the noise level is too high or the speed variation is too big to use spectrum analysis. However,

for unsupervised algorithms these steps are much harder. Condition monitoring (CM) experts do not necessarily need any charts or thresholds in order to realize when to use more advanced condition monitoring techniques, such as filtering or signal averaging techniques, to process the signal before analysis; but again, that is not the case for machines. Even the smartest algorithms usually rely on people who first set up these thresholds before the algorithm can work. Fuzzy logic algorithms are a good example of this type of algorithm Yao (1998). However, a large-scale automated diagnosis method should work even without this step, and such methods are currently one of the main topics in the field of CM.

An interesting candidate for selection as an automated diagnosis method is a combination of existing CM methods with machine learning techniques developed in computer science for data analysis. The idea of using machine learning techniques is to ease the load on trained experts by drawing simple conclusions automatically, or to transform the information to a level which is more understandable for the operator.

One approach worth considering for automated fault diagnosis is not to use traditional time domain or frequency analysis separately for feature extraction, but instead to use both at the same time. This type of analysis is referred to as time-frequency analysis. It is more suitable than frequency analysis when the speed varies, e.g. for moving vehicles and wind turbines. It can also reveal small impacts based on the continuous signal, which may be impossible for pure frequency analysis. Some of the known time-frequency analysis methods used for condition monitoring are the Wigner-Ville distribution (WVD), the short-time Fourier transform (STFT) and the wavelet transform (WT).

The WVD is the oldest known time-frequency transform method. Wigner (1932) applied it to quantum mechanics in the beginning of the 1930s, and in the 1940s, Ville (1948) applied the transform to signal processing, which explains the origin of its name. Since then it has been used to diagnose many types of machine faults. For example, Staszewski et al. (1997) used it to detect gearbox faults. Although the WVD leads to a superior frequency-domain resolution, it can produce high-energy coefficients in the transform plane, even though no such coefficients actually exist. This interference term can be filtered, but then some of the excellent frequency resolution will be lost. When filtering of the plane is included, the WVD can be referred to as a pseudo-Wigner-Ville distribution. A comparative study of the use of this method for CM is to be found in an article written by Baydar and Ball (2001).

The STFT, or windowed Fourier transform, uses a windowing function to separate a small section from the signal (short time) and produces a snapshot of the signal. Overlapping each analysed segment and summing them will lead to an image (named a spectrogram) which can represent how the signal will vary in time. The windowing function can have different shapes, as in standard frequency analysis. By choosing the optimum windowing function, the detection performance can be improved. However, the benefits can be quite minimal, and mostly a Gaussian windowing function is used, since it has been proven to work in many cases Wang and McFadden (1993). The STFT has also been widely used for detecting faults in rotating machines Cohen (1989); Bartelmus and Zimroz (2009). Unfortunately, because of the nature of the calculation of the spectrogram, it is not possible to know the exact time-frequency representation; i.e. by knowing the frequency component precisely, the exact time instance is unknown. This is also known as a manifestation of the famous Heisenberg Uncertainty Principle, Allen and Mills (1993).

This fixed resolution pitfall of the STFT is one of the reasons why the wavelet transform was developed Peng and Chu (2004). Wavelets are functions whose translations and dilations can be used for expansions of square-integrable functions. Instead of having a fixed window shape, as is the case in the STFT, the clever idea was devised of using the same basic filter shape (mother wavelet) and shrinking its time domain extension. This leads to a time-scale representation that can have a good time resolution for high-frequency events and a good frequency resolution for low-frequency events. For this reason the wavelet transform is a promising tool which can be used for many types of machine faults Peng and Chu (2004). It can detect transient signals whose origin is, for example, a broken gear tooth Fan and Zuo (2006); Bafroui and Ohadi (2014) or signals which are longer in duration and are caused, for example, by worn gears Bafroui and Ohadi (2014) or worn bearings Li and Ma (1997).

Even though wavelets have been studied with a view to optimizing and automating feature extraction for problems such as gear and bearing faults Rafiee et al. (2010), less effort has been made to solve problems such as misalignment. The most common method for diagnosing misalignment still relies on detecting increasing vibration response spectra at the shaft speed or its harmonics. However, this can be problematic when the speed varies or low-frequency noise of the machine is masking the signal. Moreover, it has been shown by many researchers working on the misalignment problem that, for example, for some systems the 2nd harmonic in the frequency domain is the most dominant, while for others the 4th in the time domain can be the most dominant Lahdelma and Laurila (2012). To make the problem even more complicated, misalignment can be highly non-linear, in such a way that a bigger misalignment does not necessarily mean that the amplitude will increase.

Although wavelet analysis can be a very effective method for diagnosing many types of faults, it requires trained eyes to detect different types of faults from the image (WT scalogram), just as it requires a professional to interpret the spectrum. However, a substantial amount of research has been conducted to analyse images by using machine learning techniques. Recently these techniques have become more popular and many people have used them to diagnose machine failures. However, in order to apply these methods, the following two crucial steps are needed:

(1) finding sensitive features that represent well all the possible changes which can be seen in the scalogram,

(2) the use of a classifier which can label each state accordingly with confidence.

Suitable machine learning techniques for classifying features extracted from the scalogram images are a topic of great current interest and are dealt with in Agarwal et al. (2004). Deeper knowledge of how the classifying algorithm chosen for the present study (the SVM algorithm) works and how it is used for condition monitoring can be found in Widodo and Yang (2007).

SVM algorithms are efficient learning algorithms for non-linear functions. They can separate non-linear regions by using kernel functions to measure similarity, based on the dot products of the vector space. By separating the data into training, testing and validation sets, it is possible to find a model which will maximize the decision boundary, and new measurements performed online can be tested without defining any threshold limits manually. The main advantage of using the SVM is that the value of each feature does not necessarily have to increase in order to separate the healthy system from the unhealthy one. In the case of a misalignment this is important, since sometimes it might happen that the vibration level of a certain frequency component is actually decreasing when the axles are poorly aligned. This might be due to an increase in the static load, which may not be detected by using a vibration sensor, which is more sensitive to dynamic loads. One disadvantage of the SVM is that there is no standard method for choosing the kernel function. Moreover, as in the case of other machine learning techniques, over-fitting or under-fitting the data can happen when the technique is not applied correctly in all its parts or when features are chosen poorly. Therefore, it is important to separate the data properly into the training, testing and validation sets. A limitation of the SVM which explains why it is not commonly used for CM diagnosis is that both healthy and unhealthy data are needed for training the algorithms to detect faults in the future (Widodo and Yang (2007)). However, this is a common problem for all supervised methods and not only for the SVM. To solve this problem, it has been suggested that synthetic data should be used to simulate all or some of the failure modes Leturiondo et al. (2015). Another way of obtaining both healthy and unhealthy data is to apply a good management protocol in which, each time a failure has happened, the data are labelled correctly with a good description from the system. Later on these data can be used to train the classifying algorithm to diagnose the failure the next time it occurs.

2. METHOD

In this study, complex Morlet wavelets have been used to denoise the raw vibration signal by extracting features with varying scaling and bandwidth parameters. From the resultant coefficients, statistical parameters were chosen to obtain as much information as possible for the classification algorithm. The classifier used was a support vector machine (SVM), which is a supervised machine learning model. The model was validated using a new set of data and a confusion matrix was used to present the classification results. In Figure 1 the steps needed to build up the model can be seen.

Misalignment was introduced between a hydraulic pump (load) and a medium-sized industrial gearbox connected with a rubber coupling. The coupling type was a so-called doughnut type, which is one of the best types of rubber couplings for enduring a large degree of misalignment. This type of coupling is mainly used in cases where sudden impacts are likely to happen, such as highly loaded gearboxes.

The test rig (Figure 2) used in this study was specifically designed for testing several different failure modes in an environment where natural noise is coming from other components (e.g. hydraulic systems and bigger electric motors), since the scale of the test rig is much closer to the size of normal machine components. In Table 1 some of the main components of the test rig are listed.

Table 1. Test rig specifications.

Specification       Explanation
Motor               75 kW three-phase AC electric motor
Controller          ABB inverter
Max RPM             1600 RPM
Gearbox             Mekanex 602A, ratio 3.61
Coupling            Rubber doughnut coupling
Load                Hydraulic pump, max. torque 500 Nm
Vibration sensors   IMI PCB 10 mV/g
Torque sensor       Linear range 0-1000 Nm

2.1 Wavelet transform and feature extraction

The selection of the mother wavelet for wavelet analysis is still an open question, although many researchers have tried to find the best solution or a method for choosing the mother wavelet by comparing the signal to the best-suited one Rafiee et al. (2010). Some have also used the well-known trial-and-error method to define it before the analysis. However, previous studies have ascertained that, even though the Morlet wavelet may not be the best candidate, it is usually one of the best candidates for many types of faults Rafiee et al. (2010). The reason for this is that impacts usually create a transient signal whose shape is very close to the shape of the Morlet wavelet after damping effects. Therefore, it is reasonable to choose the Morlet wavelet as the mother wavelet when trying to implement an automated fault diagnosis method based on wavelet analysis.

The most common feature of interest in traditional CM methods is the root mean square (rms) value of the vibration acceleration. This value is a measure of the energy content of the signal and is a good indicator of the overall health of the system. However, it can still be a challenge to diagnose the fault just by using the rms value, since the root cause is very difficult to isolate and attribute to a certain component and its problem.

To overcome the difficulty of identifying the reason for a vibration level increase, the common procedure is to use a band alarm in such a way that the raw vibration signal is pre-processed by filtering over a chosen frequency band, and then a proper threshold is selected which will trigger an alarm if the amplitude of the frequency component reaches this level. These bands can be chosen

by using physical models for which, for example, the ball passing frequency for the outer race of the bearing or the mesh frequency of two meshing gears is known. The limitations of this method are that usually the speed needs to be known precisely and the variation of the speed should be minimal. This is especially the case when the rotation speed of the shaft is low and other passing frequencies are occurring at close frequencies. Moreover, the limitation of just detecting the peak of a certain frequency component is that the increased level of the sidebands may go undetected, or the band must be wide enough and other statistical parameters (e.g. the rms) need to be used for the band-passed signal. However, the risk of detecting other passing frequencies or pure noise is increasing at the same time. Using band alarms for detecting misalignment can be challenging, since there are no specific passing frequencies which are to be located and isolated. However, misalignment is usually detected through the observation of an increased level of vibration in the shaft's rotation speed or in its harmonics bands using these frequencies. The difficulty of using this technique is that misalignment can be highly non-linear, in such a way that the vibration level might actually decrease when a bigger misalignment occurs. In addition, for some cases the 2nd harmonic in the frequency domain is the best indicator, while the 3rd or 4th harmonic is the best for others.

[Figure 1 is a flow chart: raw vibration signal → continuous wavelet transform (complex Morlet wavelet) → feature extraction (selection of scale and bandwidth parameters) → statistical parameters → feature optimization, PCA (optional) → SVM classification → accuracy rate; online data are then diagnosed as coming from a faulty or a healthy system.]

Fig. 1. Flow chart for choosing sensitive features for fault detection.

[Figure 2 is a photograph of the test rig, showing the hydraulic pump, the sensors and the misalignment screw.]

Fig. 2. Test rig.

Using wavelet transforms for feature extraction has several advantages. First of all, the WT method is a combination of time domain and frequency analysis and can detect transient signals and give an indication of the condition of the system based on the overall behaviour and what is usually seen in the time domain analysis. The continuous wavelet transform (CWT) can be expressed by the following equation:

    MΨ f = S(a, b) = (1/√|a|) ∫_{−∞}^{∞} f(t) Ψ((t − b)/a) dt,    (1)

where a is the scaling factor and b is the translation value. Ψ(t) is a continuous function usually referred to as the mother wavelet. In this study a complex Morlet wavelet (CMW) has been used as the mother wavelet. A CMW can be defined as

    Ψ(t, fc, fb) = (1/√(π fb)) exp(−t²/fb) exp(j 2π fc t),    (2)

where fc is the centre frequency and fb is the bandwidth parameter. Figure 3 illustrates how changing the parameter fb will affect the time-frequency resolution of the wavelet analysis. Note that when the bandwidth parameter value is big, the Morlet wavelet is very close to a sinusoidal wave.

After a WT has been performed, some statistical parameters are still needed in order to reveal the change in the attributes of each decomposed WT parameter. Usually for spectrum analysis, the amplitude of each frequency component is used to determine the behaviour of each feature. However, in wavelet analysis each scale coefficient

can be treated as the time domain signal and more values than just the peak value can be used. In this article, five different statistical parameters (see Table 2) have been used to extract as much information as possible from each scale coefficient.

[Figure 3 shows the spectra of the complex Morlet wavelet for fb = 0.10, fb = 1.00 and fb = 10.00, with fc = 2.00 Hz in all three panels.]

Fig. 3. The effect of changing the fb parameter of the complex Morlet wavelet.

Table 2. Statistical parameters extracted from the wavelet coefficients.

Feature    Equation                    Explanation
Peak       max|x_k|                    Maximum absolute value of the signal
RMS        √( Σ_{k=0}^{N} x_k² / N )   Measure of the energy content of the signal
Kurtosis   E(x − µ)⁴ / σ⁴              Measure of the peakedness of the probability distribution of a real-valued random variable
Skewness   E(x − µ)³ / σ³              Measure of the asymmetry of the data around the sample mean
Range      max(x_k) − min(x_k)         Difference of the maximum and minimum value of the signal

*E(t) is the expected value of the quantity t

2.2 Principal component analysis

Principal component analysis (PCA) is a method for identifying patterns in data or expressing data in such a way that the similarities and differences can be highlighted. Another main advantage of PCA is its capability to reduce the number of data dimensions without losing too much information. In the present study, PCA was first used to reduce the dimensions to two in order to gain a better understanding from the SVM results. However, this step is not compulsory and sometimes may weaken the sensitivity of the model. Accordingly, to reduce the dimensions without losing too much information, PCA was used in such a way that a 95% variance was retained each time before training the SVM.

2.3 Support vector machine

To achieve non-linear classification, a Gaussian kernel was used for the SVM. To avoid problems such as over-fitting and converging to a local minimum, 5-fold cross-validation was implemented and 20 random initial values were used. From these sets, the Matlab fminsearch function was used to choose the best parameters. The data were always separated using 60-20-20% sections (training, testing and validation). The actual number of data instances is to be found in Table 3.

2.4 Test setup

The time length of each data instance was 2.5 seconds and for each speed and load, data instances were collected by using an accelerometer which was stud-mounted directly on the gearbox in an axial direction, as can be seen in Figure 2. Data were also collected using another accelerometer mounted in a radial direction. However, since angular misalignment is usually easier to detect using an axial direction, only data from the accelerometer mounted in an axial direction were used to train the final model. Nevertheless, it would be possible to repeat the same steps for a radial direction or even combine features extracted from both locations, before training the SVM. All the data were collected using a National Instruments PXI platform. The sampling frequency used was 102.4 kHz. However, the linearity of the accelerometers used was within an error of 15% up to 10 kHz, and therefore, before any wavelet analysis, all the signals were low-pass filtered and downsampled to 10.24 kHz using the decimate function of Matlab. All of the signal processing and analysis was carried out using Matlab software. Data were collected using two different speeds and loads. The rotating speed was obtained using a tachometer and the load by using a torque sensor located between the electric motor and the studied gearbox. Both the speed and the load were kept constant, but a slight natural variation of the load occurred. Accurate mean load and speed values are to be found in Table 3. The load and the speed were measured from the input shaft side.

Table 3. Number of data instances collected for each case.

Speed [RPM]   Load [Nm]   0 deg   1 deg   2 deg   3 deg
1500          66          102     51      52      55
1500          35          101     52      51      52
1000          65          101     51      51      51
1000          37          101     51      51      52

3. RESULTS AND DISCUSSION

In this section the preliminary results are first shown using the acceleration (g) signal and its Fourier transform

(Figures 4 and 5). Then the spectra for each data instance are displayed in waterfall plots (Figures 6, 7, 8 and 9) in order to visualize how each misalignment case affects the vibration levels of the frequency components.

Each step of the flow chart (Figure 1) was performed and the results are to be found in Figures 10 – 16 and Table 4. First the SVM was trained using only individual scaling factor values to determine whether it was possible to use only one set of features to detect misalignment when the speed and load varied. Later the SVM was trained using a feature set obtained using the first four scaling factors.

Visual inspection of the signals' time and frequency domains

Typical signals of one data instance and their amplitude spectra are shown in Figure 4 for the case where the axles were aligned, and in Figure 5 for the case where the misalignment was three degrees. Comparing these extreme cases, it seems that the overall vibration level does not change dramatically. Figures 6, 7, 8 and 9 show that the rubber coupling works well and that the frequency components of the rotation frequencies and their harmonics remain almost the same for the healthy signal and for each misalignment case; this is especially the case for low frequencies. However, the change in the behaviour of the system was different for each case, which was easy to detect by listening to the running sound, at least when the misalignment reached its highest point. Moreover, small bits of rubber were detached from the rubber coupling, so that, clearly, the misalignment would have been harmful to the system in the long run. However, with traditional techniques, the misalignment might have been hard to detect. In addition, since the maximum speed for the test rig is 1500 RPM and the ratio is 3.61, the calculated maximum rotation frequency of the output shaft is 6.93 Hz and, when the speed is 1000 RPM, the rotation frequency is 4.62 Hz. Therefore, the functionality of detecting the rotation frequency is on the verge of its limitation. Even though the sensitivity of the accelerometers is 10 mV/g, we are operating under the standard factory calibration limit of 10 Hz. Therefore, the use of the first rotation frequency might be obscured.

[Figure 4 shows the acceleration signals and their amplitude spectra (0 – 5000 Hz) in the axial and radial directions for well-aligned shafts.]

Fig. 4. Acceleration [g] signals and their amplitude spectra for the axial and radial directions with a speed of 1500 RPM and a torque of 66 Nm. Good alignment of the shafts.

[Figure 5 shows the corresponding signals and spectra for a misalignment of three degrees.]

Fig. 5. Acceleration [g] signals and their amplitude spectra for the axial and radial directions with a speed of 1500 RPM and a torque of 66 Nm. An angular misalignment of three degrees.

Wavelet analysis and SVM classifier

According to preliminary observations, it seems that changing both the scaling factor (a) and the centre frequency (fc) of the Morlet wavelet has little or no effect, since doubling the value a and halving the value fc lead to almost the same scalogram image. This is in contrast to the findings of Rafiee et al. (2010), who found some minor differences by using different centre frequencies. In the present study, only the bandwidth parameter (fb) and the value a were altered. For the fb values, 0.1, 1 and 10 were chosen, as shown in Figure 3. For all the cases, the scaling factor values were chosen by taking the mean speed so that the first scaling factor's pseudo-frequency was centred around the rotation speed. In total ten different scaling factors were used and all of them were centred like the first one, such that the harmonics of the rotation speeds were chosen up to ten times the rotation speed.

Before choosing the value a according to the rotation frequency, other a values were also tested to see how well the misalignment could be detected. It was found, surprisingly, that by using some a values not related to the rotation frequencies the misalignment could be detected better than by using the rotation frequencies to calculate the a values. However, these other a values were excluded from the final models and only the first 4 a values were used, all of which were related to the rotation speed. The reason for this was that there is no good way of knowing whether other faults would also have an effect and disturb the sensitivity of the final model. Therefore, the functionality of the method was tested only using a values centred around the rotation frequencies, which should improve the robustness of the final model.

Since misalignment can be very sensitive to changes in the load and speed, the SVM was first trained by choosing feature sets which consist of only the wavelet coefficients obtained using

[Figures 6 – 9 are waterfall plots of the acceleration spectra for each misalignment case.]

Fig. 6. Acceleration [g] waterfall plot with a speed of 1500 RPM and a torque of 35 Nm.

Fig. 7. Acceleration [g] waterfall plot with a speed of 1500 RPM and a torque of 66 Nm.

Fig. 8. Acceleration [g] waterfall plot with a speed of 1000 RPM and a torque of 37 Nm.

Fig. 9. Acceleration [g] waterfall plot with a speed of 1000 RPM and a torque of 65 Nm.

scaling factors from 1 – 10 individually. All the individual cases are shown in Figures 10 – 16 in the same graph. Mostly a 95% variance was obtained with 4 features out of the total 5. When comparing these figures, it can be seen that some of the models are able to detect the misalignment with good confidence. However, the most sensitive feature set does not remain the same when the load or speed changes and the first four feature sets are rarely 100% accurate. It seems that it is reasonable and necessary to use more than one scaling factor value to make the model more robust.

Furthermore, to test how well a feature set which is chosen using only one a value would work when the load and speed varied, all the data cases were combined into three different sections. Figures 14 and 15 show the results for two cases where the speed remained the same and the load varied, while Figure 16 shows the results for all the cases combined. Again it seems that some of the feature sets work quite well, but they do not show the same performance when the speed changes. In addition, combining all four cases seems to give the worst results, which clearly indicates that a feature set which is based only on one scaling factor value cannot be used individually to detect the misalignment with good confidence. The tests for all the cases shown in Figures 10-16 were performed when fb = 10.

To achieve a better and more robust model, the first four feature sets (four different scaling factor values) and all four cases were combined and used together to train the SVM. Moreover, this procedure was carried out by using three different fb parameters to determine whether this would have any effect on the final model. Results for these three different bandwidth parameters are to be found in Table 4. Surprisingly, this method was almost 100% accurate, even though the individual feature sets led to quite poor results. Mostly a 95% variance (see Section 2.2) was obtained by using six dimensions out of the chosen 20 (4 a values and 5 features).

Confusion matrices were used to determine the evaluation metrics of the misalignment classifier. For each class, the following four classification outcome states were employed.
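The per-scaling-factor feature extraction evaluated here can be sketched as follows. The paper's analysis was carried out in Matlab; this is a minimal Python/numpy sketch of equations (1)-(2) and the Table 2 parameters, and the function names and the envelope-truncation choice are ours, not the original implementation.

```python
import numpy as np

def morlet(t, fc=2.0, fb=10.0):
    """Complex Morlet wavelet of eq. (2):
    (1/sqrt(pi*fb)) * exp(-t**2/fb) * exp(j*2*pi*fc*t)."""
    return np.exp(-t**2 / fb) * np.exp(2j * np.pi * fc * t) / np.sqrt(np.pi * fb)

def cwt_scale(x, fs, a, fc=2.0, fb=10.0):
    """CWT coefficients S(a, b) of eq. (1) for a single scaling factor a.
    Since Psi(-t) = conj(Psi(t)), convolving the sampled signal with the
    scaled wavelet realises the integral of eq. (1) with dt = 1/fs."""
    half = int(3 * a * np.sqrt(fb) * fs)   # truncate the Gaussian envelope
    t = np.arange(-half, half + 1) / fs
    psi = morlet(t / a, fc, fb) / np.sqrt(abs(a))
    return np.convolve(x, psi, mode='same') / fs

def table2_features(coeffs):
    """The five statistical parameters of Table 2, applied to the
    magnitudes of the wavelet coefficients of one scaling factor."""
    m = np.abs(coeffs)
    mu = m.mean()
    sigma = m.std() + 1e-30                # guard against a constant signal
    return {'peak': m.max(),
            'rms': np.sqrt(np.mean(m**2)),
            'kurtosis': np.mean((m - mu)**4) / sigma**4,
            'skewness': np.mean((m - mu)**3) / sigma**3,
            'range': m.max() - m.min()}
```

For a shaft rotating at fr Hz, the scaling factor whose pseudo-frequency coincides with the rotation speed is a = fc/fr, and the harmonic scaling factors follow as a/2, a/3, and so on; the Table 2 features of the chosen scaling factors are then stacked into one feature vector before PCA and the SVM.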

[Figures 10 – 13 plot, for each of the ten SVM models (one per scaling factor a), the accuracy rate [%] for misalignment angles of 1, 2 and 3 degrees.]

Fig. 10. Accuracy of the SVM models when using only individual scaling factors (a) from 1 to 10. Rotation speed is 1500 RPM and torque is 35 Nm.

Fig. 11. Accuracy of the SVM models when using only individual scaling factors (a) from 1 to 10. Rotation speed is 1500 RPM and torque is 66 Nm.

Fig. 12. Accuracy of the SVM models when using only individual scaling factors (a) from 1 to 10. Rotation speed is 1000 RPM and torque is 37 Nm.

Fig. 13. Accuracy of the SVM models when using only individual scaling factors (a) from 1 to 10. Rotation speed is 1000 RPM and torque is 65 Nm.

[Figures 14 – 16 show the corresponding accuracy plots for the combined cases.]

Fig. 14. Accuracy of the SVM models when using only individual scaling factors (a) from 1 to 10. Speed is 1500 RPM and torque is 35 or 66 Nm.

Fig. 15. Accuracy of the SVM models when using only individual scaling factors (a) from 1 to 10. Speed is 1000 RPM and torque is 37 or 65 Nm.

Fig. 16. Accuracy of the SVM models when using only individual scaling factors (a) from 1 to 10. All cases (see Table 3) combined.

176
(1) True positive (TP): The system is healthy and was predicted to be healthy.
(2) False positive (FP): The system is faulty, but was predicted to be healthy.
(3) False negative (FN): The system is healthy, but was predicted to be faulty.
(4) True negative (TN): The system is faulty and was predicted to be faulty.

The accuracy of the SVM model, which is presented in Table 4, could be calculated using the following equation: Accuracy = (TP + TN)/(P + N), where P is the number of positive data instances and N is the number of negative data instances. In all the cases, P and N were randomly selected using the data instances listed in Table 3.

From Table 4 it can be seen that changing the fb parameter of the mother wavelet does not affect the results that much. The reason for this might be that the speed was constant, without any minor fluctuation around the mean speed. However, it would be interesting to determine whether this would also be the case when there is some fluctuation around the mean speed. Our estimation is that there is a high correlation between the speed variance and the bandwidth parameter.

Table 4. Confusion matrices when the SVM is trained using all data instances with 3 different bandwidth parameters.

fc = 2, fb = 0.1, scales (a) 1-4 (cells: predicted healthy / predicted faulty)
  Misalignment:     1 degree    2 degrees   3 degrees
  Actual healthy    37 / 0      41 / 1      47 / 0
  Actual faulty      0 / 25      0 / 19      0 / 19
  Accuracy [%]      100         98.4        100

fc = 2, fb = 1, scales (a) 1-4 (cells: predicted healthy / predicted faulty)
  Misalignment:     1 degree    2 degrees   3 degrees
  Actual healthy    38 / 0      46 / 0      42 / 0
  Actual faulty      0 / 24      0 / 16      0 / 20
  Accuracy [%]      100         100         100

fc = 2, fb = 10, scales (a) 1-4 (cells: predicted healthy / predicted faulty)
  Misalignment:     1 degree    2 degrees   3 degrees
  Actual healthy    38 / 0      46 / 1      45 / 2
  Actual faulty      1 / 21      0 / 15      1 / 14
  Accuracy [%]      96.7        98.4        95.2
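The per-degree accuracy values in Table 4 follow directly from the confusion counts and the accuracy equation. The sketch below reproduces them for the fb = 0.1 case; the positive class is "healthy", as in outcome states (1)-(4), and the dictionary layout is our own.

```python
# Confusion counts per misalignment degree for the fc = 2, fb = 0.1 case
# in Table 4. Positive class = healthy, matching outcome states (1)-(4).
counts = {
    1: {"TP": 37, "FN": 0, "FP": 0, "TN": 25},
    2: {"TP": 41, "FN": 1, "FP": 0, "TN": 19},
    3: {"TP": 47, "FN": 0, "FP": 0, "TN": 19},
}

def accuracy(c):
    # Accuracy = (TP + TN) / (P + N), with P = TP + FN and N = FP + TN.
    p = c["TP"] + c["FN"]
    n = c["FP"] + c["TN"]
    return 100.0 * (c["TP"] + c["TN"]) / (p + n)

for degree in sorted(counts):
    print(degree, round(accuracy(degree and counts[degree]), 1))  # 100.0, 98.4, 100.0
```

For the 2-degree case, for example, (41 + 19)/(42 + 19) = 60/61 = 98.4%, matching the table.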

4. CONCLUSIONS

The features which were extracted by using Morlet wavelet transforms with the scaling factor (a) centred around the harmonics of the rotation frequencies, and which were classified using the SVM, can be used to detect and classify misalignment with great confidence, as seen in Table 4. Varying the bandwidth parameter of the Morlet wavelet seems to have little or no effect when the speed does not fluctuate around its mean value.

Feature sets using only individual scaling factors are very sensitive to changes in the rotation speed and torsional load, and cannot be used to detect the misalignment in this test setup.
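As a rough illustration of the kind of feature referred to above, the sketch below computes the wavelet energy of a signal at a single scale of a complex Morlet wavelet, using the MATLAB-style cmor parametrisation with centre frequency fc and bandwidth fb. The scale values, normalisation and convolution-based CWT here are simplifying assumptions, not the exact implementation of this study.

```python
import numpy as np

def cmor(t, fc, fb):
    # Complex Morlet wavelet, MATLAB-style parametrisation:
    # psi(t) = (pi*fb)^(-1/2) * exp(2j*pi*fc*t) * exp(-t^2/fb)
    return (np.pi * fb) ** -0.5 * np.exp(2j * np.pi * fc * t - t**2 / fb)

def wavelet_energy(x, fs, a, fc=2.0, fb=1.0):
    """Energy of the CWT coefficients of x at a single scale a (seconds),
    computed by convolving with the scaled, conjugated wavelet."""
    n = len(x)
    t = (np.arange(n) - n // 2) / fs             # time axis centred at zero
    psi = cmor(t / a, fc, fb) / np.sqrt(a)       # wavelet dilated to scale a
    coeffs = np.convolve(x, np.conj(psi[::-1]), mode="same")
    return np.sum(np.abs(coeffs) ** 2) / n

# A 25 Hz tone: the scale tuned to 25 Hz (a = fc/25) carries far more
# energy than a scale tuned to 100 Hz (a = fc/100).
fs = 1000.0
x = np.sin(2 * np.pi * 25.0 * np.arange(2000) / fs)
print(wavelet_energy(x, fs, a=2.0 / 25.0) > wavelet_energy(x, fs, a=2.0 / 100.0))  # True
```

Centring the scales on the harmonics of the rotation frequency, as done in the paper, amounts to choosing a so that fc/a falls on each harmonic.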

5. FUTURE WORK

This research study has only dealt with one failure type, namely shaft misalignment. Future research should investigate whether the model is still valid when other types of failure, e.g. bearing or gear damage, are introduced in the same system. Only after such research has been performed will it be possible to classify and validate the sensitivity of the chosen features for individual failure modes with great confidence.

Furthermore, the process parameters, e.g. the load and speed, should vary even more, and the speed should fluctuate around the mean value during each measured time segment.

To improve the sensitivity of the method, it could be beneficial to integrate the acceleration signal to obtain the velocity. We can also calculate an infinite number of signals by means of fractional order derivation. However, in the test setup of the present study, it was sufficient to use acceleration.
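Integrating the acceleration signal to velocity, and fractional-order derivation in general, can be done in the frequency domain by weighting the spectrum with (j2πf)^α, where α = -1 gives integration and non-integer α gives fractional orders. The function below is an illustrative sketch of this idea, not the procedure used in this study; the DC handling is our own choice.

```python
import numpy as np

def fractional_derivative(x, fs, alpha):
    """Fractional-order derivative of x in the frequency domain:
    multiply the spectrum by (j*2*pi*f)**alpha. alpha = -1 integrates,
    so an acceleration signal becomes a (zero-mean) velocity signal."""
    n = len(x)
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    X = np.fft.rfft(x - np.mean(x))              # drop the mean before weighting
    H = np.zeros(len(f), dtype=complex)
    H[1:] = (2j * np.pi * f[1:]) ** alpha        # leave the DC bin at zero
    return np.fft.irfft(X * H, n)

# Acceleration of a 10 Hz velocity sinusoid integrates back to the sinusoid.
fs, n, f0 = 1000.0, 1000, 10.0
t = np.arange(n) / fs
acc = 2 * np.pi * f0 * np.cos(2 * np.pi * f0 * t)
vel = fractional_derivative(acc, fs, -1.0)
print(np.allclose(vel, np.sin(2 * np.pi * f0 * t), atol=1e-8))  # True
```

The same routine with, say, alpha = 0.5 yields one of the "infinite number of signals" between acceleration and its derivative mentioned above.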
This type of model can also be used as an input for estimating the remaining technical life of a system. Since sometimes it is impossible to stop the machine under investigation (owing to distance or location, or because it is part of a bigger system) and fix a misalignment, the model can support decision making by giving some estimations based on the nature of the misalignment, even if such estimations may not be 100% accurate.

ACKNOWLEDGEMENTS

The authors would like to thank SKF AB, Vinnova and National Instruments for their contributions and support.
