Universidade Federal de Santa Catarina
Rafael Hoffmann Fallgatter
Florianópolis
2019
This Bachelor’s Thesis was considered appropriate for obtaining the Bachelor’s Degree in
Mechanical Engineering and was approved in its final form by the Undergraduate Course of
Mechanical Engineering
____________________________________
Prof. Carlos Enrique Niño Bohórquez, PhD. Eng.
Course Coordinator
Examining Committee:
____________________________________
Prof. Rodolfo César Costa Flesch, PhD. Eng.
Supervisor
Universidade Federal de Santa Catarina
____________________________________
Ahryman Seixas Busse de Siqueira Nascimento, M.Sc.
Co-supervisor
Universidade Federal de Santa Catarina
____________________________________
Prof. Carlos Enrique Niño Bohórquez, PhD. Eng.
Universidade Federal de Santa Catarina
____________________________________
Bernardo Barancelli Schwedersky, M.Sc.
Universidade Federal de Santa Catarina
To my family and friends
ACKNOWLEDGEMENTS
I would firstly like to thank my parents and my stepfather for all the help and
support throughout my studies and my life. All my achievements are also theirs. Thanks
also to my siblings for putting up with me every single day and making my days more fun.
And thanks to the rest of my family for always standing by me.
I would also like to thank all my friends, those I have just met and those I have
known throughout my life. They are all very important to me. A special thanks to the
great people of PET-MA; you have had a huge influence on who I am today. The work
at PET-MA was my life during my undergraduate years.
Moreover, I would like to thank the teachers and other people I have met at
university, who have taught me so much.
Finally, I would like to warmly thank my supervisor, Rodolfo Flesch, as well
as Ahryman Nascimento, Bernardo Schwedersky and the other people of LABMETRO,
for their great support throughout the development of this project.
“I am enough of an artist to draw freely upon my imagination.
Imagination is more important than knowledge. Knowledge is limited.
Imagination encircles the world.”
(Albert Einstein)
ABSTRACT
Introduction
Generating cooling in a reliable and economical way is a challenge of great
importance to modern society. Refrigerators and air-conditioning systems are present in
many kinds of industrial, commercial and domestic applications.
One of the most important components of refrigeration systems is the
compressor. Maintenance of this equipment often requires knowledge of its suction and
discharge pressures. Obtaining them, however, requires inserting pressure sensors into
the system lines, which has a series of implications both economic and related to
measurement quality. There is therefore an interest in performing this measurement
non-intrusively.
This work uses the technique known as virtual sensing, in which one quantity
is measured and correlated with the quantity of interest. The vibration of the compressor
shell was chosen as the measured quantity, since there is a significant relation between
it and the suction and discharge pressures. This relation is highly complex, however,
making an analytical model unfeasible. This work therefore aims to develop a Machine
Learning algorithm capable of making predictions from examples obtained through
experiments.
Objectives
The overall objective of this project is the development of an ensemble Machine
Learning model, based on Decision Trees, capable of predicting the suction and discharge
pressures of a domestic reciprocating compressor. The specific objectives are the
following:
- to generate an optimized set of features from vibration data;
- to compare different Machine Learning algorithms based on Decision Trees;
- to optimize the parameters of the selected models;
- to propose practical applications for the developed model.
Methodology
This work used vibration data obtained on a test bench of the Laboratory of
Metrology (LABMETRO) of the Universidade Federal de Santa Catarina (UFSC). Tests
were run on 5 compressors of the same model, measuring the vibration along 3
perpendicular axes during operation. A total of 11 suction pressure values, 11 discharge
pressure values and 3 compressor rotational speeds were used, giving 363 possible
combinations. The experiment was repeated three times. Additionally, 6 intermediate
values of suction and discharge pressure and 2 rotational speeds were used to generate a
test dataset.
The data from the first four compressors are used to train the algorithms, forming
the so-called Training Set. The data obtained from these same compressors, but at the
intermediate values, form the Test Set, used to test different methods and optimize the
model parameters. Finally, the data from the fifth compressor are used to evaluate the
final performance of the algorithms, constituting the Validation Set.
In this work, the algorithm was built to predict the evaporation and condensation
temperatures of the fluid, which are representative of the suction and discharge pressures,
respectively. The reason is that these temperatures are the quantities used in the
development of refrigeration systems, since, unlike the pressures, they are independent of
the refrigerant being used.
To generate the features used as model inputs, a Fast Fourier Transform is applied
and the spectral energy is computed within linearly spaced frequency bands. Different
values of maximum frequency, band width and overlap are tested to find those that give
the best results. Additionally, two Dimensionality Reduction methods are applied:
Principal Components Analysis (PCA) and Feature Importance, with different parameters
tested for each. The Feature Importance method is also used to analyze which
measurement axes and frequencies are most important for predicting each of the
pressures.
The resulting dataset is then used to train and compare the performance of 4
algorithms: Multiple Linear Regression, Decision Tree, Random Forest and LS Boost,
the last two being ensemble methods based on Decision Trees. A modified version of LS
Boost, with random sampling of the variables used at each node, is also applied. A
Bayesian Optimization algorithm is then used to select the hyperparameters of the
ensemble algorithms using the Training Set.
Finally, the final models are evaluated on the Validation Set and the sensitivity of
their performance to different pressures, rotational speeds and measurement axes is
tested. An analysis is also made of how this knowledge can be used in the development
of commercial systems.
Results and discussion
The final model with the best performance, for both the evaporation and the
condensation temperature, was LS Boost with random sampling of variables at the nodes,
without Dimensionality Reduction. This model achieved a Mean Absolute Error (MAE)
of 1.22 °C for the evaporation temperature and 2.95 °C for the condensation temperature.
This performance, however, is highly dependent on the pressure conditions and the
rotational speed of the compressor.
Finally, it was shown that the sensor located near the discharge pipe has little
influence on the prediction of the evaporation temperature. For the prediction of the
condensation temperature, it is the vertically aligned sensor that contributes least to
model quality.
Final considerations
It was shown that it is possible to develop a Machine Learning algorithm for
predicting the suction and discharge pressures of a compressor from vibration data. The
models and data generated in this work, as well as the knowledge obtained, can be used
in the future to develop commercial systems for the non-invasive measurement of the
suction and discharge pressures of compressors.
HT – Hilbert Transform
IBGE – Brazilian Institute of Geography and Statistics (from the Portuguese Instituto Brasileiro de
Geografia e Estatística)
LVA – Laboratory of Vibration and Acoustics (from the Portuguese Laboratório de Vibração e
Acústica)
RP – Recursive Partitioning
UFSC – Federal University of Santa Catarina (from the Portuguese Universidade Federal de Santa
Catarina)
WT – Wavelet Transform
LIST OF SYMBOLS
E – Young’s modulus
L – loss function
m – mass of the tube per unit length, used to calculate its natural frequencies
P – pressure
P_d – discharge pressure
P_s – suction pressure
TABLE OF CONTENTS
1 INTRODUCTION
2 THEORETICAL FOUNDATIONS
4 DEVELOPMENT
6 CONCLUSION
REFERENCES
1 INTRODUCTION
1.1 CONTEXTUALIZATION
is altered when changes in its working conditions occur. It would then just be a matter of relating
those changes in vibration to the values of the pressures.
Soedel (2008) presents strategies for analytically calculating the vibration
characteristics of compressors. Those methods are, however, highly complex and only approximate,
since the relation between the pressure and the vibration is extremely intricate. As an alternative,
this project presents a method of making this estimation using Machine Learning techniques.
The method of measuring a physical property of a system and using mathematical
models to estimate other correlated properties from it is known in the literature as soft sensing
or virtual sensing (LIU et al., 2009). When this relationship can be expressed mathematically,
an analytical model may be used. However, when this relationship is too complex,
empirical models, such as Artificial Neural Networks (ANN), must be applied (LIN et al.,
2007). According to recent surveys, soft sensing is being applied successfully in a wide variety
of fields (KADLEC et al., 2009; QIN et al., 2012; YIN et al., 2015).
The application of soft sensing has already been explored by researchers at the
Laboratory of Metrology (LABMETRO) of UFSC, where the author is an intern, mainly by
applying ANNs to procedures for the performance analysis of the compressors
manufactured by the company that funded this project (LIMA, 2010; PENZ, 2011; CORAL,
2014; PACHECO, 2015; NASCIMENTO, 2015).
Walendowsky (2017) developed a method to estimate the profile of variation of the
angular velocity during one rotation cycle of a compressor and used it, together with the
average current, as input to two ANNs to predict the suction and discharge pressures of
compressors. Similar projects applied to refrigerators and medium-sized air-conditioning
systems can also be found in the literature (PARIS et al., 2014; SCHANTZ, 2011; SCHANTZ,
LEEB, 2017).
However, a soft-sensing model that makes use of vibration data is new to the
laboratory, and nothing could be found in the literature about its use for predicting
compressor pressures. Most existing works focus on using vibration for the prediction of
compressor failures.
Yang (2005) compared three techniques for defect identification in
reciprocating compressors: Self-Organizing Maps (SOM), Learning Vector Quantization
(LVQ) and Support Vector Machines (SVM). The input features of the models were obtained
through a Wavelet Transform (WT) of the vibration signal and the calculation of statistical
values of the filtered signal.
Tran (2013) developed a method for identifying defects in reciprocating
compressors using vibration, pressure and current data. It calculates the signal envelope by the
Teager–Kaiser Energy Operator (TKEO) method and performs filtering by WT. As classification
algorithms, three methods were compared: Deep Belief Network (DBN), SVM and Back-
Propagation Neural Network (BPNN).
Because of the high demand for compressors with low noise emission, plenty of
research has been done to model their vibroacoustic behavior analytically, by the Finite
Element Method (FEM), and experimentally, aiming mainly to obtain information for the design
of quieter compressors. Soedel (2008) presents an extensive study of analytical formulations of
the vibroacoustic behavior of reciprocating compressors, including possible influences of the
suction and discharge pressures.
The Laboratory of Vibration and Acoustics (LVA) of UFSC also has a history of
projects in this research line, modelling specifically the compressors of the company that
funded this project (SANGOI, 1983; DIESEL, 2000; CARMO, 2001; DENCKER, 2002;
FULCO, 2014). The work by Fulco (2014) is especially relevant: it developed an analytical
model of a compressor similar to the one used in this work, taking into account the variation
of rotational speed due to the chamber pressure. Moreover, an FEM model was developed to
evaluate the transmission paths of vibroacoustic energy up to frequencies of 6300 Hz.
However, the focus of these projects was the design of compressors with lower vibration,
not the prediction of other properties from the levels of vibration.
1.3 OBJECTIVES
2 THEORETICAL FOUNDATIONS
Humankind has long been trying to develop smart systems capable of
simulating human reasoning. However, hard-coded approaches proved inefficient for
tackling problems of higher complexity. The solution was to develop systems with the ability to
acquire their own knowledge and extract patterns from raw data, techniques known
nowadays as Machine Learning. Although such algorithms have existed for many decades,
they have grown in importance only recently, with the fast expansion of available data and
computing power, as well as the evolution of statistical methods (GOODFELLOW, 2016;
LANTZ, 2015).
One category of such methods is known as ensemble Tree-based methods. Tree-based
algorithms are simple but powerful Machine Learning techniques that are widely used in Data
Mining problems (TAN, 2006) and are well suited to domains with large numbers of
variables and cases (TORGO, 1999). Moreover, these techniques can analyze the
importance of each feature, which can be of great use for the selection of the features to be used.
These methods can be applied to both classification and regression. The simplest of
such algorithms is the Decision Tree, which alone has limited applications but, when combined
with other Decision Trees in an ensemble, can achieve great results (TAN, 2006).
Caruana (2006) made a large-scale empirical comparison of ten supervised
learning algorithms using eight performance criteria. Prior to calibration of the
hyperparameters, Bagged Trees, Random Forests and Neural Networks gave the best average
performance. After calibration using Platt’s Method, however, Boosted Trees moved into first
place.
The high accuracy of tree-based methods is evident, and thus this project focuses on
this kind of algorithm. More advanced techniques, such as deep learning, are not appropriate
for this case because of the relatively small size of the dataset and the large number of input
variables (GOODFELLOW, 2016).
Firstly, the theory of Multivariable Linear Regression, one of the simplest Machine
Learning methods, is presented. Next, Tree-based methods are reviewed, starting with an
explanation of the theory behind the Decision Tree. This method is the building block for the
next two algorithms, Random Forest and Gradient Boosting, which are then explained in
section 2.1.3.
Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₚXₚ (1)
According to Tan (2005), a Decision Tree makes predictions by dividing the dataset
into subgroups depending on the features of each observation. Each of those divisions happens
at one of the tree nodes. The tree may have several levels of divisions until reaching what is
called the Terminal Nodes or Leaf Nodes.
If the problem being solved is a Classification, the Leaves will have classes and, in the
case of a Regression problem, it will have a value for the prediction of the dependent variable.
As the problem being tackled by this project is a Regression, focus is given to the Regression
Trees.
Figure 1 shows an example presented by Shalizi (2009), in which the tree predicts the
price of houses in the USA based on the latitude and longitude. For example, for a sample with
latitude lower than 38.485, longitude lower than -121.655 and latitude higher than 37.925, this
Decision Tree would make a prediction of 12.10.
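That traversal can be written as nested conditionals. The sketch below hard-codes only the three splits quoted above; the remaining branches of Shalizi's full tree are not given in the excerpt and are therefore stubbed out.

```python
import math

def predict_price(latitude: float, longitude: float) -> float:
    """Follow the three splits quoted in the text; other branches are stubs."""
    if latitude < 38.485:
        if longitude < -121.655:
            if latitude > 37.925:
                return 12.10  # leaf reached by the sample described in the text
            return math.nan  # branch not shown in the excerpt
        return math.nan      # branch not shown in the excerpt
    return math.nan          # branch not shown in the excerpt

print(predict_price(38.0, -122.0))  # -> 12.1
```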
According to Breiman (1984), the process of building (or training) a Regression Tree
is done through a Recursive Partitioning (RP) algorithm and the most widely used of such
techniques is known as CART, which simply stands for Classification and Regression Trees. It
builds Least Squares Regression Trees, or in other words, it tries to find the parameters which
minimize the Least Squares Error (LSE) criterion. It is the one that is used in this work.
At each node, the algorithm searches over all variables, and over all possible split
values of each variable, for the option that most reduces the LSE. It then uses this criterion
to create the rule for the split into further nodes (BREIMAN, 1984).
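This search at a single node can be sketched as follows; the function and variable names are illustrative, not CART's actual implementation, and the tiny dataset is invented.

```python
import numpy as np

def best_split(X: np.ndarray, y: np.ndarray):
    """Exhaustive CART-style search: for every feature and every candidate
    threshold, compute the summed squared error (SSE) of the two children
    and keep the split that reduces it the most."""
    best = (None, None, np.inf)  # (feature index, threshold, child SSE)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # split between observed values
            left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, thr, sse)
    return best

# Tiny example: feature 0 separates y perfectly, feature 1 is noise.
X = np.array([[1.0, 5.0], [2.0, 3.0], [10.0, 4.0], [11.0, 6.0]])
y = np.array([1.0, 1.0, 9.0, 9.0])
j, thr, sse = best_split(X, y)
print(j, thr, sse)  # feature 0, threshold 2.0, SSE 0.0
```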
The main problem when defining the parameters of a decision tree is deciding when
to stop splitting. Torgo (1999) discusses that the more a tree is grown, the more unreliable it
gets, because each division reduces the number of samples. As the reduction of error is
evaluated on the data of the training set, this leads to an overfit of the model.
According to Tan (2006), the process of simplifying this model is called pruning, and
there are basically two methods of doing it. In the first, known as pre-pruning, stricter
conditions are imposed so that the tree is not allowed to grow past a certain size. In addition
to the condition that an additional split must decrease the MSE, the maximum number of
splits of a tree and the minimum number of observations at each Leaf can also be defined.
The second method, known as post-pruning, consists of growing a tree to its
maximum depth, followed by a process of eliminating the splits that give the least reduction
of MSE, until a maximum of accuracy is reached. According to Tan (2006), this method
normally yields better results than pre-pruning, because the latter can miss important splits
by stopping too early.
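Both strategies can be sketched with scikit-learn, used here purely for illustration (the parameter names `max_depth`, `min_samples_leaf` and `ccp_alpha` are that library's, not terms from the text): explicit growth limits give pre-pruning, while cost-complexity pruning cuts a fully grown tree back.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)  # noisy target

# Pre-pruning: stop growth early with explicit limits.
pre = DecisionTreeRegressor(max_depth=4, min_samples_leaf=10,
                            random_state=0).fit(X, y)

# Post-pruning: grow fully, then cut back via cost-complexity pruning.
post = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)

# Unpruned reference: grows until every noisy sample gets its own leaf.
full = DecisionTreeRegressor(random_state=0).fit(X, y)
print(full.get_n_leaves(), pre.get_n_leaves(), post.get_n_leaves())
```

Both pruned trees end up far smaller than the unpruned one, which essentially memorizes the noise.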
Even though Decision Trees have a series of advantages, they also have a high
instability that may degrade their quality, as a small change in the training set may lead to a
different choice when building a node. Moreover, their predictions in regression are highly
non-smooth, and their ability to model complex problems is restricted (TORGO, 1999).
There are several techniques to overcome these problems. One solution is the use of
ensemble methods, which train a series of Decision Trees and make an estimation by
combining the outputs of those trees (TAN, 2006).
The error of a predictor is composed of two competing properties: bias and variance.
According to Lantz (2015), variance refers to the amount by which the predictions would
change if the model were trained with a different training set, whereas bias refers to the error
introduced by approximating a real-life problem (which may be extremely complicated) by a
much simpler model. Normally, when more flexible methods are used, bias decreases (as the
model is able to better describe the phenomena) but variance increases.
A good way to understand these phenomena is by analyzing how polynomial curves
of different degrees fit a dataset, as represented by the graphs in figure 2, extracted from
Lantz (2015). On the left, certain experimental points are represented, as well as three models
that try to fit them: a linear regression (orange) and two smoothing spline fits (blue and green).
The graph on the right shows the MSE for the test set (red curve) and training set (gray curve)
for all models.
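The behavior in the figure can be reproduced numerically with synthetic data (the dataset below is invented for illustration, not the one from the figure): polynomials of degree 1, 4 and 15 play the roles of the underfit, well-fit and overfit models.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Noisy samples of a sine curve, the 'true' underlying phenomenon."""
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)

x_tr, y_tr = make_data(30)    # small training set
x_te, y_te = make_data(200)   # independent test set

results = {}
for degree in (1, 4, 15):
    coeffs = np.polyfit(x_tr, y_tr, degree)  # least-squares polynomial fit
    mse_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    results[degree] = (mse_tr, mse_te)
    print(f"degree {degree:2d}: train MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
```

Training error keeps falling as the degree grows, while test error falls and then rises again once the polynomial starts fitting the noise.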
The linear regression is too simple a model to represent the data, and thus it has a huge
bias and, consequently, a large MSE for both the training and the test set. When a more complex
model is used, such as the one represented by the blue line, the system can be better modeled
and the MSE falls. However, if even more flexibility is added to the model, as with the one
represented by the green line, it will represent the training set well, reducing the related MSE,
but will also lose its generalization capability, increasing its test set error. This is a case of
large variance. Ensemble methods work by trying to minimize the effect of these elements.
Tan (2006) states two conditions for an ensemble method to be better than
a single classifier: 1) the base classifiers must be independent of each other and 2) each
classifier must be better than a random choice. In practice, it is hard to have completely
independent classifiers, but there are methods to make them as uncorrelated as
possible. The first is the manipulation of the training set, a technique used by
Bagging and by Gradient Boosting. The second is the manipulation of the training features,
which is done by the Random Forest. Both of these methods work by reducing the variance of
the answer. Boosting, however, goes further: by iteratively improving the classifiers, it also
helps to reduce bias, at the expense of a risk of overfitting.
e = p̄ (1 − s²) / s² (2)
Breiman (2001) has also demonstrated that Random Forests are especially robust
against noise on the dependent variable (which can otherwise generate overfitting) and that they
can also deal well with datasets having a high number of weak, highly redundant variables. This
is the case of the present work: in order to predict the suction and discharge pressures of the
refrigeration fluid, the pressure values measured on a test bench are used as labels, and these
values are contaminated with noise, not representing the real pressure perfectly. Moreover, the
variables used as input come from FFT bands, so there may be a high number of independent
variables, depending on the number of bands used.
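This robustness can be illustrated with a small synthetic experiment (entirely invented data, mimicking the situation just described: a noisy label and many weak, redundant inputs), comparing a single tree against a forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 400
signal = rng.uniform(0, 1, n)
# 30 weak, highly redundant inputs: each is the same signal plus noise,
# loosely mimicking many overlapping FFT bands.
X = np.column_stack([signal + rng.normal(0, 0.5, n) for _ in range(30)])
y = signal + rng.normal(0, 0.1, n)  # noisy label, as with bench pressures

X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

rmse = lambda m: np.sqrt(np.mean((m.predict(X_te) - y_te) ** 2))
print(f"single tree RMSE: {rmse(tree):.3f}, forest RMSE: {rmse(forest):.3f}")
```

The single tree chases the label noise and the redundant inputs, while averaging many decorrelated trees keeps the forest's test error noticeably lower.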
2.1.3.2. LS Boost
LS Boost stands for Least Squares Boost and is a version of Gradient Boosting used
for regression problems. This method was presented by Friedman (2001) and also uses several
trees to make a prediction. However, instead of using independent predictors like the Random
Forest, LS Boost builds each tree based on the previous one, improving the accuracy of the
model at each iteration. It starts by initializing the model to a constant value, as represented
by equation 3.

F₀(x) = arg minᵧ ∑ᵢ L(yᵢ, γ) (3)
Where F₀ is the function generated by the model, L is the loss function, yᵢ is the value
of the dependent variable for each observation, γ is the constant value to be found and i
represents each observation. According to this equation, F₀ is assigned the value of γ that
minimizes the sum of the loss function over all observations.
Since the Least Squares Error was chosen as the loss function, Friedman (2001)
has shown that the result is the mean value of the dependent variable over all observations.
Therefore, the first iteration of the model simply calculates the mean of all measured pressures
and uses this value as the prediction for all observations.
The next step is the calculation of the so-called pseudo-residuals for each observation,
as defined in equation 4.
ỹᵢ = −[∂L(yᵢ, F(xᵢ)) / ∂F(xᵢ)] |F(x) = Fₘ₋₁(x), for i = 1, …, n (4)
As the Least Squares Error is being used, the pseudo-residuals are the actual residuals
(i.e. the difference between the actual values and the predicted ones), multiplied by two.
However, according to Friedman (2001), this procedure tends to overfit the data, and
thus a regularization must be carried out. This is achieved by multiplying the response of the
base learners by a learning rate ν, which normally has a value around 0.1.
Therefore, the final form of the prediction of the model at iteration m is expressed in
equation 8, where hₘ is the regression tree fitted to the pseudo-residuals at that iteration.

Fₘ(x) = Fₘ₋₁(x) + ν hₘ(x) (8)
This procedure is repeated several times, adding new trees to the model, until a
stopping criterion defined by the user is reached, such as the number of iterations M.
The whole procedure is summarized in figure 3 (Source: Author).
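The loop described above can be condensed into a short from-scratch sketch (an illustration of the procedure, not Friedman's reference implementation). With squared loss, fitting each tree to the plain residuals is equivalent to fitting the pseudo-residuals, since the factor of two is absorbed by the learning rate.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ls_boost(X, y, M=100, nu=0.1, max_depth=2):
    """LS Boost: F_0 = mean(y); each iteration fits a small tree to the
    residuals and adds nu times its prediction to the running model."""
    f0 = y.mean()                          # constant initial model (equation 3)
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(M):
        residuals = y - pred               # pseudo-residuals for squared loss
        t = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        t.fit(X, residuals)
        pred += nu * t.predict(X)          # shrunken update (learning rate nu)
        trees.append(t)
    return f0, trees

def predict(f0, trees, X, nu=0.1):
    return f0 + nu * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0])
f0, trees = ls_boost(X, y)
train_rmse = np.sqrt(np.mean((predict(f0, trees, X) - y) ** 2))
print(train_rmse)  # small training RMSE after 100 boosting iterations
```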
Tests carried out by Caruana (2006) show that LS Boost generally achieves higher
accuracy than Random Forests after calibration, as it can model highly complex conditions
more closely, although this result depends heavily on the problem being modeled. The results
also show that the performance of Boosted Trees is highly dependent on the model
hyperparameters: if no tuning is carried out, the Random Forest tends to achieve a much better
result.
As further explained in section 4.1.2, the dataset used for the model is composed of
features extracted from the FFT spectrum by dividing it into constant-length bands. Each band
yields one feature, so as the size of each band is reduced in order to obtain more localized
information, the number of features increases as well. There will therefore probably be bands
that do not contribute to the modeling of the pressure. Moreover, there are bands in different
measurement axes, or even in the same axis, that carry the same information and are thus
redundant. Two methods are tested to overcome this problem: Dimensionality Reduction and
Feature Subset Selection.
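The band-energy extraction just described can be sketched as follows; the sampling rate, band width, overlap and maximum frequency below are illustrative assumptions, not the values used in the experiments.

```python
import numpy as np

def band_energies(signal, fs, band_hz=50.0, overlap=0.5, f_max=2000.0):
    """Compute the FFT power spectrum and sum its energy inside linearly
    spaced, possibly overlapping frequency bands; one feature per band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    step = band_hz * (1.0 - overlap)           # band start spacing
    starts = np.arange(0.0, f_max - band_hz + 1e-9, step)
    return np.array([
        spectrum[(freqs >= f0) & (freqs < f0 + band_hz)].sum() for f0 in starts
    ])

fs = 10_000  # Hz, illustrative sampling rate
t = np.arange(0, 1, 1 / fs)
# Toy stand-in for a vibration signal: two sinusoidal components.
vib = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 730 * t)
feats = band_energies(vib, fs)
print(len(feats))  # 79 overlapping 50 Hz bands up to 2 kHz
```

Shrinking `band_hz` concentrates the information but multiplies the number of features, which is exactly the trade-off discussed above.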
As discussed by Tan (2006), there is a variety of benefits to the use of Dimensionality
Reduction. Firstly, many Machine Learning algorithms work better if the dimensionality is
lower, mainly because of the reduction of noise and the elimination of irrelevant features. It can
also lead to a more understandable model. Finally, the amount of time and memory required is
reduced.
Still according to Tan (2006), the term Dimensionality Reduction is often reserved for
those techniques that reduce the dimensionality of a dataset by creating new attributes that are
combinations of the old attributes. One such method is Principal Components Analysis
(PCA), which is the one used in this project.
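A minimal PCA sketch (using scikit-learn purely for illustration) on synthetic data with exactly the kind of redundancy discussed: many inputs generated from only a few underlying directions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
# 12 redundant attributes built from only 3 underlying directions, plus
# a small amount of noise.
X = base @ rng.normal(size=(3, 12)) + 0.01 * rng.normal(size=(200, 12))

pca = PCA(n_components=0.99)  # keep enough components for 99% of variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (200, 3): three combinations of the old attributes
```

Passing a fraction to `n_components` lets PCA itself decide how many new attributes are needed to retain the requested share of the variance.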
Dimensionality may also be reduced by selecting new attributes that are a subset of
the old ones, a technique known as Feature Subset Selection or Feature Selection (TAN,
2006). The technique used in this project for the selection of features is the Feature
Importance calculated by Random Forest, which is further explained in section 2.2.2.
Datasets often contain features that are irrelevant or redundant, so the feature set could
be reduced simply by choosing only those features that carry information about the problem
being solved. The ideal approach would be to try all possible subsets of features as input to
the model being developed and choose the one that produces the best result. This method,
however, is impractical, as the number of trials increases exponentially with the number of
features (TAN, 2006). Therefore, other techniques must be used.
A solution is to use a Decision Tree to calculate the Feature Importance, which
represents how much each feature contributes to the final model accuracy. This kind of
technique, in which the feature selection occurs naturally as part of the Machine Learning
algorithm, is known as an Embedded Approach (TAN, 2006).
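A sketch of this embedded approach, with scikit-learn's `feature_importances_` standing in for the importance measure defined in section 2.2.2 (the data is invented: only two of ten inputs actually drive the target).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
# Only features 0 and 1 carry information about the target.
y = 3 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, 300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
print(ranking[:2])  # the two informative features come out on top

# Keep only the most important features as a reduced input set.
X_selected = X[:, ranking[:2]]
```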
According to Breiman (2001, 2002), for the case of classification trees, the importance
of a feature Xⱼ is measured as the sum of the weighted impurity decreases p(t)Δi(sₜ, t) over all
nodes t in which Xⱼ is used, where p(t) is the proportion Nₜ/N (samples reaching t over the total
number of samples) and Δi is the reduction of impurity due to this specific split. For the case of
a regression tree, the reduction of impurity is simply replaced by the reduction of RMSE.
This process is presented in equation 9, where t represents a node, sₜ the split used at
node t, and v(sₜ) the variable used in split sₜ:

Imp(Xⱼ) = ∑ p(t)Δi(sₜ, t), summed over the nodes t where v(sₜ) = Xⱼ (9)
The same technique can be applied to Ensemble Methods (presented in section 2.1.3)
simply by averaging the importance across all trees T, as shown in equation 10 (BREIMAN,
2001, 2002), where N_T represents the total number of trees.

Imp(Xⱼ) = (1/N_T) ∑_T ∑ p(t)Δi(sₜ, t), with the inner sum over the nodes t of tree T where v(sₜ) = Xⱼ (10)
RMSE = √((1/n) ∑ᵢ (ŷᵢ − yᵢ)²) (11)
R² = 1 − ∑ᵢ (yᵢ − ŷᵢ)² / ∑ᵢ (yᵢ − ȳ)² (12)
For a better visualization of the quality of the final model, the Mean Absolute Error
(MAE) is also used, as it gives a more intuitive feeling of the model precision.
MAE = (1/n) ∑ᵢ |ŷᵢ − yᵢ| (13)
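The three metrics of equations 11 to 13 can be written directly; the toy prediction vectors below are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error, equation 11."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination, equation 12."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean Absolute Error, equation 13."""
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(round(rmse(y_true, y_pred), 4),
      round(r2(y_true, y_pred), 4),
      round(mae(y_true, y_pred), 4))  # 0.1581 0.98 0.15
```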
Machine Learning models often require careful tuning of their many hyperparameters.
However, this process is often done manually, based on experience, or with a brute-force
approach (SNOEK et al., 2012).
As an alternative, it is possible to use an automated procedure such as Bayesian
Optimization, a technique that has been shown to outperform other state-of-the-art global
optimization algorithms in a number of cases (SNOEK et al., 2012). This method has been used
in a series of cutting-edge Machine Learning projects, such as DeepMind’s AlphaGo (CHEN,
2018).
The version of the algorithm applied in this project creates a Gaussian Process function
to predict the response of the Machine Learning model to a set of hyperparameters, as described
by Snoek et al. (2012). The accuracy of this function improves as new sets of hyperparameters
are evaluated.
The algorithm must then choose the hyperparameters that minimize a predicted loss
function (the RMSE in the present work). There are several methods that can be used for
choosing the set of hyperparameters to be used at each iteration. The one used in this project
was the “expected-improvement-per-second-plus” method (MATHWORKS, 2019). It uses the
traditional Expected Improvement per Second algorithm (SNOEK et al., 2012), which
optimizes not only the accuracy but also the training time of the models tested, with a
modification to avoid over-exploiting an area, as proposed by Bull (2011).
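The loop just described can be condensed into a sketch using plain Expected Improvement rather than the per-second-plus variant; the one-dimensional loss function below is a made-up stand-in for a model's validation RMSE over a single hyperparameter.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def loss(h):
    """Invented stand-in for a model's validation RMSE vs a hyperparameter."""
    return (h - 0.3) ** 2 + 0.05 * np.sin(20 * h)

candidates = np.linspace(0, 1, 200).reshape(-1, 1)
H = np.array([[0.0], [0.5], [1.0]])        # initial hyperparameter evaluations
y = np.array([loss(h[0]) for h in H])

for _ in range(10):
    # Gaussian Process surrogate predicting the loss over the search space.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(H, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # Expected Improvement
    h_next = candidates[np.argmax(ei)]      # most promising point to try next
    H = np.vstack([H, h_next])
    y = np.append(y, loss(h_next[0]))

print(H[np.argmin(y)][0])  # hyperparameter with the lowest observed loss
```

Each iteration refits the surrogate and spends the next evaluation where improvement over the current best is most expected, balancing exploration (high sigma) and exploitation (low mu).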
This chapter begins with a general explanation about the compressor architecture and
its vibration characteristics. Following, it presents the experiments and data acquisition details.
It finishes with an explanation of the organization of the generated datasets.
The compressor has an induction electric motor that actuates a piston, which moves
in a reciprocating fashion inside a cylinder (A), similar to an internal combustion engine. When
this piston retracts, a region of low pressure is generated, making the fluid enter the cylinder.
The direction of movement of the piston is then inverted, and it starts exerting work
on the gas, compressing it. As a consequence, the pressure of the fluid increases until it
reaches a value that opens the discharge valve, injecting the pressurized fluid into the system.
The amount of pressure increase depends on the characteristics of the refrigeration
system and the speed of rotation of the compressor. By varying this speed, it is possible to
control the refrigeration capacity of the system.
In this specific architecture, the fluid at low pressure (suction pressure) is injected into
the whole cavity of the compressor through (C), entering the cylinder through an intake valve.
After being pressurized, the fluid exits the cylinder through a discharge valve and a pipe to the
rest of the refrigeration system (D). These valves are composed of flexible plates designed to
open at a specific pressure differential.
All moving parts of the compressor are assembled in the casing, which is mounted
over springs (F) in order to avoid transmitting their vibration to the shell (B). Apart from the
springs, the only physical connection of these parts to the shell is the discharge pipe, which is
also designed to reduce the transmitted vibration. The shell is composed of two parts (top and
bottom cap) which are welded together, forming a hermetic cavity. The only component present
in the upper part of the shell is the connection of the suction pipe.
The modelling of vibration and noise of compressors is a wide field of study and an
in-depth discussion of its vibration characteristics is beyond the scope of this project. Therefore,
focus is given only to the possible mechanisms of influence of the pressures over the vibration.
Soedel (2008) discusses that a variation in the conditions of the gas inside the shell
may change the gas resonances, as the speed of sound in the gas is modified. In addition, the
difference between the discharge and suction pressures may change the stiffness of the
discharge pipe. An increase in the pressure differential increases this stiffness, shifting the
natural frequencies of the pipe by a few hertz, which may be enough to cause resonance. This
can be visualized in equation 14, which gives the natural frequencies of a simplified, straight
discharge pipe, ignoring the effect of gas velocity (SOEDEL, 2008).
𝑓ₙ = (π n² / 4 l²) √(EI / m) √(1 + (𝑃_d − 𝑃_s) A l² / (E I π² n²))    (14)
where 𝑃_d is the discharge pressure, 𝑃_s is the suction pressure, A is the internal cross-sectional
area of the tube, l is the length of the tube, E is the Young's modulus, I is the area moment of
inertia, m is the mass of the tube per unit length and n = 1, 2, 3, …
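As a rough numerical illustration of equation 14, the sketch below evaluates the first natural frequency of a straight pipe for two pressure differentials. All geometric and material values are hypothetical and chosen only to show the order of magnitude of the shift; they do not correspond to the compressor studied here.

```python
import math

def pipe_natural_frequency(n, l, E, I, A, m, dp):
    """Natural frequency (Hz) of a straight pipe under a pressure
    differential dp = P_discharge - P_suction, following equation 14."""
    base = (math.pi * n**2) / (4.0 * l**2) * math.sqrt(E * I / m)
    correction = math.sqrt(1.0 + dp * A * l**2 / (E * I * math.pi**2 * n**2))
    return base * correction

# Hypothetical steel pipe: 3 mm outer and 2 mm inner diameter, 200 mm long.
E = 200e9                        # Young's modulus, Pa
d_o, d_i = 3e-3, 2e-3            # outer and inner diameters, m
I = math.pi * (d_o**4 - d_i**4) / 64.0          # area moment of inertia, m^4
A = math.pi * d_i**2 / 4.0                      # internal area, m^2
m = 7850.0 * math.pi * (d_o**2 - d_i**2) / 4.0  # mass per unit length, kg/m
l = 0.2                                         # pipe length, m

f_low = pipe_natural_frequency(1, l, E, I, A, m, dp=2e5)
f_high = pipe_natural_frequency(1, l, E, I, A, m, dp=12e5)
print(f_low, f_high)  # the larger pressure differential stiffens the pipe
```

With these invented values the shift is on the order of one hertz, consistent with the behavior described by Soedel (2008).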
There are two possible sources of vibration for the discharge pipe. The first is the
vibration of the compressor casing (cylinder, piston, etc.). The second occurs when the
discharge pipe has bends (which is the case for the compressor used in this study), in which
case the pressure pulses produced by the valves excite vibration at each bend. Because of
its slenderness, this pipe normally has several natural frequencies in the range of 100 Hz to
5000 Hz, which is the range of concern for typical household compressors. Therefore, the
occurrence of pipe resonances is almost inevitable, and they contribute considerably to the
shell vibration (SOEDEL, 2008). As the change in pressure shifts these natural frequencies,
an important relation between pressure and vibration is expected.
In the FEM simulation performed by Fulco (2008), the model received the measured
discharge pressure as input and calculated the vibration transmitted to the shell as output. Up
to 1000 Hz, the sound at the discharge pipe is caused basically by the displacement of the
casing, with virtually no influence of the pressure. However, above 2500 Hz the pressure starts
to have some influence, and at 6300 Hz there is an exceptionally high incidence of noise due
to the discharge pressure. According to the simulation, from 3150 Hz to 6300 Hz the intake
acoustic filter and the discharge pipe accounted for over 50% of the total sound power level.
Fulco (2008) also shows how the variation of pressure causes a cyclical variation of
the resisting force, which in turn causes a variation of the rotational speed and thus of the
vibration pattern of the casing. This effect is more pronounced in the lower part of the
vibration spectrum.
Walendowski (2017) used this fact to estimate the suction and discharge pressures of
compressors from measurements of electrical quantities of the motor. Figure 5 shows the
results of an experiment he conducted, in which the instantaneous torque was measured at
each angular position for 5 different pressure conditions. The effect that the pressures have
on the torque peak is quite evident.
This variation in torque also changes the vibration pattern of the casing, which is in
turn transmitted to the shell. Therefore, this is another way by which the pressure has some
influence over the shell vibration.
Figure 5 - Torque profile as a function of the angular position for 5 different pressure conditions
Soedel (2008) also discusses the effect of the valves on the compressor vibration and
acoustic noise. Due to the intermittent nature of piston compressors, the valves open and close
at a defined frequency, hitting the valve seat and causing pulses of pressure, which may
modulate the structure resonances.
Moreover, the valves may flutter, adding high-frequency components to the vibration.
However, the specific dynamics of the opening and closing acts more like a disturbance, and
those high-frequency components have little influence on the vibration when compared to the
frequency of rotation of the shaft.
Finally, gas pulsations of the suction manifold may cause a sloshing effect typically
around 200 Hz to 500 Hz. Turbulence can also have an influence, mostly in the high frequency
spectrum, around 3000 Hz to 6000 Hz (SOEDEL, 2008).
In order to train and test the Machine Learning models of this project, data collected
from a series of experiments carried out in a test rig built at the Laboratory of Metrology
(LABMETRO) of UFSC were used. This test rig is capable of controlling the pressure of
the refrigeration fluid at both intake and discharge during the operation of a compressor. A
schematic representation of the test rig is presented in figure 6.
Although the system measures and controls the suction and discharge pressures (𝑃_s
and 𝑃_d), these values were transformed into evaporation and condensing temperatures of the
refrigeration fluid (𝑇_e and 𝑇_c, respectively) before being used in the Machine Learning
models. The reason is that different refrigerants may be used depending on the system, which
means that the pressures may also vary. The design of refrigeration systems is therefore made
using the evaporation and condensing temperatures, which are the same regardless of the
refrigerant. It is thus convenient to use these temperatures as independent variables of the
model instead of the suction and discharge pressures, although they convey the same
information.
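The conversion from a measured pressure to the corresponding saturation temperature can be sketched as a table lookup with linear interpolation. The saturation table below is purely hypothetical and serves only to illustrate the idea; in practice, the actual saturation curve of the refrigerant in use would be employed.

```python
import bisect

# Hypothetical saturation table: (pressure in bar, saturation temperature
# in degrees Celsius). Values are illustrative only and do not correspond
# to any particular refrigerant.
SATURATION_TABLE = [
    (0.6, -35.0), (0.8, -30.0), (1.0, -25.0), (1.4, -18.0),
    (2.0, -10.0), (3.0, 0.0), (5.0, 14.0), (8.0, 30.0), (12.0, 45.0),
]

def saturation_temperature(pressure_bar):
    """Convert a measured pressure to the saturation temperature of the
    refrigerant by linear interpolation in the saturation table."""
    pressures = [p for p, _ in SATURATION_TABLE]
    if not pressures[0] <= pressure_bar <= pressures[-1]:
        raise ValueError("pressure outside table range")
    i = bisect.bisect_left(pressures, pressure_bar)
    if pressures[i] == pressure_bar:
        return SATURATION_TABLE[i][1]
    (p0, t0), (p1, t1) = SATURATION_TABLE[i - 1], SATURATION_TABLE[i]
    return t0 + (t1 - t0) * (pressure_bar - p0) / (p1 - p0)

print(saturation_temperature(1.0))   # exact table point
print(saturation_temperature(2.5))   # interpolated between table points
```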
The experiments used in this work considered five identical compressor samples. For
each compressor, a series of tests was made varying 𝑇_e and 𝑇_c, as well as the speed of
rotation. During these tests, the vibration of the compressor shell was measured using three
sensors at a sampling rate of 51.2 kHz, and the data were stored as vibratory velocity. The
sensors were mounted aligned with the X, Y and Z axes, as shown in figure 7.
The X-axis and Y-axis sensors were positioned on the bottom cap, whereas the Z-axis
sensor was positioned on the upper cap. The Z-axis is aligned with the axis of the springs of the
casing, the X-axis is aligned with the axis of the cylinder (which is horizontally oriented in this
design), and the Y-axis is perpendicular to the axis of the cylinder. The X-axis sensor is located
near the discharge pipe connection.
All the compressors were subjected to the same testing procedures. Two sets of
conditions were implemented. The first one (mapped measurements) used 3 levels of speed,
11 levels of 𝑇_e and 11 levels of 𝑇_c, leading to a total of 363 possible combinations. The
range of temperature values was defined using the application tables of the manufacturer. It
is worth noting that the 121 possible combinations of pressures were run by the system in a
random sequence, instead of a pre-defined one, in order to avoid systematic errors.
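The randomized test plan described above can be sketched as follows. The level values are placeholders, since the actual temperatures and speeds come from the manufacturer's application tables and are not reproduced here.

```python
import itertools
import random

# Placeholder levels: 3 speeds and 11 levels for each temperature.
speeds = [1, 2, 3]
t_evap = list(range(11))   # evaporation-temperature levels
t_cond = list(range(11))   # condensing-temperature levels

# For each speed, the 121 temperature combinations are visited in a
# random order to avoid systematic errors.
plan = []
for speed in speeds:
    grid = list(itertools.product(t_evap, t_cond))
    random.shuffle(grid)
    plan.extend((speed, te, tc) for te, tc in grid)

print(len(plan))  # 363 combinations in total
```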
The reason why 3 different speed values are used is that this model of compressor is
able to work at different speeds, so the proposed tool must be able to make predictions in any
of these conditions. As the vibration is highly dependent on the frequency of rotation, the
speed acts as an important disturbance to the model.
The experiment using these combinations was executed three times, and each
measurement lasted 10 seconds in order to reduce the effect of experimental noise. Only the
experiments using the first and second speeds on the first compressor were executed five
times instead of three.
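For reference, the amount of raw data produced by a single measurement follows directly from the acquisition parameters stated above:

```python
fs = 51_200     # sampling rate, Hz
duration = 10   # seconds per measurement
sensors = 3     # X, Y and Z axes
samples = fs * duration * sensors
print(samples)  # 1536000 samples per measurement
```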