
Anomaly Detection for Centrifuge Natural Gas Compressor Using LSTM-Based Autoencoder in Hai Thach – Moc Tinh Field, Offshore Vietnam

Hai H. Ngo - Bien Dong Petroleum Operating Company
Trung N. Tran, Tung V. Tran, Khoa Q. Dao, Trung T. Nguyen, Son K. Hoang - Bien Dong Petroleum Operating Company
Truong H. Trieu - Ha Noi University of Mining and Geology

ABSTRACT
Predictive Maintenance (PdM) practices are becoming increasingly popular in many industry
sectors because they reduce unnecessary maintenance operations and improve machinery
reliability. With the recent advancement of Industry 4.0, Artificial Intelligence and Machine
Learning have found wide applications in PdM practice to assist humans in processing,
continuously monitoring, and analyzing equipment health. The first step in a PdM program is
identifying or distinguishing between the equipment's normal and abnormal operating conditions.
At the Hai Thach - Moc Tinh (HT-MT) field, monitoring the natural gas compressor is crucial in the gas
condensate processing system. Therefore, research on anomaly detection was identified as a
prerequisite and pioneering step in applying a PdM program in the HT-MT field.
In this study, the authors used an improved LSTM-based autoencoder network for anomaly
detection on multivariate features of a natural gas compressor. The network, combined with the
Random Search optimization technique and the predefined set of hyperparameters, was trained
to achieve an optimized loss function of Mean Absolute Error (MAE). Then, the optimal threshold
value was selected to achieve the highest F-score among the optimized models. Two critical
hyperparameters, namely timesteps and activation functions, were tested for different runs to
obtain the model with the highest F-score value of 0.57143. The adaptable and accurate
network correctly detected all the anomalies and early findings while maintaining the overall
performance, for example by avoiding overfitting and applying an early stopping procedure. Indeed, an
early finding was spotted sooner than the ground truth by domain expert labeling, giving the
operators a warning earlier by 2 minutes. The improved LSTM-based autoencoder network
efficiently extracts essential features from the multivariate time series to correctly classify the
anomalies and minimize the misdiagnosis instances.
1. INTRODUCTION ABOUT ANOMALY DETECTION FOR MACHINERY

Anomaly detection is an important data mining task with applications in a wide range of
domains [1]. There is no firm consensus on distinguishing between anomalies and outliers, and the
two terms are often used interchangeably.
Anomaly detection refers to the problem of finding patterns in data that do not conform to
expected behavior. Often, anomalies in data are translated to significant and critical information in
a wide variety of applications such as intrusion detection, fraud detection, fault detection,
equipment health monitoring, image processing, and sensor networks. All these definitions
highlight two main characteristics of anomalies, namely: (1) the distribution or abnormal patterns
of anomalies deviate significantly from the general distribution of the data [2] and (2) a significant
portion of the data set consists of normal data points, in which the anomalies form only a tiny part
of the dataset [3].
Mechanical devices such as engines, vehicles, and aircraft are typically instrumented with
numerous sensors to capture the behavior and health of the machine. With the rise of the Fourth
Industrial Revolution (Industry 4.0), improvements of Information Technology infrastructure have
been made to support intelligent sensing, Internet of Things (IoT), extensive data collection, and
analytics tools; companies now can get the most out of their data.

Figure 1: Framework of data-driven machine monitoring systems

Data-driven methods are gaining popularity among various machine health monitoring
approaches due to advanced sensing and data analytic techniques. Data-driven decisions can be
made by analyzing historical events and trends in the past [4-6]. For instance, manual controls
and/or unmonitored environmental conditions or loads may lead to inherently unpredictable time
series. Detecting anomalies in such scenarios becomes a crucial need that utilizes Machine
Learning (ML) to process, continuously monitor, and analyze equipment health. The data-driven
models are trained on historical measured data and then make predictions on real-time data
to monitor changes in the operational status of the machine. In addition, the development of
advanced sensors and computing systems makes the research topic of data-driven machine
monitoring systems more and more attractive. The basic framework behind data-driven models for
machinery health monitoring, as illustrated in Figure 1, consists of four major parts: (1) data
acquisition (taking various sensor data as inputs), (2) feature extraction/selection/fusion, (3) model
training, validating and optimization, and (4) model prediction and assessment.
Depending on equipment types, different combinations of sensors are used so that the
captured data can reflect the degradation process of machinery. Sensors installed at the
equipment collect continuous time-series operational data. Some commonly used sensors are
accelerometers, acoustic emission sensors, infrared thermometers, current, temperature,
pressure sensors, etc. Anomaly detection in time-series is strongly linked to time-series analysis
and forecasting methods. Chalapathy and Chawla suggested that anomalies consist of three
different types [7]:
▪ Point anomalies: If a point deviates significantly from the rest of the data, it is considered a
point anomaly. Hence, a point $X_t$ is considered a point anomaly if its value differs
significantly from all the points in the interval $[X_{t-k}, X_{t+k}]$ for a sufficiently large $k$.
▪ Collective anomalies: There are cases where individual points are not anomalous, but a
sequence of points is labeled as an anomaly.
▪ Contextual anomalies: Some points can be normal in a particular context while detected
as an anomaly in another context. For example, a differential pressure of 1 bar in the start-
up stage is normal, while the same differential pressure in the normal stage is regarded as
an anomaly.
Therefore, in equipment health monitoring, anomalies are related to unexpected data from
sensors or processors (point anomalies), significant events/sequences (collective anomalies); or
unexpected behavior during transient stages that require analysis, monitoring, and evaluation.
Having robust anomaly detection tools helps companies avoid potential breakdowns or address
the problem proactively. With improved equipment performance monitoring techniques,
corporations can achieve greater operational and maintenance efficiency and effective growth
management. To obtain robust anomaly detection, commonly used methods are (1) statistical
methods, (2) classical machine learning methods, and (3) methods using Deep Learning [3]. In
the last decade, deep learning methods have made significant progress in detecting anomalies.
That has prompted us to approach deep learning methods in this study. The LSTM-based
autoencoder (LSTM-AE) used in this study was constructed as a sequence to sequence model
containing stacked LSTM layers. This network allows extracting essential features from the
multivariate time series more efficiently. The output of an anomaly detection algorithm can be one
of two types: (1) outlier scores to quantify the level of “outliers” of each data point, or (2) binary
labels indicating whether a data point is an outlier or not. This LSTM-AE scheme learns to
reconstruct normal time-series behavior and uses reconstruction error to detect binary labels for
each anomaly.

2. DEEP LEARNING APPROACHES FOR MACHINERY ANOMALY DETECTION

With the development of deep learning methods in the last few years, deep neural network
structures can extract local features that are robust and informative from the sequential input
(Convolutional Neural Network - CNN) and capture long-term dependencies in sequential data
(Long Short Term Memory - LSTM). Then, the fully connected layers and the linear
regression layer can help predict the target value to improve the prediction accuracy. The deep
learning methods are widely used in fault detection and diagnosis [8-13], and they are verified to
be effective for big data of multiple parameters to detect the faults for machinery [14-20].

2.1 Deep learning approaches for anomaly detection


Since deep learning has achieved tremendous results in tasks like object detection,
classification, segmentation, and time-series forecasting and analysis, many references
have been reviewed to provide an expansive overview of deep learning approaches, especially for
anomaly detection with time-series datasets [21, 22]. A comprehensive survey on deep learning
approaches for anomaly detection was conducted by Chalapathy and Chawla [7]. These deep
hybrid models for anomaly detection combine separate deep networks like Multi-Layer
Perceptrons (MLPs), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long
Short-Term Memory (LSTM), and deep learning-based autoencoders (AE) [3]. However, improving
performance often requires more complex deep learning models. Many other deep learning
networks in anomaly detection for machinery applications are listed in Table 1, like Stacked LSTM
[21], Convolutional Bi-Directional LSTM [22], and CNN-LSTM with Attention Mechanism [17].
Table 1: Deep learning models for anomaly detection

Deep Learning Network | Application | Reference
Deep Autoencoder (DA) network | Reconstructs the errors between input and output values to monitor the operation state of a wind turbine | [23]
Stacked LSTM network | Anomaly prediction of space shuttle and engine based on the probability distribution of residual vectors | [21]
Convolutional Bi-Directional LSTM | Predicts the tool wear for the CNC milling machine using CNN's local feature extraction capability and the ability of the LSTM network to capture temporal information | [22]
CNN-LSTM with Attention Mechanism | The CNN is combined with LSTM to extract the wind turbine's time features to enable condition monitoring and anomaly detection. The Attention Mechanism (AM) is designed for the LSTM to concentrate on the characteristics that significantly impact the output variables and improve the model's accuracy | [17]
Generative-Temporal Convolutional Neural Network | The generative adversarial network (GAN) was used as the feature extraction block, and the Time Convolutional Neural Network (TCNN) was used as the fault classifier for wind turbines | [19]
Deep neural networks (DNN) | Fault identification method based on rotor speed prediction error for wind turbine generators | [18]
Vanilla LSTM | Estimates the remaining useful life of an aircraft turbofan engine under complex operating conditions and strong background noise | [24]

The deep learning-based anomaly detection models are divided into unsupervised, semi-
supervised, and supervised methods. Due to the high cost of labeling data for supervised
Machine Learning (ML) algorithms, unsupervised approaches have been given substantial
attention in recent literature [25]. A semisupervised ML algorithm allows the use of unlabeled data
together with labeled data. However, semi-supervised learning has a problem with contamination
of training datasets, interpretation of results, and poor detection capabilities [25]. In this study, the
authors adopt the semi-supervised machine learning approach to mitigate the labeling cost,
because marking periods of normal operation in multivariate time-series data takes far less time
than point-by-point labeling of anomalies. In addition, transient changes such as load
transferring, well opening and closing, or changing operating modes are very frequent during the
operation of a compressor in the gas-condensate processing plant.

2.2 Research objectives


Recently, LSTM has emerged as a powerful technique to learn long-term dependencies and
effectively represent the relationship between current and previous events. An LSTM network-based
autoencoder scheme for anomaly detection learns to reconstruct “normal” multivariate time-series
behavior and uses reconstruction error vectors to detect anomalies. LSTM incorporates
representation learning and model training together to address multivariate data sequences,
capture long-term dependencies, and improve the model's generalization capability. Therefore,
the proposed methodology in this study was developed to:
▪ Present an effective method for detecting anomalies using an improved LSTM-based
Autoencoder (LSTM-AE) for multivariate time series data of a natural gas compressor
without any assumptions about the distribution of the prediction errors;
▪ Focus on selecting the optimized model using the Random Search optimization technique
with different hyperparameter sets based on the Mean Absolute Error (MAE) loss function
metrics;
▪ Compare the timesteps (sliding window) and the activation function of the improved LSTM
autoencoder among the optimized models to select the network with the highest F-score. The
main goal is to create an adaptable and accurate anomaly detection system with maximized
detection capabilities by classifying all the anomalies and early finding instances.

3. METHODOLOGY

In the last decade, deep learning approaches achieved tremendous progress in detecting
anomalies. In this section, we briefly review some artificial intelligence networks necessary to
build the proposed model to detect anomalies, including the LSTM, the autoencoder network, and
the methodology for the LSTM-AE model in this study. The ability of LSTM to learn patterns in
data over long sequences makes them suitable for time series forecasting or anomaly detection.
The autoencoder is an unsupervised feed-forward neural network that recreates the input data while
extracting its essential features through dimension reduction using nonlinear activation
functions and multiple neural layers. Taking advantage of both LSTM and autoencoder
networks, the LSTM-based autoencoder network was studied to reconstruct the multivariate time-
series inputs and then use the residual error to classify anomalies.

3.1 Dataset and data preprocessing techniques


In this particular problem, the dataset contains the multivariate time series inputs for each of
the following features: pressure, temperature, flow, and power of a centrifuge natural gas
compressor. This is data from multiple sensors, which are installed at different locations of the
compressor. Data is labeled with anomaly instances for further performance evaluation. The
anomalies of a gas compressor are usually related to phenomena such as measurement error,
spikes or noise in the input signal; data processing errors; short-term process or machinery
adjustments to adapt to changes in the system (setpoint changes, well opening, reducing
gas rate, line-up, etc.); and unexpected responses or machinery performance degradation in the long
term. According to a previous study by Vávra et al., it is recommended to check the number of
missing values in the datasets, which commonly result from data transaction problems. The missing
values can be replaced with zero, the mean of the feature, or the feature median [25]. However, in
this study, the datasets have no missing values, with timestamps at one-minute intervals.
The different scales of features can negatively affect their importance and, therefore, may
result in weak detection capabilities of the machine learning model [25]. The two standard data
scaling techniques, namely normalization (Equation 1) and standardization (Equation 2), are
compared in this study:

x' = \frac{x - \min(x)}{\max(x) - \min(x)}   (1)

x' = \frac{x - \bar{x}}{\sigma}   (2)

where $x$ is the original dataset, $x'$ is the scaled dataset, and $\bar{x}$ and $\sigma$ are the mean and standard
deviation.
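As a minimal illustration of these two scaling options, the sketch below applies equations (1) and (2) with scikit-learn's MinMaxScaler and StandardScaler; fitting the scaler on the training data only and reusing it for the validation and test sets is a common practice assumed here, not a detail stated by the authors.

# Hedged sketch of the feature scaling step, assuming scikit-learn is available and that the
# scaler is fitted on the (normal) training data and reused for validation and test data.
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def scale_features(x_train, x_val, x_test, method="normalization"):
    scaler = MinMaxScaler() if method == "normalization" else StandardScaler()
    scaler.fit(x_train)                      # learn min/max or mean/std from training data only
    return scaler.transform(x_train), scaler.transform(x_val), scaler.transform(x_test)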
According to other studies, the deep learning anomaly detection algorithms are implemented
semisupervised (one-class classification). As recommended by Vavra et al., for the semi-
supervised learning model [25], the training and validation dataset contains only normal class
data, and the test dataset includes all classes of a dataset. The data in the test dataset are
entirely separated from the training and validation dataset. This technique is also applied in Xiang
et al. and Sakurada and Yairi's studies for deep learning anomaly detection models [17, 26]. On
the other hand, for the stacked LSTM network, Malhotra et al. divided the normal sequence into
four sets: normal train, normal validation-1, normal validation-2, and normal test. The anomalous
sequence(s) are divided into anomalous validation and abnormal test [21, 27]. Zhao et al. applied
a three-fold training/testing data splitting technique consisting of two training and one testing
dataset [22]. In this study, the authors divided the dataset into a normal training set, a normal
validation set, and a test set that contains both normal and anomaly instances. During the training and
validation process, the LSTM-AE network is optimized to select the most suitable hyperparameter
set. The labeled anomalies and normal instances are used to calculate the threshold and
performance of the classification process.
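As an illustration of this splitting scheme, the sketch below assumes the sensor data is held in a pandas DataFrame indexed by timestamp with a boolean "label" column produced by the domain experts; the column name, split boundary, and validation fraction are hypothetical and not taken from the authors' implementation.

# Hedged sketch of the semi-supervised split: train/validation keep only normal data,
# while the test set keeps both normal and labeled anomaly instances.
import pandas as pd

def split_dataset(df: pd.DataFrame, split_time, val_fraction=0.2):
    train_val = df[df.index < split_time]
    test = df[df.index >= split_time]                  # contains normal and anomaly rows
    normal = train_val[~train_val["label"]]            # drop any labeled anomalies
    n_val = int(len(normal) * val_fraction)
    train, validation = normal.iloc[:-n_val], normal.iloc[-n_val:]
    return train, validation, test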

3.2 LSTM-based autoencoder model


3.2.1 Long Short Term Memory networks (LSTM)
LSTM, introduced by Hochreiter and Schmidhuber, is a Recurrent Neural Network (RNN) that
allows the network to retain long-term dependencies between data at a given time from many
timesteps before [28]. LSTM typically requires a significant amount of training time, but its
inference is fast, and it is often used for text or audio classification. Due to the
vanishing gradient problem during backpropagation for model training (the weight updates in the
backpropagation step become very small), traditional RNNs may not capture long-term
dependencies. Therefore, forget gates were introduced in LSTMs to handle the long-term
dependency problem and nonlinear dynamics in time series data [21]. An LSTM network contains a
chain of repeated neural cells, each containing input, output, and forget gates. This design
controls how much of the data is kept, forgotten, and delivered to the output. In addition,
logistic (sigmoid) gating and element-wise multiplication determine which elements of the long-
term memory should be erased. Its principle diagram is shown in Figure 2. A direct comparison of
popular variants of LSTM made by Greff et al. showed that these variations are almost the same;
a few among them are more efficient than others but only in some specific problems [29].
Figure 2: Schematic diagram of LSTM [17]

The decision on which information to discard is implemented by the forget gate $f_t$ and input gate $i_t$,
which can be denoted by:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)   (3)

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)   (4)

ReLU(x) = \max(0, x)   (5)

\tilde{C}_t = ReLU(W_C \cdot [h_{t-1}, x_t] + b_C)   (6)

where $x_t$ is the current input and $o_t$ is the output gate.
The output $h_t$ of the LSTM cell can be expressed by:

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t   (7)

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)   (8)

h_t = o_t * ReLU(C_t)   (9)

The matrices $(W_f, W_i, W_o, W_C)$ are the weight matrices of the forget, input, and output gates and the
cell, and $(b_f, b_i, b_o, b_C)$ are the corresponding biases. These parameters are not time-dependent and
do not change from one time step to another.
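To make equations (3)-(9) concrete, the NumPy sketch below performs one cell step of the ReLU-activated variant written above (a standard LSTM cell would use tanh in equations (6) and (9)); the weight matrices act on the concatenation [h_{t-1}, x_t], and all names and shapes are illustrative.

# Hedged sketch of a single LSTM cell step following equations (3)-(9).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)                          # equation (5)

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = np.concatenate([h_prev, x_t])                  # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)                       # forget gate, equation (3)
    i_t = sigmoid(W_i @ z + b_i)                       # input gate, equation (4)
    c_hat = relu(W_c @ z + b_c)                        # candidate cell state, equation (6)
    c_t = f_t * c_prev + i_t * c_hat                   # cell state update, equation (7)
    o_t = sigmoid(W_o @ z + b_o)                       # output gate, equation (8)
    h_t = o_t * relu(c_t)                              # hidden output, equation (9)
    return h_t, c_t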
The recurrent nature of LSTM makes it an appropriate method for sequence data, especially
complex multivariate time series [22]. Malhotra et al. suggested using the prediction error's
probability distribution: a stacked LSTM network with two hidden LSTM layers can mark
timestamps as normal or anomalous [21]. Later, Malhotra et al. performed multi-sensor anomaly
detection based on an LSTM encoder-decoder scheme [27]. Xiang et al. developed a CNN-
LSTM-AM model with a high-dimensional feature as LSTM input, and the internal dynamic
changes of features are learned [17]. Then, the exponential weighted moving average method
(EWMA) was applied to identify the operation state and predict the early anomaly state of wind
turbines by prediction residuals.
3.2.2 Autoencoder
Autoencoder is an unsupervised feed-forward neural network that recreates the input data
while extracting its features through different dimensions. Autoencoder network consists of the
input and output layer with the same dimension, an encoder and decoder neural network, and a
latent space (Figure 3).

Figure 3: Autoencoder including the encoding (dimension reduction) layers, decoding (reconstruction) layers, and the latent space

When the data is fed to the network, the encoder $E$ compresses the input array $X$ into the
latent space $Z$, whereas the decoder $D$ decompresses the encoded representation into an output
$\hat{X}$ with the same dimension as the input array (Figure 3). Formally, the reconstructed array
can be expressed by:

\hat{X} = D(E(X))   (10)

Then, the error is backpropagated through the architecture to update the weights $\theta_\phi$ and $\theta_\psi$
of the encoding and decoding parts. The optimization function of the autoencoder tries to minimize
the deviation between $X$ and $\hat{X}$ [3]:

\min_{\theta_\psi, \theta_\phi} \| X - \hat{X} \|^2 = \min_{\theta_\psi, \theta_\phi} \| X - D(E(X)) \|^2   (11)

By constraining the latent space to have a smaller dimension than the input, the autoencoder
is forced to learn the most critical features of the training data. Autoencoders are best suited to
semi-supervised learning approaches where the training data only consists of normal points. The
reconstruction error during training autoencoder network with normal instances is much lower
than feeding with an anomalous sequence. The comparison or deviation of error vector when
feeding with normal and abnormal data can be used to classify the anomaly [3, 25]. Since it is a
deep learning-based strategy, it particularly struggles when data is scarce. The computation cost
increases significantly with the network's depth, and a complicated architecture trained on big data
can lead to overfitting.
3.2.3 LSTM Autoencoder
In the field of anomaly detection with deep learning methods for machines, many studies
involve the development of the LSTM models as feature extractors. Several types of
autoencoders have been proposed in the literature, such as vanilla autoencoder, deep
autoencoder, convolutional autoencoder, regularized autoencoder, and LSTM autoencoder. The
LSTM-based autoencoder refers to an autoencoder in which both the encoder and the decoder
are LSTM networks. LSTM autoencoder models have been recently proposed for sequence-to-
sequence learning tasks like machine translation [30, 31] and for detecting anomalies in multi-sensor
time series using only the normal sequences [27]. Malhotra et al. showed that the LSTM and
stacked LSTM autoencoder network are robust and can detect anomalies from predictable,
unpredictable, periodic, aperiodic, and quasi-periodic time series [21, 27].
The LSTM-based autoencoder used in this study was constructed as a sequence to
sequence model containing stacked LSTM layers. This network allows extracting essential
features from the multivariate time series more efficiently. Indeed, the LSTM encoder learns a
fixed-length vector representation of the input time series, and the LSTM decoder uses this
representation to reconstruct the time series using the current hidden state and the value
predicted at the previous time step. The calculation procedure of the LSTM-based autoencoder
network in this study can be illustrated in Figure 4 and with the following detailed steps.
Step 1:
The dataset was divided into training, validating, and testing datasets. The training and the
validating datasets only contain normal instances, while the testing dataset contains some labeled
anomalies.
Let $X_t = \{X_t^{(1)}, X_t^{(2)}, X_t^{(3)}, \dots, X_t^{(n)}\}$ be an array that represents a multivariate time series of $t$
samples and $n$ features. In this particular problem, $X_t$ contains the time series of samples for each
of the following features: pressure, temperature, flow, and power of a centrifuge gas compressor.
Step 2:
In the training process, the dataset was reconstructed into 3 dimensions (3D) array with
timesteps of 𝑖. The sequence of observed data became a 3D array [sample 𝑡, timesteps 𝑖,
features 𝑛]. Then, the sequence of observations is scaled using the normalization (Equation 1) or
standardization (Equation 2) function.
X_t[t, n] = \{X_t^{(1)}, X_t^{(2)}, X_t^{(3)}, \dots, X_t^{(n)}\} \;\Rightarrow\; X'_t[t, i, n] = \{\{Y_i^n\}, \{Y_{i+1}^n\}, \dots, \{Y_t^n\}\}   (12)

Y_t^n = \begin{bmatrix} x_1^{(1)} & \cdots & x_i^{(1)} \\ \vdots & \ddots & \vdots \\ x_1^{(n)} & \cdots & x_i^{(n)} \end{bmatrix}   (13)
Figure 4: An illustration of the calculation procedure of the LSTM-based autoencoder network

Step 3:
In the training process, the array $X'_t[t, i, n]$ is fed to the LSTM-based autoencoder network to
create a reconstructed array $\hat{X}'_t[t, i, n]$. The network updates the weights and biases of the LSTM
layers to minimize the Mean Absolute Error (MAE), feeding each array $\{Y_i^n\}$ in turn up to the last
array of $X'_t[t, i, n]$.

\hat{Y}_t^n = \begin{bmatrix} \hat{x}_1^{(1)} & \cdots & \hat{x}_i^{(1)} \\ \vdots & \ddots & \vdots \\ \hat{x}_1^{(n)} & \cdots & \hat{x}_i^{(n)} \end{bmatrix}   (14)

\hat{X}'_t[t, i, n] = \{\{\hat{Y}_i^n\}, \{\hat{Y}_{i+1}^n\}, \dots, \{\hat{Y}_t^n\}\}   (15)

The weights and biases of the LSTM network are trained to minimize the loss function of MAE:

Loss(MAE) = \frac{1}{t} \sum_{k=1}^{t} \| X_k - \hat{X}_k \|   (16)

The LSTM-based autoencoder network was optimized with ranges of hyperparameters to
achieve the lowest Mean Absolute Error (MAE). An early stopping procedure was set up to reduce
the training time if the MAE metric did not decrease for a number of consecutive epochs.
Step 4:
After that, the array $\hat{X}'_t[t, i, n]$ (a 3D array [sample $t$, timesteps $i$, features $n$]) was re-arranged
into the original two-dimensional (2D) array with timesteps of $i$:

\hat{X}'_t[t, i, n] = \{\{\hat{Y}_i^n\}, \{\hat{Y}_{i+1}^n\}, \dots, \{\hat{Y}_t^n\}\} \;\Rightarrow\; \hat{X}_t[t, n] = \{\hat{X}_t^{(1)}, \hat{X}_t^{(2)}, \hat{X}_t^{(3)}, \dots, \hat{X}_t^{(n)}\}   (17)
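As a minimal sketch of Steps 2 to 4, the snippet below windows a scaled 2D series into the 3D tensor [samples, timesteps, features] and turns a reconstructed tensor back into a per-window MAE reconstruction error following equation (16); the function names and the use of model.predict are illustrative assumptions, not the authors' code.

# Hedged sketch of the windowing (Step 2) and reconstruction-error calculation (Steps 3-4).
import numpy as np

def make_windows(x_2d, timesteps):
    # x_2d has shape (t, n); returns windows of shape (t - timesteps + 1, timesteps, n)
    return np.stack([x_2d[k:k + timesteps] for k in range(len(x_2d) - timesteps + 1)])

def reconstruction_mae(x_windows, x_hat_windows):
    # mean absolute error per window, averaged over timesteps and features (equation 16)
    return np.mean(np.abs(x_windows - x_hat_windows), axis=(1, 2))

# Usage (illustrative): windows = make_windows(x_scaled, timesteps=2)
#                       errors = reconstruction_mae(windows, model.predict(windows))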

3.3 Hyperparameter optimization


The optimization process of the LSTM deep learning model includes the fine-tuning process
for the following hyperparameters: number of neurons, number of layers, number of epochs, batch
size, dropout size, activation function, the optimizer for algorithm training, etc. The dropout
technique has been used to prevent overfitting and reduce the regression error of the
Convolutional Bi-Directional LSTM model [22]. In a study by Nguyen et al., the learning rate, the
number of cells, and the dropout were optimized for the LSTM network [32]. Ioffe and Szegedy
suggested a regularization technique called Batch Normalization [33], which allows training with
higher learning rates and can serve as an alternative to dropout layers.
In this study, some model parameters need to be optimized based on the input data to
achieve the best performance. The authors decided to use the Random Search (RS) optimization
technique among the three well-known optimization algorithms (Random Search, Genetic
algorithms, and Tree-Structured Parzen Estimator). RS is simple and is usually the first choice
of optimization algorithm for machine learning models [25]. RS is related to the "Grid Search" (GS)
algorithm, which systematically evaluates all possible hyperparameter combinations; the downside
of GS is its enormous time and computation demand. Unlike GS, RS randomly samples
hyperparameters from the whole search space and is therefore more likely to reach an effective
solution within a limited computation budget [25].
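As an illustration of this choice, the sketch below draws random hyperparameter combinations from the ranges later listed in Table 2 and keeps the configuration with the lowest validation MAE; the build_and_train helper and the number of trials are assumptions for the example only.

# Hedged sketch of Random Search over a predefined hyperparameter space.
import random

search_space = {
    "units_1": list(range(32, 513, 32)),       # LSTM layers 1 and 4
    "units_2": list(range(32, 257, 32)),       # LSTM layers 2 and 3
    "activation": ["relu", "leaky_relu"],
    "l2": [0.0, 0.025, 0.05, 0.075, 0.1],
    "dropout": [0.0, 0.05, 0.10, 0.15, 0.20, 0.25],
    "learning_rate": [0.01, 0.001, 0.0001],
}

def random_search(n_trials, build_and_train):
    # build_and_train(params) is assumed to train the LSTM-AE and return the validation MAE
    best_params, best_mae = None, float("inf")
    for _ in range(n_trials):
        params = {key: random.choice(values) for key, values in search_space.items()}
        val_mae = build_and_train(params)
        if val_mae < best_mae:
            best_params, best_mae = params, val_mae
    return best_params, best_mae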

3.4 Performance assessment and evaluation metrics for anomaly detection


The nature of anomaly detection problems requires different techniques, and a given dataset
might work well only with particular algorithms and their hyperparameter sets. Therefore, each machine
learning model is evaluated to determine the best performance considering the complexity,
adaptability, and test accuracy.
3.4.1 Prediction error
To quantitatively evaluate the performances of several machine learning models, Zhao et al.
suggested two metrics, Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), for
tool wear prediction using the Convolutional Bi-Directional LSTM network [22]. The RMSE, MAE,
the Mean Absolute Percent Error (MAPE), and the coefficient of determination (r-square $R^2$) of the
deep learning model (CNN-LSTM-AM) were also used to identify the wind turbine's failed
operation [17]. The corresponding equations (18-22) for the calculations of these metrics are given
as follows:
MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|   (18)

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}   (19)

MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|   (20)

R^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (\bar{y} - y_i)^2}   (21)

\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i   (22)

where $n$ is the number of predicted points, and $y_i$ and $\hat{y}_i$ are the observed value and the predicted
value of the $i$-th point, respectively.
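The sketch below evaluates equations (18) to (22) with NumPy for illustrative arrays of observed and predicted values.

# Hedged sketch of the prediction-error metrics in equations (18)-(22).
import numpy as np

def prediction_error_metrics(y_true, y_pred):
    mae = np.mean(np.abs(y_pred - y_true))                                     # equation (18)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))                            # equation (19)
    mape = np.mean(np.abs((y_true - y_pred) / y_true))                         # equation (20)
    y_bar = np.mean(y_true)                                                    # equation (22)
    r2 = 1.0 - np.sum((y_pred - y_true) ** 2) / np.sum((y_bar - y_true) ** 2)  # equation (21)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}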
3.4.2 Binary confusion matrix
Several evaluation metrics can be considered to compare the different anomaly detection
methods. The binary confusion matrix expresses the relationship between predicted and actual
classes or prediction/reconstruction error between the predicted/forecasted and the actual values.
In addition to the binary confusion matrix, Vavra et al. applied the following evaluation metrics:
Matthews Correlation Coefficient (MCC), Precision, Recall, F-score, and False-positive rate (FPR)
[25]. Matthews Correlation Coefficient (MCC) expresses all aspects of the confusion matrix and is
resilient against the usage of unbalanced datasets.
TP is the True Positive classification, which is anomalous and detected as anomalous by the
algorithm. FP is the False Positive that are normal but have been incorrectly diagnosed as
anomalies. TN (True Negative) stands for the number of normal events correctly diagnosed as
normal. FN (False Negative) stands for the number of anomalies incorrectly classified as normal
events. The False-Positive Rate (FPR) is the most critical metric, expressing the fraction of normal
(negative) cases that are incorrectly flagged as anomalies. The True Positive Rate (TPR) and
False-Positive Rate (FPR) can be obtained by:

TPR = \frac{TP}{P}

FPR = \frac{FP}{N}
The comparison is also made by using the Precision, Recall, and F-score metrics. Precision
considers the False-Positive (FP) classification in the metric calculation. In comparison, the Recall
is used to evaluate how complete the result is.
Precision = \frac{TP}{TP + FP}

Recall = \frac{TP}{TP + FN}

F\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}
Munir et al. used this metric to evaluate their anomaly detection methods on different time
series [34]. Maya et al. also use the F-score in addition to the Recall and Precision [35]. Together
with Precision, Recall, and F-score metrics, two studies have applied Accuracy to the LSTM
autoencoder model as a feature extractor to extract essential multivariate time series input
representations [32, 36].
Accuracy = \frac{TP + TN}{TP + FP + TN + FN}
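For completeness, the sketch below computes these confusion-matrix metrics from binary ground-truth and predicted labels (1 = anomaly, 0 = normal); the arrays and guard clauses are illustrative.

# Hedged sketch of TPR/FPR, Precision, Recall, F-score, and Accuracy from binary labels.
import numpy as np

def classification_metrics(y_true, y_pred):
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # identical to TPR
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    f_score = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "FPR": fpr,
            "F-score": f_score, "accuracy": accuracy}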
3.4.3 Anomaly detection mechanism
The prediction residual or reconstruction error is used to evaluate the machine learning model
performance and to classify the anomalies. The reconstruction error has low values if test samples
are normal instances, while the residual error becomes large with anomalous samples for CNN-
LSTM-AM deep learning models [17]. According to Tran et al. study in 2019, Sheather and
Marron's kernel quantile estimator method was applied to estimate the threshold over the set of
error vectors between the predicted and observed values of the LSTM network [36]. The
proposed methodology by Tran et al. classified the anomaly samples with 94% accuracy, 96%
precision, and 86% F-score. In the studies by Malhotra et al., anomaly detection was classified
based on the reconstruction error of LSTM-based Autoencoder network, assuming that those
error vectors followed a Gaussian distribution [21, 27]. In addition, Nguyen et al. suggested
applying the One-Class Support Vector Machine (OCSVM) without any specific assumption of
data to detect the anomaly [32]. After learning within the hidden representations of autoencoders,
the OCSVM can define a hyperplane to separate anomalies from normal observations from these
independent error vectors. On the other hand, the detection capabilities are evaluated through
classification of the test dataset. Vávra et al. suggested using the precision/recall
curve to calculate the final threshold to distinguish the normal data from anomalies for semi-
supervised machine learning models [25].
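In the same spirit, a simple way to pick the final threshold without a distributional assumption is to sweep candidate values over the range of reconstruction errors on the labeled test set and keep the one that maximizes the F-score, as done later in Section 4.2; the grid resolution below is an illustrative assumption.

# Hedged sketch of threshold selection by maximizing the F-score over reconstruction errors.
import numpy as np

def select_threshold(errors, y_true, n_candidates=500):
    # errors: per-sample reconstruction MAE; y_true: 1 = anomaly, 0 = normal
    best_threshold, best_f = None, -1.0
    for threshold in np.linspace(errors.min(), errors.max(), n_candidates):
        y_pred = (errors > threshold).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        if f > best_f:
            best_threshold, best_f = threshold, f
    return best_threshold, best_f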

4. RESULTS AND DISCUSSION

In this paper, sensing values from the natural gas compressor were captured as a case
study. One set of data contained faults on July 26, 2021, due to excessive liquid carry-over
to the suction side of the compressor. This abnormal event might cause excessive
vibration, bearing damage, and rotor blade degradation. According to domain experts, the data
before and after the event was labeled as the normal operating condition of the machine.

Figure 5: Data splitting (blue dash line) and anomalies plot (red dash line) for three different
features

As shown in Figure 5, typical features are plotted sequentially in the time domain. The zoom-
in section of the dataset showed the separation between training and testing datasets. The
anomalies in the test dataset are highlighted in red dash lines. In total, we chose the six most
important features during the operation of the natural gas compressor, namely: pressure (2
features), temperature (2 features), flow (1 feature), and power consumption (1 feature). However,
there is no noticeable change in the trend, so the operating status of the compressor cannot be
directly determined from the change in any single parameter. Following those multivariate time
series plots, it is near-impossible for a human operator to interpret data in real-time during normal
operation, and the conventional systems cannot spot anomalous readings within a specified time
window. In a particular case, as illustrated in Figure 5, the operators might mistake the other
overshoot instances of the “PQPTI2180” feature for anomalies.
Since the slugging flow from remote wellhead to the processing plant is frequently observed
for a brownfield like the Hai Thach – Moc Tinh field, our task is to recognize these anomalies by
using the LSTM-based autoencoder and the F-score to seek a balance between Precision and
Recall. This is especially important since, when changing the loading of the centrifuge
compressor, the operators often have to adjust and monitor a lot of information simultaneously,
such as machine load, pressure fluctuation, well opening and closing, and gas and liquid flow.

4.1 Hyperparameter optimization


As listed in Table 2, the selection of the hyperparameters is cross-validated in a portion of
training and validating datasets. The training processing time of the proposed models varied from
15 to 20 hours with a workstation using Intel 10th Generation Core i7 CPU with 6 cores at 2.60
GHz, GPU NVIDIA Quadro T2000, and 32 GB RAM. The more the timesteps, the larger the input
tensor and the more the computation effort. Due to the computational cost of the LSTM-based
autoencoder model, an early stopping procedure was established in case the monitoring metric
did not change for a number of consecutive epochs. On the other hand, the prediction time took
less than a second for the LSTM-based autoencoder model.
Table 2: Hyperparameters optimization

Hyperparameter | Function | Optimization ranges
Number of layers and neurons | Explore additional hierarchical learning capacity by adding more layers and varying the number of neurons in each layer | 4 LSTM, 1 Dropout, 1 RepeatVector, and 1 TimeDistributed layer. LSTM Layer 1 = 32 to 512. LSTM Layer 2 = 32 to 256. LSTM Layer 3 = 32 to 256. LSTM Layer 4 = 32 to 512. 32-neuron step range per search for each layer.
Features and Timesteps | The selection of the sliding window or timestep for the input array | The number of features (6) is equal to the number of installed sensors on the compressor. Timesteps = 3 to 7 with 1 step range.
Activation function | Transform the weighted sum of the input into an output from one or many nodes in a network layer | Rectified Linear Unit (ReLU) and Leaky Rectified Linear Unit (LeakyReLU)
Optimization Algorithm | Investigate optimization algorithms to see if specific configurations to speed up or slow down learning can lead to benefits | Random Search
Loss Function | Find the loss function that provides the best performance in terms of the predefined metric | MAE
Regularization | Explore how weight regularization, such as L1 and L2, can slow down learning and overfitting of the network on some configurations | L2 regularization = 0 to 0.1 with 0.025 step range
Dropout | Fraction of the input units to drop (or weights frozen during training) | Dropout rate = 0 to 0.25 with 0.05 step range
Batch size and number of epochs | The batch size is 32 for all of the tests. The number of epochs is 10,000 for all of the tests with early stopping procedures. | The early stopping criterion during training and testing is 200 and 250 consecutive epochs, respectively.
Learning rate | The step size by which each node's weights are updated during training | 0.01, 0.001, 0.0001

The LSTM-based autoencoder network in this study consisted of 4 LSTM, 1 Dropout, 1
RepeatVector, and 1 TimeDistributed layers, as detailed in Table 3. During the training process,
other hyperparameters were changed in the Random Search process to find the optimized model
with the smallest MAE. The hyperparameters and optimization ranges are detailed in Table 2. The
learning rate was also optimized to ensure the convergence time of the model. Dropout layer and
L2 regularization were added to avoid overfitting for the network. Parts of the hidden outputs were
randomly masked so that these neurons would not influence the forward propagation during
training procedures. When it came to testing phases, the dropout was turned off, and the outputs
of all hidden neurons would affect model testing.
Table 3: Layer configurations for LSTM-based autoencoder

Layer | Optimization process | Layer sizing
Layer 1 LSTM | LSTM Optimization_1: number of neurons 32 to 512 with a 32-neuron step range per search; Activation function = ReLU and Leaky ReLU; L2 regularization = 0 to 0.1 with 0.025 step range; return_sequences = True | Input: Timesteps i x Features n. Output: Timesteps i x Optimization_1 No. neurons
Layer 2 LSTM | LSTM Optimization_2: number of neurons 32 to 256 with a 32-neuron step range per search; Activation function = ReLU and Leaky ReLU; L2 regularization = 0 to 0.1 with 0.025 step range; Dropout rate = 0 to 0.25 with 0.05 step range; return_sequences = True | Input: Timesteps i x Optimization_1 No. neurons. Output: 1 x Optimization_2 No. neurons
Layer 3 RepeatVector | RepeatVector(Timesteps i) | Input: 1 x Optimization_2 No. neurons. Output: Timesteps i x Optimization_2 No. neurons
Layer 4 LSTM | LSTM Optimization_2: number of neurons 32 to 256 with a 32-neuron step range per search; Activation function = ReLU and Leaky ReLU; L2 regularization = 0 to 0.1 with 0.025 step range; Dropout rate = 0 to 0.25 with 0.05 step range; return_sequences = True | Input: Timesteps i x Optimization_2 No. neurons. Output: Timesteps i x Optimization_2 No. neurons
Layer 5 LSTM | LSTM Optimization_1: number of neurons 32 to 512 with a 32-neuron step range per search; Activation function = ReLU and Leaky ReLU; L2 regularization = 0 to 0.1 with 0.025 step range; return_sequences = True | Input: Timesteps i x Optimization_2 No. neurons. Output: Timesteps i x Optimization_1 No. neurons
Layer 6 TimeDistributed | TimeDistributed(Dense(Features n)) | Input: Timesteps i x Optimization_1 No. neurons. Output: Timesteps i x Features n
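For illustration, the sketch below reconstructs the Table 3 architecture in Keras, instantiated with the optimum hyperparameters reported later in Section 4.3 (448/224/224/448 neurons, dropout 0.15, L2 = 0.025, ReLU activation, learning rate 0.001, MAE loss). The optimizer choice, dropout placement, and timestep value are assumptions, and the second encoder LSTM here returns only its final state so that RepeatVector receives a single latent vector; this is not the authors' original code.

# Hedged Keras sketch of the stacked LSTM autoencoder described in Table 3.
from tensorflow import keras
from tensorflow.keras import layers, regularizers

TIMESTEPS, N_FEATURES = 4, 6                     # assumed sliding window and feature count
reg = regularizers.l2(0.025)

model = keras.Sequential([
    layers.Input(shape=(TIMESTEPS, N_FEATURES)),
    layers.LSTM(448, activation="relu", kernel_regularizer=reg, return_sequences=True),
    layers.LSTM(224, activation="relu", kernel_regularizer=reg, return_sequences=False),
    layers.Dropout(0.15),                        # assumed placement of the single Dropout layer
    layers.RepeatVector(TIMESTEPS),              # repeat the latent vector for each timestep
    layers.LSTM(224, activation="relu", kernel_regularizer=reg, return_sequences=True),
    layers.LSTM(448, activation="relu", kernel_regularizer=reg, return_sequences=True),
    layers.TimeDistributed(layers.Dense(N_FEATURES)),   # reconstruct all features per timestep
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mae")

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=200,
                                           restore_best_weights=True)
# model.fit(train_windows, train_windows, epochs=10000, batch_size=32,
#           validation_data=(val_windows, val_windows), callbacks=[early_stop])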
Figure 6: MAE for training and validation dataset with early stopping at about 2400 epochs

Since the Random Search process was time-consuming and required substantial
computation costs, the authors proposed to use the early stopping procedure to reduce the
training time. As illustrated in Figure 6, MAE was selected as a metric to evaluate model loss
during training, and the early stopping procedure took effect when stopping the training process at
2400 epochs. With each run corresponding to a different timestep selection, an optimized network
with its own hyperparameter set is obtained through the Random Search process.
In this study, the selection of the sliding window or timestep was also an important parameter
when considering the long-term learning capacity of the LSTM network because it helped to
convert the original time series features into 3D tensors for LSTM models. In addition, it also
showed the influence of past samples on the present (recurrence). That means the authors can
optimize the network hyperparameter set for each timestep or sliding window size, which requires
updating the weights and biases of the network again for each run. In
addition, the activation function was also a critical hyperparameter for the LSTM network that
transformed the weighted sum of input to output for all of the nodes in a network layer [37].
Therefore, for each timestep and activation function, there was an optimized LSTM-AE network
with a combination of the different parameters listed in Table 2. Next, the authors compared the
results of the optimized model with different timesteps and activation and evaluated the effect of
these two values on the final result, which was the F-score.

4.2 Reconstruction error trend


The study aimed to provide a model with good results when training and validating with
standard evaluation metrics. Therefore, the prediction error was used to evaluate the
effectiveness of optimizing hyperparameters for the network and determining the appropriate
threshold values. In addition, the reconstruction error score was calculated as the Mean Absolute
Error (MAE) between the predicted and actual values. The reconstruction error or prediction
residual has low values if test samples are normal instances, while the error becomes large with
anomalous samples.
Figure 7: Reconstruction error distribution for Training (a) and Testing dataset (b)

The anomalies are separated from the dataset by selecting the appropriate threshold value.
As shown in Figure 7, the mean and standard deviation of the anomaly scores are different for the
training and testing datasets. Also, we can check the reconstruction error histogram against a
Gaussian distribution fitted from the mean and standard deviation to obtain the probability
density of the anomaly score. However, unlike the previous work of Malhotra et al. [21, 27], we do
not use a Gaussian distribution of the error vectors to classify anomalies. Using classification
techniques based on the Gaussian distribution, as shown in Figure 7, a large amount of data would be
mislabeled because the empirical error distribution does not fit a Gaussian.

Figure 8: Precision and Recall values versus Threshold for an optimized network with timesteps =
2 and activation function = Leaky ReLU

This study analyzed the reconstruction error of the multivariate time-series inputs to obtain
the highest F-score in a semi-supervised learning manner. The authors applied the F-score to
seek a balance between Precision and Recall. F-score is a robust metric for model classification
errors using a binary confusion matrix. As shown in Figure 8, as the threshold increases, the
Precision rate rises toward 100% while the Recall rate decreases significantly. On the other hand,
applying the authors' method in this study, the threshold value was selected between the minimum
and maximum values of the reconstruction error vector so that the F-score is maximized. In
this case, with timesteps = 2 and the activation function being ReLU, the optimal threshold value
is 0.070233, and the F-score value is 0.57143 with 2 False-positive and 1 False-negative
samples. As a result, this makes the machine learning algorithms for classification or anomaly
detection more efficient and reduces false detections.

4.3 Timestep and activation function selection


As shown in Table 4, ReLU and LeakyReLU give equivalent results in many cases, for
example, when the timesteps value is 3. With ReLU as the activation function and
timesteps of 2 and 4, the resulting maximum F-score is 0.57143. However, the MAE value when
timesteps = 2 is 0.07077, smaller than the MAE of 0.1189 when timesteps is 4. With
Leaky ReLU as the activation function, the best timesteps value is 4, giving a maximum F-score
of 0.5. With both types of activation function, as the timesteps increase, the F-score generally
decreases, and the MAE of the reconstruction error increases.
Indeed, as the value of the timestep increases, the ability to add historical information (or long-
term memory) to the current cell improves. For example, with timestep = 3, the two previous samples
($x_{t-2}$, $x_{t-1}$) and the current sample are used to calculate the reconstruction $\hat{x}_t$. The reconstruction
error between $x_t$ and $\hat{x}_t$ is then compared with a threshold value to decide whether $x_t$ is an anomaly. However, if $x_t$ is
an anomaly sample, it will also affect the reconstruction errors of $\hat{x}_{t+1}$ and $\hat{x}_{t+2}$. In addition, when
requiring more memory capability from previous and current block (long-term memory), more time
is needed to compute the weights and biases matrix for forgetting, input, output gates, and cell,
and the MAE of the reconstruction error increases, as shown in Table 4. Therefore, the larger the
timestep, the more likely it is that samples adjacent to the anomalies will be incorrectly labeled,
increasing the mislabeling rate. However, this also makes it possible to provide early warning
detection compared with ground truth anomalies.
Table 4: TPR, FPR, and Threshold value at F-score max

Timesteps | F-score max (ReLU) | Threshold @ F-score max (ReLU) | Confusion matrix (ReLU) | F-score max (Leaky ReLU) | Threshold @ F-score max (Leaky ReLU) | Confusion matrix (Leaky ReLU)
2 | 0.57143 | 0.0707765 | TPR = 0.667, FPR = 0.0006 | 0.1538 | 0.070233 | TPR = 0.333, FPR = 0.003
3 | 0.4 | 0.1359536 | TPR = 0.3333, FPR = 0.0003 | 0.4 | 0.126873 | TPR = 0.3333, FPR = 0.0003
4 | 0.57143 | 0.1188905 | TPR = 0.6667, FPR = 0.0006 | 0.5 | 0.145998 | TPR = 0.333, FPR = 0
5 | 0.5 | 0.095287 | TPR = 0.667, FPR = 0.001 | 0.2353 | 0.1222 | TPR = 0.667, FPR = 0.004
6 | 0.5 | 0.1915758 | TPR = 0.333, FPR = 0 | 0.444 | 0.10387 | TPR = 0.667, FPR = 0.0013
7 | 0.4 | 0.165682 | TPR = 0.333, FPR = 0.0003 | 0.333 | 0.220343 | TPR = 0.333, FPR = 0.0006

As previous studies have shown, the activation function is crucial for any deep learning
model. However, it can be seen that, with the same timestep value, ReLU gives better F-score
results. Many previous studies have assessed that Leaky ReLU is not always superior to plain
ReLU. The Leaky ReLU activation function fixed the “dying ReLU problem”, as it does not have
zero-slope parts and speeds up the training process [38]. The dying ReLU problem is likely to occur
when the learning rate is too high or there is a large negative bias. However, the learning rate was
optimized over values from 0.0001 to 0.01 in this study, so this issue was not observed.
Figure 9 shows the loss distribution (MAE) of all the data; the threshold value of 0.1189 is
selected to obtain the best F-score, and the anomaly instances have a much higher MAE of the
reconstruction error. By comparing the timestep and the activation function, the authors can select
the corresponding threshold value to obtain an F-score of 0.57143, a TPR of 0.667, and an FPR of
0.0003 with all the anomalies and an early finding correctly identified.
Figure 9: Loss distribution of the dataset and threshold value

Moreover, it can be seen from Figure 10 that anomalies are difficult for the operator to detect
with the conventional method when comparing data over a long period. Even when zoomed in,
the change is too small to detect. However, when evaluating the mean of the error vectors for six
features, the anomaly samples were detected accurately and 1 to 2 minutes earlier than ground
truth time by domain expert labeling. Detected anomalies (green lines in Figure 10b) were spotted
sooner than the ground truth and counted as False-negative. However, in this case, it gives an
early warning to the operator. This study's optimum LSTM-based autoencoder network consisted
of 4 LSTMs with 448, 224, 224, and 448 neurons, respectively, 1 Dropout of 0.15 fraction, 1
Repeat Vector, and 1 TimeDistributed layer. The activation function was ReLU, L2 regularization
of 0.025, the learning rate of 0.001, and MAE's loss function to provide the best performance
metric.


Figure 10: Ground truth (red dash line) and detected anomalies (green line in yellow highlight) for
three typical features (a) and with the zoom-in portion of the dataset (b)
5. CONCLUSIONS

Having robust anomaly detection tools enables organizations to leverage their existing data and
automatically compare incoming data with information from previous case histories to anticipate
abnormal equipment behavior before it happens, allowing them to stay ahead of potential
breakdowns or disruptions in service and to address them proactively instead of reacting to issues
as they arise. This paper discussed the improved LSTM-based autoencoder network for anomaly
detection for a centrifuge natural gas compressor. Different network hyperparameters were carefully
studied to obtain the optimized model with the maximum value of F-score. Significant conclusions
are highlighted as follows:
▪ A practical methodology for anomaly detection on a natural gas compressor was
developed using an improved LSTM-based Autoencoder (LSTM-AE) containing stacked
LSTM layers that efficiently extract essential features from the multivariate time series.
▪ The Random Search technique effectively optimized the improved LSTM-AE network to
obtain the lowest MAE of the reconstruction error vector for different hyperparameter sets such
as network architecture, the number of neurons, sliding window size, activation function,
regularization, dropout, batch size, and learning rate.
▪ An adaptable and accurate LSTM-AE network was developed to obtain the highest value
of F-score of 0.57143 with all the anomalies and an early finding correctly identified. The
network consisted of 4 LSTMs with 448, 224, 224, and 448 neurons, respectively, 1
Dropout of 0.15 fraction, 1 Repeat Vector, and 1 TimeDistributed layer. The activation
function was ReLU, L2 regularization of 0.025, the learning rate of 0.001, and MAE's loss
function to provide the best performance metric.
The improved LSTM-based autoencoder for anomaly detection has opened up a positive
research direction in intelligent management and monitoring of oil and gas processing equipment
and well performance with different production profiles. Thereby, the developed results should
have significant contributions to Machine Learning applications in oil and gas operation and
production areas.
6. LIST OF FIGURES

Figure 1: Framework of data-driven machine monitoring systems
Figure 2: Schematic diagram of LSTM [17]
Figure 3: Autoencoder including the encoding (dimension reduction) layers, decoding (reconstruction) layers, and the latent space
Figure 4: An illustration of the calculation procedure of the LSTM-based autoencoder network
Figure 5: Data splitting (blue dash line) and anomalies plot (red dash line) for three different features
Figure 6: MAE for training and validation dataset with early stopping at about 2400 epochs
Figure 7: Reconstruction error distribution for Training (a) and Testing dataset (b)
Figure 8: Precision and Recall values versus Threshold for an optimized network with timesteps = 2 and activation function = Leaky ReLU
Figure 9: Loss distribution of the dataset and threshold value
Figure 10: Ground truth (red dash line) and detected anomalies (green line in yellow highlight) for three typical features (a) and with the zoom-in portion of the dataset (b)

7. NOMENCLATURE

AM Attention Mechanism

CNN Convolutional Neural Network

DNN Deep neural networks

GAN Generative Adversarial Networks

HT-MT Hai Thach – Moc Tinh field

IT Information Technology

LSTM-AE Long Short Term Memory based Autoencoder


network

LSTM Long Short Term Memory network

ML Machine Learning

MLP Multi-Layer Perceptrons

MAE Mean Absolute Error

MAPE Mean Absolute Percent Error

MCC Matthews Correlation Coefficient

MSE Mean Square Error

OCSVM One-class Support Vector Machine


ReLU Rectified Linear Unit

LeakyReLU Leaky Rectified Linear Unit

RMSE Root Mean Squared Error

RNN Recurrent Neural Network

RS Random Search

TCNN Time Convolutional Neural Network

8. ACKNOWLEDGEMENT

The research work described herein is part of the government research project number
077.2021.CNKK.QG/HĐKHCN, order 196/QD-BCT of the Ministry of Industry and Trade of the
Socialist Republic of Viet Nam.

9. REFERENCES

[1] T. R. Bandaragoda, K. M. Ting, D. Albrecht, F. T. Liu, Y. Zhu, and J. R. Wells, "Isolation-


based anomaly detection using nearest-neighbor ensembles," vol. 34, no. 4, pp. 968-998,
2018.
[2] S. Thudumu, P. Branch, J. Jin, and J. Singh, "A comprehensive survey of anomaly
detection techniques for high dimensional big data," Journal of Big Data, vol. 7, no. 1, p.
42, 2020/07/02 2020.
[3] M. Braei and S. Wagner, "Anomaly Detection in Univariate Time-series: A Survey on the
State-of-the-Art," vol. abs/2004.00433, 2020.
[4] T. Tran Vu, T. Tran Ngoc, H. Ngo Huu, and T. Nguyen Thanh, "Digital transformation in oil
and gas companies - A case study of Bien Dong POC," Petrovietnam Journal, vol. 10, pp.
67-78, 2020/10/30 2020.
[5] T. N. Trung, T. V. Tùng, H. K. Sơn, N. H. Hải, and Đ. Q. Khoa, "Thực tiễn triển khai nền
tảng số hóa tập trung tại mỏ Hải Thạch - Mộc Tinh," PETROVIETNAM JOURNAL, vol. Số
12 - 2020, trang 47 - 56, no. Số 12 - 2020, pp. 47 - 56, 2020.
[6] T. N. Trung et al., "Virtual Multiphase Flowmetering Using Adaptive Neuro-Fuzzy
Inference System (ANFIS): A Case Study of Hai Thach-Moc Tinh Field, Offshore
Vietnam," SPE Journal, pp. 1-15, 2021.
[7] R. Chalapathy and S. Chawla, "Deep Learning for Anomaly Detection: A Survey," 2019.
[8] X. Ding and Q. He, "Energy-Fluctuated Multiscale Feature Learning With Deep ConvNet
for Intelligent Spindle Bearing Fault Diagnosis," IEEE Transactions on Instrumentation and
Measurement, vol. 66, no. 8, pp. 1926-1935, 2017.
[9] Y. Gao, X. Liu, and J. Xiang, "FEM Simulation-Based Generative Adversarial Networks to
Detect Bearing Faults," IEEE Transactions on Industrial Informatics, vol. 16, no. 7, pp.
4961-4971, 2020.
[10] X. Liu, H. Huang, and J. Xiang, "A personalized diagnosis method to detect faults in gears
using numerical simulation and extreme learning machine," Knowledge-Based Systems,
vol. 195, p. 105653, 02/01 2020.
[11] Z. Pan, Z. Meng, Z. Chen, W. Gao, and Y. Shi, "A two-stage method based on extreme
learning machine for predicting the remaining useful life of rolling-element bearings,"
Mechanical Systems and Signal Processing, vol. 144, p. 106899, 2020/10/01/ 2020.
[12] W. Zhang, X. Li, and Q. Ding, "Deep residual learning-based fault diagnosis method for
rotating machinery," ISA Transactions, vol. 95, pp. 295-305, 2019/12/01/ 2019.
[13] W. Zhang, X. Li, X.-D. Jia, H. Ma, Z. Luo, and X. Li, "Machinery fault diagnosis with
imbalanced data using deep generative adversarial networks," Measurement, vol. 152, p.
107377, 2020/02/01/ 2020.
[14] L. Jing, M. Zhao, P. Li, and X. Xu, "A convolutional neural network based feature learning
and fault diagnosis method for the condition monitoring of gearbox," Measurement, vol.
111, pp. 1-10, 2017/12/01/ 2017.
[15] X. Guo, L. Chen, and C. Shen, "Hierarchical adaptive deep convolution neural network
and its application to bearing fault diagnosis," Measurement, vol. 93, pp. 490-502,
2016/11/01/ 2016.
[16] L. Yu, J. Qu, F. Gao, and Y. Tian, "A Novel Hierarchical Algorithm for Bearing Fault
Diagnosis Based on Stacked LSTM," Shock and Vibration, vol. 2019, pp. 1-10, 01/06
2019.
[17] L. Xiang, P. Wang, X. Yang, A. Hu, and H. Su, "Fault detection of wind turbine based on
SCADA data analysis using CNN and LSTM with attention mechanism," Measurement,
vol. 175, p. 109094, 2021/04/01/ 2021.
[18] W. Teng, H. Cheng, X. Ding, Y. Liu, Z. Ma, and H. Mu, "A DNN-based approach for fault
detection in a direct drive wind turbine," IET Renewable Power Generation, vol. 12, 05/14
2018.
[19] S. Afrasiabi, M. Afrasiabi, B. Parang, M. Mohammadi, M. Arefi, and M. Rastegar, Wind
Turbine Fault Diagnosis with Generative-Temporal Convolutional Neural Network. 2019,
pp. 1-5.
[20] L. Wang, Z. Zhang, J. Xu, and R. Liu, "Wind Turbine Blade Breakage Monitoring With
Deep Autoencoders," IEEE Transactions on Smart Grid, vol. 9, no. 4, pp. 2824-2833,
2018.
[21] P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, Long Short Term Memory Networks for
Anomaly Detection in Time Series. 2015.
[22] R. Zhao, R. Yan, J. Wang, and K. Mao, "Learning to Monitor Machine Health with
Convolutional Bi-Directional LSTM Networks," vol. 17, no. 2, p. 273, 2017.
[23] H. Zhao, H. Liu, W. Hu, and X. Yan, "Anomaly detection and fault analysis of wind turbine
components based on deep learning network," Renewable Energy, vol. 127, pp. 825-834,
2018/11/01/ 2018.
[24] Y. Wu, M. Yuan, S. Dong, L. Lin, and Y. Liu, "Remaining useful life estimation of
engineered systems using vanilla LSTM neural networks," Neurocomputing, vol. 275, pp.
167-179, 2018/01/31/ 2018.
[25] J. Vávra, M. Hromada, L. Lukáš, and J. Dworzecki, "Adaptive anomaly detection system
based on machine learning algorithms in an industrial control environment," International
Journal of Critical Infrastructure Protection, vol. 34, p. 100446, 2021/09/01/ 2021.
[26] M. Sakurada and T. Yairi, Anomaly Detection Using Autoencoders with Nonlinear
Dimensionality Reduction. 2014, pp. 4-11.
[27] P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and G. Shroff, "LSTM-based
Encoder-Decoder for Multi-sensor Anomaly Detection," 07/01 2016.
[28] S. Hochreiter and J. Schmidhuber, "Long Short-term Memory," Neural computation, vol. 9,
pp. 1735-80, 12/01 1997.
[29] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A
Search Space Odyssey," IEEE Transactions on Neural Networks and Learning Systems,
vol. 28, no. 10, pp. 2222-2232, 2017.
[30] K. Cho, B. van Merriënboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio,
"Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine
Translation," 06/03 2014.
[31] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to Sequence Learning with Neural
Networks," in NIPS, 2014.
[32] H. D. Nguyen, K. P. Tran, S. Thomassey, and M. Hamad, "Forecasting and Anomaly
Detection approaches using LSTM and LSTM Autoencoder techniques with the
applications in supply chain management," International Journal of Information
Management, vol. 57, p. 102282, 2021/04/01/ 2021.
[33] S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by
reducing internal covariate shift," presented at the Proceedings of the 32nd International
Conference on International Conference on Machine Learning - Volume 37, Lille, France,
2015.
[34] M. Munir, S. A. Siddiqui, A. Dengel, and S. Ahmed, "DeepAnT: A Deep Learning Approach
for Unsupervised Anomaly Detection in Time Series," IEEE Access, vol. 7, pp. 1991-2005,
2019.
[35] S. Maya, K. Ueno, and T. Nishikawa, "dLSTM: a new approach for anomaly detection
using deep learning with delayed prediction," International Journal of Data Science and
Analytics, vol. 8, 09/01 2019.
[36] K. P. Tran, H. D. Nguyen, and S. Thomassey, "Anomaly detection using Long Short Term
Memory Networks and its applications in Supply Chain Management," IFAC-
PapersOnLine, vol. 52, no. 13, pp. 2408-2412, 2019/01/01/ 2019.
[37] V. K and S. K, "Towards activation function search for long short-term model network: A
differential evolution based approach," Journal of King Saud University - Computer and
Information Sciences, 2020/05/13/ 2020.
[38] S. Sharma, S. Sharma, and A. Athaiya, "ACTIVATION FUNCTIONS IN NEURAL
NETWORKS," International Journal of Engineering Applied Sciences and Technology, vol.
04, pp. 310-316, 05/10 2020.
