2022-IPE3 - Anomaly Detection For Centrifuge Natural Gas Compressor Using LSTM Based Autoencoder - Biendong POC
ABSTRACT
Predictive Maintenance (PdM) practices are becoming increasingly popular in many industry
sectors because they reduce unnecessary maintenance operations and improve machinery
reliability. With the recent advancement of Industry 4.0, Artificial Intelligence and Machine
Learning have found wide applications in PdM practice to assist humans in processing,
continuously monitoring, and analyzing equipment health. The first step in a PdM program is
identifying and distinguishing between the equipment's normal and abnormal operating conditions.
At Hai Thach - Moc Tinh (HT-MT) field, monitoring the natural gas compressor is crucial in the gas
condensate processing system. Therefore, research on anomaly detection was identified as a
prerequisite and pioneering step in applying a PdM program in the HT-MT field.
In this study, the authors used an improved LSTM-based autoencoder network for anomaly
detection on multivariate features of a natural gas compressor. The network, combined with the
Random Search optimization technique and a predefined set of hyperparameters, was trained
to minimize a Mean Absolute Error (MAE) loss function. Then, the optimal threshold
value was selected to achieve the highest F-score among the optimized models. Two critical
hyperparameters, namely timesteps and activation functions, were tested across different runs to
obtain the model with the highest F-score value of 0.57143. The adaptable and accurate
network correctly detected all the anomalies and early findings while maintaining overall
performance, avoiding overfitting through an early stopping procedure. Indeed, an
early finding was spotted sooner than the ground truth established by domain expert labeling, giving the
operators a warning 2 minutes earlier. The improved LSTM-based autoencoder network
efficiently extracts essential features from the multivariate time series to correctly classify the
anomalies and minimize misdiagnosis instances.
1. INTRODUCTION ABOUT ANOMALY DETECTION FOR MACHINERY
Anomaly detection is an important data mining task with applications in a wide range of
domains [1]. There is no consensus on distinguishing anomalies from outliers, and the two terms are often used interchangeably.
Anomaly detection refers to the problem of finding patterns in data that do not conform to
expected behavior. Often, anomalies in data are translated to significant and critical information in
a wide variety of applications such as intrusion detection, fraud detection, fault detection,
equipment health monitoring, image processing, and sensor networks. All these definitions
highlight two main characteristics of anomalies, namely: (1) the distribution or abnormal patterns
of anomalies deviate significantly from the general distribution of the data [2] and (2) a significant
portion of the data set consists of normal data points, in which the anomalies form only a tiny part
of the dataset [3].
Mechanical devices such as engines, vehicles, and aircraft are typically instrumented with
numerous sensors to capture the behavior and health of the machine. With the rise of the Fourth
Industrial Revolution (Industry 4.0), improvements of Information Technology infrastructure have
been made to support intelligent sensing, Internet of Things (IoT), extensive data collection, and
analytics tools; companies now can get the most out of their data.
Data-driven methods are gaining popularity among various machine health monitoring
approaches due to advanced sensing and data analytic techniques. Data-driven decisions can be
made by analyzing historical events and trends in the past [4-6]. For instance, manual controls
and/or unmonitored environmental conditions or loads may lead to inherently unpredictable time
series. Detecting anomalies in such scenarios becomes a crucial need that utilizes Machine
Learning (ML) to process, continuously monitor, and analyze equipment health. The data-driven
models focus on training based on historical measured data and then predict with real-time data
to monitor the operational status of the machine changes. In addition, the development of
advanced sensors and computing systems makes the research topic of data-driven machine
monitoring systems more and more attractive. The basic framework behind data-driven models for
machinery health monitoring, as illustrated in Figure 1, consists of four major parts: (1) data
acquisition (taking various sensor data as inputs), (2) feature extraction/selection/fusion, (3) model
training, validation, and optimization, and (4) model prediction and assessment.
Depending on equipment types, different combinations of sensors are used so that the
captured data can reflect the degradation process of machinery. Sensors installed at the
equipment collect continuous time-series operational data. Some commonly used sensors are
accelerometers, acoustic emission sensors, infrared thermometers, current, temperature,
pressure sensors, etc. Anomaly detection in time-series is strongly linked to time-series analysis
and forecasting methods. Chalapathy and Chawla suggested that anomalies consist of three
different types [7]:
▪ Point anomalies: If a point deviates significantly from the rest of the data, it is considered a
point anomaly. Hence, a point 𝑋𝑡 is considered a point anomaly if its value differs
significantly from all the points in the interval [𝑋𝑡−𝑘, 𝑋𝑡+𝑘], where 𝑘 ∈ ℝ is sufficiently
large.
▪ Collective anomalies: There are cases where individual points are not anomalous, but a
sequence of points is labeled as an anomaly.
▪ Contextual anomalies: Some points can be normal in a particular context while detected
as an anomaly in another context. For example, a differential pressure of 1 bar in the start-
up stage is normal, while the same differential pressure in the normal stage is regarded as
an anomaly.
Therefore, in equipment health monitoring, anomalies are related to unexpected data from
sensors or processors (point anomalies), significant events/sequences (collective anomalies); or
unexpected behavior during transient stages that require analysis, monitoring, and evaluation.
Having robust anomaly detection tools helps companies avoid potential breakdowns or address
the problem proactively. With improved equipment performance monitoring techniques,
corporations can achieve greater operational and maintenance efficiency and effective growth
management. To obtain robust anomaly detection, commonly used methods are (1) statistical
methods, (2) classical machine learning methods, and (3) methods using Deep Learning [3]. In
the last decade, deep learning methods have made significant progress in detecting anomalies.
That has prompted us to approach deep learning methods in this study. The LSTM-based
autoencoder (LSTM-AE) used in this study was constructed as a sequence to sequence model
containing stacked LSTM layers. This network allows extracting essential features from the
multivariate time series more efficiently. The output of an anomaly detection algorithm can be one
of two types: (1) outlier scores that quantify the degree of abnormality of each data point, or (2) binary
labels indicating whether a data point is an outlier or not. This LSTM-AE scheme learns to
reconstruct normal time-series behavior and uses the reconstruction error to assign a binary label
to each instance.
With the development of deep learning methods in the last few years, deep neural network
structures can extract robust and informative local features from sequential input
(Convolutional Neural Network - CNN) and capture long-term dependencies in sequential
data (Long Short Term Memory - LSTM). Then, the fully connected layers and the linear
regression layer can help predict the target value to improve the prediction accuracy. The deep
learning methods are widely used in fault detection and diagnosis [8-13], and they are verified to
be effective on big multi-parameter data to detect faults in machinery [14-20].
Network | Application | Ref.
Deep Autoencoder (DA) | Reconstructs the errors between input and output values of the network to monitor the operation state of a wind turbine | [23]
Stacked LSTM network | Anomaly prediction for a space shuttle and an engine based on the probability distribution of residual vectors | [21]
Convolutional Bi-Directional LSTM | Predicts the tool wear for a CNC milling machine using CNN's local feature extraction capability and the ability of the LSTM network to capture temporal information | [22]
CNN-LSTM with Attention Mechanism | The CNN is combined with LSTM to extract the wind turbine's time features to enable condition monitoring and anomaly detection. AM is designed for LSTM to concentrate on the characteristics that significantly impact output variables to improve the model's accuracy. | [17]
Deep neural networks (DNN) | Fault identification method based on rotor speed prediction error for wind turbine generators | [18]
Vanilla LSTM | Estimates the remaining useful life of an aircraft turbofan engine under complex operating conditions and strong background noise | [24]
The deep learning-based anomaly detection models are divided into unsupervised, semi-
supervised, and supervised methods. Due to the high cost of labeling data for supervised
Machine Learning (ML) algorithms, unsupervised approaches have been given substantial
attention in recent literature [25]. A semi-supervised ML algorithm allows the use of unlabeled data
together with labeled data, although semi-supervised learning can suffer from contamination
of training datasets, difficult interpretation of results, and poor detection capabilities [25]. In this study, the
authors focus on the semi-supervised machine learning approach because the time needed to
highlight anomalies in multivariate time-series data during normal operation
is much less than point-by-point labeling. In addition, transient changes such as load
transferring, well opening and closing, or changing operating modes are very frequent during the
operation of a compressor in the gas-condensate processing plant.
3. METHODOLOGY
In the last decade, deep learning approaches achieved tremendous progress in detecting
anomalies. In this section, we briefly review some artificial intelligence networks necessary to
build the proposed model to detect anomalies, including the LSTM, the autoencoder network, and
the methodology for the LSTM-AE model in this study. The ability of LSTM to learn patterns in
data over long sequences makes it suitable for time series forecasting or anomaly detection.
An autoencoder is an unsupervised feed-forward neural network that recreates the input data while
extracting its essential features through dimensionality reduction using nonlinear activation
functions and multiple neural layers. Taking advantage of both the LSTM and autoencoder
networks, the LSTM-based autoencoder network was studied to reconstruct the multivariate time-
series inputs and then use the residual error to classify anomalies.
𝑥′ = (𝑥 − 𝑥̅) / 𝜎        (2)

Where 𝑥 is the original value, 𝑥′ is the scaled value, and 𝑥̅ and 𝜎 are the mean and standard
deviation.
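As a minimal sketch, the standardization of Equation 2 can be applied per feature with numpy (the `standardize` helper name is ours, not the paper's):

```python
import numpy as np

def standardize(x):
    """Scale each feature to zero mean and unit variance (Equation 2)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

# two features, three samples; columns are scaled independently
raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled = standardize(raw)
# each column of `scaled` now has mean 0 and standard deviation 1
```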
According to other studies, deep learning anomaly detection algorithms are often implemented
in a semi-supervised manner (one-class classification). As recommended by Vavra et al. for the
semi-supervised learning model [25], the training and validation dataset contains only normal class
data, and the test dataset includes all classes of a dataset. The data in the test dataset are
entirely separated from the training and validation dataset. This technique is also applied in Xiang
et al. and Sakurada and Yairi's studies for deep learning anomaly detection models [17, 26]. On
the other hand, for the stacked LSTM network, Malhotra et al. divided the normal sequence into
four sets: normal train, normal validation-1, normal validation-2, and normal test. The anomalous
sequence(s) are divided into anomalous validation and abnormal test [21, 27]. Zhao et al. applied
a three-fold training/testing data splitting technique consisting of two training and one testing
dataset [22]. In this study, the authors divided the dataset into three sets: a normal training set, a normal
validation set, and a test set that contains both normal and anomalous instances. During the training and
validation process, the LSTM-AE network is optimized to select the most suitable hyperparameter
set. The labeled anomalies and normal instances are used to calculate the threshold and
performance of the classification process.
The decision on which information to discard is implemented by the forget gate 𝑓𝑡 and input
gate 𝑖𝑡, which can be denoted by:

𝑓𝑡 = 𝜎(𝑊𝑓 · [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑓)        (3)
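Equation 3 can be worked through numerically; the sketch below uses toy dimensions and zero-initialized weights purely for illustration (all names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(W_f, h_prev, x_t, b_f):
    """Forget gate of Equation 3: f_t = sigma(W_f . [h_{t-1}, x_t] + b_f)."""
    concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    return sigmoid(W_f @ concat + b_f)       # each element lies in (0, 1)

# toy dimensions: hidden size 2, input size 3
W_f = np.zeros((2, 5))
b_f = np.zeros(2)
f_t = forget_gate(W_f, np.ones(2), np.ones(3), b_f)
# zero weights and bias give sigmoid(0) = 0.5 for every gate unit
```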
When the data is fed to the network, the encoder 𝐸 compresses the input array 𝑋 into the
latent space 𝑍, whereas the decoder 𝐷 decompresses the encoded representation into the output
layer 𝑋̂ with the same dimension as the input array (Figure 3). Formally, the reconstructed array
can be expressed by:
𝑋̂ = 𝐷(𝐸(𝑋)) (10)
Then, the error is backpropagated through the architecture to update the weights 𝜃𝜙 and 𝜃𝜓
of the encoding and decoding part. The optimization function of the autoencoder tries to minimize
the deviation between 𝑋 and 𝑋̂ [3]:
min𝜃𝜓,𝜃𝜙 ‖𝑋 − 𝑋̂‖² = min𝜃𝜓,𝜃𝜙 ‖𝑋 − 𝐷(𝐸(𝑋))‖²        (11)
By constraining the latent space to have a smaller dimension than the input, the autoencoder
is forced to learn the most critical features of the training data. Autoencoders are best suited to
semi-supervised learning approaches where the training data only consists of normal points. The
reconstruction error during training autoencoder network with normal instances is much lower
than feeding with an anomalous sequence. The comparison or deviation of error vector when
feeding with normal and abnormal data can be used to classify the anomaly [3, 25]. As a deep
learning-based strategy, however, it struggles with limited data. The computation cost increases
significantly with the network's depth, and handling big data through a complicated architecture
can lead to overfitting issues.
3.2.3 LSTM Autoencoder
In the field of anomaly detection with deep learning methods for machines, many studies
involve the development of the LSTM models as feature extractors. Several types of
autoencoders have been proposed in the literature, such as vanilla autoencoder, deep
autoencoder, convolutional autoencoder, regularized autoencoder, and LSTM autoencoder. The
LSTM-based autoencoder refers to the autoencoder that both the encoder and the decoder are
the LSTM network. LSTM autoencoder models have been recently proposed for sequence-to-
sequence learning tasks like machine translation [30, 31], detecting anomalies in multi-sensor
time series using only the normal sequences [27]. Malhotra et al. showed that the LSTM and
stacked LSTM autoencoder network are robust and can detect anomalies from predictable,
unpredictable, periodic, aperiodic, and quasi-periodic time series [21, 27].
The LSTM-based autoencoder used in this study was constructed as a sequence to
sequence model containing stacked LSTM layers. This network allows extracting essential
features from the multivariate time series more efficiently. Indeed, the LSTM encoder learns a
fixed-length vector representation of the input time series, and the LSTM decoder uses this
representation to reconstruct the time series using the current hidden state and the value
predicted at the previous time step. The calculation procedure of the LSTM-based autoencoder
network in this study can be illustrated in Figure 4 and with the following detailed steps.
Step 1:
The dataset was divided into training, validating, and testing datasets. The training and the
validating datasets only contain normal instances, while the testing dataset contains some labeled
anomalies.
Let 𝑋𝑡 = {𝑋𝑡^(1), 𝑋𝑡^(2), 𝑋𝑡^(3), …, 𝑋𝑡^(𝑛)} be an array that represents multivariate time series of 𝑡
samples and 𝑛 features. In this particular problem, 𝑋𝑡 contains the time series of
samples for each of the following features: pressure, temperature, flow, and power of a centrifuge
gas compressor.
Step 2:
In the training process, the dataset was reshaped into a 3-dimensional (3D) array with
timesteps of 𝑖. The sequence of observed data became a 3D array [samples 𝑡, timesteps 𝑖,
features 𝑛]. Then, the sequence of observations is scaled using the normalization (Equation 1) or
standardization (Equation 2) function.
𝑋𝑡[𝑡, 𝑛] = {𝑋𝑡^(1), 𝑋𝑡^(2), 𝑋𝑡^(3), …, 𝑋𝑡^(𝑛)}  =>  𝑋𝑡′[𝑡, 𝑖, 𝑛] = {{𝑌𝑖^𝑛}, {𝑌𝑖+1^𝑛}, …, {𝑌𝑡^𝑛}}        (12)

𝑌𝑡^𝑛 = [𝑥1^(1) … 𝑥𝑖^(1); ⋮ ⋱ ⋮; 𝑥1^(𝑛) … 𝑥𝑖^(𝑛)]        (13)
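The windowing of Step 2 / Equation 12 amounts to a sliding-window reshape of the 2D series into overlapping 3D windows; a minimal numpy sketch (the `to_windows` helper name is ours) might look like:

```python
import numpy as np

def to_windows(X, timesteps):
    """Reshape a (samples, features) series into a 3D array
    [windows, timesteps, features], as in Step 2 / Equation 12."""
    t, n = X.shape
    return np.stack([X[k:k + timesteps] for k in range(t - timesteps + 1)])

X = np.arange(12, dtype=float).reshape(6, 2)  # 6 samples, 2 features
X3d = to_windows(X, timesteps=3)
# X3d.shape == (4, 3, 2): 4 overlapping windows of 3 timesteps x 2 features
```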
Figure 4: An illustration of the calculation procedure of the LSTM-based autoencoder network
Step 3:
In the training process, the array 𝑋𝑡′[𝑡, 𝑖, 𝑛] is fed to the LSTM-based autoencoder network to
create a reconstructed array 𝑋̂𝑡′[𝑡, 𝑖, 𝑛]. The network updates the weights and biases of the LSTM
layers to minimize the Mean Absolute Error (MAE), feeding each array {𝑌𝑖^𝑛} through to the last
array of 𝑋𝑡′[𝑡, 𝑖, 𝑛].

𝑌̂𝑡^𝑛 = [𝑥̂1^(1) … 𝑥̂𝑖^(1); ⋮ ⋱ ⋮; 𝑥̂1^(𝑛) … 𝑥̂𝑖^(𝑛)]        (14)

𝑋̂𝑡′[𝑡, 𝑖, 𝑛] = {{𝑌̂𝑖^𝑛}, {𝑌̂𝑖+1^𝑛}, …, {𝑌̂𝑡^𝑛}}        (15)

The weights and biases of the LSTM network are trained to minimize the loss function of
MAE:

𝐿𝑜𝑠𝑠(𝑀𝐴𝐸) = (1/𝑡) ∑𝑘=1..𝑡 ‖𝑋𝑘 − 𝑋̂𝑘‖        (16)
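Assuming the norm in Equation 16 is realized as the element-wise mean absolute error over the reconstructed windows (the common implementation; the helper name is ours), the loss can be sketched as:

```python
import numpy as np

def mae_loss(X, X_hat):
    """Reconstruction loss of Equation 16: mean absolute error
    between the input windows X and the reconstruction X_hat."""
    return np.mean(np.abs(X - X_hat))

X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_hat = np.array([[1.5, 2.0], [3.0, 3.5]])
loss = mae_loss(X, X_hat)   # (0.5 + 0 + 0 + 0.5) / 4 = 0.25
```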
𝑅𝑀𝑆𝐸 = √((1/𝑛) ∑𝑖=1..𝑛 (𝑦̂𝑖 − 𝑦𝑖)²)        (19)

𝑀𝐴𝑃𝐸 = (1/𝑛) ∑𝑖=1..𝑛 |(𝑦𝑖 − 𝑦̂𝑖) / 𝑦𝑖|        (20)

𝑅² = 1 − ∑𝑖=1..𝑛 (𝑦̂𝑖 − 𝑦𝑖)² / ∑𝑖=1..𝑛 (𝑦̅ − 𝑦𝑖)²        (21)

𝑦̅ = (1/𝑛) ∑𝑖=1..𝑛 𝑦𝑖        (22)

Where 𝑛 represents the number of predicted points; 𝑦𝑖 and 𝑦̂𝑖 represent the observed value and the
predicted value of the i-th point, respectively.
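These regression metrics follow directly from Equations 19-22 and can be sketched in numpy (function names are ours):

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Squared Error (Equation 19)."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mape(y, y_hat):
    """Mean Absolute Percentage Error (Equation 20)."""
    return np.mean(np.abs((y - y_hat) / y))

def r_squared(y, y_hat):
    """Coefficient of determination (Equations 21-22)."""
    ss_res = np.sum((y_hat - y) ** 2)
    ss_tot = np.sum((np.mean(y) - y) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
```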
3.4.2 Binary confusion matrix
Several evaluation metrics can be considered to compare the different anomaly detection
methods. The binary confusion matrix expresses the relationship between predicted and actual
classes or prediction/reconstruction error between the predicted/forecasted and the actual values.
In addition to the binary confusion matrix, Vavra et al. applied the following evaluation metrics:
Matthews Correlation Coefficient (MCC), Precision, Recall, F-score, and False-positive rate (FPR)
[25]. Matthews Correlation Coefficient (MCC) expresses all aspects of the confusion matrix and is
resilient against the usage of unbalanced datasets.
TP (True Positive) denotes events that are anomalous and detected as anomalous by the
algorithm. FP (False Positive) denotes events that are normal but have been incorrectly diagnosed as
anomalies. TN (True Negative) stands for the number of normal events correctly diagnosed as
normal. FN (False Negative) stands for the number of anomalies incorrectly classified as normal
events. The False-Positive Rate (FPR) is the most critical metric, expressing the proportion of
normal cases incorrectly identified as anomalies. The True Positive Rate (TPR) and False-Positive
Rate (FPR) can be obtained by:

𝑇𝑃𝑅 = 𝑇𝑃 / 𝑃

𝐹𝑃𝑅 = 𝐹𝑃 / 𝑁
The comparison is also made by using the Precision, Recall, and F-score metrics. Precision
considers the False-Positive (FP) classification in the metric calculation. In comparison, the Recall
is used to evaluate how complete the result is.
Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F-score = 2 × (Precision × Recall) / (Precision + Recall)
Munir et al. used this metric to evaluate their anomaly detection methods on different time
series [34]. Maya et al. also used the F-score in addition to Recall and Precision [35]. Together
with the Precision, Recall, and F-score metrics, two studies have applied Accuracy to the LSTM
autoencoder model as a feature extractor that extracts essential multivariate time series input
representations [32, 36].
Accuracy = (TP + TN) / (TP + FP + TN + FN)
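The confusion-matrix metrics above can be collected in one small helper (our naming; the TN count in the example is illustrative and not a figure reported by the paper):

```python
def binary_metrics(tp, fp, tn, fn):
    """Metrics derived from the binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also the True Positive Rate
    f_score = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)               # N = FP + TN (all normal events)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "fpr": fpr, "accuracy": accuracy}

# example: 2 true positives, 2 false positives, 1 false negative,
# with an illustrative (assumed) number of true negatives
m = binary_metrics(tp=2, fp=2, tn=6000, fn=1)
# precision = 0.5, recall = 2/3, F-score = 4/7 = 0.5714...
```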
3.4.3 Anomaly detection mechanism
The prediction residual or reconstruction error is used to evaluate the machine learning model's
performance and to classify the anomalies. The reconstruction error has low values if test samples
are normal instances, while the residual error becomes large with anomalous samples for CNN-
LSTM-AM deep learning models [17]. In the 2019 study by Tran et al., Sheather and
Marron's kernel quantile estimator method was applied to estimate the threshold over the set of
error vectors between the predicted and observed values of the LSTM network [36]. The
proposed methodology by Tran et al. classified the anomaly samples with 94% accuracy, 96%
precision, and 86% F-score. In the studies by Malhotra et al., anomaly detection was classified
based on the reconstruction error of LSTM-based Autoencoder network, assuming that those
error vectors followed a Gaussian distribution [21, 27]. In addition, Nguyen et al. suggested
applying the One-Class Support Vector Machine (OCSVM), without any specific assumption about
the data, to detect the anomaly [32]. After learning the hidden representations of the autoencoder,
the OCSVM can define a hyperplane that separates anomalies from normal observations based on
these independent error vectors. On the other hand, the evaluation of detection capabilities was
provided through the test dataset classification. Vávra et al. suggested using the precision/recall
curve to calculate the final threshold to distinguish the normal data from anomalies for semi-
supervised machine learning models [25].
In this paper, sensing values from the natural gas compressor were captured as a case
study. One set of data contained faults on July 26, 2021, with the event of excessive liquid
carry-over to the suction side of the compressor. This abnormal event might cause excessive
vibration, bearing damage, and rotor blade degradation. According to domain experts, the data
before and after the event was labeled as the normal operating condition of the machine.
Figure 5: Data splitting (blue dash line) and anomalies plot (red dash line) for three different
features
As shown in Figure 5, typical features are plotted sequentially in the time domain. The zoom-
in section of the dataset shows the separation between training and testing datasets. The
anomalies in the test dataset are highlighted in red dash lines. In total, we chose the six most
important features during the operation of the natural gas compressor, namely: pressure (2
features), temperature (2 features), flow (1 feature), and power consumption (1 feature). However,
there is no noticeable change in the trend, so the operating status of the compressor cannot be
directly determined from the change in any single parameter. Given those multivariate time
series plots, it is near-impossible for a human operator to interpret the data in real time during normal
operation, and conventional systems cannot spot anomalous readings within a specified time
window. In the particular case illustrated in Figure 5, the operators might mistake the other
overshoot instances of the “PQPTI2180” feature for anomalies.
Since slugging flow from remote wellheads to the processing plant is frequently observed
in a brownfield like the Hai Thach – Moc Tinh field, our task is to recognize these anomalies by
using the LSTM-based autoencoder and the F-score to seek a balance between Precision and
Recall. This is especially important since, when changing the loading of the centrifuge
compressor, the operators often have to adjust and monitor a lot of information simultaneously,
such as machine load, pressure fluctuation, well opening and closing, and gas and liquid flow.
Hyperparameter | Description | Values
Features and Timesteps | The selection of the sliding window or timestep for the input array | The number of features (6) is equal to the number of installed sensors on the compressor. Timesteps = 3 to 7 with a step of 1.
Activation function | Transforms the weighted sum of the input into an output from one or many nodes in a network layer | Rectified Linear Unit (ReLU) and Leaky Rectified Linear Unit (LeakyReLU)
Loss function | Find the loss function that provides the best performance in terms of the predefined metric | MAE
Dropout | Fraction of the input units to drop (or weights frozen during training) | Dropout rate = 0 to 0.25 with a 0.05 step
Batch size and number of epochs | The batch size is 32 for all of the tests. The number of epochs is 10,000 for all of the tests with early stopping procedures. | The early stopping patience for the training and testing dataset is 200 and 250 consecutive epochs, respectively.
Learning rate | The step size at which each node's weights are updated during training | 0.01, 0.001, 0.0001
Since the Random Search process was time-consuming and required substantial
computation costs, the authors proposed using the early stopping procedure to reduce the
training time. As illustrated in Figure 6, MAE was selected as the metric to evaluate model loss
during training, and the early stopping procedure took effect, stopping the training process at
2,400 epochs. With each run corresponding to a different timestep selection, we obtain an
optimized network corresponding to a set of hyperparameters through the Random Search
process.
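A minimal sketch of the Random Search loop over the Table 2 search space might look as follows; `train_and_score` stands in for the (not shown) routine that builds and trains one LSTM-AE with early stopping and returns its validation MAE, and the dictionary structure is our assumption:

```python
import random

# Search space following Table 2 (ranges from the text; structure assumed)
SEARCH_SPACE = {
    "timesteps": list(range(3, 8)),                      # 3 to 7, step 1
    "activation": ["relu", "leaky_relu"],
    "dropout": [round(0.05 * k, 2) for k in range(6)],   # 0 to 0.25, step 0.05
    "learning_rate": [0.01, 0.001, 0.0001],
}

def sample_config(rng=random):
    """Draw one random hyperparameter combination (Random Search)."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(train_and_score, n_trials=20, seed=0):
    """Evaluate n_trials random configurations and keep the one
    with the lowest score (here: validation MAE)."""
    rng = random.Random(seed)
    return min((sample_config(rng) for _ in range(n_trials)),
               key=train_and_score)
```

In practice `train_and_score` would dominate the runtime, which is why early stopping inside each trial matters.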
In this study, the selection of the sliding window or timestep was also an important parameter
when considering the long-term learning capacity of the LSTM network, because it converts the
original time-series features into 3D tensors for LSTM models. In addition, it reflects the
influence of past samples on the present (recurrence). That means the authors can
always optimize the network hyperparameter set for each timestep or sliding-window
size, which means updating the weights and biases of the network again for each run. In
addition, the activation function was also a critical hyperparameter for the LSTM network that
transformed the weighted sum of input to output for all of the nodes in a network layer [37].
Therefore, for each timestep and activation function, there was an optimized LSTM-AE network
with a combination of the different parameters listed in Table 2. Next, the authors compared the
results of the optimized models with different timesteps and activation functions and evaluated the
effect of these two values on the final result, which was the F-score.
Figure 7: Reconstruction error distribution for Training (a) and Testing dataset (b)
The anomalies are separated from the dataset by selecting an appropriate threshold value.
As shown in Figure 7, the mean and standard deviation of the anomaly scores are different for the
training and testing datasets. We can also check the reconstruction error histogram against a fitted
Gaussian distribution computed from the mean and standard deviation to find the probability
density of the anomaly score. However, unlike the previous work of Malhotra et al. [21, 27], we do
not use a Gaussian distribution of the error vectors to classify anomalies. Using classification
techniques with the Gaussian distribution, as shown in Figure 7, a large amount of data would be
mislabeled because the distribution does not fit a Gaussian.
Figure 8: Precision and Recall values versus Threshold for an optimized network with timesteps =
2 and activation function = Leaky ReLU
This study analyzed the reconstruction error of the multivariate time-series inputs to obtain
the highest F-score in a semi-supervised learning manner. The authors applied the F-score to
seek a balance between Precision and Recall, since the F-score is a robust metric for model
classification errors based on the binary confusion matrix. Referring to Figure 8, as the threshold
increases, the Precision rate rises to 100% while the Recall rate decreases significantly.
Applying the authors' method in this study, the threshold value was selected between the minimum
and maximum values of the reconstruction error vector so that the F-score value is maximized. In
this case, with timesteps = 2 and the activation function being ReLU, the optimal threshold value
is 0.070233, and the F-score value is 0.57143 with 2 False-positive and 1 False-negative
samples. As a result, the machine learning algorithm for classification or anomaly
detection becomes more efficient and avoids false detections.
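The threshold selection described above, sweeping candidate values of the reconstruction error and keeping the F-score maximizer, can be sketched as follows (the helper name is ours):

```python
import numpy as np

def best_threshold(errors, labels):
    """Sweep candidate thresholds between the min and max reconstruction
    error and keep the one that maximizes the F-score (labels: 1 = anomaly)."""
    errors, labels = np.asarray(errors), np.asarray(labels)
    best_t, best_f = None, -1.0
    for t in np.unique(errors):
        pred = errors >= t                      # flag errors above threshold
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        if tp == 0:
            continue
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

errors = [0.01, 0.02, 0.03, 0.50, 0.60, 0.04]
labels = [0, 0, 0, 1, 1, 0]
t, f = best_threshold(errors, labels)
# a threshold of 0.50 separates the two anomalies perfectly (F-score = 1.0)
```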
As previous studies have shown, the activation function is crucial for any deep learning
model. It can be seen that, for the same timestep value, ReLU gives better F-score
results. Many previous studies have found that Leaky ReLU is not always superior to plain
ReLU. The Leaky ReLU activation function fixes the “dying ReLU problem”, as it does not have
zero-slope parts, and speeds up the training process [38]. The dying problem is likely to occur
when the learning rate is too high or there is a significant negative bias. However, the learning rate
was optimized with different values from 0.0001 to 0.01 in this study, so the dying ReLU problem
was unlikely to occur.
As shown in Figure 9, the loss distribution (MAE) of all the data is represented; the threshold
value of 0.1189 is selected to get the best F-score, and the anomaly instances have a much
higher MAE of reconstruction error. By comparing the timestep and the activation function, the
authors selected the corresponding threshold value to obtain an F-score of 0.57143, TPR of 0.667,
and FPR of 0.0003, with all the anomalies and an early finding correctly identified.
Figure 9: Loss distribution of the dataset and threshold value
Moreover, it can be seen from Figure 10 that anomalies are difficult to detect with the
conventional method by the operator when comparing data over a long period. Even when zoomed
in, the change is too small to detect. However, when evaluating the mean of the error vectors for six
features, the anomaly samples were detected accurately, 1 to 2 minutes earlier than the ground
truth time established by domain expert labeling. Detected anomalies (green lines in Figure 10b)
spotted sooner than the ground truth were counted as False-positives. However, in this case, they
give an early warning to the operator. This study's optimum LSTM-based autoencoder network
consisted of 4 LSTM layers with 448, 224, 224, and 448 neurons, respectively, 1 Dropout layer with
a 0.15 fraction, 1 RepeatVector, and 1 TimeDistributed layer. The activation function was ReLU,
with L2 regularization of 0.025, a learning rate of 0.001, and an MAE loss function to provide the
best performance metric.
Figure 10: Ground truth (red dash line) and detected anomalies (green line in yellow highlight) for
three typical features (a) and with the zoom-in portion of the dataset (b)
5. CONCLUSIONS
Having robust anomaly detection tools enables organizations to leverage their existing data
and automatically compare incoming data with information from previous case histories to
anticipate or predict abnormal equipment behavior before it happens, so they can stay ahead of
potential breakdowns or disruptions in services and address them proactively instead of reacting to
issues as they arise. This paper discussed an improved LSTM-based autoencoder network for
anomaly detection for a centrifuge natural gas compressor. Different network hyperparameters were
carefully studied to obtain the optimized model with the maximum value of the F-score. Significant
conclusions are highlighted as follows:
▪ A practical methodology for anomaly detection on a natural gas compressor was
developed using an improved LSTM-based Autoencoder (LSTM-AE) containing stacked
LSTM layers that efficiently extract essential features from the multivariate time series.
▪ The Random Search technique effectively optimized the improved LSTM-AE network to
obtain the lowest MAE of the reconstruction error vector for different hyperparameter sets, such
as network architecture, number of neurons, sliding window size, activation function,
regularization, dropout, batch size, and learning rate.
▪ An adaptable and accurate LSTM-AE network was developed to obtain the highest
F-score value of 0.57143, with all the anomalies and an early finding correctly identified. The
network consisted of 4 LSTM layers with 448, 224, 224, and 448 neurons, respectively, 1
Dropout layer with a 0.15 fraction, 1 RepeatVector, and 1 TimeDistributed layer. The activation
function was ReLU, with L2 regularization of 0.025, a learning rate of 0.001, and an MAE loss
function to provide the best performance metric.
The improved LSTM-based autoencoder for anomaly detection has opened up a positive
research direction in intelligent management and monitoring of oil and gas processing equipment
and well performance with different production profiles. Thereby, the developed results should
have significant contributions to Machine Learning applications in oil and gas operation and
production areas.
6. LIST OF FIGURES
7. NOMENCLATURE
AM Attention Mechanism
IT Information Technology
ML Machine Learning
RS Random Search
8. ACKNOWLEDGEMENT
The research work described herein is part of the government research project number
077.2021.CNKK.QG/HĐKHCN, order 196/QD-BCT of the Ministry of Industry and Trade of the
Socialist Republic of Viet Nam.
9. REFERENCES