
SOCAR Proceedings No.1 (2024) 048-056

Reservoir and petroleum engineering

journal home page: http://proceedings.socar.az

A PREDICTIVE MODEL FOR OIL WELL MAINTENANCE: A CASE STUDY IN KAZAKHSTAN

D. Aktaukenov*1, M. Alshaalan2, Z. Omirbekova1,3, E. Pinsky2


1 Satbayev University, Almaty, Kazakhstan
2 Metropolitan College, Boston University, Boston, Massachusetts, USA
3 Kazakh National University named after Al-Farabi, Almaty, Kazakhstan

ABSTRACT

This paper proposes a predictive model that helps oil workers identify oilwell failures reliably. It can help geologists experienced with Machine Learning to improve the accuracy of failure identification and to plan well maintenance more accurately. The study is based on output data statistics, such as per-well daily oil flowmeter readings. The volatility of these readings makes it possible to estimate the probability of an oilwell failure and to rank wells by failure probability for the workers making decisions. Predictive diagnostics can help detect equipment problems early, thereby minimizing unplanned downtime. Unplanned sudden oilwell failures increase the company’s operating costs and the risk of environmental pollution.

Keywords: machine learning; classification algorithms; oil and gas; prediction algorithms; decision-making.

© 2024 «OilGasScientificResearchProject» Institute. All rights reserved.

*E-mail: daur.aktaukenov@gmail.com
http://dx.doi.org/10.5510/OGP20240100939

1. Introduction

The current state of the oil and gas industry in Kazakhstan is determined by the fact that most of the oil fields are at a late stage of development («Brownfield»), characterized by increased water cuts, reduced average flow rates of producing wells, and higher costs per unit of production. In addition, the oil and gas industry is just recovering from the post-COVID recession, which resulted in serious operating losses for oil producers [1].
According to the KazMunayGas annual report [2] and the Analysis Centre of Kazakhstan [3], 85.7 million tons of oil were produced, of which 30% were produced using sucker rod pumps (SRP).
In 2022, geopolitical changes in the global community had a significant impact on global business.
In recent years, the role of technological innovation in oil companies has increased significantly. According to the National Company «KazMunayGas» annual report [2], global research highlights the main trends that will have the strongest impact on the oil and gas industry in the coming years: the use of DARQ (D – Distributed ledger technologies, A – Artificial intelligence, R – Extended reality, Q – Quantum) technologies, technological identity, strengthening the professional skills of employees with new technological solutions, cybersecurity risk management, etc.
The problem addressed in this article is the optimization of the operating costs of an oil-producing company: it is important to prevent or minimize well downtime through timely notification of an impending breakdown.
The United Kingdom National Data Repository is one of the first large oil and gas open data releases. It contains 130 terabytes of geophysical, infrastructure, field, and well data, covering more than 12500 wellbores, 5000 seismic surveys, and 3000 pipelines [4]. However, the data presented there is a generalized set.
Today, only 5% of the one million oil and gas wells worldwide are flowing wells; the rest are operated by artificial lift methods [5].
Sucker rod pump (SRP) units appeared at the end of the 19th century and remain the most common method of oil production (figure 1). The drive of a deep rod pump is a rocking machine, which converts the rotational movement of the motor shaft into a reciprocating movement of the hanging point of the rods. Currently, more than half of the wells in countries of the former Commonwealth of Independent States are equipped with these installations, and in some countries their share reaches 90% [7].

Fig. 1. Artificial lift technique of oil production - Sucker rod pump (Modified after [6])

During repair work, wells do not produce. The normal operation of production wells is disrupted for various reasons, which leads either to a complete shutdown of the well or to a significant decrease in its production rate. The reasons for stopping or reducing production can be very diverse: the failure of underground or surface equipment, changes in reservoir conditions, the cessation of electricity or gas supply for gas lift wells, the cessation of pumping and transporting fluid to the surface, etc. One way or another, part of the time the wells are idle, either awaiting repair or under repair.
As the SRP consists of underground and surface equipment, in our research we consider only repairs of underground equipment (parts 2 and 3 of figure 1).
This paper is organized as follows: the second section contains a literature review on the topic under study, the third section presents the data processing methodology, the fourth section presents comparison results of the selected processing methods, and the last sections conclude our study with discussion and conclusion.

2. Literature review
2.1. Artificial Intelligence and Machine Learning today
Over more than 60 years, great progress has been made in the field of Artificial Intelligence (AI) research. Theories of Heuristic Searching Strategies, Non-monotonic Reasoning, Machine Learning, etc., have been proposed. Applications of AI, especially Expert Systems, Intelligent Decision-Making, Intelligent Robots, Natural Language Understanding, etc., have also promoted AI research [8].
Hopgood defined AI as the science of mimicking human mental faculties in a computer [9]. AI covers a wide range of applications, by which it has been classified from time to time [10]. In this era of technological evolution, AI has opened abundant scope for development in apparently all sectors, including business, marketing, education, science and engineering, medicine, and law.
Currently, industry is going through what experts have called «The Fourth Industrial Revolution», also called Industry 4.0. The integration of these environments allows large amounts of data to be collected by different equipment located in different sectors of factories [11].
AI technological developments in recent years have reduced the expenses related to monitoring the health of machinery and to acquiring and storing huge amounts of data. The application of cutting-edge analytical models to data provides valuable information and knowledge from manufacturing processes, production systems, and equipment to support strategic decision-making [12]. Quite recently, advances in computational performance have allowed the application of Machine Learning (ML) algorithms, which can find correlations and identify patterns [12]. Li proposed an ML approach to predict impending failures and alarms, using critical rail car components as examples, aiming both at driving proactive inspections and repairs and at reducing operational equipment failure [13].
ML approaches can handle high-dimensional and multivariate data and extract hidden relationships within data in complex and dynamic environments [14].
Artificial Neural Networks (ANN) are among the most common and widely applied ML algorithms. Their main advantage is that no expert knowledge is needed to make decisions, since they are based only on historical data. However, ANNs have some disadvantages: networks can reach conclusions that contradict the rules and theories established by the applications; training time is long; and a huge data set is needed for an ANN to learn correctly [15].
Figure 2 is a Venn diagram showing the relationship between the diversified fields of AI, ML and Data Science (DS).

Fig. 2. AI, ML, and DS today (Modified after [10])

2.2. Artificial Intelligence and Machine Learning in Oil and Gas industry
Areas of the petroleum industry in which neural networks can be applied are seismic pattern recognition, drill bit diagnosis, improvement of gas well production, identification of sandstone lithofacies, and prediction and optimization of well performance [16].
Mehta reported a survey by General Electric and Accenture in which 81% of executives rated the deployment of Big Data as one of the top goals of the oil and gas industry [17].
The ANN-Generalized Auto-Regressive Conditional Heteroscedasticity (ANN-GARCH) ML method is used to predict oil price volatility [18].
Han, Jung, and Kwon used the Random Forest (RF) method to develop a predictive model that can be used to predict productivity during the early phase of production [19]. The required datasets were obtained from 150 wells targeting shale gas in the Eagle Ford shale formations [20]. Recently, Hajizadeh [21] and Hanga and Kovalchuk [22] reported the efforts to date in applying AI to fault detection in the oil and gas industry, suggesting ways to ensure its greater adoption for strategic management and technology enablement [12].
The performance of modern electronic devices is enhanced by increasing data processing capabilities. Table 1 presents the upstream activities, tools and AI approaches that can be used for each activity in oil and gas [4].

Table 1
Upstream activity, tool for the application and AI approach
Activity | Tool for the application | Artificial Intelligence Approach
Evaluation of the subsurface geology | – A tool for automatically mapping reservoir rock characteristics over an oil field; – A program for collecting geological data from well logs. Boosting the gradient by 100 times or more accelerates the process; – Based on photos of rock samples collected from wells, a tool for rock typing | – Interpolation techniques, non-gradient optimization; – Gradient boosting; – Deep neural network
Drilling | Using real-time drilling telemetry, this tool can detect the drilled rock form and possible failure | Algorithms for ML in combination
Reservoir engineering | Traditional reservoir simulations can be sped up with this tool | Deep neural networks
Production optimization | A data-driven method for predicting the efficacy of well care campaigns objectively | Gradient boosting feature selection based on expert opinion

2.3. Predictive maintenance
Machinery maintenance has evolved from breakdown maintenance to time-based preventive maintenance [23].
In the literature, different nomenclature and groupings of maintenance management strategies can be found. Consider the common categories proposed in the works of Susto [24, 25], which classify maintenance procedures as follows:
• Run-to-Failure;
• Preventive Maintenance;
• Predictive Maintenance.
Predictive Maintenance advantages include: maximizing the time of use and operation of equipment, delaying/reducing maintenance activities, and reducing material and labor costs [15].
According to the study, maintenance approaches able to monitor equipment conditions for diagnostic and prognostic purposes can be grouped into three main categories: statistical approaches, AI approaches, and model-based approaches [26].
Many studies use deep learning algorithms for Predictive Maintenance purposes [27, 28, 29, 30].
For example, study [31] proposes a fuzzy alarm system to predict early equipment degradation in a car production line in order to reduce the costs of sudden shutdowns. Wei, Zhao and He propose a condition-based maintenance strategy that determines the optimal action based on the system state so as to minimize the average cost rate [32]. Study [33], on the other hand, developed a prognostic and health management framework to detect sensor degradation in manufacturing systems in order to optimize the maintenance schedule, reduce maintenance costs, avoid unnecessary downtimes and support decision-making.
In industry, equipment maintenance affects the operation time of equipment and its efficiency. Therefore, equipment faults need to be identified and resolved to avoid a shutdown of production processes [34].
A very relevant case study is the data strategy of big oil and gas corporations like Royal Dutch Shell. The company uses big data to form a strategy that minimizes the downtime and failures which would otherwise occur when the maintenance of equipment prone to wear and tear is poorly timed. This is done by sensors connected at different ends of the machines: aggregated data is compared with ongoing performance, so that defunct parts can be replaced with minimum downtime [35]. Royal Dutch Shell is one of the largest oil and gas companies, one of the «supermajors» which also include BP, Chevron, Total, and ExxonMobil, and the world’s fourth largest company by revenue. For some time now it has been developing the idea of the «data-driven oilfield» in an attempt to bring down the cost of drilling for oil [35].
Predictive analytics will not only help pinpoint the right hotspots but will also reduce drilling time, giving a smoother oil extraction process [36].
National Company «KazMunayGas» (KMG) is Kazakhstan’s leading oil and gas company, operating assets across the entire production cycle from exploration and production of hydrocarbons to transportation, refining, and provision of services. KMG production expenses (million Kazakhstani Tenge) in 2021 are shown in table 2.

Table 2
Production expenses in 2021* (million Kazakhstani Tenge)
Expense item | Amount | Share
Payroll | 310 672 | 44.83%
Repair and maintenance | 116 151 | 16.76%
Energy | 98 258 | 14.18%
Transportation costs | 45 599 | 6.58%
Short-term lease expenses | 28 213 | 4.07%
Others | 94 138 | 13.58%
Total | 693 031 | 100%
*Independent auditor's report of KMG from Ernst & Young LLP which comprises the consolidated financial position as of 31 December 2021 (see «KMG financial report» [2])

According to table 2, the second highest expense of all annual Production Expenditures in the fields of Kazakhstan is related to Well workover and Maintenance services.
The Kimberlite study found that operators using a predictive, data-driven approach to maintenance experience 36% less unplanned downtime than those with a reactive approach. Today, only 3-5% of oil and gas equipment is connected to sensors. Operators can implement analytics-based work management programs, using predictive maintenance to repair vital equipment before it breaks down, reducing downtime and increasing production [37].
Thus, a good maintenance strategy should improve the equipment condition, reduce the equipment failure rates and minimize maintenance costs, while maximizing the life of the equipment [15].

3. Methodology
There are many modeling techniques based on ML and AI: Logistic Regression; Random Forest; Naïve Bayesian classifier; Decision Tree; Artificial neural network; Bayesian Belief Networks; Support Vector Machine; Principal Component Analysis; Gradient boosted machine; Genetic algorithm; Fuzzy logic and others. Some of them are explainable but have low accuracy, while others have high accuracy but are hard to run and explain. In this research, we used the first three because of their prevalence and their convenience for analysing our imbalanced data. The algorithms used in this work are presented in table 3.
Logistic Regression is one of the most important statistical and data mining techniques employed by researchers for the analysis and classification of binary datasets [38, 39, 40]. Regression is generally applied when the input and output variables are continuous, whereas logistic regression works best when the output variable takes on only two values. The importance of logistic regression is that many data analysis problems can be solved using binary classification or reduced to it. Using logistic regression, it is possible to estimate the probability that an event occurs (or does not occur): the well is working (or broken). This makes logistic regression a powerful decision-support tool.
Random Forest is a widely used representative ensemble learning method combining Bagging and random feature selection. The term «random» refers to generating n sub-datasets by randomly sampling with replacement and randomly removing some features of each sub-dataset. The term «forest» means that the sub-classifiers are decision trees. The prediction results are based on a voting mechanism [41]. These classifiers often give better predictions, but they are not explainable and not easy to use.
The Naïve Bayesian classifier calculates a set of probabilities by counting the frequency and combinations of values in a given set of data. The algorithm uses Bayes’ theorem and assumes that all features are independent given the value of the class variable. This assumption of conditional independence rarely holds in real-world applications, hence the characterization as naive; nevertheless, the algorithm tends to learn quickly in various supervised classification problems [42].
In Oil and Gas production, most production activities are carried out by groups of machines that collect data, only a minority of which is valuable. First, machines spend most of their long-term operation in a healthy condition, while faults seldom happen. Consequently, it is easier to collect healthy data than faulty data. Second, the quality of the collected data is not always satisfactory because some of it may be affected by emergencies, such as transmission interruptions and anomalies in measurement devices [43].
The proposed ML-based methodology follows five steps: data collection, data preprocessing and partitioning, model selection, parameter tuning and prediction, model evaluation, and visual analysis [44].

3.1. Data set
The accuracy of the Predictive Maintenance model has been tested on a real industrial production dataset. The dataset was provided by one of the subsidiaries of the National Company «KazMunayGas», which is a major oil and gas-producing company in Kazakhstan. The analysed data contains oil production information for the period from 1 January 2020 to 31 March 2022.
Production data, as well as any other information obtained through processing, generalization, or analytical calculations, is treated as confidential information of the company owner. In this work, the name of the oilfield, the operator of the oil production, and well-by-well information remain anonymous.
The dataset contains the following information (figure 3):
A – General production information about ~4000 unique oilwells, including the number of wells and their location;
B – Technological regime of oilwells;
C – Daily oil production indications: Extracted liquid (m3/day), Product water cut (%), Extracted oil (t/day).
In general, more data leads to more reliable models and, consequently, better results. However, data should be representative of the analysed process [12].
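Section 3 notes that logistic regression can estimate the probability of an event such as a well being broken. The following is a minimal sketch of how such probabilities could be used to rank wells, assuming a single volatility feature per well. The well names, the feature values, and the coefficients `weight` and `bias` are hypothetical placeholders, not values fitted in this study.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def failure_probability(volatility: float, weight: float, bias: float) -> float:
    """Failure probability from one volatility feature and given LR coefficients."""
    return sigmoid(weight * volatility + bias)


def rank_wells(volatility_by_well: dict[str, float],
               weight: float, bias: float) -> list[tuple[str, float]]:
    """Return (well, probability) pairs, most failure-prone first."""
    scored = [
        (well, failure_probability(value, weight, bias))
        for well, value in volatility_by_well.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical wells and coefficients, for illustration only.
example_volatility = {"well_A": 0.4, "well_B": 2.5, "well_C": 1.1}
ranking = rank_wells(example_volatility, weight=1.5, bias=-2.0)
for well, probability in ranking:
    print(f"{well}: {probability:.2f}")
```

With a positive weight, wells with more volatile production receive a higher probability and appear first in the list, which is the ordering a maintenance planner would work through.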

Table 3
Classification of supervised learning algorithms
Algorithms | Concept and principle | Advantage | Disadvantage
Logistic Regression (LR) | LR transforms its output using the logistic sigmoid function to return a probability value | It is very fast at classifying unknown records. | It can only be used to predict discrete functions. Hence, the dependent variable of LR is bound to the discrete number set
Random forest (RF) | Algorithm integrated by multiple decision trees | Strong anti-jamming ability | Slower execution
Naive Bayes Classification (NBC) | Based on the knowledge of probability statistics, calculate the probability that the sample to be tested belongs to each category, and use the category with the highest probability as the category of this sample | Simple logic; low false positive rate | The prediction effect is poor for samples with high attribute relevance
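The Naive Bayes row above describes computing, for each category, the probability that a sample belongs to it and choosing the most probable one. A minimal sketch of a frequency-counting variant follows. It assumes categorical features and Laplace smoothing (`alpha`), neither of which is specified in the paper, and the tiny training set is invented for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            vocab[j].add(value)
    return class_counts, value_counts, vocab, alpha


def predict_proba(model, row):
    """Posterior class probabilities under the conditional independence assumption."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # class prior
        for j, value in enumerate(row):
            count = value_counts[(j, label)][value]
            # Laplace-smoothed estimate of P(value | label)
            score += math.log((count + alpha) / (n_label + alpha * len(vocab[j])))
        log_scores[label] = score
    # Normalise in log space to avoid underflow
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# Invented training data: (volatility level, production trend) -> well state.
rows = [("high", "falling"), ("high", "flat"), ("low", "flat"),
        ("low", "flat"), ("low", "rising"), ("high", "falling")]
labels = ["repair", "repair", "normal", "normal", "normal", "repair"]
model = fit_naive_bayes(rows, labels)
probs = predict_proba(model, ("high", "falling"))
predicted = max(probs, key=probs.get)
```

The per-feature probabilities are simply multiplied (added in log space), which is where the independence assumption enters.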


№ | Oil field | Oil well | Condition | Technological regime: Q liquid, m3 | Technological regime: Water cut, % | Technological regime: Q oil, t/day | 01.01.2020: Q liquid, m3 | 01.01.2020: Water cut, % | 01.01.2020: Q oil, t/day
1 | XXX | №_2651 | in idle | 5 | 55 | 1.89 | 4 | 60 | 1.5
2 | XXX | №_0392 | in work | 60 | 70 | 15.11 | 69 | 60.21 | 23.03894
3 | XXX | №_0567 | in work | 70 | 92 | 4.7 | 66 | 94.23 | 3.19659
4 | XXX | №_0568 | in work | 35 | 90 | 2.94 | 34 | 88.27 | 3.34761
5 | XXX | №_0584 | in work | 30 | 92 | 2.02 | 30 | 92 | 2.0136

Fig. 3. Dataset
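The three production columns in figure 3 appear to be related: the oil rate in tonnes looks like the liquid rate reduced by the water cut and multiplied by an oil density. The sketch below illustrates that relationship. The density of 0.84 t/m3 is an assumption inferred from the technological-regime values of the sample rows, not a figure stated in the paper, and the function name is hypothetical.

```python
def oil_rate_t_per_day(liquid_m3_per_day: float,
                       water_cut_pct: float,
                       oil_density_t_per_m3: float = 0.84) -> float:
    """Estimate the oil mass rate from liquid rate and water cut.

    The default density is an assumption inferred from the sample rows.
    """
    if not 0.0 <= water_cut_pct <= 100.0:
        raise ValueError("water cut must be between 0 and 100 percent")
    oil_volume = liquid_m3_per_day * (1.0 - water_cut_pct / 100.0)
    return oil_volume * oil_density_t_per_m3


# Technological-regime values of wells №_0392, №_0567 and №_0568 in figure 3.
for liquid, water_cut, reported in [(60, 70, 15.11), (70, 92, 4.7), (35, 90, 2.94)]:
    estimate = oil_rate_t_per_day(liquid, water_cut)
    print(f"liquid={liquid}, water cut={water_cut}%: "
          f"estimated {estimate:.2f} t/day, reported {reported} t/day")
```

Because the oil rate depends on both liquid rate and water cut, fluctuations in either one show up in the oil production series that the models use.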

3.2. Data collection and preparation procedure
We need to determine whether there is a statistical relationship or correlation between the variables if we choose to predict the value of one using the other. Precision must also be maintained, in the sense of how close the predicted values are to the observed ones: the predictions need to be both unbiased and close to the actual values [45].
The distribution by type of repairs for the analysed period at the oil field of the investigated Kazakhstani oil company is shown in figure 4. The vast majority of repairs are pump changes. This number includes both planned and unplanned repairs.
Figure 5 shows an example of normal operation of well #9445, with low volatility of oil production. In this example, a drop in liquid production of more than 50% and in oil production of 40% relative to the well's technological regime triggers well repair work.
Figure 6 shows an example of abnormal operation of well #9696, with high volatility of oil production that leads to oilwell failure.

Fig. 4. Distribution of repairs

Fig. 5. Normal well operation #9445

Fig. 6. Abnormal well operation #9696
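The contrast between figures 5 and 6 rests on how volatile the daily oil production is. A minimal sketch of window statistics of the kind listed in the result tables (standard deviation σ, the ratio σ/μ, and slope) is given below. The use of the population standard deviation and of an ordinary least-squares slope is an assumption, since the paper does not state the exact formulas, and the sample series are invented.

```python
from statistics import mean, pstdev


def window_features(oil: list[float], w: int) -> tuple[float, float, float]:
    """Return (sigma, sigma/mu, slope) for the last w daily oil readings."""
    if w < 2 or len(oil) < w:
        raise ValueError("need at least w >= 2 readings")
    window = oil[-w:]
    mu = mean(window)
    sigma = pstdev(window)  # population standard deviation (assumption)
    cv = sigma / mu if mu != 0 else float("nan")
    # Least-squares slope of production against the day index 0..w-1
    t_mean = (w - 1) / 2
    numerator = sum((t - t_mean) * (y - mu) for t, y in enumerate(window))
    denominator = sum((t - t_mean) ** 2 for t in range(w))
    return sigma, cv, numerator / denominator


# Invented series: a stable well and a volatile well, 7 daily readings each.
stable = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.0]
volatile = [10.0, 4.0, 12.0, 3.0, 11.0, 5.0, 9.0]
sigma_stable, _, _ = window_features(stable, w=7)
sigma_volatile, _, _ = window_features(volatile, w=7)
```

A volatile well can have almost no trend at all, so its slope stays near zero while σ is large. This matches the paper's observation that workovers are preceded by fluctuations without pronounced directional dynamics.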


Workover operations at the well are preceded by high fluctuations without pronounced directional dynamics. As evidenced by the few recent research articles that have explored correlations between oil and stock markets, high stock market volatility precedes a period of high economic uncertainty [46]. In our research, there is a correlation between high volatility of oil production and the onset of oilwell workover.

3.3. Data analysis
The main goal of our analysis is to use daily production readings for the past W days to predict well failures ∆ days in advance (figure 7).

Fig. 7. Analysis

To prepare the models, we used a train-test approach: the raw input data were randomly split into training and test datasets, 50% each.
The classification model predicts the class of each data pattern by assigning each sample its predicted label (positive or negative). A positive label for a well means that this well is operating normally. A negative label for a well means that this well is under repair. These labels are assigned for each day (∆ days in advance) based on the value of oil output in the previous W days.
Therefore, at the end of the classification we compute the following model results, as in study [47]:
• True Positives (TP);
• False Negatives (FN);
• True Negatives (TN);
• False Positives (FP).
Once we have these values, we can compute the overall accuracy of our predictive model as the fraction of correctly predicted labels in the data set:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1) \]

We can also compute additional metrics such as precision, recall, and F1-score, which are widely used in evaluating the performance of prediction models [41]:

\[ \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad F1 = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (2) \]

We should note that our data set is highly imbalanced: over 90% of the samples have positive labels, since on average each oilwell is under repair 3 days per month, which gives roughly 10% negative labels. For such imbalanced data, it is important to consider F1 as an additional metric for comparing different predictive models.

4. Results
The novelty of this research is the application of ML for preventive diagnostics on the basis of oil flow rate output, specifically the detection of data fluctuations that reflect the quality of the SRP process.
The results of this work show that the model allows us to correctly detect oilwell deviation trends and generate failure prediction alerts, serving as a maintenance decision support system that helps operators prevent possible upcoming failures. We present and compare results using three models.

4.1. Result with Logistic Regression
Logistic Regression is easy to implement and interpret, and very efficient to train, even for production personnel in the field. Therefore, we took Logistic Regression as the basis for the main indicator of accuracy (table 4).

Table 4
Result 1 - Logistic Regression Accuracy
Logistic Regression | W = 7, ∆ = 7 | W = 7, ∆ = 14 | W = 7, ∆ = 21 | W = 14, ∆ = 7 | W = 14, ∆ = 14 | W = 14, ∆ = 21
σ (Oil) | 70% | 69% | 68% | 72% | 71% | 70%
σ (Oil)/μ | 60% | 59% | 59% | 57% | 57% | 56%
Slope Oil | 53% | 53% | 53% | 53% | 53% | 53%
σ (Concentration) | 62% | 62% | 61% | 64% | 63% | 63%
F1-score (Oil) | 62% | 61% | 60% | 66% | 65% | 64%

The highest accuracy was achieved with the standard deviation of 14 days of oil production, predicting 7 days before failure. Although its accuracy may be lower than that of other classifiers, the Logistic Regression classifier results are the most relevant for our study because they are the simplest to calculate and easy to interpret.
Figure 8 shows the labelling distribution of the standard deviation of oil production. One of the attractive features of LR is the possibility of computing probabilities. This would allow an oil manager not only to identify future failures but also to rank and prioritize them.

Fig. 8. The standard deviation of Q oil. W = 14 days, ∆ = 7 days

4.2. Result with Random Forest
The Random Forest accuracy results are shown in table 5.
The highest accuracy was achieved with the standard deviation of 7 days of oil production, predicting 7 days before failure. However, this model is difficult to use in practice.

4.3. Result with Naïve Bayesian classifier
The Naïve Bayesian classifier accuracy results are shown in table 6.
The highest accuracy was achieved with the coefficient of variation (the ratio of the standard deviation σ to the mean μ) of 14 days of oil production, predicting 7 days before failure.
Although its accuracy is higher than the LR result, the Naïve Bayesian F1-score turned out to be worse than the rest because the classifier handles samples with high attribute relevance poorly. Thus, this classifier cannot be representative due to its complexity of data processing and labelling.

5. Discussion
Most research in the field of maintenance prediction is based on input data: measurements from vibration sensors, the condition of the pumping machine engine, corrosion sensors, diagnostics of underground equipment condition, etc. The distinguishing feature of our study is that it is based on output data: our prediction calculations use oil production readings.
The purpose of this work was not to find a high-precision ML model, but to show how a simple and intuitive ML algorithm can achieve good oilwell failure prediction results that support agile and informed decision-making. Our results can be used by oil workers without experience in ML, allowing them to predict failures and rank oil wells by failure probability.
In our model, in addition to prediction, it is also possible to evaluate and rank each oilwell separately by its likelihood of failure.
The strength of this study is the use of oil flow rate output data, which has been applied not only to total production but also to process equipment diagnostics. A comparative analysis of the results across time intervals (7, 14, and 21 days) showed anomalies and deviations that indicate the life cycle of equipment and provide an opportunity to improve the paradigm of run-to-failure control algorithms so that equipment is used more carefully.
The weakness of this study is that it uses data from a single oil field area. It would be good to use data from all the fields in Kazakhstan, which have technological specifics such as a high water cut; in general, however, this limitation did not detract from the results of the study.
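Section 3.3 describes labelling each day ∆ days in advance from the oil output of the previous W days, followed by a random 50/50 train-test split. One possible way to build such samples is sketched below. The function names, the single σ feature, the 0/1 label encoding, and the synthetic series are all assumptions made for illustration; the paper does not publish its preprocessing code.

```python
import random
from statistics import pstdev


def build_samples(oil, under_repair, w, delta):
    """Pair a W-day volatility feature with the well state delta days later.

    Label 1 (positive) means the well operates normally on the target day;
    label 0 (negative) means it is under repair, as in Section 3.3.
    """
    if len(oil) != len(under_repair):
        raise ValueError("series must have the same length")
    samples = []
    for t in range(w - 1, len(oil) - delta):
        window = oil[t - w + 1 : t + 1]  # the W days ending on day t
        feature = pstdev(window)
        label = 0 if under_repair[t + delta] else 1
        samples.append((feature, label))
    return samples


def split_half(samples, seed=0):
    """Random 50/50 split into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Synthetic 30-day series: the well goes under repair on day 25.
oil = [float(10 + (day % 3)) for day in range(30)]
under_repair = [day >= 25 for day in range(30)]
samples = build_samples(oil, under_repair, w=7, delta=7)
train, test = split_half(samples)
```

Repeating the construction for W ∈ {7, 14} and ∆ ∈ {7, 14, 21} would give one dataset per column of the result tables.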

Table 5
Result 2 - Random Forest Accuracy
Random Forest | W = 7, ∆ = 7 | W = 7, ∆ = 14 | W = 7, ∆ = 21 | W = 14, ∆ = 7 | W = 14, ∆ = 14 | W = 14, ∆ = 21
σ (Oil) | 82% | 79% | 78% | 80% | 78% | 76%
σ (Oil)/μ | 82% | 80% | 78% | 81% | 79% | 77%
Slope Oil | 82% | 80% | 78% | 80% | 78% | 76%
σ (Concentration) | 79% | 70% | 75% | 79% | 77% | 75%
F1-score (Oil) | 73% | 82% | 80% | 82% | 81% | 80%

Table 6
Result 3 - Naïve Bayesian Accuracy
Naïve Bayesian | W = 7, ∆ = 7 | W = 7, ∆ = 14 | W = 7, ∆ = 21 | W = 14, ∆ = 7 | W = 14, ∆ = 14 | W = 14, ∆ = 21
σ (Oil) | 64% | 63% | 63% | 66% | 65% | 65%
σ (Oil)/μ | 66% | 63% | 62% | 77% | 74% | 73%
Slope Oil | 54% | 53% | 53% | 53% | 53% | 53%
σ (Concentration) | 59% | 58% | 58% | 61% | 61% | 60%
F1-score (Oil) | 45% | 44% | 43% | 51% | 50% | 50%
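The tables report accuracy alongside an F1-score, following equations (1) and (2). The sketch below computes these metrics from the four confusion counts and shows why accuracy alone can mislead on imbalanced data. The counts are invented to mirror the roughly 90/10 class ratio described in Section 3.3 and are not results from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion counts.

    Undefined ratios (zero denominator) are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * recall * precision / (recall + precision)
          if (recall + precision) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# A trivial model that always predicts "normal" on 90 normal / 10 repair days.
# Scored with "normal" as the positive class (the paper's convention):
normal_as_positive = classification_metrics(tp=90, fp=10, tn=0, fn=0)
# Scored with "repair" as the class of interest:
repair_as_positive = classification_metrics(tp=0, fp=0, tn=90, fn=10)
```

Both views give 90% accuracy for a model that never flags a repair, while the F1-score computed for the repair class is zero. Which class is counted as positive therefore matters when comparing models on this kind of data.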


Conclusion
Nowadays production wells are potential sources of oil spills or emissions of harmful substances into the
atmosphere, which may cause environmental pollution. Emergencies only increase this impact. Accidental
spills and leaks from oilwells pollute the soil, surface water, and groundwater, and disturb soil and water environments.
Therefore, the timely prevention of breakdowns of technological downhole equipment during operation
indirectly helps to reduce environmental pollution.
The global upstream Oil and Gas industry has the opportunity to capitalize on ML algorithms. The current
environment in the oil upstream industry provides an opportunity to implement new technologies, predictive
models, and decision-making processes, based on data, to improve performance and reduce production costs.
The study’s key findings have significant implications for the oil production industry, particularly in Sucker
Rod Pumping. The study provides insights for developing a preventive diagnostics system that serves as both
a decision-support tool and an enhanced management system to optimize costs associated with oil production.
The article’s primary focus is developing an intelligent model for diagnosing SRPs, enabling predictive fault
detection throughout specific operational lifecycles. The model displays an accuracy rate of approximately
70%, empowering the oil and gas industry to implement preventive measures to avert breakdowns, mitigate
environmental pollution, and optimize costs. Based on the results of our model, we can identify potential well
failure 7 days before shutdown with a high degree of probability.
Overall, this study can potentially revolutionize the Oil and Gas industry by offering a constructive
approach to addressing environmental concerns while enhancing operational efficiency and reducing costs.
By prioritizing preventive measures, the industry can positively impact the environment while reaping the
benefits of cost savings and improved productivity. Additionally, the study identifies system errors within
automated control systems. Future iterations will include visualizing all rod pumps on a cluster basis to
facilitate comprehensive analysis of preventive diagnostics.

We gratefully thank JSC «Center for International Programs» for providing the research internship within the
Bolashak International Scholarship.
Our sincere thanks also go to the National Company «KazMunayGas» for the opportunity to conduct research in the
field of Oil and Gas in the Republic of Kazakhstan.

References
1. Omirbekova, Z., Aktaukenov, D., Amangeldiyev, A., Abdallah, A. (2021). Developing predictive oil well diagnostics
based on intelligent algorithms. In: 2021 IEEE International Conference on Smart Information Systems and Technologies.
2. (2021). KazMunayGas Annual Report. https://www.kmg.kz/
3. (2022). Analysis of the Republic of Kazakhstan oil and gas industry for the 2021 year. https://iacng.kz/
4. Sircar, A., Yadav, K., Rayavarapu, K., et al. (2021). Application of machine learning and artificial intelligence in oil and
gas industry. Petroleum Research, 6(4), 379-391.
5. Shlumberger - https://www.slb.ru/services/artificial lift/
6. Takacs, G. (2015). Sucker-rod pumping handbook: production engineering fundamentals and long-stroke rod
pumping. Gulf Professional Publishing.
7. Bubnov, M. V., Zyuzev, A. M. (2016). Tools for diagnosing equipment of sucker rod pump units. In: First Scientific and
Technical Conference of Young Scientists of the Ural Energy Institute. Yekaterinburg.
8. Shi, Z. (2019). Advanced artificial intelligence. Vol. 4. World Scientific Publishing.
9. Hopgood, A. A. (2005). The state of artificial intelligence. Advances in Computers, 65, 1–75.
10. Pandey, R. K., Dahiya, A. K., Mandal, A. (2021). Identifying applications of machine learning and data analytics based
approaches for optimization of upstream petroleum operations. Energy Technology, 9(1), 2000749.
11. Borgi, T., Hidri, A., Neef, B., Naceur, M. S. (2017). Data analytics for predictive maintenance of industrial robots. In:
2017 International Conference on Advanced Systems and Electric Technologies (IC ASET).
12. Orr`u, P. F., Zoccheddu, A., Sassu, L., et al. (2020). Machine learning approach using mlp and svm algorithms for the
fault prediction of a centrifugal pump in the oil and gas industry. Sustainability, 12(11), 4776.
13. Li, H., Parikh, D., He, Q., et al. (2014). Improving rail network velocity: A machine learning approach to predictive
maintenance. Transportation Research Part C: Emerging Technologies, 45, 17–26.
14. Wuest, T., Weimer, D., Irgens, C., Thoben, K.-D. (2016). Machine learning in manufacturing: advantages, challenges,
and applications. Production & Manufacturing Research, 4 (1), 23–45.
15. Carvalho, T. P., Soares, F. A., Vita, R., et al. (2019). A systematic literature review of machine learning methods applied
to predictive maintenance. Computers & Industrial Engineering, 137, 106024.
16. Ali, J. (1994). Neural networks: a new tool for the petroleum industry? SPE-27561-MS. In: European Petroleum Computer
Conference, Aberdeen, United Kingdom.
17. Mehta, A. (2016). Tapping the value from big data analytics. Journal of Petroleum Technology, 68 (12), 40–41.

55
D. Aktaukenov et al. / SOCAR Proceedings No.1 (2024) 048-056

18. Kristjanpoller, W., Minutolo, M. C. (2016). Forecasting volatility of oil price using an artificial neural network-garch
model. Expert Systems with Applications, 65, 233–241.
19. Han, D., Jung, J., Kwon, S. (2020). Comparative study on supervised learning models for productivity forecasting of
shale reservoirs based on a data-driven approach. Applied Sciences, 10(4), 1267.
20. Tadjer, A., Hong, A., Bratvold, R. B. (2021). Machine learning based decline curve analysis for short-term oil
production forecast. Energy Exploration & Exploitation, 39 (5), 1747–1769.
21. Hajizadeh, Y. (2019). Machine learning in oil and gas; a swot analysis approach. Journal of Petroleum Science and
Engineering, 176, 661–663.
22. Hanga, K. M., Kovalchuk, Y. (2019). Machine learning and multi-agent systems in oil and gas industry applications:
A survey. Computer Science Review, 34, 100191.
23. Scheffer, C., Girdhar, P. (2004). Practical machinery vibration analysis and predictive maintenance. Elsevier.
24. Susto, G. A., Beghi, A., De Luca, C. (2012). A predictive maintenance system for epitaxy processes based on filtering
and prediction techniques. IEEE Transactions on Semiconductor Manufacturing, 25 (4), 638–649.
25. Susto, G. A., Schirru, A., Pampuri, S., et al. (2014). Machine learning for predictive maintenance: A multiple classifier
approach. IEEE Transactions on Industrial Informatics, 11 (3), 812–820.
26. Jardine, A. K., Lin, D., Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing
condition-based maintenance. Mechanical Systems and Signal Processing, 20(7), 1483–1510.
27. Amihai, I., Gitzel, R., Kotriwala, A. M., et al. (2018). An industrial case study using vibration data and machine
learning to predict asset health. In: 2018 IEEE 20th Conference on Business Informatics (CBI), 1, 178–185.
28. Butte, S., Prashanth, A., Patil, S. (2018). Machine learning based predictive maintenance strategy: a super learning
approach with deep neural networks. In: 2018 IEEE Workshop on Microelectronics and Electron Devices (WMED).
29. Luo, B., Wang, H., Liu, H., et al. (2018). Early fault detection of machine tools based on deep learning and dynamic
identification. IEEE Transactions on Industrial Electronics, 66(1), 509–518.
30. Mathew, J., Luo, M., Pang, C. K. (2017). Regression kernel for prognostics with support vector machines. In: 2017 22nd
IEEE International Conference on Emerging Technologies and Factory Automation (ETFA).
31. Vafaei, N., Ribeiro, R. A., Camarinha-Matos, L. M. (2019). Fuzzy early warning systems for condition based
maintenance. Computers & Industrial Engineering, 128, 736–746.
32. Wei, G., Zhao, X., He, S., He, Z. (2019). Reliability modeling with condition-based maintenance for binary-state
deteriorating systems considering zoned shock effects. Computers & Industrial Engineering, 130, 282–297.
33. Dong, Y., Xia, T., Fang, X., et al. (2019). Prognostic and health management for adaptive manufacturing systems with
online sensors and flexible structures. Computers & Industrial Engineering, 133, 57–68.
34. Wan, J., Tang, S., Li, D., et al. (2017). A manufacturing big data solution for active preventive maintenance. IEEE
Transactions on Industrial Informatics, 13 (4), 2039–2047.
35. Marr, B. (2015). Big data in big oil: how shell uses analytics to drive business success. Forbes May.
36. Desai, J. N., Pandian, S., Vij, R. K. (2021). Big data analytics in upstream oil and gas industries for sustainable
exploration and development: A review. Environmental Technology & Innovation, 21, 101186.
37. Mathew, B. (2016). How big data is reducing costs and improving performance in the upstream industry. World Oil.
38. Agresti, A. (2018). An introduction to categorical data analysis. John Wiley &Sons.
39. Hilbe, J. M. (2009). Logistic regression models. Chapman and Hall/CRC.
40. Hosmer Jr, D. W., Lemeshow, S., Sturdivant, R. X. (2013). Applied logistic regression. Vol. 398. John Wiley & Sons.
41. Dong, S.-Q., Sun, Y.-M., Xu, T., et al. (2022). How to improve machine learning models for lithofacies identification
by practical and novel ensemble strategy and principles. Petroleum Science, 20(2), 733-752.
42. Saritas, M. M., Yasar, A. (2019). Performance analysis of ANN and Naive bayes classification algorithm for data
classification. International Journal of Intelligent Systems and Applications in Engineering, 7(2), 88–91.
43. Lei, Y., Yang, B., Jiang, X., et al. (2020). Applications of machine learning to machine fault diagnosis: A review and
roadmap. Mechanical Systems and Signal Processing, 138, 106587.
44. Hong, B.-Y., Liu, S.-N., Li, X.-P., et al. (2022). A liquid loading prediction method of gas pipeline based on machine
learning. Petroleum Science, 19(6), 3004-3015.
45. Making predictions with regression analysis. https://statisticsbyjim.com/regression/predictions-regression/
46. Zhao, W., Wang, Y.-D. (2022). On the time-varying correlations between oil-, gold-, and stock markets: The
heterogeneous roles of policy uncertainty in the US and China. Petroleum Science, 19(3), 1420–1432.
47. Chicco, D., Jurman, G. (2020). The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy
in binary classification evaluation. BMC Genomics, 21(1), 1–13.
