
Article
An Artificial Lift Selection Approach Using Machine Learning:
A Case Study in Sudan
Mohaned Alhaj A. Mahdi * , Mohamed Amish and Gbenga Oluyemi

School of Engineering, Robert Gordon University, Garthdee Road, Aberdeen AB10 7GJ, UK
* Correspondence: m.mahdi1@rgu.ac.uk

Abstract: This article presents a machine learning (ML) application to examine artificial lift (AL) selec-
tion, using only field production datasets from a Sudanese oil field. Five ML algorithms were used to
develop a selection model, and the results demonstrated the ML capabilities in the optimum selection,
with accuracy reaching 93%. Moreover, the predicted ALs have better production performance than
the actual ones in the field. The research identifies the significant production parameters to consider in
AL type and size selection. The top six critical factors affecting AL selection are gas, cumulative
produced fluid, wellhead pressure, GOR, produced water, and the implemented EOR. This article
contributes significantly to the literature and proposes a new and efficient approach to selecting the
optimum AL to maximize oil production and profitability, reducing the analysis time and production
losses associated with inconsistency in selection and frequent AL replacement. This study offers a
universal model that can be applied to any oil field with different parameters and lifting methods.

Keywords: artificial lift; machine learning; production data; supervised learning; algorithms

1. Introduction
Some wells can naturally produce oil at the start of production by using reservoir
primary drive mechanisms such as solution gas drive, gas expansion, and strong water
drive. However, most reservoir energies are finite, will deplete over time, and cannot
naturally lift hydrocarbons to the surface [1]. An AL is a production system unit that
lifts the hydrocarbons from the reservoir to the surface to support insufficient reservoir
energy [2].

The AL is a milestone in the oil and gas industry since it accounts for 95% of worldwide
oil production [3]. There are several types of AL: sucker rod pumping (SRP) or beam
pumping unit (BPU), progressive cavity pump (PCP), gas lift (GL), electrical submersible
pump (ESP), plunger lift (PL), hydraulic jet pump (HJP), and hydraulic piston pump (HPP).
SRP produces approximately 70% of oil worldwide and is postulated to be the oldest lifting
method [4]. Optimum AL selection is critical since it determines the daily fluid production
(daily revenue) that the oil corporations will gain. The AL selection techniques in the
literature study the advantages and disadvantages of each lifting method considering field
conditions, well, and reservoir parameters [5,6]. They use qualitative methods, primarily
relying on engineers' personal experience [7]. The critical issue is that the field parameters
are interdependent and change over the production years. In addition, the parameters are
neither theoretically nor numerically correlated, leading to inconsistency in the selection
and much time spent on analysis. Therefore, more expenses emerge due to AL replacement
within a short production period. Some AL experts suggest that old selection techniques
have nearly vanished and are, to some extent, outdated [8].

The uncertainty in field data and screening resulted in different selection methods and
a gap in the universal selection criteria. Another gap in the AL selection methods is that the
crucial field factors have not been critically identified. There should be one or more factors
in each field parameter category (production, operation, economic, or environmental) that



predominantly affect the selection in either adequacy or elimination of the nominated AL.
This pushed many companies to establish their own selection systems or develop computer
programs based on their field conditions [9–11].
This research aims to develop a universal AL selection model and examine a new
selection criterion based on AL production performance by considering only the field
production dataset. Furthermore, the goal is to identify the critical production factors that
primarily influence AL selection. The approach was applied to select among PCP, BPU, GL, ESP,
and natural flow (NF). The model was also extended to the selection of AL sizes. The
project applied supervised ML (via Python), a guided process where inputs and outputs
are provided to let the algorithm find relevant patterns between those parameters [12].
Five algorithms were used to model AL selection: logistic regression (LR), support vec-
tor machines (SVM), K nearest neighbors (KNN), decision tree (DT), and random forest
(RF). The model determined how successful the process is when applied to only field
production data.

2. Materials Preparation
The data contains oil and gas production and recovery methods from 2006 to 2021
for light, medium, heavy, and extra-heavy oil wells. Twenty-four wells were randomly
selected as the model dataset. The selected field is in the Muglad basin (Figure 1 [13]); it has
more than 500 oil wells and comprises seven subfields (blocks) spread over approximately
17,800 square km in seven remote locations [14,15].

Figure 1. Muglad basin modified from [13]; the selected field is the upper block.


The reservoir lithology is sandstone interbedded with shale and has three main formations:
(1) A is a deep formation that produces light oil and gas;
(2) B is a shallow formation that produces heavy and extra-heavy oil;
(3) C is a deep formation that produces light oil.
Two more tight formations, D and E, have low oil reserves.
The field’s estimated oil reserve is more than 500 million barrels. Most production
wells are completed with 9 5/8 inch casing and 2 7/8 inch, 3 1/2 inch, or 4 1/2 inch production
tubing. Wells’ depths range from 500 m in shallow reservoirs to more than 3000 m in
deep reservoirs.
Most light oil wells started production in NF at the beginning of their production
lives before moving toward AL. Four primary artificial lift methods are installed: PCP, SRP,
GL, and ESP. PCP and SRP have been the primary lifting methods since the field started
production in 2003. The implemented recovery methods are water-flooding [water injection
(WI)], nitrogen injection (NI), gas injection, and thermal recovery [cyclic steam stimulation
(CSS) and steam flooding (SF)]. PCP is installed for cold heavy oil production (CHOP) and
cold heavy oil production with sand (CHOPS). BPU and metal-to-metal PCP (MTM_PCP)
are used with thermal recovery, while GL and ESP produce light oil. Figure 2 demonstrates
the distribution of AL in the dataset. PCP dominates lifting methods in the field, even
though the cumulative fluid production by GL is higher than PCP, ESP, and BPU, as shown
in Figure 3. Although NF cumulative production is high, it rapidly drops after a short
production period because of insufficient reservoir energy. The imbalance in the data may
lead to an accuracy paradox if it is not up- or downsampled (accuracy is a reliable measure
only when there is an approximately equal class distribution in the dataset) [16]. In our
model, the data was kept imbalanced to reflect the actual field state and assess the model’s
robustness in AL selection. Some recent studies have shown developments in modeling
imbalanced data and learning from it [17].

Figure 2. AL distribution in the dataset.
Figure 3. Cumulative oil produced by each AL according to IOR and EOR.

Input Parameters (Features) Selection

The parameters analyzed before selecting the optimum lifting method are categorized
into production, operation, economic, and environmental [18]. Since those parameters
are neither theoretically nor numerically correlated, we used one category of field data
(production) for AL selection to assess the success of applying ML to specific datasets. The
deficiency of measuring and metering tools in the field, such as downhole real-time pressure
and temperature gauges, has restricted the selected features. There is insufficient recording
of reservoir parameters and well-testing operations. The reservoir data are recorded only
during drilling and at the start of production. In addition, some reservoir measurements
are obtained intermittently during workovers. Therefore, the modeling depends on
production parameters, especially those most correlated with and measured by the flow
rate, which measures oil well performance. We used the daily and cumulative flow rates
and the daily production parameters over each AL's whole years of production to thoroughly
analyze the production performance of each AL at the oil field rather than using only the
flow rate limitations. Additionally, some parameters were selected according to their effect
on artificial lift and well productivity, such as IOR and EOR recovery methods. It is
postulated that too many features negatively affect model performance, lead to model
overfitting, and require high computation [19]. The following nine features were selected
for modeling according to the available field data: wellhead pressure (psi), daily produced
fluid (BLPD), gas–oil ratio (GOR) (SCF/STB), daily oil production (STB/D), daily gas
production (MSCF/D), daily water production (BWPD), daily sand production (RB/D),
water cut (%), and the implemented secondary and tertiary recovery methods to increase
oil production. The secondary recoveries, known as improved oil recovery (IOR) and used
for reservoir pressure maintenance, are gas injection, NI, and WI. The tertiary recoveries,
also called enhanced oil recovery (EOR) and used to reduce heavy oil viscosity and increase
mobility towards the wellbore, are CSS and SF. The same production features were selected
for the size selection model, where AL was used as an input parameter.
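As a brief illustration of how the nine selected features and the encoded recovery method could be assembled for the algorithms, the sketch below uses pandas; the column names and the build_xy helper are hypothetical stand-ins for the field database, not the authors' code.

```python
import pandas as pd

# Hypothetical column names standing in for the nine selected production features.
FEATURES = [
    "Wellhead_Press",  # wellhead pressure, psi
    "Total_Fluid",     # daily produced fluid, BLPD
    "GOR",             # gas-oil ratio, SCF/STB
    "Oil",             # daily oil production, STB/D
    "Gas",             # daily gas production, MSCF/D
    "Water",           # daily water production, BWPD
    "Sand",            # daily sand production, RB/D
    "WC",              # water cut, %
    "IOR_EOR",         # categorical recovery method: None, WI, NI, gas injection, CSS, SF
]
TARGET = "AL"          # installed lifting method: PCP, BPU, GL, ESP, MTM_PCP, NF

def build_xy(df: pd.DataFrame):
    """Return a numeric feature matrix X and the AL label vector y."""
    X = df[FEATURES].copy()
    # One-hot encode the categorical recovery method so the classifiers see digits only.
    X = pd.get_dummies(X, columns=["IOR_EOR"], prefix="IOR_EOR")
    return X, df[TARGET]
```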

3. Methods Application

AL selection is a classification problem since the targeted labels are categorical. The
algorithms have been selected accordingly to build the model. The ML algorithms cannot
directly perform modeling on raw data, though it is cleaned and pre-processed using
data wrangling. Data wrangling is considered the most challenging step for the success
of ML [19]. The process is commonly performed through these steps: data collection,
feature selection, data structuring, outlier and duplicate removal, and missing data cleaning.
The data wrangling application swept 26,571 outliers and missing values, as well as 4578
duplicates, from the raw dataset. The data was then normalized between 0 and 1 to reduce
the generalization error, and the categorical parameters were encoded because the
algorithms deal solely with digits.

The modeling was applied to 24 production wells consisting of approximately 465,000
samples after cleaning and preprocessing. Data from 16 wells ranging from 2006 to 2019
was used for model training and validation (approximately 429,000 samples). A minimum
of two wells was selected from each subfield to ensure the model has a piece of information
from each block. The model used 12 wells to train and learn the installed lifting method in
each well according to input parameters and field conditions. The remaining four wells in
the dataset were used to validate the model. This approach differs slightly from the common
train–test split in ML applications, where the algorithm randomly splits a dataset that is
given in bulk. The model was then tested on a new dataset of eight wells with approximately
11,600 samples ranging from 2020 to 2021. Recent data was used to test model performance
under current and future field production conditions and to measure how the model will act
on future unlabeled data.

Nominating which ML algorithms to use is critical, as choosing them depends on their
accuracy [20]. It is rare and challenging to use one classifier for classification and to find
two algorithms with the same accuracy, because each algorithm has its own learning
approach from data [12]. The algorithms' significance is different; thus, combining two or
more algorithms is recommended for excellent results [21,22]. We selected algorithms that
can quickly and efficiently model field data and handle outliers and noise in field parameters.
Five algorithms (SVM, LR, KNN, DT, and RF) were selected for modeling to build a
combination of different classifiers for adequate results. The algorithms with the best
prediction accuracy on training and validation were then used to predict the AL on the
test dataset and determine the critical features. Figure 4 illustrates the simplified workflow
of the AL and size selection models.
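A minimal sketch of the well-based split and the five-classifier comparison just described, using scikit-learn; the DataFrame layout ('Well' and 'Year' columns), the well lists, and the build_xy helper from the previous sketch are assumptions rather than the authors' implementation (the maximum depth of 7 follows the tuning reported in Section 4.1).

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

def compare_classifiers(data: pd.DataFrame, train_wells, valid_wells):
    """Train the five classifiers on a well-based split and report accuracies."""
    train = data[data["Well"].isin(train_wells) & (data["Year"] <= 2019)]
    valid = data[data["Well"].isin(valid_wells) & (data["Year"] <= 2019)]
    test = data[data["Year"] >= 2020]            # 2020-2021 records as the test set

    X_tr, y_tr = build_xy(train)
    X_va, y_va = build_xy(valid)
    X_te, y_te = build_xy(test)
    # Align one-hot columns in case a category is missing from a split.
    X_va = X_va.reindex(columns=X_tr.columns, fill_value=0)
    X_te = X_te.reindex(columns=X_tr.columns, fill_value=0)

    scaler = MinMaxScaler().fit(X_tr)            # scale features to the 0-1 range
    X_tr, X_va, X_te = (scaler.transform(X) for X in (X_tr, X_va, X_te))

    models = {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": SVC(kernel="rbf"),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "DT": DecisionTreeClassifier(max_depth=7),
        "RF": RandomForestClassifier(n_estimators=100, max_depth=7),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        print(f"{name}: train={model.score(X_tr, y_tr):.3f} "
              f"valid={model.score(X_va, y_va):.3f} test={model.score(X_te, y_te):.3f}")
```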

[Workflow: data collection, feature selection, cleaning, and pre-processing → importing algorithms → model training and validation on AL and size selection → testing model performance on AL and size selection with a new dataset → best-performing AL and size selection model → optimum AL and size for the prospective (unlabelled) well.]

Figure 4. AL and size selection workflow.

Ounsakul [18] demonstrated the process of AL selection (Y) using ML algorithms as a
function of field data f(X), which is written as:

Y = f(X)    (1)

or

AL Selection Model = Algorithm (Field Data)    (2)

In our model, the equation is written as:

AL Selection Model = Algorithm (Field Production Data) (3)

The equation explains that the data quality determines the optimum AL selection. In
supervised learning, the data are provided to the algorithms, which are a set of mathemati-
cal equations where the inputs X and outputs Y are given. The algorithms train and learn
by connecting the input variables to the outputs and exploring each variable’s importance
in the dataset. The algorithms then predict the new output (AL) with the best accuracy
based on training results.

Algorithms Classification Criteria


(a) LR solves the probability P(Y|X) task by learning from a training dataset Dt written
as a vector of weights and a bias (WiXi + b), where the weight Wi is the importance
of the feature Xi in the dataset. To make a classification on a new test dataset, LR
calculates the weighted sum class evidence z and then passes the result down to the
sigmoid function σ(z), which maps the result to a probability between 0 (the negative class) and 1 (the positive class).
The algorithm uses the decision boundary of 0.5 to predict the classes [23].
z = \sum_{i=1}^{n} W_i X_i + b    (4)

\sigma(z) = \frac{1}{1 + e^{-z}}    (5)

P(Y|X) = \begin{cases} 1 & \text{if } \sigma(z) > 0.5 \\ 0 & \text{otherwise} \end{cases}    (6)
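To make Equations (4)–(6) concrete, here is a small NumPy sketch of the binary logistic scoring rule; the weights and bias are illustrative values, not fitted model parameters.

```python
import numpy as np

def logistic_predict(X: np.ndarray, W: np.ndarray, b: float) -> np.ndarray:
    """Apply Equations (4)-(6): weighted sum, sigmoid, 0.5 decision boundary."""
    z = X @ W + b                      # Eq. (4): z = sum_i(Wi * Xi) + b
    sigma = 1.0 / (1.0 + np.exp(-z))   # Eq. (5): sigmoid squashes z into (0, 1)
    return (sigma > 0.5).astype(int)   # Eq. (6): class 1 if sigma(z) > 0.5

# Illustrative example with two features and made-up weights.
X = np.array([[0.2, 0.9], [0.8, 0.1]])
W = np.array([1.5, -2.0])
print(logistic_predict(X, W, b=0.3))   # -> [0 1]
```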
(b) SVM is a binary classifier; however, it can be used for multiclass classification by
breaking down the problem into a series of binary classification cases. To do this, we
applied the OVO (one vs. one) approach. The concept is that each classifier separates the
points of two classes, and all OVO classifiers are combined to establish a multiclass
classifier. The number of classifiers needed for this is calculated using n(n − 1)/2
(n = no of classes). Eventually, the most common class in each binary classifica-
tion is then selected by voting [24]. We assume that we have a training dataset
Dt {Xi, Yi} that is binary and linearly separable, where each Xi has dimensionality q
and is either one of Yi = −1 or Yi = +1 (here q = 2). The training data points can be
described as follows:

X_i \cdot W + b \geq +1 \quad \text{for } Y_i = +1    (7)

X_i \cdot W + b \leq -1 \quad \text{for } Y_i = -1    (8)
The SVM draws hyperplanes to separate the classes of each training data point. The
hyperplane is expressed as W · X + b = 0. The support vectors are the training points that lie
on the margin hyperplanes, and the machine keeps the distance between the class-separating
hyperplanes as large as possible. In our multiclassification problem, if the data points are not linearly
separable, then the SVM applies the kernel function K(Xi, Xj) to recast the data points into
a higher-dimensional space X ↦ φ(X) to be separable [25]. Thus, the training data points
recast into the higher space are written as:

(W_j)^T \phi(X) + b_j, \quad j = 1, \ldots, n    (9)



We applied the radial-based kernel (Gaussian kernel) in our model, which is expressed as:

K(X_i, X_j) = \exp\!\left(-\gamma \, \| X_i - X_j \|^2\right)    (10)

where \gamma controls the width of the kernel function. The voting decision function that
assigns one of the n class labels using the m-th decision function is written as [24]:

\text{majority voting}_{m=1,\dots,n} \left( W_m^T \, \phi(X_i) + b_m \right)    (11)
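A brief scikit-learn sketch of the one-vs-one, RBF-kernel SVM described above; the gamma value and the toy data are placeholders, not field values.

```python
import numpy as np
from sklearn.svm import SVC

# Toy multiclass data standing in for production features (X) and AL labels (y).
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8], [0.5, 0.9], [0.4, 0.8]])
y = np.array(["PCP", "PCP", "GL", "GL", "ESP", "ESP"])

# SVC builds n(n-1)/2 binary classifiers (one-vs-one) and votes, as in Eq. (11);
# the RBF kernel of Eq. (10) handles data that are not linearly separable.
clf = SVC(kernel="rbf", gamma=1.0, decision_function_shape="ovo").fit(X, y)
print(clf.predict([[0.15, 0.15], [0.85, 0.85]]))   # expected: ['PCP' 'GL']
```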

(c) In KNN, we have a given training dataset Dt {(X1 , Y1 ), (X2 , Y2 ), . . . ,(Xn, Yn)}, and the
algorithm calculates the Euclidean distance of each class to all training data points
using the formula below, and then constructs a boundary for each class by determining
the K nearest neighbors.
\sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2}    (12)

The algorithm’s performance is susceptible to the K value, which is difficult to estimate.


A small K will result in overfitting, while a large K leads to class boundary intersections
and training data scattering in many neighborhoods. The best option is to try different K
values until the highest accuracy is achieved [26].
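The K-selection advice above can be followed with a simple sweep, sketched here with scikit-learn; the train/validation splits are assumed to come from the earlier sketch.

```python
from sklearn.neighbors import KNeighborsClassifier

def best_k(X_train, y_train, X_valid, y_valid, k_values=range(1, 21, 2)):
    """Try several K values and keep the one with the highest validation accuracy."""
    scores = {}
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)   # Euclidean distance by default
        knn.fit(X_train, y_train)
        scores[k] = knn.score(X_valid, y_valid)
    return max(scores, key=scores.get), scores
```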
(d) The DT consists of a root node, internal nodes, and leaf nodes that assign the class
labels. The concept of DT is that it identifies the informative features regarding each
class label. To obtain that, we use the CART (classification and regression tree) because
some features are categorical as well as the outputs; the DT applies the Gini index in
each node to split the data. Gini is defined as a measure of the probability of incorrect
predictions when the features are selected randomly [27]. Assume we have a Dt
training dataset; the Gini index is calculated using the following equation:
\text{Gini} = 1 - \sum_{i=0}^{n} (P_i)^2    (13)

where Pi is the probability of partitioned data of class i in Dt and n is the total number
of classes of Dt. The feature with a lower Gini value is used to split the data.
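A tiny worked example of Equation (13), computing the Gini index of a node from its class labels; the counts are illustrative only.

```python
from collections import Counter

def gini(labels) -> float:
    """Eq. (13): Gini = 1 - sum(p_i^2) over the class proportions in a node."""
    counts = Counter(labels)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# A node holding 8 PCP and 2 GL records is fairly pure, hence a low Gini value:
print(gini(["PCP"] * 8 + ["GL"] * 2))   # 1 - (0.8^2 + 0.2^2) = 0.32
```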
(e) In RF, let Dt be a training dataset {(X1 , Y1 ), (X2 , Y2 ), . . . ,(Xn, Yn)} since the RF is a
combination of trees, the RF applies bagging (bootstrap aggregating), an aggregation
of multi-tree results. The idea is that the RF randomly splits the training dataset to
bootstrap samples to create multi-trees and repeats the process. The class Y that has
the majority among the results of the trees is then selected by voting [28]. The RF
classifies through the following steps [29]:
1. For b = 1 to B (no. of trees);
I. Create a bootstrap sample Z of size N on the Dt;
II. Grow a tree Tb on the bootstrapped data and recursively repeat these steps
until the minimum node size is reached:
i. Select random m variable from the p variables;
ii. Find the optimum split point among the m variable using the Gini
index to create the daughter nodes.
2. Obtain the output ensemble of trees \{T_b\}_1^B.
Let \hat{Y}_b(X) be the class prediction of the b-th tree; then the conclusive predicted class is
given by:

\hat{Y}_{RF}(X) = \text{majority voting} \, \{\hat{Y}_b(X)\}_1^B    (14)
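The bagging-and-voting procedure in steps 1–2 and Equation (14) can be sketched directly with bootstrap-sampled decision trees; in practice scikit-learn's RandomForestClassifier wraps the same idea. Inputs are assumed to be NumPy arrays.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def random_forest_predict(X_train, y_train, X_new, n_trees=100, seed=0):
    """Grow trees on bootstrap samples and return the majority-vote class, Eq. (14)."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                    # bootstrap sample Z of size N
        tree = DecisionTreeClassifier(max_features="sqrt")  # random m of the p features per split
        tree.fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_new))
    votes = np.array(votes)                                 # shape: (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```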

4. Results and Discussion


4.1. AL Selection Results
The study demonstrates the optimum AL and size selection, as well as the most crucial
production factors that affect the selection. The results were validated with the actual
lifting methods used in the field, considering that they are the optimum lifting methods
according to the previous selection screening and the production performance simulation.
The outcomes of this research have provided insight into the robustness of the ML model
in AL selection using only production data. Table 1 summarizes the AL selection model
training and validation accuracies scored by each algorithm. The algorithms performed
well in training and validation, while the two algorithms, DT and RF, had excellent results
on the new test dataset, as shown in Table 2, including precision, recall, and F1 scores.
RF scored 92.42% accuracy, and DT scored 93.02%. Both RF and DT have test accuracy
that is approximately equivalent to their validation accuracy. SVM, LR, and KNN testing
accuracies are between 58 and 64%.

Table 1. AL selection model training and validation accuracies.

Algorithm Training Accuracy (%) Validation Accuracy (%)


LR 88.61 90.30
SVM 95.46 84.00
KNN 98.09 72.30
DT 99.80 88.60
RF 99.65 92.86

Table 2. AL selection test accuracies.

Algorithm   Accuracy (%)   Precision (%) * / **   Recall (%) * / **   F1 Score (%) * / **

DT          93.02          93.02 / 84.6            93.02 / 92.37       93.02 / 85.25
RF          92.42          92.42 / 77.71           92.42 / 75.38       92.42 / 75.32

* Micro and ** macro average values.

The accuracy of both RF and DT in training and testing is demonstrated in Figure 5.


Both algorithms obtained their highest test accuracy at a maximum tree depth of 7. It is
important to note that the obtained accuracy considers data imbalance and actual field
conditions. Figure 6 is the classification report (known as the confusion matrix), illustrating
the predicted and actual ALs used in the field. We can see that both algorithms effectively
predicted BPU, PCP, and ESP. The 7.5% prediction error occurred in the thermal recovery
pumps, namely BPU and MTM_PCP, while the 7% error is in the GL and NF classification.
The wrong prediction of thermal AL is because MTM_PCP has the lowest weight in the
dataset; nevertheless, both algorithms predicted it for CSS. The BPU was used in both
SF and CSS, while MTM_PCP was only used in CSS wells in the field; thus, the model
selected MTM_PCP as more appropriate than the BPU considering input parameters. The
RF classifier predicted GL instead of the actual NF label due to the similarity of gas and
GOR features. Additionally, NF has approximately the same amount of cumulative GL
oil before reservoir energy is depleted and replaced by another lifting method. The DT
classifier predicted GL for the actual ESP label because some ESP wells produce a small
amount of gas, which the algorithm principally uses to classify the AL.
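A short sketch of how the depth sweep in Figure 5 and the confusion matrices in Figure 6 can be produced, assuming the scaled splits (X_tr, y_tr, X_te, y_te) from the earlier sketch; this is an illustration, not the authors' script.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Prediction error (1 - accuracy) versus maximum tree depth, as in Figure 5.
for depth in range(1, 21, 2):
    dt = DecisionTreeClassifier(max_depth=depth).fit(X_tr, y_tr)
    print(depth, 1 - dt.score(X_tr, y_tr), 1 - dt.score(X_te, y_te))

# Confusion matrix and per-class scores at the best depth, as in Figure 6 and Table 2.
best = DecisionTreeClassifier(max_depth=7).fit(X_tr, y_tr)
print(confusion_matrix(y_te, best.predict(X_te)))
print(classification_report(y_te, best.predict(X_te)))
```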

Figure 5. Model training error vs. test prediction error. (a) DT prediction error; (b) RF prediction error.
Figure 6. Classification report (confusion matrix) of actual AL in the field and predicted AL by model. (a) DT confusion matrix (true label vs. predicted label); (b) RF confusion matrix (true label vs. predicted label).
The ML application in AL selection seems to be a premature process, as one recent
study conducted by [18] shows. Table 3 compares this study with [18] selection results.
The highest model training accuracy obtained by [18] was 94%; no testing scores were
mentioned. Our model's testing accuracy using a separate dataset that it has not been
trained on is 93%, indicating satisfactory performance in predicting the optimum lifting
methods. Our study proves that the selection could be accomplished using a specific
dataset and obtaining the highest accuracy. Another recent interesting study was presented
by [30] to select the optimum AL using fuzzy logic and mathematical models. However,
their model is conditioned on an enclosed data inventory of five lifting methods and might
not be applicable if different input parameters or other ALs are used instead. Our model
is independent of specific datasets; any other data and AL can be modeled. In addition,
ref. [30] tests their model on one well while we test our model on eight wells to substantiate
the results.

Table 3. AL selection results in comparison to a recent study using ML.

Category This Study [18]


No. of used ML algorithms 5 (LR, SVM, KNN, RF, DT) 3 (DT, Naïve Bayes, Neural Network)
Max training accuracy (%) 99.8 (scored by DT) 94 (scored by DT)
Max test accuracy (%) 93.02 N/A
No. of wells 24 9
No. of AL 4 + NF 4

This study demonstrates that the AL selection could be performed using only pro-
duction data. The widely used selection techniques in literature, such as selection ta-
bles [5,31,32], expert systems [10], decision trees [33], and decision-making and ranking
models [34], did not select the optimum AL. Nevertheless, they provide advantages, disad-
vantages, rankings, and guidelines to eliminate inappropriate AL and recommend other
lifting methods. On the other hand, our model directly predicted the optimal AL by learn-
ing from the available data. These research findings strengthen the presumption of [8] that
the conventional selection techniques are no longer valid.

4.2. AL Size Selection Results


RF was selected for modeling in the AL size selection because of its best performance
among the other algorithms. Additionally, the AL it predicted showed better production
performance than the AL predicted by the DT. The model conducted three runs using
13, 8, and 6 size classes. Unlike AL selection’s high accuracy, the accuracies obtained
by the RF classifier on the 13 and 8 classes were 69.68% and 75.74%, respectively. It is
evident that many classes of AL size resulted in poor model performance. The rationale
is that the distribution of AL sizes over an oil well’s life varies as it continuously changes
after years of production. In other words, the continuous replacement of AL sizes led to
some sizes appearing in the training dataset (during a specific production period) while
disappearing in the test dataset (during another production period). Thus, the model could
not find the optimum size to select, although the same AL exists. Operationally, and after
years of production, the AL size is replaced due to low productivity, failure, or IOR/EOR
implementations. Therefore, it is unlikely to use the same AL type and size for the whole
oil well production life because of the change in well and reservoir properties that affects
well deliverability over time. RF obtained the highest model accuracy of 92.42% (Table 4)
using the 6 AL size classes (Large, Medium, and Small for PCP, BPU, and MTM_PCP, WSB
for ESP, NUE-Mandrel for GL, and X-tree_5000psi used for naturally flowing wells). The
other algorithms scored (57–75%) as shown in Figure 7.
The 7.5% RF prediction error resulted from the two wrongly predicted sizes, Small
and Medium, in which Large was the actual size, as illustrated in Figure 8. The reason is
that after years of production, some parameters have similar values, such as flow rates,
pressures, production years, and WC%, which are recorded in different sizes.
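One hedged illustration of the size-selection variant described in Section 2, where the same production features are reused and the AL type itself becomes an input; FEATURES comes from the earlier sketch, and the "AL_size_class" column with six grouped labels is a hypothetical stand-in for the field's size records.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def train_size_model(df: pd.DataFrame):
    """Size-selection model: same production features plus the AL type as an input."""
    X = df[FEATURES + ["AL"]].copy()                   # AL type becomes an input here
    X = pd.get_dummies(X, columns=["IOR_EOR", "AL"])   # encode both categorical columns
    y = df["AL_size_class"]                            # hypothetical 6-class size label
    rf = RandomForestClassifier(n_estimators=100, max_depth=7)
    return rf.fit(X, y)
```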

Table 4. AL size selection test accuracies.

Algorithm   Accuracy (%)   Precision (%) * / **   Recall (%) * / **   F1 Score (%) * / **

RF          92.42          92.42 / 80.39           92.42 / 82.39       92.42 / 81.23

* Micro and ** macro average values.
Figure 7. Algorithms test performances on AL and size selection.

Figure 8. RF size selection confusion matrix (true label vs. predicted label).
4.3. Production Performance of Predicted and Actual AL

We conducted a simulation using the PIPESIM simulator to model the performance
of the predicted AL for well-A and well-B. Well-A undergoes CSS and is produced by SRP
using a thermal sand control pump (275TH7.2S-1.2 Grade-III, Large). Our ML predicted
MTM_PCP instead of the current BPU. We simulated well-A production using a MET-
80V1000 Medium-size thermal pump. Well-B is naturally flowing using X-tree_5000psi.
The well was simulated according to the predicted AL using 3 GL valves. The results
showed that the predicted AL from the ML model has better performance than the currently
installed AL. Well-A can produce 269 STB/D using MTM_PCP, more than the current
97 STB/D by BPU. Well-B can produce 1878 STB/D using GL, compared to its current
natural flow production of 1260 STB/D. Figures 9 and 10 are the sensitivity analyses of
the actual and predicted lifting methods. The figures compare each lifting method's flow
rates, average costs, and run life. The cost of the predicted AL could be higher than the
current lifting methods; however, the high production revenues (above 3 million USD
from well-A and 11 million USD from well-B) would undoubtedly cover the operating
expenses. These case study results are promising for further application in AL selection for
prospective wells.

Well-A (predicted MTM_PCP vs. actual BPU):

                   Flowrate (STB/Day)  Flowrate (STB/Year)  Price (USD)  Avg_Workover (USD)  Avg_Completion (USD)  Avg_Run_Life (Day)
Predicted MTMPCP   269                 98,185               87,000       79,059.26           191,867.82            315
Actual BPU         96                  35,040               62,285       49,462.29           94,726.31             365
Difference         173                 63,145               24,715       29,596.97           97,141.51             -50

Figure 9. Well-A production and cost difference of actual and predicted AL.

Well-B (predicted GL vs. actual NF):

                   Flowrate (STB/Day)  Flowrate (STB/Year)  Price (USD)  Avg_Workover (USD)  Avg_Completion (USD)  Avg_Run_Life (Day)
Predicted GL       1878.6              685,689              33,018       65,778.78           119,431.31            889
Actual NF          1260                459,900              2,500        225,730.97          283,561.00            368
Difference         618.6               225,789              30,518       -159,952.19         -164,129.69           521

Figure 10. Well-B production and cost difference of actual and predicted AL.
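As a rough cross-check (assuming an oil price of about 50 USD per barrel, a figure not stated in the article), the incremental annual volumes in Figures 9 and 10 are consistent with the reported revenue gains:

63,145 STB/year × 50 USD/STB ≈ 3.2 million USD for well-A;
225,789 STB/year × 50 USD/STB ≈ 11.3 million USD for well-B.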
4.4. Critical Production Selection Features

The model also highlighted the critical factors affecting AL and size selection in the
field. Figure 11 illustrates the essential features that the RF used foremost and that affect
the AL classification. Gas and produced fluid substantially determine the classification,
followed by the wellhead pressure. In other words, the algorithm depends primarily on
gas, produced fluid, and wellhead pressure to classify the lifting methods and considers
them crucial factors in AL selection. It is worth mentioning that although there was a poor
correlation between input parameters, the model obtained classification results above
90%. According to the results, these factors should be extensively studied in any current
or prospective oil wells since they determine the production performance. If we take
the gas feature of the old selection technique as described in the literature and in the
field, we find that it is the worst obstacle to most lifting methods except GL and NF. Our
model correctly identified the significance of the gas feature and demonstrated how it
could influence AL selection and production performance. The other highlighted critical
factors are the produced fluid, GOR, produced water, thermal recovery, and the years of
production. The [30] model lacked any IOR or EOR data that affects AL performance,
while [34] considered it, and our model found it a critical feature in AL classification. [34]
used the production rate as the most highly weighted parameter in their model, supporting
the importance of the flow rate feature in our model. Flow rate is a crucial factor in all
selection methods in the literature; however, the literature uses only the flow rate limitations
of each AL [5,31–33]. These limitations are proposed operating ranges that vary in the
literature. Our model studied the daily and cumulative fluid produced during the entire
run-life of the AL and provided the optimum lifting method that can last longer with higher
production performance, as in the production performance simulation results.
[Figure 11 ranks the RF feature importances from highest to lowest: Gas, Total_Fluid, Wellhead_Press, GOR, Water, IOR_EOR_SF, IOR_EOR_CSS, Year, IOR_EOR_None, Oil.]

Figure 11. Critical AL selection features.

5. Conclusions

AL is a milestone in the oil and gas industry, and this paper introduced a new AL
selection approach that can facilitate the selection and save AL parameter screening time
in the Sudanese oil fields. The developed model is universal and can be implemented in
any other established field using different data and lifting methods. The following
conclusions are the result of this research:
• The paper introduced a new AL type and size selection approach using ML from
specific field production data. The results show the robustness of ML application in
AL selection from only field production data;
• The application of ML in AL selection will facilitate the process and reduce the time
spent using old techniques that are considered outdated to a certain degree;
spent using old techniques that are considered outdated to a certain degree; degree;
• The predicted ALs have higher production performances and revenues than the actual
ones in the field;

• Production data from 24 wells from 2006 to 2021 was used for modeling. Data from
2006 to 2019 were used for model training and validation, while data from 2020 to
2021 were used to test model performance;
• Although it is complex to correlate the numeric and categorical input parameters,
the algorithms determine the crucial factors that primarily affect the selection. Gas,
wellhead pressure, produced fluid, GOR, produced water, and years of production are
significant in AL and size selection from production data;
• The model obtained high AL selection accuracies of 93.02% and 92.4%, respectively,
by DT and RF classifiers;
• RF scored the same accuracy of 92.42% in AL size selection, whereas other algorithms
did not perform well;
• Many AL size classes resulted in low model accuracy (69.68% and 75.74% obtained
from 13 and 8 classes, respectively);
• The case study’s results are promising for further application in other fields’ AL selection.

Author Contributions: Conceptualization, M.A.A.M.; methodology, M.A.A.M.; software, M.A.A.M.;


validation, M.A.A.M.; formal analysis, M.A.A.M.; investigation, M.A.A.M.; resources, M.A.A.M.; data
curation, M.A.A.M.; writing—original draft preparation, M.A.A.M.; writing—review and editing,
M.A.A.M., M.A. and G.O.; visualization, M.A.A.M.; supervision, M.A. and G.O. All authors have
read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data are not publicly available due to a confidentiality request by
the provider. The notebook and code of this study can be provided by the corresponding author
upon reasonable request.
Acknowledgments: The authors would like to thank R. Saleem from the University of Aberdeen, M.
El-feel from Schlumberger, M. Mahgoub from Sudan University of Science and Technology/Petro-
Energy E&P Co. Ltd., Musaab F. Alrahman, Naji, and K. Al-Mubarak from Petro-Energy E&P Co. Ltd.,
and all the reviewers who participated in the review for their support in finalizing and publishing
this article.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Temizel, C.; Canbaz, C.H.; Betancourt, D.; Ozesen, A.; Acar, C.; Krishna, S.; Saputelli, L. A Comprehensive Review and
Optimization of Artificial Lift Methods in Unconventionals. In Proceedings of the SPE Annual Technical Conference and
Exhibition, Virtual, 19 October 2020.
2. Lea, J.F.; Clegg, J.D. Chapter 10. Artificial Lift Selection. In Petroleum Engineering Handbook Volume 4: Production Operations
Engineering; Society of Petroleum Engineers: Dallas, TX, USA, 2007.
3. Beckwith, R. Pumping oil: 155 years of artificial lift. J. Pet. Technol. 2014, 66, 101–107. [CrossRef]
4. Dave, M.K.; Mustafa, G. Performance evaluations of the different sucker rod artificial lift systems. In Proceedings of the SPE
Symposium: Production Enhancement and Cost Optimization, Kuala Lumpur, Malaysia, 7 November 2017.
5. Clegg, J.D.; Bucaram, S.M.; Hein, N.W. Recommendations and Comparisons for Selecting Artificial-Lift Methods (includes
associated papers 28645 and 29092). J. Pet. Technol. 1993, 45, 1128–1167. [CrossRef]
6. Syed, F.I.; Alshamsi, M.; Dahaghi, A.K.; Neghabhan, S. Artificial lift system optimization using machine learning applications.
Petroleum 2020, 8, 219–226. [CrossRef]
7. Shi, J.; Chen, S.; Zhang, X.; Zhao, R.; Liu, Z.; Liu, M.; Zhang, N.; Sun, D. Artificial lift methods optimising and selecting based on
big data analysis technology. In Proceedings of the International Petroleum Technology Conference, Beijing, China, 22 March 2019.
8. Noonan, S.G. The Progressing Cavity Pump Operating Envelope: You Cannot Expand What You Don’t Understand. In
Proceedings of the International Thermal Operations and Heavy Oil Symposium, Calgary, AB, Canada, 20 October 2008.
9. Naderi, A.; Ghayyem, M.A.; Ashrafi, M. Artificial Lift Selection in the Khesht Field. Pet. Sci. Technol. 2014, 32, 1791–1799.
[CrossRef]
10. Espin, D.A.; Gasbarri, S.; Chacin, J.E. Expert system for selection of optimum Artificial Lift method. In Proceedings of the SPE
Latin America/Caribbean Petroleum Engineering Conference, Buenos Aires, Argentina, 27 April 1994.
11. Alemi, M.; Jalalifar, H.; Kamali, G.; Kalbasi, M. A prediction to the best artificial lift method selection on the basis of TOPSIS
model. J. Pet. Gas Eng. 2010, 1, 009–015.

12. Mohamed, A.E. Comparative study of four supervised machine learning techniques for classification. Int. J. Appl. Sci. Technol.
2017, 7, 2.
13. Makeen, Y.M.; Shan, X.; Ayinla, H.A.; Adepehin, E.J.; Ayuk, N.E.; Yelwa, N.A.; Yi, J.; Elhassan, O.M.; Fan, D. Sedimentology,
petrography, and reservoir quality of the Zarga and Ghazal formations in the Keyi oilfield, Muglad Basin, Sudan. Sci. Rep. 2021,
11, 743. [CrossRef] [PubMed]
14. CNPC Country Reports, CNPC in Sudan 2009. Available online: https://www.cnpc.com.cn/en/crsinSudan/AnnualReport_list.
shtml (accessed on 15 January 2022).
15. Liu, B.; Zhou, S.; Zhang, S. The Application of Shallow Horizontal Wells in Sudan. In Proceedings of the International Oil and
Gas Conference and Exhibition in China, Beijing, China, 8 June 2010.
16. Akosa, J. Predictive accuracy: A misleading performance measure for highly imbalanced data. Proc. SAS Glob. Forum 2017,
12, 942.
17. Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell. 2016, 5, 221–232.
[CrossRef]
18. Ounsakul, T.; Sirirattaanachatchawan, T.; Pattarachupong, W.; Yokrat, Y.; Ekkawong, P. Artificial lift selection using machine
learning. In Proceedings of the International Petroleum Technology Conference, Beijing, China, 22 March 2019.
19. Brownlee, J. Data Preparation for Machine Learning: Data Cleaning, Feature Selection, and Data Transforms in Python; Machine Learning
Mastery: Vermont, Australia, 2020; p. 111.
20. Kotsiantis, S.B.; Zaharakis, I.; Pintelas, P. Supervised machine learning: A review of classification techniques. Emerg. Artif. Intell.
Appl. Comput. Eng. 2007, 160, 3–24.
21. Osisanwo, F.Y.; Akinsola, J.E.T.; Awodele, O.; Hinmikaiye, J.O.; Olakanmi, O.; Akinjobi, J. Supervised machine learning algorithms:
Classification and comparison. Int. J. Comput. Trends Technol. (IJCTT) 2017, 48, 128–138.
22. Gianey, H.K.; Choudhary, R. Comprehensive review on supervised machine learning algorithms. In Proceedings of the 2017
International Conference on Machine Learning and Data Science (MLDS), Noida, India, 14–15 December 2017; IEEE: New York,
NY, USA, 2017; pp. 37–43.
23. Jurafsky, D.; Martin, J.H. Speech and Language Processing, (draft) edition; Prentice Hall: Hoboken, NJ, USA, 2021; Chapter 5.
24. Mathur, A.; Foody, G.M. Multiclass and binary SVM classification: Implications for training and classification users. IEEE Geosci.
Remote Sens. Lett. 2008, 5, 241–245. [CrossRef]
25. Fletcher, T. Support Vector Machines Explained; Tutorial paper; University College London: London, UK, 2009; pp. 1–19.
26. Taunk, K.; De, S.; Verma, S.; Swetapadma, A. A brief review of nearest neighbor algorithm for learning and classification. In
Proceedings of the 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 15–17
May 2019; IEEE: New York, NY, USA, 2019; pp. 1255–1260.
27. Tan, P.N.; Steinbach, M.; Kumar, V. Classification: Basic concepts, decision trees, and model evaluation. Introd. Data Min. 2006,
1, 145–205.
28. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [CrossRef]
29. Hastie, T.; Tibshirani, R.; Friedman, J. Random forests. In The Elements of Statistical Learning; Springer: New York, NY, USA,
2009; pp. 587–604.
30. Crnogorac, M.; Tanasijević, M.; Danilović, D.; Karović Maričić, V.; Leković, B. Selection of Artificial Lift Methods: A Brief Review
and New Model Based on Fuzzy Logic. Energies 2020, 13, 1758. [CrossRef]
31. Neely, B.; Gipson, F.; Clegg, J.; Capps, B.; Wilson, P. Selection of artificial lift method. In Proceedings of the SPE Annual Technical
Conference and Exhibition, San Antonio, TX, USA, 4 October 1981.
32. Brown, K.E. Overview of artificial lift systems. J. Pet. Technol. 1982, 34, 2384–2396. [CrossRef]
33. Heinze, L.R.; Winkler, H.W.; Lea, J.F. Decision Tree for selection of Artificial Lift method. In Proceedings of the SPE Production
Operations Symposium, Oklahoma City, OK, USA, 9–10 April 1995.
34. Adam, A.M.; Mohamed Ali, A.A.; Elsadig, A.A.; Ahmed, A.A. An Intelligent Selection Model for Optimum Artificial Lift Method
Using Multiple Criteria Decision-Making Approach. In Proceedings of the Offshore Technology Conference Asia, Kuala Lumpur,
Malaysia, 18 March 2022; OnePetro: Kuala Lumpur, Malaysia, 2022.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
