1 s2.0 S0920410521015217 Main
Keywords: Drilling; Machine learning; Rate of penetration; Drilling data quality improvement; Recurrent neural networks

In recent years, machine learning has been adopted in the Oil and Gas industry as a promising technology for solving the most demanding problems, such as downhole parameter estimation and incident detection. The large amount of available data makes this technology an attractive option for solving a wide variety of drilling problems, as well as a reliable candidate for performing big-data analysis and interpretation. Nevertheless, in some cases this approach may lead to petroleum engineering concepts being disregarded in favor of more data-intensive approaches. This study aims to evaluate the impact of drilling data measurement correction on data-driven model performance. Besides using standard data processing techniques, such as gap filling, outlier removal and noise reduction, physics-based drilling models are also implemented for data quality improvement and data correction in consideration of the measurement physics, which is rarely mentioned in publications. In our case study, recurrent neural networks (RNN), which are able to capture the temporal nature of a signal, are employed for rate of penetration (ROP) estimation with an adjustable predictive window. The results show that the RNN model produces the best results when using the drilling data recovered through analytical methods. Moreover, a comprehensive data-driven model evaluation and engineering interpretation are conducted to facilitate a better understanding of data-driven models and their applications.
1. Introduction

1.1. Background

Having an accurate method for rate of penetration (ROP) prediction has been one of the objectives of the drilling engineering industry since the 1950s. However, this problem has proven to be more complex than initially anticipated, as there is no obvious relation between a single drilling parameter and the target ROP value. Different variables, e.g. weight on bit (WOB), rotary speed (RPM), standpipe pressure (SPP) and formation/bit properties, interact in such a way that it is difficult to formulate an accurate and dynamic mathematical expression to predict the ROP dynamics or to estimate a value close to the ground truth. The complexity of the parameter relationships involved in ROP models makes the use of machine learning (ML) technology appealing (Barbosa et al., 2019a).

There are a number of ML methods for formulating a regression problem, such as Random Forest (RF) (Breiman, 2001), Gradient Boosting Machines (GBM) (Friedman, 2001), K-nearest Neighbors (KNN) (Altman, 1992) and Support Vector Machine (SVM) (Christiani and Shawe-Taylor, 2000). In Ahmed et al. (2019), Eskandarian et al. (2016), Hegde et al. (2017), Soares and Gray (2020) and Barbosa et al. (2019b), different authors have presented machine learning approaches (KNN, GBM, RF, SVM etc.) for data-driven rate of penetration modeling. For example, in Han et al. (2018), the authors investigated the performance of the RF model for ROP predictions and compared it with other ML architectures, such as Artificial Neural Networks (ANNs) (McCulloch and Pitts, 1943). The study showed that the RF gave the best results, especially when limited data was available. Another ML model that has been used for ROP prediction is the aforementioned ANN, see Amer et al. (2017), Han et al. (2019) and Moran et al. (2010). In general, good results have been reported for ROP prediction using the above-mentioned models with different ML structures, see Hegde et al. (2019), Esmaeli et al. (2012), Tunkiel et al. (2021) and Soares and Gray (2020).

Some of the most important challenges of data-driven modeling are data issues such as data availability, data generality, data quality, data scaling and data selection. Among them, data quality is critical for model performance. Data problems due to sensor failure, malfunction, miscalibration, user errors, sampling frequencies, time delay, data corruption and so on are often encountered, as seen in the
∗ Corresponding author.
E-mail addresses: andrzej.t.tunkiel@uis.no (A.T. Tunkiel), dan.sui@uis.no (D. Sui).
URL: https://github.com/mauro-encinas (M.A. Encinas).
https://doi.org/10.1016/j.petrol.2021.109904
Received 5 July 2021; Received in revised form 9 November 2021; Accepted 22 November 2021
Available online 7 December 2021
0920-4105/© 2021 Elsevier B.V. All rights reserved.
M.A. Encinas et al. Journal of Petroleum Science and Engineering 210 (2022) 109904
discussions in Geekiyanage et al. (2020). High data quality is desired to provide valuable and useful information. Data processing techniques such as data filtering, cleansing, imputation, resampling, outlier removal and data correction are necessary steps to improve data quality and are always part of data preparation work. However, to the best of the authors' knowledge, there is limited research focused on how to improve or correct the data with the help of measurement physics or analytical methods. In some cases, such as sensor failure or malfunction, it would be beneficial to validate and correct wrong measurements through the use of physics-based models.

1.2. Objectives and novelty

The main objective of this study is to develop a data-driven model that is able to predict the ROP with a high degree of confidence. The second objective is to show the importance of data correction for increasing the accuracy of data-driven models, where the input data quality is improved using not only standard signal processing methods but also analytical physics-based models. The specific objectives towards fulfilling this goal are:

• Identify the key parameters involved in the ROP prediction;
• Determine/correct downhole weight on bit (DWOB) from the surface measurements based on physics-based models;
• Implement and evaluate the different ML models;
• Analyze and interpret ML models from data science and drilling engineering perspectives respectively;
• Provide a benchmark data-set that provides the opportunity for reproducible results.

The main innovation behind this paper is to show that feature engineering can be a significant factor in prediction quality in relation to ROP prediction. We present a case study where Hookload (HL) and DWOB data, recovered through analytical methods, yield significant improvements over a model utilizing only the raw available data, where data is processed purely using standard data processing methods. The case study illustrates how much improvement data reconciliation can bring to the method utilizing a branched ML model, expanding the understanding of the method's potential and sensitivity to data quality.

The model architecture used in this paper relies on previously published work (Tunkiel et al., 2021), where a two-branched model architecture utilizing recurrent elements was introduced. That work focused on inclination data only, and therefore the performance of the method for other applications was unknown. The work presented here re-uses the ML architecture to predict the ROP; it is a first attempt at ROP prediction based on an ML model that utilizes both Recurrent Neural Networks and multilayer perceptron elements.

1.3. Methodologies and structures

The presented research is divided into 3 categories:

• (C1) Data preparation and correction;
• (C2) Data driven ROP modeling via Recurrent Neural Networks (RNN) (Rumelhart et al., 1986);
• (C3) RNN model evaluation and interpretation.

(C1): The data cleaning process used in this study (noise reduction, outlier removal, gap filling) is conducted to remove outliers and correct erratic measurements, see Section 2.1. This work also comprises data correction based on measurement physics to improve the accuracy of the measurements. For this purpose, a physics-based model is implemented to correct the WOB measurements before feeding them to the ML models, see Section 2.2.

(C2): Recurrent Neural Networks, such as Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997), access data sequentially and maintain the information from previous computations when processing new information. Recently, a novel and accurate approach (Tunkiel et al., 2021) was proposed to solve a depth-based series problem using a multi-branch structure that contains a multilayer perceptron (MLP) (Jain et al., 1996) and an RNN to predict the well inclination in the section between the downhole sensor and the drilling bit. In our work, the RNN ROP model follows a model structure and configuration similar to the one presented in Tunkiel et al. (2021), with some changes, see Section 3.

(C3): The RNN model prediction abilities were evaluated and compared with and without the data correction step, see Section 4. This was done to verify whether the ML models benefit significantly from corrected data, or whether the ML algorithms can overcome flawed inputs. The model prediction capability was also evaluated, see Section 5.

2. Data preparation

As a general overview of drilling data, it is important to understand the physics of the parameters. This helps us understand whether the behavior of the data is appropriate or erratic. For instance, how can one identify an outlier in a data set, given that a point outside the previously shown pattern could still carry meaningful information? The most common drilling data issues are the presence of missing data, outliers, noisy data and faulty measurements. Each of these issues can be solved in different ways. In this section, we first present some widely used methods from the signal processing domain to solve common data issues via noise reduction, imputation, outlier removal and redundant measurements. Considering the specifics of drilling data, especially the HL and WOB, analytical methods are then employed for measurement correction based on mathematical drilling models.

2.1. Signal processing methods

2.1.1. Data imputation

Most data sets have missing information, which can be represented as Not a Number (NaN) values in the set. There are two principal ways of dealing with this situation: eliminating the row containing such a value, or replacing it with an estimated value determined by the behavior of the analyzed feature. The first option is used when the available data set is large enough that the elimination of rows does not affect the quality of the whole data set. When row elimination is not an option, the empty cells can be filled (i.e. the intermediate values estimated) by using interpolation techniques (Hauser, 2009) or regression methods, such as linear, quadratic, cubic and polynomial interpolation. Which one to use depends on the particular characteristics of the data set. For instance, it can be beneficial to check each attribute's first numerical derivative. If most of the values of the first derivative are zero, then forward filling (carrying the last valid value forward) is selected as presumably the better method. This check identifies rarely changing attributes such as wellbore diameter, which change abruptly in discrete steps, so that linear interpolation would introduce unrealistic slopes into the data.

2.1.2. Data assimilation

Fig. 1 shows one encountered drilling data issue: the hookload measurements below 2200 m did not follow the same incremental tendency as the data above 2200 m. It is known that in wells that are not horizontal, the weight of the string increases continuously with depth because more pipe is added to the well, given that the analyzed hole section was drilled with the same drillstring (see Section 4.1.2). In the presented case, this pattern was not followed by the hookload measurements, as they change abruptly at 2200 m measured depth. Since all the other features in the data set contained valuable information, it was necessary to correct this faulty measurement.
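The derivative check described in Section 2.1.1 (Data imputation) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 0.5 share used to decide that "most" first differences are zero, and the sample values are all assumptions made for the example.

```python
import math


def is_missing(value):
    """True for None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def forward_fill(values):
    """Replace each missing value with the last valid value before it."""
    filled, last = [], None
    for value in values:
        if not is_missing(value):
            last = value
        filled.append(last)  # stays None if nothing valid has been seen yet
    return filled


def linear_interpolate(values):
    """Fill gaps between two valid samples on a straight line (index-based)."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if not is_missing(v)]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled  # leading/trailing gaps are left untouched


def mostly_constant(values, zero_share=0.5):
    """Check whether most first differences of the valid samples are zero.

    zero_share is an assumed cut-off; the paper only says "most".
    """
    known = [v for v in values if not is_missing(v)]
    diffs = [b - a for a, b in zip(known, known[1:])]
    if not diffs:
        return True
    return sum(d == 0 for d in diffs) / len(diffs) > zero_share


def impute(values):
    """Forward-fill step-like attributes, interpolate smoothly varying ones."""
    if mostly_constant(values):
        return forward_fill(values)
    return linear_interpolate(values)


nan = float("nan")
hole_diameter = [12.25, nan, 12.25, 12.25, nan, 8.5, 8.5]  # step-like attribute
hookload = [126.0, nan, 128.0, 129.0]                      # smoothly varying attribute
print(impute(hole_diameter))
print(impute(hookload))
```

With these made-up inputs the hole diameter is forward filled, so the step from 12.25 to 8.5 stays sharp, while the hookload gap is filled on a straight line.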
$$ y(t) = \frac{1}{N} \sum_{j=0}^{N-1} x(t - j), $$
where x is the raw data, y is the processed data after filtering, t is the time/depth coordinate, j is the index of the convolution step for the data point at t, and N is the number of data points in the averaging window. The moving average filter is very practical for engineers since it does not require frequency analysis to select a cutoff frequency. However, it may lead to time-delay issues. In cases where quick reactions are needed, a low-pass filter with a selected cutoff frequency is an alternative to the moving average filter and may resolve the delay problem, see more discussions in Geekiyanage et al. (2020).
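The moving average filter defined above can be sketched in a few lines. This is an illustrative implementation only; the function name is hypothetical, and averaging over the samples available so far for the first N − 1 points is an assumption, since the text does not say how the start of the signal is handled.

```python
def moving_average(x, n):
    """Trailing moving average: y(t) = (1/N) * sum_{j=0}^{N-1} x(t - j).

    For t < N - 1 fewer than N samples exist, so the average is taken over
    the samples seen so far (an assumed edge treatment).
    """
    if n < 1:
        raise ValueError("window length must be at least 1")
    y = []
    running_sum = 0.0
    for t, value in enumerate(x):
        running_sum += value
        if t >= n:
            running_sum -= x[t - n]  # drop the sample that left the window
        y.append(running_sum / min(t + 1, n))
    return y


# A step change at index 3 is only fully reflected at index 5 (N = 3):
print(moving_average([0, 0, 0, 3, 3, 3], 3))  # [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
```

The example output shows the time-delay issue mentioned in the text: the filtered signal needs N − 1 further samples to reach the new level after a step.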
applied from hydraulic and electric lines. For this correction, Eq. (7) in Hareland et al. (2014) is used:

$$ \mathrm{HL}_{a3} = \alpha \cdot \mathrm{SPP} \cdot \mathrm{ID}^{2}, \tag{7} $$

where HL_a3 is the load effect due to standpipe pressure, α is the model coefficient, SPP is the standpipe pressure and ID is the inner diameter of the mud hose. With this equation, we aim to determine the force applied by the mud-filled hose connected to the Top Drive. Eq. (8) shows the final result of this process:

$$ \mathrm{HL}_{corrected} = \mathrm{HL}_{a1} - \mathrm{HL}_{a2} - \mathrm{HL}_{a3}, \tag{8} $$

where HL_corrected is the corrected hookload after accounting for these three effects. Fig. 2 shows the necessary inputs for the model (HL, Weight of Traveling Block and SPP), the equations used and finally the output of this process. More detailed information about the use of these equations for the analyzed case is given in Section 4.2.1.

2.2.2. WOB correction

Once the corrected HL is calculated, it is possible to calculate the DWOB using the Torque and Drag (T&D) model. Fig. 3 presents the flow process used for the calculation of the DWOB for given data points. The inputs besides the aforementioned are the friction factor, well trajectory, BHA configuration and fluid density. The T&D model selected for this study was developed by Johancsik et al. (1984). The methodology to calculate the DWOB is based on the shooting method (Press et al., 2007). The idea of the shooting method is that an initial guess is made of the unknown boundary conditions at one end of the interval. Using this guess, the terminal conditions obtained from the numerical integration are compared to the known terminal conditions. If the integrated terminal conditions differ from the known terminal conditions by more than a specified tolerance, the unknown initial conditions are adjusted and the process is repeated until the difference between the integrated terminal conditions and the required terminal conditions is less than some specified threshold.

First, the algorithm sets an initial DWOB guess, which is used as an input for the T&D model. The T&D model then calculates the weight of the drill string (calculated HL) under the given conditions. This calculated HL is compared to the corrected HL obtained in the previous step at the same depth (the steps for the corrected HL are given in Section 4.1.2). The resulting difference (error) is identified as either a positive or a negative value: it is positive if the calculated HL is greater than the corrected HL, and negative otherwise. If the magnitude of this error is greater than a predetermined threshold, a correction to the DWOB is applied, taking the sign of the error into account. For example, if the error is positive, meaning that the calculated HL is bigger than the corrected HL, the model increases the DWOB by a predetermined value to reduce this difference. This process is repeated until the error is equal to or smaller than the threshold value. In order to validate the calculations, the results of this algorithm were compared to the ones obtained using Halliburton's WellPlan Software (Halliburton, 2020), see Section 5.1.

Fig. 3 shows the flowchart of the DWOB correction algorithm. The input data of this algorithm are the corrected HL, friction factor, well path, BHA data and fluid density. The output is the calculated DWOB.

Fig. 3. DWOB Correction Diagram.

3. Data-driven ROP modeling

Throughout this section, the main data-driven ROP modeling steps are presented, including the identification of the most important parameters of the ROP models and the ML architecture.

3.1. Key features

Physics-based ROP models have been developed throughout the years, starting in the 1950s. One of the most important developments of those earlier years is the Bingham model (Bourgoyne et al., 1991), which considers two main operational parameters, WOB and RPM. As the years passed, more developments were presented. Some of them are regarded as a reliable source for ROP calculation, for example the model presented in Bourgoyne et al. (1991). It was developed at a time when tri-cone bits were used but has been adapted throughout the years for use with PDC bits, see Wiktorski et al. (2017). One of the recent models was presented by Motahhari et al. (2010), where different factors were taken into consideration, for example confined rock strength, cutter diameter (PDC bits), pore pressure gradient, Equivalent Circulating Density (ECD) and many more. Some of these parameters are available and possibly do not change from one location to another. However, in most cases many of these measurements are not available, or they are only applicable in some specific areas.

It is important to analyze how ML models function with their parameters and which parameters have been historically selected for this purpose. In Barbosa et al. (2019a), the authors presented a study based on more than 50 publications which studied machine learning ROP
models, showing that 8 parameters were the most used or mentioned in all of them. Therefore, in our study, based on the available data-sets and the correlation index, see Fig. 4, the following parameters are included in the model:

• Bit Depth;
• Average Hookload;
• WOB (surface or calculated downhole ones);
• Surface Torque (STQ);
• RPM;
• Standpipe Pressure.

The heatmap of the correlation between the ROP and the features selected for this study is shown in Fig. 4. The correlation score between the HL and the ROP and the one between the DWOB and the ROP are both quite high. Hence correcting the HL and DWOB data is a necessary step in the ML data preparation. In addition, the heatmap shows that the correlation score between the DWOB and the ROP is higher than the one between the SWOB and the ROP, meaning that if the DWOB is available, it would be wise to prioritize good downhole measurements, see more discussions in Section 4.3.

3.2. Machine learning model structure

This study uses a continuous learning idea proposed by Liu (2017) and applied in the domain of drilling by Tunkiel et al. (2021) to continuously update the developed ML model. In Tunkiel et al. (2021), the model considers an expanding data set, which represents a real-life case for drilling operations. The initially available data equals 15% of the total available data and is used to train and validate the model. The following 20% of the total available data is used for testing; this is data that the model has never seen and is completely out of sample. Afterwards, the quantity of data available for training is increased and the training and validation process is repeated, see the detailed presentation in Tunkiel et al. (2021).

In our ROP modeling, the model structure is composed of two branches, an RNN branch and an MLP branch, see Fig. 5. This structure was originally presented by Tunkiel et al. (2021, 2020). In this work the main change is that the RNN branch uses LSTM cells instead of Gated Recurrent Units. Additionally, the number of LSTM cells and of neurons in the MLP were determined through a hyperparameter tuning process. In the RNN branch, the input is only the past ROP data, which goes through a Gaussian noise layer to introduce some level of randomization into the data (the standard deviation of the noise distribution is 0.007). Afterwards, the information goes to the RNN. The output of the LSTM cells enters a dropout layer as an additional measure to avoid overfitting.

Fig. 5. Presented Machine Learning Model Structure, Two Branches: RNN Branch (left) and MLP Branch (right).

The other branch of the model (MLP) uses all other parameters selected for the study as inputs (Depth, WOB, HL, STQ, RPM and SPP). This information goes directly to the MLP layer, and the output of the MLP goes to a dropout layer to avoid overfitting. The output of the dropout layer enters a flatten layer as a preliminary step towards another MLP layer. The outputs of both branches, RNN and MLP, are then concatenated to obtain a single tensor that is the input for an additional MLP layer. Finally, the output of the last MLP provides the predicted ROP during the training stage. The model is run several times and the model with the lowest validation score is selected. In the current work, no quantitative analysis was performed to identify the cost-benefit curve for the number of validation runs against model performance gains. For the purpose of the work presented in this paper, the model was re-trained from scratch 10 times. The appropriate number of runs depends on the performance increase (about 10% in the presented case study) evaluated against the additional time needed to perform training (1-10 min in the presented case study), which in turn depends on the computational hardware available, the problem at hand, the complexity of the network, the available time and the effect of diminishing returns. More quantitative methods with respect to time, trials and the heuristics involved in optimal model development will be further investigated.

The data set has to be reshaped to fit the presented machine learning model, which means that samples are created to contain the input and the predicted output of the model. In Fig. 6, the attribute A is the one to be predicted. The user specifies the length of past data of the target attribute (A) to be fed to the RNN branch (data shown in green boxes), and also the length of the predicted output from the MLP (data shown in the red boxes). The blue boxes with the attributes (B, C, D) are the other input attributes fed to the MLP branch. In general, the lengths of the green boxes (N1) and red boxes (N2) are selected by the user in consideration of the sampling interval and model accuracy. More discussions on the choice of N1 and N2 are given in Section 5.2.
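The reshaping into samples described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function name and dictionary keys are hypothetical, and because Fig. 6 is not reproduced here, the span of the other attributes (B, C, D) is assumed to cover the whole N1 + N2 window.

```python
def make_samples(target, others, n1, n2):
    """Slice a series into (RNN input, MLP input, output) samples.

    target : values of the attribute to predict (A in the text), one per step
    others : one row of the remaining attributes (B, C, D, ...) per step
    n1     : length of past target data fed to the RNN branch (green boxes)
    n2     : length of the predicted output (red boxes)
    """
    samples = []
    for start in range(len(target) - n1 - n2 + 1):
        split = start + n1
        samples.append({
            "rnn_input": target[start:split],
            # Assumed span: attributes over both the past and predicted steps.
            "mlp_input": others[start:split + n2],
            "output": target[split:split + n2],
        })
    return samples


rop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up target values
attrs = [[10.0 * i] for i in range(6)]  # one made-up attribute per step
for sample in make_samples(rop, attrs, n1=3, n2=2):
    print(sample["rnn_input"], "->", sample["output"])
```

Each sample is shifted by one step relative to the previous one, so a series of length L yields L − N1 − N2 + 1 samples.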
Table 1
Parameters used in case study with ranges.

No.  Feature                      Units    Range
1    Bit Depth (MD)               meters   1900–2400
2    Surface Weight on Bit        tonnes   0.7–7.2
3    Average Hookload             tonnes   126–143
4    Surface Torque               kN-m     14.9–25.3
5    Downhole Weight on Bit       tonnes   0.5–7.4
7    Average Rotary Speed         rpm      150
8    Average Standpipe Pressure   kPa      18363–20646.5
Fig. 7. Hookload vs Depth Comparison before Correction (blue) and after Data
Assimilation Correction (green).
Fig. 11. Comparison between Calculated DWOB (blue) and Measured SWOB (green).
0.25 m. Since the available surveys for this well were registered every 40 m, it is necessary to interpolate the well trajectory to obtain survey points at intervals of at most 5 m in the BHA section and 20 m in the drill pipe section. This was done to obtain more accurate results for the HLs, as the efficacy of this model relies on the quality of this parameter.

Fig. 9. Processed Data Elements.

The shooting method mentioned in Section 2.2.2 was numerically implemented to search for the best matching DWOB following the procedure presented in Fig. 3. First, a lower boundary value of the DWOB is selected. In this case the first assumption was to set this value equal to the registered surface WOB (SWOB). Then the model back-calculates the total weight of the drill string, which is compared to the corrected HL value. For this case study the threshold was set at 0.25 tonnes; if the difference is greater than this value, the DWOB guess value is changed. This process is repeated until the difference is smaller than the predefined threshold. Once the termination condition is satisfied, the guess value of the DWOB is taken as the WOB estimated by the T&D model calculation with respect to the calculated HL. Fig. 11 presents the estimated DWOB results (blue curve) obtained from this process and the registered SWOB (green curve). It shows that the DWOB and SWOB deviate strongly in two sections (1900 m–2000 m and 2200 m–2300 m). In the following section, we compare the results of the ML models using each of the two measurements, especially for ROP predictions in these two sections.
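The search loop described above can be sketched as follows. This is only a sketch under strong assumptions: the Torque and Drag model of Johancsik et al. (1984) is replaced by a hypothetical stand-in (`toy_hookload`, a frictionless vertical string whose hookload is the string weight minus the DWOB), and the fixed step of 0.1 tonnes, the iteration cap and all numeric values are invented for the example. Only the 0.25 tonne threshold and the SWOB starting guess come from the text.

```python
def toy_hookload(dwob, string_weight):
    """Hypothetical stand-in for the T&D model: vertical well, no friction."""
    return string_weight - dwob


def estimate_dwob(corrected_hl, swob, hl_model, threshold=0.25, step=0.1,
                  max_iter=10_000):
    """Adjust the DWOB guess until the modelled HL matches the corrected HL.

    corrected_hl : corrected hookload at this depth (tonnes)
    swob         : surface WOB, used as the initial guess (tonnes)
    hl_model     : callable returning the calculated HL for a DWOB guess
    threshold    : accepted mismatch (0.25 tonnes in the case study)
    step         : assumed fixed DWOB increment per iteration (tonnes)
    """
    dwob = swob
    for _ in range(max_iter):
        error = hl_model(dwob) - corrected_hl
        if abs(error) <= threshold:
            return dwob
        # Positive error: calculated HL too high, so increase the DWOB.
        dwob += step if error > 0 else -step
    raise RuntimeError("DWOB search did not converge")


# Made-up numbers: 140 t string weight, 135 t corrected HL, 3 t surface WOB.
dwob = estimate_dwob(135.0, 3.0, lambda w: toy_hookload(w, 140.0))
print(round(dwob, 2))
```

A fixed step keeps the sketch close to the description in the text; the step must stay below twice the threshold, otherwise the guess can jump back and forth across the accepted band without ever landing inside it.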
Fig. 12. Error in Predicted ROP Comparison when Using Calculated DWOB (blue) and
SWOB (red).
Hookload, Surface Torque, RPM and SPP), while the only input to be
different is the WOB (SWOB measurement or corrected DWOB).
Fig. 12 presents the model results for the case where the calculated DWOB is used as the input (blue line) and the case where the SWOB is used as the input (red line). In this plot the MAE is on the y axis, while the data percentage (under continuous learning, the training data increases with depth during operations) is on the x axis. The use of the calculated DWOB provides an overall better result for the ROP predictions, especially in the first drilling section (approximately 18%–26%).

Fig. 13. Heatmap Comparison ROP Prediction Using Corrected DWOB and Measured SWOB.
Fig. 13 shows how the MAE varies with the drilled depth and the fixed prediction length (5 m) using the corrected DWOB and the measured SWOB respectively. The y axis shows the drilled depth in meters, and the x axis shows the distance predicted ahead of the bit in meters. The figure is divided into three sections: on the left are the results for the calculations using the DWOB, in the middle are the results for the calculations using the SWOB, and on the right is the color bar indicating the quality of the prediction (MAE).

At the beginning, with little information (15% of the data is used to train the model, which represents only 75 m), the model achieves a good level of precision in the prediction when the corrected DWOB is used. For this case, the prediction error is higher than 6 m/h only in a part between 2125 m and 2150 m, while the rest of the section is predicted properly. Another observation is that the prediction error decreases as more data is used for training. For the ROP prediction using the SWOB as the input, higher errors are encountered at the beginning of the well section. As the number of training samples increases, the error reduces, but compared to the plot on the left (calculated DWOB) the overall error in the predictions is still higher. More discussions on the results are given in Sections 5.2–5.3.
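The continuous learning scheme behind these results (an initial 15% of the data for training and validation, the following 20% for testing, then a growing training set) can be sketched as follows, together with the mean absolute error (MAE) used as the metric. The function names, the 5% growth step and the sample count of 500 are assumptions for illustration; the text gives only the 15% and 20% shares.

```python
def expanding_splits(n_samples, start_pct=15, test_pct=20, step_pct=5):
    """Return (train_end, test_end) index pairs for an expanding data set.

    Samples [0, train_end) are used for training/validation and samples
    [train_end, test_end) for testing. step_pct is an assumed growth step.
    """
    splits = []
    for train_pct in range(start_pct, 100, step_pct):
        train_end = n_samples * train_pct // 100
        test_end = min(n_samples, n_samples * (train_pct + test_pct) // 100)
        splits.append((train_end, test_end))
    return splits


def mean_absolute_error(actual, predicted):
    """MAE between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Assumed: one sample per meter over a 500 m section (1900-2400 m).
splits = expanding_splits(500)
print(splits[0])   # (75, 175): the first 75 m train, the next 100 m test
print(splits[-1])  # (475, 500): the test window shrinks near the section end
print(mean_absolute_error([10.0, 12.0, 14.0], [11.0, 12.0, 11.0]))
```

With one sample per meter, the first split trains on 75 samples, which matches the 75 m quoted above for 15% of the data.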
Fig. 14 presents the different HL results. In this plot three values are shown. The red line represents the results of the corrected HL, which takes into account three effects. The yellow line shows the results for the corrected HL calculated by the T&D model. Finally, the blue line shows the values obtained by Halliburton's WellPlan, which was used as the verification of the proposed calculations. From this figure, it is also possible to see that the implemented model (red one) works properly to maintain a low difference with the calculated HL (yellow one).
5.5. Discussions
utilizes a complete series of inputs for each attribute used, making processes such as ROP optimization much more complex. The inputs can be arbitrary only to an extent, as they have to be realistic. The ML model will respond in a meaningful way only for the area of input hyperspace that was covered in training. In the presented case study, the input is three dimensional as required when working with LSTM networks, the batch size is constantly increasing per the principle of continuous learning and one unit in the input sequence in each training cycle, which necessarily means that training covers only a small portion of this hyperspace, even if the whole realistic area is covered.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to express our gratitude to Equinor, Norway, for providing funding for this research through the Equinor Akademia Program, and we thank OTICS at UiS for supporting the project.
Abbreviations

ANN    Artificial Neural Network
BHA    Bottom Hole Assembly
DA     Data Assimilation
DWOB   Downhole Weight on Bit
ECD    Equivalent Circulating Density
GBM    Gradient Boosting Machines
HL     Hookload
IQR    Interquartile Range
KNN    K-nearest Neighbors
LSTM   Long Short-Term Memory
MAE    Mean Absolute Error
MAPE   Mean Absolute Percentage Error
ML     Machine Learning
MLP    Multilayer Perceptron
NaN    Not a Number
RF     Random Forest
RNN    Recurrent Neural Network
ROP    Rate of Penetration
RPM    Rotary Speed
SPP    Standpipe Pressure
SVM    Support Vector Machine
STQ    Surface Torque
SWOB   Surface Weight on Bit
T&D    Torque and Drag
WOB    Weight on Bit

References

Aadnoy, B.S., Fazaelizadeh, M., Hareland, G., 2010. A 3d analytical model for wellbore friction. J. Can. Pet. Technol. 25–36. http://dx.doi.org/10.2118/141515-PA.
Ahmed, O., Adeniran, A.A., Samsuri, A., 2019. Computational intelligence based prediction of drilling rate of penetration: a comparative study. J. Pet. Sci. Eng. 1–12.
Altman, N.S., 1992. An introduction to kernel and nearest-neighbor nonparametric regression. Amer. Statist.
Amer, M.M., Dahab, A.S., El-Sayed, A.-A.H., 2017. An ROP predictive model in Nile Delta area using artificial neural networks. D023S012R001, URL https://doi.org/10.2118/187969-MS.
Barbosa, L.F.F., Nascimento, A., Mathias, M.H., de Carvalho, J.A., 2019a. Machine learning methods applied to drilling rate of penetration prediction and optimization - a review. J. Pet. Sci. Eng. 183, URL http://www.sciencedirect.com/science/article/pii/S0920410519307533.
Barbosa, L.F.F.M., Nascimento, A., Mathias, M.H., Carvalho, J.A., 2019b. Machine learning methods applied to drilling rate of penetration prediction and optimization - a review. J. Pet. Sci. Eng.
Bourgoyne, A., Millheim, K., Chenevert, M., Young, F., 1991. Applied Drilling Engineering, Vol. 2. SPE Textbook Series.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32, URL https://doi.org/10.1023/A:1010933404324.
Cayeux, E., Skadsem, H.J., Kluge, R., 2015. Accuracy and correction of hook load measurements during drilling operations. D021S010R005, URL https://doi.org/10.2118/173035-MS.
Christiani, N., Shawe-Taylor, J., 2000. Support Vector Machines: Data Analysis, Machine Learning and Applications. Cambridge University Press, 0521780195.
Deep, R., 2005. Probability and Statistics with Integrated Software Routines. Academic Press, p. 290.
Equinor, 2018. Disclose all volve data. https://www.equinor.com/en/news/14jun2018-disclosing-volve-data.html (Accessed: 19 May 2020).
Equinor, 2020. Volve field. https://www.equinor.com/en/what-we-do/norwegian-continental-shelf-platforms/volve.html (Accessed: 19 May 2020).
Eskandarian, S., Bahrami, P., Kazemi, P., 2016. A comprehensive data mining approach to estimate the rate of penetration: application of neural network, rule based models and feature ranking. J. Pet. Sci. Eng.
Esmaeli, A., Elahifar, B., Fruhwirth, R., Thonhauser, G., 2012. ROP modeling using neural network and drill string vibration data. Soc. Petrol. Eng. SPE 163330. http://dx.doi.org/10.2118/163330-MS.
Friedman, J.H., 2001. Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 (5), 1189–1232. http://dx.doi.org/10.1214/aos/1013203451, URL https://projecteuclid.org:443/euclid.aos/1013203451.
Geekiyanage, S.C.H., Sui, D., Aadnoy, B.S., 2018. Drilling data quality management: Case study with a laboratory scale drilling rig. Volume 8: Polar and Arctic Sciences and Technology; Petroleum Technology, URL https://doi.org/10.1115/OMAE2018-77510.
Geekiyanage, S., Tunkiel, A.T., Sui, D., 2020. Drilling data quality improvement and information extraction with case studies. J. Petrol. Explor. Prod. Technol. http://dx.doi.org/10.1007/s13202-020-01024-x.
Halliburton, 2020. WellPlan engineering software. https://www.landmark.solutions/WellPlan-Well-Engineering-Software.
Han, T., Jiang, D., Zhao, Q., Wang, L., Yin, K., 2018. Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery. Trans. Inst. Meas. Control 40 (8), 2681–2693, URL https://doi.org/10.1177/0142331217708242.
Han, J., Sun, Y., Zhang, S., 2019. A data driven approach of ROP prediction and drilling performance estimation. D011S010R006, URL https://doi.org/10.2523/IPTC-19430-MS.
CRediT authorship contribution statement Hareland, G., Wu, A., Lei, L., 2014. The Field Tests for Measurement of Down-
hole Weight on Bit(DWOB) and the Calibration of a Real-time DWOB Model.
Mauro A. Encinas: Conceptualization, Investigation, Data cura- International Petroleum Technology Conference, Doha, Qatar, p. 6, IPTC, URL
https://doi.org/10.2523/IPTC-17503-MS.
tion, Methodology. Andrzej T. Tunkiel: Formal analysis, Supervision, Hauser, J.R. (Ed.), 2009. Interpolation. In: Numerical Methods for Nonlinear Engineer-
Writing – original draft. Dan Sui: Conceptualization, Methodology, ing Models. Springer Netherlands, Dordrecht, pp. 187–226, URL https://doi.org/
Supervision, Writing – original draft. 10.1007/978-1-4020-9920-5_6.
M.A. Encinas et al. Journal of Petroleum Science and Engineering 210 (2022) 109904
Hawkins, D.M., 1980. Identification of Outliers. In: Monographs on applied probability and statistics, Chapman and Hall, http://dx.doi.org/10.1007/978-94-015-3994-4.
Hegde, C., Daigle, H., Millwater, H., Gray, K., 2017. Analysis of rate of penetration (ROP) prediction in drilling using physics-based and data-driven models. J. Pet. Sci. Eng.
Hegde, C., Soares, C., Gray, K., 2019. Rate of penetration (ROP) modeling using hybrid models: Deterministic and machine learning. In: Unconventional Resources Technology Conference, https://doi.org/10.15530/URTEC-2018-2896522.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), 1735–1780, URL https://doi.org/10.1162/neco.1997.9.8.1735.
Jain, A.K., Mao, J., Mohiuddin, K.M., 1996. Artificial neural networks: a tutorial. Computer 29 (3), 31–44. http://dx.doi.org/10.1109/2.485891.
Johancsik, C.A., Friesen, D.B., Dawson, R., 1984. Torque and drag in directional wells-prediction and measurement. J. Pet. Technol. 36 (06), 987–992, SPE, URL https://doi.org/10.2118/11380-PA.
Law, K., Stuart, A., Zygalakis, K., 2015. Data Assimilation: A Mathematical Introduction. In: Texts in Applied Mathematics, vol. 62, Springer, Cham, URL https://doi.org/10.1007/978-3-319-20325-6.
Liu, B., 2017. Lifelong machine learning: a paradigm for continuous learning. Front. Comput. Sci. 11 (3), 359–361, URL https://doi.org/10.1007/s11704-016-6903-6.
Luke, G.R., Juvkam-Wold, H.C., 1993. The determination of true hook-and-line tension under dynamic conditions. SPE Drill. Complet. 8 (04), 259–264, SPE, URL https://doi.org/10.2118/23859-PA.
McCulloch, W.S., Pitts, W., 1943. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophy. 5 (4), 115–133, URL https://doi.org/10.1007/BF02478259.
McLaughlin, D., 2014. Data assimilation. In: Njoku, E.G. (Ed.), Encyclopedia of Remote Sensing. Springer New York, New York, NY, pp. 131–134, URL https://doi.org/10.1007/978-0-387-36699-9_33.
Mockus, J., Tiesis, V., Zilinskas, A., 2014. The application of Bayesian methods for seeking the extremum. 2, pp. 117–129,
Moran, D., Ibrahim, H., Purwanto, A., Osmond, J., 2010. Sophisticated ROP prediction technologies based on neural network delivers accurate drill time results. SPE-132010-MS, URL https://doi.org/10.2118/132010-MS.
Motahhari, H.R., Hareland, G., James, J.A., 2010. Improved drilling efficiency technique using integrated PDM and PDC bit parameters. J. Can. Pet. Technol. 49 (10), 45–52, URL https://doi.org/10.2118/141651-PA.
Nogueira, F., 2014. Bayesian optimization: Open source constrained global optimization tool for Python. URL https://github.com/fmfn/BayesianOptimization.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2007. Numerical recipes 3rd edition: The art of scientific computing, third ed. Cambridge University Press, USA, pp. 955–983.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323 (6088), 533–536, URL https://doi.org/10.1038/323533a0.
Sheppard, M.C., Wick, C., Burgess, T.M., 1986. Designing well paths to reduce drag and torque. In: Presented at the 61st Annual Technical Conference and Exhibition of the SPE, New Orleans, https://doi.org/10.2118/15463-PA.
Soares, C., Gray, K., 2020. Real-time predictive capabilities of analytical and machine learning rate of penetration (ROP) models. J. Pet. Sci. Eng. http://dx.doi.org/10.1016/j.petrol.2018.08.083.
Tunkiel, A.T., Sui, D., Wiktorski, T., 2021. Training-while-drilling approach to inclination prediction in directional drilling utilizing recurrent neural networks. J. Pet. Sci. Eng. 196, 108128. http://dx.doi.org/10.1016/j.petrol.2020.108128, URL http://www.sciencedirect.com/science/article/pii/S0920410520311827.
Tunkiel, A.T., Wiktorski, T., Sui, D., 2020. Continuous drilling sensor data reconstruction and prediction via recurrent neural networks. Volume 1: Offshore Technology, URL https://doi.org/10.1115/OMAE2020-18154.
Wiktorski, E., Kuznetcov, A., Sui, D., 2017. ROP optimization and modeling in directional drilling process. D011S003R002, URL https://doi.org/10.2118/185909-MS.
Wu, A., Hareland, G., Fazaelizadeh, M., 2011. Torque & drag analysis using finite element method. Mod. Appl. Sci. http://dx.doi.org/10.5539/mas.v5n6p13.
Wu, Z., Hareland, G., Loggins, S.M., Lai, S., Eddy, A., Olesen, L., 2017. A new method of calculating rig parameters and its application in downhole weight on bit automation in horizontal drilling. D031S004R005, URL https://doi.org/10.2118/185643-MS.