
Journal of Petroleum Science and Engineering 210 (2022) 109904
Downhole data correction for data-driven rate of penetration prediction modeling
Mauro A. Encinas, Andrzej T. Tunkiel, Dan Sui ∗
Department of Energy and Petroleum Engineering, Faculty of Science and Technology, University of Stavanger, 4036 Stavanger, Postboks, 8600 12 Forus, Norway
ARTICLE INFO

Keywords: Drilling; Machine learning; Rate of penetration; Drilling data quality improvement; Recurrent neural networks

ABSTRACT

In recent years, machine learning has been adopted in the Oil and Gas industry as a promising technology for solutions to the most demanding problems like downhole parameters estimations and incidents detection. A big amount of available data makes this technology an attractive option for solving a wide variety of drilling problems, as well as a reliable candidate for performing big-data analysis and interpretation. Nevertheless, this approach may cause, in some cases, that petroleum engineering concepts are disregarded in favor of more data-intensive approaches. This study aims to evaluate the impact of drilling data measurement correction on data-driven model performance. In our study, besides using the standard data processing technologies, like gap filling, outlier removal, noise reduction etc., the physics-based drilling models are also implemented for data quality improvement and data correction in consideration of the measurement physics, rarely mentioned in most of publications. In our case study, recurrent neural networks (RNN) that are able to capture temporal natures of a signal are employed for the rate of penetration (ROP) estimation with an adjustable predictive window. The results show that the RNN model produces the best results when using the drilling data recovered through analytical methods. Moreover, the comprehensive data-driven model evaluation and engineering interpretation are conducted to facilitate better understanding of the data-driven models and their applications.
∗ Corresponding author.
E-mail addresses: andrzej.t.tunkiel@uis.no (A.T. Tunkiel), dan.sui@uis.no (D. Sui).
URL: https://github.com/mauro-encinas (M.A. Encinas).
https://doi.org/10.1016/j.petrol.2021.109904
Received 5 July 2021; Received in revised form 9 November 2021; Accepted 22 November 2021
Available online 7 December 2021
0920-4105/© 2021 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Background

Having an accurate method for rate of penetration (ROP) prediction has been one of the objectives of the drilling engineering industry since the 1950s. However, this problem has proven to be more complex than initially anticipated, as there is no obvious relation between a single drilling parameter and the target ROP value. Different variables, e.g. weight on bit (WOB), rotary speed (RPM), standpipe pressure (SPP) and formation/bit properties, interact in such a way that it is difficult to formulate an accurate and dynamic mathematical expression to predict the ROP dynamics or estimate a value close to the ground truth. The complexity of the parameters' relationships involved in the ROP models makes the use of machine learning (ML) technology appealing (Barbosa et al., 2019a).

There are a number of ML methods to formulate a regression problem, like Random Forest (RF) (Breiman, 2001), Gradient Boosting Machines (GBM) (Friedman, 2001), K-nearest Neighbors (KNN) (Altman, 1992) and Support Vector Machine (SVM) (Christiani and Shawe-Taylor, 2000). In Ahmed et al. (2019), Eskandarian et al. (2016), Hegde et al. (2017), Soares and Gray (2020) and Barbosa et al. (2019b), different authors have presented machine learning approaches (KNN, GBM, RF, SVM etc.) for data-driven rate of penetration modeling. For example, in Han et al. (2018), the authors investigated the performance of the RF model for ROP predictions and compared it with other ML architectures, like Artificial Neural Networks (ANNs) (McCulloch and Pitts, 1943). It showed that the results using the RF were optimal, especially considering the situation when limited data was available. Another ML model that has been used for ROP prediction is the aforementioned ANN, see Amer et al. (2017), Han et al. (2019) and Moran et al. (2010). In general, there are good results for the ROP prediction using the above-mentioned models with different ML structures, see Hegde et al. (2019), Esmaeli et al. (2012), Tunkiel et al. (2021) and Soares and Gray (2020).

Some of the most important challenges of data-driven modeling are data issues like data availability, data generality, data quality, data scaling, and data selection. Among them, the data quality issue is very critical for model performance. Data problems due to sensor failure, malfunction, miscalibration, user errors, sampling frequencies, time delay, data corruption and so on are often encountered, as seen in the discussions in Geekiyanage et al. (2020). High data quality is desired to provide valuable and useful information. Data processing techniques like data filtering, cleansing, imputation, resampling, outlier removal and data correction are the necessary steps to improve data quality, and they are always implemented in data preparation work. However, to the best of the authors' knowledge, there is limited research focused on how to improve or correct the data with the help of measurement physics or analytical methods. In some cases, like sensor failure or malfunction, it would be beneficial to validate and correct wrong measurements through the use of physics-based models.

1.2. Objectives and novelty

The main objective of this study is to develop a data-driven model that is able to predict the ROP with a high degree of confidence. The second objective is to show the importance of data correction on increasing the accuracy of data-driven models, where the input data quality is improved using not only standard signal processing methods, but also the analytical physics-based models. The specific objectives towards fulfilling this goal are:

• Identify the key parameters involved in the ROP prediction;
• Determine/correct downhole weight on bit (DWOB) from the surface measurements based on physics-based models;
• Implement and evaluate the different ML models;
• Analyze and interpret ML models from data science and drilling engineering perspectives respectively;
• Provide a benchmark data-set that provides the opportunity for reproducible results.

The main innovation behind this paper is to show that feature engineering can be a significant factor in prediction quality in relation to the ROP prediction. We present a case study, where Hookload (HL) and DWOB data, recovered through analytical methods, yield significant improvements over a model utilizing only the raw available data, where data is processed purely using standard data processing methods. The case study illustrates how much improvement data reconciliation can bring to the method utilizing a branched ML model, expanding the understanding of the method's potential and sensitivity to data quality.

The work presented in this paper in terms of the model architecture relies on the previously published work (Tunkiel et al., 2021), where a two-branched model architecture utilizing recurrent elements was introduced. This previous work focused on the inclination data only and therefore the performance of the method for other applications was unknown. The work presented here re-uses the ML architecture to predict the ROP; it is a first attempt at ROP prediction based on an ML model utilizing both Recurrent Neural Networks and multilayer perceptron elements.

1.3. Methodologies and structures

Presented research is divided into 3 categories:

• (C1) Data preparation and correction;
• (C2) Data driven ROP modeling via Recurrent Neural Networks (RNN) (Rumelhart et al., 1986);
• (C3) RNN model evaluation and interpretation.

(C1): The data cleaning process used in this study (like noise reduction, outlier removal, gap filling) is conducted to remove outliers and correct erratic measurements, see Section 2.1. This work also comprises the data correction based on measurement physics to improve the accuracy of the measurements. For this purpose, a physics-based model is implemented to correct the WOB measurements before feeding them to ML models, see Section 2.2.

(C2): Recurrent Neural Networks, such as Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997), access data sequentially and maintain the information from previous computations when processing new information. Recently, a novel and accurate approach (Tunkiel et al., 2021) was proposed to solve a depth-based series problem using a multi-branch structure that contains a multilayer perceptron (MLP) (Jain et al., 1996) and an RNN to predict the well inclination in the section between the downhole sensor and the drilling bit. In our work, the RNN ROP model developed follows a similar model structure and configuration as the one presented in Tunkiel et al. (2021) with some changes, see Section 3.

(C3): The RNN model prediction abilities were evaluated and compared with and without the data correction step, see Section 4. This was done to verify whether the ML models can significantly benefit from corrected data, or whether the ML algorithms can overcome flawed inputs. Moreover, the other evaluated consideration is the model prediction capability, see Section 5.

2. Data preparation

As a general overview of drilling data, it is important to understand the parameters' physics. This helps us understand whether the behavior of the data is appropriate or erratic. For instance, how can we tell whether an apparent outlier in a data-set carries meaningful information or simply falls outside of the previously shown pattern? The most common drilling data issues are the presence of missing data, outliers, noisy data, and faulty measurements. Each of these issues can be solved in different ways. In this section, we first present some widely used methods in the signal processing domain to solve common data issues via noise reduction, imputation, outlier removal and redundant measurements. Considering the specifics of drilling data, especially the HL and WOB, analytical methods are employed here for such measurement correction based on mathematical drilling models.

2.1. Signal processing methods

2.1.1. Data imputation

Most data sets have missing information that could be represented as Not a Number (NaN) values in the set. There are two principal ways of dealing with this situation: eliminating the row containing such a value, or replacing it with an estimated value determined by the behavior of the analyzed feature. The first case is used when the available data set is large enough that the elimination of rows does not affect the quality of the whole data set. When row elimination is not an option, filling of these empty cells (i.e. estimating the intermediate values) can be done by using interpolation techniques (Hauser, 2009) or regression methods, such as linear, quadratic, cubic and polynomial interpolation. Which one to use will depend on the particular characteristics of the data sets. For instance, it can be beneficial to perform a check on each attribute's first numerical derivative. If most of the values of the first derivative are zero, then forward filling (using the last valid value forward) is selected as presumably the better method. This will identify rarely changing values such as wellbore diameter, which is more likely to change rapidly or discretely, where linear interpolation would introduce unrealistic slopes to the data.

2.1.2. Data assimilation

Fig. 1 shows one encountered drilling data issue: the hookload measurements below 2200 m did not follow the same incremental tendency as the data above 2200 m. It is known that in wells that are not horizontal, the weight of the string increases continuously with depth because more pipes are added to the well, considering that we analyzed a hole section that was drilled using the same drillstring (see Section 4.1.2). In the presented case, this pattern was not followed by the hookload measurements, as they change abruptly at 2200 m measured depth. Therefore, as all the other features in the data set contained valuable information, it was necessary to correct this faulty measurement.
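The imputation rule described in Section 2.1.1 (check the first numerical derivative, then choose between forward filling and interpolation) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of `None` as the missing-value marker, and the 0.5 threshold standing in for "most of the values" are assumptions made only for this example, and linear interpolation stands in for whichever interpolation scheme suits the data.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing sample (assumption)


def mostly_flat(values: Series, threshold: float = 0.5) -> bool:
    """True if more than `threshold` of the first differences are zero."""
    valid = [v for v in values if v is not None]
    diffs = [b - a for a, b in zip(valid, valid[1:])]
    if not diffs:
        return True
    zero_share = sum(1 for d in diffs if d == 0) / len(diffs)
    return zero_share > threshold


def forward_fill(values: Series) -> Series:
    """Repeat the last valid value; leading gaps stay missing."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_interpolate(values: Series) -> Series:
    """Fill gaps between two valid samples linearly; edge gaps stay missing."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def impute(values: Series, threshold: float = 0.5) -> Series:
    """Forward-fill rarely changing attributes, interpolate the others."""
    if mostly_flat(values, threshold):
        return forward_fill(values)
    return linear_interpolate(values)


# Step-like attribute (e.g. hole diameter) versus a smoothly varying one.
diameter = impute([12.25, None, None, 12.25, 8.5, None, 8.5])
hookload = impute([100.0, None, 104.0, None, 108.0])
```

With a step-like attribute such as hole diameter, the derivative check selects forward filling, so no artificial slope appears across the step; a smoothly varying attribute is interpolated instead.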
Fig. 1. Raw Hookload vs Depth.

For this case, data assimilation (DA) (Law et al., 2015; McLaughlin, 2014) is useful for measurement correction, as it is a mathematical discipline that seeks to optimally combine theory with observations/estimations. In data assimilation, the parameter is calibrated using statistical analysis. In Geekiyanage et al. (2018), it was shown that if there is a proven independence between the measurements or estimations, the calibrated calculation $\hat{x}$ based on the two individual measurements/estimations $x_1$ and $x_2$ is calculated as:

\[ \hat{x} = a_1 x_1 + a_2 x_2 \tag{1} \]

where

\[ a_1 + a_2 = 1, \qquad a_1 = \frac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2}, \qquad a_2 = \frac{1/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}. \]

In the above equation, $\sigma_1$ is the standard deviation of the measurements $x_1$ and $\sigma_2$ is the one from $x_2$. More details regarding the implementation of this methodology for the case study are given in Section 4.1.2.

2.1.3. Outlier removal

An outlier is defined as an observation that deviates so much from the other observations as to arouse suspicions, see Hawkins (1980). There are several methods for outlier removal, like using a median filter, a mean filter or other methods based on density/distance information. Among them, the Interquartile Range (IQR) method is often used. In Deep (2005), the IQR equation is given as:

\[ \mathrm{IQR} = P_{75} - P_{25} \tag{2} \]

where

\[ \text{Lower Range} = P_{25} - 1.5 \cdot \mathrm{IQR}, \tag{3} \]

\[ \text{Upper Range} = P_{75} + 1.5 \cdot \mathrm{IQR}. \tag{4} \]

Eq. (2) defines the interquartile range on which the acceptance limits of this outlier removal process are based. $P_{75}$ represents the 75th percentile and $P_{25}$ is the 25th percentile. Eqs. (3)–(4) define the lower and upper range for data removal respectively.

2.1.4. Noise reduction

In the case of noisy data, some useful methods are different types of low pass filters or high pass filters. In our case study, the most widely used method, the moving average filter, is implemented to smooth data signals and remove noise. It is a simple yet powerful tool for data filtering. The formula in the discrete time domain is given below:

\[ y(t) = \frac{1}{N} \sum_{j=0}^{N-1} x(t - j), \]

where $x$ is the raw data, $y$ is the processed data after filtering, $t$ is the time/depth coordinate, the index $j$ corresponds to the number of convolution steps for a data point at time $t$, and $N$ is the number of data points spanned by the average. The moving average filter is very practical for engineers since it does not require much frequency analysis for cutoff frequency selection. However, it might lead to time-delay issues. In some cases when quick reactions are needed, a low pass filter with a selected cutoff frequency is an alternative to the moving average filter and the delay problem might be resolved, see more discussions in Geekiyanage et al. (2020).

2.2. Engineering data correction

In the following section, we will present the physics-based methodology used to correct the WOB data.

2.2.1. Hookload correction

The foundation for these physics-based calculations is the work presented in Hareland et al. (2014), as it shows the necessity of correcting the HL measurements registered in the field. Luke and Juvkam-Wold (1993) showed that, due to the position of the sensor, a difference between the true HL and the measured one exists. However, their work only considered the effect of sheaves as the possible source of such difference. In recent years more detailed studies on this matter were published. For example, Cayeux et al. (2015) presented a more detailed analysis of the sources leading to this difference, analyzing effects such as the sheave angle, and the diameter, length and angle of the mud hose, etc.

The first effect to be analyzed is the sheave effect. This correction only applies if the measurement of the HL is done with a tension sensor in the anchor of the deadline. The sheave effect is present because the source of the signal must pass through the block-and-tackle arrangement before it is measured, which causes a distortion in the measurement. Installing a sensor located close to the saver sub in the Top Drive has been discussed in Cayeux et al. (2015) and Wu et al. (2017), as well as having a load cell located in the hanging point of the Top Drive (Cayeux et al., 2015). Eq. (5), presented in Hareland et al. (2014), represents the impact of the sheaves on the measured weight:

\[ \mathrm{HL}_{a1} = F_{dl} \, \frac{1 - e^{n}}{1 - e}, \tag{5} \]

where $n$ is the number of lines that pass through the traveling block, $F_{dl}$ is the measured tension, $e$ is the individual sheave efficiency, and $\mathrm{HL}_{a1}$ is the corrected hook load considering the sheave effect.

Eq. (5) is similar to the one presented in Luke and Juvkam-Wold (1993) for the inactive dead-line sheave, and it is also applicable to the second effect: the static HL effect. In this case the weight of the traveling block needs to be corrected, as this weight is also added to the measurement from the dead-line sensor. Eq. (6) is used to determine this corrected load:

\[ \mathrm{HL}_{a2} = F_{tb} \, \frac{1 - e^{n}}{1 - e}, \tag{6} \]

where $F_{tb}$ is the weight of the traveling block and $\mathrm{HL}_{a2}$ is the corrected traveling block load.

The third correction to the HL measurement is the stand pipe pressure effect (Cayeux et al., 2015). It covers a broader range of elements, not only the one present in the mud hose, but also the weight applied from hydraulic and electric lines. For this correction, Eq. (7) in Hareland et al. (2014) is used:

\[ \mathrm{HL}_{a3} = \alpha \cdot \mathrm{SPP} \cdot \mathrm{ID}^{2}, \tag{7} \]

where $\mathrm{HL}_{a3}$ is the load effect due to standpipe pressure, $\alpha$ is the model coefficient, SPP is the standpipe pressure and ID is the inner diameter of the mud hose. With this equation, we aim to determine the value of the force applied by the mud-filled hose connected to the Top Drive. Eq. (8) shows the final result of this process:

\[ \mathrm{HL}_{\mathrm{corrected}} = \mathrm{HL}_{a1} - \mathrm{HL}_{a2} - \mathrm{HL}_{a3}, \tag{8} \]

where $\mathrm{HL}_{\mathrm{corrected}}$ is the corrected hookload considering these three effects. Fig. 2 shows the necessary inputs for the model (HL, Weight of Traveling Block and SPP), the equations used and finally the output of this process. More detailed information about the use of these equations for the analyzed case is given in Section 4.2.1.

Fig. 2. Hookload Correction Diagram.

2.2.2. WOB correction

Once the corrected HL is calculated, it is possible to calculate the DWOB using the Torque and Drag (T&D) model. Fig. 3 presents the flow process used for the calculation of the DWOB for given data points. The inputs besides the aforementioned are the friction factor, well trajectory, BHA configuration and fluid density. The T&D model selected for this study was developed by Johancsik et al. (1984). The methodology to calculate the DWOB is based on the shooting method (Press et al., 2007). The idea of the shooting method is that an initial guess is made of the unknown boundary conditions at one end of the interval. Using this guess, the terminal conditions obtained from the numerical integration are compared to the known terminal conditions. If the integrated terminal conditions differ from the known terminal conditions by more than a specified tolerance, the unknown initial conditions are adjusted and the process is repeated until the difference between the integrated terminal conditions and the required terminal conditions is less than some specified threshold.

First, the algorithm sets an initial DWOB guess which is used as an input for the T&D model; then the T&D model calculates the weight of the drill string (calculated HL) under the given conditions. This calculated HL is compared to the results obtained in the previous step at the same depth. The result of this difference (error) is then identified as either a positive or negative value. It will be positive if the calculated HL is greater than the corrected HL (the steps for the corrected HL are given in Section 4.1.2) and vice versa. If this error is greater than a predetermined threshold, a correction to the DWOB is applied taking this sign into account. For example, if the error is a positive value, meaning that the calculated HL is bigger than the corrected HL, the model will increase the DWOB by a predetermined value to reduce this difference. This process is repeated until the error is equal to or smaller than the threshold value. In order to validate the calculations, the results of this algorithm were compared to the ones obtained using Halliburton's WellPlan Software (Halliburton, 2020), see Section 5.1. Fig. 3 shows the flowchart of the DWOB correction algorithm. The input data of this algorithm are the corrected HL, friction factor, well path, BHA data and fluid density. The output is the calculated DWOB.

Fig. 3. DWOB Correction Diagram.

3. Data-driven ROP modeling

Throughout this section, the main data-driven ROP modeling steps will be presented, including the identification of the most important parameters of the ROP models and the ML architecture.

3.1. Key features

Physics-based ROP models have been developed throughout the years, starting in the 1950s. Among them, one of the most important developments in those earlier years is the Bingham model (Bourgoyne et al., 1991), which considers two main operational parameters, WOB and RPM. As the years passed, more developments were presented. Some of them are regarded as a reliable source for the ROP calculation, for example, the model presented in Bourgoyne et al. (1991). It was developed at a time when tri-cone bits were used but has been adapted throughout the years to be used with PDC bits, see Wiktorski et al. (2017). One of the recent models was presented by Motahhari et al. (2010), where different factors were taken into consideration, for example, confined rock strength, cutter diameter (PDC bits), pore pressure gradient, Equivalent Circulating Density (ECD), and many more. Some of these parameters are available and possibly do not change between one location and another. However, in most cases many of these measurements are not available, or they are only applicable in some specific areas.

It is important to analyze how ML models function with their parameters and which parameters have been historically selected for this purpose. In Barbosa et al. (2019a), the authors presented a study based on more than 50 publications which studied machine learning ROP models, showing that 8 parameters were the most used or mentioned in all of them. Therefore, in our study, based on the available data-sets and the correlation index, see Fig. 4, the following parameters are included in the model for this study:

• Bit Depth;
• Average Hookload;
• WOB (surface or calculated downhole ones);
• Surface Torque (STQ);
• RPM;
• Standpipe Pressure.

Fig. 4. Heatmap with Correlation Coefficient Values.

The heatmap of the correlation between the ROP and the features selected for this study is shown in Fig. 4. It is easy to see that the correlation score between the HL and the ROP, and the one between the DWOB and the ROP, are quite high. Hence, correcting the HL data and DWOB data is a necessary step in the ML data preparation. In addition, from the heatmap, the correlation score between the DWOB and the ROP is higher than the one between the SWOB and the ROP, meaning that if the DWOB is available, it would be wise to prioritize the good downhole measurements, see more discussions in Section 4.3.

3.2. Machine learning model structure

This study uses a continuous learning idea proposed by Liu (2017) and applied in the domain of drilling by Tunkiel et al. (2021) to continuously update the developed ML model. In Tunkiel et al. (2021), the model considers an expanding data set, which represents a real-life case for drilling operations. The initially available data equals 15% of the total available data and is used to train and validate the model. The following 20% of the total available data is used for testing; this is data that the model has never seen, completely out of sample. Afterwards, the quantity of the available data for training is increased and the training and validation process is repeated, see the detailed presentation in Tunkiel et al. (2021).

In our ROP modeling, the model structure is composed of two branches, an RNN branch and an MLP branch, see Fig. 5. This structure was originally presented by Tunkiel et al. (2021, 2020). In this work the main change is that the RNN branch uses LSTM cells instead of Gated Recurrent Units. Additionally, the number of LSTM cells and neurons in the MLP were determined after a hyperparameter tuning process. In the RNN branch, the input is only the past ROP data, which goes through a Gaussian noise layer to ensure that we generate some level of randomization in the data (the value used is 0.007 for the standard deviation of the noise distribution). Afterwards, the information goes towards the RNN. The output of the LSTM cells enters a dropout layer as an additional measure to avoid overfitting.

Fig. 5. Presented Machine Learning Model Structure, Two Branches: RNN Branch (left) and MLP Branch (right).

The other branch of the model (MLP) uses all other parameters selected for the study as inputs (Depth, WOB, HL, STQ, RPM and SPP). This information goes directly to the MLP layer, and the output of the MLP goes to a dropout layer to avoid overfitting. The output of the dropout layer enters a flatten layer as a previous step towards another MLP layer. The outputs of both branches, RNN and MLP, are then concatenated in order to obtain a single tensor that is the input for an additional MLP layer. Finally, the output of the last MLP provides the predicted ROP. During the training stage, the model is run several times and then the model with the lowest validation score is selected. In the current work, no quantitative work was performed to identify the cost-benefit curve for the quantity of validation runs against model performance gains. For the purpose of the work presented in this paper the model was re-trained from scratch 10 times. The desired value is dependent on the performance increase (about 10% in the presented case study) evaluated against the additional time needed to perform training (1-10 min in the presented case study), which will be dependent on the computational hardware available, the problem at hand, the complexity of the network, the available time and the effect of diminishing returns. More quantitative methods with respect to time, trial and the heuristics involved for optimal model development will be further investigated.

The data set has to be reshaped to fit the presented machine learning model, which means that samples are created to contain the input and the predicted output of the model. In Fig. 6, the attribute A is the one to be predicted. A user shall specify the length of past data of the target attribute (A) to be fed to the RNN branch (data shown in green boxes), and also the length of the predicted output from the MLP (data shown in the red boxes). The blue boxes with the attributes (B, C, D) are the other input attributes fed to the MLP branch. In general, the lengths of the green boxes (N1) and red boxes (N2) are selected by the users in consideration of the sampling interval and model accuracy. More discussions on the choice of N1 and N2 are given in Section 5.2.
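The reshaping into samples described around Fig. 6 can be sketched as a sliding window over the depth-ordered log. This is only an illustration under stated assumptions, not the published code: the function name `make_samples` is hypothetical, and the sketch assumes that the MLP inputs (attributes B, C, D) are taken from the same rows as the predicted window, which the text does not state explicitly and which the figure may define differently.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], List[List[float]], List[float]]


def make_samples(
    target: Sequence[float],             # attribute A (e.g. ROP), ordered by depth
    features: Sequence[Sequence[float]],  # rows of other attributes (B, C, D)
    n_past: int,                          # N1: length of past target data (RNN input)
    n_future: int,                        # N2: length of the predicted output
) -> List[Sample]:
    """Slide a window over the log and cut it into (RNN input, MLP input, output)."""
    if len(target) != len(features):
        raise ValueError("target and features must have the same number of rows")
    samples: List[Sample] = []
    last_start = len(target) - n_past - n_future
    for start in range(last_start + 1):
        split = start + n_past
        rnn_input = list(target[start:split])  # green boxes: past values of A
        # Blue boxes: assumed here to span the predicted rows.
        mlp_input = [list(row) for row in features[split:split + n_future]]
        output = list(target[split:split + n_future])  # red boxes: values to predict
        samples.append((rnn_input, mlp_input, output))
    return samples


# Six rows, three past values in, two future values out: two samples.
rop = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
other = [[10.0 * i] for i in range(6)]
samples = make_samples(rop, other, n_past=3, n_future=2)
```

Longer windows leave fewer samples for a log of fixed length, which is one practical reason why N1 and N2 have to be balanced against the sampling interval and the amount of data available.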
Table 1
Parameters used in case study with ranges.

No.  Feature                      Units    Range
1    Bit Depth (MD)               meters   1900–2400
2    Surface Weight on Bit        tonnes   0.7–7.2
3    Average Hookload             tonnes   126–143
4    Surface Torque               kN-m     14.9–25.3
5    Downhole Weight on Bit       tonnes   0.5–7.4
7    Average Rotary Speed         rpm      150
8    Average Standpipe Pressure   kPa      18363–20646.5
be used to verify the validity of the developed calculations. The data is

publicly available for its use as depth- or time-based data sets.3
In principle it is possible to work on both time-based and depth-
based data in relation to real-time drilling logs. In the presented case
study it was decided to use a depth-based data since such logs present
drilling in a single continuous process. Time-based data sets are less
processed and contain all the operations performed during drilling,
such as tripping, connecting pipes etc., which is generally irrelevant
to the ROP prediction process. The complete data set for the 15/9-F5
well is located as a csv file4 and it contains 1287579 rows and 201
columns. The raw data cannot be directly usable for machine learning
purposes, and must go through extensive data cleaning and preparation
processes. After the cleaning process, the data set is reduced to 16220
rows. Table 1 shows the range of the attributes used in the case study.
Using modern consumer grade computer hardware (i7 CPU, 32 GB
Fig. 6. Input Data Shapes of RNN and MLP.
RAM, Nvidia Quadro P1000), one training cycle took approximately
between 2 and 10 min to complete, depending on the data quantity as
dictated by the continuous learning approach.
In our study, the machine learning model was developed using
Keras library with Tensorflow 2 backend. Both of them are open source 4.1. Data preparation
projects developed and maintained by Google. The LSTM implemen-
tation included in Keras was used with 281 neurons. Other branch Data preparation is an important step to ensure the quality of the
consists notably of two MLP layers of 5 and 1 neurons and a dropout results delivered by any ML models. Drilling data is, in most cases, not
layer randomly truncating 45% of connections. The two branches are usable for modeling when it is received from field. Even tough there are
concatenated and processed in a further MLP layer of 44 neurons. Details of multiple parameters related to the model are available in the source code published on GitHub.2

In order to determine the structure parameters, Bayesian Optimization (Mockus et al., 2014; Nogueira, 2014) was used for the hyperparameter tuning process. During the algorithm implementation, three percentages of the data were selected for the optimization process: 20, 50 and 80 percent. The parameters obtained for these three percentages were then averaged to give the parameters of the final model. Due to the limited data set, it was not verified whether the hyperparameter tuning carried out for this well would be optimal for deployment in other wells. However, since the parameters used as inputs are available on all drilling rigs, it is reasonable to expect that, if necessary, the tuning time could be reduced compared to the initial run made for this model.

4. Case study

The Volve Field data was released by Equinor, the Norwegian energy company, in 2018. The complete production and downhole well data from the Volve Field was made publicly available (Equinor, 2018). The field is located in the central part of the North Sea. It was discovered in 1993 and started production in 2008. The field produced for 8 years until 2016, when it was decided to permanently close production (Equinor, 2020). The data set used for this study is a part of the Volve Field data, where the well 15/9-F5 was selected among other wells, since it contains the relevant information that could

some general procedures that can be followed for most of the data sets, in some cases (like the one presented in this study) additional steps are needed to ensure the data is usable for ML purposes. We divided the data preparation for the 15/9-F5 into three steps that will be explained in the following paragraphs.

4.1.1. Data cleaning and imputation

First, we selected the section of the well to be analyzed, including the ‘on bottom’ feature available in the data set, which is removed later. This feature helps identify when the bit is drilling. In the data set it contains two values: 0 when the bit was ‘off bottom’ and 1 when the bit was ‘on bottom’. As it helps us separate the drilling from non-drilling operations, we dropped the rows that contained NaN values. Finally, the missing information in the features is filled by using polynomial interpolation.

4.1.2. Data assimilation

There are some good reasons that could explain the HL measurement deviation seen in Fig. 1, e.g., a change of the BHA due to drilling another hole section, or the BHA with new drillpipe. Nevertheless, as per the daily drilling reports, the 12¼ in. section (1405–2921 m) was drilled using 5½ inch drillpipe with the BHA without observing major problems. Hence, an inconsistency of this magnitude in the HL can be explained by a sensor failure. If such wrong HL measurements (after 2200 m) were to be used in the ML models’ training, the ML models’ behavior would have been affected. Therefore, data correction is necessary in such a case.

2 https://github.com/mauro-encinas.
3 https://www.ux.uis.no/~atunkiel/.
4 http://www.ux.uis.no/~atunkiel/Norway-StatoilHydro-15_$47$_9-F-5%20time.zip.

Fig. 7. Hookload vs Depth Comparison before Correction (blue) and after Data Assimilation Correction (green).
In this context the data assimilation given in Section 2.1.2 is used to correct such HL measurements. For this, 𝑥₁ and 𝑥₂ (see Eq. (1)) are based on two calculated values. The first one, 𝑥₁, was determined from the results of the implemented Johancsik T&D model (Johancsik et al., 1984), and 𝑥₂ was obtained by using Halliburton’s WellPlan software. These are the two independent signals that allow us to apply the data assimilation process. Fig. 7 shows the comparison of the HL data, where the blue curve is the raw surface HL measurement and the green one is the corrected one after data assimilation.

After this step, the data set, which includes the features used in the case study (Bit Depth, Average Hookload, WOB, STQ, RPM, SPP), is presented in Fig. 8. As visible in the plots, the data is noisy and needs to be further processed in order to make it usable for ML purposes.

Fig. 8. Raw Data Elements after HL Correction.

4.1.3. Outlier & noise removal and data resampling

This process starts by removing outliers, which are observation points that are located far away from the average measurements and interfere with the model’s ability to perform properly. In this study the IQR method discussed in Section 2.1.3 is used to remove the outliers. Additionally, the noise present in all sensors was smoothed using a moving average filter (given in Section 2.1.4). These steps are important for the model to perform as desired.

When the measurements are not sampled evenly, problems arise when the data is fed to the RNN model. One reason behind this inconsistency may be that the sensors do not have the same sampling frequency. Another reason might be the existence of data gaps in the data sets. To solve this issue and to get an evenly sampled data set, the Radius Neighbors Regressor, using the open source Scikit-learn implementation (Pedregosa et al., 2011), is used to re-sample the data to equally spaced observation points; a depth interval of approximately 0.25 m is selected to reduce the large amount of data. The processed data set after outlier removal, noise reduction and gap filling is presented in Fig. 9.

In addition, feature scaling is also a necessary step to ensure that ML models perform properly. In this case, Scikit-learn (Pedregosa et al., 2011) offers a number of possibilities, e.g., StandardScaler, MinMaxScaler, RobustScaler and Normalizer. Depending on the type of model and the intended use of the data, one can argue that each of these options would help the model obtain better results. When working with Artificial Neural Networks and RNN, the recommended practice is to scale the features to a range between 0 and 1. It is worth mentioning that the suitability of scaling to an interval between 0 and 1 depends strongly on the problem, and in many cases other transformations may be more adequate.

4.2. Data correction

As mentioned above, one of the objectives of this study is to correct measurements using physics-based models. Two results, for the surface HL and the downhole WOB correction, are presented below.

4.2.1. Hookload correction

Section 2.2.1 describes the procedure to correct the hookload measurement. Firstly, the effect of the sheaves shall be removed from the measurement; this only applies if the sensor is located at the anchor of the deadline. Secondly, it is necessary to subtract the weight of the traveling block, which is not part of the weight of the drill string but is measured by the sensor. Finally, it is necessary to remove the SPP effect. As the sheave efficiency for the rig is not known, a sensitivity analysis was run to identify the correct value. According to Luke and Juvkam-Wold (1993), the sheave efficiency ranges between 96% and 99%. The corrected hookloads obtained with different sheave efficiency coefficients are presented in Fig. 10. In order to determine the correct sheave efficiency, the results of the HL correction were compared against the ones obtained by the simulations using Halliburton’s WellPlan. For this case the 99% sheave efficiency is the one that provided the best results.
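The sensitivity analysis described above amounts to a small grid search: apply the correction for each candidate sheave efficiency and keep the value whose corrected hookload lies closest to the reference simulation. The sketch below illustrates only that selection step. The names `pick_sheave_efficiency` and `toy_correction` are hypothetical, the toy correction is a placeholder and not the procedure of Section 2.2.1, and the use of the mean absolute difference as the closeness measure is an assumption.

```python
from typing import Callable, Dict, Sequence, Tuple


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally long series."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pick_sheave_efficiency(
    raw_hl: Sequence[float],
    reference_hl: Sequence[float],
    correct: Callable[[Sequence[float], float], Sequence[float]],
    candidates: Sequence[float] = (0.96, 0.97, 0.98, 0.99),
) -> Tuple[float, Dict[float, float]]:
    """Score every candidate efficiency against the reference and return the best."""
    scores = {
        eff: mean_abs_diff(correct(raw_hl, eff), reference_hl)
        for eff in candidates
    }
    best = min(scores, key=scores.get)
    return best, scores


def toy_correction(raw_hl: Sequence[float], efficiency: float,
                   block_weight: float = 30.0) -> list:
    """Placeholder correction (assumption): NOT the physics of Section 2.2.1."""
    return [h * efficiency - block_weight for h in raw_hl]


# Made-up readings; the "reference" is generated with a known efficiency
# so that the selection step can be checked.
raw = [100.0, 110.0, 120.0]
reference = toy_correction(raw, 0.99)
best_eff, scores = pick_sheave_efficiency(raw, reference, toy_correction)
```

In a real workflow the `correct` argument would be the full hookload correction and `reference_hl` the simulated hookload at the same depths.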
Fig. 9. Processed Data Elements.

Fig. 10. Corrected HL Measurements for Different Sheave Efficiencies.

4.2.2. DWOB correction

There are a number of considerations taken during this process. Firstly, the value of the bit depth is set as the deepest value registered in the data. Secondly, a trajectory needs to be set from this point up to the surface, considering that the data is re-sampled approximately every 0.25 m. Since the available surveys for this well were registered every 40 m, it is necessary to interpolate the well trajectory to obtain at least a 5 m interval between survey points in the BHA section and 20 m in the drill pipe section. This was done to obtain more accurate results of the HLs, as the efficacy of this model relies on the quality of this parameter.

The shooting method mentioned in Section 2.2.2 was numerically implemented to search for the best-matched DWOB following the procedure presented in Fig. 3. First, a lower boundary value of the DWOB is selected. In this case the first assumption was to set this value equal to the registered surface WOB (SWOB). Then the model back-calculates the total weight of the drill string, which is then compared to the corrected HL value. For this case study the threshold was set at 0.25 ton; if the difference is greater than this value, the DWOB guess value is changed. This process is repeated until the difference is smaller than the predefined threshold. Once the termination condition is satisfied, the guess value of the DWOB is taken as the estimated WOB based on the T&D model calculation with regard to the calculated HL. Fig. 11 presents the estimated DWOB results (blue curve) obtained from this process and the registered SWOB (green curve). It shows that the DWOB and SWOB deviate strongly in two sections (1900–2000 m and 2200–2300 m). In the following section, we compare the ML models’ results obtained with the two different measurements, especially for ROP predictions in these two sections.

Fig. 11. Comparison between Calculated DWOB (blue) and Measured SWOB (green).

4.3. ROP predictions

There are several metrics to evaluate the error in the prediction, e.g., the coefficient of determination (𝑅²) and the mean absolute error (MAE). It is common practice for ML models to report the results with respect to 𝑅². Nevertheless, the MAE was selected by the authors as it provides a quantifiable option to understand the results. These results are expressed in the units of the predicted value (m/h) and help better visualize the results for the ML model.

Fig. 12. Error in Predicted ROP Comparison when Using Calculated DWOB (blue) and SWOB (red).

As the main objective is to know whether the additional step of the DWOB correction improves the overall results of the ROP prediction, the model is evaluated with the same input data (Depth, Average Hookload, Surface Torque, RPM and SPP), while the only input that differs is the WOB (SWOB measurement or corrected DWOB).
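The two error metrics named in this section can be written out in a few lines, which makes the difference in units explicit: the MAE stays in the units of the target (m/h for ROP), while 𝑅² is dimensionless. This is a plain-Python sketch with made-up values; the function and variable names are illustrative only.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |true - predicted|; same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


measured_rop = [10.0, 20.0, 30.0]    # m/h, made-up values
predicted_rop = [12.0, 18.0, 33.0]   # m/h, made-up values

mae = mean_absolute_error(measured_rop, predicted_rop)  # (2 + 2 + 3) / 3 m/h
r2 = r2_score(measured_rop, predicted_rop)              # 1 - 17 / 200
```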
Fig. 12 presents the model results for the case where the calculated DWOB is used as the input (blue line) and the one where the SWOB is used as the input (red line). In this plot the MAE is on the y axis, while the data percentage (continuous learning, such that the training data increases with depth during operations) is on the x axis. It is observed that the use of the calculated DWOB provides an overall better result for the ROP predictions, especially in the first drilling section (approximately 18%–26%).

Fig. 13. Heatmap Comparison ROP Prediction Using Corrected DWOB and Measured SWOB.
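The continuous-learning evaluation behind Fig. 12 can be pictured as an expanding window: the model sees a growing leading share of the data and is scored on the share that follows. The sketch below only generates the index ranges. The function name, the 5% step between retraining points and the 2000-sample example (500 m at roughly 0.25 m spacing) are assumptions for illustration; the 15% start, the 80% end and the 20% look-ahead follow the values quoted in Section 5.2.

```python
from typing import Iterator, Tuple


def expanding_window_schedule(
    n_samples: int,
    start_pct: int = 15,
    stop_pct: int = 80,
    step_pct: int = 5,    # assumed step; not stated in the text
    test_pct: int = 20,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (pct, available_end, test_end) for each retraining point.

    Samples [0, available_end) are available for training and validation;
    samples [available_end, test_end) are the ones predicted next.
    """
    for pct in range(start_pct, stop_pct + 1, step_pct):
        available_end = n_samples * pct // 100
        test_end = min(n_samples, n_samples * (pct + test_pct) // 100)
        yield pct, available_end, test_end


# 2000 samples at 0.25 m spacing correspond to 500 m of drilled section.
schedule = list(expanding_window_schedule(2000))
first_pct, first_available, first_test_end = schedule[0]
# 300 samples * 0.25 m = 75 m available, next 400 samples = 100 m predicted.
```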
Fig. 13 shows how the MAE varies with the drilled depth for a fixed prediction length (5 m), using the corrected DWOB and the measured SWOB respectively. The y axis shows the drilled depth in meters, and the x axis shows the distance predicted ahead of the bit in meters. The figure is divided into three sections: on the left-hand side are the results for the calculations using the DWOB; in the middle are the results for the calculations using the SWOB; and on the right is the color bar indicating the quality of the prediction (MAE).
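A heat map of this kind can be assembled from the absolute errors of each prediction window: one row per drilled depth at which a prediction starts, one column per step ahead of the bit. The sketch below is an illustration with made-up numbers; the function names and the list-of-rows layout are assumptions, not the authors' plotting code.

```python
from typing import List, Sequence


def abs_error_grid(
    true_windows: Sequence[Sequence[float]],
    pred_windows: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Absolute error per cell: rows = prediction start depths, columns = steps ahead."""
    return [
        [abs(t - p) for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(true_windows, pred_windows)
    ]


def mae_per_step_ahead(grid: Sequence[Sequence[float]]) -> List[float]:
    """Average each column to see how the error grows with distance ahead of the bit."""
    n_rows = len(grid)
    return [sum(column) / n_rows for column in zip(*grid)]


# Two prediction windows of three steps each (ROP in m/h, made-up values).
true_rop = [[20.0, 21.0, 22.0], [25.0, 24.0, 23.0]]
pred_rop = [[19.0, 23.0, 26.0], [26.0, 26.0, 20.0]]

grid = abs_error_grid(true_rop, pred_rop)
column_mae = mae_per_step_ahead(grid)
```

Each row of `grid` corresponds to one horizontal strip of the heat map; `column_mae` summarizes the trend along the x axis.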
It is noted that at the beginning, with less information (15% of the data used to train the model, which represents only 75 m), the model achieves a good level of precision in the prediction when the corrected DWOB is used. It is also noted that for this case the prediction error is higher than 6 m/h only in a part between 2125 and 2150 m, while the rest of the section is predicted properly. Another observation is that the error in the prediction decreases as more data is used for training. For the ROP prediction using the SWOB as the input, we encounter higher errors at the beginning of the well section. As the number of training samples increases, the error reduces, but when compared to the plot on the left (calculated DWOB) there is still a higher overall error in the predictions. More discussion of the results is given in Sections 5.2–5.3.
5. Result analysis and evaluations

5.1. Data quality analysis

Fig. 14 presents the different HL results. In this plot three values are shown. The red line represents the results of the corrected HL, which takes into account three effects (the sheaves, the traveling block weight and the SPP, as listed in Section 4.2.1). The yellow line shows the results for the corrected HL calculated by the T&D model. Finally, the blue line shows the values obtained by Halliburton’s WellPlan, which was used as the verification of the proposed calculations. From this figure, it is also possible to see that the implemented model (red one) works properly and maintains a low difference with the calculated HL (yellow one).

Fig. 14. Hookload Comparisons.
The model for Halliburton’s WellPlan T&D Analysis was originally developed by Johancsik et al. (1984) and put in a standard form by Sheppard et al. (1986). As observed in Fig. 14, there is a difference between the results obtained with our simulation (yellow curve) and WellPlan (blue curve), but both curves have a similar tendency. For results verification, other reliable models (Aadnoy et al., 2010; Wu et al., 2011) can also be used.
5.2. Model prediction analysis

The model was tested with different percentage combinations of data available for training and validation. Of the combinations compared, 80% for training and 20% for validation, taken out of the area considered available at a given step, provides the best results.
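As a rough sketch of the split described above, the data available at a given step can be cut into a training part and a validation part. Whether the split was made contiguously or at random is not stated here; the version below assumes a contiguous split that keeps depth order, and the function name is hypothetical.

```python
from typing import Tuple


def split_available(n_available: int, train_pct: int = 80) -> Tuple[range, range]:
    """Split the first n_available samples into train and validation index ranges.

    Assumption: the leading train_pct percent are used for training and the
    remainder for validation, preserving depth order.
    """
    train_end = n_available * train_pct // 100
    return range(0, train_end), range(train_end, n_available)


# Example: 300 samples available (75 m at roughly 0.25 m spacing).
train_idx, val_idx = split_available(300)
```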
Fig. 13 shows the model results with a predicted length of 5 m. The model was evaluated extensively in order to determine the best possible combination of retained information and predicted length (N1 and N2, see Fig. 6). It is also interesting to see the model performance with different prediction window sizes. The model’s predictive capabilities were tested starting with 5 m, then 7.5 m and finally 10 m ahead of the bit. This means that the model predicts 20, 30 and 40 data points ahead, respectively. This test was carried out with both the calculated DWOB and the surface WOB as inputs to the models.
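The point counts follow from the re-sampling interval: dividing each prediction length by the approximately 0.25 m spacing mentioned in Section 4.1.3 gives the number of samples ahead of the bit. A minimal check, using a hypothetical helper name:

```python
def points_ahead(length_m: float, spacing_m: float = 0.25) -> int:
    """Number of re-sampled data points covered by a prediction length."""
    return round(length_m / spacing_m)


# Prediction lengths tested in the study, in meters.
windows = {length: points_ahead(length) for length in (5.0, 7.5, 10.0)}
```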
The results of the model that used the calculated DWOB as the input are presented in Fig. 15. This comparison provides better insight into the model’s behavior. It is noted that the stability of the model decreases when we try to predict further ahead (larger window size, N2). Compared with the errors presented in the first two heat maps (for the 5 m and 7.5 m cases), the MAE increases considerably for the last plot (window size of 10 m). In the case study of ROP prediction, the performance starts to deteriorate significantly for the model predicting furthest, and therefore experiments with even longer prediction windows are not presented, as they are not useful. Moreover, using a longer prediction window degrades the near-bit performance, i.e., a prediction 3 m ahead in the 5 m model will be much closer to reality than the same predicted location in the 10 m model. Combining multiple models is considered potential future work that can extend the prediction window without such quality deterioration.

Fig. 15. Heatmap Comparison ROP Prediction Using Corrected DWOB with Different Predicted Lengths.

Fig. 16 presents the results for the ROP prediction using the SWOB as the input parameter. As observed in Fig. 15, increasing the predicted length decreases the accuracy of the results. Compared with the results obtained by using the corrected DWOB, the plots show a stronger tendency of the error to increase as the predicted length increases, which is another point in favor of improving the quality of the input parameters of the model.

Moreover, it is important to explain why the results presented in Figs. 15–16 only cover the 1975–2300 m section. The model was initially provided with 15% of all the data for training and validation, which corresponds to the initial 75 m. After that, the model makes the predictions for the next 20% of the data (100 m, which corresponds to the 1975–2075 m section). Then the process is repeated. Finally, the last point to be analyzed is 2300 m, which corresponds to 80% of the data set. These data are used to train and validate the model to finally predict the last 20% of the data, completing 100% use of the available data.

5.3. Engineering analysis

The main difference between the two ML model results is the use of the calculated DWOB and the measured SWOB respectively. The WOB is regarded as one of the most influential parameters in the ROP determination, and therefore its influence should not be understated.

It is evident that the biggest differences between these two parameters are located, firstly, in the initial 100 meters of the analyzed area and, secondly, between 2200 and 2300 m. The section of 2000–2200 m does not present a considerable difference between the parameters, which makes us wonder why the other sections do present this behavior. We attribute these behaviors (high and low difference) to two main reasons, which were identified after analyzing the well characteristics and normal operational practices. First, the well does not have a high inclination profile; the analyzed section has a 33° inclination, which helps explain why the difference in the 2000–2200 m section is not bigger. Second, while drilling, once a stand is drilled, the driller follows the company’s operational best practices before adding another stand. When the new stand is added, the driller sets the initial (zero) value of the WOB; therefore, the consistency of the measured SWOB depends on the driller’s expertise. If this is not performed properly, it is possible to see the difference observed in the 1900–2000 m and 2200–2300 m sections.

If we refer back to Figs. 15–16 and compare the results presented in the heat maps, the difference between the two runs provides enough evidence that the choice of WOB value has a heavy influence on the results of the predicted ROP. While the difference between the results might not look extreme, it is encouraging enough to provide a basis for further studies on this topic.

5.4. ML models evaluation

The model presented in this study represents a great alternative for ROP prediction, requiring only 75 meters of training data to start predicting with relatively high confidence. It is necessary, especially when presenting a new alternative, to compare the accuracy of the presented model with existing and widely used ML models under the same conditions. In this case we used Random Forest Regressor, Gradient Boosting Regressor and Extreme Gradient Boosting for comparison. The results of the comparisons are presented in Fig. 17. All
of the models used the calculated DWOB as the input, as it provided both the best results for the analyzed case and the highest correlation with regard to the ROP (compared with the SWOB, see Fig. 4).

Fig. 16. Heatmap Comparison ROP Prediction using SWOB for Different Predicted Lengths.

Fig. 17. Comparison between Presented Model and Other Widely Used ML Models.

In Fig. 17, the x axis presents the data percentage used for training the model (the 15% starting point is equivalent to 75 m), and the y axis presents the MAE in the predicted values for the following 20% of the data, which is equivalent to 100 m.

The plot shows that there is a considerable difference between the results obtained by the presented model (blue) and the other models. The MAE for the other models presented in Fig. 17 is considerably higher than that of the presented model; at 76% it exceeds a 20 m/h error in prediction. A reason behind this could be that conventional ML models usually do not perform as expected when predicting information outside the sample range. For instance, Random Forest is a decision tree-based model and, as its name suggests, uses decision trees to provide results. Therefore, if the value that we aim to predict was not present in the training data set, it is difficult for the model to come up with an appropriate prediction. The high MAE results could also be explained by the fact that, in most cases, these models need larger amounts of available information to provide good results. The analyzed case and the focus of the study relate to limited data sets, where the data expands as the well is drilled. This could be a solution for many companies that do not have large data sets available for offset wells.

5.5. Discussions

The main limitation of the presented model is the prediction horizon. The RNN model depends on the temporal information being accurate, as the inputs are taken from ground truth data. It is possible to extend the model, i.e., to predict a given distance of ROP data and re-use this data as input to the model. This will cause error accumulation, and at some point the error in the ROP input will outweigh the benefit of using it at all. From that point on it will be more accurate to use a model that does not utilize temporal information.

In practice this means that the presented model is more useful during actual drilling operations, to predict and optimize the ROP values, than for predicting ROP throughout an extended portion of the well. In fact this is already seen in Fig. 13, where predictions further ahead are of lower quality than early ones.

6. Conclusions

The main objective of the study is to verify the feasibility of improving the performance of ML models by providing better input data, using already available physics-based models as a way to process the data. The obtained results suggest that this was achieved in the analyzed case, especially in areas where a bigger difference between the raw data and corrected data is observed.

Moreover, the developed RNN model’s performance was compared with widely used ML models. The comparison shows that the presented model provides better results in prediction. Also, the model performance was tested in different prediction scenarios, varying the predicted distance. The results show that the prediction accuracy decayed with increasing prediction length, but in the first two cases (5 m and 7.5 m) the model provides a good prediction of the ROP values.

The results obtained from this study, especially in areas where the differences between the raw and corrected data are larger, suggest a promising and easy-to-implement way to improve ML models for ROP prediction. For future work, it is necessary to evaluate the model with more generic data, for instance from different wells and scenarios, especially when the well inclination is larger such that large differences exist between the downhole and surface WOB.

Further work is also needed to evaluate how much benefit the utilized branched model architecture actually brings. While the inclination case study (Tunkiel et al., 2021) and the ROP case study presented in this study show performance gains compared to traditional methods such as Random Forest, Gradient Boosting Regressor etc., it is still not clear what kind of ML problems would benefit the most. The ML model used in this study requires more complex data preparation, and it depends on GPU calculations for acceptable performance (training in 3–10 min, depending on the data size). Additional complexity comes from the size of the inputs to the ML model. While traditional methods utilize inputs from a given point in time or space, the presented approach utilizes a complete series of inputs for each attribute used, making processes such as ROP optimization much more complex. The inputs can be arbitrary only to an extent, as they have to be realistic. The ML model will respond in a meaningful way only for the area of input hyperspace that was covered in training. In the presented case study, the input is three dimensional, as required when working with LSTM networks, the batch size is constantly increasing per the principle of continuous learning and one unit in the input sequence in each training cycle, which necessarily means that training covers only a small portion of this hyperspace, even if the whole realistic area is covered.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to express gratitude to Equinor, Norway, for providing funding for this research through the Equinor Akademia Program, and thank OTICS at UiS for supporting the project.
Abbreviations

ANN    Artificial Neural Network
BHA    Bottom Hole Assembly
DA     Data Assimilation
DWOB   Downhole Weight on Bit
ECD    Equivalent Circulation Density
GBM    Gradient Boosting Machines
HL     Hookload
IQR    Interquartile Range
KNN    K-nearest Neighbors
LSTM   Long Short Term Memory
MAE    Mean Absolute Error
MAPE   Mean Absolute Percentage Error
ML     Machine Learning
MLP    Multilayer Perceptron
NaN    Not a Number
RF     Random Forest
RNN    Recurrent Neural Network
ROP    Rate of Penetration
RPM    Rotary Speed
SPP    Stand Pipe Pressure
SVM    Support Vector Machine
STQ    Surface Torque
SWOB   Surface Weight on Bit
T&D    Torque and Drag
WOB    Weight on Bit

CRediT authorship contribution statement

Mauro A. Encinas: Conceptualization, Investigation, Data curation, Methodology. Andrzej T. Tunkiel: Formal analysis, Supervision, Writing – original draft. Dan Sui: Conceptualization, Methodology, Supervision, Writing – original draft.

References

Aadnoy, B.S., Fazaelizadeh, M., Hareland, G., 2010. A 3d analytical model for wellbore friction. J. Can. Pet. Technol. 25–36. http://dx.doi.org/10.2118/141515-PA.
Ahmed, O., Adeniran, A.A., Samsuri, A., 2019. Computational intelligence based prediction of drilling rate of penetration: a comparative study. J. Pet. Sci. Eng. 1–12.
Altman, N.S., 1992. An introduction to kernel and nearest-neighbor nonparametric regression. Amer. Statist.
Amer, M.M., Dahab, A.S., El-Sayed, A.-A.H., 2017. An ROP predictive model in Nile Delta area using artificial neural networks. D023S012R001, URL https://doi.org/10.2118/187969-MS.
Barbosa, L.F.F., Nascimento, A., Mathias, M.H., de Carvalho, J.A., 2019a. Machine learning methods applied to drilling rate of penetration prediction and optimization - a review. J. Pet. Sci. Eng. 183, URL http://www.sciencedirect.com/science/article/pii/S0920410519307533.
Barbosa, L.F.F.M., Nascimento, A., Mathias, M.H., Carvalho, J.A., 2019b. Machine learning methods applied to drilling rate of penetration prediction and optimization - a review. J. Pet. Sci. Eng.
Bourgoyne, A., Millheim, K., Chenevert, M., Young, F., 1991. Applied Drilling Engineering, Vol. 2. SPE Textbook Series.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32, URL https://doi.org/10.1023/A:1010933404324.
Cayeux, E., Skadsem, H.J., Kluge, R., 2015. Accuracy and correction of hook load measurements during drilling operations. D021S010R005, URL https://doi.org/10.2118/173035-MS.
Christiani, N., Shawe-Taylor, J., 2000. Support Vector Machines: Data Analysis, Machine Learning and Applications. Cambridge University Press, 0521780195.
Deep, R., 2005. Probability and Statistics with Integrated Software Routines. Academic Press, p. 290.
Equinor, 2018. Disclose all volve data. https://www.equinor.com/en/news/14jun2018-disclosing-volve-data.html (Accessed: 19 May 2020).
Equinor, 2020. Volve field. https://www.equinor.com/en/what-we-do/norwegian-continental-shelf-platforms/volve.html (Accessed: 19 May 2020).
Eskandarian, S., Bahrami, P., Kazemi, P., 2016. A comprehensive data mining approach to estimate the rate of penetration: application of neural network, rule based models and feature ranking. J. Pet. Sci. Eng.
Esmaeli, A., Elahifar, B., Fruhwirth, R., Thonhauser, G., 2012. ROP modeling using neural network and drill string vibration data. Soc. Petrol. Eng. SPE 163330 http://dx.doi.org/10.2118/163330-MS.
Friedman, J.H., 2001. Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 (5), 1189–1232. http://dx.doi.org/10.1214/aos/1013203451, URL https://projecteuclid.org:443/euclid.aos/1013203451.
Geekiyanage, S.C.H., Sui, D., Aadnoy, B.S., 2018. Drilling data quality management: Case study with a laboratory scale drilling rig. Volume 8: Polar and Arctic Sciences and Technology; Petroleum Technology, URL https://doi.org/10.1115/OMAE2018-77510.
Geekiyanage, S., Tunkiel, A.T., Sui, D., 2020. Drilling data quality improvement and information extraction with case studies. J. Petrol. Explor. Prod. Technol. http://dx.doi.org/10.1007/s13202-020-01024-x.
Halliburton, 2020. WellPlan engineering software. https://www.landmark.solutions/WellPlan-Well-Engineering-Software.
Han, T., Jiang, D., Zhao, Q., Wang, L., Yin, K., 2018. Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery. Trans. Inst. Meas. Control 40 (8), 2681–2693, URL https://doi.org/10.1177/0142331217708242.
Han, J., Sun, Y., Zhang, S., 2019. A data driven approach of ROP prediction and drilling performance estimation. D011S010R006, URL https://doi.org/10.2523/IPTC-19430-MS.
Hareland, G., Wu, A., Lei, L., 2014. The Field Tests for Measurement of Downhole Weight on Bit (DWOB) and the Calibration of a Real-time DWOB Model. International Petroleum Technology Conference, Doha, Qatar, p. 6, IPTC, URL https://doi.org/10.2523/IPTC-17503-MS.
Hauser, J.R. (Ed.), 2009. Interpolation. In: Numerical Methods for Nonlinear Engineering Models. Springer Netherlands, Dordrecht, pp. 187–226, URL https://doi.org/10.1007/978-1-4020-9920-5_6.
12
Hawkins, D.M., 1980. Identification of Outliers. In: Monographs on applied probability Nogueira, F., 2014. BayesIan optimization: Open source constrained global optimization
and statistics, Chapman and Hall, http://dx.doi.org/10.1007/978-94-015-3994-4. tool for python. URL https://github.com/fmfn/BayesianOptimization.
Hegde, C., Daigle, H., Millwater, H., Gray, K., 2017. Analysis of rate of penetration Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
(ROP) prediction in drilling using physics-based and data-driven models.. J. Pet. Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Sci. Eng.. Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine
Hegde, C., Soares, C., Gray, K., 2019. Rate of penetration (ROP) modeling using learning in python. J. Mach. Learn. Res. 12, 2825–2830.
hybrid models: Deterministic and machine learning. In: Unconventional Resources Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2007. Numerical recipes
Technology Conference, https://doi.org/10.15530/URTEC-2018-2896522. 3rd edition: The art of scientific computing, third ed. Cambridge University Press,
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), USA, pp. 955–983.
1735–1780, URL https://doi.org/10.1162/neco.1997.9.8.1735. Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-
Jain, A.K., Mao, J., Mohiuddin, K.M., 1996. Artificial neural networks: a tutorial. propagating errors. Nature 323 (6088), 533–536, URL https://doi.org/10.1038/
Computer 29 (3), 31–44. http://dx.doi.org/10.1109/2.485891. 323533a0.
Johancsik, C.A., Friesen, D.B., Dawson, R., 1984. Torque and drag in directional Sheppard, M.C., Wick, C., Burgess, T.M., 1986. Designing well paths to reduce drag
wells-prediction and measurement. J. Pet. Technol. 36 (06), 987–992, SPE, URL and torque. In: Presented At the 61st Annual Technical Conference and Exhibition
https://doi.org/10.2118/11380-PA. of the SPE, New Orleans, https://doi.org/10.2118/15463-PA.
Law, K., Stuart, A., Zygalakis, K., 2015. Data Assimilation: A Mathematical Introduction. In: Texts in Applied Mathematics, vol. 62, Springer, Cham, URL https://doi.org/10.1007/978-3-319-20325-6.
Liu, B., 2017. Lifelong machine learning: a paradigm for continuous learning. Front. Comput. Sci. 11 (3), 359–361, URL https://doi.org/10.1007/s11704-016-6903-6.
Luke, G.R., Juvkam-Wold, H.C., 1993. The determination of true hook-and-line tension under dynamic conditions. SPE Drill. Complet. 8 (04), 259–264, SPE, URL https://doi.org/10.2118/23859-PA.
McCulloch, W.S., Pitts, W., 1943. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophy. 5 (4), 115–133, URL https://doi.org/10.1007/BF02478259.
McLaughlin, D., 2014. Data assimilation. In: Njoku, E.G. (Ed.), Encyclopedia of Remote Sensing. Springer New York, New York, NY, pp. 131–134, URL https://doi.org/10.1007/978-0-387-36699-9_33.
Mockus, J., Tiesis, V., Zilinskas, A., 2014. The application of Bayesian methods for seeking the extremum. 2, pp. 117–129.
Moran, D., Ibrahim, H., Purwanto, A., Osmond, J., 2010. Sophisticated ROP prediction technologies based on neural network delivers accurate drill time results. SPE-132010-MS, URL https://doi.org/10.2118/132010-MS.
Motahhari, H.R., Hareland, G., James, J.A., 2010. Improved drilling efficiency technique using integrated PDM and PDC bit parameters. J. Can. Pet. Technol. 49 (10), 45–52, URL https://doi.org/10.2118/141651-PA.
Soares, C., Gray, K., 2020. Real-time predictive capabilities of analytical and machine learning rate of penetration (ROP) models. J. Pet. Sci. Eng. http://dx.doi.org/10.1016/j.petrol.2018.08.083.
Tunkiel, A.T., Sui, D., Wiktorski, T., 2021. Training-while-drilling approach to inclination prediction in directional drilling utilizing recurrent neural networks. J. Pet. Sci. Eng. 196, 108128. http://dx.doi.org/10.1016/j.petrol.2020.108128, URL http://www.sciencedirect.com/science/article/pii/S0920410520311827.
Tunkiel, A.T., Wiktorski, T., Sui, D., 2020. Continuous drilling sensor data reconstruction and prediction via recurrent neural networks. Volume 1: Offshore Technology, URL https://doi.org/10.1115/OMAE2020-18154.
Wiktorski, E., Kuznetcov, A., Sui, D., 2017. ROP optimization and modeling in directional drilling process. D011S003R002, URL https://doi.org/10.2118/185909-MS.
Wu, A., Hareland, G., Fazaelizadeh, M., 2011. Torque & drag analysis using finite element method. Mod. Appl. Sci. http://dx.doi.org/10.5539/mas.v5n6p13.
Wu, Z., Hareland, G., Loggins, S.M., Lai, S., Eddy, A., Olesen, L., 2017. A new method of calculating rig parameters and its application in downhole weight on bit automation in horizontal drilling. D031S004R005, URL https://doi.org/10.2118/185643-MS.