
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/347714801

Drilling Dataset Exploration, Processing and Interpretation Using Volve Field Data

Conference Paper · August 2020


DOI: 10.1115/OMAE2020-18151


3 authors, including: Andrzej Tunkiel and Dan Sui, University of Stavanger (UiS)

All content following this page was uploaded by Andrzej Tunkiel on 24 May 2021.



Proceedings of the ASME 2020 39th International
Conference on Ocean, Offshore and Arctic Engineering
OMAE2020
June 28-July 3, 2020, Fort Lauderdale, FL, USA

OMAE2020-18151

DRILLING DATASET EXPLORATION, PROCESSING AND INTERPRETATION USING VOLVE FIELD DATA

Andrzej T. Tunkiel
Research Fellow
Department of Energy and Petroleum Engineering
Faculty of Science and Technology
University of Stavanger
Norway
Email: andrzej.t.tunkiel@uis.no

Tomasz Wiktorski
Associate Professor
Department of Electrical Engineering and Computer Science
Faculty of Science and Technology
University of Stavanger
Norway
Email: tomasz.wiktorski@uis.no

Dan Sui
Professor
Department of Energy and Petroleum Engineering
Faculty of Science and Technology
University of Stavanger
Norway
Email: dan.sui@uis.no

Copyright © 2020 by ASME

ABSTRACT

In 2018 Equinor took an unprecedented step for an energy company and made a multi-terabyte dataset from the Volve field open. However, there is a long way from downloading data to executing meaningful analysis. With no way of quickly evaluating the data due to its size and unfamiliar file formats, the use of Volve data has so far been limited.

This paper presents our exploratory work related to the real-time drilling part of the dataset. We provide a description of common obstacles and approaches for overcoming them. We also describe specific contents of the dataset for others to gauge the potential for case studies. We hope that this will lower the bar for Volve field data accessibility, promote research, and become a catalyst for other data science projects.

INTRODUCTION

Work done in the energy sector was, until recently, fully dependent on laboratory scale work [1] and commercial partners tied to specific research projects. The prerequisite legal overhead, confidentiality, and sensitivity of real well data mean they are often prohibitively difficult to obtain.

The Volve field dataset [2], made public by Equinor, has the potential to become the go-to dataset for data scientists working within oil and gas. The field is located in the North Sea and was in operation from 2008 to 2016. One of the biggest benefits of using an open dataset is that it allows for experiment reproduction and benchmarking. It is impossible to do so if results are published while withholding the raw data. Over 52 percent of scientists surveyed by Nature in 2016 claim there is currently a reproducibility crisis in science [3], with unavailability of raw data among the top contributing factors.

Additionally, data related to drilling operations are of a relatively unusual type - a data series with up to hundreds of correlated attributes. It is hard to find a dataset where the data structure and data problems are similar.

Curated datasets exist in other fields. Especially in Machine Learning there are a number of datasets that allow researchers to benchmark their methods, with the MNIST database (http://yann.lecun.com/exdb/mnist/), containing labelled images of handwritten numbers, being the most notable one. There are also other, more specialized open datasets, such as the Johns Hopkins Turbulence Database (http://turbulence.pha.jhu.edu/), which enables research by providing data that would otherwise be prohibitively expensive to acquire.

In this study, we want to facilitate a step change in research quality within the energy sector by making Volve real-time drilling data easily accessible. Today, the data in question
are available as WITSML (Wellsite Information Transfer Standard Markup Language) files [4], a format common in the industry but not compatible with most common data science tools.

In this paper, we present what data are available and methods for dealing with common problems. As an extension to this paper we also provide an online platform for data exploration and download. This will lift the real-time drilling part of Volve from being accessible only by experts to a resource that is easily accessible for a wide range of scientists and engineers.

The presented work contributes to the field of data preparation, one of six major phases in the Cross-industry standard process for data mining (CRISP-DM), the most widely-used analytics model in data mining, by characterizing the typical problems found in real-time drilling data. The discussed issues and presented solutions are meant to bridge the gap between researchers and the public raw dataset within drilling, as well as to be a reference for the wider data science community that is not necessarily familiar with the petroleum industry.

This paper starts with an introduction to the Volve dataset and its contents, with a focus on WITSML real-time drilling data. As a second step, typical data related issues and potential pitfalls are highlighted together with recommended solutions. To better visualize some of the discussed topics we present several examples, coupled with tables giving a bird's-eye view of the dataset.

VOLVE DATASET

The Volve field is located in the North Sea, midway between Stavanger and Aberdeen. It was discovered in 1993; production started in February 2008 and lasted eight years. The reservoir is in sandstone of Middle Jurassic age in the Hugin Formation, at a depth between 2700m and 3100m, with the seabed at a depth of 80m. Peak production was 56 000 barrels per day, with a total of 63 million barrels of oil produced. In June 2018 Equinor decided to disclose all subsurface and operating data for this field, totalling approximately 40 000 files of various kinds. The data is published under a very permissive license - Creative Commons BY-NC-SA 4.0 - which, in short, means that any derivative work has to attribute the original license holder (BY, by attribution), cannot be commercial (NC - non-commercial), and must be shared under an identical license (SA - share-alike). There are 14 compressed archives in total available for download, see Table 2 for size reference.

Real-time dataset

Our work was focused on the part of the dataset named Volve WITSML Realtime drilling data, the 5GB archive seen in Table 3, in the appendix.

Within multiple, nested folders, it contains drilling logs as both time-based and depth-based data. The main difference between these is the indexing attribute. Additionally, there is usually significantly more information in the time-based data, since it is recorded continuously throughout operations. The depth-based data are only a subset, recorded when actual drilling is performed, that is, when the rock is being physically cut.

In Table 3, appendix, we summarized the available data in terms of wellbores and number of samples available for both depth and time data. (We did not investigate why seemingly the same wells are logged in different folders.)

Parsing of WITSML data

WITSML is an industry standard for transferring real-time data between cooperating oilfield companies. Despite its wide adoption we failed to identify a Python library allowing for direct import of such data.

We decided to parse the files using regular expressions. Other methods, such as libraries dedicated to reading XML files, are equally suitable.

The WITSML format provides a number of useful values that we decided to retain. There is information attached to each attribute, such as full name, mnemonic, unit, data type, and minimum and maximum date index. Our conversion effort aimed at providing data in CSV format to achieve full compatibility with the most common Python data analysis library - Pandas - and other data analytics tools such as R and Excel. This meant that only a simple title is possible for each attribute. A format of full name concatenated with unit was used.

An additional attribute was created informing the user which section of the well the log is from. This information was obtained from metafileinfo.txt files residing within the folder structure. No other modification to the data was performed. All but one well were parsed successfully, with the exception stemming from a seemingly different data structure.

Available data

As indicated earlier, available data exist as either time-based data or depth-based data. In general terms, time-based data are expected to contain values as-recorded with mostly fixed time-steps. One can observe the movement of the drawworks, downtime between drilled sections, pulling out of hole, etc. Depth-based data, on the other hand, will be processed to contain a seemingly continuous drilling operation. Depth-based data do not contain time information.

In some, but not all wells, time-based data contain not only drilling operations, but also events such as casing or completion running.

With just one exception, if depth-based data are available, a time-based equivalent also exists. There are typically above 100 attributes available in the depth-based logs and above 200 in the time-based ones. Measurement units in the Volve dataset are exclusively metric.
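The conversion described under Parsing of WITSML data can be illustrated with a short sketch. It is not the authors' parser: it uses Python's standard xml.etree.ElementTree module instead of regular expressions, and it assumes a simplified, namespace-free log built from the logCurveInfo, mnemonic, unit, curveDescription, and logData/data elements of the WITSML 1.4 log object. The embedded sample, its values, and the function name witsml_log_to_csv are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Synthetic, simplified stand-in for a WITSML log. Real files are far
# larger and typically declare an XML namespace that must be handled.
SAMPLE_LOG = """<logs><log>
  <logCurveInfo><mnemonic>DEPT</mnemonic><unit>m</unit>
    <curveDescription>Measured Depth</curveDescription></logCurveInfo>
  <logCurveInfo><mnemonic>ROP</mnemonic><unit>m/h</unit>
    <curveDescription>Rate of Penetration</curveDescription></logCurveInfo>
  <logData>
    <data>2500.0,12.5</data>
    <data>2500.5,</data>
  </logData>
</log></logs>"""


def witsml_log_to_csv(xml_text):
    """Convert one simplified WITSML log to CSV text (hypothetical helper)."""
    log = ET.fromstring(xml_text).find("log")
    # Column title: full name concatenated with unit, as described above.
    header = [
        f"{curve.findtext('curveDescription')} {curve.findtext('unit')}"
        for curve in log.findall("logCurveInfo")
    ]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in log.findall("logData/data"):
        # Each data element holds one comma-separated sample; empty
        # fields stay empty and are read as NaN by Pandas.
        writer.writerow((row.text or "").split(","))
    return out.getvalue()


csv_text = witsml_log_to_csv(SAMPLE_LOG)
print(csv_text)
```

Keeping empty fields empty, instead of substituting a placeholder, preserves the sparsity of the original log so that missing values can be handled explicitly later.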
Raw amount of data

Different logging frequencies mean that a new entry is generated even though not all attributes have a new value available. This leads to a relatively high amount of missing values (not a number, NaN) in the dataset. For example, in the depth-based data of well F9 A, over 80.8 percent of individual data points are empty. In the time-based data for the same well, 74.2 percent of cells also have no value. This is common throughout all the wells.

Depth

One of the key attributes of well data is depth, or more precisely, multiple variants of depth. From a catalogue of definitions, one can typically find at least one of the common ones that is available for the majority of the dataset.

Depth-based logs will usually have an attribute called Measured Depth m, which is typically complete. In the case of well F9 A a Bit Depth m value is also available; however, it covers only a small section of the well and in practice is identical to Measured Depth m.

The time-based F9 A dataset has a number of depth-related attributes. One has to investigate the three available Bit Depths, as well as Continuous Survey Depth m, especially when analysing the directional drilling aspect of the well, as it is corrected for the sensor position in the bottom hole assembly.

The depth range of the depth-based datasets is reported in Table 1. The raw minimum depth is not reported, as some datasets contained clearly incorrect values. We decided to use the 5th percentile of the depth series, which corrects this issue while still providing a good indication of the wells' usable data range.

TABLE 1. Depth range

Folder               Well        Minimum depth (m)*   Maximum depth (m)
Norway-Statoil-NO    F-1 C - 0   1 257.0              3632.1
Norway-Statoil-NO    F-1 C - A   2 564.0              3 682.8
Norway-Statoil-NO    F-1 C - B   2 591.9              3 465.0
Norway-Statoil-NO    F-1 C - C   2 528.6              4 008.4
Norway-StatoilHydro  F-4         191.6                2 992.4
Norway-StatoilHydro  F-5         2 911.7              3 793.0
Norway-Statoil       F-7         131.8                914.9
Norway-StatoilHydro  F-9         162.2                633.5
Norway-NA            F-9 A       400.1                1 206.0
Norway-StatoilHydro  F-10        448.5                5 311.1
Norway-Statoil-NO    F-11        196.0                347.6
Norway-Statoil-NO    F-11 T2     363.8                2 574.0
Norway-Statoil-NO    F-11 A      2 522.7              3 762.0
Norway-Statoil-NO    F-11 B      2 655.9              4 770.6
Norway-Statoil       F-12        279.3                3 464.5
Norway-StatoilHydro  F-14        316.0                3 466.1
Norway-StatoilHydro  F-15        1 392.6              4 065.3
Norway-StatoilHydro  F-15 A      2 517.5              3 233.0
Norway-StatoilHydro  F-15 B      2 968.8              3 035.5
Norway-StatoilHydro  F-15 S      1 503.6              4 090.0

* 5th percentile

Attribute availability

All the logs differ in terms of attribute availability due to differences in equipment and practices utilized during operations. A listing of selected attributes' keywords is provided in the appendix, in Table 4.

One can roughly identify what kind of analysis is possible based on the presented table. Wells F-1, F-11, and F-15D have a significant amount of attributes related to gamma and neutron based measurements. Those wells were drilled in 2013, as opposed to other wells drilled in the years 2007 - 2009. We did not investigate the reason behind the use of different types of equipment.

Nearly all logs contain basic drilling attributes, such as rate of penetration, surface torque, weight on bit, etc. When exploring the dataset it is worth using a script [5] that automates searching for attributes through all the files, as well as plotting charts to gauge the usability of the data for a given research problem.

TYPICAL PROBLEMS AND PROPOSED SOLUTIONS

Uneven data frequency

Not all data are received simultaneously. Mud pulse telemetry will provide data in a continuous slow stream, in sequences repeating every couple of minutes. In the logged data, every time a new value is available it is written down in a new line in the log. There is a minimal waiting time to collect other attributes, hence very often rows are sparsely populated and dominated by missing values.

This problem is visualized in Figure 1. Sampling of all three attributes is different. Attribute A is logged at half of the frequency of attribute B. Attribute C is logged at the same frequency as B, but at different times. In the presented example, the first sample will only have a value for attribute C. The second sample will contain A and B, the third one again only C, the fourth one only B, and so on. This practical approach to logging retains the maximum amount of information, while at the same time it creates a number of problems for data analysts.

FIGURE 1. NON-MATCHING SAMPLING

In the case of a correlation analysis between (often logged) attributes B and C it is possible that they never co-exist in the same row. If both are downhole attributes, uploaded through mud pulse telemetry, they will necessarily be offset by a fixed value.

Additionally, series-type analysis, be it depth or time, requires that data are evenly sampled. With attribute A being logged at a different frequency than attribute B, a pre-processing solution is also required to re-sample these attributes at a common frequency and in phase.

There are a number of solutions to tackle this problem and the ones most suitable for Volve real-time drilling data are elaborated on below. Forward and backward filling are described as the most basic methods, with resampling as a more advanced approach. Pitfalls related to calculating the rate of change are also highlighted.

Forward and backward filling. The problem of uneven data sampling may be addressed by simply forward or backward filling existing data to cover the gaps. The main benefit of this approach is simplicity, as such functions are built into common data analysis libraries such as Pandas, and are computed, for all intents and purposes, nearly instantly.

In the case of forward filling, the algorithm iterates over the samples of the dataset. When a value is missing for a given sample at a time $t$, i.e. $x_t = \mathrm{NaN}$, the previous, existing value is propagated forward, that is $x_t := x_{t-1}$. This operation is performed up to the end of the dataset. Backward filling is identical, except for direction - the algorithm iterates from the last sample backwards, filling NaN values at $x_t$ with the value from $x_{t+1}$.

One potential artifact of forward and backward filling may be a changed value of the derivative of a function. Assume an attribute that is logged every 1 minute, with the complete log having a sampling rate of 1 second. In reality the value rises at a steady rate of $x$ every minute. Forward filling the log will give an impression of a step rise at a rate of $60 \cdot x$ per minute, once every minute.

Additionally, if an attribute is no longer logged, the last recorded value will get populated up until the end of the log. Data should be inspected and understood before applying such an operation.

Resampling. A re-sampling technique that is particularly useful due to its ease of use, transparency, and good results is the Radius-Neighbors Regressor from the scikit-learn library [6] available for Python, based on the k-nearest neighbors algorithm [7]. This is a lazy algorithm that defers calculation until the actual regression is performed. Using this method, one first specifies new timestamps (or depth values) with even spacing. Then, the regressor is called to evaluate the value of the new, evenly spaced attribute based on the original data and a specified radius. Values from the original attribute that fall within the specified radius are averaged and that value is returned.

The value of the radius has to be as small as possible to retain most of the data features, but big enough to cover all the distances between the points. When applied this way, this method causes minimal averaging of the signal.

When a regression model is created, one should plot the original data and the re-sampled data to inspect the fit. We provide access to a Python script to enable this operation [5]. This method allows for creation of new data points at arbitrary frequency and phase.

Finite difference, derivative. Calculating a rate of change may be required for various types of analysis. For example, dogleg severity may be of interest, but it is not recorded in any logs and has to be calculated through a finite difference or finite derivative.

Volve real-time logs do not have a constant time nor depth step length. This means that one cannot apply a simple finite difference as:

$$\nabla[f](x) = f(x_n) - f(x_{n-1}). \tag{1}$$

or more generally:

$$\nabla_h[f](x) = f(x) - f(x - h). \tag{2}$$

where the spacing $h$ is variable due to uneven depth or time steps. Ignoring varying spacing values may lead to incorrect conclusions. It is then necessary to calculate the finite derivative, where:

$$f'(x) \simeq \frac{\nabla_h[f](x)}{h} \quad \text{as } h \to 0 \tag{3}$$
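Applied to unevenly spaced samples, Eq. (3) amounts to dividing each backward difference by its own local step h. The following minimal sketch in plain Python contrasts this with the plain difference of Eq. (1). The depth and inclination values are synthetic and the function name finite_derivative is hypothetical.

```python
def finite_derivative(x, f):
    """Backward difference of f divided by the local step h = x[n] - x[n-1]."""
    return [
        (f[n] - f[n - 1]) / (x[n] - x[n - 1])
        for n in range(1, len(x))
    ]


# Synthetic log: a value rising at a steady 0.5 units per metre,
# sampled at uneven depth steps.
depth = [1000.0, 1001.0, 1003.0, 1003.5, 1007.5]
value = [10.0 + 0.5 * (d - depth[0]) for d in depth]

# Eq. (1): plain difference between consecutive samples, ignoring spacing.
naive = [value[n] - value[n - 1] for n in range(1, len(value))]

# Eq. (3): difference divided by the variable spacing h.
rate = finite_derivative(depth, value)

print(naive)  # changes from sample to sample with the step length
print(rate)   # the same rate of change for every sample
```

The plain difference suggests a fluctuating signal purely because of the varying step length, which is the kind of incorrect conclusion the text warns about.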



Alternatively, one can unify the sampling rate using the method described in the subsection on the Radius-Neighbors Regressor and calculate a finite difference. Typically in the realm of Machine Learning inputs are normalized [8] and their absolute value is not used, making this an efficient and valid approach.

Both methods are computationally light and fairly simple to implement.

Big gaps

There are limited possibilities when it comes to gaps that exceed several meters in datasets. They may appear due to a change in equipment, sensor failures, logging failures, data corruption, and similar. Volve data contains a number of such longer gaps.

We discovered that some gaps may exist in the depth-based dataset while not being present in the time-based one, and vice versa. One example is illustrated in Figure 3 and discussed in detail in further sections. There are no simple universal methods of merging depth and time datasets. Below we provide a generic algorithm that can be used to address this issue.

Patching. Data from a time-based log can be used to fill in, or patch, relatively large gaps in a depth-based log. A process that allows patching of a depth-based dataset using data from a time-based dataset is shown in Figure 2.

FIGURE 2. PATCHING FROM TIME PROCESS
[Flowchart: Time based data -> Sort by Time -> Forward Fill -> Select time range -> Sort by Depth -> Select given Depth range -> Adjust attribute names -> Merge. Depth based data with a gap -> Ensure no corrupted data exists on the gap edges -> Merge -> Gap-filled depth based data.]

Time-based data have to be sorted by time and the attributes forward filled. This is done to ensure that there is no data loss in the subsequent depth-based sorting. As mentioned in the previous sections, the dataset is in large part empty, as values are recorded at different intervals. If data is sorted by depth without forward-filling first, all the samples that are in between logged depth values would be lost. If multiple operations, including tripping in and out, are logged, it is beneficial to isolate a rough time range of the log that corresponds exclusively to drilling. With the drilling-only part of the dataset, one can simply sort by depth to convert the log into a depth-based log. Note that there may be different depths in the time-based log and it may not be immediately clear which one is the most appropriate nor which one is the most complete.

The next step is to identify the exact depth range that is missing or is otherwise corrupted in the depth-based log. Note that when a gap in data exists, attributes near the gap may be incorrect. It must be ensured that there are no corrupted data on the edges of the gap. Data removal is often necessary so that the edge values are without errors. With the depth range identified, the range of the patch dataset must be adjusted. For technical reasons it is likely necessary to adjust the names of the attributes so they can merge correctly. Take note that seemingly the same attributes may in fact differ in terms of filtering, noise levels, data artifacts, or depth shift. Figure 4 shows values for four inclination attributes for the same well, without any two lines overlapping completely. One must evaluate if a given gap filling is acceptable on a case by case basis.

Other, more complex methods exist for data recovery, such as neural networks [9], which are out of scope of this paper.

Outliers, artifacts, and sentinel values

Not all values in a log can be considered valid. There are a number of reasons for erroneous values, with their own respective methods for removing them. Traditional outliers, in the context of Volve WITSML data, can often be relatively easily removed with a median filter. There are publications [10–12] that deal with other, more complex methods.

Data artifacts are incorrect values due to flaws in measurement or recording techniques. These can often be seen as impossible straight lines in plots, such as the one on the bottom in Figure 3. This unfortunately often requires manual intervention, such as removing all given values of an attribute from samples between two depths.

Sentinel values are employed to show a lack of value. They are often selected to be physically or mathematically impossible,
such as -999. When identified they can be easily removed, as they stay constant throughout the complete log.

Repeated old data

We outlined earlier in the paper that the datasets are to a high degree empty due to uneven polling frequencies of attributes. It was also discovered that some sensor values may be recorded despite not having new data to report. This creates a situation where the same value is recorded multiple times before it changes to a new one. Increasing the difficulty further, this effect is visible on top of having empty values in between, creating a sequence such as:

15.2, NaN, NaN, 15.2, NaN, NaN, 15.2, NaN, NaN, 16.1, NaN, NaN, 16.1, NaN, NaN, 16.1....

This may or may not be an issue, depending on the analysis performed. The problem may be mitigated by utilizing the Radius-Neighbors Regressor with a radius spanning the identified true polling frequency, or a centered rolling average. Analysis through exploration of the charts is also very useful in identifying this problem.

"Corrected" attributes

Some attributes in the real-time drilling Volve dataset have the word "corrected" added to them without an explanation of what was corrected and why. One must exercise caution when using these attributes, as well as accompanying attributes without such a prefix. It may be impossible to know the exact nature of the correction, unless one is knowledgeable about a standard practice resulting in such a new attribute.

Depth lag issues

The bottom hole assembly (BHA) contains multiple sensors that record values during drilling. They are necessarily at some distance from the bit. When correlating Gamma values with bit-related attributes such as torque or weight on bit, one must be aware that the Gamma values are most likely lagging depth-wise in the log. We found no specific information on whether data correction was done in this regard on any of the depth-based datasets. One can perform comparative analysis between the time-based and depth-based files, when both exist, to check if the offset remains the same. This is for example the case in well F9 A, where not only is depth lag visible in inclination data, there is also an additional attribute logged called Continuous Survey Depth m in the time-based log, which brings the inclination values to the position of the bit. Another example can be seen in Figure 4, where one of the inclination values visibly lags behind the others.

Data quantity as a problem

Paradoxically, the amount of attributes can also be a problem. Throughout all the logs, there are over 15 000 attributes logged. Some contain meaningful data, some do not. One cannot simply quickly browse through them due to their sheer quantity. CSV files for a single well log can be as big as 5 GB, causing problems for software such as Excel.

We developed and made available a simple tool [13] that allows the user to create scatter plots and histograms of selected wells with basic outlier filtering. It is meant to provide an effortless glimpse at what lies in the logs, and partially solves the data overload issue.

Multiple operations logged

In some cases not only drilling is logged in the time-based data. For example, well F9 A contains records of multiple operations after drilling the well, such as running of the casing.

Unfortunately, to identify specific operations one has to manually analyze the various reports, as such data were not recorded in the WITSML file. Nevertheless, the data exist and can be analyzed.

The best way to verify if there are multiple operations logged is to analyze depth as a function of time. While it is typical that the end of the drillstring goes up and down multiple times due to equipment change or for heavyweight pipe rearrangement, the maximum achieved depth increases after such an operation in relation to the previously achieved maximum. If a non-drilling operation is performed, the reported depth will not increase over the previous maximum before decreasing again, i.e. tripping in and tripping out without drilling in between.

DATA EXAMPLES

Inclination data of well F5

Below we show an example taken from the Volve dataset. This is done to provide a better understanding of the discussed problems and to demonstrate the aforementioned issues on a specific example.

For one of our other studies utilizing the Volve dataset we were focusing on continuous inclination data [9]. Well F5 was identified as containing applicable attributes. Upon closer inspection we discovered that despite a seemingly large amount of data being available, the well exhibits many of the problems outlined in the earlier sections.

There are in total four log files for well F5, with one depth-based and one time-based containing data relevant to that study. Among these logs, there are five attributes referring to continuous inclination; they are visualized in Figure 3. It is clear that no single attribute or log carries complete data, necessitating the use of patching methods for big gaps. Depth-based files contain inclination data only starting from 3000 meters onward, while time-based logs are missing over 500 meters of data in this range. While much more complete, the data were noisy with multiple artifacts, especially for data logged with the PowerDrive tool.



FIGURE 3. Inclination data of well F5

All inclination values, when plotted in the depth domain, show that different attributes are offset by up to 7 meters and differ from 0 to 0.25 degrees in value, as presented in Figure 4. This highlights depth lag issues and differences in calibration of different equipment. Depth lag can be explained by the physical distance between the sensors in the bottom hole assembly, and an appropriate correction has to be made before connecting different attributes to achieve one inclination attribute that is continuous throughout the well.

FIGURE 4. Inclination data of well F5

DISCUSSION, RECOMMENDATION

The Volve dataset, due to its size, requires significant effort to analyze. There is a great amount of information available in relation to real-time drilling data; it remains, however, very much hidden in many different files and formats.

While importing a parsed log in CSV format may not always be a substitute for an in-depth analysis of a given problem, it removes a technological barrier that existed for many researchers, preventing them from benefiting from the Volve dataset.

The information, tools, and methods presented in this paper lower the bar for data access. We not only converted the data to an easy to use form, but also highlighted Volve-typical problems hiding within the dataset - and provided solutions. A Wikipedia-like portal dedicated to Volve data is being developed at the University of Stavanger. It is planned that various researchers, companies, and students will contribute to a better understanding of the available logs and reports. It is also possible to develop so-called spider bots [14] that would convert the logs into human readable reports.

It is potentially worth pursuing a well-centric approach, rather than a domain-centric and field-wide one. This would mean a complete description of a given well, including a detailed description of all the logged attributes and linking key points from reports to the response in the logs. Such painstakingly collected information may help create ideas and open possibilities in terms of automation of drilling and more.

Lowering the bar for real-world drilling data should also enable more research in relation to signal processing. Research into methods such as the recursive least squares method, used in data fitting to explore the correlation between different attributes, is
applied successfully in both drilling [15–17] and in other fields [18–20].

CONCLUSION

Having parsed WITSML real-time data, we developed a number of tools and methods necessary for efficient data analysis and further processing. We are releasing the data in CSV format [13], Python source code for some of the discussed functions [5], and a data exploration website [13].

We hope that this will make the Volve field more accessible to researchers, where easier access to data will yield valuable research.

The presented methods are not limited to drilling data. Similar data problems exist in other fields; a lot of analogous problems can be found when logging data from vehicles, be it in relation to self-driving cars or in personal monitoring in sports events, e.g. bicycle races. Attributes can be presented in terms of both time and distance, and there may be data gaps due to dirt on car sensors or lapses in heart rate monitoring data, etc. By analysing issues in the drilling data we are building knowledge that can be easily translated to other applications, contributing to the understanding of the overall problem of data preparation.

REFERENCES

[1] Geekiyanage, Suranga C.H., Sui, Dan and Aadnoy, Bernt S. “Drilling data quality management: Case study with a laboratory scale drilling rig.” Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering - OMAE: pp. –. 2018. American Society of Mechanical Engineers. DOI 10.1115/OMAE2018-77510.
[2] Equinor. “Volve field data (CC BY-NC-SA 4.0).” (2018). URL https://www.equinor.com/en/news/14jun2018-disclosing-volve-data.html.
[3] Baker, Monya and Penny, Dan. “Is there a reproducibility crisis?” Nature Vol. 533, No. 7604 (2016): pp. 452–454. DOI 10.1038/533452A.
[4] “WITSML Data Schema Overview Version 1.4.0.” (2019). URL http://w3.energistics.org/schema/witsml v1.4.0 data/doc/
175–185. DOI 10.1080/00031305.1992.10475879. URL http://www.tandfonline.com/doi/abs/10.1080/00031305.1992.10475879.
[8] Jayalakshmi, T. and Santhakumaran, A. “Statistical Normalization and Back Propagation for Classification.” International Journal of Computer Theory and Engineering Vol. 3, No. 1 (2011): pp. 89–93. DOI 10.7763/ijcte.2011.v3.288.
[9] Tunkiel, Andrzej T, Wiktorski, Tomasz and Sui, Dan. “Continuous drilling sensor data reconstruction and prediction via recurrent neural networks.” Submitted to Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering - OMAE. 2020.
[10] Tsay, Ruey S. “Time Series Model Specification in the Presence of Outliers.” Journal of the American Statistical Association Vol. 81, No. 393 (1986): pp. 132–141. DOI 10.1080/01621459.1986.10478250. URL https://www.tandfonline.com/doi/abs/10.1080/01621459.1986.10478250.
[11] Johansen, Søren and Nielsen, Bent. “Asymptotic Theory of Outlier Detection Algorithms for Linear Time Series Regression Models.” Scandinavian Journal of Statistics Vol. 43, No. 2 (2016): pp. 321–348. DOI 10.1111/sjos.12174. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/sjos.12174.
[12] Chaudhary, Nitinkumar L and Lee, W John. “Detecting and Removing Outliers in Production Data to Enhance Production Forecasting.” SPE/IAEE Hydrocarbon Economics and Evaluation Symposium: p. 21. 2016. Society of Petroleum Engineers, Houston, Texas, USA. DOI 10.2118/179958-MS. URL https://doi.org/10.2118/179958-MS.
[13] “Andrzej Tunkiel, Personal Page, University of Stavanger.” URL http://www.ux.uis.no/~atunkiel/.
[14] Saini, Gurtej, Chan, Hong Chih, Ashok, Pradeepkumar, van Oort, Eric, Behounek, Michael, Thetford, Taylor and Shahri, Mojtaba. “Spider bots: Database enhancing and indexing scripts to efficiently convert raw well data into valuable knowledge.” SPE/AAPG/SEG Unconventional Resources Technology Conference 2018, URTC 2018 (2018): pp. 1–8. DOI 10.15530/urtec-2018-2902181.
[15] Aarsnes, Ulf Jakob F., Ambrus, Adrian, Vajargah,
witsml schema overview.html. Ali Karimi, Aamo, Ole Morten and Van Oort, Eric. “A
[5] Andrzej T. Tunkiel. “OMAE 2020 ATunkiel GitHub simplified gas-liquid flow model for kick mitigation and
Repository.” URL https://github.com/ control during drilling operations.” ASME 2015 Dynamic
AndrzejTunkiel/VolveDataExploration. Systems and Control Conference, DSCC 2015. 2015. Amer-
[6] “sklearn.neighbors.RadiusNeighborsRegressor — scikit- ican Society of Mechanical Engineers. DOI 10.1115/
learn 0.21.3 documentation.” URL https://scikit- DSCC2015-9791.
learn.org. [16] Gan, Chao, Cao, Weihua, Wu, Min, Chen, Xin, Hu, Yule,
[7] Altman, N. S. “An Introduction to Kernel and Wen, Guojun, Gao, Hui, Ning, Fulong and Ding, Huafeng.
Nearest-Neighbor Nonparametric Regression.” The “An Online Modeling Method for Formation Drillability
American Statistician Vol. 46, No. 3 (1992): pp. Based on OS-Nadaboost-ELM Algorithm in Deep Drilling

8 Copyright c 2020 by ASME


Process.” IFAC-PapersOnLine Vol. 50, No. 1 (2017): pp.
12886–12891. DOI 10.1016/j.ifacol.2017.08.1941.
[17] Nikoofard, Amirhossein, Johansen, Tor Arne and Kaasa,
Glenn Ole. “Design and comparison of adaptive esti-
mators for Under-balanced Drilling.” Proceedings of the
American Control Conference: pp. 5681–5687. 2014. In-
stitute of Electrical and Electronics Engineers Inc. DOI
10.1109/ACC.2014.6858930. TABLE 3. Sample count for all processed wells
[18] Anantharamu, Sreevatsa and Mahesh, Krishnan. “A paral-
lel and streaming Dynamic Mode Decomposition algorithm Folder Wellbore Depth Time
with finite precision error analysis for large data.” Journal Norway-NA 15/9-F-1 0 68 044
of Computational Physics Vol. 380 (2019): pp. 355–377.
Norway-Statoil-NO 15/9-F-1 C-0 270 968 2 883 803
DOI 10.1016/j.jcp.2018.12.012.
[19] Zhang, Hao, Rowley, Clarence W., Deem, Eric A. and Norway-Statoil-NO 15/9-F-1 C-A 162 468 1 088 289
Cattafesta, Louis N. “Online dynamic mode decomposi- Norway-Statoil-NO 15/9-F-1 C-B 90 008 4 528 122
tion for time-varying systems.” SIAM Journal on Applied Norway-Statoil-NO 15/9-F-1 C-C 181 242 5 638 806
Dynamical Systems Vol. 18, No. 3 (2019): pp. 1586–1609. Norway-Statoil-NO 15/9-F-4 0 522 851
DOI 10.1137/18M1192329. Norway-StatoilHydro 15/9-F-4 24 604 2 187 678
[20] Liu, Chang, Fu, Shixiao, Zhang, Mengmeng and Ren, Hao- NA-NA 15/9-F-5 0 118 989
jie. “Time-varying hydrodynamics of a flexible riser un- Norway-Statoil-NO 15/9-F-5 0 1 181 642
der multi-frequency vortex-induced vibrations.” Journal of Norway-StatoilHydro 15/9-F-5 30 357 1 287 580
Fluids and Structures Vol. 80 (2018): pp. 217–244. DOI Norway-Statoil 15/9-F-7 21 014 547 787
10.1016/j.jfluidstructs.2018.03.004.
Norway-Statoil-NO 15/9-F-7 0 145 432
Norway-Statoil-NO 15/9-F-9 0 113 272
APPENDIX
Norway-StatoilHydro 15/9-F-9 13 831 457 590
Norway-NA 15/9-F-9 A 16 671 419 748
Norway-StatoilHydro 15/9-F-10 264 836 2 387 148
TABLE 2. Volve Dataset files Norway-Statoil-NO 15/9-F-11 11 377 1 139 252
Norway-Statoil-NO 15/9-F-11 T2 454 485 4 100 406
Description Compressed size Norway-Statoil-NO 15/9-F-11 A 187 233 1 233 091
Geophysical Interpretations 99 MB Norway-Statoil-NO 15/9-F-11 B 265 904 4 278 694
GeoScience OW Archive 54.6 GB Norway-Statoil 15/9-F-12 147 251 2 086 734
Production Data 2 MB Norway-Statoil-NO 15/9-F-12 0 1 065 093
Reports 162 MB Norway-Statoil-NO 15/9-F-14 0 1 450 098
Reservoir Model (Eclipse) 390 MB Norway-StatoilHydro 15/9-F-14 11 0302 1 655 167
Reservoir Model (RMS) 2.1 GB Norway-StatoilHydro 15/9-F-15 112 715 778 240
Seismic ST0202 1.2 TB Norway-StatoilHydro 15/9-F-15 A 48 795 0
Seismic ST0202 vs ST10010 4D 330.4 GB Norway-StatoilHydro 15/9-F-15 B 651 139 500
Seismic ST10010 2.6 TB Norway-Statoil-NO 15/9-F-15 C 217 2 383 787
Seismic VSP 95 MB Norway-Statoil-NO 15/9-F-15 D 0 6 462 927
Well Logs 6.9 GB Norway-StatoilHydro 15/9-F-15 S 124 637 1 883 250
Well Logs (Per Well) 7 GB
Well Technical Data 212 MB
WITSML Real-Time Drilling Data 5 GB

TABLE 4. Count of attributes containing selected keywords

Folder Well Data Neutron Gamma Inclination Azimuth Continuous MWD Caliper
Norway-NA F-1 time 0 2 1 1 3 13 1
Norway-Statoil-NO F-1 C 0 depth 45 46 13 32 0 1 22
Norway-Statoil-NO F-1 C 0 time 0 13 8 7 0 1 3
Norway-Statoil-NO F-1 C A depth 37 17 7 27 0 1 18
Norway-Statoil-NO F-1 C A time 28 13 6 7 0 2 15
Norway-Statoil-NO F-1 C B depth 45 25 5 36 0 1 22
Norway-Statoil-NO F-1 C B time 28 24 6 19 0 2 15
Norway-Statoil-NO F-1 C C depth 39 68 9 75 0 1 20
Norway-Statoil-NO F-1 C C time 28 47 8 33 0 2 15
Norway-Statoil-NO F-4 time 0 0 0 0 0 1 1
Norway-StatoilHydro F-4 depth 4 6 3 3 4 22 1
Norway-StatoilHydro F-4 time 3 7 5 3 8 29 2
NA-NA F-5 time 0 0 0 0 0 1 1
Norway-Statoil-NO F-5 time 0 0 0 0 0 1 1
Norway-StatoilHydro F-5 depth 2 2 2 1 3 13 2
Norway-StatoilHydro F-5 time 0 6 3 1 4 23 3
Norway-Statoil F-7 depth 0 2 2 1 3 13 0
Norway-Statoil F-7 time 0 2 2 1 4 14 1
Norway-Statoil-NO F-7 time 0 0 0 0 0 1 1
Norway-NA F-9 A depth 0 3 1 1 2 15 0
Norway-NA F-9 A time 0 3 1 1 3 26 1
Norway-Statoil-NO F-9 time 0 0 0 0 0 1 1
Norway-StatoilHydro F-9 depth 0 2 1 1 2 17 0
Norway-StatoilHydro F-9 time 0 4 1 1 4 20 1
Norway-StatoilHydro F-10 depth 5 7 3 2 4 15 1
Norway-StatoilHydro F-10 time 3 6 3 2 5 16 2
Norway-Statoil-NO F-11 A depth 39 17 7 27 0 0 20
Norway-Statoil-NO F-11 A time 28 12 6 7 0 1 15
Norway-Statoil-NO F-11 B depth 39 49 9 64 0 1 20
Norway-Statoil-NO F-11 B time 28 25 6 21 0 2 15
Norway-Statoil-NO F-11 depth 0 17 6 6 0 0 0
Norway-Statoil-NO F-11 T2 depth 31 37 10 10 0 0 16
Norway-Statoil-NO F-11 T2 time 28 23 9 10 0 1 15
Norway-Statoil-NO F-11 time 0 13 6 6 0 1 1
Norway-Statoil F-12 depth 3 5 2 2 2 14 1
Norway-Statoil F-12 time 3 10 4 3 6 32 3
Norway-Statoil-NO F-12 time 0 0 0 0 0 1 1
Norway-Statoil-NO F-14 time 0 0 0 0 0 1 1
Norway-StatoilHydro F-14 depth 3 8 3 2 4 21 1
Norway-StatoilHydro F-14 time 3 6 3 2 5 22 2
Norway-StatoilHydro F-15 depth 3 5 3 3 4 22 2
Norway-StatoilHydro F-15 time 1 3 3 3 5 23 3
Norway-StatoilHydro F-15 A depth 5 4 2 2 4 23 1
Norway-StatoilHydro F-15 B depth 9 14 5 3 4 38 6
Norway-StatoilHydro F-15 B time 1 1 2 2 5 22 3
Norway-Statoil-NO F-15 C depth 0 0 0 0 0 0 0
Norway-Statoil-NO F-15 C time 0 0 0 0 0 1 1
Norway-Statoil-NO F-15 D time 28 47 7 33 0 2 15
Norway-StatoilHydro F-15 S depth 5 6 3 2 3 14 1
Norway-StatoilHydro F-15 S time 3 5 3 2 4 15 2
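
Counts of the kind shown in Table 4 could be produced with a short script along the following lines. This is a minimal sketch, not the code used to generate the table: it assumes a case-insensitive substring match of each keyword against the attribute names of a single log, and the function name `count_keyword_attributes` and the example attribute names are hypothetical, chosen only for illustration.

```python
# Keywords taken from the column headers of Table 4.
KEYWORDS = ["Neutron", "Gamma", "Inclination", "Azimuth",
            "Continuous", "MWD", "Caliper"]


def count_keyword_attributes(attribute_names, keywords=KEYWORDS):
    """Count how many attribute names contain each keyword.

    Matching is a case-insensitive substring test (an assumption; the
    matching rule behind Table 4 is not stated in the paper).
    """
    counts = {keyword: 0 for keyword in keywords}
    for name in attribute_names:
        lowered = name.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


# Hypothetical attribute names, made up for illustration only.
example_attributes = [
    "Gamma Ray, Average",
    "MWD Continuous Inclination",
    "MWD Continuous Azimuth",
    "Hole Depth",
]

if __name__ == "__main__":
    for keyword, count in count_keyword_attributes(example_attributes).items():
        print(f"{keyword}: {count}")
```

Note that one attribute can contribute to several columns: in the example above, "MWD Continuous Inclination" is counted under Inclination, Continuous, and MWD, so the counts in a row need not sum to the number of attributes in the log.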
