Petroleum: Big Data Analytics in Oil and Gas Industry: An Emerging Trend

Petroleum xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Petroleum

journal homepage: http://www.keaipublishing.com/en/journals/petroleum

Big Data analytics in oil and gas industry: An emerging trend

Mehdi Mohammadpoor, Farshid Torabi∗
Faculty of Engineering and Applied Sciences, University of Regina, Regina, S4S 0A2, SK, Canada

ARTICLE INFO

Keywords:
Big Data
Hadoop
R
Oil and gas industry

ABSTRACT

This paper reviews the utilization of Big Data analytics, as an emerging trend, in the upstream and downstream oil and gas industry. Big Data or Big Data analytics refers to a new technology which can be employed to handle large datasets which include six main characteristics of volume, variety, velocity, veracity, value, and complexity. With the recent advent of data recording sensors in exploration, drilling, and production operations, the oil and gas industry has become a massively data-intensive industry. Analyzing seismic and micro-seismic data, improving reservoir characterization and simulation, reducing drilling time and increasing drilling safety, optimizing the performance of production pumps, improving petrochemical asset management, improving shipping and transportation, and improving occupational safety are among the applications of Big Data in the oil and gas industry. Although the oil and gas industry has recently become more interested in utilizing Big Data analytics, there are still challenges, mainly due to lack of business support and awareness about Big Data within the industry. Furthermore, the quality of the data and understanding the complexity of the problem are also among the challenges facing the application of Big Data.

1. Introduction

The recent technological improvements have resulted in daily generation of massive datasets in oil and gas exploration and production industries. It has been reported that managing these datasets is a major concern among oil and gas companies. A report by Brule [1] stated that petroleum engineers and geoscientists spend over half of their time searching for and assembling data. Big Data refers to the new technologies for handling and processing these massive datasets. These datasets are recorded in different varieties and generated in large volume in various operations of the upstream and downstream oil and gas industry [2–10]. Moreover, in most cases, if processed efficiently, they can reveal important underlying governing equations behind sophisticated engineering problems. It is reported by Mehta [11] that, based on the results of a survey conducted by General Electric and Accenture among executives, 81% of them considered Big Data to be among the top three priorities of oil and gas companies for 2018. Based on their paper, the main reason behind this popularity is the need for improving oil and gas exploration and production efficiency. This viewpoint and prediction among executives for 2018 becomes more interesting once we compare it with the findings reported by Feblowitz [12] in 2013. Based on a survey in 2012 by IDC Energy, 70% of the participants from U.S. oil and gas companies were not familiar with Big Data and its applications in petroleum engineering. This shows how the interest in Big Data changed from 2012 to 2018 among oil and gas industry executives.

This paper presents an extensive review of recent papers about the application of Big Data analytics in both the upstream and downstream oil and gas industry. In the first part of the paper, Big Data is defined and the processing tools are introduced. In the second part, the utilization of Big Data in the oil and gas industry is presented. In the last part, the major challenges facing Big Data analytics in the oil and gas industry are addressed.

2. Big Data analytics

2.1. Big Data definition

Big Data includes unstructured (not organized and text-heavy) and multi-structured data (including different data formats resulting from people/machine interactions) [13]. The term Big Data (also called Big Data Analytics or business analytics) points to the first characteristic of this method, which is the size of the available data set. There are other characteristics related to the data which make it viable for Big Data tools. Those characteristics are named by IBM as the three Vs. These three Vs refer to volume, variety, and velocity [14]. However, more recent articles have added two more Vs to give a better definition for
Peer review under responsibility of Southwest Petroleum University.
∗ Corresponding author.
E-mail address: farshid.torabi@uregina.ca (F. Torabi).
https://doi.org/10.1016/j.petlm.2018.11.001
Received 20 August 2018; Received in revised form 23 November 2018; Accepted 30 November 2018
2405-6561/ Copyright © 2019 Southwest Petroleum University. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (...org/licenses/BY-NC-ND/4.0/).
Please cite this article as: Mohammadpoor, M., Petroleum, https://doi.org/10.1016/j.petlm.2018.11.001
Big Data. The additional Vs include veracity and value [15].

Volume refers to the quantity of data or information. These data can come from any sensor or data recording tool. This vast quantity of data is challenging to handle due to storage, sustainability, and analysis issues [13]. Many companies are dealing with huge volumes of data in their archives; however, they do not have the capability of processing these data. The main application of Big Data is to provide processing and analysis tools for the increasing amounts of data [15].

This characteristic of Big Data can be seen in various sectors of the oil and gas industry, such as exploration, drilling, and production. During oil and gas exploration, seismic data acquisition generates a large amount of data used to develop 2D and 3D images of the subsurface layers. For offshore seismic studies, narrow-azimuth towed streaming (NATS) uses the gathered data to develop images of the underlying geology. Wide azimuth (WAZ) is a more recent innovation to capture more data and develop higher quality images. All these tools and innovations are generating more data, which requires processing and analysis.

Recent innovations in drilling tools are also generating large amounts of data during drilling operations. Tools such as logging while drilling (LWD) and measurement while drilling (MWD) transmit various data to the surface in real time.

Optical fibers combined with various sensors are now being used in well tubulars to record different parameters such as fluid pressure, temperature, and composition during oil and gas production [12].

The term velocity as a characteristic of Big Data refers to the speed of data transmission and processing. It also refers to the fast pace of data generation. The challenging issue about the velocity component is the limited number of available processing units compared to the volume of data. The current pace of data generation is enormous: 5 exabytes of data are generated in just two days. This is equivalent to the total amount of data created by humans until 2003 [16].

The velocity characteristic is even more prominent for the oil and gas industry due to the complex nature of various petroleum engineering problems. It is not feasible for an individual to process the large amount of data generated for a complex problem; attempting to do so results in significant delay and uncertainty. There are many cases in which real-time, fast processing of data is crucial in the oil and gas industry. For example, fast processing of well data during drilling can help identify kicks and prevent destructive blow-outs efficiently [12].

Variety refers to the various types of data which are generated, stored, and analyzed. The data recording devices and sensors differ in type, and as a result the generated data can come in different sizes and formats. The generated data can be in text, image, audio, or video format. The classification can be done in a more technical way as structured, semi-structured, and unstructured data [16]. It is reported that generally 90% of the generated data is unstructured [15]. However, the majority of the data generated in oil and gas from SCADA systems, surface and subsurface facilities, drilling data, and production data are structured data. These data could be time series data which have been recorded over a certain course of time. Another source of structured data includes the asset, risk, and project management reports. There are also external structured data sources such as market prices and weather data, which can be used for forecasting. The sources of unstructured data in the oil and gas industry include well logs, daily written drilling reports, and CAD drawings. The sources of semi-structured data include processed data resulting from modeling and simulation. There are various practices of experimental and computer simulation in the oil and gas industry to generate data for further analysis. These data can be categorized as semi-structured data and later used with Big Data tools [12].

Veracity refers to the quality and usefulness of the available data for the purpose of analysis and decision making. It is about distinguishing between clean and dirty data. This is very important, as dirty data can significantly affect the velocity and accuracy of data analysis. The generated data should be professionally and efficiently processed and filtered before being used for data analysis; otherwise the results will not be reliable. The veracity of data is challenging in the oil and gas industry specifically due to the nature of the data, which mainly come from subsurface facilities and might include uncertainty. Another challenge comes from the data collected by conventional manual data recording, which is done by human operators.

Value is a very significant characteristic of Big Data. The return on investment in Big Data infrastructure is of great importance. Big Data analyzes huge data sets to reveal the underlying trends and help engineers forecast potential issues. Knowing the future performance of the equipment used during operation and identifying failures before they happen can give the company a competitive advantage and bring value to the company.

It is also stated in the literature that besides these five Vs there is another important characteristic which should be considered when applying Big Data. This characteristic is the complexity of the problem for which the data gathering is conducted [17]. Dealing with large data sets which come from a complex computing problem is sophisticated, and finding the underlying trend can be challenging. For these problems Big Data tools can be very helpful. Fig. 1 summarizes the above-mentioned characteristics of Big Data.

Fig. 1. Big Data characteristics.

2.2. Big Data methodology

As Big Data involves huge data sets and, in some cases, complicated problems, it is very important to have access to innovative and powerful technologies. These robust technologies should provide very fast and accurate processing. In this section the tools and technologies which are available for Big Data analytics are listed and introduced.

2.2.1. Apache Hadoop

This tool is an open-source framework created by Doug Cutting and Mike Cafarella in 2005 and named after a toy elephant [15]. Hadoop is written in Java [14] and uses distributed processing across large clusters of computers [15]. Hadoop has the capability of parallel processing of huge data sets, which results in scalable computing. Apache Hadoop comprises two major layers: the Hadoop distributed file system (HDFS) and MapReduce. In fact, Apache Hadoop is a framework to implement the MapReduce programming model [18]. The tasks are handled in two major phases. The first phase, which is storing data, is done under the HDFS layer with its master/slave architecture by a master server called NameNode and clusters of slaves which are called DataNodes. Fig. 2 shows the architecture of the HDFS layer.

The second phase of handling tasks, which includes tracking and executing jobs, takes place in the MapReduce layer. The master node for MapReduce is called JobTracker and the slave node is called TaskTracker [18]. In other words, the data processing and analysis in Hadoop is conducted in two phases which are called the Map phase and the Reduce phase. MapReduce can handle large datasets in parallel by using multiple clusters. These clusters are scalable, flexible, and fault-tolerant [18]. In Map phase the data will be divided into two
groups of Key and Value. In fact, the key is the node ID and the value is the property of the node. So, the input data are taken by MapReduce in key-value pairs and JobTracker assigns tasks to TaskTracker. Further processing of the data is then conducted by TaskTracker. The output data of the Map phase are sorted and stored in a local file system during an intermediate phase. In the next step, the sorted data are passed to the Reduce phase, where the input data are combined [20]. Fig. 3 shows the architecture of MapReduce.

Fig. 2. HDFS architecture with Namenode and Datanodes [19].

Fig. 3. MapReduce architecture with Map and Reduce phases [20].

2.2.2. MongoDB

This is a NoSQL (non-relational) database technology which is document-oriented, based on JSON, and written in C++. JSON is a data interchange format based on JavaScript and is built on a collection of name/value pairs or an ordered list of values. NoSQL database technology can handle unstructured data such as documents, multimedia, and social media. Moreover, MongoDB provides a dynamic and flexible structure that can be customized to fit the requirements of various users [13,21–24].

2.2.3. Cassandra

This is another NoSQL database technology, which is key- and column-oriented. Cassandra was first a Facebook project that became open source a few years later. It is especially efficient where it is possible to spend more time learning a complex system which will provide a lot of power and flexibility [23].

2.3. Big Data processing

The Big Data sets which are collected need to be analyzed to extract the valuable underlying information. There are different processing tools which translate the large data sets into meaningful results and outcomes. Following is a list of common processing tools for Big Data.

2.3.1. R

R is a modern, functional programming language that allows for rapid development of ideas, together with object-oriented features for rigorous software development. It was initially created by Robert Gentleman and Ross Ihaka. The powerful set of inbuilt functions makes it ideal for high-volume analysis or statistical simulations. It also supports a packaging system, which means that code provided by others can easily be shared. Finally, it generates high-quality graphical outputs, so that all stages of a study, from modeling/analysis to publication, can be undertaken within R [25].

It can be said that R is a specialized language which includes various modules and toolboxes mainly to facilitate statistical computations. It can help with loading data, conducting complicated computations, and finally visualizing the results and outputs. However, from a data processing point of view, R's major drawback is that it works only with datasets that fit within a single machine's memory [23].

2.3.2. Datameer

Datameer is an easy-to-use programming platform which uses Hadoop to improve its data processing. It comes with user-friendly data importing and output visualization tools. It is expected to gain more interest as it uses a user-friendly interface to conduct various data processing tasks [23].
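Tools such as Datameer that sit on top of Hadoop ultimately rely on the Map, sort, and Reduce phases described in Section 2.2.1. The following is a minimal single-process sketch of those three phases in plain Python, not Hadoop code: there is no JobTracker, TaskTracker, or HDFS involved. The record layout (a well identifier paired with a pressure reading), the sample values, and all function names are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical sensor records: (well_id, pressure_psi). Values are invented.
RECORDS = [
    ("well-A", 3120.0),
    ("well-B", 2980.5),
    ("well-A", 3155.2),
    ("well-C", 3010.0),
    ("well-B", 2991.3),
]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for well_id, pressure in records:
        yield well_id, pressure


def shuffle_and_sort(pairs):
    """Intermediate phase: group the values by key, then sort by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: combine the values of each key (here, take the maximum)."""
    return {key: max(values) for key, values in grouped}


def run_job(records):
    """Chain the three phases in the order described in Section 2.2.1."""
    return reduce_phase(shuffle_and_sort(map_phase(records)))


result = run_job(RECORDS)
print(result)
# {'well-A': 3155.2, 'well-B': 2991.3, 'well-C': 3010.0}
```

In a real cluster the map and reduce functions would run on many nodes in parallel and the intermediate data would be written to local disks; the sketch only mirrors the logical data flow from key-value pairs to combined results.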
2.3.3. BigSheets

IBM has offered a web application called BigSheets, which helps less expert and nontechnical users to gather unstructured data from various online and internal sources, conduct a data analysis, and present the results with simple visualization tools. BigSheets also utilizes Hadoop to process massive datasets. It also employs some additional tools such as OpenCalais to facilitate the extraction of structured data from a pool of unstructured data. This tool should be used for data analysis individually, and it is easier to use for users familiar with spreadsheet applications [23].

3. Big Data in upstream oil and gas industry

The application of Big Data now extends beyond database, marketing, and business techniques. Many engineering disciplines are utilizing Big Data analytics for various applications. Recently, the upstream oil and gas industry has also been impacted by the versatility of Big Data. The application of Big Data has become prominent as the amount of data generated and recorded in the oil and gas industry has significantly increased. The improvements in seismic acquisition devices, channel counting, fluid front monitoring geophones, carbon capture and sequestration sites, LWD, and MWD tools have provided vast amounts of data to be processed and analyzed [26]. Anand [27] presents an informative description of why and how Big Data can now reveal a great deal of hidden information from the vast amount of available data in the oil and gas industry. He used a 3D plane to show the relationship between data, science, technology, engineering, and mathematics (STEM) tools, and pattern recognition. As shown in Fig. 4, if a limited amount of data is utilized with basic STEM tools, the result reveals limited patterns, which may lack thorough insight and may carry significant uncertainty. However, if a large data set is available and used with more sophisticated STEM tools, more promising patterns can be recognized, which may be much closer to the true values [27].

Fig. 4. The relationship between data, STEM tools, and patterns perception [27].

3.1. Big Data in exploration

The task of interpreting seismic data requires sophisticated processing computers with powerful visualization capabilities. With the recent improvements in seismic devices, the amount of generated data has increased significantly. The detailed interpretation of these new datasets needs to go beyond the conventional methods. In fact, one of the most important applications of Big Data in the oil and gas industry is analyzing seismic data [28]. Machine learning tools can reveal the relationship between the recorded data more efficiently, specifically when dealing with huge datasets. In a study conducted by Roden [29], the author combined principal component analysis (PCA) with self-organizing maps (SOM) to carry out multi-component seismic analysis. In his research, the analysis followed five stages. During the first stage, the geological issue was clearly defined; during the second stage, PCA was run to identify the key attributes related to the defined problem; during the third stage, SOM was run by employing machine learning tools to train a prediction tool; during the fourth stage, the outcomes of the SOM analysis were further analyzed with 2D maps to identify the important geological features; finally, during the fifth stage, a sensitivity analysis was conducted to refine the results by considering various attributes and different training scenarios [29].

In another study by Joshi et al. [30], Big Data was utilized to analyze micro-seismic data sets to model the fracture propagation maps during hydraulic fracturing. In this research, the authors used the Hadoop platform instead of conventional tools to manage the massive datasets generated by micro-seismic tools. They used various datasets from exploration, drilling, and production operations to characterize the reservoir. Furthermore, the success ratio was improved by detecting potential anomalies based on previous failed jobs [30].

In a study by Olneva et al., Big Data was used to cluster 1D, 2D, and 3D geological maps for the West Siberian Petroleum Basin with seismic data. For their work, they followed two different approaches, which the authors called the “from general to particulars” and “from particulars to general” approaches. For the first approach, they used drilling data and regional maps for 5000 wells. For the second approach, they used seismic and geological patterns for more than 40000 km² [31].

3.2. Big Data in drilling

There are various sources of data in drilling industry which mainly include the generated data from digital rig site and
manually entered data by human operators. These data, which are gathered from different operations throughout drilling, can be applied to conduct various analyses, from scheduling to the drilling operation itself. The invention and application of new data recording tools and data formats have made it even more relevant to employ Big Data tools in drilling operations. There are now more than 60 different sensors which record various parameters throughout drilling operations [32]. In a work done by Duffy et al. [33], the drilling rig efficiency was improved by implementing best-safe-practices initiatives identified by an automated drilling state detection monitoring service. In their case study on pad drilling in the Bakken, they focused on weight-to-weight (W2W) connection time during drilling operations. Based on their results, a savings of more than 11.75 days on a single pad of nine wells drilled by the same rig was observed. They also found that the total non-drilling time was improved by 45%. In another study by Maidla et al. [34], the drilling performance was improved by applying Big Data analytics and including drilling and formation parameters. In their study, the data from the morning report, the electronic drilling recorder (EDR), and cross-plots of weight on bit (WOB) and differential pressure were used to optimize the drilling performance. In their study, they emphasized that data filtering, quality control, and knowing the basic physics behind the problem under study are critical factors which should be considered in order to find a reliable optimized outcome. Otherwise, the findings can be misleading, which results in loss of time and resources.

In another study done by Yin et al. [35], Big Data was used to find the invisible non-production time (INPT) by using the collected real-time logging data. The authors improved the drilling operations by optimizing the INPT through using mathematical statistics, the artificial experience, and cloud computing.

In a study by Johnston and Guichard [36], Big Data was employed to reduce the risks associated with drilling operations. They used drilling data, well logging data, and geological formation tops for about 350 oil and gas wells in the UK North Sea. They were dealing with different data types such as .txt, .xls, .pdf, and .las. They reported that the challenging part was the data gathering and processing step in the project.

In a study by Hutchinson et al. [37], the data from downhole vibration sensors were utilized to characterize the drill string dynamics. In their study they combined the actual data with simulation data to develop a drilling automation application. Their developed model reduced the risks of drilling failures and also lowered the drilling development costs.

3.3. Big Data in reservoir engineering

The advent of distributed downhole sensors such as distributed temperature sensors (DTS), discrete distributed temperature sensors (DDTS), distributed acoustic sensors (DAS), single-point permanent downhole gauges (PDG), and discrete distributed strain sensors (DDSS) has resulted in the generation of huge amounts of data in the field of reservoir characterization. Bello et al. [38] used these data to develop a reservoir management application based on Big Data analytics. The four major components of their application were a visualizer, downhole data filtering, a model builder, and model application. The visualizer helped with data viewing and analysis, while the filtering component was used to eliminate outliers and non-reliable data. For the model builder, machine learning tools were used to do the training, model development, and validation. They used the Apache Spark machine learning tool (MLlib) to conduct the Big Data analytics. They also showed that transferring the developed model to a web-based platform can facilitate the user/system interactions [38].

Recently, a new generation of reservoir simulation techniques is becoming more popular. This new technique combines artificial intelligence and data mining technologies with Closed-Loop Reservoir Management (CLRM) and Integrated Asset Modeling (IAM). The result is an innovative information-oriented reservoir modeling approach. In fact, data-driven methods can improve the modeling by predicting the affecting parameters which theory-based equations of state cannot capture [1,39].

In a study by Haghighat et al. [40], Big Data and data-driven methods were utilized to improve CO2 sequestration by predicting the possibility of CO2 leakage. For this purpose, two permanent downhole gauges (PDG) were installed in an observation well to collect the pressure data. The different scenarios of CO2 leakage were modeled by using a simulated reservoir model for the field of interest (Citronelle Dome, Alabama). Machine learning tools were used to analyze the high-volume, high-frequency pressure data. Finally, they were able to develop a real-time, long-term CO2 leakage detection system.

In a study carried out by Popa et al. [41], Big Data was utilized to optimize heavy oil reservoirs under steam assisted gravity drainage (SAGD) and cyclic steam operations. In their research, the authors focused on Chevron's San Joaquin heavy oil reservoirs, which in total included more than 14,200 wells. This number of wells provided a vast amount of structured and unstructured static and dynamic data, including various logs, temperature, steam, core data, fluid saturation, well completion data, geological features, steam injection rate and pressure, and flow-line and wellhead temperature. The workflow of their study followed these steps: (1) data acquisition, (2) data transfer to business domain, and (3) data storage [41].

Big Data has also been used to conduct reservoir modeling for unconventional oil and gas resources [42,43]. Lin [42] combined physics-based and analytics-based solutions to carry out reservoir modeling by using Big Data.

Udegbe et al. [44] used Big Data to improve the modeling of hydraulically fractured reservoirs by analyzing the production data. They generated the required data by developing a dual-permeability model and trying various fracture parameters. They applied a pattern recognition methodology (similar to a face detection technology) to the generated data to reveal the underlying trends in the data.

Big Data has also been used to optimize the selection and application of costly enhanced oil recovery (EOR) methods. In a study done by
Xiao and Sun [45], the researchers employed Big Data analytics to optimize the application of EOR projects through an improved hydrodynamic reservoir simulation.

3.4. Big Data in production engineering

Seemann et al. [46] from Saudi Aramco developed a smart forecast and flow method to conduct automated decline analysis. Their goal was to identify the underlying pattern in production data and to forecast the production performance.

Rollins et al. [47] conducted a study for Devon Energy to develop a production allocation technique by using Big Data. For the first task, they used the publicly available data from IHS to develop an allocation methodology. In the next step, Big Data was used as a platform to conduct the allocation procedure for the users. The processing tool for Big Data in their study was Hadoop. They finally developed a user-friendly map-based visual output for the allocated production data.

Moreover, Big Data has been successfully used to optimize the performance of electric submersible pumps (ESPs) [48,49]. Sarapulov and Khabibullin [48] utilized Big Data to evaluate the performance of ESPs by identifying emergency situations such as overheating and unsuccessful start-ups. For their study a total of about 200 million logs were gathered from 1649 wells during one year. The raw data gathered were in various formats, so the authors first converted all the data to csv format.

In a study done by Palmer and Turland [50], Big Data was utilized to optimize the performance of rod pump wells based on a three-step workflow. The three steps of their workflow included the first step to be the data acquisition which was [...] workflows which conducted the required calculations to develop the model, and the third step was interactive data visualization which provided a user-friendly interface to extract the results [50].

Shale operators are also using Big Data to improve hydraulic fracturing projects. In a project done by a shale operator, Southwestern Energy, the field and simulation data revealed that proppant loading and spacing between fracturing stages would significantly affect the productivity index [51].

In another study conducted by Ockree et al. [52], Big Data was used to develop AI-based production type curves to be incorporated with economic analysis for field development. In their work, the first step was an extensive data processing pipeline including raw data (structured and unstructured databases) gathering, data filtering, joining the filtered data, and transferring the data to a machine learning pipeline. The authors used the Robust Mahalanobis technique to remove the outliers from the gathered data.

4. Big Data in downstream oil and gas industry

4.1. Big Data in refining

In a study by Plate [53], the application of Big Data in refining is reviewed. In this case study, the historical data were analyzed and processed to improve petrochemical asset management in a three-step procedure. In this case study, the equipment of interest was a four-stage cracked gas compressor (CGC). The analysis started by first predicting the performance of the CGC by analyzing the current and historical operating data. In the next phase, based on the device's end-of-life criteria and failure conditions, the performance prediction of the CGC was further tuned. Finally, the estimated [...] are developed by employing data analysis can significantly reduce the downtime and maintenance costs.

In a recent project by Repsol SA, Big Data analytics is utilized to conduct management optimization for one of the company's integrated refineries in Spain. For this project, Google Cloud would provide Repsol with data analytics products and consultation as well as Google Cloud machine learning services [54].

In a study by Khvostichenko and Makarychev-Mikhailov [17], Big Data was used to develop a workflow to investigate the effects of completion parameters on well productivity. They gathered data from 4500 wells which were under slickwater treatment. They investigated the effects of two different chemicals, i.e. linear guar gels and surfactant-based flowback aids. They also gathered the monthly production data from the IHS Energy database. The statistical approach used to analyze the data was the t-test.

4.2. Big Data in oil and gas transportation

Anagnostopoulos [55] conducted research on applying Big Data analytics in order to improve shipping performance. In his study, he aimed to predict the propulsion power to improve the performance of ships and consequently to lower greenhouse gas emissions. The data gathered for this study were collected over a period of three months from the sensors throughout an LCTC (Large Car Truck Carrier) M/V. In the next step, they used eXtreme Gradient Boosting (XGBoost) and Multi-Layer Perceptron (MLP) neural networks to conduct the data analysis.

4.3. Big Data in Health and Safety Executive (HSE)

In a study by Park et al. [56], Big Data was utilized to develop an energy efficiency model based on publicly available automatic identification system data and marine environment data. The energy efficiency was defined as the ship fuel consumption by engine power versus the operation weight and distance. For implementing Big Data, the authors used the Hadoop framework and Apache Spark for machine learning tasks.

In a study conducted by Tarrahi and Shadravan [57], Big Data analytics was used to improve oil and gas occupational safety by managing the risk and enhancing the safety. The study was carried out based on a case by the Bureau of Labor Statistics (BLS) which included 846 sources of injury from 1278 industries between 2011 and 2014. The first step in their study was data collection and processing. For this purpose they filtered the raw data based on the quality of recordings and eliminated the outliers from the datasets by relative standard error measurement. Then they developed the structured data by format conversion and data decoding. In the next step the authors conducted data clustering and mapping to identify the underlying hidden trends. At the end, in order to present an easily understandable outcome, they used multi-dimensional statistical analysis [57,58].

It is reported by Pettinger [59] that the data gathered from safety inspections can be used to develop safety predictive analytics. It is crucial to gather the safety indicator data within the company continuously and incorporate them in predictive analytics. The safety indicators which provide the required data include assessing behaviors and assessing compliance.

Cadei et al. [60] employed Big Data to develop prediction software to forecast hazard events and operational upsets during oil and gas production operations. The indicator that they used as a hazard event for prediction was
comprised of well test data, performance of the CGC was on the operation data gathered H2S concentration. They gathered
well equipment data, and presented in a user-friendly during ship operations. In their data from various sources
supervisory control and data and visual report to be used for study, an energy indicator called including real-time series,
acquisition (SCADA), the management decisions [53]. energy efficiency operational historical data, maintenance
second step was automated These predictive reports which indicator (EEOI) was estimated reports, operator data, and
6
chemical analysis. The challenges in digital oilfields is scientists to correctly apply the
workflow of their study in- 5. Big Data challenges the data transfer from the field Big Data tools to solve the
cludes data collection, problem to data processing facilities various problems in the field of
definition, data processing, One of the major challenges based on the type of data, petroleum engineering.
modeling (using artificial neural of Big Data's application in any amount of data, and data It is recommended by
network (ANN), random industry including oil and gas protocols [64,65]. Preveral et al. [66] that each
forest), and finally model industry is the cost associated In a survey conducted by company develop their specific
validation. with managing the data IDC Energy [12], it was found Big Data tools, including data
recording, storage, and that the biggest challenge in recording and storage facilities
analysis. With the recent utilizing Big Data in oil and gas and also data analytic tools. This
technological improvements, industry is lack of awareness would reduce the cost of
fog computing, cloud and business support. Other software ownership and it
computing, and Internet of challenges found in that survey would optimize the value of the
Things (IoT) have become were decision about the recorded data.
available to fix the issues relevant data, lack of skilled
regarding data storage and personnel, and cost of Big Data 6. Conclusions
computations [22,61]. Costly infrastructure. Therefore,
and limited cloud computing familiarizing the staff and In this paper a
facilities are not suitable executive members with the comprehensive review was
options for non-fixed location technology and its applications conducted on the application
or latency-sensitive appli- will significantly facilitate the of Big Data analytics in oil and
cations. On the other hand, fog implementation of Big Data in gas industry. The term Big Data
computing facilities provide oil and gas industry. (also called Big Data Analytics
storage and computing In a more recent study, or business analytics) defines
facilities closer to data Maidla et al. [34] listed more the first characteristic of this
generation sources, which technical challenges facing the method, which is the volume
resolves the mentioned application of Big Data. Based (size) of the available data set.
challenges to some extent. on their research, the technical The other characteristics of Big
However, IoT is a newer issues were mainly related to Data are velocity, variety,
technology, which is more the limitations associated with veracity, value, and complexity.
mobile and fixes the latency the data recording sensors. The Because of the recent
issues as well [62]. other issue was the frequency improvements in data recording
In a study done by of data recording and also the technologies and the necessity
Cameron [63], the author quality of the recorded data. for efficient exploration and
mentions that the challenges of Finally, an important challenge production operations, Big Data
using Big Data for oilfield is the thorough understanding has gained interest and sig-
service companies include the of the physics of the problem. nificance in oil and gas
knowledge of personnel in oil Expert petroleum engineers industry. For the exploration
companies and the data should collaborate with data operations, the
ownership issues. He mentions
that Big Data can be used for recent improvements in seismic refining, oil and gas
seismic analysis, reservoir devices, the amount of generated transportation, and HSE.
modeling, drilling services, and data has boosted significantly. It Although Big Data is gaining
production reporting [63]. has been reported that methods interest by E&P companies, but
Further- more, he defined nine such as PCA analysis or there are still some major
factors for a successful platforms such as Hadoop can be challenges which are required to
application of Big Data for oil used to interpret seismic and be addressed in order to apply the
and gas industries including micro-seismic data. In a case Big Data efficiently. Those
accurately defining the study in the field of drilling challenges mainly include lack of
business problem, combining engineering, the data obtained business support and awareness
Big Data methods with through an automated drilling about the Big Data within the in-
physics-based data analysis, state detection monitoring dustry, quality of the data, and
using interdisciplinary team of service, was analyzed to improve understanding the complexity of
computer scientists and the drilling time and drilling the problem.
petroleum engineers, safety. Furthermore, analyzing
delivering the results as a user- the data from DTS, DDTS, DAS, Acknowledgements
friendly interface, being need- PDG, and DDSS sensors have
driven, and addressing exactly improved the reservoir The authors gratefully
how the solved problem is characterization and appreciate the financial support
related to the whole picture simulation. Big Data has been from Petroleum Technology
[63]. successfully used in production Research Centre and Mitacs.
The emergence of Big Data engineering in areas such as
in oil and gas industry has optimization of the performance References
become more prominent by electric submersible pumps and
evolution of digital oilfields, production allocation [1] M.R. BruléGroup IBMS, The Data
where various sensors and techniques. Big data has also Reservoir : How Big Data
Technologies Advance Data
recording devices are been successfully used in Management and Analytics in E & P
generating millions of data downstream of oil and gas in- Introduction – General Data
each day. One of the critical dustry in areas such as oil Reservoir Concepts Data, Reservoir
for E & P, 2015.
7
[2] B.C. Wipro, K.K. Wipro, Smart Decision Making Needs Automated Analysis "Making Sense Out of Big Data in Real-time", (2014).
[3] W. Wu, X. Lu, B. Cox, G. Li, L. Lin, Q. Yang, et al., Retrieving Information and Discovering Knowledge from Unstructured Data Using Big Data Mining Technique: Heavy Oil Fields Example, (2014).
[4] A. Bin Mahfoodh, M. Ibrahim, M. Hawi, K. Hakami, S. Aramco, Introducing a Big Data System for Maintaining Well Data Quality and Integrity in a World of Heterogeneous Environment Methodology, (2017).
[5] R.K. Perrons, J. Jensen, The Unfinished Revolution: what Is Missing from the E&P Industry's Move to "Big Data", (2014).
[6] R.K. Perrons, J.W. Jensen, I. Corporation, Data as an Asset: what the Upstream Oil & Gas Industry Can Learn about "Big Data" from Companies like Social Media what Has Made Big Data Possible? (2014).
[7] M. Akoum, A. Mahjoub, SPE 167410 a unified Framework for Implementing Business Intelligence, Real-time Operational Intelligence and Big data Analytics for Upstream, (2013), pp. 1–15.
[8] C.J.N. Sousa, I.H.F. Santos, V.T. Almeida, A.R. Almeida, G.M. Silva, A.E. Ciarlini, et al., Applying Big Data Analytics to Logistics Processes of Oil and Gas Exploration and Production through a Hybrid Modeling and Simulation, (2015).
[9] K. Hilgefort, Big data analysis using bayesian network modeling: a case study with WG-ICDA of a gas storage field, Nace Int. (2018) 1–13.
[10] A. Sukapradja, J. Clark, H. Hermawan, S. Tjiptowiyono, E. Total, P. Indonesie, Sisi nubi Dashboard: implementation of business intelligence in reservoir modelling & Synthesis: managing Big data and streamline the decision making process, Field General. 1–14 (2017).
[11] A. Mehta, Tapping the Value from Big Data Analytics, (2018) 2016–7.
[12] J. Feblowitz, I.D.C.E. Insights, Analytics in Oil and Gas: the Big Deal about Big Data, (2013), pp. 5–7.
[13] M.R. Trifu, M.L. Ivan, Big Data: Present and Future, n.d., pp. 32–41.
[14] H.E. Pence, What is Big Data and Why is it Important? vol. 43, (2015), pp. 159–171, https://doi.org/10.2190/ET.43.2.d.
[15] J. Ishwarappa, J. Anuradha, A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology vol. 48, (2015), pp. 319–324, https://doi.org/10.1016/j.procs.2015.04.188.
[16] M.S. Sumbal, E. Tsui, E.W.K. See-to, Interrelationship between Big Data and Knowledge Management: an Exploratory Study in the Oil and Gas Sector, (2017), https://doi.org/10.1108/JKM-07-2016-0262.
[17] D. Khvostichenko, S. Makarychev-Mikhailov, Effect of fracturing chemicals on well Productivity: avoiding pitfalls in Big data analysis, SPE Int. Conf. Exhib. Form. Damage Control, Lafayette: Society of Petroleum Engineers, 2018, https://doi.org/10.2118/189551-MS.
[18] M. Rehan, D. Gangodkar, Hadoop, MapReduce and HDFS: a developers perspective, Procedia Comput Sci 48 (2015) 45–50, https://doi.org/10.1016/j.procs.2015.04.108.
[19] D. Borthakur, HDFS Design, n.d. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html (Accessed August 7, 2018).
[20] A4ACADEMICS, MapReduce Architecture, n.d. http://a4academics.com/tutorials/83-hadoop/840-map-reduce-architecture (Accessed August 7, 2018).
[21] C. Győrödi, R. Győrödi, G. Pecherle, A. Olah, A comparative Study: MongoDB vs. MySQL (2015) 0–5.
[22] N. Mounir, Y. Guo, Y. Panchal, I.M. Mohamed, A.W. Management, Integrating Big Data: Simulation, Predictive Analytics, Real Time Monitoring, and Data Warehousing in a Single Cloud Application, (2018).
[23] P. Warden, Big Data Glossary, O'Reilly, Sebastopol, CA, USA, 2011.
[24] T. Kudo, M. Ishino, K. Saotome, N. Kataoka, A proposal of transaction processing method for MongoDB, Procedia Comput Sci 96 (2016) 801–810, https://doi.org/10.1016/j.procs.2016.08.251.
[25] S.J. Eglen, A Quick Guide to Teaching R Programming to Computational Biology Students 5 (2009) 8–11, https://doi.org/10.1371/journal.pcbi.1000482.
[26] J. Spath, S.P.E. President, Big Data!, (2015).
[27] P. Anand, S. Resources, Big Data Is a Big Deal, (2013).
[28] A. Alfaleh, S. Aramco, Y. Wang, A. Texas, B. Yan, Topological Data Analysis to Solve Big Data Problem in Reservoir Engineering: Application to Inverted 4D Seismic Data, (2015).
[29] R. Roden, G. Insights, Seismic Interpretation in the Age of Big Data, (2016), pp. 4911–4915.
[30] P. Joshi, R. Thapliyal, A.A. Chittambakkam, R. Ghosh, S. Bhowmick, S.N. Khan, OTC-28381-MS Big Data Analytics for Micro-seismic Monitoring, (2018), pp. 20–23.
[31] T. Olneva, D. Kuzmin, S. Rasskazova, A. Timirgalin, G. Ntc, Big data approach for geological study of the Big region West Siberia, SPE Annu. Tech. Conf. Exhib, SPE, Dallas, 2018.
[32] N. Rossi, J. Michelez, F. Concina, Big Data for Advanced Well Engineering Holds Strong Potential to Optimize Drilling Costs, (2018).
[33] W. Duffy, J. Rigg, E. Maidla, T.D.E. Petroleum, D. Solutions, Efficiency Improvement in the Bakken Realized through Drilling Data Processing Automation and the Recognition and Standardization of Best Safe Practices, (2017).
[34] E. Maidla, W. Maidla, J. Rigg, M. Crumrine, P. Wolf-zoellner, Drilling analysis using Big data has been misused and abused, IADC/SPE Drill. Conf. Exhib., Fort Worth, 2018, https://doi.org/10.2118/189583-MS.
[35] Q. Yin, J. Yang, B. Zhou, M. Jiang, X. Chen, C. Fu, et al., Improve the Drilling Operations Efficiency by the Big Data Mining of Real-time Logging, SPE/IADC-189330-MS, 2018.
[36] J. Johnston, A. Guichard, New findings in drilling and wells using Big data analytics, Offshore Technol. Conf, SPE, Houston, 2015.
[37] M. Hutchinson, L.D. International, B. Thornton, P. Theys, Optimizing drilling by simulation and automation with Big data, SPE Annu. Tech. Conf. Exhib, Society of Petroleum Engineers, Dallas, 2018, https://doi.org/10.2118/191427-MS.
[38] O. Bello, D. Yang, S. Lazarus, X.S. Wang, T. Denney, B.H. Incorporated, Next Generation Downhole Big Data Platform for Dynamic Data-driven Well and Reservoir Management, (2017).
[39] M.R. Brulé, I.B.M.S. Group, Big Data in E&P: Real-time Adaptive Analytics and Data-flow Architecture, (2013), pp. 5–7.
[40] S.A. Haghighat, S.D. Mohaghegh, V. Gholami, A. Shahkarami, D. Moreno, W. Virginia, Using Big Data and Smart Field Technology for Detecting Leakage in a CO2 Storage Project, (2013), pp. 1–7.
[41] A.S. Popa, E. Grijalva, S. Cassidy, J. Medel, A. Cover, C. North, et al., SPE-174912-MS Intelligent Use of Big Data for Heavy Oil Reservoir Management, (2015).
[42] A. Lin, Principles of Big Data Algorithms and Application for Unconventional Oil Introduction: Insufficient Resources, (ISR) Computing and BD, 2014.
[43] C. Chelmis, J. Zhao, V. Sorathia, S. Agarwal, V. Prasanna, M. Hsieh, Semiautomatic, semantic assistance to manual curation of data in smart oil fields, SPE West. Reg. Meet, SPE, Bakersfield, CA, USA, 2012, pp. 1–18.
[44] E. Udegbe, E. Morgan, S. Srinivasan, T. Pennsylvania, SPE-187328-MS from Face Detection to Fractured Reservoir Characterization: Big Data Analytics for Restimulation Candidate Selection, (2017).
[45] J. Xiao, X. Sun, Big Data Analytics Drive EOR Projects, (2017), pp. 5–8.
[46] D. Seemann, M.W. Spe, S. Hasan, S. Aramco, SPE 167482 Improving Reservoir Management through Big Data Technologies, (2013), pp. 28–30.
[47] B.T. Rollins, A. Broussard, B. Cummins, A. Smiley, N. Dobbs, Continental production allocation and analysis through Big data, Unconv. Resour. Technol. Conf, Society of Petroleum Engineers, Austin, 2017, https://doi.org/10.15530/urtec-2017-2678296.
[48] N.P. Sarapulov, R.A. Khabibullin, SPE-187738-MS Application of Big Data Tools for Unstructured Data Analysis to Improve ESP Operation Efficiency, (2017).
[49] S. Gupta, L. Saputelli, F. Corporation, M. Nikolaou, Big Data Analytics Workflow to Safeguard ESP Operations in Real-time, (2016), pp. 25–27.
[50] T. Palmer, M. Turland, SPE-181216-MS Proactive Rod Pump Optimization: Leveraging Big Data to Accelerate and Improve Operations, (2016).
[51] J. Betz, J.P.T.S. Writer, Low oil prices increase value of Big Data in fracturing, J. Petrol. Technol. 67 (2015) 60–61, https://doi.org/10.2118/0415-0060-JPT.
[52] M. Ockree, K.G. Brown, J. Frantz, M. Deasy, R. Resources-appalachia, Integrating Big data analytics into development planning optimization, SPE/AAPG East. Reg. Meet, Society of Petroleum Engineers, Pittsburgh, 2018, https://doi.org/10.2118/191796-18ERM-MS.
[53] M. Von Plate, C. Ag, SPE-181037-MS Big Data Analytics for Prognostic Foresight New Dimension of Petroleum Asset Management, (2016), pp. 6–8.
[54] R. Brelsford, Repsol launches Big data, AI project at tarragona refinery, Oil Gas J. 116 (2018).
[55] A. Anagnostopoulos, Big Data Techniques for Ship Performance Study, (2018), pp. 887–893.
[56] S. Park, M. Roh, M. Oh, S. Kim, W. Lee, I. Kim, et al., Estimation model of energy efficiency operational indicator using public data based on Big data technology, 28th Int. Ocean Polar Eng. Conf., Sapporo, International Society of Offshore and Polar Engineers, 2018, pp. 894–897.
[57] M. Tarrahi, A. Shadravan, R. Llc, Advanced Big Data Analytics Improves HSE Management, (2016).
[58] M. Tarrahi, A. Shadravan, R. Llc, Intelligent HSE Big Data Analytics Platform Promotes Occupational Safety Fatal Occupational Injuries Data Base, (2016), pp. 26–28.
[59] C.B. Pettinger, Leading indicators, culture and Big Data: using your data to eliminate death, ASSE Prof. Dev. Conf. Expo, American Society of Safety Engineers, Orlando, 2014.
[60] L. Cadei, M. Montini, F. Landi, F. Porcelli, V. Michetti, E.S. Upstream, et al., Big data advanced analytics to forecast operational upsets in upstream production system, Abu Dhabi Int. Pet. Exhib. Conf, Society of Petroleum Engineers, Abu Dhabi, 2018, https://doi.org/10.2118/193190-MS.
[61] R. Beckwith, Managing Big Data: cloud computing, J Pet Technol 63 (2011) 42–45, https://doi.org/10.2118/1011-0042-JPT.
[62] S. Konovalov, R. Irons-mclean, Addressing O&G Big Data Challenges at the Remote Edge Fog Computing and Key Use Cases, (2015), pp. 3–5.
[63] D. Cameron, S. As, Big Data in Exploration and Production: Silicon Snake-oil, Magic Bullet, or Useful Tool? (2014).
[64] P. Neri, Big Data in the Digital Oilfield Requires Data Transfer Standards to Perform Industry Environment a Trend towards More Collaboration Standardization the Critical Role of Metadata, (2018), pp. 1–6.
[65] Y. Gidh, N. Deeks, L.O. Grovik, D. Johnson, J. Schey, J. Hollingsworth, Paving the Way for Big Data Analytics through Improved Data Assurance and Data Organization, (2016).
[66] A. Preveral, A. Trihoreau, N. Petit, Geographically-distributed Databases: a Big data technology for production analysis in the oil & gas industry, SPE Intell. Energy Conf. Exhib, Society of Petroleum Engineers, Utrecht, 2014, https://doi.org/10.2118/167844-MS.


Peer review under responsibility of Southwest Petroleum University.

