
SPE 133929

Transforming Data into Knowledge using Data Mining Techniques: Application in Water Production Problem Diagnosis in Oil Wells
M. Rabiei and R. Gupta, Curtin University of Technology, Y.P. Cheong and G.A. Sanchez Soto, CSIRO, Earth Science
and Resource Engineering

Copyright 2010, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Asia Pacific Oil & Gas Conference and Exhibition held in Brisbane, Queensland, Australia, 18–20 October 2010.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed
by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or
members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is
restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
Excess water production is a serious economic and environmental problem in most mature oil fields. Accurate and timely diagnosis of the water production mechanism is critical to the success of the applied treatment methodology. While many empirical techniques have traditionally been used in production data analysis, the significance of the water-oil ratio (WOR) in properly identifying the type of water production problem in oil wells has not yet been fully investigated. Data mining techniques could facilitate extracting any hidden predictive information from oil and water production data to be used in water control studies.

This paper applies a meta-learning classification technique called Logistic Model Trees (LMT) to diagnose water production mechanisms based on WOR data and static reservoir parameters. Synthetic reservoir models are built to simulate excess water production due to coning, channeling and gravity segregated flows. Various cases are then generated by varying some of the input parameters in each model. A number of key features from plots of WOR against oil recovery factor are heuristically extracted by segmenting these plots at certain points. LMT classifiers are then applied to integrate these features with reservoir parameters to build classification models for predicting the water production mechanism in different scenarios of pre- and post-water-production stages.

It is observed that a valid association between WOR data and the water production mechanism exists. Our results, with high prediction accuracy rates of 88% for the pre-production stage and more than 94% for the post-production stage, demonstrate the efficiency of the proposed LMT classifiers and the significance of WOR values in classifying excess water production problems.

Introduction
In recent years, as ever larger amounts of data are continuously measured and collected, unconventional and intelligent mathematical and soft computing techniques have become more popular in the oil and gas industry (Nikravesh 2004; Aminzadeh 2005). The complex nature of oil fields, the staggering volume and diversity of data, and the uncertainties associated with them call for more sophisticated techniques to integrate various types of data, quantify uncertainties, identify hidden patterns and extract useful information.

Data mining is one of the promising methodologies that can offer great benefits to the oil industry by extracting implicit, previously unknown and potentially useful information from huge amounts of raw data. This technique is an iterative and interactive process, which uses past and present information to discover previously unknown patterns in the data, then trains and builds models to predict future trends and behaviors (Kantardzic 2002). Classification trees are one of the most popular classification algorithms used in data mining. They are easy to use, simple to understand and interpret, and require little data preparation (Tomei 2008). The successful application of tree-based ensemble classifiers in predicting water production mechanisms was established in previous studies conducted by our group (Rabiei, Gupta et al. 2009; Rabiei, Gupta et al. 2010).

In this study we explore another relatively new addition to the tree-based classifiers called LMT, which combines the linear logistic regression technique with classification tree algorithms to produce more comprehensible tree structures with higher accuracy. The significance of our current and previous experiments in studying excess water production mechanisms is the different approach we have taken in utilizing WOR diagnostic plots in diagnosing different types
of water production problems. The deficiency of conventional WOR diagnostic plots (Chan 1995) in properly identifying the type of water production problem in oil wells has been demonstrated in Seright (1998) and Rabiei et al. (2009). Nevertheless, many oil companies still apply the WOR diagnostic plots for water production studies and problem diagnosis (Al Hasani, Khayari et al. 2008; Sanchez, Delgado et al. 2007). Although, in specific circumstances, these plots might help in distinguishing between the two more common types of excess water production problems, namely coning and channeling, they are by no means general and applicable in all conditions (Seright 1998).

In our studies, we explore plots of WOR against the oil recovery factor and use WOR feature subsets to extract predictive data points from these plots to be used in the classifier along with other reservoir parameters. Feature extraction helps in removing irrelevant and redundant information, which adversely affects the performance of the classifier.

The rest of the paper is organized as follows: we first briefly introduce the LMT technique and then explain how various simulated reservoir models were used to generate a database for training and validating LMT classification algorithms. Next we describe how feature parameters from reservoir characteristics and production data were selected and extracted and explain the process of building pre- and post-water-production classifier models. Finally, we present the results and draw some conclusions.

Theory of Logistic Model Trees (LMT)
The logistic model trees (LMT) algorithm (Landwehr, Hall et al. 2003) combines linear logistic regression and tree induction algorithms in order to overcome the disadvantages associated with the application of each of these methods separately. The linear logistic regression process is quite stable, with low variance but potentially high bias. The tree induction algorithm, on the other hand, exhibits low bias but often high variance (Landwehr, Hall et al. 2003).

LMT generates a single classification tree with binary splits on numeric attributes and logistic regression models at the leaves. The algorithm uses a stagewise fitting process to construct the logistic regression models by incrementally refining the models constructed at higher levels. The LMT tree is then pruned using the CART cross-validation based pruning algorithm.

Experiment Setup
Mathematically a classifier can be defined as $I(y) = f(x_1, x_2, \ldots, x_m)$, where $(x_1, x_2, \ldots, x_m)$ are the values of the $m$ input parameters (qualitative or quantitative) used for classification and $I(y)$ is an indicator random variable taking the possible values $\{1, 2, \ldots, r\}$, where each number corresponds to a classification category out of $r$ possible categories.

To provide the training dataset for the LMT classification algorithm, we use simulated reservoir models, each corresponding to a different water production mechanism. Reservoir input parameters in each model are then varied to produce several instances of that model to be included in the dataset.

Reservoir Simulation Models
Synthetic reservoir models (Fig. 1) were built to simulate excess water production due to coning, channeling and gravity segregated flows. Water coning occurs when the water/oil contact locally rises toward the completed interval of a well that normally produces from an oil column lying on top of an active aquifer (Seright 1998). Water channeling is common when high permeability layers or fractures allow early water breakthrough during water flooding (Seright 1998). These models were simulated using commercial reservoir simulation software (Roxar-Tempest).

Bottom water drive (water coning): A radial model with a drainage area of 160 acres. The radius of this sector model is 1490 ft with a total thickness of 300 ft, constant porosity of 0.2 and a constant permeability of 1,000 mD. The oil column is 100 ft thick with a 200 ft water column. A vertical well is perforated at the top 20% of the 100 ft oil column with a wellbore radius of 0.25 ft (Fig 1.a).

Edge water drive (water coning): A 3D Cartesian grid, with an area of 1,500 ft x 1,700 ft, constant porosity of 0.2, constant permeability of 1,000 mD, a true vertical thickness of 100 ft and a dip angle of ~5 degrees. A vertical well is perforated to a true vertical depth of 50 ft above the OWC (Fig 1.b).

Water injection (water channeling): A 3D Cartesian grid of 1,000 ft x 1,000 ft with a reservoir thickness of 100 ft and a constant porosity of 0.2. A pair of injector and producer is placed at both ends of this sector model for a direct line-drive water-flooding pattern. Three different scenarios for small and large drainage areas with different combinations of flow units are considered (Fig 1.c).

Edge water drive (water channeling): A 3D Cartesian grid similar to the water coning (edge water drive) model with a different location for the vertical well farther up dip. The strong downdip aquifer provides the energy to sweep the oil toward the updip producer. Two different scenarios with different numbers of flow units and various permeabilities are considered (Fig 1.d).

Bottom water drive with baffles in vertical direction: A 3D Cartesian model with baffles in the vertical direction. In this model spherical thin impermeable layers (800 ft in diameter) were randomly populated to act as zero transmissibility in the vertical direction. It was observed that cone forming is minimal and this model exhibits a channeling behavior (Fig 1.e).
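
As a quick consistency check on the bottom water drive model, the 1490 ft radius can be recovered from the 160 acre drainage area by treating the drainage area as a circle. The short sketch below does this arithmetic. The function name and the circular-area reading are assumptions of this sketch; the only outside fact used is the standard conversion of 43,560 square feet per acre.

```python
import math

SQFT_PER_ACRE = 43_560.0  # standard conversion: 1 acre = 43,560 ft^2


def drainage_radius_ft(area_acres: float) -> float:
    """Radius (ft) of a circle whose area equals the given drainage area."""
    area_sqft = area_acres * SQFT_PER_ACRE
    return math.sqrt(area_sqft / math.pi)


radius = drainage_radius_ft(160.0)
print(f"{radius:.1f} ft")  # within 1 ft of the 1490 ft quoted for the radial model
```

The computed value agrees with the quoted radius to within rounding, which suggests the radial model spans the full 160 acre drainage area.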

Figure 1. Simulated reservoir models, panels (a) to (e)

Dataset generation
Running each base model with different reservoir characteristics input parameters results in a new instance of that model, from which a new case is formed to be included in the dataset. We observed that some of the water injection and edge water drive models with high Kv/Kh ratio and low permeability layers showed the gravity dominated flow or water under-run problem (Bailey, Tyrie et al. 2000). These cases are labeled as ‘GravityDominated’ in our study. Those cases where the water production rate does not reach the defined critical point of WOR equal to 0.1 (or 9% water cut) in our analysis are labeled as ‘NoWater’ cases. These cases are used as the control cases for investigating the efficiency of the classifier in identifying risk-free situations. A total of n = 714 cases of the aforementioned models were generated and, for each case, WOR diagnostic plots were generated. Each generated model represents a water production mechanism and its associated reservoir and fluid properties.

Various reservoir characteristics, well conditions and fluid properties are responsible for causing a water production problem. In other words, each problem type case is described by a set of input parameters and the resulting WOR plots, which display characteristic trends of water and oil production in that model.

Static input parameters: At the first stage we use prior domain knowledge, expert intuition and the available literature to select the input parameters most plausibly relevant in causing water production problems. Typical parameters selected at this stage are vertical to horizontal permeability, API, wettability, well drainage area, aquifer strength and water injection rate (Table 1). Further, an analysis of variance (ANOVA) test was conducted to confirm that the mean of each input parameter differed significantly for at least one of the classification groups.

Table 1. Static reservoir parameters selected for inclusion in the case structure
Variable Name                                   Abbreviation
Vertical to horizontal permeability             Kv/Kh
API                                             API
Wettability                                     WET
Initial oil flow rate                           IOFR
Plateau period for the initial oil flow rate    PP
Drainage area                                   DA
Aquifer strength - Water/oil volume             AQWOV
Water injection rate                            WIR

Dynamic WOR parameters: In addition to the static reservoir input parameters, new dynamic WOR parameters are also introduced to be included in the case structure. These dynamic features are extracted from the plots of WOR versus the oil recovery factor (RF) associated with each case. Unlike conventional WOR diagnostic studies that focus on the trends of log/log plots of WOR (Chan 1995), we explore plots of WOR against the oil recovery factor (a dimensionless time, which is the ratio of cumulative oil produced to oil in place, with a maximum of unity) and extract predictive data points from these plots to be used in the classifier along with other reservoir parameters. The use of dimensionless time enables better analyses and comparisons of the WOR curves between models with a wide range of drainage areas, well operational histories, etc.

Evaluating the behavior of the WOR plots using the conventional WOR diagnostic technique is qualitative and can be biased by human judgment. In this study, we adopt a quantitative approach in utilizing WOR data. However, quantitative analysis of all the available WOR data can be a complex and tedious task. In this work, we identify segments of the WOR graph where the gradient remains constant. From these segments, we extract discrete RF values, which are meaningful to the user and represent useful characteristics of water and oil production rate with respect to the oil recovery factor (Fig 2).

Figure 2. Sample plots of WOR against oil recovery factor for different simulated reservoir models.

These points are denoted as RFWOR(0.1 to 40) (e.g. RFWOR0.1 represents the value of RF at WOR equal to 0.1). The RF values below the point of WOR = 0.1 are too small to yield any helpful information and hence are discarded. The cut-off point is at WOR equal to 40, which represents 97.5% water cut. In total, 15 new dynamic input variables were extracted in this manner. These input variables are sufficient to capture the essential trend characteristics of the WOR plots. Similar to the static parameters, the ANOVA technique was used to assess the significance of the extracted parameters in identifying the water production mechanism.

Case Structure
The selected static and dynamic parameters form the structure of each case in the dataset. Each row in the dataset matrix corresponds to an individual case of water production mechanism, defined by a vector. Each vector consists of a set of predictor parameters representing the selected static and dynamic parameters and the associated water production mechanism defined as the response parameter. The cases in the dataset are then randomly sampled to form the training and validating sets such that both training and validating datasets have the same proportion of cases from each water production mechanism class. The training set includes approximately two thirds of the cases in the dataset and the remaining cases form the validating set.

LMT Classifier Algorithms
Three different scenarios for building the classifiers are considered and for each scenario, an appropriate set of features
are used accordingly. In the first scenario, named the pre-water-production scenario, only the static reservoir parameters are used. Such a model could be applied before a well starts production to investigate the likelihood of a water production problem in the future. The next two scenarios are applicable after water breakthrough happens in the well. For the second scenario, both static reservoir features and dynamic WOR features are employed in order to investigate the interaction between these features and the resulting effect on problem diagnosis. The last scenario solely examines the significance of WOR features in diagnosing the water production mechanism without reflecting on the reservoir characteristics.

Pre-Water-Production Scenario: The first scenario considered for model generation is before the well starts producing water. In this model, the predictor parameters are the static reservoir parameters, which will be used for training different classifiers. The response parameter is the class of the water production mechanism to which the classifier allocates the case. This model is a control model in which only static parameters are incorporated without including any dynamic production data. Such a model could be used before the start of any water production to get a rough estimate of the possibility of future excess water production in the well. This model will be referred to as Model #0 throughout this study.

Post-Water-Production Scenario (1): Once the well starts producing water, the behavioral trend of the WOR vs. RF plot also starts to change. Instead of using all the extracted dynamic features in just one model, we decided to add these parameters sequentially and generate a separate model for each stage of the water production cycle. This enables us to thoroughly examine the effect of the extracted dynamic parameters in identifying the water production mechanism. It also indicates at which stage of the water production cycle one is more likely to identify the cause of water production accurately. For this purpose, a separate classifier model was implemented for each dynamic parameter, while taking into account the history of WOR trends before that specific point. This means that for each new model the next RFWOR parameter in sequence is added to the predictor parameters already used in the previous model.

• Model #1: all static variables + RFWOR0.1 + RFWOR0.5
• Model #2: all static variables + RFWOR0.1 + RFWOR0.5 + RFWOR1
• Model #3: all static variables + RFWOR0.1 + RFWOR0.5 + RFWOR1 + RFWOR2
...
• Model #14: all static variables + RFWOR0.1 + RFWOR0.5 + RFWOR1 + RFWOR2 + … + RFWOR40

Post-Water-Production Scenario (2): Similar to the previous algorithm, another set of models was also built using dynamic RFWOR parameters, except that static parameters were discarded from these models. These models will be used to demonstrate whether dynamic production data alone can be effectively used to identify water production mechanisms. If this hypothesis holds, these models can be used to diagnose the water production mechanism in situations where immediate access to the static reservoir parameters is not possible. They can also facilitate a quick evaluation of the problem at hand without having to go through all the detailed information related to the situation.

• Model #1*: RFWOR0.1 + RFWOR0.5
• Model #2*: RFWOR0.1 + RFWOR0.5 + RFWOR1
• Model #3*: RFWOR0.1 + RFWOR0.5 + RFWOR1 + RFWOR2
...
• Model #14*: RFWOR0.1 + RFWOR0.5 + RFWOR1 + RFWOR2 + … + RFWOR40

Performance evaluation techniques
The learnt patterns from the cases in the training dataset are applied to the remaining cases in the database to evaluate the efficiency of the trained classifier in classifying each case into one of the coning, channeling, gravity dominated or no-water classes. The performance of each implemented model is evaluated based on the classification accuracy and also the kappa coefficient.

Classification Accuracy: The first measure used for evaluating the performance of the proposed models is the percentage of correctly classified cases. Classification accuracy (or its complement, misclassification error) is commonly used for evaluating the performance of classifiers and prediction algorithms. A confusion matrix is a matrix whose rows represent the true classifications and whose columns represent the classifications made by the algorithm. In the confusion matrix shown in figure 3, classification accuracy is equal to the total number of correctly classified cases (P11+P22+P33+P44) divided by the total number of cases used in the study.

                                 Predicted
Actual              Channelling   Coning   Gravity Dominated   NoWater
Channelling         P11           P12      P13                 P14
Coning              P21           P22      P23                 P24
Gravity Dominated   P31           P32      P33                 P34
NoWater             P41           P42      P43                 P44
Figure 3. Confusion matrix
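
The RFWOR parameters that enter Models #1 to #14 and #1* to #14* are recovery factors read off the WOR versus RF curve at fixed WOR levels. The sketch below shows one way such values could be obtained by linear interpolation and how the nested feature sets could be assembled. The list of 15 WOR levels is inferred from the model listings and Figure 4; the interpolation rule, the function names and the use of 0.0 for a level the well never reaches are illustrative assumptions, not the authors' procedure.

```python
# WOR levels at which RF is sampled (15 levels; inferred, see lead-in).
WOR_LEVELS = [0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40]

# Abbreviations of the static parameters listed in Table 1.
STATIC_NAMES = ["Kv/Kh", "API", "WET", "IOFR", "PP", "DA", "AQWOV", "WIR"]


def rf_at_wor(rf, wor, level):
    """RF at the first point where WOR reaches `level`; 0.0 if it never does."""
    for i, value in enumerate(wor):
        if value >= level:
            if i == 0:
                return rf[0]
            # Linear interpolation between the two bracketing samples.
            frac = (level - wor[i - 1]) / (value - wor[i - 1])
            return rf[i - 1] + frac * (rf[i] - rf[i - 1])
    return 0.0  # assumption: unreached levels are coded as zero


def extract_rfwor(rf, wor):
    """Map one WOR-vs-RF history to the 15 dynamic features."""
    return {f"RFWOR{lvl:g}": rf_at_wor(rf, wor, lvl) for lvl in WOR_LEVELS}


def model_features(k, with_static=True):
    """Predictor names for Model #k (with statics) or Model #k* (without)."""
    if not 1 <= k <= 14:
        raise ValueError("k must be between 1 and 14")
    dynamic = [f"RFWOR{lvl:g}" for lvl in WOR_LEVELS[: k + 1]]
    return (STATIC_NAMES if with_static else []) + dynamic


# Made-up production history, for illustration only.
rf = [0.0, 0.1, 0.2, 0.3, 0.4]
wor = [0.0, 0.05, 0.3, 1.5, 6.0]
features = extract_rfwor(rf, wor)
print(features["RFWOR0.1"])                   # about 0.12
print(model_features(2, with_static=False))   # ['RFWOR0.1', 'RFWOR0.5', 'RFWOR1']
```

Because each model reuses the features of the previous one and appends the next WOR level, Model #14 and Model #14* both contain all 15 dynamic features.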

Kappa coefficient: Despite the wide use of the classification accuracy measure, it is known in the machine-learning community that this measure is not a perfect metric, and it has been demonstrated by Arie (2007) that a classifier’s predictions may be due to mere chance. He states that “classifiers’ accuracy should be compared after compensating for random hits” and one way to do this is the use of the kappa coefficient (Cohen 1960). Cohen’s kappa measures the agreement between two categorical variables while taking into account those successfully classified cases that might be attributed to chance alone. The kappa coefficient can range from 1.0 (perfect agreement) to -1.0 (complete disagreement). A kappa value of zero indicates no agreement above that expected by chance.

Area under the ROC curve (AUC): AUC is one of the popular methods to measure the performance of a classifier. The ROC (Receiver Operating Characteristic) curve is the plot of the true positive rate against the false positive rate. The area under the ROC curve (AUC) measures the probability that the classifier output for a randomly chosen positive example is greater than the classifier output for a randomly chosen negative example. A higher AUC indicates better classification performance.

Results and Discussions
The cases in the dataset are randomly sampled to form the training and validating sets such that both training and validating datasets have the same proportion of cases from each water production mechanism class. The training set includes approximately two thirds of the cases in the dataset and the remaining cases form the validating set. The learnt patterns from the cases in the training dataset are then applied to the validating dataset to evaluate the efficiency of the trained classifier in classifying each case into one of the ‘Coning’, ‘Channeling’, ‘GravityDominated’ or ‘NoWater’ categories. The tree structure of one of the trained classifiers (Model#14*) is shown in figure 4.
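
The classification accuracy and kappa coefficient described above can both be computed from a confusion matrix laid out as in Figure 3 (rows are actual classes, columns are predicted classes). The following is a minimal sketch. The function names and the small 4 x 4 matrix are made up for illustration and are not results from this study.

```python
def accuracy(cm):
    """Fraction of cases on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]                        # actual counts
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted counts
    # Agreement expected if predictions were independent of the true class.
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical confusion matrix; class order follows Figure 3:
# Channelling, Coning, Gravity Dominated, NoWater.
example = [
    [18, 1, 1, 0],
    [2, 17, 1, 0],
    [1, 1, 8, 0],
    [0, 0, 0, 10],
]
print(f"accuracy = {accuracy(example):.3f}, kappa = {cohen_kappa(example):.3f}")
```

In this toy example the accuracy is 53/60 while kappa is lower (2180/2600), because part of the observed agreement would be expected from the class proportions alone.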

RFWOR4 <= 0: LM_1: (76)
RFWOR4 > 0
| RFWOR0.1 <= 0.067503: LM_2: (51)
| RFWOR0.1 > 0.067503
| | RFWOR0.1 <= 0.252044
| | | RFWOR20 <= 0.452907: LM_3: (34)
| | | RFWOR20 > 0.452907: LM_4: (53)
| | RFWOR0.1 > 0.252044: LM_5: (89)

LMT_1
  Channelling: -6.13 + [RFWOR0.1] * 9.84 + [RFWOR20] * -1.13 + [RFWOR30] * -3.5
  Coning: -11.15 + [RFWOR0.1] * -14.88 + [RFWOR40] * 11.88
  GravityDominated: -11.83 + [RFWOR0.5] * 8.73 + [RFWOR2] * -2.56 + [RFWOR20] * 4.71
  NoWater: 15.96 + [RFWOR0.1] * 0.01 + [RFWOR40] * -41.21

LMT_2
  Channelling: 5.73 + [RFWOR0.1] * 100.96 + [RFWOR0.5] * -0.27 + [RFWOR2] * 0.9 + [RFWOR4] * 0.51 + [RFWOR20] * -1.13 + [RFWOR30] * -5.85 + [RFWOR40] * -23.54
  Coning: -2.1 + [RFWOR0.1] * -136.55 + [RFWOR2] * 0.52 + [RFWOR5] * 17.72 + [RFWOR40] * 14
  GravityDominated: -10.46 + [RFWOR0.5] * 11.15 + [RFWOR2] * -6.03 + [RFWOR5] * -29.55 + [RFWOR20] * 21.72 + [RFWOR30] * 16.49
  NoWater: -6.66 + [RFWOR40] * -40.88

LMT_3
  Channelling: 497.76 + [RFWOR0.1] * -37.09 + [RFWOR0.5] * -1781.16 + [RFWOR2] * 10.79 + [RFWOR4] * 0.51 + [RFWOR7] * -4.38 + [RFWOR10] * -12.57 + [RFWOR20] * -7.47 + [RFWOR30] * -5.85 + [RFWOR40] * 7.53
  Coning: -14.71 + [RFWOR0.1] * -15.09 + [RFWOR1] * -0.02 + [RFWOR2] * 15.4 + [RFWOR3] * -9.43 + [RFWOR4] * -7.36 + [RFWOR7] * -0.02 + [RFWOR10] * 1.8 + [RFWOR20] * 4.06 + [RFWOR30] * 1.43 + [RFWOR40] * 13.74
  GravityDominated: -511.12 + [RFWOR0.1] * 67.44 + [RFWOR0.5] * 1845.21 + [RFWOR2] * -26 + [RFWOR7] * 2.41 + [RFWOR8] * 1.65 + [RFWOR9] * 4.46 + [RFWOR10] * 6.72 + [RFWOR20] * 10.34 + [RFWOR40] * -10.54
  NoWater: -21.66 + [RFWOR2] * 0 + [RFWOR40] * -40.88

LMT_4
  Channelling: 5.51 + [RFWOR0.1] * 10.7 + [RFWOR0.5] * -0.27 + [RFWOR1] * -2.6 + [RFWOR2] * 18.82 + [RFWOR3] * 2.67 + [RFWOR4] * 0.51 + [RFWOR7] * -4.38 + [RFWOR10] * -25.99 + [RFWOR20] * -7.47 + [RFWOR30] * -5.85 + [RFWOR40] * 14.59
  Coning: -1.61 + [RFWOR0.1] * -16.89 + [RFWOR2] * 16.66 + [RFWOR3] * -12.92 + [RFWOR4] * -7.36 + [RFWOR10] * 6.8 + [RFWOR20] * 6.21 + [RFWOR30] * 1.43 + [RFWOR40] * 0.17
  GravityDominated: -5.19 + [RFWOR0.1] * -6.67 + [RFWOR0.5] * 20.56 + [RFWOR2] * -35.58 + [RFWOR7] * 2.41 + [RFWOR8] * 1.65 + [RFWOR9] * 4.46 + [RFWOR10] * 18.09 + [RFWOR20] * 10.34 + [RFWOR40] * -10.54
  NoWater: -21.66 + [RFWOR2] * 0 + [RFWOR40] * -40.88

LMT_5
  Channelling: 1.96 + [RFWOR0.1] * 15.76 + [RFWOR0.5] * -0.27 + [RFWOR1] * -10.26 + [RFWOR2] * 0.9 + [RFWOR4] * 0.51 + [RFWOR7] * -7.12 + [RFWOR10] * -5.97 + [RFWOR20] * -1.13 + [RFWOR30] * -5.85 + [RFWOR40] * 16.85
  Coning: -22.18 + [RFWOR0.1] * -19.33 + [RFWOR2] * 6.04 + [RFWOR3] * -1.19 + [RFWOR4] * -3.9 + [RFWOR5] * -3.3 + [RFWOR6] * -6.54 + [RFWOR20] * 38.44 + [RFWOR30] * 1.43 + [RFWOR40] * 14
  GravityDominated: 4.14 + [RFWOR0.1] * -2.29 + [RFWOR0.5] * 11.15 + [RFWOR1] * 2.09 + [RFWOR2] * -1.1 + [RFWOR6] * 2.72 + [RFWOR7] * 2.41 + [RFWOR8] * 1.65 + [RFWOR9] * 4.46 + [RFWOR20] * 5.71 + [RFWOR40] * -28.99
  NoWater: -14.16 + [RFWOR2] * 0 + [RFWOR40] * -40.88

Figure 4. The LMT structure of Model#14*
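
To make the structure in Figure 4 concrete, the sketch below routes a case through the splits of Model#14* and evaluates the logistic models at two of the leaves. It assumes that leaf LM_k in the tree corresponds to the block labelled LMT_k, that class probabilities follow the usual LMT softmax form p_j = exp(F_j) / sum_k exp(F_k), and that a WOR level the well never reaches is coded as 0. The coefficients for LM_1 and LM_2 are copied from Figure 4; the function names are hypothetical and the other three leaves are omitted for brevity.

```python
import math

# Leaf models from Figure 4: class -> (intercept, {feature: weight}).
LEAF_MODELS = {
    "LM_1": {
        "Channelling": (-6.13, {"RFWOR0.1": 9.84, "RFWOR20": -1.13, "RFWOR30": -3.5}),
        "Coning": (-11.15, {"RFWOR0.1": -14.88, "RFWOR40": 11.88}),
        "GravityDominated": (-11.83, {"RFWOR0.5": 8.73, "RFWOR2": -2.56, "RFWOR20": 4.71}),
        "NoWater": (15.96, {"RFWOR0.1": 0.01, "RFWOR40": -41.21}),
    },
    "LM_2": {
        "Channelling": (5.73, {"RFWOR0.1": 100.96, "RFWOR0.5": -0.27, "RFWOR2": 0.9,
                               "RFWOR4": 0.51, "RFWOR20": -1.13, "RFWOR30": -5.85,
                               "RFWOR40": -23.54}),
        "Coning": (-2.1, {"RFWOR0.1": -136.55, "RFWOR2": 0.52, "RFWOR5": 17.72,
                          "RFWOR40": 14.0}),
        "GravityDominated": (-10.46, {"RFWOR0.5": 11.15, "RFWOR2": -6.03, "RFWOR5": -29.55,
                                      "RFWOR20": 21.72, "RFWOR30": 16.49}),
        "NoWater": (-6.66, {"RFWOR40": -40.88}),
    },
}


def leaf_for(x):
    """Follow the binary splits of Figure 4 and return the leaf name."""
    if x.get("RFWOR4", 0.0) <= 0:
        return "LM_1"
    if x.get("RFWOR0.1", 0.0) <= 0.067503:
        return "LM_2"
    if x.get("RFWOR0.1", 0.0) > 0.252044:
        return "LM_5"
    return "LM_3" if x.get("RFWOR20", 0.0) <= 0.452907 else "LM_4"


def class_probabilities(x):
    """Softmax over the linear class scores of the leaf that x falls into."""
    leaf = leaf_for(x)
    if leaf not in LEAF_MODELS:
        raise NotImplementedError(f"coefficients for {leaf} are not included in this sketch")
    scores = {
        cls: bias + sum(w * x.get(name, 0.0) for name, w in weights.items())
        for cls, (bias, weights) in LEAF_MODELS[leaf].items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    norm = sum(exps.values())
    return {cls: e / norm for cls, e in exps.items()}


# A case in which no WOR level is ever reached (all features zero).
probs = class_probabilities({})
print(max(probs, key=probs.get))  # NoWater
```

Under these assumptions a case with all RFWOR features equal to zero falls into LM_1, where the large positive intercept of the NoWater function dominates the other classes.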

Classification Accuracy
The total classification performances for all the models in the three scenarios are shown in figure 5. Comparing the result of Model #0, with a total accuracy of 87%, with Models #(1-14) demonstrates the significance of the extracted WOR features in improving the classification performance. As expected, models in which both reservoir characteristics and WOR variables were used perform very well, with an accuracy of at least 94%. However, we see a slight decrease in the performance of Models #(1*-14*), in which only WOR variables were used, compared to Model #0. This decrease in accuracy establishes the important role of reservoir characteristics in water production studies.

In addition to the total accuracy, the performance of each model in classifying individual water production mechanisms can also be studied (Table 2). The results show that the “GravityDominated” cases are the most difficult and the “NoWater” cases the easiest to predict. Nevertheless, the results suggest that as more water production data becomes available, diagnosing the “GravityDominated” cases improves in both post-water-production scenarios.

Out of the 29 classifiers developed, only Model #0 performs poorly in classifying the “NoWater” cases, while it shows similar performance in predicting the other problem categories. An interesting point is that while Models #(1-14) and Models #(1*-14*) achieve comparable results in predicting the channelling problem, we see a significant decline in accuracy in predicting the coning problem using Models #(1*-14*). This demonstrates the role of reservoir characteristics in initiating and forming a water coning problem in an oil well.

Figure 5. Classification accuracy for all models

Kappa Coefficient
A more robust measure of the efficiency of the proposed LMT classifiers is the kappa coefficient. Kappa values higher than 0.8 are usually considered as very good agreement, disproving the role of chance. As shown in figure 6, the trend of any fluctuation in the kappa coefficients corresponds perfectly to the total classification accuracies presented in figure 5. Models #(1-14) all have kappa values greater than 0.9, which confirms the efficiency of the proposed classification algorithm. Models without reservoir parameters have kappa coefficients slightly lower than 0.8, but they are still in a range considered as good agreement. Model #0 also shows good agreement with a kappa of 0.8.

Figure 6. Kappa coefficient

Evaluation of AUC
The AUC measures the ability of a classifier to rank data points in order of decreasing probability of belonging to the positive class. An AUC score of 1.0 represents a perfect classifier, while random guessing produces an AUC of 0.5. Figure 7 shows a comparison of AUC scores for all the LMT models generated in this study. The AUC scores too are in complete agreement with the classification accuracy results and confirm the superiority of the models with both reservoir characteristics and WOR feature data. We also observe an improvement in AUC scores as soon as WOR data are introduced to the models compared to Model #0. This figure also indicates a reduction in AUC values in Models #(1*-14*) corresponding to the “NoWater”, “Coning”, “Channeling” and “GravityDominated” mechanisms, respectively.

Figure 7. Comparison of AUC scores of LMT models in predicting different water production mechanisms
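
The ranking interpretation of AUC used in this evaluation can be computed directly as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below does this for one class treated as positive against the rest. The function name and the toy scores are illustrative only and are not taken from the study.

```python
def auc_from_scores(scores, is_positive):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive ranked above negative
            elif sp == sn:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: classifier outputs for one mechanism, e.g. "Coning" vs. the rest.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [True, True, False, True, False]
print(auc_from_scores(scores, labels))  # 5 of the 6 pairs are ordered correctly
```

A classifier that gives every case the same score yields 0.5 under this definition, matching the random-guessing baseline quoted above.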

Table 2. Classification accuracy for individual categories
               Channelling   Coning   Gravity Dominated   No Water
Model # 0      89%           88%      67%                 87%
Model # 1      97%           100%     67%                 100%
Model # 2      95%           100%     73%                 100%
Model # 3      94%           100%     80%                 100%
Model # 4      94%           100%     73%                 100%
Model # 5      95%           95%      73%                 100%
Model # 6      95%           100%     80%                 100%
Model # 7      90%           100%     80%                 100%
Model # 8      95%           95%      73%                 100%
Model # 9      92%           100%     80%                 100%
Model # 10     97%           100%     80%                 100%
Model # 11     95%           100%     80%                 100%
Model # 12     95%           100%     80%                 100%
Model # 13     97%           100%     80%                 100%
Model # 14     92%           100%     80%                 100%
Model # 1*     95%           84%      0%                  100%
Model # 2*     95%           87%      0%                  100%
Model # 3*     90%           89%      0%                  100%
Model # 4*     94%           87%      0%                  100%
Model # 5*     94%           86%      0%                  100%
Model # 6*     97%           88%      0%                  100%
Model # 7*     94%           84%      0%                  100%
Model # 8*     94%           84%      0%                  100%
Model # 9*     98%           86%      0%                  100%
Model # 10*    95%           84%      0%                  100%
Model # 11*    95%           84%      0%                  100%
Model # 12*    95%           86%      0%                  100%
Model # 13*    89%           84%      40%                 100%
Model # 14*    73%           89%      38%                 100%

Conclusion
To the best of our knowledge, the methodology presented in this study is the first attempt at extracting knowledge-rich WOR features and applying Logistic Model Trees in water production diagnostics.

In this paper we have described how dimension reduction techniques can be applied to extract the essential and useful information from the huge amount of raw data available. The high classification accuracy results indicate the important role of identification and extraction of relevant WOR features in diagnosing the type of water production mechanism in oil wells.

The findings reported here establish that the LMT technique can still be successfully applied in situations where either reservoir information or water production data are not available. Although the accuracy rates of these models are not as high as those of the models that use both sources of information, they have a reasonable accuracy rate of at least 84%. Our results reveal that WOR monitoring could also help in predicting the type of water production mechanism before the actual problem hits the well, which means remedial actions could be taken ahead of time.

We anticipate that the complete water production diagnostic system will be made available to the end user as a stand-alone tool in future. The training data used in the current study will be updated as more real field data and different types of water production mechanisms are made available. Similarly, the classifier models will be redeveloped and updated using the new extended training dataset and will be validated through a range of techniques.

References
Al Hasani, M.A., Al Khayari, S.R., Al Maamari, R.S. and Al Wadhahi, M.A. 2008. Diagnosis of Excessive Water Production in Horizontal Wells Using WOR Plots. In International Petroleum Technology Conference, Kuala Lumpur, Malaysia.

Aminzadeh, F. 2005. Application of AI and Soft Computing for Challenging Problems in the Oil Industry. Journal of Petroleum Science and Technology, 47(1-2): 5-14.

Arie, B.D. 2007. A Lot of Randomness Is Hiding in Accuracy. Engineering Applications of Artificial Intelligence, 20(7): 875-885.

Bailey, B., Tyrie, J., Elphick, J., Kuchuk, F., Romano, C. and Roodhart, L. 2000. Water Control. Oilfield Review, Schlumberger, 12(1): 30-51.

Chan, K.S. 1995. Water Control Diagnosis Plots. Paper SPE 30775 presented at the SPE Annual Technical Conference & Exhibition, Dallas, USA.

Cohen, J. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1): 37-46.

Fulcher, J. 2008. Computational Intelligence: An Introduction. Studies in Computational Intelligence, Springer.

Kantardzic, M. 2002. Data mining: concepts, models, methods and algorithms, John Wiley.

Landwehr, N., Hall, M. and Frank, M. 2003. Logistic Model Trees. In Proceedings of the 14th European Conference on Machine Learning, Croatia, 241-252. Springer-Verlag.

Liu, H., Hussain, F., Tan, C.L. and Dash, M. 2002. Discretization. Data Mining and Knowledge Discovery, 6: 393-423.

Nikravesh, M. 2004. Soft Computing-Based Computational Intelligent for Reservoir Characterization. Expert Systems with Applications, 26(1): 19-38.

Rabiei, M., Gupta, R., Cheong, Y.P. and Sanchez Soto, G.A. 2009. Excess Water Production Diagnosis in Oil Fields Using Ensemble Classifiers. In International Conference on Computational Intelligence and Software Engineering, Wuhan, China.

Rabiei, M., Gupta, R., Cheong, Y.P. and Sanchez Soto, G.A. 2010. A Novel Approach in Extracting Predictive Information From Water-Oil Ratio For Enhanced Water Production Mechanism Diagnosis. APPEA Journal, 2010.

Sanchez, P.Z., Delgado, M.A. and Quinones, V.H. 2007. Water Control in Heavy-Oil Mature Field, Block 1AB. Paper SPE 108039 presented at SPE Latin American and Caribbean Petroleum Engineering Conference, Buenos Aires, Argentina.

Seright, R.S. 1998. Improved Methods for Water Shutoff. Final Technical Progress Report (U.S. DOE Report DOE/PC/91008-14), U.S. DOE Contract DE-AC22-94PC91008, BDM-Oklahoma Subcontract G4S60330.

Tomei, L.A. 2008. Encyclopedia of information technology curriculum integration. USA.

Webb, A.R. 2002. Statistical Pattern Recognition, John Wiley & Sons, Ltd.