
energies

Article
An Automatic Classification Method of Well Testing
Plot Based on Convolutional Neural Network (CNN)
Hongyang Chu 1,2,† , Xinwei Liao 1,2 , Peng Dong 1,2,† , Zhiming Chen 1,2, *, Xiaoliang Zhao 1,2 and
Jiandong Zou 1,2
1 College of Petroleum Engineering, China University of Petroleum, Beijing 102249, China
2 State Key Laboratory of Petroleum Resources and Engineering, Beijing 102249, China
* Correspondence: zhimingchn@cup.edu.cn
† These authors contributed equally to this work.

Received: 16 June 2019; Accepted: 21 July 2019; Published: 24 July 2019 

Abstract: The precondition of well testing interpretation is to determine the appropriate well
testing model. In numerous attempts in the past, automatic classification and identification of well
testing plots have been limited to fully connected neural networks (FCNN). Compared with FCNN,
the convolutional neural network (CNN) has a better performance in the domain of image recognition.
Utilizing the newly proposed CNN, we develop a new automatic identification approach to evaluate
the type of well testing curves. The field data in tight reservoirs such as the Ordos Basin exhibit various
well test models. With those models, the corresponding well test curves are chosen as training samples.
One-hot encoding, Xavier normal initialization, regularization technique, and Adam algorithm are
combined to optimize the established model. The evaluation results show that the CNN has a better
result when the ReLU function is used. For the learning rate and dropout rate, the optimized values
are 0.005 and 0.4, respectively. Meanwhile, when the number of training samples was greater than
2000, the performance of the established CNN tended to be stable. Compared with the FCNN of
similar structure, the CNN is more suitable for classification of well testing plots. Moreover, the
practical application shows that the CNN successfully classifies 21 of the 25 cases.

Keywords: convolutional neural network; well testing; tight reservoirs; pressure derivative;
automatic classification

1. Introduction
Well testing generally has two major categories: Transient rate analysis and transient pressure
analysis. For the transient pressure analysis, its main purpose is to identify the type of target reservoir
and further quantitatively determine the reservoir properties. Muskat [1] first proposed a method of
estimating the initial reservoir pressure and parameters using a buildup test plot. Because the
compressibility of the formation fluid is difficult to study, this method can only analyze the results
qualitatively. Van Everdingen and Hurst [2] used the Laplace integral method to obtain the analytical
solution of the transient diffusion equation, which gives the mathematical theoretical basis of well
testing. On this basis, Horner et al. [3] developed a classic “semi-log” analysis method, which
can determine the permeability, skin factor, productivity index, and other parameters. These methods
make full use of the mid and late period data in well testing, but a common disadvantage is that the
early data of the well testing is ignored.
In order to make reasonable use of the early data in well testing, Ramey et al. [4] first proposed
a “plate analysis method” of the log–log type plot. Further, Gringarten et al. [5] extended this
method to various well test models such as the dual-porosity model and fractured well model, and a
combination of different parameters was used to greatly reduce the difficulty in curve classification

Energies 2019, 12, 2846; doi:10.3390/en12152846 www.mdpi.com/journal/energies



and interpretation, which enabled well testing interpretation to be widely used around the
world. Bourdet et al. [6] found that different types of reservoirs had distinct responses in the pressure
derivative curve, so the pressure derivative curve was introduced into the “plate analysis method”.
Compared to the pressure dynamic, the application of the pressure derivative curve makes the
classification of reservoir types, and the overall curve fitting, easier. Therefore, the pressure derivative
plot is the most critical part of the large-scale application of well testing interpretation methods.
Recently, with the advancement in machine learning technology and the vast datasets in the
petroleum industry, the broad prospects of machine learning technology in the petroleum industry have
gradually been proven, and it has been applied to different aspects of the petroleum industry [7–13].
Awoleke et al. [12] combined self-organizing maps, the k-means algorithm, the competitive-
learning-based network (CLN), and the feed-forward neural network (FFNN) to predict the well water
production in Barnett shale. The expected misclassification error was about 10% for CLN and the
average prediction error was between 10% and 26% for FFNN, which depended on the quality of the
training data set.
Akbilgic et al. [14] used a neural network-based model to predict the steam-to-oil ratio in oil sands
reservoirs. Porosity, permeability, oil saturation, reservoir depth, and thickness characterized by well
logging and core data were used as data sets for the models.
With deep neural networks (DNNs), Wang et al. [11] used production data from 2919 wells in
Bakken shale reservoirs to forecast well productivity. Results show that the predicted oil production of
DNNs for both six months and 18 months was acceptable and the average proppant placed per stage
was the most important factor in affecting productivity.
In numerous research studies about machine learning in the petroleum industry, Al-Kaabi and
Lee [15] firstly used a three-layer FCNN to determine the well test interpretation model. In their work,
the pressure derivative and corresponding time were entered into the FCNN with 60 input nodes.
Additionally, different well test models were exported, and the accuracy of the prediction was verified
by two field examples. This pioneering work provides meaningful guidelines for later work
on well test plot identification by neural networks. Following that, a series of researchers [16–18]
utilized a more complex network structure and data set to optimize FCNN’s recognition result of well
testing curves.
Although a large number of scholars have done some meaningful research, due to the previous
limitations of CPU performance and mathematical theory basis, there are also several shortcomings
as follows: (a) The number of training samples and input nodes in present neural network models
are relatively insufficient, which greatly restricts the generalization ability of neural network models.
(b) There is no corresponding method to overcome the over-fitting and local minimum problem, which
is the phenomenon that exists widely in the fitting process of neural network models. (c) Almost all
the current research is limited to the FCNN, and the newly proposed CNN has not been considered
in research.
Nowadays, CNN is one of the most popular methods in the field of machine learning. Compared
with FCNN, the CNN has a better performance in the domain of image recognition [19–22]. Since the
different forms of pressure derivative curves represent various reservoir types, flow regimes, and outer
boundary properties, in this paper, an automatic classification method of well testing curves is proposed
based on CNN. By summarizing the buildup test data in low permeability reservoirs, the vertically
fractured well model, dual-porosity model, and radial composite model were selected as the base
model, which were used to generate 2500 theoretical curves of five different types. To overcome the
problem of overfitting and local minimum, the regularization technique, Adam optimization algorithm,
ReLU activation function, and mini batch technique were used to optimize the established CNN.
The model input nodes were 488, which ensured that the information of the curve is completely input.
Further, we compared the training performance of CNN and FCNN. The analysis of confusion matrix
showed that the Score for CNN and FCNN on the validation set were 0.91 and 0.81 respectively, which
means that the CNN had a better prediction result than FCNN. Finally, 25 buildup test curves from the
Ordos Basin were used to verify the generalization ability of the CNN noted above.
2. Background

The Ordos Basin is the second largest sedimentary basin in China and it contains abundant oil and gas reserves. In terms of geology, the Ordos Basin is a large-scale multicycle craton basin with simple tectonics, which is made up of the Yimeng uplift, Weibei uplift, the Western margin thrust belt, the Tianhuan depression, and the Jinxi flexture belt [23,24]. This Basin is a typical low-permeability formation with an average permeability of less than 1 mD. Except for the Chang 6 reservoir with developed horizontal bedding [25,26], the horizontal stress in most areas of the basin is greater than the vertical stress, which means that the fractures generated by hydraulic fracturing are mainly vertical fractures [27–30].

3. Theory

3.1. Concept of CNN
Traditional neural networks (like FCNNs) use matrix multiplication to describe the connection between input nodes and output nodes, where each individual weight of the weight matrix describes the interaction between an input unit and an output unit. For traditional neural networks, when the number of input nodes is quite large, the number of weights also becomes very large, and the training efficiency drops drastically. To address this issue, the convolution method of reducing the number of weights is needed to reduce training costs. The two main advantages of the convolution method are weight sharing and sparse connections, which effectively improve the situation. The calculation process of the convolution method is shown in Figure 1. The filter contains the weights to be optimized, and the forward propagation of the filter is a process of calculating the output data by taking the inner product between the weights in the filter and the input data. In a CNN, the filter weights used by each convolutional layer (CONV layer) are the same. The sharing of filter weights keeps the image content unaffected by local features and reduces the number of weights. Then, the data are convolved and output through the activation function.

Figure 1. Schematic diagram of convolution in convolutional neural network (CNN) (the elements in the matrix represent the pixel values of the input data and weights).
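As a concrete illustration of the filter operation in Figure 1, the following minimal numpy sketch slides one shared one-dimensional filter over an input signal and applies a ReLU activation afterwards; the signal and kernel values are arbitrary examples, not data from this work.

```python
import numpy as np

def conv1d_valid(signal, kernel, bias=0.0):
    """Slide one shared kernel over the input and take inner products ('valid' padding)."""
    n_out = len(signal) - len(kernel) + 1
    out = np.empty(n_out)
    for i in range(n_out):
        out[i] = np.dot(signal[i:i + len(kernel)], kernel) + bias
    return out

# Arbitrary example values: one filter reused at every position (weight sharing).
signal = np.array([1.0, 2.0, 0.5, 3.0, 1.5, 0.0])
kernel = np.array([0.2, -0.1, 0.4])
feature_map = np.maximum(conv1d_valid(signal, kernel), 0.0)  # ReLU after the convolution
print(feature_map)
```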

In addition to the CONV layer, the network often uses a pooling layer, which can adjust the output structure of the layer and reduce the size of the model. With the pooling layer, the calculation speed and the robustness of the model are improved. The pooling layer usually includes a max-pooling layer and an average-pooling layer, as shown in Figure 2, which output the maximum value and the average value in the filter area, respectively. Thus, no weights exist in the filter of a pooling layer.

Figure 2. Schematic diagram of pooling layer calculation process: (a) max-pooling layer; (b) average-pooling layer (the elements in the matrix are various pixels).
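The pooling operations of Figure 2 can be sketched in a few lines of numpy; the window size and the input values below are arbitrary illustrations.

```python
import numpy as np

def pool1d(x, size, mode="max"):
    """Non-overlapping pooling: keep the max or the mean of each window (no weights involved)."""
    windows = x[: len(x) // size * size].reshape(-1, size)
    return windows.max(axis=1) if mode == "max" else windows.mean(axis=1)

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
print(pool1d(x, 2, "max"))   # [3. 4. 9. 6.]
print(pool1d(x, 2, "mean"))  # [2.  2.5 7.  4. ]
```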

To achieve different test tasks, different layers need to be connected to form a CNN. The AlexNet is a typical CNN proposed by Krizhevsky et al. [31], which has a simple model structure but an accurate image recognition rate. The AlexNet fully demonstrates the superior performance of CNN in dealing with complex data. As shown in Figure 3, the structure of AlexNet is 8 layers with weights, including 5 CONV layers and 3 fully connected layers (FC layers). Three max-pooling layers are utilized to adjust the output shape. Additionally, to reduce the dimension of curve probabilistic prediction data, a flatten method is used before the FC layer. Finally, the FC layers are used to achieve the data dimensional reduction and output the final results. In the calculation process of FC layers, the softmax function is usually chosen to calculate the probability of the data after dimension reduction. The class with the highest probability is the final result of CNN. Equation (1) gives the mathematical expression of the softmax function.

\[ \mathrm{softmax}(l) = \frac{e^{a_l}}{\sum_{k=1}^{c} e^{a_k}} \tag{1} \]

where al is the output value of the lth node of the output layer, and c is the total number of sample classes.
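A short numpy sketch of Equation (1); the logits are arbitrary example outputs for c = 5 classes.

```python
import numpy as np

def softmax(a):
    """Equation (1): exponentiate the output-layer values and normalize them to probabilities."""
    e = np.exp(a - a.max())          # subtracting the max is a standard numerical safeguard
    return e / e.sum()

logits = np.array([1.2, 0.3, -0.8, 2.1, 0.0])   # arbitrary outputs for five classes
probs = softmax(logits)
print(probs, probs.argmax())         # the class with the highest probability is the prediction
```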

Figure 3. Schematic diagram of the AlexNet structure.
3.2. Model of CNN

3.2.1. Sample Obtaining
The type curve of well testing is the log–log plot, which is based on the analysis of the time, pressure, and its derivative in the log–log coordinates. The reservoir types are determined by different shapes of the curve, and they are very critical to well testing interpretation results. Due to the non-uniqueness in interpretation results, it is difficult to quickly and accurately determine the reservoir type corresponding to a large amount of interpretation data. Automatic identification of well test curve types based on CNN can significantly reduce the workload of identification, and it provides a reliable basis for accurate parameter inversion.
Production wells in unconventional reservoirs represented by the Ordos Basin are generally hydraulically fractured, so the vertically fractured model is one of the commonly used well test interpretation models in unconventional reservoirs. At the same time, due to hydraulic fracturing, the natural fractures in the formation are activated, and the resulting considerable amounts of buildup test data of Ordos basins are characterized by a dual-porosity model. On the other hand, large-scale hydraulic fracturing significantly improves the permeability of the near-well region, which means that the radial composite model is also used as the reservoir model for well test interpretation in unconventional reservoirs. The mathematic expressions of the above well testing models are given in the Appendices B–D. With these mathematic expressions, Figure 4 shows that the typical well test curves for the above models can be roughly divided into the following categories. In the same reservoir conditions, there is no doubt that the radial composite model with mobility ratio >1 and dispersion ratio >1 in the five well test models has the greatest productivity. The reason is that this model assumes that the area around the production well has been adequately stimulated by the hydraulic fracturing, so an inner zone of high permeability is formed around the production well, which contributes to the largest productivity.
In this paper, the training set included 2725 well test curves for five well test models, and 25 field buildup test cases were used to evaluate the generalization ability of CNN. The pressure derivative-time curve data for each training sample was used for classification. There were 545 curves of each well test model type and Table 1 shows the range of corresponding parameters for five well test models.
Figure 4. The typical well test curves for models used in this work. (a) Infinite-conductivity vertically fractured model without skin factor (Model 1); (b) infinite-conductivity vertically fractured model with skin factor (Model 2); (c) dual-porosity model with pseudo-steady state (Model 3); (d) radial composite model with mobility ratio >1 and dispersion ratio >1 (Model 4); (e) radial composite model with mobility ratio <1 and dispersion ratio >1 (Model 5).

Table 1. The range of model parameters of various well test models in this paper.

Model                                    1          2          3          4          5
Wellbore storage coefficient (m3/MPa)    0–0.25     0–0.25     0–0.25     0–0.25     0–0.25
Skin factor                              0–0.05     0.05–2     0–1        0–1        0–1
Fracture half length (m)                 20–80      20–80      /          /          /
Initial pressure (MPa)                   15–35      15–35      15–35      15–35      15–35
Permeability (mD)                        0.10–50    0.10–50    0.10–50    0.10–50    0.10–50
Thickness (m)                            9.14       9.14       9.14       9.14       9.14
Porosity                                 0.10       0.10       0.10       0.10       0.10
Omega                                    /          /          0.01–0.60  /          /
Lambda                                   /          /          10−6–10−9  /          /
Mobility ratio                           /          /          /          1–20       0–1
Dispersion ratio                         /          /          /          1–20       0–1
Composite radius (m)                     /          /          /          10–200     10–200

Before training, improving, and evaluating the CNN model for well test plots, it was necessary to divide the training data sets into training set, validation set, and test set. Their quantities respectively accounted for 90.909%, 8.182%, and 0.909% of the total number of samples. The primary role of the validation set was to compare the performance of different neural network models. The test set was used to verify the generalization ability of the model based on the mine data. The validation set and test set were not involved in the training process of the network. The first time they were entered into the network was in the process of network verification. In total, 2500 of the theoretical curves were determined as training sets, and the remaining 225 curves were chosen as the validation set. Additionally, 25 field buildup test cases from the Ordos Basin were used as a test set. Figure 5 is a schematic diagram of training data partition.

Figure 5. Data partition including training set, validation set, and test set.
3.2.2. Structure of Neural Network Model
The neural network model has a strong ability of nonlinear representation, as its basic unit is a neuron. Through the design of different numbers of neurons and different layers, various mapping relations can be characterized.

Model Building of CNN
The CNN constructed in this paper was a five-layer deep network with weights, in which three layers were CONV layers and two layers were FC layers, as shown in Figure 6. There were also max-pooling layers and an average-pooling layer between the CONV layers, which were used to compress the input data and reduce the overfitting problem. Table 2 shows the number of network weights in the different layers; the total number of weights was 76,583. In order to minimize the number of weights of the CNN, the input layer of the CNN was the data points of the pressure derivative-time plot, rather than the curve picture. Since the input data points of the pressure derivative-time curve were one-dimensional data with respect to time, we used layers containing one-dimensional (1D) filters, including Conv1D and max-pooling1D. Layers containing two-dimensional (2D) filters (such as Conv2D, max-pooling2D, and average-pooling2D) were used to transform the 1D data into the 2D data needed for convolutional calculations. In the final layer, the flatten method and softmax activation function were used to output the result of the CNN.

Table 2. The layer shape and weights number of CNN.

Layer Layer Shape (Output Shape) Weights Number


Input (2,244) 0
Conv1D (38,80) 418
Max-Pooling1D (38,38) 0
Conv2D (17,17,64) 1664
Max-Pooling2D (5,5,64) 0
Conv2D (2,2,128) 73856
Average-Pooling2D (1,1,128) 0
Flatten 128 0
FC (Output) 5 645

Figure 6. Schematic diagram of CNN structure.

Model Building of FCNN


In contrast, a FCNN was established, which had a similar number of weights as CNN. The
input layer, hidden layer, and output layer for FCNN had 488, 106, and 5 neurons respectively.
Figure 7 shows that the input layer consisted of 488 nodes that accepted the 244 data points (t, dP).
Table 3 demonstrates that the FCNN had a total of 76,575 weights.

Figure 7. Schematic diagram of fully connected neural networks (FCNN) structure.



Table 3. The layer shape and weights number of FCNN.

Layer Layer Shape (Output Shape) Weights Number


Input 488 0
FC 106 75,795
FC (Output) 5 780
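The following is a rough Keras sketch of the two networks summarized in Tables 2 and 3. The kernel sizes, strides, and the 1D-to-2D reshape are assumptions (they are not reported here), so the resulting weight counts will not match 76,583 and 76,575 exactly.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn():
    """Sketch of the CNN in Table 2: three CONV layers, pooling, and a softmax FC output."""
    return keras.Sequential([
        keras.Input(shape=(244, 2)),                    # 244 (t, dP) points per curve
        layers.Conv1D(80, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Reshape((120, 80, 1)),                   # placeholder 1D -> 2D rearrangement
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(5, activation="softmax"),          # five well test model classes
    ])

def build_fcnn():
    """Sketch of the FCNN in Table 3: 488 input nodes, 106 hidden neurons, 5 outputs."""
    return keras.Sequential([
        keras.Input(shape=(488,)),
        layers.Dense(106, activation="relu"),
        layers.Dense(5, activation="softmax"),
    ])

build_cnn().summary()
build_fcnn().summary()
```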

Evaluation Results for the CNN and FCNN


During the training process of the FCNN and CNN, the maximum value of the output corresponded
to the type of curves being predicted, which was recorded as ŷ. In order to optimize the weights
in models, the cross-entropy function values of the predicted and the theoretical curve type were
calculated. As shown in Equation (2), the cross-entropy function is usually recorded as the loss function
L. When the loss function value is the smallest, the ratio of the number of accurately predicted training
samples to the total number of samples (called accuracy) is the largest, which means that the network
model has the highest performance.
\[ L(\hat{y}, y) = -\sum_{i=1}^{m} y_i \log \hat{y}_i \tag{2} \]

where yi is the type of the ith training sample, ŷi is the predicted probability of the ith training sample, and m is the
number of training samples. In order to obtain a robust and fast CNN, the newly proposed one-hot
encoding, Xavier normal initialized model, ReLU activation function, L2 regularization method, Adam
optimization algorithm, and mini batch technique were combined to further construct the CNN.
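A minimal numpy sketch of the loss in Equation (2), using arbitrary one-hot labels and predicted probabilities for two samples.

```python
import numpy as np

def cross_entropy(y_true_onehot, y_pred_probs):
    """Equation (2): summed categorical cross-entropy over the training samples."""
    eps = 1e-12                                   # avoid log(0)
    return -np.sum(y_true_onehot * np.log(y_pred_probs + eps))

# Two samples, five classes (arbitrary values): the first prediction is confident and correct,
# the second is spread out, so it contributes most of the loss.
y_true = np.array([[0, 1, 0, 0, 0],
                   [1, 0, 0, 0, 0]])
y_pred = np.array([[0.05, 0.85, 0.05, 0.03, 0.02],
                   [0.30, 0.25, 0.20, 0.15, 0.10]])
print(cross_entropy(y_true, y_pred))
```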

3.2.3. One-Hot Encoding


In the training tasks of machine learning, the variety of sources of training data led to more
complex data types. The training data can be roughly divided into type data and numerical data.
The training process of the neural network model was performed on the basis of numerical data.
Therefore, in the classification task, the type data needed to be converted into numerical data and were
further used to train the neural network model. The one-hot encoding method is a commonly used
method of encoding type data into numerical data, which encodes the type data into a binary vector
with at most one valid value. As shown in Table 4, each column represents each category in the training
sample data and the unit containing “1” represents the category to which the sample data belongs.

Table 4. The schematic diagram of one-hot encoding.

Class1 Class2 Class3 Class4 Class5


Sample1 0 1 0 0 0
Sample2 1 0 0 0 0
Sample3 0 0 1 0 0
... ... ... ... ... ...
Sample2724 0 0 1 0 0
Sample2725 0 0 0 0 1
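A small numpy sketch of the encoding in Table 4; the label values are arbitrary.

```python
import numpy as np

labels = np.array([2, 1, 3, 3, 5])        # arbitrary class labels (1..5) for a few curves
one_hot = np.eye(5)[labels - 1]           # each row has a single "1" marking its class, as in Table 4
print(one_hot)
```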

3.2.4. Determination of Model Initialization


During the training process of the neural network model, proper initialization of the weights
was essential to establish a robust neural network model. The proper initialization of the weights
will cause the weights to be distributed over a reasonable range. If the initial weight value is small,
the effective information in the backpropagation process will be ignored and the training process
of the neural network model may be invalidated. When the initial weight values are large, the weight
fluctuations in the backpropagation process will increase, which may lead to the instability or even

collapse of the model. The commonly used initialization methods of neural network model include
four categories and they are as follows: (1) Random normal method; (2) Random uniform; (3) Xavier normal; (4) Xavier uniform [32]. In this work, we compared the effects of four initialization methods on the training results. After 100 iterations of the model, the Xavier normal initialized model had the highest accuracy in the training set and the validation set (Figure 8).

Figure 8. Comparison results of CNN on training set and validation set under different initialization methods.
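A minimal sketch of the Xavier (Glorot) normal initialization compared above: weights are drawn from a zero-mean normal distribution with standard deviation sqrt(2/(fan_in + fan_out)); in Keras this corresponds to the glorot_normal initializer. The layer size used below is only an example.

```python
import numpy as np

def xavier_normal(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Draw a weight matrix from N(0, 2/(fan_in + fan_out)), i.e., Xavier normal initialization."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

w = xavier_normal(488, 106)       # e.g., an input-to-hidden weight matrix of the FCNN size
print(w.shape, round(w.std(), 4))
```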

3.2.5. Selection of Activation Functions

The activation function of the neural network has a significant impact on the prediction effect of the model. When no activation function is used (i.e., f(x) = x), the input of each node is a linear function of the output of the node in the upper layer. In this case, regardless of the number of layers in the neural network, the output is only a linear combination of the input, and the hidden layer does not work. Only when the neural network model uses a nonlinear activation function are the outputs no longer a linear combination of the inputs, and the network can approximate an arbitrary function. Table 5 shows the five commonly used activation functions (i.e., linear, tanh, sigmoid, ELU, and ReLU). As shown in Figure 9, the comparative results showed that the neural network model had a better effect when the ReLU function was used in the middle layer.

Table 5. The mathematical expressions of five commonly used activation functions.

Type      Equation
linear    \( f(x) = x \)
tanh      \( f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \)
sigmoid   \( f(x) = \frac{1}{1 + e^{-x}} \)
ELU       \( f(x) = \begin{cases} e^{x} - 1, & x < 0 \\ x, & x \geq 0 \end{cases} \)
ReLU      \( f(x) = \max(0, x) \)
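For reference, the five activation functions of Table 5 written as vectorized numpy functions (the evaluation points are arbitrary).

```python
import numpy as np

# The five activation functions of Table 5.
linear  = lambda x: x
tanh    = lambda x: (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
elu     = lambda x: np.where(x < 0, np.exp(x) - 1.0, x)
relu    = lambda x: np.maximum(0.0, x)

x = np.linspace(-2, 2, 5)
for name, f in [("linear", linear), ("tanh", tanh), ("sigmoid", sigmoid), ("ELU", elu), ("ReLU", relu)]:
    print(name, np.round(f(x), 3))
```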

Figure 9. Comparison results of CNN on training set and validation set under different activation functions.

3.2.6. Regularization Technique
The overfitting is a common problem in the training process of neural network models and it greatly reduces the generalization ability of neural network models [33]. The main reasons for the overfitting problem are insufficient training samples and a complex structure of networks. To overcome this problem, the dropout method and L2 regularization method are used to dynamically adjust the network structure, which can effectively avoid the overfitting problem. (1) In the process of forward propagation, the dropout method makes various nodes stop working with a certain probability p (called the dropout rate), so that the relative importance of each node is balanced. After the introduction of the dropout method, each node of the neural network model contributes more equally to the output results, which avoids the situation where a few high-weight nodes fully control the output results. (2) For the L2 regularization method, the sum of the squared weight values is added to the loss function, which can constrain the size of the weight values to reduce the complexity of the model. Therefore, Equation (2) is rewritten as Equation (3).

\[ L(\hat{y}, y) = -\sum_{i=1}^{m} y_i \log \hat{y}_i + \lambda \sum_{j=1}^{n} w_j^2 \tag{3} \]

where λ is the super-parameter, which is used to control the level of weight decay, and n is the amount of weights. In contrast, the results of the classification accuracy of the model with the dropout method, with the L2 regularization method, and without any regularization method are compared. It can be seen from Figure 10 that the model had the highest accuracy in the validation set when using the dropout method, and the accuracy in the validation set was close to that for the training set.
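A minimal Keras illustration of the two regularization options compared in Figure 10: an L2 weight penalty as in Equation (3) versus a dropout layer. The layer sizes, the penalty coefficient, and the dropout rate are placeholders rather than the exact settings used in this work.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

l2_model = keras.Sequential([
    keras.Input(shape=(488,)),
    layers.Dense(106, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-3)),  # adds lambda * sum(w^2) to the loss
    layers.Dense(5, activation="softmax"),
])

dropout_model = keras.Sequential([
    keras.Input(shape=(488,)),
    layers.Dense(106, activation="relu"),
    layers.Dropout(0.4),   # each hidden node is dropped with probability p during training
    layers.Dense(5, activation="softmax"),
])
```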

Figure 10. Comparison results of CNN on training set and validation set under different regularization techniques.

3.2.7. Adam Optimization Algorithm
To obtain the minimum value of the loss function of the model, the weights in the network model need to be updated at each iteration step. Among various optimization algorithms for weight updating, the Adam optimization algorithm proposed by Kingma and Ba [34] has the highest performance [35,36]. Compared to the classical gradient descent method, this method can avoid the oscillation of the loss function, and the model with the Adam optimization algorithm has a higher convergence speed. The Adam optimization algorithm updates the network model weights in the form of Equation (4).

\[ w_j^t = w_j^{t-1} - \frac{\eta^t}{\sqrt{\upsilon_j^t} + \varepsilon}\, \omega_j^t \tag{4} \]

where ηt is the learning rate at the tth time step, wjt is the network weight of the jth feature of the training sample data under the tth time step, and ε is a small constant to avoid a zero denominator. ωjt and υjt are given by Equations (5) and (6):

\[ \omega_j^t = \frac{\beta_1 \omega_j^{t-1} + (1 - \beta_1)\, g_j^t}{1 - (\beta_1)^t} \tag{5} \]

\[ \upsilon_j^t = \frac{\beta_2 \upsilon_j^{t-1} + (1 - \beta_2)\, (g_j^t)^2}{1 - (\beta_2)^t} \tag{6} \]

where β1 and β2 are the exponential decay rates for the moment estimates, and gjt is the gradient in the jth parameter under the tth time step. In this work, we used the parameters recommended by Kingma and Ba [34]: β1 = 0.9, β2 = 0.999, ωj0 = 0, υj0 = 0, and ε = 10−8.

3.2.8. Mini Batch Technique

The premise of machine learning is that it requires a huge sample size. During each iteration of the model, the optimization algorithm needs to fit the model to all training samples at once, so the requirement for CPU will be enormous. In order to reduce the requirements for CPU and improve computational efficiency, the mini batch technique was utilized, as it can randomly select a small portion of the training samples in the training set for each iteration process of the model. Meanwhile,
the random selection of training samples also made the mini batch technology effectively avoid the
neural network model falling into local minimum problems during the training process. For the mini
batch technique, the gradient gt in Equations (5) and (6) is as follows:

\[ g^t = \frac{1}{b} \sum_{k=1}^{b} g_k^t \tag{7} \]

\[ g_k^t = \frac{1}{s} \sum_{r=1}^{s} \nabla L(w^{t-1}; x_{i_r}; y_{i_r}) \tag{8} \]

\[ b = \left\lceil \frac{m}{s} \right\rceil \tag{9} \]

where b represents the number of iterative steps of the mini batch method from the t−1th time step to the tth time step, gkt is the gradient of the kth iterative step from the t−1th time step to the tth time step, wt−1 is the weights at the t−1th time step, s is the number of training samples in one mini batch, i1, ..., is are random numbers between 1 and m, xir is the pressure derivative-time curve data of the ir-th training sample, yir is the type of the ir-th training sample, and m is the total number of training samples.
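A compact numpy sketch that transcribes Equations (4)–(9): one Adam update computed from a randomly drawn mini-batch. The toy least-squares problem, the gradient function, and the batch size are assumptions for illustration only; note that, following Equations (5) and (6) literally, the bias-corrected moments are fed back into the next step.

```python
import numpy as np

def adam_minibatch_step(w, grad_fn, X, y, t, state, eta=0.005, beta1=0.9, beta2=0.999,
                        eps=1e-8, s=32, rng=np.random.default_rng(0)):
    """One weight update following Equations (4)-(6), with the gradient averaged over a
    randomly selected mini-batch of s samples as in Equations (7)-(8)."""
    idx = rng.choice(len(X), size=s, replace=False)     # random mini-batch indices i1..is
    g = grad_fn(w, X[idx], y[idx])                      # batch-averaged gradient g^t
    state["m"] = (beta1 * state["m"] + (1 - beta1) * g) / (1 - beta1 ** t)        # Eq. (5)
    state["v"] = (beta2 * state["v"] + (1 - beta2) * g ** 2) / (1 - beta2 ** t)   # Eq. (6)
    return w - eta / (np.sqrt(state["v"]) + eps) * state["m"]                     # Eq. (4)

# Hypothetical usage on a toy least-squares problem (grad_fn, X, y are illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
grad_fn = lambda w, Xb, yb: Xb.T @ (Xb @ w - yb) / len(Xb)
w, state = np.zeros(3), {"m": np.zeros(3), "v": np.zeros(3)}
for t in range(1, 2001):
    w = adam_minibatch_step(w, grad_fn, X, y, t, state)
print(np.round(w, 3))
```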

4. Results and Discussions

4.1. Comparison of Classification Performance for FCNN and CNN


We used the same techniques (including regularization technique, Adam optimization algorithm,
activation functions, and initialization methods) to optimize the FCNN model. Table 6 and Figure 11
compare the errors of different models. The error of the test set verified the performance of the well
test plot classification in the field buildup test cases and demonstrated the generalization ability of
the CNN. For the FCNN, after 100 iterations, the loss function value was 0.44, and the classification
accuracy was 91.2%. In the validation set, its accuracy was 89.8%. For the CNN, the loss function and
accuracy for the training set and validation set were 0.19, 96.6%, and 95.6%, respectively. The comparison
results of FCNN and CNN showed that the CNN had a higher accuracy when the numbers of weights
of the two models were close (76,583 weights and 76,575 weights).
As shown in Figures 12–14, the confusion matrix analysis is a method for judging the
classification performance of neural network model, which shows the accuracy result of classification.
Mukhanov et al. [37] used the confusion matrix to evaluate the classification result of the waterlogging
curve by the support vector machine technology. The confusion matrix separately calculates the
number of misclassification classes and the number of correct classification classes in the model. It can
be seen from Figures 12 and 13, and Tables 7 and 8 that the FCNN had different classification capabilities
for various types of well test curves in the training set and validation set. For the CNN, its classification
results of the 5 well test models were 0.98, 0.94, 0.97, 0.95, 0.98 for training set and 0.97, 0.93, 0.96,
0.93, 0.98 for validation set, which were generally better than the FCNN results. Figure 13 and Table 8
also showed that the FCNN forecasting results of class1, class3, and class5 in the validation set were
basically correct, but there was a large error for the curves of class2 and class4. For CNN, the stability
of forecasting result was high, and the prediction errors of various types of curves were almost the
same, indicating the reliability of CNN. Through the confusion matrix, the recall rate (Equation (11))
and precision rate (Equation (10)) of the model could be calculated. The precision rate represents
the number of correctly predicted samples in a class (TP) to all actually retrieved items (the sum of
TP and FP). The recall rate refers to the TP as a percentage of all items (the sum of TP and FN) that
should be retrieved. F1 value is the harmonic mean of the precision rate and recall rate. The average of
the F1 values for all training samples is determined as Score (Equation (13)). Table 8 summarizes the
performance of different network models in validation sets. It can be seen that the Score of the FCNN
model was 0.81 and the value of CNN was 0.91, indicating that the overall performance of CNN was better than FCNN.

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{10} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{11} \]

\[ F_1 = 2\, \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{12} \]

\[ \mathrm{Score} = \left( \frac{1}{c} \sum_{k=1}^{c} (F_1)_k \right)^2 \tag{13} \]

where TP is the number of correctly predicted samples in a class. For a certain type of training sample, FN is the difference between the number of successfully predicted training samples and the total number of training samples, FP is the difference between the number of successfully predicted training samples and the predicted number of samples for a certain type, and c is the total number of sample classes.

Finally, the classification ability of CNN was verified on 25 field buildup test data, among which 21 samples are successfully classified. Table 9 and Figure 14 show the confusion matrix of the model in the test set, and its Score was 0.69. Appendix A shows the data of the 25 field buildup test data.

Figure 11. The changes in the accuracy and loss function curve for FCNN and CNN on training set as the number of iterations increases.

Table 6. Model prediction accuracy.

                      Loss Function   Accuracy (%)
CNN train set         0.19            96.6
CNN validation set    /               95.6
FCNN train set        0.44            91.2
FCNN validation set   /               89.8
Figure 12. The confusion matrix of CNN and FCNN on training set. (a) CNN confusion matrix; (b) FCNN confusion matrix.

Figure 13. The confusion matrix of CNN and FCNN on validation set. (a) CNN confusion matrix; (b) FCNN confusion matrix.

Table 7. The evaluation result of FCNN and CNN on training set.

Model   Index           Class1   Class2   Class3   Class4   Class5   Score
FCNN    Precision (%)   92.94    86.71    88.72    91.32    96.07
        Recall (%)      92.20    82.20    91.20    92.60    97.80    0.83
        F1 Score        0.93     0.84     0.90     0.92     0.97
CNN     Precision (%)   97.25    97.00    96.64    94.34    97.83
        Recall (%)      99.00    90.60    97.80    96.60    99.00    0.93
        F1 Score        0.98     0.94     0.97     0.95     0.98

Figure 14. The confusion matrix of CNN on test set.

Table 8. The evaluation result of FCNN and CNN on validation set.

Model Index Class1 Class2 Class3 Class4 Class5 Score


Precision (%) 97.50 78.57 97.83 76.92 100
FCNN Recall (%) 86.67 73.33 100 88.89 100 0.81
F1 Score 0.92 0.76 0.99 0.82 1.00
Precision (%) 100 95.35 95.56 91.49 95.74
CNN Recall (%) 95.56 91.11 95.56 95.56 100 0.91
F1 Score 0.97 0.93 0.96 0.93 0.98

Table 9. The evaluation result of test set for CNN.

Index Model 1 Model 2 Model 3 Model 4 Model 5 Score


Recall 75 83.3 100 80 87.5
Precision 100 71.4 66.7 80 100 0.69
F1 Score 0.86 0.77 0.80 0.80 0.93
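For reference, a short numpy sketch that computes the per-class precision, recall, and F1 of Equations (10)–(12) and the Score of Equation (13) from a confusion matrix; the 5 × 5 matrix below is an arbitrary illustration, not one of the matrices in Figures 12–14.

```python
import numpy as np

def scores_from_confusion(cm):
    """Per-class precision, recall, F1 (Equations (10)-(12)) and the squared-mean Score (Equation (13))."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)      # column sums: all samples predicted as that class
    recall = tp / cm.sum(axis=1)         # row sums: all samples that truly belong to that class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, f1.mean() ** 2

cm = np.array([[3, 0, 1, 0, 0],          # rows = true class, columns = predicted class
               [0, 5, 1, 0, 0],
               [0, 0, 4, 0, 0],
               [0, 1, 0, 4, 0],
               [0, 1, 0, 0, 7]])
precision, recall, f1, score = scores_from_confusion(cm)
print(np.round(f1, 2), round(score, 2))
```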

4.2. Effects of Parameters on Classification Results


Sensitivity analysis is a key step in testing CNN performance and determining the impact of
input parameters on the predictive results [38–43]. In order to study the influence of a series of CNN
parameters on the prediction results and further optimize the established CNN, a sensitivity analysis
was conducted.

4.2.1. Effect of the Learning Rate


The loss function is a function of the network weights, and the learning rate controls how fast those weights are updated and, consequently, how the value of the loss function evolves. If the learning rate is too large, the loss function oscillates and the CNN can hardly converge. If the learning rate is too small, the weight updates are also small and the model converges slowly. Therefore, there is an optimal learning rate for each CNN. In order to determine it, we changed the learning rate from 0.0001 to 0.03 while keeping the remaining hyperparameters constant.
Figure 15 shows that the CNN had the highest accuracy on both the validation set and the training set
when the learning rate was 0.005.

Figure 15. Comparison of the training results of CNN on training set and validation set under different learning rates.
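
As a concrete illustration of the sweep described above, the following sketch trains the same network at several learning rates and records the final training and validation accuracy, which is what Figure 15 plots. It assumes a Keras-style builder build_cnn() and pre-split arrays x_train, y_train, x_val, and y_val; these names are hypothetical placeholders, not part of the original workflow.

```python
from tensorflow import keras

def evaluate_learning_rate(build_cnn, lr, x_train, y_train, x_val, y_val):
    """Train one CNN with the Adam optimizer at a given learning rate and
    return the final (training accuracy, validation accuracy)."""
    model = build_cnn()
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="categorical_crossentropy",   # labels are one-hot encoded
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        batch_size=64, epochs=50, verbose=0)
    return history.history["accuracy"][-1], history.history["val_accuracy"][-1]

# Sweep the learning rate while every other hyperparameter stays fixed.
for lr in [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.03]:
    train_acc, val_acc = evaluate_learning_rate(build_cnn, lr,
                                                x_train, y_train, x_val, y_val)
    print(f"lr={lr:<6}  train acc={train_acc:.3f}  val acc={val_acc:.3f}")
```
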
4.2.2. Effect of the Dropout Rate


The key point of the dropout method for preventing the overfitting problem is to make some nodes of the CNN stop working with a probability equal to the dropout rate. Therefore, the value of the dropout rate has a significant impact on the training effect of the CNN. As shown in Figure 16, as the dropout rate increased, the accuracy of the CNN on the training set continued to decrease. For the validation set, the accuracy increased firstly and then decreased as the dropout rate increased. Meanwhile, the accuracy difference between the training set and the validation set was large in the case of a small dropout value, indicating that the overfitting phenomenon had occurred. As the dropout rate became larger, the accuracy difference was very small, meaning that the CNN had not been well fitted to the training data. For the CNN in this paper, the optimal value of the dropout rate was 0.4.

Figure 16. Comparison of the training results of CNN on training set and validation set under different dropout rates.
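
The sketch below illustrates where the dropout rate enters a network definition, again in a Keras-style formulation; the layer sizes and the input shape are illustrative assumptions and do not reproduce the exact architecture used in this paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(dropout_rate=0.4, num_classes=5, input_shape=(64, 64, 1)):
    """Small CNN classifier whose dense block is followed by a dropout layer;
    dropout_rate is the probability that a node is dropped during training."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),   # nodes are disabled at random during training only
        layers.Dense(num_classes, activation="softmax"),
    ])

# Candidate dropout rates; 0.4 gave the best validation accuracy in this study.
for rate in [0.1, 0.2, 0.4, 0.6, 0.8]:
    model = build_cnn(dropout_rate=rate)
    print(f"dropout={rate}: {model.count_params()} parameters")
```
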
4.2.3. Effect of the Number of Training Samples


The performance of CNN is strongly controlled by the number of training samples. A small
sample size can make the training process of the CNN difficult to converge. In general, a CNN requires a
fairly large training sample size, and the drawback is that such a sample size usually increases
the computational cost. To select as few samples as possible while ensuring the best performance
of CNN, the impact of sample size on CNN learning curves was investigated. Figure 17 shows that
as the number of training samples increased, the accuracy of CNN model on the validation set also
increased. When the number of training samples was greater than 2000, the increase of the accuracy on
the validation set tended to be flat. Meanwhile, the CNN had a similar accuracy on both the training
set and the validation set. Therefore, the number of training samples was finally determined to be 2500.

Figure 17. Comparison of the training results of CNN on training set and validation set under different numbers of training samples.

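The sample-size analysis can be reproduced by training on progressively larger random subsets of the training data, as sketched below; build_cnn and the data arrays are hypothetical placeholders, as before.

```python
import numpy as np

def learning_curve(build_cnn, x_train, y_train, x_val, y_val,
                   sizes=(500, 1000, 1500, 2000, 2500)):
    """Train on growing subsets of the training data and report the accuracy
    gap between the training subset and the validation set."""
    rng = np.random.default_rng(0)
    results = []
    for n in sizes:
        idx = rng.choice(len(x_train), size=n, replace=False)  # random subset of n samples
        model = build_cnn()
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train[idx], y_train[idx], batch_size=64, epochs=50, verbose=0)
        _, train_acc = model.evaluate(x_train[idx], y_train[idx], verbose=0)
        _, val_acc = model.evaluate(x_val, y_val, verbose=0)
        results.append((n, train_acc, val_acc, train_acc - val_acc))
    return results

for n, train_acc, val_acc, gap in learning_curve(build_cnn, x_train, y_train, x_val, y_val):
    print(f"{n:>5} samples: train={train_acc:.3f}  val={val_acc:.3f}  gap={gap:.3f}")
```
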
5. Conclusions

In this paper, a CNN model was developed to classify the well testing curves. In order to obtain the best test curve classification effect, before the training we optimized the CNN model from several aspects, such as regularization technology, activation function, and optimization algorithm. The results show that the Xavier normal initialization worked best among the four initialization methods. Among the five activation functions, the CNN model had the best performance when the activation function of the convolution layer and the output layer was chosen as the ReLU function. Compared to the L2 regularization method, the dropout method had a better performance in avoiding the overfitting problem. In addition, the utilization of the mini batch technique and the Adam optimization algorithm kept the model from falling into a local minimum and sped up convergence. Further, the impacts of key parameters in the CNN model on its performance were studied. It was found that when the learning rate was 0.005, the CNN had the highest precision on both the validation set and the training set. For the dropout rate, the CNN could fit the training data well without the over-fitting phenomenon when the value was 0.4. The analysis of the training sample numbers showed that the accuracy difference between the training set and the validation set could be ignored when the number of training samples was 2500. Finally, the classification results of CNN and FCNN with similar structures on well testing curves were compared. For the validation set, the Scores of FCNN and CNN were 0.81 and 0.91, respectively, indicating that the CNN had a more robust performance in the classification of well test curves. The 25 field cases from the Ordos Basin showed that the trained CNN could successfully classify 21 cases, and the robustness of the model was further proved.
Author Contributions: Every author has contributed to this work. Conceptualization, C.H. and D.P.; Methodology,
C.H.; Software, D.P.; Validation, C.H., D.P. and L.X.; Formal Analysis, L.X.; Investigation, L.X.; Resources, L.X. and
C.Z; Data Curation, L.X.; Writing-Original Draft Preparation, C.H.; Writing-Review & Editing, C.H., D.P. and C.Z;
Visualization, Z.J.; Supervision, Z.X.; Project Administration, L.X.; Funding Acquisition, L.X., Z.X. and C.Z.
Funding: This research was funded by the Joint Funds of the National Natural Science Foundation of China
(Grant No. U1762210), National Science and Technology Major Project of China (Grant No.2017ZX05009004-005),
Science Foundation of China University of Petroleum, Beijing (Grant No. 2462018YJRC032), and Post-doctoral
Program for Innovation Talents (BX20180380).
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the
study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to
publish the results.

Nomenclature
CNN Convolutional Neural Network
FCNN Fully Connected Neural Network
CONV layer Convolutional Layer
FC layer Fully Connected Layer
1D One Dimensional
2D Two Dimensional
TP True Positive
FP False Positive
FN False Negative
w Network Weight
g Gradient
n Number of Network Weight
b Number of Iterative Step in Mini Batch Technique
s Number of Training Samples in Mini Batch Technique
c Number of Sample Classes
x Sample Matrix
y Real Sample Label Matrix
ŷ Predictive Sample Label Matrix
a Output Value of Neural Network
m Number of Training Samples
Greek
η Learning Rate
β1 , β2 Exponential Decay Rates in Adam Algorithm
ω, υ Momentum in Adam Algorithm
ε Constant
λ L2 Regularization Parameter
Subscript
i i-th Sample
j j-th Feature
k Iteration
Superscript
t t-th Time Step

Appendix A. Field Cases Used in This Work

Table A1. Parameter distribution of field cases used in this work.

Thickness (m) Porosity (%) Permeability (mD) Initial Pressure (MPa) Wellbore Storage Coefficient Skin Factor Composite Radius (m) Mobility Ratio Dispersion Ratio Fracture Half Length (m) Omega Lambda Curve Type
Case1 9.4 10.94 0.82 15.06 0.19 0.05 / / / 23.1 / / 1
Case2 9.56 13.12 0.02 13.56 0.12 −5.88 / / / 59 / / 1
Case3 13.12 9.08 7.53 14.02 2.12 0.02 / / / 112 / / 1
Case4 5.68 11.04 0.5 17.2 0.01 0.01 / / / 76 / / 1
Case5 16 11.61 0.2 14.28 0.03 0.63 / / / 11.1 / / 2
Case6 4.3 13.68 0.41 10.29 0.6 0.11 / / / 46 / / 2
Case7 13.7 12.56 0.09 26.82 0.07 0.18 / / / 16.4 / / 2
Case8 9.7 10.8 1.14 20.05 1.96 0.26 / / / 128 / / 2
Case9 13.3 11.28 0.17 12.29 0.33 0.21 / / / 56.5 / / 2
Case10 8.5 11.77 1.08 22.61 0.12 0.3 / / / 126 / / 2
Case11 9.67 10.13 1 19.13 0.01 0.02 / / / / 0.29 1.4 × 10−8 3
Case12 10.78 13.03 0.27 40.55 0.07 −0.81 / / / / 0.08 9.4 × 10−4 3
Case13 13.2 11.1 0.84 17.08 0.24 −3.77 40.6 7.36 3.61 / / / 4
Case14 5.4 12.3 0.34 13.47 0.16 −4.58 12.3 7.71 12.8 / / / 4
Case15 16.3 10.95 0.13 12.41 0.14 −3.54 31 2.32 1.4 / / / 4
Case16 11.6 13.12 0.25 16.02 0.15 −3.64 31 2 3.06 / / / 4
Case17 11.2 9.63 0.78 13.24 0.14 −2.23 13.3 9.56 8.03 / / / 4
Case18 8.2 11.25 0.25 20.87 0.15 −2.82 33.26 0.82 0 / / / 5
Case19 9.2 14.06 0.91 18.52 0.66 −1.37 52.1 0.44 0 / / / 5
Case20 27.2 13.12 0.4 24.64 0.32 −1.72 13.9 0.36 0 / / / 5
Case21 9.7 10.8 1.14 20.05 1.96 0.26 94.3 0.75 0.1 / / / 5
Case22 8.2 13.36 0.42 25.82 0.89 −3.78 59 0.51 0 / / / 5
Case23 11.2 10.31 0.1 23.21 0.26 −3.11 41 0.03 0 / / / 5
Case24 7.3 14.6 1.35 24.68 0.61 −3.56 92.2 0.72 0.01 / / / 5
Case25 7.6 12.74 0.66 23.66 1.13 −3.45 18.9 0.98 0 / / / 5

Appendix B. Infinite-Conductivity Vertically Fractured Model

At first, a series of dimensionless variables need to be defined:

$p_{wD} = \frac{kh\Delta p}{1.842 \times 10^{-3} q\mu B}$ (A1)

$t_D = \frac{3.6kt}{\phi\mu C_t L^2}$ (A2)

$C_D = \frac{0.1592C}{\phi h C_t L^2}$ (A3)

$y_D = \frac{y}{L}$ (A4)

$x_D = \frac{x}{L}$ (A5)

$r_D = \frac{r}{L}$ (A6)

where k is the permeability, ϕ is the porosity, C is the wellbore storage coefficient, L is the reference length, Ct is the compressibility, t refers to time, µ refers to the viscosity, B is the volume factor, q refers to the flux rate, p is the pressure, h is the formation thickness. After dimensionless treatment, the diffusion equation in Laplace domain can be expressed as:

$\frac{d^2 p_D}{dr_D^2} + \frac{1}{r_D}\frac{dp_D}{dr_D} = u p_D$ (A7)

where pD is the dimensionless pressure in Laplace domain, u refers to the Laplace variable, and rD is the dimensionless distance. The initial condition is:

$p_D(r_D, 0) = 0.$ (A8)

The internal boundary condition and exterior boundary respectively are:

$\lim_{r_D \to 0}\left[r_D \frac{dp_D}{dr_D}\right] = -1$ (A9)

$p_D(\infty, t_D) = 0.$ (A10)

The general solution of Equation (A7) is shown as following:

$p_D = \frac{1}{u} K_0\left(r_D\sqrt{u}\right)$ (A11)

where K0 is the first class zero order Bessel function. With pressure superposition method, the pressure solution of the infinite conductivity vertically fractured model is obtained:

$p_D = \int_{-1}^{1} \frac{1}{u} K_0\left(\sqrt{(x_D - x_i)^2 + y_D^2}\,\sqrt{u}\right) da$ (A12)
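
Equations (A11) and (A12) are expressed in the Laplace domain, so producing a type curve requires a numerical inversion. The sketch below uses the Stehfest algorithm (one common choice; the inversion scheme is not specified in this appendix) together with scipy's modified Bessel function K0 to evaluate the line-source solution of Equation (A11) at rD = 1.

```python
import numpy as np
from math import factorial
from scipy.special import k0  # modified Bessel function of the second kind, order zero

def stehfest_coefficients(n=12):
    """Stehfest weights V_i for an even number of terms n."""
    v = np.zeros(n)
    for i in range(1, n + 1):
        s = 0.0
        for k in range((i + 1) // 2, min(i, n // 2) + 1):
            s += (k ** (n // 2) * factorial(2 * k)
                  / (factorial(n // 2 - k) * factorial(k) * factorial(k - 1)
                     * factorial(i - k) * factorial(2 * k - i)))
        v[i - 1] = (-1) ** (i + n // 2) * s
    return v

def stehfest_invert(laplace_fn, t_d, n=12):
    """Numerically invert a Laplace-domain solution laplace_fn(u) at time t_d."""
    v = stehfest_coefficients(n)
    a = np.log(2.0) / t_d
    return a * sum(v[i] * laplace_fn((i + 1) * a) for i in range(n))

# Line-source solution of Equation (A11), evaluated at the wellbore (r_D = 1).
line_source = lambda u: k0(np.sqrt(u)) / u

for t_d in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(f"tD={t_d:>7.2f}  pD={stehfest_invert(line_source, t_d):.4f}")
```
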

Appendix C. Dual-Porosity Model with Pseudo-Steady State

In the dual-porosity model, the corresponding dimensionless variables are:

$p_{wD} = \frac{k_f h\Delta p}{1.842 \times 10^{-3} q\mu B}$ (A13)

$t_D = \frac{3.6kt}{(\phi C_t)_{f+m}\mu L^2}$ (A14)

$C_D = \frac{0.1592C}{(\phi C_t)_{f+m}\mu L^2}$ (A15)

$\lambda = aL^2\frac{k_m}{k_f}$ (A16)

$\omega = \frac{(\phi C_t)_f}{(\phi C_t)_f + (\phi C_t)_m}$ (A17)

where subscript f is the natural fracture system, subscript m refers to the matrix system, w is the wellbore system, λ is the interporosity flow coefficient, ω refers to the storage ratio. The diffusion equation of the pseudo-steady state in Laplace domain can be expressed as:

$\frac{d^2 p_{fD}}{dr_D^2} + \frac{1}{r_D}\frac{dp_{fD}}{dr_D} = \lambda\left(p_{fD} - p_{mD}\right) + \omega u p_{fD}$ (A18)

$(1 - \omega)u p_{mD} = \lambda\left(p_{fD} - p_{mD}\right).$ (A19)

The initial condition is:

$p_{fD}(r_D, 0) = p_{mD}(r_D, 0) = 0.$ (A20)

The boundary conditions are:

$C_D u p_{wD} - \left.\frac{dp_{fD}}{dr_D}\right|_{r_D=1} = \frac{1}{u}$ (A21)

$p_{wD} = \left[p_{fD} - S\frac{dp_{fD}}{dr_D}\right]_{r_D=1}$ (A22)

$p_{fD} = p_{mD} = 0 \quad (r_D \to \infty)$ (A23)

where S is the skin factor and CD is the dimensionless wellbore storage coefficient. Combining Equations (A18) and (A19), the general solution is determined as [44–46]:

$p_{wD} = \frac{K_0\left(\sqrt{f(u)u}\right) + S\sqrt{f(u)u}\,K_1\left(\sqrt{f(u)u}\right)}{u\left[\sqrt{f(u)u}\,K_1\left(\sqrt{f(u)u}\right) + C_D u^2 K_0\left(\sqrt{f(u)u}\right) + SC_D u^2\sqrt{f(u)u}\,K_1\left(\sqrt{f(u)u}\right)\right]}.$ (A24)

In Equation (A24),

$f(u) = \frac{\omega(1 - \omega)u + \lambda}{(1 - \omega)u + \lambda}.$ (A25)
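
Evaluating Equations (A24) and (A25) numerically only requires the modified Bessel functions K0 and K1; the result can then be inverted with the same kind of routine sketched after Appendix B. The snippet below is a sketch of that evaluation with illustrative parameter values.

```python
import numpy as np
from scipy.special import k0, k1  # modified Bessel functions K0 and K1

def f_u(u, omega, lam):
    """Pseudo-steady-state transfer function of Equation (A25)."""
    return (omega * (1.0 - omega) * u + lam) / ((1.0 - omega) * u + lam)

def pwd_laplace(u, omega, lam, cd, skin):
    """Laplace-domain wellbore pressure of Equation (A24)."""
    x = np.sqrt(f_u(u, omega, lam) * u)
    num = k0(x) + skin * x * k1(x)
    den = u * (x * k1(x) + cd * u ** 2 * k0(x) + skin * cd * u ** 2 * x * k1(x))
    return num / den

# Illustrative dual-porosity parameters: storage ratio, interporosity flow
# coefficient, dimensionless wellbore storage, and skin factor.
omega, lam, cd, skin = 0.1, 1e-6, 100.0, 2.0
for u in [1e-6, 1e-4, 1e-2, 1.0]:
    print(f"u={u:.0e}  pwD(u)={pwd_laplace(u, omega, lam, cd, skin):.4e}")
```
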

Appendix D. Radial Composite Model

For the radial composite model, the dimensionless variables are defined as follows:

$p_{irD} = \frac{k_{ir} h\Delta p}{1.842 \times 10^{-3} q\mu_{ir} B}$ (A26)

$p_{erD} = \frac{k_{er} h\Delta p}{1.842 \times 10^{-3} q\mu_{er} B}$ (A27)

$p_{wD} = \frac{k_{ir} h\Delta p_{wf}}{1.842 \times 10^{-3} q\mu_{ir} B}$ (A28)

$t_D = \frac{3.6 k_{ir} t}{\phi\mu_{ir} C_t L^2}$ (A29)

$C_D = \frac{0.159C}{\phi h C_t L^2}$ (A30)

$r_D = \frac{r}{L}$ (A31)

$r_{fD} = \frac{r_f}{L}$ (A32)

$M = \frac{(k/\mu)_{ir}}{(k/\mu)_{er}}$ (A33)

$W = \frac{(\phi C_t)_{ir}}{(\phi C_t)_{er}}$ (A34)

where pirD and perD are the dimensionless pressure in the inner and outer regions, rfD is the dimensionless radius of the interface, M is the mobility ratio, W is the dispersion ratio. The diffusion equations in Laplace domain for the inner and outer regions of the composite model can be written as:

$\frac{1}{r_D}\frac{d}{dr_D}\left(r_D\frac{dp_{irD}}{dr_D}\right) = u p_{irD}$ (A35)

$\frac{1}{r_D}\frac{d}{dr_D}\left(r_D\frac{dp_{erD}}{dr_D}\right) = \frac{W}{M} u p_{erD}.$ (A36)

Correspondingly, the inner and outer boundary conditions are:

$C_D u p_{wD} - \left.\frac{dp_{irD}}{dr_D}\right|_{r_D=1} = \frac{1}{u}$ (A37)

$p_{wD} = \left(p_{irD} - S\frac{dp_{irD}}{dr_D}\right)_{r_D=1}$ (A38)

$\left.p_{erD}\right|_{r_D\to\infty} = 0.$ (A39)

There is an interface between the inner and outer regions. For this interface, the pressure and pressure derivative meet the following requirements:

$\left.p_{irD}\right|_{r_D=r_{fD}} = \left.p_{erD}\right|_{r_D=r_{fD}}$ (A40)

$\left.\frac{dp_{irD}}{dr_D}\right|_{r_D=r_{fD}} = \frac{1}{M}\left.\frac{dp_{erD}}{dr_D}\right|_{r_D=r_{fD}}.$ (A41)

Therefore, the solution can be determined as:

$p_{irD} = A K_0\left(r_D\sqrt{u}\right) + B I_0\left(r_D\sqrt{u}\right).$ (A42)

To satisfy the conditions of the interface, the value of A and B can be obtained:

$A = q_D$ (A43)

$B = \frac{q_D M K_0\left(r_{irD}\sqrt{Mu/W}\right) K_1\left(r_{irD}\sqrt{u}\right)\sqrt{u} - q_D M K_1\left(r_{irD}\sqrt{Mu/W}\right) K_0\left(r_{irD}\sqrt{u}\right)\sqrt{Mu/W}}{I_0\left(r_{irD}\sqrt{u}\right) K_1\left(r_{irD}\sqrt{Mu/W}\right)\sqrt{Mu/W} - M I_1\left(r_{irD}\sqrt{u}\right) K_0\left(r_{irD}\sqrt{Mu/W}\right)\sqrt{u}}.$ (A44)
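
The coefficient B of Equation (A42) follows directly from Equation (A44). The sketch below evaluates it with scipy's modified Bessel functions; the parameter values are illustrative placeholders.

```python
import numpy as np
from scipy.special import i0, i1, k0, k1  # modified Bessel functions I0, I1, K0, K1

def coefficient_b(u, r_ird, m_ratio, w_ratio, q_d=1.0):
    """Coefficient B of Equation (A42), evaluated from Equation (A44)."""
    a = np.sqrt(u)                        # inner-region argument scale
    b = np.sqrt(m_ratio * u / w_ratio)    # outer-region argument scale
    num = (q_d * m_ratio * k0(r_ird * b) * k1(r_ird * a) * a
           - q_d * m_ratio * k1(r_ird * b) * k0(r_ird * a) * b)
    den = (i0(r_ird * a) * k1(r_ird * b) * b
           - m_ratio * i1(r_ird * a) * k0(r_ird * b) * a)
    return num / den

# Illustrative composite-model parameters: interface radius, mobility ratio M,
# and dispersion ratio W.
for u in [1e-4, 1e-2, 1.0]:
    print(f"u={u:.0e}  B={coefficient_b(u, r_ird=40.0, m_ratio=5.0, w_ratio=2.0):.4e}")
```
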

References
1. Muskat, M. The flow of homogeneous fluids through porous media. Soil Sci. 1938, 46, 169. [CrossRef]
2. Van Everdingen, A.F.; Hurst, W. The application of the Laplace transformation to flow problems in reservoirs.
J. Pet. Technol. 1949, 1, 305–324. [CrossRef]
3. Horner, D.R. Pressure build-up in wells. In Proceedings of the 3rd World Petroleum Congress, The Hague,
The Netherlands, 28 May–6 June 1951.
4. Ramey, H.J., Jr. Short-time well test data interpretation in the presence of skin effect and wellbore storage.
J. Pet. Technol. 1970, 22, 97–104. [CrossRef]
5. Gringarten, A.C.; Ramey, H.J., Jr.; Raghavan, R. Unsteady-state pressure distributions created by a well with
a single infinite-conductivity vertical fracture. Soc. Pet. Eng. J. 1974, 14, 347–360. [CrossRef]
6. Bourdet, D.; Ayoub, J.A.; Pirard, Y.M. Use of pressure derivative in well test interpretation. SPE Form. Eval.
1989, 4, 293–302. [CrossRef]
7. Zhou, Q.; Dilmore, R.; Kleit, A.; Wang, J.Y. Evaluating gas production performances in Marcellus using data
mining technologies. J. Nat. Gas. Sci. Eng. 2014, 20, 109–120. [CrossRef]
8. Ma, Z.; Leung, J.Y.; Zanon, S.; Dzurman, P. Practical implementation of knowledge-based approaches for
steam-assisted gravity drainage production analysis. Expert Syst. Appl. 2015, 42, 7326–7343. [CrossRef]
9. Lolon, E.; Hamidieh, K.; Weijers, L.; Mayerhofer, M.; Melcher, H.; Oduba, O. Evaluating the relationship
between well parameters and production using multivariate statistical models: A middle Bakken and three
forks case history. In Proceedings of the SPE Hydraulic Fracturing Technology Conference, The Woodlands,
TX, USA, 9–11 February 2016.
10. Wang, S.; Chen, S. Insights to fracture stimulation design in unconventional reservoirs based on machine
learning modeling. J. Pet. Sci. Eng. 2019a, 174, 682–695. [CrossRef]
11. Wang, S.; Chen, Z.; Chen, S. Applicability of deep neural networks on production forecasting in Bakken
shale reservoirs. J. Pet. Sci. Eng. 2019b, 179, 112–125. [CrossRef]
12. Awoleke, O.; Lane, R. Analysis of data from the Barnett shale using conventional statistical and virtual
intelligence techniques. SPE Reserv. Eval. Eng. 2011, 14, 544–556. [CrossRef]
13. Chu, H.; Liao, X.; Zhang, W.; Li, J.; Zou, J.; Dong, P.; Zhao, C. Applications of Artificial Neural Networks in
Gas Injection. In Proceedings of the SPE Russian Petroleum Technology Conference, Moscow, Russia, 15–17
October 2018.
14. Akbilgic, O.; Zhu, D.; Gates, I.D.; Bergerson, J.A. Prediction of steam-assisted gravity drainage steam to oil
ratio from reservoir characteristics. Energy 2015, 93, 1663–1670. [CrossRef]
15. Al-Kaabi, A.U.; Lee, W.J. Using artificial neural networks to identify the well test interpretation model
(includes associated papers 28151 and 28165). SPE Form. Eval. 1993, 8, 233–240. [CrossRef]
16. Sultan, M.A.; Al-Kaabi, A.U. Application of neural network to the determination of well-test interpretation
model for horizontal wells. In Proceedings of the SPE Asia Pacific Oil and Gas Conference and Exhibition,
Melbourne, VIC, Australia, 8–10 October 2002.
17. Kharrat, R.; Razavi, S.M. Determination of reservoir model from well test data, using an artificial neural
network. Sci. Iran. 2008, 15, 487–493.

18. AlMaraghi, A.M.; El-Banbi, A.H. Automatic Reservoir Model Identification using Artificial Neural Networks
in Pressure Transient Analysis. In Proceedings of the SPE North Africa Technical Conference and Exhibition,
Cairo, Egypt, 14–16 September 2015.
19. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12
June 2015; pp. 815–823.
20. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA,
27–30 June 2016; pp. 779–788.
21. Gatys, L.A.; Ecker, A.S.; Bethge, M. A neural algorithm of artistic style. arXiv 2015, arXiv:1508.06576.
[CrossRef]
22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
23. Liu, D.; Li, Y.; Agarwal, R.K. Numerical simulation of long-term storage of CO2 in Yanchang shale reservoir
of the Ordos basin in China. Chem. Geol. 2016, 440, 288–305. [CrossRef]
24. Chu, H.; Liao, X.; Chen, Z.; Zhao, X.; Liu, W. Estimating carbon geosequestration capacity in shales based on
multiple fractured horizontal well: A case study. J. Pet. Sci. Eng. 2019, 181, 106179. [CrossRef]
25. Meng, X. Horizontal Fracture Seepage Model and Effective Way for Development of Chang 6 Reservoir.
Ph.D. Thesis, Southwest Petroleum University, Chengdu, China, 2018.
26. Chu, H.; Liao, X.; Chen, Z.; Zhao, X.; Liu, W.; Dong, P. Transient pressure analysis of a horizontal well with
multiple, arbitrarily shaped horizontal fractures. J. Pet. Sci. Eng. 2019, 180, 631–642. [CrossRef]
27. Jingli, Y.; Xiuqin, D.; Yande, Z.; Tianyou, H.; Meijuan, C.; Jinlian, P. Characteristics of tight oil in Triassic
Yanchang formation, Ordos Basin. Pet. Explor. Dev. 2013, 40, 161–169.
28. Hua, Y.A.N.G.; Jinhua, F.; Haiqing, H.; Xianyang, L.I.U.; Zhang, Z.; Xiuqin, D.E.N.G. Formation and
distribution of large low-permeability lithologic oil regions in Huaqing, Ordos Basin. Pet. Explor. Dev. 2012,
39, 683–691.
29. Li, Y.; Song, Y.; Jiang, Z.; Yin, L.; Luo, Q.; Ge, Y.; Liu, D. Two episodes of structural fractures: Numerical
simulation of Yanchang Oilfield in the Ordos basin, northern China. Mar. Pet. Geol. 2018, 97, 223–240.
[CrossRef]
30. Guo, P.; Ren, D.; Xue, Y. Simulation of multi-period tectonic stress fields and distribution prediction of
tectonic fractures in tight gas reservoirs: A case study of the Tianhuan Depression in western Ordos Basin,
China. Mar. Pet. Geol. 2019, 109, 530–546. [CrossRef]
31. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks.
In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe,
NV, USA, 3–6 December 2012; pp. 1097–1105.
32. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks.
In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia,
Italy, 13–15 March 2010; pp. 249–256.
33. Ghaderi, A.; Shahri, A.A.; Larsson, S. An artificial neural network based model to predict spatial soil type
distribution using piezocone penetration test data (CPTu). Bull. Eng. Geol. Environ. 2018, 1–10. [CrossRef]
34. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
35. Chen, X.; Liu, S.; Sun, R.; Hong, M. On the convergence of a class of adam-type algorithms for non-convex
optimization. arXiv 2018, arXiv:1808.02941.
36. Reddi, S.J.; Kale, S.; Kumar, S. On the convergence of adam and beyond. arXiv 2019, arXiv:1904.09237.
37. Mukhanov, A.; Arturo Garcia, C.; Torres, H. Water Control Diagnostic Plot Pattern Recognition Using Support
Vector Machine. In Proceedings of the SPE Russian Petroleum Technology Conference, Moscow, Russia,
15–17 October 2018.
38. Hamdia, K.M.; Silani, M.; Zhuang, X.; He, P.; Rabczuk, T. Stochastic analysis of the fracture toughness of
polymeric nanoparticle composites using polynomial chaos expansions. Int. J. Fract. 2017, 206, 215–227.
[CrossRef]
39. Shahri, A.A.; Asheghi, R. Optimized developed artificial neural network-based models to predict the
blast-induced ground vibration. Innov. Infrastruct. Solut. 2018, 3, 34. [CrossRef]

40. Saltelli, A.; Ratto, M.; Andres, T.; Campolongo, F.; Cariboni, J.; Gatelli, D.; Saisana, G.; Tarantola, S.
Global Sensitivity Analysis: The Primer; John Wiley & Sons: Chichester, UK, 2008; ISBN 9780470725184.
41. Saltelli, A. Sensitivity analysis for importance assessment. Risk Anal. 2002, 22, 579–590. [CrossRef]
42. Vu-Bac, N.; Lahmer, T.; Zhuang, X.; Nguyen-Thoi, T.; Rabczuk, T. A software framework for probabilistic
sensitivity analysis for computationally expensive models. Adv. Eng. Soft. 2016, 100, 19–31. [CrossRef]
43. Shahri, A.A. An optimized artificial neural network structure to predict clay sensitivity in a high
landslide prone area using piezocone penetration test (CPTu) data: A case study in southwest of Sweden.
Geotech. Geol. Eng. 2016, 34, 745–758. [CrossRef]
44. Chen, Z.; Liao, X.; Sepehrnoori, K.; Yu, W. A Semianalytical Model for Pressure-Transient Analysis of
Fractured Wells in Unconventional Plays With Arbitrarily Distributed Discrete Fractures. SPE J. 2018, 23,
2041–2059. [CrossRef]
45. Chen, Z.; Liao, X.; Zhao, X.; Lyu, S.; Zhu, L. A comprehensive productivity equation for multiple fractured
vertical wells with non-linear effects under steady-state flow. J. Pet. Sci. Eng. 2017, 149, 9–24. [CrossRef]
46. Zongxiao, R.; Xiaodong, W.; Dandan, L.; Rui, R.; Wei, G.; Zhiming, C.; Zhaoguang, T. Semi-analytical model
of the transient pressure behavior of complex fracture networks in tight oil reservoirs. J. Nat. Gas Sci. Eng.
2016, 35, 497–508. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
