
Dr. Srikanta Mishra
mishras@battelle.org
+1-614-424-5712

Application of Data Analytics in


Petroleum Engineering
Virtual Workshop Offered for
IIT(ISM) Dhanbad Dept. of Petroleum Engineering
25 September 2021
Speaker Introduction
• Technical Director, Geo-energy
Modeling & Analytics, Battelle Memorial
Institute, Columbus, Ohio, USA
• Recipient of SPE 2021 International
Award for Distinguished Membership
• Author of book “Applied Statistical
Modeling and Data Analytics”
• Instructor of multiple SPE and industry
workshops on petroleum data analytics
• SPE Distinguished Lecturer
(2018-19) on big data analytics
• Technical Lead, US Department of
Energy’s Subsurface ML Initiative
• PhD (Stanford), MS (Texas-Austin),
BTech (ISM) – all in Petroleum Engg.

Mishra - IIT(ISM) 2021 2


The Attraction / Challenge

Big Data + Analytics = Game Changer?

• Big Data = large volumes of data about subsurface, physical infrastructure and flows
• New insights about the reservoir from "data mining" can help increase operational efficiencies → actionable information

Mishra - IIT(ISM) 2021 3


Workshop Goals

• Provide a practical introduction to applied statistical


modeling and data analytics techniques
▪ Foundational concepts
▪ Linear regression and variants (for building simple input-output models)
▪ Multivariate data reduction and clustering (for finding sub-groups of data
that have similar attributes)
▪ Machine learning for regression and classification (for developing more
complex data-driven input-output models)
▪ R/RATTLE software demonstration (application of ML techniques)

• Emphasize practical aspects (not statistical rigor)

Mishra - IIT(ISM) 2021 4


Workshop Agenda

6:00 – 6:50p  Introduction, statistical foundations – regression analysis and multivariate statistics

6:50 – 7:00p  Break

7:00 – 7:50p  Machine learning basics, case studies

7:50 – 8:00p  Break

8:00 – 8:50p  Software demonstration

8:50 – 9:00p  Wrap up

Mishra - IIT(ISM) 2021 5


What is Statistics All About?

INFORMATION → Structure ~ Trends ~ Relationships

Relationships require specifying the form of the model

Mishra - IIT(ISM) 2021 6


Big Data & Analytics – What & Why?
• Big Data → Volume, Velocity, Variety
• Data Analytics → examine data, understand "what does the data say", learning and prediction → make better decisions

Data Analytics (aka Machine Learning, Knowledge Discovery, Data Mining)
helps understand hidden patterns and relationships in large, complex datasets –
the machine (algorithm) determines the form of the (black-box) model

Mishra - IIT(ISM) 2021 7


Competing on Analytics
by Tom Davenport
It’s virtually impossible to differentiate yourself from competitors
based on products alone. Your rivals sell offerings similar to yours.
And thanks to cheap offshore labor, you’re hard-pressed to beat
overseas competitors on product cost.

How to pull ahead of the pack? Become an analytics competitor:


Use sophisticated data-collection technology and analysis to wring
every last drop of value from all your business processes.

With analytics, you discern not only what your customers want but
also how much they’re willing to pay and what keeps them loyal.
You look beyond compensation costs to calculate your workforce’s
exact contribution to your bottom line. And you don’t just track
existing inventories; you also predict/prevent future inventory issues.

Harvard Business Review 84(1):98-107, January 2006

Mishra - IIT(ISM) 2021 8


Types of Analytics

Mishra - IIT(ISM) 2021 9


A Rose by Any Other Name!
Overlapping terms: Artificial Intelligence ~ Machine Learning ~ Knowledge Discovery ~ Data Mining ~ Data Analytics ~ Statistical Learning

Mishra - IIT(ISM) 2021 10


Data Analytics v/s Machine Learning
v/s Artificial Intelligence

• Data analytics → data collection and analysis to understand hidden patterns and relationships (steps A, B)
• Machine learning → building a model between predictors and response, most commonly using black-box methods (step B)
• Artificial intelligence → applying the predictive model with new data to make decisions, without human intervention (with possibility of feedback) (steps C, B, D, E)

Workflow steps in the slide diagram: A, C – Collect data; B – Infer rules (predictive model); D, E – Make decision

Mishra - IIT(ISM) 2021 11


Data Analytics Process
Exploratory Data Analysis
• Multi-dimensional data visualization
• Scatter-plot matrix, trellis plots

Unsupervised Learning
• Data reduction and clustering
• PCA, k-means, self-organizing maps

Supervised Learning
• Regression and classification
• Random forest, SVM, neural nets, kriging

Mishra - IIT(ISM) 2021 12


Supervised v/s Unsupervised Learning

Ma et al., 2018, Symmetry, 10, 734

Mishra - IIT(ISM) 2021 13


Areas of Application in O&G

• Exploration Data Mining – finding hidden patterns in large geologic datasets
• Predictive Maintenance – combining streaming data with past performance to predict potential failures
• Reservoir Management – identifying factors for improved performance
• Performance Forecasting – real-time prediction of system response (drilling, fluid injection)
• Proxy Modeling – creating fast system "emulators"

Reduce cost, improve productivity, increase efficiency

Mishra - IIT(ISM) 2021 14


Exponential Growth in ML Applications
"Machine Learning" Hits from OnePetro Database
6000

5000

4000

3000

2000

1000

0
1985 1990 1995 2000 2005 2010 2015 2020 2025

Mishra - IIT(ISM) 2021 15


Examples
• Predicting core-derived permeability from basic well-log attributes
• Predicting rock facies (originally based on core and advanced log data) from basic well log attributes
• Predicting incidence of vuggy zones in carbonate reservoirs (originally based on core and advanced log data) from basic well log attributes
• Predicting total organic carbon content in shale formations from basic well log attributes
• Predicting geomechanical properties in shale wells from basic well log attributes
• Predicting oil production from shale wells as a function of geologic and completion related parameters
• Calculating water saturation from basic well log attributes
• Predicting PVT properties from basic crude oil and reservoir characteristics
• Predicting oil production as a function of water injection rates of surrounding injectors to identify injector-producer connectivity
• Predicting bottom-hole and/or surface pressure as a function of production rate to flag potential anomalous trends and/or adverse events
• Identifying variables responsible for equipment failure (e.g., ESP) based on historical data, and calculating forward-looking failure probability using real-time data
• Analyzing drillers' logs, field reports etc. (using natural language processing) to identify incidence of adverse events and underlying causes
• Building predictive models for drilling rate of penetration using historical data, and forecasting future response using real-time data
• Building fast surrogate (proxy) models from reservoir simulation outputs for repetitive calculations (e.g., history matching, uncertainty quantification)
• Using data-driven fits to decline curves for predicting EUR

Mishra - IIT(ISM) 2021 16


Statistical Foundations

Basic Regression Analysis


Ch. 4, Mishra and Datta-Gupta (2017)

Mishra - IIT(ISM) 2021 17


Linear Regression

• Given data (x1,y1), (x2, y2)….(xn,yn)


• Postulate model: yᵢ = a + b·xᵢ + eᵢ ; e ~ N(0, σ)
• Minimize objective function: S(a,b) = Σ(yᵢ − a − b·xᵢ)²
• Residuals should be normally distributed with mean = 0 and SD = σ
• If trend in eᵢ versus xᵢ is non-random (i.e., cyclic, monotonic, etc.), then the linear model is not appropriate

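A minimal R sketch of this workflow on made-up data (the variable names net_pay and ipp are hypothetical stand-ins for the example on the next slide):

# Simple linear regression in base R (sketch with synthetic data)
set.seed(1)
net_pay <- runif(31, 5, 90)                          # predictor, e.g., net pay (ft)
ipp     <- 97 + 2.1 * net_pay + rnorm(31, sd = 45)   # response, e.g., initial potential (BOPD)

fit <- lm(ipp ~ net_pay)        # minimizes the sum of squared residuals S(a, b)
summary(fit)                    # coefficients, standard errors, t-stats, R-squared

# Residual diagnostics: look for non-random trends and departures from normality
plot(net_pay, resid(fit)); abline(h = 0, lty = 2)
qqnorm(resid(fit)); qqline(resid(fit))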
Mishra - IIT(ISM) 2021 18


Linear Regression Example
[Scatter plot: Initial Well Potential (BOPD) vs. Net Pay (ft), 0–90 ft; fitted line y = 2.0626x + 97.397, R² = 0.5385]

Mishra - IIT(ISM) 2021 19


Regression Coefficients
Regression Statistics
  Multiple R           0.733851
  R Square             0.538538   ← fraction of total variance explained by model
  Adjusted R Square    0.522625
  Standard Error       44.71329   ← estimated SD of error term in regression (~ RMSE)
  Observations         31

               Coeff    Std Error   t Stat   P-value    Lower 95%   Upper 95%
Intercept      97.397   13.844      7.035    9.75E-08   69.082      125.711
X Variable 1   2.063    0.355       5.818    2.63E-06   1.337       2.788

Coeff and Std Error = mean and SD of regression coefficients; t Stat = Coeff/SE (the bigger the better); P-value: the smaller the better (likelihood that Coeff is different from zero); 95% confidence interval ~ Coeff ± 2·SE

Mishra - IIT(ISM) 2021 20


Diagnostic Plots
[Panels: Observed v/s Predicted (Y-predicted vs. Y-observed); X Variable 1 Residual Plot (residuals vs. X); Normal Probability Plot of residuals vs. standard normal deviate (fit y = 7.2525x, R² = 0.9879)]

Mishra - IIT(ISM) 2021 21


Non-Linear Regression

• Given data (x1,y1), (x2, y2)….(xn,yn)


• Postulate model: f(yᵢ) = a + b·g(xᵢ) + eᵢ ; e ~ N(0, σ)
• Minimize objective function: S(a,b) = Σ(f(yᵢ) − a − b·g(xᵢ))²  (can use Excel SOLVER)
• f(.) and g(.) → linearizing transforms
  ▪ Logarithmic  ▪ Power
  ▪ Exponential  ▪ Non-parametric

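A brief R sketch of both routes (linearizing transform and direct non-linear fit); the power-law data are a synthetic assumption:

# Non-linear regression via a linearizing transform vs. direct fit (sketch)
set.seed(2)
x <- runif(40, 1, 100)
y <- 5 * x^0.7 * exp(rnorm(40, sd = 0.1))   # synthetic power-law data with noise

# Power model y = c * x^b is linear after a log-log transform
fit_lin <- lm(log(y) ~ log(x))
coef(fit_lin)                               # intercept = log(c), slope = b

# Direct non-linear least squares (analogous to using Excel SOLVER)
fit_nls <- nls(y ~ c * x^b, start = list(c = 1, b = 1))
summary(fit_nls)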
Mishra - IIT(ISM) 2021 22


Multiple Linear Regression

• Also called Ordinary Least Squares (OLS) regression


• Assume a linear model between response and predictors

• Solve for bk by minimizing the sum of squared residuals

• Easily interpretable, fast, well-studied statistical properties


• Can capture some non-linear behavior through
transformation of variables (e.g., quadratic model)

Mishra - IIT(ISM) 2021 23


How Many Terms in Regression?
• Seek parsimonious
balance between
goodness of fit and
model complexity
• Minimize AIC (Akaike Information Criterion)
  AIC = n·log(SSE/n) + 2p
  where
  n = number of observations
  p = number of model parameters
  SSE = residual sum of squares

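A short R sketch of comparing candidate models by AIC (synthetic data; the model forms are purely illustrative):

# Model selection with AIC in base R (sketch)
set.seed(3)
x <- runif(50)
y <- 1 + 2 * x + 0.5 * x^2 + rnorm(50, sd = 0.2)

m1 <- lm(y ~ x)               # linear
m2 <- lm(y ~ x + I(x^2))      # quadratic
m3 <- lm(y ~ poly(x, 5))      # over-parameterized polynomial

AIC(m1, m2, m3)               # prefer the model with the smallest AIC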
Mishra - IIT(ISM) 2021 24


Key Takeaways

• First, plot the data !!!!


• Model structure important – may need to experiment
with multiple models for best fit
• Understand distribution of residuals
• Examine significance of regression coefficients
• Check for parsimonious outcomes
• Partition data into sub-populations prior to regression
analysis, if necessary

Mishra - IIT(ISM) 2021 25


Plotting Multivariate Data

Mishra et al., 2014, Env. Geosci., 21(2), 59-74.

Mishra - IIT(ISM) 2021 26


Visualizing Correlations

• Rank (Spearman) correlation → robust measure for strength of association (linear/non-linear)
• Rank samples from smallest (rank = 1) to largest (rank = N)
• Helps identify both
  ▪ Redundant variables
  ▪ Relevant variables
• Also referred to as "feature selection" when dealing with large data sets – multiple approaches possible
  https://www.kdnuggets.com/2021/06/feature-selection-overview.html

[Rattle heat map: Spearman correlations for CO2_MMP.csv variables (MMP, T, C1, C2.C6, C7., MWC7., MWC5., API, Vol, V.I, Int); color scale from -1 to 1]

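A minimal R sketch of computing the rank correlation matrix; the file name CO2_MMP.csv comes from the slide, but its column contents are assumed:

# Spearman rank correlation matrix in R (sketch; assumes all columns are numeric)
mmp <- read.csv("CO2_MMP.csv")

spear <- cor(mmp, method = "spearman", use = "pairwise.complete.obs")
round(spear, 2)

# Optional heat map, loosely similar to the Rattle display
heatmap(spear, symm = TRUE, margins = c(6, 6))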
Mishra - IIT(ISM) 2021 27


Statistical Foundations

Multivariate Analysis
Ch. 5, Mishra and Datta-Gupta (2017)

Mishra - IIT(ISM) 2021 28


Principal Component Analysis

• Statistical technique for


▪ reducing dimensionality

▪ making data independent of each other

▪ without significant loss of information

• PCs formed by weighted linear combination of


original variables (rotation and projection)
• Coordinate transformation into a new set of variance
maximizing, mutually orthogonal coordinates (PCs)
• PCs can be interpreted as surrogate variables

Mishra - IIT(ISM) 2021 29


PCA – Details

PC = a1x1 + a2x2 + …..anxn

Weighting factors Original variables

• Weighting factors given by eigenvectors of correlation


matrix of original variables
• Relative importance of PCs given by eigenvalues of
correlation matrix
• Correlation between PCs and original variables given
by factor loadings

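A short R sketch of PCA on the 3-variable example used on the following slides (standardize, then rotate):

# Principal component analysis in base R (sketch, using the 3-variable example data)
X <- data.frame(
  x1 = c(2, 4, 6, 7, 5, 8, 9, 7, 10, 8, 12, 9),
  x2 = c(1, 3, 7, 6, 4, 9, 8, 10, 11, 9, 11, 13),
  x3 = c(2, 1, 5, 3, 7, 6, 7, 9, 11, 8, 10, 14)
)

pca <- prcomp(X, center = TRUE, scale. = TRUE)   # work with standardized variables
pca$rotation        # eigenvectors (weighting factors / loadings)
pca$sdev^2          # eigenvalues = variances of the PCs
summary(pca)        # proportion of variance explained by each PC
head(pca$x)         # principal component scores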
Mishra - IIT(ISM) 2021 30


Raw and Standardized Data

x1 x2 x3 dx1 dx2 dx3


2 1 2 -1.92 -1.85 -1.29
4 3 1 -1.19 -1.30 -1.55
6 7 5 -0.46 -0.19 -0.50
7 6 3 -0.09 -0.46 -1.02
5 4 7 -0.82 -1.02 0.02
8 9 6 0.27 0.37 -0.24
9 8 7 0.64 0.09 0.02
7 10 9 -0.09 0.65 0.54
10 11 11 1.01 0.93 1.07
8 9 8 0.27 0.37 0.28
12 11 10 1.74 0.93 0.81
9 13 14 0.64 1.48 1.85
mean 7.25 7.67 6.92 0.00 0.00 0.00
variance 7.48 12.97 14.63 1.00 1.00 1.00

Mishra - IIT(ISM) 2021 31


Eigenvalues

Correlation Matrix
       x1      x2      x3
x1     1.000   0.886   0.750
x2     0.886   1.000   0.889
x3     0.750   0.889   1.000

Eigenvalues
λ1 = 2.685
λ2 = 0.251
λ3 = 0.065

• Eigenvalues are the solution of |C − λI| = 0, where C = correlation matrix, I = identity matrix, λ = vector of eigenvalues
• An N (obs) x P (var) dataset has P eigenvalues
• Eigenvalue = variance of corresponding PC
• Σλᵢ = 3 = P (trace of C)

Mishra - IIT(ISM) 2021


32
Selection of Key PCs
• Based on relative magnitude of eigenvalues
• Each eigenvalue represents fraction of total variance explained by corresponding PC
• Criteria for selecting key PCs
  ▪ Scree plot (keep all PCs above "floor" level)
  ▪ Kaiser criterion (keep all PCs with eigenvalue > 1)
  ▪ Variance threshold (keep all PCs explaining ~90% variance)

[Scree plot: eigenvalues of correlation matrix (active variables only) vs. eigenvalue number; variance explained per PC = 43.38%, 23.12%, 13.92%, 10.27%, 6.30%, 3.01%, ...]

Mishra - IIT(ISM) 2021 33


Eigenvectors

• Eigenvectors are the solution of (C − λᵢI)uᵢ = 0, where C = correlation matrix, I = identity matrix, λᵢ = eigenvalue for the i-th PC, uᵢ = eigenvector for λᵢ

Eigenvalues        Eigenvectors
λ1 = 2.685          u1      u2      u3
λ2 = 0.251          0.567   0.711   0.416
λ3 = 0.065          0.597  -0.007  -0.802
                    0.567  -0.703   0.429

• Eigenvectors are coefficients of variables in the linear equations defining PCs
• Also define rotation from original variable to PC space

Mishra - IIT(ISM) 2021


34
Principal Components

dx1 dx2 dx3 pc1 pc2 pc3


-1.92 -1.85 -1.29 -2.92 -0.45 0.13
-1.19 -1.30 -1.55 -2.33 0.25 -0.12
-0.46 -0.19 -0.50 -0.65 0.03 -0.26
-0.09 -0.46 -1.02 -0.91 0.66 -0.11
-0.82 -1.02 0.02 -1.06 -0.59 0.48
0.27 0.37 -0.24 0.24 0.36 -0.29
0.64 0.09 0.02 0.43 0.44 0.20
-0.09 0.65 0.54 0.64 -0.45 -0.32
1.01 0.93 1.07 1.73 -0.04 0.13
0.27 0.37 0.28 0.54 -0.01 -0.06
1.74 0.93 0.81 1.99 0.66 0.33
0.64 1.48 1.85 2.30 -0.86 -0.13
mean 0.00 0.00 0.00 0.00 0.00 0.00
variance 1.00 1.00 1.00 2.685 0.250 0.065

λ1 = 2.685 (variance of pc1)

Example (row 6): PC1 = 0.27×0.567 + 0.37×0.597 − 0.24×0.567 = 0.24

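A small R check tying the score calculation to the earlier prcomp sketch (eigenvector signs may be flipped relative to the slide, which is immaterial):

# PC scores = standardized data multiplied by the eigenvector matrix (sketch)
Z <- scale(X)                        # standardized data (dx1, dx2, dx3)
scores <- Z %*% pca$rotation         # same as pca$x (up to sign)
round(scores[6, ], 2)                # row 6: PC1 ~ 0.24 as in the worked example
apply(scores, 2, var)                # variances reproduce the eigenvalues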
Mishra - IIT(ISM) 2021 35


Example – PCA (Salt Creek Data)
[Bar chart: fraction of total variance explained by the first five PCs = 0.486, 0.272, 0.116, 0.098, 0.028]

Mishra - URTeC 2019


36
Cluster Analysis

• The goal is to group objects that are similar based on some


measured characteristics.
• Unsupervised classification since the operation is not guided
by a priori hypothesis or external models

Mishra - AAPG ES 2019 37


k-Means Clustering

• Modifies an initial classification by moving objects


from one group to another
• Requires a measure of distance between each pair
of points (x, y)
o Usually Euclidean distance

• Requires specifying number of clusters in advance


• Normalize variables for better, more stable results
• Needs an initial classification of the data (i.e., use
hierarchical approach for initialization)

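A minimal R sketch of k-means on a built-in dataset (the iris measurements stand in for any numeric well or reservoir attributes):

# k-means clustering in base R (sketch)
set.seed(4)
dat <- scale(iris[, 1:4])                     # normalize variables first

km <- kmeans(dat, centers = 3, nstart = 25)   # number of clusters chosen in advance
table(km$cluster)                             # cluster sizes
km$centers                                    # cluster centroids (standardized units)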
Mishra - URTeC 2019


38
Example – Clustering (Salt Creek Data)

Mishra - URTeC 2019


39
Hierarchical Clustering

• Start with each observation in


separate groups (i.e., k = n)

• Clusters are progressively


merged until all observations are
in a single group

• At each step, the clusters chosen


for the merge are the ones that
are the least dissimilar, as defined
by some dissimilarity measure D
• Merge two clusters with smallest
dissimilarity
• Compute dissimilarity between new
cluster and all remaining clusters

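A companion R sketch for agglomerative clustering on the same stand-in data; the cut tree can also seed the k-means initialization mentioned on the previous slide:

# Hierarchical (agglomerative) clustering in base R (sketch)
dat <- scale(iris[, 1:4])
d   <- dist(dat, method = "euclidean")   # pairwise dissimilarities

hc <- hclust(d, method = "ward.D2")      # progressively merge least-dissimilar clusters
plot(hc, labels = FALSE)                 # dendrogram

groups <- cutree(hc, k = 3)              # cut the tree into k groups
table(groups)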
Mishra - IIT(ISM) 2021 40


Key Takeaways

• Different (novel) possibilities for data visualization


• Principal component analysis for reducing data
dimensionality and identifying surrogate variables

• Cluster analysis for grouping data into statistically


homogeneous populations for model building

• 1st step in determining structure within the space of independent variables, before building input-output models

Mishra - IIT(ISM) 2021 41


Mishra - IIT(ISM) 2021 42
Machine Learning

Basic Concepts
Ch. 8, Mishra and Datta-Gupta (2017)

Mishra - IIT(ISM) 2021 43


Data-Driven Modeling

• Classical statisticspostulate model between independent


(predictor) and dependent (response) variables
• Need to look beyond linear regression (and variants) for
complex multi-dimensional data sets
• Idea is to extract the model from the data without making
any assumptions regarding the underlying functional form
(supervised learning)
▪ regression problems, where response variable is continuous
(e.g., permeability)
▪ classification problems, where response variable is categorical
(e.g., rock type)

Mishra - IIT(ISM) 2021 44


Classification v/s Regression

Outputs are categorical Outputs are continuous

https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d

Mishra - IIT(ISM) 2021 45


Why ML Models and When?
• Historically, subsurface science and engineering
analyses have relied on mechanistic models
• Incorporation of causal input-output relationship
• Experienced professionals are wary of purely “black-
box” ML models that lack such understanding
• Nevertheless, the use of ML models is easy to justify when
➢ the relevant physics-based model is computation intensive
and/or immature
➢ a suitable mechanistic modeling paradigm does not exist

Mishra - IIT(ISM) 2021 46


Three Cases for Black-Box Models

• When the cost of a wrong answer is low relative to the


value of a correct answer,
➢ Proxy models in history matching

• When they produce the best results,


➢ Image libraries for recognizing well-test response patterns

• As tools to inspire and guide human inquiry,


➢ Preventive maintenance applications

Holm, 2019, Science, 367, 26-27.

Mishra - IIT(ISM) 2021 47


Key Concepts for Model Building

• Predictive modeling methods


• Model evaluation and validation
• Automatic tuning of model parameters
• Model aggregation
• Variable Importance
• Classification problems

Mishra - IIT(ISM) 2021 48


Predictive Modeling Methods
• Regression & Classification Tree → partition inputs into rectangular regions with constant values or class labels
• Random Forest → build ensemble of trees considering random subsets of observations and predictors
• Gradient Boosting Machine → build sequence of trees that address shortcomings of each previous fitted tree
• Support Vector Machine → find hyperplane maximizing separation of data using a transform of predictor space
• Artificial Neural Network → inputs mapped to outputs via hidden units using a sequence of linear or nonlinear functions
• Gaussian Process Emulation → multidimensional interpolation using trend and autocorrelation structure of data

[Schematic: example tree with splits X1 < t1, X2 < t2, X2 < t3 leading to regions R1, R2, R3, R4]

Mishra - IIT(ISM) 2021 49


Regression Model Evaluation

Model overfitting likely, if evaluated solely


against training dataset
Mishra - IIT(ISM) 2021 50
Bias and Variance
Bias  difference between expected prediction and correct value
Variance  variability in predictions (due to model complexity)

Mishra - IIT(ISM) 2021 51


Goodness-of-fit Metrics
• Average absolute error (AAE)
  AAE = (1/n) Σᵢ |yᵢ − ŷᵢ|

• Mean or root mean squared error (MSE/RMSE)

• Pseudo-R² (not bounded between 0 and 1 for general regression!)

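The metrics written as small R helper functions (a sketch; y is observed, yhat is predicted, and the fit object in the comment is hypothetical):

# Goodness-of-fit metrics (sketch)
aae       <- function(y, yhat) mean(abs(y - yhat))
rmse      <- function(y, yhat) sqrt(mean((y - yhat)^2))
pseudo_r2 <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

# Example with a fitted lm object called fit (hypothetical):
# aae(y, fitted(fit)); rmse(y, fitted(fit)); pseudo_r2(y, fitted(fit))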
Mishra - IIT(ISM) 2021 52


k-fold Cross Validation
Recommended, if an independent test dataset is not available

[Schematic: full dataset split into k folds; each fold in turn is held out for prediction while a model is trained on the remaining folds]

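A manual k-fold cross-validation sketch in R (synthetic data; in practice packages such as caret automate this loop):

# 5-fold cross-validation by hand (sketch)
set.seed(5)
dat <- data.frame(x = runif(100))
dat$y <- 2 * dat$x + rnorm(100, sd = 0.3)

k     <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))     # random fold assignment

cv_rmse <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = dat[folds != i, ])          # train on k-1 folds
  pred <- predict(fit, newdata = dat[folds == i, ])    # predict the held-out fold
  sqrt(mean((dat$y[folds == i] - pred)^2))
})
mean(cv_rmse)                                          # cross-validated RMSE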
Mishra - IIT(ISM) 2021 53


General Logistics of Model Fitting

Kuhn, M. and K. Johnson, Applied Predictive Modeling, 2013. Springer.

Mishra - IIT(ISM) 2021 54


Tuning Parameters

Method                      Tuning Parameters
Regression Tree             Tree depth, cost-complexity parameter
Random Forest               Number of randomly selected predictors
Gradient Boosting Machine   Number of iterations, tree depth, shrinkage
Support Vector Regression   Cost, sigma (radial basis function)
Neural Networks             Number of hidden units, weight decay

Mishra - IIT(ISM) 2021 55


Model Aggregation – Why?
• Model fits measured in terms of training or test error –
multiple competing models may arise!

• Aggregating over a large set of acceptable models can


provide more robust understanding and predictions
• Ensemble models (with predictions aggregated) top
performers in data science competitions
Mishra - IIT(ISM) 2021 56
Ensemble Modeling Methods
• Model aggregation strategies
▪ Simple averaging (direct average of constituent model predictions,
e.g., using arithmetic average)

▪ Weighted averaging (weighted averaging of constituent model


predictions, e.g., using inverse of RMSE)

▪ Stacking (predictions from the constituent models are used as


predictors in an aggregate model, e.g., NN training)

• Similar arguments underlie Beven’s concept of


“equifinality” in watershed hydrology modeling in the
GLUE framework (weighted averaging)
Schuetter et al., URTeC (2019)

Mishra - IIT(ISM) 2021 57


Variable Importance

• Some algorithm specific importance


metrics available (e.g., RF, GBM, NN)

• Not always consistent across models


(because of different approaches for
handling variable interaction)
• R²-loss → loss in explanatory power of the model if the variable of interest is dropped from the regression

• Simple and intuitive model-


independent measure

• Can be aggregated for “meta ranking”

Mishra - IIT(ISM) 2021 58


Variable Importance Approaches
Strategy (Notation) – Description

• Removing a variable (Remove) – Remove a variable from the model, re-train the model and compare the reduction in pseudo-R², i.e. R² loss.
• Permuting a variable (Permute) – Permute a variable's values, which breaks the relationship between the variable and the true outcome, then compare the reduction in pseudo-R² (R² loss) of the dataset with permuted values to that with true values.
• Partial Dependence Plot (PDP) – The partial dependence plot shows the marginal effect of different variables on the predicted outcome. PDPs are "flat" for less important variables, while variables whose PDP varies across a wider range of the response are more likely to be important.
• Accumulated Local Effects Plot (ALE) – Compare how the model predictions change in a small "window" of different variables. ALE plots are a faster and unbiased alternative to partial dependence plots.
• Local Interpretable Model-Agnostic Explanations (LIME) – LIME attempts to understand the model by perturbing the input of data samples and interpreting how the predictions change. Variable weights can then be extracted from a simple local model on the perturbed dataset to explain local behavior.
• SHapley Additive exPlanations (SHAP) – SHAP is a method to explain individual predictions based on the game-theoretically optimal Shapley values. A prediction can be explained by assuming that each feature value of the instance is a "player" in a game where the prediction is the payout. Shapley values – a method from coalitional game theory – tell us how to fairly distribute the "payout" among the features.

https://christophm.github.io/interpretable-ml-book/

Mishra - IIT(ISM) 2021 59


Classification Problems

• Goal is to use the predictors (x1, x2, …, xp) to determine


a group label for the observation
• Response y is categorical ; predictors xi can be
categorical or continuous
• As in regression, the model is trained using data
▪ Response is observed for n samples y1, y2, …, yn

▪ Predictors are observed for n samples, where the kth sample is


(x1k, x2k, …, xpk)
• Same concerns as before on evaluating model
performance (i.e., use training AND test data)

Mishra - IIT(ISM) 2021 60


Classification Model Evaluation

▪ Be aware of class imbalance, especially for rare events


− Many methods can use weights on observations, or include options for
balancing the analysis with respect to class sizes

▪ One useful visualization: ROC curve + area under curve (AUC)
  − Measures probability of separating two classes (true v/s false positive rate)
  − Shows performance (TPR vs. FPR) as the classification threshold is varied over a range
  − AUC = 1 → perfect prediction
  − AUC = 0.5 → no ability to distinguish between classes
  − AUC = 0 → perfect inverse prediction

[Plot: Receiver Operating Characteristic curve, True Positive Rate vs. False Positive Rate, each from 0 to 1]

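A base-R sketch of computing AUC and tracing ROC points from predicted probabilities (synthetic labels and scores; packages such as pROC provide this directly):

# ROC / AUC from predicted class probabilities (sketch)
set.seed(6)
y    <- rbinom(200, 1, 0.3)                 # true labels (0/1)
prob <- plogis(2 * y - 1 + rnorm(200))      # synthetic predicted probabilities

# AUC = P(random positive scores higher than random negative), ties counted as 1/2
pos <- prob[y == 1]; neg <- prob[y == 0]
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
auc

# Points on the ROC curve: TPR vs. FPR over a range of thresholds
thr <- seq(0, 1, by = 0.05)
tpr <- sapply(thr, function(t) mean(pos >= t))
fpr <- sapply(thr, function(t) mean(neg >= t))
plot(fpr, tpr, type = "b", xlab = "False Positive Rate", ylab = "True Positive Rate")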
Mishra - IIT(ISM) 2021 61


Other Classification Metrics

Recall (and related confusion-matrix metrics)

https://manisha-sirsat.blogspot.com/2019/04/confusion-matrix.html
Mishra - IIT(ISM) 2021 62
Machine Learning Methods

Case Studies & Examples


Ch. 8, Mishra and Datta-Gupta (2017)

Mishra - IIT(ISM) 2021 63


Case Study 1 – Shale Well Productivity
Schuetter et al., SPEJ (2018)

• Wolfcamp Shale horizontal wells
  ▪ Data from 476 wells
  ▪ Goal: fit M12CO as a function of the 12 predictors
  ▪ Multiple regression modeling methods
  ▪ Model validation + variable importance

Field         Description
ID            Well ID
M12CO         Cum. production of 1st 12 producing months (BBL)
MMO12         Max. monthly production of 1st 12 producing months (BBL)
MMO2CLAT      Production efficiency (MMO12/LATLEN)
Opt2          Categorized operator code
COMPYR        Well completion year
SurfX, SurfY  Geographic location
AZM           Azimuth angle
TVDSS         True vertical depth (ft)
DA            Drift angle
LATLEN        Total horizontal lateral length (ft)
STAGE         Frac stages
FLUID         Total frac fluid amount (gal)
PROP          Total proppant amount (lb)
PROPCON       Proppant concentration (lb/gal)
GORMM12       Gas-oil-ratio of the max. producing month (MMcf/BBL)
GORC12        Avg. gas-oil-ratio of the 1st 12 producing months (MMcf/BBL)

Mishra - IIT(ISM) 2021 64


Scatter Plot Matrix Analysis

Mishra - IIT(ISM) 2021 65


Model Fits

Variety of
modeling methods

Three types of
model validation
- full training data
- 10-fold CV
- held out test data

Mishra - IIT(ISM) 2021 66


Predictor Importance

Mishra - IIT(ISM) 2021 67


Conditional Sensitivity Analysis

Mishra - IIT(ISM) 2021 68


Classification Tree Analysis

▪ Binary classification to identify factors separating top 25% from bottom 25% producing wells
▪ Top 25%: not too shallow, not too deep, long lateral with more proppant, but not too long
▪ Accuracy:

             BOTTOM 25%   TOP 25%   CORRECT ID
BOTTOM 25%       62          18        78%
TOP 25%           7          73        91%
TOTAL            69          91        70%

Mishra - IIT(ISM) 2021 69


Ensemble Modeling
M1 — direct averaging; M2 — weighted averaging;
M3a — stacking with NN; M3b — stacking with RF

Model Name   RMSE (x1k BBL)
M1           37.57
M2           37.45
M3a          36.21
M3b          36.15
LPM          47.12
QPM          40.03
SVR          39.00
RF           38.33
GBM          40.40

Mishra - IIT(ISM) 2021 70


Case Study 2 – Vug Characterization
Howat et al., AAPG (2016)

• Vuggy zones create high-permeability pathways in carbonate rocks

• Generally identified from cores and FMI logs

• Challenge: Can vuggy zones be identified from well-log response alone?

• Approach: use exploratory data analysis and machine learning tools to create classification rules

[Core/image-log illustration: zone of high density vugs, large vug, crystalline dolomite]

Mishra - IIT(ISM) 2021 71


Machine Learning Phase-1

• Identify vugs in a
single well using
image logs and
core samples

• Using that “truth”


data, train several
models to detect
vugs using sensor
log data only

Mishra - IIT(ISM) 2021 72


Machine Learning Phase-2

• Identify vugs in multiple wells
• Evaluate using the best performing model from Phase I on the new data
  ▪ Wells held out one at a time
  ▪ Model trained using the other wells, then predicted on the held out well
• Vug correct identification rate range 60% – 90%

Held Out Well   Correct ID Rate
Well #1         0.721
Well #2         0.675
Well #3         0.748
Well #4         0.820
Well #5         0.767
Well #6         0.885
Well #7         0.733
Well #8         0.604
Well #9         0.810
Well #10        0.820

Mishra - IIT(ISM) 2021 73


Example Predictions on a Well

• Train a final model


using all the wells,
then use it to identify
vugs in wells for which
no image logs are
currently available

• Output file is a
Synthetic Vug Log:
SVL (0-1)

Mishra - IIT(ISM) 2021 74


Mapping Vugs in Multiple Wells

[Maps: probability of vugs at multiple wells; q = 5 bbl/min, q = 5 bbl/min, q = 1 bbl/min]

Mishra - IIT(ISM) 2021 75


Case Study 3 – Prediction & Optimization
of Drilling Rate of Penetration

• Goal  Fit data-driven


model to predict and
optimize ROP during drilling
• Example  Data from
vertical well in Texas
• Selected features
▪ Weight on bit (WOB)
▪ Rotary speed of drilling (RPM)
▪ Rock strength (UCS)
▪ Flow rate

Hegde and Gray, 2017, JNGSE, 40, 327-335
Mishra - IIT(ISM) 2021
Model Fitting and Validation

• Random Forest
▪ R^2 = 0.96
▪ RMSE = 7.4 ft/hr
▪ Mean error = ~5%

• Linear regression
▪ R^2 = 0.42
▪ RMSE = 18.4 ft/hr
▪ Mean error = ~14%

• Results above are for


Tyler sandstone
• RF error <10% for all
other formations

Mishra - IIT(ISM) 2021


(Near) Real-time Optimization

• Using data-driven model, explore


feature space for improved ROP
▪ Change WOB, RPM, flow
▪ Use feature subspace in vicinity of
depth of interest
▪ Should not extrapolate, but expand
range of training data

• Possible to improve efficiency by


increasing ROP
• Practical considerations would
call for optimization over finite
intervals
Improved performance and cost savings

Mishra - IIT(ISM) 2021


Example [1]
Perez et al.
SPERE April 2005

• Classification tree
analysis for
identifying rock
types from basic well
log attributes
• Accounting for
missing well logs
• Application for
permeability
prediction in Salt
Creek field

Mishra - IIT(ISM) 2021 79


Example [2]
Shelley et al.
SPE-171003, 2014

• Identifying performance
drivers and completion
effectiveness for
Marcellus shale wells
• Predictive model using
ANN (Artificial Neural
Networks)
• Role of different
variables evaluated

Mishra - IIT(ISM) 2021 80


Example [3]
Schuetter et al.
SPE 174905, 2015

• Building proxy model for CO2 geologic sequestration full-physics simulation output
• Compositional simulation with 9 inputs and 3 outputs
• Different designs (Box-
Behnken, Maximin LHS,
Maximum Entropy) used to
generate training runs
• Response fitted with
quadratic and kriging models

Mishra - IIT(ISM) 2021 81


Example [4]
Santos et al.
OTC-26275, 2014

• Building prognostic classifier for specific turbogenerator failures during startup
• Data from offshore facility – extraction of fuel-burning related features
• RUSBoost and RF models
• Multi-fold validation approach for evaluation

[Figure: average temperature (deg C) traces and test accuracy on the validation set]
Mishra - IIT(ISM) 2021 82
Example [5]
Arumugam et al.
SPE-184062, 2016

• Processing of daily drilling data to identify drilling anomalies / best practices
  ▪ Information retrieval
  ▪ Conversion to structured data
  ▪ Clustering
  ▪ Pattern identification
  ▪ Knowledge management

[Figure: example clusters of text-mined drilling observations, e.g., "Drill, connections increased, observed excess drag, observed fresh cuttings"; "Drill, maintained ROP/WOB/torque, no caving observed, good hole cleaning, no vibrations observed"; "Drill, tight spot, losses in trip tank, rare cavings, over pulled, observed torque, drags observed"]

Mishra - IIT(ISM) 2021 83


Recap of Lessons Learned

• Problem formulation is important


• Predictive modeling is nuanced
• Multiple competing models may exist
• Data quality/quantity can compromise results
• Unwrapping black-box models is difficult
• Communicating results can be challenging
• Text-based datasets mostly untapped

Mishra - IIT(ISM) 2021 84


Mishra - IIT(ISM) 2021 85
Machine Learning Methods

Software Demo

Mishra - IIT(ISM) 2021 86


R / RATTLE
• R – a freely available language and environment for statistical
computing and graphics which provides a wide variety of statistical
and graphical techniques: linear and nonlinear modelling, statistical
tests, time series analysis, classification, clustering, etc

• https://cran.r-project.org/

• Rattle – a popular GUI for data mining using R. It presents statistical


and visual summaries of data, transforms data so that it can be
readily modelled, builds both unsupervised and supervised machine
learning models from the data, presents the performance of models
graphically, and scores new datasets for deployment into production.

• https://rattle.togaware.com/

Mishra - IIT(ISM) 2021 87


Problem 1 (Regression)
• Data set => Salt_Creek_regress
• 5 well-logs as inputs (log10(LLD), GR, NPHI, RHOB, PEF)
• Permeability as output (ln(Kg))
• Exploratory data analysis (distribution, correlation)
• Modeling (linear, tree, random forest, neural net)
• Validation
• Variable importance

Mishra - IIT(ISM) 2021 88


Problem 2 (Classification)
• Data set => Salt_Creek_classify
• 5 well-logs as inputs (log10(LLD), GR, NPHI, RHOB, PEF)
• Class of permeability as output (high or low)
• Modeling (linear, tree, random forest, SVM, neural net)
• Validation
• Variable importance

Mishra - IIT(ISM) 2021 89


Wrap-up Comments

Mishra - IIT(ISM) 2021 90


Resources

• Mishra, Srikanta; and Akhil Datta-Gupta (2017). Applied Statistical Modeling


and Data Analytics for the Petroleum Geosciences, Elsevier, New York, NY.
• Mohaghegh, Shahab (2017). Data-Driven Reservoir Modeling, Society of
Petroleum Engineers, Richardson, TX.
• Misra, Siddharth; Hao Li; and Jiabo He (2019). Machine Learning for
Subsurface Characterization, Gulf Professional Publishing, Houston, TX.
• Holdaway, Keith (2014). Harness Oil and Gas Big Data with Analytics: Optimize
Exploration and Production with Data-Driven Models, John Wiley & Sons, New
York, NY.
• Hastie, T., R. Tibshirani, and J.H. Friedman, 2008. The Elements of Statistical
Learning: Data Mining, Inference, and Prediction, Springer, New York, NY.
• Kuhn, M. and K. Johnson, 2013, Applied Predictive Modeling, 2013. Springer,
New York, NY.

Mishra - IIT(ISM) 2021 91


Software Resources

• Mishra and Datta-Gupta (2017) companion site – which includes:


(a) GRACE (non-parametric regression), (b) E-FACIES
(multivariate analysis), (c) E-REGRESS (experimental design),
(d) misc R scripts for machine learning, and (e) example datasets
https://www.elsevier.com/books-and-journals/book-companion/9780128032794/software
• R – open source statistical analysis software
• Python – Programming language with built-in libraries
• WEKA – suite of machine learning software in Java
• Commercial packages (MATLAB, SAS, SPSS, …..)

Mishra - IIT(ISM) 2021 92


Combining the Old and the New
[Recap figure with panels for: Exploratory Data Analysis, Regression Modeling, Multivariate Analysis, Machine Learning, Experimental Design, Uncertainty Quantification]

Mishra - IIT(ISM) 2021 93


Recommended Workflow
• Framing the problem
• Checking the data
• Selecting the causal variables
• Picking the software
• Choosing the modeling technique(s)
• Validating the model
• Understanding and communicating the results

Mishra - IIT(ISM) 2021 94


Challenges for Acceptance of ML

• Our ML models are not very good.

• If I don’t understand the model,


how can I believe it?

• We are still waiting for the “Aha” moment!

• My staff need to learn data science, but how?


Mishra et al., 2021, JPT (March)

Mishra - IIT(ISM) 2021 95


Challenges for Acceptance of ML
Poor Model Quality Lack of Understanding
• Consumer marketing ML/AI models • Articulate adequacy of predictors
are not necessarily highly accurate!
• Demonstrate model robustness
• Need to manage expectations re.
quality of fit for subsurface models • Explain inner workings (key variables)
• Focus more on added value from ML • Use creative visualizations
models and complementary role

“Aha” Moment? Learning Data Science


• ML model may or may not produce • Significant (informal) self-learning to
new insights
become “citizen data scientists”
• Provides an alternative quantitative
input-output relationship • Need formal knowledge of
conventional data analysis, python/R
• Useful when physics-based model is
slow, data-intensive or immature programming, and machine learning

Mishra et al., 2021, JPT (March)


Mishra - IIT(ISM) 2021 96
Q1– Which Software Should I Use?
• Commercial visualization tools
▪ Spotfire, Tableau

• “Light and Easy” statistics tools


▪ Excel, JMP, SPSS, Minitab

• Commercial statistics software


▪ SAS, MATLAB, Stata

• Open source statistics software


▪ R/RATTLE, Python

Mishra - IIT(ISM) 2021 97


Q2– Do I Need Machine Learning?
[Comparison figure: linear model fit vs. random forest fit to the same data]

Mishra - IIT(ISM) 2021 98


Q3– Which Technique Works Best?

[Figure: "Power of Ensemble Modeling"]

• No single technique is a consistent best performer
• Often, multiple competing
models have equally good
fits (R2/RMSE)

• Aggregate models  robust


understanding
and predictions
• Pick “Forest” over “Trees”
• Baseline regression model + one(+) tree-based model (e.g.,
RF, GBM) + one(+) non-tree based model (e.g., SVM, ANN)

Mishra - IIT(ISM) 2021 99


Q4– How Much Data Do I Need?
• Large datasets (e.g., N = 2935) can produce poor results if key causal variables are not included in the model

• Robust models can be built with small datasets (e.g., N = 81) if all relevant causal variables are included in the model

Mishra - IIT(ISM) 2021 100


Q5– How Do I Learn Data Science?
• Citizen data scientist/analyst
(one who learns from data)
▪ Basic skills → domain knowledge (e.g., PE)

▪ New skills → statistics/ML, programming

• Core (data science) competencies


▪ Data collection, preparation, exploration

▪ Data storage and retrieval

▪ Computing with data

▪ Applied machine learning

▪ Data visualization/communication

Donoho, J. Comp. Graphical Stat., 2017
https://www.linkedin.com/pulse/new-venn-diagram-data-science-pierluigi-casale/

Mishra - IIT(ISM) 2021 101


Looking Ahead

• Growing trend towards the use of statistical and machine


learning techniques for oil and gas applications
• Goal  “mine” big data and develop data-driven insights
to improve reservoir description & performance prediction
• Broader applications for digital oil field data management,
real-time analytics and predictive maintenance
• Petro-techs need to develop better understanding of
full repertoire of available techniques and their potential
• Data scientists need to understand problem domain to
propose/apply appropriate techniques

Mishra - IIT(ISM) 2021 102


Gartner “Hype” Trajectory

Mishra - IIT(ISM) 2021 103


Final Thoughts

Beware the
hype / manage
expectations

ML comes
after posing
the problem

Don’t forget
the physics

Mishra - IIT(ISM) 2021 104


Dr Srikanta Mishra
Battelle, Columbus, OH
(614) 424-5712
mishras@battelle.org

Mishra - IIT(ISM) 2021 105


Machine Learning

Regression/Classification Techniques
Ch. 8, Mishra and Datta-Gupta (2017)

Mishra - IIT(ISM) 2021 106


Regression Trees (RT)

• Regression trees are simple, interpretive models to


describe how the predictors impact the response
• General idea:
▪ Split the predictor space into nested rectangular regions

▪ Within each region, predict the response with a constant value

▪ Can handle non-linear behavior using arbitrarily small regions

• Why is it called a tree?


▪ Rectangular regions are defined by using a branching structure

▪ Each branch is a split obtained by applying a threshold to the value


of one of the predictors

Mishra - IIT(ISM) 2021 107


RT – Schematic

[Schematic: tree with root split X1 < t1, then X2 < t2 and X2 < t3, giving regions R1, R2, R3, R4; equivalent partition of the (X1, X2) plane at thresholds t1, t2, t3]

Mishra - IIT(ISM) 2021 108


RT – Procedure

• Parameters to be chosen at each split:


▪ Predictor j ; Threshold value s; Predicted response c1 and c2

▪ Find j, s, c1 and c2 to minimize the squared prediction error over the two resulting regions (see the expression below)

• Given j and s, the best prediction within each node is just


the mean response
• Pruning (balance between complexity and fit quality)
▪ Grow a tree to nearly full size

▪ Select the subtree that optimizes a complexity criterion

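A sketch of the split criterion referred to above, in the standard CART form (the expression itself is not shown in the extracted slide):

\[
\min_{j,\,s}\ \Bigg[ \sum_{x_i \in R_1(j,s)} \big(y_i - c_1\big)^2 \;+\; \sum_{x_i \in R_2(j,s)} \big(y_i - c_2\big)^2 \Bigg],
\qquad c_m = \text{mean}\{\, y_i : x_i \in R_m(j,s) \,\}
\]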
Mishra - IIT(ISM) 2021 109


Example – RT Fit

Regression
True Surface Tree

Mishra - IIT(ISM) 2021 110


RT – Pruning

The cost-complexity criterion combines a summation term representing overall node impurity with a penalty term that multiplies a tuning (cost-complexity) parameter by the number of terminal nodes (see the expression below).

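A sketch of the standard cost-complexity pruning expression being described (α is the tuning parameter, |T| the number of terminal nodes):

\[
C_\alpha(T) \;=\; \sum_{m=1}^{|T|} N_m\, Q_m(T) \;+\; \alpha\,|T|
\]

where \(N_m\) is the number of observations in terminal node \(m\) and \(Q_m(T)\) is its impurity (e.g., within-node mean squared error).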
Mishra - IIT(ISM) 2021 111


Different Levels of Pruning

No Pruning More complicated, less pruning

Mishra - IIT(ISM) 2021 112


RT – Pros and Cons

• Advantages
▪ Interpretable

▪ Identifies important predictors and critical values

▪ Easy to fit, fast evaluation

▪ Invariant to monotone transformations of inputs

▪ Resistant to outliers

• Disadvantages
▪ Less accurate than other models

▪ Prone to overfitting (can use pruning to mitigate this)

Mishra - IIT(ISM) 2021 113


Classification Tree Analysis

• Analog of regression trees


• Provides rules that partition output into categories
based on input values

• Useful for determining prediction rules and finding


structure in data

• 2-variable “partition plot” helps visualize separation


of categorical outcomes

• Widely used in medical decision making

Mishra - IIT(ISM) 2021 114


Simple Classification Tree
x1   x2   y
1    3    High
3    1    High
4    5    High
5    4    Low
9    2    Low
11   6    Low
2    7    High
6    8    High
9    9    High
8    10   High
6    11   High
7    12   High

Resulting tree / rules:
• x1 < 4.5 → High
• x1 > 4.5 and x2 > 6.5 → High
• x1 > 4.5 and x2 < 6.5 → Low

Mishra - IIT(ISM) 2021 115


Random Forest Regression (RF)

• Random forest regression uses an ensemble of trees to


increase performance of a single regression tree

• Same data input always yields the same tree – how to


introduce variation?
▪ Fit each tree with only a subset of the data (a bootstrap sample)

▪ Constrain branches to select from a random selection of predictors

▪ Forces trees to view the dataset from multiple perspectives

• Called a “bagging” approach (bootstrap aggregation)

Mishra - IIT(ISM) 2021 116


RF – Schematic

Mishra - IIT(ISM) 2021 117


RF – Procedure

• Prediction
▪ Observation is passed through all of the trees in the ensemble

▪ Each tree produces a regression estimate

▪ Final estimate is an average of those tree-level estimates

• Built-in cross-validation
▪ Since each tree sees only a subset of the data, the remaining
observations are called out-of-bag samples
▪ For that tree, those out-of-bag samples are independent test data

▪ Used to get error rate estimates for gauging model performance

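A short R sketch using the randomForest package (the choice of package and the synthetic data are assumptions, not from the slides):

# Random forest regression with the randomForest package (sketch)
library(randomForest)
set.seed(7)

dat <- data.frame(x1 = runif(200), x2 = runif(200))
dat$y <- sin(2 * pi * dat$x1) + dat$x2^2 + rnorm(200, sd = 0.1)

rf <- randomForest(y ~ x1 + x2, data = dat, ntree = 500, importance = TRUE)
print(rf)                           # includes the out-of-bag (OOB) error estimate
importance(rf)                      # built-in variable importance
predict(rf, newdata = dat[1:5, ])   # average of the tree-level predictions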
Mishra - IIT(ISM) 2021 118


Example – RF Fit

True Surface (100 sampled points) Random Forest – 500 Trees

Mishra - IIT(ISM) 2021 119


RF – Pros and Cons

• Advantages
▪ Can handle highly non-linear behavior

▪ Useful built-in methods (cross-validation, variable importance,


proximity measure, missing data imputation)
▪ Invariant to monotone transformations of inputs

▪ Resistant to outliers

• Disadvantages
▪ Not easily interpretable

▪ Slower to fit, although it is parallelizable

▪ Can have more trouble modeling certain types of surfaces

Mishra - IIT(ISM) 2021 120


RF- Classification
• The random forest classifier is trained the same way as it
was in the regression setting
▪ Now classification trees are used instead of regression trees

• Prediction
▪ Observation is passed through all of the trees in the ensemble

▪ Each tree produces a predicted class label

▪ Final label is most popular vote among the trees

• Out-of-bag samples used to estimate misclassification


rate on independent test data
• Advantages and disadvantages are similar to those in the
regression case

Mishra - IIT(ISM) 2021 121


Gradient Boosting Machines (GBM)

• GBMs are a different kind of ensemble method where


models in the ensemble are trained sequentially
• Each model is trained to overcome weaknesses of
previous models in the sequence
• General procedure:
▪ Begin with a base model F0(x)

▪ Repeat for m = 1, …, M:
− Fit a model hm(x) to the negative gradient of the residuals y – Fm-1(x)
− Let Fm(x) = Fm-1(x) + hm(x)
▪ Make predictions with the final model FM(x)

Mishra - IIT(ISM) 2021 122


GBM – Schematic

Mishra - IIT(ISM) 2021 123


Example – GBM Fit

Random Forest – 500 Trees GBM – 150 Trees

Mishra - IIT(ISM) 2021 124


GBM – Pros and Cons

• Advantages
▪ Invariant under all monotone transformations of the input variables

▪ Robust against presence of irrelevant input variables

▪ Easy handling of missing values

▪ Useful built-in methods, similar to Random Forest

▪ Competitive accuracy

• Disadvantages
▪ Can easily overfit

▪ Can take a while to fit, but there are tricks for speeding this up

▪ Not easily interpretable

Mishra - IIT(ISM) 2021 125


GBM - Classification
• GBMs are easily ported over from regression to the
classification setting
• TreeBoost scenario:
▪ Rather than fitting a single model at each step, there are K trees,
one for each group
▪ Negative gradient is based on multinomial deviance, rather than
regression residuals
• Advantages and disadvantages are the same as in the
regression setting
• One of the better models out there for regression or
classification

Mishra - IIT(ISM) 2021 126


Support Vector Regression (SVR)
• SVR machines are linear models with ε-insensitive loss
• Errors within ε-tube are ignored
• vi are the training inputs
• w is the vector of parameters
▪ Goal is to minimize

▪ Alternate formulation:

• Kernel “trick” to handle non-linearities


Mishra - IIT(ISM) 2021 127
SVR – Kernel Trick
• The “kernel trick”
▪ The model is only specified through the dot product of the support
vectors and the predictors (vitx)
▪ The dot product can be replaced by any kernel function

Polynomial

Gaussian

Exponential

Hyperbolic Tangent

▪ Using kernels like the ones above can produce regression fits to
non-linear surfaces
Mishra - IIT(ISM) 2021 128
Kernel Trick - Schematic

Mishra - IIT(ISM) 2021 129


SVR – Pros and Cons

• Advantages
▪ Can capture non-linear behavior
− Kernel function allows adaptability to many situations
▪ Accurate predictor compared to most methods

• Disadvantages
▪ Not easily interpretable

▪ Prediction requires storage of training data

▪ Can be influenced by outliers

Mishra - IIT(ISM) 2021 130


Support Vector Machines
• SVMs [1-2] define a hyperplane that separates two classes
  ▪ The hyperplane β0 + xᵗβ = 0 maximizes the margin between the classes (margin boundaries β0 + xᵗβ = ±1; Group A: Y = 1, Group B: Y = -1)
  ▪ Prediction is made using the sign of the hyperplane equation
  ▪ β is a linear combination of vectors lying exactly on the margin
    − These are called support vectors
• The kernel trick works here as it does in SVR machines

1. Vapnik V., The Nature of Statistical Learning Theory, Springer, 2000.
2. Hastie, T., R. Tibshirani, and J.H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, 2008, Springer.

Mishra - IIT(ISM) 2021 131


Neural Networks (NN)

• Neural Nets mimic how the


human brain works
• Neurons collect electrical
impulses from surrounding
cells (Σ)
• These impulses are combined
in a non-linear fashion (σ)
• Electrical impulses are
attenuated by the strengths of
the interconnections between
neurons (weights, bias)

Mishra - IIT(ISM) 2021 132


NN – Procedure

• Each hidden unit (neuron) is a non-linear function of


weighted linear combinations of the inputs
▪ σ is the activation function

▪ Most commonly used


function today is a sigmoid

• Outputs fk(x) are different non-linear functions of weighted


linear combinations of the hidden units
• gk is the output function

Mishra - IIT(ISM) 2021 133


Building the NN Model

• How many parameters?


▪ In a fully connected network
− p+1 parameters for each of M1 units in the first hidden layer
− M1+1 parameters for each of M2 units in the second hidden layer
− Continue until the last layer, with K outputs
▪ Example
− p = 4 inputs
− Two hidden layers (M1 = 5, M2 = 3)
− K = 2 outputs
− That is 5(4+1) + 3(5+1) + 2(3+1) =
25 + 18 + 8 = 51 parameters

Mishra - IIT(ISM) 2021 134
